Puppeteer page interaction. Jordan Teaches Web Scraping

Demo code here

Welcome to the sixth post in the Learn to Web Scrape series. In this post we are going to go over using puppeteer for page interaction. If you are completely new to this, I would highly recommend visiting the first post on puppeteer. I’d also highly recommend learning more about CSS selectors for HTML.

The tools and getting started

minecraft tools gif

This section I will include in every post of this series. It’s going to go over the tools that you will need to have installed. I’m going to try and keep it to a minimum so you don’t have to add a bunch of things.

Nodejs – This runs javascript. It’s very well supported and generally installs in about a minute. You’ll want to download the LTS version, which is 12.13.0 at this time. I would recommend just hitting next through everything. You shouldn’t need to check any boxes. You don’t need to do anything further with this at this time.

Visual Studio Code – This is just a text editor. 100% free, developed by Microsoft. It should install very easily and does not come with any bloatware.

You will also need the demo code referenced at the top and bottom of this article. You will want to hit the “Clone or download” button and download the zip file and unzip it to a preferred location.

clone or download repository

Once you have it downloaded and unzipped, and Nodejs is installed, open Visual Studio Code, go to File > Open Folder, and select the folder where you unzipped the code.

open a folder

We will also be using the terminal to execute the commands that will run the script. In order to open the terminal in Visual Studio Code, go to the top menu again and select Terminal > New Terminal. The terminal will open at the bottom, looking something like (but probably not exactly like) this:

cursor in terminal

It is important that the terminal is opened to the actual location of the code or it won’t be able to find the scripts when we try to run them. In your side navbar in Visual Studio Code, without any folders expanded, you should see a > src folder. If you don’t see it, you are probably at the wrong location and you need to re-open the folder at the correct location.

After you have the package downloaded and you are at the terminal, your first command will be npm install. This will download all of the necessary libraries required for this project.

Using puppeteer for page interaction

puppeteer page interaction fun gif
This makes you think of interaction. …right?

General page interaction with puppeteer is simple and straightforward. You can easily type into an input and then just submit it. For our example today we are going to use the Roller Coaster Database because it has a full-featured form. Plus it’s just a really cool website. Here we are:

    await page.goto('https://rcdb.com/os.htm?ot=2');

    await page.type('#nc', 'dragon');

    await page.click('#sub input');

    // Pause to see the interaction
    await page.waitFor(1500);

With this, we find the input with the id nc (the '#nc' selector) and then type “dragon” into it.

#nc input field

We then find the submit input and click it. Pretty easy.

submit button
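
If you want to reuse this search, the same steps can be wrapped in a small function. This is only a sketch: SearchPage and searchByName are hypothetical names, not part of puppeteer, and the interface lists just the three page methods used here so the snippet compiles on its own. The page object you get from puppeteer should fit that shape.

```typescript
// Hypothetical minimal slice of puppeteer's Page: only the methods used below.
interface SearchPage {
  goto(url: string): Promise<unknown>;
  type(selector: string, text: string): Promise<unknown>;
  click(selector: string): Promise<unknown>;
}

// The Roller Coaster Database search form used in this post.
const SEARCH_URL = 'https://rcdb.com/os.htm?ot=2';

// Opens the search form, types a name into the #nc input,
// and clicks the submit input.
async function searchByName(page: SearchPage, name: string): Promise<void> {
  await page.goto(SEARCH_URL);
  await page.type('#nc', name);
  await page.click('#sub input');
}
```

With a real puppeteer page, the call would look like await searchByName(page, 'dragon');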

Selecting dropdowns

Another common use is selecting a value from a dropdown. This is also very easily done with puppeteer. In this example we are going to change the status of the roller coasters we are looking for to “operating” and submit the form.

advanced search rcdb

The code that we use for this looks like this:

    await page.goto('https://rcdb.com/os.htm?ot=2');

    await page.select('#st', '93');

    // Pause to see the interaction
    await page.waitFor(1750);

    await page.click('#sub input');

    // Pause to see the interaction
    await page.waitFor(1500);

I am using the puppeteer function select, and it wants to know the value of the option we want the select set to. I found this beforehand by inspecting the dropdown and finding the value I wanted.

operating value

Bam. Pretty easy and done!
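
One thing worth knowing is that page.select resolves with an array of the option values that actually ended up selected, so a mistyped value quietly selects nothing. Here is a rough sketch of a guard around that. SelectPage and selectOrThrow are hypothetical names for illustration, and the interface only describes the one method used.

```typescript
// Hypothetical minimal slice of puppeteer's Page: only select is needed.
interface SelectPage {
  select(selector: string, ...values: string[]): Promise<string[]>;
}

// Selects a value in a dropdown and throws if the dropdown
// did not end up with that value selected.
async function selectOrThrow(
  page: SelectPage,
  selector: string,
  value: string
): Promise<void> {
  const selected = await page.select(selector, value);
  if (selected.indexOf(value) === -1) {
    throw new Error('Option "' + value + '" was not selected in ' + selector);
  }
}
```

For the form above, that would be await selectOrThrow(page, '#st', '93'); where '93' is the “operating” value found by inspecting the dropdown.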

Weirder interactions

The trickier interaction here is selecting the location. When you click the location dropdown it doesn’t just open a dropdown, it opens a modal with all of the regional options.

location modal

We could go through and select the region we want but that gets a little bit tricky since we have to wait for it to show up, then select the region we want and then possibly select the sub region. This is all very possible with puppeteer but let’s go for an easier way.

We do something similar to what we did above and just go and find the id of the region we want. For this case I’m using Alberta, Canada.

alberta canada id

Then, once we have that id we can just set the value of the input directly.

    await page.goto('https://rcdb.com/os.htm?ot=2');

    await page.$eval('#targetol', (element: any) => element.value = '19');

    await page.click('#sub input');

    // Pause to see the interaction
    await page.waitFor(1500);

Because it is a hidden input, what is actually reflected on the page does not match the actual value. When we actually click the submit button, though, it does search for Alberta, Canada as we are hoping.

final puppeteer page interaction
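
The function handed to $eval runs inside the browser page, not in your Node script, so variables from your script are not visible in it. If the value should not be hard-coded, it can be passed as an extra argument after the function. The sketch below shows that idea. EvalPage and setInputValue are hypothetical names, and the interface is a simplified stand-in for puppeteer's real $eval signature.

```typescript
// Hypothetical, simplified slice of puppeteer's Page.$eval signature.
interface EvalPage {
  $eval(
    selector: string,
    pageFunction: (element: any, ...args: any[]) => unknown,
    ...args: unknown[]
  ): Promise<unknown>;
}

// Sets the value of an input (hidden or not) and returns
// the value read back from the element afterwards.
async function setInputValue(
  page: EvalPage,
  selector: string,
  value: string
): Promise<string> {
  const result = await page.$eval(
    selector,
    // newValue arrives as an argument because this function
    // is evaluated in the page, where `value` does not exist.
    (element: any, newValue: string) => {
      element.value = newValue;
      return element.value;
    },
    value
  );
  return String(result);
}
```

For Alberta, Canada that would be await setInputValue(page, '#targetol', '19');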

Bam. We just did some cool page interactions with puppeteer.

Demo code here

