Welcome to the sixth post in the Learn to Web Scrape series. In this post we are going to go over using puppeteer for page interaction. If you are completely new to this, I would highly recommend visiting the first post on puppeteer. I’d also highly recommend learning more about css selectors for html.
The tools and getting started
I include this section in every post of this series. It goes over the tools that you will need to have installed. I’m going to try and keep it to a minimum so you don’t have to add a bunch of things.
Nodejs – Version 12.13.0 at this time. I would recommend just hitting next through everything in the installer. You shouldn’t need to check any boxes, and you don’t need to do anything further with it for now.
Visual Studio Code – This is just a text editor. 100% free, developed by Microsoft. It should install very easily and does not come with any bloatware.
You will also need the demo code referenced at the top and bottom of this article. You will want to hit the “Clone or download” button and download the zip file and unzip it to a preferred location.
Once you have the code downloaded and Nodejs installed, open Visual Studio Code, go to File > Open Folder, and select the folder where you unzipped the code.
We will also be using the terminal to run the commands that execute the scripts. To open the terminal in Visual Studio Code, go to the top menu again and choose Terminal > New Terminal. The terminal will open at the bottom of the window.
It is important that the terminal is opened to the actual location of the code or it won’t be able to find the scripts when we try to run them. In your side navbar in Visual Studio Code, without any folders expanded, you should see a > src folder. If you don’t see it, you are probably at the wrong location and you need to re-open the folder at the correct location.
After you have the package downloaded and you are at the terminal, your first command will be npm install. This will download all of the libraries required for this project.
Using puppeteer for page interaction
General page interaction with puppeteer is simple and straightforward. You can type into an input and then just submit it. For our example today we are going to use the Roller Coaster Database because it has a full-featured form. Plus it’s just a really cool website. Here we are:
await page.goto('https://rcdb.com/os.htm?ot=2');
await page.type('#nc', 'dragon');
await page.click('#sub input');
// Pause to see the interaction
await page.waitFor(1500);
With this, we find the input with the id nc (the selector '#nc') and then type “dragon” into it.
We then find the submit input and click it. Pretty easy.
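If you end up running more than one search, the three calls above can be wrapped in a small helper. This is only a sketch: FormPage and searchByName are names made up for illustration, and the interface lists just the puppeteer page methods used here (goto, type, click) so the helper can also be tried against a stand-in object.

```typescript
// Hypothetical minimal view of a puppeteer page: only the methods this sketch uses.
interface FormPage {
  goto(url: string): Promise<unknown>;
  type(selector: string, text: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Hypothetical helper: open the search form, type a name, and submit.
// The URL and selectors are the ones used in this post.
async function searchByName(page: FormPage, searchText: string): Promise<void> {
  await page.goto('https://rcdb.com/os.htm?ot=2');
  await page.type('#nc', searchText);
  await page.click('#sub input');
}
```

With a puppeteer page in hand, the call would look like await searchByName(page, 'dragon').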
Another common use is selecting a value from a dropdown. This is also very easily done with puppeteer. In this example we are going to change the status of the roller coasters we are looking for to “operating” and submit the form.
The code that we use for this looks like this:
await page.goto('https://rcdb.com/os.htm?ot=2');
await page.select('#st', '93');
// Pause to see the interaction
await page.waitFor(1750);
await page.click('#sub input');
// Pause to see the interaction
await page.waitFor(1500);
I am using the puppeteer function select, which needs the value of the option we want to choose rather than the text shown in the dropdown. I found this value beforehand by inspecting the dropdown in the browser and locating the option I wanted.
Bam. Pretty easy and done!
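Instead of looking up the value by hand, the option values can also be read from the page. The sketch below is an illustration rather than part of the demo code: OptionInfo, SelectPage, findOptionValue and selectByLabel are hypothetical names, and it assumes the option you want can be identified by its visible text. It uses puppeteer's $$eval to collect each option's text and value, then passes the matching value to select.

```typescript
// Text and value of one <option>, as read from the page.
interface OptionInfo {
  label: string;
  value: string;
}

// Hypothetical minimal view of a puppeteer page: only the methods this sketch uses.
interface SelectPage {
  $$eval(selector: string, pageFunction: (elements: any[]) => OptionInfo[]): Promise<OptionInfo[]>;
  select(selector: string, ...values: string[]): Promise<string[]>;
}

// Pure lookup: case-insensitive match on the trimmed visible text.
function findOptionValue(options: OptionInfo[], label: string): string | undefined {
  const wanted = label.trim().toLowerCase();
  for (const option of options) {
    if (option.label.trim().toLowerCase() === wanted) {
      return option.value;
    }
  }
  return undefined;
}

// Hypothetical helper: select an option by its visible text instead of its value.
async function selectByLabel(page: SelectPage, selector: string, label: string): Promise<string> {
  // Runs in the browser: collect the text and value of every option in the select.
  const options = await page.$$eval(selector + ' option', (elements: any[]) =>
    elements.map(element => ({ label: element.textContent || '', value: element.value }))
  );
  const value = findOptionValue(options, label);
  if (value === undefined) {
    throw new Error('No option labelled "' + label + '" found in ' + selector);
  }
  await page.select(selector, value);
  return value;
}
```

Keeping the lookup in its own function means it can be checked without a browser. Depending on your puppeteer typings, passing a real page to selectByLabel may need a cast.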
Selecting the location works differently. When you click the location dropdown it doesn’t just open a dropdown, it opens a modal with all of the regional options.
We could go through and select the region we want but that gets a little bit tricky since we have to wait for it to show up, then select the region we want and then possibly select the sub region. This is all very possible with puppeteer but let’s go for an easier way.
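For completeness, the longer route through the modal might look roughly like the sketch below. The real selectors for the RCDB modal are not covered in this post, so they are passed in as parameters; ModalPage, ModalSelectors and pickRegionInModal are hypothetical names, and only puppeteer's click and waitForSelector are used.

```typescript
// Hypothetical minimal view of a puppeteer page: only the methods this sketch uses.
interface ModalPage {
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { visible?: boolean }): Promise<unknown>;
}

// Selectors must be supplied by the caller; none of them are taken from the real site.
interface ModalSelectors {
  opener: string;      // element that opens the modal
  region: string;      // region entry inside the modal
  subRegion?: string;  // optional sub region entry
}

// Hypothetical helper: open the modal, wait for each entry to be visible, then click it.
async function pickRegionInModal(page: ModalPage, selectors: ModalSelectors): Promise<void> {
  await page.click(selectors.opener);
  await page.waitForSelector(selectors.region, { visible: true });
  await page.click(selectors.region);
  if (selectors.subRegion) {
    await page.waitForSelector(selectors.subRegion, { visible: true });
    await page.click(selectors.subRegion);
  }
}
```

Every click is preceded by a wait, which is exactly the bookkeeping that makes this route more fiddly.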
For the easier way, we do what we did with the status dropdown and look up the id of the region we want beforehand. In this case I’m using Alberta, Canada.
Then, once we have that id we can just set the value of the input directly.
await page.goto('https://rcdb.com/os.htm?ot=2');
await page.$eval('#targetol', (element: any) => element.value = '19');
await page.click('#sub input');
// Pause to see the interaction
await page.waitFor(1500);
Because it is a hidden input, what is displayed on the page does not change to match the value we set. When we click the submit button, though, it does search for Alberta, Canada as we are hoping.
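Since the page itself does not show the change, it can be reassuring to read the value back before submitting. Below is a rough sketch with hypothetical names (EvalPage, setInputValue). It relies on puppeteer's $eval forwarding any extra arguments to the function it runs in the page, so the value does not have to be hard-coded inside that function.

```typescript
// Hypothetical minimal view of a puppeteer page: only the method this sketch uses.
interface EvalPage {
  $eval(
    selector: string,
    pageFunction: (element: any, ...args: any[]) => any,
    ...args: any[]
  ): Promise<any>;
}

// Hypothetical helper: set an input's value in the page and return what the
// element reports afterwards, so the caller can confirm the change took effect.
async function setInputValue(page: EvalPage, selector: string, value: string): Promise<string> {
  return page.$eval(
    selector,
    (element: any, newValue: string) => {
      element.value = newValue;
      return element.value;
    },
    value
  );
}
```

For the example above, the call would be await setInputValue(page, '#targetol', '19'), and the returned string should be '19' if the input was found and updated.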
Bam. We just did some cool page interactions with puppeteer.