And how to get the actual site’s html. And maybe some knowledge of how arrays work. Okay okay, and some looping. But that’s not too hard, right?
Why? Well, it’s really just pretty simple. Let’s do an example. The extremely complicated code below was taken from W3Schools.
<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
</head>
<body>

<h1>This is a Heading</h1>
<p>This is a paragraph.</p>

</body>
</html>
Let’s say we want to get the title and the innerHTML of the <p> tag.
// Get the html with request or some other XHR method.
const html = someWayToGetTheHTML();
// Split on the opening tag, keep the part after it, then cut at the next closing tag.
const title = html.split('<title>')[1].split('</')[0];
const paragraph = html.split('<p>')[1].split('</')[0];
// title => 'Page Title'
// paragraph => 'This is a paragraph.'
There. Done. Pretty simple, right?
Let’s break down a bit what we did. We take the html and split() on the opening tag of the element we want. In this case, each split gives us an array with two elements: all of the string before the opening tag, and all of the string after it. Note that the string you split on ends up in neither element.
const splitTitle = html.split('<title>');
// splitTitle => ["<!DOCTYPE html>↵<html>↵<head>↵", "Page Title</title>↵</head>↵<body>↵↵<h1>This is a H…/h1>↵<p>This is a paragraph.</p>↵↵</body>↵</html>"]
const splitParagraph = html.split('<p>');
// splitParagraph => ["<!DOCTYPE html>↵<html>↵<head>↵<title>Page Title</title>↵</head>↵<body>↵↵<h1>This is a Heading</h1>↵", "This is a paragraph.</p>↵↵</body>↵</html>"]
So you select the second element (index 1) and then split that latter part again at the closing tag. The first element of that second split is the text you were after.
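Here is a rough sketch of that split, select, split again routine wrapped in a small function. The name textBetween and the inlined sampleHtml are made up for this illustration, and the sketch assumes the opening tag is written exactly as passed in (no attributes, same casing).

```javascript
// Hypothetical helper (not part of any library): returns the text between
// the first occurrence of `open` and the next occurrence of `close`.
function textBetween(html, open, close) {
  // Element 0 is everything before the opening tag, element 1 everything after.
  const afterOpen = html.split(open)[1];
  // If the opening tag never appears, split() returns a one-element array.
  if (afterOpen === undefined) return null;
  // Cut the remainder at the closing tag and keep what comes before it.
  return afterOpen.split(close)[0];
}

// The sample page from above, inlined so the sketch runs on its own.
const sampleHtml = [
  '<!DOCTYPE html>',
  '<html>',
  '<head>',
  '<title>Page Title</title>',
  '</head>',
  '<body>',
  '',
  '<h1>This is a Heading</h1>',
  '<p>This is a paragraph.</p>',
  '',
  '</body>',
  '</html>',
].join('\n');

const title = textBetween(sampleHtml, '<title>', '</title>');
const paragraph = textBetween(sampleHtml, '<p>', '</p>');

console.log(title);     // Page Title
console.log(paragraph); // This is a paragraph.
```

Returning null when the tag is missing is just one choice. Without that check, the second split() would throw because it would be called on undefined.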
Not so bad, right? I know, you are probably saying…
“The page I’m scraping has 322 p tags, Jordan. This method is garbage.”
Don’t despair, my young friend. And don’t worry, you didn’t offend me by calling my method garbage. I mean, maybe it thinks you are garbage.
Yes, 322 p tags do make it more complicated, but you just have to get a bit more creative with your splitting and maybe dig a bit deeper. Split and then split again. And then again. And maybe even again.
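To give a feel for what "split and then split again" can look like with several p tags, here is a minimal sketch. The pageHtml string and its div ids are invented for the example, and it assumes plain <p> tags without attributes and no nested divs inside the section being isolated.

```javascript
// Made-up page with p tags in two different sections.
const pageHtml = [
  '<div id="nav"><p>Home</p></div>',
  '<div id="main"><p>First</p><p>Second</p><p>Third</p></div>',
].join('\n');

// Split once to isolate the section we care about...
const mainSection = pageHtml.split('<div id="main">')[1].split('</div>')[0];

// ...then split again on every opening <p>. The first piece is whatever came
// before the first <p>, so drop it, and cut each remaining piece at its </p>.
const paragraphs = mainSection
  .split('<p>')
  .slice(1)
  .map((chunk) => chunk.split('</p>')[0]);

console.log(paragraphs); // [ 'First', 'Second', 'Third' ]
```

Narrowing down to one section first keeps the "Home" paragraph from the nav out of the results, which is usually the creative part when a page has hundreds of p tags.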
Why do I like this method? Sometimes I just don’t want to dig into the docs of Cheerio or Puppeteer. I think I’m a pretty sharp guy but man, this way is easy. I don’t have to learn or remember how to use one of the other tools.
Anyway, hope it helps. Please don’t share this with anyone. It’s a secret. They wouldn’t understand.