Jordan Scrapes Secretary of States: West Virginia

Demo code here

Okay, I’ll admit it. I really don’t know anything about West Virginia. I’m still scraping its secretary of state for business leads. If you look at a map, it’s definitely west of Virginia so the name checks out.

Map with West Virginia and Virginia in it

I chose it at random for scraping and it turned out to be an easy scrape using some of the techniques that I’ve built up over the other secretary of state pages I’ve scraped.

Investigation

West Virginia pretty gif

I try to look for the most recently registered businesses. They are the businesses that are very likely trying to get set up with new services and products and probably don’t have existing relationships. I think these are typically going to be the more valuable leads.

If the state doesn’t offer a date range with which to search, I’ve discovered a trick that works pretty okay. I just search for “2020”. 2020 is kind of a catchy number and, because we are currently in that year, people tend to start businesses that have that number in their name.

Once I find one of these that was registered recently, I look for a business id somewhere. It’s typically a query parameter in the url or form data in the POST request. Either way, if I can increment that id by one and still get a company that was recently registered, I know I can find recently registered businesses simply by increasing the id with which I search.

West Virginia, fortunately, has an advanced search that includes a date range.

West Virginia secretary of state advanced search options

Selecting any of these revealed what I was looking for. A business id in the query parameter that appeared to be numeric. Incrementing it by one shows another recently registered business. BAM. Newly registered businesses found.

west virginia secretary of state with query parameter in the url
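The "increment the id by one" step can be sketched in code. This is only a rough helper of my own naming (`incrementIdInUrl` is not part of the demo code), and it assumes the id lives in a numeric query parameter, which is `org` in the url used later in this post.

```typescript
// Sketch: bump a numeric id held in a url query parameter.
// Assumption: the parameter is called "org", as in the url used later in this post.
function incrementIdInUrl(url: string, step = 1, paramName = 'org'): string {
	const parsed = new URL(url);
	const current = parsed.searchParams.get(paramName);

	// Refuse to guess if the parameter is missing or is not a plain number.
	if (current === null || !/^\d+$/.test(current)) {
		throw new Error(`No numeric "${paramName}" parameter in ${url}`);
	}

	parsed.searchParams.set(paramName, String(Number(current) + step));
	return parsed.toString();
}

const first = 'https://apps.sos.wv.gov/business/corporations/organization.aspx?org=493294';

// prints https://apps.sos.wv.gov/business/corporations/organization.aspx?org=493295
console.log(incrementIdInUrl(first));
```

Throwing when the parameter isn't numeric is a deliberate choice. If a state uses non-numeric ids, the incrementing trick doesn't apply and it is better to find out right away.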

The code

funny gif of ship crashing
There is something mesmerizing about this gif.

This part is crazy simple. I depend on Axios to make the get request and cheerio to parse the html. I start with a basic function looping through 20 ids to check that they are indeed incrementing.

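import axios from 'axios';
import * as cheerio from 'cheerio';

(async () => {
	// const startingId = 11045521;
	const startingId = 493294;

	// Check 20 consecutive ids, one request at a time.
	for (let i = 0; i < 20; i++) {
		await getBusinessDetails(startingId + i);
	}

})();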
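I just eyeball the logged output, but the "are they incrementing" check could be made a bit more mechanical. Here is a minimal sketch, assuming the registration dates come back as "M/D/YYYY" strings (I haven't confirmed that format here) and that ids with no business behind them produce an empty date. Both helper names are made up for this example.

```typescript
// Assumption: dates arrive as "M/D/YYYY" strings. Adjust the parsing if the page differs.
function parseUsDate(text: string): number | null {
	const match = /^\s*(\d{1,2})\/(\d{1,2})\/(\d{4})\s*$/.exec(text);
	if (!match) {
		return null;
	}
	const [, month, day, year] = match;
	return Date.UTC(Number(year), Number(month) - 1, Number(day));
}

// True when every parsable date is the same as or later than the one before it.
// Unparsable entries (for example an id with no business behind it) are skipped.
function datesNeverGoBackwards(dates: string[]): boolean {
	let previous = -Infinity;
	for (const text of dates) {
		const timestamp = parseUsDate(text);
		if (timestamp === null) {
			continue;
		}
		if (timestamp < previous) {
			return false;
		}
		previous = timestamp;
	}
	return true;
}

// Made-up sample dates, in the order the ids were requested.
console.log(datesNeverGoBackwards(['1/2/2020', '1/2/2020', '', '1/3/2020']));
```

Treat the result as a rough signal only. Ids and registration dates may not line up perfectly, so a single out-of-order date doesn't necessarily mean the trick has stopped working.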

And then the getBusinessDetails function just takes the id, makes the get request with the incremented id and gets the fields we want.

async function getBusinessDetails(id: number) {
	const url = `https://apps.sos.wv.gov/business/corporations/organization.aspx?org=${id}`;

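	// Fetch the organization page for this id.
	const axiosResponse = await axios.get(url);

	// Load the returned html so it can be queried with css selectors.
	const $ = cheerio.load(axiosResponse.data);

	// The name has its own element id. The other fields sit in tables,
	// so nth-of-type picks the table, then the row, then the cell.
	const title = $('#lblOrg').text();
	const date = $('table:nth-of-type(1) tr:nth-of-type(3) td:nth-of-type(4)').text();
	const address = $('table:nth-of-type(3) tr:nth-of-type(3) td:nth-of-type(1)').text();
	const officer = $('table:nth-of-type(4) tr:nth-of-type(3) td:nth-of-type(1)').text();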

	const business = {
		title: title,
		date: date,
		address: address,
		officer: officer
	};

	console.log('business', business);
}

The html is super simple here. Each section of data is within a table so I use nth-of-type to find the one I want and then I just pluck from the rows and cells to grab the data I want from those. Very simple scrape. The end.
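If the long nth-of-type strings get hard to read, one option is a tiny builder for them. This is just a sketch and `cellSelector` is a name I made up. It produces exactly the selector strings used above, with positions counted from 1 the way nth-of-type counts them.

```typescript
// Sketch: build a "table > row > cell" selector from 1-based positions.
function cellSelector(table: number, row: number, cell: number): string {
	return `table:nth-of-type(${table}) tr:nth-of-type(${row}) td:nth-of-type(${cell})`;
}

// The same positions used in getBusinessDetails.
const selectors = {
	date: cellSelector(1, 3, 4),
	address: cellSelector(3, 3, 1),
	officer: cellSelector(4, 3, 1)
};

console.log(selectors);
```

With cheerio loaded as `$`, something like `$(selectors.date).text()` would then stand in for the inline string.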

These posts are starting to get shorter, it seems. I think this is partially because I’m getting better at this. If I’m missing some things that you would be interested in, please let me know and I’ll be happy to go into more depth.


Looking for business leads?

Using the techniques talked about here at javascriptwebscrapingguy.com, we’ve been able to launch a way to access awesome web data. Learn more at Cobalt Intelligence!
