Jordan Scrapes Secretary of State: Hawaii

Demo code here

Today we go to one of my favorite places. Hawaii! Web scraping the Hawaii secretary of state is not an overly difficult task. This is the 16th (!) post in the Secretary of State scraping series.

Hawaii is a place I love. I went to college there for one semester and I sometimes have regrets that I didn’t go more than that. It’s a beautiful place where the weather is always great. Dang, talking about it now is making me want to go.

Investigation

I try to look for the most recently registered businesses. They are the businesses most likely to be getting set up with new services and products, and they probably don’t have existing relationships. I think these are typically going to be the more valuable leads.

If the state doesn’t offer a date range to search by, I’ve discovered a trick that works pretty well. I just search for “2020”. 2020 is kind of a catchy number, and because we are currently in that year, people tend to start businesses with that number in the name.

Once I find one of these that was registered recently, I look for a business id somewhere. It’s typically a query parameter in the url or form data in the POST request. Either way, if I can increment that id by one and still get a company that was recently registered, I know I can find recently registered businesses simply by increasing the id I search with.

Hawaii secretary of state search

The business search for Hawaii was simple. Just swap the search mode to “Contains…” and then search for 2020. Clicking through to a business’s details page gives a url like this:

Hawaii business URL

Bingo. This is already a good indication that this may be a simple scrape. Maybe we can just iterate and increase this number to find recently registered businesses. The “C5” part is different since it’s not a pure number, but maybe that’s something that I can just append.
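Here is a minimal sketch of that idea: treat the file number as a numeric part plus a trailing suffix, and bump only the numeric part. The helper name `nextFileNumber` is hypothetical (it isn’t part of the demo code), and it assumes every file number looks like digits followed by a short suffix such as “C5”.

```typescript
// Hypothetical helper: split a file number such as "238735C5" into its
// numeric part and its trailing suffix, then increase the numeric part.
// Assumes the format is always digits followed by a letter-led suffix.
function nextFileNumber(fileNumber: string, step: number = 1): string {
	const match = /^(\d+)([A-Za-z]\w*)$/.exec(fileNumber);
	if (!match) {
		throw new Error(`Unexpected file number format: ${fileNumber}`);
	}
	const [, digits, suffix] = match;
	// Note: leading zeros in the numeric part would be dropped here.
	return `${Number(digits) + step}${suffix}`;
}

console.log(nextFileNumber("238735C5")); // 238736C5
```

Throwing on an unexpected format is a deliberate choice. If the state ever hands out file numbers in another shape, I’d rather find out right away than quietly request the wrong pages.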

This business (3KAI LLC) was registered on June 28, 2020.

If I go up by one to “238736C5”, the business (LUKE VR, LLC) was registered on July 8, 2020. Going up one more showed a business registered on July 6, 2020. So the dates are not strictly ascending, but it’s definitely safe to assume that incrementing this number is going to get more recently registered businesses.
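Because the dates aren’t strictly ascending, one option is to compare each registration date against a cutoff instead of stopping at the first date that goes backwards. This is only a sketch: `parseFilingDate` and `isRecent` are hypothetical helpers, and the “Month D, YYYY” format is an assumption based on how the dates are written in this post. The details page may format them differently.

```typescript
// Assumed date format: "Month D, YYYY" (for example "July 8, 2020").
const MONTHS = [
	"january", "february", "march", "april", "may", "june",
	"july", "august", "september", "october", "november", "december",
];

// Hypothetical helper: returns a UTC date, or null if the text doesn't match.
function parseFilingDate(text: string): Date | null {
	const match = /^\s*([A-Za-z]+)\s+(\d{1,2}),\s*(\d{4})\s*$/.exec(text);
	if (!match) {
		return null;
	}
	const month = MONTHS.indexOf(match[1].toLowerCase());
	if (month === -1) {
		return null;
	}
	return new Date(Date.UTC(Number(match[3]), month, Number(match[2])));
}

// True when the filing date parses and falls on or after the cutoff.
function isRecent(filingDate: string, cutoff: Date): boolean {
	const parsed = parseFilingDate(filingDate);
	return parsed !== null && parsed.getTime() >= cutoff.getTime();
}

// The three dates seen above, checked against a June 1, 2020 cutoff.
const cutoff = new Date(Date.UTC(2020, 5, 1));
const seen = ["June 28, 2020", "July 8, 2020", "July 6, 2020"];
console.log(seen.map((date) => isRecent(date, cutoff))); // [ true, true, true ]
```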

The code

Surfing gif
I think this picture is amazing. Just like this code.

The code for scraping Hawaii can’t get much simpler. We loop through the ids, increasing them by one each time.

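// Resolve after the given number of milliseconds so we can pause between requests.
function timeout(ms: number): Promise<void> {
	return new Promise((resolve) => setTimeout(resolve, ms));
}

(async () => {
	const startingId = 238735;
	// Check ten consecutive file numbers, starting from a known recent one.
	for (let i = 0; i < 10; i++) {
		await getDetails(startingId + i);
		// Wait a second between requests to go easy on the server.
		await timeout(1000);
	}
})();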

And then from there we just pluck the data that we want from the details page. This example grabs the title and filing date, but you can expand it to get any data you are looking for.

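import axios from "axios";
import * as cheerio from "cheerio";

// Shape of the data we collect for each business.
interface IBusiness {
	title?: string;
	filingDate?: string;
}

async function getDetails(sosId: number) {
	// The file number is the numeric id followed by the "C5" suffix.
	const axiosResponse = await axios.get(`https://hbe.ehawaii.gov/documents/business.html?fileNumber=${sosId}C5`);
	const $ = cheerio.load(axiosResponse.data);
	// Title and filing date are the first and sixth <dd> entries in the first definition list.
	const title = $("#myTabContent .row div:nth-of-type(1) dl:nth-of-type(1) dd:nth-of-type(1)").text();
	const filingDate = $("#myTabContent .row div:nth-of-type(1) dl:nth-of-type(1) dd:nth-of-type(6)").text();

	console.log("Master Name-", title);
	console.log("Registration Date-", filingDate);

	const business: IBusiness = {};
	business.title = title;
	business.filingDate = filingDate;

	console.log("business", business);
}

Results from looping through Hawaii businesses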
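If you do expand this, it may help to tidy the scraped strings before storing them, since cheerio’s `.text()` returns the raw text with whatever whitespace is in the markup, and returns an empty string when a selector matches nothing. The sketch below is one way to handle that. `cleanText`, `toBusiness`, and `ScrapedBusiness` are hypothetical names, and treating an empty title as “no business at this id” is an assumption I haven’t verified against the site.

```typescript
// Hypothetical shape for a cleaned-up result.
interface ScrapedBusiness {
	title: string;
	filingDate: string;
}

// Collapse runs of whitespace (including newlines and tabs) and trim the ends.
function cleanText(raw: string): string {
	return raw.replace(/\s+/g, " ").trim();
}

// Assumption: an empty title means the selector matched nothing,
// so there is no business to record for this id.
function toBusiness(rawTitle: string, rawFilingDate: string): ScrapedBusiness | null {
	const title = cleanText(rawTitle);
	if (title === "") {
		return null;
	}
	return { title, filingDate: cleanText(rawFilingDate) };
}

// Illustrative inputs with messy whitespace, not captured from the site.
console.log(toBusiness("  3KAI   LLC\n", "\tJune 28, 2020 "));
console.log(toBusiness("  \n", "")); // null
```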

The CSS selectors here are a little advanced. I’d recommend reading more on CSS selectors from W3 here. I also have a post specifically on some of the advanced CSS selectors here.

And that’s it!

Looking for business leads?

Using the techniques talked about here at javascriptwebscrapingguy.com, we’ve been able to launch a way to access awesome web data. Learn more at Cobalt Intelligence!
