Jordan Handles Rate Limiting

Demo code here

No direct scraping in this post, although it is tangentially related. Time is often an important aspect of web scraping; there is a lot of data to sift through. This post talks about ways to handle that sifting, particularly when you run into rate limiting.

Rate limiting is often imposed when you are using third-party APIs. It's a way for the API to protect itself from being overwhelmed. Going over the rate limit can result in the request being refused outright (ouch) or being queued. Either can have pretty rough consequences.

The scenario that I had was when sending out a large number of emails. Being rate limited was very scary to me because I didn’t want to lose track of who had already received the email and then duplicate it to that user later, spamming them.

Console.time


console.time is a very neat tool that took me way too long to discover. Whenever I searched for "how to see how long a javascript function takes", this was not at the top of the results.

It's very easy to use: call console.time('timerName') to start a timer and console.timeEnd('timerName') to stop it and log the elapsed time. The label passed to both calls must match.
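The snippets below rely on a `timeout` helper that the post never shows. A common implementation (an assumption on my part, not the author's exact code) wraps `setTimeout` in a promise:

```typescript
// Promise-based sleep: resolves after the given number of milliseconds.
function timeout(ms: number): Promise<void> {
	return new Promise(resolve => setTimeout(resolve, ms));
}
```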

Here’s an example (bad, don’t do this):

(async () => {
	console.time('TimeTest');
	consoleTimeTest(); // not awaited -- this is the bug
	console.timeEnd('TimeTest');
})();

async function consoleTimeTest() {
	const smallArray = new Array(100);

	// 100 iterations with a 500ms pause each
	for (let item of smallArray) {
		await timeout(500);
	}
}

Running this function will result in the following:

TimeTest result.

.669ms! That's fast. Because…we didn't wait for the promise to resolve; console.timeEnd ran as soon as the call returned its pending promise. Let's try again with an await.

(async () =>  {
	console.time('TimeTest');
	await consoleTimeTest();
	console.timeEnd('TimeTest');
})();

async function consoleTimeTest() {
	const smallArray = new Array(100);

	for (let item of smallArray) {
		await timeout(500);
	}
}

And the result:

timetest result with await

~50 seconds; exactly what I would expect: 100 array items × 500ms pause per iteration = 50,000ms.

Now with some rate limiting


Alright, now let's assume that we have something that limits us to five emails sent per second. We want to take advantage of the asynchronous I/O nature of Node but don't want to overwhelm the server.

(async () => {
	// Rate limiting: five requests, then a one-second pause
	const bigArray = new Array(50000);

	console.time('Start');
	console.time('Five check');
	for (let i = 0; i < bigArray.length; i++) {
		console.log(new Date(), 'Doing something with this ***', i);

		// Fire and forget -- not awaited, so completion happens asynchronously
		somethingThatTakesTime(i);

		// After every fifth request (i = 4, 9, 14, ...), pause for a second
		if (i % 5 === 4) {
			await timeout(1000);
			console.timeEnd('Five check');
			console.time('Five check');
		}
	}
	console.timeEnd('Start');
})();

I landed on something like the above. It loops quickly, firing off five requests to a task that takes a random amount of time, and then pauses. The somethingThatTakesTime function looks like this:

function somethingThatTakesTime(index: number) {
	// Simulate a request that completes after a random delay of up to 2.5 seconds
	setTimeout(() => {
		console.log(new Date(), 'Completed', index);
	}, Math.floor(Math.random() * 2500));
}

Look how it works in the gif below. You can see the items get queued in order but the completion of the items happens asynchronously. You can also see the time check each time we send five requests. Sending the requests takes only between 1 and 250ms, but then it pauses for 1000ms in order to comply with the rate limit.
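One variation worth sketching (my own refinement, not from the post): instead of a flat one-second pause after every five sends, measure how long the batch took and only sleep for the remainder of the second, so slow send calls don't add on top of the pause. The `rateLimited` name and signature here are hypothetical:

```typescript
// Promise-based sleep helper, same as used throughout the post.
function timeout(ms: number): Promise<void> {
	return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: fire off `perSecond` items, then sleep only for
// whatever is left of the current one-second window before the next batch.
async function rateLimited<T>(items: T[], perSecond: number, task: (item: T) => void): Promise<void> {
	for (let i = 0; i < items.length; i += perSecond) {
		const batchStart = Date.now();
		items.slice(i, i + perSecond).forEach(task);
		const elapsed = Date.now() - batchStart;
		// Skip the final sleep -- no need to pause after the last batch
		if (i + perSecond < items.length && elapsed < 1000) {
			await timeout(1000 - elapsed);
		}
	}
}
```

With this shape, sending five per second still paces the whole run at roughly one batch per second, but the time spent queuing the requests counts toward the pause instead of being added to it.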

rate limiting pausing after every five requests

And that’s it. It feels very satisfying to be able to handle things as efficiently as possible and this was no exception.

Looking for business leads?

Using the techniques talked about here at javascriptwebscrapingguy.com, we’ve been able to launch a way to access awesome web data. Learn more at Cobalt Intelligence!
