I’m going to summarize my goal with this scrape. I went over this in my last post, but I’m going to try to do a better job here. When I was looking for private label products, my main goal was to find one or more good search keywords that I wanted to rank highly for.
For example: https://www.amazon.com/s?k=pasta+makers. The overall sales volume on this page was good. The price point was pretty good. There was competition but I could differentiate enough based on price that I felt like I had a chance.
So that is the goal with this scrape: find some good search keywords. That makes things a lot more complicated, though. I can’t just search through a bunch of individual products. I have to find products that may be eligible, then somehow craft what the search keywords would be, and then check that search keyword to see if it matches the criteria outlined above.
Main problem so far – Crafting the search query
It pretty much is coming down to the search query. I’m just not able to (yet) algorithmically create a good search query.
This is what I am doing now to create it. I find a product that I think could be a good seed for a search query, remove the brand name, and cut off anything after a ‘,’ or ‘-‘. I then take the first four words.
// Remove the brand, then keep only the text before the first comma or dash.
// (Chaining .split(',').split('-') fails because split() returns an array.)
let searchTerm = title.replace(brand, '').split(/[,-]/)[0];
// Get the first four words
searchTerm = searchTerm.trim().split(/\s+/).slice(0, 4).join(' ');
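The steps described above can be wrapped in a small function so they are easy to try on individual titles. This is only a sketch: `buildSearchTerm` is a name I made up for illustration, and the title and brand in the example are invented, not real scrape data.

```javascript
// Hypothetical helper: builds a search term from a product title and brand.
function buildSearchTerm(title, brand) {
  // Remove the brand name (first occurrence only).
  const withoutBrand = title.replace(brand, '');
  // Keep only the text before the first comma or dash.
  const head = withoutBrand.split(/[,-]/)[0];
  // Take the first four whitespace-separated words.
  return head.trim().split(/\s+/).slice(0, 4).join(' ');
}

// Made-up title and brand, for illustration only.
const exampleTerm = buildSearchTerm(
  'Acme Pasta Maker Machine, Stainless Steel - 7 Thickness Settings',
  'Acme'
);
console.log(exampleTerm); // "Pasta Maker Machine"
```

Trimming before splitting on whitespace keeps the leading space left behind by the removed brand from counting as one of the four words.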
What are some better ways to do this?
The results so far
They’re garbage. I’m just not getting good results from the scrape. On my last scrape, these were my five results over 1,000 iterations (though I think it failed before it hit the full 1,000 iterations):
[
  'https://www.amazon.com/s?k=',
  'https://www.amazon.com/s?k= ',
  'https://www.amazon.com/s?k= BTH',
  'https://www.amazon.com/s?k= SoundAsleep CloudNine Series',
  'https://www.amazon.com/s?k= 233TC Essential Cotton'
]
Value to be had from that list? 0. The first two can be filtered out easily. If the search string is empty, let’s not save it. But the other three are just bad. There is not even really an inspiration to be had.
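Filtering out the empty ones could look something like this. It is a minimal sketch, not the code from the scrape: `toSearchUrls` and `SEARCH_BASE` are names I made up, and encoding spaces as `+` is an assumption based on the format of the pasta makers URL above.

```javascript
// Base of the search URL, as it appears in the results above.
const SEARCH_BASE = 'https://www.amazon.com/s?k=';

// Hypothetical helper: drops empty terms and builds a URL for the rest.
function toSearchUrls(terms) {
  return terms
    .map((term) => term.trim())          // strip the stray leading space
    .filter((term) => term.length > 0)   // don't save empty search strings
    .map((term) => SEARCH_BASE + encodeURIComponent(term).replace(/%20/g, '+'));
}

const urls = toSearchUrls(['', ' ', ' BTH', ' SoundAsleep CloudNine Series']);
console.log(urls);
// [ 'https://www.amazon.com/s?k=BTH',
//   'https://www.amazon.com/s?k=SoundAsleep+CloudNine+Series' ]
```

This only removes the empty entries. It does nothing to make the remaining search terms any better.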
So…I may put this scrape on hold for now. If someone wants the code, it will be out there, but I’m out of ideas right now for how to improve it.