Project 7: Scraping the Amazon Website for SQL Book Data

The goal of this project was to scrape data on SQL books from https://www.amazon.com/. Each record has four fields, which form the columns of the dataset: Book Name, Published Date, Rating (out of 5), and Global Ratings. The data was scraped from result pages 1 through 20, producing a table of 263 rows by 4 columns.

I began by exploring the website with the browser's "Inspect" tool, examining the page elements to see which HTML tags hold each of the required fields.
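Once the tags are known, pulling the four fields out of a page with Beautiful Soup might look roughly like the sketch below. The tag names, class names, and sample values here are made-up placeholders, not Amazon's real markup (which this write-up does not record and which changes over time); the helper names `text_or_none` and `parse_results` are also only illustrative.

```python
from bs4 import BeautifulSoup

# Placeholder markup: tag/class names and values are invented for illustration.
SAMPLE_HTML = """
<div class="result">
  <h2 class="title">Example SQL Book A</h2>
  <span class="pub-date">Jan 1, 2020</span>
  <span class="stars">4.5 out of 5 stars</span>
  <span class="rating-count">1,234</span>
</div>
<div class="result">
  <h2 class="title">Example SQL Book B</h2>
  <span class="stars">4.0 out of 5 stars</span>
</div>
"""


def text_or_none(parent, name, css_class):
    """Return the stripped text of the first matching child tag, or None."""
    tag = parent.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag else None


def parse_results(html):
    """Turn one page of (hypothetical) result markup into a list of row dicts."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="result"):
        rows.append({
            "Book Name": text_or_none(item, "h2", "title"),
            "Published Date": text_or_none(item, "span", "pub-date"),
            "Rating": text_or_none(item, "span", "stars"),
            "Global Ratings": text_or_none(item, "span", "rating-count"),
        })
    return rows


rows = parse_results(SAMPLE_HTML)
```

Returning None for a missing tag keeps one row per book even when a listing lacks a field, so the gaps can be handled later during cleaning.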

I chose the Python modules Beautiful Soup and Requests for this project because they let me download and parse the content I needed quickly. When sending requests to a target server, I always pace them: I delay each request for a certain amount of time, then vary the request pattern by pausing for 10-20 seconds before resuming, using the sleep and randint functions. This lets me scrape data iteratively across many pages without being blocked by the server.
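The pacing idea can be sketched as follows. This is a minimal illustration and not the exact script used in the project: the function name `fetch_pages` and its parameters are hypothetical, and the `get` and `pause` arguments exist only so the pacing logic can be tried without touching the network or actually waiting.

```python
import time
from random import randint


def fetch_pages(urls, get=None, pause=time.sleep, min_wait=10, max_wait=20):
    """Fetch each URL in turn, pausing a random number of seconds in between.

    `get` defaults to requests.get; `pause` defaults to time.sleep.
    Both can be swapped out for dry runs.
    """
    if get is None:
        import requests  # imported lazily so a dry run does not need it
        get = requests.get

    responses = []
    for i, url in enumerate(urls):
        if i > 0:
            # Random whole-second delay (inclusive bounds) so the requests
            # do not arrive at a fixed, easily detected interval.
            pause(randint(min_wait, max_wait))
        responses.append(get(url))
    return responses
```

A dry run such as `fetch_pages(["a", "b"], get=str.upper, pause=print)` prints the delay that would have been used and returns `["A", "B"]` without making any request.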

The project achieved its goal and produced cleaned data values, even though getting past Amazon's security features is difficult in itself. Using the techniques described above, I was able to avoid being blocked and retrieve all of the required data.