Project 5: Scraping Women's Sunglass Data from https://www.sunglasshut.com

The purpose of this project was to scrape sunglass data from https://www.sunglasshut.com/. Specifically, it collects women’s sunglass information in five data columns: Sunglass Name, Sunglass Model, Sunglass Price, Sunglass Size and Sunglass Color. Listing pages 1 to 20 were requested, of which 18 returned data, resulting in a 365 x 5 matrix (365 sunglasses by 5 columns). I first investigated the website with the browser's “Inspect” tool to see which HTML tags and attributes hold the required data.
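As a small illustration of the table described above, one scraped row can be modeled as a record with five fields. This is only a sketch: the class name `Sunglass`, the field names and the helper `shape` are assumptions chosen to mirror the five columns, and the values are placeholders rather than real scraped data.

```python
from typing import NamedTuple, Optional


class Sunglass(NamedTuple):
    """One scraped row; field names are illustrative and mirror the five columns."""
    name: Optional[str]
    model: Optional[str]
    price: Optional[str]
    size: Optional[str]
    color: Optional[str]


def shape(rows):
    """Return (number of rows, number of columns) of the scraped table."""
    return (len(rows), len(Sunglass._fields))


# Placeholder values only, not real scraped data.
rows = [
    Sunglass("example name", "example model", "example price",
             "example size", "example color"),
]
print(shape(rows))  # (1, 5)
```

With all scraped rows collected in such a list, `shape(rows)` would report the 365 x 5 dimensions stated above.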

I chose the Beautiful Soup and Requests libraries for Python to scrape this website, as no hidden JSON API was visible in the browser's network monitor tab. Each product's details sit on a separate page that the listing links to through href (hypertext reference) attributes, so I split the Python code into three modules. The first parses the href values that lead to the product pages and passes them to the second, which parses the sunglass data from each product page. The third is the main function, which sets the page limit and calls the other two. I was able to retrieve all the required data by using the sleep function to pause for 3-14 seconds between requests. The pause keeps the scraper from overloading the server and varies the request pattern from one iteration to the next.
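The three-part layout described above could look roughly like the sketch below. It is not the original project code. The CSS selectors in `LINK_SELECTOR` and `FIELD_SELECTORS` are hypothetical placeholders (the site's real markup has to be read off with "Inspect"), the function names are invented for illustration, and the sketch assumes that the 3-14 second pause is drawn at random for each request.

```python
import random
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.sunglasshut.com"

# Hypothetical selectors: placeholders, not the site's real markup.
LINK_SELECTOR = "a.product-link"
FIELD_SELECTORS = {
    "name": ".product-name",
    "model": ".product-model",
    "price": ".product-price",
    "size": ".product-size",
    "color": ".product-color",
}


def fetch_page(url):
    """Download one page and return its HTML text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def parse_product_links(listing_html):
    """Module 1: collect absolute product URLs from the hrefs on a listing page."""
    soup = BeautifulSoup(listing_html, "html.parser")
    return [
        urljoin(BASE_URL, a["href"])
        for a in soup.select(LINK_SELECTOR)
        if a.get("href")
    ]


def parse_product(product_html):
    """Module 2: extract the five fields from a product page (None if missing)."""
    soup = BeautifulSoup(product_html, "html.parser")
    row = {}
    for field, selector in FIELD_SELECTORS.items():
        node = soup.select_one(selector)
        row[field] = node.get_text(strip=True) if node else None
    return row


def scrape(listing_urls, fetch=fetch_page, pause=time.sleep):
    """Module 3: walk the listing pages, follow each product link, pause in between."""
    rows = []
    for listing_url in listing_urls:
        for product_url in parse_product_links(fetch(listing_url)):
            rows.append(parse_product(fetch(product_url)))
            pause(random.uniform(3, 14))  # wait 3-14 s before the next request
    return rows
```

The caller supplies the listing-page URLs (one per page number up to the chosen limit), since the URL pattern is not given here. Passing `fetch` and `pause` as parameters is a design choice of this sketch that lets the parsing logic be tried on saved HTML without touching the network.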

To avoid being blocked from the website, I also replaced the default User-Agent of the Requests library with a custom one. The project achieved its goal and produced the transformed data values, even though getting past the website's security feature is difficult in itself. Using the techniques described above, I was able to get past it and retrieve all the required instances.
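A minimal sketch of the User-Agent change with Requests follows. The helper name `make_session` is invented for illustration, and the User-Agent string is only an assumed example of a browser-like value, not necessarily the one used in this project.

```python
import requests

# Assumed example value: any current browser User-Agent string could be used.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def make_session(user_agent=BROWSER_USER_AGENT):
    """Return a requests.Session whose requests carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()
# session.get("https://www.sunglasshut.com/", timeout=30) would now send the
# browser-like header instead of the default "python-requests/<version>".
```

Using a session keeps the header in one place, so every later `session.get(...)` call sends it without repeating the `headers` argument.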