EXTRACTING DATA FROM WEBSITES ON THE EXAMPLE OF DEVELOPING A PLUGIN FOR WEB SCRAPING

Authors

Y. Pavlenko, H. Pelekch

Keywords:

web scraping, extracting data, web site

Abstract

Searching the Internet involves more than reading individual pieces of information to gain new knowledge; it also involves comparing, analyzing, and summarizing data found across different websites and pages. One way to do this is to manually copy the required information into one's own files and format it as needed. This process is monotonous and inefficient because it takes a lot of time.

Web scraping is the automated process of extracting large amounts of data from websites and converting it into structured form. Programs that carry out this process are called web scrapers. They can retrieve the required HTML content, work with JavaScript, filter the received data, and output it in a ready-to-use form such as a database, an Excel spreadsheet, a CSV file, or a separate API.
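
The steps named above (take HTML content, filter it, output structured data) can be illustrated with a minimal Python sketch that uses only the standard library. It is not the plugin's implementation: the sample markup, the class names `product`, `name`, and `price`, and the helper `html_to_csv` are assumptions made purely for illustration.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Lamp</span><span class="price">19.99</span></li>
  <li class="product"><span class="name">Desk</span><span class="price">120.00</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of name/price spans inside each product item."""

    FIELDS = ("name", "price")  # assumed class names, one CSV column each

    def __init__(self):
        super().__init__()
        self.rows = []       # one dict per product item
        self._field = None   # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "product" in classes:
            self.rows.append({})
        elif tag == "span" and self.rows:
            self._field = next((c for c in classes if c in self.FIELDS), None)

    def handle_data(self, data):
        if self._field and data.strip():
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def html_to_csv(html):
    """Filter the HTML down to the product fields and return them as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=ProductParser.FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(html_to_csv(SAMPLE_HTML))
```

Pages that build their content with JavaScript would need a rendering step before parsing, which this sketch leaves out.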

This paper presents one possible implementation of a web scraping plugin for Chromium-based browsers. The plugin has a client side and a server side. Implementing the client side involves designing and building the interface of the plugin's pop-up window and highlighting web page elements on hover. The server side analyzes requests, generates links, and performs element selection.
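
The three server-side tasks (analyzing requests, generating links, selecting elements) might look roughly like the following Python sketch. The abstract does not specify the request format or the selection method, so everything here is an assumption: the JSON fields `base_url`, `pages`, `tag`, and `css_class`, the `?page=N` pagination scheme, and the helper names are all hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlencode, urlparse


def parse_request(body):
    """Analyze a JSON request from the (hypothetical) plugin client."""
    req = json.loads(body)
    if urlparse(req["base_url"]).scheme not in ("http", "https"):
        raise ValueError("base_url must be an http(s) URL")
    if not req["tag"] or req["pages"] < 1:
        raise ValueError("a tag and a positive page count are required")
    return req


def generate_links(req):
    """Build one URL per page, assuming a '?page=N' pagination scheme."""
    return [
        f"{req['base_url']}?{urlencode({'page': n})}"
        for n in range(1, req["pages"] + 1)
    ]


class ElementSelector(HTMLParser):
    """Collects the text of every element matching a tag and a CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.matches = []
        self._depth = 0  # greater than 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self.tag:
                self._depth += 1  # nested element with the same tag name
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self._depth = 1
                self.matches.append("")

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.matches[-1] += data


def select_elements(html, req):
    """Return the stripped text of all elements the request asks for."""
    selector = ElementSelector(req["tag"], req["css_class"])
    selector.feed(html)
    return [text.strip() for text in selector.matches]
```

In this sketch the client would send the tag and class of the element the user picked on hover, and the server would apply the same selection to every generated link. Downloading the pages is left out to keep the example self-contained.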

Published

2023-05-22

How to Cite

[1]
Pavlenko, Y. and Pelekch, H. 2023. EXTRACTING DATA FROM WEBSITES ON THE EXAMPLE OF DEVELOPING A PLUGIN FOR WEB SCRAPING. Applied Problems of Computer Science, Security and Mathematics. 1 (May 2023), 76–80.