EXTRACTING DATA FROM WEBSITES ON THE EXAMPLE OF DEVELOPING A PLUGIN FOR WEB SCRAPING
Keywords:
web scraping, extracting data, web site
Abstract
Searching the Internet involves more than reading a piece of information to gain new knowledge; it also involves comparing, analyzing and summarizing data found on different websites and pages. This can be done by manually copying the required information into one's own files and formatting it as needed, but such a process is monotonous and inefficient because it takes a lot of time.
Web scraping is an automated process of extracting large amounts of data from websites and converting them into a structured form. Programs that carry out this process are called web scrapers; they can retrieve the required HTML content, work with JavaScript, filter the received data and output it as a ready-to-use database, Excel spreadsheet, CSV file, separate API, and so on.
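The conversion from HTML content to structured output can be illustrated with a minimal sketch. It assumes the page content is already available as a string and that the data of interest sits in an HTML table; the names `TableRowExtractor`, `html_table_to_csv` and `SAMPLE_HTML` are hypothetical and are not taken from the plugin described here. Only the Python standard library is used.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Extracts table rows from an HTML string and returns them as CSV text."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical sample input standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Laptop</td><td>1,200</td></tr>
  <tr><td>Phone</td><td>650</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

A real scraper would also need to download the page and, for pages rendered by JavaScript, execute the scripts first; both steps are left out of this sketch.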
This work presents one possible implementation of a web scraping plugin for Chromium-based browsers. The plugin has a client side and a server side. Implementing the client side involves designing and building the interface of the plugin's pop-up window and highlighting web page elements on hover. The server side of the plugin analyzes requests, generates links and performs element selection.
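The server-side steps named above (analyzing a request, selecting elements, generating links) might look roughly like the following sketch. It is an illustration under stated assumptions, not the plugin's actual code: the request format (a dictionary with `page_url` and an optional `css_class`), the restriction to `<a>` elements, and the names `LinkCollector` and `handle_request` are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Selects <a> elements, optionally by class, and builds absolute links."""

    def __init__(self, base_url, css_class=None):
        super().__init__()
        self.base_url = base_url    # URL of the page the HTML came from
        self.css_class = css_class  # None means "accept every link"
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # Attribute values may be None (valueless attribute), hence "or".
        classes = (attributes.get("class") or "").split()
        if href and (self.css_class is None or self.css_class in classes):
            # Relative hrefs are resolved against the page URL.
            self.links.append(urljoin(self.base_url, href))


def handle_request(request: dict, html: str) -> list:
    """Reads the (assumed) request fields and returns the generated links."""
    collector = LinkCollector(request["page_url"], request.get("css_class"))
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<a class="item" href="/p/1">A</a><a href="/about">About</a>'
    request = {"page_url": "https://example.com/catalog/", "css_class": "item"}
    print(handle_request(request, page))
```

Keeping the selection rule in the request rather than in the code means the client side, where the user picks elements by hovering, can decide what is collected without changing the server side.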
License
Copyright (c) 2023 Ю.С. Павленко, Г.В. Пелех

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.