
# Onion Crawler

Scrape and Store Dark Web Sites | Crawler for Dark Web | Search Engine Oriented

[![DOCS](https://img.shields.io/badge/Documentation-see%20docs-green?style=flat-square&logo=appveyor)](https://documenter.getpostman.com/view/9118595/TVCiUS16) [![UI](https://img.shields.io/badge/User%20Interface-Link%20to%20UI-orange?style=flat-square&logo=appveyor)](INSERT_UI_LINK_HERE)

## Functionalities

- [x] Fetch onion links
- [x] Recursive fetching
- [x] Store scraped data
- [x] User-added URLs
- [x] URL blacklisting

## Increasing the crawler reach

```txt
Increase the crawl depth
Add more starter links
Create more spiders that focus on directories
```

## Spiders

- `DRL`: Link Dir Onion, a large directory of URLs
- `UADD`: User Added, URLs added by users
  - Links are currently appended to _user_added_urls.txt_ under `spider_data`
  - Crawled in exactly the same way as `DRL`

## Instructions to run

* Prerequisites:
  - Python 3
  - Tor
* Install the dependencies:

```bash
pip install -r requirements.txt
```

* Run the crawler:

```bash
# start Tor with its SOCKS5 proxy on port 9150
# expose an HTTP proxy on port 8181 that forwards to Tor's SOCKS5 proxy
pproxy -l http://:8181 -r socks5://127.0.0.1:9150 -vv

# in a separate terminal, run a spider
scrapy crawl name_of_spider  # e.g. DRL
```

## Sample JSON

- [DRL](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data_DRL_2020-07-02T00-58-53.json)
- [UADD](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data_UADD_2020-07-02T08-06-50.json)
- [TF-IDF Type](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data.json)

## Instructional Video

- [YouTube URL](https://www.youtube.com/watch?v=AGe3Mh91pNA)

## Bigger Datasets

- [DarkNet Dataset](https://1uc1f3r616.github.io/Dark-Net-Websites-Dataset/)

## Contributors

1UC1F3R616 (Kush Choudhary)

Made with :heart: by DSC VIT