# Onion Crawler
Scrape and Store Dark Web Sites | Crawler for Dark Web | Search Engine Oriented
## Functionalities
- [x] fetch onion links
- [x] recursive fetching
- [x] store scraped data
- [x] user-added URLs
- [x] URL blacklisting
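The link-fetching and blacklisting steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's code. It assumes links come from plain `href` attributes, treats any hostname ending in `.onion` as an onion link, and models the blacklist as a set of hostnames. `extract_onion_links` and its `blacklist` parameter are hypothetical names, not from the repository.

```python
import re
from urllib.parse import urljoin, urlparse

# Simplified href extraction; a real spider would use Scrapy selectors instead.
HREF = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_onion_links(html, base_url, blacklist):
    """Return unique absolute .onion URLs from html whose host is not blacklisted."""
    links, seen = [], set()
    for raw in HREF.findall(html):
        url = urljoin(base_url, raw)            # resolve relative links
        host = urlparse(url).hostname or ""     # urlparse lower-cases the hostname
        if not host.endswith(".onion"):
            continue                            # skip clearnet and non-HTTP links
        if host in blacklist:
            continue                            # skip blacklisted hosts
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

In a Scrapy spider the same filter would normally be applied to the output of a selector such as `response.css("a::attr(href)").getall()` rather than a regex.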
## Increasing the crawler reach
- Increase the crawl depth
- Add more starter links
- Create more spiders with a special focus on directories
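Crawl depth counts hops from a starter link: starter links are depth 0, links found on them are depth 1, and so on. The breadth-first sketch below shows how a depth cap bounds recursive fetching. `crawl_order` and its `get_links` callback are hypothetical stand-ins for downloading and parsing a page. In a Scrapy project the corresponding knob is the `DEPTH_LIMIT` setting (where 0 means no limit), and starter links normally live in a spider's `start_urls`.

```python
from collections import deque


def crawl_order(start_urls, get_links, max_depth):
    """Breadth-first crawl order as (url, depth) pairs, never exceeding max_depth.

    Here max_depth=0 visits only the starter links.
    """
    starts = list(dict.fromkeys(start_urls))   # de-duplicate, keep order
    seen = set(starts)
    queue = deque((url, 0) for url in starts)  # starter links are depth 0
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth >= max_depth:
            continue                           # at the cap: do not follow links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

Raising `max_depth` and adding starter links both enlarge the set of reachable pages, while the `seen` set keeps pages linked from several places from being fetched twice.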
## Spiders
- `DRL`: Link Dir Onion
  - A large directory of URLs
- `UADD`: User Added
  - URLs added by users
  - Currently appended to _user_added_urls.txt_ under `spider_data`
  - Crawled in exactly the same way as DRL
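The UADD workflow amounts to appending lines to a text file and later reading them back as start URLs. A minimal sketch, assuming one URL per line and the relative path `spider_data/user_added_urls.txt`; the helper names `append_user_url` and `load_user_urls` are hypothetical.

```python
from pathlib import Path

# Path taken from the README; assumed to be relative to the project root.
USER_URLS_FILE = "spider_data/user_added_urls.txt"


def append_user_url(url, path=USER_URLS_FILE):
    """Append one user-submitted URL as a new line, creating the file if needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("a", encoding="utf-8") as f:
        f.write(url.strip() + "\n")


def load_user_urls(path=USER_URLS_FILE):
    """Return the stored URLs in file order, skipping blank lines and duplicates."""
    p = Path(path)
    if not p.exists():
        return []
    urls, seen = [], set()
    for line in p.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

A UADD-style spider could then build its start URLs from `load_user_urls()`. De-duplicating on read means a URL submitted twice is only crawled once.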
## Instructions to run
* Prerequisites:
  - Python 3
  - Tor
* Install the dependencies:
```bash
pip install -r requirements.txt
```
* Run the crawler:
```bash
# start tor on port 9150
pproxy -l http://:8181 -r socks5:// -vv
# in a second terminal (pproxy keeps running in the foreground)
scrapy crawl name_of_spider  # e.g. DRL or UADD
```
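The `pproxy` step is presumably there because Scrapy's built-in proxy support handles HTTP(S) proxies only, while Tor exposes a SOCKS5 proxy; `pproxy` listens for HTTP on port 8181 and forwards to Tor. One way to send every request through that listener is a small downloader middleware that sets `request.meta["proxy"]`, which Scrapy's `HttpProxyMiddleware` honors. This is a sketch under assumptions: the class name, the `http://127.0.0.1:8181` address and the `myproject.middlewares` module path are hypothetical.

```python
class TorHttpProxyMiddleware:
    """Hypothetical downloader middleware routing requests through pproxy's HTTP listener."""

    def __init__(self, proxy_url="http://127.0.0.1:8181"):
        # Assumed address of the `pproxy -l http://:8181` listener.
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # HttpProxyMiddleware reads request.meta["proxy"]; keep any proxy already set.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # let Scrapy continue processing the request


# settings.py (module path is hypothetical). A priority below 750 makes this run
# before the built-in HttpProxyMiddleware in process_request.
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.TorHttpProxyMiddleware": 350,
# }
```

Exporting `http_proxy=http://127.0.0.1:8181` before `scrapy crawl` is a lighter alternative, since `HttpProxyMiddleware` also picks up proxy environment variables.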
## Sample JSON
- [DRL](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data_DRL_2020-07-02T00-58-53.json)
- [UADD](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data_UADD_2020-07-02T08-06-50.json)
- [TF-IDF Type](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data.json)
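The TF-IDF-type sample points at the search-engine use in the tagline: weighting the terms in scraped page text. As an illustration only, the sketch below loads a Scrapy JSON feed (a JSON array of items) and scores terms with tf * log(N / df). The item field name `text` and the helpers `load_texts` and `tf_idf` are assumptions, not the schema of the linked files.

```python
import json
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z0-9]+")


def load_texts(path, field="text"):
    """Load a Scrapy JSON feed (a JSON array of items) and return one text per item."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    return [str(item.get(field, "")) for item in items]  # missing field -> ""


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [TOKEN.findall(doc.lower()) for doc in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))          # document frequency: count each term once per doc
    n = len(docs)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1        # avoid dividing by zero for empty docs
        scores.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return scores
```

Terms that appear on every page get weight 0, so the highest-scoring terms are the ones that distinguish one site from the rest.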
## Instructional Video
- [YouTube URL](https://www.youtube.com/watch?v=AGe3Mh91pNA)
## Bigger Datasets
- [DarkNet Dataset](https://1uc1f3r616.github.io/Dark-Net-Websites-Dataset/)
## Contributors
1UC1F3R616 (Kush Choudhary)
Made with :heart: by DSC VIT