Onion Crawler
Scrape and Store Dark Web Sites | Crawler for Dark Web | Search Engine Oriented
</p>
[![DOCS](https://img.shields.io/badge/Documentation-see%20docs-green?style=flat-square&logo=appveyor)](https://documenter.getpostman.com/view/9118595/TVCiUS16)
[![UI ](https://img.shields.io/badge/User%20Interface-Link%20to%20UI-orange?style=flat-square&logo=appveyor)](INSERT_UI_LINK_HERE)
## Functionalities
- [x] fetch onion links
- [x] recursive fetching
- [x] store scraped data
- [x] user-added URLs
- [x] URL blacklisting
</br>
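
As a rough illustration of the link-fetching and blacklisting features listed above, the following sketch pulls onion URLs out of a page and drops blacklisted hosts. It is not the project's actual implementation: the function names, the regular expression, and the host-based blacklist format are all assumptions made for this example, and it ignores subdomains for simplicity.

```python
import re
from urllib.parse import urlsplit

# Assumption: v2 onion hosts have 16 base32 characters and v3 hosts have 56.
ONION_URL = re.compile(
    r"https?://(?:[a-z2-7]{16}|[a-z2-7]{56})\.onion(?:/[^\s\"'<>]*)?",
    re.IGNORECASE,
)


def extract_onion_links(html):
    """Return the unique onion URLs found in html, in first-seen order."""
    links = []
    for url in ONION_URL.findall(html):
        if url not in links:
            links.append(url)
    return links


def filter_blacklisted(urls, blacklisted_hosts):
    """Drop every URL whose host appears in blacklisted_hosts (hypothetical format)."""
    blocked = {host.lower() for host in blacklisted_hosts}
    return [url for url in urls if urlsplit(url).hostname not in blocked]
```

Recursive fetching then amounts to feeding each surviving link back into the crawler as a new request.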
## Increasing the crawler reach
```txt
Increase Crawl Depth
Add More starter links
Create more spiders with special focus on Directories
```
</br>
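
To see why crawl depth and starter links both matter, here is a small sketch of a depth-limited breadth-first walk over a made-up link graph. The graph, the function name, and the parameters are hypothetical; in a Scrapy project the depth cap itself is normally set through the `DEPTH_LIMIT` setting rather than written by hand.

```python
from collections import deque


def reachable(link_graph, start_urls, max_depth):
    """Return every URL within max_depth hops of any starter link."""
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth cap
        for nxt in link_graph.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


# Toy graph: "dir" plays the role of a directory page, "other" a second starter link.
toy_graph = {"dir": ["a", "b"], "a": ["c"], "c": ["d"], "other": ["e"]}
```

With this toy graph, raising `max_depth` from 1 to 2 adds page `c`, while adding `other` as a starter link adds page `e`, which no depth increase from `dir` would ever reach.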
## Spiders
- `DRL` Link Dir Onion
  - A large directory of URLs
- `UADD` User Added
  - URLs added by the user
  - Links are currently appended to _user_added_urls.txt_ under spider_data
  - Crawled in the same way as DRL
</br>
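
Since user-added links live in a plain text file, adding one can be sketched as an append that skips duplicates. This is only an illustration under assumptions: the helper name `add_user_url` is hypothetical, the file is assumed to hold one URL per line, and the path to the file under spider_data must be supplied by the caller.

```python
from pathlib import Path


def add_user_url(url, path):
    """Append url to the file at path unless already listed; return True if written."""
    path = Path(path)
    url = url.strip()
    existing = set()
    if path.exists():
        existing = {line.strip() for line in path.read_text().splitlines()}
    if not url or url in existing:
        return False  # nothing to add: empty input or duplicate
    with path.open("a") as fh:
        fh.write(url + "\n")
    return True
```

Skipping duplicates keeps the file small and avoids queuing the same site twice on the next crawl.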
## Instructions to run
* Prerequisites:
  - Python 3
  - Tor
* Install the dependencies:
```bash
pip install -r requirements.txt
```
* Run the crawler:
```bash
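# Start Tor first; its SOCKS proxy must be listening on port 9150
# Expose Tor as an HTTP proxy on port 8181 (keep this running in its own terminal)
pproxy -l http://:8181 -r socks5://127.0.0.1:9150 -vv
# In a second terminal, run a spider by name, e.g. DRL
scrapy crawl name_of_spider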
```
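
Before starting a crawl it can help to confirm that both proxies are actually listening. The check below is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical, and the ports are the ones used in the commands above (9150 for Tor, 8181 for pproxy).

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the run instructions; adjust them if your setup differs.
    for name, port in (("Tor SOCKS proxy", 9150), ("pproxy HTTP proxy", 8181)):
        state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A successful connection only shows that something is listening on the port, not that traffic is really routed through Tor.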
## Sample JSON
- [DRL](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data_DRL_2020-07-02T00-58-53.json)
- [UADD](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data_UADD_2020-07-02T08-06-50.json)
- [TF-IDF Type](https://github.com/1UC1F3R616/onion-crawler/blob/master/dark_web_scraping/scraped_data.json)
## Instructional Video
- [YouTube URL](https://www.youtube.com/watch?v=AGe3Mh91pNA)
## Bigger Datasets
- [DarkNet Dataset](https://1uc1f3r616.github.io/Dark-Net-Websites-Dataset/)
## Contributors
- 1UC1F3R616 (Kush Choudhary)

Made with :heart: by DSC VIT