Sorry for the confusing wording.
My end goal is to find a Clojure/Java library like Python’s Scrapy.
I think scraping and crawling are almost the same thing. The real difference is between a web indexer and a web crawler.
After a closer look at Nutch and Heritrix3, I found that both are mainly web indexers: they scrape web pages that are then used by a search engine. Of course they can be used as crawlers too, but they are not very good at that.
You can check the Nutch documentation: it feeds the scraped web pages to a search engine such as Apache Solr. I don’t need that. I want to save the data parsed from web pages to a database and do some data science with it.
Maybe I can use part of Nutch. I will take a closer look at how it works.