Arch is an open source extension of Apache Nutch (a popular, highly scalable general-purpose search engine) for intranet search. Not happy with your corporate search engine? That is not surprising; very few people are. To the best of our knowledge, no intranet search engine works as well as Google's global Web search does. There is a fundamental reason for this: the algorithms that Google and similar engines use on the global Web do not work nearly as well on intranets, because intranets lack the statistical data those algorithms rely on. Arch (finally!) solves this problem. It uses a novel method to deliver high-precision search results. Don't believe it? Blind test evaluation tools are included, so you can check for yourself.
- Document-level security. Users can find only the documents that they are authorized to see.
- Inexpensive index updates. Arch keeps indexes up to date without regularly recrawling the complete site.
- 24/7 availability. There is always a working index available, even if a crawl fails.
- Support for simultaneous indexing and search of multiple web sites, with the ability to search and administer each site separately if needed. Web sites can be added and removed dynamically.
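
To make the document-level security feature listed above more concrete, here is a minimal Python sketch of filtering search hits by group membership. It is not Arch's actual code (Arch builds on Nutch, which is written in Java). The names `Hit`, `allowed_groups`, and `visible_hits` are hypothetical, and so is the assumption that access rights are expressed as groups.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Hit:
    """One search result. The fields are illustrative, not Arch's schema."""
    url: str
    score: float
    allowed_groups: FrozenSet[str]  # assumed: groups permitted to read the document


def visible_hits(hits: Iterable[Hit], user_groups: Iterable[str]) -> List[Hit]:
    """Return only the hits the user may see, best score first.

    A hit is visible when the user belongs to at least one of the
    groups listed on the document.
    """
    groups = frozenset(user_groups)
    permitted = [hit for hit in hits if hit.allowed_groups & groups]
    return sorted(permitted, key=lambda hit: hit.score, reverse=True)


if __name__ == "__main__":
    results = [
        Hit("http://intranet/hr/salaries", 0.9, frozenset({"hr"})),
        Hit("http://intranet/handbook", 0.7, frozenset({"hr", "staff"})),
    ]
    for hit in visible_hits(results, ["staff"]):
        print(hit.url)
```

In this sketch the filter runs over the result list before anything is shown to the user, so unauthorized documents never appear in titles or snippets. Whether Arch filters at query time or in another way is not stated here.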