How does it work? In geek-speak, it's a multi-tier, fully distributed system. In plain language,
it runs like this:
We have a central server, which coordinates all the activity of the system and is the central data
store. It keeps track of which URLs have been crawled, and receives the data that comes back from
the crawler.
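For the curious, the server's bookkeeping can be pictured roughly like this. It is a minimal sketch, not the actual NewsToYou.com code: the class name `CrawlRegistry` and its methods are made up for illustration, and it assumes the stored data is simply page text keyed by URL.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: these names are illustrative, not taken from the real system.
class CrawlRegistry {
    // URL -> page data reported back by a crawler.
    // A concurrent map, on the assumption that several crawlers report at once.
    private final Map<String, String> pagesByUrl = new ConcurrentHashMap<>();

    /** Returns true if no crawler has reported this URL yet. */
    boolean needsCrawl(String url) {
        return !pagesByUrl.containsKey(url);
    }

    /** Stores data sent back by a crawler; returns false if the URL was already recorded. */
    boolean receive(String url, String data) {
        return pagesByUrl.putIfAbsent(url, data) == null;
    }

    /** Number of distinct URLs recorded so far. */
    int size() {
        return pagesByUrl.size();
    }
}
```

A crawler would ask `needsCrawl` before fetching a page and call `receive` afterwards, so the same URL is never stored twice.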
There is a "feed", which can crawl around the internet getting HTML pages, pull data out of XML
sources, or take it directly from news wires. At the moment, it's just crawling around, grabbing
pages that are new enough to be of interest. Periodically, it starts a new crawl against the
sources it's directed to. There can be as many "feeds" as desired, and soon we'll start
distributing the feed, so that other people can contribute content and search power to the system.
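In code terms, the two ideas in that paragraph (keep only pages that are "new enough", and start a fresh crawl every so often) might look something like the sketch below. Everything here is an assumption made for illustration: the `FeedSketch` and `Page` names, the use of a last-modified timestamp to judge freshness, and the fixed-rate schedule.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names and the freshness rule are assumptions, not the real feed.
class FeedSketch {
    /** A fetched page and the time it was last modified. */
    static final class Page {
        final String url;
        final Instant lastModified;

        Page(String url, Instant lastModified) {
            this.url = url;
            this.lastModified = lastModified;
        }
    }

    /** Keeps only the pages modified within maxAge of now. */
    static List<Page> freshPages(List<Page> fetched, Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<Page> fresh = new ArrayList<>();
        for (Page page : fetched) {
            if (!page.lastModified.isBefore(cutoff)) {
                fresh.add(page);
            }
        }
        return fresh;
    }

    /** Runs the given crawl immediately, then again every periodMinutes. */
    static ScheduledExecutorService startPeriodicCrawl(Runnable crawl, long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(crawl, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler; // the caller should call shutdown() when the feed stops
    }
}
```

Running several feeds would then just mean running several of these, each pointed at its own set of sources.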
There is a Java applet, which is the part of the system you see. It is the thing that comes up
in your browser, asks you to log in, gets the data for the pages the server has, then searches through
that data, using the power of your machine, to find the things you are looking for. This client goes
back to the server every so often, asking for any new data, which is then sent down for display to you!
Pretty simple, really.
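The client side of that can be sketched in a few lines. This is only an illustration under stated assumptions: `ClientSketch`, `merge`, and `search` are invented names, the data is assumed to arrive as URL-to-text pairs, and the search shown is a plain case-insensitive substring match rather than whatever the applet really does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: names and the matching rule are assumptions, not the real applet.
class ClientSketch {
    // URL -> page text already downloaded from the server.
    private final Map<String, String> localPages = new LinkedHashMap<>();

    /** Merges a batch of data sent down by the server; returns how many URLs were new. */
    int merge(Map<String, String> batch) {
        int added = 0;
        for (Map.Entry<String, String> entry : batch.entrySet()) {
            if (localPages.put(entry.getKey(), entry.getValue()) == null) {
                added++;
            }
        }
        return added;
    }

    /** Searches only the local copy of the data, so the work happens on your machine. */
    List<String> search(String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> matchingUrls = new ArrayList<>();
        for (Map.Entry<String, String> entry : localPages.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                matchingUrls.add(entry.getKey());
            }
        }
        return matchingUrls;
    }
}
```

Each time the client goes back to the server, it would pass whatever comes down to `merge`, and every search after that sees the new pages without another round trip.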
The whole system is written in Java, which allows us to run it on virtually anything, and you to run
the client in any Java-capable browser.
NewsToYou.com - making the web self-aware®
NewsToYou.com is Copyright 2003-2010, and is a trademark of Spinning Cogs, Inc.