Intelligent Agents for retrieving
Electronic Documents from the Internet

twinIsles.dev

Contents

1. Background: the Internet
2. Intelligent Agents
3. Agents in Use
4. Products currently available
5. Research
6. Conclusions and Predictions
7. Further Reading
8. References

1. Background: the Internet
The Internet is a global computer network consisting of machines belonging to governments, academic institutions, businesses of all sizes and private individuals. The Internet has experienced phenomenal growth since the development of the World Wide Web in 1991 and the release of Mosaic, the first widely used graphical web browser, in 1993 [A Little History of the World Wide Web]. It now houses a vast quantity of information; the search engine Google claims to index over 1.3 billion web pages.

The current state of technology enables anyone with a computer, modem and telephone line to effortlessly access this vast storehouse of knowledge. USENET news groups and free web hosting services allow that same user base to actively contribute to the 'net as easily as viewing it.

By making information readily available and facilitating communication among the masses, unrestricted by physical distance or political boundary, the Internet may be viewed as a great liberator. There is, however, a downside. The sheer quantity of material existing in cyberspace introduces the problem of information overload. Finding the precise documents needed among all the dross is akin to looking for the proverbial needle in a haystack. This problem has been described as "infoglut" by the Gartner Group.

Numerous search engines exist whose purpose is to provide and maintain searchable indexes of web pages. Whilst these are a powerful tool, considerable skill is required in framing queries which retrieve the most useful content and exclude that without value. Search engine output can be distorted by web designers biasing their pages to achieve higher listings in the search engines. To make matters worse, services such as GoTo permit web site owners to buy prominent positions in search engine results.

2. Intelligent Agents
One solution to the problem discussed above is provided by software known as intelligent agents. Millman defines an agent as a "self-sufficient piece of code that can make decisions without human intervention." Moreover, it has "the capability to 'learn', that is, it becomes more effective as it is used".

According to Hendler an effective agent must be:

  • Communicative; it must share with the user a body of knowledge and vocabulary relating to the topic of interest.
  • Capable; it must be able to take action on the user's behalf, e.g. entering registration details and credit card information in order to obtain the required documents.
  • Autonomous; it must be able to act on its own initiative. Ideally the degree of autonomy should be user configurable.
  • Adaptive; it must have the ability to learn from its experience.

Millman additionally suggests mobility as a desirable quality for an agent. This means the agent software is able to travel through the Internet, executing on a variety of hosts before returning with its findings.
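
To make these qualities more concrete, the following minimal sketch, written in Python purely for illustration, shows how a retrieval agent's interface might reflect them. The names (DocumentAgent, learn_from_feedback and so on) are entirely hypothetical and do not correspond to any product discussed in this article.

# A hypothetical skeleton of an agent exhibiting the qualities listed above.
from abc import ABC, abstractmethod

class DocumentAgent(ABC):
    """Skeleton of a document-retrieval agent with Hendler's four qualities."""

    def __init__(self, autonomy_level=1):
        # Autonomous: the degree of initiative is user configurable.
        self.autonomy_level = autonomy_level
        # Adaptive: a simple profile of the user's interests, learnt over time.
        self.profile = {}

    @abstractmethod
    def clarify(self, request):
        """Communicative: ask follow-up questions about the topic of interest."""

    @abstractmethod
    def act(self, request):
        """Capable: take action on the user's behalf (fetch documents, fill in
        registration details), subject to the configured autonomy level."""

    def learn_from_feedback(self, term, usefulness):
        """Adaptive: strengthen or weaken interest in a term after feedback."""
        self.profile[term] = self.profile.get(term, 0.0) + usefulness

    # A mobile agent, as suggested by Millman, would additionally be able to
    # serialise its state and continue executing on a remote host.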

One possible obstacle to the wider deployment of agents arises from the potential security risks posed by independent programs roaming the Internet and executing on any machine at will. Systems managers and users alike will need to be reassured in this area if agents are to make a major impact on the way we use the 'net.

3. Agents in Use
The following scenario, from the not-too-distant future, illustrates the use of intelligent agent software in electronic document retrieval. A researcher wishes to obtain information concerning current laboratory experiments in the field of parapsychology. He enters his request into the agent's interface using natural language. If necessary, the agent responds, in natural language, by requesting further details or clarification; for example, it may ask what period the researcher is interested in, or whether, and how much, he is prepared to pay to retrieve documents. When the agent has sufficient information it begins searching, leaving the researcher free to concentrate on other matters.

A little later the researcher receives an e-mail listing a number of documents, ranked in order of relevance, that the agent has located. The agent invites the researcher to grade the retrieved documents in terms of their usefulness; this allows the agent to build a profile of the user and better tailor its results in future searches. Additionally, the agent will monitor the documents reported as most useful and notify the researcher when they are updated.
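
As a toy illustration of the feedback loop just described, the sketch below (plain Python, with invented example text and grades) keeps a simple term-weight profile: graded documents adjust the weights, and the profile is then used to rank new candidates. A real agent would use far richer representations, but the principle is the same.

# Build a term-weight profile from graded documents, then rank new candidates.
from collections import Counter

def update_profile(profile, document_text, grade):
    """Add the document's terms to the profile, weighted by the user's grade."""
    for term in document_text.lower().split():
        profile[term] += grade

def score(profile, document_text):
    """Rank a candidate document by how well it matches the learnt profile."""
    terms = document_text.lower().split()
    return sum(profile[t] for t in terms) / (len(terms) or 1)

profile = Counter()
update_profile(profile, "laboratory experiments in parapsychology", grade=2.0)
update_profile(profile, "history of stage magic", grade=-1.0)

candidates = ["new laboratory experiments on telepathy",
              "stage magic through history"]
ranked = sorted(candidates, key=lambda d: score(profile, d), reverse=True)
print(ranked)  # documents closest to the positively graded material come first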

4. Products currently available
At the lower end of the market is WebFerret, an intelligent search tool from FerretSoft. Installed on the user's desktop, it offers a means of querying multiple search engines, setting the closeness of match, and filtering and sorting the results. WebFerret is available for free download from the developer's website. The free version includes banner advertising; a "PRO" version allowing advertising to be turned off is available for $26.95 for a single copy, with discounts for bulk orders.
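
The core idea behind such metasearch tools can be sketched in a few lines. In the hypothetical Python below, the two "search engines" are stubs returning canned results; a real tool such as WebFerret would query live engines over the network, but the merging, de-duplication, closeness-of-match filtering and sorting would look broadly similar.

def engine_a(query):
    # Stub: a real implementation would query a live search engine here.
    return [("http://example.org/agents", 0.9), ("http://example.org/spam", 0.2)]

def engine_b(query):
    return [("http://example.org/agents", 0.8), ("http://example.net/retrieval", 0.7)]

def metasearch(query, min_score=0.5):
    merged = {}
    for engine in (engine_a, engine_b):
        for url, relevance in engine(query):
            # De-duplicate, keeping the best score seen for each URL.
            merged[url] = max(merged.get(url, 0.0), relevance)
    # Apply the "closeness of match" threshold and sort the survivors.
    hits = [(url, score) for url, score in merged.items() if score >= min_score]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

print(metasearch("intelligent agents"))
# [('http://example.org/agents', 0.9), ('http://example.net/retrieval', 0.7)]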

VIA, Versatile Intelligent Agents, by Kinetoscope is a modular agent system based on the Java language and aimed at the corporate market. Kinetoscope's website describes how VIA allows developers to use VIA's pre-existing task libraries and agents or to write their own to customise VIA for the specific needs of their business. The website includes a case study detailing the adoption of a VIA solution by Schlumberger, the multi-billion dollar oil and telecommunications company.

Concordia from Mitsubishi is a system for developing mobile agents. Concordia agent applications are able to access and deliver information across multiple networks, even when the user is off-line. An evaluation version, excluding "reliability and security components or features", is available for free download from the Concordia website.

5. Research
Lieberman, of the Massachusetts Institute of Technology Media Laboratory, has developed Letizia, an intelligent agent that assists web browsing. Letizia is used in conjunction with a conventional web browser, sitting in the background as the user navigates the web. Letizia continually attempts to anticipate the user's requirements by following links, initiating searches etc. The user may call upon Letizia at any time to peruse its findings.
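
The sketch below illustrates the general idea (it is not Lieberman's actual algorithm): while the user reads a page, an agent scores the pages reachable from it against a profile of the user's interests, ready to present its findings on request. The pages and the interest profile are stubbed with invented data.

# Score the outgoing links of the current page against the user's interests.
PAGES = {
    "/home":    ("welcome page with assorted links", ["/agents", "/recipes"]),
    "/agents":  ("intelligent agents for document retrieval", []),
    "/recipes": ("favourite soup recipes", []),
}

INTERESTS = {"agents": 2.0, "retrieval": 1.5, "documents": 1.0}

def interest_score(text):
    return sum(weight for term, weight in INTERESTS.items() if term in text)

def explore(current_page):
    """Visit each page linked from the current one and rank it by interest."""
    _, links = PAGES[current_page]
    scored = [(link, interest_score(PAGES[link][0])) for link in links]
    return sorted(scored, key=lambda item: item[1], reverse=True)

print(explore("/home"))  # [('/agents', 3.5), ('/recipes', 0.0)]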

The University of Washington's Department of Computer Science and Engineering is conducting research into intelligent software agents for the Internet. Its Softbots homepage describes the Internet Learning Agent, an application enabling people to find information on the 'net. The Internet Learning Agent "forms models of information resources by interacting with them".

The Multi-Agent Systems Laboratory at the University of Massachusetts has developed BIG, A Resource-Bounded Information Gathering Agent. BIG was developed in response to two observations. Firstly, much information gathering takes place in support of a broader decision-making process. Secondly, constructing a model from the results of information gathering is a matter of interpretation, i.e. the model must be built from a number of incomplete documents.

BIG is able to refine its strategy during the search process in the light of the documents retrieved so far, as well as learning from the experience of previous searches. It permits users to define the relative importance of parameters such as quality and price, and to enter the time within which results are required.
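
The trade-offs BIG exposes can be caricatured as follows. In this hypothetical Python sketch the user supplies weights for quality and price together with a time budget; the agent ranks candidate documents accordingly and stops gathering when the budget is exhausted. The candidate data and field names are invented for illustration.

# Rank candidates by a weighted quality/price score within a time budget.
import time

CANDIDATES = [
    {"url": "http://example.org/a", "quality": 0.9, "price": 5.0},
    {"url": "http://example.org/b", "quality": 0.6, "price": 0.0},
    {"url": "http://example.org/c", "quality": 0.8, "price": 2.0},
]

def gather(quality_weight, price_weight, deadline_secs):
    start = time.monotonic()
    results = []
    # Consider the most promising candidates first.
    ranked = sorted(
        CANDIDATES,
        key=lambda d: quality_weight * d["quality"] - price_weight * d["price"],
        reverse=True,
    )
    for doc in ranked:
        if time.monotonic() - start > deadline_secs:
            break  # time budget exhausted: return what has been gathered so far
        results.append(doc["url"])  # a real agent would fetch the document here
    return results

print(gather(quality_weight=1.0, price_weight=0.1, deadline_secs=2.0))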

6. Conclusions and Predictions
Given that the content found on the Internet is set to grow exponentially, and that information is likely to be many a business's most valuable asset, the demand for automated assistance in sifting and searching the mass of documents available on the web, and hence for intelligent agents, is likely to continue unabated. Assuming that outstanding security issues are resolved satisfactorily, the use of such agents is likely to become an everyday feature of business, academic and personal computing.

A White Paper by Kinetoscope, available on its website, indicates the company is developing the following agent-related technologies:

  • Agent profiles, which will mirror users' needs and learn from their actions;
  • Speech-to-text and Natural-Language Processing, allowing agents to handle voice commands;
  • Multiagent systems in which many agents co-operate to better achieve their goals.


7. Further Reading
In addition to the articles by Millman and Hendler cited in the references, the book "Intelligent Software Agents" by Murch and Johnson is also suggested. It is strongly recommended that interested parties download and test the free evaluation copies of the products described above.


8. References
Millman, Howard. "Agents at Your Service." InfoWorld, February 16, 1998.
Hendler, James. "Is There an Intelligent Agent in Your Future?" Nature - Web Matters, March 11, 1999.


All information correct and links valid at March 2001.

© twinIsles.dev (http://www.twinisles.com/dev/index.htm) 2001