This is an automatic information extraction system for collecting a list
of resources from the World Wide Web. Currently, the resources extracted
by this system are nouns and noun phrases. The extracted resources are sorted
by their relevance to the user query.
The semantics of wildcards
The system supports two kinds of wildcards, namely % and *.
The % wildcard
The % wildcard represents a noun or noun phrase in a query. It lets you
specify which nouns or noun phrases you want to extract from the Web.
For example, the query "% is a Canadian city" will extract the nouns or
noun phrases that appear immediately before "is a Canadian city".
The % wildcard should appear exactly once in the query.
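
The % semantics above can be sketched in a few lines of Python. This is only an illustration, not the system's implementation: the function name query_to_pattern is hypothetical, and a run of capitalized words stands in for real noun phrase detection, which would normally need a part-of-speech tagger. The sketch does not handle the * wildcard.

```python
import re

def query_to_pattern(query: str) -> re.Pattern:
    """Turn a query with one % slot into a regex (hypothetical helper)."""
    if query.count("%") != 1:
        raise ValueError("the % wildcard must appear exactly once")
    before, after = query.split("%")
    # Assumption: approximate a noun phrase by one or more capitalized words.
    slot = r"((?:[A-Z][\w-]*)(?:\s+[A-Z][\w-]*)*)"
    return re.compile(re.escape(before) + slot + re.escape(after))

text = "Toronto is a Canadian city. Some say Quebec City is a Canadian city too."
pattern = query_to_pattern("% is a Canadian city")
matches = pattern.findall(text)
print(matches)
```

A query with no % or with more than one % is rejected, which mirrors the rule that the wildcard must appear exactly once.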
The * wildcard
A word enclosed in a pair of * wildcards will be augmented with its synonyms.
Consider the following scenario: you want to extract a list of names of car
manufacturers, so you enter your query as "% is a car manufacturer". However,
bona fide car manufacturers are often referred to as "vehicle manufacturers",
"sedan manufacturers", and so on. To address this problem, you can reformulate
the query as "% is a *car* manufacturer", and the query will be automatically
expanded to include "car" and its synonyms.
The * wildcard is optional.
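
One way to picture the expansion is as a set of plain queries, one per synonym. The sketch below is an assumption about how such an expansion could work, not a description of the system's internals: the SYNONYMS table and the expand_query function are hypothetical, and a real system would presumably draw synonyms from a thesaurus.

```python
import re
from itertools import product

# Hypothetical synonym table, for illustration only.
SYNONYMS = {"car": ["vehicle", "sedan"]}

def expand_query(query: str, synonyms=SYNONYMS) -> list:
    """Return the query variants produced by expanding *word* markers."""
    # re.split with one capture group alternates: literal, marked word, literal, ...
    parts = re.split(r"\*([^*\s]+)\*", query)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Marked word: keep the original and add its synonyms.
            choices.append([part] + synonyms.get(part, []))
        else:
            choices.append([part])
    return ["".join(combo) for combo in product(*choices)]

variants = expand_query("% is a *car* manufacturer")
for v in variants:
    print(v)
```

A query without any * markers comes back unchanged as a single variant, which matches the wildcard being optional.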
How to write queries
You specify what information to extract by writing queries. A query
in our system is similar to a phrase query in a typical search engine,
except that you must use the % wildcard to indicate what to extract. The
following is a list of sample queries:
- % is a country
- % is a summer *blockbuster*
- popular movie stars including %
- Thomas Edison *invent* %
- Google *acquired* %
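
The sample queries above can be checked against the wildcard rules with a small parser. This is a minimal sketch under stated assumptions: ParsedQuery and parse_query are hypothetical names, and requiring an even number of * characters is an inferred rule, since the text only says that * wildcards come in pairs.

```python
import re
from typing import NamedTuple

class ParsedQuery(NamedTuple):
    before: str   # text to the left of the % slot
    after: str    # text to the right of the % slot
    marked: list  # words enclosed in * for synonym expansion

def parse_query(query: str) -> ParsedQuery:
    """Validate a query and split it around the % slot (hypothetical helper)."""
    if query.count("%") != 1:
        raise ValueError("the % wildcard must appear exactly once")
    if query.count("*") % 2 != 0:
        raise ValueError("each * must be paired")  # assumed rule
    marked = re.findall(r"\*([^*\s]+)\*", query)
    before, after = query.replace("*", "").split("%")
    return ParsedQuery(before.strip(), after.strip(), marked)

samples = [
    "% is a country",
    "% is a summer *blockbuster*",
    "popular movie stars including %",
    "Thomas Edison *invent* %",
    "Google *acquired* %",
]
parsed = [parse_query(q) for q in samples]
for q, p in zip(samples, parsed):
    print(q, "->", p)
```

The slot may sit at the start or the end of a query, so either the before or the after field can be empty.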