Using the Web for Assignments
4. Searching the Web
Spiders
Computer robot programs that are used by search engines to roam the World
Wide Web via the Internet, visit sites and databases, and keep the search
engine database up-to-date. They obtain new pages, update known pages, and
delete obsolete ones. Their findings are then integrated into the "home"
database.
Search Engines
Most large search engines operate several spiders all the time. The Web is
so enormous that it can take six months for spiders to cover it, resulting
in a certain degree of "out-of-datedness" in all the search engines.
Search engines for the general web do not really search the World Wide Web directly. Each one searches a database of the full text of web pages selected from the billions of web pages out there residing on servers.
An Internet search engine allows the user to enter keywords relating to a topic and retrieve information about Internet sites containing those keywords. When you click on links provided in a search engine's search results, you retrieve from the server the current version of the page.
Web search engines tend to be developed by private companies, though most of them are available free of charge.
A web search engine consists of three components:
- Spider - Program that traverses the Web from link to link, identifying and reading pages.
- Index - Database containing a copy of each Web page gathered by the spider.
- Search engine mechanism - Software that enables users to query the index and that usually returns results in term relevancy ranked order.
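The three components above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the "Web" here is a hypothetical in-memory dictionary (TOY_WEB) with made-up addresses, and the function names spider, build_index and search are invented for this example.

```python
from collections import deque

# Hypothetical stand-in for the Web: address -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("spiders roam the web", ["b.example", "c.example"]),
    "b.example": ("search engines query an index", ["c.example"]),
    "c.example": ("an index stores web pages", ["a.example"]),
    "d.example": ("nobody links to this page", []),
}

def spider(start_url, web):
    """Spider: follow links from start_url and return the pages it read."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Index: map every word to the set of addresses whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Search mechanism: addresses containing every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

pages = spider("a.example", TOY_WEB)
index = build_index(pages)
print(search(index, "web pages"))
```

Note that "d.example" never reaches the index, because no page the spider visits links to it. The search consults only the index, never the toy Web itself, which mirrors the point made earlier that search engines search their own database rather than the live Web.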
Term Ranked Order
A document will appear higher in your list of results if your search term
appears many times, near the beginning of the document, close together in
the document, in the document title, etc. Engines that rank results this
way may be thought of as first generation search engines.
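A rough sketch of term ranking for a single search term is shown below. The weights (one point per occurrence, a bonus for an early first occurrence, two points for a title match) are arbitrary assumptions chosen for illustration, and the names term_score and rank are hypothetical. Closeness of several terms to each other is left out because the sketch handles only one term.

```python
def term_score(term, title, body):
    """Score one document for one term; a higher score ranks earlier."""
    term = term.lower()
    title_words = title.lower().split()
    body_words = body.lower().split()
    positions = [i for i, word in enumerate(body_words) if word == term]

    score = float(len(positions))           # how many times the term appears
    if positions:
        score += 1.0 / (1 + positions[0])   # bonus for appearing near the start
    if term in title_words:
        score += 2.0                        # bonus for appearing in the title
    return score

def rank(term, documents):
    """Return the titles of matching documents, best score first."""
    scored = [(term_score(term, d["title"], d["body"]), d["title"])
              for d in documents]
    scored.sort(key=lambda pair: -pair[0])
    return [title for score, title in scored if score > 0]

DOCS = [
    {"title": "Gardening tips",
     "body": "water the roses and prune the roses in spring"},
    {"title": "Roses", "body": "roses need sun"},
    {"title": "Cooking", "body": "add salt and pepper"},
]

print(rank("roses", DOCS))
```

With these assumed weights, the short page titled "Roses" outranks the longer page that mentions roses twice, because the title match and the early occurrence outweigh the extra mention.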
A new development in search engine technology is the ordering of search results by concept, keyword, site, links or popularity. Engines that support these features may be thought of as second generation search engines.
For example, Google ranks a Web page according to the number of highly ranked Web pages that link to it. A Web page becomes highly ranked if still other highly ranked pages link to it. This scheme represents a melding of technology and human judgment.
All search engines have rules for formulating queries. It is imperative that you read the help files at the site before proceeding. To learn more, see: Search Engine Showdown and Search Engine Watch.
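The idea that a page is important when important pages link to it can be sketched as a repeated calculation. The code below is a simplified illustration in the spirit of link-based ranking, not Google's actual algorithm; the damping value of 0.85, the iteration count, the function name link_rank and the tiny LINKS example are all assumptions made for this sketch.

```python
def link_rank(links, iterations=50, damping=0.85):
    """links maps each page to the pages it links to; returns page -> score."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links shares its score with all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page site: three pages link to "home", none to "orphan".
LINKS = {
    "home": ["news", "about"],
    "news": ["home"],
    "about": ["home"],
    "orphan": ["home"],
}

scores = link_rank(LINKS)
ordered = sorted(scores, key=scores.get, reverse=True)
print(ordered)
```

In this toy example "home" comes first because every other page links to it, and "orphan" comes last because nothing links to it, even though it links out to a highly ranked page.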
Search engines:
- are built by computer robot programs (spiders), not by human selection
- are NOT organized by subject categories. All pages are ranked by a computer algorithm
- contain full-text (every word) of the web pages they link to. You find pages by matching words in the pages you want
- are huge and often retrieve a lot of information
- and are UNEVALUATED. YOU MUST EVALUATE EVERYTHING YOU FIND
Meta-Search Engines - Search engines that automatically submit your keyword search to several other search tools, and retrieve results from all their databases. Convenient time-savers for relatively simple keyword searches (one or two keywords or phrases in "quotation marks"). See SurfWax www.surfwax.com
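The way a meta-search engine combines results can be sketched as follows. The two "engines" here (engine_alpha and engine_beta) are hypothetical stand-ins that return fixed lists of made-up addresses; a real meta-search tool would contact real search services. The merging rule, which gives each address more credit the higher it appears in each list, is one simple assumption among many possible ones.

```python
def engine_alpha(query):
    # Hypothetical search engine: returns addresses, best match first.
    return ["site1.example", "site2.example", "site3.example"]

def engine_beta(query):
    # A second hypothetical engine with a different database.
    return ["site2.example", "site4.example"]

def meta_search(query, engines):
    """Send the same query to every engine and merge the ranked lists."""
    scores = {}
    for engine in engines:
        for position, url in enumerate(engine(query)):
            # First place earns 1 point, second 1/2, third 1/3, and so on.
            scores[url] = scores.get(url, 0.0) + 1.0 / (position + 1)
    # Highest combined score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

results = meta_search('"subject directory"', [engine_alpha, engine_beta])
print(results)
```

An address found by both engines ("site2.example") rises to the top of the merged list, and duplicates are shown only once.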
Subject Directories
An approach to Web documents by subject terms hierarchically grouped.
May be browsed or searched by keywords. Subject directories are smaller
than other searchable databases, because of the human involvement required
to classify pages by subject. See Librarians' Index to the Internet www.lii.org
Subject Directories:
- are built by human selection, not by computers or robot programs
- are organized into subject categories. Subject classifications among the directories are not standardized and vary according to the scope of each directory.
- NEVER contain full-text of the web pages they link to. You can only search what you can see (titles, descriptions, subject categories, etc.)
- use broad or general terms
- range from small and specialized to large, but are smaller than most search engines.
- and are often carefully evaluated and annotated (but not always!!)
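A subject directory can be pictured as nested categories holding short, human-written entries. The sketch below uses a made-up DIRECTORY with invented categories and sites, and the function names browse and search_directory are hypothetical. The keyword search looks only at titles and descriptions, which illustrates why a directory search cannot match words that appear only in the full text of a page.

```python
# Hypothetical directory: category -> subcategory -> list of annotated sites.
DIRECTORY = {
    "Science": {
        "Astronomy": [
            {"title": "Planet Guide",
             "description": "Annotated overview of the solar system"},
        ],
        "Biology": [
            {"title": "Cell Atlas",
             "description": "Images of plant and animal cells"},
        ],
    },
    "Arts": {
        "Music": [
            {"title": "Opera Basics",
             "description": "Introduction to opera for students"},
        ],
    },
}

def browse(directory, *path):
    """Walk down the category names in path and return what is found there."""
    node = directory
    for category in path:
        node = node[category]
    return node

def search_directory(directory, keyword, path=()):
    """Match the keyword against titles and descriptions only."""
    keyword = keyword.lower()
    hits = []
    for name, node in directory.items():
        here = path + (name,)
        if isinstance(node, dict):
            # A nested category: keep descending.
            hits.extend(search_directory(node, keyword, here))
        else:
            for entry in node:
                visible = (entry["title"] + " " + entry["description"]).lower()
                if keyword in visible:
                    hits.append((" > ".join(here), entry["title"]))
    return hits

print(browse(DIRECTORY, "Science", "Astronomy"))
print(search_directory(DIRECTORY, "cells"))
```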
Use a search engine when...
- you have a narrow or obscure topic or idea to research.
- you are looking for a specific site.
- you want to search the full text of millions of pages.
- you want to retrieve a large number of documents on your topic.
- you want to search for particular types of documents, file types, source locations, languages, date last modified, etc.
Use a subject directory when...
- you have a broad topic or idea to research.
- you want to see a list of sites on your topic, often recommended and annotated by experts.
- you want to retrieve a list of sites relevant to your topic, rather than numerous individual pages contained within these sites.
- you want to search for the site title, annotation, and (if available) assigned keywords to retrieve relevant material, rather than the full text of a document.
- you want to avoid viewing low-content documents that often turn up on search engines.