Google Search Principles Paper

Published: 29th March 2011
In this article we present Google, a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and to produce better search results than existing systems. The prototype, with a full-text and hyperlink database of at least 24,000,000 pages, is available at http://google.stanford.edu/.

Designing a search engine is a challenging task. Search engines index tens to hundreds of millions of pages, which contain a comparably large number of distinct terms, and they answer tens of millions of queries per day. Although large-scale search engines are very important on the Web, very little academic research has been done on them. Furthermore, because of rapid advances in technology and the large increase in the number of pages, building a search engine today is very different from building one three years ago.

This paper describes our large-scale search engine in depth. As far as we know, it is the first such detailed description to be published. Beyond the problems of scaling traditional search techniques to this many pages, there are new technical challenges, including how to use the additional information present in hypertext to improve search results. This article addresses that question: how to build a practical large-scale system that exploits the additional information in hypertext. It also looks at how to deal effectively with uncontrolled hypertext collections, where anyone can freely publish information online.
Keywords: World Wide Web, search engines, information retrieval, PageRank, Google

1 Introduction
The Web has brought new challenges to information retrieval. The amount of information on the Web is growing rapidly, and so is the number of new users who are inexperienced in the art of searching it. People tend to surf the Web by following hyperlinks, usually starting from important pages such as Yahoo or from search engines. Human-maintained lists (catalogs) cover popular topics effectively, but they are subjective, expensive to build and maintain, slow to update, and cannot cover every esoteric topic. Automated search engines based on keyword matching usually return too many low-quality matches. To make matters worse, some advertisers try to win people's attention by misleading automated search engines.

We have built a large-scale search engine that addresses many of the problems of existing systems. It makes heavy use of the structure of hypertext to greatly improve the quality of search results. We named our system Google after a common spelling of googol, or 10 to the 100th power, which fits our goal of building a very large-scale search engine.
1.1 Web Search Engines - Scaling Up: 1994-2000
Search engine technology has had to scale dramatically to keep up with the growth of the Web. In 1994, one of the first Web search engines, the World Wide Web Worm (WWWW), had an index of 110,000 Web pages and Web-accessible documents. As of November 1997, the top search engines claimed to index from 2,000,000 (WebCrawler) to 100,000,000 Web documents (according to Search Engine Watch). It is foreseeable that by 2000 a comprehensive index will contain more than 1,000,000,000 documents. Meanwhile, the number of queries search engines handle has also grown at an alarming rate. In March and April 1994, the World Wide Web Worm received an average of about 1,500 queries per day.

In November 1997, Altavista claimed it handled roughly 20,000,000 queries per day. With the growth in the number of Internet users, automated search engines are expected to handle hundreds of millions of queries per day by 2000. Our system is designed to address many of the problems, in both quality and scalability, that arise when search engine technology is scaled up to such large numbers.
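
To put the daily figures above in perspective, the short sketch below converts queries per day into an average rate per second. It assumes load is spread evenly across the day, which real traffic is not, and the value of 300,000,000 is only an illustrative stand-in for "hundreds of millions"; the function and variable names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def queries_per_second(queries_per_day: float) -> float:
    """Average query rate, assuming load is spread evenly over the day."""
    return queries_per_day / SECONDS_PER_DAY


# Daily figures quoted in the text.
wwww_rate = queries_per_second(1_500)
altavista_rate = queries_per_second(20_000_000)

# Hypothetical stand-in for "hundreds of millions" of daily queries.
projected_rate = queries_per_second(300_000_000)

print(f"WWWW:      {wwww_rate:.3f} queries/s")
print(f"Altavista: {altavista_rate:.0f} queries/s")
print(f"Projected: {projected_rate:.0f} queries/s")
```

On these assumptions, the World Wide Web Worm averaged well under one query per second, Altavista a few hundred, and the projected load a few thousand. Peak rates would be higher than these averages.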

1.2 Google: Scaling with the Web
Building a search engine that scales to today's Web presents many challenges. Crawling technology must be fast enough to gather pages and keep them up to date. Storage space for the index and the documents must be used efficiently. The indexing system must process hundreds of gigabytes of data efficiently. Queries must be handled quickly, at a rate of hundreds to thousands per second. As the Web grows, these tasks become more difficult. However, hardware performance and cost are also improving rapidly, which partially offsets these difficulties. There are several notable exceptions to this progress, such as disk seek time and operating system robustness.

In designing Google, we considered both the growth rate of the Web and changes in technology. Google is designed to scale well to very large data sets. It uses storage space efficiently to store the index, and its data structures are optimized for fast and efficient access (see Section 4.2). Further, we expect the cost of indexing and storing text and HTML to decline relative to the amount of it that is available (see Appendix B). This gives centralized systems such as Google favorable scaling properties.


1.3 Design Goals
1.3.1 Improved Search Quality
Our main goal is to improve the quality of Web search engines. In 1994, some people believed that a complete search index would make it easy to find any data. According to Best of the Web 1994 - Navigators, "The best navigation service should make it easy to find almost anything on the Web (once all the data is entered)." However, the Web of 1997 is very different. Anyone who has used a search engine recently can confirm that the completeness of the index is not the only factor in the quality of search results. The results a user is interested in are often lost among "junk results".



This article is free for republishing
Source: http://ucoolstuff.articlealley.com/google-search-principle-papers-2149296.html

