The matches are hyperlinks to the underlying files, just as WAIS-HTTP gateways provide. As Paul Klark writes, "Following the hyperlink leads you not only to a particular file, but also to the exact place where the match occurred. Hyperlinks in the documents are converted on the fly to actual hyperlinks, which you can follow immediately."

I cannot stress this lesson enough: the developer should look around before coding! Web gateway programming is the problem of using dozens of promising building blocks, all interrelated in a dense and tangled mesh, to best advantage. The chances are very good that someone already has started the project that a new web developer is undertaking or, at the very least, has constructed something so similar that it is quite useful.

Pros and Cons of Glimpse

Assuming that a site has a data store it would like to index and query, I recommend Glimpse when the site has disk storage constraints, when it does not need the relevancy scores of WAIS, or when empirical tests show that Glimpse's query speed and accuracy are acceptably close to those of WAIS. Its ease of use is a definite plus, and a well-established research team is pushing it forward.

The authors mention a few weaknesses in the current version of Glimpse. (See note) I will mention two here:

- Because Glimpse's index is word based, it can search for a phrase only by splitting it into its individual words and then taking an additional step to form the phrase. If a document contains many occurrences of the word last and the word stand but very few occurrences of the phrase last stand, the algorithm will be slow; the sketch following this list illustrates why.

- The -f fast-indexing flag does not work with -b medium indexes. The authors note that this is scheduled to be fixed in the next release.
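Glimpse's internals are not reproduced in this chapter, so the following Python sketch is only a schematic illustration of the word-based approach: a toy posting-list index over hypothetical documents, and a phrase query that intersects per-word postings and then verifies adjacency. Nothing here is Glimpse code.

```python
# Schematic illustration (not Glimpse code) of phrase search over a
# word-based index. Each word maps to (doc_id, position) postings;
# a phrase query starts from the first word's postings and verifies
# that the remaining words occur at the following positions.

from collections import defaultdict

def build_index(docs):
    """Map each word to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

def phrase_search(index, phrase):
    """Return (doc_id, position) pairs where the whole phrase occurs."""
    words = phrase.lower().split()
    posting_sets = [set(index.get(w, [])) for w in words]
    hits = []
    # Every posting of the first word is a candidate: if 'last' and
    # 'stand' are both frequent, this loop is long even when the
    # phrase 'last stand' is rare -- the slowness the Glimpse
    # authors describe.
    for doc_id, pos in index.get(words[0], []):
        if all((doc_id, pos + i) in posting_sets[i]
               for i in range(1, len(words))):
            hits.append((doc_id, pos))
    return hits

docs = {1: "his last stand", 2: "last call to stand by"}
index = build_index(docs)
print(phrase_search(index, "last stand"))   # [(1, 1)]
```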
The Glimpse team is to be commended for its excellent on-line reference material, identification of known weaknesses and bugs, porting initiatives, and well-conceived demonstration pages.

Harvest

Harvest, a research project headed by Michael Schwartz at the University of Colorado (the team also includes Udi Manber of Glimpse fame), addresses the very practical problem of reducing the network load caused by the high traffic of client information requests and reducing the machine load placed on information servers. Harvest is a highly modular and scalable toolkit that places a premium on acquiring indexing information efficiently and replicating it across the Internet. No longer is there a curse on the machine that has a popular information store; formerly, that machine would have to bear the burden of answering thousands of text-retrieval requests daily. With Harvest, one site's content can be efficiently represented and replicated.

The first piece of the Harvest software is the Gatherer. The Gatherer software can be run at the information provider's machine, thus avoiding network load, or it can run using FTP or HTTP to access a remote provider. The function of the Gatherer is to collect the indexing information from a site. The Gatherer takes advantage of highly customizable extraction software known as Essence, which can unpack archived files, such as tar (tape archive) files, or find author and title lines in LaTeX documents. Because it easily can be tuned at the information site, the Essence tool builds a high-quality index for outbound distribution.

The second piece is the Broker. The Gatherer communicates with the Broker using a flexible protocol whose messages are streams of attribute/value pairs (a sketch of what such a record might look like appears at the end of this section). Brokers provide the actual query interface and can accommodate incremental indexing of the information provided by Gatherers.

The power of the Gatherer-Broker system is in its use of the distributed nature of the Internet. Not only can one Gatherer feed many Brokers across the Net, but Brokers also can feed their current index to other Brokers. Because distributed Brokers may possess different query interfaces, the differences may be used to filter the information stream. Harvest provides a registry system, the Harvest Server Registry (HSR), which maintains information on Gatherers and Brokers. A new information store should consult the registry to avoid reinventing the wheel with its proposed index, and an information requester should consult the registry to locate the most proximate Brokers and thereby cut down on search time.

After the user enters a query to a Harvest Broker, a search engine takes over. The Broker does not require a specific search engine; it might be WAIS, freeWAIS, Glimpse, or others. Glimpse is distributed with the Harvest source and has, as already mentioned, very compact indexes.

Another critical piece of the Harvest system is the Replicator. This subsystem is rather complicated: daemons oversee communication between Brokers (which are spread all over the Internet on a global scale) and determine the extent, timing, and flow of replicated information. The upshot is that Brokers flood object information to the other members of their replication group and then between groups. Thus, a high degree of replication is achieved between neighbors in the conceptual wide-area mapping, with convergence toward high replication in less proximate Brokers over time. (A toy sketch of this two-level flooding also appears at the end of this section.)

Any information site can acquire the Harvest software, run the Gatherer to acquire indexing information, and then make itself known to the Harvest registry. Web developers who want to reduce the load on a popular information store are strongly advised to do more research on Harvest and its components.

Figure 22.12 shows the Internet Multicasting Service's EDGAR input screen for a Harvest query.
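As promised above, here is a sketch of the Gatherer-to-Broker attribute/value stream. In the released Harvest software this format is called SOIF, the Summary Object Interchange Format; the Python sketch below shows only the general shape of such a record. The template name, attribute names, sample URL, and length-prefixed layout are illustrative approximations, not the authoritative specification.

```python
# A sketch of emitting one object summary as a stream of
# attribute/value pairs, loosely modeled on Harvest's SOIF format.
# The sample data and exact byte layout are illustrative; consult
# the Harvest documentation for the authoritative record format.

def summary_record(template, url, attributes):
    """Serialize one object summary. Each value is prefixed with its
    byte length so a Broker can read arbitrary values unambiguously."""
    lines = ["@%s { %s" % (template, url)]
    for name, value in attributes.items():
        data = value.encode("utf-8")
        lines.append("%s{%d}:\t%s" % (name, len(data), value))
    lines.append("}")
    return "\n".join(lines)

print(summary_record(
    "FILE",
    "http://provider.example.com/reports/annual.tex",   # hypothetical
    {"Title": "Annual Report", "Author": "J. Smith", "Type": "LaTeX"},
))
```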
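Finally, the Replicator's behavior can be pictured with a toy model. The group layout, broker names, and object IDs below are invented, and Harvest's real replication daemons handle the timing and extent of flooding far more carefully; this sketch shows only the two-level idea of flooding within a replication group and then across inter-group links.

```python
# Toy sketch of two-level flooding: brokers replicate a new index
# object to every member of their own replication group, and
# selected inter-group links carry the object onward. All names
# here are invented for illustration.

from collections import deque

groups = {                       # replication groups of brokers
    "europe": {"br-uk", "br-de"},
    "us-east": {"br-ny", "br-dc"},
    "us-west": {"br-ca"},
}
inter_group = [("europe", "us-east"), ("us-east", "us-west")]

def flood(origin_group, obj, holdings):
    """Propagate obj group by group; within each reached group,
    every member broker stores a replica."""
    reached, frontier = {origin_group}, deque([origin_group])
    while frontier:
        g = frontier.popleft()
        for broker in groups[g]:           # intra-group flood
            holdings.setdefault(broker, set()).add(obj)
        for a, b in inter_group:           # inter-group links
            nxt = b if a == g else a if b == g else None
            if nxt and nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return holdings

# Every broker eventually holds index-obj-42.
print(flood("europe", "index-obj-42", {}))
```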