Analysis of the principle of the HITS algorithm used by search engines

Link analysis is a common way for search engines to analyze the structure of the web. A search engine generally applies a link analysis algorithm to collate and analyze the data on each page's inbound and outbound links, then scores and ranks the pages according to the characteristics of those links. When a user searches for a keyword, the search engine analyzes and sorts the links related to the topic of that keyword and produces the final ranking. The topic of this article is HITS, one of the more representative link analysis algorithms.

In practice, the HITS algorithm scores pages through the mutually reinforcing relationship between Hub pages and Authority pages. A Hub page is a page containing many links that point to authoritative pages, usually a navigation or directory page. An Authority page is an authoritative page that a large number of other pages link to. In other words, the algorithm evaluates the pages that the search engine has crawled from the Internet as Hub pages and Authority pages. A good Hub page should point to many pages with high authority, and a page with a high authority value should be pointed to by many good Hub pages. This mutual reinforcement is the core idea of the HITS algorithm.
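The mutual reinforcement between hub and authority scores can be illustrated with a small sketch. This is only a minimal illustration, not how any particular search engine implements it: the function name `hits`, the page names, the fixed iteration count, and the choice to normalize by Euclidean length are all assumptions made for the example.

```python
import math


def hits(links, iterations=50):
    """Return (hub, auth) score dicts for a link graph.

    links maps each page to the list of pages it links to.
    The iteration count and L2 normalization are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: add up the hub scores of the pages linking to each page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: add up the authority scores of the pages each page links to.
        new_hub = {p: sum(new_auth[dst] for dst in links.get(p, [])) for p in pages}

        # Normalize so the scores stay bounded from one round to the next.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical pages: two directory-style pages pointing at three content sites.
links = {
    "directory": ["site_a", "site_b", "site_c"],
    "portal": ["site_a", "site_b"],
}
hub, auth = hits(links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(page, round(hub[page], 3), round(auth[page], 3))
```

In this toy graph the two directory-style pages end up with the hub weight, while the sites they both point to collect the most authority weight.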

With the core idea of the HITS algorithm in mind, the next question is how the algorithm computes weights for a set of web pages and uses them to sort the search results. The rest of this article dissects how HITS operates. The expanded set T, whose construction is described below, can be treated as a matrix, with all Hub pages forming the vertex set A and all [...] contained in the collection.

First, HITS is a query-based algorithm. When a user submits a query, the search engine matches it against the user's search terms and returns a number of pages that are highly relevant to the topic. These pages form the set S. The pages in S have many links to and from other pages, so HITS expands S along those links: the pages that the pages in S link to, and the pages that link to the pages in S, are added to the collection, forming a new set T. The set T must meet the following requirements:

1. The web pages in the collection T [...]
2. The pages in T are highly relevant to the topic.
3. T should contain a large number of Hub pages and Authority pages.
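The expansion of the set S into the set T can be sketched as follows. This is a rough sketch under stated assumptions: the function name `build_base_set`, the two lookup tables, and the optional `max_in_links` cap are hypothetical and not taken from the article. A real search engine would answer these lookups from its own link index.

```python
def build_base_set(root_set, out_links, in_links, max_in_links=None):
    """Expand the query result set S (root_set) into the larger set T.

    out_links[p] lists the pages that p links to.
    in_links[p] lists the pages that link to p.
    max_in_links is an assumed, optional cap on how many linking pages
    are added per page in S; None means no cap.
    """
    base = set(root_set)
    for page in root_set:
        # Add every page that this page links to.
        base.update(out_links.get(page, []))
        # Add the pages that link to this page, optionally capped.
        linking = sorted(in_links.get(page, []))
        if max_in_links is not None:
            linking = linking[:max_in_links]
        base.update(linking)
    return base


# Hypothetical data: s1 and s2 are the pages returned for the query.
S = {"s1", "s2"}
out_links = {"s1": ["x", "y"], "s2": ["y"]}
in_links = {"s1": ["h1"], "s2": ["h1", "h2"]}

T = build_base_set(S, out_links, in_links)
T_capped = build_base_set(S, out_links, in_links, max_in_links=1)
print(sorted(T))
print(sorted(T_capped))
```

The hub and authority weights are then computed only over the pages in T, which keeps the calculation tied to the topic of the query.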

