    • 1. Granted invention patent
    • Method and apparatus for retrieving and indexing hidden pages
    • Publication number: US07685112B2
    • Publication date: 2010-03-23
    • Application number: US11570330
    • Application date: 2005-05-27
    • Applicants / inventors: Alexandros Ntoulas, Junghoo Cho, Petros Zerfos
    • IPC: G06F17/30, G06F15/173
    • CPC: G06F17/30864
    • A method and system for autonomously downloading and indexing Hidden Web pages from Websites includes the steps of selecting a query term and issuing a query to a site-specific search interface containing Hidden Web pages. A results index is then acquired and the Hidden Web pages are downloaded from the results index. A plurality of potential query terms are then identified from the downloaded Hidden Web pages. The efficiency of each potential query term is then estimated, and the next query term is selected from the plurality of potential query terms as the one with the greatest efficiency. The selected next query term is then issued to the site-specific search interface. The process is repeated until all or most of the Hidden Web pages are discovered.
    • 2. Invention patent application
    • Method and Apparatus for Retrieving and Indexing Hidden Pages
    • Publication number: US20080097958A1
    • Publication date: 2008-04-24
    • Application number: US11570330
    • Application date: 2005-05-27
    • Applicants / inventors: Alexandros Ntoulas, Junghoo Cho, Petros Zerfos
    • IPC: G06F15/16
    • CPC: G06F17/30864
    • A method and system are provided for autonomously downloading and indexing Hidden Web pages from Websites that have site-specific search interfaces. The method may be implemented using a crawler program or the like to autonomously cull Hidden Web content. The method includes the steps of selecting a query term and issuing a query to a site-specific search interface containing Hidden Web pages. A results index is then acquired and the Hidden Web pages are downloaded from the results index. A plurality of potential query terms are then identified from the downloaded Hidden Web pages. The efficiency of each potential query term is then estimated, and the next query term is selected from the plurality of potential query terms as the one with the greatest efficiency. The selected next query term is then issued to the site-specific search interface. The process is repeated until all or most of the Hidden Web pages are discovered. In one aspect of the invention, the efficiency of each potential query term is expressed as the ratio of the number of new documents returned for that term to the cost of issuing the query.
    • 4. Invention patent application
    • Unbiased page ranking
    • Publication number: US20060294124A1
    • Publication date: 2006-12-28
    • Application number: US11033691
    • Application date: 2005-01-12
    • Applicants / inventors: Junghoo Cho
    • IPC: G06F7/00
    • CPC: G06F16/951
    • The pages in a network of linked pages are ranked based on the quality of the pages. Page quality is obtained from the change over time in the link structure of each page. That change is measured by taking multiple snapshots of the network's link structure at different points in time. The link structures are approximated by their PageRanks, and page quality is determined by the formula $Q(p) \approx D \cdot \frac{\Delta PR(p)}{PR(p)} + PR(p)$, where $Q(p)$ is the quality of the page, $PR(p)$ is the current PageRank of the page, $\Delta PR(p)$ is the change over time in the PageRank of the page, and $D$ is a constant that determines the relative weight of the terms $\Delta PR(p)/PR(p)$ and $PR(p)$.
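The query-selection loop in the two hidden-page filings listed above (US07685112B2 and US20080097958A1) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. `SiteSearchInterface`, `crawl`, `est_site_size`, `query_cost` and `doc_cost` are hypothetical names. The new-document estimate assumes the downloaded pages are a representative sample of a site of known size. The cost of a query is assumed to be a fixed per-query charge plus a per-result charge. Efficiency is the ratio of estimated new documents to that cost, as the application's abstract describes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word set of a page (a crude stand-in for real term extraction)."""
    return set(re.findall(r"[a-z]+", text.lower()))


class SiteSearchInterface:
    """Hypothetical in-memory stand-in for a site-specific search form."""

    def __init__(self, corpus):
        self.corpus = corpus  # doc_id -> page text, reachable only via queries
        self.queries_issued = 0

    def query(self, term):
        """Return the results index: IDs of all pages containing `term`."""
        self.queries_issued += 1
        return {d for d, text in self.corpus.items() if term in tokenize(text)}

    def download(self, doc_id):
        return self.corpus[doc_id]


def crawl(site, seed_term, est_site_size, query_cost=1.0, doc_cost=0.1, max_queries=50):
    """Greedy crawl: always issue the candidate term with the highest estimated
    efficiency = (estimated new documents) / (estimated query cost)."""
    downloaded = {}  # doc_id -> token set of each downloaded page
    issued = []      # query terms in the order they were issued
    term = seed_term
    while term is not None and len(issued) < max_queries:
        issued.append(term)
        # Issue the query, then download every result page not seen before.
        for doc_id in sorted(site.query(term) - set(downloaded)):
            downloaded[doc_id] = tokenize(site.download(doc_id))
        if not downloaded:
            break  # seed term matched nothing, so there are no terms to mine
        n = len(downloaded)
        # Candidate terms and their document frequency in the downloaded pages.
        doc_freq = Counter(t for tokens in downloaded.values() for t in tokens)
        term, best_eff = None, 0.0
        for cand, freq in sorted(doc_freq.items()):  # sorted: deterministic ties
            if cand in issued:
                continue
            # Assumption: the term matches about freq/n of the whole site.
            est_matches = freq * est_site_size / n
            est_new = est_matches - freq  # estimated matches not yet downloaded
            cost = query_cost + doc_cost * est_matches
            eff = est_new / cost
            if eff > best_eff:  # no pick (stop) when no term promises new pages
                term, best_eff = cand, eff
    return downloaded, issued


if __name__ == "__main__":
    corpus = {
        1: "hidden web crawler query",
        2: "query interface search",
        3: "search engine index",
        4: "index pages database",
        5: "database records hidden",
    }
    site = SiteSearchInterface(corpus)
    pages, terms = crawl(site, "hidden", est_site_size=5)
    print(len(pages), terms)
    # 5 ['hidden', 'crawler', 'database', 'index', 'engine', 'pages', 'query']
```

In this toy run the crawler recovers all five pages with seven queries. Two of those queries (`engine`, `pages`) bring nothing new, which shows how strongly the result depends on the quality of the new-document estimate.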
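The quality estimate in the "Unbiased page ranking" filing (US20060294124A1) can be illustrated by computing PageRank on two snapshots of a link graph and combining the results. The formula is $Q(p) \approx D \cdot \Delta PR(p)/PR(p) + PR(p)$. This Python sketch rests on several assumptions not taken from the source. `pagerank` is a plain power iteration with damping 0.85 that spreads the rank of pages without out-links uniformly. $\Delta PR(p)$ is taken as the simple difference between the two snapshots, which amounts to a unit time interval. The default `D=1.0` is an arbitrary placeholder, since the abstract does not give a value.

```python
def pagerank(links, nodes, damping=0.85, iters=100):
    """Power-iteration PageRank; pages with no out-links spread their rank uniformly."""
    n = len(nodes)
    pr = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        dangling = sum(pr[p] for p in nodes if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for p in nodes:
            out = links.get(p, [])
            for q in out:
                new[q] += damping * pr[p] / len(out)
        pr = new
    return pr


def page_quality(prev_links, curr_links, D=1.0, damping=0.85):
    """Q(p) ~ D * dPR(p) / PR(p) + PR(p), computed from two link-graph snapshots."""
    nodes = sorted(
        set(prev_links) | set(curr_links)
        | {q for g in (prev_links, curr_links) for outs in g.values() for q in outs}
    )
    pr_prev = pagerank(prev_links, nodes, damping)
    pr_curr = pagerank(curr_links, nodes, damping)
    # Assumption: dPR(p) is the plain difference between snapshots (unit interval).
    return {p: D * (pr_curr[p] - pr_prev[p]) / pr_curr[p] + pr_curr[p] for p in nodes}


if __name__ == "__main__":
    before = {"a": ["b"], "b": ["a"], "c": ["a"]}
    after = {"a": ["c"], "b": ["c"], "c": ["a"]}  # c gains in-links, b loses its only one
    for page, q in sorted(page_quality(before, after).items()):
        print(page, round(q, 3))
```

A page whose PageRank is rising (here `c`) gets a quality score above its current PageRank. A page that is losing links (`b`) scores below it. With `D=0` the score reduces to the current PageRank.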