2012. 8. 23. 11:11

Semantic Search
R. Guha  IBM Research, Almaden rguha@us.ibm.com
Rob McCool  Knowledge Systems Lab, Stanford Stanford, CA, USA robm@ksl.stanford.edu
Eric Miller  W3C/MIT Cambridge, MA, USA em@w3.org

 

2. SEMANTIC SEARCH INTRODUCTION

 Semantic search is an application of the Semantic Web to search.

We believe that the addition of explicit semantics can improve search.

Semantic Search attempts to augment and improve traditional search results (based on Information Retrieval technology) by using data from the Semantic Web.

 

 Traditional Information Retrieval (IR) technology is based almost purely on the occurrence of words in documents.

Search engines like Google [9] augment this in the context of the Web with information about the hyperlink structure of the Web.

 

Navigational Searches: In this class of searches, the user provides the search engine a phrase or combination of words which s/he expects to find in the documents. There is no straightforward, reasonable interpretation of these words as denoting a concept. In such cases, the user is using the search engine as a navigation tool to navigate to a particular intended document.
We are not interested in this class of searches.

Example: A search query like “W3C track 2pm Panel” does not denote any concept. The user is likely just trying to find the page containing all these words.

 

Research Searches: In many other cases, the user provides the search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information. There is no particular document which the user knows about that s/he is trying to get to. Rather, the user is trying to locate a number of documents which together will give him/her the information s/he is trying to find. This is the class of searches we are interested in.

Example: Search queries like “Eric Miller” or “Dublin Ohio” denote a person or a place. The user is likely doing a research search on the person or place denoted by the query.

 

We have built two Semantic Search systems. The first system, Activity Based Search (ABS), provides Semantic Search for a range of domains, including musicians, athletes, actors, places and products.
The second system (W3C Semantic Search) is more focused and provides Semantic Search for the website of the World Wide Web Consortium (http://www.w3.org/).

 

 Both the Semantic Search application and these portions of the Semantic Web have been built on top of the TAP infrastructure.

 

This post was written in Springnote.

Posted by yeoshim


2012. 8. 23. 11:10
  1. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans

    http://www.ieeesmc.org/Newsletter/Current_Issue/index.php

  2. ACM TRANSACTIONS ON INFORMATION SYSTEMS
  3. IEEE TRANSACTIONS ON INFORMATION THEORY
  4. INFORMATION AND COMPUTATION (Elsevier)
  5. INFORMATION SCIENCES
  6. INFORMATION SYSTEMS
  7. IEEE INTELLIGENT SYSTEMS
  8.  

 

http://legoman.tistory.com/234

 

An effective Model and Scheme of Blog Space for Blog Search.

 

 

TAG journal, Paper, sci


2012. 8. 23. 11:10

Breadth-First Search Crawling Yields High-Quality Pages

Compaq Systems Research Center (2001)

 

PageRank is used to evaluate the quality of pages as they are crawled.

When traversing the web graph, breadth-first search is a good crawl strategy, because it finds high-quality pages early in the crawl.
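
To make the breadth-first strategy concrete, here is a minimal sketch of the order in which a breadth-first crawler would visit pages. The toy graph `toy_web` and the function name `bfs_crawl_order` are illustrative assumptions, not taken from the paper, and a real crawler would fetch pages over the network instead of reading an in-memory dictionary.

```python
from collections import deque


def bfs_crawl_order(graph, seed):
    """Return pages in the order a breadth-first crawler would visit them.

    graph maps each page to the list of pages it links to (hypothetical data).
    """
    visited = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()          # oldest discovered page first (FIFO)
        order.append(page)
        for target in graph.get(page, []):
            if target not in visited:   # enqueue each page at most once
                visited.add(target)
                queue.append(target)
    return order


# Hypothetical toy web graph: page -> outgoing links.
toy_web = {
    "home": ["news", "about"],
    "news": ["story1", "story2"],
    "about": ["home"],
    "story1": [],
    "story2": ["home"],
}

crawl_order = bfs_crawl_order(toy_web, "home")
print(crawl_order)
```

Pages one link away from the seed are visited before pages two links away, which is the property the note credits for reaching high-quality pages early.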

 

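The simplest approach one can think of is random selection. Scooter uses this approach.

The Internet Archive crawler crawls 64 hosts concurrently. However, this approach does not take page quality into account.

There may be many other strategies, but search companies do not disclose their crawl strategies, so very few are publicly known.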

 

The Intelligent Surfer:
Probabilistic Combination of Link and Content Information in PageRank

University of Washington

 

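Traditional web information retrieval techniques fail to produce satisfactory search results because of the sheer volume and diversity of information on the web.

To address this problem, research has exploited the information contained in the link structure between pages. The best-known algorithms of this kind are HITS and PageRank. These algorithms are based on the belief that a page with more links pointing to it is a better page.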
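
As an illustration of the link-based idea, the following is a minimal power-iteration sketch of PageRank. The damping factor of 0.85, the iteration count, the handling of pages without out-links, and the toy graph are all assumptions made for this example; the note itself does not specify them.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    graph maps each page to the list of pages it links to. Every link
    target is assumed to also appear as a key of graph.
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, links in graph.items():
            if links:
                # Split this page's rank evenly among its out-links.
                share = damping * rank[page] / len(links)
                for target in links:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: "c" has the most incoming links.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
best_page = max(ranks, key=ranks.get)
print(best_page, ranks)
```

In this toy graph the page with the most incoming links ends up with the highest score, which matches the belief described above.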

 

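The paper proposes a model that probabilistically combines page content and link structure in the form of an intelligent random surfer.

This model supports most query relevance functions in use today and produces better results than PageRank.

In exchange, it requires additional time and storage, but at a level that is acceptable for today's search engines.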
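
One way to read the intelligent-surfer idea is as a query-dependent PageRank in which both random jumps and link-following are biased toward pages relevant to the query. The sketch below follows that reading under stated assumptions: the `relevance` function (a simple term count), the damping value, the treatment of pages with no relevant out-link, and the toy data are all hypothetical and are not taken from the paper.

```python
def relevance(query, text):
    """Hypothetical relevance function: how often the query term occurs."""
    return text.lower().split().count(query.lower())


def query_dependent_pagerank(graph, texts, query, damping=0.85, iterations=50):
    """PageRank variant whose surfer prefers pages relevant to the query.

    Assumes each page lists a given link target at most once.
    """
    rel = {p: relevance(query, texts[p]) for p in graph}
    total = sum(rel.values())
    if total == 0:
        return {p: 0.0 for p in graph}   # no page mentions the query
    # Jump distribution: proportional to relevance over all pages.
    jump = {p: rel[p] / total for p in graph}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in graph}
        for page, links in graph.items():
            out_total = sum(rel[t] for t in links)
            if out_total == 0:
                # Assumption: with no relevant out-link, the surfer jumps.
                for p in graph:
                    new_rank[p] += damping * rank[page] * jump[p]
            else:
                # Follow each link with probability proportional to relevance.
                for t in links:
                    new_rank[t] += damping * rank[page] * rel[t] / out_total
        rank = new_rank
    return rank


# Hypothetical toy data.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_texts = {
    "a": "jazz music history",
    "b": "football scores",
    "c": "jazz jazz festival",
    "d": "jazz club listings",
}
jazz_ranks = query_dependent_pagerank(toy_graph, toy_texts, "jazz")
print(jazz_ranks)
```

A separate score vector has to be computed for each query term, which is where the extra time and storage mentioned above would come from in a scheme like this.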

 
