University of Twente Student Theses

Optimizing XML information retrieval query execution at the physical level

Os, Roel van (2007) Optimizing XML information retrieval query execution at the physical level.

Abstract: XML is emerging as a standard format for information interchange and storage of structured information. The widespread use of XML has sparked the interest of both the database and information retrieval research communities. XML databases are designed to store and query large volumes of XML data. Structured information retrieval, or XML-IR, is the application of information retrieval concepts and techniques to search structured data, usually in the form of documents in XML format. The PF/Tijah XML information retrieval (XML-IR) system combines the expressive power of the XML Query language (XQuery) with techniques for structured information retrieval. PF/Tijah provides an extension, based on the TIJAH XML-IR research system, to the Pathfinder XML database. Similar to traditional database systems, the PF/Tijah extension is structured in three layers. The conceptual level deals with the user’s search request in the form of NEXI queries. The logical level deals with these queries expressed in the Score Region Algebra (SRA). The physical level provides implementations of the SRA operators on top of the MonetDB open source database kernel. This thesis examines the physical level implementation of the PF/Tijah XML-IR system. It demonstrates the implementation of optimized IR primitives on top of the MonetDB relational database kernel and investigates the influence of intermediate result size reduction on efficiency and retrieval effectiveness. Small-scale tests of the individual SRA operators, combined with large-scale experiments based on the INEX 2004 and 2005 evaluation initiative methods, show that large performance improvements can be achieved with only a limited reduction in retrieval effectiveness.
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)
Link to this item: https://purl.utwente.nl/essays/794