The Query System
A major challenge for end users today (scientists, researchers, policymakers, and others) is posing integrative queries over Web data sources. It is fairly easy to enter a keyword search into Google (or Bing, Yahoo, etc.) and receive a satisfactory answer if one's information need matches that of someone in the past: the search engine will point to a Web page containing the already-assembled content.
The difficulty arises when the user poses an information discovery query: one that requires assembling content from multiple data items and has not previously been posed. The Q System attempts to provide an intuitive means of posing such queries.
In Q, the user first defines a Web query form to answer queries related to a specific topic or topic domain. This is done by describing, using keywords, the set of concepts that need to be interrelated. The system finds sets of data sources related to each of these concepts. Then, using automatic schema matching algorithms, it finds ways of combining source data items to return results.
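As a rough illustration of the pipeline just described, the sketch below models sources as nodes in a weighted graph, matches keywords to sources, and picks the cheapest chain of joins connecting two matched sources. Everything in it is an assumption made for illustration: the relation names, descriptions, edge costs, the substring-based `match_relations`, and the use of a plain shortest-path search are hypothetical and are not taken from the Q System itself.

```python
import heapq

# Hypothetical sources: relation name -> short textual description.
RELATIONS = {
    "gene": "gene symbol name",
    "encodes": "gene protein link",
    "protein": "protein sequence",
    "disease": "disease disorder name",
    "publication": "paper citation",
}

# Hypothetical candidate joins, e.g. proposed by schema matching.
# Lower cost means the join is considered more trustworthy.
EDGES = {
    ("gene", "encodes"): 0.2,
    ("encodes", "protein"): 0.2,
    ("protein", "disease"): 0.4,
    ("gene", "publication"): 0.9,
    ("publication", "disease"): 0.9,
}

def match_relations(keyword):
    """Toy keyword match: substring test against each description."""
    return [name for name, text in RELATIONS.items() if keyword.lower() in text]

def neighbors(node):
    """Yield (adjacent relation, join cost) pairs; edges are undirected."""
    for (a, b), cost in EDGES.items():
        if a == node:
            yield b, cost
        elif b == node:
            yield a, cost

def cheapest_join_path(source, target):
    """Dijkstra's algorithm; returns (total_cost, [relations]) or None."""
    frontier = [(0.0, source, [source])]
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, edge_cost in neighbors(node):
            if nxt not in settled:
                heapq.heappush(frontier, (cost + edge_cost, nxt, path + [nxt]))
    return None

start = match_relations("gene")[0]
goal = match_relations("disorder")[0]
total_cost, join_path = cheapest_join_path(start, goal)
print(join_path, total_cost)
```

With these made-up costs, the search prefers the chain through `encodes` and `protein` over the shorter but more expensive route through `publication`. A query that relates more than two concepts would need a tree connecting all matched sources rather than a single path.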
Of course, depending on the specific application, the user may find certain sources or data perspectives to be more valuable than others. Through feedback on the query output, the Q System learns to adjust its relevance function to match the user's context and information need. Additionally, the automatic schema matching algorithms may sometimes suggest nonsensical ways of combining source data. In this case, Q can also take user feedback on the bad answers and determine how best to repair the bad matchings.
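One way to picture feedback-driven adjustment of a relevance function is an online update over per-join weights. The sketch below is a simplified margin-based update written for illustration only; it is not the algorithm Q uses. It assumes (hypothetically) that each result is a set of join edges, that a result's cost is the sum of its edge weights, and that the user's feedback says one result should rank above another.

```python
def cost(result, weights):
    """Cost of a result = sum of the weights of the join edges it uses."""
    return sum(weights[edge] for edge in result)

def apply_feedback(weights, preferred, demoted, margin=0.1):
    """Return new weights so that `preferred` is cheaper than `demoted`
    by exactly `margin`, spreading the change evenly over the edges on
    which the two results differ. Weights are not clamped in this sketch."""
    loss = cost(preferred, weights) - cost(demoted, weights) + margin
    differing = preferred ^ demoted          # edges used by only one result
    if loss <= 0 or not differing:
        return dict(weights)                 # ranking already satisfied
    step = loss / len(differing)
    updated = dict(weights)
    for edge in preferred - demoted:
        updated[edge] -= step                # make the preferred result cheaper
    for edge in demoted - preferred:
        updated[edge] += step                # make the demoted result costlier
    return updated

# Hypothetical edge weights and results.
weights = {"a": 0.2, "b": 0.3, "c": 0.4, "d": 0.5, "e": 0.3}
demoted = frozenset({"a", "b"})      # currently ranked first, user rejects it
preferred = frozenset({"c", "d"})    # user wants this ranked higher
other = frozenset({"c", "e"})        # a result the user never touched

new_weights = apply_feedback(weights, preferred, demoted)
print(cost(preferred, new_weights) < cost(demoted, new_weights))
print(cost(other, weights), cost(other, new_weights))
```

Because `other` shares edge `c` with the preferred result, its cost changes too, even though the user gave no feedback on it. The update generalizes beyond the single result the user touched.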
We are currently studying the following problems.
- When the user gives feedback on query results, this is in fact a form of the view update problem, where we seek to adjust either a tuple’s score or its values. Traditionally the database literature has considered view update to be meaningful only when “side effects” are prohibited, i.e., when the feedback affects *only* the tuple the user has manipulated. Learning algorithms like MIRA, in contrast, seek to *produce* side effects, in that other related tuples may *also* receive feedback. A major focus of our effort is on unifying the concepts of view update and feedback-based learning.
- We are considering how best to “bootstrap” the process of assigning scores to relations and tuples that we have not previously seen.
- Finally, we need mechanisms for ensuring *efficient computation* of top-k query results, both when the user initially poses a query and after they subsequently provide feedback.
- Marie Jacob and Zachary G. Ives. Sharing Work in Keyword Search over Databases. SIGMOD 2011.
- Partha Pratim Talukdar, Zachary G. Ives, Fernando Pereira. Automatically Incorporating New Sources in Keyword Search-Based Data Integration. SIGMOD 2010.
- Zachary G. Ives, Todd J. Green, Grigoris Karvounarakis, Nicholas E. Taylor, Val Tannen, Partha Pratim Talukdar, Marie Jacob, Fernando Pereira. The Orchestra Collaborative Data Sharing System. ACM SIGMOD Record, September 2008.
- Partha Pratim Talukdar, Marie Jacob, M. Salman Mehmood, Koby Crammer, Zachary G. Ives, Fernando Pereira, and Sudipto Guha. Learning to Create Data-Integrating Queries. VLDB 2008.