To model the evidence contained within the MPEG-7 documents, and specifically the structural and unusual conceptual characteristic of evidence inheritance, we adopt the Inference Network model for IR as developed by Turtle [18]. The Inference Network (IN) model can produce a ranking from many sources of evidence by combining that evidence. The IN model is essentially a Bayesian network used to model the documents, the document contents, and the query. The IN consists of two sub-networks: the Document Network (DN), which is produced during indexing and remains static during retrieval, and the Query Network (QN), which is produced from the query text during retrieval.
The DN represents the document collection and consists of nodes for each document (called document nodes) and nodes for each concept within the collection (document concept nodes). The document nodes represent the retrievable units within the network, that is, those items we wish to see in the resultant ranking. A causal link between a document node and a document concept node indicates that the document content is represented by the concept. Each link carries a conditional probability, or weight, to indicate the strength of the relationship. A node is evaluated using the values of its parent nodes and the conditional probabilities.
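As a rough illustration of how a document concept node might be evaluated from its parents, the following minimal sketch stores the DN as a mapping from documents to weighted concept links. The document identifiers, concept names, weights, and the weighted-sum rule are all assumptions made for the example; they are not taken from the model definition in [18].

```python
# Hypothetical toy document network: document -> {concept: link weight}.
# The weights stand in for the conditional probabilities on each link.
DOC_NETWORK = {
    "d1": {"goal": 0.8, "crowd": 0.4},
    "d2": {"crowd": 0.6},
}


def concept_belief(concept, doc_values, network=DOC_NETWORK):
    """Belief in a concept node given the values of its parent document nodes.

    Assumption: a plain weighted sum over the parents. This only yields a
    valid probability when at most one document node is set to 1.
    """
    return sum(
        value * network[doc].get(concept, 0.0)
        for doc, value in doc_values.items()
    )


# Belief in "goal" when d1 is the only active document node.
belief = concept_belief("goal", {"d1": 1.0, "d2": 0.0})
```

A dictionary keyed by document keeps the sketch close to the description above, where links run from document nodes to concept nodes; a real index would more likely be inverted, keyed by concept.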
The QN represents the submitted query and consists of a framework of nodes that represent the required concepts (query concept nodes) and the operators (query operator nodes), connected in an inverted tree structure. The QN is constructed with a final leaf node I that represents the user Information Need. The framework permits statistical operators and statistical approximations of the Boolean operators, a number of which are given in Table 1 (as in the INQUERY implementation [2]).
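To make the operator nodes more concrete, the sketch below gives the closed forms commonly quoted for INQUERY-style operators, where each p is the belief of a parent node. It is an illustration under that assumption and not a reproduction of Table 1; the function names are hypothetical.

```python
import math


def bel_not(p):
    """Negation: belief that the parent is false."""
    return 1.0 - p


def bel_and(ps):
    """Approximate Boolean AND: product of the parent beliefs."""
    return math.prod(ps)


def bel_or(ps):
    """Approximate Boolean OR: one minus the probability that all parents are false."""
    return 1.0 - math.prod(1.0 - p for p in ps)


def bel_max(ps):
    """Maximum of the parent beliefs."""
    return max(ps)


def bel_sum(ps):
    """Unweighted mean of the parent beliefs."""
    return sum(ps) / len(ps)


def bel_wsum(ps, ws):
    """Weighted mean of the parent beliefs, normalised by the total weight."""
    return sum(w * p for w, p in zip(ws, ps)) / sum(ws)
```

Each function maps parent beliefs in [0, 1] to a belief in [0, 1], so operator nodes can be nested freely within the query tree.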
Two further processes are needed to perform retrieval: the attachment process, whereby the QN is attached to the DN wherever the concepts in both networks are the same, forming the complete IN; and the evaluation process, whereby the complete IN is evaluated for each document node to obtain its probability of relevance to the query. The evaluation is initialised by setting the output of one document node to 1 and all the other document nodes to 0. This is done for each document node in turn and the network is evaluated (see [19] for exact detail and examples of how nodes are evaluated). The probability of document relevance is taken from the final node I and is used to produce the ranking.
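The attachment and evaluation steps can be sketched as follows. This is a simplified illustration, not the procedure detailed in [19]: the toy network, the query, the operator set, and the rule that a concept's belief equals the link weight from the single active document are all assumptions made for the example. Attachment is modelled simply as matching query concept names against document concept names.

```python
import math

# Hypothetical document network: document -> {concept: link weight}.
DOC_NETWORK = {
    "d1": {"goal": 0.8, "crowd": 0.4},
    "d2": {"goal": 0.3, "crowd": 0.6, "referee": 0.5},
    "d3": {"crowd": 0.9},
}

# Hypothetical query network as a nested tuple: (operator, child, child, ...).
# Leaves are concept names; the root plays the role of the node I.
QUERY = ("and", "goal", ("or", "crowd", "referee"))

# A small assumed operator set using product-based AND and OR.
OPERATORS = {
    "and": math.prod,
    "or": lambda ps: 1.0 - math.prod(1.0 - p for p in ps),
    "sum": lambda ps: sum(ps) / len(ps),
}


def evaluate(node, concept_beliefs):
    """Evaluate a query node recursively from the beliefs of its parents."""
    if isinstance(node, str):
        # Attachment: a query concept picks up the belief of the document
        # concept with the same name, or 0 if the collection lacks it.
        return concept_beliefs.get(node, 0.0)
    op, *children = node
    return OPERATORS[op]([evaluate(child, concept_beliefs) for child in children])


def rank(doc_network, query):
    """Score every document by activating it alone, then sort by score."""
    scores = {}
    for active in doc_network:
        # One document node is set to 1 and all others to 0, so only the
        # links leaving the active document contribute to concept beliefs.
        concept_beliefs = dict(doc_network[active])
        scores[active] = evaluate(query, concept_beliefs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank(DOC_NETWORK, QUERY)
```

Because each document is activated in isolation, the loop evaluates the query once per document node, which mirrors the "each document node in turn" procedure described above.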