Cross-media Indexing for Multimedia Information Retrieval



Crestani F.



This project will address the issue of cross-media indexing for multimedia information retrieval. Cross-media indexing enriches and augments the indexing of multimedia information, such as videos or web documents, by linking together the indexing information obtained from each of the single media that compose a specific multimedia item (e.g. a video story segment or a web page). Past research has shown that cross-media indexing can greatly enhance the value of single-media indexing for multimedia documents by cross-linking information found in different media types within an uncertain evidence framework. For example, a video segment can contain speech, music, images, text (e.g. captions, subtitles), faces, and so on, and different indexing information characterising the content of the segment can be extracted from each of these media. Speech can be transcribed into text and used to partially identify the topical content of the video segment by extracting index terms or facts; it can also be used, when possible, to identify the speaker or, in some cases, to discern the speaker's emotions. Images can be used to characterise the scenes (e.g. indoor vs. outdoor, urban landscape vs. countryside) or to identify specific objects or buildings appearing in the video. Other media can provide additional information.
However, speech recognition, image and face recognition, and single-media analysis technologies in general are far from perfect and often produce errors. The power of cross-media indexing lies in linking the information provided by the analysis of the different single media, so that errors produced by one single-media processor can be compensated for by the results of the other analysers, yielding a more precise and more comprehensive indexing of the video segment.
Indexing features extracted by the different single-media processors can be combined to reinforce or compensate for each other, so that detection or recognition errors can be recovered from. This requires mathematical models for combining multiple pieces of uncertain evidence, and these models must be assessed within a proper evaluation framework. This project aims to investigate different mathematical models of cross-media indexing and to evaluate them using a purpose-built test collection of multimedia material (videos) in which each single-media component, as well as the multimedia material as a whole, has been indexed and assessed for relevance separately and independently.
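
As a purely illustrative sketch of what combining uncertain evidence across media could look like, the Python fragment below merges index terms produced by several hypothetical single-media analysers using a noisy-OR rule. This is not the model investigated by the project, which the description leaves open. The rule, the function names, the reliability weights and the example values are all assumptions chosen for illustration.

```python
from math import prod


def noisy_or(evidence):
    """Combine (confidence, reliability) pairs for one index term.

    Assumption: the analysers err independently, so the term is missed
    only if every analyser's discounted evidence fails to support it.
    """
    return 1.0 - prod(1.0 - conf * rel for conf, rel in evidence)


def combine_index(media_evidence):
    """Merge per-medium term evidence into one cross-media index.

    media_evidence maps medium -> {term: (confidence, reliability)}.
    Returns {term: combined belief in [0, 1]}.
    """
    by_term = {}
    for terms in media_evidence.values():
        for term, pair in terms.items():
            by_term.setdefault(term, []).append(pair)
    return {term: noisy_or(pairs) for term, pairs in by_term.items()}


# Hypothetical analyser outputs for one video segment (invented values).
segment = {
    "speech": {"election": (0.6, 0.8)},    # noisy speech transcript
    "captions": {"election": (0.9, 0.9)},  # on-screen text
    "image": {"outdoor": (0.7, 0.5)},      # scene classifier
}

combined = combine_index(segment)
for term, belief in sorted(combined.items()):
    print(f"{term}: {belief:.4f}")
```

In this sketch a term supported by two media ends up with a higher combined belief than either medium gives alone, which corresponds to the "reinforcing" case. Noisy-OR cannot lower a belief when media disagree, so modelling the "compensating" case would need a richer framework, for example one that represents conflicting evidence explicitly.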

Additional information

Start date
End date
42 Months
Funding sources
Swiss National Science Foundation / Project Funding / Mathematics, Natural and Engineering Sciences (Division II)