Context: With the success of multimedia as a channel of communication, the quantity of available data (not only on the Internet) has skyrocketed. What we would like to do now is answer a simple question: how can we find a precise piece of data?
At first glance, two main difficulties appear. The first is how to handle such a large amount of data (storing, transmitting, coding, etc.). The second is how to effectively find what we are looking for, knowing that the data is usually not semantically described. In fact, manual annotation is costly and therefore rare, and when it exists it is very often insufficient, incomplete, or inconsistent.
We therefore face the following paradox: on the one hand, we have a precise numeric visual description of the data (used, for instance, to display it on a screen); on the other hand, we have the semantic meaning of what is seen. Surprisingly, it is very difficult to connect the two. This problem is known as "the semantic gap". To address it, I propose two strategies:
- One option is to get around the gap by exploiting all sorts of data attached to the multimedia objects (for instance text, URL, file history, etc.) and then fusing these sources intelligently.
- The second possibility is to reduce the gap by using machine learning, that is, to try to discover the "relationship" that exists between signals and meaning. The idea is to let the "computer look with its own eyes". The intuition is that the computer can easily look for statistical regularities and find that the presence of a particular visual feature tends to indicate a particular semantic object or meaning. At the same time, we can try to decompose the problem into several steps by automatically structuring the data. The goal is to provide some order and structure by linking items that are similar (i.e. by clustering).
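To make the idea of "linking items that are similar" concrete, here is a minimal sketch of clustering with k-means, written in plain Python. It is only an illustration, not the method used in the research summarized here: the choice of k-means, the function names, and the toy two-dimensional "visual features" are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=20, seed=0):
    """Group feature vectors into k clusters (hypothetical helper)."""
    rng = random.Random(seed)
    # Start from k distinct items picked at random as initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: link each item to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids


# Toy data (assumed): six items described by two made-up visual features.
features = [
    (0.10, 0.20), (0.15, 0.22), (0.12, 0.18),
    (0.90, 0.85), (0.88, 0.90), (0.92, 0.87),
]
labels, centroids = k_means(features, k=2)
print(labels)
```

On this toy data the first three items end up sharing one label and the last three the other. No annotation is used at any point: the structure comes only from similarity between the numeric descriptions.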
Perspectives and future work: Tackling the multimedia domain with artificial intelligence opens up several fundamental and practical perspectives. This page is only a quick summary of the research performed on this subject. If you are interested in more details, contact me directly by e-mail.
A document in French describing my work and some perspectives in more detail is available by clicking here: [HDR]