AVES

## Introduction

Relevance feedback, the application of human feedback regarding the performance of information retrieval systems, has been a topic of study in the field of information retrieval for almost 50 years. Starting with textual document retrieval, the field has since expanded to (content-based) multimedia retrieval (such as audio, images, and video) as well. In the current decade, we are confronted with more and more multimedia content. The high rate at which open sensor networks are being set up and the increasing rate at which sensor data becomes available expose the need to perform semantic searches over these large heterogeneous datasets.

One of the challenges this brings is bridging the semantic gap: the discrepancy between the high-level query posed by the user in natural language and the low-level sensor data features. One way of narrowing this gap is to include the user in the information retrieval system. To identify what the user is looking for, the system can iterate a number of times over its results, using the user's feedback on which returned results are relevant to the query and which are irrelevant. Adapting to the user's feedback, the system can refine the search query and retrieve more relevant results.

In recent years, relevance feedback in information retrieval has been an increasingly active field of research. In this section we give a brief overview of the status quo. One of the best-known and most widely applied relevance feedback algorithms, with its origins in text retrieval, is the Rocchio algorithm. The classic Rocchio algorithm seeks the optimal query based on the user's initial query and on relevance feedback consisting of positive and negative documents. This method of relevance feedback found its way from text retrieval into many other domains, including video retrieval.
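The Rocchio update described above can be sketched in a few lines of Python. This is a minimal illustration of the textbook formulation, in which the new query is a weighted sum of the original query, the centroid of the relevant documents, and the (subtracted) centroid of the nonrelevant documents. The function name `rocchio`, the plain-list vector representation, and the default weights `alpha`, `beta`, and `gamma` are assumptions made for this example and are not taken from the AVES implementation.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update (illustrative sketch, not the AVES code).

    query       -- list of floats, the current query vector
    relevant    -- list of vectors the user marked as relevant
    nonrelevant -- list of vectors the user marked as not relevant
    The default weights are commonly quoted values, assumed here.
    """
    def centroid(vectors):
        # Mean vector of a set; a zero vector if the set is empty.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_centroid = centroid(relevant)
    non_centroid = centroid(nonrelevant)

    # Move the query towards the relevant centroid and away from the
    # nonrelevant centroid.
    return [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, rel_centroid, non_centroid)
    ]


if __name__ == "__main__":
    updated = rocchio(
        query=[1.0, 0.0],
        relevant=[[1.0, 1.0], [0.0, 1.0]],
        nonrelevant=[[0.0, 2.0]],
    )
    print(updated)
```

Each feedback round would call such a function again with the updated query and the newly marked documents, so the query drifts step by step towards what the user considers relevant.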

Relevance feedback can be performed automatically, without any user interaction, or manually, by allowing a user to mark results as relevant or not relevant in varying degrees. The first branch, also called pseudo-relevance feedback, has piqued the interest of a large number of researchers and software engineers because it does not rely on users, though user-driven relevance feedback remains relevant to this day. Whereas in user-driven relevance feedback the user has to iterate over the retrieved results a number of times, this is not necessary in a (well-performing) pseudo-relevance feedback system. This makes it attractive to implement, since the user often lacks the time or motivation to give extensive feedback. Even so, human relevance feedback is known to provide major improvements in precision for information retrieval systems.
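As a rough illustration of the automatic branch, pseudo-relevance feedback typically assumes that the top-ranked results of the initial query are relevant and uses them as if a user had marked them. The sketch below does this for a dictionary of retrieval scores. The function name `pseudo_relevance_labels`, the parameter `k`, and the choice to also treat the bottom-ranked results as nonrelevant are assumptions for this example only.

```python
def pseudo_relevance_labels(scores, k=3):
    """Derive feedback labels without a user (illustrative sketch).

    scores -- dict mapping a document id to its retrieval score
    k      -- number of documents to label on each side (assumed parameter)

    Returns (pseudo_relevant, pseudo_nonrelevant) as lists of ids.
    """
    # Rank document ids from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)

    # Never label more than half of the results on either side, so the
    # two sets cannot overlap.
    k = min(k, len(ranked) // 2)
    if k == 0:
        return [], []

    pseudo_relevant = ranked[:k]       # assumed relevant: top of the ranking
    pseudo_nonrelevant = ranked[-k:]   # assumed nonrelevant: bottom of the ranking
    return pseudo_relevant, pseudo_nonrelevant


if __name__ == "__main__":
    example_scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
    print(pseudo_relevance_labels(example_scores, k=1))
```

The returned sets can then be fed into any feedback method that expects positive and negative examples, without a user in the loop.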

## Research

In this research we evaluated a novel method of multimedia retrieval that incorporates relevance feedback using concept detector weight calibration. The method proposed in this research (AVES) is based on work by Rocchio. The Rocchio algorithm has proven to be a strong method of relevance feedback for text retrieval, and we believe its application to multimedia retrieval is promising and worth investigating. We simulated user relevance feedback (optimal, pseudo-, and random relevance feedback) to evaluate the model's performance. We also evaluated the model's performance using human relevance feedback. Retrieval without relevance feedback is taken as a baseline, and the well-performing RS method is used for comparison. Giacinto and Roli introduced this method, which takes into account not only a document's relevance with respect to the other documents, as is usual, but also its irrelevance. Their formula for a document's relevance takes the inverse of one plus the ratio of the document's distance to the nearest relevant document to its distance to the nearest irrelevant document (see Equation 1.1 below). This score, named the relevance score (RS), is then used as the ranking measure.

\begin{align} RS(I) = \left(1+\frac{dR(I)}{dNR(I)}\right)^{-1}\text{, with} \tag{1.1} \newline I = \text{Image} \nonumber \newline dR = \text{Dissimilarity from nearest image in R} \nonumber \newline dNR = \text{Dissimilarity from nearest image in NR} \nonumber \newline R = \text{Set of relevant videos} \nonumber \newline NR = \text{Set of nonrelevant videos} \nonumber \end{align}
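Equation 1.1 can be computed directly once a dissimilarity measure is chosen. The sketch below is a minimal illustration that assumes items are feature vectors and uses Euclidean distance as the dissimilarity. The function names, the Euclidean choice, and the handling of the case `dNR = 0` are assumptions for this example, not details taken from Giacinto and Roli or from the AVES code.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used here as an assumed dissimilarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relevance_score(item, relevant, nonrelevant, dissimilarity=euclidean):
    """RS(I) = (1 + dR(I) / dNR(I))^-1, following Equation 1.1.

    item        -- feature vector of the item to score
    relevant    -- non-empty list of vectors marked relevant (R)
    nonrelevant -- non-empty list of vectors marked nonrelevant (NR)
    """
    d_r = min(dissimilarity(item, r) for r in relevant)       # dR(I)
    d_nr = min(dissimilarity(item, n) for n in nonrelevant)   # dNR(I)

    if d_nr == 0.0:
        # The item coincides with a nonrelevant example; the ratio is
        # unbounded, so the score is taken as 0 (a convention assumed here).
        return 0.0
    return 1.0 / (1.0 + d_r / d_nr)


def rank_by_relevance_score(items, relevant, nonrelevant):
    """Return the items sorted from highest to lowest RS."""
    return sorted(
        items,
        key=lambda item: relevance_score(item, relevant, nonrelevant),
        reverse=True,
    )


if __name__ == "__main__":
    print(relevance_score([0.0, 0.0], [[3.0, 4.0]], [[0.0, 5.0]]))
```

An item that is as close to the nearest relevant example as to the nearest nonrelevant one scores 0.5. The score approaches 1 as the item nears a relevant example and approaches 0 as it nears a nonrelevant one.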

## Implementation

The current implementation has been dubbed AVES (Automatic Video Event Search). The system is set up as a Python app using Flask (a lightweight web framework based on Werkzeug and Jinja2). To render the pages it uses Blueprints and a number of small (optimization/minification) tools for static asset preprocessing, which are set up as a Gulp pipeline. Nunjucks is used as the templating engine, and SocketIO lets the app communicate changes to and from the backend quickly over websockets.
