Description
Tip of the tongue refers to the situation in which we have a vague idea of an object in memory but simply cannot name it. More often than not, we feel that retrieval of the object's name is imminent. Even without the name, we can still draw a doodle of the object when asked to. This project aims to retrieve real-world photos of the object from such a doodle. To do this, we investigate the design of neural network architectures and the learning algorithms used to train them.
We first identify that the retrieval should be feature-based. This means we need a representation for each image with which we can quantitatively measure the similarity between a pair of images, in our case a hand-drawn doodle and a potentially corresponding real-world photo. After learning a model that takes in images and produces meaningful representations of them, we construct an engine that takes a drawn doodle as input, extracts its representation, computes the similarity between the input and each image in the database, and finally outputs the top-k most similar images. Our project covers every stage of this pipeline, from data collection to building a retrieval engine that works on real-world data. A screenshot of our search page is presented below.
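The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not the project's actual engine: it assumes cosine similarity as the similarity measure (the text does not say which measure is used), and the function names, file names, and three-dimensional feature vectors are all hypothetical stand-ins for the output of the learned model.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k database entries most similar to the query, best first."""
    scored = (
        (name, cosine_similarity(query, features))
        for name, features in database.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Made-up features; in a real system these would come from the learned
# model applied to the database photos and to the input doodle.
database = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.7, 0.1],
    "car.jpg": [0.0, 0.2, 0.9],
}
doodle_features = [1.0, 0.2, 0.0]

results = top_k(doodle_features, database, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these toy vectors the doodle ranks closest to `cat.jpg`, then `dog.jpg`. Because the database features do not depend on the query, they can be computed once and stored, so that only the doodle has to pass through the model at search time.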
This is a top-scoring project in NUS CS4243 Computer Vision and Pattern Recognition, taught by Xavier Bresson.
Here are the poster and the presentation video (demo available!).