Multimedia Information Retrieval

An introduction to retrieval from image, video, and audio databases, from a statistical, pattern recognition, and algorithmic point of view.

  • Models of information retrieval
  • Nearest neighbors, range queries, hash algorithms
  • PLSA and topic models vector space models
  • Text categorization, analysis, tagging, parsing
  • Content based image and video retrieval
  • Image retrieval based on color, texture, and shape
  • Visual bag of words model
  • Grouping, static and motion segmentation, scene cuts
  • Geometric indexing, verification, and object recognition
  • Automatic annotation and categorization
  • HMM-based methods (audio, video, music, document retrieval)
  • Generative and discriminative methods
  • Selected special purpose applications, such as face detection, x-ray image analysis
  • Performance evaluation and competitions
  • Applications in consumer imaging, security, forensics, and copyright and plagiarism detection

Materials

Questions

Lecture 1

  • Name common commercial applications of multimedia information retrieval.
  • Describe specific, common multimedia information retrieval tasks (what do queries look like? what is stored in the database?).
  • Write a Python / NumPy / SciPy function to …
  • Define: precision, recall, 11-point P/R graph, F-measure, mean average precision, true/false positive/negative, ROC curve, DET curve
  • Describe the vector space model of text information retrieval.
  • Describe the process of tokenization.
  • Describe latent semantic analysis.

Lecture 2

  • Name and describe major image storage formats, their properties, and differences (e.g., PPM, PNG, JPEG, JPEG2000).
  • Name and describe common command line tools and Python libraries for image processing (e.g., ImageMagick, SciPy, PIL, etc.)
  • What is EXIF? What kind of information is stored in it?
  • Given Python expressions to load images, display them, and convert them to grayscale.
  • Describe JPEG compression.
  • How do you perform in-memory JPEG compression of an image given as a Python array?
  • What is an MD5 checksum? What is a common use for checksums in image databases?
  • How are images stored in relational databases?
  • What is the HDF5 database format? How does it differ from a relational database?
  • What is OpenCV?
  • How would you perform fade-to-black detection in a video?
  • How would you perform scene cut detection in a video?
  • What is keyframe detection? What is it used for?
  • What is optical flow? How does it differ from object motion?

Lecture 3

  • What can you say about the error rate of nearest neighbor classifiers?
  • How are nearest neighbor classifiers related to information retrieval applications?
  • Why are nearest neighbor classifiers particularly suitable for information retrieval applications (compared to other kinds of classifiers)?
  • What properties do dissimilarity functions commonly satisfy?
  • What are the requirements for a distance in a metric space?
  • What Python library functions are useful for performing nearest neighbor classification?
  • Name some common color spaces.
  • What is a color histogram?
  • How are color histograms used for image retrieval?
  • What is a perceptually uniform color space?
  • What is image texture?
  • How is image texture represented?
  • How do you segment images by their texture?

Lecture 4

  • What do clustering algorithms do?
  • Describe commonly used, different approaches to clustering.
  • Describe the k-means clustering algorithm.
  • What is Gaussian mixture modeling?
  • When does GMM work better than k-means clustering?
  • How can you determine the correct number of clusters in a clustering problem?
  • What is hierarchical clustering?
  • What is a dendrogram?
  • What is single-linkage clustering? What is complete linkage clustering? What is average linkage clustering?
  • How can you combine k-means clustering and PCA? Why would you?
  • What is “hierarchical tree VQ”? What is hierarchical clustering?

Lecture 5

  • What are image patches? How are they used in MMIR?
  • How are patches commonly represented?
  • How does compression by vector quantization work?
  • What is a VQ code histogram? What is the bag of visual words method?
  • What is interest point detection?
  • How are interest points used with patch descriptors?
  • Name commonly used interest point detectors.
  • Describe how you can tag/label images using a combination of interest point detection, patch descriptors, and logistic regression.
  • Motivate and describe the Harris corner detector.
  • Describe corner detection based on (1) median filters, (2) level curves, (3) orientation histograms, (4) morphological operations.
  • What is scale space? How is scale space used in the SIFT interest point detector?

Lecture 6

  • Describe the abstraction used for interest point detection and descriptors used in OpenCV.
  • Describe common applications for interest point detectors and descriptors.
  • Describe the demands that each of these applications places on interest point detectors / descriptors.
  • Name some commonly used interest point detectors and feature descriptors.
  • What are SIFT, SURF, MSER, Harris corners?
  • Give commonly desired properties for interest point detectors.
  • Describe how we can benchmark/test/evaluate interest point detectors.
  • Describe the SIFT feature descriptor.

Lecture 7

  • Describe the relationship between k-nearest neighbor classification and nonparametric density estimation.
  • What happens with nearest neighbor classification in high dimensions?
  • What are approximate nearest neighbor methods? Does such an approximation make sense in high dimensions?
  • What is the intrinsic dimension of a data set?
  • How can we determine the intrinsic dimension of a data set?
  • What is the covering dimension?
  • Describe a linear method for dimensionality reduction.
  • Name some common nonlinear dimensionality reduction techniques.
  • Given a video sequence of an object rotating in 3D, what is the intrinsic dimensionality of the video frames viewed as feature vectors?
  • Name some commonly used techniques for fast approximate nearest neighbor retrieval.

Lecture 8

  • Explain the Hough transform for finding lines / the generalized Hough transform.
  • Explain the RANSAC algorithm for finding lines / for finding general object instances.
  • Explain the RAST algorithm for finding lines / for finding general object instances.
  • Explain how 2D coordinates can be represented as complex numbers, and how common geometric transformations can be expressed as complex arithmetic.
  • Describe how you can perform 2D object recognition using interest point detectors and geometric matching.
  • Explain how SIFT or SURF descriptors can be used to improve / speed up geometric matching for object recognition.

Lecture 9

  • Explain segmentation by thresholding / adaptive thresholding.
  • Discuss how thresholding might be used for color segmentation / texture segmentation.
  • What is document binarization?
  • How can the k-means algorithm be used for simple color-based segmentations of images?
  • Which color space is better for performing color based segmentation, Lab or RGB? Why?
  • What is the basic idea behind edge-based segmentation?
  • Explain the watershed segmentation algorithm.
  • Explain the idea behind the random-walk based segmentation algorithm.
  • What are superpixels?
  • Describe a simple k-means based implementation of superpixel segmentation.
  • Describe the graph cut segmentation algorithm.
  • What are active contours?
  • Describe the GVF-based active contour segmentation algorithm.

Lecture 10

  • How is speech generated? What is the vocal tract? How does it generate sounds?
  • What are formants?
  • What is a spectrogram? How is it computed?
  • What is a window function?
  • What is the purpose of dynamic time warping?
  • Describe the dynamic time warping algorithm.
  • How can you perform speech recognition with dynamic time warping?
  • What is a Markov chain?
  • What is an n-gram?
  • Define: accessibility / communicating states / irreducibility / transience / recurrence / positive recurrence / absorption / ergodicity / reversibility

Lecture 11

  • What is a Hidden Markov Model? How is it specified?
  • How can OCR (optical character recognition) be carried out with an HMM?
  • How can speech recognition be carried out with an HMM?
  • Describe how a spectrogram or the image of a line of text is transformed (using k-means) into a symbol sequence suitable for modeling with an HMM.
  • Construct a simple HMM to produce this ____ letter.
  • What is a durational model?
  • What is the durational model for a single HMM state with a self-recurrence and one transition to another state?
  • How can you construct a durational model with an approximately normal distribution of durations out of an HMM?
  • What does the forward algorithm compute?
  • Describe the forward algorithm.
  • Why do we implement most HMM algorithms using log probabilities?
  • What is Viterbi decoding?
  • Describe the Viterbi decoding algorithm.
  • What is Viterbi training?
  • What does the forward-backward algorithm compute?
  • Describe the forward-backward algorithm.
  • Describe Baum-Welch reestimation.
  • What is the Bakis model?
  • Why does the Bakis model often give better results than an unconstrained model with the same number of states?
  • Why does the Bakis model result in better durational models than a model with the minimum number of states necessary to produce the output sequence?