Privacy, Identity, and Computational Forensics

The course covers methods for the computational analysis of digital data and physical evidence, with an emphasis on statistical, pattern recognition, and machine learning techniques. We will examine how personal information and identity can be leaked and what techniques exist for protecting personal information.

Materials

Questions

  • face recognition

    • Give commercial applications of face detection.
    • Give examples of companies that are using face detection.
    • What are the limits of forensic applications of face detection?
    • What were the goals and results of the BKA face recognition experiment in Mainz?
    • How can face recognition systems be fooled?
    • What is the difference between impersonation and camouflage?
    • List sources of variation that a face recognition system has to cope with.
    • What do experiments on human face recognition tell us about machine face recognition?
      • low resolution/blurred
      • recognition from contour information
      • holistic vs parts-based
      • contribution of eyebrows, eyes, facial shape
      • ability to cope with geometric distortions
      • ability to cope with caricature (deviations from the mean)
      • relative contributions of shape vs pigmentation
      • recognition from negative images
      • ability to cope with lighting variations
      • temporal association of facial views in motion sequences
      • contributions of facial motion
      • innate ability to recognize faces
      • familiarity of race
    • What are the basic categories of algorithms?  Describe briefly how they work.
      • feature-based
      • eigenfaces
  • errors in biometric systems

    • What are the kinds of errors that occur in biometric systems?
      • false positive/negative
      • type I/II
    • What is an ROC curve?
      • TPR vs FPR
    • What is a DET curve?
    • What is a training set and what is a test set?
    • How does test set performance relate to training set performance?
    • What is overtraining?
    • How is overtraining detected?
    • What is a wizard-of-Oz experiment?
  • hand-based biometrics

    • What other physical biometric identifiers are frequently used in addition to faces?  Name five.

    • What biometric elements does the new German biometric passport use?

    • fingerprints

      • Discuss the advantages and disadvantages of using fingerprints as biometric identifiers.

      • How are fingerprints commonly acquired as a biometric identifier?

      • What visually distinctive elements are used in fingerprint analysis?

      • What are the basic patterns of fingerprints?

      • What are the commonly used minutiae?

      • What are commonly used methods for spoofing fingerprints?

      • What does Schaeuble’s fingerprint tell us about the security of fingerprints as an authentication method?

      • What is liveness detection and why does it matter?

      • Give examples of methods for liveness detection in fingerprint recognition systems.

    • hand shape

      • What is hand shape recognition?
      • How does hand shape recognition compare to fingerprint recognition?
      • What are some of the advantages and disadvantages of hand shape recognition?
    • hand veins

      • What is hand vein recognition?
      • How are hand veins acquired?
      • What are some advantages and disadvantages of hand vein recognition?
  • behavioral biometrics

    • What is behavioral biometrics?
    • How does behavioral biometrics differ from physical biometrics?
    • Give examples of behavioral biometric methods.
    • speaker recognition
      • Categorize different speaker recognition tasks.
        • identification / verification
        • environment
        • text dependency
        • cooperativity
      • Categorize variabilities in speaker recognition systems.
        • between speaker
        • within speaker
      • What is a formant?
      • How are speech sounds formed physically?
      • How do we detect formants?
      • What is the short-time Fourier transform?
      • Why do we use windowing with the short-time Fourier transform?
      • Identify this spectrum ______ as a vowel/stop/fricative.
      • Define phone, phoneme, allophone.
      • What is the significance of allophones for speaker identification?
      • What is a dialect?  What is an accent?
      • What is prosody?
      • How does GMM-based speaker identification work?
      • How does DTW-based speaker identification work?
      • How does MLP-based speaker identification work?
      • How does prosody-based speaker identification work?
      • How do GMM and DTW-based speaker identification differ from the user’s point of view?
    • writer identification
      • Describe a SOM-based approach to writer identification.
  • CAPTCHA

    • What is a Turing test?
    • What is a reverse Turing test?
    • What does CAPTCHA stand for?
    • What are CAPTCHAs used for?
    • What is the Loebner Prize awarded for?
    • What are the primary considerations in designing a text-based CAPTCHA?
    • Give an example of an image-based CAPTCHA.
    • What is a replay attack on a CAPTCHA?
    • What is a man-in-the-middle attack on a CAPTCHA?
    • What is Amazon’s Mechanical Turk?
    • Describe the Mori-Malik attack on CAPTCHAs.
    • What is Pessimal Print?
    • What is BaffleText?
    • What is ScatterType?
    • What is reCAPTCHA?
    • What is an implicit CAPTCHA?
    • How do CAPTCHAs relate to accessibility?
    • How do visual passwords work?
  • steganography

    • Define: data embedding, data hiding, steganography, watermarking, tamper proofing, covert communications.

    • What are applications of data hiding? List and explain.

      • copyright infringement detection
      • copy prevention
      • proof of ownership
      • canary trap
      • broadcast monitoring
      • covert communications
      • metadata embedding

    • What are examples of data hiding? List and explain.

      • DivX
      • yellow dots
      • data glyphs

    • What are the main dimensions along which we evaluate data hiding? List and explain.

      • perceptibility / fidelity
      • capacity
      • robustness
        • fragile
        • semi-fragile
        • robust

    • What are the main attack types on data hiding? List and explain.

      • active attacks
      • passive attacks
      • collusion attacks
      • forgery attacks

    • Give examples of steganography.

      • image modification
      • hiding in encrypted data
      • chaffing / winnowing
      • executable file redundancy
      • modifying packet delay times

    • Explain the information-theoretic view of steganography.

    • What is the relationship between compression, encryption, and steganography?  Explain.

    • What is LSB insertion?

    • What is the steghide algorithm?

    • What is the patchwork algorithm?

    • What is texture block coding?

  • camera and printer forensics

    • What are the goals of digital camera forensics? List and explain.
      • source identification
        • camera model
        • camera instance
      • forgery detection
    • cameras
      • Outline the image formation and processing stages of a digital camera.
      • For each stage of digital image capture, identify forensically relevant properties.
      • Explain classifier-based camera identification.  Give examples of features.
    • camera forensics
      • What are common image manipulations and how do they affect different camera forensic methods?

      • Explain how lens imperfections are used for camera identification.  Explain limitations of the approach.

      • Explain how dust is used for camera identification.

        • How do we proceed if we do/don’t have the camera to be identified?
        • What is the source of the information?
        • How does that work with cropping? Rescaling?
        • How many pictures do we need?
        • How can contour models be used to improve camera forensics based on dust?

      • What are sources and characteristics of sensor noise in cameras?  What are the two major classes?  List and explain.

        • reduced by frame averaging
          • shot noise
          • circuit noise
        • not reduced by frame averaging
          • fixed pattern noise
          • photoresponse non-uniformity

      • What is dark frame subtraction?

      • What is flat-fielding?

      • Explain the mathematical model behind noise-based camera identification.

      • How can noise-based camera identification be forged?

      • How can sensor noise be used for tamper detection?

      • What is the CFA algorithm?  Why do digital cameras have it?

      • How can the CFA algorithm be used in camera forensics?

        • camera model identification
        • NOT camera instance identification
        • tamper detection
    • tamper detection
      • copy-move detection
        • What is copy-move detection?
        • How can copy-move detection be implemented efficiently?
        • How can autocorrelation be used for copy-move detection?
        • How can autocorrelation be computed efficiently?
      • JPEG double quantization
        • Explain how JPEG double quantization can be used for tamper detection.
        • Under what circumstances does JPEG double quantization fail to detect tampering?
      • blocking artifacts
        • What are JPEG blocking artifacts?
        • How can JPEG blocking artifacts be used for tamper detection?
      • chromatic aberration
        • What is chromatic aberration?
        • Why does chromatic aberration occur?
        • How can chromatic aberration be used for tamper detection?
      • lighting
        • What are lighting inconsistencies?
        • How can lighting inconsistencies be used to detect image tampering?
        • At what locations in the image is it easiest to estimate lighting direction?
  • stylometry

    • Give practical examples of applications of stylometry.
    • What kinds of distinctions is stylometry used for?
      • 1-vs-few
      • 1-vs-many
      • authorship verification
      • determine attributes of author
      • put texts in chronological order
    • What kinds of writer attributes might be recoverable using stylometry?
      • gender
      • personality
      • sentiment
      • rhetorical style
      • origin
      • political orientation
      • bilingualism
      • educational level
      • age
      • translation from…
      • translation by…
    • What are challenges to stylometry?
      • different styles, same author
      • active concealment
      • imitation of styles for reasons other than concealment
      • heterogeneity over time
      • style variations based on genre and content
    • What is an embedded cipher?
    • What discriminators / features are frequently used for stylometry? What are their properties?
      • words, sentence lengths
      • vocabulary richness
      • word ratios
      • hapax legomena
      • letter-based measures
      • parts-of-speech measures
      • layout
      • n-grams
      • syntactic annotation
    • What classifiers are commonly used for stylometry?
      • linear discriminants
      • Bayesian methods
      • univariate analysis
      • cusum charts
      • multivariate analysis
      • cluster analysis
      • neural networks
      • genetic algorithms
      • PCA
    • What is the idea behind compression-based stylometry?  Explain.
    • What kinds of attacks exist on stylometric methods?
    • What are “manual” attacks on stylometric methods?  How well do they work?
    • What are “assisted” attacks on stylometric methods?  How well do they work?
  • computer forensics

    • What does a computer forensic examiner do?
    • What is live vs. dead computer forensics?  What are the advantages/disadvantages of each?
    • What is logical vs. physical computer forensics?
    • What is deniable encryption?  How does it affect forensics?
    • What are standard operating procedures?
    • Describe a commonly used standard operating procedure for collecting computer forensic evidence.
    • What are concepts used in crime reconstruction?
      • relational
      • temporal
      • identity
        • production
        • segment
        • alteration
        • location
      • comparison
    • What is evidence dynamics?
    • How does evidence dynamics differ from evidence tampering?
    • What are disclosure and discovery?
    • What is EnCase?
    • Why has EnCase been successful?
    • What is write-blocking hardware and when would you use it?
    • What are MAC times?
    • What are the forensic significance and the limitations of MAC times?
    • What is the MFT?
    • What happens with deleted files?  What can be reconstructed?  Explain.
    • What is the Recycle Bin and how does it work?  What is its forensic significance?
    • How are .LNK files forensically relevant?
    • Give two examples of standard cache files and their forensic relevance.
    • What is file carving?
    • What is block-based carving?
    • What is header/footer carving?
    • What is semantic carving?
    • Why does even overwriting a block not necessarily erase the data from the disk?
    • How do flash drives work and how is that forensically relevant?
    • What is Back Orifice?
    • Name five examples of forensically relevant network log files.
    • Explain the function of these network utilities.
      • netstat
      • lsof
      • nmap
      • arp-scan
      • p0f
      • driftnet
      • tcpdump / wireshark
      • dsniff
      • ettercap
    • What is a downgrade attack on a switched network?
    • What is anti-forensics?
    • Give examples of anti-forensic measures.
  • data mining

    • What are the goals of data mining for fraud detection?
    • What are the general steps for performing data mining for fraud detection?
    • Name common types of fraud that can be detected by data mining.
    • What types of attributes are used in data mining?
      • binary
      • numerical
      • categorical
      • ordered
    • What attributes are used in the following kinds of data mining problems?
      • home insurance fraud
      • medical insurance fraud
      • credit fraud
      • telecommunications fraud
    • Explain the different kinds of data mining.
      • supervised
      • semi-supervised
      • unsupervised
      • anomaly detection
    • Explain why error rate is frequently not a good measure for data mining performance.
    • What is “area under ROC”?  How useful is it?
    • Name common machine learning/data mining algorithms and explain at a high level how they work.
      • MLP
      • SVM
      • C4.5/CART
      • k nearest neighbors
      • case based reasoning
    • What characteristics are traditionally used for credit scoring? (The 5 C’s)
    • How does aggregation help with bank fraud detection? What is being aggregated?
    • In what ways is meta-learning useful for data mining and fraud detection?
    • What is privacy preserving data mining?
  • privacy on the web

    • What information is revealed to a web site during a web page view? How?
    • What kind of data needs to be stored for “Vorratsdatenspeicherung”?  Who stores it?
    • What is a “DNS side channel”?
    • How can the sender be traced for mail sent from webmail services and for desktop mail clients?
    • What is Tor?
    • How does onion routing work?
    • What is a web bug?
    • Who is using web bugs?
    • How does Google AdSense work?
    • What is AdBlock?
    • What is NoScript?
    • How do AdBlock and NoScript help your privacy?
    • How can you detect web bugs in Firefox?
    • What is a cookie?
    • What is a third-party cookie?
    • What properties/attributes do cookies have?
    • What does the combination of web bugs and cookies achieve?
    • What is CookieMonster for Firefox?
    • What is Google Analytics?
    • What are timing-based data leaks?
    • How can web sites obtain history information even if JavaScript security is working correctly?
    • How can history information be leaked via CSS?
    • What was the AOL Research query privacy breach?
    • Describe ways in which search records can be deanonymized.
    • What is search chaffing?  What is TrackMeNot?
    • What are supercookies?
    • What is an LSO?
    • How did Windows Media Player formerly compromise your privacy?
    • What is BetterPrivacy?  How does it work?
    • What is Panopticlick?
    • How many bits of identifying information does a typical user’s browser reveal?  What does that mean?
    • What is device fingerprinting?
    • How can we combine timing-based browser history attacks and social networks to deanonymize users?