Old‎ > ‎

Old Exercises

These are from previous years.

Please obtain and use the following software packages:

  • Python
  • Scientific Python (scipy.org)
  • Matplotlib and Pylab (matplotlib.sf.net)
  • Python Imaging Library (PIL, www.pythonware.com/products/pil)
  • ImageMagick (www.imagemagick.org)

You may also find the interactive iPython shell useful. There are also several Python IDEs available; DrPython is easy to use and reasonably powerful.

You can use the computer science department's machines. If you want to use your own machine, the easiest way is to run Ubuntu or another desktop Linux version. You can either run Linux directly, use one of the Ubuntu-under-Linux versions (e.g., andlinux.org), or install under VirtualBox or VMWare. For Macintosh, your best bet is to start with the SciPy bundle, or maybe to install it via Fink.

Images are represented in Python as NumPy arrays (2D arrays). There is a lot of introductory information available on the net on Python, NumPy, and SciPy. Good places to start reading are www.scipy.org and www.python.org.

Image File Formats

Exercise 1.1  

Exercise 1.2 (Image File Formats)  
(10 Point10=1s)
A common image storage format is the PGM format; you can find the specification here:
Write Python functions for reading and writing the binary, 8 bit per pixel variant of the PGM format (you should raise an error for any other input). You can generate sample images with the command:
        convert magick:rose rose.pgm

Image Compression

Exercise 1.3  

Exercise 1.4 (Image Compression)  
(10 Point10=1s)
Image compression relies on reducing redundancies. A simple trick for compressing images relies on computing differences between subsequent pixels followed by gzip compression.

First, write a Python script that takes any file as input and outputs a file that consists of differences between pixels; that is, if the original file contains the bytes $ b_0$, $ b_1$, $ b_2$, ..., $ b_n$, then the output file should contain $ b_0$, $ b_0-b_1$, $ b_1-b_2$, ..., $ b_{n-1}-b_n$. Verify that you can transform images, reverse the transform, and obtain a bit-identical output image.

Take graylevel input images and convert them to binary PGM images (you can take color images and convert them with convert image.jpg image.pgm). For 10 images, determine the following: (1) the size of the image in binary PGM format, (2) the size of the image when JPEG compressed with quality 99, (3) and the size of the image when you take the binary PGM image and compress it with gzip -9, and (4) the size of the image when you take the binary PGM image, apply pixel differencing, and then compress it with gzip -9.

Explain and discuss your results.


Exercise 1.5  

Exercise 1.6 (Path-Finding)  
(20 Point20=1s)
Assume you are given an image image. You are given the two points `source' $ s=(x_s,y_s)$ and `target' $ t=(x_t,y_t)$ in the image for which image[xs,ys]==0 and image[xt,yt]==0. 1=10
Write a function in your favorite programming language that determines the shortest path between $ s$ and $ t$ that only uses image points with value 0 if such a path exists. (Otherwise return e.g. the empty sequence.) More specifically, the function should determine a sequence $ p_1,\ldots,p_N$, such that $ p_1=s$ and $ p_N=t$ and for each $ n=1,\ldots,N:$ $ image(p_n)=0$ and for each $ n=1,\ldots,N-1:$ $ p_{n+1}-p_{n} \in \{(0,1),(1,0),(0,-1),(-1,0)\}$ (4-neighborhood), and for which no shorter sequence with the same attributes exists.
What is the computational complexity of your approach?
How does the function change if you consider the 8-neighborhood instead of the 4-neighborhood? Will the paths be shorter or longer in general? What is the relation between the existence of a path in the two cases?
What are possible uses for this function (within or outside of image processing)?

Python has several libraries dealing with images. You may find the following code useful for reading images using the PIL library. The PIL library understands a lot of image formats but doesn't output numerical arrays (instead, it outputs an Image object). The code below performs the necessary conversions.

from __future__ import with_statement
__all__ = """pil2numpy numpy2pil read_pil write_pil""".split()

import re
import numpy,scipy,scipy.ndimage
import cStringIO
from numpy import arange,where,amax,zeros
from PIL import Image

def pil2numpy(image):
"""Convert a PIL image into a NumPy array."""
modes = { "RGB" : numpy.uint8, "L" : numpy.uint8,
"F" : numpy.float32 }
result = numpy.asarray(image,dtype=modes[image.mode])
result = result.copy() # make it read/write
return result

def numpy2pil(image,mode=None):
"""Convert a NumPy array into a PIL image."""
result = Image.fromarray(image,mode=mode)
return result

def write_pil(filename,image,type="png"):
"""Write a NumPy array as a PNG image using PIL."""
image = numpy2pil(image)

def read_pil(filename):
"""Read a NumPy array using PIL."""
image = Image.open(filename
return pil2numpy(image)

Image and Video Processing (SS 2007)

Exercise Sheet 2

to be handed in: 19.06.2007

Connected Components

Exercise 2.1  

Exercise 2.2 (Connected Components)  
(20 Point20=1s)
In your favorite programming language, program a function that given a binary image computes all connected components of value 1. (Implement the algorithm discussed in the lecture.) Assume the binary image is given as a one-dimensional array and values for height and width. Output the result coded as an image. That is, the function should return an image of the same size as the input image in which each pixel is either 0 (when the corresponding pixel was 0 in the input) or has the label $ l>0$ of the connected component the pixel belongs to.

Current Cameras

Exercise 2.3  

Exercise 2.4 (Current Cameras)  
(5 Point5=1s)
Perform a web- or library-search to answer the following questions regarding cameras sensors: 1=10
What is the chip size of current camera sensors with respect to the number of pixels? Give a few examples. What is the largest number of pixels on a chip that you can find? What are reasons that limit the number of pixels on a chip?
What is the size of a sensor corresponding to one pixel on a typical chip? What are advantages and disadvantages of such a small pixel sensor size?

Color Spaces

Exercise 2.5  

Exercise 2.6 (Color Spaces)  
(10 Point10=1s)
Briefly discuss one application for each of the following color spaces: 1=10

Fourier Series

Exercise 2.7  

Exercise 2.8 (Fourier Series)  
(10 Point10=1s)
Compute the Fourier series of the rectangle signal

\begin{displaymath}f(x)= \begin{cases} 1 & \text{if $0 \leq x < \pi$,}\\ -1 & \text{if $\pi \leq x \leq 2\pi$.} \end{cases}\end{displaymath}
on $ [0,2\pi]$.

Gaussian Smoothing

Exercise 2.9  

Exercise 2.10 (Gaussian Smoothing)  
(10 Point10=1s)
An $ N\times N$ image is to be low-pass filtered. 1=10
Derive the formula for a discrete filter that does Gaussian smoothing in $ x$-direction and in $ y$-direction with identical variance parameter $ \sigma$ for both.
Implement the Gaussian as an FIR filter in your favorite programming language. (If you choose the C/C++ programming language, you can use the I/O routines as presented in the first exercises to test your program.)
What is the computational complexity in $ N$ and $ \sigma$ of your approach?

Smoothing for Down-Sampling

Exercise 2.11  

Exercise 2.12 (Smoothing for Down-Sampling)  
(20 Point20=1s)
An $ N\times N$ image is to be down-scaled by an (integer) factor $ k$. To avoid aliasing, we want to use a Gaussian smoothing filter. 1=10
Compute the `correct' standard deviation $ \sigma$ for the Gaussian filter (use the Sampling Theorem).
Implement the filter in your favorite programming language. (If you choose the C programming language, you can use the I/O routines as presented in the first exercises to test your program.) What is the computational complexity in $ N$ and in $ k$ of your approach?

Fourier transform properties

Exercise 3.1  

Exercise 3.2 (Fourier transform properties)  
(10 Point10=1s)
In the lecture, we discussed various connections between an image and its Fourier transform. For example, consider
  • sharp edges correspond to high frequency components in the orthogonal direction
  • a ``broader'' image corresponds to a ``more peaky'' Fourier transform (consider e.g. a scaled object in an image)
  • a rotation of the image corresponds to a rotation of its Fourier transform
  • a translation in the image corresponds to a phase shift in the Fourier transform and does not change the magnitudes
Your task is to find/design/paint/construct images that illustrate some of these effects and discuss the Fourier spectra you obtain in a short text. You do not have to implement the Fourier transform! (Use existing software to compute it instead, e.g. ImageJ.)

Image Operations

Exercise 3.3  

Exercise 3.4 (Image Operations)  
(10 Point10=1s)
Implement the image processing operations 1=10
Min-Max Normalization
Histogram Equalization
as they were introduced in the lecture. Test what effect they have on the example images from the course homepage. When do these operations improve the image quality and when do they not?


Exercise 3.5  

Exercise 3.6 (Subsampling)  
(15 Point15=1s)
In the lecture, it has been mentioned that low-pass filters should be applied before subsampling from an image in order to avoid aliasing. 1=10
Write a program that subsamples an image by a given integer factor $ k$.
Test your program with the example images from the course homepage for $ k=2$ and $ k=4$. Does aliasing occur? When?
Use the Gaussian filter from the previous exercise sheet to smooth the image before subsampling. What value of $ \sigma$ (in the image domain) do you find from that to give best visual results for subsampling with $ k=2$? For $ k=4$?
What is the optimal variance parameter for filtering the image with a Gaussian when the goal is subsampling by a factor of $ k$? (You can assume that a Gaussian filter of variance $ \sigma'$ in the frequency domains acts as a low-pass filter that cuts off all frequencies components $ \sigma'$ or higher.)

Hint: If you cannot derive yourself what the relationship between Gaussians in image space and in frequency space is look it up in the Book by Gonzales and Woods, or online.

Rotation with 3 Shears - Theory

Exercise 3.7  

Exercise 3.8 (Rotation with 3 Shears - Theory)  
(5 Point5=1s)
In the lecture it was discussed that rotating an image can be done by performing $ 3$ shear operations instead of using a rotation matrix. Derive the shear parameters $ a,b,c$ for a rotation by $ \alpha$ using the ansatz

$\displaystyle \begin{pmatrix}\cos \alpha & -\sin \alpha \\  \sin \alpha & \ \ \ ... ...atrix}1 & 0\\ b & 1\end{pmatrix}\cdot \begin{pmatrix}1 & c\\ 0 & 1\end{pmatrix}$


Exercise 3.9  

Exercise 3.10 (Rotation)  
(15 Point15=1s)
Write a program that rotates a given image by an angle $ \alpha$ using the image center as the origin. 1=10
by using a forward rotation matrix, i.e. for each original image pixel, put its value to the transformed position in the target image.
by using a backward rotation matrix, i.e. for each target pixel, fetch a value from the correct position in the original image.

To read or write to between-pixel positions, use bilinear interpolation from/to the neighbors. During the transforms, make sure that the images are large enough to contain all pixels and don't forget to keep the origin at the image center.

Now test the two methods: take two example images from the course homepage and rotate them first by $ \alpha = \frac{\pi}{7}$ and then by $ \alpha = -\frac{\pi}{7}$. Crop the transformed image to size that the original had and calculate the difference image between the original and the transformed version. What do you see?


Morphological Processing

Exercise 4.1  

Exercise 4.2 (Morphological Processing)  
(5 Point5=1s)
Prove that the following relation between morphological closing and opening holds:

$\displaystyle (A \bullet B)^c = (A^c \circ \hat{B})$

Unsharp Masking

Exercise 4.3  

Exercise 4.4 (Unsharp Masking)  
(10 Point10=1s)
In the lecture, we discussed the use of the Laplacian operator for the enhancement of image details. How is the discussed procedure related to the following method, called `unsharp masking'?

A negative was contact-copied on to a low contrast film or plate to create a positive. However, the positive copy was blurred. After processing, this blurred positive was replaced in contact with the back of the original negative. When light was passed through both negative and in-register positive, the positive partially cancelled some of the information in the negative. Because the positive was intentionally blurred, only the low frequency (blurred) information was cancelled. In addition, the mask effectively reduced the dynamic range of the original negative. Thus, if the resulting enlarged image is recorded on contrasty photographic paper, the partial cancellation emphasizes the high frequency (fine detail) information in the original, without loss of highlight or shadow detail. The resulting print appears sharper than one made without the unsharp mask.
(Adapted from http://en.wikipedia.org/wiki/Unsharp_mask)


Discuss the relationship of the two methods.
Design (you do not need to implement anything) a digital unsharp masking filter
  • in two steps (unsharp followed by masking), and
  • in one step.

Histogram Properties

Exercise 4.5  

Exercise 4.6 (Histogram Properties)  
(10 Point10=1s)
In the lecture, we discussed several statistical properties of the grayscale histogram of an image in the context of Otsu-thresholding. Show that the following equalities hold (cp. the notation used in the lecture): 1=10
$ \sigma_b^2 = p_1 p_2 (\mu_1 -\mu_2)^2$
$ \sigma^2 = \sigma_w^2 + \sigma_b^2$
Note: analogously to the quantities defined in the lecture, we have $ \sigma_w^2 = p_1 \sigma_1^2 + p_2 \sigma_2^2$ with $ \sigma_1^2 = \frac{1}{p_1} \sum_{g=0}^T (g-\mu_1)^2 \, h_g,$ and $ \sigma_2^2 = \frac{1}{p_2} \sum_{g=T+1}^{L-1} (g-\mu_2)^2 \,h_g$.

Wavelets Practice

Exercise 4.7  

Exercise 4.8 (Wavelets Practice)  
(15 Point15=1s)
Write a program that reads an image of even dimensions and performs one level of the Wavelet transform on it, using a the Haar basis $ (\frac{1}{2},\frac{1}{2})$ and $ (\frac{1}{2},-\frac{1}{2})$. Make sure that no over- or underflows occur and output the result as an image again.

Extend the program such that it performs all levels of the transform. If the size of the image is not a power of 2, fill the additional parts with 0.

Wavelets Theory

Exercise 4.9  

Exercise 4.10 (Wavelets Theory)  
(10 Point10=1s)
What is the computational complexity of your routine for the full wavelet transform for images of size $ N=2^k$. Is this optimal?

Compare the complexity of the DWT with the one of the FFT. Do you think that the difference will matter in practice?

Exercise 5.1  

Exercise 5.2 (Median Filter)  
(5 Point5=1s)
Show that the Median Filter for images is not linear.

Invariant Features

Exercise 5.3  

Exercise 5.4 (Invariant Features)  
(15 Point15=1s)
Analyze the behaviour of the three localized features
Greyscale histogram in a circle of radius $ N$ pixels centered at $ (x,y)$
Greyscale centroid of a circular region of radius $ N$ pixels centered at $ (x,y)$
(the centroid was introduced on exercise sheet 1)

Normalized gradient vector, i.e. $ \frac{\textrm{grad}}{\Vert\textrm{grad}\Vert}$ in a point $ (x,y)$
under each of the transformations
Translation by $ (+10,+10)$
Rotation by $ 90^\circ$ around the origin
Scaling by a factor of $ 2$
Brightness change $ I(x,y)\mapsto 2 \cdot I(x,y)$
Perform your analysis in the following way: Fix a point $ p_1 = (x_1,y_1)$ in an image $ I_1$. Assume that an image $ I_2$ resulted from $ I_1$ by the given transformation and denote the position where $ (x_1,y_1)$ is moved to by $ p_2=(x_2,y_2)$. Calculate the feature values at $ (x_1,y_1)$ in $ I_1$ and at $ (x_2,y_2)$ in $ I_2$. Do they coincide? If yes, give a proof. If not, give a counterexample.

Choose one features/transformation combination which is not invariant. Can you come up with a modification of the feature that makes it (more) invariant?

Signal Entropy

Exercise 5.5  

Exercise 5.6 (Signal Entropy)  
(10 Point10=1s)
Calculate the entropy for the following discrete probability distribution resp. signal sequences:
$ 1,2,3,4,5,4,3,2,1$
$ P(x) = p^{x}(1-p)^{1-x}$         for $ x\in \{0,1\}$ and $ 0< p < 1$.

(Draw/Plot the result as a function of $ p$).

$ P(x) = \frac{2520}{7381}\cdot\frac{1}{x}$     for $ x=1,2,3,\dots,10$
$ P(x) = \frac{1}{2^x}$     for $ x=1,2,3,\dots$
$ 0,\ 0,1,\ 0,1,2,\ 0,1,2,3,\ 0,1,2,3,4,\dots,0,1,2,3,4,5,6,7,8,9$

Huffman Codes

Exercise 5.7  

Exercise 5.8 (Huffman Codes)  
(5 Point5=1s)
Construct a Huffman Code for items (1), (2), and (3) of exercise 5.3.

Hough Transform

Exercise 5.9  

Exercise 5.10 (Hough Transform)  
(20 Point20=1s)
In the lecture, the Hough Transform for line-finding was explained. Write a program that reads a binary image and uses the Hough Transform to find the best line to match the ''ON'' pixels. Use a $ 1000\times 1000$ resolution in parameter space, i.e. discretize the angle and distance into $ 1000$ steps each.


June 1

Given two images, a big one and a small one, find their best alignment using FFT convolution.

June 15

Detect blobs in an image by taking local maxima of Laplacian of Gaussian (see also: Blob Detection at Wikipedia).

June 29

Cluster letters from a page by the following equivalence relation: A ~ B if eroded A is a subset of dilated B and vice versa.

July 13

Compare Fourier descriptors of hand-drawn curves. (Use the attached Python module to input curves)