Efficiently Implementing Dilate and Erode Image Functions

Dilate by One

Motivation

Dilate

Dilate is a function that accepts a black and white image. It is also known by the names "grow", "bolden", and "expand". It turns on pixels which were near pixels that were on originally, thereby thickening the items in the image.

Erode

Erode is the sister function to dilate. It is also known by the name "shrink". It turns off pixels which were near pixels that were off originally, thereby eating away at the edges of the items in the image.

Detecting Image "Snow"

Dilate and erode can be used in conjunction do detect and correct a common problem with digital and scanned images: snow. Snow is caused by bad pixels on the CCD of a digital camera or dust that gets onto a scanned image or negative. If we dilate and image, and then erode the image, we can see that holes in the image get filled in:

Original

After Dilate

After Dilate and Erode

0	0	0
0	0	0
1	1	1
1	0	1
1	1	1
0	0	0
0	0	0

0	0	0	0	0
0	1	1	1	0
1	1	1	1	1
1	1	1	1	1
1	1	1	1	1
0	1	1	1	0
0	0	0	0	0

0	0	0
0	0	0
1	1	1
1	1	1
1	1	1
0	0	0
0	0	0

A typical algorithm to correct snow would usually be something like this:

Image correctSnow(Image image, int radiusOfSnow){
    // Copy the image and convert the copy to black and white
    Image blackAndWhiteImage = threshold(copy(image), THRESHOLD_VALUE);
    // Copy the threshold and dilate and erode the copy
    Image dilatedAndEroded = erode(
        dilate(
            copy(blackAndWhiteImage), 
            radiusOfSnow
        ), 
        radiusOfSnow
    );
    // Figure out which pixels were snow based on the difference
    // between the thresholded image and the dilated and eroded image
    Image snowPixels = diff(blackAndWhiteImage, dilatedAndEroded);
    // Fix up any pixels in the original that were marked as snow by 
    // filling them with color of surrounding pixels
    return blendSnowPixels(image, snowPixels);
}

As we will see, dilate and erode can be efficient, no matter how big the snow that you want to remove is.

The Challenge

Write a function that dilates an image. Assume that the image can be of arbitrary size (n by n). An image is represented by a two dimensional array of integers. The image data is binary (zeros and ones), black and white, no color or gray scale. Dilate should turn on any pixel that is touching a pixel in the north, east, south, or west direction (no diagonals) that is already turned on in the input.

This is how dilation is supposed to behave (Flip a couple bits and the press dilate):

First Try

When first presented with this problem most people make a few mistake related to bounds checking (off by one errors and such). Those problems are easily corrected. However the most common incorrect algorithm is that it is easy to step on ones toes.

// Incorrect solution - steps on its own toes
int[][] dilate(int[][] image){
    for (int i=0; i<image.length; i++){
        for (int j=0; j<image[i].length; j++){
            if (image[i][j] == 1){
                if (i>0) image[i-1][j] = 1;
                if (j>0) image[i][j-1] = 1;
                if (i+1<image.length) image[i+1][j] = 1;
                if (j+1<image[i].length) image[i][j+1] = 1;
            }
        }
    }
    return image;
}

Second Try

The easiest way to correct this is to create a copy of the image to work with a copy of the image. This is perfectly correct solution, however it uses O(n^2) extra space. Notice that the copy of the image has to be initialized to all zeros and that the current pixel has to be copied over as well as the surrounding pixels.

// Correct, but creates a copy of the image which is inefficient
int[][] dilate(int[][] image){
    int[][] imagecopy = new int[image.length][image[0].length];
    for (int i=0; i<image.length; i++){
        for (int j=0; j<image[i].length; j++){
            if (image[i][j] == 1){
                imagecopy[i][j] = 1;
                if (i>0) imagecopy[i-1][j] = 1;
                if (j>0) imagecopy[i][j-1] = 1;
                if (i+1<image.length) imagecopy[i+1][j] = 1;
                if (j+1<image[i].length) imagecopy[i][j+1] = 1;
            }
        }
    }
    return imagecopy;
}

Best Solution

The best solution is to store information about what is newly turned on (vs was on originally) in the pixels themselves. This solution uses the value of 2 to indicate something was newly turned on. Notice that before a pixel is flipped to a two, it must be checked to ensure that it wasn't a one first. Also notice that a second pass is made at the end to turn the twos back to ones. This does not effect the magnitude of the running time: The algorithm still runs in O(n^2) time.

// Best dilate by one solution
int[][] dilate(int[][] image){
    for (int i=0; i<image.length; i++){
        for (int j=0; j<image[i].length; j++){
            if (image[i][j] == 1){
                if (i>0 && image[i-1][j]==0) image[i-1][j] = 2;
                if (j>0 && image[i][j-1]==0) image[i][j-1] = 2;
                if (i+1<image.length && image[i+1][j]==0) image[i+1][j] = 2;
                if (j+1<image[i].length && image[i][j+1]==0) image[i][j+1] = 2;
            }
        }
    }
    for (int i=0; i<image.length; i++){
        for (int j=0; j<image[i].length; j++){
            if (image[i][j] == 2){
                image[i][j] = 1;
            }
        }
    }
    return image;
}

Dilate by K

Motivation

What we have written is dilate by one. This can be generalized into dilate by k. k will be defined in terms of Manhattan distance. In Manhattan, the distance between any two places is the number of blocks you have to walk to get there. You must travel in a stair step fashion as you cannot cut diagonally through buildings. The dilation by k will turn on all pixels that are within k Manhattan distance of a pixel that was on in the input.

Here is how dilation by k is supposed to work:

Nasty Solution

The temptation is to follow a similar algorithm to the one presented by dilate by one. In such a solution we would walk each point in the image. When a point that is on is found, travel k in each direction turning on pixels. While possible to write, such a solution would be fairly messy and run in O(n^2*k^2) time. Since k is bounded by n, that is really a O(n^4) solution. There are both much cleaner ways and much more efficient ways to write the algorithm.

// n^4 (very inefficient) solution for dilate by k
int[][] dilate(int[][] image, int k){
    for (int i=0; i<image.length; i++){
        for (int j=0; j<image[i].length; j++){
            if (image[i][j] == 1){
                for (int l=i-k; l<=i+k; l++){
                    int remainingk = k-Math.abs(i-l);
                    for (int m=j-remainingk; m<=j+remainingk; m++){
                        if (l>=0 && m>=0 && l<image.length && m<image.length && image[l][m]==0){
                            image[l][m] = 2;
                        }
                    }
                }
            }
        }
    }
    for (int i=0; i<image.length; i++){
        for (int j=0; j<image[i].length; j++){
            if (image[i][j] == 2){
                image[i][j] = 1;
            }
        }
    }
    return image;
}

Easy solution

If you pressed the dilate button more than once in the dilate by one examples, you may have noticed that dilating by one k times, is equivalent to dilating by k. This suggests the super easy algorithm of calling what was already written k times. Notice that this runs in O(n^2*k) time. Since k is bounded by n, that is really O(n^3) time.

// easy to write (but n^3 inefficient) dilate by k solution
int[][] dilate(int[][] image, int k){
    for (int i=0; i<k; i++){
        image = dilate(image);
    }
    return image;
}

Efficient Solution

To find an O(n^2) solution to this problem, the problem needs to be framed differently. What if there were an oracle that told us (for every pixel) the Manhattan distance to the nearest pixel. Pixels that are on would get a zero from the oracle. Pixels next to them a one, and so on.

The Manhattan distance oracle:

If we had that oracle we could just threshold by the Manhattan distance.

// n^2 solution with Manhattan oracle
int[][] dilate(int[][] image, int k){
    image = manhattan(image);
    for (int i=0; i<image.length; i++){
        for (int j=0; j<image[i].length; j++){
            image[i][j] = ((image[i][j]<=k)?1:0);
        }
    }
    return image;
}

Manhattan Oracle

In One Dimension

So the question just becomes how can we write the Manhattan oracle to run in O(n^2) time. The best way to approach this problem is to take it down a dimension. In a single dimensional array, for each point find the distance to the nearest "on" point.

There are two facts that help:

The maximum distance to a pixel that is on is the length of the array
A pixel is at most one further from a pixel that is on than the pixel before it

Those with eagle eyes will soon discover that if you walk the array, and find a pixel that is on, you can count up until you get halfway to the next pixel that is on. That suggests that a two pass solution: One pass to find the nearest pixel to the left and one pass to find the nearest pixel to the right. The distance to the nearest pixel will be the minimum of the pixel to the left and the pixel to the right. In cases in which the distance is unknown, it is safe to assume the maximum because that value will be corrected later.

// O(n) solution to find the distance to "on" pixels in a single dimension array
int[] distance(int[] arr){
    // traverse forwards
    for (int i=0; i<arr.length; i++){
        if (arr[i] == 1){
            // first pass and pixel was on, it gets a zero
            arr[i] = 0;
        } else {
            // pixel was off
            // It is at most the length of array
            // away from a pixel that is on
            arr[i] = arr.length;
            // One more than the pixel to the left
            if (i>0) arr[i] = Math.min(arr[i], arr[i-1]+1);
        }
    }
    // traverse backwards
    for (int i=arr.length-1; i>=0; i--){
        // what we had on the first pass
        // or one more than the pixel to the right
        if (i+1<arr.length) arr[i] = Math.min(arr[i], arr[i+1]+1);
    }
    return arr;
}

In Two Dimensions

Generalizing from one dimension, we have a similar solution, but now have pixels on four sides of us instead of two sides to which to pay attention. Again, we can make two passes. On the first pass, look north and west, adding one to each (use the minimum). On the second pass, look south and east, adding one to each (use the minimum). The maximum Manhattan distance that a pixel can be away from a pixel that is "on" is the sum of the width and the height of the image.

// O(n^2) solution to find the Manhattan distance to "on" pixels in a two dimension array
int[][] manhattan(int[][] image){
    // traverse from top left to bottom right
    for (int i=0; i<image.length; i++){
        for (int j=0; j<image[i].length; j++){
            if (image[i][j] == 1){
                // first pass and pixel was on, it gets a zero
                image[i][j] = 0;
            } else {
                // pixel was off
                // It is at most the sum of the lengths of the array
                // away from a pixel that is on
                image[i][j] = image.length + image[i].length;
                // or one more than the pixel to the north
                if (i>0) image[i][j] = Math.min(image[i][j], image[i-1][j]+1);
                // or one more than the pixel to the west
                if (j>0) image[i][j] = Math.min(image[i][j], image[i][j-1]+1);
            }
        }
    }
    // traverse from bottom right to top left
    for (int i=image.length-1; i>=0; i--){
        for (int j=image[i].length-1; j>=0; j--){
            // either what we had on the first pass
            // or one more than the pixel to the south
            if (i+1<image.length) image[i][j] = Math.min(image[i][j], image[i+1][j]+1);
            // or one more than the pixel to the east
            if (j+1<image[i].length) image[i][j] = Math.min(image[i][j], image[i][j+1]+1);
        }
    }
    return image;
}

4 thoughts on “Efficiently Implementing Dilate and Erode Image Functions”

4 comments

Alexander
June 17, 2015, 10:42 am

Thank you for the post. I enjoyed reading it. Just one comment. The following statement “If you pressed the dilate button more than once in the dilate by one examples, you may have noticed that dilating by one k times, is equivalent to dilating by k.” is correct for a rectangular structuring element, but it is not for a circular one, for example. And the circular one is also a popular choice…

Alex
March 4, 2019, 4:36 pm

Hi.
Great article. It is very useful for me.
But there is one mistake. When you write about complexity in O notation.
As I understood, n – is a one dimension of your image(array). And n^2 – is a square of it.
If it’s true, then you must write complexity O(n), if you’r algorithm pass throw(or copy) image only once.
Because O argument(n) describes all data, instead of only one side of image.
If algorithm passes throw the image with AxB size only once it means O(n) complexity. Not O(n^2).
If algorithm passes throw the image with AxB size (AxB)*(AxB) times it means O(n^2) complexity.

If you don’t understand or not agree, please reply and I’ll try to explain more clear.

- Stephen Post author
  October 22, 2019, 12:45 pm
  
  It depends how your define n. I define it as an n by n image, so an image has n^2 pixels.
  
- Charles Martin
  October 20, 2020, 9:15 pm
  
  actually, his big-O analysis is correct. If you represent an image as a 2-D grid, that means you’re running two nested loops. For NxN image, that is O(N^2)

Dilate by One

Motivation

Dilate

Erode

Detecting Image "Snow"

The Challenge

First Try

Second Try

Best Solution

Dilate by K

Motivation

Nasty Solution

Easy solution

Efficient Solution

Manhattan Oracle

In One Dimension

In Two Dimensions

Leave a comment Cancel reply

4 thoughts on “Efficiently Implementing Dilate and Erode Image Functions”