There are countless movies and shows where photos, security footage or other video material is enhanced to reveal some important detail.
Most of what is shown in these scenes is completely ridiculous, but new algorithms allow for surprising image enhancements. Image manipulation is not my specialty, but I will still try to summarize what I have found in this post.
Input Material
The quality of the input material is crucial for image enhancement and for license plate or face recognition. Generally there are two types of input material:
- analogue input (such as VHS), or
- digital input (any new equipment).
For analogue data the quality is mostly determined by the equipment and by the storage medium (usually the tape). In particular, a tape that has been reused over and over will not give the same picture quality as a fresh tape from a quality brand.
For digital videos we have three main factors:
- the resolution.
- the color depth (how many shades per color).
- the bandwidth or compression.
The higher the resolution, the more pixels a picture has. In theory this allows for sharper pictures, but strictly speaking one can always take a low-resolution picture and upscale it (zoom in). This shows that resolution alone is not enough to determine image quality.
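To make this concrete, here is a small sketch in plain Python. The tiny 4x4 checkerboard and the two helper functions are made up for illustration. The picture is shrunk and then zoomed back to its original pixel count, and the fine detail does not come back.

```python
def downscale_2x(img):
    """Average each 2x2 block into one pixel (img is a list of rows of ints)."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def upscale_2x_nearest(img):
    """Zoom in by repeating every pixel in a 2x2 block (nearest neighbor)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# A 4x4 checkerboard: the finest detail a 4x4 picture can hold.
original = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]

small = downscale_2x(original)      # 2x2 picture, every pixel is mid-gray
zoomed = upscale_2x_nearest(small)  # 4x4 again, but the pattern is gone

print(small)
print(zoomed == original)
```

The zoomed picture has exactly as many pixels as the original, yet it is a flat gray square.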
The color depth says how many shades of color a picture can have. A pure black-and-white picture (without grays) has a color depth of 1 bit: on or off. Most common image formats have a color depth of 24 bits (8 bits per color channel), allowing for 16,777,216 different colors. This seems like a lot, but serious image manipulation quickly reaches its limits.
There is also the problem that cameras need to actually use these shades. It doesn't help that a picture can have 16 million colors if the camera only differentiates between half of them (mostly because of sensor noise).
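The arithmetic behind these numbers is easy to check. The noise model in the second function is my own simplifying assumption (the lowest bits of every channel are treated as pure noise), not a measured property of any camera. Under that assumption, one noisy bit in each of the three channels already cuts the number of distinguishable colors to an eighth.

```python
def color_count(bits_per_pixel):
    """Number of distinct values representable with the given bit depth."""
    return 2 ** bits_per_pixel


def effective_colors(bits_per_channel, noisy_bits, channels=3):
    """Colors still distinguishable if the lowest `noisy_bits` of every
    channel carry only noise (assumed, simplified noise model)."""
    usable = max(0, bits_per_channel - noisy_bits)
    return 2 ** (usable * channels)


print(color_count(1))          # 2
print(color_count(24))         # 16777216
print(effective_colors(8, 1))  # 2097152
```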
Limited bandwidth or compression only makes the quality worse. Without reducing the nominal resolution or color depth, it effectively reduces both locally. Ideally it only removes details that the human eye can't see, but image manipulation algorithms depend on exactly these details. Compression is hence the enemy of "enhance".
To summarize: bad-quality VHS material cannot simply be "enhanced". There is just not enough information in it. However, if the quality is right, image manipulation can do wonders.
Still Images
There is an impressive number of image enhancement algorithms, and every digital camera already applies many of them before showing a picture to the user. Many high-end cameras also give access to the unprocessed image (the so-called "raw" image), but many cameras only provide the enhanced image.
Cameras don't have lots of processing power and computers are able to do much more (at the moment). Adobe, for example, gave an impressive deblurring demo a few years ago:
Here is another picture of deblurring from the following paper: http://www.ece.northwestern.edu/~sda690/PartialBlur_CVPR09.pdf
Here are a few other examples from the commercial app "SmartDeblur": http://smartdeblur.net/gallery.html
(images (c) smartdeblur and http://gizmodo.com)
Researchers have also put effort into finding good ways to upscale pictures: given a low-resolution image, produce the best possible higher-resolution image. The high-resolution image can't really contain more information, but the output can look drastically better if advanced algorithms are used.
The following page has a great comparison of available filters that are currently in use (and a new one they propose): http://www.wisdom.weizmann.ac.il/~vision/SingleImageSR.html
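For reference, this is roughly what one of the simple baseline filters does. The sketch below is plain bilinear interpolation written by me for illustration; it is not the method proposed on the linked page, and the function name is my own.

```python
def upscale_bilinear(img, factor):
    """Upscale a grayscale image (list of rows of numbers) by an integer
    factor, blending the four nearest source pixels for each output pixel."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h * factor):
        # Map the output row back to a (fractional) source row.
        sy = min(y / factor, h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(w * factor):
            sx = min(x / factor, w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# A hard edge between a black and a bright pixel becomes a smooth ramp.
print(upscale_bilinear([[0, 100]], 2))
```

The new in-between pixel is just a blend of its neighbors, which is why such filters look smooth but blurry. The more advanced algorithms try to guess plausible detail instead.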
Multiple Images
Another exciting area of research is combining the information from multiple images. The following video demonstrates how multiple images can be used to increase the resolution of photos:
Similarly, high-dynamic-range (HDR) pictures are usually obtained by combining photos that have been taken with different exposures. The resulting image shows details in areas that would otherwise be too dark or too bright.
The following video shows the effect produced by NIK (software that Google recently bought).
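As a rough idea of how exposures can be combined, here is a minimal sketch. It assumes a perfectly linear sensor, known exposure times and already aligned pictures, and the weighting function is a common textbook choice rather than what any particular product uses. Each exposure votes for the brightness of the scene, and pixels that are nearly black or clipped to white get almost no say.

```python
def hat_weight(v):
    """Trust mid-tones most; nearly black or nearly white pixels get
    very little weight (v is a pixel value in the range 0..1)."""
    return max(1e-6, 1.0 - abs(2.0 * v - 1.0))


def merge_exposures(exposures, times):
    """exposures: list of images (flat lists of values in 0..1),
    times: matching exposure times. Returns estimated scene brightness."""
    merged = []
    for i in range(len(exposures[0])):
        num = den = 0.0
        for img, t in zip(exposures, times):
            w = hat_weight(img[i])
            num += w * img[i] / t  # value / time estimates scene brightness
            den += w
        merged.append(num / den)
    return merged


def capture(scene, t):
    """Simulated camera: linear response that clips at pure white."""
    return [min(1.0, r * t) for r in scene]


scene = [0.1, 2.0]   # one dark and one very bright spot (made-up values)
times = [1.0, 0.25]  # a long and a short exposure
shots = [capture(scene, t) for t in times]
merged = merge_exposures(shots, times)
print(shots)
print(merged)
```

In the long exposure the bright spot is clipped to white, but the merged result still recovers its brightness almost exactly from the short exposure.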
License Plate and Redacted Information Recovery
Together with faces, license plates are probably the most common target of enhancement in movies. Almost always, the characters simply zoom into the section and get a clean read of the plate. Despite the cool filters shown above, this is just not realistic. However, a specialized algorithm doesn't need to enhance the full image. There is only a limited number of combinations that can appear on a license plate, and programs that take this into account can be much more efficient.
The (non-embeddable) video at http://vimeo.com/1913931# (from http://tlrobinson.net/blog/2008/10/recovering-censored-text-using-adobe-photoshop-cs3/) gives a good idea of how a specialized program would do it. The algorithm runs through all possible combinations and tries each one at the location of the original license plate. It simulates the same detrimental effects (low resolution, compression, ...) that are present in the original image and compares the original image with the simulated one. Even if it can't uniquely identify the license plate, it will be able to reduce the number of candidates (especially in combination with other properties such as the car model).
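The brute-force idea can be sketched in a few lines. Everything concrete here is an assumption of mine for illustration: the 3x5 bitmap font, the four-character alphabet and the very crude "camera" that just averages 2x2 blocks. A real attack would need the actual plate typeface and a much better model of the blur and compression.

```python
from itertools import product

# Hypothetical 3x5 bitmap font, invented for this example.
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "7": ["111", "001", "010", "010", "010"],
    "8": ["111", "101", "111", "101", "111"],
}


def render(text):
    """Draw text as a 6-row image; every glyph is padded to 4x6 pixels."""
    rows = [[] for _ in range(6)]
    for ch in text:
        glyph = GLYPHS[ch] + ["000"]
        for y in range(6):
            rows[y].extend(float(c) for c in glyph[y] + "0")
    return rows


def degrade(img):
    """Simulate a low-resolution camera by averaging each 2x2 block."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
         for x in range(0, len(img[0]), 2)]
        for y in range(0, len(img), 2)
    ]


def distance(a, b):
    """Sum of squared pixel differences between two equally sized images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def rank_candidates(observed, alphabet, length):
    """Render and degrade every possible plate, then sort by similarity."""
    scored = []
    for chars in product(alphabet, repeat=length):
        text = "".join(chars)
        scored.append((distance(degrade(render(text)), observed), text))
    return sorted(scored)


# Fake an observation: the degraded true plate plus a small disturbance.
clean = degrade(render("1780"))
observed = [[p + (0.02 if (x + y) % 2 else -0.02) for x, p in enumerate(row)]
            for y, row in enumerate(clean)]

ranking = rank_candidates(observed, "0178", 4)
print(ranking[:3])
```

The output is a ranked list rather than a single answer, which matches the point above: even when the best match is not certain, most of the candidates can be ruled out.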
The same technique is very nicely described on the following page too: http://dheera.net/projects/blur.php or in this NEC research video:
A similar (but even simpler) technique was used a few years ago to uncover redacted information in a CIA memo (http://www.globalsecurity.org/intell/library/reports/2004/pdb_6august2001-declass.pdf). The original PDF has the sensitive information blacked out, but it shows the surrounding context. Recovering the information is as simple as trying different words and seeing if they fit. The first blacked-out area cannot, for example, simply be "friendly", since that would be too short. The original article (from http://lemonde.fr) unfortunately doesn't exist anymore, but a babelfish translation can still be found here: http://cryptome.org/cia-decrypt.htm
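Here is a toy version of the "does the word fit" test. The character widths, the box width and the candidate words are all invented for this example (apart from "friendly", which is mentioned above); a real attempt would measure the widths in the actual font of the document and use a proper dictionary.

```python
# Hypothetical character widths (in points) for a proportional font.
NARROW = set("fijlrt")
WIDE = set("mw")


def text_width(word, narrow=3.0, normal=6.0, wide=9.0):
    """Approximate printed width of a word using the made-up widths above."""
    total = 0.0
    for ch in word.lower():
        if ch in NARROW:
            total += narrow
        elif ch in WIDE:
            total += wide
        else:
            total += normal
    return total


def fits(word, box_width, tolerance=1.5):
    """True if the word would fill the blacked-out box almost exactly."""
    return abs(text_width(word) - box_width) <= tolerance


box_width = 57.0  # assumed width of the blacked-out area
candidates = ["friendly", "foreign", "allied", "intelligence", "cooperating"]
matches = [w for w in candidates if fits(w, box_width)]
print(matches)
```

With these made-up numbers "friendly" comes out far too short, and only two of the five candidates survive. The surrounding sentence then has to decide between the remaining ones.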
In the same category, note that some image effects can be undone. In 2007 "Mr. Swirl", a pedophile who had posted images of himself with his face obfuscated by a swirl effect, was arrested after German computer experts had undone the effect: http://en.wikipedia.org/wiki/Christopher_Paul_Neil
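The reason a swirl can be undone is that it only moves pixels around; nothing is thrown away. The sketch below uses a simple swirl model of my own choosing (the rotation angle falls off linearly with the distance from the center), not necessarily the filter used in that case. Because the swirl keeps every point at the same distance from the center, rotating by the opposite angle brings it back.

```python
import math


def swirl(x, y, strength, radius, cx=0.0, cy=0.0):
    """Rotate the point (x, y) around (cx, cy). The rotation angle is
    largest at the center and fades to zero at distance `radius`."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    angle = strength * max(0.0, 1.0 - r / radius)
    c, s = math.cos(angle), math.sin(angle)
    return cx + c * dx - s * dy, cy + s * dx + c * dy


def unswirl(x, y, strength, radius, cx=0.0, cy=0.0):
    """Inverse of swirl(): the distance to the center is unchanged by the
    swirl, so swirling with the opposite strength undoes it."""
    return swirl(x, y, -strength, radius, cx, cy)


px, py = swirl(3.0, 1.0, strength=2.0, radius=10.0)
bx, by = unswirl(px, py, strength=2.0, radius=10.0)
print((px, py), (bx, by))
```

For a real picture one would additionally have to estimate the center and strength of the effect and resample the pixels, which costs a bit of sharpness but keeps the face recognizable.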
Videos
Everything that applies to multiple images also applies to videos. After all, a video can be viewed as a sequence of images. However, a video has the additional advantage that the pictures follow each other in very close succession. This implies that they are usually taken from similar positions, which makes it possible to compute the motion of objects (and of the camera). In theory this should allow for amazing image enhancements, but compression exploits some of the same information to reduce the required storage size. The stronger the compression, the harder it is to enhance pictures using multiple frames of a video.
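Computing the motion between two frames can be as simple as trying out small shifts and keeping the one where the frames agree best. The sketch below is such an exhaustive search in plain Python, with made-up test frames. It only handles whole-pixel shifts of the entire picture, whereas real systems work per block or per pixel and with sub-pixel accuracy.

```python
def estimate_shift(ref, frame, max_shift=2):
    """Return (dy, dx) such that frame[y + dy][x + dx] best matches
    ref[y][x], judged by the mean squared difference over the overlap."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, count = 0.0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    err += (ref[y][x] - frame[y + dy][x + dx]) ** 2
                    count += 1
            score = err / count
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


# Reference frame: a small bright blob on a black background.
h = w = 8
ref = [[0.0] * w for _ in range(h)]
ref[2][2], ref[2][3], ref[3][2] = 1.0, 0.5, 0.5

# Next frame: the same scene moved one pixel down and two pixels right.
frame = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0.0 for x in range(w)]
         for y in range(h)]

print(estimate_shift(ref, frame))  # (1, 2)
```

Once the shift is known, the frames can be lined up and combined like any other set of pictures of the same scene. Video compression uses the same kind of motion search to avoid storing the same content twice.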
An important research area is the upscaling of videos: TV manufacturers want to show DVDs on high-resolution screens without the picture looking blocky.
Here is a demo of an image enhancer that uses multiple frames of a video.
Intel describes the process of combining multiple frames of a video in this summary-paper: http://www.intel.la/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-image-reconstruction-brief.pdf:
The paper from Microsoft has slightly more details: http://research.microsoft.com/en-us/um/people/cohen/VideoSnapshots_TVCG12.pdf
The following comparison comes from this paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.55.7768&rep=rep1&type=pdf
And finally: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.8088&rep=rep1&type=pdf
Conclusion
Algorithms can do amazing things, but not nearly as much as movie writers believe is possible. This is rapidly changing, though. The resolution of cameras keeps getting higher: Nokia's Lumia 1020 has an impressive resolution of 41 megapixels. This doesn't necessarily mean that it takes good photos (the lens is tiny, so the aperture of the phone is small), but it means that zooming in becomes possible. Security cameras are increasing in resolution, too. HD (1080p) security cameras are already available, and it's just a matter of time before every security camera provides footage that can actually be "enhanced".