Image Sensor Shootout (... the myth of pixel-shifting)

Ever since the release of Panasonic's AG-HVX200 camcorder, video geeks all over the world have been discussing the benefits of pixel-shifting.

Advanced video-camcorders usually have three image sensors - one for each of the primary colors red, green and blue. Normally, each of the three sensors has the same resolution as the desired target video format, for example 720x480 pixels for NTSC or 1440x1080 pixels for HDV.

However, for an HD camcorder the AG-HVX200 has low-resolution chips (960x540). It claims to double the resolution to full HD (1920x1080) by shifting the green sensor sideways and upwards by half a pixel relative to the red and blue sensors.
Look at the example sensor on the right: the luminance of row 4 / column 4 is calculated by taking a quarter of the luminance of green sensor element B and of red and blue sensor elements F: Y709(4,4) = 0.2126 x RedF + 0.7152 x GreenB + 0.0722 x BlueF. The weight factors (0.2126, 0.7152 and 0.0722) are standardized in ITU-R BT.709 and model how sensitive human vision is to each primary color.
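The weighted sum above can be written as a few lines of Python. This is only a minimal sketch: the function name `rec709_luminance` and the 0..1 value range are my own assumptions for illustration; only the three weights come from the text.

```python
# Rec. 709 luminance weights, as quoted in the text above.
W_RED, W_GREEN, W_BLUE = 0.2126, 0.7152, 0.0722

def rec709_luminance(red, green, blue):
    """Weighted sum of three channel readings (each assumed to lie in 0..1)."""
    return W_RED * red + W_GREEN * green + W_BLUE * blue

# White input: all three channels at full level.
print(rec709_luminance(1.0, 1.0, 1.0))

# Green element sees black, red and blue elements see white:
# the result drops to roughly 0.28 because green carries most of the weight.
print(rec709_luminance(1.0, 0.0, 1.0))
```

The second call hints at why the green sensor is the one that gets shifted: it contributes more than 70% of the luminance signal.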

This trick allows us to calculate twice as many columns and rows of luminance information as with unshifted sensors. But is the result comparable to an (unshifted) high-resolution image sensor with 4 times as many sensor elements?
Pixel-shifting is not a new technology, though. It has been used in many professional cameras for years and is also used in semi-professional handheld camcorders manufactured by Canon and others. Its acknowledged benefits are improved color rendering and a reduced need for anti-alias filtering.

Pixel-shifting shootout


So what is the truth behind the myth?
To find out, let's look at a small sensor with just 5x5 pixels. Although this sensor is obviously much too small to be useful in the real world, it is still big enough to demonstrate the principles.
Here is a schematic representation of our sensor:
The numbers in the upper left corner of each sensor element represent the calculated luminance values (with 0 being full black and 1 being bright white).
In this example the sensor sees one black square exactly in the middle (luminance 0.0) with bright white in all other fields (luminance 1.0).

For illustration purposes there are gaps between the gray sensor elements.
On the right is the numerical representation of the example image from above.
I have used a much higher image resolution (20x20) than the sensor resolution (5x5) to be able to explore the sensor behavior as the image gradually moves.
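For readers who prefer code to spreadsheets, the 20x20 test image can be sketched with NumPy. The helper name `make_image` and the exact offset of the shifted square (2 image pixels, i.e. half a sensor element) are assumptions for illustration; the original sheet may use other positions.

```python
import numpy as np

IMAGE_SIZE = 20                      # image pixels per side, as in the text
SENSOR_SIZE = 5                      # sensor elements per side
BLOCK = IMAGE_SIZE // SENSOR_SIZE    # image pixels per sensor element (4)

def make_image(top, left, size=BLOCK):
    """White image (1.0) with a black square (0.0) whose corner is at (top, left)."""
    img = np.ones((IMAGE_SIZE, IMAGE_SIZE))
    img[top:top + size, left:left + size] = 0.0
    return img

# Square aligned with the middle sensor element (rows and columns 8..11).
centered = make_image(top=8, left=8)

# Assumed example of a gradually moved square: half an element down and right.
shifted = make_image(top=10, left=10)

print(centered[6:14, 6:14])
```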
So now a graphical view of our first image:
And this is what the sensor will make of it:
No surprises ;-)
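A simple way to model what the unshifted sensor does is to let every sensor element average the 4x4 image pixels it covers. This is an assumed model for illustration (the helper `block_mean` is hypothetical), not necessarily what the spreadsheet computes.

```python
import numpy as np

def block_mean(img, block):
    """Average each block x block tile of a square image (one value per sensor element)."""
    n = img.shape[0] // block
    return img.reshape(n, block, n, block).mean(axis=(1, 3))

# 20x20 image with the black square exactly on the middle sensor element.
image = np.ones((20, 20))
image[8:12, 8:12] = 0.0

low_res = block_mean(image, 4)   # 5x5 grid of luminance values
print(low_res)                   # 1.0 everywhere except 0.0 in the center
```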
Let's try a more complex example.
We will shift the black dot slightly to the lower right so that it spans across several sensor elements:
The image now looks like this:
Due to the limited resolution of our sensor (5x5), it cannot represent this image correctly. Instead, the luminance values are mapped to the individual pixels like this:
Obviously, this does not work very well.
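The smearing can be reproduced with the same block-averaging model. In this sketch the square is assumed to be moved by 2 image pixels (half a sensor element) in both directions; the helper `block_mean` is hypothetical and the exact shift in the original example may differ.

```python
import numpy as np

def block_mean(img, block):
    """Average each block x block tile of a square image (one value per sensor element)."""
    n = img.shape[0] // block
    return img.reshape(n, block, n, block).mean(axis=(1, 3))

# Black square moved half a sensor element to the lower right (assumed offset).
image = np.ones((20, 20))
image[10:14, 10:14] = 0.0

low_res = block_mean(image, 4)
print(low_res)
# The square now straddles four sensor elements. Each of them sees
# 4 black and 12 white image pixels and therefore reads 0.75.
```

Instead of one black pixel, the sensor reports four light gray ones: four times the area at a quarter of the contrast.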
Now we use the same example and see what the result would be with a pixel-shifted sensor.
To the right is a graphical representation of that sensor. Again, the sensor elements are in a 5x5 matrix. This time, however, the green elements are shifted sideways and upwards from the red and blue sensor elements.
We are using the same image as above:
Due to the spatial offset of the green sensor elements of the pixel-shifted sensor we can now calculate a 10x10 matrix of luminance values from the 5x5 matrix of sensor elements:
Both the position and the size of the black square are much closer to reality than they were with the unshifted sensor. A clear improvement. However, there are some halos and the square is still quite distorted.
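Here is one way to sketch the pixel-shifted calculation in Python. Several things are assumptions on my part and are not taken from the spreadsheet: the green sensor is modeled as sitting half an element up and to the left, the scene is extended by edge replication so that border pixels stay covered, the square is moved by 2 image pixels, and the helper names are hypothetical. Because the test scene is gray, the red and blue sensors read identical values.

```python
import numpy as np

W_RED, W_GREEN, W_BLUE = 0.2126, 0.7152, 0.0722  # Rec. 709 weights from the text

def block_mean(img, block):
    """Average each block x block tile of a square image (one value per sensor element)."""
    n = img.shape[0] // block
    return img.reshape(n, block, n, block).mean(axis=(1, 3))

def pixel_shifted_luminance(img, block=4):
    """Luminance grid at twice the sensor resolution (assumed model)."""
    half = block // 2
    red_blue = block_mean(img, block)          # unshifted red and blue sensors
    # Assumption: green elements sit half an element up and to the left.
    # Padding by edge replication keeps the border output pixels covered.
    green = block_mean(np.pad(img, half, mode="edge"), block)
    out = np.arange(img.shape[0] // half)      # output rows/columns 0..9
    rb_idx = out // 2                          # red/blue element under each output pixel
    g_idx = (out + 1) // 2                     # green element under each output pixel
    rb = red_blue[np.ix_(rb_idx, rb_idx)]
    g = green[np.ix_(g_idx, g_idx)]
    return (W_RED + W_BLUE) * rb + W_GREEN * g

image = np.ones((20, 20))
image[10:14, 10:14] = 0.0                      # assumed position of the moved square
luma = pixel_shifted_luminance(image)
print(luma.shape)                              # (10, 10)
print(np.round(luma, 2))
```

With this particular offset the square happens to line up with one green element, so the four central output pixels are dark while a ring of slightly darkened pixels around them forms the halo. Other offsets give less favorable results.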
Now let's look at a 'true' high-resolution sensor with twice as many sensor-elements in both directions (10x10):
For the third time we are using the same image:
And here are the luminance-values of the high-res sensor:
The square renders a little larger than in reality but it is located at the right spot, has the correct shape and there are no halos around the image.
A clear winner.
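The high-res case can be sketched with the same averaging model, now with 2x2 image pixels per sensor element. The offset of 1 image pixel used here is an assumption chosen so that the square does not line up with the 10x10 grid; the helper `block_mean` is hypothetical.

```python
import numpy as np

def block_mean(img, block):
    """Average each block x block tile of a square image (one value per sensor element)."""
    n = img.shape[0] // block
    return img.reshape(n, block, n, block).mean(axis=(1, 3))

# Black square moved by one image pixel (assumed), i.e. half a high-res element.
image = np.ones((20, 20))
image[9:13, 9:13] = 0.0

high_res = block_mean(image, 2)          # 10x10 grid of luminance values
print(np.round(high_res[3:8, 3:8], 2))

# The true square covers the area of 2x2 high-res elements,
# but 3x3 elements are darkened to some degree.
print(int((high_res < 1.0).sum()))       # 9
```

The rendering is larger than the original, but it stays symmetric around the true position, and the center element is fully black.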
Out of curiosity, let's see what a Bayer-pattern sensor would do to this image. Such sensors are used in nearly all single-sensor cameras (DSLRs and low-end camcorders).
Such a sensor combines all three primary colors within its sensor array. The green pixels are usually included twice as often as the red and blue pixels because our eyes are most sensitive to green.
In our 5x5 matrix there are, on average, 12.5 green pixels and 6.25 red and 6.25 blue pixels.
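The fractional pixel counts make sense as an average over the four possible ways of laying a 2x2 Bayer tile onto the 5x5 grid. The following sketch counts the colors for each alignment; the function name `bayer_counts` and the phase parameters are hypothetical.

```python
import numpy as np

def bayer_counts(n, row_phase, col_phase):
    """Count (red, green, blue) elements of an n x n Bayer mosaic for one alignment."""
    rows, cols = np.indices((n, n))
    r = (rows + row_phase) % 2
    c = (cols + col_phase) % 2
    red = int(((r == 0) & (c == 0)).sum())    # one corner of each 2x2 tile
    blue = int(((r == 1) & (c == 1)).sum())   # the opposite corner
    green = n * n - red - blue                # the two remaining positions
    return red, green, blue

counts = [bayer_counts(5, rp, cp) for rp in (0, 1) for cp in (0, 1)]
print(counts)                  # [(9, 12, 4), (6, 13, 6), (6, 13, 6), (4, 12, 9)]

average = np.mean(counts, axis=0)
print(average)                 # 6.25 red, 12.5 green, 6.25 blue
```

Any single 5x5 Bayer sensor therefore has 12 or 13 green elements; the values in the text are the averages.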
For the last time our shifted square as the original image:
Rendering of a Bayer-pattern sensor:
Again, not a surprise: the output of this sensor is by far the worst. Bayer-pattern sensors need a much higher pixel density than the desired resolution of the final image. After all, three colors are combined in just one chip.
All of this was done with a simple Excel sheet. I invite you to download the file and do your own experiments.

More images


For a fair comparison many more images need to be examined. So I have created a video with moving black squares of different sizes.

The video screen is divided into four quadrants: the upper left quadrant shows the original image, a moving black square on a solid white background. The square comes in three different sizes as the video progresses: at first, it is just as large as one pixel of the 5x5 array (like the one in our example). The second square is one quarter of that size (i.e. just as large as one pixel of a 10x10 sensor), and the third and final square is smaller still (again one quarter of the previous size).

The other three quadrants of the video screen show the output of the 5x5 sensor, the pixel-shifted 5x5 sensor and the 10x10 sensor.

Now compare for yourself: Can the pixel-shifted sensor compete with the high-resolution sensor? Does it at least perform any better than an unshifted sensor?
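A sweep like the one in the video can also be approximated numerically. The sketch below moves the large square over every position of the 20x20 image and measures how much area the sensor output darkens. It rests on assumptions of mine: the footprint is counted in units of one 10x10 sensor pixel, any element below a threshold of 0.99 counts as darkened, and the helper names are hypothetical. The video follows its own path, so these numbers need not match the observations below.

```python
import numpy as np

def block_mean(img, block):
    """Average each block x block tile of a square image (one value per sensor element)."""
    n = img.shape[0] // block
    return img.reshape(n, block, n, block).mean(axis=(1, 3))

def footprint(img, block, threshold=0.99):
    """Darkened area, measured in squares the size of one 10x10 sensor pixel."""
    sensed = block_mean(img, block)
    squares_per_element = (block // 2) ** 2
    return int((sensed < threshold).sum()) * squares_per_element

def sweep(block, size=4, n=20):
    """Return (min, max, mean) footprint over all positions of a size x size square."""
    results = []
    for top in range(n - size + 1):
        for left in range(n - size + 1):
            img = np.ones((n, n))
            img[top:top + size, left:left + size] = 0.0
            results.append(footprint(img, block))
    return min(results), max(results), sum(results) / len(results)

print("5x5 sensor:  ", sweep(block=4))
print("10x10 sensor:", sweep(block=2))
```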

Download the Sensor-Shootout Video (640x480, wmv, 2.9 MB).

Here's what I found:
Large Squares

| Criterion | Low-Res Sensor | Low-Res Sensor (pixel-shifted) | High-Res Sensor |
|---|---|---|---|
| Footprint | Varies from 4 to 16 squares, average 10.6 (1 point) | Varies from 4 to 13 squares, average 9.5 (2 points) | Varies from 4 to 12 squares, average 7.0 (3 points) |
| Contrast | Good contrast (2 points) | Most images with low contrast (1 point) | High contrast all over (3 points) |
| Shape | Some images heavily distorted (1 point) | Some images distorted but main part is a square (2 points) | Mostly a square (3 points) |
| Overall score | 4 points | 5 points | 9 points |

Medium Squares

| Criterion | Low-Res Sensor | Low-Res Sensor (pixel-shifted) | High-Res Sensor |
|---|---|---|---|
| Footprint | Varies from 4 to 8 squares, average 4.8 (1 point) | Varies from 2 to 4 squares, average 3.4 (2 points) | Varies from 1 to 4 squares, average 2.4 (3 points) |
| Contrast | Contrast varies from very low to good; some images invisible (1 point) | Contrast is always low (1 point) | Good contrast (3 points) |
| Shape | Sometimes a square, sometimes two adjacent squares (1 point) | Just like low-res but smaller (2 points) | Just like low-res but smaller (3 points) |
| Overall score | 3 points | 5 points | 8 points |

Small Squares

| Criterion | Low-Res Sensor | Low-Res Sensor (pixel-shifted) | High-Res Sensor |
|---|---|---|---|
| Footprint | Contrast below visibility (0 points) | Contrast below visibility (0 points) | Too big; one square (1 point) |
| Contrast | Contrast below visibility (0 points) | Contrast below visibility (0 points) | Good but could be better (2 points) |
| Shape | Contrast below visibility (0 points) | Contrast below visibility (0 points) | Always the correct shape (3 points) |
| Overall score | 0 points | 0 points | 6 points |

Conclusion


This doesn't come as a big surprise: the high-res sensor wins hands down with 23 points over both the low-res sensor (7 points) and the pixel-shifted low-res sensor (10 points).
Pixel-shifted sensors do provide an improvement over unshifted sensors, yet they don't get anywhere near high-resolution sensors.
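The totals are simply the overall scores of the three rounds (large, medium and small squares) added up per sensor. As a quick check in Python, with dictionary keys chosen here only for illustration:

```python
# Overall scores per round: large, medium and small squares.
scores = {
    "low-res": [4, 3, 0],
    "low-res (pixel-shifted)": [5, 5, 0],
    "high-res": [9, 8, 6],
}

totals = {sensor: sum(points) for sensor, points in scores.items()}
print(totals)
# {'low-res': 7, 'low-res (pixel-shifted)': 10, 'high-res': 23}
```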

Would you buy a Ferrari with a 1.2 liter / 100 HP engine? Even if the sales-guy told you it had chip tuning and was just as good as the original 5 liter / 400 HP engine?

So when you are looking for an HD camera, make sure it has the proper sensor resolution, and don't get fooled by the myth of pixel-shifting.