Device apparatus and method for 3D image interpolation based on a degree of similarity between a motion vector and a range motion vector

Application No.: US13519158

Publication No.: US09270970B2

Inventor: Yasunori Ishii

Applicant: Yasunori Ishii

Abstract:

A 3D image interpolation device performs frame interpolation on 3D video. The 3D image interpolation device includes: a range image interpolation unit that generates at least one interpolation range image to be interpolated between a first range image indicating a depth of a first image included in the 3D video and a second range image indicating a depth of a second image included in the 3D video; an image interpolation unit that generates at least one interpolation image to be interpolated between the first image and the second image; and an interpolation parallax image generation unit that generates, based on the interpolation image, at least one pair of interpolation parallax images having parallax according to a depth indicated by the interpolation range image.

Claims:

The invention claimed is:

1. A three-dimensional (3D) image interpolation device that performs frame interpolation on 3D video, the 3D image interpolation device comprising: a range image interpolation unit configured to generate at least one interpolation range image to be interpolated between a first range image and a second range image, the first range image indicating a depth of a first image included in the 3D video, and the second range image indicating a depth of a second image included in the 3D video; an image interpolation unit configured to generate at least one interpolation image to be interpolated between the first image and the second image; a range motion vector calculation unit configured to calculate, as a range motion vector, a motion vector between the first range image and the second range image; an image motion vector calculation unit configured to calculate, as an image motion vector, a motion vector between the first image and the second image; a vector similarity calculation unit configured to calculate a vector similarity that is a value indicating a degree of a similarity between the image motion vector and the range motion vector; and an interpolation parallax image generation unit configured to generate, based on the at least one interpolation image interpolated according to the vector similarity, at least one pair of interpolation parallax images having parallax according to a depth indicated by the at least one interpolation range image; and an interpolation image number determination unit configured to determine an upper limit of the number of interpolations, so that the number of the interpolations increases as the vector similarity calculated by the vector similarity calculation unit increases, wherein the interpolation parallax image generation unit is configured to generate the at least one pair of interpolation parallax images which is equal to or less than the upper limit determined by the interpolation image number determination unit.

2. The 3D image interpolation device according to claim 1, wherein the range motion vector calculation unit is configured to calculate the range motion vector for each block having a first size, the image motion vector calculation unit is configured to calculate the image motion vector for each block having the first size, and the vector similarity calculation unit is configured to: (i) generate at least one of a histogram of directions of range motion vectors including the range motion vector and a histogram of powers of the range motion vectors, for each block having a second size greater than the first size; (ii) generate at least one of a histogram of directions of image motion vectors including the image motion vector and a histogram of powers of the image motion vectors, for each block having the second size; and (iii) calculate the vector similarity based on at least one of (a) a similarity between the histogram of the directions of the range motion vectors and the histogram of the directions of the image motion vectors and (b) a similarity between the histogram of the powers of the range motion vectors and the histogram of the powers of the image motion vectors.

3. The 3D image interpolation device according to claim 1, wherein the interpolation image number determination unit is configured to determine, as the number of the interpolations, a number which is inputted by a user and is equal to or less than the upper limit, and the interpolation parallax image generation unit is configured to generate the at least one pair of interpolation parallax images which is equal to the number of the interpolations determined by the interpolation image number determination unit.

4. The 3D image interpolation device according to claim 1, further comprising a range image obtainment unit configured to: (i) obtain the first range image based on a blur correlation between a plurality of captured images which are included in a first captured image group and have respective different focal distances; and (ii) obtain the second range image based on a blur correlation between a plurality of captured images which are included in a second captured image group and have respective different focal distances, the second captured image group being temporally subsequent to the first captured image group.

5. The 3D image interpolation device according to claim 4, further comprising a texture image obtainment unit configured to: (i) obtain, as the first image, a first texture image by reconstructing one captured image included in the first captured image group based on blur information indicating a feature of blur in the one captured image; and (ii) obtain, as the second image, a second texture image by reconstructing one captured image included in the second captured image group based on blur information indicating a feature of blur in the one captured image.

6. The 3D image interpolation device according to claim 1, wherein the 3D image interpolation device is implemented as an integrated circuit.

7. A 3D imaging apparatus, comprising:

an imaging unit; and

the 3D image interpolation device according to claim 1.

8. A three-dimensional (3D) image interpolation method of performing frame interpolation on 3D video, the 3D image interpolation method comprising: generating at least one interpolation range image to be interpolated between a first range image and a second range image, the first range image indicating a depth of a first image included in the 3D video, and the second range image indicating a depth of a second image included in the 3D video; generating at least one interpolation image to be interpolated between the first image and the second image; calculating, as a range motion vector, a motion vector between the first range image and the second range image; calculating, as an image motion vector, a motion vector between the first image and the second image; calculating a vector similarity that is a value indicating a degree of a similarity between the image motion vector and the range motion vector; and generating, based on the at least one interpolation image interpolated according to the vector similarity, at least one pair of interpolation parallax images having parallax according to a depth indicated by the at least one interpolation range image; and determining an upper limit of the number of interpolations, so that the number of the interpolations increases as the vector similarity calculated by the vector similarity calculation unit increases, wherein the at least one pair of interpolation parallax images generated is equal to or less than the upper limit determined.

9. A non-transitory computer-readable recording medium for use in a computer, the recording medium having a computer program recorded thereon for causing the computer to execute the 3D image interpolation method according to claim 8.

Description:

TECHNICAL FIELD

The present invention relates to three-dimensional (3D) image interpolation devices, 3D imaging apparatuses, and 3D image interpolation methods for performing frame interpolation on 3D video.

BACKGROUND OF THE INVENTION

In recent years, digital still cameras and digital camcorders using a solid-state imaging device (hereinafter, referred to also simply as an “imaging device”) such as a Charge Coupled Device (CCD) image sensor and a Complementary Metal Oxide Semiconductor (CMOS) image sensor have achieved remarkably higher functions and higher performance. In particular, with the advance of the semiconductor manufacturing technologies, pixel structures in such solid-state imaging devices have been further miniaturized.

As a result, higher integration of pixels and driving circuits in solid-state imaging devices has advanced. Consequently, in only a few years, the number of pixels in an imaging device has increased immensely, from about one million pixels to ten million pixels or more. Furthermore, the quality of captured images has also improved dramatically.

In the meanwhile, flat-screen display apparatuses such as Liquid Crystal Displays (LCDs) and plasma displays can save space and display high-definition, high-contrast images. This trend toward higher image quality is expanding from two-dimensional (2D) images to 3D images. Recently, 3D display apparatuses, which can display high-quality 3D images by using polarization eyeglasses or eyeglasses with a high-speed shutter, have been developed.

3D imaging apparatuses for generating high-quality 3D images or high-quality 3D video to be displayed by 3D display apparatuses have also been developed. A simple method of generating 3D images and displaying them on a 3D display apparatus is to capture an image or video with an imaging apparatus having two optical systems (two sets of a lens and an imaging device) located at two different positions. Images captured by the respective optical systems are provided as a left-eye image and a right-eye image to a 3D display apparatus. The 3D display apparatus displays the captured left-eye image and right-eye image by switching them at a high speed, so that a user wearing eyeglasses can perceive the images as a 3D image.

There is another method for generating a left-eye image and a right-eye image, by calculating depth information of a scene by an imaging system including a plurality of cameras, and using the depth information and texture information for the left-eye/right-eye image generation. There is still another method for generating a left-eye image and a right-eye image, by which depth information is calculated from a plurality of images captured by a single camera by varying geometric or optical conditions of a scene (such as a way of light exposure) or conditions of an optical system in an imaging apparatus (such as a diaphragm size).

One example of the above-described method using a plurality of cameras is a multi-baseline stereo method disclosed in Non-Patent Literature 1 by which a depth of each pixel is calculated by simultaneously using images captured by a plurality of cameras. It is known that this multi-baseline stereo method can estimate a scene depth with a higher accuracy than that of a general twin-lens stereo.

The following describes one example of the multi-baseline stereo method in the case where a left-eye image and a right-eye image (parallax images) are generated by using two cameras (a twin-lens stereo). A twin-lens stereo captures two images of a subject from different viewpoints by using two cameras, and extracts feature points from the respective captured images to determine a correspondence relationship between the feature points to find corresponding points. A distance between the found corresponding points is called a parallax. For example, regarding two images captured by the two cameras, if the coordinates (x, y) of the corresponding feature points are (5, 10) and (10, 10), respectively, the parallax is 5. Here, assuming that the two cameras are arranged in parallel to each other, and that “d” represents the parallax, “f” represents the focal distance of the cameras, and “B” represents the distance (baseline) between the cameras, the distance from the cameras to the subject is calculated by the following Equation 1.

[Math. 1]

Z = -Bf/d  (Equation 1)

If the distance between the two cameras is far, a feature point observed by one of the cameras may not be observed by the other camera. Even in such a case, the multi-baseline stereo method can use three or more cameras to reduce ambiguity in the corresponding point search, thereby reducing errors in parallax estimation.

If a depth is determined, it is possible to generate a left-eye image and a right-eye image by using the depth information and a scene texture as disclosed in Non-Patent Literature 2, for example. According to the method disclosed in Non-Patent Literature 2, based on the estimated depth and the scene texture obtained by the imaging apparatus, it is possible to generate images which are virtually captured from virtual camera positions (a virtual left-eye camera position and a virtual right-eye camera position) as new viewpoints. Thereby, it is possible to generate images having viewpoints different from those in actual capturing.

The images having the new viewpoints can be generated by following Equations 2. Here, the respective symbols are the same as those in Equation 1. “xc” represents x-coordinates of a camera for which a depth is calculated, and “xl” and “xr” represent x-coordinates of respective cameras at the newly-generated viewpoints. “xl” is x-coordinates of a (virtual) left-eye camera, and “xr” is x-coordinates of a (virtual) right-eye camera. “tx” represents a distance (baseline) between the virtual cameras.

[Math. 2]

xl = xc + txf/(2Z)

xr = xc - txf/(2Z)  (Equations 2)

As described above, if a depth is calculated by using a plurality of cameras, it is possible to generate a left-eye image and a right-eye image.
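For illustration, the following sketch evaluates Equation 1 and Equations 2 directly: it converts a parallax into a depth and then shifts a pixel's x-coordinate to the virtual left-eye and right-eye camera positions. The numeric values in the usage lines are hypothetical, and the sign of Z simply follows the coordinate convention of Equation 1 as written above.

```python
def depth_from_parallax(d, f, B):
    """Equation 1: depth Z from parallax d, focal distance f, and baseline B.
    The sign follows the coordinate convention used in Equation 1."""
    return -B * f / d

def virtual_viewpoints(xc, Z, tx, f):
    """Equations 2: x-coordinates xl and xr of the virtual left-eye and
    right-eye cameras, given the original x-coordinate xc, the depth Z,
    the virtual baseline tx, and the focal distance f."""
    xl = xc + tx * f / (2.0 * Z)
    xr = xc - tx * f / (2.0 * Z)
    return xl, xr

# Hypothetical values in consistent (pixel) units.
Z = depth_from_parallax(d=5.0, f=1000.0, B=60.0)
xl, xr = virtual_viewpoints(xc=320.0, Z=Z, tx=60.0, f=1000.0)
```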

On the other hand, one example of the method in which conditions regarding a scene are varied to calculate a depth is the photometric stereo method disclosed in Non-Patent Literature 3. When a plurality of images generated by capturing a subject by varying positions of illumination are inputted, a 3D position of the subject is determined based on a 3D relationship between pixel values of the subject and the illumination positions. Furthermore, an example of the method of varying optical conditions of an imaging apparatus is the depth-from-defocus method disclosed in Non-Patent Literature 4. By this method, a distance (depth) from a camera to a subject can be calculated by using (a) a change amount in blur in each pixel in a plurality of images captured by varying a focal distance of the camera, (b) a focal distance of the camera, and (c) a diaphragm size (opening size) of the camera. As described above, various methods for determining scene depth information have been researched. In particular, the depth-from-defocus method has advantages of reducing a size and a weight of an imaging apparatus and not requiring other apparatuses such as an illumination apparatus.

CITATION LIST

Patent Literatures

Non Patent Literature

SUMMARY OF INVENTION

As described above, by using the depth-from-defocus method, it is possible to determine scene depth information by a small single-lens system. However, the depth-from-defocus method requires capturing of two or more images by varying a camera focal distance. In other words, it is necessary, in capturing images, to drive a lens (or an imaging device) backwards and forwards to vary a focal distance of the camera. Therefore, a time required for a single capturing task heavily depends on a driving time and a time required to wait until the lens or imaging device stops vibrating after driving.

Therefore, the depth-from-defocus method is not capable of capturing many images in a second. If video is captured while calculating depth information by the depth-from-defocus method, a frame rate of the video is low.

In order to generate video with a high frame rate from video with a low frame rate, there is a method of performing interpolation using two images in the temporal direction to generate an image having a higher temporal resolution. This method is used, for example, to increase a temporal resolution for smooth display on a display apparatus.

However, if interpolation in a temporal direction is performed using images including blurs for the depth-from-defocus method, it is possible to generate an interpolated image including the blurs, but the blurs affect the depth information calculation. Therefore, the depth-from-defocus method cannot calculate depth information from the interpolated image including blurs.

In addition, in order to increase a temporal resolution of a 2D image, the following method is also considered. First, the depth-from-defocus method is used to generate a left-eye image and a right-eye image for each still image, and then image interpolation is performed for each viewpoint.

However, since the left-eye image and the right-eye image are interpolated separately, it is not assured that the 3D geometric positional relationship between them is correct. Therefore, the images do not look strange when perceived as independent still pictures, but they do cause a feeling of strangeness when perceived as 3D video on a 3D display apparatus.

By the method disclosed in Patent Literature 1, a movement model of a subject is defined, and coordinate information and movement information are interpolated. By this method, it is possible to interpolate not only 2D coordinate information but also 3D movement information. However, since a general scene includes a complicated movement which is difficult to be modeled, it is difficult to apply this method to general scenes.

Thus, in order to overcome the above-described problems of the conventional techniques, one non-limiting and exemplary embodiment provides a 3D image interpolation device, a 3D imaging apparatus, and a 3D image interpolation method which are capable of performing frame interpolation on 3D video with a high accuracy.

In one general aspect, the techniques disclosed here feature a three-dimensional (3D) image interpolation device that performs frame interpolation on 3D video, the 3D image interpolation device including: a range image interpolation unit configured to generate at least one interpolation range image to be interpolated between a first range image and a second range image, the first range image indicating a depth of a first image included in the 3D video, and the second range image indicating a depth of a second image included in the 3D video; an image interpolation unit configured to generate at least one interpolation image to be interpolated between the first image and the second image; and an interpolation parallax image generation unit configured to generate, based on the at least one interpolation image, at least one pair of interpolation parallax images having parallax according to a depth indicated by the at least one interpolation range image.

With the above structure, the interpolation parallax images are generated after separately performing interpolation for 2D images and interpolation for range images when frame interpolation is performed on 3D video. Therefore, it is possible to suppress interpolation errors in the depth direction more than in the case where interpolation parallax images are generated by separately performing interpolation for left-eye images and interpolation for right-eye images. As a result, the frame interpolation on 3D video can be performed with a high accuracy. In addition, a left-eye interpolation image and a right-eye interpolation image are generated by using the same interpolation range image and the same interpolation image. Therefore, the 3D video for which the frame interpolation has been performed hardly causes the user viewing the 3D video to feel uncomfortable due to the interpolation.

It is possible that the 3D image interpolation device further includes: a range motion vector calculation unit configured to calculate, as a range motion vector, a motion vector between the first range image and the second range image; an image motion vector calculation unit configured to calculate, as an image motion vector, a motion vector between the first image and the second image; a vector similarity calculation unit configured to calculate a vector similarity that is a value indicating a degree of a similarity between the image motion vector and the range motion vector; and an interpolation image number determination unit configured to determine an upper limit of the number of interpolations, so that the number of the interpolations increases as the vector similarity calculated by the vector similarity calculation unit increases, wherein the interpolation parallax image generation unit is configured to generate the at least one pair of interpolation parallax images which is equal to or less than the upper limit determined by the interpolation image number determination unit.

With the above structure, it is possible to determine the upper limit of interpolations depending on a similarity between a range motion vector and an image motion vector. When the similarity between the range motion vector and the image motion vector is low, there is a high possibility that the range motion vector or the image motion vector is not correctly calculated. Therefore, in such a case, the interpolation upper limit is set low so as to prevent the interpolation parallax images from deteriorating the image quality of the 3D video.

It is also possible that the range motion vector calculation unit is configured to calculate the range motion vector for each block having a first size, the image motion vector calculation unit is configured to calculate the image motion vector for each block having the first size, and the vector similarity calculation unit is configured to: (i) generate at least one of a histogram of directions of range motion vectors including the range motion vector and a histogram of powers of the range motion vectors, for each block having a second size greater than the first size; (ii) generate at least one of a histogram of directions of image motion vectors including the image motion vector and a histogram of powers of the image motion vectors, for each block having the second size; and (iii) calculate the vector similarity based on at least one of (a) a similarity between the histogram of the directions of the range motion vectors and the histogram of the directions of the image motion vectors and (b) a similarity between the histogram of the powers of the range motion vectors and the histogram of the powers of the image motion vectors.

With the above structure, it is possible to calculate a vector similarity based on at least one of a histogram of motion vector directions and a histogram of motion vector powers. It is thereby possible to improve a correlation between a possibility of incorrect calculation of motion vectors and a vector similarity. As a result, the interpolation upper limit can be determined appropriately.
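As a rough sketch of this idea (not the patented implementation itself), the following code builds histograms of motion-vector directions and powers within one large block and compares them with a normalized histogram intersection. The number of bins, the intersection measure, and the averaging of the two similarities are all assumptions; the description above only requires that at least one of the two histogram comparisons be used.

```python
import numpy as np

def direction_power_histograms(vectors, n_bins, max_power):
    """Histograms of directions and powers (magnitudes) of the motion
    vectors inside one large block; `vectors` is an (N, 2) array (vx, vy)."""
    vx, vy = vectors[:, 0], vectors[:, 1]
    h_dir, _ = np.histogram(np.arctan2(vy, vx), bins=n_bins, range=(-np.pi, np.pi))
    h_pow, _ = np.histogram(np.hypot(vx, vy), bins=n_bins, range=(0.0, max_power))
    return h_dir, h_pow

def histogram_similarity(h1, h2):
    """Normalized histogram intersection in [0, 1] (an assumed measure)."""
    h1 = h1 / max(h1.sum(), 1)
    h2 = h2 / max(h2.sum(), 1)
    return float(np.minimum(h1, h2).sum())

def vector_similarity(range_vectors, image_vectors, n_bins=8):
    """Vector similarity for one large block, combining the direction and
    power histogram similarities; the averaging is an assumption."""
    max_power = max(np.hypot(*range_vectors.T).max(),
                    np.hypot(*image_vectors.T).max()) + 1e-6
    rd, rp = direction_power_histograms(range_vectors, n_bins, max_power)
    im_d, im_p = direction_power_histograms(image_vectors, n_bins, max_power)
    return 0.5 * (histogram_similarity(rd, im_d) + histogram_similarity(rp, im_p))
```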

It is further possible that the interpolation image number determination unit is configured to determine, as the number of the interpolations, a number which is inputted by a user and is equal to or less than the upper limit, and the interpolation parallax image generation unit is configured to generate the at least one pair of interpolation parallax images which is equal to the number of the interpolations determined by the interpolation image number determination unit.

With the above structure, the number of interpolations can be determined based on an input from the user. As a result, it is possible to prevent the frame interpolation from causing the user to feel uncomfortable.

It is still further possible that the 3D image interpolation device further includes a range image obtainment unit configured to: (i) obtain the first range image based on a blur correlation between a plurality of captured images which are included in a first captured image group and have respective different focal distances; and (ii) obtain the second range image based on a blur correlation between a plurality of captured images which are included in a second captured image group and have respective different focal distances, the second captured image group being temporally subsequent to the first captured image group.

With the above structure, a plurality of captured images having respective different focal distances can be used as inputs. Therefore, this can contribute to reducing the size of the imaging apparatus.

It is still further possible that the 3D image interpolation device further includes a texture image obtainment unit configured to: (i) obtain, as the first image, a first texture image by reconstructing one captured image included in the first captured image group based on blur information indicating a feature of blur in the one captured image; and (ii) obtain, as the second image, a second texture image by reconstructing one captured image included in the second captured image group based on blur information indicating a feature of blur in the one captured image.

With the above structure, it is possible to generate interpolation parallax images based on a texture image.

It is still further possible that the 3D image interpolation device is implemented as an integrated circuit.

It is still further possible to provide a 3D imaging apparatus including: an imaging unit; and the above-described 3D image interpolation device.

With the above structure, the 3D imaging apparatus can offer the same advantages as those of the above-described 3D image interpolation device.

It should be noted that the present disclosure can be implemented not only as the 3D image interpolation device including the above characteristic units, but also as a 3D image interpolation method including steps performed by the characteristic units of the 3D image interpolation device. The present disclosure may be implemented also as a program causing a computer to execute the characteristic steps of the 3D image interpolation method. Of course, the program can be distributed via a non-transitory recording medium such as a Compact Disc-Read Only Memory (CD-ROM) or via a transmission medium such as the Internet.

The present disclosure is capable of performing frame interpolation on 3D video with a high accuracy.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram showing an overall structure of a 3D imaging apparatus according to an embodiment of the present disclosure.

FIG. 2 is a block diagram showing a structure of a 3D image interpolation unit according to the embodiment of the present disclosure.

FIG. 3 is a flowchart of processing performed by the 3D image interpolation unit according to the embodiment of the present disclosure.

FIG. 4 is a flowchart of processing performed by a range image obtainment unit according to the embodiment of the present disclosure.

FIG. 5 is a diagram for explaining an example of a motion vector calculation method according to the embodiment of the present disclosure.

FIG. 6 is a diagram showing a relationship among a blurred image, an omnifocal image, and PSF.

FIG. 7 is a diagram showing how to determine a size of a blur kernel according to the embodiment of the present disclosure.

FIG. 8 is a flowchart of processing performed by a vector similarity calculation unit according to the embodiment of the present disclosure.

FIG. 9 is a diagram showing one example of a method of inputting the number of interpolations according to the embodiment of the present disclosure.

FIG. 10 is a diagram for explaining a method of generating interpolation range images and interpolation texture images according to the embodiment of the present disclosure.

FIG. 11 is a diagram for explaining a method of generating parallax images according to the embodiment of the present disclosure.

FIG. 12 is a block diagram showing a functional structure of a 3D image interpolation device according to another embodiment of the present disclosure.

FIG. 13 is a flowchart of processing performed by the 3D image interpolation unit according to the other embodiment of the present disclosure.

DETAILED DESCRIPTION OF INVENTION

The following describes embodiments according to the present disclosure with reference to the drawings. It should be noted that all the embodiments described below are specific examples of the present disclosure. Numerical values, shapes, materials, constituent elements, arrangement positions and the connection configuration of the constituent elements, steps, the order of the steps, and the like described in the following embodiments are merely examples, and are not intended to limit the present disclosure. The present disclosure is based on the appended claims. Therefore, among the constituent elements in the following embodiments, constituent elements that are not described in independent claims that show the most generic concept of the present disclosure are described as elements constituting more desirable configurations, although such constituent elements are not necessarily required to achieve the object of the present disclosure.

It should also be noted that hereinafter, an “image” refers to signals or information two-dimensionally expressing luminance or colors of a scene. Furthermore, a “range image” refers to signals or information two-dimensionally expressing a distance (depth) from a camera to the scene. Moreover, “parallax images” refer to a plurality of images (for example, a right-eye image and a left-eye image) corresponding to respective different viewpoints.

FIG. 1 is a block diagram showing an overall structure of a 3D imaging apparatus 10 according to an embodiment of the present disclosure. The 3D imaging apparatus 10 according to the present embodiment is a digital electronic camera. The 3D imaging apparatus 10 includes an imaging unit 100, a signal processing unit 200, and a display unit 300. The following describes the imaging unit 100, the signal processing unit 200, and the display unit 300 in more detail.

The imaging unit 100 captures an image of a scene. A scene refers to everything seen in an image captured by the imaging unit 100. A scene includes a background in addition to a subject.

As shown in FIG. 1, the imaging unit 100 includes an imaging device 101, an optical lens 103, a filter 104, a control unit 105, and a device driving unit 106.

The imaging device 101 is a solid-state imaging device such as a CCD image sensor or a CMOS image sensor. The imaging device 101 is manufactured by a known semiconductor manufacturing technology. For example, the imaging device 101 includes a plurality of light-sensing cells arranged in rows and columns on an imaging plane.

The optical lens 103 generates an image on the imaging plane of the imaging device 101. Although the imaging unit 100 according to the present embodiment includes the single optical lens 103, it may include a plurality of optical lenses.

The filter 104 is an infrared cut filter through which visible light can pass but near-infrared light (IR) cannot pass. It should be noted that the imaging unit 100 may not include the filter 104.

The control unit 105 generates basic signals for driving the imaging device 101. Furthermore, the control unit 105 receives output signals of the imaging device 101 and provides them to the signal processing unit 200.

Based on the basic signals generated by the control unit 105, the device driving unit 106 drives the imaging device 101. It should be noted that the control unit 105 and the device driving unit 106 are implemented as Large Scale Integrations (LSIs) such as CCD drivers.

The signal processing unit 200 generates image signals based on the signals issued from the imaging unit 100. As shown in FIG. 1, the signal processing unit 200 includes a memory 201, a 3D image interpolation unit 202, and an interface unit 203.

The 3D image interpolation unit 202 performs frame interpolation on 3D video. The 3D image interpolation unit 202 may be appropriately implemented as a combination of hardware such as a known digital signal processor (DSP) and software for executing image processing including image signal generation. The 3D image interpolation unit 202 will be described in more detail later with reference to corresponding figures.

The memory 201 is, for example, a Dynamic Random Access Memory (DRAM) or the like. On the memory 201, signals obtained from the imaging unit 100 are recorded, and furthermore, image data generated by the 3D image interpolation unit 202 or its compressed data is temporarily recorded. These pieces of image data are provided to a recording medium (not shown) or the display unit 300 via the interface unit 203.

The display unit 300 displays capturing conditions or captured images. Furthermore, the display unit 300 is, for example, a capacitance touch panel or a resistance film touch panel, and can serve also as a receiving unit that receives inputs from a user. Input information from the user is used in control of the signal processing unit 200 and the imaging unit 100 via the interface unit 203.

It should be noted that the 3D imaging apparatus 10 according to the present embodiment may further include known structural elements such as an electronic shutter, a view finder, a power source (battery), and a flash light, but these elements are not essential to understand the present disclosure so that they will not be described herein.

FIG. 2 is a block diagram showing an overall structure of the 3D image interpolation unit 202 according to the embodiment of the present disclosure. As shown in FIG. 2, the 3D image interpolation unit 202 includes a range image obtainment unit 400, a texture image obtainment unit 408, a range motion vector calculation unit 401, an image motion vector calculation unit 402, a vector similarity calculation unit 403, an interpolation image number determination unit 404, a range image interpolation unit 405, an image interpolation unit 406, and an interpolation parallax image generation unit 407.

The range image obtainment unit 400 obtains a first range image and a second range image. The first range image expresses a depth of a first image, and the second range image expresses a depth of a second image. The first and second images are included in 3D video and have the same viewpoint, so that they are used for frame interpolation.

According to the present embodiment, the range image obtainment unit 400 obtains the first range image based on a blur correlation among a plurality of captured images which are included in the first captured image group and have different focal distances. In addition, the range image obtainment unit 400 obtains the second range image based on a blur correlation among a plurality of captured images which are included in a second captured image group and have different focal distances.

Each of the first captured image group and the second captured image group includes a plurality of images captured by varying a focal distance by the imaging unit 100. The second captured image group is temporally subsequent to the first captured image group.

The texture image obtainment unit 408 obtains a first texture image as the first image, by reconstructing a single captured image included in the first captured image group by using blur information indicating a blur feature of the single captured image. In addition, the texture image obtainment unit 408 obtains a second texture image as the second image, by reconstructing a single captured image included in the second captured image group by using blur information indicating a blur feature of the single captured image.

According to the present embodiment, a texture image refers to an image that is generated by reconstructing a captured image using blur information indicating a blur feature in the captured image. In other words, a texture image is an image from which blur included in the captured image has been removed. Therefore, a texture image is an image in which all pixels come into the same focus.

It should be noted that it is not necessary to use the first texture image and the second texture image as the first image and the second image, respectively. In other words, the first and second images may include blur. In this case, the 3D image interpolation unit 202 may not include the texture image obtainment unit 408.

The range motion vector calculation unit 401 calculates a motion vector from the first range image and the second range image. Here, the motion vector calculated from the first range image and the second range image is referred to as a range motion vector.

The image motion vector calculation unit 402 calculates a motion vector from the first image and the second image. Here, the motion vector calculated from the first image and the second image is referred to as an image motion vector.

The vector similarity calculation unit 403 calculates a vector similarity that is a value indicating a degree of a similarity between the range motion vector and the image motion vector. The method of calculating the vector similarity will be described later in detail.

The interpolation image number determination unit 404 determines an upper limit of interpolations so that the number of interpolations increases as the calculated similarity increases.
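The exact mapping from the calculated similarity to the upper limit is not fixed by the description above, so the simple linear rule below is only an assumption that preserves the required monotonic behavior; the clamping of a user-requested count, as in claim 3, is shown in the usage lines.

```python
def interpolation_upper_limit(similarity, max_interpolations=4):
    """Map a vector similarity in [0, 1] to an upper limit on the number of
    interpolation frames. The linear mapping and max_interpolations=4 are
    assumptions; only the monotonic increase is required by the text."""
    similarity = min(max(similarity, 0.0), 1.0)
    return int(round(similarity * max_interpolations))

# A user-requested count is clamped to the determined upper limit.
requested = 3
n_interpolations = min(requested, interpolation_upper_limit(0.65))
```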

The range image interpolation unit 405 generates at least one interpolation range image to be interpolated between the first range image and the second range image. More specifically, the range image interpolation unit 405 generates interpolation range images which are equal to or less than the interpolation upper limit determined by the interpolation image number determination unit 404.

The image interpolation unit 406 generates at least one interpolation image to be interpolated between the first image and the second image. According to the present embodiment, the image interpolation unit 406 generates at least one interpolation texture image to be interpolated between the first texture image and the second texture image.

More specifically, the image interpolation unit 406 generates interpolation images which are equal to or less than the interpolation upper limit determined by the interpolation image number determination unit 404.

The interpolation parallax image generation unit 407 generates, based on an interpolation image, at least one pair of interpolation parallax images having parallax according to a depth of an interpolation range image. According to the present embodiment, the interpolation parallax image generation unit 407 generates interpolation parallax image pairs which are equal to or less than the interpolation upper limit determined by the interpolation image number determination unit 404.

The 3D image interpolation unit 202 performs such frame interpolation on 3D video, by generating these interpolation parallax images as described above. The 3D video for which the frame interpolation has been performed is provided to, for example, a 3D display apparatus (not shown). The 3D display apparatus displays the 3D video, for example, by a 3D display method using eyeglasses. The eyeglasses 3D display method is a method of displaying a left-eye image and a right-eye image having parallax to a user wearing eyeglasses (for example, liquid crystal shutter eyeglasses or polarization eyeglasses).

It should be noted that the 3D display apparatus does not always need to display the parallax images by the eyeglasses 3D display method, but may display them by a glasses-free 3D display method. Examples of the glasses-free 3D display method, which does not use eyeglasses, include a parallax barrier method and a lenticular lens method.

The following describes processing performed by the 3D image interpolation unit 202 having the above-described structure.

FIG. 3 is a flowchart of processing performed by the 3D image interpolation unit 202 according to the present embodiment of the present disclosure. It is assumed herein that the first image is the first texture image and the second image is the second texture image.

First, the range image obtainment unit 400 obtains the first range image and the second range image (S102). The range motion vector calculation unit 401 calculates a motion vector (range motion vector) from the first range image and the second range image (S104). The texture image obtainment unit 408 obtains the first texture image and the second texture image (S105). The image motion vector calculation unit 402 calculates a motion vector (image motion vector) from the first texture image and the second texture image (S106).

The vector similarity calculation unit 403 calculates a similarity between the range motion vector and the image motion vector (S108). The interpolation image number determination unit 404 determines an upper limit of interpolations so that the number of interpolations increases as the calculated similarity increases (S110).

The range image interpolation unit 405 generates interpolation range images which are equal to or less than the interpolation upper limit, in order to be interpolated between the first range image and the second range image (S112). The image interpolation unit 406 generates interpolation texture images which are equal to or less than the interpolation upper limit, in order to be interpolated between the first texture image and the second texture image (S114).

The interpolation parallax image generation unit 407 generates, based on an interpolation texture image, a pair of interpolation parallax images having parallax according to a depth indicated by an interpolation range image corresponding to the interpolation texture image (S116).

The interpolation parallax images are generated as described above, and frame interpolation is performed on 3D video. It should be noted that the processing from Step S102 to Step S116 is repeated by switching a current image (the first texture image or the second texture image) to be interpolated.

Next, each of the steps shown in FIG. 3 is described in more detail.

<Range Image Obtainment (S102)>

According to the present embodiment, the range image obtainment unit 400 obtains range images each indicating a distance from the camera to a scene (hereinafter, referred to also as a “subject distance” or simply as a “distance”), based on a plurality of images captured by the imaging unit 100. The following describes a method of measuring the distance for each pixel by the depth-from-defocus method disclosed in Patent Literature 2. It should be noted that the range image obtainment unit 400 may obtain range images by other methods (for example, the stereo method using a plurality of cameras, the photometric stereo method, or the TOF method using an active sensor).

In the depth-from-defocus method, first, the imaging unit 100 captures, as a single image group, a plurality of images having different blurs, by varying setting of the lens or diaphragm. The imaging unit 100 generates a plurality of such image groups by repeating the above-described capturing. Here, one image group among the plurality of image groups generated as described above is referred to as the first image group, and an image group temporally subsequent to the first image group is referred to as the second image group.

Here, as one example, the description is given for processing in which the range image obtainment unit 400 obtains a single range image from a single image group.

The range image obtainment unit 400 calculates, for each pixel, a correlation amount of blur among the captured images included in the first image group. The range image obtainment unit 400 obtains (selects) a range image from the first image group, by referring, for each pixel, to a reference table in which a relationship between a blur correlation amount and a subject distance is predetermined.

FIG. 4 is a flowchart of processing performed by the range image obtainment unit 400 according to the present embodiment of the present disclosure. More specifically, FIG. 4 shows a distance measurement method using the depth-from-defocus method.

First, the range image obtainment unit 400 obtains, from the imaging unit 100, two captured images showing the same scene but having different focal distances (S202). It is assumed that the two captured images are included in the first image group. The focal distance can be changed by moving a position of the lens or imaging device.

Next, for each of the two images, the range image obtainment unit 400 sets, as a DFD kernel, a region including (a) a current pixel for which a distance is to be measured and (b) pixel groups in a region around the pixel (S204). This DFD kernel is a target region for which a subject distance is to be measured. A size or a shape of the DFD kernel is not specifically limited. For example, it is possible to set, as the DFD kernel, a rectangular region of 10 pixels×10 pixels around the current pixel to be measured.

Then, the range image obtainment unit 400 extracts the region set as the DFD kernel from each of the two images captured by varying the focal distance, and calculates a blur correlation amount for each pixel between the DFD kernels (S206).

Here, the range image obtainment unit 400 weights the blur correlation amount calculated for the respective pixels in the DFD kernel, by using a weighting coefficient predetermined for the DFD kernel (S208). For example, a greater value of the weighting coefficient is assigned to a location closer to the center of the DFD kernel, and a smaller value of the weighting coefficient is assigned to a location closer to the end of the DFD kernel. It should be noted that a known weighting distribution such as a Gaussian distribution may be used as the weighting coefficient. The weighting processing provides robustness against noise. A sum of the weighted blur correlation amounts regarding the respective pixels is treated as a blur correlation amount of the DFD kernel (hereinafter, referred to as a “DFD kernel blur correlation amount”).

Finally, the range image obtainment unit 400 calculates the subject distance from the DFD kernel blur correlation amount by using a lookup table indicating a relationship between a subject distance and a DFD kernel blur correlation amount (S210). In the lookup table, a DFD kernel blur correlation amount has a linear relationship with the reciprocal of a subject distance (refer to Non-Patent Literature 5 for the lookup table calculation). If the lookup table does not include a corresponding DFD kernel blur correlation amount, the range image obtainment unit 400 may calculate a subject distance by interpolation. It is desirable that the lookup table is changed if the optical system is changed. Here, the range image obtainment unit 400 may prepare a plurality of lookup tables depending on diaphragm sizes or focal distances. Since the setting information of the optical system is known in image capturing, it is possible to predetermine a lookup table to be used.
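A minimal sketch of this lookup step follows, assuming the table is well approximated by a linear relation between the blur correlation amount and the reciprocal of the subject distance, as stated above; the slope and intercept are hypothetical calibration constants that would be prepared per diaphragm size and focal distance.

```python
def subject_distance_from_correlation(corr, slope, intercept):
    """Convert a DFD kernel blur correlation amount into a subject distance,
    assuming corr = slope * (1 / d) + intercept, i.e. the correlation amount
    is linear in the reciprocal of the distance d."""
    inv_d = (corr - intercept) / slope
    return 1.0 / inv_d

# Hypothetical calibration constants for one optical setting.
d = subject_distance_from_correlation(corr=0.8, slope=500.0, intercept=0.1)
```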

Next, a method of calculating a blur correlation amount is described.

It is assumed that two images captured with different focal distances are referred to as an image G1 and an image G2. The range image obtainment unit 400 selects a current pixel for which a subject distance is to be measured, and sets pixel values in a rectangular region of M pixels×M pixels around the current pixel to be a DFD kernel in each of the image G1 and the image G2. The pixel value in the DFD kernel is expressed as g1(u, v) for the image G1 and as g2(u, v) for the image G2, where {u, v: 1, 2, 3, . . . M}. The coordinates of the current pixel are expressed as (cu, cv). The blur correlation amount G(u, v) of each pixel in an arbitrary pixel position (u, v) in the DFD kernel is expressed by the following Equation 3.

[Math. 3]

G(u,v) = C{g1(u,v) - g2(u,v)} / {Δg1(u,v) + Δg2(u,v)}  (Equation 3)



where C is a constant that is experimentally determined, and Δ represents a quadratic differential (Laplacian) of a pixel value. As described above, a blur correlation amount for each pixel is determined by dividing (a) a difference of a pixel value of a predetermined pixel between the two images having different blurs by (b) an average value of quadratic differentials of the predetermined pixels in the two images. This blur correlation amount indicates a degree of a correlation of blurs on a pixel-by-pixel basis in an image.
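A minimal sketch of Equation 3 combined with the weighting of step S208 is shown below. It assumes the two DFD kernels have already been cut out of the two differently focused images, uses scipy's Laplacian as the quadratic differential, and uses a Gaussian-shaped weighting map whose width is an arbitrary choice; the constant C and the small epsilon guarding the division are also placeholders.

```python
import numpy as np
from scipy.ndimage import laplace  # quadratic differential (Laplacian)

def dfd_kernel_blur_correlation(g1, g2, C=1.0, eps=1e-6):
    """Blur correlation amount of one DFD kernel.
    g1, g2: the M x M blocks cut from the two images with different blurs.
    Per-pixel values follow Equation 3; the weighted sum corresponds to
    step S208 (larger weights near the kernel centre)."""
    f1, f2 = g1.astype(float), g2.astype(float)
    per_pixel = C * (f1 - f2) / (laplace(f1) + laplace(f2) + eps)  # Equation 3

    # Centre-weighted coefficient map (assumed Gaussian shape and width).
    m = f1.shape[0]
    y, x = np.mgrid[0:m, 0:m] - (m - 1) / 2.0
    weights = np.exp(-(x ** 2 + y ** 2) / (2.0 * (m / 4.0) ** 2))

    return float(np.sum(per_pixel * weights))
```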

By the above-described processing, the range image obtainment unit 400 obtains a range image indicating a distance from the camera to the subject from a captured image group. More specifically, the range image obtainment unit 400 obtains the first range image based on a blur correlation between a plurality of captured images which are included in the first captured image group and have different focal distances. In addition, the range image obtainment unit 400 obtains the second range image based on a blur correlation between a plurality of captured images which are included in the second captured image group temporally subsequent to the first captured image group and have different focal distances.

It should be noted that the range image obtainment unit 400 does not always need to perform the above-described processing to obtain the range images. For example, the range image obtainment unit 400 may merely receive range images that have been generated by the imaging unit 100 having a distance sensor.

<Range Motion Vector Calculation (S104)>

The range motion vector calculation unit 401 calculates a motion vector from the first range image and the second range image.

More specifically, the range motion vector calculation unit 401 first determines, for each pixel, a point in the first range image which corresponds to a point in the second range image (hereinafter, referred to as corresponding points in the first and second range images). Then, the range motion vector calculation unit 401 determines a vector connecting the corresponding points as a motion vector. The motion vector indicates a motion amount and a motion direction for the pixel in the first and second images. The motion vector is explained with reference to FIG. 5.

FIG. 5 (a) shows a range image at time t (the first range image) and a range image at time t+1 (the second range image). In FIG. 5 (a), a pixel A and a pixel B are determined as corresponding points, by searching the image at time t+1 for a pixel corresponding to the pixel A at time t.

Here, a method of searching for the corresponding points is explained. First, the range motion vector calculation unit 401 calculates a correlation value between (a) a region regarding the pixel A and (b) a region regarding a pixel included in a search region, in order to search the range image at time t+1 for a pixel corresponding to the pixel A. The correlation value is calculated by using, for example, a Sum of Absolute Difference (SAD) or a Sum of Squared Difference (SSD).

The search region is, for example, shown in FIG. 5 (a) as framed by a broken line in the range image at time t+1. It should be noted that a size of the search region may be set larger, if an object in the scene moves fast or if an interval between time t and time t+1 is long. On the other hand, the size of the search region may be set smaller, if the object in the scene moves slowly or if the interval between time t and time t+1 is short.

Equations 4 for calculating correlation values using SAD or SSD are presented below.

[Math. 4]

corsad = Σ(u=0 to N) Σ(v=0 to M) |I1(u+i1, v+j1) - I2(u+i2, v+j2)|

corssd = Σ(u=0 to N) Σ(v=0 to M) (I1(u+i1, v+j1) - I2(u+i2, v+j2))^2  (Equations 4)



where I1(u, v) represents a pixel value of a pixel (u, v) in the image I1 at time t, and I2(u, v) represents a pixel value of a pixel (u, v) in the image I2 at time t+1. The range motion vector calculation unit 401 calculates a correlation value between (a) a region of N pixels×M pixels around a pixel (i1, j1) in the image I1 and (b) a region of N pixels×M pixels around a pixel (i2, j2) in the image I2 according to Equations 4, so as to search the image I2 for a region similar to the region of N pixels×M pixels around the pixel (i1, j1) in the image I1. Here, “corsad” represents a correlation value determined by SAD, and “corssd” represents a correlation value determined by SSD. Either of them can be used as the correlation value. Each of “corsad” and “corssd” has a value that decreases as the correlation increases.

The range motion vector calculation unit 401 calculates a correlation value by switching a pixel (i2, j2) in the search region. The range motion vector calculation unit 401 determines, as a pixel corresponding to the pixel A, a pixel (i2, j2) having a minimum correlation value from among the correlation values calculated as described above.

It should be noted that the method of calculating a correlation value according to SAD or SSD has been described above assuming that illumination fluctuation or contrast fluctuation is small between the two images. However, if illumination fluctuation or contrast fluctuation is large between the two images, it is desirable to calculate a correlation value by using, for example, a normalized cross-correlation method. It is thereby possible to search for corresponding points more robustly.

The range motion vector calculation unit 401 can determine a motion vector for each pixel between the two range images, by performing the above-described processing for each of the pixels. Here, it is also possible to perform noise cancellation such as median filtering after the motion vector calculation.

It should also be noted that a motion vector may not be calculated for each pixel. For example, the range motion vector calculation unit 401 may calculate a range motion vector for each of blocks having a first size which are divided from an image. This case can reduce a load on the motion vector calculation in comparison to the case where a motion vector is calculated for each pixel.
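A minimal block-matching sketch based on the SAD correlation of Equations 4 is given below; the block size, the search radius, and the exhaustive search are illustrative choices, and the same routine can be applied per pixel or per block of the first size mentioned above.

```python
import numpy as np

def block_motion_vector(img1, img2, i1, j1, block=8, search=16):
    """Find the motion vector of the block of `block` x `block` pixels whose
    top-left corner in img1 is (i1, j1), by minimizing the SAD of Equations 4
    over a +/- `search` pixel window in img2."""
    h, w = img2.shape
    ref = img1[i1:i1 + block, j1:j1 + block].astype(float)
    best, best_vec = None, (0, 0)
    for di in range(-search, search + 1):
        for dj in range(-search, search + 1):
            i2, j2 = i1 + di, j1 + dj
            if i2 < 0 or j2 < 0 or i2 + block > h or j2 + block > w:
                continue  # candidate block falls outside the second image
            cand = img2[i2:i2 + block, j2:j2 + block].astype(float)
            sad = np.abs(ref - cand).sum()  # corsad of Equations 4
            if best is None or sad < best:
                best, best_vec = sad, (di, dj)
    return best_vec
```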

<Texture Image Obtainment (S105)>

According to the present embodiment, the texture image obtainment unit 408 first calculates the first texture image by using the first image group and the first range image. In addition, the texture image obtainment unit 408 calculates the second texture image by using the second image group and the second range image.

More specifically, the texture image obtainment unit 408 generates the first texture image, by reconstructing a single captured image included in the first captured image group based on blur information indicating a blur feature of the single captured image. In addition, the texture image obtainment unit 408 generates the second texture image, by reconstructing a single captured image included in the second captured image group based on blur information indicating a blur feature of the single captured image.

The following describes these processes in more detail with reference to the corresponding figures.

First, a method of calculating the texture images is explained. A texture image according to the present embodiment refers to an image from which a blur included in a captured image is removed by using a range image obtained by the depth-from-defocus method. Therefore, a texture image is an image (omnifocal image) in which all pixels come into the same focus.

First, a method of generating a texture image from a captured image is described. According to the present embodiment, the texture image obtainment unit 408 calculates blur information (a blur kernel) indicating a size of a blur in each pixel, based on a range image and a formula of the lens.

The texture image obtainment unit 408 performs an inverse convolution operation (restoration) on a captured image using the blur kernel of each pixel, so as to generate a texture image (omnifocal image) in which all pixels come into the same focus.

In order to describe the above processing, how a blur occurs in an image is first explained. A luminance distribution of an omnifocal image without blur is expressed as s(x, y), and a blur function (Point Spread Function (PSF)) indicating a blur size is expressed as f(x, y). Here, for the sake of simplicity of explanation, it is assumed that blurs having a blur function “f” homogeneously occur over the entire image. The following Equation 5 is established if noise influence is ignored.

[Math. 5]



i(x,y)=s(x,y)*f(x,y)  (Equation 5)



where the symbol “*” represents a convolution operation. FIG. 6 shows an example where Equation 5 is expressed by images. If an omnifocal image is given as a point as shown in FIG. 6, it is convoluted by a circular blur function (defined in more detail later) to generate a blur image i(x, y). The blur function is referred to also as a blur kernel. A diameter of a circle of the blur function is called a kernel size.

The right side of Equation 5 is generally expressed by the following Equation 6.

[Math. 6]

s(x,y)*f(x,y) = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} s(j,k)·f(x−j, y−k)  (Equation 6)

If an image consists of M pixels×N pixels, the above Equation 6 can be expressed by the following Equation 7.

[Math. 7]

s(x,y)*f(x,y) = (1/(M×N)) Σ_{j=0}^{M−1} Σ_{k=0}^{N−1} s(j,k)·f(x−j, y−k)  (Equation 7)

Generally, the Fourier transform of a convolution of two functions is expressed as the product of the Fourier transforms of the functions. Therefore, if the Fourier transforms of i(x, y), s(x, y), and f(x, y) are expressed as I(u, v), S(u, v), and F(u, v), respectively, the following Equation 8 is derived from Equation 5. Here, (u, v) represents coordinates in the frequency domain, and the coordinates indicate a spatial frequency in an x-direction and a spatial frequency in a y-direction, respectively, in an actual image.

[Math. 8]



I(u,v)=S(u,v)∘F(u,v)  (Equation 8)



where the symbol "∘" represents multiplication of functions in the frequency domain. Equation 8 is transformed to the following Equation 9.

[Math. 9]

S(u,v) = I(u,v)/F(u,v)  (Equation 9)

Equation 9 expresses that a function generated by dividing (a) a Fourier transform I(u, v) of an image i(x, y) captured by the camera by (b) a Fourier transform F(u, v) of f(x, y) that is a blur function PSF is equivalent to a Fourier transform S(u, v) of the omnifocal image s(x, y).

If f(x, y) that is a blur function PSF for each pixel is determined in the above manner, it is possible to determine the omnifocal image s(x, y) from the captured image i(x, y).

Then, an example of the method of calculating a blur function PSF for each pixel is explained. FIG. 7 shows a schematic diagram of the lens. It is assumed that a size of a blur kernel in capturing a subject having a distance “d” from the camera is “B”, and a distance to the imaging plane is “C”. A diaphragm diameter (opening size) “A” and a focal distance “f” are known from the setting conditions of the camera. Here, since a relationship between the opening size “A” and the focal distance “f” is similar to a relationship between the blur kernel “B” and a difference between the distance “C” to the imaging plane and the focal distance “f”, the following Equation 10 is obtained.

[Math. 10]



A:B=f:C−f  (Equation 10)

According to Equation 10, the blur kernel size “B” is expressed by the following Equation 11.

[Math. 11]

B = (C − f)·A/f  (Equation 11)

Here, the following Equation 12 is obtained based on the formula of the lens.

[Math. 12]

1/C + 1/d = 1/f  (Equation 12)

Since the distance “d” from the camera to the subject and the focal distance “f” are known, Equation 11 can be transformed using Equation 12 to the following Equation 13.

[Math. 13]

B = (1/(1/f − 1/d) − f)·A/f  (Equation 13)

The texture image obtainment unit 408 can calculate the blur kernel size “B” according to Equation 13. If the blur kernel size “B” is determined, the blur function f(x, y) is obtained. According to the present embodiment, the blur kernel is defined by a pillbox function. The pillbox function can be defined by the following Equation 14.

[Math. 14]

f(x,y) = 1 if x² + y² ≤ B², and f(x,y) = 0 otherwise  (Equation 14)

By the above-described method, the texture image obtainment unit 408 can obtain a blur function by determining a blur kernel for each pixel. Then, the texture image obtainment unit 408 performs the inverse convolution operation on the captured images by using the blur function according to Equation 9, thereby generating a texture image.
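For illustration only, the following sketch follows Equations 12 to 14 and the frequency-domain division of Equation 9, assuming a spatially uniform blur (whereas the embodiment determines a blur kernel for each pixel); the small constant eps used to stabilize the division and the normalization of the pillbox kernel are additional assumptions not stated above.

import numpy as np

def blur_kernel_size(d, f, A):
    # Kernel size B from the subject distance d, focal distance f, and opening
    # size A (Equations 12 and 13).
    C = 1.0 / (1.0 / f - 1.0 / d)  # distance to the imaging plane (Equation 12)
    return (C - f) * A / f         # Equation 11

def pillbox(B, shape):
    # Pillbox blur function of kernel size B (Equation 14), centered in the
    # image and normalized to unit sum (the normalization is an assumption).
    h, w = shape
    y, x = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
    k = (x ** 2 + y ** 2 <= B ** 2).astype(np.float64)
    return k / k.sum()

def texture_image(i_img, B, eps=1e-3):
    # Recover the omnifocal image s(x, y) by dividing the Fourier transform of
    # the captured image by that of the blur function (Equation 9).
    F = np.fft.fft2(np.fft.ifftshift(pillbox(B, i_img.shape)))
    I = np.fft.fft2(i_img.astype(np.float64))
    return np.real(np.fft.ifft2(I / (F + eps)))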

The texture image obtainment unit 408 calculates such a texture image from the first captured image group captured at time t, and also from the second captured image group captured at time t+1, so as to obtain the first texture image and the second texture image.

<Image Motion Vector Calculation (S106)>

The following describes the image motion vector calculation at Step S106.

The image motion vector calculation unit 402 calculates a motion vector (image motion vector) from the first texture image and the second texture image.

It should be noted that the detailed processing of calculating a motion vector from the first texture image and the second texture image is the same as the range motion vector calculation, so that it is not described again.

<Vector Similarity Calculation (S108)>

The following describes the vector similarity calculation at Step S108 in more detail.

The vector similarity calculation unit 403 calculates a vector similarity between (a) a range motion vector calculated by the range motion vector calculation unit 401 and (b) an image motion vector calculated by the image motion vector calculation unit 402.

First, reasons why the vector similarity is to be calculated are explained. If the two motion vectors are not similar to each other, it means that the subject moves differently between the range images and the texture images. However, it is considered that if the same subject is shown in these two images, the movement of the subject should be similar between the range images and the texture images.

Therefore, if the two motion vectors are not similar to each other, there is a high possibility that interpolation parallax images generated from an interpolation range image and an interpolation texture image which are generated based on the two motion vectors do not correctly express a depth of the scene. As a result, when a 3D video for which frame interpolation has been performed using such interpolation parallax images is displayed by a 3D display apparatus, the user cannot correctly recognize the scene depth.

In particular, if corresponding points between range images are not correctly determined and therefore a range motion vector is not correctly calculated, a scene giving an unrealistic depth impression is displayed by the 3D display apparatus. In such 3D video, for example, a single subject which moves slowly in reality is perceived as abruptly moving forwards or backwards. Here, since the expected movement of the subject is significantly different from the movement of the subject perceived in the 3D video, there is a high possibility that the user feels sick watching the 3D video.

Therefore, in order to detect a failure of such motion vector calculation for range images, the present embodiment uses a similarity between a motion vector of range images and a motion vector of texture images. A range image and a texture image contain different information as images, but they are characterized in that a movement of an object included in the scene produces similar motion directions in the corresponding image regions.

Therefore, certainty of the two motion vectors can be defined by a similarity between the two motion vectors. In other words, when the motion vector of the range images is not similar to the motion vector of the texture images, there is a high possibility that at least one of the motion vector of the range images and the motion vector of the texture images is not correctly calculated. Therefore, there is a high possibility that interpolation texture images or interpolation range images cannot be correctly generated by using the motion vector. In such a case, by limiting the number of generated interpolation images, the 3D display apparatus displays 3D video with a low frame rate. This can prevent 3D sickness caused by rapid changes of a scene depth.

A method of calculating a similarity between the motion vector of the range images and the motion vector of the texture images is described with reference to FIG. 8. FIG. 8 is a flowchart of the processing performed by the vector similarity calculation unit 403 according to the present embodiment of the present disclosure.

First, the vector similarity calculation unit 403 divides each of a range image and a texture image into a plurality of blocks (for example, a rectangular region having N pixels×M pixels, where each of N and M is an integer of 1 or more) (S302). A size of the block is larger than a block size based on which a motion vector is to be calculated. This means that, when a motion vector is calculated in units of a first block size, the vector similarity calculation unit 403 divides a target image into blocks each having a second block size larger than the first block size (S308).

Next, the vector similarity calculation unit 403 generates a direction histogram and a power histogram for each block (S304). The vector similarity calculation unit 403 calculates a similarity for each block using these histograms (S306). Finally, the vector similarity calculation unit 403 calculates an average value of the similarity determined for each block.

Here, a method of expressing motion vectors in a histogram is described. A motion vector is a vector on a two-dimensional plane. Therefore, a direction “dir” and a power “pow” of a motion vector can be calculated by the following Equations 15.

[Math. 15]

dir = tan⁻¹(yvec/xvec)

pow = √(xvec² + yvec²)  (Equation 15)

First, a method of generating a direction histogram of motion vectors is described. A value of a direction “dir” of a motion vector which is determined by Equation 15 ranges from 0 degrees to 359 degrees. Therefore, the vector similarity calculation unit 403 calculates, for each block, a direction “dir” of a motion vector of each pixel in a target block according to Equation 15. Then, the vector similarity calculation unit 403 calculates, for each angle ranging from 0 degrees to 359 degrees, a frequency of the calculated direction “dir” of the motion vector for each pixel, and generates the direction histogram of motion vectors for each block.

More specifically, the vector similarity calculation unit 403 applies Equation 16 to motion vectors of all respective pixels in the target block. Here, the motion vector is expressed as (xvec, yvec). If a motion vector of one pixel in the target block is selected, a direction of the selected motion vector is calculated by using a value of “xvec” and a value of “yvec”.

Here, “direction_hist” is an array having 360 memory regions. All elements in this array have an initial value of 0. A function “f” in Equation 16 is a function for transforming the value from a radian to a frequency. In the function “f”, a value after the decimal point is rounded off (or cut off). Assuming that the value ranging from 0 to 359 indicating a direction obtained by the function “f” is an argument of “direction_hist”, a value of an element that corresponds in the array to the argument is incremented only by 1. Thereby, the direction histogram of motion vectors in the target block can be obtained.

[Math. 16]



direction_hist[f(dir)]=direction_hist[f(dir)]+1  (Equation 16)

Next, a method of generating a power histogram of motion vectors is described. A maximum value of the motion vector power "pow" which is determined by Equation 15 is a maximum value of the length of a motion vector. In other words, a maximum value of the motion vector power "pow" is equivalent to a maximum value of the search range for corresponding points between an image at time t and an image at time t+1. Therefore, a maximum value of the motion vector power "pow" is equivalent to a maximum value of the distance between a pixel (i1, j1) of the image at time t and a pixel (i2, j2) of the image at time t+1 which is determined according to Equation 4.

This search range may be determined depending on a scene to be captured, or determined for each imaging apparatus. Furthermore, the search range may be set when the user captures images. If a maximum value of the search range is represented as “powmax”, a possible range for a motion vector power is from 0 to “powmax”.

The vector similarity calculation unit 403 generates a power histogram of motion vectors, by applying Equation 17 to motion vectors of all respective pixels in the target block. Here, “power_hist” is an array having (powmax+1) memory regions. All elements in this array have an initial value of 0.

If a motion vector of one pixel in the target block is selected, a power of the selected motion vector is calculated according to Equation 15. The function "g" in Equation 17 is a function for rounding off (or cutting off) the value after the decimal point of the calculated motion vector power. Using the value ranging from 0 to "powmax" obtained by the function "g" as an argument of "power_hist", the corresponding element of the array is incremented by 1. Thereby, the power histogram of the motion vectors in the target block can be determined.

[Math. 17]



power_hist[g(pow)]=power_hist[g(pow)]+1  (Equation 17)
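As an illustrative sketch of Equations 15 to 17, the following builds the direction histogram and the power histogram for one block, assuming the motion vector components of the block are given as two NumPy arrays; mapping angles and powers to integer bins by rounding corresponds to the functions "f" and "g" described above, and clipping powers to "powmax" is an additional assumption.

import numpy as np

def motion_vector_histograms(xvec, yvec, powmax):
    # Direction and power of each motion vector (Equation 15).
    dir_deg = np.degrees(np.arctan2(yvec, xvec)) % 360.0        # dir in [0, 360)
    power = np.sqrt(xvec ** 2 + yvec ** 2)                      # pow

    direction_hist = np.zeros(360, dtype=np.int64)
    power_hist = np.zeros(powmax + 1, dtype=np.int64)
    dir_bins = np.rint(dir_deg).astype(int) % 360               # function "f" (rounding)
    pow_bins = np.clip(np.rint(power).astype(int), 0, powmax)   # function "g" (rounding)
    for d, p in zip(dir_bins.ravel(), pow_bins.ravel()):
        direction_hist[d] += 1   # Equation 16
        power_hist[p] += 1       # Equation 17
    return direction_hist, power_hist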

Next, the description is given for a method of calculating a similarity between blocks, based on a direction histogram and a power histogram of motion vectors which are generated in the above manner. For range images, a direction histogram is denoted as “d_direction_hist” and a power histogram is denoted as “d_power_hist”. Likewise, for texture images, a direction histogram is denoted as “t_direction_hist” and a power histogram is denoted as “t_power_hist”. The number of pixels (the number of motion vectors) in the target block is assumed as N pixels×M pixels. Here, the vector similarity calculation unit 403 calculates a histogram correlation value of the direction histograms, and a histogram correlation value of the power histograms, according to the following Equations 18.

[Math. 18]

dircor = (1/(N×M)) Σ_{i=0}^{359} min(d_direction_hist[i], t_direction_hist[i])

powcor = (1/(N×M)) Σ_{i=0}^{powmax} min(d_power_hist[i], t_power_hist[i])  (Equation 18)

According to Equations 18, "dircor" represents a correlation value of the direction histograms, "powcor" represents a correlation value of the power histograms, and the function "min" returns the smaller of its two arguments. As the shapes of the histograms become more similar to each other, the histogram correlation values ("dircor" and "powcor") approach 1. As the shapes of the histograms become more different from each other, the histogram correlation values approach 0.

The vector similarity calculation unit 403 calculates, for each block, a correlation value of the histograms generated by the above method. Then, the vector similarity calculation unit 403 determines, as a similarity, an average value of the correlation values calculated for the respective blocks. Since each histogram correlation value ranges from 0 to 1, their average value also ranges from 0 to 1. Therefore, the similarity indicates how much the motion vectors of the range images are similar to the motion vectors of the texture images.

As described above, the vector similarity calculation unit 403 generates, for each block, a direction histogram and a power histogram regarding range motion vectors. In addition, the vector similarity calculation unit 403 generates, for each block, a direction histogram and a power histogram regarding image motion vectors. Then, the vector similarity calculation unit 403 calculates a vector similarity, based on (a) a similarity between a direction histogram of range motion vectors and a direction histogram of image motion vectors and (b) a similarity between a power histogram of range motion vectors and a power histogram of image motion vectors.
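The following is a minimal sketch of the histogram intersection of Equation 18 and the averaging over blocks described above; how the direction correlation and the power correlation are combined within one block is not specified in the text, so averaging the two values is an assumption of this sketch.

import numpy as np

def histogram_correlation(hist_a, hist_b, n_vectors):
    # Intersection of two histograms normalized by the number of motion vectors
    # in the block (Equation 18); the result lies between 0 and 1.
    return float(np.minimum(hist_a, hist_b).sum()) / n_vectors

def vector_similarity(blocks):
    # blocks: list of tuples (d_direction_hist, t_direction_hist,
    #                         d_power_hist, t_power_hist, n_vectors) per block.
    values = []
    for d_dir, t_dir, d_pow, t_pow, n in blocks:
        dircor = histogram_correlation(d_dir, t_dir, n)
        powcor = histogram_correlation(d_pow, t_pow, n)
        values.append((dircor + powcor) / 2.0)  # combining by averaging (assumption)
    return float(np.mean(values))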

It should be noted that the vector similarity calculation unit 403 does not necessarily use both a similarity of direction histograms and a similarity of power histograms in order to calculate a vector similarity. In other words, the vector similarity calculation unit 403 may calculate a vector similarity based on either a similarity of direction histograms or a similarity of power histograms. In this case, it is not necessary to generate the other similarity among the similarity of direction histograms and the similarity of power histograms.

It should also be noted that the vector similarity calculation unit 403 does not need to use such histograms to calculate a vector similarity. For example, the vector similarity calculation unit 403 may calculate a vector similarity by comparing a direction and a power of an average motion vector of the range images with those of an average motion vector of the texture images.

<Interpolation Image Number Determination (S110)>

The interpolation image number determination unit 404 determines an upper limit of the number of interpolations (in other words, the number of interpolation images) based on a vector similarity. As described above, if the motion vector is not correctly calculated and the number of interpolation parallax images is large, there is not only a problem of deteriorating image quality of the 3D video, but also a problem of causing the user to feel 3D sickness, for example. Therefore, according to the present embodiment, the vector similarity is regarded as an accuracy of the motion vectors, and the interpolation upper limit is determined so that the number of generated interpolation parallax images decreases as the vector similarity decreases. Therefore, even if motion vectors are not correctly calculated, it is possible to reduce harmful influence (for example, 3D sickness and the like) on a viewer of the frame-interpolated 3D video.

The following describes the method of determining the upper limit of interpolations based on a similarity of motion vectors. The interpolation image number determination unit 404 determines the upper limit “Num” of the number of interpolations corresponding to a vector similarity according to Equation 19.

[Math. 19]



Num=Sim*F  (Equation 19)



where “F” represents a predetermined fixed value, and “Sim” represents a vector similarity. For example, if “F” is 30 and the vector similarity “Sim” is 0.5, the upper limit of the possible interpolation parallax images between time t and time t+1 is determined as 15.

Furthermore, the interpolation image number determination unit 404 may set, as the interpolation number, a number which is inputted by the user and is equal to or smaller than the interpolation upper limit. For example, if the upper limit is 15, the user may input a number ranging from 0 to 15 as the interpolation number.

For example, as shown in FIG. 9, a slider bar is displayed on a touch panel (the display unit 300) to receive an input of a number ranging from 0 to 15. The user inputs a number equal to or smaller than the upper limit by touching the touch panel and shifting the slider bar displayed on it.

This means that the user can set the interpolation number while viewing the display unit 300 on the rear side of the camera. With the above-described structure, the user can adjust the interpolation number while checking the 3D video for which frame interpolation has been performed using the interpolation parallax images generated by the interpolation parallax image generation described below.

Therefore, the user can intuitively input the interpolation number to obtain 3D video with less 3D sickness. In other words, it is possible to prevent the frame interpolation from discomforting the user. It is also possible to receive an input of the interpolation number not only by the illustrated touch panel but also by other input devices.

It should be noted that the interpolation image number determination unit 404 does not need to set the number inputted by the user as the interpolation number. For example, the interpolation image number determination unit 404 may determine the upper limit directly as the interpolation number.
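A minimal sketch of Equation 19 and of clamping a user-supplied value to the resulting upper limit follows; the function names and the default F = 30 (taken from the example above) are illustrative assumptions.

def interpolation_upper_limit(sim, F=30):
    # Num = Sim * F (Equation 19); the fixed value F = 30 follows the example above.
    return int(sim * F)

def interpolation_number(sim, user_input=None, F=30):
    # Use the upper limit directly, or clamp a user-supplied number to it.
    upper = interpolation_upper_limit(sim, F)
    if user_input is None:
        return upper
    return max(0, min(int(user_input), upper))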

It should be noted that Non-Patent Literature 6 does not show experimental results directly relating to 3D sickness caused by viewing 3D video, but shows experimental results regarding sickness caused by viewing 2D images. Non-Patent Literature 6 describes that parameters of a camera are set so as not to cause inexactness in the size, rotation, colors, and the like of the left and right images during capturing, in order to prevent such inexactness from causing video sickness or eyestrain.

Non-Patent Literature 6 also describes that some people can easily view images stereoscopically while others cannot, and that fatigue from viewing 3D video therefore varies among individuals. It is thus difficult to determine a number of interpolations that never causes 3D sickness due to errors in the interpolation parallax images. In order to address this difficulty, it is desirable that the number of interpolation parallax images is set to a small value by default, and that the number of interpolations is adjusted through the user interface that designates a value as shown in FIG. 9.

<Interpolation Range Image Generation (S112), Interpolation Texture Image Generation (S114)>

The range image interpolation unit 405 generates, by using motion vectors, interpolation range images which are equal to or less than the interpolation upper limit determined by the interpolation image number determination unit 404. The image interpolation unit 406 generates, by using the motion vectors, interpolation texture images which are equal to or less than the interpolation upper limit determined by the interpolation image number determination unit 404.

Here, it is assumed that a motion vector regarding a pixel (u, v) in the image I1 at time t is expressed as (vx, vy). Under this assumption, a pixel which is included in the image I2 and corresponds to the pixel (u, v) in the image I1 is expressed as (u+vx, v+vy).

The following describes the method of interpolating between range images and between texture images by linear interpolation in the case where the interpolation number is “Num”.

FIG. 10 is a diagram showing the method of interpolating between range images and between texture images according to the present embodiment of the present disclosure. In FIG. 10, interpolation range images to be interpolated between a range image at time t and a range image at time t+1 are generated, and interpolation texture images to be interpolated between a texture image at time t and a texture image at time t+1 are generated.

In the case of interpolation number “Num”=2, as shown in FIG. 10 (a), an interval between time t and time t+1 is divided into 3 parts, and a first interpolation range image at time t+1/3 and a second interpolation range image at time t+2/3 are generated. A pixel included in the first interpolation range image (hereinafter, referred to as a “first interpolation pixel”) and a pixel included in the second interpolation range image (hereinafter, referred to as a “second interpolation pixel”) are dividing points between a pixel (u, v) in the first range image and a pixel (u+vx, v+vy) in the second range image. Therefore, the first interpolation pixel is expressed as (u+vx/3, v+vy/3), and the second interpolation pixel is expressed as (u+vx*2/3, v+vy*2/3).

Here, a pixel value of a pixel (u, v) in the first range image is expressed as Depth(u, v), and a pixel value of a pixel (u, v) in the second range image is expressed as Depth′(u, v). Here, a pixel value of the first interpolation pixel (u+vx/3, v+vy/3) is expressed as Depth(u, v)*2/3+Depth′(u+vx, v+vy)/3. Furthermore, a pixel value of the second interpolation pixel is expressed as Depth(u, v)/3+Depth′(u+vx, v+vy)*2/3.

Interpolation range images are generated by the above-described linear interpolation. It should be noted that interpolation texture images are generated also by the same method as described above, so that the generation of interpolation texture images is not explained below.

The above-described processing is generalized into Equations 20 and 21, where (u, v) represents coordinates of a pixel at time t, (vx, vy) represents a motion vector, "Num" represents the number of interpolations, and "j" represents an integer ranging from 1 to "Num". Coordinates of a pixel in the j-th interpolation image are calculated according to the following Equation 20.

[Math. 20]

(u + vx·j/(Num+1), v + vy·j/(Num+1))  (Equation 20)

A calculation equation for calculating a pixel value of the j-th interpolation image is presented as the following Equation 21, where I(u, v) represents a pixel value of a pixel (u, v) at time t, and I′(u, v) represents a pixel value of a pixel (u, v) at time t+1.

[Math. 21]

I(u,v)·(Num+1−j)/(Num+1) + I′(u+vx, v+vy)·j/(Num+1)  (Equation 21)

The j-th interpolation image can be generated by the above-defined equations.
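For illustration, the following sketch applies Equations 20 and 21 to generate "Num" interpolation images by forward-warping each pixel along its motion vector; rounding the warped coordinates to the nearest pixel and leaving unfilled pixels at zero are assumptions, since the text does not specify sub-pixel handling or hole filling. The same routine applies to range images and to texture images.

import numpy as np

def interpolate_frames(img_t, img_t1, mvx, mvy, num):
    # img_t, img_t1: images at time t and t+1; mvx, mvy: per-pixel motion vector.
    h, w = img_t.shape
    results = []
    for j in range(1, num + 1):
        a = j / float(num + 1)
        out = np.zeros((h, w), dtype=np.float64)
        for v in range(h):
            for u in range(w):
                vx, vy = mvx[v, u], mvy[v, u]
                # Destination coordinates of Equation 20, rounded to integers.
                ui = int(round(u + vx * a))
                vi = int(round(v + vy * a))
                if not (0 <= ui < w and 0 <= vi < h):
                    continue
                # Corresponding pixel in the image at time t+1, clamped to the frame.
                u2 = min(max(int(round(u + vx)), 0), w - 1)
                v2 = min(max(int(round(v + vy)), 0), h - 1)
                # Pixel value of Equation 21.
                out[vi, ui] = img_t[v, u] * (num + 1 - j) / (num + 1) \
                              + img_t1[v2, u2] * j / (num + 1)
        results.append(out)
    return results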

<Interpolation Parallax Image Generation (S116)>

Finally, the description is given for details of the interpolation parallax image generation at Step S116.

The interpolation parallax image generation unit 407 generates interpolation parallax images (here, parallax images mean a pair of a left-eye image and a right-eye image) from an interpolation range image and an interpolation texture image. The following describes a method of generating a left-eye interpolation image from an interpolation texture image and an interpolation range image.

FIG. 11 is a diagram for explaining the method of generating interpolation parallax images according to the present embodiment of the present disclosure. More specifically, FIG. 11 shows a relationship between (a) a distance to a subject and (b) coordinates on an image, in the case where the subject is viewed from a viewpoint of an interpolation range image and an interpolation texture image and from a viewpoint of a left-eye image to be generated. Symbols in FIG. 11 represent as follows.

A: direction measuring position

B: left parallax position

C, D: subject

E: optical axis of left parallax position

G, I: position of a left-eye camera to capture image of a subject C, D

f: focal distance of direction measuring position

d: distance between A and B

Z, Z′: a distance to C, D

X1, X2: coordinates on captured image

If a pixel which is included in the left-eye interpolation image and corresponds to a pixel (u, v) in the interpolation texture image is determined, a pixel value of the pixel (u, v) is copied to the corresponding pixel in the left-eye interpolation image, thereby generating a left-eye image. In FIG. 11, the focal distance “f” and the distance Z or Z′ from the camera to the subject are known. The distance “d” is a value which can be desirably predetermined in generating parallax images, so that the distance “d” is known. Here, since a triangle ABC and a triangle EIB are similar to each other, and a triangle ABD and a triangle EGB are similar to each other, the following Equations 22 are obtained.

[Math. 22]



f:Z′=X2:d,f:Z=X1:d  (Equations 22)

Equations 22 are transformed to Equations 23.

[Math. 23]

X2 = f·d/Z′, X1 = f·d/Z  (Equation 23)

Therefore, when a distance indicated by the interpolation range image is Z, a pixel (u, v) in the interpolation texture image corresponds to a pixel (u−X1, v) in the left-eye interpolation image. Therefore, a pixel value of the pixel (u, v) in the interpolation texture image is copied to the pixel (u−X1, v) in the left-eye interpolation image, so as to generate a left-eye interpolation image. Likewise, when a distance indicated by the interpolation range image is Z′, the pixel value of the pixel (u, v) in the interpolation texture image is copied to a pixel (u−X2, v) in the left-eye interpolation image.

The interpolation parallax image generation unit 407 performs the above-described processing on every pixel included in the interpolation range image so as to generate a left-eye interpolation image. A right-eye interpolation image is generated by copying the pixel value to a position symmetrical to that in the left-eye interpolation image. In the previous example, a pixel which is included in the right-eye interpolation image and corresponds to the pixel (u−X1, v) in the left-eye interpolation image is a pixel (u+X1, v). As described above, the interpolation parallax image generation unit 407 generates a left-eye interpolation image and a right-eye interpolation image. It should be noted that the interpolation parallax image generation unit 407 may generate not only interpolation parallax images but also parallax images.
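The following is a minimal sketch of the pixel-shifting step described above: each pixel of the interpolation texture image is copied to a position shifted by X = f·d/Z (Equation 23) in the left-eye image and to the symmetric position in the right-eye image. Treating f and d in pixel-compatible units, rounding the shift to an integer, and leaving occlusions and holes unfilled are assumptions of this sketch.

import numpy as np

def generate_parallax_pair(texture, depth, f, d):
    # texture: interpolation texture image; depth: interpolation range image (Z per pixel);
    # f: focal distance; d: distance between the two viewpoints.
    h, w = texture.shape
    left = np.zeros_like(texture)
    right = np.zeros_like(texture)
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z <= 0:
                continue
            x = int(round(f * d / z))           # parallax shift X for this depth (Equation 23)
            if 0 <= u - x < w:
                left[v, u - x] = texture[v, u]   # copy to (u - X, v) in the left-eye image
            if 0 <= u + x < w:
                right[v, u + x] = texture[v, u]  # symmetric position in the right-eye image
    return left, right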

As described above, the 3D imaging apparatus according to the present embodiment generates interpolation parallax images after separately performing interpolation for 2D images and interpolation for range images when frame interpolation is performed on 3D video. Therefore, it is possible to suppress interpolation errors in the depth direction more effectively than in the case where interpolation parallax images are generated by separately performing interpolation for left-eye images and interpolation for right-eye images. As a result, the frame interpolation on 3D video can be performed with a high accuracy. In addition, a left-eye interpolation image and a right-eye interpolation image are generated by using the same interpolation range image and the same interpolation image. Therefore, the 3D video for which the frame interpolation has been performed hardly causes the user viewing the 3D video to feel uncomfortable due to the interpolation.

Furthermore, the 3D imaging apparatus according to the present embodiment can determine the upper limit of interpolations depending on a similarity between a range motion vector and an image motion vector. When the similarity between the range motion vector and the image motion vector is low, there is a high possibility that the range motion vector or the image motion vector is not correctly calculated. Therefore, in such a case, the interpolation upper limit is set low so as to prevent the interpolation parallax images from deteriorating the image quality of the 3D video.

Moreover, the 3D imaging apparatus according to the present embodiment can calculate a vector similarity based on at least one of a histogram of motion vector directions and a histogram of motion vector powers. It is thereby possible to improve a correlation between a possibility of incorrect calculation of motion vectors and a vector similarity. As a result, the interpolation upper limit can be determined appropriately.

Furthermore, the 3D imaging apparatus according to the present embodiment uses, as inputs, a plurality of captured images having respective different focal distances. Therefore, the 3D imaging apparatus can contribute to the decrease of an imaging apparatus size.

Although the 3D imaging apparatus according to an aspect of the present disclosure has been described with reference to the embodiments as above, the present disclosure is not limited to these embodiments. Those skilled in the art will readily appreciate that various modifications of the embodiments are possible without materially departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure.

For example, it has been described in the above embodiment that the 3D image interpolation unit performs each processing on, as inputs, a plurality of captured images having respective different focal distances, but it is not always necessary to use such captured images. For example, it is also possible to receive, as inputs, 3D video including left-eye images and right-eye images. In this case, the range image obtainment unit may obtain a range image based on parallax between a left-eye image and a right-eye image.

It should also be noted that it has been described in the above embodiment that the 3D image interpolation unit is included in the 3D imaging apparatus, but the 3D image interpolation unit may be implemented as a 3D image interpolation device independent from the 3D imaging apparatus. An example of such a 3D image interpolation device is described with reference to FIGS. 12 and 13.

FIG. 12 is a block diagram showing a functional structure of a 3D image interpolation device 500 according to another embodiment of the present disclosure. FIG. 13 is a flowchart of processing performed by the 3D image interpolation device 500 according to the other embodiment of the present disclosure. As shown in FIG. 12, the 3D image interpolation device 500 includes a range image interpolation unit 501, an image interpolation unit 502, and an interpolation parallax image generation unit 503.

As shown in FIG. 13, first, the range image interpolation unit 501 generates at least one interpolation range image to be interpolated between the first range image and the second range image (S402). Subsequently, the image interpolation unit 502 generates at least one interpolation image to be interpolated between the first image and the second image (S404). Finally, the interpolation parallax image generation unit 503 generates, based on the interpolation image, interpolation parallax images having parallax depending on a depth indicated by the interpolation range image (S406). As described above, the 3D image interpolation device 500 performs frame interpolation on 3D video.

(Other Variations)

It should also be noted that the present disclosure may be a computer-readable recording medium on which the computer program or the digital signals are recorded. Examples of the computer-readable recording medium are a flexible disk, a hard disk, a Compact Disc (CD)-ROM, a magneto-optical disk (MO), a Digital Versatile Disc (DVD), a DVD-ROM, a DVD-RAM, a BD (Blu-ray® Disc), and a semiconductor memory. The present disclosure may be the digital signals recorded on the recording medium.

It should also be noted in the present disclosure that the computer program or the digital signals may be transmitted via an electric communication line, a wired or wireless communication line, a network represented by the Internet, data broadcasting, and the like.

It should also be noted that the present disclosure may be a computer system including a microprocessor operating according to the computer program and a memory storing the computer program.

It should also be noted that the program or the digital signals may be recorded onto the recording medium to be transferred, or may be transmitted via a network or the like, so that the program or the digital signals can be executed by a different independent computer system.

It should also be noted that the above-described embodiments and variations may be combined.

The 3D image interpolation device and the 3D imaging apparatus according to the embodiments of the present disclosure can perform frame interpolation on 3D video with a high accuracy, and can be used as digital camcorders, display apparatuses, computer software, and the like.

REFERENCE SIGNS LIST