CUDA Cubic B-Spline Interpolation (CI) is an implementation of cubic interpolation in nVIDIA's CUDA language. The CUDA language natively already provides nearest neighbor and linear interpolation within 1D, 2D and 3D texture data. Many applications, however, could benefit from higher order interpolation. This especially applicable to CUDA, since it is targeted at making the processing power of the graphics processing unit (GPU) available for general purpose applications (GPGPU). The difference between linear interpolation and cubic interpolation is shown in the image below.
The left part of this image was generated using tri-linear interpolation on a cross-section plane through a voxel data set. The right part is the continuation of the same plane, but using pre-filtered 3D cubic interpolation.
The CI library also provides the possibility to very efficiently retrieve the cubicly interpolated first order derivatives of the texture data. There is no additional memory usage involved for querying the derivatives and the computational costs are very modest.
The most recent CI code can be downloaded here.
To execute and compile CI you need CUDA and the CUDA SDK (2.0 or higher). A quick guide to using the CI code is provided in the readme text. This software has been released under a revised BSD style license.
The B-spline function is the maximally differentiable interpolative basis function. Nearest neighbor and linear interpolation correspond to B-spline interpolation of degree 0 and 1 respectively. Those are the only ones where the B-spline coefficients coincide with the sample values, which means that the interpolated B-spline function will exactly pass through the coefficient values at the sample grid. For any higher order B-spline function this is not the case, which means that the coefficients have to be predetermined. Fortunately the coefficients can be efficiently obtained using a causal and anti-causal pre-filter , and CI provides a CUDA accelerated version of this filter.
Prefiltering a data set of 256 * 256 * 256 voxels (16 MB) takes 1330 ms using an Intel Xeon 2.33 GHz processor (single threaded, no SSE optimizations, etc), while the CUDA implementation uses 77.5 ms on an nVIDIA GeForce 9800 card with 512MB onboard memory, delivering a speedup factor of 17.2.
Cubic B-spline interpolation of a slice through that data set on an output window of 512 * 512 pixels delivers a frame rate of 1.76 fps using non-optimized CPU code and 22.1 fps for SSE accelerated multi-threaded CPU code, versus 846 fps when using the fast CUDA implementation, on the same hardware as above. This corresponds to a speedup factor of 38.3 with respect to the optimized CPU code, and a factor 481 when compared to the straight-forward CPU implementation! In order to profile only the interpolation all displaying was disabled during the measurements.
The code used to perform these measurements can be found in the example folder of the source code zip.
A comprehensive discussion of uniform B-spline interpolation and the pre-filter can be found in . The GPU implementation that can be downloaded here is described in . The fast cubic B-spline interpolation is an adapted version of the method introduced by Sigg and Hadwiger . A description of the adapted algorithm, its merits and its drawbacks is given in .
Do you have question, remarks, feedback or are you looking for help? Send me a message!
The CI code is provided without any warranty of delivering correct results or of any other kind. For more information, please consult the license.
Copyright 2008-2013 Danny Ruijters