We’ve been asked a lot lately about why FluidRay RT doesn’t use the GPU for rendering. In this post, we’ll try to answer that question as extensively as possible.
Even though there is a clear trend toward GPU rendering, GPUs still have significant limitations.
For example, GPU memory is limited (on the order of 1-10 GB) compared to the amount of main memory available (up to 256 GB). This puts serious limits on the number and size of textures and on the scene's polygonal complexity: as a rough back-of-the-envelope example, a scene with 50 million triangles at around 50 bytes per triangle (vertices, indices and acceleration structure) already needs about 2.5 GB before a single texture is loaded. Secondly, GPUs are hell to develop on. Drivers are often unstable, and what works on one driver version may not work on another. Some features are supported only by NVidia cards and not by ATI cards, and vice versa.
Consequently, complex algorithms like bidirectional path tracing, Metropolis sampling, light path expressions and programmable shading are very hard, if not impossible, to fit all together into a general-purpose GPU renderer.
For these reasons, a GPU-only renderer might lack features, or perform really well on some specific scenes but very poorly on others.
Some people are opting for the hybrid solution of using the GPU only for the ray-intersection part of the algorithm, while implementing the rest on the CPU. This still suffers from the GPU memory limitation problem, though.
Another possibility is out-of-core textures and geometry. In this case, there are two possible solutions: 1. run part of the intersection and texture-sampling code on the GPU and part on the CPU, or 2. design a caching system that transfers bits of geometry/textures from system RAM to the GPU as the GPU needs them.
In the first case, you would completely lose the speed advantage of having the GPU, because the GPU would be constantly waiting for the CPU to send back the intersection and sampling results. You would basically have a GPU renderer that runs, in the best case, at the same speed as a CPU renderer, and most likely much slower, because of communication and bus-transfer overhead.
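To make the cost of that split concrete, here is a minimal sketch of the ping-pong loop such a hybrid renderer ends up with. Everything in it (Ray, Hit, gpu_intersect_batch, cpu_shade_and_bounce) is a hypothetical placeholder rather than any real renderer's API:

```cpp
#include <vector>

struct Ray { float org[3]; float dir[3]; };
struct Hit { int prim_id; float t; };

// Hypothetical stand-in for a GPU batch trace: in a real hybrid renderer
// this would be a kernel launch wrapped by two PCIe transfers (rays up,
// hits down), and the CPU would block here until it completes.
static std::vector<Hit> gpu_intersect_batch(const std::vector<Ray>& rays) {
    return std::vector<Hit>(rays.size(), Hit{-1, 0.0f}); // stub: all misses
}

// Hypothetical CPU-side shading: sample textures, evaluate materials,
// spawn the next bounce. The GPU sits idle for the whole duration.
static std::vector<Ray> cpu_shade_and_bounce(const std::vector<Ray>& rays,
                                             const std::vector<Hit>& hits) {
    std::vector<Ray> next;
    for (size_t i = 0; i < rays.size(); ++i)
        if (hits[i].prim_id >= 0) next.push_back(rays[i]); // stub
    return next;
}

// The ping-pong structure is the point: every bounce serializes GPU work,
// bus transfers and CPU work, so neither processor is busy while the
// other one runs.
void render_wave(std::vector<Ray> rays, int max_bounces) {
    for (int b = 0; b < max_bounces && !rays.empty(); ++b) {
        std::vector<Hit> hits = gpu_intersect_batch(rays);
        rays = cpu_shade_and_bounce(rays, hits);
    }
}
```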
In the case of the caching system, you would run into problems as well. Global illumination algorithms (e.g. path tracing and photon mapping) have a very incoherent memory access pattern: essentially, every time a ray reflects off a surface, it can hit virtually any other part of the scene. That means the cache would be constantly invalidated by requests for bits of geometry/textures that were not in the cache before. As a result, the GPU would be constantly waiting for data to be transferred into the cache over the bus.
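To get a feel for how badly incoherent access behaves, here is a toy, self-contained simulation (the chunk counts and the plain LRU policy are illustrative assumptions, not measurements from a real renderer). It compares the hit rate of coherent, localized chunk accesses with fully random ones when the scene is about three times the cache size:

```cpp
#include <cstdio>
#include <list>
#include <random>
#include <unordered_map>

// Minimal LRU cache over geometry/texture "chunks", identified by index.
struct LruCache {
    size_t capacity;
    std::list<int> order;                                  // front = most recent
    std::unordered_map<int, std::list<int>::iterator> map;

    explicit LruCache(size_t cap) : capacity(cap) {}

    bool access(int chunk) {                               // returns true on hit
        auto it = map.find(chunk);
        if (it != map.end()) {
            order.splice(order.begin(), order, it->second);
            return true;
        }
        if (map.size() == capacity) {                      // evict LRU chunk
            map.erase(order.back());
            order.pop_back();
        }
        order.push_front(chunk);
        map[chunk] = order.begin();
        return false;
    }
};

int main() {
    const int scene_chunks = 3000;    // working set ~3x the cache size
    const size_t cache_chunks = 1000;
    const int accesses = 1000000;
    std::mt19937 rng(42);

    // Coherent access: nearby rays touch nearby chunks (small random walk).
    LruCache coherent(cache_chunks);
    int pos = 0, coherent_hits = 0;
    std::uniform_int_distribution<int> step(-2, 2);
    for (int i = 0; i < accesses; ++i) {
        pos = (pos + step(rng) + scene_chunks) % scene_chunks;
        coherent_hits += coherent.access(pos);
    }

    // Incoherent access: a bounced ray can hit any chunk in the scene.
    LruCache incoherent(cache_chunks);
    int random_hits = 0;
    std::uniform_int_distribution<int> any(0, scene_chunks - 1);
    for (int i = 0; i < accesses; ++i)
        random_hits += incoherent.access(any(rng));

    printf("coherent hit rate:   %.1f%%\n", 100.0 * coherent_hits / accesses);
    printf("incoherent hit rate: %.1f%%\n", 100.0 * random_hits / accesses);
    return 0;
}
```

With uniformly random accesses, any demand-fetched cache can only hit about as often as the fraction of the scene it holds (roughly a third here), so most bounced rays stall on a bus transfer; the coherent walk, by contrast, hits almost every time.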
It would be interesting to run some tests with a scene that needs, let's say, 2-3 times the GPU memory and see how the performance degrades.
Considering all those issues, and wanting to have the most general-purpose and feature-rich renderer possible, we decided on a CPU-only solution for FluidRay RT. For intersection, we use the Intel Embree ray tracing kernels, which are CPU-only and don't suffer from the memory limitation problem, while still providing excellent real-time performance.
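For readers curious what the Embree side looks like, here is a minimal sketch that builds a one-triangle scene in system RAM and traces a single ray through it. It is written against the Embree 3 C API, which post-dates this post, so treat it as a current-API illustration rather than the exact code FluidRay uses:

```cpp
#include <embree3/rtcore.h>
#include <cstdio>
#include <limits>

int main() {
    RTCDevice device = rtcNewDevice(nullptr);
    RTCScene scene = rtcNewScene(device);

    // One triangle: three vertices, one index triple.
    RTCGeometry geom = rtcNewGeometry(device, RTC_GEOMETRY_TYPE_TRIANGLE);
    float* v = (float*)rtcSetNewGeometryBuffer(
        geom, RTC_BUFFER_TYPE_VERTEX, 0, RTC_FORMAT_FLOAT3,
        3 * sizeof(float), 3);
    v[0] = -1; v[1] = 0; v[2] = 0;   // vertex 0
    v[3] =  1; v[4] = 0; v[5] = 0;   // vertex 1
    v[6] =  0; v[7] = 1; v[8] = 0;   // vertex 2
    unsigned* idx = (unsigned*)rtcSetNewGeometryBuffer(
        geom, RTC_BUFFER_TYPE_INDEX, 0, RTC_FORMAT_UINT3,
        3 * sizeof(unsigned), 1);
    idx[0] = 0; idx[1] = 1; idx[2] = 2;
    rtcCommitGeometry(geom);
    rtcAttachGeometry(scene, geom);
    rtcReleaseGeometry(geom);
    rtcCommitScene(scene);           // builds the BVH in system RAM

    // Shoot one ray from (0, 0.5, -1) straight down +z.
    RTCIntersectContext ctx;
    rtcInitIntersectContext(&ctx);
    RTCRayHit rh = {};
    rh.ray.org_x = 0; rh.ray.org_y = 0.5f; rh.ray.org_z = -1;
    rh.ray.dir_x = 0; rh.ray.dir_y = 0;    rh.ray.dir_z = 1;
    rh.ray.tnear = 0.0f;
    rh.ray.tfar  = std::numeric_limits<float>::infinity();
    rh.ray.mask  = 0xFFFFFFFFu;
    rh.hit.geomID = RTC_INVALID_GEOMETRY_ID;
    rtcIntersect1(scene, &ctx, &rh);

    if (rh.hit.geomID != RTC_INVALID_GEOMETRY_ID)
        printf("hit at t = %f\n", rh.ray.tfar);

    rtcReleaseScene(scene);
    rtcReleaseDevice(device);
    return 0;
}
```

Note that nothing here touches a GPU: the scene, the acceleration structure and the traversal all live in main memory, which is exactly why the 1-10 GB limit discussed above doesn't apply.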
As far as we know, a fair comparison of raw intersection performance between GPU and CPU hasn't been done yet. Many of the benchmarks out there just compare apples with oranges. A fair comparison should be done, for example, between a single high-end CPU and a single high-end GPU. Also, since the results are highly scene-dependent, it should be run on a variety of different scenes.
For more in-depth info on the topic, check the SIGGRAPH 2014 paper: Embree – A Kernel Framework for Efficient CPU Ray Tracing
Edit:
Intel Embree has evolved a lot since this post was written. Here are some highlights:
- Subdivision Surfaces
- Displacement Mapping
- Hair and Fur Rendering
- Much improved performance and support for ray packets
References:
- Intel Embree Website
- Embree – A Kernel Framework for Efficient CPU Ray Tracing (SIGGRAPH 2014)
- Exploiting Local Orientation Similarity for Efficient Ray Traversal of Hair and Fur (HPG 2014)
- Watertight Ray/Triangle Intersection (JCGT 2013)