NVIDIA 8800 series - Tech Preview
Author: Luka Rakamaric
Date: 08 Nov 2006

Today NVIDIA launches its new series of graphics cards, the 8 series. As with the launches of the past few series, NVIDIA starts the new lineup with a model ending in 800. The 8800 will be the flagship of NVIDIA's lineup for some time. The GeForce 8800 brings two major features not previously seen in the GPU market. The first is DirectX 10 support. With the Windows Vista release date approaching, and consequently the DirectX 10 release date, it was to be expected that compatible graphics cards would start to show up. DirectX 10 is the most significant step in 3D API design since the introduction of programmable shaders. It brings powerful geometry shaders and a new "Shader Model 4" programming model, which should increase both performance and quality.
The new 8800 supports all of the requirements of Microsoft's DirectX 10 specification. Since it is the first shipping model that supports DX10, it will be a reference model for DX10 API certification and development. The 8800 GPU has some new features that work in conjunction with DX10 to deliver even more performance. They include geometry shader processing, improved instancing and stream output. This also reduces CPU overhead by moving more processing onto the GPU, which removes a significant load from the CPU.

The second feature that the 8800 is first to bring to market is its unified shader architecture. Before we look at this new architecture, we will explain how things have worked up to now. Every graphics card on the market to date has used a strictly pipelined design.

The CPU sends vertex data to the GPU, where the first processing stage is vertex shader processing. DirectX 7 used fixed-function transformations, and DX8 brought programmable shaders. Shader Model 2 (DX9) made shader programs longer and more flexible, and Shader Model 3 (DX9c) brought dynamic flow control. The next stage is the setup phase, in which the vertices output by the vertex shader are assembled into simple shapes such as triangles, lines and points. These are called primitives, and they are then rasterized into pixel fragments.

The following stage is the fragment shader, popularly called the "pixel shader" because it works with pixel fragments. It performs operations such as texturing and shading. In the past, these shaders were fairly simple and allowed only basic operations such as uniform shading and applying color. Nowadays, they can perform numerous shading effects. Shaded fragments are then sent to the ROP (Raster Operations) stage, which handles Z-testing, blending and anti-aliasing. A Z-buffer test checks whether each incoming fragment is visible. Visible fragments are combined with the existing frame buffer pixels and written to frame buffer memory for scanout and display.

The basic design has stayed the same for 20 years, although it has evolved significantly along the way. A pipeline architecture imposes many limitations. Data in the pipeline is not reusable, there is considerable overhead, and data cannot be written to memory mid-pipeline. In other words, a fragment has to pass through the entire pipeline before it can be written to memory, at which point it may no longer be useful.
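
To make the order of the stages concrete, here is a minimal software sketch of the classic pipeline in Python. It is an illustration only, not how any GPU or the DirectX runtime is implemented. Every name in it (vertex_stage, setup_stage, rasterize, pixel_stage, rop_stage, render) is hypothetical, the "vertex shader" is just a translation, and the "pixel shader" is flat coloring.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int          # pixel column
    y: int          # pixel row
    z: float        # interpolated depth (smaller means closer)
    color: tuple    # filled in by the pixel shader stage


def vertex_stage(vertices, offset=(0.0, 0.0, 0.0)):
    """Stand-in for a vertex shader: translate each vertex by `offset`."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in vertices]


def setup_stage(vertices):
    """Primitive assembly: group every three vertices into one triangle."""
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices) - 2, 3)]


def edge(a, b, p):
    """Signed area term telling which side of edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangle, width, height):
    """Yield one unshaded Fragment per pixel centre the triangle covers."""
    a, b, c = triangle
    area = edge(a, b, c)
    if area == 0:
        return  # degenerate triangle covers nothing
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)
            w0 = edge(b, c, p) / area
            w1 = edge(c, a, p) / area
            w2 = edge(a, b, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Barycentric weights interpolate depth across the triangle.
                z = w0 * a[2] + w1 * b[2] + w2 * c[2]
                yield Fragment(x, y, z, None)


def pixel_stage(fragment, color):
    """Stand-in for a pixel shader: flat-shade the fragment."""
    fragment.color = color
    return fragment


def rop_stage(fragment, depth, frame):
    """Z-test against the depth buffer, then write the surviving fragment."""
    if fragment.z < depth[fragment.y][fragment.x]:
        depth[fragment.y][fragment.x] = fragment.z
        frame[fragment.y][fragment.x] = fragment.color


def render(vertices, colors, width, height):
    """Run every triangle through the stages in fixed order."""
    depth = [[float("inf")] * width for _ in range(height)]
    frame = [[None] * width for _ in range(height)]
    triangles = setup_stage(vertex_stage(vertices))
    for triangle, color in zip(triangles, colors):
        for fragment in rasterize(triangle, width, height):
            rop_stage(pixel_stage(fragment, color), depth, frame)
    return frame
```

Note that data only flows forward in this sketch, and nothing reaches the frame buffer until rop_stage runs at the very end. That mirrors the limitation described above: intermediate results cannot be written out or reused partway through.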

 