NVIDIA 8800 series - Tech Preview - pg.2
Author: Luka Rakamaric
Date: 08 Nov 2006

Let's now move to NVIDIA's new unified shader architecture, which is the main feature of the 8800 series GPU. In the summer of 2002 NVIDIA began working on a new architecture that would greatly improve the efficiency of the GPU. They worked together with Microsoft to deliver a combination of a powerful, efficient GPU and an API that introduces many new features, such as geometry shading and stream output. NVIDIA wanted to significantly improve the processing power of its GPU, regardless of the nature of the data processed. Different applications use very different mixes of vertex, pixel and geometry data. On earlier GPUs, with separate vertex and pixel shader units, a vertex heavy scene left most of the GPU doing nothing useful, since the pixel shader parts of it were not used. This highly sequential design was replaced with a looping one that can be seen in the picture below.

As we can see, each operation is performed by the same unit, which ensures that it is always in use. This enables the one thing NVIDIA wanted the most, and that is efficiency. No part of the shader array has to sit idle, since all processing units can perform all functions. If we look at an average application, it will generally produce about three times more pixel data than vertex data. This is reflected in the Series 7 GPU, which had 24 pixel shader pipelines and only 8 vertex shaders. The 8800GTX has 128 processing units, which can process both pixels and vertices, as well as serve as geometry shaders. The next picture shows us just how much processing time the older generation GPUs were losing while working with real life applications.

The picture shows us a theoretical GPU with 4 vertex shaders and 8 pixel shaders. The first part offers an insight into how a vertex heavy scene is rendered. All four vertex shaders are busy, but the GPU works only as fast as they do, so it cannot do more than 4 operations at a time.

The majority of pixel shader units are idle, because they have nothing to process. The next example goes to the opposite extreme. No vertices, but a pixel heavy scene means that the situation will be somewhat better, because 8 units out of 12 will be working, but it is still not ideal, and the performance number is "8". Although you will rarely encounter a scene that consists of only one or the other, the unified shader architecture is the step intended to bring GPU utilization close to 100%, so that no part of the shader array is left idle. The picture below explains how this architecture works.

There are 12 processing units, but they aren't divided into groups, because all 12 can perform any task required. We can have an 11/1 vertex/pixel ratio, or a 1/11 one, and the GPU will always produce a performance of "12". Unified stream processors (SPs) are able to process pixels, vertices, geometry or physics. The GPU dispatch and control logic unit dynamically decides which units are doing what kind of processing. This will enable game developers to pay less attention to the complexity of their scenes, as they will be able to produce vertex heavy scenes without the performance penalty these carried in the past. As we can see on this representation of an SP, future functions could be added to the unit, due to its multipurpose nature.
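
To put numbers on the two pictures, here is a minimal sketch of the performance figures used above. It is a toy model and not how the hardware schedules work: it assumes each unit completes one operation per step and ignores the fact that pixels depend on previously processed vertices. The function names and the workload values are our own, chosen only for illustration.

```python
def fixed_throughput(vertex_work, pixel_work, vertex_units=4, pixel_units=8):
    """Operations completed per step when units are dedicated to one task.

    Vertex units can only take vertex work and pixel units only pixel
    work, so any unit without matching work stays idle.
    """
    return min(vertex_work, vertex_units) + min(pixel_work, pixel_units)


def unified_throughput(vertex_work, pixel_work, units=12):
    """Operations completed per step when every unit can take any work."""
    return min(vertex_work + pixel_work, units)


if __name__ == "__main__":
    # (vertex operations, pixel operations) waiting to be processed
    scenes = {
        "vertex only": (12, 0),
        "pixel only": (0, 12),
        "11/1 vertex/pixel": (11, 1),
        "1/11 vertex/pixel": (1, 11),
    }
    for name, (v, p) in scenes.items():
        print(name, "fixed:", fixed_throughput(v, p),
              "unified:", unified_throughput(v, p))
```

Under these assumptions the split design scores 4 for the vertex only scene and 8 for the pixel only scene, while the unified design scores 12 for every mix, as long as there is enough work to fill all 12 units.
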

The NVIDIA Lumenex engine is another improvement over the previous generation. The 8800 implements a new AA technology, called coverage sampling anti-aliasing, or CSAA. Four new modes are introduced: 8x, 8xQ, 16x and 16xQ. On average, the new 16x modes will deliver about the same performance as the old 4x multisampling AA.

As we know, the previous generation of NVIDIA cards did not support High Dynamic Range (HDR) lighting together with multisampling AA. This is now possible: multisampling can be enabled with both FP16 and FP32 components.



 