RHW Articles

 

GT200 Architecture

by Richard La Rose

  The GT200 is not just an overclocked variant of the G80 or G92 architecture; it can be considered a nearly all-new architecture. The flagship GT200 model, the GTX 280, is laid out as shown above: a geometry shader processing unit, a vertex shader unit, and setup/raster units feed into a unified shader array of no fewer than 240 scalar stream processing units. That's 10 blocks of 24 shader units, with each block made up of three groups of eight.

  The L2 texture cache sits between the stream processing units and the memory interface units, typically referred to as "ROPs" or Raster Operator Units. There are eight "blocks" of those with four per block, which is double the ROP count of the G92 and eight more units than the G80. Each stream processor includes a register file twice the size of those in the stream processors of earlier NVIDIA designs (G80/G92), along with a floating-point compute unit, an integer compute unit, and a move/compare unit. There's also a double-precision floating-point unit, which is useful for some GPGPU tasks and is a very important indicator of performance when we compare the total Mathematical Computational Capacity of the GT200 and RV770. Double-precision floating-point capability is an important factor both in collision detection (in dynamic loops and branching), which is used in PhysX calculations, and in Folding@Home's GPGPU client.
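The unit counts quoted for the GTX 280 can be checked with a few lines of arithmetic. This is only a sketch of the numbers as stated in the text; the constant and function names are made up for illustration, and the G92 and G80 ROP counts are derived from the "double" and "eight more" comparisons above rather than taken from a spec sheet.

```python
# Unit counts as quoted in the article for the GTX 280 (GT200).
# All names here are illustrative, not part of any vendor API.
GT200_BLOCKS = 10          # shader blocks
GROUPS_PER_BLOCK = 3       # groups per block
SPS_PER_GROUP = 8          # scalar stream processors per group
ROP_BLOCKS = 8             # ROP "blocks"
ROPS_PER_BLOCK = 4         # ROPs per block


def gt200_stream_processors() -> int:
    """Total scalar stream processors: blocks x groups x SPs per group."""
    return GT200_BLOCKS * GROUPS_PER_BLOCK * SPS_PER_GROUP


def gt200_rops() -> int:
    """Total ROPs: blocks x ROPs per block."""
    return ROP_BLOCKS * ROPS_PER_BLOCK


# ROP counts implied by the article's comparisons (derived, not quoted).
g92_rops_implied = gt200_rops() // 2   # GT200 has "double" the G92
g80_rops_implied = gt200_rops() - 8    # GT200 has "8 more" than G80

if __name__ == "__main__":
    print("GT200 stream processors:", gt200_stream_processors())
    print("GT200 ROPs:", gt200_rops())
    print("Implied G92 ROPs:", g92_rops_implied)
    print("Implied G80 ROPs:", g80_rops_implied)
```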

Just as the GT200 is not an overclocked G80/G92, the RV770 is not an overclocked RV670. RV770 can be considered a revamped and vastly improved RV670 design. One area where RV670 was not lacking was Mathematical Computational Capacity. Even so, AMD pushed the envelope and increased the number of stream processing units from 64 to 160, which, given the superscalar (Vec5) nature of the RV770's architecture, amounts to 800 stream processors (versus just 320 on the R600/RV670). Below you will find a single Vec5 unit:

Taking a closer look at a single Vec5 unit, one quickly notices a change in SIMD deployment compared with RV670. Although RV770 uses the same SIMD arrangement as RV670 (five SPs and a branch execution unit), the difference lies in how the units are placed within the architecture and how they communicate with one another, aside from the number of units, of course. Whereas RV670 grouped 64 of these units for a total of 320 SPs, RV770 has 160 of them, arranged in 10 cores of 16, which adds up to 800 SPs (2.5x the number found in RV670/R600).
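The same arithmetic applies to the RV770 and RV670 figures. The sketch below uses only the counts given in the text; the names are hypothetical and chosen for readability.

```python
# Vec5 unit counts as quoted in the article; names are illustrative only.
SPS_PER_VEC5 = 5            # five SPs per Vec5 unit
RV770_CORES = 10            # SIMD cores on RV770
VEC5_PER_CORE = 16          # Vec5 units per core
RV670_VEC5_UNITS = 64       # Vec5 units on RV670/R600


def total_sps(vec5_units: int) -> int:
    """Stream processors for a given number of Vec5 units."""
    return vec5_units * SPS_PER_VEC5


rv770_vec5_units = RV770_CORES * VEC5_PER_CORE     # 160 units
rv770_sps = total_sps(rv770_vec5_units)            # 800 SPs
rv670_sps = total_sps(RV670_VEC5_UNITS)            # 320 SPs
sp_ratio = rv770_sps / rv670_sps                   # 2.5x

if __name__ == "__main__":
    print(f"RV770: {rv770_vec5_units} Vec5 units, {rv770_sps} SPs")
    print(f"RV670: {RV670_VEC5_UNITS} Vec5 units, {rv670_sps} SPs")
    print(f"Ratio: {sp_ratio}x")
```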

Each 32-bit floating-point scalar SP can be set to work on any thread and, in single-precision mode, issue two instructions per clock (just like GT200). Each scalar SP can also step up to double precision (64-bit) with relative ease compared to GT200. It is important to note, however, that unlike GT200, RV770 does not natively support FP64. Rather, it has to run FP64 work on its 32-bit SPs over a number of cycles. AMD reckons RV770 is still good for one-fifth of its single-precision throughput, and tests using RightMark confirm this. That's one-fifth of 1.2 TFLOPS, or 240 GFLOPS, and we'll come back to this later in the Mathematical Computational Capacity piece.
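As a rough check on the throughput figures, the sketch below reproduces the 1.2 TFLOPS and 240 GFLOPS numbers. Two inputs are assumptions that the text does not state: a 750 MHz core clock and two floating-point operations per SP per clock. The one-fifth FP64 ratio is the figure AMD is quoted as claiming above.

```python
# Rough peak-throughput estimate for RV770.
# ASSUMPTIONS (not stated in the article): 750 MHz core clock and
# 2 floating-point operations per SP per clock.
RV770_SPS = 800                 # from the article
CORE_CLOCK_MHZ = 750            # assumption
FLOPS_PER_SP_PER_CLOCK = 2      # assumption
FP64_DIVISOR = 5                # FP64 runs at one-fifth of FP32 rate


def peak_fp32_gflops() -> int:
    """Peak single-precision throughput in GFLOPS."""
    mflops = RV770_SPS * FLOPS_PER_SP_PER_CLOCK * CORE_CLOCK_MHZ
    return mflops // 1000


def peak_fp64_gflops() -> int:
    """Peak double-precision throughput at one-fifth of FP32."""
    return peak_fp32_gflops() // FP64_DIVISOR


if __name__ == "__main__":
    print("FP32 peak (GFLOPS):", peak_fp32_gflops())
    print("FP64 peak (GFLOPS):", peak_fp64_gflops())
```

These are theoretical peaks under the stated assumptions, not measured results.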

