Wednesday, August 13, 2008

How to Count to 800 (comparing the NV GTX 280, ATI Radeon 4870, and what I hear about LRB)

This week at SIGGRAPH Intel presented a technical paper entitled "Larrabee: A Many-Core x86 Architecture for Visual Computing", which described Intel's new graphics architecture that is intended to compete with high-end GPU products from NVIDIA and ATI/AMD. In trying to draw comparisons between Larrabee and more traditional GPUs, some confusion has ensued over what the various companies mean by terms such as "stream processors", "thread processors", and "processing cores". For example, NVIDIA describes the recent GeForce GTX 280 as having 240 thread processors. AMD describes its Radeon 4870 as an 800 stream processor chip. Intel's paper describes the possibility of Larrabee incarnations with core counts in the 12 to 48 range.

Okay, so what gives? Obviously this isn't an apples-to-apples comparison, but as it turns out, it's very difficult to define a precise notion of "processing core" that is consistent and accurate across the three architectures. Moreover, counting cores is no measure of absolute chip performance given differences in clock rate, core capabilities, memory subsystem performance, the presence of fixed-function processing, etc. However, since we all like to count things when drawing comparisons, it's useful to at least try to count consistently.

I've decided to create a few slides about how I think about the three chips (I reiterate: this is my own mental model, but it has worked well for me). The following set of slides describes how to derive the peak 32-bit floating point capability of the three GPUs' programmable components, beginning with the organization of ALUs on the chip. As a bonus, I throw in a few notes about how programs (such as Direct3D shader programs, or CUDA programs) map onto the compute resources of the three architectures. The iconography and terminology of the slides follow from descriptions and principles presented in the talk "From Shader Code to a Teraflop: How a Shader Core Works", given as part of the SIGGRAPH 2008 class "Beyond Programmable Shading: Fundamentals".
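As a rough sketch of the kind of counting the slides walk through, the arithmetic can be written as peak flops = cores x ALUs per core x 2 (one multiply-add per clock) x clock rate. The per-core organization and clock rates below are assumptions taken from publicly quoted specifications, not from this post, and the Larrabee entry is purely hypothetical (32 cores is just one point in the 12 to 48 range, and the 1 GHz clock is a placeholder). The `Chip` class and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    cores: int          # units that each fetch/decode their own instruction stream
    alus_per_core: int  # 32-bit multiply-add ALUs fed by each core
    clock_ghz: float    # ALU clock (assumed values, see notes above)

    @property
    def alus(self) -> int:
        # The "marketing" count: total 32-bit ALUs on the chip.
        return self.cores * self.alus_per_core

    @property
    def peak_gflops(self) -> float:
        # One multiply-add per ALU per clock is counted as 2 flops.
        return self.alus * 2 * self.clock_ghz


chips = [
    # Assumed: 30 cores x 8 ALUs, 1.296 GHz shader clock.
    Chip("GeForce GTX 280", cores=30, alus_per_core=8, clock_ghz=1.296),
    # Assumed: 10 cores x (16 units x 5 ALUs), 0.75 GHz.
    Chip("Radeon 4870", cores=10, alus_per_core=16 * 5, clock_ghz=0.75),
    # Hypothetical configuration; core count and clock are placeholders.
    Chip("Larrabee (hypothetical)", cores=32, alus_per_core=16, clock_ghz=1.0),
]

for chip in chips:
    print(f"{chip.name}: {chip.cores} cores, {chip.alus} ALUs, "
          f"{chip.peak_gflops:.0f} peak GFLOPS (multiply-add only)")
```

Counted this way, the "240" and "800" figures are ALU counts rather than core counts, which is why they are not directly comparable to a Larrabee core count. The sketch deliberately ignores anything beyond multiply-add throughput, so it is not a performance prediction.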

Click here for the full talk in pdf form.
Click here for just the notes on NVIDIA, ATI, and proposed Larrabee chips.


Welcome to Kayvon's GBLOG, an attempt to provide simple technical explanations of modern graphics architectures and throughput computing programming models and techniques. I hope to use this blog to clarify and disambiguate the large amount of (often conflicting) terminology that gets thrown around in the field of graphics architecture and, more generally, the emerging field of commodity throughput computing.

I am a PhD candidate in the Computer Graphics Lab at Stanford University. My contact information can be found at