Telling it like it is on the inside of production semiconductors as seen at Chipworks

Thursday, February 17, 2011

How to Get 5 Gbps Out of a Samsung Graphics DRAM

It’s well known that electronics games buffs like their image creation as realistic (or at least as cinema-like) as possible, which in image-processing terms means handling more and more fine-grained pixel data as fast as possible. That means more and more stream processors and texture units in the graphics processor to handle parallel data streams, and faster and faster memory to funnel the data in and out of the GPU.

We recently pulled apart a Sapphire Radeon HD5750 graphics board, containing an AMD/ATI RV840 40-nm GPU, running at 700 MHz, and supported by eight Gb (1 GB) of Samsung GDDR5 memory. This card is a budget card, but the ATI chip still boasts 1.04 billion transistors, 720 stream processors and 36 texture units, can compute at ~1 TFLOPS with a pixel fill rate of 11 Gpixel/s, and can run memory at 1150 MHz with 74 GB/sec of memory bandwidth. I’m not a gamer, but those numbers are impressive to me!

When we started looking at the memory chips, and decoded the part number, we found that we had Samsung’s fastest graphics memory part, claimed to run at 5 Gbps. Graphics DRAMs are designed to run faster anyway, but 5 Gbps is three times faster than the fastest regular DDR3 (Double-Data Rate, 3rd Generation) SDRAM, which can do 1.6 Gbps.*

So what makes this one so blazing fast? Beginning with the x-ray, the difference between a Graphics DDR5 when compared with a 1Gb DDR3 (K4B1G0846F-HCF8) part starts to show up. If we look at an x-ray of the DDR3 chip, we can see that it has the conventional wire-bonding down the central spine:

Plan-View X-ray of Samsung 1 Gb DDR3 SDRAM
When we compare the K4G10325FE-HC04 GDDR5 we can see first that it’s a flip-chip device (no wires), and if we squint hard enough we can also see that the bumps are distributed across the die as well as along the spine.

Plan-view X-ray of Samsung 1 Gb GDDR5 Part from ATI Radeon

This is confirmed in the die photograph:

Die Photo of Samsung 1 Gb GDDR5 SGRAM
Which compares with the die photo of the 1-Gb DDR3:

Die Photo of Samsung 1 Gb DDR3 SDRAM
The die layout is clearly optimized to reduce RC delays from the memory blocks to the outside world. The next question for me is the nature of the flip-chip bonding; is it regular solder bumps or gold stud bumps? A cross-section solves that problem – solder, on plated-up copper lands.

Cross-sectional Images of Samsung GDDR5 Chip in Package
A quick x-ray spectroscopy analysis tells us that the solder is silver-tin lead-free, confirming the package marking.

So the answer to our question is actually fairly obvious – lay out the die to reduce input/output line lengths, and thereby RC delays on the chip, and replace bond wires with bumps to minimize RC delays in the package. A nice exposition of basic principles used to optimize performance.

The next step would be to co-package the memory chips with the GPU to reduce lateral board delays, and we have seen that in products such as the Sony RSX chip in the PS3 gaming system. And after that, lay out the GPU for through-silicon vias - but that will be another story..

For those with an interest in the memory interface circuitry in the RV840, my colleague Randy Torrance has posted a discussion on the Chipworks blog.

* At the time of writing!