Distinguishing Fact from Fiction
During the development of RDRAM, its manufacturers promised twice the bandwidth of PC100. That claim is true to an extent, but only when comparing PC800 RDRAM with PC100 SDRAM. Confused about what “PC100” (or PC-100) and “PC800” really mean? The PC800 label would lead one to believe it should be eight times the speed of PC100, but is it?
Upon closer examination, RDRAM uses a 2-byte (16-bit) wide databus versus SDRAM’s 8-byte (64-bit) wide databus. Obviously, this makes the PC800 rating a bit confusing, but there is an explanation. PC800 RDRAM is actually a double-pumped module operating at a 400 MHz clock speed. Double-pumped means that data is transferred to the RDRAM on both the rising and falling edges of the clock, often referred to as double data rate (DDR)*, creating an effective 800 MHz memory rating. PC100 SDRAM, on the other hand, is referred to as single data rate (SDR); it operates at a 100 MHz clock speed and can only transfer data on the rising edge of the clock, giving it an effective 100 MHz memory rating.
*Not to be confused with DDR SDRAM.
Memory Bandwidth is Theoretical in Nature
If we were to compare theoretical bandwidth, without considering memory latency, we would see the following:
PC800 RDRAM : 800 MHz x 2 Bytes = 1600 MB/s = 1.6 GB/s
PC100 SDRAM : 100 MHz x 8 Bytes = 800 MB/s = 0.8 GB/s
However, there are a number of new chipsets that have been released lately by both VIA and Intel (beginning with the BX unofficially) that support 133 MHz, or PC133 memory. If we look at the theoretical bandwidth of PC133 memory, it appears as follows:
PC133 SDRAM : 133 MHz x 8 Bytes = 1064 MB/s = 1.064 GB/s
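The three figures above follow from one simple rule: peak bandwidth is the effective data rate multiplied by the bus width. A minimal sketch of that arithmetic, using the article’s own numbers (the function and variable names here are ours, purely for illustration):

```python
def peak_bandwidth_mb_s(effective_mhz, bus_width_bytes):
    """Theoretical peak transfer rate in MB/s, ignoring latency entirely."""
    return effective_mhz * bus_width_bytes

# Effective rate (MT/s) and bus width (bytes) for each module type.
modules = {
    "PC800 RDRAM": (800, 2),   # 400 MHz clock, double-pumped, 16-bit bus
    "PC100 SDRAM": (100, 8),   # single data rate, 64-bit bus
    "PC133 SDRAM": (133, 8),   # single data rate, 64-bit bus
}

for name, (mhz, width) in modules.items():
    print(f"{name}: {peak_bandwidth_mb_s(mhz, width)} MB/s")
# PC800 RDRAM: 1600 MB/s
# PC100 SDRAM: 800 MB/s
# PC133 SDRAM: 1064 MB/s
```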
If we carefully compare these three sets of theoretical bandwidth figures, it would appear that PC133 is well above PC100 and nipping at the heels of RDRAM performance. But is it really? PC133 would seem to offer performance well above what we have seen in real-world benchmarks. Okay, so what’s missing?
What is missing is that these are theoretical bandwidth numbers that do not take into consideration memory latency, which makes a big difference in actual bandwidth. Unfortunately many companies, and all too often vendors, use these unadjusted numbers to promote one architecture’s superiority over another.
As we note above, theoretical bandwidth alone cannot be used to measure memory architecture superiority. The fact is that memory latency imposes a significant penalty on actual bandwidth, and that penalty differs for every architecture. Therefore, to really determine architectural superiority, all factors must be taken into consideration, including the latencies.
Differences in Architecture
RDRAM is a memory architecture that relies on a packet-based protocol, with an access latency that depends largely on a device’s distance from the memory controller. Although systems with multiple RDRAMs have slightly higher latencies than single-RDRAM systems, RDRAM latency remains, in a manner of speaking, comparable to that of SDRAM systems. Moreover, RDRAM’s protocol and architecture facilitate memory concurrency and minimize latency in a way that SDRAM does not, which is especially beneficial when multiple memory references are being serviced simultaneously. The number of RDRAMs does not affect peak bandwidth, and an RDRAM-based memory system provides twice the peak bandwidth of PC100 SDRAM. RDRAM’s 1.6 GB/sec bandwidth is achieved with only a 16-bit data bus, and even with control signals included, the memory controller needs only about one third of the I/O signals that SDRAM requires.
SDRAM takes a different approach. It uses a parallel databus 64 bits wide, and adding modules to the system has no effect on memory latency. In addition to the 64-bit databus, the memory controller must drive a multiplexed row and column address to the SDRAMs, along with control signals.
To accurately measure (within reason) memory performance, two metrics must be considered: bandwidth and latency. RDRAM offers not only higher bandwidth, but also much better latency than what we’ve come to expect from SDRAM. You might be surprised to note that PC133 SDRAM latency is actually worse than PC100’s. You may review a reference article from Samsung Semiconductor, Inc. here: Rambus DRAM Performance.
Defining Component Latency
The accepted definition of latency is the time between the moment the RAS (Row Address Strobe) is activated (ACT command sampled) to the moment the first data bit becomes valid. Synchronous device timing is always a multiple of the device clock period. You can read more about memory latencies here: CAS Latency, what is it? and here: Application Performance and Loaded Memory Latency, both of which will open in a new window for you.
The fundamental latency of a DRAM is determined by the speed of the memory core. All SDRAMs use the same memory core technology, thus all SDRAMs are subject to the same core latency. Any differences in latency between SDRAM types are, therefore, only the result of differences in the speed of their interfaces.
With its 400 MHz databus, the RDRAM interface operates with an extremely fine timing granularity of 1.25ns, resulting in a component latency of 38.75ns. The PC100 SDRAM interface runs with a coarse timing granularity of 10ns, about eight (8) times that of RDRAM. Its interface timing matches the memory core timing very well, so its component latency comes out to 40ns. The PC133 SDRAM interface, with its coarse timing granularity of 7.5ns, incurs a mismatch with the timing of the memory core. This mismatch significantly increases the component latency, to 45ns.
Latency timing values can be easily computed from the data sheets of the respective devices. For SDRAMs, specifically PC100 and PC133, the component latency is the sum of the tRCD and CL values, in clock cycles (tRCD + CL = component latency). For RDRAMs, the component latency is the sum of the tRCD and tCAC values, plus one half clock period for the data to become valid (tRCD + tCAC + (tCLK/2) = component latency).
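The formulas above can be sketched in a few lines. Note that the specific device timings plugged in below (tRCD = CL = 2 cycles for PC100, tRCD = CL = 3 for PC133, and tRCD = 17.5ns with tCAC = 20ns for PC800) are our illustrative assumptions chosen to reproduce the article’s figures, not values taken from any particular data sheet:

```python
def sdram_component_latency_ns(t_rcd_clocks, cas_latency_clocks, t_clk_ns):
    """Component latency = (tRCD + CL) clock cycles at the interface clock."""
    return (t_rcd_clocks + cas_latency_clocks) * t_clk_ns

def rdram_component_latency_ns(t_rcd_ns, t_cac_ns, t_clk_ns):
    """Component latency = tRCD + tCAC, plus half a clock for data to become valid."""
    return t_rcd_ns + t_cac_ns + t_clk_ns / 2

# Assumed representative timings, chosen to match the article's figures:
print(sdram_component_latency_ns(2, 2, 10.0))     # PC100 SDRAM: 40.0 ns
print(sdram_component_latency_ns(3, 3, 7.5))      # PC133 SDRAM: 45.0 ns
print(rdram_component_latency_ns(17.5, 20.0, 2.5))  # PC800 RDRAM: 38.75 ns
```

Note how PC133’s faster clock does not help here: it needs six cycles of 7.5ns where PC100 needs four of 10ns, which is the core/interface mismatch the article describes.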
Although component latency is an important factor in system performance, system latency is even more important, since it is what ultimately limits overall performance. System latency is determined by adding the external address and data delays to the component latency. In most personal computers today, system latency is measured as the time to return 32 bytes of data, also referred to as the ‘cache line fill’, to the CPU. Cache issues are often overlooked, whether intentionally or unintentionally, and they play a large part in overall system performance. You will find our rather lengthy discussion of cache issues here: Cache Explained.
In a computer system, SDRAM suffers from what is referred to as the two-cycle addressing problem. The address must be driven for two clock cycles (20ns at 100 MHz) to give the signals sufficient time to settle on the SDRAM’s heavily loaded address bus. After the two-cycle address delay and the component latency, three more clocks are required to return the 32 bytes of data. System latency thus adds five clocks to the component latency for both PC100 and PC133 SDRAM. The total SDRAM system latency is calculated as follows:
40 + (2 x 10) + (3 x 10) = 90ns for PC100 SDRAM
45 + (2 x 7.5) + (3 x 7.5) = 82.5ns for PC133 SDRAM
RDRAM’s superior electrical characteristics eliminate the two-cycle addressing problem, so only 10ns is needed to drive the address to the RDRAM. Transferring the 32 bytes of data back to the CPU at 1.6 GB/second takes another 18.75ns beyond the first valid data bit. Adding in the component latency, the RDRAM system latency is calculated as follows:
38.75 + 10 + 18.75 = 67.5ns for PC800 RDRAM
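The three system-latency calculations above can be expressed as a small sketch (function names are ours; the delay figures come straight from the article):

```python
def sdram_system_latency_ns(component_ns, t_clk_ns):
    """Two clocks of address drive + three clocks to return the 32-byte line."""
    return component_ns + 2 * t_clk_ns + 3 * t_clk_ns

def rdram_system_latency_ns(component_ns):
    """10 ns address drive + 18.75 ns to complete the 32-byte transfer at 1.6 GB/s."""
    return component_ns + 10.0 + 18.75

print(sdram_system_latency_ns(40.0, 10.0))  # PC100 SDRAM: 90.0 ns
print(sdram_system_latency_ns(45.0, 7.5))   # PC133 SDRAM: 82.5 ns
print(rdram_system_latency_ns(38.75))       # PC800 RDRAM: 67.5 ns
```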
Whether measured at the component or the system level, RDRAM has the fastest (or lowest) latency. And as mentioned earlier, as a result of the mismatch between interface and core timing, the latency of PC133 SDRAM is significantly higher than that of PC100. RDRAM’s low latency, coupled with its 1.6 gigabyte per second bandwidth, provides the highest possible sustained system performance. Granted, DDR SDRAM has introduced an entirely new set of timing and latency considerations, but final judgment on that issue has not yet arrived, even as we close upon the third quarter of 2001.
When we consider overall system performance, we must take note of the impact of L1 and L2 cache hits on memory architecture performance. (Review more about L1 and L2 cache issues in Cache Explained.) In addition, individual programs vary widely in memory use, and as such have very different impacts on system performance. As an example, a program that performs random database searches across a large chunk of memory will defeat the caches so thoroughly that the memory architecture with the lowest latency will have the advantage. On the other hand, well-written software that generates large sequential memory transfers requiring little CPU processing can easily saturate SDRAM bandwidth; RDRAM, with its higher bandwidth, has the advantage here as well. In those situations where the software code fits nicely within the L1/L2 caches, memory type will have virtually no impact at all.
It has been quite some time now since Intel chose to implement support for the RDRAM memory architecture in its i820/i840 and upcoming chipsets. Most people in the industry thought they were making a huge mistake, as the promised performance benefits weren’t showing up in most of the “then current” benchmarks. In fairness to Intel and to RDRAM (Rambus), it wasn’t all that long ago that we were still using EDO RAM, and SDRAM was the upstart technology, and at the time a pretty expensive one. We do not write benchmarking software, which leaves us free to ask those who do: “Is current benchmark software capable of accurately measuring the performance of RDRAM and DDR SDRAM?” Thus far we haven’t received a viable answer, and we believe that is because each camp is more interested in promoting its own technology than in uncovering the truth. In hindsight, we’ve all seen the benefits of using SDRAM and its impact on overall system performance. If we look at the performance benefits SDRAM offered in its early days, on the applications that were popular then, it seems as though it didn’t offer huge advantages. Yet we’ve come a long way, and SDRAM performance has continued to improve even though the technology, at first, didn’t seem to promise that much of an improvement.
Due to the growing demand for memory bandwidth, the arrival of GHz+ CPUs and the ever-growing demands of today’s software, SDRAM has run into bandwidth limitations. Sure, DDR SDRAM and VCDRAM might be able to hold off the introduction of a new memory standard for a little while, but its eventual arrival is inevitable. We have run many of the same comparative tests between RDRAM and DDR, and while DDR SDRAM might promise increased memory bandwidth, it will run into severe timing, latency and propagation-delay problems due to its wide databus and ever-increasing clock speeds. At present (and for the foreseeable future) memory will be relatively cheap to produce, but as data rates increase, motherboards will need six or even eight PCB layers to run these memory modules at the higher rates and clock speeds, increasing motherboard costs substantially.
While RDRAM is not perfect, it is one of the most promising solutions to bandwidth, latency and propagation-delay problems. It’s scalable, which gives it a distinct advantage! When it was first released, more than a year and a half ago, it was extremely expensive, but that was partly because it was new and the market hadn’t caught on yet. Although still expensive compared to SDRAM, RDRAM pricing will drop further as more manufacturers recognize its potential and begin selling RDRAM. It will then become as commonplace as SDRAM is now. By the nature of the Rambus manufacturing process itself, Rambus will in all probability never be as affordable as SDRAM, but then again SDRAM doesn’t offer the same performance. And isn’t performance what you’re actually paying for? Better technology always comes at a price. We are certain you wouldn’t expect your $1000-or-less computer to perform nearly as well as a $5000 top-of-the-line model.
Notice: Windows® 95, Windows® 98, Windows® NT, Windows® 2000 and Microsoft® Office are registered trademarks or trademarks of the Microsoft Corporation.
All other trademarks are the property of their respective owners.