WCDRAM: A fully associative integrated Cached-DRAM with wide cache lines
Gershon Kedem
Dept. of Computer Science
Duke University
Durham, NC-27708
kedem@cs.duke.edu
Ram Prasad Koganti
Dept. of Electrical and Computer Engineering
Duke University
Durham, NC-27708
rpk@cs.duke.edu
This paper presents an integrated, fully associative, wide Cached-DRAM (WCDRAM). We propose a fully associative cache with very large blocks (1KB-8KB) but few entries (32-128), integrated with the DRAM array. This organization exploits the internal bandwidth inherent in DRAM access: a single row address strobe (RAS) operation transfers a large (up to 8KB) memory block between the DRAM array and the SRAM buffer. Our simulation results demonstrate that the proposed cache is very effective. For more than half of the benchmarks, a 1MB WCDRAM has a 20-100 times lower local miss rate (LMR) than a 4MB level-3 cache.
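The cache organization described above can be illustrated with a minimal sketch: a toy model of a fully associative cache with few entries and very large blocks, which counts hits and misses to report a local miss rate. The class name, the method names, and the LRU replacement policy are assumptions made for illustration; the abstract does not state which replacement policy WCDRAM uses.

```python
from collections import OrderedDict


class WideFullyAssociativeCache:
    """Toy model of a fully associative cache with few, very large blocks.

    LRU replacement is an assumption of this sketch, not a detail
    taken from the paper.
    """

    def __init__(self, num_entries=128, block_size=8192):
        self.num_entries = num_entries
        self.block_size = block_size
        # Maps block number -> None, ordered from least to most recently used.
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access; return True on a hit, False on a miss."""
        block = address // self.block_size
        if block in self.blocks:
            # Hit: mark the block as most recently used.
            self.blocks.move_to_end(block)
            self.hits += 1
            return True
        # Miss: evict the least recently used block if the cache is full,
        # then load the whole block (one RAS operation in the WCDRAM design).
        self.misses += 1
        if len(self.blocks) >= self.num_entries:
            self.blocks.popitem(last=False)
        self.blocks[block] = None
        return False

    def local_miss_rate(self):
        """Misses divided by the accesses that reached this cache."""
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

With the defaults, 128 entries of 8KB give a 1MB cache. Feeding the model only the addresses that miss in the upper-level caches would yield a local miss rate in the sense used above.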
The WCDRAM cache tags can be integrated into the microprocessor, and the WCDRAM cache lookup can be done in parallel with an L2 cache access. On an L2 cache miss, the processor issues either a WCDRAM cache access or a DRAM access. As a result, the WCDRAM cache hit time does not include the cost of a fully associative cache lookup, and a DRAM access is not preceded by a WCDRAM cache access. Also, block transfer time is a negligible factor in the cache miss penalty, because the WCDRAM cache is integrated with the DRAM.
The WCDRAM cache is an effective alternative to off-chip, multi-megabyte secondary or tertiary direct-mapped caches. Analysis shows that using WCDRAM memory instead of DRAM memory, in a system with a 16KB, two-way set associative L1 cache and a 256KB, four-way set associative L2 cache, improves the performance of some benchmark programs by a factor of 1.5-1.9. This is possible because of the look-aside nature of the WCDRAM cache and its high hit rates, which are not accompanied by an increase in hit time or miss penalty. The WCDRAM architecture promises to yield an average memory access time that is comparable to SRAM rather than DRAM.
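To make the average memory access time argument concrete, here is a simplified sketch of a three-level look-aside model. It assumes that, because the WCDRAM tag lookup overlaps the L2 access, an L2 miss goes directly to either the WCDRAM cache or the DRAM array with no added lookup latency. The function name, its parameters, and the formula are illustrative assumptions and are not the paper's own analysis; any latencies passed in are hypothetical.

```python
def average_memory_access_time(
    t_l1,
    t_l2,
    t_wcdram_hit,
    t_dram,
    l1_miss_rate,
    l2_local_miss_rate,
    wcdram_local_miss_rate,
):
    """Average access time (in the caller's time unit) for a look-aside WCDRAM.

    On an L2 miss the processor already knows whether the block is in the
    WCDRAM cache, so it pays either the WCDRAM hit time or the DRAM time,
    never both (an assumption of this simplified model).
    """
    beyond_l2 = (
        (1.0 - wcdram_local_miss_rate) * t_wcdram_hit
        + wcdram_local_miss_rate * t_dram
    )
    return t_l1 + l1_miss_rate * (t_l2 + l2_local_miss_rate * beyond_l2)
```

Setting `wcdram_local_miss_rate` to 1.0 reduces the model to a plain DRAM system, which gives a baseline for comparing the two configurations under the same hypothetical latencies.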
The Complete Report:
WCDRAM paper, to be presented at HPCS'97
HPCS'97 WCDRAM presentation slides