What is processor cache and what effect does it have

The CPU cache is important. Without it, a high processor clock speed would be of little use. The cache lets you use even relatively slow RAM without a noticeable loss of performance.

Computer processors have advanced significantly over the past few years. Transistors shrink every year and performance keeps growing, even as Moore’s law becomes less relevant. When judging processor performance, you should consider not only the number of transistors and the clock frequency but also the cache size.

You may have already come across the cache size when looking up processor specifications. Usually we do not pay much attention to these numbers, and they do not stand out much in processor advertising. Let’s look at what the processor cache affects, what types of cache exist, and how it all works.

To solve any task, the processor fetches the necessary blocks of data from RAM. After processing them, it writes the results back to memory and fetches the next blocks. This continues until the task is completed.

All of these operations are performed at very high speed. However, even the fastest RAM is slower than any “leisurely” processor. Every read from RAM and every write back to it takes time. On average, RAM is 16 to 17 times slower than the processor.

Despite this imbalance, the processor does not sit idle waiting each time RAM “issues” or “receives” data. It almost always runs at full speed, and all thanks to its cache.

The processor cache is a small but very fast memory. It is built into the processor and acts as a buffer that smooths out delays in the exchange of data with slower RAM. The CPU cache is sometimes called super-random access memory.

The processor cache is needed not only to compensate for the speed gap. The processor works on data in smaller portions than those in which it is stored in RAM. The cache therefore also serves as a place for “repackaging” and temporarily holding data before it is passed to the processor, and for holding results before they are returned to RAM.

What is processor cache?

In simple terms, the processor cache is just a very fast memory. As you already know, a computer has several types of memory. First, there is persistent storage, such as an SSD or hard drive, which holds data, the operating system, and programs. The computer also uses RAM, or random access memory, which is much faster than persistent storage. Finally, the processor has even faster memory blocks, which are collectively called the cache.

If you imagine the computer’s memory as a hierarchy ordered by speed, the cache is at the top. It is also closest to the computing cores, since it is part of the processor.

The processor cache is static memory (SRAM) designed to speed up work with RAM. Unlike dynamic random access memory (DRAM), it holds data without needing to be constantly refreshed.

How does CPU cache work?

As you may already know, a program is a set of instructions that a processor executes. When you start a program, the computer needs to transfer these instructions from persistent storage to the processor. This is where the memory hierarchy comes in: the data is first loaded into RAM and then transferred to the processor.

These days, a processor can execute a huge number of instructions per second. To make the most of its capabilities, it needs extremely fast memory. That is why the cache was developed.

The memory controller retrieves data from RAM and sends it to the cache. Depending on the processor in your system, this controller may be located in the northbridge of the motherboard (on older systems) or in the processor itself. The cache also stores the results of instructions executed by the processor. In addition, the processor cache has a hierarchy of its own.

Processor cache structure

The processor cache system consists of two blocks – the cache controller and the cache itself.

Memory cache controller

A memory cache controller is a device that controls the contents of the cache, retrieves the necessary information from the RAM, transfers it to the processor, and returns the calculation results to the RAM.

When a processor core asks the controller for some data, the controller checks whether this data is in the cache. If it is, the data is handed to the core immediately (a so-called cache hit).

Otherwise, the core has to wait for the data to arrive from slow RAM. The situation in which the required data is not in the cache is called a cache miss.

The controller’s task is to make cache misses as rare as possible, and ideally to avoid them altogether.
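The hit and miss logic described above can be sketched with a toy model. This is a minimal illustration, assuming a direct-mapped cache in which each address can occupy exactly one line; the class name DirectMappedCache, the sizes, and the dictionary standing in for RAM are all hypothetical and do not describe any real controller.

```python
# Toy model of a cache controller: a direct-mapped cache in front of slow "RAM".
# All names and sizes here are illustrative assumptions, not real hardware values.

class DirectMappedCache:
    def __init__(self, num_lines, ram):
        self.num_lines = num_lines
        self.ram = ram                      # backing store: address -> value
        self.lines = [None] * num_lines     # each line holds (address, value) or None
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines    # the one line this address may occupy
        line = self.lines[index]
        if line is not None and line[0] == address:
            self.hits += 1                  # cache hit: data is served immediately
            return line[1]
        self.misses += 1                    # cache miss: fetch from RAM, keep a copy
        value = self.ram[address]
        self.lines[index] = (address, value)
        return value


ram = {addr: addr * 10 for addr in range(64)}
cache = DirectMappedCache(num_lines=8, ram=ram)

# Read the same four addresses three times over.
for _ in range(3):
    for addr in (0, 1, 2, 3):
        cache.read(addr)

print(cache.hits, cache.misses)  # prints: 8 4
```

Only the first pass goes to RAM; the two repeat passes are served entirely from the cache.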

The processor cache is disproportionately small compared to RAM. It can hold a copy of only a tiny fraction of the data stored in RAM. Despite this, the controller rarely incurs cache misses. Its effectiveness is determined by several factors:

the size and structure of the cache memory (the more resources the controller has at its disposal, the lower the likelihood of a cache miss);
the effectiveness of the algorithms by which the controller determines what information the processor will need at the next moment in time;
the complexity and number of tasks the processor is solving simultaneously (the more complex and numerous the tasks, the more often the controller “errs”).
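The first factor, cache size, can be illustrated with a small simulation. The sketch below assumes a least-recently-used (LRU) replacement policy and a made-up workload that loops over 16 addresses; real controllers use more elaborate policies, and the function name count_misses is hypothetical.

```python
from collections import OrderedDict

def count_misses(trace, capacity):
    """Replay a list of addresses through an LRU cache and count the misses."""
    cache = OrderedDict()
    misses = 0
    for address in trace:
        if address in cache:
            cache.move_to_end(address)      # mark as most recently used
        else:
            misses += 1
            cache[address] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used entry
    return misses

# Hypothetical workload: loop over 16 addresses, 10 times.
trace = list(range(16)) * 10

for capacity in (4, 8, 16, 32):
    misses = count_misses(trace, capacity)
    print(f"capacity={capacity:2d}  miss rate={misses / len(trace):.0%}")
```

With this workload, every access misses until the cache is large enough to hold all 16 addresses, at which point the miss rate drops to 10% and further growth brings no gain. That is the same pattern the article describes: more cache helps only up to the size the task actually needs.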

Processor cache levels L1, L2, L3

The processor cache is built from static memory (Static Random Access Memory, abbreviated as SRAM). Compared to other types of memory, static memory is very fast.

However, this speed also depends on the size of a particular memory block. The larger the block, the harder it is to keep it fast.

Given this, the processor cache is built as several small blocks called levels. Most processors use a three-level cache system:

L1 cache is a very small memory block, but the fastest and most important one. Its size typically does not exceed several tens of kilobytes per core. It works with minimal delay and holds the data the processor uses most often.
The number of L1 blocks in a processor is, as a rule, equal to the number of its cores. Each core has access only to its own L1 block.
L2 cache is slightly slower than L1 but larger (several hundred kilobytes). It temporarily stores important data that is less likely to be requested than the data in L1.
L3 cache is larger but slower still. It is nevertheless much faster than RAM. Its size can reach several tens of megabytes. Unlike L1 and L2, it is shared by all processor cores. L3 temporarily stores important data with a relatively low probability of being requested and also supports the interaction of the processor cores with each other.

There are also processors with a two-level cache memory. In them, L2 combines the functions of L2 and L3.

How does more cache affect the processor?

When serving a core’s request for data, the controller first searches the first-level cache, then the second- and third-level caches.
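The search order can be sketched as follows. This is a simplified illustration, not a description of real hardware: each level is modeled as a plain set of addresses, capacities and eviction are ignored, and the function name lookup is hypothetical.

```python
# Sketch of the lookup order L1 -> L2 -> L3 -> RAM.
# Levels are plain sets of addresses; capacity and eviction are ignored for brevity.

def lookup(address, levels):
    """Return the name of the first level that holds the address, else 'RAM'.

    On the way back, the address is copied into every level that missed,
    so the next access to it hits in L1.
    """
    missed = []
    for name, contents in levels:
        if address in contents:
            found = name
            break
        missed.append(contents)
    else:
        found = "RAM"                       # no cache level had the data
    for contents in missed:
        contents.add(address)
    return found

levels = [("L1", {1}), ("L2", {1, 2}), ("L3", {1, 2, 3})]

print([lookup(a, levels) for a in (1, 2, 3, 4, 4)])
# prints: ['L1', 'L2', 'L3', 'RAM', 'L1']
```

The second request for address 4 is served from L1, because the first request pulled it in from RAM.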

Which cache on the CPU is used first?
According to statistics, the first-level cache of a modern processor provides up to 90% of cache hits. The second and third levels provide another 90% of what remains. Only about 1% of all processor requests end in cache misses.

These figures apply to simple tasks. As the processor load grows, so does the number of cache misses.
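The arithmetic behind the 1% figure is worth spelling out: each level only sees the requests that every earlier level missed, so the miss fractions multiply. A minimal sketch, using the article’s figures and a hypothetical helper named overall_miss_rate:

```python
def overall_miss_rate(level_hit_rates):
    """Fraction of requests that miss every cache level.

    Each rate applies only to the requests that missed all earlier levels.
    """
    miss = 1.0
    for hit_rate in level_hit_rates:
        miss *= 1.0 - hit_rate
    return miss

# L1 serves 90% of requests; L2 and L3 together serve 90% of the remainder.
miss = overall_miss_rate([0.90, 0.90])
print(f"served by caches: {1 - miss:.0%}, sent to RAM: {miss:.0%}")
# prints: served by caches: 99%, sent to RAM: 1%
```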

An efficient processor cache minimizes the impact of RAM speed on computer performance. For example, a computer will perform almost equally well with 1066 MHz and 2400 MHz RAM: other things being equal, the difference in most applications will not exceed 5%.

When trying to gauge cache performance, users most often look for answers to the following questions:

Which cache structure is better: two-tier or three-tier?
A three-tier structure with an L3 cache is more efficient.

To determine how much L3 affects processor performance, an experiment was conducted that measured the performance of the Athlon II X4 and Phenom II X4 processors. Both are equipped with the same cores; the first differs from the second only in lacking an L3 cache and having a lower clock speed.

With both processors set to the same frequency, the L3 cache was found to increase the Phenom’s performance by 5.8%. This is an average: in some applications the gain was almost zero (office programs), while in others it reached 8% or more (3D computer games, archivers, etc.).

How does cache size affect CPU performance?

When evaluating cache size, you need to take into account the characteristics of the processor and the range of tasks it handles.

The cache of a dual-core processor rarely exceeds 3 MB, especially if its clock frequency is below 3 GHz. Manufacturers are well aware that a further increase in the cache size of such a processor will not bring a performance gain but will significantly increase its cost.

High-frequency 4-, 6-, or even 8-core processors are another matter. Some of them (for example, Intel Core i7) support Hyper-Threading technology, which lets each core run two threads simultaneously. Naturally, the potential of such processors cannot be realized with a small cache, so increasing it to 15 or even 20 MB is justified.

In many Intel processors, the cache is filled according to the so-called inclusive scheme, in which the contents of the upper-level caches (L1, L2) are fully or partially duplicated in the lower-level cache (L3). This somewhat reduces the usable capacity of L3. On the other hand, the inclusive scheme has a positive effect on the interaction of the processor cores with each other.
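To see how duplication reduces usable space, here is a rough back-of-the-envelope sketch. It assumes private L1 and L2 caches per core and one shared L3, and compares a fully inclusive scheme with a fully exclusive one; the sizes are made up for illustration and the function name unique_capacity_kb is hypothetical.

```python
def unique_capacity_kb(l1_kb, l2_kb, l3_kb, cores, inclusive):
    """Rough upper bound on the distinct data the hierarchy can hold, in KB.

    Assumes private L1/L2 per core and one shared L3 (hypothetical layout).
    In a fully inclusive scheme everything in L1/L2 is also in L3, so L3 alone
    sets the limit; in a fully exclusive scheme the levels add up.
    """
    private = cores * (l1_kb + l2_kb)
    return l3_kb if inclusive else l3_kb + private

# Hypothetical 4-core processor: 64 KB L1 and 256 KB L2 per core, 8 MB shared L3.
for inclusive in (True, False):
    kb = unique_capacity_kb(64, 256, 8192, cores=4, inclusive=inclusive)
    print(f"inclusive={inclusive}: up to {kb} KB of distinct data")
# prints: inclusive=True: up to 8192 KB of distinct data
#         inclusive=False: up to 9472 KB of distinct data
```

Under these assumptions the inclusive scheme gives up 1280 KB of distinct capacity. In return, L3 always knows what the private caches hold, which simplifies coordination between cores.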

In general, experiments show that in the average “home” processor, the influence of cache size on performance is within 10%, and it can be compensated for, for example, by a high frequency.

The effect of a large cache is most noticeable in archivers, 3D games, and video encoding. In lightweight applications (office programs, web browsing, working with photos, listening to music, etc.), the difference tends to zero.

Multi-core processors with a large cache are needed on computers that run multi-threaded applications or solve several complex tasks at once.

This is especially true for servers with high traffic. Some high-load servers and supercomputers even provide for a fourth-level cache (L4), implemented as separate chips connected to the motherboard.

How to find the cache size of a CPU

There are special programs that provide detailed information about the processor of the computer, including its cache memory. One of them is the CPU-Z program.

The program does not require installation. After launching it, go to the “Caches” tab.

The example shows that the processor under test is equipped with a three-level cache memory. The size of L3 cache in it is 3 MB, L2 – 512 KB (256×2), L1 – 128 KB (32×2 + 32×2).
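On a Linux system, similar information can be read without any extra program, because the kernel exposes cache details through sysfs. The sketch below assumes the usual /sys/devices/system/cpu/cpu0/cache layout with level, type, and size files; the helper names parse_size and read_cpu0_caches are hypothetical, and on systems without this directory the function simply returns an empty list.

```python
from pathlib import Path

def parse_size(text):
    """Convert a size string such as '32K' or '3M' to bytes."""
    text = text.strip().upper()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)

def read_cpu0_caches(base="/sys/devices/system/cpu/cpu0/cache"):
    """Return [(level, type, size_in_bytes), ...] for CPU 0, or [] if unavailable."""
    root = Path(base)
    if not root.is_dir():
        return []                           # not Linux, or sysfs not mounted
    caches = []
    for index_dir in sorted(root.glob("index*")):
        try:
            level = int((index_dir / "level").read_text())
            kind = (index_dir / "type").read_text().strip()
            size = parse_size((index_dir / "size").read_text())
        except (OSError, ValueError):
            continue                        # entry missing or unreadable here
        caches.append((level, kind, size))
    return caches

for level, kind, size in read_cpu0_caches():
    print(f"L{level} {kind}: {size // 1024} KB")
```

The values reported this way are per cache instance, so they are combined just as in the CPU-Z example above: two 32 KB data caches plus two 32 KB instruction caches give 128 KB of L1 in total.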

Is there any way to increase the processor cache?
As mentioned above, some servers and supercomputers allow the cache to be extended by connecting additional cache memory to the motherboard.

In home or office computers, this is not possible. Cache memory is an integral part of the processor, has very small physical dimensions, and cannot be replaced. Ordinary motherboards also have no slots for connecting additional cache memory.

What is CPU cache for?

It’s time to answer the main question of this article: what does the processor cache affect? Data travels from RAM to the L3 cache, then to L2, and then to L1. When the processor needs data to complete an operation, it first looks in the L1 cache; if the data is there, this is called a cache hit. Otherwise, the search continues in L2 and L3. If the data is still not found, a request is made to RAM.

Now we know that the cache is designed to speed up the transfer of data between RAM and the processor. The time it takes to retrieve data from memory is called latency. The L1 cache has the lowest latency and is therefore the fastest; the L3 cache has the highest. When the data is not in the cache at all, the delay is higher still, since the processor must access RAM.
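The combined effect of hit rates and latencies can be estimated as an average access time. The sketch below is only an illustration: the cycle counts and hit rates are hypothetical, chosen for easy arithmetic rather than measured on any processor, and the function name average_access_cycles is an assumption.

```python
def average_access_cycles(levels, ram_cycles):
    """Expected cycles per memory access.

    levels: list of (hit_rate, latency_cycles), ordered L1 first. Each hit rate
    applies only to the requests that reached that level.
    """
    total = 0.0
    reach = 1.0                       # fraction of requests reaching this level
    for hit_rate, latency in levels:
        total += reach * latency      # every request that gets here pays this latency
        reach *= 1.0 - hit_rate       # the rest move on to the next level
    return total + reach * ram_cycles

# Hypothetical figures, chosen only for illustration.
levels = [(0.90, 4), (0.70, 12), (0.70, 40)]
with_cache = average_access_cycles(levels, ram_cycles=200)
without_cache = average_access_cycles([], ram_cycles=200)
print(f"with cache: {with_cache:.1f} cycles, without: {without_cache:.1f} cycles")
# prints: with cache: 8.2 cycles, without: 200.0 cycles
```

Even with modest hit rates in L2 and L3, the average access in this example costs a small fraction of a trip to RAM, which is why the processor rarely has to wait.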

Previously, L2 and L3 caches were placed outside the processor, which led to high latency. However, shrinking the manufacturing process has made it possible to fit billions of transistors into a much smaller space than before. This freed up room to place the cache as close to the cores as possible, which further reduces latency.

Conclusions

Now you know what the processor cache is responsible for and how it works. Cache design is constantly evolving, and memory is becoming faster and cheaper. AMD and Intel have both experimented extensively with cache designs, and Intel has even tried an L4 cache. The processor market is developing faster than ever, and cache architecture will keep pace with ever-increasing processor power.

In addition, much is being done to eliminate the bottlenecks that modern computers have. Reducing the latency of working with memory is one of the most important parts of this work. The future looks very promising.