Processor cache memory: what it is, why it matters, and the difference between L1, L2 and L3

A cache is memory built into the processor that holds the most frequently used data and instructions from RAM, which significantly speeds up its work.

L1 cache size (from 8 to 128 KB)
The amount of first-level cache memory.
The L1 cache is a block of high-speed memory located directly on the processor core.
It holds copies of data retrieved from RAM.

Keeping the most-used instructions close at hand increases processor performance, since processing data from the cache is faster than from RAM.
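The speed-up works because repeated accesses are served from the small fast store instead of slow RAM. Below is a toy model of that idea: a tiny direct-mapped cache in front of a "RAM" list. It is purely illustrative (the sizes, mapping and counters are invented for the sketch), not a model of any real CPU.

```python
# Toy model (not real hardware): a direct-mapped cache in front of a slow
# "RAM" list. Illustrates why repeated accesses are served faster.

RAM = list(range(64))          # pretend main memory
CACHE_LINES = 8                # tiny cache: 8 one-word lines
cache = {}                     # line index -> (address, value)

hits = misses = 0

def read(addr):
    """Return RAM[addr], going through the cache."""
    global hits, misses
    line = addr % CACHE_LINES          # direct mapping: address -> line
    if line in cache and cache[line][0] == addr:
        hits += 1                      # fast path: data already cached
        return cache[line][1]
    misses += 1                        # slow path: fetch from RAM
    cache[line] = (addr, RAM[addr])
    return RAM[addr]

# Access the same small working set repeatedly, as real programs do.
for _ in range(10):
    for addr in (0, 1, 2, 3):
        read(addr)

print(hits, misses)  # 36 4 -- only the first touch of each address goes to RAM
```

Out of 40 accesses, only the first 4 are slow; everything after that is a cache hit.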

The capacity of the first-level cache is small and is measured in kilobytes.
Typically, higher-end processor models have a larger L1 cache.
For multi-core models, the L1 cache size is given per core.

L2 cache size (from 128 to 12288 KB)
The amount of second-level cache memory.
The L2 cache is a block of high-speed memory that performs the same functions as the L1 cache (see "L1 cache size"), but is slower and larger.

If you are choosing a processor for resource-intensive tasks, a model with a large L2 cache is preferable.
For multi-core processors, the total L2 cache size is given.

L3 cache size (from 0 to 16384 KB)
The amount of third-level cache memory.
The integrated L3 cache, combined with a fast system bus, forms a high-speed data link to the system memory.

As a rule, only CPUs intended for servers, or special editions of desktop processors, are equipped with a third-level cache.

L3 cache is available, for example, in such processor lines as Intel Pentium 4 Extreme Edition, Xeon DP, Itanium 2, Xeon MP and others.


All processors since the late 1990s have had an internal cache (or simply "cache"). The cache is high-speed memory that holds the instructions and data the processor is currently working on.

Modern processors have two levels of built-in cache memory: the first (L1) and the second (L2). The processor works somewhat faster with the contents of the L1 cache, while the L2 cache is usually somewhat larger. Cache memory is accessed without wait states, i.e. the first-level (on-chip) cache runs at the same frequency as the processor.

This means that if the data needed by the processor is in the cache, then there is no delay in processing. Otherwise, the processor must get data from the main memory, which significantly reduces system performance.

To get a qualitative feel for how both levels of cache memory work, consider an everyday example.

You come to a cafe for lunch every day, at the same time, and always sit at the same table. You always order the standard three-course set.

The waiter runs to the kitchen, the chef puts the dishes on a tray, and your order is brought out. Then, say, on the third day, the waiter, to avoid yet another trip to the kitchen, meets you at the usual time with a hot lunch already on the tray.

You don't wait for the order and save a lot of time. The tray with your dishes is the first-level cache. But on the fourth day you suddenly want to add another dish, say a dessert.

Although a tray with an order was already waiting for you at the appointed time, the waiter still had to run to the kitchen to get dessert.

On the fifth day, it's the three-course set again. On the sixth, a dessert again, but a different one. The waiter, not knowing which dessert you will order (or whether you will order one at all), takes the next step: he places a small cabinet with several kinds of dessert next to your table.

If you express the desire, everything is at hand; no trip to the kitchen is needed. The dessert cabinet is the second-level cache.

The size of the L1 cache (16 to 128 KB) and of the L2 cache (64 KB to 512 KB, up to 4 MB in the Pentium III Xeon and AMD Opteron) significantly affects processor performance.

Intel Pentium III processors and the Celeron processors based on them have a 32 KB L1 cache. The Intel Pentium 4, as well as the Celeron and Xeon versions based on it, have only 20 KB. AMD Duron, Athlon (including XP/MP) and Opteron processors, as well as the VIA C3, contain 128 KB of L1 cache.

Modern dual-core processors have a separate first-level cache for each core, so you may see the figure "128x2" in a cache description. It means that each processor core has 128 KB of L1 cache.

L1 cache size is important for getting high performance in most common tasks (office applications, games, most server applications, etc.). Its effectiveness is especially pronounced for streaming calculations (for example, video image processing).

This is one reason the Pentium 4 is relatively inefficient in most common applications (though this is compensated by its high clock speed). The L1 cache always exchanges data with the processor core at the processor's internal frequency.

In contrast, the L2 cache in different processor models operates at different frequencies (and, accordingly, performance). Beginning with the Intel Pentium II, many processors used an L2 cache running at half the processor's internal frequency.

This approach was used in older Intel Pentium III processors (up to 550 MHz) and older AMD Athlon processors (in some of them, the on-die L2 cache ran at a third of the core frequency). L2 cache size also differs between processors.

Older and some newer Intel Pentium III processors have 512 KB of L2 cache; other Pentium IIIs have 256 KB. The Pentium III-based Intel Celeron came with 128 or 256 KB of L2 cache, while the Pentium 4-based Celeron came with only 128 KB. Various Xeon variants of the Intel Pentium 4 have up to 4 MB of L2 cache.

The newer Pentium 4 processors (some 2000 MHz series and all above that frequency) have 512 KB of L2 cache; the rest of the Pentium 4 line has 256 KB. Xeon processors (based on the Pentium 4) have 256 or 512 KB of L2 cache.

In addition, some of them also have a third-level (L3) cache. The integrated L3 cache, combined with a fast system bus, forms a high-speed channel to system memory.

As a rule, only processors for server solutions or special models of desktop processors are equipped with an L3 cache. L3 cache is found, for example, in processor lines such as Xeon DP, Itanium 2 and Xeon MP.

The AMD Duron processor has 128 KB L1 cache and 64 KB L2 cache. Athlon processors (except the older ones), Athlon MP, and most Athlon XP variants have 128 KB L1 cache and 256 KB L2 cache, and the latest Athlon XP processors (2500+, 2800+, 3000+ and above) have 512 KB L2 cache. The AMD Opteron contains 1 MB of L2 cache.

The latest models of Intel Pentium D, Intel Pentium M and Intel Core 2 Duo processors come with up to 6 MB of L2 cache, and the Core 2 Quad with up to 12 MB of L2 cache.

At the time of writing, the latest Intel Core i7 has 64 KB of L1 cache for each of its 4 cores and 256 KB of L2 per core. In addition to the first- and second-level caches, the processor also has an 8 MB third-level cache shared by all cores.

For processors where the same model can ship with different L2 cache sizes (or, in the case of the Intel Xeon MP, different L3 sizes), the size should be specified at the point of sale (the processor's price, of course, depends on it). If the processor is sold in a "boxed" (retail) package, the cache size is usually indicated on it.

For normal user tasks (including games), the speed of the L2 cache is more important than its size; for server tasks, on the contrary, volume is more important. The most productive servers, especially those with large amounts of RAM (several gigabytes), require the maximum amount and maximum speed of the L2 cache.

Xeon versions of Pentium III processors remain unsurpassed in these parameters. (The Xeon MP is still faster in server tasks than the Pentium III Xeon, thanks to the higher clock frequency of the core and of the memory bus.) From the above we can conclude that cache memory smooths the interaction between a fast processor and slower RAM and minimizes the wait states that occur during data processing. The decisive role here is played by the second-level cache located on the processor die.

One of the important factors that increase processor performance is cache memory: its size, its access speed and its distribution across levels.

For a long time now, almost all processors have been equipped with this type of memory, which once again proves its usefulness. In this article we will talk about the structure, levels and practical purpose of cache memory as a very important processor characteristic.

What is cache memory and its structure

Cache memory is a super-fast memory used by the processor to temporarily store data that is most frequently used. This is how, briefly, this type of memory can be described.

Cache memory is built from flip-flops, which in turn consist of transistors. A group of transistors takes up much more space than the capacitors that make up RAM. This creates production difficulties as well as limits on capacity, which is why cache memory is very expensive while having a tiny volume. But this structure is also the source of its main advantage: speed. Since flip-flops need no refresh, and the propagation delay of the gates they are built from is small, switching a flip-flop from one state to the other happens very quickly. This allows cache memory to run at the same frequencies as modern processors.

Location is also an important factor. The cache sits on the processor die itself, which greatly reduces access time. Previously, some cache levels were located outside the processor die, in a dedicated SRAM chip somewhere on the motherboard. Now, in almost all processors, the cache is on the processor die.

What is CPU Cache Used for?

As mentioned above, the main purpose of cache memory is to store data that the processor uses frequently. The cache is a buffer into which data is loaded, and despite its small size (about 4-16 MB in modern processors), it gives a significant performance boost in almost any application.

To better understand the need for cache memory, imagine the computer's memory organization as an office. RAM is a cabinet of folders that the accountant consults periodically to retrieve large blocks of data (that is, folders). The desk is the cache memory.

Some items are kept right on the accountant's desk, things he refers to several times an hour: phone numbers, sample documents. Being right on the desk makes access to them faster.

In the same way, data can be added from those large data blocks (folders), to the table, for quick use, for example, any document. When this document is no longer needed, it is placed back in the cabinet (in RAM), thereby clearing the table (cache) and freeing this table for new documents that will be used in the next period of time.

The cache works the same way: data likely to be accessed again is loaded from RAM into the cache. Very often, the data most likely to be needed after the current data is loaded along with it. In other words, the hardware makes assumptions about what will be needed "next". These are the simple principles of its operation.
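The "load what will likely be needed next" idea can be sketched as a one-block-ahead sequential prefetcher. This is a purely illustrative toy, not a real hardware prefetch design; the names and counters are invented for the sketch.

```python
# Toy sequential prefetcher: on every access, also pull in the next block,
# guessing that a sequential scan will want it soon.

CACHE = set()
prefetch_hits = demand_misses = 0

def access(block):
    global prefetch_hits, demand_misses
    if block in CACHE:
        prefetch_hits += 1     # the guess paid off: block was prefetched
    else:
        demand_misses += 1     # had to go to "RAM" for it
        CACHE.add(block)
    CACHE.add(block + 1)       # the guess: the next block will be wanted

for b in range(10):            # a sequential scan, e.g. reading a file
    access(b)

print(prefetch_hits, demand_misses)  # 9 1 -- only the very first block missed
```

For a sequential access pattern, every block after the first is already waiting in the cache when it is requested.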

Processor cache levels

Modern processors are equipped with a cache, which often consists of 2 or 3 levels. Of course, there are exceptions, but this is often the case.

In general, there can be such levels: L1 (first level), L2 (second level), L3 (third level). Now a little more about each of them:

The first-level cache (L1) is the fastest cache level; it works directly with the processor core. Thanks to this tight coupling, it has the shortest access time and operates at frequencies close to the processor's. It acts as a buffer between the processor and the second-level cache.

As an example, consider the sizes on a high-performance Intel Core i7-3770K. This processor has 4 x 32 KB of L1 cache, i.e. 128 KB in total (32 KB per core).

The second-level cache (L2) is larger than the first but, as a result, slower. It serves as a buffer between the L1 and L3 levels. Returning to our Core i7-3770K example, the L2 cache here is 4 x 256 KB = 1 MB.

The third-level cache (L3) is, again, slower than the previous two, but still much faster than RAM. The L3 cache in the i7-3770K is 8 MB. While the two previous levels are divided among the cores, this level is shared by the whole processor. The figure is quite solid but not record-breaking: in Extreme-series processors such as the i7-3960X it is 15 MB, and some newer Xeon processors have more than 20 MB.
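The per-core figures quoted above for the i7-3770K can be totalled up as a small worked example (the per-core numbers are the ones given in the text; the arithmetic just makes the totals explicit):

```python
# Totalling the i7-3770K cache hierarchy from the per-core figures above.
cores = 4
l1_per_core_kb = 32          # per-core L1, as quoted in the text
l2_per_core_kb = 256         # per-core L2
l3_shared_kb   = 8 * 1024    # single shared L3, 8 MB

total_l1_kb = cores * l1_per_core_kb
total_l2_kb = cores * l2_per_core_kb

print(total_l1_kb, total_l2_kb, l3_shared_kb)  # 128 1024 8192
```

So the hierarchy grows by roughly an order of magnitude per level: 128 KB of L1, 1 MB of L2, 8 MB of shared L3.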

we-it.net

What is the cache for and how much is needed?

This is not about cash but about cache memory, in processors and elsewhere. Marketers have turned cache size into another commercial fetish, especially for central processors and hard drives (video cards have cache too, but it hasn't become a selling point yet). So there is processor XXX with a 1 MB L2 cache, and an otherwise identical processor XYZ with a 2 MB cache. Guess which one is better? Ah, don't answer right away!

Cache memory is a buffer into which things that can or must be set aside for later are placed. The processor does its work, and situations arise when intermediate data has to be stored somewhere. In the cache, of course: it is orders of magnitude faster than RAM, since it sits on the processor die itself and usually runs at the same frequency. Some time later the processor fetches that data back and processes it again.

Roughly speaking, it is like a potato sorter at a conveyor who, whenever something other than a potato (a carrot, say) comes along, throws it into a box. When the box is full, he gets up and carries it to the next room; the conveyor stops, and idle time results. The volume of the box is the cache in this analogy. How much is needed: 1 MB or 12? Clearly, if the volume is too small, too much time goes into carrying things away; but past a certain point, a further increase gives nothing. A box for 1000 kg of carrots will not fill up over a whole shift, and it will NOT make the sorter twice as fast.

There are further subtleties. First, a large cache can increase access latency. Second, the probability of errors in it rises, for example during overclocking. (How to test processor stability in this case, and confirm that an error occurs precisely in the L1 or L2 cache, you can read here.) Third, the cache consumes a decent share of the die area and of the transistor budget of the processor circuit.

The same applies to hard drive cache memory. If the processor architecture is strong, many applications will call for 1024 KB of cache or more. If you have a fast HDD, 16 MB or even 32 MB is appropriate.
But no 64 MB of cache will make a drive faster if it is a cut-down "green" version (Green WD) spinning at 5900 rpm instead of the needed 7200, even when the full-speed drive has only 8 MB. Intel and AMD processors also use their caches differently (AMD generally uses it more efficiently, and their processors are often comfortable with smaller sizes). In addition, Intel's cache is shared, while AMD's is private to each core. The fastest cache, L1, is 64 KB for data and 64 KB for instructions in AMD processors, twice as much as Intel's. L3 cache is usually present in top processors such as the AMD Phenom II X6 1055T (Socket AM3, 2.8 GHz) or its competitor, the Intel Core i7-980X.

Above all, games love large caches. Many professional applications, by contrast, do not care about it (see: Computer for rendering, video editing and professional applications). More precisely, the most demanding of them are largely indifferent to it. What you definitely should not do is choose a processor by cache size. The old Pentium 4, in its final incarnations, had as much as 2 MB of cache at clock speeds well beyond 3 GHz; compare its performance with a cheap dual-core Celeron E1*** running at about 2 GHz, and the old-timer does not stand a chance. A more recent example: the high-frequency dual-core E8600, costing almost $200 (apparently because of its 6 MB cache), against the 2.6 GHz Athlon II X4-620 with only 2 MB. That does not stop the Athlon from demolishing its competitor.

As the graphs show, neither in complex programs nor in processor-hungry games will any amount of cache replace additional cores. An Athlon with a 2 MB cache (red) easily outperforms a Core 2 Duo with a 6 MB cache, even at a lower frequency and nearly half the price. Many also forget that video cards have caches too, since, generally speaking, they contain processors of their own. A recent example is the GTX460, where the makers cut not only the bus and memory size (which the buyer will notice) but also the shader cache, from 512 KB to 384 KB (which the buyer will NOT notice). This too takes its toll on performance.

It is also interesting to see how performance depends on cache size. Let us examine how fast it grows with cache size using the same processor design as an example. As is known, the E6***, E4*** and E2*** series differ only in cache size (4, 2 and 1 MB respectively). Running at the same 2400 MHz, they show the following results.

As you can see, the results do not differ much. I will say more: if a 6 MB processor were included, the result would rise only a little further, because processors reach saturation. For models with 512 KB, though, the drop would be noticeable. In other words, even in games 2 MB is enough. Summing up, we can conclude that cache is good when everything else is ALREADY plentiful. It is naive to trade hard drive speed or processor core count for cache size at the same price, since even the most capacious sorting box cannot replace a second sorter. But there are good examples: the old E2160 series and the like had 1 MB of cache for two cores, while the later 45nm revision, the E5200 series, already has 2 MB, all else (most importantly, the PRICE) being equal. Of course, the latter is worth choosing.
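The saturation effect above can be probed with a classic micro-benchmark: random pointer-chasing through working sets of different sizes. On real hardware a C version of this shows a sharp step in access time once the working set outgrows each cache level; in Python the effect is heavily muted by interpreter overhead, so treat this as a sketch of the method rather than a precise measurement. All sizes and step counts are arbitrary choices for the illustration.

```python
# Micro-benchmark sketch: time a chain of dependent loads over working sets
# of different sizes. In compiled code, times jump once the set no longer
# fits in L1, then L2, then L3; in Python the signal is much weaker.
import array
import random
import time

def chase(size_kb, steps=200_000):
    """Time `steps` dependent loads over a size_kb working set."""
    n = size_kb * 1024 // 8            # number of 8-byte elements
    order = list(range(n))
    random.shuffle(order)              # random permutation = unpredictable walk
    a = array.array('q', order)
    i = 0
    t0 = time.perf_counter()
    for _ in range(steps):
        i = a[i]                       # each load depends on the previous one
    return time.perf_counter() - t0

for kb in (16, 256, 4096):             # roughly L1-, L2- and L3-sized sets
    print(kb, "KB:", round(chase(kb), 4), "s")
```

The dependent loads (`i = a[i]`) matter: they prevent the hardware from overlapping memory accesses, so the timing reflects latency rather than bandwidth.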

compua.com.ua

What is a cache, why is it needed and how does it work

What is the dirtiest place on a computer? The recycle bin, you think? The user folders? The cooling system? Wrong! The dirtiest place is the cache: after all, it constantly has to be cleaned!

In fact, there are many caches in a computer, and they serve not as garbage dumps but as accelerators for hardware and applications. So where does their reputation as the system's "garbage chute" come from? Let's look at what a cache is, what kinds there are, how it works, and why it needs cleaning from time to time.

A cache, or cache memory, is a special store of frequently used data that can be accessed tens, hundreds or thousands of times faster than RAM or other storage media.

Applications (web browsers, audio and video players, database editors, etc.), operating system components (the thumbnail cache, the DNS cache) and hardware (the CPU's L1-L3 caches, a GPU's framebuffer chip, drive buffers) all have their own cache memory. It is implemented in different ways: in software and in hardware.

  • A program cache is just a separate folder or file where, for example, pictures, menus, scripts, multimedia content and other content of visited sites are downloaded. This is the folder the browser dives into first when you open a web page again. Pulling a piece of content from local storage speeds up its loading and reduces network traffic.

  • In hard drives, in particular, the cache is a separate RAM chip with a capacity of 1-256 MB, located on the electronics board. It receives information read from the magnetic layer but not yet loaded into RAM, as well as data that the operating system requests most often.

  • A modern central processor contains 2-3 main levels of cache memory, implemented as hardware modules on the same die. The fastest and smallest (32-64 KB) is the Level 1 (L1) cache: it runs at the same frequency as the processor. L2 occupies the middle position in speed and capacity (from 128 KB to 12 MB). L3 is the slowest and largest (up to 40 MB) and is absent on some models. L3's speed is low only relative to its faster siblings; it is still hundreds of times faster than the fastest RAM.
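The first bullet's browser-style cache can be sketched in a few lines: look in a local folder first, and only "download" on a miss. Everything here is a stand-in (`fetch_remote` fakes the network call, and the cache directory is a temp folder invented for the sketch), not any real browser's cache format.

```python
# Minimal file cache in the spirit of a browser cache: check the local
# folder first, "download" only on a miss. fetch_remote() is a stand-in
# for a real network request.
import hashlib
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp(prefix="webcache_")

def fetch_remote(url):
    return f"<html>content of {url}</html>"   # pretend network request

def get(url):
    name = hashlib.sha256(url.encode()).hexdigest()  # safe file name
    path = os.path.join(CACHE_DIR, name)
    if os.path.exists(path):                   # cache hit: no network needed
        with open(path) as f:
            return f.read(), "hit"
    data = fetch_remote(url)                   # cache miss: go to the network
    with open(path, "w") as f:
        f.write(data)                          # keep a copy for next time
    return data, "miss"

print(get("http://example.com")[1])  # miss
print(get("http://example.com")[1])  # hit
```

The second request for the same URL never touches the "network": exactly the behavior the bullet describes.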

The scratchpad memory of the processor is used to store constantly used data, pumped from RAM, and machine code instructions. The larger it is, the faster the processor.

Today, three levels of caching are no longer the limit. With the Sandy Bridge architecture, Intel added an extra L0 cache (for storing decoded micro-operations) to its products. And the highest-performance CPUs also have a fourth-level cache, implemented as a separate chip.

Schematically, the interaction of cache L0-L3 levels looks like this (for example, Intel Xeon):

Human language about how it all works

To understand how cache memory works, imagine a person working at a desk. The folders and documents that he uses all the time are on the table (in the cache). To access them, just reach out your hand.

Papers that he needs less often are stored nearby on the shelves (in RAM). To get them, you need to get up and walk a few meters. And what a person does not currently work with has been archived (recorded on a hard disk).

The wider the table, the more documents will fit on it, which means that the employee will be able to get quick access to more information (the larger the cache capacity, the faster the program or device works in theory).

Sometimes he makes mistakes - he keeps papers on the table that contain incorrect information and uses them in his work. As a result, the quality of his work is reduced (errors in the cache lead to failures in the operation of programs and equipment). To correct the situation, the employee must discard the documents with errors and put the correct ones in their place (clear the cache memory).

The table has a limited area (cache memory has a limited capacity). Sometimes it can be expanded, for example, by moving a second table, and sometimes it cannot (the cache size can be increased if such an opportunity is provided by the program; the hardware cache cannot be changed, since it is implemented in hardware).

Another way to speed up access to more documents than the table can hold is to find an assistant who will serve paper to the worker from the shelf (the operating system can allocate some of the unused RAM to cache device data). But it's still slower than taking them off the table.

Documents at hand should be relevant for current tasks. This is the responsibility of the employee himself. You need to put things in order in papers regularly (displacing irrelevant data from the cache memory falls "on the shoulders" of applications that use it; some programs have the function of automatically clearing the cache).

If an employee forgets to maintain order in the workplace and keep documentation up to date, he can draw a table cleaning schedule for himself and use it as a reminder. As a last resort, entrust this to an assistant (if an application dependent on cache memory has become slower or often loads outdated data, use scheduled cache cleaning tools or do this manually every few days).
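The "clear the cache when it goes stale" routine above has a direct software analogue. Python's standard-library `functools.lru_cache` is a real API that memoizes results and exposes `cache_clear()` for exactly this kind of scheduled cleaning; the `load_report` function itself is a made-up placeholder for some expensive fetch.

```python
# Stale-data cleanup with a real stdlib cache. load_report is a stand-in
# for an expensive lookup whose results can go out of date.
from functools import lru_cache

@lru_cache(maxsize=128)
def load_report(day):
    return f"report for {day}"   # pretend this is slow and may go stale

load_report("monday")            # computed and cached
load_report("monday")            # served from the cache
print(load_report.cache_info().hits)  # 1

load_report.cache_clear()        # the "scheduled cleaning" step
print(load_report.cache_info().hits)  # 0 -- the cache starts fresh
```

After `cache_clear()`, the next call recomputes from scratch, just as the employee re-fetches a fresh document after tidying the desk.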

We actually encounter "caching" everywhere: buying groceries in advance, doing various things in passing, and so on. Everything that spares us unnecessary fuss and extra motion streamlines life and eases work. The computer does the same. In short, without a cache it would run hundreds or thousands of times slower, and we wouldn't like that.

f1comp.ru

Cache memory. What is cache memory for? The impact of cache size and speed on performance.

Cache memory (a cache, or buffer) is used in digital devices as high-speed intermediate storage. Cache memory can be found in computer devices such as hard drives, processors, video cards, network cards, CD drives and many others.

The principle of operation and architecture of the cache can be very different.

For example, the cache can serve as a simple intermediate buffer. The device processes data and transfers it to the high-speed buffer, from which the controller passes it to the interface. Such a cache prevents errors, lets the hardware check data integrity, or encodes the device's signal into one the interface understands, all without delays. A scheme like this is used, for example, in CD/DVD drives.

In another case, the cache is used to store frequently used code and data, which speeds up processing: the device does not need to recompute or look up data when reading it from the cache is much faster. Here the size and speed of the cache play a very important role.
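The "store results so they need not be recomputed" idea is exactly memoization. A sketch with a plain lookup table, counting how many real computations happen (the counter and function names are invented for the illustration):

```python
# Memoization: a plain dict acts as the "cache" of already-computed results.
calls = 0

def fib(n, memo={}):
    global calls
    if n in memo:
        return memo[n]       # found in the "cache": no recomputation
    calls += 1               # count actual computations
    memo[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return memo[n]

print(fib(30), calls)  # 832040 31
```

With the table, `fib(30)` does only 31 real computations (one per value of n from 0 to 30); the naive recursion would make millions of calls.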


This architecture is most commonly found on hard drives, SSDs, and central processing units (CPUs).

When devices start up, special firmware or dispatcher programs can be loaded into the cache; running them from ROM (Read-Only Memory) would be slower.

Most modern devices use a mixed type of cache, which serves both as an intermediate buffer and as storage for frequently used code.

Processor and video-chip caches implement several very important functions.

Sharing between execution units. CPUs and video processors often use a fast cache shared between cores. If one core has processed some information and it sits in the cache, and a command arrives for the same operation or for the same data, the data is not processed again but taken from the cache for further use. The core is freed to process other data. This significantly increases performance in repetitive but complex calculations, especially when the cache is large and fast.

The shared cache also allows cores to work with it directly, bypassing slow RAM.

Instruction cache. Processors have either a general, very fast first-level cache for instructions and other operations, or a dedicated instruction cache. The more built-in instructions a processor has, the larger the instruction cache it needs. It reduces memory latency and lets the instruction unit operate almost independently. When the cache runs out, the instruction unit begins to idle periodically, which slows down computation.

Other functions and features.

It is noteworthy that CPUs use hardware error correction (ECC) in the cache, because even a small error there can turn into a persistent error in all further processing of that data.

CPUs and GPUs have a cache hierarchy that separates data belonging to individual cores from shared data. Almost all data from the second-level caches is also copied to the shared third level, though not always. The first level is the fastest, and each subsequent level is slower but larger.

Three cache levels or fewer is considered normal for processors; this strikes a balance between speed, cache size and heat dissipation. In video processors it is hard to find more than two cache levels.

Cache size, performance impact and other characteristics.

Naturally, the larger the cache, the more data it can store and process, but there is a serious problem here.

A big cache means a big transistor budget. In server CPUs, the cache can consume up to 80% of the transistor budget. This, first, affects the final cost, and second, increases power consumption and heat dissipation, out of all proportion to a performance gain of a few percent.

What is the dirtiest place on the computer? Think basket? User folders? Cooling system? Didn't guess! The dirtiest place is the cache! After all, it constantly has to be cleaned!

In fact, there are many caches on a computer, and they serve not as a waste dump, but as accelerators for equipment and applications. Where does their reputation as a "systemic garbage chute" come from? Let's see what a cache is, how it happens, how it works and why from time to time.

The concept and types of cache memory

Esh or cache memory is a special storage of frequently used data, which is accessed tens, hundreds and thousands of times faster than RAM or other storage media.

Applications (web browsers, audio and video players, database editors, etc.), operating system components (thumbnail cache, DNS cache) and hardware (CPU L1-L3 cache, GPU framebuffer, etc.) have their own cache memory. chip, drive buffers). It is implemented in different ways - software and hardware.

  • The program cache is just a separate folder or file where, for example, pictures, menus, scripts, multimedia content and other content of visited sites are downloaded. This is the folder where the browser first dives when you open a web page again. Swapping a piece of content from local storage speeds up its loading and .

  • In hard drives, in particular, the cache is a separate RAM chip with a capacity of 1-256 Mb, located on the electronics board. It receives information read from the magnetic layer and not yet loaded into RAM, as well as data that the operating system most often requests.

  • A modern central processor contains 2-3 main levels of cache memory (it is also called scratch memory), located in the form of hardware modules on the same chip. The fastest and smallest in volume (32-64 Kb) is cache Level 1 (L1) - it runs at the same frequency as the processor. L2 is in the middle position in terms of speed and capacity (from 128 Kb to 12 Mb). And L3 is the slowest and most voluminous (up to 40 Mb), it is absent on some models. The speed of L3 is only low relative to its faster counterparts, but it is also hundreds of times faster than the most productive RAM.

The processor's scratchpad memory stores constantly used data pulled from RAM as well as machine-code instructions. The larger it is, the faster the processor.

Today, three levels of caching are no longer the limit. With the Sandy Bridge architecture, Intel introduced an additional L0 cache (intended for storing decoded micro-operations) in its products. And the highest-performance CPUs also have a fourth-level cache, implemented as a separate chip.

Schematically, the interaction of cache L0-L3 levels looks like this (for example, Intel Xeon):

Human language about how it all works

To understand how cache memory works, imagine a person working at a desk. The folders and documents he uses all the time lie right on the desk (in the cache). To reach them, he only needs to stretch out a hand.

The papers he needs less often are stored nearby on shelves (in RAM). To get them, he has to stand up and walk a few meters. And what he is not currently working with has been sent to the archive (written to the hard disk).

The wider the desk, the more documents fit on it, meaning the employee gets quick access to more information (the larger the cache capacity, the faster a program or device works, in theory).

Sometimes he makes mistakes - he keeps papers with incorrect information on the desk and uses them in his work, so the quality of his work suffers (cache errors lead to software and hardware failures). To fix this, he must throw away the erroneous documents and put correct ones in their place (clear the cache memory).

The desk has a limited area (cache memory is limited in size). Sometimes it can be expanded, for example by pushing up a second desk, and sometimes it cannot (the cache size can be increased if the program provides such an option; a hardware cache cannot be changed, since it is fixed in silicon).

Another way to speed up access to more documents than the desk can hold is to find an assistant who will fetch papers from the shelves (the operating system can allocate part of the unused RAM for caching device data). But this is still slower than taking them straight off the desk.

The documents at hand should be relevant to the current tasks, and that is the employee's own responsibility: he needs to tidy up the papers regularly (evicting stale data from the cache falls "on the shoulders" of the applications that use it; some programs have an automatic cache-clearing function).

If the employee forgets to keep his workplace in order and the documentation up to date, he can draw up a desk-cleaning schedule and use it as a reminder, or, as a last resort, entrust the job to an assistant (if an application that depends on its cache has become slower or often loads outdated data, use scheduled cache-cleaning tools or clear the cache manually every few days).

In fact, we run into "caching" everywhere: buying groceries for the week ahead, doing various things in passing or along the way, and so on - in essence, anything that spares us needless fuss and wasted effort, streamlines life, and makes work easier. The computer does the same. In short, if there were no cache, it would run hundreds or thousands of times slower. And we wouldn't like that.

What is a cache, why is it needed and how does it work - updated February 25, 2017, by Johnny Mnemonic

All users are well aware of such computer components as the processor, which processes data, and random access memory (RAM), which stores it. But probably not everyone knows that there is also a processor cache (CPU cache) - the processor's own working memory (so-called super-fast memory).

What is the reason that prompted computer developers to use special memory for the processor? Isn't RAM enough for a computer?

Indeed, for a long time personal computers did without any cache memory at all. But, as you know, the processor is the fastest device in a personal computer, and its speed has grown with every new CPU generation; today it is measured in billions of operations per second. Standard RAM, by contrast, has gained far less performance over the course of its evolution.

Generally speaking, there are two main technologies for memory chips - static memory and dynamic memory. Without delving into the details of their design, we will only say that static memory, unlike dynamic memory, does not require refresh cycles; in addition, static memory uses 4-8 transistors per bit of information, while dynamic memory uses 1-2. Accordingly, dynamic memory is much cheaper than static memory, but also much slower. Currently, RAM chips are built on dynamic memory.

Approximate evolution of the ratio of the speed of processors and RAM:

Thus, if the processor always fetched information directly from main memory, it would have to wait for the slow dynamic memory and would sit idle much of the time. If, on the other hand, static memory were used as RAM, the cost of the computer would increase several times over.

That is why a reasonable compromise was found. The bulk of the RAM remained dynamic, while the processor got its own fast cache built on static memory chips. Its capacity is relatively small - the L2 cache, for example, holds only a few megabytes. It is worth remembering, though, that the entire RAM of the first IBM PC computers was less than 1 MB.

The case for caching is also strengthened by the fact that different applications residing in RAM load the processor unevenly; as a result, some data calls for priority processing compared to the rest.

History of the cache

Strictly speaking, before the cache memory moved to personal computers, it had been successfully used in supercomputers for several decades.

Cache memory first appeared in PCs with the i80386 processor, and held only 16 KB. Today's processors use several cache levels, from the first (the fastest and smallest - usually up to 128 KB) to the third (the slowest and largest - up to tens of MB).

At first, the processor's cache was external, located on a separate chip. Over time, however, the bus between the cache and the processor became a bottleneck that slowed data exchange. In modern microprocessors, both the first- and second-level caches sit in the processor core itself.

For a long time processors had only two cache levels; a third-level cache, shared by all processor cores, first appeared in the Intel Itanium CPU. Processor designs with a fourth cache level also exist.

Architectures and principles of cache operation

To date, two main ways of organizing cache memory are known, both stemming from the first theoretical work in cybernetics - the Princeton and Harvard architectures. The Princeton architecture implies a single memory space for data and instructions, while the Harvard architecture keeps them separate. Most x86 personal-computer processors use the separate (Harvard-style) cache organization. In addition, modern processors have gained a third kind of cache - the translation lookaside buffer (TLB), designed to speed up the conversion of the operating system's virtual memory addresses into physical addresses.

Simplified, the interaction between the cache memory and the processor can be described as follows. First, the processor checks for the information it needs in the fastest cache, Level 1, then in the Level 2 cache, and so on. If the necessary information is not found at any cache level, this is called a cache miss, and the processor has to fetch the data from RAM or even from external memory (the hard disk).
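
The search order can be sketched as a toy model. The dictionaries stand in for cache levels, and the cycle counts are invented round numbers for illustration only, not real hardware figures.

```python
# Toy model of the lookup order: L1 first, then L2, then L3, then RAM.
# Contents and latencies are invented for illustration.
L1 = {"a": 1}
L2 = {"a": 1, "b": 2}
L3 = {"a": 1, "b": 2, "c": 3}
RAM = {"a": 1, "b": 2, "c": 3, "d": 4}

def load(addr):
    # Check each cache level in order of increasing latency
    for name, level, cost in (("L1", L1, 4), ("L2", L2, 12), ("L3", L3, 40)):
        if addr in level:
            return level[addr], f"{name} hit (~{cost} cycles)"
    # Missed at every cache level: fall back to main memory
    return RAM[addr], "cache miss, fetched from RAM (~200 cycles)"

print(load("a"))  # found immediately in L1
print(load("d"))  # missed in L1-L3, served from slow RAM
```

The model makes the key point visible: the cost of a request is determined by the first level that actually holds the data.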

The order in which the processor searches for information in memory:


A special controller manages the cache memory and its interaction with the processor's execution units and with RAM.

Scheme of organizing the interaction of the processor core, cache and RAM:

The cache controller is the key link between the processor, RAM and cache.

It should be noted that data caching is a complex process that relies on many technologies and mathematical algorithms. Among the basic concepts of caching are the cache write methods and the cache associativity architecture.

Cache Write Methods

There are two main methods for writing information to the cache:

  1. The write-back method - data is written to the cache first and then, when certain conditions are met, to RAM.
  2. The write-through method - data is written to RAM and to the cache simultaneously.
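
The two policies can be contrasted in a small sketch. This is an illustrative model, not how a hardware controller is actually built; the class and method names are invented for the example.

```python
class Cache:
    """Toy model contrasting the write-through and write-back policies."""
    def __init__(self, policy):
        self.policy = policy   # "write-through" or "write-back"
        self.lines = {}        # cached copies of data
        self.dirty = set()     # lines modified in the cache but not yet in RAM
        self.ram = {}          # stand-in for main memory

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self.ram[addr] = value   # RAM is updated at the same time
        else:
            self.dirty.add(addr)     # RAM is updated later, on flush/eviction

    def flush(self):
        for addr in self.dirty:      # write-back: push dirty lines out to RAM
            self.ram[addr] = self.lines[addr]
        self.dirty.clear()

wb = Cache("write-back")
wb.write(0x10, 42)
assert 0x10 not in wb.ram  # RAM is still stale until the flush
wb.flush()
assert wb.ram[0x10] == 42
```

The trade-off is visible in the model: write-back batches slow RAM writes but leaves a window where RAM is stale, while write-through keeps RAM consistent at the cost of a memory write on every store.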

Cache Associativity Architecture

The cache associativity architecture defines how data from RAM is mapped into the cache. The main variants are:

  1. Direct-mapped cache - each region of RAM is cached in one specific place in the cache
  2. Fully associative cache - any region of RAM can be cached in any place in the cache
  3. Set-associative cache - a compromise between the two: each region of RAM can be cached in one of a small set of places

Different associativity architectures are typically used at different cache levels. Direct-mapped caching offers the fastest lookup, so this architecture is typically used for large caches. A fully associative cache, in turn, suffers fewer cache misses.
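
The direct-mapped variant (item 1 above) reduces to simple arithmetic: the address alone determines the one place where a block can live. The block size and set count below are deliberately tiny, invented values for illustration.

```python
BLOCK_SIZE = 64   # bytes per cache line (a common real-world value)
NUM_SETS = 8      # deliberately tiny, for illustration

def direct_mapped_index(address):
    """In a direct-mapped cache, each memory block has exactly one possible slot."""
    block_number = address // BLOCK_SIZE   # which memory block the address is in
    return block_number % NUM_SETS         # the single cache set it maps to

# Addresses 0 and 512 land in the same set (512 = 8 sets * 64 bytes),
# so in a direct-mapped cache they would keep evicting each other:
print(direct_mapped_index(0), direct_mapped_index(512))  # 0 0
print(direct_mapped_index(64))                           # 1
```

This collision between addresses 512 bytes apart is exactly why fully associative and set-associative designs exist: they give a conflicting block somewhere else to go, at the cost of a more expensive lookup.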

Conclusion

In this article you got acquainted with the concept of cache memory, its architecture, and caching methods, and learned how it affects the performance of a modern computer. Cache memory significantly improves processor utilization, reduces idle time, and consequently increases the performance of the entire system.