Memory: How Does a Computer Remember Stuff?

Computers depend on time to perform their logic operations.

There are many reasons why we might need to delay logic across time:

  • With all the electricity flowing through, running all the logic gates at once will overheat everything.
  • We might only want the computer to calculate things when we hit a button.
  • The computer might need to measure or perform something at an interval, such as every second.

Combinational logic treats logic as an instant thing, but sequential logic works on a moment-by-moment basis: Logic Calculation A creates Result A, which is then ready when Logic Calculation B starts.

Technically, electricity doesn’t just hop from one area to another in neat, sequential steps; it flows a bit like water as the electrons migrate. To compensate for that, engineers figured out how much time it takes for electricity to move from one place to another, put stopping points in the middle (typically with latches or flip-flops), then divided time into slightly larger, separate units (“discrete time”, rather than “continuous time”) to track it. These units are called “clock cycles”.

Each Cycle

To keep track of the clock signal coming from the motherboard, engineers use a data flip-flop (DFF) at each stopping point between cycles, which holds either a 0 or a 1. At its core, it’s a couple of NAND gates latched together (plus a few more gates so it only responds on the clock tick).
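As a rough mental model (a behavioral sketch in Python, not how the silicon is actually wired), here’s what a DFF does: it ignores its input between ticks and only latches a new value when the clock ticks.

    class DFF:
        """Behavioral model of a data flip-flop: output only changes on a clock tick."""

        def __init__(self):
            self.state = 0  # the bit currently being remembered

        def tick(self, d):
            """On each clock cycle, output the stored bit and latch the new input."""
            out = self.state       # the value visible during this cycle
            self.state = d & 1     # capture the input for the next cycle
            return out

    # The input can wiggle all it likes between ticks; only the value present
    # at the tick gets stored.
    dff = DFF()
    print(dff.tick(1))  # 0 (still holding the initial value)
    print(dff.tick(0))  # 1 (now outputs what was latched last tick)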

This happens fast. A 3.6 GHz processor, for example, processes about 3,600,000,000 cycles a second. To put that in perspective, our brains can’t react faster than about 8 times a second, and the displays for most high-end games run at about 120 frames a second.

Normally, the entire endeavor can look a bit like stop-and-go traffic (“synchronous logic”), but engineers who manage asynchronous logic can make everything move even faster by getting rid of the gaps where a DFF sits idle waiting for the next clock cycle. This is a crazy-complicated engineering and design challenge.

Across Cycles

Now that we have the DFF, we sometimes want to hold its information across as many cycles as we want. To achieve this, we need something that keeps the old information while still letting new information come in when we say so: a “register”.

There’s a straightforward implementation to make it work:

  1. Attach a multiplexer before a DFF.
  2. Feed the DFF output to both the output and that multiplexer’s input.
  3. Use a “load” to select whether we want the old info (if it’s “0”) or the new input (if it’s “1”, aka “assert”).

Or, to put it more simply, imagine two light switches. Switch 1 controls the light, but it only switches the light on or off while Switch 2 is on. You can flip Switch 1 back and forth all day with no difference until you flip Switch 2 on. Flip Switch 2 off again, and whatever you do with Switch 1 doesn’t matter.
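Here’s a rough sketch of that mux-plus-DFF register in Python, modeling the behavior rather than the gates; the load bit plays the role of Switch 2 in the analogy above.

    class OneBitRegister:
        """A multiplexer in front of a DFF: keeps its old bit unless load is asserted."""

        def __init__(self):
            self.state = 0  # the DFF's stored bit

        def tick(self, new_input, load):
            # The multiplexer: load == 1 selects the new input,
            # load == 0 feeds the DFF's own output back into itself.
            if load:
                self.state = new_input & 1
            return self.state

    reg = OneBitRegister()
    reg.tick(1, load=0)  # still 0: Switch 2 is off, so flipping Switch 1 does nothing
    reg.tick(1, load=1)  # now 1: load asserted, the new bit is captured
    reg.tick(0, load=0)  # stays 1 across as many cycles as we like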

Remembering Stuff

We can put registers together in parallel to make any size we want (represented by “w” for word width). Naturally, like everything else in computers, it’ll be a power-of-two setup (2-bit, 4-bit, 8-bit, and so on).

To make a RAM unit, we’ll set up those registers in parallel, then make an “address” that locates which register to look at, represented by “k”.

We’ll then take the load selector, but run it through a gigantic demultiplexer, with the load as the input and the address as the selector.

In this way, we can assert any one of many registers. Even with a million registers, the logic is only operating on 1 register at a time. We don’t notice because it happens millions of times a second.

In other words, to reference a specific 0 or 1 in memory, it goes through the following:

  1. Send the input to all the registers at once, but leave the “load” on every register (except our target register) at 0, so nothing changes. (e.g., registers 1–8 all receive “00000001”)
  2. Run the load through a demultiplexer, with the desired address as the selector. (e.g., the address selects register 2, so only register 2’s load line gets the signal)
  3. If the load is 0, the register just keeps (and outputs) what it already had.
  4. If the load is 1, the register copies in the new information.
  5. Run the same address through a gigantic multiplexer to read the memory. (e.g., the address selects register 2, so the multiplexer pulls register 2’s output)
  6. Repeat a bazillion times.

To compare, the entire operation feels like a forklift driver accessing a gigantic warehouse full of filing cabinets: Row D, Pallet 14, Cabinet 3, second drawer. Row D, Pallet 14, Cabinet 3, third drawer, and so on.
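Here’s a toy version of that whole arrangement in Python, just to show the behavior (real hardware does this with a demultiplexer and multiplexer built from gates, not loops and lists):

    class TinyRAM:
        """A toy RAM: 'size' registers, each 'width' bits wide."""

        def __init__(self, size=8, width=8):
            self.registers = [0] * size   # the registers sitting in parallel
            self.width = width

        def tick(self, address, value, load):
            # Demultiplexer: only the addressed register effectively sees load == 1,
            # so every other register keeps its old contents.
            if load:
                self.registers[address] = value & ((1 << self.width) - 1)
            # Multiplexer: the same address picks which register's output we read.
            return self.registers[address]

    ram = TinyRAM()
    ram.tick(address=2, value=0b00000001, load=1)  # assert: write into register 2
    ram.tick(address=2, value=0b11111111, load=0)  # load is 0, so this only reads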

After about 64 milliseconds, the electrical charge that stores each bit starts to decay. For that reason, the memory system performs a refresh operation 8,192 times every 64 milliseconds (ms), which works out to one refresh every 7,812.5 nanoseconds (ns), or about every 7.81 microseconds (μs); this interval is called tREFI. When the memory gets hot, the refresh command happens twice as often (every 3,906.25 ns). Anywhere from 0.4% to 5% of the running time is made up of refreshes, which block any other operations on the memory while they’re happening.
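Those refresh numbers all follow from each other; here’s the arithmetic, using the 64 ms window and 8,192 refreshes from above (the ~350 ns stall per refresh is an assumed, typical-looking figure just to show how the overhead percentage falls out):

    retention_ms = 64                 # data must be refreshed within this window
    refreshes_per_window = 8192       # refresh commands issued per window

    trefi_ns = retention_ms * 1_000_000 / refreshes_per_window
    print(trefi_ns)                   # 7812.5 ns between refreshes (tREFI)
    print(trefi_ns / 2)               # 3906.25 ns when the module runs hot (2x rate)

    stall_ns = 350                    # assumed time the memory is blocked per refresh
    print(stall_ns / trefi_ns * 100)  # ~4.5% of running time spent refreshing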

By using ECC (“error-correcting code”) memory, we can be much more confident the information is reliably stored and retrieved. Not all systems support ECC, but they really, really should.

  • There aren’t any standards for ECC, so it all lands on the hardware vendor or the programmer downstream.
  • Skipping ECC is theoretically a small performance gain, but individual bits (a 0 or 1) can sometimes flip, which can change anything from a barely different-colored pixel to a different text character in a file name.
  • A 0.0001% (one-in-a-million) chance of a bit flip turns into thousands of flipped bits once you’re working with billions of bits of data (and 1 gigabyte has 8 billion bits).
  • Without ECC, hackers can often exploit bit flips.

Clumping Together

As a quantity, memory is measured by a downright confusing convention:

  • 8 bits (i.e., eight 1-bit registers) make one byte.
  • 1,024 (2^10) bytes is a kibibyte (KiB).
  • BUT…1,000 bytes is a kilobyte (kB).
  • 1,024 kibibytes is a mebibyte (MiB).
  • BUT…1,000 kilobytes is a megabyte (MB).
  • The same split happens on the bit side: 1,000 bits is a kilobit (kb), 1,024 bits is a kibibit (Kib), and so on up through megabit (Mb) and mebibit (Mib).
  • The pattern carries onward for giga-, tera-, peta-, and so on.
  • Beyond peta-, we have exa-, zetta-, yotta-, and (since 2022) ronna- and quetta-.

Computer-minded people like 1024-based numbers, so they’ll use the “-bi-” versions. Watch the bit/byte distinction, too: data-transfer speeds are usually quoted in bits while files and drives are measured in bytes, so divide a bits-per-second figure by 8 (or multiply a byte count by 8) to work out the actual amount of data downloaded or uploaded.

Non-tech normal people use 1000-based numbers, and marketing people like it because it makes their numbers feel bigger (4 terabytes is just under 3.64 tebibytes).
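A couple of quick conversions make the 1000-vs-1024 and bit-vs-byte gaps concrete (the drive size and download speed here are just example numbers):

    # Marketing terabytes (10**12 bytes) vs. tebibytes (2**40 bytes)
    drive_bytes = 4 * 10**12
    print(drive_bytes / 2**40)   # ~3.64 TiB: same drive, smaller-looking number

    # Download speeds are quoted in bits; file sizes in bytes
    speed_megabits = 100
    print(speed_megabits / 8)    # 12.5 megabytes per second actually arriving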

However, the prefixes are always the same, and are very useful to get a sense of scale. They go in a predictable order:

  • Kilo- (a text file is rarely more than a few of these)
  • Mega- (a music file is usually a few of these)
  • Giga- (a game often takes this much, and was roughly the size of a whole computer’s hard drive c. 2002)
  • Tera- (the typical size of a c. 2020 computer’s hard drive)
  • Peta- (more than any consumer would ever need, often the amount needed for machine learning or running a large website)
  • Exa- (only giant corporations use this much c. early-2020s)
  • Zetta- (massive)
  • Yotta- (very, very massive and nobody talks about it)

Physical vs. Logical

Obviously, memory must be stored somewhere. That location defines what its “physical memory” is.

However, by using distributed systems, a computer can have more memory (by pretending multiple memory units are one) or less memory (by letting other computers use the leftover memory). The portion the computer has designated as its memory (or how it’s grouped across multiple media) is its “logical memory”.

This distinction between logical and physical memory, along with balancing CPU load and bandwidth, is essentially the basis of all cloud systems.

Memory Hierarchy

The faster the memory, the more expensive it is. Memory is often the most expensive part of a computer. But, if you want a fast computer, you’ll want plenty of memory for processing.

However, addressing a large memory set takes more time than a small memory set:

  • Memory addresses can become profoundly long, meaning more time to string along the 0s and 1s for it.
  • Parts of the memory have to physically sit farther away, just because of how big it is, meaning more wire for the signal to travel across.

Instead of paying top-dollar for all top-of-the-line memory, processors typically have a faster memory for constantly changing things (typically called the “cache”) and a slower memory for less frequent changes (called main memory or RAM), and even slower and larger memory on a disk somewhere.

This situation has evolved into what’s known as a “memory hierarchy”:

  • Modern general-purpose CPUs have multiple caches (on this computer it’s L1 at 32 kilobytes, L2 at 256 kilobytes, and L3 at 9 megabytes).
  • There’s typically gigabytes’ worth of RAM plugged into the motherboard.
  • Beyond the motherboard, there’s long-term storage that can be terabytes’ worth of information.
  • Plus, in a distributed system, the memory can theoretically sit all the way across the world and be as big as you want.

A CPU has to manage this memory hierarchy efficiently. To achieve this, the CPU has a “memory management unit” (MMU) that connects each virtual memory location with a physical memory location. This would mean, for example, that virtual address 00000985 corresponds to memory bank 1, register 986. The MMU keeps its most recently used translations in a “translation lookaside buffer” (TLB) as a type of cheat sheet for the CPU.
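Here’s a very loose sketch of that cheat-sheet idea in Python, with a made-up page size and page table purely for illustration (a real MMU does this in hardware, on fixed-size pages):

    PAGE_SIZE = 4096   # an assumed, common page size

    tlb = {}           # translation lookaside buffer: recent virtual-page -> frame mappings

    def translate(virtual_address, page_table):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in tlb:                 # TLB hit: the cheat sheet already knows
            frame = tlb[page]
        else:                           # TLB miss: consult the (slower) page table
            frame = page_table[page]
            tlb[page] = frame           # remember it for next time
        return frame * PAGE_SIZE + offset

    # e.g., pretend virtual page 0 lives in physical frame 3
    print(translate(985, page_table={0: 3}))   # 13273 (frame 3, offset 985)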

Long-Term Storage

There are several physical ways to store memory long-term.

RAM (random access memory) depends on electricity, meaning it’s “volatile”. But, it’s easily accessible and relatively cheap.

  • One of the older implementations of memory was drum memory, which involved a spinning cylinder with read-write heads along the outside of the drum. It was eventually displaced by magnetic-core memory (an early form of RAM), which in turn gave way to semiconductor RAM in the 1970s.

ROM (read only memory), on the other hand, doesn’t depend on electricity. But, you can’t write anything to it.

There are a few critical measurements of memory. They often represent statistical averages more than precise numbers, but they can be significant for planning (especially at scale):

  1. Its transfer speed. It’s quoted in the same units as network speeds, but should be orders of magnitude faster than a network’s (since the memory is supposed to be literally next to the CPU).
  2. Its projected lifespan, which is measured in “program-erase cycles” (or simply P/E).

There’s a frequent RAM/ROM hybrid called “electrically erasable programmable read-only memory” (EEPROM) that lets you read and write things to it, but the contents stick around after turning the computer off.

  • EEPROM is at the heart of a lot of memory:
    • Long-term storage in flash drives (aka “flash memory drives”) and MP3 players
    • Memory that holds the computer’s BIOS
    • Solid-state drives (SSDs)
  • EEPROM does have downsides:
    • Depending on what you need it for, it can be expensive.
    • The data will deteriorate after decades of sitting dormant.
  • More modern iterations of EEPROM/flash memory use what’s called NAND flash or NOR flash:
    • Single-level cell (SLC) stores 1 bit of information per cell. It’s the highest-performing and most durable at around 100,000 P/E cycles, but it’s also expensive and stores the least data per chip.
    • Multi-level cell (MLC) stores multiple bits per cell (typically 2). It’s denser than SLC and therefore can store more, but more sensitive to data errors and can only last about 10,000 P/E cycles.
    • Triple-level cell (TLC) stores 3 bits per cell. It’s even more data-dense, but much more susceptible to errors and only lasts about 3,000 P/E cycles.
    • Cramming a fourth bit into each cell (quad-level cell, QLC) wasn’t practical with flat, two-dimensional flash. 3D NAND is a newer approach that stacks the cells into a third dimension instead of laying them out as flat panels. It allows for more storage capacity without a huge price increase, and tends to give better endurance and lower power consumption.

There’s also EPROM (erasable programmable read-only memory), which is wiped by shining UV light through a window on the chip. It’s a pain to erase and reprogram, so it’s only useful for specific things that won’t ever get changed (like the firmware in an alarm clock or stopwatch).

Parameter RAM was a RAM on older Mac computers that required a constant battery backup.

Many computers use NVRAM (non-volatile random access memory) to hold key system settings that may need to change (such as most of the BIOS settings).

If you’re trying to store lots of information, you likely have only a few possible uses:

  • Desktop PC – mixed read/write, but only on for hours at a time (and not days or weeks).
  • Network Attached Storage (NAS) – mixed read/write, but not accessed for days, weeks, or months.
  • Surveillance – heavy amounts of writing data, but less than 1% of the data is ever read.
  • Cold storage – idle for the vast majority of the time and accessed so rarely that capacity and durability matter far more than speed.
  • Data center – constant read/write on a hefty network connection.

To store lots of information for a long time, there are several specific options:

  • Compact discs (CDs) are coated polycarbonate plastic with optical (i.e., laser-readable) information burned onto them, with a thin layer of aluminum to reflect the information. Some can be written once and then only read (CD-R), while others can be rewritten (CD-RW). Besides getting scratched, the ink from a permanent marker can also erode the reflective coating over the years.
  • Hard disk drives (HDDs) are cheap, even if they’re slower than SSDs, and often come in bigger capacities. Like their granddaddy floppy drives, they use magnetic signals to store information. Unlike floppies, HDDs swing their read/write heads on a pivoting actuator driven by a neodymium magnet (rather than the simple stepper motor inside a floppy drive).
    • HDDs and SSDs have historically used Serial Advanced Technology Attachment (SATA) cables, but some SSDs use Non-Volatile Memory express (NVMe) to connect straight to the motherboard for improved speed.
  • Tape drives are still popular for huge amounts of relatively unimportant data (such as soil measurements or trivial user data).

Regarding how computers view memory logically, any long-term medium is functionally the same as RAM, just much slower, often much bigger, and treated differently by the operating system. In fact, many operating systems set aside part of the hard drive as “swap space”, which creates “virtual memory” so the CPU can work as if it had more memory.

When fabricating memory, there are several constant realities of manufacturing that make it expensive when not done at scale:

  • The environment needs to be spotless, to the magnitude of 100-1000x cleaner than a surgeon’s operating room.
  • The memory is first fabricated on a flat surface based on a pre-made diagram, then layered together, frequently with something in the middle to protect the layers from each other.
  • The layers are woven together with a conductive wiring like gold thread or copper circuitry.
  • If it’s on a circuit board, the solder paste must be melted in a large reflow oven.
  • It needs a metric shed load of quality control to pass the high standards everyone expects from their media.

Safe Memory

Memory must be tested rigorously before it’s sold. The general approach is to test it for up to millions of program/read/erase/read cycles to ensure it’s reliable. Flash memory goes through less rigorous testing than hard drives.

Higher-quality memory has built-in ECC, with a parity bit for every 8 bits (i.e., a 64-bit register will have 72 bits total).
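A single parity bit is the simplest form of that idea; the sketch below shows how one extra bit per byte can at least detect a flipped bit (real ECC memory uses stronger codes that can also correct it):

    def parity_bit(byte):
        """Even parity: 1 if the byte has an odd number of 1s, so the total comes out even."""
        return bin(byte).count("1") % 2

    stored_byte = 0b01101001
    stored_parity = parity_bit(stored_byte)         # computed when the data is written

    corrupted = stored_byte ^ 0b00000100            # a single bit flips in storage

    # On read, the recomputed parity no longer matches: error detected.
    print(parity_bit(corrupted) != stored_parity)   # True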

Before a hard drive permanently fails, it’ll often report S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) errors. These warnings can be crucial for catching problems before the drive fails outright and data is permanently lost.

For long-term storage, it’s not always safe to only have one place to hold the memory:

  • If it’s your desktop computer, you won’t really need any redundancy besides cloud backup of your important user files.
  • Because of the sensitivity of the information, NAS and surveillance utterly require a “redundant array of independent disks” (RAID).
  • For cold storage, it’s safest to use at least 12–24 drives in a RAID array, possibly higher.
  • In a data center, the vast size of the project can mean hundreds or even thousands of drives.

RAID configurations are based on redundant information (“parity”) and spreading the information across multiple drives (“striping”), as sketched in the code after this list:

  • RAID 0 has striping, but no parity:
    • DISK1 – A1, A3, A5, A7
    • DISK2 – A2, A4, A6, A8
    • This is technically the most unsafe way to set up an array, but gives you the most space.
  • RAID 1 has redundancy (a full mirror of the data), but no striping:
    • DISK1 – A1, A2, A3, A4
    • DISK2 – A1, A2, A3, A4
    • Because every piece of data exists on both drives, you can usually swap a failed drive while the computer is still running.
  • RAID 5 has striping across multiple disks and parity to check if there are errors:
    • DISK1 – A1, B1, C1, Dp
    • DISK2 – A2, B2, Cp, D1
    • DISK3 – A3, Bp, C2, D2
    • DISK4 – Ap, B3, C3, D3
    • This is a tradeoff: it isn’t as fast or as roomy as RAID 0 (because of the parity blocks), nor quite as safe as RAID 1.
  • RAID 10 is a hybrid of RAID 1 and RAID 0:
    • RAID 0 striped across 2 mirrored pairs:
      • RAID 1 with 2 disks (DISK1 and DISK2)
      • RAID 1 with 2 disks (DISK3 and DISK4)
    • This solution is the best for large-scale endeavors because there’s always a copy of the data if any individual drive fails.
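Here’s the sketch promised above: a toy Python version of striping, mirroring, and XOR parity, just to show how the blocks get distributed and how a lost block can be rebuilt (real RAID controllers do this at the block-device level):

    def raid0(blocks, disks=2):
        """RAID 0: round-robin the blocks across the disks (striping, no redundancy)."""
        return [blocks[i::disks] for i in range(disks)]

    def raid1(blocks):
        """RAID 1: every disk holds a full copy (mirroring, no striping)."""
        return [list(blocks), list(blocks)]

    def raid5_parity(stripe):
        """RAID 5-style parity for one stripe: XOR of the data blocks."""
        parity = 0
        for block in stripe:
            parity ^= block
        return parity

    data = [0b1010, 0b0110, 0b1100]
    parity = raid5_parity(data)

    # Lose the middle block, then XOR the survivors (including parity) to rebuild it.
    recovered = data[0] ^ data[2] ^ parity
    print(recovered == data[1])   # True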

For read-only external media like CDs, the best way to store a backup is to make a .iso archive file. Most modern operating systems can pull up that file as a virtual drive.

If the drive is self-encrypting, it means there’s a processor attached directly to the storage unit that encrypts and decrypts the data as it’s read and written.

Faster Memory

Memory access is typically the slowest part of processing, so engineers are constantly working to make faster RAM.

The clock speed can be a bit misleading. The number itself is literal in one sense (DDR4-3200 performs 3,200 megatransfers, i.e., about 3,200,000,000 transfers, per second), but RAM timing/frequency is based on how fast information actually moves to or from the RAM, which means you need to cut that figure in half. It’s not too difficult to break down the math:

  • 3200 megatransfers/second
  • x 64 bits per transfer (the width of one 64-bit memory channel)
  • / 8 bits per byte
  • = 25,600 MB/s
  • / 2 for actual speed (because it’s read/write) = 12,800 MB/s

To speed it up even further, RAM started using multiple channels at once.

At first, they used a “ganged” setup, where two 64-bit buses were connected together to make one 128-bit bus. Once CPUs started having more than one core, unganged RAM became faster because it could pipe information to multiple cores at once. This entire setup is known as “dual channel”, but it can also be quad-, six-, eight-channel, or higher.

So, a dual-channel takes the above calculation and doubles it, a quad-channel quadruples it, and so on.
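Putting that arithmetic into one function (it simply follows the halving and channel-multiplying steps described above, using the DDR4-3200 example figures):

    def ram_bandwidth_mb_s(megatransfers, bus_bits=64, channels=1):
        """Rough throughput figure, following the calculation laid out above."""
        mb_per_s = megatransfers * bus_bits / 8   # transfers/s x bytes per transfer
        mb_per_s /= 2                             # the "divide by 2 for read/write" step
        return mb_per_s * channels

    print(ram_bandwidth_mb_s(3200))               # 12,800 MB/s on a single channel
    print(ram_bandwidth_mb_s(3200, channels=2))   # 25,600 MB/s with dual channel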

Further, RAM will often include extra ICs (integrated circuits) that take on side tasks to make it even faster.

Abstraction

Once the memory physically exists, a new hard drive gets divided into “sectors”, which are then grouped into blocks of memory. At this point, it’s a homogeneous abstraction: it really doesn’t matter what the underlying hardware is, as long as it’s reliable.

Of course, like all things in reality, memory degrades over time, so it will need periodic scanning to find bad sectors. From there, the operating system can earmark those sectors as bad, and hopefully no important data was lost.
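As a hand-wavy sketch of what that scan amounts to, assuming a hypothetical read_sector() helper that raises an error when a sector can’t be read:

    def find_bad_sectors(read_sector, total_sectors):
        """Try reading every sector and remember the ones that fail.
        read_sector is a stand-in for whatever the OS or drive firmware exposes."""
        bad = set()
        for sector in range(total_sectors):
            try:
                read_sector(sector)
            except IOError:
                bad.add(sector)   # earmark it so nothing gets stored there again
        return bad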