What do "multicore" and "multithreaded" mean?
November 18, 2017 5:17 AM   Subscribe

I just got a new computer and its CPU is described as "four core, eight thread." Can you give me a simple layperson's explanation of what that means? I don't know anything about computer architecture, and all of the Google results seem to assume a base knowledge that I don't have.
posted by Frobenius Twist to Computers & Internet (15 answers total) 5 users marked this as a favorite
A core is basically like a little independent CPU, so you have multiple CPUs working on different things at the same time.

A thread is a software process or program running on the computer. If you have one core, then the operating system (OS) runs one thread for a short time, then switches over and runs another thread for a short time, to make it look like all threads are running simultaneously. If you have multiple cores, then you can have different threads running on different cores and they are actually running simultaneously without the OS having to fake it*.

Originally, each core could only run one thread at a time (with the OS quickly switching between threads as mentioned above). With newer processors, a core can truly run two threads at the same time, which is called hyperthreading. In your case 4 cores x 2 threads per core = 8 threads.

Extra info: There are some limitations on the extra threads in hyperthreading, so having an 8 thread hyperthreaded CPU is not as fast as an equivalent 8 core non-hyperthreaded CPU. But it's still a big improvement over a 4 core non-hyperthreaded CPU.

* The OS actually still does switch threads in-and-out on a multi-core CPU, but the principle still holds true in that it can have multiple threads running simultaneously.
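To make the "multiple threads doing work at once" idea concrete, here's a minimal Python sketch: it starts eight threads, and the OS spreads them across whatever cores are available (on a 4-core/8-thread chip, up to eight can genuinely run at once). The counter and worker names are just illustrative.

```python
import threading

# Eight threads, each doing its own chunk of work; the OS decides
# which core (or hyperthread) each one runs on at any moment.
counter = {"total": 0}
lock = threading.Lock()

def worker(n):
    for _ in range(n):
        with lock:  # serialize updates so threads don't clobber each other
            counter["total"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

print(counter["total"])  # 8 threads x 1000 increments = 8000
```

Whether those eight threads truly overlap or get time-sliced onto fewer cores is entirely the OS's business; the program is written the same way either way.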
posted by duoshao at 5:31 AM on November 18, 2017 [7 favorites]

The job of a CPU is to execute instructions, and instructions come in a sequence: each program is a sequence of instructions. One core is the physical infrastructure for executing a sequence of instructions. Old CPUs had just one core, so they were running just one program at a time. Multicore CPUs include several copies of the same infrastructure, so they can run more than one program at a time: a four core CPU should normally be able to execute four sequences of instructions in parallel - i.e. four programs at the same time.

On top of that there's multithreading, which is a clever way of using a single core to execute two programs - basically switching from one to the other when the next instruction cannot be run because the CPU must wait for something external to happen. The important part is that this is done automatically by the CPU, so from the outside a four-core CPU with multithreading looks like an eight-core CPU. It's not as good as an actual 8-core, but it's better than a simple 4-core.

Note that this is further complicated by the fact that a core is executing one program at any one time, but it may be switching from one program to the other several times per second - thus the user gets the illusion that even a single-core CPU is executing multiple programs in parallel.
posted by Dr Dracator at 5:38 AM on November 18, 2017 [1 favorite]

Also: almost all vaguely modern Intel chips have hyperthreading (and AMD’s Zen architecture has something similar), so you can pretty much just compare core counts.

On some workloads, hyperthreading can even make performance worse—as in, 8 HT threads get less work done per unit time than 4 non-HT threads. People using their computers mostly for heavy scientific or graphical processing often benchmark that processing both ways and globally enable or disable HT in BIOS/UEFI based on the result. For general desktop use, with more than one core, the difference is unlikely to ever reveal itself.
posted by musicinmybrain at 5:56 AM on November 18, 2017 [1 favorite]

Excellent, thanks! That makes sense. This is indeed an Intel chip. I'm using this computer for music production, which can get heavily CPU intensive, so maybe I'll play around with disabling hyperthreading.
posted by Frobenius Twist at 6:22 AM on November 18, 2017

The previous answers are correct, but you might like an analogy. Imagine your computer is a huge train - the programs you want to run are the carriages and your CPU is the engines that pull them.

Your 4 core, 8 thread processor is a train with 4 engines, each with 2 motors.

It does not mean you can only run 8 programs at a time; your computer/train can run/pull a huge number of programs. It will just travel slower than an 8-engine train, but faster than a 2-engine train pulling the same load.

Your computer will also have a clock speed, measured in GHz. This is like the RPM of the motors, but the analogy breaks down after that.

Don't disable hyperthreading, in almost all cases it improves things.
posted by AndrewStephens at 6:27 AM on November 18, 2017 [6 favorites]

Ok cool, I like that metaphor
posted by Frobenius Twist at 6:29 AM on November 18, 2017

There are some technical distinctions between threads and programs that don't really matter for what follows, so I'll use both words interchangeably.

When a plain single-threaded CPU core runs multiple programs in parallel, it does this by running one of them for a little while, then saving some notes about where it was up to with that, then restoring its notes about where the next thread had got up to last time it was saved, then running the next thread for a while, then saving some notes about where that one was up to, and round and round it goes.

The saving and restoring of internal state that needs to be done every time a core switches to a new thread is time-consuming, and becomes relatively more so the faster the switching needs to be done. Maintaining the illusion that two threads are in fact running at the same time requires the back-and-forth switching to happen quite quickly, and this makes the switching overhead quite noticeable.

So there's a tradeoff between responsiveness (which is highest when thread execution never gets put off for a long time, i.e. when switching from thread to thread happens at a high rate) and overall throughput (which is highest when thread switching happens as infrequently as possible, minimizing the total amount of per-switch overhead).
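Here's a toy Python sketch of the save-and-restore dance described above (the names are illustrative, not any real OS API): each switch stashes the outgoing thread's "registers" and restores the incoming thread's, so a thread always resumes exactly where it left off.

```python
# Toy model of a context switch: save the outgoing thread's notes,
# restore the incoming thread's. Not a real OS interface.
contexts = {}

def switch(cpu_state, old_thread, new_thread):
    contexts[old_thread] = dict(cpu_state)          # save where we were
    cpu_state.clear()
    cpu_state.update(contexts.get(new_thread, {}))  # restore the other thread
    return cpu_state

cpu = {"pc": 100, "r1": 7}      # thread A is running
switch(cpu, "A", "B")           # B has never run: starts with empty state
cpu.update({"pc": 200, "r1": 42})
switch(cpu, "B", "A")
print(cpu)  # {'pc': 100, 'r1': 7} -- A resumes exactly where it left off
```

The copying itself is the overhead: do it millions of times a second and it starts eating a real fraction of the machine's time, which is exactly the tradeoff described above.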

Modern computers usually need to do several things at once and some of those things involve responding perceptually instantly to user actions. Also, it turns out that we've actually reached a point in CPU design history where it's quite difficult to clock them any quicker - we're already doing that so fast that physical limits on the time it takes for signals to propagate around inside the chips are a real constraint on how fast a core can be clocked.

So the general response has been to add more cores. Each core is, to a reasonable first approximation, equivalent to a whole extra processor. This makes it possible to run more threads in parallel without needing a whole lot of switching overhead; if at any instant there are (say) four programs that need to be doing stuff simultaneously, and you have a four-core CPU, you really truly can run one program on each core, and you really truly can get almost four times the throughput of a single-core machine with the same clock speed, and no switching-induced responsiveness penalty. And if you need to run more parallel threads than you have cores, you can do things like devote one or more whole real cores to high-priority threads while doing the customary switching dance on the rest of them.

But as it turns out, you don't actually need complete replication of a whole core to make it possible for a core to run more than one thread without a switching penalty. The parts of the core whose state needs to be saved and restored on each thread switch are actually quite a small part of the core as a whole, and duplicating just those parts - which costs much less silicon and power consumption than duplicating an entire core - lets the core's attention get split evenly across two parallel threads without needing the save+restore step on thread switches. That means that the switching can happen really really fast, which is good for responsiveness, while taking very little away from total throughput.

Hyperthreading is Intel's name for this idea. Most modern Intel cores support it, and many modern AMD cores have something like it as well. The way it's done makes a single hyperthreading core appear to be two full cores from the programmer's point of view; hiding the distinction between a full core and half a shared core in this way simplifies the programming model without wasting much potential capability.

So a four core, eight thread CPU appears to the programmer to be eight independent processors, each of which can have a single thread run on it with no programmer-visible switching overhead, even though when fully loaded such a chip will in fact be switching internally between two threads on each of its four cores.
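You can see this "appears to be eight processors" business directly: Python's standard `os.cpu_count()` reports logical processors, i.e. hardware threads with hyperthreads included, so on a 4-core/8-thread chip it reports 8, not 4.

```python
import os

# os.cpu_count() counts logical processors (hyperthreads included).
# On a 4-core, 8-thread CPU this is 8. Physical-core counts need a
# third-party library such as psutil; the standard library only sees
# what the hardware presents to the OS.
logical = os.cpu_count()
print(f"logical processors visible to programs: {logical}")
```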

And although the hyperthreading architecture makes that switching relatively cheap, there is still some small cost associated with it due to the way the cores, the memory subsystem and the caches interact with each other. Which is why, on CPU-bound scientific or similar workloads only, where total throughput is all-important and responsiveness basically matters not at all, it makes sense to turn hyperthreading off and give each of the four real cores a single thread to chew away on with as little switching as possible.
posted by flabdablet at 8:10 AM on November 18, 2017 [4 favorites]

2 hyperthreads on 1 core is the standard Intel architecture right now. The rule of thumb I have in my head is that 1 core with 2 hyperthreads is about like 1.7 cores without hyperthreads. It's definitely almost always a win, I wouldn't turn off hyperthreading unless you have very specific information it will improve what you are doing.
posted by Nelson at 8:11 AM on November 18, 2017

If you're using Ableton, freeze your tracks that use CPU-heavy VSTs (but don't delete the VST tracks, just "disable" them). That way, Ableton is playing the track as an audio file and not using the VST to make whatever sound you have.
posted by kuanes at 9:14 AM on November 18, 2017

Great answers, thanks so much! This makes a lot of sense.

So, if I understand correctly, multiple cores or threads could be used in the same program, but the program needs to be programmed in such a way that it tells the cores how to split up their duties?
posted by Frobenius Twist at 9:55 AM on November 18, 2017

Exactly right, Frobenius. In the past, before many programs were updated to use multiple cores, regular desktop users didn't really need more than two (one for the main program, one for everything else), but now a lot of programs are designed to use multiple threads which can/should run on multiple cores.

However, the way the programs use multiple threads / cores is often imperfect and the speed of a single core can often be the bottleneck, even when free cores are available. A lot depends on the specific program you're running and the tasks it's doing; for example video encoding tends to be perfectly parallelizable but browsing a single web page isn't, though new versions of browsers are working on that.
posted by bsdfish at 10:25 AM on November 18, 2017

multiple cores or threads could be used in the same program, but the program needs to be programmed in such a way that it tells the cores how to split up their duties?

Yes. A single program can be structured as multiple threads, and the operating system will allocate these across the available cores. The OS will also contain facilities that let the programmer inquire about how many cores are available, and how many of those are full cores rather than hyperthreading simulated ones, in case the programmer wants to specify the relationship between threads and the cores they run on more tightly for performance reasons.

Multi-threaded programming has a reasonably well-deserved reputation for being tricky to get right, and is often avoided for that reason. Even so, it's reasonably common for a program with a GUI to have a thread devoted to user interaction and at least one "worker" thread that gets given things to do in the background; the advantage of this architecture is that delays seen by the worker thread due to slow disks or networks won't necessarily cause lagginess in the GUI.
posted by flabdablet at 10:36 AM on November 18, 2017

I'm using this computer for music production, which can get heavily CPU intensive, so maybe I'll play around with disabling hyperthreading.

I wouldn't.

Music production is all about multiple threads computing streams of results with very low latency, so that there's as little delay as possible between trigger and output and all the computed results arrive at the mixer thread without gaps. It's an absolutely responsiveness-dependent workload.

Total throughput is less important than keeping the latency low and the work spread as evenly across the available CPU power as possible. The more cores you seem to have available, the less time the OS will need to spend on thread context switching - even if all the visible cores are actually being simulated by hyperthreading.

Turning hyperthreading off is really only ever beneficial for workloads where raw CPU grunt is actually the performance bottleneck and exactitude of timing doesn't matter very much, and even then it can do more harm than good depending on the number of concurrent threads the program uses and the way those threads interact.
posted by flabdablet at 11:01 AM on November 18, 2017 [1 favorite]

Also: almost all vaguely modern Intel chips have hyperthreading (and AMD’s Zen architecture has something similar), so you can pretty much just compare core counts.

Not consistently true with every Intel CPU generation, just to be confusing. I believe all i7 processors have hyperthreading, but the i5 usually doesn't. Every now and then they do release an i5 with hyperthreading, probably just to compete with AMD at a certain price point. For a specific example: the i5 in my 2017 5K iMac is not one with hyperthreading, but Intel is now shipping some i5 processors that have it.
posted by fedward at 12:36 PM on November 18, 2017

If you want to get a bit deeper into the architecture side, barrel processors go about as far down the hyperthreading road as it's possible to go. Here's a bit of background that will hopefully help you understand why I think they're such a neat idea.

In order to execute a single programming instruction, a processor has to perform a bunch of separate steps:

1. Fetch the next instruction from memory.
2. Decode the instruction.
3. Do any memory reads required by the instruction.
4. Compute some kind of result in an instruction-specific way (such as adding a couple of numbers, or comparing them, or incrementing one of them, or treating one as the address of the next instruction to run).
5. Do any memory or internal register writes needed to save the result somewhere appropriate.

The processor has separate sections designed specifically to perform each of these steps, each of which requires at least one tick of the processor's internal clock.
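The payoff of overlapping those five steps is easy to put numbers on. A back-of-envelope sketch (idealized, assuming no stalls):

```python
STAGES = 5  # fetch, decode, read, execute, write

def unpipelined_cycles(n_instructions):
    # Each instruction occupies the whole processor for all 5 steps.
    return n_instructions * STAGES

def pipelined_cycles(n_instructions):
    # The first instruction takes 5 ticks to flow through; after that,
    # one instruction completes every tick (assuming no stalls).
    return STAGES + (n_instructions - 1)

print(unpipelined_cycles(100))  # 500
print(pipelined_cycles(100))    # 104
```

So for a long run of independent instructions, the pipeline approaches one completed instruction per tick, nearly a 5x improvement with the same hardware sections.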

Very early processors did this stuff in the simplest possible way, by just doing all the steps one by one for each instruction. But it soon became apparent that this wasted a lot of time: it seems a bit sad to have the sections devoted to steps 1, 2, 4 and 5 sit around doing nothing at all while they wait for the step 3 hardware to finish its work, for example.

And so arose the idea of the pipeline, which is a way to allow parts of multiple instructions to be worked on at the same time, ideally keeping all sections fully busy. So as the instruction fetcher is grabbing the next instruction, the decoder is decoding the one the fetcher fetched in the tick before that; the memory read unit(s) are reading operands for the instruction fetched two ticks ago; the execution units are doing work for the instruction fetched three ticks ago; and the write units are saving the results for an instruction four ticks old.

You can think of the pipeline as a little assembly line, instruction completion as a product being built in steps, and the processor subunits that do those steps as workers by the side of the line.

Pipelines work really well, and every modern processor has them. But they do complicate things, because the design needs to take into account a bunch of possibilities that can't occur in a non-pipelined machine.

One of those possibilities is dependencies between instructions that end up in the pipeline at the same time. If the program contains an instruction that asks for two numbers to get multiplied together, followed by an instruction asking for the result of that to be added to a third number, then the pipeline must stall: the add instruction is held up at the decode step, because the data it needs to fetch at step 3 won't exist until after completion of step 5 of the previous (multiply) instruction.
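That multiply-then-add situation is what architects call a read-after-write (RAW) hazard, and detecting it is mechanical: check whether a later instruction reads a register that an earlier one hasn't finished writing. A toy sketch (the instruction encoding here is made up for illustration):

```python
# Toy RAW-hazard check: the add reads r3, which the multiply just
# before it has not yet written back. Encoding is illustrative only.
program = [
    ("mul", "r3", ("r1", "r2")),  # r3 = r1 * r2
    ("add", "r4", ("r3", "r5")),  # r4 = r3 + r5  <- depends on r3
]

def has_raw_hazard(first, second):
    _, dest, _ = first
    _, _, sources = second
    return dest in sources  # second reads what first writes

print(has_raw_hazard(program[0], program[1]))  # True
```

When the check comes up True, something has to give: the pipeline stalls, the compiler reorders, or (as below) the hardware gets cleverer.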

CPU architects find pipeline stalls grossly offensive, and various approaches have been tried to get rid of as many of them as possible.

One approach is to just tell the programmer: listen, pal, you need to understand that this is a pipelined machine you're using here so just don't write code that would stall the pipe, mmkay? But there's only so much a programmer (or, for modern code, an automatic code compiler) can do to rearrange instructions into an order that avoids data dependencies. When that's not possible, NOP (no operation) instructions need to be inserted in the program and these function equivalently to explicitly coded pipeline stalls.

Another approach is to break instruction execution into even more substeps, make the pipelines longer, add more subunits, and have several instructions flowing through the pipe at any instant. Processors built this way are called superscalar, and combined with out-of-order execution they can analyze data dependencies on the fly and reorder instruction processing as needed to avoid stalls while not destroying the programmer's intended meaning.

As you can imagine, this is hella complex and ridiculously difficult to get right. But it turns out that it can be made to work really well, and really does keep most of the silicon busy working on useful stuff most of the time. All the modern Intel and AMD x86-family processors work this way internally. And if they have cores that do hyperthreading as well, the HT stuff is essentially bolted on over the top of this already horrendously complicated pipeline.

Barrel processors come from looking at all that excellent and intricate work and going "that's just nuts, there has to be a simpler way", taking a step back, and realizing that a multi-threaded workload has an inherent source of valuable successive-instruction independence built in: instructions from different threads. Instructions executed close to one another within any given thread will typically have frequent interdependencies, but in general there's no reason at all why an instruction from thread A should be interdependent with whatever unrelated instruction needs to be executed next in thread B.

So in a barrel processor, the parts retaining programmer-visible CPU state - the same ones the Intel processors duplicate for hyperthreading - are replicated even more times, and the result is a processor that has one set of that stuff for each pipeline stage. From the programmer's point of view, the barrel processor appears to have as many independent cores, and is therefore capable of executing as many threads in parallel, as it has pipeline stages.

For any given single thread, the barrel processor appears to work like the dinosaur machines that roamed the Earth before pipelines were even a thing: fetch, then decode, then read, then compute, then write, then fetch, then decode...

The intra-thread instruction timing therefore looks very slow and steady and methodical, but it's not subject to pipeline stalls at all and that makes it very predictable. That makes barrel processors a good fit for applications that need to guarantee that some external minimum timing can always be met.

But all those subunits are not sitting there idle while an instruction from thread A occupies one of them; the rest are working on their own stages of instructions from threads B, C, D and E at the same time.
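The scheduling trick underneath is just round-robin issue: each tick, the fetch stage takes the next instruction from a different thread, so back-to-back instructions in the pipe never come from the same thread and can't depend on each other. A toy sketch:

```python
from itertools import cycle, islice

# Barrel scheduling sketch: issue one instruction per tick, rotating
# through the threads, so adjacent instructions in the pipeline always
# belong to different threads. Thread/instruction names are made up.
threads = {
    "A": ["a0", "a1", "a2"],
    "B": ["b0", "b1", "b2"],
    "C": ["c0", "c1", "c2"],
}
iters = {name: iter(instrs) for name, instrs in threads.items()}

issue_order = []
for name in islice(cycle(iters), 9):  # 9 ticks: 3 threads x 3 instructions
    issue_order.append(next(iters[name]))

print(issue_order)
# ['a0', 'b0', 'c0', 'a1', 'b1', 'c1', 'a2', 'b2', 'c2']
```

Notice each thread sees its own instructions run strictly in order, one every three ticks - slow and steady from inside the thread, but with the whole pipeline kept busy and never a stall.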

For workloads that do need multiple threads running most of the time, and also need very predictable execution timing, barrel processors are a really tidy solution. A lot of I/O fits into that category, which is why Seymour Cray stuck a barrel processor on the side of his early CDC Cyber supercomputers to deal with that stuff.
posted by flabdablet at 3:56 AM on November 19, 2017 [1 favorite]
