There are 3 "types" of threads.
1. User level threads (threads which you create in a multi-threaded application)
2. Kernel threads (each acts as a unit to be scheduled by the OS scheduler)
3. Hardware threads (or CPU threads)
To give you a complete understanding, I would have to go into considerable depth about processor architecture. However, I will try to explain the 3 categories above as briefly as possible.
So let's take the 1st category. User-level threads are basically what the programmer creates. When you call something like pthread_create() on Linux, you are creating user-level threads. Whether they actually map to independent execution units depends on how many kernel threads are supported per process. So what is the advantage of creating user-level threads? To answer that, let us now focus on kernel threads.
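For instance, a minimal pthreads sketch (the thread count and the shared counter here are just for illustration) of creating user-level threads on Linux might look like this:

```c
#include <pthread.h>

enum { NTHREADS = 4 };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int done = 0;

/* Each user-level thread runs this function on its own stack. */
static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    done++;                         /* record that this thread ran */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Create NTHREADS user-level threads and wait for all of them. */
int run_threads(void) {
    for (int i = 0; i < NTHREADS; i++) {
        pthread_t tid;
        /* pthread_create is the user-level thread creation call
           mentioned above; on modern Linux (NPTL) each one happens
           to be backed 1:1 by a kernel thread. */
        pthread_create(&tid, NULL, worker, NULL);
        pthread_join(tid, NULL);
    }
    return done;                    /* how many threads completed */
}
```

Joining each thread before creating the next keeps the example deterministic; a real program would usually create them all first and join afterwards.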
Kernel threads are independent units of execution, each with its own thread context, and each can be scheduled by the OS independently of the others. If an OS supports 2 kernel threads per process and you create 10 user-level threads, those 10 user-level threads will internally map onto at most 2 kernel threads. So in effect, several user-level threads are combined into one kernel thread. This is invisible to the user, because the thread library preserves a separate stack for each user-level thread, but the combined threads are not the independent scheduling units you might expect. At run time, scheduling user-level thread 1 might really mean scheduling user-level threads 1, 2, 3, 4, and 5 together; user-level thread 3 (for example) will never get scheduled alone.
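To make that many-to-few mapping concrete, here is a toy model (purely an illustration, not how any real M:N threading library is implemented) where user-level threads are assigned to kernel threads round-robin:

```c
/* Toy M:N mapping: user-level thread u is multiplexed onto kernel
   thread u % nkernel, so scheduling one kernel thread effectively
   schedules a whole group of user-level threads together. */
int kernel_thread_for(int user_tid, int nkernel) {
    return user_tid % nkernel;
}

/* How many user-level threads share a given kernel thread. */
int group_size(int kernel_tid, int nuser, int nkernel) {
    int count = 0;
    for (int u = 0; u < nuser; u++)
        if (kernel_thread_for(u, nkernel) == kernel_tid)
            count++;
    return count;
}
```

With 10 user-level threads and 2 kernel threads, each kernel thread carries a group of 5 user-level threads that are scheduled (or not) as one unit.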
Coming to the final category, the hardware threads.
When a process executes on a CPU, essentially one machine instruction is executed in each clock cycle. A process is made up of numerous such machine instructions. In the simplest case, only one machine instruction is fetched and executed at a time. Which instruction to fetch is indicated by a special hardware register called the program counter (PC), and there is only 1 PC. However, if the process has two sets of machine instructions that are independent of each other, we can fetch and execute 2 instructions in every clock cycle. In this case we can have 2 PCs, each pointing to the next instruction to be fetched from its own set. These two sets of instructions are essentially your kernel threads, and when we add hardware support in the form of extra PCs (and extra register files, but I will not go into that), we get a processor with multiple hardware threads.
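That fetch model can be sketched as a toy simulation (a sketch of the idea, not a real SMT pipeline): each hardware thread gets its own PC, and every clock cycle advances every PC by one instruction.

```c
/* Toy model of a multi-threaded fetch stage: one PC per hardware
   thread, each pointing into its own instruction stream. */
typedef struct {
    int pc;        /* program counter for this hardware thread */
    int nfetched;  /* instructions fetched from this stream */
} hw_thread;

/* Run `cycles` clock cycles; each cycle fetches one instruction
   per hardware thread, i.e. advances each PC by one. Returns the
   total number of instructions fetched across all threads. */
int run_cycles(hw_thread *t, int nthreads, int cycles) {
    int total = 0;
    for (int c = 0; c < cycles; c++) {
        for (int i = 0; i < nthreads; i++) {
            t[i].pc++;       /* fetch next instruction from stream i */
            t[i].nfetched++;
            total++;
        }
    }
    return total;
}
```

With 2 hardware threads, 5 cycles yield 10 fetched instructions instead of 5: that doubling of fetch throughput is exactly what the extra PC buys.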
One key aspect to note here is that hardware capable of running multiple threads need not depend on the presence of multiple kernel threads (a hardware scheduler can detect independent instructions), and likewise a kernel thread need not, in theory, depend on the existence of multiple user-level threads to map into it. A compiler can detect independent instruction sets and create multiple kernel threads from a single-threaded application.
By creating threads at each level, we are just explicitly indicating the independence of instructions and increasing the chances of each thread being mapped directly onto a lower-level thread.
To sum up:
Software threads: 1. user-level threads and 2. kernel threads.
CPU capable of running multiple threads: 1. hardware threads (a CPU with support for extra PCs and registers).
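On Linux you can ask the OS how many hardware threads (logical CPUs) it currently sees; sysconf(_SC_NPROCESSORS_ONLN) reports the number online:

```c
#include <unistd.h>

/* Number of online logical CPUs (hardware threads) as seen by the
   OS; returns -1 if the value is unavailable. */
long hardware_threads(void) {
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

On a CPU with 4 cores and 2 hardware threads per core, this would typically report 8.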