I know this thread is long dead, but some of the information in this post is incorrect,
and I hope to shed some light on it.
First some background: On IRIX a process consists of one or more kernel execution vehicles, which we call "uthreads" (i.e. "user threads"). A normal non-threaded process has one and only one uthread. A Pthreaded application is composed of one or more Pthreads, which are assigned to and execute atop one or more uthreads. In a way you can think of a uthread as a virtual processor (in fact that's what they're called inside the library), and you can think of a Pthread as a virtual process -- the process executes on a processor, and something handles scheduling, context switching, and the like.
So, given that background...
squeen wrote:
The PROCESS scope threads run in the same process as the one that launched them. But there is just one kernel entity for IRIX to schedule. Therefore you get concurrence (if one thread blocks the other keeps going) but not multi-cpu parallelism. This is known an Nx1 parallelism.
This isn't quite true. The IRIX Pthreads library dynamically creates as many uthreads as it can productively use to schedule individual Pthreads atop them. For PROCESS scope threads this means that as various Pthreads block, or as Pthreads are created and there are additional processors available in the system, the library will spawn additional uthreads to handle any Pthreads which are runnable. The library takes care of scheduling which Pthread is executing atop which uthread at any given moment, and the relationship between a Pthread and a uthread is not fixed -- they may switch associations at any time. It's just like how the kernel switches normal processes to run on different CPUs -- the relationship between a process and the CPU it is running on may change.
In other words, IRIX PROCESS scope threads follow an MxN threading model: "M" Pthreads running atop "N" uthreads.
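For anyone who wants to try this, here is a minimal sketch of asking for PROCESS scope through the standard POSIX attribute call. The worker `bump`, the `NWORKERS` count, and the function name `run_process_scope_workers` are made up for illustration. How many uthreads end up underneath is the library's decision and is not visible from this code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NWORKERS 4

/* Hypothetical worker: increments the int it is handed. */
static void *bump(void *arg)
{
    int *n = arg;
    *n += 1;
    return NULL;
}

/*
 * Start NWORKERS threads with PROCESS contention scope requested,
 * one per slot, and wait for all of them.
 * Returns 0 on success, otherwise the error number reported by
 * pthread_attr_init() or pthread_create().
 */
int run_process_scope_workers(int slots[NWORKERS])
{
    pthread_attr_t attr;
    pthread_t tids[NWORKERS];
    int started = 0;
    int rc;

    assert(slots != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* A refusal (some platforms answer ENOTSUP) leaves the default
     * scope in place, so the result is deliberately ignored. */
    (void)pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);

    while (started < NWORKERS) {
        rc = pthread_create(&tids[started], &attr, bump, &slots[started]);
        if (rc != 0)
            break;
        started++;
    }

    /* Join whatever was actually started, even after a failure. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_attr_destroy(&attr);
    return rc;
}
```

Nothing in the calling code changes if the library runs these four Pthreads on one uthread or on four, which is the point of the MxN model.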
squeen wrote:
The SYSTEM scope attaches a kernel entity to each thread so IRIX can schedule them for CPU time independently. This will give you multiprocess parallelism across multiple CPUs. This is NxN parallelism, but it requires special user privledges since the kernel entity runs in the real-time priority band which is above that of most other processes except (usually!) the kernel itself.
Almost. When a SYSTEM scope thread is created on IRIX, the Pthreads library creates a uthread specifically for that Pthread, assigns the Pthread to that uthread, and never changes the relationship between them. That is, the uthread never executes any Pthread other than the one for which it was created, and the Pthread never runs on any uthread other than the one which was created for it. The exact same thing is true for BOUND_NP threads as well.
This is a 1:1 threading model.
The priority of kernel execution has nothing to do with why the CAP_SCHED_MGT capability is required in order to create SYSTEM scope threads. The only reason that CAP_SCHED_MGT is required is because the user may choose to alter the priority of that thread, and may just choose a priority which boosts it above other system threads, which can cause system lockups if great care isn't taken -- but it's all something that realtime programmers are familiar with and know how to deal with. But due to the delicate nature of such decisions, the additional capability is required.
SYSTEM scope threads are also scheduled onto CPUs only by the kernel, which has all the appropriate knowledge to handle realtime events. The Pthreads library does not handle scheduling of SYSTEM scope threads because it is not realtime aware in and of itself, and thus would be unsuitable to such tasks. Only by tying a uthread to a Pthread and letting the kernel take care of the details, essentially removing the library from all scheduling decisions for SYSTEM scope threads, can realtime scheduling work.
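As a rough sketch of what the capability check looks like from the application side, the function below requests SYSTEM scope and hands any error straight back to the caller. The names `mark_done` and `run_system_scope` are hypothetical. I am assuming that a missing CAP_SCHED_MGT shows up as an EPERM error number from thread creation; check the IRIX man pages before relying on that.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: sets the int it is handed to 1. */
void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/*
 * Run fn(arg) on a thread created with SYSTEM contention scope and
 * wait for it to finish.
 * Returns 0 on success, otherwise the error number from the failing
 * pthread call. ASSUMPTION: a caller lacking the scheduling
 * capability would see EPERM here.
 */
int run_system_scope(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    assert(fn != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);

    if (rc != 0)
        return rc;   /* the thread was never started */
    return pthread_join(tid, NULL);
}
```

A caller could compare the return value against EPERM and fall back to another scope rather than aborting.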
Which leads to...
squeem wrote:
The BOUND_NP refers to the thread being "bound" to a kernel entity. It gives you paralleism across CPU (NxN) but doens't run a real-time priority and therefore doesn't require special user privledges. The "NP" refers to "not portable" since this is IRIX only and not a POSIX standard.
This is mostly correct, though I don't really agree with calling it NxN. The only difference between a BOUND_NP thread and a SYSTEM thread is the ability to alter the thread's priority and other special abilities that CAP_SCHED_MGT grants to a thread. Otherwise they are completely identical. And thank you very much for pointing out what "NP" means -- that seems to be lost on most people until it's explained.
So why BOUND_NP scope threads if they're just "crippled" SYSTEM scope threads? There is a class of applications, mostly in HPC areas, that would like to tie a given thread to a given CPU (see dplace(1)) for performance reasons (e.g. cache warmth, memory locality), but which do not need scheduling management capabilities (e.g. setting thread priorities). The BOUND_NP scope thread fills this niche. The 1:1 binding of uthread to Pthread allows the application to statically set a CPU on which to run, and set up memory locality and other characteristics for that thread, something which could not be accomplished if the Pthreads library were constantly rearranging the assignment between Pthreads and uthreads, as it does for PROCESS scope threads.
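Since BOUND_NP is not portable, code that wants a bound thread on IRIX but still has to build elsewhere needs a compile-time guard. This is a small sketch, assuming `PTHREAD_SCOPE_BOUND_NP` is a preprocessor macro in the IRIX `<pthread.h>` (I have not verified that it is not an enum). The helper names `bound_like_scope` and `init_bound_attr` are invented here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Pick the closest thing to a bound thread that this platform has.
 * ASSUMPTION: PTHREAD_SCOPE_BOUND_NP is a macro where it exists.
 */
int bound_like_scope(void)
{
#ifdef PTHREAD_SCOPE_BOUND_NP
    return PTHREAD_SCOPE_BOUND_NP;   /* 1:1, no capability needed */
#else
    return PTHREAD_SCOPE_SYSTEM;     /* nearest portable 1:1 scope */
#endif
}

/*
 * Initialise *attr and set the scope chosen above.
 * Returns 0 on success; on failure the attribute object is already
 * destroyed and the error number is returned.
 */
int init_bound_attr(pthread_attr_t *attr)
{
    int rc;

    assert(attr != NULL);

    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setscope(attr, bound_like_scope());
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

The resulting attribute object is passed to pthread_create() as usual. Pinning the thread to a CPU is a separate step and is not shown.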
For what it's worth, and if you care, Linux to my knowledge only has the equivalent of IRIX's BOUND_NP scope. Both the PROCESS and SYSTEM scope threads on Linux behave the same as IRIX's BOUND_NP scope.
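If you would rather check than take my word for it, a small probe will report which scopes a given system accepts. As far as I know, glibc on Linux accepts only PTHREAD_SCOPE_SYSTEM and answers ENOTSUP for PTHREAD_SCOPE_PROCESS, so every thread there is 1:1 either way. The function names `try_scope`, `verdict`, and `report_scopes` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Return the error number pthread_attr_setscope() gives for `scope`,
 * or the pthread_attr_init() error if that fails first. */
int try_scope(int scope)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setscope(&attr, scope);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Turn a try_scope() result into a short label. */
const char *verdict(int rc)
{
    if (rc == 0)
        return "accepted";
    if (rc == ENOTSUP)
        return "not supported";
    return "rejected";
}

/* Print one line per standard scope to `out`. */
void report_scopes(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "SYSTEM:  %s\n", verdict(try_scope(PTHREAD_SCOPE_SYSTEM)));
    fprintf(out, "PROCESS: %s\n", verdict(try_scope(PTHREAD_SCOPE_PROCESS)));
}
```

Calling `report_scopes(stdout)` from a test program is enough to see what the local library does.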
squeem wrote:
If you want to speed things up by using both of the CPUs on your Octane, I'd recommed the BOUND_NP priority level.
Also, I've directly noticed a difference between PROCESS and BOUND_NP when I run "top". In my app, the process show about "100%" CPU usage using PROCESS scope but shows around "150%" CPU usage when I go with BOUND_NP.
I disagree with this method to speed things up. Yes, there are some individual cases where using BOUND_NP scope threads instead of PROCESS scope threads could eke out a performance advantage -- however the Pthreads library will schedule as much useful work as possible for PROCESS scope threads -- bearing in mind that it will ramp up the number of uthreads over time, so short-lived Pthreads may not trigger the creation of as many additional uthreads as expected.
I've no explanation for why you would see such a difference in CPU usage between PROCESS and BOUND_NP scope threads, unless you're running into some degenerate case in the Pthreads scheduling code. I'd have to be very familiar with the application to give a good explanation. I'd be curious to know whether you are actually seeing a performance benefit with the bound scope threads, or if that extra 50% CPU time is being wasted in something like lock contention. Is 50% more work getting done or are things running 33% faster, or is the room just getting 2% warmer?
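One way to answer that question is to time the same workload under each scope and compare wall-clock time against CPU time, rather than reading percentages off top. Below is a rough sketch, assuming a system that provides the POSIX `CLOCK_MONOTONIC` and `CLOCK_PROCESS_CPUTIME_ID` clocks (I have not checked which of these IRIX offers). All names in it (`measure`, `cpu_percent`, `time_saved_fraction`, `spin`) are made up. The arithmetic helper covers the ideal case only: if every extra cycle were useful work, 1.5x the throughput means the same job finishes in about two thirds of the time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct usage {
    double wall_s;  /* elapsed real time, seconds */
    double cpu_s;   /* CPU time used by the whole process, seconds */
};

/* Read a clock as seconds; returns 0.0 if the clock is unavailable. */
static double now_s(clockid_t clock)
{
    struct timespec ts;

    if (clock_gettime(clock, &ts) != 0)
        return 0.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run work(arg) and report the wall and CPU time it took. */
struct usage measure(void (*work)(void *), void *arg)
{
    struct usage u;
    double wall0, cpu0;

    assert(work != NULL);
    wall0 = now_s(CLOCK_MONOTONIC);
    cpu0 = now_s(CLOCK_PROCESS_CPUTIME_ID);
    work(arg);
    u.wall_s = now_s(CLOCK_MONOTONIC) - wall0;
    u.cpu_s = now_s(CLOCK_PROCESS_CPUTIME_ID) - cpu0;
    return u;
}

/* CPU time as a percentage of wall time, like the figure top shows. */
double cpu_percent(struct usage u)
{
    return u.wall_s > 0.0 ? 100.0 * u.cpu_s / u.wall_s : 0.0;
}

/* Fraction of wall time saved for a given throughput speedup,
 * e.g. a speedup of 1.5 saves about one third. */
double time_saved_fraction(double speedup)
{
    assert(speedup > 0.0);
    return 1.0 - 1.0 / speedup;
}

/* Hypothetical stand-in workload: count down from *arg. */
void spin(void *arg)
{
    volatile unsigned long n = *(unsigned long *)arg;

    while (n > 0)
        n--;
}
```

If `cpu_percent` goes up between the two scopes while `wall_s` for the same job stays flat, the extra CPU time is going somewhere other than useful work.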