Nekonomicon - CPU Core Counts: AIX, HP-UX, IRIX, Solaris, BSD, etc.

jwp
Who joined Nov. 18, 2012, 7:14 p.m.
and authored 74 notes

Wrote the following at July 6, 2013, 9:45 p.m...

thegoldbug wrote:

the output from the "hinv -c processor" command

IRIS 46# hinv -c processor
1 600 MHZ IP35 Processor
CPU: MIPS R14000 Processor Chip Revision: 2.4
FPU: MIPS R14010 Floating Point Chip Revision: 2.4
IRIS 47#

Ah, I was foolishly looking for only the plural form "Processors" rather than the singular "Processor" ... Fixed.
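For anyone doing the same parsing, here is a minimal Python sketch of that fix: accept both "Processor" and "Processors" on the summary line of `hinv -c processor`. The singular line is the one quoted above. The plural form (e.g. `4 195 MHZ IP27 Processors`) is an assumption about what an SMP box prints, and `cpus_from_hinv` is a hypothetical name.

```python
import re
from typing import Optional

# Summary line of IRIX `hinv -c processor`, e.g. "1 600 MHZ IP35 Processor".
# "Processors?" accepts the singular form as well as the (assumed) plural one.
HINV_CPU_LINE = re.compile(
    r"^\s*(\d+)\s+\d+\s+MHZ\s+\S+\s+Processors?\b", re.MULTILINE | re.IGNORECASE
)


def cpus_from_hinv(output: str) -> Optional[int]:
    """Return the CPU count from `hinv -c processor` output, or None if absent."""
    match = HINV_CPU_LINE.search(output)
    return int(match.group(1)) if match else None


sample = """1 600 MHZ IP35 Processor
CPU: MIPS R14000 Processor Chip Revision: 2.4
FPU: MIPS R14010 Floating Point Chip Revision: 2.4
"""
print(cpus_from_hinv(sample))  # 1
```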

_________________
Debian GNU/Linux on a ThinkPad, running a simple setup with Fvwm.

hamei
Who joined Feb. 24, 2004, 4:10 p.m.
and authored 8705 notes

Wrote the following at July 6, 2013, 10:33 p.m...

jwp wrote:

Ah, I was foolishly looking for only the plural form "Processors" rather than the singular "Processor" ... Fixed.

You realize this entire exercise is totally wrong, I hope ? It is not the application's job to determine how many processors are available or how to assign resources. That's what the operating system's scheduler is for.

Stand back and let the o.s. do its job. All you are doing with this shit is making a mess.

bluecode
Who joined Dec. 13, 2011, 7:56 a.m.
and authored 192 notes

Wrote the following at July 6, 2013, 11:11 p.m...

hamei wrote:

jwp wrote:

Ah, I was foolishly looking for only the plural form "Processors" rather than the singular "Processor" ... Fixed.

You realize this entire exercise is totally wrong, I hope ? It is not the application's job to determine how many processors are available or how to assign resources. That's what the operating system's scheduler is for.

Stand back and let the o.s. do its job. All you are doing with this shit is making a mess.

It depends. Increasingly in the UNIX world things seem to be moving in the direction you're going. A lot of the modern parallelism stuff is aimed at making it easy to get the advantages of multiple cores and threads automagically without the race conditions and other issues. But this trades off efficiency, performance, and control for "safety" and is almost never the best solution from a design or performance point of view. The best way to use multiple processors depends on a lot of stuff including the hardware, the OS, the APIs, and in many cases it depends heavily on the application and what it is doing. There is the issue of how one application fits into the OS and shares resources with other applications and the OS itself. But there is also the issue of what the app is designed to do, and how it can best be organized to do that. That is not something the OS or these new APIs can do. It depends on having fairly low-level access to stuff like controlling serialization directly, managing resources {threads|tasks|processes|memory}, etc.

In practice the "it's not the application's job" and "let the OS do it" approach devolves to a "throw more hardware at the problem" approach since it removes control from the very place where the issues are most understood: the resource consumer. This is a separate issue from the fact many/most coders are unskilled and shouldn't be coding.

I think your comment is too general on one hand and too UNIX-specific on the other. There are other (older) paradigms which have been working just fine in alternative universes, and they involve letting the application designer/coder set up his program in the best way possible and then the OS allocates resources when available. The same program can run on a uniprocessor box or a multiprocessor box without changes. It just usually runs a lot better on a multiprocessor box, up to a point where even throwing more threads|tasks etc. at the problem doesn't help.

_________________
Paint It Blue

porter
Who joined Nov. 1, 2006, 10:37 p.m.
and authored 2895 notes

Wrote the following at July 6, 2013, 11:38 p.m...

Put it this way, you now know how many CPU/cores/threads you have, what are you actually going to do with that information?

_________________
:Indy:

4xRS6K 2xHP9K 6xSUN 1xDEC 14xMAC 7xPC 2xPS2

hamei
Who joined Feb. 24, 2004, 4:10 p.m.
and authored 8705 notes

Wrote the following at July 6, 2013, 11:56 p.m...

bluecode wrote:

It depends.

Absolutely agree with you on this and I knew someone would kick my ass (justifiably) over that statement.

But in the context of nekochan / amateur coders / desktop software / &c I think I can make a good case that it's basically a true statement ...

Quote:

Increasingly in the UNIX world things seem to be moving in the direction you're going.

That's kind of funny because I personally think Unix sucks in this area ... do you remember when The Man hisself said "Linux will never support smp !" ?

From a desktop user perspective, Irix may be able to handle 1024 processors but a dual-p box running OS/2 will kick Irix's ass. Part of that is Irix' fault and part is the software's fault but the end result is the same - Irix does not handle multiple processors as well as OS/2 from a user standpoint. It's probably great for massively parallel mathematical calculations but for responsiveness, it sucks. Don't know about BeOS but that's possibly better as well (it ran well when I had it but there were so few applications I never really got to thrash it). Yewnix is not really multiple-processor friendly at the desktop level.

(Yes, I am aware that OS/2 has its own peculiar problems as well. I'm not talking about those tho, I am thinking of how the operating system handles resources and multiple threads.)

((footnote two, let's not even talk about Fireflop which is apparently massively multi-threaded but the braindead dog turd sits there locked up solid waiting an hour for a non-existent link to twitter.com to show up. What effing clown wrote that piece of shit ?))

Quote:

There is the issue of how one application fits into the OS and shares resources with other applications and the OS itself. But there is also the issue of what the app is designed to do, and how it can best be organized to do that. That is not something the OS or these new APIs can do. It depends on having fairly low-level access to stuff like controlling serialization directly, managing resources {threads|tasks|processes|memory}, etc.

I am sure you have a good point. But let's look at the real world for a second. The bozos writing software can't even design a configure make make install scenario that works. The most popular web browser in the world is total shit. Consumer grade software is stinking worthless garbage.

These are the people I am supposed to trust to make intelligent decisions about resource management ?

The biggest accolade for the Macintosh is that "it just works !" Wull kiss my rosy red ass, Jackson ! Something that actually works ? Wow. I best rush out and buy that !

Quote:

In practice the "it's not the application's job" and "let the OS do it" approach devolves to a "throw more hardware at the problem" approach since it removes control from the very place where the issues are most understood: the resource consumer.

This is probably true in the Linux world. In the OS/2 world, the people who wrote the scheduler and the kernel were smart. They actually knew what they were doing, and the scheduler was very very effective.

Quote:

This is a separate issue from the fact many/most coders are unskilled and shouldn't be coding.

Unfortunately, as software users we can't entirely separate those issues.

Quote:

I think your comment is too general on one hand and too UNIX-specific on the other. There are other (older) paradigms which have been working just fine in alternative universes, and they involve letting the application designer/coder set up his program in the best way possible and then the OS allocates resources when available. The same program can run on a uniprocessor box or a multiprocessor box without changes. It just usually runs a lot better on a multiprocessor box, up to a point where even throwing more threads|tasks etc. at the problem doesn't help.

This is exactly how OS/2 works, so I am not going to argue with you on this.

In fact, I don't particularly like Yewnix. It's really not that good. It just happens to be better than the other choices we commonly have.

I still think that in the majority of cases, the people writing the OS are smarter and more capable than an application developer. Especially since almost all application developers are convinced their application has to be the most important thing running on the computer.

Oh, wait ... now that we have Linux, maybe that's not a valid viewpoint.

Oh shit

bluecode
Who joined Dec. 13, 2011, 7:56 a.m.
and authored 192 notes

Wrote the following at July 7, 2013, 12:25 a.m...

Well Hamei this looks like a conversation we could have over a pitcher or three of beer. I agree with everything you said. Your post reminded me there's a lot I am disappointed with in the direction parallelism support is going but it's because of what we said: most desktop software is crap and written by people who can't code their way out of a paper bag. And UNIX falls far short of my desires but it's the best commonly available option and the price is right. So we're kind of stuck with it until who knows when.

The race conditions and buffer overflows never seem to end. The response to this is to take control away from the application and give it to more and more layers. Look at Fortran which can be considered a leading language for HPC. They have changed the language dramatically in the last few standards going from a language that wasn't useful for much more than calculations without vendor extensions (no pointers, no dynamic storage management) to a language with a lot of nice features that's becoming generally useful. But they left out tasking control completely in favor of external solutions like OpenMP. Could we hope for native threads like PL/I has had since the 1970s and Ada had since 1983? If not, could we hope for a POSIX thread interface? No! My reaction to your post was more about this kind of stuff than actually worrying about the number of cores/threads. I believe in giving the application designer as much control as possible over managing resources because that enables the best possible design for that specific application. After that it becomes the OS' job to make sure the applications live together harmoniously which is something most OS don't seem particularly good at.

There are things only the application knows so doing all this stuff at a higher/lower level is never going to solve the fundamental problem. A good simple example is an email client. Most people have multiple email accounts but most (all?) email clients fetch mail serially and most of them lock the UI while the mail is being fetched. This is obviously the easy, "safe" way to write an email client because serialization is hard. But there's no reason there couldn't be a UI thread so the UI stays unlocked and you could compose and search and read mail while you're fetching from *all* the mail servers in parallel. It's just that nobody did this (AFAIK). No OS is intelligent enough to split up this work into meaningful logical components. I was not involved but I worked next to a group that did a lot of work and had some success on parallelizing big workloads. It's very difficult to do from outside and there is a limit to how much this can ever work, and that's why it always degenerates to throwing more hardware at the problem. They did a good enough job to make money with their solution but only the people designing the workload have a complete understanding (ok, if they're competent, which is not a given) of what can be parallelized and what can't. The external parallelizing layers and tools aren't a real solution. They're just a po-man's way of getting an extra bump for not much effort. People have to buy more and more hardware to keep up with lamer and lamer software.
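A rough Python sketch of that email-client design, under stated assumptions: `fetch_account` is a hypothetical stand-in for a real POP3/IMAP fetch that just sleeps, and the account names and latencies are made up. The caller (standing in for the UI thread) gets control back immediately while one worker per account fetches in parallel, so the total wait is roughly the slowest server rather than the sum of all of them.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical accounts with made-up server latencies in seconds.
ACCOUNTS = {"work": 0.3, "home": 0.2, "club": 0.1}


def fetch_account(name, latency):
    """Hypothetical stand-in for a POP3/IMAP fetch: sleeps to mimic network I/O."""
    time.sleep(latency)
    return name, 3  # pretend every account had 3 new messages


def _fetch_and_report(name, latency, on_done):
    # Report from inside the worker, so the result is delivered before the future completes.
    on_done(*fetch_account(name, latency))


def fetch_all(accounts, on_done):
    """Start one worker per account and return at once, leaving the caller free."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(accounts)))
    futures = [pool.submit(_fetch_and_report, name, lat, on_done)
               for name, lat in accounts.items()]
    pool.shutdown(wait=False)  # workers keep running in the background
    return futures


results = []
start = time.monotonic()
futures = fetch_all(ACCOUNTS, lambda name, count: results.append((name, count)))
# The "UI thread" is back here right away and could keep handling input.
wait(futures)
elapsed = time.monotonic() - start  # close to the slowest server, not the sum
```

The number of workers here comes from the number of accounts (the task), not from the number of cores.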

porter
Who joined Nov. 1, 2006, 10:37 p.m.
and authored 2895 notes

Wrote the following at July 7, 2013, 12:38 a.m...

bluecode wrote:

People have to buy more and more hardware to keep up with lamer and lamer software.

Not if the solution does not depend on memory being shared; then you can run your processes in the cloud. Just rent the h/w as and when you need it.

jwp
Who joined Nov. 18, 2012, 7:14 p.m.
and authored 74 notes

Wrote the following at July 7, 2013, 12:51 a.m...

hamei wrote:

jwp wrote:

Ah, I was foolishly looking for only the plural form "Processors" rather than the singular "Processor" ... Fixed.

You realize this entire exercise is totally wrong, I hope ? It is not the application's job to determine how many processors are available or how to assign resources. That's what the operating system's scheduler is for.

Stand back and let the o.s. do its job. All you are doing with this shit is making a mess.

Your comments seem to make the presumption that one program only needs one process. Most programs only need one process, but for some things like CPU-intensive programs, build tools, server programs, libraries, etc., knowing about the resources available is important for multiprocessing. Otherwise, how many processes should the task be divided into? There are basically a few choices: (1) do one thing at a time, (2) use some magic number conjured out of thin air, or (3) make a very large number of processes (potentially wasting a lot of time and memory).

There are many very useful programs that do need to know how to divide up work in order for them to fork the correct number of processes. For example, pbzip2 (parallel bzip2) and pigz (parallel gzip) can run an order of magnitude faster than just using bzip2 and gzip, respectively. Likewise, GNU Make can take advantage of parallelism when building a project. The same goes for GNU Parallel -- that program splits up work into subprocesses and does tasks in parallel. There are also special libraries that are used specifically for parallelism, and they also need to be able to detect the number of processors.

I was reading through the code for GNU Parallel, and the code for detecting the number of processors was lacking, to say the least. If I remember correctly, it basically just works for Linux, FreeBSD, Solaris, AIX, and Darwin. Libraries and utilities specifically meant to execute tasks in parallel (and therefore remove the low-level detection from the application itself) should have better processor detection code that works for IRIX, HP-UX, NetBSD, OpenBSD, etc.

I want to create a small reference implementation that developers who need to write software like this can refer to. The script itself may not be so useful. But without some reference, they are stuck: few people have access to many of these commercial Unix systems, and some may even leave out support for OpenBSD, NetBSD, etc.

hamei
Who joined Feb. 24, 2004, 4:10 p.m.
and authored 8705 notes

Wrote the following at July 7, 2013, 2:49 a.m...

bluecode wrote:

The race conditions and buffer overflows never seem to end.

It does seem that many real problems never get fixed, while new toolkits multiply like rabbits.

Quote:

I believe in giving the application designer as much control as possible over managing resources because that enables the best possible design for that specific application. After that it becomes the OS' job to make sure the applications live together harmoniously which is something most OS don't seem particularly good at.

I think that was what I was trying to say.

Quote:

A good simple example is an email client. Most people have multiple email accounts but most (all?) email clients fetch mail serially and most of them lock the UI while the mail is being fetched. This is obviously the easy, "safe" way to write an email client because serialization is hard. But there's no reason there couldn't be a UI thread so the UI stays unlocked and you could compose and search and read mail while you're fetching from *all* the mail servers in parallel. It's just that nobody did this (AFAIK).

Ah. But this is exactly what OS/2 did. The IBM recommended practices for OS/2 programs were "For any non-trivial application (i.e., takes more than 1/10th second to complete - this was on a 386, btw) then immediately spin off three threads." One is to listen for user input, one to do the work, and I forget what the third was. At the time I used MR/2 ICE for mail - it was very uncommon to have more than one mailbox so I never tried fetching mail from several places at once, but MR/2 did spin off worker threads to fetch mail while you could browse through your mail directories, read messages in another window, write mail in another window, etc - all while other threads fetched mail in the background AND the user interface never went dead while any single task waited for completion.

This was the whole design philosophy of OS/2, in a time when Unix didn't even have posix threads.

It worked, too. It was nice. (The braindead SIQ was not so nice though - although, in a way, the crappy SIQ forced people to write programs correctly.)

Quote:

No OS is intelligent enough to split up this work into meaningful logical components.

Agreed, it isn't ... but the os scheduler is smart enough to assign resources to all the different programs which want them. So if one program needs ten threads and three processes while another needs one process with nine threads and five other processes need to run in the background, then let the scheduler do that. Don't let Joe Sixpack decide that since the computer has four cores, wull by gawd I'm gonna make my application use four cores !

Not smart.

Quote:

The external parallelizing layers and tools aren't a real solution. They're just a po-man's way of getting an extra bump for not much effort. People have to buy more and more hardware to keep up with lamer and lamer software.

I didn't mean to promote that crap. Let me hop down to making friends and influencing people below and you can see what I meant ...

jwp wrote:

Your comments seem to make the presumption that one program only needs one process.

No, my quote does not make that presumption at all. In fact, your statement scares me. Do you know how a computer works ?

Quote:

knowing about the resources available is important for multiprocessing.

For general computing and desktop use, no it is not.

Quote:

Otherwise, how many processes should the task be divided into? There are basically a few choices: (1) do one thing at a time, (2) use some magic number conjured out of thin air, or (3) make a very large number of processes (potentially wasting a lot of time and memory).

Jesus. The task should be divided up according to what needs to get done.

Using Bluecode's example of an email client :

Open the app with one thread, immediately spin off another thread to actually draw the windows, another to do the work to fill them. Thread One is the listener thread for user input. User decides to collect mail, clicks his first account and one new thread is created to collect those emails. User has two more accounts, clicks the second one and tommy@the_grill.org. One worker thread each. Three sets of mails being collected (seemingly) simultaneously, user interface still responsive. Mr User can even browse the main window and open last week's mail (another thread) or if he likes, open another window and write a happy birthday letter to his Mom (another thread.)

These threads and/or processes are all dependent on what he needs to do, not some peculiar calculation based on how many cores are available.

Now, what happens in the larger picture ? Due to the fact that an operating system has a scheduler, with a single core computer the scheduler will decide when each of these threads is handled. The scheduler will also be handling a lot of other tasks, because nowadays even a laptop is doing a lot of things "at once" (not really at once but it seems that way.)

The beauty of doing this correctly is that you can now take this exact same program and put it on a 4-core computer. Now, if the application has sixteen threads, instead of them lining up in a row like a freight train, the scheduler can hand out a task to every core as the cores become available. You can't just decide that since you have four cores you're going to have four threads. First off, there's other stuff going on in the computer. Second, that's no more effective than having four DOS computers under one piece of sheet metal. Multi-tasking isn't about having an army of little computers that don't talk to each other. It's about doing a number of things in a way which seems simultaneous (although it really isn't.)

Quote:

There are many very useful programs that do need to know how to divide up work in order for them to fork the correct number of processes.

Sure. But there is no need for them to have any idea whatsoever about how many processors the computer has. If the app needs ten threads, give it ten threads. Then let the scheduler decide how to hand out cpu time.

Here's how it works in real life - let's say you have four cores available. There are 76 tasks running (I just ran top, that's what's going on right now on my O350.) Your app opens, it adds another x # of tasks, let's say three threads to open. Are you going to choose 4 threads just because the box has four cores ? What's the point of that ? It's already processing another 76 tasks, how does 4 fit in there ? (Don't say 19 times.) You think it's going to preempt everything else just because your app wants four threads NOW, like a spoiled child ?

Bullshit. There's no reason whatsoever to know how many cores the box has. Divide the app up into what it needs, then let Mr Scheduler hand out cpu-time as a core becomes available.

Quote:

For example, pbzip2 (parallel bzip2) and pigz (parallel gzip) can run an order of magnitude faster than just using bzip2 and gzip, respectively. Likewise, GNU Make can take advantage of parallelism when building a project. The same goes for GNU Parallel -- that program splits up work into subprocesses and does tasks in parallel.

Just had this discussion with smj a little ... knowing the number of cores available is once again not the way to decide how to program this. For example, you have a 512 p O3900 and a 256k zip file. What are you going to do, check the number of cores and split it into 512 tasks ? That's stupid.

That's an extreme example but the point is valid. There are other factors that decide the proper number of threads. For general use, the number of cores available is not a factor.
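One way to fold those other factors in, sketched in Python: cap the worker count by how many useful pieces the input actually splits into, not just by the core count. The 900,000-byte minimum piece size is an assumption (it happens to be bzip2's largest block size), and `worker_count` is a hypothetical helper.

```python
import math
import os
from typing import Optional


def worker_count(input_bytes: int, min_chunk: int = 900_000, cores: Optional[int] = None) -> int:
    """Number of workers worth starting: no more than the cores, and no more than
    the number of min_chunk-sized pieces the input can usefully be split into."""
    cores = cores or os.cpu_count() or 1
    useful_pieces = max(1, math.ceil(input_bytes / min_chunk))
    return min(cores, useful_pieces)


print(worker_count(256 * 1024, cores=512))     # 1: a 256 kB file is a single piece
print(worker_count(4 * 1024 ** 3, cores=512))  # 512: enough work for every CPU
```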

Quote:

There are also special libraries that are used specifically for parallelism, and they also need to be able to detect the number of processors.

There is probably a good use for those libraries but general desktop computing is not one of them. An application should be correctly written for the job it has to do, not shoehorned into some one-size-fits-all shortcut that doesn't work worth a shit.

Quote:

I was reading through the code for GNU Parallel, and the code for detecting the number of processors was lacking, to say the least. If I remember correctly, it basically just works for Linux, FreeBSD, Solaris, AIX, and Darwin. Libraries and utilities specifically meant to execute tasks in parallel (and therefore remove the low-level detection from the application itself) should have better processor detection code that works for IRIX, HP-UX, NetBSD, OpenBSD, etc.

My instinctive feeling is to say, "GNU Parallel sounds like worthless shit."

Quote:

I want to create a small reference implementation so developers who need to write software like this can refer to the script.

I'd be the last one to criticize someone for writing software that includes Irix.

However, I disagree with the idea that you can just stuff these things into some kind of script-kiddy package. If we take Fireflop as an example, I have read that it's actually heavily multi-threaded. But the thing is, it doesn't fucking work. What it should be doing does not even resemble what it does do. That piece of crap should be a joy on a 4p box, but instead it's a nightmare. Multi-processing has to fit what the job is, not some easy-peasy 'pour in single-thread code at one end and get ha-chachachacha multi-thread code out the other end' fantasy.

You know, IBM is not a bunch of dummies. Really. The longer you hang out in computerland, the more you have to respect them.

ShadeOfBlue
Moderator
Who joined Nov. 25, 2003, 12:09 p.m.
and authored 764 notes

Wrote the following at July 7, 2013, 4:24 a.m...

hamei wrote:

Quote:

knowing about the resources available is important for multiprocessing.

For general computing and desktop use, no it is not.

This is true. The email client is a good example of how multithreaded programs should be written.

hamei wrote:

Quote:

There are many very useful programs that do need to know how to divide up work in order for them to fork the correct number of processes.

Sure. But there is no need for them to have any idea whatsoever about how many processors the computer has. If the app needs ten threads, give it ten threads. Then let the scheduler decide how to hand out cpu time.

This is partially true. It works great for IO-bound tasks, but for CPU-bound tasks it's different.

In a 3D renderer, you can trivially parallelize rendering by splitting rendering of individual lines of the output image into separate threads. However, if you ignore how many cores the system has and spawn all the threads at once, the OS scheduler will try to execute them all at the same time, there will be a lot of context switching and the program will run slower than if it knew how many cores it has.

This is an example where spawning all the threads is a bad idea, but only because the scheduler is not a batch job scheduler. If the OS had a separate scheduler for such tasks, it would be as you say.

The reason why it's slower is because all those CPU-bound threads compete for execution at the same time. Every time the scheduler swaps a thread it must save its registers and load registers of the new thread, this is called a context switch. A good scheduler will also take cache into account. If a thread has been doing some work on core 2, that core's L1 and L2 caches have data which that thread needs. If you now give that core another thread to execute, that data will eventually be replaced with the new thread's data. When the original thread resumes execution on that core, the core will have to reload all the data from main memory, which is very slow. Now imagine this repeats for every CPU-bound thread. If you spawn just the right number of threads, every thread will get its own core and things will run faster because they won't compete with each other. Even if a CPU-bound thread is interrupted by an IO-bound thread, it won't cause as much damage because IO-bound threads typically don't need a lot of cache, so our renderer thread will still have its working data in the cache the next time it gets scheduled.

For HPC workloads a batch scheduler is best, but for general desktop use the time-sharing one we have now is better. If you want to run long CPU-bound tasks on this kind of system, you need to compromise. Knowing how many cores the system has enables you to simulate a batch scheduler on top of a time-sharing one. It's a kludge, but it's what we've got.
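A minimal Python sketch of that kludge applied to the renderer example: one task per scanline, but only `os.cpu_count()` workers, so the remaining rows wait in the executor's queue instead of all fighting for the CPUs at once. `render_row` and the image size are hypothetical stand-ins for real shading. Caveat: in CPython, pure-Python CPU work in threads is serialized by the GIL, so a real renderer written this way would use `ProcessPoolExecutor`, which offers the same `map` interface.

```python
import os
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48  # hypothetical tiny image


def render_row(y):
    """Hypothetical stand-in for shading one scanline: pure CPU work, no I/O."""
    return [(x * y) % 256 for x in range(WIDTH)]


def render(height=HEIGHT):
    # One task per row, but only as many workers as cores; the rest of the
    # rows wait in the executor's queue instead of competing for CPUs.
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(height)))  # results keep row order
```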

An interesting project would be to add a batch job scheduler to UNIX. Then the OS could worry about these things and knowing how many cores the system has wouldn't be necessary anymore. On HPC systems this is typically done in userspace, but a kernel-based solution might be faster.

hamei wrote:

Quote:

For example, pbzip2 (parallel bzip2) and pigz (parallel gzip) can run an order of magnitude faster than just using bzip2 and gzip, respectively. Likewise, GNU Make can take advantage of parallelism when building a project. The same goes for GNU Parallel -- that program splits up work into subprocesses and does tasks in parallel.

Just had this discussion with smj a little ... knowing the number of cores available is once again not the way to decide how to program this. For example, you have a 512 p O3900 and a 256k zip file. What are you going to do, check the number of cores and split it into 512 tasks ? That's stupid.

This is an IO-bound task, so you are correct, splitting it up into that many pieces would be much slower.

jwp
Who joined Nov. 18, 2012, 7:14 p.m.
and authored 75 notes

Wrote the following at July 7, 2013, 4:25 p.m...

hamei wrote:

Quote:

knowing about the resources available is important for multiprocessing.

For general computing and desktop use, no it is not.

Quote:

Otherwise, how many processes should the task be divided into? There are basically a few choices: (1) do one thing at a time, (2) use some magic number conjured out of thin air, or (3) make a very large number of processes (potentially wasting a lot of time and memory).

Jesus. The task should be divided up according to what needs to get done.

Using Bluecode's example of an email client :

Open the app with one thread, immediately spin off another thread to actually draw the windows, another to do the work to fill them. Thread One is the listener thread for user input. User decides to collect mail, clicks his first account and one new thread is created to collect those emails. User has two more accounts, clicks the second one and tommy@the_grill.org. One worker thread each. Three sets of mails being collected (seemingly) simultaneously, user interface still responsive. Mr User can even browse the main window and open last week's mail (another thread) or if he likes, open another window and write a happy birthday letter to his Mom (another thread.)

These threads and/or processes are all dependent on what he needs to do, not some peculiar calculation based on how many cores are available.

Right, but this is a trivial example in which the number of tasks to be done is something small like three, and the whole thing is I/O-bound.

hamei wrote:

For general use, the number of cores available is not a factor.

Multiprocessing within a program is often for things that are not simple "general use."

For example, I have a SQLite database that is several gigabytes in size. When I want to run the backup, I use pbzip2 (parallel bzip2) which works several times faster than normal bzip2 on my computer. The database dump is a single text stream sent through a pipe, so the alternative is just to use normal bzip2 and wait for the whole task to complete, using a fraction of the computer's power. Logically it is only one task, but splitting it up into more tasks makes sense because it is CPU-bound.
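Roughly how a pbzip2-style tool can split one stream, sketched in Python: cut the input into independent chunks, compress them concurrently, and concatenate the resulting bzip2 streams. This relies on bzip2 decoders (including Python's `bz2.decompress`) accepting concatenated streams. The chunk size is an assumption, and the output is slightly larger than a single-stream file. It uses threads on the understanding that CPython's `bz2` module releases the GIL while compressing.

```python
import bz2
import os
from concurrent.futures import ThreadPoolExecutor

CHUNK = 900_000  # assumed chunk size, equal to bzip2's largest (-9) block size


def _compress(chunk):
    return bz2.compress(chunk, 9)


def parallel_bzip2(data):
    """Compress CHUNK-sized pieces concurrently and join the bzip2 streams."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
        return b"".join(pool.map(_compress, chunks))  # map preserves chunk order
```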

Another example is that I have a CPU-bound program that does a lot of string operations (some 100 million across around 10,000 files). If I just start as many processes as I need for the task, it would mean 10,000 processes sitting in memory, which would drive performance into the ground, if it were even possible at all. If I just guessed and divided it into 4 processes, then it would waste time on a machine with 16 processors.

This is somewhat alleviated by GNU Parallel, like:

Code:

$ find ~/some_files -type f | parallel -t bzip2 -9 {}

But if the cost of starting the program (bzip2 is just an example here) is a significant part of the total execution time, then it may be more efficient to handle multiprocessing in the program itself, as long as it is convenient to do so. Some languages like Python and Ruby can make this more convenient than what has historically been the case in C.
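A sketch of handling it in the program itself, in Python: split the 10,000 items into one batch per core so that expensive setup (here, hypothetically, compiling a regex) happens once per batch rather than once per file. The file names and `process_batch` are made up for illustration, and the same GIL caveat applies as with any CPU-bound Python code in threads: swap in `ProcessPoolExecutor` for real parallelism.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor


def split_batches(items, n):
    """Split items into at most n contiguous batches whose sizes differ by at most one."""
    n = max(1, min(n, len(items)))
    size, extra = divmod(len(items), n)
    batches, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def process_batch(paths):
    """Hypothetical worker: pay the setup cost once per batch, then scan every name."""
    number = re.compile(r"\d+")  # stands in for expensive per-process startup
    return [int(number.search(p).group()) for p in paths]


files = [f"file{i:05d}.txt" for i in range(10_000)]  # hypothetical file list
batches = split_batches(files, os.cpu_count() or 1)
with ThreadPoolExecutor(max_workers=len(batches)) as pool:
    results = [value for batch in pool.map(process_batch, batches) for value in batch]
```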

hamei
Who joined Feb. 24, 2004, 4:10 p.m.
and authored 8705 notes

Wrote the following at July 7, 2013, 5:32 p.m...

ShadeOfBlue wrote:

This is partially true. It works great for IO-bound tasks, but for CPU-bound tasks it's different.

I expected a bigger ass-whuppin' than that ! Of course any broad generalization is going to be wrong a lot of the time, which is why I tried to cover my butt with the "general desktop use" proviso. Didn't work tho

But I think your points are in a way the same as mine : how one handles multiple processors depends on what one needs to do, not on "how many cpu's does this box have ?"

Quote:

In a 3D renderer, you can trivially parallelize rendering by splitting rendering of individual lines of the output image into separate threads. However, if you ignore how many cores the system has and spawn all the threads at once, the OS scheduler will try to execute them all at the same time, there will be a lot of context switching and the program will run slower than if it knew how many cores it has.

I immediately think of the reason that SGI would not put faster processors in the Octane - the memory system couldn't feed them. And I'll go out on a limb and bet that the memory system in "modern" commodity computers can't feed the cpu's either.

In fact, if this were the main job of a computer, wouldn't it be better to have fifty little DOS computers with a single cpu and their own memory systems, with each one doing one task without interruption, then joining their answers at the end ? A non-multitasking cluster ? Multi-tasking operating systems may not be the best solution for every requirement ?

jwp wrote:

Multiprocessing within a program is often for things that are not simple "general use."

For example, I have a SQLite database that is several gigabytes in size. When I want to run the backup, I use pbzip2 (parallel bzip2) which works several times faster than normal bzip2 on my computer. The database dump is a single text stream sent through a pipe, so the alternative is just to use normal bzip2 and wait for the whole task to complete, using a fraction of the computer's power. Logically it is only one task, but splitting it up into more tasks makes sense because it is CPU-bound.

jwp, you're scary. Did you know that you don't have a clue ? I hope you are not in the "IT" world but fear that you probably are ?

jwp
Who joined Nov. 18, 2012, 7:14 p.m.
and authored 75 notes

Wrote the following at July 7, 2013, 9:31 p.m...

hamei wrote:

Quote:

For example, I have a SQLite database that is several gigabytes in size. When I want to run the backup, I use pbzip2 (parallel bzip2) which works several times faster than normal bzip2 on my computer. The database dump is a single text stream sent through a pipe, so the alternative is just to use normal bzip2 and wait for the whole task to complete, using a fraction of the computer's power. Logically it is only one task, but splitting it up into more tasks makes sense because it is CPU-bound.

jwp, you're scary. Did you know that you don't have a clue ? I hope you are not in the "IT" world but fear that you probably are ?

I care about performance because a lot of what I do is CPU-bound. It's just like waiting for some big rendering job because your video card isn't powerful enough. I'm not happy using just 25% of the available CPU power, needlessly waiting for some big task to complete (and the same goes for many other people). If you just write documents and surf the Web, then of course it's silly and useless to talk about these things, but not everyone is like that. And for the rest of us, what's the use of a computer if you feel like you have your hands tied behind your back?

Unix doesn't have to be crippleware. We can actually use the multiprocessing capabilities that are a native part of the operating system. People have been doing things like this for decades now, using fork(2), wait(2), exec(2), etc. Internally, all of these multiprocessing libraries and utilities are pretty much just calling fork and managing the child processes, which is a normal part of Unix programming. The only thing new is that there are now libraries and utilities that provide convenient interfaces for these features.
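The fork/wait pattern can be sketched even in plain sh: each command ending in `&` runs in a forked child, and the `wait` builtin blocks until all of them have exited, much like a C loop around fork(2) followed by wait(2). The worker here is a hypothetical stand-in for a CPU-bound job (summing squares), with the integers dealt out round-robin so each child gets a similar share. Results come back through a temporary directory made with `mktemp -d`, which is widely available but not strictly POSIX.

```shell
#!/bin/sh
# Sketch of fork/wait in plain sh; the worker is a hypothetical stand-in
# for real CPU-bound work.

# sum_squares_share ID NWORKERS LIMIT
# Sums k*k for k = ID+1, ID+1+NWORKERS, ... up to LIMIT (round-robin share).
sum_squares_share() {
    _k=$(( $1 + 1 )); _step=$2; _limit=$3; _sum=0
    while [ "$_k" -le "$_limit" ]; do
        _sum=$(( _sum + _k * _k ))
        _k=$(( _k + _step ))
    done
    echo "$_sum"
}

# parallel_sum_squares NWORKERS LIMIT
# Forks NWORKERS children, waits for all of them, then adds up their results.
parallel_sum_squares() {
    _n=$1; _limit=$2
    [ "$_n" -ge 1 ] || return 1
    _tmp=$(mktemp -d) || return 1
    _i=0
    while [ "$_i" -lt "$_n" ]; do
        # "&" forks a child; each child writes its partial sum to its own file.
        sum_squares_share "$_i" "$_n" "$_limit" > "$_tmp/$_i" &
        _i=$(( _i + 1 ))
    done
    wait    # reap every child, like calling wait(2) until none are left
    _total=0
    for _f in "$_tmp"/*; do
        _total=$(( _total + $(cat "$_f") ))
    done
    rm -rf "$_tmp"
    echo "$_total"
}
```

`parallel_sum_squares 4 100` forks four children and prints 338350, the same answer a single-process loop would give.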

ShadeOfBlue
Moderator
Who joined Nov. 25, 2003, 12:09 p.m.
and authored 764 notes

Wrote the following at July 9, 2013, 5:51 a.m...

hamei wrote:

I expected a bigger ass-whuppin' than that ! Of course any broad generalization is going to be wrong a lot of the time, which is why I tried to cover my butt with the "general desktop use" proviso. Didn't work tho

But I think your points are in a way the same as mine : how one handles multiple processors depends on what one needs to do, not on "how many cpu's does this box have ?"

I suppose. However, if you're reimplementing the functionality of OpenMP and similar lower-level libraries, then you need to know how many cores the system has at runtime. There's no other way to do these things, because the kernel lacks a proper batch job scheduler.

But this is a kludge, really. For example, the new AMD processors lie about the number of cores -- their "8-core" processors only have 4 proper cores (with FPUs and everything) and an additional 4 integer units. So if you try to run an FPU-heavy workload on one of these processors, it will run slower with 8 threads than with just 4. This is why I think a batch job scheduler should be part of the kernel: there are simply too many different architectures.
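A minimal sketch of finding the core count at runtime, covering only a few widely available interfaces: `getconf _NPROCESSORS_ONLN` (Linux/glibc, macOS), `sysctl -n hw.ncpu` (the BSDs and macOS) and GNU coreutils' `nproc`. IRIX, AIX, HP-UX and Solaris need their own commands (e.g. `hinv` on IRIX). It reports logical CPUs as the OS sees them, so on the AMD parts described above it cannot tell a shared-FPU "core" from a full one, which is exactly the kludge being complained about. The function name `cpu_count` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the number of online (logical) CPUs.
# Tries getconf, then BSD/macOS sysctl, then GNU nproc; falls back to 1.
# Counts whatever the OS reports, so shared-FPU "cores" count as full cores.
cpu_count() {
    for _probe in 'getconf _NPROCESSORS_ONLN' 'sysctl -n hw.ncpu' 'nproc'; do
        # $_probe is left unquoted on purpose so it splits into command + args;
        # a missing command or unknown variable makes us try the next probe.
        _n=$($_probe 2>/dev/null) || continue
        case $_n in
            ''|*[!0-9]*|0) continue ;;   # empty, non-numeric or zero: try next
        esac
        echo "$_n"
        return 0
    done
    echo 1   # conservative fallback when nothing answers
}
```

Typical use would be something like `make -j"$(cpu_count)"`.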

A practical example: the c-ray benchmark is flawed because it doesn't take into account how many cores the system it's running on has. If you spawn one thread for every line, the threads will fight each other and the program will run slower. The proper way of doing parallelization in c-ray would be to ditch the manual pthread implementation and just use OpenMP. With just one line of C (an OpenMP pragma), it would run properly and wouldn't have to know how many cores the system has. Also, there would be no overhead on single-processor systems: simply compile without the "-mp" switch (MIPSpro; "-fopenmp" on gcc) and voila.
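c-ray is C, and the suggested fix is an OpenMP pragma; the following is not that, only a shell sketch of the same underlying idea: a fixed number of workers, each taking one contiguous band of rows, instead of one thread per row. `band`, `render_row` and `render_image` are hypothetical names, and `render_row` is a placeholder for the real per-row work. In practice the worker count would come from the machine's core count.

```shell
#!/bin/sh
# Sketch: split NROWS rows into one contiguous band per worker, so only
# NWORKERS processes exist no matter how tall the image is.

# band W N R: print "first last" (0-based, inclusive) rows for worker W of N.
# An empty band comes out as first > last.
band() {
    echo "$(( $1 * $3 / $2 )) $(( ($1 + 1) * $3 / $2 - 1 ))"
}

# render_row ROW: hypothetical placeholder for the real per-row work.
render_row() {
    echo "row $1"
}

# render_image NROWS NWORKERS: render all rows with NWORKERS background
# workers, then stitch their output together in row order.
render_image() {
    _rows=$1; _nw=$2
    _out=$(mktemp -d) || return 1
    _w=0
    while [ "$_w" -lt "$_nw" ]; do
        (
            set -- $(band "$_w" "$_nw" "$_rows")   # $1 = first row, $2 = last row
            _r=$1
            while [ "$_r" -le "$2" ]; do
                render_row "$_r"
                _r=$(( _r + 1 ))
            done
        ) > "$_out/$_w" &
        _w=$(( _w + 1 ))
    done
    wait
    # Concatenate bands in worker order so rows come out in order.
    _w=0
    while [ "$_w" -lt "$_nw" ]; do
        cat "$_out/$_w"
        _w=$(( _w + 1 ))
    done
    rm -rf "$_out"
}
```

With 10 rows and 4 workers the bands are rows 0-1, 2-4, 5-6 and 7-9; the per-worker files are concatenated in worker order, so the rows come out in order regardless of which worker finishes first.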

hamei wrote:

I immediately think of the reason that SGI would not put faster processors in the Octane - the memory system couldn't feed them. And I'll go out on a limb and bet that the memory system in "modern" commodity computers can't feed the cpu's either.

Memory is still the main bottleneck, apart from I/O, yes

This is why processors have caches, but for random memory access patterns, they don't do much.

Also, on the Octane, both processors share the same bus. I forgot the exact bandwidth of that bus, but it was around 1GB/s. If both processors need something they don't have in the cache, that's effectively half a GB/s for each CPU, which is kind of slow today (but not when the Octane was released).

hamei wrote:

In fact, if this were the main job of a computer, wouldn't it be better to have fifty little DOS computers with a single cpu and their own memory systems, with each one doing one task without interruption, then joining their answers at the end ?

This is exactly how modern GPUs work

They have thousands of independent tiny cores, which work on a stream of data.

However, only a few problems can benefit from such an architecture. You often need to have some sort of communication between various cores.

SGI also had reconfigurable FPGAs that you could attach to an O3k or Altix system. That was a pretty good idea, but probably too expensive to be of much use.

hamei wrote:

A non-multitasking cluster ? Multi-tasking operating systems may not be the best solution for every requirement ?

Of course

Most of the old computers used in HPC, e.g. Crays, didn't have multitasking. There was just a simple interface for the batch job dispatcher, so the system ran only one program at a time.

Now HPC systems use Linux and have a similar job dispatcher implemented in userspace. At least this is how things work in the larger systems I've seen.

jwp wrote:

All of these multiprocessing libraries and utilities internally are pretty much just calling fork and managing the child processes, which is a normal part of Unix programming.

True, but spawning processes and switching between them is slow. This is why most libraries used in HPC use threads.

This thread has really veered off course

I've tested the script on a MacBook Pro with Mac OS X 10.6.8 and it works fine:

Code:

  Darwin icarus.lan 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64
  
  2

hamei
Who joined Feb. 24, 2004, 4:10 p.m.
and authored 8705 notes

Wrote the following at July 10, 2013, 4:22 a.m...

ShadeOfBlue wrote:

... if you're reimplementing the functionality of OpenMP and similar lower-level libraries, ... The proper way of doing parallelization in c-ray would be to ditch the manual pthread implementation and just use OpenMP.

Speaking of OpenMP, was talking to the main developer for GraphicsMagick a while back. We aren't able to make OpenMP work in Irix. (He gave directions, I pushed buttons, in case you were worried.) It would build but then crash badly. Even though SGI invented OpenMP, it has problems with the current implementation

When we were done talking there was some doubt in my mind whether it was really working in Linux, either

If anyone in Texas has a dually SGI of some sort, a loan to the GraphicsMagick people might be a positive move ... unlike some unnamed developers, the GM guy really worked at making the code build with MIPSpro.

bluecode
Who joined Dec. 13, 2011, 7:56 a.m.
and authored 192 notes

Wrote the following at July 10, 2013, 6:12 a.m...

ShadeOfBlue wrote:

Most of the old computers used in HPC, e.g. Crays, didn't have multitasking. There was just a simple interface for the batch job dispatcher, so the system ran only one program at a time.

I never used a Cray so I don't know, but this statement just sounds wrong. For one thing, there is no contradiction between an operating system that only runs one program at a time [if that was ever accurate about Cray] and multitasking. One program can certainly multitask; this is how we do things on IBM. When there are multiple processors we get the advantage of them; if not, we still get the advantage of correct division of work and the advantage of having more workunits available for dispatching among all work on the system. This was something I alluded to earlier in the thread.

I did a quick search and I saw a document from 1986 on the CFT77 Compiler which says "CFT77 is a multipass, optimizing, vectorizing, and multitasking compiler.." See here:
http://archive.computerhistory.org/resources/text/Cray/Cray.CFT77.1986.102646186.pdf

This page http://www.ecmwf.int/services/computing/overview/supercomputer_history.html also briefly documents their use of multitasking on a Cray in 1984.

IBM mainframes have had multitasking support in hardware and software from the beginning of S/360 or shortly thereafter (I can point you to documentation from 1965 verifying this), and they were never considered supercomputers, so surely Cray had multitasking support, or what was the point of those multiprocessors?

_________________
Paint It Blue

ShadeOfBlue
Moderator
Who joined Nov. 25, 2003, 12:09 p.m.
and authored 764 notes

Wrote the following at July 10, 2013, 2:16 p.m...

hamei wrote:

Speaking of OpenMP, was talking to the main developer for GraphicsMagick a while back. We aren't able to make OpenMP work in Irix. (He gave directions, I pushed buttons, in case you were worried.) It would build but then crash badly. Even though SGI invented OpenMP, it has problems with the current implementation

It's possible to crash the program if the functions used inside an OpenMP block aren't thread-safe. Sometimes it's also necessary to manually specify which variables are shared and private, so that the compiler can handle accesses to them properly. Another thing is that MIPSpro doesn't support the latest OpenMP version; it's one or two versions behind, so that might be a possible reason.

A quick check would be to compile it with the gcc 4.7.1 I built (the older Nekoware package had broken OpenMP support). If it works fine, it's probably a version problem, but if it doesn't, it's probably related to the IRIX C/C++ library.

As with all multithreaded programs, there are a number of other possible causes

bluecode wrote:

I never used a Cray so I don't know, but this statement just sounds wrong. For one thing, there is no contradiction between an operating system that only runs one program at a time [if that was ever accurate about Cray] and multitasking. One program can certainly multitask; this is how we do things on IBM.

We probably think the same thing but use different definitions of multitasking

I meant that in the same way we have multitasking on standard computers today -- a time-shared execution of multiple different programs. The first Crays didn't have that, though a single program could still use more than one processor, and the later ones ran UNIX anyway.

bluecode
Who joined Dec. 13, 2011, 7:56 a.m.
and authored 192 notes

Wrote the following at July 10, 2013, 9:33 p.m...

ShadeOfBlue wrote:

hamei wrote:

bluecode wrote:

I never used a Cray so I don't know, but this statement just sounds wrong. For one thing, there is no contradiction between an operating system that only runs one program at a time [if that was ever accurate about Cray] and multitasking. One program can certainly multitask; this is how we do things on IBM.

We probably think the same thing but use different definitions of multitasking

I meant that in the same way we have multitasking on standard computers today -- a time-shared execution of multiple different programs. The first Crays didn't have that, though a single program could still use more than one processor, and the later ones ran UNIX anyway.

That's called multiprogramming [1] [2] as far as I know.

[1] http://publib.boulder.ibm.com/infocenter/zos/basics/topic/com.ibm.zos.zconcepts/zconcepts_75.htm
[2] http://research.microsoft.com/en-us/um/people/gbell/computer_structures__readings_and_examples/00000294.htm

ShadeOfBlue
Moderator
Who joined Nov. 25, 2003, 12:09 p.m.
and authored 764 notes

Wrote the following at July 11, 2013, 3:14 a.m...

bluecode wrote:

That's called multiprogramming [1] [2] as far as I know.

I've seen it called that too, but the newer literature seems to prefer "multitasking" (e.g. http://en.wikipedia.org/wiki/Multitasking#Preemptive_multitasking.2Ftime-sharing ).

hamei
Who joined Feb. 24, 2004, 4:10 p.m.
and authored 8705 notes

Wrote the following at July 11, 2013, 5:13 a.m...

ShadeOfBlue wrote:

bluecode wrote:

That's called multiprogramming [1] [2] as far as I know.

I've seen it called that too, but ...

IBM historically speaks its own language. RIPL ? What's that ? Oh, "netboot"

_________________
waiting for flight 1203 ...
