Parallelization of Codes on the SGI Origins

Sherry Chang - Parallelization of Codes on the SGI Origins

http://people.nas.nasa.gov/~schang/origin_parallel.html

Ideally, one would like to have 100% of a code executed in parallel. In reality, this is never achieved. If a dependence exists between program statements in some segments of a code, that is, when the order of statement execution affects the results of the program, those segments must be executed serially. Such dependencies are usually the reason a code cannot be well parallelized.
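The limit she describes is the usual Amdahl's-law ceiling: if a fraction s of the run time is inherently serial, the best possible speedup on n CPUs is 1/(s + (1-s)/n). A quick sketch (the 5% serial fraction is a made-up number for illustration):

```python
def amdahl_speedup(serial_fraction: float, n_cpus: int) -> float:
    """Best-case speedup when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

# Even a small serial segment caps the payoff of adding CPUs:
for n in (2, 8, 64, 512):
    print(n, round(amdahl_speedup(0.05, n), 2))
```

With a 5% serial fraction, even 512 CPUs deliver less than a 20x speedup, which is why the serial segments dominate the tuning effort on big machines.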
:Octane2: 2xR12000 400MHz, 4GB RAM, V12
SGI - the legend will never die!!
Geoman wrote: Sherry Chang - Parallelization of Codes on the SGI Origins

http://people.nas.nasa.gov/~schang/origin_parallel.html

Ideally, one would like to have 100% of a code executed in parallel. In reality, this is never achieved. If a dependence exists between program statements in some segments of a code, that is, when the order of statement execution affects the results of the program, those segments must be executed serially. Such dependencies are usually the reason a code cannot be well parallelized.

Nice find but I will now start a fight ... from the standpoint of a desktop (and quite possibly a bigger machine) she is entirely wrong. The Unix People have their heads in a dark place about this.

Her first sentence is a dead giveaway that she hasn't got a clue. To put this into a different context, if you have three grains of rice to take to the market, do you want to put one each into separate dump trucks and drive them there ? Obviously what you want to do is parallelize the things that can be improved by being processed in parallel and leave the other stuff alone.

This means you have to ask "what am I trying to accomplish ?" instead of "how can I make this code run on eight processors ?"

For a trivial example, let's take zipping or unzipping. If you have a six megabyte file to compress and four processors, would it make sense to split the file into quarters and let each processor work on one piece ? I would guess so.

But if the file to be zipped is 256k and you have 512 processors, are you going to split it into 512 pieces of 512 bytes each and expect this to work well ?

Obviously not, that would be stupid. But this is the approach she is advocating. This whole fixation with determining how many processors a box has, then splitting a task into that many parts is stupid.
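The rice-and-dump-trucks point can be made concrete: size the chunks from the data, with a floor on chunk size, and only then cap by worker count. A rough sketch (the 1 MiB floor is an arbitrary assumption, not a measured optimum):

```python
MIN_CHUNK = 1 << 20  # 1 MiB floor: below this, per-chunk overhead dominates (assumed)

def plan_chunks(file_size: int, max_workers: int) -> int:
    """Pick a chunk count from the data size, capped by available workers."""
    if file_size <= 0:
        return 0
    by_size = max(1, file_size // MIN_CHUNK)  # how many chunks the data justifies
    return min(by_size, max_workers)          # more chunks than workers won't help

# 6 MB file, 4 workers -> 4 chunks; 256 KB file, 512 workers -> 1 chunk
print(plan_chunks(6 << 20, 4), plan_chunks(256 << 10, 512))
```

The task size drives the split; the worker count is only an upper bound, never the starting point.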

For one thing, a multi-tasking computer is doing other stuff so you never get that many processors anyhow. Most likely it is doing several other things, so sizing your parallelism by the processor count p is plain foolish. You're never going to get that many resources no matter what, so quit fixating on that fantasy.

For a second thing, the amount of parallelism should be determined by the task, as the zip example demonstrates. So why not first figure out the most efficient way to split up the task, then hand however many threads you decided on to the operating system's scheduler ? It has a hell of a lot more information about the current situation than any application programmer ever will, so let it do its job. I know this assumes that the people making operating systems are the top guys (not always a warranted assumption) but hey. Between the people building an OS and the people creating gtk3, I'll go with the kernel creators.
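One way to read that advice in code: split the work by what the task needs, submit the pieces, and let the pool and the OS scheduler place them. A minimal stdlib sketch (the crunch function is a stand-in for real work):

```python
from concurrent.futures import ThreadPoolExecutor

def crunch(piece):
    # stand-in for real work on one naturally-sized piece
    return sum(piece)

def run(pieces):
    # Pool size bounded by the number of pieces the task produced, not by a
    # CPU count probed up front; the OS scheduler decides where each runs.
    with ThreadPoolExecutor(max_workers=max(1, len(pieces))) as pool:
        return list(pool.map(crunch, pieces))

print(run([[1, 2], [3, 4], [5]]))  # -> [3, 7, 5]
```

The application decides how to split the work; placement and timing are left to the scheduler, which knows the machine's current load.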

IBM figured this out twenty-five years ago. Why are the Unix people so thick-headed about this subject ?

(One additional nasty little feature here is that she spends a great deal of time talking about OpenMP. But OpenMP on Irix does not work with pthreads. This is not going to be a joy for anyone writing average daily-use parallel programs on Irix. Sproc, totally different from everyone else, oh goody.)
he said a girl named Patches was found ...
Wow! That's an extensive analysis - as interesting to read as the original article. Thank you
:Octane2: 2xR12000 400MHz, 4GB RAM, V12
SGI - the legend will never die!!
Geoman wrote: Wow! That's an extensive analysis - as interesting to read as the original article.

Well, take it with a pound of salt. But I've used smp on slow processors on a system that has it figured out, and Unix is terribly disappointing in comparison. An O350 should be much more responsive than an old Z-Pro with a pair of Pentium Pro's, but it isn't :(

And that lady's fixation on OpenMP ... except for the fact that you can't use it on Irix with pthreads, I s'pose it's great. That includes libraries, so for a desktop ....
he said a girl named Patches was found ...
hamei wrote: An O350 should be much more responsive than an old Z-Pro with a pair of Pentium Pro's, but it isn't :(

if that's the case then you have the worst config possible or the thing is simply broken :P
r-a-c.de
She was writing about the kinds of codes running on NASA Origin systems, systems that would run multi-day (or even multi-week) jobs pegging dozens/hundreds of cpus at 99+% sustained utilization. For people doing that kind of stuff, a 16 cpu Origin is a development/testing machine, a 32-64 cpu one is the one you use for simple, light duty jobs, and you think very carefully about optimizing what you put on the 512 cpu machine, because once you start the job, that machine is not available to anyone else for a week. Fine, her first sentence was an overstatement, but I'm sure her intended audience understood it in the context of those kinds of jobs. Her intended audience almost certainly was not interested in writing interactive desktop applications. In other words, your interactive use case is not at all what she was writing about (though a lot of her points are applicable to certain kinds of desktop jobs).

Any programmer who understands what she wrote would not make the kinds of architectural mistakes you suggested, hamei. To be fair, though, I have been amazed by the number of developers (on any platform) who really don't understand parallelization at all.
josehill wrote: To be fair, though, I have been amazed by the number of developers (on any platform) who really don't understand parallelization at all.

on some days i could even rephrase that to:
"I have been amazed by the number of developers (on any platform) who really don't understand."

:P
r-a-c.de
foetz wrote: on some days i could even rephrase that to:
"I have been amazed by the number of developers (on any platform) who really don't understand."

:P

;)
josehill wrote: She was writing about the kinds of codes running on NASA Origin systems

Hence the very first sentence, "for desktop use" :D

Fine, her first sentence was an overstatement, but I'm sure her intended audience understood it in the context of those kinds of jobs.

I think I disagree with you here. I think those people actually think that way. They are living in a DOS world where THEY control the computer ! THEY decide what happens when and where ! THEY make the choices !

I think this is a problem.

Her intended audience almost certainly was not interested in writing interactive desktop applications.

True. But unfortunately ...

... a lot of her points are applicable to certain kinds of desktop jobs).

THERE is the problem. Her points are not applicable to desktop jobs. Desktops should be entirely different, yet they are not because too many people read this stuff intended for huge machines running large scientific codes and think that's how it should be.

It shouldn't. A desktop is not an Origin 3800 any more than a Mack truck is a Lotus Elan.

Any programmer who understands what she wrote would not make the kinds of architectural mistakes you suggested, hamei.

Then there is no one in the Unix or Linux kermyooonity who understands what she wrote, because that's exactly what they do for desktops all the time. It hasn't been two months since jwp here was writing a little utility to determine the number of processors in a box so he could set the number of CPUs in an application. That is so wrong.

And it's not just him. It's all of them. I bet there isn't a single Linux application that is properly written for multi-tasking, multi-user computing. Not one.

To be fair, though, I have been amazed by the number of developers (on any platform) who really don't understand parallelization at all.

Agreed 10,000%.
he said a girl named Patches was found ...
I stopped reading at

Before parallelizing a code, a programmer should optimize his code for single CPU execution
Speaking of big Origins, I just found this. Pretty neat !

http://www.rent-a-sgi.com/
he said a girl named Patches was found ...
hamei wrote: Speaking of big Origins, I just found this. Pretty neat !

http://www.rent-a-sgi.com/


Not very far away from me actually -- in Switzerland!
:Octane2: 2xR12000 400MHz, 4GB RAM, V12
SGI - the legend will never die!!
that's a neat idea. switzerland has always been a bit more flexible and open compared to other countries :-)
r-a-c.de
foetz wrote:
hamei wrote: An O350 should be much more responsive than an old Z-Pro with a pair of Pentium Pro's, but it isn't :(

if that's the case then you have the worst config possible or the thing is simply broken :P

You could be correct but I don't think so :(

In reference to this article and in general, I don't think Unix programmers understand multi-tasking at all . You yourself have noticed that the O350 will block you out for periods of time. And your O350 is doubtless better than mine :D Thunderbird just blocked on me a few minutes ago, I've had terminals stay unresponsive for periods up to ten seconds, Fireflop is a pig in this respect: connecting to ... connecting to ... connecting to ... The people who wrote that abortion don't have a clue.

Okay, that's not correct. The truth is they don't care about the user, not even a tiny little bit. They are interested in jamming their shit down our throats no matter what -- those are the real Mozilla Foundation Goals. Their propaganda is, quite simply, lies.

Anyway, I could be mistaken but I've looked and looked at Linux "how to program with threads" docs and never found anything that explains why like this :

http://www.edm2.com/index.php/Writing_M ... s_Programs

IBM was adamant about user-responsiveness. No one else seems to give a shit. Or even understand why you would want to. They all want to control the volume themselves, to hell with the person who bought and paid for that computer :P
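The core rule in that IBM-style advice, as I read it: never do slow work on the thread that talks to the user. A minimal stdlib sketch of the pattern (the slow job here is simulated with a sleep):

```python
import queue
import threading
import time

def run_responsive(n):
    """Run a slow job on a worker thread while the caller stays free."""
    results = queue.Queue()

    def slow_job():
        time.sleep(0.05)  # simulated disk/network wait
        results.put(n * n)

    worker = threading.Thread(target=slow_job)
    worker.start()
    # The user-facing thread is NOT blocked inside slow_job; in a real GUI it
    # would keep pumping events and just poll the queue for the answer.
    value = results.get(timeout=2.0)
    worker.join()
    return value

print(run_responsive(7))  # -> 49
```

In a real desktop program the event loop would check the queue between input events instead of blocking on get(), which is exactly the responsiveness discipline the article argues for.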
he said a girl named Patches was found ...
hamei wrote: You yourself have noticed that the O350 will block you out for periods of time

which turned out to be a disk issue (and was no issue for normal operation anyway). but even with the bad disk it ran like hell. compared to a pentium pro they're worlds apart in absolutely every aspect.

as for smp, i used a 16 cpu origin 2000 for pretty much all of the 6.5 packages i provided over the years and i can assure you that multiple cpus and shared memory, especially the way sgi did it, does pay off :D
r-a-c.de
foetz wrote: as for smp, i used a 16 cpu origin 2000 for pretty much all of the 6.5 packages i provided over the years and i can assure you that multiple cpus and shared memory, especially the way sgi did it, does pay off :D

Exhibit A for the defense : Fireflop :D
he said a girl named Patches was found ...
hehe, of course smp only works if your proggy supports it. but i've taken that for granted given that this thread is about that very subject :P
r-a-c.de
foetz wrote: hehe, of course smp only works if your proggy supports it.

When multi-core chips became the norm I was all excited that application programmers would finally have to figure out smp.

Alas, I was wrong :(

but i've taken that for granted given that this thread is about that very subject :P

Yeah. What percentage of programs are run 24/7 on 512p machines to determine how soon the climate will implode ? Yet this is what we are faced with from all the Linux-centric Programmers' Guild.

I wish they'd get their heads out of their butts (see Mozilla Corporation, aka Firefox if you disagree).

btw, that lady's degree is in chemistry and she's from Taiwan : a recipe-programmer telling other people how to parallelize for an environment that's not applicable to 95% of normal uses. But it bleeds over into what application programmers think is the right way to do things. And no, all my commercial programs think they are a big fat (expensive) DOS program also :(

We're living in the sunset of the world.
he said a girl named Patches was found ...
hamei wrote: When multi-core chips became the norm I was all excited that application programmers would finally have to figure out smp.

Alas, I was wrong :(

i'm afraid so. the number of daily use programs that can utilize more than one cpu or core is pretty much zero. one of the main reasons why multi-core cpus for the consumer market are a joke
r-a-c.de
So it would be wise to remove the dual R12K and insert a single R14K 600, because I too often see 49% CPU usage in top :-/
:Octane2: 2xR12000 400MHz, 4GB RAM, V12
SGI - the legend will never die!!