Geoman wrote:
Sherry Chang - Parallelization of Codes on the SGI Origins
http://people.nas.nasa.gov/~schang/origin_parallel.html
Ideally, one would like to have 100% of a code executed in parallel. In reality, this is never achieved. If for some segments of a code, a dependence exists between program statements when the order of statement execution affects the results of the program, these segments must be executed in serial. Dependency is usually the cause why a code can not be well parallelized.
Nice find, but I will now start a fight ... from the standpoint of a desktop (and quite possibly a bigger machine), she is entirely wrong. The Unix people have their heads in a dark place about this.
Her first sentence is a dead giveaway that she hasn't got a clue. To put this into a different context: if you have three grains of rice to take to the market, do you want to put one each into separate dump trucks and drive them there ? Obviously what you want to do is parallelize the things that can be improved by being processed in parallel and leave the other stuff alone.
This means you have to ask
"What am I trying to accomplish ?"
instead of
"How can I make this code run on eight processors ?"
For a trivial example, let's take zipping or unzipping. If you have a six megabyte file to compress and four processors, would it make sense to split the file into quarters and let each processor work on one piece ? I would guess so.
But if the file to be zipped is 256k and you have 512 processors, are you going to split it into 512 pieces of 500 bytes and expect this to work well ?
Obviously not, that would be stupid. But this is the approach she is advocating. This whole fixation with determining how many processors a box has and then splitting a task into exactly that many parts is wrongheaded.
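To make the point concrete, here is a rough sketch of what I mean by letting the task pick the split (plain C with pthreads and zlib, nothing Irix-specific). MIN_CHUNK, MAX_WORKERS, compress_buffer() and compress_chunk() are names and numbers I pulled out of the air for illustration; the only point is that the worker count falls out of the data size, not out of asking the box how many processors it has.

/* Sketch: size the split by the task, not the processor count.
 * Chunks smaller than MIN_CHUNK are not worth a thread, so a 256k
 * input gets one worker no matter how many CPUs the machine has.
 * The per-chunk work here is zlib's compress(), standing in for
 * whatever the real job is. */
#include <pthread.h>
#include <stdlib.h>
#include <zlib.h>

#define MIN_CHUNK   (1 << 20)   /* 1 MB: smallest piece worth a thread (made up) */
#define MAX_WORKERS 16          /* sanity cap, NOT the CPU count (made up) */

struct job {
    const unsigned char *src;   /* this worker's slice of the input  */
    size_t len;
    unsigned char *dst;         /* compressed output for this slice  */
    uLongf dst_len;
};

static void *compress_chunk(void *arg)
{
    struct job *j = arg;
    j->dst_len = compressBound(j->len);        /* worst-case output size */
    j->dst = malloc(j->dst_len);
    if (j->dst)
        compress(j->dst, &j->dst_len, j->src, j->len);   /* zlib one-shot */
    return NULL;
}

/* Compress 'len' bytes at 'buf' in independently deflated chunks. */
static void compress_buffer(const unsigned char *buf, size_t len)
{
    size_t nworkers = len / MIN_CHUNK;                   /* sized by the task */
    if (nworkers < 1)           nworkers = 1;            /* never zero        */
    if (nworkers > MAX_WORKERS) nworkers = MAX_WORKERS;  /* never silly       */

    pthread_t  tid[MAX_WORKERS];
    struct job jobs[MAX_WORKERS];
    size_t chunk = len / nworkers;

    for (size_t i = 0; i < nworkers; i++) {
        jobs[i].src = buf + i * chunk;
        jobs[i].len = (i == nworkers - 1) ? len - i * chunk : chunk;
        pthread_create(&tid[i], NULL, compress_chunk, &jobs[i]);
    }
    for (size_t i = 0; i < nworkers; i++)
        pthread_join(tid[i], NULL);

    for (size_t i = 0; i < nworkers; i++)
        free(jobs[i].dst);      /* results would be written out in a real program */
}

With a 256k input that arithmetic gives you one worker, full stop; with a 64 MB file it gives you sixteen. The 512-processor box never enters into it.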
For one thing, a multi-tasking computer is doing other stuff, so you never get that many processors anyhow. Most likely it is doing several other things, so using the processor count p to size your parallelism is senseless. You're never going to get that many resources no matter what, so quit fixating on that fantasy.
For a second thing, the amount of parallelism should be determined by the task, as demonstrated by the zip example. So why not first figure out the most efficient way to split up the task, then hand over however many threads you decided on to the operating system's scheduler ? It has a hell of a lot more information about the current situation than any applications programmer ever will, so let it do its job. I know this is assuming that the people making operating systems are the top guys (not always a warranted assumption), but hey. Between the people building an OS and the people creating gtk3, I'll have to go with the kernel creators.
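Same idea in OpenMP terms, since that is what the article spends its time on (generic OpenMP here, nothing Irix-specific; work_on() and ITEMS_PER_THREAD are made-up placeholders): derive the thread count from the amount of work, request it with num_threads, and leave the placement of those threads entirely to the scheduler instead of starting from omp_get_num_procs().

/* Sketch: thread count comes from the work, placement is the
 * scheduler's problem. Note there is no query of the processor
 * count anywhere in here. */
#include <omp.h>
#include <stddef.h>

#define ITEMS_PER_THREAD 4096   /* made-up "worth a thread" threshold */

extern void work_on(double *item);   /* stand-in for the real per-item job */

void process(double *items, size_t n)
{
    int nthreads = (int)(n / ITEMS_PER_THREAD);   /* sized by the task */
    if (nthreads < 1)
        nthreads = 1;

    #pragma omp parallel for num_threads(nthreads) schedule(static)
    for (long i = 0; i < (long)n; i++)
        work_on(&items[i]);
}

If the scheduler only has two idle CPUs at the moment, that is its problem to sort out, and it has far more information to sort it out with than the application ever will.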
IBM figured this out twenty-five years ago. Why are the Unix people so thick-headed about this subject ?
(One additional nasty little feature here is that she spends a great deal of time talking about OpenMP. But OpenMP on Irix does not work with pthreads. This is not going to be a joy for anyone writing average daily-use parallel programs on Irix. sproc, totally different from what everyone else uses, oh goody.)