Fractal benchmark

Hello,

here is a little program that can be used as a benchmark in a multiprocessor environment.
It calculates three beautiful (?) fractal images:

http://www.martin-steen.de/sgi/frbatch.tar.gz

It makes heavy use of POSIX threads, so the program benefits a lot from systems
with more than one CPU. In effect, it is a multithreaded floating-point benchmark.

After unpacking, run the script "./run1.sh" for a fast benchmark. "./run2.sh" runs a
slower benchmark that creates bigger images. Both scripts use four threads,
but you can adjust them to use more threads or create even larger images
(8000x8000 pixels are no problem, but it takes a while).

When the benchmark has finished, another small program converts the images from Targa
format to JPEG. The programs are compiled for 64-bit machines (-64 -mips4).

On an SGI Fuel, the first script takes about 120 seconds.
[Image: fract_0088_8.jpg]

Best regards,
Martin
legalize wrote: The executable in there is a MIPS/IRIX executable?


Yes, the files "fractbatch" and "tga2jpeg" are 64-bit MIPS executables.

--
My collection so far ;)

:Fuel: :Octane:
@Martin Steen: Thank you very much!
I didn't dare ask you to publish the fractal generator from the "SGI Art Gallery" thread; I thought you might want to keep it closed, but you finally released it :-D
:Indy: :O2: :O2: :Indigo: :Indigo2IMP: :Octane: :Octane2: :Octane2:
SGI - the legend will never die!!
Here we go:

Code:

Octane2 8% ./run1.sh
Fractal benchmark / by Martin Steen
Size   =256x256
Iter   =16384
Threads=4
Render 1 of 3: fract_0088_2.tga
InitThreads OK
....................................................
Render 2 of 3: fract_0088_8.tga
InitThreads OK
....................................................
Render 3 of 3: fract_0088_9.tga
InitThreads OK
....................................................

real    2m38.56s
user    2m37.17s
sys     0m0.18s


~159 seconds on a dual R12K400 Octane.

As for CPU load, top gave me:

Code:

2 CPUs: 47,6% idle, 51,6% usr,  0,8% ker,  0,0% wait,  0,0% xbrk,  0,0% intr
PID       PGRP USERNAME PRI  SIZE   RES STATE    TIME WCPU% CPU% COMMAND
1399       1398 robert   +20 5472K 2864K run/1    0:27 96,3 96,23 fractba
Geoman wrote: ~159 seconds on a dual R12K400 Octane.

Hmmm.. looks like only one CPU is working. Unfortunately, I don't own an SGI with more than
one CPU. On my Linux computer with a Phenom quad-core, all four cores are 96% busy
(using the same source code).

So, since you have a dual-CPU Octane2, you have to be my beta tester now ;)

Thank you for testing and best regards,
Martin
Wait! :-)

Seems like the Octane has to warm up a little bit ^_^
Running the second script, your renderer uses both CPUs in stage 3:

Code:

Octane2 10% ./run2.sh
Fractal benchmark / by Martin Steen
Size   =2048x2048
Iter   =16384
Threads=4
Render 1 of 3: fract_0088_2.tga
InitThreads OK
....................................................
Render 2 of 3: fract_0088_8.tga
InitThreads OK
....................................................
Render 3 of 3: fract_0088_9.tga
InitThreads OK
................


Code:

2 CPUs:  1,3% idle, 94,8% usr,  3,3% ker,  0,1% wait,  0,0% xbrk,  0,5% intr
PID       PGRP USERNAME PRI  SIZE   RES STATE    TIME WCPU% CPU% COMMAND
1402       1401 robert   +20 5696K 3088K run/0  116:09  154 160,1 fractba
Geoman wrote: Running the second script, your renderer uses both CPUs in stage 3.

Thank you for the report. It seems there is still a threading issue that I have
to track down.

Best regards, Martin
I can give it a try on my 16CPU Origin tomorrow (hope the rainy weather continues :) ).

Also, on MIPS (and other properly designed architectures), 64-bit apps usually run *slower* than 32-bit ones because pointers and longs are twice as large, so compile for 64-bit only if your app needs more than 2GB of memory.
ShadeOfBlue wrote: I can give it a try on my 16CPU Origin tomorrow (hope the rainy weather continues :) )...


That would be great! I can also give you the source code so you can compile your own
version (MIPSpro compiler needed).

Best regards,
Martin
Martin Steen wrote: It seems there is still a threading issue that I have
to track down.


Well, for me, your program runs really smoothly, even at 97,4% usr. I can still work while it is rendering.

Seti, on the other hand, stalls the Octane while working on 2 datasets; it becomes totally unresponsive then.
ShadeOfBlue wrote: I can give it a try on my 16CPU Origin tomorrow (hope the rainy weather continues :) )...


You have a water-cooled Origin? :P

SCNR,
Joerg

Hello Joerg, hello ShadeOfBlue, hello Geoman,

if you dare to run the app on your multiprocessor machines, here is the source code for the
fractal batch program:
http://www.martin-steen.de/sgi/fractbatch-source.tar.gz

Just type "make" to build the executable.
Run ./testrun.sh to create three (simple) fractals

This test is not a real challenge for a machine with 16 CPUs.
To make it more interesting, you can try:

Code:

./fractbatch 8192 8192 1024 32 frects0089.txt


This creates the three images with a size of 8192x8192 pixels
and uses 32 threads.

Remarks:
- The limit for Targa pictures is 32768x32768 pixels.
- Maybe you know compiler switches that produce a faster executable.

Best regards,
Martin
The "testrun.sh" consistently uses only one CPU and takes about 82 seconds (on a 16P 400MHz O3k, compiled with MIPSpro 7.4.4 and its defaults, "-DEFAULT:abi=n32:isa=mips4:proc=r12k", plus the "-O2" from the Makefile).

The other command uses about 10-12 CPUs and is still running :)

When it finishes, I will try to get the first test to run on all CPUs (will probably require setting the number of threads to 64 or somesuch).
joerg wrote: You have a water-cooled Origin? :P

:lol:

These things are like little furnaces: the room heats up by 4°C in an hour. Excellent to have in winter, not so much when it's 35°C outside :)

Anyhow, the weather is favorable, I shall power it up and open the balcony, expect results soon :)
Geoman wrote:
Thanks for sharing the source code. It built with MIPSpro 7.4.4m without any errors or warnings.

Now I'm torturing the Octane with it:
Code:
./fractbatch 8192 8192 1024 32 frects0089.txt


I get 187,8 CPU% right at the beginning.


Ok, then everything seems to work fine! :)

There are two strange warnings that I turned off (-woff). They have something to do with
the STL ("A template was detected during header processing"). I don't know what the message
means, and g++ doesn't show this warning, so I switched it off.

Thanks for testing and best regards,
Martin
Update:
The long test took 1935 seconds and used 11 CPUs on average.

I re-did the short test with 32 threads: 9.3 seconds, ~9 CPUs.
The program dumps core at 40+ threads :?
Code:
Trace/BPT/RangeErr/DivZero/Ovflow trap (core dumped)
ShadeOfBlue wrote: The "testrun.sh" consistently uses only one CPU...


Thank you for starting your huge machine for this test!

It looks like IRIX does not use all CPUs when a thread runs for too short a time.

Once started, every worker thread calculates exactly four lines. The supervisor thread
waits for all worker threads to stop (*). It then writes the lines to the
TGA file and starts the threads again so that each computes its next four lines.

This process makes it possible to render pictures of nearly arbitrary size (without consuming
much RAM).
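For readers without the source at hand, the scheme described above could be sketched roughly as follows. This is not the code from fractbatch: the names Pool, Worker, worker_main, render and shade are made up for illustration, shade() is only a stand-in for the real fractal iteration, a checksum replaces the TGA write, and error checking is left out. What it does show is the round structure: the supervisor signals a start, each worker computes its four lines and then sleeps in pthread_cond_wait, and only one band of lines is ever held in memory.

```c
#include <pthread.h>
#include <stdlib.h>

#define LINES_PER_THREAD 4      /* from the post: four lines per worker and round */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  start;      /* supervisor -> workers: next round begins */
    pthread_cond_t  done;       /* last worker -> supervisor: round finished */
    int round, pending, quit;   /* round counter, busy workers, shutdown flag */
    int first_line;             /* first image line of the current round */
    int width, height;
    unsigned char *band;        /* nthreads * LINES_PER_THREAD rows of width bytes */
} Pool;

typedef struct { Pool *pool; int id; } Worker;

/* Hypothetical stand-in for the real per-pixel fractal iteration. */
static unsigned char shade(int x, int y) { return (unsigned char)((x ^ y) & 0xff); }

static void *worker_main(void *arg)
{
    Worker *w = arg;
    Pool *p = w->pool;
    int seen = 0;                               /* last round this worker handled */
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->round == seen && !p->quit)    /* sleep until the next start signal */
            pthread_cond_wait(&p->start, &p->lock);
        if (p->quit) { pthread_mutex_unlock(&p->lock); return NULL; }
        seen = p->round;
        int y0 = p->first_line + w->id * LINES_PER_THREAD;
        pthread_mutex_unlock(&p->lock);

        for (int i = 0; i < LINES_PER_THREAD && y0 + i < p->height; i++) {
            unsigned char *row = p->band + (size_t)(w->id * LINES_PER_THREAD + i) * p->width;
            for (int x = 0; x < p->width; x++)
                row[x] = shade(x, y0 + i);
        }

        pthread_mutex_lock(&p->lock);
        if (--p->pending == 0)                  /* the last worker wakes the supervisor */
            pthread_cond_signal(&p->done);
        pthread_mutex_unlock(&p->lock);
    }
}

/* Renders the image band by band; returns a checksum where the real program
   would write the finished lines to the TGA file. */
unsigned long render(int width, int height, int nthreads)
{
    Pool p;
    int band_lines = nthreads * LINES_PER_THREAD;
    pthread_t *tid = malloc(nthreads * sizeof *tid);
    Worker *w = malloc(nthreads * sizeof *w);
    unsigned long sum = 0;

    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.start, NULL);
    pthread_cond_init(&p.done, NULL);
    p.round = p.pending = p.quit = p.first_line = 0;
    p.width = width;
    p.height = height;
    p.band = malloc((size_t)band_lines * width);

    for (int i = 0; i < nthreads; i++) {
        w[i].pool = &p;
        w[i].id = i;
        pthread_create(&tid[i], NULL, worker_main, &w[i]);
    }
    for (int y = 0; y < height; y += band_lines) {
        pthread_mutex_lock(&p.lock);
        p.first_line = y;
        p.pending = nthreads;
        p.round++;
        pthread_cond_broadcast(&p.start);       /* start signal for all workers */
        while (p.pending > 0)                   /* wait until every worker is idle */
            pthread_cond_wait(&p.done, &p.lock);
        pthread_mutex_unlock(&p.lock);

        int lines = height - y < band_lines ? height - y : band_lines;
        for (size_t i = 0; i < (size_t)lines * width; i++)
            sum += p.band[i];                   /* placeholder for the TGA write */
    }
    pthread_mutex_lock(&p.lock);
    p.quit = 1;                                 /* tell the sleeping workers to exit */
    pthread_cond_broadcast(&p.start);
    pthread_mutex_unlock(&p.lock);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    pthread_cond_destroy(&p.done);
    pthread_cond_destroy(&p.start);
    pthread_mutex_destroy(&p.lock);
    free(p.band); free(w); free(tid);
    return sum;
}
```

With this layout a round is over quickly when the image is narrow, so the workers spend much of their time asleep between start signals. That would at least be consistent with the low CPU usage reported for the small test images, though the thread does not confirm the cause.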

Quote:
The program dumps core at 40+ threads


This probably has something to do with the number of lines. Maybe the picture is too small for
so many threads. I will check the source.
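If that guess is right, one possible guard would be to cap the thread count so that every worker still gets at least one line per round. The sketch below only illustrates that idea: usable_threads is a made-up helper, the four lines per worker are taken from the description above, and the actual cause of the trap has not been established in this thread.

```c
#define LINES_PER_THREAD 4   /* four lines per worker and round, as described above */

/* Returns how many worker threads can receive at least one image line
   in a round. Hypothetical helper, not part of fractbatch. */
int usable_threads(int height, int requested)
{
    /* Number of four-line bands in the image, rounded up. */
    int bands = (height + LINES_PER_THREAD - 1) / LINES_PER_THREAD;

    if (bands < 1)
        bands = 1;
    if (requested < 1)
        return 1;
    return requested < bands ? requested : bands;
}
```

For example, a 100-line image has 25 bands, so a request for 40 threads would be reduced to 25.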

Thank you for your report!

Best regards,
Martin

*) The worker thread doesn't really stop, but waits (pthread_cond_wait) for the next start signal.
Martin Steen wrote:
It looks like IRIX does not use all CPUs when a thread runs for too short a time.

You could try forcing each thread to its own CPU with pthread_setrunon_np() .
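A rough sketch of how that might look from inside a worker thread follows. pthread_setrunon_np() is IRIX-specific, and the signature used here, int pthread_setrunon_np(int cpu) acting on the calling thread, is written from memory, so check the man page for the exact form and for any scheduling-scope requirements. The helpers pin_self_to_cpu and cpu_for_worker are made up for illustration, and on other systems the sketch simply reports that pinning is unsupported.

```c
#include <pthread.h>

/* Asks the scheduler to run the calling thread on one CPU only.
   Returns the result of the IRIX call, or -1 where it is unavailable. */
int pin_self_to_cpu(int cpu)
{
#ifdef __sgi
    /* Assumed IRIX signature: int pthread_setrunon_np(int cpu); */
    return pthread_setrunon_np(cpu);
#else
    (void)cpu;
    return -1;                      /* not IRIX: no pinning attempted */
#endif
}

/* Spreads worker indices over the available CPUs round robin. */
int cpu_for_worker(int worker, int ncpus)
{
    return ncpus > 0 ? worker % ncpus : 0;
}
```

Each worker would call pin_self_to_cpu(cpu_for_worker(id, ncpus)) once at the top of its thread function, with ncpus being 16 on the Origin and 2 on the Octane2.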