SGI: Development

Fractal benchmark - Page 2

ShadeOfBlue wrote:
Martin Steen wrote:
It looks like IRIX does not start all CPUs when a thread runs too briefly.

You could try forcing each thread to its own CPU with pthread_setrunon_np() .



Good idea. Thank you!

Best regards,
Martin
ShadeOfBlue wrote:
Martin Steen wrote:
It looks like IRIX does not start all CPUs when a thread runs too briefly.

You could try forcing each thread to its own CPU with pthread_setrunon_np() .

But isn't the operating system's scheduler normally better at this than letting every program arbitrarily decide what resources it wants to use?
hamei wrote:
But isn't the operating system's scheduler normally better at this than letting every program arbitrarily decide what resources it wants to use?

Normally yes and it does that here too, but it takes the scheduler too long to figure it out (since this program's threads process work in short bursts), so providing it with a hint could possibly improve the performance.
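A minimal sketch of what such a hint could look like, not taken from the fractbatch source. It assumes the IRIX signature int pthread_setrunon_np(int cpu) acting on the calling thread, which is how I remember the man page; check pthread_setrunon_np(3P) for the thread-scope requirements before relying on it. The helper names pin_self_to_cpu and cpu_for_worker are made up for illustration.

```c
#include <errno.h>
#include <pthread.h>

/* Pin the calling thread to one CPU.
 * pthread_setrunon_np() is IRIX-only, so the call is guarded by __sgi;
 * on other systems this sketch just reports ENOTSUP.
 * ASSUMPTION: the call returns 0 on success and an error number otherwise. */
int pin_self_to_cpu(int cpu)
{
    if (cpu < 0)
        return EINVAL;
#if defined(__sgi)
    return pthread_setrunon_np(cpu);
#else
    return ENOTSUP;
#endif
}

/* Hypothetical policy: spread worker i over the CPUs round-robin.
 * Returns -1 for invalid arguments. */
int cpu_for_worker(int worker, int ncpus)
{
    if (worker < 0 || ncpus <= 0)
        return -1;
    return worker % ncpus;
}
```

Each worker would call pin_self_to_cpu(cpu_for_worker(i, ncpus)) as the first thing in its thread function and carry on unpinned if the call fails.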

I usually use OpenMP for threading in such programs and it starts using all available CPUs almost instantly. It's also much simpler than writing your own code to do the threading, but then again, I'm just lazy :)
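For comparison, here is roughly what the OpenMP route looks like for a row-by-row fractal render. This is a sketch under my own assumptions (a plain Mandelbrot iteration and a fixed view of the plane), not Martin's code; mandel_iters and render are illustrative names. It needs the compiler's OpenMP switch (-mp with MIPSpro as far as I remember, -fopenmp with GCC); without it the pragma is ignored and the loop simply runs serially.

```c
/* Iterations until |z| > 2 for the point c = (cr, ci), capped at max_iter. */
static int mandel_iters(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        n++;
    }
    return n;
}

/* Fill out[width * height] with iteration counts for the fixed view
 * x in [-2, 1), y in [-1.5, 1.5). Rows are independent, so one pragma
 * is enough. schedule(dynamic) hands out rows one at a time, because
 * rows crossing the set cost far more than rows outside it. */
void render(int *out, int width, int height, int max_iter)
{
    int y;
#pragma omp parallel for schedule(dynamic)
    for (y = 0; y < height; y++) {
        int x;
        for (x = 0; x < width; x++) {
            double cr = -2.0 + 3.0 * x / width;
            double ci = -1.5 + 3.0 * y / height;
            out[y * width + x] = mandel_iters(cr, ci, max_iter);
        }
    }
}
```

The runtime creates and reuses the worker threads itself, so there is no thread setup code to write or get wrong.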
How far back does OpenMP support go in IRIX?

legalize wrote:
How far back does OpenMP support go in IRIX?

It had OpenMP from at least MIPSpro 7.2 onwards, with support for OpenMP 2.0 added in 7.4. Even before that, SGI's compilers had their own #pragmas for parallelization, and the OpenMP spec was partially based on those, IIRC :)
Code:
OCTANE2 44% ./run1.sh
Fractal benchmark / by Martin Steen
Size   =256x256
Iter   =16384
Threads=4
Render 1 of 3: fract_0088_2.tga
InitThreads OK
....................................................
Render 2 of 3: fract_0088_8.tga
InitThreads OK
....................................................
Render 3 of 3: fract_0088_9.tga
InitThreads OK
....................................................

real    1m59.30s
user    3m30.14s
sys     0m0.29s
converting file fract_0088_2.tga
converting file fract_0088_8.tga
converting file fract_0088_9.tga


This was done on my Octane2 2x300MHz. Render 1 only used one CPU, but the second CPU decided to get off its butt and help out for Render 2 and Render 3. :)

Like Geoman said, running
Code:
/fractbatch 8192 8192 1024 32 frects0089.txt
fires up both processors immediately. It's still pounding away on the Octane now.

Thanks for sharing this! Very cool.

_________________
:Onyx2: :Fuel: :Indigo2: :Indigo2IMP: :O3x0:
Servus Martin,

I'm a bit late to the party. I compiled the source, but when I run <./testrun.sh> [with threads = 1], the tga files are generated OK but I don't see any benchmarking information displayed.

Anyway, for fun I did a <time ./testrun.sh> [with the default sizes/iterations] and got this on my Fuel 700MHz:
Code:
real    0m58.05s
user    0m46.82s
sys     0m0.19s

_________________
For aliens we're aliens.
I have done some testing on Octane 2xR10000@195MHz with the original program and got the following.

./run1.sh only runs one cpu for Render 1 but two cpus for Render 2 and 3
Code:
real 2m54.19s
user 5m8.01s
sys  0m0.52s

Setting threads >= 11 makes both CPUs work from the start.
./run1.sh with threads = 16
Code:
real 2m37.05s
user 5m8.12s
sys  0m0.50s

Setting threads >= 21 causes a core dump.
Code:
./run1.sh[18]: 1924 Trace/BPT trap(coredump)

./run2.sh runs on two CPUs for Render 1, 2 and 3, but it takes a little while before both kick in for Render 1: about half the render (20 dots on the meter).

The core dump doesn't happen with run2.sh, but I only tested up to 64 threads. I guess it has to do with the number of lines in the image, as you say.
Did some more tests on the 32-CPU Origin (viewtopic.php?f=14&t=16720995), watching the CPU meters on the MMSC display.
Code:
gigantix 49# ./run1.sh

Fractal benchmark / by Martin Steen
Size   =256x256
Iter   =16384
Threads=4
Render 1 of 3: fract_0088_2.tga
InitThreads OK
....................................................
Render 2 of 3: fract_0088_8.tga
InitThreads OK
....................................................
Render 3 of 3: fract_0088_9.tga
InitThreads OK
....................................................

real    4m1.19s
user    3m59.92s
sys     0m0.13s
converting file fract_0088_2.tga
converting file fract_0088_8.tga
converting file fract_0088_9.tga
For some reason, only 1 cpu worked during Render 1.
Code:
gigantix 50# ./run1.sh

Fractal benchmark / by Martin Steen
Size   =256x256
Iter   =16384
Threads=16
Render 1 of 3: fract_0088_2.tga
InitThreads OK
................................................................
Render 2 of 3: fract_0088_8.tga
InitThreads OK
................................................................
Render 3 of 3: fract_0088_9.tga
InitThreads OK
................................................................

real    0m34.05s
user    4m0.38s
sys     0m0.28s
converting file fract_0088_2.tga
converting file fract_0088_8.tga
converting file fract_0088_9.tga
That's more like it. This time 16 cpus got working directly.
Code:

gigantix 52# time ./fractbatch 256 256 16384 32 frects0088.txt

Render 1 of 3: fract_0088_2.tga
InitThreads OK
Trace/BPT/RangeErr/DivZero/Ovflow trap (core dumped)
It core dumped with 32 threads. Is the image size too small for 32 threads?
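I don't know how fractbatch actually divides the image lines among its threads, so the following is only a sketch of one split that stays in bounds for any thread count, including counts that don't divide the line count and counts larger than it. RowRange and rows_for_worker are hypothetical names.

```c
/* Rows [first, first + count) assigned to one worker. */
typedef struct {
    int first;
    int count;
} RowRange;

/* Split `rows` image lines among `nthreads` workers as evenly as possible.
 * The first (rows % nthreads) workers get one extra line. If there are
 * more workers than lines, the surplus workers get count == 0 and should
 * return without touching the image. Invalid arguments yield {0, 0}. */
RowRange rows_for_worker(int rows, int nthreads, int worker)
{
    RowRange r = {0, 0};
    int base, extra;

    if (rows <= 0 || nthreads <= 0 || worker < 0 || worker >= nthreads)
        return r;
    base = rows / nthreads;
    extra = rows % nthreads;
    r.count = base + (worker < extra ? 1 : 0);
    r.first = worker * base + (worker < extra ? worker : extra);
    return r;
}
```

With 256 lines and 21 threads, the first four workers get 13 lines each and the remaining seventeen get 12, which adds up to exactly 256.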
Code:
gigantix 56# ./run2.sh

Fractal benchmark / by Martin Steen
Size   =2048x2048
Iter   =16384
Threads=64
Render 1 of 3: fract_0088_2.tga
InitThreads OK
................................................................
Render 2 of 3: fract_0088_8.tga
InitThreads OK
................................................................
Render 3 of 3: fract_0088_9.tga
InitThreads OK
................................................................

real    9m10.86s
user    4h16m1.75s
sys     0m10.56s
converting file fract_0088_2.tga
converting file fract_0088_8.tga
converting file fract_0088_9.tga
Large image size and 64 threads. All CPUs working. 4 hours of work in 9 minutes. That's a supercomputer :-)
bjornl wrote:
Large image size and 64 threads. All CPUs working. 4 hours of work in 9 minutes. That's a supercomputer :-)


Cool stuff.

Is that the system in your father's shop? Laptop and serial console? I imagine that you haven't brought this beast home? You know, considering the space available. :D

/Matt

_________________
If I can't fix it I can fix it so it can't be fixed.
:O2000: :O2000: +MXE/IO6G :Onyx: [RE2] :Fuel: :O200: :Octane: x3 :Octane2: :O2: x2 :1600SW: x3 :Indigo2IMP: x2 :Indigo2: x2 :Indigo: :Indy: x5 :320: x2, 2xSparcStation 20 and a horde of PC's
Any chance to have an executable version for my O2?

Code:

# ./fractbatch: Exec format error. Wrong Architecture.
# file fractbatch
fractbatch:     ELF 64-bit MSB mips-4 dynamic executable (not stripped) MIPS - version 1
#

I'm just curious...

:)
Strangely, girls talk to me when I walk with my O2?!
R12k @ 300MHz, 384MB RAM
run2.sh on my 2 best machines

Code:

Octane2 2x360mhz

real   1h45m15.03s
user   3h6m42.59s
sys   0m40.77s

Origin 2400 8x250mhz

real   39m43.19s
user   4h16m6.97s
sys   0m24.51s


Can anyone explain to me what user time is, and why it's actually higher on my Origin 2400?

Oh, and another strange behavior: it seems I lose a router link when the second test starts... It doesn't seem to affect the system, but it shuts this link down every time the second test of run2.sh starts.
front_link.jpg
:Octane: 270Mhz SI 384Mb ram :Octane2: 2x360Mhz V6 1.5Gb ram
:O2000: Death by flooding... :(
Amiga A3000D / Full ECS, Kickstart 3.1, 2Mb CHIP/24Mb FAST with 2+18Gb SCSI HD
Amiga A1200 starting to work on this one.
Quick crash course:
"real" time is also commonly called "wallclock" - it's how much actual, measurable time the program spent from when you invoked it to when it finished. Generally measured using a real-time clock in the system, if it's available.
"user" time is how much CPU time was spent in user code (i.e. the program itself), added up over every CPU the program ran on, which is why it can exceed real time for a multithreaded program. The kernel keeps this account, typically by noting which mode a process is in at each clock tick or context switch.
"system" time (shown as "sys") is how much CPU time was spent in system code on the program's behalf (i.e. the kernel and syscalls), accounted the same way.

This is a grossly simplified explanation, but user time is higher on your Origin 2400 because it has slower CPUs - it had to do about 4 hours of processing "work" to match your Octane's 3 hours. It could just slice it up into 8 pieces, so it took less real time.
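To put numbers on that: dividing the CPU time (user + sys) by the real time gives the average number of CPUs the run kept busy. A small sketch using the timings posted above; hms_to_seconds and busy_cpus are names I made up.

```c
/* Convert an h/m/s triple, as printed by time(1), to seconds. */
double hms_to_seconds(int h, int m, double s)
{
    return h * 3600.0 + m * 60.0 + s;
}

/* Average number of CPUs kept busy: (user + sys) / real.
 * Returns 0 if real_s is not positive. */
double busy_cpus(double user_s, double sys_s, double real_s)
{
    if (real_s <= 0.0)
        return 0.0;
    return (user_s + sys_s) / real_s;
}
```

For the run2.sh results above this works out to about 1.8 busy CPUs out of 2 on the Octane2 and about 6.5 out of 8 on the Origin 2400.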
Oh! Explained like that, I can now fully understand the results.

So it also means that if I had another module (16 CPUs total), it would take about 20 minutes of real time instead of 40, but would still add up to about 4 hours of user time.
Bill622 wrote: Any chance to have an executable version for my O2?

Code:

# ./fractbatch: Exec format error. Wrong Architecture.
# file fractbatch
fractbatch:     ELF 64-bit MSB mips-4 dynamic executable (not stripped) MIPS - version 1
#

I'm just curious...

:)


Is the O2 not a 64-bit machine?
It's been 10 years since I worked on an O2.

Here is another executable for you, compiled with compiler-switches -n32 -mips3:
http://www.martin-steen.de/sgi/n32/fractbatch.gz

Best regards,
Martin
Martin Steen wrote: Is the O2 not a 64-bit-machine?

It has a 32-bit kernel, since there's really no reason to run it in 64-bit mode due to the 1GB memory limit (it can, however, still use 64-bit instructions -- it just lacks the kernel support and libraries that use a 64-bit address space).

It is more efficient to run MIPS4 N32 code on all SGI machines that have an R5k or better CPU inside (and MIPS3 N32 for R4k). Compile it as 64-bit only if your app uses more than 2GB of memory; otherwise it will just cause a slowdown and use more memory.
ShadeOfBlue wrote:
Martin Steen wrote: Is the O2 not a 64-bit-machine?

It has a 32-bit kernel, since there's really no reason to run it in 64-bit mode due to the 1GB memory limit (it can, however, still use 64-bit instructions -- it just lacks the kernel support and libraries that use a 64-bit address space).

It is more efficient to run MIPS4 N32 code on all SGI machines that have an R5k or better CPU inside (and MIPS3 N32 for R4k). Compile it as 64-bit only if your app uses more than 2GB of memory; otherwise it will just cause a slowdown and use more memory.


So the O2 has a 64-bit CPU, but it cannot run 64-bit code because the kernel and the libraries are 32-bit? That's good to know if one is going to use POSIX memory mapping (mmap) with big files. (At my job I have to deal with satellite images that can be as large as 200GB.)

Best regards, Martin
henrycault wrote: run2.sh on my 2 best machines

Oh and another strange behavior, it seems i lose a router link when second test starts... Doesn't seem to affect the system, but it always shuts this link down each time it starts the second test of the run2.sh
front_link.jpg


That is indeed very strange. There is no network-code in the program.
Martin Steen wrote: So the O2 has a 64-bit CPU, but it cannot run 64-bit code because the kernel and the libraries are 32-bit?

Yes, although it can still use 64-bit instructions and operate on 64-bit data (e.g. dadd, dsub, dmult, ...) normally in N32 mode :)
Essentially, the 64-bit ABI only enlarges the 'long' and pointer types to 64 bits and thus enables the program to use more than 2GB of RAM.

This document explains the exact differences between N32 and O32 and N64: http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?coll=0650&db=bks&cmd=toc&pth=/SGI_Developer/Mpro_n32_ABI .
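A quick way to see which side of that line a given build is on is to look at the widths of long and pointers, while long long gives 64-bit arithmetic either way. This is a generic C sketch rather than anything MIPSpro-specific, and the function names are illustrative.

```c
#include <limits.h>

/* Width in bits of the two types that differ between the 32-bit ABIs
 * (O32/N32) and the 64-bit ABI. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* 64-bit integer math does not need 64-bit pointers: long long is at
 * least 64 bits wide regardless of the ABI (standard since C99). */
unsigned long long square64(unsigned long long v)
{
    return v * v;
}
```

Built with -n32 this should report 32 for both widths, and 64 for both when built with -64, going by the description above.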

That's good to know if one is going to use POSIX memory mapping (mmap) with big files. (At my job I have to deal with satellite images that can be as large as 200GB.)

An Onyx/Origin 3800 system would be more suited to that kind of work if you want to fit the entire dataset within main memory (though an O2 with 1GB of RAM [the maximum for an O2] could map textures up to ~800MB at a time).
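When the file is larger than the address space, the usual workaround is to map it one window at a time. A minimal sketch using only POSIX calls (open, fstat, mmap, munmap); sum_file_windowed is a made-up example function, and it assumes off_t is wide enough for the file, which on a 32-bit ABI generally means building against the platform's large-file interface.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum all bytes of a file by mapping it one window at a time, so no
 * single mapping is larger than `window` bytes rounded up to whole pages.
 * Returns 0 on success, -1 on error. */
int sum_file_windowed(const char *path, size_t window, unsigned long long *sum)
{
    struct stat st;
    long page = sysconf(_SC_PAGESIZE);
    off_t off = 0;
    int fd;

    if (page <= 0 || window == 0 || sum == NULL)
        return -1;
    /* mmap offsets must be page-aligned: round the window up to whole pages. */
    window = (window + (size_t)page - 1) / (size_t)page * (size_t)page;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    *sum = 0;
    while (off < st.st_size) {
        off_t left = st.st_size - off;
        size_t len = left < (off_t)window ? (size_t)left : window;
        size_t i;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);

        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (i = 0; i < len; i++)
            *sum += p[i];
        munmap(p, len);
        off += (off_t)len;
    }
    close(fd);
    return 0;
}
```

Real image code would process tiles instead of summing bytes, but the open, map, work, unmap, advance loop stays the same.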
henrycault wrote: Oh and another strange behavior, it seems i lose a router link when second test starts... Doesn't seem to affect the system, but it always shuts this link down each time it starts the second test of the run2.sh
After looking at your photo, I don't think you're losing the link - the green LED on the left indicates a successful link, the yellow LED on the right indicates activity <or a fault>. That particular pair of LEDs is associated with node 1 - so it might be that node 1 isn't used during the run2 test.

linkstat or xbstat might give you some additional detail on the router/xbow activity during the test.