The collected works of ShadeOfBlue - Page 5

hamei wrote:
MIPS processors don't have the Ring0, Ring3, etc etc junk that plagues the Intel processors, do they ?

R4k and up have three privilege levels -- kernel, user, and supervisor (but nobody actually used that one :P ). Two are enough to implement a secure OS.

hamei wrote:
Quote:
It should be fairly straightforward to implement a FAT16 filesystem module :)

Best news I've heard all week, cuz natcherly I have a realworld use : CF cards larger than 2 gb.

FAT is a really simple filesystem; making an IRIX module for it should take about a week or two. Remind me again sometime in July :)
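To give an idea of how simple it is, here's a little sketch of the first thing such a module would do: parse the FAT16 boot sector / BPB. The struct and helper names are mine, it isn't tied to IRIX's actual filesystem interfaces, just an illustration:
Code:
#include <stdint.h>   /* or <inttypes.h> on older systems */

/* FAT structures are little-endian on disk and MIPS under IRIX is
 * big-endian, so read the fields byte by byte instead of casting. */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p)
{ return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24); }

/* Key FAT16 boot-sector (BPB) fields and their byte offsets in sector 0. */
typedef struct {
    uint16_t bytes_per_sector;    /* offset 11, usually 512 */
    uint8_t  sectors_per_cluster; /* offset 13 */
    uint16_t reserved_sectors;    /* offset 14, sectors before the first FAT */
    uint8_t  num_fats;            /* offset 16, usually 2 */
    uint16_t root_entries;        /* offset 17, fixed-size root directory */
    uint16_t sectors_per_fat;     /* offset 22 */
    uint32_t total_sectors;       /* offset 19 (16-bit) or 32 (32-bit) */
} fat16_bpb;

static void parse_bpb(const uint8_t sector0[512], fat16_bpb *b)
{
    b->bytes_per_sector    = le16(sector0 + 11);
    b->sectors_per_cluster = sector0[13];
    b->reserved_sectors    = le16(sector0 + 14);
    b->num_fats            = sector0[16];
    b->root_entries        = le16(sector0 + 17);
    b->sectors_per_fat     = le16(sector0 + 22);
    b->total_sectors       = le16(sector0 + 19);
    if (b->total_sectors == 0)      /* large volumes use the 32-bit count */
        b->total_sectors = le32(sector0 + 32);
}
Everything else (the FAT itself and the directory entries) is just as flat, which is why it's such a nice first filesystem to implement.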

hamei wrote:
Quote:
It's similar to developing new device drivers for the kernel.

Oh good. Good practice for adding a USB device or two :D

That's much more complicated :D

The current USB implementation is completely undocumented, so the cleanest thing to do would be to port the USB stack from NetBSD or something similar. This is a lot of work, since someone has to rewrite the parts dealing with announcing new devices to the kernel, making entries for them under /hw and so on.

hamei wrote:
Quote:
Interesting! It doesn't seem to be documented anywhere -- perhaps it wasn't finished?

Or more likely they just never wrote any documentation ? There is other stuff in there I wasn't able to find much info on either. Must be somewhere, I am just an easily-deviated searcher ...

A quick disassembly of the module would show if it only has stubs or actually implements something. Reverse engineering the API is probably too much work and it would be easier to just port FUSE to IRIX.

hamei wrote:
Quote:
You can also remove PPP support and some special locking implementation for Oracle databases and there's plenty more like that

You mind-reader you :D

:D

Also, to speed up booting, you can put an "exit 0" after the first few lines of comments in /etc/init.d/sendmail and /etc/init.d/esp (or availmon or something...). Even if you chkconfig those off, they will still print a message and run some crap which takes some time at boot and shutdown. There are some other files like that, but I always forget what they are :P
vishnu wrote:
but if it's compiling the individual source files into object files it's doing it very surreptitiously because I have yet to see one! :shock:

They're in a hidden directory in each source subdirectory. This is normal for an autotools-generated makefile.

hamei wrote:
Code:
gmake[2]: *** No rule to make target `/usr/include/standa', needed by `/usr/people/dev/maxwellwp/lib/libmx_edit.a(mx_wp_edit.o)'.  Stop.

It looks like a dependency line in the makefile got truncated somehow. Have you set the ncargs systune variable to 262144 before running ./configure and all that? If you haven't, increase it and run "make distclean" then "./configure" etc.

hamei wrote:
So for people who do not have commercial software, Max and Ted are still maybe the best choices ...

Learn LaTeX, you will never want to use anything else :)
ClassicHasClass wrote: GL h4xx0r5, is there an easy way to flip glDrawPixels other than by labouriously turning the buffer around in memory?

glPixelZoom(1.0f, -1.0f) :)
You will also need to start drawing the image at a different y position; use glRasterPos to do that.
Try a newer GDB, there have been some changes to the debugging format in gcc 4.something.
ClassicHasClass wrote: It's already there, I think. I played with a few different iterations and all I got was a black screen. When it goes to blit the bits, it starts at the bottom and works back, but setting it to 0.0 didn't fix it, unless I'm totally wrong somewhere.

Try setting the starting y offset to the height of the image you're drawing :)
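Putting the two together, something like this should work (assuming a pixel-aligned ortho projection, e.g. gluOrtho2D(0, width, 0, height), and an RGBA buffer whose first row is the top of the image):
Code:
#include <GL/gl.h>

/* Draw a top-down RGBA buffer with glDrawPixels, flipping it on the fly. */
void draw_image_flipped(int width, int height, const void *pixels)
{
    glRasterPos2i(0, height);      /* start at the top edge of the image... */
    glPixelZoom(1.0f, -1.0f);      /* ...and draw each row downwards */
    glDrawPixels(width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    glPixelZoom(1.0f, 1.0f);       /* restore the default */
}
One gotcha: if the raster position lands outside the viewport, it becomes invalid and glDrawPixels silently draws nothing, which looks exactly like a black screen. If that happens, nudge the y coordinate down by one pixel.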
hamei wrote:
The Selectric of the same vintage is a joy to type on, even better than a Type M clicky keyboard. A Selectric would probably make a great input device for a computer. They really are great.

The Model M was supposedly a "cost-reduced" version of the Selectric keyboard and the Cherry key switches are a "cost-reduced" version of the buckling-spring ones in the Model M :)

I have one of these keyboards. It's not a Selectric by any stretch and it looks kinda cheap, but it's well made, takes up little space, works well with SGIs, and is comfortable to type on. It looks much better if you replace the three Windows keys with blank ones ;)

hamei wrote:
FORTRAN is a nice comfortable language :)

Wow, that code is really ancient :)
For comparison, here's what a modern Fortran program can look like: http://www.liv.ac.uk/HPC/HTMLF90Course/HTMLF90CourseNotesnode62.html

hamei wrote:
A Bendix 5 control has a Control Data computer as the thinking portion. K&T used PDP-8's and later built their own, Sundstrand used a PDP-11, McDonnel-Douglas built their own (Actrion), Giddings & Lewis built their own, Cincinnati built their own, GE built a pile of crap (GE has always been second-rate shit), Allen-Badley built their own, Hurco built their own

These early computers were interesting, everyone made their own CPU architecture back then :)

The patent applications were much more substantial than the crap that gets accepted today. For example, for the HP 9830A computer, HP put _everything_ you needed to know to build the machine in the patent application. This includes all the schematics and even the source code for the ROMs. Here's a PDF of it: http://www.google.com/patents/US4012725.pdf .
PymbleSoftware wrote:
Bah, that looks more like Ada. Real FORTRAN looks like:

Code:
IF IF = 0 THEN THEN = 0;
STOP
END


... and is tens of thousands of lines long, the variable TOTAL doesn't contain the accumulated total but CURR does and functions don't do what their name suggests and has GOTOs everywhere and totally incomprehensible logic.

;-)

:lol:

hamei wrote:
That's interesting ! thanks ... I'm still looking for a keyboard that has the F keys in a double row down the left side. HP built them that way but HPIL she don't work with an SGI :(

There's this HP-HIL to PS/2 adapter, but it's very expensive.

hamei wrote:
You could build your own PDP-8 from the prints that came with a K&T. Honest, the machine came with full prints to the boards and K&T built three boards of their own that went into the PDP-8 for real-time control and they wrote their own operating system. Not "sort-of" real-time, realtime. Sundstrand likewise. And when you called tech support, you got someone who knew their shit. No one from India ! Some of those guys could think in binary. Seriously. It was nice to talk to people who knew their job. Knew it inside-out, frontwards and backwards, and never once said "Oh we can't tell you that, it's a seeecret !" Gag.

I've always wanted to build my own PDP-8/S :D
The PDP-8/S is interesting because it had only 519 logic gates and its CPU processed only 1 bit at a time instead of the full 12 bits (it was slow, but physically small-ish).
If somebody has any links to a schematic for it (preferably using new CMOS/TTL SSI/MSI logic, not individual transistors), please let me know =)

Most of the old equipment I've seen was also designed to be serviced, so there are many extra markings & test points on the PCBs, and dedicated service manuals, etc. Now you're just expected to throw the entire thing away and buy a new one if a single capacitor fails :roll:

The build quality of most new devices is also very bad. I managed to save an HP 9830A, built in 1972, from being thrown away (however, those idiots threw away the plotter before I could get to it :evil: ). The machine is over 40 years old and is _still_ working perfectly! I bet no electronic device made today will last that long.
hamei wrote:
I could get you two complete computers, a tape punch and a 2' x 2' x 4' plywood box of spares if you wanted to ship them from China :D

Tempting! One computer and the tape punch would be more than enough, though :D

Surface shipping for those two would probably be affordable if the Chinese post office still offers that option... I'll send you a PM in June and we'll see if it's feasible :)
hamei wrote:
Umm, you ain't gonna be sending a PDP-8 by mail :shock: They aren't as tall but they're every bit as heavy as a Xerox laser printer.

Sorry, I mixed up surface mail with freight :)

The PDP is about 50kg (110 lbs) and the tape punch about 20kg (44 lbs), right?

However, it has a linear power supply, so that is probably about half its weight. If the PSU is easily removable, the PDP could be split into two packages and sent by mail (it might exceed the max. dimensions, though).

hamei wrote:
Let me make sure the stuff hasn't gone to the smelter before we get too excited tho.

Thanks, I'll keep my fingers crossed :)

hamei wrote:
Most people don't consider tape punches and 1975 computers to be valuable :(

It's strange that companies don't donate such things to a museum, especially since in most countries they have to pay for someone to recycle the stuff... That HP plotter was older than the people who threw it away, so I guess they didn't even know what it was for; maybe that's the reason.
I've been following this thread since the beginning and I'm getting more jealous with every update :D

This room is going to be awesome.
hamei wrote:
A caveat : you can't safely share libraries between MIPSPro binaries and gcc binaries.

This is true only for C++ code; you can mix C code without any problems, so:

MIPSpro C program + GCC C libraries = ok
MIPSpro C++ program + GCC C libraries = ok
GCC C++ program + MIPSpro C libraries = ok
MIPSpro C++ program + GCC C++ libraries = not ok
GCC C++ program + MIPSpro C++ libraries = not ok

This is because MIPSpro and GCC use different name mangling for C++ code, but the naming of C functions is the same.
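This is also why portable C headers wrap their declarations in the usual extern "C" guard, so the same library can be linked from C and from C++ built with either compiler. A minimal sketch for a hypothetical libfoo:
Code:
/* foo.h -- header for a hypothetical C library */
#ifndef FOO_H
#define FOO_H

#ifdef __cplusplus
extern "C" {          /* tell C++ callers not to mangle these names */
#endif

int foo_init(void);
int foo_process(const char *path);

#ifdef __cplusplus
}
#endif

#endif /* FOO_H */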

The optimizer in GCC 4.7.1 is at least as good as the one in MIPSpro (and actually substantially better on some numerical code I tried it on, YMMV), so there's really nothing bad about compiling a C library with GCC anymore :)

There were incompatibilities with passing structures smaller than 16 bytes, but that was in GCC 3.3, which is ancient. I haven't experienced this problem with 4.7.1.
duck wrote:
miod wrote:
duck wrote:
You'd need to add backslashes after the 0 and +1:s to make this trick work

No.

Yes. I tested it :-)

How?

Miod is right. No standards-compliant C compiler should require backslashes in this situation.

Why? The compiler doesn't care how much whitespace or how many line breaks you put between tokens. So the following two programs are identical, as far as the compiler is concerned:
Code:
int foo[0+1+1+2+3+5];

int main() { return foo[0]; }

and
Code:
int foo[0
+1
+1
+2
+3+5];

int main() { return foo[0]; }

Compile them both with c99; if it fails to compile the second example, then the compiler doesn't follow the standard.
inst shows these internal package names by default and I think there's an option to have that same behaviour in swmgr.

You can see which subsystem contains a specific file by doing a "versions long | fgrep /usr/nekoware/lib/libz.so". Then to get the subsystem version do a "versions -n neko_zlib.sw.lib".

"versions -n long | fgrep /path/to/file" might also work, but I'm away from my SGIs at the moment, so I can't check.
johnsmith wrote:
gcc -I/usr/freeware/include/gtk-1.2 -I/usr/freeware/include/glib-1.2 -I /usr/lib32 -I/usr/freeware/lib32/glib/include -I/usr/freeware/include -o glchess anim.o config.o dialog.o draw.o engine.o game.o global.o image.o interface.o main.o menu.o models.o move.o player.o prefs.o san.o splash.o texture.o -lgtkgl -I /usr/freeware/lib32 -lgtk -lgdk -lgmodule -lglib -lXi -lXext -lX11 -lm -lGLU -lGL -mips1

There are lots of errors here.

First, if you want to add a search path for libraries, you use "-L", not "-I". "-I" is for adding a search path for C/C++ header files and is ignored while linking. Also, there's no need to add "-L/usr/lib32", it's searched by default.

"-mips1" is only for R2000/R3000 and requires the older O32 ABI. You shouldn't be using this on any modern SGI, it slows things down. Use "-mips4" if you're on a R5000 or newer, "-mips3" otherwise.

From a previous post it looks like you're compiling this without any optimization flags, which is a really bad idea. Add "-O2" to the compile line.

After you make these changes, it should work better :)
Try this:
Code:
gcc -mips3 -O2 -DPACKAGE=\"glChess\" -DVERSION=\"0.4.7\" -DSTDC_HEADERS=1 -I. -I/usr/freeware/include -I/usr/freeware/include/gtk-1.2 -I/usr/freeware/include/glib-1.2 -I/usr/freeware/lib32/glib/include -o glchess *.c -L/usr/freeware/lib32 -lgtkgl -lgtk -lgdk -lgmodule -lglib -lXi -lXext -lX11 -lGLU -lGL -lm


This will compile all the .c files and link them into a "glchess" executable.
johnsmith wrote:
Thanks for the tip. I tried it, but if failed at the linking stage with can't mix o32 and n32.

Did it say which library was O32?

IRIX 6.2 supports the N32 ABI, so it would be best to make sure all the libraries you are linking to are N32 rather than O32.

The SGI freeware packages should have separate subsystems for O32 and N32 libraries. Make sure that all the prerequisite packages have the N32 subsystem installed.
Then it should Just Work(TM) :)
foetz wrote:
i'd also give -O3 a shot. as long as you don't get problems ... why go for less :D

-O3 can actually create slower code. It helps a lot if the code is C++ and uses STL, or does a lot of numerical math, but in other cases it can be about 5-10% slower (this is mainly because it unrolls too many loops, so less code fits in the L1 cache).
It's best to experiment :)

johnsmith wrote:
So I'm still scratching my head over this o32 n32 issue.

You're linking them with the wrong libraries. */lib is for O32, */lib32 is for N32. You're telling the linker to look for O32 libraries, so it's no surprise that's all it finds.

If you have all the appropriate libraries installed (N32 subsystems of their freeware packages), the single compile&link line I posted a while ago should work. You don't need to use "make", just copy & paste that line.

If you want to do it manually with "ld", tell it to look in */lib32 and not in */lib. Also, there are no libraries in /usr/bin, so remove "-L/usr/bin" :D
johnsmith wrote:
Thanks, I tried that. It's still not compiling. I do see some errors looking for stuff like malloc and free, which should be included by default, right?

Not if you're linking manually with ld :)

Use gcc to link the files:
Code:
gcc -mips3 -n32 -o glchess *.o -L/usr/freeware/lib32 -L/usr/X11R6/lib -lgtkgl -lgtk -lgdk -lgmodule -lglib -lXi -lXext -lX11 -lGLU -lGL -lm

This will automatically link it with libc and any other common runtime files it may require.
johnsmith wrote:
I think gcc used Irix ld back then, it gives the exact same error messages. Hmmmmmm.

The order of the libraries is also important. Did you use the exact line I posted?
I've never peeked inside a running Mac Pro, but the old PowerMacs had a red LED next to the RAM that was on all the time. On those machines it was harmless and didn't really mean anything. They probably did it as a warning that the machine is running and you shouldn't replace RAM or something.
I've seen this on both Sawtooth and MDD PowerMacs.

Looks like they've changed the meaning of that LED on the newer models :)
hamei wrote:
Quote:
knowing about the resources available is important for multiprocessing.

For general computing and desktop use, no it is not

This is true. The email client is a good example of how multithreaded programs should be written.

hamei wrote:
Quote:
There are many very useful programs that do need to know how to divide up work in order for them to fork the correct number of processes.

Sure. But there is no need for them to have any idea whatsoever about how many processors the computer has. If the app needs ten threads, give it ten threads. Then let the scheduler decide how to hand out cpu time.

This is partially true. It works great for IO-bound tasks, but for CPU-bound tasks it's different.

In a 3D renderer, you can trivially parallelize the work by rendering individual lines of the output image in separate threads. However, if you ignore how many cores the system has and spawn all the threads at once, the OS scheduler will try to execute them all at the same time; there will be a lot of context switching and the program will run slower than if it knew how many cores it had.

This is an example where spawning all the threads is a bad idea, but only because the scheduler is not a batch job scheduler. If the OS had a separate scheduler for such tasks, it would be as you say.

It's slower because all those CPU-bound threads compete for execution at the same time. Every time the scheduler swaps a thread out, it must save its registers and load the new thread's registers; this is called a context switch. A good scheduler will also take the caches into account. If a thread has been doing some work on core 2, that core's L1 and L2 caches hold the data that thread needs. If you now give that core another thread to execute, that data will eventually be replaced with the new thread's data. When the original thread resumes execution on that core, the core will have to reload all of its data from main memory, which is very slow. Now imagine this repeating for every CPU-bound thread. If you spawn just the right number of threads, every thread gets its own core and things run faster because the threads aren't competing with each other. Even if a CPU-bound thread is interrupted by an IO-bound thread, it won't cause as much damage, because IO-bound threads typically don't need a lot of cache, so our renderer thread will still have its working data in the cache the next time it gets scheduled.

For HPC workloads a batch scheduler is best, but for general desktop use the time-sharing one we have now is better. If you want to run long CPU-bound tasks on this kind of system, you need to compromise. Knowing how many cores the system has enables you to simulate a batch scheduler on top of a time-sharing one. It's a kludge, but it's what we've got :)

An interesting project would be to add a batch job scheduler to UNIX. Then the OS could worry about these things and knowing how many cores the system has wouldn't be necessary anymore. On HPC systems this is typically done in userspace, but a kernel-based solution might be faster.
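To make this concrete, here's roughly what a CPU-bound renderer ends up doing on a plain time-sharing UNIX. It's only a sketch with made-up names: render_line() stands in for the per-line work, and _SC_NPROC_ONLN is how IRIX spells the sysconf query (other systems use _SC_NPROCESSORS_ONLN):
Code:
#include <pthread.h>
#include <unistd.h>

#define HEIGHT 2160                 /* lines in the output image */

extern void render_line(int y);     /* hypothetical per-line renderer */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_line = 0;

static void *worker(void *arg)
{
    int y;
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        y = next_line++;            /* grab the next unrendered line */
        pthread_mutex_unlock(&lock);
        if (y >= HEIGHT)
            return NULL;
        render_line(y);
    }
}

int main(void)
{
    long ncpu;
    pthread_t tid[128];
    int i, nthreads;

    /* ask the OS how many CPUs are online and spawn exactly that many
     * workers; they then pull lines off the shared counter above */
    ncpu = sysconf(_SC_NPROC_ONLN);
    if (ncpu < 1)
        ncpu = 1;
    nthreads = (ncpu > 128) ? 128 : (int)ncpu;

    for (i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return 0;
}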

hamei wrote:
Quote:
For example, pbzip2 (parallel bzip2) and pigz (parallel gzip) can run an order of magnitude faster than just using bzip2 and gzip, respectively. Likewise, GNU Make can take advantage of parallelism when building a project. The same goes for GNU Parallel -- that program splits up work into subprocesses and does tasks in parallel.

Just had this discussion with smj a little ... knowing the number of cores available is once again not the way to decide how to program this. For example, you have a 512 p O3900 and a 256k zip file. What are you going to do, check the number of cores and split it into 512 tasks ? That's stupid.

This is an IO-bound task, so you are correct: splitting it up into that many pieces would be much slower :)
hamei wrote:
I expected a bigger ass-whuppin' than that ! Of course any broad beneralization is going to be wrong a lot of the time, which is why I tried to cover my butt with the "general desktop use" proviso. Didn't work tho :D

But I think your points are in a way the same as mine : how one handles multiple processors depends on what one needs to do, not on "how many cpu's does this box have ?"

:D

I suppose. However, if you're reimplementing the functionality of OpenMP and similar lower-level libraries, then you need to know how many cores the system has at runtime. There's no other way to do these things, because the kernel lacks a proper batch job scheduler.

But this is a kludge, really. For example, on the new AMD processors, they lie about the number of cores -- their "8-core" processor only has 4 proper cores (with FPUs and everything) and an additional 4 integer units. So if you try to run an FPU-heavy workload on one of these processors, it will run slower with 8 threads than with just 4. This is why I think a batch job scheduler should be part of the kernel, there are simply too many different architectures.

A practical example... The c-ray benchmark is flawed because it doesn't take into account how many cores the system it's running on has. If you spawn one thread for every line, the threads will fight each other and the program will run slower. The proper way of doing parallelization in c-ray would be to ditch the manual pthread implementation and just use OpenMP. With just one line of C, it would run properly and wouldn't have to know how many cores the system has. Also, there would be no overhead on single processor systems -- simply compile without the "-mp" switch and voila.
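For illustration, this is roughly what that one line looks like in a per-scanline render loop (the function and variable names are made up, this isn't c-ray's actual code). With MIPSpro you'd compile with -mp, with GCC with -fopenmp, and without either switch the pragma is simply ignored:
Code:
extern void render_scanline(int y, int width, float *fb);  /* hypothetical */

void render_image(int width, int height, float *framebuffer)
{
    int y;
    /* Each scanline is independent, so OpenMP hands them out to a worker
     * pool it sizes to the number of cores -- no manual pthread code. */
    #pragma omp parallel for schedule(dynamic)
    for (y = 0; y < height; y++)
        render_scanline(y, width, framebuffer);
}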


hamei wrote:
I immediately think of the reason that SGI would not put faster processors in the Octane - the memory system couldn't feed them. And I'll go out on a limb and bet that the memory system in "modern" commodity computers can't feed the cpu's either.

Memory is still the main bottleneck, apart from I/O, yes :)
This is why processors have caches, but for random memory access patterns, they don't do much.

Also, on the Octane, both processors share the same bus. I forgot the exact bandwidth of that bus, but it was around 1GB/s. If both processors need something they don't have in the cache, that's effectively half a GB/s for each CPU, which is kind of slow today (but not when the Octane was released).

hamei wrote:
In fact, if this were the main job of a computer, wouldn't it be better to have fifty little DOS computers with a single cpu and their own memory systems, with each one doing one task without interruption, then joining their answwers at the end ?

This is exactly how modern GPUs work :) They have thousands of independent tiny cores, which work on a stream of data.

However, only a few problems can benefit from such an architecture. You often need to have some sort of communication between various cores.

SGI also had reconfigurable FPGAs that you could attach to an O3k or Altix system. That was a pretty good idea, but probably too expensive to be of much use.

hamei wrote:
A non-multitasking cluster ? Multi-tasking operating systems may not be the best solution for every requirement ?

Of course :)
Most of the old computers used in HPC, e.g. Crays, didn't have multitasking. There was just a simple interface for the batch job dispatcher, so the system ran only one program at a time.

Now HPC systems use Linux and have a similar job dispatcher implemented in userspace. At least this is how things work in the larger systems I've seen.


jwp wrote:
All of these multiprocessing libraries and utilities internally are pretty much just calling fork and managing the child processes, which is a normal part of Unix programming.

True, but spawning processes and switching between them is slow. This is why most libraries used in HPC use threads.


This thread has really veered off course :D

I've tested the script on a MacBook Pro with 10.6.8 and it works fine:
Code:
Darwin icarus.lan 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:32:41 PDT 2011; root:xnu-1504.15.3~1/RELEASE_X86_64 x86_64
2
Those errors are because of a missing function prototype for gtk_file_chooser_button_new_with_backend().

If you forget the prototype (or leave out types in the implementation), C assumes the function returns an int, which is the reason for the last two errors. The first error says what I just wrote, but in a shorter way :)
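A stripped-down illustration of the same problem, with made-up names:
Code:
/* widget.c -- the function actually returns a pointer */
char storage[16];
char *make_widget(void) { return storage; }

/* main.c -- forgot to include the header that declares make_widget() */
int main(void)
{
    char *w;
    w = make_widget();  /* no prototype in scope, so the compiler assumes the
                           call returns int, then complains about assigning
                           that int to a pointer -- the same kind of errors
                           as in your build log */
    return w != 0;
}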
hamei wrote:
Speaking of OpenMP, was talking to the main developer for GraphicsMagick a while back. We aren't able to make OpenMP work in Irix. (He gave directions, I pushed buttons, in case you were worried.) It would build but then crash badly. Even though SGI invented Open MP, it has problems with the current implementation :(

It's possible to crash the program if the functions used inside an OpenMP block aren't thread-safe. Sometimes it's also necessary to manually specify which variables are shared and private, so that the compiler can handle accesses to them properly. Another thing is that MIPSpro doesn't support the latest OpenMP version (it's one or two versions behind), so that might also be the reason.
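For example, a loop like this will produce garbage or crash if the per-thread temporaries are accidentally left shared (made-up names, just to show the data-sharing clauses):
Code:
/* Scale each row of an n-by-m matrix by the weights in w.  If j and tmp were
 * left shared (the default for variables declared outside the parallel
 * region), the threads would overwrite each other's values. */
void scale_rows(int n, int m, double *a, const double *w)
{
    int i, j;
    double tmp;

    #pragma omp parallel for private(j, tmp)
    for (i = 0; i < n; i++) {          /* i is private automatically */
        for (j = 0; j < m; j++) {
            tmp = a[i * m + j] * w[j];
            a[i * m + j] = tmp;
        }
    }
}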

A quick check would be to compile it with the gcc 4.7.1 I built (the older Nekoware package had broken OpenMP support). If it works fine, it's probably a version problem, but if it doesn't, it's probably related to the IRIX C/C++ library.

As with all multithreaded programs, there are a number of other possible causes :)


bluecode wrote:
I never used a Cray so I don't know, but this statement just sounds wrong. For one thing, there is no contradiction between an operating system that only runs one program at a time [if that was ever accurate about Cray] and multitasking. One program can certainly multitask- this is how we do things on IBM.

We probably think the same thing but use different definitions of multitasking :)

I meant it in the same way we have multitasking on standard computers today -- a time-shared execution of multiple different programs. The first Crays didn't have that, but a single program could still use more than one processor; the later ones ran UNIX anyway.
bluecode wrote:
That's called multiprogramming [1] [2] as far as I know.

I've seen it called that too, but the newer literature seems to prefer "multitasking" (e.g. http://en.wikipedia.org/wiki/Multitasking#Preemptive_multitasking.2Ftime-sharing ).
Would you consider the textbook used in MIT's Operating Systems I class a few years ago to be authoritative enough?

According to Operating System Concepts, 7th edition, multitasking and time sharing are one and the same thing [p. 16] (also, under the index entry for "multitasking" it says "see time sharing"). It defines multitasking as an extension of multiprogramming, where switches between jobs occur frequently and not only when a job is waiting for I/O or another task, so that the users can interact with every running program. It later divides it into preemptive and cooperative multitasking (as is done on the Wikipedia page).

This is the accepted definition in use today.

35 years ago, the word "terminal" was primarily used to describe a physical machine, but now when someone says "I'll open a terminal", it means he'll start a terminal emulator program, not that he'll take apart a VT100.

I don't decide how people choose to name or rename concepts :)
Some drives have a "delayed start" jumper, so they don't trip the circuit breaker if they're used in large disk arrays :)

If you check the drive's documentation, you should see if it has any such jumpers and if they're set on your drive.
I actually agree with most of your points (and thanks for the detailed explanations) :)
The low-level OS stuff I've worked on was dominated by UNIX-like systems, so this has skewed my view of operating systems in general. To be frank, I didn't like that book either; is there a better one you could recommend?

The broad and general description I used was simply because that is what the vast majority of the systems used today implement, so this is what people reading this forum are most familiar with. As you say, it is not correct to lump everything together like that, but this is a public forum, not a scientific article ;)
The Wikipedia link was never meant as a proper proof, but as an example of such usage (which is why I preceded it with "e.g.").


So now that we've cleared up the terminology and almost completely derailed the thread, perhaps we can return to the original topic of the number-of-CPUs detection script :mrgreen:

jwp: on IRIX you can also use "sysconf NPROC_ONLN" to get the number of CPUs currently online. This way you don't have to parse the hinv output, so it's a little bit faster and will now also work on mixed-CPU Origins, where hinv reports processors as:
Code:
Processor 0: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 1.4
FPU: MIPS R14010 Floating Point Chip Revision: 1.4
Processor 1: 500 MHZ IP35
CPU: MIPS R14000 Processor Chip Revision: 1.4
FPU: MIPS R14010 Floating Point Chip Revision: 1.4
[...]
Processor 14: 400 MHZ IP35
CPU: MIPS R12000 Processor Chip Revision: 3.5
FPU: MIPS R12010 Floating Point Chip Revision: 3.5
Processor 15: 400 MHZ IP35
CPU: MIPS R12000 Processor Chip Revision: 3.5
FPU: MIPS R12010 Floating Point Chip Revision: 3.5
[...]
bluecode wrote:
Thanks for your post. I'm sorry I got a little bent out of shape. Your good manners put an end to this hopefully and it now ends on a positive note because of that.

It's OK :)
Writing makes it harder to convey the tone and English isn't my first language. I'm sure if we had this discussion in person, there would have been no friction.

bluecode wrote:
The other choices are really specific to a particular OS or family- for instance Bach or McKusick on UNIX, or Tannenbaum on Minix (but not on anything else!) etc.

I like Maurice Bach's book; it's well written.

bluecode wrote:
Back then everything was in the manuals.

HP also made some pretty awesome patents for their hardware -- all information on how to build the machine was in there, including ROM contents with disassembly. I recently saved a fully working 9830A from being thrown away and am slowly going through its patent application to see how it works internally. It's similar to the early HP minicomputers, except it has a much slower CPU, simpler I/O, and a BASIC interpreter in its ROM (this dialect of BASIC is actually much nicer than what was available on home computers; it's more similar to Fortran and can do matrix multiplication, etc.).

bluecode wrote:
A lot of people are learning OS concepts today by running vintage OS to see how things were done when a person could understand almost all of it.

I like doing that, but some hardware is very hard to find nowadays or it's too large to ship. Emulators exist, but not for every system, so the less popular ones slowly fade away, which is sad.
hamei wrote:
Code:
cc-1020 cc: ERROR File = ../test/utils.c, Line = 371
The identifier "MAP_ANON" is undefined.

addr = mmap (NULL, n_bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS,
^

This is a bit trickier to fix, since IRIX doesn't have MAP_ANON. The first comment here shows you how to fix this.

So, instead of using -1 as the file descriptor and MAP_ANON in the flags, you manually open /dev/zero (mfd in that comment is an int) and use that file descriptor. You should also close() that file descriptor after the munmap() call; otherwise it will leak file descriptors if it's called too often.
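A rough sketch of the fixed version following that approach (the function names are mine, not the package's):
Code:
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* IRIX has no MAP_ANON, so map /dev/zero instead of passing fd = -1. */
void *alloc_pages(size_t n_bytes, int *mfd_out)
{
    void *addr;
    int mfd = open("/dev/zero", O_RDWR);
    if (mfd < 0)
        return MAP_FAILED;

    addr = mmap(NULL, n_bytes, PROT_READ | PROT_WRITE, MAP_PRIVATE, mfd, 0);
    if (addr == MAP_FAILED) {
        close(mfd);
        return MAP_FAILED;
    }
    *mfd_out = mfd;   /* caller closes this after munmap() */
    return addr;
}

void free_pages(void *addr, size_t n_bytes, int mfd)
{
    munmap(addr, n_bytes);
    close(mfd);       /* don't leak the descriptor on repeated calls */
}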

EDIT: I see we really jumped on this one :D
jwp wrote:
Thank you, ShadeOfBlue, I have updated the script accordingly. I trust that this sysconf approach will work for older systems as well? I would guess so, since it's something from the System V heritage...

It should, but I'm not 100% sure, since SGI yanked the manpages for anything older than 6.5 from Techpubs :)


bluecode wrote:
Your English is great, better than I could speak your language I'm sure! Yeah a pitcher of beer always seems to smooth things out. If that doesn't work there's always the waitresses who bring it :lol:

:D
It seems normal to me. I'm accessing it when everyone in the US is asleep, though :D
Moved to the HP/DEC forum :)
hamei wrote:
ShadeOfBlue wrote:
hamei wrote:
Sure. But there is no need for them to have any idea whatsoever about how many processors the computer has. If the app needs ten threads, give it ten threads. Then let the scheduler decide how to hand out cpu time.

This is partially true. It works great for IO-bound tasks, but for CPU-bound tasks it's different.

Been thinking about this a while and came to the conclusion that you have been misled, no, it is not different.

You're looking at this from the wrong point of view. I am only speaking of the situation as it pertains to workstations .

I think we're mixing up the current situation on UNIX systems and elsewhere.

What I said before was that on a UNIX system, if a CPU-bound program knows how many CPUs the system has, it can run better than if it just spawns hundreds of threads willy-nilly. This is a measurable fact, not speculation. GUI programs are IO-bound (waiting for user input), so they won't benefit from knowing the number of CPUs.

The script made by jwp is for UNIX systems, so it makes sense to talk about those in this thread :)

I'll now address a few of the points you made...


1. Responsiveness while running CPU-bound background jobs
The responsiveness issues you describe can be fixed by running the CPU-intensive stuff at a lower priority (higher nice(1) value). If you run the rendering job at a nice value of 20, then every UI app will get priority over the rendering job and there shouldn't be any sluggishness. You could also try boosting the priority of UI apps (lowering their "nice" value with renice(1)).

However, you must set the nice value manually, since the kernel has no clue which program does what. IIRC, Maya runs the rendering processes at a lower priority, so the user can still work normally on the rest of the scene at the same time. And this actually works; people use it this way.

If you're compiling something in the background and you don't use "nice" to start those jobs at a much lower priority, they will run at the same priority as nedit and firefox and the rest of the stuff you've started. As far as the kernel is concerned, these tasks are equally important, so it won't favour nedit and firefox.



2. CPU-bound programs spawning as many threads as they can
Now if a renderer starts 2160 threads (one for each line in a typical 4K image), it will burden the system needlessly, especially if you run it at the same priority as your GUI apps. It doesn't matter if the threads are created a few milliseconds apart; that's not the issue here (thread creation is cheap). Context switches are expensive, so it simply makes no sense to spawn that many threads: the system will spend too much time switching between them (the threads will fight each other for CPU time!) instead of doing work, which is inefficient (I've explained this in more detail in a previous post). This is not speculation; I know this because I've written parallel renderers before, and I measured that spawning a thread for every line in the output image was much slower and used more memory. This is why no proper renderer does that. Look at Maya or any other good 3D suite: every single one checks how many cores the system has. There is much less overhead if the program spawns only as many threads as there are CPUs and then distributes work between them. Every scientific app using OpenMP works like this. The OpenMP library checks the number of cores available and then spawns the appropriate number of worker threads.

The only scenario where spawning all the jobs at once would make sense is if the kernel provided a batch job scheduler. So you could simply put your tasks into a global queue and the kernel would work on one or more tasks from that queue (depending on the number of CPUs) whenever the system is idle.

In a time-sharing OS that doesn't provide this functionality, spawning lots of CPU-bound threads will result in lower overall performance (the threads will fight each other) and consume more resources (you need memory for every thread's stack at the very least) than simply checking how many CPUs are available and using an appropriate number of threads.



3. Using an Origin as a desktop machine & behaviour of IRIX's scheduler
Origins are not desktop machines. The behaviour you've seen (one CPU more loaded than the other) is normal. This is because the scheduler also takes cache locality into account, along with a lot of other metrics (on NUMA systems it is also aware of the system's topology, which influences the decisions it makes). Having one CPU loaded more than the others is not an indication of a crappy scheduler.

If you have lots of short-lived tasks, it makes no sense to spread them out to other CPUs and thrash their caches. IRIX will actually distribute threads to other CPUs once it figures out they will run for more than a few seconds (or if the main CPU is very busy). Also, if you re-run a single program many times, IRIX will try to run it on the same CPU, because that CPU already has the data that program needs in its cache, which is why it would be inefficient to run it on a different CPU. If the kernel didn't care about cache locality and just tried to keep every CPU busy, it would be similar to one of those parts boxes with multiple drawers and having one of each screw type in every single drawer, because it's a waste to keep drawers empty. Finding a specific screw would be very slow :)

What you want is something like BeOS, a single-user desktop OS. IRIX was made to run on everything from workstations with a single CPU and one user to servers with 1024 CPUs and thousands of users; it's difficult to have optimal performance in every scenario and for every workload. Adding a batch job scheduler to the kernel would help a bit with background jobs such as rendering and compiling (and then those programs wouldn't need to know how many CPUs the system has). But even as it is now, if you lower the priority of such tasks, the system should remain quite responsive. I'm not saying it's perfect (it could use improvements), but it's not a complete disaster :)




I hope it's clearer now why checking the number of available CPUs is a perfectly normal thing for a CPU-bound program to do on a UNIX system :)
hamei wrote:
My point is that in any activity there will be an optimum number of threads. For an exaggerated example, take pbzip. If you have 128 processors and a 256k zipfile, spawning 128 threads to unzip that file would be ridiculous. There is going to be an optimum number of threads for the task at hand. In general, the task at hand is not going to be determined by how many processors you have !

pbzip is IO-bound, so what you said is true. But for most CPU-bound tasks, the number of CPUs you have _is_ the optimal number of threads!

Path tracing (a rendering algorithm) is a task which scales linearly with the number of CPUs available (for all practical purposes... on a NUMA system with 256 cores and more you need to tweak memory placement as well to maintain the speedup, but that's not really important to this discussion). For this task, spawning as many threads as there are CPUs _is_ optimal. If you know a better solution to this that doesn't involve knowing the number of CPUs, then please say so :)

There are plenty of other CPU-bound workloads which have the same behaviour.


hamei wrote:
This is a pretty crappy solution. Adjusting priority by which application has the focus is way better.

And which application has the focus on a multi-user system? :mrgreen:


hamei wrote:
ShadeOfBlue wrote:
However, you must set the nice value manually, since the kernel has no clue which program does what.

Pretty crappy system.

So what you're saying is that the kernel should have a magic crystal ball and use that to determine which process you find more important? :mrgreen:

But still, the kernel will age the priority of processes if it figures out they're CPU-bound. The longer a CPU-bound process runs, the less important it becomes to the scheduler. So it should favour IO-bound processes (including GUI).

However, when you're compiling something, the compiler also does IO operations, which causes the kernel _not_ to lower its priority; this explains why you would experience responsiveness issues in this scenario.

Adjusting the nice value of a process will hint the kernel that the process is either less or more important from the start and will keep the system responsive.

hamei wrote:
You're looking at this too much from a programmer's view. Look at it as a computer user for a change. We really don't give a shit if it takes ten seconds longer for the background task to finish. What is important to us is what we are doing. NOW !

If you want to use a UNIX system, help the kernel and run the background tasks at a lower priority, and the system will work exactly as you want it to :)

If you want the OS to handle UI tasks in a special way, then UNIX is not what you should be using. On UNIX, every one of your processes is equally important to the kernel, unless you tell it otherwise or it figures it out on its own.

IRIX also has a special scheduling class for (soft) real-time processes. If the GUI used that, you wouldn't see any issues with responsiveness. However, SGI must have decided against that for some reason and they use real-time priorities only for recording video and similar tasks.


hamei wrote:
ShadeOfBlue wrote:
There is much less overhead if the program spawns only as many threads as there are CPUs and then distributes work between them.

I do not think this is a good approach.

What would be a good approach then? You've agreed that spawning one thread for each image line isn't the right thing to do, so what is the optimal number of threads for a renderer if you can't get the number of CPUs?


hamei wrote:
And OpenMP sucks dead donkey balls for workstation loads. It's just not optimized for what a workstation should be doing. For some dedicated render machine or a box simulating nuclear explosions, fine. But not for a workstation .

It's possible to lower the priority of individual OpenMP threads, so in that case it works just fine on workstations. Within the same app you just check if the user ran the simulation/render job from the GUI or from the command-line and adjust priorities so it doesn't get in their way. Or you could just always use a lower priority. This is similar to the Maya example in my previous post.
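A tiny sketch of that idea (the --background flag and render_scene() are made up; it assumes the worker threads inherit the process's nice value, which is the usual behaviour):
Code:
#include <string.h>
#include <unistd.h>

extern void render_scene(void);   /* hypothetical OpenMP-parallel renderer */

int main(int argc, char **argv)
{
    /* If the job was started as a background/batch render, drop the whole
     * process to a low priority before the parallel region spins up, so
     * the worker threads stay out of the way of interactive programs. */
    if (argc > 1 && strcmp(argv[1], "--background") == 0)
        nice(19);

    render_scene();
    return 0;
}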


hamei wrote:
On a 2p O350, I have to disagree. If the point of having more than one processor is to actually use them, then having one do 70% of the work while the other sits idle and my desktop goes unresponsive, is just bad design

Did you read the rest of my explanation? Spreading threads to other CPUs just because they're free would give you worse overall performance, including for GUI apps.

The unresponsiveness isn't caused by the thread placement policy.

hamei wrote:
I don't care if it's inefficient, I want it to do that because I goddamned well want that application running NOW because that's what I am doing NOW.

It would also be inefficient for the usage scenario you describe. If you want the kernel to treat your programs preferentially, then just tell it so! :)

hamei wrote:
Actually, in real life this is not the case. Grab an old dually something and put OS/2 or BeOS on it.

Both of those OSes were made for much simpler hardware, where cache locality and similar concepts didn't matter as much as they do today. If you changed their schedulers to take that into account, they would run even better on new hardware; thread placement isn't responsible for the sluggishness. For a single-user machine, I agree, an OS like those would be better.

Speaking of schedulers... I took another look at Miser, the batch system on IRIX. It's not a fully-userspace implementation, as I had previously thought, but it has a kernel component too. It runs batch jobs from various user-defined queues whenever the system is idle. This kind of batch system is exactly what I had in mind when I talked about this in a previous post.

Next time, try using "nice gmake -j2" instead of just "gmake -j2". If you're still not happy with the responsiveness of the system, try using Miser. I've used IRIX even under heavy loads and have never experienced the kind of responsiveness issues you describe. It works well enough for me.

Did you experience similar UI responsiveness issues on other SGIs you have?


hamei wrote:
And ja 2, a single-user workstation is looking more and more attractive. What the hell do I really want all these users and groups for ? For most people it's pointless additional complexity. How many users do you really run your desktop as ?

It is useful if you want to ssh into your machine from another location. On single-user OSes like BeOS or classic Mac OS, you'd have to run a VNC server and then you wouldn't get a separate session, you'd just control the session that's currently active.

If you don't need that, then yes, it would be a bit faster and more responsive :)

hamei wrote:
ShadeOfBlue wrote:
IRIX was made to run on everything from workstations with a single CPU and one user to servers with 1024 CPUs and thousands of users;

Disagree. Remember the Personal Iris ? Indigo ? Indy ? Even the Crimson and Onyxes were "desksides", meant for one or two users, not a cast of thousands. They got into the hpc server schtick later on.

That's what I wrote. The last version of IRIX is used in both scenarios and everything in between.

Even in the Personal Iris era you had the Power Series racks with 8 CPUs and multiple users, so a single-user OS like you describe wouldn't have been useful there.

A big reason why it took off is that you had the same OS on both the low-end workstations and large servers, same behaviour everywhere. This made it easy to develop and test software for it.

hamei wrote:
The fact is, in many ways Yewnix is one huge kluge .... so we get new toolkits by the dozens while the underlying structural problems are wallpapered over. Post-modern society, whoopee :(

You might like Plan 9; it was designed from the ground up to fix the kludges that UNIX has accumulated over the years. However, since most people think that UNIX-like systems are good enough (or aren't bothered enough to look for alternatives), it doesn't get much attention.

Another problem with alternative operating systems like BeOS and such is that they lack software. An OS without software is useless. If every program you need to do your work is only available on UNIX or crappier alternatives, then you don't really have much of a choice :)
I just stumbled upon this on eBay:
Attachment:
sgi_unicorn.png


:lol:

http://www.ebay.co.uk/itm/380687061536
Cool, it even has Open Look :)
Blackwolf wrote:
i cant locate it on any of my installation discs unless im missing it.
Where can i find this?

It's on one of the "General & Platform Demos" CDs as part of the general demos distribution :)

I'm not sure if it's exactly v1.9, but there's a chance it could be newer than what's on ID's website.
hamei wrote:
Ooh. How about a big CF disk in a firewire reader, format the card xfs, and dd the image onto it ? Seems like that should work ?

It should :)
You don't even need to format it with xfs, since dd overwrites the filesystem anyways.

On an O3k the CD is connected via Firewire, so the system knows how to boot from it. If SGI kept this feature in the O300 and Tezro machines, it should be possible to boot IRIX CD images this way or even have the system disk on a CF card and use it as an SSD without the huge cost of SATA->SCSI adapters and similar.