The collected works of dexter1 - Page 2

schleusel wrote: Hi, I renamed the tardist and edited the announcement above. Could someone please change the blog entry accordingly?


Just did that.
Love the app! (I just installed it from source on my O2, IRIX 6.5.16m, MIPSPro 7.3.1.3)

But there is a problem with xlock screensavers and password locking.

When I wake the machine and want to enter my password to unlock the screen, the screensaver restarts as soon as I type the first character. At first I was stumped, but then I saw iconbar blitting away at the very bottom. So I killed xlock manually and got my desktop back.

I'm not sure exactly how one affects the other, but try running iconbar yourself and set the screensaver to lock the screen after one minute, and you should (hopefully) see the effect.
Thanks for the fix, Squeen! I haven't tried it yet, because I'm at home...

Actually, I'm going to try this on my Crimson at home, since I'm on holiday next week. The monster reacted favourably to my self-made cables, and I have a fresh disk ready to install IRIX 6.2 and iconbar. Expect screenshots :)
Yes, read:
http://www.ff-net.demon.nl/papa/overclo ... eite3.html

The Indy R5K@150MHz and the O2 R5K@180MHz CPUs are identical in appearance. Gemm swapped them, and both machines were happy with each other's CPU.
BTW, the 180 MHz Indy CPU has a voltage regulator feeding the CPU at 3.6 V, which the 150 MHz one lacks.

Triox pioneered Indy overclocking, alongside my feeble attempts at overclocking 5 V R4Ks. We are now focusing on modifying the EEPROM to overclock the R5K@150MHz to 200 MHz. If the O2 can take that, the Indy should work as well.
Done: http://www.nekochan.net/wiki/gallery/album21/crimea1

I took the latest CVS last Friday evening. In all the excitement I forgot to test the password locking. #-o I'll do that tomorrow on my O2 at work, asap.

Cheers

Update: the screen locking is indeed fixed, thanks so much!
The only remaining issue is that the icons now reappear after I unlock the screensaver by typing my password. This doesn't happen when the screensaver doesn't lock.

The only way to fix it is to click on the icons and minimize them, which brings them back into the iconbar. A bit annoying if you have to do that at every screensaver unlock.

But I am now convinced that this app will stay on my machine!
No, this is IRIX 6.5.19m on my O2 at work. My IRIX 6.2 Crimson RE is at home.

I've been able to reproduce the problem. It's a multiple desktop problem, but has nothing to do with xlock.
Apparently 'ov' does lots of dirty tricks. Here's how I can reproduce it:

Create two (or three) desks in ov and start iconbar on desk 1. You see the icons in the ov display, but they don't show on your desktop! If you switch back and forth to the second desk, the icons appear as soon as you return to desk 1.

Hmm. I thought Lisa had issues with multiple desktops as well. I believe she said that if you click on an iconized window from another desktop, it will appear on your current one.
Say Squeen,

How did you manage to circumvent the nagging about ImageMagick 5.5.1 when configuring? The freeware distro is 5.4.x, which configure doesn't like. And did you build it with gcc or MIPSPro?
Sheesh, the horrors the MIPSPro compiler developers put us through to get our apps compiled... :)

Thanks for the answer. I'll attempt an optimized 0.7.6 build with MIPSPro and put some of Lisa's notexture changes back in. I'll also try a static build with libbz2, libpng and, who knows, libimagemagick as well.

I'm just being masochistic today. I wonder why I'm not out there lighting fireworks :)
ducks wrote:
I couldn't do a full netinstall, because csh on FreeBSD is not exactly the same as SGI's


Use pdksh instead of csh. I've done several installs that way, even IRIX 4.0.1, 5.3 and 6.2! I usually follow this recipe on Linux:

Code:
sysctl -w net.ipv4.ip_no_pmtu_disc=1
sysctl -w net.ipv4.ip_local_port_range="2048 32767"

bootptab:
pippa:ht=1:sm=255.255.255.0:gw=192.168.9.5:ha=080069022996:ip=192.168.9.7

inetd.conf:
shell   stream  tcp     nowait  root    /usr/sbin/tcpd  in.rshd -L
login   stream  tcp     nowait  root    /usr/sbin/tcpd  in.rlogind
tftp    dgram   udp     wait    root    /usr/sbin/in.tftpd      in.tftpd -p -vv
bootps  dgram   udp     wait    root    /usr/sbin/bootpd        bootpd

passwd:
guest:x:500:100:,,,:/home/guest:/bin/sh

shadow:
guest:$1$Wsf0bAiT$Ua/QWYtP6k98G7R8uqQJH/:12329:0:99999:7:::

hosts:
192.168.9.7             pippa.sol pippa

hosts.allow:
ALL:localhost
ALL:192.168.9.1
ALL:192.168.9.7
ALL:pippa.sol

/home/guest/.rhosts
localhost frank guest
pippa root frank guest

copy install cd to disk first.

setenv notape 1
boot -f bootp()neo:/home2/irix53/stand/fx.IP12 --x

setenv notape 1
setenv tapedevice bootp()neo:/home2/irix53/dist/sa
boot -f $tapedevice(sash.IP12) --m


This contains the necessary steps to netboot my Personal IRIS with IRIX 5.3, along with 'ln -s /bin/pdksh /bin/sh'.
Happy new year to all of you!

:hathat49: :silly: :smilecolros: :drinking:
As a matter of fact, yes.

I have built ImageMagick 5.5.7-15 statically without Perl, managed to get rss-glx 0.7.6 compiled with MIPSPro 7.3.1.3m, and applied the patch from Lisa's 0.7.4. Only three things needed fixing for it to compile on MIPSPro:

Code:

--- oglc_src/FirePart.h.save    Fri Jan  2 14:22:27 2004
+++ oglc_src/FirePart.h Fri Jan  2 14:23:38 2004
@@ -122,7 +122,7 @@
Particle *p = TblP;     //+1;

n = 0;
-       float da = pow (FIREDA, dt);
+       float da = pow ((float)FIREDA, dt);
//float ds = pow (FIREDS, dt);
//SVector3D v;

@@ -162,7 +162,7 @@
p->s.x = size;
p->s.y = size;
p->s.z = size;
-                               p->a = alpha * pow (FIREDA, (t - LastPartTime));
+                               p->a = alpha * pow ((float)FIREDA, (t - LastPartTime));
p->s *= 0.5f + nrnd (0.5);
}
} else
--- reallyslick/cpp_src/skyrocket_smoke.cpp.save        Fri Jan  2 14:31:13 2004
+++ reallyslick/cpp_src/skyrocket_smoke.cpp     Fri Jan  2 14:31:44 2004
@@ -17,6 +17,7 @@
*/

#include <stdlib.h>
+#include <stdio.h>
#include <GL/gl.h>
#include <GL/glu.h>

--- reallyslick/cpp_src/skyrocket_world.cpp.save        Fri Jan  2 14:31:30 2004
+++ reallyslick/cpp_src/skyrocket_world.cpp     Fri Jan  2 14:31:58 2004
@@ -16,6 +16,7 @@
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/

+#include <stdio.h>
#include <math.h>
#include <GL/gl.h>
#include <GL/glu.h>


I built the code on my O2 at work, but haven't had a chance to package it and give it a spin. I'll do that as soon as I get to work next Monday. Meanwhile I'm attempting to compile it on my Crimson, so hold on...
Well, all looks good, except for Cyclone of course. Lisa's patch lets Lattice run without textures, so everything looks cool, except for sound. I'll benchmark my build against Squeen's and see if it really makes a difference. And because everybody has an Octane, I'll make a -mips4 -Ofast=ip30 -IPA build alongside a -mips3 build with more general optimisations.
It seems that for Lattice, Lisa has added a -T option if you want textures. I'm not sure whether that means you only get textures with -T or that you can specify a texture. I'm going to try this on an XZ Indy to see if that makes a difference. Otherwise we'll have to hack a bit more, or ship an unpatched rss build next to a patched one.
Looks like some disk labels have been screwed up.

Your best bet is to rerun fx in expert mode ('fx -x')

and write a default SGI label onto the disks, then select (o)ption drive in the partition menu.

As for XVM, I don't know. I haven't played with it yet; XLV is still my preferred way of striping disks, because I know that procedure best.
Neko, I've uploaded two tardists of the MIPSPro build of RSS GLX 0.7.6, with Lisa's patch and my compile patch, to incoming.
The mips3 is a general -Ofast -mips3 -IPA build for all machines
The mips4 is a -Ofast=ip30 -mips4 -IPA build optimized for Octanes, but should run on every R5K and up.
The only prerequisite is libbz2, which the package checks for.

Let me know if there are any probs. Oh and Cyclone is still broken :(
You have a valid point. I will try to re-edit that particular post.
canavan wrote: (2) what's the size limit of EFS anyway?


8 GB, so this will fit nicely on a double-layer DVD.
dexter1 wrote: Oh and Cyclone is still broken :(


Fixed it! :)

Take a look at reallyslick/cpp_src/cyclone.cpp:443

Code:

glColor3f (r, g, b);
glPushMatrix ();
glLoadIdentity ();
glTranslatef (xyz[0], xyz[1], xyz[2]);
glRotatef (tiltAngle, crossVec[0], crossVec[1], crossVec[2]);
glRotatef (spinAngle, 0, 1, 0);
glTranslatef (width * cyWidth, 0, 0);
if (dStretch)
glScalef (1.0f, 1.0f, scale);
glCallList (1);
glPopMatrix ();


This is the main screen update routine (glLoadIdentity resets the current matrix to the identity, and glCallList(1) executes a precompiled display list of primitives). Note the first glRotatef, with tiltAngle as its first argument. For some reason this glRotatef caused nothing to be drawn on the SGI's native display, but as soon as I sent the display to a Linux machine with OpenGL support (Matrox), it worked!?!?!
Huh?
After several hours of dissecting, it turns out that tiltAngle and crossVec are products of a machine-code-optimised x86 routine, which has a C++ counterpart for 'other' CPUs like our MIPS. Instead of agonising over assembly, I took the easy way out and commented out only that first glRotatef. Voilà :) My Crimson sweating out a twister:

http://www.nekochan.net/wiki/gallery/album21/crimea3
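The root cause isn't pinned down here. One unverified guess is that tiltAngle or crossVec comes out as NaN, or crossVec as a zero-length vector, on the MIPS code path; glRotatef normalises its axis, and a zero-length axis can't be normalised. Rather than commenting the call out entirely, a guard along these lines would skip only such degenerate cases. rotation_ok is a hypothetical helper, not part of rss-glx:

```c
#include <math.h>

/* Hypothetical helper (not part of rss-glx): returns 1 if the angle and the
 * axis (x, y, z) are finite and the axis is not (nearly) zero-length, i.e.
 * reasonable to hand to glRotatef(); returns 0 otherwise. */
int rotation_ok(float angle, float x, float y, float z)
{
    if (!isfinite(angle) || !isfinite(x) || !isfinite(y) || !isfinite(z))
        return 0;
    /* glRotatef normalises the axis; a zero-length axis can't be normalised. */
    return x * x + y * y + z * z > 1e-12f;
}

/* Possible use in cyclone.cpp (untested sketch):
 *   if (rotation_ok(tiltAngle, crossVec[0], crossVec[1], crossVec[2]))
 *       glRotatef(tiltAngle, crossVec[0], crossVec[1], crossVec[2]);
 */
```

If the guess is right, the twister would still rotate normally whenever crossVec is valid, instead of never tilting at all.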

I'll retest it on my O2 tomorrow, and if I'm happy I'll put the corrected tardists (mips3 and mips4) on Neko's server.

Cheerio
Hello,

I've just uploaded a 75 MB tardist of Qt 3.3.1 to my university server:
http://www.mechanics.citg.tudelft.nl/~e ... s3.tardist
I have uploaded it to Neko's FTP server as well. It's a full MIPSPro -mips3 build with no dependencies on freeware, and it includes both single- and multi-threaded libraries. It's all in the Nekoware format, so you can build it yourself.

FWIW, I built it on a Challenge S running IRIX 6.5.20m with MIPSPro 7.4.1 and the POSIX/MIPSPro patches. A small patch to /usr/include/stdlib.h was necessary to build qmake, which is included. From there on it was a breeze :) Well, apart from creating the idb file, which took me several hours :(

More to come! (KDE 3.2.1)
That's ok. I'm a patient man and my wine bottle is not empty yet :drinking:
In the qt331 tardist I included that exact strtoll declaration as a patch to /usr/include/stdlib.h.

strtoll is a C99 function, but it is not (yet) part of standard C++ (C++98).
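For reference, this is what strtoll does in plain C99: it parses a string into a long long. A minimal sketch with error checking follows; parse_ll is a hypothetical helper name, and this is not the stdlib.h patch from the tardist:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as a signed 64-bit integer with C99 strtoll().
 * Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal.
 * Returns 0 and stores the value in *out on success; returns -1 if s is
 * empty, has trailing characters, or is out of range for long long. */
int parse_ll(const char *s, long long *out)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(s, &end, 0);
    if (end == s || *end != '\0' || errno == ERANGE)
        return -1;
    *out = v;
    return 0;
}
```

Checking both the end pointer and errno is what distinguishes "not a number" and "too big" from a genuine result.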
foetz wrote:
got the qt pack but qmake is missing!!!! the whole folder!


Whoops, bin/qmake was a link to qmake/qmake. I should have spotted that one. Do you only need the qmake binary, or do you also need the qmake directory with the object files?
I'm currently fixing some symbolic link deficiencies in my qt tardist package.

Be patient; it's a virtue.
I've uploaded the new qt331-mips3.tardist to Neko's server and to my mirror http://www.mechanics.citg.tudelft.nl/~e ... s3.tardist

It fixes the missing qmake and the symbolic links to nonexistent includes, and adds the phrasebooks and the templates directory. As Whiter said, that directory contains two header files with brackets in their names, which choke the entire swpkg build. I had an idea of fixing that with a postop script/command, but abandoned it. Instead I removed the brackets from the names, and I'll leave it at that until I have a bright idea.
I never ran lmdd; I always use diskperf, which does the job just fine. I just gave lmdd a spin, and it looks more like a general I/O timing benchmark than a disk performance analysis tool.

IMO ditch lmdd usage for disk performance numbers and go for diskperf instead.
Orakel wrote: One of the most interesting usages (ie. Challenge S as firewall) is not possible because GIO64 is not supported, hence no Phobos.


For the record, Challenge S uses GIO32bis, not GIO64. The Mezzanine board is indeed not supported (yet).
Challenge M uses GIO64, BTW.
Unixmuseum, that is quite enough. No personal attacks please.

If someone's idea of system security is sitting behind a NAT and leaving every BSD protocol open, that is their choice. My Crimson at home is also behind a NAT, but I make a point of compiling the latest OpenSSL/OpenSSH myself and logging in only via SSH. Not only is it good practice, you also get to know the Crimson's quirks when compiling that code, and in general it increases the work needed for an occasional hacker who succeeds in breaking through my firewall.
Yes, i have read the thread.

Look, putting irony or even sarcasm in posts is fine, as long as you use smileys or an obvious joke to signal it. Your post doesn't contain a single one. So how am I supposed to know which remark is ironic and which one is not?

Please people, use the smileys! That's what they are there for...

Sorry if my response was a bit harsh, but the conversation was getting very off-topic, and in my view Orakel didn't deserve your reply, which only widens the gap between your points of view.

I have also received flak for other (non-)moderation decisions in the past. That's OK, I can take it. Please, please PM me with your complaints, so I can moderate better in the future.
Shtoink wrote: I'm no genius, just a lacky... :D


Fetch me some beer then... :twisted:
With Schleusel's patches to MPlayer 1.0pre4 I have succeeded in running a SpeedShop trace. My machine is an I2 HI+TRAM, 195 MHz, 384 MB, IRIX 6.5.22m + patches, MIPSPro 7.4.2m + patches. Flags were '-O3 -r10000 -mips4 -n32'. The sample file was a fansub of Psychic Academy episode 4.
BTW, when compiling MPlayer, do not strip the resulting executable: prof needs its symbols to map the samples to function names! Then:

Code:

ssrun -v -exp usertime ./mplayer -vo gl2 -vf format=RGB24 -nosound -benchmark your.avi

which produces a file called mplayer.usertime.<somenumbers> (an 'm' followed by the process ID, e.g. mplayer.usertime.m17232).
Then run prof:

Code:

prof mplayer.usertime.somenumbers > out.txt

And this is what comes out:

Code:

-------------------------------------------------------------------------
SpeedShop profile listing generated Sat Jul  3 20:07:29 2004

prof mplayer.usertime.m17232

mplayer (n32): Target program
usertime: Experiment name
ut:cu: Marching orders
R10000 / R10010: CPU / FPU
1: Number of CPUs
195: Clock frequency (MHz.)
Experiment notes--
From file mplayer.usertime.m17232:
Caliper point 0 at target begin, PID 17232
/usr2/local/src/MPlayer-1.0pre4/mplayer -nosound -benchmark psychic_academy_ep04.avi
Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of statistical callstack sampling data (usertime)--
494: Total Samples
0: Samples with incomplete traceback
14.820: Accumulated Time (secs.)
30.0: Sample interval (msecs.)
-------------------------------------------------------------------------
Function list, in descending order by exclusive time
-------------------------------------------------------------------------
[index]  excl.secs excl.%   cum.%  incl.secs incl.%    samples  procedure  (dso: file, line)

[14]      3.810  25.7%   25.7%      3.810  25.7%        127  yuv2rgb_c_24_rgb (mplayer: yuv2rgb.c, 313)
[20]      2.490  16.8%   42.5%      2.490  16.8%         83  simple_idct_add (mplayer: simple_idct.c, 399)
[21]      1.530  10.3%   52.8%      1.530  10.3%         51  __ioctl (libc.so.1: stat.c, 32; compiled in ioctl.s)
[23]      1.410   9.5%   62.3%      1.410   9.5%         47  __glMgrWaitForDMAWrite (libGLcore.so: mgras_pxdma.c, 368)
[29]      0.930   6.3%   68.6%      0.930   6.3%         31  put_pixels16_l2 (mplayer: dsputil.c, 67)
[30]      0.810   5.5%   74.1%      0.810   5.5%         27  __glMgrim_Finish (libGLcore.so: mgras_modes.c, 60)
[33]      0.420   2.8%   76.9%      0.420   2.8%         14  simple_idct_put (mplayer: simple_idct.c, 389)
[34]      0.420   2.8%   79.8%      0.420   2.8%         14  yuv2rgb_c_24_bgr (mplayer: yuv2rgb.c, 332)
[32]      0.390   2.6%   82.4%      0.450   3.0%         15  msmpeg4_decode_block (mplayer: msmpeg4.c, 1676)
[35]      0.300   2.0%   84.4%      0.300   2.0%         10  memset (libc.so.1: stat.c, 32; compiled in bzero.s)
[7]      0.240   1.6%   86.0%      4.530  30.6%        151  MPV_decode_mb (mplayer: mpegvideo.c, 3093)
[39]      0.180   1.2%   87.2%      0.180   1.2%          6  simple_idct (mplayer: simple_idct.c, 409)
[42]      0.150   1.0%   88.3%      0.150   1.0%          5  __write (libc.so.1: flush.c, 58; compiled in write.s)
[49]      0.120   0.8%   89.1%      0.120   0.8%          4  __read (libc.so.1: malloc.c, 907; compiled in read.s)
[50]      0.120   0.8%   89.9%      0.120   0.8%          4  ff_h263_update_motion_val (mplayer: h263.c, 614)
[31]      0.090   0.6%   90.5%      0.690   4.7%         23  msmpeg4v34_decode_mb (mplayer: msmpeg4.c, 1582)
[63]      0.090   0.6%   91.1%      0.090   0.6%          3  h263_pred_motion (mplayer: h263.c, 1573)
[64]      0.090   0.6%   91.7%      0.090   0.6%          3  put_no_rnd_pixels16_xy2_c (mplayer: dsputil.c, 897)
[65]      0.090   0.6%   92.3%      0.090   0.6%          3  _BSD_getime (libc.so.1: flush.c, 58; compiled in BSD_getime.s)
[28]      0.060   0.4%   92.7%      1.110   7.5%         37  mpeg_motion (mplayer: mpegvideo.c, 2464)
.
snip


Amazing! :shock: More than 25% of the time is spent in yuv2rgb, the colorspace conversion. The inverse discrete cosine transform is #2. Looks like we can kick some butt by:
1) Writing a faster colorspace converter routine. MIPS asm, SGI_color_matrix, your momma on a calculator: anything seems better than this one.
2) Speeding up the IDCT routine. If I can get the SCSL libraries installed, there's a good chance they contain optimised fast Fourier transform routines. complib is also an option for somewhat older machines.
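As a baseline for point 1, here is a minimal plain-C fixed-point YUV to RGB24 converter. It is a sketch under assumptions, not MPlayer's yuv2rgb code: it uses the common BT.601 integer approximation (luma scale 1.164, chroma factors 1.596, 0.391, 0.813 and 2.018, all multiplied by 256) and assumes limited-range input (Y in 16..235) with 4:2:0 chroma shared between horizontal pixel pairs. The function names are hypothetical:

```c
#include <stdint.h>

/* Clamp a fixed-point value (8 fractional bits) to 0..255. Negative values
 * are handled before shifting, because >> on a negative int is
 * implementation-defined in C. */
static uint8_t clamp_fix8(int v)
{
    if (v < 0)
        return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Convert one row of planar 4:2:0 YUV (BT.601, limited range) to packed RGB24.
 * y: width luma samples; u, v: (width + 1) / 2 chroma samples;
 * rgb: 3 * width output bytes, R first. */
void yuv420_row_to_rgb24(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                         uint8_t *rgb, int width)
{
    for (int x = 0; x < width; x++) {
        int c = 298 * (y[x] - 16);      /* 1.164 * 256 */
        int d = u[x / 2] - 128;         /* Cb */
        int e = v[x / 2] - 128;         /* Cr */
        rgb[3 * x + 0] = clamp_fix8(c + 409 * e + 128);           /* 1.596 */
        rgb[3 * x + 1] = clamp_fix8(c - 100 * d - 208 * e + 128); /* 0.391, 0.813 */
        rgb[3 * x + 2] = clamp_fix8(c + 516 * d + 128);           /* 2.018 */
    }
}
```

The +128 rounds to nearest before the shift. A faster version would process several pixels per iteration or use lookup tables; this one is only meant as a correct reference to compare faster variants against.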

Project! :D
jan-jaap wrote: firefox has tag 0x100013, you have to tag your /usr/local/firefox/firefox (or whereever it lives):
cd /usr/local/firefox && tag 0x100013 firefox


Unfortunately, /usr/local/firefox/firefox is not a binary but a shell script, so tagging it doesn't work. Maybe I'll try a "glob" later tonight...
Gaaack :oops:

I feel like a newbie... You're right, Neko.

I probably typed #0x100013 instead of 0x100013. Can't think of a better excuse :)
ChiaHos wrote: Unfortunately we've hit another snag: The current CVS compiles the solid/qhull collision detection libs, which seems to only want to compile with MipsPro 7.4 (unless somebody knows how to process lines like "#include <cmath>" with MipsPro 7.3.1.3m?).


This has been covered in an old thread about Octave builds on IRIX:
viewtopic.php?t=710

The trick is to get the separate CC-isoheaders to augment your MIPSPro 7.3.1.3m. There are also SGI patches for this, but they are behind support contracts. :( Unfortunately the tarball mentioned in the above thread is gone, but I found an Octave reference:
http://wiki.octave.org/wiki.pl?PaulKienzleIrixConf

And here is the CC-isoheaders link:
http://octave.sourceforge.net/MIPS73-isoheaders.tar.gz
Hi all,

Sorry for not being very active on Nekoware builds over the last couple of weeks, but I really wanted to take part in Schleusel's and Vegac's attempts at making MPlayer just a little bit faster, so I can watch neat movie stuff on my I2 Impact ;) . It has cost me a lot of time, but boy, am I glad I spent it with them and MPlayer. I will show you what I did, so maybe you can learn from my attempts to speed up its frame rate. The optimisation is not done yet, it's still ongoing, and we have only just begun exploring hardware colorspace conversion, but the methods behind the software optimisation are now understood. Here it comes. I hope Neko won't mind me breaking the record for the longest post ever on Nekochan. Beer is in the mail, Pete :P

Since my first SpeedShop run in the MPlayer 1.0pre3 thread (viewtopic.php?t=1374), I've read the ssrun man pages to get acquainted with the most-used options. Instead of 'ssrun -exp totaltime' or 'ssrun -exp usertime' I now use 'ssrun -exp fpcsampx', fine-grained program counter sampling (1 ms per sample, 4-byte bins, as the listing shows), to get the time spent in every routine MPlayer is busy with. So let's do a standard MIPSPro 7.4.2 '-O3 -r10000 -mips4 -n32' build of MPlayer 1.0-pre4 and run 'ssrun -exp fpcsampx ./mplayer' on a standard .avi file (Schleusel and I use "courtyard.avi", an 11.4 MB Call of Duty demo video that can be found on the net). The machine is an R10K@180MHz Origin200:

Code:
ssrun -exp fpcsampx ./mplayer -vo null -vf format=rgb24 -nosound courtyard.avi
prof -lines mplayer.fpcsampx.#


Code:
-------------------------------------------------------------------------
SpeedShop profile listing generated Thu Jul 22 09:54:04 2004

prof -lines mplayer.fpcsampx.m249164

mplayer (n32): Target program
fpcsampx: Experiment name
pc,4,1000,0:cu: Marching orders
R10000 / R10010: CPU / FPU
4: Number of CPUs
180: Clock frequency (MHz.)
Experiment notes--
From file mplayer.fpcsampx.m249164:
Caliper point 0 at target begin, PID 249164
/usr1/local/everdij/MPlayer-1.0pre4/mplayer -vo null -vf format=rgb24 -nosound courtyard.avi
Caliper point 1 at exit(0)
-------------------------------------------------------------------------
Summary of statistical PC sampling data (fpcsampx)--
64010: Total samples
64.010: Accumulated time (secs.)
1.0: Time per sample (msecs.)
4: Sample bin width (bytes)
-------------------------------------------------------------------------
Function list, in descending order by time
-------------------------------------------------------------------------
[index]      secs    %    cum.%   samples  function (dso: file, line)

[1]    21.754  34.0%  34.0%     21754  simple_idct_add (mplayer: simple_idct.c, 399)
[2]    20.301  31.7%  65.7%     20301  yuv2rgb_c_24_rgb (mplayer: yuv2rgb.c, 319)
[3]     4.593   7.2%  72.9%      4593  put_pixels8_c (mplayer: dsputil.c, 897)
[4]     3.701   5.8%  78.7%      3701  simple_idct_put (mplayer: simple_idct.c, 389)
[5]     2.402   3.8%  82.4%      2402  msmpeg4_decode_block (mplayer: msmpeg4.c, 1676)
[6]     1.323   2.1%  84.5%      1323  put_pixels16_xy2_c (mplayer: dsputil.c, 897)
[7]     1.214   1.9%  86.4%      1214  memset (libc.so.1: stat.c, 32; compiled in bzero.s)


And so forth... Note the total time: 64 seconds. About one third of the program's time is spent in the inverse discrete cosine transform add routine (simple_idct_add, 34.0%), another third in the colorspace converter yuv2rgb_c_24_rgb (31.7%), and the remainder in everything else.
So Schleusel, Vegac and I sort of divided up the tasks. Schleusel tried some optimisation flags (-IPA) and porting details, Vegac looked at the colorspace conversion and possible OpenGL O2/ICE speedups, and I got libavcodec/simple_idct.c :) simple_idct_add is just a small routine consisting of two 'for' loops of 8 iterations each, which is basically how a 2D IDCT is built from two passes of a 1D IDCT: first the rows are transformed, then the columns. The transformed values are added to the already existing image, hence 'add'. For more info, read up on the IDCT:
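The numbers in both listings follow from simple arithmetic: time = samples × sample interval, and the percentage is that time divided by the accumulated total. For example, 21754 samples × 1 ms = 21.754 s, and 21.754 / 64.010 ≈ 34.0%; in the earlier usertime run, 127 samples × 30 ms = 3.81 s, or 25.7% of 14.82 s. A tiny sketch with hypothetical helper names:

```c
/* time (s) = number of samples * sample interval (ms) / 1000 */
double samples_to_secs(long samples, double interval_ms)
{
    return (double)samples * interval_ms / 1000.0;
}

/* share of the accumulated total time, in percent */
double percent_of_total(double secs, double total_secs)
{
    return 100.0 * secs / total_secs;
}
```

This is also why the 1 ms fpcsampx run gives a much finer picture than the 30 ms usertime run: 64010 samples versus 494.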

(1) http://skal.planet-d.net/coding/dct.html
(2) http://www-vs.informatik.uni-ulm.de/bib ... paper.html
(3) http://rnvs.informatik.tu-chemnitz.de/~ ... /IDCT.html

Here's the interesting part:
Code:
/* signed 16x16 -> 32 multiply add accumulate */
#define W1  22725  //cos(i*M_PI/16)*sqrt(2)*(1<<14) + 0.5
#define W2  21407  //cos(i*M_PI/16)*sqrt(2)*(1<<14) + 0.5
#define W3  19266  //cos(i*M_PI/16)*sqrt(2)*(1<<14) + 0.5
#define W4  16383  //cos(i*M_PI/16)*sqrt(2)*(1<<14) + 0.5
#define W5  12873  //cos(i*M_PI/16)*sqrt(2)*(1<<14) + 0.5
#define W6  8867   //cos(i*M_PI/16)*sqrt(2)*(1<<14) + 0.5
#define W7  4520   //cos(i*M_PI/16)*sqrt(2)*(1<<14) + 0.5
#define ROW_SHIFT 11
#define COL_SHIFT 20 // 6

#define MAC16(rt, ra, rb) rt += (ra) * (rb)

/* signed 16x16 -> 32 multiply */
#define MUL16(rt, ra, rb) rt = (ra) * (rb)

static inline void idctSparseColAdd (uint8_t *dest, int line_size,
DCTELEM * col)
{
int a0, a1, a2, a3, b0, b1, b2, b3;
uint8_t *cm = cropTbl + MAX_NEG_CROP;

/* XXX: I did that only to give same values as previous code */
a0 = W4 * (col[8*0] + ((1<<(COL_SHIFT-1))/W4));
a1 = a0;
a2 = a0;
a3 = a0;

a0 +=  + W2*col[8*2];
a1 +=  + W6*col[8*2];
a2 +=  - W6*col[8*2];
a3 +=  - W2*col[8*2];

MUL16(b0, W1, col[8*1]);
MUL16(b1, W3, col[8*1]);
MUL16(b2, W5, col[8*1]);
MUL16(b3, W7, col[8*1]);

MAC16(b0, + W3, col[8*3]);
MAC16(b1, - W7, col[8*3]);
MAC16(b2, - W1, col[8*3]);
MAC16(b3, - W5, col[8*3]);

if(col[8*4]){
a0 += + W4*col[8*4];
a1 += - W4*col[8*4];
a2 += - W4*col[8*4];
a3 += + W4*col[8*4];
}

if (col[8*5]) {
MAC16(b0, + W5, col[8*5]);
MAC16(b1, - W1, col[8*5]);
MAC16(b2, + W7, col[8*5]);
MAC16(b3, + W3, col[8*5]);
}

if(col[8*6]){
a0 += + W6*col[8*6];
a1 += - W2*col[8*6];
a2 += + W2*col[8*6];
a3 += - W6*col[8*6];
}

if (col[8*7]) {
MAC16(b0, + W7, col[8*7]);
MAC16(b1, - W5, col[8*7]);
MAC16(b2, + W3, col[8*7]);
MAC16(b3, - W1, col[8*7]);
}

dest[0] = cm[dest[0] + ((a0 + b0) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a1 + b1) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a2 + b2) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a3 + b3) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a3 - b3) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a2 - b2) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a1 - b1) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a0 - b0) >> COL_SHIFT)];
}


At first glance I thought: what a lot of branches! If this routine is to become fast, it has to get rid of all those branches. Closer inspection shows that the 'if' statements can safely be removed: the condition (col[8*x]) is true when col[8*x] != 0, and if col[8*x] == 0, the body of the 'if' adds or subtracts zero from the coefficients anyway. So the test only acts as a shortcut and does not change the result:

Code:
if(col[8*4]){
a0 += + W4*col[8*4];
a1 += - W4*col[8*4];
a2 += - W4*col[8*4];
a3 += + W4*col[8*4];
}

if (col[8*5]) {
MAC16(b0, + W5, col[8*5]);
MAC16(b1, - W1, col[8*5]);
MAC16(b2, + W7, col[8*5]);
MAC16(b3, + W3, col[8*5]);
}

if(col[8*6]){
a0 += + W6*col[8*6];
a1 += - W2*col[8*6];
a2 += + W2*col[8*6];
a3 += - W6*col[8*6];
}

if (col[8*7]) {
MAC16(b0, + W7, col[8*7]);
MAC16(b1, - W5, col[8*7]);
MAC16(b2, + W3, col[8*7]);
MAC16(b3, - W1, col[8*7]);
}


will become:
Code:
a0 += + W4*col[8*4];
a1 += - W4*col[8*4];
a2 += - W4*col[8*4];
a3 += + W4*col[8*4];
MAC16(b0, + W5, col[8*5]);
MAC16(b1, - W1, col[8*5]);
MAC16(b2, + W7, col[8*5]);
MAC16(b3, + W3, col[8*5]);
a0 += + W6*col[8*6];
a1 += - W2*col[8*6];
a2 += + W2*col[8*6];
a3 += - W6*col[8*6];
MAC16(b0, + W7, col[8*7]);
MAC16(b1, - W5, col[8*7]);
MAC16(b2, + W3, col[8*7]);
MAC16(b3, - W1, col[8*7]);


So now the CPU has to execute more instructions; will removing those conditional branches make up for that? Answer later.

When reading (1) it becomes clear that these are indeed matrix operations, four of which are so-called 'rotations' involving cosines. The cosine coefficients are all separate #defines converted to integers, which makes this routine a fast integer 1D IDCT. Now write out all the multiplications and coefficients:
Code:
a0 = W4 * (col[8*0] + ((1<<(COL_SHIFT-1))/W4));
a1 = a0;
a2 = a0;
a3 = a0;

a0 +=  + W2*col[8*2];
a1 +=  + W6*col[8*2];
a2 +=  - W6*col[8*2];
a3 +=  - W2*col[8*2];

MUL16(b0, W1, col[8*1]);
MUL16(b1, W3, col[8*1]);
MUL16(b2, W5, col[8*1]);
MUL16(b3, W7, col[8*1]);

MAC16(b0, + W3, col[8*3]);
MAC16(b1, - W7, col[8*3]);
MAC16(b2, - W1, col[8*3]);
MAC16(b3, - W5, col[8*3]);
a0 += + W4*col[8*4];
a1 += - W4*col[8*4];
a2 += - W4*col[8*4];
a3 += + W4*col[8*4];
MAC16(b0, + W5, col[8*5]);
MAC16(b1, - W1, col[8*5]);
MAC16(b2, + W7, col[8*5]);
MAC16(b3, + W3, col[8*5]);
a0 += + W6*col[8*6];
a1 += - W2*col[8*6];
a2 += + W2*col[8*6];
a3 += - W6*col[8*6];
MAC16(b0, + W7, col[8*7]);
MAC16(b1, - W5, col[8*7]);
MAC16(b2, + W3, col[8*7]);
MAC16(b3, - W1, col[8*7]);

=
Code:
a0  = W4 * col[8*0] + (1<<(COL_SHIFT-1));
a1  = a0;
a2  = a0;
a3  = a0;

a0 += W4*col[8*4];
a1 -= W4*col[8*4];
a2 -= W4*col[8*4];
a3 += W4*col[8*4];

a0 += col[8*2]*W2;
a1 += col[8*2]*W6;
a2 -= col[8*2]*W6;
a3 -= col[8*2]*W2;
a0 += col[8*6]*W6;
a1 -= col[8*6]*W2;
a2 += col[8*6]*W2;
a3 -= col[8*6]*W6;

b0  = col[8*1]*W1;
b1  = col[8*1]*W3;
b2  = col[8*1]*W5;
b3  = col[8*1]*W7;
b0 += col[8*3]*W3;
b1 -= col[8*3]*W7;
b2 -= col[8*3]*W1;
b3 -= col[8*3]*W5;
b0 += col[8*5]*W5;
b1 -= col[8*5]*W1;
b2 += col[8*5]*W7;
b3 += col[8*5]*W3;
b0 += col[8*7]*W7;
b1 -= col[8*7]*W5;
b2 += col[8*7]*W3;
b3 -= col[8*7]*W1;

=
Code:
int d0,d2=col[8*2],d4=W4*col[8*4],d6=col[8*6];
int d1=col[8*1],d3=col[8*3],d5=col[8*5],d7=col[8*7];

a0  = W4 * col[8*0] + (1<<(COL_SHIFT-1));
a1  = a0;
a0 += d4;
a1 -= d4;
a3  = a0;
a2  = a1;

a0 += d2*W2 + d6*W6;
a1 += d2*W6 - d6*W2;
a2 +=-d2*W6 + d6*W2;
a3 +=-d2*W2 - d6*W6;

b0  = d1*W1 + d7*W7;
b3  = d1*W7 - d7*W1;
b2  = d1*W5 + d7*W3;
b1  = d1*W3 - d7*W5;

b0 += d3*W3 + d5*W5;
b3 +=-d3*W5 + d5*W3;
b1 +=-d3*W7 - d5*W1;
b2 +=-d3*W1 + d5*W7;
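Rearrangements like this are easy to get wrong, so it is worth checking them mechanically. This sketch (hypothetical function names, coefficients copied from above, rounding bias left out) computes the a and b terms both the original branched way and the rearranged way for random columns, roughly a third of them zero so the 'if' shortcuts get exercised, and counts disagreements. Integer addition is exact as long as nothing overflows, so the count should be zero for inputs in the assumed range of -1024..1023:

```c
#include <stdint.h>
#include <stdlib.h>

#define W1 22725
#define W2 21407
#define W3 19266
#define W4 16383
#define W5 12873
#define W6 8867
#define W7 4520

/* a[] and b[] as in the original routine, 'if' shortcuts included
 * (rounding bias omitted: this checks only the multiply/add terms). */
static void ab_original(const int16_t *col, int a[4], int b[4])
{
    a[0] = a[1] = a[2] = a[3] = W4 * col[8*0];
    a[0] += W2*col[8*2];  a[1] += W6*col[8*2];
    a[2] -= W6*col[8*2];  a[3] -= W2*col[8*2];
    b[0]  = W1*col[8*1];  b[1]  = W3*col[8*1];
    b[2]  = W5*col[8*1];  b[3]  = W7*col[8*1];
    b[0] += W3*col[8*3];  b[1] -= W7*col[8*3];
    b[2] -= W1*col[8*3];  b[3] -= W5*col[8*3];
    if (col[8*4]) { a[0] += W4*col[8*4]; a[1] -= W4*col[8*4];
                    a[2] -= W4*col[8*4]; a[3] += W4*col[8*4]; }
    if (col[8*5]) { b[0] += W5*col[8*5]; b[1] -= W1*col[8*5];
                    b[2] += W7*col[8*5]; b[3] += W3*col[8*5]; }
    if (col[8*6]) { a[0] += W6*col[8*6]; a[1] -= W2*col[8*6];
                    a[2] += W2*col[8*6]; a[3] -= W6*col[8*6]; }
    if (col[8*7]) { b[0] += W7*col[8*7]; b[1] -= W5*col[8*7];
                    b[2] += W3*col[8*7]; b[3] -= W1*col[8*7]; }
}

/* The same terms in the rearranged, branch-free form. */
static void ab_rearranged(const int16_t *col, int a[4], int b[4])
{
    int d2 = col[8*2], d4 = W4*col[8*4], d6 = col[8*6];
    int d1 = col[8*1], d3 = col[8*3], d5 = col[8*5], d7 = col[8*7];
    int a0 = W4*col[8*0] + d4, a1 = W4*col[8*0] - d4;
    a[0] = a0 + d2*W2 + d6*W6;
    a[1] = a1 + d2*W6 - d6*W2;
    a[2] = a1 - d2*W6 + d6*W2;
    a[3] = a0 - d2*W2 - d6*W6;
    b[0] = d1*W1 + d7*W7 + d3*W3 + d5*W5;
    b[1] = d1*W3 - d7*W5 - d3*W7 - d5*W1;
    b[2] = d1*W5 + d7*W3 - d3*W1 + d5*W7;
    b[3] = d1*W7 - d7*W1 - d3*W5 + d5*W3;
}

/* Number of a/b terms on which the two forms disagree over random columns. */
int count_mismatches(int trials, unsigned seed)
{
    int16_t col[64] = {0};
    int a1[4], b1[4], a2[4], b2[4], bad = 0;
    srand(seed);
    for (int t = 0; t < trials; t++) {
        for (int k = 0; k < 8; k++)   /* about 1/3 zeros to hit the ifs */
            col[8*k] = (rand() % 3 == 0) ? 0 : (int16_t)(rand() % 2048 - 1024);
        ab_original(col, a1, b1);
        ab_rearranged(col, a2, b2);
        for (int i = 0; i < 4; i++)
            if (a1[i] != a2[i] || b1[i] != b2[i])
                bad++;
    }
    return bad;
}
```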


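So after a lot of wizardry, I'm left with some fairly symmetric multiplications. (One caveat: along the way the rounding term W4*((1<<(COL_SHIFT-1))/W4) = 16383*32 = 524256 was replaced by 1<<(COL_SHIFT-1) = 524288, so the results are not guaranteed to be bit-identical to the original.) Now comes the clever part. Also from (1), a multiplication of the form: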
Code:
t0 = W0 * d0 + W1 * d1;
t1 = W0 * d1 - W1 * d0;

can also be written as:
Code:
int tmp = W0 * (d0 + d1);
t0 = tmp + (W1 - W0) * d1;
t1 = tmp - (W1 + W0) * d0;

which saves one expensive multiplication per 2x2 matrix multiplication: when W0 and W1 are constants, W1 - W0 and W1 + W0 are folded at compile time, leaving three multiplications instead of four. This kind of 2x2 operation is called a butterfly, after the butterfly shape of its signal-flow diagram.
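Expanding the three-multiplication form shows why it works: tmp + (W1 - W0)·d1 = W0·d0 + W0·d1 + W1·d1 - W0·d1 = W0·d0 + W1·d1, and tmp - (W1 + W0)·d0 = W0·d1 - W1·d0. A quick sketch that checks both forms against each other; rotate4 and rotate3 are hypothetical names, and the W2/W6 weights are just example values:

```c
/* Direct 2x2 rotation: four multiplications. */
void rotate4(int W0, int W1, int d0, int d1, int *t0, int *t1)
{
    *t0 = W0 * d0 + W1 * d1;
    *t1 = W0 * d1 - W1 * d0;
}

/* Same result with three multiplications; when W0 and W1 are
 * compile-time constants, (W1 - W0) and (W1 + W0) fold to constants. */
void rotate3(int W0, int W1, int d0, int d1, int *t0, int *t1)
{
    int tmp = W0 * (d0 + d1);
    *t0 = tmp + (W1 - W0) * d1;
    *t1 = tmp - (W1 + W0) * d0;
}
```

The trade is one multiplication for one extra addition, which pays off on CPUs where integer multiplies are much slower than adds.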

This is why I had to get rid of those 'if' branches at the beginning: I couldn't have written out these butterflies with some of the terms still hidden inside an 'if'. In this case it is worthwhile, but one always has to test these things carefully.

Enter the floating-point multiply-add instructions of the MIPS IV instruction set (strictly speaking these are not SIMD instructions: each one performs a multiply and an add on scalar operands):
Code:
madd  ==>  a = a + b*c
nmsub ==>  a = a - b*c

These instructions perform a multiplication and an addition/subtraction in one go! They are only available in the MIPS IV instruction set, so you have to compile with -mips4 to get them. Also, they only work on floating-point values, single or double precision, not on integers! :(

BUT, the R10000 is a pretty nifty piece of machinery. It has separate integer and floating-point execution units (two integer ALUs, plus a floating-point adder and a floating-point multiplier), so integer and floating-point instructions can be issued in the same clock cycle. So what if we move part of the calculation from integers to floats? Maybe the compiler can then interleave these two "threads", with the added bonus of madd/nmsub instructions for the floating-point part. So I devised this:
Code:
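/* Integer butterfly: 3 multiplications instead of 4 */
#define BUTTERFLY(t0,t1,W0,W1,d0,d1)    \
do {                                    \
int tmp = (W0) * ((d0) + (d1));     \
(t0) = tmp + ((W1) - (W0)) * (d1);  \
(t1) = tmp - ((W1) + (W0)) * (d0);  \
} while (0)

/* Plain butterfly: 4 multiplications, lets the compiler use madd/nmsub on floats */
#define BUTTERFLY0(t0,t1,W0,W1,d0,d1)   \
do {                                    \
(t0) = (W0) * (d0) + (W1) * (d1);   \
(t1) = (W0) * (d1) - (W1) * (d0);   \
} while (0)

/* Accumulating variant of BUTTERFLY0 */
#define BUTTERFLYADD0(t0,t1,W0,W1,d0,d1)\
do {                                    \
(t0) += (W0) * (d0) + (W1) * (d1);  \
(t1) += (W0) * (d1) - (W1) * (d0);  \
} while (0)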

static inline void idctSparseColAdd (uint8_t *dest, int line_size,
DCTELEM * col)
{
int a0, a1, a2, a3;
float b0, b1, b2, b3;
uint8_t *cm = cropTbl + MAX_NEG_CROP;
int d0,d4=W4*col[8*4];
float d1=col[8*1],d3=col[8*3],d5=col[8*5],d7=col[8*7];

/* XXX: I did that only to give same values as previous code */

a0 = W4 * col[8*0] + (1<<(COL_SHIFT-1));
a1 = a0;
BUTTERFLY0(b0,b3,d1,d7,W1,W7);

a0 += d4;
a1 -= d4;
BUTTERFLY0(b2,b1,d1,d7,W5,W3);
a3 = a0;
a2 = a1;
BUTTERFLYADD0(b3,b0,d3,d5,-W5,W3);
BUTTERFLY(d0,d4,col[8*2],col[8*6],W2,W6);
BUTTERFLYADD0(b1,b2,d3,d5,-W7,-W1);

a0 += d0;
a1 += d4;
a2 -= d4;
a3 -= d0;

dest[0] = cm[dest[0] + ((a0 + (int)b0) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a1 + (int)b1) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a2 + (int)b2) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a3 + (int)b3) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a3 - (int)b3) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a2 - (int)b2) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a1 - (int)b1) >> COL_SHIFT)];
dest += line_size;
dest[0] = cm[dest[0] + ((a0 - (int)b0) >> COL_SHIFT)];
}

All the calculations involving the 'b' coefficients are done in floats and all the 'a's are integers. I also used some temporary variables; the R10000 has a large number of registers, so this never hurts.
The BUTTERFLY0 macro is used for the floats, because that gives the compiler the chance to issue the madd instructions, and the normal BUTTERFLY macro is used for the integers, because it saves one multiplication. Oh, and the order of the macros for the float series (b) is independent of the integer series (a), which makes it easy for the compiler/CPU to balance the load between the integer and floating point units.
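If you want to convince yourself that BUTTERFLY really saves a multiplication without changing the result, a brute-force check is easy. This is a minimal sketch: the two macros are copied from the code above, butterfly_mismatches() is a made-up helper, and the weights 21407 and 8867 are the W2/W6 constants that show up as immediates in the 'Normal' assembly listing.

```c
/* Copied from the post: integer butterfly with 3 multiplications. */
#define BUTTERFLY(t0,t1,W0,W1,d0,d1)    \
do {                                    \
    int tmp = W0 * (d0 + d1);           \
    t0 = tmp + (W1 - W0) * d1;          \
    t1 = tmp - (W1 + W0) * d0;          \
} while (0)

/* Copied from the post: straightforward 4-multiplication form. */
#define BUTTERFLY0(t0,t1,W0,W1,d0,d1)   \
do {                                    \
    t0 = W0 * d0 + W1 * d1;             \
    t1 = W0 * d1 - W1 * d0;             \
} while (0)

/* Hypothetical helper: counts the (x, y) pairs in [-range, range]^2 for
   which the two macros disagree, with fixed weights w0 and w1 passed in
   the same argument positions as BUTTERFLY(d0,d4,col[8*2],col[8*6],W2,W6). */
static int butterfly_mismatches(int w0, int w1, int range)
{
    int bad = 0;

    for (int x = -range; x <= range; x++) {
        for (int y = -range; y <= range; y++) {
            int a0, a1, b0, b1;

            BUTTERFLY(a0, a1, x, y, w0, w1);
            BUTTERFLY0(b0, b1, x, y, w0, w1);
            if (a0 != b0 || a1 != b1)
                bad++;
        }
    }
    return bad;
}
```

With these weights and a range of 1024 all intermediate products stay well inside 32 bits, so the two macros should agree on every pair and the function returns 0. Incidentally, the 30274 loaded in the optimised listing is exactly 21407 + 8867, i.e. the (d0 + d1) factor of BUTTERFLY folded into a constant by the compiler.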

If you look carefully at the assembly output ('c99 -S'), you can see the instructions being interleaved. First the 'old' routine:
Normal
Code:
# Program Unit: idctSparseColAdd
.ent    idctSparseColAdd
idctSparseColAdd:       # 0x340
.dynsym idctSparseColAdd        sto_default
.frame  $sp, 80, $31
# lgra_spill_temp_9 = 0
# lgra_spill_temp_10 = 8
# lgra_spill_temp_11 = 16
# lgra_spill_temp_12 = 24
# lra_spill_temp_13 = 32
# lra_spill_temp_14 = 40
# lra_spill_temp_15 = 48
# lra_spill_temp_16 = 56
# lra_spill_temp_17 = 64
# lra_spill_temp_18 = 72
.loc    1 255 1
# 251  }
# 252
# 253  static inline void idctSparseColAdd (uint8_t *dest, int line_size,
# 254                                       DCTELEM * col)
# 255  {
.BB1.idctSparseColAdd:  # 0x340
#<freq>
#<freq> BB:1 frequency = 1.00000 (heuristic)
#<freq>
addiu $sp,$sp,-80               # [0]
sd $16,0($sp)                   # [1]  lgra_spill_temp_9
.loc    1 265 9
# 261          a1 = a0;
# 262          a2 = a0;
# 263          a3 = a0;
# 264
# 265          a0 +=  + W2*col[8*2];
lh $16,32($6)                   # [0]  id:201
addiu $10,$0,21407              # [0]
mult $16,$10                    # [2]
.loc    1 266 9
# 266          a1 +=  + W6*col[8*2];
addiu $12,$0,8867               # [1]
.loc    1 255 1
sd $19,24($sp)                  # [2]  lgra_spill_temp_12
.loc    1 265 9
mflo $19                        # [8]
nop                             # [2]
nop                             # [2]
.loc    1 266 9
.
.
.
.loc    1 322 9
lbu $19,0($18)                  # [163]  id:229
sra $24,$24,20                  # [160]
.loc    1 323 1
# 323  }
ld $16,0($sp)                   # [164]  lgra_spill_temp_9
.loc    1 322 9
addu $19,$19,$24                # [165]
addu $17,$17,$19                # [166]
.loc    1 323 1
ld $19,24($sp)                  # [165]  lgra_spill_temp_12
.loc    1 322 9
lbu $17,384($17)                # [168]  id:230 cropTbl+0x0
sb $17,0($18)                   # [169]  id:231
.loc    1 323 1
ld $17,8($sp)                   # [170]  lgra_spill_temp_10
ld $18,16($sp)                  # [171]  lgra_spill_temp_11
jr $31                          # [162]
addiu $sp,$sp,80                # [162]
.end    idctSparseColAdd
.section .text


171 clock ticks! Because all the ops are integer ops, the integer ALUs get extremely busy, which stalls the throughput.

Optimised
Code:
# Program Unit: idctSparseColAdd
.ent    idctSparseColAdd
idctSparseColAdd:       # 0x200
.dynsym idctSparseColAdd        sto_default
.frame  $sp, 0, $31
.loc    1 204 1
# 200  }
# 201
# 202  static inline void idctSparseColAdd (uint8_t *dest, int line_size,
# 203                                       DCTELEM * col)
# 204  {
.BB1.idctSparseColAdd:  # 0x200
#<freq>
#<freq> BB:1 frequency = 1.00000 (heuristic)
#<freq>
.loc    1 209 32
# 205          int a0, a1, a2, a3;
# 206          float b0, b1, b2, b3;
# 207          uint8_t *cm = cropTbl + MAX_NEG_CROP;
# 208          int d0,d4=W4*col[8*4];
# 209          float d1=col[8*1],d3=col[8*3],d5=col[8*5],d7=col[8*7];
lh $8,80($6)                    # [0]  id:169
.loc    1 209 20
lh $10,48($6)                   # [1]  id:168
.loc    1 209 32
mtc1 $8,$f0                     # [2]
.loc    1 209 8
lh $12,16($6)                   # [2]  id:167
.loc    1 209 20
mtc1 $10,$f7                    # [3]
.loc    1 224 9
# 220          a3 = a0;
# 221          a2 = a1;
# 222
# 223          BUTTERFLYADD0(b3,b0,d3,d5,-W5,W3);
# 224          BUTTERFLY(d0,d4,col[8*2],col[8*6],W2,W6);
lh $1,32($6)                    # [3]  id:172
addiu $9,$0,30274               # [1]
.loc    1 209 44
lh $11,112($6)                  # [4]  id:170
.loc    1 209 32
cvt.s.w $f0,$f0                 # [5]
.loc    1 224 9
mult $1,$9                      # [5]
.loc    1 209 8
mtc1 $12,$f1                    # [4]
.
subu $15,$15,$25                # [13]
.loc    1 234 9
madd.s $f9,$f9,$f0,$f6          # [18]                             <== Float op [18], combined with integer op [18]
.loc    1 232 9
trunc.w.s $f13,$f13             # [21]
.loc    1 217 9
subu $2,$2,$3                   # [18]                             <== Integer op [18], combined with float op [18]
.loc    1 236 9
# 235          dest += line_size;
# 236          dest[0] = cm[dest[0] + ((a2 + (int)b2) >> COL_SHIFT)];
mul.s $f6,$f7,$f6               # [19]                             <== Float op [19], combined with integer op [19]
.loc    1 217 9
lui $3,8                        # [15]
addu $2,$15,$2                  # [19]                             <== Integer op [19], combined with float op [19]
.loc    1 224 9
addu $9,$24,$9                  # [18]                             <== Integer op [18], combined with float op [18]
.loc    1 232 9
mfc1 $7,$f13                    # [23]
.loc    1 217 9
addu $2,$2,$3                   # [20]                             <== Integer op [20], combined with float op [20]
.loc    1 236 9
nmsub.s $f6,$f6,$f0,$f11        # [20]                             <== Float op [20], combined with integer op [20]
.loc    1 227 9
.

lbu $8,384($8)                  # [48]  id:193 cropTbl+0x0
.loc    1 245 9
# 245          dest += line_size;
addu $2,$5,$9                   # [45]
.loc    1 244 9
sb $8,0($9)                     # [49]  id:194
.loc    1 246 9
# 246          dest[0] = cm[dest[0] + ((a0 - (int)b0) >> COL_SHIFT)];
subu $4,$6,$7                   # [46]
lbu $3,0($2)                    # [50]  id:195
sra $4,$4,20                    # [47]
addu $3,$3,$4                   # [52]
addu $1,$1,$3                   # [53]
lbu $1,384($1)                  # [54]  id:196 cropTbl+0x0
.loc    1 247 1
# 247  }
jr $31                          # [48]
.loc    1 246 9
sb $1,0($2)                     # [55]  id:197
.end    idctSparseColAdd
.section .text
.align 6


Tadaaa, 55 ticks! Amazing what a little instruction scheduling can do for your code!


Running speedshop again proves it:
Code:

Summary of statistical PC sampling data (fpcsampx)--
49298: Total samples
49.298: Accumulated time (secs.)
1.0: Time per sample (msecs.)
4: Sample bin width (bytes)
-------------------------------------------------------------------------
Function list, in descending order by time
-------------------------------------------------------------------------
[index]      secs    %    cum.%   samples  function (dso: file, line)

[1]    20.239  41.1%  41.1%     20239  yuv2rgb_c_24_rgb (mplayer: yuv2rgb.c, 319)
[2]     7.177  14.6%  55.6%      7177  idctSparseColAdd (mplayer: simple_idct.c, 209)
[3]     4.454   9.0%  64.6%      4454  put_pixels8_c (mplayer: dsputil.c, 897)
[4]     2.382   4.8%  69.5%      2382  msmpeg4_decode_block (mplayer: msmpeg4.c, 1676)
[5]     1.948   4.0%  73.4%      1948  idctRowCondDC (mplayer: simple_idct.c, 104)
[6]     1.551   3.1%  76.6%      1551  simple_idct_put (mplayer: simple_idct.c, 313)
[7]     1.301   2.6%  79.2%      1301  put_pixels16_xy2_c (mplayer: dsputil.c, 897)



IDCT is a bit more scattered now, three routines instead of one, but added up (idctSparseColAdd + idctRowCondDC + simple_idct_put = 14.6% + 4.0% + 3.1%) it comes to 21.7%, roughly half the time of yuv2rgb_c_24_rgb (41.1%). Compared with the starting speedshop run, this is a 50% reduction in time spent in IDCT! Looking at the total times, 64.0 versus 49.3 seconds is a 23% reduction in run time, i.e. the app runs about 1.3x faster. Whoa! Granted, it's only for this .avi file and this specific IDCT. Other codecs need other routines, but it's a start.
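The percentages above can be re-derived straight from the speedshop table. A small sketch: the per-function seconds are copied from the table, 64.0 s is the total from the earlier unoptimised run as quoted in the text, and the helper names are made up for illustration.

```c
/* Per-function times (seconds) from the speedshop table above. */
static const double idct_secs[3] = {
    7.177,  /* idctSparseColAdd */
    1.948,  /* idctRowCondDC    */
    1.551   /* simple_idct_put  */
};

/* Share of the sampled run spent in the three IDCT routines, in percent. */
static double idct_share_percent(double total_secs)
{
    double sum = 0.0;

    for (int i = 0; i < 3; i++)
        sum += idct_secs[i];
    return 100.0 * sum / total_secs;
}

/* Relative reduction in run time, in percent. */
static double time_reduction_percent(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Classic speedup factor: old run time divided by new run time. */
static double speedup_factor(double before, double after)
{
    return before / after;
}
```

idct_share_percent(49.298) comes out at about 21.66%, time_reduction_percent(64.0, 49.3) at about 22.97%, and speedup_factor(64.0, 49.3) at roughly 1.30. The last two describe the same improvement, just measured against the old time versus the new time.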

Well, I hope you have read through it and picked up some ideas. Next time I'll be looking at yuv2rgb.c on the software side.

The patch for libavcodec/simple_idct.c now lives at http://www.mechanics.citg.tudelft.nl/~e ... pre4.patch, so get it and try it. Schleusel and I will try to pester the MPlayer CVS gurus to get us a libavcodec/mips subdirectory where we can store this.

%-)
hamei wrote:
Brombear wrote:
hamei wrote:
From what I understand the Itanic is a lot like RISC in some ways - execution speed is very dependent on smart compilers. So what do they have on Linux ? Gcc ? hmmm.


I believe the Intel compiler (icc) is used on these machines. Hard to guess its performance without real tests, though.


I just know what I've read about developers in the HP camp screaming bloody murder about "no tools ! no tools ! where the hell are all the optimized tools you promised us ?"


It's not icc and ifc/ifort, but ecc and efc/efort on Itanium systems. And there is one performance tool you can run on an Itanium that takes advantage of the performance counter registers built into the CPU core.
The tool is called HistX and can be downloaded from the SGI site:

http://www.sgi.com/products/evaluation/altix_histx/

But I admit, I miss ssrun/cvd/perfex on Itanium systems. Also, there are a lot of funky performance issues: some code that runs like mad on a PIV Xeon will crawl on Itanium. And you're completely dependent on ecc for your optimisations, so pragmas like the ones on MIPS won't help you. I have tested an Itanium2 1.5 GHz with 3 MB cache and found it slightly slower than a PIV Xeon 3.05 GHz on floating point Fortran code. And considering a dual Itanium2 machine costs three times as much as a dual PIV machine, the choice is easily made...
My first attempt at a FAQ for building tardists for nekoware. It's grossly incomplete, contains arrogant remarks, smells like dogpoo and is probably not very accurate. Please comment on this FAQ in the regular thread "Nekoware tardists build FAQ" started by O2ric and post suggestions! I'll be adding to this document at a (hopefully) steady rate. Hence my posting today, otherwise this will never get finished :)



The Nekoware Tardist Build FAQ

collected by Frank Everdij aka dexter1 29 oct 2004

Intro

This is supposed to be a guide to making nekoware tardist packages to be enjoyed on SGI IRIX machines, aka those funny colored machines with a cube logo up front which cost as much as a small car back in the day...


Rules

First of all we need to define some rules on which packages we intend to release as nekoware and on which platforms they should run. We aim at open source software under the GNU GPL, BSD or another free/non-commercial license, usually developed on Linux x86 machines. Because this software was mainly developed on the x86 architecture, it poses some constraints on the target machine: it should have reasonably fast IO and fast integer performance, and for building code it should run a recent MIPSPro compiler that squeezes as much performance out of the code as it can.

As target SGI machines, we have chosen the following:

1) MIPS IV instruction set, meaning all machines with R5000 processors and up, which include: R5K Indy, R8K Indigo2, R10K Indigo2 Impact, R8K and R10K Challenge, R8K and R10K Onyx1, all O2, all Octane, all Onyx2, all Origin, Tezro, Fuel.
2) The minimum IRIX to run is 6.5.21m. On Supportfolio you can get hold of the 6.5.22m overlays, but take care to install Patch 5086 first if you plan to upgrade. People running a release older than 6.5.21 can use nekoware, but need some runtime tweaks with _RLD_LIST (_RLDN32_LIST for n32 binaries) because of missing symbols, most notably strlcpy and strlcat. Ugly, but it works. Also take care to install the latest patches for the OS.
3) MIPSPro 7.4.x is the preferred compiler. MIPSPro 7.3.1.3m is also pretty good, and for some apps it's even better than the 7.4.x series. We discourage the use of gcc 3.x, though sometimes it is the only choice. It is unclear whether C++ code compiled with g++ will link with MIPSPro-compiled code, because of name mangling differences.
4) IRIX 6.5.22m is our preferred build platform OS. This release is the end of the line for a lot of older machines and has mp3 support, UDF, IPv6 and NTP; in short, a nice OS that is fairly recent and, if reasonably patched, quite stable.


Port/Compile

Porting open source software to the IRIX platform is sometimes not an easy task. To name a few major problems:
1) Endianness. IRIX is big-endian, meaning that multi-byte words are stored with the most significant byte first. x86 PCs are little-endian. So there are situations where you need to swap bytes before or after processing variables in routines.
2) ASM. Forget it. Optimised x86 assembly code will never run on MIPS. Ditch the assembly parts and find a C/C++ equivalent.
3) GCC-isms. A name covering a variety of hacks (void pointer arithmetic), oddball coding (namespace clashes), wrong standards (C++ iostreambuf return pointer), hard-coded Makefiles (the worst) and botched/ignorant ./configure scripts (even worse than worst). Sometimes switching to c99 instead of cc fixes a lot of gcc-isms, in the code at least.
4) Different device handling, most notably parallel port programming and audio. No joysticks :( and no USB (except for your mouse on the Tezro).
5) Performance/speed. MIPS, being a floating point killer, has trouble with integer code. Most open source code doesn't take this into account, so optimisations are needed to crank up the code so it runs at usable speeds on your IRIX box.
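Items 1 and 3 are the ones you hit most often in plain C. A sketch of the usual portable fixes, with hypothetical helper names: read little-endian data byte by byte (or swap it) instead of casting a buffer pointer, and do pointer arithmetic on unsigned char * instead of void *, which ISO C does not allow (gcc only accepts it as an extension, so a strict compiler such as MIPSPro cc may reject it).

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value (e.g. from a file format designed
   on x86) the same way on big-endian IRIX and little-endian hosts. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Returns 1 on a big-endian host (IRIX/MIPS), 0 on a little-endian one (x86). */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x01;
}

/* gcc-ism: `buf + offset` with `void *buf` only works as a gcc extension.
   The portable version does the arithmetic on unsigned char *. */
static void *advance(void *buf, size_t offset)
{
    return (unsigned char *)buf + offset;
}
```

read_le32 never looks at the host byte order, so it needs no #ifdef at all; swap32 is only needed when data was already loaded as a native integer.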

Minor problems like misplaced or missing headers can usually be fixed by a bit of searching and some #ifdef wrapping. __sgi is a nice symbol to use for that, though most ./configure scripts can determine that you're running on MIPS and set defines accordingly. Defines with BSD in the name are sometimes a good idea too, since IRIX, although System V based, carries a lot of BSD heritage.
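The same #ifdef wrapping works for missing functions, such as the strlcpy that IRIX releases older than 6.5.21 lack (see the Rules above). A sketch: compat_strlcpy is a made-up name with BSD strlcpy semantics, and HAVE_STRLCPY stands for whatever your ./configure script defines when it finds the function; both are assumptions, not part of any existing package.

```c
#include <string.h>

/* BSD-style strlcpy: copies at most size-1 bytes, always NUL-terminates
   when size > 0, and returns strlen(src) so callers can detect truncation. */
static size_t compat_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len >= size) ? size - 1 : len;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Only redirect strlcpy on IRIX builds that lack it (HAVE_STRLCPY is
   a hypothetical configure result). */
#if defined(__sgi) && !defined(HAVE_STRLCPY)
#define strlcpy compat_strlcpy
#endif
```

On platforms that have strlcpy the #define is skipped, so the system version is used and the fallback stays unused.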

Selecting a good environment is necessary to get a proper build. A good starting point is:

Code:

setenv CC cc
setenv CXX CC
setenv CFLAGS '-O3 -mips4 -n32'
setenv CXXFLAGS '-O3 -mips4 -n32'
setenv CPPFLAGS '-I/usr/nekoware/include'
setenv LDFLAGS '-L/usr/nekoware/lib'

CC is the environment variable holding the C compiler name, which usually gets picked up by ./configure scripts. Sometimes one may have to substitute it with c99 or c89 to select a more gcc-like parsing and compiling behaviour:

Code:

setenv CC c99    (only for MIPSPro 7.4.x), or
setenv CC c89

CXX is the C++ compiler environment variable. It should be CC, or g++ if the code is too messy to compile with MIPSPro.

CFLAGS and CXXFLAGS should be small and neat. -O3 gives you the best optimisation for a first compile attempt. -mips4 selects MIPS IV instruction set optimisations. -n32 selects the n32 ABI with 32-bit pointers, so programs are limited to a 2 GB address space, which rarely matters for open source programs.
/usr/nekoware/include and /usr/nekoware/lib should be good include and lib paths for porting an app to nekoware. If you need more libs to port an app, it may be a good idea to first build it into /usr/local, so you can test the code before doing a proper nekoware build.
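To double-check that a build really came out as -n32, you can ask the preprocessor. A small sketch, assuming the _MIPS_SIM / _ABIN32 / _ABI64 / _ABIO32 macros as defined via IRIX's <sgidefs.h> (verify against your compiler); build_abi() and pointer_bits() are hypothetical helper names.

```c
#if defined(__sgi)
#include <sgidefs.h>   /* IRIX header defining _ABIO32, _ABIN32, _ABI64 */
#endif

/* Name the MIPS ABI this file was compiled for. Assumes _MIPS_SIM is
   predefined by the compiler on IRIX; on non-MIPS platforms none of the
   macros exist and the last branch is taken. */
static const char *build_abi(void)
{
#if defined(_MIPS_SIM) && defined(_ABIN32) && _MIPS_SIM == _ABIN32
    return "n32";
#elif defined(_MIPS_SIM) && defined(_ABI64) && _MIPS_SIM == _ABI64
    return "64";
#elif defined(_MIPS_SIM) && defined(_ABIO32) && _MIPS_SIM == _ABIO32
    return "o32";
#else
    return "not a MIPS build";
#endif
}

/* Pointer width in bits: 32 for -n32 (and o32), 64 for -64. */
static int pointer_bits(void)
{
    return (int)(sizeof(void *) * 8);
}
```

Built with the CFLAGS above on IRIX, build_abi() should report "n32" and pointer_bits() should return 32.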

Other, less used environment variables that can be picked up by ./configure are:

Code:

setenv PERL /usr/nekoware/bin/perl
setenv GNUMAKE /usr/nekoware/bin/gmake
setenv SED /usr/nekoware/bin/sed


Packaging

Try to start ports with the necessary prerequisites installed, preferably as nekoware packages if you plan to make a nekoware tardist. Although nekoware doesn't clash with freeware, some ./configure scripts can get confused by two similarly named libraries in different locations. Imagine a package including /usr/freeware/include/jpeg.h and linking with /usr/nekoware/lib/libjpeg.so...
So it's best to dedicate a machine to nekoware builds.

I compile most open source stuff for nekoware in /usr/local/src/<programname-version>.
For building packages I have made a directory /usr/local/src/build, and in there I make a directory <programname> where I store the following. Let's take as an example a program called "program". In /usr/local/src/build/program:

1) neko_program.idb <- swpkg/gendist package list of files and actions
2) neko_program.spec <- specification file of subsystems, versions and dependencies
3) neko_program.txt <- text file describing the build process: program version, environment variables used, ./configure options, dependencies needed, background info, and sometimes test results or performance numbers.
4) program-1.2.3.tar.gz or program-1.2.3.tar.bz2 <- original program tarball. Always leave the tarball intact, because the source code and copyright notice should remain available.
5) neko_program-1.2.3_irix.patch <- patch file for the source. It should be the concatenated diff -u output, made from inside the source path /usr/local/src/program-1.2.3, so it can be applied there with "patch -p0 < neko_program-1.2.3_irix.patch"
6) clean.txt and program.txt <- These are ls -lR outputs of /usr/nekoware, taken as root before and after a "make install", so I know which files have been added/deleted by doing a diff or xdiff of the two files.

5) and 6) are more or less my choices but 1) through 4) are mandatory files and should be included in the nekoware tardist build so people can recreate the tardist if needed.

More to come after my beauty sleep :)
Beer:
Grolsch (springcap bottle)
Duvel (8.5% Belgian beer, very bitter)

Spirits:
"ForestWalk" (cream liqueur with bananas)
Ouzo (vat12)
Absinth (rare 55% anise spirit with terpene oil compounds, allowed in Europe in small doses)
Intel-OUTSIDE wrote:
dexter1 wrote: Spirits:
"ForestWalk" (cream liqueur with bananas)
Ouzo (vat12)
Absinth (rare 55% anise spirit with terpene oil compounds, allowed in Europe in small doses)


you have got to be joking, especially the Ouzo, Greek paint-stripper!!!


Hahaha, no really, it's very doable :) The anise hides the bitter taste of the alcohol well. In fact, I hardly get any hangovers from Ouzo and it keeps you warm in cold nerd-camps. Of course Absinth is something different altogether: http://www.eabsinthe.com/hills/serving.htm
It does give you a hangover, but oddly only my right frontal lobe felt sore the next morning. Must be the thujone terpene binding to receptors for my abstract abilities... :P
Hakimoto wrote: What I'm asking myself, dexter1, is whether they're really still making Absinth with Artemisia absinthium or some other stuff like sage.


Yes, nowadays they make it with Artemisia absinthium, or wormwood, which contains the thujone. There is a legal European maximum of thujone in absinth, which is 10 mg per litre for drinks above 35% alcohol by volume. Formally, absinth is forbidden in Holland, but a court ruling has declared this 1909 absinth law invalid. So it is now legal to buy absinth liqueurs like Tabu from Germany in stores in Holland. I have a bottle still open, care to join me? ;)

For more information, visit http://www.groenefee.nl/ for some specific Dutch info.

And you can make me really happy with some genuine French Absinth like Francois Guy or Versinth La Blanche. I've never tasted those, and you're closer to France than I am :)