The collected works of lewis

Hey - I've been playing with mplayer a bit, and my own CVS-current build with the freeware gcc gets exactly the same (or so) -benchmark results as the MIPSPro build above. Bit disappointing. I'm testing with MPEG2 VOB files and the same files mencoded to raw video/audio.

I'm a dual 300 SSE and I can only get 15-20 FPS playing DVD rate MPEG2; playing the uncompressed version is fine (mmm, Cheetah X15 :)) but with rather alarming CPU usage - 50% of each processor at least. For us people without TRAMS, I imagine one could do a much better job using glDrawPixels() instead of icky XImages. It also looks like all MGRAS cards could do hardware YUV-RGB with the SGI_color_matrix GL extension. Not sure how fast that would be. I may have a go at writing an output plugin. Perhaps one could use XSGIvc to switch to a better resolution too. If glPixelZoom() is too slow for full screen, which I imagine it is, it would be nice to have a 720x480_59.94_db32 mode but playing with the BlockSync template and VFC I can't seem to make one. And it would be the wrong aspect ratio anyway, and I doubt most monitors would like it. I use a 640x480_89.91 mode and chop the edges off at the minute :)

One could use the various video extensions to get proper synchronisation with the vertical retrace too. If one could get an interlaced mode, maybe it could be possible to split each frame into lines and get actual proper like-a-TV interlaced output :)

Of course, mpeg2lib is still waay too slow to play DVDs. I don't suppose anyone is handy with the MADD instructions and feels like optimizing it? ;) Way over my head sadly. If assembly is asking too much, maybe some of the motion compensation and IDCT could be rewritten using routines from the SCSL, which is presumably pretty speedy?

More thoughts: How come xdpyinfo shows an XVideo extension, and there are headers, but no libraries? How come I can't build mplayer with ./configure --enable-profile? GCC complains about ftell being redeclared in IRIX's stdio_core.h.

Oh, and to get mplayer to play anything at all smoothly I have to use both -cache <lots> and -ni, which I thought were somewhat mutually exclusive...

Oh, and almost all of the -lavc codecs core when I try to encode with them. I was hoping that HuffYUV might let me play stuff without each file taking up 20Gib of disk...

AND, why does the whole screen revert to 1.0 gamma when the cursor is over the video window? I thought television gamma was more like 2.2?

Sorry for all the questions... I'm also considering porting the dvdlibs that mplayer uses. I'd have to basically write all the dvd ioctls, but it's already been done for BSDI using the user space ds SCSI stuff.
If anyone cares, I've attempted to write a new output plugin using glDrawPixels(). Rather embarrassingly it's actually slower than the regular X11/SHM one at the moment. I'm not sure why. Using the colour matrix extension for colour space conversion works fine, but it's not very fast at all, possibly no faster than doing it entirely in software. Maybe it's accelerated more on something other than Impact graphics?

Having to unpack everything from planar 4:2:0 YUV into packed RGBA is a major pain. I've done it the obvious way with nested for() loops, but does anyone with more clue about algorithm design know of a better way?

I managed to get all the work of the drawing onto the other CPU with some very (VERY) crude use of pthreads. This breaks slices support and is generally a Bad Thing, but I get much better framerates using this and -noslices, so I don't care :)

If anyone has any suggestions on how it could go faster, do let me know :) I still can't get profiling to work. I may tidy it up and make a binary or a diff, but for now the guts in case there's something obviously wrong:

Code:

#include "video_out.h"
#include "video_out_internal.h"
#include <pthread.h>
#include <semaphore.h>
#include <GL/glx.h>
#include <stdio.h>

static vo_info_t info = {
    "OpenGL output for SGI machines without hardware texturing",
    "flour",
    "Lewis Saunders <[email protected]>",
    ""
};

/* Shared between the decoder thread and the drawing thread */
sem_t sem;
Display* dpy;
GLXWindow glxWin;
uint8_t* Y;
uint8_t* U;
uint8_t* V;

LIBVO_EXTERN(flour)

void* artist(void* dontCare);

static uint32_t preinit(const char* arg) {
    pthread_t child;

    sem_init(&sem, 1, 0);
    pthread_create(&child, NULL, artist, NULL);

    return(0);
}

static uint32_t control(uint32_t request, void* data, ...) {
    switch(request) {
    case VOCTRL_QUERY_FORMAT:
        if(*(uint32_t*)data == IMGFMT_YV12) {
            return(VFCAP_CSP_SUPPORTED |
                   VFCAP_CSP_SUPPORTED_BY_HW |
                   VFCAP_HWSCALE_UP |
                   VFCAP_HWSCALE_DOWN);
        } else {
            return(0);
        }
        break;
    }
    /* Every other request is unhandled */
    return(VO_NOTIMPL);
}

static uint32_t config(uint32_t width,   uint32_t height,
                       uint32_t d_width, uint32_t d_height,
                       uint32_t fullscreen, char* title, uint32_t format) {
    return(0);
}

/* Only remembers the plane pointers and wakes the drawing thread;
   strides and the slice rectangle are ignored (hence -noslices) */
static uint32_t draw_slice(uint8_t* src[], int stride[],
                           int w, int h, int x, int y) {
    Y = src[0];
    U = src[1];
    V = src[2];
    sem_post(&sem);
    return(0);
}

void* artist(void* dontCare) {
    XVisualInfo*         xVisual;
    XSetWindowAttributes xWinAttrs;
    XEvent               event;
    Window               xWin;
    GLXFBConfig*         fbConfigs;
    GLXContext           glxContext;
    /* Column-major: R = Y + 1.403V, G = Y - 0.344U - 0.714V, B = Y + 1.770U */
    GLfloat              yuv2rgb[16] = {1.000,  1.000,  1.000,  0.000,
                                        0.000, -0.344,  1.770,  0.000,
                                        1.403, -0.714,  0.000,  0.000,
                                        0.000,  0.000,  0.000,  1.000};
    static uint8_t       packed[480][720][4];
    int                  i, j;
    int                  xWinAttrMask;
    int                  fbCount;
    int                  fbAttrs[] = {GLX_DOUBLEBUFFER,  True,
                                      GLX_RED_SIZE,      1,
                                      GLX_GREEN_SIZE,    1,
                                      GLX_BLUE_SIZE,     1,
                                      None};
    dpy = XOpenDisplay(NULL);

    fbConfigs = glXChooseFBConfig(dpy, DefaultScreen(dpy),
                                  fbAttrs, &fbCount);

    xVisual = glXGetVisualFromFBConfig(dpy, fbConfigs[0]);

    xWinAttrMask = CWColormap | CWEventMask;
    xWinAttrs.event_mask = StructureNotifyMask;
    xWinAttrs.colormap = XCreateColormap(dpy,
                                         RootWindow(dpy, xVisual->screen),
                                         xVisual->visual, AllocNone);
    xWin = XCreateWindow(dpy, RootWindow(dpy, xVisual->screen), 0, 0,
                         720, 480, 0, xVisual->depth, InputOutput,
                         xVisual->visual, xWinAttrMask, &xWinAttrs);
    XFree(xVisual);
    XMapWindow(dpy, xWin);
    do {
        XNextEvent(dpy, &event);
    } while (event.type != MapNotify || event.xmap.event != xWin);

    glxContext = glXCreateNewContext(dpy, fbConfigs[0], GLX_RGBA_TYPE,
                                     NULL, True);
    glxWin = glXCreateWindow(dpy, fbConfigs[0], xWin, NULL);
    XFree(fbConfigs);
    glXMakeContextCurrent(dpy, glxWin, glxWin, glxContext);

    /* Turn off everything that could slow the pixel path down */
    glDisable(GL_ALPHA_TEST);
    glDisable(GL_BLEND);
    glDisable(GL_DEPTH_TEST);
    glDisable(GL_DITHER);
    glDisable(GL_FOG);
    glDisable(GL_LIGHTING);
    glDisable(GL_LOGIC_OP);
    glDisable(GL_STENCIL_TEST);
    glDisable(GL_TEXTURE_1D);
    glDisable(GL_TEXTURE_2D);
    glDisable(GL_TEXTURE_3D_EXT);
    glDisable(GL_CONVOLUTION_1D_EXT);
    glDisable(GL_CONVOLUTION_2D_EXT);
    glDisable(GL_SEPARABLE_2D_EXT);
    glDisable(GL_HISTOGRAM_EXT);
    glDisable(GL_MINMAX_EXT);
    glPixelTransferi(GL_MAP_COLOR, GL_FALSE);
    glPixelTransferi(GL_RED_SCALE, 1);
    glPixelTransferi(GL_RED_BIAS, 0);
    glPixelTransferi(GL_GREEN_SCALE, 1);
    glPixelTransferi(GL_GREEN_BIAS, 0);
    glPixelTransferi(GL_BLUE_SCALE, 1);
    glPixelTransferi(GL_BLUE_BIAS, 0);
    glPixelTransferi(GL_ALPHA_SCALE, 1);
    glPixelTransferi(GL_ALPHA_BIAS, 0);
    /* Negative Y zoom draws top-down from the top left corner */
    glPixelZoom(1.0, -1.0);
    glRasterPos2i(-1, 1);
    /* Y, U, V travel in the R, G, B channels; centre U and V on zero,
       then let the colour matrix do the conversion */
    glMatrixMode(GL_COLOR);
    glLoadMatrixf(yuv2rgb);
    glPixelTransferf(GL_GREEN_BIAS, -0.5);
    glPixelTransferf(GL_BLUE_BIAS, -0.5);
    glClearColor(0.0, 0.0, 0.0, 1.0);
    glClear(GL_COLOR_BUFFER_BIT);
    glXSwapBuffers(dpy, glxWin);
    glClear(GL_COLOR_BUFFER_BIT);

    for(;;) {
        sem_wait(&sem);
        /* Planar 4:2:0 to packed: each U/V sample covers a 2x2 block */
        for(i = 0; i < 480; i++) {
            for(j = 0; j < 720; j++) {
                packed[i][j][0] = Y[i*720 + j];
                packed[i][j][1] = U[(i/2)*360 + j/2];
                packed[i][j][2] = V[(i/2)*360 + j/2];
            }
        }
        glDrawPixels(720, 480, GL_RGBA, GL_UNSIGNED_BYTE, packed);
        glXSwapBuffers(dpy, glxWin);
    }
}

static uint32_t draw_frame(uint8_t* src[]) { return(0); }
static void draw_osd(void) {}
static void check_events(void) {}
void uninit(void) {}
void flip_page(void) {}


If you want to try it out, bung it in the libvo folder of an mplayer source tree, add the reference to video_out.c and fiddle with the configure script extensively...
The actual glDrawPixels() call is not exactly fast. Without the color matrix stuff it's not so bad, but with it it gets many times slower. Wish I had VPro graphics with their asynchronous pixel writes :(

But yes, it's those loops which are the big problem. I've tried to look at how mplayer's software scaler does the packing conversion but it's really evil code with macros all over the place, and I can't understand it even after preprocessing.

Thanks for the help Matthias, I'll try those changes and get back to you.

Squeen, I am using system scope pthreads, or at least both CPUs are being used according to gr_osview. I don't even have capabilities turned on... I thought pthreads did that by default? I did look at IRIX's sproc() and shared arena stuff, but I don't see that it would be any faster - pthreads share the same memory too. And I must emphasise that mplayer is really not designed for this, it likes its output plugins to be all serialized so it can order and time stuff properly. I would imagine that as long as the drawing can keep up with the codec it would work acceptably well, but I do see some tearing which doesn't happen with X11.
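One possible contributor to the tearing is that the code above only passes plane pointers across, so the codec may start overwriting a frame while the other thread is still packing it. That is a guess, not something measured here, and it does nothing about retrace sync. A minimal sketch of a two-slot handoff with two POSIX semaphores follows; the names (struct handoff, handoff_put, handoff_acquire, handoff_release, demo_run) are made up for illustration and the "frames" are plain ints standing in for real buffers.

```c
#include <pthread.h>
#include <semaphore.h>

#define SLOTS 2

/* Two-slot mailbox between a decoder thread and a drawing thread. */
struct handoff {
    sem_t free_slots;    /* slots the decoder may fill       */
    sem_t full_slots;    /* slots the drawing thread may use */
    int   frame[SLOTS];  /* stand-in for real frame buffers  */
    int   head, tail;
};

static int handoff_init(struct handoff *h)
{
    h->head = h->tail = 0;
    if (sem_init(&h->free_slots, 0, SLOTS) != 0) return -1;
    if (sem_init(&h->full_slots, 0, 0) != 0) return -1;
    return 0;
}

/* Decoder side: blocks only when both slots are still waiting to be drawn. */
static void handoff_put(struct handoff *h, int frame)
{
    sem_wait(&h->free_slots);
    h->frame[h->head] = frame;
    h->head = (h->head + 1) % SLOTS;
    sem_post(&h->full_slots);
}

/* Drawing side: borrow the oldest undrawn frame... */
static const int *handoff_acquire(struct handoff *h)
{
    sem_wait(&h->full_slots);
    return &h->frame[h->tail];
}

/* ...and give the slot back only once drawing has finished with it. */
static void handoff_release(struct handoff *h)
{
    h->tail = (h->tail + 1) % SLOTS;
    sem_post(&h->free_slots);
}

/* Hypothetical demo: the "artist" checks frames arrive intact and in order. */
struct demo { struct handoff h; int count; long sum; int in_order; };

static void *demo_artist(void *p)
{
    struct demo *d = p;
    for (int i = 1; i <= d->count; i++) {
        const int *f = handoff_acquire(&d->h);
        if (*f != i) d->in_order = 0;
        d->sum += *f;
        handoff_release(&d->h);
    }
    return NULL;
}

/* Returns the sum of the frames drawn, or -1 on error or reordering. */
static long demo_run(int count)
{
    struct demo d;
    pthread_t t;
    d.count = count; d.sum = 0; d.in_order = 1;
    if (handoff_init(&d.h) != 0) return -1;
    if (pthread_create(&t, NULL, demo_artist, &d) != 0) return -1;
    for (int i = 1; i <= count; i++) handoff_put(&d.h, i);
    pthread_join(t, NULL);
    sem_destroy(&d.h.free_slots);
    sem_destroy(&d.h.full_slots);
    return d.in_order ? d.sum : -1;
}
```

The cost is a copy into the slot on the decoder side; the gain is that the decoder can run one frame ahead without ever writing into a frame that is being drawn.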

BTW using glPixelZoom() for zooming is /much/ faster than either the software or the SDL scaler.
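For the aspect ratio problem, the zoom factors can be worked out separately for X and Y. A small sketch, assuming square pixels on the monitor and a known display aspect for the video (4:3 here); fit_zoom and its parameters are hypothetical names, not anything from mplayer:

```c
/* Compute glPixelZoom() factors that scale a src_w x src_h image to the
   largest size that fits in win_w x win_h while showing it at the display
   aspect ratio aspect_num:aspect_den (e.g. 4:3). Assumes square screen
   pixels. */
static void fit_zoom(int src_w, int src_h,
                     int aspect_num, int aspect_den,
                     int win_w, int win_h,
                     float *zx, float *zy)
{
    /* Try filling the window width first */
    float out_w = (float)win_w;
    float out_h = out_w * aspect_den / aspect_num;

    /* Too tall? Then fill the height instead */
    if (out_h > (float)win_h) {
        out_h = (float)win_h;
        out_w = out_h * aspect_num / aspect_den;
    }

    *zx = out_w / src_w;
    *zy = out_h / src_h;
}
```

With 720x480 video at 4:3 on a 1280x1024 screen this gives roughly 1.78 horizontally and 2.0 vertically. The plugin above draws top-down, so the call would be glPixelZoom(zx, -zy).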

While I'm here, why can't I get pixie to run on anything linked with IRIX's libc? It complains about, umm, something arcane to do with code blocks. Works fine on the o32 libc, but that ain't much use :) Profiling would really be a help here...
Brombear, I tried your code (with a couple of changes added so it actually worked :)), and it made no difference whatsoever. What you did makes sense but I reckon gcc does that kind of optimization pretty well, expressions evaluating to constants inside loops and whatnot. I dare say it reorganizes loops to run downwards when it can because that made no difference either.

From crudely commenting bits out it seems the packing takes about 40% of a single 300MHz R12k, drawing and swapping with no colourspace conversion about 15%, and drawing with colourspace conversion about 60%. So on bits where the codec is chucking out frames at full speed it can't keep up.

Squeen, both CPUs are most assuredly absolutely pegged most of the time :) Why would having more than one drawing thread make any difference on a dual machine? The first CPU is occupied entirely with the MPEG2 codec... if I had an Origin, then yeah, but I think that can wait :)
I tried your code before you edited it, and... nothing changed. I'm losing sight of what exactly is going on here. What you did makes a lot of sense, I really should have thought of using only one 32bit memory store like that :) Although you do have the bytes back to front. GL_ABGR_EXT is your friend. What do you mean by "R10k pipeline"? Are you relying on speculative memory fetches or something?
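For anyone following along, the "one 32bit store" idea looks roughly like this. It's my own sketch, not Squeen's code: pack_yuv420 is a made-up name, and it assumes even dimensions and planes with no padding between rows. Each chroma pair is fetched once and shared by its 2x2 block of pixels, and every pixel goes out as a single word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack a planar 4:2:0 frame into one 32-bit word per pixel, laid out as
   0xYYUUVVFF. On a big-endian machine (MIPS) the bytes land in memory as
   Y,U,V,A, which matches GL_RGBA; on a little-endian machine they land
   reversed, which is what GL_ABGR_EXT describes. */
static void pack_yuv420(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                        uint32_t *out, int width, int height)
{
    int cw = width / 2;   /* chroma planes are half size in each direction */

    assert(width % 2 == 0 && height % 2 == 0);

    for (int row = 0; row < height; row += 2) {
        const uint8_t *y0 = y + (size_t)row * width;   /* upper luma row  */
        const uint8_t *y1 = y0 + width;                /* lower luma row  */
        const uint8_t *ur = u + (size_t)(row / 2) * cw;
        const uint8_t *vr = v + (size_t)(row / 2) * cw;
        uint32_t *o0 = out + (size_t)row * width;
        uint32_t *o1 = o0 + width;

        for (int c = 0; c < cw; c++) {
            /* One chroma fetch serves four pixels */
            uint32_t chroma = ((uint32_t)ur[c] << 16) |
                              ((uint32_t)vr[c] << 8)  | 0xFFu;

            o0[2 * c]     = ((uint32_t)y0[2 * c]     << 24) | chroma;
            o0[2 * c + 1] = ((uint32_t)y0[2 * c + 1] << 24) | chroma;
            o1[2 * c]     = ((uint32_t)y1[2 * c]     << 24) | chroma;
            o1[2 * c + 1] = ((uint32_t)y1[2 * c + 1] << 24) | chroma;
        }
    }
}
```

Whether this actually beats the byte-at-a-time loop on an R12k is exactly what I couldn't measure, so treat it as something to try rather than a known win.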

I think something odd must be happening because nothing I do changes how much CPU time is used. Plus, it uses a lot more CPU when playing VOBs than when playing raw stuff, which seems back to front. I dunno. I'd post my modified source but it's huge and I'm on dialup here - maybe I could do diffs to the specific CVS version I have.

I think it's fair to say that IRIX's SMP is rather more advanced than OS/2's, y'know :) There doesn't seem to be any job swapping going on apart from the usual every-couple-of-seconds-measurement-artifact thing.
Squeen, thanks, that does make sense. One wouldn't want just anybody to be making system scope threads :) I think process scope is still fine for this purpose, though.
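For reference, asking for system scope goes through the standard pthread attribute calls. A minimal sketch that tries system scope first and falls back to default attributes if thread creation is refused (which, going by this thread, is what an unprivileged user should expect on IRIX; I haven't confirmed the exact error). start_thread_prefer_system_scope and mark_done are hypothetical names:

```c
#include <pthread.h>

/* Start fn(arg) in a new thread, preferring system contention scope.
   Returns 0 on success or the error code from pthread_create().
   *used_system_scope reports whether the system-scope request stuck. */
static int start_thread_prefer_system_scope(pthread_t *tid,
                                            void *(*fn)(void *), void *arg,
                                            int *used_system_scope)
{
    pthread_attr_t attr;
    int want_system;
    int err = pthread_attr_init(&attr);
    if (err != 0) return err;

    want_system = (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) == 0);

    err = pthread_create(tid, &attr, fn, arg);
    if (err != 0 && want_system) {
        /* Not allowed (or not supported): retry with default attributes */
        want_system = 0;
        err = pthread_create(tid, NULL, fn, arg);
    }
    pthread_attr_destroy(&attr);

    if (err == 0 && used_system_scope != NULL)
        *used_system_scope = want_system;
    return err;
}

/* Trivial thread body for trying it out: sets the int it is given to 1 */
static void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}
```

In the plugin this would replace the bare pthread_create() in preinit(), with artist as the thread function.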

BTW my Octane is not around so I can't do more on this for a couple of months. But I will at some point... anyone else is welcome to take what's above and carry on. Just stick the first code I posted in the libvo folder as vo_whatever.c, and add links to it in video_out.c or wherever it is. And mess with the configure script.
Squeen, thanks. Thinking about it, that might explain why none of the above optimisations made any difference. Shall experiment, at some point.
Blender is fine even at 1280. It's nice being able to rearrange the buttons window to be vertical with the new GUI - it would look ridiculous on a 1600SW, otherwise :)
MPEG 1 (.mpg) or a Quicktime (.mov or .qt) with the Cinepak codec should be small and playable everywhere.
The SCSL FFT functions are very fast, but not necessarily much use... what might be useful is the BLAS stuff, a lot of time in the IDCT is probably spent in cross multiplying matrices etc.
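To show what I mean by the IDCT being matrix multiplies: an 8x8 inverse DCT can be written as two 8x8 products with a fixed cosine basis matrix. This is a plain reference sketch in float, not how mpeg2lib does it (real decoders use fast integer factorisations), and whether a BLAS call would beat hand-written loops at this tiny size is an open question. idct8x8 and cos_pi16 are made-up names.

```c
/* cos(j*pi/16) for j = 0..8; every other angle follows by symmetry */
static const float COS_PI16[9] = {
    1.00000000f, 0.98078528f, 0.92387953f, 0.83146961f, 0.70710678f,
    0.55557023f, 0.38268343f, 0.19509032f, 0.00000000f
};

static float cos_pi16(int j)
{
    j %= 32;                     /* period is 2*pi                */
    if (j > 16) j = 32 - j;      /* cos(2*pi - t) = cos(t)        */
    return (j > 8) ? -COS_PI16[16 - j] : COS_PI16[j];  /* cos(pi - t) = -cos(t) */
}

/* out = C^T * in * C, where C[k][n] = a(k) * cos((2n+1) * k * pi / 16),
   a(0) = sqrt(1/8) and a(k) = 1/2 otherwise. in[k][l] holds the
   coefficient for vertical frequency k and horizontal frequency l. */
static void idct8x8(float in[8][8], float out[8][8])
{
    float C[8][8], tmp[8][8];
    int k, l, m, n;

    for (k = 0; k < 8; k++)
        for (n = 0; n < 8; n++)
            C[k][n] = (k == 0 ? 0.35355339f : 0.5f) *
                      cos_pi16((2 * n + 1) * k);

    /* tmp = in * C: 1-D inverse transform along each row */
    for (k = 0; k < 8; k++)
        for (n = 0; n < 8; n++) {
            float s = 0.0f;
            for (l = 0; l < 8; l++) s += in[k][l] * C[l][n];
            tmp[k][n] = s;
        }

    /* out = C^T * tmp: 1-D inverse transform along each column */
    for (m = 0; m < 8; m++)
        for (n = 0; n < 8; n++) {
            float s = 0.0f;
            for (k = 0; k < 8; k++) s += C[k][m] * tmp[k][n];
            out[m][n] = s;
        }
}
```

A block with only a DC coefficient of 8 should come out flat, which is a quick sanity check on the scaling.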

It's a real pity that mplayer as a whole isn't more thread-friendly - that time waiting for GL would be better spent crunching.

I'm not sure their yuv-rgb routine sucks that much... IIRC someone on the dev lists said 40% of a P3 was typical, although of course most PC cards can do it in hardware.

I sorted out my pthreads properly as per squeen's advice above, and get a reliable 20 FPS playing a DVD on a dual 300, no TRAM, both CPUs completely maxed. Not ideal. But it plays MPEG4 better than a 350MHz iMac, which is something.
Beautiful stuff, dexter!

Vegac, I wouldn't really sweat the pthreads for O2s - glDrawPixels() is very fast.

Gonna go apply that IDCT patch to my source tree - I'll probably be able to play DVDs at full speed, finally!
It was more a case of not wanting to accidentally re-flash the PROM with something which wouldn't boot than not knowing what to change, wasn't it? Did anyone ever find the relevant list of CPU IDs? And would the kernel have to be altered too? Mayhap a huge pile of backup motherboards might be more helpful than inside information :)
You might try using setmon -sn, which might/should/could squirt out normal PC-style sync signals instead of SOG. Providing all the pins are wired.

If you're going to rip pins out, it shouldn't harm anything... all you actually need are the RGB pairs, for doing SOG. As far as I know.

I take it the S93 is known good? It sounds broken if other monitors work okay.
I happen to have my dual 300 Octane with Media Illusion next to a Sempron 2500+ with Combustion 3. I set up a crude comparison, more of a race than a benchmark, just running the same footage through a keyer, a blur, an invert. The results were basically identical, maybe not pixel-for-pixel. The Octane rendered the sequence about 10% faster. I was pretty alarmed by that... shouldn't a PC with all that vector and SIMD stuff, and DDR RAM, completely kick ass at this? I don't think it could have been disk-bound. I'll try and do some better benchmarks, maybe with a dual G4 as well, especially if I can get an old flint rig involved. It's probably just crap software, but even so - people seem to drool over Combustion a lot, yet Avid dropped Illusion ;-)
ISTR that Lightwave uses a completely arse-backwards way to get standard open and save dialogs - it calls a separate file selector binary which writes the selected file path to somewhere in /tmp. Maybe it's not finding this separate binary? Is your path all okay? I think it was called filesel or something. Bloody stupid way to carry on IMO.
mia, you know SGI have donated FAM to Linux, right?

http://oss.sgi.com/projects/fam

:)
It sounds like it will work. They're not the first to leave SOG out of the specs, most people don't care. My Dell 2005FPW flatpanel didn't mention it in the specs, but it works. I suppose the monitor manufacturer often just buys a sync decoder IC, or reuses an old design not paying attention to the fact that many of them do do sync on green.

And yeah that's a niiiice screen. Jealous.
Mplayer can play H.264... the ffmpeg webpage was crowing about having playback support on the day QT7 was released. Probably needs a recent CVS build.

To spit out image sequences, I go: mplayer -vo tga -vf format=bgr24 somefile.mov. Some apps get the channels mixed up without the format filter. This works so much faster than outputting image sequences from Final Cut Pro :) Just wish it could output SGI format images... I should write another output plugin...


:?

So instead of just sending the fairly small vertex and textures data you send at least a 25fps megapixel video stream. Can't see that working terribly well, really...
I've also been trying to build blender with gcc. I've got both 2.37a and the current CVS to build with a bit of mucking about but they both core in the STL before blender even gets started. I gave up. I don't have the real compiler.
FYI I think it's impossible to build Blender with gcc. The problem is that it needs to link in the GLU library, which brings the SGI C++ library libC with it, which conflicts horribly with gcc's own C++ library, libstdc++. Someone else had similar problems with other things, and as far as I know there's no solution - you can't compile C++ programs that use GLU with gcc on Irix. If anyone knows otherwise, pipe up!
Your own GLU? That would be quite a task, I think... I wonder how the Mesa one compares...
Okely dokely, I've managed to compile the latest CVS with gcc. I had to build libGLU from the OpenGL sample implementation source, which was an adventure in itself, but that removed the dependency on Irix's C++ libraries and all was well.

Comparing 2.37a built with MIPSPro to my own gcc build, my build is about 20% slower doing the crude draw benchmark. The latest version is about 10% faster, though.

I'll try and package it up, or maybe wait for the 2.4 final, if anyone else even cares :)
Incidentally you only need to set those two environment variables if you want to be able to run gmake lower down the source tree - if you're building just by running gmake in the root of the tree they're not necessary.

Gonna try a MIPS4 build now before I break out ogldebug and go after FTGL. International fonts seem to be broken in the last official build so maybe it's something on my system...
GCC constantly crashes when doing -mips4 builds so that goes out the window.
Probably be pretty simple if I used Irix's libraries. When I next have some time... although it's not like I use mplayer since I discovered its DV stupidities.
I have an SS5 which I used to run OpenBSD on, but I robbed it of its disk so it's just sitting about collecting dust right now. It has the 24 bit framebuffer, the slow one ;)
ajerimez wrote: Seriously now. I'm sure that this system cost close to that amount when new, but can it possibly be worth anywhere near this much years later?


Years later? Hardly, the current version of Inferno is 6.5 and that is 6.2, it's like a year old. That's a damn good price.
Thanks for that. I find the default icons annoyingly difficult to interpret, but making a whole set of text-only ones would take for-ev-er...

Do you happen to know what the hardwareType actually does? Is there any visible quality difference?
Yeah, I much prefer the Tremor interface. Not sure how you'd do it in a header file but you can launch "shake -gui 1" to get it.
Well currently my Wacom on Irix goes totally to shit if I leave the pointer in one place for more than a second. Lifting the pen off makes it better but BOY is it annoying :)

I'm not really a Shake guy so I'm unlikely to be making an icon set but I did make one for Flame which you can see/get here: http://www.fxguide.com/postlite1645-batch+icon.html
GIJoe wrote: and it looks like you're turning batch into a shake-default-like colorful icon mess :lol:


:-p

You have a point but at least all the text names are still there. In shake I look at the row of buttons under viewer for the rotoshapes and go HUH? What are these weird arrows and stuff supposed to mean? :)
The page size in Irix is actually dynamically variable per process, and I think maybe even per thread or even more finely grained... actually, I think you can set any chunk of a process's address space to a different page size.

Pretty scary, huh? I believe this feature is totally unique to Irix.

http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=bks&srch=&fname=/SGI_Developer/MMC_PG/sgi_html/ch02.html
In the real world I don't think anything except mad custom code deviates from the default page size :)
That only works on graphics cards which have specific support for TV out through the VESA plug. Those are very few and enormously far between.
Bummer. You should take advantage of that support contract you have. You do have one, RIGHT? ;)

If the archives are just sitting on the system disk you could rip it out and whack it in the front of another Tezro (gently). If it's stuff still on the Stone (assuming Flame/Smoke here?) then you would need to move both the clip libraries from the system disk and the Stone itself onto another machine, bit hairy, wouldn't recommend doing that.

You could move both System disk and Stone and the SysID button but again, hairy :-o

Reseating the graphics board might help but that is a pretty serious mission in a Tezro, much harder than an Octane, I think.
You need to install the 64-bit version of this and a couple other libraries, if you're installing what I think you're installing. And this is all well documented in the release notes :)
SkyBound, is your avatar picture CGI? Where's it from?