O2 dmrecord error - Page 2

Would be interested in any fellow O2 user with an O2cam, if they wouldn't mind wasting a couple of minutes:

With mediarecorder, select Tasks -> Movie -> Video Production JPEG and try to record more than 6 individual 20-second clips one after another without it dropping a frame. I get to the 6th and then it won't record any more without waiting for more than 5 minutes or rebooting.

Tell me what OS rev you are, and the CPU type.
Thanks a bunch,
Tried 6.5.30m. Basically the same results at recording. Didn't fix this sys storm.

Removed all the RAM, started reinstalling in pairs. I have two pairs of 128 and two pairs of 64.

Observation. It's ok with 256MB RAM, seems to record reliably all the time. Doesn't matter whether it's 2x128 or 4x64. So discounting the slots or the sticks.

Put in 384 (256 + 128) and it records and fails to record in even amounts. When it fails to record, it often only does twice

Found out about timex -p

Good:
18:08:34  vflt/s dfill/s cache/s pgswp/s pgfil/s  pflt/s  cpyw/s steal/s  rclm/s
18:10:13    8.86    3.01    5.80    0.00    0.01    0.76    0.59    3.18    0.00
18:08:34 physmem  kernel    user   fsctl fsdelwr  fsdata freedat   empty
18:10:13   98304   13870   12881     863    1691   13969    1117   53913

Bad:
17:54:59  vflt/s dfill/s cache/s pgswp/s pgfil/s  pflt/s  cpyw/s steal/s  rclm/s
17:55:03  182.26   32.90  148.33    0.00    2.83   19.28   15.17   37.02    0.00
17:54:59 physmem  kernel    user   fsctl fsdelwr  fsdata freedat   empty
17:55:03   98304   12919   12504     356      28    1716     464   70317

More faults (vflt/s) and fewer pages of free memory that may be reclaimable (freedat); this is the %sys activity that kills it. So what's all this about? What's the relationship between this and the installed RAM, and is it more of a systune issue? I know MIPS uses two memory segments for kernel data structures.
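To compare a good and a bad run without eyeballing the columns, something like the following could pull one named column out of the `sar -p` text and report the ratio between two samples. It is only a sketch: `sar_col` and `ratio` are hypothetical helper names, and it assumes the layout shown above (a header row of column names followed directly by rows of numbers).

```shell
#!/bin/sh
# sar_col NAME: read sar-style text on stdin and print the value found
# under the column header NAME (for example vflt/s) on each data row.
sar_col() {
    awk -v want="$1" '
        NF == 0 { next }                 # skip blank lines
        $2 !~ /^[0-9.]+$/ {              # header or banner row: field 2 is not a number
            col = 0
            for (i = 1; i <= NF; i++) if ($i == want) col = i
            next
        }
        col && NF >= col { print $col }  # data row under a header that had NAME
    '
}

# ratio A B: print A divided by B to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Demo with the vflt/s figures from the good and bad samples in this post.
good=$(printf '%s\n' \
    '18:08:34 vflt/s dfill/s cache/s' \
    '18:10:13 8.86 3.01 5.80' | sar_col vflt/s)
bad=$(printf '%s\n' \
    '17:54:59 vflt/s dfill/s cache/s' \
    '17:55:03 182.26 32.90 148.33' | sar_col vflt/s)
echo "vflt/s bad/good ratio: $(ratio "$bad" "$good")"
```

With the two samples above this works out to roughly a twenty-fold jump in vflt/s between the good and the bad run.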

man sar(1)
-p Report paging activities:
vflt/s - address translation page faults (valid page not in memory);
dfill/s - address translation fault on demand fill or demand zero
page;
cache/s - address translation fault page reclaimed from page cache;
pgswp/s - address translation fault page reclaimed from swap space;
pgfil/s - address translation fault page reclaimed from filesystem;
pflt/s - (hardware) protection faults -- including illegal access to
page and writes to (software) writable pages;
cpyw/s - protection fault on shared copy-on-write page;
steal/s - protection fault on unshared writable page;
rclm/s - pages reclaimed by paging daemon.

Posting for my own amusement. I'm not expecting anyone to be able to fix this, or there to be any IRIX virtual memory authors haunting this list. Live in hope.
rooprob wrote:
Observation. It's ok with 256MB RAM, seems to record reliably all the time. Doesn't matter whether it's 2x128 or 4x64. So discounting the slots or the sticks. Put in 384 (256 + 128) and it records and fails to record in even amounts. When it fails to record, it often only does twice
Early on there was an issue with O2s that had more than 256MB of memory installed, but afaik *only* IRIX 6.3 was affected.

Yeah I'm on 6.5.30.

So I've read this, particularly the section Checking for Excessive Paging and Swapping.
http://techpubs.sgi.com/library/tpl/cgi ... /ch10.html

Particularly
Code:
-p vflt/s

Frequency with which a process accessed a page that was not in memory. Compare this number between times of good and bad performance. If the onset of poor performance is associated with a sharp increase of vflt/s, swap I/O may be a problem even if %vswp is low or 0.


I have developed a little test harness to gather par and sar output and tested as root with different RAM configurations.

Code:
mapleleaf 8# cat /usr/people/oo/bin/dm1.par_sched
#!/bin/sh
set -e
# http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi/0650/bks/SGI_Admin/books/IA_ConfigOps/sgi_html/ch10.html
date=`date '+%Y-%m-%d_%H%M'`

# Both arguments are required: a tag naming the test run and a duration in seconds.
if [ -z "$1" -o -z "$2" ]; then
echo "usage: $0 <tag> <duration>s" >&2
exit 1
fi
tag=$1
duration=$2
script=`basename $0`

export TMPDIR=/var/tmp/$USER/$tag/$script/$date
[ ! -d $TMPDIR ] && mkdir -p $TMPDIR
tmpfile=`mktemp -p $TMPDIR XXXXXXXX`

echo "dm1: output to $TMPDIR"

/usr/lib/sa/sadc 1 1 $tmpfile.sa.out
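# Trace dmrecord under par; on failure report the exit status and carry on so sar still runs.
par -rQQ dmrecord -B auto -t $duration -C -v -2 -p video -p audio $tmpfile.mv > $tmpfile.$script.out || \
{
rc=$?
echo "error: recording failed with exit status $rc"
}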

/usr/lib/sa/sadc 1 1 $tmpfile.sa.out
echo "sar report"
echo "=========="
sar -A -f $tmpfile.sa.out
echo "files"
echo "====="
echo $TMPDIR
ls -al $TMPDIR


I am seeing increased vflt/s - page faults (valid page not in memory)

Tested with 256MB, all good. 171 vflt/s (this is a common average result of a few goes)
Code:
mapleleaf 11# sar -p -f /var/tmp/root/256/dm1.par_sched/2012-02-07_2112/gWYG2417.sa.out

IRIX mapleleaf 6.5 07202013 IP32    02/07/12

21:12:33  vflt/s dfill/s cache/s pgswp/s pgfil/s  pflt/s  cpyw/s steal/s rclm/s
21:12:44  171.87   69.49  102.03    0.00    0.09   16.93   11.29   75.13   0.00


Test with 384 MB (results from when it fails to record 10s of video, after several successful attempts)

Code:
mapleleaf 15# sar -p -f /var/tmp/root/384/dm1.par_sched/2012-02-07_2126/wkII1425.sa.out

IRIX mapleleaf 6.5 07202013 IP32    02/07/12

21:26:28  vflt/s dfill/s cache/s pgswp/s pgfil/s  pflt/s  cpyw/s steal/s rclm/s
21:26:30  933.17  374.52  556.73    0.00    0.48   87.02   62.02  399.52   0.00
mapleleaf 16#

mapleleaf 16# sar -p -f /var/tmp/root/384/dm1.par_sched/2012-02-07_2126/ZlPq1410.sa.out

IRIX mapleleaf 6.5 07202013 IP32    02/07/12

21:26:21  vflt/s dfill/s cache/s pgswp/s pgfil/s  pflt/s  cpyw/s steal/s rclm/s
21:26:23 1353.85  544.06  806.99    0.00    0.70  128.67   89.51  583.22   0.00



Very much higher vflt/s, but zero pgswp and near-zero pgfil, which are pages retrieved from disk.

So what I have is a "non-disk page fault" condition when I *add* more RAM to the box. And it doesn't seem to matter what arrangement of RAM is added (64MB or 128MB sticks) just that it goes wrong above 256MB of RAM.
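The "non-disk page fault" reading can be checked per sample by working out what share of vflt/s was satisfied from swap or the filesystem. A rough sketch follows; `disk_share` is a made-up helper name, and it assumes the `sar -p` text layout shown above, locating the columns by their header names.

```shell
#!/bin/sh
# disk_share: read `sar -p` text on stdin; for each data row print
# "<vflt/s> <percent of those faults satisfied from swap or filesystem>".
disk_share() {
    awk '
        NF == 0 { next }                     # skip blank lines
        $2 !~ /^[0-9.]+$/ {                  # header or banner row
            v = s = f = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "vflt/s")  v = i
                if ($i == "pgswp/s") s = i
                if ($i == "pgfil/s") f = i
            }
            next
        }
        # data row: only report when all three columns were found
        v && s && f && $v > 0 { printf "%s %.2f\n", $v, 100 * ($s + $f) / $v }
    '
}

# Example: feed it a saved report, e.g.  sar -p -f <file>.sa.out | disk_share
printf '%s\n' \
    '21:26:28  vflt/s dfill/s cache/s pgswp/s pgfil/s  pflt/s  cpyw/s steal/s rclm/s' \
    '21:26:30  933.17  374.52  556.73    0.00    0.48   87.02   62.02  399.52   0.00' \
    | disk_share
```

A disk share well under one percent, as in the samples here, points at faults being resolved from memory (dfill/s and cache/s) rather than from swap or the filesystem.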

So something is getting too big for something. Could just be a poorly written dmrecord. It came from 6.3 and the age of 256MB of RAM. People have talked about it being crappy; perhaps it's failing to localize its resources effectively once RAM gets to the crazy heights of 384 MB and beyond. You can watch RAM deplete the longer you run it, and since it's spooling to disk through ICE it shouldn't really be consuming anything more than a static pool, not consuming RAM indefinitely. I'm going to write my own.
If you write your own, and it works, you'll be famous. I'd build a statue. 8)

Ian.

I like bronze :)

Played around with the systunes, even setting them all the way to 256MB RAM settings, and nada - cannot stop the VM paging from saturating the capture and preventing it from working.

So plan B. Pored over the man pages and the Lurker's Guide while the wife has been doing coursework. I have a prototype in C. Uses the "cross platform" (meaning 6.x) 6.5 dmedia libraries. Architecturally it's using the event model on the vlPath which triggers callbacks - instead of forking/select polling. That much works. It could still go either way as I'm doing something wrong - it's running at something like 4 frames per second. And it doesn't actually save anything yet. Not fully understanding something about dmbuffers and the dmIC. The point of O2 is that only data pointers are passed and devices can all see the same RAM thanks to the UMA design. Perhaps the call overhead in the event model when handling video is too high - the docs say it's really supposed to be for application events. Don't know yet.

Who knows. Anyway, more later in the week.
May as well complete this thread.

I found the reason in the source for dmrecord.dmic (/usr/share/src/dmedia), which seems to share the behaviour of the stock utility /usr/sbin/dmrecord.

capture.c
DC(dmBufferSetPoolDefaults(p,options.outbufs,video.xferbytes,DM_TRUE,DM_TRUE));

The establishment of the buffer for the video -> compressor sets both cacheable and mapped to TRUE. This is actually a cache coherency overhead when passing data between the two pools which seems to drive the system into a problem state. When I set to FALSE, I observe consistent realtime compression without any issues over and over.
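For anyone wanting to try the same change, a sed filter along these lines could flip the two flags on a copy of capture.c. This is a sketch under assumptions: it assumes the call appears exactly as quoted above, that both flags are switched to DM_FALSE (the post does not say whether one or both were changed), and `patch_pool_flags` is a made-up function name.

```shell
#!/bin/sh
# patch_pool_flags: filter C source on stdin, rewriting the quoted
# dmBufferSetPoolDefaults call so the last two arguments (cacheable, mapped)
# become DM_FALSE. Lines that do not match are passed through untouched.
patch_pool_flags() {
    sed 's/\(dmBufferSetPoolDefaults(p,options\.outbufs,video\.xferbytes,\)DM_TRUE,DM_TRUE/\1DM_FALSE,DM_FALSE/'
}

# Demo on the single line quoted from capture.c.
echo 'DC(dmBufferSetPoolDefaults(p,options.outbufs,video.xferbytes,DM_TRUE,DM_TRUE));' \
    | patch_pool_flags
```

Run it against a copy (for example `patch_pool_flags < capture.c > capture.c.new`) and diff the result before rebuilding, so the original source stays intact.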

That was a bit of a rat hole. Learned about the dmedia libraries, wrote three different versions of dmrecord (forking and event driven), a par parser, some graphviz transformations of the call trace data. Pretty pictures too :)

Educational, but it was finding jrecvid.c on ftp.sgi.com (amazingly still functional), plus a hunch, that led me to the apparent fix. Learning is fun.

I'm transitioning jobs at the moment, which is why I have a stupid amount of free time right now to expend on this puzzle. (I'm not allowed to touch much at my current job, just hang around to answer questions until the end of the month.) I may consider learning Motif to make a little UI for my doings.
rooprob wrote:
... I may consider learning motif to make a little UI for my doings.


A reliable capture app?? Time to plan that statue... 8)

Ian.
I noticed that the O2 docs mention a 256MB limit for direct kernel access to memory.
Code:
For system memory, the full 1GByte memory space is directly accessible to the CPU, but the first 256 MBytes are aliased to KSEG0 so that operating system structures may be mapped without using TLB entries. To access the full 1GByte memory configuration, translation buffer entries are required.


The R5000 has only 48 TLB entries, so there may be more pressure on this resource when capturing in a >256 MB system.

Would it be any different for an R10K/R12K system?

Ian.
The R10K has more TLB entries (64) but as far as I know the memory map works the same way.
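To put the entry counts in perspective, the amount of memory a full TLB can map at once is easy to estimate. The arithmetic below is a sketch under assumptions: each entry is taken to map a pair of pages (as on R4000-family MIPS TLBs), the page size is assumed to be 4 KB, and `tlb_reach_kb` is a made-up helper name.

```shell
#!/bin/sh
# tlb_reach_kb ENTRIES PAGE_KB: memory covered by a fully loaded TLB, in KB.
# Assumes each entry maps a pair of pages (even/odd), hence the factor of 2.
tlb_reach_kb() {
    echo $(( $1 * 2 * $2 ))
}

echo "R5000 (48 entries, assumed 4 KB pages): $(tlb_reach_kb 48 4) KB"
echo "R10K  (64 entries, assumed 4 KB pages): $(tlb_reach_kb 64 4) KB"
```

Under those assumptions either CPU can only map a few hundred KB through the TLB at a time, so anything living outside the directly aliased first 256 MB competes for a very small number of entries.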
