Mirroring Techpubs Docs

I asked a question about a year or so back about making my own mirror of techpubs but was unsuccessful. In light of recent SGI developments, I'd like to create my own snapshots of all the current SGI & IRIX docs should they become unavailable in the future. I've got original IRIX CD media with various docs, but I'd really like to grab the newest document versions direct from SGI themselves and then burn them onto my own DVDs.

In my previous attempts I had tried using wget in various ways to grab the info, but this ended up giving me all sorts of issues with their CGI setup. Has anyone here got a sure-fire way of grabbing the docs (preferably PDFs) off of the SGI Techpubs site (in an automated fashion)?

Many thanks in advance!
Nick
Have you tried Zoontf's technique?

viewtopic.php?t=4241&highlight=wget+techpubs

It worked for me (before SGI revamped the site). I seem to remember having to remove a couple of commands to get it to run. YMMV.
I've tried the suggestions in the link but am having trouble getting it working.

Code:
$ cat grabdocs2.sh

#!/bin/sh
wget -r --accept="*.pdf,download.cgi*" \
--reject="browse.cgi,summary.cgi,init.cgi,help.cgi,feedback.cgi,shownew.cgi,listdocs.cgi" \
--domains=techpubs.sgi.com -nd -i techpubs.txt 2> log.txt &

$ cat techpubs.txt
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=hdwr&pth=ALL
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=0650&pth=ALL


It runs, but it just saves the HTML of each document's download page, e.g.:
Code:
...
-rw-r--r--  1 nick  nick  13641 May  9 22:16 download.cgi?coll=hdwr&db=bks&docnumber=860-0218-002
-rw-r--r--  1 nick  nick  13627 May  9 22:16 download.cgi?coll=hdwr&db=bks&docnumber=860-0219-002
-rw-r--r--  1 nick  nick  13549 May  9 22:25 download.cgi?coll=hdwr&db=bks&docnumber=860-0220-001
-rw-r--r--  1 nick  nick  13557 May  9 22:25 download.cgi?coll=hdwr&db=bks&docnumber=860-0221-001
-rw-r--r--  1 nick  nick  13505 May  9 22:26 download.cgi?coll=hdwr&db=bks&docnumber=860-0222-001
-rw-r--r--  1 nick  nick  13576 May  9 22:26 download.cgi?coll=hdwr&db=bks&docnumber=860-0223-001
...


Any ideas where I might be going wrong?

Thanks!
Nick
I'm sure there must have been a nice spike in the network traffic @ sgi.com yesterday; I refreshed my mirror as well :wink:

This is probably not the best way, but here's how I did it. Various Linuxisms (debian 3.1) may be hidden in here.

Code:
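#!/bin/bash

#set -x

# Freeware (fw) doesn't have books in its collection
COLLECTIONS="0530 0620 0630 0640 0650 hdwr linux nt"

# -m: mirror (recursive + timestamping), -nv: terse output,
# -T60: 60 s timeout, -t0: retry indefinitely,
# -nH --cut-dirs=2: drop the host name and "library/manuals/" from saved paths
WGETOPT="-m -nv -T60 -t0 -nH --cut-dirs=2"

for coll in $COLLECTIONS; do
  mkdir -p manuals_$coll
  # Write a per-collection download script, to be inspected before running
  echo "#!/bin/sh" > wget_$coll.sh
  chmod 755 wget_$coll.sh
  echo "cd  manuals_$coll" >> wget_$coll.sh
  # Plain-text dump of the collection's book list; lynx appends the links
  lynx -dump -width=999 "http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=bks&coll=$coll&pth=ALL" > dump_$coll.txt
  # Get part numbers: the docnumber= value is the 4th '='-separated field
  grep "download.cgi" < dump_$coll.txt | cut -d '=' -f4 | sort -u > manuals_$coll.txt
  for book in $(cat manuals_$coll.txt); do
    # 5th character of e.g. 007-0603-100 selects the manuals/<N>000/ directory
    major=$(echo $book | cut -c5)
    echo "wget $WGETOPT http://techpubs.sgi.com/library/manuals/${major}000/$book/pdf/$book.pdf" >> wget_$coll.sh
    echo "wget $WGETOPT http://techpubs.sgi.com/library/manuals/${major}000/$book/dl/$book.html.tgz" >> wget_$coll.sh
  done
done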


This creates dirs 'manuals_0530' etc. and scripts 'wget_0530.sh' etc.
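The two parsing steps of the script can be pulled out into functions and checked offline. This is a minimal sketch, assuming (as the script does) that each download.cgi link carries coll=, db= and docnumber= in that order, and that the manuals/<N>000/ directory is chosen by the 5th character of the part number. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers mirroring two steps of the mirroring script.

# Read a "lynx -dump" of a browse.cgi page on stdin and print the unique
# part numbers: the 4th '='-separated field of each download.cgi link
# (coll=...&db=...&docnumber=PART).
extract_part_numbers() {
  grep "download.cgi" | cut -d '=' -f4 | sort -u
}

# Print the PDF and html.tgz URLs for one part number, e.g. 007-0603-100.
# The 5th character (first digit of the middle group) picks manuals/N000/.
manual_urls() {
  book=$1
  major=$(echo "$book" | cut -c5)
  base="http://techpubs.sgi.com/library/manuals/${major}000/$book"
  echo "$base/pdf/$book.pdf"
  echo "$base/dl/$book.html.tgz"
}
```

For example, `manual_urls 007-0603-100` should print the same two URLs that appear in the wget_0530.sh example in this post.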

Scripts look like this:
Code:
#!/bin/sh
cd  manuals_0530
wget -m -nv -T60 -t0 -nH --cut-dirs=2 http://techpubs.sgi.com/library/manuals/0000/007-0603-100/pdf/007-0603-100.pdf
wget -m -nv -T60 -t0 -nH --cut-dirs=2 http://techpubs.sgi.com/library/manuals/0000/007-0603-100/dl/007-0603-100.html.tgz
...

After inspecting the wget_*.sh scripts, you run them.
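If you would rather not start each generated script by hand, here is a minimal sketch that runs them one after another and keeps one log per collection. The function name and the wget_<coll>.log naming are hypothetical, not part of the original method.

```sh
#!/bin/sh
# Hypothetical helper: run every generated wget_*.sh in the current
# directory sequentially, saving each script's stdout and stderr
# (where wget -nv reports errors) to wget_<coll>.log.
run_all_wget_scripts() {
  for s in wget_*.sh; do
    [ -f "$s" ] || continue   # no match: the glob stays literal, skip it
    echo "running $s"
    ./"$s" > "${s%.sh}.log" 2>&1
  done
}
```

Running the scripts sequentially keeps it to one connection to techpubs at a time, and because `wget -m` timestamps files, a later rerun should only fetch documents whose remote copy is newer.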

Expect these download volumes (kB)
184928 manuals_0530
247840 manuals_0620
252076 manuals_0630
284396 manuals_0640
562524 manuals_0650
805100 manuals_hdwr
210080 manuals_linux
68356 manuals_nt

These totals cover both the online (html.tgz) and PDF versions.
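To check your own mirror against these figures, the per-directory sizes can be totalled. A minimal sketch, assuming `du -sk`-style input with the size in kB in the first column (the function name is hypothetical):

```sh
#!/bin/sh
# Sum "size name" lines (sizes in kB, first column) read from stdin and
# print the total in kB and in GiB (1 GiB = 1048576 kB).
sum_kb() {
  awk '{ total += $1 } END { printf "%d kB (%.2f GiB)\n", total, total / 1048576 }'
}
```

Usage: `du -sk manuals_* | sum_kb`. Fed the eight figures listed above, it prints `2615300 kB (2.49 GiB)`, which is consistent with the "just shy of 2.5GB" reported later in the thread.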

_________________
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: :Onyx2: (2x) :O3x02L:
In the museum : almost every MIPS/IRIX system.
Wanted : GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)
To Jan-Jaap: working perfectly for me on my Linux box (Fedora Core 2). Well done, champ, thanks very much indeed!
Steve
jan-jaap wrote:
Various Linuxisms (debian 3.1) may be hidden in here.


Works on IRIX too, provided you change the first line to point to your local bash binary (and you have wget/lynx someplace). Been pulling down the docs for several hours now, thanks!
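Before running the script on a non-Linux host, it can help to confirm where bash, wget and lynx actually live. A minimal sketch, assuming the host's /bin/sh supports the POSIX `command -v` builtin (the function name is hypothetical):

```sh
#!/bin/sh
# Hypothetical pre-flight check: print the path of each named tool and
# return non-zero if any of them is not on PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if path=$(command -v "$tool" 2>/dev/null); then
      echo "$tool: $path"
    else
      echo "$tool: not found" >&2
      missing=1
    fi
  done
  return $missing
}
```

For example, run `check_tools bash wget lynx` and put the reported bash path on the script's first line.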

_________________
Twitter: @neko_no_ko
IRIX Release 4.0.5 IP12 Version 06151813 System V
Copyright 1987-1992 Silicon Graphics, Inc.
All Rights Reserved.
jan-jaap wrote:
Various Linuxisms (debian 3.1) may be hidden in here.


I changed /bin/bash to /bin/sh (ksh) here on OpenBSD and it seems to be working just fine. Am now downloading the needed 6.5 and Hardware docs to my local server.

Really great scripts - very much appreciated.

Many thanks again.

Nick
It might be worth mirroring them here, right next to the nekoware stuff?

They don't seem all that big :wink:

_________________
Man is the only animal smart enough to build the Empire State Building, and the only one stupid enough to jump off it.
Spidy wrote:
It might be worth mirroring them here, right next to the nekoware stuff?

They don't seem all that big :wink:


Can't do that without their permission I'm afraid.

Thank you jan-jaap! I am currently dumping everything onto my O200 and converting the PDFs to plaintext for easy grepping. :D
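Converting the PDFs for grepping can be done in one pass over the mirror. A minimal sketch, assuming a `pdftotext` binary (xpdf/poppler) is installed. The function name and the PDFTOTEXT override variable are hypothetical, added so a different converter can be substituted.

```sh
#!/bin/sh
# Convert every PDF under directory $1 to a .txt file next to it,
# skipping any PDF that already has a .txt alongside it.
# Uses $PDFTOTEXT if set, otherwise pdftotext.
pdfs_to_text() {
  find "$1" -type f -name '*.pdf' | while IFS= read -r pdf; do
    txt="${pdf%.pdf}.txt"
    [ -e "$txt" ] || "${PDFTOTEXT:-pdftotext}" "$pdf" "$txt"
  done
}
```

After `pdfs_to_text manuals_0650`, something like `find manuals_0650 -name '*.txt' -exec grep -il 'xfs' {} \;` lists the manuals mentioning a term, without relying on GNU-only `grep -r`.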

Any idea how large this is going to get, and how often SGI really updates their doc tree? I wonder, if I did an independent dump next week, how large the diff would be. Hmm...
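One rough way to size that diff is to compare the part-number lists (manuals_<coll>.txt) from two runs. A minimal sketch, assuming you keep the previous run's list under another name. The function name is hypothetical, and this only catches added or removed part numbers, not changed content under the same number.

```sh
#!/bin/sh
# Compare two sorted part-number lists: "+ PART" is only in the new
# list, "- PART" only in the old one. comm requires sorted input, which
# the mirroring script already produces with sort.
diff_part_numbers() {
  old=$1 new=$2
  comm -13 "$old" "$new" | sed 's/^/+ /'
  comm -23 "$old" "$new" | sed 's/^/- /'
}
```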

[Edit]Turned out to be just shy of 2.5GB. *LOTS* of 404s, though.[/Edit]
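To see which documents did not come down (404s, or downloads still in progress), the part-number list can be checked against the files on disk. A minimal sketch, assuming the layout produced by `-nH --cut-dirs=2` inside manuals_<coll>, i.e. `<N>000/<part>/pdf/<part>.pdf`. The function name is hypothetical.

```sh
#!/bin/sh
# Print each part number in list file $1 whose PDF is missing under
# mirror directory $2 (expected at $2/<N>000/<part>/pdf/<part>.pdf).
missing_pdfs() {
  list=$1 dir=$2
  while IFS= read -r book; do
    major=$(echo "$book" | cut -c5)
    [ -f "$dir/${major}000/$book/pdf/$book.pdf" ] || echo "$book"
  done < "$list"
}
```

Usage: `missing_pdfs manuals_0650.txt manuals_0650`.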
And what about mirroring Supportfolio patches as well? :twisted:
BTW mirroring TPL works great on OSX as well.

_________________
:O2: :Indy: (KO) :Octane: (KO)

Looking for:
1600sw, O2 cam, Fuel
OK, I've got it running now on my server; I started it 5 minutes ago and it's up to 100 MB already. What all does this archive? Is there a way to archive the freeware too? How about patches? I'm trying to archive all that I would ever need before it disappears!! :D

_________________
My SGI systems (in order received) :Indy: Deaconblues :Indigo: Badsneakers :Indigo2: Greenearrings :Indigo2IMP: Kidcharlemagne :Octane: Haitiandivorce :O2: Aja :320: :1600SW: MidnightCruiser
(looking for) :Fuel: Pretzellogic :Tezro: Blackfriday
Quote:
OK, I've got it running now on my server; I started it 5 minutes ago and it's up to 100 MB already. What all does this archive?
No offense intended, but if you are running a script that is thwacking a website without understanding what the script is actually doing, you probably shouldn't be running the script, even if it comes from a reputable character like jan-jaap...
Quote:
No offense intended,
none taken
Quote:
but if you are running a script that is thwacking a website without understanding what the script is actually doing, you probably shouldn't be running the script, even if it comes from a reputable character like jan-jaap...

I know, I know. I have a good general idea of what it does, but I just wanted to confirm what was happening here, and ask if there was a way to archive some of the other portions of the site. I mistakenly thought the techpubs site also held software and patches; I realize that's Supportfolio. So the correct question from me is: is there a way to archive Supportfolio without clicking each link?

OK, I have completed the download of the tech pubs from Diego.
He also provided me with some sites that he mirrored five years ago.
There is still some more stuff that I will download from him whenever he can find some more time.

So thank you so much Diego!


It is all available at the Swedish Nekoware Mirror http://se.mirror.nekoware.se
I have also put up some other miscellaneous SGI info and files I had.

If you have more stuff you can contribute, like mirrored sites or IRIX patches/software, please PM me and I will put it up there.

I intend to keep this server up for a long time so there is no need for you to download it all from it :-)

Enjoy!
//deBug

_________________
Mein Führer, I can walk!
deBug wrote:
I intend to keep this server up for a long time so there is no need for you to download it all from it :-)
Extraordinary... You guys are the BEST! :mrgreen:

_________________
Project:
Movin' on up, toooo the east side
Plan:
World domination! Or something...
deBug wrote:
I intend to keep this server up for a long time so there is no need for you to download it all from it :-)


To make sure that the server stays up for a long time without any nasty letters from lawyers: has anyone contacted SGI to explain our intent and interest, and how this would benefit them (i.e. fewer people hammering their servers), to get official permission?

_________________
Damn the torpedoes, full speed ahead!

There are those who say I'm a bit of a curmudgeon. To them I reply: "GET OFF MY LAWN!"

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O3x0: :ChallengeL: :O2000R: (single-CM)
deBug wrote:
It is all available at the Swedish Nekoware Mirror http://se.mirror.nekoware.se

http://se.mirror.nekoware.se/ didn't work for me but http://se.mirror.nekoware.net/ did.

-Darkstar

_________________
My SGI collection, in chronological order:
:Indy: :Indigo2: :Indigo2IMP: :Indy: :Octane: :Indigo: :O2000: :O2: :O2: :Octane2:
Current tasks: Find Indigo Keyboard+Mouse, find Octane PCI cardcage, find O2k Hardware

Other systems from my collection:
IBM PReP 43P/120, HP 9000/712, DEC VAXstation 4000 model 60, HP Envizex II, DEC AlphaStation 200, Amiga 2000, HP NetServer LH4r, Commodore 64, Commodore 128, Atari 512ST, Sun SPARCengine CP1500, Sun SPARCengine Ultra AXi
Darkstar wrote:


My mistake, nekoware.net is the correct one as you said.

//deBug

FYI, make sure you keep the old backups from techpubs, since SGI likes to refresh PDF documents for new systems, thereby increasing the minor document number and removing the old document :(

This happened to "OpenGL on Silicon Graphics Systems" : viewtopic.php?f=11&t=6345&p=47952

I have a June 2005 one, btw.
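If you keep the part-number lists from old and new snapshots, documents that were refreshed to a new revision can be spotted by grouping part numbers on their first two groups. A minimal sketch, assuming the last group (e.g. the `-002` in `007-xxxx-002`) is the revision that gets bumped. The function name is hypothetical.

```sh
#!/bin/sh
# Read part numbers on stdin (e.g. old and new manuals_*.txt combined)
# and print "BASE: REV REV ..." for every base number (first two groups)
# that appears with more than one revision suffix.
multi_revision_docs() {
  sort -u | awk -F- '
    { base = $1 "-" $2; revs[base] = revs[base] " " $3; n[base]++ }
    END { for (b in n) if (n[b] > 1) print b ":" revs[b] }' | sort
}
```

For example, `cat old/manuals_*.txt manuals_*.txt | multi_revision_docs` (where `old/` is a hypothetical copy of an earlier run) lists the documents whose older revision you may want to keep.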

_________________
:Crimson: :PI: :Indigo: :O2: :Indy: :Indigo2: :Indigo2IMP: :O2000: :Onyx2:
European nekoware mirror, updated twice a day: http://www.mechanics.citg.tudelft.nl/~everdij/nekoware
ftp://mech001.citg.tudelft.nl rsync mech001.citg.tudelft.nl::nekoware