SGI: Hardware

LSI SAS/SATA HBA's and the Fuel (or other IP35 Systems) - Page 4

Makes sense. I think we have a pretty similar setup: I use the O2 as a firewall and shell server (~50W is well worth it), and IP27/30/35 for compute-intensive tasks (compilation and optimization), where a lot of RAM and I/O helps.
:Onyx2:
The only thing wrong with my setup atm is the Phobos P1000 Gbit NIC in the O2 runs really slow, barely
more than 5 to 10MB/sec. I'm sure there must be kernel stuff one can do to fix it, but so far I've gotten nowhere.

Ian.
(07/Mar/2015) FREE! (collection only) 16x Sagitta 12-bay dual-channel U160 SCSI JBOD units.
Email, phone or PM for details, or see my forum post .
[email protected]
+44 (0)131 476 0796
mapesdhs wrote: The only thing wrong with my setup atm is the Phobos P1000 Gbit NIC in the O2 runs really slow, barely
more than 5 to 10MB/sec. I'm sure there must be kernel stuff one can do to fix it, but so far I've gotten nowhere.


Have you tried enabling jumbo frames? It really helps, especially on CPU-starved machines.
I had one of those, but I ended up tossing it; it was just crippling my system, so I suspect their proprietary drivers are not worth a dime. I think I still have a 4x100 one (which didn't work in IP32).
In the US, unlike in Europe and Asia, Internet is still pretty slow (for the masses). All things are relative, though: it's still much faster than 10 years ago.

http://www.netindex.com/download/allcountries/

Does your site go through the O2, or is it hosted somewhere else? Just curious.
ShadeOfBlue wrote: Have you tried enabling jumbo frames? It really helps, especially on CPU-starved machines.


Hmm, perhaps I've edited the wrong file?... netstat still shows 1500 for pge0 (the P1000 interface). In netif.options, if1name is ec0
(the link to the cable modem, DHCP), if2name is pge0 (internal static IP, host name gateway). I thought this meant the file to edit would
be /etc/config/ifconfig-2.options, which currently contains:

Code:

netmask 0xffffff00 sspace 262144 rspace 262144 mtu 9000


Is there some other way of changing the MTU for the pge0 link if this method isn't applicable?

I changed snd/recvspace, but performance is still poor, around 32MB/sec. Real ftp speed is much worse, about 14MB/sec for sending
a file from the O2 to my Fuel, and a measly 4MB/sec for receiving a file to the O2 from the Fuel (ftp session running on the O2, 275MB file).

Note that sending a file from my i7 PC to the Fuel gives more like 95MB/sec.


mia writes:
> Does your site go through the O2, ...

Heavens no. :D My main site is hosted by daily.co.uk, their download speed is pretty good. Try downloading this (89MB file), you'll see
what I mean. I get 2.3MB/sec, which is basically maxing out my 20Mbit Virgin Media link.

Ian.
mapesdhs wrote: I get 2.3MB/sec, which is basically maxing out my 20Mbit Virgin Media link.

I get 7.48MB/s -- which leaves ~ 25% of my 100/100 FTTH unused ;)

I think they offer speeds up to 500Mbit here. I should ask my provider if they support jumbo frames :mrgreen:
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: Image :Onyx2: (2x) :O3x02L:
In the museum : almost every MIPS/IRIX system.
Wanted : GM1 board for Professional Series GT graphics (030-0076-003, 030-0076-004)
mia wrote: ( Describes DMF, which sounds like a fairly sophisticated hierarchical storage management system ...)
It's cool, however, that DMF is available for IRIX; it used to be mostly on big Unicos systems (Cray C90 & J90 and such) with some relatively big tape transports (thousands of slots in silos). In other words, it's good to see software that was available on systems worth tens of millions of dollars come to "affordable" workstations.
I agree, and like Ian, hadn't realized this was available with plain old IRIX. Pretty cool stuff.

I'd be sorely tempted to fool around with DMF if I had time for half the projects going on already...
Then? :IRIS3130: ... Now? :O3x02L: :A3504L: - :A3502L: :1600SW: +MLA :Fuel: :Octane2: :Octane: :Indigo2IMP: ... Other: DEC :BA213: :BA123: Sun , DG AViiON , NeXT :Cube:
jan-jaap wrote: I get 7.48MB/s ...


Holy crap on a cracker. :D

I could upgrade to 30Mbit for free, but that would require a different cable modem, a model about which I've heard bad things.

Virgin offer up to 100Mbit atm. You can have 500? Blimey...

Ian.
mapesdhs wrote: Virgin offer up to 100Mbit atm. You can have 500? Blimey...

Fiber to the home. It's a symmetrical line so my upload speed is also 100Mb/s. Try that with ADSL :D

Started out as 50/50Mbit, got a freebie upgrade to 100/100. I can get higher speeds but I don't think I'd notice the difference. IIRC they've started experiments with gigabit in some cities this year.
I started with 1Mbit 8 years ago; they never changed the package name, just kept increasing the speed, though now I
do wish at least once they'd simply drop the price instead (slow price creep over the years).

Ian.
mapesdhs wrote: Is there some other way of changing the MTU for the pge0 link if this method isn't applicable?

Try editing /var/sysgen/master.d/if_pge if it exists. Some drivers honor the MTU settings in the ifconfig file and some don't, so you have to edit the driver config instead.

There's also a possibility that the Phobos card doesn't support jumbo frames at all.

mapesdhs wrote: I changed snd/recvspace, but performance is still poor, around 32MB/sec. Real ftp speed is much worse, about 14MB/sec for sending
a file from the O2 to my Fuel, and a measly 4MB/sec for receiving a file to the O2 from the Fuel (ftp session running on the O2, 275MB file).

The biggest performance gain comes from jumbo frames, but it's also possible that the driver is poorly written or the card is badly made. You should be able to get at least 50MB/s of raw performance as measured by iperf.
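
A rough sketch of why jumbo frames help a CPU-starved machine, assuming plain IPv4 + TCP with no header options and the standard Ethernet framing costs (preamble, header, FCS, inter-frame gap). The function name is made up for illustration; nothing here is specific to the Phobos driver:

```python
# Per-frame cost of standard vs. jumbo Ethernet frames at gigabit line rate.
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble/SFD + header + FCS + inter-frame gap, bytes
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP headers without options (assumption), bytes
LINE_RATE = 1_000_000_000 / 8    # gigabit Ethernet, bytes per second


def frame_stats(mtu):
    """Return (payload efficiency, full-size frames per second) for a given MTU."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_OVERHEAD
    return payload / on_wire, LINE_RATE / on_wire


for mtu in (1500, 9000):
    eff, fps = frame_stats(mtu)
    print(f"MTU {mtu}: {eff:.1%} payload, {fps:,.0f} frames/s at line rate")
```

The payload efficiency only improves by a few percent; the bigger effect is that the host has to handle roughly six times fewer frames for the same amount of data, which is where a slow CPU would be expected to benefit.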
ShadeOfBlue writes:
> Try editing /var/sysgen/master.d/if_pge if it exists. ...

It does exist, but I'm not sure what I should change that's applicable:

Code:

*
* Copyright 1998 Phobos Corporation
* All Rights Reserved.
*
* Phobos P1000 Gigabit Ethernet Driver
*
*FLAG   PREFIX  SOFT    #DEV    DEPENDENCIES
cs      pge_      -       -       bsd
$$$
/*
* Change pfe to et to allow SGI tools like net visualizer to recognize
* the interface
* char *pge_phobos_name = "et";
*/
char *pge_phobos_name = "pge";

/*
* Number of transmit and recieve buffers allocated at attach time
*/
int Phobos_pge_TX_DESC = 768;
int Phobos_pge_RX_DESC = 1024;

/*
* Transmit mapping threshold
*/
int Phobos_pge_MAP_THRESHOLD = 80;



> There's also a possibility that the Phobos card doesn't support jumbo frames at all.

Hmm, good point.


> The biggest performance gain comes from jumbo frames, but it's also possible that the driver is poorly written or the card is badly
> made. You should be able to get at least 50MB/s of raw performance as measured by iperf.

Of course I don't expect to get more than about 35MB/sec for an ftp (limit of O2's UW bus) but it ought to be better than it is atm.

I tried iperf (feel free to comment if I'm not using it correctly), here's the O2 acting as the server end:

Code:

gateway# ./iperf -f M -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 0.25 MByte (default)
------------------------------------------------------------
[  4] local 192.168.100.1 port 5001 connected with 192.168.100.10 port 19704
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  49.2 MBytes  4.91 MBytes/sec


but look at the output from the Fuel, re the write2 error:

Code:

winters# ./iperf -f M -c gateway
------------------------------------------------------------
Client connecting to gateway, TCP port 5001
TCP window size: 0.19 MByte (default)
------------------------------------------------------------
[  3] local 192.168.100.10 port 19704 connected with 192.168.100.1 port 5001
write2 failed: Interrupted function call
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  49.2 MBytes  4.92 MBytes/sec


What do you make of that? Note the Fuel is still using MTU 1500.


Here's the O2 acting as the client:

Code:

gateway# ./iperf -f M -c winters
------------------------------------------------------------
Client connecting to winters, TCP port 5001
TCP window size: 0.25 MByte (default)
------------------------------------------------------------
[  3] local 192.168.100.1 port 57088 connected with 192.168.100.10 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    290 MBytes  29.1 MBytes/sec


and the Fuel as the server:

Code:

winters# ./iperf -f M -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 0.06 MByte (default)
------------------------------------------------------------
[  4] local 192.168.100.10 port 5001 connected with 192.168.100.1 port 57088
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec    290 MBytes  29.0 MBytes/sec


So it's much quicker sending data to the Fuel than receiving it. Strange...
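
To put the two directions side by side, here is a small sketch that parses the iperf summary lines above and expresses each rate as a share of a gigabit link. The helper names are invented for this example, and it assumes that iperf's "MBytes" means 1024*1024 bytes; if it means 10^6 bytes the shares come out slightly lower:

```python
import re

# Matches iperf summary lines produced with "-f M", e.g.
# "[  4]  0.0-10.0 sec  49.2 MBytes  4.91 MBytes/sec"
SUMMARY = re.compile(
    r"\[\s*\d+\]\s+([\d.]+)-\s*([\d.]+) sec\s+([\d.]+) MBytes\s+([\d.]+) MBytes/sec"
)


def parse_summary(line):
    """Return (seconds, MBytes transferred, MBytes/sec) from one summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("not an iperf summary line: " + line)
    start, end, transferred, rate = map(float, m.groups())
    return end - start, transferred, rate


def gigabit_share(mbytes_per_sec, bytes_per_mbyte=1024 * 1024):
    """Fraction of a 1 Gbit/s link used; the MByte size is an assumption."""
    return mbytes_per_sec * bytes_per_mbyte * 8 / 1_000_000_000


# Server-side summary lines from the two runs above (the client sends to the server).
runs = {
    "Fuel -> O2": "[  4]  0.0-10.0 sec  49.2 MBytes  4.91 MBytes/sec",
    "O2 -> Fuel": "[  4]  0.0-10.0 sec    290 MBytes  29.0 MBytes/sec",
}
for name, line in runs.items():
    secs, transferred, rate = parse_summary(line)
    print(f"{name}: {rate} MBytes/sec, about {gigabit_share(rate):.0%} of gigabit")
```

Either way, both directions are a small fraction of what the link can nominally carry, with the receive side of the O2 far worse than the send side.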

Should I set the Fuel's MTU to 9000? Never bothered looking into it before since the speed of the Fuel's GigE has
always been really good to/from my other systems.

Ian.
mapesdhs wrote: It does exist, but I'm not sure what I should change that's applicable:

Hmm... OK, so there's nothing in the driver config file and the driver won't honor the ifconfig mtu setting.
Perhaps there's a systune parameter that the driver exports? Look in the /var/sysgen/mtune/if_pge file; the variables listed there (if any) are available for tweaking via systune (you can also edit the file directly, run autoconfig and reboot).

You can also take a look at the card itself: google the part number of its Ethernet controller and check the datasheet to see if it supports MTUs bigger than 1500.


mapesdhs wrote:

Code:

write2 failed: Interrupted function call

What do you make of that? Note the Fuel is still using MTU 1500.

That's probably a bug in iperf :D
The write call failed with EINTR, which the program is supposed to handle by repeating the write call, but I guess nobody implemented that. You could try a newer version of iperf to see if it has been fixed.
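
For reference, the usual retry-on-EINTR pattern looks something like the sketch below. This is not iperf's code, just an illustration of the idea; `write_all` and `FlakyWriter` are hypothetical names, and the fake writer stands in for a socket because Python 3.5 and later already retry interrupted system calls on real sockets automatically:

```python
def write_all(write, data):
    """Write all of data via write(chunk) -> int, retrying when interrupted."""
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        try:
            sent += write(view[sent:])
        except InterruptedError:
            # EINTR: a signal arrived before anything was written; just retry.
            continue
    return sent


class FlakyWriter:
    """Hypothetical stand-in for a socket: interrupted once, short writes after."""

    def __init__(self):
        self.calls = 0
        self.received = bytearray()

    def write(self, chunk):
        self.calls += 1
        if self.calls == 1:
            raise InterruptedError("Interrupted function call")
        accepted = bytes(chunk[:4])   # accept at most 4 bytes per call
        self.received += accepted
        return len(accepted)


writer = FlakyWriter()
write_all(writer.write, b"iperf payload")
print(bytes(writer.received))
```

The same loop also covers short writes, where the call succeeds but accepts fewer bytes than requested.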


mapesdhs wrote: So it's much quicker sending data to the Fuel than receiving it. Strange...

These results are really poor for a gigabit card; putting in a quad-port (or even dual-port) 100Mbit Ethernet card and trunking the ports would give better results :/

You could try tweaking the Phobos_pge_TX_DESC and Phobos_pge_RX_DESC parameters in that master.d file, but I have a feeling that the driver adjusts them automatically.

Oh, you could also increase 'systune maxdmasz' (the value is in pages, so 4kB on an O2, IIRC). I remember the default value is really low, at least for an Origin system.

mapesdhs wrote: Should I set the Fuel's MTU to 9000?

If you do, you should do the same for all other machines on that subnet, otherwise you will get weird errors (e.g. ssh connections will randomly disconnect, chat clients will constantly reconnect, etc.).
I have a separate subnet just for jumbo-frame-capable hardware and haven't had any problems.
mapesdhs wrote: Should I set the Fuel's MTU to 9000?
shady blue wrote: If you do, you should do the same for all other machines on that subnet, otherwise you will get weird errors (e.g. ssh connections will randomly disconnect, chat clients will constantly reconnect, etc.)

I have a separate subnet just for jumbo-frame-capable hardware and haven't had any problems.


My experience was that the switch is an important component in the jumbo-frame equation. I had some jumbo framers and some 10Mbit devices all happy on the same subnet, but when the switch didn't negotiate well, there were problems. Finally got it all working nicely, then the SOHO switch fried. Affordable Cisco doesn't do jumbo, so it was back to square one, but 100Mbit on Cisco has been much more reliable than gigabit on SOHO ....
A friend of mine also has mixed jumbo- and non-jumbo devices on the same subnet and it works fine for him, but I remember reading that this actually isn't guaranteed to work (it relies on path MTU discovery, which can be flakey), so I decided to play it safe and make a separate subnet :)

If you still have that switch, you can open it and check if a capacitor has blown -- this is the single most common failure in switches and it's easy to replace.
I've received a stack of "dead" 24-port 10/100 switches, which all had a blown capacitor and started working properly after I replaced it.
Ian,

I'm sorry, this is a dumb question, but have you tried to disable ipfilter on your O2 when using the gigE card? By disabling I mean unloading the kernel module altogether.
mia wrote: Ian,

I'm sorry, this is a dumb question, but have you tried to disable ipfilter on your O2 when using the gigE card? By disabling I mean unloading the kernel module altogether.


How does one unload the kernel module?

If you're thinking ipfilter could be a problem, could I test it merely by shutting down ipfilter from within /etc/init.d?

Ian.
Another one for fun: (this one is work in progress)

I've replicated FC jedi Chris Kalisiak's configuration from http://www.futuretech.blinkenlights.nl/fc.html (Ian's website):

Took a "relatively large Origin server" and a NetApp 14-disk (Seagate 300GB FC) LUN array, and got these laughable results:

Code:

root@plum:/mnt# diskperf -W -D -r 4k -m 4m testfile
#---------------------------------------------------------
# Disk Performance Test Results Generated By Diskperf V1.2
#
# Test name     : Unspecified
# Test date     : Thu Oct 18 15:44:29 2012
# Test machine  : IRIX64 plum 6.5 07202013 IP35
# Test type     : XFS data subvolume
# Test path     : testfile
# Request sizes : min=4096 max=4194304
# Parameters    : direct=1 time=10 scale=1.000 delay=0.000
# XFS file size : 4294967296 bytes
#---------------------------------------------------------
# req_size  fwd_wt  fwd_rd  bwd_wt  bwd_rd  rnd_wt  rnd_rd
#  (bytes)  (MB/s)  (MB/s)  (MB/s)  (MB/s)  (MB/s)  (MB/s)
#---------------------------------------------------------
      4096    8.31   14.80    8.85   14.75    5.03    0.92
      8192   15.24   26.80   15.12   25.85   12.91    1.78
     16384   24.20   41.61   23.51   37.39   22.47    3.50
     32768   32.82   52.38   33.18   44.34   23.89    5.86
     65536   42.26   59.25   42.64   21.11   31.91    9.75
    131072   44.83   63.53   46.15   25.68   32.12   15.08
    262144   47.30   69.03   46.97   32.19   28.54   20.39
    524288   46.78   69.99   39.64   41.26   37.37   22.61
   1048576   45.64   70.74   41.75   43.68   39.97   34.29
   2097152   41.44   74.54   42.69   49.77   39.57   40.78
   4194304   45.52   72.21   38.86   59.10   40.88   48.07


At the time of the tests the load was roughly 40% on the NetApp, so I'm trying to understand where the bottleneck is. Plausible causes:

- qla2342 driver?
- single-channel (2Gbps) FC not appropriate? (This NetApp, while using a 2Gbps GBIC, is pushing only 1Gbps at most.) I should trunk more ports.
- gremlins
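
As a rough way to weigh the link-speed theory, the sketch below compares the best figure in each diskperf column above against nominal Fibre Channel payload rates. The 100 and 200 MB/s values are the usual round numbers for 1Gbps and 2Gbps FC per direction and are an assumption here, not something measured on this setup:

```python
# Nominal Fibre Channel payload rates per direction, MB/s (assumed round figures).
FC_NOMINAL = {"1Gbps FC": 100.0, "2Gbps FC": 200.0}

# Best figure in each diskperf column from the table above, MB/s.
PEAKS = {"fwd_wt": 47.30, "fwd_rd": 74.54, "bwd_wt": 46.97,
         "bwd_rd": 59.10, "rnd_wt": 40.88, "rnd_rd": 48.07}


def utilisation(peaks, link_mb_s):
    """Return each column's peak as a fraction of the given link rate."""
    return {col: mb_s / link_mb_s for col, mb_s in peaks.items()}


for link, rate in FC_NOMINAL.items():
    shares = utilisation(PEAKS, rate)
    best = max(shares, key=shares.get)
    print(f"{link}: best column {best} at {shares[best]:.0%} of nominal")
```

If the array really is limited to 1Gbps, the forward reads are already at about three quarters of that, so the link would be a plausible ceiling for reads, while the writes sit well below either figure.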
How did you create the array though? So far all my testing has used diskalign in order to create stripe arrays
optimised for uncompressed video.

Also, why is your test file so large? If the array is that slow, it needn't be more than 1GB at most, and even that's
on the high side.

Ian.
The array is created and managed by the NetApp head, then exposed to the host/switch using either the Fibre Channel protocol or iSCSI.

NetApp encapsulates LUN blocks within 4k WAFL (NetApp's filesystem) blocks. WAFL itself is striped across all the drives (with redundancy bits, here RAID-DP). WAFL is the underlying filesystem for iSCSI, NFS, CIFS and FCP on NetApp. Yes, this is not a typo: a block device is chunked on a filesystem.

There are advantages and (performance) drawbacks to this method. This overhead probably explains why I can't reach 1Gbps line rate: it is the cost of the I/O operations (including parity checks). Add to this that WAFL is laid on top of disk blocks which are checksummed (in the case of these FC drives, the disk blocks are 520 bytes, where 8 bytes are reserved for checksums and 512 bytes are available).

So there's a lot of overhead, but also a lot of features (mirroring to another array, symmetrically or asymmetrically, snapshots, etc.); it's a tradeoff of features vs. throughput, as is often the case. Finally, this specific array doesn't have a lot of cache available for read-ahead, write-back, etc. But I do appreciate its features. Regardless, I think this is a good test anyway. Please note that this disk shelf is 10 years old; but again, so is this Origin.