
9x 1.5TB + RAID5 + LVM + XFS?

Hi,
we are building a small disk array in our studio and I would like to ask what the best setup would be.
We did some benchmarks, but we really don't have the time to tweak & test every filesystem and parameter combination.

So, my question is whether anyone here has experience with a similar setup (9 x 1.5TB HDDs).
The primary task for the array is shared storage for video files (20-30 GB each).
So far we have been using 7 x 1.5TB with ext3, which has become quite laggy over time (creating a directory takes more than 20 seconds once the array is above 80% utilization). We used raw RAID5 + ext3, no LVM.

Do you think the setup mentioned in the subject is OK, and if so, what parameters should we use for XFS?

Thanks for any suggestions.

_________________
:Indigo2IMP: :Octane: This post was typed using dvorak keyboard layout - http://www.dvzine.org
Putting nine drives into a single RAID5 is arguably suboptimal.

_________________
:OnyxR: :IRIS3130: :IRIS2400: :Onyx: :ChallengeL: :4D220VGX: :Indigo: :Octane: :Cube: :Indigo2IMP: :Indigo2: :Indy:
kjaer wrote:
Putting nine drives into a single RAID5 is arguably suboptimal.


What do you suggest, then?

_________________
:Indigo2IMP: :Octane: This post was typed using dvorak keyboard layout - http://www.dvzine.org
I've got 4 * 1TB in a (soft) RAID5 with LVM and XFS on my home server running Debian 'Lenny'. Works fine, but most I/O is reading and it is rarely hammered from multiple clients simultaneously.

RAID5 is economical when files are mostly read, but if a significant portion of your I/O is writes it's a bad idea. I don't think LVM adds significant overhead. Any FS will slow down as it fills up, and XFS is no exception, but my old server (a similar setup) was routinely > 95% full and it never became noticeable, to me at least. XFS performs relatively well with large files like video. If you regularly delete large numbers of files, though, you will hate XFS's synchronous file delete, which is slow. There are probably ways around that.
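If you go XFS for this, the mount options are worth a look too. A minimal sketch of what I'd start from (device and mount point are made up, and treat the values as starting points to benchmark, not gospel):

# /etc/fstab entry for a large-file XFS volume (hypothetical names)
/dev/vg_video/lv_video  /mnt/video  xfs  noatime,inode64,logbufs=8,logbsize=256k  0 0

noatime avoids a metadata write on every read, inode64 lets inodes spread across the whole volume instead of crowding the start of it, and the larger in-memory log buffers help with metadata-heavy bursts like those slow directory creates.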

_________________
Now this is a deep dark secret, so everybody keep it quiet :)
It turns out that when reset, the WD33C93 defaults to a SCSI ID of 0, and it was simpler to leave it that way... -- Dave Olson, in comp.sys.sgi

Currently in commercial service: Image :Octane2: :Onyx2: (2x) :0300:
In the museum: almost every MIPS/IRIX system.
kjaer wrote:
Putting nine drives into a single RAID5 is arguably suboptimal.


What? Putting 9 drives in a single RAID5 is just fine performance-wise for large sequential writes (which is most of what a video-storage workload like this consists of). The only issues are that with 9 drives there's a higher chance of 2 or more failing at once and losing your data, and that writing anything smaller than a full stripe forces the controller into a read-modify-write to update parity (this is alleviated by tweaking your stripe size and write sizes - in many cases the entire stripe gets written anyway, so you lose nothing).

I've seen hundreds of drives (with tens of hot-spares, to try to avoid the multiple failure situation) in RAID5 arrays that perform just fine. It's very common on the high end. People who want a little extra insurance will generally RAID0 smaller RAID5 groups (so that each group can withstand a failure, making the array resistant to several simultaneous failures).

LVM's overhead is indeed quite low, and I'd recommend it if you ever anticipate needing to resize anything about your array, as it makes that process much easier (and you get snapshots, which in turn make consistent backups easier).
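If this ends up as Linux software RAID, the whole stack might look something like the sketch below - device names are made up and the 256 KiB chunk is just an example to benchmark against, not a recommendation:

# 9-disk software RAID5 with a 256 KiB chunk (example value - measure your own)
mdadm --create /dev/md0 --level=5 --raid-devices=9 --chunk=256 /dev/sd[b-j]

# LVM on top, so the volume can be resized and snapshotted later
pvcreate /dev/md0
vgcreate vg_video /dev/md0
lvcreate -l 100%FREE -n lv_video vg_video

# XFS aligned to the RAID geometry:
#   su = chunk size, sw = number of data disks (9 drives - 1 parity = 8)
#   a full stripe is 8 x 256 KiB = 2 MiB, so writes in 2 MiB multiples
#   hit the full-stripe path and skip the read-modify-write penalty
mkfs.xfs -d su=256k,sw=8 -l size=128m /dev/vg_video/lv_video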

_________________
:0300: <> :0300: :Indy: :1600SW: :1600SW:
toxygen wrote:
So, my question is whether anyone here has experience with a similar setup (9 x 1.5TB HDDs).
The primary task for the array is shared storage for video files (20-30 GB each).
So far we have been using 7 x 1.5TB with ext3, which has become quite laggy over time (creating a directory takes more than 20 seconds once the array is above 80% utilization). We used raw RAID5 + ext3, no LVM.


so i have a similar situation. ordinarily raid5 with the large files is just fine, but the small-block writes get lost in the shuffle in the middle of large block transfers. not a great situation... raid10 will help reduce the number of operations for those directory creates, and it also avoids the dual-spindle-fault problem bri3d notes.
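for comparison, a linux md raid10 across 8 of the disks would be something like this (made-up device names; the ninth disk could sit as a hot spare):

mdadm --create /dev/md1 --level=10 --raid-devices=8 --spare-devices=1 /dev/sd[b-j]

you give up capacity (roughly 6TB usable vs ~12TB from the 9-disk raid5), but small writes like directory creates only touch a mirror pair instead of a whole parity stripe.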

incidentally, what is common practice in the video profession for this? seems like it's a problem with a known solution.

_________________
I love my iPad!!!
I know RAID-3 would be more suitable for sequential reads/writes, whereas its random I/O sucks. RAID-5 just has too much overhead for anything other than random I/O.

_________________
:O3000: :Fuel: :Indy: :0300: :0300: :0300: :0300: :0300: :0300: :0300: :0300: :0300:
bri3d wrote:
I've seen hundreds of drives (with tens of hot-spares, to try to avoid the multiple failure situation) in RAID5 arrays that perform just fine. It's very common on the high end. People who want a little extra insurance will generally RAID0 smaller RAID5 groups (so that each group can withstand a failure, making the array resistant to several simultaneous failures).


I don't think you've looked very closely at what is actually going on with these huge RAID5 systems though. There may be hundreds of drives in the cabinet, but they're configured as multiple RAID5s, with maybe six drives per array... backed by multiple back-end controllers, massive RAM cache, and lots of MIPS to keep the performance up. The Hitachi Freedom, STK VSS, and IBM DS8100 all work this way, as does the EMC Symmetrix (if you've configured it in RAID5 mode).

Part of the problem is that SCSI (and FCAL) performance does not scale linearly with the number of targets: the bus saturates due to command block overhead after five or six targets. I was going to point this out earlier, but it is bus-dependent - IBM SSA, for example, does not exhibit this characteristic in the same way. With 1.5 TB drives it is unlikely the OP is using SCSI or FCAL (or SSA!). SATA was my guess, and that's a port architecture whose scaling depends on the controller architecture/design.

I still wouldn't put nine drives in a single RAID5 group, even if I weren't convinced RAID5 is a dance with the devil in the pale moonlight.

_________________
:OnyxR: :IRIS3130: :IRIS2400: :Onyx: :ChallengeL: :4D220VGX: :Indigo: :Octane: :Cube: :Indigo2IMP: :Indigo2: :Indy:
kjaer wrote:
I still wouldn't put nine drives in a single RAID5 group, even if I weren't convinced RAID5 is a dance with the devil in the pale moonlight.


I've seen people explicitly configure large Sun FC-attach arrays into one RAID5, although I agree that this doesn't make much sense. But 9 drives is right on the border in my opinion - I'd just set it up as one array.

As an amusing anecdote with regards to other stupid-disk related things I've seen people do, I've also seen people try to run big high-IOP Oracle instances on brand-new (untested) commodity SATA arrays - needless to say, they've soon found themselves buying an awful lot of drives awfully quickly ;)

_________________
:0300: <> :0300: :Indy: :1600SW: :1600SW:
toxygen wrote:
...studio...

2 thoughts, assuming that this pays your bills rather than being a hobby:

-if you're in production, don't touch anything till the project is out

toxygen wrote:
We did some benchmarks, but we have all but the time to tweak & test all filesystems/parameters.

-you have to do this, hit a pause for a few days and do decent tests (w/ realistic video chunks not just the default diskperf stuff). quick+dirty tests are ok for editing shots of your dog doing tricks on a netbook.
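something like fio with big sequential jobs gets a lot closer to the real workload than the diskperf defaults - the numbers here are made up, adjust to your actual clip sizes and client count:

fio --name=videowrite --directory=/mnt/video --rw=write --bs=1M \
    --size=20g --numjobs=4 --direct=1 --ioengine=libaio --group_reporting

run the read and mixed variants too (--rw=read, --rw=rw), ideally with a directory-create loop going in the background, since that's exactly what hurt the old ext3 array.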

mr foetz is da man; Dr. Dave also went through the hassle of documenting some really nice tests here - run a search or try the irc channel.
Thank you all for the explanations of the pros and cons of the alternatives.
I'm thinking about leaving the current setup (7 disks in RAID5 with ext3) as it is and buying one more disk, so I can build another array with XFS on top of LVM on top of RAID5 across 3 disks.
And over time I'll try to migrate stuff from the old array to the new one while simultaneously extending it.
Maybe one last question: what XFS tweaks do you suggest (chunk size, log size, ...)?



fu wrote:
-if you're in production, don't touch anything till the project is out

there is always some project going on. we have never been in the [idle] state as far as i can remember.

fu wrote:
-you have to do this, hit a pause for a few days and do decent tests (w/ realistic video chunks not just the default diskperf stuff). quick+dirty tests are ok for editing shots of your dog doing tricks on a netbook.

well, the thing is I cannot hit pause for a few days :) if I could, I wouldn't be asking for help here.

_________________
:Indigo2IMP: :Octane: This post was typed using dvorak keyboard layout - http://www.dvzine.org
cool cool toxy, to be honest i thought this was a deja vu for a second, then i saw your location

2 weeks ago I brought a project into a fresh studio somewhere in Europe. The array went down, stretching everything to 33 hours past the deadline. Guess who paid the bill (me, Sony, or them).

best of luck and do hop on irc (and do as many tests as you can :))
A few years ago some research found that RAID5 arrays tended to hit a problematic failure mode much more often than other RAID levels: a single disk would go down (OK, that's why I have RAID5 redundancy, right?), but then during the rebuild of the array a second drive would go down. I can't remember the numbers, but it was enough of a risk that I do not use RAID-5 any more. While the research did not directly mention RAID-3, I would suspect similar results because the same processes are in place (all disks need to be read completely during the rebuild of the array).

It uses more disk space, but RAID-10 doesn't have this problem nearly as often.

_________________
Damn the torpedoes, full speed ahead!

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O200: :ChallengeL:
@SAQ depending on the technology (disk type, make, model, year) you're looking at, capacities of late-model drives have grown relative to their unrecoverable read error rate to the point where there is a statistically significant chance of encountering such an event during the course of a rebuild. this is why raid-6 has become a requirement in the high end lately.

this issue comes down to the increase in probability of encountering a previously undetected 2nd fault during an erasure. raid-10 would also have this problem, but at a reduced probability due to the restricted symbol space of the code word.

in a word: yup!

i suggested raid-10 for a reasonable balance between small block writes and large sequential r/w's. as a general recommendation it would be a good place to start, but as noted above take the time to optimize.
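rough numbers for the array in question, taking the usual consumer-SATA spec of one unrecoverable read error per 10^14 bits (an assumption - check the actual datasheet):

data read during a 9-drive raid5 rebuild = 8 surviving disks x 1.5 TB = 12 TB ~ 9.6e13 bits
expected UREs during the rebuild = 9.6e13 / 1e14 ~ 0.96
P(at least one URE) ~ 1 - e^(-0.96) ~ 60%

so better-than-even odds that the rebuild trips over a bad sector somewhere, which is exactly that second fault during an erasure.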

_________________
I love my iPad!!!
SAQ wrote:
A few years ago some research found that RAID5 arrays tended to hit a problematic failure mode much more often than other RAID levels: a single disk would go down (OK, that's why I have RAID5 redundancy, right?), but then during the rebuild of the array a second drive would go down. I can't remember the numbers, but it was enough of a risk that I do not use RAID-5 any more.

Just how isolated was this sort of behaviour?

Why I'm asking is that at one of my customers we had a breakdown of a RAID-6 raidset in an HP MSA1500 array - a disk box full (20 disks) of 500GB SATA drives. Everything was fine and we popped in a new drive the next morning, when we found *another* failed drive in the very same box and therefore the same raidset. Since RAID-6 survives two drives failing, that was still OK. The rebuild of the first failed drive went along, and that night a *third* drive went dead - and the raidset wasn't fully synchronized yet.
Epic fail.

We've isolated this behaviour to a specific line of Maxtor drives, and HP now sends out only Seagate drives to replace them. We've still got a few of those Maxtors left, but I can't just replace one and wait for a rebuild... a second or third drive might pop in the process when the load rises.

_________________
:O3000: :Fuel: :Indy: :0300: :0300: :0300: :0300: :0300: :0300: :0300: :0300: :0300:
Oh, and I forgot to mention: we've got in the area of 60TB+ of hard drive space altogether, and RAID-5 is just fine.
But then we're talking SCSI or FC drives...
:O3200: :Fuel: :Indy: :O3x02L:
toxygen wrote: I'm thinking about leaving the current setup (7 disks in RAID5 with ext3) as it is and buying one more disk, so I can build another array with XFS on top of LVM on top of RAID5 across 3 disks.
And over time I'll try to migrate stuff from the old array to the new one while simultaneously extending it.


Just so you don't get surprised by this later - you can't grow a RAID5 set by adding a disk to it without rebuilding it from scratch, unless you want your original RAID5 with an unprotected disk tacked on the end of it.
:OnyxR: :IRIS3130: :IRIS2400: :Onyx: :ChallengeL: :4D220VGX: :Indigo: :Octane: :Cube: :Indigo2IMP: :Indigo2: :Indy:
kjaer wrote: Just so you don't get surprised by this later - you can't grow a RAID5 set by adding a disk to it without rebuilding it from scratch, unless you want your original RAID5 with an unprotected disk tacked on the end of it.


Depends on the RAID controller, doesn't it? ICP Vortex controllers used to let you do a non-destructive rebuild, which allowed you to add a new drive to the array.
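Linux software RAID can do it too - md has supported growing a RAID5 by adding a disk for a while now. Roughly, with made-up device and volume names (and take a backup first; an interrupted reshape is no fun):

# add the new disk and reshape the RAID5 from 3 to 4 members
mdadm --add /dev/md2 /dev/sdk
mdadm --grow /dev/md2 --raid-devices=4 --backup-file=/root/md2-grow.bak

# once the reshape finishes, grow the layers on top
pvresize /dev/md2
lvextend -l +100%FREE /dev/vg_new/lv_new
xfs_growfs /mnt/new        # XFS grows while mounted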
:4D70GT: :Octane2: :O200: :O2: :O2: :O2: :O2: :1600SW:
Solaris has addressed the rebuild-time issue in later ZFS releases with raidz3, which makes it possible to survive 3 disk failures. This is important since drives are so big nowadays that rebuild times are very long!

I currently have an E450 with raidz2, 16 x 300GB drives split into two vdevs in one pool. 300GB is not that big, but the interface maxes out at 40MB/s and a rebuild can maybe use 10MB/s at best, which means about 10 hours - then think 2TB drives!!
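To put numbers on the "then think 2TB drives" part, with the same assumed ~10MB/s effective rebuild rate:

300 GB / 10 MB/s = 30,000 s ~ 8-9 hours (the "about 10 hours" above)
2 TB / 10 MB/s = 200,000 s ~ 55 hours

That's more than two days per rebuild, with the pool running at reduced redundancy and the surviving drives getting hammered the whole time.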

On an AMD backup server I have had two 1TB Seagate SATA drives go down within a few days of each other in a single 6-disk raidz2 pool. I had one spare drive on the shelf and popped that in, and then the second drive failed a few days later - that was scary, waiting for more spare drives :)

Knock on wood, but my E450 had never ever failed me until a few weeks ago, when one PSU failed - but it is a 2+1 config, so no big issue there and no rebuild time :)
--
No Microsoft product was used in any way to write or send this text.
If you use a Microsoft product to read it, you're doing so at your own
risk.
mila wrote: On an AMD backup server I have had two 1TB Seagate SATA drives go down within a few days of each other in a single 6-disk raidz2 pool. I had one spare drive on the shelf and popped that in, and then the second drive failed a few days later - that was scary, waiting for more spare drives :)


Haven't made the terabyte leap yet for (among other reasons) the fact that they seem to die much more often than smaller drives.

How long had they been running? (i.e. is this something that should have been caught in factory QA, if they still did QA, or was it something that cropped up after install + test + a noticeable amount of use - 6 months or more?)
Damn the torpedoes, full speed ahead!

Living proof that you can't keep a blithering idiot down.

:Indigo: :Octane: :Indigo2: :Indigo2IMP: :Indy: :PI: :O3x0: :ChallengeL: :O2000R: (single-CM)