* Fwd: Linux MD raid5 and reiser4... Any experience ?
[not found] <fd8d0180601050104x15079396h@mail.gmail.com>
@ 2006-01-05 9:06 ` Francois Barre
2006-01-05 10:14 ` Daniel Pittman
2006-01-05 13:38 ` John Stoffel
0 siblings, 2 replies; 31+ messages in thread
From: Francois Barre @ 2006-01-05 9:06 UTC (permalink / raw)
To: linux-raid
Hi all,
Well, I think everything is in the subject... I am looking at this
solution for a 6*250GB raid5 data server, evolving into a 12*250GB raid5 in
the months to come... Performance is absolutely not a big issue for
me, but I would not appreciate any data loss.
Furthermore, I would prefer not to use LVM nor any middle layer
between MD and the fs... Is this middle layer *very* useful when I'm
sure my partition layout will not evolve (e.g. only one enormous fs)?
Thanks for your answers/advice,
F.-E.B.
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 9:06 ` Fwd: Linux MD raid5 and reiser4... Any experience ? Francois Barre
@ 2006-01-05 10:14 ` Daniel Pittman
2006-01-05 11:21 ` Francois Barre
2006-01-05 11:26 ` berk walker
2006-01-05 13:38 ` John Stoffel
1 sibling, 2 replies; 31+ messages in thread
From: Daniel Pittman @ 2006-01-05 10:14 UTC (permalink / raw)
To: linux-raid
Francois Barre <francois.barre@gmail.com> writes:
G'day Francois.
> Well, I think everything is in the subject... I am looking at this
> solution for a 6*250GB raid5 data server, evolving in a 12*250 rai5 in
> the months to come... Performance is absolutely not a big issue for
> me, but I would not appreciate any data loss.
If your key interest is data integrity, and you don't care a fig about
performance, you would be much better off using ext3 on that filesystem.
Depending on the test, ext3 may not do better than other filesystems,
but it is really quite hard to go past the long history of reliability
and stability that it has.
It also has extremely good tools for recovering if something /does/ go
wrong, and is very resilient to damage on the disk. Reiserfs has,
historically, had some issues in those areas, especially in recovery
from corruption.
> Furthermore, I would prefer not to use LVM nor any middle layer
> between MD and the fs... Is this middle layer *very* usefull when I'm
> sure my partitions layout will not evolve (e.g. only one enormous fs)
Nope, pretty much no advantage at all, if you are just planning on using
this to store volume data.
Daniel
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 10:14 ` Daniel Pittman
@ 2006-01-05 11:21 ` Francois Barre
2006-01-05 11:31 ` Gordon Henderson
` (2 more replies)
2006-01-05 11:26 ` berk walker
1 sibling, 3 replies; 31+ messages in thread
From: Francois Barre @ 2006-01-05 11:21 UTC (permalink / raw)
To: linux-raid
2006/1/5, Daniel Pittman <daniel@rimspace.net>:
> Francois Barre <francois.barre@gmail.com> writes:
>
> G'day Francois.
>
> > Well, I think everything is in the subject... I am looking at this
> > solution for a 6*250GB raid5 data server, evolving in a 12*250 rai5 in
> > the months to come... Performance is absolutely not a big issue for
> > me, but I would not appreciate any data loss.
>
> If your key interest is data integrity, and you don't care a fig about
> performance, you would be much better off using ext3 on that filesystem.
>
> Depending on the test, ext3 may not do better than other filesystems,
> but it is really quite hard to go past the long history of reliability
> and stability that it has.
>
[...]
Well, as far as I understood it (that is, not so far :-p), reiser4
seemed to have a stronger and more efficient journal than ext3. Not
everyone believes that, but reiser4 was more or less designed that
way... But I guess that ext3 and its very-heavily-tested journal can
still be trusted more than any newcomer.
Truth is, I would have been glad to play with reiser4 on a large
amount of data, just because I was interested in the theories behind
it (including the strange database-filesystem wedding Hans tried to
organize). Maybe it's too great a risk for a production system.
Well, anyway, thanks for the advice. Guess I'll have to stay on ext3
if I don't want to have nightmares...
Best regards,
F.-E.B.
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 10:14 ` Daniel Pittman
2006-01-05 11:21 ` Francois Barre
@ 2006-01-05 11:26 ` berk walker
2006-01-05 11:35 ` Francois Barre
1 sibling, 1 reply; 31+ messages in thread
From: berk walker @ 2006-01-05 11:26 UTC (permalink / raw)
To: Daniel Pittman; +Cc: linux-raid
Daniel Pittman wrote:
>Francois Barre <francois.barre@gmail.com> writes:
>
>G'day Francois.
>
>
>
>>Well, I think everything is in the subject... I am looking at this
>>solution for a 6*250GB raid5 data server, evolving in a 12*250 rai5 in
>>the months to come... Performance is absolutely not a big issue for
>>me, but I would not appreciate any data loss.
>>
>>
>
>If your key interest is data integrity, and you don't care a fig about
>performance, you would be much better off using ext3 on that filesystem.
>
>Depending on the test, ext3 may not do better than other filesystems,
>but it is really quite hard to go past the long history of reliability
>and stability that it has.
>
>It also has extremely good tools for recovering if something /does/ go
>wrong, and is very resilient to damage on the disk. Reiserfs has,
>historically, had some issues in those areas, especially in recovery
>from corruption.
>
>
>
>>Furthermore, I would prefer not to use LVM nor any middle layer
>>between MD and the fs... Is this middle layer *very* usefull when I'm
>>sure my partitions layout will not evolve (e.g. only one enormous fs)
>>
>>
>
>Nope, pretty much no advantage at all, if you are just planning on using
>this to store volume data.
>
> Daniel
>
>
Ext3 does have a fine record. Might I also suggest accepting an added
expense of 18 1/2% and doing RAID6 for better protection against data loss?
b-
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 11:21 ` Francois Barre
@ 2006-01-05 11:31 ` Gordon Henderson
2006-01-06 6:33 ` Daniel Pittman
2006-01-06 9:47 ` Simon Valiquette
2 siblings, 0 replies; 31+ messages in thread
From: Gordon Henderson @ 2006-01-05 11:31 UTC (permalink / raw)
To: linux-raid
On Thu, 5 Jan 2006, Francois Barre wrote:
> Well, anyway, thanks for the advice. Guess I'll have to stay on ext3
> if I don't want to have nightmares...
And you can always mount it as ext2 if you think the journal is corrupt.
Have you considered Raid-6 rather than R5?
The biggest worry I have is having a 2nd disk fail during a
reconstruction. True, you lose another disk's worth of space, but disk
is cheap these days, right? :)
Especially with 12 disks in the array.
What I do now is run a read-only badblocks test on all the
drives/partitions in a set if I ever have a drive failure. That way I
will know in advance whether a disk is likely to fail during the
reconstruction. This takes time (and can severely impact performance),
but hopefully the window between checking the disks and the
reconstruction is small enough that another disk shouldn't go
faulty...
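Roughly, something like this (a sketch only; device names are examples,
and each pass over a 250GB drive takes a few hours):
  # non-destructive, read-only scan of every remaining member
  for d in /dev/sd[a-f]1 ; do
      badblocks "$d" > /tmp/badblocks.$(basename "$d") &
  done
  wait
  # any output in /tmp/badblocks.* means that drive has unreadable
  # sectors -- deal with it before letting the reconstruction loose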
Gordon
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 11:26 ` berk walker
@ 2006-01-05 11:35 ` Francois Barre
2006-01-05 11:43 ` Gordon Henderson
` (2 more replies)
0 siblings, 3 replies; 31+ messages in thread
From: Francois Barre @ 2006-01-05 11:35 UTC (permalink / raw)
To: linux-raid
2006/1/5, berk walker <berk@panix.com>:
[...]
> >
> >
> Ext3 does have a fine record. Might I also suggest an added expense of
> 18 1/2% and do RAID6 for better protection against data loss?
> b-
>
Well, I guess so. I just hope I'll be given enough money for it, since
it increases the cost per GB. Anyway, I guess it is preferable to a
raid5 + 1 spare.
By the way, not really a raid-oriented question, but how robust is
ext3 resizing, exactly? I mean: what happens if the box crashes while
resizing an ext3 filesystem?
This implies, of course, resizing capabilities for raid6, which are not
implemented yet. Are they?
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 11:35 ` Francois Barre
@ 2006-01-05 11:43 ` Gordon Henderson
2006-01-05 11:59 ` berk walker
2006-01-05 13:13 ` Bill Rugolsky Jr.
2 siblings, 0 replies; 31+ messages in thread
From: Gordon Henderson @ 2006-01-05 11:43 UTC (permalink / raw)
To: linux-raid
On Thu, 5 Jan 2006, Francois Barre wrote:
> 2006/1/5, berk walker <berk@panix.com>:
> [...]
> > >
> > >
> > Ext3 does have a fine record. Might I also suggest an added expense of
> > 18 1/2% and do RAID6 for better protection against data loss?
> > b-
> >
>
> Well, I guess so. I just hope I'll be given enough money for it, since
> it increases the cost per GB. Anyway I guess that it is preferable
> than a raid5 + 1 spare.
If you've got a spare, you might as well use it (IMO!)
> By the way, not really a raid-oriented question, but what is the exact
> robustness of ext3 resizing ? I mean : what happens if the box crashes
> while resizing an ext3 ?
> This implies, of course, resizing capabilities for raid6, which is not
> implement yet. Is it ?
No personal experience. I build servers for a purpose and when time comes
to expand it, I build another!
Gordon
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 11:35 ` Francois Barre
2006-01-05 11:43 ` Gordon Henderson
@ 2006-01-05 11:59 ` berk walker
2006-01-05 13:13 ` Bill Rugolsky Jr.
2 siblings, 0 replies; 31+ messages in thread
From: berk walker @ 2006-01-05 11:59 UTC (permalink / raw)
To: Francois Barre; +Cc: linux-raid
Francois Barre wrote:
>2006/1/5, berk walker <berk@panix.com>:
>[...]
>
>
>>>
>>>
>>Ext3 does have a fine record. Might I also suggest an added expense of
>>18 1/2% and do RAID6 for better protection against data loss?
>>b-
>>
>>
>>
>
>Well, I guess so. I just hope I'll be given enough money for it, since
>it increases the cost per GB. Anyway I guess that it is preferable
>than a raid5 + 1 spare.
>By the way, not really a raid-oriented question, but what is the exact
>robustness of ext3 resizing ? I mean : what happens if the box crashes
>while resizing an ext3 ?
>This implies, of course, resizing capabilities for raid6, which is not
>implement yet. Is it ?
>-
>
>
I can not speak to that, sorry.
b-
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 11:35 ` Francois Barre
2006-01-05 11:43 ` Gordon Henderson
2006-01-05 11:59 ` berk walker
@ 2006-01-05 13:13 ` Bill Rugolsky Jr.
2 siblings, 0 replies; 31+ messages in thread
From: Bill Rugolsky Jr. @ 2006-01-05 13:13 UTC (permalink / raw)
To: Francois Barre; +Cc: linux-raid
On Thu, Jan 05, 2006 at 12:35:12PM +0100, Francois Barre wrote:
> By the way, not really a raid-oriented question, but what is the exact
> robustness of ext3 resizing ? I mean : what happens if the box crashes
> while resizing an ext3 ?
Ext3 online resizing is undergoing some changes, so this question will
need to be applied to a particular version.
In any case, during online expansion, most of the time is spent preparing
the new block groups, which the filesystem at that point knows nothing
about. Once everything is prepared, telling the filesystem about the new
block groups involves a small number of very quick operations. I don't
know whether the ordering guarantees are 100% safe (although I have
great trust in Andreas Dilger :-)), but even if not, the window of
vulnerability is very small. And if the box crashes, e2fsck ought to
recover easily.
Regards,
Bill Rugolsky
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 9:06 ` Fwd: Linux MD raid5 and reiser4... Any experience ? Francois Barre
2006-01-05 10:14 ` Daniel Pittman
@ 2006-01-05 13:38 ` John Stoffel
2006-01-05 14:03 ` Francois Barre
1 sibling, 1 reply; 31+ messages in thread
From: John Stoffel @ 2006-01-05 13:38 UTC (permalink / raw)
To: Francois Barre; +Cc: linux-raid
Francois> Well, I think everything is in the subject... I am looking
Francois> at this solution for a 6*250GB raid5 data server, evolving
Francois> in a 12*250 rai5 in the months to come... Performance is
Francois> absolutely not a big issue for me, but I would not
Francois> appreciate any data loss.
So what are you doing for backups, and can you allow the downtime
needed to restore all your data if there is a problem? Remember, it's
not the cost of doing backups which drives things, it's the cost of
the time to *restore* the data which drives issues.
In this case, you're looking at just under 1 terabyte to start, and 2
terabytes of data later on once you expand. Now think how many DLT IV
tapes at 50GB each (nominal compression assumed) it will take to hold
that data, and how long it will take at 5MB/sec to restore...
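Rough numbers for the 2TB case (back-of-the-envelope only):
  2,000 GB / 50 GB per tape  = 40 tapes
  2,000,000 MB / 5 MB/sec    = 400,000 seconds, i.e. roughly 4.5 days
just to stream the data back, before you even count tape changes.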
As others have suggested, going with RAID6 will cover the double disk
failure case. This is one reason I really like NetApps with the new
double-parity setup: with the increase in disk sizes and RAID
reconstruct times approaching 24 hours, double disk failures are a
serious issue to think about.
Francois> Furthermore, I would prefer not to use LVM nor any middle
Francois> layer between MD and the fs... Is this middle layer *very*
Francois> usefull when I'm sure my partitions layout will not evolve
Francois> (e.g. only one enormous fs) ? Thanks for your
Francois> answerings/advices,
Why do you not want to use LVM? It gives you a lot of flexibility to
change your mind down the road. Also, it means that you could just
build a pair of RAID5/6 arrays and stripe across them. Yes, you lose
some disk space since you now have multiple arrays, each with their
own parity disks, but it also means that you keep room to grow later
by adding more arrays, as sketched below.
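Something along these lines, for instance (only a sketch; the volume
group name, sizes and device names here are made up):
  mdadm --create /dev/md1 --level=6 --raid-devices=6 /dev/sd[a-f]1
  mdadm --create /dev/md2 --level=6 --raid-devices=6 /dev/sd[g-l]1
  pvcreate /dev/md1 /dev/md2        # each array becomes an LVM physical volume
  vgcreate datavg /dev/md1 /dev/md2
  lvcreate -i 2 -I 64 -L 1500G -n data datavg   # stripe the LV across both PVs
  mkfs.ext3 /dev/datavg/data
Later you can vgextend onto a third array and lvextend the volume
without touching the existing data.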
In terms of filesystems, I still like ext3 for its reliability, but I
would like a filesystem which can be resized on the fly if at all
possible. I've been slowly leaning towards xfs, but maybe that's
just me not liking Hans Reiser's attitude on the lkml at points. And
I certainly don't trust reiser4 at all yet; it's way too early for
production data.
From the sound of it, you just want a large place to dump stuff, in
which case you might be happy with a less reliable system.
Oh yeah, don't forget to mirror the root disk. And if you're looking
to make a file server, you might want to look at that OpenNAS stuff
and boot it off a compact flash card/USB dongle as well. Keep the
number of moving parts as small as possible.
John
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 13:38 ` John Stoffel
@ 2006-01-05 14:03 ` Francois Barre
2006-01-05 18:55 ` John Stoffel
0 siblings, 1 reply; 31+ messages in thread
From: Francois Barre @ 2006-01-05 14:03 UTC (permalink / raw)
To: linux-raid
2006/1/5, John Stoffel <john@stoffel.org>:
>
> So what are you doing for backups, and can you allow the downtime
> needed to restore all your data if there is a problem? Remember, it's
> not the cost of doing backups which drives things, it's the cost of
> the time to *restore* the data which drives issues.
>
Well, backups mean snapshots. Snapshots mean having a non-changing set
of data for the duration of the backup. Not sure about that.
Furthermore, backup means increasing the TCO per GB. I must keep it
close to .50 euro per GB. That's my main issue at the moment...
[....]
> Why do you not want to use LVM? It gives you alot of flexibility to
> change your mind down the road. Also, it means that you could just
> build a pair of RAID5/6 arrays and stripe across them. Yes, you lose
> some disk space since you now have multiple arrays, each with their
> own parity disks, but it also means that
>
I don't trust it. I know I'm wrong, but I don't trust it. I had a very
bad experience with a stupidly configured system (not configured by
me, of course :-p) a couple of months ago, with LVM on top of a
linear RAID. Guess what? Most of the mission-critical data was lost. I
know LVM was not responsible for this, but, you know, trust is
sometimes not only a matter of figures and scientific facts.
> In terms of filesystems, I still like ext3 for it's reliability, but I
> would like a filesystem which can be resized on the fly if at all
> possible. I've been slowly leaning towards xfs, but maybe that's
> just me not liking Hans Reiser's attitude on the lkml at points. And
> I certainly don't trust reiser4 at all yet, it's way too early for
> production data.
What did Hans say on LKML? I thought he was considered the
gentle-and-wise guru for filesystems, just as Linus is for the
kernel...
> Oh yeah, don't forget to mirror the root disk. And if you're looking
> to make a file server, you might want to look at that OpenNAS stuff
> and boot it off a compact flash card/USB dongle as well. Keep as few
> a number of moving parts as possible.
>
Speaking of this, I began to think about splitting all the disks into
two partitions: one of 1GB, the rest for data, and building two mds:
md0, 12*1GB of raid1 (mirrored) for /
md1, 12*229GB of raid6 for data (rough mdadm commands sketched below).
Maybe this is a little bit paranoid for /, but:
1. I can afford losing 1GB of space on each disk
2. All disks have the same partition structure
3. I can boot from each disk, regardless of how many valid disks remain.
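In mdadm terms, roughly this (a sketch; device and partition names are
only examples):
  # small first partition on every disk, big second partition for data
  mdadm --create /dev/md0 --level=1 --raid-devices=12 /dev/sd[a-l]1   # 12-way mirror for /
  mdadm --create /dev/md1 --level=6 --raid-devices=12 /dev/sd[a-l]2   # raid6 for the data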
BTW : is there any kind of limitation on the number of devices in a raid1 ?
Of course, updating data on / will come at a high cost: each write
happens 12 times... But it's a fileserver, so the config will not
change very often (maybe an issue for logs...).
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
@ 2006-01-05 17:32 Andrew Burgess
2006-01-05 17:50 ` Francois Barre
0 siblings, 1 reply; 31+ messages in thread
From: Andrew Burgess @ 2006-01-05 17:32 UTC (permalink / raw)
To: linux-raid
>Furthermore, I would prefer not to use LVM nor any middle layer
>between MD and the fs... Is this middle layer *very* usefull when I'm
>sure my partitions layout will not evolve (e.g. only one enormous fs)
>?
Another feature of LVM is moving physical devices (PV). This makes it
easier to grow your enormous fs because it allows you to remove disks.
An example:
Today 1 500GB RAID6 PV #1
A year from now, add another 1TB PV #2, no LVM advantage yet.
A year later, you want to add another 1TB PV #3 but then you want to
remove the first 500GB. With LVM you can copy PV #1 to PV#3 without
affecting the filesystem and then remove PV #1.
I haven't ever done this, just read about it. Also, maybe when md
allows growing raid5/6 this will be handled by md.
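From what I've read, the sequence would look roughly like this
(untested; the volume group and device names are examples):
  pvcreate /dev/md3            # the new 1TB array, PV #3
  vgextend datavg /dev/md3
  pvmove /dev/md1              # migrate all extents off the old 500GB PV #1
  vgreduce datavg /dev/md1     # drop PV #1 from the volume group
  pvremove /dev/md1            # wipe the LVM label; the old disks can be pulled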
HTH
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 17:32 Andrew Burgess
@ 2006-01-05 17:50 ` Francois Barre
0 siblings, 0 replies; 31+ messages in thread
From: Francois Barre @ 2006-01-05 17:50 UTC (permalink / raw)
To: linux-raid
> Another feature of LVM is moving physical devices (PV). This makes it
> easier to grow your enormous fs because it allows you to remove disks.
Thanks for the tip, didn't think of it this way...
[...]
> I haven't ever done this, just read about it. Also, maybe when md
> allows growing raid5/6 this will be handled by md.
>
Well, that's what I have been counting on since the beginning of my
interest in md...
Indeed, LVM would be a good alternative to md's growth capability
anyway... although I would prefer a pure md extension to the raid...
Well, anyway, removing drives is (for me) a little off-topic, because
the system will be entropic: loads of data will come in, none will get
out... Removing a drive would only mean the drive is dead.
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 14:03 ` Francois Barre
@ 2006-01-05 18:55 ` John Stoffel
2006-01-06 9:08 ` Francois Barre
0 siblings, 1 reply; 31+ messages in thread
From: John Stoffel @ 2006-01-05 18:55 UTC (permalink / raw)
To: Francois Barre; +Cc: linux-raid
>>>>> "Francois" == Francois Barre <francois.barre@gmail.com> writes:
Francois> 2006/1/5, John Stoffel <john@stoffel.org>:
>>
>> So what are you doing for backups, and can you allow the downtime
>> needed to restore all your data if there is a problem? Remember, it's
>> not the cost of doing backups which drives things, it's the cost of
>> the time to *restore* the data which drives issues.
>>
Francois> Well, backups mean snapshots. Snapshots mean having a
Francois> non-changing set of data for the time the backup goes.
No, backups do not mean snapshots. Backups mean a copy of the data is
kept on a reliable medium somewhere else, so that when a <Insert
disaster here> destroys your disk array, you can reload the data from
your backup media.
Snapshots are merely a way to take backups of a consistent view of the
filesystem, so that between the initial scan of all directories and
files in a filesystem and the time when you actually start to dump
them to disk, you don't have users changing stuff behind your back.
Don't mix them up, or you'll be really, really unhappy.
Francois> Not sure about that. Furthermore, backup means increasing
Francois> the TCO per GB. I must keep it close to .50 euro per
Francois> GB. That's my main issue at the moment...
What is your cost if the disk array goes up in flames and you need to
access your data then? Or can you just reload it from some other
source and not care about it?
Francois> I don't trust it. I'm wrong I know, but I don't trust
Francois> it. Had a very bad experience with a stupidly configured
Francois> system (not configured by me, of course :-p) a couple of
Francois> months ago, with a LVM on top of a linear Raid. Guess what ?
Francois> Most of mission-critical data was lost. I know LVM was not
Francois> responsible for this, but, you know, trust is sometimes not
Francois> only a matter of figures and scientific facts.
So did you have backups? It sounds like you didn't, which just
reinforces the point I've been pushing: "How will you back up your data?"
Also, setting up a Linear RAID0 of disks is just asking for trouble,
which you obviously know since you're talking about using RAID5 in
this new array. *grin* If this mis-configured system had been setup
with an MD array that was just linear, without LVM, you would have
still lost data.
From the sound of it, the old system was configured with MD on top of
LVM, which is the inverse of how I do it. I put MD down low, and
layer LVM on top of the reliable (redundant) MD device. Then I can
span my LVM across multiple MD devices.
You're blaming the wrong part of the system for your data loss. First
off blame the user/sysadmin who set it up, then blame the person who
put Linear RAID on your system and into MD/LVM. :]
As another poster said, using LVM on top of MD allows you to move your
filesystems from one set of physical media to another without having
any down time for your users. Or even more importantly, it allows you
to grow/move your storage without having to dump your data to another
filesystem or tape.
Francois> What did Hans say on LKML ? I thought he was considered as
Francois> the gentle-and-wise-guru for filesystems, just as Linus is
Francois> for the kernel...
Umm... not exactly. He's more a pain in the butt to deal with at
times, with an abrasive personality which only seems to care about his
own projects. He doesn't like working with other people to make his FS
work within the Linux kernel's designs and philosophy.
All my opinion, of course. Plus, I've heard a bunch of horror stories
about Reiserfs3 problems, though I admit not recently, say in the past
six months to a year. But reiserfs4 I wouldn't deploy production data
on yet...
>> Oh yeah, don't forget to mirror the root disk. And if you're looking
>> to make a file server, you might want to look at that OpenNAS stuff
>> and boot it off a compact flash card/USB dongle as well. Keep as few
>> a number of moving parts as possible.
>>
Francois> Speaking of this, I began to think about splitting all the disks in
Francois> two partitions : 1 of 1Go, the rest for data, and build two mds :
Francois> md0, 12*1GB of raid1 (mirrored) for /
Francois> md1, 12*229GB of raid6 for data.
Francois> Maybe this is a little bit paranoïd for / but :
Francois> 1. I can afford loosing 1GB of space on each DD
It's going a bit too far, I think. Just dedicate a pair of small disks
to your OS. Or again, get the OpenNAS software, boot off CDROM/USB,
and don't worry about it.
Francois> 2. All disks have the same partition structure
It's an advantage.
Francois> 3. I can boot on each DD, regardless to the number of valid
Francois> DDs it has.
At that point, it depends on what your BIOS supports for bootable
disks.
Francois> BTW : is there any kind of limitation on the number of
Francois> devices in a raid1 ?
Don't know. I suspect that there might be.
Francois> Of course, updating data on / will be at a high cost : 12
Francois> times for each write... But it's a fileserver, so config
Francois> will not change so often (maybe an issue for logs...).
Send the logs via syslog to another server. End of problem. :]
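With classic syslogd that is a one-line change (the loghost name is an
example, and the receiving syslogd has to be started with -r to accept
remote messages):
  # /etc/syslog.conf on the fileserver
  *.*    @loghost.example.com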
John
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 11:21 ` Francois Barre
2006-01-05 11:31 ` Gordon Henderson
@ 2006-01-06 6:33 ` Daniel Pittman
2006-01-06 9:47 ` Simon Valiquette
2 siblings, 0 replies; 31+ messages in thread
From: Daniel Pittman @ 2006-01-06 6:33 UTC (permalink / raw)
To: linux-raid
Francois Barre <francois.barre@gmail.com> writes:
> 2006/1/5, Daniel Pittman <daniel@rimspace.net>:
>> Francois Barre <francois.barre@gmail.com> writes:
>>
>> G'day Francois.
>>
>> > Well, I think everything is in the subject... I am looking at this
>> > solution for a 6*250GB raid5 data server, evolving in a 12*250 rai5 in
>> > the months to come... Performance is absolutely not a big issue for
>> > me, but I would not appreciate any data loss.
>>
>> If your key interest is data integrity, and you don't care a fig about
>> performance, you would be much better off using ext3 on that filesystem.
>>
>> Depending on the test, ext3 may not do better than other filesystems,
>> but it is really quite hard to go past the long history of reliability
>> and stability that it has.
>>
> [...]
>
> Well, as far as I understood it (that is, not so far :-p), reiser4
> seemed to have a stronger and more efficient journal than ext3.
I can't comment on the design of reiserfs4, but it isn't hard to believe
that the journaling could be more efficient than ext3, which journals
complete metadata blocks rather than individual operations.
That method is quite safe, but may not be as fast as other journaling
methods. Of course, that depends on your workload: in some cases it is
distinctly faster. :)
> That is not what everyone believes, but reiser4 was to be designed
> that way more or less... But I guess that ext3 and its
> very-heavily-tested journal can still be more trusted than any
> newcomer.
That is very true: all code has bugs, no matter how good the people
writing it are. With ext3, other people have paid the cost of testing
to find and resolve many of those bugs, so your data is safe.
With reiserfs4 you get to be one of the brave early adopters, today,
which means that it may be your data that gets eaten while those bugs
are found.
> Truth is, I would have been glad to play with reiser4 on a large
> amount of data, just because I was interrested on the theories behind
> it (including the database-filesystem strange wedding Hans tried to
> organize). Maybe it's too great a risk for a production system.
>
> Well, anyway, thanks for the advice. Guess I'll have to stay on ext3
> if I don't want to have nightmares...
That would certainly be my advice. It may not have the performance or
features that reiserfs4 promises[1], and that XFS delivers, but it is
surely a lot safer for you. :)
Daniel
Footnotes:
[1] I have not tested this, and last time I saw results they were still
somewhat mixed.
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 18:55 ` John Stoffel
@ 2006-01-06 9:08 ` Francois Barre
2006-01-06 10:49 ` Andre Majorel
0 siblings, 1 reply; 31+ messages in thread
From: Francois Barre @ 2006-01-06 9:08 UTC (permalink / raw)
To: linux-raid
> No, backups do not mean snapshots.
> Snapshots are merely a way to take backups of a consistent view of the
filesystem...
[...]
> Don't mix them up, you'll be really really unhappy.
Well, I may not be fully awake yet, but what is the interest of
having non-consistent backups? Surely, backing up implies one of:
1. Forbidding write access to the data for users
2. Having the capability to take snapshots.
> You're blaming the wrong part of the system for your data loss. First
> off blame the user/sysadmin who set it up, then blame the person who
> put Linear RAID on your system and into MD/LVM. :]
Hey, I already said I was wrong :-p.
The point is, as I still wish to keep it small & simple (kiss :-), LVM
has, no matter how deeply I study it, an impact on complexity.
> Umm... not exactly. He's more a pain in the butt to deal with at
> times, with an abrasive personality which only seems to care about his
> projects. He doesn't like working with other people to make it FS
> work within the Linux kernel designs and philosophy.
I didn't know it... Pity, he sometimes has great ideas...
> All my opinion of course. Plus, I've been a bunch of horror stories
> about Resierfs3 problems, though I admit not recently, say the past
> six months to a year. But resierfs4 I wouldn't deploy production data
> on yet...
I agree with that now. You all convinced me.
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-05 11:21 ` Francois Barre
2006-01-05 11:31 ` Gordon Henderson
2006-01-06 6:33 ` Daniel Pittman
@ 2006-01-06 9:47 ` Simon Valiquette
2006-01-06 10:50 ` Francois Barre
` (3 more replies)
2 siblings, 4 replies; 31+ messages in thread
From: Simon Valiquette @ 2006-01-06 9:47 UTC (permalink / raw)
To: linux-raid
Francois Barre wrote:
> 2006/1/5, Daniel Pittman <daniel@rimspace.net>:
>
>>Francois Barre <francois.barre@gmail.com> writes:
>>
>>>Well, I think everything is in the subject... I am looking at this
>>>solution for a 6*250GB raid5 data server, evolving in a 12*250 rai5 in
>>>the months to come... Performance is absolutely not a big issue for
>>>me, but I would not appreciate any data loss.
>
> Well, as far as I understood it (that is, not so far :-p), reiser4
> seemed to have a stronger and more efficient journal than ext3. That
> is not what everyone believes, but reiser4 was to be designed that way
> more or less... But I guess that ext3 and its very-heavily-tested
> journal can still be more trusted than any newcomer.
>
> Truth is, I would have been glad to play with reiser4 on a large
> amount of data, just because I was interrested on the theories behind
> it (including the database-filesystem strange wedding Hans tried to
> organize). Maybe it's too great a risk for a production system.
>
> Well, anyway, thanks for the advice. Guess I'll have to stay on ext3
> if I don't want to have nightmares...
AFAIK, an ext3 volume cannot be bigger than 4TB on a 32-bit system.
I think it is important you know that, in case it could be a concern
for you.
Personally, I also don't like using ext2/3 on an array bigger than
2TB. With 12x250GB, you are already well over 2TB, and maybe some day
you will realize that you need more than 4TB. If you have a 64-bit
computer, just forget what I said (16TB is the limit).
On production servers with large RAID arrays, I tend to like XFS very
much and trust it more than ReiserFS (I had some bad experiences
with ReiserFS in the past). You can also grow an XFS filesystem live,
which is really nice.
Finally, XFS is much faster than ext3 and self-defragmenting, but
that was not a concern for you anyway.
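For what it's worth, growing it looks like this (a sketch; the mount
point is an example):
  # after enlarging the underlying md device or logical volume:
  xfs_growfs /data        # grows the mounted filesystem to fill the device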
Simon Valiquette
http://gulus.USherbrooke.ca
http://www.gulus.org
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-06 9:08 ` Francois Barre
@ 2006-01-06 10:49 ` Andre Majorel
2006-01-09 8:00 ` Molle Bestefich
0 siblings, 1 reply; 31+ messages in thread
From: Andre Majorel @ 2006-01-06 10:49 UTC (permalink / raw)
To: linux-raid
On 2006-01-06 10:08 +0100, Francois Barre wrote:
> > No, backups do not mean snapshots.
> > Snapshots are merely a way to take backups of a consistent view of the
> filesystem...
> [...]
> > Don't mix them up, you'll be really really unhappy.
> Wll, I may not be fully awoken yet, but what is the interrest of
> having non-consistent backups ? For sure, backuping implies one of :
> 1. Forbidding write access to data for users
> 2. Having the capability to have snapshots.
With LVM, you can create a snapshot of the block device at any
time, mount the snapshot read-only (looks like another block
device) and backup that. Ensuring consistency at application level
is still up to you but at least, if that involves stopping
services, the unavailability window is greatly reduced.
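For example (a sketch; the volume group, names and sizes are made up --
the snapshot only needs enough room for the blocks that change while
the backup runs):
  lvcreate --snapshot --size 2G --name nightly /dev/datavg/data
  mount -o ro /dev/datavg/nightly /mnt/snap
  tar czf /backup/data-$(date +%Y%m%d).tar.gz -C /mnt/snap .
  umount /mnt/snap
  lvremove -f /dev/datavg/nightly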
--
André Majorel <URL:http://www.teaser.fr/~amajorel/>
Do not use this account for regular correspondence.
See the URL above for contact information.
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-06 9:47 ` Simon Valiquette
@ 2006-01-06 10:50 ` Francois Barre
2006-01-06 19:28 ` Forrest Taylor
2006-01-06 11:03 ` Kanotix crashed my raid PFC
` (2 subsequent siblings)
3 siblings, 1 reply; 31+ messages in thread
From: Francois Barre @ 2006-01-06 10:50 UTC (permalink / raw)
To: linux-raid
> AFAIK, ext3 volume cannot be bigger than 4TB on a 32 bits system.
> I think it is important you know that in case it could be a concern
> for you.
What ? What ? .... What ?
<dazed_and_confused>Are you sure? I may search for it more
extensively... Anyway, the total size cannot be more than (number of
addressable blocks) * (block size), and with 32-bit block numbers and
4k blocks, that makes 16TB. Are you sure?</dazed_and_confused>
> XFS is finally much faster than EXT3 and self-defragmenting, but
> that was not a concern for you anyway.
Well, I'm currently looking at this solution... I have never used XFS
before, but I have always heard it spoken of as a good thing...
* Kanotix crashed my raid...
2006-01-06 9:47 ` Simon Valiquette
2006-01-06 10:50 ` Francois Barre
@ 2006-01-06 11:03 ` PFC
2006-01-06 12:02 ` PFC
2006-01-06 19:05 ` Fwd: Linux MD raid5 and reiser4... Any experience ? Mike Hardy
2006-01-08 2:53 ` Daniel Pittman
3 siblings, 1 reply; 31+ messages in thread
From: PFC @ 2006-01-06 11:03 UTC (permalink / raw)
To: linux-raid
Hello !
This is my first post here, so hello to everyone !
So, I have a 1 Terabyte 5-disk RAID5 array (md) that is now dead. I'll
try to explain.
It's a bit long because I tried to be complete...
----------------------------------------------------------------
Hardware :
- Athlon 64, nforce mobo with 4 IDE and 4 SATA
- 2 IDE HDD making up a RAID1 array
- 4 SATA HDD + 1 IDE HDD making up a RAID5 array.
Software :
- gentoo compiled in 64 bits ; kernel is 2.6.14-archck5
- mdadm - v2.1 - 12 September 2005
RAID1 config :
/dev/hda (80 Gb) and /dev/hdc (120 Gb) contain :
- mirrored /boot partitions,
- a 75 GB RAID1 (/dev/md0) mounted on /
- a 5 GB RAID1 (/dev/md1) for storing mysql and postgres databases
separately
- and hdc, which is larger, has a non-RAID scratch partition for all the
unimportant stuff.
RAID5 config :
/dev/hdb, /dev/sd{a,b,c,d} are 5 x 250 GB hard disks ; some maxtor, some
seagate, 1 IDE and 4 SATA.
They are assembled in a RAID5 array, /dev/md2
----------------------------------------------------------------
What happened ?
So, I'm very happy with the software RAID 1 on my / partition,
especially since one of the two disks of the mirror died yesterday. The
drive which died was a 100 GB. I had a spare drive lying around, but it
was only 80 GB. So I had to resize a few partitions including / and remake
the raid array. No problem with a Kanotix boot CD; I thought:
- copy contents of /dev/md0 (/) to the big RAID5
- destroy /dev/md0
- rebuild it in a smaller size to accommodate the new disk
- copy the data back from the RAID5
Kanotix (version 2005.3) had detected the RAID1 partitions and had no
problems with them.
However the RAID5 was not detected. "cat /proc/mdstat" showed no trace of
it.
So I typed in Kanotix :
mdadm --assemble /dev/md2 /dev/hdb1 /dev/sd{a,b,c,d}1
Then it hung. The PC did not crash, but the mdadm process was hung.
And I couldn't cat /proc/mdstat anymore (it would hang also).
After waiting for a long time and seeing that nothing happened, I did a
hard reset.
So I resized my / partition with the usual trick (create a mirror with 1
real drive and 1 failed 'virtual drive', copy data, add old drive).
And I rebooted and all was well. Except /dev/md2 showed no signs of life.
This thing had been working flawlessly up until I typed
the dreaded "mdadm --assemble" in Kanotix. However now it's dead.
Yeah, I have backups, sort of. This is my CD collection, all ripped and
converted to lossless FLAC. And now my original CDs (about 900) are nicely
packed in cardboard boxes in the basement. The thought of having to re-rip
900 CDs is what motivated me to use RAID, by the way.
Anyway :
-------------------------------------------------
apollo13 ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
md1 : active raid1 hdc7[1] hda7[0]
6248832 blocks [2/2] [UU]
md2 : inactive sda1[0] hdb1[4] sdc1[3] sdb1[1]
978615040 blocks
md0 : active raid1 hdc6[0] hda6[1]
72292992 blocks [2/2] [UU]
unused devices: <none>
-------------------------------------------------
/dev/md2 is the problem. it's inactive so :
apollo13 ~ # mdadm --run /dev/md2
mdadm: failed to run array /dev/md2: Input/output error
ouch !
-------------------------------------------------
Here is dmesg output (/var/log/messages says the same) :
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdd1 ...
md: adding sdd1 ...
md: adding sdc1 ...
md: adding sdb1 ...
md: adding sda1 ...
md: hdc7 has different UUID to sdd1
md: hdc6 has different UUID to sdd1
md: adding hdb1 ...
md: hda7 has different UUID to sdd1
md: hda6 has different UUID to sdd1
md: created md2
md: bind<hdb1>
md: bind<sda1>
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: running: <sdd1><sdc1><sdb1><sda1><hdb1>
md: kicking non-fresh sdd1 from array!
md: unbind<sdd1>
md: export_rdev(sdd1)
md: md2: raid array is not clean -- starting background reconstruction
raid5: device sdc1 operational as raid disk 3
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 0
raid5: device hdb1 operational as raid disk 4
raid5: cannot start dirty degraded array for md2
RAID5 conf printout:
--- rd:5 wd:4 fd:1
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 3, o:1, dev:sdc1
disk 4, o:1, dev:hdb1
raid5: failed to run raid set md2
md: pers->run() failed ...
md: do_md_run() returned -5
md: md2 stopped.
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sda1>
md: export_rdev(sda1)
md: unbind<hdb1>
md: export_rdev(hdb1)
-------------------------------------------------
So, it seems sdd1 isn't fresh enough, so it gets kicked; and 4 drives
remain, which should be enough to run the array, but somehow it isn't.
Let's --examine the superblocks :
apollo13 ~ # mdadm --examine /dev/hdb1 /dev/sd?1
/dev/hdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55ef57eb:c153dce4:c6f9ac90:e0da3c14
Creation Time : Sun Dec 25 17:58:00 2005
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 2
Update Time : Fri Jan 6 06:57:15 2006
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : fe3f58c8 - correct
Events : 0.61952
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 3 65 4 active sync /dev/hdb1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 3 65 4 active sync /dev/hdb1
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55ef57eb:c153dce4:c6f9ac90:e0da3c14
Creation Time : Sun Dec 25 17:58:00 2005
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 2
Update Time : Fri Jan 6 06:57:15 2006
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : fe3f5885 - correct
Events : 0.61952
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 1 0 active sync /dev/sda1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 3 65 4 active sync /dev/hdb1
/dev/sdb1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55ef57eb:c153dce4:c6f9ac90:e0da3c14
Creation Time : Sun Dec 25 17:58:00 2005
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 2
Update Time : Fri Jan 6 06:57:15 2006
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : fe3f5897 - correct
Events : 0.61952
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 1 8 17 1 active sync /dev/sdb1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 3 65 4 active sync /dev/hdb1
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55ef57eb:c153dce4:c6f9ac90:e0da3c14
Creation Time : Sun Dec 25 17:58:00 2005
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 2
Update Time : Fri Jan 6 06:57:15 2006
State : active
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : fe3f58ab - correct
Events : 0.61952
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 33 3 active sync /dev/sdc1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 3 65 4 active sync /dev/hdb1
/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 55ef57eb:c153dce4:c6f9ac90:e0da3c14
Creation Time : Sun Dec 25 17:58:00 2005
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 2
Update Time : Thu Jan 5 17:51:25 2006
State : clean
Active Devices : 5
Working Devices : 5
Failed Devices : 0
Spare Devices : 0
Checksum : fe3f9286 - correct
Events : 0.61949
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 2 8 49 2 active sync /dev/sdd1
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 33 3 active sync /dev/sdc1
4 4 3 65 4 active sync /dev/hdb1
-------------------------------------------------
sdd1 does not have the same "Events" count as the others -- does this explain
why it's not fresh ?
So, doing mdadm --assemble in Kanotix did "something" which caused this.
-------------------------------------------------
kernel source code : raid5.c line 1759 :
        if (mddev->degraded == 1 &&
            mddev->recovery_cp != MaxSector) {
                printk(KERN_ERR
                        "raid5: cannot start dirty degraded array for %s (%lx %lx)\n",
                        mdname(mddev), mddev->recovery_cp, MaxSector);
                goto abort;
        }
I added some %lx in the printk so it prints :
"raid5: cannot start dirty degraded array for md2 (0 ffffffffffffffff)"
So, mddev->recovery_cp is 0 and MaxSector is -1 as an unsigned 64-bit int. I
have absolutely no idea what this means!
-------------------------------------------------
So, what can I do to get my data back ? I don't care if it's dirty and a
few files are corrupt; I can re-rip 1 or 2 CDs, no problem, but not ALL
of them.
Shall I remove the "goto abort;" and fasten my seat belt ?
What can I do ?
Thanks for your help !!
* Re: Kanotix crashed my raid...
2006-01-06 11:03 ` Kanotix crashed my raid PFC
@ 2006-01-06 12:02 ` PFC
2006-01-06 12:08 ` PFC
0 siblings, 1 reply; 31+ messages in thread
From: PFC @ 2006-01-06 12:02 UTC (permalink / raw)
To: linux-raid
I forgot :
I tried these :
mdadm --assemble /dev/md2 /dev/hdb1 /dev/sd{a,b,c,d}1
with --run, --force, and both, --stop'ping the array before each try, and
every time I get the same error, and the same line in dmesg: "cannot start
dirty degraded array for md2"
apollo13 ~ # mdadm --stop /dev/md2
apollo13 ~ # mdadm --assemble --verbose /dev/md2 /dev/hdb1
/dev/sd{a,b,c,d}1
mdadm: looking for devices for /dev/md2
mdadm: /dev/hdb1 is identified as a member of /dev/md2, slot 4.
mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md2, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md2, slot 2.
mdadm: added /dev/sdb1 to /dev/md2 as 1
mdadm: added /dev/sdd1 to /dev/md2 as 2
mdadm: added /dev/sdc1 to /dev/md2 as 3
mdadm: added /dev/hdb1 to /dev/md2 as 4
mdadm: added /dev/sda1 to /dev/md2 as 0
mdadm: /dev/md2 assembled from 4 drives - need all 5 to start it (use
--run to insist).
apollo13 ~ # mdadm --run /dev/md2
mdadm: failed to run array /dev/md2: Input/output error
apollo13 ~ # mdadm --stop /dev/md2
apollo13 ~ # mdadm --assemble --verbose --run /dev/md2 /dev/hdb1
/dev/sd{a,b,c,d}1
mdadm: looking for devices for /dev/md2
mdadm: /dev/hdb1 is identified as a member of /dev/md2, slot 4.
mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md2, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md2, slot 2.
mdadm: added /dev/sdb1 to /dev/md2 as 1
mdadm: added /dev/sdd1 to /dev/md2 as 2
mdadm: added /dev/sdc1 to /dev/md2 as 3
mdadm: added /dev/hdb1 to /dev/md2 as 4
mdadm: added /dev/sda1 to /dev/md2 as 0
mdadm: failed to RUN_ARRAY /dev/md2: Input/output error
apollo13 ~ # mdadm --stop /dev/md2
apollo13 ~ # mdadm --assemble --verbose --run --force /dev/md2 /dev/hdb1
/dev/sd{a,b,c,d}1
mdadm: looking for devices for /dev/md2
mdadm: /dev/hdb1 is identified as a member of /dev/md2, slot 4.
mdadm: /dev/sda1 is identified as a member of /dev/md2, slot 0.
mdadm: /dev/sdb1 is identified as a member of /dev/md2, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md2, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md2, slot 2.
mdadm: added /dev/sdb1 to /dev/md2 as 1
mdadm: added /dev/sdd1 to /dev/md2 as 2
mdadm: added /dev/sdc1 to /dev/md2 as 3
mdadm: added /dev/hdb1 to /dev/md2 as 4
mdadm: added /dev/sda1 to /dev/md2 as 0
mdadm: failed to RUN_ARRAY /dev/md2: Input/output error
* Re: Kanotix crashed my raid...
2006-01-06 12:02 ` PFC
@ 2006-01-06 12:08 ` PFC
2006-01-06 22:01 ` PFC
0 siblings, 1 reply; 31+ messages in thread
From: PFC @ 2006-01-06 12:08 UTC (permalink / raw)
To: linux-raid
And mdstat says it's inactive, while mdadm says it's "active,
degraded"... what's happening ????
apollo13 ~ # cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
md1 : active raid1 hdc7[1] hda7[0]
6248832 blocks [2/2] [UU]
md2 : inactive sda1[0] hdb1[4] sdc1[3] sdb1[1]
978615040 blocks
md0 : active raid1 hdc6[0] hda6[1]
72292992 blocks [2/2] [UU]
unused devices: <none>
apollo13 ~ # mdadm --detail /dev/md2
/dev/md2:
Version : 00.90.02
Creation Time : Sun Dec 25 17:58:00 2005
Raid Level : raid5
Device Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 5
Total Devices : 4
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Fri Jan 6 06:57:15 2006
State : active, degraded
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : 55ef57eb:c153dce4:c6f9ac90:e0da3c14
Events : 0.61952
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
1829109784 0 0 0 removed
3 8 33 3 active sync /dev/sdc1
4 3 65 4 active sync /dev/hdb1
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-06 9:47 ` Simon Valiquette
2006-01-06 10:50 ` Francois Barre
2006-01-06 11:03 ` Kanotix crashed my raid PFC
@ 2006-01-06 19:05 ` Mike Hardy
2006-01-08 2:53 ` Daniel Pittman
3 siblings, 0 replies; 31+ messages in thread
From: Mike Hardy @ 2006-01-06 19:05 UTC (permalink / raw)
To: linux-raid
Slightly off-topic, but:
Simon Valiquette wrote:
> Francois Barre wrote:
> On production server with large RAID array, I tends to like very
> much XFS and trust it more than ReiserFS (I had some bad experience
> with ReiserFS in the past). You can also grow a XFS filesystem live,
> which is really nice.
I didn't know this until recently, but ext2/3 can be grown online as
well (using 'ext2online'), given that you create it originally with
enough block group descriptor table room to support the size you're
growing to.
From the man page for mke2fs:
       -E extended-options
              Set extended options for the filesystem. Extended options are
              comma separated, and may take an argument using the equals
              ('=') sign. The -E option used to be -R in earlier versions of
              mke2fs. The -R option is still accepted for backwards
              compatibility. The following extended options are supported:
                   stride=stripe-size
                          Configure the filesystem for a RAID array with
                          stripe-size filesystem blocks per stripe.
                   resize=max-online-resize
                          Reserve enough space so that the block group
                          descriptor table can grow to support a filesystem
                          that has max-online-resize blocks.
I have done it, and it works.
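For example, roughly (a sketch; the numbers are illustrative -- stride
should match the RAID chunk size in filesystem blocks, and resize= is
the block count you want to be able to grow to):
  # at creation time:
  mke2fs -j -E stride=16,resize=1000000000 /dev/md1   # 64K chunk / 4K block = stride 16
  # later, after growing the underlying md device or LV:
  ext2online /dev/md1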
-Mike
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-06 10:50 ` Francois Barre
@ 2006-01-06 19:28 ` Forrest Taylor
0 siblings, 0 replies; 31+ messages in thread
From: Forrest Taylor @ 2006-01-06 19:28 UTC (permalink / raw)
To: Francois Barre; +Cc: linux-raid
On Fri, 2006-01-06 at 11:50 +0100, Francois Barre wrote:
> > AFAIK, ext3 volume cannot be bigger than 4TB on a 32 bits system.
> > I think it is important you know that in case it could be a concern
> > for you.
>
> What ? What ? .... What ?
> <dased_and_confused>Are you sure ? I may search for it more
> extensively... Anyway, the total size cannot be more than sizeof(block
> size) * block size, and with 32bits systems and 4k pages, that makes
> 16TB. Are you sure ?</dased_and_confused>
I agree with you, Francois, 16TB should be the limit with 4k pages. The
maximum file size should be 4TB, though.
Forrest
* Re: Kanotix crashed my raid...
2006-01-06 12:08 ` PFC
@ 2006-01-06 22:01 ` PFC
[not found] ` <200601090803.03588.mlaks@verizon.net>
0 siblings, 1 reply; 31+ messages in thread
From: PFC @ 2006-01-06 22:01 UTC (permalink / raw)
To: linux-raid
OK, I bit the bullet and removed the "goto abort" in raid5.c.
I was then able to mount everything and recover all of my data without
any problem. Hm.
There should be a way to do this with mdadm without recompiling the
kernel, but anyway, open source saved my ass xDDD
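If I read Documentation/md.txt correctly, there is a parameter for
exactly this case, depending on kernel version -- untested here, so
treat it as a pointer rather than a recipe:
  # as a kernel boot parameter (md compiled in):
  md-mod.start_dirty_degraded=1
  # or when md is a module:
  modprobe md-mod start_dirty_degraded=1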
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-06 9:47 ` Simon Valiquette
` (2 preceding siblings ...)
2006-01-06 19:05 ` Fwd: Linux MD raid5 and reiser4... Any experience ? Mike Hardy
@ 2006-01-08 2:53 ` Daniel Pittman
3 siblings, 0 replies; 31+ messages in thread
From: Daniel Pittman @ 2006-01-08 2:53 UTC (permalink / raw)
To: linux-raid
Simon Valiquette <v.simon@ieee.org> writes:
> Francois Barre wrote:
>> 2006/1/5, Daniel Pittman <daniel@rimspace.net>:
>>>Francois Barre <francois.barre@gmail.com> writes:
[...]
> AFAIK, ext3 volume cannot be bigger than 4TB on a 32 bits system. I
> think it is important you know that in case it could be a concern for
> you.
Others have pointed out that this is not correct, so I won't repeat the
reasons here.
[...]
> On production server with large RAID array, I tends to like very much
> XFS and trust it more than ReiserFS (I had some bad experience with
> ReiserFS in the past). You can also grow a XFS filesystem live, which
> is really nice.
>
> XFS is finally much faster than EXT3 and self-defragmenting, but that
> was not a concern for you anyway.
This is true. Personally, I wouldn't recommend using XFS on a
production machine that doesn't have a very reliable UPS attached.
XFS only journals metadata, and writes that to disk quite frequently.
Like most other filesystems, it is much more relaxed about writing the
actual data to disk.
This is great from a hard restart situation -- it means no long fsck to
get the metadata consistent.
Unfortunately, the way XFS is put together, this means that any file
you were actively writing in the last minute or so before an unclean
restart has a good chance of having its metadata in the journal but its
data blocks unwritten.
That will have XFS replace the file content with ASCII NUL bytes for
you -- secure, but somewhat frustrating if you just lost critical
system files as a result.[1]
Assuming that you have otherwise reliable hardware, power problems are
the most common cause of unclean restarts. The risks of XFS data loss,
in my experience, offset the performance benefits for most production
use.
ext3, for reference, can be asked to operate in the same mode as XFS
but, by default, does not do so.
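The knob there is ext3's data= mount option (device and mount point
below are just examples):
  mount -o data=writeback /dev/md1 /data  # metadata-only journaling, XFS-like
  mount -o data=ordered   /dev/md1 /data  # the default: data blocks are flushed
                                          # before the metadata that refers to them
  mount -o data=journal   /dev/md1 /data  # data goes through the journal too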
Regards,
Daniel
Footnotes:
[1] I may be somewhat bitter here, as I support a dozen machines that
use XFS and are, sadly, not terribly reliable hardware. As a
result I spend far too much time pulling critical files back off
backup tapes, and can look forward to that continuing until our
migration away from that hardware is complete. :/
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-06 10:49 ` Andre Majorel
@ 2006-01-09 8:00 ` Molle Bestefich
2006-01-09 8:16 ` Gordon Henderson
0 siblings, 1 reply; 31+ messages in thread
From: Molle Bestefich @ 2006-01-09 8:00 UTC (permalink / raw)
To: linux-raid
Andre Majorel wrote:
> With LVM, you can create a snapshot of the block device at any
> time, mount the snapshot read-only (looks like another block
> device) and backup that. Ensuring consistency at application level
> is still up to you but at least, if that involves stopping
> services, the unavailability window is greatly reduced.
Hmm. And 1 out of 10 times it will completely *nuke* your data.
Not just the new snapshot you've just created, mind you, but also the
volume you're snapshotting.
I'm talking from experience. That's a "stable" version of LVM I'm
talking about, and it was a fresh install from the most recent RedHat
distro too.
A week ago I compiled my own version of the kernel/device-mapper
tools/lvm from the newest sources I could find. It didn't trash my
data, but 1 out of 4 times it created an unusable snapshot.
I've personally burn-in tested the hardware and there are absolutely no
problems there. It's a standard server from HP, naturally not
overclocked or anything like that.
It's apparently a known fact that the code is wildly unstable, so stay
away from it if you happen to like your data :-).
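(For completeness, the snapshot workflow Andre describes is roughly the
sketch below; the volume group, LV names and snapshot size are made up.
Given the above, test it on disposable data first.)

    lvcreate --snapshot --size 2G --name data-snap /dev/vg0/data
    mount -o ro /dev/vg0/data-snap /mnt/snap
    # ... run the backup against /mnt/snap ...
    umount /mnt/snap
    lvremove -f /dev/vg0/data-snap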
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-09 8:00 ` Molle Bestefich
@ 2006-01-09 8:16 ` Gordon Henderson
2006-01-09 9:00 ` Francois Barre
0 siblings, 1 reply; 31+ messages in thread
From: Gordon Henderson @ 2006-01-09 8:16 UTC (permalink / raw)
To: Molle Bestefich; +Cc: linux-raid
On Mon, 9 Jan 2006, Molle Bestefich wrote:
> Andre Majorel wrote:
> > With LVM, you can create a snapshot of the block device at any
> > time, mount the snapshot read-only (looks like another block
> > device) and backup that. Ensuring consistency at application level
> > is still up to you but at least, if that involves stopping
> > services, the unavailability window is greatly reduced.
>
> Hmm. And 1 out of 10 times it will completely *nuke* your data.
> Not just the new snapshot you've just created, mind you, but also the
> volume you're snapshotting.
>
> I'm talking from experience. That's a "stable" version of LVM I'm
> talking about, and it was a fresh install from the most recent RedHat
> distro too.
I was bitten by LVM some time back (maybe 18 months to 2 years ago) and
at the time I didn't have time to track down the real cause, so I
resorted to just doubling up on disk space (disk is cheap!) and running
a nightly rsync to create/update the snapshot (and you can keep several
days' worth too, with some cleverness and not much more disk space),
then dumping the snapshot to tape... The 300-400GB volumes I have take
less than an hour for the rsync, so it's not a big impact in the middle
of the night. (Servers I'm dumping to tape don't have more than about
400GB of partitions, so it all fits onto one 300/600 DLT tape.)
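A rough sketch of that kind of nightly job, in case it helps (paths and
retention are made up; rsync's --link-dest provides the hard-link
'cleverness' that keeps the extra days cheap):

    #!/bin/sh
    # rotate snapshots: daily.0 is the newest, daily.6 the oldest
    rm -rf /backup/daily.6
    for n in 5 4 3 2 1 0; do
        [ -d /backup/daily.$n ] && mv /backup/daily.$n /backup/daily.$((n+1))
    done
    # copy /data; files unchanged since yesterday become hard links
    rsync -a --delete --link-dest=/backup/daily.1 /data/ /backup/daily.0/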
Since then (nearly) all the big servers I've built have been done this
way -- double the anticipated disk space and either snapshotting to
themselves, or over the 'net to another box on the same LAN. It saves
all that faffing about when a luser asks you to restore a file/dir from
tape that they've accidentally deleted. I keep the 'archive' partition
read-only through the day, so that keeps it sane too.
Gordon
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-09 8:16 ` Gordon Henderson
@ 2006-01-09 9:00 ` Francois Barre
2006-01-09 9:24 ` Molle Bestefich
0 siblings, 1 reply; 31+ messages in thread
From: Francois Barre @ 2006-01-09 9:00 UTC (permalink / raw)
To: linux-raid
> I was bitten by LVM some time back (maybe 18 months to 2 years ago) and
> at the time I didn't have time to track down the real cause[...]
Well, that's part of my LVM fear... If ever something goes wrong,
because of an admin mistake or because of an LVM bug, the chances of
recovering a problematic LVM setup are small. That's part of why I do
not wish to use it.
<ABSOLUTELY OUT OF RAID TOPIC>
Regarding snapshots, it seems XFS has some snapshot-related features
(from http://oss.sgi.com/projects/xfs/index.html : XFS supports
filesystem growth for mounted volumes, allows filesystem "freeze" and
"thaw" operations to support volume-level snapshots, and provides an
online file defragmentation utility).
Did anyone play with this? Could it be a good way to implement robust
and trustworthy snapshots?
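From the docs, the freeze/thaw part appears to be exposed through the
xfs_freeze utility, roughly like this (the mount point is a placeholder;
I have not tried it myself):

    xfs_freeze -f /mnt/data   # freeze: block new writes, flush the log to disk
    # ... take the block-level snapshot here (LVM, mirror split, ...) ...
    xfs_freeze -u /mnt/data   # thaw: resume normal writes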
Anyway, I believe that snapshot logic should live inside the filesystem
layer; the capability to have the filesystem appear frozen to one
special process, while other processes keep modifying the fs, would be
great: alterations of the fs could be stored in a particular place of
the fs and committed at the end of the 'frozen' process... That would be
great. But it does not seem to be implemented by anyone...
Another way to do it would be to play with unionfs, like this:
- unmount your working partition
- mount it ro, with an additional rw temp partition on top, using unionfs
- make your snapshots
- merge the temp partition back into the working one
- unmount everything, remount the working partition rw
That would mean stopping all your services, and I guess it would not
scale well with large files (writing to a database would mean moving it
from ro to rw...).
Maybe linux-raid is not the right place to discuss this, but as lots of
people here seem to handle large amounts of data and have experience
with backups, I guess your advice can be really valuable...
<MAYBE MORE LINUX-RAID IN-TOPIC>
Gordon, why not use RAID-1 (mirroring) for backup purposes? I mean, you
have your raid5 partition, and you have your backup one. Let's assume
they are the same size.
Why not build a raid1 on top of these two, mount/use the raid1 as *the*
working partition, and run the backup mechanism roughly as follows (a
rough mdadm sketch is below):
- Mark the backup partition faulty in the raid1; the raid1 then no
  longer writes anything to the backup.
- Dump the backup partition to tape. It's a mere snapshot of your raid5
  at this point.
- Re-add the backup partition and let it resync from the raid5; that's
  what raid1 is for.
The only issue I see here is: how do you make sure the backups are
application-level consistent?
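A minimal sketch of that cycle, assuming made-up device names (/dev/md1
is the raid1, /dev/sdX1 the backup half):

    mdadm /dev/md1 --fail /dev/sdX1 --remove /dev/sdX1   # stop mirroring to the backup half
    # ... dump /dev/sdX1 (or a read-only mount of it) to tape ...
    mdadm /dev/md1 --add /dev/sdX1                       # re-add; md resyncs it from the raid5

I suspect a write-intent bitmap on the raid1 would make the resync after
re-adding incremental rather than full, but I haven't verified that.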
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Fwd: Linux MD raid5 and reiser4... Any experience ?
2006-01-09 9:00 ` Francois Barre
@ 2006-01-09 9:24 ` Molle Bestefich
0 siblings, 0 replies; 31+ messages in thread
From: Molle Bestefich @ 2006-01-09 9:24 UTC (permalink / raw)
To: linux-raid; +Cc: xen-users
Gordon Henderson wrote:
> I was bitten by LVM some time back (maybe 18 months to 2 years ago) and
> at the time I didn't have time to track down the real cause, so I
> resorted to just doubling up on disk space (disk is cheap!) and running
> a nightly rsync to create/update the snapshot (and you can keep several
> days' worth too, with some cleverness and not much more disk space),
> then dumping the snapshot to tape... The 300-400GB volumes I have take
> less than an hour for the rsync, so it's not a big impact in the middle
> of the night.
Thanks for the tip :-).
My current approach is to tar the entire filesystem to a "backup"
partition and gzip it with the "--rsyncable" option. That way I can
diff the backup tarballs and keep a whole load of backups online with a
minimum of disk space.
It doesn't seem useful at first to have so many backups, but keep in
mind that the less space each backup takes, the more often you can back
up. With this approach I can back up every 24 hours, which means less
work is lost when I have to restore.
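Concretely, something like this (the paths are placeholders, and
--rsyncable needs a gzip build that carries that patch, e.g. Debian's):

    tar -cf - -C / etc home srv | gzip --rsyncable > /backup/full-$(date +%Y%m%d).tar.gz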
Francois Barre:
> Regarding snapshots, it seems XFS has some snapshot-related features
> (from http://oss.sgi.com/projects/xfs/index.html : XFS supports
> filesystem growth for mounted volumes, allows filesystem "freeze" and
> "thaw" operations to support volume-level snapshots, and provides an
> online file defragmentation utility).
> Did anyone play with this? Could it be a good way to implement robust
> and trustworthy snapshots?
> Anyway, I believe that snapshot logic should live inside the filesystem
> layer; the capability to have the filesystem appear frozen to one
> special process, while other processes keep modifying the fs, would be
> great:
Seeing as server virtualization is all the buzz right now, what would
be really cool IMHO would be a filesystem where one could make a
block-by-block snapshot of the partition at _any_ point in time and get
a consistent filesystem out of it, without doing any sort of
freeze/unfreeze operations.
This would allow you to do snapshotting and backup *outside* of your
virtual machines, which is much more desirable than doing it from the
inside. From the outside, your virtual machines can't disturb the backup
process, so you have a guarantee that it gets done every time, and that
your customers do not muck with it. You also have much better control of
disk space usage. And you can make backups even if the machine happens
to be turned off -- and consistent ones too, even if it has crashed.
There are probably other advantages I haven't thought of :-).
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: Kanotix crashed my raid...
[not found] ` <200601090803.03588.mlaks@verizon.net>
@ 2006-01-09 18:30 ` PFC
0 siblings, 0 replies; 31+ messages in thread
From: PFC @ 2006-01-09 18:30 UTC (permalink / raw)
To: Mitchell Laks, linux-raid@vger.kernel.org
>> OK, I bit the bullet and removed the "goto abort" in raid5.c
>>
>> I was then able to mount everything and recover all of my data without
>> any problem. Hm.
>>
>> There should be a way to do this with mdadm without recompiling the
>> kernel, but anyway, opensource saved my ass xDDD
> Could you explain to me what you did? It sounds very important for me to
> understand!
Well, I ought to update.
So the kernel refused to start my raid because it was dirty and degraded.
I understand this, but getting some data out is better than none. I knew
that the PC had crashed while starting the array. So, it wouldn't have had
much time to cause a lot of corruption. Most likely everything was
alright, just marked dirty. So I just removed the test in the kernel which
refuses to start the array in this condition (that's why I love
opensource). And it worked.
raid5.c in the 2.6.14 kernel:

	if (mddev->degraded == 1 &&
	    mddev->recovery_cp != MaxSector) {
		printk(KERN_ERR
		       "raid5: cannot start dirty degraded array for %s",
		       mdname(mddev));
		goto abort;
	}
I just commented out the goto abort.
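For reference, depending on kernel and mdadm versions there may be less
invasive ways to get the same effect; I haven't verified them on a dirty
degraded raid5, so treat them as pointers only:

    # force assembly of the dirty, degraded array (device names are examples)
    mdadm --assemble --force /dev/md0 /dev/sd[abcd]1
    # newer md versions also have a module parameter for exactly this case,
    # e.g. on the kernel command line:
    #   md-mod.start_dirty_degraded=1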
Anyway, that's not the end of it! I got 3 hard disk failures in a week,
all Maxtor 250GB SATA HDDs. I RMA'd the first one and got a new drive.
It worked for 2 days and then boom. So I went to all the computer shops
around here, and they only had Maxtor, so I bought another Maxtor. AND
YESTERDAY IT DIED TOO.
I spent a lot of time with Google, and: Kanotix was not the culprit;
it's nvidia. It turns out the nforce 3 and 4 chipsets have some SATA
problems. This might be a hardware issue, or maybe a driver issue, who
knows, but the end result is this:
- My nforce 3 / Athlon 64 PC running Linux has 4 SATA ports (2x 2 ports).
- 2 of these ports (sda and sdb) are compatible with Maxtor hard drives
(i.e. they seem to work).
- The other two (sdc and sdd) are not compatible with Maxtor drives.
My other computer, an nforce4 mobo / Athlon 64 PC that runs Windows, has
exactly the same problem!
- If I plug a Seagate hard drive in as sda, b, c or d, it works.
- If I plug a Maxtor hard drive in as sda or sdb, it works.
- If I plug a Maxtor hard drive in as sdc or sdd (the other SATA ports),
on Linux it works for a day or two and then the drive "dies"; on Windows
it just fucks everything up (takes forever to boot, the disk management
console crashes, the mouse freezes, the disk appears then disappears,
etc.).
Re-plugging the "dead" drive as sda or sdb makes it work! So now all is
well: I have plugged my 2 Maxtor drives into the "maxtorphile" SATA
sockets, and the 2 Seagate drives into the "maxtorphobe" SATA sockets.
Everything works. I feel like banging my head against a wall.
Oh yeah, on Linux I also disabled USB and firewire, and unplugged the
CDROM drive. Who knows. It seems to work now. At least it has worked for
24 hours now.
Actually it's quite mind-boggling. And I found people in forums with
the same experience! Some guy had to unplug his IDE CD writer to get his
SATA hdd to work. On Windows. Cool. Basically, SATA + nforce = broken.
(The "maxtorphile" SATA sockets are not driven by the nforce chipset but
by a SiS chip, or so it seems.)
DON'T BUY NFORCE FOR SATA RAID!!!
^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2006-01-09 18:30 UTC | newest]
Thread overview: 31+ messages
[not found] <fd8d0180601050104x15079396h@mail.gmail.com>
2006-01-05 9:06 ` Fwd: Linux MD raid5 and reiser4... Any experience ? Francois Barre
2006-01-05 10:14 ` Daniel Pittman
2006-01-05 11:21 ` Francois Barre
2006-01-05 11:31 ` Gordon Henderson
2006-01-06 6:33 ` Daniel Pittman
2006-01-06 9:47 ` Simon Valiquette
2006-01-06 10:50 ` Francois Barre
2006-01-06 19:28 ` Forrest Taylor
2006-01-06 11:03 ` Kanotix crashed my raid PFC
2006-01-06 12:02 ` PFC
2006-01-06 12:08 ` PFC
2006-01-06 22:01 ` PFC
[not found] ` <200601090803.03588.mlaks@verizon.net>
2006-01-09 18:30 ` PFC
2006-01-06 19:05 ` Fwd: Linux MD raid5 and reiser4... Any experience ? Mike Hardy
2006-01-08 2:53 ` Daniel Pittman
2006-01-05 11:26 ` berk walker
2006-01-05 11:35 ` Francois Barre
2006-01-05 11:43 ` Gordon Henderson
2006-01-05 11:59 ` berk walker
2006-01-05 13:13 ` Bill Rugolsky Jr.
2006-01-05 13:38 ` John Stoffel
2006-01-05 14:03 ` Francois Barre
2006-01-05 18:55 ` John Stoffel
2006-01-06 9:08 ` Francois Barre
2006-01-06 10:49 ` Andre Majorel
2006-01-09 8:00 ` Molle Bestefich
2006-01-09 8:16 ` Gordon Henderson
2006-01-09 9:00 ` Francois Barre
2006-01-09 9:24 ` Molle Bestefich
2006-01-05 17:32 Andrew Burgess
2006-01-05 17:50 ` Francois Barre