* [ANNOUNCE] mdadm-3.1 has been withdrawn
@ 2009-11-06 6:45 Neil Brown
2009-11-09 14:39 ` Doug Ledford
0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2009-11-06 6:45 UTC (permalink / raw)
To: linux-raid
Greetings.
About a week ago I released mdadm-3.1.
I have now 'withdrawn' it, meaning that it doesn't appear on the
kernel.org mirrors any more, and I ask people not to use it.
The reason is that it is not as reliable at managing a raid[56]
reshape as I thought and it can corrupt data too easily.
In particular the 'backup' that is taken of the area being reshaped
gets restored to the wrong location when the array is stopped in the
middle of a reshape and reassembled.
If anyone has used mdadm-3.1 to reshape an array and has stopped and
restarted the array during that process (and I know some people have)
then it is very possible that some data in that filesystem has been
corrupted. I would urge you to take whatever measures you can to
check for corruption. An fsck at the very least would be advised.
The code in the devel-3.1 branch of my git tree
(git://neil.brown.name/mdadm) has this bug fixed as well as a number
of other improvements. I will probably release it as 3.1.1 some time
next week.
Note that you need 2.6.32 for most of the reshape operations with the
new code. This is because 2.6.31 does not handle a device failure
during reshape correctly and a subsequent crash can cause data to be
lost. When the needed patches appear in a 2.6.31.y stable kernel, I
will adjust the requirement that mdadm imposes.
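
For anyone scripting reshapes, the kernel requirement can be checked up
front. What follows is only a minimal sketch in Python, assuming the
release string starts with dotted integers as uname -r normally reports.
The helper names are hypothetical and are not part of mdadm, and a plain
version comparison cannot tell whether a 2.6.31.y stable kernel has
picked up the needed patches.

```python
import platform
import re


def parse_release(release):
    """Return the leading dotted-integer part of a kernel release as a tuple."""
    match = re.match(r"(\d+(?:\.\d+)*)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.group(1).split("."))


def kernel_at_least(release, minimum=(2, 6, 32)):
    """True if `release` is the same as or newer than `minimum`."""
    version = parse_release(release)
    # Pad the shorter tuple with zeros so both have the same length.
    width = max(len(version), len(minimum))
    version += (0,) * (width - len(version))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return version >= minimum


if __name__ == "__main__":
    # Check the kernel this script is running on.
    print(kernel_at_least(platform.release()))
```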
A big "thank you" to everyone who tested out this code and an even
bigger apology to anyone who has suffered data loss because of it.
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-06 6:45 [ANNOUNCE] mdadm-3.1 has been withdrawn Neil Brown
@ 2009-11-09 14:39 ` Doug Ledford
2009-11-09 15:36 ` berk walker
2009-11-09 20:22 ` Neil F Brown
0 siblings, 2 replies; 22+ messages in thread
From: Doug Ledford @ 2009-11-09 14:39 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On 11/06/2009 01:45 AM, Neil Brown wrote:
>
> Greetings.
>
> About a week ago I released mdadm-3.1
> I have now 'withdrawn' it meaning that it doesn't appear on the
> kernel.org mirrors any more, and I ask people not to use it.
Although the cause for this sucks, I was actually going to suggest that,
since 3.1 is a version bump, we take the opportunity to change a
few defaults. Like switching to version 1 superblocks instead of
version 0 by default. And changing the default chunk size to 512k
instead of 64k. The time has simply come for the 0->1 superblock
change, and I have a good deal of data showing that for SATA disks at
least, the 512k chunk size is the typical sweet spot.
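
To put the two chunk sizes side by side, the amount of user data in one
full stripe is the chunk size times the number of data-bearing devices.
The sketch below is plain arithmetic in Python with a hypothetical
helper name; it assumes one parity device for RAID4/5 and two for RAID6,
and it says nothing about which size performs better.

```python
KIB = 1024

# Number of member devices per stripe that hold parity rather than data.
PARITY_DEVICES = {0: 0, 4: 1, 5: 1, 6: 2}


def full_stripe_bytes(level, devices, chunk_kib):
    """Bytes of user data in one full stripe of a striped md array."""
    data_devices = devices - PARITY_DEVICES[level]
    if data_devices < 1:
        raise ValueError("not enough devices for RAID%d" % level)
    return data_devices * chunk_kib * KIB


for chunk_kib in (64, 512):
    stripe = full_stripe_bytes(level=5, devices=4, chunk_kib=chunk_kib)
    print("RAID5, 4 devices, %dK chunk: %d KiB per full stripe"
          % (chunk_kib, stripe // KIB))
```

With four devices in RAID5 this gives 192 KiB per full stripe at 64k and
1536 KiB at 512k.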
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 14:39 ` Doug Ledford
@ 2009-11-09 15:36 ` berk walker
2009-11-09 15:42 ` Jon Nelson
2009-11-09 20:22 ` Neil F Brown
1 sibling, 1 reply; 22+ messages in thread
From: berk walker @ 2009-11-09 15:36 UTC (permalink / raw)
To: Doug Ledford; +Cc: Neil Brown, linux-raid
Doug Ledford wrote:
> On 11/06/2009 01:45 AM, Neil Brown wrote:
>> Greetings.
>>
>> About a week ago I released mdadm-3.1
>> I have now 'withdrawn' it meaning that it doesn't appear on the
>> kernel.org mirrors any more, and I ask people not to use it.
>
> Although the cause for this sucks, I was actually going to suggest that
> since 3.1 is a version bump, that we take the opportunity to change a
> few defaults. Like switching to version 1 superblocks instead of
> version 0 by default. And changing the default chunk size to 512k
> instead of 64k. The time has simply come for the 0->1 superblock
> change, and I have a good deal of data showing that for SATA disks at
> least, the 512k chunk size is the typical sweet spot.
>
+1
b-
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 15:36 ` berk walker
@ 2009-11-09 15:42 ` Jon Nelson
2009-11-09 16:51 ` Mikael Abrahamsson
0 siblings, 1 reply; 22+ messages in thread
From: Jon Nelson @ 2009-11-09 15:42 UTC (permalink / raw)
Cc: linux-raid
On Mon, Nov 9, 2009 at 9:36 AM, berk walker <berk@panix.com> wrote:
> Doug Ledford wrote:
>> Although the cause for this sucks, I was actually going to suggest that
>> since 3.1 is a version bump, that we take the opportunity to change a
>> few defaults. Like switching to version 1 superblocks instead of
>> version 0 by default. And changing the default chunk size to 512k
>> instead of 64k. The time has simply come for the 0->1 superblock
>> change, and I have a good deal of data showing that for SATA disks at
>> least, the 512k chunk size is the typical sweet spot.
>>
> +1
+1 here, too.
I've been using 1.1 for everything. What's the current wisdom
regarding 1.0 vs 1.1 or 1.2?
I used 1.1 because that's also where filesystem metadata usually goes
and therefore one might hope that the presence of the md metadata
would prevent accidental identification of a raid volume as containing
a filesystem.
--
Jon
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 15:42 ` Jon Nelson
@ 2009-11-09 16:51 ` Mikael Abrahamsson
2009-11-09 21:07 ` Doug Ledford
0 siblings, 1 reply; 22+ messages in thread
From: Mikael Abrahamsson @ 2009-11-09 16:51 UTC (permalink / raw)
To: Jon Nelson; +Cc: linux-raid
On Mon, 9 Nov 2009, Jon Nelson wrote:
> I've been using 1.1 for everything. What's the current wisdom
> regarding 1.0 vs 1.1 or 1.2?
> I used 1.1 because that's also where filesystem metadata usually goes
> and therefore one might hope that the presence of the md metadata
> would prevent accidental identification of a raid volume as containing
> a filesystem.
I like 1.2 because if you happen to write an MBR or something to the
drive, you don't lose the superblock.
With 1.2 I can also take the drive from a 3ware hw-raid (single drive in
3ware bios) and put it in a non-3ware controller (because the 3ware stores
its superblock at the end, so when you put the drive in a non-3ware the end
has now changed).
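
Since the whole discussion turns on where each metadata version keeps
its superblock, here is a rough sketch of the offsets in Python. It
reflects the md on-disk layouts as commonly documented (0.90 in the last
64 KiB-aligned 64 KiB block, 1.0 at least 8 KiB from the end on a 4 KiB
boundary, 1.1 at the start, 1.2 at 4 KiB from the start) and assumes
512-byte sectors. The function name is made up for illustration.

```python
SECTOR = 512


def superblock_offset(version, device_bytes):
    """Byte offset of the md superblock for a given metadata version."""
    sectors = device_bytes // SECTOR
    if version == "0.90":
        # Round the size down to a 64 KiB multiple, then step back 64 KiB.
        reserved = 128  # 64 KiB in sectors
        start = (sectors & ~(reserved - 1)) - reserved
    elif version == "1.0":
        # At least 8 KiB before the end, rounded down to a 4 KiB boundary.
        start = (sectors - 16) & ~7
    elif version == "1.1":
        start = 0
    elif version == "1.2":
        start = 8  # 4 KiB from the start
    else:
        raise ValueError("unknown metadata version: %r" % version)
    return start * SECTOR


GIB = 1024 ** 3
MIB = 1024 ** 2
for version in ("0.90", "1.0", "1.1", "1.2"):
    full = superblock_offset(version, GIB)
    shrunk = superblock_offset(version, GIB - MIB)
    print(version, full, "moves" if full != shrunk else "stays")
```

The loop compares a 1 GiB device with the same device reported 1 MiB
smaller: the end-relative locations (0.90 and 1.0) move, while 1.1 and
1.2 stay put.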
--
Mikael Abrahamsson email: swmike@swm.pp.se
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 14:39 ` Doug Ledford
2009-11-09 15:36 ` berk walker
@ 2009-11-09 20:22 ` Neil F Brown
2009-11-09 21:00 ` Doug Ledford
2009-11-13 23:54 ` Dan Williams
1 sibling, 2 replies; 22+ messages in thread
From: Neil F Brown @ 2009-11-09 20:22 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-raid
On Tue, November 10, 2009 1:39 am, Doug Ledford wrote:
> On 11/06/2009 01:45 AM, Neil Brown wrote:
>>
>> Greetings.
>>
>> About a week ago I released mdadm-3.1
>> I have now 'withdrawn' it meaning that it doesn't appear on the
>> kernel.org mirrors any more, and I ask people not to use it.
>
> Although the cause for this sucks, I was actually going to suggest that
> since 3.1 is a version bump, that we take the opportunity to change a
> few defaults. Like switching to version 1 superblocks instead of
> version 0 by default. And changing the default chunk size to 512k
> instead of 64k. The time has simply come for the 0->1 superblock
> change, and I have a good deal of data showing that for SATA disks at
> least, the 512k chunk size is the typical sweet spot.
I had been toying with that idea myself - certainly of changing the
defaults soon.
I'm tempted to make the default metadata "1.1" though possibly not for
RAID1. For RAID0,4,5,6,10 there is no value in having the metadata
at the end of the device. For RAID1 there is as it makes booting off
any member easier.
Thoughts?
I'm certainly happy with increasing the chunksize to 512K.
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 20:22 ` Neil F Brown
@ 2009-11-09 21:00 ` Doug Ledford
2009-11-13 23:54 ` Dan Williams
1 sibling, 0 replies; 22+ messages in thread
From: Doug Ledford @ 2009-11-09 21:00 UTC (permalink / raw)
To: Neil F Brown; +Cc: linux-raid
On 11/09/2009 03:22 PM, Neil F Brown wrote:
> On Tue, November 10, 2009 1:39 am, Doug Ledford wrote:
>> On 11/06/2009 01:45 AM, Neil Brown wrote:
>>>
>>> Greetings.
>>>
>>> About a week ago I released mdadm-3.1
>>> I have now 'withdrawn' it meaning that it doesn't appear on the
>>> kernel.org mirrors any more, and I ask people not to use it.
>>
>> Although the cause for this sucks, I was actually going to suggest that
>> since 3.1 is a version bump, that we take the opportunity to change a
>> few defaults. Like switching to version 1 superblocks instead of
>> version 0 by default. And changing the default chunk size to 512k
>> instead of 64k. The time has simply come for the 0->1 superblock
>> change, and I have a good deal of data showing that for SATA disks at
>> least, the 512k chunk size is the typical sweet spot.
>
> I had been toying with that idea myself - certainly of changing the
> defaults soon.
> I'm tempted to make the default metadata "1.1" though possibly not for
> RAID1. For RAID0,4,5,6,10 there is no value in having the metadata
> at the end of the device. For RAID1 there is as it makes booting off
> any member easier.
> Thoughts?
While it makes booting off of a raid1 easier, raid1 is *precisely* the
level that is prone to silent data corruption due to the individual members
being able to be mounted while not part of a running raid array. I
would make it default to 1.1 period, and force distros or other people
to either A) update the boot loader to something that can handle a 1.1
superblock (grub2 should be able to) or B) manually set it to 1.0 instead.
>
> I'm certainly happy with increasing the chunksize to 512K.
>
> NeilBrown
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 16:51 ` Mikael Abrahamsson
@ 2009-11-09 21:07 ` Doug Ledford
2009-11-09 21:27 ` Luca Berra
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Doug Ledford @ 2009-11-09 21:07 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: Jon Nelson, linux-raid
On 11/09/2009 11:51 AM, Mikael Abrahamsson wrote:
> On Mon, 9 Nov 2009, Jon Nelson wrote:
>
>> I've been using 1.1 for everything. What's the current wisdom
>> regarding 1.0 vs 1.1 or 1.2?
>> I used 1.1 because that's also where filesystem metadata usually goes
>> and therefore one might hope that the presence of the md metadata
>> would prevent accidental identification of a raid volume as containing
>> a filesystem.
>
> I like 1.2 because if you happen to write an MBR or something to the
> drive, you don't lose the superblock.
Of course, I recently had a bug report that I ended up closing out as
NOTABUG because of this very ability. The person had arrays with 1.2
superblocks, and they went to add a new disk, and all the existing disks
had a specific partition layout, so he copied that to the new disk, then
tried to add the partition to the raid array. It kept returning "device
too small for array". Then, upon inspection, we come to see he has a
1.2 superblock on the *entire* drive, which left the partition table
intact, but the partition table is *pointless* because the array is on
the whole disk devices. This sort of confusion is bad. So, while I
could see making it 1.2 for partitions (so that boot sectors won't
overwrite the superblock), I wouldn't make it 1.2 for whole disk
devices, and in fact it might be wise to refuse to create 1.2
superblocks on whole disk devices. Just a thought.
> With 1.2 I can also take the drive from a 3ware hw-raid (single drive in
> 3ware bios) and put in a non-3ware (because the 3ware stores the
> superblock at the end, so when you put it in a non-3ware the end has now
> changed).
1.1 should work just as well for this.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 21:07 ` Doug Ledford
@ 2009-11-09 21:27 ` Luca Berra
2009-11-09 21:43 ` Jon Nelson
2009-11-10 8:25 ` Mikael Abrahamsson
2009-11-12 22:25 ` Bill Davidsen
2 siblings, 1 reply; 22+ messages in thread
From: Luca Berra @ 2009-11-09 21:27 UTC (permalink / raw)
To: linux-raid
On Mon, Nov 09, 2009 at 04:07:09PM -0500, Doug Ledford wrote:
>overwrite the superblock), I wouldn't make it 1.2 for whole disk
>devices, and in fact it might be wise to refuse to create 1.2
>superblocks on whole disk devices. Just a thought.
I am against refusing to do things because users could get confused,
I could agree if this would require a force flag, but not deny
completely.
I would think 1.1 is a good option for default. Better than 1.0 for
reasons we discussed to boredom, and 1.2 is really only for special
cases.
I also agree on the default chunk size bump.
--
Luca Berra -- bluca@comedia.it
Communication Media & Services S.r.l.
/"\
\ / ASCII RIBBON CAMPAIGN
X AGAINST HTML MAIL
/ \
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 21:27 ` Luca Berra
@ 2009-11-09 21:43 ` Jon Nelson
0 siblings, 0 replies; 22+ messages in thread
From: Jon Nelson @ 2009-11-09 21:43 UTC (permalink / raw)
To: linux-raid
On Mon, Nov 9, 2009 at 3:27 PM, Luca Berra <bluca@comedia.it> wrote:
> On Mon, Nov 09, 2009 at 04:07:09PM -0500, Doug Ledford wrote:
>>
>> overwrite the superblock), I wouldn't make it 1.2 for whole disk
>> devices, and in fact it might be wise to refuse to create 1.2
>> superblocks on whole disk devices. Just a thought.
>
> I am against refusing to do things because users could get confused,
> I could agree if this would require a force flag, but not deny
> completely.
Agree.
> I would think 1.1 is a good option for default. Better than 1.0 for
> reasons we discussed to boredom, and 1.2 is really only for special
> cases.
Agree.
> i also agree on the default chunk size bump
Agree.
--
Jon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 21:07 ` Doug Ledford
2009-11-09 21:27 ` Luca Berra
@ 2009-11-10 8:25 ` Mikael Abrahamsson
2009-11-10 14:22 ` Jon Nelson
2009-11-12 22:25 ` Bill Davidsen
2 siblings, 1 reply; 22+ messages in thread
From: Mikael Abrahamsson @ 2009-11-10 8:25 UTC (permalink / raw)
To: linux-raid
On Mon, 9 Nov 2009, Doug Ledford wrote:
> Of course, I recently had a bug report that I ended up closing out as
> NOTABUG because of this very ability. The person had arrays with 1.2
> superblocks, and they went to add a new disk, and all the existing disks
> had a specific partition layout, so he copied that to the new disk, then
> tried to add the partition to the raid array. It kept returning "device
> too small for array". Then, upon inspection, we come to see he has a
> 1.2 superblock on the *entire* drive, which left the partition table
> intact, but the partition table is *pointless* because the array is on
> the whole disk devices. This sort of confusion is bad. So, while I
> could see making it 1.2 for partitions (so that boot sectors won't
> overwrite the superblock), I wouldn't make it 1.2 for whole disk
> devices, and in fact it might be wise to refuse to create 1.2
> superblocks on whole disk devices. Just a thought.
Well, same thing there, if you create a partition table you don't break
the superblock. Perhaps something needs to be able to discern between the
superblock being "whole disk" and on a partition? Personally I put 1.2 on
"whole disk" (no partition table at all), and I would really HATE this
possibility going away. I like it the way it is and feel comfortable with
it and I don't want 1.0 or 1.1 superblocks in my setup.
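
As an illustration of how a tool might notice that both things are
present, the sketch below (Python, hypothetical helper name) looks at
the first bytes of a device for the classic MBR signature at bytes
510-511 and for the md magic number 0xa92b4efc at the 1.1 and 1.2
locations. It works on an in-memory buffer and only checks the magic; a
real tool would also validate the rest of the superblock.

```python
import struct

MD_MAGIC = 0xA92B4EFC        # md superblock magic number
MBR_SIGNATURE = b"\x55\xaa"  # last two bytes of a classic MBR sector


def describe_start_of_device(head):
    """Report what the first bytes of a device (at least 4100) appear to hold."""
    if len(head) < 4100:
        raise ValueError("need at least 4100 bytes")
    found = []
    if head[510:512] == MBR_SIGNATURE:
        found.append("mbr")
    # Version 1.x superblocks store the magic as a little-endian 32-bit value.
    for name, offset in (("md-1.1", 0), ("md-1.2", 4096)):
        (magic,) = struct.unpack_from("<I", head, offset)
        if magic == MD_MAGIC:
            found.append(name)
    return found


# Synthetic example: a whole-disk member with a 1.2 superblock and a stray MBR.
image = bytearray(8192)
image[510:512] = MBR_SIGNATURE
struct.pack_into("<I", image, 4096, MD_MAGIC)
print(describe_start_of_device(bytes(image)))
```

For the synthetic image this reports both "mbr" and "md-1.2", which is
exactly the ambiguous situation from the bug report.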
--
Mikael Abrahamsson email: swmike@swm.pp.se
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-10 8:25 ` Mikael Abrahamsson
@ 2009-11-10 14:22 ` Jon Nelson
2009-11-11 3:26 ` Michael Evans
0 siblings, 1 reply; 22+ messages in thread
From: Jon Nelson @ 2009-11-10 14:22 UTC (permalink / raw)
Cc: linux-raid
On Tue, Nov 10, 2009 at 2:25 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> On Mon, 9 Nov 2009, Doug Ledford wrote:
>
>> Of course, I recently had a bug report that I ended up closing out as NOTABUG
>> because of this very ability. The person had arrays with 1.2 superblocks,
>> and they went to add a new disk, and all the existing disks had a specific
>> partition layout, so he copied that to the new disk, then tried to add the
>> partition to the raid array. It kept returning "device too small for
>> array". Then, upon inspection, we come to see he has a 1.2 superblock on
>> the *entire* drive, which left the partition table intact, but the partition
>> table is *pointless* because the array is on the whole disk devices. This
>> sort of confusion is bad. So, while I could see making it 1.2 for
>> partitions (so that boot sectors won't overwrite the superblock), I wouldn't
>> make it 1.2 for whole disk devices, and in fact it might be wise to refuse
>> to create 1.2 superblocks on whole disk devices. Just a thought.
>
> Well, same thing there, if you create a partition table you don't break the
> superblock. Perhaps something needs to be able to discern between the
> superblock being "whole disk" and on a partition? Personally I put 1.2 on
> "whole disk" (no partition table at all), and I would really HATE this
> possibility going away. I like it the way it is and feel comfortable with it
> and I don't want 1.0 or 1.1 superblocks in my setup.
Since I almost always use partitions (this way, the partition *type*
is "Linux RAID") I largely avoid this issue.
--
Jon
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-10 14:22 ` Jon Nelson
@ 2009-11-11 3:26 ` Michael Evans
0 siblings, 0 replies; 22+ messages in thread
From: Michael Evans @ 2009-11-11 3:26 UTC (permalink / raw)
To: Jon Nelson; +Cc: linux-raid
There is nothing preventing someone from first creating a protective
partition, similar to the MBR record used by GPT. Then they would be
able to use the 4k offset 1.2 label on the device if they absolutely
wanted. However a normal MBR with partition and 1.1 label would use
less disk space and be more compatible with other tools. Similar
logic applies for GPT labeled drives, and any drive large enough to
require GPT should not miss the ~16 kilobytes required at each end of
the drive.
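
The "~16 kilobytes" figure can be checked with a little arithmetic. This
Python sketch assumes 512-byte sectors and the usual minimum-size GPT
table of 128 entries of 128 bytes each; larger sectors or a larger table
would change the numbers.

```python
SECTOR = 512
ENTRY_BYTES = 128   # size of one GPT partition entry
ENTRY_COUNT = 128   # entries in the usual (minimum-size) table

# Sectors needed for the partition entry array, rounded up.
entry_array_sectors = (ENTRY_BYTES * ENTRY_COUNT + SECTOR - 1) // SECTOR

# Start of disk: protective MBR + GPT header + partition entry array.
front_bytes = (1 + 1 + entry_array_sectors) * SECTOR
# End of disk: backup partition entry array + backup GPT header.
back_bytes = (entry_array_sectors + 1) * SECTOR

print("front: %d bytes (%.1f KiB)" % (front_bytes, front_bytes / 1024))
print("back:  %d bytes (%.1f KiB)" % (back_bytes, back_bytes / 1024))
```

Under those assumptions the front of the disk gives up 17 KiB and the
end 16.5 KiB.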
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
@ 2009-11-12 17:51 ` greg
2009-11-12 23:02 ` Rudy Zijlstra
2009-11-13 2:02 ` Neil Brown
0 siblings, 2 replies; 22+ messages in thread
From: greg @ 2009-11-12 17:51 UTC (permalink / raw)
To: Neil F Brown, Doug Ledford; +Cc: linux-raid
On Nov 10, 7:22am, "Neil F Brown" wrote:
} Subject: Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
Good morning to everyone, hope the week is progressing well.
> On Tue, November 10, 2009 1:39 am, Doug Ledford wrote:
> > On 11/06/2009 01:45 AM, Neil Brown wrote:
> >>
> >> Greetings.
> >>
> >> About a week ago I released mdadm-3.1
> >> I have now 'withdrawn' it meaning that it doesn't appear on the
> >> kernel.org mirrors any more, and I ask people not to use it.
> >
> > Although the cause for this sucks, I was actually going to suggest that
> > since 3.1 is a version bump, that we take the opportunity to change a
> > few defaults. Like switching to version 1 superblocks instead of
> > version 0 by default. And changing the default chunk size to 512k
> > instead of 64k. The time has simply come for the 0->1 superblock
> > change, and I have a good deal of data showing that for SATA disks at
> > least, the 512k chunk size is the typical sweet spot.
> I had been toying with that idea myself - certainly of changing the
> defaults soon. I'm tempted to make the default metadata "1.1"
> though possibly not for RAID1. For RAID0,4,5,6,10 there is no value
> in having the metadata at the end of the device. For RAID1 there is
> as it makes booting off any member easier. Thoughts?
It may be heresy but I would suggest that if the defaults change we
should also implement support for auto-starting version 1.x devices,
or some appropriate subset of them.
I understand and appreciate the concerns of the userspace start
community. However, we do a lot of storage on very dedicated systems
and I have spent far more time unsnarling systems with blown
initrd/initramfs setups and other boot issues than I have recovering
from starting RAID volumes on the wrong box. That's why I don't let
udev anywhere near production machines and I am still living on 0.9
metadata in spite of its limitations.
UNIX has always been about allowing people to shoot themselves in the
foot if they so desire. I think an acceptable compromise would be to
move toward a default of disabled auto-detection with the option to
turn on detection of all meta-data types if people choose to do that.
> NeilBrown
Best wishes for a pleasant weekend to everyone.
}-- End of excerpt from "Neil F Brown"
As always,
Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC.
4206 N. 19th Ave. Specializing in information infra-structure
Fargo, ND 58102 development.
PH: 701-281-1686
FAX: 701-281-3949 EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Join in the new game that's sweeping the country. It's called
`Bureaucracy`. Everybody stands in a circle. The first person to do
anything loses."
-- Steve RTFM Przepiora
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 21:07 ` Doug Ledford
2009-11-09 21:27 ` Luca Berra
2009-11-10 8:25 ` Mikael Abrahamsson
@ 2009-11-12 22:25 ` Bill Davidsen
2009-11-13 5:50 ` Mikael Abrahamsson
2 siblings, 1 reply; 22+ messages in thread
From: Bill Davidsen @ 2009-11-12 22:25 UTC (permalink / raw)
To: Doug Ledford; +Cc: Mikael Abrahamsson, Jon Nelson, linux-raid
Doug Ledford wrote:
> On 11/09/2009 11:51 AM, Mikael Abrahamsson wrote:
>
>> On Mon, 9 Nov 2009, Jon Nelson wrote:
>>
>>
>>> I've been using 1.1 for everything. What's the current wisdom
>>> regarding 1.0 vs 1.1 or 1.2?
>>> I used 1.1 because that's also where filesystem metadata usually goes
>>> and therefore one might hope that the presence of the md metadata
>>> would prevent accidental identification of a raid volume as containing
>>> a filesystem.
>>>
>> I like 1.2 because if you happen to write an MBR or something to the
>> drive, you don't lose the superblock.
>>
>
> Of course, I recently had a bug report that I ended up closing out as
> NOTABUG because of this very ability. The person had arrays with 1.2
> superblocks, and they went to add a new disk, and all the existing disks
> had a specific partition layout, so he copied that to the new disk, then
> tried to add the partition to the raid array. It kept returning "device
> too small for array". Then, upon inspection, we come to see he has a
> 1.2 superblock on the *entire* drive, which left the partition table
> intact, but the partition table is *pointless* because the array is on
> the whole disk devices. This sort of confusion is bad. So, while I
> could see making it 1.2 for partitions (so that boot sectors won't
> overwrite the superblock), I wouldn't make it 1.2 for whole disk
> devices, and in fact it might be wise to refuse to create 1.2
> superblocks on whole disk devices. Just a thought.
>
>
I'm trying to wrap my head around this recommendation, and not doing
well. The end of the allocation area (partition, disk, array, whatever)
seems to be what users hit when they do a dd or some similar operation
without understanding it. And the from end is what they hit when they
"fix" the MBR or add a partition table because something said it was
missing. As for your friend, nothing is foolproof, and unless he tried
very hard he probably failed to damage anything in a way which couldn't
be readily fixed.
I like 1.2, I feel it's least likely to suffer collateral damage, and
the problems it causes seem to result in the type of behavior you
mention above, the system says "Can't, won't, you don't know what you're
doing."
--
Bill Davidsen <davidsen@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-12 17:51 ` greg
@ 2009-11-12 23:02 ` Rudy Zijlstra
2009-11-13 1:53 ` Michael Evans
2009-11-13 2:02 ` Neil Brown
1 sibling, 1 reply; 22+ messages in thread
From: Rudy Zijlstra @ 2009-11-12 23:02 UTC (permalink / raw)
To: greg; +Cc: Neil F Brown, Doug Ledford, linux-raid
greg@enjellic.com wrote:
> On Nov 10, 7:22am, "Neil F Brown" wrote:
> } Subject: Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
>
> Good morning to everyone, hope the week is progressing well.
>
> <snip>
> It may be heresy but I would suggest that if the defaults change we
> should also implement support for auto-starting version 1.x devices,
> or some appropriate subset of them.
>
> I understand and appreciate the concerns of the userspace start
> community. However, we do a lot of storage on very dedicated systems
> and I have spent far more time unsnarling systems with blown
> initrd/initramfs setups and other boot issues than I have recovering
> from starting RAID volumes on the wrong box. That's why I don't let
> udev anywhere near production machines and I am still living on 0.9
> metadata in spite of its limitations.
>
> UNIX has always been about allowing people to shoot themselves in the
> foot if they so desire. I think an acceptable compromise would be to
> move toward a default of disabled auto-detection with the option to
> turn on detection of all meta-data types if people choose to do that.
>
>
+1
Cheers,
Rudy
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-12 23:02 ` Rudy Zijlstra
@ 2009-11-13 1:53 ` Michael Evans
0 siblings, 0 replies; 22+ messages in thread
From: Michael Evans @ 2009-11-13 1:53 UTC (permalink / raw)
To: Rudy Zijlstra; +Cc: greg, Neil F Brown, Doug Ledford, linux-raid
No matter what you do, some part of your boot process still has to be
served by normal startup support (like a hardware raid or motherboard
supported fakeraid, unless you're doing some kind of netboot). You still
need a kernel exposed, and you may as well have an initrd of some kind
after it.
What I'd likely do is use GPT and place your bootloader inside of a
GUID Partition Table area, preferably near the front of the disk so
you're assured to be within realmode bios LBA range (just in case LBA
48 bit isn't supported by your bootloader/bios), which might end at
128GB of data (http://ubuntuforums.org/archive/index.php/t-301826.html).
Then set it up so that the boot-loader code in the first 440 (446)
bytes of the MBR compatibility/protective label loads the blocks for
the real bootloader from that area.
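
The 128GB figure matches what 28-bit sector addressing can reach, on the
assumption that the limit in question is 28-bit LBA with 512-byte
sectors. A quick check in Python, with a hypothetical helper name:

```python
SECTOR = 512


def lba_limit_bytes(address_bits, sector_bytes=SECTOR):
    """Largest capacity addressable with the given number of LBA bits."""
    return (2 ** address_bits) * sector_bytes


print(lba_limit_bytes(28) // 2 ** 30, "GiB with 28-bit LBA")
print(lba_limit_bytes(48) // 2 ** 40, "TiB with 48-bit LBA")
```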
Then you can use dd to duplicate the first 446 bytes onto each other
'mirror' device and either have those boot devices as a mirror set, or
intentionally only manually update the backups when you've tested a
new kernel.
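
The same copy can be sketched in Python for anyone who prefers a script
to dd. This is only an illustration with a hypothetical function name;
it overwrites the first 446 bytes of the target and leaves the partition
table and signature (bytes 446 to 511) and everything after them alone.
Try it on image files before pointing it at real devices, which would
need root.

```python
BOOT_CODE_BYTES = 446  # bytes 446..511 hold the partition table and signature


def copy_boot_code(source_path, target_path, length=BOOT_CODE_BYTES):
    """Copy the first `length` bytes of source over the start of target."""
    with open(source_path, "rb") as source:
        boot_code = source.read(length)
    if len(boot_code) != length:
        raise ValueError("source is shorter than %d bytes" % length)
    # "r+b" updates in place, so the rest of the target is left untouched.
    with open(target_path, "r+b") as target:
        target.write(boot_code)
```

Opening the target with "r+b" rather than "wb" matters here: "wb" would
truncate the file and throw away the partition table along with
everything else.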
< Free Open Source Plug >
If for some reason your current distribution's initrd/initramfs
doesn't do what you want, I know of an easily customized alternative:
"Another Early Userspace Init Option"
http://sourceforge.net/projects/aeuio/ which is based on basic /bin/sh,
awk, sed; it builds best when you have a local copy of busybox, but
should also build (a somewhat larger) initrd using the other binaries
on your system.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-12 17:51 ` greg
2009-11-12 23:02 ` Rudy Zijlstra
@ 2009-11-13 2:02 ` Neil Brown
1 sibling, 0 replies; 22+ messages in thread
From: Neil Brown @ 2009-11-13 2:02 UTC (permalink / raw)
To: greg; +Cc: Doug Ledford, linux-raid
On Thursday November 12, greg@enjellic.com wrote:
> On Nov 10, 7:22am, "Neil F Brown" wrote:
> } Subject: Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
>
> Good morning to everyone, hope the week is progressing well.
>
> > On Tue, November 10, 2009 1:39 am, Doug Ledford wrote:
> > > On 11/06/2009 01:45 AM, Neil Brown wrote:
> > >>
> > >> Greetings.
> > >>
> > >> About a week ago I released mdadm-3.1
> > >> I have now 'withdrawn' it meaning that it doesn't appear on the
> > >> kernel.org mirrors any more, and I ask people not to use it.
> > >
> > > Although the cause for this sucks, I was actually going to suggest that
> > > since 3.1 is a version bump, that we take the opportunity to change a
> > > few defaults. Like switching to version 1 superblocks instead of
> > > version 0 by default. And changing the default chunk size to 512k
> > > instead of 64k. The time has simply come for the 0->1 superblock
> > > change, and I have a good deal of data showing that for SATA disks at
> > > least, the 512k chunk size is the typical sweet spot.
>
> > I had been toying with that idea myself - certainly of changing the
> > defaults soon. I'm tempted to make the default metadata "1.1"
> > though possibly not for RAID1. For RAID0,4,5,6,10 there is no value
> > in having the metadata at the end of the device. For RAID1 there is
> > as it makes booting off any member easier. Thoughts?
>
> It may be heresy but I would suggest that if the defaults change we
> should also implement support for auto-starting version 1.x devices,
> or some appropriate subset of them.
Yep, definite heresy. :-)
>
> I understand and appreciate the concerns of the userspace start
> community. However, we do a lot of storage on very dedicated systems
> and I have spent far more time unsnarling systems with blown
> initrd/initramfs setups and other boot issues than I have recovering
> from starting RAID volumes on the wrong box. That's why I don't let
> udev anywhere near production machines and I am still living on 0.9
> metadata in spite of its limitations.
You don't need udev for user-space md startup. You do need initramfs,
but I think you need that for lots of things these days. It doesn't
need to be a very complicated initramfs.
>
> UNIX has always been about allowing people to shoot themselves in the
> foot if they so desire. I think an acceptable compromise would be to
> move toward a default of disabled auto-detection with the option to
> turn on detection of all meta-data types if people choose to do that.
>
You have the source code and you are quite welcome to shoot yourself
in whichever limb you please with it.
in-kernel autostart of v1.x arrays could probably be implemented
without too much difficulty. I am not likely to do it though.
If someone sent me nice patches which enabled it as a CONFIG option I
might accept them, providing a good justification was included.
But I don't think it is a good idea.
If you really really want to avoid an initramfs, then just use a 0.90
array for the device holding the root filesystem. Everything other
than root can be started by init scripts. Or do you want to avoid
init scripts too because they are too easy to get wrong :-)
> > NeilBrown
>
> Best wishes for a pleasant weekend to everyone.
>
Thank you, and the same to you!
NeilBrown
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-12 22:25 ` Bill Davidsen
@ 2009-11-13 5:50 ` Mikael Abrahamsson
2009-11-13 13:04 ` Bill Davidsen
0 siblings, 1 reply; 22+ messages in thread
From: Mikael Abrahamsson @ 2009-11-13 5:50 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Doug Ledford, Jon Nelson, linux-raid
On Thu, 12 Nov 2009, Bill Davidsen wrote:
> I like 1.2, I feel it's least likely to suffer collateral damage, and
> the problems it causes seem to result in the type of behavior you
> mention above, the system says "Can't, won't, you don't know what you're
> doing."
What about adding a new v1.3 superblock which basically has 4 superblocks,
an old 1.x superblock residing at <end>-<v1.0 superblock size> (new
location), and then pointers to this block residing where 1.0, 1.1 and 1.2
superblocks would normally be? Wouldn't that solve "everybodys" problem by
making it easier to find the superblock regardless of what might have
happened (drive size changed because of 3ware, someone installed mbr on
the drive etc).
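
A rough sketch of the offsets involved, for readers following the
proposal above. It is not from the original message: the function name
and the offsets are assumptions based on the md(4) description of each
format (0.90 in the last 64K-aligned 64K block, 1.0 between 8K and 12K
from the end on a 4K boundary, 1.1 at the start, 1.2 at 4K from the
start). It shows why a change in reported device size moves the
end-relative superblocks but leaves the start-relative ones in place.

```python
# Hypothetical helper: where each md superblock format is expected to
# live, in 512-byte sectors. Offsets are assumptions taken from the
# md(4) description, not from this thread.

def superblock_offset_sectors(version: str, device_sectors: int) -> int:
    """Return the assumed superblock start sector for a metadata version."""
    if version == "0.90":
        reserved = 128  # 64 KiB expressed in sectors
        # Round the device size down to 64 KiB, then step back one block.
        return (device_sectors & ~(reserved - 1)) - reserved
    if version == "1.0":
        # At least 8 KiB from the end, aligned down to a 4 KiB boundary.
        return (device_sectors - 16) & ~7
    if version == "1.1":
        return 0  # start of the device
    if version == "1.2":
        return 8  # 4 KiB from the start of the device
    raise ValueError(f"unknown metadata version: {version}")


def candidate_offsets(device_sectors: int) -> dict:
    """All locations a tool would have to probe to find a 1.x superblock."""
    return {v: superblock_offset_sectors(v, device_sectors)
            for v in ("1.0", "1.1", "1.2")}


if __name__ == "__main__":
    full = 1_000_000           # device size as originally reported
    shrunk = full - 1_000      # e.g. a controller reserving space at the end
    print(candidate_offsets(full))
    print(candidate_offsets(shrunk))
```

Only the 1.0 entry differs between the two printed dictionaries, which
is the failure case the pointer scheme is meant to cover.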
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-13 5:50 ` Mikael Abrahamsson
@ 2009-11-13 13:04 ` Bill Davidsen
0 siblings, 0 replies; 22+ messages in thread
From: Bill Davidsen @ 2009-11-13 13:04 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: Doug Ledford, Jon Nelson, linux-raid
Mikael Abrahamsson wrote:
> On Thu, 12 Nov 2009, Bill Davidsen wrote:
>
>> I like 1.2, I feel it's least likely to suffer collateral damage, and
>> the problems it causes seem to result in the type of behavior you
>> mention above, the system says "Can't, won't, you don't know what
>> you're doing."
>
> What about adding a new v1.3 superblock which basically has 4
> superblocks, an old 1.x superblock residing at <end>-<v1.0 superblock
> size> (new location), and then pointers to this block residing where
> 1.0, 1.1 and 1.2 superblocks would normally be? Wouldn't that solve
> "everybodys" problem by making it easier to find the superblock
> regardless of what might have happened (drive size changed because of
> 3ware, someone installed mbr on the drive etc).
>
Is it because it's early in the morning and I haven't had coffee, or is
that starting to sound like raid-1 with superblocks? I can't help feeling
that it would increase the chances of something "looking like" a
superblock when it wasn't. Then we could have reshape of superblocks in
--grow. All in all, that idea feels as though it's inviting them to be
different. Imagine an array with partitions, each of which is in an
array (like raid-1+0) with superblocks everywhere.
I'm sure other people will have thoughts on this, but given the problems
we have with mismatch_cnt in mirrors, I wouldn't trust them to stay the
same. And all of them would have to be updated, of course, which makes
for much disk writing.
--
Bill Davidsen <davidsen@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-09 20:22 ` Neil F Brown
2009-11-09 21:00 ` Doug Ledford
@ 2009-11-13 23:54 ` Dan Williams
2009-11-14 3:32 ` Doug Ledford
1 sibling, 1 reply; 22+ messages in thread
From: Dan Williams @ 2009-11-13 23:54 UTC (permalink / raw)
To: Neil F Brown; +Cc: Doug Ledford, linux-raid
On Mon, Nov 9, 2009 at 1:22 PM, Neil F Brown <nfbrown@novell.com> wrote:
> I'm certainly happy with increasing the chunksize to 512K.
Probably good for reads, but it makes it harder for the code to
collect full stripe writes. I guess I should get some data to back
that up one of these days...
--
Dan
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
2009-11-13 23:54 ` Dan Williams
@ 2009-11-14 3:32 ` Doug Ledford
0 siblings, 0 replies; 22+ messages in thread
From: Doug Ledford @ 2009-11-14 3:32 UTC (permalink / raw)
To: Dan Williams; +Cc: Neil F Brown, linux-raid
On 11/13/2009 06:54 PM, Dan Williams wrote:
> On Mon, Nov 9, 2009 at 1:22 PM, Neil F Brown <nfbrown@novell.com> wrote:
>> I'm certainly happy with increasing the chunksize to 512K.
>
> Probably good for reads, but it makes it harder for the code to
> collect full stripe writes. I guess I should get some data to back
> that up one of these days...
My data (which I have, not that I need to get :-P) suggests that it
really doesn't matter. For streaming writes, the buffer cache stores
stuff up long enough to get a stripe write even when the stripe is huge.
For random writes, you don't normally get a full stripe no matter how
long you wait or how small the stripe is. I say this after looking at
the various performance parameters of a timed 5 minute dbench run and
also the random write time and rate of both 4k and 16k tiotest runs to
raid arrays from 4 to 7 disks and with chunk sizes from 256k up to 1024k
using ext2, ext3, ext4, and xfs filesystems. From those test results,
512k was roughly the sweet spot, streaming writes were affected far more
than random writes by chunk size, and both were probably even more
dependent on things other than chunk size (filesystem type and layout
for instance).
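
For a sense of scale, the full-stripe sizes for the configurations
mentioned above can be worked out as below. This is an illustrative
sketch, not part of the test data: it assumes RAID5 (one parity chunk
per stripe), and the function names are made up for the example.

```python
# Illustrative arithmetic only: how much data must be gathered before
# a full-stripe write is possible. Assumes one parity chunk per stripe
# (RAID5); pass parity_disks=2 for RAID6.

def full_stripe_kib(chunk_kib: int, n_disks: int, parity_disks: int = 1) -> int:
    """Data payload of one full stripe, in KiB."""
    data_disks = n_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return chunk_kib * data_disks


def writes_per_stripe(chunk_kib: int, n_disks: int, io_kib: int) -> int:
    """Number of io_kib-sized writes needed to fill one full stripe."""
    return full_stripe_kib(chunk_kib, n_disks) // io_kib


if __name__ == "__main__":
    for chunk in (256, 512, 1024):
        for disks in (4, 7):
            stripe = full_stripe_kib(chunk, disks)
            print(f"chunk={chunk}K disks={disks}: "
                  f"full stripe={stripe}K, "
                  f"4K writes to fill={writes_per_stripe(chunk, disks, 4)}")
```

Even the smallest case here needs hundreds of contiguous 4k writes to
fill a stripe, which is consistent with the observation that random
writes rarely produce a full stripe at any of these chunk sizes.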
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
Thread overview: 22+ messages
2009-11-06 6:45 [ANNOUNCE] mdadm-3.1 has been withdrawn Neil Brown
2009-11-09 14:39 ` Doug Ledford
2009-11-09 15:36 ` berk walker
2009-11-09 15:42 ` Jon Nelson
2009-11-09 16:51 ` Mikael Abrahamsson
2009-11-09 21:07 ` Doug Ledford
2009-11-09 21:27 ` Luca Berra
2009-11-09 21:43 ` Jon Nelson
2009-11-10 8:25 ` Mikael Abrahamsson
2009-11-10 14:22 ` Jon Nelson
2009-11-11 3:26 ` Michael Evans
2009-11-12 22:25 ` Bill Davidsen
2009-11-13 5:50 ` Mikael Abrahamsson
2009-11-13 13:04 ` Bill Davidsen
2009-11-09 20:22 ` Neil F Brown
2009-11-09 21:00 ` Doug Ledford
2009-11-13 23:54 ` Dan Williams
2009-11-14 3:32 ` Doug Ledford
[not found] <nfbrown@novell.com>
2009-11-12 17:51 ` greg
2009-11-12 23:02 ` Rudy Zijlstra
2009-11-13 1:53 ` Michael Evans
2009-11-13 2:02 ` Neil Brown