* [ANNOUNCE] mdadm-3.1 has been withdrawn @ 2009-11-06 6:45 Neil Brown 2009-11-09 14:39 ` Doug Ledford 0 siblings, 1 reply; 22+ messages in thread From: Neil Brown @ 2009-11-06 6:45 UTC (permalink / raw) To: linux-raid Greetings. About a week ago I released mdadm-3.1. I have now 'withdrawn' it, meaning that it doesn't appear on the kernel.org mirrors any more, and I ask people not to use it. The reason is that it is not as reliable at managing a raid[56] reshape as I thought and it can corrupt data too easily. In particular the 'backup' that is taken of the area being reshaped gets restored to the wrong location when the array is stopped in the middle of a reshape and reassembled. If anyone has used mdadm-3.1 to reshape an array and has stopped and restarted the array during that process (and I know some people have) then it is very possible that some data in that filesystem has been corrupted. I would urge you to take whatever measures you can to check for corruption. An fsck at the very least would be advised. The code in the devel-3.1 branch of my git tree (git://neil.brown.name/mdadm) has this bug fixed as well as a number of other improvements. I will probably release it as 3.1.1 some time next week. Note that you need 2.6.32 for most of the reshape operations with the new code. This is because 2.6.31 does not handle a device failure during reshape correctly and a subsequent crash can cause data to be lost. When the needed patches appear in a 2.6.31.y stable kernel, I will adjust the requirement that mdadm imposes. A big "thank you" to everyone who tested out this code and an even bigger apology to anyone who has suffered data loss because of it. NeilBrown ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-06 6:45 [ANNOUNCE] mdadm-3.1 has been withdrawn Neil Brown @ 2009-11-09 14:39 ` Doug Ledford 2009-11-09 15:36 ` berk walker 2009-11-09 20:22 ` Neil F Brown 0 siblings, 2 replies; 22+ messages in thread From: Doug Ledford @ 2009-11-09 14:39 UTC (permalink / raw) To: Neil Brown; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 931 bytes --] On 11/06/2009 01:45 AM, Neil Brown wrote: > > Greetings. > > About a week ago I released mdadm-3.1 > I have now 'withdrawn' it meaning that it doesn't appear on the > kernel.org mirrors any more, and I ask people not to use it. Although the cause for this sucks, I was actually going to suggest that since 3.1 is a version bump, that we take the opportunity to change a few defaults. Like switching to version 1 superblocks instead of version 0 by default. And changing the default chunk size to 512k instead of 64k. The time has simply come for the 0->1 superblock change, and I have a good deal of data showing that for SATA disks at least, the 512k chunk size is the typical sweet spot. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 14:39 ` Doug Ledford @ 2009-11-09 15:36 ` berk walker 2009-11-09 15:42 ` Jon Nelson 2009-11-09 20:22 ` Neil F Brown 1 sibling, 1 reply; 22+ messages in thread From: berk walker @ 2009-11-09 15:36 UTC (permalink / raw) To: Doug Ledford; +Cc: Neil Brown, linux-raid Doug Ledford wrote: > On 11/06/2009 01:45 AM, Neil Brown wrote: >> Greetings. >> >> About a week ago I released mdadm-3.1 >> I have now 'withdrawn' it meaning that it doesn't appear on the >> kernel.org mirrors any more, and I ask people not to use it. > > Although the cause for this sucks, I was actually going to suggest that > since 3.1 is a version bump, that we take the opportunity to change a > few defaults. Like switching to version 1 superblocks instead of > version 0 by default. And changing the default chunk size to 512k > instead of 64k. The time has simply come for the 0->1 superblock > change, and I have a good deal of data showing that for SATA disks at > least, the 512k chunk size is the typical sweet spot. > +1 b- ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 15:36 ` berk walker @ 2009-11-09 15:42 ` Jon Nelson 2009-11-09 16:51 ` Mikael Abrahamsson 0 siblings, 1 reply; 22+ messages in thread From: Jon Nelson @ 2009-11-09 15:42 UTC (permalink / raw) Cc: linux-raid On Mon, Nov 9, 2009 at 9:36 AM, berk walker <berk@panix.com> wrote: > Doug Ledford wrote: >> Although the cause for this sucks, I was actually going to suggest that >> since 3.1 is a version bump, that we take the opportunity to change a >> few defaults. Like switching to version 1 superblocks instead of >> version 0 by default. And changing the default chunk size to 512k >> instead of 64k. The time has simply come for the 0->1 superblock >> change, and I have a good deal of data showing that for SATA disks at >> least, the 512k chunk size is the typical sweet spot. >> > +1 +1 here, too. I've been using 1.1 for everything. What's the current wisdom regarding 1.0 vs 1.1 or 1.2? I used 1.1 because that's also where filesystem metadata usually goes and therefore one might hope that the presence of the md metadata would prevent accidental identification of a raid volume as containing a filesystem. -- Jon -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 15:42 ` Jon Nelson @ 2009-11-09 16:51 ` Mikael Abrahamsson 2009-11-09 21:07 ` Doug Ledford 0 siblings, 1 reply; 22+ messages in thread From: Mikael Abrahamsson @ 2009-11-09 16:51 UTC (permalink / raw) To: Jon Nelson; +Cc: linux-raid On Mon, 9 Nov 2009, Jon Nelson wrote: > I've been using 1.1 for everything. What's the current wisdom > regarding 1.0 vs 1.1 or 1.2? > I used 1.1 because that's also where filesystem metadata usually goes > and therefore one might hope that the presence of the md metadata > would prevent accidental identification of a raid volume as containing > a filesystem. I like 1.2 because if you happen to write an MBR or something to the drive, you don't lose the superblock. With 1.2 I can also take the drive from a 3ware hw-raid (single drive in 3ware bios) and put in a non-3ware (because the 3ware stores the superblock at the end, so when you put it in a non-3ware the end has now changed). -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 16:51 ` Mikael Abrahamsson @ 2009-11-09 21:07 ` Doug Ledford 2009-11-09 21:27 ` Luca Berra ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Doug Ledford @ 2009-11-09 21:07 UTC (permalink / raw) To: Mikael Abrahamsson; +Cc: Jon Nelson, linux-raid [-- Attachment #1: Type: text/plain, Size: 1939 bytes --] On 11/09/2009 11:51 AM, Mikael Abrahamsson wrote: > On Mon, 9 Nov 2009, Jon Nelson wrote: > >> I've been using 1.1 for everything. What's the current wisdom >> regarding 1.0 vs 1.1 or 1.2? >> I used 1.1 because that's also where filesystem metadata usually goes >> and therefore one might hope that the presence of the md metadata >> would prevent accidental identification of a raid volume as containing >> a filesystem. > > I like 1.2 because if you happen to write an MBR or something to the > drive, you don't lose the superblock. Of course, I recently had a bug report that I ended closing out as NOTABUG because of this very ability. The person had arrays with 1.2 superblocks, and they went to add a new disk, and all the existing disks had a specific partition layout, so he copied that to the new disk, then tried to add the partition to the raid array. It kept returning "device too small for array". Then, upon inspection, we come to see he has a 1.2 superblock on the *entire* drive, which left the partition table intact, but the partition table is *pointless* because the array is on the whole disk devices. This sort of confusion is bad. So, while I could see making it 1.2 for partitions (so that boot sectors won't overwrite the superblock), I wouldn't make it 1.2 for whole disk devices, and in fact it might be wise to refuse to create 1.2 superblocks on whole disk devices. Just a thought. 
> With 1.2 I can also take the drive from a 3ware hw-raid (single drive in > 3ware bios) and put in a non-3ware (because the 3ware stores the > superblock at the end, so when you put it in a non-3ware the end has now > changed). 1.1 should work just as well for this. -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 21:07 ` Doug Ledford @ 2009-11-09 21:27 ` Luca Berra 2009-11-09 21:43 ` Jon Nelson 2009-11-10 8:25 ` Mikael Abrahamsson 2009-11-12 22:25 ` Bill Davidsen 2 siblings, 1 reply; 22+ messages in thread From: Luca Berra @ 2009-11-09 21:27 UTC (permalink / raw) To: linux-raid On Mon, Nov 09, 2009 at 04:07:09PM -0500, Doug Ledford wrote: >overwrite the superblock), I wouldn't make it 1.2 for whole disk >devices, and in fact it might be wise to refuse to create 1.2 >superblocks on whole disk devices. Just a thought. I am against refusing to do things because users could get confused, I could agree if this would require a force flag, but not deny completely. I would think 1.1 is a good option for default. Better than 1.0 for reasons we discussed to boredom, and 1.2 is really only for special cases. i also agree on the default chunk size bump -- Luca Berra -- bluca@comedia.it Communication Media & Services S.r.l. /"\ \ / ASCII RIBBON CAMPAIGN X AGAINST HTML MAIL / \ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 21:27 ` Luca Berra @ 2009-11-09 21:43 ` Jon Nelson 0 siblings, 0 replies; 22+ messages in thread From: Jon Nelson @ 2009-11-09 21:43 UTC (permalink / raw) To: linux-raid On Mon, Nov 9, 2009 at 3:27 PM, Luca Berra <bluca@comedia.it> wrote: > On Mon, Nov 09, 2009 at 04:07:09PM -0500, Doug Ledford wrote: >> >> overwrite the superblock), I wouldn't make it 1.2 for whole disk >> devices, and in fact it might be wise to refuse to create 1.2 >> superblocks on whole disk devices. Just a thought. > > I am against refusing to do things because users could get confused, > I could agree if this would require a force flag, but not deny > completely. Agree. > I would think 1.1 is a good option for default. Better than 1.0 for > reasons we discussed to boredom, and 1.2 is really only for special > cases. Agree. > i also agree on the default chunk size bump Agree. -- Jon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 21:07 ` Doug Ledford 2009-11-09 21:27 ` Luca Berra @ 2009-11-10 8:25 ` Mikael Abrahamsson 2009-11-10 14:22 ` Jon Nelson 2009-11-12 22:25 ` Bill Davidsen 2 siblings, 1 reply; 22+ messages in thread From: Mikael Abrahamsson @ 2009-11-10 8:25 UTC (permalink / raw) To: linux-raid On Mon, 9 Nov 2009, Doug Ledford wrote: > Of course, I recently had a bug report that I ended closing out as > NOTABUG because of this very ability. The person had arrays with 1.2 > superblocks, and they went to add a new disk, and all the existing disks > had a specific partition layout, so he copied that to the new disk, then > tried to add the partition to the raid array. It kept returning "device > too small for array". Then, upon inspection, we come to see he has a > 1.2 superblock on the *entire* drive, which left the partition table > intact, but the partition table is *pointless* because the array is on > the whole disk devices. This sort of confusion is bad. So, while I > could see making it 1.2 for partitions (so that boot sectors won't > overwrite the superblock), I wouldn't make it 1.2 for whole disk > devices, and in fact it might be wise to refuse to create 1.2 > superblocks on whole disk devices. Just a thought. Well, same thing there, if you create a partition table you don't break the superblock. Perhaps something needs to be able to discern between the superblock being "whole disk" and on a partition? Personally I put 1.2 on "whole disk" (no partition table at all), and I would really HATE this possibility going away. I like it the way it is and feel comfortable with it and I don't want 1.0 or 1.1 superblocks in my setup. -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-10 8:25 ` Mikael Abrahamsson @ 2009-11-10 14:22 ` Jon Nelson 2009-11-11 3:26 ` Michael Evans 0 siblings, 1 reply; 22+ messages in thread From: Jon Nelson @ 2009-11-10 14:22 UTC (permalink / raw) Cc: linux-raid On Tue, Nov 10, 2009 at 2:25 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote: > On Mon, 9 Nov 2009, Doug Ledford wrote: > >> Of course, I recently had a bug report that I ended closing out as NOTABUG >> because of this very ability. The person had arrays with 1.2 superblocks, >> and they went to add a new disk, and all the existing disks had a specific >> partition layout, so he copied that to the new disk, then tried to add the >> partition to the raid array. It kept returning "device too small for >> array". Then, upon inspection, we come to see he has a 1.2 superblock on >> the *entire* drive, which left the partition table intact, but the partition >> table is *pointless* because the array is on the whole disk devices. This >> sort of confusion is bad. So, while I could see making it 1.2 for >> partitions (so that boot sectors won't overwrite the superblock), I wouldn't >> make it 1.2 for whole disk devices, and in fact it might be wise to refuse >> to create 1.2 superblocks on whole disk devices. Just a thought. > > Well, same thing there, if you create a partition table you don't break the > superblock. Perhaps something needs to be able to discern between the > superblock being "whole disk" and on a partition? Personally I put 1.2 on > "whole disk" (no partition table at all), and I would really HATE this > possibility going away. I like it the way it is and feel comfortable with it > and I don't want 1.0 or 1.1 superblocks in my setup. Since I almost always use partitions (this way, the partition *type* is "Linux RAID") I largely avoid this issue. 
-- Jon ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-10 14:22 ` Jon Nelson @ 2009-11-11 3:26 ` Michael Evans 0 siblings, 0 replies; 22+ messages in thread From: Michael Evans @ 2009-11-11 3:26 UTC (permalink / raw) To: Jon Nelson; +Cc: linux-raid There is nothing preventing someone from first creating a protective partition, similar to the MBR record used by GPT. Then they would be able to use the 4k offset 1.2 label on the device if they absolutely wanted. However a normal MBR with partition and 1.1 label would use less disk space and be more compatible with other tools. Similar logic applies for GPT labeled drives, and any drive large enough to require GPT should not miss the ~16 kilobytes required at each end of the drive. ^ permalink raw reply [flat|nested] 22+ messages in thread
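[Editorial note] The placement differences the thread keeps returning to (1.0 at the end of the device, 1.1 at the start, 1.2 at a 4k offset) can be sketched as below. This is only an illustration, not mdadm code: the function name `superblock_offset_sectors` and the device sizes are hypothetical, and the 0.90 and 1.0 rounding rules are stated as assumptions about the md on-disk formats rather than anything spelled out in these messages.

```python
SECTOR_BYTES = 512  # assumption: device sizes are counted in 512-byte sectors


def superblock_offset_sectors(version: str, dev_sectors: int) -> int:
    """Illustrative only: sector at which the md superblock would start."""
    if version == "0.90":
        reserved = 128  # 64 KiB expressed in sectors
        # Round the device size down to a 64 KiB boundary, then step back 64 KiB.
        return (dev_sectors & ~(reserved - 1)) - reserved
    if version == "1.0":
        # 8 KiB back from the end of the device, rounded down to a 4 KiB boundary.
        return (dev_sectors - 16) & ~7
    if version == "1.1":
        return 0  # very start of the device
    if version == "1.2":
        return 8  # 4 KiB from the start of the device
    raise ValueError(f"unknown metadata version: {version!r}")


full = 1_000_000      # hypothetical device size in sectors
shrunk = full - 2048  # the same drive with 1 MiB hidden at the end
for v in ("0.90", "1.0", "1.1", "1.2"):
    print(v, superblock_offset_sectors(v, full), superblock_offset_sectors(v, shrunk))
```

The two columns differ only for the end-of-device formats, which is the 3ware case Mikael describes: when the visible end of the drive moves, a superblock located relative to the end is no longer where the tools look for it, while 1.1 and 1.2 stay put.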
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 21:07 ` Doug Ledford 2009-11-09 21:27 ` Luca Berra 2009-11-10 8:25 ` Mikael Abrahamsson @ 2009-11-12 22:25 ` Bill Davidsen 2009-11-13 5:50 ` Mikael Abrahamsson 2 siblings, 1 reply; 22+ messages in thread From: Bill Davidsen @ 2009-11-12 22:25 UTC (permalink / raw) To: Doug Ledford; +Cc: Mikael Abrahamsson, Jon Nelson, linux-raid Doug Ledford wrote: > On 11/09/2009 11:51 AM, Mikael Abrahamsson wrote: > >> On Mon, 9 Nov 2009, Jon Nelson wrote: >> >> >>> I've been using 1.1 for everything. What's the current wisdom >>> regarding 1.0 vs 1.1 or 1.2? >>> I used 1.1 because that's also where filesystem metadata usually goes >>> and therefore one might hope that the presence of the md metadata >>> would prevent accidental identification of a raid volume as containing >>> a filesystem. >>> >> I like 1.2 because if you happen to write an MBR or something to the >> drive, you don't lose the superblock. >> > > Of course, I recently had a bug report that I ended closing out as > NOTABUG because of this very ability. The person had arrays with 1.2 > superblocks, and they went to add a new disk, and all the existing disks > had a specific partition layout, so he copied that to the new disk, then > tried to add the partition to the raid array. It kept returning "device > too small for array". Then, upon inspection, we come to see he has a > 1.2 superblock on the *entire* drive, which left the partition table > intact, but the partition table is *pointless* because the array is on > the whole disk devices. This sort of confusion is bad. So, while I > could see making it 1.2 for partitions (so that boot sectors won't > overwrite the superblock), I wouldn't make it 1.2 for whole disk > devices, and in fact it might be wise to refuse to create 1.2 > superblocks on whole disk devices. Just a thought. > > I'm trying to wrap my head around this recommendation, and not doing well. 
The end of the allocation area (partition, disk, array, whatever) seems to be what users hit when they do a dd or some similar operation without understanding it. And the front end is what they hit when they "fix" the MBR or add a partition table because something said it was missing. As for your friend, nothing is foolproof, and unless he tried very hard he probably failed to damage anything in a way which couldn't be readily fixed. I like 1.2, I feel it's least likely to suffer collateral damage, and the problems it causes seem to result in the type of behavior you mention above, the system says "Can't, won't, you don't know what you're doing." -- Bill Davidsen <davidsen@tmr.com> "We can't solve today's problems by using the same thinking we used in creating them." - Einstein ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-12 22:25 ` Bill Davidsen @ 2009-11-13 5:50 ` Mikael Abrahamsson 2009-11-13 13:04 ` Bill Davidsen 0 siblings, 1 reply; 22+ messages in thread From: Mikael Abrahamsson @ 2009-11-13 5:50 UTC (permalink / raw) To: Bill Davidsen; +Cc: Doug Ledford, Jon Nelson, linux-raid On Thu, 12 Nov 2009, Bill Davidsen wrote: > I like 1.2, I feel it's least likely to suffer collateral damage, and > the problems it causes seem to result in the type of behavior you > mention above, the system says "Can't, won't, you don't know what you're > doing." What about adding a new v1.3 superblock which basically has 4 superblocks, an old 1.x superblock residing at <end>-<v1.0 superblock size> (new location), and then pointers to this block residing where 1.0, 1.1 and 1.2 superblocks would normally be? Wouldn't that solve "everybody's" problem by making it easier to find the superblock regardless of what might have happened (drive size changed because of 3ware, someone installed mbr on the drive etc). -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-13 5:50 ` Mikael Abrahamsson @ 2009-11-13 13:04 ` Bill Davidsen 0 siblings, 0 replies; 22+ messages in thread From: Bill Davidsen @ 2009-11-13 13:04 UTC (permalink / raw) To: Mikael Abrahamsson; +Cc: Doug Ledford, Jon Nelson, linux-raid Mikael Abrahamsson wrote: > On Thu, 12 Nov 2009, Bill Davidsen wrote: > >> I like 1.2, I feel it's least likely to suffer collateral damage, and >> the problems it causes seem to result in the type of behavior you >> mention aboue, the system says "Can't, won't, you don't know what >> you're doing." > > What about adding a new v1.3 superblock which basically has 4 > superblocks, an old 1.x superblock residing at <end>-<v1.0 superblock > size> (new location), and then pointers to this block residing where > 1.0, 1.1 and 1.2 superblocks would normally be? Wouldn't that solve > "everybodys" problem by making it easier to find the superblock > regardless of what might have happened (drive size changed because of > 3ware, someone installed mbr on the drive etc). > Is it because it's early in the morning and I haven't had coffee, or is that starting to sound like raid-1 with superblocks? I just have to feel that it would increase the chances of something "looking like" a superblock, but wasn't. Then we could have reshape of superblocks in --grow, all in all that idea feels as though it's inviting them to be different. Imagine an array with partitions, each of which is in an array (like raid-1+0) with superblocks everywhere. I'm sure other people will have thoughts on this, but given the problems we have with mismatch_cnt in mirrors, I wouldn't trust them to stay the same. And all would have to be updated, of course, makes for much disk writing. -- Bill Davidsen <davidsen@tmr.com> "We can't solve today's problems by using the same thinking we used in creating them." - Einstein ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 14:39 ` Doug Ledford 2009-11-09 15:36 ` berk walker @ 2009-11-09 20:22 ` Neil F Brown 2009-11-09 21:00 ` Doug Ledford 2009-11-13 23:54 ` Dan Williams 1 sibling, 2 replies; 22+ messages in thread From: Neil F Brown @ 2009-11-09 20:22 UTC (permalink / raw) To: Doug Ledford; +Cc: linux-raid On Tue, November 10, 2009 1:39 am, Doug Ledford wrote: > On 11/06/2009 01:45 AM, Neil Brown wrote: >> >> Greetings. >> >> About a week ago I released mdadm-3.1 >> I have now 'withdrawn' it meaning that it doesn't appear on the >> kernel.org mirrors any more, and I ask people not to use it. > > Although the cause for this sucks, I was actually going to suggest that > since 3.1 is a version bump, that we take the opportunity to change a > few defaults. Like switching to version 1 superblocks instead of > version 0 by default. And changing the default chunk size to 512k > instead of 64k. The time has simply come for the 0->1 superblock > change, and I have a good deal of data showing that for SATA disks at > least, the 512k chunk size is the typical sweet spot. I had been toying with that idea myself - certainly of changing the defaults soon. I'm tempted to make the default metadata "1.1" though possibly not for RAID1. For RAID0,4,5,6,10 there is no value in having the metadata at the end of the device. For RAID1 there is as it makes booting off any member easier. Thoughts? I'm certainly happy with increasing the chunksize to 512K. NeilBrown ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 20:22 ` Neil F Brown @ 2009-11-09 21:00 ` Doug Ledford 2009-11-13 23:54 ` Dan Williams 1 sibling, 0 replies; 22+ messages in thread From: Doug Ledford @ 2009-11-09 21:00 UTC (permalink / raw) To: Neil F Brown; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 1940 bytes --] On 11/09/2009 03:22 PM, Neil F Brown wrote: > On Tue, November 10, 2009 1:39 am, Doug Ledford wrote: >> On 11/06/2009 01:45 AM, Neil Brown wrote: >>> >>> Greetings. >>> >>> About a week ago I released mdadm-3.1 >>> I have now 'withdrawn' it meaning that it doesn't appear on the >>> kernel.org mirrors any more, and I ask people not to use it. >> >> Although the cause for this sucks, I was actually going to suggest that >> since 3.1 is a version bump, that we take the opportunity to change a >> few defaults. Like switching to version 1 superblocks instead of >> version 0 by default. And changing the default chunk size to 512k >> instead of 64k. The time has simply come for the 0->1 superblock >> change, and I have a good deal of data showing that for SATA disks at >> least, the 512k chunk size is the typical sweet spot. > > I had been toying with that idea myself - certainly of changing the > defaults soon. > I'm tempted to make the default metadata "1.1" though possibly not for > RAID1. For RAID0,4,5,6,10 there is no value in having the metadata > at the end of the device. For RAID1 there is as it makes booting off > any member easier. > Thoughts? While it makes booting off of a raid1 easier, raid1 is *precisely* the level that is prone to silent data corruption due to the individual members being able to be mounted while not part of a running raid array. I would make it default to 1.1 period, and force distros or other people to either A) update the boot loader to something that can handle a 1.1 superblock (grub2 should be able to) or B) manually set it to 1.0 instead. > > I'm certainly happy with increasing the chunksize to 512K. 
> > NeilBrown -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-09 20:22 ` Neil F Brown 2009-11-09 21:00 ` Doug Ledford @ 2009-11-13 23:54 ` Dan Williams 2009-11-14 3:32 ` Doug Ledford 1 sibling, 1 reply; 22+ messages in thread From: Dan Williams @ 2009-11-13 23:54 UTC (permalink / raw) To: Neil F Brown; +Cc: Doug Ledford, linux-raid On Mon, Nov 9, 2009 at 1:22 PM, Neil F Brown <nfbrown@novell.com> wrote: > I'm certainly happy with increasing the chunksize to 512K. Probably good for reads, but it makes it harder for the code to collect full stripe writes. I guess I should get some data to back that up one of these days... -- Dan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-13 23:54 ` Dan Williams @ 2009-11-14 3:32 ` Doug Ledford 0 siblings, 0 replies; 22+ messages in thread From: Doug Ledford @ 2009-11-14 3:32 UTC (permalink / raw) To: Dan Williams; +Cc: Neil F Brown, linux-raid [-- Attachment #1: Type: text/plain, Size: 1454 bytes --] On 11/13/2009 06:54 PM, Dan Williams wrote: > On Mon, Nov 9, 2009 at 1:22 PM, Neil F Brown <nfbrown@novell.com> wrote: >> I'm certainly happy with increasing the chunksize to 512K. > > Probably good for reads, but it makes it harder for the code to > collect full stripe writes. I guess I should get some data to back > that up one of these days... My data (which I have, not that I need to get :-P) suggests that it really doesn't matter. For streaming writes, the buffer cache stores stuff up long enough to get a stripe write even when the stripe is huge. For random writes, you don't normally get a full stripe no matter how long you wait or how small the stripe is. I say this after looking at the various performance parameters of a timed 5 minute dbench run and also the random write time and rate of both 4k and 16k tiotest runs to raid arrays from 4 to 7 disks and with chunk sizes from 256k up to 1024k using ext2, ext3, ext4, and xfs filesystems. From those test results, 512k was roughly the sweet spot, streaming writes were affected far more than random writes by chunk size, and both were probably even more dependent on things other than chunk size (filesystem type and layout for instance). -- Doug Ledford <dledford@redhat.com> GPG KeyID: CFBFF194 http://people.redhat.com/dledford Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband ^ permalink raw reply [flat|nested] 22+ messages in thread
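[Editorial note] To put rough numbers on the full-stripe question Dan and Doug are discussing, here is a small sketch of how much data a single full-stripe write covers. It assumes a parity layout with one parity disk (RAID5) unless told otherwise; the thread does not say which raid level Doug's tests used, and the helper name `full_stripe_kib` is made up for this illustration.

```python
def full_stripe_kib(chunk_kib: int, n_disks: int, parity_disks: int = 1) -> int:
    """Data covered by one full stripe, in KiB (illustrative helper)."""
    data_disks = n_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    # Each data disk contributes one chunk to the stripe.
    return chunk_kib * data_disks


# Chunk sizes mentioned in the thread: the old 64k default and the 256k-1024k test range.
for chunk in (64, 256, 512, 1024):
    for disks in (4, 7):
        print(f"chunk={chunk}k disks={disks}: full stripe = {full_stripe_kib(chunk, disks)}k")
```

With the assumed single parity disk, a 4-disk array at 512k chunks needs 1536k of contiguous dirty data for a full-stripe write, against 192k at the old 64k default. That is the amount the buffer cache has to accumulate for Doug's streaming-write observation to hold.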
[parent not found: <nfbrown@novell.com>]
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn @ 2009-11-12 17:51 ` greg 2009-11-12 23:02 ` Rudy Zijlstra 2009-11-13 2:02 ` Neil Brown 0 siblings, 2 replies; 22+ messages in thread From: greg @ 2009-11-12 17:51 UTC (permalink / raw) To: Neil F Brown, Doug Ledford; +Cc: linux-raid On Nov 10, 7:22am, "Neil F Brown" wrote: } Subject: Re: [ANNOUNCE] mdadm-3.1 has been withdrawn Good morning to everyone, hope the week is progressing well. > On Tue, November 10, 2009 1:39 am, Doug Ledford wrote: > > On 11/06/2009 01:45 AM, Neil Brown wrote: > >> > >> Greetings. > >> > >> About a week ago I released mdadm-3.1 > >> I have now 'withdrawn' it meaning that it doesn't appear on the > >> kernel.org mirrors any more, and I ask people not to use it. > > > > Although the cause for this sucks, I was actually going to suggest that > > since 3.1 is a version bump, that we take the opportunity to change a > > few defaults. Like switching to version 1 superblocks instead of > > version 0 by default. And changing the default chunk size to 512k > > instead of 64k. The time has simply come for the 0->1 superblock > > change, and I have a good deal of data showing that for SATA disks at > > least, the 512k chunk size is the typical sweet spot. > I had been toying with that idea myself - certainly of changing the > defaults soon. I'm tempted to make the default metadata "1.1" > though possibly not for RAID1. For RAID0,4,5,6,10 there is no value > in having the metadata at the end of the device. For RAID1 there is > as it makes booting off any member easier. Thoughts? It may be heresy but I would suggest that if the defaults change we should also implement support for auto-starting version 1.x devices, or some appropriate subset of them. I understand and appreciate the concerns of the userspace start community. 
However, we do a lot of storage on very dedicated systems and I have spent far more time unsnarling systems with blown initrd/initramfs setups and other boot issues than I have recovering from starting RAID volumes on the wrong box. That's why I don't let udev anywhere near production machines and I am still living on 0.9 metadata in spite of its limitations. UNIX has always been about allowing people to shoot themselves in the foot if they so desire. I think an acceptable compromise would be to move toward a default of disabled auto-detection with the option to turn on detection of all meta-data types if people choose to do that. > NeilBrown Best wishes for a pleasant weekend to everyone. }-- End of excerpt from "Neil F Brown" As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in information infra-structure Fargo, ND 58102 development. PH: 701-281-1686 FAX: 701-281-3949 EMAIL: greg@enjellic.com ------------------------------------------------------------------------------ "Join in the new game that's sweeping the country. It's called `Bureaucracy`. Everybody stands in a circle. The first person to do anything loses." -- Steve RTFM Przepiora ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn 2009-11-12 17:51 ` greg @ 2009-11-12 23:02 ` Rudy Zijlstra 2009-11-13 1:53 ` Michael Evans 2009-11-13 2:02 ` Neil Brown 1 sibling, 1 reply; 22+ messages in thread From: Rudy Zijlstra @ 2009-11-12 23:02 UTC (permalink / raw) To: greg; +Cc: Neil F Brown, Doug Ledford, linux-raid greg@enjellic.com wrote: > On Nov 10, 7:22am, "Neil F Brown" wrote: > } Subject: Re: [ANNOUNCE] mdadm-3.1 has been withdrawn > > Good morning to everyone, hope the week is progressing well. > > <snip> > It may be heresy but I would suggest that if the defaults change we > should also implement support for auto-starting version 1.x devices, > or some appropriate subset of them. > > I understand and appreciate the concerns of the userspace start > community. However, we do a lot of storage on very dedicated systems > and I have spent far more time unsnarling systems with blown > initrd/initramfs setups and other boot issues than I have recovering > from starting RAID volumes on the wrong box. Thats why I don't let > udev anywhere near production machines and I am still living on 0.9 > metadata in spite of its limitations. > > UNIX has always been about allowing people to shoot themselves in the > foot if they so desire. I think an acceptable compromise would be to > move toward a default of disabled auto-detection with the option to > turn on detection of all meta-data types if people choose to do that. > > +1 Cheers, Rudy ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
  2009-11-12 23:02 ` Rudy Zijlstra
@ 2009-11-13  1:53 ` Michael Evans
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Evans @ 2009-11-13 1:53 UTC
To: Rudy Zijlstra; +Cc: greg, Neil F Brown, Doug Ledford, linux-raid

You still have to have some part of your boot process served by normal
startup support (like a hardware RAID or motherboard-supported
fakeraid, unless you're doing some kind of netboot).  No matter what
you do, you still need a kernel exposed, and you may as well have an
initrd of some kind after it.

What I'd likely do is use GPT and place your bootloader inside of a
GUID Partition Table area, preferably near the front of the disk so
you're assured to be within the real-mode BIOS LBA range (just in case
48-bit LBA isn't supported by your bootloader/BIOS), which might end at
128GB of data:
http://ubuntuforums.org/archive/index.php/t-301826.html
Then set it up so that the boot-loader code in the first 440 (446)
bytes of the MBR compatibility/protective label loads the blocks for
the real bootloader from that area.  Then you can use dd to duplicate
the first 446 bytes onto each other 'mirror' device and either have
those boot devices as a mirror set, or intentionally only manually
update the backups when you've tested a new kernel.

< Free Open Source Plug >

If for some reason your current distribution's initrd/initramfs
doesn't do what you want, I know of an easily customized alternative:
"Another Early Userspace Init Option"
http://sourceforge.net/projects/aeuio/
It is based on basic /bin/sh, awk and sed; it builds best when you
have a local copy of busybox, but should also build (a somewhat
larger) initrd using the other binaries on your system.
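
For anyone who wants to try the "duplicate the first 446 bytes" step
above on scratch images before touching real disks, here is a minimal
sketch in Python.  The function names, the image file names and the
512-byte sector size are my assumptions for illustration; nothing here
comes from mdadm or from any particular bootloader.  It copies only
the boot code area and leaves the rest of the target (partition table,
boot signature, data) alone, and it also works out where a 28-bit LBA
limit lands with 512-byte sectors.

```python
import os
import tempfile

BOOT_CODE_LEN = 446   # boot code area that precedes the MBR partition table
SECTOR_SIZE = 512     # assumed logical sector size


def lba28_limit_bytes(sector_size=SECTOR_SIZE):
    """Bytes addressable with 28-bit LBA, i.e. without 48-bit LBA support."""
    return (2 ** 28) * sector_size


def copy_boot_code(src_path, dst_path, length=BOOT_CODE_LEN):
    """Overwrite the first `length` bytes of dst with those of src.

    Everything after `length` on dst is left untouched.
    """
    with open(src_path, "rb") as src:
        boot_code = src.read(length)
    if len(boot_code) != length:
        raise ValueError("source is shorter than the boot code area")
    # "r+b" opens for in-place update, so the target is not truncated.
    with open(dst_path, "r+b") as dst:
        dst.seek(0)
        dst.write(boot_code)
        dst.flush()
        os.fsync(dst.fileno())


if __name__ == "__main__":
    # Demonstrate on throwaway image files, not on real devices.
    with tempfile.TemporaryDirectory() as tmp:
        primary = os.path.join(tmp, "primary.img")   # hypothetical names
        mirror = os.path.join(tmp, "mirror.img")
        with open(primary, "wb") as f:
            f.write(b"\xAA" * SECTOR_SIZE)
        with open(mirror, "wb") as f:
            f.write(b"\x55" * SECTOR_SIZE)
        copy_boot_code(primary, mirror)
        with open(mirror, "rb") as f:
            sector = f.read()
        print("boot code copied:", sector[:BOOT_CODE_LEN] == b"\xAA" * BOOT_CODE_LEN)
        print("rest untouched:", sector[BOOT_CODE_LEN:] == b"\x55" * (SECTOR_SIZE - BOOT_CODE_LEN))
        print("28-bit LBA limit:", lba28_limit_bytes() // 2 ** 30, "GiB")
```

Stopping at 446 bytes matters because each mirror keeps its own
partition table in the bytes that follow; copying the whole first
sector would overwrite it.  On real hardware you would point this at
the block devices as root, which is roughly what a dd of one 446-byte
block does.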
* Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
  2009-11-12 17:51 ` greg
  2009-11-12 23:02 ` Rudy Zijlstra
@ 2009-11-13  2:02 ` Neil Brown
  1 sibling, 0 replies; 22+ messages in thread
From: Neil Brown @ 2009-11-13 2:02 UTC
To: greg; +Cc: Doug Ledford, linux-raid

On Thursday November 12, greg@enjellic.com wrote:
> On Nov 10,  7:22am, "Neil F Brown" wrote:
> } Subject: Re: [ANNOUNCE] mdadm-3.1 has been withdrawn
>
> Good morning to everyone, hope the week is progressing well.
>
> > On Tue, November 10, 2009 1:39 am, Doug Ledford wrote:
> > > On 11/06/2009 01:45 AM, Neil Brown wrote:
> > >>
> > >> Greetings.
> > >>
> > >> About a week ago I released mdadm-3.1
> > >> I have now 'withdrawn' it meaning that it doesn't appear on the
> > >> kernel.org mirrors any more, and I ask people not to use it.
> > >
> > > Although the cause for this sucks, I was actually going to suggest that
> > > since 3.1 is a version bump, that we take the opportunity to change a
> > > few defaults.  Like switching to version 1 superblocks instead of
> > > version 0 by default.  And changing the default chunk size to 512k
> > > instead of 64k.  The time has simply come for the 0->1 superblock
> > > change, and I have a good deal of data showing that for SATA disks at
> > > least, the 512k chunk size is the typical sweet spot.
>
> > I had been toying with that idea myself - certainly of changing the
> > defaults soon.  I'm tempted to make the default metadata "1.1"
> > though possibly not for RAID1.  For RAID0,4,5,6,10 there is no value
> > in having the metadata at the end of the device.  For RAID1 there is
> > as it makes booting off any member easier.  Thoughts?
>
> It may be heresy but I would suggest that if the defaults change we
> should also implement support for auto-starting version 1.x devices,
> or some appropriate subset of them.

Yep, definite heresy. :-)

>
> I understand and appreciate the concerns of the userspace start
> community.  However, we do a lot of storage on very dedicated systems
> and I have spent far more time unsnarling systems with blown
> initrd/initramfs setups and other boot issues than I have recovering
> from starting RAID volumes on the wrong box.  That's why I don't let
> udev anywhere near production machines and I am still living on 0.9
> metadata in spite of its limitations.

You don't need udev for user-space md startup.
You do need initramfs, but I think you need that for lots of things
these days.  It doesn't need to be a very complicated initramfs.

>
> UNIX has always been about allowing people to shoot themselves in the
> foot if they so desire.  I think an acceptable compromise would be to
> move toward a default of disabled auto-detection with the option to
> turn on detection of all meta-data types if people choose to do that.
>

You have the source code and you are quite welcome to shoot yourself
in whichever limb you please with it.

In-kernel autostart of v1.x arrays could probably be implemented
without too much difficulty.  I am not likely to do it though.
If someone sent me nice patches which enabled it as a CONFIG option
I might accept them, provided a good justification was included.
But I don't think it is a good idea.

If you really really want to avoid an initramfs, then just use a 0.90
array for the device holding the root filesystem.  Everything other
than root can be started by init scripts.  Or do you want to avoid
init scripts too because they are too easy to get wrong :-)

> > NeilBrown
>
> Best wishes for a pleasant weekend to everyone.
>

Thank you, and the same to you!

NeilBrown
end of thread, other threads:[~2009-11-14 3:32 UTC | newest]
Thread overview: 22+ messages
2009-11-06 6:45 [ANNOUNCE] mdadm-3.1 has been withdrawn Neil Brown
2009-11-09 14:39 ` Doug Ledford
2009-11-09 15:36 ` berk walker
2009-11-09 15:42 ` Jon Nelson
2009-11-09 16:51 ` Mikael Abrahamsson
2009-11-09 21:07 ` Doug Ledford
2009-11-09 21:27 ` Luca Berra
2009-11-09 21:43 ` Jon Nelson
2009-11-10 8:25 ` Mikael Abrahamsson
2009-11-10 14:22 ` Jon Nelson
2009-11-11 3:26 ` Michael Evans
2009-11-12 22:25 ` Bill Davidsen
2009-11-13 5:50 ` Mikael Abrahamsson
2009-11-13 13:04 ` Bill Davidsen
2009-11-09 20:22 ` Neil F Brown
2009-11-09 21:00 ` Doug Ledford
2009-11-13 23:54 ` Dan Williams
2009-11-14 3:32 ` Doug Ledford
[not found] <nfbrown@novell.com>
2009-11-12 17:51 ` greg
2009-11-12 23:02 ` Rudy Zijlstra
2009-11-13 1:53 ` Michael Evans
2009-11-13 2:02 ` Neil Brown