From: Goswin von Brederlow <goswin-v-b@web.de>
To: Neil Brown <neilb@suse.de>
Cc: Daniel Reurich <daniel@centurion.net.nz>,
Dan Williams <dan.j.williams@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Goswin von Brederlow <goswin-v-b@web.de>,
linux-raid@vger.kernel.org
Subject: Re: md extension to support booting from raid whole disks.
Date: Fri, 01 May 2009 23:33:35 +0200
Message-ID: <87d4asld8w.fsf@frosties.localdomain>
In-Reply-To: <18935.35747.471257.202356@notabene.brown> (Neil Brown's message of "Wed, 29 Apr 2009 09:05:07 +1000")
Neil Brown <neilb@suse.de> writes:
> On Wednesday April 29, daniel@centurion.net.nz wrote:
>> On Tue, 2009-04-28 at 11:24 -0700, Dan Williams wrote:
>>
>> >
>> > ...or use a metadata format that your platform bios understands and
>> > provides an int 13h vector. See the new external metadata formats
>> > supported by the mdadm devel-3.0 branch.
>>
>> I don't think a metadata format is the right way either.
>>
>> What we need is a new version of the superblock with the first cylinder
>> (32kb on 512b sectors x64 sectors per cylinder) being set aside for the
>> bootloader, the superblock and w-i bitmap go in the second cylinder, and
>> the raid data area starting in the 3rd cylinder.
>>
>> It should be the bootloaders responsibility to install the bootloader
>> onto the disks 1st cylinder, but md/mdadm would have to replicate it on
>> resync or adding of a new disk. However we could consider remapping the
>> bootloader
>
> While I agree with Dan that having a BIOS which understands RAID is a
> good way to make this sort of thing "just work", I would be nice if it
> could work for people without the bios too.
>
> v1.x metadata has explicit knowledge of where the start of the data
> is, so it is quite possible to leave the first few (dozen) sectors
> unused (let's not talk about cylinders this century - OK?).
> So mdadm could grow a --grub flag to use with --create which arranged
> for data/bitmap to not use the first (say) 512 sectors of any device.
> (1.1 and 1.2 would still use reserved blocks for the superblock).
> [I can cut you a patch to experiment with if you like]
>
> grub could then write whatever it wants to write to any of these
> sectors.
Actually, there you touch on a very good point. How is grub supposed to
write the data anyway? Initially I thought the proposal was to have

  sda   sdb   sdc   sdd   md0
  0     0     0     0     0      (raid1)
  1     1     1     1     1
  2     2     2     2     2
  3     3     3     3     3
  ..
  meta  meta  meta  meta  -
  meta  meta  meta  meta  -
  64    65    66    xor   64-66  (raid5)
  67    68    69    xor   67-69
  ...

I.e. at the beginning of the md0 device there would be a chunk with
raid1 that is also at the beginning of the raw devices. Then the
metadata followed by normal raid5 stripes. Grub would then install to
/dev/md0 and get automatically replicated across all disks.
Now I was against that because that seems awfully complicated for the
code and only works with an FS that leaves space for the bootloader.
What you are talking about is just moving the metadata back more (from
the 4k in 1.2 format to 256k or whatever) and starting the raid5 just
a little bit later on the disk. The only change (so far) would be
increasing the offset where to start.
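
To put numbers on that, here is a rough Python sketch. The constant and
function names are made up for illustration and are not anything in
mdadm; the 512-sector figure is Neil's "(say)" value, and 512-byte
sectors are assumed.

```python
# Illustrative arithmetic only; none of these names exist in mdadm.
SECTOR_SIZE = 512                   # bytes per sector (assumed)
RESERVED_SECTORS = 512              # Neil's "(say) 512 sectors"
V1_2_SUPERBLOCK_OFFSET = 4 * 1024   # 1.2 metadata sits 4k into the device


def reserved_bytes(sectors=RESERVED_SECTORS, sector_size=SECTOR_SIZE):
    """Size of the region left free for the bootloader, in bytes."""
    return sectors * sector_size


def bootloader_fits(image_size, sectors=RESERVED_SECTORS):
    """True if a bootloader image of image_size bytes fits in the region."""
    return image_size <= reserved_bytes(sectors)


if __name__ == "__main__":
    print(reserved_bytes() // 1024, "KiB reserved")   # 256 KiB reserved
    print(reserved_bytes() // V1_2_SUPERBLOCK_OFFSET,
          "times the current 4k offset")
```

So 512 sectors is exactly the 256k I mention above.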
> That only leaves the question of what happens when a spare is added to
> the array - how does the grub data get written to the space on the
> spare.
> I would rather that grub were responsible for this, than for md to
> treat that unused space as RAID1.
> We already have a notification system based on "mdadm --monitor" to
> process events. We could possibly plug grub in to that somehow so
> that it gets told to re-write all it's special blocks every time
> something significant changes in the array.
>
> NeilBrown
But now, indeed, how does this work with grub? Grub can't write to
/dev/md0 there; that wouldn't be bootable at all. And if grub writes
to /dev/sda then it doesn't get replicated.
I see two solutions for the initial write:
1) grub initially writes to all component devices (which already exists
in some bootloaders)
2) mdadm --copy-reserved /dev/md0 /dev/sda
After grub installs on /dev/sda it tells mdadm to copy the reserved
block to all devices.
There are also two solutions for what to do on changes:
A) mdadm --add copies the first 256k to new devices when syncing
(possibly sparse too.) The reserved 256k would basically become
part of the superblock. As such --zero-superblock would wipe them
too. I'm assuming bootloaders can live with identical data on all
devices.
B) Grub registers itself as a hook so it can trigger a copy command on
any significant change. (possibly run option 2 above)
I think options 1+A are easiest for both md and bootloaders to
implement.
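
The copy step behind options 1, 2 and A is the same in each case: take
the reserved region from one component device and write it to the
others. A minimal Python sketch of that step follows. It assumes the
reserved region is the first 256k of each device and holds no
per-device data; copy_reserved is a hypothetical helper, not an
existing mdadm or grub interface.

```python
import os

RESERVED_BYTES = 256 * 1024  # assumed size of the reserved region


def copy_reserved(source, targets, length=RESERVED_BYTES):
    """Copy the first `length` bytes of `source` to each path in `targets`.

    Hypothetical helper for illustration. Paths may be block devices or
    plain files; data beyond `length` on the targets is left untouched.
    """
    with open(source, "rb") as src:
        data = src.read(length)
    if len(data) != length:
        raise ValueError("source is shorter than the reserved region")

    for target in targets:
        # "r+b" opens for update without truncating the target.
        with open(target, "r+b") as dst:
            dst.write(data)
            dst.flush()
            os.fsync(dst.fileno())
    return len(data)


# Example call (device names are placeholders, needs root on real disks):
# copy_reserved("/dev/sda", ["/dev/sdb", "/dev/sdc", "/dev/sdd"])
```

Whoever ends up running this (grub, mdadm --add, or a monitor hook) is
exactly the open question above; the copy itself is the easy part.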
Regards,
Goswin
PS: I'm using grub here as example for any bootloader.