From: Lars Marowsky-Bree <lmb@suse.de>
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: RFC - new raid superblock layout for md driver
Date: Thu, 21 Nov 2002 00:47:43 +0100
Message-ID: <20021120234743.GF29881@marowsky-bree.de>
>The md driver in linux uses a 'superblock' written to all devices in a
>RAID to record the current state and geometry of a RAID and to allow
>the various parts to be re-assembled reliably.
>
>The current superblock layout is sub-optimal. It contains a lot of
>redundancy and wastes space. In 4K it can only fit 27 component
>devices. It has other limitations.
Yes. (In particular, getting all the various counters to agree with each other
is a pain ;-)
Steven raises the valid point that multihost operation isn't currently
possible; I just don't agree with his solution:
- Activating a drive only on one host is already entirely possible.
(can be done by device uuid in initrd for example)
- Activating a RAID device from multiple hosts is still not possible.
(Requires way more sophisticated locking support than we currently have)
However, for non-RAID devices like multipathing I believe that activating a
drive on multiple hosts should be possible; i.e., for these it might not be
necessary to scribble to the superblock every time.
(The md patch for 2.4 I sent you already does that; it reconstructs the
available paths fully dynamically on startup (by activating all paths present);
however, it still updates the superblock afterwards.)
Anyway, on to the format:
>The code in 2.5.latest has all the superblock handling factored out so
>that defining a new format is very straightforward.
>
>I would like to propose a new layout, and to receive comment on it.
>
>My current design looks like:
> /* constant array information - 128 bytes */
> u32 md_magic
> u32 major_version == 1
> u32 feature_map /* bit map of extra features in superblock */
> u32 set_uuid[4]
> u32 ctime
> u32 level
> u32 layout
> u64 size /* size of component devices, if they are all
> * required to be the same (RAID 1/5) */
> u32 chunksize
> u32 raid_disks
> char name[32]
> u32 pad1[10];
Looks good so far.
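
As a sanity check on the "128 bytes" figure, the section quoted above can be written out as a C struct. This is only a sketch: the struct name `md_sb_array_info` and the fixed-width types from `<stdint.h>` are assumptions for illustration, not part of the proposal, and the on-disk byte order is left open.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical name; the fields follow the quoted proposal in order. */
struct md_sb_array_info {
    uint32_t md_magic;
    uint32_t major_version;   /* == 1 */
    uint32_t feature_map;     /* bit map of extra features in superblock */
    uint32_t set_uuid[4];
    uint32_t ctime;
    uint32_t level;
    uint32_t layout;
    uint64_t size;            /* component size, if all must be the same */
    uint32_t chunksize;
    uint32_t raid_disks;
    char     name[32];
    uint32_t pad1[10];
};

/* The 64-bit field lands on an 8-byte boundary, so no hidden padding. */
static_assert(offsetof(struct md_sb_array_info, size) == 40,
              "size should sit at offset 40");
static_assert(sizeof(struct md_sb_array_info) == 128,
              "constant array information should be 128 bytes");
```

With the fields in this order the sizes add up to exactly 128 bytes without any packing attribute.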
> /* constant this-device information - 64 bytes */
> u64 address of superblock in device
> u32 number of this device in array /* constant over reconfigurations
> */
> u32 device_uuid[4]
What is "address of superblock in device"? It seems redundant; otherwise you
would have been unable to read it. Or am I missing something?
A special case might be required here for multipathing (i.e., device_uuid == 0).
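
The multipath special case could look roughly like the sketch below. It assumes the convention suggested above, that an all-zero device_uuid means "this is one path to a shared device"; the names `md_sb_dev_info`, `sb_offset`, `dev_number` and `md_dev_uuid_is_unset` are made up for illustration, and the struct includes the pad3[9] from the quoted section that follows.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; fields follow the "constant this-device" section. */
struct md_sb_dev_info {
    uint64_t sb_offset;        /* "address of superblock in device" */
    uint32_t dev_number;       /* constant over reconfigurations */
    uint32_t device_uuid[4];
    uint32_t pad3[9];
};

static_assert(sizeof(struct md_sb_dev_info) == 64,
              "constant this-device information should be 64 bytes");

/* Assumed convention: an all-zero device_uuid marks a multipath member. */
static bool md_dev_uuid_is_unset(const struct md_sb_dev_info *d)
{
    for (int i = 0; i < 4; i++) {
        if (d->device_uuid[i] != 0)
            return false;
    }
    return true;
}
```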
> u32 pad3[9]
>
> /* array state information - 64 bytes */
> u32 utime
Timestamps (also above, ctime) are always difficult. Time might not be set
correctly at any given time, in particular during early bootup. This field
should only be advisory.
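
To illustrate what "advisory" could mean in practice: when deciding which of two superblocks is more recent, compare the `events` field of the proposal and never look at the timestamp. This sketch assumes `events` keeps its current meaning of a counter bumped on every superblock update; the struct `md_sb_summary` is a hypothetical in-memory summary, not the on-disk layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory summary of one component's superblock. */
struct md_sb_summary {
    uint32_t utime;    /* advisory only: the clock may have been wrong */
    uint64_t events;   /* update counter from the proposal */
};

/* True if a is more recent than b; deliberately ignores utime. */
static bool md_sb_is_newer(const struct md_sb_summary *a,
                           const struct md_sb_summary *b)
{
    return a->events > b->events;
}
```

utime would then be used only for display to the administrator.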
> u32 state /* clean, resync-in-progress */
> u32 sb_csum
> u64 events
> u64 resync-position /* flag in state if this is valid */
> u32 number of devices
> u32 pad2[8]
>
> /* device state information, indexed by 'number of device in array'
> 4 bytes per device */
> for each device:
> u16 position /* in raid array or 0xffff for a spare. */
> u16 state flags /* error detected, in-sync */
u16 != u32; your position flags don't match up. I'd like to be able to take
the "position in the superblock" as a mapping here so it can be found in this
list, or what is the proposed relationship between the two?
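
One possible reading of the relationship, sketched in C: the u32 "number of this device in array" is an index into the per-device table, and the u16 position found there is the slot in the RAID. This is an interpretation of the quoted text, not something the proposal confirms; `md_sb_dev_state`, `MD_ROLE_SPARE` and `md_dev_position` are hypothetical names.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

#define MD_ROLE_SPARE 0xffffu   /* from the proposal: 0xffff marks a spare */

/* Hypothetical name; one table entry, "4 bytes per device". */
struct md_sb_dev_state {
    uint16_t position;   /* slot in the RAID, or MD_ROLE_SPARE */
    uint16_t state;      /* error detected, in-sync */
};

static_assert(sizeof(struct md_sb_dev_state) == 4, "4 bytes per device");

/*
 * Look up the array position of the device whose constant number is
 * dev_number. Returns -1 for an out-of-range number or a spare.
 */
static int md_dev_position(const struct md_sb_dev_state *table,
                           uint32_t nr_devices, uint32_t dev_number)
{
    if (dev_number >= nr_devices)
        return -1;
    if (table[dev_number].position == MD_ROLE_SPARE)
        return -1;
    return table[dev_number].position;
}
```

Under this reading the u32 index can name more devices than the u16 position can, which is the mismatch noted above.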
>The interpretation of the 'name' field would be up to the user-space
>tools and the system administrator.
>I imagine having something like:
> host:name
>where if "host" isn't the current host name, auto-assembly is not
>tried, and if "host" is the current host name then:
Oh, well. You seem to sort of have Steven's idea here too ;-) In that case,
I'd go with Steven's idea: make that field a uuid of the host.
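
For reference, the "host:name" check from the quoted proposal might be done in user space along these lines. It is a sketch under assumptions: the name is treated as a NUL-terminated string, a name without a "host:" prefix is never auto-assembled, and `md_name_is_local` is a hypothetical helper, not an existing tool's function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * True if name has the form "host:..." and host equals local_host.
 * Assumes both strings are NUL-terminated.
 */
static bool md_name_is_local(const char *name, const char *local_host)
{
    const char *colon = strchr(name, ':');
    size_t host_len;

    if (colon == NULL)
        return false;   /* no host prefix: assumed policy is "do not try" */

    host_len = (size_t)(colon - name);
    return strlen(local_host) == host_len &&
           strncmp(name, local_host, host_len) == 0;
}
```

With a host uuid instead of a host name, the string comparison would become a fixed-size memcmp and would not depend on how the host happens to be named at boot.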
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG
"If anything can go wrong, it will." -- Capt. Edward A. Murphy
"Chance favors the prepared (mind)." -- Louis Pasteur