The Linux Kernel Mailing List
From: Steven Dake <sdake@mvista.com>
To: Neil Brown <neilb@cse.unsw.edu.au>
Cc: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: RFC - new raid superblock layout for md driver
Date: Wed, 20 Nov 2002 10:05:29 -0700
Message-ID: <3DDBC0D9.5030904@mvista.com>
In-Reply-To: 15835.2798.613940.614361@notabene.cse.unsw.edu.au

Neil,

I would suggest adding a 64-bit field called "unique_identifier" to the
per-device structure.  This would allow a RAID volume to be locked to a
specific host, which makes true multihost operation possible.

For Fibre Channel, we have a patch which places the host's FC WWN into
the superblock structure, and only allows importing an array disk (via
ioctl or autostart) if the superblock's WWN matches the target dev_t's
host Fibre Channel WWN.  We also use this in environments where slots
can house either CPUs or disks, to lock a RAID array to a specific CPU
slot.  WWNs are 64 bits wide, which is why I suggest such a large field.
This really helps in hotswap environments where a CPU blade is replaced
and should use the same disk, but the disk naming may have changed
between reboots.
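
A minimal sketch of the import check described above, not the actual
patch.  The function name, the MD_OWNER_ANY constant, and the convention
that 0 means "not locked to any host" are all assumptions made for
illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: an identifier of 0 means the array is not locked to a host. */
#define MD_OWNER_ANY UINT64_C(0)

/*
 * Decide whether a host may import a component device.
 * owner_id: value stored in the proposed 64-bit unique_identifier field
 * host_id:  identifier of the importing host (for example its FC WWN)
 */
static bool md_owner_allows_import(uint64_t owner_id, uint64_t host_id)
{
    if (owner_id == MD_OWNER_ANY)
        return true;              /* unlocked array: any host may import */
    return owner_id == host_id;   /* locked array: identifiers must match */
}
```

Both the ioctl path and the autostart path would call the same check
before adding the disk to an array.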

Multihost operation could be done without this field, but then a RAID
array could be started unintentionally by the wrong host.  Imagine a
host starting a RAID array that some other party has already started,
forcing a rebuild.  Ugh!

Thanks
-steve

Neil Brown wrote:

>The md driver in linux uses a 'superblock' written to all devices in a
>RAID to record the current state and geometry of a RAID and to allow
>the various parts to be re-assembled reliably.
>
>The current superblock layout is sub-optimal.  It contains a lot of
>redundancy and wastes space.  In 4K it can only fit 27 component
>devices.  It has other limitations.
>
>I (and others) would like to define a new (version 1) format that
>resolves the problems in the current (0.90.0) format.
>
>The code in 2.5.latest has all the superblock handling factored out so
>that defining a new format is very straightforward.
>
>I would like to propose a new layout, and to receive comments on it.
>
>My current design looks like:
>	/* constant array information - 128 bytes */
>    u32  md_magic
>    u32  major_version == 1
>    u32  feature_map     /* bit map of extra features in superblock */
>    u32  set_uuid[4]
>    u32  ctime
>    u32  level
>    u32  layout
>    u64  size		/* size of component devices, if they are all
>			 * required to be the same (RAID 1/5) */
>    u32  chunksize
>    u32  raid_disks
>    char name[32]
>    u32  pad1[10];
>
>	/* constant this-device information - 64 bytes */
>    u64  address of superblock in device
>    u32  number of this device in array  /* constant over reconfigurations */
>    u32  device_uuid[4]
>    u32  pad3[9]
>
>	/* array state information - 64 bytes */
>    u32  utime
>    u32  state    /* clean, resync-in-progress */
>    u32  sb_csum
>    u64  events
>    u64  resync-position	/* flag in state if this is valid */
>    u32  number of devices
>    u32  pad2[8]
>
>	/* device state information, indexed by 'number of device in array' 
>	   4 bytes per device */
>    for each device:
>      u16 position     /* in raid array or 0xffff for a spare. */
>      u16 state flags  /* error detected,  in-sync */
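
For reference, the listing above can be written as a C struct and the
stated section sizes checked at compile time.  This is only a sketch:
the struct name and the names super_offset, dev_number, resync_offset,
max_dev and dev_state are hypothetical, since the proposal describes
those fields in words.  With the fields in the listed order, "events"
lands at byte 204, which is not 8-byte aligned, so the sketch uses the
GCC/Clang packed attribute to keep the 128/64/64 byte sections:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mdp_sb_1_sketch {
    /* constant array information - 128 bytes */
    uint32_t md_magic;
    uint32_t major_version;      /* == 1 */
    uint32_t feature_map;        /* bit map of extra features */
    uint32_t set_uuid[4];
    uint32_t ctime;
    uint32_t level;
    uint32_t layout;
    uint64_t size;               /* size of component devices */
    uint32_t chunksize;
    uint32_t raid_disks;
    char     name[32];
    uint32_t pad1[10];

    /* constant this-device information - 64 bytes */
    uint64_t super_offset;       /* address of superblock in device */
    uint32_t dev_number;         /* number of this device in array */
    uint32_t device_uuid[4];
    uint32_t pad3[9];

    /* array state information - 64 bytes */
    uint32_t utime;
    uint32_t state;              /* clean, resync-in-progress */
    uint32_t sb_csum;
    uint64_t events;
    uint64_t resync_offset;      /* valid only if flagged in state */
    uint32_t max_dev;            /* number of devices */
    uint32_t pad2[8];

    /* device state information - 4 bytes per device */
    struct {
        uint16_t position;       /* slot in array, or 0xffff for a spare */
        uint16_t state;          /* error detected, in-sync */
    } dev_state[];
} __attribute__((packed));

static_assert(offsetof(struct mdp_sb_1_sketch, super_offset) == 128,
              "array information must be 128 bytes");
static_assert(offsetof(struct mdp_sb_1_sketch, utime) == 192,
              "this-device information must be 64 bytes");
static_assert(offsetof(struct mdp_sb_1_sketch, dev_state) == 256,
              "array state information must be 64 bytes");
```

Without the packed attribute a compiler would insert padding before
"events", and the third assertion would fail.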
>
>
>This has 128 bytes for essentially constant information about the
>array, 64 bytes for constant information about this device, 64 bytes
>for changeable state information about the array, and 4 bytes per
>device for state information about the devices.  This would allow an
>array with 192 devices in a 1K superblock, and 960 devices in a 4K
>superblock (the current size).
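
The device counts follow from (superblock size - 256) / 4.  A small
sketch of that arithmetic, with hypothetical constant and function
names:

```c
#include <stddef.h>

/* 128 (array) + 64 (this device) + 64 (array state) fixed bytes. */
static const size_t MD_SB1_FIXED_BYTES = 128 + 64 + 64;
/* u16 position + u16 state flags per device. */
static const size_t MD_SB1_PER_DEV_BYTES = 4;

/* Number of device slots that fit in a superblock of sb_bytes bytes. */
static size_t md_sb1_max_devices(size_t sb_bytes)
{
    if (sb_bytes < MD_SB1_FIXED_BYTES)
        return 0;   /* too small to hold even the fixed part */
    return (sb_bytes - MD_SB1_FIXED_BYTES) / MD_SB1_PER_DEV_BYTES;
}
```

md_sb1_max_devices(1024) gives 192 and md_sb1_max_devices(4096) gives
960, matching the figures above.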
>
>Other features:
>   A feature map instead of a minor version number.
>   64 bit component device size field.
>   field for storing current position of resync process if array is
>       shut down while resync is running.
>   no "minor" field but a textual "name" field instead.
>   address of superblock in superblock to avoid misidentifying
>      superblock. e.g. is it in a partition or a whole device.
>   uuid for each device.  This is not directly used by the md driver,
>      but it is maintained, even if a drive is moved between arrays, 
>      and user-space can use it for tracking devices.
>
>md would, of course, continue to support the current layout
>indefinitely, but this new layout would be available for use by people
>who don't need compatibility with 2.4 and do want more than 27 devices
>etc. 
>
>To create an array with the new superblock layout, the user-space
>tool would write directly to the devices (like mkfs does) and then
>assemble the array.  Creating an array using the ioctl interface will
>still create an array with the old superblock.
>
>When the kernel loads a superblock, it would check the major_version
>to see which piece of code to use to handle it.
>When it writes out a superblock, it would use the same version as was
>read in (of course).
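
One way to picture the version check is a table of per-format handlers
indexed by major_version.  This is a sketch under assumptions, not the
md code: the struct, the stub loaders and sb_handler_for() are
hypothetical names, and the current 0.90.0 format is taken to have
major version 0:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-format operations; only loading is shown. */
struct sb_handler {
    const char *name;
    int (*load)(const void *buf, size_t len);
};

/* Stub loaders standing in for the real parsing code. */
static int load_v0(const void *buf, size_t len) { (void)buf; (void)len; return 0; }
static int load_v1(const void *buf, size_t len) { (void)buf; (void)len; return 0; }

static const struct sb_handler sb_handlers[] = {
    [0] = { "0.90.0", load_v0 },
    [1] = { "1",      load_v1 },
};

/* Pick the handler for a superblock, or NULL for an unknown version. */
static const struct sb_handler *sb_handler_for(uint32_t major_version)
{
    size_t n = sizeof(sb_handlers) / sizeof(sb_handlers[0]);

    if (major_version >= n)
        return NULL;
    return &sb_handlers[major_version];
}
```

The caller would keep the returned pointer with the array, so that
writing the superblock back uses the same format that was read in.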
>
>This superblock would *not* support in-kernel auto-assembly as that
>requires the "minor" field that I have deliberately removed.  However, I
>don't think this is a big cost as it looks like in-kernel
>auto-assembly is about to disappear with the early-user-space patches.
>
>The interpretation of the 'name' field would be up to the user-space
>tools and the system administrator.
>I imagine having something like:
>	host:name
>where if "host" isn't the current host name, auto-assembly is not
>tried, and if "host" is the current host name then:
>  if "name" looks like "md[0-9]*" then the array is assembled as that
>    device
>  else the array is assembled as /dev/mdN for some large, unused N,
>    and a symlink is created from /dev/md/name to /dev/mdN
>If the "host" part is empty or non-existent, then the array would be
>assembled no matter what the hostname is.  This would be important
>e.g. for assembling the device that stores the root filesystem, as we
>may not know the host name until after the root filesystem is loaded.
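
The rules above could be sketched in a user-space tool roughly as
follows.  Everything here is illustrative: the enum and function names
are hypothetical, "md[0-9]*" is read as "md" followed by at least one
digit, and the caller is assumed to pass a NUL-terminated copy of the
32-byte name field:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum md_assemble_action {
    MD_SKIP,               /* "host" names some other host */
    MD_ASSEMBLE_AS_NAME,   /* "name" is mdN: assemble as that device */
    MD_ASSEMBLE_SYMLINK    /* use an unused /dev/mdN, symlink /dev/md/name */
};

/* True if s is "md" followed by one or more decimal digits. */
static bool looks_like_mdN(const char *s)
{
    if (strncmp(s, "md", 2) != 0 || s[2] == '\0')
        return false;
    for (s += 2; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}

/* Apply the host:name rules to a superblock name on host this_host. */
static enum md_assemble_action md_name_action(const char *sb_name,
                                              const char *this_host)
{
    const char *colon = strchr(sb_name, ':');
    const char *name = colon ? colon + 1 : sb_name;
    size_t host_len = colon ? (size_t)(colon - sb_name) : 0;

    /* A non-empty host part must match the current host name exactly. */
    if (host_len != 0 &&
        (strlen(this_host) != host_len ||
         strncmp(sb_name, this_host, host_len) != 0))
        return MD_SKIP;

    return looks_like_mdN(name) ? MD_ASSEMBLE_AS_NAME : MD_ASSEMBLE_SYMLINK;
}
```

An empty or missing host part skips the host comparison entirely, which
covers the root filesystem case where the host name is not yet known.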
>
>This would make auto-assembly much more flexible.
>
>Comments welcome.
>
>NeilBrown