linux-raid.vger.kernel.org archive mirror
From: Neil Brown <neilb@cse.unsw.edu.au>
To: "Kenneth D. Merry" <ken@kdm.org>
Cc: Anton Altaparmakov <aia21@cantab.net>,
	linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org
Subject: Re: RFC - new raid superblock layout for md driver
Date: Mon, 9 Dec 2002 14:52:11 +1100
Message-ID: <15860.4971.982540.695313@notabene.cse.unsw.edu.au>
In-Reply-To: message from Kenneth D. Merry on Thursday November 21

On Thursday November 21, ken@kdm.org wrote:
> 
> This is a good idea.  Having all of the devices listed in the metadata on
> each disk is very helpful.  (See below for why.)
> 
> Here are some of my ideas about the features you'll want out of a new type
> of metadata:
...
> 
> [ these are features I think would be good to have ]
> 
>  - Per-array state that lets you know whether you're doing a resync,
>    reconstruction, verify, verify and fix, and so on.  This is part of the
>    state you'll need to do checkpointing -- picking up where you left off
>    after a reboot during the middle of an operation.
> 
 
Yes, a couple of flags in the 'state' field could do this.

>  - Per-array block number that tells you how far along you are in a verify,
>    resync, reconstruction, etc.  If you reboot, you can, for example, pick
>    a verify back up where you left off.

Got that, called "resync-position" (though I guess I have to change
the hyphen...).
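
To illustrate how a few state flags plus a resync position could support
checkpointing, here is a minimal sketch.  The flag names, bit values, and
the resume() helper are hypothetical and are not taken from the proposed
layout; only the idea of a 'state' field and a stored position comes from
the discussion above.

```python
from dataclasses import dataclass
from enum import IntFlag


class ArrayState(IntFlag):
    # Hypothetical bit assignments, for illustration only.
    CLEAN = 0x1
    RESYNC = 0x2
    RECONSTRUCT = 0x4
    VERIFY = 0x8


@dataclass
class ArrayInfo:
    state: ArrayState
    resync_position: int  # first block not yet processed
    size: int             # total number of blocks in the array


def resume(info: ArrayInfo, step: int, budget: int) -> ArrayInfo:
    """Process up to `budget` blocks in units of `step`, then checkpoint."""
    active = ArrayState.RESYNC | ArrayState.RECONSTRUCT | ArrayState.VERIFY
    if not info.state & active:
        return info  # nothing was in progress, so nothing to resume
    pos = info.resync_position
    end = min(info.size, pos + budget)
    while pos < end:
        # A real implementation would copy or compare blocks here.
        pos = min(end, pos + step)
    if pos >= info.size:
        # Operation finished: clear the in-progress flags, reset the position.
        return ArrayInfo((info.state & ~active) | ArrayState.CLEAN, 0, info.size)
    # Interrupted part-way: keep the flags and record where to pick up.
    return ArrayInfo(info.state, pos, info.size)
```

After a reboot, the stored flags say which operation was running and the
stored position says where to continue, so no work before that point needs
to be repeated.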

> 
>  - Enough per-disk state so you can determine, if you're doing a resync or
>    reconstruction, which disk is the target of the operation.  When I was
>    doing a lot of work on md a while back, one of the things I ran into is
>    that when you do a resync of a RAID-1, it always resyncs from the first
>    to the second disk, even if the first disk is the one out of sync.  (I
>    changed this, with Adaptec metadata at least, so it would resync onto
>    the correct disk.)

When a raid1 array is out of sync, it doesn't mean anything to say
which disc is out of sync.  They all are, with each other...
Nonetheless, the per-device state flags have an 'in-sync' bit which can
be set or cleared as appropriate.
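
A small sketch of how such a bit could be used to pick the target of a
resync.  The IN_SYNC bit value and the resync_plan() helper are
assumptions made for this example, not part of the proposed format.

```python
IN_SYNC = 0x1  # hypothetical bit position in the per-device state flags


def resync_plan(device_flags):
    """Return (sources, targets) as lists of device indexes.

    Devices with the in-sync bit set are read from; the rest are
    written to.
    """
    sources = [i for i, f in enumerate(device_flags) if f & IN_SYNC]
    targets = [i for i, f in enumerate(device_flags) if not f & IN_SYNC]
    if not sources and targets:
        # No device is marked in sync, so they are all out of sync with
        # each other.  The choice of source is then arbitrary; this
        # sketch simply takes the first device.
        sources, targets = targets[:1], targets[1:]
    return sources, targets
```

With flags [0, IN_SYNC] the plan reads from device 1 and writes to device
0, rather than always copying from the first device to the second.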

> 
>  - Each component knows about every other component in the array.  (It
>    knows by UUID, not just that there are N other devices in the array.)
>    This is an important piece of information:
> 	- You can compose the array now, using each disk's set_uuid and the
> 	  position of the device in the array, and by using the events
> 	  field to filter out the older of two disks that claim the same
> 	  position.
> 
> 	  The problem comes in more complicated scenarios.  For example:
> 		- user pulls one disk out of a RAID-1 with a spare
> 		- md reconstructs onto the spare
> 		- user shuts down machine, pulls the (former) spare that is
> 		  now part of the machine, and replaces the disk that he
> 		  originally pulled.
> 
> 	  So now you've got a scenario where you have a disk that claims to
> 	  be part of the array (same set_uuid), but its events field is a
> 	  little behind.  You could just resync the disk since it is out of
> 	  date, but still claims to be part of the array.  But you'd be
> 	  back in the same position if the user pulls the disk again and
> 	  puts the former spare back in -- you'd have to resync again.
> 
> 	  If each disk had a list of the uuids of every disk in the array,
> 	  you could tell from the disk table on the "freshest" disk that
> 	  the disk the user stuck back in isn't part of the array, despite
> 	  the fact that it claims to be.  (It was at one point, and then
> 	  was removed.)  You can then make the user add it back explicitly,
> 	  instead of just resyncing onto it.

The event counter is enough to determine if a device is really part of
the current array or not, and I cannot see why you need more than
that.
As far as I can tell, everything that you have said can be achieved
with set_uuid/devnumber/event.
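
As a rough illustration of that claim, the sketch below selects array
members using only those three values.  The Super record, its field
names, and the strict "events must equal the newest value" rule are
simplifying assumptions for the example, not a description of what md
actually does.

```python
from collections import namedtuple

# Hypothetical view of the superblock fields relevant to assembly.
Super = namedtuple("Super", "name set_uuid dev_number events")


def select_members(supers, set_uuid):
    """Split devices claiming `set_uuid` into current members and stale ones.

    Returns (members, stale): `members` maps dev_number to the chosen
    superblock, `stale` lists devices that claim the array but lag
    behind it or duplicate an already filled position.
    """
    candidates = [s for s in supers if s.set_uuid == set_uuid]
    if not candidates:
        return {}, []
    newest = max(s.events for s in candidates)
    members, stale = {}, []
    for s in candidates:
        if s.events == newest and s.dev_number not in members:
            members[s.dev_number] = s
        else:
            stale.append(s)  # must be re-added explicitly by the user
    return members, stale
```

In the scenario quoted above, the re-inserted disk carries the right
set_uuid but an older event count, so it lands in the stale list instead
of being treated as a current member.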

> 
>  - Possibly the ability to setup multilevel arrays within a given piece of
>    metadata.  As far as multilevel arrays go, there are two basic
>    approaches to the metadata:

How many actual uses of multi-level arrays are there??

The most common one is raid0 over raid1, and I think there is a strong
case for implementing a 'raid10' module that does that, but also
allows a raid10 of an odd number of drives and things like that.
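
To show that mirroring plus striping over an odd number of drives is at
least well defined, here is one possible placement rule.  It is a
hypothetical layout chosen for the example (the function name and the
rule itself are assumptions), not a statement of what such a module
would do.

```python
def raid10_place(chunk, n_devices, copies=2):
    """Return [(device, offset), ...], one entry per copy of a chunk.

    Copies of consecutive chunks are laid out round-robin across the
    devices, so the copies of one chunk always land on distinct devices
    as long as copies <= n_devices.
    """
    if copies > n_devices:
        raise ValueError("need at least as many devices as copies")
    first = chunk * copies
    return [((first + k) % n_devices, (first + k) // n_devices)
            for k in range(copies)]
```

With three drives and two copies, chunk 0 goes to drives 0 and 1, chunk 1
to drives 2 and 0, and chunk 2 to drives 1 and 2.  With four drives the
same rule reduces to a raid0 over two raid1 pairs.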

I don't think anything else is sufficiently common to really deserve
special treatment:  recursive metadata is adequate I think.

Concerning the auto-assembly of multi-level arrays: that is not
particularly difficult; it just needs to be described precisely and
coded.
It is a user-space thing and easily solved at that level.
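
One way to describe it precisely is as a worklist: every newly assembled
array is itself scanned as a potential component.  The sketch below
assumes a hypothetical read_super(name) callback returning
(set_uuid, raid_disks) or None, and made-up array names of the form
"md-<uuid>"; it is not how mdadm implements assembly.

```python
def assemble_all(devices, read_super):
    """Assemble every complete array, including arrays built from arrays.

    devices:    iterable of device names to scan
    read_super: hypothetical callback, name -> (set_uuid, raid_disks) or None
    Returns a dict mapping each assembled array name to its sorted members.
    """
    assembled = {}
    groups = {}              # set_uuid -> component names seen so far
    pending = list(devices)  # worklist of devices still to be scanned
    while pending:
        name = pending.pop()
        sb = read_super(name)
        if sb is None:
            continue         # no superblock, so not a component of anything
        set_uuid, raid_disks = sb
        group = groups.setdefault(set_uuid, [])
        group.append(name)
        if len(group) == raid_disks:
            array = f"md-{set_uuid}"
            assembled[array] = sorted(group)
            pending.append(array)  # the new array may itself be a component
    return assembled
```

Arrays with missing components are simply left unassembled here; a real
tool would also need a policy for starting degraded arrays.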

> 
> I know Neil has philosophical issues with autoconfiguration (or perhaps
> in-kernel autoconfiguration), but it really is helpful, especially in
> certain situations.

I have issues with autoconfiguration that is not adequately
configurable, and current linux in-kernel autoconfiguration is not
adequately configurable.  With mdadm, autoconfiguration is (very
nearly) adequately configurable and is fine.  There is still room for
some improvement, but not much.

Thanks for your input,
NeilBrown
