linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Question about mdadm commit d6508f0cfb60edf07b36f1532eae4d9cddf7178b "be more careful about add attempts"
Date: Mon, 21 Nov 2011 13:44:29 +1100
Message-ID: <20111121134429.7a6f46cc@notabene.brown>
In-Reply-To: <CAGRgLy49X7KUMdY8hj-99Z46vfzvSnB6LhNZpsN5XedjTdBu1Q@mail.gmail.com>

On Thu, 17 Nov 2011 13:13:20 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hello Neil,
> 
> >> However, at least for 1.2 arrays, I believe this is too restrictive,
> >> don't you think? If the raid slot (not desc_nr) of the device being
> >> re-added is *not occupied* yet, can't we just select a free desc_nr
> >> for the new disk on that path?
> >> Or perhaps, mdadm on the re-add path can select a free desc_nr
> >> (disc.number) for it (just as it does for --add), after ensuring that
> >> the slot is not occupied yet? Where would it be better to do this?
> >> Otherwise, the re-add fails even though it could succeed perfectly well
> >> (it only needs a different desc_nr).
> >
> > I think I see what you are saying.
> > However my question is: is this really an issue?
> > Is there a credible sequence of events that results in the current code making
> > an undesirable decision?  Of course I do not count deliberately editing the
> > metadata as part of a credible sequence of events.
> 
> Consider this scenario, in which the code refuses to re-add a drive:
> 
> Step 1:
> - I created a raid1 array with 3 drives: A,B,C (and their desc_nr=0,1,2)
> - I failed drives B and C, and removed them from the array, and
> totally forgot about them for the rest of the scenario.
> - I added to the array two new drives: D and E, and waited for the
> resync to complete. The array now has the following structure:
> A: desc_nr=0
> D: desc_nr=3 (was selected during the "add" path in mdadm, as expected)
> E: desc_nr=4 (was selected during the "add" path in mdadm, as expected)
> 
> Step 2:
> - I failed drives D and E, and removed them from the array. The E
> drive is not used for the rest of the scenario, so we can forget about
> it.
> 
> I wrote some data to the array. At this point, the array bitmap is
> dirty, and will not be cleared, since the array is degraded.
> 
> Step 3:
> - I added one new drive (last one, I promise!) to the array - drive F,
> and waited for it to resync. The array now has the following
> structure:
> A: desc_nr=0
> F: desc_nr=3
> 
> So F took the desc_nr of drive D (desc_nr=3). This is expected according
> to the mdadm code.
> 
> Event counters at this point:
> A and F: events=149, events_cleared=0
> D: events=109
> 
> Step 4:
> At this point, mdadm refuses to re-add drive D to the array,
> because its desc_nr is already taken (I verified that via gdb). On the
> other hand, if we had simply picked a fresh desc_nr for D, then
> I believe it could be re-added, because:
> - slots are not important for raid1 (D's slot was actually taken by F).
> - it should pass the check for bitmap-based resync (events in D's sb >=
> events_cleared of the array)
> 
> Do you agree with this, or perhaps I missed something?
> 
> Additional notes:
> - of course, such a scenario is relevant only for arrays with more than
> single redundancy, so it's not relevant for raid5
> - to simulate such a scenario for raid6, at step 3 you need to add the new
> drive to a slot that is not the slot of the drive we're going to
> re-add in step 4 (otherwise it takes D's slot, and then we really
> cannot re-add). This can be done as we discussed earlier.
> 
> What do you think?
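
The desc_nr bookkeeping in the quoted steps can be replayed with a small model. This is only a sketch, not mdadm code: it assumes the --add path picks the lowest unused desc_nr at or above the number of raid disks (which matches the numbers quoted above), it ignores raid slots entirely, and the function names and MAX_DEVS are made up for illustration.

```python
RAID_DISKS = 3    # the quoted raid1 was created with three members
MAX_DEVS = 64     # illustrative upper bound, not a value taken from mdadm


def pick_free_desc_nr(in_use):
    """Assumed --add policy: lowest unused desc_nr at or above RAID_DISKS."""
    for nr in range(RAID_DISKS, MAX_DEVS):
        if nr not in in_use:
            return nr
    raise RuntimeError("no free desc_nr")


def replay_steps_1_to_3():
    """Return (active members, desc_nr recorded in D's superblock)."""
    members = {"A": 0, "B": 1, "C": 2}          # step 1: create
    del members["B"], members["C"]              # step 1: fail and remove B, C
    for name in ("D", "E"):                     # step 1: add D, then E
        members[name] = pick_free_desc_nr(set(members.values()))
    recorded_d = members["D"]
    del members["D"], members["E"]              # step 2: fail and remove D, E
    members["F"] = pick_free_desc_nr(set(members.values()))  # step 3: add F
    return members, recorded_d


def readd_strict(members, recorded_nr):
    """Behaviour as described: refuse if the recorded desc_nr is taken."""
    return None if recorded_nr in members.values() else recorded_nr


def readd_relaxed(members, recorded_nr):
    """Proposed behaviour: fall back to a free desc_nr instead of refusing."""
    in_use = set(members.values())
    if recorded_nr not in in_use:
        return recorded_nr
    return pick_free_desc_nr(in_use)


members, recorded_d = replay_steps_1_to_3()
print(members, recorded_d)
print(readd_strict(members, recorded_d), readd_relaxed(members, recorded_d))
```

In this model F ends up holding the desc_nr that D's superblock still records, so the strict check refuses D while the relaxed one hands it the next free number.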

I think some of the details in your steps aren't really right, but I do see
the point you are making.
If you keep the array degraded, the events_cleared will not be updated so any
old array member can safely be re-added.
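
As a rough illustration of that condition, the comparison quoted earlier (events in the device's superblock >= events_cleared of the array) can be written as a one-line predicate. This is a simplified sketch with a hypothetical function name; the real check involves more state than these two counters.

```python
def bitmap_readd_ok(dev_events, events_cleared):
    """True when the bitmap still covers every write the device missed."""
    return dev_events >= events_cleared


# Figures from the quoted scenario: the array stayed degraded,
# so events_cleared never moved.
print(bitmap_readd_ok(dev_events=109, events_cleared=0))

# Hypothetical contrast: events_cleared had advanced past D's event
# count before the re-add was attempted.
print(bitmap_readd_ok(dev_events=109, events_cleared=149))
```

With the quoted counters the first call accepts drive D; the second case uses made-up numbers to show when a bitmap-based re-add would no longer be safe.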

I'll have a look and see how best to fix the code.

Thanks.

NeilBrown



> 
> Thanks,
> Alex.
