From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: andrey.warkentin@gmail.com, linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Problem with patch: "reject a re-add request that cannot be honoured" (commit bedd86b7773fd97f0d708cc0c371c8963ba7ba9a)
Date: Wed, 27 Jun 2012 17:22:55 +1000 [thread overview]
Message-ID: <20120627172255.40ee3a19@notabene.brown> (raw)
In-Reply-To: <CAGRgLy55si7a8bY_Xuokqb6=cdOJyk_Wq34_DwmmhbpuQSSQug@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4380 bytes --]
On Thu, 7 Jun 2012 15:47:46 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Thanks for commenting, Neil,
>
> On Thu, Jun 7, 2012 at 2:24 PM, NeilBrown <neilb@suse.de> wrote:
> > On Thu, 7 Jun 2012 10:22:24 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Hi again Neil, and Andrey,
> >>
> >> looking at this email thread:
> >> http://www.spinics.net/lists/raid/msg36236.html
> >> between you and Andrey, the conclusion was:
> >> "So the correct thing to do is to *not* update the metadata on the
> >> recovering device until recovery completes. Then if it fails and is
> >> re-added, it will look just the same as when it was re-added the first
> >> time, and will do a bitmap-based recovery."
> >>
> >> I have two doubts about this decision:
> >> 1) Since the event count on the recovering drive is not updated, this
> >> means that after reboot, when array is re-assembled, this drive will
> >> not be added to the array, and the user will have to manually re-add
> >> it. I agree this is a minor thing.
> >
> > Still, if it can be fixed it should be.
> >
> >> 2) There are places in mdadm, which check for recovery_offset on the
> >> drive and take decisions based upon that. Specifically, if there is
> >> *no* recovery offset, the data on this drive is considered to be
> >> consistent WRT to the failure time of the drive. So, for example, the
> >> drive can be a candidate for bumping up events during "force
> >> assembly". Now, when superblock on such drive is not updated during
> >> recovery (so there is *no* recovery offset), mdadm will think that the
> >> drive is consistent, while in fact, its data is totally unusable until
> >> after recovery completes. That's because we have updated parts of the
> >> drive, but did not complete bringing the whole drive in-sync.
> >
> > If mdadm would consider updating the event count if not recovery had started,
> > then surely it is just as valid to do so once some recovery has started, even
> > if it hasn't completed.
>
> The patch you accepted from me ("Don't consider disks with a valid
> recovery offset as candidates for bumping up event count") actually
> attempts to protect from that:)
>
> I don't understand why "it is just as valid to do so once some
> recovery has started". My understanding is that once recovery of a
> drive has started, its data is not consistent between different parts
> of the drive, until the recovery completes. This is because md does
> bitmap-based recovery, and not kind of journal/transaction-log based
> recovery.
>
> However, one could argue that for force-assembly case, when data
> anyways can come up as partially corrupted, this is less important.
Exactly. And mdadm only updates event counts in the force-assembly case so
while it might not be ideal, it is the best we can do.
>
> I would still think that there is value in recoding in a superblock
> that a drive is recovering.
Probably. It is a bit unfortunate that if you stop an array that is
recovering after a --re-add, you cannot simply 'assemble' it again and
get it back to the same state.
I'll think more on that.
Meanwhile, this patch might address your other problem. It allows --re-add
to work if a non-bitmap rebuild fails and is then re-added.
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c601c4b..d31852e 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5784,7 +5784,7 @@ static int add_new_disk(struct mddev * mddev, mdu_disk_info_t *info)
super_types[mddev->major_version].
validate_super(mddev, rdev);
if ((info->state & (1<<MD_DISK_SYNC)) &&
- (!test_bit(In_sync, &rdev->flags) ||
+ (test_bit(Faulty, &rdev->flags) ||
rdev->raid_disk != info->raid_disk)) {
/* This was a hot-add request, but events doesn't
* match, so reject it.
>
> >
> >>
> >> Can you pls comment on my doubts?
> >
> > I think there is probably room for improvement here but I don't think there
> > are any serious problems.
> >
> > However I'm about to go on leave for a couple of week so I'm unlikely to
> > think about it for a while. I've made a note to look at it properly when I
> > get back.
> >
>
> Indeed, don't you think about this while you are resting!
:-)
Thanks. I did have a very enjoyable break.
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2012-06-27 7:22 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-06 16:09 Problem with patch: "reject a re-add request that cannot be honoured" (commit bedd86b7773fd97f0d708cc0c371c8963ba7ba9a) Alexander Lyakas
2012-06-07 7:22 ` Alexander Lyakas
2012-06-07 11:24 ` NeilBrown
2012-06-07 12:47 ` Alexander Lyakas
2012-06-27 7:22 ` NeilBrown [this message]
2012-06-27 16:40 ` Alexander Lyakas
2012-07-02 7:57 ` NeilBrown
2012-07-02 17:30 ` John Gehring
2014-02-19 9:51 ` Alexander Lyakas
2014-02-26 13:02 ` Alexander Lyakas
2014-02-27 3:19 ` NeilBrown
2014-03-02 11:01 ` Alexander Lyakas
2012-06-08 8:59 ` Thanks (was Re: Problem with patch...) John Robinson
2012-06-07 10:26 ` Problem with patch: "reject a re-add request that cannot be honoured" (commit bedd86b7773fd97f0d708cc0c371c8963ba7ba9a) NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120627172255.40ee3a19@notabene.brown \
--to=neilb@suse.de \
--cc=alex.bolshoy@gmail.com \
--cc=andrey.warkentin@gmail.com \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.