linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: sb->resync_offset value after resync failure
Date: Thu, 26 Jan 2012 11:51:32 +1100
Message-ID: <20120126115132.75d9d8bd@notabene.brown>
In-Reply-To: <CAGRgLy44uKSggEsrU4Fx7op7iVjoPPq-doMdYjxnSGDm6krf9A@mail.gmail.com>

On Thu, 19 Jan 2012 18:19:55 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Greetings,
> I am looking into a scenario, in which the md raid5/6 array is
> resyncing (e.g., after a fresh creation) and there is a drive failure.
> As written in Neil's blog entry "Closing the RAID5 write hole"
> (http://neil.brown.name/blog/20110614101708): "if a device fails
> during the resync, md doesn't take special action - it just allows the
> array to be used without a resync even though there could be corrupt
> data".
> 
> However, I noticed that at this point sb->resync_offset in the
> superblock is not set to MaxSector. At this point if a drive is
> added/re-added to the array, then drive recovery starts, i.e., md
> assumes that data/parity on the surviving drives are correct, and uses
> them to rebuild the new drive. This state of data/parity being correct
> should be reflected as sb->resync_offset==MaxSector, shouldn't it?
> 
> One issue that I ran into is the following: I reached a situation in
> which, during array assembly, sb->resync_offset == sb->size. At this
> point, the following code in mdadm assumes that the array is clean:
> info->array.state =
>     (__le64_to_cpu(sb->resync_offset) >= __le64_to_cpu(sb->size))
>      ? 1 : 0;
> As a result, mdadm lets the array assembly flow through fine to the
> kernel, but in the kernel the following code refuses to start the
> array:
>     if (mddev->degraded > dirty_parity_disks &&
>         mddev->recovery_cp != MaxSector) {
> 
> At this point, specifying --force to mdadm --assemble doesn't help,
> because mdadm thinks that the array is clean (clean==1), and therefore
> doesn't do the "force-array" update, which would clear the
> sb->resync_offset value. So there is no way to start the array other than
> specifying the start_dirty_degraded=1 kernel parameter.
> 
> So one question is: should mdadm compare sb->resync_offset to
> MaxSector and not to sb->size? In the kernel code, resync_offset is
> always compared to MaxSector.

Yes, mdadm should be consistent with the kernel.  Patches welcome.

> 
> Another question is whether sb->resync_offset should be set to
> MaxSector by the kernel as soon as it starts rebuilding a drive. I
> think this would be consistent with what Neil wrote in the blog entry.

Maybe every time we update ->curr_resync_completed we should update
->recovery_cp as well if it is below the new ->curr_resync_completed?


> 
> Here is the scenario to reproduce the issue I described:
> # Create a raid6 array with 4 drives A,B,C,D. Array starts resyncing.
> # Fail drive D. Array aborts the resync and then immediately restarts
> it (it seems to checkpoint the mddev->recovery_cp, but I am not sure
> that it restarts from that checkpoint)
> # Re-add drive D to the array. It is added as a spare, array continues resyncing
> # Fail drive C. Array aborts the resync, and then starts rebuilding
> drive D. At this point sb->resync_offset is some valid value (usually
> 0, not MaxSector and not sb->size).

Does it start the rebuilding from the start?  I hope it does.

> # Stop the array. At this point sb->resync_offset is sb->size in all
> the superblocks.

At some point in there you had a RAID6 with two missing devices, so it is
either failed or completely in-sync.  I guess we assume the latter.
Is that wrong?

> 
> Another question I have: when exactly does md decide to update
> sb->resync_offset in the superblock? I am playing with similar
> scenarios with raid5, and sometimes I end up with MaxSector and
> sometimes with valid values. From the code, it looks like only this
> logic updates it:
> 	if (mddev->in_sync)
> 		sb->resync_offset = cpu_to_le64(mddev->recovery_cp);
> 	else
> 		sb->resync_offset = cpu_to_le64(0);
> except for resizing and setting through sysfs. But I don't understand
> how this value should be managed in general.

I'm not sure what you are asking here .... that code explains exactly when
resync_offset should be set, and how.
What more is there to say?

NeilBrown


> 
> Thanks!
> Alex.


Thread overview: 9+ messages
2012-01-19 16:19 sb->resync_offset value after resync failure Alexander Lyakas
2012-01-24 14:25 ` Alexander Lyakas
2012-01-26  0:41   ` NeilBrown
2012-01-26  0:51 ` NeilBrown [this message]
2012-02-01 20:56   ` Alexander Lyakas
2012-02-01 22:19     ` NeilBrown
2012-02-06 11:41       ` Alexander Lyakas
2012-02-07  0:55         ` NeilBrown
2012-02-07  9:11           ` Alexander Lyakas
