From: Neil Brown <neilb@suse.de>
To: Mikael Abrahamsson <swmike@swm.pp.se>
Cc: linux-raid@vger.kernel.org
Subject: Re: 2 drives failed, one "active", one with wrong event count
Date: Thu, 4 Feb 2010 12:03:12 +1100
Message-ID: <20100204120312.57e40158@notabene.brown>
In-Reply-To: <alpine.DEB.1.10.1002010810440.31777@uplift.swm.pp.se>
On Mon, 1 Feb 2010 08:13:24 +0100 (CET)
Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> On Mon, 1 Feb 2010, Neil Brown wrote:
>
> > You might know that nothing has been written to the array since the
> > device with the lower event count was removed, but md doesn't know that.
> > Any device with an old event count could have old data and so cannot be
> > trusted (unless you assemble with --force meaning that you are taking
> > responsibility).
>
> I did use --force, but it seems in the state "one drive with lower event
> count and another one with 0x2", the event count on the drive isn't
> forcibly updated and since there is a 0x2 drive, the array isn't started.
>
> I had the same situation again this morning (changing controller next),
> but this time I had bitmaps enabled so recovery of the array with
> --assemble --force took just a few seconds. Really nice.
>
Right... I understand now.
Fixed with the following patch which will be in 3.1.2.
Thanks,
NeilBrown
commit 921d9e164fd3f6203d1b0cf2424b793043afd001
Author: NeilBrown <neilb@suse.de>
Date: Thu Feb 4 12:02:09 2010 +1100
Assemble: fix --force assembly of v1.x arrays which are recovering.
1.x metadata allows a device to be a member of the array while it
is still recovering. So it is a working member, but is not
completely in-sync.
mdadm/assemble does not understand this distinction and assumes that a
working member is fully in-sync for the purpose of determining if there
are enough in-sync devices for the array to be functional.
So collect the 'recovery_start' value from the metadata and use it in
assemble when determining how useful a given device is.
Reported-by: Mikael Abrahamsson <swmike@swm.pp.se>
Signed-off-by: NeilBrown <neilb@suse.de>
diff --git a/Assemble.c b/Assemble.c
index 7f90048..e4d6181 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -800,7 +800,8 @@ int Assemble(struct supertype *st, char *mddev,
if (devices[j].i.events+event_margin >=
devices[most_recent].i.events) {
devices[j].uptodate = 1;
- if (i < content->array.raid_disks) {
+ if (i < content->array.raid_disks &&
+ devices[j].i.recovery_start == MaxSector) {
okcnt++;
avail[i]=1;
} else
@@ -822,6 +823,7 @@ int Assemble(struct supertype *st, char *mddev,
int j = best[i];
if (j>=0 &&
!devices[j].uptodate &&
+ devices[j].i.recovery_start == MaxSector &&
(chosen_drive < 0 ||
devices[j].i.events
> devices[chosen_drive].i.events))
diff --git a/super-ddf.c b/super-ddf.c
index 3e30229..870efd8 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -1369,6 +1369,7 @@ static void getinfo_super_ddf(struct supertype *st, struct mdinfo *info)
info->disk.state = (1 << MD_DISK_SYNC) | (1 << MD_DISK_ACTIVE);
+ info->recovery_start = MaxSector;
info->reshape_active = 0;
info->name[0] = 0;
@@ -1427,6 +1428,7 @@ static void getinfo_super_ddf_bvd(struct supertype *st, struct mdinfo *info)
info->container_member = ddf->currentconf->vcnum;
+ info->recovery_start = MaxSector;
info->resync_start = 0;
if (!(ddf->virt->entries[info->container_member].state
& DDF_state_inconsistent) &&
diff --git a/super-intel.c b/super-intel.c
index 91479a2..bbdcb51 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1452,6 +1452,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info)
info->data_offset = __le32_to_cpu(map->pba_of_lba0);
info->component_size = __le32_to_cpu(map->blocks_per_member);
memset(info->uuid, 0, sizeof(info->uuid));
+ info->recovery_start = MaxSector;
if (map->map_state == IMSM_T_STATE_UNINITIALIZED || dev->vol.dirty) {
info->resync_start = 0;
@@ -1559,6 +1560,7 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info)
info->disk.number = -1;
info->disk.state = 0;
info->name[0] = 0;
+ info->recovery_start = MaxSector;
if (super->disks) {
__u32 reserved = imsm_reserved_sectors(super, super->disks);
diff --git a/super0.c b/super0.c
index 0485a3a..5c6b7d7 100644
--- a/super0.c
+++ b/super0.c
@@ -372,6 +372,7 @@ static void getinfo_super0(struct supertype *st, struct mdinfo *info)
uuid_from_super0(st, info->uuid);
+ info->recovery_start = MaxSector;
if (sb->minor_version > 90 && (sb->reshape_position+1) != 0) {
info->reshape_active = 1;
info->reshape_progress = sb->reshape_position;
diff --git a/super1.c b/super1.c
index 85bb598..40fbb81 100644
--- a/super1.c
+++ b/super1.c
@@ -612,6 +612,11 @@ static void getinfo_super1(struct supertype *st, struct mdinfo *info)
strncpy(info->name, sb->set_name, 32);
info->name[32] = 0;
+ if (sb->feature_map & __le32_to_cpu(MD_FEATURE_RECOVERY_OFFSET))
+ info->recovery_start = __le32_to_cpu(sb->recovery_offset);
+ else
+ info->recovery_start = MaxSector;
+
if (sb->feature_map & __le32_to_cpu(MD_FEATURE_RESHAPE_ACTIVE)) {
info->reshape_active = 1;
info->reshape_progress = __le64_to_cpu(sb->reshape_position);
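
The rule described in the commit message (a device only counts towards
the in-sync total if its event count is current and it is not still
recovering) can be illustrated with a minimal standalone sketch. This is
not mdadm code: struct dev_info, count_in_sync() and the MAX_SECTOR value
are hypothetical stand-ins chosen for illustration, and the real
Assemble() loop tracks more state (avail[], best[], spares) than is
shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel meaning "fully recovered". mdadm calls this MaxSector;
 * the exact value used here is an assumption for the sketch. */
#define MAX_SECTOR UINT64_MAX

/* Hypothetical, simplified stand-in for the per-device metadata
 * that assembly looks at. */
struct dev_info {
	uint64_t events;         /* event counter from the superblock */
	uint64_t recovery_start; /* MAX_SECTOR when fully in-sync */
	int slot;                /* role in the array, -1 if none */
};

/* Count devices that hold a data slot, have a current event count
 * (within event_margin of the newest), and are not mid-recovery. */
static int count_in_sync(const struct dev_info *devs, size_t ndevs,
			 int raid_disks, uint64_t event_margin)
{
	uint64_t newest = 0;
	int okcnt = 0;
	size_t j;

	assert(raid_disks >= 0);

	/* Find the most recent event count among all devices. */
	for (j = 0; j < ndevs; j++)
		if (devs[j].events > newest)
			newest = devs[j].events;

	for (j = 0; j < ndevs; j++) {
		if (devs[j].events + event_margin < newest)
			continue; /* stale event count */
		if (devs[j].slot < 0 || devs[j].slot >= raid_disks)
			continue; /* not a data slot (e.g. a spare) */
		if (devs[j].recovery_start != MAX_SECTOR)
			continue; /* working member, but still recovering */
		okcnt++;
	}
	return okcnt;
}
```

With four devices where one is still recovering and one has a stale
event count, this counts only two as in-sync, which would not be enough
to start a 4-device RAID5 without forcing the stale device.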