linux-raid.vger.kernel.org archive mirror
* interrupted resync not restarted properly?
@ 2010-12-07 20:37 Dailey, Nate
  2010-12-09  3:16 ` Neil Brown
  0 siblings, 1 reply; 3+ messages in thread
From: Dailey, Nate @ 2010-12-07 20:37 UTC
  To: linux-raid

It seems to me that resuming an interrupted resync doesn't always work
right... here's what I'm doing (kernel 2.6.36):

- start with a 2 disk raid1 with internal bitmap
- fail/remove one disk and zero the superblock
- add the disk to the raid1
- before resync completes, fail/remove the disk again
- re-add the disk again

For version 0 superblocks, this works the way I'd expect: on adding the
disk the second time, the resync continues (or restarts from the
beginning, not sure).

But for version 1 superblocks, on adding the disk the second time, the
resync completes immediately, leaving some part of the array
out-of-sync.

Should there be something in the v1 superblock to prevent this?

If the raid1 is stopped in the middle of the resync (instead of removing
the target disk) the resync is resumed correctly on re-assembly with
both devices.

Nate



* Re: interrupted resync not restarted properly?
  2010-12-07 20:37 interrupted resync not restarted properly? Dailey, Nate
@ 2010-12-09  3:16 ` Neil Brown
  2010-12-09 16:53   ` Dailey, Nate
  0 siblings, 1 reply; 3+ messages in thread
From: Neil Brown @ 2010-12-09  3:16 UTC
  To: Dailey, Nate; +Cc: linux-raid

On Tue, 7 Dec 2010 15:37:00 -0500 "Dailey, Nate" <Nate.Dailey@stratus.com>
wrote:

> It seems to me that resuming an interrupted resync doesn't always work
> right... here's what I'm doing (kernel 2.6.36):
> 
> - start with a 2 disk raid1 with internal bitmap
> - fail/remove one disk and zero the superblock
> - add the disk to the raid1
> - before resync completes, fail/remove the disk again
> - re-add the disk again
> 
> For version 0 superblocks, this works the way I'd expect: on adding the
> disk the second time, the resync continues (or restarts from the
> beginning, not sure).
> 
> But for version 1 superblocks, on adding the disk the second time, the
> resync completes immediately, leaving some part of the array
> out-of-sync.
> 
> Should there be something in the v1 superblock to prevent this?
> 
> If the raid1 is stopped in the middle of the resync (instead of removing
> the target disk) the resync is resumed correctly on re-assembly with
> both devices.
> 
> Nate

Thanks for the report.
That is pretty bad behaviour.

The following is a patch that I plan to submit to -linus and -stable.  It
doesn't make it work quite as I would like (that would be a lot more code)
but it makes it a lot safer.

Thanks,
NeilBrown

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5170,7 +5174,10 @@ static int add_new_disk(mddev_t * mddev, mdu_disk_info_t *info)
 		} else
 			super_types[mddev->major_version].
 				validate_super(mddev, rdev);
-		rdev->saved_raid_disk = rdev->raid_disk;
+		if (test_bit(In_sync, &rdev->flags))
+			rdev->saved_raid_disk = rdev->raid_disk;
+		else
+			rdev->saved_raid_disk = -1;
 
 		clear_bit(In_sync, &rdev->flags); /* just to be sure */
 		if (info->state & (1<<MD_DISK_WRITEMOSTLY))
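
For readers following the patch, here is a minimal userspace sketch of the decision it changes. It is not kernel code: `struct model_rdev` and the three helper functions are hypothetical names invented for illustration, and the `in_sync` field stands in for `test_bit(In_sync, &rdev->flags)`. The sketch also assumes, which this thread does not state, that a negative `saved_raid_disk` causes a full recovery while a non-negative value permits a bitmap-based one.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's rdev structure.
 * Only the fields the patch reads or writes are modelled. */
struct model_rdev {
	int  raid_disk;        /* slot recorded in the device's superblock */
	int  saved_raid_disk;  /* slot remembered for re-add; -1 means "none" */
	bool in_sync;          /* models test_bit(In_sync, &rdev->flags) */
};

/* Behaviour before the patch: the slot is always remembered. */
static void remember_slot_old(struct model_rdev *rdev)
{
	rdev->saved_raid_disk = rdev->raid_disk;
}

/* Behaviour after the patch: the slot is remembered only if the
 * device was marked in sync when it was last part of the array. */
static void remember_slot_patched(struct model_rdev *rdev)
{
	if (rdev->in_sync)
		rdev->saved_raid_disk = rdev->raid_disk;
	else
		rdev->saved_raid_disk = -1;
}

/* ASSUMPTION (not stated in this thread): recovery is limited to the
 * bitmap when saved_raid_disk >= 0 and covers the whole device
 * otherwise. */
static bool needs_full_recovery(const struct model_rdev *rdev)
{
	return rdev->saved_raid_disk < 0;
}
```

Under that assumption, a disk removed partway through its first resync (`in_sync == false`) keeps its slot with the old rule and so gets only a bitmap-based recovery, which would match the "completes immediately" symptom in the report. With the patched rule it gets `-1` and a full recovery, which is consistent with the patch being described as safer rather than ideal.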


* RE: interrupted resync not restarted properly?
  2010-12-09  3:16 ` Neil Brown
@ 2010-12-09 16:53   ` Dailey, Nate
  0 siblings, 0 replies; 3+ messages in thread
From: Dailey, Nate @ 2010-12-09 16:53 UTC
  To: Neil Brown; +Cc: linux-raid

I tried this out, and it does indeed fix the problem I was seeing.

Thanks!
Nate




