linux-raid.vger.kernel.org archive mirror
* Set disk faulty / hot disk remove ioctl bug for read-only MD?
@ 2013-02-12 21:05 Joe Lawrence
  2013-02-13  2:38 ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: Joe Lawrence @ 2013-02-12 21:05 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

Hi Neil,

I believe I found a bug when failing and removing component devices from 
read-only and auto-read-only MD RAID devices.

Prior to commit 1ca69c4b "md: avoid taking the mutex on some ioctls", when
failing a read-only MD device component via mdadm --fail, EROFS was
returned.  Failing (and removing) auto-read-only components was permitted.

After the change, MD allows the failure of component devices for both
read-only and auto-read-only RAID devices.  Running mdadm --remove will
return EROFS when attempted on a read-only device.

It gets a little interesting when trying to remove a component from an
auto-read-only MD.  The *first* attempt will fail with EBUSY as the rdev
was left in the Blocked state by the SET_DISK_FAULTY ioctl.  However,
because the HOT_REMOVE_DISK ioctl made it through to the "switch to rw mode
if started auto-readonly" code, the array has been transitioned to read-write. 
Subsequent trips through the device's mddev->thread may clear the Blocked
flag via md_update_sb (should the reconfig_mutex be available).  A second or
third attempt at removing the failed component from the auto-read-only MD
would then succeed.

* After 1ca69c4b:

  ** Remove from auto-read-only:

  % cat /proc/mdstat 
  md124 : active (auto-read-only) raid1 sdr3[4] sdq3[2]
        4192192 blocks super 1.2 [2/2] [UU]
        bitmap: 0/1 pages [0KB], 65536KB chunk
  
  % mdadm --fail /dev/md124 /dev/sdq3 
  mdadm: set /dev/sdq3 faulty in /dev/md124
  
  % cat /sys/block/md124/md/dev-sdq3/state
  faulty,blocked
  
  % mdadm --remove /dev/md124 /dev/sdq3 
  mdadm: hot remove failed for /dev/sdq3: Device or resource busy
  
  % cat /sys/block/md124/md/dev-sdq3/state
  faulty
  
  % mdadm --remove /dev/md124 /dev/sdq3 
  mdadm: hot removed /dev/sdq3 from /dev/md124


  ** Remove from read-only MD:

  % mdadm --add /dev/md124 /dev/sdq3
  % mdadm --readonly /dev/md124
  % mdadm --fail /dev/md124 /dev/sdq3 
  mdadm: set /dev/sdq3 faulty in /dev/md124

  % cat /sys/block/md124/md/dev-sdq3/state
  faulty,blocked

  % mdadm --remove /dev/md124 /dev/sdq3 
  mdadm: hot remove failed for /dev/sdq3: Read-only file system


* After patch pasted at bottom of email:

  ** Remove from auto-read-only:

  % cat /proc/mdstat
  md124 : active (auto-read-only) raid1 sdq3[2] sdr3[4]
        4192192 blocks super 1.2 [2/2] [UU]
        bitmap: 0/1 pages [0KB], 65536KB chunk

  % mdadm --fail /dev/md124 /dev/sdq3
  mdadm: set /dev/sdq3 faulty in /dev/md124

  % cat /sys/block/md124/md/dev-sdq3/state
  faulty

  % mdadm --remove /dev/md124 /dev/sdq3
  mdadm: hot removed /dev/sdq3 from /dev/md124


  ** Remove from read-only MD:

  % mdadm --add /dev/md124 /dev/sdq3
  % mdadm --readonly /dev/md124
  % mdadm --fail /dev/md124 /dev/sdq3
  mdadm: set device faulty failed for /dev/sdq3:  Read-only file system

  % cat /sys/block/md124/md/dev-sdq3/state
  in_sync

  % cat /proc/mdstat
  md124 : active (read-only) raid1 sdq3[2] sdr3[4]
      4192192 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
  

See the patch below for how I reverted the behavior.  There may be a 
cleaner way of providing the same read-only rules for SET_DISK_FAULTY 
without taking the mutex, but it seemed clearer to just put it back where 
it was.

Regards,

-- Joe

From 3f8d6c46085c6106cfd9be9ddd6a059667b45b6f Mon Sep 17 00:00:00 2001
From: Joe Lawrence <joe.lawrence@stratus.com>
Date: Tue, 12 Feb 2013 15:07:45 -0500
Subject: [PATCH] md: SET_DISK_FAULTY should follow (auto) read-only rules

Commit 1ca69c4b "md: avoid taking the mutex on some ioctls" moved the
SET_DISK_FAULTY ioctl out from under the mddev_lock, but inadvertently changed
behavior for (auto) read-only arrays.  Revert that portion of the commit so
that marking a read-only MD device component as faulty is prevented (EROFS),
but still allowed for auto-read-only MD devices by transitioning to read-write
mode.

Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
---
 drivers/md/md.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3db3d1b..3c41d3d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6392,10 +6392,6 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 		else
 			err = get_disk_info(mddev, argp);
 		goto abort;
-
-	case SET_DISK_FAULTY:
-		err = set_disk_faulty(mddev, new_decode_dev(arg));
-		goto abort;
 	}
 
 	if (cmd == ADD_NEW_DISK)
@@ -6551,6 +6547,10 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 		err = hot_add_disk(mddev, new_decode_dev(arg));
 		goto done_unlock;
 
+	case SET_DISK_FAULTY:
+		err = set_disk_faulty(mddev, new_decode_dev(arg));
+		goto done_unlock;
+
 	case RUN_ARRAY:
 		err = do_md_run(mddev);
 		goto done_unlock;
-- 
1.8.1.2



* Re: Set disk faulty / hot disk remove ioctl bug for read-only MD?
  2013-02-12 21:05 Set disk faulty / hot disk remove ioctl bug for read-only MD? Joe Lawrence
@ 2013-02-13  2:38 ` NeilBrown
  2013-02-13 11:45   ` Sebastian Riemer
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2013-02-13  2:38 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: linux-raid


On Tue, 12 Feb 2013 16:05:18 -0500 (EST) Joe Lawrence
<Joe.Lawrence@stratus.com> wrote:

> Hi Neil,
> 
> I believe I found a bug when failing and removing component devices from 
> read-only and auto-read-only MD RAID devices.

Thanks for reporting them!

> 
> Prior to commit 1ca69c4b "md: avoid taking the mutex on some ioctls", when
> failing a read-only MD device component via mdadm --fail, EROFS was
> returned.  Failing (and removing) auto-read-only components was permitted.

I wonder what we really want here....
Given that a read error on a read-only array will mark the device as faulty,
it seems fair to allow "--fail" to also mark a device as faulty on a
read-only array.
Do you have a particular preference for having "mdadm --fail" fail with EROFS
on a readonly array?


> 
> After the change, MD allows the failure of component devices for both
> read-only and auto-read-only RAID devices.  Running mdadm --remove will
> return EROFS when attempted on a read-only device.

I suspect this is preferred behaviour.

> 
> It gets a little interesting when trying to remove a component from an
> auto-read-only MD.  The *first* attempt will fail with EBUSY as the rdev
> was left in the Blocked state by the SET_DISK_FAULTY ioctl.  However,
> because the HOT_REMOVE_DISK ioctl made it through to the "switch to rw mode
> if started auto-readonly" code, it has been transitioned to read-write. 
> Subsequent trips through the device's mddev->thread may clear the Blocked
> flag via md_update_sb (should the reconfig_mutex be available).  A second or
> third attempt at removing the failed component from the auto-read-only MD
> would then succeed.

This is definitely a bit odd.
In general "--remove" can fail with EBUSY for a short while after the
failure, but here the EBUSY can persist for an arbitrary delay.

I think we just want to allow the "set to read-write" process to complete
cleanly before trying the actual removal.  Patch below seems to work for me.
Can you confirm please?

Thanks,
NeilBrown


diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8b557d2..292cc2f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6529,7 +6529,17 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 			mddev->ro = 0;
 			sysfs_notify_dirent_safe(mddev->sysfs_state);
 			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-			md_wakeup_thread(mddev->thread);
+			/* mddev_unlock will wake thread */
+			/* If a device failed while we were read-only, we
+			 * need to make sure the metadata is updated now.
+			 */
+			if (test_bit(MD_CHANGE_DEVS, &mddev->flags)) {
+				mddev_unlock(mddev);
+				wait_event(mddev->sb_wait,
+					   !test_bit(MD_CHANGE_DEVS, &mddev->flags) &&
+					   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
+				mddev_lock(mddev);
+			}
 		} else {
 			err = -EROFS;
 			goto abort_unlock;



* Re: Set disk faulty / hot disk remove ioctl bug for read-only MD?
  2013-02-13  2:38 ` NeilBrown
@ 2013-02-13 11:45   ` Sebastian Riemer
  2013-02-13 14:30     ` Sebastian Riemer
  2013-02-13 18:56     ` Joe Lawrence
  0 siblings, 2 replies; 9+ messages in thread
From: Sebastian Riemer @ 2013-02-13 11:45 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Lawrence, linux-raid

On 13.02.2013 03:38, NeilBrown wrote:
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 8b557d2..292cc2f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6529,7 +6529,17 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
>  			mddev->ro = 0;
>  			sysfs_notify_dirent_safe(mddev->sysfs_state);
>  			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> -			md_wakeup_thread(mddev->thread);
> +			/* mddev_unlock will wake thread */
> +			/* If a device failed while we were read-only, we
> +			 * need to make sure the metadata is updated now.
> +			 */
> +			if (test_bit(MD_CHANGE_DEVS, &mddev->flags)) {
> +				mddev_unlock(mddev);
> +				wait_event(mddev->sb_wait,
> +					   !test_bit(MD_CHANGE_DEVS, &mddev->flags) &&
> +					   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
> +				mddev_lock(mddev);
> +			}
>  		} else {
>  			err = -EROFS;
>  			goto abort_unlock;
> 

Thanks, Neil!

I can confirm the issue on 3.4.y and that your patch fixes it reliably.

Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>



* Re: Set disk faulty / hot disk remove ioctl bug for read-only MD?
  2013-02-13 11:45   ` Sebastian Riemer
@ 2013-02-13 14:30     ` Sebastian Riemer
  2013-02-13 21:55       ` NeilBrown
  2013-02-13 18:56     ` Joe Lawrence
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Riemer @ 2013-02-13 14:30 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Lawrence, linux-raid

On 13.02.2013 12:45, Sebastian Riemer wrote:
> On 13.02.2013 03:38, NeilBrown wrote:
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 8b557d2..292cc2f 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -6529,7 +6529,17 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
>>  			mddev->ro = 0;
>>  			sysfs_notify_dirent_safe(mddev->sysfs_state);
>>  			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>> -			md_wakeup_thread(mddev->thread);
>> +			/* mddev_unlock will wake thread */
>> +			/* If a device failed while we were read-only, we
>> +			 * need to make sure the metadata is updated now.
>> +			 */
>> +			if (test_bit(MD_CHANGE_DEVS, &mddev->flags)) {
>> +				mddev_unlock(mddev);
>> +				wait_event(mddev->sb_wait,
>> +					   !test_bit(MD_CHANGE_DEVS, &mddev->flags) &&
>> +					   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
>> +				mddev_lock(mddev);
>> +			}
>>  		} else {
>>  			err = -EROFS;
>>  			goto abort_unlock;
>>
> 
> Thanks, Neil!
> 
> I can confirm the issue on 3.4.y and that your patch fixes it reliably.
> 
> Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
> 

Damn, I've got a kernel which still crashes in
reap_sync_thread->raid1_spare_active() with a NULL pointer dereference
even though this patch is applied. So the fix isn't correct yet.

I did some "objdump -S" on raid1.ko and found the issue at the following
code location in raid1_spare_active():
#	for (i = 0; i < conf->raid_disks; i++) {
#		struct md_rdev *rdev = conf->mirrors[i].rdev;
#		struct md_rdev *repl = conf->mirrors[conf->raid_disks + i].rdev;

A resync was pending (create without --assume-clean).
To me it looks like the faulty setting races with the syncer: the rdev
is no longer registered in the personality, but the syncer tries to
access it for an immediate resync.

Cheers,
Sebastian


* Re: Set disk faulty / hot disk remove ioctl bug for read-only MD?
  2013-02-13 11:45   ` Sebastian Riemer
  2013-02-13 14:30     ` Sebastian Riemer
@ 2013-02-13 18:56     ` Joe Lawrence
  2013-02-13 22:01       ` NeilBrown
  1 sibling, 1 reply; 9+ messages in thread
From: Joe Lawrence @ 2013-02-13 18:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: Sebastian Riemer, Joe Lawrence, linux-raid

On Wed, 13 Feb 2013, Sebastian Riemer wrote:

> On 13.02.2013 03:38, NeilBrown wrote:
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 8b557d2..292cc2f 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -6529,7 +6529,17 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
> >  			mddev->ro = 0;
> >  			sysfs_notify_dirent_safe(mddev->sysfs_state);
> >  			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > -			md_wakeup_thread(mddev->thread);
> > +			/* mddev_unlock will wake thread */
> > +			/* If a device failed while we were read-only, we
> > +			 * need to make sure the metadata is updated now.
> > +			 */
> > +			if (test_bit(MD_CHANGE_DEVS, &mddev->flags)) {
> > +				mddev_unlock(mddev);
> > +				wait_event(mddev->sb_wait,
> > +					   !test_bit(MD_CHANGE_DEVS, &mddev->flags) &&
> > +					   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
> > +				mddev_lock(mddev);
> > +			}
> >  		} else {
> >  			err = -EROFS;
> >  			goto abort_unlock;
> > 
> 
> Thanks, Neil!
> 
> I can confirm the issue on 3.4.y and that your patch fixes it reliably.
> 
> Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>

Running with this patch against 3.8.0-rc7 gives the following results:

  auto-read-only
  * mdadm --fail   - success
  * mdadm --remove - success

  read-only
  * mdadm --fail   - success
  * mdadm --remove - EROFS

Quick question for Neil -- should the sysfs MD component device state 
file interface perform the same transition from auto-read-only to 
read-write, or is that route intended for more granular changes?

Thanks,

-- Joe


* Re: Set disk faulty / hot disk remove ioctl bug for read-only MD?
  2013-02-13 14:30     ` Sebastian Riemer
@ 2013-02-13 21:55       ` NeilBrown
  2013-02-14 12:23         ` Sebastian Riemer
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2013-02-13 21:55 UTC (permalink / raw)
  To: Sebastian Riemer; +Cc: Joe Lawrence, linux-raid


On Wed, 13 Feb 2013 15:30:30 +0100 Sebastian Riemer
<sebastian.riemer@profitbricks.com> wrote:

> On 13.02.2013 12:45, Sebastian Riemer wrote:
> > On 13.02.2013 03:38, NeilBrown wrote:
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 8b557d2..292cc2f 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -6529,7 +6529,17 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
> >>  			mddev->ro = 0;
> >>  			sysfs_notify_dirent_safe(mddev->sysfs_state);
> >>  			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> >> -			md_wakeup_thread(mddev->thread);
> >> +			/* mddev_unlock will wake thread */
> >> +			/* If a device failed while we were read-only, we
> >> +			 * need to make sure the metadata is updated now.
> >> +			 */
> >> +			if (test_bit(MD_CHANGE_DEVS, &mddev->flags)) {
> >> +				mddev_unlock(mddev);
> >> +				wait_event(mddev->sb_wait,
> >> +					   !test_bit(MD_CHANGE_DEVS, &mddev->flags) &&
> >> +					   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
> >> +				mddev_lock(mddev);
> >> +			}
> >>  		} else {
> >>  			err = -EROFS;
> >>  			goto abort_unlock;
> >>
> > 
> > Thanks, Neil!
> > 
> > I can confirm the issue on 3.4.y and that your patch fixes it reliably.
> > 
> > Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
> > 
> 
> Damn, I've got a kernel which still crashes in
> reap_sync_thread->raid1_spare_active() with NULL pointer dereference
> although this patch is applied. So the fix isn't correct, yet.
> 
> I did some "objdump -S" on raid1.ko and found the issue at the following
> code location in raid1_spare_active():
> #	for (i = 0; i < conf->raid_disks; i++) {
> #		struct md_rdev *rdev = conf->mirrors[i].rdev;
> #		struct md_rdev *repl = conf->mirrors[conf->raid_disks + i].rdev;
> 
> A resync was pending (create without --assume-clean).
> For me it looks like the faulty setting races with the syncer. The rdev
> isn't registered in the personality anymore but the syncer tries to
> access it for immediate resync.
> 

Where exactly is it crashing?  Can I see the complete Oops message?
The code you have identified cannot crash unless conf->raid_disks has become
inconsistent with the allocation of ->mirrors, and that is very unlikely.
Both 'rdev' and 'repl' are tested for NULL before they are used...

If you can get me the Oops message I can probably narrow it down.

Thanks,
NeilBrown




* Re: Set disk faulty / hot disk remove ioctl bug for read-only MD?
  2013-02-13 18:56     ` Joe Lawrence
@ 2013-02-13 22:01       ` NeilBrown
  2013-02-14 19:45         ` Joe Lawrence
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2013-02-13 22:01 UTC (permalink / raw)
  To: Joe Lawrence; +Cc: Sebastian Riemer, linux-raid


On Wed, 13 Feb 2013 13:56:45 -0500 (EST) Joe Lawrence
<Joe.Lawrence@stratus.com> wrote:

> On Wed, 13 Feb 2013, Sebastian Riemer wrote:
> 
> > On 13.02.2013 03:38, NeilBrown wrote:
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 8b557d2..292cc2f 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -6529,7 +6529,17 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
> > >  			mddev->ro = 0;
> > >  			sysfs_notify_dirent_safe(mddev->sysfs_state);
> > >  			set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> > > -			md_wakeup_thread(mddev->thread);
> > > +			/* mddev_unlock will wake thread */
> > > +			/* If a device failed while we were read-only, we
> > > +			 * need to make sure the metadata is updated now.
> > > +			 */
> > > +			if (test_bit(MD_CHANGE_DEVS, &mddev->flags)) {
> > > +				mddev_unlock(mddev);
> > > +				wait_event(mddev->sb_wait,
> > > +					   !test_bit(MD_CHANGE_DEVS, &mddev->flags) &&
> > > +					   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
> > > +				mddev_lock(mddev);
> > > +			}
> > >  		} else {
> > >  			err = -EROFS;
> > >  			goto abort_unlock;
> > > 
> > 
> > Thanks, Neil!
> > 
> > I can confirm the issue on 3.4.y and that your patch fixes it reliably.
> > 
> > Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
> 
> Running with this patch against 3.8.0rc7 with the following results:
> 
>   auto-read-only
>   * mdadm --fail   - success
>   * mdadm --remove - success
> 
>   read-only
>   * mdadm --fail   - success
>   * mdadm --remove - EROFS

That all looks good.  I could probably argue that removing a device from a
read-only array should be permitted but I don't know that it is really
important.


> 
> Quick question for Neil -- should the sysfs MD component device state 
> file interface perform the same transition from auto-read-only to 
> read-write, or is that route intended for more granular changes?
> 

Good question.
I think they should definitely transition from auto-read-only to read-write.
Whether they should be denied  for read-only devices I'm less certain of.
I'll look into that.

Thanks,
NeilBrown




* Re: Set disk faulty / hot disk remove ioctl bug for read-only MD?
  2013-02-13 21:55       ` NeilBrown
@ 2013-02-14 12:23         ` Sebastian Riemer
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Riemer @ 2013-02-14 12:23 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Lawrence, linux-raid

On 13.02.2013 22:55, NeilBrown wrote:
> On Wed, 13 Feb 2013 15:30:30 +0100 Sebastian Riemer
> <sebastian.riemer@profitbricks.com> wrote:
>> Damn, I've got a kernel which still crashes in
>> reap_sync_thread->raid1_spare_active() with NULL pointer dereference
>> although this patch is applied. So the fix isn't correct, yet.
>>
>> I did some "objdump -S" on raid1.ko and found the issue at the following
>> code location in raid1_spare_active():
>> #	for (i = 0; i < conf->raid_disks; i++) {
>> #		struct md_rdev *rdev = conf->mirrors[i].rdev;
>> #		struct md_rdev *repl = conf->mirrors[conf->raid_disks + i].rdev;
>>
>> A resync was pending (create without --assume-clean).
>> For me it looks like the faulty setting races with the syncer. The rdev
>> isn't registered in the personality anymore but the syncer tries to
>> access it for immediate resync.
>>
> 
> Where exactly is it crashing?  Can I see the complete Oops message?
> The code you have identified cannot crash unless conf->raid_disks has become
> inconsistent with the allocation of ->mirrors, and that is very unlikely.
> Both 'rdev' and 'repl' are tested for NULL before they are used...

Sorry, it turned out to be a local issue: in a check of my own directly on
the next line I had forgotten to test rdev for NULL. Thanks for the help!

I'm developing a raw-to-md migration at the moment. With it I can let
MD sync from a device without an MD superblock to a device with an MD
superblock.

Cheers,
Sebastian



* Re: Set disk faulty / hot disk remove ioctl bug for read-only MD?
  2013-02-13 22:01       ` NeilBrown
@ 2013-02-14 19:45         ` Joe Lawrence
  0 siblings, 0 replies; 9+ messages in thread
From: Joe Lawrence @ 2013-02-14 19:45 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Lawrence, Sebastian Riemer, linux-raid

On Thu, 14 Feb 2013, NeilBrown wrote:

> That all looks good.  I could probably argue that removing a device from a
> read-only array should be permitted but I don't know that it is really
> important.
> 
> 
> > 
> > Quick question for Neil -- should the sysfs MD component device state 
> > file interface perform the same transition from auto-read-only to 
> > read-write, or is that route intended for more granular changes?
> > 
> 
> Good question.
> I think they should definitely transition from auto-read-only to read-write.
> Whether they should be denied  for read-only devices I'm less certain of.
> I'll look into that.

For example, if one disk of a read-only RAID1 happens to be removed, what 
should the user do if he wants MD to release its references to that 
device?  For read-write (and auto-read-only after your change), the user 
can issue mdadm --fail then --remove.  However, if the MD is read-only, is 
stopping the MD the only way to close the device?  What happens if the 
disk is re-inserted and added to the MD?

Regards,

-- Joe

