[PATCH] md: Add ability for disable bad block management

linux-raid.vger.kernel.org archive mirror

* [PATCH] md: Add ability for disable bad block management
@ 2011-11-24 12:19 Adam Kwolek
  2011-11-24 12:23 ` Paul Menzel
  2011-11-30  0:14 ` NeilBrown
  0 siblings, 2 replies; 13+ messages in thread
From: Adam Kwolek @ 2011-11-24 12:19 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, ed.ciechanowski, marcin.labun, dan.j.williams

When external metadata doesn't support bad block management (BBM), mdadm
cannot respond correctly to BBM requests. This causes the reshape process to
stall.

Add the ability for external metadata handlers (mdadm) to disable BBM via
sysfs. md will then ignore bad blocks, as it does for metadata v0.90.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 drivers/md/md.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 266e82e..6591108 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2935,7 +2935,16 @@ static ssize_t bb_show(struct md_rdev *rdev, char *page)
 }
 static ssize_t bb_store(struct md_rdev *rdev, const char *page, size_t len)
 {
-	int rv = badblocks_store(&rdev->badblocks, page, len, 0);
+	int rv;
+
+	/* disable bad blocks managment
+	 */
+	if (strstr(page, "disable") == page) {
+		bb->shift = -1;
+		return len;
+	}
+
+	rv = badblocks_store(&rdev->badblocks, page, len, 0);
 	/* Maybe that ack was all we needed */
 	if (test_and_clear_bit(BlockedBadBlocks, &rdev->flags))
 		wake_up(&rdev->blocked_wait);
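
The proposed bb_store() change can be modelled in user space as below. This is a sketch under stated assumptions: struct badblocks_model, struct rdev_model, badblocks_store_model() and bb_store_model() are hypothetical stand-ins rather than md's real types, and the model's choice to refuse new ranges once BBM is disabled is an assumption. Note also that the hunk as posted assigns bb->shift, but bb_store() has no variable named bb (it only receives rdev), so the sketch writes rdev->badblocks.shift instead. The strncmp() call is an equivalent prefix test to strstr(page, "disable") == page.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical stand-ins for the kernel structures; only the fields
 * this sketch touches are modelled. */
struct badblocks_model {
	int shift;	/* -1 means "bad block management disabled" */
	int count;	/* number of recorded ranges (illustrative) */
};

struct rdev_model {
	struct badblocks_model badblocks;
};

/* Stand-in for badblocks_store(): accepts "<sector> <length>" and
 * records one range.  Returns len on success, -1 otherwise. */
static ssize_t badblocks_store_model(struct badblocks_model *bb,
				     const char *page, size_t len)
{
	unsigned long long sector;
	int length;

	if (bb->shift < 0)
		return -1;	/* assumption: disabled BBM refuses ranges */
	if (sscanf(page, "%llu %d", &sector, &length) != 2 || length <= 0)
		return -1;
	bb->count++;
	return (ssize_t)len;
}

/* Model of the proposed bb_store(): a write that starts with
 * "disable" switches bad block management off for this rdev. */
static ssize_t bb_store_model(struct rdev_model *rdev, const char *page,
			      size_t len)
{
	assert(rdev && page);
	if (strncmp(page, "disable", strlen("disable")) == 0) {
		rdev->badblocks.shift = -1;	/* not bb->shift */
		return (ssize_t)len;
	}
	return badblocks_store_model(&rdev->badblocks, page, len);
}
```

In this model, writing "disable" flips shift to -1, and any range written to the same device afterwards is refused.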

* Re: [PATCH] md: Add ability for disable bad block management
  2011-11-24 12:19 [PATCH] md: Add ability for disable bad block management Adam Kwolek
@ 2011-11-24 12:23 ` Paul Menzel
  2011-11-24 12:28   ` Kwolek, Adam
  2011-11-30  0:14 ` NeilBrown
  1 sibling, 1 reply; 13+ messages in thread
From: Paul Menzel @ 2011-11-24 12:23 UTC (permalink / raw)
  To: Adam Kwolek
  Cc: neilb, linux-raid, ed.ciechanowski, marcin.labun, dan.j.williams

Dear Adam,


On Thursday, 24.11.2011, at 13:19 +0100, Adam Kwolek wrote:
> When external metadata doesn't support BBM, mdadm cannot answer correctly
> for BBM requests. It causes reshape process being stopped.
> 
> Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> md will ignore bad blocks as it is for metadata v0.90.
> 
> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> ---
> 
>  drivers/md/md.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 266e82e..6591108 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -2935,7 +2935,16 @@ static ssize_t bb_show(struct md_rdev *rdev, char *page)
>  }
>  static ssize_t bb_store(struct md_rdev *rdev, const char *page, size_t len)
>  {
> -	int rv = badblocks_store(&rdev->badblocks, page, len, 0);
> +	int rv;
> +
> +	/* disable bad blocks managment

manag*e*ment

> +	 */
> +	if (strstr(page, "disable") == page) {
> +		bb->shift = -1;
> +		return len;
> +	}
> +
> +	rv = badblocks_store(&rdev->badblocks, page, len, 0);
>  	/* Maybe that ack was all we needed */
>  	if (test_and_clear_bit(BlockedBadBlocks, &rdev->flags))
>  		wake_up(&rdev->blocked_wait);

Also, doesn't this seem to collide with PATCH 11/11?


Thanks,

Paul

* RE: [PATCH] md: Add ability for disable bad block management
  2011-11-24 12:23 ` Paul Menzel
@ 2011-11-24 12:28   ` Kwolek, Adam
  2011-11-24 12:48     ` Paul Menzel
  0 siblings, 1 reply; 13+ messages in thread
From: Kwolek, Adam @ 2011-11-24 12:28 UTC (permalink / raw)
  To: Paul Menzel
  Cc: neilb@suse.de, linux-raid@vger.kernel.org, Ciechanowski, Ed,
	Labun, Marcin, Williams, Dan J



> -----Original Message-----
> From: Paul Menzel [mailto:pm.debian@googlemail.com]
> Sent: Thursday, November 24, 2011 1:23 PM
> To: Kwolek, Adam
> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun,
> Marcin; Williams, Dan J
> Subject: Re: [PATCH] md: Add ability for disable bad block management
> 
> Dear Adam,
> 
> 
> On Thursday, 24.11.2011, at 13:19 +0100, Adam Kwolek wrote:
> > When external metadata doesn't support BBM, mdadm cannot answer
> > correctly for BBM requests. It causes reshape process being stopped.
> >
> > Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> > md will ignore bad blocks as it is for metadata v0.90.
> >
> > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > ---
> >
> >  drivers/md/md.c |   11 ++++++++++-
> >  1 files changed, 10 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 266e82e..6591108 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -2935,7 +2935,16 @@ static ssize_t bb_show(struct md_rdev *rdev, char *page)
> >  }
> >  static ssize_t bb_store(struct md_rdev *rdev, const char *page, size_t len)
> >  {
> > -	int rv = badblocks_store(&rdev->badblocks, page, len, 0);
> > +	int rv;
> > +
> > +	/* disable bad blocks managment
> 
> manag*e*ment
> 
> > +	 */
> > +	if (strstr(page, "disable") == page) {
> > +		bb->shift = -1;
> > +		return len;
> > +	}
> > +
> > +	rv = badblocks_store(&rdev->badblocks, page, len, 0);
> >  	/* Maybe that ack was all we needed */
> >  	if (test_and_clear_bit(BlockedBadBlocks, &rdev->flags))
> >  		wake_up(&rdev->blocked_wait);
> 
> Also this seems to collide with PATCH 11/11, does not it?
> 
> 
> Thanks,
> 
> Paul

In my opinion it doesn't.
Patch 11/11 in mdadm allows disabling BBM per rdev by writing the word 'disable' to sysfs.
This patch makes md interpret that write and disable BBM (also per rdev).

Do you agree?

BR
Adam

* RE: [PATCH] md: Add ability for disable bad block management
  2011-11-24 12:28   ` Kwolek, Adam
@ 2011-11-24 12:48     ` Paul Menzel
  0 siblings, 0 replies; 13+ messages in thread
From: Paul Menzel @ 2011-11-24 12:48 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: neilb@suse.de, linux-raid@vger.kernel.org, Ciechanowski, Ed,
	Labun, Marcin, Williams, Dan J

On Thursday, 24.11.2011, at 12:28 +0000, Kwolek, Adam wrote:
> > -----Original Message-----
> > From: Paul Menzel [mailto:pm.debian@googlemail.com]
> > Sent: Thursday, November 24, 2011 1:23 PM
> > To: Kwolek, Adam
> > Cc: neilb@suse.de; linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun,
> > Marcin; Williams, Dan J
> > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > 
> > Dear Adam,
> > 
> > 
> > On Thursday, 24.11.2011, at 13:19 +0100, Adam Kwolek wrote:
> > > When external metadata doesn't support BBM, mdadm cannot answer
> > > correctly for BBM requests. It causes reshape process being stopped.
> > >
> > > Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> > > md will ignore bad blocks as it is for metadata v0.90.
> > >
> > > Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
> > > ---
> > >
> > >  drivers/md/md.c |   11 ++++++++++-
> > >  1 files changed, 10 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 266e82e..6591108 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -2935,7 +2935,16 @@ static ssize_t bb_show(struct md_rdev *rdev, char *page)
> > >  }
> > >  static ssize_t bb_store(struct md_rdev *rdev, const char *page, size_t len)
> > >  {
> > > -	int rv = badblocks_store(&rdev->badblocks, page, len, 0);
> > > +	int rv;
> > > +
> > > +	/* disable bad blocks managment
> > 
> > manag*e*ment
> > 
> > > +	 */
> > > +	if (strstr(page, "disable") == page) {
> > > +		bb->shift = -1;
> > > +		return len;
> > > +	}
> > > +
> > > +	rv = badblocks_store(&rdev->badblocks, page, len, 0);
> > >  	/* Maybe that ack was all we needed */
> > >  	if (test_and_clear_bit(BlockedBadBlocks, &rdev->flags))
> > >  		wake_up(&rdev->blocked_wait);
> > 
> > Also this seems to collide with PATCH 11/11, does not it?

> In my opinion it doesn't.
> Patch 11/11 in mdadm allows for disabling BBM per rdev by setting 'disable' word in sysfs.
> This patch interprets in md this action and disables BBM (per rdev also) .
> 
> Do you agree?

Yes. Thank you for your explanation. I just saw the same typo there, so please
correct it in 11/11 as well.


Thanks,

Paul

* Re: [PATCH] md: Add ability for disable bad block management
  2011-11-24 12:19 [PATCH] md: Add ability for disable bad block management Adam Kwolek
  2011-11-24 12:23 ` Paul Menzel
@ 2011-11-30  0:14 ` NeilBrown
  2011-11-30  8:17   ` Kwolek, Adam
  1 sibling, 1 reply; 13+ messages in thread
From: NeilBrown @ 2011-11-30  0:14 UTC (permalink / raw)
  To: Adam Kwolek; +Cc: linux-raid, ed.ciechanowski, marcin.labun, dan.j.williams

On Thu, 24 Nov 2011 13:19:53 +0100 Adam Kwolek <adam.kwolek@intel.com> wrote:

> When external metadata doesn't support BBM, mdadm cannot answer correctly
> for BBM requests. It causes reshape process being stopped.
> 
> Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> md will ignore bad blocks as it is for metadata v0.90.

This should not be necessary.

The intention is that a device with a bad block looks exactly like a failed
device, i.e. 'faulty' and 'blocked' appear in the 'state' file.

If the metadata doesn't support a bad-block list, it will record that the
device has failed and will unblock the device.  At that point the failure is
forced.
If the metadata does support a bad block list it will just record the bad
blocks and acknowledge them, and then unblock the device.  At that point the
device won't be failed, the 'faulty' state will disappear, and it will
continue to be used with the known bad blocks.
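
The handshake described above can be pictured as a small state model. The sketch is illustrative only: struct dev_state, md_report_bad_block() and handler_respond() are hypothetical names, and the boolean fields are a simplification of what md exposes through the per-device 'state' file.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one member device. */
struct dev_state {
	bool faulty;
	bool blocked;
	bool unacked_badblocks;
	bool failed_in_metadata;
};

/* md turns a write error into a bad block: until user space responds,
 * the device looks failed ('faulty' and 'blocked'). */
static void md_report_bad_block(struct dev_state *d)
{
	assert(d);
	d->faulty = true;
	d->blocked = true;
	d->unacked_badblocks = true;
}

/* The metadata handler's response, as described above. */
static void handler_respond(struct dev_state *d, bool metadata_has_bbl)
{
	assert(d);
	if (metadata_has_bbl) {
		/* record and acknowledge; the device stays in use */
		d->unacked_badblocks = false;
		d->faulty = false;
	} else {
		/* no bad-block list: record the failure, forcing it */
		d->failed_in_metadata = true;
	}
	d->blocked = false;	/* unblock in both cases */
}
```

Either way the device ends up unblocked; only the metadata's capability decides whether it stays in the array.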

What exactly is going wrong that makes you think you need this patch?

NeilBrown

* RE: [PATCH] md: Add ability for disable bad block management
  2011-11-30  0:14 ` NeilBrown
@ 2011-11-30  8:17   ` Kwolek, Adam
  2011-12-06  6:05     ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Kwolek, Adam @ 2011-11-30  8:17 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Labun, Marcin,
	Williams, Dan J

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, November 30, 2011 1:14 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams,
> Dan J
> Subject: Re: [PATCH] md: Add ability for disable bad block management
> 
> On Thu, 24 Nov 2011 13:19:53 +0100 Adam Kwolek
> <adam.kwolek@intel.com> wrote:
> 
> > When external metadata doesn't support BBM, mdadm cannot answer
> > correctly for BBM requests. It causes reshape process being stopped.
> >
> > Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> > md will ignore bad blocks as it is for metadata v0.90.
> 
> This should not be necessary.
> 
> The intention is that a device with a bad block looks exactly like a device with
> a failed device.  i.e. 'faulty' and 'blocked' appear in the 'state'
> file.
> 
> If the metadata doesn't support a bad-block list, it will record that the device
> has failed and will unblock the device.  At that point the failure is forced.
> If the metadata does support a bad block list it will just record the bad blocks
> and acknowledge them, and the unblock the device.  At that point the device
> won't be failed, the 'faulty' state will disappear, and it will continue to be
> used with the known bad blocks.
> 
> What exactly is going wrong that makes you think you need this patch?

When the array degrades during migration, a BBM event is signalled to mdmon, and mdmon (monitor.c) tries to mark the disk '-blocked'.
This operation fails. mdmon goes into a loop, and nothing can be done to signal or remove the device (I cannot do it via sysfs either).
In sysfs the device is still present under /sys/block/mdXXX/md, but the entry /sys/block/mdXXX/md/dev-sdX/~block is missing (the disk was pulled out).

From the kernel's perspective, when a BBM event occurs the md_do_sync() thread should finish, and md should restart the process with the new set of disks.
This is what happens with metadata 0.90, where BBM is disabled, and with imsm when my patches are applied.
Without BBM being disabled, md_do_sync() blocks and never finishes, so the whole process stalls.
If I force md_do_sync() to finish, a second one is not started, as it is in the BBM-disabled case.

From user space, the device appears as one that cannot be unmounted (not surprising, since the kernel thread waits forever).
A normal reboot is not possible either.

If you don't want the last patch, which disables BBM support (at least for now?), please consider the rest of the patches.
They add the ability to restart migration when the array was degraded while it was offline (someone "borrows" a disk ;)).

If you have any more questions please let me know.

BR
Adam

> 
> NeilBrown
> 

* Re: [PATCH] md: Add ability for disable bad block management
  2011-11-30  8:17   ` Kwolek, Adam
@ 2011-12-06  6:05     ` NeilBrown
  2011-12-06 13:02       ` Kwolek, Adam
  0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2011-12-06  6:05 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Labun, Marcin,
	Williams, Dan J

On Wed, 30 Nov 2011 08:17:32 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@suse.de]
> > Sent: Wednesday, November 30, 2011 1:14 AM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams,
> > Dan J
> > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > 
> > On Thu, 24 Nov 2011 13:19:53 +0100 Adam Kwolek
> > <adam.kwolek@intel.com> wrote:
> > 
> > > When external metadata doesn't support BBM, mdadm cannot answer
> > > correctly for BBM requests. It causes reshape process being stopped.
> > >
> > > Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> > > md will ignore bad blocks as it is for metadata v0.90.
> > 
> > This should not be necessary.
> > 
> > The intention is that a device with a bad block looks exactly like a device with
> > a failed device.  i.e. 'faulty' and 'blocked' appear in the 'state'
> > file.
> > 
> > If the metadata doesn't support a bad-block list, it will record that the device
> > has failed and will unblock the device.  At that point the failure is forced.
> > If the metadata does support a bad block list it will just record the bad blocks
> > and acknowledge them, and the unblock the device.  At that point the device
> > won't be failed, the 'faulty' state will disappear, and it will continue to be
> > used with the known bad blocks.
> > 
> > What exactly is going wrong that makes you think you need this patch?
> 
> 
> When degradation occurs during migration BBM is signaled to mdmon and mdmon /monitor.c/ tries to mark disk  '-blocked'
> This operation fails. Momon goes in to loop, and nothing can be done /I cannot make it using sysfs/ to signal or remove device.
> In sysfs device is present in /sys/block/mdXXX/md but entry /sys/block/mdXXX/md/dev-sdX/~block is missing /disk was pulled out/.


I've found a few issues.  I'm not sure if they completely explain what
you are seeing.  Could you please test with these fixes and tell me the
results?

Firstly, I find that writing "-blocked" succeeds (no error returned) but the
"blocked" flag does not get cleared, which is certainly confusing.

This is fixed by:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4adcbb4..7258dc1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2562,7 +2562,8 @@ state_show(struct md_rdev *rdev, char *page)
 		sep = ",";
 	}
 	if (test_bit(Blocked, &rdev->flags) ||
-	    rdev->badblocks.unacked_exist) {
+	    (rdev->badblocks.unacked_exist
+	     && !test_bit(Faulty, &rdev->flags))) {
 		len += sprintf(page+len, "%sblocked", sep);
 		sep = ",";
 	}
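
To see what the state_show() change does, here is a user-space sketch of just the faulty/blocked part of the output. struct rdev_flags and format_state() are hypothetical; the real function reports several more states. With the fix, a device that is already faulty no longer reports 'blocked' merely because unacknowledged bad blocks remain, so clearing the Blocked bit by writing "-blocked" becomes visible in the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flattened view of the rdev fields consulted here. */
struct rdev_flags {
	bool faulty;
	bool blocked;		/* the Blocked bit */
	bool unacked_exist;	/* badblocks.unacked_exist */
};

/* Builds the faulty/blocked part of the 'state' string, applying the
 * patched rule: unacknowledged bad blocks only imply "blocked" while
 * the device is not already Faulty. */
static int format_state(const struct rdev_flags *r, char *page, size_t size)
{
	const char *sep = "";
	int len = 0;

	assert(r && page && size >= 32);	/* longest output is 16 bytes */
	if (r->faulty) {
		len += snprintf(page + len, size - (size_t)len, "%sfaulty", sep);
		sep = ",";
	}
	if (r->blocked || (r->unacked_exist && !r->faulty)) {
		len += snprintf(page + len, size - (size_t)len, "%sblocked", sep);
		sep = ",";
	}
	len += snprintf(page + len, size - (size_t)len, "\n");
	return len;
}
```

A faulty device with only unacknowledged bad blocks now prints "faulty", while a healthy device with unacknowledged bad blocks still prints "blocked".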


Secondly, mdmon writes "-blocked" even when the "blocked" flag is not set.
This succeeds, so state_store() calls
		sysfs_notify_dirent_safe(rdev->sysfs_state);

so mdmon/monitor.c is woken up to go around the loop again and it writes
"-blocked" again and so it continues in a loop.

This is fixed by:

diff --git a/monitor.c b/monitor.c
index b002e90..29bde18 100644
--- a/monitor.c
+++ b/monitor.c
@@ -339,7 +339,8 @@ static int read_and_act(struct active_array *a)
 			a->container->ss->set_disk(a, mdi->disk.raid_disk,
 						   mdi->curr_state);
 			check_degraded = 1;
-			mdi->next_state |= DS_UNBLOCK;
+			if (mdi->curr_state & DS_BLOCKED)
+				mdi->next_state |= DS_UNBLOCK;
 			if (a->curr_state == read_auto) {
 				a->container->ss->set_array_state(a, 0);
 				a->next_state = active;
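
The loop can be reproduced with a toy simulation. In the hypothetical run_monitor() below, each write to the 'state' file re-arms a wakeup, as sysfs_notify_dirent_safe() does in state_store(); guard_unblock models the new "if (mdi->curr_state & DS_BLOCKED)" test. The max_passes cap exists only so that the unguarded case terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not mdmon code.  Returns the number of monitor
 * passes run before it goes idle, capped at max_passes. */
static int run_monitor(bool guard_unblock, int max_passes)
{
	bool blocked = true;		/* device starts faulty and blocked */
	bool pending_wakeup = true;
	int passes = 0;

	assert(max_passes > 0);
	while (pending_wakeup && passes < max_passes) {
		bool write_unblock;

		pending_wakeup = false;
		passes++;
		/* Without the guard, "-blocked" is written on every pass. */
		write_unblock = guard_unblock ? blocked : true;
		if (write_unblock) {
			blocked = false;	/* the write itself succeeds */
			pending_wakeup = true;	/* ...and notifies us again */
		}
	}
	return passes;
}
```

With the guard the monitor settles after two passes; without it, it keeps running until the cap.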


Finally, when a bad block is added to the list we don't currently notify
rdev->sysfs_state, so mdmon doesn't notice straight away and is delayed in
taking action.  It will only notice when a write blocks.

This is fixed by:

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4adcbb4..9cc7983 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7940,6 +7941,7 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 				  s + rdev->data_offset, sectors, acknowledged);
 	if (rv) {
 		/* Make sure they get written out promptly */
+		sysfs_notify_dirent_safe(rdev->sysfs_state);
 		set_bit(MD_CHANGE_CLEAN, &rdev->mddev->flags);
 		md_wakeup_thread(rdev->mddev->thread);
 	}
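
A toy model of the notification change follows. All names are hypothetical: notify_on_add stands for the added sysfs_notify_dirent_safe(rdev->sysfs_state) call, and write_blocks() stands for the later point at which, per the explanation above, mdmon would otherwise first notice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not md or mdmon code. */
struct bb_model {
	bool notify_on_add;	/* the new sysfs_notify_dirent_safe() call */
	bool unacked;		/* an unacknowledged bad block exists */
	int  wakeups;		/* times mdmon has been woken */
};

/* md records a new, unacknowledged bad block. */
static void add_bad_block(struct bb_model *m)
{
	assert(m);
	m->unacked = true;
	if (m->notify_on_add)
		m->wakeups++;	/* mdmon is woken straight away */
}

/* A later write hits the bad range and blocks; without the new
 * notification this is when mdmon first hears about it. */
static void write_blocks(struct bb_model *m)
{
	assert(m);
	if (m->unacked)
		m->wakeups++;
}

/* mdmon has had a chance to see the unacknowledged bad block. */
static bool mdmon_noticed(const struct bb_model *m)
{
	return m->unacked && m->wakeups > 0;
}
```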


With these 3 changes in place I get substantially improved behaviour on my
simple test (just doing resync, not reshape).

Thanks,
NeilBrown

* RE: [PATCH] md: Add ability for disable bad block management
  2011-12-06  6:05     ` NeilBrown
@ 2011-12-06 13:02       ` Kwolek, Adam
  2011-12-07  1:52         ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Kwolek, Adam @ 2011-12-06 13:02 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Labun, Marcin,
	Williams, Dan J



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Tuesday, December 06, 2011 7:05 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams,
> Dan J
> Subject: Re: [PATCH] md: Add ability for disable bad block management
> 
> On Wed, 30 Nov 2011 08:17:32 +0000 "Kwolek, Adam"
> <adam.kwolek@intel.com>
> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: NeilBrown [mailto:neilb@suse.de]
> > > Sent: Wednesday, November 30, 2011 1:14 AM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin;
> > > Williams, Dan J
> > > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > >
> > > On Thu, 24 Nov 2011 13:19:53 +0100 Adam Kwolek
> > > <adam.kwolek@intel.com> wrote:
> > >
> > > > When external metadata doesn't support BBM, mdadm cannot answer
> > > > correctly for BBM requests. It causes reshape process being stopped.
> > > >
> > > > Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> > > > md will ignore bad blocks as it is for metadata v0.90.
> > >
> > > This should not be necessary.
> > >
> > > The intention is that a device with a bad block looks exactly like a
> > > device with a failed device.  i.e. 'faulty' and 'blocked' appear in the 'state'
> > > file.
> > >
> > > If the metadata doesn't support a bad-block list, it will record
> > > that the device has failed and will unblock the device.  At that point the
> > > failure is forced.
> > > If the metadata does support a bad block list it will just record
> > > the bad blocks and acknowledge them, and the unblock the device.  At
> > > that point the device won't be failed, the 'faulty' state will
> > > disappear, and it will continue to be used with the known bad blocks.
> > >
> > > What exactly is going wrong that makes you think you need this patch?
> >
> >
> > When degradation occurs during migration BBM is signaled to mdmon and mdmon /monitor.c/ tries to mark disk  '-blocked'
> > This operation fails. Momon goes in to loop, and nothing can be done /I cannot make it using sysfs/ to signal or remove device.
> > In sysfs device is present in /sys/block/mdXXX/md but entry /sys/block/mdXXX/md/dev-sdX/~block is missing /disk was pulled out/.
> 
> 
> I've found a couple of issues.  I'm not sure if they completely explain what
> you are seeing.  Could you please test with these two fixes and tell me the
> results?
> 
> Firstly, I find that writing "-blocked" succeeds (no error returned) but the
> "blocked" flag does not get cleared, which is certainly confusing.
> 
> This is fixed by:
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4adcbb4..7258dc1 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -2562,7 +2562,8 @@ state_show(struct md_rdev *rdev, char *page)
>  		sep = ",";
>  	}
>  	if (test_bit(Blocked, &rdev->flags) ||
> -	    rdev->badblocks.unacked_exist) {
> +	    (rdev->badblocks.unacked_exist
> +	     && !test_bit(Faulty, &rdev->flags))) {
>  		len += sprintf(page+len, "%sblocked", sep);
>  		sep = ",";
>  	}
> 
> 
> Secondly mdmon writes "-blocked" even when the "blocked" flag is not set.
> This succeeds so state_store() calls
> 		sysfs_notify_dirent_safe(rdev->sysfs_state);
> 
> so mdmon/monitor.c is woken up to go around the loop again and it writes
> "-blocked" again and so it continues in a loop.
> 
> This is fixed by:
> 
> diff --git a/monitor.c b/monitor.c
> index b002e90..29bde18 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -339,7 +339,8 @@ static int read_and_act(struct active_array *a)
>  			a->container->ss->set_disk(a, mdi->disk.raid_disk,
>  						   mdi->curr_state);
>  			check_degraded = 1;
> -			mdi->next_state |= DS_UNBLOCK;
> +			if (mdi->curr_state & DS_BLOCKED)
> +				mdi->next_state |= DS_UNBLOCK;
>  			if (a->curr_state == read_auto) {
>  				a->container->ss->set_array_state(a, 0);
>  				a->next_state = active;
> 
> 
> Finally, when a badblock is added to the list we don't currently notify
> rdev->sysfs_state so mdmon doesn't notice straight away and so is delayed in
> taking action.  It will only notice when a write blocks.
> 
> This is fixed by:
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4adcbb4..9cc7983 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7940,6 +7941,7 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
>  				  s + rdev->data_offset, sectors, acknowledged);
>  	if (rv) {
>  		/* Make sure they get written out promptly */
> +		sysfs_notify_dirent_safe(rdev->sysfs_state);
>  		set_bit(MD_CHANGE_CLEAN, &rdev->mddev->flags);
>  		md_wakeup_thread(rdev->mddev->thread);
>  	}
> 
> 
> With these 3 changes in place I get substantially improved behaviour on my
> simple test (just doing resync, not reshape).
> 
> Thanks,
> NeilBrown

I've applied those changes and:
1. Migration:
	a) with BBM additionally disabled, reshape continues after degradation and performance does not drop (without your patches, performance was poor and mdmon went into a "crazy" run).
	b) with BBM enabled (without my change), metadata is updated correctly but md stops. mdstat shows that the reshape is in progress, but it is not moving forward.
2. Rebuild:
	a) with BBM additionally disabled, the rebuild is stopped correctly in md and in the metadata just after degradation (I have a few additional corrections for metadata rebuild finalization; I'll post them shortly).
	b) with BBM enabled (without my change), metadata is updated correctly but md stops. mdstat shows that the rebuild is in progress, but it is not moving forward.

It seems that those changes fix the reshape performance drop after degradation and the "crazy" mdmon run.
In md without BBM disabled, md_do_sync() still doesn't finish on degradation during reshape and rebuild, so the process stalls.
The last message from md is the printout from md_error(), and md is probably waiting for BBM acknowledgement.

What may be different in my tests is that I physically pull out disks to degrade the RAID (I'm not using sysfs to do this). After this, the rdev link in the md device is invalid.

Please let me know if you want me to run any additional tests (any specific logs?).

BR
Adam

* Re: [PATCH] md: Add ability for disable bad block management
  2011-12-06 13:02       ` Kwolek, Adam
@ 2011-12-07  1:52         ` NeilBrown
  2011-12-07 11:10           ` Kwolek, Adam
  0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2011-12-07  1:52 UTC (permalink / raw)
  To: Kwolek, Adam
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Labun, Marcin,
	Williams, Dan J

On Tue, 6 Dec 2011 13:02:21 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@suse.de]
> > Sent: Tuesday, December 06, 2011 7:05 AM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams,
> > Dan J
> > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > 
> > On Wed, 30 Nov 2011 08:17:32 +0000 "Kwolek, Adam"
> > <adam.kwolek@intel.com>
> > wrote:
> > 
> > >
> > >
> > > > -----Original Message-----
> > > > From: NeilBrown [mailto:neilb@suse.de]
> > > > Sent: Wednesday, November 30, 2011 1:14 AM
> > > > To: Kwolek, Adam
> > > > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin;
> > > > Williams, Dan J
> > > > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > > >
> > > > On Thu, 24 Nov 2011 13:19:53 +0100 Adam Kwolek
> > > > <adam.kwolek@intel.com> wrote:
> > > >
> > > > > When external metadata doesn't support BBM, mdadm cannot answer
> > > > > correctly for BBM requests. It causes reshape process being stopped.
> > > > >
> > > > > Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> > > > > md will ignore bad blocks as it is for metadata v0.90.
> > > >
> > > > This should not be necessary.
> > > >
> > > > The intention is that a device with a bad block looks exactly like a
> > > > device with a failed device.  i.e. 'faulty' and 'blocked' appear in the 'state'
> > > > file.
> > > >
> > > > If the metadata doesn't support a bad-block list, it will record
> > > > that the device has failed and will unblock the device.  At that point the
> > > > failure is forced.
> > > > If the metadata does support a bad block list it will just record
> > > > the bad blocks and acknowledge them, and the unblock the device.  At
> > > > that point the device won't be failed, the 'faulty' state will
> > > > disappear, and it will continue to be used with the known bad blocks.
> > > >
> > > > What exactly is going wrong that makes you think you need this patch?
> > >
> > >
> > > When degradation occurs during migration BBM is signaled to mdmon and mdmon /monitor.c/ tries to mark disk  '-blocked'
> > > This operation fails. Momon goes in to loop, and nothing can be done /I cannot make it using sysfs/ to signal or remove device.
> > > In sysfs device is present in /sys/block/mdXXX/md but entry /sys/block/mdXXX/md/dev-sdX/~block is missing /disk was pulled out/.
> > 
> > 
> > I've found a couple of issues.  I'm not sure if they completely explain what
> > you are seeing.  Could you please test with these two fixes and tell me the
> > results?
> > 
> > Firstly, I find that writing "-blocked" succeeds (no error returned) but the
> > "blocked" flag does not get cleared, which is certainly confusing.
> > 
> > This is fixed by:
> > 
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 4adcbb4..7258dc1 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -2562,7 +2562,8 @@ state_show(struct md_rdev *rdev, char *page)
> >  		sep = ",";
> >  	}
> >  	if (test_bit(Blocked, &rdev->flags) ||
> > -	    rdev->badblocks.unacked_exist) {
> > +	    (rdev->badblocks.unacked_exist
> > +	     && !test_bit(Faulty, &rdev->flags))) {
> >  		len += sprintf(page+len, "%sblocked", sep);
> >  		sep = ",";
> >  	}
> > 
> > 
> > Secondly mdmon writes "-blocked" even when the "blocked" flag is not set.
> > This succeeds so state_store() calls
> > 		sysfs_notify_dirent_safe(rdev->sysfs_state);
> > 
> > so mdmon/monitor.c is woken up to go around the loop again and it writes
> > "-blocked" again and so it continues in a loop.
> > 
> > This is fixed by:
> > 
> > diff --git a/monitor.c b/monitor.c
> > index b002e90..29bde18 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -339,7 +339,8 @@ static int read_and_act(struct active_array *a)
> >  			a->container->ss->set_disk(a, mdi->disk.raid_disk,
> >  						   mdi->curr_state);
> >  			check_degraded = 1;
> > -			mdi->next_state |= DS_UNBLOCK;
> > +			if (mdi->curr_state & DS_BLOCKED)
> > +				mdi->next_state |= DS_UNBLOCK;
> >  			if (a->curr_state == read_auto) {
> >  				a->container->ss->set_array_state(a, 0);
> >  				a->next_state = active;
> > 
> > 
> > Finally, when a badblock is added to the list we don't currently notify
> > rdev->sysfs_state so mdmon doesn't notice straight away and so is delayed in
> > taking action.  It will only notice when a write blocks.
> > 
> > This is fixed by:
> > 
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 4adcbb4..9cc7983 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -7940,6 +7941,7 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
> >  				  s + rdev->data_offset, sectors, acknowledged);
> >  	if (rv) {
> >  		/* Make sure they get written out promptly */
> > +		sysfs_notify_dirent_safe(rdev->sysfs_state);
> >  		set_bit(MD_CHANGE_CLEAN, &rdev->mddev->flags);
> >  		md_wakeup_thread(rdev->mddev->thread);
> >  	}
> > 
> > 
> > With these 3 changes in place I get substantially improved behaviour on my
> > simple test (just doing resync, not reshape).
> > 
> > Thanks,
> > NeilBrown
> 
> I've applied those changes and:
> 1.  Migration:
> 	a) with additionally disabled BBM, reshape continues after degradation and performance is not lower (without your patches performance was poor and mdmon goes in to "crazy" run).
> 	b) with enabled BBM (without my change), metadata is updated correctly and md stops. mdstat shows that reshape is in progress but it is not moving forward
> 2. Rebuild:
> 	a) with additionally disabled BBM, rebuild is stopped  correctly in md and metadata just after degradation (I've got few additional corrections for metadata rebuild finalization, I'll post it shortly). 
> 	b) with enabled BBM (without my change), metadata is updated correctly and md stops. mdstat shows that rebuild is in progress but it is not moving forward
> 
> 
> It seems that these changes help with the reshape performance drop after degradation and the "crazy" mdmon run.
> With BBM not disabled, md_do_sync() still doesn't finish on degradation during reshape and rebuild, so the process stops.
> The last message from md is printed by md_error(), and md is probably waiting for BBM acknowledgement.
> 
> What may differ in my tests is that I physically pull out disks to degrade the RAID (I'm not using sysfs to do this). After this, the rdev link in the md device is invalid.
> 
> Please let me know if you want me to run any additional tests or collect any specific logs.
> 
>

I cannot reproduce this.
I didn't physically remove devices, but I used
   echo 1 > /sys/block/sdc/device/delete

which should be nearly identical from the perspective of md and mdadm.

If you could give me the exact set of steps that you follow to produce the
problem that would help - maybe a script?  Just a description is OK.

Also you say it is blocking in md_do_sync.  Is that at the 

	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));

call just after the "out:" label?

What is the raid thread doing at this point?  
   cat /proc/PID/stack
might help.

What are the contents of all the sysfs files?
   grep . /sys/block/mdXXX/md/*
   grep . /sys/block/mdXXX/md/dev-*/*

Thanks,
NeilBrown



* RE: [PATCH] md: Add ability for disable bad block management
  2011-12-07  1:52         ` NeilBrown
@ 2011-12-07 11:10           ` Kwolek, Adam
  2011-12-08  4:02             ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Kwolek, Adam @ 2011-12-07 11:10 UTC
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Labun, Marcin,
	Williams, Dan J



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, December 07, 2011 2:53 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams, Dan J
> Subject: Re: [PATCH] md: Add ability for disable bad block management
> 
> On Tue, 6 Dec 2011 13:02:21 +0000 "Kwolek, Adam"
> <adam.kwolek@intel.com>
> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: NeilBrown [mailto:neilb@suse.de]
> > > Sent: Tuesday, December 06, 2011 7:05 AM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams, Dan J
> > > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > >
> > > On Wed, 30 Nov 2011 08:17:32 +0000 "Kwolek, Adam"
> > > <adam.kwolek@intel.com>
> > > wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: NeilBrown [mailto:neilb@suse.de]
> > > > > Sent: Wednesday, November 30, 2011 1:14 AM
> > > > > To: Kwolek, Adam
> > > > > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams, Dan J
> > > > > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > > > >
> > > > > On Thu, 24 Nov 2011 13:19:53 +0100 Adam Kwolek
> > > > > <adam.kwolek@intel.com> wrote:
> > > > >
> > > > > > When external metadata doesn't support BBM, mdadm cannot
> > > > > > answer BBM requests correctly. This causes the reshape process to
> > > > > > stop.
> > > > > >
> > > > > > Add the ability for external metadata (mdadm) to disable BBM via sysfs.
> > > > > > md will then ignore bad blocks, as it does for metadata v0.90.
> > > > >
> > > > > This should not be necessary.
> > > > >
> > > > > The intention is that a device with a bad block looks exactly
> > > > > like a failed device, i.e. 'faulty' and 'blocked' appear in the
> > > > > 'state' file.
> > > > >
> > > > > If the metadata doesn't support a bad-block list, it will record
> > > > > that the device has failed and will unblock the device.  At that
> > > > > point the failure is forced.
> > > > > If the metadata does support a bad-block list, it will just
> > > > > record the bad blocks and acknowledge them, and then unblock the
> > > > > device.  At that point the device won't be failed, the 'faulty'
> > > > > state will disappear, and it will continue to be used with the known
> > > > > bad blocks.
> > > > >
> > > > > What exactly is going wrong that makes you think you need this
> > > > > patch?
> > > >
> > > >
> > > > When degradation occurs during migration, BBM is signaled to mdmon,
> > > > and mdmon (monitor.c) tries to mark the disk '-blocked'.
> > > > This operation fails. mdmon goes into a loop, and nothing can be done:
> > > > I cannot use sysfs to signal or remove the device.
> > > > In sysfs the device is still present in /sys/block/mdXXX/md, but the entry
> > > > /sys/block/mdXXX/md/dev-sdX/~block is missing (the disk was pulled out).
> > >
> > >
> > > I've found a couple of issues.  I'm not sure if they completely
> > > explain what you are seeing.  Could you please test with these two
> > > fixes and tell me the results?
> > >
> > > Firstly, I find that writing "-blocked" succeeds (no error returned)
> > > but the "blocked" flag does not get cleared, which is certainly confusing.
> > >
> > > This is fixed by:
> > >
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 4adcbb4..7258dc1 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -2562,7 +2562,8 @@ state_show(struct md_rdev *rdev, char *page)
> > >  		sep = ",";
> > >  	}
> > >  	if (test_bit(Blocked, &rdev->flags) ||
> > > -	    rdev->badblocks.unacked_exist) {
> > > +	    (rdev->badblocks.unacked_exist
> > > +	     && !test_bit(Faulty, &rdev->flags))) {
> > >  		len += sprintf(page+len, "%sblocked", sep);
> > >  		sep = ",";
> > >  	}
> > >
> > >
> > > Secondly, mdmon writes "-blocked" even when the "blocked" flag is not set.
> > > This succeeds, so state_store() calls
> > > 		sysfs_notify_dirent_safe(rdev->sysfs_state);
> > >
> > > so mdmon/monitor.c is woken up to go around the loop again, and it
> > > writes "-blocked" again, and so it continues in a loop.
> > >
> > > This is fixed by:
> > >
> > > diff --git a/monitor.c b/monitor.c
> > > index b002e90..29bde18 100644
> > > --- a/monitor.c
> > > +++ b/monitor.c
> > > @@ -339,7 +339,8 @@ static int read_and_act(struct active_array *a)
> > >  			a->container->ss->set_disk(a, mdi->disk.raid_disk,
> > >  						   mdi->curr_state);
> > >  			check_degraded = 1;
> > > -			mdi->next_state |= DS_UNBLOCK;
> > > +			if (mdi->curr_state & DS_BLOCKED)
> > > +				mdi->next_state |= DS_UNBLOCK;
> > >  			if (a->curr_state == read_auto) {
> > >  				a->container->ss->set_array_state(a, 0);
> > >  				a->next_state = active;
> > >
> > >
> > > Finally, when a bad block is added to the list, we don't currently
> > > notify rdev->sysfs_state, so mdmon doesn't notice straight away and so
> > > is delayed in taking action.  It will only notice when a write blocks.
> > >
> > > This is fixed by:
> > >
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 4adcbb4..9cc7983 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -7940,6 +7941,7 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
> > >  				  s + rdev->data_offset, sectors, acknowledged);
> > >  	if (rv) {
> > >  		/* Make sure they get written out promptly */
> > > +		sysfs_notify_dirent_safe(rdev->sysfs_state);
> > >  		set_bit(MD_CHANGE_CLEAN, &rdev->mddev->flags);
> > >  		md_wakeup_thread(rdev->mddev->thread);
> > >  	}
> > >
> > >
> > > With these 3 changes in place I get substantially improved behaviour
> > > on my simple test (just doing resync, not reshape).
> > >
> > > Thanks,
> > > NeilBrown
> >
> > I've applied those changes and:
> > 1. Migration:
> > 	a) With BBM additionally disabled, the reshape continues after degradation and performance does not drop (without your patches, performance was poor and mdmon went into a "crazy" run).
> > 	b) With BBM enabled (without my change), the metadata is updated correctly and md stops. mdstat shows that the reshape is in progress, but it is not moving forward.
> > 2. Rebuild:
> > 	a) With BBM additionally disabled, the rebuild is stopped correctly in md and in the metadata just after degradation (I have a few additional corrections for metadata rebuild finalization; I'll post them shortly).
> > 	b) With BBM enabled (without my change), the metadata is updated correctly and md stops. mdstat shows that the rebuild is in progress, but it is not moving forward.
> >
> >
> > It seems that these changes help with the reshape performance drop after degradation and the "crazy" mdmon run.
> > With BBM not disabled, md_do_sync() still doesn't finish on degradation during reshape and rebuild, so the process stops.
> > The last message from md is printed by md_error(), and md is probably waiting for BBM acknowledgement.
> >
> > What may differ in my tests is that I physically pull out disks to degrade the RAID (I'm not using sysfs to do this). After this, the rdev link in the md device is invalid.
> >
> > Please let me know if you want me to run any additional tests or collect any specific logs.
> >
> >
> 
> I cannot reproduce this.
> I didn't physically remove devices, but I used
>    echo 1 > /sys/block/sdc/device/delete
> which should be nearly identical from the perspective of md and mdadm.

I've checked that when I delete the device using sysfs, everything works perfectly.
When the device is physically pulled out, the reshape stops in md/mdstat.

> If you could give me the exact set of steps that you follow to produce the
> problem that would help - maybe a script?  Just a description is OK.


# Disks used: sdb, sdc, sdd, sde
# Allow IMSM metadata without Intel RAID platform support
export IMSM_NO_PLATFORM=1
# Create the container
mdadm -C /dev/md/imsm0 -amd -e imsm -n 3 /dev/sdb /dev/sdc /dev/sde -R
# Create the RAID5 volume
mdadm -C /dev/md/raid5vol_0 -amd -l 5 --chunk 32 --size 104850 -n 3 /dev/sdb /dev/sdc /dev/sde -R
# Add a spare
mdadm --add /dev/md/imsm0 /dev/sdd
# Run OLCE (online capacity expansion)
mdadm --grow /dev/md/imsm0 --raid-devices 4
# When the reshape starts, physically pull a device out

> Also you say it is blocking in md_do_sync.  Is that at the
> 
> 	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> 
> call just after the "out:" label?

Neither of those two places.
It is in the sync_request() function; md_error() has been called.
More can be seen in the thread stack information below (md_wait_for_blocked_rdev()).


> 
> What is the raid thread doing at this point?
>    cat /proc/PID/stack
> might help.

[md126_raid5]
[<ffffffff8121d843>] md_wait_for_blocked_rdev+0xbc/0x10f
[<ffffffffa01d87ce>] handle_stripe+0x1c5c/0x2c99 [raid456]
[<ffffffffa01d9d0d>] raid5d+0x502/0x564 [raid456]
[<ffffffff8121eca5>] md_thread+0x101/0x11f
[<ffffffff81049e0e>] kthread+0x81/0x89
[<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

[md126_reshape]
[<ffffffffa02455a2>] sync_request+0x90a/0xbfb [raid456]
[<ffffffff8121e151>] md_do_sync+0x7aa/0xc40
[<ffffffff8121ecb3>] md_thread+0x101/0x11f
[<ffffffff81049e0e>] kthread+0x81/0x89
[<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

> 
> What are the contents of all the sysfs files?
>    grep . /sys/block/mdXXX/md/*
array_state		->active
degraded		->1
max_read_errors	->20
reshape_position	->12288
resync_start		->none
sync_completed	->4096 / 209664


>    grep . /sys/block/mdXXX/md/dev-*/*

The removed device is sdd; /sys/block/mdXXX/md/dev-sdd/*:
bad_blocks		->4096 512
			->4608 128
			->4736 384
block			->MISSING link is not valid
errors			->0
offset			->0
recovery_start		->4096
size			->104832
slot			->3
state			->faulty,write_error
unacknowledged_bad_blocks	->4096 512
				->4608 128
				->4736 384

I hope this helps.


BR
Adam

 
> Thanks,
> NeilBrown



* Re: [PATCH] md: Add ability for disable bad block management
  2011-12-07 11:10           ` Kwolek, Adam
@ 2011-12-08  4:02             ` NeilBrown
  2011-12-08 15:36               ` Kwolek, Adam
  0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2011-12-08  4:02 UTC
  To: Kwolek, Adam
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Labun, Marcin,
	Williams, Dan J


On Wed, 7 Dec 2011 11:10:06 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@suse.de]

> > I cannot reproduce this.
> > I didn't physically remove devices, but I used
> >    echo 1 > /sys/block/sdc/device/delete
> > which should be nearly identical from the perspective of md and mdadm.
> 
> I've checked that when I delete the device using sysfs, everything works perfectly.
> When the device is physically pulled out, the reshape stops in md/mdstat.
> 
> > If you could give me the exact set of steps that you follow to produce the
> > problem that would help - maybe a script?  Just a description is OK.
> 
> 
> # Disks used: sdb, sdc, sdd, sde
> # Allow IMSM metadata without Intel RAID platform support
> export IMSM_NO_PLATFORM=1
> # Create the container
> mdadm -C /dev/md/imsm0 -amd -e imsm -n 3 /dev/sdb /dev/sdc /dev/sde -R
> # Create the RAID5 volume
> mdadm -C /dev/md/raid5vol_0 -amd -l 5 --chunk 32 --size 104850 -n 3 /dev/sdb /dev/sdc /dev/sde -R
> # Add a spare
> mdadm --add /dev/md/imsm0 /dev/sdd
> # Run OLCE (online capacity expansion)
> mdadm --grow /dev/md/imsm0 --raid-devices 4
> # When the reshape starts, physically pull a device out
> 
> > Also you say it is blocking in md_do_sync.  Is that at the
> > 
> > 	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> > 
> > call just after the "out:" label?
> 
> Neither of those two places.
> It is in the sync_request() function; md_error() has been called.
> More can be seen in the thread stack information below (md_wait_for_blocked_rdev()).
> 
> 
> > 
> > What is the raid thread doing at this point?
> >    cat /proc/PID/stack
> > might help.
> 
> [md126_raid5]
> [<ffffffff8121d843>] md_wait_for_blocked_rdev+0xbc/0x10f
> [<ffffffffa01d87ce>] handle_stripe+0x1c5c/0x2c99 [raid456]
> [<ffffffffa01d9d0d>] raid5d+0x502/0x564 [raid456]
> [<ffffffff8121eca5>] md_thread+0x101/0x11f
> [<ffffffff81049e0e>] kthread+0x81/0x89
> [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [md126_reshape]
> [<ffffffffa02455a2>] sync_request+0x90a/0xbfb [raid456]
> [<ffffffff8121e151>] md_do_sync+0x7aa/0xc40
> [<ffffffff8121ecb3>] md_thread+0x101/0x11f
> [<ffffffff81049e0e>] kthread+0x81/0x89
> [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> > 
> > What are the contents of all the sysfs files?
> >    grep . /sys/block/mdXXX/md/*
> array_state		->active
> degraded		->1
> max_read_errors	->20
> reshape_position	->12288
> resync_start		->none
> sync_completed	->4096 / 209664
> 
> 
> >    grep . /sys/block/mdXXX/md/dev-*/*
> 
> The removed device is sdd; /sys/block/mdXXX/md/dev-sdd/*:
> bad_blocks		->4096 512
> 			->4608 128
> 			->4736 384
> block			->MISSING link is not valid
> errors			->0
> offset			->0
> recovery_start		->4096
> size			->104832
> slot			->3
> state			->faulty,write_error
> unacknowledged_bad_blocks	->4096 512
> 				->4608 128
> 				->4736 384
> 
> I hope this helps.

Yes it does, thanks.

Can you try with this patch as well please.

Thanks,
NeilBrown


diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ea6dce9..6cf0f6a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3175,6 +3175,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			rdev = rcu_dereference(conf->disks[i].rdev);
 			clear_bit(R5_ReadRepl, &dev->flags);
 		}
+		if (rdev && test_bit(Faulty, &rdev->flags))
+			rdev = NULL;
 		if (rdev) {
 			is_bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
 					     &first_bad, &bad_sectors);



* RE: [PATCH] md: Add ability for disable bad block management
  2011-12-08  4:02             ` NeilBrown
@ 2011-12-08 15:36               ` Kwolek, Adam
  2011-12-09  3:53                 ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Kwolek, Adam @ 2011-12-08 15:36 UTC
  To: NeilBrown
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Labun, Marcin,
	Williams, Dan J



> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Thursday, December 08, 2011 5:02 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams, Dan J
> Subject: Re: [PATCH] md: Add ability for disable bad block management
> 
> On Wed, 7 Dec 2011 11:10:06 +0000 "Kwolek, Adam"
> <adam.kwolek@intel.com>
> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: NeilBrown [mailto:neilb@suse.de]
> 
> > > I cannot reproduce this.
> > > I didn't physically remove devices, but I used
> > >    echo 1 > /sys/block/sdc/device/delete
> > > which should be nearly identical from the perspective of md and mdadm.
> >
> > I've checked that when I delete the device using sysfs, everything works perfectly.
> > When the device is physically pulled out, the reshape stops in md/mdstat.
> >
> > > If you could give me the exact set of steps that you follow to
> > > produce the problem that would help - maybe a script?  Just a description is OK.
> >
> >
> > # Disks used: sdb, sdc, sdd, sde
> > # Allow IMSM metadata without Intel RAID platform support
> > export IMSM_NO_PLATFORM=1
> > # Create the container
> > mdadm -C /dev/md/imsm0 -amd -e imsm -n 3 /dev/sdb /dev/sdc /dev/sde -R
> > # Create the RAID5 volume
> > mdadm -C /dev/md/raid5vol_0 -amd -l 5 --chunk 32 --size 104850 -n 3 /dev/sdb /dev/sdc /dev/sde -R
> > # Add a spare
> > mdadm --add /dev/md/imsm0 /dev/sdd
> > # Run OLCE (online capacity expansion)
> > mdadm --grow /dev/md/imsm0 --raid-devices 4
> > # When the reshape starts, physically pull a device out
> >
> > > Also you say it is blocking in md_do_sync.  Is that at the
> > >
> > > 	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> > >
> > > call just after the "out:" label?
> >
> > Neither of those two places.
> > It is in the sync_request() function; md_error() has been called.
> > More can be seen in the thread stack information below (md_wait_for_blocked_rdev()).
> >
> >
> > >
> > > What is the raid thread doing at this point?
> > >    cat /proc/PID/stack
> > > might help.
> >
> > [md126_raid5]
> > [<ffffffff8121d843>] md_wait_for_blocked_rdev+0xbc/0x10f
> > [<ffffffffa01d87ce>] handle_stripe+0x1c5c/0x2c99 [raid456]
> > [<ffffffffa01d9d0d>] raid5d+0x502/0x564 [raid456]
> > [<ffffffff8121eca5>] md_thread+0x101/0x11f
> > [<ffffffff81049e0e>] kthread+0x81/0x89
> > [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > [md126_reshape]
> > [<ffffffffa02455a2>] sync_request+0x90a/0xbfb [raid456]
> > [<ffffffff8121e151>] md_do_sync+0x7aa/0xc40
> > [<ffffffff8121ecb3>] md_thread+0x101/0x11f
> > [<ffffffff81049e0e>] kthread+0x81/0x89
> > [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> > [<ffffffffffffffff>] 0xffffffffffffffff
> >
> > >
> > > What are the contents of all the sysfs files?
> > >    grep . /sys/block/mdXXX/md/*
> > array_state		->active
> > degraded		->1
> > max_read_errors	->20
> > reshape_position	->12288
> > resync_start		->none
> > sync_completed	->4096 / 209664
> >
> >
> > >    grep . /sys/block/mdXXX/md/dev-*/*
> >
> > The removed device is sdd; /sys/block/mdXXX/md/dev-sdd/*:
> > bad_blocks		->4096 512
> > 			->4608 128
> > 			->4736 384
> > block			->MISSING link is not valid
> > errors			->0
> > offset			->0
> > recovery_start		->4096
> > size			->104832
> > slot			->3
> > state			->faulty,write_error
> > unacknowledged_bad_blocks	->4096 512
> > 				->4608 128
> > 				->4736 384
> >
> > I hope this helps.
> 
> Yes it does, thanks.
> 
> Can you try with this patch as well please.
> 
> Thanks,
> NeilBrown
> 
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index ea6dce9..6cf0f6a 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3175,6 +3175,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
>  			rdev = rcu_dereference(conf->disks[i].rdev);
>  			clear_bit(R5_ReadRepl, &dev->flags);
>  		}
> +		if (rdev && test_bit(Faulty, &rdev->flags))
> +			rdev = NULL;
>  		if (rdev) {
>  			is_bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
>  					     &first_bad, &bad_sectors);

This patch alone did not help, but after I switched to the newest md from today's neil_for-linus branch, things went better.
Migration now seems to be OK.

The problems appear when an additional disk fails (physical pull) during rebuild/resync. The metadata (mdadm/mdmon) reacts correctly, but md stops again. This time:
 
[md126_resync]
[<ffffffffa027037d>] get_active_stripe+0x295/0x598 [raid456]
[<ffffffffa02757da>] sync_request+0xb1c/0xba7 [raid456]
[<ffffffff8121e656>] md_do_sync+0x772/0xbc4
[<ffffffff8121f174>] md_thread+0x101/0x11f
[<ffffffff81049ebe>] kthread+0x81/0x89
[<ffffffff812cc934>] kernel_thread_helper+0x4/0x10
[<ffffffffffffffff>] 0xffffffffffffffff

The [md126_raid5] thread is missing, but mdstat still shows the raid5 resync/rebuild.
During initialization, it completed correctly once; the second time it stopped exactly like the rebuild, in get_active_stripe(), and the [md126_raid5] thread was missing as well.
Any 'mdadm -Ss' hangs the system (not very surprising without the raid5 thread).

In /var/log/messages we have:
Dec  8 12:39:49 gklab-128-013 kernel: Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx ext2 nvidia(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ipv6 af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd intel_agp iTCO_wdt tpm_tis tpm soundcore e100 pcspkr mii tpm_bios snd_page_alloc sr_mod cdrom serio_raw i2c_i801 i2c_core iTCO_vendor_support sg intel_gtt button agpgart usbhid hid uhci_hcd sd_mod crc_t10dif ehci_hcd usbcore usb_common edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ahci libahci pata_marvell libata scsi_mod thermal thermal_sys hwmon
Dec  8 12:39:49 gklab-128-013 kernel: 
Dec  8 12:39:49 gklab-128-013 kernel: Pid: 4584, comm: md126_raid5 Tainted: P             3.2.0-rc1-SLE11_BRANCH_ADK #10                  /DP35DP
Dec  8 12:39:49 gklab-128-013 kernel: RIP: 0010:[<ffffffffa0280e67>]  [<ffffffffa0280e67>] handle_stripe+0x2f5/0x2cbf [raid456]
Dec  8 12:39:49 gklab-128-013 kernel: RSP: 0018:ffff8800d61cdb80  EFLAGS: 00010002
Dec  8 12:39:49 gklab-128-013 kernel: RAX: 0000000000008001 RBX: 0000000000000000 RCX: 0000000000000002
Dec  8 12:39:49 gklab-128-013 kernel: RDX: 0000000000000000 RSI: ffff880114462800 RDI: ffff8801144629a8
Dec  8 12:39:49 gklab-128-013 kernel: RBP: ffff8800d61cdd40 R08: ffff8800379256c0 R09: 0000000300000000
Dec  8 12:39:49 gklab-128-013 kernel: R10: ffff88010e5bfa00 R11: 0000000100000001 R12: ffff8800372602c8
Dec  8 12:39:49 gklab-128-013 kernel: R13: ffff880037260048 R14: ffff8800372602d0 R15: ffff8801144638b0
Dec  8 12:39:49 gklab-128-013 kernel: FS:  0000000000000000(0000) GS:ffff88011bc00000(0000) knlGS:0000000000000000
Dec  8 12:39:49 gklab-128-013 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec  8 12:39:49 gklab-128-013 kernel: CR2: 00000000000000b0 CR3: 00000000379b3000 CR4: 00000000000006f0
Dec  8 12:39:49 gklab-128-013 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec  8 12:39:49 gklab-128-013 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec  8 12:39:49 gklab-128-013 kernel: Process md126_raid5 (pid: 4584, threadinfo ffff8800d61cc000, task ffff88003715a7c0)
Dec  8 12:39:49 gklab-128-013 kernel: Stack:
Dec  8 12:39:49 gklab-128-013 kernel:  0000000000000000 0000000000000000 0000000000000000 0000000000000000
Dec  8 12:39:49 gklab-128-013 kernel:  0000000000000000 0000000000000000 0000000000000000 0000000000000000
Dec  8 12:39:49 gklab-128-013 kernel:  0000000000000400 0000000000000400 0000000300000000 ffff88010e749280
Dec  8 12:39:49 gklab-128-013 kernel: Call Trace:
Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff81221fd4>] ? md_check_recovery+0x60d/0x630
Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffffa027ef28>] ? __release_stripe+0x174/0x18f [raid456]
Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffffa0283d33>] raid5d+0x502/0x564 [raid456]
Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff812c3e6c>] ? schedule_timeout+0x35/0x1e8
Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff8121f174>] md_thread+0x101/0x11f
Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff8104a2ad>] ? wake_up_bit+0x23/0x23
Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff8121f073>] ? md_register_thread+0xd6/0xd6
Dec  8 12:39:50 gklab-128-013 kernel:  [<ffffffff81049ebe>] kthread+0x81/0x89
Dec  8 12:39:50 gklab-128-013 kernel:  [<ffffffff812cc934>] kernel_thread_helper+0x4/0x10
Dec  8 12:39:50 gklab-128-013 kernel:  [<ffffffff81049e3d>] ? kthread_worker_fn+0x145/0x145
Dec  8 12:39:50 gklab-128-013 kernel:  [<ffffffff812cc930>] ? gs_change+0xb/0xb
Dec  8 12:39:50 gklab-128-013 kernel: Code: 75 11 49 8b 45 30 48 83 c0 08 48 3b 83 e0 00 00 00 77 07 f0 41 80 4c 24 08 08 49 8b 44 24 08 66 85 c0 79 2c f0 41 80 64 24 08 f7 
Dec  8 12:39:50 gklab-128-013 kernel: <48> 8b 83 b0 00 00 00 a8 02 75 10 c7 45 80 01 00 00 00 f0 ff 83 
Dec  8 12:39:50 gklab-128-013 kernel: RIP  [<ffffffffa0280e67>] handle_stripe+0x2f5/0x2cbf [raid456]
Dec  8 12:39:50 gklab-128-013 kernel:  RSP <ffff8800d61cdb80>
Dec  8 12:39:50 gklab-128-013 kernel: CR2: 00000000000000b0


The problem is caused by a dereference of rdev a few lines further down in raid5.c, just after it has been cleared (set to NULL).
The following patch corrects it.

From fbaa3fdff634721e5c2c09e07b8429385494ee02 Mon Sep 17 00:00:00 2001
From: Adam Kwolek <adam.kwolek@intel.com>
Date: Thu, 8 Dec 2011 15:34:09 +0100
Subject: [PATCH] md: raid5 crash during degradation

A NULL pointer dereference causes a crash in the raid5 module.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---
 drivers/md/raid5.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b0dec01..da4997c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3070,7 +3070,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
 				set_bit(R5_Insync, &dev->flags);
 		}
-		if (test_bit(R5_WriteError, &dev->flags)) {
+		if (test_bit(R5_WriteError, &dev->flags) && rdev) {
 			clear_bit(R5_Insync, &dev->flags);
 			if (!test_bit(Faulty, &rdev->flags)) {
 				s->handle_bad_blocks = 1;
@@ -3078,7 +3078,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			} else
 				clear_bit(R5_WriteError, &dev->flags);
 		}
-		if (test_bit(R5_MadeGood, &dev->flags)) {
+		if (test_bit(R5_MadeGood, &dev->flags) && rdev) {
 			if (!test_bit(Faulty, &rdev->flags)) {
 				s->handle_bad_blocks = 1;
 				atomic_inc(&rdev->nr_pending);
-- 
1.6.0.2
 

You may have to add something beyond my simple NULL-check patch (some flag logic).

BR
Adam







* Re: [PATCH] md: Add ability for disable bad block management
  2011-12-08 15:36               ` Kwolek, Adam
@ 2011-12-09  3:53                 ` NeilBrown
  0 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2011-12-09  3:53 UTC
  To: Kwolek, Adam
  Cc: linux-raid@vger.kernel.org, Ciechanowski, Ed, Labun, Marcin,
	Williams, Dan J


On Thu, 8 Dec 2011 15:36:43 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@suse.de]
> > Sent: Thursday, December 08, 2011 5:02 AM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams, Dan J
> > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > 
> > On Wed, 7 Dec 2011 11:10:06 +0000 "Kwolek, Adam"
> > <adam.kwolek@intel.com>
> > wrote:
> > 
> > >
> > >
> > > > -----Original Message-----
> > > > From: NeilBrown [mailto:neilb@suse.de]
> > 
> > > > I cannot reproduce this.
> > > > I didn't physically remove devices, but I used
> > > >    echo 1 > /sys/block/sdc/device/delete
> > > > which should be nearly identical from the perspective of md and mdadm.
> > >
> > > I've checked that when I delete the device using sysfs, everything works perfectly.
> > > When the device is physically pulled out, the reshape stops in md/mdstat.
> > >
> > > > If you could give me the exact set of steps that you follow to
> > > > produce the problem that would help - maybe a script?  Just a description is OK.
> > >
> > >
> > > # Disks used: sdb, sdc, sdd, sde
> > > # Allow IMSM metadata without Intel RAID platform support
> > > export IMSM_NO_PLATFORM=1
> > > # Create the container
> > > mdadm -C /dev/md/imsm0 -amd -e imsm -n 3 /dev/sdb /dev/sdc /dev/sde -R
> > > # Create the RAID5 volume
> > > mdadm -C /dev/md/raid5vol_0 -amd -l 5 --chunk 32 --size 104850 -n 3 /dev/sdb /dev/sdc /dev/sde -R
> > > # Add a spare
> > > mdadm --add /dev/md/imsm0 /dev/sdd
> > > # Run OLCE (online capacity expansion)
> > > mdadm --grow /dev/md/imsm0 --raid-devices 4
> > > # When the reshape starts, physically pull a device out
> > >
> > > > Also you say it is blocking in md_do_sync.  Is that at the
> > > >
> > > > 	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> > > >
> > > > call just after the "out:" label?
> > >
> > > Neither of those two places.
> > > It is in the sync_request() function; md_error() has been called.
> > > More can be seen in the thread stack information below (md_wait_for_blocked_rdev()).
> > >
> > >
> > > >
> > > > What is the raid thread doing at this point?
> > > >    cat /proc/PID/stack
> > > > might help.
> > >
> > > [md126_raid5]
> > > [<ffffffff8121d843>] md_wait_for_blocked_rdev+0xbc/0x10f
> > > [<ffffffffa01d87ce>] handle_stripe+0x1c5c/0x2c99 [raid456]
> > > [<ffffffffa01d9d0d>] raid5d+0x502/0x564 [raid456]
> > > [<ffffffff8121eca5>] md_thread+0x101/0x11f
> > > [<ffffffff81049e0e>] kthread+0x81/0x89
> > > [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> > > [<ffffffffffffffff>] 0xffffffffffffffff
> > >
> > > [md126_reshape]
> > > [<ffffffffa02455a2>] sync_request+0x90a/0xbfb [raid456]
> > > [<ffffffff8121e151>] md_do_sync+0x7aa/0xc40
> > > [<ffffffff8121ecb3>] md_thread+0x101/0x11f
> > > [<ffffffff81049e0e>] kthread+0x81/0x89
> > > [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> > > [<ffffffffffffffff>] 0xffffffffffffffff
> > >
> > > >
> > > > What are the contents of all the sysfs files?
> > > >    grep . /sys/block/mdXXX/md/*
> > > array_state		->active
> > > degraded		->1
> > > max_read_errors	->20
> > > reshape_position	->12288
> > > resync_start		->none
> > > sync_completed	->4096 / 209664
> > >
> > >
> > > >    grep . /sys/block/mdXXX/md/dev-*/*
> > >
> > > The removed device is sdd; /sys/block/mdXXX/md/dev-sdd/*:
> > > bad_blocks		->4096 512
> > > 			->4608 128
> > > 			->4736 384
> > > block			->MISSING link is not valid
> > > errors			->0
> > > offset			->0
> > > recovery_start		->4096
> > > size			->104832
> > > slot			->3
> > > state			->faulty,write_error
> > > unacknowledged_bad_blocks	->4096 512
> > > 				->4608 128
> > > 				->4736 384
> > >
> > > I hope this helps.
> > 
> > Yes it does, thanks.
> > 
> > Can you try with this patch as well please.
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index ea6dce9..6cf0f6a 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -3175,6 +3175,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
> >  			rdev = rcu_dereference(conf->disks[i].rdev);
> >  			clear_bit(R5_ReadRepl, &dev->flags);
> >  		}
> > +		if (rdev && test_bit(Faulty, &rdev->flags))
> > +			rdev = NULL;
> >  		if (rdev) {
> >  			is_bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
> >  					     &first_bad, &bad_sectors);
> 
> This patch alone did not help, but after I switched to the newest md from today's neil_for-linus branch, things went better.
> Migration now seems to be OK.
> 
> The problems appear when an additional disk fails (physical pull) during rebuild/resync. The metadata (mdadm/mdmon) reacts correctly, but md stops again. This time:
>  
> [md126_resync]
> [<ffffffffa027037d>] get_active_stripe+0x295/0x598 [raid456]
> [<ffffffffa02757da>] sync_request+0xb1c/0xba7 [raid456]
> [<ffffffff8121e656>] md_do_sync+0x772/0xbc4
> [<ffffffff8121f174>] md_thread+0x101/0x11f
> [<ffffffff81049ebe>] kthread+0x81/0x89
> [<ffffffff812cc934>] kernel_thread_helper+0x4/0x10
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> The [md126_raid5] thread is missing, but /proc/mdstat still shows the raid5 resync/rebuild.
> During initialization it completed correctly once; the second time it stopped in get_active_stripe(), exactly like the rebuild, and the [md126_raid5] thread was missing as well.
> Any 'mdadm -Ss' hangs the system (not very surprising without the raid5 thread).
> 
> In /var/log/messages we have:
> Dec  8 12:39:49 gklab-128-013 kernel: Modules linked in: raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx ext2 nvidia(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ipv6 af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod snd_hda_codec_hdmi snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_timer snd intel_agp iTCO_wdt tpm_tis tpm soundcore e100 pcspkr mii tpm_bios snd_page_alloc sr_mod cdrom serio_raw i2c_i801 i2c_core iTCO_vendor_support sg intel_gtt button agpgart usbhid hid uhci_hcd sd_mod crc_t10dif ehci_hcd usbcore usb_common edd ext3 mbcache jbd fan processor ide_pci_generic ide_core ata_generic ahci libahci pata_marvell libata scsi_mod thermal thermal_sys hwmon
> Dec  8 12:39:49 gklab-128-013 kernel: 
> Dec  8 12:39:49 gklab-128-013 kernel: Pid: 4584, comm: md126_raid5 Tainted: P             3.2.0-rc1-SLE11_BRANCH_ADK #10                  /DP35DP
> Dec  8 12:39:49 gklab-128-013 kernel: RIP: 0010:[<ffffffffa0280e67>]  [<ffffffffa0280e67>] handle_stripe+0x2f5/0x2cbf [raid456]
> Dec  8 12:39:49 gklab-128-013 kernel: RSP: 0018:ffff8800d61cdb80  EFLAGS: 00010002
> Dec  8 12:39:49 gklab-128-013 kernel: RAX: 0000000000008001 RBX: 0000000000000000 RCX: 0000000000000002
> Dec  8 12:39:49 gklab-128-013 kernel: RDX: 0000000000000000 RSI: ffff880114462800 RDI: ffff8801144629a8
> Dec  8 12:39:49 gklab-128-013 kernel: RBP: ffff8800d61cdd40 R08: ffff8800379256c0 R09: 0000000300000000
> Dec  8 12:39:49 gklab-128-013 kernel: R10: ffff88010e5bfa00 R11: 0000000100000001 R12: ffff8800372602c8
> Dec  8 12:39:49 gklab-128-013 kernel: R13: ffff880037260048 R14: ffff8800372602d0 R15: ffff8801144638b0
> Dec  8 12:39:49 gklab-128-013 kernel: FS:  0000000000000000(0000) GS:ffff88011bc00000(0000) knlGS:0000000000000000
> Dec  8 12:39:49 gklab-128-013 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Dec  8 12:39:49 gklab-128-013 kernel: CR2: 00000000000000b0 CR3: 00000000379b3000 CR4: 00000000000006f0
> Dec  8 12:39:49 gklab-128-013 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Dec  8 12:39:49 gklab-128-013 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Dec  8 12:39:49 gklab-128-013 kernel: Process md126_raid5 (pid: 4584, threadinfo ffff8800d61cc000, task ffff88003715a7c0)
> Dec  8 12:39:49 gklab-128-013 kernel: Stack:
> Dec  8 12:39:49 gklab-128-013 kernel:  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Dec  8 12:39:49 gklab-128-013 kernel:  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Dec  8 12:39:49 gklab-128-013 kernel:  0000000000000400 0000000000000400 0000000300000000 ffff88010e749280
> Dec  8 12:39:49 gklab-128-013 kernel: Call Trace:
> Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff81221fd4>] ? md_check_recovery+0x60d/0x630
> Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffffa027ef28>] ? __release_stripe+0x174/0x18f [raid456]
> Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffffa0283d33>] raid5d+0x502/0x564 [raid456]
> Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff812c3e6c>] ? schedule_timeout+0x35/0x1e8
> Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff8121f174>] md_thread+0x101/0x11f
> Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff8104a2ad>] ? wake_up_bit+0x23/0x23
> Dec  8 12:39:49 gklab-128-013 kernel:  [<ffffffff8121f073>] ? md_register_thread+0xd6/0xd6
> Dec  8 12:39:50 gklab-128-013 kernel:  [<ffffffff81049ebe>] kthread+0x81/0x89
> Dec  8 12:39:50 gklab-128-013 kernel:  [<ffffffff812cc934>] kernel_thread_helper+0x4/0x10
> Dec  8 12:39:50 gklab-128-013 kernel:  [<ffffffff81049e3d>] ? kthread_worker_fn+0x145/0x145
> Dec  8 12:39:50 gklab-128-013 kernel:  [<ffffffff812cc930>] ? gs_change+0xb/0xb
> Dec  8 12:39:50 gklab-128-013 kernel: Code: 75 11 49 8b 45 30 48 83 c0 08 48 3b 83 e0 00 00 00 77 07 f0 41 80 4c 24 08 08 49 8b 44 24 08 66 85 c0 79 2c f0 41 80 64 24 08 f7 
> Dec  8 12:39:50 gklab-128-013 kernel: <48> 8b 83 b0 00 00 00 a8 02 75 10 c7 45 80 01 00 00 00 f0 ff 83 
> Dec  8 12:39:50 gklab-128-013 kernel: RIP  [<ffffffffa0280e67>] handle_stripe+0x2f5/0x2cbf [raid456]
> Dec  8 12:39:50 gklab-128-013 kernel:  RSP <ffff8800d61cdb80>
> Dec  8 12:39:50 gklab-128-013 kernel: CR2: 00000000000000b0
> 
> 
> The problem is caused by dereferencing rdev a few lines further down in raid5.c, right after it has been set to NULL.
> The following patch corrects it.
> 
> From fbaa3fdff634721e5c2c09e07b8429385494ee02 Mon Sep 17 00:00:00 2001
> From: Adam Kwolek <adam.kwolek@intel.com>
> Date: Thu, 8 Dec 2011 15:34:09 +0100
> Subject: [PATCH] md: raid5 crash during degradation
> 
> A NULL pointer dereference causes a crash in the raid5 module.
> 
> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>

Ahh, thanks.  Obvious once you see it :-)

Thanks,
I've sent this and the other fixes off to Linus.


NeilBrown


> ---
>  drivers/md/raid5.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index b0dec01..da4997c 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3070,7 +3070,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
>  			if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>  				set_bit(R5_Insync, &dev->flags);
>  		}
> -		if (test_bit(R5_WriteError, &dev->flags)) {
> +		if (test_bit(R5_WriteError, &dev->flags) && rdev) {
>  			clear_bit(R5_Insync, &dev->flags);
>  			if (!test_bit(Faulty, &rdev->flags)) {
>  				s->handle_bad_blocks = 1;
> @@ -3078,7 +3078,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
>  			} else
>  				clear_bit(R5_WriteError, &dev->flags);
>  		}
> -		if (test_bit(R5_MadeGood, &dev->flags)) {
> +		if (test_bit(R5_MadeGood, &dev->flags) && rdev) {
>  			if (!test_bit(Faulty, &rdev->flags)) {
>  				s->handle_bad_blocks = 1;
>  				atomic_inc(&rdev->nr_pending);





Thread overview: 13+ messages
2011-11-24 12:19 [PATCH] md: Add ability for disable bad block management Adam Kwolek
2011-11-24 12:23 ` Paul Menzel
2011-11-24 12:28   ` Kwolek, Adam
2011-11-24 12:48     ` Paul Menzel
2011-11-30  0:14 ` NeilBrown
2011-11-30  8:17   ` Kwolek, Adam
2011-12-06  6:05     ` NeilBrown
2011-12-06 13:02       ` Kwolek, Adam
2011-12-07  1:52         ` NeilBrown
2011-12-07 11:10           ` Kwolek, Adam
2011-12-08  4:02             ` NeilBrown
2011-12-08 15:36               ` Kwolek, Adam
2011-12-09  3:53                 ` NeilBrown
