* [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged
@ 2016-10-18 14:10 Tomasz Majchrzak
  2016-10-20  5:28 ` Shaohua Li
  0 siblings, 1 reply; 5+ messages in thread
From: Tomasz Majchrzak @ 2016-10-18 14:10 UTC (permalink / raw)
  To: linux-raid
  Cc: shli, aleksey.obitotskiy, pawel.baldysiak, artur.paszkiewicz,
	maksymilian.kunt, mariusz.dabrowski, axboe, dan.j.williams,
	linux-block, Tomasz Majchrzak
Once external metadata handler acknowledges all bad blocks (by writing
to rdev 'bad_blocks' sysfs file), it requests to unblock the array.
Check if all bad blocks are actually acknowledged as there might be a
race if new bad blocks are notified at the same time. If all bad blocks
are acknowledged, just unblock the array and continue. If not, ignore
the request to unblock (do not fail an array). External metadata handler
is expected to either process remaining bad blocks and try to unblock
again or remove bad block support for a disk (which will cause disk to
fail as in no-support case).
Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
---
 drivers/md/md.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index cc05236..ce585b7 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2612,19 +2612,29 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 		set_bit(Blocked, &rdev->flags);
 		err = 0;
 	} else if (cmd_match(buf, "-blocked")) {
-		if (!test_bit(Faulty, &rdev->flags) &&
+		int unblock = 1;
+		int acked = !rdev->badblocks.unacked_exist;
+
+		if ((test_bit(ExternalBbl, &rdev->flags) &&
+		     rdev->badblocks.changed))
+			acked = check_if_badblocks_acked(&rdev->badblocks);
+
+		if (test_bit(ExternalBbl, &rdev->flags) && !acked) {
+			unblock = 0;
+		} else if (!test_bit(Faulty, &rdev->flags) &&
 		    rdev->badblocks.unacked_exist) {
 			/* metadata handler doesn't understand badblocks,
 			 * so we need to fail the device
 			 */
 			md_error(rdev->mddev, rdev);
 		}
-		clear_bit(Blocked, &rdev->flags);
-		clear_bit(BlockedBadBlocks, &rdev->flags);
-		wake_up(&rdev->blocked_wait);
-		set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
-		md_wakeup_thread(rdev->mddev->thread);
-
+		if (unblock) {
+			clear_bit(Blocked, &rdev->flags);
+			clear_bit(BlockedBadBlocks, &rdev->flags);
+			wake_up(&rdev->blocked_wait);
+			set_bit(MD_RECOVERY_NEEDED, &rdev->mddev->recovery);
+			md_wakeup_thread(rdev->mddev->thread);
+		}
 		err = 0;
 	} else if (cmd_match(buf, "insync") && rdev->raid_disk == -1) {
 		set_bit(In_sync, &rdev->flags);
-- 
1.8.3.1
^ permalink raw reply related	[flat|nested] 5+ messages in thread
* Re: [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged
  2016-10-18 14:10 [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged Tomasz Majchrzak
@ 2016-10-20  5:28 ` Shaohua Li
  2016-10-20 12:09   ` Tomasz Majchrzak
  0 siblings, 1 reply; 5+ messages in thread
From: Shaohua Li @ 2016-10-20  5:28 UTC (permalink / raw)
  To: Tomasz Majchrzak
  Cc: linux-raid, aleksey.obitotskiy, pawel.baldysiak,
	artur.paszkiewicz, maksymilian.kunt, mariusz.dabrowski, axboe,
	dan.j.williams, linux-block
On Tue, Oct 18, 2016 at 04:10:24PM +0200, Tomasz Majchrzak wrote:
> Once external metadata handler acknowledges all bad blocks (by writing
> to rdev 'bad_blocks' sysfs file), it requests to unblock the array.
> Check if all bad blocks are actually acknowledged as there might be a
> race if new bad blocks are notified at the same time. If all bad blocks
> are acknowledged, just unblock the array and continue. If not, ignore
> the request to unblock (do not fail an array). External metadata handler
> is expected to either process remaining bad blocks and try to unblock
> again or remove bad block support for a disk (which will cause disk to
> fail as in no-support case).
> 
> Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> ---
>  drivers/md/md.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index cc05236..ce585b7 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -2612,19 +2612,29 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
>  		set_bit(Blocked, &rdev->flags);
>  		err = 0;
>  	} else if (cmd_match(buf, "-blocked")) {
> -		if (!test_bit(Faulty, &rdev->flags) &&
> +		int unblock = 1;
> +		int acked = !rdev->badblocks.unacked_exist;
> +
> +		if ((test_bit(ExternalBbl, &rdev->flags) &&
> +		     rdev->badblocks.changed))
> +			acked = check_if_badblocks_acked(&rdev->badblocks);
> +
> +		if (test_bit(ExternalBbl, &rdev->flags) && !acked) {
> +			unblock = 0;
> +		} else if (!test_bit(Faulty, &rdev->flags) &&
I missed one thing in last review. writing to bad_blocks sysfs file already
clears the BlockedBadBlocks bit and wakeup the thread sleeping at blocked_wait,
so the array can continue. Why do we need to fix state_store here?
Thanks,
Shaohua
^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged
  2016-10-20  5:28 ` Shaohua Li
@ 2016-10-20 12:09   ` Tomasz Majchrzak
  2016-10-20 21:55     ` Shaohua Li
  0 siblings, 1 reply; 5+ messages in thread
From: Tomasz Majchrzak @ 2016-10-20 12:09 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid
On Wed, Oct 19, 2016 at 10:28:18PM -0700, Shaohua Li wrote:
> On Tue, Oct 18, 2016 at 04:10:24PM +0200, Tomasz Majchrzak wrote:
> > Once external metadata handler acknowledges all bad blocks (by writing
> > to rdev 'bad_blocks' sysfs file), it requests to unblock the array.
> > Check if all bad blocks are actually acknowledged as there might be a
> > race if new bad blocks are notified at the same time. If all bad blocks
> > are acknowledged, just unblock the array and continue. If not, ignore
> > the request to unblock (do not fail an array). External metadata handler
> > is expected to either process remaining bad blocks and try to unblock
> > again or remove bad block support for a disk (which will cause disk to
> > fail as in no-support case).
> > 
> > Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
> > Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> > ---
> >  drivers/md/md.c | 24 +++++++++++++++++-------
> >  1 file changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index cc05236..ce585b7 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -2612,19 +2612,29 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
> >  		set_bit(Blocked, &rdev->flags);
> >  		err = 0;
> >  	} else if (cmd_match(buf, "-blocked")) {
> > -		if (!test_bit(Faulty, &rdev->flags) &&
> > +		int unblock = 1;
> > +		int acked = !rdev->badblocks.unacked_exist;
> > +
> > +		if ((test_bit(ExternalBbl, &rdev->flags) &&
> > +		     rdev->badblocks.changed))
> > +			acked = check_if_badblocks_acked(&rdev->badblocks);
> > +
> > +		if (test_bit(ExternalBbl, &rdev->flags) && !acked) {
> > +			unblock = 0;
> > +		} else if (!test_bit(Faulty, &rdev->flags) &&
> 
> I missed one thing in last review. writing to bad_blocks sysfs file already
> clears the BlockedBadBlocks bit and wakeup the thread sleeping at blocked_wait,
> so the array can continue. Why do we need to fix state_store here?
We cannot unblock the rdev until all bad blocks are acknowledged. The problem is
mdadm cannot be sure it has stored all bad blocks in the first pass (read of
unacknowledged_bad_blocks sysfs file). When bad block is encountered, rdev
enters Blocked, Faulty state (unacked_exists is non-zero in state_show). Then
mdadm reads the bad block, stores it in metadata and acknowledges it to the
kernel. Initially I have tried to call ack_all_badblocks in bb_store or in
state_store("-blocked") but there was a race. If other requests (the ones that
had started before array got into blocked state) notified bad blocks after sysfs
file was read by mdadm but before ack_all_badblocks call, ack_all_badblocks call
was also acknowledging bad blocks not stored (and never to be as a result) in
metadata. That's why I have introduced a new function
check_if_all_badblocks_acked to close this race.
I'm not sure if native bad block support is not prone to the similar problem -
bad block structure modified between metadata sync and ack_all_badblocks call.
As for BlockedBadBlocks flag cleared in bb_store, commit de393cdea66c ("md: make
it easier to wait for bad blocks to be acknowledged") explains this flag is only
an advisory. All awaiting requests are woken up and check if bad block that
interests them is already acknowledged. If so, then can continue, and if not,
they set the flag again to check in a while. It is just a useful optimization.
Please note that rdev with unacknowledged bad block is reported as Blocked via
sysfs state (non-zero unacked_exists), even though the corresponding rdev kernel
flag is not set. It is the reason why mdadm calls state_store("-blocked").
I hope it clarifies all your doubts.
Tomek
^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged
  2016-10-20 12:09   ` Tomasz Majchrzak
@ 2016-10-20 21:55     ` Shaohua Li
  2016-10-21 14:33       ` Tomasz Majchrzak
  0 siblings, 1 reply; 5+ messages in thread
From: Shaohua Li @ 2016-10-20 21:55 UTC (permalink / raw)
  To: Tomasz Majchrzak; +Cc: linux-raid
On Thu, Oct 20, 2016 at 02:09:15PM +0200, Tomasz Majchrzak wrote:
> On Wed, Oct 19, 2016 at 10:28:18PM -0700, Shaohua Li wrote:
> > On Tue, Oct 18, 2016 at 04:10:24PM +0200, Tomasz Majchrzak wrote:
> > > Once external metadata handler acknowledges all bad blocks (by writing
> > > to rdev 'bad_blocks' sysfs file), it requests to unblock the array.
> > > Check if all bad blocks are actually acknowledged as there might be a
> > > race if new bad blocks are notified at the same time. If all bad blocks
> > > are acknowledged, just unblock the array and continue. If not, ignore
> > > the request to unblock (do not fail an array). External metadata handler
> > > is expected to either process remaining bad blocks and try to unblock
> > > again or remove bad block support for a disk (which will cause disk to
> > > fail as in no-support case).
> > > 
> > > Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
> > > Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> > > ---
> > >  drivers/md/md.c | 24 +++++++++++++++++-------
> > >  1 file changed, 17 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index cc05236..ce585b7 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -2612,19 +2612,29 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
> > >  		set_bit(Blocked, &rdev->flags);
> > >  		err = 0;
> > >  	} else if (cmd_match(buf, "-blocked")) {
> > > -		if (!test_bit(Faulty, &rdev->flags) &&
> > > +		int unblock = 1;
> > > +		int acked = !rdev->badblocks.unacked_exist;
> > > +
> > > +		if ((test_bit(ExternalBbl, &rdev->flags) &&
> > > +		     rdev->badblocks.changed))
> > > +			acked = check_if_badblocks_acked(&rdev->badblocks);
> > > +
> > > +		if (test_bit(ExternalBbl, &rdev->flags) && !acked) {
> > > +			unblock = 0;
> > > +		} else if (!test_bit(Faulty, &rdev->flags) &&
> > 
> > I missed one thing in last review. writing to bad_blocks sysfs file already
> > clears the BlockedBadBlocks bit and wakeup the thread sleeping at blocked_wait,
> > so the array can continue. Why do we need to fix state_store here?
> 
> We cannot unblock the rdev until all bad blocks are acknowledged. The problem is
> mdadm cannot be sure it has stored all bad blocks in the first pass (read of
> unacknowledged_bad_blocks sysfs file). When bad block is encountered, rdev
> enters Blocked, Faulty state (unacked_exists is non-zero in state_show). Then
> mdadm reads the bad block, stores it in metadata and acknowledges it to the
> kernel. Initially I have tried to call ack_all_badblocks in bb_store or in
> state_store("-blocked") but there was a race. If other requests (the ones that
> had started before array got into blocked state) notified bad blocks after sysfs
> file was read by mdadm but before ack_all_badblocks call, ack_all_badblocks call
> was also acknowledging bad blocks not stored (and never to be as a result) in
> metadata. That's why I have introduced a new function
> check_if_all_badblocks_acked to close this race.
> 
> I'm not sure if native bad block support is not prone to the similar problem -
> bad block structure modified between metadata sync and ack_all_badblocks call.
> 
Yep, we always have the race here. Fortunately we don't need to wait all
badblocks acknowledged, the user of md_wait_for_blocked_rdev will retry. In the
retry, we will check if the badblock is acknowledged.
The native bad block support doesn't have the race. We copy badblocks to a new
page, clear badblocks->changed and then write the new page to disks.
ack_all_badblocks will check the ->changed, and do nothing if it's set. So if
something happens in between, ack_all_badblocks will do nothing.
While the external metadata array hasn't such mechanism to avoid race, I still
thought changing state_store isn't a good idea.
I just sent a patch to fix badblocks_set() and make it clear unacked_exists.
bb_store shouldn't call ack_all_badblocks in your case, but we don't need to.
As long as mdadm uses bb_store to acknowledge the ranges, the array can
continue. And if badblocks_set() can clear unacked_exists, the array will not
be reported as Blocked.
> As for BlockedBadBlocks flag cleared in bb_store, commit de393cdea66c ("md: make
> it easier to wait for bad blocks to be acknowledged") explains this flag is only
> an advisory. All awaiting requests are woken up and check if bad block that
> interests them is already acknowledged. If so, then can continue, and if not,
> they set the flag again to check in a while. It is just a useful optimization.
I think 'advisory' means the driver should retry
 
> Please note that rdev with unacknowledged bad block is reported as Blocked via
> sysfs state (non-zero unacked_exists), even though the corresponding rdev kernel
> flag is not set. It is the reason why mdadm calls state_store("-blocked").
If badblocks_set() can clear unacked_exists, this isn't required.
Thanks,
Shaohua
^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged
  2016-10-20 21:55     ` Shaohua Li
@ 2016-10-21 14:33       ` Tomasz Majchrzak
  0 siblings, 0 replies; 5+ messages in thread
From: Tomasz Majchrzak @ 2016-10-21 14:33 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid
On Thu, Oct 20, 2016 at 02:55:15PM -0700, Shaohua Li wrote:
> On Thu, Oct 20, 2016 at 02:09:15PM +0200, Tomasz Majchrzak wrote:
> > On Wed, Oct 19, 2016 at 10:28:18PM -0700, Shaohua Li wrote:
> > > On Tue, Oct 18, 2016 at 04:10:24PM +0200, Tomasz Majchrzak wrote:
> > > > Once external metadata handler acknowledges all bad blocks (by writing
> > > > to rdev 'bad_blocks' sysfs file), it requests to unblock the array.
> > > > Check if all bad blocks are actually acknowledged as there might be a
> > > > race if new bad blocks are notified at the same time. If all bad blocks
> > > > are acknowledged, just unblock the array and continue. If not, ignore
> > > > the request to unblock (do not fail an array). External metadata handler
> > > > is expected to either process remaining bad blocks and try to unblock
> > > > again or remove bad block support for a disk (which will cause disk to
> > > > fail as in no-support case).
> > > > 
> > > > Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
> > > > Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> > > > ---
> > > >  drivers/md/md.c | 24 +++++++++++++++++-------
> > > >  1 file changed, 17 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > > index cc05236..ce585b7 100644
> > > > --- a/drivers/md/md.c
> > > > +++ b/drivers/md/md.c
> > > > @@ -2612,19 +2612,29 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
> > > >  		set_bit(Blocked, &rdev->flags);
> > > >  		err = 0;
> > > >  	} else if (cmd_match(buf, "-blocked")) {
> > > > -		if (!test_bit(Faulty, &rdev->flags) &&
> > > > +		int unblock = 1;
> > > > +		int acked = !rdev->badblocks.unacked_exist;
> > > > +
> > > > +		if ((test_bit(ExternalBbl, &rdev->flags) &&
> > > > +		     rdev->badblocks.changed))
> > > > +			acked = check_if_badblocks_acked(&rdev->badblocks);
> > > > +
> > > > +		if (test_bit(ExternalBbl, &rdev->flags) && !acked) {
> > > > +			unblock = 0;
> > > > +		} else if (!test_bit(Faulty, &rdev->flags) &&
> > > 
> > > I missed one thing in last review. writing to bad_blocks sysfs file already
> > > clears the BlockedBadBlocks bit and wakeup the thread sleeping at blocked_wait,
> > > so the array can continue. Why do we need to fix state_store here?
> > 
> > We cannot unblock the rdev until all bad blocks are acknowledged. The problem is
> > mdadm cannot be sure it has stored all bad blocks in the first pass (read of
> > unacknowledged_bad_blocks sysfs file). When bad block is encountered, rdev
> > enters Blocked, Faulty state (unacked_exists is non-zero in state_show). Then
> > mdadm reads the bad block, stores it in metadata and acknowledges it to the
> > kernel. Initially I have tried to call ack_all_badblocks in bb_store or in
> > state_store("-blocked") but there was a race. If other requests (the ones that
> > had started before array got into blocked state) notified bad blocks after sysfs
> > file was read by mdadm but before ack_all_badblocks call, ack_all_badblocks call
> > was also acknowledging bad blocks not stored (and never to be as a result) in
> > metadata. That's why I have introduced a new function
> > check_if_all_badblocks_acked to close this race.
> > 
> > I'm not sure if native bad block support is not prone to the similar problem -
> > bad block structure modified between metadata sync and ack_all_badblocks call.
> > 
> 
> Yep, we always have the race here. Fortunately we don't need to wait all
> badblocks acknowledged, the user of md_wait_for_blocked_rdev will retry. In the
> retry, we will check if the badblock is acknowledged.
> 
> The native bad block support doesn't have the race. We copy badblocks to a new
> page, clear badblocks->changed and then write the new page to disks.
> ack_all_badblocks will check the ->changed, and do nothing if it's set. So if
> something happens in between, ack_all_badblocks will do nothing.
> 
> While the external metadata array hasn't such mechanism to avoid race, I still
> thought changing state_store isn't a good idea.
> 
> I just sent a patch to fix badblocks_set() and make it clear unacked_exists.
> bb_store shouldn't call ack_all_badblocks in your case, but we don't need to.
> As long as mdadm uses bb_store to acknowledge the ranges, the array can
> continue. And if badblocks_set() can clear unacked_exists, the array will not
> be reported as Blocked.
> 
> > As for BlockedBadBlocks flag cleared in bb_store, commit de393cdea66c ("md: make
> > it easier to wait for bad blocks to be acknowledged") explains this flag is only
> > an advisory. All awaiting requests are woken up and check if bad block that
> > interests them is already acknowledged. If so, then can continue, and if not,
> > they set the flag again to check in a while. It is just a useful optimization.
> 
> I think 'advisory' means the driver should retry
>  
> > Please note that rdev with unacknowledged bad block is reported as Blocked via
> > sysfs state (non-zero unacked_exists), even though the corresponding rdev kernel
> > flag is not set. It is the reason why mdadm calls state_store("-blocked").
> 
> If badblocks_set() can clear unacked_exists, this isn't required.
Thank you for your hints. I have resent my patches. They come on top of your patch.
Tomek
^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-10-21 14:33 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-10-18 14:10 [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged Tomasz Majchrzak
2016-10-20  5:28 ` Shaohua Li
2016-10-20 12:09   ` Tomasz Majchrzak
2016-10-20 21:55     ` Shaohua Li
2016-10-21 14:33       ` Tomasz Majchrzak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).