Re: [PATCH] md: Add ability for disable bad block management


From: NeilBrown <neilb@suse.de>
To: "Kwolek, Adam" <adam.kwolek@intel.com>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>,
	"Ciechanowski, Ed" <ed.ciechanowski@intel.com>,
	"Labun, Marcin" <Marcin.Labun@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: [PATCH] md: Add ability for disable bad block management
Date: Wed, 7 Dec 2011 12:52:55 +1100
Message-ID: <20111207125255.51382e59@notabene.brown>
In-Reply-To: <79556383A0E1384DB3A3903742AAC04A055919@IRSMSX101.ger.corp.intel.com>


On Tue, 6 Dec 2011 13:02:21 +0000 "Kwolek, Adam" <adam.kwolek@intel.com>
wrote:

> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@suse.de]
> > Sent: Tuesday, December 06, 2011 7:05 AM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin; Williams,
> > Dan J
> > Subject: Re: [PATCH] md: Add ability for disable bad block management
> > 
> > On Wed, 30 Nov 2011 08:17:32 +0000 "Kwolek, Adam"
> > <adam.kwolek@intel.com>
> > wrote:
> > 
> > >
> > >
> > > > -----Original Message-----
> > > > From: NeilBrown [mailto:neilb@suse.de]
> > > > Sent: Wednesday, November 30, 2011 1:14 AM
> > > > To: Kwolek, Adam
> > > > Cc: linux-raid@vger.kernel.org; Ciechanowski, Ed; Labun, Marcin;
> > > > Williams, Dan J
> > > > Subject: Re: [PATCH] md: Add ability for disable bad block
> > > > management
> > > >
> > > > On Thu, 24 Nov 2011 13:19:53 +0100 Adam Kwolek
> > > > <adam.kwolek@intel.com> wrote:
> > > >
> > > > > When external metadata doesn't support BBM, mdadm cannot answer
> > > > > correctly for BBM requests. It causes reshape process being stopped.
> > > > >
> > > > > Add ability for external metadata /mdadm/ to disable BBM via sysfs.
> > > > > md will ignore bad blocks as it is for metadata v0.90.
> > > >
> > > > This should not be necessary.
> > > >
> > > > The intention is that a device with a bad block looks exactly like a
> > > > device with a failed device.  i.e. 'faulty' and 'blocked' appear in the 'state'
> > > > file.
> > > >
> > > > If the metadata doesn't support a bad-block list, it will record
> > > > that the device has failed and will unblock the device.  At that
> > > > point the failure is forced.
> > > > If the metadata does support a bad block list it will just record
> > > > the bad blocks and acknowledge them, and then unblock the device.  At
> > > > that point the device won't be failed, the 'faulty' state will
> > > > disappear, and it will continue to be used with the known bad blocks.
> > > >
> > > > What exactly is going wrong that makes you think you need this patch?
> > >
> > >
> > > When degradation occurs during migration, BBM is signaled to mdmon and
> > > mdmon /monitor.c/ tries to mark the disk '-blocked'.
> > > This operation fails. mdmon goes into a loop, and nothing can be done /I
> > > cannot use sysfs/ to signal or remove the device.
> > > In sysfs the device is present in /sys/block/mdXXX/md but the entry
> > > /sys/block/mdXXX/md/dev-sdX/~block is missing /the disk was pulled out/.
> > 
> > 
> > I've found a couple of issues.  I'm not sure if they completely explain what
> > you are seeing.  Could you please test with these two fixes and tell me the
> > results?
> > 
> > Firstly, I find that writing "-blocked" succeeds (no error returned) but the
> > "blocked" flag does not get cleared, which is certainly confusing.
> > 
> > This is fixed by:
> > 
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 4adcbb4..7258dc1 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -2562,7 +2562,8 @@ state_show(struct md_rdev *rdev, char *page)
> >  		sep = ",";
> >  	}
> >  	if (test_bit(Blocked, &rdev->flags) ||
> > -	    rdev->badblocks.unacked_exist) {
> > +	    (rdev->badblocks.unacked_exist
> > +	     && !test_bit(Faulty, &rdev->flags))) {
> >  		len += sprintf(page+len, "%sblocked", sep);
> >  		sep = ",";
> >  	}
> > 
> > 
> > Secondly mdmon writes "-blocked" even when the "blocked" flag is not set.
> > This succeeds so state_store() calls
> > 		sysfs_notify_dirent_safe(rdev->sysfs_state);
> > 
> > so mdmon/monitor.c is woken up to go around the loop again and it writes
> > "-blocked" again and so it continues in a loop.
> > 
> > This is fixed by:
> > 
> > diff --git a/monitor.c b/monitor.c
> > index b002e90..29bde18 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -339,7 +339,8 @@ static int read_and_act(struct active_array *a)
> >  			a->container->ss->set_disk(a, mdi->disk.raid_disk,
> >  						   mdi->curr_state);
> >  			check_degraded = 1;
> > -			mdi->next_state |= DS_UNBLOCK;
> > +			if (mdi->curr_state & DS_BLOCKED)
> > +				mdi->next_state |= DS_UNBLOCK;
> >  			if (a->curr_state == read_auto) {
> >  				a->container->ss->set_array_state(a, 0);
> >  				a->next_state = active;
> > 
> > 
> > Finally, when a badblock is added to the list we don't currently notify
> > rdev->sysfs_state so mdmon doesn't notice straight away and so is
> > delayed in taking action.  It will only notice when a write blocks.
> > 
> > This is fixed by:
> > 
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 4adcbb4..9cc7983 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -7940,6 +7941,7 @@ int rdev_set_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
> >  				  s + rdev->data_offset, sectors, acknowledged);
> >  	if (rv) {
> >  		/* Make sure they get written out promptly */
> > +		sysfs_notify_dirent_safe(rdev->sysfs_state);
> >  		set_bit(MD_CHANGE_CLEAN, &rdev->mddev->flags);
> >  		md_wakeup_thread(rdev->mddev->thread);
> >  	}
> > 
> > 
> > With these 3 changes in place I get substantially improved behaviour on my
> > simple test (just doing resync, not reshape).
> > 
> > Thanks,
> > NeilBrown
> 
> I've applied those changes and:
> 1. Migration:
> 	a) with BBM additionally disabled, reshape continues after degradation and performance is no lower (without your patches performance was poor and mdmon went into a "crazy" run).
> 	b) with BBM enabled (without my change), metadata is updated correctly and md stops. mdstat shows that reshape is in progress but it is not moving forward.
> 2. Rebuild:
> 	a) with BBM additionally disabled, rebuild is stopped correctly in md and in metadata just after degradation (I've got a few additional corrections for metadata rebuild finalization; I'll post them shortly).
> 	b) with BBM enabled (without my change), metadata is updated correctly and md stops. mdstat shows that rebuild is in progress but it is not moving forward.
> 
> 
> It seems that those changes help with the reshape performance drop after degradation and with the "crazy" mdmon run.
> Without disabling BBM, md_do_sync() still doesn't finish on degradation during reshape and rebuild. This causes the process to stop.
> The last information from md is the printout from md_error(), and it is probably waiting for BBM confirmation.
> 
> What may be different in my tests is that I physically pull out disks to degrade the array (I'm not using sysfs to do this). After this the rdev link in the md device is invalid.
> 
> Please let me know if you want me to run any additional tests /any specific logs?/.
> 
>

I cannot reproduce this.
I didn't physically remove devices, but I used
   echo 1 > /sys/block/sdc/device/delete

which should be nearly identical from the perspective of md and mdadm.

If you could give me the exact set of steps that you follow to produce the
problem that would help - maybe a script?  Just a description is OK.

Also you say it is blocking in md_do_sync.  Is that at the 

	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));

call just after the "out:" label?

What is the raid thread doing at this point?  
   cat /proc/PID/stack
might help.

What are the contents of all the sysfs files?
   grep . /sys/block/mdXXX/md/*
   grep . /sys/block/mdXXX/md/dev-*/*
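
Something along these lines would collect the sysfs contents and the stacks in one go.  It is only a sketch: the function name md_dump_state and the SYSFS_ROOT/PROC_ROOT overrides are made up for illustration (the overrides exist just so it can be tried against a scratch directory), and it assumes the array's kernel threads are named "<md>_<something>", which may not hold everywhere.

```shell
#!/bin/sh
# Sketch: dump md sysfs attributes and the kernel stacks of an array's threads.
# SYSFS_ROOT and PROC_ROOT are hypothetical overrides for dry runs only;
# they are not md or mdadm settings.
md_dump_state() {
    md="$1"                                   # array name, e.g. md127
    sysfs="${SYSFS_ROOT:-/sys}/block/$md/md"
    proc="${PROC_ROOT:-/proc}"

    if [ ! -d "$sysfs" ]; then
        echo "no md sysfs directory: $sysfs" >&2
        return 1
    fi

    echo "== array attributes =="
    # "grep ." prints file:content for every non-empty line.  -s hides
    # errors from directories and unreadable attributes; the extra
    # /dev/null argument forces the file name prefix even for one file.
    grep -s . "$sysfs"/* /dev/null || true

    echo "== per-device attributes =="
    grep -s . "$sysfs"/dev-*/* /dev/null || true

    echo "== thread stacks =="
    # Assumption: threads belonging to the array are named "<md>_...".
    for comm in "$proc"/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        name=$(cat "$comm" 2>/dev/null) || continue
        case "$name" in
            "$md"_*)
                pid_dir=${comm%/comm}
                echo "-- $name (pid ${pid_dir##*/})"
                # Reading the stack file usually needs root.
                cat "$pid_dir/stack" 2>/dev/null || echo "(stack unreadable)"
                ;;
        esac
    done
    return 0
}

# Run only when an array name is given on the command line.
if [ "$#" -gt 0 ]; then
    md_dump_state "$@"
fi
```

Run it as root with the array name as the only argument (mdXXX as above) while the reshape or rebuild is stuck, and send the output.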

Thanks,
NeilBrown



