Re: RAID1 removing failed disk returns EBUSY


From: Xiao Ni <xni@redhat.com>
To: Neil Brown <neilb@suse.de>
Cc: Joe Lawrence <joe.lawrence@stratus.com>,
	linux-raid@vger.kernel.org,
	Bill Kuzeja <william.kuzeja@stratus.com>
Subject: Re: RAID1 removing failed disk returns EBUSY
Date: Thu, 25 Jun 2015 05:42:54 -0400 (EDT)
Message-ID: <1225352330.22633499.1435225374686.JavaMail.zimbra@redhat.com>
In-Reply-To: <20150617125151.372bb103@home.neil.brown.name>



----- Original Message -----
> From: "Neil Brown" <neilb@suse.de>
> To: "XiaoNi" <xni@redhat.com>
> Cc: "Joe Lawrence" <joe.lawrence@stratus.com>, linux-raid@vger.kernel.org, "Bill Kuzeja" <william.kuzeja@stratus.com>
> Sent: Wednesday, June 17, 2015 10:51:51 AM
> Subject: Re: RAID1 removing failed disk returns EBUSY
> 
> On Wed, 10 Jun 2015 14:26:41 +0800
> XiaoNi <xni@redhat.com> wrote:
> 
> > 
> > 
> > On 02/03/2015 04:10 PM, Xiao Ni wrote:
> > >
> > > ----- Original Message -----
> > >> From: "NeilBrown" <neilb@suse.de>
> > >> To: "Xiao Ni" <xni@redhat.com>
> > >> Cc: "Joe Lawrence" <joe.lawrence@stratus.com>,
> > >> linux-raid@vger.kernel.org, "Bill Kuzeja" <william.kuzeja@stratus.com>
> > >> Sent: Monday, February 2, 2015 2:36:01 PM
> > >> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>
> > >> On Thu, 29 Jan 2015 07:14:16 -0500 (EST) Xiao Ni <xni@redhat.com> wrote:
> > >>
> > >>>
> > >>> ----- Original Message -----
> > >>>> From: "NeilBrown" <neilb@suse.de>
> > >>>> To: "Xiao Ni" <xni@redhat.com>
> > >>>> Cc: "Joe Lawrence" <joe.lawrence@stratus.com>,
> > >>>> linux-raid@vger.kernel.org, "Bill Kuzeja" <william.kuzeja@stratus.com>
> > >>>> Sent: Thursday, January 29, 2015 11:52:17 AM
> > >>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>>>
> > >>>> On Sun, 18 Jan 2015 21:33:50 -0500 (EST) Xiao Ni <xni@redhat.com>
> > >>>> wrote:
> > >>>>
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>>> From: "Joe Lawrence" <joe.lawrence@stratus.com>
> > >>>>>> To: "Xiao Ni" <xni@redhat.com>
> > >>>>>> Cc: "NeilBrown" <neilb@suse.de>, linux-raid@vger.kernel.org, "Bill
> > >>>>>> Kuzeja" <william.kuzeja@stratus.com>
> > >>>>>> Sent: Friday, January 16, 2015 11:10:31 PM
> > >>>>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>>>>>
> > >>>>>> On Fri, 16 Jan 2015 00:20:12 -0500
> > >>>>>> Xiao Ni <xni@redhat.com> wrote:
> > >>>>>>> Hi Joe
> > >>>>>>>
> > >>>>>>>     Thanks for reminding me. I didn't do that. Now the disk can be
> > >>>>>>>     removed successfully after writing "idle" to sync_action.
> > >>>>>>>
> > >>>>>>>     I wrongly thought that the patch referenced in this mail
> > >>>>>>>     fixed the problem.
> > >>>>>> So it sounds like even with 3.18 and a new mdadm, this bug still
> > >>>>>> persists?
> > >>>>>>
> > >>>>>> -- Joe
> > >>>>>>
> > >>>>>> --
> > >>>>> Hi Joe
> > >>>>>
> > >>>>>     I'm a little confused now. Does the patch
> > >>>>>     45eaf45dfa4850df16bc2e8e7903d89021137f40 from linux-stable
> > >>>>> resolve the problem?
> > >>>>>
> > >>>>>     My environment is:
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# mdadm --version
> > >>>>> mdadm - v3.3.2-18-g93d3bd3 - 18th December 2014  (this is the newest
> > >>>>> upstream)
> > >>>>> [root@dhcp-12-133 mdadm]# uname -r
> > >>>>> 3.18.2
> > >>>>>
> > >>>>>
> > >>>>>     My steps are:
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# lsblk
> > >>>>> sdb                       8:16   0 931.5G  0 disk
> > >>>>> └─sdb1                    8:17   0     5G  0 part
> > >>>>> sdc                       8:32   0 186.3G  0 disk
> > >>>>> sdd                       8:48   0 931.5G  0 disk
> > >>>>> └─sdd1                    8:49   0     5G  0 part
> > >>>>> [root@dhcp-12-133 mdadm]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdd1 --assume-clean
> > >>>>> mdadm: Note: this array has metadata at the start and
> > >>>>>      may not be suitable as a boot device.  If you plan to
> > >>>>>      store '/boot' on this device please ensure that
> > >>>>>      your boot-loader understands md/v1.x metadata, or use
> > >>>>>      --metadata=0.90
> > >>>>> mdadm: Defaulting to version 1.2 metadata
> > >>>>> mdadm: array /dev/md0 started.
> > >>>>>
> > >>>>>     Then I unplug the disk.
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# lsblk
> > >>>>> sdc                       8:32   0 186.3G  0 disk
> > >>>>> sdd                       8:48   0 931.5G  0 disk
> > >>>>> └─sdd1                    8:49   0     5G  0 part
> > >>>>>    └─md0                   9:0    0     5G  0 raid1
> > >>>>> [root@dhcp-12-133 mdadm]# echo faulty > /sys/block/md0/md/dev-sdb1/state
> > >>>>> [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> > >>>>> -bash: echo: write error: Device or resource busy
> > >>>>> [root@dhcp-12-133 mdadm]# echo idle > /sys/block/md0/md/sync_action
> > >>>>> [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> > >>>>>
> > >>>> I cannot reproduce this - using linux 3.18.2.  I'd be surprised if
> > >>>> mdadm
> > >>>> version affects things.
> > >>> Hi Neil
> > >>>
> > >>>     I'm very curious, because I can reproduce it on my machine 100% of the time.
> > >>>
> > >>>> This error (Device or resource busy) implies that rdev->raid_disk is
> > >>>> >= 0 (tested in state_store()).
> > >>>>
> > >>>> ->raid_disk is set to -1 by remove_and_add_spares() providing:
> > >>>>    1/ it isn't Blocked (which is very unlikely)
> > >>>>    2/ hot_remove_disk succeeds, which it will if nr_pending is zero,
> > >>>>    and
> > >>>>    3/ nr_pending is zero.
> > >>>     I remember that I tried to check those conditions. It really is
> > >>>     condition 1, the one which is very unlikely.
> > >>>
> > >>>     I added some debug code to the function array_state_show:
> > >>>
> > >>>      array_state_show(struct mddev *mddev, char *page) {
> > >>>          enum array_state st = inactive;
> > >>>          struct md_rdev *rdev;
> > >>>
> > >>>          /* debug: report the Blocked flag of every member device */
> > >>>          rdev_for_each_rcu(rdev, mddev) {
> > >>>                  printk(KERN_ALERT "search for %s\n",
> > >>>                         rdev->bdev->bd_disk->disk_name);
> > >>>                  if (test_bit(Blocked, &rdev->flags))
> > >>>                          printk(KERN_ALERT "rdev is Blocked\n");
> > >>>                  else
> > >>>                          printk(KERN_ALERT "rdev is not Blocked\n");
> > >>>          }
> > >>>          /* ... rest of array_state_show() unchanged ... */
> > >>>      }
> > >>>
> > >>>    After I ran echo 1 > /sys/block/sdc/device/delete, I ran this command:
> > >>>
> > >>> [root@dhcp-12-133 md]# cat /sys/block/md0/md/array_state
> > >>> read-auto
> > >>    ^^^^^^^^^
> > >>
> > >> I think that is half the explanation.
> > >> You must have the md_mod.start_ro parameter set to '1'.
> > >>
> > >>
> > >>> [root@dhcp-12-133 md]# dmesg
> > >>> [ 2679.559185] search for sdc
> > >>> [ 2679.559189] rdev is Blocked
> > >>> [ 2679.559190] search for sdb
> > >>> [ 2679.559190] rdev is not Blocked
> > >>>     
> > >>>    So sdc is Blocked
> > >> and that is the other half - thanks.
> > >> (yes, I was wrong.  Sometimes it is easier than being right, but still
> > >> yields results).
> > >>
> > >> When a device fails, it is Blocked until the metadata is updated to
> > >> record
> > >> the failure.  This ensures that no writes succeed without writing to
> > >> that
> > >> device, until we are certain that no read will try reading from that
> > >> device,
> > >> even after a crash/restart.
> > >>
> > >> Blocked is cleared after the metadata is written, but read-auto (and
> > >> read-only) devices never write out their metadata.  So blocked doesn't
> > >> get
> > >> cleared.
> > >>
> > >> When you "echo idle > .../sync_action" one of the side effects is to
> > >> switch from 'read-auto' to fully active.  This allows the metadata to be
> > >> written,
> > >> Blocked to be cleared, and the device to be removed.
> > >>
> > >> If you
> > >>    echo none > /sys/block/md0/md/dev-sdc/slot
> > >>
> > >> first, then the remove will work.
> > >>
> > >> We could possibly fix it with something like the following, but I'm not
> > >> sure
> > >> I like it.  There is no guarantee that I can see which would ensure the
> > >> superblock got updated before the first write if the array switched to
> > >> read/write.
> > >>
> > >> NeilBrown
> > >>
> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> > >> index 9233c71138f1..b3d1e8e5e067 100644
> > >> --- a/drivers/md/md.c
> > >> +++ b/drivers/md/md.c
> > >> @@ -7528,7 +7528,7 @@ static int remove_and_add_spares(struct mddev *mddev,
> > >>   	rdev_for_each(rdev, mddev)
> > >>   		if ((this == NULL || rdev == this) &&
> > >>   		    rdev->raid_disk >= 0 &&
> > >> -		    !test_bit(Blocked, &rdev->flags) &&
> > >> +		    (!test_bit(Blocked, &rdev->flags) || mddev->ro) &&
> > >>   		    (test_bit(Faulty, &rdev->flags) ||
> > >>   		     ! test_bit(In_sync, &rdev->flags)) &&
> > >>   		    atomic_read(&rdev->nr_pending)==0) {
> > >>
> > >>
> > >>
> > > Hi Neil
> > >
> > >     I have tried the patch and it fixes the problem. But I'm sorry
> > >     that I can't offer any advice on a better approach. I'm not familiar
> > >     with the metadata part of md. I'll try to find more time to read the
> > >     md code.
> > >
> > Hi Neil
> > 
> >      I don't see the patch in linux-stable. Did you miss it?
> 
> I don't believe this bug is sufficiently serious for the patch to go to
> -stable.  However it does need to be fixed - thanks for the reminder.
> 
> I've just queued the following patch which I am happy with.  If you
> could confirm that it works for you, I would appreciate that.
> 
> Thanks,
> NeilBrown
> 
> 
> From: Neil Brown <neilb@suse.de>
> Date: Wed, 17 Jun 2015 12:31:46 +1000
> Subject: [PATCH] md: clear Blocked flag on failed devices when array is
>  read-only.
> 
> The Blocked flag indicates that a device has failed but that this
> fact hasn't been recorded in the metadata yet.  Writes to such
> devices cannot be allowed until the metadata has been updated.
> 
> On a read-only array, the Blocked flag will never be cleared.
> This prevents the device being removed from the array.
> 
> If the metadata is being handled by the kernel
> (i.e. !mddev->external), then we can be sure that if the array is
> switched to writable, then a metadata update will happen and will
> record the failure.  So we don't need the flag set.
> 
> If metadata is externally managed, it is up to the external manager
> to clear the 'blocked' flag.
> 
> Reported-by: XiaoNi <xni@redhat.com>
> Signed-off-by: NeilBrown <neilb@suse.de>
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3d339e2..5a6681a 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8125,6 +8125,15 @@ void md_check_recovery(struct mddev *mddev)
>  		int spares = 0;
>  
>  		if (mddev->ro) {
> +			struct md_rdev *rdev;
> +			if (!mddev->external && mddev->in_sync)
> +				/* 'Blocked' flag not needed as failed devices
> +				 * will be recorded if array switched to read/write.
> +				 * Leaving it set will prevent the device
> +				 * from being removed.
> +				 */
> +				rdev_for_each(rdev, mddev)
> +					clear_bit(Blocked, &rdev->flags);
>  			/* On a read-only array we can:
>  			 * - remove failed devices
>  			 * - add already-in_sync devices if the array itself
> 
> 
Hi Neil

Sorry for the late response.

I have tried the patch. When I unplug the disk (sdc1) that belongs to the RAID1 array, the directory
/sys/block/md0/md/dev-sdc1 is deleted. I haven't read the code that handles device unplug, so is this
the behavior you expect?
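
For anyone following the thread, the effect of the queued patch can be sketched in userspace C. This is only a simplified model under stated assumptions, not kernel code: the struct and function names (model_rdev, model_mddev, model_can_remove, model_clear_blocked_if_ro) are hypothetical, and plain bool fields stand in for the kernel's flag bits. The removal test mirrors the condition quoted from remove_and_add_spares(), and the clearing step mirrors the hunk added to md_check_recovery().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's struct md_rdev and struct mddev.
 * Field names follow the thread; the types themselves are hypothetical. */
struct model_rdev {
    int  raid_disk;   /* slot in the array, -1 once removed */
    bool blocked;     /* models the Blocked flag */
    bool faulty;      /* models the Faulty flag */
    bool in_sync;     /* models the In_sync flag */
    int  nr_pending;  /* in-flight I/O count */
};

struct model_mddev {
    bool ro;          /* array is read-only or read-auto */
    bool external;    /* metadata is managed outside the kernel */
    bool in_sync;     /* array-level in_sync state */
    struct model_rdev *rdevs;
    size_t nr_rdevs;
};

/* Models the eligibility test quoted from remove_and_add_spares():
 * a Blocked device is never eligible for removal. */
static bool model_can_remove(const struct model_rdev *rdev)
{
    assert(rdev != NULL);
    return rdev->raid_disk >= 0 &&
           !rdev->blocked &&
           (rdev->faulty || !rdev->in_sync) &&
           rdev->nr_pending == 0;
}

/* Models the hunk added to md_check_recovery(): on a read-only array
 * with kernel-managed metadata, Blocked is cleared on every member. */
static void model_clear_blocked_if_ro(struct model_mddev *mddev)
{
    assert(mddev != NULL);
    if (!mddev->ro || mddev->external || !mddev->in_sync)
        return;
    for (size_t i = 0; i < mddev->nr_rdevs; i++)
        mddev->rdevs[i].blocked = false;
}
```

In this model, a faulty member of a read-auto array fails model_can_remove() while blocked is set and passes it once model_clear_blocked_if_ro() has run. With external set to true the flag is left alone, matching the patch description that the external manager must clear it.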

Best Regards
Xiao

