From: Xiao Ni <xni@redhat.com>
To: Neil Brown <neilb@suse.de>
Cc: Joe Lawrence <joe.lawrence@stratus.com>,
linux-raid@vger.kernel.org,
Bill Kuzeja <william.kuzeja@stratus.com>
Subject: Re: RAID1 removing failed disk returns EBUSY
Date: Thu, 25 Jun 2015 05:42:54 -0400 (EDT)
Message-ID: <1225352330.22633499.1435225374686.JavaMail.zimbra@redhat.com>
In-Reply-To: <20150617125151.372bb103@home.neil.brown.name>
----- Original Message -----
> From: "Neil Brown" <neilb@suse.de>
> To: "XiaoNi" <xni@redhat.com>
> Cc: "Joe Lawrence" <joe.lawrence@stratus.com>, linux-raid@vger.kernel.org, "Bill Kuzeja" <william.kuzeja@stratus.com>
> Sent: Wednesday, June 17, 2015 10:51:51 AM
> Subject: Re: RAID1 removing failed disk returns EBUSY
>
> On Wed, 10 Jun 2015 14:26:41 +0800
> XiaoNi <xni@redhat.com> wrote:
>
> >
> >
> > On 02/03/2015 04:10 PM, Xiao Ni wrote:
> > >
> > > ----- Original Message -----
> > >> From: "NeilBrown" <neilb@suse.de>
> > >> To: "Xiao Ni" <xni@redhat.com>
> > >> Cc: "Joe Lawrence" <joe.lawrence@stratus.com>,
> > >> linux-raid@vger.kernel.org, "Bill Kuzeja" <william.kuzeja@stratus.com>
> > >> Sent: Monday, February 2, 2015 2:36:01 PM
> > >> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>
> > >> On Thu, 29 Jan 2015 07:14:16 -0500 (EST) Xiao Ni <xni@redhat.com> wrote:
> > >>
> > >>>
> > >>> ----- Original Message -----
> > >>>> From: "NeilBrown" <neilb@suse.de>
> > >>>> To: "Xiao Ni" <xni@redhat.com>
> > >>>> Cc: "Joe Lawrence" <joe.lawrence@stratus.com>,
> > >>>> linux-raid@vger.kernel.org, "Bill Kuzeja" <william.kuzeja@stratus.com>
> > >>>> Sent: Thursday, January 29, 2015 11:52:17 AM
> > >>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>>>
> > >>>> On Sun, 18 Jan 2015 21:33:50 -0500 (EST) Xiao Ni <xni@redhat.com>
> > >>>> wrote:
> > >>>>
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>>> From: "Joe Lawrence" <joe.lawrence@stratus.com>
> > >>>>>> To: "Xiao Ni" <xni@redhat.com>
> > >>>>>> Cc: "NeilBrown" <neilb@suse.de>, linux-raid@vger.kernel.org, "Bill
> > >>>>>> Kuzeja" <william.kuzeja@stratus.com>
> > >>>>>> Sent: Friday, January 16, 2015 11:10:31 PM
> > >>>>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>>>>>
> > >>>>>> On Fri, 16 Jan 2015 00:20:12 -0500
> > >>>>>> Xiao Ni <xni@redhat.com> wrote:
> > >>>>>>> Hi Joe
> > >>>>>>>
> > >>>>>>> Thanks for reminding me. I didn't do that. Now it can remove
> > >>>>>>> successfully after writing
> > >>>>>>> "idle" to sync_action.
> > >>>>>>>
> > >>>>>>> I wrongly thought that the patch referenced in this mail
> > >>>>>>> fixed the problem.
> > >>>>>> So it sounds like even with 3.18 and a new mdadm, this bug still
> > >>>>>> persists?
> > >>>>>>
> > >>>>>> -- Joe
> > >>>>>>
> > >>>>>> --
> > >>>>> Hi Joe
> > >>>>>
> > >>>>> I'm a little confused now. Does the patch
> > >>>>> 45eaf45dfa4850df16bc2e8e7903d89021137f40 from linux-stable
> > >>>>> resolve the problem?
> > >>>>>
> > >>>>> My environment is:
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# mdadm --version
> > >>>>> mdadm - v3.3.2-18-g93d3bd3 - 18th December 2014 (this is the newest
> > >>>>> upstream)
> > >>>>> [root@dhcp-12-133 mdadm]# uname -r
> > >>>>> 3.18.2
> > >>>>>
> > >>>>>
> > >>>>> My steps are:
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# lsblk
> > >>>>> sdb 8:16 0 931.5G 0 disk
> > >>>>> └─sdb1 8:17 0 5G 0 part
> > >>>>> sdc 8:32 0 186.3G 0 disk
> > >>>>> sdd 8:48 0 931.5G 0 disk
> > >>>>> └─sdd1 8:49 0 5G 0 part
> > >>>>> [root@dhcp-12-133 mdadm]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1
> > >>>>> /dev/sdd1
> > >>>>> --assume-clean
> > >>>>> mdadm: Note: this array has metadata at the start and
> > >>>>> may not be suitable as a boot device. If you plan to
> > >>>>> store '/boot' on this device please ensure that
> > >>>>> your boot-loader understands md/v1.x metadata, or use
> > >>>>> --metadata=0.90
> > >>>>> mdadm: Defaulting to version 1.2 metadata
> > >>>>> mdadm: array /dev/md0 started.
> > >>>>>
> > >>>>> Then I unplug the disk.
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# lsblk
> > >>>>> sdc 8:32 0 186.3G 0 disk
> > >>>>> sdd 8:48 0 931.5G 0 disk
> > >>>>> └─sdd1 8:49 0 5G 0 part
> > >>>>> └─md0 9:0 0 5G 0 raid1
> > >>>>> [root@dhcp-12-133 mdadm]# echo faulty >
> > >>>>> /sys/block/md0/md/dev-sdb1/state
> > >>>>> [root@dhcp-12-133 mdadm]# echo remove >
> > >>>>> /sys/block/md0/md/dev-sdb1/state
> > >>>>> -bash: echo: write error: Device or resource busy
> > >>>>> [root@dhcp-12-133 mdadm]# echo idle > /sys/block/md0/md/sync_action
> > >>>>> [root@dhcp-12-133 mdadm]# echo remove >
> > >>>>> /sys/block/md0/md/dev-sdb1/state
> > >>>>>
> > >>>> I cannot reproduce this - using linux 3.18.2. I'd be surprised if
> > >>>> mdadm
> > >>>> version affects things.
> > >>> Hi Neil
> > >>>
> > >>> I'm very curious, because I can reproduce it on my machine 100% of the time.
> > >>>
> > >>>> This error (Device or resource busy) implies that rdev->raid_disk is
> > >>>> >=
> > >>>> 0
> > >>>> (tested in state_store()).
> > >>>>
> > >>>> ->raid_disk is set to -1 by remove_and_add_spares() providing:
> > >>>> 1/ it isn't Blocked (which is very unlikely)
> > >>>> 2/ hot_remove_disk succeeds, which it will if nr_pending is zero,
> > >>>> and
> > >>>> 3/ nr_pending is zero.
> > >>> I remember I tried to check those reasons. But it really is
> > >>> reason 1, which is very unlikely.
> > >>>
> > >>> I add some code in the function array_state_show
> > >>>
> > >>> array_state_show(struct mddev *mddev, char *page) {
> > >>> enum array_state st = inactive;
> > >>> struct md_rdev *rdev;
> > >>>
> > >>> rdev_for_each_rcu(rdev, mddev) {
> > >>> printk(KERN_ALERT "search for %s\n",
> > >>> rdev->bdev->bd_disk->disk_name);
> > >>> if (test_bit(Blocked, &rdev->flags))
> > >>> printk(KERN_ALERT "rdev is Blocked\n");
> > >>> else
> > >>> printk(KERN_ALERT "rdev is not Blocked\n");
> > >>> }
> > >>>
> > >>> When I echo 1 > /sys/block/sdc/device/delete, then I ran command:
> > >>>
> > >>> [root@dhcp-12-133 md]# cat /sys/block/md0/md/array_state
> > >>> read-auto
> > >> ^^^^^^^^^
> > >>
> > >> I think that is half the explanation.
> > >> You must have the md_mod.start_ro parameter set to '1'.
> > >>
> > >>
> > >>> [root@dhcp-12-133 md]# dmesg
> > >>> [ 2679.559185] search for sdc
> > >>> [ 2679.559189] rdev is Blocked
> > >>> [ 2679.559190] search for sdb
> > >>> [ 2679.559190] rdev is not Blocked
> > >>>
> > >>> So sdc is Blocked
> > >> and that is the other half - thanks.
> > >> (yes, I was wrong. Sometimes it is easier than being right, but still
> > >> yields results).
> > >>
> > >> When a device fails, it is Blocked until the metadata is updated to
> > >> record
> > >> the failure. This ensures that no writes succeed without writing to
> > >> that
> > >> device, until we are certain that no read will try reading from that
> > >> device,
> > >> even after a crash/restart.
> > >>
> > >> Blocked is cleared after the metadata is written, but read-auto (and
> > >> read-only) devices never write out their metadata. So blocked doesn't
> > >> get
> > >> cleared.
> > >>
> > >> When you "echo idle > .../sync_action" one of the side effects is to
> > >> switch from 'read-auto' to fully active. This allows the metadata to be
> > >> written,
> > >> Blocked to be cleared, and the device to be removed.
> > >>
> > >> If you
> > >> echo none > /sys/block/md0/md/dev-sdc/slot
> > >>
> > >> first, then the remove will work.
> > >>
> > >> We could possibly fix it with something like the following, but I'm not
> > >> sure
> > >> I like it. There is no guarantee that I can see which would ensure the
> > >> superblock got updated before the first write if the array switched to
> > >> read/write.
> > >>
> > >> NeilBrown
> > >>
> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> > >> index 9233c71138f1..b3d1e8e5e067 100644
> > >> --- a/drivers/md/md.c
> > >> +++ b/drivers/md/md.c
> > >> @@ -7528,7 +7528,7 @@ static int remove_and_add_spares(struct mddev
> > >> *mddev,
> > >> rdev_for_each(rdev, mddev)
> > >> if ((this == NULL || rdev == this) &&
> > >> rdev->raid_disk >= 0 &&
> > >> - !test_bit(Blocked, &rdev->flags) &&
> > >> + (!test_bit(Blocked, &rdev->flags) || mddev->ro) &&
> > >> (test_bit(Faulty, &rdev->flags) ||
> > >> ! test_bit(In_sync, &rdev->flags)) &&
> > >> atomic_read(&rdev->nr_pending)==0) {
> > >>
> > >>
> > >>
> > > Hi Neil
> > >
> > > I have tried the patch and it fixes the problem. But I'm sorry that
> > > I can't offer any advice on a better approach. I'm not familiar with
> > > the metadata part of md. I'll try to find more time to read the md code.
> > >
> > Hi Neil
> >
> > I don't see the patch in linux-stable. Did you miss this?
>
> I don't believe this bug is sufficiently serious for the patch to go to
> -stable. However it does need to be fixed - thanks for the reminder.
>
> I've just queued the following patch which I am happy with. If you
> could confirm that it works for you, I would appreciate that.
>
> Thanks,
> NeilBrown
>
>
> From: Neil Brown <neilb@suse.de>
> Date: Wed, 17 Jun 2015 12:31:46 +1000
> Subject: [PATCH] md: clear Blocked flag on failed devices when array is
> read-only.
>
> The Blocked flag indicates that a device has failed but that this
> fact hasn't been recorded in the metadata yet. Writes to such
> devices cannot be allowed until the metadata has been updated.
>
> On a read-only array, the Blocked flag will never be cleared.
> This prevents the device being removed from the array.
>
> If the metadata is being handled by the kernel
> (i.e. !mddev->external), then we can be sure that if the array is
> switched to writable, then a metadata update will happen and will
> record the failure. So we don't need the flag set.
>
> If metadata is externally managed, it is up to the external manager
> to clear the 'blocked' flag.
>
> Reported-by: XiaoNi <xni@redhat.com>
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3d339e2..5a6681a 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8125,6 +8125,15 @@ void md_check_recovery(struct mddev *mddev)
> int spares = 0;
>
> if (mddev->ro) {
> + struct md_rdev *rdev;
> + if (!mddev->external && mddev->in_sync)
> + /* 'Blocked' flag not needed as failed devices
> + * will be recorded if array switched to read/write.
> + * Leaving it set will prevent the device
> + * from being removed.
> + */
> + rdev_for_each(rdev, mddev)
> + clear_bit(Blocked, &rdev->flags);
> /* On a read-only array we can:
> * - remove failed devices
> * - add already-in_sync devices if the array itself
>
>
Hi Neil
Sorry for the late response.
I have tried the patch. When I unplug the disk (sdc1) that belongs to the raid1 array, the directory
/sys/block/md0/md/dev-sdc1 is deleted. I haven't read the code that handles device unplug. Is that the
behavior you expected?
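
The check I describe above can be sketched roughly as follows. This is only a minimal sketch, not part of the patch: the helper name member_state and the SYSFS_ROOT override are hypothetical, introduced for illustration so the check can also be tried against a scratch directory. The sysfs layout (/sys/block/<array>/md/dev-<member>/state) is the one used earlier in this thread.

```shell
#!/bin/sh
# Sketch: report whether a member device is still attached to an md array.
# SYSFS_ROOT defaults to /sys; the override is an assumption made for
# illustration and testing, not something md or mdadm provides.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# member_state ARRAY MEMBER   (hypothetical helper)
# Prints "removed" if the dev-MEMBER directory no longer exists,
# otherwise prints the contents of its state file.
member_state() {
    dir="$SYSFS_ROOT/block/$1/md/dev-$2"
    if [ ! -d "$dir" ]; then
        echo "removed"
    else
        cat "$dir/state"
    fi
}
```

On my setup I would call it as "member_state md0 sdc1" after unplugging the disk. If the directory is still present, the state file shows whether the device is still marked blocked.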
Best Regards
Xiao