From: Xiao Ni
Subject: Re: RAID1 removing failed disk returns EBUSY
Date: Thu, 25 Jun 2015 05:42:54 -0400 (EDT)
Message-ID: <1225352330.22633499.1435225374686.JavaMail.zimbra@redhat.com>
References: <20141027162748.593451be@jlaw-desktop.mno.stratus.com>
 <1924199853.11308787.1421634830810.JavaMail.zimbra@redhat.com>
 <20150129145217.1cb31d5c@notabene.brown>
 <371504811.2053160.1422533656432.JavaMail.zimbra@redhat.com>
 <20150202173601.1ab02927@notabene.brown>
 <1914953233.3814567.1422951056539.JavaMail.zimbra@redhat.com>
 <5577D8A1.9060605@redhat.com>
 <20150617125151.372bb103@home.neil.brown.name>
In-Reply-To: <20150617125151.372bb103@home.neil.brown.name>
To: Neil Brown
Cc: Joe Lawrence, linux-raid@vger.kernel.org, Bill Kuzeja
List-Id: linux-raid.ids

----- Original Message -----
> From: "Neil Brown"
> To: "XiaoNi"
> Cc: "Joe Lawrence", linux-raid@vger.kernel.org, "Bill Kuzeja"
> Sent: Wednesday, June 17, 2015 10:51:51 AM
> Subject: Re: RAID1 removing failed disk returns EBUSY
>
> On Wed, 10 Jun 2015 14:26:41 +0800
> XiaoNi wrote:
>
> >
> >
> > On 02/03/2015 04:10 PM, Xiao Ni wrote:
> > >
> > > ----- Original Message -----
> > >> From: "NeilBrown"
> > >> To: "Xiao Ni"
> > >> Cc: "Joe Lawrence",
> > >> linux-raid@vger.kernel.org, "Bill Kuzeja"
> > >> Sent: Monday, February 2, 2015 2:36:01 PM
> > >> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>
> > >> On Thu, 29 Jan 2015 07:14:16 -0500 (EST) Xiao Ni wrote:
> > >>
> > >>>
> > >>> ----- Original Message -----
> > >>>> From: "NeilBrown"
> > >>>> To: "Xiao Ni"
> > >>>> Cc: "Joe Lawrence",
> > >>>> linux-raid@vger.kernel.org, "Bill Kuzeja"
> > >>>> Sent: Thursday, January 29, 2015 11:52:17 AM
> > >>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>>>
> > >>>> On Sun, 18 Jan 2015 21:33:50
-0500 (EST) Xiao Ni
> > >>>> wrote:
> > >>>>
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>>> From: "Joe Lawrence"
> > >>>>>> To: "Xiao Ni"
> > >>>>>> Cc: "NeilBrown", linux-raid@vger.kernel.org, "Bill Kuzeja"
> > >>>>>> Sent: Friday, January 16, 2015 11:10:31 PM
> > >>>>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>>>>>
> > >>>>>> On Fri, 16 Jan 2015 00:20:12 -0500
> > >>>>>> Xiao Ni wrote:
> > >>>>>>> Hi Joe
> > >>>>>>>
> > >>>>>>> Thanks for reminding me. I didn't do that. Now the remove
> > >>>>>>> succeeds after writing "idle" to sync_action.
> > >>>>>>>
> > >>>>>>> I wrongly thought that the patch referenced in this mail
> > >>>>>>> was the fix for the problem.
> > >>>>>> So it sounds like even with 3.18 and a new mdadm, this bug still
> > >>>>>> persists?
> > >>>>>>
> > >>>>>> -- Joe
> > >>>>>>
> > >>>>>> --
> > >>>>> Hi Joe
> > >>>>>
> > >>>>> I'm a little confused now. Does the patch
> > >>>>> 45eaf45dfa4850df16bc2e8e7903d89021137f40 from linux-stable
> > >>>>> resolve the problem?
> > >>>>>
> > >>>>> My environment is:
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# mdadm --version
> > >>>>> mdadm - v3.3.2-18-g93d3bd3 - 18th December 2014 (this is the newest
> > >>>>> upstream)
> > >>>>> [root@dhcp-12-133 mdadm]# uname -r
> > >>>>> 3.18.2
> > >>>>>
> > >>>>>
> > >>>>> My steps are:
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# lsblk
> > >>>>> sdb        8:16   0 931.5G  0 disk
> > >>>>> └─sdb1     8:17   0     5G  0 part
> > >>>>> sdc        8:32   0 186.3G  0 disk
> > >>>>> sdd        8:48   0 931.5G  0 disk
> > >>>>> └─sdd1     8:49   0     5G  0 part
> > >>>>> [root@dhcp-12-133 mdadm]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdd1 --assume-clean
> > >>>>> mdadm: Note: this array has metadata at the start and
> > >>>>> may not be suitable as a boot device.
If you plan to
> > >>>>> store '/boot' on this device please ensure that
> > >>>>> your boot-loader understands md/v1.x metadata, or use
> > >>>>> --metadata=0.90
> > >>>>> mdadm: Defaulting to version 1.2 metadata
> > >>>>> mdadm: array /dev/md0 started.
> > >>>>>
> > >>>>> Then I unplug the disk.
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# lsblk
> > >>>>> sdc        8:32   0 186.3G  0 disk
> > >>>>> sdd        8:48   0 931.5G  0 disk
> > >>>>> └─sdd1     8:49   0     5G  0 part
> > >>>>>   └─md0    9:0    0     5G  0 raid1
> > >>>>> [root@dhcp-12-133 mdadm]# echo faulty > /sys/block/md0/md/dev-sdb1/state
> > >>>>> [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> > >>>>> -bash: echo: write error: Device or resource busy
> > >>>>> [root@dhcp-12-133 mdadm]# echo idle > /sys/block/md0/md/sync_action
> > >>>>> [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> > >>>>>
> > >>>> I cannot reproduce this - using linux 3.18.2. I'd be surprised if
> > >>>> the mdadm version affects things.
> > >>> Hi Neil
> > >>>
> > >>> I'm very curious, because it reproduces on my machine 100% of the
> > >>> time.
> > >>>
> > >>>> This error (Device or resource busy) implies that rdev->raid_disk
> > >>>> is >= 0 (tested in state_store()).
> > >>>>
> > >>>> ->raid_disk is set to -1 by remove_and_add_spares() providing:
> > >>>> 1/ it isn't Blocked (which is very unlikely)
> > >>>> 2/ hot_remove_disk succeeds, which it will if nr_pending is zero,
> > >>>> and
> > >>>> 3/ nr_pending is zero.
> > >>> I remember that I tried to check those reasons. It really is
> > >>> reason 1, the one that is very unlikely.
> > >>>
> > >>> I added some code to the function array_state_show:
> > >>>
> > >>> array_state_show(struct mddev *mddev, char *page) {
> > >>>         enum array_state st = inactive;
> > >>>         struct md_rdev *rdev;
> > >>>
> > >>>         rdev_for_each_rcu(rdev, mddev) {
> > >>>                 printk(KERN_ALERT "search for %s\n",
> > >>>                        rdev->bdev->bd_disk->disk_name);
> > >>>                 if (test_bit(Blocked, &rdev->flags))
> > >>>                         printk(KERN_ALERT "rdev is Blocked\n");
> > >>>                 else
> > >>>                         printk(KERN_ALERT "rdev is not Blocked\n");
> > >>>         }
> > >>>
> > >>> After I ran echo 1 > /sys/block/sdc/device/delete, I ran this command:
> > >>>
> > >>> [root@dhcp-12-133 md]# cat /sys/block/md0/md/array_state
> > >>> read-auto
> > >>  ^^^^^^^^^
> > >>
> > >> I think that is half the explanation.
> > >> You must have the md_mod.start_ro parameter set to '1'.
> > >>
> > >>
> > >>> [root@dhcp-12-133 md]# dmesg
> > >>> [ 2679.559185] search for sdc
> > >>> [ 2679.559189] rdev is Blocked
> > >>> [ 2679.559190] search for sdb
> > >>> [ 2679.559190] rdev is not Blocked
> > >>>
> > >>> So sdc is Blocked
> > >> and that is the other half - thanks.
> > >> (yes, I was wrong. Sometimes it is easier than being right, but still
> > >> yields results).
> > >>
> > >> When a device fails, it is Blocked until the metadata is updated to
> > >> record the failure. This ensures that no writes succeed without
> > >> writing to that device, until we are certain that no read will try
> > >> reading from that device, even after a crash/restart.
> > >>
> > >> Blocked is cleared after the metadata is written, but read-auto (and
> > >> read-only) devices never write out their metadata. So Blocked doesn't
> > >> get cleared.
> > >>
> > >> When you "echo idle > .../sync_action" one of the side effects is to
> > >> switch from 'read-auto' to fully active. This allows the metadata to
> > >> be written, Blocked to be cleared, and the device to be removed.
> > >>
> > >> If you
> > >>   echo none > /sys/block/md0/md/dev-sdc/slot
> > >>
> > >> first, then the remove will work.
> > >>
> > >> We could possibly fix it with something like the following, but I'm
> > >> not sure I like it. There is no guarantee that I can see which would
> > >> ensure the superblock got updated before the first write if the array
> > >> switches to read/write.
> > >>
> > >> NeilBrown
> > >>
> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> > >> index 9233c71138f1..b3d1e8e5e067 100644
> > >> --- a/drivers/md/md.c
> > >> +++ b/drivers/md/md.c
> > >> @@ -7528,7 +7528,7 @@ static int remove_and_add_spares(struct mddev *mddev,
> > >>          rdev_for_each(rdev, mddev)
> > >>                  if ((this == NULL || rdev == this) &&
> > >>                      rdev->raid_disk >= 0 &&
> > >> -                    !test_bit(Blocked, &rdev->flags) &&
> > >> +                    (!test_bit(Blocked, &rdev->flags) || mddev->ro) &&
> > >>                      (test_bit(Faulty, &rdev->flags) ||
> > >>                       ! test_bit(In_sync, &rdev->flags)) &&
> > >>                      atomic_read(&rdev->nr_pending)==0) {
> > >>
> > >>
> > >>
> > > Hi Neil
> > >
> > > I have tried the patch and it fixes the problem. I'm sorry that
> > > I can't offer advice on a better approach. I'm not familiar with
> > > the metadata part of md. I'll try to find more time to read the
> > > md code.
> > >
> > Hi Neil
> >
> > I don't see the patch in linux-stable. Did you miss this?
>
> I don't believe this bug is sufficiently serious for the patch to go to
> -stable. However it does need to be fixed - thanks for the reminder.
>
> I've just queued the following patch which I am happy with. If you
> could confirm that it works for you, I would appreciate that.
>
> Thanks,
> NeilBrown
>
>
> From: Neil Brown
> Date: Wed, 17 Jun 2015 12:31:46 +1000
> Subject: [PATCH] md: clear Blocked flag on failed devices when array is
> read-only.
>
> The Blocked flag indicates that a device has failed but that this
> fact hasn't been recorded in the metadata yet. Writes to such
> devices cannot be allowed until the metadata has been updated.
>
> On a read-only array, the Blocked flag will never be cleared.
> This prevents the device being removed from the array.
>
> If the metadata is being handled by the kernel
> (i.e. !mddev->external), then we can be sure that if the array is
> switched to writable, then a metadata update will happen and will
> record the failure. So we don't need the flag set.
>
> If metadata is externally managed, it is up to the external manager
> to clear the 'blocked' flag.
>
> Reported-by: XiaoNi
> Signed-off-by: NeilBrown
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3d339e2..5a6681a 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8125,6 +8125,15 @@ void md_check_recovery(struct mddev *mddev)
>                  int spares = 0;
>
>                  if (mddev->ro) {
> +                        struct md_rdev *rdev;
> +                        if (!mddev->external && mddev->in_sync)
> +                                /* 'Blocked' flag not needed as failed devices
> +                                 * will be recorded if array switched to read/write.
> +                                 * Leaving it set will prevent the device
> +                                 * from being removed.
> +                                 */
> +                                rdev_for_each(rdev, mddev)
> +                                        clear_bit(Blocked, &rdev->flags);
>                          /* On a read-only array we can:
>                           * - remove failed devices
>                           * - add already-in_sync devices if the array itself
>
>

Hi Neil

Sorry for the late response.

I have tried the patch. When I unplug the disk (sdc1) that belongs to the
raid1, the directory /sys/block/md0/md/dev-sdc1 is deleted. I haven't read
the code that handles unplugging a device. Is this the behaviour you wanted?

Best Regards
Xiao
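
For anyone following the thread, the interaction between the Blocked flag,
a read-only array and the EBUSY on remove can be sketched as a small
user-space C model. This is only an illustration under stated assumptions:
the types and functions (model_rdev, model_mddev, model_clear_blocked,
model_detach, model_state_store_remove) are hypothetical names made up for
the sketch, not the kernel's definitions, and the detach attempt is folded
into the remove call for brevity.

```c
/*
 * User-space model only. Types and function names are hypothetical
 * stand-ins for this sketch, not the kernel's own definitions.
 */
#include <errno.h>
#include <stdbool.h>

struct model_rdev {
    int  raid_disk;   /* slot in the array; -1 once detached */
    bool blocked;     /* failure not yet recorded in the metadata */
    bool faulty;
    bool in_sync;
    int  nr_pending;  /* requests still in flight */
};

struct model_mddev {
    bool ro;          /* array is read-only or read-auto */
    bool external;    /* metadata handled outside the kernel */
    bool in_sync;
};

/* Models the new hunk in md_check_recovery(): on a read-only array with
 * kernel-managed metadata, drop Blocked on every member device. */
void model_clear_blocked(const struct model_mddev *mddev,
                         struct model_rdev *rdevs, int count)
{
    if (mddev->ro && !mddev->external && mddev->in_sync)
        for (int i = 0; i < count; i++)
            rdevs[i].blocked = false;
}

/* Models the test in remove_and_add_spares(): detach a failed device
 * only if it is not Blocked and has nothing pending. */
void model_detach(struct model_rdev *rdev)
{
    if (rdev->raid_disk >= 0 && !rdev->blocked &&
        (rdev->faulty || !rdev->in_sync) && rdev->nr_pending == 0)
        rdev->raid_disk = -1;
}

/* Models "echo remove > .../state": busy while the slot is still held. */
int model_state_store_remove(struct model_rdev *rdev)
{
    model_detach(rdev);
    return rdev->raid_disk >= 0 ? -EBUSY : 0;
}
```

In this model a failed device that is still Blocked makes
model_state_store_remove() return -EBUSY. Once model_clear_blocked() has
run on a read-only array whose metadata is kernel-managed, the same call
returns 0. With external metadata the flag is left alone, matching the
commit message.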