From: NeilBrown
Subject: Re: [PATCH] md: create new workqueue for object destruction
Date: Wed, 01 Nov 2017 14:57:14 +1100
To: Artur Paszkiewicz, Shaohua Li
Cc: Linux Raid

On Mon, Oct 30 2017, Artur Paszkiewicz wrote:

> On 10/29/2017 11:18 PM, NeilBrown wrote:
>> On Fri, Oct 27 2017, Artur Paszkiewicz wrote:
>>
>>> On 10/23/2017 01:31 AM, NeilBrown wrote:
>>>> On Fri, Oct 20 2017, Artur Paszkiewicz wrote:
>>>>
>>>>> On 10/20/2017 12:28 AM, NeilBrown wrote:
>>>>>> On Thu, Oct 19 2017, Artur Paszkiewicz wrote:
>>>>>>
>>>>>>> On 10/19/2017 12:36 AM, NeilBrown wrote:
>>>>>>>> On Wed, Oct 18 2017, Artur Paszkiewicz wrote:
>>>>>>>>
>>>>>>>>> On 10/18/2017 09:29 AM, NeilBrown wrote:
>>>>>>>>>> On Tue, Oct 17 2017, Shaohua Li wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 17, 2017 at 04:04:52PM +1100, Neil Brown wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> lockdep currently complains about a potential deadlock
>>>>>>>>>>>> with sysfs access taking reconfig_mutex and then
>>>>>>>>>>>> waiting for a work queue to complete.
>>>>>>>>>>>>
>>>>>>>>>>>> The cause is inappropriate overloading of work-items
>>>>>>>>>>>> on work-queues.
>>>>>>>>>>>>
>>>>>>>>>>>> We currently have two work-queues: md_wq and md_misc_wq.
>>>>>>>>>>>> They service 5 different tasks:
>>>>>>>>>>>>
>>>>>>>>>>>>   mddev->flush_work                        md_wq
>>>>>>>>>>>>   mddev->event_work (for dm-raid)          md_misc_wq
>>>>>>>>>>>>   mddev->del_work (mddev_delayed_delete)   md_misc_wq
>>>>>>>>>>>>   mddev->del_work (md_start_sync)          md_misc_wq
>>>>>>>>>>>>   rdev->del_work                           md_misc_wq
>>>>>>>>>>>>
>>>>>>>>>>>> We need to call flush_workqueue() for md_start_sync and ->event_work
>>>>>>>>>>>> while holding reconfig_mutex, but mustn't hold it when
>>>>>>>>>>>> flushing mddev_delayed_delete or rdev->del_work.
>>>>>>>>>>>>
>>>>>>>>>>>> md_wq is a bit special as it has WQ_MEM_RECLAIM so it is
>>>>>>>>>>>> best to leave that alone.
>>>>>>>>>>>>
>>>>>>>>>>>> So create a new workqueue, md_del_wq, and a new work_struct,
>>>>>>>>>>>> mddev->sync_work, so we can keep two classes of work separate.
>>>>>>>>>>>>
>>>>>>>>>>>> md_del_wq and ->del_work are used only for destroying rdev
>>>>>>>>>>>> and mddev.
>>>>>>>>>>>> md_misc_wq is used for event_work and sync_work.
>>>>>>>>>>>>
>>>>>>>>>>>> Also document the purpose of each flush_workqueue() call.
>>>>>>>>>>>>
>>>>>>>>>>>> This removes the lockdep warning.
>>>>>>>>>>>
>>>>>>>>>>> I had exactly the same patch queued internally,
>>>>>>>>>>
>>>>>>>>>> Cool :-)
>>>>>>>>>>
>>>>>>>>>>> but the mdadm test suite still
>>>>>>>>>>> shows a lockdep warning. I haven't had time to check further.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The only other lockdep warning I've seen since was some ext4 thing, though
>>>>>>>>>> I haven't tried the full test suite.  I might have a look tomorrow.
>>>>>>>>>
>>>>>>>>> I'm also seeing a lockdep warning with or without this patch,
>>>>>>>>> reproducible with:
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Looks like using one workqueue for mddev->del_work and rdev->del_work
>>>>>>>> causes problems.
>>>>>>>> Can you try with this addition please?
>>>>>>>
>>>>>>> It helped for that case but now there is another warning, triggered by:
>>>>>>>
>>>>>>>   export IMSM_NO_PLATFORM=1   # for platforms without IMSM
>>>>>>>   mdadm -C /dev/md/imsm0 -eimsm -n4 /dev/sd[a-d] -R
>>>>>>>   mdadm -C /dev/md/vol0 -l5 -n4 /dev/sd[a-d] -R --assume-clean
>>>>>>>   mdadm -If sda
>>>>>>>   mdadm -a /dev/md127 /dev/sda
>>>>>>>   mdadm -Ss
>>>>>>
>>>>>> I tried that ... and mdmon gets a SIGSEGV.
>>>>>> imsm_set_disk() calls get_imsm_disk() and gets a NULL back.
>>>>>> It then passes the NULL to mark_failure(), which dereferences it.
>>>>>
>>>>> Interesting... I can't reproduce this. Can you show the output from
>>>>> mdadm -E for all disks after mdmon crashes? And maybe a debug log from
>>>>> mdmon?
>>>>
>>>> The crash happens when I run "mdadm -If sda".
>>>> gdb tells me:
>>>>
>>>> Thread 2 "mdmon" received signal SIGSEGV, Segmentation fault.
>>>> [Switching to Thread 0x7f5526c24700 (LWP 4757)]
>>>> 0x000000000041601c in is_failed (disk=0x0) at super-intel.c:1324
>>>> 1324         return (disk->status & FAILED_DISK) == FAILED_DISK;
>>>> (gdb) where
>>>> #0  0x000000000041601c in is_failed (disk=0x0) at super-intel.c:1324
>>>> #1  0x00000000004255a2 in mark_failure (super=0x65fa30, dev=0x660ba0,
>>>>        disk=0x0, idx=0) at super-intel.c:7973
>>>> #2  0x00000000004260e8 in imsm_set_disk (a=0x6635d0, n=0, state=17)
>>>>        at super-intel.c:8357
>>>> #3  0x0000000000405069 in read_and_act (a=0x6635d0, fds=0x7f5526c23e10)
>>>>        at monitor.c:551
>>>> #4  0x0000000000405c8e in wait_and_act (container=0x65f010, nowait=0)
>>>>        at monitor.c:875
>>>> #5  0x0000000000405dc7 in do_monitor (container=0x65f010) at monitor.c:906
>>>> #6  0x0000000000403037 in run_child (v=0x65f010) at mdmon.c:85
>>>> #7  0x00007f5526fcb494 in start_thread (arg=0x7f5526c24700)
>>>>        at pthread_create.c:333
>>>> #8  0x00007f5526d0daff in clone ()
>>>>        at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>>>>
>>>> The super->disks list that get_imsm_dl_disk() looks through contains
>>>> sdc, sdd, and sde, but not sda - so get_imsm_disk() returns NULL.
>>>> (The 4 devices I use are sda, sdc, sdd and sde.)
>>>> mdadm --examine output for sda and sdc after the crash is below.
>>>> mdmon debug output is below that.
>>>
>>> Thank you for the information. The metadata output shows that there is
>>> something wrong with sda. Is there anything different about this device?
>>> The other disks are 10M QEMU SCSI drives, is sda the same? Can you
>>> check its serial, e.g. with sg_inq?
>>
>> sdc, sdd, and sde are specified to qemu with
>>
>>     -hdb /var/tmp/mdtest10 \
>>     -hdc /var/tmp/mdtest11 \
>>     -hdd /var/tmp/mdtest12 \
>>
>> sda comes from
>>
>>     -drive file=/var/tmp/mdtest13,if=scsi,index=3,media=disk -s
>>
>> /var/tmp/mdtest* are simple raw images, 10M each.
>>
>> sg_inq reports sd[cde] as
>>   Vendor:  ATA
>>   Product: QEMU HARDDISK
>>   Serial:  QM0000[234]
>>
>> sda is
>>   Vendor:  QEMU
>>   Product: QEMU HARDDISK
>> with no serial number.
>>
>>
>> If I change my script to use
>>
>>     -drive file=/var/tmp/mdtest13,if=scsi,index=3,serial=QM00009,media=disk -s
>>
>> for sda, mdmon doesn't crash.  It may well be reasonable to refuse to
>> work with a device that has no serial number.  It is not very friendly
>> to crash :-(
>
> OK, this explains a lot. Can you try the same with this patch? It looks
> like there was insufficient error checking when retrieving the SCSI
> serial. Mdadm should now abort when creating the container.
> IMSM_DEVNAME_AS_SERIAL can be used to create an array with disks that
> don't have a serial number.
>
> Thanks,
> Artur
>
> diff --git a/sg_io.c b/sg_io.c
> index 42c91e1e..7889a95e 100644
> --- a/sg_io.c
> +++ b/sg_io.c
> @@ -46,6 +46,9 @@ int scsi_get_serial(int fd, void *buf, size_t buf_len)
>  	if (rv)
>  		return rv;
>  
> +	if ((io_hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK)
> +		return -1;
> +
>  	rsp_len = rsp_buf[3];
>  
>  	if (!rsp_len || buf_len < rsp_len)

Thanks.  That does seem to make a useful difference.  It doesn't crash
now.  I need IMSM_DEVNAME_AS_SERIAL=1 to create the array, but then it
runs smoothly.

Thanks,
NeilBrown
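P.S. For anyone following along, here is a minimal, self-contained sketch of
the kind of SG_IO unit-serial INQUIRY that scsi_get_serial() performs, showing
where the check added by the patch above sits.  This is not mdadm's actual
sg_io.c - the function name, buffer sizes and error handling are illustrative
only - but it compiles on its own:

/* Sketch only: same shape as scsi_get_serial(), not the mdadm code. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <scsi/sg.h>

static int sketch_get_serial(int fd, char *buf, size_t buf_len)
{
	unsigned char rsp_buf[255] = { 0 };
	unsigned char sense[32] = { 0 };
	/* INQUIRY, EVPD=1, VPD page 0x80 = unit serial number */
	unsigned char cdb[6] = { 0x12, 0x01, 0x80, 0, sizeof(rsp_buf), 0 };
	struct sg_io_hdr io_hdr;
	unsigned int rsp_len;

	memset(&io_hdr, 0, sizeof(io_hdr));
	io_hdr.interface_id = 'S';
	io_hdr.cmdp = cdb;
	io_hdr.cmd_len = sizeof(cdb);
	io_hdr.dxferp = rsp_buf;
	io_hdr.dxfer_len = sizeof(rsp_buf);
	io_hdr.dxfer_direction = SG_DXFER_FROM_DEV;
	io_hdr.sbp = sense;
	io_hdr.mx_sb_len = sizeof(sense);
	io_hdr.timeout = 5000;			/* milliseconds */

	if (ioctl(fd, SG_IO, &io_hdr) < 0)
		return -1;

	/* The point of the patch: the ioctl can succeed while the SCSI
	 * command itself fails (e.g. the device has no unit serial
	 * number page).  Without this check rsp_buf is never filled in
	 * and the caller would treat garbage as the serial. */
	if ((io_hdr.info & SG_INFO_OK_MASK) != SG_INFO_OK)
		return -1;

	rsp_len = rsp_buf[3];			/* VPD page length */
	if (!rsp_len || buf_len < rsp_len)
		return -1;

	memcpy(buf, &rsp_buf[4], rsp_len);	/* serial starts at byte 4 */
	return 0;
}

int main(int argc, char **argv)
{
	char serial[256] = "";
	int fd = open(argc > 1 ? argv[1] : "/dev/sda", O_RDONLY);

	if (fd < 0 || sketch_get_serial(fd, serial, sizeof(serial) - 1) != 0) {
		fprintf(stderr, "no unit serial number\n");
		return 1;
	}
	printf("serial: %s\n", serial);
	close(fd);
	return 0;
}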