From: BillStuff
Subject: Re: Raid5 hang in 3.14.19
Date: Mon, 29 Sep 2014 23:19:51 -0500
Message-ID: <542A2F67.7060706@sbcglobal.net>
In-Reply-To: <20140930075950.1d1e3865@notabene.brown>
To: NeilBrown
Cc: linux-raid

On 09/29/2014 04:59 PM, NeilBrown wrote:
> On Sun, 28 Sep 2014 23:28:17 -0500 BillStuff
> wrote:
>
>> On 09/28/2014 11:08 PM, NeilBrown wrote:
>>> On Sun, 28 Sep 2014 22:56:19 -0500 BillStuff
>>> wrote:
>>>
>>>> On 09/28/2014 09:25 PM, NeilBrown wrote:
>>>>> On Fri, 26 Sep 2014 17:33:58 -0500 BillStuff
>>>>> wrote:
>>>>>
>>>>>> Hi Neil,
>>>>>>
>>>>>> I found something that looks similar to the problem described in
>>>>>> "Re: seems like a deadlock in workqueue when md do a flush" from Sept 14th.
>>>>>>
>>>>>> It's on 3.14.19 with 7 recent patches for fixing raid1 recovery hangs.
>>>>>>
>>>>>> on this array:
>>>>>> md3 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
>>>>>> 104171200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
>>>>>> bitmap: 1/5 pages [4KB], 2048KB chunk
>>>>>>
>>>>>> I was running a test doing parallel kernel builds, read/write loops, and
>>>>>> disk add / remove / check loops,
>>>>>> on both this array and a raid1 array.
>>>>>>
>>>>>> I was trying to stress test your recent raid1 fixes, which went well,
>>>>>> but then after 5 days,
>>>>>> the raid5 array hung up with this in dmesg:
>>>>> I think this is different to the workqueue problem you mentioned, though as I
>>>>> don't know exactly what caused either I cannot be certain.
>>>>>
>>>>> From the data you provided it looks like everything is waiting on
>>>>> get_active_stripe(), or on a process that is waiting on that.
>>>>> That seems pretty common whenever anything goes wrong in raid5 :-(
>>>>>
>>>>> The md3_raid5 task is listed as blocked, but no stack trace is given.
>>>>> If the machine is still in that state, then
>>>>>
>>>>>    cat /proc/1698/stack
>>>>>
>>>>> might be useful.
>>>>> (echo t > /proc/sysrq-trigger is always a good idea)
>>>> Might this help? I believe the array was doing a "check" when things
>>>> hung up.
>>> It looks like it was trying to start doing a 'check'.
>>> The 'resync' thread hadn't been started yet.
>>> What is 'kthreadd' doing?
>>> My guess is that it is in try_to_free_pages() waiting for writeout
>>> for some xfs file page onto the md array ... which won't progress until
>>> the thread gets started.
>>>
>>> That would suggest that we need an async way to start threads...
>>>
>>> Thanks,
>>> NeilBrown
>>>
>> I suspect your guess is correct:
> Thanks for the confirmation.
>
> I'm thinking of something like that. Very basic testing suggests it
> doesn't instantly crash.
>
> If you were to apply this patch and run your test for a week or two, that
> would increase my confidence (though of course testing doesn't prove the
> absence of bugs....)
>
> Thanks,
> NeilBrown

Got it running. I'll let you know if anything interesting happens.
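The "async way to start threads" idea can be sketched in userspace. This is a minimal sketch assuming POSIX threads; struct work_queue, wq_queue(), start_sync_work(), do_sync() and struct array are hypothetical names, not md or kernel workqueue APIs. The "daemon" only records the request and returns; a long-lived worker thread does the pthread_create(). So if thread creation stalls (in the guess above, kthreadd waiting in try_to_free_pages() for writeout to this same array), only the worker waits, not the thread that writeout depends on.

```c
#include <assert.h>   /* for the usage checks */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot work queue, standing in for a kernel workqueue. */
struct work_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	void (*fn)(void *);	/* pending work, or NULL */
	void *arg;
	bool stop;
	pthread_t worker;
};

static void *worker_main(void *p)
{
	struct work_queue *wq = p;

	pthread_mutex_lock(&wq->lock);
	for (;;) {
		while (!wq->fn && !wq->stop)
			pthread_cond_wait(&wq->cond, &wq->lock);
		if (!wq->fn)
			break;		/* stop requested, nothing pending */
		void (*fn)(void *) = wq->fn;
		void *arg = wq->arg;
		wq->fn = NULL;
		pthread_mutex_unlock(&wq->lock);
		fn(arg);		/* may block; only this worker waits */
		pthread_mutex_lock(&wq->lock);
	}
	pthread_mutex_unlock(&wq->lock);
	return NULL;
}

static int wq_init(struct work_queue *wq)
{
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->cond, NULL);
	wq->fn = NULL;
	wq->arg = NULL;
	wq->stop = false;
	return pthread_create(&wq->worker, NULL, worker_main, wq);
}

/* Called from the "daemon": records the request and returns at once. */
static bool wq_queue(struct work_queue *wq, void (*fn)(void *), void *arg)
{
	bool queued = false;

	pthread_mutex_lock(&wq->lock);
	if (!wq->fn) {
		wq->fn = fn;
		wq->arg = arg;
		queued = true;
		pthread_cond_signal(&wq->cond);
	}
	pthread_mutex_unlock(&wq->lock);
	return queued;
}

/* Runs any pending work, then stops and joins the worker. */
static void wq_destroy(struct work_queue *wq)
{
	pthread_mutex_lock(&wq->lock);
	wq->stop = true;
	pthread_cond_signal(&wq->cond);
	pthread_mutex_unlock(&wq->lock);
	pthread_join(wq->worker, NULL);
	pthread_mutex_destroy(&wq->lock);
	pthread_cond_destroy(&wq->cond);
}

/* Hypothetical array state; do_sync() stands in for the resync thread body. */
struct array {
	pthread_t sync_thread;
	bool sync_started;
	int blocks_synced;
};

static void *do_sync(void *p)
{
	((struct array *)p)->blocks_synced = 128;	/* placeholder work */
	return NULL;
}

/* Runs on the worker: the only place a thread is created. */
static void start_sync_work(void *p)
{
	struct array *a = p;

	a->sync_started = pthread_create(&a->sync_thread, NULL, do_sync, a) == 0;
}
```

Typical use: wq_init(&wq) once, then wq_queue(&wq, start_sync_work, &a) from the daemon path, and wq_destroy(&wq) at teardown. A single slot is enough here because each array only ever has one start request in flight; a general-purpose queue would keep a list.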
Thanks,
Bill

>
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index a79e51d15c2b..580d4b97696c 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7770,6 +7770,33 @@ no_add:
>  	return spares;
>  }
>
> +static void md_start_sync(struct work_struct *ws)
> +{
> +	struct mddev *mddev = container_of(ws, struct mddev, del_work);
> +
> +	mddev->sync_thread = md_register_thread(md_do_sync,
> +						mddev,
> +						"resync");
> +	if (!mddev->sync_thread) {
> +		printk(KERN_ERR "%s: could not start resync"
> +		       " thread...\n",
> +		       mdname(mddev));
> +		/* leave the spares where they are, it shouldn't hurt */
> +		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> +		clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> +		clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
> +		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> +		clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> +		if (test_and_clear_bit(MD_RECOVERY_RECOVER,
> +				       &mddev->recovery))
> +			if (mddev->sysfs_action)
> +				sysfs_notify_dirent_safe(mddev->sysfs_action);
> +	} else
> +		md_wakeup_thread(mddev->sync_thread);
> +	sysfs_notify_dirent_safe(mddev->sysfs_action);
> +	md_new_event(mddev);
> +}
> +
>  /*
>   * This routine is regularly called by all per-raid-array threads to
>   * deal with generic issues like resync and super-block update.
> @@ -7823,6 +7850,7 @@ void md_check_recovery(struct mddev *mddev)
>
>  	if (mddev_trylock(mddev)) {
>  		int spares = 0;
> +		bool sync_starting = false;
>
>  		if (mddev->ro) {
>  			/* On a read-only array we can:
> @@ -7921,28 +7949,14 @@ void md_check_recovery(struct mddev *mddev)
>  				 */
>  				bitmap_write_all(mddev->bitmap);
>  			}
> -			mddev->sync_thread = md_register_thread(md_do_sync,
> -								mddev,
> -								"resync");
> -			if (!mddev->sync_thread) {
> -				printk(KERN_ERR "%s: could not start resync"
> -					" thread...\n",
> -					mdname(mddev));
> -				/* leave the spares where they are, it shouldn't hurt */
> -				clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
> -				clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
> -				clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
> -				clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
> -				clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
> -			} else
> -				md_wakeup_thread(mddev->sync_thread);
> -			sysfs_notify_dirent_safe(mddev->sysfs_action);
> -			md_new_event(mddev);
> +			INIT_WORK(&mddev->del_work, md_start_sync);
> +			queue_work(md_misc_wq, &mddev->del_work);
> +			sync_starting = true;
>  		}
>  	unlock:
>  		wake_up(&mddev->sb_wait);
>
> -		if (!mddev->sync_thread) {
> +		if (!mddev->sync_thread && !sync_starting) {
>  			clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
>  			if (test_and_clear_bit(MD_RECOVERY_RECOVER,
>  					       &mddev->recovery))
>
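The interplay between the new sync_starting flag and the unlock: path can be checked with a small deterministic model. This is a simplification that assumes only the bits shown matter. struct model_array, check_recovery() and start_sync() are hypothetical stand-ins, not md code, and the REC_* values are not the kernel's MD_RECOVERY_* numbers. The model shows that with the old test (!mddev->sync_thread alone), RUNNING is cleared while the work item is still queued. The patched test leaves RUNNING set until the md_start_sync() analogue either creates the thread or clears the bits on failure.

```c
#include <assert.h>   /* for the usage checks */
#include <stdbool.h>

/* Hypothetical bits, loosely named after MD_RECOVERY_*. */
enum {
	REC_RUNNING = 1u << 0,
	REC_SYNC    = 1u << 1,
	REC_CHECK   = 1u << 2,
};

struct model_array {
	unsigned recovery;	/* like mddev->recovery */
	bool have_sync_thread;	/* like mddev->sync_thread != NULL */
	bool work_queued;	/* like a pending del_work item */
};

/*
 * md_check_recovery() analogue: start a 'check' by queueing work.
 * use_flag selects the patched (true) or old (false) unlock test.
 */
static void check_recovery(struct model_array *a, bool use_flag)
{
	bool sync_starting = false;

	if (!(a->recovery & REC_RUNNING)) {
		a->recovery |= REC_RUNNING | REC_SYNC | REC_CHECK;
		a->work_queued = true;		/* queue_work() analogue */
		sync_starting = true;
	}
	/* "unlock:" path: only clear RUNNING if nothing is starting. */
	if (!a->have_sync_thread && !(use_flag && sync_starting))
		a->recovery &= ~REC_RUNNING;
}

/* md_start_sync() analogue, run later by the worker. */
static void start_sync(struct model_array *a, bool create_ok)
{
	a->work_queued = false;
	if (create_ok) {
		a->have_sync_thread = true;
		return;
	}
	/* Thread creation failed: drop the bits so the array goes idle. */
	a->recovery &= ~(REC_SYNC | REC_CHECK | REC_RUNNING);
}
```

Calling check_recovery(&a, false) reproduces the pre-patch unlock test: RUNNING is gone while work_queued is still true, so the array looks idle even though a resync thread is about to be created.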