From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>
Cc: Guoqing Jiang <gqjiang@suse.com>,
Dennis Yang <shinrairis@gmail.com>,
linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Calling block ops when !TASK_RUNNING warning in raid1
Date: Mon, 04 Dec 2017 08:21:04 +1100 [thread overview]
Message-ID: <87vahntsn3.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20171201201009.4lqx3ktwyxgxyaos@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 9638 bytes --]
On Fri, Dec 01 2017, Shaohua Li wrote:
> On Thu, Nov 30, 2017 at 03:26:06PM +1100, Neil Brown wrote:
>> On Thu, Nov 30 2017, Guoqing Jiang wrote:
>>
>> > On 11/30/2017 08:20 AM, NeilBrown wrote:
>> >> On Tue, Nov 28 2017, Shaohua Li wrote:
>> >>
>> >>> On Tue, Nov 28, 2017 at 04:51:25PM +0800, Dennis Yang wrote:
>> >>>> Hi,
>> >>>>
>> >>>> We recently see the following kernel dump on raid1 with some kernel
>> >>>> debug option on.
>> >>>>
>> >>>> <4>[ 40.501369] ------------[ cut here ]------------
>> >>>> <4>[ 40.501375] WARNING: CPU: 7 PID: 7477 at
>> >>>> kernel/sched/core.c:7404 __might_sleep+0xa2/0xb0()
>> >>>> <4>[ 40.501378] do not call blocking ops when !TASK_RUNNING; state=2
>> >>>> set at [<ffffffff810c28d8>] prepare_to_wait_event+0x58/0x100
>> >>>> <4>[ 40.501379] Modules linked in: dm_c2f(O) pl2303 usbserial
>> >>>> qm2_i2c(O) intel_ips drbd(O) flashcache(O) dm_tier_hro_algo
>> >>>> dm_thin_pool dm_bio_prison dm_persistent_data hal_netlink(O) k10temp
>> >>>> coretemp mlx4_en(O) mlx4_core(O) mlx_compat(O) igb e1000e(O)
>> >>>> mpt3sas(O) mpt2sas scsi_transport_sas raid_class usb_storage xhci_pci
>> >>>> xhci_hcd usblp uhci_hcd ehci_pci ehci_hcd
>> >>>> <4>[ 40.501395] CPU: 7 PID: 7477 Comm: md321_resync Tainted: G
>> >>>> O 4.2.8 #1
>> >>>> <4>[ 40.501396] Hardware name: INSYDE QV96/Type2 - Board Product
>> >>>> Name1, BIOS QV96IR23 10/21/2015
>> >>>> <4>[ 40.501397] ffffffff8219aff7 ffff880092f7b868 ffffffff81c86c4b
>> >>>> ffffffff810dfb59
>> >>>> <4>[ 40.501399] ffff880092f7b8b8 ffff880092f7b8a8 ffffffff81079fa5
>> >>>> ffff880092f7b8e8
>> >>>> <4>[ 40.501401] ffffffff821a4f6d 0000000000000140 0000000000000000
>> >>>> 0000000000000001
>> >>>> <4>[ 40.501403] Call Trace:
>> >>>> <4>[ 40.501407] [<ffffffff81c86c4b>] dump_stack+0x4c/0x65
>> >>>> <4>[ 40.501409] [<ffffffff810dfb59>] ? console_unlock+0x279/0x4f0
>> >>>> <4>[ 40.501411] [<ffffffff81079fa5>] warn_slowpath_common+0x85/0xc0
>> >>>> <4>[ 40.501412] [<ffffffff8107a021>] warn_slowpath_fmt+0x41/0x50
>> >>>> <4>[ 40.501414] [<ffffffff810c28d8>] ? prepare_to_wait_event+0x58/0x100
>> >>>> <4>[ 40.501415] [<ffffffff810c28d8>] ? prepare_to_wait_event+0x58/0x100
>> >>>> <4>[ 40.501416] [<ffffffff810a4f72>] __might_sleep+0xa2/0xb0
>> >>>> <4>[ 40.501419] [<ffffffff8117bb7c>] mempool_alloc+0x7c/0x150
>> >>>> <4>[ 40.501422] [<ffffffff8101442a>] ? save_stack_trace+0x2a/0x50
>> >>>> <4>[ 40.501425] [<ffffffff8145b589>] bio_alloc_bioset+0xb9/0x260
>> >>>> <4>[ 40.501428] [<ffffffff816cb6da>] bio_alloc_mddev+0x1a/0x30
>> >>>> <4>[ 40.501429] [<ffffffff816d22a2>] md_super_write+0x32/0x90
>> >>>> <4>[ 40.501431] [<ffffffff816db9b2>] write_page+0x2d2/0x480
>> >>>> <4>[ 40.501432] [<ffffffff816db808>] ? write_page+0x128/0x480
>> >>>> <4>[ 40.501434] [<ffffffff816a45cc>] ? flush_pending_writes+0x1c/0xe0
>> >>>> <4>[ 40.501435] [<ffffffff816dc2a6>] bitmap_unplug+0x156/0x170
>> >>>> <4>[ 40.501437] [<ffffffff810caa2d>] ? trace_hardirqs_on+0xd/0x10
>> >>>> <4>[ 40.501439] [<ffffffff81c94b4b>] ? _raw_spin_unlock_irq+0x2b/0x40
>> >>>> <4>[ 40.501440] [<ffffffff816a4613>] flush_pending_writes+0x63/0xe0
>> >>>> <4>[ 40.501442] [<ffffffff816a4aff>] freeze_array+0x6f/0xc0
>> >>>> <4>[ 40.501443] [<ffffffff810c27e0>] ? wait_woken+0xb0/0xb0
>> >>>> <4>[ 40.501444] [<ffffffff816a4b8f>] raid1_quiesce+0x3f/0x50
>> >>>> <4>[ 40.501446] [<ffffffff816d2254>] md_do_sync+0x1394/0x13b0
>> >>>> <4>[ 40.501447] [<ffffffff816d1531>] ? md_do_sync+0x671/0x13b0
>> >>>> <4>[ 40.501449] [<ffffffff810cb680>] ? __lock_acquire+0x990/0x23a0
>> >>>> <4>[ 40.501451] [<ffffffff810bade7>] ? pick_next_task_fair+0x707/0xc30
>> >>>> <4>[ 40.501453] [<ffffffff8108783c>] ? kernel_sigaction+0x2c/0xc0
>> >>>> <4>[ 40.501455] [<ffffffff81c94b4b>] ? _raw_spin_unlock_irq+0x2b/0x40
>> >>>> <4>[ 40.501456] [<ffffffff816cac80>] ? find_pers+0x80/0x80
>> >>>> <4>[ 40.501457] [<ffffffff816cadbe>] md_thread+0x13e/0x150
>> >>>> <4>[ 40.501458] [<ffffffff816cac80>] ? find_pers+0x80/0x80
>> >>>> <4>[ 40.501460] [<ffffffff816cac80>] ? find_pers+0x80/0x80
>> >>>> <4>[ 40.501462] [<ffffffff8109ded5>] kthread+0x105/0x120
>> >>>> <4>[ 40.501463] [<ffffffff81c94b4b>] ? _raw_spin_unlock_irq+0x2b/0x40
>> >>>> <4>[ 40.501465] [<ffffffff8109ddd0>] ? kthread_create_on_node+0x220/0x220
>> >>>> <4>[ 40.501467] [<ffffffff81c956cf>] ret_from_fork+0x3f/0x70
>> >>>> <4>[ 40.501468] [<ffffffff8109ddd0>] ? kthread_create_on_node+0x220/0x220
>> >>>> <4>[ 40.501469] ---[ end trace bd085fb137be2a87 ]---
>> >>>>
>> >>>> It looks like raid1_quiesce() creates a nested sleeping primitives by
>> >>>> calling wait_event_lock_irq_cmd()
>> >>>> first to change the state to TASK_UNINTERRUPTIBLE and
>> >>>> flush_pending_writes() could eventually try
>> >>>> to allocate bio for bitmap update with GFP_NOIO which might sleep and
>> >>>> triggers this warning.
>> >>>> I am not sure if this warning is just a false-positive or should we
>> >>>> change the bio allocation
>> >>>> gfp flag to GFP_NOWAIT to prevent it from blocking?
>> >>> This is a legit warnning. Changing gfp to GFP_NOWAIT doesn't completely fix the
>> >>> issue, because generic_make_request could sleep too. I think we should move the
>> >>> work to a workqueue. Could you please try below patch (untested yet)?
>> >> I think it would be simpler to call __set_current_state(TASK_RUNNING)
>> >> in the 'then' branch of flush_pending_writes().
>> >
>> > There is no 'then' branch in this function, maybe you mean set
>> > TASK_RUNNING in the 'if' branch,
>> > since the calltrace is triggered by flush_pending_writes ->
>> > flush_bio_list -> bitmap_unplug.
>>
>> I grew up with BASIC and Pascal.
>> Every "if" statement has a "then" branch and an "else" branch.
>> The fact that C doesn't have a "then" keyword doesn't mean there isn't a
>> 'then' branch.
>>
>> But yes, state should be set to TASK_RUNNING when the condition in the
>> 'if' statement evaluates as 'true' (or maybe I should say "doesn't
>> evaluate to 0").
>
> Completely agree this fixes the issue, but I'm a little hesitant to apply it.
> It looks a little weird, I mean, at least I must add comments to explain why we
> do that way. Do you have strong preference to do this way?
My preference is quite strong. I believe the current code is simple and
correct and doesn't need to change.
The warning is a false positive. It is a good warning to have, but in
this case it doesn't indicate a problem.
I agree that comments are a good thing here. So I propose this patch,
replete with comments.
Thanks,
NeilBrown
Subject: [PATCH] md/raid1,raid10: silence warning about wait-within-wait
If you prepare_to_wait() after a previous prepare_to_wait(),
but before calling schedule(), you get warning:
do not call blocking ops when !TASK_RUNNING; state=2
This is appropriate as it is often a bug. The event that the
first prepare_to_wait() expects might wake up the schedule following
the second prepare_to_wait(), which could be confusing.
However if both prepare_to_wait()s are part of simple wait_event()
loops, and if the inner one is rarely called, then there is
no problem. The inner loop is too simple to get confused by
a stray wakeup, and the outer loop won't spin unduly because the
inner doesnt affect it often.
This pattern occurs in both raid1.c and raid10.c in the use of
flush_pending_writes().
The warning can be silenced by setting current->state to TASK_RUNNING.
Signed-off-by: NeilBrown <neilb@suse.com>
---
drivers/md/raid1.c | 11 +++++++++++
drivers/md/raid10.c | 11 +++++++++++
2 files changed, 22 insertions(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cc9d337a1ed3..0faec3a5f017 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -813,6 +813,17 @@ static void flush_pending_writes(struct r1conf *conf)
bio = bio_list_get(&conf->pending_bio_list);
conf->pending_count = 0;
spin_unlock_irq(&conf->device_lock);
+
+ /* As this is called in a wait_event() loop, current->state
+ * might be TASK_UNINTERRUPTIBLE which will cause a warning
+ * when we prepare to wait again.
+ * As it is rare that this path is taken, it is perfectly
+ * safe to force us to go around the wait_event() loop
+ * again, so the warning is a false-positive.
+ * Silence the warning by resetting thread state
+ */
+ __set_current_state(TASK_RUNNING);
+
flush_bio_list(conf, bio);
} else
spin_unlock_irq(&conf->device_lock);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index b9edbc747a95..df7b78a79735 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -898,6 +898,17 @@ static void flush_pending_writes(struct r10conf *conf)
bio = bio_list_get(&conf->pending_bio_list);
conf->pending_count = 0;
spin_unlock_irq(&conf->device_lock);
+
+ /* As this is called in a wait_event() loop, current->state
+ * might be TASK_UNINTERRUPTIBLE which will cause a warning
+ * when we prepare to wait again.
+ * As it is rare that this path is taken, it is perfectly
+ * safe to force us to go around the wait_event() loop
+ * again, so the warning is a false-positive.
+ * Silence the warning by resetting thread state
+ */
+ __set_current_state(TASK_RUNNING);
+
/* flush any pending bitmap writes to disk
* before proceeding w/ I/O */
bitmap_unplug(conf->mddev->bitmap);
--
2.14.0.rc0.dirty
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2017-12-03 21:21 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-28 8:51 Calling block ops when !TASK_RUNNING warning in raid1 Dennis Yang
2017-11-28 17:52 ` Shaohua Li
2017-11-30 0:20 ` NeilBrown
2017-11-30 2:56 ` Guoqing Jiang
2017-11-30 4:26 ` NeilBrown
2017-12-01 20:10 ` Shaohua Li
2017-12-03 21:21 ` NeilBrown [this message]
2017-12-04 22:31 ` Shaohua Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87vahntsn3.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=gqjiang@suse.com \
--cc=linux-raid@vger.kernel.org \
--cc=shinrairis@gmail.com \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).