From: Hugh Dickins <hughd@google.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Jan Kara <jack@suse.cz>, Keith Busch <kbusch@kernel.org>,
Hugh Dickins <hughd@google.com>,
Yu Kuai <yukuai1@huaweicloud.com>,
Liu Song <liusong@linux.alibaba.com>,
Hillf Danton <hdanton@sina.com>,
linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH next v2] sbitmap: fix lockup while swapping
Date: Tue, 27 Sep 2022 20:59:47 -0700 (PDT)
Message-ID: <f975dddf-6ec-b3cb-3746-e91f61b22ea@google.com>
In-Reply-To: <2b931ee7-1bc9-e389-9d9f-71eb778dcf1@google.com>

Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting")
is a big improvement: without it, I had to revert to before commit
040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
to avoid the high system time and freezes which that had introduced.

Now okay on the NVMe laptop, but 4acb83417cad is a disaster for heavy
swapping (kernel builds in low memory) on another: soon locking up in
sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling
around with waitqueue_active() but wait_cnt 0.  Here is a backtrace,
showing the common pattern of outer sbitmap_queue_wake_up() interrupted
before setting wait_cnt 0 back to wake_batch (in some cases other CPUs
are idle, in other cases they're spinning for a lock in dd_bio_merge()):

sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
__blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request <
scsi_end_request < scsi_io_completion < scsi_finish_command <
scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq <
__irq_exit_rcu < irq_exit_rcu < common_interrupt < asm_common_interrupt <
_raw_spin_unlock_irqrestore < __wake_up_common_lock < __wake_up <
sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
__blk_mq_free_request < blk_mq_free_request < dd_bio_merge <
blk_mq_sched_bio_merge < blk_mq_attempt_bio_merge < blk_mq_submit_bio <
__submit_bio < submit_bio_noacct_nocheck < submit_bio_noacct <
submit_bio < __swap_writepage < swap_writepage < pageout <
shrink_folio_list < evict_folios < lru_gen_shrink_lruvec <
shrink_lruvec < shrink_node < do_try_to_free_pages < try_to_free_pages <
__alloc_pages_slowpath < __alloc_pages < folio_alloc < vma_alloc_folio <
do_anonymous_page < __handle_mm_fault < handle_mm_fault <
do_user_addr_fault < exc_page_fault < asm_exc_page_fault

See how the process-context sbitmap_queue_wake_up() has been interrupted,
after bringing wait_cnt down to 0 (and in this example, after doing its
wakeups), before advancing wake_index and refilling wait_cnt: an
interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck.
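
The stuck state described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the kernel code:
struct model_ws, model_wake_once() and model_retries() are hypothetical
names, has_waiters stands in for waitqueue_active(), and the retry limit
exists only so that the model terminates. It assumes that a waker which
finds wait_cnt 0 asks its caller to try again, expecting someone else to
refill the count.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one wait state; not the kernel struct. */
struct model_ws {
	int wait_cnt;		/* wakeups left before the batch is refilled */
	bool has_waiters;	/* stands in for waitqueue_active() */
};

/*
 * Models one pass of the waker.  Returns true when the caller is asked
 * to retry, false when this pass consumed wakeups or found nothing to do.
 */
static bool model_wake_once(struct model_ws *ws, int *nr)
{
	int sub;

	if (*nr <= 0 || !ws->has_waiters)
		return false;
	if (ws->wait_cnt == 0)
		return true;	/* assumed: another context will refill */

	sub = *nr < ws->wait_cnt ? *nr : ws->wait_cnt;
	ws->wait_cnt -= sub;
	*nr -= sub;
	return false;
}

/*
 * Models the retry loop of the interrupting caller.  The context that
 * would refill wait_cnt is the one that was interrupted, so nothing
 * changes between passes; "limit" only keeps the model from spinning.
 */
static int model_retries(struct model_ws *ws, int nr, int limit)
{
	int retries = 0;

	while (retries < limit && model_wake_once(ws, &nr))
		retries++;
	return retries;
}
```

With wait_cnt 0 and has_waiters set, model_retries() runs until it hits
the limit, which corresponds to the interrupt-context caller never making
progress. With a non-zero wait_cnt it returns after the first pass.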

I have almost no grasp of all the possible sbitmap races, and their
consequences: but __sbq_wake_up() can do nothing useful while wait_cnt 0,
so it is better if sbq_wake_ptr() skips on to the next ws in that case:
which fixes the lockup and shows no adverse consequence for me.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
v2: - v1 to __sbq_wake_up() broke out when this happens, but
      v2 to sbq_wake_ptr() does better by skipping on to the next.
    - added more comment and deleted dubious Fixes attribution.

 lib/sbitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -587,7 +587,7 @@ static struct sbq_wait_state *sbq_wake_p
 	for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
 		struct sbq_wait_state *ws = &sbq->ws[wake_index];
 
-		if (waitqueue_active(&ws->wait)) {
+		if (waitqueue_active(&ws->wait) && atomic_read(&ws->wait_cnt)) {
 			if (wake_index != atomic_read(&sbq->wake_index))
 				atomic_set(&sbq->wake_index, wake_index);
 			return ws;
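
For readers who want to see the effect of the added condition in isolation,
here is a minimal userspace sketch of the selection loop. It is an
illustration under assumptions, not the kernel implementation: the model_*
names, MODEL_WAIT_QUEUES and the skip_zero_cnt switch are hypothetical,
plain ints replace the atomics, and has_waiters stands in for
waitqueue_active().

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_WAIT_QUEUES 8	/* size chosen for the model only */

/* Hypothetical stand-in for one wait state; not the kernel struct. */
struct model_ws {
	int wait_cnt;		/* wakeups left before the batch is refilled */
	bool has_waiters;	/* stands in for waitqueue_active() */
};

struct model_sbq {
	struct model_ws ws[MODEL_WAIT_QUEUES];
	int wake_index;		/* where the next search starts */
};

/*
 * Pick the wait state to wake next, starting at wake_index and wrapping.
 * With skip_zero_cnt false this models the old check (waiters only);
 * with it true, a wait state whose wait_cnt is 0 is passed over too.
 */
static struct model_ws *model_wake_ptr(struct model_sbq *sbq,
				       bool skip_zero_cnt)
{
	int wake_index = sbq->wake_index;

	for (int i = 0; i < MODEL_WAIT_QUEUES; i++) {
		struct model_ws *ws = &sbq->ws[wake_index];

		if (ws->has_waiters && (!skip_zero_cnt || ws->wait_cnt != 0)) {
			sbq->wake_index = wake_index;
			return ws;
		}
		wake_index = (wake_index + 1) % MODEL_WAIT_QUEUES;
	}
	return NULL;	/* nothing eligible to wake */
}
```

If ws[0] has waiters but wait_cnt 0 and ws[1] has waiters with a non-zero
wait_cnt, the old check keeps returning ws[0], while the new check moves on
to ws[1] and records that index in wake_index.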
Thread overview: 22+ messages
2022-09-18 21:10 [PATCH next] sbitmap: fix lockup while swapping Hugh Dickins
2022-09-19 21:22 ` Keith Busch
2022-09-19 23:01 ` Hugh Dickins
2022-09-21 16:40 ` Jan Kara
2022-09-23 14:43 ` Jan Kara
2022-09-23 15:13 ` Keith Busch
2022-09-23 16:16 ` Hugh Dickins
2022-09-23 19:07 ` Keith Busch
2022-09-23 21:29 ` Hugh Dickins
2022-09-23 23:15 ` Hugh Dickins
2022-09-26 11:44 ` Jan Kara
2022-09-26 14:08 ` Yu Kuai
2022-09-27 3:39 ` Hugh Dickins
2022-09-27 10:31 ` Jan Kara
2022-09-28 3:56 ` Hugh Dickins
2022-09-28 3:59 ` Hugh Dickins [this message]
2022-09-28 4:07 ` [PATCH next v2] " Hugh Dickins
2022-09-29 8:39 ` Jan Kara
2022-09-29 19:50 ` [PATCH next v3] " Hugh Dickins
2022-09-29 19:56 ` Keith Busch
2022-09-29 23:58 ` Jens Axboe
[not found] ` <20220924023047.1410-1-hdanton@sina.com>
2022-09-27 4:02 ` [PATCH next] " Hugh Dickins