All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>, Jan Kara <jack@suse.cz>,
	Kemeng Shi <shikemeng@huaweicloud.com>,
	Changhui Zhong <czhong@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	Sasha Levin <sashal@kernel.org>,
	linux-block@vger.kernel.org
Subject: [PATCH AUTOSEL 5.10 10/13] blk-mq: fix IO hang from sbitmap wakeup race
Date: Sun, 28 Jan 2024 11:15:56 -0500	[thread overview]
Message-ID: <20240128161606.205221-10-sashal@kernel.org> (raw)
In-Reply-To: <20240128161606.205221-1-sashal@kernel.org>

From: Ming Lei <ming.lei@redhat.com>

[ Upstream commit 5266caaf5660529e3da53004b8b7174cab6374ed ]

In blk_mq_mark_tag_wait(), __add_wait_queue() may be re-ordered
with the following blk_mq_get_driver_tag() in case of getting driver
tag failure.

Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe
the added waiter in blk_mq_mark_tag_wait() and wake up nothing, meantime
blk_mq_mark_tag_wait() can't get driver tag successfully.

This issue can be reproduced by running the following test in loop, and
fio hang can be observed in < 30min when running it on my test VM
in laptop.

	modprobe -r scsi_debug
	modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4
	dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
	fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \
       		--runtime=100 --numjobs=40 --time_based --name=test \
        	--ioengine=libaio

Fix the issue by adding one explicit barrier in blk_mq_mark_tag_wait(), which
is just fine in case of running out of tag.

Cc: Jan Kara <jack@suse.cz>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Reported-by: Changhui Zhong <czhong@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240112122626.4181044-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/blk-mq.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e153a36c9ba3..a7a31d7090ae 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1188,6 +1188,22 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
 	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
 	__add_wait_queue(wq, wait);
 
+	/*
+	 * Add one explicit barrier since blk_mq_get_driver_tag() may
+	 * not imply barrier in case of failure.
+	 *
+	 * Order adding us to wait queue and allocating driver tag.
+	 *
+	 * The pair is the one implied in sbitmap_queue_wake_up() which
+	 * orders clearing sbitmap tag bits and waitqueue_active() in
+	 * __sbitmap_queue_wake_up(), since waitqueue_active() is lockless
+	 *
+	 * Otherwise, re-order of adding wait queue and getting driver tag
+	 * may cause __sbitmap_queue_wake_up() to wake up nothing because
+	 * the waitqueue_active() may not observe us in wait queue.
+	 */
+	smp_mb();
+
 	/*
 	 * It's possible that a tag was freed in the window between the
 	 * allocation failure and adding the hardware queue to the wait
-- 
2.43.0


  parent reply	other threads:[~2024-01-28 16:16 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-28 16:15 [PATCH AUTOSEL 5.10 01/13] PCI: Only override AMD USB controller if required Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 02/13] PCI: switchtec: Fix stdev_release() crash after surprise hot remove Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 03/13] usb: hub: Replace hardcoded quirk value with BIT() macro Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 04/13] tty: allow TIOCSLCKTRMIOS with CAP_CHECKPOINT_RESTORE Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 05/13] fs/kernfs/dir: obey S_ISGID Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 06/13] PCI/AER: Decode Requester ID when no error info found Sasha Levin
2024-01-28 16:15   ` Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 07/13] misc: lis3lv02d_i2c: Add missing setting of the reg_ctrl callback Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 08/13] libsubcmd: Fix memory leak in uniq() Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 09/13] virtio_net: Fix "‘%d’ directive writing between 1 and 11 bytes into a region of size 10" warnings Sasha Levin
2024-01-28 16:15 ` Sasha Levin [this message]
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 11/13] ceph: fix deadlock or deadcode of misusing dget() Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 12/13] drm/amd/powerplay: Fix kzalloc parameter 'ATOM_Tonga_PPM_Table' in 'get_platform_power_management_table()' Sasha Levin
2024-01-28 16:15   ` Sasha Levin
2024-01-28 16:15 ` [PATCH AUTOSEL 5.10 13/13] drm/amdgpu: Release 'adev->pm.fw' before return in 'amdgpu_device_need_post()' Sasha Levin
2024-01-28 16:15   ` Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240128161606.205221-10-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=czhong@redhat.com \
    --cc=jack@suse.cz \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=shikemeng@huaweicloud.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.