From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Ming Lei <ming.lei@redhat.com>, ChanghuiZhong <czhong@redhat.com>,
Christoph Hellwig <hch@lst.de>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Bart Van Assche <bvanassche@acm.org>,
linux-scsi@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
Sasha Levin <sashal@kernel.org>,
linux-block@vger.kernel.org
Subject: [PATCH AUTOSEL 5.15 14/39] blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release()
Date: Thu, 25 Nov 2021 21:31:31 -0500 [thread overview]
Message-ID: <20211126023156.441292-14-sashal@kernel.org> (raw)
In-Reply-To: <20211126023156.441292-1-sashal@kernel.org>
From: Ming Lei <ming.lei@redhat.com>
[ Upstream commit 2a19b28f7929866e1cec92a3619f4de9f2d20005 ]
For avoiding to slow down queue destroy, we don't call
blk_mq_quiesce_queue() in blk_cleanup_queue(), instead of delaying to
cancel dispatch work in blk_release_queue().
However, this way has caused kernel oops[1], reported by Changhui. The log
shows that scsi_device can be freed before running blk_release_queue(),
which is expected too since scsi_device is released after the scsi disk
is closed and the scsi_device is removed.
Fixes the issue by canceling blk-mq dispatch work in both blk_cleanup_queue()
and disk_release():
1) when disk_release() is run, the disk has been closed, and any sync
dispatch activities have been done, so canceling dispatch work is enough to
quiesce filesystem I/O dispatch activity.
2) in blk_cleanup_queue(), we only focus on passthrough request, and
passthrough request is always explicitly allocated & freed by
its caller, so once queue is frozen, all sync dispatch activity
for passthrough request has been done, then it is enough to just cancel
dispatch work for avoiding any dispatch activity.
[1] kernel panic log
[12622.769416] BUG: kernel NULL pointer dereference, address: 0000000000000300
[12622.777186] #PF: supervisor read access in kernel mode
[12622.782918] #PF: error_code(0x0000) - not-present page
[12622.788649] PGD 0 P4D 0
[12622.791474] Oops: 0000 [#1] PREEMPT SMP PTI
[12622.796138] CPU: 10 PID: 744 Comm: kworker/10:1H Kdump: loaded Not tainted 5.15.0+ #1
[12622.804877] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
[12622.813321] Workqueue: kblockd blk_mq_run_work_fn
[12622.818572] RIP: 0010:sbitmap_get+0x75/0x190
[12622.823336] Code: 85 80 00 00 00 41 8b 57 08 85 d2 0f 84 b1 00 00 00 45 31 e4 48 63 cd 48 8d 1c 49 48 c1 e3 06 49 03 5f 10 4c 8d 6b 40 83 f0 01 <48> 8b 33 44 89 f2 4c 89 ef 0f b6 c8 e8 fa f3 ff ff 83 f8 ff 75 58
[12622.844290] RSP: 0018:ffffb00a446dbd40 EFLAGS: 00010202
[12622.850120] RAX: 0000000000000001 RBX: 0000000000000300 RCX: 0000000000000004
[12622.858082] RDX: 0000000000000006 RSI: 0000000000000082 RDI: ffffa0b7a2dfe030
[12622.866042] RBP: 0000000000000004 R08: 0000000000000001 R09: ffffa0b742721334
[12622.874003] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000
[12622.881964] R13: 0000000000000340 R14: 0000000000000000 R15: ffffa0b7a2dfe030
[12622.889926] FS: 0000000000000000(0000) GS:ffffa0baafb40000(0000) knlGS:0000000000000000
[12622.898956] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12622.905367] CR2: 0000000000000300 CR3: 0000000641210001 CR4: 00000000001706e0
[12622.913328] Call Trace:
[12622.916055] <TASK>
[12622.918394] scsi_mq_get_budget+0x1a/0x110
[12622.922969] __blk_mq_do_dispatch_sched+0x1d4/0x320
[12622.928404] ? pick_next_task_fair+0x39/0x390
[12622.933268] __blk_mq_sched_dispatch_requests+0xf4/0x140
[12622.939194] blk_mq_sched_dispatch_requests+0x30/0x60
[12622.944829] __blk_mq_run_hw_queue+0x30/0xa0
[12622.949593] process_one_work+0x1e8/0x3c0
[12622.954059] worker_thread+0x50/0x3b0
[12622.958144] ? rescuer_thread+0x370/0x370
[12622.962616] kthread+0x158/0x180
[12622.966218] ? set_kthread_struct+0x40/0x40
[12622.970884] ret_from_fork+0x22/0x30
[12622.974875] </TASK>
[12622.977309] Modules linked in: scsi_debug rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs sunrpc dm_multipath intel_rapl_msr intel_rapl_common dell_wmi_descriptor sb_edac rfkill video x86_pkg_temp_thermal intel_powerclamp dcdbas coretemp kvm_intel kvm mgag200 irqbypass i2c_algo_bit rapl drm_kms_helper ipmi_ssif intel_cstate intel_uncore syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr cec mei_me lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg ixgbe ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata megaraid_sas ghash_clmulni_intel tg3 wdat_wdt mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
Reported-by: ChanghuiZhong <czhong@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211116014343.610501-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
block/blk-core.c | 4 +++-
block/blk-mq.c | 13 +++++++++++++
block/blk-mq.h | 2 ++
block/blk-sysfs.c | 10 ----------
block/genhd.c | 2 ++
5 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 4d8f5fe915887..0d5e7bc5dbea2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -389,8 +389,10 @@ void blk_cleanup_queue(struct request_queue *q)
blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
blk_sync_queue(q);
- if (queue_is_mq(q))
+ if (queue_is_mq(q)) {
+ blk_mq_cancel_work_sync(q);
blk_mq_exit_queue(q);
+ }
/*
* In theory, request pool of sched_tags belongs to request queue.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c8a9d10f7c18b..82de39926a9f6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4018,6 +4018,19 @@ unsigned int blk_mq_rq_cpu(struct request *rq)
}
EXPORT_SYMBOL(blk_mq_rq_cpu);
+void blk_mq_cancel_work_sync(struct request_queue *q)
+{
+ if (queue_is_mq(q)) {
+ struct blk_mq_hw_ctx *hctx;
+ int i;
+
+ cancel_delayed_work_sync(&q->requeue_work);
+
+ queue_for_each_hw_ctx(q, hctx, i)
+ cancel_delayed_work_sync(&hctx->run_work);
+ }
+}
+
static int __init blk_mq_init(void)
{
int i;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d08779f77a265..7cdca23b6263d 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -129,6 +129,8 @@ extern int blk_mq_sysfs_register(struct request_queue *q);
extern void blk_mq_sysfs_unregister(struct request_queue *q);
extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
+void blk_mq_cancel_work_sync(struct request_queue *q);
+
void blk_mq_release(struct request_queue *q);
static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 614d9d47de36b..4737ec024ee9b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -805,16 +805,6 @@ static void blk_release_queue(struct kobject *kobj)
blk_free_queue_stats(q->stats);
- if (queue_is_mq(q)) {
- struct blk_mq_hw_ctx *hctx;
- int i;
-
- cancel_delayed_work_sync(&q->requeue_work);
-
- queue_for_each_hw_ctx(q, hctx, i)
- cancel_delayed_work_sync(&hctx->run_work);
- }
-
blk_exit_queue(q);
blk_queue_free_zone_bitmaps(q);
diff --git a/block/genhd.c b/block/genhd.c
index 6accd0b185e9e..f091a60dcf1ea 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1086,6 +1086,8 @@ static void disk_release(struct device *dev)
might_sleep();
WARN_ON_ONCE(disk_live(disk));
+ blk_mq_cancel_work_sync(disk->queue);
+
disk_release_events(disk);
kfree(disk->random);
xa_destroy(&disk->part_tbl);
--
2.33.0
next prev parent reply other threads:[~2021-11-26 2:34 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-26 2:31 [PATCH AUTOSEL 5.15 01/39] gfs2: release iopen glock early in evict Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 02/39] gfs2: Fix length of holes reported at end-of-file Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 03/39] powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory" Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 04/39] powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one window Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 05/39] drm/sun4i: fix unmet dependency on RESET_CONTROLLER for PHY_SUN6I_MIPI_DPHY Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 06/39] mac80211: do not access the IV when it was stripped Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 07/39] mac80211: fix throughput LED trigger Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 08/39] x86/hyperv: Move required MSRs check to initial platform probing Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 09/39] tun: fix bonding active backup with arp monitoring Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 10/39] net/smc: Transfer remaining wait queue entries during fallback Sasha Levin
2021-11-26 2:51 ` Jakub Kicinski
2021-12-03 18:20 ` Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 11/39] atlantic: Fix OOB read and write in hw_atl_utils_fw_rpc_wait Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 12/39] net: return correct error code Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 13/39] pinctrl: qcom: fix unmet dependencies on GPIOLIB for GPIOLIB_IRQCHIP Sasha Levin
2021-11-26 2:31 ` Sasha Levin [this message]
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 15/39] platform/x86: dell-wmi-descriptor: disable by default Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 16/39] platform/x86: thinkpad_acpi: Add support for dual fan control Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 17/39] platform/x86: thinkpad_acpi: Fix WWAN device disabled issue after S3 deep Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 18/39] s390/setup: avoid using memblock_enforce_memory_limit Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 19/39] btrfs: silence lockdep when reading chunk tree during mount Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 20/39] btrfs: check-integrity: fix a warning on write caching disabled disk Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 21/39] thermal: core: Reset previous low and high trip during thermal zone init Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 22/39] scsi: iscsi: Unblock session then wake up error handler Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 23/39] net: usb: r8152: Add MAC passthrough support for more Lenovo Docks Sasha Levin
2021-11-28 9:49 ` Sergey Shtylyov
2021-11-28 9:50 ` Sergey Shtylyov
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 24/39] drm/amd/pm: Remove artificial freq level on Navi1x Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 25/39] drm/amd/amdkfd: Fix kernel panic when reset failed and been triggered again Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 26/39] drm/amd/amdgpu: fix potential memleak Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 27/39] ata: ahci: Add Green Sardine vendor ID as board_ahci_mobile Sasha Levin
2021-11-29 14:46 ` Limonciello, Mario
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 28/39] ata: libahci: Adjust behavior when StorageD3Enable _DSD is set Sasha Levin
2021-11-29 14:46 ` Limonciello, Mario
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 29/39] ethernet: hisilicon: hns: hns_dsaf_misc: fix a possible array overflow in hns_dsaf_ge_srst_by_port() Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 30/39] ipv6: check return value of ipv6_skip_exthdr Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 31/39] net: tulip: de4x5: fix the problem that the array 'lp->phy[8]' may be out of bound Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 32/39] net: ethernet: dec: tulip: de4x5: fix possible array overflows in type3_infoblock() Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 33/39] perf sort: Fix the 'weight' sort key behavior Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 34/39] perf sort: Fix the 'ins_lat' " Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 35/39] perf sort: Fix the 'p_stage_cyc' " Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 36/39] perf inject: Fix ARM SPE handling Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 37/39] perf hist: Fix memory leak of a perf_hpp_fmt Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 38/39] perf report: Fix memory leaks around perf_tip() Sasha Levin
2021-11-26 2:31 ` [PATCH AUTOSEL 5.15 39/39] tracing: Don't use out-of-sync va_list in event printing Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211126023156.441292-14-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=axboe@kernel.dk \
--cc=bvanassche@acm.org \
--cc=czhong@redhat.com \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ming.lei@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox