All of lore.kernel.org
 help / color / mirror / Atom feed
From: axboe@kernel.dk (Jens Axboe)
Subject: [PATCH] nvme: limit max io queues as 1 in case of kdump kernel
Date: Thu, 6 Dec 2018 08:31:00 -0700	[thread overview]
Message-ID: <7d679d92-b732-2abc-80e3-68c4460af5dc@kernel.dk> (raw)
In-Reply-To: <20181206113807.15238-1-ming.lei@redhat.com>

On 12/6/18 4:38 AM, Ming Lei wrote:
> NVMe PCI builds queue mapping before allocating tagset, in which
> set->nr_hw_queues is set as 1 in case of kdump kernel, so wrong
> queue mapping can be setup, and kernel panic[1] is observed during
> booting.
> 
> This patch fixes the issue by setting max io queues as 1 under this
> situation.
> 
> [1] kernel panic log
> [    4.438371] nvme nvme0: 16/0/0 default/read/poll queues
> [    4.443277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
> [    4.444681] PGD 0 P4D 0
> [    4.445367] Oops: 0000 [#1] SMP NOPTI
> [    4.446342] CPU: 3 PID: 201 Comm: kworker/u33:10 Not tainted 4.20.0-rc5-00664-g5eb02f7ee1eb-dirty #459
> [    4.447630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
> [    4.448689] Workqueue: nvme-wq nvme_scan_work [nvme_core]
> [    4.449368] RIP: 0010:blk_mq_map_swqueue+0xfb/0x222
> [    4.450596] Code: 04 f5 20 28 ef 81 48 89 c6 39 55 30 76 93 89 d0 48 c1 e0 04 48 03 83 f8 05 00 00 48 8b 00 42 8b 3c 28 48 8b 43 58 48 8b 04 f8 <48> 8b b8 98 00 00 00 4c 0f a3 37 72 42 f0 4c 0f ab 37 66 8b b8 f6
> [    4.453132] RSP: 0018:ffffc900023b3cd8 EFLAGS: 00010286
> [    4.454061] RAX: 0000000000000000 RBX: ffff888174448000 RCX: 0000000000000001
> [    4.456480] RDX: 0000000000000001 RSI: ffffe8feffc506c0 RDI: 0000000000000001
> [    4.458750] RBP: ffff88810722d008 R08: ffff88817647a880 R09: 0000000000000002
> [    4.464580] R10: ffffc900023b3c10 R11: 0000000000000004 R12: ffff888174448538
> [    4.467803] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000001
> [    4.469220] FS:  0000000000000000(0000) GS:ffff88817bac0000(0000) knlGS:0000000000000000
> [    4.471554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    4.472464] CR2: 0000000000000098 CR3: 0000000174e4e001 CR4: 0000000000760ee0
> [    4.474264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    4.476007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    4.477061] PKRU: 55555554
> [    4.477464] Call Trace:
> [    4.478731]  blk_mq_init_allocated_queue+0x36a/0x3ad
> [    4.479595]  blk_mq_init_queue+0x32/0x4e
> [    4.480178]  nvme_validate_ns+0x98/0x623 [nvme_core]
> [    4.480963]  ? nvme_submit_sync_cmd+0x1b/0x20 [nvme_core]
> [    4.481685]  ? nvme_identify_ctrl.isra.8+0x70/0xa0 [nvme_core]
> [    4.482601]  nvme_scan_work+0x23a/0x29b [nvme_core]
> [    4.483269]  ? _raw_spin_unlock_irqrestore+0x25/0x38
> [    4.483930]  ? try_to_wake_up+0x38d/0x3b3
> [    4.484478]  ? process_one_work+0x179/0x2fc
> [    4.485118]  process_one_work+0x1d3/0x2fc
> [    4.485655]  ? rescuer_thread+0x2ae/0x2ae
> [    4.486196]  worker_thread+0x1e9/0x2be
> [    4.486841]  kthread+0x115/0x11d
> [    4.487294]  ? kthread_park+0x76/0x76
> [    4.487784]  ret_from_fork+0x3a/0x50
> [    4.488322] Modules linked in: nvme nvme_core qemu_fw_cfg virtio_scsi ip_tables
> [    4.489428] Dumping ftrace buffer:
> [    4.489939]    (ftrace buffer empty)
> [    4.490492] CR2: 0000000000000098
> [    4.491052] ---[ end trace 03cd268ad5a86ff7 ]---
> 
> Cc: Jens Axboe <axboe at kernel.dk>
> Cc: David Milburn <dmilburn at redhat.com>
> Signed-off-by: Ming Lei <ming.lei at redhat.com>
> ---
>  drivers/nvme/host/pci.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 7732c4979a4e..86789921f463 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -31,6 +31,7 @@
>  #include <linux/io-64-nonatomic-lo-hi.h>
>  #include <linux/sed-opal.h>
>  #include <linux/pci-p2pdma.h>
> +#include <linux/crash_dump.h>
>  
>  #include "nvme.h"
>  
> @@ -253,6 +254,14 @@ static inline void _nvme_check_size(void)
>  
>  static unsigned int max_io_queues(void)
>  {
> +	/*
> +	 * blk-mq may set set->nr_hw_queues as 1 during allocating tagset
> +	 * in case of kdump kernel. However, queue mapping can be setup
> +	 * before allocating tagset, so limit the max io queues as 1 if
> +	 * running from kdump kernel.
> +	 */
> +	if (is_kdump_kernel())
> +		return 1;
>  	return num_possible_cpus() + write_queues + poll_queues;
>  }

Hmm, can we do this in blk-mq instead? Seems kind of shady to have to
be aware of this in the driver. Something like the below? Totally
untested.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 900550594651..38beb0de5088 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3028,9 +3028,9 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	 * memory constrained environment. Limit us to 1 queue and
 	 * 64 tags to prevent using too much memory.
 	 */
-	if (is_kdump_kernel()) {
-		set->nr_hw_queues = 1;
+	if (is_kdump_kernel() && set->nr_hw_queues > 1) {
 		set->queue_depth = min(64U, set->queue_depth);
+		blk_mq_update_nr_hw_queues(set, 1);
 	}
 	/*
 	 * There is no use for more h/w queues than cpus if we just have

Or maybe we can do it in the actual queue setup? Like this one...


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 900550594651..2e2249aca799 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2810,6 +2810,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
  */
 static unsigned int nr_hw_queues(struct blk_mq_tag_set *set)
 {
+	if (is_kdump_kernel())
+		return 1;
 	if (set->nr_maps == 1)
 		return nr_cpu_ids;
 

-- 
Jens Axboe

  reply	other threads:[~2018-12-06 15:31 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-06 11:38 [PATCH] nvme: limit max io queues as 1 in case of kdump kernel Ming Lei
2018-12-06 15:31 ` Jens Axboe [this message]
2018-12-06 15:49   ` Jens Axboe
2018-12-07  1:13   ` Ming Lei
2018-12-07  1:14     ` Jens Axboe
2018-12-07  1:37       ` Ming Lei
2018-12-07  1:39         ` Jens Axboe
2018-12-07  1:57           ` Ming Lei
2018-12-07  2:08             ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7d679d92-b732-2abc-80e3-68c4460af5dc@kernel.dk \
    --to=axboe@kernel.dk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.