linux-block.vger.kernel.org archive mirror
* [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel
@ 2023-08-08 10:42 Ming Lei
  2023-08-08 10:42 ` [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
                   ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

Hi,

Fix the wrong queue mapping in the kdump kernel: blk-mq overrides
nr_hw_queues to 1, so the driver and blk-mq may end up with different
queue topologies.


V3:
	- cover more drivers
	- clean up blk-mq a bit, as suggested by Christoph

V2:
	- add a scsi_max_nr_hw_queues() helper to avoid potential build
	  failures, since SCSI drivers often don't deal with blk-mq directly
	- apply scsi_max_nr_hw_queues() in all SCSI changes
	- move the lpfc change into the managed-irq code path


Ming Lei (14):
  blk-mq: add blk_mq_max_nr_hw_queues()
  nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues
  ublk: limit max allowed nr_hw_queues
  virtio-blk: limit max allowed submit queues
  scsi: core: add helper of scsi_max_nr_hw_queues()
  scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors
  scsi: mpi3mr: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: megaraid: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: mpt3sas: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: pm8001: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: hisi: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: ufs: limit max allowed nr_hw_queues
  scsi: storvsc: limit max allowed nr_hw_queues
  blk-mq: add helpers for treating kdump kernel

 block/blk-mq.c                            | 55 ++++++++++++++++++-----
 drivers/block/ublk_drv.c                  |  2 +-
 drivers/block/virtio_blk.c                |  3 +-
 drivers/nvme/host/pci.c                   |  2 +-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c    |  3 ++
 drivers/scsi/lpfc/lpfc_init.c             |  2 +
 drivers/scsi/megaraid/megaraid_sas_base.c |  6 ++-
 drivers/scsi/mpi3mr/mpi3mr_fw.c           |  3 ++
 drivers/scsi/mpt3sas/mpt3sas_base.c       |  4 +-
 drivers/scsi/pm8001/pm8001_init.c         |  4 +-
 drivers/scsi/storvsc_drv.c                |  3 ++
 drivers/ufs/core/ufs-mcq.c                |  2 +-
 include/linux/blk-mq.h                    |  1 +
 include/scsi/scsi_host.h                  |  5 +++
 14 files changed, 75 insertions(+), 20 deletions(-)

-- 
2.40.1



* [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-09 13:44   ` Christoph Hellwig
  2023-08-08 10:42 ` [PATCH V3 02/14] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues Ming Lei
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

blk_mq_alloc_tag_set() may override set->nr_hw_queues to 1 in a kdump
kernel. This causes trouble for drivers, because blk-mq and the driver
then see different queue mappings. In particular, the single online CPU
in a kdump kernel booted with 'maxcpus=1' on the kernel command line may
not be CPU 0, so the driver may map hctx0 to an inactive real hw queue
whose CPU affinity is CPU 0 (offline).

The issue exists in all drivers that use managed IRQs and support
multiple hw queues.

Prepare for fixing this kind of issue by adding a helper, so drivers can
take blk-mq's max nr_hw_queues into account when calculating IO queues.
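
For illustration only, a minimal sketch of how a driver might use the new
helper (struct foo_dev and its max_hw_queues field are hypothetical, and
<linux/blk-mq.h> is assumed to be included):

/*
 * Size the IO queue count from both the hardware capability and blk-mq's
 * max supported nr_hw_queues, so driver and blk-mq agree on the queue
 * topology in a kdump kernel.
 */
static unsigned int foo_calc_nr_io_queues(struct foo_dev *fdev)
{
	unsigned int hw_queues = fdev->max_hw_queues;	/* hypothetical field */

	return min(hw_queues, blk_mq_max_nr_hw_queues());
}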

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 16 ++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b04ff6f56926..617d6f849a7b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -140,6 +140,22 @@ void blk_mq_freeze_queue_wait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
+/*
+ * Return the max supported nr_hw_queues for each hw queue type
+ *
+ * blk_mq_alloc_tag_set() may change nr_hw_queues for kdump kernel, so
+ * driver has to take blk-mq max supported nr_hw_queues into account
+ * when figuring out nr_hw_queues from hardware info, for avoiding
+ * inconsistency between driver and blk-mq.
+ */
+unsigned int blk_mq_max_nr_hw_queues(void)
+{
+	if (is_kdump_kernel())
+		return 1;
+	return nr_cpu_ids;
+}
+EXPORT_SYMBOL_GPL(blk_mq_max_nr_hw_queues);
+
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 495ca198775f..4c0cfd1f9e52 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -711,6 +711,7 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
 		const struct blk_mq_ops *ops, unsigned int queue_depth,
 		unsigned int set_flags);
 void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
+unsigned int blk_mq_max_nr_hw_queues(void);
 
 void blk_mq_free_request(struct request *rq);
 int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,
-- 
2.40.1



* [PATCH V3 02/14] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
  2023-08-08 10:42 ` [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 03/14] ublk: limit max allowed nr_hw_queues Ming Lei
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Reported-by: Wen Xiong <wenxiong@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index baf69af7ea78..a1227ae7eb39 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2251,7 +2251,7 @@ static unsigned int nvme_max_io_queues(struct nvme_dev *dev)
 	 */
 	if (dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS)
 		return 1;
-	return num_possible_cpus() + dev->nr_write_queues + dev->nr_poll_queues;
+	return blk_mq_max_nr_hw_queues() + dev->nr_write_queues + dev->nr_poll_queues;
 }
 
 static int nvme_setup_io_queues(struct nvme_dev *dev)
-- 
2.40.1



* [PATCH V3 03/14] ublk: limit max allowed nr_hw_queues
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
  2023-08-08 10:42 ` [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
  2023-08-08 10:42 ` [PATCH V3 02/14] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 04/14] virtio-blk: limit max allowed submit queues Ming Lei
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/ublk_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 21d2e71c5514..6cf4981951e2 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -2045,7 +2045,7 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 	ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY;
 
 	ub->dev_info.nr_hw_queues = min_t(unsigned int,
-			ub->dev_info.nr_hw_queues, nr_cpu_ids);
+			ub->dev_info.nr_hw_queues, blk_mq_max_nr_hw_queues());
 	ublk_align_max_io_size(ub);
 
 	ret = ublk_init_queues(ub);
-- 
2.40.1



* [PATCH V3 04/14] virtio-blk: limit max allowed submit queues
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (2 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 03/14] ublk: limit max allowed nr_hw_queues Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-10 19:23   ` Michael S. Tsirkin
  2023-08-08 10:42 ` [PATCH V3 05/14] scsi: core: add helper of scsi_max_nr_hw_queues() Ming Lei
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei,
	Michael S . Tsirkin, Jason Wang, virtualization

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/virtio_blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 1fe011676d07..4ba79fe2a1b4 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1047,7 +1047,8 @@ static int init_vq(struct virtio_blk *vblk)
 
 	num_poll_vqs = min_t(unsigned int, poll_queues, num_vqs - 1);
 
-	vblk->io_queues[HCTX_TYPE_DEFAULT] = num_vqs - num_poll_vqs;
+	vblk->io_queues[HCTX_TYPE_DEFAULT] = min_t(unsigned,
+			num_vqs - num_poll_vqs, blk_mq_max_nr_hw_queues());
 	vblk->io_queues[HCTX_TYPE_READ] = 0;
 	vblk->io_queues[HCTX_TYPE_POLL] = num_poll_vqs;
 
-- 
2.40.1



* [PATCH V3 05/14] scsi: core: add helper of scsi_max_nr_hw_queues()
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (3 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 04/14] virtio-blk: limit max allowed submit queues Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 06/14] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors Ming Lei
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

blk_mq_alloc_tag_set() may change nr_hw_queues for the kdump kernel, so
the driver has to take blk-mq's max supported nr_hw_queues into account
when figuring out nr_hw_queues from hardware info, to avoid inconsistency
between the SCSI driver and blk-mq.

Add a scsi_max_nr_hw_queues() helper to avoid nr_hw_queues inconsistency
between SCSI drivers and blk-mq.
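
As a rough illustration (the LLD function below is hypothetical, not part
of this patch), a SCSI driver would clamp its computed queue count like
this:

/* Hypothetical LLD helper: never expose more hw queues than blk-mq maps */
static void foo_set_nr_hw_queues(struct Scsi_Host *shost,
				 unsigned int hw_queues)
{
	shost->nr_hw_queues = min(hw_queues, scsi_max_nr_hw_queues());
}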

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/scsi/scsi_host.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 70b7475dcf56..2273f855940f 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -803,6 +803,11 @@ extern int scsi_host_unblock(struct Scsi_Host *shost, int new_state);
 void scsi_host_busy_iter(struct Scsi_Host *,
 			 bool (*fn)(struct scsi_cmnd *, void *), void *priv);
 
+static inline unsigned int scsi_max_nr_hw_queues(void)
+{
+	return blk_mq_max_nr_hw_queues();
+}
+
 struct class_container;
 
 /*
-- 
2.40.1



* [PATCH V3 06/14] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (4 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 05/14] scsi: core: add helper of scsi_max_nr_hw_queues() Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 07/14] scsi: mpi3mr: take blk_mq_max_nr_hw_queues() into account for calculating " Ming Lei
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Justin Tee,
	Justin Tee, James Smart

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Cc: Justin Tee <justintee8345@gmail.com>
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 3221a934066b..961ff392f1c5 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -13025,6 +13025,8 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
 		flags |= PCI_IRQ_AFFINITY;
 	}
 
+	vectors = min_t(unsigned int, vectors, scsi_max_nr_hw_queues());
+
 	rc = pci_alloc_irq_vectors(phba->pcidev, 1, vectors, flags);
 	if (rc < 0) {
 		lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
-- 
2.40.1



* [PATCH V3 07/14] scsi: mpi3mr: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (5 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 06/14] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 08/14] scsi: megaraid: " Ming Lei
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Sreekanth Reddy,
	Sathya Prakash Veerichetty, Kashyap Desai, Sumit Saxena

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/mpi3mr/mpi3mr_fw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c
index 5fa07d6ee5b8..8fe14e728af6 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_fw.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c
@@ -815,6 +815,9 @@ static int mpi3mr_setup_isr(struct mpi3mr_ioc *mrioc, u8 setup_one)
 
 		desc.post_vectors = mrioc->requested_poll_qcount;
 		min_vec = desc.pre_vectors + desc.post_vectors;
+		if (max_vectors - min_vec > scsi_max_nr_hw_queues())
+			max_vectors = min_vec + scsi_max_nr_hw_queues();
+
 		irq_flags |= PCI_IRQ_AFFINITY | PCI_IRQ_ALL_TYPES;
 
 		retval = pci_alloc_irq_vectors_affinity(mrioc->pdev,
-- 
2.40.1



* [PATCH V3 08/14] scsi: megaraid: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (6 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 07/14] scsi: mpi3mr: take blk_mq_max_nr_hw_queues() into account for calculating " Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 09/14] scsi: mpt3sas: " Ming Lei
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Sreekanth Reddy,
	Sathya Prakash Veerichetty, Kashyap Desai, Sumit Saxena

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 050eed8e2684..587ccbc1ec92 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5921,6 +5921,10 @@ __megasas_alloc_irq_vectors(struct megasas_instance *instance)
 	int i, irq_flags;
 	struct irq_affinity desc = { .pre_vectors = instance->low_latency_index_start };
 	struct irq_affinity *descp = &desc;
+	unsigned max_vecs = instance->msix_vectors - instance->iopoll_q_count;
+
+	if (max_vecs > scsi_max_nr_hw_queues())
+		max_vecs = scsi_max_nr_hw_queues();
 
 	irq_flags = PCI_IRQ_MSIX;
 
@@ -5934,7 +5938,7 @@ __megasas_alloc_irq_vectors(struct megasas_instance *instance)
 	 */
 	i = pci_alloc_irq_vectors_affinity(instance->pdev,
 		instance->low_latency_index_start,
-		instance->msix_vectors - instance->iopoll_q_count, irq_flags, descp);
+		max_vecs, irq_flags, descp);
 
 	return i;
 }
-- 
2.40.1



* [PATCH V3 09/14] scsi: mpt3sas: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (7 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 08/14] scsi: megaraid: " Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 10/14] scsi: pm8001: " Ming Lei
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Sathya Prakash,
	Sreekanth Reddy, Suganath Prabu Subramani

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 53f5492579cb..d238e0275edd 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3332,8 +3332,8 @@ _base_alloc_irq_vectors(struct MPT3SAS_ADAPTER *ioc)
 	 * Don't allocate msix vectors for poll_queues.
 	 * msix_vectors is always within a range of FW supported reply queue.
 	 */
-	int nr_msix_vectors = ioc->iopoll_q_start_index;
-
+	int nr_msix_vectors = min_t(unsigned int, ioc->iopoll_q_start_index,
+			scsi_max_nr_hw_queues());
 
 	if (ioc->smp_affinity_enable)
 		irq_flags |= PCI_IRQ_AFFINITY | PCI_IRQ_ALL_TYPES;
-- 
2.40.1



* [PATCH V3 10/14] scsi: pm8001: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (8 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 09/14] scsi: mpt3sas: " Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 11/14] scsi: hisi: " Ming Lei
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Jack Wang

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/pm8001/pm8001_init.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index 4995e1ef4e0e..220ae504e5a5 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -965,6 +965,8 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info *pm8001_ha)
 		rc = pci_alloc_irq_vectors(pm8001_ha->pdev, 1, 1,
 					   PCI_IRQ_MSIX);
 	} else {
+		unsigned int max_vecs = min_t(unsigned int, PM8001_MAX_MSIX_VEC,
+				scsi_max_nr_hw_queues() + 1);
 		/*
 		 * Queue index #0 is used always for housekeeping, so don't
 		 * include in the affinity spreading.
@@ -973,7 +975,7 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info *pm8001_ha)
 			.pre_vectors = 1,
 		};
 		rc = pci_alloc_irq_vectors_affinity(
-				pm8001_ha->pdev, 2, PM8001_MAX_MSIX_VEC,
+				pm8001_ha->pdev, 2, max_vecs,
 				PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &desc);
 	}
 
-- 
2.40.1



* [PATCH V3 11/14] scsi: hisi: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (9 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 10/14] scsi: pm8001: " Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 12/14] scsi: ufs: limit max allowed nr_hw_queues Ming Lei
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, John Garry,
	Xiang Chen

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Cc: John Garry <john.g.garry@oracle.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 20e1607c6282..60d2301e7f9d 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2550,6 +2550,9 @@ static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
 
 
 	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW - hisi_hba->iopoll_q_cnt;
+	if (hisi_hba->cq_nvecs > scsi_max_nr_hw_queues())
+		hisi_hba->cq_nvecs = scsi_max_nr_hw_queues();
+
 	shost->nr_hw_queues = hisi_hba->cq_nvecs + hisi_hba->iopoll_q_cnt;
 
 	return devm_add_action(&pdev->dev, hisi_sas_v3_free_vectors, pdev);
-- 
2.40.1



* [PATCH V3 12/14] scsi: ufs: limit max allowed nr_hw_queues
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (10 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 11/14] scsi: hisi: " Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 13/14] scsi: storvsc: " Ming Lei
  2023-08-08 10:42 ` [PATCH V3 14/14] blk-mq: add helpers for treating kdump kernel Ming Lei
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/ufs/core/ufs-mcq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 6fb0e007af63..7d808836b018 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -186,7 +186,7 @@ static int ufshcd_mcq_config_nr_queues(struct ufs_hba *hba)
 
 	if (!hba->nr_queues[HCTX_TYPE_DEFAULT])
 		hba->nr_queues[HCTX_TYPE_DEFAULT] = min3(rem, rw_queues,
-							 num_possible_cpus());
+							 scsi_max_nr_hw_queues());
 
 	for (i = 0; i < HCTX_MAX_TYPES; i++)
 		host->nr_hw_queues += hba->nr_queues[i];
-- 
2.40.1



* [PATCH V3 13/14] scsi: storvsc: limit max allowed nr_hw_queues
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (11 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 12/14] scsi: ufs: limit max allowed nr_hw_queues Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-08 10:42 ` [PATCH V3 14/14] blk-mq: add helpers for treating kdump kernel Ming Lei
  13 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

Take blk-mq's knowledge into account when calculating IO queues.

Fix the wrong queue mapping in the kdump kernel case.

On arm and ppc64, 'maxcpus=1' is passed on the kdump kernel command line
(see `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
still returns all CPUs, because 'maxcpus=1' only brings up a single CPU
core during boot.

blk-mq sees a single queue in the kdump kernel, while from the driver's
viewpoint there are still multiple queues. This inconsistency causes the
driver to apply a wrong queue mapping when handling IO, and IO timeouts
are triggered.

Meantime, a single queue uses far fewer resources and reduces the risk
of kernel failure.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/storvsc_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index f2823218670a..50df040b100d 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -2040,6 +2040,9 @@ static int storvsc_probe(struct hv_device *device,
 			host->nr_hw_queues = storvsc_max_hw_queues;
 		else
 			host->nr_hw_queues = num_present_cpus;
+		if (host->nr_hw_queues > scsi_max_nr_hw_queues())
+			host->nr_hw_queues = scsi_max_nr_hw_queues();
+
 	}
 
 	/*
-- 
2.40.1



* [PATCH V3 14/14] blk-mq: add helpers for treating kdump kernel
  2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (12 preceding siblings ...)
  2023-08-08 10:42 ` [PATCH V3 13/14] scsi: storvsc: " Ming Lei
@ 2023-08-08 10:42 ` Ming Lei
  2023-08-10  8:00   ` kernel test robot
  13 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-08-08 10:42 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

Clean up the code a bit by adding helpers for treating the kdump kernel
specially.
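
For reference, a minimal, hypothetical driver sketch (foo_dev, foo_mq_ops
and the chosen numbers are placeholders, not part of this patch) showing
the values that blk_mq_alloc_tag_set() now clamps via these helpers in a
kdump kernel (nr_hw_queues -> 1, nr_maps -> 1, queue_depth -> 64):

static int foo_init_tag_set(struct foo_dev *fdev)
{
	struct blk_mq_tag_set *set = &fdev->tag_set;

	set->ops = &foo_mq_ops;
	set->nr_hw_queues = fdev->nr_queues;	/* e.g. one per CPU */
	set->nr_maps = 1;
	set->queue_depth = 1024;		/* clamped to 64 in kdump */
	set->numa_node = NUMA_NO_NODE;

	return blk_mq_alloc_tag_set(set);
}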

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 617d6f849a7b..afa51df2f0d3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -147,6 +147,8 @@ EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
  * driver has to take blk-mq max supported nr_hw_queues into account
  * when figuring out nr_hw_queues from hardware info, for avoiding
  * inconsistency between driver and blk-mq.
+ *
+ * Limit to single queue in case of kdump kernel
  */
 unsigned int blk_mq_max_nr_hw_queues(void)
 {
@@ -4370,7 +4372,7 @@ static void blk_mq_update_queue_map(struct blk_mq_tag_set *set)
 	if (set->nr_maps == 1)
 		set->map[HCTX_TYPE_DEFAULT].nr_queues = set->nr_hw_queues;
 
-	if (set->ops->map_queues && !is_kdump_kernel()) {
+	if (set->ops->map_queues) {
 		int i;
 
 		/*
@@ -4420,6 +4422,22 @@ static int blk_mq_realloc_tag_set_tags(struct blk_mq_tag_set *set,
 	return 0;
 }
 
+/* Limit to single map in case of kdump kernel */
+static unsigned int blk_mq_max_nr_maps(void)
+{
+	if (is_kdump_kernel())
+		return 1;
+	return HCTX_MAX_TYPES;
+}
+
+/* Limit to 64 in case of kdump kernel */
+static unsigned int blk_mq_max_depth(void)
+{
+	if (is_kdump_kernel())
+		return 64;
+	return BLK_MQ_MAX_DEPTH;
+}
+
 /*
  * Alloc a tag set to be associated with one or more request queues.
  * May fail with EINVAL for various error conditions. May adjust the
@@ -4456,16 +4474,13 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	else if (set->nr_maps > HCTX_MAX_TYPES)
 		return -EINVAL;
 
-	/*
-	 * If a crashdump is active, then we are potentially in a very
-	 * memory constrained environment. Limit us to 1 queue and
-	 * 64 tags to prevent using too much memory.
-	 */
-	if (is_kdump_kernel()) {
-		set->nr_hw_queues = 1;
-		set->nr_maps = 1;
-		set->queue_depth = min(64U, set->queue_depth);
-	}
+	if (set->nr_hw_queues > blk_mq_max_nr_hw_queues())
+		set->nr_hw_queues = blk_mq_max_nr_hw_queues();
+	if (set->nr_maps > blk_mq_max_nr_maps())
+		set->nr_maps = blk_mq_max_nr_maps();
+	if (set->queue_depth > blk_mq_max_depth())
+		set->queue_depth = blk_mq_max_depth();
+
 	/*
 	 * There is no use for more h/w queues than cpus if we just have
 	 * a single map
@@ -4495,7 +4510,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 						  GFP_KERNEL, set->numa_node);
 		if (!set->map[i].mq_map)
 			goto out_free_mq_map;
-		set->map[i].nr_queues = is_kdump_kernel() ? 1 : set->nr_hw_queues;
+		set->map[i].nr_queues = set->nr_hw_queues;
 	}
 
 	blk_mq_update_queue_map(set);
-- 
2.40.1



* Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-08 10:42 ` [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
@ 2023-08-09 13:44   ` Christoph Hellwig
  2023-08-10  0:09     ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2023-08-09 13:44 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch

I'm starting to sound like a broken record, but we can't just do random
is_kdump checks, and it's not going to get better by resending it again and
again.  If kdump kernels limit the number of possible CPUs, it needs to be
reflected in cpu_possible_map and we need to use that information.



* Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-09 13:44   ` Christoph Hellwig
@ 2023-08-10  0:09     ` Ming Lei
  2023-08-10  1:18       ` Baoquan He
  2023-08-11 13:10       ` Christoph Hellwig
  0 siblings, 2 replies; 25+ messages in thread
From: Ming Lei @ 2023-08-10  0:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-nvme, Martin K . Petersen, linux-scsi,
	linux-block, Wen Xiong, Keith Busch, linuxppc-dev, Dave Young,
	kexec, linux-kernel, Baoquan He, Pingfan Liu

On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> I'm starting to sound like a broken record, but we can't just do random
> is_kdump checks, and it's not going to get better by resending it again and
> again.  If kdump kernels limit the number of possible CPUs, it needs to
> reflected in cpu_possible_map and we need to use that information.
> 

Can you look at the previous comments from the kdump/arch folks about
kdump usage & num_possible_cpus?

    https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
    https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/

The point is that kdump kernels do not limit the number of possible CPUs.

1) some archs support 'nr_cpus=1' for the kdump kernel, which is fine,
since num_possible_cpus() becomes 1.

2) some archs do not support 'nr_cpus=1' and have to rely on 'maxcpus=1',
so num_possible_cpus() isn't changed and the kernel just boots with a
single online CPU. That causes trouble because blk-mq limits itself to a
single queue.

Documentation/admin-guide/kdump/kdump.rst
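
Just to illustrate case 2) with a minimal sketch (the function is only an
example using the standard cpumask helpers):

/*
 * With only 'maxcpus=1', num_possible_cpus() stays at the full CPU count
 * while num_online_cpus() is 1; with 'nr_cpus=1' both become 1.
 */
static void show_kdump_cpu_counts(void)
{
	pr_info("possible=%u online=%u\n",
		num_possible_cpus(), num_online_cpus());
}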

Thanks, 
Ming



* Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-10  0:09     ` Ming Lei
@ 2023-08-10  1:18       ` Baoquan He
  2023-08-10  2:06         ` Ming Lei
  2023-08-11 13:10       ` Christoph Hellwig
  1 sibling, 1 reply; 25+ messages in thread
From: Baoquan He @ 2023-08-10  1:18 UTC (permalink / raw)
  To: Ming Lei, mpe, npiggin, christophe.leroy
  Cc: Christoph Hellwig, Jens Axboe, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch, linuxppc-dev,
	Dave Young, kexec, linux-kernel, Pingfan Liu

On 08/10/23 at 08:09am, Ming Lei wrote:
> On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > I'm starting to sound like a broken record, but we can't just do random
> > is_kdump checks, and it's not going to get better by resending it again and
> > again.  If kdump kernels limit the number of possible CPUs, it needs to
> > reflected in cpu_possible_map and we need to use that information.
> > 
> 
> Can you look at previous kdump/arch guys' comment about kdump usage &
> num_possible_cpus?
> 
>     https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
>     https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> 
> The point is that kdump kernels does not limit the number of possible CPUs.
> 
> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> num_possible_cpus becomes 1.

Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
limits the possible cpu numbers, while "maxcpuss=" only limits the cpu
number which can be brought up during bootup. We noticed this diference
because a large number of possible cpus will cost more memory in kdump
kernel. e.g percpu initialization, even though kdump kernel have set
"maxcpus=1". 

Currently x86 and arm64 all support "nr_cpus=". Pingfan ever spent much
effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
dev and maintainers do not care about it. Finally the patches are not
accepted, and the work is not continued.

Now, I am wondering what is the barrier to add "nr_cpus=" to power ach.
Can we reconsider adding 'nr_cpus=' to power arch since real issue
occurred in kdump kernel?

As for this patchset, it can be accpeted so that no failure in kdump
kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.

> 
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'max_cpus=1', so num_possible_cpus isn't changed, and kernel just boots
> with single online cpu. That causes trouble because blk-mq limits single
> queue.
> 
> Documentation/admin-guide/kdump/kdump.rst
> 
> Thanks, 
> Ming
> 



* Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-10  1:18       ` Baoquan He
@ 2023-08-10  2:06         ` Ming Lei
  2023-08-10  3:01           ` Baoquan He
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2023-08-10  2:06 UTC (permalink / raw)
  To: Baoquan He
  Cc: mpe, npiggin, christophe.leroy, Christoph Hellwig, Jens Axboe,
	linux-nvme, Martin K . Petersen, linux-scsi, linux-block,
	Wen Xiong, Keith Busch, linuxppc-dev, Dave Young, kexec,
	linux-kernel, Pingfan Liu

On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> On 08/10/23 at 08:09am, Ming Lei wrote:
> > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > I'm starting to sound like a broken record, but we can't just do random
> > > is_kdump checks, and it's not going to get better by resending it again and
> > > again.  If kdump kernels limit the number of possible CPUs, it needs to
> > > reflected in cpu_possible_map and we need to use that information.
> > > 
> > 
> > Can you look at previous kdump/arch guys' comment about kdump usage &
> > num_possible_cpus?
> > 
> >     https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> >     https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > 
> > The point is that kdump kernels does not limit the number of possible CPUs.
> > 
> > 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> > num_possible_cpus becomes 1.
> 
> Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
> limits the possible cpu numbers, while "maxcpuss=" only limits the cpu
> number which can be brought up during bootup. We noticed this diference
> because a large number of possible cpus will cost more memory in kdump
> kernel. e.g percpu initialization, even though kdump kernel have set
> "maxcpus=1". 
> 
> Currently x86 and arm64 all support "nr_cpus=". Pingfan ever spent much
> effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
> dev and maintainers do not care about it. Finally the patches are not
> accepted, and the work is not continued.
> 
> Now, I am wondering what is the barrier to add "nr_cpus=" to power ach.
> Can we reconsider adding 'nr_cpus=' to power arch since real issue
> occurred in kdump kernel?

If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.

> 
> As for this patchset, it can be accpeted so that no failure in kdump
> kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.

IMO 'nr_cpus=' support should be preferred, given that it is annoying to
maintain two kinds of implementation for the kdump kernel from the
driver's viewpoint. I guess kdump handling could also be simplified by
supporting 'nr_cpus=' only.

thanks,
Ming



* Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-10  2:06         ` Ming Lei
@ 2023-08-10  3:01           ` Baoquan He
  2023-08-11  7:53             ` Hari Bathini
  0 siblings, 1 reply; 25+ messages in thread
From: Baoquan He @ 2023-08-10  3:01 UTC (permalink / raw)
  To: Ming Lei
  Cc: mpe, npiggin, christophe.leroy, Christoph Hellwig, Jens Axboe,
	linux-nvme, Martin K . Petersen, linux-scsi, linux-block,
	Wen Xiong, Keith Busch, linuxppc-dev, Dave Young, kexec,
	linux-kernel, Pingfan Liu

On 08/10/23 at 10:06am, Ming Lei wrote:
> On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> > On 08/10/23 at 08:09am, Ming Lei wrote:
> > > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > > I'm starting to sound like a broken record, but we can't just do random
> > > > is_kdump checks, and it's not going to get better by resending it again and
> > > > again.  If kdump kernels limit the number of possible CPUs, it needs to
> > > > reflected in cpu_possible_map and we need to use that information.
> > > > 
> > > 
> > > Can you look at previous kdump/arch guys' comment about kdump usage &
> > > num_possible_cpus?
> > > 
> > >     https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> > >     https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > > 
> > > The point is that kdump kernels does not limit the number of possible CPUs.
> > > 
> > > 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> > > num_possible_cpus becomes 1.
> > 
> > Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
> > limits the possible cpu numbers, while "maxcpuss=" only limits the cpu
> > number which can be brought up during bootup. We noticed this diference
> > because a large number of possible cpus will cost more memory in kdump
> > kernel. e.g percpu initialization, even though kdump kernel have set
> > "maxcpus=1". 
> > 
> > Currently x86 and arm64 all support "nr_cpus=". Pingfan ever spent much
> > effort to make patches to add "nr_cpus=" support to ppc64, seems ppc64
> > dev and maintainers do not care about it. Finally the patches are not
> > accepted, and the work is not continued.
> > 
> > Now, I am wondering what is the barrier to add "nr_cpus=" to power ach.
> > Can we reconsider adding 'nr_cpus=' to power arch since real issue
> > occurred in kdump kernel?
> 
> If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
> 
> > 
> > As for this patchset, it can be accpeted so that no failure in kdump
> > kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.
> 
> IMO 'nr_cpus=' support should be preferred, given it is annoying to
> maintain two kinds of implementation for kdump kernel from driver
> viewpoint. I guess kdump things can be simplified too with supporting
> 'nr_cpus=' only.

Yes, 'nr_cpus=' is ideal. I am not sure whether there are underlying
concerns that led the power people to decide not to support it.



* Re: [PATCH V3 14/14] blk-mq: add helpers for treating kdump kernel
  2023-08-08 10:42 ` [PATCH V3 14/14] blk-mq: add helpers for treating kdump kernel Ming Lei
@ 2023-08-10  8:00   ` kernel test robot
  0 siblings, 0 replies; 25+ messages in thread
From: kernel test robot @ 2023-08-10  8:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: oe-lkp, lkp, Christoph Hellwig, linux-block, Jens Axboe,
	linux-nvme, Martin K . Petersen, linux-scsi, Wen Xiong,
	Keith Busch, Ming Lei, oliver.sang



Hello,

kernel test robot noticed "WARNING:at_drivers/block/null_blk/main.c:#null_map_queues" on:

commit: 8ec7debf0b62ddf5f62e18b886925462215ab98b ("[PATCH V3 14/14] blk-mq: add helpers for treating kdump kernel")
url: https://github.com/intel-lab-lkp/linux/commits/Ming-Lei/blk-mq-add-blk_mq_max_nr_hw_queues/20230809-003555
base: https://git.kernel.org/cgit/linux/kernel/git/mkp/scsi.git for-next
patch link: https://lore.kernel.org/all/20230808104239.146085-15-ming.lei@redhat.com/
patch subject: [PATCH V3 14/14] blk-mq: add helpers for treating kdump kernel

in testcase: boot

compiler: gcc-12
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-----------------------------------------------------------+------------+------------+
|                                                           | 27da637b41 | 8ec7debf0b |
+-----------------------------------------------------------+------------+------------+
| boot_successes                                            | 12         | 0          |
| boot_failures                                             | 0          | 12         |
| WARNING:at_drivers/block/null_blk/main.c:#null_map_queues | 0          | 12         |
| EIP:null_map_queues                                       | 0          | 12         |
| BUG:kernel_NULL_pointer_dereference,address               | 0          | 12         |
| Oops:#[##]                                                | 0          | 12         |
| EIP:group_cpus_evenly                                     | 0          | 12         |
| Kernel_panic-not_syncing:Fatal_exception                  | 0          | 12         |
+-----------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202308101503.67a2d533-oliver.sang@intel.com



[    7.742766][    T1] null_blk: tag set has unexpected nr_hw_queues: 1
[    7.744704][    T1] ------------[ cut here ]------------
[    7.745825][    T1] WARNING: CPU: 0 PID: 1 at drivers/block/null_blk/main.c:1615 null_map_queues+0x56/0xdb
[    7.748029][    T1] Modules linked in:
[    7.748923][    T1] CPU: 0 PID: 1 Comm: swapper Tainted: G                T  6.5.0-rc1-00100-g8ec7debf0b62 #10 05d847e43b9b6f584ad59352c89de6920a7d94da
[    7.751662][    T1] EIP: null_map_queues+0x56/0xdb
[    7.753366][    T1] Code: b0 b4 01 00 00 8d 0c 37 39 ca 74 29 8b b8 b0 01 00 00 8b b0 b8 01 00 00 8d 04 37 39 c2 74 16 52 68 6d 34 9b 42 e8 f8 89 84 ff <0f> 0b 31
 f6 5a bf 01 00 00 00 59 8d 43 04 31 c9 31 d2 89 45 f0 3b
[    7.757272][    T1] EAX: 00000030 EBX: 46131e1c ECX: 00000000 EDX: 00000000
[    7.758794][    T1] ESI: 00000001 EDI: 00000001 EBP: 40343e88 ESP: 40343e68
[    7.760257][    T1] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010246
[    7.761810][    T1] CR0: 80050033 CR2: ffd99000 CR3: 0343e000 CR4: 000406d0
[    7.763254][    T1] Call Trace:
[    7.763953][    T1]  ? show_regs+0x60/0x70
[    7.764872][    T1]  ? null_map_queues+0x56/0xdb
[    7.765908][    T1]  ? __warn+0x8c/0x10a
[    7.766754][    T1]  ? report_bug+0xdd/0x13e
[    7.767719][    T1]  ? null_map_queues+0x56/0xdb
[    7.768721][    T1]  ? exc_overflow+0x41/0x41
[    7.769668][    T1]  ? handle_bug+0x2b/0x53
[    7.770532][    T1]  ? exc_invalid_op+0x24/0x6a
[    7.771501][    T1]  ? handle_exception+0x11d/0x11d
[    7.772572][    T1]  ? lockdep_next_lockchain+0x18/0x2b
[    7.773697][    T1]  ? exc_overflow+0x41/0x41
[    7.774604][    T1]  ? null_map_queues+0x56/0xdb
[    7.775593][    T1]  ? exc_overflow+0x41/0x41
[    7.776561][    T1]  ? null_map_queues+0x56/0xdb
[    7.777534][    T1]  ? kmalloc_array_node+0x19/0x28
[    7.778828][    T1]  ? blk_mq_update_queue_map+0x57/0x7e
[    7.779975][    T1]  ? blk_mq_alloc_tag_set+0x1eb/0x353
[    7.781115][    T1]  ? null_init_tag_set+0xd7/0xe6
[    7.782155][    T1]  ? null_add_dev+0x1d9/0x5ef
[    7.783128][    T1]  ? null_alloc_dev+0x7a/0x1c2
[    7.784144][    T1]  ? null_init+0x26d/0x36b
[    7.785126][    T1]  ? do_one_initcall+0x77/0x1b9
[    7.786097][    T1]  ? virtio_blk_init+0xbf/0xbf
[    7.787076][    T1]  ? do_initcalls+0x176/0x1ba
[    7.788080][    T1]  ? kernel_init_freeable+0xe9/0x13c
[    7.792635][    T1]  ? rest_init+0x11d/0x11d
[    7.793557][    T1]  ? kernel_init+0x12/0xf7
[    7.794469][    T1]  ? ret_from_fork+0x1c/0x30
[    7.795465][    T1] irq event stamp: 321017
[    7.796352][    T1] hardirqs last  enabled at (321027): [<41095389>] __up_console_sem+0x59/0x71
[    7.798190][    T1] hardirqs last disabled at (321036): [<41095370>] __up_console_sem+0x40/0x71
[    7.800028][    T1] softirqs last  enabled at (320984): [<41f25ca9>] __do_softirq+0x279/0x2b7
[    7.801854][    T1] softirqs last disabled at (320975): [<4101cf25>] call_on_stack+0x40/0x50
[    7.803562][    T1] ---[ end trace 0000000000000000 ]---
[    7.804950][    T1] BUG: kernel NULL pointer dereference, address: 00000010
[    7.805688][    T1] #PF: supervisor write access in kernel mode
[    7.805688][    T1] #PF: error_code(0x0002) - not-present page
[    7.805688][    T1] *pde = 00000000
[    7.805688][    T1] Oops: 0002 [#1] PREEMPT
[    7.805688][    T1] CPU: 0 PID: 1 Comm: swapper Tainted: G        W       T  6.5.0-rc1-00100-g8ec7debf0b62 #10 05d847e43b9b6f584ad59352c89de6920a7d94da
[    7.805688][    T1] EIP: group_cpus_evenly+0x24/0x2e
[    7.805688][    T1] Code: c9 e9 a1 91 8d 00 55 89 e5 ba 04 00 00 00 f7 e2 70 0e ba c0 0d 00 00 e8 2e 87 b4 ff 85 c0 75 04 31 c0 eb 08 8b 15 a8 1b 34 43 <89> 10 5d
 31 d2 e9 73 91 8d 00 55 89 e5 56 53 8b 1d 84 5f 32 43 85
[    7.805688][    T1] EAX: 00000010 EBX: 46131e38 ECX: 00000000 EDX: 00000001
[    7.805688][    T1] ESI: 00000000 EDI: 00000001 EBP: 40343e58 ESP: 40343e58
[    7.805688][    T1] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010202
[    7.805688][    T1] CR0: 80050033 CR2: 00000010 CR3: 0343e000 CR4: 000406d0
[    7.805688][    T1] Call Trace:
[    7.805688][    T1]  ? show_regs+0x60/0x70
[    7.805688][    T1]  ? __die_body+0x13/0x52
[    7.805688][    T1]  ? __die+0x22/0x2c
[    7.805688][    T1]  ? page_fault_oops+0x4c/0x7f
[    7.805688][    T1]  ? kernelmode_fixup_or_oops+0x8b/0x9d
[    7.805688][    T1]  ? __bad_area_nosemaphore+0x40/0x16c
[    7.805688][    T1]  ? bad_area_nosemaphore+0xa/0x17
[    7.805688][    T1]  ? do_user_addr_fault+0xdd/0x396
[    7.805688][    T1]  ? trace_irq_disable+0x3b/0x4e
[    7.805688][    T1]  ? exc_page_fault+0xf6/0x120
[    7.805688][    T1]  ? pvclock_clocksource_read_nowd+0x167/0x167
[    7.805688][    T1]  ? handle_exception+0x11d/0x11d
[    7.805688][    T1]  ? pvclock_clocksource_read_nowd+0x167/0x167
[    7.805688][    T1]  ? group_cpus_evenly+0x24/0x2e
[    7.805688][    T1]  ? pvclock_clocksource_read_nowd+0x167/0x167
[    7.805688][    T1]  ? group_cpus_evenly+0x24/0x2e
[    7.805688][    T1]  ? blk_mq_map_queues+0xf/0x46
[    7.805688][    T1]  ? null_map_queues+0xbc/0xdb
[    7.805688][    T1]  ? blk_mq_update_queue_map+0x57/0x7e
[    7.805688][    T1]  ? blk_mq_alloc_tag_set+0x1eb/0x353
[    7.805688][    T1]  ? null_init_tag_set+0xd7/0xe6
[    7.805688][    T1]  ? null_add_dev+0x1d9/0x5ef
[    7.805688][    T1]  ? null_alloc_dev+0x7a/0x1c2
[    7.805688][    T1]  ? null_init+0x26d/0x36b
[    7.805688][    T1]  ? do_one_initcall+0x77/0x1b9
[    7.805688][    T1]  ? virtio_blk_init+0xbf/0xbf
[    7.805688][    T1]  ? do_initcalls+0x176/0x1ba
[    7.805688][    T1]  ? kernel_init_freeable+0xe9/0x13c
[    7.805688][    T1]  ? rest_init+0x11d/0x11d
[    7.805688][    T1]  ? kernel_init+0x12/0xf7
[    7.805688][    T1]  ? ret_from_fork+0x1c/0x30
[    7.805688][    T1] Modules linked in:
[    7.805688][    T1] CR2: 0000000000000010
[    7.805688][    T1] ---[ end trace 0000000000000000 ]---
[    7.805688][    T1] EIP: group_cpus_evenly+0x24/0x2e
[    7.805688][    T1] Code: c9 e9 a1 91 8d 00 55 89 e5 ba 04 00 00 00 f7 e2 70 0e ba c0 0d 00 00 e8 2e 87 b4 ff 85 c0 75 04 31 c0 eb 08 8b 15 a8 1b 34 43 <89> 10 5d
 31 d2 e9 73 91 8d 00 55 89 e5 56 53 8b 1d 84 5f 32 43 85
[    7.805688][    T1] EAX: 00000010 EBX: 46131e38 ECX: 00000000 EDX: 00000001
[    7.805688][    T1] ESI: 00000000 EDI: 00000001 EBP: 40343e58 ESP: 40343e58
[    7.805688][    T1] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010202
[    7.805688][    T1] CR0: 80050033 CR2: 00000010 CR3: 0343e000 CR4: 000406d0
[    7.805688][    T1] Kernel panic - not syncing: Fatal exception
[    7.805688][    T1] Kernel Offset: disabled



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230810/202308101503.67a2d533-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 04/14] virtio-blk: limit max allowed submit queues
  2023-08-08 10:42 ` [PATCH V3 04/14] virtio-blk: limit max allowed submit queues Ming Lei
@ 2023-08-10 19:23   ` Michael S. Tsirkin
  0 siblings, 0 replies; 25+ messages in thread
From: Michael S. Tsirkin @ 2023-08-10 19:23 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch, Jason Wang,
	virtualization

On Tue, Aug 08, 2023 at 06:42:29PM +0800, Ming Lei wrote:
> Take blk-mq's knowledge into account for calculating io queues.
> 
> Fix wrong queue mapping in case of kdump kernel.
> 
> On arm and ppc64, 'maxcpus=1' is passed on the kdump command line (see
> `Documentation/admin-guide/kdump/kdump.rst`), so num_possible_cpus()
> still returns all CPUs, because 'maxcpus=1' only brings up a single
> cpu core during boot.
> 
> blk-mq sees a single queue in the kdump kernel, while from the driver's
> viewpoint there are still multiple queues. This inconsistency causes the
> driver to apply a wrong queue mapping for handling IO, and IO timeouts
> are triggered.
> 
> Meanwhile, a single queue uses far fewer resources and reduces the
> risk of kernel failure.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

superficially:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

but this patch only makes sense if the rest of the patchset is merged.
Feel free to merge it directly.

> ---
>  drivers/block/virtio_blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 1fe011676d07..4ba79fe2a1b4 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -1047,7 +1047,8 @@ static int init_vq(struct virtio_blk *vblk)
>  
>  	num_poll_vqs = min_t(unsigned int, poll_queues, num_vqs - 1);
>  
> -	vblk->io_queues[HCTX_TYPE_DEFAULT] = num_vqs - num_poll_vqs;
> +	vblk->io_queues[HCTX_TYPE_DEFAULT] = min_t(unsigned,
> +			num_vqs - num_poll_vqs, blk_mq_max_nr_hw_queues());
>  	vblk->io_queues[HCTX_TYPE_READ] = 0;
>  	vblk->io_queues[HCTX_TYPE_POLL] = num_poll_vqs;
>  
> -- 
> 2.40.1
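
For reference, the clamp above depends on the blk_mq_max_nr_hw_queues()
helper added in patch 01. A minimal sketch of what such a helper could look
like, assuming it simply reports a single queue for a kdump kernel and the
possible-CPU count otherwise (the actual implementation in patch 01 may
differ), is:

/*
 * Sketch only, not the code from patch 01: report the maximum number of
 * hardware queues blk-mq will actually use, so drivers can size their own
 * queue/vector allocations consistently.
 */
#include <linux/blk-mq.h>
#include <linux/crash_dump.h>	/* is_kdump_kernel() */
#include <linux/cpumask.h>	/* nr_cpu_ids */

unsigned int blk_mq_max_nr_hw_queues(void)
{
	/* blk-mq falls back to a single queue in a kdump kernel. */
	if (is_kdump_kernel())
		return 1;

	return nr_cpu_ids;
}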


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-10  3:01           ` Baoquan He
@ 2023-08-11  7:53             ` Hari Bathini
  2023-09-05  5:03               ` Baoquan He
  0 siblings, 1 reply; 25+ messages in thread
From: Hari Bathini @ 2023-08-11  7:53 UTC (permalink / raw)
  To: Baoquan He, Ming Lei
  Cc: Jens Axboe, linux-scsi, Martin K . Petersen, Pingfan Liu,
	Dave Young, npiggin, linux-kernel, linux-nvme, linux-block,
	Wen Xiong, kexec, Keith Busch, linuxppc-dev, Christoph Hellwig,
	Mahesh J Salgaonkar



On 10/08/23 8:31 am, Baoquan He wrote:
> On 08/10/23 at 10:06am, Ming Lei wrote:
>> On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
>>> On 08/10/23 at 08:09am, Ming Lei wrote:
>>>> On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
>>>>> I'm starting to sound like a broken record, but we can't just do random
>>>>> is_kdump checks, and it's not going to get better by resending it again and
>>>>> again.  If kdump kernels limit the number of possible CPUs, it needs to be
>>>>> reflected in cpu_possible_map and we need to use that information.
>>>>>
>>>>
>>>> Can you look at the previous kdump/arch guys' comments about kdump usage &
>>>> num_possible_cpus?
>>>>
>>>>      https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
>>>>      https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
>>>>
>>>> The point is that kdump kernels do not limit the number of possible CPUs.
>>>>
>>>> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
>>>> num_possible_cpus becomes 1.
>>>
>>> Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
>>> limits the possible cpu numbers, while "maxcpuss=" only limits the cpu
>>> number which can be brought up during bootup. We noticed this diference
>>> because a large number of possible cpus will cost more memory in kdump
>>> kernel. e.g percpu initialization, even though kdump kernel have set
>>> "maxcpus=1".
>>>
>>> Currently both x86 and arm64 support "nr_cpus=". Pingfan once spent much
>>> effort on patches to add "nr_cpus=" support to ppc64, but the ppc64
>>> developers and maintainers did not seem to care about it. In the end the
>>> patches were not accepted, and the work was not continued.
>>>
>>> Now, I am wondering what the barrier is to adding "nr_cpus=" to the power arch.
>>> Can we reconsider adding 'nr_cpus=' to the power arch, since a real issue
>>> has occurred in the kdump kernel?
>>
>> If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
>>
>>>
>>> As for this patchset, can it be accepted so that no failure in the kdump
>>> kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.
>>
>> IMO 'nr_cpus=' support should be preferred, given it is annoying to
>> maintain two kinds of implementations for the kdump kernel from the driver's
>> viewpoint. I guess kdump itself can be simplified too by supporting
>> 'nr_cpus=' only.
> 
> Yes, 'nr_cpus=' is ideal. Not sure if there's some underlying concern that
> made the power people decide not to support it.

Though "nr_cpus=1" is an ideal solution, maintainer was not happy with
the patch as the code changes have impact for regular boot path and
it is likely to cause breakages. So, even if "nr_cpus=1" support for
ppc64 is revived, the change is going to take time to be accepted
upstream.

Also, I see is_kdump_kernel() being used, irrespective of "nr_cpus=1"
support, for other driver optimizations tailored to the special dump-capture
environment that kdump is.

If there is no downside to using is_kdump_kernel() in driver code other
than the maintainability aspect, I think the above changes are worth
considering.
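
For illustration, that kind of kdump-only driver tuning typically looks
something like the sketch below; the structure and field names are
hypothetical and not taken from any in-tree driver:

/*
 * Illustrative only: a hypothetical driver trimming its resources when
 * probed inside a crash-capture (kdump) kernel.
 */
#include <linux/crash_dump.h>	/* is_kdump_kernel() */
#include <linux/minmax.h>

struct example_hba {
	unsigned int nr_io_queues;
	unsigned int can_queue;
	bool use_msix;
};

static void example_adjust_for_kdump(struct example_hba *hba)
{
	if (!is_kdump_kernel())
		return;

	/* Crash-capture kernel: minimize memory and interrupt usage. */
	hba->nr_io_queues = 1;
	hba->can_queue = min(hba->can_queue, 64u);
	hba->use_msix = false;
}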

Thanks
Hari

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-10  0:09     ` Ming Lei
  2023-08-10  1:18       ` Baoquan He
@ 2023-08-11 13:10       ` Christoph Hellwig
  1 sibling, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2023-08-11 13:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch, linuxppc-dev,
	Dave Young, kexec, linux-kernel, Baoquan He, Pingfan Liu

On Thu, Aug 10, 2023 at 08:09:27AM +0800, Ming Lei wrote:
> 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> num_possible_cpus becomes 1.
> 
> 2) some archs do not support 'nr_cpus=1', and have to rely on
> 'maxcpus=1', so num_possible_cpus() isn't changed and the kernel just boots
> with a single online cpu. That causes trouble because blk-mq limits itself
> to a single queue.

And we need to fix case 2.  We need to drop the is_kdump support, and
if they want to force fewer CPUs they need to make nr_cpus=1 work.
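
For illustration, a small sketch of why case 2 goes wrong; the function
name here is hypothetical and not from the series:

/*
 * Illustrative only. A driver that sizes its queues from the possible-CPU
 * count behaves differently in the two kdump setups described above:
 *
 *  - with 'nr_cpus=1', num_possible_cpus() == 1, so the driver and blk-mq
 *    both end up with a single hardware queue;
 *  - with only 'maxcpus=1', num_possible_cpus() still reports the full CPU
 *    count, the driver allocates many queues, and blk-mq still falls back
 *    to one queue in the kdump kernel, so the two sides no longer agree on
 *    the queue mapping.
 */
#include <linux/cpumask.h>
#include <linux/minmax.h>

static unsigned int example_driver_nr_queues(unsigned int nr_hw_vectors)
{
	return min(nr_hw_vectors, num_possible_cpus());
}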


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-08-11  7:53             ` Hari Bathini
@ 2023-09-05  5:03               ` Baoquan He
  0 siblings, 0 replies; 25+ messages in thread
From: Baoquan He @ 2023-09-05  5:03 UTC (permalink / raw)
  To: Hari Bathini, mpe
  Cc: Ming Lei, Jens Axboe, linux-scsi, Martin K . Petersen,
	Pingfan Liu, Dave Young, npiggin, linux-kernel, linux-nvme,
	linux-block, Wen Xiong, kexec, Keith Busch, linuxppc-dev,
	Christoph Hellwig, Mahesh J Salgaonkar

Hi Hari, Michael

On 08/11/23 at 01:23pm, Hari Bathini wrote:
> 
> 
> On 10/08/23 8:31 am, Baoquan He wrote:
> > On 08/10/23 at 10:06am, Ming Lei wrote:
> > > On Thu, Aug 10, 2023 at 09:18:27AM +0800, Baoquan He wrote:
> > > > On 08/10/23 at 08:09am, Ming Lei wrote:
> > > > > On Wed, Aug 09, 2023 at 03:44:01PM +0200, Christoph Hellwig wrote:
> > > > > > I'm starting to sound like a broken record, but we can't just do random
> > > > > > is_kdump checks, and it's not going to get better by resending it again and
> > > > > > again.  If kdump kernels limit the number of possible CPUs, it needs to be
> > > > > > reflected in cpu_possible_map and we need to use that information.
> > > > > > 
> > > > > 
> > > > > Can you look at the previous kdump/arch guys' comments about kdump usage &
> > > > > num_possible_cpus?
> > > > > 
> > > > >      https://lore.kernel.org/linux-block/CAF+s44RuqswbosY9kMDx35crviQnxOeuvgNsuE75Bb0Y2Jg2uw@mail.gmail.com/
> > > > >      https://lore.kernel.org/linux-block/ZKz912KyFQ7q9qwL@MiWiFi-R3L-srv/
> > > > > 
> > > > > The point is that kdump kernels do not limit the number of possible CPUs.
> > > > > 
> > > > > 1) some archs support 'nr_cpus=1' for kdump kernel, which is fine, since
> > > > > num_possible_cpus becomes 1.
> > > > 
> > > > Yes, "nr_cpus=" is strongly suggested in kdump kernel because "nr_cpus="
> > > > limits the possible cpu numbers, while "maxcpuss=" only limits the cpu
> > > > number which can be brought up during bootup. We noticed this diference
> > > > because a large number of possible cpus will cost more memory in kdump
> > > > kernel. e.g percpu initialization, even though kdump kernel have set
> > > > "maxcpus=1".
> > > > 
> > > > Currently both x86 and arm64 support "nr_cpus=". Pingfan once spent much
> > > > effort on patches to add "nr_cpus=" support to ppc64, but the ppc64
> > > > developers and maintainers did not seem to care about it. In the end the
> > > > patches were not accepted, and the work was not continued.
> > > > 
> > > > Now, I am wondering what the barrier is to adding "nr_cpus=" to the power arch.
> > > > Can we reconsider adding 'nr_cpus=' to the power arch, since a real issue
> > > > has occurred in the kdump kernel?
> > > 
> > > If 'nr_cpus=' can be supported on ppc64, this patchset isn't needed.
> > > 
> > > > 
> > > > As for this patchset, can it be accepted so that no failure in the kdump
> > > > kernel is seen on ARCHes w/o "nr_cpus=" support? My personal opinion.
> > > 
> > > IMO 'nr_cpus=' support should be preferred, given it is annoying to
> > > maintain two kinds of implementations for the kdump kernel from the driver's
> > > viewpoint. I guess kdump itself can be simplified too by supporting
> > > 'nr_cpus=' only.
> > 
> > Yes, 'nr_cpus=' is ideal. Not sure if there's some underlying concern that
> > made the power people decide not to support it.
> 
> Though "nr_cpus=1" is an ideal solution, maintainer was not happy with
> the patch as the code changes have impact for regular boot path and
> it is likely to cause breakages. So, even if "nr_cpus=1" support for
> ppc64 is revived, the change is going to take time to be accepted
> upstream.

I talked to Pingfan recently; he said he posted patches to add 'nr_cpus='
support to powerpc in order to reduce the amount of memory the kdump kernel
needs. His patches were rejected because the maintainer thought the
justification was not sufficient. So, up to now, among the architectures for
which Fedora/RHEL provides a default crashkernel reservation value, powerpc
costs the most. Now, with this emerging issue, can we reconsider supporting
'nr_cpus=' in powerpc?

> 
> Also, I see is_kdump_kernel() being used, irrespective of "nr_cpus=1"
> support, for other driver optimizations tailored to the special dump-capture
> environment that kdump is.
> 
> If there is no downside to using is_kdump_kernel() in driver code other
> than the maintainability aspect, I think the above changes are worth
> considering.

Hi Hari,

By the way, will you use the ppc-specific is_kdump_kernel() and
is_crashdump_kernel() in your patches to fix this issue?

Thanks
Baoquan


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2023-09-05 16:24 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-08 10:42 [PATCH V3 0/14] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
2023-08-08 10:42 ` [PATCH V3 01/14] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
2023-08-09 13:44   ` Christoph Hellwig
2023-08-10  0:09     ` Ming Lei
2023-08-10  1:18       ` Baoquan He
2023-08-10  2:06         ` Ming Lei
2023-08-10  3:01           ` Baoquan He
2023-08-11  7:53             ` Hari Bathini
2023-09-05  5:03               ` Baoquan He
2023-08-11 13:10       ` Christoph Hellwig
2023-08-08 10:42 ` [PATCH V3 02/14] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues Ming Lei
2023-08-08 10:42 ` [PATCH V3 03/14] ublk: limit max allowed nr_hw_queues Ming Lei
2023-08-08 10:42 ` [PATCH V3 04/14] virtio-blk: limit max allowed submit queues Ming Lei
2023-08-10 19:23   ` Michael S. Tsirkin
2023-08-08 10:42 ` [PATCH V3 05/14] scsi: core: add helper of scsi_max_nr_hw_queues() Ming Lei
2023-08-08 10:42 ` [PATCH V3 06/14] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors Ming Lei
2023-08-08 10:42 ` [PATCH V3 07/14] scsi: mpi3mr: take blk_mq_max_nr_hw_queues() into account for calculating " Ming Lei
2023-08-08 10:42 ` [PATCH V3 08/14] scsi: megaraid: " Ming Lei
2023-08-08 10:42 ` [PATCH V3 09/14] scsi: mpt3sas: " Ming Lei
2023-08-08 10:42 ` [PATCH V3 10/14] scsi: pm8001: " Ming Lei
2023-08-08 10:42 ` [PATCH V3 11/14] scsi: hisi: " Ming Lei
2023-08-08 10:42 ` [PATCH V3 12/14] scsi: ufs: limit max allowed nr_hw_queues Ming Lei
2023-08-08 10:42 ` [PATCH V3 13/14] scsi: storvsc: " Ming Lei
2023-08-08 10:42 ` [PATCH V3 14/14] blk-mq: add helpers for treating kdump kernel Ming Lei
2023-08-10  8:00   ` kernel test robot
