linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel
@ 2023-07-12 12:54 Ming Lei
  2023-07-12 12:54 ` [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

Hi,

On arm and ppc64, 'maxcpus=1' is required for kdump kernel,
see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
still returns all CPUs because 'maxcpus=1' just bring up one single
cpu core during booting.

blk-mq sees single queue in kdump kernel, and in driver's viewpoint
there are still multiple queues, this inconsistency causes driver to apply
wrong queue mapping for handling IO, and IO timeout is triggered.

This issue is only triggered on managed irq in case of multiple hw
queues. Some drivers takes online cpus into account for nr_hw_queues,
and don't have such issue, such as nvme rdma/tcp.

Meantime, single queue makes much less resource utilization, and reduce
risk of kernel failure.


Thanks,
Ming

Ming Lei (8):
  blk-mq: add blk_mq_max_nr_hw_queues()
  nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues
  scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors
  scsi: hisi: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: mpi3mr: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: megaraid: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: mpt3sas: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors
  scsi: pm8001: take blk_mq_max_nr_hw_queues() into account for
    calculating io vectors

 block/blk-mq.c                            | 9 +++++++++
 drivers/nvme/host/pci.c                   | 2 +-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c    | 3 +++
 drivers/scsi/lpfc/lpfc_init.c             | 1 +
 drivers/scsi/megaraid/megaraid_sas_base.c | 6 +++++-
 drivers/scsi/mpi3mr/mpi3mr_fw.c           | 3 +++
 drivers/scsi/mpt3sas/mpt3sas_base.c       | 4 ++--
 drivers/scsi/pm8001/pm8001_init.c         | 4 +++-
 include/linux/blk-mq.h                    | 1 +
 9 files changed, 28 insertions(+), 5 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
@ 2023-07-12 12:54 ` Ming Lei
  2023-07-12 13:00   ` Christoph Hellwig
  2023-07-12 12:54 ` [PATCH 2/8] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues Ming Lei
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

blk_mq_alloc_tag_set() may override set->nr_hw_queues as 1 in case of kdump
kernel. This way causes trouble for driver, because blk-mq and driver see
different queue mapping. Especially the only online CPU may not be 1 for
kdump kernel, in which 'maxcpus=1' is passed from kernel command line,
then driver may map hctx0 into one inactive real hw queue which cpu
affinity is 0(offline).

The issue exists on all drivers which use managed irq and support
multiple hw queue.

Prepare for fixing this kind of issue by applying the added helper, so
driver can take blk-mq max nr_hw_queues knowledge into account when
calculating io queues.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 9 +++++++++
 include/linux/blk-mq.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5504719b970d..b764da69a416 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -140,6 +140,15 @@ void blk_mq_freeze_queue_wait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
+/* Max nr_hw_queues for each hw queue type */
+unsigned int blk_mq_max_nr_hw_queues(void)
+{
+	if (is_kdump_kernel())
+		return 1;
+	return nr_cpu_ids;
+}
+EXPORT_SYMBOL_GPL(blk_mq_max_nr_hw_queues);
+
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 				     unsigned long timeout)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2b7fb8e87793..2407978fbc30 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -713,6 +713,7 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set,
 		const struct blk_mq_ops *ops, unsigned int queue_depth,
 		unsigned int set_flags);
 void blk_mq_free_tag_set(struct blk_mq_tag_set *set);
+unsigned int blk_mq_max_nr_hw_queues(void);
 
 void blk_mq_free_request(struct request *rq);
 int blk_rq_poll(struct request *rq, struct io_comp_batch *iob,
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/8] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues
  2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
  2023-07-12 12:54 ` [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
@ 2023-07-12 12:54 ` Ming Lei
  2023-07-12 12:54 ` [PATCH 3/8] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors Ming Lei
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei

Take blk-mq's knowledge into account for calculating io queues.

Fix wrong queue mapping in case of kdump kernel.

On arm and ppc64, 'maxcpus=1' is passed to kdump command line, see
`Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
still returns all CPUs because 'maxcpus=1' just bring up one single
cpu core during booting.

blk-mq sees single queue in kdump kernel, and in driver's viewpoint
there are still multiple queues, this inconsistency causes driver to apply
wrong queue mapping for handling IO, and IO timeout is triggered.

Meantime, single queue makes much less resource utilization, and reduce
risk of kernel failure.

Reported-by: Wen Xiong <wenxiong@linux.ibm.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 72725729cb6c..cb13ba203956 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2247,7 +2247,7 @@ static unsigned int nvme_max_io_queues(struct nvme_dev *dev)
 	 */
 	if (dev->ctrl.quirks & NVME_QUIRK_SHARED_TAGS)
 		return 1;
-	return num_possible_cpus() + dev->nr_write_queues + dev->nr_poll_queues;
+	return blk_mq_max_nr_hw_queues() + dev->nr_write_queues + dev->nr_poll_queues;
 }
 
 static int nvme_setup_io_queues(struct nvme_dev *dev)
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/8] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors
  2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
  2023-07-12 12:54 ` [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
  2023-07-12 12:54 ` [PATCH 2/8] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues Ming Lei
@ 2023-07-12 12:54 ` Ming Lei
  2023-07-13  0:03   ` Justin Tee
  2023-07-12 12:54 ` [PATCH 4/8] scsi: hisi: take blk_mq_max_nr_hw_queues() into account for calculating " Ming Lei
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, James Smart

Take blk-mq's knowledge into account for calculating io queues.

Fix wrong queue mapping in case of kdump kernel.

On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
still returns all CPUs because 'maxcpus=1' just bring up one single
cpu core during booting.

blk-mq sees single queue in kdump kernel, and in driver's viewpoint
there are still multiple queues, this inconsistency causes driver to apply
wrong queue mapping for handling IO, and IO timeout is triggered.

Meantime, single queue makes much less resource utilization, and reduce
risk of kernel failure.

Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/lpfc/lpfc_init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 3221a934066b..88314c3f1dc1 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -13012,6 +13012,7 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
 	if (phba->irq_chann_mode != NORMAL_MODE)
 		aff_mask = &phba->sli4_hba.irq_aff_mask;
 
+	vectors = min_t(unsigned int, vectors, blk_mq_max_nr_hw_queues());
 	if (aff_mask) {
 		cpu_cnt = cpumask_weight(aff_mask);
 		vectors = min(phba->cfg_irq_chann, cpu_cnt);
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/8] scsi: hisi: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (2 preceding siblings ...)
  2023-07-12 12:54 ` [PATCH 3/8] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors Ming Lei
@ 2023-07-12 12:54 ` Ming Lei
  2023-07-12 12:54 ` [PATCH 5/8] scsi: mpi3mr: " Ming Lei
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Xiang Chen

Take blk-mq's knowledge into account for calculating io queues.

Fix wrong queue mapping in case of kdump kernel.

On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
still returns all CPUs because 'maxcpus=1' just bring up one single
cpu core during booting.

blk-mq sees single queue in kdump kernel, and in driver's viewpoint
there are still multiple queues, this inconsistency causes driver to apply
wrong queue mapping for handling IO, and IO timeout is triggered.

Meantime, single queue makes much less resource utilization, and reduce
risk of kernel failure.

Cc: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 20e1607c6282..bbec11220286 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2550,6 +2550,9 @@ static int interrupt_preinit_v3_hw(struct hisi_hba *hisi_hba)
 
 
 	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW - hisi_hba->iopoll_q_cnt;
+	if (hisi_hba->cq_nvecs > blk_mq_max_nr_hw_queues())
+		hisi_hba->cq_nvecs = blk_mq_max_nr_hw_queues();
+
 	shost->nr_hw_queues = hisi_hba->cq_nvecs + hisi_hba->iopoll_q_cnt;
 
 	return devm_add_action(&pdev->dev, hisi_sas_v3_free_vectors, pdev);
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/8] scsi: mpi3mr: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (3 preceding siblings ...)
  2023-07-12 12:54 ` [PATCH 4/8] scsi: hisi: take blk_mq_max_nr_hw_queues() into account for calculating " Ming Lei
@ 2023-07-12 12:54 ` Ming Lei
  2023-07-12 12:54 ` [PATCH 6/8] scsi: megaraid: " Ming Lei
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Sreekanth Reddy,
	Sathya Prakash Veerichetty, Kashyap Desai, Sumit Saxena

Take blk-mq's knowledge into account for calculating io queues.

Fix wrong queue mapping in case of kdump kernel.

On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
still returns all CPUs because 'maxcpus=1' just bring up one single
cpu core during booting.

blk-mq sees single queue in kdump kernel, and in driver's viewpoint
there are still multiple queues, this inconsistency causes driver to apply
wrong queue mapping for handling IO, and IO timeout is triggered.

Meantime, single queue makes much less resource utilization, and reduce
risk of kernel failure.

Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/mpi3mr/mpi3mr_fw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/mpi3mr/mpi3mr_fw.c b/drivers/scsi/mpi3mr/mpi3mr_fw.c
index 5fa07d6ee5b8..79ffd161575c 100644
--- a/drivers/scsi/mpi3mr/mpi3mr_fw.c
+++ b/drivers/scsi/mpi3mr/mpi3mr_fw.c
@@ -815,6 +815,9 @@ static int mpi3mr_setup_isr(struct mpi3mr_ioc *mrioc, u8 setup_one)
 
 		desc.post_vectors = mrioc->requested_poll_qcount;
 		min_vec = desc.pre_vectors + desc.post_vectors;
+		if (max_vectors - min_vec > blk_mq_max_nr_hw_queues())
+			max_vectors = min_vec + blk_mq_max_nr_hw_queues();
+
 		irq_flags |= PCI_IRQ_AFFINITY | PCI_IRQ_ALL_TYPES;
 
 		retval = pci_alloc_irq_vectors_affinity(mrioc->pdev,
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/8] scsi: megaraid: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (4 preceding siblings ...)
  2023-07-12 12:54 ` [PATCH 5/8] scsi: mpi3mr: " Ming Lei
@ 2023-07-12 12:54 ` Ming Lei
  2023-07-12 12:54 ` [PATCH 7/8] scsi: mpt3sas: " Ming Lei
  2023-07-12 12:54 ` [PATCH 8/8] scsi: pm8001: " Ming Lei
  7 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Sreekanth Reddy,
	Sathya Prakash Veerichetty, Kashyap Desai, Sumit Saxena

Take blk-mq's knowledge into account for calculating io queues.

Fix wrong queue mapping in case of kdump kernel.

On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
still returns all CPUs because 'maxcpus=1' just bring up one single
cpu core during booting.

blk-mq sees single queue in kdump kernel, and in driver's viewpoint
there are still multiple queues, this inconsistency causes driver to apply
wrong queue mapping for handling IO, and IO timeout is triggered.

Meantime, single queue makes much less resource utilization, and reduce
risk of kernel failure.

Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 050eed8e2684..214713803c2e 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5921,6 +5921,10 @@ __megasas_alloc_irq_vectors(struct megasas_instance *instance)
 	int i, irq_flags;
 	struct irq_affinity desc = { .pre_vectors = instance->low_latency_index_start };
 	struct irq_affinity *descp = &desc;
+	unsigned max_vecs = instance->msix_vectors - instance->iopoll_q_count;
+
+	if (max_vecs > blk_mq_max_nr_hw_queues())
+		max_vecs = blk_mq_max_nr_hw_queues();
 
 	irq_flags = PCI_IRQ_MSIX;
 
@@ -5934,7 +5938,7 @@ __megasas_alloc_irq_vectors(struct megasas_instance *instance)
 	 */
 	i = pci_alloc_irq_vectors_affinity(instance->pdev,
 		instance->low_latency_index_start,
-		instance->msix_vectors - instance->iopoll_q_count, irq_flags, descp);
+		max_vecs, irq_flags, descp);
 
 	return i;
 }
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/8] scsi: mpt3sas: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (5 preceding siblings ...)
  2023-07-12 12:54 ` [PATCH 6/8] scsi: megaraid: " Ming Lei
@ 2023-07-12 12:54 ` Ming Lei
  2023-07-12 12:54 ` [PATCH 8/8] scsi: pm8001: " Ming Lei
  7 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Sathya Prakash,
	Sreekanth Reddy, Suganath Prabu Subramani

Take blk-mq's knowledge into account for calculating io queues.

Fix wrong queue mapping in case of kdump kernel.

On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
still returns all CPUs because 'maxcpus=1' just bring up one single
cpu core during booting.

blk-mq sees single queue in kdump kernel, and in driver's viewpoint
there are still multiple queues, this inconsistency causes driver to apply
wrong queue mapping for handling IO, and IO timeout is triggered.

Meantime, single queue makes much less resource utilization, and reduce
risk of kernel failure.

Cc: Sathya Prakash <sathya.prakash@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/mpt3sas/mpt3sas_base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 53f5492579cb..18a73d3ff00b 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3332,8 +3332,8 @@ _base_alloc_irq_vectors(struct MPT3SAS_ADAPTER *ioc)
 	 * Don't allocate msix vectors for poll_queues.
 	 * msix_vectors is always within a range of FW supported reply queue.
 	 */
-	int nr_msix_vectors = ioc->iopoll_q_start_index;
-
+	int nr_msix_vectors = min_t(unsigned int, ioc->iopoll_q_start_index,
+			blk_mq_max_nr_hw_queues());
 
 	if (ioc->smp_affinity_enable)
 		irq_flags |= PCI_IRQ_AFFINITY | PCI_IRQ_ALL_TYPES;
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 8/8] scsi: pm8001: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
                   ` (6 preceding siblings ...)
  2023-07-12 12:54 ` [PATCH 7/8] scsi: mpt3sas: " Ming Lei
@ 2023-07-12 12:54 ` Ming Lei
  2023-07-12 13:12   ` Jinpu Wang
  7 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2023-07-12 12:54 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi
  Cc: linux-block, Wen Xiong, Keith Busch, Ming Lei, Jack Wang

Take blk-mq's knowledge into account for calculating io queues.

Fix wrong queue mapping in case of kdump kernel.

On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
still returns all CPUs because 'maxcpus=1' just bring up one single
cpu core during booting.

blk-mq sees single queue in kdump kernel, and in driver's viewpoint
there are still multiple queues, this inconsistency causes driver to apply
wrong queue mapping for handling IO, and IO timeout is triggered.

Meantime, single queue makes much less resource utilization, and reduce
risk of kernel failure.

Cc: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/pm8001/pm8001_init.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
index 2e886c1d867d..e2416f556560 100644
--- a/drivers/scsi/pm8001/pm8001_init.c
+++ b/drivers/scsi/pm8001/pm8001_init.c
@@ -965,6 +965,8 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info *pm8001_ha)
 		rc = pci_alloc_irq_vectors(pm8001_ha->pdev, 1, 1,
 					   PCI_IRQ_MSIX);
 	} else {
+		unsigned int max_vecs = min_t(unsigned int, PM8001_MAX_MSIX_VEC,
+				blk_mq_max_nr_hw_queues() + 1);
 		/*
 		 * Queue index #0 is used always for housekeeping, so don't
 		 * include in the affinity spreading.
@@ -973,7 +975,7 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info *pm8001_ha)
 			.pre_vectors = 1,
 		};
 		rc = pci_alloc_irq_vectors_affinity(
-				pm8001_ha->pdev, 2, PM8001_MAX_MSIX_VEC,
+				pm8001_ha->pdev, 2, max_vecs,
 				PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &desc);
 	}
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-07-12 12:54 ` [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
@ 2023-07-12 13:00   ` Christoph Hellwig
  2023-07-12 13:16     ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-07-12 13:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch

On Wed, Jul 12, 2023 at 08:54:48PM +0800, Ming Lei wrote:
> +/* Max nr_hw_queues for each hw queue type */
> +unsigned int blk_mq_max_nr_hw_queues(void)
> +{
> +	if (is_kdump_kernel())
> +		return 1;
> +	return nr_cpu_ids;

Again, these is_kdump_kernel hacks don't make any sense.   The amount
of maximum available CPU needs to come through a proper API, and we
need to use it, not add hacks like this.

The only thing that makes sense here is to find the last CPU
in cpu_possible_mask, and for kdump kernels to ensure that number
is 1 or whatever low value they want.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 8/8] scsi: pm8001: take blk_mq_max_nr_hw_queues() into account for calculating io vectors
  2023-07-12 12:54 ` [PATCH 8/8] scsi: pm8001: " Ming Lei
@ 2023-07-12 13:12   ` Jinpu Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Jinpu Wang @ 2023-07-12 13:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch, Jack Wang

On Wed, Jul 12, 2023 at 2:55 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> Take blk-mq's knowledge into account for calculating io queues.
>
> Fix wrong queue mapping in case of kdump kernel.
>
> On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
> see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
> still returns all CPUs because 'maxcpus=1' just bring up one single
> cpu core during booting.
>
> blk-mq sees single queue in kdump kernel, and in driver's viewpoint
> there are still multiple queues, this inconsistency causes driver to apply
> wrong queue mapping for handling IO, and IO timeout is triggered.
>
> Meantime, single queue makes much less resource utilization, and reduce
> risk of kernel failure.
>
> Cc: Jack Wang <jinpu.wang@cloud.ionos.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
thx!
> ---
>  drivers/scsi/pm8001/pm8001_init.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c
> index 2e886c1d867d..e2416f556560 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -965,6 +965,8 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info *pm8001_ha)
>                 rc = pci_alloc_irq_vectors(pm8001_ha->pdev, 1, 1,
>                                            PCI_IRQ_MSIX);
>         } else {
> +               unsigned int max_vecs = min_t(unsigned int, PM8001_MAX_MSIX_VEC,
> +                               blk_mq_max_nr_hw_queues() + 1);
>                 /*
>                  * Queue index #0 is used always for housekeeping, so don't
>                  * include in the affinity spreading.
> @@ -973,7 +975,7 @@ static u32 pm8001_setup_msix(struct pm8001_hba_info *pm8001_ha)
>                         .pre_vectors = 1,
>                 };
>                 rc = pci_alloc_irq_vectors_affinity(
> -                               pm8001_ha->pdev, 2, PM8001_MAX_MSIX_VEC,
> +                               pm8001_ha->pdev, 2, max_vecs,
>                                 PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &desc);
>         }
>
> --
> 2.40.1
>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-07-12 13:00   ` Christoph Hellwig
@ 2023-07-12 13:16     ` Ming Lei
  2023-07-12 13:19       ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2023-07-12 13:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-nvme, Martin K . Petersen, linux-scsi,
	linux-block, Wen Xiong, Keith Busch, Thomas Gleixner,
	linux-kernel

On Wed, Jul 12, 2023 at 03:00:17PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 12, 2023 at 08:54:48PM +0800, Ming Lei wrote:
> > +/* Max nr_hw_queues for each hw queue type */
> > +unsigned int blk_mq_max_nr_hw_queues(void)
> > +{
> > +	if (is_kdump_kernel())
> > +		return 1;
> > +	return nr_cpu_ids;
> 
> Again, these is_kdump_kernel hacks don't make any sense.   The amount
> of maximum available CPU needs to come through a proper API, and we
> need to use it, not add hacks like this.
> 
> The only thing that makes sense here is to find the last CPU
> in cpu_possible_mask, and for kdump kernels to ensure that number
> is 1 or whatever low value they want.

It doesn't matter how many cpus are available, given at least one
cpu is online.

The problem is that blk_mq_alloc_tag_set() forces to set nr_hw_queues
as 1 for kdump kernel, that is why blk_mq_max_nr_hw_queues() has to
return 1 for kdump kernel.

We have to tell driver that blk-mq only supports 1 queue for kdump
kernel.

Or:

Thomas, can we disable managed irq for kdump kernel and switch to
non-managed irq? Then we can avoid driver's change. I'd suggest
this way if it is possible.

Thanks,
Ming



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-07-12 13:16     ` Ming Lei
@ 2023-07-12 13:19       ` Christoph Hellwig
  2023-07-12 13:31         ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2023-07-12 13:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch, Thomas Gleixner,
	linux-kernel

On Wed, Jul 12, 2023 at 09:16:11PM +0800, Ming Lei wrote:
> The problem is that blk_mq_alloc_tag_set() forces to set nr_hw_queues
> as 1 for kdump kernel, that is why blk_mq_max_nr_hw_queues() has to
> return 1 for kdump kernel.

Well, let's fix that first and work from there.  Same argument against
that deep magic applies there as well.

> Thomas, can we disable managed irq for kdump kernel and switch to
> non-managed irq? Then we can avoid driver's change. I'd suggest
> this way if it is possible.

Why the heck would we?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues()
  2023-07-12 13:19       ` Christoph Hellwig
@ 2023-07-12 13:31         ` Ming Lei
  0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-07-12 13:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-nvme, Martin K . Petersen, linux-scsi,
	linux-block, Wen Xiong, Keith Busch, Thomas Gleixner,
	linux-kernel, ming.lei

On Wed, Jul 12, 2023 at 03:19:25PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 12, 2023 at 09:16:11PM +0800, Ming Lei wrote:
> > The problem is that blk_mq_alloc_tag_set() forces to set nr_hw_queues
> > as 1 for kdump kernel, that is why blk_mq_max_nr_hw_queues() has to
> > return 1 for kdump kernel.
> 
> Well, let's fix that first and work from there.  Same argument against
> that deep magic applies there as well.

In short, driver needs to figure out nr_hw_queues first by hardware info,
then pass it to blk_mq_alloc_tag_set(), but blk_mq_alloc_tag_set() changes it,
so inconsistency is caused.

The only solution in this way is to tell driver the max supported
number from the beginning, that is what this patchset is doing.

> 
> > Thomas, can we disable managed irq for kdump kernel and switch to
> > non-managed irq? Then we can avoid driver's change. I'd suggest
> > this way if it is possible.
> 
> Why the heck would we?

IMO irq kernel doesn't make sense in kdump kernel, which is very
resource limited and has to be reliable.

PCI_IRQ_AFFINITY can be just one hint, pci_alloc_irq_vectors_affinity()
still allocates affinity in managed way, then queue mapping can work
just fine, and the only difference is that genirq handles this irqs
as non-manged wrt. migration.

This way should solve queue mapping issue, but driver still allocates
lots of queues, which take resource useless. So looks we still have to
fix drivers.


Thanks, 
Ming



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/8] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors
  2023-07-12 12:54 ` [PATCH 3/8] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors Ming Lei
@ 2023-07-13  0:03   ` Justin Tee
  2023-07-21  8:48     ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Justin Tee @ 2023-07-13  0:03 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch, James Smart,
	Justin Tee

Hi Ming,

A few lines below in if (aff_mask), vectors can be overwritten again
with min(phba->cfg_irq_chann, cpu_cnt).

Perhaps we should move blk_mq_max_nr_hw_queues min comparison a little later:

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 3221a934066b..20410789e8b8 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -13025,6 +13025,8 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
                flags |= PCI_IRQ_AFFINITY;
        }

+       vectors = min_t(unsigned int, vectors, blk_mq_max_nr_hw_queues());
+
        rc = pci_alloc_irq_vectors(phba->pcidev, 1, vectors, flags);
        if (rc < 0) {
                lpfc_printf_log(phba, KERN_INFO, LOG_INIT,


Regards,
Justin

On Wed, Jul 12, 2023 at 6:04 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> Take blk-mq's knowledge into account for calculating io queues.
>
> Fix wrong queue mapping in case of kdump kernel.
>
> On arm and ppc64, 'maxcpus=1' is passed to kdump kernel command line,
> see `Documentation/admin-guide/kdump/kdump.rst`, so num_possible_cpus()
> still returns all CPUs because 'maxcpus=1' just bring up one single
> cpu core during booting.
>
> blk-mq sees single queue in kdump kernel, and in driver's viewpoint
> there are still multiple queues, this inconsistency causes driver to apply
> wrong queue mapping for handling IO, and IO timeout is triggered.
>
> Meantime, single queue makes much less resource utilization, and reduce
> risk of kernel failure.
>
> Cc: James Smart <james.smart@broadcom.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/lpfc/lpfc_init.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 3221a934066b..88314c3f1dc1 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -13012,6 +13012,7 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
>         if (phba->irq_chann_mode != NORMAL_MODE)
>                 aff_mask = &phba->sli4_hba.irq_aff_mask;
>
> +       vectors = min_t(unsigned int, vectors, blk_mq_max_nr_hw_queues());
>         if (aff_mask) {
>                 cpu_cnt = cpumask_weight(aff_mask);
>                 vectors = min(phba->cfg_irq_chann, cpu_cnt);
> --
> 2.40.1
>


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/8] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors
  2023-07-13  0:03   ` Justin Tee
@ 2023-07-21  8:48     ` Ming Lei
  0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2023-07-21  8:48 UTC (permalink / raw)
  To: Justin Tee
  Cc: Jens Axboe, Christoph Hellwig, linux-nvme, Martin K . Petersen,
	linux-scsi, linux-block, Wen Xiong, Keith Busch, James Smart,
	Justin Tee

On Wed, Jul 12, 2023 at 05:03:19PM -0700, Justin Tee wrote:
> Hi Ming,
> 
> A few lines below in if (aff_mask), vectors can be overwritten again
> with min(phba->cfg_irq_chann, cpu_cnt).
> 
> Perhaps we should move blk_mq_max_nr_hw_queues min comparison a little later:
> 
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 3221a934066b..20410789e8b8 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -13025,6 +13025,8 @@ lpfc_sli4_enable_msix(struct lpfc_hba *phba)
>                 flags |= PCI_IRQ_AFFINITY;
>         }
> 
> +       vectors = min_t(unsigned int, vectors, blk_mq_max_nr_hw_queues());
> +
>         rc = pci_alloc_irq_vectors(phba->pcidev, 1, vectors, flags);
>         if (rc < 0) {
>                 lpfc_printf_log(phba, KERN_INFO, LOG_INIT,

Hi Justin, 

Indeed, will take it in next version.

Thanks,
Ming



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-07-21  8:49 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-12 12:54 [PATCH 0/8] blk-mq: fix wrong queue mapping for kdump kernel Ming Lei
2023-07-12 12:54 ` [PATCH 1/8] blk-mq: add blk_mq_max_nr_hw_queues() Ming Lei
2023-07-12 13:00   ` Christoph Hellwig
2023-07-12 13:16     ` Ming Lei
2023-07-12 13:19       ` Christoph Hellwig
2023-07-12 13:31         ` Ming Lei
2023-07-12 12:54 ` [PATCH 2/8] nvme-pci: use blk_mq_max_nr_hw_queues() to calculate io queues Ming Lei
2023-07-12 12:54 ` [PATCH 3/8] scsi: lpfc: use blk_mq_max_nr_hw_queues() to calculate io vectors Ming Lei
2023-07-13  0:03   ` Justin Tee
2023-07-21  8:48     ` Ming Lei
2023-07-12 12:54 ` [PATCH 4/8] scsi: hisi: take blk_mq_max_nr_hw_queues() into account for calculating " Ming Lei
2023-07-12 12:54 ` [PATCH 5/8] scsi: mpi3mr: " Ming Lei
2023-07-12 12:54 ` [PATCH 6/8] scsi: megaraid: " Ming Lei
2023-07-12 12:54 ` [PATCH 7/8] scsi: mpt3sas: " Ming Lei
2023-07-12 12:54 ` [PATCH 8/8] scsi: pm8001: " Ming Lei
2023-07-12 13:12   ` Jinpu Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).