public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Christoph Hellwig <hch@lst.de>,
	Keith Busch <keith.busch@intel.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme: default to 0 poll queues
Date: Sat, 8 Dec 2018 22:38:47 -0700	[thread overview]
Message-ID: <fa7aabad-774b-4e5b-686f-3696762291c1@kernel.dk> (raw)
In-Reply-To: <20181209004953.GA11638@roeck-us.net>

On 12/8/18 5:49 PM, Guenter Roeck wrote:
> Hi,
> 
> On Mon, Nov 19, 2018 at 08:18:24AM -0700, Jens Axboe wrote:
>> We need a better way of configuring this, and given that polling is
>> (still) a bit niche, let's default to using 0 poll queues. That way
>> we'll have the same read/write/poll behavior as 4.20, and users that
>> want to test/use polling are required to do manual configuration of the
>> number of poll queues.
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
> 
> This patch results in a boot stall when booting parisc (hppa) images
> from nvme in qemu.
> 
> ...
> Fusion MPT SAS Host driver 3.04.20
> rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> rcu: 	(detected by 0, t=5252 jiffies, g=141, q=22)
> rcu: All QSes seen, last rcu_sched kthread activity 5252 (-66742--71994), jiffies_till_next_fqs=1, root ->qsmask 0x0
> kworker/u8:3    R  running task        0    85      2 0x00000004
> Workqueue: nvme-reset-wq nvme_reset_work
> Backtrace:
>  [<10190d20>] show_stack+0x28/0x38
>  [<101dd1e0>] sched_show_task.part.3+0xc4/0x144
>  [<101dd290>] sched_show_task+0x30/0x38
>  [<10221e18>] rcu_check_callbacks+0x760/0x7a4
> 
> rcu: rcu_sched kthread starved for 5252 jiffies! g141 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> rcu: RCU grace-period kthread stack dump:
> rcu_sched       R  running task        0    10      2 0x00000000
> Backtrace:
>  [<10995b1c>] __schedule+0x214/0x648
>  [<10995f94>] schedule+0x44/0xa8
>  [<1099a7c4>] schedule_timeout+0x114/0x1a0
>  [<10220e70>] rcu_gp_kthread+0x744/0x968
>  [<101d5438>] kthread+0x154/0x15c
>  [<1019501c>] ret_from_kernel_thread+0x1c/0x24
> 
> [ continued ]
> 
> This is only seen in SMP configurations; non-SMP configurations are ok.
> Reverting the patch fixes the problem. v4.20-rcX and earlier kernels
> also boot without problems.
> 
> For reference, here is the qemu command line. This is with qemu 3.0.
> 
> qemu-system-hppa -kernel vmlinux -no-reboot \
> 	-snapshot \
> 	-device nvme,serial=foo,drive=d0 \
> 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
> 	-nographic -monitor null
> 
> Please let me know if you need additional information.

Hmm, I think the queue reduction case has a logic error. Actually there
are two bugs:

1) Ensure we don't keep overwriting the queue count we ask for
2) Don't include poll_queues in the vectors we need

Untested... And not super pretty. But does this work for you?


diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7732c4979a4e..fe00e19493ae 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2083,7 +2083,7 @@ static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int nr_io_queues)
 	}
 }
 
-static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
+static int nvme_setup_irqs(struct nvme_dev *dev, int irq_queues, int pqueues)
 {
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 	int irq_sets[2];
@@ -2100,7 +2100,8 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 	 * IRQ vector needs.
 	 */
 	do {
-		nvme_calc_io_queues(dev, nr_io_queues);
+		nvme_calc_io_queues(dev, irq_queues + pqueues);
+		pqueues = dev->io_queues[HCTX_TYPE_POLL];
 		irq_sets[0] = dev->io_queues[HCTX_TYPE_DEFAULT];
 		irq_sets[1] = dev->io_queues[HCTX_TYPE_READ];
 		if (!irq_sets[1])
@@ -2111,11 +2112,11 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 		 * 1 + 1 queues, just ask for a single vector. We'll share
 		 * that between the single IO queue and the admin queue.
 		 */
-		if (!(result < 0 && nr_io_queues == 1))
-			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
+		if (!(result < 0 || irq_queues == 1))
+			irq_queues = irq_sets[0] + irq_sets[1] + 1;
 
-		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
-				nr_io_queues,
+		result = pci_alloc_irq_vectors_affinity(pdev, irq_queues,
+				irq_queues,
 				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
 
 		/*
@@ -2125,12 +2126,12 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 		 * likely does not. Back down to ask for just one vector.
 		 */
 		if (result == -ENOSPC) {
-			nr_io_queues--;
-			if (!nr_io_queues)
+			irq_queues--;
+			if (!irq_queues)
 				return result;
 			continue;
 		} else if (result == -EINVAL) {
-			nr_io_queues = 1;
+			irq_queues = 1;
 			continue;
 		} else if (result <= 0)
 			return -EIO;
@@ -2144,7 +2145,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 {
 	struct nvme_queue *adminq = &dev->queues[0];
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
-	int result, nr_io_queues;
+	int result, want_irqs, nr_io_queues, pqueues;
 	unsigned long size;
 
 	nr_io_queues = max_io_queues();
@@ -2185,7 +2186,20 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	 */
 	pci_free_irq_vectors(pdev);
 
-	result = nvme_setup_irqs(dev, nr_io_queues);
+	/*
+	 * If we don't get the number of IO queues we asked for, see if we
+	 * need to adjust the number of poll queues down
+	 */
+	pqueues = poll_queues;
+	if (!pqueues)
+		want_irqs = nr_io_queues;
+	else if (pqueues >= nr_io_queues) {
+		want_irqs = 1;
+		pqueues = nr_io_queues - 1;
+	} else
+		want_irqs = nr_io_queues - pqueues;
+
+	result = nvme_setup_irqs(dev, want_irqs, pqueues);
 	if (result <= 0)
 		return -EIO;
 

-- 
Jens Axboe


  reply	other threads:[~2018-12-09  5:39 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-09  0:49 [PATCH] nvme: default to 0 poll queues Guenter Roeck
2018-12-09  5:38 ` Jens Axboe [this message]
2018-12-09  6:22   ` Guenter Roeck
2018-12-09  6:31     ` Jens Axboe
2018-12-09  7:32       ` Guenter Roeck
2018-12-09 18:18       ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa7aabad-774b-4e5b-686f-3696762291c1@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux@roeck-us.net \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox