* [PATCH] NVMe: Re-introduce polling for completions
@ 2016-04-04 22:24 Keith Busch
2016-04-05 7:32 ` Johannes Thumshirn
2016-04-05 12:35 ` Christoph Hellwig
0 siblings, 2 replies; 12+ messages in thread
From: Keith Busch @ 2016-04-04 22:24 UTC (permalink / raw)
Multiple users have reported device initialization failure on some
platforms due the driver not receiving expected interrupts. This is not
the fault of any particular controller. This patch polls for completions
on the admin queue in the watchdog timer, and starts the timer immediately
after controller initialization completes, just before the first admin
command is issued.
Reported-by: Tim Muhlemmer <muhlemmer at gmail.com>
Reported-by: Jon Derrick <jonathan.derrick at intel.com>
Signed-off-by: Keith Busch <keith.busch at intel.com>
---
drivers/nvme/host/pci.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 29f31bc..9e03fe3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1354,6 +1354,7 @@ static void nvme_watchdog_timer(unsigned long data)
{
struct nvme_dev *dev = (struct nvme_dev *)data;
u32 csts = readl(dev->bar + NVME_REG_CSTS);
+ struct nvme_queue *admin_q = dev->queues[0];
/*
* Skip controllers currently under reset.
@@ -1369,6 +1370,10 @@ static void nvme_watchdog_timer(unsigned long data)
return;
}
+ spin_lock_irq(&admin_q->q_lock);
+ nvme_process_cq(admin_q);
+ spin_unlock_irq(&admin_q->q_lock);
+
mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + HZ));
}
@@ -1877,6 +1882,8 @@ static void nvme_reset_work(struct work_struct *work)
if (result)
goto out;
+ mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + HZ));
+
result = nvme_init_identify(&dev->ctrl);
if (result)
goto out;
@@ -1888,8 +1895,6 @@ static void nvme_reset_work(struct work_struct *work)
dev->ctrl.event_limit = NVME_NR_AEN_COMMANDS;
queue_work(nvme_workq, &dev->async_work);
- mod_timer(&dev->watchdog_timer, round_jiffies(jiffies + HZ));
-
/*
* Keep the controller around but remove all namespaces if we don't have
* any working I/O queue.
--
2.7.2
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH] NVMe: Re-introduce polling for completions
2016-04-04 22:24 [PATCH] NVMe: Re-introduce polling for completions Keith Busch
@ 2016-04-05 7:32 ` Johannes Thumshirn
2016-04-05 12:35 ` Christoph Hellwig
1 sibling, 0 replies; 12+ messages in thread
From: Johannes Thumshirn @ 2016-04-05 7:32 UTC (permalink / raw)
On 2016-04-05 00:24, Keith Busch wrote:
> Multiple users have reported device initialization failure on some
> platforms due the driver not receiving expected interrupts. This is not
> the fault of any particular controller. This patch polls for
> completions
> on the admin queue in the watchdog timer, and starts the timer
> immediately
> after controller initialization completes, just before the first admin
> command is issued.
>
> Reported-by: Tim Muhlemmer <muhlemmer at gmail.com>
> Reported-by: Jon Derrick <jonathan.derrick at intel.com>
> Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] NVMe: Re-introduce polling for completions
2016-04-04 22:24 [PATCH] NVMe: Re-introduce polling for completions Keith Busch
2016-04-05 7:32 ` Johannes Thumshirn
@ 2016-04-05 12:35 ` Christoph Hellwig
2016-04-05 17:54 ` Keith Busch
1 sibling, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-04-05 12:35 UTC (permalink / raw)
On Mon, Apr 04, 2016@04:24:49PM -0600, Keith Busch wrote:
> Multiple users have reported device initialization failure on some
> platforms due the driver not receiving expected interrupts. This is not
> the fault of any particular controller. This patch polls for completions
> on the admin queue in the watchdog timer, and starts the timer immediately
> after controller initialization completes, just before the first admin
> command is issued.
So how do any other PCIe device work given that almost no driver does
unconditionaly polling?
> + spin_lock_irq(&admin_q->q_lock);
> + nvme_process_cq(admin_q);
> + spin_unlock_irq(&admin_q->q_lock);
Add a comment on why this is done. Given that only the admin queue
is polled it's going to be a really convoluted explanation which should
be spelled out in detail.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] NVMe: Re-introduce polling for completions
2016-04-05 12:35 ` Christoph Hellwig
@ 2016-04-05 17:54 ` Keith Busch
2016-04-05 21:39 ` Keith Busch
0 siblings, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-04-05 17:54 UTC (permalink / raw)
On Tue, Apr 05, 2016@05:35:23AM -0700, Christoph Hellwig wrote:
> So how do any other PCIe device work given that almost no driver does
> unconditionaly polling?
I honestly don't have a good answer to that, and I agree with you
that this shouldn't be necessary ... but this is a harmless way to not
frustrate people who purchase these devices.
I suspect other PCI device drivers either at least try to never use the
legacy IRQ, or they've never been tested in platforms that break them.
> > + spin_lock_irq(&admin_q->q_lock);
> > + nvme_process_cq(admin_q);
> > + spin_unlock_irq(&admin_q->q_lock);
>
> Add a comment on why this is done. Given that only the admin queue
> is polled it's going to be a really convoluted explanation which should
> be spelled out in detail.
Sounds good.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] NVMe: Re-introduce polling for completions
2016-04-05 17:54 ` Keith Busch
@ 2016-04-05 21:39 ` Keith Busch
2016-04-05 22:04 ` Judy Brock-SSI
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Keith Busch @ 2016-04-05 21:39 UTC (permalink / raw)
On Tue, Apr 05, 2016@05:54:37PM +0000, Keith Busch wrote:
> On Tue, Apr 05, 2016@05:35:23AM -0700, Christoph Hellwig wrote:
> > So how do any other PCIe device work given that almost no driver does
> > unconditionaly polling?
>
> I honestly don't have a good answer to that, and I agree with you
> that this shouldn't be necessary ... but this is a harmless way to not
> frustrate people who purchase these devices.
>
> I suspect other PCI device drivers either at least try to never use the
> legacy IRQ, or they've never been tested in platforms that break them.
Before I resend with the requested code comments, I'd like to hear your
opinion on an alternate solution (patch below).
Currently the driver uses legacy IRQ if available only until we know how
many queues it can create. The legacy IRQ use is tripping up some h/w,
but I've not heard such issues with MSI/MSI-x.
Would you prefer going straight to MSI-x? The driver does that when
there is no INTx, but we wouldn't need to poll from the watchdog timer
if we make MSI-x the default behavior,
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9e03fe3..5954f40 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1518,7 +1518,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
* If we enable msix early due to not intx, disable it again before
* setting up the full range we need.
*/
- if (!pdev->irq)
+ if (pdev->msi_enabled)
+ pci_disable_msi(pdev);
+ else if (pdev->msix_enabled)
pci_disable_msix(pdev);
for (i = 0; i < nr_io_queues; i++)
@@ -1701,7 +1703,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
if (pci_enable_device_mem(pdev))
return result;
- dev->entry[0].vector = pdev->irq;
pci_set_master(pdev);
if (dma_set_mask_and_coherent(dev->dev, DMA_BIT_MASK(64)) &&
@@ -1714,13 +1715,18 @@ static int nvme_pci_enable(struct nvme_dev *dev)
}
/*
- * Some devices don't advertse INTx interrupts, pre-enable a single
- * MSIX vec for setup. We'll adjust this later.
+ * Some devices and/or platforms don't advertise or work with INTx
+ * interrupts. Pre-enable a single MSIX or MSI vec for setup. We'll
+ * adjust this later.
*/
- if (!pdev->irq) {
- result = pci_enable_msix(pdev, dev->entry, 1);
- if (result < 0)
- goto disable;
+ if (pci_enable_msix(pdev, dev->entry, 1)) {
+ pci_enable_msi(pdev);
+ dev->entry[0].vector = pdev->irq;
+ }
+
+ if (!dev->entry[0].vector) {
+ result = -ENODEV;
+ goto disable;
}
cap = lo_hi_readq(dev->bar + NVME_REG_CAP);
--
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH] NVMe: Re-introduce polling for completions
2016-04-05 21:39 ` Keith Busch
@ 2016-04-05 22:04 ` Judy Brock-SSI
2016-04-05 22:29 ` Keith Busch
2016-04-06 6:56 ` Christoph Hellwig
2016-04-06 6:56 ` Christoph Hellwig
2016-04-12 19:00 ` Christoph Hellwig
2 siblings, 2 replies; 12+ messages in thread
From: Judy Brock-SSI @ 2016-04-05 22:04 UTC (permalink / raw)
Going straight to MSI-x is a higher risk change since re-introducing polling restores code that was in place previously with no reported problems whereas changing the init path to something new has no test time on it.
Restoring the polling (with comments as requested) is a harmless way to regain general ecosystem compatibility.
Thanks,
Judy
-----Original Message-----
From: Linux-nvme [mailto:linux-nvme-bounces@lists.infradead.org] On Behalf Of Keith Busch
Sent: Tuesday, April 05, 2016 2:39 PM
To: Christoph Hellwig
Cc: Jens Axboe; linux-nvme at lists.infradead.org
Subject: Re: [PATCH] NVMe: Re-introduce polling for completions
On Tue, Apr 05, 2016@05:54:37PM +0000, Keith Busch wrote:
> On Tue, Apr 05, 2016@05:35:23AM -0700, Christoph Hellwig wrote:
> > So how do any other PCIe device work given that almost no driver
> > does unconditionaly polling?
>
> I honestly don't have a good answer to that, and I agree with you that
> this shouldn't be necessary ... but this is a harmless way to not
> frustrate people who purchase these devices.
>
> I suspect other PCI device drivers either at least try to never use
> the legacy IRQ, or they've never been tested in platforms that break them.
Before I resend with the requested code comments, I'd like to hear your opinion on an alternate solution (patch below).
Currently the driver uses legacy IRQ if available only until we know how many queues it can create. The legacy IRQ use is tripping up some h/w, but I've not heard such issues with MSI/MSI-x.
Would you prefer going straight to MSI-x? The driver does that when there is no INTx, but we wouldn't need to poll from the watchdog timer if we make MSI-x the default behavior,
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 9e03fe3..5954f40 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1518,7 +1518,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
* If we enable msix early due to not intx, disable it again before
* setting up the full range we need.
*/
- if (!pdev->irq)
+ if (pdev->msi_enabled)
+ pci_disable_msi(pdev);
+ else if (pdev->msix_enabled)
pci_disable_msix(pdev);
for (i = 0; i < nr_io_queues; i++)
@@ -1701,7 +1703,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
if (pci_enable_device_mem(pdev))
return result;
- dev->entry[0].vector = pdev->irq;
pci_set_master(pdev);
if (dma_set_mask_and_coherent(dev->dev, DMA_BIT_MASK(64)) && @@ -1714,13 +1715,18 @@ static int nvme_pci_enable(struct nvme_dev *dev)
}
/*
- * Some devices don't advertse INTx interrupts, pre-enable a single
- * MSIX vec for setup. We'll adjust this later.
+ * Some devices and/or platforms don't advertise or work with INTx
+ * interrupts. Pre-enable a single MSIX or MSI vec for setup. We'll
+ * adjust this later.
*/
- if (!pdev->irq) {
- result = pci_enable_msix(pdev, dev->entry, 1);
- if (result < 0)
- goto disable;
+ if (pci_enable_msix(pdev, dev->entry, 1)) {
+ pci_enable_msi(pdev);
+ dev->entry[0].vector = pdev->irq;
+ }
+
+ if (!dev->entry[0].vector) {
+ result = -ENODEV;
+ goto disable;
}
cap = lo_hi_readq(dev->bar + NVME_REG_CAP);
--
_______________________________________________
Linux-nvme mailing list
Linux-nvme at lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH] NVMe: Re-introduce polling for completions
2016-04-05 22:04 ` Judy Brock-SSI
@ 2016-04-05 22:29 ` Keith Busch
2016-04-06 5:22 ` Judy Brock-SSI
2016-04-06 6:56 ` Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Keith Busch @ 2016-04-05 22:29 UTC (permalink / raw)
On Tue, Apr 05, 2016@10:04:22PM +0000, Judy Brock-SSI wrote:
> Going straight to MSI-x is a higher risk change since re-introducing polling restores code that was in place previously with no reported problems whereas changing the init path to something new has no test time on it.
Well, I wouldn't say "no" test time since we have a well tested path
that goes straight to MSI-x in place for some time.
If we rely on polled completions during initialization, we potentially
add up to 2 seconds for device initialization to complete (1 second
polling interval * 2 commands using broken IRQ).
Maybe you want both?
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] NVMe: Re-introduce polling for completions
2016-04-05 22:29 ` Keith Busch
@ 2016-04-06 5:22 ` Judy Brock-SSI
0 siblings, 0 replies; 12+ messages in thread
From: Judy Brock-SSI @ 2016-04-06 5:22 UTC (permalink / raw)
> Maybe you want both?
I think that would be a good solution.
-----Original Message-----
From: Keith Busch [mailto:keith.busch@intel.com]
Sent: Tuesday, April 05, 2016 3:30 PM
To: Judy Brock-SSI
Cc: Christoph Hellwig; Jens Axboe; linux-nvme at lists.infradead.org
Subject: Re: [PATCH] NVMe: Re-introduce polling for completions
On Tue, Apr 05, 2016@10:04:22PM +0000, Judy Brock-SSI wrote:
> Going straight to MSI-x is a higher risk change since re-introducing polling restores code that was in place previously with no reported problems whereas changing the init path to something new has no test time on it.
Well, I wouldn't say "no" test time since we have a well tested path that goes straight to MSI-x in place for some time.
If we rely on polled completions during initialization, we potentially add up to 2 seconds for device initialization to complete (1 second polling interval * 2 commands using broken IRQ).
Maybe you want both?
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] NVMe: Re-introduce polling for completions
2016-04-05 22:04 ` Judy Brock-SSI
2016-04-05 22:29 ` Keith Busch
@ 2016-04-06 6:56 ` Christoph Hellwig
1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2016-04-06 6:56 UTC (permalink / raw)
On Tue, Apr 05, 2016@10:04:22PM +0000, Judy Brock-SSI wrote:
> Restoring the polling (with comments as requested) is a harmless way to regain general ecosystem compatibility.
Please take your marketing talk elsewhere, there is no "ecosystem" to
"synergize" in the linux kernel.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] NVMe: Re-introduce polling for completions
2016-04-05 21:39 ` Keith Busch
2016-04-05 22:04 ` Judy Brock-SSI
@ 2016-04-06 6:56 ` Christoph Hellwig
2016-04-12 19:00 ` Christoph Hellwig
2 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2016-04-06 6:56 UTC (permalink / raw)
On Tue, Apr 05, 2016@09:39:04PM +0000, Keith Busch wrote:
> Before I resend with the requested code comments, I'd like to hear your
> opinion on an alternate solution (patch below).
>
> Currently the driver uses legacy IRQ if available only until we know how
> many queues it can create. The legacy IRQ use is tripping up some h/w,
> but I've not heard such issues with MSI/MSI-x.
>
> Would you prefer going straight to MSI-x? The driver does that when
> there is no INTx, but we wouldn't need to poll from the watchdog timer
> if we make MSI-x the default behavior,
I would much prefer that variant over doing crazy non-standard polling.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] NVMe: Re-introduce polling for completions
2016-04-05 21:39 ` Keith Busch
2016-04-05 22:04 ` Judy Brock-SSI
2016-04-06 6:56 ` Christoph Hellwig
@ 2016-04-12 19:00 ` Christoph Hellwig
2016-04-12 19:02 ` Christoph Hellwig
2 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2016-04-12 19:00 UTC (permalink / raw)
On Tue, Apr 05, 2016@09:39:04PM +0000, Keith Busch wrote:
> On Tue, Apr 05, 2016@05:54:37PM +0000, Keith Busch wrote:
> > On Tue, Apr 05, 2016@05:35:23AM -0700, Christoph Hellwig wrote:
> > > So how do any other PCIe device work given that almost no driver does
> > > unconditionaly polling?
> >
> > I honestly don't have a good answer to that, and I agree with you
> > that this shouldn't be necessary ... but this is a harmless way to not
> > frustrate people who purchase these devices.
> >
> > I suspect other PCI device drivers either at least try to never use the
> > legacy IRQ, or they've never been tested in platforms that break them.
>
> Before I resend with the requested code comments, I'd like to hear your
> opinion on an alternate solution (patch below).
>
> Currently the driver uses legacy IRQ if available only until we know how
> many queues it can create. The legacy IRQ use is tripping up some h/w,
> but I've not heard such issues with MSI/MSI-x.
>
> Would you prefer going straight to MSI-x? The driver does that when
> there is no INTx, but we wouldn't need to poll from the watchdog timer
> if we make MSI-x the default behavior,
Can you resend this to Jens for 4.6 with a proper signoff?
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2016-04-12 19:02 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-04 22:24 [PATCH] NVMe: Re-introduce polling for completions Keith Busch
2016-04-05 7:32 ` Johannes Thumshirn
2016-04-05 12:35 ` Christoph Hellwig
2016-04-05 17:54 ` Keith Busch
2016-04-05 21:39 ` Keith Busch
2016-04-05 22:04 ` Judy Brock-SSI
2016-04-05 22:29 ` Keith Busch
2016-04-06 5:22 ` Judy Brock-SSI
2016-04-06 6:56 ` Christoph Hellwig
2016-04-06 6:56 ` Christoph Hellwig
2016-04-12 19:00 ` Christoph Hellwig
2016-04-12 19:02 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.