[PATCHv2 0/2] PCI: driver function reset notification

linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCHv2 0/2] PCI: driver function reset notification
@ 2014-01-13 22:26 Keith Busch
  2014-01-13 22:26 ` [PATCHv2 1/2] PCI: call pci reset callback to pci_driver on FLR Keith Busch
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Keith Busch @ 2014-01-13 22:26 UTC (permalink / raw)
  To: linux-pci, alex.williamson, bhelgaas; +Cc: Keith Busch

Here's version 2 of this patch with a driver implementing the intended
use as an example.

The NVMe stuff requires using the maintainer's tree to get the newly
added nvme reset handling code. Willy's repo is located here:

git.infradead.org/users/willy/linux-nvme.git

v1->v2:

As suggested, I'm reusing the slot_reset error handler instead of defining
a new one for function_reset.

I moved invoking the callback further up the this call stack. The test
case I use resets the device via sysfs, and the pci device's command
register is cleared at the previous point, so the callback couldn't
actually do anything useful other than schedule something to handle
it after pci_dev_restore is called. The previous location would break
other driver slot_reset implementations and make my nvme implementation
a little more complicated.

Actually ... I'm a little concered to be using slot_reset instead of
defining a new callback for FLR. From looking at other device drivers,
I'm not sure they would expect to have their slot_reset invoked in
this situation.

Keith Busch (2):
  PCI: call pci reset callback to pci_driver on FLR
  NVMe: Implement pci reset callback

 drivers/block/nvme-core.c |   36 ++++++++++++++++++++++++++++++++++--
 drivers/pci/pci.c         |    6 ++++++
 include/linux/nvme.h      |    1 +
 3 files changed, 41 insertions(+), 2 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCHv2 1/2] PCI: call pci reset callback to pci_driver on FLR
  2014-01-13 22:26 [PATCHv2 0/2] PCI: driver function reset notification Keith Busch
@ 2014-01-13 22:26 ` Keith Busch
  2014-01-13 22:26 ` [PATCHv2 2/2] NVMe: Implement pci reset callback Keith Busch
  2014-01-16 18:49 ` [PATCHv2 0/2] PCI: driver function reset notification Bjorn Helgaas
  2 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2014-01-13 22:26 UTC (permalink / raw)
  To: linux-pci, alex.williamson, bhelgaas; +Cc: Keith Busch

A user can issue a pci function level reset to a device using sysfs
entry /sys/bus/pci/devices/.../reset. A kernel driver handling the
pci device probably would like to know that such a reset has occured,
so this patch calls the pci_driver's slot_reset pci_error_handler.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pci.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b127fbda..1bfddc5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3584,6 +3584,12 @@ int pci_reset_function(struct pci_dev *dev)
 
 	pci_dev_restore(dev);
 
+	if (!rc) {
+		const struct pci_error_handlers *err_handler = dev->driver ?
+			dev->driver->err_handler : NULL;
+		if (err_handler && err_handler->slot_reset)
+			err_handler->slot_reset(dev);
+	}
 	return rc;
 }
 EXPORT_SYMBOL_GPL(pci_reset_function);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCHv2 2/2] NVMe: Implement pci reset callback
  2014-01-13 22:26 [PATCHv2 0/2] PCI: driver function reset notification Keith Busch
  2014-01-13 22:26 ` [PATCHv2 1/2] PCI: call pci reset callback to pci_driver on FLR Keith Busch
@ 2014-01-13 22:26 ` Keith Busch
  2014-01-16 18:49 ` [PATCHv2 0/2] PCI: driver function reset notification Bjorn Helgaas
  2 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2014-01-13 22:26 UTC (permalink / raw)
  To: linux-pci, alex.williamson, bhelgaas; +Cc: Keith Busch

Initiates nvme controller reset sequence when pci layer notifies of slot
reset. The driver's slot_reset handling is skipped if the nvme polling
kthread happened to try accessing a device register while the reset was in
progress, which would have scheduled the nvme device to be reset anyway.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/block/nvme-core.c |   36 ++++++++++++++++++++++++++++++++++--
 include/linux/nvme.h      |    1 +
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
index b59a93a..1f5c18a 100644
--- a/drivers/block/nvme-core.c
+++ b/drivers/block/nvme-core.c
@@ -1750,6 +1750,7 @@ static int nvme_kthread(void *data)
 							dev->initialized) {
 				if (work_busy(&dev->reset_work))
 					continue;
+				dev->reset = 1;
 				list_del_init(&dev->node);
 				dev_warn(&dev->pci_dev->dev,
 					"Failed status, reset controller\n");
@@ -2088,6 +2089,7 @@ static int nvme_dev_map(struct nvme_dev *dev)
 	}
 	dev->db_stride = 1 << NVME_CAP_STRIDE(readq(&dev->bar->cap));
 	dev->dbs = ((void __iomem *)dev->bar) + 4096;
+	dev->reset = 0;
 
 	return 0;
 
@@ -2265,7 +2267,10 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
 	list_del_init(&dev->node);
 	spin_unlock(&dev_list_lock);
 
-	if (!dev->bar || (dev->bar && readl(&dev->bar->csts) == -1)) {
+	if (!dev->bar || (dev->bar && readl(&dev->bar->csts) == -1) ||
+							dev->reset) {
+		if (dev->bar)
+			nvme_shutdown_ctrl(dev);
 		for (i = dev->queue_count - 1; i >= 0; i--) {
 			struct nvme_queue *nvmeq = dev->queues[i];
 			nvme_suspend_queue(nvmeq);
@@ -2575,7 +2580,6 @@ static void nvme_remove(struct pci_dev *pdev)
 #define nvme_error_detected NULL
 #define nvme_dump_registers NULL
 #define nvme_link_reset NULL
-#define nvme_slot_reset NULL
 #define nvme_error_resume NULL
 
 static int nvme_suspend(struct device *dev)
@@ -2599,6 +2603,34 @@ static int nvme_resume(struct device *dev)
 	return 0;
 }
 
+
+/**
+ * nvme_slot_reset - handle nvme device that has been reset
+ * @pdev: pci device that has been through a pci slot or function reset
+ *
+ * Called by pci error recovery. We assume the device is in a state requiring
+ * initialization, so set the device reset flag first to prevent wasting time
+ * taking the controller offline with admin commands that we expect to timeout.
+ */
+static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
+{
+	struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+	spin_lock(&dev_list_lock);
+	if (!work_busy(&dev->reset_work)) {
+		list_del_init(&dev->node);
+		dev->reset = 1;
+	} else {
+		spin_unlock(&dev_list_lock);
+		return PCI_ERS_RESULT_RECOVERED;
+	}
+	spin_unlock(&dev_list_lock);
+
+	nvme_dev_reset(dev);
+	return dev->initialized ? PCI_ERS_RESULT_RECOVERED :
+				PCI_ERS_RESULT_DISCONNECT;
+}
+
 static SIMPLE_DEV_PM_OPS(nvme_dev_pm_ops, nvme_suspend, nvme_resume);
 
 static const struct pci_error_handlers nvme_err_handler = {
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 69ae03f..febce6f 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -97,6 +97,7 @@ struct nvme_dev {
 	u16 oncs;
 	u16 abort_limit;
 	u8 initialized;
+	u8 reset;
 };
 
 /*
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 0/2] PCI: driver function reset notification
  2014-01-13 22:26 [PATCHv2 0/2] PCI: driver function reset notification Keith Busch
  2014-01-13 22:26 ` [PATCHv2 1/2] PCI: call pci reset callback to pci_driver on FLR Keith Busch
  2014-01-13 22:26 ` [PATCHv2 2/2] NVMe: Implement pci reset callback Keith Busch
@ 2014-01-16 18:49 ` Bjorn Helgaas
  2014-01-16 19:10   ` Alex Williamson
  2 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2014-01-16 18:49 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-pci@vger.kernel.org, alex.williamson@redhat.com

On Mon, Jan 13, 2014 at 3:26 PM, Keith Busch <keith.busch@intel.com> wrote:
> Here's version 2 of this patch with a driver implementing the intended
> use as an example.
>
> The NVMe stuff requires using the maintainer's tree to get the newly
> added nvme reset handling code. Willy's repo is located here:
>
> git.infradead.org/users/willy/linux-nvme.git
>
> v1->v2:
>
> As suggested, I'm reusing the slot_reset error handler instead of defining
> a new one for function_reset.
>
> I moved invoking the callback further up the this call stack. The test
> case I use resets the device via sysfs, and the pci device's command
> register is cleared at the previous point, so the callback couldn't
> actually do anything useful other than schedule something to handle
> it after pci_dev_restore is called. The previous location would break
> other driver slot_reset implementations and make my nvme implementation
> a little more complicated.

There's now a pci_try_reset_function(), and something like this
callback would have to be done in that path, too.

> Actually ... I'm a little concered to be using slot_reset instead of
> defining a new callback for FLR. From looking at other device drivers,
> I'm not sure they would expect to have their slot_reset invoked in
> this situation.

I haven't looked at other slot_reset callbacks.  Do you have an
example we can look at and talk about?

The existing model (even without your changes) allows a user to start
a reset via sysfs while the driver is still active.  What happens when
the reset occurs while the driver is programming the device?  For
example, if the driver sets up a DMA transfer address, the reset
occurs (destroying the address), and the driver initiates the DMA, the
DMA will go to the wrong place.

I'm not sure how to fix this model of resets happening asynchronous to
the driver.  Maybe we need to tell the driver *before* the reset.
Maybe we need to ask the driver to do the reset itself, and only do it
in the core if no driver is attached.  Maybe we need to make the reset
look like a hotplug remove/add to the driver, so we detach the driver,
do the reset, and reattach a driver.

In the general case, we don't know what the device *is* after a reset
because it could have loaded new firmware.  It could require more
resources or even a different driver.

I know it would cause Alex heartburn to make reset look like hotplug.
What sort of NVMe problems would that cause?  I assume most drivers
will have to treat a device coming out of reset basically the same way
as a brand new hot-added device.

Bjorn

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 0/2] PCI: driver function reset notification
  2014-01-16 18:49 ` [PATCHv2 0/2] PCI: driver function reset notification Bjorn Helgaas
@ 2014-01-16 19:10   ` Alex Williamson
  2014-01-16 20:38     ` Keith Busch
  2014-01-17 17:26     ` Bjorn Helgaas
  0 siblings, 2 replies; 7+ messages in thread
From: Alex Williamson @ 2014-01-16 19:10 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Keith Busch, linux-pci@vger.kernel.org

On Thu, 2014-01-16 at 11:49 -0700, Bjorn Helgaas wrote:
> On Mon, Jan 13, 2014 at 3:26 PM, Keith Busch <keith.busch@intel.com> wrote:
> > Here's version 2 of this patch with a driver implementing the intended
> > use as an example.
> >
> > The NVMe stuff requires using the maintainer's tree to get the newly
> > added nvme reset handling code. Willy's repo is located here:
> >
> > git.infradead.org/users/willy/linux-nvme.git
> >
> > v1->v2:
> >
> > As suggested, I'm reusing the slot_reset error handler instead of defining
> > a new one for function_reset.
> >
> > I moved invoking the callback further up the this call stack. The test
> > case I use resets the device via sysfs, and the pci device's command
> > register is cleared at the previous point, so the callback couldn't
> > actually do anything useful other than schedule something to handle
> > it after pci_dev_restore is called. The previous location would break
> > other driver slot_reset implementations and make my nvme implementation
> > a little more complicated.
> 
> There's now a pci_try_reset_function(), and something like this
> callback would have to be done in that path, too.
> 
> > Actually ... I'm a little concered to be using slot_reset instead of
> > defining a new callback for FLR. From looking at other device drivers,
> > I'm not sure they would expect to have their slot_reset invoked in
> > this situation.
> 
> I haven't looked at other slot_reset callbacks.  Do you have an
> example we can look at and talk about?
> 
> The existing model (even without your changes) allows a user to start
> a reset via sysfs while the driver is still active.  What happens when
> the reset occurs while the driver is programming the device?  For
> example, if the driver sets up a DMA transfer address, the reset
> occurs (destroying the address), and the driver initiates the DMA, the
> DMA will go to the wrong place.
> 
> I'm not sure how to fix this model of resets happening asynchronous to
> the driver.  Maybe we need to tell the driver *before* the reset.
> Maybe we need to ask the driver to do the reset itself, and only do it
> in the core if no driver is attached.  Maybe we need to make the reset
> look like a hotplug remove/add to the driver, so we detach the driver,
> do the reset, and reattach a driver.
> 
> In the general case, we don't know what the device *is* after a reset
> because it could have loaded new firmware.  It could require more
> resources or even a different driver.
> 
> I know it would cause Alex heartburn to make reset look like hotplug.

Yes it would.  I agree that it's theoretically possible for a device to
undergo a metamorphosis to something new at reset, but in practice it
just doesn't happen.  Perhaps a reset with no driver attached could
completely remove, reset, and rescan the device, but that might still
cause problems with legacy KVM device assignment if pci-stub is not used
to hold the device.  For a device exposed to userspace through vfio, we
want to be able to reset the device, but releasing the device and trying
to reacquire it would be challenging to say the least.  Thanks,

Alex

> What sort of NVMe problems would that cause?  I assume most drivers
> will have to treat a device coming out of reset basically the same way
> as a brand new hot-added device.
> 
> Bjorn




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 0/2] PCI: driver function reset notification
  2014-01-16 19:10   ` Alex Williamson
@ 2014-01-16 20:38     ` Keith Busch
  2014-01-17 17:26     ` Bjorn Helgaas
  1 sibling, 0 replies; 7+ messages in thread
From: Keith Busch @ 2014-01-16 20:38 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Bjorn Helgaas, Keith Busch, linux-pci@vger.kernel.org

On Thu, 16 Jan 2014, Alex Williamson wrote:
> On Thu, 2014-01-16 at 11:49 -0700, Bjorn Helgaas wrote:
>> On Mon, Jan 13, 2014 at 3:26 PM, Keith Busch <keith.busch@intel.com> wrote:
...
>>> Actually ... I'm a little concered to be using slot_reset instead of
>>> defining a new callback for FLR. From looking at other device drivers,
>>> I'm not sure they would expect to have their slot_reset invoked in
>>> this situation.
>>
>> I haven't looked at other slot_reset callbacks.  Do you have an
>> example we can look at and talk about?

I'm comparing with mpt[2,3]sas_scsih.c. It looks like their slot_reset
requires pci_errhandlers' "err_detected" called prior to slot_reset,
otherwise they'll fail the device remapping.

>> The existing model (even without your changes) allows a user to start
>> a reset via sysfs while the driver is still active.  What happens when
>> the reset occurs while the driver is programming the device?  For
>> example, if the driver sets up a DMA transfer address, the reset
>> occurs (destroying the address), and the driver initiates the DMA, the
>> DMA will go to the wrong place.
>>
>> I'm not sure how to fix this model of resets happening asynchronous to
>> the driver.  Maybe we need to tell the driver *before* the reset.
>> Maybe we need to ask the driver to do the reset itself, and only do it
>> in the core if no driver is attached.  Maybe we need to make the reset
>> look like a hotplug remove/add to the driver, so we detach the driver,
>> do the reset, and reattach a driver.

Ha, Willy actually recommended I add two calls before I sent the first
patch! Something like pci_reset_prepare(), pci_reset_complete().

Now that I think about it for my driver, these calls would do be identical
to suspend()/resume(). Can we use those?!

>> In the general case, we don't know what the device *is* after a reset
>> because it could have loaded new firmware.  It could require more
>> resources or even a different driver.

Activating new firmware is the exact use I'm going for here. I've been
recommending users manually remove/reset/rescan the device with the
appriate sysfs writes. It sounds like your suggesting we make all that
automatically happen with only need to hit reset.

It turns out making this look like hotplug isn't desirable here because
the device's storage may be mounted partitions (sounds a little scary
to me), and a removal would take down opened disks.

>> I know it would cause Alex heartburn to make reset look like hotplug.
>
> Yes it would.  I agree that it's theoretically possible for a device to
> undergo a metamorphosis to something new at reset, but in practice it
> just doesn't happen.  Perhaps a reset with no driver attached could
> completely remove, reset, and rescan the device, but that might still
> cause problems with legacy KVM device assignment if pci-stub is not used
> to hold the device.  For a device exposed to userspace through vfio, we
> want to be able to reset the device, but releasing the device and trying
> to reacquire it would be challenging to say the least.  Thanks,
>
> Alex
>
>> What sort of NVMe problems would that cause?  I assume most drivers
>> will have to treat a device coming out of reset basically the same way
>>
>> Bjorn

Thanks for the feedback!
Keith

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 0/2] PCI: driver function reset notification
  2014-01-16 19:10   ` Alex Williamson
  2014-01-16 20:38     ` Keith Busch
@ 2014-01-17 17:26     ` Bjorn Helgaas
  1 sibling, 0 replies; 7+ messages in thread
From: Bjorn Helgaas @ 2014-01-17 17:26 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Keith Busch, linux-pci@vger.kernel.org

On Thu, Jan 16, 2014 at 12:10 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Thu, 2014-01-16 at 11:49 -0700, Bjorn Helgaas wrote:

>> In the general case, we don't know what the device *is* after a reset
>> because it could have loaded new firmware.  It could require more
>> resources or even a different driver.
>>
>> I know it would cause Alex heartburn to make reset look like hotplug.
>
> Yes it would.  I agree that it's theoretically possible for a device to
> undergo a metamorphosis to something new at reset, but in practice it
> just doesn't happen.

Maybe not in your world, but this definitely happens, and it
definitely causes hassles for people.  Some resort to hacks like
rebooting the whole system because we don't support this well.

I mentioned these before, so you may have seen them already:

    http://forums.xilinx.com/t5/PCI-Express/PC-is-hung-if-run-the-second-program-FPGA-by-JTAG/td-p/20871
    http://www.alteraforum.com/forum/archive/index.php/t-28477.html
    http://www.alteraforum.com/forum/showthread.php?t=35091
    https://github.com/NetFPGA/NetFPGA-public/wiki/Restore-PCIE-Configuration
    http://lkml.indiana.edu/hypermail/linux/kernel/1104.3/02544.html

My fear is that we're building a lot of stuff based on assumptions
about how things typically operate.  That works most of the time, but
it does fail on important corner cases that are spec-compliant but
atypical.  From the PCI subsystem point of view, that makes me uneasy.

Bjorn

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-01-17 17:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-13 22:26 [PATCHv2 0/2] PCI: driver function reset notification Keith Busch
2014-01-13 22:26 ` [PATCHv2 1/2] PCI: call pci reset callback to pci_driver on FLR Keith Busch
2014-01-13 22:26 ` [PATCHv2 2/2] NVMe: Implement pci reset callback Keith Busch
2014-01-16 18:49 ` [PATCHv2 0/2] PCI: driver function reset notification Bjorn Helgaas
2014-01-16 19:10   ` Alex Williamson
2014-01-16 20:38     ` Keith Busch
2014-01-17 17:26     ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).