[PATCH v2] nvme/pci: Log PCI_STATUS when the controller dies

linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2] nvme/pci: Log PCI_STATUS when the controller dies
@ 2016-12-02  0:42 Andy Lutomirski
  2016-12-02 13:26 ` Christoph Hellwig
  2016-12-02 22:19 ` J Freyensee
  0 siblings, 2 replies; 4+ messages in thread
From: Andy Lutomirski @ 2016-12-02  0:42 UTC (permalink / raw)


When debugging nvme controller crashes, it's nice to know whether
the controller died cleanly so that the failure is just reflected in
CSTS, it died and put an error in PCI_STATUS, or whether it died so
badly that it stopped responding to PCI configuration space reads.

Signed-off-by: Andy Lutomirski <luto at kernel.org>
---
 drivers/nvme/host/pci.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0248d0e21fee..f8dd83be01d9 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1292,10 +1292,22 @@ static void nvme_watchdog_timer(unsigned long data)
 
 	/* Skip controllers under certain specific conditions. */
 	if (nvme_should_reset(dev, csts)) {
-		if (!nvme_reset(dev))
-			dev_warn(dev->dev,
-				"Failed status: 0x%x, reset controller.\n",
-				csts);
+		if (!nvme_reset(dev)) {
+			/* Read a config register to help see what died. */
+			u16 pci_status;
+			int result;
+
+			result = pci_read_config_word(to_pci_dev(dev->dev),
+						      PCI_STATUS, &pci_status);
+			if (result == PCIBIOS_SUCCESSFUL)
+				dev_warn(dev->dev,
+					 "controller is down; will reset: CSTS=0x%x, PCI_STATUS=0x%hx\n",
+					 csts, pci_status);
+			else
+				dev_warn(dev->dev,
+					 "controller is down; will reset: CSTS=0x%x, PCI_STATUS read failed (%d)\n",
+					 csts, result);
+		}
 		return;
 	}
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2] nvme/pci: Log PCI_STATUS when the controller dies
  2016-12-02  0:42 [PATCH v2] nvme/pci: Log PCI_STATUS when the controller dies Andy Lutomirski
@ 2016-12-02 13:26 ` Christoph Hellwig
  2016-12-02 16:57   ` Andy Lutomirski
  2016-12-02 22:19 ` J Freyensee
  1 sibling, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2016-12-02 13:26 UTC (permalink / raw)


On Thu, Dec 01, 2016@04:42:41PM -0800, Andy Lutomirski wrote:
> When debugging nvme controller crashes, it's nice to know whether
> the controller died cleanly so that the failure is just reflected in
> CSTS, it died and put an error in PCI_STATUS, or whether it died so
> badly that it stopped responding to PCI configuration space reads.

Just curious:  what controller did this happen with?

> +			/* Read a config register to help see what died. */
> +			u16 pci_status;
> +			int result;
> +
> +			result = pci_read_config_word(to_pci_dev(dev->dev),
> +						      PCI_STATUS, &pci_status);
> +			if (result == PCIBIOS_SUCCESSFUL)
> +				dev_warn(dev->dev,
> +					 "controller is down; will reset: CSTS=0x%x, PCI_STATUS=0x%hx\n",
> +					 csts, pci_status);
> +			else
> +				dev_warn(dev->dev,
> +					 "controller is down; will reset: CSTS=0x%x, PCI_STATUS read failed (%d)\n",
> +					 csts, result);
> +		}

Can you factor all this debug code into a separate function to keep
the main flow easier to read?

Except for that this patch looks fine to me:

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2] nvme/pci: Log PCI_STATUS when the controller dies
  2016-12-02 13:26 ` Christoph Hellwig
@ 2016-12-02 16:57   ` Andy Lutomirski
  0 siblings, 0 replies; 4+ messages in thread
From: Andy Lutomirski @ 2016-12-02 16:57 UTC (permalink / raw)


On Fri, Dec 2, 2016@5:26 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Dec 01, 2016@04:42:41PM -0800, Andy Lutomirski wrote:
>> When debugging nvme controller crashes, it's nice to know whether
>> the controller died cleanly so that the failure is just reflected in
>> CSTS, it died and put an error in PCI_STATUS, or whether it died so
>> badly that it stopped responding to PCI configuration space reads.
>
> Just curious:  what controller did this happen with?

I've seen a failure that gives 0xffff in PCI_STATUS on a Samsung
"SM951 NVMe SAMSUNG 256GB" with firmware "BXW75D0Q".

I'll add that to the v3 changelog.

>
>> +                     /* Read a config register to help see what died. */
>> +                     u16 pci_status;
>> +                     int result;
>> +
>> +                     result = pci_read_config_word(to_pci_dev(dev->dev),
>> +                                                   PCI_STATUS, &pci_status);
>> +                     if (result == PCIBIOS_SUCCESSFUL)
>> +                             dev_warn(dev->dev,
>> +                                      "controller is down; will reset: CSTS=0x%x, PCI_STATUS=0x%hx\n",
>> +                                      csts, pci_status);
>> +                     else
>> +                             dev_warn(dev->dev,
>> +                                      "controller is down; will reset: CSTS=0x%x, PCI_STATUS read failed (%d)\n",
>> +                                      csts, result);
>> +             }
>
> Can you factor all this debug code into a separate function to keep
> the main flow easier to read?
>
> Except for that this patch looks fine to me:
>
> Reviewed-by: Christoph Hellwig <hch at lst.de>

Done.  v3 coming.

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v2] nvme/pci: Log PCI_STATUS when the controller dies
  2016-12-02  0:42 [PATCH v2] nvme/pci: Log PCI_STATUS when the controller dies Andy Lutomirski
  2016-12-02 13:26 ` Christoph Hellwig
@ 2016-12-02 22:19 ` J Freyensee
  1 sibling, 0 replies; 4+ messages in thread
From: J Freyensee @ 2016-12-02 22:19 UTC (permalink / raw)


On Thu, 2016-12-01@16:42 -0800, Andy Lutomirski wrote:
> When debugging nvme controller crashes, it's nice to know whether
> the controller died cleanly so that the failure is just reflected in
> CSTS, it died and put an error in PCI_STATUS, or whether it died so
> badly that it stopped responding to PCI configuration space reads.
> 
> Signed-off-by: Andy Lutomirski <luto at kernel.org>
> ---
> ?drivers/nvme/host/pci.c | 20 ++++++++++++++++----
> ?1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 0248d0e21fee..f8dd83be01d9 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1292,10 +1292,22 @@ static void nvme_watchdog_timer(unsigned long
> data)
> ?
> ?	/* Skip controllers under certain specific conditions. */
> ?	if (nvme_should_reset(dev, csts)) {
> -		if (!nvme_reset(dev))
> -			dev_warn(dev->dev,
> -				"Failed status: 0x%x, reset
> controller.\n",
> -				csts);
> +		if (!nvme_reset(dev)) {
> +			/* Read a config register to help see what
> died. */
> +			u16 pci_status;
> +			int result;
> +
> +			result =
> pci_read_config_word(to_pci_dev(dev->dev),
> +						??????PCI_STATUS,
> &pci_status);
> +			if (result == PCIBIOS_SUCCESSFUL)
> +				dev_warn(dev->dev,
> +					?"controller is down; will
> reset: CSTS=0x%x, PCI_STATUS=0x%hx\n",
> +					?csts, pci_status);
> +			else
> +				dev_warn(dev->dev,
> +					?"controller is down; will
> reset: CSTS=0x%x, PCI_STATUS read failed (%d)\n",
> +					?csts, result);
> +		}
> ?		return;

Looks a lot more like what Keith suggested.

Reviewed-by: Jay Freyensee <james_p_freyensee at linux.intel.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-12-02 22:19 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-02  0:42 [PATCH v2] nvme/pci: Log PCI_STATUS when the controller dies Andy Lutomirski
2016-12-02 13:26 ` Christoph Hellwig
2016-12-02 16:57   ` Andy Lutomirski
2016-12-02 22:19 ` J Freyensee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).