linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/2] PCI/AER: Clean up minor logging issues
@ 2018-05-30 17:54 Bjorn Helgaas
  2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2018-05-30 17:54 UTC (permalink / raw)
  To: linux-pci; +Cc: Oza Pawandeep, linux-kernel

AER logging printed the plain 16-bit Requester ID straight out of the TLP,
which is hard to interpret, e.g., id=00e4 corresponds to what we normally
see as 00:1c.4 in dmesg or lspci.

Also, there's no need to print the vendor/device ID of the root port
reporting an error; we can easily find that from dmesg or lspci.

---

Bjorn Helgaas (2):
      PCI/AER: Decode Error Source Requester ID
      PCI/AER: Stop printing vendor/device ID


 drivers/pci/pcie/aer/aerdrv_errprint.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID
  2018-05-30 17:54 [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas
@ 2018-05-30 17:54 ` Bjorn Helgaas
  2018-05-30 18:32   ` Lukas Wunner
  2018-05-30 18:41   ` Rajat Jain
  2018-05-30 17:54 ` [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID Bjorn Helgaas
  2018-06-05 22:21 ` [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas
  2 siblings, 2 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2018-05-30 17:54 UTC (permalink / raw)
  To: linux-pci; +Cc: Oza Pawandeep, linux-kernel

From: Bjorn Helgaas <bhelgaas@google.com>

Decode the Requester ID from the AER Error Source Register into domain/
bus/device/function format to match other logging.  In cases where the ID
matches the device used for pci_err(), drop the extra ID completely so we
don't print it twice.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer/aerdrv_errprint.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 21ca5e1b0ded..d7fde8368d81 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -163,17 +163,17 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 	int id = ((dev->bus->number << 8) | dev->devfn);
 
 	if (!info->status) {
-		pci_err(dev, "PCIe Bus Error: severity=%s, type=Unaccessible, id=%04x(Unregistered Agent ID)\n",
-			aer_error_severity_string[info->severity], id);
+		pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n",
+			aer_error_severity_string[info->severity]);
 		goto out;
 	}
 
 	layer = AER_GET_LAYER_ERROR(info->severity, info->status);
 	agent = AER_GET_AGENT(info->severity, info->status);
 
-	pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, id=%04x(%s)\n",
+	pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
 		aer_error_severity_string[info->severity],
-		aer_error_layer[layer], id, aer_agent_string[agent]);
+		aer_error_layer[layer], aer_agent_string[agent]);
 
 	pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
 		dev->vendor, dev->device,
@@ -186,7 +186,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 
 out:
 	if (info->id && info->error_dev_num > 1 && info->id == id)
-		pci_err(dev, "  Error of this Agent(%04x) is reported first\n", id);
+		pci_err(dev, "  Error of this Agent is reported first\n");
 
 	trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask),
 			info->severity, info->tlp_header_valid, &info->tlp);
@@ -194,9 +194,13 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 
 void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
 {
-	pci_info(dev, "AER: %s%s error received: id=%04x\n",
+	u8 bus = info->id >> 8;
+	u8 devfn = info->id & 0xff;
+
+	pci_info(dev, "AER: %s%s error received: %04x:%02x:%02x.%d\n",
 		info->multi_error_valid ? "Multiple " : "",
-		aer_error_severity_string[info->severity], info->id);
+		aer_error_severity_string[info->severity],
+		pci_domain_nr(dev->bus), bus, devfn >> 3, devfn & 0x7);
 }
 
 #ifdef CONFIG_ACPI_APEI_PCIEAER

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID
  2018-05-30 17:54 [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas
  2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas
@ 2018-05-30 17:54 ` Bjorn Helgaas
  2018-05-30 18:18   ` Rajat Jain
  2018-06-05 22:21 ` [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas
  2 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2018-05-30 17:54 UTC (permalink / raw)
  To: linux-pci; +Cc: Oza Pawandeep, linux-kernel

From: Bjorn Helgaas <bhelgaas@google.com>

The Vendor and Device ID of the root port that raised an AER interrupt is
irrelevant and already available via normal enumeration dmesg logging or
lspci.

Remove the Vendor and Device ID from AER logging.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer/aerdrv_errprint.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index d7fde8368d81..16116844531c 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -175,9 +175,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
 		aer_error_severity_string[info->severity],
 		aer_error_layer[layer], aer_agent_string[agent]);
 
-	pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
-		dev->vendor, dev->device,
-		info->status, info->mask);
+	pci_err(dev, "  error status/mask=%08x/%08x\n", info->status,
+		info->mask);
 
 	__aer_print_error(dev, info);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID
  2018-05-30 17:54 ` [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID Bjorn Helgaas
@ 2018-05-30 18:18   ` Rajat Jain
  2018-05-31  0:28     ` Bjorn Helgaas
  0 siblings, 1 reply; 11+ messages in thread
From: Rajat Jain @ 2018-05-30 18:18 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List

On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>

> The Vendor and Device ID of the root port that raised an AER interrupt is
> irrelevant and already available via normal enumeration dmesg logging or
> lspci.

Er, what is getting printed is not the vendor/device id of the root port
but that of the AER source device (the one that root port got an ERR_*
message from). In case of fatal AERs, the end point device may become
inaccessible so lspci will not be available, and enumeration logs (from
boot) may have gotten rolled over. So I think it is still better to print
this information here.

Just my opinion :-)

Thanks,

Rajat


> Remove the Vendor and Device ID from AER logging.

> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>   drivers/pci/pcie/aer/aerdrv_errprint.c |    5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)

> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c
b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index d7fde8368d81..16116844531c 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -175,9 +175,8 @@ void aer_print_error(struct pci_dev *dev, struct
aer_err_info *info)
>                  aer_error_severity_string[info->severity],
>                  aer_error_layer[layer], aer_agent_string[agent]);

> -       pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> -               dev->vendor, dev->device,
> -               info->status, info->mask);
> +       pci_err(dev, "  error status/mask=%08x/%08x\n", info->status,
> +               info->mask);

>          __aer_print_error(dev, info);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID
  2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas
@ 2018-05-30 18:32   ` Lukas Wunner
  2018-05-31  4:54     ` Bjorn Helgaas
  2018-05-30 18:41   ` Rajat Jain
  1 sibling, 1 reply; 11+ messages in thread
From: Lukas Wunner @ 2018-05-30 18:32 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Oza Pawandeep, linux-kernel

On Wed, May 30, 2018 at 12:54:15PM -0500, Bjorn Helgaas wrote:
>  void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>  {
> -	pci_info(dev, "AER: %s%s error received: id=%04x\n",
> +	u8 bus = info->id >> 8;
> +	u8 devfn = info->id & 0xff;
> +
> +	pci_info(dev, "AER: %s%s error received: %04x:%02x:%02x.%d\n",
>  		info->multi_error_valid ? "Multiple " : "",
> -		aer_error_severity_string[info->severity], info->id);
> +		aer_error_severity_string[info->severity],
> +		pci_domain_nr(dev->bus), bus, devfn >> 3, devfn & 0x7);

I think PCI_SLOT(devfn), PCI_FUNC(devfn) is a bit more readable.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID
  2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas
  2018-05-30 18:32   ` Lukas Wunner
@ 2018-05-30 18:41   ` Rajat Jain
  2018-05-31  4:42     ` Bjorn Helgaas
  1 sibling, 1 reply; 11+ messages in thread
From: Rajat Jain @ 2018-05-30 18:41 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List

On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>

> Decode the Requester ID from the AER Error Source Register into domain/
> bus/device/function format to match other logging.  In cases where the ID
> matches the device used for pci_err(), drop the extra ID completely so we
> don't print it twice.

> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>   drivers/pci/pcie/aer/aerdrv_errprint.c |   18 +++++++++++-------
>   1 file changed, 11 insertions(+), 7 deletions(-)

> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c
b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index 21ca5e1b0ded..d7fde8368d81 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -163,17 +163,17 @@ void aer_print_error(struct pci_dev *dev, struct
aer_err_info *info)
>          int id = ((dev->bus->number << 8) | dev->devfn);

>          if (!info->status) {
> -               pci_err(dev, "PCIe Bus Error: severity=%s,
type=Unaccessible, id=%04x(Unregistered Agent ID)\n",
> -                       aer_error_severity_string[info->severity], id);
> +               pci_err(dev, "PCIe Bus Error: severity=%s,
type=Inaccessible, (Unregistered Agent ID)\n",
> +                       aer_error_severity_string[info->severity]);

Does this code path indicate that a requester id was decoded to a device
that is not registered with the kernel? If so, shouldn't we log the bad
requester ID for better debugging, specifically since there is not going to
be any subsequent print about this ID (since we return from this function
in this case)?

>                  goto out;
>          }

>          layer = AER_GET_LAYER_ERROR(info->severity, info->status);
>          agent = AER_GET_AGENT(info->severity, info->status);

> -       pci_err(dev, "PCIe Bus Error: severity=%s, type=%s,
id=%04x(%s)\n",
> +       pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n",
>                  aer_error_severity_string[info->severity],
> -               aer_error_layer[layer], id, aer_agent_string[agent]);
> +               aer_error_layer[layer], aer_agent_string[agent]);

>          pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
>                  dev->vendor, dev->device,
> @@ -186,7 +186,7 @@ void aer_print_error(struct pci_dev *dev, struct
aer_err_info *info)

>   out:
>          if (info->id && info->error_dev_num > 1 && info->id == id)
> -               pci_err(dev, "  Error of this Agent(%04x) is reported
first\n", id);
> +               pci_err(dev, "  Error of this Agent is reported first\n");

>          trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask),
>                          info->severity, info->tlp_header_valid,
&info->tlp);
> @@ -194,9 +194,13 @@ void aer_print_error(struct pci_dev *dev, struct
aer_err_info *info)

>   void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
>   {
> -       pci_info(dev, "AER: %s%s error received: id=%04x\n",
> +       u8 bus = info->id >> 8;
> +       u8 devfn = info->id & 0xff;
> +
> +       pci_info(dev, "AER: %s%s error received: %04x:%02x:%02x.%d\n",
>                  info->multi_error_valid ? "Multiple " : "",
> -               aer_error_severity_string[info->severity], info->id);
> +               aer_error_severity_string[info->severity],
> +               pci_domain_nr(dev->bus), bus, devfn >> 3, devfn & 0x7);
>   }

>   #ifdef CONFIG_ACPI_APEI_PCIEAER

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID
  2018-05-30 18:18   ` Rajat Jain
@ 2018-05-31  0:28     ` Bjorn Helgaas
  0 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2018-05-31  0:28 UTC (permalink / raw)
  To: Rajat Jain; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List

On Wed, May 30, 2018 at 11:18:35AM -0700, Rajat Jain wrote:
> On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > From: Bjorn Helgaas <bhelgaas@google.com>
> 
> > The Vendor and Device ID of the root port that raised an AER interrupt is
> > irrelevant and already available via normal enumeration dmesg logging or
> > lspci.
> 
> Er, what is getting printed is not the vendor/device id of the root port
> but that of the AER source device (the one that root port got an ERR_*
> message from). In case of fatal AERs, the end point device may become
> inaccessible so lspci will not be available, and enumeration logs (from
> boot) may have gotten rolled over. So I think it is still better to print
> this information here.

Thanks for looking this over!

You're right, "dev" here is not necessarily the Root Port, so this
changelog is bogus.  "dev" came from e_info->dev[] from
aer_process_err_devices().

I think to be more precise, aer_irq() reads the Root Port's
PCI_ERR_ROOT_ERR_SRC register, which gives us the Requester ID from
the ERR_* message.  Then find_source_device() walks the tree starting
with the Root Port, looking for:

  - a device that matches the Requester ID, or
  - a device that doesn't match the Requester ID (e.g., because a VMD
    port clears the source ID) but has AER enabled and has logged an
    error of the same type (ERR_COR vs ERR_FATAL/NONFATAL) we're
    currently decoding

So there might be multiple "dev" pointers in e_info->dev[] because
several devices could have logged errors.

I'm not convinced the vendor/device ID is that useful because there
might be several devices with the same ID, so it doesn't really tell
you which one.  The Requester ID (bus/device/function) is the
important thing.

The current code is not ideal because the find_source_device() path
depends on the pci_dev still being present and even accessible (so we
can read DEVCTL, ERR_COR_STATUS, etc), which might not be the case.

If find_source_device() fails, i.e., it can't find a matching pci_dev
and prints the "can't find device of ID%04x" message, we're in real
trouble because we don't call aer_process_err_devices(), which means
we don't clear PCI_ERR_COR_STATUS.

Anyway, I'll abandon this change for now since it's not a clear
improvement.

> > Remove the Vendor and Device ID from AER logging.
> 
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > ---
> >   drivers/pci/pcie/aer/aerdrv_errprint.c |    5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c
> b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > index d7fde8368d81..16116844531c 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > @@ -175,9 +175,8 @@ void aer_print_error(struct pci_dev *dev, struct
> aer_err_info *info)
> >                  aer_error_severity_string[info->severity],
> >                  aer_error_layer[layer], aer_agent_string[agent]);
> 
> > -       pci_err(dev, "  device [%04x:%04x] error status/mask=%08x/%08x\n",
> > -               dev->vendor, dev->device,
> > -               info->status, info->mask);
> > +       pci_err(dev, "  error status/mask=%08x/%08x\n", info->status,
> > +               info->mask);
> 
> >          __aer_print_error(dev, info);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID
  2018-05-30 18:41   ` Rajat Jain
@ 2018-05-31  4:42     ` Bjorn Helgaas
  2018-05-31 17:35       ` Rajat Jain
  0 siblings, 1 reply; 11+ messages in thread
From: Bjorn Helgaas @ 2018-05-31  4:42 UTC (permalink / raw)
  To: Rajat Jain; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List

On Wed, May 30, 2018 at 11:41:23AM -0700, Rajat Jain wrote:
> On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > From: Bjorn Helgaas <bhelgaas@google.com>
> 
> > Decode the Requester ID from the AER Error Source Register into domain/
> > bus/device/function format to match other logging.  In cases where the ID
> > matches the device used for pci_err(), drop the extra ID completely so we
> > don't print it twice.
> 
> > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > ---
> >   drivers/pci/pcie/aer/aerdrv_errprint.c |   18 +++++++++++-------
> >   1 file changed, 11 insertions(+), 7 deletions(-)
> 
> > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c
> b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > index 21ca5e1b0ded..d7fde8368d81 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > @@ -163,17 +163,17 @@ void aer_print_error(struct pci_dev *dev, struct
> aer_err_info *info)
> >          int id = ((dev->bus->number << 8) | dev->devfn);

> >          if (!info->status) {
> > -               pci_err(dev, "PCIe Bus Error: severity=%s,
> type=Unaccessible, id=%04x(Unregistered Agent ID)\n",
> > -                       aer_error_severity_string[info->severity], id);
> > +               pci_err(dev, "PCIe Bus Error: severity=%s,
> type=Inaccessible, (Unregistered Agent ID)\n",
> > +                       aer_error_severity_string[info->severity]);
> 
> Does this code path indicate that a requester id was decoded to a device
> that is not registered with the kernel? If so, shouldn't we log the bad
> requester ID for better debugging, specifically since there is not going to
> be any subsequent print about this ID (since we return from this function
> in this case)?

Previously we printed "id", which was constructed above from "dev":

  id = ((dev->bus->number << 8) | dev->devfn);

so even if we print "id=%04x", it contains exactly the same
information as the bus/device/function printed using "dev".

So no, I don't think "Unregistered Agent ID" means a device not registered
with the kernel.  At any rate, we do have a pci_dev for it.

I *think* "info->status == 0" means PCI_ERR_COR_STATUS (or
PCI_ERR_UNCOR_STATUS) was zero, i.e., we didn't find any error status
bits set for this device.  I don't think "Unregistered Agent ID" is a
very good description of this situation.

Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID
  2018-05-30 18:32   ` Lukas Wunner
@ 2018-05-31  4:54     ` Bjorn Helgaas
  0 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2018-05-31  4:54 UTC (permalink / raw)
  To: Lukas Wunner; +Cc: linux-pci, Oza Pawandeep, linux-kernel

On Wed, May 30, 2018 at 08:32:41PM +0200, Lukas Wunner wrote:
> On Wed, May 30, 2018 at 12:54:15PM -0500, Bjorn Helgaas wrote:
> >  void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
> >  {
> > -	pci_info(dev, "AER: %s%s error received: id=%04x\n",
> > +	u8 bus = info->id >> 8;
> > +	u8 devfn = info->id & 0xff;
> > +
> > +	pci_info(dev, "AER: %s%s error received: %04x:%02x:%02x.%d\n",
> >  		info->multi_error_valid ? "Multiple " : "",
> > -		aer_error_severity_string[info->severity], info->id);
> > +		aer_error_severity_string[info->severity],
> > +		pci_domain_nr(dev->bus), bus, devfn >> 3, devfn & 0x7);
> 
> I think PCI_SLOT(devfn), PCI_FUNC(devfn) is a bit more readable.

I used those originally, but of course those definitions predate PCIe
so they aren't clearly related to a Requester ID.

I searched the PCIe spec for the specifics of the Requester ID
composition.  It was surprisingly hard to find a clear statement.  The
best I found was PCIe r4.0, sec 6.13, which says

    Routing IDs, Requester IDs, and Completer IDs are 16-bit
    identifiers traditionally composed of three fields: an 8-bit Bus
    Number, a 5-bit Device Number, and a 3-bit Function Number.

Even that isn't specific about where the fields are, 

But it's probably not worth obsessing over this and PCI_SLOT() and
PCI_FUNC() are definitely more readable, so I changed them.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID
  2018-05-31  4:42     ` Bjorn Helgaas
@ 2018-05-31 17:35       ` Rajat Jain
  0 siblings, 0 replies; 11+ messages in thread
From: Rajat Jain @ 2018-05-31 17:35 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List

On Wed, May 30, 2018 at 9:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, May 30, 2018 at 11:41:23AM -0700, Rajat Jain wrote:
> > On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > > From: Bjorn Helgaas <bhelgaas@google.com>
> >
> > > Decode the Requester ID from the AER Error Source Register into domain/
> > > bus/device/function format to match other logging.  In cases where the ID
> > > matches the device used for pci_err(), drop the extra ID completely so we
> > > don't print it twice.
> >
> > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > ---
> > >   drivers/pci/pcie/aer/aerdrv_errprint.c |   18 +++++++++++-------
> > >   1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c
> > b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > > index 21ca5e1b0ded..d7fde8368d81 100644
> > > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> > > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> > > @@ -163,17 +163,17 @@ void aer_print_error(struct pci_dev *dev, struct
> > aer_err_info *info)
> > >          int id = ((dev->bus->number << 8) | dev->devfn);
>
> > >          if (!info->status) {
> > > -               pci_err(dev, "PCIe Bus Error: severity=%s,
> > type=Unaccessible, id=%04x(Unregistered Agent ID)\n",
> > > -                       aer_error_severity_string[info->severity], id);
> > > +               pci_err(dev, "PCIe Bus Error: severity=%s,
> > type=Inaccessible, (Unregistered Agent ID)\n",
> > > +                       aer_error_severity_string[info->severity]);
> >
> > Does this code path indicate that a requester id was decoded to a device
> > that is not registered with the kernel? If so, shouldn't we log the bad
> > requester ID for better debugging, specifically since there is not going to
> > be any subsequent print about this ID (since we return from this function
> > in this case)?
>
> Previously we printed "id", which was constructed above from "dev":
>
>   id = ((dev->bus->number << 8) | dev->devfn);
>
> so even if we print "id=%04x", it contains exactly the same
> information as the bus/device/function printed using "dev".

Sorry, my bad, I missed it, despite it being right there in my face :-).

>
> So no, I don't think "Unregistered Agent ID" means a device not registered
> with the kernel.  At any rate, we do have a pci_dev for it.
>
> I *think* "info->status == 0" means PCI_ERR_COR_STATUS (or
> PCI_ERR_UNCOR_STATUS) was zero, i.e., we didn't find any error status
> bits set for this device.  I don't think "Unregistered Agent ID" is a
> very good description of this situation.

Agree, may be something along the lines of "Unknown Error Status"
might be better.

Thanks,

Rajat

>
> Bjorn

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v1 0/2] PCI/AER: Clean up minor logging issues
  2018-05-30 17:54 [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas
  2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas
  2018-05-30 17:54 ` [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID Bjorn Helgaas
@ 2018-06-05 22:21 ` Bjorn Helgaas
  2 siblings, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2018-06-05 22:21 UTC (permalink / raw)
  To: linux-pci; +Cc: Oza Pawandeep, linux-kernel

On Wed, May 30, 2018 at 12:54:10PM -0500, Bjorn Helgaas wrote:
> AER logging printed the plain 16-bit Requester ID straight out of the TLP,
> which is hard to interpret, e.g., id=00e4 corresponds to what we normally
> see as 00:1c.4 in dmesg or lspci.
> 
> Also, there's no need to print the vendor/device ID of the root port
> reporting an error; we can easily find that from dmesg or lspci.
> 
> ---
> 
> Bjorn Helgaas (2):
>       PCI/AER: Decode Error Source Requester ID
>       PCI/AER: Stop printing vendor/device ID
> 
> 
>  drivers/pci/pcie/aer/aerdrv_errprint.c |   23 +++++++++++++----------
>  1 file changed, 13 insertions(+), 10 deletions(-)

I applied the first but not the second to pci/aer for v4.18.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-06-05 22:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-05-30 17:54 [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas
2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas
2018-05-30 18:32   ` Lukas Wunner
2018-05-31  4:54     ` Bjorn Helgaas
2018-05-30 18:41   ` Rajat Jain
2018-05-31  4:42     ` Bjorn Helgaas
2018-05-31 17:35       ` Rajat Jain
2018-05-30 17:54 ` [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID Bjorn Helgaas
2018-05-30 18:18   ` Rajat Jain
2018-05-31  0:28     ` Bjorn Helgaas
2018-06-05 22:21 ` [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).