* [PATCH v1 0/2] PCI/AER: Clean up minor logging issues
@ 2018-05-30 17:54 Bjorn Helgaas
  2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2018-05-30 17:54 UTC (permalink / raw)
  To: linux-pci; +Cc: Oza Pawandeep, linux-kernel
AER logging printed the plain 16-bit Requester ID straight out of the TLP,
which is hard to interpret, e.g., id=00e4 corresponds to what we normally
see as 00:1c.4 in dmesg or lspci.
Also, there's no need to print the vendor/device ID of the root port
reporting an error; we can easily find that from dmesg or lspci.
---
Bjorn Helgaas (2):
      PCI/AER: Decode Error Source Requester ID
      PCI/AER: Stop printing vendor/device ID
 drivers/pci/pcie/aer/aerdrv_errprint.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)
^ permalink raw reply	[flat|nested] 11+ messages in thread* [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID 2018-05-30 17:54 [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas @ 2018-05-30 17:54 ` Bjorn Helgaas 2018-05-30 18:32 ` Lukas Wunner 2018-05-30 18:41 ` Rajat Jain 2018-05-30 17:54 ` [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID Bjorn Helgaas 2018-06-05 22:21 ` [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas 2 siblings, 2 replies; 11+ messages in thread From: Bjorn Helgaas @ 2018-05-30 17:54 UTC (permalink / raw) To: linux-pci; +Cc: Oza Pawandeep, linux-kernel From: Bjorn Helgaas <bhelgaas@google.com> Decode the Requester ID from the AER Error Source Register into domain/ bus/device/function format to match other logging. In cases where the ID matches the device used for pci_err(), drop the extra ID completely so we don't print it twice. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/pcie/aer/aerdrv_errprint.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c index 21ca5e1b0ded..d7fde8368d81 100644 --- a/drivers/pci/pcie/aer/aerdrv_errprint.c +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c @@ -163,17 +163,17 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) int id = ((dev->bus->number << 8) | dev->devfn); if (!info->status) { - pci_err(dev, "PCIe Bus Error: severity=%s, type=Unaccessible, id=%04x(Unregistered Agent ID)\n", - aer_error_severity_string[info->severity], id); + pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", + aer_error_severity_string[info->severity]); goto out; } layer = AER_GET_LAYER_ERROR(info->severity, info->status); agent = AER_GET_AGENT(info->severity, info->status); - pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, id=%04x(%s)\n", + pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n", aer_error_severity_string[info->severity], - aer_error_layer[layer], id, aer_agent_string[agent]); + aer_error_layer[layer], aer_agent_string[agent]); pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", dev->vendor, dev->device, @@ -186,7 +186,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) out: if (info->id && info->error_dev_num > 1 && info->id == id) - pci_err(dev, " Error of this Agent(%04x) is reported first\n", id); + pci_err(dev, " Error of this Agent is reported first\n"); trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask), info->severity, info->tlp_header_valid, &info->tlp); @@ -194,9 +194,13 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) { - pci_info(dev, "AER: %s%s error received: id=%04x\n", + u8 bus = info->id >> 8; + u8 devfn = info->id & 0xff; + + pci_info(dev, "AER: %s%s error received: %04x:%02x:%02x.%d\n", info->multi_error_valid ? "Multiple " : "", - aer_error_severity_string[info->severity], info->id); + aer_error_severity_string[info->severity], + pci_domain_nr(dev->bus), bus, devfn >> 3, devfn & 0x7); } #ifdef CONFIG_ACPI_APEI_PCIEAER ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID 2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas @ 2018-05-30 18:32 ` Lukas Wunner 2018-05-31 4:54 ` Bjorn Helgaas 2018-05-30 18:41 ` Rajat Jain 1 sibling, 1 reply; 11+ messages in thread From: Lukas Wunner @ 2018-05-30 18:32 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci, Oza Pawandeep, linux-kernel On Wed, May 30, 2018 at 12:54:15PM -0500, Bjorn Helgaas wrote: > void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) > { > - pci_info(dev, "AER: %s%s error received: id=%04x\n", > + u8 bus = info->id >> 8; > + u8 devfn = info->id & 0xff; > + > + pci_info(dev, "AER: %s%s error received: %04x:%02x:%02x.%d\n", > info->multi_error_valid ? "Multiple " : "", > - aer_error_severity_string[info->severity], info->id); > + aer_error_severity_string[info->severity], > + pci_domain_nr(dev->bus), bus, devfn >> 3, devfn & 0x7); I think PCI_SLOT(devfn), PCI_FUNC(devfn) is a bit more readable. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID 2018-05-30 18:32 ` Lukas Wunner @ 2018-05-31 4:54 ` Bjorn Helgaas 0 siblings, 0 replies; 11+ messages in thread From: Bjorn Helgaas @ 2018-05-31 4:54 UTC (permalink / raw) To: Lukas Wunner; +Cc: linux-pci, Oza Pawandeep, linux-kernel On Wed, May 30, 2018 at 08:32:41PM +0200, Lukas Wunner wrote: > On Wed, May 30, 2018 at 12:54:15PM -0500, Bjorn Helgaas wrote: > > void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) > > { > > - pci_info(dev, "AER: %s%s error received: id=%04x\n", > > + u8 bus = info->id >> 8; > > + u8 devfn = info->id & 0xff; > > + > > + pci_info(dev, "AER: %s%s error received: %04x:%02x:%02x.%d\n", > > info->multi_error_valid ? "Multiple " : "", > > - aer_error_severity_string[info->severity], info->id); > > + aer_error_severity_string[info->severity], > > + pci_domain_nr(dev->bus), bus, devfn >> 3, devfn & 0x7); > > I think PCI_SLOT(devfn), PCI_FUNC(devfn) is a bit more readable. I used those originally, but of course those definitions predate PCIe so they aren't clearly related to a Requester ID. I searched the PCIe spec for the specifics of the Requester ID composition. It was surprisingly hard to find a clear statement. The best I found was PCIe r4.0, sec 6.13, which says Routing IDs, Requester IDs, and Completer IDs are 16-bit identifiers traditionally composed of three fields: an 8-bit Bus Number, a 5-bit Device Number, and a 3-bit Function Number. Even that isn't specific about where the fields are, But it's probably not worth obsessing over this and PCI_SLOT() and PCI_FUNC() are definitely more readable, so I changed them. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID 2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas 2018-05-30 18:32 ` Lukas Wunner @ 2018-05-30 18:41 ` Rajat Jain 2018-05-31 4:42 ` Bjorn Helgaas 1 sibling, 1 reply; 11+ messages in thread From: Rajat Jain @ 2018-05-30 18:41 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > Decode the Requester ID from the AER Error Source Register into domain/ > bus/device/function format to match other logging. In cases where the ID > matches the device used for pci_err(), drop the extra ID completely so we > don't print it twice. > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > --- > drivers/pci/pcie/aer/aerdrv_errprint.c | 18 +++++++++++------- > 1 file changed, 11 insertions(+), 7 deletions(-) > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c > index 21ca5e1b0ded..d7fde8368d81 100644 > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c > @@ -163,17 +163,17 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > int id = ((dev->bus->number << 8) | dev->devfn); > if (!info->status) { > - pci_err(dev, "PCIe Bus Error: severity=%s, type=Unaccessible, id=%04x(Unregistered Agent ID)\n", > - aer_error_severity_string[info->severity], id); > + pci_err(dev, "PCIe Bus Error: severity=%s, type=Inaccessible, (Unregistered Agent ID)\n", > + aer_error_severity_string[info->severity]); Does this code path indicate that a requester id was decoded to a device that is not registered with the kernel? If so, shouldn't we log the bad requester ID for better debugging, specifically since there is not going to be any subsequent print about this ID (since we return from this function in this case)? > goto out; > } > layer = AER_GET_LAYER_ERROR(info->severity, info->status); > agent = AER_GET_AGENT(info->severity, info->status); > - pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, id=%04x(%s)\n", > + pci_err(dev, "PCIe Bus Error: severity=%s, type=%s, (%s)\n", > aer_error_severity_string[info->severity], > - aer_error_layer[layer], id, aer_agent_string[agent]); > + aer_error_layer[layer], aer_agent_string[agent]); > pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", > dev->vendor, dev->device, > @@ -186,7 +186,7 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > out: > if (info->id && info->error_dev_num > 1 && info->id == id) > - pci_err(dev, " Error of this Agent(%04x) is reported first\n", id); > + pci_err(dev, " Error of this Agent is reported first\n"); > trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask), > info->severity, info->tlp_header_valid, &info->tlp); > @@ -194,9 +194,13 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) > { > - pci_info(dev, "AER: %s%s error received: id=%04x\n", > + u8 bus = info->id >> 8; > + u8 devfn = info->id & 0xff; > + > + pci_info(dev, "AER: %s%s error received: %04x:%02x:%02x.%d\n", > info->multi_error_valid ? "Multiple " : "", > - aer_error_severity_string[info->severity], info->id); > + aer_error_severity_string[info->severity], > + pci_domain_nr(dev->bus), bus, devfn >> 3, devfn & 0x7); > } > #ifdef CONFIG_ACPI_APEI_PCIEAER ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID 2018-05-30 18:41 ` Rajat Jain @ 2018-05-31 4:42 ` Bjorn Helgaas 2018-05-31 17:35 ` Rajat Jain 0 siblings, 1 reply; 11+ messages in thread From: Bjorn Helgaas @ 2018-05-31 4:42 UTC (permalink / raw) To: Rajat Jain; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List On Wed, May 30, 2018 at 11:41:23AM -0700, Rajat Jain wrote: > On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > From: Bjorn Helgaas <bhelgaas@google.com> > > > Decode the Requester ID from the AER Error Source Register into domain/ > > bus/device/function format to match other logging. In cases where the ID > > matches the device used for pci_err(), drop the extra ID completely so we > > don't print it twice. > > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > --- > > drivers/pci/pcie/aer/aerdrv_errprint.c | 18 +++++++++++------- > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c > b/drivers/pci/pcie/aer/aerdrv_errprint.c > > index 21ca5e1b0ded..d7fde8368d81 100644 > > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c > > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c > > @@ -163,17 +163,17 @@ void aer_print_error(struct pci_dev *dev, struct > aer_err_info *info) > > int id = ((dev->bus->number << 8) | dev->devfn); > > if (!info->status) { > > - pci_err(dev, "PCIe Bus Error: severity=%s, > type=Unaccessible, id=%04x(Unregistered Agent ID)\n", > > - aer_error_severity_string[info->severity], id); > > + pci_err(dev, "PCIe Bus Error: severity=%s, > type=Inaccessible, (Unregistered Agent ID)\n", > > + aer_error_severity_string[info->severity]); > > Does this code path indicate that a requester id was decoded to a device > that is not registered with the kernel? If so, shouldn't we log the bad > requester ID for better debugging, specifically since there is not going to > be any subsequent print about this ID (since we return from this function > in this case)? Previously we printed "id", which was constructed above from "dev": id = ((dev->bus->number << 8) | dev->devfn); so even if we print "id=%04x", it contains exactly the same information as the bus/device/function printed using "dev". So no, I don't think "Unregistered Agent ID" means a device not registered with the kernel. At any rate, we do have a pci_dev for it. I *think* "info->status == 0" means PCI_ERR_COR_STATUS (or PCI_ERR_UNCOR_STATUS) was zero, i.e., we didn't find any error status bits set for this device. I don't think "Unregistered Agent ID" is a very good description of this situation. Bjorn ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID 2018-05-31 4:42 ` Bjorn Helgaas @ 2018-05-31 17:35 ` Rajat Jain 0 siblings, 0 replies; 11+ messages in thread From: Rajat Jain @ 2018-05-31 17:35 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List On Wed, May 30, 2018 at 9:42 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Wed, May 30, 2018 at 11:41:23AM -0700, Rajat Jain wrote: > > On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > From: Bjorn Helgaas <bhelgaas@google.com> > > > > > Decode the Requester ID from the AER Error Source Register into domain/ > > > bus/device/function format to match other logging. In cases where the ID > > > matches the device used for pci_err(), drop the extra ID completely so we > > > don't print it twice. > > > > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > > --- > > > drivers/pci/pcie/aer/aerdrv_errprint.c | 18 +++++++++++------- > > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c > > b/drivers/pci/pcie/aer/aerdrv_errprint.c > > > index 21ca5e1b0ded..d7fde8368d81 100644 > > > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c > > > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c > > > @@ -163,17 +163,17 @@ void aer_print_error(struct pci_dev *dev, struct > > aer_err_info *info) > > > int id = ((dev->bus->number << 8) | dev->devfn); > > > > if (!info->status) { > > > - pci_err(dev, "PCIe Bus Error: severity=%s, > > type=Unaccessible, id=%04x(Unregistered Agent ID)\n", > > > - aer_error_severity_string[info->severity], id); > > > + pci_err(dev, "PCIe Bus Error: severity=%s, > > type=Inaccessible, (Unregistered Agent ID)\n", > > > + aer_error_severity_string[info->severity]); > > > > Does this code path indicate that a requester id was decoded to a device > > that is not registered with the kernel? If so, shouldn't we log the bad > > requester ID for better debugging, specifically since there is not going to > > be any subsequent print about this ID (since we return from this function > > in this case)? > > Previously we printed "id", which was constructed above from "dev": > > id = ((dev->bus->number << 8) | dev->devfn); > > so even if we print "id=%04x", it contains exactly the same > information as the bus/device/function printed using "dev". Sorry, my bad, I missed it, despite it being right there in my face :-). > > So no, I don't think "Unregistered Agent ID" means a device not registered > with the kernel. At any rate, we do have a pci_dev for it. > > I *think* "info->status == 0" means PCI_ERR_COR_STATUS (or > PCI_ERR_UNCOR_STATUS) was zero, i.e., we didn't find any error status > bits set for this device. I don't think "Unregistered Agent ID" is a > very good description of this situation. Agree, may be something along the lines of "Unknown Error Status" might be better. Thanks, Rajat > > Bjorn ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID 2018-05-30 17:54 [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas 2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas @ 2018-05-30 17:54 ` Bjorn Helgaas 2018-05-30 18:18 ` Rajat Jain 2018-06-05 22:21 ` [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas 2 siblings, 1 reply; 11+ messages in thread From: Bjorn Helgaas @ 2018-05-30 17:54 UTC (permalink / raw) To: linux-pci; +Cc: Oza Pawandeep, linux-kernel From: Bjorn Helgaas <bhelgaas@google.com> The Vendor and Device ID of the root port that raised an AER interrupt is irrelevant and already available via normal enumeration dmesg logging or lspci. Remove the Vendor and Device ID from AER logging. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> --- drivers/pci/pcie/aer/aerdrv_errprint.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c index d7fde8368d81..16116844531c 100644 --- a/drivers/pci/pcie/aer/aerdrv_errprint.c +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c @@ -175,9 +175,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) aer_error_severity_string[info->severity], aer_error_layer[layer], aer_agent_string[agent]); - pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", - dev->vendor, dev->device, - info->status, info->mask); + pci_err(dev, " error status/mask=%08x/%08x\n", info->status, + info->mask); __aer_print_error(dev, info); ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID 2018-05-30 17:54 ` [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID Bjorn Helgaas @ 2018-05-30 18:18 ` Rajat Jain 2018-05-31 0:28 ` Bjorn Helgaas 0 siblings, 1 reply; 11+ messages in thread From: Rajat Jain @ 2018-05-30 18:18 UTC (permalink / raw) To: Bjorn Helgaas; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > From: Bjorn Helgaas <bhelgaas@google.com> > The Vendor and Device ID of the root port that raised an AER interrupt is > irrelevant and already available via normal enumeration dmesg logging or > lspci. Er, what is getting printed is not the vendor/device id of the root port but that of the AER source device (the one that root port got an ERR_* message from). In case of fatal AERs, the end point device may become inaccessible so lspci will not be available, and enumeration logs (from boot) may have gotten rolled over. So I think it is still better to print this information here. Just my opinion :-) Thanks, Rajat > Remove the Vendor and Device ID from AER logging. > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > --- > drivers/pci/pcie/aer/aerdrv_errprint.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c > index d7fde8368d81..16116844531c 100644 > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c > @@ -175,9 +175,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > aer_error_severity_string[info->severity], > aer_error_layer[layer], aer_agent_string[agent]); > - pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", > - dev->vendor, dev->device, > - info->status, info->mask); > + pci_err(dev, " error status/mask=%08x/%08x\n", info->status, > + info->mask); > __aer_print_error(dev, info); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID 2018-05-30 18:18 ` Rajat Jain @ 2018-05-31 0:28 ` Bjorn Helgaas 0 siblings, 0 replies; 11+ messages in thread From: Bjorn Helgaas @ 2018-05-31 0:28 UTC (permalink / raw) To: Rajat Jain; +Cc: linux-pci, Oza Pawandeep, Linux Kernel Mailing List On Wed, May 30, 2018 at 11:18:35AM -0700, Rajat Jain wrote: > On Wed, May 30, 2018 at 10:54 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > From: Bjorn Helgaas <bhelgaas@google.com> > > > The Vendor and Device ID of the root port that raised an AER interrupt is > > irrelevant and already available via normal enumeration dmesg logging or > > lspci. > > Er, what is getting printed is not the vendor/device id of the root port > but that of the AER source device (the one that root port got an ERR_* > message from). In case of fatal AERs, the end point device may become > inaccessible so lspci will not be available, and enumeration logs (from > boot) may have gotten rolled over. So I think it is still better to print > this information here. Thanks for looking this over! You're right, "dev" here is not necessarily the Root Port, so this changelog is bogus. "dev" came from e_info->dev[] from aer_process_err_devices(). I think to be more precise, aer_irq() reads the Root Port's PCI_ERR_ROOT_ERR_SRC register, which gives us the Requester ID from the ERR_* message. Then find_source_device() walks the tree starting with the Root Port, looking for: - a device that matches the Requester ID, or - a device that doesn't match the Requester ID (e.g., because a VMD port clears the source ID) but has AER enabled and has logged an error of the same type (ERR_COR vs ERR_FATAL/NONFATAL) we're currently decoding So there might be multiple "dev" pointers in e_info->dev[] because several devices could have logged errors. I'm not convinced the vendor/device ID is that useful because there might be several devices with the same ID, so it doesn't really tell you which one. The Requester ID (bus/device/function) is the important thing. The current code is not ideal because the find_source_device() path depends on the pci_dev still being present and even accessible (so we can read DEVCTL, ERR_COR_STATUS, etc), which might not be the case. If find_source_device() fails, i.e., it can't find a matching pci_dev and prints the "can't find device of ID%04x" message, we're in real trouble because we don't call aer_process_err_devices(), which means we don't clear PCI_ERR_COR_STATUS. Anyway, I'll abandon this change for now since it's not a clear improvement. > > Remove the Vendor and Device ID from AER logging. > > > Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> > > --- > > drivers/pci/pcie/aer/aerdrv_errprint.c | 5 ++--- > > 1 file changed, 2 insertions(+), 3 deletions(-) > > > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c > b/drivers/pci/pcie/aer/aerdrv_errprint.c > > index d7fde8368d81..16116844531c 100644 > > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c > > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c > > @@ -175,9 +175,8 @@ void aer_print_error(struct pci_dev *dev, struct > aer_err_info *info) > > aer_error_severity_string[info->severity], > > aer_error_layer[layer], aer_agent_string[agent]); > > > - pci_err(dev, " device [%04x:%04x] error status/mask=%08x/%08x\n", > > - dev->vendor, dev->device, > > - info->status, info->mask); > > + pci_err(dev, " error status/mask=%08x/%08x\n", info->status, > > + info->mask); > > > __aer_print_error(dev, info); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v1 0/2] PCI/AER: Clean up minor logging issues 2018-05-30 17:54 [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas 2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas 2018-05-30 17:54 ` [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID Bjorn Helgaas @ 2018-06-05 22:21 ` Bjorn Helgaas 2 siblings, 0 replies; 11+ messages in thread From: Bjorn Helgaas @ 2018-06-05 22:21 UTC (permalink / raw) To: linux-pci; +Cc: Oza Pawandeep, linux-kernel On Wed, May 30, 2018 at 12:54:10PM -0500, Bjorn Helgaas wrote: > AER logging printed the plain 16-bit Requester ID straight out of the TLP, > which is hard to interpret, e.g., id=00e4 corresponds to what we normally > see as 00:1c.4 in dmesg or lspci. > > Also, there's no need to print the vendor/device ID of the root port > reporting an error; we can easily find that from dmesg or lspci. > > --- > > Bjorn Helgaas (2): > PCI/AER: Decode Error Source Requester ID > PCI/AER: Stop printing vendor/device ID > > > drivers/pci/pcie/aer/aerdrv_errprint.c | 23 +++++++++++++---------- > 1 file changed, 13 insertions(+), 10 deletions(-) I applied the first but not the second to pci/aer for v4.18. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-06-05 22:22 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-05-30 17:54 [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas 2018-05-30 17:54 ` [PATCH v1 1/2] PCI/AER: Decode Error Source Requester ID Bjorn Helgaas 2018-05-30 18:32 ` Lukas Wunner 2018-05-31 4:54 ` Bjorn Helgaas 2018-05-30 18:41 ` Rajat Jain 2018-05-31 4:42 ` Bjorn Helgaas 2018-05-31 17:35 ` Rajat Jain 2018-05-30 17:54 ` [PATCH v1 2/2] PCI/AER: Stop printing vendor/device ID Bjorn Helgaas 2018-05-30 18:18 ` Rajat Jain 2018-05-31 0:28 ` Bjorn Helgaas 2018-06-05 22:21 ` [PATCH v1 0/2] PCI/AER: Clean up minor logging issues Bjorn Helgaas
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).