From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Rajat Jain <rajatja@google.com>
Cc: Karolina Stolarek <karolina.stolarek@oracle.com>,
Jon Pan-Doh <pandoh@google.com>,
Bjorn Helgaas <bhelgaas@google.com>, <linux-pci@vger.kernel.org>,
Martin Petersen <martin.petersen@oracle.com>,
Ben Fuller <ben.fuller@oracle.com>,
Drew Walton <drewwalton@microsoft.com>,
Anil Agrawal <anilagrawal@meta.com>,
Tony Luck <tony.luck@intel.com>
Subject: Re: [PATCH 8/8] PCI/AER: Move AER sysfs attributes into separate directory
Date: Fri, 31 Jan 2025 14:36:16 +0000
Message-ID: <20250131143616.00007a73@huawei.com>
In-Reply-To: <CACK8Z6Hyx4D3d=BK15f55muYu7kMLYV7fEusc7dTiUJJ3G5KuQ@mail.gmail.com>
On Thu, 16 Jan 2025 09:18:20 -0800
Rajat Jain <rajatja@google.com> wrote:
> Hello,
>
> On Thu, Jan 16, 2025 at 2:26 AM Karolina Stolarek
> <karolina.stolarek@oracle.com> wrote:
> >
> > On 15/01/2025 08:43, Jon Pan-Doh wrote:
> > > Prepare for the addition of new AER sysfs attributes (e.g. ratelimits)
> > > by moving them into their own directory. Update naming to reflect
> > > broader definition and for consistency.
> > >
> > > /sys/bus/pci/devices/<dev>/aer_dev_correctable
> > > /sys/bus/pci/devices/<dev>/aer_dev_fatal
> > > /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
> > > /sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
> > > /sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
> > > /sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
> > > ->
> > > /sys/bus/pci/devices/<dev>/aer/err_cor
> > > /sys/bus/pci/devices/<dev>/aer/err_fatal
> > > /sys/bus/pci/devices/<dev>/aer/err_nonfatal
> > > /sys/bus/pci/devices/<dev>/aer/rootport_total_err_cor
> > > /sys/bus/pci/devices/<dev>/aer/rootport_total_err_fatal
> > > /sys/bus/pci/devices/<dev>/aer/rootport_total_err_nonfatal
> > >
> > > Tested using aer-inject[1] tool. Sent 1 AER error. Observed AER stats
> > > correctly logged (cat /sys/bus/pci/devices/<dev>/aer/err_cor).
> >
> > I'm not a sysfs expert, but my understanding is that we shouldn't make
> > major changes to the existing hierarchies.
> >
> > On one hand, I think it would be nice to extract out AER-specific info
> > and knobs into a subdirectory (e.g., using attribute_group with name
> > "aer"), but on the other this would be disruptive to the userspace. I
> > can imagine that there are tools that watch these values that would
> > break after this change.
>
> Thank you. This is the right guidance.
>
> As the original author of these attributes, I just wanted to chime in
> from the ChromeOS team's perspective. I can say that we have used these
> attributes for debugging mostly manually, and do not have tools yet
> with hardcoded hierarchy / paths. So we wouldn't be opposed to it, if
> changes to the hierarchy have wider acceptance and it seems better in
> general.
You'd need to be really sure no one is using them; otherwise this is
ABI breakage and will need reverting. If the new layout has been live for
a while by then, we are in a mess, as we have to revert and break the new
users... Generally I'd go with: don't touch the existing elements.

Jonathan
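
To make the breakage concern concrete, here is a minimal userspace sketch of
the kind of tool that could be affected. It assumes a reader that wants
TOTAL_ERR_COR for one device and has to cope with both layouts. The helper
names aer_parse_total() and aer_read_total_cor() are hypothetical; only the
two file names (aer/err_cor and aer_dev_correctable) and the "name value"
line format come from the quoted patch and its ABI document.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan "name value" lines, as in the sample output
 * of the ABI document, and return the value on the line starting with
 * `key`. Returns -1 if the key is absent or has no numeric value.
 */
long aer_parse_total(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Tolerate leading indentation. */
		while (*line == ' ' || *line == '\t')
			line++;

		if (strncmp(line, key, klen) == 0 &&
		    (line[klen] == ' ' || line[klen] == '\t')) {
			long val;

			if (sscanf(line + klen, "%ld", &val) == 1)
				return val;
			return -1;
		}

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Hypothetical helper: read TOTAL_ERR_COR for the device directory
 * `dev_dir`, trying the proposed aer/ layout first and then the
 * existing flat layout. Returns -1 if neither file can be read.
 */
long aer_read_total_cor(const char *dev_dir)
{
	static const char *const names[] = {
		"aer/err_cor",		/* layout proposed in this patch */
		"aer_dev_correctable",	/* existing layout */
	};
	char path[512];
	char buf[4096];
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		int n = snprintf(path, sizeof(path), "%s/%s", dev_dir, names[i]);
		FILE *f;
		size_t len;

		if (n < 0 || (size_t)n >= sizeof(path))
			continue;

		f = fopen(path, "r");
		if (!f)
			continue;

		len = fread(buf, 1, sizeof(buf) - 1, f);
		fclose(f);
		buf[len] = '\0';
		return aer_parse_total(buf, "TOTAL_ERR_COR");
	}
	return -1;
}
```

A tool that hardcodes only the flat name has no such fallback and would
start getting -1 (or ENOENT) after the move, which is the case that would
force a revert.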
>
> Thanks & Best Regards,
>
> Rajat
>
>
> >
> > All the best,
> > Karolina
> >
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/gong.chen/aer-inject.git
> > >
> > > Signed-off-by: Jon Pan-Doh <pandoh@google.com>
> > > ---
> > > .../ABI/testing/sysfs-bus-pci-devices-aer | 18 +++---
> > > drivers/pci/pci-sysfs.c | 1 -
> > > drivers/pci/pci.h | 1 -
> > > drivers/pci/pcie/aer.c | 64 +++++++------------
> > > 4 files changed, 32 insertions(+), 52 deletions(-)
> > >
> > > diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
> > > index c680a53af0f4..e1472583207b 100644
> > > --- a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
> > > +++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer
> > > @@ -9,7 +9,7 @@ errors may be "seen" / reported by the link partner and not the
> > > problematic endpoint itself (which may report all counters as 0 as it never
> > > saw any problems).
> > >
> > > -What: /sys/bus/pci/devices/<dev>/aer_dev_correctable
> > > +What: /sys/bus/pci/devices/<dev>/aer/err_cor
> > > Date: July 2018
> > > KernelVersion: 4.19.0
> > > Contact: linux-pci@vger.kernel.org, rajatja@google.com
> > > @@ -19,7 +19,7 @@ Description: List of correctable errors seen and reported by this
> > > TOTAL_ERR_COR at the end of the file may not match the actual
> > > total of all the errors in the file. Sample output::
> > >
> > > - localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
> > > + localhost /sys/devices/pci0000:00/0000:00:1c.0/aer # cat err_cor
> > > Receiver Error 2
> > > Bad TLP 0
> > > Bad DLLP 0
> > > @@ -30,7 +30,7 @@ Description: List of correctable errors seen and reported by this
> > > Header Log Overflow 0
> > > TOTAL_ERR_COR 2
> > >
> > > -What: /sys/bus/pci/devices/<dev>/aer_dev_fatal
> > > +What: /sys/bus/pci/devices/<dev>/aer/err_fatal
> > > Date: July 2018
> > > KernelVersion: 4.19.0
> > > Contact: linux-pci@vger.kernel.org, rajatja@google.com
> > > @@ -40,7 +40,7 @@ Description: List of uncorrectable fatal errors seen and reported by this
> > > TOTAL_ERR_FATAL at the end of the file may not match the actual
> > > total of all the errors in the file. Sample output::
> > >
> > > - localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
> > > + localhost /sys/devices/pci0000:00/0000:00:1c.0/aer # cat err_fatal
> > > Undefined 0
> > > Data Link Protocol 0
> > > Surprise Down Error 0
> > > @@ -60,7 +60,7 @@ Description: List of uncorrectable fatal errors seen and reported by this
> > > TLP Prefix Blocked Error 0
> > > TOTAL_ERR_FATAL 0
> > >
> > > -What: /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
> > > +What: /sys/bus/pci/devices/<dev>/aer/err_nonfatal
> > > Date: July 2018
> > > KernelVersion: 4.19.0
> > > Contact: linux-pci@vger.kernel.org, rajatja@google.com
> > > @@ -70,7 +70,7 @@ Description: List of uncorrectable nonfatal errors seen and reported by this
> > > TOTAL_ERR_NONFATAL at the end of the file may not match the
> > > actual total of all the errors in the file. Sample output::
> > >
> > > - localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
> > > + localhost /sys/devices/pci0000:00/0000:00:1c.0/aer # cat err_nonfatal
> > > Undefined 0
> > > Data Link Protocol 0
> > > Surprise Down Error 0
> > > @@ -100,19 +100,19 @@ collectors) that are AER capable. These indicate the number of error messages as
> > > device, so these counters include them and are thus cumulative of all the error
> > > messages on the PCI hierarchy originating at that root port.
> > >
> > > -What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
> > > +What: /sys/bus/pci/devices/<dev>/aer/rootport_total_err_cor
> > > Date: July 2018
> > > KernelVersion: 4.19.0
> > > Contact: linux-pci@vger.kernel.org, rajatja@google.com
> > > Description: Total number of ERR_COR messages reported to rootport.
> > >
> > > -What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
> > > +What: /sys/bus/pci/devices/<dev>/aer/rootport_total_err_fatal
> > > Date: July 2018
> > > KernelVersion: 4.19.0
> > > Contact: linux-pci@vger.kernel.org, rajatja@google.com
> > > Description: Total number of ERR_FATAL messages reported to rootport.
> > >
> > > -What: /sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
> > > +What: /sys/bus/pci/devices/<dev>/aer/rootport_total_err_nonfatal
> > > Date: July 2018
> > > KernelVersion: 4.19.0
> > > Contact: linux-pci@vger.kernel.org, rajatja@google.com
> > > diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> > > index 41acb6713e2d..e16b92edf3bd 100644
> > > --- a/drivers/pci/pci-sysfs.c
> > > +++ b/drivers/pci/pci-sysfs.c
> > > @@ -1692,7 +1692,6 @@ const struct attribute_group *pci_dev_attr_groups[] = {
> > > &pci_bridge_attr_group,
> > > &pcie_dev_attr_group,
> > > #ifdef CONFIG_PCIEAER
> > > - &aer_stats_attr_group,
> > > &aer_attr_group,
> > > #endif
> > > #ifdef CONFIG_PCIEASPM
> > > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > > index 9d0272a890ef..a80cfc08f634 100644
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -880,7 +880,6 @@ static inline void of_pci_remove_node(struct pci_dev *pdev) { }
> > > void pci_no_aer(void);
> > > void pci_aer_init(struct pci_dev *dev);
> > > void pci_aer_exit(struct pci_dev *dev);
> > > -extern const struct attribute_group aer_stats_attr_group;
> > > extern const struct attribute_group aer_attr_group;
> > > void pci_aer_clear_fatal_status(struct pci_dev *dev);
> > > int pci_aer_clear_status(struct pci_dev *dev);
> > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > index e48e2951baae..68850525cc8d 100644
> > > --- a/drivers/pci/pcie/aer.c
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -569,13 +569,13 @@ static const char *aer_agent_string[] = {
> > > } \
> > > static DEVICE_ATTR_RO(name)
> > >
> > > -aer_stats_dev_attr(aer_dev_correctable, dev_cor_errs,
> > > +aer_stats_dev_attr(err_cor, dev_cor_errs,
> > > aer_correctable_error_string, "ERR_COR",
> > > dev_total_cor_errs);
> > > -aer_stats_dev_attr(aer_dev_fatal, dev_fatal_errs,
> > > +aer_stats_dev_attr(err_fatal, dev_fatal_errs,
> > > aer_uncorrectable_error_string, "ERR_FATAL",
> > > dev_total_fatal_errs);
> > > -aer_stats_dev_attr(aer_dev_nonfatal, dev_nonfatal_errs,
> > > +aer_stats_dev_attr(err_nonfatal, dev_nonfatal_errs,
> > > aer_uncorrectable_error_string, "ERR_NONFATAL",
> > > dev_total_nonfatal_errs);
> > >
> > > @@ -589,47 +589,13 @@ aer_stats_dev_attr(aer_dev_nonfatal, dev_nonfatal_errs,
> > > } \
> > > static DEVICE_ATTR_RO(name)
> > >
> > > -aer_stats_rootport_attr(aer_rootport_total_err_cor,
> > > +aer_stats_rootport_attr(rootport_total_err_cor,
> > > rootport_total_cor_errs);
> > > -aer_stats_rootport_attr(aer_rootport_total_err_fatal,
> > > +aer_stats_rootport_attr(rootport_total_err_fatal,
> > > rootport_total_fatal_errs);
> > > -aer_stats_rootport_attr(aer_rootport_total_err_nonfatal,
> > > +aer_stats_rootport_attr(rootport_total_err_nonfatal,
> > > rootport_total_nonfatal_errs);
> > >
> > > -static struct attribute *aer_stats_attrs[] __ro_after_init = {
> > > - &dev_attr_aer_dev_correctable.attr,
> > > - &dev_attr_aer_dev_fatal.attr,
> > > - &dev_attr_aer_dev_nonfatal.attr,
> > > - &dev_attr_aer_rootport_total_err_cor.attr,
> > > - &dev_attr_aer_rootport_total_err_fatal.attr,
> > > - &dev_attr_aer_rootport_total_err_nonfatal.attr,
> > > - NULL
> > > -};
> > > -
> > > -static umode_t aer_stats_attrs_are_visible(struct kobject *kobj,
> > > - struct attribute *a, int n)
> > > -{
> > > - struct device *dev = kobj_to_dev(kobj);
> > > - struct pci_dev *pdev = to_pci_dev(dev);
> > > -
> > > - if (!pdev->aer_info)
> > > - return 0;
> > > -
> > > - if ((a == &dev_attr_aer_rootport_total_err_cor.attr ||
> > > - a == &dev_attr_aer_rootport_total_err_fatal.attr ||
> > > - a == &dev_attr_aer_rootport_total_err_nonfatal.attr) &&
> > > - ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> > > - (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_EC)))
> > > - return 0;
> > > -
> > > - return a->mode;
> > > -}
> > > -
> > > -const struct attribute_group aer_stats_attr_group = {
> > > - .attrs = aer_stats_attrs,
> > > - .is_visible = aer_stats_attrs_are_visible,
> > > -};
> > > -
> > > #define aer_ratelimit_attr(name, ratelimit) \
> > > static ssize_t \
> > > name##_show(struct device *dev, struct device_attribute *attr, \
> > > @@ -662,6 +628,14 @@ aer_ratelimit_attr(ratelimit_cor_log, cor_log_ratelimit);
> > > aer_ratelimit_attr(ratelimit_uncor_log, uncor_log_ratelimit);
> > >
> > > static struct attribute *aer_attrs[] __ro_after_init = {
> > > + /* Stats */
> > > + &dev_attr_err_cor.attr,
> > > + &dev_attr_err_fatal.attr,
> > > + &dev_attr_err_nonfatal.attr,
> > > + &dev_attr_rootport_total_err_cor.attr,
> > > + &dev_attr_rootport_total_err_fatal.attr,
> > > + &dev_attr_rootport_total_err_nonfatal.attr,
> > > + /* Ratelimits */
> > > &dev_attr_ratelimit_cor_irq.attr,
> > > &dev_attr_ratelimit_uncor_irq.attr,
> > > &dev_attr_ratelimit_cor_log.attr,
> > > @@ -670,13 +644,21 @@ static struct attribute *aer_attrs[] __ro_after_init = {
> > > };
> > >
> > > static umode_t aer_attrs_are_visible(struct kobject *kobj,
> > > - struct attribute *a, int n)
> > > + struct attribute *a, int n)
> > > {
> > > struct device *dev = kobj_to_dev(kobj);
> > > struct pci_dev *pdev = to_pci_dev(dev);
> > >
> > > if (!pdev->aer_info)
> > > return 0;
> > > +
> > > + if ((a == &dev_attr_rootport_total_err_cor.attr ||
> > > + a == &dev_attr_rootport_total_err_fatal.attr ||
> > > + a == &dev_attr_rootport_total_err_nonfatal.attr) &&
> > > + ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> > > + (pci_pcie_type(pdev) != PCI_EXP_TYPE_RC_EC)))
> > > + return 0;
> > > +
> > > return a->mode;
> > > }
> > >
> >
> >
>
>