* RFC: Counters for PCI Express AERs
From: Rajat Jain @ 2018-05-17 21:05 UTC
To: linux-pci, Bjorn Helgaas

Hello,

I have been thinking about adding counters for different kinds of AERs
and exposing them via sysfs. IMHO this would help give some sense of
"link quality" for PCIe links: a lot of correctable AERs may mean the
system still works, but may also indicate signal integrity issues.
Currently we log correctable AERs, but exposing counters in sysfs would
let userspace tools poll them (possibly periodically) and raise a
warning when there are too many errors. For my purposes, getting some
idea of PCI link quality, or a way to quantify it, would help.

Do you think such counters make sense or would be helpful generically?
Also, please let me know if something like this already exists?

Thanks,

Rajat
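A minimal Python sketch of the two halves of this proposal: per-type counting on the kernel side, and threshold-based polling in userspace. It is illustrative only. The bit positions are those of the Correctable Error Status register in the PCIe Base Specification; the short labels, the one "<name> <count>" line per counter file format, the TOTAL key, and all function names are hypothetical assumptions, not an existing kernel interface.

```python
import time
from collections import Counter

# Bit positions in the PCIe AER Correctable Error Status register
# (PCIe Base Specification). The short labels are illustrative only.
COR_ERROR_BITS = {
    0: "RxErr",         # Receiver Error
    6: "BadTLP",        # Bad TLP
    7: "BadDLLP",       # Bad DLLP
    8: "Rollover",      # REPLAY_NUM Rollover
    12: "Timeout",      # Replay Timer Timeout
    13: "NonFatalErr",  # Advisory Non-Fatal Error
    14: "CorrIntErr",   # Corrected Internal Error
    15: "HeaderOF",     # Header Log Overflow
}


def count_correctable(counters, status):
    """Kernel-side idea: bump one counter per known status bit that is set,
    plus a (hypothetical) TOTAL counter once per reported error event."""
    for bit, name in COR_ERROR_BITS.items():
        if status & (1 << bit):
            counters[name] += 1
    counters["TOTAL"] += 1


def format_counters(counters):
    """Render counters as '<name> <count>' lines (assumed file format)."""
    names = list(COR_ERROR_BITS.values()) + ["TOTAL"]
    return "".join(f"{name} {counters[name]}\n" for name in names)


def parse_counters(text):
    """Userspace side: parse '<name> <count>' lines back into a dict."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            result[parts[0]] = int(parts[1])
    return result


def too_many_errors(prev, curr, threshold, key="TOTAL"):
    """True if the counter grew by more than `threshold` between two samples."""
    return curr.get(key, 0) - prev.get(key, 0) > threshold


def watch(path, interval_s=60.0, threshold=100):
    """Poll a counter file forever and warn on bursts of errors.
    `path` is a placeholder; this sketch assumes no particular sysfs location."""
    with open(path) as f:
        prev = parse_counters(f.read())
    while True:
        time.sleep(interval_s)
        with open(path) as f:
            curr = parse_counters(f.read())
        if too_many_errors(prev, curr, threshold):
            print(f"warning: {path}: more than {threshold} correctable errors "
                  f"in {interval_s:.0f}s; check link quality")
        prev = curr
```

Keeping only raw, monotonically increasing counts in the kernel and leaving the rate and threshold policy to userspace keeps the kernel side small; each tool can pick its own polling interval and warning threshold.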
* Re: RFC: Counters for PCI Express AERs
From: okaya @ 2018-05-17 21:25 UTC
To: Rajat Jain
Cc: linux-pci, Bjorn Helgaas, linux-pci-owner

On 2018-05-17 17:05, Rajat Jain wrote:
[...]
> Do you think such counters make sense or would be helpful generically?
> Also, please let me know if something like this already exists?

This question came from FB folks last year. They were told to use the perf
events for counting.

I don't honestly have a strong opinion.
* Re: RFC: Counters for PCI Express AERs
From: Rajat Jain @ 2018-05-17 21:48 UTC
To: Sinan Kaya, Jes Sorensen
Cc: linux-pci, Bjorn Helgaas, linux-pci-owner

[+Jes Sorensen]

On Thu, May 17, 2018 at 2:25 PM, <okaya@codeaurora.org> wrote:
> On 2018-05-17 17:05, Rajat Jain wrote:
[...]
>> Do you think such counters make sense or would be helpful generically?
>> Also, please let me know if something like this already exists?
>
> This question came from FB folks last year. They were told to use the perf
> events for counting.

Thanks for the info. I think you are referring to this:
https://linuxplumbersconf.org/2017/ocw/proposals/4803.html

Jes: did anything come out of the proposal? I'm wondering if you have
any work-in-progress patch that I could use, maybe as a starting point?

Thanks,

Rajat

>
> I don't honestly have a strong opinion.

Thanks! I'd like to work on this if not already done.
* Re: RFC: Counters for PCI Express AERs
From: Rajat Jain @ 2018-05-17 21:52 UTC
To: Sinan Kaya, jsorensen
Cc: linux-pci, Bjorn Helgaas, linux-pci-owner

[Fixing the new email address for Jes Sorensen now]

On Thu, May 17, 2018 at 2:48 PM, Rajat Jain <rajatja@google.com> wrote:
> [+Jes Sorensen]
>
> On Thu, May 17, 2018 at 2:25 PM, <okaya@codeaurora.org> wrote:
[...]
>> This question came from FB folks last year. They were told to use the perf
>> events for counting.
>
> Thanks for the info. I think you are referring to this:
> https://linuxplumbersconf.org/2017/ocw/proposals/4803.html
>
> Jes: did anything come out of the proposal? I'm wondering if you have
> any work-in-progress patch that I could use, maybe as a starting point?
>
> Thanks,
>
> Rajat
[...]
> Thanks! I'd like to work on this if not already done.
* Re: RFC: Counters for PCI Express AERs
From: Jes Sorensen @ 2018-05-18 14:24 UTC
To: Rajat Jain, Sinan Kaya
Cc: linux-pci, Bjorn Helgaas, linux-pci-owner, Kyle McMartin

On 05/17/2018 05:52 PM, Rajat Jain wrote:
> [Fixing the new email address for Jes Sorensen now]
>
> On Thu, May 17, 2018 at 2:48 PM, Rajat Jain <rajatja@google.com> wrote:
[...]
>> Jes: did anything come out of the proposal? I'm wondering if you have
>> any work-in-progress patch that I could use, maybe as a starting point?

Kyle McMartin was working on this, I don't know the current status.

Jes
* Re: RFC: Counters for PCI Express AERs
From: Rajat Jain @ 2018-05-18 16:31 UTC
To: Jes Sorensen
Cc: Sinan Kaya, linux-pci, Bjorn Helgaas, linux-pci-owner, Kyle McMartin

On Fri, May 18, 2018 at 7:24 AM, Jes Sorensen <jsorensen@fb.com> wrote:
> On 05/17/2018 05:52 PM, Rajat Jain wrote:
[...]
>>> Jes: did anything come out of the proposal? I'm wondering if you have
>>> any work-in-progress patch that I could use, maybe as a starting point?
>
> Kyle McMartin was working on this, I don't know the current status.

Never mind, I think I'm more than halfway there and will be sending a
patch in a day or two.
* Re: RFC: Counters for PCI Express AERs
From: Kyle McMartin @ 2018-05-23 22:20 UTC
To: Rajat Jain
Cc: Jes Sorensen, Sinan Kaya, linux-pci, Bjorn Helgaas

On Fri, May 18, 2018 at 09:31:11AM -0700, Rajat Jain wrote:
> >>> Jes: did anything come out of the proposal? I'm wondering if you have
> >>> any work-in-progress patch that I could use, maybe as a starting point?
> >
> > Kyle McMartin was working on this, I don't know the current status.
>
> Never mind, I think I'm more than halfway there and will be sending a
> patch in a day or two.

The patch set looks good to me; it's pretty analogous to what I came up
with in the fall. I ended up using the tracepoints instead of patching
in sysfs, so that I could enable it across all of our kernel versions
in production.

Really glad you did this work; hopefully it gets merged.

cheers,
Kyle
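The tracepoint route Kyle describes can be sketched in userspace too: enable the kernel's ras:aer_event tracepoint and tally events per device and severity from the trace text. This Python sketch assumes each event line contains "aer_event: <device> PCIe Bus Error: severity=<severity>,", with severity one of "Corrected", "Fatal", or "Uncorrected, non-fatal"; verify that format against include/ras/ras_event.h for your kernel. The function and regex names are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of one ras:aer_event line in the trace output, e.g.
# "... aer_event: 0000:00:1c.0 PCIe Bus Error: severity=Corrected, ..."
AER_EVENT_RE = re.compile(
    r"aer_event: (?P<dev>\S+) PCIe Bus Error: "
    r"severity=(?P<sev>Corrected|Fatal|Uncorrected, non-fatal)"
)


def tally_aer_events(lines):
    """Count AER tracepoint events per (device, severity); skip other lines."""
    counts = Counter()
    for line in lines:
        m = AER_EVENT_RE.search(line)
        if m:
            counts[(m.group("dev"), m.group("sev"))] += 1
    return counts
```

The input would be the trace buffer after writing 1 to events/ras/aer_event/enable under the tracefs mount. For a plain total without the per-device breakdown, perf can count the same tracepoint, e.g. perf stat -e ras:aer_event -a sleep 60, which is close to the perf events suggestion earlier in the thread.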