Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Karolina Stolarek <karolina.stolarek@oracle.com>
To: Jon Pan-Doh <pandoh@google.com>
Cc: linux-pci@vger.kernel.org, "Bjorn Helgaas" <bhelgaas@google.com>,
	"Martin Petersen" <martin.petersen@oracle.com>,
	"Ben Fuller" <ben.fuller@oracle.com>,
	"Drew Walton" <drewwalton@microsoft.com>,
	"Anil Agrawal" <anilagrawal@meta.com>,
	"Tony Luck" <tony.luck@intel.com>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"Sathyanarayanan Kuppuswamy"
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"Lukas Wunner" <lukas@wunner.de>,
	"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
	"Sargun Dhillon" <sargun@meta.com>,
	"Paul E . McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v5 6/8] PCI/AER: Introduce ratelimit for error logs
Date: Fri, 4 Apr 2025 11:32:30 +0200	[thread overview]
Message-ID: <88557c78-9ead-4a5b-942f-5bd0213356f9@oracle.com> (raw)
In-Reply-To: <CAMC_AXWSXLwnS7-KbK=xFR+r84s+VCYYEegBVFCqehx21L4AeA@mail.gmail.com>

On 21/03/2025 19:41, Jon Pan-Doh wrote:
> On Fri, Mar 21, 2025 at 6:46 AM Karolina Stolarek
> <karolina.stolarek@oracle.com> wrote:
>>
>> On 21/03/2025 02:58, Jon Pan-Doh wrote:
>>>    void aer_print_error(struct pci_dev *dev, struct aer_err_info *info,
>>> -                  const char *level)
>>> +                  const char *level, bool ratelimited)
>>
>> Ideally, we would like to be able to extract the "ratelimited" flag from
>> the aer_err_info struct, with no need for extra parameters in this function.
> 
> I considered this, but pci_print_aer() and dpc_process_aer() both call
> aer_print_error directly without populating info->dev[]. I thought it
> was easier/cleaner to pass it in vs. populate info->dev[] and then
> read it.

We decided to not rate limit DPC and with my patch, eventually, 
hopefully, landing upstream, we will stop caring about pci_print_aer() 
altogether.

>> It looks like we ratelimit the Root Port logs based on the source device
>> that generated the message, and the actual errors in
>> aer_process_err_devices() use their own ratelimits. As you noted in one
>> of your emails, there might be the case where we report errors but
>> there's no information about the Root Port that issued the interrupt
>> we're handling.
> 
> Your understanding is correct. I think the edge case described is
> acceptable behavior:
> 
> 1. Multiple errors arrive where the first source device is ratelimited
> 2. Root port log and first source device log are not printed
> 3. Other error source device(s) logs are printed

I find it inconsistent. I won't block the series on the basis of this 
change but wanted to point out that such a thing can happen.

(...)

>> The way I understood the suggestion in 20250320202913.GA1097165@bhelgaas
>> is that we evaluate the ratelimit of the Root Port or Downstream Port,
>> save it in aer_err_info, and use it in aer_print_rp_info() and
>> aer_print_error(). I'm worried that one noisy device under a Root Port
>> could hit a ratelimit and hide everything else
> 
> This is not a 100% translation of Bjorn's suggestion as I share your
> concern (1 spammy device ratelimits and hides other devices).

I understand.

>> A fair (and complicated) solution would be to check the ratelimits of
>> all devices in the Error Message to see if there is at least one that
>> can be reported. If so, use that ratelimit when printing both the Root
>> Port info and the error details from that device.
> 
> I mentioned this as well[1], albeit briefly. I'd opt for this if my
> initial solution isn't satisfactory. You could make it more
> complicated by substituting the source device, if it is ratelimited,
> to a non-ratelimited one. However, it's not good as it changes the
> default expectation that the root port log would have the source ID.

Oh yes, that sounds overly complicated. I have some doubts about missing 
Root Port logs in that specific case (even if they are confusing and may 
point to the ratelimited source), but let's go with the series as it is.

All the best,
Karolina

  reply	other threads:[~2025-04-04  9:33 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-21  1:57 [PATCH v5 0/8] Rate limit AER logs Jon Pan-Doh
2025-03-21  1:57 ` [PATCH v5 1/8] PCI/AER: Check log level once and propagate down Jon Pan-Doh
2025-05-01 21:43   ` Bjorn Helgaas
2025-05-05  9:30     ` Karolina Stolarek
2025-05-05 17:43       ` Bjorn Helgaas
2025-05-08 15:07         ` Karolina Stolarek
2025-03-21  1:58 ` [PATCH v5 2/8] PCI/AER: Make all pci_print_aer() log levels depend on error type Jon Pan-Doh
2025-03-21  1:58 ` [PATCH v5 3/8] PCI/AER: Move AER stat collection out of __aer_print_error() Jon Pan-Doh
2025-03-21  1:58 ` [PATCH v5 4/8] PCI/AER: Rename aer_print_port_info() to aer_printrp_info() Jon Pan-Doh
2025-03-21 13:39   ` Karolina Stolarek
2025-03-21 19:26     ` Jon Pan-Doh
2025-03-21  1:58 ` [PATCH v5 5/8] PCI/AER: Rename struct aer_stats to aer_report Jon Pan-Doh
2025-03-21 22:01   ` Bjorn Helgaas
2025-03-21 22:15     ` Jon Pan-Doh
2025-03-21 22:30       ` Paul E. McKenney
2025-03-21 22:16     ` Paul E. McKenney
2025-03-21 22:39       ` Bjorn Helgaas
2025-03-21 22:47         ` Paul E. McKenney
2025-05-01 22:02         ` Bjorn Helgaas
2025-05-02  2:16           ` Paul E. McKenney
2025-03-21  1:58 ` [PATCH v5 6/8] PCI/AER: Introduce ratelimit for error logs Jon Pan-Doh
2025-03-21 13:46   ` Karolina Stolarek
2025-03-21 18:41     ` Jon Pan-Doh
2025-04-04  9:32       ` Karolina Stolarek [this message]
2025-03-25 17:17   ` Paul E. McKenney
2025-03-27 22:49     ` Jon Pan-Doh
2025-04-03 19:02       ` Paul E. McKenney
2025-03-31 18:48   ` Bjorn Helgaas
2025-04-01  0:30     ` Jon Pan-Doh
2025-04-01 18:02       ` Bjorn Helgaas
2025-04-24 20:31   ` Bjorn Helgaas
2025-03-21  1:58 ` [PATCH v5 7/8] PCI/AER: Add ratelimits to PCI AER Documentation Jon Pan-Doh
2025-03-21  1:58 ` [PATCH v5 8/8] PCI/AER: Add sysfs attributes for log ratelimits Jon Pan-Doh
2025-03-23 12:20   ` Krzysztof Wilczyński
2025-03-27 22:50     ` Jon Pan-Doh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=88557c78-9ead-4a5b-942f-5bd0213356f9@oracle.com \
    --to=karolina.stolarek@oracle.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=anilagrawal@meta.com \
    --cc=ben.fuller@oracle.com \
    --cc=bhelgaas@google.com \
    --cc=drewwalton@microsoft.com \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=martin.petersen@oracle.com \
    --cc=pandoh@google.com \
    --cc=paulmck@kernel.org \
    --cc=sargun@meta.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=tony.luck@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox