From: Lukas Wunner <lukas@wunner.de>
To: Brett Creeley <bcreeley@amd.com>
Cc: Shannon Nelson <shannon.nelson@amd.com>,
	netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
	edumazet@google.com, pabeni@redhat.com, brett.creeley@amd.com,
	drivers@pensando.io
Subject: Re: [PATCH net-next 1/3] pds_core: add simple AER handler
Date: Wed, 6 Aug 2025 08:58:40 +0200
Message-ID: <aJL9IBagGNUagEbv@wunner.de>
In-Reply-To: <48ffde5c-084f-4ad6-8be7-314afb14b2ac@amd.com>

On Tue, Aug 05, 2025 at 03:10:19PM -0700, Brett Creeley wrote:
> On 8/5/2025 8:01 AM, Lukas Wunner wrote:
> > On Fri, Feb 16, 2024 at 02:29:50PM -0800, Shannon Nelson wrote:
> > > Set up the pci_error_handlers error_detected and resume to be
> > > useful in handling AER events.
> > 
> > The above was committed as d740f4be7cf0 ("pds_core: add simple
> > AER handler").
> > 
> > Just noticed the following while inspecting the pci_error_handlers
> > of this driver:
> > 
> > > +static pci_ers_result_t pdsc_pci_error_detected(struct pci_dev *pdev,
> > > +                                             pci_channel_state_t error)
> > > +{
> > > +     if (error == pci_channel_io_frozen) {
> > > +             pdsc_reset_prepare(pdev);
> > > +             return PCI_ERS_RESULT_NEED_RESET;
> > > +     }
> > > +
> > > +     return PCI_ERS_RESULT_NONE;
> > > +}
> > 
> > The ->error_detected() callback of this driver invokes
> > pdsc_reset_prepare(), which unmaps BARs and calls pci_disable_device(),
> > but there is no corresponding ->slot_reset() callback which would invoke
> > pdsc_reset_done() to re-enable the device after reset recovery.
> 
> Thanks for the note. It's been a bit since I have looked at this, but I
> believe that it's working in the following way:
> 
> 1. pds_core's pci_error_handlers.error_detected callback returns
> PCI_ERS_RESULT_NEED_RESET
> 2. status is initialized to PCI_ERS_RESULT_RECOVERED in the pci core and
> since pds_core doesn't have a slot_reset callback then status remains
> PCI_ERS_RESULT_RECOVERED
> 3. pds_core's pci_error_handlers.resume callback is called, which will
> attempt to reset/recover the device to a functional state

My point is, you're calling pdsc_reset_prepare() but you're never calling
pdsc_reset_done().  The former performs various teardown steps and calls
pci_disable_device(), which disables MMIO access to the device.  Since
you're never calling pdsc_reset_done(), you're not re-enabling MMIO
access to the device and re-initializing the device.  So I'd expect
any subsequent device access to fail.

Normally you'd have a ->slot_reset() callback which would call
pdsc_reset_done().  Then the code would look sane.

Moreover, the AER driver in the PCI core performs an unconditional 
Secondary Bus Reset on Fatal Errors (channel state pci_channel_io_frozen).
You're performing an additional reset of the PCI Function in
pdsc_pci_error_resume().  At least for Fatal Errors, this seems
superfluous.

You're only resetting the PCI Function if the PDSC_S_FW_DEAD bit is set,
irrespective of whether you're dealing with a Fatal or Non-Fatal Error.

Normally I'd expect that you need to perform some re-initialization
after resetting the PCI Function, so this also looks weird.

Thanks,

Lukas

Thread overview: 10+ messages
2024-02-16 22:29 [PATCH net-next 0/3] pds_core: AER handling Shannon Nelson
2024-02-16 22:29 ` [PATCH net-next 1/3] pds_core: add simple AER handler Shannon Nelson
2025-08-05 15:01   ` Lukas Wunner
2025-08-05 22:10     ` Brett Creeley
2025-08-05 22:28       ` Shannon Nelson
2025-08-06  6:58       ` Lukas Wunner [this message]
2025-08-06  7:08         ` Lukas Wunner
2024-02-16 22:29 ` [PATCH net-next 2/3] pds_core: delete VF dev on reset Shannon Nelson
2024-02-16 22:29 ` [PATCH net-next 3/3] pds_core: use pci_reset_function for health reset Shannon Nelson
2024-02-19 10:40 ` [PATCH net-next 0/3] pds_core: AER handling patchwork-bot+netdevbpf
