All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Alex_Gagniuc@Dellteam.com
Cc: oohall@gmail.com, gregkh@linuxfoundation.org,
	keith.busch@intel.com, mr.nuke.me@gmail.com,
	linux-pci@vger.kernel.org, Austin.Bolen@dell.com,
	Shyam.Iyer@dell.com, linux-kernel@vger.kernel.org,
	jonathan.derrick@intel.com, lukas@wunner.de, ruscur@russell.cc,
	sbobroff@linux.ibm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
Date: Thu, 15 Nov 2018 00:24:23 -0600	[thread overview]
Message-ID: <20181115062423.GA94998@google.com> (raw)
In-Reply-To: <1eb0fa27924f426992715684f5e63346@ausx13mps321.AMER.DELL.COM>

On Wed, Nov 14, 2018 at 07:22:04PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 11/14/2018 12:00 AM, Bjorn Helgaas wrote:
> > On Tue, Nov 13, 2018 at 10:39:15PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> >> On 11/12/2018 11:02 PM, Bjorn Helgaas wrote:
> >> ...
> >>> Do you think Linux observes the rule about not touching AER bits on
> >>> FFS?  I'm not sure it does.  I'm not even sure what section of the
> >>> spec is relevant.
> >>
> >> I haven't found any place where linux breaks this rule. I'm very
> >> confident that, unless otherwise instructed, we follow this rule.
> > 
> > Just to make sure we're on the same page, can you point me to this
> > rule?  I do see that OSPM must request control of AER using _OSC
> > before it touches the AER registers.  What I don't see is the
> > connection between firmware-first and the AER registers.
> 
> ACPI 6.2 - 6.2.11.3, Table 6-197:
> 
> PCI Express Advanced Error Reporting control:
>   * The firmware sets this bit to 1 to grant control over PCI Express 
> Advanced Error Reporting. If firmware allows the OS control of this 
> feature, then in the context of the _OSC method it must ensure that 
> error messages are routed to device interrupts as described in the PCI 
> Express Base Specification[...]

The PCIe Base Spec is pretty big, so I wish this reference were a
little more explicit.  I *guess* maybe it's referring to PCIe r4.0,
figure 6-3 in sec 6.2.6, where PCIe ERR_* Messages can be routed to
"INTx or MSI Error Interrupts" and/or "platform-specific System Error"
interrupts.

"Device interrupts" seems like it refers to the "INTx or MSI"
interrupts, not the platform-specific System Errors, so I would read
that as saying "if firmware grants OS control of AER via _OSC,
firmware must set the AER Reporting Enables in the AER Root Error
Command register."  But that seems a little silly because the OS now
*owns* the AER capability and it can set the AER Root Error Command
register itself if it wants to.

And I still don't see the connection here with Firmware-First.  I'm
pretty sure firmware could not be notified via INTx or MSI interrupts
because those are totally managed by OSPM.

> > The closest I can find is the "Enabled" field in the HEST PCIe
> > AER structures (ACPI v6.2, sec 18.3.2.4, .5, .6), where it says:
> > 
> >    If the field value is 1, indicates this error source is
> >    to be enabled.
> > 
> >    If the field value is 0, indicates that the error source
> >    is not to be enabled.
> > 
> >    If FIRMWARE_FIRST is set in the flags field, the Enabled
> >    field is ignored by the OSPM.
> > 
> > AFAICT, Linux completely ignores the Enabled field in these
> > structures.
> 
> I don't think ignoring the field is a problem:
>   * With FFS, OS should ignore it.
>   * Without FFS, we have control, and we get to make the decisions anyway.
> In the latter case we decide whether to use AER, independent of the crap 
> in ACPI. I'm not even sure why "Enabled" matters in native AER handling. 

It seems like these HEST structures are "here's how firmware thinks
you should set up AER on this device".  But I agree, I have no idea
how to interpret "Enabled".  The rest of the HEST fields cover all the
useful AER registers, including the Reporting Enables in the AER Root
Error Command register *and* the Error Reporting Enables in the Device
Control register.  So I don't know what the "Enabled" field adds to
all that.  What a mess.

> > For firmware-first to work, firmware has to get control.  How does
> > it get control?  How does OSPM know to either set up that
> > mechanism or keep its mitts off something firmware set up before
> > handoff?
> 
> My understanding is that, if FW keeps control of AER in _OSC, then
> it will have set things up to get notified instead of the OS. OSPM
> not touching AER bits is to make sure it doesn't mess up FW's setup.
> I think there are some proprietary bits in the root port to route
> interrupts to SMIs instead of the AER vectors.

It makes good sense that if OSPM doesn't have AER control, firmware
does all AER handling, including any setup for firmware-first
notification.  If we can assume that firmware-first notification is
done in some way the OS doesn't know about and can't mess up, that
would be awesome.

But I think the VMD model really has nothing to do with the APEI
firmware-first model.  With VMD, it sounds like OSPM owns the AER
capability and doesn't know firmware exists *except* that it has to be
careful not to step on firmware's interrupt.  So maybe we can handle it
separately.

Bjorn

WARNING: multiple messages have this Message-ID (diff)
From: Bjorn Helgaas <helgaas@kernel.org>
To: Alex_Gagniuc@Dellteam.com
Cc: Shyam.Iyer@dell.com, sbobroff@linux.ibm.com,
	gregkh@linuxfoundation.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, keith.busch@intel.com,
	lukas@wunner.de, oohall@gmail.com, mr.nuke.me@gmail.com,
	Austin.Bolen@dell.com, linuxppc-dev@lists.ozlabs.org,
	jonathan.derrick@intel.com
Subject: Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected
Date: Thu, 15 Nov 2018 00:24:23 -0600	[thread overview]
Message-ID: <20181115062423.GA94998@google.com> (raw)
In-Reply-To: <1eb0fa27924f426992715684f5e63346@ausx13mps321.AMER.DELL.COM>

On Wed, Nov 14, 2018 at 07:22:04PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 11/14/2018 12:00 AM, Bjorn Helgaas wrote:
> > On Tue, Nov 13, 2018 at 10:39:15PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> >> On 11/12/2018 11:02 PM, Bjorn Helgaas wrote:
> >> ...
> >>> Do you think Linux observes the rule about not touching AER bits on
> >>> FFS?  I'm not sure it does.  I'm not even sure what section of the
> >>> spec is relevant.
> >>
> >> I haven't found any place where linux breaks this rule. I'm very
> >> confident that, unless otherwise instructed, we follow this rule.
> > 
> > Just to make sure we're on the same page, can you point me to this
> > rule?  I do see that OSPM must request control of AER using _OSC
> > before it touches the AER registers.  What I don't see is the
> > connection between firmware-first and the AER registers.
> 
> ACPI 6.2 - 6.2.11.3, Table 6-197:
> 
> PCI Express Advanced Error Reporting control:
>   * The firmware sets this bit to 1 to grant control over PCI Express 
> Advanced Error Reporting. If firmware allows the OS control of this 
> feature, then in the context of the _OSC method it must ensure that 
> error messages are routed to device interrupts as described in the PCI 
> Express Base Specification[...]

The PCIe Base Spec is pretty big, so I wish this reference were a
little more explicit.  I *guess* maybe it's referring to PCIe r4.0,
figure 6-3 in sec 6.2.6, where PCIe ERR_* Messages can be routed to
"INTx or MSI Error Interrupts" and/or "platform-specific System Error"
interrupts.

"Device interrupts" seems like it refers to the "INTx or MSI"
interrupts, not the platform-specific System Errors, so I would read
that as saying "if firmware grants OS control of AER via _OSC,
firmware must set the AER Reporting Enables in the AER Root Error
Command register."  But that seems a little silly because the OS now
*owns* the AER capability and it can set the AER Root Error Command
register itself if it wants to.

And I still don't see the connection here with Firmware-First.  I'm
pretty sure firmware could not be notified via INTx or MSI interrupts
because those are totally managed by OSPM.

> > The closest I can find is the "Enabled" field in the HEST PCIe
> > AER structures (ACPI v6.2, sec 18.3.2.4, .5, .6), where it says:
> > 
> >    If the field value is 1, indicates this error source is
> >    to be enabled.
> > 
> >    If the field value is 0, indicates that the error source
> >    is not to be enabled.
> > 
> >    If FIRMWARE_FIRST is set in the flags field, the Enabled
> >    field is ignored by the OSPM.
> > 
> > AFAICT, Linux completely ignores the Enabled field in these
> > structures.
> 
> I don't think ignoring the field is a problem:
>   * With FFS, OS should ignore it.
>   * Without FFS, we have control, and we get to make the decisions anyway.
> In the latter case we decide whether to use AER, independent of the crap 
> in ACPI. I'm not even sure why "Enabled" matters in native AER handling. 

It seems like these HEST structures are "here's how firmware thinks
you should set up AER on this device".  But I agree, I have no idea
how to interpret "Enabled".  The rest of the HEST fields cover all the
useful AER registers, including the Reporting Enables in the AER Root
Error Command register *and* the Error Reporting Enables in the Device
Control register.  So I don't know what the "Enabled" field adds to
all that.  What a mess.

> > For firmware-first to work, firmware has to get control.  How does
> > it get control?  How does OSPM know to either set up that
> > mechanism or keep its mitts off something firmware set up before
> > handoff?
> 
> My understanding is that, if FW keeps control of AER in _OSC, then
> it will have set things up to get notified instead of the OS. OSPM
> not touching AER bits is to make sure it doesn't mess up FW's setup.
> I think there are some proprietary bits in the root port to route
> interrupts to SMIs instead of the AER vectors.

It makes good sense that if OSPM doesn't have AER control, firmware
does all AER handling, including any setup for firmware-first
notification.  If we can assume that firmware-first notification is
done in some way the OS doesn't know about and can't mess up, that
would be awesome.

But I think the VMD model really has nothing to do with the APEI
firmware-first model.  With VMD, it sounds like OSPM owns the AER
capability and doesn't know firmware exists *except* that it has to be
careful not to step on firmware's interrupt.  So maybe we can handle it
separately.

Bjorn

  parent reply	other threads:[~2018-11-15  6:24 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-18 22:15 [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected Alexandru Gagniuc
2018-11-06  0:32 ` Alex G.
2018-11-07 17:04   ` Derrick, Jonathan
2018-11-07 23:42 ` Bjorn Helgaas
2018-11-08 20:09   ` Bjorn Helgaas
2018-11-08 20:09     ` Bjorn Helgaas
2018-11-08 21:49     ` Keith Busch
2018-11-08 21:49       ` Keith Busch
2018-11-08 22:01     ` Greg Kroah-Hartman
2018-11-08 22:01       ` Greg Kroah-Hartman
2018-11-08 22:32       ` Keith Busch
2018-11-08 22:32         ` Keith Busch
2018-11-08 22:42         ` Greg Kroah-Hartman
2018-11-08 22:42           ` Greg Kroah-Hartman
2018-11-08 22:49           ` Alex_Gagniuc
2018-11-08 22:49             ` Alex_Gagniuc
2018-11-08 22:51             ` Greg KH
2018-11-08 22:51               ` Greg KH
2018-11-08 23:06               ` Alex_Gagniuc
2018-11-08 23:06                 ` Alex_Gagniuc
2018-11-12  5:49                 ` Oliver O'Halloran
2018-11-12  5:49                   ` Oliver O'Halloran
2018-11-12 20:05                   ` Alex_Gagniuc
2018-11-12 20:05                     ` Alex_Gagniuc
2018-11-13  5:02                     ` Bjorn Helgaas
2018-11-13  5:02                       ` Bjorn Helgaas
2018-11-13 22:39                       ` Alex_Gagniuc
2018-11-13 22:39                         ` Alex_Gagniuc
2018-11-13 22:52                         ` Keith Busch
2018-11-13 22:52                           ` Keith Busch
2018-11-14  0:31                           ` Alex_Gagniuc
2018-11-14  0:31                             ` Alex_Gagniuc
2018-11-14  5:59                         ` Bjorn Helgaas
2018-11-14  5:59                           ` Bjorn Helgaas
2018-11-14 19:22                           ` Alex_Gagniuc
2018-11-14 19:22                             ` Alex_Gagniuc
2018-11-14 19:41                             ` Derrick, Jonathan
2018-11-14 19:41                               ` Derrick, Jonathan
2018-11-14 20:23                             ` Keith Busch
2018-11-14 20:23                               ` Keith Busch
2018-11-14 20:52                               ` Alex_Gagniuc
2018-11-14 20:52                                 ` Alex_Gagniuc
2018-11-14 20:58                                 ` Keith Busch
2018-11-14 20:58                                   ` Keith Busch
2018-11-15  6:24                             ` Bjorn Helgaas [this message]
2018-11-15  6:24                               ` Bjorn Helgaas
2018-11-16  0:19                               ` Alex_Gagniuc
2018-11-16  0:19                                 ` Alex_Gagniuc
2018-11-08 23:03           ` Keith Busch
2018-11-08 23:03             ` Keith Busch
2018-11-09  7:29       ` Lukas Wunner
2018-11-09 11:32         ` Greg Kroah-Hartman
2018-11-09 11:32           ` Greg Kroah-Hartman
2018-11-09 16:36           ` Keith Busch
2018-11-09 16:36             ` Keith Busch
2018-11-08 22:20     ` Alex_Gagniuc
2018-11-08 22:20       ` Alex_Gagniuc
2018-11-09  7:11     ` Lukas Wunner
2018-11-12  5:48       ` Oliver O'Halloran
2018-11-12  5:48         ` Oliver O'Halloran
2018-12-27 19:28     ` Alex_Gagniuc
2018-12-27 19:28       ` Alex_Gagniuc

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181115062423.GA94998@google.com \
    --to=helgaas@kernel.org \
    --cc=Alex_Gagniuc@Dellteam.com \
    --cc=Austin.Bolen@dell.com \
    --cc=Shyam.Iyer@dell.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jonathan.derrick@intel.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lukas@wunner.de \
    --cc=mr.nuke.me@gmail.com \
    --cc=oohall@gmail.com \
    --cc=ruscur@russell.cc \
    --cc=sbobroff@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.