qemu-devel.nongnu.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Malcolm Crossley <malcolm.crossley@citrix.com>
Cc: xen-devel@lists.xensource.com, pmatouse@redhat.com,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	qemu-devel@nongnu.org, Jan Beulich <JBeulich@suse.com>
Subject: Re: [Qemu-devel] [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
Date: Mon, 8 Jun 2015 10:59:51 +0200
Message-ID: <20150608104102-mutt-send-email-mst@redhat.com>
In-Reply-To: <55754DAB.2070102@citrix.com>

On Mon, Jun 08, 2015 at 09:09:15AM +0100, Malcolm Crossley wrote:
> On 08/06/15 08:42, Jan Beulich wrote:
> >>>> On 07.06.15 at 08:23, <mst@redhat.com> wrote:
> >> On Mon, Apr 20, 2015 at 04:32:12PM +0200, Michael S. Tsirkin wrote:
> >>> On Mon, Apr 20, 2015 at 03:08:09PM +0100, Jan Beulich wrote:
> >>>>>>> On 20.04.15 at 15:43, <mst@redhat.com> wrote:
> >>>>> On Mon, Apr 13, 2015 at 01:51:06PM +0100, Jan Beulich wrote:
> >>>>>>>>> On 13.04.15 at 14:47, <mst@redhat.com> wrote:
> >>>>>>> Can you check device capabilities register, offset 0x4 within
> >>>>>>> pci express capability structure?
> >>>>>>> Bit 15 is Role-Based Error Reporting.
> >>>>>>> Is it set?
> >>>>>>>
> >>>>>>> The spec says:
> >>>>>>>
> >>>>>>> 	15
> >>>>>>> 	On platforms where robust error handling and PC-compatible
> >>>>>>> 	Configuration Space probing is required, it is suggested that
> >>>>>>> 	software or firmware have the Unsupported Request Reporting
> >>>>>>> 	Enable bit Set for Role-Based Error Reporting Functions, but
> >>>>>>> 	clear for 1.0a Functions. Software or firmware can distinguish
> >>>>>>> 	the two classes of Functions by examining the Role-Based Error
> >>>>>>> 	Reporting bit in the Device Capabilities register.
> >>>>>>
> >>>>>> Yes, that bit is set.
> >>>>>
> >>>>> curiouser and curiouser.
> >>>>>
> >>>>> So with functions that do support Role-Based Error Reporting, we have
> >>>>> this:
> >>>>>
> >>>>>
> >>>>> 	With device Functions implementing Role-Based Error Reporting,
> >>>>> 	setting the Unsupported Request Reporting Enable bit will not
> >>>>> 	interfere with PC-compatible Configuration Space probing,
> >>>>> 	assuming that the severity for UR is left at its default of
> >>>>> 	non-fatal. However, setting the Unsupported Request Reporting
> >>>>> 	Enable bit will enable the Function to report UR errors
> >>>>> 	detected with posted Requests, helping avoid this case for
> >>>>> 	potential silent data corruption.
> >>>>
> >>>> I still don't see what the PC-compatible config space probing has to
> >>>> do with our issue.
> >>>
> >>> I'm not sure but I think it's listed here because it causes a ton of URs
> >>> when device scan probes unimplemented functions.
> >>>
> >>>>> did firmware reconfigure this device to report URs as fatal errors then?
> >>>>
> >>>> No, the Unsupported Request Error Severity flag is zero.
> >>>
> >>> OK, that's the correct configuration, so how come the box crashes when
> >>> there's a UR then?
> >>
> >> Ping - any update on this?
> > 
> > Not really. All we concluded so far is that _maybe_ the bridge, upon
> > seeing the UR, generates a Master Abort, rendering the whole thing
> > fatal. Otoh the respective root port also has
> > - Received Master Abort set in its Secondary Status register (but
> >   that's also already the case in the log that we have before the UR
> >   occurs, i.e. that doesn't mean all that much),
> > - Received System Error set in its Secondary Status register (and
> >   after the UR the sibling endpoint [UR originating from 83:00.0,
> >   sibling being 83:00.1] also shows Signaled System Error set).
> > 
> 
> Disabling the Memory decode in the command register could also result in a completion timeout on the
> root port issuing a transaction towards the PCI device in question.

Can it really? Such a device would violate the PCIe spec, which says:

	If the request is not claimed, then it is handled as an
	Unsupported Request, which is the PCI Express equivalent of
	conventional PCI’s Master Abort termination.




> PCIE completion timeouts can be
> escalated to Fatal AER errors which trigger system firmware to inject NMIs into the host.
> 
> Unsupported requests can also be escalated to be Fatal AER errors (which would again trigger system
> firmware to inject an NMI).

Only if the system is misconfigured. We found out the system in question
is not configured to do this.
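
To make the escalation rule concrete, here is a minimal Python sketch of how
the AER Uncorrectable Error Mask and Severity bits combine for one error type.
The helper name classify and the sample register values are illustrative
assumptions, not readings from the box under discussion. The bit positions
follow the AER capability layout (Completion Timeout is bit 14, Unsupported
Request is bit 20). The sketch is simplified: it ignores the Unsupported
Request Reporting Enable bit in Device Control, which also gates UR reporting.

```python
# Bit positions in the AER Uncorrectable Error Mask and Severity registers.
UNC_COMP_TIME = 1 << 14  # Completion Timeout (lspci: CmpltTO)
UNC_UNSUP = 1 << 20      # Unsupported Request (lspci: UnsupReq)


def classify(bit, mask, severity):
    """Describe how one uncorrectable error is handled (hypothetical helper).

    mask     -- value of the Uncorrectable Error Mask register
    severity -- value of the Uncorrectable Error Severity register
    """
    if mask & bit:
        return "masked"
    return "fatal" if severity & bit else "non-fatal"


# Illustrative values only: UR unmasked with severity clear stays non-fatal;
# it becomes fatal only if someone sets its severity bit.
print(classify(UNC_UNSUP, mask=0, severity=0))          # non-fatal
print(classify(UNC_UNSUP, mask=0, severity=UNC_UNSUP))  # fatal
print(classify(UNC_UNSUP, mask=UNC_UNSUP, severity=0))  # masked
```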


> Here is an example AER setup for a PCIE root port. You can see UnsupReq errors are masked and so do
> not trigger errors. CmpltTO (completion timeout) errors are not masked and the errors are treated
> as Fatal because the corresponding bit in the Uncorrectable Severity register is set.
> 
> Capabilities: [148 v1] Advanced Error Reporting
> UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol+
> UESvrt:	DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
> AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
> 
> A root port completion timeout will also result in the master abort bit being set.

How do you figure this one out? The spec I have says master abort is the
equivalent of UR.

> Typically system firmware clears the error in the AER registers after it's processed it. So the
> operating system may not be able to determine what error triggered the NMI in the first place.

At least for debugging, just disable firmware and handle everything in
software.

> >> So can we chalk this up to hardware bugs on a specific box?
> > 
> > I have to admit that I'm still very uncertain whether to consider all
> > this correct behavior, a firmware flaw, or a hardware bug.
> I believe the correct behaviour is happening but a PCIE completion timeout is occurring instead of an
> unsupported request.
> 
> Malcolm

This guess would be easy to check - just mask out the timeout bit.
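
As a rough illustration of what "mask out the timeout bit" means in register
terms, the Python sketch below turns the UEMsk line from the lspci dump quoted
above into a 32-bit value and sets the Completion Timeout bit. The function
name parse_flags and the table UE_BITS are made up for this example; the bit
positions follow the AER Uncorrectable Error register layout. It only computes
the value and does not touch any hardware.

```python
# lspci flag name -> bit position in the AER Uncorrectable Error registers.
UE_BITS = {
    "DLP": 4, "SDES": 5, "TLP": 12, "FCP": 13, "CmpltTO": 14,
    "CmpltAbrt": 15, "UnxCmplt": 16, "RxOF": 17, "MalfTLP": 18,
    "ECRC": 19, "UnsupReq": 20, "ACSViol": 21,
}


def parse_flags(line):
    """Convert an lspci line such as 'UEMsk: DLP- SDES+ ...' to an integer."""
    _, _, flags = line.partition(":")
    value = 0
    for token in flags.split():
        name, state = token[:-1], token[-1]
        if state == "+":
            value |= 1 << UE_BITS[name]
    return value


# The UEMsk line from the example root port in this thread.
uemsk = parse_flags(
    "UEMsk:\tDLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ "
    "RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol+"
)
# Setting the CmpltTO bit in the mask stops completion timeouts from being
# reported through AER.
new_mask = uemsk | (1 << UE_BITS["CmpltTO"])
print(hex(uemsk), hex(new_mask))  # 0x318000 0x31c000
```

If the crash goes away with the new mask value written to the Uncorrectable
Error Mask register (offset 0x08 in the AER capability), that would point at a
completion timeout rather than a UR.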



> 
> > 
> > Jan
