From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60208)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1Yhdmb-00080x-KE for qemu-devel@nongnu.org;
 Mon, 13 Apr 2015 08:48:10 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1YhdmT-0005Qg-Lr for qemu-devel@nongnu.org;
 Mon, 13 Apr 2015 08:48:09 -0400
Received: from mx1.redhat.com ([209.132.183.28]:47187)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1YhdmT-0005QW-Fc for qemu-devel@nongnu.org;
 Mon, 13 Apr 2015 08:48:01 -0400
Date: Mon, 13 Apr 2015 14:47:57 +0200
From: "Michael S. Tsirkin"
Message-ID: <20150413144223-mutt-send-email-mst@redhat.com>
References: <20150401103411-mutt-send-email-mst@redhat.com>
 <551BBD38.60204@citrix.com>
 <20150401115032-mutt-send-email-mst@redhat.com>
 <552B97A00200007800071558@mail.emea.novell.com>
 <20150413125101-mutt-send-email-mst@redhat.com>
 <552BC5EA02000078000716E7@mail.emea.novell.com>
 <20150413133843-mutt-send-email-mst@redhat.com>
 <552BD57B0200007800071763@mail.emea.novell.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <552BD57B0200007800071763@mail.emea.novell.com>
Subject: Re: [Qemu-devel] [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
To: Jan Beulich
Cc: Andrew Cooper, xen-devel@lists.xensource.com, pmatouse@redhat.com,
 qemu-devel@nongnu.org, Stefano Stabellini

On Mon, Apr 13, 2015 at 01:40:59PM +0100, Jan Beulich wrote:
> >>> On 13.04.15 at 13:47, wrote:
> > On Mon, Apr 13, 2015 at 12:34:34PM +0100, Jan Beulich wrote:
> >> >>> On 13.04.15 at 13:19, wrote:
> >> > Yes Linux can't fix firmware 1st mode, but
> >> > PCI express spec says what firmware should do in this case:
> >> >
> >> > IMPLEMENTATION NOTE
> >> > Software UR Reporting Compatibility with 1.0a Devices
> >> >
> >> > With 1.0a device
> >> > Functions, if the Unsupported Request Reporting Enable bit is set,
> >> > the Function when operating as a Completer will send an
> >> > uncorrectable error Message (if enabled) when a UR error is
> >> > detected. On platforms where an uncorrectable error Message is
> >> > handled as a System Error, this will break PC-compatible
> >> > Configuration Space probing, so software/firmware on such platforms
> >> > may need to avoid setting the Unsupported Request Reporting Enable
> >> > bit.
> >> >
> >> > With device Functions implementing Role-Based Error Reporting,
> >> > setting the Unsupported Request Reporting Enable bit will not
> >> > interfere with PC-compatible Configuration Space probing, assuming
> >> > that the severity for UR is left at its default of non-fatal.
> >> > However, setting the Unsupported Request Reporting Enable bit will
> >> > enable the Function to report UR errors detected with posted
> >> > Requests, helping avoid this case for potential silent data
> >> > corruption.
> >> >
> >> > On platforms where robust error handling and PC-compatible
> >> > Configuration Space probing is required, it is suggested that
> >> > software or firmware have the Unsupported Request Reporting Enable
> >> > bit Set for Role-Based Error Reporting Functions, but clear for 1.0a
> >> > Functions. Software or firmware can distinguish the two classes of
> >> > Functions by examining the Role-Based Error Reporting bit in the
> >> > Device Capabilities register.
> >> >
> >> >
> >> > What I think you have is a very old 1.0a system, and you set
> >> > Unsupported Request Reporting Enable.
> >> >
> >> > Can you confirm?
> >>
> >> No. In at least one of the two cases we got reports of the original
> >> problem, triggering the finding of this issue, this is a brand new one,
> >> only soon to become available publicly.
> >> Furthermore I'm being
> >> confused by the mention of PC-compatible config space probing
> >> above: The URs we talk about here don't result from config space
> >> accesses at all.
> >
> > OK. Can you please explain why does UR cause a system error then?
> > It looks like a hardware bug: PCIE 1.1 seems to say it shouldn't.
>
> Quite possible. Looking at the ITP log we were provided, the UR
> severity bit is clear (non-fatal), yet the error got surfaced to the
> OS as a fatal one (I would guess because it validly gets flagged as
> uncorrectable at the same time).
>
> Jan

No, that's not valid.

Can you check the Device Capabilities register, at offset 0x4 within
the PCI Express Capability structure?
Bit 15 is Role-Based Error Reporting.
Is it set?

The spec says:

	On platforms where robust error handling and PC-compatible
	Configuration Space probing is required, it is suggested that
	software or firmware have the Unsupported Request Reporting
	Enable bit Set for Role-Based Error Reporting Functions, but
	clear for 1.0a Functions. Software or firmware can distinguish
	the two classes of Functions by examining the Role-Based Error
	Reporting bit in the Device Capabilities register.

--
MST
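
For reference, a minimal sketch of the check asked for above, not code
from Xen or QEMU. It assumes you already have a raw dump of the device's
config space as bytes. The function names are made up for illustration;
the register offsets and bit positions follow the usual PCI/PCIe layout
(PCI Express capability ID 0x10, Device Capabilities at +0x4 with bit 15
Role-Based Error Reporting, Device Control at +0x8 with bit 3 Unsupported
Request Reporting Enable).

```python
import struct

# Standard config space offsets and bits.
PCI_STATUS = 0x06
PCI_STATUS_CAP_LIST = 0x10     # Status bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34     # pointer to first capability
PCI_CAP_ID_EXP = 0x10          # PCI Express capability ID
PCI_EXP_DEVCAP = 0x04          # Device Capabilities, 32 bit
PCI_EXP_DEVCAP_RBER = 1 << 15  # Role-Based Error Reporting
PCI_EXP_DEVCTL = 0x08          # Device Control, 16 bit
PCI_EXP_DEVCTL_URRE = 1 << 3   # Unsupported Request Reporting Enable


def find_pcie_cap(cfg):
    """Return the offset of the PCI Express capability, or None."""
    if len(cfg) < 0x40:
        return None
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against a looping capability list
    while pos >= 0x40 and pos + 2 <= len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return None


def ur_reporting_state(cfg):
    """Return (role_based, ur_reporting_enabled), or None if not PCIe."""
    pos = find_pcie_cap(cfg)
    if pos is None or pos + PCI_EXP_DEVCTL + 2 > len(cfg):
        return None
    (devcap,) = struct.unpack_from("<I", cfg, pos + PCI_EXP_DEVCAP)
    (devctl,) = struct.unpack_from("<H", cfg, pos + PCI_EXP_DEVCTL)
    return (bool(devcap & PCI_EXP_DEVCAP_RBER),
            bool(devctl & PCI_EXP_DEVCTL_URRE))
```

A result of (False, True) would be the combination the implementation
note warns about: a 1.0a-style Function with UR reporting enabled. On
Linux the bytes could presumably be read from the device's sysfs "config"
file, typically as root since the capability list lies beyond the first
64 bytes.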