From mboxrd@z Thu Jan 1 00:00:00 1970 From: Don Dutile Subject: Re: [PATCH v3] pci: correct pci config size default for cap version 2 endpoints Date: Mon, 25 Jul 2011 15:37:55 -0400 Message-ID: <4E2DC613.405@redhat.com> References: <20110722155338.43049.12587.stgit@dddsys0.bos.redhat.com> <20110722212422.GH9766@sequoia.sous-sol.org> <1311370232.2653.100.camel@bling.home> <20110722213547.GI9766@sequoia.sous-sol.org> <20110724081244.GB24483@redhat.com> <20110724084110.GA25016@redhat.com> <20110724105836.GA25079@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Chris Wright , Alex Williamson , kvm@vger.kernel.org To: "Michael S. Tsirkin" Return-path: Received: from mx1.redhat.com ([209.132.183.28]:9681 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752155Ab1GYTiA (ORCPT ); Mon, 25 Jul 2011 15:38:00 -0400 In-Reply-To: <20110724105836.GA25079@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On 07/24/2011 06:58 AM, Michael S. Tsirkin wrote: > On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote: >> On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote: >>> On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote: >>>> * Alex Williamson (alex.williamson@redhat.com) wrote: >>>>> On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote: >>>>>> * Donald Dutile (ddutile@redhat.com) wrote: >>>>>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c >>>>>>> index 36ad6b0..34db52e 100644 >>>>>>> --- a/hw/device-assignment.c >>>>>>> +++ b/hw/device-assignment.c >>>>>>> @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) >>>>>>> } >>>>>>> >>>>>>> if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) { >>>>>>> - uint8_t version; >>>>>>> + uint8_t version, size; >>>>>>> uint16_t type, devctl, lnkcap, lnksta; >>>>>>> uint32_t devcap; >>>>>>> - int size = 0x3c; /* version 2 size */ >>>>>>> >>>>>>> version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS); >>>>>>> version&= PCI_EXP_FLAGS_VERS; >>>>>>> if (version == 1) { >>>>>>> size = 0x14; >>>>>>> - } else if (version> 2) { >>>>>>> + } else if (version == 2) { >>>>>>> + /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */ >>>>>>> + size = 0x34; >>>>>> >>>>>> That doesn't look correct to me. The size is fixed, just that some >>>>>> registers are Reserved Zero when they do not apply (e.g. endpoint only). >>>>> >>>>> Apparently it can be interpreted differently. In this case, we've seen >>>>> a tg3 device expose a v2 PCI express capability at offset 0xcc. Using >>>>> 0x3c bytes, we extend 8 bytes past the legacy config space area :( >>>> >>>> Wow, that device sounds broken to me. The spec is pretty clear. >>> >>> Yes, I agree it's broken. Looks like something that >>> happens when a device is designed in parallel with the spec. >>> >>> What bothers me is this patch seems to make devices that do behave >>> correctly out of spec (registers will be writeable by default) - >>> correct? >>> >>> How about we check for overflow and only do the hacks >>> if it happens? >>> >>> Also, the code to initialize slot and root control registers is still >>> there: it would seem that running it will corrupt memmory beyond the >>> config array? >> >> I take this last bit back: registers we touch are at offset< 0x34. >> Sorry about the noise. But the question about read-only registers >> still stands. > > Also, where does the magic 0x34 come from? I'm guessing this is > simply what's left till the end of the config space. > So let's be conservative specific as possible with > this hack: > I believe the spec leaves room for interpretation, and thus the resulting 'broken' device. As I read the spec, the size of the struct can be: -- 0x2c for all devices, min., that are cap version 2 or higher. -- 0x34 for devices with links, i.e., not a root-port-based device, , a device not integrated into the root port , or if it is, it uses a serial link anyhow (doesn't strip out 8b/10b serial link btwn device & internal root port) -- 0x3c for devices with slots, i.e., a bridge with downstream slots, i.e., not an endpoint, i.e., a PCI(e) bridge. Thus, 0x34 was chosen, since we don't support device assigning PCI bridges, (not until MRIOV shows up, at least), and 0x34 fits the bug at hand, and device cap/stat/control may be used/modified. So, a 'hack' is not needed. In fact, the 0x34 size is a bit of a hack, since the case to use 0x2c could be ascertained by checking if the device is a root port device, _and_ it's not using a serial link, but a perusal of root port devices on a number of systems I looked at always had this structure greater than 0x2c, so I figured the simple heuristic of 0x34 was sufficient. > /* A version 2 device was observed to only have a partial > * implementation for the capability structure. Apparently, it doesn't > * implement the registers from slot capability 2 and on (offset 0x34), > * with the capability at offset 0xCC = 256 - 0x34. This is out of spec, > * but let's try to support this. */ > if (version == 2&& pos == 0xCC) { > size = 0x34; > } > > >>> >>> -- >>> MST