From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexey Kardashevskiy Subject: Re: [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space Date: Tue, 22 May 2012 13:44:21 +1000 Message-ID: <4FBB0B95.8050901@ozlabs.ru> References: <4FACB581.2050609@ozlabs.ru> <6A22E211-BC82-49BD-A335-02D3BAA14A17@suse.de> <4FAD0A4F.2050506@ozlabs.ru> <4FB080CE.3030703@ozlabs.ru> <4FB5DA43.90907@ozlabs.ru> <1337652170.2779.143.camel@pasglop> <6C472F5B-B8C3-48DE-B19B-00973AF6AC56@suse.de> Mime-Version: 1.0 Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Benjamin Herrenschmidt , "qemu-devel@nongnu.org" , Alex Williamson , David Gibson , "anthony@codemonkey.ws" , "kvm@vger.kernel.org" To: Alexander Graf Return-path: Received: from mail-pb0-f46.google.com ([209.85.160.46]:63959 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759759Ab2EVDod (ORCPT ); Mon, 21 May 2012 23:44:33 -0400 Received: by pbbrp8 with SMTP id rp8so7782406pbb.19 for ; Mon, 21 May 2012 20:44:32 -0700 (PDT) In-Reply-To: <6C472F5B-B8C3-48DE-B19B-00973AF6AC56@suse.de> Sender: kvm-owner@vger.kernel.org List-ID: On 22/05/12 13:21, Alexander Graf wrote: >=20 >=20 > On 22.05.2012, at 04:02, Benjamin Herrenschmidt wrote: >=20 >> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>> Alexander, >>> >>> Is that any better? :) >> >> Alex (Graf that is), ping ? >> >> The original patch from Alexey was fine btw. >> >> VFIO will always call things with the existing capability offset so >> there's no real risk of doing the wrong thing or break the list or >> anything. >> >> IE. A small simple patch that addresses the problem :-) >> >> The new patch is a bit more "robust" I believe, I don't think we nee= d to >> go too far to fix a problem we don't have. But we need a fix for the >> real issue and the simple patch does it neatly from what I can >> understand. >> >> Cheers, >> Ben. >> >>> >>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *p= dev) >>> * in pci config space */ >>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>> uint8_t offset, uint8_t size) >>> { >>> - uint8_t *config; >>> + uint8_t *config, existing; >=20 > Existing is a pointer to the target dev's config space, right? Yes. >>> int i, overlapping_cap; >>> >>> + existing =3D pci_find_capability(pdev, cap_id); >>> + if (existing) { >>> + if (offset && (existing !=3D offset)) { >>> + return -EEXIST; >>> + } >>> + for (i =3D existing; i < size; ++i) { >=20 > So how does this possibly make sense? Although I do not expect VFIO to add capabilities (does not make sense)= , I still want to double check that this space has not been tried to use by someone else. >>> + if (pdev->used[i]) { >>> + return -EFAULT; >>> + } >>> + } >>> + memset(pdev->used + offset, 0xFF, size); > Why? Because I am marking the space this capability takes as used. >>> + /* Make capability read-only by default */ >>> + memset(pdev->wmask + offset, 0, size); > Why? Because the pci_add_capability() does it for a new capability by defaul= t. >>> + /* Check capability by default */ >>> + memset(pdev->cmask + offset, 0xFF, size); >=20 > I don't understand this part either. The pci_add_capability() does it for a new capability by default. >=20 > Alex >=20 >>> + return existing; >>> + } >>> + >>> if (!offset) { >>> offset =3D pci_find_space(pdev, size); >>> if (!offset) { >>> return -ENOSPC; >>> >>> >>> >>> >>> >>> >>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>> >>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>> >>>>>> 11.05.2012 20:52, Alexander Graf =CE=C1=D0=C9=D3=C1=CC: >>>>>>> >>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>> >>>>>>>> Normally the pci_add_capability is called on devices to add ne= w >>>>>>>> capability. This is ok for emulated devices which capabilities= list >>>>>>>> is being built by QEMU. >>>>>>>> >>>>>>>> In the case of VFIO the capability may already exist and addin= g new >>>>>>>> capability into the beginning of the linked list may create a = loop. >>>>>>>> >>>>>>>> For example, the old code destroys the following config >>>>>>>> of PCIe Intel E1000E: >>>>>>>> >>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>> 0x34: 0xC8 >>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>> 0xE0: 0x10 0x00 >>>>>>>> >>>>>>>> after: >>>>>>>> 0x34: 0xD0 >>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>> 0xE0: 0x10 0x00 >>>>>>>> >>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>> >>>>>>>> The proposed patch does not change capability pointers when >>>>>>>> the same type capability is about to add. >>>>>>>> >>>>>>>> Signed-off-by: Alexey Kardashevskiy >>>>>>>> --- >>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>> --- a/hw/pci.c >>>>>>>> +++ b/hw/pci.c >>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev= , uint8_t cap_id, >>>>>>>> } >>>>>>>> >>>>>>>> config =3D pdev->config + offset; >>>>>>>> - config[PCI_CAP_LIST_ID] =3D cap_id; >>>>>>>> - config[PCI_CAP_LIST_NEXT] =3D pdev->config[PCI_CAPABILITY= _LIST]; >>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] =3D offset; >>>>>>>> - pdev->config[PCI_STATUS] |=3D PCI_STATUS_CAP_LIST; >>>>>>>> + if (config[PCI_CAP_LIST_ID] !=3D cap_id) { >>>>>>> >>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll hav= e to do a loop through all capabilities, check if the one you want to a= dd is there already and if so either >>>>>>> * replace the existing one or >>>>>>> * drop out and not write the new one in. >>>>> >>>>> * hw_error :) >>>>> >>>>>>> >>>>>>> I'm not sure which way would be more natural. >>>>>> >>>>>> There is a third option - add another function, lets call it >>>>>> pci_fixup_capability() which would do whatever pci_add_capabilit= y() does >>>>>> but won't touch list pointers. >>>>> >>>>> What good is a function that breaks internal consistency? >>>> >>>> >>>> It is broken already by having PCIDevice.used field. Normally pci_= add_capability() would go through >>>> the whole list and add a capability if it does not exist. Emulated= devices which care about having a >>>> capability at some fixed offset would have initialized their confi= g space before calling this >>>> capabilities API (as VFIO does). >>>> >>>> If we really want to support emulated devices which want some capa= bilities be at fixed offset and >>>> others at random offsets (strange, but ok), I do not see how it is= bad to restore this consistency >>>> by special function (pci_fixup_capability()) to avoid its rewritin= g at different location as a guest >>>> driver may care about its offset. >>>> >>>> >>>> >>>>>> When vfio, pci_add_capability() is called from the code which kn= ows >>>>>> exactly that the capability exists and where it is and it calls >>>>>> pci_add_capability() based on this knowledge so doing additional= loops >>>>>> just for imaginery scalability is a bit weird, no? >>>>> >>>>> Not sure I understand your proposal. The more generic a framework= is, the better, no? In this code path we don't care about speed. We on= ly care about consistency and reliability. >>>>> >>>>> >>>>> Alex >>>>> >>>> >>>> >>> >>> >> >> --=20 Alexey