* [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
@ 2012-05-11  6:45 Alexey Kardashevskiy
  2012-05-11 10:52 ` Alexander Graf
  2012-05-11 19:20 ` Jason Baron
  0 siblings, 2 replies; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-11  6:45 UTC
To: qemu-devel; +Cc: kvm, Alex Graf, Alex Williamson, anthony, David Gibson

Normally pci_add_capability() is called on devices to add a new
capability. This is OK for emulated devices whose capability list
is built by QEMU.

In the case of VFIO the capability may already exist, and adding a new
capability at the beginning of the linked list may create a loop.

For example, the old code destroys the following config
of a PCIe Intel E1000E:

before adding PCI_CAP_ID_MSI (0x05):
0x34: 0xC8
0xC8: 0x01 0xD0
0xD0: 0x05 0xE0
0xE0: 0x10 0x00

after:
0x34: 0xD0
0xC8: 0x01 0xD0
0xD0: 0x05 0xC8
0xE0: 0x10 0x00

As a result, capabilities 0x01 and 0x05 point to each other.

The proposed patch does not change the capability pointers when
a capability of the same type is about to be added.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/pci.c |   10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index aa0c0b8..1f7c924 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
     }
 
     config = pdev->config + offset;
-    config[PCI_CAP_LIST_ID] = cap_id;
-    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
-    pdev->config[PCI_CAPABILITY_LIST] = offset;
-    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    if (config[PCI_CAP_LIST_ID] != cap_id) {
+        config[PCI_CAP_LIST_ID] = cap_id;
+        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
+        pdev->config[PCI_CAPABILITY_LIST] = offset;
+        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    }
     memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);

-- 
Alexey
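The loop described in the message above can be reproduced outside QEMU. What follows is a minimal sketch, not QEMU code: `cfg` is a plain 256-byte array standing in for `pdev->config`, the offsets come from the E1000E dump, and `init_e1000e_caps()`, `prepend_cap()` and `walk_caps()` are hypothetical helpers that mimic the unpatched list update and a guest-style list walk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI config-space offsets (same values as in pci_regs.h). */
#define CAP_PTR       0x34  /* PCI_CAPABILITY_LIST */
#define CAP_LIST_ID   0     /* PCI_CAP_LIST_ID */
#define CAP_LIST_NEXT 1     /* PCI_CAP_LIST_NEXT */

/* Hypothetical helper: fill cfg with the "before" layout from the dump. */
static void init_e1000e_caps(uint8_t cfg[256])
{
    memset(cfg, 0, 256);
    cfg[CAP_PTR] = 0xC8;
    cfg[0xC8] = 0x01; cfg[0xC9] = 0xD0;   /* PM (0x01)   -> 0xD0 */
    cfg[0xD0] = 0x05; cfg[0xD1] = 0xE0;   /* MSI (0x05)  -> 0xE0 */
    cfg[0xE0] = 0x10; cfg[0xE1] = 0x00;   /* PCIe (0x10) -> end  */
}

/* Hypothetical helper: the unconditional prepend done by the old code. */
static void prepend_cap(uint8_t cfg[256], uint8_t cap_id, uint8_t offset)
{
    assert(offset >= 0x40 && offset <= 0xFC);
    cfg[offset + CAP_LIST_ID] = cap_id;
    cfg[offset + CAP_LIST_NEXT] = cfg[CAP_PTR];
    cfg[CAP_PTR] = offset;
}

/*
 * Hypothetical helper: walk the list and return the number of
 * capabilities, or -1 if the walk has not ended after max_hops
 * entries, which is how a loop shows up.
 */
static int walk_caps(const uint8_t cfg[256], int max_hops)
{
    int n = 0;
    uint8_t pos = cfg[CAP_PTR] & 0xFC;   /* low two bits are reserved */

    while (pos) {
        if (n++ >= max_hops) {
            return -1;
        }
        pos = cfg[pos + CAP_LIST_NEXT] & 0xFC;
    }
    return n;
}
```

Calling `walk_caps()` on the initial layout visits three capabilities. After `prepend_cap(cfg, 0x05, 0xD0)` the entry at 0xD0 points back to 0xC8, which already points to 0xD0, so the walk only stops because of the hop limit.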
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
From: Alexander Graf @ 2012-05-11 10:52 UTC
To: Alexey Kardashevskiy
Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson

On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:

> diff --git a/hw/pci.c b/hw/pci.c
> index aa0c0b8..1f7c924 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>      }
>
>      config = pdev->config + offset;
> -    config[PCI_CAP_LIST_ID] = cap_id;
> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    if (config[PCI_CAP_LIST_ID] != cap_id) {

This doesn't scale. Capabilities are a list of CAPs. You'll have to do a
loop through all capabilities, check if the one you want to add is there
already and if so either

  * replace the existing one or
  * drop out and not write the new one in.

I'm not sure which way would be more natural.

> +        config[PCI_CAP_LIST_ID] = cap_id;
> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    }
>      memset(pdev->used + offset, 0xFF, size);
>      /* Make capability read-only by default */
>      memset(pdev->wmask + offset, 0, size);
>
> --
> Alexey
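One way to read the suggestion above is sketched below, using the "drop out and not write the new one in" option. This is an illustration on a plain byte array rather than QEMU's `PCIDevice`; `find_cap()` and `add_cap_once()` are hypothetical names, and the 48-hop limit is an assumption borrowed from the fact that at most 48 four-byte-aligned capabilities fit between 0x40 and 0xFF.

```c
#include <assert.h>
#include <stdint.h>

#define CAP_PTR       0x34  /* PCI_CAPABILITY_LIST */
#define CAP_LIST_NEXT 1     /* PCI_CAP_LIST_NEXT */
#define MAX_HOPS      48    /* assumed bound so a corrupted list cannot hang the walk */

/* Hypothetical helper: return the offset of cap_id, or 0 if it is absent. */
static uint8_t find_cap(const uint8_t cfg[256], uint8_t cap_id)
{
    uint8_t pos = cfg[CAP_PTR] & 0xFC;
    int hops;

    for (hops = 0; pos && hops < MAX_HOPS; hops++) {
        if (cfg[pos] == cap_id) {
            return pos;
        }
        pos = cfg[pos + CAP_LIST_NEXT] & 0xFC;
    }
    return 0;
}

/*
 * Hypothetical helper: link a capability at the head of the list only if
 * no capability with the same ID is already present. Returns the offset
 * at which the capability lives afterwards.
 */
static uint8_t add_cap_once(uint8_t cfg[256], uint8_t cap_id, uint8_t offset)
{
    uint8_t existing = find_cap(cfg, cap_id);

    assert(offset >= 0x40 && offset <= 0xFC);
    if (existing) {
        return existing;            /* leave the list pointers untouched */
    }
    cfg[offset] = cap_id;
    cfg[offset + CAP_LIST_NEXT] = cfg[CAP_PTR];
    cfg[CAP_PTR] = offset;
    return offset;
}
```

Unlike the one-byte comparison in the posted hunk, this check does not depend on the caller passing the offset where the capability already sits.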
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
From: Alexey Kardashevskiy @ 2012-05-11 12:47 UTC
To: Alexander Graf
Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson

11.05.2012 20:52, Alexander Graf wrote:
>
> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>
>> diff --git a/hw/pci.c b/hw/pci.c
>> index aa0c0b8..1f7c924 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>      }
>>
>>      config = pdev->config + offset;
>> -    config[PCI_CAP_LIST_ID] = cap_id;
>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>
> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a
> loop through all capabilities, check if the one you want to add is there
> already and if so either
>   * replace the existing one or
>   * drop out and not write the new one in.
>
> I'm not sure which way would be more natural.

There is a third option: add another function, let's call it
pci_fixup_capability(), which would do whatever pci_add_capability() does
but would not touch the list pointers.

With VFIO, pci_add_capability() is called from code which knows
exactly that the capability exists and where it is, and it calls
pci_add_capability() based on this knowledge, so doing additional loops
just for imaginary scalability is a bit weird, no?

>> +        config[PCI_CAP_LIST_ID] = cap_id;
>> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
>> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> +    }
>>      memset(pdev->used + offset, 0xFF, size);
>>      /* Make capability read-only by default */
>>      memset(pdev->wmask + offset, 0, size);

-- 
With best regards

Alexey Kardashevskiy
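The pci_fixup_capability() idea above exists only as a description in this thread, so the following is a guess at its shape rather than a real QEMU function. `struct fake_pci_dev` is a hypothetical stand-in for the `config`, `used`, `wmask` and `cmask` arrays of `PCIDevice`, and the sanity checks are assumptions added for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the PCIDevice fields used in the thread. */
struct fake_pci_dev {
    uint8_t config[256];
    uint8_t used[256];
    uint8_t wmask[256];
    uint8_t cmask[256];
};

/*
 * Hypothetical sketch: claim the bytes of a capability that is already
 * present in config space, without touching the list pointers.
 * Returns the offset on success and -1 on a failed sanity check.
 */
static int fixup_capability(struct fake_pci_dev *d, uint8_t cap_id,
                            uint8_t offset, uint8_t size)
{
    assert(d != NULL);
    if (offset < 0x40 || offset + size > 256) {
        return -1;                  /* outside the capability area */
    }
    if (d->config[offset] != cap_id) {
        return -1;                  /* nothing to fix up at this offset */
    }
    memset(d->used + offset, 0xFF, size);
    /* Make capability read-only by default */
    memset(d->wmask + offset, 0, size);
    /* Check capability by default */
    memset(d->cmask + offset, 0xFF, size);
    return offset;
}
```

The three `memset()` calls mirror the ones quoted in the thread. The capability ID byte and the next pointer are read but never written.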
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
From: Alexander Graf @ 2012-05-11 14:13 UTC
To: Alexey Kardashevskiy
Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson

On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:

> 11.05.2012 20:52, Alexander Graf wrote:
>>
>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>
>>> diff --git a/hw/pci.c b/hw/pci.c
>>> index aa0c0b8..1f7c924 100644
>>> --- a/hw/pci.c
>>> +++ b/hw/pci.c
>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>      }
>>>
>>>      config = pdev->config + offset;
>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>
>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a
>> loop through all capabilities, check if the one you want to add is there
>> already and if so either
>>   * replace the existing one or
>>   * drop out and not write the new one in.

  * hw_error :)

>>
>> I'm not sure which way would be more natural.
>
> There is a third option: add another function, let's call it
> pci_fixup_capability(), which would do whatever pci_add_capability() does
> but would not touch the list pointers.

What good is a function that breaks internal consistency?

> With VFIO, pci_add_capability() is called from code which knows
> exactly that the capability exists and where it is, and it calls
> pci_add_capability() based on this knowledge, so doing additional loops
> just for imaginary scalability is a bit weird, no?

Not sure I understand your proposal. The more generic a framework is, the
better, no? In this code path we don't care about speed. We only care about
consistency and reliability.

Alex
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
From: Alexey Kardashevskiy @ 2012-05-14  3:49 UTC
To: Alexander Graf
Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson

On 12/05/12 00:13, Alexander Graf wrote:
>
> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote:
>
>> 11.05.2012 20:52, Alexander Graf wrote:
>>>
>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote:
>>>
>>>> diff --git a/hw/pci.c b/hw/pci.c
>>>> index aa0c0b8..1f7c924 100644
>>>> --- a/hw/pci.c
>>>> +++ b/hw/pci.c
>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>      }
>>>>
>>>>      config = pdev->config + offset;
>>>> -    config[PCI_CAP_LIST_ID] = cap_id;
>>>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>>>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>>>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>>>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>>>
>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a
>>> loop through all capabilities, check if the one you want to add is there
>>> already and if so either
>>>   * replace the existing one or
>>>   * drop out and not write the new one in.
>
>   * hw_error :)
>
>>>
>>> I'm not sure which way would be more natural.
>>
>> There is a third option: add another function, let's call it
>> pci_fixup_capability(), which would do whatever pci_add_capability() does
>> but would not touch the list pointers.
>
> What good is a function that breaks internal consistency?

It is already broken by having the PCIDevice.used field. Normally
pci_add_capability() would go through the whole list and add a capability
if it does not exist. Emulated devices which care about having a capability
at some fixed offset would have initialized their config space before
calling this capabilities API (as VFIO does).

If we really want to support emulated devices which want some capabilities
to be at fixed offsets and others at random offsets (strange, but OK), I do
not see how it is bad to restore this consistency with a special function
(pci_fixup_capability()) that avoids rewriting the capability at a different
location, as a guest driver may care about its offset.

>> With VFIO, pci_add_capability() is called from code which knows
>> exactly that the capability exists and where it is, and it calls
>> pci_add_capability() based on this knowledge, so doing additional loops
>> just for imaginary scalability is a bit weird, no?
>
> Not sure I understand your proposal. The more generic a framework is, the
> better, no? In this code path we don't care about speed. We only care about
> consistency and reliability.
>
> Alex

-- 
Alexey
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
From: Alexey Kardashevskiy @ 2012-05-18  5:12 UTC
To: Alexander Graf
Cc: kvm, qemu-devel, Alex Williamson, anthony, David Gibson

Alexander,

Is that any better? :)

@@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
  * in pci config space */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
                        uint8_t offset, uint8_t size)
 {
-    uint8_t *config;
+    uint8_t *config, existing;
     int i, overlapping_cap;
 
+    existing = pci_find_capability(pdev, cap_id);
+    if (existing) {
+        if (offset && (existing != offset)) {
+            return -EEXIST;
+        }
+        for (i = existing; i < size; ++i) {
+            if (pdev->used[i]) {
+                return -EFAULT;
+            }
+        }
+        memset(pdev->used + offset, 0xFF, size);
+        /* Make capability read-only by default */
+        memset(pdev->wmask + offset, 0, size);
+        /* Check capability by default */
+        memset(pdev->cmask + offset, 0xFF, size);
+        return existing;
+    }
+
     if (!offset) {
         offset = pci_find_space(pdev, size);
         if (!offset) {
             return -ENOSPC;

-- 
Alexey
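The existing-capability branch of the revised patch above guards its `pdev->used` check with a loop that runs from `existing` up to `size`. The sketch below only counts iterations, using two hypothetical helpers. It assumes the intent was to inspect the `size` bytes starting at `existing`, which the thread does not confirm, and the values 0xD0 and 14 in the usage note are illustrative.

```c
#include <assert.h>

/* Hypothetical helper: iterations of the loop exactly as posted. */
static int bytes_checked_as_posted(int existing, int size)
{
    int i, n = 0;

    assert(existing >= 0 && size >= 0);
    for (i = existing; i < size; ++i) {
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: iterations when the upper bound is
 * existing + size (an assumption about what was meant).
 */
static int bytes_checked_over_range(int existing, int size)
{
    int i, n = 0;

    assert(existing >= 0 && size >= 0);
    for (i = existing; i < existing + size; ++i) {
        n++;
    }
    return n;
}
```

With an MSI capability at 0xD0 and a size of 14, the posted bounds give zero iterations because `existing` is already larger than `size`, while the range form visits all 14 bytes.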
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
From: Benjamin Herrenschmidt @ 2012-05-22  2:02 UTC
To: Alexey Kardashevskiy
Cc: kvm, Alexander Graf, qemu-devel, Alex Williamson, anthony, David Gibson

On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
> Alexander,
>
> Is that any better? :)

Alex (Graf that is), ping ?

The original patch from Alexey was fine btw.

VFIO will always call things with the existing capability offset so
there's no real risk of doing the wrong thing or break the list or
anything.

IE. A small simple patch that addresses the problem :-)

The new patch is a bit more "robust" I believe, I don't think we need to
go too far to fix a problem we don't have. But we need a fix for the
real issue and the simple patch does it neatly from what I can
understand.

Cheers,
Ben.
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
From: Alexander Graf @ 2012-05-22  3:21 UTC
To: Benjamin Herrenschmidt
Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson

On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>> Alexander,
>>
>> Is that any better? :)
>
> Alex (Graf that is), ping ?
>
> The original patch from Alexey was fine btw.
>
> VFIO will always call things with the existing capability offset so
> there's no real risk of doing the wrong thing or break the list or
> anything.
>
> IE. A small simple patch that addresses the problem :-)
>
> The new patch is a bit more "robust" I believe, I don't think we need to
> go too far to fix a problem we don't have. But we need a fix for the
> real issue and the simple patch does it neatly from what I can
> understand.
>
> Cheers,
> Ben.
>
>>
>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>   * in pci config space */
>>  int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>                         uint8_t offset, uint8_t size)
>>  {
>> -    uint8_t *config;
>> +    uint8_t *config, existing;

Existing is a pointer to the target dev's config space, right?

>>      int i, overlapping_cap;
>>
>> +    existing = pci_find_capability(pdev, cap_id);
>> +    if (existing) {
>> +        if (offset && (existing != offset)) {
>> +            return -EEXIST;
>> +        }
>> +        for (i = existing; i < size; ++i) {

So how does this possibly make sense?

>> +            if (pdev->used[i]) {
>> +                return -EFAULT;
>> +            }
>> +        }
>> +        memset(pdev->used + offset, 0xFF, size);

Why?

>> +        /* Make capability read-only by default */
>> +        memset(pdev->wmask + offset, 0, size);

Why?

>> +        /* Check capability by default */
>> +        memset(pdev->cmask + offset, 0xFF, size);

I don't understand this part either.

Alex

>> +        return existing;
>> +    }
>> +
>>      if (!offset) {
>>          offset = pci_find_space(pdev, size);
>>          if (!offset) {
>>              return -ENOSPC;
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-05-22 3:21 ` Alexander Graf @ 2012-05-22 3:44 ` Alexey Kardashevskiy 2012-05-22 5:52 ` Alexander Graf 0 siblings, 1 reply; 29+ messages in thread From: Alexey Kardashevskiy @ 2012-05-22 3:44 UTC (permalink / raw) To: Alexander Graf Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 22/05/12 13:21, Alexander Graf wrote: > > > On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > >> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>> Alexander, >>> >>> Is that any better? :) >> >> Alex (Graf that is), ping ? >> >> The original patch from Alexey was fine btw. >> >> VFIO will always call things with the existing capability offset so >> there's no real risk of doing the wrong thing or break the list or >> anything. >> >> IE. A small simple patch that addresses the problem :-) >> >> The new patch is a bit more "robust" I believe, I don't think we need to >> go too far to fix a problem we don't have. But we need a fix for the >> real issue and the simple patch does it neatly from what I can >> understand. >> >> Cheers, >> Ben. >> >>> >>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>> * in pci config space */ >>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>> uint8_t offset, uint8_t size) >>> { >>> - uint8_t *config; >>> + uint8_t *config, existing; > > Existing is a pointer to the target dev's config space, right? Yes. >>> int i, overlapping_cap; >>> >>> + existing = pci_find_capability(pdev, cap_id); >>> + if (existing) { >>> + if (offset && (existing != offset)) { >>> + return -EEXIST; >>> + } >>> + for (i = existing; i < size; ++i) { > > So how does this possibly make sense? 
Although I do not expect VFIO to add capabilities (does not make sense), I still want to double check that this space has not been tried to use by someone else. >>> + if (pdev->used[i]) { >>> + return -EFAULT; >>> + } >>> + } >>> + memset(pdev->used + offset, 0xFF, size); > Why? Because I am marking the space this capability takes as used. >>> + /* Make capability read-only by default */ >>> + memset(pdev->wmask + offset, 0, size); > Why? Because the pci_add_capability() does it for a new capability by default. >>> + /* Check capability by default */ >>> + memset(pdev->cmask + offset, 0xFF, size); > > I don't understand this part either. The pci_add_capability() does it for a new capability by default. > > Alex > >>> + return existing; >>> + } >>> + >>> if (!offset) { >>> offset = pci_find_space(pdev, size); >>> if (!offset) { >>> return -ENOSPC; >>> >>> >>> >>> >>> >>> >>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>> >>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>> >>>>>> 11.05.2012 20:52, Alexander Graf wrote: >>>>>>> >>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>> >>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>> is being built by QEMU. >>>>>>>> >>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>>> >>>>>>>> For example, the old code destroys the following config >>>>>>>> of PCIe Intel E1000E: >>>>>>>> >>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>> 0x34: 0xC8 >>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>> 0xE0: 0x10 0x00 >>>>>>>> >>>>>>>> after: >>>>>>>> 0x34: 0xD0 >>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>> 0xE0: 0x10 0x00 >>>>>>>> >>>>>>>> As result capabilities 0x01 and 0x05 point to each other.
>>>>>>>> >>>>>>>> The proposed patch does not change capability pointers when >>>>>>>> the same type capability is about to add. >>>>>>>> >>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>>> --- >>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>> >>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>> --- a/hw/pci.c >>>>>>>> +++ b/hw/pci.c >>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>> } >>>>>>>> >>>>>>>> config = pdev->config + offset; >>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>> >>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>>> * replace the existing one or >>>>>>> * drop out and not write the new one in. >>>>> >>>>> * hw_error :) >>>>> >>>>>>> >>>>>>> I'm not sure which way would be more natural. >>>>>> >>>>>> There is a third option - add another function, lets call it >>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>> but won't touch list pointers. >>>>> >>>>> What good is a function that breaks internal consistency? >>>> >>>> >>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>>> capability at some fixed offset would have initialized their config space before calling this >>>> capabilities API (as VFIO does). 
>>>> >>>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>>> driver may care about its offset. >>>> >>>> >>>> >>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>> exactly that the capability exists and where it is and it calls >>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>> just for imaginery scalability is a bit weird, no? >>>>> >>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>>> >>>>> >>>>> Alex >>>>> >>>> >>>> >>> >>> >> >> -- Alexey ^ permalink raw reply [flat|nested] 29+ messages in thread
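For readers following the E1000E example quoted in this message, the loop can be reproduced outside QEMU. What follows is a minimal sketch, not QEMU code: the `demo_*` helpers, the `DEMO_*` macros and the flat 256-byte array are assumptions made for illustration, and only the offsets and capability IDs are taken from the commit message. It prepends capability 0x05 at 0xD0 the way the unpatched pci_add_capability() does, then checks whether a walk of the list still reaches a zero next pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CFG_SIZE 256
#define DEMO_CAP_PTR  0x34   /* same offset as PCI_CAPABILITY_LIST */
#define DEMO_CAP_ID   0      /* same offset as PCI_CAP_LIST_ID */
#define DEMO_CAP_NEXT 1      /* same offset as PCI_CAP_LIST_NEXT */

/* Build the "before" layout quoted in the commit message. */
static void demo_init_e1000e(uint8_t cfg[DEMO_CFG_SIZE])
{
    memset(cfg, 0, DEMO_CFG_SIZE);
    cfg[DEMO_CAP_PTR] = 0xC8;
    cfg[0xC8 + DEMO_CAP_ID] = 0x01; cfg[0xC8 + DEMO_CAP_NEXT] = 0xD0;
    cfg[0xD0 + DEMO_CAP_ID] = 0x05; cfg[0xD0 + DEMO_CAP_NEXT] = 0xE0;
    cfg[0xE0 + DEMO_CAP_ID] = 0x10; cfg[0xE0 + DEMO_CAP_NEXT] = 0x00;
}

/* Unconditional prepend: write the header at offset and make the
 * list head point at it, without looking at what is already there. */
static void demo_prepend(uint8_t cfg[DEMO_CFG_SIZE], uint8_t cap_id,
                         uint8_t offset)
{
    assert(offset <= DEMO_CFG_SIZE - 2);
    cfg[offset + DEMO_CAP_ID] = cap_id;
    cfg[offset + DEMO_CAP_NEXT] = cfg[DEMO_CAP_PTR];
    cfg[DEMO_CAP_PTR] = offset;
}

/* Return 1 if following the next pointers reaches 0, or 0 if the walk
 * is still going after more steps than config space has bytes. */
static int demo_list_terminates(const uint8_t cfg[DEMO_CFG_SIZE])
{
    uint8_t pos = cfg[DEMO_CAP_PTR];
    int steps;

    for (steps = 0; steps < DEMO_CFG_SIZE; steps++) {
        if (pos == 0) {
            return 1;
        }
        if (pos > DEMO_CFG_SIZE - 2) {
            return 0;   /* header would not fit; treat as broken */
        }
        pos = cfg[pos + DEMO_CAP_NEXT];
    }
    return 0;
}
```

Calling demo_init_e1000e() and then demo_prepend(cfg, 0x05, 0xD0) leaves 0x34 pointing at 0xD0 and 0xD0 pointing back at 0xC8, which is the "after" layout from the commit message, and demo_list_terminates() then reports 0.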
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-05-22 3:44 ` Alexey Kardashevskiy @ 2012-05-22 5:52 ` Alexander Graf 2012-05-22 6:11 ` Alexey Kardashevskiy 0 siblings, 1 reply; 29+ messages in thread From: Alexander Graf @ 2012-05-22 5:52 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: > On 22/05/12 13:21, Alexander Graf wrote: >> >> >> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: >> >>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>>> Alexander, >>>> >>>> Is that any better? :) >>> >>> Alex (Graf that is), ping ? >>> >>> The original patch from Alexey was fine btw. >>> >>> VFIO will always call things with the existing capability offset so >>> there's no real risk of doing the wrong thing or break the list or >>> anything. >>> >>> IE. A small simple patch that addresses the problem :-) >>> >>> The new patch is a bit more "robust" I believe, I don't think we need to >>> go too far to fix a problem we don't have. But we need a fix for the >>> real issue and the simple patch does it neatly from what I can >>> understand. >>> >>> Cheers, >>> Ben. >>> >>>> >>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>>> * in pci config space */ >>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>> uint8_t offset, uint8_t size) >>>> { >>>> - uint8_t *config; >>>> + uint8_t *config, existing; >> >> Existing is a pointer to the target dev's config space, right? > > Yes. > >>>> int i, overlapping_cap; >>>> >>>> + existing = pci_find_capability(pdev, cap_id); >>>> + if (existing) { >>>> + if (offset && (existing != offset)) { >>>> + return -EEXIST; >>>> + } >>>> + for (i = existing; i < size; ++i) { >> >> So how does this possibly make sense? 
> > Although I do not expect VFIO to add capabilities (does not make sense), I still want to double > check that this space has not been tried to use by someone else. i is an int. existing is a uint8_t*. > >>>> + if (pdev->used[i]) { >>>> + return -EFAULT; >>>> + } >>>> + } >>>> + memset(pdev->used + offset, 0xFF, size); >> Why? > > Because I am marking the space this capability takes as used. But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no? > >>>> + /* Make capability read-only by default */ >>>> + memset(pdev->wmask + offset, 0, size); >> Why? > > Because the pci_add_capability() does it for a new capability by default. Hrm. So you're copying code? Can't you merge the overwrite and write cases? Alex > > >>>> + /* Check capability by default */ >>>> + memset(pdev->cmask + offset, 0xFF, size); >> >> I don't understand this part either. > > The pci_add_capability() does it for a new capability by default. > > > >> >> Alex >> >>>> + return existing; >>>> + } >>>> + >>>> if (!offset) { >>>> offset = pci_find_space(pdev, size); >>>> if (!offset) { >>>> return -ENOSPC; >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>>> >>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>>> >>>>>>> 11.05.2012 20:52, Alexander Graf wrote: >>>>>>>> >>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>>> >>>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>>> is being built by QEMU. >>>>>>>>> >>>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>> >>>>>>>>> For example, the old code destroys the following config >>>>>>>>> of PCIe Intel E1000E: >>>>>>>>> >>>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>>> 0x34: 0xC8 >>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>> >>>>>>>>> after: >>>>>>>>> 0x34: 0xD0 >>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>> >>>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>>> >>>>>>>>> The proposed patch does not change capability pointers when >>>>>>>>> the same type capability is about to add. >>>>>>>>> >>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>>>> --- >>>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>>> --- a/hw/pci.c >>>>>>>>> +++ b/hw/pci.c >>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>>> } >>>>>>>>> >>>>>>>>> config = pdev->config + offset; >>>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>>> >>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>>>> * replace the existing one or >>>>>>>> * drop out and not write the new one in. >>>>>> >>>>>> * hw_error :) >>>>>> >>>>>>>> >>>>>>>> I'm not sure which way would be more natural. >>>>>>> >>>>>>> There is a third option - add another function, lets call it >>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>>> but won't touch list pointers. 
>>>>>> >>>>>> What good is a function that breaks internal consistency? >>>>> >>>>> >>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>>>> capability at some fixed offset would have initialized their config space before calling this >>>>> capabilities API (as VFIO does). >>>>> >>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>>>> driver may care about its offset. >>>>> >>>>> >>>>> >>>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>>> exactly that the capability exists and where it is and it calls >>>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>>> just for imaginery scalability is a bit weird, no? >>>>>> >>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>>>> >>>>>> >>>>>> Alex >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> > > > -- > Alexey ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-05-22 5:52 ` Alexander Graf @ 2012-05-22 6:11 ` Alexey Kardashevskiy 2012-05-22 6:31 ` Alexander Graf 2012-05-22 6:38 ` Alexander Graf 0 siblings, 2 replies; 29+ messages in thread From: Alexey Kardashevskiy @ 2012-05-22 6:11 UTC (permalink / raw) To: Alexander Graf Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 22/05/12 15:52, Alexander Graf wrote: > > > On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: > >> On 22/05/12 13:21, Alexander Graf wrote: >>> >>> >>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: >>> >>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>>>> Alexander, >>>>> >>>>> Is that any better? :) >>>> >>>> Alex (Graf that is), ping ? >>>> >>>> The original patch from Alexey was fine btw. >>>> >>>> VFIO will always call things with the existing capability offset so >>>> there's no real risk of doing the wrong thing or break the list or >>>> anything. >>>> >>>> IE. A small simple patch that addresses the problem :-) >>>> >>>> The new patch is a bit more "robust" I believe, I don't think we need to >>>> go too far to fix a problem we don't have. But we need a fix for the >>>> real issue and the simple patch does it neatly from what I can >>>> understand. >>>> >>>> Cheers, >>>> Ben. >>>> >>>>> >>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>>>> * in pci config space */ >>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>> uint8_t offset, uint8_t size) >>>>> { >>>>> - uint8_t *config; >>>>> + uint8_t *config, existing; >>> >>> Existing is a pointer to the target dev's config space, right? >> >> Yes. 
>> >>>>> int i, overlapping_cap; >>>>> >>>>> + existing = pci_find_capability(pdev, cap_id); >>>>> + if (existing) { >>>>> + if (offset && (existing != offset)) { >>>>> + return -EEXIST; >>>>> + } >>>>> + for (i = existing; i < size; ++i) { >>> >>> So how does this possibly make sense? >> >> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double >> check that this space has not been tried to use by someone else. > > i is an int. existing is a uint8_t*. It was there before me. This function already does a loop and this is how it was coded at the first place. >>>>> + if (pdev->used[i]) { >>>>> + return -EFAULT; >>>>> + } >>>>> + } >>>>> + memset(pdev->used + offset, 0xFF, size); >>> Why? >> >> Because I am marking the space this capability takes as used. > > But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no? No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability(). Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist. >>>>> + /* Make capability read-only by default */ >>>>> + memset(pdev->wmask + offset, 0, size); >>> Why? >> >> Because the pci_add_capability() does it for a new capability by default. > > Hrm. So you're copying code? Can't you merge the overwrite and write cases? I am trying to make it as a single chunk which is as small as possible. If it helps, below is the same patch with extended context to see what is going on in that function. 
hw/pci.c | 20 +++++++++++++++++++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 63a8219..7008a42 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom) ptr = memory_region_get_ram_ptr(&pdev->rom); load_image(path, ptr); g_free(path); if (is_default_rom) { /* Only the default rom images will be patched (if needed). */ pci_patch_ids(pdev, ptr, size); } qemu_put_ram_ptr(ptr); pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); return 0; } static void pci_del_option_rom(PCIDevice *pdev) { if (!pdev->has_rom) return; vmstate_unregister_ram(&pdev->rom, &pdev->qdev); memory_region_destroy(&pdev->rom); pdev->has_rom = false; } /* * if !offset * Reserve space and add capability to the linked list in pci config space * * if offset = 0, * Find and reserve space and add capability to the linked list * in pci config space */ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { - uint8_t *config; + uint8_t *config, existing; int i, overlapping_cap; + existing = pci_find_capability(pdev, cap_id); + if (existing) { + if (offset && (existing != offset)) { + return -EEXIST; + } + for (i = existing; i < size; ++i) { + if (pdev->used[i]) { + return -EFAULT; + } + } + memset(pdev->used + offset, 0xFF, size); + /* Make capability read-only by default */ + memset(pdev->wmask + offset, 0, size); + /* Check capability by default */ + memset(pdev->cmask + offset, 0xFF, size); + return existing; + } + if (!offset) { offset = pci_find_space(pdev, size); if (!offset) { return -ENOSPC; } } else { /* Verify that capabilities don't overlap. Note: device assignment * depends on this check to verify that the device is not broken. * Should never trigger for emulated devices, but it's helpful * for debugging these. 
*/ for (i = offset; i < offset + size; i++) { overlapping_cap = pci_find_capability_at_offset(pdev, i); if (overlapping_cap) { fprintf(stderr, "ERROR: %04x:%02x:%02x.%x " "Attempt to add PCI capability %x at offset " "%x overlaps existing capability %x at offset %x\n", pci_find_domain(pdev->bus), pci_bus_num(pdev->bus), PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), cap_id, offset, overlapping_cap, i); return -EINVAL; } } } config = pdev->config + offset; config[PCI_CAP_LIST_ID] = cap_id; config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; pdev->config[PCI_CAPABILITY_LIST] = offset; pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; memset(pdev->used + offset, 0xFF, size); /* Make capability read-only by default */ memset(pdev->wmask + offset, 0, size); /* Check capability by default */ memset(pdev->cmask + offset, 0xFF, size); return offset; } >>>>> + /* Check capability by default */ >>>>> + memset(pdev->cmask + offset, 0xFF, size); >>> >>> I don't understand this part either. >> >> The pci_add_capability() does it for a new capability by default. >> >> >> >>> >>> Alex >>> >>>>> + return existing; >>>>> + } >>>>> + >>>>> if (!offset) { >>>>> offset = pci_find_space(pdev, size); >>>>> if (!offset) { >>>>> return -ENOSPC; >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>>>> >>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>>>> >>>>>>>> 11.05.2012 20:52, Alexander Graf wrote: >>>>>>>>> >>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>>>> >>>>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>>>> is being built by QEMU. >>>>>>>>>> >>>>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>>>> capability into the beginning of the linked list may create a loop.
>>>>>>>>>> >>>>>>>>>> For example, the old code destroys the following config >>>>>>>>>> of PCIe Intel E1000E: >>>>>>>>>> >>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>>>> 0x34: 0xC8 >>>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>>> >>>>>>>>>> after: >>>>>>>>>> 0x34: 0xD0 >>>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>>> >>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>>>> >>>>>>>>>> The proposed patch does not change capability pointers when >>>>>>>>>> the same type capability is about to add. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>>>>> --- >>>>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>>>> --- a/hw/pci.c >>>>>>>>>> +++ b/hw/pci.c >>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> config = pdev->config + offset; >>>>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>>>> >>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>>>>> * replace the existing one or >>>>>>>>> * drop out and not write the new one in. >>>>>>> >>>>>>> * hw_error :) >>>>>>> >>>>>>>>> >>>>>>>>> I'm not sure which way would be more natural. >>>>>>>> >>>>>>>> There is a third option - add another function, lets call it >>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>>>> but won't touch list pointers. 
>>>>>>> >>>>>>> What good is a function that breaks internal consistency? >>>>>> >>>>>> >>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>>>>> capability at some fixed offset would have initialized their config space before calling this >>>>>> capabilities API (as VFIO does). >>>>>> >>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>>>>> driver may care about its offset. >>>>>> >>>>>> >>>>>> >>>>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>>>> exactly that the capability exists and where it is and it calls >>>>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>>>> just for imaginery scalability is a bit weird, no? >>>>>>> >>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>>>>> >>>>>>> >>>>>>> Alex >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >> >> >> -- >> Alexey -- Alexey ^ permalink raw reply related [flat|nested] 29+ messages in thread
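The loop questioned in this message, `for (i = existing; i < size; ++i)`, can be checked with plain arithmetic. The sketch below is an illustration only: the `demo_*` helpers are hypothetical, and the size 0x0A is an assumed example value, not one taken from the thread. It counts how many bytes the loop visits with the bounds as posted, and with bounds written in the same style as the overlap loop that already exists further down in pci_add_capability() (`i < offset + size`).

```c
#include <assert.h>
#include <stdint.h>

/* Iteration count with the bounds exactly as posted in the patch:
 *     for (i = existing; i < size; ++i)
 */
static int demo_visited_as_posted(uint8_t existing, uint8_t size)
{
    int i, n = 0;

    for (i = existing; i < size; ++i) {
        n++;
    }
    return n;
}

/* Iteration count when the upper bound is existing + size, mirroring
 * the existing overlap check's "i < offset + size". */
static int demo_visited_full_range(uint8_t existing, uint8_t size)
{
    int i, n = 0;

    assert(existing + size <= 256);   /* stay inside config space */
    for (i = existing; i < existing + size; ++i) {
        n++;
    }
    return n;
}
```

With the capability at 0xD0 and an assumed size of 0x0A, demo_visited_as_posted(0xD0, 0x0A) visits no bytes at all, because 0xD0 is already larger than 0x0A, while demo_visited_full_range(0xD0, 0x0A) visits the ten bytes from 0xD0 to 0xD9. If the intent of the check was to look at every byte of the capability, the second form is presumably what was meant.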
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-05-22 6:11 ` Alexey Kardashevskiy @ 2012-05-22 6:31 ` Alexander Graf 2012-05-22 7:01 ` Alexey Kardashevskiy 2012-06-08 8:47 ` Alexey Kardashevskiy 2012-05-22 6:38 ` Alexander Graf 1 sibling, 2 replies; 29+ messages in thread From: Alexander Graf @ 2012-05-22 6:31 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: > On 22/05/12 15:52, Alexander Graf wrote: >> >> >> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >> >>> On 22/05/12 13:21, Alexander Graf wrote: >>>> >>>> >>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: >>>> >>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>>>>> Alexander, >>>>>> >>>>>> Is that any better? :) >>>>> >>>>> Alex (Graf that is), ping ? >>>>> >>>>> The original patch from Alexey was fine btw. >>>>> >>>>> VFIO will always call things with the existing capability offset so >>>>> there's no real risk of doing the wrong thing or break the list or >>>>> anything. >>>>> >>>>> IE. A small simple patch that addresses the problem :-) >>>>> >>>>> The new patch is a bit more "robust" I believe, I don't think we need to >>>>> go too far to fix a problem we don't have. But we need a fix for the >>>>> real issue and the simple patch does it neatly from what I can >>>>> understand. >>>>> >>>>> Cheers, >>>>> Ben. >>>>> >>>>>> >>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>>>>> * in pci config space */ >>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>> uint8_t offset, uint8_t size) >>>>>> { >>>>>> - uint8_t *config; >>>>>> + uint8_t *config, existing; >>>> >>>> Existing is a pointer to the target dev's config space, right? >>> >>> Yes. 
>>> >>>>>> int i, overlapping_cap; >>>>>> >>>>>> + existing = pci_find_capability(pdev, cap_id); >>>>>> + if (existing) { >>>>>> + if (offset && (existing != offset)) { >>>>>> + return -EEXIST; >>>>>> + } >>>>>> + for (i = existing; i < size; ++i) { >>>> >>>> So how does this possibly make sense? >>> >>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double >>> check that this space has not been tried to use by someone else. >> >> i is an int. existing is a uint8_t*. > > > It was there before me. This function already does a loop and this is how it was coded at the first place. Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax... > > >>>>>> + if (pdev->used[i]) { >>>>>> + return -EFAULT; >>>>>> + } >>>>>> + } >>>>>> + memset(pdev->used + offset, 0xFF, size); >>>> Why? >>> >>> Because I am marking the space this capability takes as used. >> >> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no? > > > No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability(). So why would the function that populates the config space initially not set the used flag correctly? > > Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist. No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really. 
> > > >>>>>> + /* Make capability read-only by default */ >>>>>> + memset(pdev->wmask + offset, 0, size); >>>> Why? >>> >>> Because the pci_add_capability() does it for a new capability by default. >> >> Hrm. So you're copying code? Can't you merge the overwrite and write cases? > > I am trying to make it as a single chunk which is as small as possible. No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense. > > > If it helps, below is the same patch with extended context to see what is going on in that function. > > > > > > > hw/pci.c | 20 +++++++++++++++++++- > 1 files changed, 19 insertions(+), 1 deletions(-) > > diff --git a/hw/pci.c b/hw/pci.c > index 63a8219..7008a42 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom) > ptr = memory_region_get_ram_ptr(&pdev->rom); > load_image(path, ptr); > g_free(path); > > if (is_default_rom) { > /* Only the default rom images will be patched (if needed). 
*/ > pci_patch_ids(pdev, ptr, size); > } > > qemu_put_ram_ptr(ptr); > > pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); > > return 0; > } > > static void pci_del_option_rom(PCIDevice *pdev) > { > if (!pdev->has_rom) > return; > > vmstate_unregister_ram(&pdev->rom, &pdev->qdev); > memory_region_destroy(&pdev->rom); > pdev->has_rom = false; > } > > /* > * if !offset > * Reserve space and add capability to the linked list in pci config space > * > * if offset = 0, > * Find and reserve space and add capability to the linked list > * in pci config space */ > int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, > uint8_t offset, uint8_t size) > { > - uint8_t *config; > + uint8_t *config, existing; > int i, overlapping_cap; > > + existing = pci_find_capability(pdev, cap_id); > + if (existing) { > + if (offset && (existing != offset)) { > + return -EEXIST; > + } > + for (i = existing; i < size; ++i) { > + if (pdev->used[i]) { > + return -EFAULT; > + } > + } } > + memset(pdev->used + offset, 0xFF, size); > + /* Make capability read-only by default */ > + memset(pdev->wmask + offset, 0, size); > + /* Check capability by default */ > + memset(pdev->cmask + offset, 0xFF, size); > + return existing; > + } > + > if (!offset) { && !existing maybe? > offset = pci_find_space(pdev, size); > if (!offset) { > return -ENOSPC; > } > } else { > /* Verify that capabilities don't overlap. Note: device assignment > * depends on this check to verify that the device is not broken. > * Should never trigger for emulated devices, but it's helpful > * for debugging these. 
*/ > for (i = offset; i < offset + size; i++) { > overlapping_cap = pci_find_capability_at_offset(pdev, i); > if (overlapping_cap) { > fprintf(stderr, "ERROR: %04x:%02x:%02x.%x " > "Attempt to add PCI capability %x at offset " > "%x overlaps existing capability %x at offset %x\n", > pci_find_domain(pdev->bus), pci_bus_num(pdev->bus), > PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), > cap_id, offset, overlapping_cap, i); > return -EINVAL; > } > } > } > If (!existing) { > config = pdev->config + offset; > config[PCI_CAP_LIST_ID] = cap_id; > config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; > pdev->config[PCI_CAPABILITY_LIST] = offset; > pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; } which poses the question why the above wouldn't apply for the existing case. It would work just as well to leave that in, no? Alex > memset(pdev->used + offset, 0xFF, size); > /* Make capability read-only by default */ > memset(pdev->wmask + offset, 0, size); > /* Check capability by default */ > memset(pdev->cmask + offset, 0xFF, size); > return offset; > } > > > > >>>>>> + /* Check capability by default */ >>>>>> + memset(pdev->cmask + offset, 0xFF, size); >>>> >>>> I don't understand this part either. >>> >>> The pci_add_capability() does it for a new capability by default. >>> >>> >>> >>>> >>>> Alex >>>> >>>>>> + return existing; >>>>>> + } >>>>>> + >>>>>> if (!offset) { >>>>>> offset = pci_find_space(pdev, size); >>>>>> if (!offset) { >>>>>> return -ENOSPC; >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>>>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>>>>> >>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>>>>> >>>>>>>>> 11.05.2012 20:52, Alexander Graf wrote: >>>>>>>>>> >>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>>>>> >>>>>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>>>>> capability.
This is ok for emulated devices which capabilities list >>>>>>>>>>> is being built by QEMU. >>>>>>>>>>> >>>>>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>>>>>> >>>>>>>>>>> For example, the old code destroys the following config >>>>>>>>>>> of PCIe Intel E1000E: >>>>>>>>>>> >>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>>>>> 0x34: 0xC8 >>>>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>>>> >>>>>>>>>>> after: >>>>>>>>>>> 0x34: 0xD0 >>>>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>>>> >>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>>>>> >>>>>>>>>>> The proposed patch does not change capability pointers when >>>>>>>>>>> the same type capability is about to add. >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>>>>>> --- >>>>>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>>>>> --- a/hw/pci.c >>>>>>>>>>> +++ b/hw/pci.c >>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> config = pdev->config + offset; >>>>>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>>>>> >>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>>>>>> * replace the existing one or >>>>>>>>>> * drop out and not write the new one in. 
>>>>>>>> >>>>>>>> * hw_error :) >>>>>>>> >>>>>>>>>> >>>>>>>>>> I'm not sure which way would be more natural. >>>>>>>>> >>>>>>>>> There is a third option - add another function, lets call it >>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>>>>> but won't touch list pointers. >>>>>>>> >>>>>>>> What good is a function that breaks internal consistency? >>>>>>> >>>>>>> >>>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>>>>>> capability at some fixed offset would have initialized their config space before calling this >>>>>>> capabilities API (as VFIO does). >>>>>>> >>>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>>>>>> driver may care about its offset. >>>>>>> >>>>>>> >>>>>>> >>>>>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>>>>> exactly that the capability exists and where it is and it calls >>>>>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>>>>> just for imaginery scalability is a bit weird, no? >>>>>>>> >>>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. >>>>>>>> >>>>>>>> >>>>>>>> Alex >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>> >>> >>> -- >>> Alexey > > > -- > Alexey ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-05-22 6:31 ` Alexander Graf @ 2012-05-22 7:01 ` Alexey Kardashevskiy 2012-05-22 7:13 ` Alexander Graf 2012-06-08 8:47 ` Alexey Kardashevskiy 1 sibling, 1 reply; 29+ messages in thread From: Alexey Kardashevskiy @ 2012-05-22 7:01 UTC (permalink / raw) To: Alexander Graf Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 22/05/12 16:31, Alexander Graf wrote: > > > On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: > >> On 22/05/12 15:52, Alexander Graf wrote: >>> >>> >>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >>> >>>> On 22/05/12 13:21, Alexander Graf wrote: >>>>> >>>>> >>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: >>>>> >>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>>>>>> Alexander, >>>>>>> >>>>>>> Is that any better? :) >>>>>> >>>>>> Alex (Graf that is), ping ? >>>>>> >>>>>> The original patch from Alexey was fine btw. >>>>>> >>>>>> VFIO will always call things with the existing capability offset so >>>>>> there's no real risk of doing the wrong thing or break the list or >>>>>> anything. >>>>>> >>>>>> IE. A small simple patch that addresses the problem :-) >>>>>> >>>>>> The new patch is a bit more "robust" I believe, I don't think we need to >>>>>> go too far to fix a problem we don't have. But we need a fix for the >>>>>> real issue and the simple patch does it neatly from what I can >>>>>> understand. >>>>>> >>>>>> Cheers, >>>>>> Ben. 
>>>>>>
>>>>>>>
>>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>>>>  * in pci config space */
>>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>>                        uint8_t offset, uint8_t size)
>>>>>>> {
>>>>>>> -    uint8_t *config;
>>>>>>> +    uint8_t *config, existing;
>>>>>
>>>>> Existing is a pointer to the target dev's config space, right?
>>>>
>>>> Yes.
>>>>
>>>>>>>      int i, overlapping_cap;
>>>>>>>
>>>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>>>> +    if (existing) {
>>>>>>> +        if (offset && (existing != offset)) {
>>>>>>> +            return -EEXIST;
>>>>>>> +        }
>>>>>>> +        for (i = existing; i < size; ++i) {
>>>>>
>>>>> So how does this possibly make sense?
>>>>
>>>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double
>>>> check that this space has not been tried to use by someone else.
>>>
>>> i is an int. existing is a uint8_t*.
>>
>>
>> It was there before me. This function already does a loop and this is how it was coded at the first place.
>
> Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax...


Well, it still does not make much sense to have "int i" rather than "uint8_t i" :)


>>>>>>> +            if (pdev->used[i]) {
>>>>>>> +                return -EFAULT;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +        memset(pdev->used + offset, 0xFF, size);
>>>>> Why?
>>>>
>>>> Because I am marking the space this capability takes as used.
>>>
>>> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no?
>>
>>
>> No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability().
> So why would the function that populates the config space initially not set the used flag correctly?


This is internal kitchen of PCIDevice which I do not want to touch from anywhere but pci.c. And
there is no "fixup_capability" or something.


>> Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist.
> No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really.


The last thing we want for a VFIO device is changing its capabilities list.


>>>>>>> +        /* Make capability read-only by default */
>>>>>>> +        memset(pdev->wmask + offset, 0, size);
>>>>> Why?
>>>>
>>>> Because the pci_add_capability() does it for a new capability by default.
>>>
>>> Hrm. So you're copying code? Can't you merge the overwrite and write cases?
>>
>> I am trying to make it as a single chunk which is as small as possible.
>
> No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense.

I actually duplicated 4 (four) lines and did it just once. This is too little to be called
"duplicating" :) And I get a very special case visually separated and easy to remove if we find a
better solution later.

But - no problemo, I'll rework it.

[no further comments]


>> If it helps, below is the same patch with extended context to see what is going on in that function.
>> >> >> >> >> >> >> hw/pci.c | 20 +++++++++++++++++++- >> 1 files changed, 19 insertions(+), 1 deletions(-) >> >> diff --git a/hw/pci.c b/hw/pci.c >> index 63a8219..7008a42 100644 >> --- a/hw/pci.c >> +++ b/hw/pci.c >> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom) >> ptr = memory_region_get_ram_ptr(&pdev->rom); >> load_image(path, ptr); >> g_free(path); >> >> if (is_default_rom) { >> /* Only the default rom images will be patched (if needed). */ >> pci_patch_ids(pdev, ptr, size); >> } >> >> qemu_put_ram_ptr(ptr); >> >> pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); >> >> return 0; >> } >> >> static void pci_del_option_rom(PCIDevice *pdev) >> { >> if (!pdev->has_rom) >> return; >> >> vmstate_unregister_ram(&pdev->rom, &pdev->qdev); >> memory_region_destroy(&pdev->rom); >> pdev->has_rom = false; >> } >> >> /* >> * if !offset >> * Reserve space and add capability to the linked list in pci config space >> * >> * if offset = 0, >> * Find and reserve space and add capability to the linked list >> * in pci config space */ >> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >> uint8_t offset, uint8_t size) >> { >> - uint8_t *config; >> + uint8_t *config, existing; >> int i, overlapping_cap; >> >> + existing = pci_find_capability(pdev, cap_id); >> + if (existing) { >> + if (offset && (existing != offset)) { >> + return -EEXIST; >> + } >> + for (i = existing; i < size; ++i) { >> + if (pdev->used[i]) { >> + return -EFAULT; >> + } >> + } > > } > >> + memset(pdev->used + offset, 0xFF, size); >> + /* Make capability read-only by default */ >> + memset(pdev->wmask + offset, 0, size); >> + /* Check capability by default */ >> + memset(pdev->cmask + offset, 0xFF, size); >> + return existing; >> + } >> + >> if (!offset) { > > && !existing maybe? > >> offset = pci_find_space(pdev, size); >> if (!offset) { >> return -ENOSPC; >> } >> } else { >> /* Verify that capabilities don't overlap. 
Note: device assignment >> * depends on this check to verify that the device is not broken. >> * Should never trigger for emulated devices, but it's helpful >> * for debugging these. */ >> for (i = offset; i < offset + size; i++) { >> overlapping_cap = pci_find_capability_at_offset(pdev, i); >> if (overlapping_cap) { >> fprintf(stderr, "ERROR: %04x:%02x:%02x.%x " >> "Attempt to add PCI capability %x at offset " >> "%x overlaps existing capability %x at offset %x\n", >> pci_find_domain(pdev->bus), pci_bus_num(pdev->bus), >> PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), >> cap_id, offset, overlapping_cap, i); >> return -EINVAL; >> } >> } >> } >> > > If (!existing) { > >> config = pdev->config + offset; >> config[PCI_CAP_LIST_ID] = cap_id; >> config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >> pdev->config[PCI_CAPABILITY_LIST] = offset; >> pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > > } > > which poses the question why the above wouldn't apply for the existing case. It would work just as well to leave that in, no? > > Alex > >> memset(pdev->used + offset, 0xFF, size); >> /* Make capability read-only by default */ >> memset(pdev->wmask + offset, 0, size); >> /* Check capability by default */ >> memset(pdev->cmask + offset, 0xFF, size); >> return offset; >> } >> >> >> >> >>>>>>> + /* Check capability by default */ >>>>>>> + memset(pdev->cmask + offset, 0xFF, size); >>>>> >>>>> I don't understand this part either. >>>> >>>> The pci_add_capability() does it for a new capability by default. 
-- Alexey ^ permalink raw reply [flat|nested] 29+ messages in thread
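The loop described in the original report, where the capabilities at 0xC8 and 0xD0 end up pointing at each other, can be reproduced outside QEMU with a small standalone model. This is only a sketch under stated assumptions: the 256-byte array stands in for `pdev->config`, and `model_e1000e()`, `old_prepend()` and `cap_list_is_broken()` are hypothetical helper names, not QEMU functions. The offsets and values are the ones quoted in the report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE      256
#define CAP_LIST_PTR  0x34  /* same value as PCI_CAPABILITY_LIST */
#define CAP_ID        0     /* same value as PCI_CAP_LIST_ID */
#define CAP_NEXT      1     /* same value as PCI_CAP_LIST_NEXT */

/* Hypothetical helper: fill cfg with the E1000E layout from the report. */
static void model_e1000e(uint8_t *cfg)
{
    memset(cfg, 0, CFG_SIZE);
    cfg[CAP_LIST_PTR] = 0xC8;
    cfg[0xC8 + CAP_ID] = 0x01; cfg[0xC8 + CAP_NEXT] = 0xD0;
    cfg[0xD0 + CAP_ID] = 0x05; cfg[0xD0 + CAP_NEXT] = 0xE0;
    cfg[0xE0 + CAP_ID] = 0x10; cfg[0xE0 + CAP_NEXT] = 0x00;
}

/* Models the unconditional prepend of the old pci_add_capability():
 * the entry at `offset` is made the new list head, even if it is
 * already linked somewhere in the middle of the list. */
static void old_prepend(uint8_t *cfg, uint8_t cap_id, uint8_t offset)
{
    cfg[offset + CAP_ID] = cap_id;
    cfg[offset + CAP_NEXT] = cfg[CAP_LIST_PTR];
    cfg[CAP_LIST_PTR] = offset;
}

/* Returns 1 if walking the next pointers revisits an offset (a loop)
 * or a pointer leaves no room for a two-byte header, 0 otherwise. */
static int cap_list_is_broken(const uint8_t *cfg)
{
    uint8_t seen[CFG_SIZE] = {0};
    uint8_t pos = cfg[CAP_LIST_PTR];

    while (pos) {
        if (pos > CFG_SIZE - 2 || seen[pos]) {
            return 1;
        }
        seen[pos] = 1;
        pos = cfg[pos + CAP_NEXT];
    }
    return 0;
}
```

Calling `old_prepend(cfg, 0x05, 0xD0)` on the modelled layout gives 0x34 -> 0xD0 -> 0xC8 -> 0xD0, which is the cycle the patch is meant to avoid; a guest walking the list would never reach the terminating 0x00.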
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-05-22 7:01 ` Alexey Kardashevskiy @ 2012-05-22 7:13 ` Alexander Graf 2012-05-22 7:37 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 29+ messages in thread From: Alexander Graf @ 2012-05-22 7:13 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 22.05.2012, at 09:01, Alexey Kardashevskiy wrote: > On 22/05/12 16:31, Alexander Graf wrote: >> >> >> On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >> >>> On 22/05/12 15:52, Alexander Graf wrote: >>>> >>>> >>>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote: >>>> >>>>> On 22/05/12 13:21, Alexander Graf wrote: >>>>>> >>>>>> >>>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: >>>>>> >>>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote: >>>>>>>> Alexander, >>>>>>>> >>>>>>>> Is that any better? :) >>>>>>> >>>>>>> Alex (Graf that is), ping ? >>>>>>> >>>>>>> The original patch from Alexey was fine btw. >>>>>>> >>>>>>> VFIO will always call things with the existing capability offset so >>>>>>> there's no real risk of doing the wrong thing or break the list or >>>>>>> anything. >>>>>>> >>>>>>> IE. A small simple patch that addresses the problem :-) >>>>>>> >>>>>>> The new patch is a bit more "robust" I believe, I don't think we need to >>>>>>> go too far to fix a problem we don't have. But we need a fix for the >>>>>>> real issue and the simple patch does it neatly from what I can >>>>>>> understand. >>>>>>> >>>>>>> Cheers, >>>>>>> Ben. 
>>>>>>> >>>>>>>> >>>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>>>>>>> * in pci config space */ >>>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>> uint8_t offset, uint8_t size) >>>>>>>> { >>>>>>>> - uint8_t *config; >>>>>>>> + uint8_t *config, existing; >>>>>> >>>>>> Existing is a pointer to the target dev's config space, right? >>>>> >>>>> Yes. >>>>> >>>>>>>> int i, overlapping_cap; >>>>>>>> >>>>>>>> + existing = pci_find_capability(pdev, cap_id); >>>>>>>> + if (existing) { >>>>>>>> + if (offset && (existing != offset)) { >>>>>>>> + return -EEXIST; >>>>>>>> + } >>>>>>>> + for (i = existing; i < size; ++i) { >>>>>> >>>>>> So how does this possibly make sense? >>>>> >>>>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double >>>>> check that this space has not been tried to use by someone else. >>>> >>>> i is an int. existing is a uint8_t*. >>> >>> >>> It was there before me. This function already does a loop and this is how it was coded at the first place. >> >> Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax... > > > Well it is still does not make much sense to have "int i" rather than "uint8_t i" :) > > >>>>>>>> + if (pdev->used[i]) { >>>>>>>> + return -EFAULT; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + memset(pdev->used + offset, 0xFF, size); >>>>>> Why? >>>>> >>>>> Because I am marking the space this capability takes as used. >>>> >>>> But if it already existed (at the same offset), it should be set used already, no? Unless size > existing size, in which case you might overwrite data in the next chunk, no? >>> >>> >>> No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability(). 
>> So why would the function that populates the config space initially not set the used flag correctly?
>
>
> This is internal kitchen of PCIDevice which I do not want to touch from anywhere but pci.c. And
> there is no "fixup_capability" or something.

Hrm. Maybe we should have one? :) Or instead of populating the config space with the exact data from the host device, loop through the host device capabilities and populate them using this function as we go. That should maintain the offsets, but ensure that all internal flags are set, no?

>
>
>>> Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist.
>> No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really.
>
>
> The last thing we want for a VFIO device is changing its capabilities list.

Well - we want it to look the same. The population should go through the same methods as emulated devices have to go through, no? :)

>
>
>>>>>>>> +        /* Make capability read-only by default */
>>>>>>>> +        memset(pdev->wmask + offset, 0, size);
>>>>>> Why?
>>>>>
>>>>> Because the pci_add_capability() does it for a new capability by default.
>>>>
>>>> Hrm. So you're copying code? Can't you merge the overwrite and write cases?
>>>
>>> I am trying to make it as a single chunk which is as small as possible.
>>
>> No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense.
>
> I actually duplicated 4 (four) lines and did it just once. This is too little to be called
> "duplicating" :) And I get very special case visually separated and easy to remove if we find a
> better solution later.

Yeah, but the special case really shouldn't be all that special - that's why I was irritated. > But - no problemo, I'll rework it. Thanks! > [no further comments] What about mine below? ;) Alex > > > >>> If it helps, below is the same patch with extended context to see what is going on in that function. >>> >>> >>> >>> >>> >>> >>> hw/pci.c | 20 +++++++++++++++++++- >>> 1 files changed, 19 insertions(+), 1 deletions(-) >>> >>> diff --git a/hw/pci.c b/hw/pci.c >>> index 63a8219..7008a42 100644 >>> --- a/hw/pci.c >>> +++ b/hw/pci.c >>> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom) >>> ptr = memory_region_get_ram_ptr(&pdev->rom); >>> load_image(path, ptr); >>> g_free(path); >>> >>> if (is_default_rom) { >>> /* Only the default rom images will be patched (if needed). */ >>> pci_patch_ids(pdev, ptr, size); >>> } >>> >>> qemu_put_ram_ptr(ptr); >>> >>> pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); >>> >>> return 0; >>> } >>> >>> static void pci_del_option_rom(PCIDevice *pdev) >>> { >>> if (!pdev->has_rom) >>> return; >>> >>> vmstate_unregister_ram(&pdev->rom, &pdev->qdev); >>> memory_region_destroy(&pdev->rom); >>> pdev->has_rom = false; >>> } >>> >>> /* >>> * if !offset >>> * Reserve space and add capability to the linked list in pci config space >>> * >>> * if offset = 0, >>> * Find and reserve space and add capability to the linked list >>> * in pci config space */ >>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>> uint8_t offset, uint8_t size) >>> { >>> - uint8_t *config; >>> + uint8_t *config, existing; >>> int i, overlapping_cap; >>> >>> + existing = pci_find_capability(pdev, cap_id); >>> + if (existing) { >>> + if (offset && (existing != offset)) { >>> + return -EEXIST; >>> + } >>> + for (i = existing; i < size; ++i) { >>> + if (pdev->used[i]) { >>> + return -EFAULT; >>> + } >>> + } >> >> } >> >>> + memset(pdev->used + offset, 0xFF, size); >>> + /* Make capability read-only by default */ >>> + 
memset(pdev->wmask + offset, 0, size); >>> + /* Check capability by default */ >>> + memset(pdev->cmask + offset, 0xFF, size); >>> + return existing; >>> + } >>> + >>> if (!offset) { >> >> && !existing maybe? >> >>> offset = pci_find_space(pdev, size); >>> if (!offset) { >>> return -ENOSPC; >>> } >>> } else { >>> /* Verify that capabilities don't overlap. Note: device assignment >>> * depends on this check to verify that the device is not broken. >>> * Should never trigger for emulated devices, but it's helpful >>> * for debugging these. */ >>> for (i = offset; i < offset + size; i++) { >>> overlapping_cap = pci_find_capability_at_offset(pdev, i); >>> if (overlapping_cap) { >>> fprintf(stderr, "ERROR: %04x:%02x:%02x.%x " >>> "Attempt to add PCI capability %x at offset " >>> "%x overlaps existing capability %x at offset %x\n", >>> pci_find_domain(pdev->bus), pci_bus_num(pdev->bus), >>> PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), >>> cap_id, offset, overlapping_cap, i); >>> return -EINVAL; >>> } >>> } >>> } >>> >> >> If (!existing) { >> >>> config = pdev->config + offset; >>> config[PCI_CAP_LIST_ID] = cap_id; >>> config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>> pdev->config[PCI_CAPABILITY_LIST] = offset; >>> pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >> >> } >> >> which poses the question why the above wouldn't apply for the existing case. It would work just as well to leave that in, no? >> >> Alex >> >>> memset(pdev->used + offset, 0xFF, size); >>> /* Make capability read-only by default */ >>> memset(pdev->wmask + offset, 0, size); >>> /* Check capability by default */ >>> memset(pdev->cmask + offset, 0xFF, size); >>> return offset; >>> } >>> >>> >>> >>> >>>>>>>> + /* Check capability by default */ >>>>>>>> + memset(pdev->cmask + offset, 0xFF, size); >>>>>> >>>>>> I don't understand this part either. >>>>> >>>>> The pci_add_capability() does it for a new capability by default. 
> -- > Alexey ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22 7:13 ` Alexander Graf
@ 2012-05-22 7:37 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 29+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-22 7:37 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson

On Tue, 2012-05-22 at 09:13 +0200, Alexander Graf wrote:
> On 22.05.2012, at 09:01, Alexey Kardashevskiy wrote:

> > This is internal kitchen of PCIDevice which I do not want to touch
> > from anywhere but pci.c. And
> > there is no "fixup_capability" or something.
>
> Hrm. Maybe we should have one? :) Or instead of populating the config
> space with the exact data from the host device,
> loop through the host device capabilities and populate them using this
> function as we go.
> That should maintain the offsets, but ensure that all internal flags
> are set, no?

That actually sounds reasonable, though it might be more efficient to
have something like pci_parse_config() or similar called once with a
pre-cooked config space. Internally inside pci.c it can do that same
loop you mention.

The advantage is that it can also do whatever else we might need in the
future. If for some reason, something wants to cache a cap pointer it
can be done there, whatever else that is normally initialized as fields
to generate the config space can be initialized by reading the config
space and then initializing the fields etc... from that one function.

Cheers,
Ben.

^ permalink raw reply [flat|nested] 29+ messages in thread
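The "called once with a pre-cooked config space" idea from this message could look roughly like the sketch below. It is only an illustration under stated assumptions: no pci_parse_config() exists in the thread beyond its name, so `struct toy_dev` and `toy_parse_config()` are hypothetical stand-ins for PCIDevice and the proposed function. Because the capability list itself does not record each capability's length, the sketch marks only the two-byte ID/next header as used; a real implementation would need per-capability size knowledge.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE      256
#define STATUS_REG    0x06  /* same value as PCI_STATUS */
#define STATUS_CAPS   0x10  /* same value as PCI_STATUS_CAP_LIST */
#define CAP_LIST_PTR  0x34  /* same value as PCI_CAPABILITY_LIST */
#define CAP_HDR_SIZE  2     /* ID byte + next-pointer byte */

/* Hypothetical stand-in for the relevant PCIDevice fields. */
struct toy_dev {
    uint8_t config[CFG_SIZE];
    uint8_t used[CFG_SIZE];
};

/* Copies a host-provided config space as is (offsets and list order are
 * preserved), then walks the capability list once to set the internal
 * `used` flags. Returns the number of capabilities found, or -1 if the
 * list loops, overlaps itself, or points past the end of config space. */
static int toy_parse_config(struct toy_dev *d, const uint8_t *host_cfg)
{
    int count = 0;
    uint8_t pos;

    memcpy(d->config, host_cfg, CFG_SIZE);
    memset(d->used, 0, CFG_SIZE);

    if (!(d->config[STATUS_REG] & STATUS_CAPS)) {
        return 0; /* device advertises no capability list */
    }

    for (pos = d->config[CAP_LIST_PTR]; pos; pos = d->config[pos + 1]) {
        if (pos > CFG_SIZE - CAP_HDR_SIZE || d->used[pos]) {
            return -1;
        }
        memset(d->used + pos, 0xFF, CAP_HDR_SIZE);
        count++;
    }
    return count;
}
```

The design point this tries to capture is that the list pointers are never rewritten: the function only derives internal state from the config space it is given, so a later call for an already present capability would have nothing to relink.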
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22 6:31 ` Alexander Graf
  2012-05-22 7:01 ` Alexey Kardashevskiy
@ 2012-06-08 8:47 ` Alexey Kardashevskiy
  2012-06-08 10:56 ` Jan Kiszka
  1 sibling, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-06-08 8:47 UTC (permalink / raw)
  To: Alexander Graf
  Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson

Yet another try :)


Normally pci_add_capability() is called on devices to add a new
capability. This is OK for emulated devices, whose capability list
is built by QEMU.

In the case of VFIO the capability may already exist, and adding a new
capability at the beginning of the linked list may create a loop, because
the existing code ignores capabilities which point to the capability
being added.

For example, the old code destroys the following config
of a PCIe Intel E1000E:

before adding PCI_CAP_ID_MSI (0x05) at offset 0xD0:
0x34: 0xC8
0xC8: 0x01 0xD0
0xD0: 0x05 0xE0
0xE0: 0x10 0x00

after:
0x34: 0xD0
0xC8: 0x01 0xD0
0xD0: 0x05 0xC8
0xE0: 0x10 0x00

As a result, the capabilities at 0xC8 and 0xD0 point to each other.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/pci.c |   25 ++++++++++++++++++++-----
 1 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 63a8219..cd22caa 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1791,61 +1791,76 @@ static void pci_del_option_rom(PCIDevice *pdev)
         return;
 
     vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
     memory_region_destroy(&pdev->rom);
     pdev->has_rom = false;
 }
 
 /*
  * if !offset
  * Reserve space and add capability to the linked list in pci config space
  *
  * if offset = 0,
  * Find and reserve space and add capability to the linked list
- * in pci config space */
+ * in pci config space
+ * Also does not change the capability list pointers
+ * if a capability already exists (actual for device with pre-cooked config
+ * space such as VFIO)
+ */
 int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
                        uint8_t offset, uint8_t size)
 {
     uint8_t *config;
     int i, overlapping_cap;
+    uint8_t existing_offset;
+
+    existing_offset = pci_find_capability(pdev, cap_id);
+    if (existing_offset) {
+        if (offset && (existing_offset != offset)) {
+            return -EEXIST;
+        }
+        offset = existing_offset;
+    }
 
     if (!offset) {
         offset = pci_find_space(pdev, size);
         if (!offset) {
             return -ENOSPC;
         }
     } else {
         /* Verify that capabilities don't overlap. Note: device assignment
          * depends on this check to verify that the device is not broken.
          * Should never trigger for emulated devices, but it's helpful
          * for debugging these. */
         for (i = offset; i < offset + size; i++) {
             overlapping_cap = pci_find_capability_at_offset(pdev, i);
             if (overlapping_cap) {
                 fprintf(stderr, "ERROR: %04x:%02x:%02x.%x "
                         "Attempt to add PCI capability %x at offset "
                         "%x overlaps existing capability %x at offset %x\n",
                         pci_find_domain(pdev->bus), pci_bus_num(pdev->bus),
                         PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
                         cap_id, offset, overlapping_cap, i);
                 return -EINVAL;
             }
         }
     }
 
-    config = pdev->config + offset;
-    config[PCI_CAP_LIST_ID] = cap_id;
-    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
-    pdev->config[PCI_CAPABILITY_LIST] = offset;
+    if (!existing_offset) {
+        config = pdev->config + offset;
+        config[PCI_CAP_LIST_ID] = cap_id;
+        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
+        pdev->config[PCI_CAPABILITY_LIST] = offset;
+    }
     pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
     memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);
     /* Check capability by default */
     memset(pdev->cmask + offset, 0xFF, size);
     return offset;
 }
 
 /* Unlink capability from the pci config space. */
 void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
 {
     uint8_t prev, offset = pci_find_capability_list(pdev, cap_id, &prev);
-- 
1.7.7.3


On 22/05/12 16:31, Alexander Graf wrote:
>
>
> On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
>> On 22/05/12 15:52, Alexander Graf wrote:
>>>
>>>
>>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>>> On 22/05/12 13:21, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>>>
>>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>>>> Alexander,
>>>>>>>
>>>>>>> Is that any better? :)
>>>>>>
>>>>>> Alex (Graf that is), ping ?
>>>>>>
>>>>>> The original patch from Alexey was fine btw.
>>>>>> >>>>>> VFIO will always call things with the existing capability offset so >>>>>> there's no real risk of doing the wrong thing or break the list or >>>>>> anything. >>>>>> >>>>>> IE. A small simple patch that addresses the problem :-) >>>>>> >>>>>> The new patch is a bit more "robust" I believe, I don't think we need to >>>>>> go too far to fix a problem we don't have. But we need a fix for the >>>>>> real issue and the simple patch does it neatly from what I can >>>>>> understand. >>>>>> >>>>>> Cheers, >>>>>> Ben. >>>>>> >>>>>>> >>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev) >>>>>>> * in pci config space */ >>>>>>> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>> uint8_t offset, uint8_t size) >>>>>>> { >>>>>>> - uint8_t *config; >>>>>>> + uint8_t *config, existing; >>>>> >>>>> Existing is a pointer to the target dev's config space, right? >>>> >>>> Yes. >>>> >>>>>>> int i, overlapping_cap; >>>>>>> >>>>>>> + existing = pci_find_capability(pdev, cap_id); >>>>>>> + if (existing) { >>>>>>> + if (offset && (existing != offset)) { >>>>>>> + return -EEXIST; >>>>>>> + } >>>>>>> + for (i = existing; i < size; ++i) { >>>>> >>>>> So how does this possibly make sense? >>>> >>>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double >>>> check that this space has not been tried to use by someone else. >>> >>> i is an int. existing is a uint8_t*. >> >> >> It was there before me. This function already does a loop and this is how it was coded at the first place. > > Ugh. Existing is a uint8_t, not a pointer. Gotta love C syntax... > >> >> >>>>>>> + if (pdev->used[i]) { >>>>>>> + return -EFAULT; >>>>>>> + } >>>>>>> + } >>>>>>> + memset(pdev->used + offset, 0xFF, size); >>>>> Why? >>>> >>>> Because I am marking the space this capability takes as used. >>> >>> But if it already existed (at the same offset), it should be set used already, no? 
Unless size > existing size, in which case you might overwrite data in the next chunk, no? >> >> >> No, it does not exist for VFIO - VFIO read the config space from the host kernel first and then calls msi_init or msix_init or whatever_else_init depending on what it got from the host kernel. And these xxx_init() functions eventually call pci_add_capability(). > > So why would the function that populates the config space initially not set the used flag correctly? > >> >> Sure we can either implement own msi_init/msix_init (and may be others in the future) for VFIO (which would do all the same as other QEMU devices except touching the capabilities) OR hack msi_init/msix_init not to touch capabilities if they exist. > > No, calling the internal functions sounds fine to me. It's the step before that irritates me. VFIO shouldn't differ too much from an emulated device wrt its config space population really. > >> >> >> >>>>>>> + /* Make capability read-only by default */ >>>>>>> + memset(pdev->wmask + offset, 0, size); >>>>> Why? >>>> >>>> Because the pci_add_capability() does it for a new capability by default. >>> >>> Hrm. So you're copying code? Can't you merge the overwrite and write cases? >> >> I am trying to make it as a single chunk which is as small as possible. > > No, you're needlessly duplicating code which is a bad idea :). Please reuse as much of the existing function as possible, unless it really doesn't make sense. > >> >> >> If it helps, below is the same patch with extended context to see what is going on in that function. 
>> >> >> >> >> >> >> hw/pci.c | 20 +++++++++++++++++++- >> 1 files changed, 19 insertions(+), 1 deletions(-) >> >> diff --git a/hw/pci.c b/hw/pci.c >> index 63a8219..7008a42 100644 >> --- a/hw/pci.c >> +++ b/hw/pci.c >> @@ -1772,75 +1772,93 @@ static int pci_add_option_rom(PCIDevice *pdev, bool is_default_rom) >> ptr = memory_region_get_ram_ptr(&pdev->rom); >> load_image(path, ptr); >> g_free(path); >> >> if (is_default_rom) { >> /* Only the default rom images will be patched (if needed). */ >> pci_patch_ids(pdev, ptr, size); >> } >> >> qemu_put_ram_ptr(ptr); >> >> pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); >> >> return 0; >> } >> >> static void pci_del_option_rom(PCIDevice *pdev) >> { >> if (!pdev->has_rom) >> return; >> >> vmstate_unregister_ram(&pdev->rom, &pdev->qdev); >> memory_region_destroy(&pdev->rom); >> pdev->has_rom = false; >> } >> >> /* >> * if !offset >> * Reserve space and add capability to the linked list in pci config space >> * >> * if offset = 0, >> * Find and reserve space and add capability to the linked list >> * in pci config space */ >> int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >> uint8_t offset, uint8_t size) >> { >> - uint8_t *config; >> + uint8_t *config, existing; >> int i, overlapping_cap; >> >> + existing = pci_find_capability(pdev, cap_id); >> + if (existing) { >> + if (offset && (existing != offset)) { >> + return -EEXIST; >> + } >> + for (i = existing; i < size; ++i) { >> + if (pdev->used[i]) { >> + return -EFAULT; >> + } >> + } > > } > >> + memset(pdev->used + offset, 0xFF, size); >> + /* Make capability read-only by default */ >> + memset(pdev->wmask + offset, 0, size); >> + /* Check capability by default */ >> + memset(pdev->cmask + offset, 0xFF, size); >> + return existing; >> + } >> + >> if (!offset) { > > && !existing maybe? > >> offset = pci_find_space(pdev, size); >> if (!offset) { >> return -ENOSPC; >> } >> } else { >> /* Verify that capabilities don't overlap. 
Note: device assignment >> * depends on this check to verify that the device is not broken. >> * Should never trigger for emulated devices, but it's helpful >> * for debugging these. */ >> for (i = offset; i < offset + size; i++) { >> overlapping_cap = pci_find_capability_at_offset(pdev, i); >> if (overlapping_cap) { >> fprintf(stderr, "ERROR: %04x:%02x:%02x.%x " >> "Attempt to add PCI capability %x at offset " >> "%x overlaps existing capability %x at offset %x\n", >> pci_find_domain(pdev->bus), pci_bus_num(pdev->bus), >> PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), >> cap_id, offset, overlapping_cap, i); >> return -EINVAL; >> } >> } >> } >> > > If (!existing) { > >> config = pdev->config + offset; >> config[PCI_CAP_LIST_ID] = cap_id; >> config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >> pdev->config[PCI_CAPABILITY_LIST] = offset; >> pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; > > } > > which poses the question why the above wouldn't apply for the existing case. It would work just as well to leave that in, no? > > Alex > >> memset(pdev->used + offset, 0xFF, size); >> /* Make capability read-only by default */ >> memset(pdev->wmask + offset, 0, size); >> /* Check capability by default */ >> memset(pdev->cmask + offset, 0xFF, size); >> return offset; >> } >> >> >> >> >>>>>>> + /* Check capability by default */ >>>>>>> + memset(pdev->cmask + offset, 0xFF, size); >>>>> >>>>> I don't understand this part either. >>>> >>>> The pci_add_capability() does it for a new capability by default. 
>>>> >>>> >>>> >>>>> >>>>> Alex >>>>> >>>>>>> + return existing; >>>>>>> + } >>>>>>> + >>>>>>> if (!offset) { >>>>>>> offset = pci_find_space(pdev, size); >>>>>>> if (!offset) { >>>>>>> return -ENOSPC; >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 14/05/12 13:49, Alexey Kardashevskiy wrote: >>>>>>>> On 12/05/12 00:13, Alexander Graf wrote: >>>>>>>>> >>>>>>>>> On 11.05.2012, at 14:47, Alexey Kardashevskiy wrote: >>>>>>>>> >>>>>>>>>> 11.05.2012 20:52, Alexander Graf wrote: >>>>>>>>>>> >>>>>>>>>>> On 11.05.2012, at 08:45, Alexey Kardashevskiy wrote: >>>>>>>>>>> >>>>>>>>>>>> Normally the pci_add_capability is called on devices to add new >>>>>>>>>>>> capability. This is ok for emulated devices which capabilities list >>>>>>>>>>>> is being built by QEMU. >>>>>>>>>>>> >>>>>>>>>>>> In the case of VFIO the capability may already exist and adding new >>>>>>>>>>>> capability into the beginning of the linked list may create a loop. >>>>>>>>>>>> >>>>>>>>>>>> For example, the old code destroys the following config >>>>>>>>>>>> of PCIe Intel E1000E: >>>>>>>>>>>> >>>>>>>>>>>> before adding PCI_CAP_ID_MSI (0x05): >>>>>>>>>>>> 0x34: 0xC8 >>>>>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>>>>> 0xD0: 0x05 0xE0 >>>>>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>>>>> >>>>>>>>>>>> after: >>>>>>>>>>>> 0x34: 0xD0 >>>>>>>>>>>> 0xC8: 0x01 0xD0 >>>>>>>>>>>> 0xD0: 0x05 0xC8 >>>>>>>>>>>> 0xE0: 0x10 0x00 >>>>>>>>>>>> >>>>>>>>>>>> As result capabilities 0x01 and 0x05 point to each other. >>>>>>>>>>>> >>>>>>>>>>>> The proposed patch does not change capability pointers when >>>>>>>>>>>> the same type capability is about to add. 
>>>>>>>>>>>> >>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> >>>>>>>>>>>> --- >>>>>>>>>>>> hw/pci.c | 10 ++++++---- >>>>>>>>>>>> 1 files changed, 6 insertions(+), 4 deletions(-) >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/hw/pci.c b/hw/pci.c >>>>>>>>>>>> index aa0c0b8..1f7c924 100644 >>>>>>>>>>>> --- a/hw/pci.c >>>>>>>>>>>> +++ b/hw/pci.c >>>>>>>>>>>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> config = pdev->config + offset; >>>>>>>>>>>> - config[PCI_CAP_LIST_ID] = cap_id; >>>>>>>>>>>> - config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; >>>>>>>>>>>> - pdev->config[PCI_CAPABILITY_LIST] = offset; >>>>>>>>>>>> - pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; >>>>>>>>>>>> + if (config[PCI_CAP_LIST_ID] != cap_id) { >>>>>>>>>>> >>>>>>>>>>> This doesn't scale. Capabilities are a list of CAPs. You'll have to do a loop through all capabilities, check if the one you want to add is there already and if so either >>>>>>>>>>> * replace the existing one or >>>>>>>>>>> * drop out and not write the new one in. >>>>>>>>> >>>>>>>>> * hw_error :) >>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'm not sure which way would be more natural. >>>>>>>>>> >>>>>>>>>> There is a third option - add another function, lets call it >>>>>>>>>> pci_fixup_capability() which would do whatever pci_add_capability() does >>>>>>>>>> but won't touch list pointers. >>>>>>>>> >>>>>>>>> What good is a function that breaks internal consistency? >>>>>>>> >>>>>>>> >>>>>>>> It is broken already by having PCIDevice.used field. Normally pci_add_capability() would go through >>>>>>>> the whole list and add a capability if it does not exist. Emulated devices which care about having a >>>>>>>> capability at some fixed offset would have initialized their config space before calling this >>>>>>>> capabilities API (as VFIO does). 
>>>>>>>> >>>>>>>> If we really want to support emulated devices which want some capabilities be at fixed offset and >>>>>>>> others at random offsets (strange, but ok), I do not see how it is bad to restore this consistency >>>>>>>> by special function (pci_fixup_capability()) to avoid its rewriting at different location as a guest >>>>>>>> driver may care about its offset. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>> When vfio, pci_add_capability() is called from the code which knows >>>>>>>>>> exactly that the capability exists and where it is and it calls >>>>>>>>>> pci_add_capability() based on this knowledge so doing additional loops >>>>>>>>>> just for imaginery scalability is a bit weird, no? >>>>>>>>> >>>>>>>>> Not sure I understand your proposal. The more generic a framework is, the better, no? In this code path we don't care about speed. We only care about consistency and reliability. -- Alexey ^ permalink raw reply related [flat|nested] 29+ messages in thread
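The loop described in the patch description above can be reproduced outside QEMU. What follows is only a minimal sketch on a plain 256-byte array, not QEMU code: the `toy_*` helpers and `CFG_SIZE` are names invented for this illustration, while the register offsets and the E1000E byte values are the ones quoted in the message. `toy_link_at_head()` mirrors the three unconditional assignments that the pre-patch `pci_add_capability()` performs.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PCI_CAPABILITY_LIST 0x34 /* config offset of the first capability pointer */
#define PCI_CAP_LIST_ID     0    /* capability ID byte within a capability */
#define PCI_CAP_LIST_NEXT   1    /* next-pointer byte within a capability */
#define CFG_SIZE            256  /* toy config space size (assumption) */

/* Fill a toy config space with the E1000E list quoted in the message. */
static void toy_fill_e1000e(uint8_t *config)
{
    memset(config, 0, CFG_SIZE);
    config[PCI_CAPABILITY_LIST] = 0xC8;
    config[0xC8] = 0x01; config[0xC9] = 0xD0;
    config[0xD0] = 0x05; config[0xD1] = 0xE0;
    config[0xE0] = 0x10; config[0xE1] = 0x00;
}

/* Unconditional head insertion, as done by the pre-patch code. */
static void toy_link_at_head(uint8_t *config, uint8_t cap_id, uint8_t offset)
{
    config[offset + PCI_CAP_LIST_ID] = cap_id;
    config[offset + PCI_CAP_LIST_NEXT] = config[PCI_CAPABILITY_LIST];
    config[PCI_CAPABILITY_LIST] = offset;
}

/*
 * Follow the next pointers from the list head. Returns true if a zero
 * pointer is reached, false if the walk runs for more steps than there
 * are bytes in the config space (a loop) or hits the pointer 0xFF,
 * which would index past the end of the array.
 */
static bool toy_cap_list_terminates(const uint8_t *config)
{
    uint8_t pos = config[PCI_CAPABILITY_LIST];
    int steps;

    for (steps = 0; pos != 0; steps++) {
        if (steps >= CFG_SIZE || pos == 0xFF) {
            return false;
        }
        pos = config[pos + PCI_CAP_LIST_NEXT];
    }
    return true;
}
```

Calling `toy_link_at_head(config, 0x05, 0xD0)` on the filled array leaves 0x34 pointing at 0xD0, 0xD0 pointing at 0xC8 and 0xC8 still pointing at 0xD0, which is the "after" layout from the message, and the walk no longer terminates.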
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-06-08 8:47 ` Alexey Kardashevskiy @ 2012-06-08 10:56 ` Jan Kiszka 2012-06-08 11:16 ` Alexey Kardashevskiy 0 siblings, 1 reply; 29+ messages in thread From: Jan Kiszka @ 2012-06-08 10:56 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 2012-06-08 10:47, Alexey Kardashevskiy wrote: > Yet another try :) > > Normally the pci_add_capability is called on devices to add new > capability. This is ok for emulated devices which capabilities list > is being built by QEMU. > > In the case of VFIO the capability may already exist and adding new Why does it exist? VFIO should build the virtual capability list from scratch (just like classic device assignment does), recreating the layout of the physical device (except for masked out caps). In that case, this conflict should become impossible, no? But if pci_*add*_capability should actually be used like this (I doubt this), some renaming would be required. "Add" sounds like "append" to me, not "update". Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-06-08 10:56 ` Jan Kiszka @ 2012-06-08 11:16 ` Alexey Kardashevskiy 2012-06-08 11:30 ` Jan Kiszka 0 siblings, 1 reply; 29+ messages in thread From: Alexey Kardashevskiy @ 2012-06-08 11:16 UTC (permalink / raw) To: Jan Kiszka Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson 08.06.2012 20:56, Jan Kiszka wrote: > On 2012-06-08 10:47, Alexey Kardashevskiy wrote: >> Yet another try :) >> >> Normally the pci_add_capability is called on devices to add new >> capability. This is ok for emulated devices which capabilities list >> is being built by QEMU. >> >> In the case of VFIO the capability may already exist and adding new > > Why does it exit? VFIO should build the virtual capability list from > scratch (just like classic device assignment does), recreating the > layout of the physical device (except for masked out caps). In that > case, this conflict should become impossible, no? Normally capabilities in emulated devices are created by calling msi_init or msix_init - just when an emulated device wants to advertise them to the guest. In the case of VFIO, there are a lot of capabilities which QEMU does not know and does not want to know about. They are read from the host kernel as is. And we definitely want to pass these capabilities to the guest as is, i.e. at the same positions and in the same number. Just for some of them we call pci_add_capability (indirectly!) if we want QEMU to support them somehow. If we invent some function which "readds" all the capabilities we got from the host to keep QEMU's internal PCIDevice data in sync, then we'll need to change every piece of code which adds capabilities. 
I have noticed that it is a very common approach here to change a lot for a very small thing or a rare case, but I'd like to avoid this :) > But if pci_*add*_capability should actually be used like this (I doubt > this), MSI/MSIX use it. To enable MSI/MSIX on a VFIO PCIDevice, we call msi_init/msix_init and they call pci_add_capability. > some renaming would be required. "Add" sound like "append" to me, > not "update". It is "add" for all the cases but VFIO. VFIO is a very special case and I do not see another one doing the same soon. -- With best regards Alexey Kardashevskiy -- icq: 52150396 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-06-08 11:16 ` Alexey Kardashevskiy @ 2012-06-08 11:30 ` Jan Kiszka 2012-06-08 14:00 ` Alexey Kardashevskiy 0 siblings, 1 reply; 29+ messages in thread From: Jan Kiszka @ 2012-06-08 11:30 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 2012-06-08 13:16, Alexey Kardashevskiy wrote: > 08.06.2012 20:56, Jan Kiszka wrote: >> On 2012-06-08 10:47, Alexey Kardashevskiy wrote: >>> Yet another try :) >>> >>> Normally the pci_add_capability is called on devices to add new >>> capability. This is ok for emulated devices which capabilities list >>> is being built by QEMU. >>> >>> In the case of VFIO the capability may already exist and adding new >> >> Why does it exit? VFIO should build the virtual capability list from >> scratch (just like classic device assignment does), recreating the >> layout of the physical device (except for masked out caps). In that >> case, this conflict should become impossible, no? > > Normally capabilities in emulated devices are created by calling > msi_init or msix_init - just when emulated device wants to advertise it > to the guest. > > In the case of VFIO, there is a lot of capabilities which QEMU does not > know and does not want to know about. They are read from the host kernel > as is. And we definitely want to pass these capabilities to the guest as > is, i.e. on the same position and the same number of them. Just for some > we call pci_add_capability (indirectly!) if we want QEMU to support them > somehow. > > If we invent some function which "readds" all the capabilities we got > from the host to keep internal QEMU's PCIDevice data in sync, then we'll > need to change every piece of code which adds capabilities. I can't follow. 
What is different in VFIO from device-assignment.c, assigned_device_pci_cap_init (except that it already uses msi[x]_init, something we need to fix in device-assignment.c)? > I noticed, > this is very common approach here to change a lot for a very small thing > or rare case but I'd like to avoid this :) > >> But if pci_*add*_capability should actually be used like this (I doubt >> this), > > MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call > msi_init/msix_init and they call pci_add_capability. You can't blame msi_init/msix_init for the fact that VFIO creates a capability list with an existing MSI/MSI-X entry beforehand. > >> some renaming would be required. "Add" sound like "append" to me, >> not "update". > > It is "add" for all the cases but VFIO. VFIO is the very special case > and I do not see another one doing the same soon. PCI device assignment may have some special requirements. Then it is either required to generalize common services properly or keep the specialty local. So far, this proposal does not fall in any of those two categories. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-06-08 11:30 ` Jan Kiszka @ 2012-06-08 14:00 ` Alexey Kardashevskiy 2012-06-08 14:43 ` Jan Kiszka 0 siblings, 1 reply; 29+ messages in thread From: Alexey Kardashevskiy @ 2012-06-08 14:00 UTC (permalink / raw) To: Jan Kiszka Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson 08.06.2012 21:30, Jan Kiszka wrote: > On 2012-06-08 13:16, Alexey Kardashevskiy wrote: >> 08.06.2012 20:56, Jan Kiszka wrote: >>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote: >>>> Yet another try :) >>>> >>>> Normally the pci_add_capability is called on devices to add new >>>> capability. This is ok for emulated devices which capabilities list >>>> is being built by QEMU. >>>> >>>> In the case of VFIO the capability may already exist and adding new >>> >>> Why does it exit? VFIO should build the virtual capability list from >>> scratch (just like classic device assignment does), recreating the >>> layout of the physical device (except for masked out caps). In that >>> case, this conflict should become impossible, no? >> >> Normally capabilities in emulated devices are created by calling >> msi_init or msix_init - just when emulated device wants to advertise it >> to the guest. >> >> In the case of VFIO, there is a lot of capabilities which QEMU does not >> know and does not want to know about. They are read from the host kernel >> as is. And we definitely want to pass these capabilities to the guest as >> is, i.e. on the same position and the same number of them. Just for some >> we call pci_add_capability (indirectly!) if we want QEMU to support them >> somehow. >> >> If we invent some function which "readds" all the capabilities we got >> from the host to keep internal QEMU's PCIDevice data in sync, then we'll >> need to change every piece of code which adds capabilities. > > I can't follow. 
What is different in VFIO from device-assignment.c, > assigned_device_pci_cap_init (except that it already uses msi[x]_init, > something we need to fix in device-assignment.c)? What are device-assignment.c and assigned_device_pci_cap_init? I cannot find them in the QEMU tree. Ah, anyway. The main difference is that QEMU does not emulate VFIO devices, it is just a proxy to the host system. Or I do not understand the question. >> I noticed, >> this is very common approach here to change a lot for a very small thing >> or rare case but I'd like to avoid this :) >> >>> But if pci_*add*_capability should actually be used like this (I doubt >>> this), >> >> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call >> msi_init/msix_init and they call pci_add_capability. > > You can't blame msi_init/msix_init for the fact that VFIO creates a > capability list with an existing MSI/MSI-X entry beforehand. VFIO does not create any capability. It gets them all from the host kernel and passes them to the guest as is. VFIO only needs MSIX to be enabled in VFIO. >>> some renaming would be required. "Add" sound like "append" to me, >>> not "update". >> >> It is "add" for all the cases but VFIO. VFIO is the very special case >> and I do not see another one doing the same soon. > > PCI device assignment may have some special requirements. Then it is > either required to generalize common services properly or keep the > specialty local. So far, this proposal does not fall in any of those two > categories. It is a common patch. It does not know about VFIO and lets pci_add_capability handle one more situation, when the capability already exists. The only "common" solution I see here is 1) to add pci_fixup_capabilities(), which would mark all the bytes of existing capabilities as "used"; we would call it once we have fetched the config space from the host kernel 2) to fix pci_add_capability not to fail and simply return (0?) if we add a capability which already exists. Will it be ok? 
-- With best regards Alexey Kardashevskiy -- icq: 52150396 ^ permalink raw reply [flat|nested] 29+ messages in thread
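Point 2 of the proposal above (do not fail and do not relink when the capability is already present) might look roughly like the following. This is a hedged sketch on plain byte arrays rather than on `PCIDevice`: the `toy_*` names and `CFG_SIZE` are invented for the example, the free-space search and overlap check of the real function are left out, and the return value follows the posted patch (the existing offset) because the message leaves that question open. Point 1 is not shown, since the list header carries only an ID and a next pointer and the per-capability sizes would have to come from elsewhere.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define PCI_CAPABILITY_LIST 0x34 /* config offset of the first capability pointer */
#define PCI_CAP_LIST_ID     0    /* capability ID byte within a capability */
#define PCI_CAP_LIST_NEXT   1    /* next-pointer byte within a capability */
#define CFG_SIZE            256  /* toy config space size (assumption) */

/* Fill a toy config space with the E1000E list quoted in the thread. */
static void toy_fill_e1000e(uint8_t *config)
{
    memset(config, 0, CFG_SIZE);
    config[PCI_CAPABILITY_LIST] = 0xC8;
    config[0xC8] = 0x01; config[0xC9] = 0xD0;
    config[0xD0] = 0x05; config[0xD1] = 0xE0;
    config[0xE0] = 0x10; config[0xE1] = 0x00;
}

/* Walk the list; return the offset of cap_id, or 0 if it is not linked. */
static uint8_t toy_find_capability(const uint8_t *config, uint8_t cap_id)
{
    uint8_t pos = config[PCI_CAPABILITY_LIST];
    int steps;

    /* The step bound and the 0xFF check keep the walk inside the array. */
    for (steps = 0; pos != 0 && pos != 0xFF && steps < CFG_SIZE; steps++) {
        if (config[pos + PCI_CAP_LIST_ID] == cap_id) {
            return pos;
        }
        pos = config[pos + PCI_CAP_LIST_NEXT];
    }
    return 0;
}

/*
 * Add cap_id at offset, or adopt it if it is already linked there.
 * Returns the capability offset, or a negative errno value.
 */
static int toy_add_capability(uint8_t *config, uint8_t *used,
                              uint8_t cap_id, uint8_t offset, uint8_t size)
{
    uint8_t existing = toy_find_capability(config, cap_id);

    if (existing) {
        if (offset && existing != offset) {
            return -EEXIST;        /* same ID already lives elsewhere */
        }
        offset = existing;         /* adopt it, leave the links alone */
    } else if (!offset) {
        return -ENOSPC;            /* free-space search omitted here */
    }
    if ((int)offset + size > CFG_SIZE) {
        return -EINVAL;
    }
    if (!existing) {
        /* New capability: link it at the head of the list. */
        config[offset + PCI_CAP_LIST_ID] = cap_id;
        config[offset + PCI_CAP_LIST_NEXT] = config[PCI_CAPABILITY_LIST];
        config[PCI_CAPABILITY_LIST] = offset;
    }
    memset(used + offset, 0xFF, size); /* reserve the bytes either way */
    return offset;
}
```

With the E1000E layout, adding ID 0x05 at 0xD0 leaves every pointer untouched and only marks the bytes as used, while asking for the same ID at a different offset is rejected with `-EEXIST`.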
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-06-08 14:00 ` Alexey Kardashevskiy @ 2012-06-08 14:43 ` Jan Kiszka 2012-06-08 14:56 ` Alex Williamson 0 siblings, 1 reply; 29+ messages in thread From: Jan Kiszka @ 2012-06-08 14:43 UTC (permalink / raw) To: Alexey Kardashevskiy Cc: kvm@vger.kernel.org, Alexander Graf, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson On 2012-06-08 16:00, Alexey Kardashevskiy wrote: > 08.06.2012 21:30, Jan Kiszka wrote: >> On 2012-06-08 13:16, Alexey Kardashevskiy wrote: >>> 08.06.2012 20:56, Jan Kiszka wrote: >>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote: >>>>> Yet another try :) >>>>> >>>>> Normally the pci_add_capability is called on devices to add new >>>>> capability. This is ok for emulated devices which capabilities list >>>>> is being built by QEMU. >>>>> >>>>> In the case of VFIO the capability may already exist and adding new >>>> >>>> Why does it exit? VFIO should build the virtual capability list from >>>> scratch (just like classic device assignment does), recreating the >>>> layout of the physical device (except for masked out caps). In that >>>> case, this conflict should become impossible, no? >>> >>> Normally capabilities in emulated devices are created by calling >>> msi_init or msix_init - just when emulated device wants to advertise it >>> to the guest. >>> >>> In the case of VFIO, there is a lot of capabilities which QEMU does not >>> know and does not want to know about. They are read from the host kernel >>> as is. And we definitely want to pass these capabilities to the guest as >>> is, i.e. on the same position and the same number of them. Just for some >>> we call pci_add_capability (indirectly!) if we want QEMU to support them >>> somehow. 
>>> >>> If we invent some function which "readds" all the capabilities we got >>> from the host to keep internal QEMU's PCIDevice data in sync, then we'll >>> need to change every piece of code which adds capabilities. >> >> I can't follow. What is different in VFIO from device-assignment.c, >> assigned_device_pci_cap_init (except that it already uses msi[x]_init, >> something we need to fix in device-assignment.c)? > > What are device-assignment.c and assigned_device_pci_cap_init? Cannot > find them in QEMU tree. "Old-style" KVM device assignment is not yet upstream. You can find it in qemu-kvm, hopefully in upstream soon as well. > > Ah, anyway. The main difference is QEMU does not emulate VFIO devices, > it just a proxy to the host system. Or I do not understand the question. > >>> I noticed, >>> this is very common approach here to change a lot for a very small thing >>> or rare case but I'd like to avoid this :) >>> >>>> But if pci_*add*_capability should actually be used like this (I doubt >>>> this), >>> >>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call >>> msi_init/msix_init and they call pci_add_capability. >> >> You can't blame msi_init/msix_init for the fact that VFIO creates a >> capability list with an existing MSI/MSI-X entry beforehand. > > VFIO does not create any capability. It gets them all from the host > kernel and passes to the guest as is. VFIO only needs MSIX to be enabled > in VFIO. Just like any device in QEMU, also VFIO need to set up a virtual config space when it registers with the PCI core layer. Even if the virtual one is modeled after the real one, it is still _created_ by the VFIO userspace part. And this creation process is obviously a bit messed up so far. Fix this, but not by adding workarounds in the MSI or PCI layer. Rather add all capabilities you want to expose to the guest via pci_add_capability or, indirectly, via msi[x]_init at the right position. 
Do not just copy the real config space over, that breaks the core layer as we see. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space 2012-06-08 14:43 ` Jan Kiszka @ 2012-06-08 14:56 ` Alex Williamson 2012-06-08 15:05 ` Jan Kiszka 0 siblings, 1 reply; 29+ messages in thread From: Alex Williamson @ 2012-06-08 14:56 UTC (permalink / raw) To: Jan Kiszka Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, Alexander Graf, qemu-devel@nongnu.org, anthony@codemonkey.ws, David Gibson On Fri, 2012-06-08 at 16:43 +0200, Jan Kiszka wrote: > On 2012-06-08 16:00, Alexey Kardashevskiy wrote: > > 08.06.2012 21:30, Jan Kiszka wrote: > >> On 2012-06-08 13:16, Alexey Kardashevskiy wrote: > >>> 08.06.2012 20:56, Jan Kiszka wrote: > >>>> On 2012-06-08 10:47, Alexey Kardashevskiy wrote: > >>>>> Yet another try :) > >>>>> > >>>>> Normally the pci_add_capability is called on devices to add new > >>>>> capability. This is ok for emulated devices which capabilities list > >>>>> is being built by QEMU. > >>>>> > >>>>> In the case of VFIO the capability may already exist and adding new > >>>> > >>>> Why does it exit? VFIO should build the virtual capability list from > >>>> scratch (just like classic device assignment does), recreating the > >>>> layout of the physical device (except for masked out caps). In that > >>>> case, this conflict should become impossible, no? > >>> > >>> Normally capabilities in emulated devices are created by calling > >>> msi_init or msix_init - just when emulated device wants to advertise it > >>> to the guest. > >>> > >>> In the case of VFIO, there is a lot of capabilities which QEMU does not > >>> know and does not want to know about. They are read from the host kernel > >>> as is. And we definitely want to pass these capabilities to the guest as > >>> is, i.e. on the same position and the same number of them. Just for some > >>> we call pci_add_capability (indirectly!) if we want QEMU to support them > >>> somehow. 
> >>> > >>> If we invent some function which "readds" all the capabilities we got > >>> from the host to keep internal QEMU's PCIDevice data in sync, then we'll > >>> need to change every piece of code which adds capabilities. > >> > >> I can't follow. What is different in VFIO from device-assignment.c, > >> assigned_device_pci_cap_init (except that it already uses msi[x]_init, > >> something we need to fix in device-assignment.c)? > > > > What are device-assignment.c and assigned_device_pci_cap_init? Cannot > > find them in QEMU tree. > > "Old-style" KVM device assignment is not yet upstream. You can find it > in qemu-kvm, hopefully in upstream soon as well. > > > > > Ah, anyway. The main difference is QEMU does not emulate VFIO devices, > > it just a proxy to the host system. Or I do not understand the question. > > > >>> I noticed, > >>> this is very common approach here to change a lot for a very small thing > >>> or rare case but I'd like to avoid this :) > >>> > >>>> But if pci_*add*_capability should actually be used like this (I doubt > >>>> this), > >>> > >>> MSI/MSIX use it. To enable MSI/MSIX on VFIO PCIDevice, we call > >>> msi_init/msix_init and they call pci_add_capability. > >> > >> You can't blame msi_init/msix_init for the fact that VFIO creates a > >> capability list with an existing MSI/MSI-X entry beforehand. > > > > VFIO does not create any capability. It gets them all from the host > > kernel and passes to the guest as is. VFIO only needs MSIX to be enabled > > in VFIO. > > Just like any device in QEMU, also VFIO need to set up a virtual config > space when it registers with the PCI core layer. Even if the virtual one > is modeled after the real one, it is still _created_ by the VFIO > userspace part. And this creation process is obviously a bit messed up > so far. Fix this, but not by adding workarounds in the MSI or PCI layer. 
> Rather add all capabilities you want to expose to the guest via > pci_add_capability or, indirectly, via msi[x]_init at the right > position. Do not just copy the real config space over, that breaks the > core layer as we see. The difference between VFIO and kvm device assignment is that VFIO emulates a lot of config space for us, so most things are passed through. MSI and MSIX are unique that we actually do want the qemu support for helping us to manage them. So we're basically not telling qemu about anything other than these, and for the most part, that works since qemu never handles access to the other capabilities. However, I think you're probably right, VFIO should just walk the capabilities list, registering each with qemu. It's a little "unnecessary" overhead from the VFIO perspective, but it makes the VFIO device less unique. I'll work on adding this. Thanks, Alex ^ permalink raw reply [flat|nested] 29+ messages in thread
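The "walk the capabilities list" step mentioned above can be sketched in isolation. The following is a minimal model that works on a raw 256-byte config space image rather than on QEMU's PCIDevice; `walk_capabilities` and `cap_visitor` are hypothetical names introduced here for illustration, and this is not the code that ended up in VFIO. The register offsets are the standard PCI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets and bits as defined by the PCI specification. */
#define PCI_STATUS           0x06
#define PCI_STATUS_CAP_LIST  0x10
#define PCI_CAPABILITY_LIST  0x34
#define PCI_CAP_LIST_ID      0
#define PCI_CAP_LIST_NEXT    1
#define PCI_CONFIG_SIZE      256

/* Hypothetical callback, called once per capability found. */
typedef void (*cap_visitor)(uint8_t cap_id, uint8_t offset, void *opaque);

/*
 * Walk the capability linked list of a raw config space image.
 * Returns the number of capabilities visited, or -1 if the list is
 * malformed (a pointer into the standard header, or a loop).
 */
static int walk_capabilities(const uint8_t *config, cap_visitor visit,
                             void *opaque)
{
    uint8_t seen[PCI_CONFIG_SIZE] = {0};
    uint8_t pos;
    int count = 0;

    assert(config != NULL);
    if (!(config[PCI_STATUS] & PCI_STATUS_CAP_LIST)) {
        return 0;   /* device advertises no capability list */
    }
    /* The two low bits of every pointer are reserved, so mask them off. */
    pos = config[PCI_CAPABILITY_LIST] & 0xFC;
    while (pos != 0) {
        if (pos < 0x40 || seen[pos]) {
            return -1;
        }
        seen[pos] = 1;
        if (visit != NULL) {
            visit(config[pos + PCI_CAP_LIST_ID], pos, opaque);
        }
        count++;
        pos = config[pos + PCI_CAP_LIST_NEXT] & 0xFC;
    }
    return count;
}
```

A caller that wants to register every entry with the PCI core would still have to work out each capability's size on its own, since the list stores only an ID and a next pointer.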
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 14:56 ` Alex Williamson
@ 2012-06-08 15:05 ` Jan Kiszka
  2012-06-08 15:22 ` Alex Williamson
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kiszka @ 2012-06-08 15:05 UTC
To: Alex Williamson
Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, Alexander Graf, qemu-devel@nongnu.org, anthony@codemonkey.ws, David Gibson

On 2012-06-08 16:56, Alex Williamson wrote:
> The difference between VFIO and kvm device assignment is that VFIO
> emulates a lot of config space for us, so most things are passed
> through.

That's not different from current device assignment, is it? I think the
major difference is that VFIO filters and potentially post-processes the
direct writes in kernel space.

> MSI and MSIX are unique in that we actually do want the qemu
> support for helping us to manage them. So we're basically not telling
> qemu about anything other than these, and for the most part, that works
> since qemu never handles access to the other capabilities. However, I
> think you're probably right, VFIO should just walk the capabilities
> list, registering each with qemu. It's a little "unnecessary" overhead
> from the VFIO perspective, but it makes the VFIO device less unique.
> I'll work on adding this. Thanks,

Great, thanks!

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-06-08 15:05 ` Jan Kiszka
@ 2012-06-08 15:22 ` Alex Williamson
  0 siblings, 0 replies; 29+ messages in thread
From: Alex Williamson @ 2012-06-08 15:22 UTC
To: Jan Kiszka
Cc: kvm@vger.kernel.org, Alexey Kardashevskiy, Alexander Graf, qemu-devel@nongnu.org, anthony@codemonkey.ws, David Gibson

On Fri, 2012-06-08 at 17:05 +0200, Jan Kiszka wrote:
> On 2012-06-08 16:56, Alex Williamson wrote:
> > The difference between VFIO and kvm device assignment is that VFIO
> > emulates a lot of config space for us, so most things are passed
> > through.
>
> That's not different from current device assignment, is it? I think the
> major difference is that VFIO filters and potentially post-processes the
> direct writes in kernel space.

Right, and having the filtering/virtualization in the kernel means that
qemu only handles a very small subset of PCI config space. That's made
us lax in even telling qemu about the areas that it'll never see
accesses to. For current device assignment, since we're doing the
emulation in qemu, it's a little more beneficial to register everything.
Thanks,

Alex
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-22 6:11 ` Alexey Kardashevskiy
  2012-05-22 6:31 ` Alexander Graf
@ 2012-05-22 6:38 ` Alexander Graf
  1 sibling, 0 replies; 29+ messages in thread
From: Alexander Graf @ 2012-05-22 6:38 UTC
To: Alexey Kardashevskiy
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Alex Williamson, anthony@codemonkey.ws, David Gibson

On 22.05.2012, at 08:11, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 22/05/12 15:52, Alexander Graf wrote:
>>
>>
>> On 22.05.2012, at 05:44, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>> On 22/05/12 13:21, Alexander Graf wrote:
>>>>
>>>>
>>>> On 22.05.2012, at 04:02, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>>
>>>>> On Fri, 2012-05-18 at 15:12 +1000, Alexey Kardashevskiy wrote:
>>>>>> Alexander,
>>>>>>
>>>>>> Is that any better? :)
>>>>>
>>>>> Alex (Graf that is), ping ?
>>>>>
>>>>> The original patch from Alexey was fine btw.
>>>>>
>>>>> VFIO will always call things with the existing capability offset so
>>>>> there's no real risk of doing the wrong thing or break the list or
>>>>> anything.
>>>>>
>>>>> IE. A small simple patch that addresses the problem :-)
>>>>>
>>>>> The new patch is a bit more "robust" I believe, I don't think we need to
>>>>> go too far to fix a problem we don't have. But we need a fix for the
>>>>> real issue and the simple patch does it neatly from what I can
>>>>> understand.
>>>>>
>>>>> Cheers,
>>>>> Ben.
>>>>>
>>>>>>
>>>>>> @@ -1779,11 +1779,29 @@ static void pci_del_option_rom(PCIDevice *pdev)
>>>>>>   * in pci config space */
>>>>>>  int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>>>>>                         uint8_t offset, uint8_t size)
>>>>>>  {
>>>>>> -    uint8_t *config;
>>>>>> +    uint8_t *config, existing;
>>>>
>>>> Existing is a pointer to the target dev's config space, right?
>>>
>>> Yes.
>>>
>>>>>>      int i, overlapping_cap;
>>>>>>
>>>>>> +    existing = pci_find_capability(pdev, cap_id);
>>>>>> +    if (existing) {
>>>>>> +        if (offset && (existing != offset)) {
>>>>>> +            return -EEXIST;
>>>>>> +        }
>>>>>> +        for (i = existing; i < size; ++i) {
>>>>
>>>> So how does this possibly make sense?
>>>
>>> Although I do not expect VFIO to add capabilities (does not make sense), I still want to double
>>> check that this space has not been tried to use by someone else.
>>
>> i is an int. existing is a uint8_t*.
>
>
> It was there before me. This function already does a loop and this is how it was coded at the first place.

Also, while at it, please add some comments at least for the code you add that explain why you do the things you do :).

Alex
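Two C details in the quoted hunk can be checked in isolation. In the declaration `uint8_t *config, existing;` the `*` binds only to `config`, so `existing` is a plain `uint8_t` holding an offset. And the loop header `for (i = existing; i < size; ++i)` starts at the capability's offset but stops at its size, so for a capability at 0xD0 of size 10 the body never runs. The sketch below only counts iterations; the loop body is not shown in the quote, so none is assumed, and the `existing + size` bound is an assumed reading of the intent rather than anything posted in the thread. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* sizeof tells us what `existing` really is in the quoted declaration. */
static size_t size_of_existing(void)
{
    uint8_t *config = NULL, existing = 0;   /* same declarator shape as the hunk */

    (void)config;
    return sizeof(existing);                /* 1: a uint8_t, not a pointer */
}

/* Iterations of the loop header exactly as quoted. */
static int iterations_as_quoted(uint8_t existing, uint8_t size)
{
    int i, n = 0;

    for (i = existing; i < size; ++i) {
        n++;
    }
    return n;
}

/* Iterations when the upper bound is existing + size (assumed intent). */
static int iterations_over_range(uint8_t existing, uint8_t size)
{
    int i, n = 0;

    assert(existing + size <= 256);         /* range must fit in config space */
    for (i = existing; i < existing + size; ++i) {
        n++;
    }
    return n;
}
```

With the E1000E MSI capability at 0xD0 and an assumed size of 10 bytes, the first form visits no bytes while the second visits all ten.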
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-11 6:45 [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space Alexey Kardashevskiy
  2012-05-11 10:52 ` Alexander Graf
@ 2012-05-11 19:20 ` Jason Baron
  2012-05-12 0:27 ` Alexey Kardashevskiy
  1 sibling, 1 reply; 29+ messages in thread
From: Jason Baron @ 2012-05-11 19:20 UTC
To: Alexey Kardashevskiy
Cc: kvm, qemu-devel, Alex Graf, Alex Williamson, anthony, David Gibson

On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote:
> Normally the pci_add_capability is called on devices to add new
> capability. This is ok for emulated devices which capabilities list
> is being built by QEMU.
>
> In the case of VFIO the capability may already exist and adding new
> capability into the beginning of the linked list may create a loop.

Hi,

I don't quite understand how we get a loop, if 'offset' is supplied to
'pci_add_capability' and there is an overlap we get -EINVAL. Otherwise,
we are adding the capability in a new empty space. So, I see how we
could get the capability in the list twice, but not how there is a loop.
what am I missing?

Thanks,

-Jason

>
> For example, the old code destroys the following config
> of PCIe Intel E1000E:
>
> before adding PCI_CAP_ID_MSI (0x05):
> 0x34: 0xC8
> 0xC8: 0x01 0xD0
> 0xD0: 0x05 0xE0
> 0xE0: 0x10 0x00
>
> after:
> 0x34: 0xD0
> 0xC8: 0x01 0xD0
> 0xD0: 0x05 0xC8
> 0xE0: 0x10 0x00
>
> As result capabilities 0x01 and 0x05 point to each other.
>
> The proposed patch does not change capability pointers when
> the same type capability is about to add.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/pci.c | 10 ++++++----
>  1 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/hw/pci.c b/hw/pci.c
> index aa0c0b8..1f7c924 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>      }
>
>      config = pdev->config + offset;
> -    config[PCI_CAP_LIST_ID] = cap_id;
> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
> +        config[PCI_CAP_LIST_ID] = cap_id;
> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> +    }
>      memset(pdev->used + offset, 0xFF, size);
>      /* Make capability read-only by default */
>      memset(pdev->wmask + offset, 0, size);
>
>
> --
> Alexey
>
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-11 19:20 ` Jason Baron
@ 2012-05-12 0:27 ` Alexey Kardashevskiy
  2012-05-14 2:37 ` Alex Williamson
  0 siblings, 1 reply; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-12 0:27 UTC
To: Jason Baron
Cc: kvm, qemu-devel, Alex Graf, Alex Williamson, anthony, David Gibson

12.05.2012 5:20, Jason Baron wrote:
> On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote:
>> Normally the pci_add_capability is called on devices to add new
>> capability. This is ok for emulated devices which capabilities list
>> is being built by QEMU.
>>
>> In the case of VFIO the capability may already exist and adding new
>> capability into the beginning of the linked list may create a loop.
>
> Hi,
>
> I don't quite understand how we get a loop, if 'offset' is supplied to
> 'pci_add_capability' and there is an overlap we get -EINVAL. Otherwise,
> we are adding the capability in a new empty space. So, I see how we
> could get the capability in the list twice, but not how there is a loop.
> what am I missing?


This happens only with VFIO.

The capability already exists in the config space as it is fetched from
the host kernel _before_ msi_init is called. Furthermore, msi_init() is
called when VFIO sees this capability in the config space.

We probably want to re-add all capabilities, do not know...


> Thanks,
>
> -Jason
>
>>
>> For example, the old code destroys the following config
>> of PCIe Intel E1000E:
>>
>> before adding PCI_CAP_ID_MSI (0x05):
>> 0x34: 0xC8
>> 0xC8: 0x01 0xD0
>> 0xD0: 0x05 0xE0
>> 0xE0: 0x10 0x00
>>
>> after:
>> 0x34: 0xD0
>> 0xC8: 0x01 0xD0
>> 0xD0: 0x05 0xC8
>> 0xE0: 0x10 0x00
>>
>> As result capabilities 0x01 and 0x05 point to each other.
>>
>> The proposed patch does not change capability pointers when
>> the same type capability is about to add.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  hw/pci.c | 10 ++++++----
>>  1 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/pci.c b/hw/pci.c
>> index aa0c0b8..1f7c924 100644
>> --- a/hw/pci.c
>> +++ b/hw/pci.c
>> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
>>      }
>>
>>      config = pdev->config + offset;
>> -    config[PCI_CAP_LIST_ID] = cap_id;
>> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
>> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
>> +        config[PCI_CAP_LIST_ID] = cap_id;
>> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
>> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
>> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
>> +    }
>>      memset(pdev->used + offset, 0xFF, size);
>>      /* Make capability read-only by default */
>>      memset(pdev->wmask + offset, 0, size);

-- 
With best regards

Alexey Kardashevskiy -- icq: 52150396
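The loop from the E1000E example can be reproduced with a small standalone model. The sketch below applies the four assignments that the patch removes (an unconditional prepend) to a 256-byte array holding the "before" layout quoted in the thread, then follows the next pointers to see whether an offset repeats. The function names are hypothetical and the model ignores QEMU's `used` and `wmask` bookkeeping.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets and bits as defined by the PCI specification. */
#define PCI_STATUS           0x06
#define PCI_STATUS_CAP_LIST  0x10
#define PCI_CAPABILITY_LIST  0x34
#define PCI_CAP_LIST_ID      0
#define PCI_CAP_LIST_NEXT    1

/* Fill `config` with the "before" layout of the E1000E example. */
static void load_e1000e_example(uint8_t *config)
{
    config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
    config[PCI_CAPABILITY_LIST] = 0xC8;
    config[0xC8] = 0x01; config[0xC9] = 0xD0;   /* power management */
    config[0xD0] = 0x05; config[0xD1] = 0xE0;   /* MSI */
    config[0xE0] = 0x10; config[0xE1] = 0x00;   /* PCI Express, end of list */
}

/* Model of the pre-patch behaviour: always link the entry at the head. */
static void prepend_capability(uint8_t *config, uint8_t cap_id, uint8_t offset)
{
    uint8_t *cap = config + offset;

    cap[PCI_CAP_LIST_ID] = cap_id;
    cap[PCI_CAP_LIST_NEXT] = config[PCI_CAPABILITY_LIST];
    config[PCI_CAPABILITY_LIST] = offset;
    config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
}

/* Returns 1 if following the next pointers revisits an offset. */
static int cap_list_has_loop(const uint8_t *config)
{
    uint8_t seen[256] = {0};
    uint8_t pos = config[PCI_CAPABILITY_LIST] & 0xFC;

    while (pos != 0) {
        if (seen[pos]) {
            return 1;
        }
        seen[pos] = 1;
        pos = config[pos + PCI_CAP_LIST_NEXT] & 0xFC;
    }
    return 0;
}
```

Calling `prepend_capability(config, 0x05, 0xD0)` on the loaded example makes the head point at 0xD0 and makes 0xD0 point at the old head 0xC8, which itself still points at 0xD0. The PCI Express capability at 0xE0 is no longer reachable.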
* Re: [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
  2012-05-12 0:27 ` Alexey Kardashevskiy
@ 2012-05-14 2:37 ` Alex Williamson
  0 siblings, 0 replies; 29+ messages in thread
From: Alex Williamson @ 2012-05-14 2:37 UTC
To: Alexey Kardashevskiy
Cc: kvm, Jason Baron, qemu-devel, Alex Graf, anthony, David Gibson

On Sat, 2012-05-12 at 10:27 +1000, Alexey Kardashevskiy wrote:
> 12.05.2012 5:20, Jason Baron wrote:
> > On Fri, May 11, 2012 at 04:45:21PM +1000, Alexey Kardashevskiy wrote:
> >> Normally the pci_add_capability is called on devices to add new
> >> capability. This is ok for emulated devices which capabilities list
> >> is being built by QEMU.
> >>
> >> In the case of VFIO the capability may already exist and adding new
> >> capability into the beginning of the linked list may create a loop.
> >
> > Hi,
> >
> > I don't quite understand how we get a loop, if 'offset' is supplied to
> > 'pci_add_capability' and there is an overlap we get -EINVAL. Otherwise,
> > we are adding the capability in a new empty space. So, I see how we
> > could get the capability in the list twice, but not how there is a loop.
> > what am I missing?
>
>
> This happens only with VFIO.
>
> The capability already exists in the config space as it is fetched from
> the host kernel _before_ msi_init is called. Furthermore, msi_init() is
> called when VFIO sees this capability in the config space.
>
> We probably want to re-add all capabilities, do not know...

Yep, I've had msi [1] and msix [2] patches in my vfio tree for a long
time; we really want to support this generically for all capabilities
though. We either need to detect or allow the caller to specify that
the config space is already programmed. Note that even if we don't
create a loop, particularly finicky drivers may balk at just changing
the order of the capabilities list. Thanks,

Alex

[1] https://github.com/awilliam/qemu-vfio/commit/a9f04351610ab69e22d90a76dc85be3269000a9f
[2] https://github.com/awilliam/qemu-vfio/commit/b4de3d0436b0260fbc6fcd40787c1c92ffca2980

> >>
> >> For example, the old code destroys the following config
> >> of PCIe Intel E1000E:
> >>
> >> before adding PCI_CAP_ID_MSI (0x05):
> >> 0x34: 0xC8
> >> 0xC8: 0x01 0xD0
> >> 0xD0: 0x05 0xE0
> >> 0xE0: 0x10 0x00
> >>
> >> after:
> >> 0x34: 0xD0
> >> 0xC8: 0x01 0xD0
> >> 0xD0: 0x05 0xC8
> >> 0xE0: 0x10 0x00
> >>
> >> As result capabilities 0x01 and 0x05 point to each other.
> >>
> >> The proposed patch does not change capability pointers when
> >> the same type capability is about to add.
> >>
> >> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> ---
> >>  hw/pci.c | 10 ++++++----
> >>  1 files changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/hw/pci.c b/hw/pci.c
> >> index aa0c0b8..1f7c924 100644
> >> --- a/hw/pci.c
> >> +++ b/hw/pci.c
> >> @@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> >>      }
> >>
> >>      config = pdev->config + offset;
> >> -    config[PCI_CAP_LIST_ID] = cap_id;
> >> -    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> >> -    pdev->config[PCI_CAPABILITY_LIST] = offset;
> >> -    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> >> +    if (config[PCI_CAP_LIST_ID] != cap_id) {
> >> +        config[PCI_CAP_LIST_ID] = cap_id;
> >> +        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> >> +        pdev->config[PCI_CAPABILITY_LIST] = offset;
> >> +        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> >> +    }
> >>      memset(pdev->used + offset, 0xFF, size);
> >>      /* Make capability read-only by default */
> >>      memset(pdev->wmask + offset, 0, size);
> >
>
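One way to picture the "detect that the config space is already programmed" option is sketched below. It is stricter than the posted patch, which compares only the ID byte at `offset`: this variant also requires the entry to be reachable from the capabilities pointer before it skips relinking. It is a standalone model on a raw config array, not the msi/msix patches linked above, and `cap_is_linked` and `add_capability_once` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets and bits as defined by the PCI specification. */
#define PCI_STATUS           0x06
#define PCI_STATUS_CAP_LIST  0x10
#define PCI_CAPABILITY_LIST  0x34
#define PCI_CAP_LIST_ID      0
#define PCI_CAP_LIST_NEXT    1

/* Returns 1 if `offset` is reachable from the capabilities pointer. */
static int cap_is_linked(const uint8_t *config, uint8_t offset)
{
    uint8_t pos = config[PCI_CAPABILITY_LIST] & 0xFC;
    int hops;

    /* Only 64 dword-aligned offsets exist, which bounds the walk. */
    for (hops = 0; pos != 0 && hops < 64; hops++) {
        if (pos == offset) {
            return 1;
        }
        pos = config[pos + PCI_CAP_LIST_NEXT] & 0xFC;
    }
    return 0;
}

/*
 * Link a capability at `offset` unless an entry with the same ID is
 * already linked there. Returns 1 if the list was modified, 0 if the
 * entry was found already programmed and left untouched.
 */
static int add_capability_once(uint8_t *config, uint8_t cap_id, uint8_t offset)
{
    assert(config != NULL);
    assert(offset >= 0x40 && (offset & 3) == 0);

    if (config[offset + PCI_CAP_LIST_ID] == cap_id &&
        cap_is_linked(config, offset)) {
        return 0;   /* pointers and list order stay exactly as they were */
    }
    config[offset + PCI_CAP_LIST_ID] = cap_id;
    config[offset + PCI_CAP_LIST_NEXT] = config[PCI_CAPABILITY_LIST];
    config[PCI_CAPABILITY_LIST] = offset;
    config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
    return 1;
}
```

Leaving an already linked entry alone also keeps the order of the list unchanged, which addresses the concern about drivers that are sensitive to reordering.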
* [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space
@ 2012-05-11 6:59 Alexey Kardashevskiy
  0 siblings, 0 replies; 29+ messages in thread
From: Alexey Kardashevskiy @ 2012-05-11 6:59 UTC
To: qemu-devel; +Cc: aik, kvm

Normally the pci_add_capability is called on devices to add new
capability. This is ok for emulated devices which capabilities list
is being built by QEMU.

In the case of VFIO the capability may already exist and adding new
capability into the beginning of the linked list may create a loop.

For example, the old code destroys the following config
of PCIe Intel E1000E:

before adding PCI_CAP_ID_MSI (0x05):
0x34: 0xC8
0xC8: 0x01 0xD0
0xD0: 0x05 0xE0
0xE0: 0x10 0x00

after:
0x34: 0xD0
0xC8: 0x01 0xD0
0xD0: 0x05 0xC8
0xE0: 0x10 0x00

As result capabilities 0x01 and 0x05 point to each other.

The proposed patch does not change capability pointers when
the same type capability is about to add.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/pci.c | 10 ++++++----
 1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index aa0c0b8..1f7c924 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1794,10 +1794,12 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
     }

     config = pdev->config + offset;
-    config[PCI_CAP_LIST_ID] = cap_id;
-    config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
-    pdev->config[PCI_CAPABILITY_LIST] = offset;
-    pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    if (config[PCI_CAP_LIST_ID] != cap_id) {
+        config[PCI_CAP_LIST_ID] = cap_id;
+        config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
+        pdev->config[PCI_CAPABILITY_LIST] = offset;
+        pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+    }
     memset(pdev->used + offset, 0xFF, size);
     /* Make capability read-only by default */
     memset(pdev->wmask + offset, 0, size);

--
Alexey
end of thread, other threads:[~2012-06-08 15:22 UTC]

Thread overview: 29+ messages
2012-05-11 6:45 [Qemu-devel] [RFC PATCH] qemu pci: pci_add_capability enhancement to prevent damaging config space Alexey Kardashevskiy
2012-05-11 10:52 ` Alexander Graf
2012-05-11 12:47 ` Alexey Kardashevskiy
2012-05-11 14:13 ` Alexander Graf
2012-05-14 3:49 ` Alexey Kardashevskiy
2012-05-18 5:12 ` Alexey Kardashevskiy
2012-05-22 2:02 ` Benjamin Herrenschmidt
2012-05-22 3:21 ` Alexander Graf
2012-05-22 3:44 ` Alexey Kardashevskiy
2012-05-22 5:52 ` Alexander Graf
2012-05-22 6:11 ` Alexey Kardashevskiy
2012-05-22 6:31 ` Alexander Graf
2012-05-22 7:01 ` Alexey Kardashevskiy
2012-05-22 7:13 ` Alexander Graf
2012-05-22 7:37 ` Benjamin Herrenschmidt
2012-06-08 8:47 ` Alexey Kardashevskiy
2012-06-08 10:56 ` Jan Kiszka
2012-06-08 11:16 ` Alexey Kardashevskiy
2012-06-08 11:30 ` Jan Kiszka
2012-06-08 14:00 ` Alexey Kardashevskiy
2012-06-08 14:43 ` Jan Kiszka
2012-06-08 14:56 ` Alex Williamson
2012-06-08 15:05 ` Jan Kiszka
2012-06-08 15:22 ` Alex Williamson
2012-05-22 6:38 ` Alexander Graf
2012-05-11 19:20 ` Jason Baron
2012-05-12 0:27 ` Alexey Kardashevskiy
2012-05-14 2:37 ` Alex Williamson
-- strict thread matches above, loose matches on Subject: below --
2012-05-11 6:59 Alexey Kardashevskiy