* [Qemu-devel] [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
@ 2010-11-12 2:55 ` Alex Williamson
2010-11-12 5:22 ` [Qemu-devel] " Michael S. Tsirkin
2010-11-12 2:55 ` [Qemu-devel] [PATCH 2/8] pci: Remove pci_enable_capability_support() Alex Williamson
` (7 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 2:55 UTC (permalink / raw)
To: kvm; +Cc: chrisw, alex.williamson, qemu-devel, mst
Make use of wmask, just like the rest of config space.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/pci.c | 19 ++++++++-----------
1 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/hw/pci.c b/hw/pci.c
index 92aaa85..12c47ac 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
return pci_read_config(d, address, len);
}
-static void pci_write_config(PCIDevice *pci_dev,
- uint32_t address, uint32_t val, int len)
+static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
{
int i;
- for (i = 0; i < len; i++) {
- pci_dev->config[address + i] = val & 0xff;
- val >>= 8;
+ uint32_t config_size = pci_config_size(d);
+
+ for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
+ uint8_t wmask = d->wmask[addr + i];
+ d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
}
}
@@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
{
- int i, was_irq_disabled = pci_irq_disabled(d);
- uint32_t config_size = pci_config_size(d);
+ int was_irq_disabled = pci_irq_disabled(d);
if (pci_access_cap_config(d, addr, l)) {
d->cap.config_write(d, addr, val, l);
return;
}
- for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
- uint8_t wmask = d->wmask[addr + i];
- d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
- }
+ pci_write_config(d, addr, val, l);
#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
if (kvm_enabled() && kvm_irqchip_in_kernel() &&
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask
2010-11-12 2:55 ` [Qemu-devel] [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask Alex Williamson
@ 2010-11-12 5:22 ` Michael S. Tsirkin
2010-11-12 6:03 ` Alex Williamson
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-12 5:22 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Thu, Nov 11, 2010 at 07:55:01PM -0700, Alex Williamson wrote:
> Make use of wmask, just like the rest of config space.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>
> hw/pci.c | 19 ++++++++-----------
> 1 files changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 92aaa85..12c47ac 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
> return pci_read_config(d, address, len);
> }
>
> -static void pci_write_config(PCIDevice *pci_dev,
> - uint32_t address, uint32_t val, int len)
> +static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> {
> int i;
> - for (i = 0; i < len; i++) {
> - pci_dev->config[address + i] = val & 0xff;
> - val >>= 8;
> + uint32_t config_size = pci_config_size(d);
> +
> + for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> + uint8_t wmask = d->wmask[addr + i];
> + d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> }
> }
>
Let's not name an internal static helper pci_write_config.
This is really update_config_by_mask or something like that.
But see below: maybe we don't need it at all?
> @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
>
> void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> {
> - int i, was_irq_disabled = pci_irq_disabled(d);
> - uint32_t config_size = pci_config_size(d);
> + int was_irq_disabled = pci_irq_disabled(d);
>
> if (pci_access_cap_config(d, addr, l)) {
> d->cap.config_write(d, addr, val, l);
> return;
> }
>
I would like to also examine the need for _cap_
functions. Why can assigned devices just do
pci_default_write_config
if (range_overlap(...msi)) {
}
if (range_overlap(...msix)) {
}
and then we could remove all the _cap_ extensions
altogether?
> - for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> - uint8_t wmask = d->wmask[addr + i];
> - d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> - }
> + pci_write_config(d, addr, val, l);
>
> #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> if (kvm_enabled() && kvm_irqchip_in_kernel() &&
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask
2010-11-12 5:22 ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-11-12 6:03 ` Alex Williamson
2010-11-12 8:48 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 6:03 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: chrisw, qemu-devel, kvm
On Fri, 2010-11-12 at 07:22 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 07:55:01PM -0700, Alex Williamson wrote:
> > Make use of wmask, just like the rest of config space.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >
> > hw/pci.c | 19 ++++++++-----------
> > 1 files changed, 8 insertions(+), 11 deletions(-)
> >
> > diff --git a/hw/pci.c b/hw/pci.c
> > index 92aaa85..12c47ac 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
> > return pci_read_config(d, address, len);
> > }
> >
> > -static void pci_write_config(PCIDevice *pci_dev,
> > - uint32_t address, uint32_t val, int len)
> > +static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> > {
> > int i;
> > - for (i = 0; i < len; i++) {
> > - pci_dev->config[address + i] = val & 0xff;
> > - val >>= 8;
> > + uint32_t config_size = pci_config_size(d);
> > +
> > + for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > + uint8_t wmask = d->wmask[addr + i];
> > + d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> > }
> > }
> >
>
>
> Let's not name an internal static helper pci_write_config.
> This is really update_config_by_mask or something like that.
> But see below: maybe we don't need it at all?
The function already exists, I just made it do what it seems like it
should have been doing already.
> > @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
> >
> > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> > {
> > - int i, was_irq_disabled = pci_irq_disabled(d);
> > - uint32_t config_size = pci_config_size(d);
> > + int was_irq_disabled = pci_irq_disabled(d);
> >
> > if (pci_access_cap_config(d, addr, l)) {
> > d->cap.config_write(d, addr, val, l);
> > return;
> > }
> >
>
> I would like to also examine the need for _cap_
> functions. Why can assigned devices just do
>
> pci_default_write_config
> if (range_overlap(...msi)) {
> }
> if (range_overlap(...msix)) {
> }
> and then we could remove all the _cap_ extensions
> altogether?
I think that somewhere we need to track what capabilities are at what
offset, config space isn't a performance path, but that look horribly
inefficient and gets worse with more capabilities.
Why don't we define capability id 0xff as normal config space (first 64
bytes), then add the capability id to read/write_config (this is what
vfio does). Then the driver can split capability handling off from
their main functions if they want. Anyway, I think such an improvement
could be added incrementally later. Thanks,
Alex
> > - for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > - uint8_t wmask = d->wmask[addr + i];
> > - d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> > - }
> > + pci_write_config(d, addr, val, l);
> >
> > #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> > if (kvm_enabled() && kvm_irqchip_in_kernel() &&
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask
2010-11-12 6:03 ` Alex Williamson
@ 2010-11-12 8:48 ` Michael S. Tsirkin
2010-11-12 15:49 ` Alex Williamson
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-12 8:48 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Thu, Nov 11, 2010 at 11:03:19PM -0700, Alex Williamson wrote:
> On Fri, 2010-11-12 at 07:22 +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2010 at 07:55:01PM -0700, Alex Williamson wrote:
> > > Make use of wmask, just like the rest of config space.
> > >
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > >
> > > hw/pci.c | 19 ++++++++-----------
> > > 1 files changed, 8 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/hw/pci.c b/hw/pci.c
> > > index 92aaa85..12c47ac 100644
> > > --- a/hw/pci.c
> > > +++ b/hw/pci.c
> > > @@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
> > > return pci_read_config(d, address, len);
> > > }
> > >
> > > -static void pci_write_config(PCIDevice *pci_dev,
> > > - uint32_t address, uint32_t val, int len)
> > > +static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> > > {
> > > int i;
> > > - for (i = 0; i < len; i++) {
> > > - pci_dev->config[address + i] = val & 0xff;
> > > - val >>= 8;
> > > + uint32_t config_size = pci_config_size(d);
> > > +
> > > + for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > > + uint8_t wmask = d->wmask[addr + i];
> > > + d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> > > }
> > > }
> > >
> >
> >
> > Let's not name an internal static helper pci_write_config.
> > This is really update_config_by_mask or something like that.
> > But see below: maybe we don't need it at all?
>
> The function already exists, I just made it do what it seems like it
> should have been doing already.
Yep. But since you are rewriting this function - could you rename it as
well?
> > > @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
> > >
> > > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> > > {
> > > - int i, was_irq_disabled = pci_irq_disabled(d);
> > > - uint32_t config_size = pci_config_size(d);
> > > + int was_irq_disabled = pci_irq_disabled(d);
> > >
> > > if (pci_access_cap_config(d, addr, l)) {
> > > d->cap.config_write(d, addr, val, l);
> > > return;
> > > }
> > >
> >
> > I would like to also examine the need for _cap_
> > functions. Why can assigned devices just do
> >
> > pci_default_write_config
> > if (range_overlap(...msi)) {
> > }
> > if (range_overlap(...msix)) {
> > }
> > and then we could remove all the _cap_ extensions
> > altogether?
>
> I think that somewhere we need to track what capabilities are at what
> offset, config space isn't a performance path, but that look horribly
> inefficient and gets worse with more capabilities.
Looks like premature optimization to me. I guess when we get more than
say 8 capabilities to support, I'll start to worry.
Even then, these optimizations are better internal in pci core.
> Why don't we define capability id 0xff as normal config space (first 64
> bytes), then add the capability id to read/write_config (this is what
> vfio does). Then the driver can split capability handling off from
> their main functions if they want.
My feeling is we need higher level APIs than 'capability write'.
Otherwise we get the PCI config handling all over the place.
E.g. a callback when msi is enabled/disabled would make sense,
so that pci core can keep track of current state and only notify
the device when there are things to do.
> Anyway, I think such an improvement
> could be added incrementally later. Thanks,
>
> Alex
Sure.
> > > - for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > > - uint8_t wmask = d->wmask[addr + i];
> > > - d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> > > - }
> > > + pci_write_config(d, addr, val, l);
> > >
> > > #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> > > if (kvm_enabled() && kvm_irqchip_in_kernel() &&
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask
2010-11-12 8:48 ` Michael S. Tsirkin
@ 2010-11-12 15:49 ` Alex Williamson
2010-11-16 16:12 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 15:49 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: chrisw, qemu-devel, kvm
On Fri, 2010-11-12 at 10:48 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 11:03:19PM -0700, Alex Williamson wrote:
> > On Fri, 2010-11-12 at 07:22 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 11, 2010 at 07:55:01PM -0700, Alex Williamson wrote:
> > > > Make use of wmask, just like the rest of config space.
> > > >
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > ---
> > > >
> > > > hw/pci.c | 19 ++++++++-----------
> > > > 1 files changed, 8 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/hw/pci.c b/hw/pci.c
> > > > index 92aaa85..12c47ac 100644
> > > > --- a/hw/pci.c
> > > > +++ b/hw/pci.c
> > > > @@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d,
> > > > return pci_read_config(d, address, len);
> > > > }
> > > >
> > > > -static void pci_write_config(PCIDevice *pci_dev,
> > > > - uint32_t address, uint32_t val, int len)
> > > > +static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> > > > {
> > > > int i;
> > > > - for (i = 0; i < len; i++) {
> > > > - pci_dev->config[address + i] = val & 0xff;
> > > > - val >>= 8;
> > > > + uint32_t config_size = pci_config_size(d);
> > > > +
> > > > + for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > > > + uint8_t wmask = d->wmask[addr + i];
> > > > + d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> > > > }
> > > > }
> > > >
> > >
> > >
> > > Let's not name an internal static helper pci_write_config.
> > > This is really update_config_by_mask or something like that.
> > > But see below: maybe we don't need it at all?
> >
> > The function already exists, I just made it do what it seems like it
> > should have been doing already.
>
> Yep. But since you are rewriting this function - could you rename it as
> well?
Ok
> > > > @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
> > > >
> > > > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> > > > {
> > > > - int i, was_irq_disabled = pci_irq_disabled(d);
> > > > - uint32_t config_size = pci_config_size(d);
> > > > + int was_irq_disabled = pci_irq_disabled(d);
> > > >
> > > > if (pci_access_cap_config(d, addr, l)) {
> > > > d->cap.config_write(d, addr, val, l);
> > > > return;
> > > > }
> > > >
> > >
> > > I would like to also examine the need for _cap_
> > > functions. Why can assigned devices just do
> > >
> > > pci_default_write_config
> > > if (range_overlap(...msi)) {
> > > }
> > > if (range_overlap(...msix)) {
> > > }
> > > and then we could remove all the _cap_ extensions
> > > altogether?
> >
> > I think that somewhere we need to track what capabilities are at what
> > offset, config space isn't a performance path, but that look horribly
> > inefficient and gets worse with more capabilities.
>
> Looks like premature optimization to me. I guess when we get more than
> say 8 capabilities to support, I'll start to worry.
> Even then, these optimizations are better internal in pci core.
It's not just an optimization, as noted in another reply, we should be
using it to make sure we don't have collisions, and it simply makes the
callback code much cleaner to be able to do a switch statement instead
of a pile of 'if (ranges_overlap)', IMO.
> > Why don't we define capability id 0xff as normal config space (first 64
> > bytes), then add the capability id to read/write_config (this is what
> > vfio does). Then the driver can split capability handling off from
> > their main functions if they want.
>
> My feeling is we need higher level APIs than 'capability write'.
> Otherwise we get the PCI config handling all over the place.
> E.g. a callback when msi is enabled/disabled would make sense,
> so that pci core can keep track of current state and only notify
> the device when there are things to do.
I agree, but it's difficult to provide the flexibility to meet all the
needs. Device assignment might want to be called for more or less bit
flips than an emulated device, PM is probably a good example of this.
We could actually change state on a PMCSR write, but I'm not sure what
an emulated device would do. Does that mean we add a callback
specifically for that, or do we provide some generic interface that
drivers can register which bits they want to know about changing?
> > Anyway, I think such an improvement
> > could be added incrementally later. Thanks,
> >
> > Alex
>
> Sure.
>
> > > > - for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > > > - uint8_t wmask = d->wmask[addr + i];
> > > > - d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> > > > - }
> > > > + pci_write_config(d, addr, val, l);
> > > >
> > > > #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> > > > if (kvm_enabled() && kvm_irqchip_in_kernel() &&
> >
> >
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask
2010-11-12 15:49 ` Alex Williamson
@ 2010-11-16 16:12 ` Michael S. Tsirkin
0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-16 16:12 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Fri, Nov 12, 2010 at 08:49:20AM -0700, Alex Williamson wrote:
> > > > > @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
> > > > >
> > > > > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
> > > > > {
> > > > > - int i, was_irq_disabled = pci_irq_disabled(d);
> > > > > - uint32_t config_size = pci_config_size(d);
> > > > > + int was_irq_disabled = pci_irq_disabled(d);
> > > > >
> > > > > if (pci_access_cap_config(d, addr, l)) {
> > > > > d->cap.config_write(d, addr, val, l);
> > > > > return;
> > > > > }
> > > > >
> > > >
> > > > I would like to also examine the need for _cap_
> > > > functions. Why can assigned devices just do
> > > >
> > > > pci_default_write_config
> > > > if (range_overlap(...msi)) {
> > > > }
> > > > if (range_overlap(...msix)) {
> > > > }
> > > > and then we could remove all the _cap_ extensions
> > > > altogether?
> > >
> > > I think that somewhere we need to track what capabilities are at what
> > > offset, config space isn't a performance path, but that look horribly
> > > inefficient and gets worse with more capabilities.
> >
> > Looks like premature optimization to me. I guess when we get more than
> > say 8 capabilities to support, I'll start to worry.
> > Even then, these optimizations are better internal in pci core.
>
> It's not just an optimization, as noted in another reply, we should be
> using it to make sure we don't have collisions, and it simply makes the
> callback code much cleaner to be able to do a switch statement instead
> of a pile of 'if (ranges_overlap)', IMO.
Two if statements is not a pile :)
I think in the end we will have
a general pci handler dealing with all capabilities,
and device assignment would *maybe* deal with msi and msix.
> > > Why don't we define capability id 0xff as normal config space (first 64
> > > bytes), then add the capability id to read/write_config (this is what
> > > vfio does). Then the driver can split capability handling off from
> > > their main functions if they want.
> >
> > My feeling is we need higher level APIs than 'capability write'.
> > Otherwise we get the PCI config handling all over the place.
> > E.g. a callback when msi is enabled/disabled would make sense,
> > so that pci core can keep track of current state and only notify
> > the device when there are things to do.
>
> I agree, but it's difficult to provide the flexibility to meet all the
> needs. Device assignment might want to be called for more or less bit
> flips than an emulated device, PM is probably a good example of this.
> We could actually change state on a PMCSR write, but I'm not sure what
> an emulated device would do. Does that mean we add a callback
> specifically for that, or do we provide some generic interface that
> drivers can register which bits they want to know about changing?
I'm arguing for the PM callback. This way pci config decoding
is local in pci.c, others use high level APIs.
> > > Anyway, I think such an improvement
> > > could be added incrementally later. Thanks,
> > >
> > > Alex
> >
> > Sure.
> >
> > > > > - for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) {
> > > > > - uint8_t wmask = d->wmask[addr + i];
> > > > > - d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask);
> > > > > - }
> > > > > + pci_write_config(d, addr, val, l);
> > > > >
> > > > > #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> > > > > if (kvm_enabled() && kvm_irqchip_in_kernel() &&
> > >
> > >
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH 2/8] pci: Remove pci_enable_capability_support()
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
2010-11-12 2:55 ` [Qemu-devel] [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask Alex Williamson
@ 2010-11-12 2:55 ` Alex Williamson
2010-11-12 2:55 ` [Qemu-devel] [PATCH 3/8] device-assignment: Use PCI capabilities support Alex Williamson
` (6 subsequent siblings)
8 siblings, 0 replies; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 2:55 UTC (permalink / raw)
To: kvm; +Cc: chrisw, alex.williamson, qemu-devel, mst
This interface doesn't make much sense, adding a capability can
take care of everything, just provide a means to register
capability read/write handlers.
Device assignment does it's own thing, so requires a couple
ugly hacks that will be cleaned by subsequent patches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/device-assignment.c | 12 ++++++++---
hw/pci.c | 52 +++++++++++++++++++++---------------------------
hw/pci.h | 9 +++-----
3 files changed, 35 insertions(+), 38 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index bde231d..311c197 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1292,7 +1292,12 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
PCIRegion *pci_region = dev->real_device.regions;
int next_cap_pt = 0;
+ pci_dev->cap.supported = 1;
+ pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
pci_dev->cap.length = 0;
+ pci_dev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+ pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start;
+
#ifdef KVM_CAP_IRQ_ROUTING
#ifdef KVM_CAP_DEVICE_MSI
/* Expose MSI capability
@@ -1468,9 +1473,10 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
dev->h_busnr = dev->host.bus;
dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
- if (pci_enable_capability_support(pci_dev, 0, NULL,
- assigned_device_pci_cap_write_config,
- assigned_device_pci_cap_init) < 0)
+ pci_register_capability_handlers(pci_dev, NULL,
+ assigned_device_pci_cap_write_config);
+
+ if (assigned_device_pci_cap_init(pci_dev) < 0)
goto out;
/* assign device to guest */
diff --git a/hw/pci.c b/hw/pci.c
index 12c47ac..6b2b320 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -787,6 +787,8 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
config_write = pci_default_write_config;
pci_dev->config_read = config_read;
pci_dev->config_write = config_write;
+ pci_dev->cap.config_read = pci_default_cap_read_config;
+ pci_dev->cap.config_write = pci_default_cap_write_config;
bus->devices[devfn] = pci_dev;
pci_dev->irq = qemu_allocate_irqs(pci_set_irq, pci_dev, PCI_NUM_PINS);
pci_dev->version_id = 2; /* Current pci device vmstate version */
@@ -1893,35 +1895,21 @@ PCIDevice *pci_create_simple_multifunction(PCIBus *bus, int devfn,
return dev;
}
-int pci_enable_capability_support(PCIDevice *pci_dev,
- uint32_t config_start,
- PCICapConfigReadFunc *config_read,
- PCICapConfigWriteFunc *config_write,
- PCICapConfigInitFunc *config_init)
+void pci_register_capability_handlers(PCIDevice *pdev,
+ PCICapConfigReadFunc *config_read,
+ PCICapConfigWriteFunc *config_write)
{
- if (!pci_dev)
- return -ENODEV;
-
- pci_dev->config[0x06] |= 0x10; // status = capabilities
-
- if (config_start == 0)
- pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
- else if (config_start >= 0x40 && config_start < 0xff)
- pci_dev->cap.start = config_start;
- else
- return -EINVAL;
+ if (config_read) {
+ pdev->cap.config_read = config_read;
+ } else {
+ pdev->cap.config_read = pci_default_cap_read_config;
+ }
- if (config_read)
- pci_dev->cap.config_read = config_read;
- else
- pci_dev->cap.config_read = pci_default_cap_read_config;
- if (config_write)
- pci_dev->cap.config_write = config_write;
- else
- pci_dev->cap.config_write = pci_default_cap_write_config;
- pci_dev->cap.supported = 1;
- pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start;
- return config_init(pci_dev);
+ if (config_write) {
+ pdev->cap.config_write = config_write;
+ } else {
+ pdev->cap.config_write = pci_default_cap_write_config;
+ }
}
PCIDevice *pci_create(PCIBus *bus, int devfn, const char *name)
@@ -2045,12 +2033,16 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id,
config[PCI_CAP_LIST_ID] = cap_id;
config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
pdev->config[PCI_CAPABILITY_LIST] = offset;
- pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
memset(pdev->used + offset, 0xFF, size);
/* Make capability read-only by default */
memset(pdev->wmask + offset, 0, size);
/* Check capability by default */
memset(pdev->cmask + offset, 0xFF, size);
+
+ pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+ pdev->cap.supported = 1;
+ pdev->cap.start = pdev->cap.start ? MIN(pdev->cap.start, offset) : offset;
+
return offset;
}
@@ -2078,8 +2070,10 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
memset(pdev->cmask + offset, 0, size);
memset(pdev->used + offset, 0, size);
- if (!pdev->config[PCI_CAPABILITY_LIST])
+ if (!pdev->config[PCI_CAPABILITY_LIST]) {
pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
+ pdev->cap.start = pdev->cap.length = 0;
+ }
}
/* Reserve space for capability at a known offset (to call after load). */
diff --git a/hw/pci.h b/hw/pci.h
index 334e928..0712e55 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -87,7 +87,6 @@ typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev,
uint32_t address, uint32_t val, int len);
typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev,
uint32_t address, int len);
-typedef int PCICapConfigInitFunc(PCIDevice *pci_dev);
typedef struct PCIIORegion {
pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
@@ -228,11 +227,9 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
void pci_map_option_rom(PCIDevice *pdev, int region_num, pcibus_t addr,
pcibus_t size, int type);
-int pci_enable_capability_support(PCIDevice *pci_dev,
- uint32_t config_start,
- PCICapConfigReadFunc *config_read,
- PCICapConfigWriteFunc *config_write,
- PCICapConfigInitFunc *config_init);
+void pci_register_capability_handlers(PCIDevice *pci_dev,
+ PCICapConfigReadFunc *config_read,
+ PCICapConfigWriteFunc *config_write);
int pci_map_irq(PCIDevice *pci_dev, int pin);
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH 3/8] device-assignment: Use PCI capabilities support
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
2010-11-12 2:55 ` [Qemu-devel] [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask Alex Williamson
2010-11-12 2:55 ` [Qemu-devel] [PATCH 2/8] pci: Remove pci_enable_capability_support() Alex Williamson
@ 2010-11-12 2:55 ` Alex Williamson
2010-11-12 2:55 ` [Qemu-devel] [PATCH 4/8] pci: Replace used bitmap with capability byte map Alex Williamson
` (5 subsequent siblings)
8 siblings, 0 replies; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 2:55 UTC (permalink / raw)
To: kvm; +Cc: chrisw, alex.williamson, qemu-devel, mst
Convert to use common pci_add_capabilities() rather than creating
our own mess.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/device-assignment.c | 112 +++++++++++++++++++++++++++---------------------
1 files changed, 63 insertions(+), 49 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 311c197..74cdd26 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -38,6 +38,7 @@
#include "device-assignment.h"
#include "loader.h"
#include "monitor.h"
+#include "range.h"
#include <pci/header.h>
/* From linux/ioport.h */
@@ -1075,17 +1076,17 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos)
}
if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) {
+ int pos = ctrl_pos - PCI_MSI_FLAGS;
assigned_dev->entry = calloc(1, sizeof(struct kvm_irq_routing_entry));
if (!assigned_dev->entry) {
perror("assigned_dev_update_msi: ");
return;
}
assigned_dev->entry->u.msi.address_lo =
- *(uint32_t *)(pci_dev->config + pci_dev->cap.start +
- PCI_MSI_ADDRESS_LO);
+ pci_get_long(pci_dev->config + pos + PCI_MSI_ADDRESS_LO);
assigned_dev->entry->u.msi.address_hi = 0;
- assigned_dev->entry->u.msi.data = *(uint16_t *)(pci_dev->config +
- pci_dev->cap.start + PCI_MSI_DATA_32);
+ assigned_dev->entry->u.msi.data =
+ pci_get_word(pci_dev->config + pos + PCI_MSI_DATA_32);
assigned_dev->entry->type = KVM_IRQ_ROUTING_MSI;
r = kvm_get_irq_route_gsi();
if (r < 0) {
@@ -1123,10 +1124,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
struct kvm_assigned_msix_entry msix_entry;
void *va = adev->msix_table_page;
- if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
- pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
- else
- pos = pci_dev->cap.start;
+ pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2);
entries_max_nr &= PCI_MSIX_TABSIZE;
@@ -1260,26 +1258,23 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
uint32_t val, int len)
{
AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
- unsigned int pos = pci_dev->cap.start, ctrl_pos;
pci_default_cap_write_config(pci_dev, address, val, len);
#ifdef KVM_CAP_IRQ_ROUTING
#ifdef KVM_CAP_DEVICE_MSI
if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
- ctrl_pos = pos + PCI_MSI_FLAGS;
- if (address <= ctrl_pos && address + len > ctrl_pos)
- assigned_dev_update_msi(pci_dev, ctrl_pos);
- pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+ int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSI);
+ if (ranges_overlap(address, len, pos + PCI_MSI_FLAGS, 1)) {
+ assigned_dev_update_msi(pci_dev, pos + PCI_MSI_FLAGS);
+ }
}
#endif
#ifdef KVM_CAP_DEVICE_MSIX
if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
- ctrl_pos = pos + 3;
- if (address <= ctrl_pos && address + len > ctrl_pos) {
- ctrl_pos--; /* control is word long */
- assigned_dev_update_msix(pci_dev, ctrl_pos);
+ int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
+ if (ranges_overlap(address, len, pos + PCI_MSIX_FLAGS + 1, 1)) {
+ assigned_dev_update_msix(pci_dev, pos + PCI_MSIX_FLAGS);
}
- pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
}
#endif
#endif
@@ -1290,58 +1285,77 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
{
AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
PCIRegion *pci_region = dev->real_device.regions;
- int next_cap_pt = 0;
- pci_dev->cap.supported = 1;
- pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR;
+ /* Clear initial capabilities pointer and status copied from hw */
+ pci_set_byte(pci_dev->config + PCI_CAPABILITY_LIST, 0);
+ pci_set_word(pci_dev->config + PCI_STATUS,
+ pci_get_word(pci_dev->config + PCI_STATUS) &
+ ~PCI_STATUS_CAP_LIST);
+
pci_dev->cap.length = 0;
- pci_dev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
- pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start;
#ifdef KVM_CAP_IRQ_ROUTING
#ifdef KVM_CAP_DEVICE_MSI
/* Expose MSI capability
* MSI capability is the 1st capability in capability config */
if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
+ int vpos, ppos;
+ uint16_t flags;
+
dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
- memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
- 0, PCI_CAPABILITY_CONFIG_MSI_LENGTH);
- pci_dev->config[pci_dev->cap.start + pci_dev->cap.length] =
- PCI_CAP_ID_MSI;
+ vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSI,
+ PCI_CAPABILITY_CONFIG_MSI_LENGTH);
+
+ memset(pci_dev->config + vpos + PCI_CAP_FLAGS, 0,
+ PCI_CAPABILITY_CONFIG_MSI_LENGTH - PCI_CAP_FLAGS);
+
+ /* Only 32-bit/no-mask currently supported */
+ ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI);
+ flags = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSI_FLAGS);
+ flags &= PCI_MSI_FLAGS_QMASK;
+ pci_set_word(pci_dev->config + vpos + PCI_MSI_FLAGS, flags);
+
+ /* Set writable fields */
+ pci_set_word(pci_dev->wmask + vpos + PCI_MSI_FLAGS,
+ PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
+ pci_set_long(pci_dev->wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffffffc);
+ pci_set_long(pci_dev->wmask + vpos + PCI_MSI_DATA_32, 0xffff);
pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
- next_cap_pt = 1;
}
#endif
#ifdef KVM_CAP_DEVICE_MSIX
/* Expose MSI-X capability */
if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX)) {
- int pos, entry_nr, bar_nr;
+ int vpos, ppos, entry_nr, bar_nr;
uint32_t msix_table_entry;
+
dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
- memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length],
- 0, PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
- pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX);
- entry_nr = assigned_dev_pci_read_word(pci_dev, pos + 2) &
- PCI_MSIX_TABSIZE;
- pci_dev->config[pci_dev->cap.start + pci_dev->cap.length] = 0x11;
- *(uint16_t *)(pci_dev->config + pci_dev->cap.start +
- pci_dev->cap.length + 2) = entry_nr;
+ vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX,
+ PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
+
+ memset(pci_dev->config + vpos + PCI_CAP_FLAGS, 0,
+ PCI_CAPABILITY_CONFIG_MSIX_LENGTH - PCI_CAP_FLAGS);
+
+ /* Only enable and function mask bits are writable */
+ pci_set_word(pci_dev->wmask + vpos + PCI_MSIX_FLAGS,
+ PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
+
+ ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX);
+
+ entry_nr = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSIX_FLAGS);
+ entry_nr &= PCI_MSIX_TABSIZE;
+ pci_set_word(pci_dev->config + vpos + PCI_MSIX_FLAGS, entry_nr);
+
msix_table_entry = assigned_dev_pci_read_long(pci_dev,
- pos + PCI_MSIX_TABLE);
- *(uint32_t *)(pci_dev->config + pci_dev->cap.start +
- pci_dev->cap.length + PCI_MSIX_TABLE) = msix_table_entry;
- *(uint32_t *)(pci_dev->config + pci_dev->cap.start +
- pci_dev->cap.length + PCI_MSIX_PBA) =
- assigned_dev_pci_read_long(pci_dev, pos + PCI_MSIX_PBA);
+ ppos + PCI_MSIX_TABLE);
+ pci_set_long(pci_dev->config + vpos + PCI_MSIX_TABLE, msix_table_entry);
+
+ pci_set_long(pci_dev->config + vpos + PCI_MSIX_PBA,
+ assigned_dev_pci_read_long(pci_dev, ppos + PCI_MSIX_PBA));
+
bar_nr = msix_table_entry & PCI_MSIX_BIR;
msix_table_entry &= ~PCI_MSIX_BIR;
dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
- if (next_cap_pt != 0) {
- pci_dev->config[pci_dev->cap.start + next_cap_pt] =
- pci_dev->cap.start + pci_dev->cap.length;
- next_cap_pt += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
- } else
- next_cap_pt = 1;
pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
}
#endif
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH 4/8] pci: Replace used bitmap with capability byte map
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
` (2 preceding siblings ...)
2010-11-12 2:55 ` [Qemu-devel] [PATCH 3/8] device-assignment: Use PCI capabilities support Alex Williamson
@ 2010-11-12 2:55 ` Alex Williamson
2010-11-12 5:40 ` [Qemu-devel] " Michael S. Tsirkin
2010-11-12 2:55 ` [Qemu-devel] [PATCH 5/8] pci: Remove cap.length, cap.start, cap.supported Alex Williamson
` (4 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 2:55 UTC (permalink / raw)
To: kvm; +Cc: chrisw, alex.williamson, qemu-devel, mst
Capabilities are allocated in bytes, so we can track both whether
a byte is used and by what capability in the same structure.
Remove pci_reserve_capability() as there are no users.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/pci.c | 16 +++++-----------
hw/pci.h | 6 ++----
2 files changed, 7 insertions(+), 15 deletions(-)
diff --git a/hw/pci.c b/hw/pci.c
index 6b2b320..e1e8a77 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
pci_dev->config = qemu_mallocz(config_size);
pci_dev->cmask = qemu_mallocz(config_size);
pci_dev->wmask = qemu_mallocz(config_size);
- pci_dev->used = qemu_mallocz(config_size);
+ pci_dev->cap_map = qemu_mallocz(config_size);
}
static void pci_config_free(PCIDevice *pci_dev)
@@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev)
qemu_free(pci_dev->config);
qemu_free(pci_dev->cmask);
qemu_free(pci_dev->wmask);
- qemu_free(pci_dev->used);
+ qemu_free(pci_dev->cap_map);
}
/* -1 for devfn means auto assign */
@@ -1928,7 +1928,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
int offset = PCI_CONFIG_HEADER_SIZE;
int i;
for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
- if (pdev->used[i])
+ if (pdev->cap_map[i])
offset = i + 1;
else if (i - offset + 1 == size)
return offset;
@@ -2033,7 +2033,7 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id,
config[PCI_CAP_LIST_ID] = cap_id;
config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
pdev->config[PCI_CAPABILITY_LIST] = offset;
- memset(pdev->used + offset, 0xFF, size);
+ memset(pdev->cap_map + offset, cap_id, size);
/* Make capability read-only by default */
memset(pdev->wmask + offset, 0, size);
/* Check capability by default */
@@ -2068,7 +2068,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
memset(pdev->wmask + offset, 0xff, size);
/* Clear cmask as device-specific registers can't be checked */
memset(pdev->cmask + offset, 0, size);
- memset(pdev->used + offset, 0, size);
+ memset(pdev->cap_map + offset, 0, size);
if (!pdev->config[PCI_CAPABILITY_LIST]) {
pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
@@ -2076,12 +2076,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
}
}
-/* Reserve space for capability at a known offset (to call after load). */
-void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
-{
- memset(pdev->used + offset, 0xff, size);
-}
-
uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
{
return pci_find_capability_list(pdev, cap_id, NULL);
diff --git a/hw/pci.h b/hw/pci.h
index 0712e55..2265c70 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -151,8 +151,8 @@ struct PCIDevice {
/* Used to implement R/W bytes */
uint8_t *wmask;
- /* Used to allocate config space for capabilities. */
- uint8_t *used;
+ /* Used to allocate config space and track capabilities. */
+ uint8_t *cap_map;
/* the following fields are read only */
PCIBus *bus;
@@ -239,8 +239,6 @@ int pci_add_capability_at_offset(PCIDevice *pci_dev, uint8_t cap_id,
void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
-void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
-
uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
uint32_t pci_default_read_config(PCIDevice *d,
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map
2010-11-12 2:55 ` [Qemu-devel] [PATCH 4/8] pci: Replace used bitmap with capability byte map Alex Williamson
@ 2010-11-12 5:40 ` Michael S. Tsirkin
2010-11-12 6:07 ` Alex Williamson
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-12 5:40 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote:
> Capabilities are allocated in bytes, so we can track both whether
> a byte is used and by what capability in the same structure.
>
> Remove pci_reserve_capability() as there are no users.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
I actually wanted to remove the used array completely and ask
all users to add offsets directly.
Will this be needed then?
> ---
>
> hw/pci.c | 16 +++++-----------
> hw/pci.h | 6 ++----
> 2 files changed, 7 insertions(+), 15 deletions(-)
>
> diff --git a/hw/pci.c b/hw/pci.c
> index 6b2b320..e1e8a77 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
> pci_dev->config = qemu_mallocz(config_size);
> pci_dev->cmask = qemu_mallocz(config_size);
> pci_dev->wmask = qemu_mallocz(config_size);
> - pci_dev->used = qemu_mallocz(config_size);
> + pci_dev->cap_map = qemu_mallocz(config_size);
> }
>
> static void pci_config_free(PCIDevice *pci_dev)
> @@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev)
> qemu_free(pci_dev->config);
> qemu_free(pci_dev->cmask);
> qemu_free(pci_dev->wmask);
> - qemu_free(pci_dev->used);
> + qemu_free(pci_dev->cap_map);
> }
>
> /* -1 for devfn means auto assign */
> @@ -1928,7 +1928,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
> int offset = PCI_CONFIG_HEADER_SIZE;
> int i;
> for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
> - if (pdev->used[i])
> + if (pdev->cap_map[i])
> offset = i + 1;
> else if (i - offset + 1 == size)
> return offset;
> @@ -2033,7 +2033,7 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id,
> config[PCI_CAP_LIST_ID] = cap_id;
> config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> pdev->config[PCI_CAPABILITY_LIST] = offset;
> - memset(pdev->used + offset, 0xFF, size);
> + memset(pdev->cap_map + offset, cap_id, size);
> /* Make capability read-only by default */
> memset(pdev->wmask + offset, 0, size);
> /* Check capability by default */
> @@ -2068,7 +2068,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
> memset(pdev->wmask + offset, 0xff, size);
> /* Clear cmask as device-specific registers can't be checked */
> memset(pdev->cmask + offset, 0, size);
> - memset(pdev->used + offset, 0, size);
> + memset(pdev->cap_map + offset, 0, size);
>
> if (!pdev->config[PCI_CAPABILITY_LIST]) {
> pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
> @@ -2076,12 +2076,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
> }
> }
>
> -/* Reserve space for capability at a known offset (to call after load). */
> -void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
> -{
> - memset(pdev->used + offset, 0xff, size);
> -}
> -
> uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
> {
> return pci_find_capability_list(pdev, cap_id, NULL);
> diff --git a/hw/pci.h b/hw/pci.h
> index 0712e55..2265c70 100644
> --- a/hw/pci.h
> +++ b/hw/pci.h
> @@ -151,8 +151,8 @@ struct PCIDevice {
> /* Used to implement R/W bytes */
> uint8_t *wmask;
>
> - /* Used to allocate config space for capabilities. */
> - uint8_t *used;
> + /* Used to allocate config space and track capabilities. */
> + uint8_t *cap_map;
>
> /* the following fields are read only */
> PCIBus *bus;
> @@ -239,8 +239,6 @@ int pci_add_capability_at_offset(PCIDevice *pci_dev, uint8_t cap_id,
>
> void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
>
> -void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
> -
> uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
>
> uint32_t pci_default_read_config(PCIDevice *d,
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map
2010-11-12 5:40 ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-11-12 6:07 ` Alex Williamson
2010-11-12 9:02 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 6:07 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: chrisw, qemu-devel, kvm
On Fri, 2010-11-12 at 07:40 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote:
> > Capabilities are allocated in bytes, so we can track both whether
> > a byte is used and by what capability in the same structure.
> >
> > Remove pci_reserve_capability() as there are no users.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>
> I actually wanted to remove the used array completely and ask
> all users to add offsets directly.
> Will this be needed then?
Can you give an example, I don't understand what you mean by asking
users to add offsets directly. I think some kind of tracking what's
where in config space needs to be done somewhere and the common PCI code
seems like it'd be the place. Thanks,
Alex
> > ---
> >
> > hw/pci.c | 16 +++++-----------
> > hw/pci.h | 6 ++----
> > 2 files changed, 7 insertions(+), 15 deletions(-)
> >
> > diff --git a/hw/pci.c b/hw/pci.c
> > index 6b2b320..e1e8a77 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev)
> > pci_dev->config = qemu_mallocz(config_size);
> > pci_dev->cmask = qemu_mallocz(config_size);
> > pci_dev->wmask = qemu_mallocz(config_size);
> > - pci_dev->used = qemu_mallocz(config_size);
> > + pci_dev->cap_map = qemu_mallocz(config_size);
> > }
> >
> > static void pci_config_free(PCIDevice *pci_dev)
> > @@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev)
> > qemu_free(pci_dev->config);
> > qemu_free(pci_dev->cmask);
> > qemu_free(pci_dev->wmask);
> > - qemu_free(pci_dev->used);
> > + qemu_free(pci_dev->cap_map);
> > }
> >
> > /* -1 for devfn means auto assign */
> > @@ -1928,7 +1928,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size)
> > int offset = PCI_CONFIG_HEADER_SIZE;
> > int i;
> > for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i)
> > - if (pdev->used[i])
> > + if (pdev->cap_map[i])
> > offset = i + 1;
> > else if (i - offset + 1 == size)
> > return offset;
> > @@ -2033,7 +2033,7 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id,
> > config[PCI_CAP_LIST_ID] = cap_id;
> > config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> > pdev->config[PCI_CAPABILITY_LIST] = offset;
> > - memset(pdev->used + offset, 0xFF, size);
> > + memset(pdev->cap_map + offset, cap_id, size);
> > /* Make capability read-only by default */
> > memset(pdev->wmask + offset, 0, size);
> > /* Check capability by default */
> > @@ -2068,7 +2068,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
> > memset(pdev->wmask + offset, 0xff, size);
> > /* Clear cmask as device-specific registers can't be checked */
> > memset(pdev->cmask + offset, 0, size);
> > - memset(pdev->used + offset, 0, size);
> > + memset(pdev->cap_map + offset, 0, size);
> >
> > if (!pdev->config[PCI_CAPABILITY_LIST]) {
> > pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
> > @@ -2076,12 +2076,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
> > }
> > }
> >
> > -/* Reserve space for capability at a known offset (to call after load). */
> > -void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
> > -{
> > - memset(pdev->used + offset, 0xff, size);
> > -}
> > -
> > uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
> > {
> > return pci_find_capability_list(pdev, cap_id, NULL);
> > diff --git a/hw/pci.h b/hw/pci.h
> > index 0712e55..2265c70 100644
> > --- a/hw/pci.h
> > +++ b/hw/pci.h
> > @@ -151,8 +151,8 @@ struct PCIDevice {
> > /* Used to implement R/W bytes */
> > uint8_t *wmask;
> >
> > - /* Used to allocate config space for capabilities. */
> > - uint8_t *used;
> > + /* Used to allocate config space and track capabilities. */
> > + uint8_t *cap_map;
> >
> > /* the following fields are read only */
> > PCIBus *bus;
> > @@ -239,8 +239,6 @@ int pci_add_capability_at_offset(PCIDevice *pci_dev, uint8_t cap_id,
> >
> > void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
> >
> > -void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size);
> > -
> > uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id);
> >
> > uint32_t pci_default_read_config(PCIDevice *d,
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map
2010-11-12 6:07 ` Alex Williamson
@ 2010-11-12 9:02 ` Michael S. Tsirkin
2010-11-12 15:32 ` Alex Williamson
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-12 9:02 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Thu, Nov 11, 2010 at 11:07:15PM -0700, Alex Williamson wrote:
> On Fri, 2010-11-12 at 07:40 +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote:
> > > Capabilities are allocated in bytes, so we can track both whether
> > > a byte is used and by what capability in the same structure.
> > >
> > > Remove pci_reserve_capability() as there are no users.
> > >
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >
> > I actually wanted to remove the used array completely and ask
> > all users to add offsets directly.
> > Will this be needed then?
>
> Can you give an example, I don't understand what you mean by asking
> users to add offsets directly.
Here's a dump of patch in progress.
Something like the below (untested). After applying this we can remove
the whole used array allocator.
> I think some kind of tracking what's
> where in config space needs to be done somewhere and the common PCI code
> seems like it'd be the place.
Why do we need it? config lets us scan the capability list
readily enough. I had this idea that we should pack
capabilities in config space, but now I think it's silly.
> Thanks,
>
> Alex
pci: remove config space allocator
pci supports allocating caps in config space so they are packed tightly.
This doesn't seem to be useful, especially since caps must stay at fixed
offsets to ensure backwards compatibility (e.g. for migration).
Remove this support, and make virtio-pci supply the offset to
the MSIX capability explicitly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/hw/msix.c b/hw/msix.c
index f66d255..6fd3791 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -52,7 +52,7 @@ int msix_supported;
* Original bar size must be a power of 2 or 0.
* New bar size is returned. */
static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
- unsigned bar_nr, unsigned bar_size)
+ unsigned offset, unsigned bar_nr, unsigned bar_size)
{
int config_offset;
uint8_t *config;
@@ -75,7 +75,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
pdev->msix_bar_size = new_size;
config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
- 0, MSIX_CAP_LENGTH);
+ offset, MSIX_CAP_LENGTH);
if (config_offset < 0)
return config_offset;
config = pdev->config + config_offset;
@@ -237,7 +237,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
/* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
* modified, it should be retrieved with msix_bar_size. */
int msix_init(struct PCIDevice *dev, unsigned short nentries,
- unsigned bar_nr, unsigned bar_size)
+ unsigned offset, unsigned bar_nr, unsigned bar_size)
{
int ret;
/* Nothing to do if MSI is not supported by interrupt controller */
@@ -261,7 +261,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
}
dev->msix_entries_nr = nentries;
- ret = msix_add_config(dev, nentries, bar_nr, bar_size);
+ ret = msix_add_config(dev, nentries, offset, bar_nr, bar_size);
if (ret)
goto err_config;
diff --git a/hw/msix.h b/hw/msix.h
index a9f7993..b61e42e 100644
--- a/hw/msix.h
+++ b/hw/msix.h
@@ -5,7 +5,7 @@
#include "pci.h"
int msix_init(PCIDevice *pdev, unsigned short nentries,
- unsigned bar_nr, unsigned bar_size);
+ unsigned offset, unsigned bar_nr, unsigned bar_size);
void msix_write_config(PCIDevice *pci_dev, uint32_t address,
uint32_t val, int len);
diff --git a/hw/pci.c b/hw/pci.c
index aed2d42..e411f12 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1737,12 +1737,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
uint8_t offset, uint8_t size)
{
uint8_t *config;
- if (!offset) {
- offset = pci_find_space(pdev, size);
- if (!offset) {
- return -ENOSPC;
- }
- }
+ assert(offset >= PCI_CONFIG_HEADER_SIZE);
config = pdev->config + offset;
config[PCI_CAP_LIST_ID] = cap_id;
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 729917d..bcb0266 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -543,7 +543,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
config[0x3d] = 1;
- if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors, 1, 0)) {
+ if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
+ PCI_CONFIG_HEADER_SIZE, 1, 0)) {
pci_register_bar(&proxy->pci_dev, 1,
msix_bar_size(&proxy->pci_dev),
PCI_BASE_ADDRESS_SPACE_MEMORY,
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map
2010-11-12 9:02 ` Michael S. Tsirkin
@ 2010-11-12 15:32 ` Alex Williamson
0 siblings, 0 replies; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 15:32 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: chrisw, qemu-devel, kvm
On Fri, 2010-11-12 at 11:02 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 11:07:15PM -0700, Alex Williamson wrote:
> > On Fri, 2010-11-12 at 07:40 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote:
> > > > Capabilities are allocated in bytes, so we can track both whether
> > > > a byte is used and by what capability in the same structure.
> > > >
> > > > Remove pci_reserve_capability() as there are no users.
> > > >
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > >
> > > I actually wanted to remove the used array completely and ask
> > > all users to add offsets directly.
> > > Will this be needed then?
> >
> > Can you give an example, I don't understand what you mean by asking
> > users to add offsets directly.
>
> Here's a dump of patch in progress.
> Something like the below (untested). After applying this we can remove
> the whole used array allocator.
>
> > I think some kind of tracking what's
> > where in config space needs to be done somewhere and the common PCI code
> > seems like it'd be the place.
>
> Why do we need it? config lets us scan the capability list
> readily enough. I had this idea that we should pack
> capabilities in config space, but now I think it's silly.
>
> > Thanks,
> >
> > Alex
>
> pci: remove config space allocator
>
> pci supports allocating caps in config space so they are packed tightly.
> This doesn't seem to be useful, especially since caps must stay at fixed
> offsets to ensure backwards compatibility (e.g. for migration).
>
> Remove this support, and make virtio-pci supply the offset to
> the MSIX capability explicitly.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
So add_capability effectively becomes the add_capability_at_offset that
we have in the qemu-kvm tree. I think that's fine, but doesn't
necessarily support removing the cap_map. A direct config offset to
capability id map is pretty useful for device assignment, which only
uses add_capability_at_offset after these patches. We might also want
to use it to check drivers to make sure they're not overlapping caps.
For instance in the below, we now have core virtio code forcing an
offset for the msix cap. What if the driver already placed something at
that offset, or what if the driver wants to add capability foo, it
should at least error if it tries to place it over something else in the
list.
Alex
> diff --git a/hw/msix.c b/hw/msix.c
> index f66d255..6fd3791 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -52,7 +52,7 @@ int msix_supported;
> * Original bar size must be a power of 2 or 0.
> * New bar size is returned. */
> static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
> - unsigned bar_nr, unsigned bar_size)
> + unsigned offset, unsigned bar_nr, unsigned bar_size)
> {
> int config_offset;
> uint8_t *config;
> @@ -75,7 +75,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
>
> pdev->msix_bar_size = new_size;
> config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX,
> - 0, MSIX_CAP_LENGTH);
> + offset, MSIX_CAP_LENGTH);
> if (config_offset < 0)
> return config_offset;
> config = pdev->config + config_offset;
> @@ -237,7 +237,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
> /* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is
> * modified, it should be retrieved with msix_bar_size. */
> int msix_init(struct PCIDevice *dev, unsigned short nentries,
> - unsigned bar_nr, unsigned bar_size)
> + unsigned offset, unsigned bar_nr, unsigned bar_size)
> {
> int ret;
> /* Nothing to do if MSI is not supported by interrupt controller */
> @@ -261,7 +261,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
> }
>
> dev->msix_entries_nr = nentries;
> - ret = msix_add_config(dev, nentries, bar_nr, bar_size);
> + ret = msix_add_config(dev, nentries, offset, bar_nr, bar_size);
> if (ret)
> goto err_config;
>
> diff --git a/hw/msix.h b/hw/msix.h
> index a9f7993..b61e42e 100644
> --- a/hw/msix.h
> +++ b/hw/msix.h
> @@ -5,7 +5,7 @@
> #include "pci.h"
>
> int msix_init(PCIDevice *pdev, unsigned short nentries,
> - unsigned bar_nr, unsigned bar_size);
> + unsigned offset, unsigned bar_nr, unsigned bar_size);
>
> void msix_write_config(PCIDevice *pci_dev, uint32_t address,
> uint32_t val, int len);
> diff --git a/hw/pci.c b/hw/pci.c
> index aed2d42..e411f12 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1737,12 +1737,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> uint8_t offset, uint8_t size)
> {
> uint8_t *config;
> - if (!offset) {
> - offset = pci_find_space(pdev, size);
> - if (!offset) {
> - return -ENOSPC;
> - }
> - }
> + assert(offset >= PCI_CONFIG_HEADER_SIZE);
>
> config = pdev->config + offset;
> config[PCI_CAP_LIST_ID] = cap_id;
> diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
> index 729917d..bcb0266 100644
> --- a/hw/virtio-pci.c
> +++ b/hw/virtio-pci.c
> @@ -543,7 +543,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
>
> config[0x3d] = 1;
>
> - if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors, 1, 0)) {
> + if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors,
> + PCI_CONFIG_HEADER_SIZE, 1, 0)) {
> pci_register_bar(&proxy->pci_dev, 1,
> msix_bar_size(&proxy->pci_dev),
> PCI_BASE_ADDRESS_SPACE_MEMORY,
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH 5/8] pci: Remove cap.length, cap.start, cap.supported
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
` (3 preceding siblings ...)
2010-11-12 2:55 ` [Qemu-devel] [PATCH 4/8] pci: Replace used bitmap with capability byte map Alex Williamson
@ 2010-11-12 2:55 ` Alex Williamson
2010-11-12 2:56 ` [Qemu-devel] [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware Alex Williamson
` (3 subsequent siblings)
8 siblings, 0 replies; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 2:55 UTC (permalink / raw)
To: kvm; +Cc: chrisw, alex.williamson, qemu-devel, mst
Capabilities aren't required to be contiguous, so cap.length never
really made much sense. Likewise, cap.start is mostly meaningless
too. Both of these are better served by the capability map. We
can also get rid of cap.supported, since it's really now unused
and redundant with flag in the status word anyway.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/device-assignment.c | 4 ----
hw/pci.c | 8 +-------
hw/pci.h | 2 --
3 files changed, 1 insertions(+), 13 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 74cdd26..322fa9f 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1292,8 +1292,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
pci_get_word(pci_dev->config + PCI_STATUS) &
~PCI_STATUS_CAP_LIST);
- pci_dev->cap.length = 0;
-
#ifdef KVM_CAP_IRQ_ROUTING
#ifdef KVM_CAP_DEVICE_MSI
/* Expose MSI capability
@@ -1320,7 +1318,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
pci_set_long(pci_dev->wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffffffc);
pci_set_long(pci_dev->wmask + vpos + PCI_MSI_DATA_32, 0xffff);
- pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
}
#endif
#ifdef KVM_CAP_DEVICE_MSIX
@@ -1356,7 +1353,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
bar_nr = msix_table_entry & PCI_MSIX_BIR;
msix_table_entry &= ~PCI_MSIX_BIR;
dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
- pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
}
#endif
#endif
diff --git a/hw/pci.c b/hw/pci.c
index e1e8a77..d566e33 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1190,10 +1190,7 @@ static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len)
{
- if (pci_dev->cap.supported && address >= pci_dev->cap.start &&
- (address + len) < pci_dev->cap.start + pci_dev->cap.length)
- return 1;
- return 0;
+ return pci_dev->cap_map[address];
}
uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
@@ -2040,8 +2037,6 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id,
memset(pdev->cmask + offset, 0xFF, size);
pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
- pdev->cap.supported = 1;
- pdev->cap.start = pdev->cap.start ? MIN(pdev->cap.start, offset) : offset;
return offset;
}
@@ -2072,7 +2067,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
if (!pdev->config[PCI_CAPABILITY_LIST]) {
pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
- pdev->cap.start = pdev->cap.length = 0;
}
}
diff --git a/hw/pci.h b/hw/pci.h
index 2265c70..177008a 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -208,8 +208,6 @@ struct PCIDevice {
/* Device capability configuration space */
struct {
- int supported;
- unsigned int start, length;
PCICapConfigReadFunc *config_read;
PCICapConfigWriteFunc *config_write;
} cap;
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
` (4 preceding siblings ...)
2010-11-12 2:55 ` [Qemu-devel] [PATCH 5/8] pci: Remove cap.length, cap.start, cap.supported Alex Williamson
@ 2010-11-12 2:56 ` Alex Williamson
2010-11-12 9:20 ` [Qemu-devel] " Michael S. Tsirkin
2010-11-12 2:56 ` [Qemu-devel] [PATCH 7/8] pci: Pass ID for capability read/write handlers Alex Williamson
` (2 subsequent siblings)
8 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 2:56 UTC (permalink / raw)
To: kvm; +Cc: chrisw, alex.williamson, qemu-devel, mst
Now that common PCI code doesn't have a hangup on capabilities
being contiguous, move assigned device capabilities to match
their offset on physical hardware. This helps for drivers that
assume a capability configuration and don't bother searching.
We can also remove several calls to assigned_dev_pci_read_* because
we're overlaying the capability at the same location as the initial
copy we made of config space. We can therefore just use pci_get_*.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/device-assignment.c | 67 +++++++++++++++---------------------------------
1 files changed, 21 insertions(+), 46 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 322fa9f..39f19be 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -366,16 +366,6 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
return (uint8_t)assigned_dev_pci_read(d, pos, 1);
}
-static uint16_t assigned_dev_pci_read_word(PCIDevice *d, int pos)
-{
- return (uint16_t)assigned_dev_pci_read(d, pos, 2);
-}
-
-static uint32_t assigned_dev_pci_read_long(PCIDevice *d, int pos)
-{
- return assigned_dev_pci_read(d, pos, 4);
-}
-
static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
{
int id;
@@ -1285,6 +1275,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
{
AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev);
PCIRegion *pci_region = dev->real_device.regions;
+ int pos;
/* Clear initial capabilities pointer and status copied from hw */
pci_set_byte(pci_dev->config + PCI_CAPABILITY_LIST, 0);
@@ -1296,60 +1287,44 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
#ifdef KVM_CAP_DEVICE_MSI
/* Expose MSI capability
* MSI capability is the 1st capability in capability config */
- if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) {
- int vpos, ppos;
- uint16_t flags;
-
+ if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI))) {
dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI;
- vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSI,
- PCI_CAPABILITY_CONFIG_MSI_LENGTH);
-
- memset(pci_dev->config + vpos + PCI_CAP_FLAGS, 0,
- PCI_CAPABILITY_CONFIG_MSI_LENGTH - PCI_CAP_FLAGS);
+ pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_MSI, pos,
+ PCI_CAPABILITY_CONFIG_MSI_LENGTH);
/* Only 32-bit/no-mask currently supported */
- ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI);
- flags = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSI_FLAGS);
- flags &= PCI_MSI_FLAGS_QMASK;
- pci_set_word(pci_dev->config + vpos + PCI_MSI_FLAGS, flags);
+ pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS,
+ pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) &
+ PCI_MSI_FLAGS_QMASK);
+ pci_set_long(pci_dev->config + pos + PCI_MSI_ADDRESS_LO, 0);
+ pci_set_word(pci_dev->config + pos + PCI_MSI_DATA_32, 0);
/* Set writable fields */
- pci_set_word(pci_dev->wmask + vpos + PCI_MSI_FLAGS,
+ pci_set_word(pci_dev->wmask + pos + PCI_MSI_FLAGS,
PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
- pci_set_long(pci_dev->wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffffffc);
- pci_set_long(pci_dev->wmask + vpos + PCI_MSI_DATA_32, 0xffff);
+ pci_set_long(pci_dev->wmask + pos + PCI_MSI_ADDRESS_LO, 0xfffffffc);
+ pci_set_word(pci_dev->wmask + pos + PCI_MSI_DATA_32, 0xffff);
}
#endif
#ifdef KVM_CAP_DEVICE_MSIX
/* Expose MSI-X capability */
- if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX)) {
- int vpos, ppos, entry_nr, bar_nr;
+ if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX))) {
+ int bar_nr;
uint32_t msix_table_entry;
dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX;
- vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX,
- PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
+ pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_MSIX, pos,
+ PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
- memset(pci_dev->config + vpos + PCI_CAP_FLAGS, 0,
- PCI_CAPABILITY_CONFIG_MSIX_LENGTH - PCI_CAP_FLAGS);
+ pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS,
+ pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) &
+ PCI_MSIX_TABSIZE);
/* Only enable and function mask bits are writable */
- pci_set_word(pci_dev->wmask + vpos + PCI_MSIX_FLAGS,
+ pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS,
PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);
- ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX);
-
- entry_nr = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSIX_FLAGS);
- entry_nr &= PCI_MSIX_TABSIZE;
- pci_set_word(pci_dev->config + vpos + PCI_MSIX_FLAGS, entry_nr);
-
- msix_table_entry = assigned_dev_pci_read_long(pci_dev,
- ppos + PCI_MSIX_TABLE);
- pci_set_long(pci_dev->config + vpos + PCI_MSIX_TABLE, msix_table_entry);
-
- pci_set_long(pci_dev->config + vpos + PCI_MSIX_PBA,
- assigned_dev_pci_read_long(pci_dev, ppos + PCI_MSIX_PBA));
-
+ msix_table_entry = pci_get_long(pci_dev->config + pos + PCI_MSIX_TABLE);
bar_nr = msix_table_entry & PCI_MSIX_BIR;
msix_table_entry &= ~PCI_MSIX_BIR;
dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry;
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware
2010-11-12 2:56 ` [Qemu-devel] [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware Alex Williamson
@ 2010-11-12 9:20 ` Michael S. Tsirkin
2010-11-12 13:53 ` Alex Williamson
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-12 9:20 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Thu, Nov 11, 2010 at 07:56:13PM -0700, Alex Williamson wrote:
> Now that common PCI code doesn't have a hangup on capabilities
> being contiguous,
Hmm, this comment confused me : there's no requirement of
contigious allocations in current code in pci.c, is there?
--
MST
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware
2010-11-12 9:20 ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-11-12 13:53 ` Alex Williamson
0 siblings, 0 replies; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 13:53 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: chrisw, qemu-devel, kvm
On Fri, 2010-11-12 at 11:20 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 07:56:13PM -0700, Alex Williamson wrote:
> > Now that common PCI code doesn't have a hangup on capabilities
> > being contiguous,
>
> Hmm, this comment confused me : there's no requirement of
> contigious allocations in current code in pci.c, is there?
Exactly, but the code used to have cap.start and cap.length, which
implied it was contiguous. Since those were removed in 5/8, we don't
need to worry about where the physical capabilities land in config
space. Thanks,
Alex
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH 7/8] pci: Pass ID for capability read/write handlers
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
` (5 preceding siblings ...)
2010-11-12 2:56 ` [Qemu-devel] [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware Alex Williamson
@ 2010-11-12 2:56 ` Alex Williamson
2010-11-12 2:56 ` [Qemu-devel] [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps Alex Williamson
2010-11-12 5:39 ` [Qemu-devel] Re: [PATCH 0/8] PCI capability and device assignment improvements Michael S. Tsirkin
8 siblings, 0 replies; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 2:56 UTC (permalink / raw)
To: kvm; +Cc: chrisw, alex.williamson, qemu-devel, mst
Any handlers that actually want to interact with specific capabilities
are going to want to know the capability ID being accessed. With the
capability map, this is readily available, so we can save handlers the
trouble of figuring it out.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/device-assignment.c | 36 ++++++++++++++++++++++--------------
hw/pci.c | 14 ++++++++------
hw/pci.h | 8 ++++----
3 files changed, 34 insertions(+), 24 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 39f19be..179c7dc 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1244,30 +1244,38 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
#endif
#endif
-static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address,
+static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
+ uint8_t cap_id,
+ uint32_t address,
uint32_t val, int len)
{
- AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
+ pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
- pci_default_cap_write_config(pci_dev, address, val, len);
+ switch (cap_id) {
#ifdef KVM_CAP_IRQ_ROUTING
+ case PCI_CAP_ID_MSI:
#ifdef KVM_CAP_DEVICE_MSI
- if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
- int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSI);
- if (ranges_overlap(address, len, pos + PCI_MSI_FLAGS, 1)) {
- assigned_dev_update_msi(pci_dev, pos + PCI_MSI_FLAGS);
+ {
+ uint8_t cap = pci_find_capability(pci_dev, cap_id);
+ if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
+ assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
+ }
}
- }
#endif
+ break;
+
+ case PCI_CAP_ID_MSIX:
#ifdef KVM_CAP_DEVICE_MSIX
- if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) {
- int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX);
- if (ranges_overlap(address, len, pos + PCI_MSIX_FLAGS + 1, 1)) {
- assigned_dev_update_msix(pci_dev, pos + PCI_MSIX_FLAGS);
- }
- }
+ {
+ uint8_t cap = pci_find_capability(pci_dev, cap_id);
+ if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
+ assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
+ }
+ }
#endif
+ break;
#endif
+ }
return;
}
diff --git a/hw/pci.c b/hw/pci.c
index d566e33..e1c0917 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1168,10 +1168,11 @@ static uint32_t pci_read_config(PCIDevice *d,
uint32_t pci_default_read_config(PCIDevice *d,
uint32_t address, int len)
{
+ uint8_t cap_id;
assert(len == 1 || len == 2 || len == 4);
- if (pci_access_cap_config(d, address, len)) {
- return d->cap.config_read(d, address, len);
+ if ((cap_id = pci_access_cap_config(d, address, len))) {
+ return d->cap.config_read(d, cap_id, address, len);
}
return pci_read_config(d, address, len);
@@ -1193,13 +1194,13 @@ int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len)
return pci_dev->cap_map[address];
}
-uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
+uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, uint8_t cap_id,
uint32_t address, int len)
{
return pci_read_config(pci_dev, address, len);
}
-void pci_default_cap_write_config(PCIDevice *pci_dev,
+void pci_default_cap_write_config(PCIDevice *pci_dev, uint8_t cap_id,
uint32_t address, uint32_t val, int len)
{
pci_write_config(pci_dev, address, val, len);
@@ -1208,9 +1209,10 @@ void pci_default_cap_write_config(PCIDevice *pci_dev,
void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
{
int was_irq_disabled = pci_irq_disabled(d);
+ uint8_t cap_id;
- if (pci_access_cap_config(d, addr, l)) {
- d->cap.config_write(d, addr, val, l);
+ if ((cap_id = pci_access_cap_config(d, addr, l))) {
+ d->cap.config_write(d, cap_id, addr, val, l);
return;
}
diff --git a/hw/pci.h b/hw/pci.h
index 177008a..3f0b4e0 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -83,9 +83,9 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
pcibus_t addr, pcibus_t size, int type);
typedef int PCIUnregisterFunc(PCIDevice *pci_dev);
-typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev,
+typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev, uint8_t cap_id,
uint32_t address, uint32_t val, int len);
-typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev,
+typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev, uint8_t cap_id,
uint32_t address, int len);
typedef struct PCIIORegion {
@@ -245,9 +245,9 @@ void pci_default_write_config(PCIDevice *d,
uint32_t address, uint32_t val, int len);
void pci_device_save(PCIDevice *s, QEMUFile *f);
int pci_device_load(PCIDevice *s, QEMUFile *f);
-uint32_t pci_default_cap_read_config(PCIDevice *pci_dev,
+uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, uint8_t cap_id,
uint32_t address, int len);
-void pci_default_cap_write_config(PCIDevice *pci_dev,
+void pci_default_cap_write_config(PCIDevice *pci_dev, uint8_t cap_id,
uint32_t address, uint32_t val, int len);
int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len);
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
` (6 preceding siblings ...)
2010-11-12 2:56 ` [Qemu-devel] [PATCH 7/8] pci: Pass ID for capability read/write handlers Alex Williamson
@ 2010-11-12 2:56 ` Alex Williamson
2010-11-12 5:36 ` [Qemu-devel] " Michael S. Tsirkin
2010-11-12 5:39 ` [Qemu-devel] Re: [PATCH 0/8] PCI capability and device assignment improvements Michael S. Tsirkin
8 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 2:56 UTC (permalink / raw)
To: kvm; +Cc: chrisw, alex.williamson, qemu-devel, mst
Some drivers depend on finding capabilities like power management,
PCI express/X, vital product data, or vendor specific fields. Now
that we have better capability support, we can pass more of these
tables through to the guest. Note that VPD and VNDR are direct pass
through capabilies, the rest are mostly empty shells with a few
writable bits where necessary.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
hw/device-assignment.c | 160 +++++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 149 insertions(+), 11 deletions(-)
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 179c7dc..1b228ad 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
return (uint8_t)assigned_dev_pci_read(d, pos, 1);
}
+static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len)
+{
+ AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
+ ssize_t ret;
+ int fd = pci_dev->real_device.config_fd;
+
+again:
+ ret = pwrite(fd, &val, len, pos);
+ if (ret != len) {
+ if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
+ goto again;
+
+ fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
+ __func__, ret, errno);
+
+ exit(1);
+ }
+
+ return;
+}
+
static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
{
int id;
@@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
#endif
#endif
+static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
+ uint8_t cap_id,
+ uint32_t address, int len)
+{
+ uint8_t cap;
+
+ switch (cap_id) {
+
+ case PCI_CAP_ID_VPD:
+ cap = pci_find_capability(pci_dev, cap_id);
+ if (address - cap >= PCI_CAP_FLAGS) {
+ return assigned_dev_pci_read(pci_dev, address, len);
+ }
+ break;
+
+ case PCI_CAP_ID_VNDR:
+ cap = pci_find_capability(pci_dev, cap_id);
+ if (address - cap > PCI_CAP_FLAGS) {
+ return assigned_dev_pci_read(pci_dev, address, len);
+ }
+ break;
+ }
+
+ return pci_default_cap_read_config(pci_dev, cap_id, address, len);
+}
+
static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
uint8_t cap_id,
uint32_t address,
uint32_t val, int len)
{
+ uint8_t cap;
+
pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
switch (cap_id) {
#ifdef KVM_CAP_IRQ_ROUTING
case PCI_CAP_ID_MSI:
#ifdef KVM_CAP_DEVICE_MSI
- {
- uint8_t cap = pci_find_capability(pci_dev, cap_id);
- if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
- assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
- }
+ cap = pci_find_capability(pci_dev, cap_id);
+ if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
+ assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
}
#endif
break;
case PCI_CAP_ID_MSIX:
#ifdef KVM_CAP_DEVICE_MSIX
- {
- uint8_t cap = pci_find_capability(pci_dev, cap_id);
- if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
- assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
- }
+ cap = pci_find_capability(pci_dev, cap_id);
+ if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
+ assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
}
#endif
break;
#endif
+
+ case PCI_CAP_ID_VPD:
+ cap = pci_find_capability(pci_dev, cap_id);
+ if (address - cap >= PCI_CAP_FLAGS) {
+ assigned_dev_pci_write(pci_dev, address, val, len);
+ }
+ break;
+
+ case PCI_CAP_ID_VNDR:
+ cap = pci_find_capability(pci_dev, cap_id);
+ if (address - cap > PCI_CAP_FLAGS) {
+ assigned_dev_pci_write(pci_dev, address, val, len);
+ }
+ break;
}
return;
}
@@ -1340,6 +1399,84 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
#endif
#endif
+ /* Minimal PM support, make the state bits writable so the guest
+ * thinks it's doing something. */
+ if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM))) {
+ uint16_t pmc, pmcsr;
+ pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PM, pos,
+ PCI_PM_SIZEOF);
+
+ pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
+ pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
+ pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
+
+ pmcsr = pci_get_word(pci_dev->config + pos + PCI_PM_CTRL);
+ pmcsr &= (PCI_PM_CTRL_STATE_MASK);
+ pmcsr |= PCI_PM_CTRL_NO_SOFT_RST;
+ pci_set_word(pci_dev->config + pos + PCI_PM_CTRL, pmcsr);
+
+ pci_set_word(pci_dev->wmask + pos + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK);
+ }
+
+ if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP))) {
+ uint16_t devctl, lnkcap, lnksta;
+
+ pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_EXP, pos, 0x40);
+
+ devctl = pci_get_word(pci_dev->config + pos + PCI_EXP_DEVCTL);
+ devctl = (devctl & (PCI_EXP_DEVCTL_READRQ | PCI_EXP_DEVCTL_PAYLOAD)) |
+ PCI_EXP_DEVCTL_RELAX_EN | PCI_EXP_DEVCTL_NOSNOOP_EN;
+ pci_set_word(pci_dev->config + pos + PCI_EXP_DEVCTL, devctl);
+ devctl = PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_AUX_PME;
+ pci_set_word(pci_dev->wmask + pos + PCI_EXP_DEVCTL, ~devctl);
+
+ lnkcap = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKCAP);
+ lnkcap &= (PCI_EXP_LNKCAP_SLS | PCI_EXP_LNKCAP_MLW |
+ PCI_EXP_LNKCAP_ASPMS | PCI_EXP_LNKCAP_L0SEL |
+ PCI_EXP_LNKCAP_L1EL);
+ pci_set_word(pci_dev->config + pos + PCI_EXP_LNKCAP, lnkcap);
+
+ pci_set_word(pci_dev->wmask + pos + PCI_EXP_LNKCAP,
+ PCI_EXP_LNKCTL_ASPMC | PCI_EXP_LNKCTL_RCB |
+ PCI_EXP_LNKCTL_CCC | PCI_EXP_LNKCTL_ES |
+ PCI_EXP_LNKCTL_CLKREQ_EN | PCI_EXP_LNKCTL_HAWD);
+
+ lnksta = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKSTA);
+ lnksta &= (PCI_EXP_LNKSTA_CLS | PCI_EXP_LNKSTA_NLW);
+ pci_set_word(pci_dev->config + pos + PCI_EXP_LNKSTA, lnksta);
+
+ pci_set_word(pci_dev->wmask + pos + 0x28, 0x1f);
+ pci_set_word(pci_dev->wmask + pos + 0x30, 0xfbf);
+ }
+
+ if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VPD))) {
+ pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VPD, pos, 0x8);
+ }
+
+ if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR))) {
+ uint8_t len = pci_get_byte(pci_dev->config + pos + PCI_CAP_FLAGS);
+ pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VNDR, pos, len);
+ }
+
+ if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PCIX))) {
+ uint16_t cmd;
+ uint32_t status;
+
+ pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PCIX, pos, 0x8);
+
+ cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
+ cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
+ PCI_X_CMD_MAX_SPLIT);
+ pci_set_word(pci_dev->config + pos + PCI_X_CMD, cmd);
+
+ status = pci_get_long(pci_dev->config + pos + PCI_X_STATUS);
+ status &= ~(PCI_X_STATUS_BUS | PCI_X_STATUS_DEVFN);
+ status |= (pci_bus_num(pci_dev->bus) << 8) | pci_dev->devfn;
+ status &= ~(PCI_X_STATUS_SPL_DISC | PCI_X_STATUS_UNX_SPL |
+ PCI_X_STATUS_SPL_ERR);
+ pci_set_long(pci_dev->config + pos + PCI_X_STATUS, status);
+ }
return 0;
}
@@ -1466,7 +1603,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
dev->h_busnr = dev->host.bus;
dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
- pci_register_capability_handlers(pci_dev, NULL,
+ pci_register_capability_handlers(pci_dev,
+ assigned_device_pci_cap_read_config,
assigned_device_pci_cap_write_config);
if (assigned_device_pci_cap_init(pci_dev) < 0)
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps
2010-11-12 2:56 ` [Qemu-devel] [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps Alex Williamson
@ 2010-11-12 5:36 ` Michael S. Tsirkin
2010-11-12 6:30 ` Alex Williamson
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-12 5:36 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
> Some drivers depend on finding capabilities like power management,
> PCI express/X, vital product data, or vendor specific fields. Now
> that we have better capability support, we can pass more of these
> tables through to the guest. Note that VPD and VNDR are direct pass
> through capabilies, the rest are mostly empty shells with a few
> writable bits where necessary.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>
> hw/device-assignment.c | 160 +++++++++++++++++++++++++++++++++++++++++++++---
> 1 files changed, 149 insertions(+), 11 deletions(-)
>
> diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> index 179c7dc..1b228ad 100644
> --- a/hw/device-assignment.c
> +++ b/hw/device-assignment.c
> @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
> return (uint8_t)assigned_dev_pci_read(d, pos, 1);
> }
>
> +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len)
> +{
> + AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
> + ssize_t ret;
> + int fd = pci_dev->real_device.config_fd;
> +
> +again:
> + ret = pwrite(fd, &val, len, pos);
> + if (ret != len) {
> + if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> + goto again;
do {} while() ?
> +
> + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> + __func__, ret, errno);
> +
> + exit(1);
> + }
> +
> + return;
> +}
> +
> static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
> {
> int id;
> @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
> #endif
> #endif
>
> +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> + uint8_t cap_id,
> + uint32_t address, int len)
> +{
> + uint8_t cap;
> +
> + switch (cap_id) {
> +
> + case PCI_CAP_ID_VPD:
> + cap = pci_find_capability(pci_dev, cap_id);
> + if (address - cap >= PCI_CAP_FLAGS) {
> + return assigned_dev_pci_read(pci_dev, address, len);
> + }
> + break;
> +
> + case PCI_CAP_ID_VNDR:
> + cap = pci_find_capability(pci_dev, cap_id);
> + if (address - cap > PCI_CAP_FLAGS) {
> + return assigned_dev_pci_read(pci_dev, address, len);
> + }
> + break;
> + }
> +
> + return pci_default_cap_read_config(pci_dev, cap_id, address, len);
> +}
> +
> static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> uint8_t cap_id,
> uint32_t address,
> uint32_t val, int len)
> {
> + uint8_t cap;
> +
> pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
>
> switch (cap_id) {
> #ifdef KVM_CAP_IRQ_ROUTING
> case PCI_CAP_ID_MSI:
> #ifdef KVM_CAP_DEVICE_MSI
> - {
> - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> - if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> - assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> - }
> + cap = pci_find_capability(pci_dev, cap_id);
> + if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> + assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> }
> #endif
> break;
>
> case PCI_CAP_ID_MSIX:
> #ifdef KVM_CAP_DEVICE_MSIX
> - {
> - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> - if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> - assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> - }
> + cap = pci_find_capability(pci_dev, cap_id);
> + if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> + assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> }
> #endif
> break;
> #endif
> +
> + case PCI_CAP_ID_VPD:
> + cap = pci_find_capability(pci_dev, cap_id);
> + if (address - cap >= PCI_CAP_FLAGS) {
> + assigned_dev_pci_write(pci_dev, address, val, len);
> + }
> + break;
> +
> + case PCI_CAP_ID_VNDR:
> + cap = pci_find_capability(pci_dev, cap_id);
> + if (address - cap > PCI_CAP_FLAGS) {
> + assigned_dev_pci_write(pci_dev, address, val, len);
> + }
> + break;
I have a feeling we should use overlap functions instead of
address math. What do you think?
Also - put cap offsets in assigned device structure to avoid
find calls?
> }
> return;
> }
> @@ -1340,6 +1399,84 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> #endif
> #endif
>
> + /* Minimal PM support, make the state bits writable so the guest
> + * thinks it's doing something. */
> + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM))) {
> + uint16_t pmc, pmcsr;
> + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PM, pos,
> + PCI_PM_SIZEOF);
> +
> + pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
> + pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
> + pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
> +
> + pmcsr = pci_get_word(pci_dev->config + pos + PCI_PM_CTRL);
> + pmcsr &= (PCI_PM_CTRL_STATE_MASK);
> + pmcsr |= PCI_PM_CTRL_NO_SOFT_RST;
> + pci_set_word(pci_dev->config + pos + PCI_PM_CTRL, pmcsr);
> +
> + pci_set_word(pci_dev->wmask + pos + PCI_PM_CTRL,
> + PCI_PM_CTRL_STATE_MASK);
> + }
we don't pass anything to device. So - can this be put in pci_pm.c
so that emulated devices can use this too?
> +
> + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP))) {
> + uint16_t devctl, lnkcap, lnksta;
> +
> + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_EXP, pos, 0x40);
> +
> + devctl = pci_get_word(pci_dev->config + pos + PCI_EXP_DEVCTL);
> + devctl = (devctl & (PCI_EXP_DEVCTL_READRQ | PCI_EXP_DEVCTL_PAYLOAD)) |
> + PCI_EXP_DEVCTL_RELAX_EN | PCI_EXP_DEVCTL_NOSNOOP_EN;
> + pci_set_word(pci_dev->config + pos + PCI_EXP_DEVCTL, devctl);
> + devctl = PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_AUX_PME;
> + pci_set_word(pci_dev->wmask + pos + PCI_EXP_DEVCTL, ~devctl);
> +
> + lnkcap = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKCAP);
> + lnkcap &= (PCI_EXP_LNKCAP_SLS | PCI_EXP_LNKCAP_MLW |
> + PCI_EXP_LNKCAP_ASPMS | PCI_EXP_LNKCAP_L0SEL |
> + PCI_EXP_LNKCAP_L1EL);
> + pci_set_word(pci_dev->config + pos + PCI_EXP_LNKCAP, lnkcap);
> +
> + pci_set_word(pci_dev->wmask + pos + PCI_EXP_LNKCAP,
> + PCI_EXP_LNKCTL_ASPMC | PCI_EXP_LNKCTL_RCB |
> + PCI_EXP_LNKCTL_CCC | PCI_EXP_LNKCTL_ES |
> + PCI_EXP_LNKCTL_CLKREQ_EN | PCI_EXP_LNKCTL_HAWD);
> +
> + lnksta = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKSTA);
> + lnksta &= (PCI_EXP_LNKSTA_CLS | PCI_EXP_LNKSTA_NLW);
> + pci_set_word(pci_dev->config + pos + PCI_EXP_LNKSTA, lnksta);
> +
> + pci_set_word(pci_dev->wmask + pos + 0x28, 0x1f);
> + pci_set_word(pci_dev->wmask + pos + 0x30, 0xfbf);
This seems to overlap functionally with the express work upstream.
Can code from there be reused? I also wonder whether is affects the
guest OS if it finds an express device on a non-express bridge.
> + }
> +
> + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VPD))) {
> + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VPD, pos, 0x8);
> + }
> +
> + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR))) {
> + uint8_t len = pci_get_byte(pci_dev->config + pos + PCI_CAP_FLAGS);
> + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VNDR, pos, len);
> + }
> +
> + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PCIX))) {
> + uint16_t cmd;
> + uint32_t status;
> +
> + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PCIX, pos, 0x8);
> +
> + cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
> + cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
> + PCI_X_CMD_MAX_SPLIT);
> + pci_set_word(pci_dev->config + pos + PCI_X_CMD, cmd);
> +
> + status = pci_get_long(pci_dev->config + pos + PCI_X_STATUS);
> + status &= ~(PCI_X_STATUS_BUS | PCI_X_STATUS_DEVFN);
> + status |= (pci_bus_num(pci_dev->bus) << 8) | pci_dev->devfn;
> + status &= ~(PCI_X_STATUS_SPL_DISC | PCI_X_STATUS_UNX_SPL |
> + PCI_X_STATUS_SPL_ERR);
> + pci_set_long(pci_dev->config + pos + PCI_X_STATUS, status);
> + }
This will be handy for non-assignment case so
I'd like to see this moved out of device-assignment.c:
we could create pcix.c or just add to pci.c.
> return 0;
> }
>
> @@ -1466,7 +1603,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
> dev->h_busnr = dev->host.bus;
> dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
>
> - pci_register_capability_handlers(pci_dev, NULL,
> + pci_register_capability_handlers(pci_dev,
> + assigned_device_pci_cap_read_config,
> assigned_device_pci_cap_write_config);
Maybe these could go away?
>
> if (assigned_device_pci_cap_init(pci_dev) < 0)
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps
2010-11-12 5:36 ` [Qemu-devel] " Michael S. Tsirkin
@ 2010-11-12 6:30 ` Alex Williamson
2010-11-12 9:11 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 6:30 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: chrisw, qemu-devel, kvm
On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
> > Some drivers depend on finding capabilities like power management,
> > PCI express/X, vital product data, or vendor specific fields. Now
> > that we have better capability support, we can pass more of these
> > tables through to the guest. Note that VPD and VNDR are direct pass
> > through capabilies, the rest are mostly empty shells with a few
> > writable bits where necessary.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >
> > hw/device-assignment.c | 160 +++++++++++++++++++++++++++++++++++++++++++++---
> > 1 files changed, 149 insertions(+), 11 deletions(-)
> >
> > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > index 179c7dc..1b228ad 100644
> > --- a/hw/device-assignment.c
> > +++ b/hw/device-assignment.c
> > @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
> > return (uint8_t)assigned_dev_pci_read(d, pos, 1);
> > }
> >
> > +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len)
> > +{
> > + AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
> > + ssize_t ret;
> > + int fd = pci_dev->real_device.config_fd;
> > +
> > +again:
> > + ret = pwrite(fd, &val, len, pos);
> > + if (ret != len) {
> > + if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> > + goto again;
>
>
> do {} while() ?
Sure, this is just a copy of another place that does something similar.
They should either be merged or both converted in a separate patch.
> > +
> > + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> > + __func__, ret, errno);
> > +
> > + exit(1);
> > + }
> > +
> > + return;
> > +}
> > +
> > static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
> > {
> > int id;
> > @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
> > #endif
> > #endif
> >
> > +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> > + uint8_t cap_id,
> > + uint32_t address, int len)
> > +{
> > + uint8_t cap;
> > +
> > + switch (cap_id) {
> > +
> > + case PCI_CAP_ID_VPD:
> > + cap = pci_find_capability(pci_dev, cap_id);
> > + if (address - cap >= PCI_CAP_FLAGS) {
> > + return assigned_dev_pci_read(pci_dev, address, len);
> > + }
> > + break;
> > +
> > + case PCI_CAP_ID_VNDR:
> > + cap = pci_find_capability(pci_dev, cap_id);
> > + if (address - cap > PCI_CAP_FLAGS) {
> > + return assigned_dev_pci_read(pci_dev, address, len);
> > + }
> > + break;
> > + }
> > +
> > + return pci_default_cap_read_config(pci_dev, cap_id, address, len);
> > +}
> > +
> > static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> > uint8_t cap_id,
> > uint32_t address,
> > uint32_t val, int len)
> > {
> > + uint8_t cap;
> > +
> > pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
> >
> > switch (cap_id) {
> > #ifdef KVM_CAP_IRQ_ROUTING
> > case PCI_CAP_ID_MSI:
> > #ifdef KVM_CAP_DEVICE_MSI
> > - {
> > - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > - if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > - assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > - }
> > + cap = pci_find_capability(pci_dev, cap_id);
> > + if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > + assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > }
> > #endif
> > break;
> >
> > case PCI_CAP_ID_MSIX:
> > #ifdef KVM_CAP_DEVICE_MSIX
> > - {
> > - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > - if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > - assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > - }
> > + cap = pci_find_capability(pci_dev, cap_id);
> > + if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > + assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > }
> > #endif
> > break;
> > #endif
> > +
> > + case PCI_CAP_ID_VPD:
> > + cap = pci_find_capability(pci_dev, cap_id);
> > + if (address - cap >= PCI_CAP_FLAGS) {
> > + assigned_dev_pci_write(pci_dev, address, val, len);
> > + }
> > + break;
> > +
> > + case PCI_CAP_ID_VNDR:
> > + cap = pci_find_capability(pci_dev, cap_id);
> > + if (address - cap > PCI_CAP_FLAGS) {
> > + assigned_dev_pci_write(pci_dev, address, val, len);
> > + }
> > + break;
>
> I have a feeling we should use overlap functions instead of
> address math. What do you think?
if (!ranges_overlap(address - cap, len, 0, PCI_CAP_FLAGS)) ?
Sure, that'd be a nice cleanup.
> Also - put cap offsets in assigned device structure to avoid
> find calls?
I suppose there aren't enough capability IDs that it'd take much space
to do so, but it doesn't sound like a unique to device assignment issue.
Maybe that should live on PCIDevice with an access function.
> > }
> > return;
> > }
> > @@ -1340,6 +1399,84 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> > #endif
> > #endif
> >
> > + /* Minimal PM support, make the state bits writable so the guest
> > + * thinks it's doing something. */
> > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM))) {
> > + uint16_t pmc, pmcsr;
> > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PM, pos,
> > + PCI_PM_SIZEOF);
> > +
> > + pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
> > + pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
> > + pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
> > +
> > + pmcsr = pci_get_word(pci_dev->config + pos + PCI_PM_CTRL);
> > + pmcsr &= (PCI_PM_CTRL_STATE_MASK);
> > + pmcsr |= PCI_PM_CTRL_NO_SOFT_RST;
> > + pci_set_word(pci_dev->config + pos + PCI_PM_CTRL, pmcsr);
> > +
> > + pci_set_word(pci_dev->wmask + pos + PCI_PM_CTRL,
> > + PCI_PM_CTRL_STATE_MASK);
> > + }
>
> we don't pass anything to device. So - can this be put in pci_pm.c
> so that emulated devices can use this too?
That's part of why this one is an RFC, should we allow the guest to do
some level of power management to the physical device? Xen seems to
allow D-state manipulation by the guest.
> > +
> > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP))) {
> > + uint16_t devctl, lnkcap, lnksta;
> > +
> > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_EXP, pos, 0x40);
> > +
> > + devctl = pci_get_word(pci_dev->config + pos + PCI_EXP_DEVCTL);
> > + devctl = (devctl & (PCI_EXP_DEVCTL_READRQ | PCI_EXP_DEVCTL_PAYLOAD)) |
> > + PCI_EXP_DEVCTL_RELAX_EN | PCI_EXP_DEVCTL_NOSNOOP_EN;
> > + pci_set_word(pci_dev->config + pos + PCI_EXP_DEVCTL, devctl);
> > + devctl = PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_AUX_PME;
> > + pci_set_word(pci_dev->wmask + pos + PCI_EXP_DEVCTL, ~devctl);
> > +
> > + lnkcap = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKCAP);
> > + lnkcap &= (PCI_EXP_LNKCAP_SLS | PCI_EXP_LNKCAP_MLW |
> > + PCI_EXP_LNKCAP_ASPMS | PCI_EXP_LNKCAP_L0SEL |
> > + PCI_EXP_LNKCAP_L1EL);
> > + pci_set_word(pci_dev->config + pos + PCI_EXP_LNKCAP, lnkcap);
> > +
> > + pci_set_word(pci_dev->wmask + pos + PCI_EXP_LNKCAP,
> > + PCI_EXP_LNKCTL_ASPMC | PCI_EXP_LNKCTL_RCB |
> > + PCI_EXP_LNKCTL_CCC | PCI_EXP_LNKCTL_ES |
> > + PCI_EXP_LNKCTL_CLKREQ_EN | PCI_EXP_LNKCTL_HAWD);
> > +
> > + lnksta = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKSTA);
> > + lnksta &= (PCI_EXP_LNKSTA_CLS | PCI_EXP_LNKSTA_NLW);
> > + pci_set_word(pci_dev->config + pos + PCI_EXP_LNKSTA, lnksta);
> > +
> > + pci_set_word(pci_dev->wmask + pos + 0x28, 0x1f);
> > + pci_set_word(pci_dev->wmask + pos + 0x30, 0xfbf);
>
> This seems to overlap functionally with the express work upstream.
> Can code from there be reused? I also wonder whether is affects the
> guest OS if it finds an express device on a non-express bridge.
Yes, perhaps it can be merged. I'd like to start with figuring out what
we need for device assignment, and merging where it makes sense later
than stall out trying to solve the whole problem upfront.
It could certainly be confusing for a driver to find an express device
under a PIIX chipset, but I think typically the driver is just looking
for link speed info for pretty printks.
> > + }
> > +
> > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VPD))) {
> > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VPD, pos, 0x8);
> > + }
> > +
> > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR))) {
> > + uint8_t len = pci_get_byte(pci_dev->config + pos + PCI_CAP_FLAGS);
> > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VNDR, pos, len);
> > + }
> > +
> > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PCIX))) {
> > + uint16_t cmd;
> > + uint32_t status;
> > +
> > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PCIX, pos, 0x8);
> > +
> > + cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
> > + cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
> > + PCI_X_CMD_MAX_SPLIT);
> > + pci_set_word(pci_dev->config + pos + PCI_X_CMD, cmd);
> > +
> > + status = pci_get_long(pci_dev->config + pos + PCI_X_STATUS);
> > + status &= ~(PCI_X_STATUS_BUS | PCI_X_STATUS_DEVFN);
> > + status |= (pci_bus_num(pci_dev->bus) << 8) | pci_dev->devfn;
> > + status &= ~(PCI_X_STATUS_SPL_DISC | PCI_X_STATUS_UNX_SPL |
> > + PCI_X_STATUS_SPL_ERR);
> > + pci_set_long(pci_dev->config + pos + PCI_X_STATUS, status);
> > + }
>
> This will be handy for non-assignment case so
> I'd like to see this moved out of device-assignment.c:
> we could create pcix.c or just add to pci.c.
Yeah, I'm not sure there's enough we can poke at in a PCIX capability to
warrant it's own file. Here for the same reason as the others, what do
we want to expose, what's emulated, what's poke-able. We also have the
benefit here that we get default from hardware. If it stays this small,
I'd just assume leave it here than try to generalize an interface when
we're the only user.
> > return 0;
> > }
> >
> > @@ -1466,7 +1603,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
> > dev->h_busnr = dev->host.bus;
> > dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
> >
> > - pci_register_capability_handlers(pci_dev, NULL,
> > + pci_register_capability_handlers(pci_dev,
> > + assigned_device_pci_cap_read_config,
> > assigned_device_pci_cap_write_config);
>
> Maybe these could go away?
Sure, pass a capability ID to the read/write_config functions and I'd
support this going way. I don't think that necessarily needs to be tied
to this series though. Thanks,
Alex
> >
> > if (assigned_device_pci_cap_init(pci_dev) < 0)
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps
2010-11-12 6:30 ` Alex Williamson
@ 2010-11-12 9:11 ` Michael S. Tsirkin
2010-11-12 15:42 ` Alex Williamson
0 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-12 9:11 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Thu, Nov 11, 2010 at 11:30:07PM -0700, Alex Williamson wrote:
> On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
> > > Some drivers depend on finding capabilities like power management,
> > > PCI express/X, vital product data, or vendor specific fields. Now
> > > that we have better capability support, we can pass more of these
> > > tables through to the guest. Note that VPD and VNDR are direct pass
> > > through capabilies, the rest are mostly empty shells with a few
> > > writable bits where necessary.
> > >
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > >
> > > hw/device-assignment.c | 160 +++++++++++++++++++++++++++++++++++++++++++++---
> > > 1 files changed, 149 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > index 179c7dc..1b228ad 100644
> > > --- a/hw/device-assignment.c
> > > +++ b/hw/device-assignment.c
> > > @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
> > > return (uint8_t)assigned_dev_pci_read(d, pos, 1);
> > > }
> > >
> > > +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len)
> > > +{
> > > + AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
> > > + ssize_t ret;
> > > + int fd = pci_dev->real_device.config_fd;
> > > +
> > > +again:
> > > + ret = pwrite(fd, &val, len, pos);
> > > + if (ret != len) {
> > > + if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> > > + goto again;
> >
> >
> > do {} while() ?
>
> Sure, this is just a copy of another place that does something similar.
> They should either be merged or both converted in a separate patch.
>
> > > +
> > > + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> > > + __func__, ret, errno);
> > > +
> > > + exit(1);
> > > + }
> > > +
> > > + return;
> > > +}
> > > +
> > > static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
> > > {
> > > int id;
> > > @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
> > > #endif
> > > #endif
> > >
> > > +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> > > + uint8_t cap_id,
> > > + uint32_t address, int len)
> > > +{
> > > + uint8_t cap;
> > > +
> > > + switch (cap_id) {
> > > +
> > > + case PCI_CAP_ID_VPD:
> > > + cap = pci_find_capability(pci_dev, cap_id);
> > > + if (address - cap >= PCI_CAP_FLAGS) {
> > > + return assigned_dev_pci_read(pci_dev, address, len);
> > > + }
> > > + break;
> > > +
> > > + case PCI_CAP_ID_VNDR:
> > > + cap = pci_find_capability(pci_dev, cap_id);
> > > + if (address - cap > PCI_CAP_FLAGS) {
> > > + return assigned_dev_pci_read(pci_dev, address, len);
> > > + }
> > > + break;
> > > + }
> > > +
> > > + return pci_default_cap_read_config(pci_dev, cap_id, address, len);
> > > +}
> > > +
> > > static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> > > uint8_t cap_id,
> > > uint32_t address,
> > > uint32_t val, int len)
> > > {
> > > + uint8_t cap;
> > > +
> > > pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
> > >
> > > switch (cap_id) {
> > > #ifdef KVM_CAP_IRQ_ROUTING
> > > case PCI_CAP_ID_MSI:
> > > #ifdef KVM_CAP_DEVICE_MSI
> > > - {
> > > - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > - if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > - assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > - }
> > > + cap = pci_find_capability(pci_dev, cap_id);
> > > + if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > + assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > }
> > > #endif
> > > break;
> > >
> > > case PCI_CAP_ID_MSIX:
> > > #ifdef KVM_CAP_DEVICE_MSIX
> > > - {
> > > - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > - if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > > - assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > > - }
> > > + cap = pci_find_capability(pci_dev, cap_id);
> > > + if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > > + assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > > }
> > > #endif
> > > break;
> > > #endif
> > > +
> > > + case PCI_CAP_ID_VPD:
> > > + cap = pci_find_capability(pci_dev, cap_id);
> > > + if (address - cap >= PCI_CAP_FLAGS) {
> > > + assigned_dev_pci_write(pci_dev, address, val, len);
> > > + }
> > > + break;
> > > +
> > > + case PCI_CAP_ID_VNDR:
> > > + cap = pci_find_capability(pci_dev, cap_id);
> > > + if (address - cap > PCI_CAP_FLAGS) {
> > > + assigned_dev_pci_write(pci_dev, address, val, len);
> > > + }
> > > + break;
> >
> > I have a feeling we should use overlap functions instead of
> > address math. What do you think?
>
> if (!ranges_overlap(address - cap, len, 0, PCI_CAP_FLAGS)) ?
ranges_overlap(address, len, cap, PCI_CAP_FLAGS)
> Sure, that'd be a nice cleanup.
>
> > Also - put cap offsets in assigned device structure to avoid
> > find calls?
>
> I suppose there aren't enough capability IDs that it'd take much space
> to do so, but it doesn't sound like a unique to device assignment issue.
> Maybe that should live on PCIDevice with an access function.
Sure, I put all caps that we actually emulate in PCIDevice.
So that would apply to express, pcix, etc.
Sticking offsets to caps that core doesn't emulate in PCIDevice
seems a bit strange. That's why each device has its own device state.
> > > }
> > > return;
> > > }
> > > @@ -1340,6 +1399,84 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> > > #endif
> > > #endif
> > >
> > > + /* Minimal PM support, make the state bits writable so the guest
> > > + * thinks it's doing something. */
> > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM))) {
> > > + uint16_t pmc, pmcsr;
> > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PM, pos,
> > > + PCI_PM_SIZEOF);
> > > +
> > > + pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
> > > + pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
> > > + pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
> > > +
> > > + pmcsr = pci_get_word(pci_dev->config + pos + PCI_PM_CTRL);
> > > + pmcsr &= (PCI_PM_CTRL_STATE_MASK);
> > > + pmcsr |= PCI_PM_CTRL_NO_SOFT_RST;
> > > + pci_set_word(pci_dev->config + pos + PCI_PM_CTRL, pmcsr);
> > > +
> > > + pci_set_word(pci_dev->wmask + pos + PCI_PM_CTRL,
> > > + PCI_PM_CTRL_STATE_MASK);
> > > + }
> >
> > we don't pass anything to device. So - can this be put in pci_pm.c
> > so that emulated devices can use this too?
>
> That's part of why this one is an RFC, should we allow the guest to do
> some level of power management to the physical device? Xen seems to
> allow D-state manipulation by the guest.
Hmm, ok. I still hope we can do the emulated one and then
assigned device would call the relevant function and pass
info to the backend.
> > > +
> > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP))) {
> > > + uint16_t devctl, lnkcap, lnksta;
> > > +
> > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_EXP, pos, 0x40);
> > > +
> > > + devctl = pci_get_word(pci_dev->config + pos + PCI_EXP_DEVCTL);
> > > + devctl = (devctl & (PCI_EXP_DEVCTL_READRQ | PCI_EXP_DEVCTL_PAYLOAD)) |
> > > + PCI_EXP_DEVCTL_RELAX_EN | PCI_EXP_DEVCTL_NOSNOOP_EN;
> > > + pci_set_word(pci_dev->config + pos + PCI_EXP_DEVCTL, devctl);
> > > + devctl = PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_AUX_PME;
> > > + pci_set_word(pci_dev->wmask + pos + PCI_EXP_DEVCTL, ~devctl);
> > > +
> > > + lnkcap = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKCAP);
> > > + lnkcap &= (PCI_EXP_LNKCAP_SLS | PCI_EXP_LNKCAP_MLW |
> > > + PCI_EXP_LNKCAP_ASPMS | PCI_EXP_LNKCAP_L0SEL |
> > > + PCI_EXP_LNKCAP_L1EL);
> > > + pci_set_word(pci_dev->config + pos + PCI_EXP_LNKCAP, lnkcap);
> > > +
> > > + pci_set_word(pci_dev->wmask + pos + PCI_EXP_LNKCAP,
> > > + PCI_EXP_LNKCTL_ASPMC | PCI_EXP_LNKCTL_RCB |
> > > + PCI_EXP_LNKCTL_CCC | PCI_EXP_LNKCTL_ES |
> > > + PCI_EXP_LNKCTL_CLKREQ_EN | PCI_EXP_LNKCTL_HAWD);
> > > +
> > > + lnksta = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKSTA);
> > > + lnksta &= (PCI_EXP_LNKSTA_CLS | PCI_EXP_LNKSTA_NLW);
> > > + pci_set_word(pci_dev->config + pos + PCI_EXP_LNKSTA, lnksta);
> > > +
> > > + pci_set_word(pci_dev->wmask + pos + 0x28, 0x1f);
> > > + pci_set_word(pci_dev->wmask + pos + 0x30, 0xfbf);
> >
> > This seems to overlap functionally with the express work upstream.
> > Can code from there be reused? I also wonder whether is affects the
> > guest OS if it finds an express device on a non-express bridge.
>
> Yes, perhaps it can be merged. I'd like to start with figuring out what
> we need for device assignment, and merging where it makes sense later
> than stall out trying to solve the whole problem upfront.
>
> It could certainly be confusing for a driver to find an express device
> under a PIIX chipset, but I think typically the driver is just looking
> for link speed info for pretty printks.
> > > + }
> > > +
> > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VPD))) {
> > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VPD, pos, 0x8);
> > > + }
> > > +
> > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR))) {
> > > + uint8_t len = pci_get_byte(pci_dev->config + pos + PCI_CAP_FLAGS);
> > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VNDR, pos, len);
> > > + }
> > > +
> > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PCIX))) {
> > > + uint16_t cmd;
> > > + uint32_t status;
> > > +
> > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PCIX, pos, 0x8);
> > > +
> > > + cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
> > > + cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
> > > + PCI_X_CMD_MAX_SPLIT);
> > > + pci_set_word(pci_dev->config + pos + PCI_X_CMD, cmd);
> > > +
> > > + status = pci_get_long(pci_dev->config + pos + PCI_X_STATUS);
> > > + status &= ~(PCI_X_STATUS_BUS | PCI_X_STATUS_DEVFN);
> > > + status |= (pci_bus_num(pci_dev->bus) << 8) | pci_dev->devfn;
> > > + status &= ~(PCI_X_STATUS_SPL_DISC | PCI_X_STATUS_UNX_SPL |
> > > + PCI_X_STATUS_SPL_ERR);
> > > + pci_set_long(pci_dev->config + pos + PCI_X_STATUS, status);
> > > + }
> >
> > This will be handy for non-assignment case so
> > I'd like to see this moved out of device-assignment.c:
> > we could create pcix.c or just add to pci.c.
>
> Yeah, I'm not sure there's enough we can poke at in a PCIX capability to
> warrant it's own file. Here for the same reason as the others, what do
> we want to expose, what's emulated, what's poke-able. We also have the
> benefit here that we get default from hardware. If it stays this small,
> I'd just assume leave it here than try to generalize an interface when
> we're the only user.
Do you have a pcix device btw? Do we even care?
> > > return 0;
> > > }
> > >
> > > @@ -1466,7 +1603,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
> > > dev->h_busnr = dev->host.bus;
> > > dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
> > >
> > > - pci_register_capability_handlers(pci_dev, NULL,
> > > + pci_register_capability_handlers(pci_dev,
> > > + assigned_device_pci_cap_read_config,
> > > assigned_device_pci_cap_write_config);
> >
> > Maybe these could go away?
>
> Sure, pass a capability ID to the read/write_config functions and I'd
> support this going way. I don't think that necessarily needs to be tied
> to this series though. Thanks,
>
> Alex
As in, can be a separate cleanup later. Yes.
> > >
> > > if (assigned_device_pci_cap_init(pci_dev) < 0)
>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps
2010-11-12 9:11 ` Michael S. Tsirkin
@ 2010-11-12 15:42 ` Alex Williamson
2010-11-16 16:08 ` Michael S. Tsirkin
0 siblings, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2010-11-12 15:42 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: chrisw, qemu-devel, kvm
On Fri, 2010-11-12 at 11:11 +0200, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2010 at 11:30:07PM -0700, Alex Williamson wrote:
> > On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote:
> > > On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
> > > > Some drivers depend on finding capabilities like power management,
> > > > PCI express/X, vital product data, or vendor specific fields. Now
> > > > that we have better capability support, we can pass more of these
> > > > tables through to the guest. Note that VPD and VNDR are direct pass
> > > > through capabilies, the rest are mostly empty shells with a few
> > > > writable bits where necessary.
> > > >
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > ---
> > > >
> > > > hw/device-assignment.c | 160 +++++++++++++++++++++++++++++++++++++++++++++---
> > > > 1 files changed, 149 insertions(+), 11 deletions(-)
> > > >
> > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > index 179c7dc..1b228ad 100644
> > > > --- a/hw/device-assignment.c
> > > > +++ b/hw/device-assignment.c
> > > > @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
> > > > return (uint8_t)assigned_dev_pci_read(d, pos, 1);
> > > > }
> > > >
> > > > +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len)
> > > > +{
> > > > + AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
> > > > + ssize_t ret;
> > > > + int fd = pci_dev->real_device.config_fd;
> > > > +
> > > > +again:
> > > > + ret = pwrite(fd, &val, len, pos);
> > > > + if (ret != len) {
> > > > + if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> > > > + goto again;
> > >
> > >
> > > do {} while() ?
> >
> > Sure, this is just a copy of another place that does something similar.
> > They should either be merged or both converted in a separate patch.
> >
> > > > +
> > > > + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> > > > + __func__, ret, errno);
> > > > +
> > > > + exit(1);
> > > > + }
> > > > +
> > > > + return;
> > > > +}
> > > > +
> > > > static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
> > > > {
> > > > int id;
> > > > @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
> > > > #endif
> > > > #endif
> > > >
> > > > +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> > > > + uint8_t cap_id,
> > > > + uint32_t address, int len)
> > > > +{
> > > > + uint8_t cap;
> > > > +
> > > > + switch (cap_id) {
> > > > +
> > > > + case PCI_CAP_ID_VPD:
> > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > + if (address - cap >= PCI_CAP_FLAGS) {
> > > > + return assigned_dev_pci_read(pci_dev, address, len);
> > > > + }
> > > > + break;
> > > > +
> > > > + case PCI_CAP_ID_VNDR:
> > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > + if (address - cap > PCI_CAP_FLAGS) {
> > > > + return assigned_dev_pci_read(pci_dev, address, len);
> > > > + }
> > > > + break;
> > > > + }
> > > > +
> > > > + return pci_default_cap_read_config(pci_dev, cap_id, address, len);
> > > > +}
> > > > +
> > > > static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> > > > uint8_t cap_id,
> > > > uint32_t address,
> > > > uint32_t val, int len)
> > > > {
> > > > + uint8_t cap;
> > > > +
> > > > pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
> > > >
> > > > switch (cap_id) {
> > > > #ifdef KVM_CAP_IRQ_ROUTING
> > > > case PCI_CAP_ID_MSI:
> > > > #ifdef KVM_CAP_DEVICE_MSI
> > > > - {
> > > > - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > > - if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > > - assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > > - }
> > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > + if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > > + assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > > }
> > > > #endif
> > > > break;
> > > >
> > > > case PCI_CAP_ID_MSIX:
> > > > #ifdef KVM_CAP_DEVICE_MSIX
> > > > - {
> > > > - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > > - if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > > > - assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > > > - }
> > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > + if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > > > + assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > > > }
> > > > #endif
> > > > break;
> > > > #endif
> > > > +
> > > > + case PCI_CAP_ID_VPD:
> > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > + if (address - cap >= PCI_CAP_FLAGS) {
> > > > + assigned_dev_pci_write(pci_dev, address, val, len);
> > > > + }
> > > > + break;
> > > > +
> > > > + case PCI_CAP_ID_VNDR:
> > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > + if (address - cap > PCI_CAP_FLAGS) {
> > > > + assigned_dev_pci_write(pci_dev, address, val, len);
> > > > + }
> > > > + break;
> > >
> > > I have a feeling we should use overlap functions instead of
> > > address math. What do you think?
> >
> > if (!ranges_overlap(address - cap, len, 0, PCI_CAP_FLAGS)) ?
>
> ranges_overlap(address, len, cap, PCI_CAP_FLAGS)
>
> > Sure, that'd be a nice cleanup.
> >
> > > Also - put cap offsets in assigned device structure to avoid
> > > find calls?
> >
> > I suppose there aren't enough capability IDs that it'd take much space
> > to do so, but it doesn't sound like a unique to device assignment issue.
> > Maybe that should live on PCIDevice with an access function.
>
> Sure, I put all caps that we actually emulate in PCIDevice.
> So that would apply to express, pcix, etc.
> Sticking offsets to caps that core doesn't emulate in PCIDevice
> seems a bit strange. That's why each device has its own device state.
The counter argument is that instead of sprinkling cap_msi, cap_msix,
cap_pcie, cap_foo into PCIDevice as support gets added, it would add a
lot of consistency to have a uint8_t caps[PCI_CAP_ID_MAX], then
pci_find_capability simply becomes return pdev->caps[cap_id], and we can
make more use of it.
> > > > }
> > > > return;
> > > > }
> > > > @@ -1340,6 +1399,84 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev)
> > > > #endif
> > > > #endif
> > > >
> > > > + /* Minimal PM support, make the state bits writable so the guest
> > > > + * thinks it's doing something. */
> > > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PM))) {
> > > > + uint16_t pmc, pmcsr;
> > > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PM, pos,
> > > > + PCI_PM_SIZEOF);
> > > > +
> > > > + pmc = pci_get_word(pci_dev->config + pos + PCI_CAP_FLAGS);
> > > > + pmc &= (PCI_PM_CAP_VER_MASK | PCI_PM_CAP_DSI);
> > > > + pci_set_word(pci_dev->config + pos + PCI_CAP_FLAGS, pmc);
> > > > +
> > > > + pmcsr = pci_get_word(pci_dev->config + pos + PCI_PM_CTRL);
> > > > + pmcsr &= (PCI_PM_CTRL_STATE_MASK);
> > > > + pmcsr |= PCI_PM_CTRL_NO_SOFT_RST;
> > > > + pci_set_word(pci_dev->config + pos + PCI_PM_CTRL, pmcsr);
> > > > +
> > > > + pci_set_word(pci_dev->wmask + pos + PCI_PM_CTRL,
> > > > + PCI_PM_CTRL_STATE_MASK);
> > > > + }
> > >
> > > we don't pass anything to device. So - can this be put in pci_pm.c
> > > so that emulated devices can use this too?
> >
> > That's part of why this one is an RFC, should we allow the guest to do
> > some level of power management to the physical device? Xen seems to
> > allow D-state manipulation by the guest.
>
> Hmm, ok. I still hope we can do the emulated one and then
> assigned device would call the relevant function and pass
> info to the backend.
I hope so too, but I don't want generalizing an interface to gate
supporting the capability for assigned devices.
> > > > +
> > > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP))) {
> > > > + uint16_t devctl, lnkcap, lnksta;
> > > > +
> > > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_EXP, pos, 0x40);
> > > > +
> > > > + devctl = pci_get_word(pci_dev->config + pos + PCI_EXP_DEVCTL);
> > > > + devctl = (devctl & (PCI_EXP_DEVCTL_READRQ | PCI_EXP_DEVCTL_PAYLOAD)) |
> > > > + PCI_EXP_DEVCTL_RELAX_EN | PCI_EXP_DEVCTL_NOSNOOP_EN;
> > > > + pci_set_word(pci_dev->config + pos + PCI_EXP_DEVCTL, devctl);
> > > > + devctl = PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_AUX_PME;
> > > > + pci_set_word(pci_dev->wmask + pos + PCI_EXP_DEVCTL, ~devctl);
> > > > +
> > > > + lnkcap = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKCAP);
> > > > + lnkcap &= (PCI_EXP_LNKCAP_SLS | PCI_EXP_LNKCAP_MLW |
> > > > + PCI_EXP_LNKCAP_ASPMS | PCI_EXP_LNKCAP_L0SEL |
> > > > + PCI_EXP_LNKCAP_L1EL);
> > > > + pci_set_word(pci_dev->config + pos + PCI_EXP_LNKCAP, lnkcap);
> > > > +
> > > > + pci_set_word(pci_dev->wmask + pos + PCI_EXP_LNKCAP,
> > > > + PCI_EXP_LNKCTL_ASPMC | PCI_EXP_LNKCTL_RCB |
> > > > + PCI_EXP_LNKCTL_CCC | PCI_EXP_LNKCTL_ES |
> > > > + PCI_EXP_LNKCTL_CLKREQ_EN | PCI_EXP_LNKCTL_HAWD);
> > > > +
> > > > + lnksta = pci_get_word(pci_dev->config + pos + PCI_EXP_LNKSTA);
> > > > + lnksta &= (PCI_EXP_LNKSTA_CLS | PCI_EXP_LNKSTA_NLW);
> > > > + pci_set_word(pci_dev->config + pos + PCI_EXP_LNKSTA, lnksta);
> > > > +
> > > > + pci_set_word(pci_dev->wmask + pos + 0x28, 0x1f);
> > > > + pci_set_word(pci_dev->wmask + pos + 0x30, 0xfbf);
> > >
> > > This seems to overlap functionally with the express work upstream.
> > > Can code from there be reused? I also wonder whether is affects the
> > > guest OS if it finds an express device on a non-express bridge.
> >
> > Yes, perhaps it can be merged. I'd like to start with figuring out what
> > we need for device assignment, and merging where it makes sense later
> > than stall out trying to solve the whole problem upfront.
> >
> > It could certainly be confusing for a driver to find an express device
> > under a PIIX chipset, but I think typically the driver is just looking
> > for link speed info for pretty printks.
> > > > + }
> > > > +
> > > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VPD))) {
> > > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VPD, pos, 0x8);
> > > > + }
> > > > +
> > > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_VNDR))) {
> > > > + uint8_t len = pci_get_byte(pci_dev->config + pos + PCI_CAP_FLAGS);
> > > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_VNDR, pos, len);
> > > > + }
> > > > +
> > > > + if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_PCIX))) {
> > > > + uint16_t cmd;
> > > > + uint32_t status;
> > > > +
> > > > + pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_PCIX, pos, 0x8);
> > > > +
> > > > + cmd = pci_get_word(pci_dev->config + pos + PCI_X_CMD);
> > > > + cmd &= (PCI_X_CMD_DPERR_E | PCI_X_CMD_ERO | PCI_X_CMD_MAX_READ |
> > > > + PCI_X_CMD_MAX_SPLIT);
> > > > + pci_set_word(pci_dev->config + pos + PCI_X_CMD, cmd);
> > > > +
> > > > + status = pci_get_long(pci_dev->config + pos + PCI_X_STATUS);
> > > > + status &= ~(PCI_X_STATUS_BUS | PCI_X_STATUS_DEVFN);
> > > > + status |= (pci_bus_num(pci_dev->bus) << 8) | pci_dev->devfn;
> > > > + status &= ~(PCI_X_STATUS_SPL_DISC | PCI_X_STATUS_UNX_SPL |
> > > > + PCI_X_STATUS_SPL_ERR);
> > > > + pci_set_long(pci_dev->config + pos + PCI_X_STATUS, status);
> > > > + }
> > >
> > > This will be handy for non-assignment case so
> > > I'd like to see this moved out of device-assignment.c:
> > > we could create pcix.c or just add to pci.c.
> >
> > Yeah, I'm not sure there's enough we can poke at in a PCIX capability to
> > warrant it's own file. Here for the same reason as the others, what do
> > we want to expose, what's emulated, what's poke-able. We also have the
> > benefit here that we get default from hardware. If it stays this small,
> > I'd just assume leave it here than try to generalize an interface when
> > we're the only user.
>
> Do you have a pcix device btw? Do we even care?
I don't have one, but I know of VT-d capable systems that support PCI-e
to PCI-X bridges in their topology.
Alex
> > > > return 0;
> > > > }
> > > >
> > > > @@ -1466,7 +1603,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
> > > > dev->h_busnr = dev->host.bus;
> > > > dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func);
> > > >
> > > > - pci_register_capability_handlers(pci_dev, NULL,
> > > > + pci_register_capability_handlers(pci_dev,
> > > > + assigned_device_pci_cap_read_config,
> > > > assigned_device_pci_cap_write_config);
> > >
> > > Maybe these could go away?
> >
> > Sure, pass a capability ID to the read/write_config functions and I'd
> > support this going way. I don't think that necessarily needs to be tied
> > to this series though. Thanks,
> >
> > Alex
>
> As in, can be a separate cleanup later. Yes.
>
> > > >
> > > > if (assigned_device_pci_cap_init(pci_dev) < 0)
> >
> >
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps
2010-11-12 15:42 ` Alex Williamson
@ 2010-11-16 16:08 ` Michael S. Tsirkin
0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-16 16:08 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Fri, Nov 12, 2010 at 08:42:38AM -0700, Alex Williamson wrote:
> On Fri, 2010-11-12 at 11:11 +0200, Michael S. Tsirkin wrote:
> > On Thu, Nov 11, 2010 at 11:30:07PM -0700, Alex Williamson wrote:
> > > On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote:
> > > > > Some drivers depend on finding capabilities like power management,
> > > > > PCI express/X, vital product data, or vendor specific fields. Now
> > > > > that we have better capability support, we can pass more of these
> > > > > tables through to the guest. Note that VPD and VNDR are direct pass
> > > > > through capabilies, the rest are mostly empty shells with a few
> > > > > writable bits where necessary.
> > > > >
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > ---
> > > > >
> > > > > hw/device-assignment.c | 160 +++++++++++++++++++++++++++++++++++++++++++++---
> > > > > 1 files changed, 149 insertions(+), 11 deletions(-)
> > > > >
> > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c
> > > > > index 179c7dc..1b228ad 100644
> > > > > --- a/hw/device-assignment.c
> > > > > +++ b/hw/device-assignment.c
> > > > > @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos)
> > > > > return (uint8_t)assigned_dev_pci_read(d, pos, 1);
> > > > > }
> > > > >
> > > > > +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, int len)
> > > > > +{
> > > > > + AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev);
> > > > > + ssize_t ret;
> > > > > + int fd = pci_dev->real_device.config_fd;
> > > > > +
> > > > > +again:
> > > > > + ret = pwrite(fd, &val, len, pos);
> > > > > + if (ret != len) {
> > > > > + if ((ret < 0) && (errno == EINTR || errno == EAGAIN))
> > > > > + goto again;
> > > >
> > > >
> > > > do {} while() ?
> > >
> > > Sure, this is just a copy of another place that does something similar.
> > > They should either be merged or both converted in a separate patch.
> > >
> > > > > +
> > > > > + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n",
> > > > > + __func__, ret, errno);
> > > > > +
> > > > > + exit(1);
> > > > > + }
> > > > > +
> > > > > + return;
> > > > > +}
> > > > > +
> > > > > static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap)
> > > > > {
> > > > > int id;
> > > > > @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
> > > > > #endif
> > > > > #endif
> > > > >
> > > > > +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev,
> > > > > + uint8_t cap_id,
> > > > > + uint32_t address, int len)
> > > > > +{
> > > > > + uint8_t cap;
> > > > > +
> > > > > + switch (cap_id) {
> > > > > +
> > > > > + case PCI_CAP_ID_VPD:
> > > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > > + if (address - cap >= PCI_CAP_FLAGS) {
> > > > > + return assigned_dev_pci_read(pci_dev, address, len);
> > > > > + }
> > > > > + break;
> > > > > +
> > > > > + case PCI_CAP_ID_VNDR:
> > > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > > + if (address - cap > PCI_CAP_FLAGS) {
> > > > > + return assigned_dev_pci_read(pci_dev, address, len);
> > > > > + }
> > > > > + break;
> > > > > + }
> > > > > +
> > > > > + return pci_default_cap_read_config(pci_dev, cap_id, address, len);
> > > > > +}
> > > > > +
> > > > > static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev,
> > > > > uint8_t cap_id,
> > > > > uint32_t address,
> > > > > uint32_t val, int len)
> > > > > {
> > > > > + uint8_t cap;
> > > > > +
> > > > > pci_default_cap_write_config(pci_dev, cap_id, address, val, len);
> > > > >
> > > > > switch (cap_id) {
> > > > > #ifdef KVM_CAP_IRQ_ROUTING
> > > > > case PCI_CAP_ID_MSI:
> > > > > #ifdef KVM_CAP_DEVICE_MSI
> > > > > - {
> > > > > - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > > > - if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > > > - assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > > > - }
> > > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > > + if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) {
> > > > > + assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS);
> > > > > }
> > > > > #endif
> > > > > break;
> > > > >
> > > > > case PCI_CAP_ID_MSIX:
> > > > > #ifdef KVM_CAP_DEVICE_MSIX
> > > > > - {
> > > > > - uint8_t cap = pci_find_capability(pci_dev, cap_id);
> > > > > - if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > > > > - assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > > > > - }
> > > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > > + if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) {
> > > > > + assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS);
> > > > > }
> > > > > #endif
> > > > > break;
> > > > > #endif
> > > > > +
> > > > > + case PCI_CAP_ID_VPD:
> > > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > > + if (address - cap >= PCI_CAP_FLAGS) {
> > > > > + assigned_dev_pci_write(pci_dev, address, val, len);
> > > > > + }
> > > > > + break;
> > > > > +
> > > > > + case PCI_CAP_ID_VNDR:
> > > > > + cap = pci_find_capability(pci_dev, cap_id);
> > > > > + if (address - cap > PCI_CAP_FLAGS) {
> > > > > + assigned_dev_pci_write(pci_dev, address, val, len);
> > > > > + }
> > > > > + break;
> > > >
> > > > I have a feeling we should use overlap functions instead of
> > > > address math. What do you think?
> > >
> > > if (!ranges_overlap(address - cap, len, 0, PCI_CAP_FLAGS)) ?
> >
> > ranges_overlap(address, len, cap, PCI_CAP_FLAGS)
> >
> > > Sure, that'd be a nice cleanup.
> > >
> > > > Also - put cap offsets in assigned device structure to avoid
> > > > find calls?
> > >
> > > I suppose there aren't enough capability IDs that it'd take much space
> > > to do so, but it doesn't sound like a unique to device assignment issue.
> > > Maybe that should live on PCIDevice with an access function.
> >
> > Sure, I put all caps that we actually emulate in PCIDevice.
> > So that would apply to express, pcix, etc.
> > Sticking offsets to caps that core doesn't emulate in PCIDevice
> > seems a bit strange. That's why each device has its own device state.
>
> The counter argument is that instead of sprinkling cap_msi, cap_msix,
> cap_pcie, cap_foo into PCIDevice as support gets added, it would add a
> lot of consistency to have a uint8_t caps[PCI_CAP_ID_MAX], then
> pci_find_capability simply becomes return pdev->caps[cap_id], and we can
> make more use of it.
Consider that express has 16 bit IDs too. That might make it a problem
if we try to use them as indexes.
--
MST
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] Re: [PATCH 0/8] PCI capability and device assignment improvements
2010-11-12 2:54 [Qemu-devel] [PATCH 0/8] PCI capability and device assignment improvements Alex Williamson
` (7 preceding siblings ...)
2010-11-12 2:56 ` [Qemu-devel] [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps Alex Williamson
@ 2010-11-12 5:39 ` Michael S. Tsirkin
8 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2010-11-12 5:39 UTC (permalink / raw)
To: Alex Williamson; +Cc: chrisw, qemu-devel, kvm
On Thu, Nov 11, 2010 at 07:54:49PM -0700, Alex Williamson wrote:
> This series attempts to clean up capability support between common
> code and device assignment. In doing so, we can move existing MSI &
> MSI-X capabilities to offsets matching real hardware, and further
> enable more capabilities to be exposed.
Very good cleanup overall. Some further suggestions posted.
Thanks!
> The last patch is only for RFC, I'd like some input on what we should
> pass directly and where we should only provide read-only/emulated
> access. Patches 1-7 are submitted for commit. Thanks,
>
> Alex
>
> ---
>
> Alex Williamson (8):
> device-assignment: pass through and stub more PCI caps
> pci: Pass ID for capability read/write handlers
> device-assignment: Move PCI capabilities to match physical hardware
> pci: Remove cap.length, cap.start, cap.supported
> pci: Replace used bitmap with capability byte map
> device-assignment: Use PCI capabilities support
> pci: Remove pci_enable_capability_support()
> pci: pci_default_cap_write_config ignores wmask
>
>
> hw/device-assignment.c | 273 ++++++++++++++++++++++++++++++++++++------------
> hw/pci.c | 103 +++++++-----------
> hw/pci.h | 25 ++--
> 3 files changed, 256 insertions(+), 145 deletions(-)
^ permalink raw reply [flat|nested] 26+ messages in thread