From: Gleb Natapov <gleb@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: yamahata@valinux.co.jp,
Alex Williamson <alex.williamson@redhat.com>,
qemu-devel@nongnu.org, armbru@redhat.com
Subject: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
Date: Mon, 8 Nov 2010 19:27:49 +0200 [thread overview]
Message-ID: <20101108172749.GA19863@redhat.com> (raw)
In-Reply-To: <20101108170837.GI7962@redhat.com>
On Mon, Nov 08, 2010 at 07:08:37PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 08, 2010 at 07:00:15PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> > > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > > number from the secondary bus number offset of the device
> > > > > > instead of the bridge above the device. This ends of landing
> > > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > > is usually zero. pcibus_get_dev_path() copied this code,
> > > > > > inheriting the same bug. pcibus_get_dev_path() is used for
> > > > > > ramblock naming, so changing it can effect migration. However,
> > > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > > any issues.
> > > > > >
> > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > >
> > > > > Good catch. Applied.
> > > > > I don't really see why do we put the dev path
> > > > > in the bus object: why not let device supply its name?
> > > >
> > > > Because the device name is not unique. This came about from the
> > > > discussion about how to create a canonical device path that Gleb and
> > > > Markus are again trying to hash out. If we go up to the bus and get the
> > > > bus address, we have a VM unique name. Unfortunately, it's difficult to
> > > > define what the bus should print in all cases (ISA), but since they
> > > > don't do hotplug and typically don't allocate ramblocks, we can mostly
> > > > ignore it for this use case.
> > > >
> > > > > And I think this will affect nested bridges. However they are currently
> > > > > broken anyway: we really must convert to topological names as bus number
> > > > > is guest-assigned - they don't have to be unique, even.
> > > >
> > > > Yes, nested bridges are a problem. How can the seg/bus/devfn not be
> > > > unique?
> > >
> > > Bus numbers for nested bridges are guest assigned. We start with 0 after reset.
> > >
> > > > > What does fixing this involve? Just changing pcibus_get_dev_path?
> > > >
> > > > How do you plan to fix it? Don't forget that migration depends on these
> > > > names, so some kind of compatibility layer would be required. Thanks,
> > > >
> > > > Alex
> > >
> > > Replace bus number with slot numbers of parent bridges up to the root.
> > > This works for root bridge in a compatible way because bus number there
> > > is hard-coded to 0.
> > > IMO nested bridges are broken anyway, no way to be compatible there.
> > >
> > >
> > > Gleb, Markus, I think the following should be sufficient for PCI. What
> > > do you think? Also - do we need to update QMP/monitor to teach them to
> > > work with these paths?
> > >
> >
> > I am no longer use bus's get_dev_path callback for my purpose (I added
> > another one get_fw_dev_path)
>
> Why? Because get_dev_path is unstable? to avoid changing for migration?
Because to get functional get_dev_path() we need to implemented it
recursively (hint domain should not be part of PCI bus get_dev_path but
its parent bus instead). And since get_dev_path() is used in migration
code you have all or nothing situation. Either you implement it for all
devices properly or for none.
> So with this patch, you get to use it again.
Unfortunately not.
> IMO two ways to address devices is a bad idea.
>
> > since it is used for migration
>
> what is used for migration?
>
get_dev_path() callback is used to create unique instance id. Actually
it is the only use of it so simple grep will show you this :)
> > it should
> > be done for all buses to work properly. And by properly I mean produce
> > full path from system root to the device itself recursively.
>
> This is what the code does. Recursion is not needed here though.
>
Recursion is needed to create full device path. I am not interested in
only PCI here. In fact PCI one is the easiest.
> > But what I
> > learned is that by changing get_dev_path output you will break migration
> > from older guest to newer once (something we have to support).
>
> Well I think migration for sytems with nested buses are broken anyway:
> it's just a new feature where we simply did not figure out the migration
> andgle before the last release.
> Without nesting my code returns exactly the existing output
> so no, it's not broken.
>
It does not drops bus id part? If without bridges the output is exactly
same then migration is OK with the patch.
> > And of
> > course, as Alex said already, you need to traverse bridges recursively
>
> Well this is an implementation detail. loop is functionally equivalent,
> only cleaner and easier to understand.
>
With that I can agree. I am referring more to recursion down device
tree.
> > and
> > domain does not provide any meaningful information.
>
> I dont understand yet, sorry. Code I posted returns exactly
> what's there today. If domain does not provide any meaningful
> information, How do things work then? And how do you address
> roots on a system with multiple roots?
>
The current code is broken. Look at my patch series.
> > > This is on top of Alex's patch, completely untested.
> > >
> > >
> > > pci: fix device path for devices behind nested bridges
> > >
> > > We were using bus number in the device path, which is clearly
> > > broken as this number is guest-assigned for all devices
> > > except the root.
> > >
> > > Fix by using hierarchical list of slots, walking the path
> > > from root down to device, instead. Add :00 as bus number
> > > so that if there are no nested bridges, this is compatible
> > > with what we have now.
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > >
> > > diff --git a/hw/pci.c b/hw/pci.c
> > > index 7d12473..fa98d94 100644
> > > --- a/hw/pci.c
> > > +++ b/hw/pci.c
> > > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > >
> > > static char *pcibus_get_dev_path(DeviceState *dev)
> > > {
> > > - PCIDevice *d = (PCIDevice *)dev;
> > > - char path[16];
> > > -
> > > - snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > > - pci_find_domain(d->bus), pci_bus_num(d->bus),
> > > - PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > > -
> > > - return strdup(path);
> > > + PCIDevice *d = container_of(dev, PCIDevice, qdev);
> > > + PCIDevice *t;
> > > + int slot_depth;
> > > + /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> > > + * 00 is added here to make this format compatible with
> > > + * domain:Bus:Slot.Func for systems without nested PCI bridges.
> > > + * Slot list specifies the slot numbers for all devices on the
> > > + * path from root to the specific device. */
> > > + int domain_len = strlen("DDDD:00");
> > > + int func_len = strlen(".F");
> > > + int slot_len = strlen(":SS");
> > > + int path_len;
> > > + char *path, *p;
> > > +
> > > + /* Calculate # of slots on path between device and root. */;
> > > + slot_depth = 0;
> > > + for (t = d; t; t = t->bus->parent_dev)
> > > + ++slot_depth;
> > > +
> > > + path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> > > +
> > > + /* Allocate memory, fill in the terminating null byte. */
> > > + path = malloc(path_len + 1 /* For '\0' */);
> > > + path[path_len] = '\0';
> > > +
> > > + /* First field is the domain. */
> > > + snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> > > +
> > > + /* Leave space for slot numbers and fill in function number. */
> > > + p = path + domain_len + slot_len * slot_depth;
> > > + snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> > > +
> > > + /* Fill in slot numbers. We walk up from device to root, so need to print
> > > + * them in the reverse order, last to first. */
> > > + for (t = d; t; t = t->bus->parent_dev) {
> > > + p -= slot_len;
> > > + snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> > > + }
> > > +
> > > + return path;
> > > }
> > >
> >
> > --
> > Gleb.
--
Gleb.
next prev parent reply other threads:[~2010-11-08 17:28 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-04 21:53 [Qemu-devel] [PATCH] PCI: Bus number from the bridge, not the device Alex Williamson
2010-11-08 11:22 ` [Qemu-devel] " Michael S. Tsirkin
2010-11-08 14:52 ` Alex Williamson
2010-11-08 16:26 ` Michael S. Tsirkin
2010-11-08 16:36 ` Alex Williamson
2010-11-08 16:48 ` Michael S. Tsirkin
2010-11-08 17:00 ` Gleb Natapov
2010-11-08 17:08 ` Michael S. Tsirkin
2010-11-08 17:27 ` Gleb Natapov [this message]
2010-11-09 2:41 ` Isaku Yamahata
2010-11-09 11:53 ` Michael S. Tsirkin
2010-11-19 17:02 ` Markus Armbruster
2010-11-19 20:38 ` Gleb Natapov
2010-11-20 20:17 ` Michael S. Tsirkin
2010-11-21 8:32 ` Gleb Natapov
2010-11-21 9:50 ` Michael S. Tsirkin
2010-11-21 10:19 ` Gleb Natapov
2010-11-21 11:53 ` Michael S. Tsirkin
2010-11-21 12:50 ` Gleb Natapov
2010-11-21 14:48 ` Michael S. Tsirkin
2010-11-21 16:01 ` Gleb Natapov
2010-11-21 16:38 ` Michael S. Tsirkin
2010-11-21 17:28 ` Gleb Natapov
2010-11-21 18:22 ` Michael S. Tsirkin
2010-11-21 19:29 ` Gleb Natapov
2010-11-21 20:39 ` Michael S. Tsirkin
2010-11-22 7:37 ` Gleb Natapov
2010-11-22 8:16 ` Michael S. Tsirkin
2010-11-22 13:04 ` Gleb Natapov
2010-11-22 14:50 ` Michael S. Tsirkin
2010-11-22 14:52 ` Gleb Natapov
2010-11-22 14:56 ` Michael S. Tsirkin
2010-11-22 14:58 ` Gleb Natapov
2010-11-22 16:41 ` Michael S. Tsirkin
2010-11-22 17:01 ` Gleb Natapov
2010-12-13 20:04 ` Alex Williamson
2010-12-14 4:46 ` Michael S. Tsirkin
2010-12-14 4:49 ` Alex Williamson
2010-12-14 4:57 ` Michael S. Tsirkin
2010-12-14 5:04 ` Alex Williamson
2010-12-14 12:26 ` Michael S. Tsirkin
2010-12-14 18:34 ` Alex Williamson
2010-12-15 9:56 ` Michael S. Tsirkin
2010-12-15 15:27 ` Alex Williamson
2010-12-16 7:08 ` Isaku Yamahata
2010-12-16 8:36 ` Michael S. Tsirkin
2010-12-21 10:13 ` Isaku Yamahata
2010-12-21 10:56 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101108172749.GA19863@redhat.com \
--to=gleb@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=armbru@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yamahata@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.