From: Gleb Natapov <gleb@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: yamahata@valinux.co.jp,
Alex Williamson <alex.williamson@redhat.com>,
qemu-devel@nongnu.org, armbru@redhat.com
Subject: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
Date: Mon, 8 Nov 2010 19:27:49 +0200 [thread overview]
Message-ID: <20101108172749.GA19863@redhat.com> (raw)
In-Reply-To: <20101108170837.GI7962@redhat.com>
On Mon, Nov 08, 2010 at 07:08:37PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 08, 2010 at 07:00:15PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> > > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > > number from the secondary bus number offset of the device
> > > > > > instead of the bridge above the device. This ends of landing
> > > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > > is usually zero. pcibus_get_dev_path() copied this code,
> > > > > > inheriting the same bug. pcibus_get_dev_path() is used for
> > > > > > ramblock naming, so changing it can effect migration. However,
> > > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > > any issues.
> > > > > >
> > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > >
> > > > > Good catch. Applied.
> > > > > I don't really see why do we put the dev path
> > > > > in the bus object: why not let device supply its name?
> > > >
> > > > Because the device name is not unique. This came about from the
> > > > discussion about how to create a canonical device path that Gleb and
> > > > Markus are again trying to hash out. If we go up to the bus and get the
> > > > bus address, we have a VM unique name. Unfortunately, it's difficult to
> > > > define what the bus should print in all cases (ISA), but since they
> > > > don't do hotplug and typically don't allocate ramblocks, we can mostly
> > > > ignore it for this use case.
> > > >
> > > > > And I think this will affect nested bridges. However they are currently
> > > > > broken anyway: we really must convert to topological names as bus number
> > > > > is guest-assigned - they don't have to be unique, even.
> > > >
> > > > Yes, nested bridges are a problem. How can the seg/bus/devfn not be
> > > > unique?
> > >
> > > Bus numbers for nested bridges are guest assigned. We start with 0 after reset.
> > >
> > > > > What does fixing this involve? Just changing pcibus_get_dev_path?
> > > >
> > > > How do you plan to fix it? Don't forget that migration depends on these
> > > > names, so some kind of compatibility layer would be required. Thanks,
> > > >
> > > > Alex
> > >
> > > Replace bus number with slot numbers of parent bridges up to the root.
> > > This works for root bridge in a compatible way because bus number there
> > > is hard-coded to 0.
> > > IMO nested bridges are broken anyway, no way to be compatible there.
> > >
> > >
> > > Gleb, Markus, I think the following should be sufficient for PCI. What
> > > do you think? Also - do we need to update QMP/monitor to teach them to
> > > work with these paths?
> > >
> >
> > I am no longer use bus's get_dev_path callback for my purpose (I added
> > another one get_fw_dev_path)
>
> Why? Because get_dev_path is unstable? to avoid changing for migration?
Because to get functional get_dev_path() we need to implemented it
recursively (hint domain should not be part of PCI bus get_dev_path but
its parent bus instead). And since get_dev_path() is used in migration
code you have all or nothing situation. Either you implement it for all
devices properly or for none.
> So with this patch, you get to use it again.
Unfortunately not.
> IMO two ways to address devices is a bad idea.
>
> > since it is used for migration
>
> what is used for migration?
>
get_dev_path() callback is used to create unique instance id. Actually
it is the only use of it so simple grep will show you this :)
> > it should
> > be done for all buses to work properly. And by properly I mean produce
> > full path from system root to the device itself recursively.
>
> This is what the code does. Recursion is not needed here though.
>
Recursion is needed to create full device path. I am not interested in
only PCI here. In fact PCI one is the easiest.
> > But what I
> > learned is that by changing get_dev_path output you will break migration
> > from older guest to newer once (something we have to support).
>
> Well I think migration for sytems with nested buses are broken anyway:
> it's just a new feature where we simply did not figure out the migration
> andgle before the last release.
> Without nesting my code returns exactly the existing output
> so no, it's not broken.
>
It does not drops bus id part? If without bridges the output is exactly
same then migration is OK with the patch.
> > And of
> > course, as Alex said already, you need to traverse bridges recursively
>
> Well this is an implementation detail. loop is functionally equivalent,
> only cleaner and easier to understand.
>
With that I can agree. I am referring more to recursion down device
tree.
> > and
> > domain does not provide any meaningful information.
>
> I dont understand yet, sorry. Code I posted returns exactly
> what's there today. If domain does not provide any meaningful
> information, How do things work then? And how do you address
> roots on a system with multiple roots?
>
The current code is broken. Look at my patch series.
> > > This is on top of Alex's patch, completely untested.
> > >
> > >
> > > pci: fix device path for devices behind nested bridges
> > >
> > > We were using bus number in the device path, which is clearly
> > > broken as this number is guest-assigned for all devices
> > > except the root.
> > >
> > > Fix by using hierarchical list of slots, walking the path
> > > from root down to device, instead. Add :00 as bus number
> > > so that if there are no nested bridges, this is compatible
> > > with what we have now.
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > >
> > > diff --git a/hw/pci.c b/hw/pci.c
> > > index 7d12473..fa98d94 100644
> > > --- a/hw/pci.c
> > > +++ b/hw/pci.c
> > > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > >
> > > static char *pcibus_get_dev_path(DeviceState *dev)
> > > {
> > > - PCIDevice *d = (PCIDevice *)dev;
> > > - char path[16];
> > > -
> > > - snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > > - pci_find_domain(d->bus), pci_bus_num(d->bus),
> > > - PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > > -
> > > - return strdup(path);
> > > + PCIDevice *d = container_of(dev, PCIDevice, qdev);
> > > + PCIDevice *t;
> > > + int slot_depth;
> > > + /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> > > + * 00 is added here to make this format compatible with
> > > + * domain:Bus:Slot.Func for systems without nested PCI bridges.
> > > + * Slot list specifies the slot numbers for all devices on the
> > > + * path from root to the specific device. */
> > > + int domain_len = strlen("DDDD:00");
> > > + int func_len = strlen(".F");
> > > + int slot_len = strlen(":SS");
> > > + int path_len;
> > > + char *path, *p;
> > > +
> > > + /* Calculate # of slots on path between device and root. */;
> > > + slot_depth = 0;
> > > + for (t = d; t; t = t->bus->parent_dev)
> > > + ++slot_depth;
> > > +
> > > + path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> > > +
> > > + /* Allocate memory, fill in the terminating null byte. */
> > > + path = malloc(path_len + 1 /* For '\0' */);
> > > + path[path_len] = '\0';
> > > +
> > > + /* First field is the domain. */
> > > + snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> > > +
> > > + /* Leave space for slot numbers and fill in function number. */
> > > + p = path + domain_len + slot_len * slot_depth;
> > > + snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> > > +
> > > + /* Fill in slot numbers. We walk up from device to root, so need to print
> > > + * them in the reverse order, last to first. */
> > > + for (t = d; t; t = t->bus->parent_dev) {
> > > + p -= slot_len;
> > > + snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> > > + }
> > > +
> > > + return path;
> > > }
> > >
> >
> > --
> > Gleb.
--
Gleb.
next prev parent reply other threads:[~2010-11-08 17:28 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-10-04 21:53 [Qemu-devel] [PATCH] PCI: Bus number from the bridge, not the device Alex Williamson
2010-11-08 11:22 ` [Qemu-devel] " Michael S. Tsirkin
2010-11-08 14:52 ` Alex Williamson
2010-11-08 16:26 ` Michael S. Tsirkin
2010-11-08 16:36 ` Alex Williamson
2010-11-08 16:48 ` Michael S. Tsirkin
2010-11-08 17:00 ` Gleb Natapov
2010-11-08 17:08 ` Michael S. Tsirkin
2010-11-08 17:27 ` Gleb Natapov [this message]
2010-11-09 2:41 ` Isaku Yamahata
2010-11-09 11:53 ` Michael S. Tsirkin
2010-11-19 17:02 ` Markus Armbruster
2010-11-19 20:38 ` Gleb Natapov
2010-11-20 20:17 ` Michael S. Tsirkin
2010-11-21 8:32 ` Gleb Natapov
2010-11-21 9:50 ` Michael S. Tsirkin
2010-11-21 10:19 ` Gleb Natapov
2010-11-21 11:53 ` Michael S. Tsirkin
2010-11-21 12:50 ` Gleb Natapov
2010-11-21 14:48 ` Michael S. Tsirkin
2010-11-21 16:01 ` Gleb Natapov
2010-11-21 16:38 ` Michael S. Tsirkin
2010-11-21 17:28 ` Gleb Natapov
2010-11-21 18:22 ` Michael S. Tsirkin
2010-11-21 19:29 ` Gleb Natapov
2010-11-21 20:39 ` Michael S. Tsirkin
2010-11-22 7:37 ` Gleb Natapov
2010-11-22 8:16 ` Michael S. Tsirkin
2010-11-22 13:04 ` Gleb Natapov
2010-11-22 14:50 ` Michael S. Tsirkin
2010-11-22 14:52 ` Gleb Natapov
2010-11-22 14:56 ` Michael S. Tsirkin
2010-11-22 14:58 ` Gleb Natapov
2010-11-22 16:41 ` Michael S. Tsirkin
2010-11-22 17:01 ` Gleb Natapov
2010-12-13 20:04 ` Alex Williamson
2010-12-14 4:46 ` Michael S. Tsirkin
2010-12-14 4:49 ` Alex Williamson
2010-12-14 4:57 ` Michael S. Tsirkin
2010-12-14 5:04 ` Alex Williamson
2010-12-14 12:26 ` Michael S. Tsirkin
2010-12-14 18:34 ` Alex Williamson
2010-12-15 9:56 ` Michael S. Tsirkin
2010-12-15 15:27 ` Alex Williamson
2010-12-16 7:08 ` Isaku Yamahata
2010-12-16 8:36 ` Michael S. Tsirkin
2010-12-21 10:13 ` Isaku Yamahata
2010-12-21 10:56 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20101108172749.GA19863@redhat.com \
--to=gleb@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=armbru@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=yamahata@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).