All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: yamahata@valinux.co.jp,
	Alex Williamson <alex.williamson@redhat.com>,
	qemu-devel@nongnu.org, armbru@redhat.com
Subject: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
Date: Mon, 8 Nov 2010 19:08:37 +0200	[thread overview]
Message-ID: <20101108170837.GI7962@redhat.com> (raw)
In-Reply-To: <20101108170015.GA705@redhat.com>

On Mon, Nov 08, 2010 at 07:00:15PM +0200, Gleb Natapov wrote:
> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > number from the secondary bus number offset of the device
> > > > > instead of the bridge above the device.  This ends of landing
> > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > > ramblock naming, so changing it can effect migration.  However,
> > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > any issues.
> > > > > 
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > 
> > > > Good catch. Applied.
> > > > I don't really see why do we put the dev path
> > > > in the bus object: why not let device supply its name?
> > > 
> > > Because the device name is not unique.  This came about from the
> > > discussion about how to create a canonical device path that Gleb and
> > > Markus are again trying to hash out.  If we go up to the bus and get the
> > > bus address, we have a VM unique name.  Unfortunately, it's difficult to
> > > define what the bus should print in all cases (ISA), but since they
> > > don't do hotplug and typically don't allocate ramblocks, we can mostly
> > > ignore it for this use case.
> > > 
> > > > And I think this will affect nested bridges. However they are currently
> > > > broken anyway: we really must convert to topological names as bus number
> > > > is guest-assigned - they don't have to be unique, even.
> > > 
> > > Yes, nested bridges are a problem.  How can the seg/bus/devfn not be
> > > unique?
> > 
> > Bus numbers for nested bridges are guest assigned. We start with 0 after reset.
> > 
> > > > What does fixing this involve? Just changing pcibus_get_dev_path?
> > > 
> > > How do you plan to fix it?  Don't forget that migration depends on these
> > > names, so some kind of compatibility layer would be required.  Thanks,
> > > 
> > > Alex
> > 
> > Replace bus number with slot numbers of parent bridges up to the root.
> > This works for root bridge in a compatible way because bus number there
> > is hard-coded to 0.
> > IMO nested bridges are broken anyway, no way to be compatible there.
> > 
> > 
> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > work with these paths?
> > 
> 
> I am no longer use bus's get_dev_path callback for my purpose (I added
> another one get_fw_dev_path)

Why? Because get_dev_path is unstable? to avoid changing for migration?
So with this patch, you get to use it again.
IMO two ways to address devices is a bad idea.

> since it is used for migration

what is used for migration?

> it should
> be done for all buses to work properly. And by properly I mean produce
> full path from system root to the device itself recursively.

This is what the code does. Recursion is not needed here though.

> But what I
> learned is that by changing get_dev_path output you will break migration
> from older guest to newer once (something we have to support).

Well I think migration for sytems with nested buses are broken anyway:
it's just a new feature where we simply did not figure out the migration
andgle before the last release.
Without nesting my code returns exactly the existing output
so no, it's not broken.

> And of
> course, as Alex said already, you need to traverse bridges recursively

Well this is an implementation detail.  loop is functionally equivalent,
only cleaner and easier to understand.

> and
> domain does not provide any meaningful information.

I dont understand yet, sorry. Code I posted returns exactly
what's there today. If domain does not provide any meaningful
information, How do things work then? And how do you address
roots on a system with multiple roots?

> > This is on top of Alex's patch, completely untested.
> > 
> > 
> > pci: fix device path for devices behind nested bridges
> > 
> > We were using bus number in the device path, which is clearly
> > broken as this number is guest-assigned for all devices
> > except the root.
> > 
> > Fix by using hierarchical list of slots, walking the path
> > from root down to device, instead. Add :00 as bus number
> > so that if there are no nested bridges, this is compatible
> > with what we have now.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > diff --git a/hw/pci.c b/hw/pci.c
> > index 7d12473..fa98d94 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> >  
> >  static char *pcibus_get_dev_path(DeviceState *dev)
> >  {
> > -    PCIDevice *d = (PCIDevice *)dev;
> > -    char path[16];
> > -
> > -    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > -             pci_find_domain(d->bus), pci_bus_num(d->bus),
> > -             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > -
> > -    return strdup(path);
> > +    PCIDevice *d = container_of(dev, PCIDevice, qdev);
> > +    PCIDevice *t;
> > +    int slot_depth;
> > +    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> > +     * 00 is added here to make this format compatible with
> > +     * domain:Bus:Slot.Func for systems without nested PCI bridges.
> > +     * Slot list specifies the slot numbers for all devices on the
> > +     * path from root to the specific device. */
> > +    int domain_len = strlen("DDDD:00");
> > +    int func_len = strlen(".F");
> > +    int slot_len = strlen(":SS");
> > +    int path_len;
> > +    char *path, *p;
> > +
> > +    /* Calculate # of slots on path between device and root. */;
> > +    slot_depth = 0;
> > +    for (t = d; t; t = t->bus->parent_dev)
> > +        ++slot_depth;
> > +
> > +    path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> > +
> > +    /* Allocate memory, fill in the terminating null byte. */
> > +    path = malloc(path_len + 1 /* For '\0' */);
> > +    path[path_len] = '\0';
> > +
> > +    /* First field is the domain. */
> > +    snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> > +
> > +    /* Leave space for slot numbers and fill in function number. */
> > +    p = path + domain_len + slot_len * slot_depth;
> > +    snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> > +
> > +    /* Fill in slot numbers. We walk up from device to root, so need to print
> > +     * them in the reverse order, last to first. */
> > +    for (t = d; t; t = t->bus->parent_dev) {
> > +        p -= slot_len;
> > +        snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> > +    }
> > +
> > +    return path;
> >  }
> >  
> 
> --
> 			Gleb.

  reply	other threads:[~2010-11-08 17:08 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-04 21:53 [Qemu-devel] [PATCH] PCI: Bus number from the bridge, not the device Alex Williamson
2010-11-08 11:22 ` [Qemu-devel] " Michael S. Tsirkin
2010-11-08 14:52   ` Alex Williamson
2010-11-08 16:26     ` Michael S. Tsirkin
2010-11-08 16:36       ` Alex Williamson
2010-11-08 16:48         ` Michael S. Tsirkin
2010-11-08 17:00       ` Gleb Natapov
2010-11-08 17:08         ` Michael S. Tsirkin [this message]
2010-11-08 17:27           ` Gleb Natapov
2010-11-09  2:41       ` Isaku Yamahata
2010-11-09 11:53         ` Michael S. Tsirkin
2010-11-19 17:02           ` Markus Armbruster
2010-11-19 20:38             ` Gleb Natapov
2010-11-20 20:17               ` Michael S. Tsirkin
2010-11-21  8:32                 ` Gleb Natapov
2010-11-21  9:50                   ` Michael S. Tsirkin
2010-11-21 10:19                     ` Gleb Natapov
2010-11-21 11:53                       ` Michael S. Tsirkin
2010-11-21 12:50                         ` Gleb Natapov
2010-11-21 14:48                           ` Michael S. Tsirkin
2010-11-21 16:01                             ` Gleb Natapov
2010-11-21 16:38                               ` Michael S. Tsirkin
2010-11-21 17:28                                 ` Gleb Natapov
2010-11-21 18:22                               ` Michael S. Tsirkin
2010-11-21 19:29                                 ` Gleb Natapov
2010-11-21 20:39                                   ` Michael S. Tsirkin
2010-11-22  7:37                                     ` Gleb Natapov
2010-11-22  8:16                                       ` Michael S. Tsirkin
2010-11-22 13:04                                         ` Gleb Natapov
2010-11-22 14:50                                           ` Michael S. Tsirkin
2010-11-22 14:52                                             ` Gleb Natapov
2010-11-22 14:56                                               ` Michael S. Tsirkin
2010-11-22 14:58                                                 ` Gleb Natapov
2010-11-22 16:41                                                   ` Michael S. Tsirkin
2010-11-22 17:01                                                     ` Gleb Natapov
2010-12-13 20:04   ` Alex Williamson
2010-12-14  4:46     ` Michael S. Tsirkin
2010-12-14  4:49       ` Alex Williamson
2010-12-14  4:57         ` Michael S. Tsirkin
2010-12-14  5:04           ` Alex Williamson
2010-12-14 12:26             ` Michael S. Tsirkin
2010-12-14 18:34               ` Alex Williamson
2010-12-15  9:56                 ` Michael S. Tsirkin
2010-12-15 15:27                   ` Alex Williamson
2010-12-16  7:08                     ` Isaku Yamahata
2010-12-16  8:36                       ` Michael S. Tsirkin
2010-12-21 10:13                         ` Isaku Yamahata
2010-12-21 10:56                           ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101108170837.GI7962@redhat.com \
    --to=mst@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=gleb@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yamahata@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.