qemu-devel.nongnu.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Gleb Natapov <gleb@redhat.com>
Cc: Isaku Yamahata <yamahata@valinux.co.jp>,
	Alex Williamson <alex.williamson@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
Date: Sun, 21 Nov 2010 13:53:26 +0200
Message-ID: <20101121115326.GB19477@redhat.com>
In-Reply-To: <20101121101903.GC7948@redhat.com>

On Sun, Nov 21, 2010 at 12:19:03PM +0200, Gleb Natapov wrote:
> On Sun, Nov 21, 2010 at 11:50:18AM +0200, Michael S. Tsirkin wrote:
> > On Sun, Nov 21, 2010 at 10:32:11AM +0200, Gleb Natapov wrote:
> > > On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote:
> > > > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote:
> > > > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote:
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > > > > 
> > > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> > > > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > >> > Replace bus number with slot numbers of parent bridges up to the root.
> > > > > > >> > This works for root bridge in a compatible way because bus number there
> > > > > > >> > is hard-coded to 0.
> > > > > > >> > IMO nested bridges are broken anyway, no way to be compatible there.
> > > > > > >> > 
> > > > > > >> > 
> > > > > > >> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > > > > > >> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > > > > > >> > work with these paths?
> > > > > > >> > 
> > > > > > >> > This is on top of Alex's patch, completely untested.
> > > > > > >> > 
> > > > > > >> > 
> > > > > > >> > pci: fix device path for devices behind nested bridges
> > > > > > >> > 
> > > > > > >> > We were using bus number in the device path, which is clearly
> > > > > > >> > broken as this number is guest-assigned for all devices
> > > > > > >> > except the root.
> > > > > > >> > 
> > > > > > >> > Fix by using hierarchical list of slots, walking the path
> > > > > > >> > from root down to device, instead. Add :00 as bus number
> > > > > > >> > so that if there are no nested bridges, this is compatible
> > > > > > >> > with what we have now.
> > > > > > >> 
> > > > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> > > > > > >> because pci-to-pci bridge is pci function.
> > > > > > >> So the format should be
> > > > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> > > > > > >> 
> > > > > > >> thanks,
> > > > > > >
> > > > > > > Hmm, interesting. If we do this we aren't backwards compatible
> > > > > > > though, so maybe we could try using openfirmware paths, just as well.
> > > > > > 
> > > > > > Whatever we do, we need to make it work for all (qdevified) devices and
> > > > > > buses.
> > > > > > 
> > > > > > It should also be possible to use canonical addressing with device_add &
> > > > > > friends.  I.e. permit naming a device by (a unique abbreviation of) its
> > > > > > canonical address in addition to naming it by its user-defined ID.  For
> > > > > > instance, something like
> > > > > > 
> > > > > >    device_del /pci/@1,1
> > > > > > 
> > > > > FWIW openbios allows this kind of abbreviation.
> > > > > 
> > > > > > in addition to
> > > > > > 
> > > > > >    device_del ID
> > > > > > 
> > > > > > Open Firmware is a useful source of inspiration there, but should it
> > > > > > come into conflict with usability, we should let usability win.
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > > 
> > > > 
> > > > I think that the domain (PCI segment group), bus, slot, function way to
> > > > address pci devices is still the most familiar and the easiest to map to
> > > Most familiar to whom?
> > 
> > The guests.
> Which one? There are many guests. Your favorite?
> 
> > For CLI, we need an easy way to map a device in guest to the
> > device in qemu and back.
> Then use eth0, /dev/sdb, or even C:. Your way is no less broken since what
> you are saying is "let's use the name that the guest assigned to a device". 

No I am saying let's use the name that our ACPI tables assigned.

> > 
> > > It looks like you identify yourself with most of
> > > qemu users, but if most qemu users are like you then qemu has not enough
> > > users :) Most users that consider themselves to be "advanced" may know
> > > what eth1 or /dev/sdb means. This doesn't mean we should provide
> > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > 
> > > More important is that "domain" (encoded as number like you used to)
> > > and "bus number" have no meaning from inside qemu. So while I have said
> > > many times that I don't care too much about the exact CLI syntax, it
> > > should at least make sense. It can use an id to specify the PCI bus in
> > > the CLI, like this: device_del pci.0:1.1. Or it can even use a device id
> > > too, like this: device_del pci.0:ide.0. Or it can use the HW topology,
> > > as in an OF device path. But doing ad-hoc device enumeration inside qemu
> > > and then using it for the CLI is not it.
> > > 
> > > > functionality in the guests.  Qemu is buggy at the moment in that it
> > > > uses the bus addresses assigned by guest and not the ones in ACPI,
> > > > but that can be fixed.
> > > It looks like you confused ACPI _SEG for something it isn't.
> > 
> > Maybe I did. This is what linux does:
> > 
> > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
> > {
> >         struct acpi_device *device = root->device;
> >         int domain = root->segment;
> >         int busnum = root->secondary.start;
> > 
> > And I think this is consistent with the spec.
> > 
> It means that one domain may include several host bridges. At that level,
> a domain is defined as something that has a unique name for each device
> inside it, thus no two buses in one segment/domain can have the same bus
> number. This is what the PCI spec tells you. 

And that really is enough for CLI because all we need is to locate the
specific slot in a unique way.

> And this further shows that using "domain" as defined by guest is very
> bad idea. 

As defined by ACPI, really.

> > > ACPI spec
> > > says that PCI segment group is purely software concept managed by system
> > > firmware. In fact one segment may include multiple PCI host bridges.
> > 
> > It can't I think:
> Read _BBN definition:
>  The _BBN object is located under a PCI host bridge and must be unique for
>  every host bridge within a segment since it is the PCI bus number.
> 
> Clearly above speaks about multiple host bridge within a segment.

Yes, it looks like the firmware spec allows that.

> > 	Multiple Host Bridges
> > 
> > 	A platform may have multiple PCI Express or PCI-X host bridges. The base
> > 	address for the MMCONFIG space for these host bridges may need to be
> > 	allocated at different locations. In such cases, using MCFG table and
> > 	_CBA method as defined in this section means that each of these host
> > 	bridges must be in its own PCI Segment Group.
> > 
> This is not from ACPI spec,

PCI Firmware Specification 3.0

> but without going too deep into it, the above
> paragraph talks about some particular case when each host bridge must
> be in its own PCI Segment Group, which is definite proof that in other
> cases multiple host bridges can be in one segment group.

I stand corrected. I think you are right. But note that if they are,
they must have distinct bus numbers assigned by ACPI.

> > 
> > > _SEG
> > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> > > (Current Resource Settings) for that just like OF. No surprise there.
> > 
> > OSPM uses both I think.
> > 
> > All I see linux do with CRS is get the bus number range.
> So let's assume that HW has two PCI host bridges and ACPI has:
>         Device(PCI0) {
>             Name (_HID, EisaId ("PNP0A03"))
>             Name (_SEG, 0x00)
>         }
>         Device(PCI1) {
>             Name (_HID, EisaId ("PNP0A03"))
>             Name (_SEG, 0x01)
>         }
> I.e no _CRS to describe resources. How do you think OSPM knows which of
> two pci host bridges is PCI0 and which one is PCI1?

You must be able to uniquely address any bridge using the combination of _SEG
and _BBN.
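
To spell out what that uniqueness constraint amounts to, here is a rough sketch. The struct and function names are made up for illustration; this is not qemu or Linux code, and it only assumes that each host bridge reports one _SEG value and one _BBN value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what one host bridge reports via ACPI. */
struct host_bridge_id {
    uint16_t seg;   /* value of _SEG (PCI segment group) */
    uint8_t  bbn;   /* value of _BBN (base bus number)   */
};

/* Return true if no two host bridges share the same (_SEG, _BBN) pair,
 * i.e. the pair is usable as a unique name for a root bus. */
bool host_bridge_ids_unique(const struct host_bridge_id *ids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (ids[i].seg == ids[j].seg && ids[i].bbn == ids[j].bbn) {
                return false;
            }
        }
    }
    return true;
}
```

Two bridges in the same segment pass this check only if their base bus numbers differ, which is the case discussed above.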

> > And the spec says, e.g.:
> > 
> > 	  the memory mapped configuration base
> > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > 	of the host bridge is provided by _CBA and the bus range covered by the
> > 	base address is indicated by the corresponding bus range specified in
> > 	_CRS.
> > 
> Don't see how it is relevant. And _CBA is defined only for PCI Express. Let's
> solve the problem for PCI first and then move to PCI Express. Jumping from one
> to another distracts us from the main discussion.

I think this is what confuses us.  As long as you are using cf8/cfc there's no
concept of a domain really.
Thus:
	/pci@i0cf8

is probably enough for BIOS boot because we'll need to make root bus numbers
unique for legacy guests/option ROMs.  But this is not a hardware requirement
and might become easier to ignore with EFI.
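
To illustrate the cf8/cfc point: the value written to port 0xcf8 in the legacy configuration mechanism has fields for bus, slot, function and register only, so there is simply nowhere to put a segment. A minimal sketch follows; the helper name is made up for this mail and is not taken from qemu.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 32-bit address written to I/O port
 * 0xcf8 before the data access through port 0xcfc. Note that there is
 * no domain/segment field anywhere in the layout. */
uint32_t pci_cf8_address(uint8_t bus, uint8_t slot,
                         uint8_t func, uint8_t reg)
{
    assert(slot < 32);  /* slot (device) number is 5 bits */
    assert(func < 8);   /* function number is 3 bits      */

    return UINT32_C(0x80000000)         /* bit 31: enable bit        */
         | ((uint32_t)bus  << 16)       /* bits 23:16: bus number    */
         | ((uint32_t)slot << 11)       /* bits 15:11: slot number   */
         | ((uint32_t)func << 8)        /* bits 10:8:  function      */
         | ((uint32_t)reg & 0xfcu);     /* bits 7:2:   dword register */
}
```

With only 8 bits of bus number available, every root bus reachable this way has to sit in one flat bus number space, which is why the root bus numbers need to be unique for legacy guests.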

> > 
> > > > 
> > > > That should be enough for e.g. device_del. We do have the need to
> > > > describe the topology when we interface with firmware, e.g. to describe
> > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > with), but that's probably the only case.
> > > > 
> > > Describing HW topology is the only way to unambiguously describe device to
> > > something or someone outside qemu and have persistent device naming
> > > between different HW configuration.
> > 
> > Not really, since ACPI is a binary blob programmed by qemu.
> > 
> ACPI is part of the guest, not qemu.

Yes it runs in the guest but it's generated by qemu. On real hardware,
it's supplied by the motherboard.

> Just saying "not really" doesn't
> prove much. I still haven't seen any proposition from you that actually
> solves the problem. No, "let's use guest naming" is not it. There is no
> such thing as "The Guest". 
> 
> --
> 			Gleb.

I am sorry if I didn't make this clear.  I think we should use the domain:bus
pair to name the root device. As these are unique and 

-- 
MST

