Date: Sun, 21 Nov 2010 18:38:31 +0200
From: "Michael S. Tsirkin"
Subject: Re: [Qemu-devel] Re: [PATCH] PCI: Bus number from the bridge, not the device
Message-ID: <20101121163830.GA26701@redhat.com>
References: <20101119203842.GA11108@redhat.com>
 <20101120201709.GA8388@redhat.com>
 <20101121083211.GB7948@redhat.com>
 <20101121095018.GA19477@redhat.com>
 <20101121101903.GC7948@redhat.com>
 <20101121115326.GB19477@redhat.com>
 <20101121125014.GD7948@redhat.com>
 <20101121144844.GA21647@redhat.com>
 <20101121160110.GE7948@redhat.com>
In-Reply-To: <20101121160110.GE7948@redhat.com>
To: Gleb Natapov
Cc: Isaku Yamahata, Alex Williamson, Markus Armbruster, qemu-devel@nongnu.org

On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote:
> On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote:
> > > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote:
> > > > > > The guests.
> > > > > Which one? There are many guests. Your favorite?
> > > > >
> > > > > > For CLI, we need an easy way to map a device in the guest to the
> > > > > > device in qemu and back.
> > > > > Then use eth0, /dev/sdb, or even C:. Your way is no less broken,
> > > > > since what you are saying is "let's use the name that the guest
> > > > > assigned to a device".
> > > >
> > > > No, I am saying let's use the name that our ACPI tables assigned.
> > > >
> > > ACPI does not assign any name. In the best case ACPI tables describe
> > > resources used by a device.
> >
> > Not only that. Bus number and segment aren't resources as such.
> > They describe addressing.
> >
> > > And not all guests qemu supports have support for ACPI. Qemu
> > > even supports machine types that do not support ACPI.
> >
> > So? Different machines -> different names.
> >
> You want to have a different CLI for each type of machine qemu
> supports?

Different device names.

> > > > > >
> > > > > >
> > > > > > > It looks like you identify yourself with most of
> > > > > > > qemu users, but if most qemu users are like you then qemu has
> > > > > > > not enough users :) Most users that consider themselves to be
> > > > > > > "advanced" may know what eth1 or /dev/sdb means. This doesn't
> > > > > > > mean we should provide a "device_del eth1" or
> > > > > > > "device_add /dev/sdb" command though.
> > > > > > >
> > > > > > > More important is that "domain" (encoded as a number like you
> > > > > > > used to) and "bus number" have no meaning inside qemu.
> > > > > > > So while I have said many times that I don't care too much
> > > > > > > about the exact CLI syntax, it should at least make sense.
> > > > > > > It can use an id to specify a PCI bus in the CLI like this:
> > > > > > > device_del pci.0:1.1. Or it can even use a device id too, like
> > > > > > > this: device_del pci.0:ide.0. Or it can use HW topology like in
> > > > > > > an OF device path. But doing ad-hoc device enumeration inside
> > > > > > > qemu and then using it for the CLI is not it.
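(As an aside: an id-based address like the above is also trivial to
parse. A minimal sketch, in C, of what such parsing could look like;
every name in it, struct and function names included, is made up for
illustration, and none of this is existing qemu code:)

/* Hypothetical: split "pci.0:1.1" or "pci.0:ide.0" into a bus id plus
 * either a numeric slot.func pair or a child device id. */
#include <stdio.h>
#include <string.h>

struct devaddr {
    char bus_id[32];        /* qdev id of the bus, e.g. "pci.0" */
    char dev_id[32];        /* child device id, e.g. "ide.0" */
    int slot, func;         /* used when the address is numeric */
    int by_name;            /* 1 if dev_id is valid, 0 if slot.func */
};

static int parse_devaddr(const char *spec, struct devaddr *a)
{
    const char *colon = strchr(spec, ':');

    if (!colon || (size_t)(colon - spec) >= sizeof(a->bus_id))
        return -1;
    memcpy(a->bus_id, spec, colon - spec);
    a->bus_id[colon - spec] = '\0';

    if (sscanf(colon + 1, "%d.%d", &a->slot, &a->func) == 2) {
        a->by_name = 0;         /* "pci.0:1.1" style */
        return 0;
    }
    snprintf(a->dev_id, sizeof(a->dev_id), "%s", colon + 1);
    a->by_name = 1;             /* "pci.0:ide.0" style */
    return 0;
}

(The point being that no guest-assigned number appears anywhere in it.)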
> > > > > > >
> > > > > > > > functionality in the guests. Qemu is buggy at the moment in
> > > > > > > > that it uses the bus addresses assigned by the guest and not
> > > > > > > > the ones in ACPI, but that can be fixed.
> > > > > > > It looks like you confused ACPI _SEG for something it isn't.
> > > > > >
> > > > > > Maybe I did. This is what linux does:
> > > > > >
> > > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root *root)
> > > > > > {
> > > > > >         struct acpi_device *device = root->device;
> > > > > >         int domain = root->segment;
> > > > > >         int busnum = root->secondary.start;
> > > > > >
> > > > > > And I think this is consistent with the spec.
> > > > > >
> > > > > It means that one domain may include several host bridges.
> > > > > At that level a domain is defined as something that has a unique
> > > > > name for each device inside it, thus no two buses in one
> > > > > segment/domain can have the same bus number. This is what the PCI
> > > > > spec tells you.
> > > >
> > > > And that really is enough for the CLI, because all we need is to
> > > > locate the specific slot in a unique way.
> > > >
> > > At the qemu level we do not have bus numbers. They are assigned by a
> > > guest. So inside a guest domain:bus:slot.func points you to a device,
> > > but qemu does not enumerate buses.
> > >
> > > > > And this further shows that using "domain" as defined by the guest
> > > > > is a very bad idea.
> > > >
> > > > As defined by ACPI, really.
> > > >
> > > ACPI is a part of guest software that may not even be present in the
> > > guest. How is it relevant?
> >
> > It's relevant because this is what guests use. To access the root
> > device with cf8/cfc you need to know the bus number assigned to it
> > by firmware. How that was assigned is of interest to the BIOS/ACPI but
> > not really interesting to the user or, I suspect, the guest OS.
> >
> Of course this is incorrect. The OS can re-enumerate PCI if it wishes.
> Linux has a command line option just for that.

I haven't looked, but I suspect Linux will simply assume cf8/cfc
and start doing it from there. If that doesn't get you the root device
you wanted, tough.

> And saying that ACPI is relevant because this is what guest software
> uses, in reply to a sentence that states that not all guests even use
> ACPI, is, well, strange.
>
> And ACPI describes only HW present at boot time. What if you
> hot-plugged a root pci bridge? How does non-existent PCI naming help you?

That's described by ACPI as well.

> > > > > > > The ACPI spec says that a PCI segment group is a purely
> > > > > > > software concept managed by system firmware. In fact one
> > > > > > > segment may include multiple PCI host bridges.
> > > > > >
> > > > > > It can't I think:
> > > > > Read the _BBN definition:
> > > > >  The _BBN object is located under a PCI host bridge and must be
> > > > >  unique for every host bridge within a segment since it is the PCI
> > > > >  bus number.
> > > > >
> > > > > Clearly the above speaks about multiple host bridges within a
> > > > > segment.
> > > >
> > > > Yes, it looks like the firmware spec allows that.
> > > It even has an explicit example that shows it.
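(To put that point in code: because _BBN is unique per bridge within a
segment, a (segment, bus) pair is enough to find the owning host
bridge. A self-contained sketch, not qemu code, with made-up example
numbers:)

/* Illustration: each host bridge owns a distinct bus range within its
 * segment, starting at its _BBN, so (segment, bus) identifies exactly
 * one bridge. */
#include <stddef.h>

struct host_bridge {
    int segment;            /* ACPI _SEG */
    int bus_start;          /* ACPI _BBN */
    int bus_end;            /* end of the bus range from _CRS */
};

static const struct host_bridge bridges[] = {
    { 0, 0x00, 0x7f },      /* segment 0, buses 0x00..0x7f */
    { 0, 0x80, 0xff },      /* segment 0, buses 0x80..0xff */
};

static const struct host_bridge *find_bridge(int seg, int bus)
{
    size_t i;

    for (i = 0; i < sizeof(bridges) / sizeof(bridges[0]); i++) {
        if (bridges[i].segment == seg &&
            bus >= bridges[i].bus_start && bus <= bridges[i].bus_end)
            return &bridges[i];
    }
    return NULL;            /* no bridge claims this (seg, bus) */
}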
> > >
> > > >
> > > > > > Multiple Host Bridges
> > > > > >
> > > > > >  A platform may have multiple PCI Express or PCI-X host bridges.
> > > > > >  The base address for the MMCONFIG space for these host bridges
> > > > > >  may need to be allocated at different locations. In such cases,
> > > > > >  using the MCFG table and _CBA method as defined in this section
> > > > > >  means that each of these host bridges must be in its own PCI
> > > > > >  Segment Group.
> > > > > >
> > > > > This is not from the ACPI spec,
> > > >
> > > > PCI Firmware Specification 3.0
> > > >
> > > > > but without going too deep into it, the above paragraph talks about
> > > > > a particular case where each host bridge must be in its own PCI
> > > > > Segment Group, which is definite proof that in other cases multiple
> > > > > host bridges can be in one segment group.
> > > >
> > > > I stand corrected. I think you are right. But note that if they are,
> > > > they must have distinct bus numbers assigned by ACPI.
> > > ACPI does not assign any numbers.
> >
> > For all pci root devices firmware must supply a _BBN number. This is
> > the bus number, isn't it? For nested buses, this is optional.
> Nonsense. _BBN is optional and is not present in the Seabios DSDT.

The spec says it's not optional for host bridges:

 Firmware must report Host Bridges in the ACPI name space. Each Host
 Bridge object must contain the following objects:
 ● _HID and _CID
 ● _CRS to determine all resources consumed and produced (passed
   through to the secondary bus) by the host bridge. Firmware allocates
   resources (Memory Addresses, I/O Port, etc.) to Host Bridges. The
   _CRS descriptor informs the operating system of the resources it may
   use for configuring devices below the Host Bridge.
 ● _TRA, _TTP, and _TRS translation offsets to inform the operating
   system of the mapping between the primary bus and the secondary bus.
 ● _PRT and the interrupt descriptor to determine interrupt routing.
 ● _BBN to obtain a bus number.

So Seabios seems to be out of spec.

> As far as I can tell it is only needed if a PCI segment group has more
> than one pci host bridge.

No. Because cf8/cfc are not aware of _SEG.

> >
> > > The BIOS enumerates buses and assigns numbers.
> >
> > There's no standard way to enumerate pci root devices in a guest AFAIK.
> > The spec says:
> >  Firmware must configure all Host Bridges in the system, even if
> >  they are not connected to a console or boot device. Firmware must
> >  configure Host Bridges in order to allow operating systems to use the
> >  devices below the Host Bridges. This is because the Host Bridges
> >  programming model is not defined by the PCI Specifications.
> >
> >
> A guest should be aware of HW to use it, be it through the BIOS or a
> driver.

Why should it? You take a bus number, stick it in cf8/cfc, and you get a
config cycle. No magic HW awareness needed.
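(To make that concrete, a minimal sketch of a type 1 configuration
read. This assumes x86 port I/O as exposed by glibc's <sys/io.h>, i.e.
outl/inl plus ioperm/iopl privileges; it is an illustration, not qemu
code:)

/* PCI type 1 config read: the bus number is just bits 16-23 of the
 * address written to 0xcf8. Nothing else about the host bridge needs
 * to be known. */
#include <stdint.h>
#include <sys/io.h>

#define PCI_CONF_ADDR 0xcf8
#define PCI_CONF_DATA 0xcfc

static uint32_t pci_conf_read(uint8_t bus, uint8_t dev,
                              uint8_t fn, uint8_t reg)
{
    uint32_t addr = (1u << 31)                    /* enable bit */
                  | ((uint32_t)bus << 16)
                  | ((uint32_t)(dev & 0x1f) << 11)
                  | ((uint32_t)(fn & 0x07) << 8)
                  | (reg & 0xfc);

    outl(addr, PCI_CONF_ADDR);
    return inl(PCI_CONF_DATA);
}

(Note that no notion of a segment appears anywhere in this interface,
which is the point made above.)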
> > > ACPI, in the best case, describes to the OSPM what the BIOS did.
> > > Qemu sits one layer below all this and does not enumerate PCI buses.
> > > Even if we made it do so, there is no way to guarantee that a guest
> > > will enumerate them in the same order, since there is more than one
> > > way to do enumeration. I have repeated this numerous times to you
> > > already.
> >
> > ACPI is really part of the motherboard. Calling it the guest just
> > confuses things. A guest OS can override bus numbering for nested buses
> > but not for root buses.
> >
> If calling ACPI part of a guest confuses you then you are already
> confused. A guest OS can do whatever it wishes with any enumeration FW
> did, if it knows better.
>
> > > >
> > > > > >
> > > > > > > _SEG is not what the OSPM uses to tie a HW resource to an ACPI
> > > > > > > resource. It uses _CRS (Current Resource Settings) for that,
> > > > > > > just like OF. No surprise there.
> > > > > >
> > > > > > OSPM uses both I think.
> > > > > >
> > > > > > All I see linux do with _CRS is get the bus number range.
> > > > > So let's assume that the HW has two PCI host bridges and ACPI has:
> > > > > Device(PCI0) {
> > > > >     Name (_HID, EisaId ("PNP0A03"))
> > > > >     Name (_SEG, 0x00)
> > > > > }
> > > > > Device(PCI1) {
> > > > >     Name (_HID, EisaId ("PNP0A03"))
> > > > >     Name (_SEG, 0x01)
> > > > > }
> > > > > I.e. no _CRS to describe resources. How do you think the OSPM knows
> > > > > which of the two pci host bridges is PCI0 and which one is PCI1?
> > > >
> > > > You must be able to uniquely address any bridge using the combination
> > > > of _SEG and _BBN.
> > >
> > > Not at all. And saying "you must be able" without actually showing how
> > > doesn't prove anything. _SEG is relevant only for those host bridges
> > > that support MMCONFIG (not all of them do, and none that qemu supports
> > > does yet). _SEG points to a specific entry in the MCFG table, and the
> > > MCFG entry holds the base address of the MMCONFIG space for the bridge
> > > (this address is configured by a guest). This is all _SEG does really,
> > > no magic at all. _BBN returns the bus number assigned by the BIOS to
> > > the host bridge. Nothing qemu-visible again.
> > > So _SEG and _BBN give you two numbers assigned by
> > > guest FW. Nothing qemu can use to identify a device.
> >
> > This FW is given to the guest by qemu. It only assigns bus numbers
> > because qemu told it to do so.
> Seabios is just a guest qemu ships. There are other FWs for qemu: Bochs
> bios, openfirmware, efi. All of them were developed outside of the qemu
> project and all of them are usable without qemu. You can't consider them
> part of qemu any more than Linux/Windows with virtio drivers.
>
> >
> > > >
> > > > > > And the spec says, e.g.:
> > > > > >
> > > > > >  the memory mapped configuration base address (always corresponds
> > > > > >  to bus number 0) for the PCI Segment Group of the host bridge is
> > > > > >  provided by _CBA and the bus range covered by the base address is
> > > > > >  indicated by the corresponding bus range specified in _CRS.
> > > > > >
> > > > > I don't see how it is relevant. And _CBA is defined only for PCI
> > > > > Express. Let's solve the problem for PCI first and then move to PCI
> > > > > Express. Jumping from one to the other distracts us from the main
> > > > > discussion.
> > > >
> > > > I think this is what confuses us. As long as you are using cf8/cfc
> > > > there's no concept of a domain really.
> > > > Thus:
> > > > /pci@i0cf8
> > > >
> > > > is probably enough for BIOS boot, because we'll need to make root bus
> > > > numbers unique for legacy guests/option ROMs. But this is not a
> > > > hardware requirement and might become easier to ignore with EFI.
> > > >
> > > You do not need MMCONFIG to have multiple PCI domains. You can have one
> > > configured via the standard cf8/cfc, another one on ef8/efc, and one
> > > more at mmio fce00000, and you can address all of them:
> > > /pci@i0cf8
> > > /pci@i0ef8
> > > /pci@fce00000
> > >
> > > And each one of those PCI domains can have 256 subbridges.
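(The same thing in code: a domain is nothing more than an independent
configuration access mechanism, so a bus number only means something
relative to one of them. A sketch with hypothetical io_conf_read and
mmio_conf_read helpers, declared but not defined here:)

/* Illustration: three PCI domains, each reached through its own
 * config mechanism. "Bus 0" names a different bus in each of them. */
#include <stdint.h>

enum conf_kind { CONF_IO, CONF_MMIO };

struct pci_domain {
    enum conf_kind kind;
    uint64_t base;               /* I/O port base or MMIO base */
};

static const struct pci_domain domains[] = {
    { CONF_IO,   0xcf8 },        /* /pci@i0cf8 */
    { CONF_IO,   0xef8 },        /* /pci@i0ef8 */
    { CONF_MMIO, 0xfce00000 },   /* /pci@fce00000 */
};

/* hypothetical accessors, one per mechanism */
uint32_t io_conf_read(uint16_t base, uint32_t addr);
uint32_t mmio_conf_read(uint64_t base, uint32_t addr);

static uint32_t conf_read(const struct pci_domain *d, uint8_t bus,
                          uint8_t dev, uint8_t fn, uint8_t reg)
{
    uint32_t addr = (1u << 31) | ((uint32_t)bus << 16)
                  | ((uint32_t)(dev & 0x1f) << 11)
                  | ((uint32_t)(fn & 0x07) << 8) | (reg & 0xfc);

    return d->kind == CONF_IO
         ? io_conf_read((uint16_t)d->base, addr)
         : mmio_conf_read(d->base, addr);
}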
> >
> > Will common guests such as windows or linux be able to use them? This
> With proper drivers, yes. There is HW with more than one PCI bus and I
> think qemu emulates some of it (PPC Mac for instance).
>
> > seems to be outside the scope of the PCI Firmware specification, which
> > says that bus numbers must be unique.
> They must be unique per PCI segment group.
>
> >
> > > > > >
> > > > > > > >
> > > > > > > > That should be enough for e.g. device_del. We do have the
> > > > > > > > need to describe the topology when we interface with
> > > > > > > > firmware, e.g. to describe the ACPI tables themselves to qemu
> > > > > > > > (this is what Gleb's patches deal with), but that's probably
> > > > > > > > the only case.
> > > > > > > >
> > > > > > > Describing HW topology is the only way to unambiguously
> > > > > > > describe a device to something or someone outside qemu and to
> > > > > > > have persistent device naming between different HW
> > > > > > > configurations.
> > > > > >
> > > > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > > >
> > > > > ACPI is part of the guest, not qemu.
> > > >
> > > > Yes, it runs in the guest, but it's generated by qemu. On real
> > > > hardware, it's supplied by the motherboard.
> > > >
> > > It is not generated by qemu. Parts of it depend on the HW and other
> > > parts depend on how the BIOS configures the HW. _BBN for instance is
> > > clearly defined to return the address assigned by the BIOS.
> >
> > The BIOS is supplied on the motherboard, and in our case by qemu as
> > well.
> You can replace the MB BIOS with coreboot+seabios on some of them.
> Manufacturers don't want you to do it and make it hard to do, but
> otherwise it is just software, not some magic dust.
>
> > There's no standard way for the BIOS to assign a bus number to the pci
> > root, so it does it in a device-specific way. Why should a management
> > tool or a CLI user care about these? As far as they are concerned
> > we could use some PV scheme to find root devices and assign bus
> > numbers, and it would be exactly the same.
> >
> Go write a KVM userspace that does that. AFAIK there is a project out
> there that tries to do that. No luck so far. Your world view is very
> x86/Linux centric. You need to broaden it a little bit. Next time you
> propose something, ask yourself whether it will work with qemu-sparc,
> qemu-ppc, qemu-amd.
>
>
> > > > > Just saying "not really" doesn't prove much. I still haven't seen
> > > > > any proposition from you that actually solves the problem. No,
> > > > > "let's use guest naming" is not it. There is no such thing as
> > > > > "The Guest".
> > > > >
> > > > > --
> > > > >         Gleb.
> > > >
> > > > I am sorry if I didn't make this clear. I think we should use the
> > > > domain:bus pair to name the root device. As these are unique and
> > > >
> > > You forgot to complete the sentence :) But you made it clear enough,
> > > and it is incorrect. domain:bus pairs are not only not unique, they do
> > > not exist in qemu at all
> >
> > Sure they do. domain maps to the mcfg address for express. bus is used
> > for
> mcfg is optional as far as I can see. You can compile out MMCONFIG
> support on Linux.
>
> > cf8/cfc addressing. They are assigned by the BIOS, but since the BIOS
> > is supplied with the hardware the point is moot.
> Most PC hardware is supplied with Windows, so what? BIOS is code that
> runs in a guest. It is part of a guest. Every line of code executed by a
> vcpu belongs to a guest. No need to redefine things to prove your point.
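(On the mcfg point just above: for express the domain-to-address
mapping is explicit. Each segment's MCFG entry supplies an ECAM base,
and bus/device/function/register then select a byte within that
region. A sketch of the standard address computation:)

/* ECAM (MMCONFIG) addressing: 4K of config space per function,
 * 8 functions per device, 32 devices per bus, 256 buses per segment. */
#include <stdint.h>

static uint64_t ecam_addr(uint64_t seg_base, uint8_t bus,
                          uint8_t dev, uint8_t fn, uint16_t reg)
{
    return seg_base
         | ((uint64_t)bus << 20)
         | ((uint64_t)(dev & 0x1f) << 15)
         | ((uint64_t)(fn & 0x07) << 12)
         | (reg & 0xfff);
}

(So the segment number itself never reaches the hardware; it only
selects which MCFG base address to use.)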
>
> >
> > > and as such can't be used to address a device. They are a product of
> > > HW enumeration done by a guest OS, just like eth0 or C:.
> > >
> > > --
> > >         Gleb.
> >
> > There's a huge difference between BIOS and guest OS,
> Not true.
>
> > and between bus numbers of the pci root and of nested bridges.
> Really? What is it?
>
> >
> > Describing hardware io ports makes sense if you are trying to
> > communicate data from qemu to the BIOS. But the rest of the world might
> > not care.
> >
> The part of the world that manages HW cares. You may need to add a
> device from the monitor before the first line of the BIOS is even
> executed. How can you rely on BIOS enumeration of devices in this case?
>
>
> --
>         Gleb.