From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <5594CB7B.5020002@linux.vnet.ibm.com>
Date: Thu, 02 Jul 2015 13:26:19 +0800
From: Hong Bo Li
MIME-Version: 1.0
References: <20150701111650-mutt-send-email-mst@redhat.com> <5593BB28.7060904@linux.vnet.ibm.com> <20150701122334-mutt-send-email-mst@redhat.com> <5593CAEA.2000301@linux.vnet.ibm.com> <20150701131635-mutt-send-email-mst@redhat.com> <5593D2F9.6010600@linux.vnet.ibm.com> <20150701134925-mutt-send-email-mst@redhat.com> <5593E04C.3080204@linux.vnet.ibm.com> <20150701153151-mutt-send-email-mst@redhat.com> <5594A89E.10805@linux.vnet.ibm.com> <20150702070942-mutt-send-email-mst@redhat.com>
In-Reply-To: <20150702070942-mutt-send-email-mst@redhat.com>
Content-Type:
 text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 1/1] KVM s390 pci infrastructure modelling
To: "Michael S. Tsirkin"
Cc: cornelia.huck@de.ibm.com, borntraeger@de.ibm.com, sebott@linux.vnet.ibm.com, qemu-devel@nongnu.org, agraf@suse.de

On 7/2/2015 13:13, Michael S. Tsirkin wrote:
> On Thu, Jul 02, 2015 at 10:57:34AM +0800, Hong Bo Li wrote:
>>
>> On 7/1/2015 21:37, Michael S. Tsirkin wrote:
>>> On Wed, Jul 01, 2015 at 08:42:52PM +0800, Hong Bo Li wrote:
>>>> On 7/1/2015 19:57, Michael S. Tsirkin wrote:
>>>>> On Wed, Jul 01, 2015 at 07:46:01PM +0800, Hong Bo Li wrote:
>>>>>> On 7/1/2015 19:23, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Jul 01, 2015 at 07:11:38PM +0800, Hong Bo Li wrote:
>>>>>>>> On 7/1/2015 18:36, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Jul 01, 2015 at 06:04:24PM +0800, Hong Bo Li wrote:
>>>>>>>>>> On 7/1/2015 17:22, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Wed, Jul 01, 2015 at 05:13:11PM +0800, Hong Bo Li wrote:
>>>>>>>>>>>> On 7/1/2015 16:05, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Wed, Jul 01, 2015 at 03:56:25PM +0800, Hong Bo Li wrote:
>>>>>>>>>>>>>> On 7/1/2015 14:22, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>>> On Tue, Jun 30, 2015 at 02:16:59PM +0800, Hong Bo Li wrote:
>>>>>>>>>>>>>>>> On 6/29/2015 18:01, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>>>>> On Mon, Jun 29, 2015 at 05:24:53PM +0800, Hong Bo Li wrote:
>>>>>>>>>>>>>>>>>> This patch introduces a new facility (and bus)
>>>>>>>>>>>>>>>>>> to hold devices representing information actually
>>>>>>>>>>>>>>>>>> provided by s390 firmware and I/O configuration.
>>>>>>>>>>>>>>>>>> usage example:
>>>>>>>>>>>>>>>>>> -device s390-pcihost
>>>>>>>>>>>>>>>>>> -device vfio-pci,host=0000:00:00.0,id=vpci1
>>>>>>>>>>>>>>>>>> -device zpci,fid=2,uid=5,pci_id=vpci1,id=zpci1
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The first line will create an s390 pci host bridge
>>>>>>>>>>>>>>>>>> and init the root bus. The second line will create
>>>>>>>>>>>>>>>>>> a standard vfio pci device, and attach it to the
>>>>>>>>>>>>>>>>>> root bus. This is similar to the standard process
>>>>>>>>>>>>>>>>>> of defining a pci device on other platforms.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The third line will create an s390 pci device to
>>>>>>>>>>>>>>>>>> store s390-specific information, and references
>>>>>>>>>>>>>>>>>> the corresponding vfio pci device via device id.
>>>>>>>>>>>>>>>>>> We create an s390 pci facility bus to hold all the
>>>>>>>>>>>>>>>>>> zpci devices.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Signed-off-by: Hong Bo Li
>>>>>>>>>>>>>>>>> It's mostly up to s390 maintainers, but I'd like to note
>>>>>>>>>>>>>>>>> one thing below
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>  hw/s390x/s390-pci-bus.c    | 314 +++++++++++++++++++++++++++++++++------------
>>>>>>>>>>>>>>>>>>  hw/s390x/s390-pci-bus.h    |  48 ++++++-
>>>>>>>>>>>>>>>>>>  hw/s390x/s390-pci-inst.c   |   4 +-
>>>>>>>>>>>>>>>>>>  hw/s390x/s390-virtio-ccw.c |   5 +-
>>>>>>>>>>>>>>>>>>  4 files changed, 283 insertions(+), 88 deletions(-)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>>>>>>>>>>>>>>>> index 560b66a..d5e7b2e 100644
>>>>>>>>>>>>>>>>>> --- a/hw/s390x/s390-pci-bus.c
>>>>>>>>>>>>>>>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>>>>>>>>>>>>>>>> @@ -32,8 +32,8 @@ int chsc_sei_nt2_get_event(void *res)
>>>>>>>>>>>>>>>>>>      PciCcdfErr *eccdf;
>>>>>>>>>>>>>>>>>>      int rc = 1;
>>>>>>>>>>>>>>>>>>      SeiContainer *sei_cont;
>>>>>>>>>>>>>>>>>> -    S390pciState *s = S390_PCI_HOST_BRIDGE(
>>>>>>>>>>>>>>>>>> -        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
>>>>>>>>>>>>>>>>>> +    S390PCIFacility *s = S390_PCI_FACILITY(
>>>>>>>>>>>>>>>>>> +        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      if (!s) {
>>>>>>>>>>>>>>>>>>          return rc;
>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>> @@ -72,8 +72,8 @@ int chsc_sei_nt2_get_event(void *res)
>>>>>>>>>>>>>>>>>>  int chsc_sei_nt2_have_event(void)
>>>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>>> -    S390pciState *s = S390_PCI_HOST_BRIDGE(
>>>>>>>>>>>>>>>>>> -        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
>>>>>>>>>>>>>>>>>> +    S390PCIFacility *s = S390_PCI_FACILITY(
>>>>>>>>>>>>>>>>>> +        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      if (!s) {
>>>>>>>>>>>>>>>>>>          return 0;
>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>> @@ -82,20 +82,32 @@ int chsc_sei_nt2_have_event(void)
>>>>>>>>>>>>>>>>>>      return !QTAILQ_EMPTY(&s->pending_sei);
>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> +void s390_pci_device_enable(S390PCIBusDevice *zpci)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +    zpci->fh = zpci->fh | 1 << ENABLE_BIT_OFFSET;
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>> +void s390_pci_device_disable(S390PCIBusDevice *zpci)
>>>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>>>> +    zpci->fh = zpci->fh & ~(1 << ENABLE_BIT_OFFSET);
>>>>>>>>>>>>>>>>>> +    if (zpci->is_unplugged)
>>>>>>>>>>>>>>>>>> +        object_unparent(OBJECT(zpci));
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>  S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid)
>>>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>>>      S390PCIBusDevice *pbdev;
>>>>>>>>>>>>>>>>>> -    int i;
>>>>>>>>>>>>>>>>>> -    S390pciState *s = S390_PCI_HOST_BRIDGE(
>>>>>>>>>>>>>>>>>> -        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
>>>>>>>>>>>>>>>>>> +    BusChild *kid;
>>>>>>>>>>>>>>>>>> +    S390PCIFacility *s = S390_PCI_FACILITY(
>>>>>>>>>>>>>>>>>> +        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      if (!s) {
>>>>>>>>>>>>>>>>>>          return NULL;
>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -    for (i = 0; i < PCI_SLOT_MAX; i++) {
>>>>>>>>>>>>>>>>>> -        pbdev = &s->pbdev[i];
>>>>>>>>>>>>>>>>>> -        if ((pbdev->fh != 0) && (pbdev->fid == fid)) {
>>>>>>>>>>>>>>>>>> +    QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) {
>>>>>>>>>>>>>>>>>> +        pbdev = (S390PCIBusDevice *)kid->child;
>>>>>>>>>>>>>>>>>> +        if (pbdev->fid == fid) {
>>>>>>>>>>>>>>>>>>              return pbdev;
>>>>>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>> @@ -126,39 +138,24 @@ void s390_pci_sclp_configure(int configure, SCCB *sccb)
>>>>>>>>>>>>>>>>>>      return;
>>>>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
>>>>>>>>>>>>>>>>>> -{
>>>>>>>>>>>>>>>>>> -    return PCI_SLOT(pdev->devfn);
>>>>>>>>>>>>>>>>>> -}
>>>>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>>>> -static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
>>>>>>>>>>>>>>>>>> -{
>>>>>>>>>>>>>>>>>> -    return PCI_SLOT(pdev->devfn) | FH_VIRT;
>>>>>>>>>>>>>>>>>> -}
>>>>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>>>>  S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
>>>>>>>>>>>>>>>>>>  {
>>>>>>>>>>>>>>>>>>      S390PCIBusDevice *pbdev;
>>>>>>>>>>>>>>>>>> -    int i;
>>>>>>>>>>>>>>>>>> -    int j = 0;
>>>>>>>>>>>>>>>>>> -    S390pciState *s = S390_PCI_HOST_BRIDGE(
>>>>>>>>>>>>>>>>>> -        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
>>>>>>>>>>>>>>>>>> +    BusChild *kid;
>>>>>>>>>>>>>>>>>> +    int i = 0;
>>>>>>>>>>>>>>>>>> +    S390PCIFacility *s = S390_PCI_FACILITY(
>>>>>>>>>>>>>>>>>> +        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      if (!s) {
>>>>>>>>>>>>>>>>>>          return NULL;
>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -    for (i = 0; i < PCI_SLOT_MAX; i++) {
>>>>>>>>>>>>>>>>>> -        pbdev = &s->pbdev[i];
>>>>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>>>> -        if (pbdev->fh == 0) {
>>>>>>>>>>>>>>>>>> -            continue;
>>>>>>>>>>>>>>>>>> -        }
>>>>>>>>>>>>>>>>>> -
>>>>>>>>>>>>>>>>>> -        if (j == idx) {
>>>>>>>>>>>>>>>>>> +    QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) {
>>>>>>>>>>>>>>>>>> +        pbdev = (S390PCIBusDevice *)kid->child;
>>>>>>>>>>>>>>>>>> +        if (i == idx) {
>>>>>>>>>>>>>>>>>>              return pbdev;
>>>>>>>>>>>>>>>>>>          }
>>>>>>>>>>>>>>>>>> -        j++;
>>>>>>>>>>>>>>>>>> +        i++;
>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>      return NULL;
>>>>>>>>>>>>>>>>> This relies on the order of children on the qbus, and that's wrong I think.
>>>>>>>>>>>>>>>>> Generally I'm not sure why you convert all slot lookups to child
>>>>>>>>>>>>>>>>> lookups: more code to achieve the same effect?
>>>>>>>>>>>>>>>> Thank you, Michael.
>>>>>>>>>>>>>>>> I made the change for two reasons:
>>>>>>>>>>>>>>>> 1. The old implementation only supports one s390 pci root bus, and
>>>>>>>>>>>>>>>> 32 (PCI_SLOT_MAX) slots at most. So when it comes to multiple s390
>>>>>>>>>>>>>>>> pci root buses, the old code does not work.
>>>>>>>>>>>>>>>> 2. Now the zpci device "S390PCIBusDevice" is only a structure to store
>>>>>>>>>>>>>>>> s390-specific information, so we can attach all the zpci devices to an
>>>>>>>>>>>>>>>> s390 pci facility bus. Since these zpci devices have no relation to a
>>>>>>>>>>>>>>>> "slot", their order does not matter.
>>>>>>>>>>>>>>> But you make this order guest-visible, which seems wrong.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The guest uses an s390-specific "list pci" instruction to get all the
>>>>>>>>>>>>>> zpci devices, and will create a root s390 pci bus for each device. So
>>>>>>>>>>>>>> the order has no relation to the pci topology on the guest.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we assign too many zpci devices to one guest, the "list pci"
>>>>>>>>>>>>>> instruction will use a resume token to get all the zpci devices. For
>>>>>>>>>>>>>> example, the first time we return 32 zpci devices to the guest; the
>>>>>>>>>>>>>> next time we return another 32 zpci devices. The resume token is used
>>>>>>>>>>>>>> to store the beginning of the zpci devices that will be returned to
>>>>>>>>>>>>>> the guest next time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So, if we change the order of the zpci devices on the s390 facility
>>>>>>>>>>>>>> bus, it may change the "batch" in which a device is returned to the
>>>>>>>>>>>>>> guest. But this will not change the pci topology on the guest.
>>>>>>>>>>>>> Yes, but that's still guest visible, and will break,
>>>>>>>>>>>>> for example, if the guest is migrated between qemu instances
>>>>>>>>>>>>> where the list order is different precisely when
>>>>>>>>>>>>> it's enumerating the bus.
>>>>>>>>>>>>>
>>>>>>>>>>>> Yes, and the list order is not the only s390-specific information
>>>>>>>>>>>> exposed to the guest. Besides that, we would need to migrate all other
>>>>>>>>>>>> zpci information. For now, we have no plan to support zpci migration yet.
>>>>>>>>>>> BTW how will hotplug work? If it happens while the guest
>>>>>>>>>>> enumerates the bus then naturally all index values
>>>>>>>>>>> become invalid.
>>>>>>>>>> The list zpci only happens when the guest is doing pci_base_init() for s390.
>>>>>>>>>> At that moment, hotplug does not work yet.
>>>>>>>>> You can't prevent this: the user can request hotplug at this time.
>>>>>>>>>
>>>>>>>>>> And assuming we have
>>>>>>>>>> that case, we still have the index issue even when scanning a standard
>>>>>>>>>> pci bus. Please see my following words.
>>>>>>>>>>
>>>>>>>>>>> Just don't expose internal qdev data structures to the guest.
>>>>>>>>>>> It's not by chance that we don't have a look-up-by-index
>>>>>>>>>>> capability; it's an attempt to enforce sane usage.
>>>>>>>>>>> You are misusing the API with your hack.
>>>>>>>>>> The resume token of list zpci is indeed an index of iteration :(
>>>>>>>>>>
>>>>>>>>>>> PCI has standard ways to enumerate the bus, maybe you
>>>>>>>>>>> should emulate it. Or find some other way that works.
>>>>>>>>>>> The idea to poke at s->fbus->qbus and count things there
>>>>>>>>>>> is a bad one.
>>>>>>>>>>>
>>>>>>>>>> I can define multiple zpci buses, and attach a zpci device to a slot of
>>>>>>>>>> a root bus. Then I need to add an api to the common pci code to do the
>>>>>>>>>> scan of all the pci host bridges. And in this way, it still has the
>>>>>>>>>> index issue: I need to scan from the first bus to count the index. So
>>>>>>>>>> any suggestion from you?
>>>>>>>>> OK, I looked at arch/s390/pci/pci.c.
>>>>>>>>> First of all, it seems to run the regular PCI thing on bridges.
>>>>>>>>>
>>>>>>>>> zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops,
>>>>>>>>>                               zdev, &resources);
>>>>>>>> At this moment, the guest has got all the zpci devices through the clp
>>>>>>>> list zpci instruction. For each device, pci_scan_root_bus() will create
>>>>>>>> a root bus. So for s390, we get the pci devices first, then create a new
>>>>>>>> root bus for each one.
>>>>>>> I don't see this in the guest code.
>>>>>>>
>>>>>>> I looked at pci_scan_root_bus and it's completely generic.
>>>>>>> It sets up the bus:
>>>>>>> 	b = pci_create_root_bus(parent, bus, ops, sysdata, resources);
>>>>>>>
>>>>>>> then it scans it:
>>>>>>> 	max = pci_scan_child_bus(b);
>>>>>>>
>>>>>>> that one does
>>>>>>> 	/* Go find them, Rover! */
>>>>>>> 	for (devfn = 0; devfn < 0x100; devfn += 8)
>>>>>>> 		pci_scan_slot(bus, devfn);
>>>>>>>
>>>>>>> next
>>>>>>> 	dev = pci_scan_single_device(bus, devfn);
>>>>>>>
>>>>>>> and so on. Eventually you get
>>>>>>> 	if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000))
>>>>>>> 		return NULL;
>>>>>>>
>>>>>>> and that one does the clp thing using zpci_cfg_load.
>>>>>>>
>>>>>> pci_base_init() -> clp_scan_pci_devices():
>>>>>> 	rc = clp_list_pci(rrb, __clp_add);
>>>>>> In this function, there is a while loop to get all the zpci devices by
>>>>>> means of the resume token (index). And for each device,
>>>>>> 	__clp_add() -> clp_add_pci_device();
>>>>>> In clp_add_pci_device(), we use the zpci information to create a struct
>>>>>> zpci_dev zdev.
>>>>>> Then zpci_create_device() -> zpci_scan_bus() -> pci_scan_root_bus():
>>>>>> 	zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops,
>>>>>> 				      zdev, &resources);
>>>>>> So, you see, each zdev has its own root bus. And there is no child bus
>>>>>> under that root bus.
>>>>> Right - zdev *is* the root. But there are pci devices hanging off it.
>>>> We have multiple zdevs in the kernel, and each zdev has only one pci
>>>> device attached to it.
>>> I see. It's nasty. Is it too late to fix in the guest?
>>> Supporting bridges should just be a question of passing
>>> bus numbers to the host.
>> On the Linux OS level, there is no pci-to-pci bridge on s390; the bus
>> number and slot number are all virtual and have no meaning, like these:
>> 0000:00:00.0
>> 0001:00:00.0
>> 0002:00:00.0
>> ......
>> Each zpci device is in a separate domain.
>> I have added Sebastian to the list; he is the owner of s390 pci. I think
>> he could give some reasons why s390 pci is implemented in this way.
>>> I guess you need to support old guests too, so this
>>> justifies some code in qemu. But you still need something
>>> stable to sort by, that does not depend on the order
>>> of initialization of devices. If all else fails, ask the user
>>> to give you numbers.
>> Thank you, it's a good idea; I can sort the devices by fid or uid.
>>
>>> And I'm still confused by this:
>>>>>>>>>>>>>>>>>> -device s390-pcihost
>>>>>>>>>>>>>>>>>> -device vfio-pci,host=0000:00:00.0,id=vpci1
>>>>>>>>>>>>>>>>>> -device zpci,fid=2,uid=5,pci_id=vpci1,id=zpci1
>>> why isn't vfio connected to zpci? why is it the other way around?
>> I implemented the hotplug in s390_pci_device_hot_plug() in the patch,
>> not in s390_pcihost_hot_plug(). It will do some s390-specific actions.
>> If we define zpci first, then I need to do the real hotplug when
>> hotplugging a vfio-pci device. I think both of them are ok; do you
>> prefer the latter one?
> I prefer sane modeling, it shouldn't be driven by implementation detail.
>
> But I would like to note that pci device drivers require a driver
> handshake before the device goes away.
> IIUC s390 hotplug is immediate, which is a problem.
> Maybe doing the change will help make sure device removal is acked
> by the guest before it happens?
>
Right, I did not see this point; I will do some research on it, thank you!

>>>>> So why not model it like this?
>>>>>
>>>>> vfio should attach to zdev, zdev is the pci host.
>>>>>
>>>>> Also, you can stick a pci-to-pci bridge under the root, and
>>>>> everything will just work.
>>>>>
>>>>>
>>>>>>>>> so to me, it looks like there's no need to expose
>>>>>>>>> non-root buses through special means.
>>>>>>>>>
>>>>>>>>> What to do for root buses is a different question but again,
>>>>>>>>> you definitely do not want to rely on the order of things
>>>>>>>>> on that linked list.
>>>>>>>>> The simplest thing is to ask the user to give them unique
>>>>>>>>> numbers, or find some stable way to sort them that
>>>>>>>>> does not rely on order of initialization (e.g. device IDs?).
>>>>>>>>>
>>>>>>>>> But again, this only works ok for root buses.
>>>>>>>>>
>>>>>>>> Basically, it does not expose the buses to the guest; it exposes an
>>>>>>>> index to the guest.
>>>>>>>> Here is the process to get all the zpci devices for a guest.
>>>>>>>> For example: we have 10 zpci devices, and the batch size for the list
>>>>>>>> zpci instruction is 4.
>>>>>>>> First, qemu will return devices 0-3; the index of list zpci is 0.
>>>>>>>> Second, qemu will return devices 4-7; the index of list zpci is 4.
>>>>>>>> Third, qemu will return devices 8-9; the index of list zpci is 8.
>>>>>>>> We have a device id, but list zpci does not use that as a flag to get
>>>>>>>> the next batch; it uses an index instead.
>>>>>>>> This process is defined by the s390 arch; we can't change it.
>>>>>>>> So no matter how we organize zpci devices in qemu, slot or linked list,
>>>>>>>> we cannot get rid of the index issue.
>>>>>>>>
>>>>>>>> How about I add a flag to identify whether the linked list
>>>>>>>> is valid or not? When a hotplug/unplug event occurs, I will
>>>>>>>> reset the index and make the guest refetch the zpci devices
>>>>>>>> from the beginning.
>>>>>>>>
>>>>>>> You should just use something stable for IDs.
>>>>>>> And avoid doing it for anything that isn't a root or maybe a bridge,
>>>>>>> since it'll just cause everyone maintenance problems down the road.
>>>>>>>
>>>>>> The list zpci instruction is defined by the arch; it's not a software
>>>>>> thing. I could not change it to use an ID instead...