qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Hong Bo Li <lihbbj@linux.vnet.ibm.com>
Cc: cornelia.huck@de.ibm.com, borntraeger@de.ibm.com,
	qemu-devel@nongnu.org, agraf@suse.de
Subject: Re: [Qemu-devel] [PATCH v2 1/1] KVM s390 pci	infrastructure	modelling
Date: Wed, 1 Jul 2015 13:23:37 +0200	[thread overview]
Message-ID: <20150701131635-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <5593CAEA.2000301@linux.vnet.ibm.com>

On Wed, Jul 01, 2015 at 07:11:38PM +0800, Hong Bo Li wrote:
> 
> 
> On 7/1/2015 18:36, Michael S. Tsirkin wrote:
> >On Wed, Jul 01, 2015 at 06:04:24PM +0800, Hong Bo Li wrote:
> >>
> >>On 7/1/2015 17:22, Michael S. Tsirkin wrote:
> >>>On Wed, Jul 01, 2015 at 05:13:11PM +0800, Hong Bo Li wrote:
> >>>>On 7/1/2015 16:05, Michael S. Tsirkin wrote:
> >>>>>On Wed, Jul 01, 2015 at 03:56:25PM +0800, Hong Bo Li wrote:
> >>>>>>On 7/1/2015 14:22, Michael S. Tsirkin wrote:
> >>>>>>>On Tue, Jun 30, 2015 at 02:16:59PM +0800, Hong Bo Li wrote:
> >>>>>>>>On 6/29/2015 18:01, Michael S. Tsirkin wrote:
> >>>>>>>>>On Mon, Jun 29, 2015 at 05:24:53PM +0800, Hong Bo Li wrote:
> >>>>>>>>>>This patch introduce a new facility(and bus)
> >>>>>>>>>>to hold devices representing information actually
> >>>>>>>>>>provided by s390 firmware and I/O configuration.
> >>>>>>>>>>usage example:
> >>>>>>>>>>-device s390-pcihost
> >>>>>>>>>>-device vfio-pci,host=0000:00:00.0,id=vpci1
> >>>>>>>>>>-device zpci,fid=2,uid=5,pci_id=vpci1,id=zpci1
> >>>>>>>>>>
> >>>>>>>>>>The first line will create a s390 pci host bridge
> >>>>>>>>>>and init the root bus. The second line will create
> >>>>>>>>>>a standard vfio pci device, and attach it to the
> >>>>>>>>>>root bus. These are similiar to the standard process
> >>>>>>>>>>to define a pci device on other platform.
> >>>>>>>>>>
> >>>>>>>>>>The third line will create a s390 pci device to
> >>>>>>>>>>store s390 specific information, and references
> >>>>>>>>>>the corresponding vfio pci device via device id.
> >>>>>>>>>>We create a s390 pci facility bus to hold all the
> >>>>>>>>>>zpci devices.
> >>>>>>>>>>
> >>>>>>>>>>Signed-off-by: Hong Bo Li <lihbbj@linux.vnet.ibm.com>
> >>>>>>>>>It's mostly up to s390 maintainers, but I'd like to note
> >>>>>>>>>one thing below
> >>>>>>>>>
> >>>>>>>>>>---
> >>>>>>>>>>  hw/s390x/s390-pci-bus.c    | 314 +++++++++++++++++++++++++++++++++------------
> >>>>>>>>>>  hw/s390x/s390-pci-bus.h    |  48 ++++++-
> >>>>>>>>>>  hw/s390x/s390-pci-inst.c   |   4 +-
> >>>>>>>>>>  hw/s390x/s390-virtio-ccw.c |   5 +-
> >>>>>>>>>>  4 files changed, 283 insertions(+), 88 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>>diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> >>>>>>>>>>index 560b66a..d5e7b2e 100644
> >>>>>>>>>>--- a/hw/s390x/s390-pci-bus.c
> >>>>>>>>>>+++ b/hw/s390x/s390-pci-bus.c
> >>>>>>>>>>@@ -32,8 +32,8 @@ int chsc_sei_nt2_get_event(void *res)
> >>>>>>>>>>      PciCcdfErr *eccdf;
> >>>>>>>>>>      int rc = 1;
> >>>>>>>>>>      SeiContainer *sei_cont;
> >>>>>>>>>>-    S390pciState *s = S390_PCI_HOST_BRIDGE(
> >>>>>>>>>>-        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
> >>>>>>>>>>+    S390PCIFacility *s = S390_PCI_FACILITY(
> >>>>>>>>>>+        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
> >>>>>>>>>>      if (!s) {
> >>>>>>>>>>          return rc;
> >>>>>>>>>>@@ -72,8 +72,8 @@ int chsc_sei_nt2_get_event(void *res)
> >>>>>>>>>>  int chsc_sei_nt2_have_event(void)
> >>>>>>>>>>  {
> >>>>>>>>>>-    S390pciState *s = S390_PCI_HOST_BRIDGE(
> >>>>>>>>>>-        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
> >>>>>>>>>>+    S390PCIFacility *s = S390_PCI_FACILITY(
> >>>>>>>>>>+        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
> >>>>>>>>>>      if (!s) {
> >>>>>>>>>>          return 0;
> >>>>>>>>>>@@ -82,20 +82,32 @@ int chsc_sei_nt2_have_event(void)
> >>>>>>>>>>      return !QTAILQ_EMPTY(&s->pending_sei);
> >>>>>>>>>>  }
> >>>>>>>>>>+void s390_pci_device_enable(S390PCIBusDevice *zpci)
> >>>>>>>>>>+{
> >>>>>>>>>>+    zpci->fh = zpci->fh | 1 << ENABLE_BIT_OFFSET;
> >>>>>>>>>>+}
> >>>>>>>>>>+
> >>>>>>>>>>+void s390_pci_device_disable(S390PCIBusDevice *zpci)
> >>>>>>>>>>+{
> >>>>>>>>>>+    zpci->fh = zpci->fh & ~(1 << ENABLE_BIT_OFFSET);
> >>>>>>>>>>+    if (zpci->is_unplugged)
> >>>>>>>>>>+        object_unparent(OBJECT(zpci));
> >>>>>>>>>>+}
> >>>>>>>>>>+
> >>>>>>>>>>  S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid)
> >>>>>>>>>>  {
> >>>>>>>>>>      S390PCIBusDevice *pbdev;
> >>>>>>>>>>-    int i;
> >>>>>>>>>>-    S390pciState *s = S390_PCI_HOST_BRIDGE(
> >>>>>>>>>>-        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
> >>>>>>>>>>+    BusChild *kid;
> >>>>>>>>>>+    S390PCIFacility *s = S390_PCI_FACILITY(
> >>>>>>>>>>+        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
> >>>>>>>>>>      if (!s) {
> >>>>>>>>>>          return NULL;
> >>>>>>>>>>      }
> >>>>>>>>>>-    for (i = 0; i < PCI_SLOT_MAX; i++) {
> >>>>>>>>>>-        pbdev = &s->pbdev[i];
> >>>>>>>>>>-        if ((pbdev->fh != 0) && (pbdev->fid == fid)) {
> >>>>>>>>>>+    QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) {
> >>>>>>>>>>+        pbdev = (S390PCIBusDevice *)kid->child;
> >>>>>>>>>>+        if (pbdev->fid == fid) {
> >>>>>>>>>>              return pbdev;
> >>>>>>>>>>          }
> >>>>>>>>>>      }
> >>>>>>>>>>@@ -126,39 +138,24 @@ void s390_pci_sclp_configure(int configure, SCCB *sccb)
> >>>>>>>>>>      return;
> >>>>>>>>>>  }
> >>>>>>>>>>-static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
> >>>>>>>>>>-{
> >>>>>>>>>>-    return PCI_SLOT(pdev->devfn);
> >>>>>>>>>>-}
> >>>>>>>>>>-
> >>>>>>>>>>-static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
> >>>>>>>>>>-{
> >>>>>>>>>>-    return PCI_SLOT(pdev->devfn) | FH_VIRT;
> >>>>>>>>>>-}
> >>>>>>>>>>-
> >>>>>>>>>>  S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
> >>>>>>>>>>  {
> >>>>>>>>>>      S390PCIBusDevice *pbdev;
> >>>>>>>>>>-    int i;
> >>>>>>>>>>-    int j = 0;
> >>>>>>>>>>-    S390pciState *s = S390_PCI_HOST_BRIDGE(
> >>>>>>>>>>-        object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
> >>>>>>>>>>+    BusChild *kid;
> >>>>>>>>>>+    int i = 0;
> >>>>>>>>>>+    S390PCIFacility *s = S390_PCI_FACILITY(
> >>>>>>>>>>+        object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
> >>>>>>>>>>      if (!s) {
> >>>>>>>>>>          return NULL;
> >>>>>>>>>>      }
> >>>>>>>>>>-    for (i = 0; i < PCI_SLOT_MAX; i++) {
> >>>>>>>>>>-        pbdev = &s->pbdev[i];
> >>>>>>>>>>-
> >>>>>>>>>>-        if (pbdev->fh == 0) {
> >>>>>>>>>>-            continue;
> >>>>>>>>>>-        }
> >>>>>>>>>>-
> >>>>>>>>>>-        if (j == idx) {
> >>>>>>>>>>+    QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) {
> >>>>>>>>>>+        pbdev = (S390PCIBusDevice *)kid->child;
> >>>>>>>>>>+        if (i == idx) {
> >>>>>>>>>>              return pbdev;
> >>>>>>>>>>          }
> >>>>>>>>>>-        j++;
> >>>>>>>>>>+        i++;
> >>>>>>>>>>      }
> >>>>>>>>>>      return NULL;
> >>>>>>>>>This relies on the order of children on the qbus, that's wrong I think.
> >>>>>>>>>Generally I'm not sure why do you convert all slot lookups to child
> >>>>>>>>>lookups: more code to achieve the same effect?
> >>>>>>>>Thank you Michael.
> >>>>>>>>I do the change due to two reasons:
> >>>>>>>>1. The old implement only supports one s390 pci root bus, and 32(PCI_SLOT_MAX)
> >>>>>>>>slots at most. So when it comes to multiple s390 pci root buses, the old code
> >>>>>>>>does not work.
> >>>>>>>>2. Now the zpci device "S390PCIBusDevice" is only a structure to store
> >>>>>>>>s390 specific information, so we can attach all the zpci devices to a
> >>>>>>>>s390 pci facility bus. Since these zpci device has no relation with the "slot",
> >>>>>>>>so the order of them does not matter.
> >>>>>>>But you make this order guest-visible which seems wrong.
> >>>>>>>
> >>>>>>The guest uses a s390 specific "list pci" instruction to get all the zpci
> >>>>>>devices, and will
> >>>>>>create a root s390 pci bus for each device.  So the order has no relation
> >>>>>>with the pci
> >>>>>>topology on guest.
> >>>>>>
> >>>>>>If we assign  too many zpci devices to one guest, the "list pci" instruction
> >>>>>>will use a
> >>>>>>resume token to get all the zpci devices. For example, first time we return
> >>>>>>32 zpci
> >>>>>>devices to guest. Next time we'll return another 32 zpci devices. The resume
> >>>>>>token
> >>>>>>is used to store the beginning of zpci devices that will be returned to
> >>>>>>guest at next time.
> >>>>>>
> >>>>>>So, if we change the order of the zpci device on s390 facility bus, it may
> >>>>>>change the
> >>>>>>"batch" in which this device be returned to guest. But this will not change
> >>>>>>the  pci
> >>>>>>topology on guest.
> >>>>>Yes but that's still guest visible, and will break
> >>>>>for example if guest is migrated between qemu instances
> >>>>>where list order is different precisely when
> >>>>>it's enumerating the bus.
> >>>>>
> >>>>Yes, and the list order is not the only s390 specific information that
> >>>>exposed to
> >>>>guest. Besides that,  we need to migrate all other zpci information. For
> >>>>now,
> >>>>we have no plan to support zpci migration yet.
> >>>BTW how will hotplug work? If it happens while guest
> >>>enumerates the bus the naturally all index values
> >>>become invalid.
> >>The list zpci only happen when the guest doing pci_base_init() for s390.
> >>At that moment,  hotplug does not work yet.
> >You can't prevent this: user can request hotplug at this time.
> >
> >>And assume we have
> >>that case, we still have the index issue even when scan standard pci
> >>bus. Please see my following words.
> >>
> >>>Just don't expose internal qdev data structures to guest.
> >>>It's not by chance that we don't have a look up by index
> >>>capability, it's an attempt to enfoce sane usage.
> >>>You are misusing the API with your hack.
> >>The resume token of list zpci is indeed an index of iteration:(
> >>
> >>>PCI has standard ways to enumerate the bus, maybe you
> >>>should emulate it.  Or find some other way that works.
> >>>The idea to poke at s->fbus->qbus and count things there
> >>>is a bad one.
> >>>
> >>I can define multiple zpci buses, and attach zpci device to a slot of a root
> >>bus.
> >>Then I need to add a api to the common pci code to do the scan of all the
> >>pci host bridges. And in this way, it still has the index issue. I need to
> >>scan
> >>from the first bus to count the index. So any suggestion from you?
> >>
> >OK, I looked at arch/s390/pci/pci.c.
> >First of all, it seems to run the regular PCI thing on bridges.
> >
> >         zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops,
> >                                       zdev, &resources);
> 
> At this moment, the guest has got all the zpci devices through clp list zpci
> instruction. For each device, in the pci_scan_root_bus(), it will create
> a root bus. So for s390, we get pci devices first, then create a new root bus
> for it.

I don't see this in guest code.

I looked at pci_scan_root_bus and it's completely generic.
It sets up the bus:
        b = pci_create_root_bus(parent, bus, ops, sysdata, resources);

then it scans it:
        max = pci_scan_child_bus(b);


that one does
        /* Go find them, Rover! */
        for (devfn = 0; devfn < 0x100; devfn += 8)
                pci_scan_slot(bus, devfn);

next
        dev = pci_scan_single_device(bus, devfn);

and so on. Eventually you get
        if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000))
                return NULL;

and that one does the clp thing using zpci_cfg_load.




> 
> >
> >so to me, it looks like there's no need to expose
> >non-root buses through special means.
> >
> >What to do for root buses is a different question but again,
> >you definitely do not want to rely on the order of things
> >on that linked list.
> >The simplest thing is to ask user to give them unique
> >numbers, or find some stable way to sort them that
> >does not rely on order of initialization (e.g. device IDs?).
> >
> >But again, this only works ok for root buses.
> >
> Basically, it does not exposed the buses to guest, it exposed an index
> to guest.
> Here is the process to get all the zpci device for a guest.
> For example: we have 10 zpci devices, and the batch size for list zpci
> instruction is 4.
> First, qemu will return devices 0-3, index of list zpci is 0
> Second, qemu will return device 4-7, index of list zpci is 4
> Third, qemu will return device 8-9, index of list zpci is 8
> We have device id, but list zpci does not use that as a flag to get
> next batch, it use an index instead.
> This process is defined by s390 arch, we can't change it.
> So no matter how we organize zpci devices in qemu, slot or link list.
> We could not get rid of the index issue.
> 
> How about I add a flag to identify whether the link list
> is valid or not. When a hotplug/unplug event occurred, I will
> reset the index, and make the guest refetch the zpci devices
> from the beginning.
> 
> 
> 
> 

You should just use something stable for IDs.
And avoid doing it for anything that isn't a root or maybe a bridge
since it'll just cause everyone maintainance problems down the road.

-- 
MST

  reply	other threads:[~2015-07-01 11:23 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-29  9:24 [Qemu-devel] [PATCH v2 0/1] s390 pci infrastructure modelling Hong Bo Li
2015-06-29  9:24 ` [Qemu-devel] [PATCH v2 1/1] KVM " Hong Bo Li
2015-06-29 10:01   ` Michael S. Tsirkin
2015-06-30  6:16     ` Hong Bo Li
2015-07-01  6:22       ` Michael S. Tsirkin
2015-07-01  7:56         ` Hong Bo Li
2015-07-01  8:05           ` Michael S. Tsirkin
2015-07-01  9:13             ` Hong Bo Li
2015-07-01  9:22               ` Michael S. Tsirkin
2015-07-01 10:04                 ` Hong Bo Li
2015-07-01 10:36                   ` Michael S. Tsirkin
2015-07-01 11:11                     ` Hong Bo Li
2015-07-01 11:23                       ` Michael S. Tsirkin [this message]
2015-07-01 11:46                         ` Hong Bo Li
2015-07-01 11:57                           ` Michael S. Tsirkin
2015-07-01 12:30                             ` Hong Bo Li
2015-07-01 12:42                             ` Hong Bo Li
2015-07-01 13:37                               ` Michael S. Tsirkin
2015-07-02  2:57                                 ` Hong Bo Li
2015-07-02  5:13                                   ` Michael S. Tsirkin
2015-07-02  5:26                                     ` Hong Bo Li
2015-07-03 11:09                                     ` Hong Bo Li
2015-07-04 18:25                                       ` Michael S. Tsirkin
2015-07-06  2:06                                         ` Hong Bo Li
2015-07-06 10:56                                           ` Michael S. Tsirkin
2015-07-06 12:09                                             ` Hong Bo Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150701131635-mutt-send-email-mst@redhat.com \
    --to=mst@redhat.com \
    --cc=agraf@suse.de \
    --cc=borntraeger@de.ibm.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=lihbbj@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).