From: Hong Bo Li <lihbbj@linux.vnet.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: cornelia.huck@de.ibm.com, borntraeger@de.ibm.com,
qemu-devel@nongnu.org, agraf@suse.de
Subject: Re: [Qemu-devel] [PATCH v2 1/1] KVM s390 pci infrastructure modelling
Date: Wed, 01 Jul 2015 20:42:52 +0800 [thread overview]
Message-ID: <5593E04C.3080204@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150701134925-mutt-send-email-mst@redhat.com>
On 7/1/2015 19:57, Michael S. Tsirkin wrote:
> On Wed, Jul 01, 2015 at 07:46:01PM +0800, Hong Bo Li wrote:
>>
>> On 7/1/2015 19:23, Michael S. Tsirkin wrote:
>>> On Wed, Jul 01, 2015 at 07:11:38PM +0800, Hong Bo Li wrote:
>>>> On 7/1/2015 18:36, Michael S. Tsirkin wrote:
>>>>> On Wed, Jul 01, 2015 at 06:04:24PM +0800, Hong Bo Li wrote:
>>>>>> On 7/1/2015 17:22, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Jul 01, 2015 at 05:13:11PM +0800, Hong Bo Li wrote:
>>>>>>>> On 7/1/2015 16:05, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Jul 01, 2015 at 03:56:25PM +0800, Hong Bo Li wrote:
>>>>>>>>>> On 7/1/2015 14:22, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Tue, Jun 30, 2015 at 02:16:59PM +0800, Hong Bo Li wrote:
>>>>>>>>>>>> On 6/29/2015 18:01, Michael S. Tsirkin wrote:
>>>>>>>>>>>>> On Mon, Jun 29, 2015 at 05:24:53PM +0800, Hong Bo Li wrote:
>>>>>>>>>>>>>> This patch introduce a new facility(and bus)
>>>>>>>>>>>>>> to hold devices representing information actually
>>>>>>>>>>>>>> provided by s390 firmware and I/O configuration.
>>>>>>>>>>>>>> usage example:
>>>>>>>>>>>>>> -device s390-pcihost
>>>>>>>>>>>>>> -device vfio-pci,host=0000:00:00.0,id=vpci1
>>>>>>>>>>>>>> -device zpci,fid=2,uid=5,pci_id=vpci1,id=zpci1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The first line will create a s390 pci host bridge
>>>>>>>>>>>>>> and init the root bus. The second line will create
>>>>>>>>>>>>>> a standard vfio pci device, and attach it to the
>>>>>>>>>>>>>> root bus. These are similiar to the standard process
>>>>>>>>>>>>>> to define a pci device on other platform.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The third line will create a s390 pci device to
>>>>>>>>>>>>>> store s390 specific information, and references
>>>>>>>>>>>>>> the corresponding vfio pci device via device id.
>>>>>>>>>>>>>> We create a s390 pci facility bus to hold all the
>>>>>>>>>>>>>> zpci devices.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Hong Bo Li <lihbbj@linux.vnet.ibm.com>
>>>>>>>>>>>>> It's mostly up to s390 maintainers, but I'd like to note
>>>>>>>>>>>>> one thing below
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> hw/s390x/s390-pci-bus.c | 314 +++++++++++++++++++++++++++++++++------------
>>>>>>>>>>>>>> hw/s390x/s390-pci-bus.h | 48 ++++++-
>>>>>>>>>>>>>> hw/s390x/s390-pci-inst.c | 4 +-
>>>>>>>>>>>>>> hw/s390x/s390-virtio-ccw.c | 5 +-
>>>>>>>>>>>>>> 4 files changed, 283 insertions(+), 88 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>>>>>>>>>>>> index 560b66a..d5e7b2e 100644
>>>>>>>>>>>>>> --- a/hw/s390x/s390-pci-bus.c
>>>>>>>>>>>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>>>>>>>>>>>> @@ -32,8 +32,8 @@ int chsc_sei_nt2_get_event(void *res)
>>>>>>>>>>>>>> PciCcdfErr *eccdf;
>>>>>>>>>>>>>> int rc = 1;
>>>>>>>>>>>>>> SeiContainer *sei_cont;
>>>>>>>>>>>>>> - S390pciState *s = S390_PCI_HOST_BRIDGE(
>>>>>>>>>>>>>> - object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
>>>>>>>>>>>>>> + S390PCIFacility *s = S390_PCI_FACILITY(
>>>>>>>>>>>>>> + object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
>>>>>>>>>>>>>> if (!s) {
>>>>>>>>>>>>>> return rc;
>>>>>>>>>>>>>> @@ -72,8 +72,8 @@ int chsc_sei_nt2_get_event(void *res)
>>>>>>>>>>>>>> int chsc_sei_nt2_have_event(void)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> - S390pciState *s = S390_PCI_HOST_BRIDGE(
>>>>>>>>>>>>>> - object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
>>>>>>>>>>>>>> + S390PCIFacility *s = S390_PCI_FACILITY(
>>>>>>>>>>>>>> + object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
>>>>>>>>>>>>>> if (!s) {
>>>>>>>>>>>>>> return 0;
>>>>>>>>>>>>>> @@ -82,20 +82,32 @@ int chsc_sei_nt2_have_event(void)
>>>>>>>>>>>>>> return !QTAILQ_EMPTY(&s->pending_sei);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> +void s390_pci_device_enable(S390PCIBusDevice *zpci)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + zpci->fh = zpci->fh | 1 << ENABLE_BIT_OFFSET;
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +void s390_pci_device_disable(S390PCIBusDevice *zpci)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> + zpci->fh = zpci->fh & ~(1 << ENABLE_BIT_OFFSET);
>>>>>>>>>>>>>> + if (zpci->is_unplugged)
>>>>>>>>>>>>>> + object_unparent(OBJECT(zpci));
>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> S390PCIBusDevice *s390_pci_find_dev_by_fid(uint32_t fid)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> S390PCIBusDevice *pbdev;
>>>>>>>>>>>>>> - int i;
>>>>>>>>>>>>>> - S390pciState *s = S390_PCI_HOST_BRIDGE(
>>>>>>>>>>>>>> - object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
>>>>>>>>>>>>>> + BusChild *kid;
>>>>>>>>>>>>>> + S390PCIFacility *s = S390_PCI_FACILITY(
>>>>>>>>>>>>>> + object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
>>>>>>>>>>>>>> if (!s) {
>>>>>>>>>>>>>> return NULL;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> - for (i = 0; i < PCI_SLOT_MAX; i++) {
>>>>>>>>>>>>>> - pbdev = &s->pbdev[i];
>>>>>>>>>>>>>> - if ((pbdev->fh != 0) && (pbdev->fid == fid)) {
>>>>>>>>>>>>>> + QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) {
>>>>>>>>>>>>>> + pbdev = (S390PCIBusDevice *)kid->child;
>>>>>>>>>>>>>> + if (pbdev->fid == fid) {
>>>>>>>>>>>>>> return pbdev;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> @@ -126,39 +138,24 @@ void s390_pci_sclp_configure(int configure, SCCB *sccb)
>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> -static uint32_t s390_pci_get_pfid(PCIDevice *pdev)
>>>>>>>>>>>>>> -{
>>>>>>>>>>>>>> - return PCI_SLOT(pdev->devfn);
>>>>>>>>>>>>>> -}
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> -static uint32_t s390_pci_get_pfh(PCIDevice *pdev)
>>>>>>>>>>>>>> -{
>>>>>>>>>>>>>> - return PCI_SLOT(pdev->devfn) | FH_VIRT;
>>>>>>>>>>>>>> -}
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> S390PCIBusDevice *s390_pci_find_dev_by_idx(uint32_t idx)
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> S390PCIBusDevice *pbdev;
>>>>>>>>>>>>>> - int i;
>>>>>>>>>>>>>> - int j = 0;
>>>>>>>>>>>>>> - S390pciState *s = S390_PCI_HOST_BRIDGE(
>>>>>>>>>>>>>> - object_resolve_path(TYPE_S390_PCI_HOST_BRIDGE, NULL));
>>>>>>>>>>>>>> + BusChild *kid;
>>>>>>>>>>>>>> + int i = 0;
>>>>>>>>>>>>>> + S390PCIFacility *s = S390_PCI_FACILITY(
>>>>>>>>>>>>>> + object_resolve_path(TYPE_S390_PCI_FACILITY, NULL));
>>>>>>>>>>>>>> if (!s) {
>>>>>>>>>>>>>> return NULL;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> - for (i = 0; i < PCI_SLOT_MAX; i++) {
>>>>>>>>>>>>>> - pbdev = &s->pbdev[i];
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> - if (pbdev->fh == 0) {
>>>>>>>>>>>>>> - continue;
>>>>>>>>>>>>>> - }
>>>>>>>>>>>>>> -
>>>>>>>>>>>>>> - if (j == idx) {
>>>>>>>>>>>>>> + QTAILQ_FOREACH(kid, &s->fbus->qbus.children, sibling) {
>>>>>>>>>>>>>> + pbdev = (S390PCIBusDevice *)kid->child;
>>>>>>>>>>>>>> + if (i == idx) {
>>>>>>>>>>>>>> return pbdev;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> - j++;
>>>>>>>>>>>>>> + i++;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> return NULL;
>>>>>>>>>>>>> This relies on the order of children on the qbus, that's wrong I think.
>>>>>>>>>>>>> Generally I'm not sure why do you convert all slot lookups to child
>>>>>>>>>>>>> lookups: more code to achieve the same effect?
>>>>>>>>>>>> Thank you Michael.
>>>>>>>>>>>> I do the change due to two reasons:
>>>>>>>>>>>> 1. The old implement only supports one s390 pci root bus, and 32(PCI_SLOT_MAX)
>>>>>>>>>>>> slots at most. So when it comes to multiple s390 pci root buses, the old code
>>>>>>>>>>>> does not work.
>>>>>>>>>>>> 2. Now the zpci device "S390PCIBusDevice" is only a structure to store
>>>>>>>>>>>> s390 specific information, so we can attach all the zpci devices to a
>>>>>>>>>>>> s390 pci facility bus. Since these zpci device has no relation with the "slot",
>>>>>>>>>>>> so the order of them does not matter.
>>>>>>>>>>> But you make this order guest-visible which seems wrong.
>>>>>>>>>>>
>>>>>>>>>> The guest uses a s390 specific "list pci" instruction to get all the zpci
>>>>>>>>>> devices, and will
>>>>>>>>>> create a root s390 pci bus for each device. So the order has no relation
>>>>>>>>>> with the pci
>>>>>>>>>> topology on guest.
>>>>>>>>>>
>>>>>>>>>> If we assign too many zpci devices to one guest, the "list pci" instruction
>>>>>>>>>> will use a
>>>>>>>>>> resume token to get all the zpci devices. For example, first time we return
>>>>>>>>>> 32 zpci
>>>>>>>>>> devices to guest. Next time we'll return another 32 zpci devices. The resume
>>>>>>>>>> token
>>>>>>>>>> is used to store the beginning of zpci devices that will be returned to
>>>>>>>>>> guest at next time.
>>>>>>>>>>
>>>>>>>>>> So, if we change the order of the zpci device on s390 facility bus, it may
>>>>>>>>>> change the
>>>>>>>>>> "batch" in which this device be returned to guest. But this will not change
>>>>>>>>>> the pci
>>>>>>>>>> topology on guest.
>>>>>>>>> Yes but that's still guest visible, and will break
>>>>>>>>> for example if guest is migrated between qemu instances
>>>>>>>>> where list order is different precisely when
>>>>>>>>> it's enumerating the bus.
>>>>>>>>>
>>>>>>>> Yes, and the list order is not the only s390 specific information that
>>>>>>>> exposed to
>>>>>>>> guest. Besides that, we need to migrate all other zpci information. For
>>>>>>>> now,
>>>>>>>> we have no plan to support zpci migration yet.
>>>>>>> BTW how will hotplug work? If it happens while guest
>>>>>>> enumerates the bus the naturally all index values
>>>>>>> become invalid.
>>>>>> The list zpci only happen when the guest doing pci_base_init() for s390.
>>>>>> At that moment, hotplug does not work yet.
>>>>> You can't prevent this: user can request hotplug at this time.
>>>>>
>>>>>> And assume we have
>>>>>> that case, we still have the index issue even when scan standard pci
>>>>>> bus. Please see my following words.
>>>>>>
>>>>>>> Just don't expose internal qdev data structures to guest.
>>>>>>> It's not by chance that we don't have a look up by index
>>>>>>> capability, it's an attempt to enfoce sane usage.
>>>>>>> You are misusing the API with your hack.
>>>>>> The resume token of list zpci is indeed an index of iteration:(
>>>>>>
>>>>>>> PCI has standard ways to enumerate the bus, maybe you
>>>>>>> should emulate it. Or find some other way that works.
>>>>>>> The idea to poke at s->fbus->qbus and count things there
>>>>>>> is a bad one.
>>>>>>>
>>>>>> I can define multiple zpci buses, and attach zpci device to a slot of a root
>>>>>> bus.
>>>>>> Then I need to add a api to the common pci code to do the scan of all the
>>>>>> pci host bridges. And in this way, it still has the index issue. I need to
>>>>>> scan
>>>>> >from the first bus to count the index. So any suggestion from you?
>>>>> OK, I looked at arch/s390/pci/pci.c.
>>>>> First of all, it seems to run the regular PCI thing on bridges.
>>>>>
>>>>> zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops,
>>>>> zdev, &resources);
>>>> At this moment, the guest has got all the zpci devices through clp list zpci
>>>> instruction. For each device, in the pci_scan_root_bus(), it will create
>>>> a root bus. So for s390, we get pci devices first, then create a new root bus
>>>> for it.
>>> I don't see this in guest code.
>>>
>>> I looked at pci_scan_root_bus and it's completely generic.
>>> It sets up the bus:
>>> b = pci_create_root_bus(parent, bus, ops, sysdata, resources);
>>>
>>> then it scans it:
>>> max = pci_scan_child_bus(b);
>>>
>>>
>>> that one does
>>> /* Go find them, Rover! */
>>> for (devfn = 0; devfn < 0x100; devfn += 8)
>>> pci_scan_slot(bus, devfn);
>>>
>>> next
>>> dev = pci_scan_single_device(bus, devfn);
>>>
>>> and so on. Eventually you get
>>> if (!pci_bus_read_dev_vendor_id(bus, devfn, &l, 60*1000))
>>> return NULL;
>>>
>>> and that one does the clp thing using zpci_cfg_load.
>>>
>> pci_base_init()-> clp_scan_pci_devices():
>> rc = clp_list_pci(rrb, __clp_add);
>> In this function, there is a while loop to get all the zpci devices by means
>> of
>> resume token(index). And for each device,
>> __clp_add()-> clp_add_pci_device();
>> In clp_add_pci_device(), we use the zpci information to create a struct
>> zpci_dev zdev.
>> Then zpci_create_device()->zpci_scan_bus()->pci_scan_root_bus()
>> zdev->bus = pci_scan_root_bus(NULL, ZPCI_BUS_NR, &pci_root_ops,
>> zdev, &resources);
>> So, you see, each zdev has its own root bus. And there is no child bus under
>> that root bus.
> Right - zdev *is* the root. But there are pci devices hanging off it.
We have multiple zdevs in kernel, and each zdev only has one pci device
attached to it.
>
> So why not model it like this?
>
> vfio should attach to zdev, zdev is the pci host.
>
> Also, you can stick a pci to pci bridge under the root, and
> everything will just work.
>
>
>
>
>
>>>
>>>>> so to me, it looks like there's no need to expose
>>>>> non-root buses through special means.
>>>>>
>>>>> What to do for root buses is a different question but again,
>>>>> you definitely do not want to rely on the order of things
>>>>> on that linked list.
>>>>> The simplest thing is to ask user to give them unique
>>>>> numbers, or find some stable way to sort them that
>>>>> does not rely on order of initialization (e.g. device IDs?).
>>>>>
>>>>> But again, this only works ok for root buses.
>>>>>
>>>> Basically, it does not exposed the buses to guest, it exposed an index
>>>> to guest.
>>>> Here is the process to get all the zpci device for a guest.
>>>> For example: we have 10 zpci devices, and the batch size for list zpci
>>>> instruction is 4.
>>>> First, qemu will return devices 0-3, index of list zpci is 0
>>>> Second, qemu will return device 4-7, index of list zpci is 4
>>>> Third, qemu will return device 8-9, index of list zpci is 8
>>>> We have device id, but list zpci does not use that as a flag to get
>>>> next batch, it use an index instead.
>>>> This process is defined by s390 arch, we can't change it.
>>>> So no matter how we organize zpci devices in qemu, slot or link list.
>>>> We could not get rid of the index issue.
>>>>
>>>> How about I add a flag to identify whether the link list
>>>> is valid or not. When a hotplug/unplug event occurred, I will
>>>> reset the index, and make the guest refetch the zpci devices
>>> >from the beginning.
>>>>
>>>>
>>>>
>>> You should just use something stable for IDs.
>>> And avoid doing it for anything that isn't a root or maybe a bridge
>>> since it'll just cause everyone maintainance problems down the road.
>>>
>> The list zpci instruction is defined by arch, not a software thing, I could
>> not
>> change it to use a ID instead...
next prev parent reply other threads:[~2015-07-01 12:43 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-29 9:24 [Qemu-devel] [PATCH v2 0/1] s390 pci infrastructure modelling Hong Bo Li
2015-06-29 9:24 ` [Qemu-devel] [PATCH v2 1/1] KVM " Hong Bo Li
2015-06-29 10:01 ` Michael S. Tsirkin
2015-06-30 6:16 ` Hong Bo Li
2015-07-01 6:22 ` Michael S. Tsirkin
2015-07-01 7:56 ` Hong Bo Li
2015-07-01 8:05 ` Michael S. Tsirkin
2015-07-01 9:13 ` Hong Bo Li
2015-07-01 9:22 ` Michael S. Tsirkin
2015-07-01 10:04 ` Hong Bo Li
2015-07-01 10:36 ` Michael S. Tsirkin
2015-07-01 11:11 ` Hong Bo Li
2015-07-01 11:23 ` Michael S. Tsirkin
2015-07-01 11:46 ` Hong Bo Li
2015-07-01 11:57 ` Michael S. Tsirkin
2015-07-01 12:30 ` Hong Bo Li
2015-07-01 12:42 ` Hong Bo Li [this message]
2015-07-01 13:37 ` Michael S. Tsirkin
2015-07-02 2:57 ` Hong Bo Li
2015-07-02 5:13 ` Michael S. Tsirkin
2015-07-02 5:26 ` Hong Bo Li
2015-07-03 11:09 ` Hong Bo Li
2015-07-04 18:25 ` Michael S. Tsirkin
2015-07-06 2:06 ` Hong Bo Li
2015-07-06 10:56 ` Michael S. Tsirkin
2015-07-06 12:09 ` Hong Bo Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5593E04C.3080204@linux.vnet.ibm.com \
--to=lihbbj@linux.vnet.ibm.com \
--cc=agraf@suse.de \
--cc=borntraeger@de.ibm.com \
--cc=cornelia.huck@de.ibm.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).