qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Thomas Huth <thuth@redhat.com>, qemu-devel@nongnu.org
Cc: qemu-s390x@nongnu.org, Collin Walling <walling@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Richard Henderson <rth@twiddle.net>,
	Pierre Morel <pmorel@linux.ibm.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	linux-s390@vger.kernel.org, Sebastian Ott <sebott@linux.ibm.com>
Subject: Re: [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of PCI bridges
Date: Wed, 23 Jan 2019 15:26:04 +0100	[thread overview]
Message-ID: <9420629b-a788-d780-f9c9-b26d1e30a60d@redhat.com> (raw)
In-Reply-To: <afaed009-3831-db23-26c6-485d23e44b29@redhat.com>

On 23.01.19 14:42, Thomas Huth wrote:
> On 2019-01-23 12:42, David Hildenbrand wrote:
>> On 23.01.19 12:16, Thomas Huth wrote:
>>> On 2019-01-22 13:51, David Hildenbrand wrote:
>>>> When hotplugging a PCI bridge right now to the root port, we resolve
>>>> pci_get_bus(pdev)->parent_dev, which results in a SEGFAULT. Hotplugging
>>>> really only works right now when hotplugging to another bridge.
>>>>
>>>> Instead, we have to properly check if we are already at the root.
>>>>
>>>> Let's cleanup the code while at it a bit and factor out updating the
>>>> subordiante bus number into a separate function. The check for
>>>> "old_nr < nr" is right now not strictly necessary, but makes it more
>>>> obvious what is actually going on.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  hw/s390x/s390-pci-bus.c | 28 +++++++++++++++++++---------
>>>>  1 file changed, 19 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>> index b7c4613fde..9b5c5fff60 100644
>>>> --- a/hw/s390x/s390-pci-bus.c
>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>> @@ -843,6 +843,21 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>      }
>>>>  }
>>>>  
>>>> +static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
>>>> +{
>>>> +    uint32_t old_nr;
>>>> +
>>>> +    pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
>>>> +    while (!pci_bus_is_root(pci_get_bus(dev))) {
>>>> +        dev = pci_get_bus(dev)->parent_dev;
>>>> +
>>>> +        old_nr = pci_default_read_config(dev, PCI_SUBORDINATE_BUS, 1);
>>>> +        if (old_nr < nr) {
>>>> +            pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>>  static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>                                Error **errp)
>>>>  {
>>>> @@ -851,26 +866,21 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>      S390PCIBusDevice *pbdev = NULL;
>>>>  
>>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
>>>> -        BusState *bus;
>>>>          PCIBridge *pb = PCI_BRIDGE(dev);
>>>> -        PCIDevice *pdev = PCI_DEVICE(dev);
>>>>  
>>>> +        pdev = PCI_DEVICE(dev);
>>>
>>> Why not keep the direct assignment?
>>
>> We have the same local variable in that function already. I'm just
>> reusing it.
> 
> D'oh, right. Please disregard my comment.
> 
>>>
>>>>          pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
>>>>          pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
>>>>  
>>>> -        bus = BUS(&pb->sec_bus);
>>>> -        qbus_set_hotplug_handler(bus, DEVICE(s), errp);
>>>> +        qbus_set_hotplug_handler(BUS(&pb->sec_bus), DEVICE(s), errp);
>>>>  
>>>>          if (dev->hotplugged) {
>>>>              pci_default_write_config(pdev, PCI_PRIMARY_BUS,
>>>>                                       pci_dev_bus_num(pdev), 1);
>>>>              s->bus_no += 1;
>>>>              pci_default_write_config(pdev, PCI_SECONDARY_BUS, s->bus_no, 1);
>>>> -            do {
>>>> -                pdev = pci_get_bus(pdev)->parent_dev;
>>>> -                pci_default_write_config(pdev, PCI_SUBORDINATE_BUS,
>>>> -                                         s->bus_no, 1);
>>>> -            } while (pci_get_bus(pdev) && pci_dev_bus_num(pdev));
>>>> +
>>>> +            s390_pci_update_subordinate(pdev, s->bus_no);
>>>
>>>
>>> While your changes look OK at a first glance, and certainly fix the
>>> crash, I wonder whether this is the right thing to do here at all. Why
>>> does the hypervisor set up the all the bridge registers here? I'd rather
>>> expect that this is the job of the guest after it detects a hot-plugged
>>> device?
>>
>> I honestly have no idea who is responsible for that. I can see that
>> spapr does not seem to handle hotplugging the way s390x does. I was
>> hoping that Colin maybe know why we have to do that on s390x this way.
>> At least this patch does not change the behavior but only fixes it.
> 
> Right, you're patch certainly fixes a crash, and does not make it any
> worse than it was before, so:
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 
>>>
>>> Also I'm not sure whether the numbering is right in every case. Let's
>>> use the example from the URL that you've mentioned in the cover letter
>>> (http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html step 4):
>>>
>>> Assume that you hot-plug another bridge "5" to "Bridge 2". If I get the
>>> code right, the setup will now look like this:
>>
>> Yes, this is what I would expect! This implies that any bus in the
>> hierarchy lies between Sec and Sub. But that there might be multiple
>> candidates, which feels wrong as you correctly say.
>>
>>>
>>>                CPU
>>>                 |
>>>   --------------+-------------- Bus 0
>>>                 |
>>>                 |  Pri=0
>>>                BR1 Sec=1
>>>                 |  Sub=5
>>>                 |
>>>   ------+-------+----------+--- Bus 1
>>>         |                  |
>>>         |  Pri=1           |  Pri=1
>>>        BR3 Sec=3          BR2 Sec=2
>>>         |  Sub=4           |  Sub=5
>>>         |                  |
>>>   --+---+--- Bus 3       --+-+- Bus 2
>>>     |                        |
>>>     |  Pri=3                 |  Pri=2
>>>    BR4 Sec=4                BR5 Sec=5
>>>     |  Sub=4                 |  Sub=5
>>>     |                        |
>>>   --+---- Bus 4            --+-- Bus 5
>>>
>>> Requests to bus 3 and 4 are now also routed through to bus 2 and 5. That
>>> sounds wrong to me. Or is this how hot-plugging of PCI bridges is
>>> supposed to work??
>>>
>>
>> Learning more about PCI
>>
>> Quoting from [2]
>>
>> "PCIe enumeration is generally done twice. First, your BIOS (UEFI or
>> otherwise) will do it, to figure out who's present and how much memory
>> they need. This data can then be passed on to the host OS who can take
>> it as-is, but Linux and Windows often perform their own enumeration
>> procedure ..."
>>
>> "The custom software part comes in during this enumeration process and
>> that is you must reserve ahead of time PCI Bus numbers, and memory
>> segments for potential future devices -- this is sometimes called 'bus
>> padding'. This avoids the need to re-enumerate the bus in the future
>> which can often not be done without disruption to the system ..."
>>
>> So if I understand correctly, the BIOS/Firmware/QEMU only provides the
>> initial state. After that, the guest is responsible to manage it,
>> especially to manage hotplug.
>>
>> However, I wonder if it would also be up to BIOS/Firmware/QEMU to do
>> this bus padding or if it is completely on the OS side. And if
>> hotplugged devices simply have all 0's.
> 
> Anyway, the current behavior sounds pretty wrong to me. I am sure that
> we can not simply traverse the bridge hierarchy up to the top and change
> the subordinate bus setting everywhere without informing the guest about
> that change. What if the guest OS has cached the values from the
> bridges? It then might use the wrong value for future calculations...

Yes, it also sounds wrong to me. Maybe this is what Firmware manages on
s390x - maybe they reserve bus numbers. Or maybe hotplug has to really
be handled by the guests. Only IBM people can clarify that.

Thanks!

-- 

Thanks,

David / dhildenb

  reply	other threads:[~2019-01-23 14:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-22 12:51 [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes David Hildenbrand
2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges David Hildenbrand
2019-01-23 10:26   ` Thomas Huth
2019-01-23 10:32     ` David Hildenbrand
2019-01-23 10:37       ` Cornelia Huck
2019-01-23 10:39       ` Thomas Huth
2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of " David Hildenbrand
2019-01-23 11:16   ` Thomas Huth
2019-01-23 11:42     ` David Hildenbrand
2019-01-23 13:42       ` Thomas Huth
2019-01-23 14:26         ` David Hildenbrand [this message]
2019-01-22 13:01 ` [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes Cornelia Huck
2019-01-23  8:02   ` David Hildenbrand
2019-01-23  9:47     ` Cornelia Huck
2019-01-28 11:42 ` Cornelia Huck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9420629b-a788-d780-f9c9-b26d1e30a60d@redhat.com \
    --to=david@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=gerald.schaefer@de.ibm.com \
    --cc=linux-s390@vger.kernel.org \
    --cc=lvivier@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=pmorel@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-s390x@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=sebott@linux.ibm.com \
    --cc=thuth@redhat.com \
    --cc=walling@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).