From: Marcel Apfelbaum <marcel@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>, qemu-devel@nongnu.org
Cc: pbonzini@redhat.com, rth@twiddle.net, ehabkost@redhat.com,
mst@redhat.com, "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] hw/pcie: disable IO port fwd by default for pcie-root-port
Date: Wed, 20 Sep 2017 14:16:42 +0300 [thread overview]
Message-ID: <77945ee2-ccfa-becc-eee1-c4b1066303c7@redhat.com> (raw)
In-Reply-To: <24ca79b9-9350-4339-960b-0dbdf9cd4c5b@redhat.com>
On 20/09/2017 14:01, Laszlo Ersek wrote:
> On 09/20/17 09:42, Marcel Apfelbaum wrote:
>> On 20/09/2017 1:15, Laszlo Ersek wrote:
>
>>> It seems that we now have "grp->io_reserve" (a numeric quantity, not a
>>> boolean), from property "io-reserve". The default value is -1.
>>>
>>> According to the documentation added in c1800a162765 ("docs: update
>>> documentation considering PCIE-PCI bridge", 2017-08-18), the value -1
>>> seems to imply, "If any reservation field is -1 then this kind of
>>> reservation is not needed and must be ignored by firmware".
>>>
>>
>> This was decided after long long upstream discussions with
>> different maintainers.
>
> I know :) I participated in the review, and the documentation patch
> noted above carries my R-b.
>
> I didn't mean that the -1 default was wrong, only that we might have to
> polish the definition / semantics a bit.
>
>>> I think we'll need to refine the definition. Once OVMF starts processing
>>> this capability, the behavior should be compatible with earlier
>>> behavior. Assume that a user sets only "mem-reserve" to something
>>> different from -1, and thus the capability appears. When OVMF sees the
>>> capability, it should be able to tell apart:
>>>
>>> - no particular hint about IO space, so continue doing whatever has been
>>> done all this time (default IO space reservation),
>>> - do not request IO space reservation at all, > - a given (positive)
>>> size of IO space should be reserved.
>>>
>>> So I think:
>>>
>>> (a) the above read-only mask setting should be done based on
>>>
>>> (grp->io_reserve == 0)
>>>
>>> and the "enable-io-fwd" property should be unnecessary,
>>>
>>
>> I really want this to be the case, but I need to check the
>> implementation first. The only concern is the semantics.
>>
>> (grp->io_reserve == 0) hints the firmware to take
>> max(hint, <default-value>), since we don't want to reserve less
>> than the IO range needed by existing devices behind the bridge.
>
> I agree.
>
> In OVMF, I would translate (io_reserve == 0) into *not* returning any
> particular IO reservation hint to the generic PciBusDxe driver, for the
> PCI-E port in question. In turn PciBusDxe follows the above "max"
> semantics, as far as I remember; in other words, setting io_reserve to
> zero wouldn't break existing IO requirements.
>
> Here's a table:
>
> - io_reserve == (-1) --> OVMF returns an IO reservation of 512 bytes
> (which in practice gets rounded up to 4KB). If more IO is needed by
> devices behind the bridge, then that larger value is used. This is the
> current behavior and we should keep it.
>
> - io_reserve == 0 --> OVMF returns no IO reservation at all. If any IO
> is needed by devices behind the bridge, then that value is used.
> Otherwise, no IO is allocated.
>
> - io_reserve > 0 --> OVMF returns this value to PciBusDxe. PciBusDxe
> might round it up. If more IO is needed by devices behind the bridge,
> then that larger value is used.
>
>> The implementation issue might be that Guest Firmware would
>> take the alignment as default value for an empty root port.
>> (maybe take it as a bug and "solve" it?)
>
> Hm, I don't get this. What do you mean?
>
Empty PCI Express Root Port with io_reserve = 0 should end with
no IO ports range, right?
But there is some code in SeaBIOS that "aligns" the IO range to 4K,
so we may end up with a 4K range even if io_reserve=0.
>>
>>> (b) the "io-reserve" property should be set to 0 for 2.11 machine types
>>> and onward, and to -1 for 2.10 and earlier (for compatibility),
>>>
>>
>> Michael asked us to wait a little before changing the default,
>> you can ask him for more info :)
>
> Sure, I'm totally fine with delaying the change to the default value.
>
>>
>>> (c) the documentation in "docs/pcie_pci_bridge.txt" should be updated to
>>> say:
>>> * (-1) --> default firmware behavior (unspecified)
>>> * 0 --> do not reserve
>>> * >0 --> specific reservation requested
>>>
>>
>> Agreed, once I re-check SeaBIOS to confirm behavior.
>
> BTW, if the OVMF -- more precisely, PciBusDxe -- and SeaBIOS behaviors
> turn out to differ significantly in the handling of the values (although
> I don't expect it, see above -- last time I looked, the interpretations
> were similar), I think it's a possibility (although not optimal) to say
> that the interpretation of these values is firmware specific.
>
> We won't know for certain until we write the code and test both firmwares.
>
>>
>>> (d) pci_bridge_qemu_reserve_cap_init() should be updated accordingly
>>> (i.e., a conflict exists if both mem_pref_32_reserve and
>>> mem_pref_64_reserve are *positive*),
>>>
>>
>> I thought we have this check already.
>
> We have a check like this, but it doesn't use the exact condition that
> I'm suggesting above (emphasis on *positive*). We now have
>
> if (mem_pref_32_reserve != (uint32_t)-1 &&
> mem_pref_64_reserve != (uint64_t)-1) {
> error_setg(errp,
>
> but after introducing the zero value, this is going to be too strict.
> Consider all the possible combinations:
>
> * (-1; -1): default fw behavior, check is OK
>
> * (-1; 0): fw sticks with the default for the first component, picks
> "no reservation" for the second component, check is OK
>
> * (-1; >0): fw can treat this identically to ( 0; >0) -- see below. This
> is justified because, while the first component says "use
> the default", the entire situation of having such a
> capability is new, so the firmware can introduce new ways
> for handling it. The check passes, which is good.
>
> * ( 0; 0): check reports error, but the firmware can handle this case
> fine (no reservation for either component)
>
> * ( 0; >0): check reports error, but the fw can handle this (no
> reservation for the first component, specific reservation
> for the second)
>
> * (>0; >0): conflict, check is OK
>
> (I'm skipping the description of the inverted orderings, such as (0;-1),
> they behave similarly.)
>
> So we need to accept 0 as "valid" for either component:
>
> if (mem_pref_32_reserve > 0 &&
> mem_pref_32_reserve < (uint32_t)-1 &&
> mem_pref_64_reserve > 0 &&
> mem_pref_64_reserve < (uint64_t)-1) {
> error_setg(errp,
>
Got it, you are right. How did we end up with something
complicated again :( ?
(complicated = needs documentation and not straight forward)
>>
>>>
>>> Second, when determining the reservations in OVMF, I would like to look
>>> only at the capability fields, and not do a read-write-read-write
>>> quadruplet to the IO base/limit registers. Do you agree?
>>>
>>
>> Well, as stated before, the semantics for the hint is
>> max(hint, <sum of all IO/MEM ranges for devices behind the bridge>).
>> If you can follow it, that would be OK.
>
> In OVMF there are two separate questions:
> - how to report the requested reservations,
> - how to act upon the reported values.
>
> The first question pertains to "platform code", i.e., what I'm going to
> implement under OvmfPkg. The second question pertains to "universal
> code" (under "MdeModulePkg/Bus/Pci/PciBusDxe"), which is a lot harder to
> modify, because it is shipped on a wide range of physical platforms too.
>
> Again, my understanding is that PciBusDxe implements the "max" semantics
> that you describe (pertaining to the 2nd question).
>
> My interest is in figuring out the 1st question now, that is, where I
> should take the information from that goes into the "reservation
> description" that PciBusDxe understands. My preference would be to look
> only at the PCIE port in question (i.e. no other PCI devices at all),
> and only at said capability in the config space of the PCIE port.
>
Sounds OK.
Thanks,
Marcel
>> Getting back to this patch, setting io-reserve to 0
>> would require also:
>>
>> + pci_word_test_and_clear_mask(pci_dev->wmask + PCI_COMMAND,
>> + PCI_COMMAND_IO);
>> + pci_dev->wmask[PCI_IO_BASE] = 0;
>> + pci_dev->wmask[PCI_IO_LIMIT] = 0;
>>
>> otherwise the Guest OS would not behave, just be aware of it.
>
> Absolutely, no doubt about this.
>
> Thanks,
> Laszlo
>
>
>>>> static const VMStateDescription vmstate_rp_dev = {
>>>> .name = "pcie-root-port",
>>>> .version_id = 1,
>>>> @@ -78,6 +111,7 @@ static const VMStateDescription vmstate_rp_dev = {
>>>> static Property gen_rp_props[] = {
>>>> DEFINE_PROP_BOOL("x-migrate-msix", GenPCIERootPort,
>>>> migrate_msix, true),
>>>> + DEFINE_PROP_BOOL("enable-io-fwd", GenPCIERootPort,
>>>> enable_io_fwd, false),
>>>> DEFINE_PROP_END_OF_LIST()
>>>> };
>>>> @@ -86,6 +120,7 @@ static void gen_rp_dev_class_init(ObjectClass
>>>> *klass, void *data)
>>>> DeviceClass *dc = DEVICE_CLASS(klass);
>>>> PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>>>> PCIERootPortClass *rpc = PCIE_ROOT_PORT_CLASS(klass);
>>>> + GenPCIERootPortClass *grpc = GEN_PCIE_ROOT_PORT_CLASS(klass);
>>>> k->vendor_id = PCI_VENDOR_ID_REDHAT;
>>>> k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_RP;
>>>> @@ -96,6 +131,8 @@ static void gen_rp_dev_class_init(ObjectClass
>>>> *klass, void *data)
>>>> rpc->interrupts_init = gen_rp_interrupts_init;
>>>> rpc->interrupts_uninit = gen_rp_interrupts_uninit;
>>>> rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
>>>> + grpc->parent_realize = dc->realize;
>>>> + dc->realize = gen_rp_realize;
>>>> }
>>>> static const TypeInfo gen_rp_dev_info = {
>>>> @@ -103,6 +140,7 @@ static const TypeInfo gen_rp_dev_info = {
>>>> .parent = TYPE_PCIE_ROOT_PORT,
>>>> .instance_size = sizeof(GenPCIERootPort),
>>>> .class_init = gen_rp_dev_class_init,
>>>> + .class_size = sizeof(GenPCIERootPortClass),
>>>> };
>>>> static void gen_rp_register_types(void)
>>>> diff --git a/include/hw/compat.h b/include/hw/compat.h
>>>> index 3e101f8f67..843bf4a3a5 100644
>>>> --- a/include/hw/compat.h
>>>> +++ b/include/hw/compat.h
>>>> @@ -2,7 +2,11 @@
>>>> #define HW_COMPAT_H
>>>> #define HW_COMPAT_2_10 \
>>>> - /* empty */
>>>> + {\
>>>> + .driver = "pcie-root-port",\
>>>> + .property = "enable-io-fwd",\
>>>> + .value = "true",\
>>>> + },
>>>> #define HW_COMPAT_2_9 \
>>>> {\
>>>>
>>>
>>
>
next prev parent reply other threads:[~2017-09-20 13:46 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-06 14:26 [Qemu-devel] [PATCH 0/2] hw/pcie: disable IO port fwd by default for pcie-root-port Marcel Apfelbaum
2017-09-06 14:26 ` [Qemu-devel] [PATCH 1/2] pc: add 2.11 machine types Marcel Apfelbaum
2017-09-06 14:26 ` [Qemu-devel] [PATCH 2/2] hw/pcie: disable IO port fwd by default for pcie-root-port Marcel Apfelbaum
2017-09-06 14:49 ` Eduardo Habkost
2017-09-08 11:39 ` Marcel Apfelbaum
2017-09-19 22:15 ` Laszlo Ersek
2017-09-20 7:42 ` Marcel Apfelbaum
2017-09-20 11:01 ` Laszlo Ersek
2017-09-20 11:16 ` Marcel Apfelbaum [this message]
2017-09-20 11:35 ` Laszlo Ersek
2017-09-27 10:06 ` Marcel Apfelbaum
2017-09-28 7:48 ` Laszlo Ersek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=77945ee2-ccfa-becc-eee1-c4b1066303c7@redhat.com \
--to=marcel@redhat.com \
--cc=dgilbert@redhat.com \
--cc=ehabkost@redhat.com \
--cc=lersek@redhat.com \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).