qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Marcel Apfelbaum <marcel@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>, qemu-devel@nongnu.org
Cc: pbonzini@redhat.com, rth@twiddle.net, ehabkost@redhat.com,
	mst@redhat.com, "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 2/2] hw/pcie: disable IO port fwd by default for pcie-root-port
Date: Wed, 20 Sep 2017 14:16:42 +0300	[thread overview]
Message-ID: <77945ee2-ccfa-becc-eee1-c4b1066303c7@redhat.com> (raw)
In-Reply-To: <24ca79b9-9350-4339-960b-0dbdf9cd4c5b@redhat.com>

On 20/09/2017 14:01, Laszlo Ersek wrote:
> On 09/20/17 09:42, Marcel Apfelbaum wrote:
>> On 20/09/2017 1:15, Laszlo Ersek wrote:
> 
>>> It seems that we now have "grp->io_reserve" (a numeric quantity, not a
>>> boolean), from property "io-reserve". The default value is -1.
>>>
>>> According to the documentation added in c1800a162765 ("docs: update
>>> documentation considering PCIE-PCI bridge", 2017-08-18), the value -1
>>> seems to imply, "If any reservation field is -1 then this kind of
>>> reservation is not needed and must be ignored by firmware".
>>>
>>
>> This was decided after long long upstream discussions with
>> different maintainers.
> 
> I know :) I participated in the review, and the documentation patch
> noted above carries my R-b.
> 
> I didn't mean that the -1 default was wrong, only that we might have to
> polish the definition / semantics a bit.
> 
>>> I think we'll need to refine the definition. Once OVMF starts processing
>>> this capability, the behavior should be compatible with earlier
>>> behavior. Assume that a user sets only "mem-reserve" to something
>>> different from -1, and thus the capability appears. When OVMF sees the
>>> capability, it should be able to tell apart:
>>>
>>> - no particular hint about IO space, so continue doing whatever has been
>>> done all this time (default IO space reservation),
>>> - do not request IO space reservation at all, > - a given (positive)
>>> size of IO space should be reserved.
>>>
>>> So I think:
>>>
>>> (a) the above read-only mask setting should be done based on
>>>
>>>     (grp->io_reserve == 0)
>>>
>>> and the "enable-io-fwd" property should be unnecessary,
>>>
>>
>> I really want this to be the case,  but I need to check the
>> implementation first. The only concern is the semantics.
>>
>> (grp->io_reserve == 0) hints the firmware to take
>> max(hint, <default-value>), since we don't want to reserve less
>> than the IO range needed by existing devices behind the bridge.
> 
> I agree.
> 
> In OVMF, I would translate (io_reserve == 0) into *not* returning any
> particular IO reservation hint to the generic PciBusDxe driver, for the
> PCI-E port in question. In turn PciBusDxe follows the above "max"
> semantics, as far as I remember; in other words, setting io_reserve to
> zero wouldn't break existing IO requirements.
> 
> Here's a table:
> 
> - io_reserve == (-1) --> OVMF returns an IO reservation of 512 bytes
>    (which in practice gets rounded up to 4KB). If more IO is needed by
>    devices behind the bridge, then that larger value is used. This is the
>    current behavior and we should keep it.
> 
> - io_reserve == 0 --> OVMF returns no IO reservation at all. If any IO
>    is needed by devices behind the bridge, then that value is used.
>    Otherwise, no IO is allocated.
> 
> - io_reserve > 0 --> OVMF returns this value to PciBusDxe. PciBusDxe
>    might round it up. If more IO is needed by devices behind the bridge,
>    then that larger value is used.
> 
>> The implementation issue might be that Guest Firmware would
>> take the alignment as default value for an empty root port.
>> (maybe take it as a bug and "solve" it?)
> 
> Hm, I don't get this. What do you mean?
> 

Empty PCI Express Root Port with io_reserve = 0 should end with
no IO ports range, right?

But there is some code in SeaBIOS that "aligns" the IO range to 4K,
so we may end up with a 4K range even if io_reserve=0.


>>
>>> (b) the "io-reserve" property should be set to 0 for 2.11 machine types
>>> and onward, and to -1 for 2.10 and earlier (for compatibility),
>>>
>>
>> Michael asked us to wait a little before changing the default,
>> you can ask him for more info :)
> 
> Sure, I'm totally fine with delaying the change to the default value.
> 
>>
>>> (c) the documentation in "docs/pcie_pci_bridge.txt" should be updated to
>>> say:
>>> * (-1) --> default firmware behavior (unspecified)
>>> *   0  --> do not reserve
>>> *  >0  --> specific reservation requested
>>>
>>
>> Agreed, once I re-check SeaBIOS to confirm behavior.
> 
> BTW, if the OVMF -- more precisely, PciBusDxe -- and SeaBIOS behaviors
> turn out to differ significantly in the handling of the values (although
> I don't expect it, see above -- last time I looked, the interpretations
> were similar), I think it's a possibility (although not optimal) to say
> that the interpretation of these values is firmware specific.
> 
> We won't know for certain until we write the code and test both firmwares.
> 
>>
>>> (d) pci_bridge_qemu_reserve_cap_init() should be updated accordingly
>>> (i.e., a conflict exists if both mem_pref_32_reserve and
>>> mem_pref_64_reserve are *positive*),
>>>
>>
>> I thought we have this check already.
> 
> We have a check like this, but it doesn't use the exact condition that
> I'm suggesting above (emphasis on *positive*). We now have
> 
>      if (mem_pref_32_reserve != (uint32_t)-1 &&
>          mem_pref_64_reserve != (uint64_t)-1) {
>          error_setg(errp,
> 
> but after introducing the zero value, this is going to be too strict.
> Consider all the possible combinations:
> 
> * (-1; -1): default fw behavior, check is OK
> 
> * (-1;  0): fw sticks with the default for the first component, picks
>              "no reservation" for the second component, check is OK
> 
> * (-1; >0): fw can treat this identically to ( 0; >0) -- see below. This
>              is justified because, while the first component says "use
>              the default", the entire situation of having such a
>              capability is new, so the firmware can introduce new ways
>              for handling it. The check passes, which is good.
> 
> * ( 0;  0): check reports error, but the firmware can handle this case
>              fine (no reservation for either component)
> 
> * ( 0; >0): check reports error, but the fw can handle this (no
>              reservation for the first component, specific reservation
>              for the second)
> 
> * (>0; >0): conflict, check is OK
> 
> (I'm skipping the description of the inverted orderings, such as (0;-1),
> they behave similarly.)
> 
> So we need to accept 0 as "valid" for either component:
> 
>      if (mem_pref_32_reserve > 0 &&
>          mem_pref_32_reserve < (uint32_t)-1 &&
>          mem_pref_64_reserve > 0 &&
>          mem_pref_64_reserve < (uint64_t)-1) {
>          error_setg(errp,
> 

Got it, you are right. How did we end up with something
complicated again :( ?
(complicated = needs documentation and not straight forward)

>>
>>>
>>> Second, when determining the reservations in OVMF, I would like to look
>>> only at the capability fields, and not do a read-write-read-write
>>> quadruplet to the IO base/limit registers. Do you agree?
>>>
>>
>> Well, as stated before, the semantics for the hint is
>> max(hint, <sum of all IO/MEM ranges for devices behind the bridge>).
>> If you can follow it, that would be OK.
> 
> In OVMF there are two separate questions:
> - how to report the requested reservations,
> - how to act upon the reported values.
> 
> The first question pertains to "platform code", i.e., what I'm going to
> implement under OvmfPkg. The second question pertains to "universal
> code" (under "MdeModulePkg/Bus/Pci/PciBusDxe"), which is a lot harder to
> modify, because it is shipped on a wide range of physical platforms too.
> 
> Again, my understanding is that PciBusDxe implements the "max" semantics
> that you describe (pertaining to the 2nd question).
> 
> My interest is in figuring out the 1st question now, that is, where I
> should take the information from that goes into the "reservation
> description" that PciBusDxe understands. My preference would be to look
> only at the PCIE port in question (i.e. no other PCI devices at all),
> and only at said capability in the config space of the PCIE port.
> 

Sounds OK.

Thanks,
Marcel

>> Getting back to this patch, setting io-reserve to 0
>> would require also:
>>
>>     +      pci_word_test_and_clear_mask(pci_dev->wmask + PCI_COMMAND,
>>     +                                     PCI_COMMAND_IO);
>>     +      pci_dev->wmask[PCI_IO_BASE] = 0;
>>     +      pci_dev->wmask[PCI_IO_LIMIT] = 0;
>>
>> otherwise the Guest OS would not behave, just be aware of it.
> 
> Absolutely, no doubt about this.
> 
> Thanks,
> Laszlo
> 
> 
>>>>    static const VMStateDescription vmstate_rp_dev = {
>>>>        .name = "pcie-root-port",
>>>>        .version_id = 1,
>>>> @@ -78,6 +111,7 @@ static const VMStateDescription vmstate_rp_dev = {
>>>>      static Property gen_rp_props[] = {
>>>>        DEFINE_PROP_BOOL("x-migrate-msix", GenPCIERootPort,
>>>> migrate_msix, true),
>>>> +    DEFINE_PROP_BOOL("enable-io-fwd", GenPCIERootPort,
>>>> enable_io_fwd, false),
>>>>        DEFINE_PROP_END_OF_LIST()
>>>>    };
>>>>    @@ -86,6 +120,7 @@ static void gen_rp_dev_class_init(ObjectClass
>>>> *klass, void *data)
>>>>        DeviceClass *dc = DEVICE_CLASS(klass);
>>>>        PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>>>>        PCIERootPortClass *rpc = PCIE_ROOT_PORT_CLASS(klass);
>>>> +    GenPCIERootPortClass *grpc = GEN_PCIE_ROOT_PORT_CLASS(klass);
>>>>          k->vendor_id = PCI_VENDOR_ID_REDHAT;
>>>>        k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_RP;
>>>> @@ -96,6 +131,8 @@ static void gen_rp_dev_class_init(ObjectClass
>>>> *klass, void *data)
>>>>        rpc->interrupts_init = gen_rp_interrupts_init;
>>>>        rpc->interrupts_uninit = gen_rp_interrupts_uninit;
>>>>        rpc->aer_offset = GEN_PCIE_ROOT_PORT_AER_OFFSET;
>>>> +    grpc->parent_realize = dc->realize;
>>>> +    dc->realize = gen_rp_realize;
>>>>    }
>>>>      static const TypeInfo gen_rp_dev_info = {
>>>> @@ -103,6 +140,7 @@ static const TypeInfo gen_rp_dev_info = {
>>>>        .parent        = TYPE_PCIE_ROOT_PORT,
>>>>        .instance_size = sizeof(GenPCIERootPort),
>>>>        .class_init    = gen_rp_dev_class_init,
>>>> +    .class_size    = sizeof(GenPCIERootPortClass),
>>>>    };
>>>>       static void gen_rp_register_types(void)
>>>> diff --git a/include/hw/compat.h b/include/hw/compat.h
>>>> index 3e101f8f67..843bf4a3a5 100644
>>>> --- a/include/hw/compat.h
>>>> +++ b/include/hw/compat.h
>>>> @@ -2,7 +2,11 @@
>>>>    #define HW_COMPAT_H
>>>>      #define HW_COMPAT_2_10 \
>>>> -    /* empty */
>>>> +    {\
>>>> +        .driver   = "pcie-root-port",\
>>>> +        .property = "enable-io-fwd",\
>>>> +        .value    = "true",\
>>>> +    },
>>>>      #define HW_COMPAT_2_9 \
>>>>        {\
>>>>
>>>
>>
> 

  reply	other threads:[~2017-09-20 13:46 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-06 14:26 [Qemu-devel] [PATCH 0/2] hw/pcie: disable IO port fwd by default for pcie-root-port Marcel Apfelbaum
2017-09-06 14:26 ` [Qemu-devel] [PATCH 1/2] pc: add 2.11 machine types Marcel Apfelbaum
2017-09-06 14:26 ` [Qemu-devel] [PATCH 2/2] hw/pcie: disable IO port fwd by default for pcie-root-port Marcel Apfelbaum
2017-09-06 14:49   ` Eduardo Habkost
2017-09-08 11:39     ` Marcel Apfelbaum
2017-09-19 22:15   ` Laszlo Ersek
2017-09-20  7:42     ` Marcel Apfelbaum
2017-09-20 11:01       ` Laszlo Ersek
2017-09-20 11:16         ` Marcel Apfelbaum [this message]
2017-09-20 11:35           ` Laszlo Ersek
2017-09-27 10:06         ` Marcel Apfelbaum
2017-09-28  7:48           ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=77945ee2-ccfa-becc-eee1-c4b1066303c7@redhat.com \
    --to=marcel@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=lersek@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).