From: Laszlo Ersek <lersek@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: boris.ostrovsky@oracle.com,
"Philippe Mathieu-Daudé" <philmd@redhat.com>,
qemu-devel@nongnu.org, aaron.young@oracle.com
Subject: Re: [PATCH v2 6/7] x68: acpi: trigger SMI before sending hotplug Notify event to OSPM
Date: Wed, 26 Aug 2020 15:32:07 +0200 [thread overview]
Message-ID: <1f563a82-4439-6346-e92e-d734e93418a1@redhat.com> (raw)
In-Reply-To: <cfd4dd52-4827-2288-4b4e-b396d48494f0@redhat.com>
On 08/26/20 11:24, Laszlo Ersek wrote:
> Hi Igor,
>
> On 08/25/20 19:25, Laszlo Ersek wrote:
>
>> So I would suggest fetching the CNEW array element back into "uid"
>> first, then using "uid" for both the NOTIFY call, and the (currently
>> missing) restoration of CSEL. Then we can write 1 to CINS.
>>
>> Expressed as a patch on top of yours:
>>
>>> diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
>>> index 4864c3b39694..2bea6144fd5e 100644
>>> --- a/hw/acpi/cpu.c
>>> +++ b/hw/acpi/cpu.c
>>> @@ -564,8 +564,11 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
>>> aml_append(method, aml_store(zero, cpu_idx));
>>> while_ctx = aml_while(aml_lless(cpu_idx, num_added_cpus));
>>> {
>>> - aml_append(while_ctx, aml_call2(CPU_NOTIFY_METHOD,
>>> - aml_derefof(aml_index(new_cpus, cpu_idx)), dev_chk));
>>> + aml_append(while_ctx,
>>> + aml_store(aml_derefof(aml_index(new_cpus, cpu_idx)), uid));
>>> + aml_append(while_ctx,
>>> + aml_call2(CPU_NOTIFY_METHOD, uid, dev_chk));
>>> + aml_append(while_ctx, aml_store(uid, cpu_selector));
>>> aml_append(while_ctx, aml_store(one, ins_evt));
>>> aml_append(while_ctx, aml_increment(cpu_idx));
>>> }
>>
>> This effects the following change, in the decompiled method:
>>
>>> @@ -37,15 +37,17 @@
>>> If ((Local_NumAddedCpus != Zero))
>>> {
>>> \_SB.PCI0.SMI0.SMIC = 0x04
>>> }
>>>
>>> Local_CpuIdx = Zero
>>> While ((Local_CpuIdx < Local_NumAddedCpus))
>>> {
>>> - CTFY (DerefOf (CNEW [Local_CpuIdx]), One)
>>> + Local_Uid = DerefOf (CNEW [Local_CpuIdx])
>>> + CTFY (Local_Uid, One)
>>> + \_SB.PCI0.PRES.CSEL = Local_Uid
>>> \_SB.PCI0.PRES.CINS = One
>>> Local_CpuIdx++
>>> }
>>>
>>> Release (\_SB.PCI0.PRES.CPLK)
>>> }
>>
>> With this change, the
>>
>> virsh setvcpus DOMAIN 8 --live
>>
>> command works for me. The topology in my test domain has CPU#0 and
>> CPU#2 cold-plugged, so the command adds 6 VCPUs. Viewed from the
>> firmware side, the 6 "device_add" commands, issued in close succession
>> by libvirtd, coalesce into 4 "batches". (And of course the firmware
>> sees the 4 batches back-to-back.)
>
> unfortunately, with more testing, I have run into two more races:
>
> (1) When a "device_add" occurs after the ACPI loop collects the CPUS
> from the register block, but before the SMI.
>
> Here, the "stray CPU" is processed fine by the firmware. However,
> the CTFY loop in ACPI does not know about the CPU, so it doesn't
> clear the pending insert event for it. And when the firmware is
> entered with an SMI for the *next* time, the firmware sees the same
> CPU *again* as pending, and tries to relocate it again. Bad things
> happen.
>
> (2) When a "device_add" occurs after the SMI, but before the firmware
> collects the pending CPUs from the register block.
>
> Here, the firmware collects the "stray CPU". However, the "broadcast
> SMI", with which we entered the firmware, did *not* cover the stray
> CPU -- the CPU_FOREACH() loop in ich9_apm_ctrl_changed() could not
> make the SMI pending for the new CPU, because at that time, the CPU
> had not been added yet. As a result, when the firmware sends an
> INIT-SIPI-SIPI to the new CPU, expecting it to boot right into SMM,
> the new CPU instead boots straight into the post-RSM (normal mode)
> "pen", skipping its initial SMI handler. Meaning that the CPU halts
> nicely, but its SMBASE is never relocated, and the SMRAM message
> exchange with the BSP falls apart.
>
> Possible mitigations I can think of:
>
> For problem (1):
>
> (1a) Change the firmware so it notices that it has relocated the
> "stray" CPU before -- such CPUs should be simply skipped in the
> firmware. The next time the CTFY loop runs in ACPI, it will clear
> the pending event.
>
> (1b) Alternatively, stop consuming the hotplug register block in the
> firmware altogether, and work out general message passing, from
> ACPI to firmware. See the outline here:
>
> http://mid.mail-archive.com/cf887d74-f65d-602a-9629-3d25cef93a69@redhat.com
>
> For problem (2):
>
> (2a) Change the firmware so that it sends a directed SMI as well to
> each CPU, just before sending an INIT-SIPI-SIPI. This should be
> idempotent -- if the broadcast SMI *has* covered the the CPU,
> then sending a directed SMI should make no difference.
>
> (2b) Alternatively, change the "device_add" command in QEMU so that,
> if "CPU hotplug with SMI" has been negotiated, the new CPU is
> added with the SMI made pending for it at once. (That is, no
> hot-plugged CPU would exist with the directed SMI *not* pending
> for it.)
>
> (2c) Alternatively, approach (1b) would fix problem (2) as well -- the
> firmware would only relocate such CPUs that ACPI collected before
> injecting the SMI. So all those CPUs would have the SMI pending.
>
>
> I can experiment with (1a) and (2a),
My patches for (1a) and (1b) seem to work -- my workstation has 10
PCPUs, and I'm using a guest with 20 possible VCPUs and 2 cold-plugged
VCPUs on it, for testing. The patches survive the hot-plugging of 18
VCPUs in one go, or two batches like 9+9. I can see the fixes being
exercised.
Unless you strongly disagree (or I find issues in further testing), I
propose that I post these fixes to edk2-devel (they should still be in
scope for the upcoming release), and that we stick with your current
patch series for QEMU (v3 -- upcoming, or maybe already posted).
Thanks!
Laszlo
next prev parent reply other threads:[~2020-08-26 13:32 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-18 12:22 [PATCH v2 0/7] x86: fix cpu hotplug with secure boot Igor Mammedov
2020-08-18 12:22 ` [PATCH v2 1/7] x86: lpc9: let firmware negotiate 'CPU hotplug with SMI' features Igor Mammedov
2020-08-19 8:39 ` Laszlo Ersek
2020-08-19 8:51 ` Laszlo Ersek
2020-08-19 8:58 ` Cornelia Huck
2020-08-19 9:14 ` Laszlo Ersek
2020-08-19 12:37 ` Igor Mammedov
2020-08-20 14:56 ` [PATCH v3 " Igor Mammedov
2020-08-21 13:46 ` Laszlo Ersek
2020-08-24 11:00 ` [PATCH v4 1/6] " Igor Mammedov
2020-08-25 12:28 ` Laszlo Ersek
2020-08-18 12:22 ` [PATCH v2 2/7] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use Igor Mammedov
2020-08-18 12:22 ` [PATCH v2 3/7] x86: cpuhp: refuse cpu hot-unplug request earlier if not supported Igor Mammedov
2020-08-25 12:50 ` Laszlo Ersek
2020-08-18 12:22 ` [PATCH v2 4/7] acpi: add aml_land() and aml_break() primitives Igor Mammedov
2020-08-18 13:03 ` Philippe Mathieu-Daudé
2020-08-25 13:01 ` Laszlo Ersek
2020-08-18 12:22 ` [PATCH v2 5/7] tests: acpi: mark to be changed tables in bios-tables-test-allowed-diff Igor Mammedov
2020-08-18 12:22 ` [PATCH v2 6/7] x68: acpi: trigger SMI before sending hotplug Notify event to OSPM Igor Mammedov
2020-08-25 17:25 ` Laszlo Ersek
2020-08-26 9:23 ` Igor Mammedov
2020-08-26 9:24 ` Laszlo Ersek
2020-08-26 11:55 ` Igor Mammedov
2020-08-26 15:12 ` Laszlo Ersek
2020-08-26 13:32 ` Laszlo Ersek [this message]
2020-08-26 13:32 ` Laszlo Ersek
2020-08-26 14:10 ` Igor Mammedov
2020-08-18 12:22 ` [PATCH v2 7/7] tests: acpi: update acpi blobs with new AML Igor Mammedov
2020-08-18 12:56 ` [PATCH v2 0/7] x86: fix cpu hotplug with secure boot no-reply
2020-08-18 13:01 ` Philippe Mathieu-Daudé
2020-08-18 13:39 ` no-reply
2020-08-18 14:07 ` Philippe Mathieu-Daudé
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1f563a82-4439-6346-e92e-d734e93418a1@redhat.com \
--to=lersek@redhat.com \
--cc=aaron.young@oracle.com \
--cc=boris.ostrovsky@oracle.com \
--cc=imammedo@redhat.com \
--cc=philmd@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).