* [PATCH 0/2] libacpi: Fix memory corruption on ACPI CPU hotplug
@ 2025-09-11 11:53 Alejandro Vallejo
2025-09-11 11:53 ` [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory Alejandro Vallejo
2025-09-11 11:53 ` [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs Alejandro Vallejo
0 siblings, 2 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2025-09-11 11:53 UTC (permalink / raw)
To: xen-devel; +Cc: Alejandro Vallejo, Jan Beulich, Anthony PERARD
Hi,
This series is a follow-up to a prior patch to remove ACPI CPU hotplug AML from
PVH guests. Between now and then the working assumption for the corruption has
been identified to be more general than PVH. This series rewords the commit
message for that patch (which became patch 2 in this series), and includes a
patch 1 that prevents AML overflowing the MADT during execution of _GPE.
More details of the bug in patch1's commit message.
pipeline: https://gitlab.com/xen-project/people/agvallejo/xen/-/pipelines/2034006083
original patch: https://lore.kernel.org/xen-devel/20250910144921.29048-1-alejandro.garciavallejo@amd.com/
Alejandro Vallejo (2):
libacpi: Prevent CPU hotplug AML from corrupting memory
libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs
tools/libacpi/mk_dsdt.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
base-commit: aad6ebf0596f7eda6ea709f1c293ef5911ae8938
--
2.43.0
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 11:53 [PATCH 0/2] libacpi: Fix memory corruption on ACPI CPU hotplug Alejandro Vallejo
@ 2025-09-11 11:53 ` Alejandro Vallejo
2025-09-11 12:03 ` Andrew Cooper
2025-09-11 14:52 ` Jan Beulich
2025-09-11 11:53 ` [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs Alejandro Vallejo
1 sibling, 2 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2025-09-11 11:53 UTC (permalink / raw)
To: xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Anthony PERARD, Grygorii Strashko
CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
by the device model. The GPE handler checks this and compares it against
the "online" flag on each MADT LAPIC entry, setting the flag to its
related bit in the bitmap and adjusting the table's checksum.
The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
reaches 128, even if that overflows the MADT into some other (hopefully
mapped) memory. The reading isn't as problematic as the writing though.
If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
then the bit where the "online" flag would be is flipped, thus
corrupting that memory. And the MADT checksum gets adjusted for a flip
that happened outside its range. It's all terrible.
Note that this corruption happens regardless of the device-model being
present or not, because even if the bitmap holds 0s, the overflowed
memory might not at the bits corresponding to the "online" flag.
This patch adjusts the DSDT so entries >=NCPUS are skipped.
Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
Reported-by: Grygorii Strashko <grygorii_strashko@epam.com>
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested
there.
v2:
* New patch with the general fix for HVM too. Turns out the correction
logic was buggy after all.
---
tools/libacpi/mk_dsdt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 8ac4f9d0b4..06a229e2e9 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -239,7 +239,8 @@ int main(int argc, char **argv)
/* Extract current CPU's status: 0=offline; 1=online. */
stmt("And", "Local1, 1, Local2");
/* Check if status is up-to-date in the relevant MADT LAPIC entry... */
- push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
+ push_block("If", "And(LLess(%d, NCPU), LNotEqual(Local2, \\_SB.PR%02X.FLG))",
+ cpu, cpu);
/* ...If not, update it and the MADT checksum, and notify OSPM. */
stmt("Store", "Local2, \\_SB.PR%02X.FLG", cpu);
push_block("If", "LEqual(Local2, 1)");
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs
2025-09-11 11:53 [PATCH 0/2] libacpi: Fix memory corruption on ACPI CPU hotplug Alejandro Vallejo
2025-09-11 11:53 ` [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory Alejandro Vallejo
@ 2025-09-11 11:53 ` Alejandro Vallejo
2025-09-11 15:07 ` Jan Beulich
1 sibling, 1 reply; 15+ messages in thread
From: Alejandro Vallejo @ 2025-09-11 11:53 UTC (permalink / raw)
To: xen-devel; +Cc: Alejandro Vallejo, Jan Beulich, Anthony PERARD
PVH guests have no DM, so this causes the guest to fetch the online CPU
bitmap from an unbacked 0xaf00 PIO port when executing the GPE handler.
Seeing how ACPI CPU hotplug is the only event delivered via GPE, remove
the GPE handler in addition to anything ACPI CPU hotplug related.
This shrinks PVH's DSDT substantially and prevents spuriously executing
a large amount of AML with no purpose at all.
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
---
v2:
* Adjusted commit message
* All other tags except S-by moved to patch 1.
---
tools/libacpi/mk_dsdt.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index 06a229e2e9..cc4ed3961c 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -218,6 +218,11 @@ int main(int argc, char **argv)
pop_block();
/**** Processor end ****/
#else
+ if (dm_version == QEMU_NONE) {
+ pop_block();
+ pop_block();
+ return 0;
+ }
/* Operation Region 'PRST': bitmask of online CPUs. */
stmt("OperationRegion", "PRST, SystemIO, %#x, %d",
@@ -265,10 +270,6 @@ int main(int argc, char **argv)
pop_block();
pop_block();
- if (dm_version == QEMU_NONE) {
- pop_block();
- return 0;
- }
/**** Processor end ****/
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 11:53 ` [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory Alejandro Vallejo
@ 2025-09-11 12:03 ` Andrew Cooper
2025-09-11 12:35 ` Alejandro Vallejo
2025-09-11 15:04 ` Jan Beulich
2025-09-11 14:52 ` Jan Beulich
1 sibling, 2 replies; 15+ messages in thread
From: Andrew Cooper @ 2025-09-11 12:03 UTC (permalink / raw)
To: Alejandro Vallejo, xen-devel
Cc: Jan Beulich, Anthony PERARD, Grygorii Strashko
On 11/09/2025 12:53 pm, Alejandro Vallejo wrote:
> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
> by the device model. The GPE handler checks this and compares it against
> the "online" flag on each MADT LAPIC entry, setting the flag to its
> related bit in the bitmap and adjusting the table's checksum.
>
> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
> reaches 128, even if that overflows the MADT into some other (hopefully
> mapped) memory. The reading isn't as problematic as the writing though.
>
> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
> then the bit where the "online" flag would be is flipped, thus
> corrupting that memory. And the MADT checksum gets adjusted for a flip
> that happened outside its range. It's all terrible.
>
> Note that this corruption happens regardless of the device-model being
> present or not, because even if the bitmap holds 0s, the overflowed
> memory might not at the bits corresponding to the "online" flag.
>
> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>
> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com>
> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
> ---
> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested
> there.
>
> v2:
> * New patch with the general fix for HVM too. Turns out the correction
> logic was buggy after all.
Hmm, this does sound rather more serious. I have a nagging feeling that
until recently we always wrote 128 MADT entries.
So, while this looks like a good fix, I think we might want a second
Fixes tag.
~Andrew
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 12:03 ` Andrew Cooper
@ 2025-09-11 12:35 ` Alejandro Vallejo
2025-09-11 13:29 ` Andrew Cooper
2025-09-11 15:04 ` Jan Beulich
1 sibling, 1 reply; 15+ messages in thread
From: Alejandro Vallejo @ 2025-09-11 12:35 UTC (permalink / raw)
To: Andrew Cooper, xen-devel; +Cc: Jan Beulich, Anthony PERARD, Grygorii Strashko
On Thu Sep 11, 2025 at 2:03 PM CEST, Andrew Cooper wrote:
> On 11/09/2025 12:53 pm, Alejandro Vallejo wrote:
>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
>> by the device model. The GPE handler checks this and compares it against
>> the "online" flag on each MADT LAPIC entry, setting the flag to its
>> related bit in the bitmap and adjusting the table's checksum.
>>
>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
>> reaches 128, even if that overflows the MADT into some other (hopefully
>> mapped) memory. The reading isn't as problematic as the writing though.
>>
>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
>> then the bit where the "online" flag would be is flipped, thus
>> corrupting that memory. And the MADT checksum gets adjusted for a flip
>> that happened outside its range. It's all terrible.
>>
>> Note that this corruption happens regardless of the device-model being
>> present or not, because even if the bitmap holds 0s, the overflowed
>> memory might not at the bits corresponding to the "online" flag.
>>
>> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>>
>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
>> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com>
>> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
>> ---
>> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested
>> there.
>>
>> v2:
>> * New patch with the general fix for HVM too. Turns out the correction
>> logic was buggy after all.
>
> Hmm, this does sound rather more serious. I have a nagging feeling that
> until recently we always wrote 128 MADT entries.
If so, I don't see where. It used to be 16, waaaaaaaaaaaaaaaaaaaaay back when.
Then it got extended to whatever it needed to be.
I have the nagging feeling that rather opaque "some OSs (cough Windows cough)
don't like more than 16 CPUs was actually this bug in action. Making the DSDTs
with exactly 16 CPUs a particular kind of silly.
>
> So, while this looks like a good fix, I think we might want a second
> Fixes tag.
Happy to add it, but I really don't see anything like that in the git log.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 12:35 ` Alejandro Vallejo
@ 2025-09-11 13:29 ` Andrew Cooper
0 siblings, 0 replies; 15+ messages in thread
From: Andrew Cooper @ 2025-09-11 13:29 UTC (permalink / raw)
To: Alejandro Vallejo, xen-devel
Cc: Jan Beulich, Anthony PERARD, Grygorii Strashko
On 11/09/2025 1:35 pm, Alejandro Vallejo wrote:
> On Thu Sep 11, 2025 at 2:03 PM CEST, Andrew Cooper wrote:
>> On 11/09/2025 12:53 pm, Alejandro Vallejo wrote:
>>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
>>> by the device model. The GPE handler checks this and compares it against
>>> the "online" flag on each MADT LAPIC entry, setting the flag to its
>>> related bit in the bitmap and adjusting the table's checksum.
>>>
>>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
>>> reaches 128, even if that overflows the MADT into some other (hopefully
>>> mapped) memory. The reading isn't as problematic as the writing though.
>>>
>>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
>>> then the bit where the "online" flag would be is flipped, thus
>>> corrupting that memory. And the MADT checksum gets adjusted for a flip
>>> that happened outside its range. It's all terrible.
>>>
>>> Note that this corruption happens regardless of the device-model being
>>> present or not, because even if the bitmap holds 0s, the overflowed
>>> memory might not at the bits corresponding to the "online" flag.
>>>
>>> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>>>
>>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
>>> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com>
>>> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
>>> ---
>>> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested
>>> there.
>>>
>>> v2:
>>> * New patch with the general fix for HVM too. Turns out the correction
>>> logic was buggy after all.
>> Hmm, this does sound rather more serious. I have a nagging feeling that
>> until recently we always wrote 128 MADT entries.
> If so, I don't see where. It used to be 16, waaaaaaaaaaaaaaaaaaaaay back when.
> Then it got extended to whatever it needed to be.
>
> I have the nagging feeling that rather opaque "some OSs (cough Windows cough)
> don't like more than 16 CPUs was actually this bug in action. Making the DSDTs
> with exactly 16 CPUs a particular kind of silly.
>
>> So, while this looks like a good fix, I think we might want a second
>> Fixes tag.
> Happy to add it, but I really don't see anything like that in the git log.
I can't find it either. Maybe my memory is getting faulty.
~Andrew
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 11:53 ` [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory Alejandro Vallejo
2025-09-11 12:03 ` Andrew Cooper
@ 2025-09-11 14:52 ` Jan Beulich
2025-09-11 15:32 ` Alejandro Vallejo
1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2025-09-11 14:52 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Anthony PERARD, Grygorii Strashko, xen-devel
On 11.09.2025 13:53, Alejandro Vallejo wrote:
> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
> by the device model. The GPE handler checks this and compares it against
> the "online" flag on each MADT LAPIC entry, setting the flag to its
> related bit in the bitmap and adjusting the table's checksum.
>
> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
> reaches 128, even if that overflows the MADT into some other (hopefully
> mapped) memory. The reading isn't as problematic as the writing though.
>
> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
> then the bit where the "online" flag would be is flipped, thus
> corrupting that memory. And the MADT checksum gets adjusted for a flip
> that happened outside its range. It's all terrible.
>
> Note that this corruption happens regardless of the device-model being
> present or not, because even if the bitmap holds 0s, the overflowed
> memory might not at the bits corresponding to the "online" flag.
>
> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>
> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
The code in question originates from e5dc62c4d4f1 ("hvmloader: Fix CPU
hotplug notify handler in ACPI DSDT"), though. Before that there was a
different issue (as mentioned in the description).
> --- a/tools/libacpi/mk_dsdt.c
> +++ b/tools/libacpi/mk_dsdt.c
> @@ -239,7 +239,8 @@ int main(int argc, char **argv)
> /* Extract current CPU's status: 0=offline; 1=online. */
> stmt("And", "Local1, 1, Local2");
> /* Check if status is up-to-date in the relevant MADT LAPIC entry... */
> - push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
> + push_block("If", "And(LLess(%d, NCPU), LNotEqual(Local2, \\_SB.PR%02X.FLG))",
> + cpu, cpu);
Don't we need to use \\_SB.NCPU here? From the other two uses it's not
quite clear; it might also be that the one using this form is actually
needlessly doing so. Yet here it may be better if only for consistency's
sake, as the LNotEqual() also operates on an absolute reference.
The other thing is that I'm not fluent in AML operand evaluation rules.
We want to avoid even the read access to FLG, and I'm unconvinced And()
will avoid evaluating its 2nd argument when the first one is 0. IOW this
may need to become a 2nd "If".
I further think that strictly speaking you mean LAnd() here, not And()
(but the above concern remains; all the ASL spec says is "Source1 and
Source2 are evaluated as integers" for both And() and LAnd()).
Jan
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 12:03 ` Andrew Cooper
2025-09-11 12:35 ` Alejandro Vallejo
@ 2025-09-11 15:04 ` Jan Beulich
2025-09-11 15:16 ` Alejandro Vallejo
1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2025-09-11 15:04 UTC (permalink / raw)
To: Andrew Cooper, Alejandro Vallejo
Cc: Anthony PERARD, Grygorii Strashko, xen-devel
On 11.09.2025 14:03, Andrew Cooper wrote:
> On 11/09/2025 12:53 pm, Alejandro Vallejo wrote:
>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
>> by the device model. The GPE handler checks this and compares it against
>> the "online" flag on each MADT LAPIC entry, setting the flag to its
>> related bit in the bitmap and adjusting the table's checksum.
>>
>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
>> reaches 128, even if that overflows the MADT into some other (hopefully
>> mapped) memory. The reading isn't as problematic as the writing though.
>>
>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
>> then the bit where the "online" flag would be is flipped, thus
>> corrupting that memory. And the MADT checksum gets adjusted for a flip
>> that happened outside its range. It's all terrible.
>>
>> Note that this corruption happens regardless of the device-model being
>> present or not, because even if the bitmap holds 0s, the overflowed
>> memory might not at the bits corresponding to the "online" flag.
>>
>> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>>
>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
>> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com>
>> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
>> ---
>> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested
>> there.
>>
>> v2:
>> * New patch with the general fix for HVM too. Turns out the correction
>> logic was buggy after all.
>
> Hmm, this does sound rather more serious. I have a nagging feeling that
> until recently we always wrote 128 MADT entries.
Not exactly recently, but looks like that's my fault then: 0875433389240
("hvmloader: limit CPUs exposed to guests").
Jan
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs
2025-09-11 11:53 ` [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs Alejandro Vallejo
@ 2025-09-11 15:07 ` Jan Beulich
2025-09-11 15:36 ` Alejandro Vallejo
0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2025-09-11 15:07 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Anthony PERARD, xen-devel
On 11.09.2025 13:53, Alejandro Vallejo wrote:
> PVH guests have no DM, so this causes the guest to fetch the online CPU
> bitmap from an unbacked 0xaf00 PIO port when executing the GPE handler.
>
> Seeing how ACPI CPU hotplug is the only event delivered via GPE, remove
> the GPE handler in addition to anything ACPI CPU hotplug related.
>
> This shrinks PVH's DSDT substantially and prevents spuriously executing
> a large amount of AML with no purpose at all.
>
> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2:
> * Adjusted commit message
> * All other tags except S-by moved to patch 1.
This will want backporting; I expect finding a suitable commit for a Fixes:
tag is somewhat difficult.
Jan
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 15:04 ` Jan Beulich
@ 2025-09-11 15:16 ` Alejandro Vallejo
0 siblings, 0 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2025-09-11 15:16 UTC (permalink / raw)
To: Jan Beulich, Andrew Cooper; +Cc: Anthony PERARD, Grygorii Strashko, xen-devel
On Thu Sep 11, 2025 at 5:04 PM CEST, Jan Beulich wrote:
> On 11.09.2025 14:03, Andrew Cooper wrote:
>> On 11/09/2025 12:53 pm, Alejandro Vallejo wrote:
>>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
>>> by the device model. The GPE handler checks this and compares it against
>>> the "online" flag on each MADT LAPIC entry, setting the flag to its
>>> related bit in the bitmap and adjusting the table's checksum.
>>>
>>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
>>> reaches 128, even if that overflows the MADT into some other (hopefully
>>> mapped) memory. The reading isn't as problematic as the writing though.
>>>
>>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
>>> then the bit where the "online" flag would be is flipped, thus
>>> corrupting that memory. And the MADT checksum gets adjusted for a flip
>>> that happened outside its range. It's all terrible.
>>>
>>> Note that this corruption happens regardless of the device-model being
>>> present or not, because even if the bitmap holds 0s, the overflowed
>>> memory might not at the bits corresponding to the "online" flag.
>>>
>>> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>>>
>>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
>>> Reported-by: Grygorii Strashko <grygorii_strashko@epam.com>
>>> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
>>> ---
>>> Half RFC. Not thoroughly untested. Pipeline is green, but none of this is tested
>>> there.
>>>
>>> v2:
>>> * New patch with the general fix for HVM too. Turns out the correction
>>> logic was buggy after all.
>>
>> Hmm, this does sound rather more serious. I have a nagging feeling that
>> until recently we always wrote 128 MADT entries.
>
> Not exactly recently, but looks like that's my fault then: 0875433389240
> ("hvmloader: limit CPUs exposed to guests").
>
> Jan
Very right. I got to that commit, but thought nr_processor_objects would match
NCPUS. Wrong assumption.
That sorts out wich fixes tag to attribute this to.
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 14:52 ` Jan Beulich
@ 2025-09-11 15:32 ` Alejandro Vallejo
2025-09-11 15:42 ` Jan Beulich
0 siblings, 1 reply; 15+ messages in thread
From: Alejandro Vallejo @ 2025-09-11 15:32 UTC (permalink / raw)
To: Jan Beulich; +Cc: Anthony PERARD, Grygorii Strashko, xen-devel
On Thu Sep 11, 2025 at 4:52 PM CEST, Jan Beulich wrote:
> On 11.09.2025 13:53, Alejandro Vallejo wrote:
>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
>> by the device model. The GPE handler checks this and compares it against
>> the "online" flag on each MADT LAPIC entry, setting the flag to its
>> related bit in the bitmap and adjusting the table's checksum.
>>
>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
>> reaches 128, even if that overflows the MADT into some other (hopefully
>> mapped) memory. The reading isn't as problematic as the writing though.
>>
>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
>> then the bit where the "online" flag would be is flipped, thus
>> corrupting that memory. And the MADT checksum gets adjusted for a flip
>> that happened outside its range. It's all terrible.
>>
>> Note that this corruption happens regardless of the device-model being
>> present or not, because even if the bitmap holds 0s, the overflowed
>> memory might not at the bits corresponding to the "online" flag.
>>
>> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>>
>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
>
> The code in question originates from e5dc62c4d4f1 ("hvmloader: Fix CPU
> hotplug notify handler in ACPI DSDT"), though. Before that there was a
> different issue (as mentioned in the description).
As you mentioned elsewhere, it probably is 087543338924("hvmloader: limit CPUs
exposed to guests") that matters. Until then the DSDT was correct.
>
>> --- a/tools/libacpi/mk_dsdt.c
>> +++ b/tools/libacpi/mk_dsdt.c
>> @@ -239,7 +239,8 @@ int main(int argc, char **argv)
>> /* Extract current CPU's status: 0=offline; 1=online. */
>> stmt("And", "Local1, 1, Local2");
>> /* Check if status is up-to-date in the relevant MADT LAPIC entry... */
>> - push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
>> + push_block("If", "And(LLess(%d, NCPU), LNotEqual(Local2, \\_SB.PR%02X.FLG))",
>> + cpu, cpu);
>
> Don't we need to use \\_SB.NCPU here? From the other two uses it's not
> quite clear; it might also be that the one using this form is actually
> needlessly doing so. Yet here it may be better if only for consistency's
> sake, as the LNotEqual() also operates on an absolute reference.
\SB.PMAT method does the same thing. I'll just change that too while at it.
> The other thing is that I'm not fluent in AML operand evaluation rules.
> We want to avoid even the read access to FLG, and I'm unconvinced And()
> will avoid evaluating its 2nd argument when the first one is 0. IOW this
> may need to become a 2nd "If".
I don't think there are any rules, it's unspecified. While in practice it
wouldn't matter a lot, it's indeed better not to rely on it not blowing up.
After sending this, I wondered about having a separate if with an early return.
>
> I further think that strictly speaking you mean LAnd() here, not And()
> (but the above concern remains; all the ASL spec says is "Source1 and
> Source2 are evaluated as integers" for both And() and LAnd()).
I very definitely did mean LAnd! Nice catch. As for
>
> Jan
TL;DR: Will s/And/LAnd/ and move it to a separate If
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs
2025-09-11 15:07 ` Jan Beulich
@ 2025-09-11 15:36 ` Alejandro Vallejo
2025-09-11 15:43 ` Jan Beulich
0 siblings, 1 reply; 15+ messages in thread
From: Alejandro Vallejo @ 2025-09-11 15:36 UTC (permalink / raw)
To: Jan Beulich; +Cc: Anthony PERARD, xen-devel
On Thu Sep 11, 2025 at 5:07 PM CEST, Jan Beulich wrote:
> On 11.09.2025 13:53, Alejandro Vallejo wrote:
>> PVH guests have no DM, so this causes the guest to fetch the online CPU
>> bitmap from an unbacked 0xaf00 PIO port when executing the GPE handler.
>>
>> Seeing how ACPI CPU hotplug is the only event delivered via GPE, remove
>> the GPE handler in addition to anything ACPI CPU hotplug related.
>>
>> This shrinks PVH's DSDT substantially and prevents spuriously executing
>> a large amount of AML with no purpose at all.
>>
>> Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Thanks
>
>> ---
>> v2:
>> * Adjusted commit message
>> * All other tags except S-by moved to patch 1.
>
> This will want backporting; I expect finding a suitable commit for a Fixes:
> tag is somewhat difficult.
>
> Jan
I'd say it's the patch that needlessly enabled GPE handling on PVH.
Fixes: 062975dc9441("acpi: PVH guests need _E02 method")
Cheers,
Alejandro
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory
2025-09-11 15:32 ` Alejandro Vallejo
@ 2025-09-11 15:42 ` Jan Beulich
0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2025-09-11 15:42 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Anthony PERARD, Grygorii Strashko, xen-devel
On 11.09.2025 17:32, Alejandro Vallejo wrote:
> On Thu Sep 11, 2025 at 4:52 PM CEST, Jan Beulich wrote:
>> On 11.09.2025 13:53, Alejandro Vallejo wrote:
>>> CPU hotplug relies on the online CPU bitmap being provided on PIO 0xaf00
>>> by the device model. The GPE handler checks this and compares it against
>>> the "online" flag on each MADT LAPIC entry, setting the flag to its
>>> related bit in the bitmap and adjusting the table's checksum.
>>>
>>> The bytecode doesn't, however, stop at NCPUS. It keeps comparing until it
>>> reaches 128, even if that overflows the MADT into some other (hopefully
>>> mapped) memory. The reading isn't as problematic as the writing though.
>>>
>>> If an "entry" outside the MADT is deemed to disagree with the CPU bitmap
>>> then the bit where the "online" flag would be is flipped, thus
>>> corrupting that memory. And the MADT checksum gets adjusted for a flip
>>> that happened outside its range. It's all terrible.
>>>
>>> Note that this corruption happens regardless of the device-model being
>>> present or not, because even if the bitmap holds 0s, the overflowed
>>> memory might not at the bits corresponding to the "online" flag.
>>>
>>> This patch adjusts the DSDT so entries >=NCPUS are skipped.
>>>
>>> Fixes: c70ad37a1f7c("HVM vcpu add/remove: setup dsdt infrastructure...")
>>
>> The code in question originates from e5dc62c4d4f1 ("hvmloader: Fix CPU
>> hotplug notify handler in ACPI DSDT"), though. Before that there was a
>> different issue (as mentioned in the description).
>
> As you mentioned elsewhere, it probably is 087543338924("hvmloader: limit CPUs
> exposed to guests") that matters. Until then the DSDT was correct.
>
>>
>>> --- a/tools/libacpi/mk_dsdt.c
>>> +++ b/tools/libacpi/mk_dsdt.c
>>> @@ -239,7 +239,8 @@ int main(int argc, char **argv)
>>> /* Extract current CPU's status: 0=offline; 1=online. */
>>> stmt("And", "Local1, 1, Local2");
>>> /* Check if status is up-to-date in the relevant MADT LAPIC entry... */
>>> - push_block("If", "LNotEqual(Local2, \\_SB.PR%02X.FLG)", cpu);
>>> + push_block("If", "And(LLess(%d, NCPU), LNotEqual(Local2, \\_SB.PR%02X.FLG))",
>>> + cpu, cpu);
>>
>> Don't we need to use \\_SB.NCPU here? From the other two uses it's not
>> quite clear; it might also be that the one using this form is actually
>> needlessly doing so. Yet here it may be better if only for consistency's
>> sake, as the LNotEqual() also operates on an absolute reference.
>
> \SB.PMAT method does the same thing. I'll just change that too while at it.
>
>> The other thing is that I'm not fluent in AML operand evaluation rules.
>> We want to avoid even the read access to FLG, and I'm unconvinced And()
>> will avoid evaluating its 2nd argument when the first one is 0. IOW this
>> may need to become a 2nd "If".
>
> I don't think there are any rules, it's unspecified. While in practice it
> wouldn't matter a lot, it's indeed better not to rely on it not blowing up.
>
> After sending this, I wondered about having a separate if with an early return.
>
>>
>> I further think that strictly speaking you mean LAnd() here, not And()
>> (but the above concern remains; all the ASL spec says is "Source1 and
>> Source2 are evaluated as integers" for both And() and LAnd()).
>
> I very definitely did mean LAnd! Nice catch. As for
>
>>
>> Jan
>
> TL;DR: Will s/And/LAnd/ and move it to a separate If
Except that once you use a separate If, no And() or LAnd() will be needed
anymore.
Jan
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs
2025-09-11 15:36 ` Alejandro Vallejo
@ 2025-09-11 15:43 ` Jan Beulich
0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2025-09-11 15:43 UTC (permalink / raw)
To: Alejandro Vallejo; +Cc: Anthony PERARD, xen-devel
On 11.09.2025 17:36, Alejandro Vallejo wrote:
> On Thu Sep 11, 2025 at 5:07 PM CEST, Jan Beulich wrote:
>> On 11.09.2025 13:53, Alejandro Vallejo wrote:
>>> ---
>>> v2:
>>> * Adjusted commit message
>>> * All other tags except S-by moved to patch 1.
>>
>> This will want backporting; I expect finding a suitable commit for a Fixes:
>> tag is somewhat difficult.
>
> I'd say it's the patch that needlessly enabled GPE handling on PVH.
>
> Fixes: 062975dc9441("acpi: PVH guests need _E02 method")
Oh, right.
Jan
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs
2025-09-11 16:23 [PATCH v2 0/2] libacpi: Fix memory corruption on ACPI CPU hotplug Alejandro Vallejo
@ 2025-09-11 16:23 ` Alejandro Vallejo
0 siblings, 0 replies; 15+ messages in thread
From: Alejandro Vallejo @ 2025-09-11 16:23 UTC (permalink / raw)
To: xen-devel
Cc: Alejandro Vallejo, Jan Beulich, Anthony PERARD, Grygorii Strashko
PVH guests have no DM, so this causes the guest to fetch the online CPU
bitmap from an unbacked 0xaf00 PIO port when executing the GPE handler.
Seeing how ACPI CPU hotplug is the only event delivered via GPE, remove
the GPE handler in addition to anything ACPI CPU hotplug related.
This shrinks PVH's DSDT substantially and prevents spuriously executing
a large amount of AML with no purpose at all.
Fixes: 062975dc9441("acpi: PVH guests need _E02 method")
Signed-off-by: Alejandro Vallejo <alejandro.garciavallejo@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v2:
* Added Fixes and R-by tags
---
tools/libacpi/mk_dsdt.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
index aeeb71dfe6..433eb4f8d9 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -218,6 +218,11 @@ int main(int argc, char **argv)
pop_block();
/**** Processor end ****/
#else
+ if (dm_version == QEMU_NONE) {
+ pop_block();
+ pop_block();
+ return 0;
+ }
/* Operation Region 'PRST': bitmask of online CPUs. */
stmt("OperationRegion", "PRST, SystemIO, %#x, %d",
@@ -278,10 +283,6 @@ int main(int argc, char **argv)
pop_block();
pop_block();
- if (dm_version == QEMU_NONE) {
- pop_block();
- return 0;
- }
/**** Processor end ****/
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-09-11 16:24 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-11 11:53 [PATCH 0/2] libacpi: Fix memory corruption on ACPI CPU hotplug Alejandro Vallejo
2025-09-11 11:53 ` [PATCH v2 1/2] libacpi: Prevent CPU hotplug AML from corrupting memory Alejandro Vallejo
2025-09-11 12:03 ` Andrew Cooper
2025-09-11 12:35 ` Alejandro Vallejo
2025-09-11 13:29 ` Andrew Cooper
2025-09-11 15:04 ` Jan Beulich
2025-09-11 15:16 ` Alejandro Vallejo
2025-09-11 14:52 ` Jan Beulich
2025-09-11 15:32 ` Alejandro Vallejo
2025-09-11 15:42 ` Jan Beulich
2025-09-11 11:53 ` [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs Alejandro Vallejo
2025-09-11 15:07 ` Jan Beulich
2025-09-11 15:36 ` Alejandro Vallejo
2025-09-11 15:43 ` Jan Beulich
-- strict thread matches above, loose matches on Subject: below --
2025-09-11 16:23 [PATCH v2 0/2] libacpi: Fix memory corruption on ACPI CPU hotplug Alejandro Vallejo
2025-09-11 16:23 ` [PATCH v2 2/2] libacpi: Remove CPU hotplug and GPE handling from PVH DSDTs Alejandro Vallejo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.