* [PATCH 1/5] x86/ucode: Fix missing printk() newline in ucode_probe_amd()
2025-10-20 13:19 [PATCH for-4.21? 0/5] x86/ucode: Support loading latest ucode from linux-firwmare Andrew Cooper
@ 2025-10-20 13:19 ` Andrew Cooper
2025-10-20 13:37 ` Jan Beulich
2025-10-20 13:19 ` [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error Andrew Cooper
` (3 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2025-10-20 13:19 UTC (permalink / raw)
To: Xen-devel
Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné,
Oleksii Kurochko
Fixes: 630e8875ab36 ("x86/ucode: Perform extra SHA2 checks on AMD Fam17h/19h microcode")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Oleksii Kurochko <oleksii.kurochko@gmail.com>
For 4.21. This is a formatting fix with basically 0 risk.
It is encouraging that no-one has reported this bug so far, because it
suggests that no-one has turned off digest checking and then looked at dmesg.
---
xen/arch/x86/cpu/microcode/amd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index a5729229a403..59332da2b827 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -519,7 +519,7 @@ void __init ucode_probe_amd(struct microcode_ops *ops)
if ( !opt_digest_check && boot_cpu_data.family >= 0x17 )
{
printk(XENLOG_WARNING
- "Microcode patch additional digest checks disabled");
+ "Microcode patch additional digest checks disabled\n");
add_taint(TAINT_CPU_OUT_OF_SPEC);
}
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 1/5] x86/ucode: Fix missing printk() newline in ucode_probe_amd()
2025-10-20 13:19 ` [PATCH 1/5] x86/ucode: Fix missing printk() newline in ucode_probe_amd() Andrew Cooper
@ 2025-10-20 13:37 ` Jan Beulich
2025-10-20 15:02 ` Andrew Cooper
0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2025-10-20 13:37 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monné, Oleksii Kurochko, Xen-devel
On 20.10.2025 15:19, Andrew Cooper wrote:
> Fixes: 630e8875ab36 ("x86/ucode: Perform extra SHA2 checks on AMD Fam17h/19h microcode")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Looks like you got an ack and r-ack for this already, but you have lost them?
Jan
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 1/5] x86/ucode: Fix missing printk() newline in ucode_probe_amd()
2025-10-20 13:37 ` Jan Beulich
@ 2025-10-20 15:02 ` Andrew Cooper
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2025-10-20 15:02 UTC (permalink / raw)
To: Jan Beulich; +Cc: Roger Pau Monné, Oleksii Kurochko, Xen-devel
On 20/10/2025 2:37 pm, Jan Beulich wrote:
> On 20.10.2025 15:19, Andrew Cooper wrote:
>> Fixes: 630e8875ab36 ("x86/ucode: Perform extra SHA2 checks on AMD Fam17h/19h microcode")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Looks like you got an ack and r-ack for this already, but you have lost them?
Bah, forgot to include them here. I did note this in the cover letter.
~Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error
2025-10-20 13:19 [PATCH for-4.21? 0/5] x86/ucode: Support loading latest ucode from linux-firwmare Andrew Cooper
2025-10-20 13:19 ` [PATCH 1/5] x86/ucode: Fix missing printk() newline in ucode_probe_amd() Andrew Cooper
@ 2025-10-20 13:19 ` Andrew Cooper
2025-10-20 15:55 ` Jan Beulich
2025-10-20 13:19 ` [PATCH 3/5] x86/ucode: Refine TLB flush fix for AMD Fam17h CPUs Andrew Cooper
` (2 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2025-10-20 13:19 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné
EIO is not the only error that ucode_ops.apply_microcode() can produce.
EINVAL, EEXISTS and ENXIO can be generated too, each of which mean that Xen is
unhappy in some way with the proposed blob.
Some of these can be bypassed with --force, which will cause the parallel load
to be attempted.
Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
xen/arch/x86/cpu/microcode/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index 1b093bc98a58..2705bb43c97f 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -392,10 +392,10 @@ static int control_thread_fn(const struct microcode_patch *patch,
atomic_inc(&cpu_updated);
atomic_inc(&cpu_out);
- if ( ret == -EIO )
+ if ( ret )
{
printk(XENLOG_ERR
- "Late loading aborted: CPU%u failed to update ucode\n", cpu);
+ "Late loading aborted: CPU%u failed to update ucode: %d\n", cpu, ret);
goto out;
}
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error
2025-10-20 13:19 ` [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error Andrew Cooper
@ 2025-10-20 15:55 ` Jan Beulich
2025-10-22 19:28 ` Andrew Cooper
0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2025-10-20 15:55 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel
On 20.10.2025 15:19, Andrew Cooper wrote:
> EIO is not the only error that ucode_ops.apply_microcode() can produce.
> EINVAL, EEXISTS and ENXIO can be generated too, each of which mean that Xen is
> unhappy in some way with the proposed blob.
Yes, yet wasn't that the case already when the EIO check was added? Were we
perhaps trying to deal with a certain level of asymmetry in the system? I
think a little more is needed here, also to ...
> Some of these can be bypassed with --force, which will cause the parallel load
> to be attempted.
>
> Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")
... justify there being a Fixes: tag. It would also seem possible that the
check was intentional and correct at the time of introduction, but was
rendered stale by some later change.
Jan
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error
2025-10-20 15:55 ` Jan Beulich
@ 2025-10-22 19:28 ` Andrew Cooper
2025-10-23 6:24 ` Jan Beulich
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2025-10-22 19:28 UTC (permalink / raw)
To: Jan Beulich; +Cc: Roger Pau Monné, Xen-devel
On 20/10/2025 4:55 pm, Jan Beulich wrote:
> On 20.10.2025 15:19, Andrew Cooper wrote:
>> EIO is not the only error that ucode_ops.apply_microcode() can produce.
>> EINVAL, EEXISTS and ENXIO can be generated too, each of which mean that Xen is
>> unhappy in some way with the proposed blob.
> Yes, yet wasn't that the case already when the EIO check was added? Were we
> perhaps trying to deal with a certain level of asymmetry in the system? I
> think a little more is needed here, also to ...
>
>> Some of these can be bypassed with --force, which will cause the parallel load
>> to be attempted.
>>
>> Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")
> ... justify there being a Fixes: tag. It would also seem possible that the
> check was intentional and correct at the time of introduction, but was
> rendered stale by some later change.
The parallel load logic more bugs than lines of code. What hasn't
already been rewritten either has pending patches, or pending bugs
needing fixing.
I didn't care to check why it was limited to EIO at the time. It's
definitely wrong.
But if you insist that I waste time doing so, at the time EIO was
introduced, both apply_microcode()'s could fail with -ENOENT for a NULL
pointer, -EINVAL for "patch isn't for this CPU".
I.e. it was definitely wrong at the time too.
~Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error
2025-10-22 19:28 ` Andrew Cooper
@ 2025-10-23 6:24 ` Jan Beulich
2025-10-27 22:46 ` Andrew Cooper
0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2025-10-23 6:24 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel
On 22.10.2025 21:28, Andrew Cooper wrote:
> On 20/10/2025 4:55 pm, Jan Beulich wrote:
>> On 20.10.2025 15:19, Andrew Cooper wrote:
>>> EIO is not the only error that ucode_ops.apply_microcode() can produce.
>>> EINVAL, EEXISTS and ENXIO can be generated too, each of which mean that Xen is
>>> unhappy in some way with the proposed blob.
>> Yes, yet wasn't that the case already when the EIO check was added? Were we
>> perhaps trying to deal with a certain level of asymmetry in the system? I
>> think a little more is needed here, also to ...
>>
>>> Some of these can be bypassed with --force, which will cause the parallel load
>>> to be attempted.
>>>
>>> Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")
>> ... justify there being a Fixes: tag. It would also seem possible that the
>> check was intentional and correct at the time of introduction, but was
>> rendered stale by some later change.
>
> The parallel load logic more bugs than lines of code. What hasn't
> already been rewritten either has pending patches, or pending bugs
> needing fixing.
>
> I didn't care to check why it was limited to EIO at the time. It's
> definitely wrong.
>
> But if you insist that I waste time doing so, at the time EIO was
> introduced, both apply_microcode()'s could fail with -ENOENT for a NULL
> pointer, -EINVAL for "patch isn't for this CPU".
The latter fits my "trying to deal with a certain level of asymmetry" guess,
doesn't it?
And btw, why are you being so negative again? "Waste time" is a pretty clear
sign of you (once again) thinking that your view of the world is the only
possibly sensible one.
Nevertheless, considering that asymmetry is not something we really mean to
care about:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
> I.e. it was definitely wrong at the time too.
>
> ~Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error
2025-10-23 6:24 ` Jan Beulich
@ 2025-10-27 22:46 ` Andrew Cooper
2025-10-28 8:07 ` Jan Beulich
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2025-10-27 22:46 UTC (permalink / raw)
To: Jan Beulich; +Cc: Roger Pau Monné, Xen-devel
On 23/10/2025 7:24 am, Jan Beulich wrote:
> On 22.10.2025 21:28, Andrew Cooper wrote:
>> On 20/10/2025 4:55 pm, Jan Beulich wrote:
>>> On 20.10.2025 15:19, Andrew Cooper wrote:
>>>> EIO is not the only error that ucode_ops.apply_microcode() can produce.
>>>> EINVAL, EEXISTS and ENXIO can be generated too, each of which mean that Xen is
>>>> unhappy in some way with the proposed blob.
>>> Yes, yet wasn't that the case already when the EIO check was added? Were we
>>> perhaps trying to deal with a certain level of asymmetry in the system? I
>>> think a little more is needed here, also to ...
>>>
>>>> Some of these can be bypassed with --force, which will cause the parallel load
>>>> to be attempted.
>>>>
>>>> Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")
>>> ... justify there being a Fixes: tag. It would also seem possible that the
>>> check was intentional and correct at the time of introduction, but was
>>> rendered stale by some later change.
>> The parallel load logic more bugs than lines of code. What hasn't
>> already been rewritten either has pending patches, or pending bugs
>> needing fixing.
>>
>> I didn't care to check why it was limited to EIO at the time. It's
>> definitely wrong.
>>
>> But if you insist that I waste time doing so, at the time EIO was
>> introduced, both apply_microcode()'s could fail with -ENOENT for a NULL
>> pointer, -EINVAL for "patch isn't for this CPU".
> The latter fits my "trying to deal with a certain level of asymmetry" guess,
> doesn't it?
If you mean CPU asymmetry, I'm going to argue this as a bugfix. Some
Intel CPUs are known not to check the stepping, accept the wrong
microcode, and crash. I have more patches to at least make this case
very obvious, but I need to get through some of the existing queue first.
If you mean revision asymmetry, nothing has really changed here. The
parallel load is only started if a blob newer than the cache is found.
If --force is used to override this check and load anyway, you also
won't get -EEXISTs out of apply_microcode().
> And btw, why are you being so negative again? "Waste time" is a pretty clear
> sign of you (once again) thinking that your view of the world is the only
> possibly sensible one.
I have rewritten most of microcode loading from scratch. What hasn't
yet been rewritten is pending, with serious errors already identified
on-list and more still that haven't made it into public.
I would be further through if it had not taken an unreasonable amount of
effort to make the changes so far. You refused my module changes
despite the blatant issues in the existing code, forcing me to
manoeuvrer them in via the boot_info changes (and in so doing discover
that module handling in general was even more broken than originally
realised).
The current pending series is in part stuck because I haven't had the
energy to tell you to stop trying to scope creep the work.
So yes, I was irritated at being asked to justify not breaking a thing
which has been thoroughly demonstrated to be broken.
>
> Nevertheless, considering that asymmetry is not something we really mean to
> care about:
>
> Acked-by: Jan Beulich <jbeulich@suse.com>
Thanks.
~Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error
2025-10-27 22:46 ` Andrew Cooper
@ 2025-10-28 8:07 ` Jan Beulich
0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2025-10-28 8:07 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel
On 27.10.2025 23:46, Andrew Cooper wrote:
> On 23/10/2025 7:24 am, Jan Beulich wrote:
>> On 22.10.2025 21:28, Andrew Cooper wrote:
>>> On 20/10/2025 4:55 pm, Jan Beulich wrote:
>>>> On 20.10.2025 15:19, Andrew Cooper wrote:
>>>>> EIO is not the only error that ucode_ops.apply_microcode() can produce.
>>>>> EINVAL, EEXISTS and ENXIO can be generated too, each of which mean that Xen is
>>>>> unhappy in some way with the proposed blob.
>>>> Yes, yet wasn't that the case already when the EIO check was added? Were we
>>>> perhaps trying to deal with a certain level of asymmetry in the system? I
>>>> think a little more is needed here, also to ...
>>>>
>>>>> Some of these can be bypassed with --force, which will cause the parallel load
>>>>> to be attempted.
>>>>>
>>>>> Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")
>>>> ... justify there being a Fixes: tag. It would also seem possible that the
>>>> check was intentional and correct at the time of introduction, but was
>>>> rendered stale by some later change.
>>> The parallel load logic more bugs than lines of code. What hasn't
>>> already been rewritten either has pending patches, or pending bugs
>>> needing fixing.
>>>
>>> I didn't care to check why it was limited to EIO at the time. It's
>>> definitely wrong.
>>>
>>> But if you insist that I waste time doing so, at the time EIO was
>>> introduced, both apply_microcode()'s could fail with -ENOENT for a NULL
>>> pointer, -EINVAL for "patch isn't for this CPU".
>> The latter fits my "trying to deal with a certain level of asymmetry" guess,
>> doesn't it?
>
> If you mean CPU asymmetry, I'm going to argue this as a bugfix. Some
> Intel CPUs are known not to check the stepping, accept the wrong
> microcode, and crash. I have more patches to at least make this case
> very obvious, but I need to get through some of the existing queue first.
>
> If you mean revision asymmetry, nothing has really changed here. The
> parallel load is only started if a blob newer than the cache is found.
> If --force is used to override this check and load anyway, you also
> won't get -EEXISTs out of apply_microcode().
>
>> And btw, why are you being so negative again? "Waste time" is a pretty clear
>> sign of you (once again) thinking that your view of the world is the only
>> possibly sensible one.
>
> I have rewritten most of microcode loading from scratch. What hasn't
> yet been rewritten is pending, with serious errors already identified
> on-list and more still that haven't made it into public.
>
> I would be further through if it had not taken an unreasonable amount of
> effort to make the changes so far. You refused my module changes
> despite the blatant issues in the existing code, forcing me to
> manoeuvrer them in via the boot_info changes (and in so doing discover
> that module handling in general was even more broken than originally
> realised).
>
> The current pending series is in part stuck because I haven't had the
> energy to tell you to stop trying to scope creep the work.
>
> So yes, I was irritated at being asked to justify not breaking a thing
> which has been thoroughly demonstrated to be broken.
Hmm. You may have memorized all the details. I haven't. Instead I need
pointers to where aspects were discussed that matter here. It still
feels odd to me that my (implicit) request to supply such is deemed a
waste of time.
As to me refusing changes (here or elsewhere): In case it hadn't become
obvious to you, no matter whether it's your or my (or actually also
other people's) changes - when things get stuck, they almost always get
stuck at your end. You simply don't get back. When it's your changes -
I can be convinced. But if I raise questions or even objections, that
means I also _want_ to be convinced (rather than silently giving in).
That requires taking the time to respond accordingly, yes. When it's my
patches, I do take the time (often in vein, as a response then never
appears). When it's your patches, you will want to accept that you need
to take the time then as well, in order to make your work make progress.
Simply coming back months or years later claiming I (or somebody else)
blocked some of your work is at best a very subjective view of the world,
imo at least.
Jan
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 3/5] x86/ucode: Refine TLB flush fix for AMD Fam17h CPUs
2025-10-20 13:19 [PATCH for-4.21? 0/5] x86/ucode: Support loading latest ucode from linux-firwmare Andrew Cooper
2025-10-20 13:19 ` [PATCH 1/5] x86/ucode: Fix missing printk() newline in ucode_probe_amd() Andrew Cooper
2025-10-20 13:19 ` [PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error Andrew Cooper
@ 2025-10-20 13:19 ` Andrew Cooper
2025-10-21 8:34 ` Jan Beulich
2025-10-20 13:19 ` [PATCH 4/5] x86/ucode: Cross check the minimum revision Andrew Cooper
2025-10-20 13:19 ` [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware Andrew Cooper
4 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2025-10-20 13:19 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné
In the time since Xen discovered this, Linux stubled on it too and AMD
produced a narrower fix, limited to Fam17h CPUs only. To my knowledge,
there's no erratum or other public statement from AMD on the matter.
Adjust Xen to match the narrower fix.
Link: https://lore.kernel.org/lkml/ZyulbYuvrkshfsd2@antipodes/T/#u
Fixes: f19a199281a2 ("x86/AMD: flush TLB after ucode update")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
There is a difference in memory clobber with the invlpg() wrapper.
apply_microcode() specifically does not want a memory clobber, whereas
flush_area_local() doesn't need it as far as I can tell (there's nothing
unsafe to move across this instruction).
---
xen/arch/x86/cpu/microcode/amd.c | 14 +++++++++++---
xen/arch/x86/flushtlb.c | 3 +--
xen/arch/x86/include/asm/flushtlb.h | 5 +++++
3 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 59332da2b827..7ff702c06caf 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -306,10 +306,18 @@ static int cf_check apply_microcode(const struct microcode_patch *patch,
sig->rev = rev;
/*
- * Some processors leave the ucode blob mapping as UC after the update.
- * Flush the mapping to regain normal cacheability.
+ * Family 0x17 processors leave the mapping of the ucode as UC after the
+ * update. Flush the mapping to regain normal cacheability.
+ *
+ * We do not know the granularity of mapping, and at 3200 bytes in size
+ * there is a good chance of crossing a 4k page boundary. Shoot-down the
+ * start and end just to be safe.
*/
- flush_area_local(patch, FLUSH_TLB_GLOBAL | FLUSH_ORDER(0));
+ if ( boot_cpu_data.family == 0x17 )
+ {
+ invlpg(patch);
+ invlpg((const void *)patch + F17H_MPB_MAX_SIZE - 1);
+ }
/* check current patch id and patch's id for match */
if ( hw_err || (rev != patch->patch_id) )
diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
index 94b2a30e8d30..09e676c151fa 100644
--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -222,8 +222,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
}
}
else
- asm volatile ( "invlpg %0"
- : : "m" (*(const char *)(va)) : "memory" );
+ invlpg(va);
}
else
do_tlb_flush();
diff --git a/xen/arch/x86/include/asm/flushtlb.h b/xen/arch/x86/include/asm/flushtlb.h
index 019d886f2b80..37bc203652b3 100644
--- a/xen/arch/x86/include/asm/flushtlb.h
+++ b/xen/arch/x86/include/asm/flushtlb.h
@@ -98,6 +98,11 @@ static inline unsigned long read_cr3(void)
return cr3;
}
+static inline void invlpg(const void *p)
+{
+ asm volatile ( "invlpg %0" :: "m" (*(const char *)p) );
+}
+
/* Write pagetable base and implicitly tick the tlbflush clock. */
void switch_cr3_cr4(unsigned long cr3, unsigned long cr4);
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 3/5] x86/ucode: Refine TLB flush fix for AMD Fam17h CPUs
2025-10-20 13:19 ` [PATCH 3/5] x86/ucode: Refine TLB flush fix for AMD Fam17h CPUs Andrew Cooper
@ 2025-10-21 8:34 ` Jan Beulich
0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2025-10-21 8:34 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel
On 20.10.2025 15:19, Andrew Cooper wrote:
> In the time since Xen discovered this, Linux stubled on it too and AMD
> produced a narrower fix, limited to Fam17h CPUs only. To my knowledge,
> there's no erratum or other public statement from AMD on the matter.
>
> Adjust Xen to match the narrower fix.
>
> Link: https://lore.kernel.org/lkml/ZyulbYuvrkshfsd2@antipodes/T/#u
> Fixes: f19a199281a2 ("x86/AMD: flush TLB after ucode update")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
>
> There is a difference in memory clobber with the invlpg() wrapper.
> apply_microcode() specifically does not want a memory clobber, whereas
> flush_area_local() doesn't need it as far as I can tell (there's nothing
> unsafe to move across this instruction).
The memory access(es) it would not want moving across would be page table
writes. With link-time optimization, wouldn't it in principle be possible
for flush_area_local() to be inlined, and the invlpg() then be moved?
Potentially ahead of a PTE write, seeing that read_cr4() is merely a
simple memory only, and hence the compiler could utilize knowledge it has
to short-circuit that as well?
For the ucode case things can't move unduly due to both rdmsrl() and
invlpg() using "asm volatile()".
With the clobber re-added
Acked-by: Jan Beulich <jbeulich@suse.com>
Otherwise I need to be further educated as to why omitting the clobber is
safe in all (present and future) cases.
Jan
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 4/5] x86/ucode: Cross check the minimum revision
2025-10-20 13:19 [PATCH for-4.21? 0/5] x86/ucode: Support loading latest ucode from linux-firwmare Andrew Cooper
` (2 preceding siblings ...)
2025-10-20 13:19 ` [PATCH 3/5] x86/ucode: Refine TLB flush fix for AMD Fam17h CPUs Andrew Cooper
@ 2025-10-20 13:19 ` Andrew Cooper
2025-10-21 9:18 ` Jan Beulich
2025-10-20 13:19 ` [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware Andrew Cooper
4 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2025-10-20 13:19 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné
For Zen3-5 microcode blobs signed with the updated signature scheme, the
checksum field has been reused to be a min_revision field, referring to the
microcode revision which fixed Entrysign (SB-7033, CVE-2024-36347).
Cross-check this when trying to load microcode, but allow --force to override
it. If the signature scheme is genuinely different, a #GP will occur.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
---
xen/arch/x86/cpu/microcode/amd.c | 48 +++++++++++++++++++++++++++++++-
1 file changed, 47 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 7ff702c06caf..30bddc89da0a 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -42,7 +42,10 @@ struct microcode_patch {
uint8_t mc_patch_data_id[2];
uint8_t mc_patch_data_len;
uint8_t init_flag;
- uint32_t mc_patch_data_checksum;
+ union {
+ uint32_t checksum; /* Fam12h and earlier */
+ uint32_t min_rev; /* Zen3-5, post Entrysign */
+ };
uint32_t nb_dev_id;
uint32_t sb_dev_id;
uint16_t processor_rev_id;
@@ -270,6 +273,41 @@ static int cf_check amd_compare(
return compare_revisions(old->patch_id, new->patch_id);
}
+/*
+ * Check whether this patch has a minimum revision given, and whether the
+ * condition is satisfied.
+ *
+ * In linux-firmware, blobs signed with the updated signature algorithm have
+ * reused the checksum field as a min-revision field. From public archives,
+ * the checksum field appears to have been unused since Fam12h.
+ *
+ * Returns false if there is a min revision given, and it suggests that that
+ * the patch cannot be loaded on the current system. True otherwise.
+ */
+static bool check_min_rev(const struct microcode_patch *patch)
+{
+ ASSERT(microcode_fits_cpu(patch));
+
+ if ( patch->processor_rev_id < 0xa000 || /* pre Zen3? */
+ patch->min_rev == 0 ) /* No min rev specified */
+ return true;
+
+ /*
+ * Sanity check, as this is a reused field. If this is a true
+ * min_revision field, it will differ only in the bottom byte from the
+ * patch_id. Otherwise, it's probably a checksum.
+ */
+ if ( (patch->patch_id ^ patch->min_rev) & ~0xff )
+ {
+ printk(XENLOG_WARNING
+ "microcode: patch %#x has unexpected min_rev %#x\n",
+ patch->patch_id, patch->min_rev);
+ return true;
+ }
+
+ return this_cpu(cpu_sig).rev >= patch->min_rev;
+}
+
static int cf_check apply_microcode(const struct microcode_patch *patch,
unsigned int flags)
{
@@ -299,6 +337,14 @@ static int cf_check apply_microcode(const struct microcode_patch *patch,
return -ENXIO;
}
+ if ( !ucode_force && !check_min_rev(patch) )
+ {
+ printk(XENLOG_ERR
+ "microcode: CPU%u current rev %#x below patch min_rev %#x\n",
+ cpu, sig->rev, patch->min_rev);
+ return -ENXIO;
+ }
+
hw_err = wrmsr_safe(MSR_AMD_PATCHLOADER, (unsigned long)patch);
/* get patch id after patching */
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 4/5] x86/ucode: Cross check the minimum revision
2025-10-20 13:19 ` [PATCH 4/5] x86/ucode: Cross check the minimum revision Andrew Cooper
@ 2025-10-21 9:18 ` Jan Beulich
2025-10-21 9:24 ` Andrew Cooper
0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2025-10-21 9:18 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel
On 20.10.2025 15:19, Andrew Cooper wrote:
> For Zen3-5 microcode blobs signed with the updated signature scheme, the
> checksum field has been reused to be a min_revision field, referring to the
> microcode revision which fixed Entrysign (SB-7033, CVE-2024-36347).
>
> Cross-check this when trying to load microcode, but allow --force to override
> it. If the signature scheme is genuinely different, a #GP will occur.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Might be upgradable to R-b if only I knew where - if anywhere - this is
documented. I can't spot anything in PM vol 2 in particular.
Jan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 4/5] x86/ucode: Cross check the minimum revision
2025-10-21 9:18 ` Jan Beulich
@ 2025-10-21 9:24 ` Andrew Cooper
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2025-10-21 9:24 UTC (permalink / raw)
To: Jan Beulich; +Cc: Roger Pau Monné, Xen-devel
On 21/10/2025 10:18 am, Jan Beulich wrote:
> On 20.10.2025 15:19, Andrew Cooper wrote:
>> For Zen3-5 microcode blobs signed with the updated signature scheme, the
>> checksum field has been reused to be a min_revision field, referring to the
>> microcode revision which fixed Entrysign (SB-7033, CVE-2024-36347).
>>
>> Cross-check this when trying to load microcode, but allow --force to override
>> it. If the signature scheme is genuinely different, a #GP will occur.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
Thanks.
>
> Might be upgradable to R-b if only I knew where - if anywhere - this is
> documented. I can't spot anything in PM vol 2 in particular.
Like everything else about the ucode format, It's not documented at all.
In fact, this was discovered by people on the WinRaid forums, because
even
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amd-ucode?id=3768c184de68a85b9df6697e7f93a2f61de90a99
doesn't say that the internal headers have been adjusted.
I've confirmed with AMD that it's intentional and expected to continue
like this for the lifetime of the Zen3-5 blobs.
~Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware
2025-10-20 13:19 [PATCH for-4.21? 0/5] x86/ucode: Support loading latest ucode from linux-firwmare Andrew Cooper
` (3 preceding siblings ...)
2025-10-20 13:19 ` [PATCH 4/5] x86/ucode: Cross check the minimum revision Andrew Cooper
@ 2025-10-20 13:19 ` Andrew Cooper
2025-10-20 15:06 ` Andrew Cooper
2025-10-21 9:47 ` Jan Beulich
4 siblings, 2 replies; 21+ messages in thread
From: Andrew Cooper @ 2025-10-20 13:19 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Roger Pau Monné
When Entrysign has been mitigated in firwmare, it is believed to be safe to
pass blobs to the CPU again. This avoids us needing to update the digest
table for new microcodes.
Relax the digest check when firmware looks to be up to date, and leave behind
a clear message when not.
This is best-effort only. If a malicious microcode has been loaded prior to
Xen running, then all bets are off.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
I need to double check the revision table. I think I need to submit a
correction to Linux first.
---
xen/arch/x86/cpu/microcode/amd.c | 81 +++++++++++++++++++++++++++-
xen/arch/x86/cpu/microcode/core.c | 2 +
xen/arch/x86/cpu/microcode/private.h | 2 +
3 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 30bddc89da0a..b5b55b7a00cd 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -101,6 +101,7 @@ static const struct patch_digest {
} patch_digests[] = {
#include "amd-patch-digests.c"
};
+static bool __ro_after_init entrysign_mitigiated_in_firmware;
static int cf_check cmp_patch_id(const void *key, const void *elem)
{
@@ -125,7 +126,7 @@ static bool check_digest(const struct container_microcode *mc)
* microcode updates. Mitigate by checking the digest of the patch
* against a list of known provenance.
*/
- if ( boot_cpu_data.family < 0x17 ||
+ if ( boot_cpu_data.family < 0x17 || entrysign_mitigiated_in_firmware ||
!opt_digest_check )
return true;
@@ -597,3 +598,81 @@ static void __init __constructor test_digests_sorted(void)
}
}
#endif /* CONFIG_SELF_TESTS */
+
+/*
+ * The Entrysign vulnerability affects all Zen1 thru Zen5 CPUs. Firmware
+ * fixes were produced in Nov/Dec 2025. Zen3 thru Zen5 can continue to take
+ * OS-loadable microcode updates using a new signature scheme, as long as
+ * firmware has been updated first.
+ */
+void __init amd_check_entrysign(void)
+{
+ unsigned int curr_rev;
+ uint8_t fixed_rev;
+
+ if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
+ boot_cpu_data.family < 0x17 ||
+ boot_cpu_data.family > 0x1a )
+ return;
+
+ /*
+ * Table taken from Linux, which is the only known source of information
+ * about client revisions.
+ */
+ curr_rev = this_cpu(cpu_sig).rev;
+ switch ( curr_rev >> 8 )
+ {
+ case 0x080012: fixed_rev = 0x6f; break;
+ case 0x080082: fixed_rev = 0x0f; break;
+ case 0x083010: fixed_rev = 0x7c; break;
+ case 0x086001: fixed_rev = 0x0e; break;
+ case 0x086081: fixed_rev = 0x08; break;
+ case 0x087010: fixed_rev = 0x34; break;
+ case 0x08a000: fixed_rev = 0x0a; break;
+ case 0x0a0010: fixed_rev = 0x7a; break;
+ case 0x0a0011: fixed_rev = 0xda; break;
+ case 0x0a0012: fixed_rev = 0x43; break;
+ case 0x0a0082: fixed_rev = 0x0e; break;
+ case 0x0a1011: fixed_rev = 0x53; break;
+ case 0x0a1012: fixed_rev = 0x4e; break;
+ case 0x0a1081: fixed_rev = 0x09; break;
+ case 0x0a2010: fixed_rev = 0x2f; break;
+ case 0x0a2012: fixed_rev = 0x12; break;
+ case 0x0a4041: fixed_rev = 0x09; break;
+ case 0x0a5000: fixed_rev = 0x13; break;
+ case 0x0a6012: fixed_rev = 0x0a; break;
+ case 0x0a7041: fixed_rev = 0x09; break;
+ case 0x0a7052: fixed_rev = 0x08; break;
+ case 0x0a7080: fixed_rev = 0x09; break;
+ case 0x0a70c0: fixed_rev = 0x09; break;
+ case 0x0aa001: fixed_rev = 0x16; break;
+ case 0x0aa002: fixed_rev = 0x18; break;
+ case 0x0b0021: fixed_rev = 0x46; break;
+ case 0x0b1010: fixed_rev = 0x46; break;
+ case 0x0b2040: fixed_rev = 0x31; break;
+ case 0x0b4040: fixed_rev = 0x31; break;
+ case 0x0b6000: fixed_rev = 0x31; break;
+ case 0x0b7000: fixed_rev = 0x31; break;
+ default:
+ printk(XENLOG_WARNING
+ "Unrecognised CPU %02x-%02x-%02x ucode 0x%08x, assuming vulnerable to Entrysign\n",
+ boot_cpu_data.family, boot_cpu_data.model,
+ boot_cpu_data.stepping, curr_rev);
+ return;
+ }
+
+ /*
+ * This check is best-effort. If the platform looks to be out of date, it
+ * probably is. If the platform looks to be fixed, it either genuinely
+ * is, or malware has gotten in before Xen booted and all bets are off.
+ */
+ if ( (uint8_t)curr_rev >= fixed_rev )
+ {
+ entrysign_mitigiated_in_firmware = true;
+ return;
+ }
+
+ printk(XENLOG_ERR
+ "Platform vulnerable to Entrysign (SB-7033, CVE-2024-36347) - firmware update required\n");
+ add_taint(TAINT_CPU_OUT_OF_SPEC);
+}
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index 2705bb43c97f..1d1a5aa4b097 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -750,6 +750,8 @@ static int __init early_microcode_load(struct boot_info *bi)
int idx = opt_mod_idx;
int rc;
+ amd_check_entrysign();
+
/*
* Cmdline parsing ensures this invariant holds, so that we don't end up
* trying to mix multiple ways of finding the microcode.
diff --git a/xen/arch/x86/cpu/microcode/private.h b/xen/arch/x86/cpu/microcode/private.h
index f5e2bfee00d9..e6c965dc99dd 100644
--- a/xen/arch/x86/cpu/microcode/private.h
+++ b/xen/arch/x86/cpu/microcode/private.h
@@ -81,8 +81,10 @@ extern bool opt_digest_check;
*/
#ifdef CONFIG_AMD
void ucode_probe_amd(struct microcode_ops *ops);
+void amd_check_entrysign(void);
#else
static inline void ucode_probe_amd(struct microcode_ops *ops) {}
+static inline void amd_check_entrysign(void) {}
#endif
#ifdef CONFIG_INTEL
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware
2025-10-20 13:19 ` [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware Andrew Cooper
@ 2025-10-20 15:06 ` Andrew Cooper
2025-10-21 9:47 ` Jan Beulich
1 sibling, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2025-10-20 15:06 UTC (permalink / raw)
To: Xen-devel; +Cc: Jan Beulich, Roger Pau Monné
On 20/10/2025 2:19 pm, Andrew Cooper wrote:
> When Entrysign has been mitigated in firwmare, it is believed to be safe to
> pass blobs to the CPU again. This avoids us needing to update the digest
> table for new microcodes.
>
> Relax the digest check when firmware looks to be up to date, and leave behind
> a clear message when not.
>
> This is best-effort only. If a malicious microcode has been loaded prior to
> Xen running, then all bets are off.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
>
> I need to double check the revision table. I think I need to submit a
> correction to Linux first.
Yes.
https://lore.kernel.org/lkml/20251020144124.2930784-1-andrew.cooper3@citrix.com/T/#u
Also there's a general off-by-one error in the revisions, owing to a
difference in how Linux and Xen are using the boundaries.
Both fixed locally for v2.
~Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware
2025-10-20 13:19 ` [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware Andrew Cooper
2025-10-20 15:06 ` Andrew Cooper
@ 2025-10-21 9:47 ` Jan Beulich
2025-10-22 21:19 ` Andrew Cooper
1 sibling, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2025-10-21 9:47 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel
On 20.10.2025 15:19, Andrew Cooper wrote:
> @@ -597,3 +598,81 @@ static void __init __constructor test_digests_sorted(void)
> }
> }
> #endif /* CONFIG_SELF_TESTS */
> +
> +/*
> + * The Entrysign vulnerability affects all Zen1 thru Zen5 CPUs.
And older ones are fine, or merely have no fixes produced?
> Firmware
> + * fixes were produced in Nov/Dec 2025. Zen3 thru Zen5 can continue to take
> + * OS-loadable microcode updates using a new signature scheme, as long as
> + * firmware has been updated first.
> + */
Yet what about Zen1/2?
> +void __init amd_check_entrysign(void)
> +{
> + unsigned int curr_rev;
> + uint8_t fixed_rev;
> +
> + if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
> + boot_cpu_data.family < 0x17 ||
> + boot_cpu_data.family > 0x1a )
> + return;
> +
> + /*
> + * Table taken from Linux, which is the only known source of information
> + * about client revisions.
> + */
> + curr_rev = this_cpu(cpu_sig).rev;
> + switch ( curr_rev >> 8 )
> + {
> + case 0x080012: fixed_rev = 0x6f; break;
> + case 0x080082: fixed_rev = 0x0f; break;
In your reply you mentioned a "general off-by-1" when comparing with Linux,
but I'm in trouble understanding how both can be correct. Leaving aside the
1st line (for which you sent a Linux patch anyway), how can our
"(uint8_t)curr_rev >= fixed_rev" (i.e. "(uint8_t)curr_rev >= 0x0f") further
below be correct at the same time as Linux'es "return cur_rev <= 0x800820f"
(indicating to the caller whether a SHA check is needed) is also correct?
We say 0x0f is okay, while they demand a SHA check for that revision.
In any event, whatever (legitimate) off-by-1 it is that I'm failing to spot,
I think this would want explaining in the comment above.
> + case 0x083010: fixed_rev = 0x7c; break;
> + case 0x086001: fixed_rev = 0x0e; break;
> + case 0x086081: fixed_rev = 0x08; break;
> + case 0x087010: fixed_rev = 0x34; break;
> + case 0x08a000: fixed_rev = 0x0a; break;
> + case 0x0a0010: fixed_rev = 0x7a; break;
> + case 0x0a0011: fixed_rev = 0xda; break;
> + case 0x0a0012: fixed_rev = 0x43; break;
> + case 0x0a0082: fixed_rev = 0x0e; break;
> + case 0x0a1011: fixed_rev = 0x53; break;
> + case 0x0a1012: fixed_rev = 0x4e; break;
> + case 0x0a1081: fixed_rev = 0x09; break;
> + case 0x0a2010: fixed_rev = 0x2f; break;
> + case 0x0a2012: fixed_rev = 0x12; break;
> + case 0x0a4041: fixed_rev = 0x09; break;
> + case 0x0a5000: fixed_rev = 0x13; break;
> + case 0x0a6012: fixed_rev = 0x0a; break;
> + case 0x0a7041: fixed_rev = 0x09; break;
> + case 0x0a7052: fixed_rev = 0x08; break;
> + case 0x0a7080: fixed_rev = 0x09; break;
> + case 0x0a70c0: fixed_rev = 0x09; break;
> + case 0x0aa001: fixed_rev = 0x16; break;
> + case 0x0aa002: fixed_rev = 0x18; break;
> + case 0x0b0021: fixed_rev = 0x46; break;
> + case 0x0b1010: fixed_rev = 0x46; break;
> + case 0x0b2040: fixed_rev = 0x31; break;
> + case 0x0b4040: fixed_rev = 0x31; break;
> + case 0x0b6000: fixed_rev = 0x31; break;
> + case 0x0b7000: fixed_rev = 0x31; break;
Without at least brief model related comments this looks extremely opaque.
Linux, as a minimal reference, at least has cpuid_to_ucode_rev() and the
accompanying union zen_patch_rev. Background of my remark is that I would
have expected there to be more models per Zen<N>, seeing in particular how
many different BKDGs / PPRs and RGs there are. Many RGs in particular say
they apply to a range of models, yet no similar ranges are covered here
(unless my deciphering attempts went wrong).
Jan
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware
2025-10-21 9:47 ` Jan Beulich
@ 2025-10-22 21:19 ` Andrew Cooper
2025-10-23 7:05 ` Jan Beulich
0 siblings, 1 reply; 21+ messages in thread
From: Andrew Cooper @ 2025-10-22 21:19 UTC (permalink / raw)
To: Jan Beulich; +Cc: Roger Pau Monné, Xen-devel
On 21/10/2025 10:47 am, Jan Beulich wrote:
> On 20.10.2025 15:19, Andrew Cooper wrote:
>> @@ -597,3 +598,81 @@ static void __init __constructor test_digests_sorted(void)
>> }
>> }
>> #endif /* CONFIG_SELF_TESTS */
>> +
>> +/*
>> + * The Entrysign vulnerability affects all Zen1 thru Zen5 CPUs.
> And older ones are fine, or merely have no fixes produced?
Unknown. Everything prior to Zen1 is fully out of support from AMD.
>
>> Firmware
>> + * fixes were produced in Nov/Dec 2025. Zen3 thru Zen5 can continue to take
>> + * OS-loadable microcode updates using a new signature scheme, as long as
>> + * firmware has been updated first.
>> + */
> Yet what about Zen1/2?
There's no signature fix for Zen1/2. The "fix" disables microcode
loading, and firmware updates are the only way to get new fixes.
This is footnote 1 in the bulletin. "Microcode cannot be hot-loaded
after updating to this PI version."
>
>> +void __init amd_check_entrysign(void)
>> +{
>> + unsigned int curr_rev;
>> + uint8_t fixed_rev;
>> +
>> + if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
>> + boot_cpu_data.family < 0x17 ||
>> + boot_cpu_data.family > 0x1a )
>> + return;
>> +
>> + /*
>> + * Table taken from Linux, which is the only known source of information
>> + * about client revisions.
>> + */
>> + curr_rev = this_cpu(cpu_sig).rev;
>> + switch ( curr_rev >> 8 )
>> + {
>> + case 0x080012: fixed_rev = 0x6f; break;
>> + case 0x080082: fixed_rev = 0x0f; break;
> In your reply you mentioned a "general off-by-1" when comparing with Linux,
> but I'm in trouble understanding how both can be correct. Leaving aside the
> 1st line (for which you sent a Linux patch anyway), how can our
> "(uint8_t)curr_rev >= fixed_rev" (i.e. "(uint8_t)curr_rev >= 0x0f") further
> below be correct at the same time as Linux'es "return cur_rev <= 0x800820f"
> (indicating to the caller whether a SHA check is needed) is also correct?
> We say 0x0f is okay, while they demand a SHA check for that revision.
>
> In any event, whatever (legitimate) off-by-1 it is that I'm failing to spot,
> I think this would want explaining in the comment above.
What you've spotted is the off-by-one error.
Linux is written as "curr <= last-vuln-rev" in order to do the digest check.
Xen wants "cur >= first-fixed-rev"; I renamed the variable and forgot to
adjust the table to compensate. I've already fixed it in v2, so this
line now reads fixed_rev = 0x0a.
>
>> + case 0x083010: fixed_rev = 0x7c; break;
>> + case 0x086001: fixed_rev = 0x0e; break;
>> + case 0x086081: fixed_rev = 0x08; break;
>> + case 0x087010: fixed_rev = 0x34; break;
>> + case 0x08a000: fixed_rev = 0x0a; break;
>> + case 0x0a0010: fixed_rev = 0x7a; break;
>> + case 0x0a0011: fixed_rev = 0xda; break;
>> + case 0x0a0012: fixed_rev = 0x43; break;
>> + case 0x0a0082: fixed_rev = 0x0e; break;
>> + case 0x0a1011: fixed_rev = 0x53; break;
>> + case 0x0a1012: fixed_rev = 0x4e; break;
>> + case 0x0a1081: fixed_rev = 0x09; break;
>> + case 0x0a2010: fixed_rev = 0x2f; break;
>> + case 0x0a2012: fixed_rev = 0x12; break;
>> + case 0x0a4041: fixed_rev = 0x09; break;
>> + case 0x0a5000: fixed_rev = 0x13; break;
>> + case 0x0a6012: fixed_rev = 0x0a; break;
>> + case 0x0a7041: fixed_rev = 0x09; break;
>> + case 0x0a7052: fixed_rev = 0x08; break;
>> + case 0x0a7080: fixed_rev = 0x09; break;
>> + case 0x0a70c0: fixed_rev = 0x09; break;
>> + case 0x0aa001: fixed_rev = 0x16; break;
>> + case 0x0aa002: fixed_rev = 0x18; break;
>> + case 0x0b0021: fixed_rev = 0x46; break;
>> + case 0x0b1010: fixed_rev = 0x46; break;
>> + case 0x0b2040: fixed_rev = 0x31; break;
>> + case 0x0b4040: fixed_rev = 0x31; break;
>> + case 0x0b6000: fixed_rev = 0x31; break;
>> + case 0x0b7000: fixed_rev = 0x31; break;
> Without at least brief model related comments this looks extremely opaque.
> Linux, as a minimal reference, at least has cpuid_to_ucode_rev() and the
> accompanying union zen_patch_rev.
We have other tables like this in Xen. Linux has even more.
These case labels are family/model/steppings, but not in the same format
as CPUID.1.EAX, and also not in the same format at patch->processor_id.
This is true even on CPUs prior to Zen1, making union zen_patch_rev
misleading and why I have intentionally not ported it across. Fam11h
seems to be where this started being true in practice.
Fam10h has cases the same ucode applies to different steppings of CPUs.
> Background of my remark is that I would
> have expected there to be more models per Zen<N>, seeing in particular how
> many different BKDGs / PPRs and RGs there are. Many RGs in particular say
> they apply to a range of models, yet no similar ranges are covered here
> (unless my deciphering attempts went wrong).
PPRs/RGs are generally per block of 0x10 models and all steppings
therewith. This is quite often one production CPU and a handful of
preproduction steppings, but e.g. Milan and MilanX are two production
CPUs share a same PPR/RG, as they differ only by stepping.
Preproduction CPUs probably won't have a fix (other than the final two
rows which are A0 stepping of something presumably trying to get out of
the door when Entrysign was found.) The list does look to be right
order of magnitude for the production CPUs.
The AMD bulletin only gives microcode versions for server. Clients only
state AgesaPI versions, so I'm entirely reliant on Linux for the
microcode versions.
~Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware
2025-10-22 21:19 ` Andrew Cooper
@ 2025-10-23 7:05 ` Jan Beulich
2025-10-27 21:42 ` Andrew Cooper
0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2025-10-23 7:05 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Roger Pau Monné, Xen-devel
On 22.10.2025 23:19, Andrew Cooper wrote:
> On 21/10/2025 10:47 am, Jan Beulich wrote:
>> On 20.10.2025 15:19, Andrew Cooper wrote:
>>> +void __init amd_check_entrysign(void)
>>> +{
>>> + unsigned int curr_rev;
>>> + uint8_t fixed_rev;
>>> +
>>> + if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
>>> + boot_cpu_data.family < 0x17 ||
>>> + boot_cpu_data.family > 0x1a )
>>> + return;
>>> +
>>> + /*
>>> + * Table taken from Linux, which is the only known source of information
>>> + * about client revisions.
>>> + */
>>> + curr_rev = this_cpu(cpu_sig).rev;
>>> + switch ( curr_rev >> 8 )
>>> + {
>>> + case 0x080012: fixed_rev = 0x6f; break;
>>> + case 0x080082: fixed_rev = 0x0f; break;
>> In your reply you mentioned a "general off-by-1" when comparing with Linux,
>> but I'm in trouble understanding how both can be correct. Leaving aside the
>> 1st line (for which you sent a Linux patch anyway), how can our
>> "(uint8_t)curr_rev >= fixed_rev" (i.e. "(uint8_t)curr_rev >= 0x0f") further
>> below be correct at the same time as Linux'es "return cur_rev <= 0x800820f"
>> (indicating to the caller whether a SHA check is needed) is also correct?
>> We say 0x0f is okay, while they demand a SHA check for that revision.
>>
>> In any event, whatever (legitimate) off-by-1 it is that I'm failing to spot,
>> I think this would want explaining in the comment above.
>
> What you've spotted is the off-by-one error.
>
> Linux is written as "curr <= last-vuln-rev" in order to do the digest check.
>
> Xen wants "cur >= first-fixed-rev"; I renamed the variable and forgot to
> adjust the table to compensate. I've already fixed it in v2, so this
> line now reads fixed_rev = 0x0a.
Now I'm even more confused. If Linux uses 0x0f for last-vuln-rev, how would
0x0a be first-fixed-ref?
>>> + case 0x083010: fixed_rev = 0x7c; break;
>>> + case 0x086001: fixed_rev = 0x0e; break;
>>> + case 0x086081: fixed_rev = 0x08; break;
>>> + case 0x087010: fixed_rev = 0x34; break;
>>> + case 0x08a000: fixed_rev = 0x0a; break;
>>> + case 0x0a0010: fixed_rev = 0x7a; break;
>>> + case 0x0a0011: fixed_rev = 0xda; break;
>>> + case 0x0a0012: fixed_rev = 0x43; break;
>>> + case 0x0a0082: fixed_rev = 0x0e; break;
>>> + case 0x0a1011: fixed_rev = 0x53; break;
>>> + case 0x0a1012: fixed_rev = 0x4e; break;
>>> + case 0x0a1081: fixed_rev = 0x09; break;
>>> + case 0x0a2010: fixed_rev = 0x2f; break;
>>> + case 0x0a2012: fixed_rev = 0x12; break;
>>> + case 0x0a4041: fixed_rev = 0x09; break;
>>> + case 0x0a5000: fixed_rev = 0x13; break;
>>> + case 0x0a6012: fixed_rev = 0x0a; break;
>>> + case 0x0a7041: fixed_rev = 0x09; break;
>>> + case 0x0a7052: fixed_rev = 0x08; break;
>>> + case 0x0a7080: fixed_rev = 0x09; break;
>>> + case 0x0a70c0: fixed_rev = 0x09; break;
>>> + case 0x0aa001: fixed_rev = 0x16; break;
>>> + case 0x0aa002: fixed_rev = 0x18; break;
>>> + case 0x0b0021: fixed_rev = 0x46; break;
>>> + case 0x0b1010: fixed_rev = 0x46; break;
>>> + case 0x0b2040: fixed_rev = 0x31; break;
>>> + case 0x0b4040: fixed_rev = 0x31; break;
>>> + case 0x0b6000: fixed_rev = 0x31; break;
>>> + case 0x0b7000: fixed_rev = 0x31; break;
>> Without at least brief model related comments this looks extremely opaque.
>> Linux, as a minimal reference, at least has cpuid_to_ucode_rev() and the
>> accompanying union zen_patch_rev.
>
> We have other tables like this in Xen. Linux has even more.
The one in amd-patch-digests.c I'm aware of. Oh, and tsa_calculations().
But ...
> These case labels are family/model/steppings, but not in the same format
> as CPUID.1.EAX, and also not in the same format at patch->processor_id.
... none of them explaining what these numbers really mean isn't helpful.
I didn't question them earlier because I assumed them to be all "magic".
Now that I learned how they're encoded, I thought it might be (have been)
nice if they weren't left as "entirely magic".
>> Background of my remark is that I would
>> have expected there to be more models per Zen<N>, seeing in particular how
>> many different BKDGs / PPRs and RGs there are. Many RGs in particular say
>> they apply to a range of models, yet no similar ranges are covered here
>> (unless my deciphering attempts went wrong).
>
> PPRs/RGs are generally per block of 0x10 models and all steppings
> therewith. This is quite often one production CPU and a handful of
> preproduction steppings, but e.g. Milan and MilanX are two production
> CPUs share a same PPR/RG, as they differ only by stepping.
>
> Preproduction CPUs probably won't have a fix (other than the final two
> rows which are A0 stepping of something presumably trying to get out of
> the door when Entrysign was found.) The list does look to be right
> order of magnitude for the production CPUs.
Sure, and my question wasn't towards steppings of individual models. My
question was towards models of individual families, where the docs
suggest far more exist than this table would cover. I guess that while
talking mainly of steppings, you really (also) meant to say that most of
the model numbers weren't used in practice (for production CPUs) either?
> The AMD bulletin only gives microcode versions for server. Clients only
> state AgesaPI versions, so I'm entirely reliant on Linux for the
> microcode versions.
I did understand that, yes, as you have a code comment saying so.
Jan
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware
2025-10-23 7:05 ` Jan Beulich
@ 2025-10-27 21:42 ` Andrew Cooper
0 siblings, 0 replies; 21+ messages in thread
From: Andrew Cooper @ 2025-10-27 21:42 UTC (permalink / raw)
To: Jan Beulich; +Cc: Roger Pau Monné, Xen-devel
On 23/10/2025 8:05 am, Jan Beulich wrote:
> On 22.10.2025 23:19, Andrew Cooper wrote:
>> On 21/10/2025 10:47 am, Jan Beulich wrote:
>>> On 20.10.2025 15:19, Andrew Cooper wrote:
>>>> +void __init amd_check_entrysign(void)
>>>> +{
>>>> + unsigned int curr_rev;
>>>> + uint8_t fixed_rev;
>>>> +
>>>> + if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
>>>> + boot_cpu_data.family < 0x17 ||
>>>> + boot_cpu_data.family > 0x1a )
>>>> + return;
>>>> +
>>>> + /*
>>>> + * Table taken from Linux, which is the only known source of information
>>>> + * about client revisions.
>>>> + */
>>>> + curr_rev = this_cpu(cpu_sig).rev;
>>>> + switch ( curr_rev >> 8 )
>>>> + {
>>>> + case 0x080012: fixed_rev = 0x6f; break;
>>>> + case 0x080082: fixed_rev = 0x0f; break;
>>> In your reply you mentioned a "general off-by-1" when comparing with Linux,
>>> but I'm in trouble understanding how both can be correct. Leaving aside the
>>> 1st line (for which you sent a Linux patch anyway), how can our
>>> "(uint8_t)curr_rev >= fixed_rev" (i.e. "(uint8_t)curr_rev >= 0x0f") further
>>> below be correct at the same time as Linux'es "return cur_rev <= 0x800820f"
>>> (indicating to the caller whether a SHA check is needed) is also correct?
>>> We say 0x0f is okay, while they demand a SHA check for that revision.
>>>
>>> In any event, whatever (legitimate) off-by-1 it is that I'm failing to spot,
>>> I think this would want explaining in the comment above.
>> What you've spotted is the off-by-one error.
>>
>> Linux is written as "curr <= last-vuln-rev" in order to do the digest check.
>>
>> Xen wants "cur >= first-fixed-rev"; I renamed the variable and forgot to
>> adjust the table to compensate. I've already fixed it in v2, so this
>> line now reads fixed_rev = 0x0a.
> Now I'm even more confused. If Linux uses 0x0f for last-vuln-rev, how would
> 0x0a be first-fixed-ref?
Sorry, that was a typo in my email. I've got 0x10 locally.
>
>>>> + case 0x083010: fixed_rev = 0x7c; break;
>>>> + case 0x086001: fixed_rev = 0x0e; break;
>>>> + case 0x086081: fixed_rev = 0x08; break;
>>>> + case 0x087010: fixed_rev = 0x34; break;
>>>> + case 0x08a000: fixed_rev = 0x0a; break;
>>>> + case 0x0a0010: fixed_rev = 0x7a; break;
>>>> + case 0x0a0011: fixed_rev = 0xda; break;
>>>> + case 0x0a0012: fixed_rev = 0x43; break;
>>>> + case 0x0a0082: fixed_rev = 0x0e; break;
>>>> + case 0x0a1011: fixed_rev = 0x53; break;
>>>> + case 0x0a1012: fixed_rev = 0x4e; break;
>>>> + case 0x0a1081: fixed_rev = 0x09; break;
>>>> + case 0x0a2010: fixed_rev = 0x2f; break;
>>>> + case 0x0a2012: fixed_rev = 0x12; break;
>>>> + case 0x0a4041: fixed_rev = 0x09; break;
>>>> + case 0x0a5000: fixed_rev = 0x13; break;
>>>> + case 0x0a6012: fixed_rev = 0x0a; break;
>>>> + case 0x0a7041: fixed_rev = 0x09; break;
>>>> + case 0x0a7052: fixed_rev = 0x08; break;
>>>> + case 0x0a7080: fixed_rev = 0x09; break;
>>>> + case 0x0a70c0: fixed_rev = 0x09; break;
>>>> + case 0x0aa001: fixed_rev = 0x16; break;
>>>> + case 0x0aa002: fixed_rev = 0x18; break;
>>>> + case 0x0b0021: fixed_rev = 0x46; break;
>>>> + case 0x0b1010: fixed_rev = 0x46; break;
>>>> + case 0x0b2040: fixed_rev = 0x31; break;
>>>> + case 0x0b4040: fixed_rev = 0x31; break;
>>>> + case 0x0b6000: fixed_rev = 0x31; break;
>>>> + case 0x0b7000: fixed_rev = 0x31; break;
>>> Without at least brief model related comments this looks extremely opaque.
>>> Linux, as a minimal reference, at least has cpuid_to_ucode_rev() and the
>>> accompanying union zen_patch_rev.
>> We have other tables like this in Xen. Linux has even more.
> The one in amd-patch-digests.c I'm aware of. Oh, and tsa_calculations().
> But ...
>
>> These case labels are family/model/steppings, but not in the same format
>> as CPUID.1.EAX, and also not in the same format at patch->processor_id.
> ... none of them explaining what these numbers really mean isn't helpful.
> I didn't question them earlier because I assumed them to be all "magic".
> Now that I learned how they're encoded, I thought it might be (have been)
> nice if they weren't left as "entirely magic".
Well - they are about as magic as numbers get.
It's just a convention that AMD uses when choosing the (otherwise
arbitrary) patch_id, and I'm not aware of it being written down
anywhere. Using the entrysign vulnerability, AIUI you can choose an
arbitrary 32bit value here.
Linux says it's from Fam17h onwards, but the pattern works from Fam12h,
and Fam10h was definitely different.
I've got no idea how long it will continue. For one, the 8-bit ucode
revision is proving to be a limiting factor on some CPUs, and e.g. one
of the 3 F/M/S encoding (patch->processor_id) will run out when we hit
Zen15 CPUs at the current rate that AMD are using Family numbers.
>>> Background of my remark is that I would
>>> have expected there to be more models per Zen<N>, seeing in particular how
>>> many different BKDGs / PPRs and RGs there are. Many RGs in particular say
>>> they apply to a range of models, yet no similar ranges are covered here
>>> (unless my deciphering attempts went wrong).
>> PPRs/RGs are generally per block of 0x10 models and all steppings
>> therewith. This is quite often one production CPU and a handful of
>> preproduction steppings, but e.g. Milan and MilanX are two production
>> CPUs share a same PPR/RG, as they differ only by stepping.
>>
>> Preproduction CPUs probably won't have a fix (other than the final two
>> rows which are A0 stepping of something presumably trying to get out of
>> the door when Entrysign was found.) The list does look to be right
>> order of magnitude for the production CPUs.
> Sure, and my question wasn't towards steppings of individual models. My
> question was towards models of individual families, where the docs
> suggest far more exist than this table would cover. I guess that while
> talking mainly of steppings, you really (also) meant to say that most of
> the model numbers weren't used in practice (for production CPUs) either?
AMD's numbering space is very sparse. From a block of 0x10 (or in some
cases 8) model numbers, it's uncommon to see anything other than 0 or 1.
~Andrew
^ permalink raw reply [flat|nested] 21+ messages in thread