* AMD zen microcode updates breaks boot
@ 2024-09-27 15:17 Jens Axboe
2024-09-28 6:10 ` Borislav Petkov
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Jens Axboe @ 2024-09-27 15:17 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Thomas Gleixner, the arch/x86 maintainers, LKML
Hi,
Got home from conference travels and updated two test boxes to current
-git (sha 075dbe9f6e3c), both AMD boxes. One of them boots fine, the
other one does not. One is a Dell R7525, cpu:
2 socket AMD EPYC 7763 64-Core Processor
and it boots fine on -git. The other is a Dell R7625, cpu:
2 socket AMD EPYC 9754 128-Core Processor
and that one does not boot. Just get a black screen when the kernel
should load. Because I didn't have much to go on here, I bisected the
issue, and it came up with:
94838d230a6c835ced1bad06b8759e0a5f19c1d3 is the first bad commit
commit 94838d230a6c835ced1bad06b8759e0a5f19c1d3 (HEAD)
Author: Borislav Petkov <bp@alien8.de>
Date: Thu Jul 25 13:20:37 2024 +0200
x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID
which seems plausible. And indeed, reverting that commit (and its fixup)
on top of current -git does indeed solve it. Happy to test patches,
unfortunately I don't have much to offer up in terms of oops or whatever
to help diagnose this. In lieu of instant ideas to prevent this issue on
-rc1, perhaps a revert?
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-09-27 15:17 AMD zen microcode updates breaks boot Jens Axboe
@ 2024-09-28 6:10 ` Borislav Petkov
2024-09-28 11:31 ` Jens Axboe
2024-10-22 16:08 ` [tip: x86/urgent] x86/microcode/AMD: Pay attention to the stepping dynamically tip-bot2 for Borislav Petkov (AMD)
2024-10-22 16:08 ` [tip: x86/urgent] x86/microcode/AMD: Split load_microcode_amd() tip-bot2 for Borislav Petkov (AMD)
2 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-09-28 6:10 UTC (permalink / raw)
To: Jens Axboe; +Cc: Thomas Gleixner, the arch/x86 maintainers, LKML
On Fri, Sep 27, 2024 at 09:17:46AM -0600, Jens Axboe wrote:
> which seems plausible. And indeed, reverting that commit (and its fixup)
> on top of current -git does indeed solve it.
Can you send full dmesg from that kernel with the patch reverted and
also upload somewhere
/lib/firmware/amd-ucode/microcode_amd_fam19h.bin
?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-09-28 6:10 ` Borislav Petkov
@ 2024-09-28 11:31 ` Jens Axboe
2024-09-30 4:43 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-09-28 11:31 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Thomas Gleixner, the arch/x86 maintainers, LKML
On 9/28/24 12:10 AM, Borislav Petkov wrote:
> On Fri, Sep 27, 2024 at 09:17:46AM -0600, Jens Axboe wrote:
>> which seems plausible. And indeed, reverting that commit (and its fixup)
>> on top of current -git does indeed solve it.
>
> Can you send full dmesg from that kernel with the patch reverted and
> also upload somewhere
Here's dmesg on the kernel I booted yesterday, which is -git (and some
other branches) and the two patches reverted:
https://kernel.dk/r7625.dmesg.gz
> /lib/firmware/amd-ucode/microcode_amd_fam19h.bin
https://kernel.dk/microcode_amd_fam19h.bin
This is from:
axboe@r7625 ~> apt show amd64-microcode
Package: amd64-microcode
Version: 3.20240820.1
as the box is running debian testing/unstable.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-09-28 11:31 ` Jens Axboe
@ 2024-09-30 4:43 ` Borislav Petkov
2024-09-30 12:27 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-09-30 4:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: Thomas Gleixner, the arch/x86 maintainers, LKML
On Sat, Sep 28, 2024 at 05:31:36AM -0600, Jens Axboe wrote:
> as the box is running debian testing/unstable.
Thanks, I need to go find a box like yours to try to reproduce it there,
after I get back from vacation.
In the meantime, you can add "dis_ucode_ldr" to the kernel cmdline if
you want to boot the box with -rc1.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-09-30 4:43 ` Borislav Petkov
@ 2024-09-30 12:27 ` Jens Axboe
2024-09-30 16:16 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-09-30 12:27 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 9/29/24 10:43 PM, Borislav Petkov wrote:
> On Sat, Sep 28, 2024 at 05:31:36AM -0600, Jens Axboe wrote:
>> as the box is running debian testing/unstable.
>
> Thanks, I need to go find a box like yours to try to reproduce it there,
> after I get back from vacation.
>
> In the meantime, you can add "dis_ucode_ldr" to the kernel cmdline if
> you want to boot the box with -rc1.
Would it perhaps be an idea to revert the cleanup until then? I can't
be the only one hit by this. We obviously missed -rc1 now, but...
CC'ing the regressions list to keep track of this.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-09-30 12:27 ` Jens Axboe
@ 2024-09-30 16:16 ` Borislav Petkov
2024-09-30 16:25 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-09-30 16:16 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On September 30, 2024 2:27:03 PM GMT+02:00, Jens Axboe <axboe@kernel.dk> wrote:
>Would it perhaps be an idea to revert the cleanup until then? I can't
>be the only one hit by this. We obviously missed -rc1 now, but...
You're the first one I'm hearing about. Either your configuration is weird or something else is amiss. I've tested this patch on all AMD families. But sure, feel free to send a properly explained revert and I'll ack it.
Thx.
--
Sent from a small device: formatting sucks and brevity is inevitable.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-09-30 16:16 ` Borislav Petkov
@ 2024-09-30 16:25 ` Jens Axboe
2024-10-09 9:12 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-09-30 16:25 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 9/30/24 10:16 AM, Borislav Petkov wrote:
> On September 30, 2024 2:27:03 PM GMT+02:00, Jens Axboe <axboe@kernel.dk> wrote:
>> Would it perhaps be an idea to revert the cleanup until then? I can't
>> be the only one hit by this. We obviously missed -rc1 now, but...
>
> You're the first one I'm hearing about. Either your configuration is
> weird or something else is amiss. I've tested this patch on all AMD
> families. But sure, feel free to send a properly explained revert and
> I'll ack it.
Hmm, seems like a pretty standard cpu and updated microcode fw, no? But
if it's just me, we can just defer until it gets fixed, at least for
some more rcs. I just pruned the microcode for now to work around it, as
it's pretty annoying to forget about doing the reverts and then booting
a broken kernel. The box takes minutes to post+boot.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-09-30 16:25 ` Jens Axboe
@ 2024-10-09 9:12 ` Borislav Petkov
2024-10-09 11:04 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-09 9:12 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Mon, Sep 30, 2024 at 10:25:21AM -0600, Jens Axboe wrote:
> Hmm, seems like a pretty standard cpu and updated microcode fw, no? But
> if it's just me, we can just defer until it gets fixed, at least for
> some more rcs. I just pruned the microcode for now to work around it, as
> it's pretty annoying to forget about doing the reverts and then booting
> a broken kernel. The box takes minutes to post+boot.
With the microcode blob removed so that no loading happens, what microcode
does this box have?
Still 0x0aa00215?
I.e., what does:
$ grep microcode /proc/cpuinfo | sort | uniq -c
say?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-09 9:12 ` Borislav Petkov
@ 2024-10-09 11:04 ` Jens Axboe
2024-10-10 13:46 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-09 11:04 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/9/24 3:12 AM, Borislav Petkov wrote:
> On Mon, Sep 30, 2024 at 10:25:21AM -0600, Jens Axboe wrote:
>> Hmm, seems like a pretty standard cpu and updated microcode fw, no? But
>> if it's just me, we can just defer until it gets fixed, at least for
>> some more rcs. I just pruned the microcode for now to work around it, as
>> it's pretty annoying to forget about doing the reverts and then booting
>> a broken kernel. The box takes minutes to post+boot.
>
> With the microcode blob removed so that no loading happens, what microcode
> does this box have?
>
> Still 0x0aa00215?
Yep, 0xaa00215
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-09 11:04 ` Jens Axboe
@ 2024-10-10 13:46 ` Borislav Petkov
2024-10-10 13:50 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-10 13:46 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Wed, Oct 09, 2024 at 05:04:23AM -0600, Jens Axboe wrote:
> Yep, 0xaa00215
Found something: I'm not handling the stepping properly, below is a big diff
along with debug printks. Can you pls run it and send me dmesg. I'm assuming
the box will boot with it.
Thx.
---
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index f63b051f25a0..a86fd2684913 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -158,6 +158,9 @@ static union cpuid_1_eax ucode_rev_to_cpuid(unsigned int val)
c.family = 0xf;
c.ext_fam = p.ext_fam;
+ pr_info("%s: val: 0x%x, p.stepping: 0x%x, c.stepping: 0x%x\n",
+ __func__, val, p.stepping, c.stepping);
+
return c;
}
@@ -613,16 +616,22 @@ static int __init save_microcode_in_initrd(void)
}
early_initcall(save_microcode_in_initrd);
-static inline bool patch_cpus_equivalent(struct ucode_patch *p, struct ucode_patch *n)
+static inline bool patch_cpus_equivalent(struct ucode_patch *p,
+ struct ucode_patch *n,
+ bool ignore_stepping)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
union cpuid_1_eax p_cid = ucode_rev_to_cpuid(p->patch_id);
union cpuid_1_eax n_cid = ucode_rev_to_cpuid(n->patch_id);
- /* Zap stepping */
- p_cid.stepping = 0;
- n_cid.stepping = 0;
+ if (ignore_stepping) {
+ p_cid.stepping = 0;
+ n_cid.stepping = 0;
+ }
+
+ pr_info("%s: p_cid.full: 0x%x, n_cid.full: 0x%x\n",
+ __func__, p_cid.full, n_cid.full);
return p_cid.full == n_cid.full;
} else {
@@ -641,16 +650,22 @@ static struct ucode_patch *cache_find_patch(struct ucode_cpu_info *uci, u16 equi
n.equiv_cpu = equiv_cpu;
n.patch_id = uci->cpu_sig.rev;
+ pr_info("%s: equiv_cpu: 0x%x, patch_id: 0x%x\n",
+ __func__, equiv_cpu, uci->cpu_sig.rev);
+
WARN_ON_ONCE(!n.patch_id);
- list_for_each_entry(p, &microcode_cache, plist)
- if (patch_cpus_equivalent(p, &n))
+ list_for_each_entry(p, &microcode_cache, plist) {
+ if (patch_cpus_equivalent(p, &n, false)) {
+ pr_info("%s: using 0x%x\n", __func__, p->patch_id);
return p;
+ }
+ }
return NULL;
}
-static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
+static inline int patch_newer(struct ucode_patch *p, struct ucode_patch *n)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
@@ -659,6 +674,9 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
zp.ucode_rev = p->patch_id;
zn.ucode_rev = n->patch_id;
+ if (zn.stepping != zp.stepping)
+ return -1;
+
return zn.rev > zp.rev;
} else {
return n->patch_id > p->patch_id;
@@ -668,22 +686,32 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
static void update_cache(struct ucode_patch *new_patch)
{
struct ucode_patch *p;
+ int ret;
list_for_each_entry(p, &microcode_cache, plist) {
- if (patch_cpus_equivalent(p, new_patch)) {
- if (!patch_newer(p, new_patch)) {
+ if (patch_cpus_equivalent(p, new_patch, true)) {
+ ret = patch_newer(p, new_patch);
+ if (ret < 0)
+ continue;
+ else if (!ret) {
/* we already have the latest patch */
kfree(new_patch->data);
kfree(new_patch);
return;
}
+ pr_info("%s: replace 0x%x with 0x%x\n",
+ __func__, p->patch_id, new_patch->patch_id);
+
list_replace(&p->plist, &new_patch->plist);
kfree(p->data);
kfree(p);
return;
}
}
+
+ pr_info("%s: add patch: 0x%x\n", __func__, new_patch->patch_id);
+
/* no patch found, add it */
list_add_tail(&new_patch->plist, &microcode_cache);
}
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 44+ messages in thread
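Note: the stepping handling in the diff above can be modeled in userspace.
This is a rough sketch under stated assumptions, not the kernel code: a
fixed-size array stands in for the microcode_cache list, patches are reduced
to their 32-bit patch IDs, and the bit masks assume the Zen patch ID layout
(revision in bits 7:0, stepping in 11:8, model in 15:12, extended model in
23:20, extended family in 31:24). struct patch_cache, same_family_model() and
find_patch() are hypothetical names; update_cache() and patch_newer() only
borrow the names of the kernel functions they imitate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_MAX 8

/* Toy stand-in for the kernel's list of cached patches (assumption). */
struct patch_cache {
	uint32_t ids[CACHE_MAX];
	size_t count;
};

static unsigned int id_rev(uint32_t id)      { return id & 0xff; }
static unsigned int id_stepping(uint32_t id) { return (id >> 8) & 0xf; }

/* Same family and model; stepping, revision and reserved bits ignored. */
static bool same_family_model(uint32_t a, uint32_t b)
{
	return ((a ^ b) & 0xfff0f000u) == 0;
}

/* Returns -1 for different steppings, 1 if n is newer than p, else 0. */
static int patch_newer(uint32_t p, uint32_t n)
{
	if (id_stepping(p) != id_stepping(n))
		return -1;
	return id_rev(n) > id_rev(p);
}

/* Returns true if the cache changed. */
static bool update_cache(struct patch_cache *c, uint32_t new_id)
{
	for (size_t i = 0; i < c->count; i++) {
		int ret;

		if (!same_family_model(c->ids[i], new_id))
			continue;

		ret = patch_newer(c->ids[i], new_id);
		if (ret < 0)
			continue;	/* other stepping: keep looking */
		if (ret == 0)
			return false;	/* cached patch is already the latest */

		c->ids[i] = new_id;	/* same stepping, newer revision */
		return true;
	}

	if (c->count == CACHE_MAX)
		return false;		/* toy cache is full */

	c->ids[c->count++] = new_id;	/* no matching entry: add it */
	return true;
}

/* Exact family/model/stepping lookup; only the revision may differ. */
static const uint32_t *find_patch(const struct patch_cache *c, uint32_t cpu_rev)
{
	for (size_t i = 0; i < c->count; i++)
		if (((c->ids[i] ^ cpu_rev) & 0xfff0ff00u) == 0)
			return &c->ids[i];
	return NULL;
}
```

With the stepping always zapped and a boolean patch_newer(), as in the lines
the diff removes, patches for two steppings of the same model would presumably
compete for a single cache slot. The tri-state return value in this sketch
keeps one entry per stepping instead.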
* Re: AMD zen microcode updates breaks boot
2024-10-10 13:46 ` Borislav Petkov
@ 2024-10-10 13:50 ` Jens Axboe
2024-10-17 2:34 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-10 13:50 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/10/24 7:46 AM, Borislav Petkov wrote:
> On Wed, Oct 09, 2024 at 05:04:23AM -0600, Jens Axboe wrote:
>> Yep, 0xaa00215
>
> Found something: I'm not handling the stepping properly, below is a big diff
> along with debug printks. Can you pls run it and send me dmesg. I'm assuming
> the box will boot with it.
Sure will give it a spin. The box is tied up with testing for now, but
I should be able to run this later today and send you the output.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-10 13:50 ` Jens Axboe
@ 2024-10-17 2:34 ` Jens Axboe
2024-10-17 10:02 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-17 2:34 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/10/24 7:50 AM, Jens Axboe wrote:
> On 10/10/24 7:46 AM, Borislav Petkov wrote:
>> On Wed, Oct 09, 2024 at 05:04:23AM -0600, Jens Axboe wrote:
>>> Yep, 0xaa00215
>>
>> Found something: I'm not handling the stepping properly, below is a big diff
>> along with debug printks. Can you pls run it and send me dmesg. I'm assuming
>> the box will boot with it.
>
> Sure will give it a spin. The box is tied up with testing for now, but
> I should be able to run this later today and send you the output.
And then I totally forgot... Got it done now, but it still just hangs
after loading the kernel. No output.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-17 2:34 ` Jens Axboe
@ 2024-10-17 10:02 ` Borislav Petkov
2024-10-17 14:05 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-17 10:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Wed, Oct 16, 2024 at 08:34:09PM -0600, Jens Axboe wrote:
> And then I totally forgot... Got it done now, but it still just hangs
> after loading the kernel. No output.
Hmm...
Let's verify it still actually selects the correct patch. Diff below, it
should boot with it as it won't apply the microcode and you should be able to
gather dmesg.
Thx.
---
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index f63b051f25a0..ed5474cf683f 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -158,6 +158,9 @@ static union cpuid_1_eax ucode_rev_to_cpuid(unsigned int val)
c.family = 0xf;
c.ext_fam = p.ext_fam;
+ pr_info("%s: val: 0x%x, p.stepping: 0x%x, c.stepping: 0x%x\n",
+ __func__, val, p.stepping, c.stepping);
+
return c;
}
@@ -487,11 +490,13 @@ static int __apply_microcode_amd(struct microcode_amd *mc)
{
u32 rev, dummy;
- native_wrmsrl(MSR_AMD64_PATCH_LOADER, (u64)(long)&mc->hdr.data_code);
+// native_wrmsrl(MSR_AMD64_PATCH_LOADER, (u64)(long)&mc->hdr.data_code);
/* verify patch application was successful */
native_rdmsr(MSR_AMD64_PATCH_LEVEL, rev, dummy);
+ pr_info("%s: applying 0x%x, rev: 0x%x\n", __func__, mc->hdr.patch_id, rev);
+
if (rev != mc->hdr.patch_id)
return -1;
@@ -613,16 +618,22 @@ static int __init save_microcode_in_initrd(void)
}
early_initcall(save_microcode_in_initrd);
-static inline bool patch_cpus_equivalent(struct ucode_patch *p, struct ucode_patch *n)
+static inline bool patch_cpus_equivalent(struct ucode_patch *p,
+ struct ucode_patch *n,
+ bool ignore_stepping)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
union cpuid_1_eax p_cid = ucode_rev_to_cpuid(p->patch_id);
union cpuid_1_eax n_cid = ucode_rev_to_cpuid(n->patch_id);
- /* Zap stepping */
- p_cid.stepping = 0;
- n_cid.stepping = 0;
+ if (ignore_stepping) {
+ p_cid.stepping = 0;
+ n_cid.stepping = 0;
+ }
+
+ pr_info("%s: p_cid.full: 0x%x, n_cid.full: 0x%x\n",
+ __func__, p_cid.full, n_cid.full);
return p_cid.full == n_cid.full;
} else {
@@ -641,16 +652,22 @@ static struct ucode_patch *cache_find_patch(struct ucode_cpu_info *uci, u16 equi
n.equiv_cpu = equiv_cpu;
n.patch_id = uci->cpu_sig.rev;
+ pr_info("%s: equiv_cpu: 0x%x, patch_id: 0x%x\n",
+ __func__, equiv_cpu, uci->cpu_sig.rev);
+
WARN_ON_ONCE(!n.patch_id);
- list_for_each_entry(p, &microcode_cache, plist)
- if (patch_cpus_equivalent(p, &n))
+ list_for_each_entry(p, &microcode_cache, plist) {
+ if (patch_cpus_equivalent(p, &n, false)) {
+ pr_info("%s: using 0x%x\n", __func__, p->patch_id);
return p;
+ }
+ }
return NULL;
}
-static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
+static inline int patch_newer(struct ucode_patch *p, struct ucode_patch *n)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
@@ -659,6 +676,9 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
zp.ucode_rev = p->patch_id;
zn.ucode_rev = n->patch_id;
+ if (zn.stepping != zp.stepping)
+ return -1;
+
return zn.rev > zp.rev;
} else {
return n->patch_id > p->patch_id;
@@ -668,22 +688,32 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
static void update_cache(struct ucode_patch *new_patch)
{
struct ucode_patch *p;
+ int ret;
list_for_each_entry(p, &microcode_cache, plist) {
- if (patch_cpus_equivalent(p, new_patch)) {
- if (!patch_newer(p, new_patch)) {
+ if (patch_cpus_equivalent(p, new_patch, true)) {
+ ret = patch_newer(p, new_patch);
+ if (ret < 0)
+ continue;
+ else if (!ret) {
/* we already have the latest patch */
kfree(new_patch->data);
kfree(new_patch);
return;
}
+ pr_info("%s: replace 0x%x with 0x%x\n",
+ __func__, p->patch_id, new_patch->patch_id);
+
list_replace(&p->plist, &new_patch->plist);
kfree(p->data);
kfree(p);
return;
}
}
+
+ pr_info("%s: add patch: 0x%x\n", __func__, new_patch->patch_id);
+
/* no patch found, add it */
list_add_tail(&new_patch->plist, &microcode_cache);
}
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-17 10:02 ` Borislav Petkov
@ 2024-10-17 14:05 ` Jens Axboe
2024-10-17 14:13 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-17 14:05 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/17/24 4:02 AM, Borislav Petkov wrote:
> On Wed, Oct 16, 2024 at 08:34:09PM -0600, Jens Axboe wrote:
>> And then I totally forgot... Got it done now, but it still just hangs
>> after loading the kernel. No output.
>
> Hmm...
>
> Let's verify it still actually selects the correct patch. Diff below, it
> should boot with it as it won't apply the microcode and you should be able to
> gather dmesg.
Same thing, doesn't boot, just hangs after loading the kernel.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-17 14:05 ` Jens Axboe
@ 2024-10-17 14:13 ` Borislav Petkov
2024-10-17 14:23 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-17 14:13 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Thu, Oct 17, 2024 at 08:05:09AM -0600, Jens Axboe wrote:
> Same thing, doesn't boot, just hangs after loading the kernel.
Ohh, so it is not the patch application but something else in the code.
I presume "dis_ucode_ldr" on the kernel cmdline makes it boot?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-17 14:13 ` Borislav Petkov
@ 2024-10-17 14:23 ` Jens Axboe
2024-10-17 14:27 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-17 14:23 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/17/24 8:13 AM, Borislav Petkov wrote:
> On Thu, Oct 17, 2024 at 08:05:09AM -0600, Jens Axboe wrote:
>> Same thing, doesn't boot, just hangs after loading the kernel.
>
> Ohh, so it is not the patch application but something else in the code.
>
> I presume "dis_ucode_ldr" on the kernel cmdline makes it boot?
Yep, it boots with that added on current -git WITH the microcode
package installed.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-17 14:23 ` Jens Axboe
@ 2024-10-17 14:27 ` Borislav Petkov
2024-10-17 14:40 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-17 14:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Thu, Oct 17, 2024 at 08:23:10AM -0600, Jens Axboe wrote:
> Yep, it boots with that added on current -git WITH the microcode
> package installed.
Ok, lemme go stare.
In the meantime, is it in some way possible to catch serial output from the
box with my patch?
It should be dumping something to serial, a splat or so...
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-17 14:27 ` Borislav Petkov
@ 2024-10-17 14:40 ` Jens Axboe
2024-10-18 11:58 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-17 14:40 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/17/24 8:27 AM, Borislav Petkov wrote:
> On Thu, Oct 17, 2024 at 08:23:10AM -0600, Jens Axboe wrote:
>> Yep, it boots with that added on current -git WITH the microcode
>> package installed.
>
> Ok, lemme go stare.
>
> In the meantime, is it in some way possible to catch serial output from the
> box with my patch?
>
> It should be dumping something to serial, a splat or so...
Maybe? There might be some way to do it through ilo. I'm not near the box
today, but I can take a look at that tomorrow most likely.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-17 14:40 ` Jens Axboe
@ 2024-10-18 11:58 ` Borislav Petkov
2024-10-18 12:49 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-18 11:58 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Thu, Oct 17, 2024 at 08:40:18AM -0600, Jens Axboe wrote:
> Maybe? There might be some way to do it through ilo. I'm not near the box
> today, but I can take a look at that tomorrow most likely.
So I found a box exactly like yours:
[ 3.699583] smpboot: CPU0: AMD EPYC 9754 128-Core Processor (family: 0x19, model: 0xa0, stepping: 0x2)
and it updates just fine and just like on yours:
[ 28.457381] microcode: Current revision: 0x0aa00215
[ 28.462827] microcode: Updated early from: 0x0aa00215
The next thing that is catching my eye is your simulated NUMA config:
[ 1.668943] smp: Brought up 32 nodes, 512 CPUs
Lemme see if I can repro that here.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
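Note: the smpboot line and the microcode revision quoted in this message can
be cross-checked by hand. A minimal sketch, assuming the Zen patch ID layout
that ucode_rev_to_cpuid() in arch/x86/kernel/cpu/microcode/amd.c appears to
rely on (revision in bits 7:0, stepping in 11:8, model in 15:12, extended
model in 23:20, extended family in 31:24). struct fms and both function names
are made up for illustration and are not kernel API.

```c
#include <stdint.h>

/* Decoded family/model/stepping; the field names are illustrative. */
struct fms {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/*
 * Assumed Zen patch ID layout, least significant bit first:
 * rev[7:0], stepping[11:8], model[15:12], reserved[19:16],
 * ext_model[23:20], ext_fam[31:24].
 */
static struct fms decode_zen_patch_id(uint32_t patch_id)
{
	unsigned int stepping  = (patch_id >> 8)  & 0xf;
	unsigned int model     = (patch_id >> 12) & 0xf;
	unsigned int ext_model = (patch_id >> 20) & 0xf;
	unsigned int ext_fam   = (patch_id >> 24) & 0xff;
	struct fms out;

	/* The base family is fixed at 0xf, as in the diff context earlier. */
	out.family   = 0xf + ext_fam;
	out.model    = (ext_model << 4) | model;
	out.stepping = stepping;
	return out;
}

/* The low byte is the per-stepping patch revision under the same layout. */
static unsigned int zen_patch_rev(uint32_t patch_id)
{
	return patch_id & 0xff;
}
```

Under that assumed layout, 0x0aa00215 decodes to family 0x19, model 0xa0,
stepping 0x2 and revision 0x15, which agrees with the smpboot line above.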
* Re: AMD zen microcode updates breaks boot
2024-10-18 11:58 ` Borislav Petkov
@ 2024-10-18 12:49 ` Borislav Petkov
2024-10-18 13:30 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-18 12:49 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Fri, Oct 18, 2024 at 01:58:57PM +0200, Borislav Petkov wrote:
> The next thing that is catching my eye is your simulated NUMA config:
>
> [ 1.668943] smp: Brought up 32 nodes, 512 CPUs
>
> Lemme see if I can repro that here.
I can do only 4 nodes here:
[ 23.137188] smp: Brought up 4 nodes, 255 CPUs
But that thing boots fine too.
I guess the next thing to try is a two-socket box like yours.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-18 12:49 ` Borislav Petkov
@ 2024-10-18 13:30 ` Jens Axboe
2024-10-18 15:51 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-18 13:30 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/18/24 6:49 AM, Borislav Petkov wrote:
> On Fri, Oct 18, 2024 at 01:58:57PM +0200, Borislav Petkov wrote:
>> The next thing that is catching my eye is your simulated NUMA config:
>>
>> [ 1.668943] smp: Brought up 32 nodes, 512 CPUs
>>
>> Lemme see if I can repro that here.
>
> I can do only 4 nodes here:
>
> [ 23.137188] smp: Brought up 4 nodes, 255 CPUs
>
> But that thing boots fine too.
>
> I guess the next thing to try is a two-socket box like yours.
At least on mine, the BIOS has an option that says something like "L3
cache as numa domain", which is on and why there's 32 nodes on that box.
It's pretty handy for testing since there's a crap ton of CPUs, as it
makes affinity handling easier.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-18 13:30 ` Jens Axboe
@ 2024-10-18 15:51 ` Borislav Petkov
2024-10-18 16:45 ` Dr. David Alan Gilbert
2024-10-18 16:48 ` Jens Axboe
0 siblings, 2 replies; 44+ messages in thread
From: Borislav Petkov @ 2024-10-18 15:51 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Fri, Oct 18, 2024 at 07:30:15AM -0600, Jens Axboe wrote:
> At least on mine, the BIOS has an option that says something like "L3
> cache as numa domain", which is on and why there's 32 nodes on that box.
> It's pretty handy for testing since there's a crap ton of CPUs, as it
> makes affinity handling easier.
Right, so two boxes I tested with this:
* 2 socket, a bit different microcode:
[ 22.947525] smp: Brought up 32 nodes, 512 CPUs
* your CPU, one socket:
[ 26.830137] smp: Brought up 16 nodes, 255 CPUs
[ 37.770789] microcode: Current revision: 0x0aa00215
[ 37.776231] microcode: Updated early from: 0x0aa00215
and both boot with my debugging patch just fine.
Hmm.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-18 15:51 ` Borislav Petkov
@ 2024-10-18 16:45 ` Dr. David Alan Gilbert
2024-10-18 16:47 ` Jens Axboe
2024-10-18 16:48 ` Jens Axboe
1 sibling, 1 reply; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2024-10-18 16:45 UTC (permalink / raw)
To: Borislav Petkov
Cc: Jens Axboe, Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
* Borislav Petkov (bp@alien8.de) wrote:
> On Fri, Oct 18, 2024 at 07:30:15AM -0600, Jens Axboe wrote:
> > At least on mine, the BIOS has an option that says something like "L3
> > cache as numa domain", which is on and why there's 32 nodes on that box.
> > It's pretty handy for testing since there's a crap ton of CPUs, as it
> > makes affinity handling easier.
>
> Right, so two boxes I tested with this:
>
> * 2 socket, a bit different microcode:
>
> [ 22.947525] smp: Brought up 32 nodes, 512 CPUs
>
> * your CPU, one socket:
>
> [ 26.830137] smp: Brought up 16 nodes, 255 CPUs
(Probably unrelated but...)
What happened to number 256 ?
Dave
> [ 37.770789] microcode: Current revision: 0x0aa00215
> [ 37.776231] microcode: Updated early from: 0x0aa00215
>
> and both boot with my debugging patch just fine.
>
> Hmm.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-18 16:45 ` Dr. David Alan Gilbert
@ 2024-10-18 16:47 ` Jens Axboe
2024-10-18 17:59 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-18 16:47 UTC (permalink / raw)
To: Dr. David Alan Gilbert, Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/18/24 10:45 AM, Dr. David Alan Gilbert wrote:
> * Borislav Petkov (bp@alien8.de) wrote:
>> On Fri, Oct 18, 2024 at 07:30:15AM -0600, Jens Axboe wrote:
>>> At least on mine, the BIOS has an option that says something like "L3
>>> cache as numa domain", which is on and why there's 32 nodes on that box.
>>> It's pretty handy for testing since there's a crap ton of CPUs, as it
>>> makes affinity handling easier.
>>
>> Right, so two boxes I tested with this:
>>
>> * 2 socket, a bit different microcode:
>>
>> [ 22.947525] smp: Brought up 32 nodes, 512 CPUs
>>
>> * your CPU, one socket:
>>
>> [ 26.830137] smp: Brought up 16 nodes, 255 CPUs
>
> (Probably unrelated but...)
> What happened to number 256 ?
Quick guess: maybe the IOMMU was off, which will cap it to 255. IIRC...
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-18 15:51 ` Borislav Petkov
2024-10-18 16:45 ` Dr. David Alan Gilbert
@ 2024-10-18 16:48 ` Jens Axboe
2024-10-18 17:56 ` Borislav Petkov
1 sibling, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-18 16:48 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/18/24 9:51 AM, Borislav Petkov wrote:
> On Fri, Oct 18, 2024 at 07:30:15AM -0600, Jens Axboe wrote:
>> At least on mine, the BIOS has an option that says something like "L3
>> cache as numa domain", which is on and why there's 32 nodes on that box.
>> It's pretty handy for testing since there's a crap ton of CPUs, as it
>> makes affinity handling easier.
>
> Right, so two boxes I tested with this:
>
> * 2 socket, a bit different microcode:
>
> [ 22.947525] smp: Brought up 32 nodes, 512 CPUs
>
> * your CPU, one socket:
>
> [ 26.830137] smp: Brought up 16 nodes, 255 CPUs
> [ 37.770789] microcode: Current revision: 0x0aa00215
> [ 37.776231] microcode: Updated early from: 0x0aa00215
>
> and both boot with my debugging patch just fine.
>
> Hmm.
Funky... Not sure I'll have time to get a serial console on this
thing before next week. Like I mentioned before -rc1, maybe we
revert this change and I'll be happy to test patches as time
permits?
--
Jens Axboe
* Re: AMD zen microcode updates breaks boot
2024-10-18 16:48 ` Jens Axboe
@ 2024-10-18 17:56 ` Borislav Petkov
2024-10-18 18:03 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-18 17:56 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Fri, Oct 18, 2024 at 10:48:19AM -0600, Jens Axboe wrote:
> Funky... Not sure I'll have time to get a serial console on this
> thing before next week.
That would be much appreciated.
> Like I mentioned before -rc1, maybe we revert this change and I'll be happy
> to test patches as time permits?
That's my plan if we can't resolve it by -rc6-ish or so.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: AMD zen microcode updates breaks boot
2024-10-18 16:47 ` Jens Axboe
@ 2024-10-18 17:59 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 44+ messages in thread
From: Dr. David Alan Gilbert @ 2024-10-18 17:59 UTC (permalink / raw)
To: Jens Axboe
Cc: Borislav Petkov, Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
* Jens Axboe (axboe@kernel.dk) wrote:
> On 10/18/24 10:45 AM, Dr. David Alan Gilbert wrote:
> > * Borislav Petkov (bp@alien8.de) wrote:
> >> On Fri, Oct 18, 2024 at 07:30:15AM -0600, Jens Axboe wrote:
> >>> At least on mine, the BIOS has an option that says something like "L3
> >>> cache as numa domain", which is on and why there's 32 nodes on that box.
> >>> It's pretty handy for testing since there's a crap ton of CPUs, as it
> >>> makes affinity handling easier.
> >>
> >> Right, so two boxes I tested with this:
> >>
> >> * 2 socket, a bit different microcode:
> >>
> >> [ 22.947525] smp: Brought up 32 nodes, 512 CPUs
> >>
> >> * your CPU, one socket:
> >>
> >> [ 26.830137] smp: Brought up 16 nodes, 255 CPUs
> >
> > (Probably unrelated but...)
> > What happened to number 256 ?
>
> Quick guess: maybe the IOMMU was off, which will cap it at 255, IIRC.
Ah OK.
Dave
> --
> Jens Axboe
>
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
* Re: AMD zen microcode updates breaks boot
2024-10-18 17:56 ` Borislav Petkov
@ 2024-10-18 18:03 ` Jens Axboe
2024-10-18 23:01 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-18 18:03 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/18/24 11:56 AM, Borislav Petkov wrote:
> On Fri, Oct 18, 2024 at 10:48:19AM -0600, Jens Axboe wrote:
>> Funky... Not sure I'll have time to get a serial console on this
>> thing before next week.
>
> That would be much appreciated.
I will probably have some time to get that going on Monday. Just to set
expectations in terms of timing, the above should've read "before the end
of next week".
>> Like I mentioned before -rc1, maybe we revert this change and I'll be happy
>> to test patches as time permits?
>
> That's my plan if we can't resolve it by -rc6-ish or so.
That sounds fine.
--
Jens Axboe
* Re: AMD zen microcode updates breaks boot
2024-10-18 18:03 ` Jens Axboe
@ 2024-10-18 23:01 ` Jens Axboe
2024-10-19 9:37 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-18 23:01 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/18/24 12:03 PM, Jens Axboe wrote:
> On 10/18/24 11:56 AM, Borislav Petkov wrote:
>> On Fri, Oct 18, 2024 at 10:48:19AM -0600, Jens Axboe wrote:
>>> Funky... Not sure I'll have time to get a serial console on this
>>> thing before next week.
>>
>> That would be much appreciated.
>
> I will probably have some time to get that going on Monday. Just to set
> expectations in terms of timing, the above should've read "before the end
> of next week".
I took time out of "would otherwise have had a beer" time on a Friday
afternoon and got a serial console on it. Here's the crash at boot:
BUG: unable to handle page fault for address: 00000001000141ab
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc2+ #143
Hardware name: Dell Inc. PowerEdge R7625/06444F, BIOS 1.8.3 04/02/2024
RIP: 0010:load_microcode_amd.isra.0+0x334/0x450
Code: e7 06 48 81 c7 c0 97 7e 96 e8 f8 bb 4e 00 89 c7 89 c0 4c 8b 2c c5 40 20 e2 95 49 01 ed e8 44 f9 ff ff 48 85 c0 74 12 8b 40 1c <41> 39 85 28 01 00 00 0f 92 c0 0f b6 c0 09 c3 41 8d 4c 24 01 41 83
RSP: 0018:ffffa41ac00e7e08 EFLAGS: 00010282
RAX: 000000000aa00116 RBX: 0000000000000000 RCX: 0000000000a00000
RDX: 0000000000aa0f00 RSI: 0000000000000200 RDI: 000000000000000a
RBP: 0000000000014080 R08: 000000000000aa01 R09: ffff93ebc1106000
R10: ffffa41ac00e7df0 R11: 0000000000002000 R12: 0000000000000001
R13: 0000000100014083 R14: ffff93ebc103b400 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff93f127e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000001000141ab CR3: 000000ba2de28001 CR4: 0000000000370ef0
Call Trace:
<TASK>
? __die_body.cold+0x19/0x2b
? page_fault_oops+0x90/0x210
? load_microcode_amd.isra.0+0x185/0x450
? exc_page_fault+0x6c/0x130
? asm_exc_page_fault+0x22/0x30
? load_microcode_amd.isra.0+0x334/0x450
? load_microcode_amd.isra.0+0x32c/0x450
save_microcode_in_initrd+0x90/0xb0
? find_blobs_in_containers+0xb0/0xb0
do_one_initcall+0x2e/0x190
? try_to_wake_up+0x1c0/0x4b0
kernel_init_freeable+0xdd/0x210
? rest_init+0xc0/0xc0
kernel_init+0x16/0x120
ret_from_fork+0x2d/0x50
? rest_init+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in:
CR2: 00000001000141ab
---[ end trace 0000000000000000 ]---
RIP: 0010:load_microcode_amd.isra.0+0x334/0x450
Code: e7 06 48 81 c7 c0 97 7e 96 e8 f8 bb 4e 00 89 c7 89 c0 4c 8b 2c c5 40 20 e2 95 49 01 ed e8 44 f9 ff ff 48 85 c0 74 12 8b 40 1c <41> 39 85 28 01 00 00 0f 92 c0 0f b6 c0 09 c3 41 8d 4c 24 01 41 83
RSP: 0018:ffffa41ac00e7e08 EFLAGS: 00010282
RAX: 000000000aa00116 RBX: 0000000000000000 RCX: 0000000000a00000
RDX: 0000000000aa0f00 RSI: 0000000000000200 RDI: 000000000000000a
RBP: 0000000000014080 R08: 000000000000aa01 R09: ffff93ebc1106000
R10: ffffa41ac00e7df0 R11: 0000000000002000 R12: 0000000000000001
R13: 0000000100014083 R14: ffff93ebc103b400 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff93f127e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000001000141ab CR3: 000000ba2de28001 CR4: 0000000000370ef0
note: swapper/0[1] exited with irqs disabled
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
which appears to be here:
(gdb) l *load_microcode_amd+0x334
0xffffffff810914a4 is in load_microcode_amd (arch/x86/kernel/cpu/microcode/amd.c:971).
966
967 p = find_patch(cpu);
968 if (!p)
969 continue;
970
971 if (c->microcode >= p->patch_id)
972 continue;
973
974 ret = UCODE_NEW;
975 }
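A quick cross-check of the register dump against that source line, as a sketch rather than a diagnosis. It assumes the bytes at the fault marker in the Code: line, <41> 39 85 28 01 00 00, decode to cmp %eax,0x128(%r13), i.e. a 32-bit compare of p->patch_id in EAX against c->microcode read through the pointer in R13. It also assumes 4-level paging. The helper names are made up for illustration; the register values are copied from the oops.

```python
# Register values copied from the oops above.
R13 = 0x0000000100014083   # assumed to hold c (the cpuinfo pointer)
RBP = 0x0000000000014080
RAX = 0x000000000AA00116   # assumed to hold p->patch_id
CR2 = 0x00000001000141AB   # faulting address reported by the CPU

# Displacement taken from the faulting instruction bytes (28 01 00 00,
# little-endian). Treating it as the offset of c->microcode is an assumption.
DISP = int.from_bytes(bytes.fromhex("28010000"), "little")


def effective_address(base: int, disp: int) -> int:
    """Address a base+displacement operand reads from (64-bit wraparound)."""
    return (base + disp) & 0xFFFFFFFFFFFFFFFF


def is_kernel_half_address(addr: int) -> bool:
    """True if bits 63:47 are all set (upper canonical half, 4-level paging)."""
    return addr >> 47 == 0x1FFFF


if __name__ == "__main__":
    print("displacement:       ", hex(DISP))
    print("R13 + displacement: ", hex(effective_address(R13, DISP)))
    print("CR2:                ", hex(CR2))
    print("R13 in kernel half: ", is_kernel_half_address(R13))
    print("R13 - RBP:          ", hex(R13 - RBP))
```

Under those assumptions R13 plus the displacement equals CR2, and R13 is not a kernel-half address, which would be consistent with c being the bad pointer while p itself was readable.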
I was heading home, so I didn't poke at it further. As you know, there are 32
nodes in this system, which look like this:
axboe@r7625 /s/d/s/node> grep . node*/cpulist
node0/cpulist:0-7,256-263
node1/cpulist:8-15,264-271
node2/cpulist:16-23,272-279
node3/cpulist:24-31,280-287
node4/cpulist:32-39,288-295
node5/cpulist:40-47,296-303
node6/cpulist:48-55,304-311
node7/cpulist:56-63,312-319
node8/cpulist:64-71,320-327
node9/cpulist:72-79,328-335
node10/cpulist:80-87,336-343
node11/cpulist:88-95,344-351
node12/cpulist:96-103,352-359
node13/cpulist:104-111,360-367
node14/cpulist:112-119,368-375
node15/cpulist:120-127,376-383
node16/cpulist:128-135,384-391
node17/cpulist:136-143,392-399
node18/cpulist:144-151,400-407
node19/cpulist:152-159,408-415
node20/cpulist:160-167,416-423
node21/cpulist:168-175,424-431
node22/cpulist:176-183,432-439
node23/cpulist:184-191,440-447
node24/cpulist:192-199,448-455
node25/cpulist:200-207,456-463
node26/cpulist:208-215,464-471
node27/cpulist:216-223,472-479
node28/cpulist:224-231,480-487
node29/cpulist:232-239,488-495
node30/cpulist:240-247,496-503
node31/cpulist:248-255,504-511
and .config has:
CONFIG_NODES_SHIFT=5
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
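The cpulist output above follows a regular pattern: node N holds CPUs 8N to 8N+7 plus 256+8N to 256+8N+7 (presumably the SMT siblings, which is an assumption). CONFIG_NODES_SHIFT=5 allows 1 << 5 = 32 node IDs, so the 32 nodes exactly fill the configured range. A minimal sketch that regenerates the listing; the function and constant names are illustrative only:

```python
NODES_SHIFT = 5          # CONFIG_NODES_SHIFT from the .config above
RANGE_WIDTH = 8          # width of each contiguous CPU range in the listing
SECOND_RANGE_BASE = 256  # first CPU of node 0's second range


def max_nodes(shift: int) -> int:
    """Number of node IDs a given NODES_SHIFT allows (1 << shift)."""
    return 1 << shift


def cpus(node: int) -> set:
    """CPU numbers the listing assigns to a node."""
    lo = node * RANGE_WIDTH
    hi = SECOND_RANGE_BASE + lo
    return set(range(lo, lo + RANGE_WIDTH)) | set(range(hi, hi + RANGE_WIDTH))


def cpulist(node: int) -> str:
    """Format a node's CPUs the way the sysfs cpulist file shows them."""
    lo = node * RANGE_WIDTH
    hi = SECOND_RANGE_BASE + lo
    return f"{lo}-{lo + RANGE_WIDTH - 1},{hi}-{hi + RANGE_WIDTH - 1}"


if __name__ == "__main__":
    for node in range(max_nodes(NODES_SHIFT)):
        print(f"node{node}/cpulist:{cpulist(node)}")
```

Every node in this layout has CPUs, and the 32 sets together cover CPUs 0 to 511 exactly once.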
--
Jens Axboe
* Re: AMD zen microcode updates breaks boot
2024-10-18 23:01 ` Jens Axboe
@ 2024-10-19 9:37 ` Borislav Petkov
2024-10-19 13:54 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-19 9:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Fri, Oct 18, 2024 at 05:01:18PM -0600, Jens Axboe wrote:
> I took time out of "would otherwise have had a beer" time on a Friday
No worries, next time we meet at a conference, I'm buying! :-)
> which appears to be here:
>
> (gdb) l *load_microcode_amd+0x334
> 0xffffffff810914a4 is in load_microcode_amd (arch/x86/kernel/cpu/microcode/amd.c:971).
Ok, first things first, this line 971 points to the code *without* my big
debugging patch. Is that correct?
With my patch that line should be 999 because of the debugging output and
other changes.
If so, please run with it, because it also has a fix for matching patch
steppings. Then send me the whole serial log, because it'll have debug
info for me to stare at.
Thanks a lot!
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: AMD zen microcode updates breaks boot
2024-10-19 9:37 ` Borislav Petkov
@ 2024-10-19 13:54 ` Jens Axboe
2024-10-19 23:21 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-19 13:54 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/19/24 3:37 AM, Borislav Petkov wrote:
> On Fri, Oct 18, 2024 at 05:01:18PM -0600, Jens Axboe wrote:
>> I took time out of "would otherwise have had a beer" time on a Friday
>
> No worries, next time we meet at a conference, I'm buying! :-)
;-)
>> which appears to be here:
>>
>> (gdb) l *load_microcode_amd+0x334
>> 0xffffffff810914a4 is in load_microcode_amd (arch/x86/kernel/cpu/microcode/amd.c:971).
>
> Ok, first things first, this line 971 points to the code *without* my big
> debugging patch. Is that correct?
Right, this was just a boot of the kernel I'm running now.
> With my patch that line should be 999 because of the debugging output and
> other changes.
>
> If so, please run with it, because it also has a fix for matching patch
> steppings. Then send me the whole serial log, because it'll have debug
> info for me to stare at.
Added that, and here's the full boot output until it crashes. I sent you
the full thing as there are some microcode debug prints initially, and
then some later on. Didn't want to miss any.
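The ucode_rev_to_cpuid lines that open the log below can be read as follows, as a minimal sketch: in all seven lines the printed stepping equals bits 11:8 of val. The bit position is inferred from the log output only, not from the debugging patch, and the function names and the regular expression are illustrative.

```python
import re

# The seven debug lines from the top of the serial log, copied verbatim.
LOG = """\
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
"""

LINE_RE = re.compile(
    r"val: (0x[0-9a-f]+), p\.stepping: (0x[0-9a-f]+), c\.stepping: (0x[0-9a-f]+)"
)


def stepping_from_patch_id(val: int) -> int:
    """Bits 11:8 of the patch ID (hypothetical helper, position inferred from the log)."""
    return (val >> 8) & 0xF


def parse(log: str) -> list:
    """Return (val, p_stepping, c_stepping) integer tuples, one per debug line."""
    return [tuple(int(g, 16) for g in m.groups()) for m in LINE_RE.finditer(log)]


if __name__ == "__main__":
    for val, p_step, c_step in parse(LOG):
        derived = stepping_from_patch_id(val)
        print(f"{val:#010x}: log says {p_step:#x}/{c_step:#x}, bits 11:8 give {derived:#x}")
```

Running the same check over a longer log is a cheap way to spot a line where the two stepping values disagree.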
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Measured initrd data into PCR 9
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
Linux version 6.12.0-rc2+ (axboe@r7625) (gcc (Debian 14.2.0-6) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.1) #144 SMP Sat Oct 19 13:43:43 UTC 2024
Command line: BOOT_IMAGE=/vmlinuz-6.12.0-rc2+ root=UUID=d155ab07-2863-462d-931e-ce417aca7adb ro nvme.poll_queues=16 mitigations=off console=ttyS1,115200 console=tty0
KERNEL supported cpus:
AMD AuthenticAMD
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000008efff] usable
BIOS-e820: [mem 0x000000000008f000-0x000000000008ffff] ACPI NVS
BIOS-e820: [mem 0x0000000000090000-0x000000000009ffff] usable
BIOS-e820: [mem 0x0000000000100000-0x0000000059cccfff] usable
BIOS-e820: [mem 0x0000000059ccd000-0x0000000059ccffff] reserved
BIOS-e820: [mem 0x0000000059cd0000-0x00000000677cefff] usable
BIOS-e820: [mem 0x00000000677cf000-0x000000006dfcefff] reserved
BIOS-e820: [mem 0x000000006dfcf000-0x000000006edfefff] ACPI NVS
BIOS-e820: [mem 0x000000006edff000-0x000000006effefff] ACPI data
BIOS-e820: [mem 0x000000006efff000-0x000000006effffff] usable
BIOS-e820: [mem 0x000000006f000000-0x000000006fffffff] ACPI NVS
BIOS-e820: [mem 0x0000000070000000-0x000000008fffffff] reserved
BIOS-e820: [mem 0x000000009c000000-0x000000009cffffff] reserved
BIOS-e820: [mem 0x00000000a9000000-0x00000000a9ffffff] reserved
BIOS-e820: [mem 0x00000000b6000000-0x00000000b6ffffff] reserved
BIOS-e820: [mem 0x00000000c3000000-0x00000000c3ffffff] reserved
BIOS-e820: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
BIOS-e820: [mem 0x00000000dd000000-0x00000000ddffffff] reserved
BIOS-e820: [mem 0x00000000ea000000-0x00000000eaffffff] reserved
BIOS-e820: [mem 0x00000000fd000000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000187fdfffff] usable
BIOS-e820: [mem 0x000000187fe00000-0x000000187fffffff] reserved
BIOS-e820: [mem 0x0000001880000000-0x000000307fdfffff] usable
BIOS-e820: [mem 0x000000307fe00000-0x000000307fffffff] reserved
BIOS-e820: [mem 0x0000003080000000-0x000000487fdfffff] usable
BIOS-e820: [mem 0x000000487fe00000-0x000000487fffffff] reserved
BIOS-e820: [mem 0x0000004880000000-0x000000607edfffff] usable
BIOS-e820: [mem 0x000000607ee00000-0x000000607fffffff] reserved
BIOS-e820: [mem 0x0000006080000000-0x000000787fdfffff] usable
BIOS-e820: [mem 0x000000787fe00000-0x000000787fffffff] reserved
BIOS-e820: [mem 0x0000007880000000-0x000000907fdfffff] usable
BIOS-e820: [mem 0x000000907fe00000-0x000000907fffffff] reserved
BIOS-e820: [mem 0x0000009080000000-0x000000a87fdfffff] usable
BIOS-e820: [mem 0x000000a87fe00000-0x000000a87fffffff] reserved
BIOS-e820: [mem 0x000000a880000000-0x000000c07e7bffff] usable
BIOS-e820: [mem 0x000000c07e7c0000-0x000000c07fbfffff] reserved
BIOS-e820: [mem 0x000000c07fc00000-0x000000c07fcfffff] usable
BIOS-e820: [mem 0x000000c07fd00000-0x000000c07fffffff] reserved
BIOS-e820: [mem 0x0000010000000000-0x00000100201fffff] reserved
BIOS-e820: [mem 0x000001dfa0000000-0x000001dfc01fffff] reserved
BIOS-e820: [mem 0x000002bf40000000-0x000002bf601fffff] reserved
BIOS-e820: [mem 0x0000039ee0000000-0x0000039f001fffff] reserved
BIOS-e820: [mem 0x0000047e80000000-0x0000047ea01fffff] reserved
BIOS-e820: [mem 0x0000055e20000000-0x0000055e401fffff] reserved
BIOS-e820: [mem 0x0000063dc0000000-0x0000063de01fffff] reserved
BIOS-e820: [mem 0x0000071d60000000-0x0000071d801fffff] reserved
Kernel compiled without mitigations, ignoring 'mitigations'; system may still be vulnerable
NX (Execute Disable) protection: active
APIC: Static calls initialized
extended physical RAM map:
reserve setup_data: [mem 0x0000000000000000-0x000000000008efff] usable
reserve setup_data: [mem 0x000000000008f000-0x000000000008ffff] ACPI NVS
reserve setup_data: [mem 0x0000000000090000-0x000000000009ffff] usable
reserve setup_data: [mem 0x0000000000100000-0x000000004bd0701f] usable
reserve setup_data: [mem 0x000000004bd07020-0x000000004bd6965f] usable
reserve setup_data: [mem 0x000000004bd69660-0x000000004bd6a01f] usable
reserve setup_data: [mem 0x000000004bd6a020-0x000000004bdcc65f] usable
reserve setup_data: [mem 0x000000004bdcc660-0x000000004bdcd01f] usable
reserve setup_data: [mem 0x000000004bdcd020-0x000000004be01c5f] usable
reserve setup_data: [mem 0x000000004be01c60-0x000000004be0201f] usable
reserve setup_data: [mem 0x000000004be02020-0x000000004be36c5f] usable
reserve setup_data: [mem 0x000000004be36c60-0x000000004be3701f] usable
reserve setup_data: [mem 0x000000004be37020-0x000000004be3f05f] usable
reserve setup_data: [mem 0x000000004be3f060-0x000000004be4001f] usable
reserve setup_data: [mem 0x000000004be40020-0x000000004be5325f] usable
reserve setup_data: [mem 0x000000004be53260-0x000000004be5401f] usable
reserve setup_data: [mem 0x000000004be54020-0x000000004be8a45f] usable
reserve setup_data: [mem 0x000000004be8a460-0x000000004be8b01f] usable
reserve setup_data: [mem 0x000000004be8b020-0x000000004bec145f] usable
reserve setup_data: [mem 0x000000004bec1460-0x000000004bec201f] usable
reserve setup_data: [mem 0x000000004bec2020-0x000000004bef845f] usable
reserve setup_data: [mem 0x000000004bef8460-0x000000004bef901f] usable
reserve setup_data: [mem 0x000000004bef9020-0x000000004bf2f45f] usable
reserve setup_data: [mem 0x000000004bf2f460-0x0000000059cccfff] usable
reserve setup_data: [mem 0x0000000059ccd000-0x0000000059ccffff] reserved
reserve setup_data: [mem 0x0000000059cd0000-0x00000000677cefff] usable
reserve setup_data: [mem 0x00000000677cf000-0x000000006dfcefff] reserved
reserve setup_data: [mem 0x000000006dfcf000-0x000000006edfefff] ACPI NVS
reserve setup_data: [mem 0x000000006edff000-0x000000006effefff] ACPI data
reserve setup_data: [mem 0x000000006efff000-0x000000006effffff] usable
reserve setup_data: [mem 0x000000006f000000-0x000000006fffffff] ACPI NVS
reserve setup_data: [mem 0x0000000070000000-0x000000008fffffff] reserved
reserve setup_data: [mem 0x000000009c000000-0x000000009cffffff] reserved
reserve setup_data: [mem 0x00000000a9000000-0x00000000a9ffffff] reserved
reserve setup_data: [mem 0x00000000b6000000-0x00000000b6ffffff] reserved
reserve setup_data: [mem 0x00000000c3000000-0x00000000c3ffffff] reserved
reserve setup_data: [mem 0x00000000d0000000-0x00000000d0ffffff] reserved
reserve setup_data: [mem 0x00000000dd000000-0x00000000ddffffff] reserved
reserve setup_data: [mem 0x00000000ea000000-0x00000000eaffffff] reserved
reserve setup_data: [mem 0x00000000fd000000-0x00000000ffffffff] reserved
reserve setup_data: [mem 0x0000000100000000-0x000000187fdfffff] usable
reserve setup_data: [mem 0x000000187fe00000-0x000000187fffffff] reserved
reserve setup_data: [mem 0x0000001880000000-0x000000307fdfffff] usable
reserve setup_data: [mem 0x000000307fe00000-0x000000307fffffff] reserved
reserve setup_data: [mem 0x0000003080000000-0x000000487fdfffff] usable
reserve setup_data: [mem 0x000000487fe00000-0x000000487fffffff] reserved
reserve setup_data: [mem 0x0000004880000000-0x000000607edfffff] usable
reserve setup_data: [mem 0x000000607ee00000-0x000000607fffffff] reserved
reserve setup_data: [mem 0x0000006080000000-0x000000787fdfffff] usable
reserve setup_data: [mem 0x000000787fe00000-0x000000787fffffff] reserved
reserve setup_data: [mem 0x0000007880000000-0x000000907fdfffff] usable
reserve setup_data: [mem 0x000000907fe00000-0x000000907fffffff] reserved
reserve setup_data: [mem 0x0000009080000000-0x000000a87fdfffff] usable
reserve setup_data: [mem 0x000000a87fe00000-0x000000a87fffffff] reserved
reserve setup_data: [mem 0x000000a880000000-0x000000c07e7bffff] usable
reserve setup_data: [mem 0x000000c07e7c0000-0x000000c07fbfffff] reserved
reserve setup_data: [mem 0x000000c07fc00000-0x000000c07fcfffff] usable
reserve setup_data: [mem 0x000000c07fd00000-0x000000c07fffffff] reserved
reserve setup_data: [mem 0x0000010000000000-0x00000100201fffff] reserved
reserve setup_data: [mem 0x000001dfa0000000-0x000001dfc01fffff] reserved
reserve setup_data: [mem 0x000002bf40000000-0x000002bf601fffff] reserved
reserve setup_data: [mem 0x0000039ee0000000-0x0000039f001fffff] reserved
reserve setup_data: [mem 0x0000047e80000000-0x0000047ea01fffff] reserved
reserve setup_data: [mem 0x0000055e20000000-0x0000055e401fffff] reserved
reserve setup_data: [mem 0x0000063dc0000000-0x0000063de01fffff] reserved
reserve setup_data: [mem 0x0000071d60000000-0x0000071d801fffff] reserved
efi: EFI v2.7 by Dell Inc.
efi: ACPI=0x6effe000 ACPI 2.0=0x6effe014 TPMFinalLog=0x6ed95000 MEMATTR=0x62fadaa0 SMBIOS=0x69185000 SMBIOS 3.0=0x69183000 INITRD=0x54797da0 RNG=0x6eefd020 TPMEventLog=0x6eeee020
random: crng init done
efi: Remove mem55: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
efi: Remove mem56: MMIO range=[0x9c000000-0x9cffffff] (16MB) from e820 map
efi: Remove mem57: MMIO range=[0xa9000000-0xa9ffffff] (16MB) from e820 map
efi: Remove mem58: MMIO range=[0xb6000000-0xb6ffffff] (16MB) from e820 map
efi: Remove mem59: MMIO range=[0xc3000000-0xc3ffffff] (16MB) from e820 map
efi: Remove mem60: MMIO range=[0xd0000000-0xd0ffffff] (16MB) from e820 map
efi: Remove mem61: MMIO range=[0xdd000000-0xddffffff] (16MB) from e820 map
efi: Remove mem62: MMIO range=[0xea000000-0xeaffffff] (16MB) from e820 map
efi: Remove mem63: MMIO range=[0xfd000000-0xffffffff] (48MB) from e820 map
efi: Remove mem73: MMIO range=[0x10000000000-0x100201fffff] (514MB) from e820 map
efi: Remove mem74: MMIO range=[0x1dfa0000000-0x1dfc01fffff] (514MB) from e820 map
efi: Remove mem75: MMIO range=[0x2bf40000000-0x2bf601fffff] (514MB) from e820 map
efi: Remove mem76: MMIO range=[0x39ee0000000-0x39f001fffff] (514MB) from e820 map
efi: Remove mem77: MMIO range=[0x47e80000000-0x47ea01fffff] (514MB) from e820 map
efi: Remove mem78: MMIO range=[0x55e20000000-0x55e401fffff] (514MB) from e820 map
efi: Remove mem79: MMIO range=[0x63dc0000000-0x63de01fffff] (514MB) from e820 map
efi: Remove mem80: MMIO range=[0x71d60000000-0x71d801fffff] (514MB) from e820 map
SMBIOS 3.3.0 present.
DMI: Dell Inc. PowerEdge R7625/06444F, BIOS 1.8.3 04/02/2024
DMI: Memory slots populated: 24/24
tsc: Fast TSC calibration using PIT
tsc: Detected 2250.007 MHz processor
last_pfn = 0xc07fd00 max_arch_pfn = 0x400000000
MTRR map: 5 entries (2 fixed + 3 variable; max 19), built from 9 variable MTRRs
x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
x2apic: enabled by BIOS, switching to x2apic ops
last_pfn = 0x6f000 max_arch_pfn = 0x400000000
Using GB pages for direct mapping
Secure boot disabled
RAMDISK: [mem 0x4bf30000-0x4e31dfff]
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x000000006EFFE014 000024 (v02 DELL )
ACPI: XSDT 0x000000006EF00188 0000DC (v01 DELL PE_SC3 00000002 DELL 00000001)
ACPI: FACP 0x000000006EFFA000 000114 (v06 DELL PE_SC3 00000002 DELL 00000001)
ACPI: DSDT 0x000000006EF26000 0C986A (v02 DELL PE_SC3 00000002 DELL 00000001)
ACPI: FACS 0x000000006EDAF000 000040
ACPI: BERT 0x000000006EFFC000 000030 (v01 DELL PE_SC3 00000001 DELL 00000001)
ACPI: HEST 0x000000006EFFB000 0008D4 (v01 DELL PE_SC3 00000001 DELL 00000001)
ACPI: HPET 0x000000006EFF9000 000038 (v01 DELL PE_SC3 00000002 DELL 00000001)
ACPI: APIC 0x000000006EFF2000 0060BE (v05 DELL PE_SC3 00000002 DELL 00000001)
ACPI: MCFG 0x000000006EFF1000 00003C (v01 DELL PE_SC3 00000002 DELL 00000001)
ACPI: WSMT 0x000000006EFF0000 000028 (v01 DELL PE_SC3 00000002 DELL 00000001)
ACPI: SSDT 0x000000006EF25000 000513 (v02 DELL xhc_port 00000001 INTL 20210331)
ACPI: SSDT 0x000000006EF23000 00111F (v02 AMD CPMRAS 00000001 INTL 20210331)
ACPI: SSDT 0x000000006EF22000 000623 (v02 DELL Tpm2Tabl 00001000 INTL 20210331)
ACPI: TPM2 0x000000006EF21000 00004C (v04 DELL PE_SC3 00000002 DELL 00000001)
ACPI: EINJ 0x000000006EF20000 000170 (v01 AMD PE_SC3 00000001 AMD 00000001)
ACPI: ASPT 0x000000006EF1F000 000070 (v02 DELL AmdTable 00000001 AMD 00000001)
ACPI: SSDT 0x000000006EF0F000 00FE24 (v02 DELL PE_SC3 00000001 AMD 00000001)
ACPI: SRAT 0x000000006EF0B000 003580 (v03 DELL PE_SC3 00000001 AMD 00000001)
ACPI: MSCT 0x000000006EF0A000 00004E (v01 DELL PE_SC3 00000000 AMD 00000001)
ACPI: SLIT 0x000000006EF09000 00042C (v01 DELL PE_SC3 00000001 AMD 00000001)
ACPI: HMAT 0x000000006EF07000 001768 (v02 DELL AmdTable 00000001 AMD 00000001)
ACPI: IVRS 0x000000006EF06000 000370 (v02 DELL PE_SC3 00000001 AMD 00000001)
ACPI: SPCR 0x000000006EF05000 000050 (v02 DELL PE_SC3 00000002 DELL 00000001)
ACPI: SSDT 0x000000006EF01000 0030DC (v02 DELL PE_SC3 00000002 DELL 00000001)
ACPI: NBFT 0x000000006EFFD000 000123 (v01 DELL PE_SC3 00000000 DELL 00000001)
ACPI: SSDT 0x000000006EEFE000 00193C (v02 AMD CPMCMN 00000001 INTL 20210331)
ACPI: Reserving FACP table memory at [mem 0x6effa000-0x6effa113]
ACPI: Reserving DSDT table memory at [mem 0x6ef26000-0x6efef869]
ACPI: Reserving FACS table memory at [mem 0x6edaf000-0x6edaf03f]
ACPI: Reserving BERT table memory at [mem 0x6effc000-0x6effc02f]
ACPI: Reserving HEST table memory at [mem 0x6effb000-0x6effb8d3]
ACPI: Reserving HPET table memory at [mem 0x6eff9000-0x6eff9037]
ACPI: Reserving APIC table memory at [mem 0x6eff2000-0x6eff80bd]
ACPI: Reserving MCFG table memory at [mem 0x6eff1000-0x6eff103b]
ACPI: Reserving WSMT table memory at [mem 0x6eff0000-0x6eff0027]
ACPI: Reserving SSDT table memory at [mem 0x6ef25000-0x6ef25512]
ACPI: Reserving SSDT table memory at [mem 0x6ef23000-0x6ef2411e]
ACPI: Reserving SSDT table memory at [mem 0x6ef22000-0x6ef22622]
ACPI: Reserving TPM2 table memory at [mem 0x6ef21000-0x6ef2104b]
ACPI: Reserving EINJ table memory at [mem 0x6ef20000-0x6ef2016f]
ACPI: Reserving ASPT table memory at [mem 0x6ef1f000-0x6ef1f06f]
ACPI: Reserving SSDT table memory at [mem 0x6ef0f000-0x6ef1ee23]
ACPI: Reserving SRAT table memory at [mem 0x6ef0b000-0x6ef0e57f]
ACPI: Reserving MSCT table memory at [mem 0x6ef0a000-0x6ef0a04d]
ACPI: Reserving SLIT table memory at [mem 0x6ef09000-0x6ef0942b]
ACPI: Reserving HMAT table memory at [mem 0x6ef07000-0x6ef08767]
ACPI: Reserving IVRS table memory at [mem 0x6ef06000-0x6ef0636f]
ACPI: Reserving SPCR table memory at [mem 0x6ef05000-0x6ef0504f]
ACPI: Reserving SSDT table memory at [mem 0x6ef01000-0x6ef040db]
ACPI: Reserving NBFT table memory at [mem 0x6effd000-0x6effd122]
ACPI: Reserving SSDT table memory at [mem 0x6eefe000-0x6eeff93b]
APIC: Switched APIC routing to: cluster x2apic
ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
ACPI: SRAT: Node 0 PXM 0 [mem 0x000c0000-0x7fffffff]
ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x67fffffff]
ACPI: SRAT: Node 1 PXM 1 [mem 0x680000000-0xc7fffffff]
ACPI: SRAT: Node 2 PXM 8 [mem 0xc80000000-0x127fffffff]
ACPI: SRAT: Node 4 PXM 4 [mem 0x1880000000-0x1e7fffffff]
ACPI: SRAT: Node 3 PXM 9 [mem 0x1280000000-0x187fffffff]
ACPI: SRAT: Node 5 PXM 5 [mem 0x1e80000000-0x247fffffff]
ACPI: SRAT: Node 6 PXM 12 [mem 0x2480000000-0x2a7fffffff]
ACPI: SRAT: Node 8 PXM 6 [mem 0x3080000000-0x367fffffff]
ACPI: SRAT: Node 7 PXM 13 [mem 0x2a80000000-0x307fffffff]
ACPI: SRAT: Node 9 PXM 7 [mem 0x3680000000-0x3c7fffffff]
ACPI: SRAT: Node 10 PXM 14 [mem 0x3c80000000-0x427fffffff]
ACPI: SRAT: Node 12 PXM 2 [mem 0x4880000000-0x4e7fffffff]
ACPI: SRAT: Node 11 PXM 15 [mem 0x4280000000-0x487fffffff]
ACPI: SRAT: Node 13 PXM 3 [mem 0x4e80000000-0x547fffffff]
ACPI: SRAT: Node 14 PXM 10 [mem 0x5480000000-0x5a7fffffff]
ACPI: SRAT: Node 15 PXM 11 [mem 0x5a80000000-0x607fffffff]
ACPI: SRAT: Node 16 PXM 16 [mem 0x6080000000-0x667fffffff]
ACPI: SRAT: Node 17 PXM 17 [mem 0x6680000000-0x6c7fffffff]
ACPI: SRAT: Node 18 PXM 24 [mem 0x6c80000000-0x727fffffff]
ACPI: SRAT: Node 20 PXM 20 [mem 0x7880000000-0x7e7fffffff]
ACPI: SRAT: Node 19 PXM 25 [mem 0x7280000000-0x787fffffff]
ACPI: SRAT: Node 21 PXM 21 [mem 0x7e80000000-0x847fffffff]
ACPI: SRAT: Node 22 PXM 28 [mem 0x8480000000-0x8a7fffffff]
ACPI: SRAT: Node 24 PXM 22 [mem 0x9080000000-0x967fffffff]
ACPI: SRAT: Node 23 PXM 29 [mem 0x8a80000000-0x907fffffff]
ACPI: SRAT: Node 25 PXM 23 [mem 0x9680000000-0x9c7fffffff]
ACPI: SRAT: Node 26 PXM 30 [mem 0x9c80000000-0xa27fffffff]
ACPI: SRAT: Node 28 PXM 18 [mem 0xa880000000-0xae7fffffff]
ACPI: SRAT: Node 27 PXM 31 [mem 0xa280000000-0xa87fffffff]
ACPI: SRAT: Node 29 PXM 19 [mem 0xae80000000-0xb47fffffff]
ACPI: SRAT: Node 30 PXM 26 [mem 0xb480000000-0xba7fffffff]
ACPI: SRAT: Node 31 PXM 27 [mem 0xba80000000-0xc07fffffff]
NUMA: Node 0 [mem 0x00001000-0x0009ffff] + [mem 0x000c0000-0x7fffffff] -> [mem 0x00001000-0x7fffffff]
NUMA: Node 0 [mem 0x00001000-0x7fffffff] + [mem 0x100000000-0x67fffffff] -> [mem 0x00001000-0x67fffffff]
NODE_DATA(0) allocated [mem 0x67fffdf40-0x67fffffff]
NODE_DATA(1) allocated [mem 0xc7fffdf40-0xc7fffffff]
NODE_DATA(2) allocated [mem 0x127fffdf40-0x127fffffff]
NODE_DATA(3) allocated [mem 0x187fdfdf40-0x187fdfffff]
NODE_DATA(4) allocated [mem 0x1e7fffdf40-0x1e7fffffff]
NODE_DATA(5) allocated [mem 0x247fffdf40-0x247fffffff]
NODE_DATA(6) allocated [mem 0x2a7fffdf40-0x2a7fffffff]
NODE_DATA(7) allocated [mem 0x307fdfdf40-0x307fdfffff]
NODE_DATA(8) allocated [mem 0x367fffdf40-0x367fffffff]
NODE_DATA(9) allocated [mem 0x3c7fffdf40-0x3c7fffffff]
NODE_DATA(10) allocated [mem 0x427fffdf40-0x427fffffff]
NODE_DATA(11) allocated [mem 0x487fdfdf40-0x487fdfffff]
NODE_DATA(12) allocated [mem 0x4e7fffdf40-0x4e7fffffff]
NODE_DATA(13) allocated [mem 0x547fffdf40-0x547fffffff]
NODE_DATA(14) allocated [mem 0x5a7fffdf40-0x5a7fffffff]
NODE_DATA(15) allocated [mem 0x607edfdf40-0x607edfffff]
NODE_DATA(16) allocated [mem 0x667fffdf40-0x667fffffff]
NODE_DATA(17) allocated [mem 0x6c7fffdf40-0x6c7fffffff]
NODE_DATA(18) allocated [mem 0x727fffdf40-0x727fffffff]
NODE_DATA(19) allocated [mem 0x787fdfdf40-0x787fdfffff]
NODE_DATA(20) allocated [mem 0x7e7fffdf40-0x7e7fffffff]
NODE_DATA(21) allocated [mem 0x847fffdf40-0x847fffffff]
NODE_DATA(22) allocated [mem 0x8a7fffdf40-0x8a7fffffff]
NODE_DATA(23) allocated [mem 0x907fdfdf40-0x907fdfffff]
NODE_DATA(24) allocated [mem 0x967fffdf40-0x967fffffff]
NODE_DATA(25) allocated [mem 0x9c7fffdf40-0x9c7fffffff]
NODE_DATA(26) allocated [mem 0xa27fffdf40-0xa27fffffff]
NODE_DATA(27) allocated [mem 0xa87fdfdf40-0xa87fdfffff]
NODE_DATA(28) allocated [mem 0xae7fffdf40-0xae7fffffff]
NODE_DATA(29) allocated [mem 0xb47fffdf40-0xb47fffffff]
NODE_DATA(30) allocated [mem 0xba7fffdf40-0xba7fffffff]
NODE_DATA(31) allocated [mem 0xc07fcfcf40-0xc07fcfefff]
Zone ranges:
DMA32 [mem 0x0000000000001000-0x00000000ffffffff]
Normal [mem 0x0000000100000000-0x000000c07fcfffff]
Movable zone start for each node
Early memory node ranges
node 0: [mem 0x0000000000001000-0x000000000008efff]
node 0: [mem 0x0000000000090000-0x000000000009ffff]
node 0: [mem 0x0000000000100000-0x0000000059cccfff]
node 0: [mem 0x0000000059cd0000-0x00000000677cefff]
node 0: [mem 0x000000006efff000-0x000000006effffff]
node 0: [mem 0x0000000100000000-0x000000067fffffff]
node 1: [mem 0x0000000680000000-0x0000000c7fffffff]
node 2: [mem 0x0000000c80000000-0x000000127fffffff]
node 3: [mem 0x0000001280000000-0x000000187fdfffff]
node 4: [mem 0x0000001880000000-0x0000001e7fffffff]
node 5: [mem 0x0000001e80000000-0x000000247fffffff]
node 6: [mem 0x0000002480000000-0x0000002a7fffffff]
node 7: [mem 0x0000002a80000000-0x000000307fdfffff]
node 8: [mem 0x0000003080000000-0x000000367fffffff]
node 9: [mem 0x0000003680000000-0x0000003c7fffffff]
node 10: [mem 0x0000003c80000000-0x000000427fffffff]
node 11: [mem 0x0000004280000000-0x000000487fdfffff]
node 12: [mem 0x0000004880000000-0x0000004e7fffffff]
node 13: [mem 0x0000004e80000000-0x000000547fffffff]
node 14: [mem 0x0000005480000000-0x0000005a7fffffff]
node 15: [mem 0x0000005a80000000-0x000000607edfffff]
node 16: [mem 0x0000006080000000-0x000000667fffffff]
node 17: [mem 0x0000006680000000-0x0000006c7fffffff]
node 18: [mem 0x0000006c80000000-0x000000727fffffff]
node 19: [mem 0x0000007280000000-0x000000787fdfffff]
node 20: [mem 0x0000007880000000-0x0000007e7fffffff]
node 21: [mem 0x0000007e80000000-0x000000847fffffff]
node 22: [mem 0x0000008480000000-0x0000008a7fffffff]
node 23: [mem 0x0000008a80000000-0x000000907fdfffff]
node 24: [mem 0x0000009080000000-0x000000967fffffff]
node 25: [mem 0x0000009680000000-0x0000009c7fffffff]
node 26: [mem 0x0000009c80000000-0x000000a27fffffff]
node 27: [mem 0x000000a280000000-0x000000a87fdfffff]
node 28: [mem 0x000000a880000000-0x000000ae7fffffff]
node 29: [mem 0x000000ae80000000-0x000000b47fffffff]
node 30: [mem 0x000000b480000000-0x000000ba7fffffff]
node 31: [mem 0x000000ba80000000-0x000000c07e7bffff]
node 31: [mem 0x000000c07fc00000-0x000000c07fcfffff]
Initmem setup node 0 [mem 0x0000000000001000-0x000000067fffffff]
Initmem setup node 1 [mem 0x0000000680000000-0x0000000c7fffffff]
Initmem setup node 2 [mem 0x0000000c80000000-0x000000127fffffff]
Initmem setup node 3 [mem 0x0000001280000000-0x000000187fdfffff]
Initmem setup node 4 [mem 0x0000001880000000-0x0000001e7fffffff]
Initmem setup node 5 [mem 0x0000001e80000000-0x000000247fffffff]
Initmem setup node 6 [mem 0x0000002480000000-0x0000002a7fffffff]
Initmem setup node 7 [mem 0x0000002a80000000-0x000000307fdfffff]
Initmem setup node 8 [mem 0x0000003080000000-0x000000367fffffff]
Initmem setup node 9 [mem 0x0000003680000000-0x0000003c7fffffff]
Initmem setup node 10 [mem 0x0000003c80000000-0x000000427fffffff]
Initmem setup node 11 [mem 0x0000004280000000-0x000000487fdfffff]
Initmem setup node 12 [mem 0x0000004880000000-0x0000004e7fffffff]
Initmem setup node 13 [mem 0x0000004e80000000-0x000000547fffffff]
Initmem setup node 14 [mem 0x0000005480000000-0x0000005a7fffffff]
Initmem setup node 15 [mem 0x0000005a80000000-0x000000607edfffff]
Initmem setup node 16 [mem 0x0000006080000000-0x000000667fffffff]
Initmem setup node 17 [mem 0x0000006680000000-0x0000006c7fffffff]
Initmem setup node 18 [mem 0x0000006c80000000-0x000000727fffffff]
Initmem setup node 19 [mem 0x0000007280000000-0x000000787fdfffff]
Initmem setup node 20 [mem 0x0000007880000000-0x0000007e7fffffff]
Initmem setup node 21 [mem 0x0000007e80000000-0x000000847fffffff]
Initmem setup node 22 [mem 0x0000008480000000-0x0000008a7fffffff]
Initmem setup node 23 [mem 0x0000008a80000000-0x000000907fdfffff]
Initmem setup node 24 [mem 0x0000009080000000-0x000000967fffffff]
Initmem setup node 25 [mem 0x0000009680000000-0x0000009c7fffffff]
Initmem setup node 26 [mem 0x0000009c80000000-0x000000a27fffffff]
Initmem setup node 27 [mem 0x000000a280000000-0x000000a87fdfffff]
Initmem setup node 28 [mem 0x000000a880000000-0x000000ae7fffffff]
Initmem setup node 29 [mem 0x000000ae80000000-0x000000b47fffffff]
Initmem setup node 30 [mem 0x000000b480000000-0x000000ba7fffffff]
Initmem setup node 31 [mem 0x000000ba80000000-0x000000c07fcfffff]
On node 0, zone DMA32: 1 pages in unavailable ranges
On node 0, zone DMA32: 1 pages in unavailable ranges
On node 0, zone DMA32: 96 pages in unavailable ranges
On node 0, zone DMA32: 3 pages in unavailable ranges
On node 0, zone DMA32: 30768 pages in unavailable ranges
On node 0, zone Normal: 4096 pages in unavailable ranges
On node 4, zone Normal: 512 pages in unavailable ranges
On node 8, zone Normal: 512 pages in unavailable ranges
On node 12, zone Normal: 512 pages in unavailable ranges
On node 16, zone Normal: 4608 pages in unavailable ranges
On node 20, zone Normal: 512 pages in unavailable ranges
On node 24, zone Normal: 512 pages in unavailable ranges
On node 28, zone Normal: 512 pages in unavailable ranges
On node 31, zone Normal: 5184 pages in unavailable ranges
On node 31, zone Normal: 768 pages in unavailable ranges
ACPI: PM-Timer IO Port: 0x408
ACPI: X2APIC_NMI (uid[0xffffffff] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
IOAPIC[0]: apic_id 240, version 33, address 0xfec00000, GSI 0-23
IOAPIC[1]: apic_id 241, version 33, address 0xea180000, GSI 120-151
IOAPIC[2]: apic_id 242, version 33, address 0xdd180000, GSI 88-119
IOAPIC[3]: apic_id 243, version 33, address 0xfd180000, GSI 24-55
IOAPIC[4]: apic_id 244, version 33, address 0xd0180000, GSI 56-87
IOAPIC[5]: apic_id 245, version 33, address 0xc3180000, GSI 248-279
IOAPIC[6]: apic_id 246, version 33, address 0xb6180000, GSI 216-247
IOAPIC[7]: apic_id 247, version 33, address 0xa9280000, GSI 152-183
IOAPIC[8]: apic_id 248, version 33, address 0x9c180000, GSI 184-215
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI: Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x10228201 base: 0xfed00000
ACPI: SPCR: console: uart,io,0x2f8,115200
CPU topo: Max. logical packages: 2
CPU topo: Max. logical dies: 16
CPU topo: Max. dies per package: 8
CPU topo: Max. threads per core: 2
CPU topo: Num. cores per package: 128
CPU topo: Num. threads per package: 256
CPU topo: Allowing 512 present CPUs plus 0 hotplug CPUs
[mem 0x80000000-0xffffffff] available for PCI devices
clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:512 nr_node_ids:32
percpu: Embedded 53 pages/cpu s179352 r8192 d29544 u262144
Kernel command line: BOOT_IMAGE=/vmlinuz-6.12.0-rc2+ root=UUID=d155ab07-2863-462d-931e-ce417aca7adb ro nvme.poll_queues=16 mitigations=off console=ttyS1,115200 console=tty0
Unknown kernel command line parameters "BOOT_IMAGE=/vmlinuz-6.12.0-rc2+", will be passed to user space.
printk: log_buf_len individual max cpu contribution: 4096 bytes
printk: log_buf_len total cpu_extra contributions: 2093056 bytes
printk: log_buf_len min size: 131072 bytes
printk: log_buf_len: 4194304 bytes
printk: early log buf free: 94952(72%)
Fallback order for Node 0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Fallback order for Node 1: 1 2 3 0 5 6 7 8 9 10 11 12 13 14 15 4 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 16
Fallback order for Node 2: 2 3 0 6 7 8 9 10 11 12 13 14 15 1 4 5 18 19 20 21 22 23 24 25 26 27 28 29 30 31 16 17
Fallback order for Node 3: 3 0 7 8 9 10 11 12 13 14 15 2 5 6 1 4 19 20 21 22 23 24 25 26 27 28 29 30 31 16 17 18
Fallback order for Node 4: 4 6 7 5 8 9 10 11 12 13 14 15 0 3 2 1 20 21 22 23 24 25 26 27 28 29 30 31 16 17 18 19
Fallback order for Node 5: 5 7 6 9 10 11 12 13 14 15 8 4 0 3 2 1 21 22 23 24 25 26 27 28 29 30 31 16 17 18 19 20
Fallback order for Node 6: 6 7 10 11 12 13 14 15 8 9 5 4 3 0 2 1 22 23 24 25 26 27 28 29 30 31 16 17 18 19 20 21
Fallback order for Node 7: 7 11 12 13 14 15 8 9 10 6 5 4 0 2 3 1 23 24 25 26 27 28 29 30 31 16 17 18 19 20 21 22
Fallback order for Node 8: 8 9 10 11 12 13 14 15 2 3 0 1 5 6 7 4 24 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23
Fallback order for Node 9: 9 10 11 13 14 15 8 12 2 3 0 1 5 6 7 4 25 26 27 28 29 30 31 16 17 18 19 20 21 22 23 24
Fallback order for Node 10: 10 11 14 15 13 8 9 12 2 3 0 1 5 6 7 4 26 27 28 29 30 31 16 17 18 19 20 21 22 23 24 25
Fallback order for Node 11: 11 15 13 14 9 10 8 12 2 3 0 1 5 6 7 4 27 28 29 30 31 16 17 18 19 20 21 22 23 24 25 26
Fallback order for Node 12: 12 13 14 15 2 3 10 11 0 1 5 6 7 8 9 4 28 29 30 31 16 17 18 19 20 21 22 23 24 25 26 27
Fallback order for Node 13: 13 14 15 12 3 10 11 0 1 2 5 6 7 8 9 4 29 30 31 16 17 18 19 20 21 22 23 24 25 26 27 28
Fallback order for Node 14: 14 15 13 12 10 11 0 1 2 3 5 6 7 8 9 4 30 31 16 17 18 19 20 21 22 23 24 25 26 27 28 29
Fallback order for Node 15: 15 13 14 12 11 0 1 2 3 5 6 7 8 9 10 4 31 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Fallback order for Node 16: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 14 15 0 1 2 3 5 6 7 8 9 10 11 13 4 12
Fallback order for Node 17: 17 18 19 16 21 22 23 24 25 26 27 28 29 30 31 20 15 0 1 2 3 5 6 7 8 9 10 11 13 14 4 12
Fallback order for Node 18: 18 19 16 22 23 24 25 26 27 28 29 30 31 17 20 21 0 1 2 3 5 6 7 8 9 10 11 13 14 15 4 12
Fallback order for Node 19: 19 16 23 24 25 26 27 28 29 30 31 18 21 22 17 20 1 2 3 5 6 7 8 9 10 11 13 14 15 0 4 12
Fallback order for Node 20: 20 22 23 21 24 25 26 27 28 29 30 31 16 19 18 17 2 3 5 6 7 8 9 10 11 13 14 15 0 1 4 12
Fallback order for Node 21: 21 23 22 25 26 27 28 29 30 31 24 20 16 19 18 17 3 5 6 7 8 9 10 11 13 14 15 0 1 2 4 12
Fallback order for Node 22: 22 23 26 27 28 29 30 31 24 25 21 20 19 16 18 17 5 6 7 8 9 10 11 13 14 15 0 1 2 3 4 12
Fallback order for Node 23: 23 27 28 29 30 31 24 25 26 22 21 20 16 18 19 17 6 7 8 9 10 11 13 14 15 0 1 2 3 4 5 12
Fallback order for Node 24: 24 25 26 27 28 29 30 31 18 19 16 17 21 22 23 20 7 8 9 10 11 13 14 15 0 1 2 3 4 5 6 12
Fallback order for Node 25: 25 26 27 29 30 31 24 28 18 19 16 17 21 22 23 20 8 9 10 11 13 14 15 0 1 2 3 4 5 6 7 12
Fallback order for Node 26: 26 27 30 31 29 24 25 28 18 19 16 17 21 22 23 20 9 10 11 13 14 15 0 1 2 3 4 5 6 7 8 12
Fallback order for Node 27: 27 31 29 30 25 26 24 28 18 19 16 17 21 22 23 20 10 11 13 14 15 0 1 2 3 4 5 6 7 8 9 12
Fallback order for Node 28: 28 29 30 31 18 19 26 27 16 17 21 22 23 24 25 20 11 13 14 15 0 1 2 3 4 5 6 7 8 9 10 12
Fallback order for Node 29: 29 30 31 28 19 26 27 16 17 18 21 22 23 24 25 20 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12
Fallback order for Node 30: 30 31 29 28 26 27 16 17 18 19 21 22 23 24 25 20 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13
Fallback order for Node 31: 31 29 30 28 27 16 17 18 19 21 22 23 24 25 26 20 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Built 32 zonelists, mobility grouping on. Total pages: 201212459
Policy zone: Normal
mem auto-init: stack:off, heap alloc:on, heap free:off
software IO TLB: area num 512.
software IO TLB: SWIOTLB bounce buffer size roundup to 128MB
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=512, Nodes=32
ftrace: allocating 40090 entries in 157 pages
ftrace: allocated 157 pages with 5 groups
rcu: Hierarchical RCU implementation.
Rude variant of Tasks RCU enabled.
Tracing variant of Tasks RCU enabled.
rcu: RCU calculated value of scheduler-enlistment delay is 12 jiffies.
RCU Tasks Rude: Setting shift to 9 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=512.
RCU Tasks Trace: Setting shift to 9 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=512.
NR_IRQS: 33024, nr_irqs: 8872, preallocated irqs: 16
rcu: srcu_init: Setting srcu_struct sizes to big.
Console: colour dummy device 80x25
printk: legacy console [tty0] enabled
printk: legacy console [ttyS1] enabled
mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
ACPI: Core revision 20240827
clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484873504 ns
APIC: Switch to symmetric I/O mode setup
AMD-Vi: Using global IVHD EFR:0x25bf732fa2295afe, EFR2:0x1d
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x206ebacc55f, max_idle_ns: 440795306246 ns
Calibrating delay loop (skipped), value calculated using timer frequency.. 4500.01 BogoMIPS (lpj=22500070)
x86/cpu: User Mode Instruction Prevention (UMIP) activated
LVT offset 2 assigned for vector 0xf4
Last level iTLB entries: 4KB 512, 2MB 512, 4MB 256
Last level dTLB entries: 4KB 3072, 2MB 3072, 4MB 1536, 1GB 0
process: using mwait in idle threads
Spectre V2 : User space: Vulnerable
Speculative Store Bypass: Vulnerable
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
x86/fpu: Supporting XSAVE feature 0x800: 'Control-flow User registers'
x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
x86/fpu: xstate_offset[5]: 832, xstate_sizes[5]: 64
x86/fpu: xstate_offset[6]: 896, xstate_sizes[6]: 512
x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
x86/fpu: xstate_offset[11]: 2432, xstate_sizes[11]: 16
x86/fpu: Enabled xstate features 0x8e7, context size is 2448 bytes, using 'compacted' format.
Freeing SMP alternatives memory: 36K
pid_max: default: 524288 minimum: 4096
Dentry cache hash table entries: 33554432 (order: 16, 268435456 bytes, vmalloc hugepage)
Inode-cache hash table entries: 16777216 (order: 15, 134217728 bytes, vmalloc hugepage)
Mount-cache hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc)
Mountpoint-cache hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc)
smpboot: CPU0: AMD EPYC 9754 128-Core Processor (family: 0x19, model: 0xa0, stepping: 0x2)
Performance Events: Fam17h+ 16-deep LBR, core perfctr, AMD PMU driver.
... version: 2
... bit width: 48
... generic registers: 6
... value mask: 0000ffffffffffff
... max period: 00007fffffffffff
... fixed-purpose events: 0
... event mask: 000000000000003f
signal: max sigframe size: 2960
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
microcode: update_cache: add patch: 0xa00107a
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xa10f10
microcode: update_cache: add patch: 0xa101248
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f00
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f10, n_cid.full: 0xaa0f00
microcode: update_cache: add patch: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xa00f10
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f10, n_cid.full: 0xa00f10
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f00, n_cid.full: 0xa00f10
microcode: update_cache: add patch: 0xa001238
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xa10f10
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f10, n_cid.full: 0xa10f10
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f00, n_cid.full: 0xa10f10
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xa10f10
microcode: update_cache: add patch: 0xa101148
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xa00f10
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f10, n_cid.full: 0xa00f10
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f00, n_cid.full: 0xa00f10
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xa00f10
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f10, n_cid.full: 0xa00f10
microcode: update_cache: add patch: 0xa0011d5
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f00
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f10, n_cid.full: 0xaa0f00
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f00, n_cid.full: 0xaa0f00
microcode: ucode_rev_to_cpuid: val: 0xa001238, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f00
microcode: ucode_rev_to_cpuid: val: 0xa101148, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f10, n_cid.full: 0xaa0f00
microcode: ucode_rev_to_cpuid: val: 0xa0011d5, p.stepping: 0x1, c.stepping: 0x1
microcode: ucode_rev_to_cpuid: val: 0xaa00116, p.stepping: 0x1, c.stepping: 0x1
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f00
microcode: update_cache: add patch: 0xaa00116
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
BUG: unable to handle page fault for address: 00000001000141ab
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc2+ #144
Hardware name: Dell Inc. PowerEdge R7625/06444F, BIOS 1.8.3 04/02/2024
RIP: 0010:load_microcode_amd.isra.0+0x2b1/0x380
Code: e7 06 48 81 c7 c0 97 9e a5 e8 3b bd 4e 00 89 c7 89 c0 4c 8b 2c c5 40 20 02 a5 49 01 ed e8 87 fa ff ff 48 85 c0 74 12 8b 40 1c <41> 39 85 28 01 00 00 0f 92 c0 0f b6 c0 09 c3 41 8d 4c 24 01 41 83
RSP: 0018:ffffaaa8800e7e08 EFLAGS: 00010286
RAX: 000000000aa00215 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffaaa8800e7c78 RDI: 00000000fffdffff
RBP: 0000000000014080 R08: 00000000fffdffff R09: ffff987ce58fffa8
R10: ffff987ce4e00000 R11: 646f636f7263696d R12: 0000000000000001
R13: 0000000100014083 R14: ffffaaa8800e7e0c R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff97c2e7e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000001000141ab CR3: 00000034d3828001 CR4: 0000000000370ef0
Call Trace:
<TASK>
? __die_body.cold+0x19/0x2b
? page_fault_oops+0x90/0x210
? prb_read_valid+0x17/0x20
? console_unlock+0xb4/0x240
? exc_page_fault+0x6c/0x130
? asm_exc_page_fault+0x22/0x30
? load_microcode_amd.isra.0+0x2b1/0x380
? load_microcode_amd.isra.0+0x2a9/0x380
save_microcode_in_initrd+0x90/0xb0
? find_blobs_in_containers+0xb0/0xb0
do_one_initcall+0x2e/0x190
? try_to_wake_up+0x1c0/0x4b0
kernel_init_freeable+0xdd/0x210
? rest_init+0xc0/0xc0
kernel_init+0x16/0x120
ret_from_fork+0x2d/0x50
? rest_init+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in:
CR2: 00000001000141ab
---[ end trace 0000000000000000 ]---
RIP: 0010:load_microcode_amd.isra.0+0x2b1/0x380
Code: e7 06 48 81 c7 c0 97 9e a5 e8 3b bd 4e 00 89 c7 89 c0 4c 8b 2c c5 40 20 02 a5 49 01 ed e8 87 fa ff ff 48 85 c0 74 12 8b 40 1c <41> 39 85 28 01 00 00 0f 92 c0 0f b6 c0 09 c3 41 8d 4c 24 01 41 83
RSP: 0018:ffffaaa8800e7e08 EFLAGS: 00010286
RAX: 000000000aa00215 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffaaa8800e7c78 RDI: 00000000fffdffff
RBP: 0000000000014080 R08: 00000000fffdffff R09: ffff987ce58fffa8
R10: ffff987ce4e00000 R11: 646f636f7263696d R12: 0000000000000001
R13: 0000000100014083 R14: ffffaaa8800e7e0c R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff97c2e7e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000001000141ab CR3: 00000034d3828001 CR4: 0000000000370ef0
note: swapper/0[1] exited with irqs disabled
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-19 13:54 ` Jens Axboe
@ 2024-10-19 23:21 ` Borislav Petkov
2024-10-20 3:24 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-19 23:21 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Sat, Oct 19, 2024 at 07:54:07AM -0600, Jens Axboe wrote:
> Added that, and here's the full boot output until it crashes. Sent you
> the full thing as there's some microcode debug prints initially, and
> then some later on. Didn't want to miss any.
Thanks, I think I see it. It is that weird node-per-L3 setting which
apparently doesn't set up the ACPI tables properly, or the loader runs too
early, but load_microcode_amd() sees very funky node maps. Node 0's first CPU
in the map is CPU 0, which is correct, but then cpu_data(0) is a funky pointer
which causes the splat.
All the other "nodes" up to 31 have the first CPU in the mask as 512 which is
WTF?!
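For illustration, here is a minimal user-space sketch of the failure mode described above. A first-set-bit search over an empty node mask returns the number of CPU IDs (512 in this log), which is not a valid CPU number and must not be used to index per-CPU data. The names node_mask, mask_first() and node_first_cpu() are hypothetical stand-ins, not kernel APIs, and the guard is only one possible way to skip such nodes.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPU_IDS 512                  /* matches nr_cpu_ids:512 in the log */
#define MASK_WORDS (NR_CPU_IDS / 64)

/* Hypothetical stand-in for a per-node CPU mask. */
struct node_mask {
	uint64_t bits[MASK_WORDS];
};

/*
 * Index of the first set bit, or NR_CPU_IDS when the mask is empty.
 * An empty mask therefore yields a value that is not a valid CPU number.
 */
static unsigned int mask_first(const struct node_mask *m)
{
	for (unsigned int w = 0; w < MASK_WORDS; w++) {
		if (!m->bits[w])
			continue;
		for (unsigned int b = 0; b < 64; b++)
			if (m->bits[w] & (UINT64_C(1) << b))
				return w * 64 + b;
	}
	return NR_CPU_IDS;
}

/*
 * First CPU of a node, or -1 when the node has no CPUs (yet), so that
 * callers never index per-CPU data with an out-of-range number.
 */
static int node_first_cpu(const struct node_mask *m)
{
	unsigned int cpu = mask_first(m);

	return cpu < NR_CPU_IDS ? (int)cpu : -1;
}
```

A caller looping over nodes would check for -1 and skip the node before touching any per-CPU structure.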
So the below is the same conglomerate patch but with code to dump those node
ids in load_microcode_amd() so that I can see what your system says.
It should boot ok now - fingers crossed - but let's see what it really does on
your machine. Just replace the previous one with it, pls, and send me the full
dmesg again.
Thx.
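For reference, the debug prints show each patch ID next to the CPUID(1).EAX value derived from it. What follows is a minimal user-space sketch of that mapping. It assumes the bit layout implied by the values in the log, for example 0xaa00215 turning into 0xaa0f02, or into 0xaa0f00 when the stepping is ignored. The helper names are illustrative only and are not the kernel's.

```c
#include <stdint.h>

/* Fields of a Zen patch ID, as implied by the logged values. */
struct zen_patch_fields {
	uint8_t rev;        /* bits  7:0  */
	uint8_t stepping;   /* bits 11:8  */
	uint8_t model;      /* bits 15:12 */
	uint8_t ext_model;  /* bits 23:20 */
	uint8_t ext_fam;    /* bits 31:24 */
};

static struct zen_patch_fields patch_id_split(uint32_t id)
{
	struct zen_patch_fields p;

	p.rev       = id & 0xff;
	p.stepping  = (id >> 8) & 0xf;
	p.model     = (id >> 12) & 0xf;
	p.ext_model = (id >> 20) & 0xf;
	p.ext_fam   = (id >> 24) & 0xff;
	return p;
}

/*
 * Rebuild a CPUID(1).EAX-style value: stepping 3:0, model 7:4,
 * family 11:8 (fixed at 0xf), ext_model 19:16, ext_fam 27:20.
 * With ignore_stepping set, the stepping field is left at zero.
 */
static uint32_t patch_id_to_cpuid(uint32_t id, int ignore_stepping)
{
	struct zen_patch_fields p = patch_id_split(id);
	uint32_t eax = 0;

	if (!ignore_stepping)
		eax |= p.stepping;
	eax |= (uint32_t)p.model << 4;
	eax |= UINT32_C(0xf) << 8;
	eax |= (uint32_t)p.ext_model << 16;
	eax |= (uint32_t)p.ext_fam << 20;
	return eax;
}
```

Two patches count as being for the same CPU when these values are equal. Whether the stepping takes part in that comparison is what the ignore_stepping argument decides.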
---
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index f63b051f25a0..0d840a43b915 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -158,6 +158,9 @@ static union cpuid_1_eax ucode_rev_to_cpuid(unsigned int val)
c.family = 0xf;
c.ext_fam = p.ext_fam;
+ pr_info("%s: val: 0x%x, p.stepping: 0x%x, c.stepping: 0x%x\n",
+ __func__, val, p.stepping, c.stepping);
+
return c;
}
@@ -613,16 +616,22 @@ static int __init save_microcode_in_initrd(void)
}
early_initcall(save_microcode_in_initrd);
-static inline bool patch_cpus_equivalent(struct ucode_patch *p, struct ucode_patch *n)
+static inline bool patch_cpus_equivalent(struct ucode_patch *p,
+ struct ucode_patch *n,
+ bool ignore_stepping)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
union cpuid_1_eax p_cid = ucode_rev_to_cpuid(p->patch_id);
union cpuid_1_eax n_cid = ucode_rev_to_cpuid(n->patch_id);
- /* Zap stepping */
- p_cid.stepping = 0;
- n_cid.stepping = 0;
+ if (ignore_stepping) {
+ p_cid.stepping = 0;
+ n_cid.stepping = 0;
+ }
+
+ pr_info("%s: p_cid.full: 0x%x, n_cid.full: 0x%x\n",
+ __func__, p_cid.full, n_cid.full);
return p_cid.full == n_cid.full;
} else {
@@ -641,16 +650,22 @@ static struct ucode_patch *cache_find_patch(struct ucode_cpu_info *uci, u16 equi
n.equiv_cpu = equiv_cpu;
n.patch_id = uci->cpu_sig.rev;
+ pr_info("%s: equiv_cpu: 0x%x, patch_id: 0x%x\n",
+ __func__, equiv_cpu, uci->cpu_sig.rev);
+
WARN_ON_ONCE(!n.patch_id);
- list_for_each_entry(p, &microcode_cache, plist)
- if (patch_cpus_equivalent(p, &n))
+ list_for_each_entry(p, &microcode_cache, plist) {
+ if (patch_cpus_equivalent(p, &n, false)) {
+ pr_info("%s: using 0x%x\n", __func__, p->patch_id);
return p;
+ }
+ }
return NULL;
}
-static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
+static inline int patch_newer(struct ucode_patch *p, struct ucode_patch *n)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
@@ -659,6 +674,9 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
zp.ucode_rev = p->patch_id;
zn.ucode_rev = n->patch_id;
+ if (zn.stepping != zp.stepping)
+ return -1;
+
return zn.rev > zp.rev;
} else {
return n->patch_id > p->patch_id;
@@ -668,22 +686,32 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
static void update_cache(struct ucode_patch *new_patch)
{
struct ucode_patch *p;
+ int ret;
list_for_each_entry(p, &microcode_cache, plist) {
- if (patch_cpus_equivalent(p, new_patch)) {
- if (!patch_newer(p, new_patch)) {
+ if (patch_cpus_equivalent(p, new_patch, true)) {
+ ret = patch_newer(p, new_patch);
+ if (ret < 0)
+ continue;
+ else if (!ret) {
/* we already have the latest patch */
kfree(new_patch->data);
kfree(new_patch);
return;
}
+ pr_info("%s: replace 0x%x with 0x%x\n",
+ __func__, p->patch_id, new_patch->patch_id);
+
list_replace(&p->plist, &new_patch->plist);
kfree(p->data);
kfree(p);
return;
}
}
+
+ pr_info("%s: add patch: 0x%x\n", __func__, new_patch->patch_id);
+
/* no patch found, add it */
list_add_tail(&new_patch->plist, &microcode_cache);
}
@@ -964,11 +992,14 @@ static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t siz
cpu = cpumask_first(cpumask_of_node(nid));
c = &cpu_data(cpu);
+ pr_info("%s: nid: %d, cpu: %d, c ptr: 0x%px\n",
+ __func__, nid, cpu, c);
+
p = find_patch(cpu);
if (!p)
continue;
- if (c->microcode >= p->patch_id)
+ if (boot_cpu_data.microcode >= p->patch_id)
continue;
ret = UCODE_NEW;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
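As a rough illustration of the cache policy in the patch above, here is a small user-space model, assuming the same patch ID layout as in the debug output. The revision comparison is tri-state: patches for a different stepping are not comparable, so they are kept side by side instead of replacing each other. The array-backed cache and all names here are hypothetical simplifications of the list-based code in the diff.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_MAX 16

enum cache_result { CACHE_FULL = -1, CACHE_DROPPED, CACHE_ADDED, CACHE_REPLACED };

/* Hypothetical array-backed stand-in for the patch list. */
struct patch_cache {
	uint32_t ids[CACHE_MAX];
	size_t n;
};

/* Patch ID with revision (7:0), stepping (11:8) and bits 19:16 masked out. */
static uint32_t patch_family_model(uint32_t id)
{
	return id & UINT32_C(0xfff0f000);
}

/* -1: different stepping, not comparable; 0: not newer; 1: newer. */
static int patch_newer(uint32_t cached, uint32_t candidate)
{
	if (((cached >> 8) & 0xf) != ((candidate >> 8) & 0xf))
		return -1;
	return (candidate & 0xff) > (cached & 0xff);
}

static enum cache_result cache_update(struct patch_cache *c, uint32_t id)
{
	for (size_t i = 0; i < c->n; i++) {
		int ret;

		if (patch_family_model(c->ids[i]) != patch_family_model(id))
			continue;

		ret = patch_newer(c->ids[i], id);
		if (ret < 0)
			continue;               /* other stepping: keep looking */
		if (!ret)
			return CACHE_DROPPED;   /* already have the latest */

		c->ids[i] = id;                 /* newer revision, same stepping */
		return CACHE_REPLACED;
	}

	if (c->n == CACHE_MAX)
		return CACHE_FULL;

	c->ids[c->n++] = id;                    /* nothing comparable found */
	return CACHE_ADDED;
}
```

Feeding this model the seven patch IDs from the log in order adds all seven, which matches the "add patch" lines in the dmesg output.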
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-19 23:21 ` Borislav Petkov
@ 2024-10-20 3:24 ` Jens Axboe
2024-10-20 12:18 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-20 3:24 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/19/24 5:21 PM, Borislav Petkov wrote:
> On Sat, Oct 19, 2024 at 07:54:07AM -0600, Jens Axboe wrote:
>> Added that, and here's the full boot output until it crashes. Sent you
>> the full thing as there's some microcode debug prints initially, and
>> then some later on. Didn't want to miss any.
>
> Thanks, I think I see it. It is that weird node-per-L3 setting which
> apparently doesn't set up the ACPI tables properly, or the loader runs too
> early, but load_microcode_amd() sees very funky node maps. Node 0's first CPU
> in the map is CPU 0, which is correct, but then cpu_data(0) is a funky pointer
> which causes the splat.
This was my initial thought when I saw where it crashed: is this being
run before the node masks are initialized?
> All the other "nodes" up to 31 have the first CPU in the mask as 512 which is
> WTF?!
>
> So the below is the same conglomerate patch but with code to dump those node
> ids in load_microcode_amd() so that I can see what your system says.
>
> It should boot ok now - fingers crossed - but let's see what it really does on
> your machine. Just replace the previous one with it, pls, and send me the full
> dmesg again.
That's a lot of output... Took about 1 minute on the serial console just
for that. Here's some of it, let me know if you need all, because it's a
LOT. Looks like it takes long enough that it actually fails to bring up
some CPUs.
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#487
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
.... node #29, CPUs: #488
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#489
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#490
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#491
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#492
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#493
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#494
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#495
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
.... node #30, CPUs: #496
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#497
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#498
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#499
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#500
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#501
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#502
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#503
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
.... node #31, CPUs: #504
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#505
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#506
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#507
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#508
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#509
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#510
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
#511
microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00215
microcode: ucode_rev_to_cpuid: val: 0xa00107a, p.stepping: 0x0, c.stepping: 0x0
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
microcode: patch_cpus_equivalent: p_cid.full: 0xa00f10, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xa101248, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xa10f12, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: ucode_rev_to_cpuid: val: 0xaa00215, p.stepping: 0x2, c.stepping: 0x2
microcode: cache_find_patch: using 0xaa00215
microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f02, n_cid.full: 0xaa0f02
microcode: cache_find_patch: using 0xaa00215
CPU257 failed to report alive state
call_irq_handler: 498.55 No irq handler for vector
smp: Brought up 32 nodes, 510 CPUs
then a bunch of the usual kernel messages, then it ends with:
microcode: Current revision: 0x0aa00215
microcode: Updated early from: 0x0aa00215
BUG: unable to handle page fault for address: 000000000aa0021d
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP
CPU: 225 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc2+ #145
Hardware name: Dell Inc. PowerEdge R7625/06444F, BIOS 1.8.3 04/02/2024
RIP: 0010:internal_create_group+0x1d0/0x430
Code: 8b 0c 24 81 e2 b4 11 00 00 4c 89 ef e8 39 f2 ff ff 85 c0 0f 85 55 01 00 00 49 8b 47 08 49 83 c7 08 41 83 c6 01 48 85 c0 74 22 <0f> b7 48 08 85 ed 74 96 48 8b 30 31 d2 4c 89 ef 89 4c 24 08 e8 07
RSP: 0018:ffffb4a3000e7de8 EFLAGS: 00010246
RAX: 000000000aa00215 RBX: ffffffffbaa1a0c0 RCX: ffff9c6da7a374d0
RDX: 0000000080000001 RSI: ffff9c3da7a36598 RDI: ffff9c7fcc4e5188
RBP: 0000000000000000 R08: 0000000000000228 R09: ffff9bd8403a4f10
R10: ffff9c91c2381dd0 R11: 0000000000000000 R12: ffff9be3a7a3cc00
R13: ffff9c7fcc4e5188 R14: 0000000000000000 R15: ffffffffbb7e3928
FS: 0000000000000000(0000) GS:ffff9c85a7a40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000aa0021d CR3: 000000bd9bc28001 CR4: 0000000000370ef0
Call Trace:
<TASK>
? __die_body.cold+0x19/0x2b
? page_fault_oops+0x90/0x210
? idr_alloc_cyclic+0x49/0x90
? exc_page_fault+0x6c/0x130
? asm_exc_page_fault+0x22/0x30
? internal_create_group+0x1d0/0x430
? internal_create_group+0x141/0x430
microcode_init+0x121/0x190
? mtrr_trim_uncached_memory+0x2d0/0x2d0
do_one_initcall+0x2e/0x190
? __kmalloc_noprof+0x1d0/0x2c0
kernel_init_freeable+0x1d0/0x210
? rest_init+0xc0/0xc0
kernel_init+0x16/0x120
ret_from_fork+0x2d/0x50
? rest_init+0xc0/0xc0
ret_from_fork_asm+0x11/0x20
</TASK>
Modules linked in:
CR2: 000000000aa0021d
---[ end trace 0000000000000000 ]---
RIP: 0010:internal_create_group+0x1d0/0x430
Code: 8b 0c 24 81 e2 b4 11 00 00 4c 89 ef e8 39 f2 ff ff 85 c0 0f 85 55 01 00 00 49 8b 47 08 49 83 c7 08 41 83 c6 01 48 85 c0 74 22 <0f> b7 48 08 85 ed 74 96 48 8b 30 31 d2 4c 89 ef 89 4c 24 08 e8 07
RSP: 0018:ffffb4a3000e7de8 EFLAGS: 00010246
RAX: 000000000aa00215 RBX: ffffffffbaa1a0c0 RCX: ffff9c6da7a374d0
RDX: 0000000080000001 RSI: ffff9c3da7a36598 RDI: ffff9c7fcc4e5188
RBP: 0000000000000000 R08: 0000000000000228 R09: ffff9bd8403a4f10
R10: ffff9c91c2381dd0 R11: 0000000000000000 R12: ffff9be3a7a3cc00
R13: ffff9c7fcc4e5188 R14: 0000000000000000 R15: ffffffffbb7e3928
FS: 0000000000000000(0000) GS:ffff9c85a7a40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000aa0021d CR3: 000000bd9bc28001 CR4: 0000000000370ef0
note: swapper/0[1] exited with irqs disabled
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
Kernel Offset: 0x38e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
--
Jens Axboe
* Re: AMD zen microcode updates breaks boot
2024-10-20 3:24 ` Jens Axboe
@ 2024-10-20 12:18 ` Borislav Petkov
2024-10-21 0:47 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-20 12:18 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Sat, Oct 19, 2024 at 09:24:24PM -0600, Jens Axboe wrote:
> This was my initial thought when I saw where it crashed, is this being
> run before node masks are initialized?
Yap, looka here:
...
[ 14.195542] #253
[ 13.435208] microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00009
[ 13.435208] microcode: ucode_rev_to_cpuid: val: 0xaa00009, p.stepping: 0x0, c.stepping: 0x0
[ 13.435208] microcode: ucode_rev_to_cpuid: val: 0xaa00009, p.stepping: 0x0, c.stepping: 0x0
[ 13.435208] microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f00, n_cid.full: 0xaa0f00
[ 13.435208] microcode: cache_find_patch: using 0xaa00009
[ 14.203190] #254
[ 13.435208] microcode: ucode_rev_to_cpuid: val: 0xaa00009, p.stepping: 0x0, c.stepping: 0x0
[ 13.435208] microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f00, n_cid.full: 0xaa0f00
[ 13.435208] microcode: cache_find_patch: using 0xaa00009
[ 13.435208] microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00009
[ 14.223125] #255
[ 13.435208] microcode: ucode_rev_to_cpuid: val: 0xaa00009, p.stepping: 0x0, c.stepping: 0x0
[ 13.435208] microcode: cache_find_patch: equiv_cpu: 0x0, patch_id: 0xaa00009
<--- Patching on the first node ends here and now it starts fixing up the
NUMA masks:
[ 13.435153] numa_add_cpu cpu 1 node 0: mask now 0-1
[ 13.435153] numa_add_cpu cpu 2 node 0: mask now 0-2
[ 13.435153] numa_add_cpu cpu 3 node 0: mask now 0-3
[ 13.435208] microcode: ucode_rev_to_cpuid: val: 0xaa00009, p.stepping: 0x0, c.stepping: 0x0
[ 13.435153] numa_add_cpu cpu 4 node 0: mask now 0-4
[ 13.435154] numa_add_cpu cpu 5 node 0: mask now 0-5
[ 13.435208] microcode: ucode_rev_to_cpuid: val: 0xaa00009, p.stepping: 0x0, c.stepping: 0x0
[ 13.435154] numa_add_cpu cpu 6 node 0: mask now 0-6
[ 13.435208] microcode: patch_cpus_equivalent: p_cid.full: 0xaa0f00, n_cid.full: 0xaa0f00
[ 13.435154] numa_add_cpu cpu 7 node 0: mask now 0-7
[ 13.435154] numa_add_cpu cpu 8 node 1: mask now 8
[ 13.435154] numa_add_cpu cpu 9 node 1: mask now 8-9
...
so we're clearly too early.
Ok, this diff at the end should fix it completely. I think. The same dance,
replace, build and boot.
I've moved the iteration over the nodes into the late loading path, as that is
the only place where we need it anyway. If you wanna test that too and you feel
brave, do this after the machine boots:
# echo 1 > /sys/devices/system/cpu/microcode/reload
It should say
bash: echo: write error: File descriptor in bad state
because you don't have newer microcode but there will be output in dmesg and
the nodes should look all correct. Like here:
dmesg | grep nid:
[ 156.770267] microcode: load_microcode_amd: nid: 0, cpu: 0, c ptr: 0xff11000c1f819020
[ 156.829217] microcode: load_microcode_amd: nid: 1, cpu: 8, c ptr: 0xff1100181f619020
[ 156.888810] microcode: load_microcode_amd: nid: 2, cpu: 16, c ptr: 0xff11006c1f619020
[ 156.950003] microcode: load_microcode_amd: nid: 3, cpu: 24, c ptr: 0xff1100781f619020
[ 157.011062] microcode: load_microcode_amd: nid: 4, cpu: 32, c ptr: 0xff11003c1f619020
[ 157.071976] microcode: load_microcode_amd: nid: 5, cpu: 40, c ptr: 0xff1100481f619020
[ 157.132870] microcode: load_microcode_amd: nid: 6, cpu: 48, c ptr: 0xff11009c1f619020
[ 157.193847] microcode: load_microcode_amd: nid: 7, cpu: 56, c ptr: 0xff1100a81f619020
[ 157.254828] microcode: load_microcode_amd: nid: 8, cpu: 64, c ptr: 0xff1100541f619020
[ 157.315782] microcode: load_microcode_amd: nid: 9, cpu: 72, c ptr: 0xff1100601f619020
[ 157.376709] microcode: load_microcode_amd: nid: 10, cpu: 80, c ptr: 0xff1100b41f619020
[ 157.437826] microcode: load_microcode_amd: nid: 11, cpu: 88, c ptr: 0xff1100c01f619020
[ 157.498932] microcode: load_microcode_amd: nid: 12, cpu: 96, c ptr: 0xff1100241f619020
[ 157.559974] microcode: load_microcode_amd: nid: 13, cpu: 104, c ptr: 0xff1100301f619020
[ 157.621153] microcode: load_microcode_amd: nid: 14, cpu: 112, c ptr: 0xff1100841f619020
[ 157.682331] microcode: load_microcode_amd: nid: 15, cpu: 120, c ptr: 0xff1100901f619020
[ 157.743529] microcode: load_microcode_amd: nid: 16, cpu: 128, c ptr: 0xff1101080f619020
[ 157.804669] microcode: load_microcode_amd: nid: 17, cpu: 136, c ptr: 0xff1101104f619020
[ 157.865863] microcode: load_microcode_amd: nid: 18, cpu: 144, c ptr: 0xff11014a0f619020
[ 157.927048] microcode: load_microcode_amd: nid: 19, cpu: 152, c ptr: 0xff1101524f619020
[ 157.988263] microcode: load_microcode_amd: nid: 20, cpu: 160, c ptr: 0xff1101290f619020
[ 158.049464] microcode: load_microcode_amd: nid: 21, cpu: 168, c ptr: 0xff1101314f619020
[ 158.110717] microcode: load_microcode_amd: nid: 22, cpu: 176, c ptr: 0xff11016b0f619020
[ 158.171976] microcode: load_microcode_amd: nid: 23, cpu: 184, c ptr: 0xff1101734f619020
[ 158.233168] microcode: load_microcode_amd: nid: 24, cpu: 192, c ptr: 0xff1101398f619020
[ 158.294369] microcode: load_microcode_amd: nid: 25, cpu: 200, c ptr: 0xff110141cf619020
[ 158.355615] microcode: load_microcode_amd: nid: 26, cpu: 208, c ptr: 0xff11017b8f619020
[ 158.416845] microcode: load_microcode_amd: nid: 27, cpu: 216, c ptr: 0xff110183c3019020
[ 158.478043] microcode: load_microcode_amd: nid: 28, cpu: 224, c ptr: 0xff1101188f619020
[ 158.539273] microcode: load_microcode_amd: nid: 29, cpu: 232, c ptr: 0xff110120cf619020
[ 158.600503] microcode: load_microcode_amd: nid: 30, cpu: 240, c ptr: 0xff11015a8f619020
[ 158.661735] microcode: load_microcode_amd: nid: 31, cpu: 248, c ptr: 0xff110162cf619020
Every node has 8 CPUs in it, as it should.
Anyway, full conglomerate diff below.
Thx.
---
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index f63b051f25a0..602fd382e0b5 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -158,6 +158,9 @@ static union cpuid_1_eax ucode_rev_to_cpuid(unsigned int val)
c.family = 0xf;
c.ext_fam = p.ext_fam;
+ pr_info("%s: val: 0x%x, p.stepping: 0x%x, c.stepping: 0x%x\n",
+ __func__, val, p.stepping, c.stepping);
+
return c;
}
@@ -584,7 +587,7 @@ void __init load_ucode_amd_bsp(struct early_load_data *ed, unsigned int cpuid_1_
native_rdmsr(MSR_AMD64_PATCH_LEVEL, ed->new_rev, dummy);
}
-static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size);
+static enum ucode_state _load_microcode_amd(u8 family, const u8 *data, size_t size);
static int __init save_microcode_in_initrd(void)
{
@@ -605,7 +608,7 @@ static int __init save_microcode_in_initrd(void)
if (!desc.mc)
return -EINVAL;
- ret = load_microcode_amd(x86_family(cpuid_1_eax), desc.data, desc.size);
+ ret = _load_microcode_amd(x86_family(cpuid_1_eax), desc.data, desc.size);
if (ret > UCODE_UPDATED)
return -EINVAL;
@@ -613,16 +616,22 @@ static int __init save_microcode_in_initrd(void)
}
early_initcall(save_microcode_in_initrd);
-static inline bool patch_cpus_equivalent(struct ucode_patch *p, struct ucode_patch *n)
+static inline bool patch_cpus_equivalent(struct ucode_patch *p,
+ struct ucode_patch *n,
+ bool ignore_stepping)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
union cpuid_1_eax p_cid = ucode_rev_to_cpuid(p->patch_id);
union cpuid_1_eax n_cid = ucode_rev_to_cpuid(n->patch_id);
- /* Zap stepping */
- p_cid.stepping = 0;
- n_cid.stepping = 0;
+ if (ignore_stepping) {
+ p_cid.stepping = 0;
+ n_cid.stepping = 0;
+ }
+
+ pr_info("%s: p_cid.full: 0x%x, n_cid.full: 0x%x\n",
+ __func__, p_cid.full, n_cid.full);
return p_cid.full == n_cid.full;
} else {
@@ -641,16 +650,22 @@ static struct ucode_patch *cache_find_patch(struct ucode_cpu_info *uci, u16 equi
n.equiv_cpu = equiv_cpu;
n.patch_id = uci->cpu_sig.rev;
+ pr_info("%s: equiv_cpu: 0x%x, patch_id: 0x%x\n",
+ __func__, equiv_cpu, uci->cpu_sig.rev);
+
WARN_ON_ONCE(!n.patch_id);
- list_for_each_entry(p, &microcode_cache, plist)
- if (patch_cpus_equivalent(p, &n))
+ list_for_each_entry(p, &microcode_cache, plist) {
+ if (patch_cpus_equivalent(p, &n, false)) {
+ pr_info("%s: using 0x%x\n", __func__, p->patch_id);
return p;
+ }
+ }
return NULL;
}
-static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
+static inline int patch_newer(struct ucode_patch *p, struct ucode_patch *n)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
@@ -659,6 +674,9 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
zp.ucode_rev = p->patch_id;
zn.ucode_rev = n->patch_id;
+ if (zn.stepping != zp.stepping)
+ return -1;
+
return zn.rev > zp.rev;
} else {
return n->patch_id > p->patch_id;
@@ -668,22 +686,32 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
static void update_cache(struct ucode_patch *new_patch)
{
struct ucode_patch *p;
+ int ret;
list_for_each_entry(p, &microcode_cache, plist) {
- if (patch_cpus_equivalent(p, new_patch)) {
- if (!patch_newer(p, new_patch)) {
+ if (patch_cpus_equivalent(p, new_patch, true)) {
+ ret = patch_newer(p, new_patch);
+ if (ret < 0)
+ continue;
+ else if (!ret) {
/* we already have the latest patch */
kfree(new_patch->data);
kfree(new_patch);
return;
}
+ pr_info("%s: replace 0x%x with 0x%x\n",
+ __func__, p->patch_id, new_patch->patch_id);
+
list_replace(&p->plist, &new_patch->plist);
kfree(p->data);
kfree(p);
return;
}
}
+
+ pr_info("%s: add patch: 0x%x\n", __func__, new_patch->patch_id);
+
/* no patch found, add it */
list_add_tail(&new_patch->plist, &microcode_cache);
}
@@ -944,30 +972,45 @@ static enum ucode_state __load_microcode_amd(u8 family, const u8 *data,
return UCODE_OK;
}
-static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size)
+static enum ucode_state _load_microcode_amd(u8 family, const u8 *data, size_t size)
{
- struct cpuinfo_x86 *c;
- unsigned int nid, cpu;
- struct ucode_patch *p;
enum ucode_state ret;
/* free old equiv table */
free_equiv_cpu_table();
ret = __load_microcode_amd(family, data, size);
- if (ret != UCODE_OK) {
+ if (ret != UCODE_OK)
cleanup();
+
+ return ret;
+}
+
+static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size)
+{
+ struct cpuinfo_x86 *c;
+ unsigned int nid, cpu;
+ struct ucode_patch *p;
+ enum ucode_state ret;
+
+ ret = _load_microcode_amd(family, data, size);
+ if (ret != UCODE_OK)
return ret;
- }
for_each_node(nid) {
cpu = cpumask_first(cpumask_of_node(nid));
c = &cpu_data(cpu);
+ pr_info("%s: nid: %d, cpu: %d, c ptr: 0x%px\n",
+ __func__, nid, cpu, c);
+
p = find_patch(cpu);
if (!p)
continue;
+ pr_info("%s: microcode: 0x%x, patch_id: 0x%x\n",
+ __func__, c->microcode, p->patch_id);
+
if (c->microcode >= p->patch_id)
continue;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-20 12:18 ` Borislav Petkov
@ 2024-10-21 0:47 ` Jens Axboe
2024-10-21 7:31 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-21 0:47 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/20/24 6:18 AM, Borislav Petkov wrote:
> Ok, this diff at the end should fix it completely. I think. The same dance,
> replace, build and boot.
Works, outside of waiting 3 min for the serial console spewage from the
patch! But it boots and it's happy after that. Attached here gzip'ed as
it's 1.3M otherwise.
> I've moved the iteration over the nodes in the late loading path as this is
> where we need it only anyway. If you wanna test that too and you feel brave,
> do this after the machine boots:
>
> # echo 1 > /sys/devices/system/cpu/microcode/reload
>
> It should say
>
> bash: echo: write error: File descriptor in bad state
I get:
root@r7625 ~# echo 1 > /sys/devices/system/cpu/microcode/reload
warning: An error occurred while redirecting file '/sys/devices/system/cpu/microcode/reload'
open: Permission denied
and no output in dmesg.
--
Jens Axboe
[-- Attachment #2: r7625-dmesg.txt.gz --]
[-- Type: application/gzip, Size: 106285 bytes --]
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-21 0:47 ` Jens Axboe
@ 2024-10-21 7:31 ` Borislav Petkov
2024-10-21 17:00 ` Jens Axboe
0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-21 7:31 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Sun, Oct 20, 2024 at 06:47:25PM -0600, Jens Axboe wrote:
> Works, outside of waiting 3 min for the serial console spewage from the
> patch! But it boots and it's happy after that. Attached here gzip'ed as
> it's 1.3M otherwise.
Thanks for testing.
Lemme write proper patches, run them everywhere here and give them to you for
one final testing.
> I get:
>
> root@r7625 ~# echo 1 > /sys/devices/system/cpu/microcode/reload
> warning: An error occurred while redirecting file '/sys/devices/system/cpu/microcode/reload'
> open: Permission denied
>
> and no output in dmesg.
Right, you need
CONFIG_MICROCODE_LATE_LOADING=y
for that.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-21 7:31 ` Borislav Petkov
@ 2024-10-21 17:00 ` Jens Axboe
2024-10-22 12:05 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-21 17:00 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On 10/21/24 1:31 AM, Borislav Petkov wrote:
> On Sun, Oct 20, 2024 at 06:47:25PM -0600, Jens Axboe wrote:
>> Works, outside of waiting 3 min for the serial console spewage from the
>> patch! But it boots and it's happy after that. Attached here gzip'ed as
>> it's 1.3M otherwise.
>
> Thanks for testing.
>
> Lemme write proper patches, run them everywhere here and give them to you for
> one final testing.
Sounds good, I'll give them a spin once posted.
>> root@r7625 ~# echo 1 > /sys/devices/system/cpu/microcode/reload
>> warning: An error occurred while redirecting file '/sys/devices/system/cpu/microcode/reload'
>> open: Permission denied
>>
>> and no output in dmesg.
>
> Right, you need
>
> CONFIG_MICROCODE_LATE_LOADING=y
>
> for that.
Ah gotcha, makes sense.
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: AMD zen microcode updates breaks boot
2024-10-21 17:00 ` Jens Axboe
@ 2024-10-22 12:05 ` Borislav Petkov
2024-10-22 12:07 ` [PATCH 1/2] x86/microcode/AMD: Pay attention to the stepping dynamically Borislav Petkov
2024-10-22 12:08 ` [PATCH 2/2] x86/microcode/AMD: Split load_microcode_amd() Borislav Petkov
0 siblings, 2 replies; 44+ messages in thread
From: Borislav Petkov @ 2024-10-22 12:05 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Mon, Oct 21, 2024 at 11:00:26AM -0600, Jens Axboe wrote:
> Sounds good, I'll give them a spin once posted.
Coming up as replies to this message.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 1/2] x86/microcode/AMD: Pay attention to the stepping dynamically
2024-10-22 12:05 ` Borislav Petkov
@ 2024-10-22 12:07 ` Borislav Petkov
2024-10-22 12:08 ` [PATCH 2/2] x86/microcode/AMD: Split load_microcode_amd() Borislav Petkov
1 sibling, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2024-10-22 12:07 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Mon, 21 Oct 2024 10:27:52 +0200
Commit in Fixes changed how a microcode patch is loaded on Zen and newer
but the patch matching needs to happen with different rigidity,
depending on what is being done:
1) When the patch is added to the patches cache, the stepping must be
ignored because the driver still supports different steppings per
system
2) When the patch is matched for loading, then the stepping must be
taken into account because each CPU needs the patch matching its
stepping
Take care of that by making the matching smarter.
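The two matching modes described above can be sketched in plain userspace C. This is a minimal illustration and not the driver code: the helper names and the bit layout (revision in bits 0-7, stepping in bits 8-11, family/model in the remaining bits) are assumptions made for the example, and the patch IDs are sample values chosen only to exercise the logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for illustration: rev [7:0], stepping [11:8], f/m [31:12]. */
static uint32_t patch_rev(uint32_t id)       { return id & 0xff; }
static uint32_t patch_stepping(uint32_t id)  { return (id >> 8) & 0xf; }
static uint32_t patch_fam_model(uint32_t id) { return id >> 12; }

/* Family/model must always match; the stepping only when asked to. */
bool cpus_equivalent(uint32_t p, uint32_t n, bool ignore_stepping)
{
	if (patch_fam_model(p) != patch_fam_model(n))
		return false;

	return ignore_stepping || patch_stepping(p) == patch_stepping(n);
}

/*
 * < 0: the patches are for different steppings, not comparable
 *   0: n is not newer than p
 * > 0: n is newer than p
 */
int newer(uint32_t p, uint32_t n)
{
	if (patch_stepping(p) != patch_stepping(n))
		return -1;

	return patch_rev(n) > patch_rev(p);
}

/* Hypothetical self-check walking through both modes. */
void matching_self_check(void)
{
	uint32_t step2 = 0x0aa00212, step1 = 0x0aa00116;

	/* Cache insertion: same family/model, stepping ignored. */
	assert(cpus_equivalent(step2, step1, true));
	/* ...but a different stepping must not replace the cached patch. */
	assert(newer(step2, step1) < 0);

	/* Loading: the stepping has to match exactly. */
	assert(!cpus_equivalent(step2, step1, false));
	assert(cpus_equivalent(step2, 0x0aa00215, false));
	assert(newer(step2, 0x0aa00215) > 0);
}
```

In this sketch a negative result from newer() corresponds to the "continue" in update_cache(): the walk moves on to the next cached entry, so patches for several steppings of the same family/model can sit in the cache side by side.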
Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
---
arch/x86/kernel/cpu/microcode/amd.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index f63b051f25a0..1ae36ab37fe8 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -613,16 +613,19 @@ static int __init save_microcode_in_initrd(void)
}
early_initcall(save_microcode_in_initrd);
-static inline bool patch_cpus_equivalent(struct ucode_patch *p, struct ucode_patch *n)
+static inline bool patch_cpus_equivalent(struct ucode_patch *p,
+ struct ucode_patch *n,
+ bool ignore_stepping)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
union cpuid_1_eax p_cid = ucode_rev_to_cpuid(p->patch_id);
union cpuid_1_eax n_cid = ucode_rev_to_cpuid(n->patch_id);
- /* Zap stepping */
- p_cid.stepping = 0;
- n_cid.stepping = 0;
+ if (ignore_stepping) {
+ p_cid.stepping = 0;
+ n_cid.stepping = 0;
+ }
return p_cid.full == n_cid.full;
} else {
@@ -644,13 +647,13 @@ static struct ucode_patch *cache_find_patch(struct ucode_cpu_info *uci, u16 equi
WARN_ON_ONCE(!n.patch_id);
list_for_each_entry(p, &microcode_cache, plist)
- if (patch_cpus_equivalent(p, &n))
+ if (patch_cpus_equivalent(p, &n, false))
return p;
return NULL;
}
-static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
+static inline int patch_newer(struct ucode_patch *p, struct ucode_patch *n)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
@@ -659,6 +662,9 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
zp.ucode_rev = p->patch_id;
zn.ucode_rev = n->patch_id;
+ if (zn.stepping != zp.stepping)
+ return -1;
+
return zn.rev > zp.rev;
} else {
return n->patch_id > p->patch_id;
@@ -668,10 +674,14 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
static void update_cache(struct ucode_patch *new_patch)
{
struct ucode_patch *p;
+ int ret;
list_for_each_entry(p, &microcode_cache, plist) {
- if (patch_cpus_equivalent(p, new_patch)) {
- if (!patch_newer(p, new_patch)) {
+ if (patch_cpus_equivalent(p, new_patch, true)) {
+ ret = patch_newer(p, new_patch);
+ if (ret < 0)
+ continue;
+ else if (!ret) {
/* we already have the latest patch */
kfree(new_patch->data);
kfree(new_patch);
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] x86/microcode/AMD: Split load_microcode_amd()
2024-10-22 12:05 ` Borislav Petkov
2024-10-22 12:07 ` [PATCH 1/2] x86/microcode/AMD: Pay attention to the stepping dynamically Borislav Petkov
@ 2024-10-22 12:08 ` Borislav Petkov
2024-10-22 13:15 ` Jens Axboe
1 sibling, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2024-10-22 12:08 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Mon, 21 Oct 2024 10:38:21 +0200
This function should've been split a long time ago because it is used in
two paths:
1) On the late loading path, when the microcode is loaded through the
request_firmware interface
2) In the save_microcode_in_initrd() path which collects all the
microcode patches which are relevant for the current system before
the initrd with the microcode container has been jettisoned.
In that path, it is not really necessary to iterate over the nodes on
a system and match a patch however it didn't cause any trouble so it
was left for a later cleanup
However, that later cleanup was expedited by the fact that Jens was
enabling "Use L3 as a NUMA node" in the BIOS setting in his machine and
so this causes the NUMA CPU masks used in cpumask_of_node() to be
generated *after* 2) above happened on the first node. Which means, all
those masks were funky, wrong, uninitialized and whatnot, leading to
explosions when dereffing c->microcode in load_microcode_amd().
So split that function and do only the necessary work needed at each
stage.
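The shape of the split can be sketched in userspace C as follows. This is only an illustration under stated assumptions: struct topo, parse_container(), load_early() and load_late() are hypothetical stand-ins, not kernel interfaces. The point it shows is that the early path only parses and caches, while the per-node walk is confined to the late path, where the topology data is valid.

```c
#include <assert.h>
#include <stddef.h>

enum load_state { LOAD_OK, LOAD_ERROR };

/* Stand-in for per-node topology data that only becomes valid late in boot. */
struct topo {
	int nr_nodes;
	int nodes_visited;
};

/* Work shared by both paths: validate (and, in the driver, cache) the blob. */
static enum load_state parse_container(const unsigned char *data, size_t size)
{
	return (data && size) ? LOAD_OK : LOAD_ERROR;
}

/* Early (initrd) path: parse only, never look at the topology. */
enum load_state load_early(const unsigned char *data, size_t size)
{
	return parse_container(data, size);
}

/* Late path: parse first, then walk the nodes, which is safe by then. */
enum load_state load_late(const unsigned char *data, size_t size,
			  struct topo *t)
{
	enum load_state ret = load_early(data, size);

	if (ret != LOAD_OK)
		return ret;

	for (int nid = 0; nid < t->nr_nodes; nid++)
		t->nodes_visited++;	/* per-node patch lookup would go here */

	return LOAD_OK;
}
```

Calling load_early() leaves the topology untouched regardless of its state, which is the property the initrd path needs.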
Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
---
arch/x86/kernel/cpu/microcode/amd.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 1ae36ab37fe8..31a73715d755 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -584,7 +584,7 @@ void __init load_ucode_amd_bsp(struct early_load_data *ed, unsigned int cpuid_1_
native_rdmsr(MSR_AMD64_PATCH_LEVEL, ed->new_rev, dummy);
}
-static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size);
+static enum ucode_state _load_microcode_amd(u8 family, const u8 *data, size_t size);
static int __init save_microcode_in_initrd(void)
{
@@ -605,7 +605,7 @@ static int __init save_microcode_in_initrd(void)
if (!desc.mc)
return -EINVAL;
- ret = load_microcode_amd(x86_family(cpuid_1_eax), desc.data, desc.size);
+ ret = _load_microcode_amd(x86_family(cpuid_1_eax), desc.data, desc.size);
if (ret > UCODE_UPDATED)
return -EINVAL;
@@ -954,21 +954,30 @@ static enum ucode_state __load_microcode_amd(u8 family, const u8 *data,
return UCODE_OK;
}
-static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size)
+static enum ucode_state _load_microcode_amd(u8 family, const u8 *data, size_t size)
{
- struct cpuinfo_x86 *c;
- unsigned int nid, cpu;
- struct ucode_patch *p;
enum ucode_state ret;
/* free old equiv table */
free_equiv_cpu_table();
ret = __load_microcode_amd(family, data, size);
- if (ret != UCODE_OK) {
+ if (ret != UCODE_OK)
cleanup();
+
+ return ret;
+}
+
+static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size)
+{
+ struct cpuinfo_x86 *c;
+ unsigned int nid, cpu;
+ struct ucode_patch *p;
+ enum ucode_state ret;
+
+ ret = _load_microcode_amd(family, data, size);
+ if (ret != UCODE_OK)
return ret;
- }
for_each_node(nid) {
cpu = cpumask_first(cpumask_of_node(nid));
--
2.43.0
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] x86/microcode/AMD: Split load_microcode_amd()
2024-10-22 12:08 ` [PATCH 2/2] x86/microcode/AMD: Split load_microcode_amd() Borislav Petkov
@ 2024-10-22 13:15 ` Jens Axboe
2024-10-22 14:33 ` Borislav Petkov
0 siblings, 1 reply; 44+ messages in thread
From: Jens Axboe @ 2024-10-22 13:15 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
Tested 1+2 together only, but for those two:
Tested-by: Jens Axboe <axboe@kernel.dk>
Thanks Boris!
--
Jens Axboe
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] x86/microcode/AMD: Split load_microcode_amd()
2024-10-22 13:15 ` Jens Axboe
@ 2024-10-22 14:33 ` Borislav Petkov
0 siblings, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2024-10-22 14:33 UTC (permalink / raw)
To: Jens Axboe
Cc: Thomas Gleixner, the arch/x86 maintainers, LKML,
regressions@lists.linux.dev
On Tue, Oct 22, 2024 at 07:15:07AM -0600, Jens Axboe wrote:
> Tested 1+2 together only, but for those two:
>
> Tested-by: Jens Axboe <axboe@kernel.dk>
>
> Thanks Boris!
Thanks too, Jens, for the serious testing effort - very much appreciated!
Lemme queue them.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 44+ messages in thread
* [tip: x86/urgent] x86/microcode/AMD: Split load_microcode_amd()
2024-09-27 15:17 AMD zen microcode updates breaks boot Jens Axboe
2024-09-28 6:10 ` Borislav Petkov
2024-10-22 16:08 ` [tip: x86/urgent] x86/microcode/AMD: Pay attention to the stepping dynamically tip-bot2 for Borislav Petkov (AMD)
@ 2024-10-22 16:08 ` tip-bot2 for Borislav Petkov (AMD)
2 siblings, 0 replies; 44+ messages in thread
From: tip-bot2 for Borislav Petkov (AMD) @ 2024-10-22 16:08 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Jens Axboe, Borislav Petkov (AMD), x86, linux-kernel
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 1d81d85d1a19e50d5237dc67d6b825c34ae13de8
Gitweb: https://git.kernel.org/tip/1d81d85d1a19e50d5237dc67d6b825c34ae13de8
Author: Borislav Petkov (AMD) <bp@alien8.de>
AuthorDate: Mon, 21 Oct 2024 10:38:21 +02:00
Committer: Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Tue, 22 Oct 2024 16:48:00 +02:00
x86/microcode/AMD: Split load_microcode_amd()
This function should've been split a long time ago because it is used in
two paths:
1) On the late loading path, when the microcode is loaded through the
request_firmware interface
2) In the save_microcode_in_initrd() path which collects all the
microcode patches which are relevant for the current system before
the initrd with the microcode container has been jettisoned.
In that path, it is not really necessary to iterate over the nodes on
a system and match a patch however it didn't cause any trouble so it
was left for a later cleanup
However, that later cleanup was expedited by the fact that Jens was
enabling "Use L3 as a NUMA node" in the BIOS setting in his machine and
so this causes the NUMA CPU masks used in cpumask_of_node() to be
generated *after* 2) above happened on the first node. Which means, all
those masks were funky, wrong, uninitialized and whatnot, leading to
explosions when dereffing c->microcode in load_microcode_amd().
So split that function and do only the necessary work needed at each
stage.
Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
---
arch/x86/kernel/cpu/microcode/amd.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index 1ae36ab..31a7371 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -584,7 +584,7 @@ void __init load_ucode_amd_bsp(struct early_load_data *ed, unsigned int cpuid_1_
native_rdmsr(MSR_AMD64_PATCH_LEVEL, ed->new_rev, dummy);
}
-static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size);
+static enum ucode_state _load_microcode_amd(u8 family, const u8 *data, size_t size);
static int __init save_microcode_in_initrd(void)
{
@@ -605,7 +605,7 @@ static int __init save_microcode_in_initrd(void)
if (!desc.mc)
return -EINVAL;
- ret = load_microcode_amd(x86_family(cpuid_1_eax), desc.data, desc.size);
+ ret = _load_microcode_amd(x86_family(cpuid_1_eax), desc.data, desc.size);
if (ret > UCODE_UPDATED)
return -EINVAL;
@@ -954,21 +954,30 @@ static enum ucode_state __load_microcode_amd(u8 family, const u8 *data,
return UCODE_OK;
}
-static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size)
+static enum ucode_state _load_microcode_amd(u8 family, const u8 *data, size_t size)
{
- struct cpuinfo_x86 *c;
- unsigned int nid, cpu;
- struct ucode_patch *p;
enum ucode_state ret;
/* free old equiv table */
free_equiv_cpu_table();
ret = __load_microcode_amd(family, data, size);
- if (ret != UCODE_OK) {
+ if (ret != UCODE_OK)
cleanup();
+
+ return ret;
+}
+
+static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size)
+{
+ struct cpuinfo_x86 *c;
+ unsigned int nid, cpu;
+ struct ucode_patch *p;
+ enum ucode_state ret;
+
+ ret = _load_microcode_amd(family, data, size);
+ if (ret != UCODE_OK)
return ret;
- }
for_each_node(nid) {
cpu = cpumask_first(cpumask_of_node(nid));
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [tip: x86/urgent] x86/microcode/AMD: Pay attention to the stepping dynamically
2024-09-27 15:17 AMD zen microcode updates breaks boot Jens Axboe
2024-09-28 6:10 ` Borislav Petkov
@ 2024-10-22 16:08 ` tip-bot2 for Borislav Petkov (AMD)
2024-10-22 16:08 ` [tip: x86/urgent] x86/microcode/AMD: Split load_microcode_amd() tip-bot2 for Borislav Petkov (AMD)
2 siblings, 0 replies; 44+ messages in thread
From: tip-bot2 for Borislav Petkov (AMD) @ 2024-10-22 16:08 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Jens Axboe, Borislav Petkov (AMD), x86, linux-kernel
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: d1744a4c975b1acbe8b498356d28afbc46c88428
Gitweb: https://git.kernel.org/tip/d1744a4c975b1acbe8b498356d28afbc46c88428
Author: Borislav Petkov (AMD) <bp@alien8.de>
AuthorDate: Mon, 21 Oct 2024 10:27:52 +02:00
Committer: Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Tue, 22 Oct 2024 16:37:13 +02:00
x86/microcode/AMD: Pay attention to the stepping dynamically
Commit in Fixes changed how a microcode patch is loaded on Zen and newer but
the patch matching needs to happen with different rigidity, depending on what
is being done:
1) When the patch is added to the patches cache, the stepping must be ignored
because the driver still supports different steppings per system
2) When the patch is matched for loading, then the stepping must be taken into
account because each CPU needs the patch matching its exact stepping
Take care of that by making the matching smarter.
Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
---
arch/x86/kernel/cpu/microcode/amd.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c
index f63b051..1ae36ab 100644
--- a/arch/x86/kernel/cpu/microcode/amd.c
+++ b/arch/x86/kernel/cpu/microcode/amd.c
@@ -613,16 +613,19 @@ static int __init save_microcode_in_initrd(void)
}
early_initcall(save_microcode_in_initrd);
-static inline bool patch_cpus_equivalent(struct ucode_patch *p, struct ucode_patch *n)
+static inline bool patch_cpus_equivalent(struct ucode_patch *p,
+ struct ucode_patch *n,
+ bool ignore_stepping)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
union cpuid_1_eax p_cid = ucode_rev_to_cpuid(p->patch_id);
union cpuid_1_eax n_cid = ucode_rev_to_cpuid(n->patch_id);
- /* Zap stepping */
- p_cid.stepping = 0;
- n_cid.stepping = 0;
+ if (ignore_stepping) {
+ p_cid.stepping = 0;
+ n_cid.stepping = 0;
+ }
return p_cid.full == n_cid.full;
} else {
@@ -644,13 +647,13 @@ static struct ucode_patch *cache_find_patch(struct ucode_cpu_info *uci, u16 equi
WARN_ON_ONCE(!n.patch_id);
list_for_each_entry(p, &microcode_cache, plist)
- if (patch_cpus_equivalent(p, &n))
+ if (patch_cpus_equivalent(p, &n, false))
return p;
return NULL;
}
-static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
+static inline int patch_newer(struct ucode_patch *p, struct ucode_patch *n)
{
/* Zen and newer hardcode the f/m/s in the patch ID */
if (x86_family(bsp_cpuid_1_eax) >= 0x17) {
@@ -659,6 +662,9 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
zp.ucode_rev = p->patch_id;
zn.ucode_rev = n->patch_id;
+ if (zn.stepping != zp.stepping)
+ return -1;
+
return zn.rev > zp.rev;
} else {
return n->patch_id > p->patch_id;
@@ -668,10 +674,14 @@ static inline bool patch_newer(struct ucode_patch *p, struct ucode_patch *n)
static void update_cache(struct ucode_patch *new_patch)
{
struct ucode_patch *p;
+ int ret;
list_for_each_entry(p, &microcode_cache, plist) {
- if (patch_cpus_equivalent(p, new_patch)) {
- if (!patch_newer(p, new_patch)) {
+ if (patch_cpus_equivalent(p, new_patch, true)) {
+ ret = patch_newer(p, new_patch);
+ if (ret < 0)
+ continue;
+ else if (!ret) {
/* we already have the latest patch */
kfree(new_patch->data);
kfree(new_patch);
^ permalink raw reply related [flat|nested] 44+ messages in thread
Thread overview: 44+ messages
2024-09-27 15:17 AMD zen microcode updates breaks boot Jens Axboe
2024-09-28 6:10 ` Borislav Petkov
2024-09-28 11:31 ` Jens Axboe
2024-09-30 4:43 ` Borislav Petkov
2024-09-30 12:27 ` Jens Axboe
2024-09-30 16:16 ` Borislav Petkov
2024-09-30 16:25 ` Jens Axboe
2024-10-09 9:12 ` Borislav Petkov
2024-10-09 11:04 ` Jens Axboe
2024-10-10 13:46 ` Borislav Petkov
2024-10-10 13:50 ` Jens Axboe
2024-10-17 2:34 ` Jens Axboe
2024-10-17 10:02 ` Borislav Petkov
2024-10-17 14:05 ` Jens Axboe
2024-10-17 14:13 ` Borislav Petkov
2024-10-17 14:23 ` Jens Axboe
2024-10-17 14:27 ` Borislav Petkov
2024-10-17 14:40 ` Jens Axboe
2024-10-18 11:58 ` Borislav Petkov
2024-10-18 12:49 ` Borislav Petkov
2024-10-18 13:30 ` Jens Axboe
2024-10-18 15:51 ` Borislav Petkov
2024-10-18 16:45 ` Dr. David Alan Gilbert
2024-10-18 16:47 ` Jens Axboe
2024-10-18 17:59 ` Dr. David Alan Gilbert
2024-10-18 16:48 ` Jens Axboe
2024-10-18 17:56 ` Borislav Petkov
2024-10-18 18:03 ` Jens Axboe
2024-10-18 23:01 ` Jens Axboe
2024-10-19 9:37 ` Borislav Petkov
2024-10-19 13:54 ` Jens Axboe
2024-10-19 23:21 ` Borislav Petkov
2024-10-20 3:24 ` Jens Axboe
2024-10-20 12:18 ` Borislav Petkov
2024-10-21 0:47 ` Jens Axboe
2024-10-21 7:31 ` Borislav Petkov
2024-10-21 17:00 ` Jens Axboe
2024-10-22 12:05 ` Borislav Petkov
2024-10-22 12:07 ` [PATCH 1/2] x86/microcode/AMD: Pay attention to the stepping dynamically Borislav Petkov
2024-10-22 12:08 ` [PATCH 2/2] x86/microcode/AMD: Split load_microcode_amd() Borislav Petkov
2024-10-22 13:15 ` Jens Axboe
2024-10-22 14:33 ` Borislav Petkov
2024-10-22 16:08 ` [tip: x86/urgent] x86/microcode/AMD: Pay attention to the stepping dynamically tip-bot2 for Borislav Petkov (AMD)
2024-10-22 16:08 ` [tip: x86/urgent] x86/microcode/AMD: Split load_microcode_amd() tip-bot2 for Borislav Petkov (AMD)