* [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Mario Limonciello @ 2026-03-05 15:45 UTC
To: mario.limonciello, Yazen.Ghannam, Tony Luck, bp, superm1
Cc: yazen.ghannam, linux-edac
commit 187d1b27a1e43 ("RAS/AMD/ATL: Require PRM support for future
systems") made PRM mandatory for future systems; but this is only a
datacenter centric point of view. PRM is implemented on a case by
case basis on other products and thus it will be expected that the
DF revision can't be detected on some systems.
Decrease the applicable messaging to debug.
Fixes: 187d1b27a1e43 ("RAS/AMD/ATL: Require PRM support for future systems")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
drivers/ras/amd/atl/system.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ras/amd/atl/system.c b/drivers/ras/amd/atl/system.c
index 812a30e21d3ad..a9bf05be5c3fc 100644
--- a/drivers/ras/amd/atl/system.c
+++ b/drivers/ras/amd/atl/system.c
@@ -300,7 +300,7 @@ int get_df_system_info(void)
 
 	ret = determine_df_rev();
 	if (ret) {
-		pr_warn("Failed to determine DF Revision");
+		pr_debug("Failed to determine DF Revision");
 		df_cfg.rev = UNKNOWN;
 		return ret;
 	}
--
2.53.0

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Borislav Petkov @ 2026-03-06 14:50 UTC
To: Mario Limonciello
Cc: Yazen.Ghannam, Tony Luck, superm1, linux-edac

On Thu, Mar 05, 2026 at 09:45:27AM -0600, Mario Limonciello wrote:
> commit 187d1b27a1e43 ("RAS/AMD/ATL: Require PRM support for future
> systems") made PRM mandatory for future systems; but this is only a
> datacenter centric point of view. PRM is implemented on a case by
> case basis on other products and thus it will be expected that the
> DF revision can't be detected on some systems.
>
> Decrease the applicable messaging to debug.
>
> Fixes: 187d1b27a1e43 ("RAS/AMD/ATL: Require PRM support for future systems")
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/ras/amd/atl/system.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ras/amd/atl/system.c b/drivers/ras/amd/atl/system.c
> index 812a30e21d3ad..a9bf05be5c3fc 100644
> --- a/drivers/ras/amd/atl/system.c
> +++ b/drivers/ras/amd/atl/system.c
> @@ -300,7 +300,7 @@ int get_df_system_info(void)
>
>  	ret = determine_df_rev();
>  	if (ret) {
> -		pr_warn("Failed to determine DF Revision");
> +		pr_debug("Failed to determine DF Revision");
>  		df_cfg.rev = UNKNOWN;
>  		return ret;
>  	}

Well, this doesn't look like the right fix to me.

If the platform has PRM, then it shouldn't warn about not being able to
determine a DF revision because it doesn't need to...

No?

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Mario Limonciello @ 2026-03-06 15:03 UTC
To: Borislav Petkov
Cc: Yazen.Ghannam, Tony Luck, superm1, linux-edac

On 3/6/26 8:50 AM, Borislav Petkov wrote:
> On Thu, Mar 05, 2026 at 09:45:27AM -0600, Mario Limonciello wrote:
>> commit 187d1b27a1e43 ("RAS/AMD/ATL: Require PRM support for future
>> systems") made PRM mandatory for future systems; but this is only a
>> datacenter centric point of view. PRM is implemented on a case by
>> case basis on other products and thus it will be expected that the
>> DF revision can't be detected on some systems.
>>
>> Decrease the applicable messaging to debug.
>>
>> Fixes: 187d1b27a1e43 ("RAS/AMD/ATL: Require PRM support for future systems")
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>>  drivers/ras/amd/atl/system.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/ras/amd/atl/system.c b/drivers/ras/amd/atl/system.c
>> index 812a30e21d3ad..a9bf05be5c3fc 100644
>> --- a/drivers/ras/amd/atl/system.c
>> +++ b/drivers/ras/amd/atl/system.c
>> @@ -300,7 +300,7 @@ int get_df_system_info(void)
>>
>>  	ret = determine_df_rev();
>>  	if (ret) {
>> -		pr_warn("Failed to determine DF Revision");
>> +		pr_debug("Failed to determine DF Revision");
>>  		df_cfg.rev = UNKNOWN;
>>  		return ret;
>>  	}
>
> Well, this doesn't look like the right fix to me.
>
> If the platform has PRM, then it shouldn't warn about not being able to
> determine a DF revision because it doesn't need to...
>
> No?
>

Well in this case the platform /doesn't/ have PRM. The implication is we're
not going to have DF support either if the system doesn't have PRM.

We pass the error codes up so the caller of amd_atl_init() will fail init
and thus the UMC decoder won't work.

That's all the intended behavior for such systems.
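
[Editor's sketch] The error propagation described in the message above can be modelled in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: the `_sketch` function names, the `rev_detectable` flag, and the choice of -ENODEV as the error value are all hypothetical, since the thread does not say which error code determine_df_rev() returns.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of determine_df_rev(): succeeds only when the
 * DF revision can be detected. -ENODEV is an assumed error value.
 */
static int determine_df_rev_sketch(bool rev_detectable)
{
	return rev_detectable ? 0 : -ENODEV;
}

/* Mirrors the patched hunk: log quietly, then hand the error upward. */
static int get_df_system_info_sketch(bool rev_detectable)
{
	int ret = determine_df_rev_sketch(rev_detectable);

	if (ret) {
		/* Stand-in for pr_debug(); the kernel keeps this silent by default. */
		fprintf(stderr, "debug: Failed to determine DF Revision\n");
		return ret;
	}

	return 0;
}

/*
 * Hypothetical model of amd_atl_init(): if system info cannot be
 * gathered, init fails and no address translator gets registered.
 */
static int amd_atl_init_sketch(bool rev_detectable, bool *translator_registered)
{
	int ret = get_df_system_info_sketch(rev_detectable);

	*translator_registered = false;
	if (ret)
		return ret;

	*translator_registered = true;
	return 0;
}
```

In this model a system without a detectable DF revision ends with init failing and no translator registered, which is the "intended behavior" referred to above; only the log level of the message is under discussion.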

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Borislav Petkov @ 2026-03-06 15:32 UTC
To: Mario Limonciello
Cc: Yazen.Ghannam, Tony Luck, superm1, linux-edac

On Fri, Mar 06, 2026 at 09:03:44AM -0600, Mario Limonciello wrote:
> Well in this case the platform /doesn't/ have PRM. The implication is we're
> not going to have DF support either if the system doesn't have PRM.
>
> We pass the error codes up so the caller of amd_atl_init() will fail init
> and thus the UMC decoder won't work.
>
> That's all the intended behavior for such systems.

Yah, I had the hunch you're talking about pure clients... :)

And as such, they should not even try to load ATL because they don't have ECC
memory.

And if so, we probably should check somewhere whether ECC is even enabled on
the system and then stop loading if not...

Thx.

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Mario Limonciello @ 2026-03-06 15:40 UTC
To: Borislav Petkov
Cc: Yazen.Ghannam, Tony Luck, superm1, linux-edac

On 3/6/26 9:32 AM, Borislav Petkov wrote:
> On Fri, Mar 06, 2026 at 09:03:44AM -0600, Mario Limonciello wrote:
>> Well in this case the platform /doesn't/ have PRM. The implication is we're
>> not going to have DF support either if the system doesn't have PRM.
>>
>> We pass the error codes up so the caller of amd_atl_init() will fail init
>> and thus the UMC decoder won't work.
>>
>> That's all the intended behavior for such systems.
>
> Yah, I had the hunch you're talking about pure clients... :)
>
> And as such, they should not even try to load ATL because they don't have ECC
> memory.
>
> And if so, we probably should check somewhere whether ECC is even enabled on
> the system and then stop loading if not...
>

But don't you need to use UMC to discover that? Chicken and egg type of
issue.

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Borislav Petkov @ 2026-03-06 15:49 UTC
To: Mario Limonciello, Yazen.Ghannam
Cc: Tony Luck, superm1, linux-edac

On Fri, Mar 06, 2026 at 09:40:06AM -0600, Mario Limonciello wrote:
> But don't you need to use UMC to discover that? Chicken and egg type of
> issue.

Probably...

And we already do that in amd64_edac. So perhaps we could export an API or so.
Yazen might have an idea...

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Yazen Ghannam @ 2026-03-07 14:49 UTC
To: Borislav Petkov
Cc: Mario Limonciello, Tony Luck, superm1, linux-edac

On Fri, Mar 06, 2026 at 04:49:47PM +0100, Borislav Petkov wrote:
> On Fri, Mar 06, 2026 at 09:40:06AM -0600, Mario Limonciello wrote:
> > But don't you need to use UMC to discover that? Chicken and egg type of
> > issue.
>
> Probably...
>
> And we already do that in amd64_edac. So perhaps we could export an API or so.
> Yazen might have an idea...
>

How about having EDAC load ATL when ready?

Thanks,
Yazen

---

From d4e3cdb2efb34ccb2c234a4b227d0301327ad340 Mon Sep 17 00:00:00 2001
From: Yazen Ghannam <yazen.ghannam@amd.com>
Date: Sat, 7 Mar 2026 08:58:56 -0500
Subject: [PATCH] RAS/AMD/ATL, EDAC/amd64: Only load ATL when needed

The AMD Address Translation Library (ATL) will attempt to load on all
AMD Zen/SMCA systems.

However, only systems with DRAM ECC enabled will use the library. Other
systems will fail to load the library and produce an unnecessary message
to the user.

Remove the ATL module dependency table to prevent autoloading. Request
ATL to load from EDAC once all system checks are complete.

Fixes: 3f3174996be6 ("RAS: Introduce AMD Address Translation Library")
Reported-by: Mario Limonciello <mario.limonciello@amd.com>
Closes: https://lore.kernel.org/20260305154528.1171999-1-mario.limonciello@amd.com
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
 drivers/edac/amd64_edac.c  | 2 ++
 drivers/ras/amd/atl/core.c | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 8908ab881c85..7b04f7c5e2ba 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -4170,6 +4170,8 @@ static int __init amd64_edac_init(void)
 		goto err_pci;
 	}
 
+	request_module("amd_atl");
+
 	/* register stuff with EDAC MCE */
 	if (boot_cpu_data.x86 >= 0x17) {
 		amd_register_ecc_decoder(decode_umc_error);
diff --git a/drivers/ras/amd/atl/core.c b/drivers/ras/amd/atl/core.c
index 0f7cd6dab0b0..d77dacdd4f56 100644
--- a/drivers/ras/amd/atl/core.c
+++ b/drivers/ras/amd/atl/core.c
@@ -190,7 +190,6 @@ static const struct x86_cpu_id amd_atl_cpuids[] = {
 	X86_MATCH_FEATURE(X86_FEATURE_ZEN, NULL),
 	{ }
 };
-MODULE_DEVICE_TABLE(x86cpu, amd_atl_cpuids);
 
 static int __init amd_atl_init(void)
 {
--
2.53.0

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Borislav Petkov @ 2026-03-07 15:12 UTC
To: Yazen Ghannam
Cc: Mario Limonciello, Tony Luck, superm1, linux-edac

On Sat, Mar 07, 2026 at 09:49:10AM -0500, Yazen Ghannam wrote:
> On Fri, Mar 06, 2026 at 04:49:47PM +0100, Borislav Petkov wrote:
> > On Fri, Mar 06, 2026 at 09:40:06AM -0600, Mario Limonciello wrote:
> > > But don't you need to use UMC to discover that? Chicken and egg type of
> > > issue.
> >
> > Probably...
> >
> > And we already do that in amd64_edac. So perhaps we could export an API or so.
> > Yazen might have an idea...
> >
>
> How about having EDAC load ATL when ready?

The thing is, AMD_ATL can also be built-in so then request_module doesn't make
sense.

Which means, if we have to "tie" it to amd64_edac detection, we'd have to make
it synchronize its Kconfig setting to the CONFIG_EDAC_AMD64 setting.

Or we could simply say that AMD_ATL is a module only because if anything needs
it, then anything should request it. And that makes sense because the address
translation should be present only when something else loads which is at all
capable of presenting addresses which can be translated.

IOW, AMD_ATL should not be builtin at all because, well, it doesn't make any
sense for it to be. IOW, its existence alone on the system makes a little
sense if there's no address producer like amd64_edac or whatever else calls
amd_convert_umc_mca_addr_to_sys_addr()...

Hmmm.

It sure sounds weird...

--
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Yazen Ghannam @ 2026-03-10 12:54 UTC
To: Borislav Petkov
Cc: Mario Limonciello, Tony Luck, superm1, linux-edac

On Sat, Mar 07, 2026 at 04:12:31PM +0100, Borislav Petkov wrote:
> On Sat, Mar 07, 2026 at 09:49:10AM -0500, Yazen Ghannam wrote:
> > On Fri, Mar 06, 2026 at 04:49:47PM +0100, Borislav Petkov wrote:
> > > On Fri, Mar 06, 2026 at 09:40:06AM -0600, Mario Limonciello wrote:
> > > > But don't you need to use UMC to discover that? Chicken and egg type of
> > > > issue.
> > >
> > > Probably...
> > >
> > > And we already do that in amd64_edac. So perhaps we could export an API or so.
> > > Yazen might have an idea...
> > >
> >
> > How about having EDAC load ATL when ready?
>
> The thing is, AMD_ATL can also be built-in so then request_module doesn't make
> sense.
>
> Which means, if we have to "tie" it to amd64_edac detection, we'd have to make
> it synchronize its Kconfig setting to the CONFIG_EDAC_AMD64 setting.
>
> Or we could simply say that AMD_ATL is a module only because if anything needs
> it, then anything should request it. And that makes sense because the address
> translation should be present only when something else loads which is at all
> capable of presenting addresses which can be translated.
>
> IOW, AMD_ATL should not be builtin at all because, well, it doesn't make any
> sense for it to be. IOW, its existence alone on the system makes a little
> sense if there's no address producer like amd64_edac or whatever else calls
> amd_convert_umc_mca_addr_to_sys_addr()...
>
> Hmmm.
>
> It sure sounds weird...
>

So AMD_ATL *can* be built-in, but it is default 'N'. CONFIG_EDAC_AMD64
has 'imply AMD_ATL', so CONFIG_AMD_ATL=CONFIG_EDAC_AMD64.

If CONFIG_EDAC_AMD64=m, then CONFIG_AMD_ATL=m. I think this would be the
default for most users. EDAC will fail to load on systems without DRAM
ECC, so AMD_ATL won't load either.

If CONFIG_EDAC_AMD64=y, then CONFIG_AMD_ATL=y. I expect that a user that
wants EDAC built-in knows their system will use it.

Thanks,
Yazen

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Mario Limonciello @ 2026-03-10 14:58 UTC
To: Yazen Ghannam, Borislav Petkov
Cc: Tony Luck, superm1, linux-edac

On 3/10/26 7:54 AM, Yazen Ghannam wrote:
> On Sat, Mar 07, 2026 at 04:12:31PM +0100, Borislav Petkov wrote:
>> On Sat, Mar 07, 2026 at 09:49:10AM -0500, Yazen Ghannam wrote:
>>> On Fri, Mar 06, 2026 at 04:49:47PM +0100, Borislav Petkov wrote:
>>>> On Fri, Mar 06, 2026 at 09:40:06AM -0600, Mario Limonciello wrote:
>>>>> But don't you need to use UMC to discover that? Chicken and egg type of
>>>>> issue.
>>>>
>>>> Probably...
>>>>
>>>> And we already do that in amd64_edac. So perhaps we could export an API or so.
>>>> Yazen might have an idea...
>>>>
>>>
>>> How about having EDAC load ATL when ready?
>>
>> The thing is, AMD_ATL can also be built-in so then request_module doesn't make
>> sense.
>>
>> Which means, if we have to "tie" it to amd64_edac detection, we'd have to make
>> it synchronize its Kconfig setting to the CONFIG_EDAC_AMD64 setting.
>>
>> Or we could simply say that AMD_ATL is a module only because if anything needs
>> it, then anything should request it. And that makes sense because the address
>> translation should be present only when something else loads which is at all
>> capable of presenting addresses which can be translated.
>>
>> IOW, AMD_ATL should not be builtin at all because, well, it doesn't make any
>> sense for it to be. IOW, its existence alone on the system makes a little
>> sense if there's no address producer like amd64_edac or whatever else calls
>> amd_convert_umc_mca_addr_to_sys_addr()...
>>
>> Hmmm.
>>
>> It sure sounds weird...
>>
>
> So AMD_ATL *can* be built-in, but it is default 'N'. CONFIG_EDAC_AMD64
> has 'imply AMD_ATL', so CONFIG_AMD_ATL=CONFIG_EDAC_AMD64.
>
> If CONFIG_EDAC_AMD64=m, then CONFIG_AMD_ATL=m. I think this would be the
> default for most users. EDAC will fail to load on systems without DRAM
> ECC, so AMD_ATL won't load either.
>
> If CONFIG_EDAC_AMD64=y, then CONFIG_AMD_ATL=y. I expect that a user that
> wants EDAC built-in knows their system will use it.
>
> Thanks,
> Yazen

Are there "going" to be other consumers of AMD_ATL planned? I wonder if it
should just be structured as part of amd64_edac and only registered once we
know there is ECC support.

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Yazen Ghannam @ 2026-03-10 16:52 UTC
To: Mario Limonciello
Cc: Borislav Petkov, Tony Luck, superm1, linux-edac

On Tue, Mar 10, 2026 at 09:58:18AM -0500, Mario Limonciello wrote:
> On 3/10/26 7:54 AM, Yazen Ghannam wrote:
> > On Sat, Mar 07, 2026 at 04:12:31PM +0100, Borislav Petkov wrote:
> > > On Sat, Mar 07, 2026 at 09:49:10AM -0500, Yazen Ghannam wrote:
> > > > On Fri, Mar 06, 2026 at 04:49:47PM +0100, Borislav Petkov wrote:
> > > > > On Fri, Mar 06, 2026 at 09:40:06AM -0600, Mario Limonciello wrote:
> > > > > > But don't you need to use UMC to discover that? Chicken and egg type of
> > > > > > issue.
> > > > >
> > > > > Probably...
> > > > >
> > > > > And we already do that in amd64_edac. So perhaps we could export an API or so.
> > > > > Yazen might have an idea...
> > > > >
> > > >
> > > > How about having EDAC load ATL when ready?
> > >
> > > The thing is, AMD_ATL can also be built-in so then request_module doesn't make
> > > sense.
> > >
> > > Which means, if we have to "tie" it to amd64_edac detection, we'd have to make
> > > it synchronize its Kconfig setting to the CONFIG_EDAC_AMD64 setting.
> > >
> > > Or we could simply say that AMD_ATL is a module only because if anything needs
> > > it, then anything should request it. And that makes sense because the address
> > > translation should be present only when something else loads which is at all
> > > capable of presenting addresses which can be translated.
> > >
> > > IOW, AMD_ATL should not be builtin at all because, well, it doesn't make any
> > > sense for it to be. IOW, its existence alone on the system makes a little
> > > sense if there's no address producer like amd64_edac or whatever else calls
> > > amd_convert_umc_mca_addr_to_sys_addr()...
> > >
> > > Hmmm.
> > >
> > > It sure sounds weird...
> > >
> >
> > So AMD_ATL *can* be built-in, but it is default 'N'. CONFIG_EDAC_AMD64
> > has 'imply AMD_ATL', so CONFIG_AMD_ATL=CONFIG_EDAC_AMD64.
> >
> > If CONFIG_EDAC_AMD64=m, then CONFIG_AMD_ATL=m. I think this would be the
> > default for most users. EDAC will fail to load on systems without DRAM
> > ECC, so AMD_ATL won't load either.
> >
> > If CONFIG_EDAC_AMD64=y, then CONFIG_AMD_ATL=y. I expect that a user that
> > wants EDAC built-in knows their system will use it.
> >
> > Thanks,
> > Yazen
>
> Are there "going" to be other consumers of AMD_ATL planned? I wonder if it
> should just be structured as part of amd64_edac and only registered once we
> know there is ECC support.
>

Yes, actually address translation was once part of amd64_edac. But there
is so much code that it was nice to move it to a separate library.
Another example of this decoupling is with ACPI_ADXL that Intel uses.

There is another user of AMD_ATL: RAS_FMPM.

Also, I do have it in mind that AMD_ATL can be used by other places or
independently. I have some old WIP where we use it with MCE to do
preemptive page offlining.

EDAC for x86 is mostly counting and decoding. You could leverage address
translation without wanting to use EDAC. And you can use EDAC without
address translation.

Thanks,
Yazen

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Mario Limonciello @ 2026-03-10 17:47 UTC
To: Yazen Ghannam
Cc: Borislav Petkov, Tony Luck, superm1, linux-edac

On 3/10/26 11:52 AM, Yazen Ghannam wrote:
> On Tue, Mar 10, 2026 at 09:58:18AM -0500, Mario Limonciello wrote:
>> On 3/10/26 7:54 AM, Yazen Ghannam wrote:
>>> On Sat, Mar 07, 2026 at 04:12:31PM +0100, Borislav Petkov wrote:
>>>> On Sat, Mar 07, 2026 at 09:49:10AM -0500, Yazen Ghannam wrote:
>>>>> On Fri, Mar 06, 2026 at 04:49:47PM +0100, Borislav Petkov wrote:
>>>>>> On Fri, Mar 06, 2026 at 09:40:06AM -0600, Mario Limonciello wrote:
>>>>>>> But don't you need to use UMC to discover that? Chicken and egg type of
>>>>>>> issue.
>>>>>>
>>>>>> Probably...
>>>>>>
>>>>>> And we already do that in amd64_edac. So perhaps we could export an API or so.
>>>>>> Yazen might have an idea...
>>>>>>
>>>>>
>>>>> How about having EDAC load ATL when ready?
>>>>
>>>> The thing is, AMD_ATL can also be built-in so then request_module doesn't make
>>>> sense.
>>>>
>>>> Which means, if we have to "tie" it to amd64_edac detection, we'd have to make
>>>> it synchronize its Kconfig setting to the CONFIG_EDAC_AMD64 setting.
>>>>
>>>> Or we could simply say that AMD_ATL is a module only because if anything needs
>>>> it, then anything should request it. And that makes sense because the address
>>>> translation should be present only when something else loads which is at all
>>>> capable of presenting addresses which can be translated.
>>>>
>>>> IOW, AMD_ATL should not be builtin at all because, well, it doesn't make any
>>>> sense for it to be. IOW, its existence alone on the system makes a little
>>>> sense if there's no address producer like amd64_edac or whatever else calls
>>>> amd_convert_umc_mca_addr_to_sys_addr()...
>>>>
>>>> Hmmm.
>>>>
>>>> It sure sounds weird...
>>>>
>>>
>>> So AMD_ATL *can* be built-in, but it is default 'N'. CONFIG_EDAC_AMD64
>>> has 'imply AMD_ATL', so CONFIG_AMD_ATL=CONFIG_EDAC_AMD64.
>>>
>>> If CONFIG_EDAC_AMD64=m, then CONFIG_AMD_ATL=m. I think this would be the
>>> default for most users. EDAC will fail to load on systems without DRAM
>>> ECC, so AMD_ATL won't load either.
>>>
>>> If CONFIG_EDAC_AMD64=y, then CONFIG_AMD_ATL=y. I expect that a user that
>>> wants EDAC built-in knows their system will use it.
>>>
>>> Thanks,
>>> Yazen
>>
>> Are there "going" to be other consumers of AMD_ATL planned? I wonder if it
>> should just be structured as part of amd64_edac and only registered once we
>> know there is ECC support.
>>
>
> Yes, actually address translation was once part of amd64_edac. But there
> is so much code that it was nice to move it to a separate library.
> Another example of this decoupling is with ACPI_ADXL that Intel uses.
>
> There is another user of AMD_ATL: RAS_FMPM.
>
> Also, I do have it in mind that AMD_ATL can be used by other places or
> independently. I have some old WIP where we use it with MCE to do
> preemptive page offlining.
>
> EDAC for x86 is mostly counting and decoding. You could leverage address
> translation without wanting to use EDAC. And you can use EDAC without
> address translation.
>
> Thanks,
> Yazen

OK.

I don't think the assertion that amd-atl won't load by default on most
systems without ECC is correct. It has a modalias that looks for features
(X86_FEATURE_ZEN specifically).

alias: cpu:type:x86,ven*fam*mod*:feature:*00FC*
alias: cpu:type:x86,ven*fam*mod*:feature:*0223*

If you look at this all in a vacuum - is turning down the messaging to debug
for amd-atl so bad?
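
[Editor's sketch] To illustrate why the two aliases quoted above cause autoloading on essentially every Zen system regardless of ECC, here is a small userspace approximation using POSIX fnmatch(). This is an assumption-laden model, not how modprobe is implemented: real module tools resolve aliases through their own index, and the CPU modalias strings used in the usage note are made up for illustration. It also assumes that 00FC and 0223 are the hex indexes of the Zen and SMCA feature flags, which matches the "Zen/SMCA" wording in Yazen's patch description.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* The two alias patterns quoted in the message above. */
static const char *const amd_atl_aliases[] = {
	"cpu:type:x86,ven*fam*mod*:feature:*00FC*",
	"cpu:type:x86,ven*fam*mod*:feature:*0223*",
};

/*
 * Returns true if any alias pattern matches the given CPU modalias
 * string, i.e. if autoloading would be triggered in this simplified
 * model. Note that nothing here looks at ECC state.
 */
static bool atl_would_autoload(const char *cpu_modalias)
{
	size_t i;

	for (i = 0; i < sizeof(amd_atl_aliases) / sizeof(amd_atl_aliases[0]); i++) {
		if (fnmatch(amd_atl_aliases[i], cpu_modalias, 0) == 0)
			return true;
	}

	return false;
}
```

With a hypothetical modalias such as "cpu:type:x86,ven0002fam0019mod0061:feature:,0000,0001,00FC,0223" the function reports a match, while a string whose feature list contains neither value does not match. Dropping MODULE_DEVICE_TABLE(), as in Yazen's patch, removes these aliases and with them the automatic trigger.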

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Yazen Ghannam @ 2026-03-11 13:38 UTC
To: Mario Limonciello
Cc: Borislav Petkov, Tony Luck, superm1, linux-edac

On Tue, Mar 10, 2026 at 12:47:39PM -0500, Mario Limonciello wrote:

[...]

>
> I don't think the assertion that amd-atl won't load by default on most
> systems without ECC is correct. It has a modalias that looks for features
> (X86_FEATURE_ZEN specifically).
>
> alias: cpu:type:x86,ven*fam*mod*:feature:*00FC*
> alias: cpu:type:x86,ven*fam*mod*:feature:*0223*
>

Is this with the patch I attached?

>
>
> If you look at this all in a vacuum - is turning down the messaging to debug
> for amd-atl so bad?
>

That works for me.

Thanks,
Yazen

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Mario Limonciello @ 2026-03-11 14:28 UTC
To: Yazen Ghannam
Cc: Borislav Petkov, Tony Luck, superm1, linux-edac

On 3/11/26 08:38, Yazen Ghannam wrote:
> On Tue, Mar 10, 2026 at 12:47:39PM -0500, Mario Limonciello wrote:
>
> [...]
>
>>
>> I don't think the assertion that amd-atl won't load by default on most
>> systems without ECC is correct. It has a modalias that looks for features
>> (X86_FEATURE_ZEN specifically).
>>
>> alias: cpu:type:x86,ven*fam*mod*:feature:*00FC*
>> alias: cpu:type:x86,ven*fam*mod*:feature:*0223*
>>
>
> Is this with the patch I attached?
>
>>
>>
>> If you look at this all in a vacuum - is turning down the messaging to debug
>> for amd-atl so bad?
>>
>
> That works for me.
>
> Thanks,
> Yazen

Sorry; I totally missed that patch! That should improve the situation for
most people.

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Mario Limonciello @ 2026-03-18 15:43 UTC
To: Yazen Ghannam, Borislav Petkov
Cc: Tony Luck, superm1, linux-edac, Shrirang Deskhmukh

On 3/7/2026 8:49 AM, Yazen Ghannam wrote:
> On Fri, Mar 06, 2026 at 04:49:47PM +0100, Borislav Petkov wrote:
>> On Fri, Mar 06, 2026 at 09:40:06AM -0600, Mario Limonciello wrote:
>>> But don't you need to use UMC to discover that? Chicken and egg type of
>>> issue.
>>
>> Probably...
>>
>> And we already do that in amd64_edac. So perhaps we could export an API or so.
>> Yazen might have an idea...
>>
>
> How about having EDAC load ATL when ready?
>
> Thanks,
> Yazen
>
> ---
>
> From d4e3cdb2efb34ccb2c234a4b227d0301327ad340 Mon Sep 17 00:00:00 2001
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> Date: Sat, 7 Mar 2026 08:58:56 -0500
> Subject: [PATCH] RAS/AMD/ATL, EDAC/amd64: Only load ATL when needed
>
> The AMD Address Translation Library (ATL) will attempt to load on all
> AMD Zen/SMCA systems.
>
> However, only systems with DRAM ECC enabled will use the library. Other
> systems will fail to load the library and produce an unnecessary message
> to the user.
>
> Remove the ATL module dependency table to prevent autoloading. Request
> ATL to load from EDAC once all system checks are complete.
>
> Fixes: 3f3174996be6 ("RAS: Introduce AMD Address Translation Library")
> Reported-by: Mario Limonciello <mario.limonciello@amd.com>
> Closes: https://lore.kernel.org/20260305154528.1171999-1-mario.limonciello@amd.com
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>

One of my colleagues tested it.

Tested-by: Deskhmukh Shrirang <Shrirang.Deskhmukh@amd.com>

> ---
>  drivers/edac/amd64_edac.c  | 2 ++
>  drivers/ras/amd/atl/core.c | 1 -
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
> index 8908ab881c85..7b04f7c5e2ba 100644
> --- a/drivers/edac/amd64_edac.c
> +++ b/drivers/edac/amd64_edac.c
> @@ -4170,6 +4170,8 @@ static int __init amd64_edac_init(void)
>  		goto err_pci;
>  	}
>
> +	request_module("amd_atl");
> +
>  	/* register stuff with EDAC MCE */
>  	if (boot_cpu_data.x86 >= 0x17) {
>  		amd_register_ecc_decoder(decode_umc_error);
> diff --git a/drivers/ras/amd/atl/core.c b/drivers/ras/amd/atl/core.c
> index 0f7cd6dab0b0..d77dacdd4f56 100644
> --- a/drivers/ras/amd/atl/core.c
> +++ b/drivers/ras/amd/atl/core.c
> @@ -190,7 +190,6 @@ static const struct x86_cpu_id amd_atl_cpuids[] = {
>  	X86_MATCH_FEATURE(X86_FEATURE_ZEN, NULL),
>  	{ }
>  };
> -MODULE_DEVICE_TABLE(x86cpu, amd_atl_cpuids);

Pretty sure this means you can drop amd_atl_cpuids as well.

>
>  static int __init amd_atl_init(void)
>  {

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Yazen Ghannam @ 2026-03-19 14:36 UTC
To: Mario Limonciello
Cc: Borislav Petkov, Tony Luck, superm1, linux-edac, Shrirang Deskhmukh

On Wed, Mar 18, 2026 at 10:43:04AM -0500, Mario Limonciello wrote:

[...]

> > diff --git a/drivers/ras/amd/atl/core.c b/drivers/ras/amd/atl/core.c
> > index 0f7cd6dab0b0..d77dacdd4f56 100644
> > --- a/drivers/ras/amd/atl/core.c
> > +++ b/drivers/ras/amd/atl/core.c
> > @@ -190,7 +190,6 @@ static const struct x86_cpu_id amd_atl_cpuids[] = {
> >  	X86_MATCH_FEATURE(X86_FEATURE_ZEN, NULL),
> >  	{ }
> >  };
> > -MODULE_DEVICE_TABLE(x86cpu, amd_atl_cpuids);
>
> Pretty sure this means you can drop amd_atl_cpuids as well.

Yeah, maybe.

The request_module() in EDAC is unconditional. We use the same EDAC
module for multiple generations of CPUs.

The amd_atl_cpuids check will keep the amd_atl module off the legacy
systems.

We could redo the flow. But there needs to be a feature check somewhere.
The proposed patch is the smallest diff I thought of at the time.

Thanks,
Yazen

* Re: [PATCH] RAS/AMD/ATL: Decrease message about unknown DF revision to debug
From: Mario Limonciello @ 2026-03-20 3:30 UTC
To: Yazen Ghannam
Cc: Borislav Petkov, Tony Luck, superm1, linux-edac, Shrirang Deskhmukh

On 3/19/2026 9:36 AM, Yazen Ghannam wrote:
> On Wed, Mar 18, 2026 at 10:43:04AM -0500, Mario Limonciello wrote:
>
> [...]
>
>>> diff --git a/drivers/ras/amd/atl/core.c b/drivers/ras/amd/atl/core.c
>>> index 0f7cd6dab0b0..d77dacdd4f56 100644
>>> --- a/drivers/ras/amd/atl/core.c
>>> +++ b/drivers/ras/amd/atl/core.c
>>> @@ -190,7 +190,6 @@ static const struct x86_cpu_id amd_atl_cpuids[] = {
>>>  	X86_MATCH_FEATURE(X86_FEATURE_ZEN, NULL),
>>>  	{ }
>>>  };
>>> -MODULE_DEVICE_TABLE(x86cpu, amd_atl_cpuids);
>>
>> Pretty sure this means you can drop amd_atl_cpuids as well.
>
> Yeah, maybe.
>
> The request_module() in EDAC is unconditional. We use the same EDAC
> module for multiple generations of CPUs.
>
> The amd_atl_cpuids check will keep the amd_atl module off the legacy
> systems.
>
> We could redo the flow. But there needs to be a feature check somewhere.
> The proposed patch is the smallest diff I thought of at the time.
>
> Thanks,
> Yazen

Well, to be fair, there was no CPU check before - the module device table
just controlled automatic loading.

But yeah, I think a simple CPU feature check during probe along with the
suggested change would solve this.
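
[Editor's sketch] The flow the thread converges on (EDAC requests ATL only after its own ECC checks pass, and ATL itself refuses to initialize on CPUs without the Zen feature) can be modelled in userspace C as below. No such patch appears in this thread, so everything here is a hypothetical model: the `cpu_sketch` structure, the function names, and the -ENODEV return value are assumptions used only to show the ordering of the checks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of a system; not a kernel structure. */
struct cpu_sketch {
	bool is_zen;       /* stands in for a Zen CPU feature check */
	bool ecc_enabled;  /* stands in for EDAC's DRAM ECC detection */
};

/*
 * Models an ATL init routine with an explicit feature check, as
 * suggested at the end of the thread. -ENODEV is an assumed value.
 */
static int atl_init_checked_sketch(const struct cpu_sketch *cpu)
{
	if (!cpu->is_zen)
		return -ENODEV;

	return 0;
}

/*
 * Models the EDAC init path: without ECC, EDAC fails to load and
 * never requests ATL. With ECC, the request is unconditional and the
 * feature check inside ATL decides whether it stays loaded.
 * Returns true if EDAC loaded; *atl_loaded reports the ATL outcome.
 */
static bool edac_init_sketch(const struct cpu_sketch *cpu, bool *atl_loaded)
{
	*atl_loaded = false;

	if (!cpu->ecc_enabled)
		return false;

	/* Stand-in for request_module("amd_atl") followed by ATL's init. */
	*atl_loaded = (atl_init_checked_sketch(cpu) == 0);
	return true;
}
```

In this model a Zen client without ECC never touches ATL, a legacy ECC system loads EDAC but keeps ATL off, and only a Zen system with ECC ends up with both, which is the outcome both Yazen and Mario describe wanting.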