* [PATCH v4] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs
@ 2025-05-13 19:20 Avadhut Naik
2025-05-22 16:12 ` Naik, Avadhut
2025-05-28 9:22 ` Borislav Petkov
0 siblings, 2 replies; 4+ messages in thread
From: Avadhut Naik @ 2025-05-13 19:20 UTC (permalink / raw)
To: linux-edac
Cc: bp, linux-kernel, avadnaik, Žilvinas Žaltiena,
Yazen Ghannam
Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD's
modern Zen-based SOCs has an Address Mask and a Secondary Address Mask
register associated with it. The amd64_edac module logs DIMM sizes on a
per-UMC per-CS granularity during init using these two registers.
Currently, the module primarily considers only the Address Mask register
for computing DIMM sizes. The Secondary Address Mask register is only
considered for odd CS. Additionally, if it has been considered, the
Address Mask register is ignored altogether for that CS. For
power-of-two DIMMs, this is not an issue since only the Address Mask
register is used.
For non-power-of-two DIMMs, however, the Secondary Address Mask register
is used in conjunction with the Address Mask register. However, since the
module only considers either of the two registers for a CS, the size
computed by the module is incorrect. The Secondary Address Mask register
is not considered for even CS, and the Address Mask register is not
considered for odd CS.
Introduce a new helper function so that both Address Mask and Secondary
Address Mask registers are considered, when valid, for computing DIMM
sizes. Furthermore, also rename some variables for greater clarity.
Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
Reported-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Tested-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
```
Changes in v2:
1. Avoid unnecessary variable initialization.
2. Modify commit message to accurately reflect the changes.
3. Move check for non-zero Address Mask register into the new helper.
Changes in v3:
1. Add the missing Closes tag and rearrange tags per tip tree handbook.
3. Slightly modify commit message to properly reflect the SOCs that may
encounter this issue.
4. Rebase on top of edac-for-next.
Changes in v4:
1. Rebase on top of edac-for-next.
Links:
v1: https://lore.kernel.org/all/20250327210718.1640762-1-avadhut.naik@amd.com/
v2: https://lore.kernel.org/all/20250415213150.755255-1-avadhut.naik@amd.com/
v3: https://lore.kernel.org/all/20250416222552.1686475-1-avadhut.naik@amd.com/
---
drivers/edac/amd64_edac.c | 57 ++++++++++++++++++++++++---------------
1 file changed, 36 insertions(+), 21 deletions(-)
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 90f0eb7cc5b9..9e8b8bd2be32 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1209,7 +1209,9 @@ static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
if (csrow_enabled(2 * dimm + 1, ctrl, pvt))
cs_mode |= CS_ODD_PRIMARY;
- /* Asymmetric dual-rank DIMM support. */
+ if (csrow_sec_enabled(2 * dimm, ctrl, pvt))
+ cs_mode |= CS_EVEN_SECONDARY;
+
if (csrow_sec_enabled(2 * dimm + 1, ctrl, pvt))
cs_mode |= CS_ODD_SECONDARY;
@@ -1230,12 +1232,13 @@ static int umc_get_cs_mode(int dimm, u8 ctrl, struct amd64_pvt *pvt)
return cs_mode;
}
-static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
- int csrow_nr, int dimm)
+static int calculate_cs_size(u32 mask, unsigned int cs_mode)
{
- u32 msb, weight, num_zero_bits;
- u32 addr_mask_deinterleaved;
- int size = 0;
+ int msb, weight, num_zero_bits;
+ u32 deinterleaved_mask;
+
+ if (!mask)
+ return 0;
/*
* The number of zero bits in the mask is equal to the number of bits
@@ -1248,19 +1251,30 @@ static int __addr_mask_to_cs_size(u32 addr_mask_orig, unsigned int cs_mode,
* without swapping with the most significant bit. This can be handled
* by keeping the MSB where it is and ignoring the single zero bit.
*/
- msb = fls(addr_mask_orig) - 1;
- weight = hweight_long(addr_mask_orig);
+ msb = fls(mask) - 1;
+ weight = hweight_long(mask);
num_zero_bits = msb - weight - !!(cs_mode & CS_3R_INTERLEAVE);
/* Take the number of zero bits off from the top of the mask. */
- addr_mask_deinterleaved = GENMASK_ULL(msb - num_zero_bits, 1);
+ deinterleaved_mask = GENMASK(msb - num_zero_bits, 1);
+ edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", deinterleaved_mask);
+
+ return (deinterleaved_mask >> 2) + 1;
+}
+
+static int __addr_mask_to_cs_size(u32 addr_mask, u32 addr_mask_sec,
+ unsigned int cs_mode, int csrow_nr, int dimm)
+{
+ int size;
edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
- edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
- edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
+ edac_dbg(1, " Primary AddrMask: 0x%x\n", addr_mask);
/* Register [31:1] = Address [39:9]. Size is in kBs here. */
- size = (addr_mask_deinterleaved >> 2) + 1;
+ size = calculate_cs_size(addr_mask, cs_mode);
+
+ edac_dbg(1, " Secondary AddrMask: 0x%x\n", addr_mask_sec);
+ size += calculate_cs_size(addr_mask_sec, cs_mode);
/* Return size in MBs. */
return size >> 10;
@@ -1270,7 +1284,7 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
unsigned int cs_mode, int csrow_nr)
{
int cs_mask_nr = csrow_nr;
- u32 addr_mask_orig;
+ u32 addr_mask = 0, addr_mask_sec = 0;
int dimm, size = 0;
/* No Chip Selects are enabled. */
@@ -1308,13 +1322,13 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
if (!pvt->flags.zn_regs_v2)
cs_mask_nr >>= 1;
- /* Asymmetric dual-rank DIMM support. */
- if ((csrow_nr & 1) && (cs_mode & CS_ODD_SECONDARY))
- addr_mask_orig = pvt->csels[umc].csmasks_sec[cs_mask_nr];
- else
- addr_mask_orig = pvt->csels[umc].csmasks[cs_mask_nr];
+ if (cs_mode & (CS_EVEN_PRIMARY | CS_ODD_PRIMARY))
+ addr_mask = pvt->csels[umc].csmasks[cs_mask_nr];
+
+ if (cs_mode & (CS_EVEN_SECONDARY | CS_ODD_SECONDARY))
+ addr_mask_sec = pvt->csels[umc].csmasks_sec[cs_mask_nr];
- return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, dimm);
+ return __addr_mask_to_cs_size(addr_mask, addr_mask_sec, cs_mode, csrow_nr, dimm);
}
static void umc_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
@@ -3512,9 +3526,10 @@ static void gpu_get_err_info(struct mce *m, struct err_info *err)
static int gpu_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
unsigned int cs_mode, int csrow_nr)
{
- u32 addr_mask_orig = pvt->csels[umc].csmasks[csrow_nr];
+ u32 addr_mask = pvt->csels[umc].csmasks[csrow_nr];
+ u32 addr_mask_sec = pvt->csels[umc].csmasks_sec[csrow_nr];
- return __addr_mask_to_cs_size(addr_mask_orig, cs_mode, csrow_nr, csrow_nr >> 1);
+ return __addr_mask_to_cs_size(addr_mask, addr_mask_sec, cs_mode, csrow_nr, csrow_nr >> 1);
}
static void gpu_debug_display_dimm_sizes(struct amd64_pvt *pvt, u8 ctrl)
base-commit: 4521b86e4a6ef9efff329ef18120b1520059ae4e
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v4] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs
2025-05-13 19:20 [PATCH v4] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs Avadhut Naik
@ 2025-05-22 16:12 ` Naik, Avadhut
2025-05-28 9:22 ` Borislav Petkov
1 sibling, 0 replies; 4+ messages in thread
From: Naik, Avadhut @ 2025-05-22 16:12 UTC (permalink / raw)
To: Avadhut Naik
Cc: bp, linux-kernel, Žilvinas Žaltiena, Yazen Ghannam,
linux-edac
Hi,
Any further feedback on this patch?
--
Thanks,
Avadhut Naik
On 5/13/2025 14:20, Avadhut Naik wrote:
> Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD's
> modern Zen-based SOCs has an Address Mask and a Secondary Address Mask
> register associated with it. The amd64_edac module logs DIMM sizes on a
> per-UMC per-CS granularity during init using these two registers.
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v4] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs
2025-05-13 19:20 [PATCH v4] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs Avadhut Naik
2025-05-22 16:12 ` Naik, Avadhut
@ 2025-05-28 9:22 ` Borislav Petkov
2025-05-28 19:38 ` Naik, Avadhut
1 sibling, 1 reply; 4+ messages in thread
From: Borislav Petkov @ 2025-05-28 9:22 UTC (permalink / raw)
To: Avadhut Naik
Cc: linux-edac, linux-kernel, avadnaik, Žilvinas Žaltiena,
Yazen Ghannam
On Tue, May 13, 2025 at 07:20:11PM +0000, Avadhut Naik wrote:
> Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD's
> modern Zen-based SOCs has an Address Mask and a Secondary Address Mask
> register associated with it. The amd64_edac module logs DIMM sizes on a
> per-UMC per-CS granularity during init using these two registers.
>
> Currently, the module primarily considers only the Address Mask register
> for computing DIMM sizes. The Secondary Address Mask register is only
> considered for odd CS. Additionally, if it has been considered, the
> Address Mask register is ignored altogether for that CS. For
> power-of-two DIMMs, this is not an issue since only the Address Mask
What are power-of-two DIMMs?
The number of DIMMs on the system is a 2^x?
Their ranks are a power of two?
Their combined size is not power of two?
One can only guess...
> register is used.
>
> For non-power-of-two DIMMs, however, the Secondary Address Mask register
> is used in conjunction with the Address Mask register. However, since the
> module only considers either of the two registers for a CS, the size
> computed by the module is incorrect.
Yah, it must be something about the size...
> The Secondary Address Mask register
> is not considered for even CS, and the Address Mask register is not
> considered for odd CS.
>
> Introduce a new helper function so that both Address Mask and Secondary
> Address Mask registers are considered, when valid, for computing DIMM
> sizes. Furthermore, also rename some variables for greater clarity.
So it is non-power-of-two sized DIMMs?
IOW, DIMMs whose size is not a power of two?
> Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
> Reported-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
> Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> Tested-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
> Cc: stable@vger.kernel.org
All that changelog stuff...
> ```
> Changes in v2:
> 1. Avoid unnecessary variable initialization.
> 2. Modify commit message to accurately reflect the changes.
> 3. Move check for non-zero Address Mask register into the new helper.
>
> Changes in v3:
> 1. Add the missing Closes tag and rearrange tags per tip tree handbook.
> 3. Slightly modify commit message to properly reflect the SOCs that may
> encounter this issue.
> 4. Rebase on top of edac-for-next.
>
> Changes in v4:
> 1. Rebase on top of edac-for-next.
>
> Links:
> v1: https://lore.kernel.org/all/20250327210718.1640762-1-avadhut.naik@amd.com/
> v2: https://lore.kernel.org/all/20250415213150.755255-1-avadhut.naik@amd.com/
> v3: https://lore.kernel.org/all/20250416222552.1686475-1-avadhut.naik@amd.com/
> ---
<--- ... goes here, under the --- line so that patch handling tools can ignore
it.
> drivers/edac/amd64_edac.c | 57 ++++++++++++++++++++++++---------------
> 1 file changed, 36 insertions(+), 21 deletions(-)
...
> +static int __addr_mask_to_cs_size(u32 addr_mask, u32 addr_mask_sec,
> + unsigned int cs_mode, int csrow_nr, int dimm)
> +{
> + int size;
>
> edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
> - edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
> - edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
> + edac_dbg(1, " Primary AddrMask: 0x%x\n", addr_mask);
>
> /* Register [31:1] = Address [39:9]. Size is in kBs here. */
> - size = (addr_mask_deinterleaved >> 2) + 1;
> + size = calculate_cs_size(addr_mask, cs_mode);
> +
> + edac_dbg(1, " Secondary AddrMask: 0x%x\n", addr_mask_sec);
> + size += calculate_cs_size(addr_mask_sec, cs_mode);
>
> /* Return size in MBs. */
> return size >> 10;
> @@ -1270,7 +1284,7 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
> unsigned int cs_mode, int csrow_nr)
> {
> int cs_mask_nr = csrow_nr;
> - u32 addr_mask_orig;
> + u32 addr_mask = 0, addr_mask_sec = 0;
> int dimm, size = 0;
The EDAC tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::
struct long_struct_name *descriptive_name;
unsigned long foo, bar;
unsigned int tmp;
int ret;
The above is faster to parse than the reverse ordering::
int ret;
unsigned int tmp;
unsigned long foo, bar;
struct long_struct_name *descriptive_name;
And even more so than random ordering::
unsigned long foo, bar;
int ret;
struct long_struct_name *descriptive_name;
unsigned int tmp;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v4] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs
2025-05-28 9:22 ` Borislav Petkov
@ 2025-05-28 19:38 ` Naik, Avadhut
0 siblings, 0 replies; 4+ messages in thread
From: Naik, Avadhut @ 2025-05-28 19:38 UTC (permalink / raw)
To: Borislav Petkov
Cc: linux-edac, linux-kernel, Žilvinas Žaltiena,
Yazen Ghannam, Avadhut Naik
Hi,
On 5/28/2025 04:22, Borislav Petkov wrote:
> On Tue, May 13, 2025 at 07:20:11PM +0000, Avadhut Naik wrote:
>> Each Chip-Select (CS) of a Unified Memory Controller (UMC) on AMD's
>> modern Zen-based SOCs has an Address Mask and a Secondary Address Mask
>> register associated with it. The amd64_edac module logs DIMM sizes on a
>> per-UMC per-CS granularity during init using these two registers.
>>
>> Currently, the module primarily considers only the Address Mask register
>> for computing DIMM sizes. The Secondary Address Mask register is only
>> considered for odd CS. Additionally, if it has been considered, the
>> Address Mask register is ignored altogether for that CS. For
>> power-of-two DIMMs, this is not an issue since only the Address Mask
>
> What are power-of-two DIMMs?
>
> The number of DIMMs on the system is a 2^x?
>
> Their ranks are a power of two?
>
> Their combined size is not power of two?
>
> One can only guess...
>
By power-of-two DIMMs, I mean the DIMMs whose combined i.e .total size
is a power of two. Example: 16 GB, 32 GB or 64 GB DIMMs.
Will mention that explicitly in the commit message.
>> register is used.
>>
>> For non-power-of-two DIMMs, however, the Secondary Address Mask register
>> is used in conjunction with the Address Mask register. However, since the
>> module only considers either of the two registers for a CS, the size
>> computed by the module is incorrect.
>
> Yah, it must be something about the size...
>
>> The Secondary Address Mask register
>> is not considered for even CS, and the Address Mask register is not
>> considered for odd CS.
>>
>> Introduce a new helper function so that both Address Mask and Secondary
>> Address Mask registers are considered, when valid, for computing DIMM
>> sizes. Furthermore, also rename some variables for greater clarity.
>
> So it is non-power-of-two sized DIMMs?
>
> IOW, DIMMs whose size is not a power of two?
>
Yes, non-power-of-2 DIMMs are those DIMMs whose combined i.e. total size
is not a power of two. Example: 24 GB, 48 GB or 96 GB DIMMs.
>> Fixes: 81f5090db843 ("EDAC/amd64: Support asymmetric dual-rank DIMMs")
>> Reported-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
>> Closes: https://lore.kernel.org/dbec22b6-00f2-498b-b70d-ab6f8a5ec87e@natrix.lt
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> Tested-by: Žilvinas Žaltiena <zilvinas@natrix.lt>
>> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>> Cc: stable@vger.kernel.org
>
> All that changelog stuff...
>
>> ```
>> Changes in v2:
>> 1. Avoid unnecessary variable initialization.
>> 2. Modify commit message to accurately reflect the changes.
>> 3. Move check for non-zero Address Mask register into the new helper.
>>
>> Changes in v3:
>> 1. Add the missing Closes tag and rearrange tags per tip tree handbook.
>> 3. Slightly modify commit message to properly reflect the SOCs that may
>> encounter this issue.
>> 4. Rebase on top of edac-for-next.
>>
>> Changes in v4:
>> 1. Rebase on top of edac-for-next.
>>
>> Links:
>> v1: https://lore.kernel.org/all/20250327210718.1640762-1-avadhut.naik@amd.com/
>> v2: https://lore.kernel.org/all/20250415213150.755255-1-avadhut.naik@amd.com/
>> v3: https://lore.kernel.org/all/20250416222552.1686475-1-avadhut.naik@amd.com/
>> ---
>
> <--- ... goes here, under the --- line so that patch handling tools can ignore
> it.
>
This is an OOPS!
Thanks for catching this! Will fix it.
>> drivers/edac/amd64_edac.c | 57 ++++++++++++++++++++++++---------------
>> 1 file changed, 36 insertions(+), 21 deletions(-)
>
> ...
>
>> +static int __addr_mask_to_cs_size(u32 addr_mask, u32 addr_mask_sec,
>> + unsigned int cs_mode, int csrow_nr, int dimm)
>> +{
>> + int size;
>>
>> edac_dbg(1, "CS%d DIMM%d AddrMasks:\n", csrow_nr, dimm);
>> - edac_dbg(1, " Original AddrMask: 0x%x\n", addr_mask_orig);
>> - edac_dbg(1, " Deinterleaved AddrMask: 0x%x\n", addr_mask_deinterleaved);
>> + edac_dbg(1, " Primary AddrMask: 0x%x\n", addr_mask);
>>
>> /* Register [31:1] = Address [39:9]. Size is in kBs here. */
>> - size = (addr_mask_deinterleaved >> 2) + 1;
>> + size = calculate_cs_size(addr_mask, cs_mode);
>> +
>> + edac_dbg(1, " Secondary AddrMask: 0x%x\n", addr_mask_sec);
>> + size += calculate_cs_size(addr_mask_sec, cs_mode);
>>
>> /* Return size in MBs. */
>> return size >> 10;
>> @@ -1270,7 +1284,7 @@ static int umc_addr_mask_to_cs_size(struct amd64_pvt *pvt, u8 umc,
>> unsigned int cs_mode, int csrow_nr)
>> {
>> int cs_mask_nr = csrow_nr;
>> - u32 addr_mask_orig;
>> + u32 addr_mask = 0, addr_mask_sec = 0;
>> int dimm, size = 0;
>
> The EDAC tree preferred ordering of variable declarations at the
> beginning of a function is reverse fir tree order::
>
> struct long_struct_name *descriptive_name;
> unsigned long foo, bar;
> unsigned int tmp;
> int ret;
>
> The above is faster to parse than the reverse ordering::
>
> int ret;
> unsigned int tmp;
> unsigned long foo, bar;
> struct long_struct_name *descriptive_name;
>
> And even more so than random ordering::
>
> unsigned long foo, bar;
> int ret;
> struct long_struct_name *descriptive_name;
> unsigned int tmp;
>
Will change them to reverse fir tree order.
Thank you for the feedback!
--
Thanks,
Avadhut Naik
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-05-28 19:38 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-13 19:20 [PATCH v4] EDAC/amd64: Fix size calculation for Non-Power-of-Two DIMMs Avadhut Naik
2025-05-22 16:12 ` Naik, Avadhut
2025-05-28 9:22 ` Borislav Petkov
2025-05-28 19:38 ` Naik, Avadhut
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).