* [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits @ 2025-11-24 8:40 Nikolay Borisov 2025-11-24 8:57 ` Kuan-Wei Chiu 2025-11-25 9:35 ` david laight 0 siblings, 2 replies; 8+ messages in thread From: Nikolay Borisov @ 2025-11-24 8:40 UTC (permalink / raw) To: linux-edac; +Cc: linux-kernel, bp, Yazen.Ghannam, Nikolay Borisov Both LLVM/GCC support a __builtin_parity function which is functionally equivalent to the custom bitwise_xor_bits() one. Let's simplify the code by relying on the built-in. No functional changes. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> --- Changes since v1: * Reworded the commit message drivers/ras/amd/atl/umc.c | 22 +++++----------------- 1 file changed, 5 insertions(+), 17 deletions(-) diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c index 6e072b7667e9..7ff4a5a1c5da 100644 --- a/drivers/ras/amd/atl/umc.c +++ b/drivers/ras/amd/atl/umc.c @@ -49,18 +49,6 @@ static u8 get_coh_st_inst_id_mi300(struct atl_err *err) return i; } -/* XOR the bits in @val. */ -static u16 bitwise_xor_bits(u16 val) -{ - u16 tmp = 0; - u8 i; - - for (i = 0; i < 16; i++) - tmp ^= (val >> i) & 0x1; - - return tmp; -} - struct xor_bits { bool xor_enable; u16 col_xor; @@ -250,17 +238,17 @@ static unsigned long convert_dram_to_norm_addr_mi300(unsigned long addr) if (!addr_hash.bank[i].xor_enable) continue; - temp = bitwise_xor_bits(col & addr_hash.bank[i].col_xor); - temp ^= bitwise_xor_bits(row & addr_hash.bank[i].row_xor); + temp = (u16)__builtin_parity(col & addr_hash.bank[i].col_xor); + temp ^= (u16)__builtin_parity(row & addr_hash.bank[i].row_xor); bank ^= temp << i; } /* Calculate hash for PC bit. */ if (addr_hash.pc.xor_enable) { - temp = bitwise_xor_bits(col & addr_hash.pc.col_xor); - temp ^= bitwise_xor_bits(row & addr_hash.pc.row_xor); + temp = (u16)__builtin_parity(col & addr_hash.pc.col_xor); + temp ^= (u16)__builtin_parity(row & addr_hash.pc.row_xor); /* Bits SID[1:0] act as Bank[5:4] for PC hash, so apply them here. */ - temp ^= bitwise_xor_bits((bank | sid << NUM_BANK_BITS) & addr_hash.bank_xor); + temp ^= (u16)__builtin_parity((bank | sid << NUM_BANK_BITS) & addr_hash.bank_xor); pc ^= temp; } -- 2.52.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits 2025-11-24 8:40 [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits Nikolay Borisov @ 2025-11-24 8:57 ` Kuan-Wei Chiu 2025-11-24 11:05 ` Borislav Petkov 2025-11-25 9:35 ` david laight 1 sibling, 1 reply; 8+ messages in thread From: Kuan-Wei Chiu @ 2025-11-24 8:57 UTC (permalink / raw) To: Nikolay Borisov; +Cc: linux-edac, linux-kernel, bp, Yazen.Ghannam Hi Nikolay, On Mon, Nov 24, 2025 at 10:40:11AM +0200, Nikolay Borisov wrote: > Both LLVM/GCC support a __builtin_parity function which is functionally > equivalent to the custom bitwise_xor_bits() one. Let's simplify the code by > relying on the built-in. No functional changes. IIRC in some cases, if the compiler decides not to inline __builtin_parity(), it generates a libgcc function call like __paritysi2(). Since the kernel currently lacks this symbol, this could lead to a build failure at link time. Although the compiler inlines it in most cases, I am not sure if using __builtin_parity() here is a good idea. Regards, Kuan-Wei > > Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> > --- > > Changes since v1: > > * Reworded the commit message > > drivers/ras/amd/atl/umc.c | 22 +++++----------------- > 1 file changed, 5 insertions(+), 17 deletions(-) > > diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c > index 6e072b7667e9..7ff4a5a1c5da 100644 > --- a/drivers/ras/amd/atl/umc.c > +++ b/drivers/ras/amd/atl/umc.c > @@ -49,18 +49,6 @@ static u8 get_coh_st_inst_id_mi300(struct atl_err *err) > return i; > } > > -/* XOR the bits in @val. */ > -static u16 bitwise_xor_bits(u16 val) > -{ > - u16 tmp = 0; > - u8 i; > - > - for (i = 0; i < 16; i++) > - tmp ^= (val >> i) & 0x1; > - > - return tmp; > -} > - > struct xor_bits { > bool xor_enable; > u16 col_xor; > @@ -250,17 +238,17 @@ static unsigned long convert_dram_to_norm_addr_mi300(unsigned long addr) > if (!addr_hash.bank[i].xor_enable) > continue; > > - temp = bitwise_xor_bits(col & addr_hash.bank[i].col_xor); > - temp ^= bitwise_xor_bits(row & addr_hash.bank[i].row_xor); > + temp = (u16)__builtin_parity(col & addr_hash.bank[i].col_xor); > + temp ^= (u16)__builtin_parity(row & addr_hash.bank[i].row_xor); > bank ^= temp << i; > } > > /* Calculate hash for PC bit. */ > if (addr_hash.pc.xor_enable) { > - temp = bitwise_xor_bits(col & addr_hash.pc.col_xor); > - temp ^= bitwise_xor_bits(row & addr_hash.pc.row_xor); > + temp = (u16)__builtin_parity(col & addr_hash.pc.col_xor); > + temp ^= (u16)__builtin_parity(row & addr_hash.pc.row_xor); > /* Bits SID[1:0] act as Bank[5:4] for PC hash, so apply them here. */ > - temp ^= bitwise_xor_bits((bank | sid << NUM_BANK_BITS) & addr_hash.bank_xor); > + temp ^= (u16)__builtin_parity((bank | sid << NUM_BANK_BITS) & addr_hash.bank_xor); > pc ^= temp; > } > > -- > 2.52.0 > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits 2025-11-24 8:57 ` Kuan-Wei Chiu @ 2025-11-24 11:05 ` Borislav Petkov 2025-11-24 12:03 ` Kuan-Wei Chiu 0 siblings, 1 reply; 8+ messages in thread From: Borislav Petkov @ 2025-11-24 11:05 UTC (permalink / raw) To: Kuan-Wei Chiu; +Cc: Nikolay Borisov, linux-edac, linux-kernel, Yazen.Ghannam On Mon, Nov 24, 2025 at 04:57:51PM +0800, Kuan-Wei Chiu wrote: > > Both LLVM/GCC support a __builtin_parity function which is functionally > > equivalent to the custom bitwise_xor_bits() one. Let's simplify the code by > > relying on the built-in. No functional changes. > > IIRC in some cases, Which are those cases? Do you have a trigger scenario? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits 2025-11-24 11:05 ` Borislav Petkov @ 2025-11-24 12:03 ` Kuan-Wei Chiu 2025-11-24 12:52 ` Borislav Petkov 0 siblings, 1 reply; 8+ messages in thread From: Kuan-Wei Chiu @ 2025-11-24 12:03 UTC (permalink / raw) To: Borislav Petkov; +Cc: Nikolay Borisov, linux-edac, linux-kernel, Yazen.Ghannam Hi Borislav, On Mon, Nov 24, 2025 at 12:05:26PM +0100, Borislav Petkov wrote: > On Mon, Nov 24, 2025 at 04:57:51PM +0800, Kuan-Wei Chiu wrote: > > > Both LLVM/GCC support a __builtin_parity function which is functionally > > > equivalent to the custom bitwise_xor_bits() one. Let's simplify the code by > > > relying on the built-in. No functional changes. > > > > IIRC in some cases, > > Which are those cases? > > Do you have a trigger scenario? > I did a quick search, and I believe it was this kernel test robot report [1] that reminded me of this compiler behavior. [1]: https://lore.kernel.org/oe-kbuild-all/202501312159.l6jNRaYy-lkp@intel.com/ Regards, Kuan-Wei ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits 2025-11-24 12:03 ` Kuan-Wei Chiu @ 2025-11-24 12:52 ` Borislav Petkov 2025-11-24 13:24 ` Nikolay Borisov 0 siblings, 1 reply; 8+ messages in thread From: Borislav Petkov @ 2025-11-24 12:52 UTC (permalink / raw) To: Kuan-Wei Chiu; +Cc: Nikolay Borisov, linux-edac, linux-kernel, Yazen.Ghannam On Mon, Nov 24, 2025 at 08:03:02PM +0800, Kuan-Wei Chiu wrote: > On Mon, Nov 24, 2025 at 12:05:26PM +0100, Borislav Petkov wrote: > > On Mon, Nov 24, 2025 at 04:57:51PM +0800, Kuan-Wei Chiu wrote: > > > > Both LLVM/GCC support a __builtin_parity function which is functionally > > > > equivalent to the custom bitwise_xor_bits() one. Let's simplify the code by > > > > relying on the built-in. No functional changes. > > > > > > IIRC in some cases, > > > > Which are those cases? > > > > Do you have a trigger scenario? > > > I did a quick search, and I believe it was this kernel test robot > report [1] that reminded me of this compiler behavior. > > [1]: https://lore.kernel.org/oe-kbuild-all/202501312159.l6jNRaYy-lkp@intel.com/ Interesting, thanks for the pointer. @Nik, just use hweight16() but pls do check what asm the compiler generates before and after so that at least there's some palpable improvement or gcc is smart enough to replace the unrolled XORing with something slick. Also put in the commit message why we're not using the builtin parity thing and quote the link above. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits 2025-11-24 12:52 ` Borislav Petkov @ 2025-11-24 13:24 ` Nikolay Borisov 2025-11-24 13:34 ` Borislav Petkov 0 siblings, 1 reply; 8+ messages in thread From: Nikolay Borisov @ 2025-11-24 13:24 UTC (permalink / raw) To: Borislav Petkov, Kuan-Wei Chiu; +Cc: linux-edac, linux-kernel, Yazen.Ghannam On 11/24/25 14:52, Borislav Petkov wrote: > On Mon, Nov 24, 2025 at 08:03:02PM +0800, Kuan-Wei Chiu wrote: >> On Mon, Nov 24, 2025 at 12:05:26PM +0100, Borislav Petkov wrote: >>> On Mon, Nov 24, 2025 at 04:57:51PM +0800, Kuan-Wei Chiu wrote: >>>>> Both LLVM/GCC support a __builtin_parity function which is functionally >>>>> equivalent to the custom bitwise_xor_bits() one. Let's simplify the code by >>>>> relying on the built-in. No functional changes. >>>> >>>> IIRC in some cases, >>> >>> Which are those cases? >>> >>> Do you have a trigger scenario? >>> >> I did a quick search, and I believe it was this kernel test robot >> report [1] that reminded me of this compiler behavior. >> >> [1]: https://lore.kernel.org/oe-kbuild-all/202501312159.l6jNRaYy-lkp@intel.com/ > > Interesting, thanks for the pointer. > > @Nik, just use hweight16() but pls do check what asm the compiler generates > before and after so that at least there's some palpable improvement or gcc is > smart enough to replace the unrolled XORing with something slick. > > Also put in the commit message why we're not using the builtin parity thing > and quote the link above. > > Thx. > So the custom function one results in the following asm being inlined i.e 4 total copies: # drivers/ras/amd/atl/umc.c:59: tmp ^= (val >> i) & 0x1; xorl %ecx, %ecx # ivtmp.157 # drivers/ras/amd/atl/umc.c:55: u16 tmp = 0; xorl %esi, %esi # tmp # drivers/ras/amd/atl/umc.c:253: temp = bitwise_xor_bits(col & addr_hash.bank[i].col_xor); andl %r13d, %edx # col, tmp249 # drivers/ras/amd/atl/umc.c:59: tmp ^= (val >> i) & 0x1; movzwl %dx, %edx # tmp249, _387 .L3: movl %edx, %eax # _387, tmp250 sarl %cl, %eax # ivtmp.157, tmp250 # drivers/ras/amd/atl/umc.c:58: for (i = 0; i < 16; i++) addl $1, %ecx #, ivtmp.157 # drivers/ras/amd/atl/umc.c:59: tmp ^= (val >> i) & 0x1; andl $1, %eax #, tmp251 # drivers/ras/amd/atl/umc.c:59: tmp ^= (val >> i) & 0x1; xorl %eax, %esi # tmp251, tmp # drivers/ras/amd/atl/umc.c:58: for (i = 0; i < 16; i++) cmpl $16, %ecx #, ivtmp.157 jne .L3 #, Whilst with hweight, if the cpu has popcnt it will be 2 instructions - popcnt and an andl: # drivers/ras/amd/atl/umc.c:243: temp ^= hweight16(row & addr_hash.bank[i].row_xor) & 1; andl %ebp, %edi # _321, tmp221 # ./arch/x86/include/asm/arch_hweight.h:19: asm_inline (ALTERNATIVE("call __sw_hweight32", #APP # 19 "./arch/x86/include/asm/arch_hweight.h" 1 # ALT: oldinstr 771: call __sw_hweight32 772: # ALT: padding .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90 773: .pushsection .altinstructions,"a" .long 771b - . .long 774f - . .4byte ( 4*32+23) .byte 773b-771b .byte 775f-774f .popsection .pushsection .altinstr_replacement, "ax" # ALT: replacement 774: popcntl %edi, %eax # tmp221, res 775: .popsection # 0 "" 2 # drivers/ras/amd/atl/umc.c:243: temp ^= hweight16(row & addr_hash.bank[i].row_xor) & 1; #NO_APP xorl %eax, %edx # res, tmp222 andl $1, %edx #, temp So yeah, much better IMHO, unless there is some hidden latency in the popcnt... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits 2025-11-24 13:24 ` Nikolay Borisov @ 2025-11-24 13:34 ` Borislav Petkov 0 siblings, 0 replies; 8+ messages in thread From: Borislav Petkov @ 2025-11-24 13:34 UTC (permalink / raw) To: Nikolay Borisov; +Cc: Kuan-Wei Chiu, linux-edac, linux-kernel, Yazen.Ghannam On Mon, Nov 24, 2025 at 03:24:40PM +0200, Nikolay Borisov wrote: > So yeah, much better IMHO, unless there is some hidden latency in the popcnt... Yap, exactly. And every CPU we care about supports POPCNT. And this is soo not perf-sensitive so let's whack that function. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits 2025-11-24 8:40 [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits Nikolay Borisov 2025-11-24 8:57 ` Kuan-Wei Chiu @ 2025-11-25 9:35 ` david laight 1 sibling, 0 replies; 8+ messages in thread From: david laight @ 2025-11-25 9:35 UTC (permalink / raw) To: Nikolay Borisov; +Cc: linux-edac, linux-kernel, bp, Yazen.Ghannam On Mon, 24 Nov 2025 10:40:11 +0200 Nikolay Borisov <nik.borisov@suse.com> wrote: > Both LLVM/GCC support a __builtin_parity function which is functionally > equivalent to the custom bitwise_xor_bits() one. Let's simplify the code by > relying on the built-in. No functional changes. > While you've got this code out on the operating table: - Change all the locals/parameters from u8/u16 to 'unsigned int'. It will generate better code. Using u8/u16 only makes any sense if you are trying to reduce the size of a structure. - Both col_xor and row_xor are masks (for the parity code). So the names are wrong. In fact I think all the 'xor' and 'XOR' are incorrectly named. - How often is 'xor_enable' aka 'mask_enable' set? If set most of the time (or the code rarely runs) then if the hardware register says 'don't include these values' then just set the row/col mask values to zero and let the rest of the code just run through. David ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-11-25 9:35 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-24 8:40 [PATCH v2] RAS/AMD/ATL: Remove bitwise_xor_bits Nikolay Borisov 2025-11-24 8:57 ` Kuan-Wei Chiu 2025-11-24 11:05 ` Borislav Petkov 2025-11-24 12:03 ` Kuan-Wei Chiu 2025-11-24 12:52 ` Borislav Petkov 2025-11-24 13:24 ` Nikolay Borisov 2025-11-24 13:34 ` Borislav Petkov 2025-11-25 9:35 ` david laight
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox