[PATCH 00/17] Introduce and use generic parity32/64 helper

linux-input.vger.kernel.org archive mirror

* [PATCH 00/17] Introduce and use generic parity32/64 helper
@ 2025-02-23 16:42 Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 01/17] bitops: Add generic parity calculation for u32 Kuan-Wei Chiu
                   ` (18 more replies)
  0 siblings, 19 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Kuan-Wei Chiu, Yu-Chun Lin

Several parts of the kernel contain redundant implementations of parity
calculations for 32-bit and 64-bit values. This series introduces
generic parity32() and parity64() helpers in bitops.h, providing a
standardized and optimized implementation.

Subsequent patches refactor various kernel components to replace
open-coded parity calculations with the new helpers (or with the
existing parity8() for byte-sized values), reducing code duplication
and improving maintainability.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>

Kuan-Wei Chiu (17):
  bitops: Add generic parity calculation for u32
  bitops: Add generic parity calculation for u64
  x86: Replace open-coded parity calculation with parity8()
  media: media/test_drivers: Replace open-coded parity calculation with
    parity8()
  media: pci: cx18-av-vbi: Replace open-coded parity calculation with
    parity8()
  media: saa7115: Replace open-coded parity calculation with parity8()
  serial: max3100: Replace open-coded parity calculation with parity8()
  lib/bch: Replace open-coded parity calculation with parity32()
  Input: joystick - Replace open-coded parity calculation with
    parity32()
  net: ethernet: oa_tc6: Replace open-coded parity calculation with
    parity32()
  wifi: brcm80211: Replace open-coded parity calculation with parity32()
  drm/bridge: dw-hdmi: Replace open-coded parity calculation with
    parity32()
  mtd: ssfdc: Replace open-coded parity calculation with parity32()
  fsi: i2cr: Replace open-coded parity calculation with parity32()
  fsi: i2cr: Replace open-coded parity calculation with parity64()
  Input: joystick - Replace open-coded parity calculation with
    parity64()
  nfp: bpf: Replace open-coded parity calculation with parity64()

 arch/x86/kernel/bootflag.c                    | 18 ++------
 drivers/fsi/fsi-master-i2cr.c                 | 18 ++------
 .../drm/bridge/synopsys/dw-hdmi-ahb-audio.c   |  8 +---
 drivers/input/joystick/grip_mp.c              | 17 +-------
 drivers/input/joystick/sidewinder.c           | 24 +++--------
 drivers/media/i2c/saa7115.c                   | 12 +-----
 drivers/media/pci/cx18/cx18-av-vbi.c          | 12 +-----
 .../media/test-drivers/vivid/vivid-vbi-gen.c  |  8 +---
 drivers/mtd/ssfdc.c                           | 17 +-------
 drivers/net/ethernet/netronome/nfp/nfp_asm.c  |  7 +--
 drivers/net/ethernet/oa_tc6.c                 | 19 ++------
 .../broadcom/brcm80211/brcmsmac/dma.c         | 16 +------
 drivers/tty/serial/max3100.c                  |  3 +-
 include/linux/bitops.h                        | 43 +++++++++++++++++++
 lib/bch.c                                     | 14 +-----
 15 files changed, 74 insertions(+), 162 deletions(-)

-- 
2.34.1


* [PATCH 01/17] bitops: Add generic parity calculation for u32
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 02/17] bitops: Add generic parity calculation for u64 Kuan-Wei Chiu
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Several parts of the kernel open-code parity calculations using
different methods. Add a generic parity32() helper implemented with the
same efficient approach as parity8().

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 include/linux/bitops.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index c1cb53cf2f0f..fb13dedad7aa 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -260,6 +260,27 @@ static inline int parity8(u8 val)
 	return (0x6996 >> (val & 0xf)) & 1;
 }
 
+/**
+ * parity32 - get the parity of a u32 value
+ * @val: the value to be examined
+ *
+ * Determine the parity of the u32 argument.
+ *
+ * Returns:
+ * 0 for even parity, 1 for odd parity
+ */
+static inline int parity32(u32 val)
+{
+	/*
+	 * One explanation of this algorithm:
+	 * https://funloop.org/codex/problem/parity/README.html
+	 */
+	val ^= val >> 16;
+	val ^= val >> 8;
+	val ^= val >> 4;
+	return (0x6996 >> (val & 0xf)) & 1;
+}
+
 /**
  * __ffs64 - find first set bit in a 64 bit word
  * @word: The 64 bit word
-- 
2.34.1
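
For readers who want to try the folding approach outside the kernel, here
is a minimal userspace sketch. It assumes <stdint.h> types in place of
u32, and the names parity32_sketch(), parity32_naive() and
parity32_selftest() are hypothetical stand-ins, not part of the patch. It
checks that bit n of the constant 0x6996 is the parity of the 4-bit
number n, and that the folded result matches a bit-by-bit XOR.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace restatement of the parity32() body from the patch above. */
static inline int parity32_sketch(uint32_t val)
{
	val ^= val >> 16;	/* fold the upper 16 bits into the lower 16 */
	val ^= val >> 8;	/* fold down to 8 bits */
	val ^= val >> 4;	/* fold down to 4 bits */
	/* bit n of 0x6996 holds the parity of the 4-bit value n */
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Reference implementation: XOR the bits together one at a time. */
static int parity32_naive(uint32_t val)
{
	int p = 0;

	while (val) {
		p ^= (int)(val & 1);
		val >>= 1;
	}
	return p;
}

static void parity32_selftest(void)
{
	uint32_t x = 1;
	int i;

	/* The lookup constant encodes the parity of every nibble value. */
	for (i = 0; i < 16; i++)
		assert(((0x6996 >> i) & 1) == parity32_naive((uint32_t)i));

	/* Compare against the reference on pseudo-random inputs (LCG). */
	for (i = 0; i < 100000; i++) {
		x = x * 1664525u + 1013904223u;
		assert(parity32_sketch(x) == parity32_naive(x));
	}
}
```

Each XOR-shift halves the number of bits that still matter, so only the
low nibble is left for the table lookup.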


* [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 01/17] bitops: Add generic parity calculation for u32 Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-24  7:09   ` Jiri Slaby
  2025-02-24 19:27   ` Yury Norov
  2025-02-23 16:42 ` [PATCH 03/17] x86: Replace open-coded parity calculation with parity8() Kuan-Wei Chiu
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Several parts of the kernel open-code parity calculations using
different methods. Add a generic parity64() helper implemented with the
same efficient approach as parity8().

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 include/linux/bitops.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index fb13dedad7aa..67677057f5e2 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -281,6 +281,28 @@ static inline int parity32(u32 val)
 	return (0x6996 >> (val & 0xf)) & 1;
 }
 
+/**
+ * parity64 - get the parity of a u64 value
+ * @val: the value to be examined
+ *
+ * Determine the parity of the u64 argument.
+ *
+ * Returns:
+ * 0 for even parity, 1 for odd parity
+ */
+static inline int parity64(u64 val)
+{
+	/*
+	 * One explanation of this algorithm:
+	 * https://funloop.org/codex/problem/parity/README.html
+	 */
+	val ^= val >> 32;
+	val ^= val >> 16;
+	val ^= val >> 8;
+	val ^= val >> 4;
+	return (0x6996 >> (val & 0xf)) & 1;
+}
+
 /**
  * __ffs64 - find first set bit in a 64 bit word
  * @word: The 64 bit word
-- 
2.34.1
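
The 64-bit variant only adds one more fold step. A minimal userspace
sketch, assuming <stdint.h> types and hypothetical names
(parity64_sketch(), parity64_naive(), parity64_selftest()), compares it
with a bit-by-bit reference on pseudo-random inputs:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace restatement of the parity64() body from the patch above. */
static inline int parity64_sketch(uint64_t val)
{
	val ^= val >> 32;	/* fold the upper word into the lower word */
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Reference implementation: XOR the bits together one at a time. */
static int parity64_naive(uint64_t val)
{
	int p = 0;

	while (val) {
		p ^= (int)(val & 1);
		val >>= 1;
	}
	return p;
}

static void parity64_selftest(void)
{
	uint64_t x = 88172645463325252ull;	/* arbitrary non-zero seed */
	int i;

	for (i = 0; i < 100000; i++) {
		/* xorshift64 step, used only to generate test inputs */
		x ^= x << 13;
		x ^= x >> 7;
		x ^= x << 17;
		assert(parity64_sketch(x) == parity64_naive(x));
	}
}
```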


* [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 01/17] bitops: Add generic parity calculation for u32 Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 02/17] bitops: Add generic parity calculation for u64 Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-24 15:24   ` Uros Bizjak
  2025-02-23 16:42 ` [PATCH 04/17] media: media/test_drivers: " Kuan-Wei Chiu
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity8() helper. This
change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 arch/x86/kernel/bootflag.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/bootflag.c b/arch/x86/kernel/bootflag.c
index 3fed7ae58b60..314ff0e84900 100644
--- a/arch/x86/kernel/bootflag.c
+++ b/arch/x86/kernel/bootflag.c
@@ -8,6 +8,7 @@
 #include <linux/string.h>
 #include <linux/spinlock.h>
 #include <linux/acpi.h>
+#include <linux/bitops.h>
 #include <asm/io.h>
 
 #include <linux/mc146818rtc.h>
@@ -20,26 +21,13 @@
 
 int sbf_port __initdata = -1;	/* set via acpi_boot_init() */
 
-static int __init parity(u8 v)
-{
-	int x = 0;
-	int i;
-
-	for (i = 0; i < 8; i++) {
-		x ^= (v & 1);
-		v >>= 1;
-	}
-
-	return x;
-}
-
 static void __init sbf_write(u8 v)
 {
 	unsigned long flags;
 
 	if (sbf_port != -1) {
 		v &= ~SBF_PARITY;
-		if (!parity(v))
+		if (!parity8(v))
 			v |= SBF_PARITY;
 
 		printk(KERN_INFO "Simple Boot Flag at 0x%x set to 0x%x\n",
@@ -70,7 +58,7 @@ static int __init sbf_value_valid(u8 v)
 {
 	if (v & SBF_RESERVED)		/* Reserved bits */
 		return 0;
-	if (!parity(v))
+	if (!parity8(v))
 		return 0;
 
 	return 1;
-- 
2.34.1
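
To illustrate what sbf_write() and sbf_value_valid() rely on, here is a
small userspace sketch. It assumes the parity flag is bit 7
(SBF_PARITY_SKETCH is a hypothetical stand-in for SBF_PARITY, whose value
is not shown in this diff) and that parity8() folds once before the
0x6996 lookup. sbf_fix_parity_sketch() is likewise a made-up name. The
self-test checks that after the fix-up every byte has odd parity, which
is the condition the validity check tests for.

```c
#include <assert.h>
#include <stdint.h>

#define SBF_PARITY_SKETCH 0x80	/* assumption: parity flag lives in bit 7 */

/* Assumed shape of parity8(): one fold, then the nibble lookup. */
static inline int parity8_sketch(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Mirror of the fix-up in sbf_write(): force odd parity over the byte. */
static uint8_t sbf_fix_parity_sketch(uint8_t v)
{
	v &= (uint8_t)~SBF_PARITY_SKETCH;
	if (!parity8_sketch(v))
		v |= SBF_PARITY_SKETCH;
	return v;
}

static void sbf_selftest(void)
{
	unsigned int v;

	/* Every fixed-up value must pass the "!parity8(v) is invalid" test. */
	for (v = 0; v < 256; v++)
		assert(parity8_sketch(sbf_fix_parity_sketch((uint8_t)v)) == 1);
}
```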


* [PATCH 04/17] media: media/test_drivers: Replace open-coded parity calculation with parity8()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (2 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 03/17] x86: Replace open-coded parity calculation with parity8() Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 05/17] media: pci: cx18-av-vbi: " Kuan-Wei Chiu
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity8() helper. This
change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/media/test-drivers/vivid/vivid-vbi-gen.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/media/test-drivers/vivid/vivid-vbi-gen.c b/drivers/media/test-drivers/vivid/vivid-vbi-gen.c
index 70a4024d461e..90fafa533ccd 100644
--- a/drivers/media/test-drivers/vivid/vivid-vbi-gen.c
+++ b/drivers/media/test-drivers/vivid/vivid-vbi-gen.c
@@ -5,6 +5,7 @@
  * Copyright 2014 Cisco Systems, Inc. and/or its affiliates. All rights reserved.
  */
 
+#include <linux/bitops.h>
 #include <linux/errno.h>
 #include <linux/kernel.h>
 #include <linux/ktime.h>
@@ -165,12 +166,7 @@ static const u8 vivid_cc_sequence2[30] = {
 
 static u8 calc_parity(u8 val)
 {
-	unsigned i;
-	unsigned tot = 0;
-
-	for (i = 0; i < 7; i++)
-		tot += (val & (1 << i)) ? 1 : 0;
-	return val | ((tot & 1) ? 0 : 0x80);
+	return val | ((parity8(val)) ? 0 : 0x80);
 }
 
 static void vivid_vbi_gen_set_time_of_day(u8 *packet)
-- 
2.34.1
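
The removed loop only counted bits 0..6, while parity8() looks at all
eight bits, so it is worth checking that the two forms of calc_parity()
agree. The userspace sketch below does that exhaustively. The function
names are hypothetical and parity8_sketch() is an assumed restatement of
the kernel helper. The results match for every input because the only
effect of the function is to OR in 0x80: when bit 7 is already set the
result is val either way, and when it is clear the 7-bit and 8-bit
parities are the same.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of parity8(): one fold, then the nibble lookup. */
static inline int parity8_sketch(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* calc_parity() as it was before the patch: counts bits 0..6 only. */
static uint8_t calc_parity_old(uint8_t val)
{
	unsigned int i;
	unsigned int tot = 0;

	for (i = 0; i < 7; i++)
		tot += (val & (1 << i)) ? 1 : 0;
	return val | ((tot & 1) ? 0 : 0x80);
}

/* calc_parity() as it is after the patch. */
static uint8_t calc_parity_new(uint8_t val)
{
	return val | (parity8_sketch(val) ? 0 : 0x80);
}

static void calc_parity_selftest(void)
{
	unsigned int v;

	for (v = 0; v < 256; v++)
		assert(calc_parity_old((uint8_t)v) == calc_parity_new((uint8_t)v));
}
```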


* [PATCH 05/17] media: pci: cx18-av-vbi: Replace open-coded parity calculation with parity8()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (3 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 04/17] media: media/test_drivers: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 06/17] media: saa7115: " Kuan-Wei Chiu
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity8() helper. This
change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/media/pci/cx18/cx18-av-vbi.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/media/pci/cx18/cx18-av-vbi.c b/drivers/media/pci/cx18/cx18-av-vbi.c
index 65281d40c681..1a113aad9cd4 100644
--- a/drivers/media/pci/cx18/cx18-av-vbi.c
+++ b/drivers/media/pci/cx18/cx18-av-vbi.c
@@ -8,6 +8,7 @@
  */
 
 
+#include <linux/bitops.h>
 #include "cx18-driver.h"
 
 /*
@@ -56,15 +57,6 @@ struct vbi_anc_data {
 	/* u8 fill[]; Variable number of fill bytes */
 };
 
-static int odd_parity(u8 c)
-{
-	c ^= (c >> 4);
-	c ^= (c >> 2);
-	c ^= (c >> 1);
-
-	return c & 1;
-}
-
 static int decode_vps(u8 *dst, u8 *p)
 {
 	static const u8 biphase_tbl[] = {
@@ -278,7 +270,7 @@ int cx18_av_decode_vbi_line(struct v4l2_subdev *sd,
 		break;
 	case 6:
 		sdid = V4L2_SLICED_CAPTION_525;
-		err = !odd_parity(p[0]) || !odd_parity(p[1]);
+		err = !parity8(p[0]) || !parity8(p[1]);
 		break;
 	case 9:
 		sdid = V4L2_SLICED_VPS;
-- 
2.34.1
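
The removed odd_parity() folds all the way down to one bit, while
parity8() stops at a nibble and uses a lookup. A short userspace sketch,
with hypothetical names and an assumed restatement of parity8(), confirms
that both return 1 exactly when an odd number of bits is set:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of parity8(): one fold, then the nibble lookup. */
static inline int parity8_sketch(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* The helper removed from cx18-av-vbi.c, restated for userspace. */
static int odd_parity_old(uint8_t c)
{
	c ^= (c >> 4);
	c ^= (c >> 2);
	c ^= (c >> 1);

	return c & 1;
}

static void odd_parity_selftest(void)
{
	unsigned int v;

	for (v = 0; v < 256; v++)
		assert(odd_parity_old((uint8_t)v) == parity8_sketch((uint8_t)v));
}
```

The same comparison applies to saa711x_odd_parity() in the next patch,
which has an identical body.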


* [PATCH 06/17] media: saa7115: Replace open-coded parity calculation with parity8()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (4 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 05/17] media: pci: cx18-av-vbi: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 07/17] serial: max3100: " Kuan-Wei Chiu
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity8() helper. This
change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/media/i2c/saa7115.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/media/i2c/saa7115.c b/drivers/media/i2c/saa7115.c
index a1c71187e773..b8b8f206ec3a 100644
--- a/drivers/media/i2c/saa7115.c
+++ b/drivers/media/i2c/saa7115.c
@@ -25,6 +25,7 @@
 
 #include "saa711x_regs.h"
 
+#include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/slab.h>
@@ -664,15 +665,6 @@ static const unsigned char saa7115_init_misc[] = {
 	0x00, 0x00
 };
 
-static int saa711x_odd_parity(u8 c)
-{
-	c ^= (c >> 4);
-	c ^= (c >> 2);
-	c ^= (c >> 1);
-
-	return c & 1;
-}
-
 static int saa711x_decode_vps(u8 *dst, u8 *p)
 {
 	static const u8 biphase_tbl[] = {
@@ -1227,7 +1219,7 @@ static int saa711x_decode_vbi_line(struct v4l2_subdev *sd, struct v4l2_decode_vb
 		vbi->type = V4L2_SLICED_TELETEXT_B;
 		break;
 	case 4:
-		if (!saa711x_odd_parity(p[0]) || !saa711x_odd_parity(p[1]))
+		if (!parity8(p[0]) || !parity8(p[1]))
 			return 0;
 		vbi->type = V4L2_SLICED_CAPTION_525;
 		break;
-- 
2.34.1


* [PATCH 07/17] serial: max3100: Replace open-coded parity calculation with parity8()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (5 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 06/17] media: saa7115: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-24  7:25   ` Jiri Slaby
  2025-02-23 16:42 ` [PATCH 08/17] lib/bch: Replace open-coded parity calculation with parity32() Kuan-Wei Chiu
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity8() helper. This
change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/tty/serial/max3100.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/serial/max3100.c b/drivers/tty/serial/max3100.c
index cde5f1c86353..f5c487bdc56a 100644
--- a/drivers/tty/serial/max3100.c
+++ b/drivers/tty/serial/max3100.c
@@ -16,6 +16,7 @@
 /* 4 MAX3100s should be enough for everyone */
 #define MAX_MAX3100 4
 
+#include <linux/bitops.h>
 #include <linux/container_of.h>
 #include <linux/delay.h>
 #include <linux/device.h>
@@ -133,7 +134,7 @@ static int max3100_do_parity(struct max3100_port *s, u16 c)
 	else
 		c &= 0xff;
 
-	parity = parity ^ (hweight8(c) & 1);
+	parity = parity ^ (parity8(c));
 	return parity;
 }
 
-- 
2.34.1
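
Here the old code took the low bit of a population count. A brief
userspace sketch shows that this equals the parity for every byte.
hweight8_sketch() is a hypothetical loop-based stand-in for the kernel's
hweight8(), and parity8_sketch() is an assumed restatement of parity8():

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of parity8(): one fold, then the nibble lookup. */
static inline int parity8_sketch(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Loop-based stand-in for hweight8(): number of set bits in a byte. */
static unsigned int hweight8_sketch(uint8_t w)
{
	unsigned int n = 0;

	while (w) {
		n += w & 1;
		w >>= 1;
	}
	return n;
}

static void hweight_parity_selftest(void)
{
	unsigned int c;

	/* Low bit of the bit count is the parity. */
	for (c = 0; c < 256; c++)
		assert((int)(hweight8_sketch((uint8_t)c) & 1) ==
		       parity8_sketch((uint8_t)c));
}
```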


* [PATCH 08/17] lib/bch: Replace open-coded parity calculation with parity32()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (6 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 07/17] serial: max3100: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 09/17] Input: joystick - " Kuan-Wei Chiu
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity32() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 lib/bch.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/lib/bch.c b/lib/bch.c
index 1c0cb07cdfeb..769459749982 100644
--- a/lib/bch.c
+++ b/lib/bch.c
@@ -311,18 +311,6 @@ static inline int deg(unsigned int poly)
 	return fls(poly)-1;
 }
 
-static inline int parity(unsigned int x)
-{
-	/*
-	 * public domain code snippet, lifted from
-	 * http://www-graphics.stanford.edu/~seander/bithacks.html
-	 */
-	x ^= x >> 1;
-	x ^= x >> 2;
-	x = (x & 0x11111111U) * 0x11111111U;
-	return (x >> 28) & 1;
-}
-
 /* Galois field basic operations: multiply, divide, inverse, etc. */
 
 static inline unsigned int gf_mul(struct bch_control *bch, unsigned int a,
@@ -524,7 +512,7 @@ static int solve_linear_system(struct bch_control *bch, unsigned int *rows,
 		tmp = 0;
 		for (r = m-1; r >= 0; r--) {
 			mask = rows[r] & (tmp|1);
-			tmp |= parity(mask) << (m-r);
+			tmp |= parity32(mask) << (m-r);
 		}
 		sol[p] = tmp >> 1;
 	}
-- 
2.34.1
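
The removed lib/bch.c helper uses a multiply to sum the per-nibble
parities instead of folding. The userspace sketch below, with
hypothetical names and <stdint.h> types, compares it with the folding
form on pseudo-random inputs. It is a spot check rather than a proof.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace restatement of the folding helper added earlier in the series. */
static inline int parity32_sketch(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* The multiply-based variant removed from lib/bch.c. */
static int parity_mul_sketch(uint32_t x)
{
	x ^= x >> 1;
	x ^= x >> 2;	/* bit 0 of each nibble is now that nibble's parity */
	/* the multiply adds the eight nibble parities into the top nibble */
	x = (x & 0x11111111U) * 0x11111111U;
	return (x >> 28) & 1;
}

static void parity_mul_selftest(void)
{
	uint32_t x = 12345;
	int i;

	for (i = 0; i < 100000; i++) {
		x = x * 1664525u + 1013904223u;	/* LCG test input */
		assert(parity_mul_sketch(x) == parity32_sketch(x));
	}
}
```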


* [PATCH 09/17] Input: joystick - Replace open-coded parity calculation with parity32()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (7 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 08/17] lib/bch: Replace open-coded parity calculation with parity32() Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 10/17] net: ethernet: oa_tc6: " Kuan-Wei Chiu
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity32() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/input/joystick/grip_mp.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/input/joystick/grip_mp.c b/drivers/input/joystick/grip_mp.c
index 5eadb5a3ca37..897ce13753dc 100644
--- a/drivers/input/joystick/grip_mp.c
+++ b/drivers/input/joystick/grip_mp.c
@@ -18,6 +18,7 @@
 #include <linux/delay.h>
 #include <linux/proc_fs.h>
 #include <linux/jiffies.h>
+#include <linux/bitops.h>
 
 #define DRIVER_DESC	"Gravis Grip Multiport driver"
 
@@ -112,20 +113,6 @@ static const int axis_map[] = { 5, 9, 1, 5, 6, 10, 2, 6, 4, 8, 0, 4, 5, 9, 1, 5
 
 static int register_slot(int i, struct grip_mp *grip);
 
-/*
- * Returns whether an odd or even number of bits are on in pkt.
- */
-
-static int bit_parity(u32 pkt)
-{
-	int x = pkt ^ (pkt >> 16);
-	x ^= x >> 8;
-	x ^= x >> 4;
-	x ^= x >> 2;
-	x ^= x >> 1;
-	return x & 1;
-}
-
 /*
  * Poll gameport; return true if all bits set in 'onbits' are on and
  * all bits set in 'offbits' are off.
@@ -236,7 +223,7 @@ static int mp_io(struct gameport* gameport, int sendflags, int sendcode, u32 *pa
 		pkt = (pkt >> 2) | 0xf0000000;
 	}
 
-	if (bit_parity(pkt) == 1)
+	if (parity32(pkt) == 1)
 		return IO_RESET;
 
 	/* Acknowledge packet receipt */
-- 
2.34.1


* [PATCH 10/17] net: ethernet: oa_tc6: Replace open-coded parity calculation with parity32()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (8 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 09/17] Input: joystick - " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 11/17] wifi: brcm80211: " Kuan-Wei Chiu
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity32() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/net/ethernet/oa_tc6.c | 19 +++----------------
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/oa_tc6.c b/drivers/net/ethernet/oa_tc6.c
index db200e4ec284..f02dba7b89a1 100644
--- a/drivers/net/ethernet/oa_tc6.c
+++ b/drivers/net/ethernet/oa_tc6.c
@@ -6,6 +6,7 @@
  */
 
 #include <linux/bitfield.h>
+#include <linux/bitops.h>
 #include <linux/iopoll.h>
 #include <linux/mdio.h>
 #include <linux/phy.h>
@@ -177,19 +178,6 @@ static int oa_tc6_spi_transfer(struct oa_tc6 *tc6,
 	return spi_sync(tc6->spi, &msg);
 }
 
-static int oa_tc6_get_parity(u32 p)
-{
-	/* Public domain code snippet, lifted from
-	 * http://www-graphics.stanford.edu/~seander/bithacks.html
-	 */
-	p ^= p >> 1;
-	p ^= p >> 2;
-	p = (p & 0x11111111U) * 0x11111111U;
-
-	/* Odd parity is used here */
-	return !((p >> 28) & 1);
-}
-
 static __be32 oa_tc6_prepare_ctrl_header(u32 addr, u8 length,
 					 enum oa_tc6_register_op reg_op)
 {
@@ -202,7 +190,7 @@ static __be32 oa_tc6_prepare_ctrl_header(u32 addr, u8 length,
 		 FIELD_PREP(OA_TC6_CTRL_HEADER_ADDR, addr) |
 		 FIELD_PREP(OA_TC6_CTRL_HEADER_LENGTH, length - 1);
 	header |= FIELD_PREP(OA_TC6_CTRL_HEADER_PARITY,
-			     oa_tc6_get_parity(header));
+			     !parity32(header));
 
 	return cpu_to_be32(header);
 }
@@ -940,8 +928,7 @@ static __be32 oa_tc6_prepare_data_header(bool data_valid, bool start_valid,
 		     FIELD_PREP(OA_TC6_DATA_HEADER_END_BYTE_OFFSET,
 				end_byte_offset);
 
-	header |= FIELD_PREP(OA_TC6_DATA_HEADER_PARITY,
-			     oa_tc6_get_parity(header));
+	header |= FIELD_PREP(OA_TC6_DATA_HEADER_PARITY, !parity32(header));
 
 	return cpu_to_be32(header);
 }
-- 
2.34.1
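
The removed oa_tc6_get_parity() returned the inverted parity because the
header uses odd parity, which is why the call sites now pass
!parity32(header). A minimal userspace sketch of that idea follows. It
assumes the parity field is a single bit at position 0
(HDR_PARITY_SKETCH is a hypothetical stand-in, since the real
OA_TC6_*_PARITY masks are not shown in this diff), and
add_odd_parity_sketch() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

#define HDR_PARITY_SKETCH 0x1u	/* assumption: parity field is bit 0 */

/* Userspace restatement of the folding helper added earlier in the series. */
static inline int parity32_sketch(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Set the parity bit so that the whole header has an odd number of 1s. */
static uint32_t add_odd_parity_sketch(uint32_t header)
{
	header &= ~HDR_PARITY_SKETCH;		/* parity bit starts out clear */
	if (!parity32_sketch(header))		/* even so far: add one more 1 */
		header |= HDR_PARITY_SKETCH;
	return header;
}

static void odd_parity_header_selftest(void)
{
	uint32_t x = 2025;
	int i;

	for (i = 0; i < 100000; i++) {
		x = x * 1664525u + 1013904223u;	/* LCG test input */
		assert(parity32_sketch(add_odd_parity_sketch(x)) == 1);
	}
}
```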


* [PATCH 11/17] wifi: brcm80211: Replace open-coded parity calculation with parity32()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (9 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 10/17] net: ethernet: oa_tc6: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-25  6:29   ` Arend Van Spriel
  2025-02-23 16:42 ` [PATCH 12/17] drm/bridge: dw-hdmi: " Kuan-Wei Chiu
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)

Refactor parity calculations to use the standard parity32() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 .../wireless/broadcom/brcm80211/brcmsmac/dma.c   | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/dma.c b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/dma.c
index 80c35027787a..d1a1ecd97d42 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/dma.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/dma.c
@@ -17,6 +17,7 @@
 #include <linux/slab.h>
 #include <linux/delay.h>
 #include <linux/pci.h>
+#include <linux/bitops.h>
 #include <net/cfg80211.h>
 #include <net/mac80211.h>
 
@@ -283,21 +284,6 @@ struct dma_info {
 	bool aligndesc_4k;
 };
 
-/* Check for odd number of 1's */
-static u32 parity32(__le32 data)
-{
-	/* no swap needed for counting 1's */
-	u32 par_data = *(u32 *)&data;
-
-	par_data ^= par_data >> 16;
-	par_data ^= par_data >> 8;
-	par_data ^= par_data >> 4;
-	par_data ^= par_data >> 2;
-	par_data ^= par_data >> 1;
-
-	return par_data & 1;
-}
-
 static bool dma64_dd_parity(struct dma64desc *dd)
 {
 	return parity32(dd->addrlow ^ dd->addrhigh ^ dd->ctrl1 ^ dd->ctrl2);
-- 
2.34.1



* [PATCH 12/17] drm/bridge: dw-hdmi: Replace open-coded parity calculation with parity32()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (10 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 11/17] wifi: brcm80211: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 13/17] mtd: ssfdc: " Kuan-Wei Chiu
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Kuan-Wei Chiu, Yu-Chun Lin

Refactor parity calculations to use the standard parity32() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/gpu/drm/bridge/synopsys/dw-hdmi-ahb-audio.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-hdmi-ahb-audio.c b/drivers/gpu/drm/bridge/synopsys/dw-hdmi-ahb-audio.c
index cf1f66b7b192..833e65f33483 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-hdmi-ahb-audio.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-hdmi-ahb-audio.c
@@ -4,6 +4,7 @@
  *
  * Written and tested against the Designware HDMI Tx found in iMX6.
  */
+#include <linux/bitops.h>
 #include <linux/io.h>
 #include <linux/interrupt.h>
 #include <linux/module.h>
@@ -171,12 +172,7 @@ static void dw_hdmi_reformat_iec958(struct snd_dw_hdmi *dw,
 
 static u32 parity(u32 sample)
 {
-	sample ^= sample >> 16;
-	sample ^= sample >> 8;
-	sample ^= sample >> 4;
-	sample ^= sample >> 2;
-	sample ^= sample >> 1;
-	return (sample & 1) << 27;
+	return parity32(sample) << 27;
 }
 
 static void dw_hdmi_reformat_s24(struct snd_dw_hdmi *dw,
-- 
2.34.1



* [PATCH 13/17] mtd: ssfdc: Replace open-coded parity calculation with parity32()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (11 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 12/17] drm/bridge: dw-hdmi: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 14/17] fsi: i2cr: " Kuan-Wei Chiu
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Kuan-Wei Chiu, Yu-Chun Lin

Refactor parity calculations to use the standard parity32() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/mtd/ssfdc.c | 17 ++---------------
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/mtd/ssfdc.c b/drivers/mtd/ssfdc.c
index 46c01fa2ec46..e7f9e73da644 100644
--- a/drivers/mtd/ssfdc.c
+++ b/drivers/mtd/ssfdc.c
@@ -7,6 +7,7 @@
  * Based on NTFL and MTDBLOCK_RO drivers
  */
 
+#include <linux/bitops.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/init.h>
@@ -178,20 +179,6 @@ static int read_raw_oob(struct mtd_info *mtd, loff_t offs, uint8_t *buf)
 	return 0;
 }
 
-/* Parity calculator on a word of n bit size */
-static int get_parity(int number, int size)
-{
- 	int k;
-	int parity;
-
-	parity = 1;
-	for (k = 0; k < size; k++) {
-		parity += (number >> k);
-		parity &= 1;
-	}
-	return parity;
-}
-
 /* Read and validate the logical block address field stored in the OOB */
 static int get_logical_address(uint8_t *oob_buf)
 {
@@ -215,7 +202,7 @@ static int get_logical_address(uint8_t *oob_buf)
 			block_address &= 0x7FF;
 			block_address >>= 1;
 
-			if (get_parity(block_address, 10) != parity) {
+			if (parity32(block_address & 0x3ff) == parity) {
 				pr_debug("SSFDC_RO: logical address field%d"
 					"parity error(0x%04X)\n", j+1,
 					block_address);
-- 
2.34.1
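
The comparison flips from != to == because the removed get_parity() seeded
its accumulator with 1, so it returned the complement of the parity of the
low `size` bits. A minimal userspace sketch that checks the old and new
conditions against each other for every 10-bit address (ref_get_parity(),
parity32_sketch() and check_ssfdc_equivalence() are stand-in names, not
kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Copy of the removed helper: starts at 1, then toggles once per set bit. */
static inline int ref_get_parity(int number, int size)
{
	int k;
	int parity = 1;

	for (k = 0; k < size; k++) {
		parity += (number >> k);
		parity &= 1;
	}
	return parity;
}

/* Stand-in for the kernel's parity32(): 1 for an odd number of set bits. */
static inline int parity32_sketch(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Old and new error conditions must agree for all inputs. */
static inline void check_ssfdc_equivalence(void)
{
	int addr, parity;

	for (addr = 0; addr < 0x400; addr++) {
		for (parity = 0; parity <= 1; parity++) {
			int old_cond = ref_get_parity(addr, 10) != parity;
			int new_cond = parity32_sketch(addr & 0x3ff) == parity;

			assert(old_cond == new_cond);
		}
	}
}
```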



* [PATCH 14/17] fsi: i2cr: Replace open-coded parity calculation with parity32()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (12 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 13/17] mtd: ssfdc: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 15/17] fsi: i2cr: Replace open-coded parity calculation with parity64() Kuan-Wei Chiu
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Kuan-Wei Chiu, Yu-Chun Lin

Refactor parity calculations to use the standard parity32() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/fsi/fsi-master-i2cr.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/fsi/fsi-master-i2cr.c b/drivers/fsi/fsi-master-i2cr.c
index 40f1f4d231e5..8212b99ab2f9 100644
--- a/drivers/fsi/fsi-master-i2cr.c
+++ b/drivers/fsi/fsi-master-i2cr.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (C) IBM Corporation 2023 */
 
+#include <linux/bitops.h>
 #include <linux/device.h>
 #include <linux/fsi.h>
 #include <linux/i2c.h>
@@ -38,14 +39,7 @@ static const u8 i2cr_cfam[] = {
 
 static bool i2cr_check_parity32(u32 v, bool parity)
 {
-	u32 i;
-
-	for (i = 0; i < 32; ++i) {
-		if (v & (1u << i))
-			parity = !parity;
-	}
-
-	return parity;
+	return parity ^ parity32(v);
 }
 
 static bool i2cr_check_parity64(u64 v)
-- 
2.34.1
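
The removed loop flips `parity` once per set bit in v, which is the same as
XOR-ing the initial value with the parity of v. A small userspace sketch of
that equivalence, with parity32_sketch() standing in for the kernel helper
and the other names chosen only for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The removed loop: toggle `parity` for every set bit in v. */
static inline bool ref_check_parity32(uint32_t v, bool parity)
{
	uint32_t i;

	for (i = 0; i < 32; ++i) {
		if (v & (1u << i))
			parity = !parity;
	}
	return parity;
}

/* Stand-in for the kernel's parity32(): 1 for an odd number of set bits. */
static inline int parity32_sketch(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* The replacement: XOR the seed with the parity of v. */
static inline bool new_check_parity32(uint32_t v, bool parity)
{
	return parity ^ parity32_sketch(v);
}

/* Compare both versions on a pseudo-random sequence of inputs. */
static inline void check_i2cr_equivalence(void)
{
	uint32_t x = 1;
	int i;

	for (i = 0; i < 1000; i++) {
		assert(ref_check_parity32(x, false) == new_check_parity32(x, false));
		assert(ref_check_parity32(x, true) == new_check_parity32(x, true));
		x = x * 1664525u + 1013904223u;
	}
}
```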



* [PATCH 15/17] fsi: i2cr: Replace open-coded parity calculation with parity64()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (13 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 14/17] fsi: i2cr: " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 16/17] Input: joystick - " Kuan-Wei Chiu
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Kuan-Wei Chiu, Yu-Chun Lin

Refactor parity calculations to use the standard parity64() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/fsi/fsi-master-i2cr.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/fsi/fsi-master-i2cr.c b/drivers/fsi/fsi-master-i2cr.c
index 8212b99ab2f9..8f558b7c6dbc 100644
--- a/drivers/fsi/fsi-master-i2cr.c
+++ b/drivers/fsi/fsi-master-i2cr.c
@@ -44,15 +44,9 @@ static bool i2cr_check_parity32(u32 v, bool parity)
 
 static bool i2cr_check_parity64(u64 v)
 {
-	u32 i;
 	bool parity = I2CR_INITIAL_PARITY;
 
-	for (i = 0; i < 64; ++i) {
-		if (v & (1llu << i))
-			parity = !parity;
-	}
-
-	return parity;
+	return parity ^ parity64(v);
 }
 
 static u32 i2cr_get_command(u32 address, bool parity)
-- 
2.34.1



* [PATCH 16/17] Input: joystick - Replace open-coded parity calculation with parity64()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (14 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 15/17] fsi: i2cr: Replace open-coded parity calculation with parity64() Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 16:42 ` [PATCH 17/17] nfp: bpf: " Kuan-Wei Chiu
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Kuan-Wei Chiu, Yu-Chun Lin

Refactor parity calculations to use the standard parity64() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/input/joystick/sidewinder.c | 24 +++++-------------------
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/drivers/input/joystick/sidewinder.c b/drivers/input/joystick/sidewinder.c
index 3a5873e5fcb3..9fe980096f70 100644
--- a/drivers/input/joystick/sidewinder.c
+++ b/drivers/input/joystick/sidewinder.c
@@ -7,6 +7,7 @@
  * Microsoft SideWinder joystick family driver for Linux
  */
 
+#include <linux/bitops.h>
 #include <linux/delay.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
@@ -240,21 +241,6 @@ static void sw_init_digital(struct gameport *gameport)
 	local_irq_restore(flags);
 }
 
-/*
- * sw_parity() computes parity of __u64
- */
-
-static int sw_parity(__u64 t)
-{
-	int x = t ^ (t >> 32);
-
-	x ^= x >> 16;
-	x ^= x >> 8;
-	x ^= x >> 4;
-	x ^= x >> 2;
-	x ^= x >> 1;
-	return x & 1;
-}
 
 /*
  * sw_ccheck() checks synchronization bits and computes checksum of nibbles.
@@ -316,7 +302,7 @@ static int sw_parse(unsigned char *buf, struct sw *sw)
 
 			for (i = 0; i < sw->number; i ++) {
 
-				if (sw_parity(GB(i*15,15)))
+				if (parity64(GB(i*15,15)))
 					return -1;
 
 				input_report_abs(sw->dev[i], ABS_X, GB(i*15+3,1) - GB(i*15+2,1));
@@ -333,7 +319,7 @@ static int sw_parse(unsigned char *buf, struct sw *sw)
 		case SW_ID_PP:
 		case SW_ID_FFP:
 
-			if (!sw_parity(GB(0,48)) || (hat = GB(42,4)) > 8)
+			if (!parity64(GB(0,48)) || (hat = GB(42,4)) > 8)
 				return -1;
 
 			dev = sw->dev[0];
@@ -354,7 +340,7 @@ static int sw_parse(unsigned char *buf, struct sw *sw)
 
 		case SW_ID_FSP:
 
-			if (!sw_parity(GB(0,43)) || (hat = GB(28,4)) > 8)
+			if (!parity64(GB(0,43)) || (hat = GB(28,4)) > 8)
 				return -1;
 
 			dev = sw->dev[0];
@@ -379,7 +365,7 @@ static int sw_parse(unsigned char *buf, struct sw *sw)
 
 		case SW_ID_FFW:
 
-			if (!sw_parity(GB(0,33)))
+			if (!parity64(GB(0,33)))
 				return -1;
 
 			dev = sw->dev[0];
-- 
2.34.1



* [PATCH 17/17] nfp: bpf: Replace open-coded parity calculation with parity64()
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (15 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 16/17] Input: joystick - " Kuan-Wei Chiu
@ 2025-02-23 16:42 ` Kuan-Wei Chiu
  2025-02-23 20:25 ` [PATCH 00/17] Introduce and use generic parity32/64 helper Uros Bizjak
  2025-02-24  7:58 ` Jeremy Kerr
  18 siblings, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-23 16:42 UTC (permalink / raw)
  To: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Kuan-Wei Chiu, Yu-Chun Lin

Refactor parity calculations to use the standard parity64() helper.
This change eliminates redundant implementations and improves code
efficiency.

Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_asm.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.c b/drivers/net/ethernet/netronome/nfp/nfp_asm.c
index 154399c5453f..3646f84a6e8c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_asm.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.c
@@ -295,11 +295,6 @@ static const u64 nfp_ustore_ecc_polynomials[NFP_USTORE_ECC_POLY_WORDS] = {
 	0x0daf69a46910ULL,
 };
 
-static bool parity(u64 value)
-{
-	return hweight64(value) & 1;
-}
-
 int nfp_ustore_check_valid_no_ecc(u64 insn)
 {
 	if (insn & ~GENMASK_ULL(NFP_USTORE_OP_BITS, 0))
@@ -314,7 +309,7 @@ u64 nfp_ustore_calc_ecc_insn(u64 insn)
 	int i;
 
 	for (i = 0; i < NFP_USTORE_ECC_POLY_WORDS; i++)
-		ecc |= parity(nfp_ustore_ecc_polynomials[i] & insn) << i;
+		ecc |= parity64(nfp_ustore_ecc_polynomials[i] & insn) << i;
 
 	return insn | (u64)ecc << NFP_USTORE_OP_BITS;
 }
-- 
2.34.1
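
The removed helper computed hweight64(value) & 1, which is the parity of
the value, so each ECC bit is still the parity of the instruction masked
with one polynomial word. A userspace sketch of that construction, using
__builtin_popcountll() as a stand-in for hweight64() and a hypothetical
ecc_bits() helper that takes the masks as a parameter (any mask values a
caller passes in are illustrative, not the driver's polynomials):

```c
#include <stdint.h>

/* Stand-in for hweight64(value) & 1 from the removed helper. */
static inline int parity_by_popcount(uint64_t value)
{
	return __builtin_popcountll(value) & 1;
}

/* Stand-in for the kernel's parity64(), as proposed in patch 02. */
static inline int parity64_sketch(uint64_t val)
{
	val ^= val >> 32;
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Bit i of the result is the parity of (polys[i] & insn). */
static inline unsigned int ecc_bits(const uint64_t *polys, int n, uint64_t insn)
{
	unsigned int ecc = 0;
	int i;

	for (i = 0; i < n; i++)
		ecc |= (unsigned int)parity64_sketch(polys[i] & insn) << i;

	return ecc;
}
```

Both helpers return 0 or 1, so the shift by i behaves the same whether the
parity comes back as bool or as int.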



* Re: [PATCH 00/17] Introduce and use generic parity32/64 helper
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (16 preceding siblings ...)
  2025-02-23 16:42 ` [PATCH 17/17] nfp: bpf: " Kuan-Wei Chiu
@ 2025-02-23 20:25 ` Uros Bizjak
  2025-02-24 15:27   ` Yu-Chun Lin
  2025-02-24  7:58 ` Jeremy Kerr
  18 siblings, 1 reply; 54+ messages in thread
From: Uros Bizjak @ 2025-02-23 20:25 UTC (permalink / raw)
  To: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin

On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
> Several parts of the kernel contain redundant implementations of parity
> calculations for 32-bit and 64-bit values. Introduces generic
> parity32() and parity64() helpers in bitops.h, providing a standardized
> and optimized implementation.
> 
> Subsequent patches refactor various kernel components to replace
> open-coded parity calculations with the new helpers, reducing code
> duplication and improving maintainability.

Please note that GCC (and clang) provide __builtin_parity{,l,ll}() 
family of builtin functions. Recently, I have tried to use this builtin 
in a couple of places [1], [2], but I had to retract the patches, 
because __builtin functions aren't strictly required to be inlined and 
can generate a library call [3].

As explained in [2], the compilers are able to emit optimized
target-dependent code (also automatically using the popcnt insn when
available), so ideally the generic parity64() and parity32() would be
implemented using __builtin_parity(), with the generic library
providing fallback __paritydi2() and __paritysi2() functions, which are
otherwise provided by the compiler support library.

For x86, we would like to exercise the hardware parity calculation or 
optimized code sequences involving HW parity calculation, as shown in 
[1] and [2].
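
As a rough userspace illustration of the idea (not a proposed kernel
implementation), the helpers could be thin wrappers around the builtins.
The names parity32_builtin(), parity64_builtin() and parity64_fold() are
hypothetical; in userspace any library call the compiler emits is resolved
by its support library, which is exactly the part the kernel would have to
provide itself:

```c
#include <stdint.h>

/* Wrapper around the GCC/clang builtin for unsigned int. */
static inline int parity32_builtin(uint32_t val)
{
	return __builtin_parity(val);
}

/* Wrapper around the GCC/clang builtin for unsigned long long. */
static inline int parity64_builtin(uint64_t val)
{
	return __builtin_parityll(val);
}

/* xor-folding version, shown only for comparison with the builtin. */
static inline int parity64_fold(uint64_t val)
{
	val ^= val >> 32;
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}
```

Whether the builtin expands inline or becomes a call depends on the target
and compiler options.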

[1] https://lore.kernel.org/lkml/20250129205746.10963-1-ubizjak@gmail.com/

[2] https://lore.kernel.org/lkml/20250129154920.6773-2-ubizjak@gmail.com/

[3] 
https://lore.kernel.org/linux-mm/CAKbZUD0N7bkuw_Le3Pr9o1V2BjjcY_YiLm8a8DPceubTdZ00GQ@mail.gmail.com/

Thanks,
Uros.


* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-23 16:42 ` [PATCH 02/17] bitops: Add generic parity calculation for u64 Kuan-Wei Chiu
@ 2025-02-24  7:09   ` Jiri Slaby
  2025-02-24 13:34     ` David Laight
  2025-02-24 19:27   ` Yury Norov
  1 sibling, 1 reply; 54+ messages in thread
From: Jiri Slaby @ 2025-02-24  7:09 UTC (permalink / raw)
  To: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin

On 23. 02. 25, 17:42, Kuan-Wei Chiu wrote:
> Several parts of the kernel open-code parity calculations using
> different methods. Add a generic parity64() helper implemented with the
> same efficient approach as parity8().
> 
> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>   include/linux/bitops.h | 22 ++++++++++++++++++++++
>   1 file changed, 22 insertions(+)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index fb13dedad7aa..67677057f5e2 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
>   	return (0x6996 >> (val & 0xf)) & 1;
>   }
>   
> +/**
> + * parity64 - get the parity of an u64 value
> + * @value: the value to be examined
> + *
> + * Determine the parity of the u64 argument.
> + *
> + * Returns:
> + * 0 for even parity, 1 for odd parity
> + */
> +static inline int parity64(u64 val)
> +{
> +	/*
> +	 * One explanation of this algorithm:
> +	 * https://funloop.org/codex/problem/parity/README.html
> +	 */
> +	val ^= val >> 32;

Do we need all these implementations? Can't we simply use parity64() for 
any 8, 16 and 32-bit values too? I.e. have one parity().

> +	val ^= val >> 16;
> +	val ^= val >> 8;
> +	val ^= val >> 4;
> +	return (0x6996 >> (val & 0xf)) & 1;
> +}
> +
>   /**
>    * __ffs64 - find first set bit in a 64 bit word
>    * @word: The 64 bit word


-- 
js
suse labs


* Re: [PATCH 07/17] serial: max3100: Replace open-coded parity calculation with parity8()
  2025-02-23 16:42 ` [PATCH 07/17] serial: max3100: " Kuan-Wei Chiu
@ 2025-02-24  7:25   ` Jiri Slaby
  0 siblings, 0 replies; 54+ messages in thread
From: Jiri Slaby @ 2025-02-24  7:25 UTC (permalink / raw)
  To: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin

On 23. 02. 25, 17:42, Kuan-Wei Chiu wrote:
> --- a/drivers/tty/serial/max3100.c
> +++ b/drivers/tty/serial/max3100.c
> @@ -16,6 +16,7 @@
>   /* 4 MAX3100s should be enough for everyone */
>   #define MAX_MAX3100 4
>   
> +#include <linux/bitops.h>
>   #include <linux/container_of.h>
>   #include <linux/delay.h>
>   #include <linux/device.h>
> @@ -133,7 +134,7 @@ static int max3100_do_parity(struct max3100_port *s, u16 c)
>   	else
>   		c &= 0xff;
>   
> -	parity = parity ^ (hweight8(c) & 1);
> +	parity = parity ^ (parity8(c));
>   	return parity;

So all this should be simply:
return parity ^ parity8(c);

>   }
>   

thanks,
-- 
js
suse labs


* Re: [PATCH 00/17] Introduce and use generic parity32/64 helper
  2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
                   ` (17 preceding siblings ...)
  2025-02-23 20:25 ` [PATCH 00/17] Introduce and use generic parity32/64 helper Uros Bizjak
@ 2025-02-24  7:58 ` Jeremy Kerr
  2025-02-24 15:35   ` Yu-Chun Lin
  18 siblings, 1 reply; 54+ messages in thread
From: Jeremy Kerr @ 2025-02-24  7:58 UTC (permalink / raw)
  To: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin

Hi Kuan-Wei,

> Several parts of the kernel contain redundant implementations of parity
> calculations for 32-bit and 64-bit values. Introduces generic
> parity32() and parity64() helpers in bitops.h, providing a standardized
> and optimized implementation.  

More so than __builtin_parity() ?

I'm all for reducing the duplication, but the compiler may well have a
better parity approach than the xor-folding implementation here. Looks
like we can get this to two instructions on powerpc64, for example.

Cheers,


Jeremy


* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-24  7:09   ` Jiri Slaby
@ 2025-02-24 13:34     ` David Laight
  2025-02-24 16:56       ` Yu-Chun Lin
                         ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: David Laight @ 2025-02-24 13:34 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, yury.norov, akpm, hpa,
	alistair, linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Mon, 24 Feb 2025 08:09:43 +0100
Jiri Slaby <jirislaby@kernel.org> wrote:

> On 23. 02. 25, 17:42, Kuan-Wei Chiu wrote:
> > Several parts of the kernel open-code parity calculations using
> > different methods. Add a generic parity64() helper implemented with the
> > same efficient approach as parity8().
> > 
> > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> >   include/linux/bitops.h | 22 ++++++++++++++++++++++
> >   1 file changed, 22 insertions(+)
> > 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index fb13dedad7aa..67677057f5e2 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
> >   	return (0x6996 >> (val & 0xf)) & 1;
> >   }
> >   
> > +/**
> > + * parity64 - get the parity of an u64 value
> > + * @value: the value to be examined
> > + *
> > + * Determine the parity of the u64 argument.
> > + *
> > + * Returns:
> > + * 0 for even parity, 1 for odd parity
> > + */
> > +static inline int parity64(u64 val)
> > +{
> > +	/*
> > +	 * One explanation of this algorithm:
> > +	 * https://funloop.org/codex/problem/parity/README.html
> > +	 */
> > +	val ^= val >> 32;  
> 
> Do we need all these implementations? Can't we simply use parity64() for 
> any 8, 16 and 32-bit values too? I.e. have one parity().

I'm not sure you can guarantee that the compiler will optimise away
the unnecessary operations.

But:
static inline int parity64(u64 val)
{
	return parity32(val ^ (val >> 32));
}

should be ok.
It will also work on x86-32 where parity32() can just check the parity flag.
Although you are unlikely to manage to use the PF the xor sets.
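
A self-contained userspace sketch of that layering, checked against a
bit-by-bit reference (parity32_sketch(), parity64_sketch() and
parity64_ref() are stand-in names, and the 32-bit body is the xor-folding
version from the series rather than anything arch-specific):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's parity32(): 1 for an odd number of set bits. */
static inline int parity32_sketch(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Fold the two halves together, then defer to the 32-bit helper. */
static inline int parity64_sketch(uint64_t val)
{
	return parity32_sketch((uint32_t)(val ^ (val >> 32)));
}

/* Bit-by-bit reference, used only for checking. */
static inline int parity64_ref(uint64_t val)
{
	int p = 0;

	while (val) {
		p ^= (int)(val & 1);
		val >>= 1;
	}
	return p;
}

/* Compare the folded version with the reference on pseudo-random inputs. */
static inline void check_parity64(void)
{
	uint64_t x = 0x0123456789abcdefULL;
	int i;

	for (i = 0; i < 1000; i++) {
		assert(parity64_sketch(x) == parity64_ref(x));
		x = x * 6364136223846793005ULL + 1442695040888963407ULL;
	}
}
```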

	David

> 
> > +	val ^= val >> 16;
> > +	val ^= val >> 8;
> > +	val ^= val >> 4;
> > +	return (0x6996 >> (val & 0xf)) & 1;
> > +}
> > +
> >   /**
> >    * __ffs64 - find first set bit in a 64 bit word
> >    * @word: The 64 bit word  
> 
> 



* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-23 16:42 ` [PATCH 03/17] x86: Replace open-coded parity calculation with parity8() Kuan-Wei Chiu
@ 2025-02-24 15:24   ` Uros Bizjak
  2025-02-24 21:55     ` H. Peter Anvin
  0 siblings, 1 reply; 54+ messages in thread
From: Uros Bizjak @ 2025-02-24 15:24 UTC (permalink / raw)
  To: Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm,
	mingo
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin



On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
> Refactor parity calculations to use the standard parity8() helper. This
> change eliminates redundant implementations and improves code
> efficiency.

The patch improves parity assembly code in bootflag.o from:

   58:	89 de                	mov    %ebx,%esi
   5a:	b9 08 00 00 00       	mov    $0x8,%ecx
   5f:	31 d2                	xor    %edx,%edx
   61:	89 f0                	mov    %esi,%eax
   63:	89 d7                	mov    %edx,%edi
   65:	40 d0 ee             	shr    %sil
   68:	83 e0 01             	and    $0x1,%eax
   6b:	31 c2                	xor    %eax,%edx
   6d:	83 e9 01             	sub    $0x1,%ecx
   70:	75 ef                	jne    61 <sbf_init+0x51>
   72:	39 c7                	cmp    %eax,%edi
   74:	74 7f                	je     f5 <sbf_init+0xe5>
   76:

to:

   54:	89 d8                	mov    %ebx,%eax
   56:	ba 96 69 00 00       	mov    $0x6996,%edx
   5b:	c0 e8 04             	shr    $0x4,%al
   5e:	31 d8                	xor    %ebx,%eax
   60:	83 e0 0f             	and    $0xf,%eax
   63:	0f a3 c2             	bt     %eax,%edx
   66:	73 64                	jae    cc <sbf_init+0xbc>
   68:

which is faster and smaller (-10 bytes) code.
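
For readers who prefer C to disassembly, the two listings correspond
roughly to the two functions below: the removed bit-at-a-time loop, and
the nibble fold with the 0x6996 lookup constant that the second listing
loads into %edx. This is a userspace sketch with stand-in names
(parity_loop8(), parity8_sketch(), check_parity8()), checked over all 256
byte values:

```c
#include <assert.h>
#include <stdint.h>

/* Bit-at-a-time loop, as in the removed bootflag.c helper. */
static inline int parity_loop8(uint8_t v)
{
	int x = 0;
	int i;

	for (i = 0; i < 8; i++) {
		x ^= (v & 1);
		v >>= 1;
	}
	return x;
}

/* Fold the high nibble into the low one, then index a 16-bit table:
 * bit n of 0x6996 is the parity of the 4-bit value n. */
static inline int parity8_sketch(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Exhaustive comparison of both versions. */
static inline void check_parity8(void)
{
	unsigned int v;

	for (v = 0; v < 256; v++)
		assert(parity_loop8((uint8_t)v) == parity8_sketch((uint8_t)v));
}
```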

Reviewed-by: Uros Bizjak <ubizjak@gmail.com>

Thanks,
Uros.

> 
> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>   arch/x86/kernel/bootflag.c | 18 +++---------------
>   1 file changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kernel/bootflag.c b/arch/x86/kernel/bootflag.c
> index 3fed7ae58b60..314ff0e84900 100644
> --- a/arch/x86/kernel/bootflag.c
> +++ b/arch/x86/kernel/bootflag.c
> @@ -8,6 +8,7 @@
>   #include <linux/string.h>
>   #include <linux/spinlock.h>
>   #include <linux/acpi.h>
> +#include <linux/bitops.h>
>   #include <asm/io.h>
>   
>   #include <linux/mc146818rtc.h>
> @@ -20,26 +21,13 @@
>   
>   int sbf_port __initdata = -1;	/* set via acpi_boot_init() */
>   
> -static int __init parity(u8 v)
> -{
> -	int x = 0;
> -	int i;
> -
> -	for (i = 0; i < 8; i++) {
> -		x ^= (v & 1);
> -		v >>= 1;
> -	}
> -
> -	return x;
> -}
> -
>   static void __init sbf_write(u8 v)
>   {
>   	unsigned long flags;
>   
>   	if (sbf_port != -1) {
>   		v &= ~SBF_PARITY;
> -		if (!parity(v))
> +		if (!parity8(v))
>   			v |= SBF_PARITY;
>   
>   		printk(KERN_INFO "Simple Boot Flag at 0x%x set to 0x%x\n",
> @@ -70,7 +58,7 @@ static int __init sbf_value_valid(u8 v)
>   {
>   	if (v & SBF_RESERVED)		/* Reserved bits */
>   		return 0;
> -	if (!parity(v))
> +	if (!parity8(v))
>   		return 0;
>   
>   	return 1;

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 00/17] Introduce and use generic parity32/64 helper
  2025-02-23 20:25 ` [PATCH 00/17] Introduce and use generic parity32/64 helper Uros Bizjak
@ 2025-02-24 15:27   ` Yu-Chun Lin
  0 siblings, 0 replies; 54+ messages in thread
From: Yu-Chun Lin @ 2025-02-24 15:27 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm,
	hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv

On Sun, Feb 23, 2025 at 09:25:42PM +0100, Uros Bizjak wrote:
> 
> Please note that GCC (and clang) provide __builtin_parity{,l,ll}() family of
> builtin functions. Recently, I have tried to use this builtin in a couple of
> places [1], [2], but I had to retract the patches, because __builtin
> functions aren't strictly required to be inlined and can generate a library
> call [3].
> 
> As explained in [2], the compilers are able to emit optimized
> target-dependent code (also automatically using popcnt insn when available),
> so ideally the generic parity64() and parity32() would be implemented using
> __builtin_parity(), where the generic library would provide fallback
> __paritydi2() and __paritysi2() functions, otherwise provided by the
> compiler support library.
> 
> For x86, we would like to exercise the hardware parity calculation or
> optimized code sequences involving HW parity calculation, as shown in [1]
> and [2].
> 
> [1] https://lore.kernel.org/lkml/20250129205746.10963-1-ubizjak@gmail.com/
> 
> [2] https://lore.kernel.org/lkml/20250129154920.6773-2-ubizjak@gmail.com/
> 
> [3] https://lore.kernel.org/linux-mm/CAKbZUD0N7bkuw_Le3Pr9o1V2BjjcY_YiLm8a8DPceubTdZ00GQ@mail.gmail.com/

Hi Uros,
Thanks for the information.

We originally planned to implement hardware optimizations after this
patch series. However, for V2, we will incorporate __builtin_parity(),
while keeping our current implementation as the fallback function.

Best regards,
Yu-Chun Lin
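
A rough userspace sketch of the arrangement described above, not the actual V2 code: the wrapper names parity32_sketch() and parity64_sketch() are hypothetical, and the compiler check is an assumption about how a builtin might be selected. In userspace the compiler support library supplies __paritysi2()/__paritydi2() if the builtin is not expanded inline; the kernel has no such library, which is the concern raised in [3].

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: the xor-folding form from the series under review. */
static inline int parity32_fold(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Hypothetical wrapper: prefer the compiler builtin when it exists. */
static inline int parity32_sketch(uint32_t val)
{
#if defined(__GNUC__) || defined(__clang__)
	return __builtin_parity(val);
#else
	return parity32_fold(val);
#endif
}

static inline int parity64_sketch(uint64_t val)
{
#if defined(__GNUC__) || defined(__clang__)
	return __builtin_parityll(val);
#else
	return parity32_fold((uint32_t)(val ^ (val >> 32)));
#endif
}

/* Compare the wrapper against the fallback on a spread of inputs. */
static void parity_builtin_selfcheck(void)
{
	for (uint32_t i = 0; i < 100000; i++) {
		uint32_t v = i * 2654435761u;

		assert(parity32_sketch(v) == parity32_fold(v));
	}
}
```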

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 00/17] Introduce and use generic parity32/64 helper
  2025-02-24  7:58 ` Jeremy Kerr
@ 2025-02-24 15:35   ` Yu-Chun Lin
  0 siblings, 0 replies; 54+ messages in thread
From: Yu-Chun Lin @ 2025-02-24 15:35 UTC (permalink / raw)
  To: Jeremy Kerr
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm,
	hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv

On Mon, Feb 24, 2025 at 03:58:49PM +0800, Jeremy Kerr wrote:
> More so than __builtin_parity() ?
> 
> I'm all for reducing the duplication, but the compiler may well have a
> better parity approach than the xor-folding implementation here. Looks
> like we can get this to two instructions on powerpc64, for example.

Hi Jeremy,

Thanks for your input. We will do that in V2.

Best regards,
Yu-Chun Lin

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-24 13:34     ` David Laight
@ 2025-02-24 16:56       ` Yu-Chun Lin
  2025-02-25 15:21       ` H. Peter Anvin
  2025-02-25 15:24       ` H. Peter Anvin
  2 siblings, 0 replies; 54+ messages in thread
From: Yu-Chun Lin @ 2025-02-24 16:56 UTC (permalink / raw)
  To: David Laight
  Cc: Jiri Slaby, Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk,
	joel, eajames, andrzej.hajda, neil.armstrong, rfoss,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	dmitry.torokhov, mchehab, awalls, hverkuil, miquel.raynal,
	richard, vigneshr, louis.peens, andrew+netdev, davem, edumazet,
	pabeni, parthiban.veerasooran, arend.vanspriel, johannes, gregkh,
	yury.norov, akpm, hpa, alistair, linux, Laurent.pinchart, jonas,
	jernej.skrabec, kuba, linux-kernel, linux-fsi, dri-devel,
	linux-input, linux-media, linux-mtd, oss-drivers, netdev,
	linux-wireless, brcm80211, brcm80211-dev-list.pdl, linux-serial,
	bpf, jserv

On Mon, Feb 24, 2025 at 01:34:31PM +0000, David Laight wrote:
> On Mon, 24 Feb 2025 08:09:43 +0100
> Jiri Slaby <jirislaby@kernel.org> wrote:
> 
> > On 23. 02. 25, 17:42, Kuan-Wei Chiu wrote:
> > > Several parts of the kernel open-code parity calculations using
> > > different methods. Add a generic parity64() helper implemented with the
> > > same efficient approach as parity8().
> > > 
> > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > ---
> > >   include/linux/bitops.h | 22 ++++++++++++++++++++++
> > >   1 file changed, 22 insertions(+)
> > > 
> > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > > index fb13dedad7aa..67677057f5e2 100644
> > > --- a/include/linux/bitops.h
> > > +++ b/include/linux/bitops.h
> > > @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
> > >   	return (0x6996 >> (val & 0xf)) & 1;
> > >   }
> > >   
> > > +/**
> > > + * parity64 - get the parity of an u64 value
> > > + * @value: the value to be examined
> > > + *
> > > + * Determine the parity of the u64 argument.
> > > + *
> > > + * Returns:
> > > + * 0 for even parity, 1 for odd parity
> > > + */
> > > +static inline int parity64(u64 val)
> > > +{
> > > +	/*
> > > +	 * One explanation of this algorithm:
> > > +	 * https://funloop.org/codex/problem/parity/README.html
> > > +	 */
> > > +	val ^= val >> 32;  
> > 
> > Do we need all these implementations? Can't we simply use parity64() for 
> > any 8, 16 and 32-bit values too? I.e. have one parity().
> 
> I'm not sure you can guarantee that the compiler will optimise away
> the unnecessary operations.

Hi Jiri and David,

Unless we can be certain about the compiler's optimization behavior, we
prefer to follow an approach similar to hweight, distinguishing
implementations based on different bit sizes.

> 
> But:
> static inline int parity64(u64 val)
> {
> 	return parity32(val ^ (val >> 32));
> }
> 
> should be ok.

We will adopt this approach, as it is indeed more concise.
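
A minimal userspace sketch of that delegation, assuming uint32_t/uint64_t stand in for u32/u64. parity64_full() is a hypothetical name for the five-step fold from the posted patch, kept only so the two forms can be compared.

```c
#include <assert.h>
#include <stdint.h>

static inline int parity32(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Full fold as posted in patch 02/17 (hypothetical name). */
static inline int parity64_full(uint64_t val)
{
	val ^= val >> 32;
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Suggested form: fold the two halves once, then reuse parity32(). */
static inline int parity64(uint64_t val)
{
	return parity32((uint32_t)(val ^ (val >> 32)));
}

/* Walk a single set bit through every position, plus a mixed pattern. */
static void parity64_selfcheck(void)
{
	for (int bit = 0; bit < 64; bit++) {
		uint64_t v = 1ULL << bit;

		assert(parity64(v) == parity64_full(v));
		assert(parity64(v | 1) == parity64_full(v | 1));
	}
}
```

The truncating cast is safe because after the first fold only the low 32 bits carry information that the remaining steps read.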

Thank you all for your feedback.

Best regards,

Yu-Chun Lin

> It will also work on x86-32 where parity32() can just check the parity flag.
> Although you are unlikely to manage to use the PF that the xor sets.
> 
> 	David
> 
> > 
> > > +	val ^= val >> 16;
> > > +	val ^= val >> 8;
> > > +	val ^= val >> 4;
> > > +	return (0x6996 >> (val & 0xf)) & 1;
> > > +}
> > > +
> > >   /**
> > >    * __ffs64 - find first set bit in a 64 bit word
> > >    * @word: The 64 bit word  
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-23 16:42 ` [PATCH 02/17] bitops: Add generic parity calculation for u64 Kuan-Wei Chiu
  2025-02-24  7:09   ` Jiri Slaby
@ 2025-02-24 19:27   ` Yury Norov
  2025-02-25 13:29     ` Kuan-Wei Chiu
  2025-02-26 22:29     ` David Laight
  1 sibling, 2 replies; 54+ messages in thread
From: Yury Norov @ 2025-02-24 19:27 UTC (permalink / raw)
  To: Kuan-Wei Chiu
  Cc: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, akpm, hpa, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Mon, Feb 24, 2025 at 12:42:02AM +0800, Kuan-Wei Chiu wrote:
> Several parts of the kernel open-code parity calculations using
> different methods. Add a generic parity64() helper implemented with the
> same efficient approach as parity8().

No reason to add parity32() and parity64() in separate patches.
 
> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
>  include/linux/bitops.h | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index fb13dedad7aa..67677057f5e2 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
>  	return (0x6996 >> (val & 0xf)) & 1;
>  }
>  
> +/**
> + * parity64 - get the parity of an u64 value
> + * @value: the value to be examined
> + *
> + * Determine the parity of the u64 argument.
> + *
> + * Returns:
> + * 0 for even parity, 1 for odd parity
> + */
> +static inline int parity64(u64 val)
> +{
> +	/*
> +	 * One explanation of this algorithm:
> +	 * https://funloop.org/codex/problem/parity/README.html

This is already referenced in the sources. No need to spread it further.

> +	 */
> +	val ^= val >> 32;
> +	val ^= val >> 16;
> +	val ^= val >> 8;
> +	val ^= val >> 4;
> +	return (0x6996 >> (val & 0xf)) & 1;

It's better to avoid duplicating the same logic again and again.

> +}
> +

So maybe make it a macro?


From f17a28ae3429f49825d65ebc0f7717c6a191a3e2 Mon Sep 17 00:00:00 2001
From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 24 Feb 2025 14:14:27 -0500
Subject: [PATCH] bitops: generalize parity8()

The generic parity calculation approach may be easily generalized for
other standard types. Do that and drop the sub-optimal implementation of
parity calculation in x86 code.

Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
---
 arch/x86/kernel/bootflag.c | 14 +-----------
 include/linux/bitops.h     | 47 +++++++++++++++++++++++++++-----------
 2 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kernel/bootflag.c b/arch/x86/kernel/bootflag.c
index 3fed7ae58b60..4a85c69a28f8 100644
--- a/arch/x86/kernel/bootflag.c
+++ b/arch/x86/kernel/bootflag.c
@@ -2,6 +2,7 @@
 /*
  *	Implement 'Simple Boot Flag Specification 2.0'
  */
+#include <linux/bitops.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
@@ -20,19 +21,6 @@
 
 int sbf_port __initdata = -1;	/* set via acpi_boot_init() */
 
-static int __init parity(u8 v)
-{
-	int x = 0;
-	int i;
-
-	for (i = 0; i < 8; i++) {
-		x ^= (v & 1);
-		v >>= 1;
-	}
-
-	return x;
-}
-
 static void __init sbf_write(u8 v)
 {
 	unsigned long flags;
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index c1cb53cf2f0f..29601434f5f4 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -230,10 +230,10 @@ static inline int get_count_order_long(unsigned long l)
 }
 
 /**
- * parity8 - get the parity of an u8 value
+ * parity - get the parity of a value
  * @value: the value to be examined
  *
- * Determine the parity of the u8 argument.
+ * Determine parity of the argument.
  *
  * Returns:
  * 0 for even parity, 1 for odd parity
@@ -241,24 +241,45 @@ static inline int get_count_order_long(unsigned long l)
  * Note: This function informs you about the current parity. Example to bail
  * out when parity is odd:
  *
- *	if (parity8(val) == 1)
+ *	if (parity(val) == 1)
  *		return -EBADMSG;
  *
  * If you need to calculate a parity bit, you need to draw the conclusion from
  * this result yourself. Example to enforce odd parity, parity bit is bit 7:
  *
- *	if (parity8(val) == 0)
+ *	if (parity(val) == 0)
  *		val ^= BIT(7);
+ *
+ * One explanation of this algorithm:
+ * https://funloop.org/codex/problem/parity/README.html
  */
-static inline int parity8(u8 val)
-{
-	/*
-	 * One explanation of this algorithm:
-	 * https://funloop.org/codex/problem/parity/README.html
-	 */
-	val ^= val >> 4;
-	return (0x6996 >> (val & 0xf)) & 1;
-}
+#define parity(val)					\
+({							\
+	u64 __v = (val);				\
+	int __ret;					\
+	switch (BITS_PER_TYPE(val)) {			\
+	case 64:					\
+		__v ^= __v >> 32;			\
+		fallthrough;				\
+	case 32:					\
+		__v ^= __v >> 16;			\
+		fallthrough;				\
+	case 16:					\
+		__v ^= __v >> 8;			\
+		fallthrough;				\
+	case 8:						\
+		__v ^= __v >> 4;			\
+		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
+		break;					\
+	default:					\
+		BUILD_BUG();				\
+	}						\
+	__ret;						\
+})
+
+#define parity8(val)	parity((u8)(val))
+#define parity32(val)	parity((u32)(val))
+#define parity64(val)	parity((u64)(val))
 
 /**
  * __ffs64 - find first set bit in a 64 bit word
-- 
2.43.0
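
For experimenting with the macro outside the kernel, a userspace approximation might look like the following. This is only a sketch: PARITY_SKETCH and the *_sketch wrappers are made-up names, sizeof(val) * 8 stands in for BITS_PER_TYPE(), __attribute__((fallthrough)) stands in for the kernel's fallthrough macro, and _Static_assert stands in for BUILD_BUG(). It relies on GNU statement expressions, so it needs GCC or clang.

```c
#include <assert.h>
#include <stdint.h>

#define PARITY_SKETCH(val)						\
({									\
	_Static_assert(sizeof(val) == 1 || sizeof(val) == 2 ||		\
		       sizeof(val) == 4 || sizeof(val) == 8,		\
		       "unsupported operand size");			\
	uint64_t v_ = (val);						\
	int ret_ = 0;							\
	/* The switch operand is a constant, so dead cases fold away. */	\
	switch (sizeof(val) * 8) {					\
	case 64:							\
		v_ ^= v_ >> 32;						\
		__attribute__((fallthrough));				\
	case 32:							\
		v_ ^= v_ >> 16;						\
		__attribute__((fallthrough));				\
	case 16:							\
		v_ ^= v_ >> 8;						\
		__attribute__((fallthrough));				\
	case 8:								\
		v_ ^= v_ >> 4;						\
		ret_ = (0x6996 >> (v_ & 0xf)) & 1;			\
		break;							\
	}								\
	ret_;								\
})

#define parity8_sketch(val)	PARITY_SKETCH((uint8_t)(val))
#define parity32_sketch(val)	PARITY_SKETCH((uint32_t)(val))
#define parity64_sketch(val)	PARITY_SKETCH((uint64_t)(val))

/* Spot checks for each supported width. */
static void parity_macro_selfcheck(void)
{
	assert(parity8_sketch(0x01) == 1);
	assert(PARITY_SKETCH((uint16_t)0x0103) == 1);
	assert(parity32_sketch(0x80000001u) == 0);
	assert(parity64_sketch(1ULL << 63) == 1);
}
```

Because each case only folds the upper half of the remaining width into the lower half, the final nibble depends solely on the low sizeof(val) * 8 bits of the 64-bit temporary.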


^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-24 15:24   ` Uros Bizjak
@ 2025-02-24 21:55     ` H. Peter Anvin
  2025-02-24 22:08       ` Uros Bizjak
                         ` (2 more replies)
  0 siblings, 3 replies; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-24 21:55 UTC (permalink / raw)
  To: Uros Bizjak, Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen,
	x86, jk, joel, eajames, andrzej.hajda, neil.armstrong, rfoss,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	dmitry.torokhov, mchehab, awalls, hverkuil, miquel.raynal,
	richard, vigneshr, louis.peens, andrew+netdev, davem, edumazet,
	pabeni, parthiban.veerasooran, arend.vanspriel, johannes, gregkh,
	jirislaby, yury.norov, akpm, mingo
  Cc: alistair, linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On 2/24/25 07:24, Uros Bizjak wrote:
> 
> 
> On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
>> Refactor parity calculations to use the standard parity8() helper. This
>> change eliminates redundant implementations and improves code
>> efficiency.
> 
> The patch improves parity assembly code in bootflag.o from:
> 
>    58:    89 de                    mov    %ebx,%esi
>    5a:    b9 08 00 00 00           mov    $0x8,%ecx
>    5f:    31 d2                    xor    %edx,%edx
>    61:    89 f0                    mov    %esi,%eax
>    63:    89 d7                    mov    %edx,%edi
>    65:    40 d0 ee                 shr    %sil
>    68:    83 e0 01                 and    $0x1,%eax
>    6b:    31 c2                    xor    %eax,%edx
>    6d:    83 e9 01                 sub    $0x1,%ecx
>    70:    75 ef                    jne    61 <sbf_init+0x51>
>    72:    39 c7                    cmp    %eax,%edi
>    74:    74 7f                    je     f5 <sbf_init+0xe5>
>    76:
> 
> to:
> 
>    54:    89 d8                    mov    %ebx,%eax
>    56:    ba 96 69 00 00           mov    $0x6996,%edx
>    5b:    c0 e8 04                 shr    $0x4,%al
>    5e:    31 d8                    xor    %ebx,%eax
>    60:    83 e0 0f                 and    $0xf,%eax
>    63:    0f a3 c2                 bt     %eax,%edx
>    66:    73 64                    jae    cc <sbf_init+0xbc>
>    68:
> 
> which is faster and smaller (-10 bytes) code.
> 

Of course, on x86, parity8() and parity16() can be implemented very simply:

(Also, the parity functions really ought to return bool, and be flagged 
__attribute_const__.)

static inline __attribute_const__ bool _arch_parity8(u8 val)
{
	bool parity;
	asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
	return parity;
}

static inline __attribute_const__ bool _arch_parity16(u16 val)
{
	bool parity;
	asm("xor %h0,%b0" : "=@ccnp" (parity), "+Q" (val));
	return parity;
}

In the generic algorithm, you probably should implement parity16() in 
terms of parity8(), parity32() in terms of parity16() and so on:

static inline __attribute_const__ bool parity16(u16 val)
{
#ifdef ARCH_HAS_PARITY16
	if (!__builtin_constant_p(val))
		return _arch_parity16(val);
#endif
	return parity8(val ^ (val >> 8));
}

This picks up the architectural versions when available.
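
A self-contained userspace sketch of that layering follows. DEMO_ARCH_HAS_PARITY16 and demo_arch_parity16() are hypothetical stand-ins for ARCH_HAS_PARITY16 and _arch_parity16(); the stand-in simply calls __builtin_parity() and is compiled out unless the macro is defined. The functions return bool and carry __attribute__((const)), following the suggestion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline __attribute__((const)) bool parity8(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

#ifdef DEMO_ARCH_HAS_PARITY16
/* Hypothetical arch hook; a real one would use an arch instruction. */
static inline __attribute__((const)) bool demo_arch_parity16(uint16_t val)
{
	return __builtin_parity(val);
}
#endif

static inline __attribute__((const)) bool parity16(uint16_t val)
{
#ifdef DEMO_ARCH_HAS_PARITY16
	/* Keep compile-time constants on the foldable generic path. */
	if (!__builtin_constant_p(val))
		return demo_arch_parity16(val);
#endif
	return parity8((uint8_t)(val ^ (val >> 8)));
}

static inline __attribute__((const)) bool parity32(uint32_t val)
{
	return parity16((uint16_t)(val ^ (val >> 16)));
}

static inline __attribute__((const)) bool parity64(uint64_t val)
{
	return parity32((uint32_t)(val ^ (val >> 32)));
}

/* A single set bit at any position must give odd parity. */
static void parity_chain_selfcheck(void)
{
	for (int bit = 0; bit < 64; bit++)
		assert(parity64(1ULL << bit) == true);
}
```

Each wider helper inherits whatever override the narrower one picks up, so an architecture only needs to supply the widths it can do cheaply.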

Furthermore, if a popcnt instruction is known to exist, then the parity 
is simply popcnt(x) & 1.

	-hpa
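
The popcnt identity can be checked in userspace with the compiler builtins. This is a sketch only: the function names are invented here, and whether __builtin_popcount() becomes a single popcnt instruction depends on the target flags.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: xor-folding parity as in the generic helper. */
static inline int parity32_fold(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Parity is the low bit of the population count. */
static inline int parity32_popcount(uint32_t val)
{
	return __builtin_popcount(val) & 1;
}

static inline int parity64_popcount(uint64_t val)
{
	return __builtin_popcountll(val) & 1;
}

/* Compare both forms on a spread of 32-bit inputs. */
static void parity_popcount_selfcheck(void)
{
	for (uint32_t i = 0; i < 100000; i++) {
		uint32_t v = i * 2654435761u;

		assert(parity32_popcount(v) == parity32_fold(v));
	}
}
```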


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-24 21:55     ` H. Peter Anvin
@ 2025-02-24 22:08       ` Uros Bizjak
  2025-02-24 22:18         ` H. Peter Anvin
  2025-02-25  3:36         ` H. Peter Anvin
  2025-02-24 22:17       ` Yury Norov
  2025-02-25 22:46       ` David Laight
  2 siblings, 2 replies; 54+ messages in thread
From: Uros Bizjak @ 2025-02-24 22:08 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm,
	mingo, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin

On Mon, Feb 24, 2025 at 10:56 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> Of course, on x86, parity8() and parity16() can be implemented very simply:
>
> (Also, the parity functions really ought to return bool, and be flagged
> __attribute_const__.)
>
> static inline __attribute_const__ bool _arch_parity8(u8 val)
> {
>         bool parity;
>         asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));

asm("test %0,%0" : "=@ccnp" (parity) : "q" (val));

because we are interested only in flags.

Uros.
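
A compilable userspace sketch of the test-based variant, with a generic fallback for other targets. Assumptions: the name parity8_x86_sketch() is made up, and the guard on __GCC_ASM_FLAG_OUTPUTS__ is how flag-output support is detected here. Note that the sketch names the input as %1: as far as I can tell from the GCC documentation, a flag output occupies operand 0 and is not meant to be referenced in the template.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Generic fold-and-lookup form, used as fallback and as reference. */
static inline bool parity8_generic(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

static inline bool parity8_x86_sketch(uint8_t val)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
	defined(__GCC_ASM_FLAG_OUTPUTS__)
	bool odd;

	/*
	 * TEST sets PF when the low byte of the result has an even number
	 * of set bits, so the "np" (PF clear) condition means odd parity.
	 */
	__asm__("test %1,%1" : "=@ccnp" (odd) : "q" (val));
	return odd;
#else
	return parity8_generic(val);
#endif
}

/* Exhaustive comparison over all 256 possible inputs. */
static void parity8_x86_selfcheck(void)
{
	for (unsigned int v = 0; v < 256; v++)
		assert(parity8_x86_sketch((uint8_t)v) ==
		       parity8_generic((uint8_t)v));
}
```

Unlike "and", "test" discards its result, so the input register does not need to be declared as modified.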

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-24 21:55     ` H. Peter Anvin
  2025-02-24 22:08       ` Uros Bizjak
@ 2025-02-24 22:17       ` Yury Norov
  2025-02-24 22:21         ` H. Peter Anvin
  2025-02-25 22:46       ` David Laight
  2 siblings, 1 reply; 54+ messages in thread
From: Yury Norov @ 2025-02-24 22:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Uros Bizjak, Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen,
	x86, jk, joel, eajames, andrzej.hajda, neil.armstrong, rfoss,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	dmitry.torokhov, mchehab, awalls, hverkuil, miquel.raynal,
	richard, vigneshr, louis.peens, andrew+netdev, davem, edumazet,
	pabeni, parthiban.veerasooran, arend.vanspriel, johannes, gregkh,
	jirislaby, akpm, mingo, alistair, linux, Laurent.pinchart, jonas,
	jernej.skrabec, kuba, linux-kernel, linux-fsi, dri-devel,
	linux-input, linux-media, linux-mtd, oss-drivers, netdev,
	linux-wireless, brcm80211, brcm80211-dev-list.pdl, linux-serial,
	bpf, jserv, Yu-Chun Lin

On Mon, Feb 24, 2025 at 01:55:28PM -0800, H. Peter Anvin wrote:
> Of course, on x86, parity8() and parity16() can be implemented very simply:
> 
> (Also, the parity functions really ought to return bool, and be flagged
> __attribute_const__.)

There was a discussion regarding the return type when parity8() was added.
The integer type was chosen over bool on the grounds that a bool should
be returned as the answer to some question, as in parity_odd().

To me it's not a big deal. We can switch to boolean and describe in a
comment what 'true' means for the parity() function.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-24 22:08       ` Uros Bizjak
@ 2025-02-24 22:18         ` H. Peter Anvin
  2025-02-25  3:36         ` H. Peter Anvin
  1 sibling, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-24 22:18 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm,
	mingo, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin

On February 24, 2025 2:08:05 PM PST, Uros Bizjak <ubizjak@gmail.com> wrote:
>On Mon, Feb 24, 2025 at 10:56 PM H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> Of course, on x86, parity8() and parity16() can be implemented very simply:
>>
>> (Also, the parity functions really ought to return bool, and be flagged
>> __attribute_const__.)
>>
>> static inline __attribute_const__ bool _arch_parity8(u8 val)
>> {
>>         bool parity;
>>         asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
>
>asm("test %0,%0" : "=@ccnp" (parity) : "q" (val));
>
>because we are interested only in flags.
>
>Uros.
>

Same thing, really, but yes, using test is cleaner.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-24 22:17       ` Yury Norov
@ 2025-02-24 22:21         ` H. Peter Anvin
  2025-02-24 22:30           ` Yury Norov
  0 siblings, 1 reply; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-24 22:21 UTC (permalink / raw)
  To: Yury Norov
  Cc: Uros Bizjak, Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen,
	x86, jk, joel, eajames, andrzej.hajda, neil.armstrong, rfoss,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	dmitry.torokhov, mchehab, awalls, hverkuil, miquel.raynal,
	richard, vigneshr, louis.peens, andrew+netdev, davem, edumazet,
	pabeni, parthiban.veerasooran, arend.vanspriel, johannes, gregkh,
	jirislaby, akpm, mingo, alistair, linux, Laurent.pinchart, jonas,
	jernej.skrabec, kuba, linux-kernel, linux-fsi, dri-devel,
	linux-input, linux-media, linux-mtd, oss-drivers, netdev,
	linux-wireless, brcm80211, brcm80211-dev-list.pdl, linux-serial,
	bpf, jserv, Yu-Chun Lin

On February 24, 2025 2:17:29 PM PST, Yury Norov <yury.norov@gmail.com> wrote:
>On Mon, Feb 24, 2025 at 01:55:28PM -0800, H. Peter Anvin wrote:
>> Of course, on x86, parity8() and parity16() can be implemented very simply:
>> 
>> (Also, the parity functions really ought to return bool, and be flagged
>> __attribute_const__.)
>
>There was a discussion regarding return type when parity8() was added.
>The integer type was taken over bool with a sort of consideration that
>bool should be returned as an answer to some question, like parity_odd().
>
>To me it's not a big deal. We can switch to boolean and describe in
>comment what the 'true' means for the parity() function.

Bool is really the single-bit type, and gives the compiler more information. You could argue that the function really should be called parity_odd*() in general, but that's kind of excessive IMO.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-24 22:21         ` H. Peter Anvin
@ 2025-02-24 22:30           ` Yury Norov
  0 siblings, 0 replies; 54+ messages in thread
From: Yury Norov @ 2025-02-24 22:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Uros Bizjak, Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen,
	x86, jk, joel, eajames, andrzej.hajda, neil.armstrong, rfoss,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	dmitry.torokhov, mchehab, awalls, hverkuil, miquel.raynal,
	richard, vigneshr, louis.peens, andrew+netdev, davem, edumazet,
	pabeni, parthiban.veerasooran, arend.vanspriel, johannes, gregkh,
	jirislaby, akpm, mingo, alistair, linux, Laurent.pinchart, jonas,
	jernej.skrabec, kuba, linux-kernel, linux-fsi, dri-devel,
	linux-input, linux-media, linux-mtd, oss-drivers, netdev,
	linux-wireless, brcm80211, brcm80211-dev-list.pdl, linux-serial,
	bpf, jserv, Yu-Chun Lin

On Mon, Feb 24, 2025 at 02:21:13PM -0800, H. Peter Anvin wrote:
> On February 24, 2025 2:17:29 PM PST, Yury Norov <yury.norov@gmail.com> wrote:
> >On Mon, Feb 24, 2025 at 01:55:28PM -0800, H. Peter Anvin wrote:
> >> On 2/24/25 07:24, Uros Bizjak wrote:
> >> > 
> >> > 
> >> > On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
> >> > > Refactor parity calculations to use the standard parity8() helper. This
> >> > > change eliminates redundant implementations and improves code
> >> > > efficiency.
> >> > 
> >> > The patch improves parity assembly code in bootflag.o from:
> >> > 
> >> >    58:    89 de                    mov    %ebx,%esi
> >> >    5a:    b9 08 00 00 00           mov    $0x8,%ecx
> >> >    5f:    31 d2                    xor    %edx,%edx
> >> >    61:    89 f0                    mov    %esi,%eax
> >> >    63:    89 d7                    mov    %edx,%edi
> >> >    65:    40 d0 ee                 shr    %sil
> >> >    68:    83 e0 01                 and    $0x1,%eax
> >> >    6b:    31 c2                    xor    %eax,%edx
> >> >    6d:    83 e9 01                 sub    $0x1,%ecx
> >> >    70:    75 ef                    jne    61 <sbf_init+0x51>
> >> >    72:    39 c7                    cmp    %eax,%edi
> >> >    74:    74 7f                    je     f5 <sbf_init+0xe5>
> >> >    76:
> >> > 
> >> > to:
> >> > 
> >> >    54:    89 d8                    mov    %ebx,%eax
> >> >    56:    ba 96 69 00 00           mov    $0x6996,%edx
> >> >    5b:    c0 e8 04                 shr    $0x4,%al
> >> >    5e:    31 d8                    xor    %ebx,%eax
> >> >    60:    83 e0 0f                 and    $0xf,%eax
> >> >    63:    0f a3 c2                 bt     %eax,%edx
> >> >    66:    73 64                    jae    cc <sbf_init+0xbc>
> >> >    68:
> >> > 
> >> > which is faster and smaller (-10 bytes) code.
> >> > 
> >> 
> >> Of course, on x86, parity8() and parity16() can be implemented very simply:
> >> 
> >> (Also, the parity functions really ought to return bool, and be flagged
> >> __attribute_const__.)
> >
> >There was a discussion regarding return type when parity8() was added.
> >The integer type was taken over bool with a sort of consideration that
> >bool should be returned as an answer to some question, like parity_odd().
> >
> >To me it's not a big deal. We can switch to boolean and describe in
> >comment what the 'true' means for the parity() function.
> 
> Bool is really the single-bit type, and gives the compiler more information. You could argue that the function really should be called parity_odd*() in general, but that's kind of excessive IMO.

Yes, I could, but I will not. :) I also feel like bool looks more
natural here.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-24 22:08       ` Uros Bizjak
  2025-02-24 22:18         ` H. Peter Anvin
@ 2025-02-25  3:36         ` H. Peter Anvin
  1 sibling, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-25  3:36 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, yury.norov, akpm,
	mingo, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin


On 2/24/25 14:08, Uros Bizjak wrote:
> On Mon, Feb 24, 2025 at 10:56 PM H. Peter Anvin <hpa@zytor.com> wrote:
>>
>> On 2/24/25 07:24, Uros Bizjak wrote:
>>>
>>>
>>> On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:
>>>> Refactor parity calculations to use the standard parity8() helper. This
>>>> change eliminates redundant implementations and improves code
>>>> efficiency.
>>>
>>> The patch improves parity assembly code in bootflag.o from:
>>>
>>>     58:    89 de                    mov    %ebx,%esi
>>>     5a:    b9 08 00 00 00           mov    $0x8,%ecx
>>>     5f:    31 d2                    xor    %edx,%edx
>>>     61:    89 f0                    mov    %esi,%eax
>>>     63:    89 d7                    mov    %edx,%edi
>>>     65:    40 d0 ee                 shr    %sil
>>>     68:    83 e0 01                 and    $0x1,%eax
>>>     6b:    31 c2                    xor    %eax,%edx
>>>     6d:    83 e9 01                 sub    $0x1,%ecx
>>>     70:    75 ef                    jne    61 <sbf_init+0x51>
>>>     72:    39 c7                    cmp    %eax,%edi
>>>     74:    74 7f                    je     f5 <sbf_init+0xe5>
>>>     76:
>>>
>>> to:
>>>
>>>     54:    89 d8                    mov    %ebx,%eax
>>>     56:    ba 96 69 00 00           mov    $0x6996,%edx
>>>     5b:    c0 e8 04                 shr    $0x4,%al
>>>     5e:    31 d8                    xor    %ebx,%eax
>>>     60:    83 e0 0f                 and    $0xf,%eax
>>>     63:    0f a3 c2                 bt     %eax,%edx
>>>     66:    73 64                    jae    cc <sbf_init+0xbc>
>>>     68:
>>>
>>> which is faster and smaller (-10 bytes) code.
>>>
>>
>> Of course, on x86, parity8() and parity16() can be implemented very simply:
>>
>> (Also, the parity functions really ought to return bool, and be flagged
>> __attribute_const__.)
>>
>> static inline __attribute_const__ bool _arch_parity8(u8 val)
>> {
>>          bool parity;
>>          asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
> 
> asm("test %0,%0" : "=@ccnp" (parity) : "q" (val));
> 
> because we are interested only in flags.
> 

Also, the operands need to be %1,%1 (my mistake; I thought flag outputs
didn't count toward the operand numbering.)
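
For reference, the two corrections above (test instead of and, and operand
%1 instead of %0) can be put together in a small userspace sketch. This is
only an illustration under assumptions: the *_sketch names, parity_ref()
and the mismatch counter are made up for this example and are not part of
the series, and the asm path is only compiled when the compiler advertises
x86 flag outputs via __GCC_ASM_FLAG_OUTPUTS__. Elsewhere it falls back to
the portable loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable reference: true when an odd number of the low 'bits' bits are set. */
static bool parity_ref(uint32_t v, int bits)
{
	bool odd = false;

	for (int i = 0; i < bits; i++)
		odd ^= (v >> i) & 1;
	return odd;
}

#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__x86_64__) || defined(__i386__))
/* Hypothetical helper: TEST only sets flags; "np" (PF clear) means odd parity. */
static inline bool arch_parity8_sketch(uint8_t val)
{
	bool odd;

	__asm__("test %1,%1" : "=@ccnp" (odd) : "q" (val));
	return odd;
}

/* Hypothetical helper: fold the high byte into the low byte, then read PF. */
static inline bool arch_parity16_sketch(uint16_t val)
{
	bool odd;

	__asm__("xor %h1,%b1" : "=@ccnp" (odd), "+Q" (val));
	return odd;
}
#else
/* Fallback for targets without x86 flag outputs. */
static inline bool arch_parity8_sketch(uint8_t val)
{
	return parity_ref(val, 8);
}

static inline bool arch_parity16_sketch(uint16_t val)
{
	return parity_ref(val, 16);
}
#endif

/* Count disagreements with the reference over all 8-bit and 16-bit inputs. */
static int parity_sketch_mismatches(void)
{
	int bad = 0;

	for (uint32_t v = 0; v <= 0xffff; v++) {
		if (v <= 0xff && arch_parity8_sketch((uint8_t)v) != parity_ref(v, 8))
			bad++;
		if (arch_parity16_sketch((uint16_t)v) != parity_ref(v, 16))
			bad++;
	}
	return bad;
}
```

The flag output is operand 0 in both helpers, which is why the value has
to be referenced as %1.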

Finally, this is kind of an obvious improvement:

  static void __init sbf_write(u8 v)
  {
         unsigned long flags;

         if (sbf_port != -1) {
-               v &= ~SBF_PARITY;
                 if (!parity(v))
-                       v |= SBF_PARITY;
+                       v ^= SBF_PARITY;

	-hpa
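
The equivalence of the two sbf_write() variants can be checked over all
256 byte values with a userspace sketch like the one below. It assumes
SBF_PARITY is bit 7, as in bootflag.c, and uses the generic parity8()
algorithm from bitops.h; all names ending in _sketch, the two fix_parity_*
functions and the mismatch counter are invented for this illustration.

```c
#include <stdint.h>

/* Assumed to match SBF_PARITY (bit 7) in arch/x86/kernel/bootflag.c. */
#define SBF_PARITY_SKETCH (1u << 7)

/* Generic parity8() algorithm: 1 for odd parity, 0 for even parity. */
static int parity8_sketch(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Existing logic: clear the parity bit, then set it if the rest is even. */
static uint8_t fix_parity_old(uint8_t v)
{
	v &= ~SBF_PARITY_SKETCH;
	if (!parity8_sketch(v))
		v |= SBF_PARITY_SKETCH;
	return v;
}

/* Proposed logic: if the whole byte is even, flip the parity bit. */
static uint8_t fix_parity_new(uint8_t v)
{
	if (!parity8_sketch(v))
		v ^= SBF_PARITY_SKETCH;
	return v;
}

/* Count byte values for which the two variants disagree. */
static int sbf_parity_mismatches(void)
{
	int bad = 0;

	for (unsigned int v = 0; v < 256; v++)
		if (fix_parity_old((uint8_t)v) != fix_parity_new((uint8_t)v))
			bad++;
	return bad;
}
```

Both variants leave the low seven bits alone and end with odd parity, and
the parity bit is fully determined by those two conditions, so the results
have to match.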


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 11/17] wifi: brcm80211: Replace open-coded parity calculation with parity32()
  2025-02-23 16:42 ` [PATCH 11/17] wifi: brcm80211: " Kuan-Wei Chiu
@ 2025-02-25  6:29   ` Arend Van Spriel
  0 siblings, 0 replies; 54+ messages in thread
From: Arend Van Spriel @ 2025-02-25  6:29 UTC (permalink / raw)
  To: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	johannes, gregkh, jirislaby, yury.norov, akpm
  Cc: hpa, alistair, linux, Laurent.pinchart, jonas, jernej.skrabec,
	kuba, linux-kernel, linux-fsi, dri-devel, linux-input,
	linux-media, linux-mtd, oss-drivers, netdev, linux-wireless,
	brcm80211, brcm80211-dev-list.pdl, linux-serial, bpf, jserv,
	Yu-Chun Lin

On February 23, 2025 5:44:54 PM Kuan-Wei Chiu <visitorckw@gmail.com> wrote:

> Refactor parity calculations to use the standard parity32() helper.
> This change eliminates redundant implementations and improves code
> efficiency.

While the dust settles on the exact implementation, from the driver
perspective this looks fine to me, so...

Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
>
> Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> ---
> .../wireless/broadcom/brcm80211/brcmsmac/dma.c   | 16 +---------------
> 1 file changed, 1 insertion(+), 15 deletions(-)




^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-24 19:27   ` Yury Norov
@ 2025-02-25 13:29     ` Kuan-Wei Chiu
  2025-02-25 14:20       ` Kuan-Wei Chiu
  2025-02-26  7:14       ` Jiri Slaby
  2025-02-26 22:29     ` David Laight
  1 sibling, 2 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-25 13:29 UTC (permalink / raw)
  To: Yury Norov
  Cc: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, akpm, hpa, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

Hi Yury,

On Mon, Feb 24, 2025 at 02:27:03PM -0500, Yury Norov wrote:
> On Mon, Feb 24, 2025 at 12:42:02AM +0800, Kuan-Wei Chiu wrote:
> > Several parts of the kernel open-code parity calculations using
> > different methods. Add a generic parity64() helper implemented with the
> > same efficient approach as parity8().
> 
> No reason to add parity32() and parity64() in separate patches

Ack.

>  
> > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > ---
> >  include/linux/bitops.h | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> > 
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index fb13dedad7aa..67677057f5e2 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
> >  	return (0x6996 >> (val & 0xf)) & 1;
> >  }
> >  
> > +/**
> > + * parity64 - get the parity of an u64 value
> > + * @value: the value to be examined
> > + *
> > + * Determine the parity of the u64 argument.
> > + *
> > + * Returns:
> > + * 0 for even parity, 1 for odd parity
> > + */
> > +static inline int parity64(u64 val)
> > +{
> > +	/*
> > +	 * One explanation of this algorithm:
> > +	 * https://funloop.org/codex/problem/parity/README.html
> 
> This is already referenced in sources. No need to spread it for more.

Ack.

> 
> > +	 */
> > +	val ^= val >> 32;
> > +	val ^= val >> 16;
> > +	val ^= val >> 8;
> > +	val ^= val >> 4;
> > +	return (0x6996 >> (val & 0xf)) & 1;
> 
> It's better to avoid duplicating the same logic again and again.

Ack.

> 
> > +}
> > +
> 
> So maybe make it a macro?
> 
> 
> From f17a28ae3429f49825d65ebc0f7717c6a191a3e2 Mon Sep 17 00:00:00 2001
> From: Yury Norov <yury.norov@gmail.com>
> Date: Mon, 24 Feb 2025 14:14:27 -0500
> Subject: [PATCH] bitops: generalize parity8()
> 
> The generic parity calculation approach may be easily generalized for
> other standard types. Do that and drop sub-optimal implementation of
> parity calculation in x86 code.
> 
> Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
> ---
>  arch/x86/kernel/bootflag.c | 14 +-----------
>  include/linux/bitops.h     | 47 +++++++++++++++++++++++++++-----------
>  2 files changed, 35 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/kernel/bootflag.c b/arch/x86/kernel/bootflag.c
> index 3fed7ae58b60..4a85c69a28f8 100644
> --- a/arch/x86/kernel/bootflag.c
> +++ b/arch/x86/kernel/bootflag.c
> @@ -2,6 +2,7 @@
>  /*
>   *	Implement 'Simple Boot Flag Specification 2.0'
>   */
> +#include <linux/bitops.h>
>  #include <linux/types.h>
>  #include <linux/kernel.h>
>  #include <linux/init.h>
> @@ -20,19 +21,6 @@
>  
>  int sbf_port __initdata = -1;	/* set via acpi_boot_init() */
>  
> -static int __init parity(u8 v)
> -{
> -	int x = 0;
> -	int i;
> -
> -	for (i = 0; i < 8; i++) {
> -		x ^= (v & 1);
> -		v >>= 1;
> -	}
> -
> -	return x;
> -}
> -
>  static void __init sbf_write(u8 v)
>  {
>  	unsigned long flags;
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index c1cb53cf2f0f..29601434f5f4 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -230,10 +230,10 @@ static inline int get_count_order_long(unsigned long l)
>  }
>  
>  /**
> - * parity8 - get the parity of an u8 value
> + * parity - get the parity of a value
>   * @value: the value to be examined
>   *
> - * Determine the parity of the u8 argument.
> + * Determine parity of the argument.
>   *
>   * Returns:
>   * 0 for even parity, 1 for odd parity
> @@ -241,24 +241,45 @@ static inline int get_count_order_long(unsigned long l)
>   * Note: This function informs you about the current parity. Example to bail
>   * out when parity is odd:
>   *
> - *	if (parity8(val) == 1)
> + *	if (parity(val) == 1)
>   *		return -EBADMSG;
>   *
>   * If you need to calculate a parity bit, you need to draw the conclusion from
>   * this result yourself. Example to enforce odd parity, parity bit is bit 7:
>   *
> - *	if (parity8(val) == 0)
> + *	if (parity(val) == 0)
>   *		val ^= BIT(7);
> + *
> + * One explanation of this algorithm:
> + * https://funloop.org/codex/problem/parity/README.html
>   */
> -static inline int parity8(u8 val)
> -{
> -	/*
> -	 * One explanation of this algorithm:
> -	 * https://funloop.org/codex/problem/parity/README.html
> -	 */
> -	val ^= val >> 4;
> -	return (0x6996 >> (val & 0xf)) & 1;
> -}
> +#define parity(val)					\
> +({							\
> +	u64 __v = (val);				\
> +	int __ret;					\
> +	switch (BITS_PER_TYPE(val)) {			\
> +	case 64:					\
> +		__v ^= __v >> 32;			\
> +		fallthrough;				\
> +	case 32:					\
> +		__v ^= __v >> 16;			\
> +		fallthrough;				\
> +	case 16:					\
> +		__v ^= __v >> 8;			\
> +		fallthrough;				\
> +	case 8:						\
> +		__v ^= __v >> 4;			\
> +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> +		break;					\
> +	default:					\
> +		BUILD_BUG();				\
> +	}						\
> +	__ret;						\
> +})
> +
> +#define parity8(val)	parity((u8)(val))
> +#define parity32(val)	parity((u32)(val))
> +#define parity64(val)	parity((u64)(val))
>  
What do you think about using these inline functions instead of macros?
Except for parity8(), each function is a single line and follows the
same logic. I find inline functions more readable, and coding-style.rst
also recommends them over macros.

Regards,
Kuan-Wei

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index c1cb53cf2f0f..d518a382f1fe 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -260,6 +260,26 @@ static inline int parity8(u8 val)
 	return (0x6996 >> (val & 0xf)) & 1;
 }
 
+static inline parity16(u16 val)
+{
+	return parity8(val ^ (val >> 8));
+}
+
+static inline parity16(u16 val)
+{
+	return parity8(val ^ (val >> 8));
+}
+
+static inline parity32(u32)
+{
+	return parity16(val ^ (val >> 16));
+}
+
+static inline parity64(u64)
+{
+	return parity32(val ^ (val >> 32));
+}
+
 /**
  * __ffs64 - find first set bit in a 64 bit word
  * @word: The 64 bit word


>  /**
>   * __ffs64 - find first set bit in a 64 bit word
> -- 
> 2.43.0
> 

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-25 13:29     ` Kuan-Wei Chiu
@ 2025-02-25 14:20       ` Kuan-Wei Chiu
  2025-02-26  7:14       ` Jiri Slaby
  1 sibling, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-25 14:20 UTC (permalink / raw)
  To: Yury Norov
  Cc: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, akpm, hpa, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Tue, Feb 25, 2025 at 09:29:51PM +0800, Kuan-Wei Chiu wrote:
> Hi Yury,
> 
> On Mon, Feb 24, 2025 at 02:27:03PM -0500, Yury Norov wrote:
> > On Mon, Feb 24, 2025 at 12:42:02AM +0800, Kuan-Wei Chiu wrote:
> > > Several parts of the kernel open-code parity calculations using
> > > different methods. Add a generic parity64() helper implemented with the
> > > same efficient approach as parity8().
> > 
> > No reason to add parity32() and parity64() in separate patches
> 
> Ack.
> 
> >  
> > > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
> > > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > > ---
> > >  include/linux/bitops.h | 22 ++++++++++++++++++++++
> > >  1 file changed, 22 insertions(+)
> > > 
> > > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > > index fb13dedad7aa..67677057f5e2 100644
> > > --- a/include/linux/bitops.h
> > > +++ b/include/linux/bitops.h
> > > @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
> > >  	return (0x6996 >> (val & 0xf)) & 1;
> > >  }
> > >  
> > > +/**
> > > + * parity64 - get the parity of an u64 value
> > > + * @value: the value to be examined
> > > + *
> > > + * Determine the parity of the u64 argument.
> > > + *
> > > + * Returns:
> > > + * 0 for even parity, 1 for odd parity
> > > + */
> > > +static inline int parity64(u64 val)
> > > +{
> > > +	/*
> > > +	 * One explanation of this algorithm:
> > > +	 * https://funloop.org/codex/problem/parity/README.html
> > 
> > This is already referenced in sources. No need to spread it for more.
> 
> Ack.
> 
> > 
> > > +	 */
> > > +	val ^= val >> 32;
> > > +	val ^= val >> 16;
> > > +	val ^= val >> 8;
> > > +	val ^= val >> 4;
> > > +	return (0x6996 >> (val & 0xf)) & 1;
> > 
> > It's better to avoid duplicating the same logic again and again.
> 
> Ack.
> 
> > 
> > > +}
> > > +
> > 
> > So maybe make it a macro?
> > 
> > 
> > From f17a28ae3429f49825d65ebc0f7717c6a191a3e2 Mon Sep 17 00:00:00 2001
> > From: Yury Norov <yury.norov@gmail.com>
> > Date: Mon, 24 Feb 2025 14:14:27 -0500
> > Subject: [PATCH] bitops: generalize parity8()
> > 
> > The generic parity calculation approach may be easily generalized for
> > other standard types. Do that and drop sub-optimal implementation of
> > parity calculation in x86 code.
> > 
> > Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
> > ---
> >  arch/x86/kernel/bootflag.c | 14 +-----------
> >  include/linux/bitops.h     | 47 +++++++++++++++++++++++++++-----------
> >  2 files changed, 35 insertions(+), 26 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/bootflag.c b/arch/x86/kernel/bootflag.c
> > index 3fed7ae58b60..4a85c69a28f8 100644
> > --- a/arch/x86/kernel/bootflag.c
> > +++ b/arch/x86/kernel/bootflag.c
> > @@ -2,6 +2,7 @@
> >  /*
> >   *	Implement 'Simple Boot Flag Specification 2.0'
> >   */
> > +#include <linux/bitops.h>
> >  #include <linux/types.h>
> >  #include <linux/kernel.h>
> >  #include <linux/init.h>
> > @@ -20,19 +21,6 @@
> >  
> >  int sbf_port __initdata = -1;	/* set via acpi_boot_init() */
> >  
> > -static int __init parity(u8 v)
> > -{
> > -	int x = 0;
> > -	int i;
> > -
> > -	for (i = 0; i < 8; i++) {
> > -		x ^= (v & 1);
> > -		v >>= 1;
> > -	}
> > -
> > -	return x;
> > -}
> > -
> >  static void __init sbf_write(u8 v)
> >  {
> >  	unsigned long flags;
> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> > index c1cb53cf2f0f..29601434f5f4 100644
> > --- a/include/linux/bitops.h
> > +++ b/include/linux/bitops.h
> > @@ -230,10 +230,10 @@ static inline int get_count_order_long(unsigned long l)
> >  }
> >  
> >  /**
> > - * parity8 - get the parity of an u8 value
> > + * parity - get the parity of a value
> >   * @value: the value to be examined
> >   *
> > - * Determine the parity of the u8 argument.
> > + * Determine parity of the argument.
> >   *
> >   * Returns:
> >   * 0 for even parity, 1 for odd parity
> > @@ -241,24 +241,45 @@ static inline int get_count_order_long(unsigned long l)
> >   * Note: This function informs you about the current parity. Example to bail
> >   * out when parity is odd:
> >   *
> > - *	if (parity8(val) == 1)
> > + *	if (parity(val) == 1)
> >   *		return -EBADMSG;
> >   *
> >   * If you need to calculate a parity bit, you need to draw the conclusion from
> >   * this result yourself. Example to enforce odd parity, parity bit is bit 7:
> >   *
> > - *	if (parity8(val) == 0)
> > + *	if (parity(val) == 0)
> >   *		val ^= BIT(7);
> > + *
> > + * One explanation of this algorithm:
> > + * https://funloop.org/codex/problem/parity/README.html
> >   */
> > -static inline int parity8(u8 val)
> > -{
> > -	/*
> > -	 * One explanation of this algorithm:
> > -	 * https://funloop.org/codex/problem/parity/README.html
> > -	 */
> > -	val ^= val >> 4;
> > -	return (0x6996 >> (val & 0xf)) & 1;
> > -}
> > +#define parity(val)					\
> > +({							\
> > +	u64 __v = (val);				\
> > +	int __ret;					\
> > +	switch (BITS_PER_TYPE(val)) {			\
> > +	case 64:					\
> > +		__v ^= __v >> 32;			\
> > +		fallthrough;				\
> > +	case 32:					\
> > +		__v ^= __v >> 16;			\
> > +		fallthrough;				\
> > +	case 16:					\
> > +		__v ^= __v >> 8;			\
> > +		fallthrough;				\
> > +	case 8:						\
> > +		__v ^= __v >> 4;			\
> > +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> > +		break;					\
> > +	default:					\
> > +		BUILD_BUG();				\
> > +	}						\
> > +	__ret;						\
> > +})
> > +
> > +#define parity8(val)	parity((u8)(val))
> > +#define parity32(val)	parity((u32)(val))
> > +#define parity64(val)	parity((u64)(val))
> >  
> What do you think about using these inline functions instead of macros?
> Except for parity8(), each function is a single line and follows the
> same logic. I find inline functions more readable, and coding-style.rst
> also recommends them over macros.
> 
> Regards,
> Kuan-Wei
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index c1cb53cf2f0f..d518a382f1fe 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -260,6 +260,26 @@ static inline int parity8(u8 val)
>  	return (0x6996 >> (val & 0xf)) & 1;
>  }
>  
> +static inline parity16(u16 val)
> +{
> +	return parity8(val ^ (val >> 8));
> +}
> +
> +static inline parity16(u16 val)
> +{
> +	return parity8(val ^ (val >> 8));
> +}
> +
> +static inline parity32(u32)
> +{
> +	return parity16(val ^ (val >> 16));
> +}
> +
> +static inline parity64(u64)
> +{
> +	return parity32(val ^ (val >> 32));
> +}
> +
>  /**
>   * __ffs64 - find first set bit in a 64 bit word
>   * @word: The 64 bit word
> 
>
Oops... I made a lot of fat-finger mistakes. Here's the correct one.

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index c1cb53cf2f0f..427e4c06055e 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -260,6 +260,21 @@ static inline int parity8(u8 val)
 	return (0x6996 >> (val & 0xf)) & 1;
 }
 
+static inline int parity16(u16 val)
+{
+	return parity8(val ^ (val >> 8));
+}
+
+static inline int parity32(u32 val)
+{
+	return parity16(val ^ (val >> 16));
+}
+
+static inline int parity64(u64 val)
+{
+	return parity32(val ^ (val >> 32));
+}
+
 /**
  * __ffs64 - find first set bit in a 64 bit word
  * @word: The 64 bit word


> >  /**
> >   * __ffs64 - find first set bit in a 64 bit word
> > -- 
> > 2.43.0
> > 

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-24 13:34     ` David Laight
  2025-02-24 16:56       ` Yu-Chun Lin
@ 2025-02-25 15:21       ` H. Peter Anvin
  2025-02-25 15:24       ` H. Peter Anvin
  2 siblings, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-25 15:21 UTC (permalink / raw)
  To: David Laight, Jiri Slaby
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, yury.norov, akpm, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On February 24, 2025 5:34:31 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
>On Mon, 24 Feb 2025 08:09:43 +0100
>Jiri Slaby <jirislaby@kernel.org> wrote:
>
>> On 23. 02. 25, 17:42, Kuan-Wei Chiu wrote:
>> > Several parts of the kernel open-code parity calculations using
>> > different methods. Add a generic parity64() helper implemented with the
>> > same efficient approach as parity8().
>> > 
>> > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
>> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
>> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>> > ---
>> >   include/linux/bitops.h | 22 ++++++++++++++++++++++
>> >   1 file changed, 22 insertions(+)
>> > 
>> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> > index fb13dedad7aa..67677057f5e2 100644
>> > --- a/include/linux/bitops.h
>> > +++ b/include/linux/bitops.h
>> > @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
>> >   	return (0x6996 >> (val & 0xf)) & 1;
>> >   }
>> >   
>> > +/**
>> > + * parity64 - get the parity of an u64 value
>> > + * @value: the value to be examined
>> > + *
>> > + * Determine the parity of the u64 argument.
>> > + *
>> > + * Returns:
>> > + * 0 for even parity, 1 for odd parity
>> > + */
>> > +static inline int parity64(u64 val)
>> > +{
>> > +	/*
>> > +	 * One explanation of this algorithm:
>> > +	 * https://funloop.org/codex/problem/parity/README.html
>> > +	 */
>> > +	val ^= val >> 32;  
>> 
>> Do we need all these implementations? Can't we simply use parity64() for 
>> any 8, 16 and 32-bit values too? I.e. have one parity().
>
>I'm not sure you can guarantee that the compiler will optimise away
>the unnecessary operations.
>
>But:
>static inline int parity64(u64 val)
>{
>	return parity32(val ^ (val >> 32))
>}
>
>should be ok.
>It will also work on x86-32 where parity32() can just check the parity flag.
>Although you are unlikely to manage to use the PF the xor sets.
>
>	David
>
>> 
>> > +	val ^= val >> 16;
>> > +	val ^= val >> 8;
>> > +	val ^= val >> 4;
>> > +	return (0x6996 >> (val & 0xf)) & 1;
>> > +}
>> > +
>> >   /**
>> >    * __ffs64 - find first set bit in a 64 bit word
>> >    * @word: The 64 bit word  
>> 
>> 
>

Sure you can; you do need an 8-bit and a 16-bit arch implementation though (the 16-bit one being xor %rh,%rl).

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-24 13:34     ` David Laight
  2025-02-24 16:56       ` Yu-Chun Lin
  2025-02-25 15:21       ` H. Peter Anvin
@ 2025-02-25 15:24       ` H. Peter Anvin
  2025-02-25 21:43         ` Andrew Cooper
  2 siblings, 1 reply; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-25 15:24 UTC (permalink / raw)
  To: David Laight, Jiri Slaby
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, yury.norov, akpm, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On February 24, 2025 5:34:31 AM PST, David Laight <david.laight.linux@gmail.com> wrote:
>On Mon, 24 Feb 2025 08:09:43 +0100
>Jiri Slaby <jirislaby@kernel.org> wrote:
>
>> On 23. 02. 25, 17:42, Kuan-Wei Chiu wrote:
>> > Several parts of the kernel open-code parity calculations using
>> > different methods. Add a generic parity64() helper implemented with the
>> > same efficient approach as parity8().
>> > 
>> > Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
>> > Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
>> > Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
>> > ---
>> >   include/linux/bitops.h | 22 ++++++++++++++++++++++
>> >   1 file changed, 22 insertions(+)
>> > 
>> > diff --git a/include/linux/bitops.h b/include/linux/bitops.h
>> > index fb13dedad7aa..67677057f5e2 100644
>> > --- a/include/linux/bitops.h
>> > +++ b/include/linux/bitops.h
>> > @@ -281,6 +281,28 @@ static inline int parity32(u32 val)
>> >   	return (0x6996 >> (val & 0xf)) & 1;
>> >   }
>> >   
>> > +/**
>> > + * parity64 - get the parity of an u64 value
>> > + * @value: the value to be examined
>> > + *
>> > + * Determine the parity of the u64 argument.
>> > + *
>> > + * Returns:
>> > + * 0 for even parity, 1 for odd parity
>> > + */
>> > +static inline int parity64(u64 val)
>> > +{
>> > +	/*
>> > +	 * One explanation of this algorithm:
>> > +	 * https://funloop.org/codex/problem/parity/README.html
>> > +	 */
>> > +	val ^= val >> 32;  
>> 
>> Do we need all these implementations? Can't we simply use parity64() for 
>> any 8, 16 and 32-bit values too? I.e. have one parity().
>
>I'm not sure you can guarantee that the compiler will optimise away
>the unnecessary operations.
>
>But:
>static inline int parity64(u64 val)
>{
>	return parity32(val ^ (val >> 32))
>}
>
>should be ok.
>It will also work on x86-32 where parity32() can just check the parity flag.
>Although you are unlikely to manage to use the PF the xor sets.
>
>	David
>
>> 
>> > +	val ^= val >> 16;
>> > +	val ^= val >> 8;
>> > +	val ^= val >> 4;
>> > +	return (0x6996 >> (val & 0xf)) & 1;
>> > +}
>> > +
>> >   /**
>> >    * __ffs64 - find first set bit in a 64 bit word
>> >    * @word: The 64 bit word  
>> 
>> 
>

Incidentally, in all of this, didn't anyone notice __builtin_parity()?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-25 15:24       ` H. Peter Anvin
@ 2025-02-25 21:43         ` Andrew Cooper
  2025-02-26  1:35           ` H. Peter Anvin
  0 siblings, 1 reply; 54+ messages in thread
From: Andrew Cooper @ 2025-02-25 21:43 UTC (permalink / raw)
  To: hpa
  Cc: Laurent.pinchart, airlied, akpm, alistair, andrew+netdev,
	andrzej.hajda, arend.vanspriel, awalls, bp, bpf,
	brcm80211-dev-list.pdl, brcm80211, dave.hansen, davem,
	david.laight.linux, dmitry.torokhov, dri-devel, eajames, edumazet,
	eleanor15x, gregkh, hverkuil, jernej.skrabec, jirislaby, jk, joel,
	johannes, jonas, jserv, kuba, linux-fsi, linux-input,
	linux-kernel, linux-media, linux-mtd, linux-serial,
	linux-wireless, linux, louis.peens, maarten.lankhorst, mchehab,
	mingo, miquel.raynal, mripard, neil.armstrong, netdev,
	oss-drivers, pabeni, parthiban.veerasooran, rfoss, richard,
	simona, tglx, tzimmermann, vigneshr, visitorckw, x86, yury.norov

> Incidentally, in all of this, didn't anyone notice __builtin_parity()?

Yes.  It has been sane for a decade on x86, yet does things such as
emitting a library call on other architectures.

https://godbolt.org/z/6qG3noebq

~Andrew
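
As a rough illustration of the builtin being discussed, the sketch below
compares __builtin_parity() and __builtin_parityll() (GCC and Clang
builtins) against the generic 0x6996 approach. It says nothing about the
code the compiler emits on a given architecture, which is the concern
raised above; it only checks that the results agree. The *_generic and
*_builtin wrappers and the mismatch counter are invented names.

```c
#include <stdint.h>

/* Generic fold-and-lookup parity, as in the proposed bitops.h helpers. */
static int parity32_generic(uint32_t val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

static int parity64_generic(uint64_t val)
{
	return parity32_generic((uint32_t)(val ^ (val >> 32)));
}

/* Thin wrappers around the compiler builtins (1 for odd, 0 for even). */
static int parity32_builtin(uint32_t val)
{
	return __builtin_parity(val);
}

static int parity64_builtin(uint64_t val)
{
	return __builtin_parityll(val);
}

/* Count disagreements on a deterministic xorshift stream. */
static int builtin_parity_mismatches(void)
{
	uint64_t v = 0x2545f4914f6cdd1dULL;
	int bad = 0;

	for (int i = 0; i < 100000; i++) {
		v ^= v << 13;
		v ^= v >> 7;
		v ^= v << 17;
		if (parity64_builtin(v) != parity64_generic(v))
			bad++;
		if (parity32_builtin((uint32_t)v) != parity32_generic((uint32_t)v))
			bad++;
	}
	return bad;
}
```

Whether a builtin call becomes a short inline sequence or a library call
depends on the target and compiler flags, so the generated assembly would
need to be inspected per architecture.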

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-24 21:55     ` H. Peter Anvin
  2025-02-24 22:08       ` Uros Bizjak
  2025-02-24 22:17       ` Yury Norov
@ 2025-02-25 22:46       ` David Laight
  2025-02-26  0:26         ` H. Peter Anvin
  2 siblings, 1 reply; 54+ messages in thread
From: David Laight @ 2025-02-25 22:46 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Uros Bizjak, Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen,
	x86, jk, joel, eajames, andrzej.hajda, neil.armstrong, rfoss,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	dmitry.torokhov, mchehab, awalls, hverkuil, miquel.raynal,
	richard, vigneshr, louis.peens, andrew+netdev, davem, edumazet,
	pabeni, parthiban.veerasooran, arend.vanspriel, johannes, gregkh,
	jirislaby, yury.norov, akpm, mingo, alistair, linux,
	Laurent.pinchart, jonas, jernej.skrabec, kuba, linux-kernel,
	linux-fsi, dri-devel, linux-input, linux-media, linux-mtd,
	oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Mon, 24 Feb 2025 13:55:28 -0800
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 2/24/25 07:24, Uros Bizjak wrote:
> > 
> > 
> > On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:  
> >> Refactor parity calculations to use the standard parity8() helper. This
> >> change eliminates redundant implementations and improves code
> >> efficiency.  
...
> Of course, on x86, parity8() and parity16() can be implemented very simply:
> 
> (Also, the parity functions really ought to return bool, and be flagged 
> __attribute_const__.)
> 
> static inline __attribute_const__ bool _arch_parity8(u8 val)
> {
> 	bool parity;
> 	asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
> 	return parity;
> }
> 
> static inline __attribute_const__ bool _arch_parity16(u16 val)
> {
> 	bool parity;
> 	asm("xor %h0,%b0" : "=@ccnp" (parity), "+Q" (val));
> 	return parity;
> }

The same (with fixes) can be done for parity64() on 32bit.

> 
> In the generic algorithm, you probably should implement parity16() in 
> terms of parity8(), parity32() in terms of parity16() and so on:
> 
> static inline __attribute_const__ bool parity16(u16 val)
> {
> #ifdef ARCH_HAS_PARITY16
> 	if (!__builtin_constant_p(val))
> 		return _arch_parity16(val);
> #endif
> 	return parity8(val ^ (val >> 8));
> }
> 
> This picks up the architectural versions when available.

Not the best way to do that.
Make the name in the #ifdef the same as the function and define
a default one if the architecture doesn't define one.
So:

static inline bool parity16(u16 val)
{
	return __builtin_constant_p(val) ? _parity_const(val) : _parity16(val);
}

#ifndef _parity16
static inline bool _parity16(u16 val)
{
	return _parity8(val ^ (val >> 8));
}
#endif

You only need one _parity_const().
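
A self-contained userspace sketch of the pattern described above,
assuming the architecture opts out of a default by defining a macro with
the same name as its function (for example "#define _parity16 _parity16").
No architecture override is present here, so the generic defaults are
compiled; the fixed-width stdint types stand in for the kernel's u8/u16.

```c
#include <stdbool.h>
#include <stdint.h>

/* Single constant-foldable implementation, shared by every width. */
static inline bool _parity_const(uint64_t val)
{
	val ^= val >> 32;
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/*
 * An architecture header would provide its own _parity8()/_parity16()
 * plus a same-named #define, which makes the defaults below vanish.
 */
#ifndef _parity8
static inline bool _parity8(uint8_t val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}
#define _parity8 _parity8
#endif

#ifndef _parity16
static inline bool _parity16(uint16_t val)
{
	/* Fold the high byte into the low byte, then reuse the 8-bit helper. */
	return _parity8(val ^ (val >> 8));
}
#define _parity16 _parity16
#endif

static inline bool parity16(uint16_t val)
{
	return __builtin_constant_p(val) ? _parity_const(val) : _parity16(val);
}
```

The default is defined before the public wrapper so that the wrapper
sees whichever version, architecture or generic, ended up being chosen.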

> 
> Furthermore, if a popcnt instruction is known to exist, then the parity 
> is simply popcnt(x) & 1.

Beware that some popcnt instructions are slow.
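
For reference, a small sketch of the popcount relation in userspace C.
__builtin_popcountll() is the GCC/Clang builtin; whether it becomes a
single instruction or a library call depends on the target and compiler
flags, which is the caveat raised above. The function names are made up
for this example.

```c
#include <stdint.h>

/* Parity is the low bit of the population count. */
static inline int parity64_popcnt(uint64_t v)
{
	return __builtin_popcountll(v) & 1;
}

/* Reference XOR fold, for comparison. */
static inline int parity64_fold(uint64_t v)
{
	v ^= v >> 32;
	v ^= v >> 16;
	v ^= v >> 8;
	v ^= v >> 4;
	return (0x6996 >> (v & 0xf)) & 1;
}
```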

	David

> 
> 	-hpa
> 
> 


* Re: [PATCH 03/17] x86: Replace open-coded parity calculation with parity8()
  2025-02-25 22:46       ` David Laight
@ 2025-02-26  0:26         ` H. Peter Anvin
  0 siblings, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-26  0:26 UTC (permalink / raw)
  To: David Laight
  Cc: Uros Bizjak, Kuan-Wei Chiu, tglx, Ingo Molnar, bp, dave.hansen,
	x86, jk, joel, eajames, andrzej.hajda, neil.armstrong, rfoss,
	maarten.lankhorst, mripard, tzimmermann, airlied, simona,
	dmitry.torokhov, mchehab, awalls, hverkuil, miquel.raynal,
	richard, vigneshr, louis.peens, andrew+netdev, davem, edumazet,
	pabeni, parthiban.veerasooran, arend.vanspriel, johannes, gregkh,
	jirislaby, yury.norov, akpm, mingo, alistair, linux,
	Laurent.pinchart, jonas, jernej.skrabec, kuba, linux-kernel,
	linux-fsi, dri-devel, linux-input, linux-media, linux-mtd,
	oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On February 25, 2025 2:46:23 PM PST, David Laight <david.laight.linux@gmail.com> wrote:
>On Mon, 24 Feb 2025 13:55:28 -0800
>"H. Peter Anvin" <hpa@zytor.com> wrote:
>
>> On 2/24/25 07:24, Uros Bizjak wrote:
>> > 
>> > 
>> > On 23. 02. 25 17:42, Kuan-Wei Chiu wrote:  
>> >> Refactor parity calculations to use the standard parity8() helper. This
>> >> change eliminates redundant implementations and improves code
>> >> efficiency.  
>...
>> Of course, on x86, parity8() and parity16() can be implemented very simply:
>> 
>> (Also, the parity functions really ought to return bool, and be flagged 
>> __attribute_const__.)
>> 
>> static inline __attribute_const__ bool _arch_parity8(u8 val)
>> {
>> 	bool parity;
>> 	asm("and %0,%0" : "=@ccnp" (parity) : "q" (val));
>> 	return parity;
>> }
>> 
>> static inline __attribute_const__ bool _arch_parity16(u16 val)
>> {
>> 	bool parity;
>> 	asm("xor %h0,%b0" : "=@ccnp" (parity), "+Q" (val));
>> 	return parity;
>> }
>
>The same (with fixes) can be done for parity64() on 32bit.
>
>> 
>> In the generic algorithm, you probably should implement parity16() in 
>> terms of parity8(), parity32() in terms of parity16() and so on:
>> 
>> static inline __attribute_const__ bool parity16(u16 val)
>> {
>> #ifdef ARCH_HAS_PARITY16
>> 	if (!__builtin_constant_p(val))
>> 		return _arch_parity16(val);
>> #endif
>> 	return parity8(val ^ (val >> 8));
>> }
>> 
>> This picks up the architectural versions when available.
>
>Not the best way to do that.
>Make the name in the #ifdef the same as the function and define
>a default one if the architecture doesn't define one.
>So:
>
>static inline bool parity16(u16 val)
>{
>	return __builtin_constant_p(val) ? _parity_const(val) : _parity16(val);
>}
>
>#ifndef _parity16
>static inline bool _parity16(u16 val)
>{
>	return _parity8(val ^ (val >> 8));
>}
>#endif
>
>You only need one _parity_const().
>
>> 
>> Furthermore, if a popcnt instruction is known to exist, then the parity 
>> is simply popcnt(x) & 1.
>
>Beware that some popcnt instructions are slow.
>
>	David
>
>> 
>> 	-hpa
>> 
>> 
>

Seems more verbose than just #ifdef _arch_parity8 et al since the const and generic code cases are the same (which they aren't always.)

But that part is a good idea, especially since on at least *some* architectures like x86 doing: 

#define _arch_parity8(x) __builtin_parity(x)

... etc is entirely reasonable and lets gcc use an already available parity flag should one be available.

The inline wrapper, of course, takes care of the type mangling.
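
A rough userspace sketch of what "the inline wrapper takes care of the
type mangling" could look like. The _arch_parity8() mapping is the one
suggested above; the wrapper body and the generic fallback are
assumptions for illustration, not the eventual kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed arch hook: map straight to the compiler builtin. */
#define _arch_parity8(x) __builtin_parity(x)

static inline bool parity8(uint8_t val)
{
#ifdef _arch_parity8
	/* The parameter type has already narrowed val to 8 bits. */
	return _arch_parity8(val);
#else
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
#endif
}
```

Because the narrowing happens at the function boundary, a wider
argument such as 0x101 is reduced to 0x01 before the builtin sees it,
whereas calling __builtin_parity(0x101) directly would count both bits.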

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-25 21:43         ` Andrew Cooper
@ 2025-02-26  1:35           ` H. Peter Anvin
  0 siblings, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-26  1:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Laurent.pinchart, airlied, akpm, alistair, andrew+netdev,
	andrzej.hajda, arend.vanspriel, awalls, bp, bpf,
	brcm80211-dev-list.pdl, brcm80211, dave.hansen, davem,
	david.laight.linux, dmitry.torokhov, dri-devel, eajames, edumazet,
	eleanor15x, gregkh, hverkuil, jernej.skrabec, jirislaby, jk, joel,
	johannes, jonas, jserv, kuba, linux-fsi, linux-input,
	linux-kernel, linux-media, linux-mtd, linux-serial,
	linux-wireless, linux, louis.peens, maarten.lankhorst, mchehab,
	mingo, miquel.raynal, mripard, neil.armstrong, netdev,
	oss-drivers, pabeni, parthiban.veerasooran, rfoss, richard,
	simona, tglx, tzimmermann, vigneshr, visitorckw, x86, yury.norov

On February 25, 2025 1:43:27 PM PST, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> Incidentally, in all of this, didn't anyone notice __builtin_parity()?
>
>Yes.  It has been sane for a decade on x86, yet does things such as
>emitting a library call on other architectures.
>
>https://godbolt.org/z/6qG3noebq
>
>~Andrew

And not even a smart one at that.

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-25 13:29     ` Kuan-Wei Chiu
  2025-02-25 14:20       ` Kuan-Wei Chiu
@ 2025-02-26  7:14       ` Jiri Slaby
  2025-02-26 17:59         ` Kuan-Wei Chiu
  2025-02-26 18:33         ` Yury Norov
  1 sibling, 2 replies; 54+ messages in thread
From: Jiri Slaby @ 2025-02-26  7:14 UTC (permalink / raw)
  To: Kuan-Wei Chiu, Yury Norov
  Cc: tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, akpm, hpa, alistair, linux,
	Laurent.pinchart, jonas, jernej.skrabec, kuba, linux-kernel,
	linux-fsi, dri-devel, linux-input, linux-media, linux-mtd,
	oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On 25. 02. 25, 14:29, Kuan-Wei Chiu wrote:
>> +#define parity(val)					\
>> +({							\
>> +	u64 __v = (val);				\
>> +	int __ret;					\
>> +	switch (BITS_PER_TYPE(val)) {			\
>> +	case 64:					\
>> +		__v ^= __v >> 32;			\
>> +		fallthrough;				\
>> +	case 32:					\
>> +		__v ^= __v >> 16;			\
>> +		fallthrough;				\
>> +	case 16:					\
>> +		__v ^= __v >> 8;			\
>> +		fallthrough;				\
>> +	case 8:						\
>> +		__v ^= __v >> 4;			\
>> +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
>> +		break;					\
>> +	default:					\
>> +		BUILD_BUG();				\
>> +	}						\
>> +	__ret;						\
>> +})
>> +
>> +#define parity8(val)	parity((u8)(val))
>> +#define parity32(val)	parity((u32)(val))
>> +#define parity64(val)	parity((u64)(val))
>>   
> What do you think about using these inline functions instead of macros?
> Except for parity8(), each function is a single line and follows the
> same logic. I find inline functions more readable, and coding-style.rst
> also recommends them over macros.

Not in cases where macros are inevitable. I mean, do we need parityXX() 
for XX in (8, 16, 32, 64) at all? Isn't the parity() above enough for 
everybody? And if not, you can have all those parityXX() as inlines as 
you suggest, but also provide a macro such as the above to call 
(optimized) parityXX() as per datatype len.
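
One possible shape for that combination, sketched in userspace GNU C:
fixed-width inline helpers plus a parity() macro that dispatches on
sizeof(). The helper bodies are the generic fold; an optimized per-arch
parityXX() could be substituted without touching the macro. This is an
illustration under those assumptions, not a proposed patch.

```c
#include <stdint.h>

static inline int parity8(uint8_t v)
{
	v ^= v >> 4;
	return (0x6996 >> (v & 0xf)) & 1;
}

/* Each wider helper folds its upper half into the lower half. */
static inline int parity16(uint16_t v) { return parity8(v ^ (v >> 8)); }
static inline int parity32(uint32_t v) { return parity16(v ^ (v >> 16)); }
static inline int parity64(uint64_t v) { return parity32((uint32_t)(v ^ (v >> 32))); }

/* Dispatch on the width of the argument's type; val is evaluated once. */
#define parity(val)							\
({									\
	_Static_assert(sizeof(val) <= 8, "parity(): unsupported width");	\
	sizeof(val) == 1 ? parity8(val) :				\
	sizeof(val) == 2 ? parity16(val) :				\
	sizeof(val) == 4 ? parity32(val) : parity64(val);		\
})
```

Note that the macro follows the static type of its argument, so an int
holding 0x101 gets a 32-bit parity unless the caller casts it to u8.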

thanks,
-- 
js
suse labs

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-26  7:14       ` Jiri Slaby
@ 2025-02-26 17:59         ` Kuan-Wei Chiu
  2025-02-26 18:33         ` Yury Norov
  1 sibling, 0 replies; 54+ messages in thread
From: Kuan-Wei Chiu @ 2025-02-26 17:59 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Yury Norov, tglx, mingo, bp, dave.hansen, x86, jk, joel, eajames,
	andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst, mripard,
	tzimmermann, airlied, simona, dmitry.torokhov, mchehab, awalls,
	hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, akpm, hpa, alistair, linux,
	Laurent.pinchart, jonas, jernej.skrabec, kuba, linux-kernel,
	linux-fsi, dri-devel, linux-input, linux-media, linux-mtd,
	oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

Hi Jiri,

On Wed, Feb 26, 2025 at 08:14:14AM +0100, Jiri Slaby wrote:
> On 25. 02. 25, 14:29, Kuan-Wei Chiu wrote:
> > > +#define parity(val)					\
> > > +({							\
> > > +	u64 __v = (val);				\
> > > +	int __ret;					\
> > > +	switch (BITS_PER_TYPE(val)) {			\
> > > +	case 64:					\
> > > +		__v ^= __v >> 32;			\
> > > +		fallthrough;				\
> > > +	case 32:					\
> > > +		__v ^= __v >> 16;			\
> > > +		fallthrough;				\
> > > +	case 16:					\
> > > +		__v ^= __v >> 8;			\
> > > +		fallthrough;				\
> > > +	case 8:						\
> > > +		__v ^= __v >> 4;			\
> > > +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> > > +		break;					\
> > > +	default:					\
> > > +		BUILD_BUG();				\
> > > +	}						\
> > > +	__ret;						\
> > > +})
> > > +
> > > +#define parity8(val)	parity((u8)(val))
> > > +#define parity32(val)	parity((u32)(val))
> > > +#define parity64(val)	parity((u64)(val))
> > What do you think about using these inline functions instead of macros?
> > Except for parity8(), each function is a single line and follows the
> > same logic. I find inline functions more readable, and coding-style.rst
> > also recommends them over macros.
> 
> Not in cases where macros are inevitable. I mean, do we need parityXX() for
> XX in (8, 16, 32, 64) at all? Isn't the parity() above enough for everybody?
> And if not, you can have all those parityXX() as inlines as you suggest, but
> also provide a macro such as the above to call (optimized) parityXX() as per
> datatype len.
> 
I agree that we can add a macro to call parity8/16/32/64 based on the
data type size. However, I think we should still keep parity8/16/32/64.
As Peter and David discussed, the x86-specific implementations of
parity8() and parity16() might use different instructions instead of
just XORing and calling another function, as in the generic version.

My current idea is to follow David's suggestion and use
__builtin_parity when there is no architecture-specific implementation.
In lib/, we can provide a generic weak function implementation of
__parity[sdt]i2.

Any comments or suggestions are welcome!

Regards,
Kuan-Wei

static inline bool parity32(u32 val)
{
    return __builtin_constant_p(val) ? _parity_const(val) : _parity32(val);
}

#ifndef _parity32
static inline bool _parity32(u32 val)
{
    return __builtin_parity(val);
}
#endif

int __weak __paritysi2(u32 val);
int __weak __paritysi2(u32 val)
{
    val ^= val >> 16;
    val ^= val >> 8;
    val ^= val >> 4;
    return (0x6996 >> (val & 0xf)) & 1;
}
EXPORT_SYMBOL(__paritysi2);

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-26  7:14       ` Jiri Slaby
  2025-02-26 17:59         ` Kuan-Wei Chiu
@ 2025-02-26 18:33         ` Yury Norov
  2025-02-27  6:38           ` Jiri Slaby
  1 sibling, 1 reply; 54+ messages in thread
From: Yury Norov @ 2025-02-26 18:33 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, akpm, hpa, alistair, linux,
	Laurent.pinchart, jonas, jernej.skrabec, kuba, linux-kernel,
	linux-fsi, dri-devel, linux-input, linux-media, linux-mtd,
	oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Wed, Feb 26, 2025 at 08:14:14AM +0100, Jiri Slaby wrote:
> On 25. 02. 25, 14:29, Kuan-Wei Chiu wrote:
> > > +#define parity(val)					\
> > > +({							\
> > > +	u64 __v = (val);				\
> > > +	int __ret;					\
> > > +	switch (BITS_PER_TYPE(val)) {			\
> > > +	case 64:					\
> > > +		__v ^= __v >> 32;			\
> > > +		fallthrough;				\
> > > +	case 32:					\
> > > +		__v ^= __v >> 16;			\
> > > +		fallthrough;				\
> > > +	case 16:					\
> > > +		__v ^= __v >> 8;			\
> > > +		fallthrough;				\
> > > +	case 8:						\
> > > +		__v ^= __v >> 4;			\
> > > +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> > > +		break;					\
> > > +	default:					\
> > > +		BUILD_BUG();				\
> > > +	}						\
> > > +	__ret;						\
> > > +})
> > > +
> > > +#define parity8(val)	parity((u8)(val))
> > > +#define parity32(val)	parity((u32)(val))
> > > +#define parity64(val)	parity((u64)(val))
> > What do you think about using these inline functions instead of macros?
> > Except for parity8(), each function is a single line and follows the
> > same logic. I find inline functions more readable, and coding-style.rst
> > also recommends them over macros.
>
> Not in cases where macros are inevitable. I mean, do we need parityXX() for
> XX in (8, 16, 32, 64) at all? Isn't the parity() above enough for everybody?

The existing codebase has something like:

        int ret;

        ret = i3c_master_get_free_addr(m, last_addr + 1);
        ret |= parity8(ret) ? 0 : BIT(7)

So if we switch it to a macro like the one above, it will become a
32-bit parity. It wouldn't be an error because i3c_master_get_free_addr()
returns a u8 or -ENOMEM, and the error code is checked explicitly.

But if we decide to go with parity() only, some users will have to
call it like parity((u8)val) explicitly. Which is not bad actually.
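
To make the intent of that snippet concrete, here is a standalone
sketch, assuming parity8() returns 1 for an odd number of set bits.
with_odd_parity() is a hypothetical helper written for this example; it
is not the I3C driver code. It sets bit 7 whenever the low seven bits
have even parity, so the resulting byte always has odd parity.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

static inline int parity8(uint8_t v)
{
	v ^= v >> 4;
	return (0x6996 >> (v & 0xf)) & 1;
}

/* Hypothetical helper: place an odd-parity bit in bit 7 of a 7-bit value. */
static inline uint8_t with_odd_parity(uint8_t addr7)
{
	addr7 &= 0x7f;
	return addr7 | (parity8(addr7) ? 0 : BIT(7));
}
```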

> And if not, you can have all those parityXX() as inlines as you suggest, but
> also provide a macro such as the above to call (optimized) parityXX() as per
> datatype len.

Yes, if we need fixed-type parity's, they should all be one-liners
calling the same macro. Macros or inline functions - no preference for
me.

Thanks,
Yury

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-24 19:27   ` Yury Norov
  2025-02-25 13:29     ` Kuan-Wei Chiu
@ 2025-02-26 22:29     ` David Laight
  2025-02-27 18:05       ` Yury Norov
  1 sibling, 1 reply; 54+ messages in thread
From: David Laight @ 2025-02-26 22:29 UTC (permalink / raw)
  To: Yury Norov
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, akpm, hpa, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Mon, 24 Feb 2025 14:27:03 -0500
Yury Norov <yury.norov@gmail.com> wrote:
....
> +#define parity(val)					\
> +({							\
> +	u64 __v = (val);				\
> +	int __ret;					\
> +	switch (BITS_PER_TYPE(val)) {			\
> +	case 64:					\
> +		__v ^= __v >> 32;			\
> +		fallthrough;				\
> +	case 32:					\
> +		__v ^= __v >> 16;			\
> +		fallthrough;				\
> +	case 16:					\
> +		__v ^= __v >> 8;			\
> +		fallthrough;				\
> +	case 8:						\
> +		__v ^= __v >> 4;			\
> +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> +		break;					\
> +	default:					\
> +		BUILD_BUG();				\
> +	}						\
> +	__ret;						\
> +})
> +

You really don't want to do that!
gcc makes a right hash of it for x86 (32bit).
See https://www.godbolt.org/z/jG8dv3cvs

You do better using a __v32 after the 64bit xor.

Even the 64bit version is probably sub-optimal (both gcc and clang).
The whole lot ends up being a big single-register dependency chain.
You want to do:
	mov %eax, %edx
	shrl $n, %eax
	xor %edx, %eax
so that the 'mov' and 'shrl' can happen in the same clock
(without relying on the register-register move being optimised out).

I dropped in the arm64 for an example of where the magic shift of 6996
just adds an extra instruction.

	David



* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-26 18:33         ` Yury Norov
@ 2025-02-27  6:38           ` Jiri Slaby
  2025-02-27 17:37             ` Yury Norov
  0 siblings, 1 reply; 54+ messages in thread
From: Jiri Slaby @ 2025-02-27  6:38 UTC (permalink / raw)
  To: Yury Norov
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, akpm, hpa, alistair, linux,
	Laurent.pinchart, jonas, jernej.skrabec, kuba, linux-kernel,
	linux-fsi, dri-devel, linux-input, linux-media, linux-mtd,
	oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On 26. 02. 25, 19:33, Yury Norov wrote:
>> Not in cases where macros are inevitable. I mean, do we need parityXX() for
>> XX in (8, 16, 32, 64) at all? Isn't the parity() above enough for everybody?
> 
> The existing codebase has something like:
> 
>          int ret;
> 
>          ret = i3c_master_get_free_addr(m, last_addr + 1);
>          ret |= parity8(ret) ? 0 : BIT(7)
> 
> So if we switch it to a macro like the one above, it will become a
> 32-bit parity. It wouldn't be an error because i3c_master_get_free_addr()
> returns a u8 or -ENOMEM, and the error code is checked explicitly.
> 
> But if we decide to go with parity() only, some users will have to
> call it like parity((u8)val) explicitly. Which is not bad actually.

That cast looks ugly -- we apparently need parityXX(). (In this 
particular case we could do parity8(last_addr), but I assume there are 
more cases like this.) Thanks for looking up the case for this.

>> And if not, you can have all those parityXX() as inlines as you suggest, but
>> also provide a macro such as the above to call (optimized) parityXX() as per
>> datatype len.
> 
> Yes, if we need fixed-type parity's, they should all be one-liners
> calling the same macro. Macros or inline functions - no preference for
> me.

-- 
js
suse labs

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-27  6:38           ` Jiri Slaby
@ 2025-02-27 17:37             ` Yury Norov
  0 siblings, 0 replies; 54+ messages in thread
From: Yury Norov @ 2025-02-27 17:37 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, akpm, hpa, alistair, linux,
	Laurent.pinchart, jonas, jernej.skrabec, kuba, linux-kernel,
	linux-fsi, dri-devel, linux-input, linux-media, linux-mtd,
	oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Thu, Feb 27, 2025 at 07:38:58AM +0100, Jiri Slaby wrote:
> On 26. 02. 25, 19:33, Yury Norov wrote:
> > > Not in cases where macros are inevitable. I mean, do we need parityXX() for
> > > XX in (8, 16, 32, 64) at all? Isn't the parity() above enough for everybody?
> > 
> > The existing codebase has something like:
> > 
> >          int ret;
> > 
> >          ret = i3c_master_get_free_addr(m, last_addr + 1);
> >          ret |= parity8(ret) ? 0 : BIT(7)
> > 
> > So if we switch it to a macro like the one above, it will become a
> > 32-bit parity. It wouldn't be an error because i3c_master_get_free_addr()
> > returns a u8 or -ENOMEM, and the error code is checked explicitly.
> > 
> > But if we decide to go with parity() only, some users will have to
> > call it like parity((u8)val) explicitly. Which is not bad actually.
> 
> That cast looks ugly -- we apparently need parityXX(). (In this particular
> case we could do parity8(last_addr), but I assume there are more cases like
> this.) Thanks for looking up the case for this.

This parity8() is used in just 2 drivers - i3c and hwmon/spd5118. The hwmon
driver looks good. I3C, yeah, makes this implied typecast, which is nasty
regardless.

This is the new code, and I think if we all agree that generic parity()
would be a better API, it's a good time to convert existing users now.

Thanks,
Yury

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-26 22:29     ` David Laight
@ 2025-02-27 18:05       ` Yury Norov
  2025-02-27 21:57         ` David Laight
  0 siblings, 1 reply; 54+ messages in thread
From: Yury Norov @ 2025-02-27 18:05 UTC (permalink / raw)
  To: David Laight
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, akpm, hpa, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Wed, Feb 26, 2025 at 10:29:11PM +0000, David Laight wrote:
> On Mon, 24 Feb 2025 14:27:03 -0500
> Yury Norov <yury.norov@gmail.com> wrote:
> ....
> > +#define parity(val)					\
> > +({							\
> > +	u64 __v = (val);				\
> > +	int __ret;					\
> > +	switch (BITS_PER_TYPE(val)) {			\
> > +	case 64:					\
> > +		__v ^= __v >> 32;			\
> > +		fallthrough;				\
> > +	case 32:					\
> > +		__v ^= __v >> 16;			\
> > +		fallthrough;				\
> > +	case 16:					\
> > +		__v ^= __v >> 8;			\
> > +		fallthrough;				\
> > +	case 8:						\
> > +		__v ^= __v >> 4;			\
> > +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> > +		break;					\
> > +	default:					\
> > +		BUILD_BUG();				\
> > +	}						\
> > +	__ret;						\
> > +})
> > +
> 
> You really don't want to do that!
> gcc makes a right hash of it for x86 (32bit).
> See https://www.godbolt.org/z/jG8dv3cvs

GCC fails to even understand this. Of course, the __v should be an
__auto_type. But that way GCC fails to understand that case 64 is
dead code for all smaller types and throws a false-positive
-Wshift-count-overflow. This is a known issue, unfixed for 25 years!

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210
 
> You do better using a __v32 after the 64bit xor.

It should be an __auto_type, as I already mentioned. Because of that,
we can either do something like this:

  #define parity(val)					\
  ({							\
  #ifdef CLANG                                          \
  	__auto_type __v = (val);			\
  #else /* GCC; because of this and that */             \
  	u64 __v = (val);			        \
  #endif                                                \
  	int __ret;					\

Or simply disable Wshift-count-overflow for GCC.

> Even the 64bit version is probably sub-optimal (both gcc and clang).
> The whole lot ends up being a big single-register dependency chain.
> You want to do:

No, I don't. I want to have a sane compiler that does it for me.

> 	mov %eax, %edx
> 	shrl $n, %eax
> 	xor %edx, %eax
> so that the 'mov' and 'shrl' can happen in the same clock
> (without relying on the register-register move being optimised out).
> 
> I dropped in the arm64 for an example of where the magic shift of 6996
> just adds an extra instruction.

It's still unclear to me that this parity thing is used in hot paths.
If that holds, it's unclear that your hand-made version is better than
what's generated by GCC.

Do you have any perf test?

Thanks,
Yury

* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-27 18:05       ` Yury Norov
@ 2025-02-27 21:57         ` David Laight
  2025-02-28  1:50           ` H. Peter Anvin
  2025-03-02 15:47           ` Yury Norov
  0 siblings, 2 replies; 54+ messages in thread
From: David Laight @ 2025-02-27 21:57 UTC (permalink / raw)
  To: Yury Norov
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, akpm, hpa, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Thu, 27 Feb 2025 13:05:29 -0500
Yury Norov <yury.norov@gmail.com> wrote:

> On Wed, Feb 26, 2025 at 10:29:11PM +0000, David Laight wrote:
> > On Mon, 24 Feb 2025 14:27:03 -0500
> > Yury Norov <yury.norov@gmail.com> wrote:
> > ....  
> > > +#define parity(val)					\
> > > +({							\
> > > +	u64 __v = (val);				\
> > > +	int __ret;					\
> > > +	switch (BITS_PER_TYPE(val)) {			\
> > > +	case 64:					\
> > > +		__v ^= __v >> 32;			\
> > > +		fallthrough;				\
> > > +	case 32:					\
> > > +		__v ^= __v >> 16;			\
> > > +		fallthrough;				\
> > > +	case 16:					\
> > > +		__v ^= __v >> 8;			\
> > > +		fallthrough;				\
> > > +	case 8:						\
> > > +		__v ^= __v >> 4;			\
> > > +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
> > > +		break;					\
> > > +	default:					\
> > > +		BUILD_BUG();				\
> > > +	}						\
> > > +	__ret;						\
> > > +})
> > > +  
> > 
> > You really don't want to do that!
> > gcc makes a right hash of it for x86 (32bit).
> > See https://www.godbolt.org/z/jG8dv3cvs  
> 
> GCC fails to even understand this. Of course, the __v should be an
> __auto_type. But that way GCC fails to understand that case 64 is
> dead code for all smaller types and throws a false-positive
> -Wshift-count-overflow. This is a known issue, unfixed for 25 years!

Just do __v ^= __v >> 16 >> 16
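
A sketch of how that double shift fits into a type-generic fold, written
as userspace GNU C (statement expression plus __auto_type) and intended
for unsigned integer types only. parity_generic() is a name invented for
this example. Since ">> 16 >> 16" never shifts by more than 16 at a
time, it stays in range even when __v is only 32 bits wide, so the dead
64-bit step has no out-of-range shift to warn about.

```c
#include <stdint.h>

#define parity_generic(val)						\
({									\
	__auto_type __v = (val);					\
	if (sizeof(__v) > 4)						\
		__v ^= __v >> 16 >> 16;	/* 64-bit step, split in two */	\
	if (sizeof(__v) > 2)						\
		__v ^= __v >> 16;					\
	if (sizeof(__v) > 1)						\
		__v ^= __v >> 8;					\
	__v ^= __v >> 4;						\
	/* 0x6996 holds the parity of each possible nibble value. */	\
	(int)((0x6996 >> (__v & 0xf)) & 1);				\
})
```

For u8 and u16 arguments the shifts operate on the value promoted to
int, so they are in range as well; the sizeof() tests only decide which
folds are needed for the type at hand.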

> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210
>  
> > You do better using a __v32 after the 64bit xor.  
> 
> It should be an __auto_type, as I already mentioned. Because of that,
> we can either do something like this:
> 
>   #define parity(val)					\
>   ({							\
>   #ifdef CLANG                                          \
>   	__auto_type __v = (val);			\
>   #else /* GCC; because of this and that */             \
>   	u64 __v = (val);			        \
>   #endif                                                \
>   	int __ret;					\
> 
> Or simply disable Wshift-count-overflow for GCC.

For 64bit values on 32bit it is probably better to do:
int p32(unsigned long long x)
{
    unsigned int lo = x;
    lo ^= x >> 32;
    lo ^= lo >> 16;
    lo ^= lo >> 8;
    lo ^= lo >> 4;
    return (0x6996 >> (lo & 0xf)) & 1;
}
That stops the compiler doing 64bit shifts (ok on x86, but probably not elsewhere).
It is likely to be reasonably optimal for most 64bit cpu as well.
(For x86-64 it probably removes a load of REX prefix.)
(It adds an extra instruction to arm because of its barrel shifter.)
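
For anyone who wants to sanity-check the fold above, a small userspace
harness along these lines should do. p32() is copied from the text;
p32_mismatches() and its xorshift pattern generator are additions made
for this example, and the check assumes a 32-bit unsigned int and a
64-bit unsigned long long.

```c
#include <stdint.h>

/* The narrowing fold from above: one 64-bit shift, then 32-bit work only. */
static int p32(unsigned long long x)
{
	unsigned int lo = x;
	lo ^= x >> 32;
	lo ^= lo >> 16;
	lo ^= lo >> 8;
	lo ^= lo >> 4;
	return (0x6996 >> (lo & 0xf)) & 1;
}

/* Hypothetical checker: count disagreements with __builtin_parityll(). */
static unsigned int p32_mismatches(unsigned int rounds)
{
	unsigned long long x = 0x9e3779b97f4a7c15ull;	/* arbitrary non-zero seed */
	unsigned int bad = 0;

	while (rounds--) {
		/* xorshift64 step to produce the next test pattern */
		x ^= x << 13;
		x ^= x >> 7;
		x ^= x << 17;
		bad += p32(x) != __builtin_parityll(x);
	}
	return bad;
}
```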


> 
> > Even the 64bit version is probably sub-optimal (both gcc and clang).
> > The whole lot ends up being a big single-register dependency chain.
> > You want to do:  
> 
> No, I don't. I want to have a sane compiler that does it for me.
> 
> > 	mov %eax, %edx
> > 	shrl $n, %eax
> > 	xor %edx, %eax
> > so that the 'mov' and 'shrl' can happen in the same clock
> > (without relying on the register-register move being optimised out).
> > 
> > I dropped in the arm64 for an example of where the magic shift of 6996
> > just adds an extra instruction.  
> 
> It's still unclear to me that this parity thing is used in hot paths.
> If that holds, it's unclear that your hand-made version is better than
> what's generated by GCC.

I wasn't seriously considering doing that optimisation.
Perhaps just hoping it might make a compiler person think :-)

	David

> 
> Do you have any perf test?
> 
> Thanks,
> Yury


* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-27 21:57         ` David Laight
@ 2025-02-28  1:50           ` H. Peter Anvin
  2025-03-02 15:47           ` Yury Norov
  1 sibling, 0 replies; 54+ messages in thread
From: H. Peter Anvin @ 2025-02-28  1:50 UTC (permalink / raw)
  To: David Laight, Yury Norov
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, akpm, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On February 27, 2025 1:57:41 PM PST, David Laight <david.laight.linux@gmail.com> wrote:
>On Thu, 27 Feb 2025 13:05:29 -0500
>Yury Norov <yury.norov@gmail.com> wrote:
>
>> On Wed, Feb 26, 2025 at 10:29:11PM +0000, David Laight wrote:
>> > On Mon, 24 Feb 2025 14:27:03 -0500
>> > Yury Norov <yury.norov@gmail.com> wrote:
>> > ....  
>> > > +#define parity(val)					\
>> > > +({							\
>> > > +	u64 __v = (val);				\
>> > > +	int __ret;					\
>> > > +	switch (BITS_PER_TYPE(val)) {			\
>> > > +	case 64:					\
>> > > +		__v ^= __v >> 32;			\
>> > > +		fallthrough;				\
>> > > +	case 32:					\
>> > > +		__v ^= __v >> 16;			\
>> > > +		fallthrough;				\
>> > > +	case 16:					\
>> > > +		__v ^= __v >> 8;			\
>> > > +		fallthrough;				\
>> > > +	case 8:						\
>> > > +		__v ^= __v >> 4;			\
>> > > +		__ret =  (0x6996 >> (__v & 0xf)) & 1;	\
>> > > +		break;					\
>> > > +	default:					\
>> > > +		BUILD_BUG();				\
>> > > +	}						\
>> > > +	__ret;						\
>> > > +})
>> > > +  
>> > 
>> > You really don't want to do that!
>> > gcc makes a right hash of it for x86 (32bit).
>> > See https://www.godbolt.org/z/jG8dv3cvs  
>> 
>> GCC fails to even understand this. Of course, the __v should be an
>> __auto_type. But that way GCC fails to understand that case 64 is
>> dead code for all smaller types and throws a false-positive
>> -Wshift-count-overflow. This is a known issue, unfixed for 25 years!
>
>Just do __v ^= __v >> 16 >> 16
>
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210
>>  
>> > You do better using a __v32 after the 64bit xor.  
>> 
>> It should be an __auto_type, as I already mentioned. So because of that,
>> we can either do something like this:
>> 
>>   #define parity(val)					\
>>   ({							\
>>   #ifdef CLANG                                          \
>>   	__auto_type __v = (val);			\
>>   #else /* GCC; because of this and that */             \
>>   	u64 __v = (val);			        \
>>   #endif                                                \
>>   	int __ret;					\
>> 
>> Or simply disable -Wshift-count-overflow for GCC.
>
>For 64bit values on 32bit it is probably better to do:
>int p32(unsigned long long x)
>{
>    unsigned int lo = x;
>    lo ^= x >> 32;
>    lo ^= lo >> 16;
>    lo ^= lo >> 8;
>    lo ^= lo >> 4;
>    return (0x6996 >> (lo & 0xf)) & 1;
>}
>That stops the compiler doing 64bit shifts (ok on x86, but probably not elsewhere).
>It is likely to be reasonably optimal for most 64bit CPUs as well.
>(For x86-64 it probably removes a load of REX prefix.)
>(It adds an extra instruction to arm because of its barrel shifter.)
>
>
>> 
>> > Even the 64bit version is probably sub-optimal (both gcc and clang).
>> > The whole lot ends up being a big single-register dependency chain.
>> > You want to do:  
>> 
>> No, I don't. I want to have a sane compiler that does it for me.
>> 
>> > 	mov %eax, %edx
>> > 	shrl $n, %eax
>> > 	xor %edx, %eax
>> > so that the 'mov' and 'shrl' can happen in the same clock
>> > (without relying on the register-register move being optimised out).
>> > 
>> > I dropped in the arm64 for an example of where the magic shift of 6996
>> > just adds an extra instruction.  
>> 
>> It's still unclear to me that this parity thing is used in hot paths.
>> If that holds, it's unclear that your hand-made version is better than
>> what's generated by GCC.
>
>I wasn't seriously considering doing that optimisation.
>Perhaps just hoping it might make a compiler person think :-)
>
>	David
>
>> 
>> Do you have any perf test?
>> 
>> Thanks,
>> Yury
>

What the compiler people need to do is to not make __builtin_parity*() generate crap.
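
A minimal sketch, not taken from the thread's patches, of how the
"__v >> 16 >> 16" suggestion quoted above lets a type-generic macro use
__auto_type without a single shift by 32 appearing in the dead 64-bit
branch for narrower types. It assumes GNU C (statement expressions,
__auto_type, the fallthrough attribute) and unsigned integer arguments
of at most 64 bits. The names parity_sketch() and parity_fallthrough
are hypothetical.

```c
#include <stdint.h>

#define parity_fallthrough __attribute__((__fallthrough__))

/* Type-generic parity for unsigned integers up to 64 bits wide. */
#define parity_sketch(val)						\
({									\
	__auto_type __v = (val);					\
	_Static_assert(sizeof(__v) <= 8, "type too wide");		\
	switch (sizeof(__v) * 8) {					\
	case 64:							\
		/* Two shifts by 16 instead of one shift by 32. */	\
		__v ^= __v >> 16 >> 16;					\
		parity_fallthrough;					\
	case 32:							\
		__v ^= __v >> 16;					\
		parity_fallthrough;					\
	case 16:							\
		__v ^= __v >> 8;					\
		parity_fallthrough;					\
	default:							\
		__v ^= __v >> 4;					\
	}								\
	/* Bit i of 0x6996 is the parity of the 4-bit value i. */	\
	(int)((0x6996 >> (__v & 0xf)) & 1);				\
})
```

For 64-bit inputs the result can be compared with the GCC/Clang builtin
__builtin_parityll(); the sketch says nothing about which of the two
generates better code on a given target.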


* Re: [PATCH 02/17] bitops: Add generic parity calculation for u64
  2025-02-27 21:57         ` David Laight
  2025-02-28  1:50           ` H. Peter Anvin
@ 2025-03-02 15:47           ` Yury Norov
  1 sibling, 0 replies; 54+ messages in thread
From: Yury Norov @ 2025-03-02 15:47 UTC (permalink / raw)
  To: David Laight
  Cc: Kuan-Wei Chiu, tglx, mingo, bp, dave.hansen, x86, jk, joel,
	eajames, andrzej.hajda, neil.armstrong, rfoss, maarten.lankhorst,
	mripard, tzimmermann, airlied, simona, dmitry.torokhov, mchehab,
	awalls, hverkuil, miquel.raynal, richard, vigneshr, louis.peens,
	andrew+netdev, davem, edumazet, pabeni, parthiban.veerasooran,
	arend.vanspriel, johannes, gregkh, jirislaby, akpm, hpa, alistair,
	linux, Laurent.pinchart, jonas, jernej.skrabec, kuba,
	linux-kernel, linux-fsi, dri-devel, linux-input, linux-media,
	linux-mtd, oss-drivers, netdev, linux-wireless, brcm80211,
	brcm80211-dev-list.pdl, linux-serial, bpf, jserv, Yu-Chun Lin

On Thu, Feb 27, 2025 at 09:57:41PM +0000, David Laight wrote:
> > It's still unclear to me that this parity thing is used in hot paths.
> > If that holds, it's unclear that your hand-made version is better than
> > what's generated by GCC.
> 
> I wasn't seriously considering doing that optimisation.
> Perhaps just hoping it might make a compiler person think :-)

David, can you suggest only things you're seriously considering doing?
Random suggestions distract my contributors and make them do unneeded
work and experiments.

In the other thread you asked I Hsin to try your approach to GENMASK()
macro, saying you're lazy. I don't think this is the right way to
communicate, not to mention that if you're too lazy to try your own
approach, it doesn't sound nice to ask someone else to try it.

Thanks for understanding,
Yury



Thread overview: 54+ messages
2025-02-23 16:42 [PATCH 00/17] Introduce and use generic parity32/64 helper Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 01/17] bitops: Add generic parity calculation for u32 Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 02/17] bitops: Add generic parity calculation for u64 Kuan-Wei Chiu
2025-02-24  7:09   ` Jiri Slaby
2025-02-24 13:34     ` David Laight
2025-02-24 16:56       ` Yu-Chun Lin
2025-02-25 15:21       ` H. Peter Anvin
2025-02-25 15:24       ` H. Peter Anvin
2025-02-25 21:43         ` Andrew Cooper
2025-02-26  1:35           ` H. Peter Anvin
2025-02-24 19:27   ` Yury Norov
2025-02-25 13:29     ` Kuan-Wei Chiu
2025-02-25 14:20       ` Kuan-Wei Chiu
2025-02-26  7:14       ` Jiri Slaby
2025-02-26 17:59         ` Kuan-Wei Chiu
2025-02-26 18:33         ` Yury Norov
2025-02-27  6:38           ` Jiri Slaby
2025-02-27 17:37             ` Yury Norov
2025-02-26 22:29     ` David Laight
2025-02-27 18:05       ` Yury Norov
2025-02-27 21:57         ` David Laight
2025-02-28  1:50           ` H. Peter Anvin
2025-03-02 15:47           ` Yury Norov
2025-02-23 16:42 ` [PATCH 03/17] x86: Replace open-coded parity calculation with parity8() Kuan-Wei Chiu
2025-02-24 15:24   ` Uros Bizjak
2025-02-24 21:55     ` H. Peter Anvin
2025-02-24 22:08       ` Uros Bizjak
2025-02-24 22:18         ` H. Peter Anvin
2025-02-25  3:36         ` H. Peter Anvin
2025-02-24 22:17       ` Yury Norov
2025-02-24 22:21         ` H. Peter Anvin
2025-02-24 22:30           ` Yury Norov
2025-02-25 22:46       ` David Laight
2025-02-26  0:26         ` H. Peter Anvin
2025-02-23 16:42 ` [PATCH 04/17] media: media/test_drivers: " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 05/17] media: pci: cx18-av-vbi: " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 06/17] media: saa7115: " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 07/17] serial: max3100: " Kuan-Wei Chiu
2025-02-24  7:25   ` Jiri Slaby
2025-02-23 16:42 ` [PATCH 08/17] lib/bch: Replace open-coded parity calculation with parity32() Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 09/17] Input: joystick - " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 10/17] net: ethernet: oa_tc6: " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 11/17] wifi: brcm80211: " Kuan-Wei Chiu
2025-02-25  6:29   ` Arend Van Spriel
2025-02-23 16:42 ` [PATCH 12/17] drm/bridge: dw-hdmi: " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 13/17] mtd: ssfdc: " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 14/17] fsi: i2cr: " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 15/17] fsi: i2cr: Replace open-coded parity calculation with parity64() Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 16/17] Input: joystick - " Kuan-Wei Chiu
2025-02-23 16:42 ` [PATCH 17/17] nfp: bpf: " Kuan-Wei Chiu
2025-02-23 20:25 ` [PATCH 00/17] Introduce and use generic parity32/64 helper Uros Bizjak
2025-02-24 15:27   ` Yu-Chun Lin
2025-02-24  7:58 ` Jeremy Kerr
2025-02-24 15:35   ` Yu-Chun Lin
