Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper

linux-input.vger.kernel.org archive mirror

From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	David Laight <david.laight.linux@gmail.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Laurent.pinchart@ideasonboard.com, airlied@gmail.com,
	akpm@linux-foundation.org, alistair@popple.id.au,
	andrew+netdev@lunn.ch, andrzej.hajda@intel.com,
	arend.vanspriel@broadcom.com, awalls@md.metrocast.net,
	bp@alien8.de, bpf@vger.kernel.org,
	brcm80211-dev-list.pdl@broadcom.com, brcm80211@lists.linux.dev,
	dave.hansen@linux.intel.com, davem@davemloft.net,
	dmitry.torokhov@gmail.com, dri-devel@lists.freedesktop.org,
	eajames@linux.ibm.com, edumazet@google.com, eleanor15x@gmail.com,
	gregkh@linuxfoundation.org, hverkuil@xs4all.nl,
	jernej.skrabec@gmail.com, jirislaby@kernel.org, jk@ozlabs.org,
	joel@jms.id.au, johannes@sipsolutions.net, jonas@kwiboo.se,
	jserv@ccns.ncku.edu.tw, kuba@kernel.org,
	linux-fsi@lists.ozlabs.org, linux-input@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
	linux-mtd@lists.infradead.org, linux-serial@vger.kernel.org,
	linux-wireless@vger.kernel.org, linux@rasmusvillemoes.dk,
	louis.peens@corigine.com, maarten.lankhorst@linux.intel.com,
	mchehab@kernel.org, mingo@redhat.com, miquel.raynal@bootlin.com,
	mripard@kernel.org, neil.armstrong@linaro.org,
	netdev@vger.kernel.org, oss-drivers@corigine.com,
	pabeni@redhat.com, parthiban.veerasooran@microchip.com,
	rfoss@kernel.org, richard@nod.at, simona@ffwll.ch,
	tglx@linutronix.de, tzimmermann@suse.de, vigneshr@ti.com,
	x86@kernel.org
Subject: Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Date: Fri, 4 Apr 2025 16:47:58 +0800
Message-ID: <Z++cvrLOz2VAaUkO@visitorckw-System-Product-Name>
In-Reply-To: <Z-6zzP2O-Q7zvTLt@thinkpad>

On Thu, Apr 03, 2025 at 12:14:04PM -0400, Yury Norov wrote:
> On Thu, Apr 03, 2025 at 10:39:03PM +0800, Kuan-Wei Chiu wrote:
> > On Tue, Mar 25, 2025 at 12:43:25PM -0700, H. Peter Anvin wrote:
> > > On 3/23/25 08:16, Kuan-Wei Chiu wrote:
> > > > 
> > > > Interface 3: Multiple Functions
> > > > Description: bool parity_odd8/16/32/64()
> > > > Pros: No need for explicit casting; easy to integrate
> > > >        architecture-specific optimizations; except for parity8(), all
> > > >        functions are one-liners with no significant code duplication
> > > > Cons: More functions may increase maintenance burden
> > > > Opinions: Only I support this approach
> > > > 
> > > 
> > > OK, so I responded to this but I can't find my reply or any of the
> > > followups, so let me go again:
> > > 
> > > I prefer this option, because:
> > > 
> > > a. Virtually all uses of parity are done in contexts where the sizes of the
> > > items for which parity is to be taken are well-defined, but it is *really*
> > > easy for integer promotion to cause a value to be extended to 32 bits
> > > unnecessarily (sign or zero extend, although for parity it doesn't make any
> > > difference -- if the compiler realizes it.)
> > > 
> > > b. It makes it easier to add arch-specific implementations, notably using
> > > __builtin_parity on architectures where that is known to generate good code.
> > > 
> > > c. For architectures where only *some* parity implementations are
> > > fast/practical, the generic fallbacks will either naturally synthesize them
> > > from components via shift-xor, or they can be defined to use a larger
> > > version; the function prototype acts like a cast.
> > > 
> > > d. If there is a reason in the future to add a generic version, it is really
> > > easy to do using the size-specific functions as components; this is
> > > something we do literally all over the place, using a pattern so common that
> > > it, itself, probably should be macroized:
> > > 
> > > #define parity(x) 				\
> > > ({						\
> > > 	typeof(x) __x = (x);			\
> > > 	bool __y;				\
> > > 	switch (sizeof(__x)) {			\
> > > 		case 1:				\
> > > 			__y = parity8(__x);	\
> > > 			break;			\
> > > 		case 2:				\
> > > 			__y = parity16(__x);	\
> > > 			break;			\
> > > 		case 4:				\
> > > 			__y = parity32(__x);	\
> > > 			break;			\
> > > 		case 8:				\
> > > 			__y = parity64(__x);	\
> > > 			break;			\
> > > 		default:			\
> > > 			BUILD_BUG();		\
> > > 			break;			\
> > > 	}					\
> > > 	__y;					\
> > > })
> > >
> > Thank you for your detailed response and for explaining the rationale
> > behind your preference. The points you outlined in (a)–(d) all seem
> > quite reasonable to me.
> > 
> > Yury,
> > do you have any feedback on this?
> > Thank you.
> 
> My feedback to you:
> 
> I asked you to share any numbers about each approach. Asm listings,
> performance tests, bloat-o-meter. But you did nothing or very little
> in that department. You are moving this series, and that means you should be
> very well aware of alternative solutions, their pros and cons.
> 
It seems the concern is that I didn't provide assembly listings or
performance numbers. While I don't believe these numbers alone can show
which users really care about parity efficiency, I have included the
assembly results and my initial observations below. Some of the
differences, such as `mov` vs. `movzx`, are likely too small to measure.
In what follows, #1, #2, and #3 refer to Interfaces 1, 2, and 3 from my
earlier summary.
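
For reference, H. Peter Anvin's size-dispatching parity(x) macro from
point (d) can be exercised outside the kernel. In this minimal sketch,
BUILD_BUG() is replaced by a C11 _Static_assert, typeof is spelled
__typeof__, and the four size-specific helpers are stand-ins built on the
GCC/Clang __builtin_parity() family; none of this is the code from the
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* Stand-in size-specific helpers for this sketch (GCC/Clang builtins). */
static inline bool parity8(u8 v)   { return __builtin_parity(v); }
static inline bool parity16(u16 v) { return __builtin_parity(v); }
static inline bool parity32(u32 v) { return __builtin_parity(v); }
static inline bool parity64(u64 v) { return __builtin_parityll(v); }

/*
 * Size-dispatching wrapper: the operand's size selects the helper.
 * The _Static_assert rejects unsupported sizes at compile time, so the
 * default branch can only be reached for 8-byte operands.
 */
#define parity(x)						\
({								\
	__typeof__(x) __x = (x);				\
	bool __y;						\
	_Static_assert(sizeof(__x) == 1 || sizeof(__x) == 2 ||	\
		       sizeof(__x) == 4 || sizeof(__x) == 8,	\
		       "unsupported operand size for parity()");\
	switch (sizeof(__x)) {					\
	case 1:							\
		__y = parity8(__x);				\
		break;						\
	case 2:							\
		__y = parity16(__x);				\
		break;						\
	case 4:							\
		__y = parity32(__x);				\
		break;						\
	default:						\
		__y = parity64(__x);				\
		break;						\
	}							\
	__y;							\
})
```

Because sizeof() is a compile-time constant, the compiler can drop the
untaken switch arms, so each call should reduce to a single helper call.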

Compiled on x86-64 with GCC 14.2 at -O2:

Link to Godbolt: https://godbolt.org/z/EsqPMz8cq

For u8 Input:
- #2 and #3 generate exactly the same assembly code, while #1 replaces
  one `mov` instruction with a `movzx`, which may slightly slow it down
  because of the zero extension.
- Efficiency: #2 = #3 > #1
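
The u8 result is point (a) in practice: a u8 passed to a wider parameter
has to be zero-extended first. As a rough illustration (not the code
behind the Godbolt links), here is a size-specific parity8() using the
well-known 0x6996 nibble-lookup trick next to parity_wide(), a
hypothetical stand-in for a single 64-bit interface. Since zero
extension only adds zero bits, both must agree on every byte, which the
self-check confirms exhaustively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint8_t u8;
typedef uint64_t u64;

/*
 * Fold the high nibble into the low nibble, then index 0x6996, whose
 * bit i is the parity of i (for i = 0..15).
 */
static inline bool parity8(u8 val)
{
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/*
 * Hypothetical single wide interface: a u8 argument is zero-extended to
 * 64 bits before the call, which is where an extra movzx can appear.
 */
static inline bool parity_wide(u64 val)
{
	bool p = false;

	while (val) {
		p = !p;
		val &= val - 1;	/* clear the lowest set bit */
	}
	return p;
}

/* Zero extension cannot change parity, so both agree for every u8. */
static void check_parity8(void)
{
	for (unsigned int i = 0; i < 256; i++)
		assert(parity8((u8)i) == parity_wide((u8)i));
}
```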

For u16 Input:
- As with u8 input, #1 performs an unnecessary zero extension, while #3
  replaces one of the `shr` instructions in #2 with a `mov`, making it
  slightly faster.
- Efficiency: #3 > #2 > #1
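
To see where the `shr` counts come from, here is a 16-bit helper written
as a shift-xor fold: each halving step costs one shift plus one xor, and
the final 4-bit value indexes the 0x6996 table. This is an assumed
generic implementation for illustration, not necessarily what any of #1
to #3 compile to; parity16_ref() is a hypothetical bit-by-bit reference
used only for checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's u16 (assumption). */
typedef uint16_t u16;

/* Fold 16 -> 8 -> 4 bits, then look the nibble's parity up in 0x6996. */
static inline bool parity16(u16 val)
{
	val ^= val >> 8;	/* low byte now holds low ^ high */
	val ^= val >> 4;	/* low nibble now holds the xor of all nibbles */
	return (0x6996 >> (val & 0xf)) & 1;
}

/* Bit-by-bit reference used only for checking. */
static bool parity16_ref(u16 val)
{
	bool p = false;

	for (int i = 0; i < 16; i++)
		p ^= (val >> i) & 1;
	return p;
}

/* Exhaustive comparison over all 65536 inputs. */
static void check_parity16(void)
{
	for (unsigned int i = 0; i <= 0xffff; i++)
		assert(parity16((u16)i) == parity16_ref((u16)i));
}
```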

For u32 Input:
- #1 has an additional `mov` instruction compared to #2, and #2 has an
  extra `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1

For u64 Input:
- #1 and #2 generate the same code, but #3 has one fewer `shr`
  instruction than the others.
- Efficiency: #3 > #1 = #2
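
H. Peter Anvin's point (c), synthesizing wider helpers from narrower ones,
can be sketched as follows: parity64() xors its high half into its low
half and hands the result to parity32(), whose u32 parameter performs the
truncation, so the prototype acts like a cast. This is an illustrative
assumption rather than the code in the series; parity64_ref() is a
hypothetical reference used only for checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint32_t u32;
typedef uint64_t u64;

/* Shift-xor fold 32 -> 16 -> 8 -> 4 bits, then the 0x6996 nibble table. */
static inline bool parity32(u32 val)
{
	val ^= val >> 16;
	val ^= val >> 8;
	val ^= val >> 4;
	return (0x6996 >> (val & 0xf)) & 1;
}

/*
 * Reuse the 32-bit helper: the low 32 bits of val ^ (val >> 32) have the
 * same parity as val, and passing them to parity32() drops the rest.
 */
static inline bool parity64(u64 val)
{
	return parity32(val ^ (val >> 32));
}

/* Bit-by-bit reference used only for checking. */
static bool parity64_ref(u64 val)
{
	bool p = false;

	for (int i = 0; i < 64; i++)
		p ^= (val >> i) & 1;
	return p;
}
```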

---

Adding the -m32 flag to view the assembly for 32-bit x86:

Link to Godbolt: https://godbolt.org/z/GrPa86Eq5

For u8 Input:
- #2 and #3 generate identical assembly code, whereas #1 has additional
  `mov`, `shr`, and `push/pop` instructions.
- Efficiency: #2 = #3 > #1

For u16 Input:
- #1 uses a lot of `xmm` register operations, making it slower than #2
  and #3. Additionally, #2 has an extra `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1

For u32 Input:
- #1 again uses a lot of `xmm` register operations, so it is slower
  than #2 and #3, and #2 has an additional `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1

For u64 Input:
- Both #1 and #2 use `xmm` register operations, but #1 has a few extra
  `movdqa` instructions. #3 is more concise, using a few `shr`, `xor`,
  and `mov` instructions to complete the operation.
- Efficiency: #3 > #2 > #1
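
On 32-bit targets a 64-bit parity does not need a 64-bit shift chain:
xoring the two 32-bit halves reduces it to a 32-bit problem. The sketch
below combines that with point (b), using the GCC/Clang builtins
__builtin_parity() and __builtin_parityll() when a hypothetical
ARCH_HAS_FAST_PARITY switch is set, and otherwise falling back to the
multiply-based parity trick from the well-known "Bit Twiddling Hacks"
collection. ARCH_HAS_FAST_PARITY is an invented name, not an existing
kernel symbol, and whether the builtin produces good code depends on the
target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types (assumption). */
typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical per-arch opt-in; build with -DARCH_HAS_FAST_PARITY=0 to test the fallback. */
#ifndef ARCH_HAS_FAST_PARITY
#define ARCH_HAS_FAST_PARITY 1
#endif

/*
 * Portable fallback: after the two shift-xors, bit 4k holds the parity of
 * nibble k. The multiply sums those eight bits into the top nibble, and
 * bit 28 is the low bit of that sum.
 */
static inline bool parity32_generic(u32 v)
{
	v ^= v >> 1;
	v ^= v >> 2;
	v = (v & 0x11111111u) * 0x11111111u;
	return (v >> 28) & 1;
}

static inline bool parity32(u32 val)
{
#if ARCH_HAS_FAST_PARITY
	return __builtin_parity(val);		/* GCC/Clang builtin */
#else
	return parity32_generic(val);
#endif
}

/* 64-bit parity as the parity of (low half ^ high half). */
static inline bool parity64(u64 val)
{
#if ARCH_HAS_FAST_PARITY
	return __builtin_parityll(val);		/* GCC/Clang builtin */
#else
	return parity32_generic((u32)val ^ (u32)(val >> 32));
#endif
}
```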

Regards,
Kuan-Wei

