From: Kuan-Wei Chiu <visitorckw@gmail.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
David Laight <david.laight.linux@gmail.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Laurent.pinchart@ideasonboard.com, airlied@gmail.com,
akpm@linux-foundation.org, alistair@popple.id.au,
andrew+netdev@lunn.ch, andrzej.hajda@intel.com,
arend.vanspriel@broadcom.com, awalls@md.metrocast.net,
bp@alien8.de, bpf@vger.kernel.org,
brcm80211-dev-list.pdl@broadcom.com, brcm80211@lists.linux.dev,
dave.hansen@linux.intel.com, davem@davemloft.net,
dmitry.torokhov@gmail.com, dri-devel@lists.freedesktop.org,
eajames@linux.ibm.com, edumazet@google.com, eleanor15x@gmail.com,
gregkh@linuxfoundation.org, hverkuil@xs4all.nl,
jernej.skrabec@gmail.com, jirislaby@kernel.org, jk@ozlabs.org,
joel@jms.id.au, johannes@sipsolutions.net, jonas@kwiboo.se,
jserv@ccns.ncku.edu.tw, kuba@kernel.org,
linux-fsi@lists.ozlabs.org, linux-input@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
linux-mtd@lists.infradead.org, linux-serial@vger.kernel.org,
linux-wireless@vger.kernel.org, linux@rasmusvillemoes.dk,
louis.peens@corigine.com, maarten.lankhorst@linux.intel.com,
mchehab@kernel.org, mingo@redhat.com, miquel.raynal@bootlin.com,
mripard@kernel.org, neil.armstrong@linaro.org,
netdev@vger.kernel.org, oss-drivers@corigine.com,
pabeni@redhat.com, parthiban.veerasooran@microchip.com,
rfoss@kernel.org, richard@nod.at, simona@ffwll.ch,
tglx@linutronix.de, tzimmermann@suse.de, vigneshr@ti.com,
x86@kernel.org
Subject: Re: [PATCH v3 00/16] Introduce and use generic parity16/32/64 helper
Date: Fri, 4 Apr 2025 16:47:58 +0800
Message-ID: <Z++cvrLOz2VAaUkO@visitorckw-System-Product-Name>
In-Reply-To: <Z-6zzP2O-Q7zvTLt@thinkpad>
On Thu, Apr 03, 2025 at 12:14:04PM -0400, Yury Norov wrote:
> On Thu, Apr 03, 2025 at 10:39:03PM +0800, Kuan-Wei Chiu wrote:
> > On Tue, Mar 25, 2025 at 12:43:25PM -0700, H. Peter Anvin wrote:
> > > On 3/23/25 08:16, Kuan-Wei Chiu wrote:
> > > >
> > > > Interface 3: Multiple Functions
> > > > Description: bool parity_odd8/16/32/64()
> > > > Pros: No need for explicit casting; easy to integrate
> > > > architecture-specific optimizations; except for parity8(), all
> > > > functions are one-liners with no significant code duplication
> > > > Cons: More functions may increase maintenance burden
> > > > Opinions: Only I support this approach
> > > >
> > >
> > > OK, so I responded to this but I can't find my reply or any of the
> > > followups, so let me go again:
> > >
> > > I prefer this option, because:
> > >
> > > a. Virtually all uses of parity is done in contexts where the sizes of the
> > > items for which parity is to be taken are well-defined, but it is *really*
> > > easy for integer promotion to cause a value to be extended to 32 bits
> > > unnecessarily (sign or zero extend, although for parity it doesn't make any
> > > difference -- if the compiler realizes it.)
> > >
> > > b. It makes it easier to add arch-specific implementations, notably using
> > > __builtin_parity on architectures where that is known to generate good code.
> > >
> > > c. For architectures where only *some* parity implementations are
> > > fast/practical, the generic fallbacks will either naturally synthesize them
> > > from components via shift-xor, or they can be defined to use a larger
> > > version; the function prototype acts like a cast.
> > >
> > > d. If there is a reason in the future to add a generic version, it is really
> > > easy to do using the size-specific functions as components; this is
> > > something we do literally all over the place, using a pattern so common that
> > > it, itself, probably should be macroized:
> > >
> > > #define parity(x) \
> > > ({ \
> > > 	typeof(x) __x = (x); \
> > > 	bool __y; \
> > > 	switch (sizeof(__x)) { \
> > > 		case 1: \
> > > 			__y = parity8(__x); \
> > > 			break; \
> > > 		case 2: \
> > > 			__y = parity16(__x); \
> > > 			break; \
> > > 		case 4: \
> > > 			__y = parity32(__x); \
> > > 			break; \
> > > 		case 8: \
> > > 			__y = parity64(__x); \
> > > 			break; \
> > > 		default: \
> > > 			BUILD_BUG(); \
> > > 			break; \
> > > 	} \
> > > 	__y; \
> > > })
> > >
> > Thank you for your detailed response and for explaining the rationale
> > behind your preference. The points you outlined in (a)–(d) all seem
> > quite reasonable to me.
> >
> > Yury,
> > do you have any feedback on this?
> > Thank you.
>
> My feedback to you:
>
> I asked you to share any numbers about each approach. Asm listings,
> performance tests, bloat-o-meter. But you did nothing or very little
> in that department. You move this series, and it means you should be
> very well aware of alternative solutions, their pros and cons.
>
It seems the concern is that I didn't provide assembly listings and
performance numbers. While I believe that such numbers alone cannot
show which users really care about parity efficiency, I have included
the assembly results and my initial observations below. Some
differences, like mov vs movzx, are likely too small to measure.

Compilation on x86-64 using GCC 14.2 with -O2:
Link to Godbolt: https://godbolt.org/z/EsqPMz8cq

For u8 Input:
- #2 and #3 generate exactly the same assembly code, while #1 replaces
one `mov` instruction with `movzx`, which may be slightly slower
because of the zero extension.
- Efficiency: #2 = #3 > #1
For u16 Input:
- As with u8 input, #1 performs an unnecessary zero extension, while #3
replaces one of the `shr` instructions in #2 with a `mov`, making it
slightly faster.
- Efficiency: #3 > #2 > #1
For u32 Input:
- #1 has an additional `mov` instruction compared to #2, and #2 has an
extra `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1
For u64 Input:
- #1 and #2 generate the same code, but #3 has one fewer `shr`
instruction than the others.
- Efficiency: #3 > #1 = #2
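
For readers who have not followed the earlier patches, the shift-xor
folding that the size-specific one-liners of #3 build on can be sketched
roughly as below. This is only an illustration under my own assumptions:
the `sketch_*` names are hypothetical, and the bodies are a plain fold,
not the code from the series or from the Godbolt links.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: each wider helper folds its upper half into its
 * lower half with one shift and one xor, then defers to the next
 * narrower helper. Returns true when an odd number of bits is set.
 */
static inline bool sketch_parity8(uint8_t v)
{
	v ^= v >> 4;	/* fold high nibble into low nibble */
	v ^= v >> 2;	/* fold down to two bits */
	v ^= v >> 1;	/* fold down to one bit */
	return v & 1;
}

static inline bool sketch_parity16(uint16_t v)
{
	return sketch_parity8((uint8_t)(v ^ (v >> 8)));
}

static inline bool sketch_parity32(uint32_t v)
{
	return sketch_parity16((uint16_t)(v ^ (v >> 16)));
}

static inline bool sketch_parity64(uint64_t v)
{
	return sketch_parity32((uint32_t)(v ^ (v >> 32)));
}

/* Small self-check showing how the helpers are meant to be called. */
static inline void sketch_parity_selfcheck(void)
{
	assert(sketch_parity8(0x07));			/* three bits set: odd */
	assert(!sketch_parity16(0x0101));		/* two bits set: even */
	assert(sketch_parity32(0x00010000u));		/* one bit set: odd */
	assert(!sketch_parity64(0x8000000000000001ull));	/* two bits: even */
}
```

Because the argument type fixes the width, a caller passing a u8 or u16
never pays for a wider fold, which matches the pattern visible in the
listings above.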
---
Adding the -m32 flag to view the assembly for a 32-bit target:
Link to Godbolt: https://godbolt.org/z/GrPa86Eq5

For u8 Input:
- #2 and #3 generate identical assembly code, whereas #1 has additional
`mov`, `shr`, and `push/pop` instructions.
- Efficiency: #2 = #3 > #1
For u16 Input:
- #1 uses a lot of `xmm` register operations, making it slower than #2
and #3. Additionally, #2 has an extra `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1
For u32 Input:
- #1 again uses a lot of `xmm` register operations, so it is slower
than #2 and #3, and #2 has an additional `shr` instruction compared to #3.
- Efficiency: #3 > #2 > #1
For u64 Input:
- Both #1 and #2 use `xmm` register operations, but #1 has a few extra
`movdqa` instructions. #3 is more concise, using a few `shr`, `xor`,
and `mov` instructions to complete the operation.
- Efficiency: #3 > #2 > #1
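
Since `__builtin_parity` was brought up as a possible arch-specific
implementation, one way to check that any of the variants agrees with it
is a small userspace cross-check along these lines. Again this is a
sketch under stated assumptions: `ref_parity64`, the `builtin_*`
wrappers and `count_parity_mismatches` are hypothetical names, it needs
GCC or Clang for the builtins, and it says nothing about performance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference: portable shift-xor fold of a 64-bit value. */
static bool ref_parity64(uint64_t v)
{
	v ^= v >> 32;
	v ^= v >> 16;
	v ^= v >> 8;
	v ^= v >> 4;
	v ^= v >> 2;
	v ^= v >> 1;
	return v & 1;
}

/* Hypothetical wrappers around the GCC/Clang builtins. */
static bool builtin_parity32(uint32_t v)
{
	return __builtin_parity(v);
}

static bool builtin_parity64(uint64_t v)
{
	return __builtin_parityll(v);
}

/* Compare builtin and reference on pseudo-random inputs; count mismatches. */
static unsigned int count_parity_mismatches(void)
{
	uint64_t x = 0x9e3779b97f4a7c15ull;	/* arbitrary non-zero seed */
	unsigned int mismatches = 0;

	for (int i = 0; i < 100000; i++) {
		/* xorshift64 step to get the next test value */
		x ^= x << 13;
		x ^= x >> 7;
		x ^= x << 17;

		if (builtin_parity64(x) != ref_parity64(x))
			mismatches++;
		if (builtin_parity32((uint32_t)x) != ref_parity64((uint32_t)x))
			mismatches++;
	}
	return mismatches;
}

/* Convenience wrapper: aborts if the two implementations ever disagree. */
static void parity_crosscheck(void)
{
	assert(count_parity_mismatches() == 0);
}
```

Calling `parity_crosscheck()` from a test program is enough to confirm
agreement; comparing the generated code still requires looking at the
assembly as done above.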
Regards,
Kuan-Wei