RE: [PATCH 1/2] net: ethernet address comparison optimizations

public inbox for dev@dpdk.org
 help / color / mirror / Atom feed

From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Bruce Richardson" <bruce.richardson@intel.com>
Cc: <dev@dpdk.org>
Subject: RE: [PATCH 1/2] net: ethernet address comparison optimizations
Date: Fri, 30 Jan 2026 14:54:52 +0100	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35F656D5@smartserver.smartshare.dk> (raw)
In-Reply-To: <aXyVd7BpQKR_4uRj@bricha3-mobl1.ger.corp.intel.com>

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 30 January 2026 12.27
> 
> On Fri, Jan 30, 2026 at 12:16:43PM +0100, Morten Brørup wrote:
> > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > Sent: Friday, 30 January 2026 11.53
> > >
> > > On Fri, Jan 30, 2026 at 10:46:16AM +0000, Morten Brørup wrote:
> > > > For CPU architectures without strict alignment requirements,
> > > operations on
> > > > 6-byte Ethernet addresses using three 2-byte operations were
> replaced
> > > by a
> > > > 4-byte and a 2-byte operation, i.e. two operations instead of
> three.
> > > >
> > > > Comparison functions are pure, so added __rte_pure.
> > > >
> > > > Removed superfluous parentheses. (No functional change.)
> > > >
> > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > ---
> > > >  lib/net/rte_ether.h | 19 ++++++++++++++++++-
> > > >  1 file changed, 18 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/lib/net/rte_ether.h b/lib/net/rte_ether.h
> > > > index c9a0b536c3..5552d3c1f6 100644
> > > > --- a/lib/net/rte_ether.h
> > > > +++ b/lib/net/rte_ether.h
> > > > @@ -99,13 +99,19 @@ static_assert(alignof(struct rte_ether_addr)
> ==
> > > 2,
> > > >   *  True  (1) if the given two ethernet address are the same;
> > > >   *  False (0) otherwise.
> > > >   */
> > > > +__rte_pure
> > > >  static inline int rte_is_same_ether_addr(const struct
> rte_ether_addr
> > > *ea1,
> > > >  				     const struct rte_ether_addr *ea2)
> > > >  {
> > > > +#if !defined(RTE_ARCH_STRICT_ALIGN)
> > > > +	return ((((const unaligned_uint32_t *)ea1)[0] ^ ((const
> > > unaligned_uint32_t *)ea2)[0]) |
> > > > +			(((const uint16_t *)ea1)[2] ^ ((const uint16_t
> > > *)ea2)[2])) == 0;
> > > > +#else
> > > >  	const uint16_t *w1 = (const uint16_t *)ea1;
> > > >  	const uint16_t *w2 = (const uint16_t *)ea2;
> > > >
> > > >  	return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^
> w2[2])) ==
> > > 0;
> > > > +#endif
> > > >  }
> > >
> > > Is this actually faster?
> >
> > It's a simple micro-optimization, so I haven't benchmarked it.
> > On x86, the compiled function is simplified and reduced in size from
> 34 to 24 bytes:
> >
> > 00000000004ed650 <review_rte_is_same_ether_addr>:
> >   4ed650:	0f b7 07             	movzwl (%rdi),%eax
> >   4ed653:	0f b7 57 02          	movzwl 0x2(%rdi),%edx
> >   4ed657:	66 33 06             	xor    (%rsi),%ax
> >   4ed65a:	66 33 56 02          	xor    0x2(%rsi),%dx
> >   4ed65e:	09 d0                	or     %edx,%eax
> >   4ed660:	0f b7 57 04          	movzwl 0x4(%rdi),%edx
> >   4ed664:	66 33 56 04          	xor    0x4(%rsi),%dx
> >   4ed668:	66 09 d0             	or     %dx,%ax
> >   4ed66b:	0f 94 c0             	sete   %al
> >   4ed66e:	0f b6 c0             	movzbl %al,%eax
> >   4ed671:	c3                   	ret
> >   4ed672:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
> >   4ed679:	00 00 00 00
> >   4ed67d:	0f 1f 00             	nopl   (%rax)
> >
> > 00000000004ed680 <rte_is_same_ether_addr_improved>:
> >   4ed680:	0f b7 47 04          	movzwl 0x4(%rdi),%eax
> >   4ed684:	66 33 46 04          	xor    0x4(%rsi),%ax
> >   4ed688:	8b 17                	mov    (%rdi),%edx
> >   4ed68a:	33 16                	xor    (%rsi),%edx
> >   4ed68c:	0f b7 c0             	movzwl %ax,%eax
> >   4ed68f:	09 c2                	or     %eax,%edx
> >   4ed691:	0f 94 c0             	sete   %al
> >   4ed694:	0f b6 c0             	movzbl %al,%eax
> >   4ed697:	c3                   	ret
> >   4ed698:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
> >   4ed69f:	00
> >
> > For reference, memcpy() of 6 bytes (compile time constant) also
> compiles to a 4-byte and a 2-byte operation, not three 2-byte
> operations.
> >
> What about memcmp? Does it compile similarly?

memcmp(a,b,6) on Clang compiles into something very similar.
memcmp(a,b,6) on GCC compiles into something with a branch after the first 4-byte comparison, with the assumption (regarding static branch prediction) that they are likely to differ.
I guess GCC's counterproductive behavior was the reason for originally implementing a manual comparison, instead of simply using memcmp().

BTW, GCC is clever enough to compile 8-byte and 16-byte comparisons into code without branches.
I guess that's why rte_ipv6_addr_eq() is implemented using memcpy() [1].

[1]: https://elixir.bootlin.com/dpdk/v25.11/source/lib/net/rte_ip6.h#L68

> Before we start adding ifdefs
> like this to the code, I'd like to see some measured performance
> benefits
> from it. While the code may be 10 bytes shorter, does that actually
> translate into a measurable difference in some app?

Excellent question!
Some quick rudimentary testing shows that it seems to be ~4 cycles slower than what it's replacing.
Reality beats expectations.

I'll drop this patch.

next prev parent reply	other threads:[~2026-01-30 13:54 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-30 10:46 [PATCH 1/2] net: ethernet address comparison optimizations Morten Brørup
2026-01-30 10:46 ` [PATCH 2/2] [RFC] net: introduce fast ethernet address comparison function Morten Brørup
2026-01-30 14:03   ` Morten Brørup
2026-01-30 10:52 ` [PATCH 1/2] net: ethernet address comparison optimizations Bruce Richardson
2026-01-30 11:16   ` Morten Brørup
2026-01-30 11:26     ` Bruce Richardson
2026-01-30 13:54       ` Morten Brørup [this message]
2026-01-30 14:02         ` Bruce Richardson
2026-01-30 14:25           ` Morten Brørup
2026-01-30 14:32             ` Bruce Richardson
2026-01-30 14:59               ` Morten Brørup
2026-01-30 16:20 ` Stephen Hemminger
2026-01-30 16:24   ` Bruce Richardson
2026-01-30 16:31     ` Konstantin Ananyev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC35F656D5@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox