public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Bruce Richardson" <bruce.richardson@intel.com>
Cc: <dev@dpdk.org>
Subject: RE: [PATCH 1/2] net: ethernet address comparison optimizations
Date: Fri, 30 Jan 2026 12:16:43 +0100	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35F656D4@smartserver.smartshare.dk> (raw)
In-Reply-To: <aXyNh8Yja-Mg6qID@bricha3-mobl1.ger.corp.intel.com>

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 30 January 2026 11.53
> 
> On Fri, Jan 30, 2026 at 10:46:16AM +0000, Morten Brørup wrote:
> > For CPU architectures without strict alignment requirements,
> operations on
> > 6-byte Ethernet addresses using three 2-byte operations were replaced
> by a
> > 4-byte and a 2-byte operation, i.e. two operations instead of three.
> >
> > Comparison functions are pure, so added __rte_pure.
> >
> > Removed superfluous parentheses. (No functional change.)
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >  lib/net/rte_ether.h | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/net/rte_ether.h b/lib/net/rte_ether.h
> > index c9a0b536c3..5552d3c1f6 100644
> > --- a/lib/net/rte_ether.h
> > +++ b/lib/net/rte_ether.h
> > @@ -99,13 +99,19 @@ static_assert(alignof(struct rte_ether_addr) ==
> 2,
> >   *  True  (1) if the given two ethernet address are the same;
> >   *  False (0) otherwise.
> >   */
> > +__rte_pure
> >  static inline int rte_is_same_ether_addr(const struct rte_ether_addr
> *ea1,
> >  				     const struct rte_ether_addr *ea2)
> >  {
> > +#if !defined(RTE_ARCH_STRICT_ALIGN)
> > +	return ((((const unaligned_uint32_t *)ea1)[0] ^ ((const
> unaligned_uint32_t *)ea2)[0]) |
> > +			(((const uint16_t *)ea1)[2] ^ ((const uint16_t
> *)ea2)[2])) == 0;
> > +#else
> >  	const uint16_t *w1 = (const uint16_t *)ea1;
> >  	const uint16_t *w2 = (const uint16_t *)ea2;
> >
> >  	return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) ==
> 0;
> > +#endif
> >  }
> 
> Is this actually faster?

It's a simple micro-optimization, so I haven't benchmarked it.
On x86, the compiled function is simplified and reduced in size from 34 to 24 bytes:

00000000004ed650 <review_rte_is_same_ether_addr>:
  4ed650:	0f b7 07             	movzwl (%rdi),%eax
  4ed653:	0f b7 57 02          	movzwl 0x2(%rdi),%edx
  4ed657:	66 33 06             	xor    (%rsi),%ax
  4ed65a:	66 33 56 02          	xor    0x2(%rsi),%dx
  4ed65e:	09 d0                	or     %edx,%eax
  4ed660:	0f b7 57 04          	movzwl 0x4(%rdi),%edx
  4ed664:	66 33 56 04          	xor    0x4(%rsi),%dx
  4ed668:	66 09 d0             	or     %dx,%ax
  4ed66b:	0f 94 c0             	sete   %al
  4ed66e:	0f b6 c0             	movzbl %al,%eax
  4ed671:	c3                   	ret
  4ed672:	66 66 2e 0f 1f 84 00 	data16 cs nopw 0x0(%rax,%rax,1)
  4ed679:	00 00 00 00 
  4ed67d:	0f 1f 00             	nopl   (%rax)

00000000004ed680 <rte_is_same_ether_addr_improved>:
  4ed680:	0f b7 47 04          	movzwl 0x4(%rdi),%eax
  4ed684:	66 33 46 04          	xor    0x4(%rsi),%ax
  4ed688:	8b 17                	mov    (%rdi),%edx
  4ed68a:	33 16                	xor    (%rsi),%edx
  4ed68c:	0f b7 c0             	movzwl %ax,%eax
  4ed68f:	09 c2                	or     %eax,%edx
  4ed691:	0f 94 c0             	sete   %al
  4ed694:	0f b6 c0             	movzbl %al,%eax
  4ed697:	c3                   	ret
  4ed698:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
  4ed69f:	00

For reference, memcpy() of 6 bytes (compile time constant) also compiles to a 4-byte and a 2-byte operation, not three 2-byte operations.

> For architectures that support strict alignment,
> this looks like something that the compilers should be doing using
> proper
> cost-benefit evaluation based on target architecture, rather than us
> doing
> it in our code.

I agree with the high level message in your comment.
DPDK contains some manual optimizations from back in the days, and the evolvement of compilers have made some of them obsolete.

In this case, GCC doesn't optimize it, so I did it manually.
I haven't checked if other compilers are clever enough to do it.


  reply	other threads:[~2026-01-30 11:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-30 10:46 [PATCH 1/2] net: ethernet address comparison optimizations Morten Brørup
2026-01-30 10:46 ` [PATCH 2/2] [RFC] net: introduce fast ethernet address comparison function Morten Brørup
2026-01-30 14:03   ` Morten Brørup
2026-01-30 10:52 ` [PATCH 1/2] net: ethernet address comparison optimizations Bruce Richardson
2026-01-30 11:16   ` Morten Brørup [this message]
2026-01-30 11:26     ` Bruce Richardson
2026-01-30 13:54       ` Morten Brørup
2026-01-30 14:02         ` Bruce Richardson
2026-01-30 14:25           ` Morten Brørup
2026-01-30 14:32             ` Bruce Richardson
2026-01-30 14:59               ` Morten Brørup
2026-01-30 16:20 ` Stephen Hemminger
2026-01-30 16:24   ` Bruce Richardson
2026-01-30 16:31     ` Konstantin Ananyev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98CBD80474FA8B44BF855DF32C47DC35F656D4@smartserver.smartshare.dk \
    --to=mb@smartsharesystems.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox