From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Bruce Richardson" <bruce.richardson@intel.com>
Cc: <dev@dpdk.org>
Subject: RE: [PATCH 1/2] net: ethernet address comparison optimizations
Date: Fri, 30 Jan 2026 12:16:43 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35F656D4@smartserver.smartshare.dk>
In-Reply-To: <aXyNh8Yja-Mg6qID@bricha3-mobl1.ger.corp.intel.com>
> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Friday, 30 January 2026 11.53
>
> On Fri, Jan 30, 2026 at 10:46:16AM +0000, Morten Brørup wrote:
> > For CPU architectures without strict alignment requirements, operations on
> > 6-byte Ethernet addresses using three 2-byte operations were replaced by a
> > 4-byte and a 2-byte operation, i.e. two operations instead of three.
> >
> > Comparison functions are pure, so added __rte_pure.
> >
> > Removed superfluous parentheses. (No functional change.)
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> > lib/net/rte_ether.h | 19 ++++++++++++++++++-
> > 1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/net/rte_ether.h b/lib/net/rte_ether.h
> > index c9a0b536c3..5552d3c1f6 100644
> > --- a/lib/net/rte_ether.h
> > +++ b/lib/net/rte_ether.h
> > @@ -99,13 +99,19 @@ static_assert(alignof(struct rte_ether_addr) == 2,
> > * True (1) if the given two ethernet address are the same;
> > * False (0) otherwise.
> > */
> > +__rte_pure
> > static inline int rte_is_same_ether_addr(const struct rte_ether_addr *ea1,
> > const struct rte_ether_addr *ea2)
> > {
> > +#if !defined(RTE_ARCH_STRICT_ALIGN)
> > + return ((((const unaligned_uint32_t *)ea1)[0] ^ ((const unaligned_uint32_t *)ea2)[0]) |
> > + (((const uint16_t *)ea1)[2] ^ ((const uint16_t *)ea2)[2])) == 0;
> > +#else
> > const uint16_t *w1 = (const uint16_t *)ea1;
> > const uint16_t *w2 = (const uint16_t *)ea2;
> >
> > return ((w1[0] ^ w2[0]) | (w1[1] ^ w2[1]) | (w1[2] ^ w2[2])) == 0;
> > +#endif
> > }
>
> Is this actually faster?
It's a simple micro-optimization, so I haven't benchmarked it.
On x86-64, the compiled function becomes simpler and shrinks from 34 to 24 bytes (first the current version, then the patched one):
00000000004ed650 <review_rte_is_same_ether_addr>:
4ed650: 0f b7 07 movzwl (%rdi),%eax
4ed653: 0f b7 57 02 movzwl 0x2(%rdi),%edx
4ed657: 66 33 06 xor (%rsi),%ax
4ed65a: 66 33 56 02 xor 0x2(%rsi),%dx
4ed65e: 09 d0 or %edx,%eax
4ed660: 0f b7 57 04 movzwl 0x4(%rdi),%edx
4ed664: 66 33 56 04 xor 0x4(%rsi),%dx
4ed668: 66 09 d0 or %dx,%ax
4ed66b: 0f 94 c0 sete %al
4ed66e: 0f b6 c0 movzbl %al,%eax
4ed671: c3 ret
4ed672: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
4ed679: 00 00 00 00
4ed67d: 0f 1f 00 nopl (%rax)
00000000004ed680 <rte_is_same_ether_addr_improved>:
4ed680: 0f b7 47 04 movzwl 0x4(%rdi),%eax
4ed684: 66 33 46 04 xor 0x4(%rsi),%ax
4ed688: 8b 17 mov (%rdi),%edx
4ed68a: 33 16 xor (%rsi),%edx
4ed68c: 0f b7 c0 movzwl %ax,%eax
4ed68f: 09 c2 or %eax,%edx
4ed691: 0f 94 c0 sete %al
4ed694: 0f b6 c0 movzbl %al,%eax
4ed697: c3 ret
4ed698: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
4ed69f: 00
For reference, a memcpy() of 6 bytes (with the size as a compile-time constant) also compiles to a 4-byte and a 2-byte operation, not three 2-byte operations.
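The same 4-byte plus 2-byte comparison can be written portably with fixed-size memcpy() loads, which ties in with the observation above. This is a sketch under stated assumptions, not the patch itself. `struct ether_addr6` is a hypothetical stand-in for `struct rte_ether_addr`. memcpy() replaces the `unaligned_uint32_t` and `uint16_t` pointer casts, so the loads are well-defined regardless of alignment and strict aliasing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_ether_addr: 6 address bytes. */
struct ether_addr6 {
	uint8_t addr_bytes[6];
};

/*
 * Compare two 6-byte addresses with one 4-byte and one 2-byte XOR,
 * mirroring the patch. memcpy() into local integers expresses the
 * loads without assuming alignment or breaking strict aliasing.
 */
static inline int
same_addr_4x2(const struct ether_addr6 *ea1, const struct ether_addr6 *ea2)
{
	uint32_t a4, b4; /* bytes 0..3 */
	uint16_t a2, b2; /* bytes 4..5 */

	memcpy(&a4, &ea1->addr_bytes[0], sizeof(a4));
	memcpy(&b4, &ea2->addr_bytes[0], sizeof(b4));
	memcpy(&a2, &ea1->addr_bytes[4], sizeof(a2));
	memcpy(&b2, &ea2->addr_bytes[4], sizeof(b2));

	/* Any differing bit leaves a nonzero bit in the OR of the XORs. */
	return ((a4 ^ b4) | (uint32_t)(a2 ^ b2)) == 0;
}
```

Because the sizes are compile-time constants, GCC and Clang usually lower these memcpy() calls to plain loads when optimizing. The exact instructions still depend on the compiler and the target.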
> For architectures that support strict alignment, this looks like something
> that the compilers should be doing using proper cost-benefit evaluation
> based on target architecture, rather than us doing it in our code.
I agree with the high-level message in your comment.
DPDK contains some manual optimizations from back in the day, and the evolution of compilers has made some of them obsolete.
In this case, GCC doesn't optimize it, so I did it manually.
I haven't checked whether other compilers are clever enough to do it.
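One way to try the approach Bruce suggests is to state only the intent and let the compiler choose the access sizes. The sketch below is illustrative and not part of the patch. `struct ether_addr6` is a hypothetical stand-in for `struct rte_ether_addr`. This thread does not establish whether GCC, Clang or another compiler expands this memcmp() into a 4-byte and a 2-byte comparison on a given target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_ether_addr: 6 address bytes. */
struct ether_addr6 {
	uint8_t addr_bytes[6];
};

/*
 * Equality-only comparison with a constant size; the compiler is free
 * to expand it inline with whatever load widths suit the target.
 */
static inline int
same_addr_memcmp(const struct ether_addr6 *ea1, const struct ether_addr6 *ea2)
{
	return memcmp(ea1->addr_bytes, ea2->addr_bytes,
		      sizeof(ea1->addr_bytes)) == 0;
}
```

To see whether a compiler reaches the same code without the manual #if, compare the objdump output of this function with the listings above.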
Thread overview: 14+ messages
2026-01-30 10:46 [PATCH 1/2] net: ethernet address comparison optimizations Morten Brørup
2026-01-30 10:46 ` [PATCH 2/2] [RFC] net: introduce fast ethernet address comparison function Morten Brørup
2026-01-30 14:03 ` Morten Brørup
2026-01-30 10:52 ` [PATCH 1/2] net: ethernet address comparison optimizations Bruce Richardson
2026-01-30 11:16 ` Morten Brørup [this message]
2026-01-30 11:26 ` Bruce Richardson
2026-01-30 13:54 ` Morten Brørup
2026-01-30 14:02 ` Bruce Richardson
2026-01-30 14:25 ` Morten Brørup
2026-01-30 14:32 ` Bruce Richardson
2026-01-30 14:59 ` Morten Brørup
2026-01-30 16:20 ` Stephen Hemminger
2026-01-30 16:24 ` Bruce Richardson
2026-01-30 16:31 ` Konstantin Ananyev