From: Thomas Monjalon <thomas@monjalon.net>
To: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Cc: "Morten Brørup" <mb@smartsharesystems.com>,
"Bruce Richardson" <bruce.richardson@intel.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Stephen Hemminger" <stephen@networkplumber.org>,
"Wathsala Vithanage" <wathsala.vithanage@arm.com>,
Fengchengwen <fengchengwen@huawei.com>
Subject: Re: [PATCH v7] mbuf: optimize segment prefree
Date: Mon, 27 Oct 2025 10:00:45 +0100 [thread overview]
Message-ID: <8731469.qOBuL9xsDt@thomas> (raw)
In-Reply-To: <74c5f0883b1943068e7348ca8e8a78da@huawei.com>
27/10/2025 09:54, Konstantin Ananyev:
> > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > > Sent: Saturday, 25 October 2025 12.25
> > >
> > > > >
> > > > > > > > Refactored rte_pktmbuf_prefree_seg() for both performance and
> > > > > > > readability.
> > > > > > > >
> > > > > > > > With the optimized RTE_MBUF_DIRECT() macro, the common likely
> > > code
> > > > > > > path
> > > > > > > > now fits within one instruction cache line on x86-64 when
> > > built with
> > > > > > > GCC.
> > > > > > > >
> > > > > > > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > > > > > > Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > > > > > > > Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> > > > > > > > Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
> > > > > > > > ---
> > > > > > > > v7:
> > > > > > > > * Go back to long names instead of numerical value in
> > > > > > > RTE_MBUF_DIRECT()
> > > > > > > > macro.
> > > > > > > > (Konstantin Ananyev)
> > > > > > > > * Updated static_assert() accordingly.
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > > > *
> > > > > > > > * If a mbuf embeds its own data after the rte_mbuf
> > > structure, this
> > > > > > > mbuf
> > > > > > > > * can be defined as a direct mbuf.
> > > > > > > > - */
> > > > > > > > + *
> > > > > > > > + * Note: Macro optimized for code size.
> > > > > > > > + *
> > > > > > > > + * The plain macro would be:
> > > > > > > > + * \code{.c}
> > > > > > > > + * #define RTE_MBUF_DIRECT(mb) \
> > > > > > > > + * (!((mb)->ol_flags & (RTE_MBUF_F_INDIRECT |
> > > > > > > > RTE_MBUF_F_EXTERNAL)))
> > > > > > > > + * \endcode
> > > > > > > > + *
> > > > > > > > + * The flags RTE_MBUF_F_INDIRECT and RTE_MBUF_F_EXTERNAL are
> > > both
> > > > in
> > > > > > > > the MSB (most significant
> > > > > > > > + * byte) of the 64-bit ol_flags field, so we only compare
> > > this one
> > > > > > > byte instead of
> > > > > > > > all 64 bits.
> > > > > > > > + *
> > > > > > > > + * E.g., GCC version 16.0.0 20251019 (experimental)
> > > generates the
> > > > > > > following
> > > > > > > > code for x86-64.
> > > > > > > > + *
> > > > > > > > + * With the plain macro, 17 bytes of instructions:
> > > > > > > > + * \code
> > > > > > > > + * movabs rax,0x6000000000000000 // 10 bytes
> > > > > > > > + * and rax,QWORD PTR [rdi+0x18] // 4 bytes
> > > > > > > > + * sete al // 3 bytes
> > > > > > > > + * \endcode
> > > > > > > > + * With this optimized macro, only 7 bytes of instructions:
> > > > > > > > + * \code
> > > > > > > > + * test BYTE PTR [rdi+0x1f],0x60 // 4 bytes
> > > > > > > > + * sete al // 3 bytes
> > > > > > > > + * \endcode
> > > > > > > > + */
> > > > > > > > +#ifdef __DOXYGEN__
> > > > > > > > +#define RTE_MBUF_DIRECT(mb) \
> > > > > > > > + !(((const char *)(&(mb)->ol_flags))[MSB_OFFSET /* 7 or 0,
> > > > > > > depending on
> > > > > > > > endianness */] & \
> > > > > > > > + (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 *
> > > > > > > > CHAR_BIT)))
> > > > > > > > +#else /* !__DOXYGEN__ */
> > > > > > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > > > > > > +/* On little endian architecture, the MSB of a 64-bit
> > > integer is at
> > > > > > > byte offset 7. */
> > > > > > > > +#define RTE_MBUF_DIRECT(mb) \
> > > > > > > > + !(((const char *)(&(mb)->ol_flags))[7] & \
> > > > > > > > + (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 *
> > > > > > > > CHAR_BIT)))
> > > > > > > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> > > > > > > > +/* On big endian architecture, the MSB of a 64-bit integer
> > > is at
> > > > > > > byte offset 0. */
> > > > > > > > #define RTE_MBUF_DIRECT(mb) \
> > > > > > > > - (!((mb)->ol_flags & (RTE_MBUF_F_INDIRECT |
> > > > > > > > RTE_MBUF_F_EXTERNAL)))
> > > > > > > > + !(((const char *)(&(mb)->ol_flags))[0] & \
> > > > > > > > + (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 *
> > > > > > > > CHAR_BIT)))
> > > > > > > > +#endif /* RTE_BYTE_ORDER */
> > > > > > > > +#endif /* !__DOXYGEN__ */
> > > > > > > > +/* Verify the optimization above. */
> > > > > > > > +static_assert(((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) &
> > > > > > > > (UINT64_C(0xFF) << (7 * CHAR_BIT))) ==
> > > > > > > > + (RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL),
> > > > > > > > + "(RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) is not at
> > > > MSB");
> > > > > > > >
> > > > > > > > /** Uninitialized or unspecified port. */
> > > > > > > > #define RTE_MBUF_PORT_INVALID UINT16_MAX
> > > > > > > > --
> > > > > > >
> > > > > > > LGTM, thanks for refactoring.
> > > > > >
> > > > > > Thank you for reviewing, Konstantin.
> > > > > >
> > > > > > I had no preference for v7 or v6, but Bruce and Thomas preferred
> > > v6, so v6 was
> > > > > > applied.
> > > > >
> > > > > Yes, I saw Thomas email, after I sent my reply already.
> > > > > Looks like I was late with my vote.
> > > > > My preference still would be to avoid hard-coded constants in the
> > > code,
> > > > > but seems that it is just me.
> > > >
> > > > Me too I want to avoid hardcoded constants.
> > > > But in this case, it is very well documented,
> > > > and there is a trade-off with length and reading.
> > >
> > > If we allow hard-coded constants in one place,
> > > then it would be harder for us to disallow them in other places.
> > > If we allow them everywhere - code will become a mess pretty soon.
> >
> > Vector code has plenty of hardcoded constants. E.g.:
> > https://elixir.bootlin.com/dpdk/v25.07/source/drivers/net/intel/i40e/i40e_rxtx_vec
> > _avx512.c#L154
> > So we already do allow it.
>
> It is a SIMD code for specific architecture (avx512) and specific driver.
> mbuf.h is a core header in our core library - I expect the code cleanness requirements
> to be higher for it.
Feel free to propose a patch, and I'll apply it if several reviewers like it.
prev parent reply other threads:[~2025-10-27 9:00 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-27 21:35 [PATCH] mbuf: optimize segment prefree Morten Brørup
2025-08-27 23:17 ` Stephen Hemminger
2025-10-06 17:46 ` Wathsala Vithanage
2025-10-06 18:26 ` Morten Brørup
2025-10-06 14:49 ` Morten Brørup
2025-10-20 12:02 ` [PATCH v2] " Morten Brørup
2025-10-20 14:24 ` Konstantin Ananyev
2025-10-21 8:38 ` fengchengwen
2025-10-22 9:08 ` Bruce Richardson
2025-10-22 13:53 ` Morten Brørup
2025-10-22 14:12 ` Bruce Richardson
2025-10-22 14:14 ` Morten Brørup
2025-10-22 13:23 ` [PATCH v3] " Morten Brørup
2025-10-22 14:47 ` [PATCH v4] " Morten Brørup
2025-10-22 15:02 ` Bruce Richardson
2025-10-22 18:28 ` Morten Brørup
2025-10-23 7:04 ` Morten Brørup
2025-10-23 8:01 ` [PATCH v5] " Morten Brørup
2025-10-23 8:08 ` Bruce Richardson
2025-10-23 8:51 ` Konstantin Ananyev
2025-10-23 11:17 ` Morten Brørup
2025-10-23 14:04 ` Konstantin Ananyev
2025-10-23 14:48 ` Morten Brørup
2025-10-23 15:27 ` Konstantin Ananyev
2025-10-23 15:46 ` Morten Brørup
2025-10-23 16:03 ` Bruce Richardson
2025-10-23 16:24 ` Morten Brørup
2025-10-23 12:48 ` [PATCH v6] " Morten Brørup
2025-10-23 17:13 ` Thomas Monjalon
2025-10-23 16:18 ` [PATCH v7] " Morten Brørup
2025-10-24 8:20 ` Konstantin Ananyev
2025-10-24 8:58 ` Morten Brørup
2025-10-24 9:21 ` Konstantin Ananyev
2025-10-24 10:14 ` Thomas Monjalon
2025-10-25 10:24 ` Konstantin Ananyev
2025-10-25 12:16 ` Morten Brørup
2025-10-27 8:54 ` Konstantin Ananyev
2025-10-27 9:00 ` Thomas Monjalon [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8731469.qOBuL9xsDt@thomas \
--to=thomas@monjalon.net \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=fengchengwen@huawei.com \
--cc=konstantin.ananyev@huawei.com \
--cc=mb@smartsharesystems.com \
--cc=stephen@networkplumber.org \
--cc=wathsala.vithanage@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.