From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: "Morten Brørup" <mb@smartsharesystems.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"Stephen Hemminger" <stephen@networkplumber.org>,
"Wathsala Vithanage" <wathsala.vithanage@arm.com>,
Fengchengwen <fengchengwen@huawei.com>,
"Bruce Richardson" <bruce.richardson@intel.com>
Subject: RE: [PATCH v5] mbuf: optimize segment prefree
Date: Thu, 23 Oct 2025 15:27:25 +0000 [thread overview]
Message-ID: <e12aa6583cb34a3595fd518c187bf144@huawei.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F654FB@smartserver.smartshare.dk>
> > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > Sent: Thursday, 23 October 2025 16.05
> >
> > > > From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> > > > Sent: Thursday, 23 October 2025 10.51
> > > >
> > > > > -#define RTE_MBUF_DIRECT(mb) \
> > > > > - (!((mb)->ol_flags & (RTE_MBUF_F_INDIRECT |
> > RTE_MBUF_F_EXTERNAL)))
> > > > > + *
> > > > > + * Note: Macro optimized for code size.
> > > > > + *
> > > > > + * The plain macro would be:
> > > > > + * #define RTE_MBUF_DIRECT(mb) \
> > > > > + * (!((mb)->ol_flags & (RTE_MBUF_F_INDIRECT |
> > > > RTE_MBUF_F_EXTERNAL)))
> > > > > + *
> > > > > + * The flags RTE_MBUF_F_INDIRECT and RTE_MBUF_F_EXTERNAL are
> > both in
> > > > the
> > > > > MSB (most significant
> > > > > + * byte) of the 64-bit ol_flags field, so we only compare this
> > one
> > > > byte instead of all
> > > > > 64 bits.
> > > > > + *
> > > > > + * E.g., GCC version 16.0.0 20251019 (experimental) generates
> > the
> > > > following code
> > > > > for x86-64.
> > > > > + *
> > > > > + * With the plain macro, 17 bytes of instructions:
> > > > > + * movabs rax,0x6000000000000000 // 10 bytes
> > > > > + * and rax,QWORD PTR [rdi+0x18] // 4 bytes
> > > > > + * sete al // 3 bytes
> > > > > + * With this optimized macro, only 7 bytes of instructions:
> > > > > + * test BYTE PTR [rdi+0x1f],0x60 // 4 bytes
> > > > > + * sete al // 3 bytes
> > > > > + */
> > > > > +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> > > > > +/* On little endian architecture, the MSB of a 64-bit integer is
> > at
> > > > byte offset 7. */
> > > > > +#define RTE_MBUF_DIRECT(mb) !(((const char *)(&(mb)-
> > > > >ol_flags))[7] & 0x60)
> > > > > +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> > > > > +/* On big endian architecture, the MSB of a 64-bit integer is at
> > > > byte offset 0. */
> > > > > +#define RTE_MBUF_DIRECT(mb) !(((const char *)(&(mb)-
> > > > >ol_flags))[0] & 0x60)
> > > >
> > > > A stupid q: why then not simply do:
> > > > (mb->ol_flags >> 56) & 0x60
> > > > then?
> > > > Should help to all these LE/BE casts, etc.
> > >
> > > GCC is too stupid for that too.
> > >
> > > Playing around with Godbolt shows that
> > > return !((char)(p[3] >> 56) & 0x60);
> > > becomes
> > > movzx eax,BYTE PTR [rdi+0x1f] // 4 bytes
> > > test al,0x60 // 2 bytes
> > > Instead of simply
> > > test BYTE PTR [rdi+0x1f],0x60 // 4 bytes
> >
> > And these 2 extra bytes in instructions, are that really that critical?
> > My guess, we wouldn't notice any real diff here.
>
> The optimized macro made the common code path of the refactored
> rte_pktmbuf_prefree_seg() fit into one cache line.
> IIRC, all 10 bytes saving were required for this.
I understand that. but is that change will provide a measurable impact,
in terms of cycles/op or pps or so?
> > But if it really is, can I ask you to create a new define for 0x60,
> > to avoid hardcoded constants in the code?
> > Might be something like
> > #define RTE_MBUF_F_INDIRECT_EXTERNAL_1B ...
> > or so.
>
> I started out using the field names, but Bruce suggested using 0x60 for
> readability, making the macros shorter, which IMO looks good.
>
> I don't like adding special names just for this, so either we stick with 0x60 or go for
> "(char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 *
> CHAR_BIT))", something like this:
My vote would be to use the construction above.
Might be put it in a new macro for readability.
Konstantin
> #ifdef __DOXYGEN__
> #define RTE_MBUF_DIRECT(mb) \
> !(((const char *)(&(mb)->ol_flags))[MSB_OFFSET /* 7 or 0, depending on
> endianness */] & \
> (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 *
> CHAR_BIT)))
> #else /* !__DOXYGEN__ */
> #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> /* On little endian architecture, the MSB of a 64-bit integer is at byte offset 7. */
> #define RTE_MBUF_DIRECT(mb) \
> !(((const char *)(&(mb)->ol_flags))[7] & \
> (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 *
> CHAR_BIT)))
> #elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> /* On big endian architecture, the MSB of a 64-bit integer is at byte offset 0. */
> #define RTE_MBUF_DIRECT(mb) \
> !(((const char *)(&(mb)->ol_flags))[0] & \
> (char)((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) >> (7 *
> CHAR_BIT)))
> #endif /* RTE_BYTE_ORDER */
> #endif /* !__DOXYGEN__ */
> /* Verify the optimization above. */
> static_assert(((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) &
> (UINT64_C(0xFF) << (7 * CHAR_BIT))) ==
> (RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL),
> "(RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) is not at MSB");
>
> > Konstantin
> >
> > > Good suggestion, though.
> > >
> > > >
> > > > > +#endif
> > > > > +/* Verify the optimization above. */
> > > > > +static_assert((RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) ==
> > > > > UINT64_C(0x60) << (7 * CHAR_BIT),
> > > > > + "(RTE_MBUF_F_INDIRECT | RTE_MBUF_F_EXTERNAL) is not 0x60
> at
> > > > MSB");
> > > > >
> > > > > /** Uninitialized or unspecified port. */
> > > > > #define RTE_MBUF_PORT_INVALID UINT16_MAX
> > > > > --
> > > > > 2.43.0
next prev parent reply other threads:[~2025-10-23 15:27 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-27 21:35 [PATCH] mbuf: optimize segment prefree Morten Brørup
2025-08-27 23:17 ` Stephen Hemminger
2025-10-06 17:46 ` Wathsala Vithanage
2025-10-06 18:26 ` Morten Brørup
2025-10-06 14:49 ` Morten Brørup
2025-10-20 12:02 ` [PATCH v2] " Morten Brørup
2025-10-20 14:24 ` Konstantin Ananyev
2025-10-21 8:38 ` fengchengwen
2025-10-22 9:08 ` Bruce Richardson
2025-10-22 13:53 ` Morten Brørup
2025-10-22 14:12 ` Bruce Richardson
2025-10-22 14:14 ` Morten Brørup
2025-10-22 13:23 ` [PATCH v3] " Morten Brørup
2025-10-22 14:47 ` [PATCH v4] " Morten Brørup
2025-10-22 15:02 ` Bruce Richardson
2025-10-22 18:28 ` Morten Brørup
2025-10-23 7:04 ` Morten Brørup
2025-10-23 8:01 ` [PATCH v5] " Morten Brørup
2025-10-23 8:08 ` Bruce Richardson
2025-10-23 8:51 ` Konstantin Ananyev
2025-10-23 11:17 ` Morten Brørup
2025-10-23 14:04 ` Konstantin Ananyev
2025-10-23 14:48 ` Morten Brørup
2025-10-23 15:27 ` Konstantin Ananyev [this message]
2025-10-23 15:46 ` Morten Brørup
2025-10-23 16:03 ` Bruce Richardson
2025-10-23 16:24 ` Morten Brørup
2025-10-23 12:48 ` [PATCH v6] " Morten Brørup
2025-10-23 17:13 ` Thomas Monjalon
2025-10-23 16:18 ` [PATCH v7] " Morten Brørup
2025-10-24 8:20 ` Konstantin Ananyev
2025-10-24 8:58 ` Morten Brørup
2025-10-24 9:21 ` Konstantin Ananyev
2025-10-24 10:14 ` Thomas Monjalon
2025-10-25 10:24 ` Konstantin Ananyev
2025-10-25 12:16 ` Morten Brørup
2025-10-27 8:54 ` Konstantin Ananyev
2025-10-27 9:00 ` Thomas Monjalon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e12aa6583cb34a3595fd518c187bf144@huawei.com \
--to=konstantin.ananyev@huawei.com \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=fengchengwen@huawei.com \
--cc=mb@smartsharesystems.com \
--cc=stephen@networkplumber.org \
--cc=wathsala.vithanage@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.