From: Mina Almasry <almasrymina@google.com>
To: Byungchul Park <byungchul@sk.com>,
"Lobakin, Aleksander" <aleksander.lobakin@intel.com>
Cc: willy@infradead.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel_team@skhynix.com, ilias.apalodimas@linaro.org,
harry.yoo@oracle.com, akpm@linux-foundation.org,
andrew+netdev@lunn.ch, asml.silence@gmail.com, toke@redhat.com,
david@redhat.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
linux-rdma@vger.kernel.org, bpf@vger.kernel.org,
vishal.moola@gmail.com, hannes@cmpxchg.org, ziy@nvidia.com,
jackmanb@google.com, wei.fang@nxp.com, shenwei.wang@nxp.com,
xiaoning.wang@nxp.com, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, anthony.l.nguyen@intel.com,
przemyslaw.kitszel@intel.com, sgoutham@marvell.com,
gakula@marvell.com, sbhatta@marvell.com, hkelam@marvell.com,
bbhushan2@marvell.com, tariqt@nvidia.com, ast@kernel.org,
daniel@iogearbox.net, hawk@kernel.org, john.fastabend@gmail.com,
sdf@fomichev.me, saeedm@nvidia.com, leon@kernel.org,
mbloch@nvidia.com, danishanwar@ti.com, rogerq@kernel.org,
nbd@nbd.name, lorenzo@kernel.org, ryder.lee@mediatek.com,
shayne.chen@mediatek.com, sean.wang@mediatek.com,
matthias.bgg@gmail.com, angelogioacchino.delregno@collabora.com,
horms@kernel.org, m-malladi@ti.com,
krzysztof.kozlowski@linaro.org,
matthias.schiffer@ew.tq-group.com, robh@kernel.org,
imx@lists.linux.dev, intel-wired-lan@lists.osuosl.org,
linux-arm-kernel@lists.infradead.org,
linux-wireless@vger.kernel.org,
linux-mediatek@lists.infradead.org
Subject: Re: [PATCH net-next v10 02/12] netmem: use netmem_desc instead of page to access ->pp in __netmem_get_pp()
Date: Wed, 16 Jul 2025 12:41:04 -0700
Message-ID: <CAHS8izMK2JA4rGNMRMqQbZtJVEP8b_QPLXzoKNeVgQFzAmdv3g@mail.gmail.com>
In-Reply-To: <20250716045124.GB12760@system.software.com>

On Tue, Jul 15, 2025 at 9:51 PM Byungchul Park <byungchul@sk.com> wrote:
>
> On Tue, Jul 15, 2025 at 12:09:34PM -0700, Mina Almasry wrote:
> > On Mon, Jul 14, 2025 at 6:36 PM Byungchul Park <byungchul@sk.com> wrote:
> > >
> > > On Mon, Jul 14, 2025 at 12:58:15PM -0700, Mina Almasry wrote:
> > > > On Mon, Jul 14, 2025 at 12:37 PM Mina Almasry <almasrymina@google.com> wrote:
> > > > >
> > > > > On Mon, Jul 14, 2025 at 5:01 AM Byungchul Park <byungchul@sk.com> wrote:
> > > > > >
> > > > > > To eliminate the use of the page pool fields in struct page, the page
> > > > > > pool code should use netmem descriptor and APIs instead.
> > > > > >
> > > > > > However, __netmem_get_pp() still accesses ->pp via struct page. So
> > > > > > change it to use struct netmem_desc instead, since ->pp will no
> > > > > > longer be available in struct page.
> > > > > >
> > > > > > While at it, add a helper, pp_page_to_nmdesc(), that can be used to
> > > > > > extract the netmem_desc from a page, but only if it's a pp page. For
> > > > > > now, since netmem_desc overlays struct page, this can be achieved by
> > > > > > a simple cast.
> > > > > >
> > > > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > > > ---
> > > > > > include/net/netmem.h | 13 ++++++++++++-
> > > > > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > > > index 535cf17b9134..2b8a7b51ac99 100644
> > > > > > --- a/include/net/netmem.h
> > > > > > +++ b/include/net/netmem.h
> > > > > > @@ -267,6 +267,17 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> > > > > > return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
> > > > > > }
> > > > > >
> > > > > > +static inline struct netmem_desc *pp_page_to_nmdesc(struct page *page)
> > > > > > +{
> > > > > > + DEBUG_NET_WARN_ON_ONCE(!page_pool_page_is_pp(page));
> > > > > > +
> > > > > > + /* XXX: How to extract netmem_desc from page must be changed,
> > > > > > + * once netmem_desc no longer overlays on page and will be
> > > > > > + * allocated through slab.
> > > > > > + */
> > > > > > + return (struct netmem_desc *)page;
> > > > > > +}
> > > > > > +
> > > > >
> > > > > Same thing. Do not create a generic-looking pp_page_to_nmdesc helper
> > > > > that does not check that the page is the correct type. The
> > > > > DEBUG_NET... is not good enough.
> > > > >
> > > > > You don't need to add a generic helper here. There is only one call
> > > > > site, and it is already marked as unsafe and only called by code that
> > > > > knows the netmem is specifically a pp page. Open-code this in that
> > > > > unsafe callsite, instead of creating a generic-looking unsafe helper
> > > > > without even documenting that it's unsafe.
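> > > > >
> > > > > E.g. something like this, open-coded in the one callsite (untested
> > > > > sketch):
> > > > >
> > > > > /* Unsafe: we already know this netmem is a pp-backed page. */
> > > > > return ((struct netmem_desc *)__netmem_to_page(netmem))->pp;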
> > > > >
> > > >
> > > > On second read through the series, I actually now think this is a
> > > > great idea :-) Adding this helper has simplified the series greatly. I
> > > > did not realize you were converting entire drivers to netmem just to
> > > > get rid of page->pp accesses. Adding a pp_page_to_nmdesc helper makes
> > > > the entire series simpler.
> > > >
> > > > You're also calling it only from code paths, like drivers, that
> > > > already assumed the page is a pp page and dereferenced page->pp
> > > > without a check, so this should be safe.
> > > >
> > > > The only thing I would change is to add a comment explaining that the
> > > > calling code needs to either check that the page is a pp page or know
> > > > that it is one (like a driver that supports pp).
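> > > >
> > > > Something like this, as an untested sketch of the comment I have in
> > > > mind (exact wording up to you):
> > > >
> > > > /* The caller must ensure @page is a pp page, either by checking
> > > >  * page_pool_page_is_pp(page) beforehand or by knowing it from
> > > >  * context, e.g. a driver whose buffers come from a page_pool.
> > > >  */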
> > > >
> > > >
> > > > > > /**
> > > > > > * __netmem_get_pp - unsafely get pointer to the &page_pool backing @netmem
> > > > > > * @netmem: netmem reference to get the pointer from
> > > > > > @@ -280,7 +291,7 @@ static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
> > > > > > */
> > > > > > static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
> > > > > > {
> > > > > > - return __netmem_to_page(netmem)->pp;
> > > > > > + return pp_page_to_nmdesc(__netmem_to_page(netmem))->pp;
> > > > > > }
> > > > >
> > > > > This makes me very sad. Casting from netmem -> page -> nmdesc...
> > > > >
> > > > > Instead, we should be able to go from netmem directly to nmdesc. I
> > > > > would suggest renaming __netmem_clear_lsb to netmem_to_nmdesc and
> > > > > having it return netmem_desc instead of net_iov. Then use it here.
> > > > >
> > > > > We could have an unsafe version of netmem_to_nmdesc which converts the
> > > > > netmem to netmem_desc without clearing the lsb and mark it unsafe.
> > > > >
> > > >
> > > > This, I think, we should address, to keep some sanity in the code,
> > > > reduce the casts, and make it a bit more maintainable.
> > >
> > > I will incorporate your suggestions. To summarize:
> > >
> > > 1) Keep the current implementation of pp_page_to_nmdesc(), but add a
> > > comment on it like: "The caller must check that the page is a pp page
> > > before calling this function, or otherwise know that it is one."
> > >
> >
> > Yes please.
> >
> > > 2) Introduce the unsafe version, __netmem_to_nmdesc(), and use it in
> > > __netmem_get_pp().
> > >
> >
> > No need, following Pavel's feedback. We can just delete
> > __netmem_get_pp. If we do find a need in the future to extract the
> > netmem_desc from a netmem_ref, I would rather we do a straight cast
> > from netmem_ref to netmem_desc than go netmem_ref -> page/net_iov ->
> > netmem_desc.
> >
> > But that seems unnecessary for this series.
>
> No. The series should remove all accesses to ->pp through struct page.
>
> I will kill __netmem_get_pp() as we both prefer. However, its users,
> i.e. libeth_xdp_return_va() and libeth_xdp_tx_fill_buf(), need to be
> altered. I will modify the code like:
>
> as is: __netmem_get_pp(netmem)
> to be: __netmem_nmdesc(netmem)->pp
>
> Is it okay with you?
>
When Pavel and I were saying 'remove __netmem_get_pp', I think we
meant to remove the entire concept of unsafe netmem -> page
conversions; I don't think either of us likes them. From that
perspective, __netmem_nmdesc(netmem)->pp is just as bad as
__netmem_get_pp(netmem).

Since the unsafe netmem-to-page casts are already in mainline, though,
let's assume they stay there until someone feels strongly enough to
remove them. The logic in Olek's patch is sound:

https://lore.kernel.org/all/20241203173733.3181246-8-aleksander.lobakin@intel.com/

Header buffer page pools always use pages and will likely remain that
way for a long time, so let's continue to support them rather than try
to remove them in this series. A follow-up series could do that.
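
For reference, the unsafe cast already in mainline looks roughly like
this (quoting include/net/netmem.h from memory, so treat it as a
sketch):

static inline struct page *__netmem_to_page(netmem_ref netmem)
{
	/* Unsafe: only valid when @netmem is known to be page-backed. */
	DEBUG_NET_WARN_ON_ONCE(netmem_is_net_iov(netmem));

	return (__force struct page *)netmem;
}
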
> > > 3) Rename __netmem_clear_lsb() to netmem_to_nmdesc(), have it return
> > > netmem_desc, and use it in all users of __netmem_clear_lsb().
> > >
> >
> > Following Pavel's comment, I think this also is not necessary for this
> > series. Cleaning up the return value of __netmem_clear_lsb is good
> > work I think, but we're already on v10 of this and it would be
> > unnecessary to ask for added cleanups. We can do the cleanup on top.
>
> However, I still need to include 'introduce __netmem_nmdesc() helper'
Yes.
> in this series since it should be used to remove __netmem_get_pp() as I
Let's keep __netmem_get_pp(), and have it do a `return
__netmem_nmdesc(netmem)->pp;`. In general we avoid letting drivers do
any netmem casts themselves; any casting is done in core.
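
In other words, something like this (untested sketch, assuming the
unsafe __netmem_nmdesc() helper you describe below):

static inline struct page_pool *__netmem_get_pp(netmem_ref netmem)
{
	/* Unsafe: the caller must know @netmem is page-backed (lsb
	 * clear) and owned by a page pool.
	 */
	return __netmem_nmdesc(netmem)->pp;
}
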
> described above. I think I'd better add netmem_nmdesc() too while at it.
>
Yes. netmem_nmdesc should replace __netmem_clear_lsb.
> I assume __netmem_nmdesc() is an unsafe version that does not clear the lsb. The
Yes.
> safe version, netmem_nmdesc(), needs an additional operation to clear the lsb.
Yes.
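
To make sure we're on the same page, roughly (untested sketch; the
casts may need __force annotations to keep sparse happy):

/* Unsafe: assumes @netmem is page-backed, i.e. the NET_IOV lsb is not
 * set, so no masking is needed.
 */
static inline struct netmem_desc *__netmem_nmdesc(netmem_ref netmem)
{
	return (__force struct netmem_desc *)netmem;
}

/* Safe: handles both page-backed and net_iov netmem by clearing the
 * NET_IOV lsb before the cast, mirroring what __netmem_clear_lsb()
 * does today.
 */
static inline struct netmem_desc *netmem_nmdesc(netmem_ref netmem)
{
	return (struct netmem_desc *)((__force unsigned long)netmem & ~NET_IOV);
}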
--
Thanks,
Mina