From: Byungchul Park <byungchul@sk.com>
To: Mina Almasry <almasrymina@google.com>
Cc: willy@infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel_team@skhynix.com, kuba@kernel.org,
	ilias.apalodimas@linaro.org, harry.yoo@oracle.com,
	hawk@kernel.org, akpm@linux-foundation.org, davem@davemloft.net,
	john.fastabend@gmail.com, andrew+netdev@lunn.ch,
	asml.silence@gmail.com, toke@redhat.com, tariqt@nvidia.com,
	edumazet@google.com, pabeni@redhat.com, saeedm@nvidia.com,
	leon@kernel.org, ast@kernel.org, daniel@iogearbox.net,
	david@redhat.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, horms@kernel.org,
	linux-rdma@vger.kernel.org, bpf@vger.kernel.org,
	vishal.moola@gmail.com
Subject: Re: [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov
Date: Wed, 28 May 2025 14:03:46 +0900
Message-ID: <20250528050346.GA59539@system.software.com>
In-Reply-To: <CAHS8izMvRrG2wpE7HEyK3t544-wN_h3SC8nGabCoPWj1qCv_ag@mail.gmail.com>

On Tue, May 27, 2025 at 08:47:54PM -0700, Mina Almasry wrote:
> On Tue, May 27, 2025 at 6:22 PM Byungchul Park <byungchul@sk.com> wrote:
> >
> > On Tue, May 27, 2025 at 01:03:32PM -0700, Mina Almasry wrote:
> > > On Mon, May 26, 2025 at 7:50 PM Byungchul Park <byungchul@sk.com> wrote:
> > > >
> > > > On Fri, May 23, 2025 at 12:25:52PM +0900, Byungchul Park wrote:
> > > > > To simplify struct page, the page pool members of struct page should be
> > > > > moved elsewhere, allowing these members to be removed from struct page.
> > > > >
> > > > > Introduce a network memory descriptor, struct netmem_desc, to store the
> > > > > members, reusing struct net_iov, which already mirrors struct page.
> > > > >
> > > > > While at it, relocate _pp_mapping_pad to group struct net_iov's fields.
> > > > >
> > > > > Signed-off-by: Byungchul Park <byungchul@sk.com>
> > > > > ---
> > > > >  include/linux/mm_types.h |  2 +-
> > > > >  include/net/netmem.h     | 43 +++++++++++++++++++++++++++++++++-------
> > > > >  2 files changed, 37 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > > > index 56d07edd01f9..873e820e1521 100644
> > > > > --- a/include/linux/mm_types.h
> > > > > +++ b/include/linux/mm_types.h
> > > > > @@ -120,13 +120,13 @@ struct page {
> > > > >                       unsigned long private;
> > > > >               };
> > > > >               struct {        /* page_pool used by netstack */
> > > > > +                     unsigned long _pp_mapping_pad;
> > > > >                       /**
> > > > >                        * @pp_magic: magic value to avoid recycling non
> > > > >                        * page_pool allocated pages.
> > > > >                        */
> > > > >                       unsigned long pp_magic;
> > > > >                       struct page_pool *pp;
> > > > > -                     unsigned long _pp_mapping_pad;
> > > > >                       unsigned long dma_addr;
> > > > >                       atomic_long_t pp_ref_count;
> > > > >               };
> > > > > diff --git a/include/net/netmem.h b/include/net/netmem.h
> > > > > index 386164fb9c18..08e9d76cdf14 100644
> > > > > --- a/include/net/netmem.h
> > > > > +++ b/include/net/netmem.h
> > > > > @@ -31,12 +31,41 @@ enum net_iov_type {
> > > > >  };
> > > > >
> > > > >  struct net_iov {
> > > > > -     enum net_iov_type type;
> > > > > -     unsigned long pp_magic;
> > > > > -     struct page_pool *pp;
> > > > > -     struct net_iov_area *owner;
> > > > > -     unsigned long dma_addr;
> > > > > -     atomic_long_t pp_ref_count;
> > > > > +     /*
> > > > > +      * XXX: Now that struct netmem_desc overlays on struct page,
> > > > > +      * struct_group_tagged() should cover all of them.  However,
> > > > > +      * a separate struct netmem_desc should be declared and embedded,
> > > > > +      * once struct netmem_desc is no longer overlayed but it has its
> > > > > +      * own instance from slab.  The final form should be:
> > > > > +      *
> > > > > +      *    struct netmem_desc {
> > > > > +      *         unsigned long pp_magic;
> > > > > +      *         struct page_pool *pp;
> > > > > +      *         unsigned long dma_addr;
> > > > > +      *         atomic_long_t pp_ref_count;
> > > > > +      *    };
> > > > > +      *
> > > > > +      *    struct net_iov {
> > > > > +      *         enum net_iov_type type;
> > > > > +      *         struct net_iov_area *owner;
> > > > > +      *         struct netmem_desc;
> > > > > +      *    };
> > > > > +      */
> > > > > +     struct_group_tagged(netmem_desc, desc,
> > > >
> > > > So..  For now, this is the best option we can pick.  We can do all that
> > > > you told me once struct netmem_desc has its own instance from slab.
> > > >
> > > > Again, it's because the page pool fields (or netmem things) in struct
> > > > page will be removed by this series.
> > > >
> > > > Mina, thoughts?
> > > >
> > >
> > > Can you please post an updated series with the approach you have in
> > > mind? I think this series as-is seems broken vis-à-vis the
> > > _pp_mapping_pad field move, which looks incorrect. Pavel and I have also
> > > commented on patch 18 that removing the ASSERTs seems incorrect, as
> > > it breaks the symmetry between struct page and struct net_iov.
> >
> > I told you I will fix it.  I will send the updated series shortly for
> > *review*.  It will be for review only, since this work can only be
> > completed once the following series have landed:
> >
> >    https://lore.kernel.org/all/20250520205920.2134829-2-anthony.l.nguyen@intel.com/
> >    https://lore.kernel.org/all/1747950086-1246773-9-git-send-email-tariqt@nvidia.com/
> >
> > > It's not clear to me if the fields are being removed from struct page,
> > > where are they going... the approach ptdesc for example has taken is
> >
> > They are going to struct net_iov.

Precisely speaking, to 'struct netmem_desc'.
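For readers following the struct_group_tagged() approach: the macro wraps a run of members in an anonymous union, so the same storage is reachable both as plain members of the enclosing struct and through a named member whose type is a new struct tag. Below is a minimal userspace sketch with a simplified stand-in for the macro (the kernel's own definition lives in include/linux/stddef.h and differs in detail). The member list is assumed, taken from the "final form" comment in the quoted diff, because the diff is cut off before the real argument list; atomic_long_t is replaced by long, and net_iov_desc() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct_group_tagged(). */
#define STRUCT_GROUP_TAGGED(TAG, NAME, ...)		\
	union {						\
		struct { __VA_ARGS__ };			\
		struct TAG { __VA_ARGS__ } NAME;	\
	}

struct page_pool;		/* opaque in this sketch */
struct net_iov_area;		/* opaque in this sketch */

enum net_iov_type { NET_IOV_DMABUF };	/* enumerator list trimmed */

struct net_iov {
	enum net_iov_type type;
	/* Assumed member list, taken from the "final form" comment. */
	STRUCT_GROUP_TAGGED(netmem_desc, desc,
		unsigned long pp_magic;
		struct page_pool *pp;
		unsigned long dma_addr;
		long pp_ref_count;	/* atomic_long_t in the kernel */
	);
	struct net_iov_area *owner;
};

/* Hypothetical helper: expose the grouped members as a struct netmem_desc. */
static inline struct netmem_desc *net_iov_desc(struct net_iov *niov)
{
	return &niov->desc;
}

/* niov->pp and niov->desc.pp name the same bytes. */
static_assert(offsetof(struct net_iov, pp) == offsetof(struct net_iov, desc.pp),
	      "both views must alias");
```

Existing code that writes niov->pp keeps compiling, while new code can pass a struct netmem_desc * around; in this sketch that is what lets the new type appear without a separate allocation.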

> Oh. I see. My gut reaction is I'm not sure moving the page_pool fields
> to struct net_iov will work.
> 
> struct net_iov shares some fields with struct page, but abstractly
> it's very different.
> 
> struct page is allocated by the mm stack via things like alloc_pages
> and can be passed to mm APIs such as put_page() (called from
> skb_frag_ref) and vm_insert_batch (called from
> tcp_zerocopy_vm_insert_batch_error).
>
> struct net_iov is kvmalloc'ed by networking code (see
> net_devmem_bind_dmabuf for example), and *must not* be passed to any
> mm APIs as it's not a struct page at all. Accidentally calling
> vm_insert_batch on a struct net_iov will cause a kernel crash or some
> memory corruption.
>
> Thus, abstractly different things maybe should not share the same
> in-kernel struct.
>
> One thing that maybe could work is if struct net_iov has a field in it
> which tells us whether it's actually a struct page that can be passed
> to mm APIs, or not a struct page which cannot be passed to mm APIs.
> 
> > Or I should introduce another struct
> 
> Maybe introducing another struct is the answer. I'm not sure. The net

The final form should look like this:

   struct netmem_desc {
      struct page_pool *pp;
      unsigned long dma_addr;
      atomic_long_t ref_count;
   };

   struct net_iov {
      struct netmem_desc;
      enum net_iov_type type;
      struct net_iov_area *owner;
      ...
   };

However, as long as it has to overlay struct page, struct netmem_desc
would be almost the same as struct net_iov.  So I'm not sure we should
introduce struct netmem_desc as a separate struct alongside struct
net_iov.
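Because struct net_iov has to overlay the page_pool view of struct page, every shared field must sit at the same offset in both. A minimal way to enforce that is a compile-time offsetof() comparison, similar in spirit to the offset ASSERTs that Mina mentions patch 18 removes. The structs below are hypothetical reduced models (only the words the page_pool view uses, atomic_long_t replaced by long, the net_iov type enum modeled as unsigned long); they are not the kernel definitions, and ASSERT_OVERLAY() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

struct page_pool;
struct net_iov_area;

/* Hypothetical reduced model of struct page: the flags word plus the
 * page_pool view of the first union, in the pre-patch field order. */
struct mock_page {
	unsigned long flags;
	struct {
		unsigned long pp_magic;
		struct page_pool *pp;
		unsigned long _pp_mapping_pad;
		unsigned long dma_addr;
		long pp_ref_count;	/* atomic_long_t in the kernel */
	};
};

/* Pre-patch struct net_iov field order, as quoted in the diff. */
struct net_iov {
	unsigned long type;		/* an enum in the kernel; flags slot */
	unsigned long pp_magic;
	struct page_pool *pp;
	struct net_iov_area *owner;	/* overlays _pp_mapping_pad */
	unsigned long dma_addr;
	long pp_ref_count;
};

/* Compile-time check that a struct page field and its net_iov twin line up. */
#define ASSERT_OVERLAY(pg, iov)					\
	static_assert(offsetof(struct mock_page, pg) ==		\
		      offsetof(struct net_iov, iov), #pg " vs " #iov)

ASSERT_OVERLAY(pp_magic, pp_magic);
ASSERT_OVERLAY(pp, pp);
ASSERT_OVERLAY(_pp_mapping_pad, owner);
ASSERT_OVERLAY(dma_addr, dma_addr);
ASSERT_OVERLAY(pp_ref_count, pp_ref_count);
```

If the page_pool view of struct mock_page is reordered the way the quoted diff reorders struct page (moving _pp_mapping_pad to the front), the pp_magic, pp and _pp_mapping_pad/owner checks stop compiling until struct net_iov is rearranged to match.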

> stack today already supports struct page and struct net_iov, with
> netmem_ref acting as an abstraction over both. Adding a 3rd struct and
> adding more checks to test if page or net_iov or something new will
> add overhead.

So I think the current form in this patch is a good option we can take
for now.
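On Mina's point about a field that says whether something is really a struct page: netmem_ref already carries that information as a tag in bit 0 of the pointer, which is free because both structs are word aligned. The following is a userspace model of that idea only. The helper names follow include/net/netmem.h, but the bodies are simplified (no sparse annotations or warnings), netmem_ref is modeled as uintptr_t, and the struct bodies are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bodies; the real structs live in mm_types.h and netmem.h. */
struct page	{ unsigned long flags; };
struct net_iov	{ unsigned long type; };

typedef uintptr_t netmem_ref;		/* unsigned long __bitwise in the kernel */
#define NET_IOV ((uintptr_t)1)		/* bit 0 set: not a struct page */

static inline bool netmem_is_net_iov(netmem_ref netmem)
{
	return netmem & NET_IOV;
}

static inline netmem_ref page_to_netmem(struct page *page)
{
	/* Word alignment keeps bit 0 free for the tag. */
	assert(((uintptr_t)page & NET_IOV) == 0);
	return (netmem_ref)page;
}

static inline netmem_ref net_iov_to_netmem(struct net_iov *niov)
{
	return (netmem_ref)niov | NET_IOV;
}

/* Only real pages come back; a net_iov yields NULL, never a fake page. */
static inline struct page *netmem_to_page(netmem_ref netmem)
{
	return netmem_is_net_iov(netmem) ? NULL : (struct page *)netmem;
}

static inline struct net_iov *netmem_to_net_iov(netmem_ref netmem)
{
	return netmem_is_net_iov(netmem) ?
	       (struct net_iov *)(netmem & ~NET_IOV) : NULL;
}
```

In this model, code that must hand memory to mm APIs such as put_page() goes through netmem_to_page() and gets NULL for a net_iov rather than a pointer it must not dereference.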

> An additional problem is that there are probably hundreds or thousands
> of references to 'page' in the net stack and drivers. I'm not sure
> what you're going to do about those. Are you converting all those to
> netmem or netmem_desc?

No.  I will convert only the page pool references.

	Byungchul

> 
> -- 
> Thanks,
> Mina


Thread overview: 72+ messages
2025-05-23  3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
2025-05-23  3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
2025-05-23  9:01   ` Toke Høiland-Jørgensen
2025-05-26  0:56     ` Byungchul Park
2025-05-23 17:00   ` Mina Almasry
2025-05-26  1:15     ` Byungchul Park
2025-05-27  2:50   ` Byungchul Park
2025-05-27 20:03     ` Mina Almasry
2025-05-28  1:21       ` Byungchul Park
2025-05-28  3:47         ` Mina Almasry
2025-05-28  5:03           ` Byungchul Park [this message]
2025-05-28  7:43             ` Pavel Begunkov
2025-05-28  8:17               ` Byungchul Park
2025-05-28  7:38         ` Pavel Begunkov
2025-05-23  3:25 ` [PATCH 02/18] netmem: introduce netmem alloc APIs to wrap page alloc APIs Byungchul Park
2025-05-23  3:25 ` [PATCH 03/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order() Byungchul Park
2025-05-23  3:25 ` [PATCH 04/18] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem() Byungchul Park
2025-05-23  3:25 ` [PATCH 05/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow() Byungchul Park
2025-05-23  3:25 ` [PATCH 06/18] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
2025-05-28  3:18   ` Mina Almasry
2025-05-23  3:25 ` [PATCH 07/18] page_pool: use netmem put API in page_pool_return_netmem() Byungchul Park
2025-05-23  3:25 ` [PATCH 08/18] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
2025-05-23  3:26 ` [PATCH 09/18] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem() Byungchul Park
2025-05-23  3:26 ` [PATCH 10/18] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
2025-05-23  3:26 ` [PATCH 11/18] mlx4: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23  3:26 ` [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp() Byungchul Park
2025-05-23  8:58   ` Toke Høiland-Jørgensen
2025-05-23 17:21   ` Mina Almasry
2025-05-26  2:23     ` Byungchul Park
2025-05-26  2:36       ` Byungchul Park
2025-05-26  8:40         ` Toke Høiland-Jørgensen
2025-05-26  9:43           ` Byungchul Park
2025-05-26  9:54             ` Toke Høiland-Jørgensen
2025-05-26 10:01               ` Byungchul Park
2025-05-28  5:14               ` Byungchul Park
2025-05-28  7:35                 ` Toke Høiland-Jørgensen
2025-05-28  8:15                   ` Byungchul Park
2025-05-28  7:51       ` Pavel Begunkov
2025-05-28  8:14         ` Byungchul Park
2025-05-28  9:07           ` Pavel Begunkov
2025-05-28  9:14             ` Byungchul Park
2025-05-28  9:20               ` Pavel Begunkov
2025-05-28  9:33                 ` Byungchul Park
2025-05-28  9:51                   ` Pavel Begunkov
2025-05-28 10:44                     ` Byungchul Park
2025-05-28 10:54                       ` Pavel Begunkov
2025-05-23  3:26 ` [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23 17:13   ` Mina Almasry
2025-05-26  3:08     ` Byungchul Park
2025-05-26  8:12       ` Byungchul Park
2025-05-26 18:00         ` Mina Almasry
2025-05-23  3:26 ` [PATCH 14/18] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
2025-05-23 17:14   ` Mina Almasry
2025-05-23  3:26 ` [PATCH 15/18] netmem: remove __netmem_get_pp() Byungchul Park
2025-05-23  3:26 ` [PATCH 16/18] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
2025-05-23  3:26 ` [PATCH 17/18] netdevsim: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23  3:26 ` [PATCH 18/18] mm, netmem: remove the page pool members in struct page Byungchul Park
2025-05-23 17:16   ` kernel test robot
2025-05-23 17:55   ` Mina Almasry
2025-05-26  1:37     ` Byungchul Park
2025-05-26 16:58       ` Pavel Begunkov
2025-05-26 17:33         ` Mina Almasry
2025-05-27  1:02         ` Byungchul Park
2025-05-27  1:31           ` Byungchul Park
2025-05-27  5:30           ` Pavel Begunkov
2025-05-27 17:38             ` Mina Almasry
2025-05-28  1:31               ` Byungchul Park
2025-05-28  7:21               ` Pavel Begunkov
2025-05-23  6:20 ` [PATCH 00/18] Split netmem from " Taehee Yoo
2025-05-23  7:47   ` Byungchul Park
2025-05-23 17:47 ` SeongJae Park
2025-05-26  1:16   ` Byungchul Park
