linux-mm.kvack.org archive mirror
From: Pavel Begunkov <asml.silence@gmail.com>
To: Byungchul Park <byungchul@sk.com>, Mina Almasry <almasrymina@google.com>
Cc: willy@infradead.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel_team@skhynix.com, kuba@kernel.org,
	ilias.apalodimas@linaro.org, harry.yoo@oracle.com,
	hawk@kernel.org, akpm@linux-foundation.org, davem@davemloft.net,
	john.fastabend@gmail.com, andrew+netdev@lunn.ch, toke@redhat.com,
	tariqt@nvidia.com, edumazet@google.com, pabeni@redhat.com,
	saeedm@nvidia.com, leon@kernel.org, ast@kernel.org,
	daniel@iogearbox.net, david@redhat.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, horms@kernel.org, linux-rdma@vger.kernel.org,
	bpf@vger.kernel.org, vishal.moola@gmail.com
Subject: Re: [PATCH 18/18] mm, netmem: remove the page pool members in struct page
Date: Mon, 26 May 2025 17:58:10 +0100	[thread overview]
Message-ID: <cae26eaa-66cf-4d1f-ae13-047fb421824a@gmail.com> (raw)
In-Reply-To: <20250526013744.GD74632@system.software.com>

On 5/26/25 02:37, Byungchul Park wrote:
> On Fri, May 23, 2025 at 10:55:54AM -0700, Mina Almasry wrote:
>> On Thu, May 22, 2025 at 8:26 PM Byungchul Park <byungchul@sk.com> wrote:
>>>
>>> Now that all the users of the page pool members in struct page have been
>>> gone, the members can be removed from struct page.
>>>
>>> However, since struct netmem_desc might still use the space in struct
>>> page, the size of struct netmem_desc should be checked, until struct
>>> netmem_desc has its own instance from slab, to avoid conflicting with
>>> other members within struct page.
>>>
>>> Remove the page pool members in struct page and add a static checker for
>>> the size.
>>>
>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>> ---
>>>   include/linux/mm_types.h | 11 -----------
>>>   include/net/netmem.h     | 28 +++++-----------------------
>>>   2 files changed, 5 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>>> index 873e820e1521..5a7864eb9d76 100644
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -119,17 +119,6 @@ struct page {
>>>                           */
>>>                          unsigned long private;
>>>                  };
>>> -               struct {        /* page_pool used by netstack */
>>> -                       unsigned long _pp_mapping_pad;
>>> -                       /**
>>> -                        * @pp_magic: magic value to avoid recycling non
>>> -                        * page_pool allocated pages.
>>> -                        */
>>> -                       unsigned long pp_magic;
>>> -                       struct page_pool *pp;
>>> -                       unsigned long dma_addr;
>>> -                       atomic_long_t pp_ref_count;
>>> -               };
>>>                  struct {        /* Tail pages of compound page */
>>>                          unsigned long compound_head;    /* Bit zero is set */
>>>                  };
>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>> index c63a7e20f5f3..257c22398d7a 100644
>>> --- a/include/net/netmem.h
>>> +++ b/include/net/netmem.h
>>> @@ -77,30 +77,12 @@ struct net_iov_area {
>>>          unsigned long base_virtual;
>>>   };
>>>
>>> -/* These fields in struct page are used by the page_pool and net stack:
>>> - *
>>> - *        struct {
>>> - *                unsigned long _pp_mapping_pad;
>>> - *                unsigned long pp_magic;
>>> - *                struct page_pool *pp;
>>> - *                unsigned long dma_addr;
>>> - *                atomic_long_t pp_ref_count;
>>> - *        };
>>> - *
>>> - * We mirror the page_pool fields here so the page_pool can access these fields
>>> - * without worrying whether the underlying fields belong to a page or net_iov.
>>> - *
>>> - * The non-net stack fields of struct page are private to the mm stack and must
>>> - * never be mirrored to net_iov.
>>> +/* XXX: The page pool fields in struct page have been removed, but
>>> + * struct netmem_desc might still use that space in struct page.  Thus,
>>> + * the size of struct netmem_desc should be kept under control until
>>> + * struct netmem_desc has its own instance from slab.
>>>    */
>>> -#define NET_IOV_ASSERT_OFFSET(pg, iov)             \
>>> -       static_assert(offsetof(struct page, pg) == \
>>> -                     offsetof(struct net_iov, iov))
>>> -NET_IOV_ASSERT_OFFSET(pp_magic, pp_magic);
>>> -NET_IOV_ASSERT_OFFSET(pp, pp);
>>> -NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
>>> -NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>> -#undef NET_IOV_ASSERT_OFFSET
>>> +static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>
>>
>> Removing these asserts is actually a bit dangerous. Functions like
>> netmem_or_pp_magic() rely on the fact that the offsets are the same
>> between struct page and struct net_iov to access these fields without
> 
> Worth noting this patch removes the page pool fields from struct page.

static inline struct net_iov *__netmem_clear_lsb(netmem_ref netmem)
{
	return (struct net_iov *)((__force unsigned long)netmem & ~NET_IOV);
}

static inline atomic_long_t *netmem_get_pp_ref_count_ref(netmem_ref netmem)
{
	return &__netmem_clear_lsb(netmem)->pp_ref_count;
}

That's a snippet of the code after applying the series. So, let's say
we take a page; it's cast to netmem, and then the netmem (as it was
before) is cast to net_iov. Before, that relied on net_iov and the pp
part of the page having the same layout, which was checked by static
asserts, but now, unless I'm mistaken, the cast is aligned in exactly
the same way but points to a seemingly random offset of the page. We
should not be doing that.

Just to be clear, I think casting pages to struct net_iov *, as it
currently is, is quite ugly, but that's something netmem_desc and this
effort can help with.

What you likely want to do is:

Patch 1:

struct page {
	unsigned long flags;
	union {
		struct_group_tagged(netmem_desc, netmem_desc) {
			// same layout as before
			...
			struct page_pool *pp;
			...
		};
	};
};

struct net_iov {
	unsigned long flags_padding;
	union {
		struct {
			// same layout as in page + build asserts;
			...
			struct page_pool *pp;
			...
		};
		struct netmem_desc desc;
	};
};

struct netmem_desc *page_to_netmem_desc(struct page *page)
{
	return &page->netmem_desc;
}

struct netmem_desc *netmem_to_desc(netmem_ref netmem)
{
	if (netmem_is_page(netmem))
		return page_to_netmem_desc(netmem_to_page(netmem));
	return &netmem_to_niov(netmem)->desc;
}

The compiler should be able to optimise the branch in netmem_to_desc(),
but we might need to help it a bit.


Then, patch 2 ... N convert page pool and everyone else accessing
those page fields directly to netmem_to_desc / etc.

And the final patch replaces the struct group in the page with a
new field:

struct netmem_desc {
	struct page_pool *pp;
	...
};

struct page {
	unsigned long flags_padding;
	union {
		struct netmem_desc desc;
		...
	};
};

net_iov will drop its union in a later series to avoid conflicts.

btw, I don't think you need to convert the page pool to netmem for this
to happen, so that can be done in a separate, unrelated series. This
one is 18 patches, and netdev usually requires a series to be no more
than 15.

-- 
Pavel Begunkov



Thread overview: 72+ messages
2025-05-23  3:25 [PATCH 00/18] Split netmem from struct page Byungchul Park
2025-05-23  3:25 ` [PATCH 01/18] netmem: introduce struct netmem_desc struct_group_tagged()'ed on struct net_iov Byungchul Park
2025-05-23  9:01   ` Toke Høiland-Jørgensen
2025-05-26  0:56     ` Byungchul Park
2025-05-23 17:00   ` Mina Almasry
2025-05-26  1:15     ` Byungchul Park
2025-05-27  2:50   ` Byungchul Park
2025-05-27 20:03     ` Mina Almasry
2025-05-28  1:21       ` Byungchul Park
2025-05-28  3:47         ` Mina Almasry
2025-05-28  5:03           ` Byungchul Park
2025-05-28  7:43             ` Pavel Begunkov
2025-05-28  8:17               ` Byungchul Park
2025-05-28  7:38         ` Pavel Begunkov
2025-05-23  3:25 ` [PATCH 02/18] netmem: introduce netmem alloc APIs to wrap page alloc APIs Byungchul Park
2025-05-23  3:25 ` [PATCH 03/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_page_order() Byungchul Park
2025-05-23  3:25 ` [PATCH 04/18] page_pool: rename __page_pool_alloc_page_order() to __page_pool_alloc_large_netmem() Byungchul Park
2025-05-23  3:25 ` [PATCH 05/18] page_pool: use netmem alloc/put APIs in __page_pool_alloc_pages_slow() Byungchul Park
2025-05-23  3:25 ` [PATCH 06/18] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
2025-05-28  3:18   ` Mina Almasry
2025-05-23  3:25 ` [PATCH 07/18] page_pool: use netmem put API in page_pool_return_netmem() Byungchul Park
2025-05-23  3:25 ` [PATCH 08/18] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
2025-05-23  3:26 ` [PATCH 09/18] page_pool: rename __page_pool_put_page() to __page_pool_put_netmem() Byungchul Park
2025-05-23  3:26 ` [PATCH 10/18] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
2025-05-23  3:26 ` [PATCH 11/18] mlx4: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23  3:26 ` [PATCH 12/18] page_pool: use netmem APIs to access page->pp_magic in page_pool_page_is_pp() Byungchul Park
2025-05-23  8:58   ` Toke Høiland-Jørgensen
2025-05-23 17:21   ` Mina Almasry
2025-05-26  2:23     ` Byungchul Park
2025-05-26  2:36       ` Byungchul Park
2025-05-26  8:40         ` Toke Høiland-Jørgensen
2025-05-26  9:43           ` Byungchul Park
2025-05-26  9:54             ` Toke Høiland-Jørgensen
2025-05-26 10:01               ` Byungchul Park
2025-05-28  5:14               ` Byungchul Park
2025-05-28  7:35                 ` Toke Høiland-Jørgensen
2025-05-28  8:15                   ` Byungchul Park
2025-05-28  7:51       ` Pavel Begunkov
2025-05-28  8:14         ` Byungchul Park
2025-05-28  9:07           ` Pavel Begunkov
2025-05-28  9:14             ` Byungchul Park
2025-05-28  9:20               ` Pavel Begunkov
2025-05-28  9:33                 ` Byungchul Park
2025-05-28  9:51                   ` Pavel Begunkov
2025-05-28 10:44                     ` Byungchul Park
2025-05-28 10:54                       ` Pavel Begunkov
2025-05-23  3:26 ` [PATCH 13/18] mlx5: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23 17:13   ` Mina Almasry
2025-05-26  3:08     ` Byungchul Park
2025-05-26  8:12       ` Byungchul Park
2025-05-26 18:00         ` Mina Almasry
2025-05-23  3:26 ` [PATCH 14/18] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
2025-05-23 17:14   ` Mina Almasry
2025-05-23  3:26 ` [PATCH 15/18] netmem: remove __netmem_get_pp() Byungchul Park
2025-05-23  3:26 ` [PATCH 16/18] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
2025-05-23  3:26 ` [PATCH 17/18] netdevsim: use netmem descriptor and APIs for page pool Byungchul Park
2025-05-23  3:26 ` [PATCH 18/18] mm, netmem: remove the page pool members in struct page Byungchul Park
2025-05-23 17:16   ` kernel test robot
2025-05-23 17:55   ` Mina Almasry
2025-05-26  1:37     ` Byungchul Park
2025-05-26 16:58       ` Pavel Begunkov [this message]
2025-05-26 17:33         ` Mina Almasry
2025-05-27  1:02         ` Byungchul Park
2025-05-27  1:31           ` Byungchul Park
2025-05-27  5:30           ` Pavel Begunkov
2025-05-27 17:38             ` Mina Almasry
2025-05-28  1:31               ` Byungchul Park
2025-05-28  7:21               ` Pavel Begunkov
2025-05-23  6:20 ` [PATCH 00/18] Split netmem from " Taehee Yoo
2025-05-23  7:47   ` Byungchul Park
2025-05-23 17:47 ` SeongJae Park
2025-05-26  1:16   ` Byungchul Park
