From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Zi Yan <ziy@nvidia.com>, David Hildenbrand <david@redhat.com>
Cc: Byungchul Park <byungchul@sk.com>,
willy@infradead.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel_team@skhynix.com, kuba@kernel.org, almasrymina@google.com,
ilias.apalodimas@linaro.org, harry.yoo@oracle.com,
hawk@kernel.org, akpm@linux-foundation.org, davem@davemloft.net,
john.fastabend@gmail.com, andrew+netdev@lunn.ch,
asml.silence@gmail.com, tariqt@nvidia.com, edumazet@google.com,
pabeni@redhat.com, saeedm@nvidia.com, leon@kernel.org,
ast@kernel.org, daniel@iogearbox.net, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, horms@kernel.org,
linux-rdma@vger.kernel.org, bpf@vger.kernel.org,
vishal.moola@gmail.com, hannes@cmpxchg.org, jackmanb@google.com,
"jesper@cloudflare.com" <jesper@cloudflare.com>
Subject: Re: [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp()
Date: Tue, 24 Jun 2025 16:43:36 +0200 [thread overview]
Message-ID: <87o6udfbdz.fsf@toke.dk> (raw)
In-Reply-To: <42E9BEA8-9B02-440F-94BF-74393827B01E@nvidia.com>
Zi Yan <ziy@nvidia.com> writes:
> On 23 Jun 2025, at 10:58, David Hildenbrand wrote:
>
>> On 23.06.25 13:13, Zi Yan wrote:
>>> On 23 Jun 2025, at 6:16, Byungchul Park wrote:
>>>
>>>> On Mon, Jun 23, 2025 at 11:16:43AM +0200, David Hildenbrand wrote:
>>>>> On 20.06.25 06:12, Byungchul Park wrote:
>>>>>> To simplify struct page, the effort to separate its own descriptor from
>>>>>> struct page is required and the work for page pool is on going.
>>>>>>
>>>>>> To achieve that, all the code should avoid directly accessing page pool
>>>>>> members of struct page.
>>>>>>
>>>>>> Access ->pp_magic through struct netmem_desc instead of directly
>>>>>> accessing it through struct page in page_pool_page_is_pp(). Plus, move
>>>>>> page_pool_page_is_pp() from mm.h to netmem.h to use struct netmem_desc
>>>>>> without header dependency issue.
>>>>>>
>>>>>> Signed-off-by: Byungchul Park <byungchul@sk.com>
>>>>>> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>> Reviewed-by: Mina Almasry <almasrymina@google.com>
>>>>>> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>>>> Acked-by: Harry Yoo <harry.yoo@oracle.com>
>>>>>> ---
>>>>>> include/linux/mm.h | 12 ------------
>>>>>> include/net/netmem.h | 14 ++++++++++++++
>>>>>> mm/page_alloc.c | 1 +
>>>>>> 3 files changed, 15 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>>> index 0ef2ba0c667a..0b7f7f998085 100644
>>>>>> --- a/include/linux/mm.h
>>>>>> +++ b/include/linux/mm.h
>>>>>> @@ -4172,16 +4172,4 @@ int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
>>>>>> */
>>>>>> #define PP_MAGIC_MASK ~(PP_DMA_INDEX_MASK | 0x3UL)
>>>>>>
>>>>>> -#ifdef CONFIG_PAGE_POOL
>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> -{
>>>>>> - return (page->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>> -}
>>>>>> -#else
>>>>>> -static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> -{
>>>>>> - return false;
>>>>>> -}
>>>>>> -#endif
>>>>>> -
>>>>>> #endif /* _LINUX_MM_H */
>>>>>> diff --git a/include/net/netmem.h b/include/net/netmem.h
>>>>>> index d49ed49d250b..3d1b1dfc9ba5 100644
>>>>>> --- a/include/net/netmem.h
>>>>>> +++ b/include/net/netmem.h
>>>>>> @@ -56,6 +56,20 @@ NETMEM_DESC_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
>>>>>> */
>>>>>> static_assert(sizeof(struct netmem_desc) <= offsetof(struct page, _refcount));
>>>>>>
>>>>>> +#ifdef CONFIG_PAGE_POOL
>>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> +{
>>>>>> + struct netmem_desc *desc = (struct netmem_desc *)page;
>>>>>> +
>>>>>> + return (desc->pp_magic & PP_MAGIC_MASK) == PP_SIGNATURE;
>>>>>> +}
>>>>>> +#else
>>>>>> +static inline bool page_pool_page_is_pp(struct page *page)
>>>>>> +{
>>>>>> + return false;
>>>>>> +}
>>>>>> +#endif
>>>>>
>>>>> I wonder how helpful this cleanup is long-term.
>>>>>
>>>>> page_pool_page_is_pp() is only called from mm/page_alloc.c, right?
>>>>
>>>> Yes.
>>>>
>>>>> There, we want to make sure that no pagepool page is ever returned to
>>>>> the buddy.
>>>>>
>>>>> How reasonable is this sanity check to have long-term? Wouldn't we be
>>>>> able to check that on some higher-level freeing path?
>>>>>
>>>>> The reason I am commenting is that once we decouple "struct page" from
>>>>> "struct netmem_desc", we'd have to lookup here the corresponding "struct
>>>>> netmem_desc".
>>>>>
>>>>> ... but at that point here (when we free the actual pages), the "struct
>>>>> netmem_desc" would likely already have been freed separately (remember:
>>>>> it will be dynamically allocated).
>>>>>
>>>>> With that in mind:
>>>>>
>>>>> 1) Is there a higher level "struct netmem_desc" freeing path where we
>>>>> could check that instead, so we don't have to cast from pages to
>>>>> netmem_desc at all.
>>>>
>>>> I also thought it's too paranoiac. However, I thought it's other issue
>>>> than this work. That's why I left the API as is for now, it can be gone
>>>> once we get convinced the check is unnecessary in deep buddy. Wrong?
>>>>
>>>>> 2) How valuable are these sanity checks deep in the buddy?
>>>>
>>>> That was also what I felt weird on.
>>>
>>> It seems very useful when I asked last time[1]:
>>>
>>> |> We have actually used this at Cloudflare to catch some page_pool bugs.
>>
>> My question is rather, whether there is some higher-level freeing path for netmem_desc where we could check that instead (IOW, earlier).
>>
>> Or is it really arbitrary put_page() (IOW, we assume that many possible references can be held)?
>
> +Toke, who I talked about this last time.
>
> Maybe he can shed some light on it.
As others have pointed out, basically, AFAIU: Yes, pages are *supposed*
to go through a common freeing path where this check could reside, but
we've had bugs where they ended up leaking anyway, which is why this
check in MM was added in the first place.
I don't recall the specifics of *what* the bug was; +Jesper who maybe does?
-Toke
next prev parent reply other threads:[~2025-06-24 14:43 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-20 4:12 [PATCH net-next v6 0/9] Split netmem from struct page Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 1/9] netmem: introduce struct netmem_desc mirroring " Byungchul Park
2025-06-23 9:32 ` David Hildenbrand
2025-06-23 10:28 ` Byungchul Park
2025-06-23 10:38 ` David Hildenbrand
2025-06-23 12:18 ` Harry Yoo
2025-06-23 19:09 ` Mina Almasry
2025-06-23 19:28 ` Pavel Begunkov
2025-06-24 1:17 ` Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 2/9] page_pool: rename page_pool_return_page() to page_pool_return_netmem() Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 3/9] page_pool: rename __page_pool_release_page_dma() to __page_pool_release_netmem_dma() Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 4/9] page_pool: rename __page_pool_alloc_pages_slow() to __page_pool_alloc_netmems_slow() Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 5/9] netmem: use _Generic to cover const casting for page_to_netmem() Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 6/9] netmem: remove __netmem_get_pp() Byungchul Park
2025-06-23 4:32 ` Byungchul Park
2025-06-24 0:17 ` Jakub Kicinski
2025-06-24 1:27 ` Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 7/9] page_pool: make page_pool_get_dma_addr() just wrap page_pool_get_dma_addr_netmem() Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 8/9] netmem: introduce a netmem API, virt_to_head_netmem() Byungchul Park
2025-06-20 4:12 ` [PATCH net-next v6 9/9] page_pool: access ->pp_magic through struct netmem_desc in page_pool_page_is_pp() Byungchul Park
2025-06-23 9:16 ` David Hildenbrand
2025-06-23 10:16 ` Byungchul Park
2025-06-23 11:13 ` Zi Yan
2025-06-23 11:25 ` Byungchul Park
2025-06-23 14:58 ` David Hildenbrand
2025-06-23 15:25 ` Zi Yan
2025-06-24 14:43 ` Toke Høiland-Jørgensen [this message]
2025-06-24 14:56 ` David Hildenbrand
2025-06-25 1:24 ` Byungchul Park
2025-06-26 6:35 ` Byungchul Park
2025-06-23 17:06 ` Pavel Begunkov
2025-06-23 17:28 ` Mina Almasry
2025-06-23 18:09 ` Vlastimil Babka
2025-06-23 18:14 ` Pavel Begunkov
2025-06-24 1:54 ` Byungchul Park
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o6udfbdz.fsf@toke.dk \
--to=toke@redhat.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=andrew+netdev@lunn.ch \
--cc=asml.silence@gmail.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=byungchul@sk.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=david@redhat.com \
--cc=edumazet@google.com \
--cc=hannes@cmpxchg.org \
--cc=harry.yoo@oracle.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=jackmanb@google.com \
--cc=jesper@cloudflare.com \
--cc=john.fastabend@gmail.com \
--cc=kernel_team@skhynix.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-rdma@vger.kernel.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rppt@kernel.org \
--cc=saeedm@nvidia.com \
--cc=surenb@google.com \
--cc=tariqt@nvidia.com \
--cc=vbabka@suse.cz \
--cc=vishal.moola@gmail.com \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).