From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Yunsheng Lin <linyunsheng@huawei.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: zhangkun09@huawei.com, liuyonglong@huawei.com,
fanghaiqing@huawei.com, Yunsheng Lin <linyunsheng@huawei.com>,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v7 2/8] page_pool: fix timing for checking and disabling napi_local
Date: Fri, 10 Jan 2025 16:40:12 +0100
Message-ID: <87sepqhe3n.fsf@toke.dk>
In-Reply-To: <20250110130703.3814407-3-linyunsheng@huawei.com>

Yunsheng Lin <linyunsheng@huawei.com> writes:

> page_pool page may be freed from skb_defer_free_flush() in
> softirq context without binding to any specific napi, it
> may cause use-after-free problem due to the below time window,
> as below, CPU1 may still access napi->list_owner after CPU0
> free the napi memory:
>
>         CPU 0                              CPU1
>   page_pool_destroy()             skb_defer_free_flush()
>          .                                  .
>          .                     napi = READ_ONCE(pool->p.napi);
>          .                                  .
> page_pool_disable_direct_recycling()        .
>   driver free napi memory                   .
>          .                                  .
>          .          napi && READ_ONCE(napi->list_owner) == cpuid
>          .                                  .

Have you actually observed this happen, or are you just speculating?
Because I don't think it can; deleting a NAPI instance already requires
observing an RCU grace period, cf netdevice.h:

/**
 * __netif_napi_del - remove a NAPI context
 * @napi: NAPI context
 *
 * Warning: caller must observe RCU grace period before freeing memory
 * containing @napi. Drivers might want to call this helper to combine
 * all the needed RCU grace periods into a single one.
 */
void __netif_napi_del(struct napi_struct *napi);

/**
 * netif_napi_del - remove a NAPI context
 * @napi: NAPI context
 *
 * netif_napi_del() removes a NAPI context from the network device NAPI list
 */
static inline void netif_napi_del(struct napi_struct *napi)
{
	__netif_napi_del(napi);
	synchronize_net();
}
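
To spell out the ordering argument, here is a rough single-threaded
userspace model of the timeline quoted above. It is not kernel code:
every model_* name is made up for illustration, and the grace period is
reduced to "no reader is still inside a read-side section", which is
stricter than real RCU (real grace periods only wait for readers that
started before the wait began).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of these names exist in the kernel. */
struct model_napi {
	int list_owner;	/* CPU owning the NAPI instance, -1 if none */
	bool freed;	/* set once the "driver" has freed the memory */
};

struct model_state {
	struct model_napi *published;	/* stands in for pool->p.napi */
	int active_readers;		/* read-side sections in flight */
};

/* Reader: enter a read-side section (softirq) and snapshot the pointer. */
static inline struct model_napi *model_reader_enter(struct model_state *s)
{
	s->active_readers++;
	return s->published;
}

/* Reader: leave the read-side section. */
static inline void model_reader_exit(struct model_state *s)
{
	assert(s->active_readers > 0);
	s->active_readers--;
}

/* Writer, step 1: clear the pointer so new readers no longer see it. */
static inline void model_unpublish(struct model_state *s)
{
	s->published = NULL;
}

/* Writer, step 2: the grace period is over once no reader is running. */
static inline bool model_grace_period_elapsed(const struct model_state *s)
{
	return s->active_readers == 0;
}

/*
 * Writer, step 3: free only after the grace period. Returns false, and
 * leaves the memory alone, while a reader may still hold a snapshot.
 */
static inline bool model_try_free(struct model_state *s,
				  struct model_napi *napi)
{
	if (!model_grace_period_elapsed(s))
		return false;
	napi->freed = true;
	return true;
}
```

In this model a reader that took its snapshot before model_unpublish()
keeps model_try_free() from succeeding until it calls
model_reader_exit(), so its later read of list_owner never touches freed
memory. That is the property the warning in the kernel-doc comment asks
drivers to provide.
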
> Use rcu mechanism to avoid the above problem.
>
> Note, the above was found during code reviewing on how to fix
> the problem in [1].
>
> As the following IOMMU fix patch depends on synchronize_rcu()
> added in this patch and the time window is so small that it
> doesn't seem to be an urgent fix, so target the net-next as
> the IOMMU fix patch does.
>
> 1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
>
> Fixes: dd64b232deb8 ("page_pool: unlink from napi during destroy")
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
> net/core/page_pool.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 9733206d6406..1aa7b93bdcc8 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -799,6 +799,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
> static bool page_pool_napi_local(const struct page_pool *pool)
> {
> const struct napi_struct *napi;
> + bool napi_local;
> u32 cpuid;
>
> if (unlikely(!in_softirq()))
> @@ -814,9 +815,15 @@ static bool page_pool_napi_local(const struct page_pool *pool)
> if (READ_ONCE(pool->cpuid) == cpuid)
> return true;
>
> + /* Synchronizated with page_pool_destory() to avoid use-after-free
> + * for 'napi'.
> + */
> + rcu_read_lock();
> napi = READ_ONCE(pool->p.napi);
> + napi_local = napi && READ_ONCE(napi->list_owner) == cpuid;
> + rcu_read_unlock();

This rcu_read_lock/unlock() pair is redundant in the context you mention
above, since skb_defer_free_flush() is only ever called from softirq
context (within local_bh_disable()), which already functions as an RCU
read lock.

> - return napi && READ_ONCE(napi->list_owner) == cpuid;
> + return napi_local;
> }
>
> void page_pool_put_unrefed_netmem(struct page_pool *pool, netmem_ref netmem,
> @@ -1165,6 +1172,12 @@ void page_pool_destroy(struct page_pool *pool)
> if (!page_pool_release(pool))
> return;
>
> + /* Paired with rcu lock in page_pool_napi_local() to enable clearing
> + * of pool->p.napi in page_pool_disable_direct_recycling() is seen
> + * before returning to driver to free the napi instance.
> + */
> + synchronize_rcu();
Most drivers call page_pool_destroy() in a loop for each RX queue, so
now you're introducing a full synchronize_rcu() wait for each queue.
That can delay tearing down the device significantly, so I don't think
this is a good idea.
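
To make the cost concrete, a small userspace counting model (again not
kernel code; the model_* names are hypothetical, and it assumes every
page_pool_destroy() call actually reaches the synchronize_rcu()). It
compares one wait per pool with a single combined wait, analogous to
what __netif_napi_del() allows for NAPI instances:

```c
#include <assert.h>

/* Userspace model only; none of these names exist in the kernel. */
struct model_teardown {
	unsigned int grace_periods;	/* synchronize_rcu()-style waits */
	unsigned int pools_destroyed;
};

/* Variant from the patch: every destroy call waits on its own. */
static inline void model_destroy_with_sync(struct model_teardown *t)
{
	t->grace_periods++;
	t->pools_destroyed++;
}

/* Hypothetical variant that leaves the wait to the caller. */
static inline void model_destroy_no_sync(struct model_teardown *t)
{
	t->pools_destroyed++;
}

/* One combined wait covering everything destroyed so far. */
static inline void model_sync_once(struct model_teardown *t)
{
	t->grace_periods++;
}

/* Grace periods paid when each RX queue's pool waits by itself. */
static inline unsigned int model_teardown_per_queue(unsigned int nr_queues)
{
	struct model_teardown t = { 0, 0 };
	unsigned int i;

	for (i = 0; i < nr_queues; i++)
		model_destroy_with_sync(&t);
	assert(t.pools_destroyed == nr_queues);
	return t.grace_periods;
}

/* Grace periods paid when the waits are combined into a single one. */
static inline unsigned int model_teardown_batched(unsigned int nr_queues)
{
	struct model_teardown t = { 0, 0 };
	unsigned int i;

	for (i = 0; i < nr_queues; i++)
		model_destroy_no_sync(&t);
	model_sync_once(&t);
	assert(t.pools_destroyed == nr_queues);
	return t.grace_periods;
}
```

The per-queue variant pays one grace period per RX queue, so the wait
grows linearly with the queue count, while the batched variant pays one
regardless of how many queues the device has.
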
-Toke