From: Jakub Kicinski <kuba@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: <davem@davemloft.net>, <pabeni@redhat.com>,
<liuyonglong@huawei.com>, <fanghaiqing@huawei.com>,
<zhangkun09@huawei.com>,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>, <netdev@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local
Date: Fri, 6 Dec 2024 08:09:43 -0800
Message-ID: <20241206080943.32da477c@kernel.org>
In-Reply-To: <c2b306af-4817-4169-814b-adbf25803919@huawei.com>
On Fri, 6 Dec 2024 20:29:40 +0800 Yunsheng Lin wrote:
> On 2024/12/6 8:42, Jakub Kicinski wrote:
> > On Thu, 5 Dec 2024 19:43:25 +0800 Yunsheng Lin wrote:
> >> It depends on what the caller is trying to protect by calling
> >> page_pool_disable_direct_recycling().
> >>
> >> It seems the use case for the only user of the API, the bnxt driver,
> >> is about reusing the same NAPI for different page_pool instances.
> >>
> >> According to the steps in netdev_rx_queue.c:
> >> 1. allocate new queue memory & create page_pool
> >> 2. stop old rx queue.
> >> 3. start new rx queue with new page_pool
> >> 4. free old queue memory + destroy page_pool.
> >>
> >> page_pool_disable_direct_recycling() is called in step 2. I am
> >> not sure how napi_enable() & napi_disable() are called in the above
> >> flow, but it seems the use-after-free problem this patch is
> >> trying to fix does not arise in the above flow.
> >>
> >> There doesn't seem to be any concurrent access problem if napi->list_owner
> >> is set to -1 before napi_disable() returns and the napi_enable() for the
> >> new queue is called after page_pool_disable_direct_recycling() is called
> >> in step 2.
> >
> > The fix presupposes there is a long delay between fetching
> > the NAPI pointer and accessing it. The concern is that NAPI gets
> > restarted in step 3 after we already READ_ONCE()'ed the pointer,
> > then we access it and judge it to be running on the same core.
> > Then we put the page into the fast cache which will never get
> > flushed.
>
> It seems napi_disable() is called before netdev_rx_queue_restart(),
> and napi_enable() and ____napi_schedule() are called after
> netdev_rx_queue_restart(), as no napi API is called in the
> implementation of 'netdev_queue_mgmt_ops' for the bnxt driver?
>
> If yes, napi->list_owner is set to -1 before step 1 and only set to
> a valid cpu in step 6 as below:
> 1. napi_disable()
> 2. allocate new queue memory & create new page_pool.
> 3. stop old rx queue.
> 4. start new rx queue with new page_pool.
> 5. free old queue memory + destroy old page_pool.
> 6. napi_enable() & ____napi_schedule()
>
> And there are at least three flows involved here:
> flow 1: calling napi_complete_done() and setting napi->list_owner to -1.
> flow 2: calling netdev_rx_queue_restart().
> flow 3: calling skb_defer_free_flush() with the page belonging to the old
> page_pool.
>
> The only case I can think of where page_pool_napi_local() returns true in
> flow 3 is when flow 1 and flow 3 run in softirq context on the same CPU
> and flow 3 runs before flow 1.
>
> It seems impossible for page_pool_napi_local() to return true between
> step 1 and step 6: either the updated napi->list_owner is always seen by
> flow 3, when both run in softirq context on the same CPU, or
> napi->list_owner != the CPU running flow 3. That also seems to be an
> implicit assumption for the case of napi scheduling across different CPUs.
>
> And the old page_pool is destroyed in step 5. I am not sure it is necessary
> to call page_pool_disable_direct_recycling() in step 3 if page_pool_destroy()
> already does the synchronize_rcu() in step 5 before napi is enabled.
>
> If not, maybe I am missing something here.

Yes, I believe you got steps 5 and 6 backwards.

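To make the ordering point concrete, here is a minimal userspace sketch, not kernel code. It assumes one reading of "backwards": that napi_enable() runs before the old queue memory is freed and the old page_pool is destroyed. The enum values, the two arrays, and both helper functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the six steps listed in the quoted message. */
enum step {
	NAPI_DISABLE,
	ALLOC_NEW,	/* allocate new queue memory, create new page_pool */
	STOP_OLD,	/* stop old rx queue */
	START_NEW,	/* start new rx queue with new page_pool */
	FREE_OLD,	/* free old queue memory, destroy old page_pool */
	NAPI_ENABLE,	/* napi_enable() and scheduling */
	NUM_STEPS
};

/* Order as written in the quoted message. */
static const enum step order_quoted[NUM_STEPS] = {
	NAPI_DISABLE, ALLOC_NEW, STOP_OLD, START_NEW, FREE_OLD, NAPI_ENABLE
};

/* Assumed corrected order: the last two steps swapped. */
static const enum step order_swapped[NUM_STEPS] = {
	NAPI_DISABLE, ALLOC_NEW, STOP_OLD, START_NEW, NAPI_ENABLE, FREE_OLD
};

/* Index of step s within the given order, or -1 if absent. */
static int position(const enum step *order, enum step s)
{
	for (int i = 0; i < NUM_STEPS; i++)
		if (order[i] == s)
			return i;
	return -1;
}

/* True if NAPI is re-enabled while the old page_pool still exists. */
static bool old_pool_alive_while_napi_enabled(const enum step *order)
{
	return position(order, NAPI_ENABLE) < position(order, FREE_OLD);
}
```

Under this model the quoted order leaves no point at which NAPI runs while the old pool is alive, whereas the swapped order does, which is where a timing window could exist.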
> It would be good to be more specific
> about the timing window in which page_pool_napi_local() returns true for the old
> page_pool.
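As an illustration of the window described earlier in the thread (the NAPI pointer is fetched, NAPI is restarted, and only then is the pointer checked), here is a single-threaded userspace sketch. It is a model under stated assumptions, not the kernel implementation: the structs and every model_* function are hypothetical stand-ins, and the interleaving is written out sequentially rather than run concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures discussed above. */
struct model_napi {
	int list_owner;			/* -1 when not scheduled, else owning CPU */
};

struct model_pool {
	struct model_napi *napi;	/* NULL once direct recycling is disabled */
};

/* Reader side, first half: snapshot the pool's NAPI pointer. */
static struct model_napi *model_fetch_napi(const struct model_pool *pool)
{
	return pool->napi;
}

/* Reader side, second half: is the snapshotted NAPI owned by this CPU? */
static bool model_owner_is(const struct model_napi *napi, int cpu)
{
	return napi && napi->list_owner == cpu;
}

/* Restart side: unlink the pool from its NAPI. */
static void model_disable_direct_recycling(struct model_pool *pool)
{
	pool->napi = NULL;
}

/*
 * Replays the interleaving: fetch, unlink, NAPI restarted on the same
 * CPU, then check. Returns true when the stale snapshot passes the
 * check although a fresh fetch would not.
 */
static bool model_stale_check_passes(int cpu)
{
	struct model_napi napi = { .list_owner = -1 };
	struct model_pool old_pool = { .napi = &napi };

	struct model_napi *snap = model_fetch_napi(&old_pool);
	model_disable_direct_recycling(&old_pool);	/* old queue stopped */
	napi.list_owner = cpu;				/* same NAPI restarted */

	bool stale = model_owner_is(snap, cpu);
	bool fresh = model_owner_is(model_fetch_napi(&old_pool), cpu);
	return stale && !fresh;
}
```

In this model the stale snapshot is judged local to the CPU, so a page from the old pool would be placed in its fast cache even though the pool is no longer linked to the NAPI.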
Thread overview: 29+ messages
2024-11-20 10:34 [PATCH RFC v4 0/3] fix two bugs related to page_pool Yunsheng Lin
2024-11-20 10:34 ` [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local Yunsheng Lin
2024-12-03 2:49 ` Jakub Kicinski
2024-12-04 11:01 ` Yunsheng Lin
2024-12-05 1:28 ` Jakub Kicinski
2024-12-05 11:43 ` Yunsheng Lin
2024-12-06 0:42 ` Jakub Kicinski
2024-12-06 12:29 ` Yunsheng Lin
2024-12-06 16:09 ` Jakub Kicinski [this message]
2024-12-07 5:52 ` Yunsheng Lin
2024-11-20 10:34 ` [PATCH RFC v4 2/3] page_pool: fix IOMMU crash when driver has already unbound Yunsheng Lin
2024-11-20 15:10 ` Jesper Dangaard Brouer
2024-11-21 8:03 ` Yunsheng Lin
2024-11-25 15:25 ` Jesper Dangaard Brouer
2024-11-26 8:22 ` Yunsheng Lin
2024-11-26 10:22 ` Jesper Dangaard Brouer
2024-11-26 11:46 ` Yunsheng Lin
2024-11-26 21:51 ` Mina Almasry
2024-11-26 23:53 ` Alexander Duyck
2024-11-27 9:35 ` Yunsheng Lin
2024-11-27 15:31 ` Robin Murphy
2024-11-27 16:27 ` Alexander Duyck
2024-12-04 11:16 ` Yunsheng Lin
2024-11-27 19:39 ` Mina Almasry
2024-11-20 10:34 ` [PATCH RFC v4 3/3] page_pool: skip dma sync operation for inflight pages Yunsheng Lin
2024-11-20 16:17 ` Robin Murphy
2024-11-21 8:04 ` Yunsheng Lin
2024-11-21 13:44 ` Robin Murphy
2024-11-22 7:20 ` Yunsheng Lin