Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local


From: Jakub Kicinski <kuba@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: <davem@davemloft.net>, <pabeni@redhat.com>,
	<liuyonglong@huawei.com>, <fanghaiqing@huawei.com>,
	<zhangkun09@huawei.com>,
	Alexander Lobakin <aleksander.lobakin@intel.com>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Eric Dumazet <edumazet@google.com>,
	Simon Horman <horms@kernel.org>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local
Date: Fri, 6 Dec 2024 08:09:43 -0800
Message-ID: <20241206080943.32da477c@kernel.org>
In-Reply-To: <c2b306af-4817-4169-814b-adbf25803919@huawei.com>

On Fri, 6 Dec 2024 20:29:40 +0800 Yunsheng Lin wrote:
> On 2024/12/6 8:42, Jakub Kicinski wrote:
> > On Thu, 5 Dec 2024 19:43:25 +0800 Yunsheng Lin wrote:  
> >> It depends on what the caller is trying to protect by calling
> >> page_pool_disable_direct_recycling().
> >>
> >> It seems the use case for the only user of the API, the bnxt driver,
> >> is about reusing the same NAPI for different page_pool instances.
> >>
> >> According to the steps in netdev_rx_queue.c:
> >> 1. allocate new queue memory & create page_pool
> >> 2. stop old rx queue.
> >> 3. start new rx queue with new page_pool
> >> 4. free old queue memory + destroy page_pool.
> >>
> >> page_pool_disable_direct_recycling() is called in step 2. I am
> >> not sure how napi_enable() & napi_disable() are called in the above
> >> flow, but it seems the use-after-free problem this patch is trying
> >> to fix does not exist for the above flow.
> >>
> >> There doesn't seem to be any concurrent access problem if napi->list_owner
> >> is set to -1 before napi_disable() returns and the napi_enable() for the
> >> new queue is called after page_pool_disable_direct_recycling() is called
> >> in step 2.
> > 
> > The fix presupposes there is a long delay between fetching
> > the NAPI pointer and accessing it. The concern is that NAPI gets
> > restarted in step 3 after we already READ_ONCE()'ed the pointer,
> > then we access it and judge it to be running on the same core.
> > Then we put the page into the fast cache which will never get
> > flushed.  
> 
> It seems napi_disable() is called before netdev_rx_queue_restart(),
> and napi_enable() and ____napi_schedule() are called after
> netdev_rx_queue_restart(), as there is no napi API called in the
> implementation of 'netdev_queue_mgmt_ops' for the bnxt driver?
> 
> If yes, napi->list_owner is set to -1 before step 1 and only set to
> a valid cpu in step 6 as below:
> 1. napi_disable()
> 2. allocate new queue memory & create new page_pool.
> 3. stop old rx queue.
> 4. start new rx queue with new page_pool.
> 5. free old queue memory + destroy old page_pool.
> 6. napi_enable() & ____napi_schedule()
> 
> And there are at least three flows involved here:
> flow 1: calling napi_complete_done() and setting napi->list_owner to -1.
> flow 2: calling netdev_rx_queue_restart().
> flow 3: calling skb_defer_free_flush() with the page belonging to the old
>        page_pool.
> 
> The only case of page_pool_napi_local() returning true in flow 3 I can
> think of is that flow 1 and flow 3 might need to be called in the softirq
> of the same CPU and flow 3 might need to be called before flow 1.
> 
> It seems impossible for page_pool_napi_local() to return true between
> step 1 and step 6, as the updated napi->list_owner is always seen by flow 3
> when both run in the softirq context of the same CPU, or else
> napi->list_owner != the CPU calling flow 3, which seems to be an implicit
> assumption for the case of napi scheduling between different cpus too.
> 
> The old page_pool is destroyed in step 5, so I am not sure it is necessary
> to call page_pool_disable_direct_recycling() in step 3 if page_pool_destroy()
> already has the synchronize_rcu() in step 5 before napi is enabled.
> 
> If not, maybe I am missing something here.

Yes, I believe you got the steps 5 and 6 backwards.
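
To make the ordering point concrete, here is a rough user-space sketch,
not kernel code. The step labels, the two arrays and the helper name are
all made up for the example; the only thing taken from the thread is the
quoted six-step list and the claim that steps 5 and 6 are the other way
around. It just checks whether NAPI gets re-enabled while the old
page_pool still exists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step labels for this sketch; they are not kernel symbols. */
enum model_step {
	STEP_NAPI_DISABLE,
	STEP_ALLOC_NEW_POOL,
	STEP_STOP_OLD_QUEUE,
	STEP_START_NEW_QUEUE,
	STEP_DESTROY_OLD_POOL,
	STEP_NAPI_ENABLE,
};

/* Ordering as listed in the quoted mail: destroy (5) before enable (6). */
static const enum model_step order_as_quoted[] = {
	STEP_NAPI_DISABLE, STEP_ALLOC_NEW_POOL, STEP_STOP_OLD_QUEUE,
	STEP_START_NEW_QUEUE, STEP_DESTROY_OLD_POOL, STEP_NAPI_ENABLE,
};

/* The same list with steps 5 and 6 swapped. */
static const enum model_step order_swapped[] = {
	STEP_NAPI_DISABLE, STEP_ALLOC_NEW_POOL, STEP_STOP_OLD_QUEUE,
	STEP_START_NEW_QUEUE, STEP_NAPI_ENABLE, STEP_DESTROY_OLD_POOL,
};

/*
 * Returns true if NAPI is re-enabled after the old queue was stopped
 * but before the old page_pool was destroyed, i.e. the old pool is
 * still around while NAPI is live again.
 */
static bool model_old_pool_outlives_enable(const enum model_step *steps,
					   size_t n)
{
	bool old_stopped = false;
	bool old_destroyed = false;

	for (size_t i = 0; i < n; i++) {
		switch (steps[i]) {
		case STEP_STOP_OLD_QUEUE:
			old_stopped = true;
			break;
		case STEP_DESTROY_OLD_POOL:
			old_destroyed = true;
			break;
		case STEP_NAPI_ENABLE:
			if (old_stopped && !old_destroyed)
				return true;
			break;
		default:
			break;
		}
	}
	return false;
}
```

With the quoted ordering the helper reports no overlap; with steps 5 and
6 swapped it does, which is the situation the rest of the argument has
to account for.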

> It would be good to be more specific
> about the timing window in which page_pool_napi_local() returns true for the old
> page_pool.
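
For what it's worth, the window described earlier in the thread (NAPI
pointer fetched, NAPI restarted, then judged to be local) can be walked
through in a small single-threaded user-space model. This is only a
sketch under assumptions: the struct and function names are invented
for the example, the locality check is deliberately split into a "fetch"
and a "judge" half so the interleaving can be written out by hand, and
the assert about NAPI being idle when the pool is unlinked is the
model's own precondition, not a statement about the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel structures. */
struct model_napi {
	int list_owner;		/* CPU currently polling, -1 when idle */
};

struct model_pool {
	struct model_napi *napi;	/* NULL once direct recycling is off */
	int fast_cache_pages;		/* pages put into the direct cache */
};

/* First half of the locality check: fetch the NAPI pointer. */
static struct model_napi *model_fetch_napi(const struct model_pool *pool)
{
	return pool->napi;
}

/* Second half: judge locality using the pointer fetched earlier. */
static bool model_judge_local(const struct model_napi *napi, int cpu)
{
	return napi != NULL && napi->list_owner == cpu;
}

/* Unlink the pool from its NAPI; the model assumes NAPI is idle here. */
static void model_disable_direct_recycling(struct model_pool *pool)
{
	assert(pool->napi == NULL || pool->napi->list_owner == -1);
	pool->napi = NULL;
}

/* Fetch happens BEFORE the old pool is unlinked and NAPI restarts. */
static bool model_recycle_racy(void)
{
	struct model_napi napi = { .list_owner = -1 };
	struct model_pool old_pool = { .napi = &napi, .fast_cache_pages = 0 };
	const int cpu = 0;

	struct model_napi *seen = model_fetch_napi(&old_pool);
	model_disable_direct_recycling(&old_pool);	/* old queue stopped */
	napi.list_owner = cpu;				/* NAPI restarted */

	if (model_judge_local(seen, cpu))
		old_pool.fast_cache_pages++;	/* would never be flushed */

	return old_pool.fast_cache_pages > 0;
}

/* Fetch happens AFTER the old pool is unlinked: no stale pointer. */
static bool model_recycle_ordered(void)
{
	struct model_napi napi = { .list_owner = -1 };
	struct model_pool old_pool = { .napi = &napi, .fast_cache_pages = 0 };
	const int cpu = 0;

	model_disable_direct_recycling(&old_pool);
	napi.list_owner = cpu;

	if (model_judge_local(model_fetch_napi(&old_pool), cpu))
		old_pool.fast_cache_pages++;

	return old_pool.fast_cache_pages > 0;
}
```

In this model model_recycle_racy() ends with a page in the old pool's
fast cache, while model_recycle_ordered() does not. Whether the first
interleaving is reachable in practice is exactly what is being argued
about; the sketch only shows what it would look like.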
