From: Joe Damato <jdamato@fastly.com>
To: Yunsheng Lin <linyunsheng@huawei.com>
Cc: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
liuyonglong@huawei.com, fanghaiqing@huawei.com,
zhangkun09@huawei.com,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Eric Dumazet <edumazet@google.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
mkarsten@uwaterloo.ca
Subject: Re: [PATCH net v2 1/2] page_pool: fix timing for checking and disabling napi_local
Date: Thu, 26 Sep 2024 13:06:16 -0700
Message-ID: <ZvW-uEXITmZtncub@LQ3V64L9R2>
In-Reply-To: <20240925075707.3970187-2-linyunsheng@huawei.com>

On Wed, Sep 25, 2024 at 03:57:06PM +0800, Yunsheng Lin wrote:
> page_pool page may be freed from skb_defer_free_flush() to
> softirq context, it may cause concurrent access problem for
> pool->alloc cache due to the below time window, as below,
> both CPU0 and CPU1 may access the pool->alloc cache
> concurrently in page_pool_empty_alloc_cache_once() and
> page_pool_recycle_in_cache():
>
>              CPU 0                              CPU1
>        page_pool_destroy()              skb_defer_free_flush()
>               .                                   .
>               .                       page_pool_put_unrefed_page()
>               .                                   .
>               .                  allow_direct = page_pool_napi_local()
>               .                                   .
> page_pool_disable_direct_recycling()              .
>               .                                   .
> page_pool_empty_alloc_cache_once()    page_pool_recycle_in_cache()
>
> Use rcu mechanism to avoid the above concurrent access problem.
>
> Note, the above was found during code reviewing on how to fix
> the problem in [1].
>
> 1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
>
> Fixes: dd64b232deb8 ("page_pool: unlink from napi during destroy")
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
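
For readers following along, the window in the quoted diagram can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: every `toy_*` name is hypothetical and is not the kernel's page_pool code, and the two CPUs are replayed one step at a time in a single thread rather than run concurrently. It shows just one point: the `allow_direct` value sampled before the pool is unlinked is still acted on afterwards, so an entry can land in the cache after it was drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 4

/* Hypothetical stand-in for a pool with a lockless alloc cache. */
struct toy_pool {
	bool napi_local;	/* what a napi_local check would report */
	size_t count;		/* entries currently in the cache */
	int cache[CACHE_SIZE];
};

/* CPU1 step: sample whether direct recycling is allowed. */
static bool toy_napi_local(const struct toy_pool *p)
{
	return p->napi_local;
}

/* CPU0 step: models disabling direct recycling during destroy. */
static void toy_disable_direct(struct toy_pool *p)
{
	p->napi_local = false;
}

/* CPU0 step: models draining the alloc cache once during destroy. */
static void toy_empty_cache(struct toy_pool *p)
{
	p->count = 0;
}

/* CPU1 step: recycle into the cache, trusting the earlier sample. */
static bool toy_recycle(struct toy_pool *p, bool allow_direct, int page)
{
	if (!allow_direct)
		return false;
	assert(p->count < CACHE_SIZE);
	p->cache[p->count++] = page;
	return true;
}

/*
 * Replays the interleaving from the diagram in program order and
 * returns how many entries remain in the cache after the drain.
 */
static size_t toy_replay_race(void)
{
	struct toy_pool p = { .napi_local = true, .count = 0 };
	bool allow_direct;

	allow_direct = toy_napi_local(&p);	/* CPU1: check passes */
	toy_disable_direct(&p);			/* CPU0: too late for CPU1 */
	toy_empty_cache(&p);			/* CPU0: cache drained */
	toy_recycle(&p, allow_direct, 42);	/* CPU1: acts on stale check */

	return p.count;
}
```

In this model `toy_replay_race()` leaves one entry in the cache after the drain, whereas a check taken after `toy_disable_direct()` refuses the recycle. The real problem is worse than the model suggests, since the last two steps can run at the same time on different CPUs.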

Sorry for the noise, but I hit an assert in page_pool_unref_netmem
and I am trying to figure out if it is related to what you all are
debugging? I thought it might be, but if not, my apologies.

Just in case it is, I've put the backtrace on GitHub [1]. I
triggered this while testing an RFC [2] I've been working on. Please
note, the RFC posted publicly does not currently apply cleanly to
net-next and has some bugs I've fixed in my v4. I had planned to
send the v4 early next week and mention the page pool issue I am
hitting.

After triggering the assert in [1], I tried applying the patches of
this series and retesting the RFC v4 I have queued locally. When I
did that, I hit a new assertion in page_pool_destroy [3].

There are a few possibilities:
  1. I am hitting the same issue you are hitting
  2. I am hitting a different issue caused by a bug I introduced
  3. I am hitting a different page pool issue entirely

In case of 2 and 3, my apologies for the noise.

In case of 1: If you think I am hitting the same issue as you are
trying to solve, I can reliably reproduce this with my RFC v4 and
would be happy to test any patches meant to fix the issue.

[1]: https://gist.githubusercontent.com/jdamato-fsly/eb628c8bf4e4d1c8158441644cdb7e52/raw/96dcf422303d9e64b5060f2fb0f1d71e04ab048e/warning1.txt
[2]: https://lore.kernel.org/all/20240912100738.16567-1-jdamato@fastly.com/#r
[3]: https://gist.githubusercontent.com/jdamato-fsly/eb628c8bf4e4d1c8158441644cdb7e52/raw/96dcf422303d9e64b5060f2fb0f1d71e04ab048e/warning2.txt