From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Yunsheng Lin <yunshenglin0825@gmail.com>,
Yunsheng Lin <linyunsheng@huawei.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: zhangkun09@huawei.com, liuyonglong@huawei.com,
fanghaiqing@huawei.com,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Eric Dumazet <edumazet@google.com>,
Simon Horman <horms@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net-next v7 2/8] page_pool: fix timing for checking and disabling napi_local
Date: Mon, 27 Jan 2025 14:47:03 +0100 [thread overview]
Message-ID: <874j1kpdwo.fsf@toke.dk> (raw)
In-Reply-To: <84282526-6229-41c7-8f6b-5f2c500dcd8e@gmail.com>
Yunsheng Lin <yunshenglin0825@gmail.com> writes:
> On 1/25/2025 1:13 AM, Toke Høiland-Jørgensen wrote:
>> Yunsheng Lin <linyunsheng@huawei.com> writes:
>>
>>>> So I really don't see a way for this race to happen with correct usage
>>>> of the page_pool and NAPI APIs, which means there's no reason to make
>>>> the change you are proposing here.
>>>
>>> I looked at one driver setting pp->napi, it seems the bnxt driver doesn't
>>> seems to call page_pool_disable_direct_recycling() when unloading, see
>>> bnxt_half_close_nic(), page_pool_disable_direct_recycling() seems to be
>>> only called for the new queue_mgmt API:
>>>
>>> /* rtnl_lock held, this call can only be made after a previous successful
>>> * call to bnxt_half_open_nic().
>>> */
>>> void bnxt_half_close_nic(struct bnxt *bp)
>>> {
>>> bnxt_hwrm_resource_free(bp, false, true);
>>> bnxt_del_napi(bp); *----call napi del and rcu sync----*
>>> bnxt_free_skbs(bp);
>>> bnxt_free_mem(bp, true); *------call page_pool_destroy()----*
>>> clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
>>> }
>>>
>>> Even if there is a page_pool_disable_direct_recycling() called between
>>> bnxt_del_napi() and bnxt_free_mem(), the timing window still exist as
>>> rcu sync need to be called after page_pool_disable_direct_recycling(),
>>> it seems some refactor is needed for bnxt driver to reuse the rcu sync
>>> from the NAPI API, in order to avoid calling the rcu sync for
>>> page_pool_destroy().
>>
>> Well, I would consider that usage buggy. A page pool object is created
>> with a reference to the napi struct; so the page pool should also be
>> destroyed (clearing its reference) before the napi memory is freed. I
>> guess this is not really documented anywhere, but it's pretty standard
>> practice to free objects in the opposite order of their creation.
>
> I am not so familiar with rule about the creation API of NAPI, but the
> implementation of bnxt driver can have reference of 'struct napi' before
> calling netif_napi_add(), see below:
>
> static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool
> link_re_init)
> {
> .......
> rc = bnxt_alloc_mem(bp, irq_re_init); *create page_pool*
> if (rc) {
> netdev_err(bp->dev, "bnxt_alloc_mem err: %x\n", rc);
> goto open_err_free_mem;
> }
>
> if (irq_re_init) {
> bnxt_init_napi(bp); *netif_napi_add*
> rc = bnxt_request_irq(bp);
> if (rc) {
> netdev_err(bp->dev, "bnxt_request_irq err: %x\n", rc);
> goto open_err_irq;
> }
> }
>
> .....
> }
Regardless of the initialisation error, the fact that bnxt frees the
NAPI memory before calling page_pool_destroy() is a driver bug. Mina has
a suggestion for a warning to catch such bugs over in this thread:
https://lore.kernel.org/r/CAHS8izOv=tUiuzha6NFq1-ZurLGz9Jdi78jb3ey4ExVJirMprA@mail.gmail.com
>> So no, I don't think this is something that should be fixed on the page
>> pool side (and certainly not by adding another synchronize_rcu() call
>> per queue!); rather, we should fix the drivers that get this wrong (and
>> probably document the requirement a bit better).
>
> Even if timing problem of checking and disabling napi_local should not
> be fixed on the page_pool side, do we have some common understanding
> about fixing the DMA API misuse problem on the page_pool side?
> If yes, do we have some common understanding about some mechanism
> like synchronize_rcu() might be still needed on the page_pool side?
I have not reviewed the rest of your patch set, I only looked at this
patch. I see you posted v8 without addressing Jesper's ask for a
conceptual description of your design. I am not going to review a
600-something line patch series without such a description to go by, so
please address that first.
> If yes, it may be better to focus on discussing how to avoid calling rcu
> sync for each queue mentioned in [1].
Regardless of whether a synchronize_rcu() is needed in the final design
(and again, note that I don't have an opinion on this before reviewing
the whole series), this patch should be dropped from the series. The bug
it is purporting to fix is a driver API misuse and should be fixed in
the drivers, cf the above.
-Toke
next prev parent reply other threads:[~2025-01-27 13:47 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-10 13:06 [PATCH net-next v7 0/8] fix two bugs related to page_pool Yunsheng Lin
2025-01-10 13:06 ` [Intel-wired-lan] " Yunsheng Lin
2025-01-10 13:06 ` [PATCH net-next v7 1/8] page_pool: introduce page_pool_get_pp() API Yunsheng Lin
2025-01-10 13:06 ` [Intel-wired-lan] " Yunsheng Lin
2025-01-10 13:06 ` [PATCH net-next v7 2/8] page_pool: fix timing for checking and disabling napi_local Yunsheng Lin
2025-01-10 15:40 ` Toke Høiland-Jørgensen
2025-01-11 5:24 ` Yunsheng Lin
2025-01-14 13:03 ` Yunsheng Lin
2025-01-20 11:24 ` Toke Høiland-Jørgensen
2025-01-22 11:02 ` Yunsheng Lin
2025-01-24 17:13 ` Toke Høiland-Jørgensen
2025-01-25 14:21 ` Yunsheng Lin
2025-01-27 13:47 ` Toke Høiland-Jørgensen [this message]
2025-02-04 13:51 ` Yunsheng Lin
2025-01-10 13:06 ` [PATCH net-next v7 3/8] page_pool: fix IOMMU crash when driver has already unbound Yunsheng Lin
2025-01-15 16:29 ` Jesper Dangaard Brouer
2025-01-16 12:52 ` Yunsheng Lin
2025-01-16 16:09 ` Jesper Dangaard Brouer
2025-01-17 11:56 ` Yunsheng Lin
2025-01-17 16:56 ` Jesper Dangaard Brouer
2025-01-18 13:36 ` Yunsheng Lin
2025-01-10 13:06 ` [PATCH net-next v7 4/8] page_pool: support unlimited number of inflight pages Yunsheng Lin
2025-01-10 13:06 ` [PATCH net-next v7 5/8] page_pool: skip dma sync operation for " Yunsheng Lin
2025-01-10 13:07 ` [PATCH net-next v7 6/8] page_pool: use list instead of ptr_ring for ring cache Yunsheng Lin
2025-01-10 13:07 ` [PATCH net-next v7 7/8] page_pool: batch refilling pages to reduce atomic operation Yunsheng Lin
2025-01-10 13:07 ` [PATCH net-next v7 8/8] page_pool: use list instead of array for alloc cache Yunsheng Lin
2025-01-14 14:31 ` [PATCH net-next v7 0/8] fix two bugs related to page_pool Jesper Dangaard Brouer
2025-01-14 14:31 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2025-01-15 11:33 ` Yunsheng Lin
2025-01-15 11:33 ` [Intel-wired-lan] " Yunsheng Lin
2025-01-15 17:40 ` Jesper Dangaard Brouer
2025-01-15 17:40 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2025-01-16 12:52 ` Yunsheng Lin
2025-01-16 12:52 ` [Intel-wired-lan] " Yunsheng Lin
2025-01-16 18:02 ` Jesper Dangaard Brouer
2025-01-16 18:02 ` [Intel-wired-lan] " Jesper Dangaard Brouer
2025-01-17 11:35 ` Yunsheng Lin
2025-01-17 11:35 ` [Intel-wired-lan] " Yunsheng Lin
2025-01-18 8:04 ` Jesper Dangaard Brouer
2025-01-18 8:04 ` [Intel-wired-lan] " Jesper Dangaard Brouer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=874j1kpdwo.fsf@toke.dk \
--to=toke@redhat.com \
--cc=aleksander.lobakin@intel.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=fanghaiqing@huawei.com \
--cc=hawk@kernel.org \
--cc=horms@kernel.org \
--cc=ilias.apalodimas@linaro.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linyunsheng@huawei.com \
--cc=liuyonglong@huawei.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=xuanzhuo@linux.alibaba.com \
--cc=yunshenglin0825@gmail.com \
--cc=zhangkun09@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.