From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
netdev@vger.kernel.org, Ratheesh Kannoth <rkannoth@marvell.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Geetha sowjanya <gakula@marvell.com>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
Jakub Kicinski <kuba@kernel.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Paolo Abeni <pabeni@redhat.com>,
Subbaraya Sundeep <sbhatta@marvell.com>,
Sunil Goutham <sgoutham@marvell.com>,
Thomas Gleixner <tglx@linutronix.de>,
hariprasad <hkelam@marvell.com>,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Qingfang DENG <qingfang.deng@siflower.com.cn>
Subject: Re: [BUG] Possible unsafe page_pool usage in octeontx2
Date: Wed, 23 Aug 2023 21:45:04 +0200
Message-ID: <d34d4c1c-2436-3d4c-268c-b971c9cc473f@kernel.org>
In-Reply-To: <20230823094757.gxvCEOBi@linutronix.de>
(Cc Olek, as he has changes in this code path)
On 23/08/2023 11.47, Sebastian Andrzej Siewior wrote:
> Hi,
>
> I've been looking at the page_pool locking.
>
> page_pool_alloc_frag() -> page_pool_alloc_pages() ->
> __page_pool_get_cached():
>
> The core of the allocation is:
> | /* Caller MUST guarantee safe non-concurrent access, e.g. softirq */
> | if (likely(pool->alloc.count)) {
> | 	/* Fast-path */
> | 	page = pool->alloc.cache[--pool->alloc.count];
>
> The access to the `cache' array and the `count' variable is not locked.
> This is fine as long as there is only one consumer per pool. In my
> understanding the intention is to have one page_pool per NAPI callback
> to ensure this.
>
Yes, the intention is that a single PP instance is "bound" to one RX-NAPI.
> The pool can be filled in the same context (within allocation if the
> pool is empty). There is also page_pool_recycle_in_cache() which fills
> the pool from within skb free, for instance:
> napi_consume_skb() -> skb_release_all() -> skb_release_data() ->
> napi_frag_unref() -> page_pool_return_skb_page().
>
> The last one has the following check here:
> | napi = READ_ONCE(pp->p.napi);
> | allow_direct = napi_safe && napi &&
> |                READ_ONCE(napi->list_owner) == smp_processor_id();
>
> This eventually ends in page_pool_recycle_in_cache() where it adds the
> page to the cache buffer if the check above is true (and BH is disabled).
>
> napi->list_owner is set from the moment NAPI is scheduled until the poll
> callback has completed. It is safe to add items to the list because only
> one of the two can run on a single CPU, and their completion is ensured
> by having BH disabled the whole time.
>
> This breaks in octeontx2 where a worker is used to fill the buffer:
> otx2_pool_refill_task() -> otx2_alloc_rbuf() -> __otx2_alloc_rbuf() ->
> otx2_alloc_pool_buf() -> page_pool_alloc_frag().
>
This seems problematic! This is NOT allowed.
But otx2_pool_refill_task() is a work-queue, and I thought it runs in
process-context. This WQ process is not allowed to use the lockless PP
cache. This seems to be a bug!
The problematic part is otx2_alloc_rbuf(), which disables BH:
int otx2_alloc_rbuf(struct otx2_nic *pfvf, struct otx2_pool *pool,
		    dma_addr_t *dma)
{
	int ret;

	local_bh_disable();
	ret = __otx2_alloc_rbuf(pfvf, pool, dma);
	local_bh_enable();
	return ret;
}
Could the fix be to not do this local_bh_disable() in this driver?
> BH is disabled, but a page can still be added while the NAPI callback
> runs on a remote CPU, thus corrupting the index/array.
>
> API wise I would suggest to
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 7ff80b80a6f9f..b50e219470a36 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -612,7 +612,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
>  		page_pool_dma_sync_for_device(pool, page,
>  					      dma_sync_size);
>
> -	if (allow_direct && in_softirq() &&
> +	if (allow_direct && in_serving_softirq() &&
This is the "return/free/put" code path, where we have "allow_direct" as
a protection in the API. API users are supposed to use
page_pool_recycle_direct() to indicate this, but at some point we
allowed APIs to expose 'allow_direct'.
The PP-alloc side is more fragile, and maybe the in_serving_softirq()
belongs there.
>  	    page_pool_recycle_in_cache(page, pool))
>  		return NULL;
>
> because the intention (as I understand it) is to be invoked from within
> the NAPI callback (while softirq is served) and not if BH is just
> disabled due to a lock or so.
>
True, and it used to be like this (in_serving_softirq), but as Ilias
wrote it was changed recently. This was to support threaded-NAPI (in
542bcea4be866b ("net: page_pool: use in_softirq() instead")), which
I understood was one of your (Sebastian's) use-cases.
> It would also make sense to add WARN_ON_ONCE(!in_serving_softirq()) to
> page_pool_alloc_pages() to spot usage outside of softirq. But this will
> trigger in every driver, since the same function is used in the open
> callback to initially set up the HW.
>
I'm very open to ideas for detecting this. Since the mentioned commit, PP
is open to these kinds of misuses of the API.
One idea would be to leverage the fact that napi->list_owner will have
been set to something other than -1 in NAPI context. Getting hold of the
napi object could be done via pp->p.napi (but as Jakub wrote, this is
opt-in ATM).
--Jesper