From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Jakub Kicinski <kuba@kernel.org>, Liang Chen <liangchen.linux@gmail.com>
Cc: brouer@redhat.com, hawk@kernel.org, ilias.apalodimas@linaro.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
davem@davemloft.net, edumazet@google.com, pabeni@redhat.com
Subject: Re: [PATCH net-next] page pool: not return page to alloc cache during pool destruction
Date: Thu, 15 Jun 2023 16:00:13 +0200
Message-ID: <b28b0e3e-87e4-5a02-c172-2d1424405a5a@redhat.com>
In-Reply-To: <20230614212031.7e1b6893@kernel.org>
On 15/06/2023 06.20, Jakub Kicinski wrote:
> On Thu, 15 Jun 2023 09:36:45 +0800 Liang Chen wrote:
>> When destroying a page pool, the alloc cache and recycle ring are emptied.
>> If there are inflight pages, the retry process will periodically check the
>> recycle ring for recently returned pages, but not the alloc cache (alloc
>> cache is only emptied once). As a result, any pages returned to the alloc
>> cache after the page pool destruction will be stuck there and cause the
>> retry process to continuously look for inflight pages and report warnings.
>>
>> To safeguard against this situation, any pages returning to the alloc cache
>> after pool destruction should be prevented.
>
> Let's hear from the page pool maintainers but I think the driver
> is supposed to prevent allocations while pool is getting destroyed.
Yes, this is a driver API violation. Direct returns (allow_direct) can
only happen from the driver's RX path, e.g. while the driver is actively
processing packets (in NAPI). When a driver is shutting down a page_pool,
it MUST have stopped the RX path and NAPI (napi_disable()) before calling
page_pool_destroy(). Thus, this situation cannot happen, and if it does
it is a driver bug.
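The ordering rule above can be illustrated with a toy user-space model.
This is only a sketch: struct toy_rxq and the toy_*() functions are
hypothetical names made up for illustration, not kernel API, and the two
flag assignments merely stand in for napi_disable() and
page_pool_destroy().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; every name here is hypothetical, not kernel API. */
struct toy_rxq {
	bool napi_enabled;    /* RX (NAPI) context may still run */
	bool pool_destroyed;  /* pool teardown has started */
	int  direct_recycles; /* pages returned straight to the alloc cache */
};

/* Models a direct (allow_direct) return: only possible from the RX path. */
static bool toy_recycle_direct(struct toy_rxq *q)
{
	if (!q->napi_enabled)
		return false;       /* no RX context left to recycle from */
	assert(!q->pool_destroyed); /* hitting this would be the driver bug */
	q->direct_recycles++;
	return true;
}

/* Correct teardown order: quiesce RX first, then destroy the pool. */
static void toy_rxq_teardown(struct toy_rxq *q)
{
	q->napi_enabled = false;   /* stands in for napi_disable() */
	q->pool_destroyed = true;  /* stands in for page_pool_destroy() */
}
```

Because the RX path is stopped before the pool is marked destroyed, no
direct return can reach the alloc cache after teardown in this model.
Swapping the two assignments in toy_rxq_teardown() would open the window
the patch tries to guard against.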
> Perhaps we can add DEBUG_NET_WARN_ON_ONCE() for this condition to
> prevent wasting cycles in production builds?
>
For this page_pool code path ("allow_direct") it is extremely important
that we avoid wasting cycles in production, as it is used for XDP_DROP
use-cases on 100Gbit/s NICs.
At 100Gbit/s with 64-byte Ethernet frames (84 bytes on the wire), the
wirespeed is 148.8Mpps, which gives the CPU 6.72 nanosec to process each
packet. The microbench[1] shows (below signature) that
page_pool_alloc_pages() + page_pool_recycle_direct() cost 4.041 ns (or
14 cycles(tsc)). Thus, for this fast-path code every cycle counts.
In practice, PCIe transactions/sec seem to limit the total system to
108Mpps (with multiple RX-queues + descriptor compression), thus 9.26
nanosec to process each packet. Individual hardware RX queues seem to be
limited to around 36Mpps, thus 27.77 nanosec to process each packet.
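The per-packet budgets quoted above follow from simple arithmetic. A
minimal sketch of that calculation is below; the helper names wire_pps()
and ns_per_packet() are made up for illustration and are not part of the
microbench in [1].

```c
#include <assert.h>

/* On-wire size of a minimum Ethernet frame: 64 bytes of frame plus
 * 20 bytes of preamble, start-of-frame delimiter and inter-frame gap. */
#define MIN_FRAME_WIRE_BYTES 84

/* Packets per second at a given link speed and on-wire frame size. */
static double wire_pps(double link_bits_per_sec, int wire_bytes)
{
	assert(wire_bytes > 0);
	return link_bits_per_sec / (wire_bytes * 8.0);
}

/* Time available per packet, in nanoseconds, at a given packet rate. */
static double ns_per_packet(double pps)
{
	assert(pps > 0.0);
	return 1e9 / pps;
}
```

With these helpers, wire_pps(100e9, MIN_FRAME_WIRE_BYTES) gives roughly
148.8 Mpps, and ns_per_packet() of that rate gives 6.72 ns. Feeding in
108e6 and 36e6 gives about 9.26 ns and 27.8 ns. Against the 6.72 ns
wirespeed budget, the measured 4.041 ns fast-path cost is about 60% of
the time available per packet.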
Adding a DEBUG_NET_WARN_ON_ONCE will be annoying, as I like to run my
testlab kernels with CONFIG_DEBUG_NET, which will change this extreme
fast-path slightly (adding some unlikely() annotations, which affect
code layout, to the mix).
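To make the concern concrete, here is a user-space analogue of a
debug-only warn-once check. It is a sketch under assumptions, not the
kernel's definition of DEBUG_NET_WARN_ON_ONCE: TOY_DEBUG_NET,
TOY_DEBUG_WARN_ON_ONCE(), toy_unlikely() and toy_recycle() are
hypothetical names, and __builtin_expect() is a GCC/Clang builtin.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical analogue; TOY_DEBUG_NET plays the role of a debug-only
 * config option. __builtin_expect() requires GCC or Clang. */
#define toy_unlikely(x) __builtin_expect(!!(x), 0)

#ifdef TOY_DEBUG_NET
/* Debug build: adds a branch and a static flag at every call site. */
#define TOY_DEBUG_WARN_ON_ONCE(cond) do {                     \
	static bool warned_;                                  \
	if (toy_unlikely(cond) && !warned_) {                 \
		warned_ = true;                               \
		fprintf(stderr, "toy warning: %s\n", #cond);  \
	}                                                     \
} while (0)
#else
/* Production build: the condition is type-checked, never evaluated. */
#define TOY_DEBUG_WARN_ON_ONCE(cond) ((void)sizeof(!!(cond)))
#endif

/* Example fast-path helper using the check. */
static int toy_recycle(bool pool_destroyed, int cached)
{
	TOY_DEBUG_WARN_ON_ONCE(pool_destroyed);
	return cached + 1;
}
```

Without TOY_DEBUG_NET the check generates no code in toy_recycle(). With
it defined, each call site gains a branch and a flag, which is the kind
of fast-path change described above.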
Question to Liang Chen: Did you hit this bug in practice?
--Jesper
CPU E5-1650 v4 @ 3.60GHz
 tasklet_page_pool01_fast_path Per elem:  14 cycles(tsc)  4.041 ns
 tasklet_page_pool02_ptr_ring  Per elem:  49 cycles(tsc) 13.622 ns
 tasklet_page_pool03_slow      Per elem: 162 cycles(tsc) 45.198 ns
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c