From: Jakub Kicinski <kuba@kernel.org>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Network Development <netdev@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Boqun Feng <boqun.feng@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Eric Dumazet <edumazet@google.com>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@redhat.com>, Paolo Abeni <pabeni@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Cong Wang <xiyou.wangcong@gmail.com>, Hao Luo <haoluo@google.com>,
Jamal Hadi Salim <jhs@mojatatu.com>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Jiri Olsa <jolsa@kernel.org>, Jiri Pirko <jiri@resnulli.us>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Ronak Doshi <doshir@vmware.com>, Song Liu <song@kernel.org>,
Stanislav Fomichev <sdf@google.com>,
VMware PV-Drivers Reviewers <pv-drivers@vmware.com>,
Yonghong Song <yonghong.song@linux.dev>,
bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.
Date: Wed, 17 Jan 2024 18:04:47 -0800 [thread overview]
Message-ID: <20240117180447.2512335b@kernel.org> (raw)
In-Reply-To: <87ttnb6hme.fsf@toke.dk>
On Wed, 17 Jan 2024 17:37:29 +0100 Toke Høiland-Jørgensen wrote:
> I am not contesting that latency is important, but it's a pretty
> fundamental trade-off and we don't want to kill throughput entirely
> either. Especially since this is global to the whole kernel; and there
> are definitely people who want to use XDP on an RT kernel and still
> achieve high PPS rates.
>
> (Whether those people really strictly speaking need to be running an RT
> kernel is maybe debatable, but it does happen).
>
> > I expected the lock operation (under RT) to always succeeds and not
> > cause any delay because it should not be contended.
>
> A lock does cause delay even when it's not contended. Bear in mind that
> at 10 Gbps line rate, we have a budget of 64 nanoseconds to process each
> packet (for 64-byte packets). So just the atomic op to figure out
> whether there's any contention (around 10ns on the Intel processors I
> usually test on) will blow a huge chunk of the total processing budget.
> We can't actually do the full processing needed in those 64 nanoseconds
> (not to mention the 6.4 nanoseconds we have available at 100Gbps), which
> is why it's essential to amortise as much as we can over multiple
> packets.
>
> This is all back-of-the-envelope calculations, of course. Having some
> actual numbers to look at would be great; I don't suppose you have a
> setup where you can run xdp-bench and see how your patches affect the
> throughput?
A potentially stupid idea which I have been turning in my head is
how we could get away from having the driver handle details of NAPI
budgeting. It's an source of bugs and endless review comments.
All drivers end up maintaining a counter of "how many packets have
I processed" and comparing that against the budget. Would it be crazy
if we put that inside napi_struct? Add a "budget" member inside
napi_struct as well, and:
struct napi_struct {
...
// poll state
unsigned int budget;
unsigned int rx_used;
...
}
static inline bool napi_rx_has_budget(napi)
{
return napi->budget > napi->rx_used;
}
poll(napi) // no budget
{
while (napi_rx_has_budget(napi)) {
napi_gro_receive(napi, skb); /* does napi->rx_used++ */
// maybe add explicit napi_rx_count() if
// driver did something funny with the frame.
}
}
We can also create napi_tx_has_budget() so that people stop being
confused whether budget is for Tx or not. And napi_xdp_comp_has_budget()
so that people stop completing XDP in hard irq context (budget==0)...
And we can pass napi into napi_consume_skb(), instead of, presumably
inexplicably to a newcomer, passing in budget.
And napi_complete_done() can lose the work_done argument, too.
Oh, and I'm bringing it up here, because CONFIG_RT can throw
in "need_resched()" into the napi_rx_has_budget(), obviously.
next prev parent reply other threads:[~2024-01-18 2:04 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-15 17:07 [PATCH net-next 00/24] locking: Introduce nested-BH locking Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 01/24] locking/local_lock: Introduce guard definition for local_lock Sebastian Andrzej Siewior
2023-12-18 8:16 ` Paolo Abeni
2024-01-11 16:19 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 02/24] locking/local_lock: Add local nested BH locking infrastructure Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 03/24] net: Use __napi_alloc_frag_align() instead of open coding it Sebastian Andrzej Siewior
2023-12-18 7:48 ` Paolo Abeni
2024-01-12 9:01 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 04/24] net: Use nested-BH locking for napi_alloc_cache Sebastian Andrzej Siewior
2023-12-16 4:43 ` kernel test robot
2024-01-12 10:58 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 05/24] net/tcp_sigpool: Use nested-BH locking for sigpool_scratch Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 06/24] net/ipv4: Use nested-BH locking for ipv4_tcp_sk Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 07/24] netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 08/24] net: softnet_data: Make xmit.recursion per task Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 09/24] dev: Use the RPS lock for softnet_data::input_pkt_queue on PREEMPT_RT Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 10/24] dev: Use nested-BH locking for softnet_data.process_queue Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 11/24] lwt: Don't disable migration prio invoking BPF Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 12/24] seg6: Use nested-BH locking for seg6_bpf_srh_states Sebastian Andrzej Siewior
2023-12-16 3:39 ` kernel test robot
2023-12-18 8:33 ` Paolo Abeni
2024-01-12 11:23 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 13/24] net: Use nested-BH locking for bpf_scratchpad Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 14/24] net: Add a lock which held during the redirect process Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect Sebastian Andrzej Siewior
2023-12-16 4:12 ` kernel test robot
2023-12-20 0:25 ` Alexei Starovoitov
2024-01-04 19:29 ` Toke Høiland-Jørgensen
2024-01-12 17:41 ` Sebastian Andrzej Siewior
2024-01-17 16:37 ` Toke Høiland-Jørgensen
2024-01-18 2:04 ` Jakub Kicinski [this message]
2024-01-18 8:27 ` Sebastian Andrzej Siewior
2024-01-18 16:38 ` Jakub Kicinski
2024-01-18 16:50 ` Sebastian Andrzej Siewior
2024-01-18 11:51 ` Toke Høiland-Jørgensen
2024-01-18 16:37 ` Jakub Kicinski
2024-01-20 14:41 ` Toke Høiland-Jørgensen
2024-01-18 7:35 ` Sebastian Andrzej Siewior
2024-01-18 11:58 ` Toke Høiland-Jørgensen
2023-12-15 17:07 ` [PATCH net-next 16/24] net: netkit, veth, tun, virt*: " Sebastian Andrzej Siewior
2023-12-16 19:28 ` kernel test robot
2023-12-18 8:52 ` Daniel Borkmann
2024-01-12 15:37 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 17/24] net: amazon, aquanti, broadcom, cavium, engleder: " Sebastian Andrzej Siewior
2023-12-16 22:09 ` Kiyanovski, Arthur
2024-01-12 17:53 ` Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 18/24] net: Freescale: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 19/24] net: fungible, gve, mtk, microchip, mana: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 20/24] net: intel: " Sebastian Andrzej Siewior
2023-12-16 4:53 ` kernel test robot
2023-12-19 0:01 ` Nathan Chancellor
2023-12-19 16:55 ` Nick Desaulniers
2023-12-15 17:07 ` [PATCH net-next 21/24] net: marvell: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 22/24] net: mellanox, nfp, sfc: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 23/24] net: qlogic, socionext, stmmac, cpsw: " Sebastian Andrzej Siewior
2023-12-15 17:07 ` [PATCH net-next 24/24] net: bpf: Add lockdep assert for the redirect process Sebastian Andrzej Siewior
2023-12-15 22:50 ` [PATCH net-next 00/24] locking: Introduce nested-BH locking Jakub Kicinski
2023-12-18 17:23 ` Sebastian Andrzej Siewior
2023-12-19 0:41 ` Jakub Kicinski
2023-12-21 20:46 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240117180447.2512335b@kernel.org \
--to=kuba@kernel.org \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=boqun.feng@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=doshir@vmware.com \
--cc=edumazet@google.com \
--cc=frederic@kernel.org \
--cc=haoluo@google.com \
--cc=hawk@kernel.org \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=martin.lau@linux.dev \
--cc=mingo@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=peterz@infradead.org \
--cc=pv-drivers@vmware.com \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=tglx@linutronix.de \
--cc=toke@redhat.com \
--cc=will@kernel.org \
--cc=xiyou.wangcong@gmail.com \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).