From: Yan Zhai <yan@cloudflare.com>
To: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Jiri Pirko" <jiri@resnulli.us>,
"Simon Horman" <horms@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Lorenzo Bianconi" <lorenzo@kernel.org>,
"Coco Li" <lixiaoyan@google.com>, "Wei Wang" <weiwan@google.com>,
"Alexander Duyck" <alexanderduyck@fb.com>,
linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
bpf@vger.kernel.org, kernel-team@cloudflare.com,
"Joel Fernandes" <joel@joelfernandes.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
mark.rutland@arm.com, "Jesper Dangaard Brouer" <hawk@kernel.org>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Subject: [PATCH v5 net 0/3] Report RCU QS for busy network kthreads
Date: Tue, 19 Mar 2024 13:44:30 -0700
Message-ID: <cover.1710877680.git.yan@cloudflare.com>
This changeset fixes a common problem for busy networking kthreads.
These threads, e.g. NAPI threads, typically do the following:
* poll a batch of packets
* if there is more work, call cond_resched() to allow scheduling
* continue to poll more packets while the rx queue is not empty
We observed this to be a problem in production, since it can block RCU
tasks from making progress under heavy load. Investigation indicates
that just calling cond_resched() is insufficient for RCU tasks to reach
quiescent states. It also has the side effect of frequently clearing
the TIF_NEED_RESCHED flag on voluntary preempt kernels. As a result,
schedule() will not be called in these circumstances, even though
schedule() would in fact provide the required quiescent states. This
affects at least NAPI threads, napi_busy_loop, and the cpumap kthread.
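
The loop pattern and its effect on quiescent states can be illustrated with
a small userspace model. This is only a sketch: every name below (struct
model, model_cond_resched(), model_schedule(), model_busy_kthread()) is a
hypothetical stand-in and not a kernel symbol, and the model assumes, as the
description above states, that cond_resched() consumes the reschedule request
without providing the quiescent state that a voluntary schedule() would.

```c
#include <stdbool.h>

/* Toy state; none of these names are kernel symbols. */
struct model {
	bool need_resched;      /* stands in for TIF_NEED_RESCHED */
	unsigned long qs_seen;  /* quiescent states visible to RCU tasks */
	unsigned long pending;  /* packets still waiting in the rx queue */
};

/* Stand-in for cond_resched(): consumes the flag, records no QS. */
static void model_cond_resched(struct model *m)
{
	m->need_resched = false;
}

/* Stand-in for a voluntary schedule(): a QS is noted here. */
static void model_schedule(struct model *m)
{
	m->need_resched = false;
	m->qs_seen++;
}

/* Process up to budget packets from the queue. */
static void model_poll(struct model *m, unsigned long budget)
{
	unsigned long done = m->pending < budget ? m->pending : budget;

	m->pending -= done;
}

/*
 * One activation of the busy kthread: repoll until the queue drains or
 * max_rounds is reached. Returns the number of polling rounds executed.
 */
unsigned long model_busy_kthread(struct model *m, unsigned long budget,
				 unsigned long max_rounds)
{
	unsigned long rounds = 0;

	while (m->pending > 0 && rounds < max_rounds) {
		model_poll(m, budget);
		m->need_resched = true;         /* something asks to reschedule */
		if (m->pending > 0)
			model_cond_resched(m);  /* more work: stay in the loop */
		rounds++;
	}
	if (m->pending == 0)
		model_schedule(m);              /* queue drained: thread sleeps */
	return rounds;
}
```

In this model, a queue of 1000000 packets polled with a budget of 64 for
1000 rounds never drains, so qs_seen stays at 0, while a queue of 100 packets
drains in two rounds and reaches model_schedule() once.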
By reporting RCU QSes in these kthreads periodically before cond_resched(), the
blocked RCU waiters can make progress. Rather than reporting a QS only for
RCU tasks, note that this code shares the concern described in commit
d28139c4e967 ("rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe"),
so a consolidated QS is reported for safety.
It is worth noting that, although this problem is reproducible in
napi_busy_loop, it only shows up when the polling interval is set as high
as 2ms, which is far larger than the 50us-100us recommended in the documentation.
So napi_busy_loop is left untouched.
Lastly, this does not affect RT kernels, which do not enter the scheduler
through cond_resched(). Without the side effect mentioned above, schedule() will
be called from time to time and clear the RCU task holdouts.
V4: https://lore.kernel.org/bpf/cover.1710525524.git.yan@cloudflare.com/
V3: https://lore.kernel.org/lkml/20240314145459.7b3aedf1@kernel.org/t/
V2: https://lore.kernel.org/bpf/ZeFPz4D121TgvCje@debian.debian/
V1: https://lore.kernel.org/lkml/Zd4DXTyCf17lcTfq@debian.debian/#t
changes since v4:
* polished comments and docs for the RCU helper as Paul McKenney suggested
changes since v3:
* fixed kernel-doc errors
changes since v2:
* created a helper in the RCU header to abstract the behavior
* also fixed the cpumap kthread
changes since v1:
* disable preemption first as Paul McKenney suggested
Yan Zhai (3):
rcu: add a helper to report consolidated flavor QS
net: report RCU QS on threaded NAPI repolling
bpf: report RCU QS in cpumap kthread
include/linux/rcupdate.h | 31 +++++++++++++++++++++++++++++++
kernel/bpf/cpumap.c | 3 +++
net/core/dev.c | 3 +++
3 files changed, 37 insertions(+)
--
2.30.2