* [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking
@ 2025-05-12 9:27 Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 01/15] net: page_pool: Don't recycle into cache on PREEMPT_RT Sebastian Andrzej Siewior
` (15 more replies)
0 siblings, 16 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior
I was looking at the build-time defined per-CPU variables in net/ and
added the needed local-BH locks so that the current per-CPU lock in
local_bh_disable() on PREEMPT_RT can eventually be removed.
The work is not yet complete; I just wanted to post what I have so far
instead of sitting on it.
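As a quick orientation, the pattern applied throughout the series looks
roughly like this (a generic sketch with made-up names, not taken from
any single patch): the per-CPU data gains a local_lock_t, which adds
only lockdep annotations on !PREEMPT_RT and becomes a real per-CPU lock
on PREEMPT_RT:

  struct pcpu_thing {			/* hypothetical example struct */
  	local_lock_t	bh_lock;
  	int		data;
  };

  static DEFINE_PER_CPU(struct pcpu_thing, pcpu_thing) = {
  	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
  };

  /* caller runs with BH disabled */
  static void pcpu_thing_update(int v)
  {
  	local_lock_nested_bh(&pcpu_thing.bh_lock);
  	__this_cpu_write(pcpu_thing.data, v);
  	local_unlock_nested_bh(&pcpu_thing.bh_lock);
  }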
v3…v4: https://lore.kernel.org/all/20250430124758.1159480-1-bigeasy@linutronix.de/
- xdp: Extend the locked section and create a single unlock/exit path
  in xdp_copy_frags_from_zc(). Toke asked for this.
- Dropped the netfilter patches.
- Fixed up a typo in the openvswitch comment.
v2…v3: https://lore.kernel.org/all/20250414160754.503321-1-bigeasy@linutronix.de
- openvswitch: Limit the ovs_pcpu_storage::owner assignment (and so
the whole locking procedure in the recursive case) to PREEMPT_RT.
- Added acked-by from the sched folks for "netfilter: nf_dup{4, 6}:
  Move duplication check to task_struct"
v1…v2: https://lore.kernel.org/all/20250309144653.825351-1-bigeasy@linutronix.de
- act_mirred: Use the proper variable on PREEMPT_RT, noticed by Davide
  Caratti.
- openvswitch:
- Renamed the per-CPU variable to ovs_pcpu_storage.
- Moved some data structures from action.c to datapath.h in order
to implement the locking within datapath.c.
Sebastian Andrzej Siewior (15):
net: page_pool: Don't recycle into cache on PREEMPT_RT
net: dst_cache: Use nested-BH locking for dst_cache::cache
ipv4/route: Use this_cpu_inc() for stats on PREEMPT_RT
ipv6: sr: Use nested-BH locking for hmac_storage
xdp: Use nested-BH locking for system_page_pool
xfrm: Use nested-BH locking for nat_keepalive_sk_ipv[46]
openvswitch: Merge three per-CPU structures into one
openvswitch: Use nested-BH locking for ovs_pcpu_storage
openvswitch: Move ovs_frag_data_storage into the struct
ovs_pcpu_storage
net/sched: act_mirred: Move the recursion counter to struct netdev_xmit
net/sched: Use nested-BH locking for sch_frag_data_storage
mptcp: Use nested-BH locking for hmac_storage
rds: Disable only bottom halves in rds_page_remainder_alloc()
rds: Acquire per-CPU pointer within BH disabled section
rds: Use nested-BH locking for rds_page_remainder
include/linux/netdevice.h | 7 ++-
include/linux/netdevice_xmit.h | 3 ++
net/core/dev.c | 15 ++++--
net/core/dst_cache.c | 30 ++++++++++--
net/core/page_pool.c | 4 ++
net/core/xdp.c | 15 ++++--
net/ipv4/route.c | 4 ++
net/ipv6/seg6_hmac.c | 13 ++++-
net/mptcp/protocol.c | 4 +-
net/mptcp/protocol.h | 9 +++-
net/openvswitch/actions.c | 86 +++++-----------------------------
net/openvswitch/datapath.c | 33 +++++++++----
net/openvswitch/datapath.h | 52 ++++++++++++++++++--
net/rds/page.c | 25 +++++-----
net/sched/act_mirred.c | 28 +++++++++--
net/sched/sch_frag.c | 10 +++-
net/xfrm/xfrm_nat_keepalive.c | 30 ++++++++----
17 files changed, 241 insertions(+), 127 deletions(-)
--
2.49.0
* [PATCH net-next v4 01/15] net: page_pool: Don't recycle into cache on PREEMPT_RT
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 02/15] net: dst_cache: Use nested-BH locking for dst_cache::cache Sebastian Andrzej Siewior
` (14 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Jesper Dangaard Brouer, Ilias Apalodimas
With preemptible softirq and no per-CPU locking in local_bh_disable()
on PREEMPT_RT, the consumer can be preempted while an skb is being
returned.
Avoid the race by disabling recycling into the per-CPU cache on
PREEMPT_RT.
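A sketch of the interleaving this guards against (on PREEMPT_RT,
softirq handlers run in preemptible task context):

  task A (in_softirq())           task B (in_softirq())
  ---------------------           ---------------------
  page_pool_napi_local() == true
  lockless write to
  pool->alloc.cache
      <preempted>
                                  page_pool_napi_local() == true
                                  concurrent lockless access to
                                  pool->alloc.cache

Returning false from page_pool_napi_local() on PREEMPT_RT routes both
into the locked ptr_ring path instead.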
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/core/page_pool.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2d..ba8803c2c0b20 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -805,6 +805,10 @@ static bool page_pool_napi_local(const struct page_pool *pool)
const struct napi_struct *napi;
u32 cpuid;
+ /* On PREEMPT_RT the softirq can be preempted by the consumer */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ return false;
+
if (unlikely(!in_softirq()))
return false;
--
2.49.0
* [PATCH net-next v4 02/15] net: dst_cache: Use nested-BH locking for dst_cache::cache
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 01/15] net: page_pool: Don't recycle into cache on PREEMPT_RT Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 03/15] ipv4/route: Use this_cpu_inc() for stats on PREEMPT_RT Sebastian Andrzej Siewior
` (13 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior
dst_cache::cache is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. This change adds only lockdep
coverage and does not alter the functional behaviour for !PREEMPT_RT.
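A hypothetical caller sketch (not part of the patch) showing that the
API is unchanged - the new bh_lock is taken inside the helpers, so
existing callers need no modification:

  struct rtable *rt;
  __be32 saddr;

  rt = dst_cache_get_ip4(&tun->dst_cache, &saddr); /* lock taken internally */
  if (!rt) {
  	rt = ...;	/* full route lookup on a cache miss */
  	dst_cache_set_ip4(&tun->dst_cache, &rt->dst, saddr);
  }

(tun->dst_cache stands in for whatever structure embeds the
struct dst_cache.)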
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/core/dst_cache.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 70c634b9e7b02..93a04d18e5054 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -17,6 +17,7 @@
struct dst_cache_pcpu {
unsigned long refresh_ts;
struct dst_entry *dst;
+ local_lock_t bh_lock;
u32 cookie;
union {
struct in_addr in_saddr;
@@ -65,10 +66,15 @@ static struct dst_entry *dst_cache_per_cpu_get(struct dst_cache *dst_cache,
struct dst_entry *dst_cache_get(struct dst_cache *dst_cache)
{
+ struct dst_entry *dst;
+
if (!dst_cache->cache)
return NULL;
- return dst_cache_per_cpu_get(dst_cache, this_cpu_ptr(dst_cache->cache));
+ local_lock_nested_bh(&dst_cache->cache->bh_lock);
+ dst = dst_cache_per_cpu_get(dst_cache, this_cpu_ptr(dst_cache->cache));
+ local_unlock_nested_bh(&dst_cache->cache->bh_lock);
+ return dst;
}
EXPORT_SYMBOL_GPL(dst_cache_get);
@@ -80,12 +86,16 @@ struct rtable *dst_cache_get_ip4(struct dst_cache *dst_cache, __be32 *saddr)
if (!dst_cache->cache)
return NULL;
+ local_lock_nested_bh(&dst_cache->cache->bh_lock);
idst = this_cpu_ptr(dst_cache->cache);
dst = dst_cache_per_cpu_get(dst_cache, idst);
- if (!dst)
+ if (!dst) {
+ local_unlock_nested_bh(&dst_cache->cache->bh_lock);
return NULL;
+ }
*saddr = idst->in_saddr.s_addr;
+ local_unlock_nested_bh(&dst_cache->cache->bh_lock);
return dst_rtable(dst);
}
EXPORT_SYMBOL_GPL(dst_cache_get_ip4);
@@ -98,9 +108,11 @@ void dst_cache_set_ip4(struct dst_cache *dst_cache, struct dst_entry *dst,
if (!dst_cache->cache)
return;
+ local_lock_nested_bh(&dst_cache->cache->bh_lock);
idst = this_cpu_ptr(dst_cache->cache);
dst_cache_per_cpu_dst_set(idst, dst, 0);
idst->in_saddr.s_addr = saddr;
+ local_unlock_nested_bh(&dst_cache->cache->bh_lock);
}
EXPORT_SYMBOL_GPL(dst_cache_set_ip4);
@@ -113,10 +125,13 @@ void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
if (!dst_cache->cache)
return;
+ local_lock_nested_bh(&dst_cache->cache->bh_lock);
+
idst = this_cpu_ptr(dst_cache->cache);
dst_cache_per_cpu_dst_set(idst, dst,
rt6_get_cookie(dst_rt6_info(dst)));
idst->in6_saddr = *saddr;
+ local_unlock_nested_bh(&dst_cache->cache->bh_lock);
}
EXPORT_SYMBOL_GPL(dst_cache_set_ip6);
@@ -129,12 +144,17 @@ struct dst_entry *dst_cache_get_ip6(struct dst_cache *dst_cache,
if (!dst_cache->cache)
return NULL;
+ local_lock_nested_bh(&dst_cache->cache->bh_lock);
+
idst = this_cpu_ptr(dst_cache->cache);
dst = dst_cache_per_cpu_get(dst_cache, idst);
- if (!dst)
+ if (!dst) {
+ local_unlock_nested_bh(&dst_cache->cache->bh_lock);
return NULL;
+ }
*saddr = idst->in6_saddr;
+ local_unlock_nested_bh(&dst_cache->cache->bh_lock);
return dst;
}
EXPORT_SYMBOL_GPL(dst_cache_get_ip6);
@@ -142,10 +162,14 @@ EXPORT_SYMBOL_GPL(dst_cache_get_ip6);
int dst_cache_init(struct dst_cache *dst_cache, gfp_t gfp)
{
+ unsigned int i;
+
dst_cache->cache = alloc_percpu_gfp(struct dst_cache_pcpu,
gfp | __GFP_ZERO);
if (!dst_cache->cache)
return -ENOMEM;
+ for_each_possible_cpu(i)
+ local_lock_init(&per_cpu_ptr(dst_cache->cache, i)->bh_lock);
dst_cache_reset(dst_cache);
return 0;
--
2.49.0
* [PATCH net-next v4 03/15] ipv4/route: Use this_cpu_inc() for stats on PREEMPT_RT
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 01/15] net: page_pool: Don't recycle into cache on PREEMPT_RT Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 02/15] net: dst_cache: Use nested-BH locking for dst_cache::cache Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 04/15] ipv6: sr: Use nested-BH locking for hmac_storage Sebastian Andrzej Siewior
` (12 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
David Ahern
The statistics are incremented with raw_cpu_inc() assuming it always
happens with bottom half disabled. Without per-CPU locking in
local_bh_disable() on PREEMPT_RT this is no longer true.
Use this_cpu_inc() on PREEMPT_RT so that the increment is safe against
preemption.
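For illustration, a sketch (not part of the patch) using the existing
in_slow_tot counter:

  /* raw_cpu_inc(rt_cache_stat.in_slow_tot) is roughly: */
  struct rt_cache_stat *st = raw_cpu_ptr(&rt_cache_stat);
  st->in_slow_tot++;	/* RMW, can be preempted mid-way on PREEMPT_RT */

  /* this_cpu_inc() does the same increment as one preemption-safe
   * operation (a single instruction on x86, an internally protected
   * sequence on other architectures):
   */
  this_cpu_inc(rt_cache_stat.in_slow_tot);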
Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/ipv4/route.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 753704f75b2c6..5d7c7efea66cc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -189,7 +189,11 @@ const __u8 ip_tos2prio[16] = {
EXPORT_SYMBOL(ip_tos2prio);
static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
+#ifndef CONFIG_PREEMPT_RT
#define RT_CACHE_STAT_INC(field) raw_cpu_inc(rt_cache_stat.field)
+#else
+#define RT_CACHE_STAT_INC(field) this_cpu_inc(rt_cache_stat.field)
+#endif
#ifdef CONFIG_PROC_FS
static void *rt_cache_seq_start(struct seq_file *seq, loff_t *pos)
--
2.49.0
* [PATCH net-next v4 04/15] ipv6: sr: Use nested-BH locking for hmac_storage
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (2 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 03/15] ipv4/route: Use this_cpu_inc() for stats on PREEMPT_RT Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 05/15] xdp: Use nested-BH locking for system_page_pool Sebastian Andrzej Siewior
` (11 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
David Ahern
hmac_storage is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. This change adds only lockdep
coverage and does not alter the functional behaviour for !PREEMPT_RT.
Cc: David Ahern <dsahern@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/ipv6/seg6_hmac.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/seg6_hmac.c b/net/ipv6/seg6_hmac.c
index bbf5b84a70fca..f78ecb6ad8383 100644
--- a/net/ipv6/seg6_hmac.c
+++ b/net/ipv6/seg6_hmac.c
@@ -40,7 +40,14 @@
#include <net/seg6_hmac.h>
#include <linux/random.h>
-static DEFINE_PER_CPU(char [SEG6_HMAC_RING_SIZE], hmac_ring);
+struct hmac_storage {
+ local_lock_t bh_lock;
+ char hmac_ring[SEG6_HMAC_RING_SIZE];
+};
+
+static DEFINE_PER_CPU(struct hmac_storage, hmac_storage) = {
+ .bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
static int seg6_hmac_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
{
@@ -187,7 +194,8 @@ int seg6_hmac_compute(struct seg6_hmac_info *hinfo, struct ipv6_sr_hdr *hdr,
*/
local_bh_disable();
- ring = this_cpu_ptr(hmac_ring);
+ local_lock_nested_bh(&hmac_storage.bh_lock);
+ ring = this_cpu_ptr(hmac_storage.hmac_ring);
off = ring;
/* source address */
@@ -212,6 +220,7 @@ int seg6_hmac_compute(struct seg6_hmac_info *hinfo, struct ipv6_sr_hdr *hdr,
dgsize = __do_hmac(hinfo, ring, plen, tmp_out,
SEG6_HMAC_MAX_DIGESTSIZE);
+ local_unlock_nested_bh(&hmac_storage.bh_lock);
local_bh_enable();
if (dgsize < 0)
--
2.49.0
* [PATCH net-next v4 05/15] xdp: Use nested-BH locking for system_page_pool
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (3 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 04/15] ipv6: sr: Use nested-BH locking for hmac_storage Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 06/15] xfrm: Use nested-BH locking for nat_keepalive_sk_ipv[46] Sebastian Andrzej Siewior
` (10 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Andrew Lunn, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend,
Toke Høiland-Jørgensen
system_page_pool is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
Wrap the original system_page_pool pointer together with a local_lock_t
in a new struct and use local_lock_nested_bh() for locking. This change
adds only lockdep coverage and does not alter the functional behaviour
for !PREEMPT_RT.
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
include/linux/netdevice.h | 7 ++++++-
net/core/dev.c | 15 ++++++++++-----
net/core/xdp.c | 15 ++++++++++-----
3 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7ea022750e4e0..138bd7f3d2bef 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3502,7 +3502,12 @@ struct softnet_data {
};
DECLARE_PER_CPU_ALIGNED(struct softnet_data, softnet_data);
-DECLARE_PER_CPU(struct page_pool *, system_page_pool);
+
+struct page_pool_bh {
+ struct page_pool *pool;
+ local_lock_t bh_lock;
+};
+DECLARE_PER_CPU(struct page_pool_bh, system_page_pool);
#ifndef CONFIG_PREEMPT_RT
static inline int dev_recursion_level(void)
diff --git a/net/core/dev.c b/net/core/dev.c
index 11da1e272ec20..3c49366cda560 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -462,7 +462,9 @@ EXPORT_PER_CPU_SYMBOL(softnet_data);
* PP consumers must pay attention to run APIs in the appropriate context
* (e.g. NAPI context).
*/
-DEFINE_PER_CPU(struct page_pool *, system_page_pool);
+DEFINE_PER_CPU(struct page_pool_bh, system_page_pool) = {
+ .bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
#ifdef CONFIG_LOCKDEP
/*
@@ -5238,7 +5240,10 @@ netif_skb_check_for_xdp(struct sk_buff **pskb, const struct bpf_prog *prog)
struct sk_buff *skb = *pskb;
int err, hroom, troom;
- if (!skb_cow_data_for_xdp(this_cpu_read(system_page_pool), pskb, prog))
+ local_lock_nested_bh(&system_page_pool.bh_lock);
+ err = skb_cow_data_for_xdp(this_cpu_read(system_page_pool.pool), pskb, prog);
+ local_unlock_nested_bh(&system_page_pool.bh_lock);
+ if (!err)
return 0;
/* In case we have to go down the path and also linearize,
@@ -12619,7 +12624,7 @@ static int net_page_pool_create(int cpuid)
return err;
}
- per_cpu(system_page_pool, cpuid) = pp_ptr;
+ per_cpu(system_page_pool.pool, cpuid) = pp_ptr;
#endif
return 0;
}
@@ -12749,13 +12754,13 @@ static int __init net_dev_init(void)
for_each_possible_cpu(i) {
struct page_pool *pp_ptr;
- pp_ptr = per_cpu(system_page_pool, i);
+ pp_ptr = per_cpu(system_page_pool.pool, i);
if (!pp_ptr)
continue;
xdp_unreg_page_pool(pp_ptr);
page_pool_destroy(pp_ptr);
- per_cpu(system_page_pool, i) = NULL;
+ per_cpu(system_page_pool.pool, i) = NULL;
}
}
diff --git a/net/core/xdp.c b/net/core/xdp.c
index f86eedad586a7..8696e8ffa3bcc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -737,25 +737,27 @@ static noinline bool xdp_copy_frags_from_zc(struct sk_buff *skb,
*/
struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
{
- struct page_pool *pp = this_cpu_read(system_page_pool);
const struct xdp_rxq_info *rxq = xdp->rxq;
u32 len = xdp->data_end - xdp->data_meta;
u32 truesize = xdp->frame_sz;
- struct sk_buff *skb;
+ struct sk_buff *skb = NULL;
+ struct page_pool *pp;
int metalen;
void *data;
if (!IS_ENABLED(CONFIG_PAGE_POOL))
return NULL;
+ local_lock_nested_bh(&system_page_pool.bh_lock);
+ pp = this_cpu_read(system_page_pool.pool);
data = page_pool_dev_alloc_va(pp, &truesize);
if (unlikely(!data))
- return NULL;
+ goto out;
skb = napi_build_skb(data, truesize);
if (unlikely(!skb)) {
page_pool_free_va(pp, data, true);
- return NULL;
+ goto out;
}
skb_mark_for_recycle(skb);
@@ -774,13 +776,16 @@ struct sk_buff *xdp_build_skb_from_zc(struct xdp_buff *xdp)
if (unlikely(xdp_buff_has_frags(xdp)) &&
unlikely(!xdp_copy_frags_from_zc(skb, xdp, pp))) {
napi_consume_skb(skb, true);
- return NULL;
+ skb = NULL;
+ goto out;
}
xsk_buff_free(xdp);
skb->protocol = eth_type_trans(skb, rxq->dev);
+out:
+ local_unlock_nested_bh(&system_page_pool.bh_lock);
return skb;
}
EXPORT_SYMBOL_GPL(xdp_build_skb_from_zc);
--
2.49.0
* [PATCH net-next v4 06/15] xfrm: Use nested-BH locking for nat_keepalive_sk_ipv[46]
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (4 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 05/15] xdp: Use nested-BH locking for system_page_pool Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 07/15] openvswitch: Merge three per-CPU structures into one Sebastian Andrzej Siewior
` (9 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Steffen Klassert, Herbert Xu
nat_keepalive_sk_ipv[46] is a per-CPU variable and relies on disabled BH
for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.
Use struct sock_bh_locked, which bundles a sock pointer with a
local_lock_t, and use local_lock_nested_bh() for locking. This change
adds only lockdep coverage and does not alter the functional behaviour
for !PREEMPT_RT.
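For reference, struct sock_bh_locked already exists in
include/net/sock.h (introduced by the earlier ipv4 nested-BH
conversion) and looks roughly like:

  struct sock_bh_locked {
  	struct sock	*sock;
  	local_lock_t	bh_lock;
  };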
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/xfrm/xfrm_nat_keepalive.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/net/xfrm/xfrm_nat_keepalive.c b/net/xfrm/xfrm_nat_keepalive.c
index 82f0a301683f0..ebf95d48e86c1 100644
--- a/net/xfrm/xfrm_nat_keepalive.c
+++ b/net/xfrm/xfrm_nat_keepalive.c
@@ -9,9 +9,13 @@
#include <net/ip6_checksum.h>
#include <net/xfrm.h>
-static DEFINE_PER_CPU(struct sock *, nat_keepalive_sk_ipv4);
+static DEFINE_PER_CPU(struct sock_bh_locked, nat_keepalive_sk_ipv4) = {
+ .bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
#if IS_ENABLED(CONFIG_IPV6)
-static DEFINE_PER_CPU(struct sock *, nat_keepalive_sk_ipv6);
+static DEFINE_PER_CPU(struct sock_bh_locked, nat_keepalive_sk_ipv6) = {
+ .bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
#endif
struct nat_keepalive {
@@ -56,10 +60,12 @@ static int nat_keepalive_send_ipv4(struct sk_buff *skb,
skb_dst_set(skb, &rt->dst);
- sk = *this_cpu_ptr(&nat_keepalive_sk_ipv4);
+ local_lock_nested_bh(&nat_keepalive_sk_ipv4.bh_lock);
+ sk = this_cpu_read(nat_keepalive_sk_ipv4.sock);
sock_net_set(sk, net);
err = ip_build_and_send_pkt(skb, sk, fl4.saddr, fl4.daddr, NULL, tos);
sock_net_set(sk, &init_net);
+ local_unlock_nested_bh(&nat_keepalive_sk_ipv4.bh_lock);
return err;
}
@@ -89,15 +95,19 @@ static int nat_keepalive_send_ipv6(struct sk_buff *skb,
fl6.fl6_sport = ka->encap_sport;
fl6.fl6_dport = ka->encap_dport;
- sk = *this_cpu_ptr(&nat_keepalive_sk_ipv6);
+ local_lock_nested_bh(&nat_keepalive_sk_ipv6.bh_lock);
+ sk = this_cpu_read(nat_keepalive_sk_ipv6.sock);
sock_net_set(sk, net);
dst = ipv6_stub->ipv6_dst_lookup_flow(net, sk, &fl6, NULL);
- if (IS_ERR(dst))
+ if (IS_ERR(dst)) {
+ local_unlock_nested_bh(&nat_keepalive_sk_ipv6.bh_lock);
return PTR_ERR(dst);
+ }
skb_dst_set(skb, dst);
err = ipv6_stub->ip6_xmit(sk, skb, &fl6, skb->mark, NULL, 0, 0);
sock_net_set(sk, &init_net);
+ local_unlock_nested_bh(&nat_keepalive_sk_ipv6.bh_lock);
return err;
}
#endif
@@ -202,7 +212,7 @@ static void nat_keepalive_work(struct work_struct *work)
(ctx.next_run - ctx.now) * HZ);
}
-static int nat_keepalive_sk_init(struct sock * __percpu *socks,
+static int nat_keepalive_sk_init(struct sock_bh_locked __percpu *socks,
unsigned short family)
{
struct sock *sk;
@@ -214,22 +224,22 @@ static int nat_keepalive_sk_init(struct sock * __percpu *socks,
if (err < 0)
goto err;
- *per_cpu_ptr(socks, i) = sk;
+ per_cpu_ptr(socks, i)->sock = sk;
}
return 0;
err:
for_each_possible_cpu(i)
- inet_ctl_sock_destroy(*per_cpu_ptr(socks, i));
+ inet_ctl_sock_destroy(per_cpu_ptr(socks, i)->sock);
return err;
}
-static void nat_keepalive_sk_fini(struct sock * __percpu *socks)
+static void nat_keepalive_sk_fini(struct sock_bh_locked __percpu *socks)
{
int i;
for_each_possible_cpu(i)
- inet_ctl_sock_destroy(*per_cpu_ptr(socks, i));
+ inet_ctl_sock_destroy(per_cpu_ptr(socks, i)->sock);
}
void xfrm_nat_keepalive_state_updated(struct xfrm_state *x)
--
2.49.0
* [PATCH net-next v4 07/15] openvswitch: Merge three per-CPU structures into one
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (5 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 06/15] xfrm: Use nested-BH locking for nat_keepalive_sk_ipv[46] Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-14 14:13 ` [ovs-dev] " Aaron Conole
2025-05-12 9:27 ` [PATCH net-next v4 08/15] openvswitch: Use nested-BH locking for ovs_pcpu_storage Sebastian Andrzej Siewior
` (8 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Aaron Conole, Eelco Chaudron, Ilya Maximets, dev
exec_actions_level is a per-CPU integer allocated at compile time.
action_fifos and flow_keys are per-CPU pointers whose data is allocated
at module init time.
There is no gain in keeping them separate: once the module is loaded,
all of the structures are allocated anyway.
Merge the three per-CPU variables into ovs_pcpu_storage and adapt the
callers.
Cc: Aaron Conole <aconole@redhat.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: dev@openvswitch.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/openvswitch/actions.c | 49 +++++++++++++-------------------------
net/openvswitch/datapath.c | 9 +------
net/openvswitch/datapath.h | 3 ---
3 files changed, 17 insertions(+), 44 deletions(-)
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 2f22ca59586f2..7e4a8d41b9ed6 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -78,17 +78,22 @@ struct action_flow_keys {
struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD];
};
-static struct action_fifo __percpu *action_fifos;
-static struct action_flow_keys __percpu *flow_keys;
-static DEFINE_PER_CPU(int, exec_actions_level);
+struct ovs_pcpu_storage {
+ struct action_fifo action_fifos;
+ struct action_flow_keys flow_keys;
+ int exec_level;
+};
+
+static DEFINE_PER_CPU(struct ovs_pcpu_storage, ovs_pcpu_storage);
/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
* space. Return NULL if out of key spaces.
*/
static struct sw_flow_key *clone_key(const struct sw_flow_key *key_)
{
- struct action_flow_keys *keys = this_cpu_ptr(flow_keys);
- int level = this_cpu_read(exec_actions_level);
+ struct ovs_pcpu_storage *ovs_pcpu = this_cpu_ptr(&ovs_pcpu_storage);
+ struct action_flow_keys *keys = &ovs_pcpu->flow_keys;
+ int level = ovs_pcpu->exec_level;
struct sw_flow_key *key = NULL;
if (level <= OVS_DEFERRED_ACTION_THRESHOLD) {
@@ -132,10 +137,9 @@ static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
const struct nlattr *actions,
const int actions_len)
{
- struct action_fifo *fifo;
+ struct action_fifo *fifo = this_cpu_ptr(&ovs_pcpu_storage.action_fifos);
struct deferred_action *da;
- fifo = this_cpu_ptr(action_fifos);
da = action_fifo_put(fifo);
if (da) {
da->skb = skb;
@@ -1608,13 +1612,13 @@ static int clone_execute(struct datapath *dp, struct sk_buff *skb,
if (actions) { /* Sample action */
if (clone_flow_key)
- __this_cpu_inc(exec_actions_level);
+ __this_cpu_inc(ovs_pcpu_storage.exec_level);
err = do_execute_actions(dp, skb, clone,
actions, len);
if (clone_flow_key)
- __this_cpu_dec(exec_actions_level);
+ __this_cpu_dec(ovs_pcpu_storage.exec_level);
} else { /* Recirc action */
clone->recirc_id = recirc_id;
ovs_dp_process_packet(skb, clone);
@@ -1650,7 +1654,7 @@ static int clone_execute(struct datapath *dp, struct sk_buff *skb,
static void process_deferred_actions(struct datapath *dp)
{
- struct action_fifo *fifo = this_cpu_ptr(action_fifos);
+ struct action_fifo *fifo = this_cpu_ptr(&ovs_pcpu_storage.action_fifos);
/* Do not touch the FIFO in case there is no deferred actions. */
if (action_fifo_is_empty(fifo))
@@ -1681,7 +1685,7 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
{
int err, level;
- level = __this_cpu_inc_return(exec_actions_level);
+ level = __this_cpu_inc_return(ovs_pcpu_storage.exec_level);
if (unlikely(level > OVS_RECURSION_LIMIT)) {
net_crit_ratelimited("ovs: recursion limit reached on datapath %s, probable configuration error\n",
ovs_dp_name(dp));
@@ -1698,27 +1702,6 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
process_deferred_actions(dp);
out:
- __this_cpu_dec(exec_actions_level);
+ __this_cpu_dec(ovs_pcpu_storage.exec_level);
return err;
}
-
-int action_fifos_init(void)
-{
- action_fifos = alloc_percpu(struct action_fifo);
- if (!action_fifos)
- return -ENOMEM;
-
- flow_keys = alloc_percpu(struct action_flow_keys);
- if (!flow_keys) {
- free_percpu(action_fifos);
- return -ENOMEM;
- }
-
- return 0;
-}
-
-void action_fifos_exit(void)
-{
- free_percpu(action_fifos);
- free_percpu(flow_keys);
-}
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 5d548eda742df..aaa6277bb49c2 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2729,13 +2729,9 @@ static int __init dp_init(void)
pr_info("Open vSwitch switching datapath\n");
- err = action_fifos_init();
- if (err)
- goto error;
-
err = ovs_internal_dev_rtnl_link_register();
if (err)
- goto error_action_fifos_exit;
+ goto error;
err = ovs_flow_init();
if (err)
@@ -2778,8 +2774,6 @@ static int __init dp_init(void)
ovs_flow_exit();
error_unreg_rtnl_link:
ovs_internal_dev_rtnl_link_unregister();
-error_action_fifos_exit:
- action_fifos_exit();
error:
return err;
}
@@ -2795,7 +2789,6 @@ static void dp_cleanup(void)
ovs_vport_exit();
ovs_flow_exit();
ovs_internal_dev_rtnl_link_unregister();
- action_fifos_exit();
}
module_init(dp_init);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 384ca77f4e794..a126407926058 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -281,9 +281,6 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
void ovs_dp_notify_wq(struct work_struct *work);
-int action_fifos_init(void);
-void action_fifos_exit(void);
-
/* 'KEY' must not have any bits set outside of the 'MASK' */
#define OVS_MASKED(OLD, KEY, MASK) ((KEY) | ((OLD) & ~(MASK)))
#define OVS_SET_MASKED(OLD, KEY, MASK) ((OLD) = OVS_MASKED(OLD, KEY, MASK))
--
2.49.0
* [PATCH net-next v4 08/15] openvswitch: Use nested-BH locking for ovs_pcpu_storage
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (6 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 07/15] openvswitch: Merge three per-CPU structures into one Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-14 14:14 ` [ovs-dev] " Aaron Conole
2025-05-12 9:27 ` [PATCH net-next v4 09/15] openvswitch: Move ovs_frag_data_storage into the struct ovs_pcpu_storage Sebastian Andrzej Siewior
` (7 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Aaron Conole, Eelco Chaudron, Ilya Maximets, dev
ovs_pcpu_storage is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
The data structure can be referenced recursively, and a recursion
counter avoids excessive recursion.
Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. Add an owner field recording the
current task and acquire the lock only if the structure is not already
owned by the current task.
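Condensed, the scheme used in ovs_dp_process_packet() (see the diff
below) is:

  struct ovs_pcpu_storage *ovs_pcpu = this_cpu_ptr(&ovs_pcpu_storage);
  bool locked = false;

  if (IS_ENABLED(CONFIG_PREEMPT_RT) && ovs_pcpu->owner != current) {
  	local_lock_nested_bh(&ovs_pcpu_storage.bh_lock);
  	ovs_pcpu->owner = current;
  	locked = true;
  }
  /* execute actions; this may re-enter the same path recursively */
  if (locked) {
  	ovs_pcpu->owner = NULL;
  	local_unlock_nested_bh(&ovs_pcpu_storage.bh_lock);
  }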
Cc: Aaron Conole <aconole@redhat.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: dev@openvswitch.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/openvswitch/actions.c | 31 ++-----------------------------
net/openvswitch/datapath.c | 24 ++++++++++++++++++++++++
net/openvswitch/datapath.h | 33 +++++++++++++++++++++++++++++++++
3 files changed, 59 insertions(+), 29 deletions(-)
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 7e4a8d41b9ed6..435725c27a557 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -39,15 +39,6 @@
#include "flow_netlink.h"
#include "openvswitch_trace.h"
-struct deferred_action {
- struct sk_buff *skb;
- const struct nlattr *actions;
- int actions_len;
-
- /* Store pkt_key clone when creating deferred action. */
- struct sw_flow_key pkt_key;
-};
-
#define MAX_L2_LEN (VLAN_ETH_HLEN + 3 * MPLS_HLEN)
struct ovs_frag_data {
unsigned long dst;
@@ -64,28 +55,10 @@ struct ovs_frag_data {
static DEFINE_PER_CPU(struct ovs_frag_data, ovs_frag_data_storage);
-#define DEFERRED_ACTION_FIFO_SIZE 10
-#define OVS_RECURSION_LIMIT 5
-#define OVS_DEFERRED_ACTION_THRESHOLD (OVS_RECURSION_LIMIT - 2)
-struct action_fifo {
- int head;
- int tail;
- /* Deferred action fifo queue storage. */
- struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE];
+DEFINE_PER_CPU(struct ovs_pcpu_storage, ovs_pcpu_storage) = {
+ .bh_lock = INIT_LOCAL_LOCK(bh_lock),
};
-struct action_flow_keys {
- struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD];
-};
-
-struct ovs_pcpu_storage {
- struct action_fifo action_fifos;
- struct action_flow_keys flow_keys;
- int exec_level;
-};
-
-static DEFINE_PER_CPU(struct ovs_pcpu_storage, ovs_pcpu_storage);
-
/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
* space. Return NULL if out of key spaces.
*/
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index aaa6277bb49c2..6a304ae2d959c 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -244,11 +244,13 @@ void ovs_dp_detach_port(struct vport *p)
/* Must be called with rcu_read_lock. */
void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
{
+ struct ovs_pcpu_storage *ovs_pcpu = this_cpu_ptr(&ovs_pcpu_storage);
const struct vport *p = OVS_CB(skb)->input_vport;
struct datapath *dp = p->dp;
struct sw_flow *flow;
struct sw_flow_actions *sf_acts;
struct dp_stats_percpu *stats;
+ bool ovs_pcpu_locked = false;
u64 *stats_counter;
u32 n_mask_hit;
u32 n_cache_hit;
@@ -290,10 +292,26 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
ovs_flow_stats_update(flow, key->tp.flags, skb);
sf_acts = rcu_dereference(flow->sf_acts);
+ /* This path can be invoked recursively: Use the current task to
+ * identify recursive invocation - the lock must be acquired only once.
+ * Even with disabled bottom halves this can be preempted on PREEMPT_RT.
+ * Limit the locking to RT to avoid assigning `owner' if it can be
+ * avoided.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RT) && ovs_pcpu->owner != current) {
+ local_lock_nested_bh(&ovs_pcpu_storage.bh_lock);
+ ovs_pcpu->owner = current;
+ ovs_pcpu_locked = true;
+ }
+
error = ovs_execute_actions(dp, skb, sf_acts, key);
if (unlikely(error))
net_dbg_ratelimited("ovs: action execution error on datapath %s: %d\n",
ovs_dp_name(dp), error);
+ if (ovs_pcpu_locked) {
+ ovs_pcpu->owner = NULL;
+ local_unlock_nested_bh(&ovs_pcpu_storage.bh_lock);
+ }
stats_counter = &stats->n_hit;
@@ -671,7 +689,13 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
sf_acts = rcu_dereference(flow->sf_acts);
local_bh_disable();
+ local_lock_nested_bh(&ovs_pcpu_storage.bh_lock);
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ this_cpu_write(ovs_pcpu_storage.owner, current);
err = ovs_execute_actions(dp, packet, sf_acts, &flow->key);
+ if (IS_ENABLED(CONFIG_PREEMPT_RT))
+ this_cpu_write(ovs_pcpu_storage.owner, NULL);
+ local_unlock_nested_bh(&ovs_pcpu_storage.bh_lock);
local_bh_enable();
rcu_read_unlock();
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index a126407926058..4a665c3cfa906 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -173,6 +173,39 @@ struct ovs_net {
bool xt_label;
};
+struct deferred_action {
+ struct sk_buff *skb;
+ const struct nlattr *actions;
+ int actions_len;
+
+ /* Store pkt_key clone when creating deferred action. */
+ struct sw_flow_key pkt_key;
+};
+
+#define DEFERRED_ACTION_FIFO_SIZE 10
+#define OVS_RECURSION_LIMIT 5
+#define OVS_DEFERRED_ACTION_THRESHOLD (OVS_RECURSION_LIMIT - 2)
+
+struct action_fifo {
+ int head;
+ int tail;
+ /* Deferred action fifo queue storage. */
+ struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE];
+};
+
+struct action_flow_keys {
+ struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD];
+};
+
+struct ovs_pcpu_storage {
+ struct action_fifo action_fifos;
+ struct action_flow_keys flow_keys;
+ int exec_level;
+ struct task_struct *owner;
+ local_lock_t bh_lock;
+};
+DECLARE_PER_CPU(struct ovs_pcpu_storage, ovs_pcpu_storage);
+
/**
* enum ovs_pkt_hash_types - hash info to include with a packet
* to send to userspace.
--
2.49.0
* [PATCH net-next v4 09/15] openvswitch: Move ovs_frag_data_storage into the struct ovs_pcpu_storage
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (7 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 08/15] openvswitch: Use nested-BH locking for ovs_pcpu_storage Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-14 14:14 ` [ovs-dev] " Aaron Conole
2025-05-12 9:27 ` [PATCH net-next v4 10/15] net/sched: act_mirred: Move the recursion counter to struct netdev_xmit Sebastian Andrzej Siewior
` (6 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Aaron Conole, Eelco Chaudron, Ilya Maximets, dev
ovs_frag_data_storage is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
Move ovs_frag_data_storage into struct ovs_pcpu_storage, which already
provides locking for the structure.
Cc: Aaron Conole <aconole@redhat.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: Ilya Maximets <i.maximets@ovn.org>
Cc: dev@openvswitch.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/openvswitch/actions.c | 20 ++------------------
net/openvswitch/datapath.h | 16 ++++++++++++++++
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 435725c27a557..e7269a3eec79e 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -39,22 +39,6 @@
#include "flow_netlink.h"
#include "openvswitch_trace.h"
-#define MAX_L2_LEN (VLAN_ETH_HLEN + 3 * MPLS_HLEN)
-struct ovs_frag_data {
- unsigned long dst;
- struct vport *vport;
- struct ovs_skb_cb cb;
- __be16 inner_protocol;
- u16 network_offset; /* valid only for MPLS */
- u16 vlan_tci;
- __be16 vlan_proto;
- unsigned int l2_len;
- u8 mac_proto;
- u8 l2_data[MAX_L2_LEN];
-};
-
-static DEFINE_PER_CPU(struct ovs_frag_data, ovs_frag_data_storage);
-
DEFINE_PER_CPU(struct ovs_pcpu_storage, ovs_pcpu_storage) = {
.bh_lock = INIT_LOCAL_LOCK(bh_lock),
};
@@ -771,7 +755,7 @@ static int set_sctp(struct sk_buff *skb, struct sw_flow_key *flow_key,
static int ovs_vport_output(struct net *net, struct sock *sk,
struct sk_buff *skb)
{
- struct ovs_frag_data *data = this_cpu_ptr(&ovs_frag_data_storage);
+ struct ovs_frag_data *data = this_cpu_ptr(&ovs_pcpu_storage.frag_data);
struct vport *vport = data->vport;
if (skb_cow_head(skb, data->l2_len) < 0) {
@@ -823,7 +807,7 @@ static void prepare_frag(struct vport *vport, struct sk_buff *skb,
unsigned int hlen = skb_network_offset(skb);
struct ovs_frag_data *data;
- data = this_cpu_ptr(&ovs_frag_data_storage);
+ data = this_cpu_ptr(&ovs_pcpu_storage.frag_data);
data->dst = skb->_skb_refdst;
data->vport = vport;
data->cb = *OVS_CB(skb);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 4a665c3cfa906..1b5348b0f5594 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -13,6 +13,7 @@
#include <linux/skbuff.h>
#include <linux/u64_stats_sync.h>
#include <net/ip_tunnels.h>
+#include <net/mpls.h>
#include "conntrack.h"
#include "flow.h"
@@ -173,6 +174,20 @@ struct ovs_net {
bool xt_label;
};
+#define MAX_L2_LEN (VLAN_ETH_HLEN + 3 * MPLS_HLEN)
+struct ovs_frag_data {
+ unsigned long dst;
+ struct vport *vport;
+ struct ovs_skb_cb cb;
+ __be16 inner_protocol;
+ u16 network_offset; /* valid only for MPLS */
+ u16 vlan_tci;
+ __be16 vlan_proto;
+ unsigned int l2_len;
+ u8 mac_proto;
+ u8 l2_data[MAX_L2_LEN];
+};
+
struct deferred_action {
struct sk_buff *skb;
const struct nlattr *actions;
@@ -200,6 +215,7 @@ struct action_flow_keys {
struct ovs_pcpu_storage {
struct action_fifo action_fifos;
struct action_flow_keys flow_keys;
+ struct ovs_frag_data frag_data;
int exec_level;
struct task_struct *owner;
local_lock_t bh_lock;
--
2.49.0
* [PATCH net-next v4 10/15] net/sched: act_mirred: Move the recursion counter to struct netdev_xmit
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (8 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 09/15] openvswitch: Move ovs_frag_data_storage into the struct ovs_pcpu_storage Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-15 9:55 ` Paolo Abeni
2025-05-12 9:27 ` [PATCH net-next v4 11/15] net/sched: Use nested-BH locking for sch_frag_data_storage Sebastian Andrzej Siewior
` (5 subsequent siblings)
15 siblings, 1 reply; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Jamal Hadi Salim, Cong Wang, Jiri Pirko
mirred_nest_level is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.
Move mirred_nest_level into struct netdev_xmit as a u8 and provide wrappers.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
include/linux/netdevice_xmit.h | 3 +++
net/sched/act_mirred.c | 28 +++++++++++++++++++++++++---
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/include/linux/netdevice_xmit.h b/include/linux/netdevice_xmit.h
index 38325e0702968..848735b3a7c02 100644
--- a/include/linux/netdevice_xmit.h
+++ b/include/linux/netdevice_xmit.h
@@ -8,6 +8,9 @@ struct netdev_xmit {
#ifdef CONFIG_NET_EGRESS
u8 skip_txqueue;
#endif
+#if IS_ENABLED(CONFIG_NET_ACT_MIRRED)
+ u8 sched_mirred_nest;
+#endif
};
#endif
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 5b38143659249..5f01f567c934d 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -30,7 +30,29 @@ static LIST_HEAD(mirred_list);
static DEFINE_SPINLOCK(mirred_list_lock);
#define MIRRED_NEST_LIMIT 4
-static DEFINE_PER_CPU(unsigned int, mirred_nest_level);
+
+#ifndef CONFIG_PREEMPT_RT
+static u8 tcf_mirred_nest_level_inc_return(void)
+{
+ return __this_cpu_inc_return(softnet_data.xmit.sched_mirred_nest);
+}
+
+static void tcf_mirred_nest_level_dec(void)
+{
+ __this_cpu_dec(softnet_data.xmit.sched_mirred_nest);
+}
+
+#else
+static u8 tcf_mirred_nest_level_inc_return(void)
+{
+ return current->net_xmit.sched_mirred_nest++;
+}
+
+static void tcf_mirred_nest_level_dec(void)
+{
+ current->net_xmit.sched_mirred_nest--;
+}
+#endif
static bool tcf_mirred_is_act_redirect(int action)
{
@@ -423,7 +445,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
int m_eaction;
u32 blockid;
- nest_level = __this_cpu_inc_return(mirred_nest_level);
+ nest_level = tcf_mirred_nest_level_inc_return();
if (unlikely(nest_level > MIRRED_NEST_LIMIT)) {
net_warn_ratelimited("Packet exceeded mirred recursion limit on dev %s\n",
netdev_name(skb->dev));
@@ -454,7 +476,7 @@ TC_INDIRECT_SCOPE int tcf_mirred_act(struct sk_buff *skb,
retval);
dec_nest_level:
- __this_cpu_dec(mirred_nest_level);
+ tcf_mirred_nest_level_dec();
return retval;
}
--
2.49.0
* [PATCH net-next v4 11/15] net/sched: Use nested-BH locking for sch_frag_data_storage
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (9 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 10/15] net/sched: act_mirred: Move the recursion counter to struct netdev_xmit Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 12/15] mptcp: Use nested-BH locking for hmac_storage Sebastian Andrzej Siewior
` (4 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Jamal Hadi Salim, Cong Wang, Jiri Pirko
sch_frag_data_storage is a per-CPU variable and relies on disabled BH
for its locking. Without per-CPU locking in local_bh_disable() on
PREEMPT_RT this data structure requires explicit locking.
Add local_lock_t to the struct and use local_lock_nested_bh() for locking.
This change adds only lockdep coverage and does not alter the functional
behaviour for !PREEMPT_RT.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/sched/sch_frag.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_frag.c b/net/sched/sch_frag.c
index ce63414185fd6..d1d87dce7f3f7 100644
--- a/net/sched/sch_frag.c
+++ b/net/sched/sch_frag.c
@@ -16,14 +16,18 @@ struct sch_frag_data {
unsigned int l2_len;
u8 l2_data[VLAN_ETH_HLEN];
int (*xmit)(struct sk_buff *skb);
+ local_lock_t bh_lock;
};
-static DEFINE_PER_CPU(struct sch_frag_data, sch_frag_data_storage);
+static DEFINE_PER_CPU(struct sch_frag_data, sch_frag_data_storage) = {
+ .bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
static int sch_frag_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
{
struct sch_frag_data *data = this_cpu_ptr(&sch_frag_data_storage);
+ lockdep_assert_held(&data->bh_lock);
if (skb_cow_head(skb, data->l2_len) < 0) {
kfree_skb(skb);
return -ENOMEM;
@@ -95,6 +99,7 @@ static int sch_fragment(struct net *net, struct sk_buff *skb,
struct rtable sch_frag_rt = { 0 };
unsigned long orig_dst;
+ local_lock_nested_bh(&sch_frag_data_storage.bh_lock);
sch_frag_prepare_frag(skb, xmit);
dst_init(&sch_frag_rt.dst, &sch_frag_dst_ops, NULL,
DST_OBSOLETE_NONE, DST_NOCOUNT);
@@ -105,11 +110,13 @@ static int sch_fragment(struct net *net, struct sk_buff *skb,
IPCB(skb)->frag_max_size = mru;
ret = ip_do_fragment(net, skb->sk, skb, sch_frag_xmit);
+ local_unlock_nested_bh(&sch_frag_data_storage.bh_lock);
refdst_drop(orig_dst);
} else if (skb_protocol(skb, true) == htons(ETH_P_IPV6)) {
unsigned long orig_dst;
struct rt6_info sch_frag_rt;
+ local_lock_nested_bh(&sch_frag_data_storage.bh_lock);
sch_frag_prepare_frag(skb, xmit);
memset(&sch_frag_rt, 0, sizeof(sch_frag_rt));
dst_init(&sch_frag_rt.dst, &sch_frag_dst_ops, NULL,
@@ -122,6 +129,7 @@ static int sch_fragment(struct net *net, struct sk_buff *skb,
ret = ipv6_stub->ipv6_fragment(net, skb->sk, skb,
sch_frag_xmit);
+ local_unlock_nested_bh(&sch_frag_data_storage.bh_lock);
refdst_drop(orig_dst);
} else {
net_warn_ratelimited("Fail frag %s: eth=%x, MRU=%d, MTU=%d\n",
--
2.49.0
* [PATCH net-next v4 12/15] mptcp: Use nested-BH locking for hmac_storage
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (10 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 11/15] net/sched: Use nested-BH locking for sch_frag_data_storage Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 13/15] rds: Disable only bottom halves in rds_page_remainder_alloc() Sebastian Andrzej Siewior
` (3 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Matthieu Baerts, Mat Martineau, Geliang Tang, mptcp
mptcp_delegated_actions is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data
structure requires explicit locking.
Add a local_lock_t to the data structure and use local_lock_nested_bh() for
locking. This change adds only lockdep coverage and does not alter the
functional behaviour for !PREEMPT_RT.
Cc: Matthieu Baerts <matttbe@kernel.org>
Cc: Mat Martineau <martineau@kernel.org>
Cc: Geliang Tang <geliang@kernel.org>
Cc: mptcp@lists.linux.dev
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/mptcp/protocol.c | 4 +++-
net/mptcp/protocol.h | 9 ++++++++-
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 44f7ab463d755..b6cb93a3f9a2d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -46,7 +46,9 @@ static struct percpu_counter mptcp_sockets_allocated ____cacheline_aligned_in_sm
static void __mptcp_destroy_sock(struct sock *sk);
static void mptcp_check_send_data_fin(struct sock *sk);
-DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
+DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions) = {
+ .bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
static struct net_device *mptcp_napi_dev;
/* Returns end sequence number of the receiver's advertised window */
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index d409586b5977f..88cc2a857adce 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -479,6 +479,7 @@ mptcp_subflow_rsk(const struct request_sock *rsk)
struct mptcp_delegated_action {
struct napi_struct napi;
+ local_lock_t bh_lock;
struct list_head head;
};
@@ -670,9 +671,11 @@ static inline void mptcp_subflow_delegate(struct mptcp_subflow_context *subflow,
if (WARN_ON_ONCE(!list_empty(&subflow->delegated_node)))
return;
+ local_lock_nested_bh(&mptcp_delegated_actions.bh_lock);
delegated = this_cpu_ptr(&mptcp_delegated_actions);
schedule = list_empty(&delegated->head);
list_add_tail(&subflow->delegated_node, &delegated->head);
+ local_unlock_nested_bh(&mptcp_delegated_actions.bh_lock);
sock_hold(mptcp_subflow_tcp_sock(subflow));
if (schedule)
napi_schedule(&delegated->napi);
@@ -684,11 +687,15 @@ mptcp_subflow_delegated_next(struct mptcp_delegated_action *delegated)
{
struct mptcp_subflow_context *ret;
- if (list_empty(&delegated->head))
+ local_lock_nested_bh(&mptcp_delegated_actions.bh_lock);
+ if (list_empty(&delegated->head)) {
+ local_unlock_nested_bh(&mptcp_delegated_actions.bh_lock);
return NULL;
+ }
ret = list_first_entry(&delegated->head, struct mptcp_subflow_context, delegated_node);
list_del_init(&ret->delegated_node);
+ local_unlock_nested_bh(&mptcp_delegated_actions.bh_lock);
return ret;
}
--
2.49.0
* [PATCH net-next v4 13/15] rds: Disable only bottom halves in rds_page_remainder_alloc()
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (11 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 12/15] mptcp: Use nested-BH locking for hmac_storage Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 14/15] rds: Acquire per-CPU pointer within BH disabled section Sebastian Andrzej Siewior
` (2 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Allison Henderson, linux-rdma
rds_page_remainder_alloc() is invoked from a preemptible context or a
tasklet. There is no need to disable interrupts for locking.
Use local_bh_disable() instead of local_irq_save() for locking.
Cc: Allison Henderson <allison.henderson@oracle.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/rds/page.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/rds/page.c b/net/rds/page.c
index 7cc57e098ddb9..e0dd4f62ea47a 100644
--- a/net/rds/page.c
+++ b/net/rds/page.c
@@ -69,7 +69,6 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
gfp_t gfp)
{
struct rds_page_remainder *rem;
- unsigned long flags;
struct page *page;
int ret;
@@ -88,7 +87,7 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
}
rem = &per_cpu(rds_page_remainders, get_cpu());
- local_irq_save(flags);
+ local_bh_disable();
while (1) {
/* avoid a tiny region getting stuck by tossing it */
@@ -116,13 +115,13 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
}
/* alloc if there is nothing for us to use */
- local_irq_restore(flags);
+ local_bh_enable();
put_cpu();
page = alloc_page(gfp);
rem = &per_cpu(rds_page_remainders, get_cpu());
- local_irq_save(flags);
+ local_bh_disable();
if (!page) {
ret = -ENOMEM;
@@ -140,7 +139,7 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
rem->r_offset = 0;
}
- local_irq_restore(flags);
+ local_bh_enable();
put_cpu();
out:
rdsdebug("bytes %lu ret %d %p %u %u\n", bytes, ret,
--
2.49.0
* [PATCH net-next v4 14/15] rds: Acquire per-CPU pointer within BH disabled section
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (12 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 13/15] rds: Disable only bottom halves in rds_page_remainder_alloc() Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-12 9:27 ` [PATCH net-next v4 15/15] rds: Use nested-BH locking for rds_page_remainder Sebastian Andrzej Siewior
2025-05-15 13:40 ` [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking patchwork-bot+netdevbpf
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Allison Henderson, linux-rdma
rds_page_remainder_alloc() obtains the current CPU with get_cpu(),
which also disables preemption. The CPU number is then used to access
the per-CPU data structure via per_cpu().
This can be optimized by relying on local_bh_disable() to provide a
stable CPU number (it prevents migration) and then using this_cpu_ptr()
to retrieve the data structure.
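Schematically (condensed from the diff below):

  /* before: get_cpu() disables preemption to pin the CPU number */
  rem = &per_cpu(rds_page_remainders, get_cpu());
  local_bh_disable();
  ...
  local_bh_enable();
  put_cpu();

  /* after: local_bh_disable() already prevents migration */
  local_bh_disable();
  rem = this_cpu_ptr(&rds_page_remainders);
  ...
  local_bh_enable();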
Cc: Allison Henderson <allison.henderson@oracle.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/rds/page.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/rds/page.c b/net/rds/page.c
index e0dd4f62ea47a..58a8548a915a9 100644
--- a/net/rds/page.c
+++ b/net/rds/page.c
@@ -86,8 +86,8 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
goto out;
}
- rem = &per_cpu(rds_page_remainders, get_cpu());
local_bh_disable();
+ rem = this_cpu_ptr(&rds_page_remainders);
while (1) {
/* avoid a tiny region getting stuck by tossing it */
@@ -116,12 +116,11 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
/* alloc if there is nothing for us to use */
local_bh_enable();
- put_cpu();
page = alloc_page(gfp);
- rem = &per_cpu(rds_page_remainders, get_cpu());
local_bh_disable();
+ rem = this_cpu_ptr(&rds_page_remainders);
if (!page) {
ret = -ENOMEM;
@@ -140,7 +139,6 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
}
local_bh_enable();
- put_cpu();
out:
rdsdebug("bytes %lu ret %d %p %u %u\n", bytes, ret,
ret ? NULL : sg_page(scat), ret ? 0 : scat->offset,
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
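Condensed, the before/after shape of this change looks as follows
(context elided, variable names as in the diff):

	/* before: preemption is disabled twice, by get_cpu() and again
	 * implicitly by local_bh_disable()
	 */
	rem = &per_cpu(rds_page_remainders, get_cpu());
	local_bh_disable();
	/* ... use rem ... */
	local_bh_enable();
	put_cpu();

	/* after: disabled BH alone pins the task to the CPU */
	local_bh_disable();
	rem = this_cpu_ptr(&rds_page_remainders);
	/* ... use rem ... */
	local_bh_enable();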
* [PATCH net-next v4 15/15] rds: Use nested-BH locking for rds_page_remainder
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (13 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 14/15] rds: Acquire per-CPU pointer within BH disabled section Sebastian Andrzej Siewior
@ 2025-05-12 9:27 ` Sebastian Andrzej Siewior
2025-05-15 13:40 ` [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking patchwork-bot+netdevbpf
15 siblings, 0 replies; 22+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-05-12 9:27 UTC (permalink / raw)
To: netdev, linux-rt-devel
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Thomas Gleixner, Sebastian Andrzej Siewior,
Allison Henderson, linux-rdma
rds_page_remainder is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT,
this data structure requires explicit locking.
Add a local_lock_t to the data structure and use
local_lock_nested_bh() for locking. This change adds only lockdep
coverage and does not alter the functional behaviour for !PREEMPT_RT.
Cc: Allison Henderson <allison.henderson@oracle.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/rds/page.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/rds/page.c b/net/rds/page.c
index 58a8548a915a9..afb151eac271c 100644
--- a/net/rds/page.c
+++ b/net/rds/page.c
@@ -40,10 +40,12 @@
struct rds_page_remainder {
struct page *r_page;
unsigned long r_offset;
+ local_lock_t bh_lock;
};
-static
-DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_page_remainder, rds_page_remainders);
+static DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_page_remainder, rds_page_remainders) = {
+ .bh_lock = INIT_LOCAL_LOCK(bh_lock),
+};
/**
* rds_page_remainder_alloc - build up regions of a message.
@@ -87,6 +89,7 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
}
local_bh_disable();
+ local_lock_nested_bh(&rds_page_remainders.bh_lock);
rem = this_cpu_ptr(&rds_page_remainders);
while (1) {
@@ -115,11 +118,13 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
}
/* alloc if there is nothing for us to use */
+ local_unlock_nested_bh(&rds_page_remainders.bh_lock);
local_bh_enable();
page = alloc_page(gfp);
local_bh_disable();
+ local_lock_nested_bh(&rds_page_remainders.bh_lock);
rem = this_cpu_ptr(&rds_page_remainders);
if (!page) {
@@ -138,6 +143,7 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
rem->r_offset = 0;
}
+ local_unlock_nested_bh(&rds_page_remainders.bh_lock);
local_bh_enable();
out:
rdsdebug("bytes %lu ret %d %p %u %u\n", bytes, ret,
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
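The idiom introduced here recurs throughout the series. A self-contained
sketch with generic names (not the RDS code): on !PREEMPT_RT,
local_lock_nested_bh() compiles down to lockdep annotations, while on
PREEMPT_RT it takes a real per-CPU lock in place of the protection that
local_bh_disable() no longer guarantees.

#include <linux/bottom_half.h>
#include <linux/local_lock.h>
#include <linux/percpu.h>

struct pcpu_state {
	unsigned long data;
	local_lock_t bh_lock;
};
static DEFINE_PER_CPU(struct pcpu_state, pcpu_state) = {
	.bh_lock = INIT_LOCAL_LOCK(bh_lock),
};

static void touch_state(void)
{
	struct pcpu_state *s;

	local_bh_disable();
	/* lockdep-only on !PREEMPT_RT, a real per-CPU lock on PREEMPT_RT */
	local_lock_nested_bh(&pcpu_state.bh_lock);
	s = this_cpu_ptr(&pcpu_state);
	s->data++;
	local_unlock_nested_bh(&pcpu_state.bh_lock);
	local_bh_enable();
}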
* Re: [ovs-dev] [PATCH net-next v4 07/15] openvswitch: Merge three per-CPU structures into one
2025-05-12 9:27 ` [PATCH net-next v4 07/15] openvswitch: Merge three per-CPU structures into one Sebastian Andrzej Siewior
@ 2025-05-14 14:13 ` Aaron Conole
0 siblings, 0 replies; 22+ messages in thread
From: Aaron Conole @ 2025-05-14 14:13 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: netdev, linux-rt-devel, dev, Ilya Maximets, Eric Dumazet,
Simon Horman, Jakub Kicinski, Thomas Gleixner, Paolo Abeni,
David S. Miller
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> exec_actions_level is a per-CPU integer allocated at compile time.
> action_fifos and flow_keys are per-CPU pointers and have their data
> allocated at module init time.
> There is no gain in keeping them separate: once the module is loaded,
> the structures are allocated anyway.
>
> Merge the three per-CPU variables into ovs_pcpu_storage, adapt callers.
>
> Cc: Aaron Conole <aconole@redhat.com>
> Cc: Eelco Chaudron <echaudro@redhat.com>
> Cc: Ilya Maximets <i.maximets@ovn.org>
> Cc: dev@openvswitch.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
Reviewed-by: Aaron Conole <aconole@redhat.com>
^ permalink raw reply [flat|nested] 22+ messages in thread
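A sketch of the merged layout described in the commit message (member
names are assumed for illustration; the real definitions of struct
action_fifo and struct action_flow_keys live in net/openvswitch/):

struct ovs_pcpu_storage {
	struct action_fifo action_fifos;
	struct action_flow_keys flow_keys;
	int exec_level;	/* the former standalone exec_actions_level */
};

static DEFINE_PER_CPU(struct ovs_pcpu_storage, ovs_pcpu_storage);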
* Re: [ovs-dev] [PATCH net-next v4 08/15] openvswitch: Use nested-BH locking for ovs_pcpu_storage
2025-05-12 9:27 ` [PATCH net-next v4 08/15] openvswitch: Use nested-BH locking for ovs_pcpu_storage Sebastian Andrzej Siewior
@ 2025-05-14 14:14 ` Aaron Conole
0 siblings, 0 replies; 22+ messages in thread
From: Aaron Conole @ 2025-05-14 14:14 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: netdev, linux-rt-devel, dev, Ilya Maximets, Eric Dumazet,
Simon Horman, Jakub Kicinski, Thomas Gleixner, Paolo Abeni,
David S. Miller
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> ovs_pcpu_storage is a per-CPU variable and relies on disabled BH for its
> locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
> this data structure requires explicit locking.
> The data structure can be referenced recursively, and there is a
> recursion counter to avoid excessive recursion.
>
> Add a local_lock_t to the data structure and use
> local_lock_nested_bh() for locking. Add an owner of the struct which is
> the current task and acquire the lock only if the structure is not owned
> by the current task.
>
> Cc: Aaron Conole <aconole@redhat.com>
> Cc: Eelco Chaudron <echaudro@redhat.com>
> Cc: Ilya Maximets <i.maximets@ovn.org>
> Cc: dev@openvswitch.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
Reviewed-by: Aaron Conole <aconole@redhat.com>
^ permalink raw reply [flat|nested] 22+ messages in thread
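A hypothetical sketch of the owner-gated locking described above (field
and variable names are assumed, and the exact PREEMPT_RT gating in the
applied patch may differ):

	/* hypothetical call site in the datapath receive path */
	struct ovs_pcpu_storage *ovs_pcpu = this_cpu_ptr(&ovs_pcpu_storage);
	bool locked = false;

	if (READ_ONCE(ovs_pcpu->owner) != current) {
		local_lock_nested_bh(&ovs_pcpu_storage.bh_lock);
		WRITE_ONCE(ovs_pcpu->owner, current);
		locked = true;
	}

	/* ... execute the actions; output to an internal port may
	 * re-enter this path on the same CPU in the same task ...
	 */

	if (locked) {
		WRITE_ONCE(ovs_pcpu->owner, NULL);
		local_unlock_nested_bh(&ovs_pcpu_storage.bh_lock);
	}

The owner check is what keeps the recursive case from deadlocking: a
re-entering task finds itself as the owner and skips the lock it already
holds.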
* Re: [ovs-dev] [PATCH net-next v4 09/15] openvswitch: Move ovs_frag_data_storage into the struct ovs_pcpu_storage
2025-05-12 9:27 ` [PATCH net-next v4 09/15] openvswitch: Move ovs_frag_data_storage into the struct ovs_pcpu_storage Sebastian Andrzej Siewior
@ 2025-05-14 14:14 ` Aaron Conole
0 siblings, 0 replies; 22+ messages in thread
From: Aaron Conole @ 2025-05-14 14:14 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: netdev, linux-rt-devel, dev, Ilya Maximets, Eric Dumazet,
Simon Horman, Jakub Kicinski, Thomas Gleixner, Paolo Abeni,
David S. Miller
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> ovs_frag_data_storage is a per-CPU variable and relies on disabled BH for its
> locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
> this data structure requires explicit locking.
>
> Move ovs_frag_data_storage into the struct ovs_pcpu_storage which already
> provides locking for the structure.
>
> Cc: Aaron Conole <aconole@redhat.com>
> Cc: Eelco Chaudron <echaudro@redhat.com>
> Cc: Ilya Maximets <i.maximets@ovn.org>
> Cc: dev@openvswitch.org
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
Reviewed-by: Aaron Conole <aconole@redhat.com>
^ permalink raw reply [flat|nested] 22+ messages in thread
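Continuing the sketch from patch 07, the fragment scratch space becomes
one more member of the locked per-CPU struct (member names again
assumed):

struct ovs_pcpu_storage {
	struct action_fifo action_fifos;
	struct action_flow_keys flow_keys;
	struct ovs_frag_data frag_data;	/* was ovs_frag_data_storage */
	int exec_level;
	struct task_struct *owner;
	local_lock_t bh_lock;
};

Accesses to frag_data then happen under the bh_lock taken in patch 08,
so no separate lock is required.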
* Re: [PATCH net-next v4 10/15] net/sched: act_mirred: Move the recursion counter struct netdev_xmit
2025-05-12 9:27 ` [PATCH net-next v4 10/15] net/sched: act_mirred: Move the recursion counter struct netdev_xmit Sebastian Andrzej Siewior
@ 2025-05-15 9:55 ` Paolo Abeni
2025-05-15 12:49 ` Juri Lelli
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2025-05-15 9:55 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, netdev, linux-rt-devel, Ingo Molnar,
Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Simon Horman,
Thomas Gleixner, Jamal Hadi Salim, Cong Wang, Jiri Pirko
CC sched maintainers.
On 5/12/25 11:27 AM, Sebastian Andrzej Siewior wrote:
> mirred_nest_level is a per-CPU variable and relies on disabled BH for its
> locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
> this data structure requires explicit locking.
>
> Move mirred_nest_level to struct netdev_xmit as u8, provide wrappers.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> include/linux/netdevice_xmit.h | 3 +++
> net/sched/act_mirred.c | 28 +++++++++++++++++++++++++---
> 2 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/netdevice_xmit.h b/include/linux/netdevice_xmit.h
> index 38325e0702968..848735b3a7c02 100644
> --- a/include/linux/netdevice_xmit.h
> +++ b/include/linux/netdevice_xmit.h
> @@ -8,6 +8,9 @@ struct netdev_xmit {
> #ifdef CONFIG_NET_EGRESS
> u8 skip_txqueue;
> #endif
> +#if IS_ENABLED(CONFIG_NET_ACT_MIRRED)
> + u8 sched_mirred_nest;
> +#endif
> };
The above struct is embedded into task_struct in RT builds. The new field
*should* use an existing hole. According to my weak knowledge in that
area, the task_struct binary layout is critical, so an explicit ack from
SMEs would be nice.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next v4 10/15] net/sched: act_mirred: Move the recursion counter struct netdev_xmit
2025-05-15 9:55 ` Paolo Abeni
@ 2025-05-15 12:49 ` Juri Lelli
0 siblings, 0 replies; 22+ messages in thread
From: Juri Lelli @ 2025-05-15 12:49 UTC (permalink / raw)
To: Paolo Abeni
Cc: Sebastian Andrzej Siewior, netdev, linux-rt-devel, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, David S. Miller, Eric Dumazet,
Jakub Kicinski, Simon Horman, Thomas Gleixner, Jamal Hadi Salim,
Cong Wang, Jiri Pirko
On 15/05/25 11:55, Paolo Abeni wrote:
> CC sched maintainers.
>
> On 5/12/25 11:27 AM, Sebastian Andrzej Siewior wrote:
> > mirred_nest_level is a per-CPU variable and relies on disabled BH for its
> > locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
> > this data structure requires explicit locking.
> >
> > Move mirred_nest_level to struct netdev_xmit as u8, provide wrappers.
> >
> > Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> > Cc: Cong Wang <xiyou.wangcong@gmail.com>
> > Cc: Jiri Pirko <jiri@resnulli.us>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> > include/linux/netdevice_xmit.h | 3 +++
> > net/sched/act_mirred.c | 28 +++++++++++++++++++++++++---
> > 2 files changed, 28 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/netdevice_xmit.h b/include/linux/netdevice_xmit.h
> > index 38325e0702968..848735b3a7c02 100644
> > --- a/include/linux/netdevice_xmit.h
> > +++ b/include/linux/netdevice_xmit.h
> > @@ -8,6 +8,9 @@ struct netdev_xmit {
> > #ifdef CONFIG_NET_EGRESS
> > u8 skip_txqueue;
> > #endif
> > +#if IS_ENABLED(CONFIG_NET_ACT_MIRRED)
> > + u8 sched_mirred_nest;
> > +#endif
> > };
>
> The above struct is embedded into task_struct in RT build. The new field
> *should* use an existing hole. According to my weak knowledge in that
> area the task_struct binary layout is a critical, an explicit ack from
> SMEs would be nice.
Agreed. The new field still fits in an existing task_struct hole.
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Best,
Juri
^ permalink raw reply [flat|nested] 22+ messages in thread
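A hypothetical pair of wrappers matching the description (helper names
assumed): struct netdev_xmit sits in per-CPU softnet_data on !PREEMPT_RT
and is embedded in task_struct on PREEMPT_RT, which is what the layout
discussion above is about. In both cases the counter is implicitly
serialized, per-CPU by disabled BH or per-task by ownership.

#include <linux/netdevice.h>	/* softnet_data */
#include <linux/sched.h>	/* current */
#include <linux/types.h>

#ifndef CONFIG_PREEMPT_RT
/* callers run with BH disabled, so __this_cpu ops are safe */
static u8 mirred_nest_level_inc_return(void)
{
	return __this_cpu_inc_return(softnet_data.xmit.sched_mirred_nest);
}

static void mirred_nest_level_dec(void)
{
	__this_cpu_dec(softnet_data.xmit.sched_mirred_nest);
}
#else
/* the counter lives in the task, so migration is harmless */
static u8 mirred_nest_level_inc_return(void)
{
	return ++current->net_xmit.sched_mirred_nest;
}

static void mirred_nest_level_dec(void)
{
	current->net_xmit.sched_mirred_nest--;
}
#endif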
* Re: [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking
2025-05-12 9:27 [PATCH net-next v4 00/15] net: Cover more per-CPU storage with local nested BH locking Sebastian Andrzej Siewior
` (14 preceding siblings ...)
2025-05-12 9:27 ` [PATCH net-next v4 15/15] rds: Use nested-BH locking for rds_page_remainder Sebastian Andrzej Siewior
@ 2025-05-15 13:40 ` patchwork-bot+netdevbpf
15 siblings, 0 replies; 22+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-05-15 13:40 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: netdev, linux-rt-devel, davem, edumazet, kuba, pabeni, horms,
tglx
Hello:
This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Mon, 12 May 2025 11:27:21 +0200 you wrote:
> I was looking at the build-time defined per-CPU variables in net/ and
> added the needed local-BH-locks in order to be able to remove the
> current per-CPU lock in local_bh_disable() on PREEMPT_RT.
>
> The work is not yet complete, I just wanted to post what I have so far
> instead of sitting on it.
>
> [...]
Here is the summary with links:
- [net-next,v4,01/15] net: page_pool: Don't recycle into cache on PREEMPT_RT
https://git.kernel.org/netdev/net-next/c/32471b2f481d
- [net-next,v4,02/15] net: dst_cache: Use nested-BH locking for dst_cache::cache
https://git.kernel.org/netdev/net-next/c/c99dac52ffad
- [net-next,v4,03/15] ipv4/route: Use this_cpu_inc() for stats on PREEMPT_RT
https://git.kernel.org/netdev/net-next/c/1c0829788a6e
- [net-next,v4,04/15] ipv6: sr: Use nested-BH locking for hmac_storage
https://git.kernel.org/netdev/net-next/c/bc57eda646ce
- [net-next,v4,05/15] xdp: Use nested-BH locking for system_page_pool
https://git.kernel.org/netdev/net-next/c/b9eef3391de0
- [net-next,v4,06/15] xfrm: Use nested-BH locking for nat_keepalive_sk_ipv[46]
https://git.kernel.org/netdev/net-next/c/9c607d4b6589
- [net-next,v4,07/15] openvswitch: Merge three per-CPU structures into one
https://git.kernel.org/netdev/net-next/c/035fcdc4d240
- [net-next,v4,08/15] openvswitch: Use nested-BH locking for ovs_pcpu_storage
https://git.kernel.org/netdev/net-next/c/672318331b44
- [net-next,v4,09/15] openvswitch: Move ovs_frag_data_storage into the struct ovs_pcpu_storage
https://git.kernel.org/netdev/net-next/c/3af4cdd67f32
- [net-next,v4,10/15] net/sched: act_mirred: Move the recursion counter struct netdev_xmit
https://git.kernel.org/netdev/net-next/c/7fe70c06a182
- [net-next,v4,11/15] net/sched: Use nested-BH locking for sch_frag_data_storage
https://git.kernel.org/netdev/net-next/c/20d677d389e7
- [net-next,v4,12/15] mptcp: Use nested-BH locking for hmac_storage
https://git.kernel.org/netdev/net-next/c/82d9e6b9a0a1
- [net-next,v4,13/15] rds: Disable only bottom halves in rds_page_remainder_alloc()
https://git.kernel.org/netdev/net-next/c/aaaaa6639cf5
- [net-next,v4,14/15] rds: Acquire per-CPU pointer within BH disabled section
https://git.kernel.org/netdev/net-next/c/0af5928f358c
- [net-next,v4,15/15] rds: Use nested-BH locking for rds_page_remainder
https://git.kernel.org/netdev/net-next/c/c50d295c37f2
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 22+ messages in thread