* xfrm in RT @ 2024-12-18 0:07 Alexei Starovoitov 2024-12-18 8:32 ` Steffen Klassert 0 siblings, 1 reply; 8+ messages in thread From: Alexei Starovoitov @ 2024-12-18 0:07 UTC (permalink / raw) To: Sebastian Sewior, Network Development, Jakub Kicinski, Steffen Klassert Hi, Looks like xfrm isn't friendly to PREEMPT_RT. xfrm_input_state_lookup() is doing: int cpu = get_cpu(); ... spin_lock_bh(&net->xfrm.xfrm_state_lock); which causes a splat: [ 811.175877] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [ 811.175884] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 13785, name: ping [ 811.175886] preempt_count: 1, expected: 0 [ 811.175888] RCU nest depth: 6, expected: 6 [ 811.175889] INFO: lockdep is turned off. [ 811.175891] CPU: 9 UID: 0 PID: 13785 Comm: ping Tainted: G W O 6.13.0-rc3-00067-ga3c4183875d5-dirty #2 [ 811.175900] Call Trace: [ 811.175901] <TASK> [ 811.175902] dump_stack_lvl+0x80/0x90 [ 811.175911] __might_resched+0x2c7/0x480 [ 811.175917] rt_spin_lock+0xbd/0x240 [ 811.175922] ? rtlock_slowlock_locked+0x4cd0/0x4cd0 [ 811.175925] xfrm_input_state_lookup+0x643/0xa10 [ 811.175930] ? skb_ext_add+0x4dd/0x690 [ 811.175934] ? xfrm_state_lookup+0x1d0/0x1d0 [ 811.175937] ? __asan_memset+0x23/0x40 [ 811.175940] xfrm_input+0x78c/0x5820 [ 811.175943] ? reacquire_held_locks+0x4d0/0x4d0 [ 811.175948] ? bpf_prog_1a2cc90c3a1be51f_xfrm_get_state+0x31/0x7d [ 811.175978] ? fib_multipath_hash+0x1190/0x1190 [ 811.175983] ? cls_bpf_classify+0x4ad/0x12e0 [ 811.176001] ? xfrm_rcv_cb+0x270/0x270 [ 811.176001] ? raw_rcv+0x6c0/0x6c0 [ 811.176001] xfrm4_esp_rcv+0x80/0x190 [ 811.176001] ip_protocol_deliver_rcu+0x82/0x300 [ 811.176001] ip_local_deliver_finish+0x29b/0x420 [ 811.176001] ip_local_deliver+0x17b/0x400 [ 811.176001] ? ip_local_deliver_finish+0x420/0x420 [ 811.176001] ? lock_release+0x464/0x640 [ 811.176001] ? rcu_read_lock_held+0xe/0x50 [ 811.176001] ? ip_rcv_finish_core.isra.0+0x943/0x1f80 [ 811.176001] ip_sublist_rcv_finish+0x84/0x230 [ 811.176001] ip_sublist_rcv+0x3e2/0x780 [ 811.176001] ? ip_rcv_finish_core.isra.0+0x1f80/0x1f80 [ 811.176001] ? do_xdp_generic+0xbf0/0xbf0 [ 811.176001] ? xfrm_state_lookup+0xf5/0x1d0 [ 811.176001] ? ip_sublist_rcv+0x780/0x780 [ 811.176001] ip_list_rcv+0x266/0x360 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfrm in RT 2024-12-18 0:07 xfrm in RT Alexei Starovoitov @ 2024-12-18 8:32 ` Steffen Klassert 2024-12-18 15:44 ` Sebastian Sewior 0 siblings, 1 reply; 8+ messages in thread From: Steffen Klassert @ 2024-12-18 8:32 UTC (permalink / raw) To: Alexei Starovoitov; +Cc: Sebastian Sewior, Network Development, Jakub Kicinski On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote: > Hi, > > Looks like xfrm isn't friendly to PREEMPT_RT. > xfrm_input_state_lookup() is doing: > > int cpu = get_cpu(); > ... > spin_lock_bh(&net->xfrm.xfrm_state_lock); We just need the cpu as a lookup key, no need to hold on the cpu. So we just can do put_cpu() directly after we fetched the value. I'll fix that, thanks! ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfrm in RT 2024-12-18 8:32 ` Steffen Klassert @ 2024-12-18 15:44 ` Sebastian Sewior 2024-12-18 16:07 ` Sebastian Sewior 2025-01-17 8:02 ` Steffen Klassert 0 siblings, 2 replies; 8+ messages in thread From: Sebastian Sewior @ 2024-12-18 15:44 UTC (permalink / raw) To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski On 2024-12-18 09:32:26 [+0100], Steffen Klassert wrote: > On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote: > > Hi, > > > > Looks like xfrm isn't friendly to PREEMPT_RT. Thank you for the report. > > xfrm_input_state_lookup() is doing: > > > > int cpu = get_cpu(); > > ... > > spin_lock_bh(&net->xfrm.xfrm_state_lock); > > We just need the cpu as a lookup key, no need to > hold on the cpu. So we just can do put_cpu() > directly after we fetched the value. I would assume that the espX_gro_receive() caller is within NAPI. Can't tell what xfrm_input() is. However if you don't care about staying on the current CPU for the whole time (your current get_cpu() -> put_cpu() span) you could do something like diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 67ca7ac955a37..66b108a5b87d4 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, { struct hlist_head *state_cache_input; struct xfrm_state *x = NULL; - int cpu = get_cpu(); - state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu); + state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input); rcu_read_lock(); hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) { @@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, out: rcu_read_unlock(); - put_cpu(); return x; } EXPORT_SYMBOL(xfrm_input_state_lookup); > I'll fix that, Thank you. > thanks! Sebastian ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: xfrm in RT 2024-12-18 15:44 ` Sebastian Sewior @ 2024-12-18 16:07 ` Sebastian Sewior 2025-01-17 8:17 ` Steffen Klassert 2025-01-17 8:02 ` Steffen Klassert 1 sibling, 1 reply; 8+ messages in thread From: Sebastian Sewior @ 2024-12-18 16:07 UTC (permalink / raw) To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski On 2024-12-18 16:44:27 [+0100], To Steffen Klassert wrote: > time (your current get_cpu() -> put_cpu() span) you could do something > like While at, may I promote rcuref? This would get rid of your cmpxchg() loop that you have otherwise in refcount_inc_not_zero(). In case you have performance test: diff --git a/include/net/xfrm.h b/include/net/xfrm.h index 32c09e85a64ce..eb152ebc0832c 100644 --- a/include/net/xfrm.h +++ b/include/net/xfrm.h @@ -14,7 +14,7 @@ #include <linux/mutex.h> #include <linux/audit.h> #include <linux/slab.h> -#include <linux/refcount.h> +#include <linux/rcuref.h> #include <linux/sockptr.h> #include <net/sock.h> @@ -188,7 +188,7 @@ struct xfrm_state { struct hlist_node state_cache; struct hlist_node state_cache_input; - refcount_t refcnt; + rcuref_t ref; spinlock_t lock; u32 pcpu_num; @@ -857,24 +857,24 @@ void __xfrm_state_destroy(struct xfrm_state *, bool); static inline void __xfrm_state_put(struct xfrm_state *x) { - refcount_dec(&x->refcnt); + WARN_ON(rcuref_put(&x->ref)); } static inline void xfrm_state_put(struct xfrm_state *x) { - if (refcount_dec_and_test(&x->refcnt)) + if (rcuref_put(&x->ref)) __xfrm_state_destroy(x, false); } static inline void xfrm_state_put_sync(struct xfrm_state *x) { - if (refcount_dec_and_test(&x->refcnt)) + if (rcuref_put(&x->ref)) __xfrm_state_destroy(x, true); } static inline void xfrm_state_hold(struct xfrm_state *x) { - refcount_inc(&x->refcnt); + WARN_ON(!rcuref_get(&x->ref)); } static inline bool addr_match(const void *token1, const void *token2, diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 66b108a5b87d4..35ed7c35292a3 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -53,7 +53,7 @@ static HLIST_HEAD(xfrm_state_dev_gc_list); static inline bool xfrm_state_hold_rcu(struct xfrm_state __rcu *x) { - return refcount_inc_not_zero(&x->refcnt); + return rcuref_get(&x->ref); } static inline unsigned int xfrm_dst_hash(struct net *net, @@ -662,7 +662,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net) if (x) { write_pnet(&x->xs_net, net); - refcount_set(&x->refcnt, 1); + rcuref_init(&x->ref, 1); atomic_set(&x->tunnel_users, 0); INIT_LIST_HEAD(&x->km.all); INIT_HLIST_NODE(&x->state_cache); Sebastian ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: xfrm in RT 2024-12-18 16:07 ` Sebastian Sewior @ 2025-01-17 8:17 ` Steffen Klassert 0 siblings, 0 replies; 8+ messages in thread From: Steffen Klassert @ 2025-01-17 8:17 UTC (permalink / raw) To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski On Wed, Dec 18, 2024 at 05:07:45PM +0100, Sebastian Sewior wrote: > On 2024-12-18 16:44:27 [+0100], To Steffen Klassert wrote: > > time (your current get_cpu() -> put_cpu() span) you could do something > > like > > While at, may I promote rcuref? This would get rid of your cmpxchg() > loop that you have otherwise in refcount_inc_not_zero(). In case you > have performance test: I have currently no test setup running that could show a performance difference, but anyway the change looks OK to me. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfrm in RT 2024-12-18 15:44 ` Sebastian Sewior 2024-12-18 16:07 ` Sebastian Sewior @ 2025-01-17 8:02 ` Steffen Klassert 2025-01-23 16:20 ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior 1 sibling, 1 reply; 8+ messages in thread From: Steffen Klassert @ 2025-01-17 8:02 UTC (permalink / raw) To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski On Wed, Dec 18, 2024 at 04:44:26PM +0100, Sebastian Sewior wrote: > On 2024-12-18 09:32:26 [+0100], Steffen Klassert wrote: > > On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote: > > > Hi, > > > > > > Looks like xfrm isn't friendly to PREEMPT_RT. > Thank you for the report. Sorry for the delay. > > > xfrm_input_state_lookup() is doing: > > > > > > int cpu = get_cpu(); > > > ... > > > spin_lock_bh(&net->xfrm.xfrm_state_lock); > > > > We just need the cpu as a lookup key, no need to > > hold on the cpu. So we just can do put_cpu() > > directly after we fetched the value. > > I would assume that the espX_gro_receive() caller is within NAPI. Can't > tell what xfrm_input() is. > However if you don't care about staying on the current CPU for the whole > time (your current get_cpu() -> put_cpu() span) you could do something > like > > diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c > index 67ca7ac955a37..66b108a5b87d4 100644 > --- a/net/xfrm/xfrm_state.c > +++ b/net/xfrm/xfrm_state.c > @@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, > { > struct hlist_head *state_cache_input; > struct xfrm_state *x = NULL; > - int cpu = get_cpu(); > > - state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu); > + state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input); > > rcu_read_lock(); > hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) { > @@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, > > out: > rcu_read_unlock(); > - put_cpu(); > return x; > } This looks like the correct fix. Do you want to submit this pach? ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net] xfrm: Don't disable preemption while looking up cache state. 2025-01-17 8:02 ` Steffen Klassert @ 2025-01-23 16:20 ` Sebastian Sewior 2025-01-27 5:33 ` Steffen Klassert 0 siblings, 1 reply; 8+ messages in thread From: Sebastian Sewior @ 2025-01-23 16:20 UTC (permalink / raw) To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development For the state cache lookup xfrm_input_state_lookup() first disables preemption, to remain on the CPU and then retrieves a per-CPU pointer. Within the preempt-disable section it also acquires netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be acquired with explicit disabled preemption (such as by get_cpu()) because this lock becomes a sleeping lock on PREEMPT_RT. To remain on the same CPU is just an optimisation for the CPU local lookup. The actual modification of the per-CPU variable happens with netns_xfrm::xfrm_state_lock acquired. Remove get_cpu() and use the state_cache_input on the current CPU. Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/ Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- net/xfrm/xfrm_state.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 67ca7ac955a37..66b108a5b87d4 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, { struct hlist_head *state_cache_input; struct xfrm_state *x = NULL; - int cpu = get_cpu(); - state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu); + state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input); rcu_read_lock(); hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) { @@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark, out: rcu_read_unlock(); - put_cpu(); return x; } EXPORT_SYMBOL(xfrm_input_state_lookup); -- 2.47.2 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net] xfrm: Don't disable preemption while looking up cache state. 2025-01-23 16:20 ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior @ 2025-01-27 5:33 ` Steffen Klassert 0 siblings, 0 replies; 8+ messages in thread From: Steffen Klassert @ 2025-01-27 5:33 UTC (permalink / raw) To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development On Thu, Jan 23, 2025 at 05:20:45PM +0100, Sebastian Sewior wrote: > For the state cache lookup xfrm_input_state_lookup() first disables > preemption, to remain on the CPU and then retrieves a per-CPU pointer. > Within the preempt-disable section it also acquires > netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be > acquired with explicit disabled preemption (such as by get_cpu()) > because this lock becomes a sleeping lock on PREEMPT_RT. > > To remain on the same CPU is just an optimisation for the CPU local > lookup. The actual modification of the per-CPU variable happens with > netns_xfrm::xfrm_state_lock acquired. > > Remove get_cpu() and use the state_cache_input on the current CPU. > > Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/ > Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.") > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Applied to the ipsec tree, thanks a lot Sebastian! ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-01-27 5:33 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-12-18 0:07 xfrm in RT Alexei Starovoitov 2024-12-18 8:32 ` Steffen Klassert 2024-12-18 15:44 ` Sebastian Sewior 2024-12-18 16:07 ` Sebastian Sewior 2025-01-17 8:17 ` Steffen Klassert 2025-01-17 8:02 ` Steffen Klassert 2025-01-23 16:20 ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior 2025-01-27 5:33 ` Steffen Klassert
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).