netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* xfrm in RT
@ 2024-12-18  0:07 Alexei Starovoitov
  2024-12-18  8:32 ` Steffen Klassert
  0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2024-12-18  0:07 UTC (permalink / raw)
  To: Sebastian Sewior, Network Development, Jakub Kicinski,
	Steffen Klassert

Hi,

Looks like xfrm isn't friendly to PREEMPT_RT.
xfrm_input_state_lookup() is doing:

int cpu = get_cpu();
...
spin_lock_bh(&net->xfrm.xfrm_state_lock);

which causes a splat:

[  811.175877] BUG: sleeping function called from invalid context at
kernel/locking/spinlock_rt.c:48
[  811.175884] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
13785, name: ping
[  811.175886] preempt_count: 1, expected: 0
[  811.175888] RCU nest depth: 6, expected: 6
[  811.175889] INFO: lockdep is turned off.
[  811.175891] CPU: 9 UID: 0 PID: 13785 Comm: ping Tainted: G        W
 O       6.13.0-rc3-00067-ga3c4183875d5-dirty #2
[  811.175900] Call Trace:
[  811.175901]  <TASK>
[  811.175902]  dump_stack_lvl+0x80/0x90
[  811.175911]  __might_resched+0x2c7/0x480
[  811.175917]  rt_spin_lock+0xbd/0x240
[  811.175922]  ? rtlock_slowlock_locked+0x4cd0/0x4cd0
[  811.175925]  xfrm_input_state_lookup+0x643/0xa10
[  811.175930]  ? skb_ext_add+0x4dd/0x690
[  811.175934]  ? xfrm_state_lookup+0x1d0/0x1d0
[  811.175937]  ? __asan_memset+0x23/0x40
[  811.175940]  xfrm_input+0x78c/0x5820
[  811.175943]  ? reacquire_held_locks+0x4d0/0x4d0
[  811.175948]  ? bpf_prog_1a2cc90c3a1be51f_xfrm_get_state+0x31/0x7d
[  811.175978]  ? fib_multipath_hash+0x1190/0x1190
[  811.175983]  ? cls_bpf_classify+0x4ad/0x12e0
[  811.176001]  ? xfrm_rcv_cb+0x270/0x270
[  811.176001]  ? raw_rcv+0x6c0/0x6c0
[  811.176001]  xfrm4_esp_rcv+0x80/0x190
[  811.176001]  ip_protocol_deliver_rcu+0x82/0x300
[  811.176001]  ip_local_deliver_finish+0x29b/0x420
[  811.176001]  ip_local_deliver+0x17b/0x400
[  811.176001]  ? ip_local_deliver_finish+0x420/0x420
[  811.176001]  ? lock_release+0x464/0x640
[  811.176001]  ? rcu_read_lock_held+0xe/0x50
[  811.176001]  ? ip_rcv_finish_core.isra.0+0x943/0x1f80
[  811.176001]  ip_sublist_rcv_finish+0x84/0x230
[  811.176001]  ip_sublist_rcv+0x3e2/0x780
[  811.176001]  ? ip_rcv_finish_core.isra.0+0x1f80/0x1f80
[  811.176001]  ? do_xdp_generic+0xbf0/0xbf0
[  811.176001]  ? xfrm_state_lookup+0xf5/0x1d0
[  811.176001]  ? ip_sublist_rcv+0x780/0x780
[  811.176001]  ip_list_rcv+0x266/0x360

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: xfrm in RT
  2024-12-18  0:07 xfrm in RT Alexei Starovoitov
@ 2024-12-18  8:32 ` Steffen Klassert
  2024-12-18 15:44   ` Sebastian Sewior
  0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2024-12-18  8:32 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Sebastian Sewior, Network Development, Jakub Kicinski

On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote:
> Hi,
> 
> Looks like xfrm isn't friendly to PREEMPT_RT.
> xfrm_input_state_lookup() is doing:
> 
> int cpu = get_cpu();
> ...
> spin_lock_bh(&net->xfrm.xfrm_state_lock);

We just need the cpu as a lookup key, no need to
hold on the cpu. So we just can do put_cpu()
directly after we fetched the value.

I'll fix that,

thanks!


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: xfrm in RT
  2024-12-18  8:32 ` Steffen Klassert
@ 2024-12-18 15:44   ` Sebastian Sewior
  2024-12-18 16:07     ` Sebastian Sewior
  2025-01-17  8:02     ` Steffen Klassert
  0 siblings, 2 replies; 8+ messages in thread
From: Sebastian Sewior @ 2024-12-18 15:44 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski

On 2024-12-18 09:32:26 [+0100], Steffen Klassert wrote:
> On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote:
> > Hi,
> > 
> > Looks like xfrm isn't friendly to PREEMPT_RT.
Thank you for the report.

> > xfrm_input_state_lookup() is doing:
> > 
> > int cpu = get_cpu();
> > ...
> > spin_lock_bh(&net->xfrm.xfrm_state_lock);
> 
> We just need the cpu as a lookup key, no need to
> hold on the cpu. So we just can do put_cpu()
> directly after we fetched the value.

I would assume that the espX_gro_receive() caller is within NAPI. Can't
tell what xfrm_input() is.
However if you don't care about staying on the current CPU for the whole
time (your current get_cpu() -> put_cpu() span) you could do something
like

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 67ca7ac955a37..66b108a5b87d4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 {
 	struct hlist_head *state_cache_input;
 	struct xfrm_state *x = NULL;
-	int cpu = get_cpu();
 
-	state_cache_input =  per_cpu_ptr(net->xfrm.state_cache_input, cpu);
+	state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
@@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 
 out:
 	rcu_read_unlock();
-	put_cpu();
 	return x;
 }
 EXPORT_SYMBOL(xfrm_input_state_lookup);

> I'll fix that,

Thank you.

> thanks!

Sebastian

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: xfrm in RT
  2024-12-18 15:44   ` Sebastian Sewior
@ 2024-12-18 16:07     ` Sebastian Sewior
  2025-01-17  8:17       ` Steffen Klassert
  2025-01-17  8:02     ` Steffen Klassert
  1 sibling, 1 reply; 8+ messages in thread
From: Sebastian Sewior @ 2024-12-18 16:07 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski

On 2024-12-18 16:44:27 [+0100], To Steffen Klassert wrote:
> time (your current get_cpu() -> put_cpu() span) you could do something
> like

While at, may I promote rcuref? This would get rid of your cmpxchg()
loop that you have otherwise in refcount_inc_not_zero(). In case you
have performance test:

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 32c09e85a64ce..eb152ebc0832c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -14,7 +14,7 @@
 #include <linux/mutex.h>
 #include <linux/audit.h>
 #include <linux/slab.h>
-#include <linux/refcount.h>
+#include <linux/rcuref.h>
 #include <linux/sockptr.h>
 
 #include <net/sock.h>
@@ -188,7 +188,7 @@ struct xfrm_state {
 	struct hlist_node	state_cache;
 	struct hlist_node	state_cache_input;
 
-	refcount_t		refcnt;
+	rcuref_t		ref;
 	spinlock_t		lock;
 
 	u32			pcpu_num;
@@ -857,24 +857,24 @@ void __xfrm_state_destroy(struct xfrm_state *, bool);
 
 static inline void __xfrm_state_put(struct xfrm_state *x)
 {
-	refcount_dec(&x->refcnt);
+	WARN_ON(rcuref_put(&x->ref));
 }
 
 static inline void xfrm_state_put(struct xfrm_state *x)
 {
-	if (refcount_dec_and_test(&x->refcnt))
+	if (rcuref_put(&x->ref))
 		__xfrm_state_destroy(x, false);
 }
 
 static inline void xfrm_state_put_sync(struct xfrm_state *x)
 {
-	if (refcount_dec_and_test(&x->refcnt))
+	if (rcuref_put(&x->ref))
 		__xfrm_state_destroy(x, true);
 }
 
 static inline void xfrm_state_hold(struct xfrm_state *x)
 {
-	refcount_inc(&x->refcnt);
+	WARN_ON(!rcuref_get(&x->ref));
 }
 
 static inline bool addr_match(const void *token1, const void *token2,
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 66b108a5b87d4..35ed7c35292a3 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -53,7 +53,7 @@ static HLIST_HEAD(xfrm_state_dev_gc_list);
 
 static inline bool xfrm_state_hold_rcu(struct xfrm_state __rcu *x)
 {
-	return refcount_inc_not_zero(&x->refcnt);
+	return rcuref_get(&x->ref);
 }
 
 static inline unsigned int xfrm_dst_hash(struct net *net,
@@ -662,7 +662,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net)
 
 	if (x) {
 		write_pnet(&x->xs_net, net);
-		refcount_set(&x->refcnt, 1);
+		rcuref_init(&x->ref, 1);
 		atomic_set(&x->tunnel_users, 0);
 		INIT_LIST_HEAD(&x->km.all);
 		INIT_HLIST_NODE(&x->state_cache);

Sebastian

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: xfrm in RT
  2024-12-18 15:44   ` Sebastian Sewior
  2024-12-18 16:07     ` Sebastian Sewior
@ 2025-01-17  8:02     ` Steffen Klassert
  2025-01-23 16:20       ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior
  1 sibling, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2025-01-17  8:02 UTC (permalink / raw)
  To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski

On Wed, Dec 18, 2024 at 04:44:26PM +0100, Sebastian Sewior wrote:
> On 2024-12-18 09:32:26 [+0100], Steffen Klassert wrote:
> > On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote:
> > > Hi,
> > > 
> > > Looks like xfrm isn't friendly to PREEMPT_RT.
> Thank you for the report.

Sorry for the delay.

> > > xfrm_input_state_lookup() is doing:
> > > 
> > > int cpu = get_cpu();
> > > ...
> > > spin_lock_bh(&net->xfrm.xfrm_state_lock);
> > 
> > We just need the cpu as a lookup key, no need to
> > hold on the cpu. So we just can do put_cpu()
> > directly after we fetched the value.
> 
> I would assume that the espX_gro_receive() caller is within NAPI. Can't
> tell what xfrm_input() is.
> However if you don't care about staying on the current CPU for the whole
> time (your current get_cpu() -> put_cpu() span) you could do something
> like
> 
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index 67ca7ac955a37..66b108a5b87d4 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
>  {
>  	struct hlist_head *state_cache_input;
>  	struct xfrm_state *x = NULL;
> -	int cpu = get_cpu();
>  
> -	state_cache_input =  per_cpu_ptr(net->xfrm.state_cache_input, cpu);
> +	state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
>  
>  	rcu_read_lock();
>  	hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
> @@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
>  
>  out:
>  	rcu_read_unlock();
> -	put_cpu();
>  	return x;
>  }

This looks like the correct fix. Do you want to submit this pach?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: xfrm in RT
  2024-12-18 16:07     ` Sebastian Sewior
@ 2025-01-17  8:17       ` Steffen Klassert
  0 siblings, 0 replies; 8+ messages in thread
From: Steffen Klassert @ 2025-01-17  8:17 UTC (permalink / raw)
  To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski

On Wed, Dec 18, 2024 at 05:07:45PM +0100, Sebastian Sewior wrote:
> On 2024-12-18 16:44:27 [+0100], To Steffen Klassert wrote:
> > time (your current get_cpu() -> put_cpu() span) you could do something
> > like
> 
> While at, may I promote rcuref? This would get rid of your cmpxchg()
> loop that you have otherwise in refcount_inc_not_zero(). In case you
> have performance test:

I have currently no test setup running that could show a performance
difference, but anyway the change looks OK to me.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net] xfrm: Don't disable preemption while looking up cache state.
  2025-01-17  8:02     ` Steffen Klassert
@ 2025-01-23 16:20       ` Sebastian Sewior
  2025-01-27  5:33         ` Steffen Klassert
  0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Sewior @ 2025-01-23 16:20 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development

For the state cache lookup xfrm_input_state_lookup() first disables
preemption, to remain on the CPU and then retrieves a per-CPU pointer.
Within the preempt-disable section it also acquires
netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be
acquired with explicit disabled preemption (such as by get_cpu())
because this lock becomes a sleeping lock on PREEMPT_RT.

To remain on the same CPU is just an optimisation for the CPU local
lookup. The actual modification of the per-CPU variable happens with
netns_xfrm::xfrm_state_lock acquired.

Remove get_cpu() and use the state_cache_input on the current CPU.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/
Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/xfrm/xfrm_state.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 67ca7ac955a37..66b108a5b87d4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 {
 	struct hlist_head *state_cache_input;
 	struct xfrm_state *x = NULL;
-	int cpu = get_cpu();
 
-	state_cache_input =  per_cpu_ptr(net->xfrm.state_cache_input, cpu);
+	state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
 
 	rcu_read_lock();
 	hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
@@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
 
 out:
 	rcu_read_unlock();
-	put_cpu();
 	return x;
 }
 EXPORT_SYMBOL(xfrm_input_state_lookup);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] xfrm: Don't disable preemption while looking up cache state.
  2025-01-23 16:20       ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior
@ 2025-01-27  5:33         ` Steffen Klassert
  0 siblings, 0 replies; 8+ messages in thread
From: Steffen Klassert @ 2025-01-27  5:33 UTC (permalink / raw)
  To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development

On Thu, Jan 23, 2025 at 05:20:45PM +0100, Sebastian Sewior wrote:
> For the state cache lookup xfrm_input_state_lookup() first disables
> preemption, to remain on the CPU and then retrieves a per-CPU pointer.
> Within the preempt-disable section it also acquires
> netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be
> acquired with explicit disabled preemption (such as by get_cpu())
> because this lock becomes a sleeping lock on PREEMPT_RT.
> 
> To remain on the same CPU is just an optimisation for the CPU local
> lookup. The actual modification of the per-CPU variable happens with
> netns_xfrm::xfrm_state_lock acquired.
> 
> Remove get_cpu() and use the state_cache_input on the current CPU.
> 
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/
> Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Applied to the ipsec tree, thanks a lot Sebastian!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-01-27  5:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-18  0:07 xfrm in RT Alexei Starovoitov
2024-12-18  8:32 ` Steffen Klassert
2024-12-18 15:44   ` Sebastian Sewior
2024-12-18 16:07     ` Sebastian Sewior
2025-01-17  8:17       ` Steffen Klassert
2025-01-17  8:02     ` Steffen Klassert
2025-01-23 16:20       ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior
2025-01-27  5:33         ` Steffen Klassert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).