* xfrm in RT
@ 2024-12-18 0:07 Alexei Starovoitov
2024-12-18 8:32 ` Steffen Klassert
0 siblings, 1 reply; 8+ messages in thread
From: Alexei Starovoitov @ 2024-12-18 0:07 UTC (permalink / raw)
To: Sebastian Sewior, Network Development, Jakub Kicinski,
Steffen Klassert
Hi,
Looks like xfrm isn't friendly to PREEMPT_RT.
xfrm_input_state_lookup() is doing:
int cpu = get_cpu();
...
spin_lock_bh(&net->xfrm.xfrm_state_lock);
which causes a splat:
[ 811.175877] BUG: sleeping function called from invalid context at
kernel/locking/spinlock_rt.c:48
[ 811.175884] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
13785, name: ping
[ 811.175886] preempt_count: 1, expected: 0
[ 811.175888] RCU nest depth: 6, expected: 6
[ 811.175889] INFO: lockdep is turned off.
[ 811.175891] CPU: 9 UID: 0 PID: 13785 Comm: ping Tainted: G W
O 6.13.0-rc3-00067-ga3c4183875d5-dirty #2
[ 811.175900] Call Trace:
[ 811.175901] <TASK>
[ 811.175902] dump_stack_lvl+0x80/0x90
[ 811.175911] __might_resched+0x2c7/0x480
[ 811.175917] rt_spin_lock+0xbd/0x240
[ 811.175922] ? rtlock_slowlock_locked+0x4cd0/0x4cd0
[ 811.175925] xfrm_input_state_lookup+0x643/0xa10
[ 811.175930] ? skb_ext_add+0x4dd/0x690
[ 811.175934] ? xfrm_state_lookup+0x1d0/0x1d0
[ 811.175937] ? __asan_memset+0x23/0x40
[ 811.175940] xfrm_input+0x78c/0x5820
[ 811.175943] ? reacquire_held_locks+0x4d0/0x4d0
[ 811.175948] ? bpf_prog_1a2cc90c3a1be51f_xfrm_get_state+0x31/0x7d
[ 811.175978] ? fib_multipath_hash+0x1190/0x1190
[ 811.175983] ? cls_bpf_classify+0x4ad/0x12e0
[ 811.176001] ? xfrm_rcv_cb+0x270/0x270
[ 811.176001] ? raw_rcv+0x6c0/0x6c0
[ 811.176001] xfrm4_esp_rcv+0x80/0x190
[ 811.176001] ip_protocol_deliver_rcu+0x82/0x300
[ 811.176001] ip_local_deliver_finish+0x29b/0x420
[ 811.176001] ip_local_deliver+0x17b/0x400
[ 811.176001] ? ip_local_deliver_finish+0x420/0x420
[ 811.176001] ? lock_release+0x464/0x640
[ 811.176001] ? rcu_read_lock_held+0xe/0x50
[ 811.176001] ? ip_rcv_finish_core.isra.0+0x943/0x1f80
[ 811.176001] ip_sublist_rcv_finish+0x84/0x230
[ 811.176001] ip_sublist_rcv+0x3e2/0x780
[ 811.176001] ? ip_rcv_finish_core.isra.0+0x1f80/0x1f80
[ 811.176001] ? do_xdp_generic+0xbf0/0xbf0
[ 811.176001] ? xfrm_state_lookup+0xf5/0x1d0
[ 811.176001] ? ip_sublist_rcv+0x780/0x780
[ 811.176001] ip_list_rcv+0x266/0x360
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfrm in RT
2024-12-18 0:07 xfrm in RT Alexei Starovoitov
@ 2024-12-18 8:32 ` Steffen Klassert
2024-12-18 15:44 ` Sebastian Sewior
0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2024-12-18 8:32 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: Sebastian Sewior, Network Development, Jakub Kicinski
On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote:
> Hi,
>
> Looks like xfrm isn't friendly to PREEMPT_RT.
> xfrm_input_state_lookup() is doing:
>
> int cpu = get_cpu();
> ...
> spin_lock_bh(&net->xfrm.xfrm_state_lock);
We just need the cpu as a lookup key, no need to
hold on the cpu. So we just can do put_cpu()
directly after we fetched the value.
I'll fix that,
thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfrm in RT
2024-12-18 8:32 ` Steffen Klassert
@ 2024-12-18 15:44 ` Sebastian Sewior
2024-12-18 16:07 ` Sebastian Sewior
2025-01-17 8:02 ` Steffen Klassert
0 siblings, 2 replies; 8+ messages in thread
From: Sebastian Sewior @ 2024-12-18 15:44 UTC (permalink / raw)
To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski
On 2024-12-18 09:32:26 [+0100], Steffen Klassert wrote:
> On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote:
> > Hi,
> >
> > Looks like xfrm isn't friendly to PREEMPT_RT.
Thank you for the report.
> > xfrm_input_state_lookup() is doing:
> >
> > int cpu = get_cpu();
> > ...
> > spin_lock_bh(&net->xfrm.xfrm_state_lock);
>
> We just need the cpu as a lookup key, no need to
> hold on the cpu. So we just can do put_cpu()
> directly after we fetched the value.
I would assume that the espX_gro_receive() caller is within NAPI. Can't
tell what xfrm_input() is.
However if you don't care about staying on the current CPU for the whole
time (your current get_cpu() -> put_cpu() span) you could do something
like
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 67ca7ac955a37..66b108a5b87d4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
{
struct hlist_head *state_cache_input;
struct xfrm_state *x = NULL;
- int cpu = get_cpu();
- state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu);
+ state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
rcu_read_lock();
hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
@@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
out:
rcu_read_unlock();
- put_cpu();
return x;
}
EXPORT_SYMBOL(xfrm_input_state_lookup);
> I'll fix that,
Thank you.
> thanks!
Sebastian
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: xfrm in RT
2024-12-18 15:44 ` Sebastian Sewior
@ 2024-12-18 16:07 ` Sebastian Sewior
2025-01-17 8:17 ` Steffen Klassert
2025-01-17 8:02 ` Steffen Klassert
1 sibling, 1 reply; 8+ messages in thread
From: Sebastian Sewior @ 2024-12-18 16:07 UTC (permalink / raw)
To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski
On 2024-12-18 16:44:27 [+0100], To Steffen Klassert wrote:
> time (your current get_cpu() -> put_cpu() span) you could do something
> like
While at, may I promote rcuref? This would get rid of your cmpxchg()
loop that you have otherwise in refcount_inc_not_zero(). In case you
have performance test:
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 32c09e85a64ce..eb152ebc0832c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -14,7 +14,7 @@
#include <linux/mutex.h>
#include <linux/audit.h>
#include <linux/slab.h>
-#include <linux/refcount.h>
+#include <linux/rcuref.h>
#include <linux/sockptr.h>
#include <net/sock.h>
@@ -188,7 +188,7 @@ struct xfrm_state {
struct hlist_node state_cache;
struct hlist_node state_cache_input;
- refcount_t refcnt;
+ rcuref_t ref;
spinlock_t lock;
u32 pcpu_num;
@@ -857,24 +857,24 @@ void __xfrm_state_destroy(struct xfrm_state *, bool);
static inline void __xfrm_state_put(struct xfrm_state *x)
{
- refcount_dec(&x->refcnt);
+ WARN_ON(rcuref_put(&x->ref));
}
static inline void xfrm_state_put(struct xfrm_state *x)
{
- if (refcount_dec_and_test(&x->refcnt))
+ if (rcuref_put(&x->ref))
__xfrm_state_destroy(x, false);
}
static inline void xfrm_state_put_sync(struct xfrm_state *x)
{
- if (refcount_dec_and_test(&x->refcnt))
+ if (rcuref_put(&x->ref))
__xfrm_state_destroy(x, true);
}
static inline void xfrm_state_hold(struct xfrm_state *x)
{
- refcount_inc(&x->refcnt);
+ WARN_ON(!rcuref_get(&x->ref));
}
static inline bool addr_match(const void *token1, const void *token2,
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 66b108a5b87d4..35ed7c35292a3 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -53,7 +53,7 @@ static HLIST_HEAD(xfrm_state_dev_gc_list);
static inline bool xfrm_state_hold_rcu(struct xfrm_state __rcu *x)
{
- return refcount_inc_not_zero(&x->refcnt);
+ return rcuref_get(&x->ref);
}
static inline unsigned int xfrm_dst_hash(struct net *net,
@@ -662,7 +662,7 @@ struct xfrm_state *xfrm_state_alloc(struct net *net)
if (x) {
write_pnet(&x->xs_net, net);
- refcount_set(&x->refcnt, 1);
+ rcuref_init(&x->ref, 1);
atomic_set(&x->tunnel_users, 0);
INIT_LIST_HEAD(&x->km.all);
INIT_HLIST_NODE(&x->state_cache);
Sebastian
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: xfrm in RT
2024-12-18 15:44 ` Sebastian Sewior
2024-12-18 16:07 ` Sebastian Sewior
@ 2025-01-17 8:02 ` Steffen Klassert
2025-01-23 16:20 ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior
1 sibling, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2025-01-17 8:02 UTC (permalink / raw)
To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski
On Wed, Dec 18, 2024 at 04:44:26PM +0100, Sebastian Sewior wrote:
> On 2024-12-18 09:32:26 [+0100], Steffen Klassert wrote:
> > On Tue, Dec 17, 2024 at 04:07:16PM -0800, Alexei Starovoitov wrote:
> > > Hi,
> > >
> > > Looks like xfrm isn't friendly to PREEMPT_RT.
> Thank you for the report.
Sorry for the delay.
> > > xfrm_input_state_lookup() is doing:
> > >
> > > int cpu = get_cpu();
> > > ...
> > > spin_lock_bh(&net->xfrm.xfrm_state_lock);
> >
> > We just need the cpu as a lookup key, no need to
> > hold on the cpu. So we just can do put_cpu()
> > directly after we fetched the value.
>
> I would assume that the espX_gro_receive() caller is within NAPI. Can't
> tell what xfrm_input() is.
> However if you don't care about staying on the current CPU for the whole
> time (your current get_cpu() -> put_cpu() span) you could do something
> like
>
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index 67ca7ac955a37..66b108a5b87d4 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
> {
> struct hlist_head *state_cache_input;
> struct xfrm_state *x = NULL;
> - int cpu = get_cpu();
>
> - state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu);
> + state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
>
> rcu_read_lock();
> hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
> @@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
>
> out:
> rcu_read_unlock();
> - put_cpu();
> return x;
> }
This looks like the correct fix. Do you want to submit this pach?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: xfrm in RT
2024-12-18 16:07 ` Sebastian Sewior
@ 2025-01-17 8:17 ` Steffen Klassert
0 siblings, 0 replies; 8+ messages in thread
From: Steffen Klassert @ 2025-01-17 8:17 UTC (permalink / raw)
To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development, Jakub Kicinski
On Wed, Dec 18, 2024 at 05:07:45PM +0100, Sebastian Sewior wrote:
> On 2024-12-18 16:44:27 [+0100], To Steffen Klassert wrote:
> > time (your current get_cpu() -> put_cpu() span) you could do something
> > like
>
> While at, may I promote rcuref? This would get rid of your cmpxchg()
> loop that you have otherwise in refcount_inc_not_zero(). In case you
> have performance test:
I have currently no test setup running that could show a performance
difference, but anyway the change looks OK to me.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH net] xfrm: Don't disable preemption while looking up cache state.
2025-01-17 8:02 ` Steffen Klassert
@ 2025-01-23 16:20 ` Sebastian Sewior
2025-01-27 5:33 ` Steffen Klassert
0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Sewior @ 2025-01-23 16:20 UTC (permalink / raw)
To: Steffen Klassert; +Cc: Alexei Starovoitov, Network Development
For the state cache lookup xfrm_input_state_lookup() first disables
preemption, to remain on the CPU and then retrieves a per-CPU pointer.
Within the preempt-disable section it also acquires
netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be
acquired with explicit disabled preemption (such as by get_cpu())
because this lock becomes a sleeping lock on PREEMPT_RT.
To remain on the same CPU is just an optimisation for the CPU local
lookup. The actual modification of the per-CPU variable happens with
netns_xfrm::xfrm_state_lock acquired.
Remove get_cpu() and use the state_cache_input on the current CPU.
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/
Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/xfrm/xfrm_state.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 67ca7ac955a37..66b108a5b87d4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1116,9 +1116,8 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
{
struct hlist_head *state_cache_input;
struct xfrm_state *x = NULL;
- int cpu = get_cpu();
- state_cache_input = per_cpu_ptr(net->xfrm.state_cache_input, cpu);
+ state_cache_input = raw_cpu_ptr(net->xfrm.state_cache_input);
rcu_read_lock();
hlist_for_each_entry_rcu(x, state_cache_input, state_cache_input) {
@@ -1150,7 +1149,6 @@ struct xfrm_state *xfrm_input_state_lookup(struct net *net, u32 mark,
out:
rcu_read_unlock();
- put_cpu();
return x;
}
EXPORT_SYMBOL(xfrm_input_state_lookup);
--
2.47.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net] xfrm: Don't disable preemption while looking up cache state.
2025-01-23 16:20 ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior
@ 2025-01-27 5:33 ` Steffen Klassert
0 siblings, 0 replies; 8+ messages in thread
From: Steffen Klassert @ 2025-01-27 5:33 UTC (permalink / raw)
To: Sebastian Sewior; +Cc: Alexei Starovoitov, Network Development
On Thu, Jan 23, 2025 at 05:20:45PM +0100, Sebastian Sewior wrote:
> For the state cache lookup xfrm_input_state_lookup() first disables
> preemption, to remain on the CPU and then retrieves a per-CPU pointer.
> Within the preempt-disable section it also acquires
> netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be
> acquired with explicit disabled preemption (such as by get_cpu())
> because this lock becomes a sleeping lock on PREEMPT_RT.
>
> To remain on the same CPU is just an optimisation for the CPU local
> lookup. The actual modification of the per-CPU variable happens with
> netns_xfrm::xfrm_state_lock acquired.
>
> Remove get_cpu() and use the state_cache_input on the current CPU.
>
> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/
> Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.")
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Applied to the ipsec tree, thanks a lot Sebastian!
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-01-27 5:33 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-18 0:07 xfrm in RT Alexei Starovoitov
2024-12-18 8:32 ` Steffen Klassert
2024-12-18 15:44 ` Sebastian Sewior
2024-12-18 16:07 ` Sebastian Sewior
2025-01-17 8:17 ` Steffen Klassert
2025-01-17 8:02 ` Steffen Klassert
2025-01-23 16:20 ` [PATCH net] xfrm: Don't disable preemption while looking up cache state Sebastian Sewior
2025-01-27 5:33 ` Steffen Klassert
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).