* [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel
@ 2025-07-04 14:54 Sabrina Dubroca
2025-07-04 14:54 ` [PATCH ipsec 1/2] xfrm: delete x->tunnel as we delete x Sabrina Dubroca
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Sabrina Dubroca @ 2025-07-04 14:54 UTC (permalink / raw)
To: netdev
Cc: Steffen Klassert, Herbert Xu, Alexey Dobriyan, Cong Wang,
Sabrina Dubroca
IPcomp tunnel states have an associated fallback tunnel, a keep a
reference on the corresponding xfrm_state, to allow deleting that
extra state when it's not needed anymore. These states cause issues
during netns deletion.
Commit f75a2804da39 ("xfrm: destroy xfrm_state synchronously on net
exit path") tried to address these problems but doesn't fully solve
them, and slowed down netns deletion by adding one synchronize_rcu per
deleted state.
The first patch solves the problem by moving the fallback state
deletion earlier (when we delete the user state, rather than at
destruction), then we can revert the previous fix.
Sabrina Dubroca (2):
xfrm: delete x->tunnel as we delete x
Revert "xfrm: destroy xfrm_state synchronously on net exit path"
include/net/xfrm.h | 13 +++----------
net/ipv4/ipcomp.c | 2 ++
net/ipv6/ipcomp6.c | 2 ++
net/ipv6/xfrm6_tunnel.c | 2 +-
net/key/af_key.c | 2 +-
net/xfrm/xfrm_ipcomp.c | 1 -
net/xfrm/xfrm_state.c | 40 ++++++++++++++++------------------------
net/xfrm/xfrm_user.c | 2 +-
8 files changed, 26 insertions(+), 38 deletions(-)
--
2.50.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH ipsec 1/2] xfrm: delete x->tunnel as we delete x
2025-07-04 14:54 [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel Sabrina Dubroca
@ 2025-07-04 14:54 ` Sabrina Dubroca
2025-07-04 14:54 ` [PATCH ipsec 2/2] Revert "xfrm: destroy xfrm_state synchronously on net exit path" Sabrina Dubroca
2025-07-14 7:02 ` [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel Steffen Klassert
2 siblings, 0 replies; 4+ messages in thread
From: Sabrina Dubroca @ 2025-07-04 14:54 UTC (permalink / raw)
To: netdev
Cc: Steffen Klassert, Herbert Xu, Alexey Dobriyan, Cong Wang,
Sabrina Dubroca
The ipcomp fallback tunnels currently get deleted (from the various
lists and hashtables) as the last user state that needed that fallback
is destroyed (not deleted). If a reference to that user state still
exists, the fallback state will remain on the hashtables/lists,
triggering the WARN in xfrm_state_fini. Because of those remaining
references, the fix in commit f75a2804da39 ("xfrm: destroy xfrm_state
synchronously on net exit path") is not complete.
We recently fixed one such situation in TCP due to defered freeing of
skbs (commit 9b6412e6979f ("tcp: drop secpath at the same time as we
currently drop dst")). This can also happen due to IP reassembly: skbs
with a secpath remain on the reassembly queue until netns
destruction. If we can't guarantee that the queues are flushed by the
time xfrm_state_fini runs, there may still be references to a (user)
xfrm_state, preventing the timely deletion of the corresponding
fallback state.
Instead of chasing each instance of skbs holding a secpath one by one,
this patch fixes the issue directly within xfrm, by deleting the
fallback state as soon as the last user state depending on it has been
deleted. Destruction will still happen when the final reference is
dropped.
A separate lockdep class for the fallback state is required since
we're going to lock x->tunnel while x is locked.
Fixes: 9d4139c76905 ("netns xfrm: per-netns xfrm_state_all list")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
include/net/xfrm.h | 1 -
net/ipv4/ipcomp.c | 2 ++
net/ipv6/ipcomp6.c | 2 ++
net/ipv6/xfrm6_tunnel.c | 2 +-
net/xfrm/xfrm_ipcomp.c | 1 -
net/xfrm/xfrm_state.c | 19 ++++++++-----------
6 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e45a275fca26..91d52a380e37 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -441,7 +441,6 @@ int xfrm_input_register_afinfo(const struct xfrm_input_afinfo *afinfo);
int xfrm_input_unregister_afinfo(const struct xfrm_input_afinfo *afinfo);
void xfrm_flush_gc(void);
-void xfrm_state_delete_tunnel(struct xfrm_state *x);
struct xfrm_type {
struct module *owner;
diff --git a/net/ipv4/ipcomp.c b/net/ipv4/ipcomp.c
index 5a4fb2539b08..9a45aed508d1 100644
--- a/net/ipv4/ipcomp.c
+++ b/net/ipv4/ipcomp.c
@@ -54,6 +54,7 @@ static int ipcomp4_err(struct sk_buff *skb, u32 info)
}
/* We always hold one tunnel user reference to indicate a tunnel */
+static struct lock_class_key xfrm_state_lock_key;
static struct xfrm_state *ipcomp_tunnel_create(struct xfrm_state *x)
{
struct net *net = xs_net(x);
@@ -62,6 +63,7 @@ static struct xfrm_state *ipcomp_tunnel_create(struct xfrm_state *x)
t = xfrm_state_alloc(net);
if (!t)
goto out;
+ lockdep_set_class(&t->lock, &xfrm_state_lock_key);
t->id.proto = IPPROTO_IPIP;
t->id.spi = x->props.saddr.a4;
diff --git a/net/ipv6/ipcomp6.c b/net/ipv6/ipcomp6.c
index 72d4858dec18..8607569de34f 100644
--- a/net/ipv6/ipcomp6.c
+++ b/net/ipv6/ipcomp6.c
@@ -71,6 +71,7 @@ static int ipcomp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
return 0;
}
+static struct lock_class_key xfrm_state_lock_key;
static struct xfrm_state *ipcomp6_tunnel_create(struct xfrm_state *x)
{
struct net *net = xs_net(x);
@@ -79,6 +80,7 @@ static struct xfrm_state *ipcomp6_tunnel_create(struct xfrm_state *x)
t = xfrm_state_alloc(net);
if (!t)
goto out;
+ lockdep_set_class(&t->lock, &xfrm_state_lock_key);
t->id.proto = IPPROTO_IPV6;
t->id.spi = xfrm6_tunnel_alloc_spi(net, (xfrm_address_t *)&x->props.saddr);
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index bf140ef781c1..7fd8bc08e6eb 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -334,8 +334,8 @@ static void __net_exit xfrm6_tunnel_net_exit(struct net *net)
struct xfrm6_tunnel_net *xfrm6_tn = xfrm6_tunnel_pernet(net);
unsigned int i;
- xfrm_flush_gc();
xfrm_state_flush(net, 0, false, true);
+ xfrm_flush_gc();
for (i = 0; i < XFRM6_TUNNEL_SPI_BYADDR_HSIZE; i++)
WARN_ON_ONCE(!hlist_empty(&xfrm6_tn->spi_byaddr[i]));
diff --git a/net/xfrm/xfrm_ipcomp.c b/net/xfrm/xfrm_ipcomp.c
index a38545413b80..43fdc6ed8dd1 100644
--- a/net/xfrm/xfrm_ipcomp.c
+++ b/net/xfrm/xfrm_ipcomp.c
@@ -313,7 +313,6 @@ void ipcomp_destroy(struct xfrm_state *x)
struct ipcomp_data *ipcd = x->data;
if (!ipcd)
return;
- xfrm_state_delete_tunnel(x);
ipcomp_free_data(ipcd);
kfree(ipcd);
}
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c7e6472c623d..f7110a658897 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -811,6 +811,7 @@ void __xfrm_state_destroy(struct xfrm_state *x, bool sync)
}
EXPORT_SYMBOL(__xfrm_state_destroy);
+static void xfrm_state_delete_tunnel(struct xfrm_state *x);
int __xfrm_state_delete(struct xfrm_state *x)
{
struct net *net = xs_net(x);
@@ -838,6 +839,8 @@ int __xfrm_state_delete(struct xfrm_state *x)
xfrm_dev_state_delete(x);
+ xfrm_state_delete_tunnel(x);
+
/* All xfrm_state objects are created by xfrm_state_alloc.
* The xfrm_state_alloc call gives a reference, and that
* is what we are dropping here.
@@ -941,10 +944,7 @@ int xfrm_state_flush(struct net *net, u8 proto, bool task_valid, bool sync)
err = xfrm_state_delete(x);
xfrm_audit_state_delete(x, err ? 0 : 1,
task_valid);
- if (sync)
- xfrm_state_put_sync(x);
- else
- xfrm_state_put(x);
+ xfrm_state_put(x);
if (!err)
cnt++;
@@ -3068,20 +3068,17 @@ void xfrm_flush_gc(void)
}
EXPORT_SYMBOL(xfrm_flush_gc);
-/* Temporarily located here until net/xfrm/xfrm_tunnel.c is created */
-void xfrm_state_delete_tunnel(struct xfrm_state *x)
+static void xfrm_state_delete_tunnel(struct xfrm_state *x)
{
if (x->tunnel) {
struct xfrm_state *t = x->tunnel;
- if (atomic_read(&t->tunnel_users) == 2)
+ if (atomic_dec_return(&t->tunnel_users) == 1)
xfrm_state_delete(t);
- atomic_dec(&t->tunnel_users);
- xfrm_state_put_sync(t);
+ xfrm_state_put(t);
x->tunnel = NULL;
}
}
-EXPORT_SYMBOL(xfrm_state_delete_tunnel);
u32 xfrm_state_mtu(struct xfrm_state *x, int mtu)
{
@@ -3286,8 +3283,8 @@ void xfrm_state_fini(struct net *net)
unsigned int sz;
flush_work(&net->xfrm.state_hash_work);
- flush_work(&xfrm_state_gc_work);
xfrm_state_flush(net, 0, false, true);
+ flush_work(&xfrm_state_gc_work);
WARN_ON(!list_empty(&net->xfrm.state_all));
--
2.50.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH ipsec 2/2] Revert "xfrm: destroy xfrm_state synchronously on net exit path"
2025-07-04 14:54 [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel Sabrina Dubroca
2025-07-04 14:54 ` [PATCH ipsec 1/2] xfrm: delete x->tunnel as we delete x Sabrina Dubroca
@ 2025-07-04 14:54 ` Sabrina Dubroca
2025-07-14 7:02 ` [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel Steffen Klassert
2 siblings, 0 replies; 4+ messages in thread
From: Sabrina Dubroca @ 2025-07-04 14:54 UTC (permalink / raw)
To: netdev
Cc: Steffen Klassert, Herbert Xu, Alexey Dobriyan, Cong Wang,
Sabrina Dubroca
This reverts commit f75a2804da391571563c4b6b29e7797787332673.
With all states (whether user or kern) removed from the hashtables
during deletion, there's no need for synchronous destruction of
states. xfrm6_tunnel states still need to have been destroyed (which
will be the case when its last user is deleted (not destroyed)) so
that xfrm6_tunnel_free_spi removes it from the per-netns hashtable
before the netns is destroyed.
This has the benefit of skipping one synchronize_rcu per state (in
__xfrm_state_destroy(sync=true)) when we exit a netns.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
include/net/xfrm.h | 12 +++---------
net/ipv6/xfrm6_tunnel.c | 2 +-
net/key/af_key.c | 2 +-
net/xfrm/xfrm_state.c | 23 +++++++++--------------
net/xfrm/xfrm_user.c | 2 +-
5 files changed, 15 insertions(+), 26 deletions(-)
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 91d52a380e37..f3014e4f54fc 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -915,7 +915,7 @@ static inline void xfrm_pols_put(struct xfrm_policy **pols, int npols)
xfrm_pol_put(pols[i]);
}
-void __xfrm_state_destroy(struct xfrm_state *, bool);
+void __xfrm_state_destroy(struct xfrm_state *);
static inline void __xfrm_state_put(struct xfrm_state *x)
{
@@ -925,13 +925,7 @@ static inline void __xfrm_state_put(struct xfrm_state *x)
static inline void xfrm_state_put(struct xfrm_state *x)
{
if (refcount_dec_and_test(&x->refcnt))
- __xfrm_state_destroy(x, false);
-}
-
-static inline void xfrm_state_put_sync(struct xfrm_state *x)
-{
- if (refcount_dec_and_test(&x->refcnt))
- __xfrm_state_destroy(x, true);
+ __xfrm_state_destroy(x);
}
static inline void xfrm_state_hold(struct xfrm_state *x)
@@ -1769,7 +1763,7 @@ struct xfrmk_spdinfo {
struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq, u32 pcpu_num);
int xfrm_state_delete(struct xfrm_state *x);
-int xfrm_state_flush(struct net *net, u8 proto, bool task_valid, bool sync);
+int xfrm_state_flush(struct net *net, u8 proto, bool task_valid);
int xfrm_dev_state_flush(struct net *net, struct net_device *dev, bool task_valid);
int xfrm_dev_policy_flush(struct net *net, struct net_device *dev,
bool task_valid);
diff --git a/net/ipv6/xfrm6_tunnel.c b/net/ipv6/xfrm6_tunnel.c
index 7fd8bc08e6eb..5120a763da0d 100644
--- a/net/ipv6/xfrm6_tunnel.c
+++ b/net/ipv6/xfrm6_tunnel.c
@@ -334,7 +334,7 @@ static void __net_exit xfrm6_tunnel_net_exit(struct net *net)
struct xfrm6_tunnel_net *xfrm6_tn = xfrm6_tunnel_pernet(net);
unsigned int i;
- xfrm_state_flush(net, 0, false, true);
+ xfrm_state_flush(net, IPSEC_PROTO_ANY, false);
xfrm_flush_gc();
for (i = 0; i < XFRM6_TUNNEL_SPI_BYADDR_HSIZE; i++)
diff --git a/net/key/af_key.c b/net/key/af_key.c
index efc2a91f4c48..b5d761700776 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1766,7 +1766,7 @@ static int pfkey_flush(struct sock *sk, struct sk_buff *skb, const struct sadb_m
if (proto == 0)
return -EINVAL;
- err = xfrm_state_flush(net, proto, true, false);
+ err = xfrm_state_flush(net, proto, true);
err2 = unicast_flush_resp(sk, hdr);
if (err || err2) {
if (err == -ESRCH) /* empty table - go quietly */
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f7110a658897..327a1a6f892c 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -592,7 +592,7 @@ void xfrm_state_free(struct xfrm_state *x)
}
EXPORT_SYMBOL(xfrm_state_free);
-static void ___xfrm_state_destroy(struct xfrm_state *x)
+static void xfrm_state_gc_destroy(struct xfrm_state *x)
{
if (x->mode_cbs && x->mode_cbs->destroy_state)
x->mode_cbs->destroy_state(x);
@@ -631,7 +631,7 @@ static void xfrm_state_gc_task(struct work_struct *work)
synchronize_rcu();
hlist_for_each_entry_safe(x, tmp, &gc_list, gclist)
- ___xfrm_state_destroy(x);
+ xfrm_state_gc_destroy(x);
}
static enum hrtimer_restart xfrm_timer_handler(struct hrtimer *me)
@@ -795,19 +795,14 @@ void xfrm_dev_state_free(struct xfrm_state *x)
}
#endif
-void __xfrm_state_destroy(struct xfrm_state *x, bool sync)
+void __xfrm_state_destroy(struct xfrm_state *x)
{
WARN_ON(x->km.state != XFRM_STATE_DEAD);
- if (sync) {
- synchronize_rcu();
- ___xfrm_state_destroy(x);
- } else {
- spin_lock_bh(&xfrm_state_gc_lock);
- hlist_add_head(&x->gclist, &xfrm_state_gc_list);
- spin_unlock_bh(&xfrm_state_gc_lock);
- schedule_work(&xfrm_state_gc_work);
- }
+ spin_lock_bh(&xfrm_state_gc_lock);
+ hlist_add_head(&x->gclist, &xfrm_state_gc_list);
+ spin_unlock_bh(&xfrm_state_gc_lock);
+ schedule_work(&xfrm_state_gc_work);
}
EXPORT_SYMBOL(__xfrm_state_destroy);
@@ -922,7 +917,7 @@ xfrm_dev_state_flush_secctx_check(struct net *net, struct net_device *dev, bool
}
#endif
-int xfrm_state_flush(struct net *net, u8 proto, bool task_valid, bool sync)
+int xfrm_state_flush(struct net *net, u8 proto, bool task_valid)
{
int i, err = 0, cnt = 0;
@@ -3283,7 +3278,7 @@ void xfrm_state_fini(struct net *net)
unsigned int sz;
flush_work(&net->xfrm.state_hash_work);
- xfrm_state_flush(net, 0, false, true);
+ xfrm_state_flush(net, IPSEC_PROTO_ANY, false);
flush_work(&xfrm_state_gc_work);
WARN_ON(!list_empty(&net->xfrm.state_all));
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 1db18f470f42..684239018bec 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2635,7 +2635,7 @@ static int xfrm_flush_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
struct xfrm_usersa_flush *p = nlmsg_data(nlh);
int err;
- err = xfrm_state_flush(net, p->proto, true, false);
+ err = xfrm_state_flush(net, p->proto, true);
if (err) {
if (err == -ESRCH) /* empty table */
return 0;
--
2.50.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel
2025-07-04 14:54 [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel Sabrina Dubroca
2025-07-04 14:54 ` [PATCH ipsec 1/2] xfrm: delete x->tunnel as we delete x Sabrina Dubroca
2025-07-04 14:54 ` [PATCH ipsec 2/2] Revert "xfrm: destroy xfrm_state synchronously on net exit path" Sabrina Dubroca
@ 2025-07-14 7:02 ` Steffen Klassert
2 siblings, 0 replies; 4+ messages in thread
From: Steffen Klassert @ 2025-07-14 7:02 UTC (permalink / raw)
To: Sabrina Dubroca; +Cc: netdev, Herbert Xu, Alexey Dobriyan, Cong Wang
On Fri, Jul 04, 2025 at 04:54:32PM +0200, Sabrina Dubroca wrote:
> IPcomp tunnel states have an associated fallback tunnel, a keep a
> reference on the corresponding xfrm_state, to allow deleting that
> extra state when it's not needed anymore. These states cause issues
> during netns deletion.
>
> Commit f75a2804da39 ("xfrm: destroy xfrm_state synchronously on net
> exit path") tried to address these problems but doesn't fully solve
> them, and slowed down netns deletion by adding one synchronize_rcu per
> deleted state.
>
> The first patch solves the problem by moving the fallback state
> deletion earlier (when we delete the user state, rather than at
> destruction), then we can revert the previous fix.
>
> Sabrina Dubroca (2):
> xfrm: delete x->tunnel as we delete x
> Revert "xfrm: destroy xfrm_state synchronously on net exit path"
Series applied, thanks Sabrina!
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-07-14 7:02 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-04 14:54 [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel Sabrina Dubroca
2025-07-04 14:54 ` [PATCH ipsec 1/2] xfrm: delete x->tunnel as we delete x Sabrina Dubroca
2025-07-04 14:54 ` [PATCH ipsec 2/2] Revert "xfrm: destroy xfrm_state synchronously on net exit path" Sabrina Dubroca
2025-07-14 7:02 ` [PATCH ipsec 0/2] ipsec: fix splat due to ipcomp fallback tunnel Steffen Klassert
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).