[PATCH v2] xfrm: delay dev_put in xfrm_input to after transport reinject

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2] xfrm: delay dev_put in xfrm_input to after transport reinject
@ 2026-03-31  9:27 Qi Tang
  2026-04-02 10:36 ` Steffen Klassert
  0 siblings, 1 reply; 4+ messages in thread
From: Qi Tang @ 2026-03-31  9:27 UTC (permalink / raw)
  To: Steffen Klassert, Herbert Xu
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, David Ahern, netdev, Qi Tang, stable

xfrm_trans_queue() queues transport-mode packets for async reinject
via xfrm_trans_reinject() workqueue.  After async crypto completes,
xfrm_input_resume() re-enters xfrm_input() with encap_type == -1,
which immediately calls dev_put(skb->dev) before the skb reaches
transport_finish and the reinject queue.  The device can be freed
before the workqueue callback runs, causing a use-after-free when
xfrm_trans_reinject dereferences skb->dev.

Remove the dev_put from the async resumption entry and let the
reference survive through the transport reinject path.  Introduce
async variants of the NF_HOOK okfn callbacks that queue the skb
with dev_held=true and drop the reference on error.  The reinject
worker checks this flag and puts the reference after the callback
completes.

For the synchronous crypto path, the existing dev_hold/dev_put
around x->type->input() is unchanged — the reference is balanced
within the same softirq context before the skb reaches the queue.

If the loop re-enters async crypto (multi-SPI with a second
-EINPROGRESS), drop the extra reference from the earlier async
resume so exactly one reference accompanies the skb.

Fixes: acf568ee859f ("xfrm: Reinject transport-mode packets through tasklet")
Cc: stable@vger.kernel.org
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
---
Changes in v2:
  - Do not add extra dev_hold/dev_put pair (reviewer feedback:
    "expensive operation, we just drop it too early")
  - Reuse existing dev_hold from xfrm_input, delay dev_put to
    reinject completion
  - Add async okfn variants for IPv4/IPv6 transport_finish so
    the reinject queue knows whether a dev ref is held
  - Drop the cb->dev field from v1; use bool dev_held flag instead

Link: https://lore.kernel.org/all/20260320073023.21873-1-tpluszz77@gmail.com/
---
 include/net/xfrm.h     |  3 ++-
 net/ipv4/esp4.c        |  3 ++-
 net/ipv4/xfrm4_input.c | 25 ++++++++++++++++++++++++-
 net/ipv6/esp6.c        |  3 ++-
 net/ipv6/xfrm6_input.c | 16 +++++++++++++++-
 net/xfrm/xfrm_input.c  | 35 ++++++++++++++++++++++++++---------
 6 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 10d3edde6b2f..1dd8b3b36649 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1779,7 +1779,8 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
 int xfrm_input_resume(struct sk_buff *skb, int nexthdr);
 int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,
 			 int (*finish)(struct net *, struct sock *,
-				       struct sk_buff *));
+				       struct sk_buff *),
+			 bool dev_held);
 int xfrm_trans_queue(struct sk_buff *skb,
 		     int (*finish)(struct net *, struct sock *,
 				   struct sk_buff *));
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 6dfc0bcdef65..0114c92b10d4 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -187,7 +187,8 @@ static int esp_output_tail_tcp(struct xfrm_state *x, struct sk_buff *skb)
 	int err;
 
 	local_bh_disable();
-	err = xfrm_trans_queue_net(xs_net(x), skb, esp_output_tcp_encap_cb);
+	err = xfrm_trans_queue_net(xs_net(x), skb, esp_output_tcp_encap_cb,
+				   false);
 	local_bh_enable();
 
 	/* EINPROGRESS just happens to do the right thing.  It
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index f28cfd88eaf5..9765fdc63ffc 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -46,6 +46,28 @@ static inline int xfrm4_rcv_encap_finish(struct net *net, struct sock *sk,
 	return NET_RX_DROP;
 }
 
+static int xfrm4_rcv_encap_finish_async(struct net *net, struct sock *sk,
+					struct sk_buff *skb)
+{
+	if (!skb_dst(skb)) {
+		const struct iphdr *iph = ip_hdr(skb);
+
+		if (ip_route_input_noref(skb, iph->daddr, iph->saddr,
+					 ip4h_dscp(iph), skb->dev))
+			goto drop;
+	}
+
+	if (xfrm_trans_queue_net(dev_net(skb->dev), skb,
+				 xfrm4_rcv_encap_finish2, true))
+		goto drop;
+
+	return 0;
+drop:
+	dev_put(skb->dev);
+	kfree_skb(skb);
+	return NET_RX_DROP;
+}
+
 int xfrm4_transport_finish(struct sk_buff *skb, int async)
 {
 	struct xfrm_offload *xo = xfrm_offload(skb);
@@ -74,7 +96,8 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
 
 	NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
 		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
-		xfrm4_rcv_encap_finish);
+		async ? xfrm4_rcv_encap_finish_async :
+			xfrm4_rcv_encap_finish);
 	return 0;
 }
 
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 9f75313734f8..8a0a44d7d010 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -204,7 +204,8 @@ static int esp_output_tail_tcp(struct xfrm_state *x, struct sk_buff *skb)
 	int err;
 
 	local_bh_disable();
-	err = xfrm_trans_queue_net(xs_net(x), skb, esp_output_tcp_encap_cb);
+	err = xfrm_trans_queue_net(xs_net(x), skb, esp_output_tcp_encap_cb,
+				   false);
 	local_bh_enable();
 
 	/* EINPROGRESS just happens to do the right thing.  It
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 9005fc156a20..d4eede5315ac 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -40,6 +40,19 @@ static int xfrm6_transport_finish2(struct net *net, struct sock *sk,
 	return 0;
 }
 
+static int xfrm6_transport_finish2_async(struct net *net, struct sock *sk,
+					 struct sk_buff *skb)
+{
+	if (xfrm_trans_queue_net(dev_net(skb->dev), skb, ip6_rcv_finish,
+				 true)) {
+		dev_put(skb->dev);
+		kfree_skb(skb);
+		return NET_RX_DROP;
+	}
+
+	return 0;
+}
+
 int xfrm6_transport_finish(struct sk_buff *skb, int async)
 {
 	struct xfrm_offload *xo = xfrm_offload(skb);
@@ -69,7 +82,8 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 
 	NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
 		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
-		xfrm6_transport_finish2);
+		async ? xfrm6_transport_finish2_async :
+			xfrm6_transport_finish2);
 	return 0;
 }
 
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index dc1312ed5a09..2d75f984532a 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -40,6 +40,7 @@ struct xfrm_trans_cb {
 	} header;
 	int (*finish)(struct net *net, struct sock *sk, struct sk_buff *skb);
 	struct net *net;
+	bool dev_held;
 };
 
 #define XFRM_TRANS_SKB_CB(__skb) ((struct xfrm_trans_cb *)&((__skb)->cb[0]))
@@ -506,7 +507,6 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		/* An encap_type of -1 indicates async resumption. */
 		if (encap_type == -1) {
 			async = 1;
-			dev_put(skb->dev);
 			seq = XFRM_SKB_CB(skb)->seq.input.low;
 			spin_lock(&x->lock);
 			goto resume;
@@ -659,8 +659,11 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			dev_hold(skb->dev);
 
 			nexthdr = x->type->input(x, skb);
-			if (nexthdr == -EINPROGRESS)
+			if (nexthdr == -EINPROGRESS) {
+				if (async)
+					dev_put(skb->dev);
 				return 0;
+			}
 
 			dev_put(skb->dev);
 			spin_lock(&x->lock);
@@ -695,9 +698,11 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 		XFRM_MODE_SKB_CB(skb)->protocol = nexthdr;
 
 		err = xfrm_inner_mode_input(x, skb);
-		if (err == -EINPROGRESS)
+		if (err == -EINPROGRESS) {
+			if (async)
+				dev_put(skb->dev);
 			return 0;
-		else if (err) {
+		} else if (err) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
 			goto drop;
 		}
@@ -734,6 +739,8 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			sp->olen = 0;
 		if (skb_valid_dst(skb))
 			skb_dst_drop(skb);
+		if (async)
+			dev_put(skb->dev);
 		gro_cells_receive(&gro_cells, skb);
 		return 0;
 	} else {
@@ -753,6 +760,8 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 				sp->olen = 0;
 			if (skb_valid_dst(skb))
 				skb_dst_drop(skb);
+			if (async)
+				dev_put(skb->dev);
 			gro_cells_receive(&gro_cells, skb);
 			return err;
 		}
@@ -763,6 +772,8 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 drop_unlock:
 	spin_unlock(&x->lock);
 drop:
+	if (async)
+		dev_put(skb->dev);
 	xfrm_rcv_cb(skb, family, x && x->type ? x->type->proto : nexthdr, -1);
 	kfree_skb(skb);
 	return 0;
@@ -787,15 +798,20 @@ static void xfrm_trans_reinject(struct work_struct *work)
 	spin_unlock_bh(&trans->queue_lock);
 
 	local_bh_disable();
-	while ((skb = __skb_dequeue(&queue)))
-		XFRM_TRANS_SKB_CB(skb)->finish(XFRM_TRANS_SKB_CB(skb)->net,
-					       NULL, skb);
+	while ((skb = __skb_dequeue(&queue))) {
+		struct xfrm_trans_cb *cb = XFRM_TRANS_SKB_CB(skb);
+		struct net_device *dev = cb->dev_held ? skb->dev : NULL;
+
+		cb->finish(cb->net, NULL, skb);
+		dev_put(dev);
+	}
 	local_bh_enable();
 }
 
 int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,
 			 int (*finish)(struct net *, struct sock *,
-				       struct sk_buff *))
+				       struct sk_buff *),
+			 bool dev_held)
 {
 	struct xfrm_trans_tasklet *trans;
 
@@ -808,6 +824,7 @@ int xfrm_trans_queue_net(struct net *net, struct sk_buff *skb,
 
 	XFRM_TRANS_SKB_CB(skb)->finish = finish;
 	XFRM_TRANS_SKB_CB(skb)->net = net;
+	XFRM_TRANS_SKB_CB(skb)->dev_held = dev_held;
 	spin_lock_bh(&trans->queue_lock);
 	__skb_queue_tail(&trans->queue, skb);
 	spin_unlock_bh(&trans->queue_lock);
@@ -820,7 +837,7 @@ int xfrm_trans_queue(struct sk_buff *skb,
 		     int (*finish)(struct net *, struct sock *,
 				   struct sk_buff *))
 {
-	return xfrm_trans_queue_net(dev_net(skb->dev), skb, finish);
+	return xfrm_trans_queue_net(dev_net(skb->dev), skb, finish, false);
 }
 EXPORT_SYMBOL(xfrm_trans_queue);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] xfrm: delay dev_put in xfrm_input to after transport reinject
  2026-03-31  9:27 [PATCH v2] xfrm: delay dev_put in xfrm_input to after transport reinject Qi Tang
@ 2026-04-02 10:36 ` Steffen Klassert
  2026-04-02 10:54   ` Florian Westphal
  0 siblings, 1 reply; 4+ messages in thread
From: Steffen Klassert @ 2026-04-02 10:36 UTC (permalink / raw)
  To: Qi Tang
  Cc: Herbert Xu, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, David Ahern, netdev, stable

On Tue, Mar 31, 2026 at 05:27:37PM +0800, Qi Tang wrote:
> diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
> index f28cfd88eaf5..9765fdc63ffc 100644
> --- a/net/ipv4/xfrm4_input.c
> +++ b/net/ipv4/xfrm4_input.c
> @@ -46,6 +46,28 @@ static inline int xfrm4_rcv_encap_finish(struct net *net, struct sock *sk,
>  	return NET_RX_DROP;
>  }
>  
> +static int xfrm4_rcv_encap_finish_async(struct net *net, struct sock *sk,
> +					struct sk_buff *skb)
> +{
> +	if (!skb_dst(skb)) {
> +		const struct iphdr *iph = ip_hdr(skb);
> +
> +		if (ip_route_input_noref(skb, iph->daddr, iph->saddr,
> +					 ip4h_dscp(iph), skb->dev))
> +			goto drop;
> +	}
> +
> +	if (xfrm_trans_queue_net(dev_net(skb->dev), skb,
> +				 xfrm4_rcv_encap_finish2, true))
> +		goto drop;
> +
> +	return 0;
> +drop:
> +	dev_put(skb->dev);
> +	kfree_skb(skb);
> +	return NET_RX_DROP;
> +}
> +
>  int xfrm4_transport_finish(struct sk_buff *skb, int async)
>  {
>  	struct xfrm_offload *xo = xfrm_offload(skb);
> @@ -74,7 +96,8 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
>  
>  	NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
>  		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
> -		xfrm4_rcv_encap_finish);
> +		async ? xfrm4_rcv_encap_finish_async :
> +			xfrm4_rcv_encap_finish);

What happens if the PRE_ROUTING hook returns NF_DROP, NF_QUEUE, or
NF_STOLEN before the okfn runs? Looks like we leak the dev refcnt
then.

> diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
> index 9005fc156a20..d4eede5315ac 100644
> --- a/net/ipv6/xfrm6_input.c
> +++ b/net/ipv6/xfrm6_input.c
> @@ -40,6 +40,19 @@ static int xfrm6_transport_finish2(struct net *net, struct sock *sk,
>  	return 0;
>  }
>  
> +static int xfrm6_transport_finish2_async(struct net *net, struct sock *sk,
> +					 struct sk_buff *skb)
> +{
> +	if (xfrm_trans_queue_net(dev_net(skb->dev), skb, ip6_rcv_finish,
> +				 true)) {
> +		dev_put(skb->dev);
> +		kfree_skb(skb);
> +		return NET_RX_DROP;
> +	}
> +
> +	return 0;
> +}
> +
>  int xfrm6_transport_finish(struct sk_buff *skb, int async)
>  {
>  	struct xfrm_offload *xo = xfrm_offload(skb);
> @@ -69,7 +82,8 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
>  
>  	NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
>  		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
> -		xfrm6_transport_finish2);
> +		async ? xfrm6_transport_finish2_async :
> +			xfrm6_transport_finish2);

Same here.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] xfrm: delay dev_put in xfrm_input to after transport reinject
  2026-04-02 10:36 ` Steffen Klassert
@ 2026-04-02 10:54   ` Florian Westphal
  2026-04-02 11:26     ` Qi Tang
  0 siblings, 1 reply; 4+ messages in thread
From: Florian Westphal @ 2026-04-02 10:54 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Qi Tang, Herbert Xu, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, David Ahern, netdev,
	stable

Steffen Klassert <steffen.klassert@secunet.com> wrote:
> On Tue, Mar 31, 2026 at 05:27:37PM +0800, Qi Tang wrote:
> >  int xfrm4_transport_finish(struct sk_buff *skb, int async)
> >  {
> >  	struct xfrm_offload *xo = xfrm_offload(skb);
> > @@ -74,7 +96,8 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
> >  
> >  	NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
> >  		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
> > -		xfrm4_rcv_encap_finish);
> > +		async ? xfrm4_rcv_encap_finish_async :
> > +			xfrm4_rcv_encap_finish);
> 
> What happens if the PRE_ROUTING hook returns NF_DROP, NF_QUEUE, or
> NF_STOLEN before the okfn runs? Looks like we leak the dev refcnt
> then.

Yes, no okfn is run in those cases.
I'd suggest do drop the refcount after NF_HOOK, i.e. something like:

dev = skb->dev;

NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
...
if (async)
	dev_put(dev);

Thats easier to follow.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] xfrm: delay dev_put in xfrm_input to after transport reinject
  2026-04-02 10:54   ` Florian Westphal
@ 2026-04-02 11:26     ` Qi Tang
  0 siblings, 0 replies; 4+ messages in thread
From: Qi Tang @ 2026-04-02 11:26 UTC (permalink / raw)
  To: Florian Westphal, Steffen Klassert
  Cc: Herbert Xu, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, David Ahern, netdev, stable, Qi Tang

On Thu, Apr 3, 2026, Florian Westphal wrote:
> I'd suggest do drop the refcount after NF_HOOK, i.e. something like:
>
> dev = skb->dev;
>
> NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
> ...
> if (async)
>         dev_put(dev);

Much cleaner. The reinject callback only uses cb->net (saved at
queue time) and dst_input, neither needs skb->dev, so the ref
only has to survive through the NF_HOOK call.

Will send v3 with this approach.

Qi Tang

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-04-02 11:26 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-31  9:27 [PATCH v2] xfrm: delay dev_put in xfrm_input to after transport reinject Qi Tang
2026-04-02 10:36 ` Steffen Klassert
2026-04-02 10:54   ` Florian Westphal
2026-04-02 11:26     ` Qi Tang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox