From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: netdev@vger.kernel.org
Cc: Jiayuan Chen <jiayuan.chen@shopee.com>,
syzbot+e738404dcd14b620923c@syzkaller.appspotmail.com,
Jiayuan Chen <jiayuan.chen@linux.dev>,
"David S. Miller" <davem@davemloft.net>,
David Ahern <dsahern@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
linux-kernel@vger.kernel.org
Subject: [PATCH net v3] xfrm: fix ip_rt_bug race in icmp_route_lookup reverse path
Date: Thu, 5 Feb 2026 15:02:02 +0800
Message-ID: <20260205070203.61560-1-jiayuan.chen@linux.dev>
From: Jiayuan Chen <jiayuan.chen@shopee.com>
icmp_route_lookup() performs multiple route lookups to find a suitable
route for sending ICMP error messages, with special handling for XFRM
(IPsec) policies.
The lookup sequence is:
1. First, look up the output route for the ICMP reply (dst = original src)
2. Pass it through xfrm_lookup() for the policy check
3. If blocked (-EPERM) or dst is not local, enter the "reverse path"
4. In the reverse path, call xfrm_decode_session_reverse() to get fl4_dec,
   which reverses the original packet's flow (saddr<->daddr swapped)
5. If fl4_dec.saddr is local (we are the original destination), use
   __ip_route_output_key() for the output route lookup
6. If fl4_dec.saddr is NOT local (we are a forwarding node), use
   ip_route_input() to simulate the reverse packet's input path
7. Finally, pass rt2 through xfrm_lookup() with the XFRM_LOOKUP_ICMP flag
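As an illustration only, the choice made in steps 4-6 can be modelled in
user space. This is a sketch, not the kernel code: every model_* name
below is a hypothetical stand-in, and "local" is reduced to comparing
against a single configured address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins; none of these are kernel definitions. */
struct model_flow4 {
	uint32_t saddr;
	uint32_t daddr;
};

enum model_lookup {
	MODEL_LOOKUP_OUTPUT,	/* step 5: output route lookup */
	MODEL_LOOKUP_INPUT,	/* step 6: simulated input path */
};

/* Step 4: swap source and destination of the original packet's flow. */
static struct model_flow4 model_decode_reverse(struct model_flow4 orig)
{
	struct model_flow4 dec = { .saddr = orig.daddr, .daddr = orig.saddr };

	return dec;
}

/* Toy locality check against one configured address (an assumption). */
static bool model_is_local(uint32_t addr, uint32_t local_addr)
{
	return addr == local_addr;
}

/* Steps 5 and 6: pick the lookup by whether the decoded source is local. */
static enum model_lookup model_choose_lookup(struct model_flow4 dec,
					     uint32_t local_addr)
{
	if (model_is_local(dec.saddr, local_addr))
		return MODEL_LOOKUP_OUTPUT;
	return MODEL_LOOKUP_INPUT;
}

/* Worked example: a forwarding node that owns neither endpoint address. */
static enum model_lookup model_forwarding_example(void)
{
	struct model_flow4 orig = { .saddr = 0x0a000001, .daddr = 0x0a000002 };
	struct model_flow4 dec = model_decode_reverse(orig);

	assert(dec.saddr == orig.daddr && dec.daddr == orig.saddr);
	return model_choose_lookup(dec, 0x0a0000fe);
}
```

In this model the forwarding example takes the input-path branch, which
is the branch the rest of this message is concerned with.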
The bug occurs in step 6: ip_route_input() is called with fl4_dec.daddr
(the original packet's source) as the destination. If this address
becomes local between the initial check and the ip_route_input() call
(e.g., due to a concurrent "ip addr add"), ip_route_input() returns a
LOCAL route with dst.output set to ip_rt_bug.
This route is then used for ICMP output, causing dst_output() to call
ip_rt_bug(), triggering a WARN_ON:
------------[ cut here ]------------
WARNING: net/ipv4/route.c:1275 at ip_rt_bug+0x21/0x30, CPU#1
Call Trace:
<TASK>
ip_push_pending_frames+0x202/0x240
icmp_push_reply+0x30d/0x430
__icmp_send+0x1149/0x24f0
ip_options_compile+0xa2/0xd0
ip_rcv_finish_core+0x829/0x1950
ip_rcv+0x2d7/0x420
__netif_receive_skb_one_core+0x185/0x1f0
netif_receive_skb+0x90/0x450
tun_get_user+0x3413/0x3fb0
tun_chr_write_iter+0xe4/0x220
...
Fix this by checking rt2->rt_type after ip_route_input(). If it's
RTN_LOCAL, the route cannot be used for output, so treat it as an error.
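The shape of that check can be sketched in user space as follows. This is
a simplified model under stated assumptions, not the patch itself: the
model_* types and helpers are hypothetical, with model_release() standing
in for dropping the route reference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the route type and route entry. */
enum model_rt_type { MODEL_RTN_UNICAST, MODEL_RTN_LOCAL };

struct model_rtable {
	enum model_rt_type rt_type;
	int refcnt;
};

/* Drop one reference held on the route. */
static void model_release(struct model_rtable *rt)
{
	assert(rt->refcnt > 0);
	rt->refcnt--;
}

/*
 * Guard applied to the result of the input-path lookup: a LOCAL route is
 * released and reported as an error instead of being used for output.
 * Returns 0 if rt2 may be used, or a negative errno otherwise. An error
 * already set by the caller is passed through unchanged.
 */
static int model_check_reverse_route(int err, struct model_rtable *rt2)
{
	if (!err && rt2 && rt2->rt_type == MODEL_RTN_LOCAL) {
		model_release(rt2);
		err = -EINVAL;
	}
	return err;
}
```

With this guard a LOCAL result follows the same error handling as a
failed lookup, so the caller never reaches the output path with it.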
The reproducer requires kernel modification to widen the race window,
making it unsuitable as a selftest. It is available at:
https://gist.github.com/mrpre/eae853b72ac6a750f5d45d64ddac1e81
Reported-by: syzbot+e738404dcd14b620923c@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000b1060905eada8881@google.com/T/
Closes: https://lore.kernel.org/r/20260128090523.356953-1-jiayuan.chen@linux.dev
Fixes: 8b7817f3a959 ("[IPSEC]: Add ICMP host relookup support")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
v1 -> v3:
Suggested by Paolo Abeni:
- Resend it using net tree and using xfrm prefix
- Fix text string over 80 chars limit.
- Simplify commit message.
v1: https://lore.kernel.org/r/20260128090523.356953-1-jiayuan.chen@linux.dev
v2: https://lore.kernel.org/netdev/20260203063449.44737-1-jiayuan.chen@linux.dev/
---
net/ipv4/icmp.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 4abbec2f47ef..35816ac749bc 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -554,6 +554,22 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
/* steal dst entry from skb_in, don't drop refcnt */
skb_dstref_steal(skb_in);
skb_dstref_restore(skb_in, orefdst);
+
+ /*
+ * At this point, fl4_dec.daddr should NOT be local (we
+ * checked fl4_dec.saddr above). However, a race condition
+ * may occur if the address is added to the interface
+ * concurrently. In that case, ip_route_input() returns a
+ * LOCAL route with dst.output=ip_rt_bug, which must not
+ * be used for output.
+ */
+ if (!err && rt2 && rt2->rt_type == RTN_LOCAL) {
+ net_warn_ratelimited("detected local route for %pI4 "
+ "during ICMP sending, src %pI4\n",
+ &fl4_dec.daddr, &fl4_dec.saddr);
+ dst_release(&rt2->dst);
+ err = -EINVAL;
+ }
}
if (err)
--
2.43.0