All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: "Daniel Gröber" <dxld@darkboxed.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org, Richard Weinberger <richard@nod.at>,
	Serge Hallyn <serge.hallyn@canonical.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
Date: Fri, 13 Oct 2023 15:36:05 -0700	[thread overview]
Message-ID: <20231013153605.487f5a74@kernel.org> (raw)
In-Reply-To: <20231010121003.x3yi6fihecewjy4e@House.clients.dxld.at>

On Tue, 10 Oct 2023 14:10:03 +0200 Daniel Gröber wrote:
> Changing a device's netns and renaming it with one RTM_NEWLINK call causes
> a rogue MOVE uevent to be delivered to the new netns in addition to the
> expected ADD uevent.
> 
> iproute2 reproducer:
> 
>     $ ip netns add test
>     $ ip link add dev eth0 netns test type dummy
>     $ ip link add dev eth0 type dummy
> 
>     $ ip -netns test link set dev eth0 netns 1 name eth123
> 
> With the last command, which renames the device and moves it out of the
> netns, we get the following:
> 
>     $ udevadm monitor -k
>     KERNEL[230953.424834] add      /devices/virtual/net/eth0 (net)
>     KERNEL[230953.424914] move     /devices/virtual/net/eth0 (net)
>     KERNEL[230953.425066] move     /devices/virtual/net/eth123 (net)

FTR I don't see the move on current net-next, I see two adds 
and one move.

 [ ~]# udevadm monitor -k &
 monitor will print the received events for:
 KERNEL - the kernel uevent

 [ ~]# ip netns add test
 [ ~]# ip link add dev eth0 netns test type dummy
 KERNEL[115.393650] add      /module/dummy (module)
 [ ~]# ip link add dev eth0 type dummy
 KERNEL[121.702300] add      /devices/virtual/net/eth0 (net)
 KERNEL[121.704608] add      /devices/virtual/net/eth0/queues/rx-0 (queues)
 KERNEL[121.704733] add      /devices/virtual/net/eth0/queues/tx-0 (queues)
 [ ~]# ip -netns test link set dev eth0 netns 1 name eth123
 KERNEL[135.598907] add      /devices/virtual/net/eth0 (net)
 KERNEL[135.600425] move     /devices/virtual/net/eth123 (net)

I don't think it matters for the problem you're describing, tho.

> The problem is the MOVE event hooribly confuses userspace. The particular
> symptom we're seing is that systemd will bring down the ifup@eth0.service
> on the host as it handles the MOVE of eth0->eth123 as a stop for the
> BoundTo sys-subsystem-net-devices-eth0.device unit.
> 
> I also create a clashing eth0 device on the host in the repro to
> demonstrate that the RTM_NETLINK move+rename call is atomic and so the MOVE
> event is entirely nonsensical.
> 
> Looking at the code in __rtnl_newlink I see do_setlink first calls
> __dev_change_net_namespace and then dev_change_name. My guess is the order
> is just wrong here.

Interesting. My analysis is slightly different but only in low level
aspects.. tell me if I'm wrong:

1. we have tb[IFLA_IFNAME] set, so do_setlink() will populate ifname

2. Because of #1, __dev_change_net_namespace() gets called with 
   new name provide (pat = eth123)

3. It will do netdev_name_in_use(), which returns true.

4. It will then call dev_get_valid_name() which, (confusingly?) already
   sets the new name on the netdevice itself.

5. It then calls device shutdown from the source netns.
   This results in Bug#1, the netlink notification carries 
   the new name in the old netns.

 [ ~]# ip -netns test link add dev eth0 netns test type dummy
 [ ~]# ip -netns test link add dev eth1 netns test type dummy
 [ ~]# ip -netns test link set dev eth0 netns 1 name eth1

ip monitor inside netns:

 5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
     link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
 6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
     link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff

 Deleted inet eth0 
 Deleted inet6 eth0 
 Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
     link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7

tells us we deleted eth1, ifindex 5, which is not true. It was eth0.

Small sidebar - altnames are completely broken when it comes to netns,
too:

 [ ~]# ip link add dev eth0 type dummy
 [ ~]# ip link add dev eth1 type dummy
 [ ~]# ip link property add dev eth1 altname eth0
 RTNETLINK answers: File exists

^ it's not letting us use eth0 because device eth0 exists, but

 [ ~]# ip netns add test
 [ ~]# ip -netns test link add dev ethX netns test type dummy
 [ ~]# ip -netns test link property add dev ethX altname eth0
 [ ~]# ip -netns test link set dev ethX netns 1  

 [ ~]# ip li
 ...
 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
 ...
 5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
     altname eth0

and we have eth0 altname. So that's Bug#2.
Picking back up after the shutdown in old netns.

6. We call:

   kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
   dev_net_set(dev, net);
   kobject_uevent(&dev->dev.kobj, KOBJ_ADD);

Those are the calls you see in udev, recall that device core has
its own naming, so both of those calls will use the _old_ name.
REMOVE in the source netns and ADD in the destination netns.

The kobject calls were added by Serge in 4e66ae2ea371c, in 2012,
curiously it states:

    v2: also send KOBJ_ADD to new netns.  There will then be a
    _MOVE event from the device_rename() call, but that should
    be innocuous.

Which was based on this conversation:
https://lore.kernel.org/all/20121012031328.GA5472@sergelap/

7. Now we finally call:

   err = device_rename(&dev->dev, dev->name);

Which tells device core that the name has changed, and gives you 
the (second) MOVE event. This time with the correct name.

Which is what you're seeing, Bug#3, the ADD event should be after
the call to device_rename()...

Bug#1 and Bug#2 we can fix in networking. Bug#3 is a bit more tricky,
because what we want is a "silent" rename, without generating the MOVE.
This email is a bit long, so let me cut off here..

  reply	other threads:[~2023-10-13 22:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-10 12:10 [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change Daniel Gröber
2023-10-13 22:36 ` Jakub Kicinski [this message]
2023-10-13 22:43   ` Jakub Kicinski
2023-10-14  8:58     ` Greg Kroah-Hartman
2023-10-16 14:32       ` Jakub Kicinski
2023-10-16 17:20         ` Greg Kroah-Hartman
2023-10-16 19:03           ` Jakub Kicinski
2023-10-17  1:20           ` Daniel Gröber
2023-10-17  1:45             ` Jakub Kicinski
2023-11-11  2:13               ` [RFC net-next] net: do not send a MOVE event when netdev changes netns Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231013153605.487f5a74@kernel.org \
    --to=kuba@kernel.org \
    --cc=davem@davemloft.net \
    --cc=dxld@darkboxed.org \
    --cc=ebiederm@xmission.com \
    --cc=edumazet@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=richard@nod.at \
    --cc=serge.hallyn@canonical.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.