From: Jakub Kicinski <kuba@kernel.org>
To: "Daniel Gröber" <dxld@darkboxed.org>
Cc: "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
netdev@vger.kernel.org, Richard Weinberger <richard@nod.at>,
Serge Hallyn <serge.hallyn@canonical.com>,
"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
Date: Fri, 13 Oct 2023 15:36:05 -0700 [thread overview]
Message-ID: <20231013153605.487f5a74@kernel.org> (raw)
In-Reply-To: <20231010121003.x3yi6fihecewjy4e@House.clients.dxld.at>
On Tue, 10 Oct 2023 14:10:03 +0200 Daniel Gröber wrote:
> Changing a device's netns and renaming it with one RTM_NEWLINK call causes
> a rogue MOVE uevent to be delivered to the new netns in addition to the
> expected ADD uevent.
>
> iproute2 reproducer:
>
> $ ip netns add test
> $ ip link add dev eth0 netns test type dummy
> $ ip link add dev eth0 type dummy
>
> $ ip -netns test link set dev eth0 netns 1 name eth123
>
> With the last command, which renames the device and moves it out of the
> netns, we get the following:
>
> $ udevadm monitor -k
> KERNEL[230953.424834] add /devices/virtual/net/eth0 (net)
> KERNEL[230953.424914] move /devices/virtual/net/eth0 (net)
> KERNEL[230953.425066] move /devices/virtual/net/eth123 (net)
FTR I don't see the move on current net-next, I see two adds
and one move.
[ ~]# udevadm monitor -k &
monitor will print the received events for:
KERNEL - the kernel uevent
[ ~]# ip netns add test
[ ~]# ip link add dev eth0 netns test type dummy
KERNEL[115.393650] add /module/dummy (module)
[ ~]# ip link add dev eth0 type dummy
KERNEL[121.702300] add /devices/virtual/net/eth0 (net)
KERNEL[121.704608] add /devices/virtual/net/eth0/queues/rx-0 (queues)
KERNEL[121.704733] add /devices/virtual/net/eth0/queues/tx-0 (queues)
[ ~]# ip -netns test link set dev eth0 netns 1 name eth123
KERNEL[135.598907] add /devices/virtual/net/eth0 (net)
KERNEL[135.600425] move /devices/virtual/net/eth123 (net)
I don't think it matters for the problem you're describing, tho.
> The problem is the MOVE event hooribly confuses userspace. The particular
> symptom we're seing is that systemd will bring down the ifup@eth0.service
> on the host as it handles the MOVE of eth0->eth123 as a stop for the
> BoundTo sys-subsystem-net-devices-eth0.device unit.
>
> I also create a clashing eth0 device on the host in the repro to
> demonstrate that the RTM_NETLINK move+rename call is atomic and so the MOVE
> event is entirely nonsensical.
>
> Looking at the code in __rtnl_newlink I see do_setlink first calls
> __dev_change_net_namespace and then dev_change_name. My guess is the order
> is just wrong here.
Interesting. My analysis is slightly different but only in low level
aspects.. tell me if I'm wrong:
1. we have tb[IFLA_IFNAME] set, so do_setlink() will populate ifname
2. Because of #1, __dev_change_net_namespace() gets called with
new name provide (pat = eth123)
3. It will do netdev_name_in_use(), which returns true.
4. It will then call dev_get_valid_name() which, (confusingly?) already
sets the new name on the netdevice itself.
5. It then calls device shutdown from the source netns.
This results in Bug#1, the netlink notification carries
the new name in the old netns.
[ ~]# ip -netns test link add dev eth0 netns test type dummy
[ ~]# ip -netns test link add dev eth1 netns test type dummy
[ ~]# ip -netns test link set dev eth0 netns 1 name eth1
ip monitor inside netns:
5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff
Deleted inet eth0
Deleted inet6 eth0
Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7
tells us we deleted eth1, ifindex 5, which is not true. It was eth0.
Small sidebar - altnames are completely broken when it comes to netns,
too:
[ ~]# ip link add dev eth0 type dummy
[ ~]# ip link add dev eth1 type dummy
[ ~]# ip link property add dev eth1 altname eth0
RTNETLINK answers: File exists
^ it's not letting us use eth0 because device eth0 exists, but
[ ~]# ip netns add test
[ ~]# ip -netns test link add dev ethX netns test type dummy
[ ~]# ip -netns test link property add dev ethX altname eth0
[ ~]# ip -netns test link set dev ethX netns 1
[ ~]# ip li
...
3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
...
5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
altname eth0
and we have eth0 altname. So that's Bug#2.
Picking back up after the shutdown in old netns.
6. We call:
kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
dev_net_set(dev, net);
kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
Those are the calls you see in udev, recall that device core has
its own naming, so both of those calls will use the _old_ name.
REMOVE in the source netns and ADD in the destination netns.
The kobject calls were added by Serge in 4e66ae2ea371c, in 2012,
curiously it states:
v2: also send KOBJ_ADD to new netns. There will then be a
_MOVE event from the device_rename() call, but that should
be innocuous.
Which was based on this conversation:
https://lore.kernel.org/all/20121012031328.GA5472@sergelap/
7. Now we finally call:
err = device_rename(&dev->dev, dev->name);
Which tells device core that the name has changed, and gives you
the (second) MOVE event. This time with the correct name.
Which is what you're seeing, Bug#3, the ADD event should be after
the call to device_rename()...
Bug#1 and Bug#2 we can fix in networking. Bug#3 is a bit more tricky,
because what we want is a "silent" rename, without generating the MOVE.
This email is a bit long, so let me cut off here..
next prev parent reply other threads:[~2023-10-13 22:36 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-10 12:10 [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change Daniel Gröber
2023-10-13 22:36 ` Jakub Kicinski [this message]
2023-10-13 22:43 ` Jakub Kicinski
2023-10-14 8:58 ` Greg Kroah-Hartman
2023-10-16 14:32 ` Jakub Kicinski
2023-10-16 17:20 ` Greg Kroah-Hartman
2023-10-16 19:03 ` Jakub Kicinski
2023-10-17 1:20 ` Daniel Gröber
2023-10-17 1:45 ` Jakub Kicinski
2023-11-11 2:13 ` [RFC net-next] net: do not send a MOVE event when netdev changes netns Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231013153605.487f5a74@kernel.org \
--to=kuba@kernel.org \
--cc=davem@davemloft.net \
--cc=dxld@darkboxed.org \
--cc=ebiederm@xmission.com \
--cc=edumazet@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=richard@nod.at \
--cc=serge.hallyn@canonical.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.