netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
@ 2023-10-10 12:10 Daniel Gröber
  2023-10-13 22:36 ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Gröber @ 2023-10-10 12:10 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Richard Weinberger

Hi netdev,

Changing a device's netns and renaming it with one RTM_NEWLINK call causes
a rogue MOVE uevent to be delivered to the new netns in addition to the
expected ADD uevent.

iproute2 reproducer:

    $ ip netns add test
    $ ip link add dev eth0 netns test type dummy
    $ ip link add dev eth0 type dummy

    $ ip -netns test link set dev eth0 netns 1 name eth123

With the last command, which renames the device and moves it out of the
netns, we get the following:

    $ udevadm monitor -k
    KERNEL[230953.424834] add      /devices/virtual/net/eth0 (net)
    KERNEL[230953.424914] move     /devices/virtual/net/eth0 (net)
    KERNEL[230953.425066] move     /devices/virtual/net/eth123 (net)

The problem is the MOVE event hooribly confuses userspace. The particular
symptom we're seing is that systemd will bring down the ifup@eth0.service
on the host as it handles the MOVE of eth0->eth123 as a stop for the
BoundTo sys-subsystem-net-devices-eth0.device unit.

I also create a clashing eth0 device on the host in the repro to
demonstrate that the RTM_NETLINK move+rename call is atomic and so the MOVE
event is entirely nonsensical.

Looking at the code in __rtnl_newlink I see do_setlink first calls
__dev_change_net_namespace and then dev_change_name. My guess is the order
is just wrong here.

Thanks,
--Daniel

PS: Full debugging log in https://github.com/lxc/incus/issues/146

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
  2023-10-10 12:10 [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change Daniel Gröber
@ 2023-10-13 22:36 ` Jakub Kicinski
  2023-10-13 22:43   ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-10-13 22:36 UTC (permalink / raw)
  To: Daniel Gröber
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Richard Weinberger, Serge Hallyn, Eric W. Biederman

On Tue, 10 Oct 2023 14:10:03 +0200 Daniel Gröber wrote:
> Changing a device's netns and renaming it with one RTM_NEWLINK call causes
> a rogue MOVE uevent to be delivered to the new netns in addition to the
> expected ADD uevent.
> 
> iproute2 reproducer:
> 
>     $ ip netns add test
>     $ ip link add dev eth0 netns test type dummy
>     $ ip link add dev eth0 type dummy
> 
>     $ ip -netns test link set dev eth0 netns 1 name eth123
> 
> With the last command, which renames the device and moves it out of the
> netns, we get the following:
> 
>     $ udevadm monitor -k
>     KERNEL[230953.424834] add      /devices/virtual/net/eth0 (net)
>     KERNEL[230953.424914] move     /devices/virtual/net/eth0 (net)
>     KERNEL[230953.425066] move     /devices/virtual/net/eth123 (net)

FTR I don't see the move on current net-next, I see two adds 
and one move.

 [ ~]# udevadm monitor -k &
 monitor will print the received events for:
 KERNEL - the kernel uevent

 [ ~]# ip netns add test
 [ ~]# ip link add dev eth0 netns test type dummy
 KERNEL[115.393650] add      /module/dummy (module)
 [ ~]# ip link add dev eth0 type dummy
 KERNEL[121.702300] add      /devices/virtual/net/eth0 (net)
 KERNEL[121.704608] add      /devices/virtual/net/eth0/queues/rx-0 (queues)
 KERNEL[121.704733] add      /devices/virtual/net/eth0/queues/tx-0 (queues)
 [ ~]# ip -netns test link set dev eth0 netns 1 name eth123
 KERNEL[135.598907] add      /devices/virtual/net/eth0 (net)
 KERNEL[135.600425] move     /devices/virtual/net/eth123 (net)

I don't think it matters for the problem you're describing, tho.

> The problem is the MOVE event hooribly confuses userspace. The particular
> symptom we're seing is that systemd will bring down the ifup@eth0.service
> on the host as it handles the MOVE of eth0->eth123 as a stop for the
> BoundTo sys-subsystem-net-devices-eth0.device unit.
> 
> I also create a clashing eth0 device on the host in the repro to
> demonstrate that the RTM_NETLINK move+rename call is atomic and so the MOVE
> event is entirely nonsensical.
> 
> Looking at the code in __rtnl_newlink I see do_setlink first calls
> __dev_change_net_namespace and then dev_change_name. My guess is the order
> is just wrong here.

Interesting. My analysis is slightly different but only in low level
aspects.. tell me if I'm wrong:

1. we have tb[IFLA_IFNAME] set, so do_setlink() will populate ifname

2. Because of #1, __dev_change_net_namespace() gets called with 
   new name provide (pat = eth123)

3. It will do netdev_name_in_use(), which returns true.

4. It will then call dev_get_valid_name() which, (confusingly?) already
   sets the new name on the netdevice itself.

5. It then calls device shutdown from the source netns.
   This results in Bug#1, the netlink notification carries 
   the new name in the old netns.

 [ ~]# ip -netns test link add dev eth0 netns test type dummy
 [ ~]# ip -netns test link add dev eth1 netns test type dummy
 [ ~]# ip -netns test link set dev eth0 netns 1 name eth1

ip monitor inside netns:

 5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
     link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
 6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
     link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff

 Deleted inet eth0 
 Deleted inet6 eth0 
 Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
     link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7

tells us we deleted eth1, ifindex 5, which is not true. It was eth0.

Small sidebar - altnames are completely broken when it comes to netns,
too:

 [ ~]# ip link add dev eth0 type dummy
 [ ~]# ip link add dev eth1 type dummy
 [ ~]# ip link property add dev eth1 altname eth0
 RTNETLINK answers: File exists

^ it's not letting us use eth0 because device eth0 exists, but

 [ ~]# ip netns add test
 [ ~]# ip -netns test link add dev ethX netns test type dummy
 [ ~]# ip -netns test link property add dev ethX altname eth0
 [ ~]# ip -netns test link set dev ethX netns 1  

 [ ~]# ip li
 ...
 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
 ...
 5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
     link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
     altname eth0

and we have eth0 altname. So that's Bug#2.
Picking back up after the shutdown in old netns.

6. We call:

   kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
   dev_net_set(dev, net);
   kobject_uevent(&dev->dev.kobj, KOBJ_ADD);

Those are the calls you see in udev, recall that device core has
its own naming, so both of those calls will use the _old_ name.
REMOVE in the source netns and ADD in the destination netns.

The kobject calls were added by Serge in 4e66ae2ea371c, in 2012,
curiously it states:

    v2: also send KOBJ_ADD to new netns.  There will then be a
    _MOVE event from the device_rename() call, but that should
    be innocuous.

Which was based on this conversation:
https://lore.kernel.org/all/20121012031328.GA5472@sergelap/

7. Now we finally call:

   err = device_rename(&dev->dev, dev->name);

Which tells device core that the name has changed, and gives you 
the (second) MOVE event. This time with the correct name.

Which is what you're seeing, Bug#3, the ADD event should be after
the call to device_rename()...

Bug#1 and Bug#2 we can fix in networking. Bug#3 is a bit more tricky,
because what we want is a "silent" rename, without generating the MOVE.
This email is a bit long, so let me cut off here..

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
  2023-10-13 22:36 ` Jakub Kicinski
@ 2023-10-13 22:43   ` Jakub Kicinski
  2023-10-14  8:58     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-10-13 22:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Daniel Gröber, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, Richard Weinberger, Serge Hallyn, Eric W. Biederman

On Fri, 13 Oct 2023 15:36:05 -0700 Jakub Kicinski wrote:
>    kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
>    dev_net_set(dev, net);
>    kobject_uevent(&dev->dev.kobj, KOBJ_ADD);

Greg, we seem to have a problem in networking with combined
netns move and name change.

We have this code in __dev_change_net_namespace():

	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
	dev_net_set(dev, net);
	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);

	err = device_rename(&dev->dev, dev->name);

Is there any way we can only get the REMOVE (old name) and ADD
(new name) events, without the move? I.e. silence the rename? 

Daniel is reporting that with current code target netns sees an 
add of an interface with the old (duplicated) name. And then a rename.

Without a silent move best we can do is probably:

	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
	dev_net_set(dev, net);
	err = device_rename(&dev->dev, dev->name);
	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);

which will give us:

	MOVE new-name
	ADD new-name

in target netns, which, hm.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
  2023-10-13 22:43   ` Jakub Kicinski
@ 2023-10-14  8:58     ` Greg Kroah-Hartman
  2023-10-16 14:32       ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-14  8:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Daniel Gröber, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, Richard Weinberger, Serge Hallyn, Eric W. Biederman

On Fri, Oct 13, 2023 at 03:43:02PM -0700, Jakub Kicinski wrote:
> On Fri, 13 Oct 2023 15:36:05 -0700 Jakub Kicinski wrote:
> >    kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> >    dev_net_set(dev, net);
> >    kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> 
> Greg, we seem to have a problem in networking with combined
> netns move and name change.
> 
> We have this code in __dev_change_net_namespace():
> 
> 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> 	dev_net_set(dev, net);
> 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> 
> 	err = device_rename(&dev->dev, dev->name);
> 
> Is there any way we can only get the REMOVE (old name) and ADD
> (new name) events, without the move? I.e. silence the rename? 
> 
> Daniel is reporting that with current code target netns sees an 
> add of an interface with the old (duplicated) name. And then a rename.

But that's how this has always been, right?  What problems is this
causing?

> Without a silent move best we can do is probably:
> 
> 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> 	dev_net_set(dev, net);
> 	err = device_rename(&dev->dev, dev->name);
> 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> 
> which will give us:
> 
> 	MOVE new-name
> 	ADD new-name
> 
> in target netns, which, hm.

That wouldn't make much sense.

What is the real problem here?  What changed to cause a problem?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
  2023-10-14  8:58     ` Greg Kroah-Hartman
@ 2023-10-16 14:32       ` Jakub Kicinski
  2023-10-16 17:20         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-10-16 14:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Daniel Gröber, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, Richard Weinberger, Serge Hallyn, Eric W. Biederman

On Sat, 14 Oct 2023 10:58:20 +0200 Greg Kroah-Hartman wrote:
> On Fri, Oct 13, 2023 at 03:43:02PM -0700, Jakub Kicinski wrote:
> > On Fri, 13 Oct 2023 15:36:05 -0700 Jakub Kicinski wrote:  
> > >    kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> > >    dev_net_set(dev, net);
> > >    kobject_uevent(&dev->dev.kobj, KOBJ_ADD);  
> > 
> > Greg, we seem to have a problem in networking with combined
> > netns move and name change.
> > 
> > We have this code in __dev_change_net_namespace():
> > 
> > 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> > 	dev_net_set(dev, net);
> > 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> > 
> > 	err = device_rename(&dev->dev, dev->name);
> > 
> > Is there any way we can only get the REMOVE (old name) and ADD
> > (new name) events, without the move? I.e. silence the rename? 
> > 
> > Daniel is reporting that with current code target netns sees an 
> > add of an interface with the old (duplicated) name. And then a rename.  
> 
> But that's how this has always been, right?  What problems is this
> causing?

Original report is up-thread:
https://lore.kernel.org/all/20231010121003.x3yi6fihecewjy4e@House.clients.dxld.at/
With a link to a GH issue for lxc:
https://github.com/lxc/incus/issues/146

> > Without a silent move best we can do is probably:
> > 
> > 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> > 	dev_net_set(dev, net);
> > 	err = device_rename(&dev->dev, dev->name);
> > 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> > 
> > which will give us:
> > 
> > 	MOVE new-name
> > 	ADD new-name
> > 
> > in target netns, which, hm.  
> 
> That wouldn't make much sense.
> 
> What is the real problem here?  What changed to cause a problem?

IIUC what happens is:

 - systemd controls "real" eth0
 - we move a "to be renamed" eth0 from a container into main ns
 - we rename "to be renamed" eth0 to something else
 - seeing the rename of eth0 system thinks it's the "real" one
   that is being renamed, ergo there's no eth0 any more,
   so it shuts down its "unit" for eth0

I don't think anything changed. Sounds more like someone finally tried
to use this in anger.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
  2023-10-16 14:32       ` Jakub Kicinski
@ 2023-10-16 17:20         ` Greg Kroah-Hartman
  2023-10-16 19:03           ` Jakub Kicinski
  2023-10-17  1:20           ` Daniel Gröber
  0 siblings, 2 replies; 10+ messages in thread
From: Greg Kroah-Hartman @ 2023-10-16 17:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Daniel Gröber, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, Richard Weinberger, Serge Hallyn, Eric W. Biederman

On Mon, Oct 16, 2023 at 07:32:51AM -0700, Jakub Kicinski wrote:
> On Sat, 14 Oct 2023 10:58:20 +0200 Greg Kroah-Hartman wrote:
> > On Fri, Oct 13, 2023 at 03:43:02PM -0700, Jakub Kicinski wrote:
> > > On Fri, 13 Oct 2023 15:36:05 -0700 Jakub Kicinski wrote:  
> > > >    kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> > > >    dev_net_set(dev, net);
> > > >    kobject_uevent(&dev->dev.kobj, KOBJ_ADD);  
> > > 
> > > Greg, we seem to have a problem in networking with combined
> > > netns move and name change.
> > > 
> > > We have this code in __dev_change_net_namespace():
> > > 
> > > 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> > > 	dev_net_set(dev, net);
> > > 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> > > 
> > > 	err = device_rename(&dev->dev, dev->name);
> > > 
> > > Is there any way we can only get the REMOVE (old name) and ADD
> > > (new name) events, without the move? I.e. silence the rename? 
> > > 
> > > Daniel is reporting that with current code target netns sees an 
> > > add of an interface with the old (duplicated) name. And then a rename.  
> > 
> > But that's how this has always been, right?  What problems is this
> > causing?
> 
> Original report is up-thread:
> https://lore.kernel.org/all/20231010121003.x3yi6fihecewjy4e@House.clients.dxld.at/
> With a link to a GH issue for lxc:
> https://github.com/lxc/incus/issues/146
> 
> > > Without a silent move best we can do is probably:
> > > 
> > > 	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> > > 	dev_net_set(dev, net);
> > > 	err = device_rename(&dev->dev, dev->name);
> > > 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> > > 
> > > which will give us:
> > > 
> > > 	MOVE new-name
> > > 	ADD new-name
> > > 
> > > in target netns, which, hm.  
> > 
> > That wouldn't make much sense.
> > 
> > What is the real problem here?  What changed to cause a problem?
> 
> IIUC what happens is:
> 
>  - systemd controls "real" eth0
>  - we move a "to be renamed" eth0 from a container into main ns
>  - we rename "to be renamed" eth0 to something else
>  - seeing the rename of eth0 system thinks it's the "real" one
>    that is being renamed, ergo there's no eth0 any more,
>    so it shuts down its "unit" for eth0
> 
> I don't think anything changed. Sounds more like someone finally tried
> to use this in anger.

Then they get to keep the broken pieces that they created here.
"moving" a network connection to a container needs to either be added to
systemd if it is going to manage the network connections, or just stop
using systemd to handle the connection entirely as they want to do
something that systemd doesn't support.

I don't think your proposed change is going to do much here as you would
have multiple adds for the same device without any removes, which is
odd.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
  2023-10-16 17:20         ` Greg Kroah-Hartman
@ 2023-10-16 19:03           ` Jakub Kicinski
  2023-10-17  1:20           ` Daniel Gröber
  1 sibling, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2023-10-16 19:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Daniel Gröber, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, Richard Weinberger, Serge Hallyn, Eric W. Biederman

On Mon, 16 Oct 2023 19:20:26 +0200 Greg Kroah-Hartman wrote:
> > IIUC what happens is:
> > 
> >  - systemd controls "real" eth0
> >  - we move a "to be renamed" eth0 from a container into main ns
> >  - we rename "to be renamed" eth0 to something else
> >  - seeing the rename of eth0 system thinks it's the "real" one
> >    that is being renamed, ergo there's no eth0 any more,
> >    so it shuts down its "unit" for eth0
> > 
> > I don't think anything changed. Sounds more like someone finally tried
> > to use this in anger.  
> 
> Then they get to keep the broken pieces that they created here.
> "moving" a network connection to a container needs to either be added to
> systemd if it is going to manage the network connections, or just stop
> using systemd to handle the connection entirely as they want to do
> something that systemd doesn't support.
> 
> I don't think your proposed change is going to do much here as you would
> have multiple adds for the same device without any removes, which is
> odd.

We issue the ADD and REMOVE uevents explicitly.

If only we could have a form of device_rename() which does not generate
any uevent, everything should be perfectly sensible. We issue REMOVE in
the old namespace. Rejig the device including the silent rename. Issue
ADD for the new identity in the new namespace.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
  2023-10-16 17:20         ` Greg Kroah-Hartman
  2023-10-16 19:03           ` Jakub Kicinski
@ 2023-10-17  1:20           ` Daniel Gröber
  2023-10-17  1:45             ` Jakub Kicinski
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Gröber @ 2023-10-17  1:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, netdev,
	Richard Weinberger, Serge Hallyn, Eric W. Biederman

Hi Greg, Jackub,

On Mon, Oct 16, 2023 at 07:20:26PM +0200, Greg Kroah-Hartman wrote:
> > IIUC what happens is:
> > 
> >  - systemd controls "real" eth0
> >  - we move a "to be renamed" eth0 from a container into main ns
> >  - we rename "to be renamed" eth0 to something else
> >  - seeing the rename of eth0 system thinks it's the "real" one
> >    that is being renamed, ergo there's no eth0 any more,
> >    so it shuts down its "unit" for eth0
> > 
> > I don't think anything changed. Sounds more like someone finally tried
> > to use this in anger.
> 
> Then they get to keep the broken pieces that they created here.
> "moving" a network connection to a container needs to either be added to
> systemd if it is going to manage the network connections, or just stop
> using systemd to handle the connection entirely as they want to do
> something that systemd doesn't support.

I think you've not entirely understood the problem, let me try to explain
it better. The buggy scenario is this: we have a netdev "eth0" in main-ns
(call it eth0-main to avoid confusion), managed by a systemd service (with
BoundTo=*-eth0.device) but also a different netdev called eth0 in a
container (call it eth0-netns).

Now the container is being shut down so the container manager dutifully
moves eth0-netns back to the main-ns while simultaniously renaming it so as
to not cash with eth0-main.

Systemd in main-ns now sees a MOVE event (eth0 -> eth123) and handles this
as the removal of eth0-main. This stops the corresponding device unit and
subsequently the service managing eth0-main via BoundTo.

Problem is the real eth0-main wasn't actually moved, removed or
otherwise. eth0-netns being rename+moved just caused a nonsense MOVE event
to be sent to main-ns (and an incorrect ADD with the old name).

You can imagine how this bug might bring down an entire container
hypervisor's network uplink.



On Fri, Oct 13, 2023 at 03:36:05PM -0700, Jakub Kicinski wrote:
> FTR I don't see the move on current net-next, I see two adds 
> and one move.

Where's the second add? I see only one. Note: I only look at what happens
after the move+rename ip link set call.

>  [ ~]# ip -netns test link set dev eth0 netns 1 name eth123
>  KERNEL[135.598907] add      /devices/virtual/net/eth0 (net)
>  KERNEL[135.600425] move     /devices/virtual/net/eth123 (net)

Testing again now my output matches yours above. Odd, but I probably did a
kernel upgrade since then. Let me know if it matters and I'll go digging
thought apt logs.

Now I have uname -a:
Linux 6.5.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.6-1 (2023-10-07) x86_64

> I don't think it matters for the problem you're describing, tho.

I thought the two move lines I was seeing were one move event being printed
as two lines, with the old/new device names. If what's being printed is the
new name the bug root cause stays the same.

> > Looking at the code in __rtnl_newlink I see do_setlink first calls
> > __dev_change_net_namespace and then dev_change_name. My guess is the order
> > is just wrong here.
> 
> Interesting. My analysis is slightly different but only in low level
> aspects.. tell me if I'm wrong:

I only looked at the high level flow of the code.

> 1. we have tb[IFLA_IFNAME] set, so do_setlink() will populate ifname
> 
> 2. Because of #1, __dev_change_net_namespace() gets called with 
>    new name provide (pat = eth123)
> 
> 3. It will do netdev_name_in_use(), which returns true.

At this point we're still looking at the old netns, right?

> 4. It will then call dev_get_valid_name() which, (confusingly?) already
>    sets the new name on the netdevice itself.

I wouldn't have guessed a "get" function will actually set something, no,
but then again I know not of the kernel's arcane ways :)

> Picking back up after the shutdown in old netns.
> 
> 6. We call:
> 
>    kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
>    dev_net_set(dev, net);
>    kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> 
> Those are the calls you see in udev, recall that device core has
> its own naming, so both of those calls will use the _old_ name.
> REMOVE in the source netns and ADD in the destination netns.
>
> The kobject calls were added by Serge in 4e66ae2ea371c, in 2012,
> curiously it states:
> 
>     v2: also send KOBJ_ADD to new netns.  There will then be a
>     _MOVE event from the device_rename() call, but that should
>     be innocuous.
>
> Which was based on this conversation:
> https://lore.kernel.org/all/20121012031328.GA5472@sergelap/

Seems to me what main-ns might do with this MOVE event was simply not
considered in detail, innocuous it is not ;)

> 7. Now we finally call:
> 
>    err = device_rename(&dev->dev, dev->name);
> 
> Which tells device core that the name has changed, and gives you 
> the (second) MOVE event. This time with the correct name.

I don't like loose ends. Any idea why we only see the one MOVE now?

> Which is what you're seeing, Bug#3, the ADD event should be after
> the call to device_rename()...

All tracks AFAICT.

Thanks,
--Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
  2023-10-17  1:20           ` Daniel Gröber
@ 2023-10-17  1:45             ` Jakub Kicinski
  2023-11-11  2:13               ` [RFC net-next] net: do not send a MOVE event when netdev changes netns Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-10-17  1:45 UTC (permalink / raw)
  To: Daniel Gröber
  Cc: Greg Kroah-Hartman, David S. Miller, Eric Dumazet, Paolo Abeni,
	netdev, Richard Weinberger, Serge Hallyn, Eric W. Biederman

On Tue, 17 Oct 2023 03:20:24 +0200 Daniel Gröber wrote:
> > 1. we have tb[IFLA_IFNAME] set, so do_setlink() will populate ifname
> > 
> > 2. Because of #1, __dev_change_net_namespace() gets called with 
> >    new name provide (pat = eth123)
> > 
> > 3. It will do netdev_name_in_use(), which returns true.  
> 
> At this point we're still looking at the old netns, right?

New one, already. We got it from the caller and the caller
from rtnl_link_get_net_capable().

> > 7. Now we finally call:
> > 
> >    err = device_rename(&dev->dev, dev->name);
> > 
> > Which tells device core that the name has changed, and gives you 
> > the (second) MOVE event. This time with the correct name.  
> 
> I don't like loose ends. Any idea why we only see the one MOVE now?

No, annoyingly I haven't. But I do have a host on 5.19.

[ ~]# uname -r
5.19.13-200.fc36.x86_64
[ ~]# ip netns add test
[ ~]# udevadm monitor -k &
[ ~]# ip li add name eth0 type dummy
KERNEL[67.377539] add      /module/dummy (module)
KERNEL[67.381720] add      /devices/virtual/net/eth0 (net)
KERNEL[67.381822] add      /devices/virtual/net/eth0/queues/rx-0 (queues)
KERNEL[67.381854] add      /devices/virtual/net/eth0/queues/tx-0 (queues)
[ ~]# ip -netns test li add name eth0 type dummy
[ ~]# ip -netns test link set dev eth0 netns 1 name eth1
KERNEL[99.681956] add      /devices/virtual/net/eth0 (net)
KERNEL[99.681975] move     /devices/virtual/net/eth1 (net)

I don't see it on older kernels either :S

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC net-next] net: do not send a MOVE event when netdev changes netns
  2023-10-17  1:45             ` Jakub Kicinski
@ 2023-11-11  2:13               ` Jakub Kicinski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2023-11-11  2:13 UTC (permalink / raw)
  To: gregkh, dxld; +Cc: netdev, richard, pabeni, edumazet, Jakub Kicinski

Networking supports changing netdevice's netns and name
at the same time. This allows avoiding name conflicts
and having to rename the interface in multiple steps.
E.g. netns1={eth0, eth1}, netns2={eth1} - we want
to move netns1:eth1 to netns2 and call it eth0 there.
If we can't rename "in flight" we'd need to (1) rename
eth1 -> $tmp, (2) change netns, (3) rename $tmp -> eth0.

To rename the underlying struct device we have to call
device_rename(). The rename()'s MOVE event, however, doesn't
"belong" to either the old or the new namespace.
If there are conflicts on both sides it's actually impossible
to issue a real MOVE (old name -> new name) without confusing
user space. And Daniel reports that such confusions do in fact
happen for systemd, in real life.

Since we already issue explicit REMOVE and ADD events
manually - suppress the MOVE event completely. Move
the ADD after the rename, so that the REMOVE uses
the old name, and the ADD the new one.

If there is no rename this changes the picture as follows:

Before:

old ns | KERNEL[213.399289] remove   /devices/virtual/net/eth0 (net)
new ns | KERNEL[213.401302] add      /devices/virtual/net/eth0 (net)
new ns | KERNEL[213.401397] move     /devices/virtual/net/eth0 (net)

After:

old ns | KERNEL[266.774257] remove   /devices/virtual/net/eth0 (net)
new ns | KERNEL[266.774509] add      /devices/virtual/net/eth0 (net)

If there is a rename and a conflict (using the exact eth0/eth1
example explained above) we get this:

Before:

old ns | KERNEL[224.316833] remove   /devices/virtual/net/eth1 (net)
new ns | KERNEL[224.318551] add      /devices/virtual/net/eth1 (net)
new ns | KERNEL[224.319662] move     /devices/virtual/net/eth0 (net)

After:

old ns | KERNEL[333.033166] remove   /devices/virtual/net/eth1 (net)
new ns | KERNEL[333.035098] add      /devices/virtual/net/eth0 (net)

Note that "in flight" rename is only performed when needed.
If there is no conflict for old name in the target netns -
the rename will be performed separately by dev_change_name(),
as if the rename was a different command, and there will still
be a MOVE event for the rename:

Before:

old ns | KERNEL[194.416429] remove   /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.418809] add      /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.418869] move     /devices/virtual/net/eth0 (net)
new ns | KERNEL[194.420866] move     /devices/virtual/net/eth1 (net)

After:

old ns | KERNEL[71.917520] remove   /devices/virtual/net/eth0 (net)
new ns | KERNEL[71.919155] add      /devices/virtual/net/eth0 (net)
new ns | KERNEL[71.920729] move     /devices/virtual/net/eth1 (net)

If deleting the MOVE event breaks some user space we should insert
an explicit kobject_uevent(MOVE) after the ADD, like this:

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11192,6 +11192,12 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
 	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
 	netdev_adjacent_add_links(dev);

+	/* User space wants an explicit MOVE event, issue one unless
+	 * dev_change_name() will get called later and issue one.
+	 */
+	if (!pat || new_name[0])
+		kobject_uevent(&dev->dev.kobj, KOBJ_MOVE);
+
 	/* Adapt owner in case owning user namespace of target network
 	 * namespace is different from the original one.
 	 */

CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reported-by: Daniel Gröber <dxld@darkboxed.org>
Link: https://lore.kernel.org/all/20231010121003.x3yi6fihecewjy4e@House.clients.dxld.at/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
WDYT?
---
 net/core/dev.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 0d548431f3fa..a26200cbf687 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11179,17 +11179,19 @@ int __dev_change_net_namespace(struct net_device *dev, struct net *net,
 	dev_net_set(dev, net);
 	dev->ifindex = new_ifindex;
 
-	/* Send a netdev-add uevent to the new namespace */
-	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
-	netdev_adjacent_add_links(dev);
-
 	if (new_name[0]) /* Rename the netdev to prepared name */
 		strscpy(dev->name, new_name, IFNAMSIZ);
 
 	/* Fixup kobjects */
+	dev_set_uevent_suppress(&dev->dev, 1);
 	err = device_rename(&dev->dev, dev->name);
+	dev_set_uevent_suppress(&dev->dev, 0);
 	WARN_ON(err);
 
+	/* Send a netdev-add uevent to the new namespace */
+	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
+	netdev_adjacent_add_links(dev);
+
 	/* Adapt owner in case owning user namespace of target network
 	 * namespace is different from the original one.
 	 */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-11-11  2:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-10 12:10 [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change Daniel Gröber
2023-10-13 22:36 ` Jakub Kicinski
2023-10-13 22:43   ` Jakub Kicinski
2023-10-14  8:58     ` Greg Kroah-Hartman
2023-10-16 14:32       ` Jakub Kicinski
2023-10-16 17:20         ` Greg Kroah-Hartman
2023-10-16 19:03           ` Jakub Kicinski
2023-10-17  1:20           ` Daniel Gröber
2023-10-17  1:45             ` Jakub Kicinski
2023-11-11  2:13               ` [RFC net-next] net: do not send a MOVE event when netdev changes netns Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).