* [PATCH net] openvswitch: vport: fix race between linking and the device notifier
@ 2026-05-13 9:54 Ilya Maximets
2026-05-13 12:02 ` Aaron Conole
0 siblings, 1 reply; 3+ messages in thread
From: Ilya Maximets @ 2026-05-13 9:54 UTC (permalink / raw)
To: netdev
Cc: Aaron Conole, Eelco Chaudron, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, dev, linux-kernel,
Ilya Maximets
Sashiko reports that it is technically possible that we got the device
reference, but by the time we're linking it to the OVS datapath, it
may be already in the process of being deleted. In this case if the
notifier wins the race for RTNL, it will see that the device is not
yet in the OVS datapath (ovs_netdev_get_vport() will fail in the
dp_device_event()) and will do nothing. Then the ovs_netdev_link()
will take the RTNL and link the unregistering device to OVS datapath.
Eventually, netdev_wait_allrefs_any() will re-broadcast the event and
the device will be properly detached, but it will take at least a
second before that happens, so it's not something we should rely on.
Let's avoid linking the non-registered device in the first place.
Note: As per documentation, RTNL doesn't protect the reg_state, but
it actually does for all the state transitions we care about here,
so it should not be necessary to use READ_ONCE or taking the instance
lock. We can still do that, but we have a few more places even in
this file where the reg_state is accessed without those while under
RTNL, and many more places like this across the kernel code, so it
might make more sense to change all of them in a more centralized
fashion in the future, if necessary.
Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
net/openvswitch/vport-netdev.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index c42642075685d..de90d0541e172 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -83,6 +83,11 @@ struct vport *ovs_netdev_link(struct vport *vport, bool tunnel)
}
rtnl_lock();
+ if (vport->dev->reg_state != NETREG_REGISTERED) {
+ err = -ENODEV;
+ goto error_put_unlock;
+ }
+
err = netdev_master_upper_dev_link(vport->dev,
get_dpdev(vport->dp),
NULL, NULL, NULL);
--
2.53.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH net] openvswitch: vport: fix race between linking and the device notifier
2026-05-13 9:54 [PATCH net] openvswitch: vport: fix race between linking and the device notifier Ilya Maximets
@ 2026-05-13 12:02 ` Aaron Conole
2026-05-13 16:55 ` Ilya Maximets
0 siblings, 1 reply; 3+ messages in thread
From: Aaron Conole @ 2026-05-13 12:02 UTC (permalink / raw)
To: Ilya Maximets
Cc: netdev, Eelco Chaudron, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, dev, linux-kernel
Hi Ilya,
Ilya Maximets <i.maximets@ovn.org> writes:
> Sashiko reports that it is technically possible that we got the device
> reference, but by the time we're linking it to the OVS datapath, it
> may be already in the process of being deleted. In this case if the
> notifier wins the race for RTNL, it will see that the device is not
> yet in the OVS datapath (ovs_netdev_get_vport() will fail in the
> dp_device_event()) and will do nothing. Then the ovs_netdev_link()
> will take the RTNL and link the unregistering device to OVS datapath.
>
> Eventually, netdev_wait_allrefs_any() will re-broadcast the event and
> the device will be properly detached, but it will take at least a
> second before that happens, so it's not something we should rely on.
>
> Let's avoid linking the non-registered device in the first place.
>
> Note: As per documentation, RTNL doesn't protect the reg_state, but
> it actually does for all the state transitions we care about here,
> so it should not be necessary to use READ_ONCE or taking the instance
> lock. We can still do that, but we have a few more places even in
> this file where the reg_state is accessed without those while under
> RTNL, and many more places like this across the kernel code, so it
> might make more sense to change all of them in a more centralized
> fashion in the future, if necessary.
>
> Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
> net/openvswitch/vport-netdev.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
> index c42642075685d..de90d0541e172 100644
> --- a/net/openvswitch/vport-netdev.c
> +++ b/net/openvswitch/vport-netdev.c
> @@ -83,6 +83,11 @@ struct vport *ovs_netdev_link(struct vport *vport, bool tunnel)
> }
>
> rtnl_lock();
As noted in your commit, this shouldn't cause any kind of issues, since
the next netdev_wait_allrefs_any() will make sure things look correct to
the users again.
That said, I agree this is good to do to prevent some confusion going to
the users. I wonder if it makes sense to add a comment here noting
that. Otherwise, if I were just freshly reading through the code it
wouldn't follow (all the places where ovs_netdev_link get called are in
the 'create' path).
WDYT?
> + if (vport->dev->reg_state != NETREG_REGISTERED) {
> + err = -ENODEV;
> + goto error_put_unlock;
> + }
> +
> err = netdev_master_upper_dev_link(vport->dev,
> get_dpdev(vport->dp),
> NULL, NULL, NULL);
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net] openvswitch: vport: fix race between linking and the device notifier
2026-05-13 12:02 ` Aaron Conole
@ 2026-05-13 16:55 ` Ilya Maximets
0 siblings, 0 replies; 3+ messages in thread
From: Ilya Maximets @ 2026-05-13 16:55 UTC (permalink / raw)
To: Aaron Conole
Cc: i.maximets, netdev, Eelco Chaudron, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, dev, linux-kernel
On 5/13/26 2:02 PM, Aaron Conole wrote:
> Hi Ilya,
>
> Ilya Maximets <i.maximets@ovn.org> writes:
>
>> Sashiko reports that it is technically possible that we got the device
>> reference, but by the time we're linking it to the OVS datapath, it
>> may be already in the process of being deleted. In this case if the
>> notifier wins the race for RTNL, it will see that the device is not
>> yet in the OVS datapath (ovs_netdev_get_vport() will fail in the
>> dp_device_event()) and will do nothing. Then the ovs_netdev_link()
>> will take the RTNL and link the unregistering device to OVS datapath.
>>
>> Eventually, netdev_wait_allrefs_any() will re-broadcast the event and
>> the device will be properly detached, but it will take at least a
>> second before that happens, so it's not something we should rely on.
>>
>> Let's avoid linking the non-registered device in the first place.
>>
>> Note: As per documentation, RTNL doesn't protect the reg_state, but
>> it actually does for all the state transitions we care about here,
>> so it should not be necessary to use READ_ONCE or taking the instance
>> lock. We can still do that, but we have a few more places even in
>> this file where the reg_state is accessed without those while under
>> RTNL, and many more places like this across the kernel code, so it
>> might make more sense to change all of them in a more centralized
>> fashion in the future, if necessary.
>>
>> Fixes: ccb1352e76cf ("net: Add Open vSwitch kernel components.")
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>> net/openvswitch/vport-netdev.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
>> index c42642075685d..de90d0541e172 100644
>> --- a/net/openvswitch/vport-netdev.c
>> +++ b/net/openvswitch/vport-netdev.c
>> @@ -83,6 +83,11 @@ struct vport *ovs_netdev_link(struct vport *vport, bool tunnel)
>> }
>>
>> rtnl_lock();
>
> As noted in your commit, this shouldn't cause any kind of issues, since
> the next netdev_wait_allrefs_any() will make sure things look correct to
> the users again.
>
> That said, I agree this is good to do to prevent some confusion going to
> the users. I wonder if it makes sense to add a comment here noting
> that. Otherwise, if I were just freshly reading through the code it
> wouldn't follow (all the places where ovs_netdev_link get called are in
> the 'create' path).
>
> WDYT?
I'm not sure if the comment is necessary. We're not creating a device here
and it seems clear enough that we shouldn't be linking devices that are not
registered, even if there are no races. But I could add something like:
/* Do not link devices that are not registered to avoid a potential
* race with the NETDEV_UNREGISTER notification in dp_device_event().
*/
WDYT?
Best regards, Ilya Maximets.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-05-13 16:56 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-13 9:54 [PATCH net] openvswitch: vport: fix race between linking and the device notifier Ilya Maximets
2026-05-13 12:02 ` Aaron Conole
2026-05-13 16:55 ` Ilya Maximets
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox