* [PATCH net] ovpn: Fix race condition in ovpn_dellink()
@ 2026-03-02 10:02 Hyunwoo Kim
2026-03-03 20:15 ` Antonio Quartulli
0 siblings, 1 reply; 3+ messages in thread
From: Hyunwoo Kim @ 2026-03-02 10:02 UTC (permalink / raw)
To: antonio, sd, andrew+netdev, davem, edumazet, kuba, pabeni; +Cc: netdev, imv4bel
When ovpn_dellink() is called, it invokes
cancel_delayed_work_sync() to stop keepalive_work before freeing
the device.
However, ovpn_nl_peer_new_doit() runs without any lock shared
with the RTNL path, so keepalive_work can be scheduled after
cancel_delayed_work_sync() returns.
The following is a simple race scenario:
cpu0 cpu1
ovpn_dellink(dev)
cancel_delayed_work_sync(keepalive_work)
ovpn_nl_peer_new_doit()
ovpn_nl_peer_modify()
ovpn_peer_keepalive_set()
mod_delayed_work(keepalive_work)
To prevent this race condition, cancel_delayed_work_sync() is
replaced with disable_delayed_work_sync().
Fixes: 3ecfd9349f40 ("ovpn: implement keepalive mechanism")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
drivers/net/ovpn/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c
index 2e0420febda0..ada73dd35ba4 100644
--- a/drivers/net/ovpn/main.c
+++ b/drivers/net/ovpn/main.c
@@ -212,7 +212,7 @@ static void ovpn_dellink(struct net_device *dev, struct list_head *head)
{
struct ovpn_priv *ovpn = netdev_priv(dev);
- cancel_delayed_work_sync(&ovpn->keepalive_work);
+ disable_delayed_work_sync(&ovpn->keepalive_work);
ovpn_peers_free(ovpn, NULL, OVPN_DEL_PEER_REASON_TEARDOWN);
unregister_netdevice_queue(dev, head);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH net] ovpn: Fix race condition in ovpn_dellink() 2026-03-02 10:02 [PATCH net] ovpn: Fix race condition in ovpn_dellink() Hyunwoo Kim @ 2026-03-03 20:15 ` Antonio Quartulli 2026-03-06 11:03 ` Sabrina Dubroca 0 siblings, 1 reply; 3+ messages in thread From: Antonio Quartulli @ 2026-03-03 20:15 UTC (permalink / raw) To: Hyunwoo Kim, sd; +Cc: pabeni, netdev, davem, andrew+netdev, edumazet, kuba Hi, On 02/03/2026 11:02, Hyunwoo Kim wrote: > When ovpn_dellink() is called, it invokes > cancel_delayed_work_sync() to stop keepalive_work before freeing > the device. > However, ovpn_nl_peer_new_doit() runs without any lock shared > with the RTNL path, so keepalive_work can be scheduled after > cancel_delayed_work_sync() returns. > > The following is a simple race scenario: > > cpu0 cpu1 > > ovpn_dellink(dev) > cancel_delayed_work_sync(keepalive_work) > ovpn_nl_peer_new_doit() > ovpn_nl_peer_modify() > ovpn_peer_keepalive_set() > mod_delayed_work(keepalive_work) > > To prevent this race condition, cancel_delayed_work_sync() is > replaced with disable_delayed_work_sync(). I was about to agree on your fix, however, it seems there is a larger issue here and this patch is just addressing one symptom. If you are truly able to execute the whole ovpn_nl_peer_new_doit() after cancel_delayed_work_sync() and before the netdev refcounter reaches 0, I think we may get stuck in the "wait for device to be freed..." loop. That's because the ovpn_peer_new() will acquire a reference to the netdev, preventing it to be fully released. Sabrina, do you have any thought on this? It seems as if there is no safe guard against adding peers while destroying the interface. Regards, -- Antonio Quartulli OpenVPN Inc. ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH net] ovpn: Fix race condition in ovpn_dellink() 2026-03-03 20:15 ` Antonio Quartulli @ 2026-03-06 11:03 ` Sabrina Dubroca 0 siblings, 0 replies; 3+ messages in thread From: Sabrina Dubroca @ 2026-03-06 11:03 UTC (permalink / raw) To: Antonio Quartulli Cc: Hyunwoo Kim, pabeni, netdev, davem, andrew+netdev, edumazet, kuba Sorry Antonio, things are a bit busy over here, I'm finally getting to this. 2026-03-03, 21:15:59 +0100, Antonio Quartulli wrote: > Hi, > > On 02/03/2026 11:02, Hyunwoo Kim wrote: > > When ovpn_dellink() is called, it invokes > > cancel_delayed_work_sync() to stop keepalive_work before freeing > > the device. > > However, ovpn_nl_peer_new_doit() runs without any lock shared > > with the RTNL path, so keepalive_work can be scheduled after > > cancel_delayed_work_sync() returns. > > > > The following is a simple race scenario: > > > > cpu0 cpu1 > > > > ovpn_dellink(dev) > > cancel_delayed_work_sync(keepalive_work) > > ovpn_nl_peer_new_doit() > > ovpn_nl_peer_modify() > > ovpn_peer_keepalive_set() > > mod_delayed_work(keepalive_work) > > > > To prevent this race condition, cancel_delayed_work_sync() is > > replaced with disable_delayed_work_sync(). > > I was about to agree on your fix, however, it seems there is a larger issue > here and this patch is just addressing one symptom. > > If you are truly able to execute the whole ovpn_nl_peer_new_doit() after > cancel_delayed_work_sync() and before the netdev refcounter reaches 0, I > think we may get stuck in the "wait for device to be freed..." loop. > > That's because the ovpn_peer_new() will acquire a reference to the netdev, > preventing it to be fully released. > > > Sabrina, do you have any thought on this? > It seems as if there is no safe guard against adding peers while destroying > the interface. Feel free to add: diff --git a/drivers/net/ovpn/main.c b/drivers/net/ovpn/main.c index 2e0420febda0..3bf6ea5f7208 100644 --- a/drivers/net/ovpn/main.c +++ b/drivers/net/ovpn/main.c @@ -214,6 +214,10 @@ static void ovpn_dellink(struct net_device *dev, struct list_head *head) cancel_delayed_work_sync(&ovpn->keepalive_work); ovpn_peers_free(ovpn, NULL, OVPN_DEL_PEER_REASON_TEARDOWN); + + pr_warn("sleeping\n"); + msleep(10000); + unregister_netdevice_queue(dev, head); } to help reproduce the race, I usually do something like that if it's possible to sleep in that context. I think you're right, the newly-added peer will miss ovpn_peers_free. Can we simply move cancel_delayed_work_sync + ovpn_peers_free to ndo_uninit and not provide a dellink? At ndo_uninit, the netdevice is no longer listed so PEER_NEW won't find it, but delaying the release of the reference on the netdev via peer cleanup should still be ok (unregister_netdevice_many_notify calls unlist_netdevice and then ndo_uninit, then we add it to the todo_list and netdev_run_todo processes that list and does netdev_wait_allrefs_any). -- Sabrina ^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-03-06 11:03 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-02 10:02 [PATCH net] ovpn: Fix race condition in ovpn_dellink() Hyunwoo Kim 2026-03-03 20:15 ` Antonio Quartulli 2026-03-06 11:03 ` Sabrina Dubroca
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox