* [PATCH] batman-adv: fix DAT purge use-after-free on teardown
@ 2026-05-26 6:48 Soowan Park
2026-05-26 7:30 ` Sven Eckelmann
0 siblings, 1 reply; 2+ messages in thread
From: Soowan Park @ 2026-05-26 6:48 UTC (permalink / raw)
To: b.a.t.m.a.n, netdev
Cc: marek.lindner, sw, antonio, sven, davem, edumazet, kuba, pabeni,
horms, Soowan Park
batadv_dat_purge() is a periodic delayed work that re-queues itself via
batadv_dat_start_timer() at the end of each run. When the mesh interface
is torn down, batadv_dat_free() calls cancel_delayed_work_sync() to stop
the purge work before freeing the DAT hash table.
However, cancel_delayed_work_sync() leaves the work in an enabled state.
If the purge work is currently executing and re-queues itself before
cancel_delayed_work_sync() internally marks it for cancellation, the
newly queued work escapes cancellation. This re-queued work then fires
after batadv_dat_hash_free() has already freed the hash table but before
the pointer is set to NULL, causing __batadv_dat_purge() to operate on a
dangling pointer that passes the NULL check, and spin indefinitely on a
spinlock in freed memory.
Replace cancel_delayed_work_sync() with disable_delayed_work_sync(),
which additionally disables the work so that any concurrent
queue_delayed_work() call from the running batadv_dat_purge() is
silently rejected. This guarantees no re-queued work can fire after
disable_delayed_work_sync() returns.
Found by syzkaller.
Fixes: 2f1dfbe18507 ("batman-adv: Distributed ARP Table - implement local storage")
Signed-off-by: Soowan Park <swan2718@snu.ac.kr>
---
net/batman-adv/distributed-arp-table.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index 0a8bd95e2f99..9dce7da4282c 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -837,7 +837,7 @@ void batadv_dat_free(struct batadv_priv *bat_priv)
batadv_tvlv_container_unregister(bat_priv, BATADV_TVLV_DAT, 1);
batadv_tvlv_handler_unregister(bat_priv, BATADV_TVLV_DAT, 1);
- cancel_delayed_work_sync(&bat_priv->dat.work);
+ disable_delayed_work_sync(&bat_priv->dat.work);
batadv_dat_hash_free(bat_priv);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 2+ messages in thread
* Re: [PATCH] batman-adv: fix DAT purge use-after-free on teardown
2026-05-26 6:48 [PATCH] batman-adv: fix DAT purge use-after-free on teardown Soowan Park
@ 2026-05-26 7:30 ` Sven Eckelmann
0 siblings, 0 replies; 2+ messages in thread
From: Sven Eckelmann @ 2026-05-26 7:30 UTC (permalink / raw)
To: b.a.t.m.a.n, netdev, Soowan Park
Cc: marek.lindner, sw, antonio, davem, edumazet, kuba, pabeni, horms,
Soowan Park, Tejun Heo
[-- Attachment #1: Type: text/plain, Size: 2521 bytes --]
On Tuesday, 26 May 2026 08:48:35 CEST Soowan Park wrote:
> batadv_dat_purge() is a periodic delayed work that re-queues itself via
> batadv_dat_start_timer() at the end of each run. When the mesh interface
> is torn down, batadv_dat_free() calls cancel_delayed_work_sync() to stop
> the purge work before freeing the DAT hash table.
>
> However, cancel_delayed_work_sync() leaves the work in an enabled state.
> If the purge work is currently executing and re-queues itself before
> cancel_delayed_work_sync() internally marks it for cancellation, the
> newly queued work escapes cancellation. This re-queued work then fires
> after batadv_dat_hash_free() has already freed the hash table but before
> the pointer is set to NULL, causing __batadv_dat_purge() to operate on a
> dangling pointer that passes the NULL check, and spin indefinitely on a
> spinlock in freed memory.
You are talking about a re-queue by batadv_dat_start_timer(). This only
happens when the DAT gets initialized or via the worker (batadv_dat_purge)
itself. How can the worker which is cancelled (with sync) re-queue itself?
Isn't this breaking a guarantee of cancel_delayed_work_sync() or did I
misunderstand this part of the documentation?
"This is cancel_work_sync() for delayed works." [1]
"Cancel work and wait for its execution to finish. This function can be used
even if the work re-queues itself or migrates to another workqueue. On return
from this function, work is guaranteed to be not pending or executing on any
CPU as long as there aren’t racing enqueues." [2]
(the part "This function can be used even if the work re-queues itself" is
the important part here).
> Replace cancel_delayed_work_sync() with disable_delayed_work_sync(),
> which additionally disables the work so that any concurrent
> queue_delayed_work() call from the running batadv_dat_purge() is
> silently rejected. This guarantees no re-queued work can fire after
> disable_delayed_work_sync() returns.
I have no problem with using "disabled_*" everywhere (I even have a pending
patchset to use it - just to avoid problems with code changes in the future).
But since this is a fix which I don't get in the moment, I would like to
understand the problem you are describing better before applying it.
Regards,
Sven
[1] https://www.kernel.org/doc/html/v7.0/core-api/workqueue.html#c.cancel_delayed_work_sync
[2] https://www.kernel.org/doc/html/v7.0/core-api/workqueue.html#c.cancel_work_sync
[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-05-26 7:39 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-26 6:48 [PATCH] batman-adv: fix DAT purge use-after-free on teardown Soowan Park
2026-05-26 7:30 ` Sven Eckelmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox