* [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
@ 2026-03-10 18:16 Hyunwoo Kim
2026-03-10 22:57 ` Sabrina Dubroca
2026-03-16 9:57 ` Steffen Klassert
0 siblings, 2 replies; 9+ messages in thread
From: Hyunwoo Kim @ 2026-03-10 18:16 UTC (permalink / raw)
To: steffen.klassert, herbert, davem, edumazet, kuba, pabeni, horms
Cc: netdev, imv4bel
After cancel_delayed_work_sync() is called from
xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
states via __xfrm_state_delete(), which calls
xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
The following is a simple race scenario:

           cpu0                                     cpu1

cleanup_net() [Round 1]
  ops_undo_list()
    xfrm_net_exit()
      xfrm_nat_keepalive_net_fini()
        cancel_delayed_work_sync(nat_keepalive_work);
      xfrm_state_fini()
        xfrm_state_flush()
          xfrm_state_delete(x)
            __xfrm_state_delete(x)
              xfrm_nat_keepalive_state_updated(x)
                schedule_delayed_work(nat_keepalive_work);
  rcu_barrier();
  net_complete_free();
  net_passive_dec(net);
    llist_add(&net->defer_free_list, &defer_free_list);

cleanup_net() [Round 2]
  rcu_barrier();
  net_complete_free()
    kmem_cache_free(net_cachep, net);
                                                    nat_keepalive_work()
                                                      // on freed net

To prevent this, cancel_delayed_work_sync() is replaced with
disable_delayed_work_sync().
Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
net/xfrm/xfrm_nat_keepalive.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_nat_keepalive.c b/net/xfrm/xfrm_nat_keepalive.c
index ebf95d48e86c..1856beee0149 100644
--- a/net/xfrm/xfrm_nat_keepalive.c
+++ b/net/xfrm/xfrm_nat_keepalive.c
@@ -261,7 +261,7 @@ int __net_init xfrm_nat_keepalive_net_init(struct net *net)
 int xfrm_nat_keepalive_net_fini(struct net *net)
 {
-	cancel_delayed_work_sync(&net->xfrm.nat_keepalive_work);
+	disable_delayed_work_sync(&net->xfrm.nat_keepalive_work);
 	return 0;
 }
--
2.43.0
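
The difference between the two calls can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: struct model_dwork and the model_* functions are hypothetical stand-ins that only mimic the behaviour relied on here, namely that a disabled work item rejects later queueing attempts whereas a cancelled one can be queued again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a delayed work item; not kernel code. */
struct model_dwork {
	bool pending;		/* queued and waiting to run */
	int disable_depth;	/* > 0 means queueing attempts are rejected */
};

/* Models schedule_delayed_work(): returns true only if newly queued. */
static bool model_schedule(struct model_dwork *w)
{
	if (w->disable_depth > 0 || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Models cancel_delayed_work_sync(): drops a pending instance, nothing more. */
static bool model_cancel_sync(struct model_dwork *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Models disable_delayed_work_sync(): cancel, and also block re-queueing. */
static bool model_disable_sync(struct model_dwork *w)
{
	w->disable_depth++;
	return model_cancel_sync(w);
}

/*
 * Replays the teardown order from the patch description: the net_fini
 * step first, then the state flush that tries to schedule the work again.
 * Returns true if work is still pending when the netns would be freed.
 */
bool model_teardown_leaves_work_pending(bool use_disable)
{
	struct model_dwork w = { .pending = true, .disable_depth = 0 };

	if (use_disable)
		model_disable_sync(&w);
	else
		model_cancel_sync(&w);

	/* xfrm_state_fini() -> ... -> xfrm_nat_keepalive_state_updated() */
	model_schedule(&w);

	assert(w.disable_depth >= 0);
	return w.pending;
}
```

Calling model_teardown_leaves_work_pending(false) reports that the work is pending again after the flush, which corresponds to the use-after-free window in the scenario above. With true, the re-schedule is rejected and nothing is left pending.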
* Re: [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
2026-03-10 18:16 [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini() Hyunwoo Kim
@ 2026-03-10 22:57 ` Sabrina Dubroca
2026-03-11 0:14 ` Eyal Birger
2026-03-16 9:57 ` Steffen Klassert
1 sibling, 1 reply; 9+ messages in thread
From: Sabrina Dubroca @ 2026-03-10 22:57 UTC (permalink / raw)
To: Hyunwoo Kim, Eyal Birger
Cc: steffen.klassert, herbert, davem, edumazet, kuba, pabeni, horms,
netdev
Please also CC the author, and maybe additional contributors, of the
patch that introduced the problem you're fixing.
2026-03-11, 03:16:29 +0900, Hyunwoo Kim wrote:
> After cancel_delayed_work_sync() is called from
> xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
> states via __xfrm_state_delete(), which calls
> xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
Eyal, I'm wondering why __xfrm_state_delete() calls
xfrm_nat_keepalive_state_updated(). At this point the state has been
removed from the walk list so nat_keepalive_work() won't do
anything. Am I missing something?
> The following is a simple race scenario:
>
>            cpu0                                     cpu1
>
> cleanup_net() [Round 1]
>   ops_undo_list()
>     xfrm_net_exit()
>       xfrm_nat_keepalive_net_fini()
>         cancel_delayed_work_sync(nat_keepalive_work);
>       xfrm_state_fini()
>         xfrm_state_flush()
>           xfrm_state_delete(x)
>             __xfrm_state_delete(x)
>               xfrm_nat_keepalive_state_updated(x)
>                 schedule_delayed_work(nat_keepalive_work);
>   rcu_barrier();
>   net_complete_free();
>   net_passive_dec(net);
>     llist_add(&net->defer_free_list, &defer_free_list);
>
> cleanup_net() [Round 2]
>   rcu_barrier();
>   net_complete_free()
>     kmem_cache_free(net_cachep, net);
>                                                     nat_keepalive_work()
>                                                       // on freed net
>
> To prevent this, cancel_delayed_work_sync() is replaced with
> disable_delayed_work_sync().
>
> Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states")
> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> ---
> net/xfrm/xfrm_nat_keepalive.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/xfrm/xfrm_nat_keepalive.c b/net/xfrm/xfrm_nat_keepalive.c
> index ebf95d48e86c..1856beee0149 100644
> --- a/net/xfrm/xfrm_nat_keepalive.c
> +++ b/net/xfrm/xfrm_nat_keepalive.c
> @@ -261,7 +261,7 @@ int __net_init xfrm_nat_keepalive_net_init(struct net *net)
>
> int xfrm_nat_keepalive_net_fini(struct net *net)
> {
> - cancel_delayed_work_sync(&net->xfrm.nat_keepalive_work);
> + disable_delayed_work_sync(&net->xfrm.nat_keepalive_work);
> return 0;
> }
>
> --
> 2.43.0
>
>
--
Sabrina
* Re: [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
2026-03-10 22:57 ` Sabrina Dubroca
@ 2026-03-11 0:14 ` Eyal Birger
2026-03-11 9:26 ` Sabrina Dubroca
0 siblings, 1 reply; 9+ messages in thread
From: Eyal Birger @ 2026-03-11 0:14 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Hyunwoo Kim, steffen.klassert, herbert, davem, edumazet, kuba,
pabeni, horms, netdev
Hi,
On Tue, Mar 10, 2026 at 3:57 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> Please also CC the author, and maybe additional contributors, of the
> patch that introduced the problem you're fixing.
>
> 2026-03-11, 03:16:29 +0900, Hyunwoo Kim wrote:
> > After cancel_delayed_work_sync() is called from
> > xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
> > states via __xfrm_state_delete(), which calls
> > xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
>
> Eyal, I'm wondering why __xfrm_state_delete() calls
> xfrm_nat_keepalive_state_updated(). At this point the state has been
> removed from the walk list so nat_keepalive_work() won't do
> anything. Am I missing something?
I don't remember for sure, but I think the idea was to have the work
run "now" so that when deleting the last nat-keepalive state it
won't run in the future, and in general to refresh the interval and
not wait for the next iteration.
Eyal.
* Re: [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
2026-03-11 0:14 ` Eyal Birger
@ 2026-03-11 9:26 ` Sabrina Dubroca
2026-03-11 10:31 ` Hyunwoo Kim
2026-03-11 13:00 ` Eyal Birger
0 siblings, 2 replies; 9+ messages in thread
From: Sabrina Dubroca @ 2026-03-11 9:26 UTC (permalink / raw)
To: Eyal Birger
Cc: Hyunwoo Kim, steffen.klassert, herbert, davem, edumazet, kuba,
pabeni, horms, netdev
2026-03-10, 17:14:19 -0700, Eyal Birger wrote:
> Hi,
>
> On Tue, Mar 10, 2026 at 3:57 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
> >
> > Please also CC the author, and maybe additional contributors, of the
> > patch that introduced the problem you're fixing.
> >
> > 2026-03-11, 03:16:29 +0900, Hyunwoo Kim wrote:
> > > After cancel_delayed_work_sync() is called from
> > > xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
> > > states via __xfrm_state_delete(), which calls
> > > xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
> >
> > Eyal, I'm wondering why __xfrm_state_delete() calls
> > xfrm_nat_keepalive_state_updated(). At this point the state has been
> > removed from the walk list so nat_keepalive_work() won't do
> > anything. Am I missing something?
>
> I don't remember for sure, but I think the idea was to have the work
> run "now" so that when deleting the last nat-keepalive state it
> won't run in the future, and in general to refresh the interval and
> not wait for the next iteration.
>
> Eyal.
Ok. I thought about this, but I'm not seeing the benefit of doing
that. Assuming we're deleting just this one state, the next run will
process all the remaining states in the same way, whether it happens
right now or at the previously scheduled time:
- if the next run was needed by the peer we're deleting, not much
changes except that we're recomputing the delay earlier than
otherwise (right now instead of when deleted_state's interval runs
out)
- if some other state was the first to need a keepalive, we do a run
for nothing
So I think we could drop the xfrm_nat_keepalive_state_updated call
from __xfrm_state_delete.
@Hyunwoo here again I'm not opposed to s/cancel/disable/, it makes
sense to use disable_ in a "destruct" operation where we don't plan to
need the work again. But AFAICT this schedule_delayed_work isn't
really useful.
--
Sabrina
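
The two cases listed above can be checked with a toy model. This is a sketch under simplifying assumptions, not the kernel's nat_keepalive_work(): struct model_state, model_work_run() and model_survivor_send_time() are hypothetical names, each state carries a single absolute due time, and the kick from __xfrm_state_delete() is modelled as one immediate extra run of the work.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NO_RUN (-1L)

/* Hypothetical, simplified view of a state that needs NAT keepalives. */
struct model_state {
	long next_due;	/* absolute time of the next keepalive */
	long interval;	/* keepalive interval */
	bool deleted;	/* removed from the walk list */
	long last_sent;	/* time of the last keepalive, or MODEL_NO_RUN */
};

/* One pass of the work at time now: send what is due, return next run time. */
static long model_work_run(struct model_state *s, size_t n, long now)
{
	long next = MODEL_NO_RUN;
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i].deleted)
			continue;
		if (s[i].next_due <= now) {
			s[i].last_sent = now;
			s[i].next_due = now + s[i].interval;
		}
		if (next == MODEL_NO_RUN || s[i].next_due < next)
			next = s[i].next_due;
	}
	return next;
}

/*
 * Two states are due at t=10 and t=15, and the work is already scheduled
 * for t=10. One state is deleted at t=5, optionally followed by an
 * immediate extra run. Returns the time at which the surviving state
 * gets its first keepalive.
 */
long model_survivor_send_time(bool kick_on_delete, int deleted_idx)
{
	struct model_state s[2] = {
		{ .next_due = 10, .interval = 20, .deleted = false, .last_sent = MODEL_NO_RUN },
		{ .next_due = 15, .interval = 20, .deleted = false, .last_sent = MODEL_NO_RUN },
	};
	int survivor = 1 - deleted_idx;
	long timer = 10;	/* previously scheduled run */

	s[deleted_idx].deleted = true;
	if (kick_on_delete)
		timer = model_work_run(s, 2, 5);

	/* Let the timer fire until the survivor has sent a keepalive. */
	while (timer != MODEL_NO_RUN && s[survivor].last_sent == MODEL_NO_RUN)
		timer = model_work_run(s, 2, timer);

	return s[survivor].last_sent;
}
```

In this model the surviving state sends its keepalive at the same time with or without the extra run, for either choice of deleted state. The extra run only recomputes the delay earlier or does a pass that sends nothing.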
* Re: [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
2026-03-11 9:26 ` Sabrina Dubroca
@ 2026-03-11 10:31 ` Hyunwoo Kim
2026-03-11 13:00 ` Eyal Birger
1 sibling, 0 replies; 9+ messages in thread
From: Hyunwoo Kim @ 2026-03-11 10:31 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Eyal Birger, steffen.klassert, herbert, davem, edumazet, kuba,
pabeni, horms, netdev, imv4bel
On Wed, Mar 11, 2026 at 10:26:27AM +0100, Sabrina Dubroca wrote:
> 2026-03-10, 17:14:19 -0700, Eyal Birger wrote:
> > Hi,
> >
> > On Tue, Mar 10, 2026 at 3:57 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
> > >
> > > Please also CC the author, and maybe additional contributors, of the
> > > patch that introduced the problem you're fixing.
> > >
> > > 2026-03-11, 03:16:29 +0900, Hyunwoo Kim wrote:
> > > > After cancel_delayed_work_sync() is called from
> > > > xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
> > > > states via __xfrm_state_delete(), which calls
> > > > xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
> > >
> > > Eyal, I'm wondering why __xfrm_state_delete() calls
> > > xfrm_nat_keepalive_state_updated(). At this point the state has been
> > > removed from the walk list so nat_keepalive_work() won't do
> > > anything. Am I missing something?
> >
> > I don't remember for sure, but I think the idea was to have the work
> > run "now" so that when deleting the last nat-keepalive state it
> > won't run in the future, and in general to refresh the interval and
> > not wait for the next iteration.
> >
> > Eyal.
>
> Ok. I thought about this, but I'm not seeing the benefit of doing
> that. Assuming we're deleting just this one state, the next run will
> process all the remaining states in the same way, whether it happens
> right now or at the previously scheduled time:
>
> - if the next run was needed by the peer we're deleting, not much
> changes except that we're recomputing the delay earlier than
> otherwise (right now instead of when deleted_state's interval runs
> out)
>
> - if some other state was the first to need a keepalive, we do a run
> for nothing
>
> So I think we could drop the xfrm_nat_keepalive_state_updated call
> from __xfrm_state_delete.
>
>
> @Hyunwoo here again I'm not opposed to s/cancel/disable/, it makes
> sense to use disable_ in a "destruct" operation where we don't plan to
> need the work again. But AFAICT this schedule_delayed_work isn't
> really useful.
Thank you for the review.
Should I submit a v2 patch that removes the
xfrm_nat_keepalive_state_updated() call from __xfrm_state_delete()?
Best regards,
Hyunwoo Kim
* Re: [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
2026-03-11 9:26 ` Sabrina Dubroca
2026-03-11 10:31 ` Hyunwoo Kim
@ 2026-03-11 13:00 ` Eyal Birger
2026-03-11 13:27 ` Sabrina Dubroca
1 sibling, 1 reply; 9+ messages in thread
From: Eyal Birger @ 2026-03-11 13:00 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Hyunwoo Kim, steffen.klassert, herbert, davem, edumazet, kuba,
pabeni, horms, netdev
On Wed, Mar 11, 2026 at 2:26 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
>
> 2026-03-10, 17:14:19 -0700, Eyal Birger wrote:
> > Hi,
> >
> > On Tue, Mar 10, 2026 at 3:57 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
> > >
> > > Please also CC the author, and maybe additional contributors, of the
> > > patch that introduced the problem you're fixing.
> > >
> > > 2026-03-11, 03:16:29 +0900, Hyunwoo Kim wrote:
> > > > After cancel_delayed_work_sync() is called from
> > > > xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
> > > > states via __xfrm_state_delete(), which calls
> > > > xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
> > >
> > > Eyal, I'm wondering why __xfrm_state_delete() calls
> > > xfrm_nat_keepalive_state_updated(). At this point the state has been
> > > removed from the walk list so nat_keepalive_work() won't do
> > > anything. Am I missing something?
> >
> > I don't remember for sure, but I think the idea was to have the work
> > run "now" so that when deleting the last nat-keepalive state it
> > won't run in the future, and in general to refresh the interval and
> > not wait for the next iteration.
> >
> > Eyal.
>
> Ok. I thought about this, but I'm not seeing the benefit of doing
> that. Assuming we're deleting just this one state, the next run will
> process all the remaining states in the same way, whether it happens
> right now or at the previously scheduled time:
>
> - if the next run was needed by the peer we're deleting, not much
> changes except that we're recomputing the delay earlier than
> otherwise (right now instead of when deleted_state's interval runs
> out)
>
> - if some other state was the first to need a keepalive, we do a run
> for nothing
>
> So I think we could drop the xfrm_nat_keepalive_state_updated call
> from __xfrm_state_delete.
Right. I think at the time I didn't think it was "nice" to have the
work running long after all states have been deleted, and also it
decoupled the implementation a little - i.e. if instead of a global
work we had per-state timers (which one of the original versions had).
>
>
> @Hyunwoo here again I'm not opposed to s/cancel/disable/, it makes
> sense to use disable_ in a "destruct" operation where we don't plan to
> need the work again. But AFAICT this schedule_delayed_work isn't
> really useful.
I'm fine with both approaches.
Eyal.
* Re: [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
2026-03-11 13:00 ` Eyal Birger
@ 2026-03-11 13:27 ` Sabrina Dubroca
2026-03-11 13:40 ` Hyunwoo Kim
0 siblings, 1 reply; 9+ messages in thread
From: Sabrina Dubroca @ 2026-03-11 13:27 UTC (permalink / raw)
To: Eyal Birger
Cc: Hyunwoo Kim, steffen.klassert, herbert, davem, edumazet, kuba,
pabeni, horms, netdev
2026-03-11, 06:00:03 -0700, Eyal Birger wrote:
> On Wed, Mar 11, 2026 at 2:26 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
> >
> > 2026-03-10, 17:14:19 -0700, Eyal Birger wrote:
> > > Hi,
> > >
> > > On Tue, Mar 10, 2026 at 3:57 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
> > > >
> > > > Please also CC the author, and maybe additional contributors, of the
> > > > patch that introduced the problem you're fixing.
> > > >
> > > > 2026-03-11, 03:16:29 +0900, Hyunwoo Kim wrote:
> > > > > After cancel_delayed_work_sync() is called from
> > > > > xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
> > > > > states via __xfrm_state_delete(), which calls
> > > > > xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
> > > >
> > > > Eyal, I'm wondering why __xfrm_state_delete() calls
> > > > xfrm_nat_keepalive_state_updated(). At this point the state has been
> > > > removed from the walk list so nat_keepalive_work() won't do
> > > > anything. Am I missing something?
> > >
> > > I don't remember for sure, but I think the idea was to have the work
> > > run "now" so that when deleting the last nat-keepalive state it
> > > won't run in the future, and in general to refresh the interval and
> > > not wait for the next iteration.
> > >
> > > Eyal.
> >
> > Ok. I thought about this, but I'm not seeing the benefit of doing
> > that. Assuming we're deleting just this one state, the next run will
> > process all the remaining states in the same way, whether it happens
> > right now or at the previously scheduled time:
> >
> > - if the next run was needed by the peer we're deleting, not much
> > changes except that we're recomputing the delay earlier than
> > otherwise (right now instead of when deleted_state's interval runs
> > out)
> >
> > - if some other state was the first to need a keepalive, we do a run
> > for nothing
> >
> > So I think we could drop the xfrm_nat_keepalive_state_updated call
> > from __xfrm_state_delete.
>
> Right. I think at the time I didn't think it was "nice" to have the
> work running long after all states have been deleted, and also it
> decoupled the implementation a little - i.e. if instead of a global
> work we had per-state timers (which one of the original versions had).
Ok, I see. Thanks.
> > @Hyunwoo here again I'm not opposed to s/cancel/disable/, it makes
> > sense to use disable_ in a "destruct" operation where we don't plan to
> > need the work again. But AFAICT this schedule_delayed_work isn't
> > really useful.
>
> I'm fine with both approaches.
Ok, so:
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Hyunwoo, if you want to send a v2 that does disable_ + remove the
xfrm_nat_keepalive_state_updated call, or keep this one as-is (and
then we can remove the unnecessary xfrm_nat_keepalive_state_updated
call in -next later), that's ok for me either way. Thanks.
BTW, patches for "NETWORKING [IPSEC]" should be tagged as [PATCH ipsec]
or [PATCH ipsec-next] rather than [PATCH net] or [PATCH net-next].
They go through Steffen Klassert's ipsec/ipsec-next trees, and get
pulled into net/net-next a bit later.
--
Sabrina
* Re: [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
2026-03-11 13:27 ` Sabrina Dubroca
@ 2026-03-11 13:40 ` Hyunwoo Kim
0 siblings, 0 replies; 9+ messages in thread
From: Hyunwoo Kim @ 2026-03-11 13:40 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Eyal Birger, steffen.klassert, herbert, davem, edumazet, kuba,
pabeni, horms, netdev, imv4bel
On Wed, Mar 11, 2026 at 02:27:39PM +0100, Sabrina Dubroca wrote:
> 2026-03-11, 06:00:03 -0700, Eyal Birger wrote:
> > On Wed, Mar 11, 2026 at 2:26 AM Sabrina Dubroca <sd@queasysnail.net> wrote:
> > >
> > > 2026-03-10, 17:14:19 -0700, Eyal Birger wrote:
> > > > Hi,
> > > >
> > > > On Tue, Mar 10, 2026 at 3:57 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
> > > > >
> > > > > Please also CC the author, and maybe additional contributors, of the
> > > > > patch that introduced the problem you're fixing.
> > > > >
> > > > > 2026-03-11, 03:16:29 +0900, Hyunwoo Kim wrote:
> > > > > > After cancel_delayed_work_sync() is called from
> > > > > > xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
> > > > > > states via __xfrm_state_delete(), which calls
> > > > > > xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
> > > > >
> > > > > Eyal, I'm wondering why __xfrm_state_delete() calls
> > > > > xfrm_nat_keepalive_state_updated(). At this point the state has been
> > > > > removed from the walk list so nat_keepalive_work() won't do
> > > > > anything. Am I missing something?
> > > >
> > > > I don't remember for sure, but I think the idea was to have the work
> > > > run "now" so that when deleting the last nat-keepalive state it
> > > > won't run in the future, and in general to refresh the interval and
> > > > not wait for the next iteration.
> > > >
> > > > Eyal.
> > >
> > > Ok. I thought about this, but I'm not seeing the benefit of doing
> > > that. Assuming we're deleting just this one state, the next run will
> > > process all the remaining states in the same way, whether it happens
> > > right now or at the previously scheduled time:
> > >
> > > - if the next run was needed by the peer we're deleting, not much
> > > changes except that we're recomputing the delay earlier than
> > > otherwise (right now instead of when deleted_state's interval runs
> > > out)
> > >
> > > - if some other state was the first to need a keepalive, we do a run
> > > for nothing
> > >
> > > So I think we could drop the xfrm_nat_keepalive_state_updated call
> > > from __xfrm_state_delete.
> >
> > Right. I think at the time I didn't think it was "nice" to have the
> > work running long after all states have been deleted, and also it
> > decoupled the implementation a little - i.e. if instead of a global
> > work we had per-state timers (which one of the original versions had).
>
> Ok, I see. Thanks.
>
> > > @Hyunwoo here again I'm not opposed to s/cancel/disable/, it makes
> > > sense to use disable_ in a "destruct" operation where we don't plan to
> > > need the work again. But AFAICT this schedule_delayed_work isn't
> > > really useful.
> >
> > I'm fine with both approaches.
>
> Ok, so:
> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
>
> Hyunwoo, if you want to send a v2 that does disable_ + remove the
> xfrm_nat_keepalive_state_updated call, or keep this one as-is (and
> then we can remove the unnecessary xfrm_nat_keepalive_state_updated
> call in -next later), that's ok for me either way. Thanks.
Thank you for the review, Sabrina. I'll keep the current patch as-is for now.
>
>
> BTW, patches for "NETWORKING [IPSEC]" should be tagged as [PATCH ipsec]
> or [PATCH ipsec-next] rather than [PATCH net] or [PATCH net-next].
> They go through Steffen Klassert's ipsec/ipsec-next trees, and get
> pulled into net/net-next a bit later.
Got it, will keep that in mind for future submissions.
Best regards,
Hyunwoo Kim
* Re: [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
2026-03-10 18:16 [PATCH net] xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini() Hyunwoo Kim
2026-03-10 22:57 ` Sabrina Dubroca
@ 2026-03-16 9:57 ` Steffen Klassert
1 sibling, 0 replies; 9+ messages in thread
From: Steffen Klassert @ 2026-03-16 9:57 UTC (permalink / raw)
To: Hyunwoo Kim; +Cc: herbert, davem, edumazet, kuba, pabeni, horms, netdev
On Wed, Mar 11, 2026 at 03:16:29AM +0900, Hyunwoo Kim wrote:
> After cancel_delayed_work_sync() is called from
> xfrm_nat_keepalive_net_fini(), xfrm_state_fini() flushes remaining
> states via __xfrm_state_delete(), which calls
> xfrm_nat_keepalive_state_updated() to re-schedule nat_keepalive_work.
>
> The following is a simple race scenario:
>
>            cpu0                                     cpu1
>
> cleanup_net() [Round 1]
>   ops_undo_list()
>     xfrm_net_exit()
>       xfrm_nat_keepalive_net_fini()
>         cancel_delayed_work_sync(nat_keepalive_work);
>       xfrm_state_fini()
>         xfrm_state_flush()
>           xfrm_state_delete(x)
>             __xfrm_state_delete(x)
>               xfrm_nat_keepalive_state_updated(x)
>                 schedule_delayed_work(nat_keepalive_work);
>   rcu_barrier();
>   net_complete_free();
>   net_passive_dec(net);
>     llist_add(&net->defer_free_list, &defer_free_list);
>
> cleanup_net() [Round 2]
>   rcu_barrier();
>   net_complete_free()
>     kmem_cache_free(net_cachep, net);
>                                                     nat_keepalive_work()
>                                                       // on freed net
>
> To prevent this, cancel_delayed_work_sync() is replaced with
> disable_delayed_work_sync().
>
> Fixes: f531d13bdfe3 ("xfrm: support sending NAT keepalives in ESP in UDP states")
> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Applied, thanks a lot!