Netdev List
 help / color / mirror / Atom feed
* [PATCH net v2 1/1] net/sched: act_api: use RCU with deferred freeing for action lifecycle
@ 2026-05-31 16:08 Jamal Hadi Salim
  2026-05-31 19:14 ` Kyle Zeng
  0 siblings, 1 reply; 2+ messages in thread
From: Jamal Hadi Salim @ 2026-05-31 16:08 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, victor, kylebot, jiri,
	vladbu, linux-kernel, security, stable, Jamal Hadi Salim, syzbot

When NEWTFILTER and DELFILTER are run concurrently it is possible to create a
race with an associated action.

Let's illustrate with CPU0 running NEWTFILTER and CPU1 running DELFILTER:

 0: mutex_lock() <-- holds the idr lock
 0: rcu_read_lock()
 0: p = idr_find(idr, index) <-- action p is valid (RCU protects IDR)
 0: mutex_unlock() <-- releases the idr lock
 1: refcount_dec_and_mutex_lock() <-- refcnt 1->0, mutex held
 1: idr_remove(idr, index) <-- Action removed from IDR
 1: mutex_unlock() <-- mutex released allowing us to delete the action
 1: tcf_action_cleanup(p); kfree(p) <-- Kfrees p immediately, no deferral
 0: refcount_inc_not_zero(&p->tcfa_refcnt) <-- ouch, UAF p points to freed memory

This patch fixes the race condition between NEWTFILTER and DELFILTER by
adding struct rcu_head to tc_action used in the deferral and introducing a
call_rcu() in the delete path to defer the final kfree().

Note: this is a revert of commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu")
but also modernization/simplification to directly use kfree_rcu().

Let's illustrate the new restored code path:

 0: rcu_read_lock()
 1: refcount_dec_and_mutex_lock() <-- refcnt 1->0, mutex held
 1: idr_remove(idr, index)
 1: mutex_unlock()
 1: call_rcu(&p->tcfa_rcu, tcf_action_rcu_free) <-- defer kfree after grace period
 0: p = idr_find(idr, index)
 0: refcount_inc_not_zero(&p->tcfa_refcnt) <-- fails, refcnt already 0
 1: rcu_read_unlock() <-- release so freeing can run after grace period

After CPU1 calls idr_remove(), the object is no longer reachable through the IDR.
CPU0's subsequent idr_find() will return NULL, and even if it still held a
stale pointer, the immediate kfree() is now deferred until after the RCU grace
period, so no UAF can occur.

Fixes: d7fb60b9cafb ("net_sched: get rid of tcfa_rcu")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Kyle Zeng <kylebot@openai.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
v1->v2
1) syzbot ci found revealed that the cleanup code could be called under
   softirq/atomic context. Because tcf_action_cleanup() calls  __tcf_chain_put()
   which grabs a mutex (which can sleep). Fix is to move tcf_action_cleanup()
   to be invoked before call to rcu.
2) And while looking closely i noticed a bad comment on action code which
   claims that "actions are always connected to filters" ;-> Tracing back
   on git i noticed that infact we did have exactly this new change circa 2017
   that was erronously removed by commit d7fb60b9cafb. Well, at least I
   am certain what the "Fixes" tag should be now ;->
3) Patch is much much simpler now than it was in v1
---
 include/net/act_api.h | 1 +
 net/sched/act_api.c   | 7 +------
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index d11b79107930..fd2967ee08f7 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -45,6 +45,7 @@ struct tc_action {
 	struct tc_cookie	__rcu *user_cookie;
 	struct tcf_chain	__rcu *goto_chain;
 	u32			tcfa_flags;
+	struct rcu_head         tcfa_rcu;
 	u8			hw_stats;
 	u8			used_hw_stats;
 	bool			used_hw_stats_valid;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 332fd9695e54..04ea11c90e03 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -112,11 +112,6 @@ struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action,
 }
 EXPORT_SYMBOL(tcf_action_set_ctrlact);
 
-/* XXX: For standalone actions, we don't need a RCU grace period either, because
- * actions are always connected to filters and filters are already destroyed in
- * RCU callbacks, so after a RCU grace period actions are already disconnected
- * from filters. Readers later can not find us.
- */
 static void free_tcf(struct tc_action *p)
 {
 	struct tcf_chain *chain = rcu_dereference_protected(p->goto_chain, 1);
@@ -129,7 +124,7 @@ static void free_tcf(struct tc_action *p)
 	if (chain)
 		tcf_chain_put_by_act(chain);
 
-	kfree(p);
+	kfree_rcu(p, tcfa_rcu);
 }
 
 static void offload_action_hw_count_set(struct tc_action *act,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH net v2 1/1] net/sched: act_api: use RCU with deferred freeing for action lifecycle
  2026-05-31 16:08 [PATCH net v2 1/1] net/sched: act_api: use RCU with deferred freeing for action lifecycle Jamal Hadi Salim
@ 2026-05-31 19:14 ` Kyle Zeng
  0 siblings, 0 replies; 2+ messages in thread
From: Kyle Zeng @ 2026-05-31 19:14 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: netdev, davem, edumazet, kuba, pabeni, horms, victor, jiri,
	vladbu, linux-kernel, security, stable, syzbot

On Sun, May 31, 2026 at 12:08:12PM -0400, Jamal Hadi Salim wrote:
> When NEWTFILTER and DELFILTER are run concurrently it is possible to create a
> race with an associated action.
> 
> Let's illustrate with CPU0 running NEWTFILTER and CPU1 running DELFILTER:
> 
>  0: mutex_lock() <-- holds the idr lock
>  0: rcu_read_lock()
>  0: p = idr_find(idr, index) <-- action p is valid (RCU protects IDR)
>  0: mutex_unlock() <-- releases the idr lock
>  1: refcount_dec_and_mutex_lock() <-- refcnt 1->0, mutex held
>  1: idr_remove(idr, index) <-- Action removed from IDR
>  1: mutex_unlock() <-- mutex released allowing us to delete the action
>  1: tcf_action_cleanup(p); kfree(p) <-- Kfrees p immediately, no deferral
>  0: refcount_inc_not_zero(&p->tcfa_refcnt) <-- ouch, UAF p points to freed memory
> 
> This patch fixes the race condition between NEWTFILTER and DELFILTER by
> adding struct rcu_head to tc_action used in the deferral and introducing a
> call_rcu() in the delete path to defer the final kfree().
> 
> Note: this is a revert of commit d7fb60b9cafb ("net_sched: get rid of tcfa_rcu")
> but also modernization/simplification to directly use kfree_rcu().
> 
> Let's illustrate the new restored code path:
> 
>  0: rcu_read_lock()
>  1: refcount_dec_and_mutex_lock() <-- refcnt 1->0, mutex held
>  1: idr_remove(idr, index)
>  1: mutex_unlock()
>  1: call_rcu(&p->tcfa_rcu, tcf_action_rcu_free) <-- defer kfree after grace period
>  0: p = idr_find(idr, index)
>  0: refcount_inc_not_zero(&p->tcfa_refcnt) <-- fails, refcnt already 0
>  1: rcu_read_unlock() <-- release so freeing can run after grace period
> 
> After CPU1 calls idr_remove(), the object is no longer reachable through the IDR.
> CPU0's subsequent idr_find() will return NULL, and even if it still held a
> stale pointer, the immediate kfree() is now deferred until after the RCU grace
> period, so no UAF can occur.
> 
> Fixes: d7fb60b9cafb ("net_sched: get rid of tcfa_rcu")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Reported-by: Kyle Zeng <kylebot@openai.com>
> Tested-by: Victor Nogueira <victor@mojatatu.com>
> Tested-by: syzbot@syzkaller.appspotmail.com
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

Tested-by: Kyle Zeng <kylebot@openai.com>

Best,
Kyle

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2026-05-31 19:14 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-31 16:08 [PATCH net v2 1/1] net/sched: act_api: use RCU with deferred freeing for action lifecycle Jamal Hadi Salim
2026-05-31 19:14 ` Kyle Zeng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox