public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h
@ 2026-03-07 13:36 Eric Dumazet
  2026-03-07 21:44 ` Jamal Hadi Salim
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Eric Dumazet @ 2026-03-07 13:36 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Eric Dumazet, Victor Nogueira, Pedro Tammela

Some modern cpus do not enable the X86_FEATURE_RETPOLINE feature,
even though a direct call can still be beneficial.

Even when IBRS is present, an indirect call is more expensive
than a direct one:

Direct Calls:
  Compilers can perform powerful optimizations like inlining,
  where the function body is directly inserted at the call site,
  eliminating call overhead entirely.

Indirect Calls:
  Inlining is much harder, if not impossible, because the compiler
  doesn't know the target function at compile time.
  Techniques like Indirect Call Promotion can help by using
  profile-guided optimization to turn frequently taken indirect calls
  into conditional direct calls, but they still add complexity
  and potential overhead compared to a truly direct call.
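[Editorial illustration: a minimal userspace C sketch of the conditional direct call pattern the tc wrapper uses. This is not the kernel code; `builtin_classify` and `classify` are invented names standing in for a builtin like cls_bpf_classify() and the wrapper.]

```c
#include <assert.h>

/* Hypothetical builtin classifier, standing in for cls_bpf_classify(). */
static int builtin_classify(int x)
{
	return x * 2;
}

/* Wrapper: compare the function pointer against the known builtin and
 * call it directly, so the compiler can inline it at the call site;
 * otherwise fall back to a true indirect call. */
static int classify(int (*fn)(int), int x)
{
	if (fn == builtin_classify)
		return builtin_classify(x);	/* direct call, inlinable */
	return fn(x);				/* indirect call */
}
```

With a single builtin, this is one predictable compare that promotes the common case to a direct call the compiler can see through.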

In this patch, I split tc_skip_wrapper into two different
static keys, one for tc_act() (tc_skip_wrapper_act)
and one for tc_classify() (tc_skip_wrapper_cls).

Then I enable tc_skip_wrapper_cls only if the count
of builtin classifiers is above one.

I enable tc_skip_wrapper_act only if the count of builtin
actions is above one.

In our production kernels, we only have CONFIG_NET_CLS_BPF=y
and CONFIG_NET_ACT_BPF=y. Others are modules or are not compiled in.

Tested on AMD Turin cpus: cls_bpf_classify() cost went
from 1% down to 0.18%, and FDO will be able to inline
it in tcf_classify() for further gains.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
Cc: Victor Nogueira <victor@mojatatu.com>
Cc: Pedro Tammela <pctammela@mojatatu.ai>
---
 include/net/tc_wrapper.h | 47 +++++++++++++++++++++++++++++++++++-----
 net/sched/sch_api.c      |  3 ++-
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/include/net/tc_wrapper.h b/include/net/tc_wrapper.h
index ffe58a02537c3ae7979e37ace09373a443d81cc9..4ebb053bb0dd6f5f1af572fe62404eacfd0885ed 100644
--- a/include/net/tc_wrapper.h
+++ b/include/net/tc_wrapper.h
@@ -12,7 +12,8 @@
 
 #define TC_INDIRECT_SCOPE
 
-extern struct static_key_false tc_skip_wrapper;
+extern struct static_key_false tc_skip_wrapper_act;
+extern struct static_key_false tc_skip_wrapper_cls;
 
 /* TC Actions */
 #ifdef CONFIG_NET_CLS_ACT
@@ -46,7 +47,7 @@ TC_INDIRECT_ACTION_DECLARE(tunnel_key_act);
 static inline int tc_act(struct sk_buff *skb, const struct tc_action *a,
 			   struct tcf_result *res)
 {
-	if (static_branch_likely(&tc_skip_wrapper))
+	if (static_branch_likely(&tc_skip_wrapper_act))
 		goto skip;
 
 #if IS_BUILTIN(CONFIG_NET_ACT_GACT)
@@ -153,7 +154,7 @@ TC_INDIRECT_FILTER_DECLARE(u32_classify);
 static inline int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 				struct tcf_result *res)
 {
-	if (static_branch_likely(&tc_skip_wrapper))
+	if (static_branch_likely(&tc_skip_wrapper_cls))
 		goto skip;
 
 #if IS_BUILTIN(CONFIG_NET_CLS_BPF)
@@ -202,8 +203,44 @@ static inline int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 static inline void tc_wrapper_init(void)
 {
 #ifdef CONFIG_X86
-	if (!cpu_feature_enabled(X86_FEATURE_RETPOLINE))
-		static_branch_enable(&tc_skip_wrapper);
+	int cnt_cls = IS_BUILTIN(CONFIG_NET_CLS_BPF) +
+		IS_BUILTIN(CONFIG_NET_CLS_U32)  +
+		IS_BUILTIN(CONFIG_NET_CLS_FLOWER) +
+		IS_BUILTIN(CONFIG_NET_CLS_FW) +
+		IS_BUILTIN(CONFIG_NET_CLS_MATCHALL) +
+		IS_BUILTIN(CONFIG_NET_CLS_BASIC) +
+		IS_BUILTIN(CONFIG_NET_CLS_CGROUP) +
+		IS_BUILTIN(CONFIG_NET_CLS_FLOW) +
+		IS_BUILTIN(CONFIG_NET_CLS_ROUTE4);
+
+	int cnt_act = IS_BUILTIN(CONFIG_NET_ACT_GACT) +
+		IS_BUILTIN(CONFIG_NET_ACT_MIRRED) +
+		IS_BUILTIN(CONFIG_NET_ACT_PEDIT) +
+		IS_BUILTIN(CONFIG_NET_ACT_SKBEDIT) +
+		IS_BUILTIN(CONFIG_NET_ACT_SKBMOD) +
+		IS_BUILTIN(CONFIG_NET_ACT_POLICE) +
+		IS_BUILTIN(CONFIG_NET_ACT_BPF) +
+		IS_BUILTIN(CONFIG_NET_ACT_CONNMARK) +
+		IS_BUILTIN(CONFIG_NET_ACT_CSUM) +
+		IS_BUILTIN(CONFIG_NET_ACT_CT) +
+		IS_BUILTIN(CONFIG_NET_ACT_CTINFO) +
+		IS_BUILTIN(CONFIG_NET_ACT_GATE) +
+		IS_BUILTIN(CONFIG_NET_ACT_MPLS) +
+		IS_BUILTIN(CONFIG_NET_ACT_NAT) +
+		IS_BUILTIN(CONFIG_NET_ACT_TUNNEL_KEY) +
+		IS_BUILTIN(CONFIG_NET_ACT_VLAN) +
+		IS_BUILTIN(CONFIG_NET_ACT_IFE) +
+		IS_BUILTIN(CONFIG_NET_ACT_SIMP) +
+		IS_BUILTIN(CONFIG_NET_ACT_SAMPLE);
+
+	if (cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+		return;
+
+	if (cnt_cls > 1)
+		static_branch_enable(&tc_skip_wrapper_cls);
+
+	if (cnt_act > 1)
+		static_branch_enable(&tc_skip_wrapper_act);
 #endif
 }
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index cc43e3f7574fae203989f5c28b4934f0720e64c2..61a9dc219a18fb34d10893fe45a0030d3c23f5d0 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -2479,7 +2479,8 @@ static struct pernet_operations psched_net_ops = {
 };
 
 #if IS_ENABLED(CONFIG_MITIGATION_RETPOLINE)
-DEFINE_STATIC_KEY_FALSE(tc_skip_wrapper);
+DEFINE_STATIC_KEY_FALSE(tc_skip_wrapper_act);
+DEFINE_STATIC_KEY_FALSE(tc_skip_wrapper_cls);
 #endif
 
 static const struct rtnl_msg_handler psched_rtnl_msg_handlers[] __initconst = {
-- 
2.53.0.473.g4a7958ca14-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h
  2026-03-07 13:36 [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h Eric Dumazet
@ 2026-03-07 21:44 ` Jamal Hadi Salim
  2026-03-07 23:53   ` Eric Dumazet
  2026-03-09 17:37 ` Pedro Tammela
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Jamal Hadi Salim @ 2026-03-07 21:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jiri Pirko, netdev, eric.dumazet, Victor Nogueira, Pedro Tammela

On Sat, Mar 7, 2026 at 8:36 AM Eric Dumazet <edumazet@google.com> wrote:
>
> [...]
>
> In this patch, I split tc_skip_wrapper into two different
> static keys, one for tc_act() (tc_skip_wrapper_act)
> and one for tc_classify() (tc_skip_wrapper_cls).
>
> Then I enable tc_skip_wrapper_cls only if the count
> of builtin classifiers is above one.
>
> I enable tc_skip_wrapper_act only if the count of builtin
> actions is above one.
>

Sorry, not clear from reading the patch - why "above one"?

cheers,
jamal


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h
  2026-03-07 21:44 ` Jamal Hadi Salim
@ 2026-03-07 23:53   ` Eric Dumazet
  2026-03-09 16:19     ` Jamal Hadi Salim
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2026-03-07 23:53 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jiri Pirko, netdev, eric.dumazet, Victor Nogueira, Pedro Tammela

On Sat, Mar 7, 2026 at 10:44 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Sat, Mar 7, 2026 at 8:36 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > > [...]
> >
> > > In this patch, I split tc_skip_wrapper into two different
> > > static keys, one for tc_act() (tc_skip_wrapper_act)
> > > and one for tc_classify() (tc_skip_wrapper_cls).
> > >
> > > Then I enable tc_skip_wrapper_cls only if the count
> > > of builtin classifiers is above one.
> > >
> > > I enable tc_skip_wrapper_act only if the count of builtin
> > > actions is above one.
> >
>
> Sorry, not clear from reading the patch - why "above one"?
>

The rationale of the code before my patch is:

For platforms with IBRS, it is not worth having many tests that could
suffer branch mispredictions, only to avoid one indirect branch, which
is acceptable there.

My patch refines the logic to still allow _one_ conditional enabling
one direct call, because we know this construct is better, even with
IBRS.

This is the reason we do not have static keys guarding many
INDIRECT_CALL() in the networking stack. Only TC seemed to care about
that.

Our use case is only having CLS_BPF as a builtin. I will let people
needing 2 or 3 builtins amend the code with their performance numbers.
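[Editorial illustration: a userspace C sketch of this rationale. Names are invented, and a plain variable stands in for the static key, which the kernel actually patches into the code at boot; with more than one builtin, the key jumps straight past the compare chain to the indirect call, while a single builtin keeps the one winning compare.]

```c
#include <assert.h>

static int skip_wrapper;	/* stands in for the static key (0 = use wrapper) */

/* Hypothetical builtin classifiers. */
static int cls_a(int x) { return x + 1; }
static int cls_b(int x) { return x + 2; }
static int cls_other(int x) { return x * 10; }	/* e.g. a module's classifier */

static int classify(int (*fn)(int), int x)
{
	if (skip_wrapper)
		goto skip;	/* many builtins: skip the whole compare chain */
	if (fn == cls_a)
		return cls_a(x);	/* conditional direct call */
	if (fn == cls_b)
		return cls_b(x);
skip:
	return fn(x);		/* plain indirect call */
}
```

With two or more builtins the compare chain can mispredict on every packet, so enabling the key (one patched jump) and eating the indirect call is the better trade; with one builtin the single compare wins.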

Thanks!


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h
  2026-03-07 23:53   ` Eric Dumazet
@ 2026-03-09 16:19     ` Jamal Hadi Salim
  0 siblings, 0 replies; 7+ messages in thread
From: Jamal Hadi Salim @ 2026-03-09 16:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jiri Pirko, netdev, eric.dumazet, Victor Nogueira, Pedro Tammela

On Sat, Mar 7, 2026 at 6:54 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Mar 7, 2026 at 10:44 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > On Sat, Mar 7, 2026 at 8:36 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > [...]
> > >
> >
> > Sorry, not clear from reading the patch - why "above one"?
> >
>
> The rationale of the code before my patch is:
>
> For platforms with IBRS, it is not worth having many tests that could
> suffer branch mispredictions, only to avoid one indirect branch, which
> is acceptable there.
>
> My patch refines the logic to still allow _one_ conditional enabling
> one direct call, because we know this construct is better, even with
> IBRS.
>
> This is the reason we do not have static keys guarding many
> INDIRECT_CALL() in the networking stack. Only TC seemed to care about
> that.
>
> Our use case is only having CLS_BPF as a builtin. I will let people
> needing 2 or 3 builtins amend the code with their performance numbers.
>

Thanks. I was initially worried it was highly workload-specific.

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h
  2026-03-07 13:36 [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h Eric Dumazet
  2026-03-07 21:44 ` Jamal Hadi Salim
@ 2026-03-09 17:37 ` Pedro Tammela
  2026-03-09 18:34 ` Victor Nogueira
  2026-03-10  3:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: Pedro Tammela @ 2026-03-09 17:37 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
	Victor Nogueira, Pedro Tammela

On 07/03/2026 10:36, Eric Dumazet wrote:
> [...]
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h
  2026-03-07 13:36 [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h Eric Dumazet
  2026-03-07 21:44 ` Jamal Hadi Salim
  2026-03-09 17:37 ` Pedro Tammela
@ 2026-03-09 18:34 ` Victor Nogueira
  2026-03-10  3:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: Victor Nogueira @ 2026-03-09 18:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet, Pedro Tammela

On Sat, Mar 7, 2026 at 10:36 AM Eric Dumazet <edumazet@google.com> wrote:
>
> [...]
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Victor Nogueira <victor@mojatatu.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h
  2026-03-07 13:36 [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h Eric Dumazet
                   ` (2 preceding siblings ...)
  2026-03-09 18:34 ` Victor Nogueira
@ 2026-03-10  3:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-03-10  3:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, pabeni, horms, jhs, jiri, netdev, eric.dumazet,
	victor, pctammela

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat,  7 Mar 2026 13:36:01 +0000 you wrote:
> Some modern cpus do not enable the X86_FEATURE_RETPOLINE feature,
> even though a direct call can still be beneficial.
> 
> Even when IBRS is present, an indirect call is more expensive
> than a direct one:
> 
> Direct Calls:
>   Compilers can perform powerful optimizations like inlining,
>   where the function body is directly inserted at the call site,
>   eliminating call overhead entirely.
> 
> [...]

Here is the summary with links:
  - [net-next] net/sched: refine indirect call mitigation in tc_wrapper.h
    https://git.kernel.org/netdev/net-next/c/f2db7b80b03f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-03-10  3:00 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-07 13:36 [PATCH net-next] net/sched: refine indirect call mitigation in tc_wrapper.h Eric Dumazet
2026-03-07 21:44 ` Jamal Hadi Salim
2026-03-07 23:53   ` Eric Dumazet
2026-03-09 16:19     ` Jamal Hadi Salim
2026-03-09 17:37 ` Pedro Tammela
2026-03-09 18:34 ` Victor Nogueira
2026-03-10  3:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox