* [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
@ 2026-03-09 12:20 Eric Dumazet
2026-03-09 13:43 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2026-03-09 12:20 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-kernel, Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev,
Eric Dumazet, Sebastian Andrzej Siewior, Peter Zijlstra (Intel),
Marco Elver
On !PREEMPT_RT and !LOCKDEP kernels, local_[un]lock_nested_bh()
are supposed to be NOP.
This is not exactly true after 7ff495e26a39 ("local_lock: Move
this_cpu_ptr() notation from internal to main header") due to
this_cpu_ptr() being evaluated even if its result it not used.
This prevents some tail call optimizations.
After this patch we have gains in networking fast paths:
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-644 (-644)
Function old new delta
tcp_sigpool_end 79 71 -8
skb_attempt_defer_free 457 449 -8
ppp_xmit_process 179 171 -8
ppp_write 411 403 -8
ppp_output_wakeup 135 127 -8
napi_skb_cache_get_bulk 440 432 -8
napi_consume_skb 409 401 -8
dst_cache_set_ip6 203 195 -8
dst_cache_set_ip4 135 127 -8
cpu_map_enqueue 193 185 -8
bq_enqueue 263 255 -8
__netdev_alloc_skb 377 369 -8
__netdev_alloc_frag_align 155 147 -8
__napi_kfree_skb 136 128 -8
napi_skb_free_stolen_head 199 190 -9
input_action_end_bpf 1083 1072 -11
napi_alloc_skb 275 263 -12
__napi_alloc_frag_align 59 45 -14
xdp_build_skb_from_zc 590 574 -16
tcp_v4_send_ack 1129 1113 -16
sch_frag_xmit_hook 1260 1244 -16
flush_backlog 507 491 -16
dst_cache_get_ip6 99 83 -16
dst_cache_get_ip4 90 74 -16
do_xdp_generic 932 916 -16
__napi_build_skb 591 575 -16
__dev_flush 115 99 -16
__cpu_map_flush 85 69 -16
dst_cache_get 55 38 -17
tcp_v4_send_reset 2682 2658 -24
mptcp_subflow_delegate 955 931 -24
__alloc_skb 988 964 -24
mptcp_napi_poll 310 281 -29
nat_keepalive_work_single 1385 1335 -50
gro_cells_receive 320 244 -76
process_backlog 486 404 -82
Total: Before=25812320, After=25811676, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Marco Elver <elver@google.com>
---
include/linux/local_lock.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
local_unlock_irqrestore(_T->lock, _T->flags),
unsigned long flags)
+#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
+ defined(CONFIG_DEBUG_LOCK_ALLOC)
#define local_lock_nested_bh(_lock) \
__local_lock_nested_bh(__this_cpu_local_lock(_lock))
#define local_unlock_nested_bh(_lock) \
__local_unlock_nested_bh(__this_cpu_local_lock(_lock))
+#else
+static inline void local_lock_nested_bh(local_lock_t *_lock) {}
+static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
+#endif
+
DEFINE_LOCK_GUARD_1(local_lock_nested_bh, local_lock_t __percpu,
local_lock_nested_bh(_T->lock),
local_unlock_nested_bh(_T->lock))
base-commit: 1f318b96cc84d7c2ab792fcc0bfd42a7ca890681
prerequisite-patch-id: f6002c357582927a383603a22e69bc0d7a5b9528
--
2.53.0.473.g4a7958ca14-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-09 12:20 [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead Eric Dumazet
@ 2026-03-09 13:43 ` Peter Zijlstra
2026-03-09 13:49 ` Eric Dumazet
2026-03-09 14:03 ` Eric Dumazet
0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2026-03-09 13:43 UTC (permalink / raw)
To: Eric Dumazet
Cc: Thomas Gleixner, linux-kernel, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Sebastian Andrzej Siewior, Marco Elver
On Mon, Mar 09, 2026 at 12:20:55PM +0000, Eric Dumazet wrote:
> diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
> index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
> --- a/include/linux/local_lock.h
> +++ b/include/linux/local_lock.h
> @@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
> local_unlock_irqrestore(_T->lock, _T->flags),
> unsigned long flags)
>
> +#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
> + defined(CONFIG_DEBUG_LOCK_ALLOC)
> #define local_lock_nested_bh(_lock) \
> __local_lock_nested_bh(__this_cpu_local_lock(_lock))
>
> #define local_unlock_nested_bh(_lock) \
> __local_unlock_nested_bh(__this_cpu_local_lock(_lock))
>
> +#else
> +static inline void local_lock_nested_bh(local_lock_t *_lock) {}
> +static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
> +#endif
This isn't going to work; WARN_CONTEXT_ANALYSIS is unconditional on
clang >= 22.1
How come that this isn't DCEd properly?
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-09 13:43 ` Peter Zijlstra
@ 2026-03-09 13:49 ` Eric Dumazet
2026-03-09 14:05 ` Marco Elver
2026-03-09 14:03 ` Eric Dumazet
1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2026-03-09 13:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, linux-kernel, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Sebastian Andrzej Siewior, Marco Elver
On Mon, Mar 9, 2026 at 2:44 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 09, 2026 at 12:20:55PM +0000, Eric Dumazet wrote:
>
> > diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
> > index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
> > --- a/include/linux/local_lock.h
> > +++ b/include/linux/local_lock.h
> > @@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
> > local_unlock_irqrestore(_T->lock, _T->flags),
> > unsigned long flags)
> >
> > +#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
> > + defined(CONFIG_DEBUG_LOCK_ALLOC)
> > #define local_lock_nested_bh(_lock) \
> > __local_lock_nested_bh(__this_cpu_local_lock(_lock))
> >
> > #define local_unlock_nested_bh(_lock) \
> > __local_unlock_nested_bh(__this_cpu_local_lock(_lock))
> >
> > +#else
> > +static inline void local_lock_nested_bh(local_lock_t *_lock) {}
> > +static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
> > +#endif
>
> This isn't going to work; WARN_CONTEXT_ANALYSIS is unconditional on
> clang >= 22.1
>
> How come that this isn't DCEd properly?
BTW I wonder if the following WARN_CONTEXT_ANALYSIS should be
CONFIG_WARN_CONTEXT_ANALYSIS
include/linux/local_lock_internal.h:318:#if defined(WARN_CONTEXT_ANALYSIS)
include/linux/local_lock_internal.h:337:#else /* WARN_CONTEXT_ANALYSIS */
include/linux/local_lock_internal.h:339:#endif /* WARN_CONTEXT_ANALYSIS */
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-09 13:49 ` Eric Dumazet
@ 2026-03-09 14:05 ` Marco Elver
2026-03-09 14:11 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Marco Elver @ 2026-03-09 14:05 UTC (permalink / raw)
To: Eric Dumazet
Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Sebastian Andrzej Siewior
On Mon, 9 Mar 2026 at 14:49, Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Mar 9, 2026 at 2:44 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Mar 09, 2026 at 12:20:55PM +0000, Eric Dumazet wrote:
> >
> > > diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
> > > index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
> > > --- a/include/linux/local_lock.h
> > > +++ b/include/linux/local_lock.h
> > > @@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
> > > local_unlock_irqrestore(_T->lock, _T->flags),
> > > unsigned long flags)
> > >
> > > +#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
> > > + defined(CONFIG_DEBUG_LOCK_ALLOC)
> > > #define local_lock_nested_bh(_lock) \
> > > __local_lock_nested_bh(__this_cpu_local_lock(_lock))
> > >
> > > #define local_unlock_nested_bh(_lock) \
> > > __local_unlock_nested_bh(__this_cpu_local_lock(_lock))
> > >
> > > +#else
> > > +static inline void local_lock_nested_bh(local_lock_t *_lock) {}
> > > +static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
> > > +#endif
> >
> > This isn't going to work; WARN_CONTEXT_ANALYSIS is unconditional on
> > clang >= 22.1
> >
> > How come that this isn't DCEd properly?
>
> BTW I wonder if the following WARN_CONTEXT_ANALYSIS should be
> CONFIG_WARN_CONTEXT_ANALYSIS
>
> include/linux/local_lock_internal.h:318:#if defined(WARN_CONTEXT_ANALYSIS)
> include/linux/local_lock_internal.h:337:#else /* WARN_CONTEXT_ANALYSIS */
> include/linux/local_lock_internal.h:339:#endif /* WARN_CONTEXT_ANALYSIS */
Even if enabled in Kconfig, our make rules set -DWARN_CONTEXT_ANALYSIS
for translation units where we actually want to compile with
-Wthread-safety. So WARN_CONTEXT_ANALYSIS should be ok.
But for !CONFIG_PREEMPT_RT and !CONFIG_DEBUG_LOCK_ALLOC builds, where
we build with context analysis (which is purely static, no dynamic
overhead) we should be able to get the same better codegen as well.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-09 14:05 ` Marco Elver
@ 2026-03-09 14:11 ` Eric Dumazet
0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2026-03-09 14:11 UTC (permalink / raw)
To: Marco Elver
Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Sebastian Andrzej Siewior
On Mon, Mar 9, 2026 at 3:06 PM Marco Elver <elver@google.com> wrote:
>
> On Mon, 9 Mar 2026 at 14:49, Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Mar 9, 2026 at 2:44 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Mon, Mar 09, 2026 at 12:20:55PM +0000, Eric Dumazet wrote:
> > >
> > > > diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
> > > > index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
> > > > --- a/include/linux/local_lock.h
> > > > +++ b/include/linux/local_lock.h
> > > > @@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
> > > > local_unlock_irqrestore(_T->lock, _T->flags),
> > > > unsigned long flags)
> > > >
> > > > +#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
> > > > + defined(CONFIG_DEBUG_LOCK_ALLOC)
> > > > #define local_lock_nested_bh(_lock) \
> > > > __local_lock_nested_bh(__this_cpu_local_lock(_lock))
> > > >
> > > > #define local_unlock_nested_bh(_lock) \
> > > > __local_unlock_nested_bh(__this_cpu_local_lock(_lock))
> > > >
> > > > +#else
> > > > +static inline void local_lock_nested_bh(local_lock_t *_lock) {}
> > > > +static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
> > > > +#endif
> > >
> > > This isn't going to work; WARN_CONTEXT_ANALYSIS is unconditional on
> > > clang >= 22.1
> > >
> > > How come that this isn't DCEd properly?
> >
> > BTW I wonder if the following WARN_CONTEXT_ANALYSIS should be
> > CONFIG_WARN_CONTEXT_ANALYSIS
> >
> > include/linux/local_lock_internal.h:318:#if defined(WARN_CONTEXT_ANALYSIS)
> > include/linux/local_lock_internal.h:337:#else /* WARN_CONTEXT_ANALYSIS */
> > include/linux/local_lock_internal.h:339:#endif /* WARN_CONTEXT_ANALYSIS */
>
> Even if enabled in Kconfig, our make rules set -DWARN_CONTEXT_ANALYSIS
> for translation units where we actually want to compile with
> -Wthread-safety. So WARN_CONTEXT_ANALYSIS should be ok.
>
> But for !CONFIG_PREEMPT_RT and !CONFIG_DEBUG_LOCK_ALLOC builds, where
> we build with context analysis (which is purely static, no dynamic
> overhead) we should be able to get the same better codegen as well.
Ah ok, a bit confusing ....
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-09 13:43 ` Peter Zijlstra
2026-03-09 13:49 ` Eric Dumazet
@ 2026-03-09 14:03 ` Eric Dumazet
2026-03-09 14:18 ` Eric Dumazet
1 sibling, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2026-03-09 14:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, linux-kernel, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Sebastian Andrzej Siewior, Marco Elver
On Mon, Mar 9, 2026 at 2:44 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Mar 09, 2026 at 12:20:55PM +0000, Eric Dumazet wrote:
>
> > diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
> > index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
> > --- a/include/linux/local_lock.h
> > +++ b/include/linux/local_lock.h
> > @@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
> > local_unlock_irqrestore(_T->lock, _T->flags),
> > unsigned long flags)
> >
> > +#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
> > + defined(CONFIG_DEBUG_LOCK_ALLOC)
> > #define local_lock_nested_bh(_lock) \
> > __local_lock_nested_bh(__this_cpu_local_lock(_lock))
> >
> > #define local_unlock_nested_bh(_lock) \
> > __local_unlock_nested_bh(__this_cpu_local_lock(_lock))
> >
> > +#else
> > +static inline void local_lock_nested_bh(local_lock_t *_lock) {}
> > +static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
> > +#endif
>
> This isn't going to work; WARN_CONTEXT_ANALYSIS is unconditional on
> clang >= 22.1
>
> How come that this isn't DCEd properly?
It might be partially done.
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0e217041958a83d2a3c18de2965808442546c49b..50455951dc38668b0cbbcccdb2c5ce726e3c4da9
100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -7498,3 +7498,12 @@ struct vlan_type_depth
__vlan_get_protocol_offset(const struct sk_buff *skb,
};
}
EXPORT_SYMBOL(__vlan_get_protocol_offset);
+
+void ericeric(void);
+void ericeric(void)
+{
+ local_lock_nested_bh(&napi_alloc_cache.bh_lock);
+ local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
+ local_lock_nested_bh(&napi_alloc_cache.bh_lock);
+ local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
+}
objdump --disassemble=ericeric -r net/core/skbuff.o
net/core/skbuff.o: file format elf64-x86-64
Disassembly of section .text:
000000000000fe40 <ericeric>:
fe40: f3 0f 1e fa endbr64
fe44: e8 00 00 00 00 call fe49 <ericeric+0x9>
fe45: R_X86_64_PLT32 __fentry__-0x4
fe49: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax # fe51
<ericeric+0x11>
fe50: 00
fe4d: R_X86_64_PC32 this_cpu_off-0x4
fe51: 2e e9 00 00 00 00 cs jmp fe57 <ericeric+0x17>
fe53: R_X86_64_PLT32 __x86_return_thunk-0x4
Disassembly of section .init.text:
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-09 14:03 ` Eric Dumazet
@ 2026-03-09 14:18 ` Eric Dumazet
2026-03-09 14:52 ` Eric Dumazet
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2026-03-09 14:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, linux-kernel, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Sebastian Andrzej Siewior, Marco Elver
On Mon, Mar 9, 2026 at 3:03 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Mar 9, 2026 at 2:44 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Mar 09, 2026 at 12:20:55PM +0000, Eric Dumazet wrote:
> >
> > > diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
> > > index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
> > > --- a/include/linux/local_lock.h
> > > +++ b/include/linux/local_lock.h
> > > @@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
> > > local_unlock_irqrestore(_T->lock, _T->flags),
> > > unsigned long flags)
> > >
> > > +#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
> > > + defined(CONFIG_DEBUG_LOCK_ALLOC)
> > > #define local_lock_nested_bh(_lock) \
> > > __local_lock_nested_bh(__this_cpu_local_lock(_lock))
> > >
> > > #define local_unlock_nested_bh(_lock) \
> > > __local_unlock_nested_bh(__this_cpu_local_lock(_lock))
> > >
> > > +#else
> > > +static inline void local_lock_nested_bh(local_lock_t *_lock) {}
> > > +static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
> > > +#endif
> >
> > This isn't going to work; WARN_CONTEXT_ANALYSIS is unconditional on
> > clang >= 22.1
> >
> > How come that this isn't DCEd properly?
>
> It might be partially done.
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 0e217041958a83d2a3c18de2965808442546c49b..50455951dc38668b0cbbcccdb2c5ce726e3c4da9
> 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -7498,3 +7498,12 @@ struct vlan_type_depth
> __vlan_get_protocol_offset(const struct sk_buff *skb,
> };
> }
> EXPORT_SYMBOL(__vlan_get_protocol_offset);
> +
> +void ericeric(void);
> +void ericeric(void)
> +{
> + local_lock_nested_bh(&napi_alloc_cache.bh_lock);
> + local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
> + local_lock_nested_bh(&napi_alloc_cache.bh_lock);
> + local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
> +}
>
> objdump --disassemble=ericeric -r net/core/skbuff.o
>
> net/core/skbuff.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 000000000000fe40 <ericeric>:
> fe40: f3 0f 1e fa endbr64
> fe44: e8 00 00 00 00 call fe49 <ericeric+0x9>
> fe45: R_X86_64_PLT32 __fentry__-0x4
> fe49: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax # fe51
> <ericeric+0x11>
> fe50: 00
> fe4d: R_X86_64_PC32 this_cpu_off-0x4
> fe51: 2e e9 00 00 00 00 cs jmp fe57 <ericeric+0x17>
> fe53: R_X86_64_PLT32 __x86_return_thunk-0x4
>
> Disassembly of section .init.text:
Same for
+
+void ericeric(void);
+void ericeric(void)
+{
+ raw_cpu_read_long(this_cpu_off);
+ raw_cpu_read_long(this_cpu_off);
+}
I am guessing __raw_cpu_read() is forcing the asm ?
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-09 14:18 ` Eric Dumazet
@ 2026-03-09 14:52 ` Eric Dumazet
2026-03-11 15:55 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2026-03-09 14:52 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, linux-kernel, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, netdev, Sebastian Andrzej Siewior, Marco Elver
On Mon, Mar 9, 2026 at 3:18 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Mar 9, 2026 at 3:03 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Mar 9, 2026 at 2:44 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Mon, Mar 09, 2026 at 12:20:55PM +0000, Eric Dumazet wrote:
> > >
> > > > diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
> > > > index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
> > > > --- a/include/linux/local_lock.h
> > > > +++ b/include/linux/local_lock.h
> > > > @@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
> > > > local_unlock_irqrestore(_T->lock, _T->flags),
> > > > unsigned long flags)
> > > >
> > > > +#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
> > > > + defined(CONFIG_DEBUG_LOCK_ALLOC)
> > > > #define local_lock_nested_bh(_lock) \
> > > > __local_lock_nested_bh(__this_cpu_local_lock(_lock))
> > > >
> > > > #define local_unlock_nested_bh(_lock) \
> > > > __local_unlock_nested_bh(__this_cpu_local_lock(_lock))
> > > >
> > > > +#else
> > > > +static inline void local_lock_nested_bh(local_lock_t *_lock) {}
> > > > +static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
> > > > +#endif
> > >
> > > This isn't going to work; WARN_CONTEXT_ANALYSIS is unconditional on
> > > clang >= 22.1
> > >
> > > How come that this isn't DCEd properly?
> >
> > It might be partially done.
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 0e217041958a83d2a3c18de2965808442546c49b..50455951dc38668b0cbbcccdb2c5ce726e3c4da9
> > 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -7498,3 +7498,12 @@ struct vlan_type_depth
> > __vlan_get_protocol_offset(const struct sk_buff *skb,
> > };
> > }
> > EXPORT_SYMBOL(__vlan_get_protocol_offset);
> > +
> > +void ericeric(void);
> > +void ericeric(void)
> > +{
> > + local_lock_nested_bh(&napi_alloc_cache.bh_lock);
> > + local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
> > + local_lock_nested_bh(&napi_alloc_cache.bh_lock);
> > + local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
> > +}
> >
> > objdump --disassemble=ericeric -r net/core/skbuff.o
> >
> > net/core/skbuff.o: file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 000000000000fe40 <ericeric>:
> > fe40: f3 0f 1e fa endbr64
> > fe44: e8 00 00 00 00 call fe49 <ericeric+0x9>
> > fe45: R_X86_64_PLT32 __fentry__-0x4
> > fe49: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax # fe51
> > <ericeric+0x11>
> > fe50: 00
> > fe4d: R_X86_64_PC32 this_cpu_off-0x4
> > fe51: 2e e9 00 00 00 00 cs jmp fe57 <ericeric+0x17>
> > fe53: R_X86_64_PLT32 __x86_return_thunk-0x4
> >
> > Disassembly of section .init.text:
>
> Same for
>
> +
> +void ericeric(void);
> +void ericeric(void)
> +{
> + raw_cpu_read_long(this_cpu_off);
> + raw_cpu_read_long(this_cpu_off);
> +}
>
> I am guessing __raw_cpu_read() is forcing the asm ?
Might be a clang issue. Oh well.
clang --version
Debian clang version 19.1.7 (10.1+build1)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm-19/bin
Documentation/process/changes.rst mentions the minimum supported version is 15.0
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-09 14:52 ` Eric Dumazet
@ 2026-03-11 15:55 ` Sebastian Andrzej Siewior
2026-03-11 16:32 ` Uros Bizjak
0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-03-11 15:55 UTC (permalink / raw)
To: Eric Dumazet
Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, netdev, Marco Elver, Uros Bizjak
On 2026-03-09 15:52:34 [+0100], Eric Dumazet wrote:
> > +void ericeric(void);
> > +void ericeric(void)
> > +{
> > + raw_cpu_read_long(this_cpu_off);
> > + raw_cpu_read_long(this_cpu_off);
> > +}
> >
> > I am guessing __raw_cpu_read() is forcing the asm ?
>
> Might be a clang issue. Oh well.
So the difference is that with gcc we have USE_X86_SEG_SUPPORT and with
llvm we don't. This leads to two asm statements with LLVM of which only
one is eliminated. This optimisation origins in commit ca4256348660c
("x86/percpu: Use C for percpu read/write accessors").
__seg_fs and __seg_gs is supported by LLVM but enabling it leads to tons
warnings and aborts later.
Is there something missing in LLVM? The generated code for
raw_cpu_read_long(this_cpu_off) looks fine.
Sebastian
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-11 15:55 ` Sebastian Andrzej Siewior
@ 2026-03-11 16:32 ` Uros Bizjak
2026-03-11 16:45 ` Uros Bizjak
0 siblings, 1 reply; 11+ messages in thread
From: Uros Bizjak @ 2026-03-11 16:32 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Eric Dumazet, Peter Zijlstra, Thomas Gleixner, linux-kernel,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, Marco Elver,
Nathan Chancellor
On Wed, Mar 11, 2026 at 4:55 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2026-03-09 15:52:34 [+0100], Eric Dumazet wrote:
> > > +void ericeric(void);
> > > +void ericeric(void)
> > > +{
> > > + raw_cpu_read_long(this_cpu_off);
> > > + raw_cpu_read_long(this_cpu_off);
> > > +}
> > >
> > > I am guessing __raw_cpu_read() is forcing the asm ?
> >
> > Might be a clang issue. Oh well.
>
> So the difference is that with gcc we have USE_X86_SEG_SUPPORT and with
> llvm we don't. This leads to two asm statements with LLVM of which only
> one is eliminated. This optimisation origins in commit ca4256348660c
> ("x86/percpu: Use C for percpu read/write accessors").
>
> __seg_fs and __seg_gs is supported by LLVM but enabling it leads to tons
> warnings and aborts later.
Tons of warnings is just due to clang being picky and warns for
duplicated qualifiers, such as "__seg_gs __seg_gs var". This can be
fixed with:
https://lore.kernel.org/lkml/20240526175655.227798-1-ubizjak@gmail.com/
> Is there something missing in LLVM? The generated code for
> raw_cpu_read_long(this_cpu_off) looks fine.
Yes:
1. The %fs: and %gs: prefix does not get emitted in inline assembly.
2. An internal compiler error when addressing symbols directly:
https://github.com/llvm/llvm-project/issues/93449
3. Wrong named address space for anonymous struct:
https://github.com/llvm/llvm-project/issues/119705
Uros.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead
2026-03-11 16:32 ` Uros Bizjak
@ 2026-03-11 16:45 ` Uros Bizjak
0 siblings, 0 replies; 11+ messages in thread
From: Uros Bizjak @ 2026-03-11 16:45 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: Eric Dumazet, Peter Zijlstra, Thomas Gleixner, linux-kernel,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, Marco Elver,
Nathan Chancellor, Bill Wendling, Dmitry Vyukov
On Wed, Mar 11, 2026 at 5:32 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Mar 11, 2026 at 4:55 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2026-03-09 15:52:34 [+0100], Eric Dumazet wrote:
> > > > +void ericeric(void);
> > > > +void ericeric(void)
> > > > +{
> > > > + raw_cpu_read_long(this_cpu_off);
> > > > + raw_cpu_read_long(this_cpu_off);
> > > > +}
> > > >
> > > > I am guessing __raw_cpu_read() is forcing the asm ?
> > >
> > > Might be a clang issue. Oh well.
> >
> > So the difference is that with gcc we have USE_X86_SEG_SUPPORT and with
> > llvm we don't. This leads to two asm statements with LLVM of which only
> > one is eliminated. This optimisation origins in commit ca4256348660c
> > ("x86/percpu: Use C for percpu read/write accessors").
> >
> > __seg_fs and __seg_gs is supported by LLVM but enabling it leads to tons
> > warnings and aborts later.
>
> Tons of warnings is just due to clang being picky and warns for
> duplicated qualifiers, such as "__seg_gs __seg_gs var". This can be
> fixed with:
>
> https://lore.kernel.org/lkml/20240526175655.227798-1-ubizjak@gmail.com/
>
> > Is there something missing in LLVM? The generated code for
> > raw_cpu_read_long(this_cpu_off) looks fine.
>
> Yes:
>
> 1. The %fs: and %gs: prefix does not get emitted in inline assembly.
>
> 2. An internal compiler error when addressing symbols directly:
> https://github.com/llvm/llvm-project/issues/93449
>
> 3. Wrong named address space for anonymous struct:
> https://github.com/llvm/llvm-project/issues/119705
BTW: A related issue is that ASAN fails to handle gs: prefixed
addresses. For GCC, we have had to disable ASAN instrumentation for
all locations in non-default address spaces. With asm accessors, the
access is hidden from ASAN, and the memory access is not instrumented
anyways.
Uros.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-03-11 16:45 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-09 12:20 [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead Eric Dumazet
2026-03-09 13:43 ` Peter Zijlstra
2026-03-09 13:49 ` Eric Dumazet
2026-03-09 14:05 ` Marco Elver
2026-03-09 14:11 ` Eric Dumazet
2026-03-09 14:03 ` Eric Dumazet
2026-03-09 14:18 ` Eric Dumazet
2026-03-09 14:52 ` Eric Dumazet
2026-03-11 15:55 ` Sebastian Andrzej Siewior
2026-03-11 16:32 ` Uros Bizjak
2026-03-11 16:45 ` Uros Bizjak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox