* uprobe splat in PREEMP_RT
@ 2025-04-01 21:04 Alexei Starovoitov
2025-04-01 21:22 ` Steven Rostedt
2025-04-02 9:10 ` Oleg Nesterov
0 siblings, 2 replies; 28+ messages in thread
From: Alexei Starovoitov @ 2025-04-01 21:04 UTC (permalink / raw)
To: Oleg Nesterov, Andrii Nakryiko, bpf, Sebastian Sewior,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
Hi,
caught the following splat running uprobe tests in PREEMPT_RT
[ 101.862206] ------------[ cut here ]------------
[ 101.862212] WARNING: CPU: 0 PID: 16 at include/linux/seqlock.h:221
ri_timer+0x235/0x320
[ 101.862226] Modules linked in:
[ 101.862233] CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Not tainted
6.14.0-12141-g1d0ec9988088 #22 PREEMPT_RT
[ 101.862240] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 101.862243] RIP: 0010:ri_timer+0x235/0x320
[ 101.862249] Code: 5d 41 5e 41 5f e9 5b f5 b7 ff 65 f7 05 a8 95 ff
04 ff ff ff 7f 0f 85 ad fe ff ff 65 8b 05 57 cf ff 04 85 c0 0f 84 9e
fe ff ff <0f> 0b e9 97 fe ff ff e8 df 7b b8 ff 84 c0 0f 85 43 fe ff ff
e8 52
[ 101.862253] RSP: 0018:ffffc9000010fb80 EFLAGS: 00010202
[ 101.862257] RAX: 0000000000000001 RBX: ffffffff819c8889 RCX: 0000000000000001
[ 101.862260] RDX: 0000000000000000 RSI: ffffffff819c8889 RDI: ffff8881f6a33910
[ 101.862262] RBP: ffff888105a1da18 R08: 000000000000000a R09: 000000000000000a
[ 101.862265] R10: ffffc9000010f987 R11: 0000000000000000 R12: 1ffff92000021f78
[ 101.862267] R13: 0000000000000000 R14: ffffffff819c8860 R15: 0000000000000000
[ 101.862292] FS: 0000000000000000(0000) GS:ffff88827005e000(0000)
knlGS:0000000000000000
[ 101.862316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 101.862319] CR2: 00007fffffffe000 CR3: 0000000109d67004 CR4: 00000000003706f0
[ 101.862322] Call Trace:
[ 101.862325] <TASK>
[ 101.862333] ? free_ret_instance+0x180/0x180
[ 101.862338] call_timer_fn+0x14c/0x3c0
[ 101.862345] ? lock_release+0xb6/0x250
[ 101.862353] ? detach_if_pending+0x310/0x310
[ 101.862363] ? _raw_spin_unlock_irq+0x28/0x40
[ 101.862371] ? lockdep_hardirqs_on_prepare+0xa7/0x170
[ 101.862380] __run_timers+0x58a/0x980
[ 101.862385] ? free_ret_instance+0x180/0x180
[ 101.862396] ? timer_shutdown_sync+0x20/0x20
[ 101.862402] ? lock_acquire+0x123/0x2b0
[ 101.862408] ? run_timer_softirq+0x11a/0x220
[ 101.862414] ? do_raw_spin_lock+0x11e/0x240
[ 101.862419] ? spin_bug+0x230/0x230
[ 101.862422] ? rtlock_slowlock_locked+0x50a0/0x50a0
[ 101.862433] run_timer_softirq+0x122/0x220
[ 101.862503] handle_softirqs.isra.0+0x136/0x610
[ 101.862518] run_ktimerd+0x47/0xe0
[ 101.862524] smpboot_thread_fn+0x30f/0x8a0
[ 101.862531] ? schedule+0xe2/0x390
[ 101.862537] ? sort_range+0x20/0x20
[ 101.862541] kthread+0x3ac/0x770
[ 101.862547] ? rt_read_trylock+0x1d0/0x1d0
[ 101.862554] ? kthread_is_per_cpu+0xc0/0xc0
[ 101.862560] ? lock_release+0xb6/0x250
[ 101.862570] ? kthread_is_per_cpu+0xc0/0xc0
[ 101.862574] ret_from_fork+0x31/0x70
[ 101.862580] ? kthread_is_per_cpu+0xc0/0xc0
[ 101.862586] ret_from_fork_asm+0x11/0x20
[ 101.862604] </TASK>
[ 101.862606] irq event stamp: 13032
[ 101.862608] hardirqs last enabled at (13034): [<ffffffff8150094a>]
vprintk_store+0x72a/0x850
[ 101.862613] hardirqs last disabled at (13035): [<ffffffff8150061d>]
vprintk_store+0x3fd/0x850
[ 101.862615] softirqs last enabled at (12922): [<ffffffff8136e049>]
run_ktimerd+0x69/0xe0
[ 101.862618] softirqs last disabled at (12928): [<ffffffff813ef4bf>]
smpboot_thread_fn+0x30f/0x8a0
[ 101.862621] ---[ end trace 0000000000000000 ]---
Looks like write_seqcount_begin(&utask->ri_seqcount);
use in ri_timer() needs a fix ?
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-01 21:04 uprobe splat in PREEMP_RT Alexei Starovoitov
@ 2025-04-01 21:22 ` Steven Rostedt
2025-04-01 22:01 ` Andrii Nakryiko
2025-04-02 9:10 ` Oleg Nesterov
1 sibling, 1 reply; 28+ messages in thread
From: Steven Rostedt @ 2025-04-01 21:22 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Oleg Nesterov, Andrii Nakryiko, bpf, Sebastian Sewior, Jiri Olsa,
Masami Hiramatsu
On Tue, 1 Apr 2025 14:04:22 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> Looks like write_seqcount_begin(&utask->ri_seqcount);
> use in ri_timer() needs a fix ?
Hmm,
write_seqcount_begin(&utask->ri_seqcount);
for_each_ret_instance_rcu(ri, utask->return_instances)
hprobe_expire(&ri->hprobe, false);
write_seqcount_end(&utask->ri_seqcount);
How big can that loop be?
Of course, we could just say not to use uprobes on PREEMPT_RT kernels?
Otherwise, they could cause an unspecified latency.
-- Steve
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-01 21:22 ` Steven Rostedt
@ 2025-04-01 22:01 ` Andrii Nakryiko
2025-04-02 10:33 ` Oleg Nesterov
2025-04-02 12:21 ` Sebastian Sewior
0 siblings, 2 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2025-04-01 22:01 UTC (permalink / raw)
To: Steven Rostedt, Peter Ziljstra
Cc: Alexei Starovoitov, Oleg Nesterov, Andrii Nakryiko, bpf,
Sebastian Sewior, Jiri Olsa, Masami Hiramatsu
On Tue, Apr 1, 2025 at 2:21 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Tue, 1 Apr 2025 14:04:22 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > Looks like write_seqcount_begin(&utask->ri_seqcount);
> > use in ri_timer() needs a fix ?
>
So, write_seqcount_begin()'s documentation states:
/**
* write_seqcount_begin() - start a seqcount_t write side critical section
* @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
*
* Context: sequence counter write side sections must be serialized and
* non-preemptible. Preemption will be automatically disabled if and
* only if the seqcount write serialization lock is associated, and
* preemptible. If readers can be invoked from hardirq or softirq
* context, interrupts or bottom halves must be respectively disabled.
*/
In our case we cannot have readers invoked from hardirq/softirq. It's
the writer that can be invoked from hardirq (timer).
So what did I do incorrectly here? Should I still disable hardirqs
just to satisfy that seqprop_assert()?
Peter, any opinion?
> Hmm,
>
> write_seqcount_begin(&utask->ri_seqcount);
>
> for_each_ret_instance_rcu(ri, utask->return_instances)
> hprobe_expire(&ri->hprobe, false);
>
> write_seqcount_end(&utask->ri_seqcount);
>
> How big can that loop be?
>
> Of course, we could just say not to use uprobes on PREEMPT_RT kernels?
> Otherwise, they could cause an unspecified latency.
There can't be more than 64 nested uretprobes, so it will be (in a
very-very unlikely event) at most 64 items. And that hprobe_expire()
operation is very fast. So I don't think latency is a big concert
here.
>
> -- Steve
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-01 21:04 uprobe splat in PREEMP_RT Alexei Starovoitov
2025-04-01 21:22 ` Steven Rostedt
@ 2025-04-02 9:10 ` Oleg Nesterov
2025-04-02 10:54 ` Sebastian Sewior
1 sibling, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 9:10 UTC (permalink / raw)
To: Alexei Starovoitov, Peter Ziljstra
Cc: Andrii Nakryiko, bpf, Sebastian Sewior, Steven Rostedt, Jiri Olsa,
Masami Hiramatsu
Add Peter.
I never understood why __seqprop_preemptible() returns false.
Stupid question, perhaps
--- x/include/linux/seqlock.h
+++ x/include/linux/seqlock.h
@@ -213,12 +213,11 @@ static inline unsigned __seqprop_sequenc
static inline bool __seqprop_preemptible(const seqcount_t *s)
{
- return false;
+ return true;
}
static inline void __seqprop_assert(const seqcount_t *s)
{
- lockdep_assert_preemption_disabled();
}
#define __SEQ_RT IS_ENABLED(CONFIG_PREEMPT_RT)
makes more sense?
Then we can remove the no longer necessary preempt_disable()'s
before write_seqcount_begin() in other users of seqcount_t.
Oleg.
On 04/01, Alexei Starovoitov wrote:
>
> Hi,
>
> caught the following splat running uprobe tests in PREEMPT_RT
>
> [ 101.862206] ------------[ cut here ]------------
> [ 101.862212] WARNING: CPU: 0 PID: 16 at include/linux/seqlock.h:221
> ri_timer+0x235/0x320
> [ 101.862226] Modules linked in:
> [ 101.862233] CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Not tainted
> 6.14.0-12141-g1d0ec9988088 #22 PREEMPT_RT
> [ 101.862240] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [ 101.862243] RIP: 0010:ri_timer+0x235/0x320
> [ 101.862249] Code: 5d 41 5e 41 5f e9 5b f5 b7 ff 65 f7 05 a8 95 ff
> 04 ff ff ff 7f 0f 85 ad fe ff ff 65 8b 05 57 cf ff 04 85 c0 0f 84 9e
> fe ff ff <0f> 0b e9 97 fe ff ff e8 df 7b b8 ff 84 c0 0f 85 43 fe ff ff
> e8 52
> [ 101.862253] RSP: 0018:ffffc9000010fb80 EFLAGS: 00010202
> [ 101.862257] RAX: 0000000000000001 RBX: ffffffff819c8889 RCX: 0000000000000001
> [ 101.862260] RDX: 0000000000000000 RSI: ffffffff819c8889 RDI: ffff8881f6a33910
> [ 101.862262] RBP: ffff888105a1da18 R08: 000000000000000a R09: 000000000000000a
> [ 101.862265] R10: ffffc9000010f987 R11: 0000000000000000 R12: 1ffff92000021f78
> [ 101.862267] R13: 0000000000000000 R14: ffffffff819c8860 R15: 0000000000000000
> [ 101.862292] FS: 0000000000000000(0000) GS:ffff88827005e000(0000)
> knlGS:0000000000000000
> [ 101.862316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 101.862319] CR2: 00007fffffffe000 CR3: 0000000109d67004 CR4: 00000000003706f0
> [ 101.862322] Call Trace:
> [ 101.862325] <TASK>
> [ 101.862333] ? free_ret_instance+0x180/0x180
> [ 101.862338] call_timer_fn+0x14c/0x3c0
> [ 101.862345] ? lock_release+0xb6/0x250
> [ 101.862353] ? detach_if_pending+0x310/0x310
> [ 101.862363] ? _raw_spin_unlock_irq+0x28/0x40
> [ 101.862371] ? lockdep_hardirqs_on_prepare+0xa7/0x170
> [ 101.862380] __run_timers+0x58a/0x980
> [ 101.862385] ? free_ret_instance+0x180/0x180
> [ 101.862396] ? timer_shutdown_sync+0x20/0x20
> [ 101.862402] ? lock_acquire+0x123/0x2b0
> [ 101.862408] ? run_timer_softirq+0x11a/0x220
> [ 101.862414] ? do_raw_spin_lock+0x11e/0x240
> [ 101.862419] ? spin_bug+0x230/0x230
> [ 101.862422] ? rtlock_slowlock_locked+0x50a0/0x50a0
> [ 101.862433] run_timer_softirq+0x122/0x220
> [ 101.862503] handle_softirqs.isra.0+0x136/0x610
> [ 101.862518] run_ktimerd+0x47/0xe0
> [ 101.862524] smpboot_thread_fn+0x30f/0x8a0
> [ 101.862531] ? schedule+0xe2/0x390
> [ 101.862537] ? sort_range+0x20/0x20
> [ 101.862541] kthread+0x3ac/0x770
> [ 101.862547] ? rt_read_trylock+0x1d0/0x1d0
> [ 101.862554] ? kthread_is_per_cpu+0xc0/0xc0
> [ 101.862560] ? lock_release+0xb6/0x250
> [ 101.862570] ? kthread_is_per_cpu+0xc0/0xc0
> [ 101.862574] ret_from_fork+0x31/0x70
> [ 101.862580] ? kthread_is_per_cpu+0xc0/0xc0
> [ 101.862586] ret_from_fork_asm+0x11/0x20
> [ 101.862604] </TASK>
> [ 101.862606] irq event stamp: 13032
> [ 101.862608] hardirqs last enabled at (13034): [<ffffffff8150094a>]
> vprintk_store+0x72a/0x850
> [ 101.862613] hardirqs last disabled at (13035): [<ffffffff8150061d>]
> vprintk_store+0x3fd/0x850
> [ 101.862615] softirqs last enabled at (12922): [<ffffffff8136e049>]
> run_ktimerd+0x69/0xe0
> [ 101.862618] softirqs last disabled at (12928): [<ffffffff813ef4bf>]
> smpboot_thread_fn+0x30f/0x8a0
> [ 101.862621] ---[ end trace 0000000000000000 ]---
>
> Looks like write_seqcount_begin(&utask->ri_seqcount);
> use in ri_timer() needs a fix ?
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-01 22:01 ` Andrii Nakryiko
@ 2025-04-02 10:33 ` Oleg Nesterov
2025-04-02 10:57 ` Sebastian Sewior
2025-04-02 16:15 ` Andrii Nakryiko
2025-04-02 12:21 ` Sebastian Sewior
1 sibling, 2 replies; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 10:33 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Steven Rostedt, Peter Ziljstra, Alexei Starovoitov,
Andrii Nakryiko, bpf, Sebastian Sewior, Jiri Olsa,
Masami Hiramatsu
On 04/01, Andrii Nakryiko wrote:
>
> On Tue, Apr 1, 2025 at 2:21 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > Hmm,
> >
> > write_seqcount_begin(&utask->ri_seqcount);
> >
> > for_each_ret_instance_rcu(ri, utask->return_instances)
> > hprobe_expire(&ri->hprobe, false);
> >
> > write_seqcount_end(&utask->ri_seqcount);
> >
> > How big can that loop be?
> >
> > Of course, we could just say not to use uprobes on PREEMPT_RT kernels?
> > Otherwise, they could cause an unspecified latency.
>
> There can't be more than 64 nested uretprobes, so it will be (in a
> very-very unlikely event) at most 64 items. And that hprobe_expire()
> operation is very fast. So I don't think latency is a big concert
> here.
I still didn't read this code, but after the quick glance I don't
understand why do we actually need utask->ri_seqcount.
The "writer" ri_timer() can't race with itself, right?
The "reader" free_ret_instance() uses raw_seqcount_try_begin() without
the "retry" logic.
I have no idea if this logic is correct or not, but it seems that (apart
from the necessary barriers) we could use the utask->ri_timer_is_running
boolean instead with the same effect? Set/cleared in ri_timer(), checked
in free_ret_instance().
I must have missed something...
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 9:10 ` Oleg Nesterov
@ 2025-04-02 10:54 ` Sebastian Sewior
2025-04-02 11:20 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-02 10:54 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 2025-04-02 11:10:45 [+0200], Oleg Nesterov wrote:
> Add Peter.
>
> I never understood why __seqprop_preemptible() returns false.
> Stupid question, perhaps
>
> --- x/include/linux/seqlock.h
> +++ x/include/linux/seqlock.h
> @@ -213,12 +213,11 @@ static inline unsigned __seqprop_sequenc
>
> static inline bool __seqprop_preemptible(const seqcount_t *s)
> {
> - return false;
> + return true;
> }
>
> static inline void __seqprop_assert(const seqcount_t *s)
> {
> - lockdep_assert_preemption_disabled();
> }
>
> #define __SEQ_RT IS_ENABLED(CONFIG_PREEMPT_RT)
>
> makes more sense?
>
> Then we can remove the no longer necessary preempt_disable()'s
> before write_seqcount_begin() in other users of seqcount_t.
This depends on locktype that is coupled with the seqcount.
If the lock disables preemption and relies on it then it must be somehow
enforced on PREEMPT_RT or rely on the lock+unlock mechnanism to avoid
deadlocks. Also it needs to be ensured that you don't have two writer
since preemption is allowed.
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 10:33 ` Oleg Nesterov
@ 2025-04-02 10:57 ` Sebastian Sewior
2025-04-02 11:23 ` Oleg Nesterov
2025-04-02 16:15 ` Andrii Nakryiko
1 sibling, 1 reply; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-02 10:57 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrii Nakryiko, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On 2025-04-02 12:33:55 [+0200], Oleg Nesterov wrote:
>
> I still didn't read this code, but after the quick glance I don't
> understand why do we actually need utask->ri_seqcount.
Same here but on it…
> The "writer" ri_timer() can't race with itself, right?
On PREEMPT_RT the timer could be preempted by a task with higher
priority and invoke hprobe_expire() somewhere else.
> The "reader" free_ret_instance() uses raw_seqcount_try_begin() without
> the "retry" logic.
>
> I have no idea if this logic is correct or not, but it seems that (apart
> from the necessary barriers) we could use the utask->ri_timer_is_running
> boolean instead with the same effect? Set/cleared in ri_timer(), checked
> in free_ret_instance().
>
> I must have missed something...
Let me try to study this before I can make a statement…
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 10:54 ` Sebastian Sewior
@ 2025-04-02 11:20 ` Oleg Nesterov
2025-04-02 11:31 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 11:20 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 04/02, Sebastian Sewior wrote:
>
> On 2025-04-02 11:10:45 [+0200], Oleg Nesterov wrote:
> > Add Peter.
> >
> > I never understood why __seqprop_preemptible() returns false.
> > Stupid question, perhaps
> >
> > --- x/include/linux/seqlock.h
> > +++ x/include/linux/seqlock.h
> > @@ -213,12 +213,11 @@ static inline unsigned __seqprop_sequenc
> >
> > static inline bool __seqprop_preemptible(const seqcount_t *s)
> > {
> > - return false;
> > + return true;
> > }
> >
> > static inline void __seqprop_assert(const seqcount_t *s)
> > {
> > - lockdep_assert_preemption_disabled();
> > }
> >
> > #define __SEQ_RT IS_ENABLED(CONFIG_PREEMPT_RT)
> >
> > makes more sense?
> >
> > Then we can remove the no longer necessary preempt_disable()'s
> > before write_seqcount_begin() in other users of seqcount_t.
>
> This depends on locktype that is coupled with the seqcount.
Yes.
But seqcount_t doesn't have the "internal" lock. Unlike other
seqcount's defined by SEQCOUNT_LOCKNAME().
> If the lock disables preemption and relies on it then it must be somehow
> enforced on PREEMPT_RT or rely on the lock+unlock mechnanism to avoid
> deadlocks. Also it needs to be ensured that you don't have two writer
> since preemption is allowed.
Sorry, I don't understand.
Again, seqcount_t differs, it can't do lock+unlock like (say)
seqcount_spinlock_t.
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 10:57 ` Sebastian Sewior
@ 2025-04-02 11:23 ` Oleg Nesterov
2025-04-02 12:13 ` Sebastian Sewior
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 11:23 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Andrii Nakryiko, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On 04/02, Sebastian Sewior wrote:
>
> On 2025-04-02 12:33:55 [+0200], Oleg Nesterov wrote:
> >
> > The "writer" ri_timer() can't race with itself, right?
>
> On PREEMPT_RT the timer could be preempted by a task with higher
> priority and invoke hprobe_expire() somewhere else.
This is clear, but ri_timer() still can not race with itself, no?
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 11:20 ` Oleg Nesterov
@ 2025-04-02 11:31 ` Oleg Nesterov
2025-04-02 12:06 ` Sebastian Sewior
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 11:31 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 04/02, Oleg Nesterov wrote:
>
> On 04/02, Sebastian Sewior wrote:
> >
> > On 2025-04-02 11:10:45 [+0200], Oleg Nesterov wrote:
> > > Add Peter.
> > >
> > > I never understood why __seqprop_preemptible() returns false.
> > > Stupid question, perhaps
> > >
> > > --- x/include/linux/seqlock.h
> > > +++ x/include/linux/seqlock.h
> > > @@ -213,12 +213,11 @@ static inline unsigned __seqprop_sequenc
> > >
> > > static inline bool __seqprop_preemptible(const seqcount_t *s)
> > > {
> > > - return false;
> > > + return true;
> > > }
> > >
> > > static inline void __seqprop_assert(const seqcount_t *s)
> > > {
> > > - lockdep_assert_preemption_disabled();
> > > }
> > >
> > > #define __SEQ_RT IS_ENABLED(CONFIG_PREEMPT_RT)
> > >
> > > makes more sense?
> > >
> > > Then we can remove the no longer necessary preempt_disable()'s
> > > before write_seqcount_begin() in other users of seqcount_t.
> >
> > This depends on locktype that is coupled with the seqcount.
>
> Yes.
>
> But seqcount_t doesn't have the "internal" lock. Unlike other
> seqcount's defined by SEQCOUNT_LOCKNAME().
>
> > If the lock disables preemption and relies on it then it must be somehow
> > enforced on PREEMPT_RT or rely on the lock+unlock mechnanism to avoid
> > deadlocks. Also it needs to be ensured that you don't have two writer
> > since preemption is allowed.
>
> Sorry, I don't understand.
>
> Again, seqcount_t differs, it can't do lock+unlock like (say)
> seqcount_spinlock_t.
IOW.
I understand that seqcount_t is not RT-friendly, but why exactly do
you think the patch above can make the things worse?
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 11:31 ` Oleg Nesterov
@ 2025-04-02 12:06 ` Sebastian Sewior
2025-04-02 12:12 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-02 12:06 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 2025-04-02 13:31:43 [+0200], Oleg Nesterov wrote:
> > > > Then we can remove the no longer necessary preempt_disable()'s
> > > > before write_seqcount_begin() in other users of seqcount_t.
> > >
> > > This depends on locktype that is coupled with the seqcount.
> >
> > Yes.
> >
> > But seqcount_t doesn't have the "internal" lock. Unlike other
> > seqcount's defined by SEQCOUNT_LOCKNAME().
> >
> > > If the lock disables preemption and relies on it then it must be somehow
> > > enforced on PREEMPT_RT or rely on the lock+unlock mechnanism to avoid
> > > deadlocks. Also it needs to be ensured that you don't have two writer
> > > since preemption is allowed.
> >
> > Sorry, I don't understand.
> >
> > Again, seqcount_t differs, it can't do lock+unlock like (say)
> > seqcount_spinlock_t.
>
> IOW.
>
> I understand that seqcount_t is not RT-friendly, but why exactly do
> you think the patch above can make the things worse?
We wouldn't notice such a case.
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 12:06 ` Sebastian Sewior
@ 2025-04-02 12:12 ` Oleg Nesterov
2025-04-02 12:16 ` Sebastian Sewior
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 12:12 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 04/02, Sebastian Sewior wrote:
>
> On 2025-04-02 13:31:43 [+0200], Oleg Nesterov wrote:
> >
> > IOW.
> >
> > I understand that seqcount_t is not RT-friendly, but why exactly do
> > you think the patch above can make the things worse?
>
> We wouldn't notice such a case.
Sebastian, could you spell please?
What case we wouldn't notice?
With this patch write_seqcount_begin(seqcount_t) will notice that
seqprop_preemptible() is true and do preempt_disable() itself.
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 11:23 ` Oleg Nesterov
@ 2025-04-02 12:13 ` Sebastian Sewior
2025-04-02 12:18 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-02 12:13 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrii Nakryiko, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On 2025-04-02 13:23:09 [+0200], Oleg Nesterov wrote:
> On 04/02, Sebastian Sewior wrote:
> >
> > On 2025-04-02 12:33:55 [+0200], Oleg Nesterov wrote:
> > >
> > > The "writer" ri_timer() can't race with itself, right?
> >
> > On PREEMPT_RT the timer could be preempted by a task with higher
> > priority and invoke hprobe_expire() somewhere else.
>
> This is clear, but ri_timer() still can not race with itself, no?
No, ri_timer() can not race against itself.
The preempted ri_timer() could stall a read_seqcount_begin(). Also not
the case. The API tries to prevent such things.
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 12:12 ` Oleg Nesterov
@ 2025-04-02 12:16 ` Sebastian Sewior
2025-04-02 13:56 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-02 12:16 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 2025-04-02 14:12:28 [+0200], Oleg Nesterov wrote:
> On 04/02, Sebastian Sewior wrote:
> >
> > On 2025-04-02 13:31:43 [+0200], Oleg Nesterov wrote:
> > >
> > > IOW.
> > >
> > > I understand that seqcount_t is not RT-friendly, but why exactly do
> > > you think the patch above can make the things worse?
> >
> > We wouldn't notice such a case.
>
> Sebastian, could you spell please?
>
> What case we wouldn't notice?
I'm sorry. It wouldn't notice that preemption isn't disabled and yell.
> With this patch write_seqcount_begin(seqcount_t) will notice that
> seqprop_preemptible() is true and do preempt_disable() itself.
Yes, but that we don't want. This would disable preemption for the whole
section and not allow anything on PREEMPT_RT what would be possible
otherwise. Like acquire a spinlock_t or so.
Yes, none of this would affect hprobe_expire().
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 12:13 ` Sebastian Sewior
@ 2025-04-02 12:18 ` Oleg Nesterov
2025-04-02 12:24 ` Sebastian Sewior
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 12:18 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Andrii Nakryiko, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On 04/02, Sebastian Sewior wrote:
>
> On 2025-04-02 13:23:09 [+0200], Oleg Nesterov wrote:
> > On 04/02, Sebastian Sewior wrote:
> > >
> > > On 2025-04-02 12:33:55 [+0200], Oleg Nesterov wrote:
> > > >
> > > > The "writer" ri_timer() can't race with itself, right?
> > >
> > > On PREEMPT_RT the timer could be preempted by a task with higher
> > > priority and invoke hprobe_expire() somewhere else.
> >
> > This is clear, but ri_timer() still can not race with itself, no?
>
> No, ri_timer() can not race against itself.
Okay,
> The preempted ri_timer() could stall a read_seqcount_begin().
Again, nobody use read_seqcount_begin(utask->ri_seqcount).
free_ret_instance() uses raw_seqcount_try_begin(utask->ri_seqcount),
which, since ri_seqcount is seqcount_t, is just smp_load_acquire().
This can't stall.
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-01 22:01 ` Andrii Nakryiko
2025-04-02 10:33 ` Oleg Nesterov
@ 2025-04-02 12:21 ` Sebastian Sewior
1 sibling, 0 replies; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-02 12:21 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Steven Rostedt, Peter Ziljstra, Alexei Starovoitov, Oleg Nesterov,
Andrii Nakryiko, bpf, Jiri Olsa, Masami Hiramatsu
On 2025-04-01 15:01:55 [-0700], Andrii Nakryiko wrote:
> So, write_seqcount_begin()'s documentation states:
>
> /**
> * write_seqcount_begin() - start a seqcount_t write side critical section
> * @s: Pointer to seqcount_t or any of the seqcount_LOCKNAME_t variants
> *
> * Context: sequence counter write side sections must be serialized and
> * non-preemptible. Preemption will be automatically disabled if and
> * only if the seqcount write serialization lock is associated, and
> * preemptible. If readers can be invoked from hardirq or softirq
> * context, interrupts or bottom halves must be respectively disabled.
> */
>
>
> In our case we cannot have readers invoked from hardirq/softirq. It's
> the writer that can be invoked from hardirq (timer).
>
> So what did I do incorrectly here? Should I still disable hardirqs
> just to satisfy that seqprop_assert()?
You don't have a lock associated with the seqcount. The writer side
ensure that there can only be one writer because the timer can only fire
once.
The reader side uses raw_seqcount_try_begin() so it does not require the
writer to make progress. Meaning if it preempts the writer then it will
continue and make progress.
If these conditions remain, why not
s/write_seqcount_begin/raw_write_seqcount_begin// ? This would avoid the
checks as per "I know best" and I don't have a reason why a lock would
make sense to exclude multiple writers.
> Peter, any opinion?
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 12:18 ` Oleg Nesterov
@ 2025-04-02 12:24 ` Sebastian Sewior
2025-04-02 14:12 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-02 12:24 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrii Nakryiko, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On 2025-04-02 14:18:51 [+0200], Oleg Nesterov wrote:
> On 04/02, Sebastian Sewior wrote:
I need to tell mutt to replace my name in case it is misspelled.
> > The preempted ri_timer() could stall a read_seqcount_begin().
>
> Again, nobody use read_seqcount_begin(utask->ri_seqcount).
>
> free_ret_instance() uses raw_seqcount_try_begin(utask->ri_seqcount),
> which, since ri_seqcount is seqcount_t, is just smp_load_acquire().
> This can't stall.
Yes. This would work for here just to skip the check because of all
details that are hard to express. Therefore I suggested to use
raw_write_seqcount_begin() instead of write_seqcount_begin() in
20250402122158.j_8VoHQ-@linutronix.de. Would that work?
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 12:16 ` Sebastian Sewior
@ 2025-04-02 13:56 ` Oleg Nesterov
2025-04-02 14:23 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 13:56 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 04/02, Sebastian Sewior wrote:
>
> On 2025-04-02 14:12:28 [+0200], Oleg Nesterov wrote:
> > On 04/02, Sebastian Sewior wrote:
> > >
> > > On 2025-04-02 13:31:43 [+0200], Oleg Nesterov wrote:
> > > >
> > > > IOW.
> > > >
> > > > I understand that seqcount_t is not RT-friendly, but why exactly do
> > > > you think the patch above can make the things worse?
> > >
> > > We wouldn't notice such a case.
> >
> > Sebastian, could you spell please?
> >
> > What case we wouldn't notice?
>
> I'm sorry. It wouldn't notice that preemption isn't disabled and yell.
>
> > With this patch write_seqcount_begin(seqcount_t) will notice that
> > seqprop_preemptible() is true and do preempt_disable() itself.
>
> Yes, but that we don't want. This would disable preemption for the whole
> section and not allow anything on PREEMPT_RT what would be possible
> otherwise. Like acquire a spinlock_t or so.
Still can't understand...
Currently __seqprop_assert() does lockdep_assert_preemption_disabled().
This means that at least with PREEMPT_RT=y preemption must be disabled
even before write_seqcount_begin(seqcount_t).
That is why (I guess) for example i_size_write() does
preempt_disable() before write_seqcount_begin(&inode->i_size_seqcount).
> Yes, none of this would affect hprobe_expire().
Yes.
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 12:24 ` Sebastian Sewior
@ 2025-04-02 14:12 ` Oleg Nesterov
2025-04-03 7:37 ` Sebastian Sewior
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 14:12 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Andrii Nakryiko, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On 04/02, Sebastian Sewior wrote:
>
> On 2025-04-02 14:18:51 [+0200], Oleg Nesterov wrote:
> > On 04/02, Sebastian Sewior wrote:
>
> I need to tell mutt to replace my name in case it is misspelled.
Hmm... have I misspelled your name somewhere?
If yes - my apologies.
> > > The preempted ri_timer() could stall a read_seqcount_begin().
> >
> > Again, nobody use read_seqcount_begin(utask->ri_seqcount).
> >
> > free_ret_instance() uses raw_seqcount_try_begin(utask->ri_seqcount),
> > which, since ri_seqcount is seqcount_t, is just smp_load_acquire().
> > This can't stall.
>
> Yes. This would work for here just to skip the check because of all
> details that are hard to express. Therefore I suggested to use
> raw_write_seqcount_begin() instead of write_seqcount_begin() in
> 20250402122158.j_8VoHQ-@linutronix.de. Would that work?
If this can work, then let me repeat: why can't we turn ->ri_seqcount
into a boolean?
That was my question. I don't understand the purpose of
"seqcount_t ri_seqcount" regardless of the reported problem.
This "WARNING: CPU: 0 PID: 16 at include/linux/seqlock.h:221" is another
issue.
Again, I must have missed something.
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 13:56 ` Oleg Nesterov
@ 2025-04-02 14:23 ` Oleg Nesterov
2025-04-03 9:08 ` Sebastian Sewior
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 14:23 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
OK, it seems we can't understand each other. Probably my fault.
So, it think that
static inline bool __seqprop_preemptible(const seqcount_t *s)
{
return false;
}
should return true. Well because it is preemptible just like
SEQCOUNT_LOCKNAME(mutex) or, if PREEMPT_RT=y, SEQCOUNT_LOCKNAME(spinlock).
Am I wrong?
Oleg.
On 04/02, Oleg Nesterov wrote:
>
> On 04/02, Sebastian Sewior wrote:
> >
> > On 2025-04-02 14:12:28 [+0200], Oleg Nesterov wrote:
> > > On 04/02, Sebastian Sewior wrote:
> > > >
> > > > On 2025-04-02 13:31:43 [+0200], Oleg Nesterov wrote:
> > > > >
> > > > > IOW.
> > > > >
> > > > > I understand that seqcount_t is not RT-friendly, but why exactly do
> > > > > you think the patch above can make the things worse?
> > > >
> > > > We wouldn't notice such a case.
> > >
> > > Sebastian, could you spell please?
> > >
> > > What case we wouldn't notice?
> >
> > I'm sorry. It wouldn't notice that preemption isn't disabled and yell.
> >
> > > With this patch write_seqcount_begin(seqcount_t) will notice that
> > > seqprop_preemptible() is true and do preempt_disable() itself.
> >
> > Yes, but that we don't want. This would disable preemption for the whole
> > section and not allow anything on PREEMPT_RT what would be possible
> > otherwise. Like acquire a spinlock_t or so.
>
> Still can't understand...
>
> Currently __seqprop_assert() does lockdep_assert_preemption_disabled().
> This means that at least with PREEMPT_RT=y preemption must be disabled
> even before write_seqcount_begin(seqcount_t).
>
> That is why (I guess) for example i_size_write() does
> preempt_disable() before write_seqcount_begin(&inode->i_size_seqcount).
>
> > Yes, none of this would affect hprobe_expire().
>
> Yes.
>
> Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 10:33 ` Oleg Nesterov
2025-04-02 10:57 ` Sebastian Sewior
@ 2025-04-02 16:15 ` Andrii Nakryiko
2025-04-02 16:57 ` Oleg Nesterov
1 sibling, 1 reply; 28+ messages in thread
From: Andrii Nakryiko @ 2025-04-02 16:15 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Steven Rostedt, Peter Ziljstra, Alexei Starovoitov,
Andrii Nakryiko, bpf, Sebastian Sewior, Jiri Olsa,
Masami Hiramatsu
On Wed, Apr 2, 2025 at 3:34 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 04/01, Andrii Nakryiko wrote:
> >
> > On Tue, Apr 1, 2025 at 2:21 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > Hmm,
> > >
> > > write_seqcount_begin(&utask->ri_seqcount);
> > >
> > > for_each_ret_instance_rcu(ri, utask->return_instances)
> > > hprobe_expire(&ri->hprobe, false);
> > >
> > > write_seqcount_end(&utask->ri_seqcount);
> > >
> > > How big can that loop be?
> > >
> > > Of course, we could just say not to use uprobes on PREEMPT_RT kernels?
> > > Otherwise, they could cause an unspecified latency.
> >
> > There can't be more than 64 nested uretprobes, so it will be (in a
> > very-very unlikely event) at most 64 items. And that hprobe_expire()
> > operation is very fast. So I don't think latency is a big concert
> > here.
>
> I still didn't read this code, but after the quick glance I don't
> understand why do we actually need utask->ri_seqcount.
>
> The "writer" ri_timer() can't race with itself, right?
>
> The "reader" free_ret_instance() uses raw_seqcount_try_begin() without
> the "retry" logic.
>
> I have no idea if this logic is correct or not, but it seems that (apart
> from the necessary barriers) we could use the utask->ri_timer_is_running
> boolean instead with the same effect? Set/cleared in ri_timer(), checked
> in free_ret_instance().
"Apart from the necessary barriers" is exactly what I didn't want to
deal with, tbh... Which is why I went with (ab)using seqcount lock.
Other than that, yes, the reader logic is very simple and just wants
to make sure that ri_timer (writer) couldn't have seen the
return_instance we are about to immediately reuse (which would pose a
problem).
>
> I must have missed something...
>
> Oleg.
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 16:15 ` Andrii Nakryiko
@ 2025-04-02 16:57 ` Oleg Nesterov
0 siblings, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-02 16:57 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Steven Rostedt, Peter Ziljstra, Alexei Starovoitov,
Andrii Nakryiko, bpf, Sebastian Sewior, Jiri Olsa,
Masami Hiramatsu
On 04/02, Andrii Nakryiko wrote:
>
> On Wed, Apr 2, 2025 at 3:34 AM Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > I have no idea if this logic is correct or not, but it seems that (apart
> > from the necessary barriers) we could use the utask->ri_timer_is_running
> > boolean instead with the same effect? Set/cleared in ri_timer(), checked
> > in free_ret_instance().
>
> "Apart from the necessary barriers" is exactly what I didn't want to
> deal with, tbh... Which is why I went with (ab)using seqcount lock.
>
> Other than that, yes, the reader logic is very simple and just wants
> to make sure that ri_timer (writer) couldn't have seen the
> return_instance we are about to immediately reuse (which would pose a
> problem).
Ah. This answers my question about the motivation to use seqcount_t,
thanks.
I am not going to question your decision, but perhaps this deserves a
comment, it is not immediately clear from reading this code...
Thanks!
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 14:12 ` Oleg Nesterov
@ 2025-04-03 7:37 ` Sebastian Sewior
2025-04-03 12:27 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-03 7:37 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Andrii Nakryiko, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On 2025-04-02 16:12:46 [+0200], Oleg Nesterov wrote:
> On 04/02, Sebastian Sewior wrote:
> >
> > On 2025-04-02 14:18:51 [+0200], Oleg Nesterov wrote:
> > > On 04/02, Sebastian Sewior wrote:
> >
> > I need to tell mutt to replace my name in case it is misspelled.
>
> Hmm... have I misspelled your name somewhere?
>
> If yes - my apologies.
Don't worry. You didn't start it, it is just I noticed it.
> > > > The preempted ri_timer() could stall a read_seqcount_begin().
> > >
> > > Again, nobody use read_seqcount_begin(utask->ri_seqcount).
> > >
> > > free_ret_instance() uses raw_seqcount_try_begin(utask->ri_seqcount),
> > > which, since ri_seqcount is seqcount_t, is just smp_load_acquire().
> > > This can't stall.
> >
> > Yes. This would work for here just to skip the check because of all
> > details that are hard to express. Therefore I suggested to use
> > raw_write_seqcount_begin() instead of write_seqcount_begin() in
> > 20250402122158.j_8VoHQ-@linutronix.de. Would that work?
>
> If this can work, then let me repeat: why can't we turn ->ri_seqcount
> into a boolean?
I just stumbled here due to the warning. Now that you ask the question,
it is used a bool in the current construct. So yes, I also don't see
why.
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-02 14:23 ` Oleg Nesterov
@ 2025-04-03 9:08 ` Sebastian Sewior
2025-04-03 12:11 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Sebastian Sewior @ 2025-04-03 9:08 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 2025-04-02 16:23:50 [+0200], Oleg Nesterov wrote:
> OK, it seems we can't understand each other. Probably my fault.
>
> So, it think that
>
> static inline bool __seqprop_preemptible(const seqcount_t *s)
> {
> return false;
> }
>
> should return true. Well because it is preemptible just like
> SEQCOUNT_LOCKNAME(mutex) or, if PREEMPT_RT=y, SEQCOUNT_LOCKNAME(spinlock).
>
> Am I wrong?
A seqcount_t has no lock associated with so it is not preemptible. It
always refers to the lock. This should come from extern so it not only
disables preemption but also ensures that there can only be one writer.
The "disabling preemption" is only there to ensure progress is made in
reasonable time: You should not get preempted in your write section. If
the writer gets preempted then nothing bad (as in *boom*) will happen
because you ensured that you have only one writer can enter the section.
In that scenario the reader will spin on the counter until the writer
gets back on the CPU and completes the write section and while doing so
wasting resources. No boom, just wasting resources.
If you make __seqprop_preemptible() return true then
write_seqcount_begin() for seqcount_t will disable preemption on its
own. You could now remove all preempt_disable() around its callers. So
far so good as everyone should have one.
The problem here is that for !RT only seqcount_mutex_t gets preemption
disabled. For RT none of the seqcount_t variants get preemption disabled
but rely on lock+unlock mechanism to ensure that progress is made.
With this change it would also disable preemption for RT and I don't
know if it is is always the smart thing to do. Usually if the splats
pop up the users can be pointed to something else. Then we have also the
"limited" API where you can't use spinlock_t within the seqcount because
it would break PREEMPT_RT.
I've been now verbose to make up for the other few times where I was
brief :)
> Oleg.
Sebastian
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-03 9:08 ` Sebastian Sewior
@ 2025-04-03 12:11 ` Oleg Nesterov
2025-04-03 12:43 ` Oleg Nesterov
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-03 12:11 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 04/03, Sebastian Sewior wrote:
>
> On 2025-04-02 16:23:50 [+0200], Oleg Nesterov wrote:
> > OK, it seems we can't understand each other. Probably my fault.
> >
> > So, it think that
> >
> > static inline bool __seqprop_preemptible(const seqcount_t *s)
> > {
> > return false;
> > }
> >
> > should return true. Well because it is preemptible just like
> > SEQCOUNT_LOCKNAME(mutex) or, if PREEMPT_RT=y, SEQCOUNT_LOCKNAME(spinlock).
> >
> > Am I wrong?
>
> A seqcount_t has no lock associated with so it is not preemptible.
Well, I still disagree...
> It
> always refers to the lock. This should come from extern so it not only
> disables preemption but also ensures that there can only be one writer.
Yes, but
> The "disabling preemption" is only there to ensure progress is made in
> reasonable time: You should not get preempted in your write section. If
> the writer gets preempted then nothing bad (as in *boom*) will happen
> because you ensured that you have only one writer can enter the section.
> In that scenario the reader will spin on the counter until the writer
> gets back on the CPU and completes the write section and while doing so
> wasting resources. No boom, just wasting resources.
This can lead to deadlock.
Suppose we have a seqcount_t SEQ, and we ensure that it has a single
writer. Say, this SEQ is protected by mutex.
The writer does write_seqcount_begin(&SEQ) on a UP machine, and it is
preeempted by a real-time process which does read_seqcount_begin(&SEQ).
The reader will spin "forever".
> If you make __seqprop_preemptible() return true then
> write_seqcount_begin() for seqcount_t will disable preemption on its
> own. You could now remove all preempt_disable() around its callers. So
> far so good as everyone should have one.
Yes,
> The problem here is that for !RT only seqcount_mutex_t gets preemption
> disabled.
Sure, for !RT spinlock/rwlock disable preemption,
> For RT none of the seqcount_t variants get preemption disabled
> but rely on lock+unlock mechanism to ensure that progress is made.
Yes I know, but seqcount_t doesn't have the associated lock.
> With this change it would also disable preemption for RT and I don't
> know if it is is always the smart thing to do.
I don't know either.
But the current logic doesn't look right to me.
From include/linux/seqlock.h
SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t, false, raw_spin)
SEQCOUNT_LOCKNAME(spinlock, spinlock_t, __SEQ_RT, spin)
SEQCOUNT_LOCKNAME(rwlock, rwlock_t, __SEQ_RT, read)
SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
so for these seqcount's seqprop_preemptible() returns false only if
the associated lock disables preemption. raw_spinlock always does this
spinlock/rwlock depend on PREEMPT_RT.
But seqcount_t doesn't have the associated lock, so I think that
seqprop_preemptible(seqcount_t) should return true.
But OK, I won't insist. At least it seems we more or less understand
each other ;)
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-03 7:37 ` Sebastian Sewior
@ 2025-04-03 12:27 ` Oleg Nesterov
2025-04-03 16:04 ` Andrii Nakryiko
0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-03 12:27 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Andrii Nakryiko, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On 04/03, Sebastian Sewior wrote:
>
> On 2025-04-02 16:12:46 [+0200], Oleg Nesterov wrote:
> > >
> > > Yes. This would work for here just to skip the check because of all
> > > details that are hard to express. Therefore I suggested to use
> > > raw_write_seqcount_begin() instead of write_seqcount_begin() in
> > > 20250402122158.j_8VoHQ-@linutronix.de. Would that work?
> >
> > If this can work, then let me repeat: why can't we turn ->ri_seqcount
> > into a boolean?
>
> I just stumbled here due to the warning. Now that you ask the question,
> it is used a bool in the current construct. So yes, I also don't see
> why.
Well, Andrii has already explained why he decided to abuse seqcount_t,
to avoid the explicti barriers in this code... I won't argue.
So, just in case, I agree that your suggestion to use
raw_write_seqcount_begin/end should obviously fix the problem.
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-03 12:11 ` Oleg Nesterov
@ 2025-04-03 12:43 ` Oleg Nesterov
0 siblings, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2025-04-03 12:43 UTC (permalink / raw)
To: Sebastian Sewior
Cc: Alexei Starovoitov, Peter Ziljstra, Andrii Nakryiko, bpf,
Steven Rostedt, Jiri Olsa, Masami Hiramatsu
On 04/03, Oleg Nesterov wrote:
>
> From include/linux/seqlock.h
>
> SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t, false, raw_spin)
> SEQCOUNT_LOCKNAME(spinlock, spinlock_t, __SEQ_RT, spin)
> SEQCOUNT_LOCKNAME(rwlock, rwlock_t, __SEQ_RT, read)
> SEQCOUNT_LOCKNAME(mutex, struct mutex, true, mutex)
>
> so for these seqcount's seqprop_preemptible() returns false only if
> the associated lock disables preemption. raw_spinlock always does this
> spinlock/rwlock depend on PREEMPT_RT.
Sorry, I wasn't clear... seqprop_preemptible() always returns false
on PREEMPT_RT, I meant the "preemptible" check in seqprop_sequence().
Oleg.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: uprobe splat in PREEMP_RT
2025-04-03 12:27 ` Oleg Nesterov
@ 2025-04-03 16:04 ` Andrii Nakryiko
0 siblings, 0 replies; 28+ messages in thread
From: Andrii Nakryiko @ 2025-04-03 16:04 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Sebastian Sewior, Steven Rostedt, Peter Ziljstra,
Alexei Starovoitov, Andrii Nakryiko, bpf, Jiri Olsa,
Masami Hiramatsu
On Thu, Apr 3, 2025 at 5:28 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 04/03, Sebastian Sewior wrote:
> >
> > On 2025-04-02 16:12:46 [+0200], Oleg Nesterov wrote:
> > > >
> > > > Yes. This would work for here just to skip the check because of all
> > > > details that are hard to express. Therefore I suggested to use
> > > > raw_write_seqcount_begin() instead of write_seqcount_begin() in
> > > > 20250402122158.j_8VoHQ-@linutronix.de. Would that work?
> > >
> > > If this can work, then let me repeat: why can't we turn ->ri_seqcount
> > > into a boolean?
> >
> > I just stumbled here due to the warning. Now that you ask the question,
> > it is used a bool in the current construct. So yes, I also don't see
> > why.
>
> Well, Andrii has already explained why he decided to abuse seqcount_t,
> to avoid the explicti barriers in this code... I won't argue.
>
> So, just in case, I agree that your suggestion to use
> raw_write_seqcount_begin/end should obviously fix the problem.
>
Sounds good, I'll switch to raw_ variants and will point out more
explicitly in the comment that I rely on seqcount barriers and
visibility rules.
> Oleg.
>
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2025-04-03 16:04 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-01 21:04 uprobe splat in PREEMP_RT Alexei Starovoitov
2025-04-01 21:22 ` Steven Rostedt
2025-04-01 22:01 ` Andrii Nakryiko
2025-04-02 10:33 ` Oleg Nesterov
2025-04-02 10:57 ` Sebastian Sewior
2025-04-02 11:23 ` Oleg Nesterov
2025-04-02 12:13 ` Sebastian Sewior
2025-04-02 12:18 ` Oleg Nesterov
2025-04-02 12:24 ` Sebastian Sewior
2025-04-02 14:12 ` Oleg Nesterov
2025-04-03 7:37 ` Sebastian Sewior
2025-04-03 12:27 ` Oleg Nesterov
2025-04-03 16:04 ` Andrii Nakryiko
2025-04-02 16:15 ` Andrii Nakryiko
2025-04-02 16:57 ` Oleg Nesterov
2025-04-02 12:21 ` Sebastian Sewior
2025-04-02 9:10 ` Oleg Nesterov
2025-04-02 10:54 ` Sebastian Sewior
2025-04-02 11:20 ` Oleg Nesterov
2025-04-02 11:31 ` Oleg Nesterov
2025-04-02 12:06 ` Sebastian Sewior
2025-04-02 12:12 ` Oleg Nesterov
2025-04-02 12:16 ` Sebastian Sewior
2025-04-02 13:56 ` Oleg Nesterov
2025-04-02 14:23 ` Oleg Nesterov
2025-04-03 9:08 ` Sebastian Sewior
2025-04-03 12:11 ` Oleg Nesterov
2025-04-03 12:43 ` Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox