* [PATCH 0/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
@ 2025-09-28 16:19 Oleg Nesterov
2025-09-28 16:20 ` [PATCH 1/1] " Oleg Nesterov
` (5 more replies)
0 siblings, 6 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-09-28 16:19 UTC (permalink / raw)
To: Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon
Cc: linux-kernel
Hello,
I already tried to discuss this about 2 years ago, but nobody replied.
Let me make another attempt.
1/1 is the trivial doc fix, but what do you think about "RFC 2/1" ?
Does it make any sense?
Oleg.
^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 1/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
  2025-09-28 16:19 [PATCH 0/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry Oleg Nesterov
@ 2025-09-28 16:20 ` Oleg Nesterov
  2025-10-01 18:21   ` Waiman Long
  2025-10-21 10:35   ` [tip: locking/core] " tip-bot2 for Oleg Nesterov
  2025-09-28 16:20 ` [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? Oleg Nesterov
  ` (4 subsequent siblings)
  5 siblings, 2 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-09-28 16:20 UTC (permalink / raw)
To: Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
    Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon
Cc: linux-kernel

The comments and pseudo code in Documentation/locking/seqlock.rst are wrong:

	int seq = 0;
	do {
		read_seqbegin_or_lock(&foo_seqlock, &seq);

		/* ... [[read-side critical section]] ... */

	} while (need_seqretry(&foo_seqlock, seq));

read_seqbegin_or_lock() always returns with an even "seq" and need_seqretry()
doesn't change this counter. This means that seq is always even and thus the
locking pass is simply impossible.

IOW, "_or_lock" has no effect and this code doesn't differ from

	do {
		seq = read_seqbegin(&foo_seqlock);

		/* ... [[read-side critical section]] ... */

	} while (read_seqretry(&foo_seqlock, seq));

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 Documentation/locking/seqlock.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index ec6411d02ac8..167d442d3c7f 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -218,13 +218,14 @@ Read path, three categories:
     according to a passed marker. This is used to avoid lockless readers
     starvation (too much retry loops) in case of a sharp spike in write
     activity. First, a lockless read is tried (even marker passed). If
-    that trial fails (odd sequence counter is returned, which is used as
-    the next iteration marker), the lockless read is transformed to a
-    full locking read and no retry loop is necessary::
+    that trial fails (sequence counter doesn't match), make the marker
+    odd for the next iteration, the lockless read is transformed to a
+    full locking read and no retry loop is necessary, for example::
 
         /* marker; even initialization */
-        int seq = 0;
+        int seq = 1;
         do {
+            seq++; /* 2 on the 1st/lockless path, otherwise odd */
             read_seqbegin_or_lock(&foo_seqlock, &seq);
 
             /* ... [[read-side critical section]] ... */
-- 
2.25.1.362.g51ebf55
* Re: [PATCH 1/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
  2025-09-28 16:20 ` [PATCH 1/1] " Oleg Nesterov
@ 2025-10-01 18:21   ` Waiman Long
  2025-10-01 19:06     ` Oleg Nesterov
  2025-10-02 11:01     ` Oleg Nesterov
  2025-10-21 10:35   ` [tip: locking/core] " tip-bot2 for Oleg Nesterov
  1 sibling, 2 replies; 74+ messages in thread
From: Waiman Long @ 2025-10-01 18:21 UTC (permalink / raw)
To: Oleg Nesterov, Boqun Feng, David Howells, Ingo Molnar,
    Li RongQing, Linus Torvalds, Peter Zijlstra, Will Deacon
Cc: linux-kernel

On 9/28/25 12:20 PM, Oleg Nesterov wrote:
> The comments and pseudo code in Documentation/locking/seqlock.rst are wrong:
>
> 	int seq = 0;
> 	do {
> 		read_seqbegin_or_lock(&foo_seqlock, &seq);
>
> 		/* ... [[read-side critical section]] ... */
>
> 	} while (need_seqretry(&foo_seqlock, seq));
>
> read_seqbegin_or_lock() always returns with an even "seq" and need_seqretry()
> doesn't change this counter. This means that seq is always even and thus the
> locking pass is simply impossible.
>
> IOW, "_or_lock" has no effect and this code doesn't differ from
>
> 	do {
> 		seq = read_seqbegin(&foo_seqlock);
>
> 		/* ... [[read-side critical section]] ... */
>
> 	} while (read_seqretry(&foo_seqlock, seq));
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  Documentation/locking/seqlock.rst | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
> index ec6411d02ac8..167d442d3c7f 100644
> --- a/Documentation/locking/seqlock.rst
> +++ b/Documentation/locking/seqlock.rst
> @@ -218,13 +218,14 @@ Read path, three categories:
>      according to a passed marker. This is used to avoid lockless readers
>      starvation (too much retry loops) in case of a sharp spike in write
>      activity. First, a lockless read is tried (even marker passed). If
> -    that trial fails (odd sequence counter is returned, which is used as
> -    the next iteration marker), the lockless read is transformed to a
> -    full locking read and no retry loop is necessary::
> +    that trial fails (sequence counter doesn't match), make the marker
> +    odd for the next iteration, the lockless read is transformed to a
> +    full locking read and no retry loop is necessary, for example::
>
>          /* marker; even initialization */
> -        int seq = 0;
> +        int seq = 1;
>          do {
> +            seq++; /* 2 on the 1st/lockless path, otherwise odd */
>              read_seqbegin_or_lock(&foo_seqlock, &seq);
>
>              /* ... [[read-side critical section]] ... */

It is kind of odd to initialize the sequence to 1 and add a sequence
increment inside the loop. Perhaps we can do something like:

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 5ce48eab7a2a..0f607ef28d98 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -1126,6 +1126,9 @@ read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
  */
 static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
 {
+	if (!(*seq & 1))	/* Reread sequence # if even */
+		*seq = seqprop_sequence(&lock->seqcount);
+
 	if (!(*seq & 1))	/* Even */
 		*seq = read_seqbegin(lock);
 	else			/* Odd */
@@ -1144,6 +1147,15 @@ static inline int need_seqretry(seqlock_t *lock, int seq)
 	return !(seq & 1) && read_seqretry(lock, seq);
 }
 
+static inline int need_seqretry_once(seqlock_t *lock, int *seq)
+{
+	int ret = !(*seq & 1) && read_seqretry(lock, *seq);
+
+	if (ret)
+		*seq = 1;	/* Enforce locking in next iteration */
+	return ret;
+}
+

With this, the current document should be good. Users have the option of
using need_seqretry_once() to enforce at most 1 iteration. Of course, we
still need to do a similar change to the other read_seqbegin_or_lock_*()
variants.

My 2 cents.

Cheers,
Longman
* Re: [PATCH 1/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
  2025-10-01 18:21 ` Waiman Long
@ 2025-10-01 19:06   ` Oleg Nesterov
  2025-10-01 19:24     ` Waiman Long
  0 siblings, 1 reply; 74+ messages in thread
From: Oleg Nesterov @ 2025-10-01 19:06 UTC (permalink / raw)
To: Waiman Long
Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
    Linus Torvalds, Peter Zijlstra, Will Deacon, linux-kernel

On 10/01, Waiman Long wrote:
>
> On 9/28/25 12:20 PM, Oleg Nesterov wrote:
> >
> >--- a/Documentation/locking/seqlock.rst
> >+++ b/Documentation/locking/seqlock.rst
> >@@ -218,13 +218,14 @@ Read path, three categories:
> >      according to a passed marker. This is used to avoid lockless readers
> >      starvation (too much retry loops) in case of a sharp spike in write
> >      activity. First, a lockless read is tried (even marker passed). If
> >-    that trial fails (odd sequence counter is returned, which is used as
> >-    the next iteration marker), the lockless read is transformed to a
> >-    full locking read and no retry loop is necessary::
> >+    that trial fails (sequence counter doesn't match), make the marker
> >+    odd for the next iteration, the lockless read is transformed to a
> >+    full locking read and no retry loop is necessary, for example::
> >
> >          /* marker; even initialization */
> >-        int seq = 0;
> >+        int seq = 1;
> >          do {
> >+            seq++; /* 2 on the 1st/lockless path, otherwise odd */
> >              read_seqbegin_or_lock(&foo_seqlock, &seq);
> >
> >              /* ... [[read-side critical section]] ... */
>
> It is kind of odd to initialize the sequence to 1 and add a sequence
> increment inside the loop.

Sure. But a) in this patch my only point is that the current documentation is
wrong, and b) the pseudo-code after this change becomes correct and the new
pattern already has users. For example, do_io_accounting() and more.

> Perhaps we can do something like:

Perhaps. But could you please read the "RFC 2/1" thread? To me it is kind of
odd that the simple loops like this example have to even touch the sequence
counter inside the loop.

> +static inline int need_seqretry_once(seqlock_t *lock, int *seq)
> +{
> +	int ret = !(*seq & 1) && read_seqretry(lock, *seq);
> +
> +	if (ret)
> +		*seq = 1;	/* Enforce locking in next iteration */
> +	return ret;
> +}

And this is exactly what I tried to propose in "RFC 2/1". Plus more...

Oleg.
* Re: [PATCH 1/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
  2025-10-01 19:06 ` Oleg Nesterov
@ 2025-10-01 19:24   ` Waiman Long
  2025-10-01 19:34     ` Waiman Long
  0 siblings, 1 reply; 74+ messages in thread
From: Waiman Long @ 2025-10-01 19:24 UTC (permalink / raw)
To: Oleg Nesterov, Waiman Long
Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
    Linus Torvalds, Peter Zijlstra, Will Deacon, linux-kernel

On 10/1/25 3:06 PM, Oleg Nesterov wrote:
> On 10/01, Waiman Long wrote:
>> On 9/28/25 12:20 PM, Oleg Nesterov wrote:
>>> --- a/Documentation/locking/seqlock.rst
>>> +++ b/Documentation/locking/seqlock.rst
>>> @@ -218,13 +218,14 @@ Read path, three categories:
>>>      according to a passed marker. This is used to avoid lockless readers
>>>      starvation (too much retry loops) in case of a sharp spike in write
>>>      activity. First, a lockless read is tried (even marker passed). If
>>> -    that trial fails (odd sequence counter is returned, which is used as
>>> -    the next iteration marker), the lockless read is transformed to a
>>> -    full locking read and no retry loop is necessary::
>>> +    that trial fails (sequence counter doesn't match), make the marker
>>> +    odd for the next iteration, the lockless read is transformed to a
>>> +    full locking read and no retry loop is necessary, for example::
>>>          /* marker; even initialization */
>>> -        int seq = 0;
>>> +        int seq = 1;
>>>          do {
>>> +            seq++; /* 2 on the 1st/lockless path, otherwise odd */
>>>              read_seqbegin_or_lock(&foo_seqlock, &seq);
>>>              /* ... [[read-side critical section]] ... */
>> It is kind of odd to initialize the sequence to 1 and add a sequence
>> increment inside the loop.
> Sure. But a) in this patch my only point is that the current documentation is
> wrong, and b) the pseudo-code after this change becomes correct and the new
> pattern already has users. For example, do_io_accounting() and more.

Thanks for letting me know, but I believe my suggested change will work
with this modified loop iteration.

>
>> Perhaps we can do something like:
> Perhaps. But could you please read the "RFC 2/1" thread? To me it is kind of
> odd that the simple loops like this example have to even touch the sequence
> counter inside the loop.
>
>> +static inline int need_seqretry_once(seqlock_t *lock, int *seq)
>> +{
>> +	int ret = !(*seq & 1) && read_seqretry(lock, *seq);
>> +
>> +	if (ret)
>> +		*seq = 1;	/* Enforce locking in next iteration */
>> +	return ret;
>> +}
> And this is exactly what I tried to propose in "RFC 2/1". Plus more...

I had read that. You used an _xxx() suffix; I think a good choice would be
_once(), to indicate that we only want one retry.

Note that I believe the current read_seqbegin_or_lock() API was made to be
consistent with how the read_seqbegin() or read_seqcount_begin() APIs are
being used. I am not against your suggested single helper function, but it
may cause confusion because it differs from the others. So it must be
clearly documented to highlight its difference from the other similar APIs.

Cheers,
Longman
* Re: [PATCH 1/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
  2025-10-01 19:24 ` Waiman Long
@ 2025-10-01 19:34   ` Waiman Long
  0 siblings, 0 replies; 74+ messages in thread
From: Waiman Long @ 2025-10-01 19:34 UTC (permalink / raw)
To: Waiman Long, Oleg Nesterov
Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
    Linus Torvalds, Peter Zijlstra, Will Deacon, linux-kernel

On 10/1/25 3:24 PM, Waiman Long wrote:
> On 10/1/25 3:06 PM, Oleg Nesterov wrote:
>> On 10/01, Waiman Long wrote:
>>> On 9/28/25 12:20 PM, Oleg Nesterov wrote:
>>>> --- a/Documentation/locking/seqlock.rst
>>>> +++ b/Documentation/locking/seqlock.rst
>>>> @@ -218,13 +218,14 @@ Read path, three categories:
>>>>      according to a passed marker. This is used to avoid lockless readers
>>>>      starvation (too much retry loops) in case of a sharp spike in write
>>>>      activity. First, a lockless read is tried (even marker passed). If
>>>> -    that trial fails (odd sequence counter is returned, which is used as
>>>> -    the next iteration marker), the lockless read is transformed to a
>>>> -    full locking read and no retry loop is necessary::
>>>> +    that trial fails (sequence counter doesn't match), make the marker
>>>> +    odd for the next iteration, the lockless read is transformed to a
>>>> +    full locking read and no retry loop is necessary, for example::
>>>>          /* marker; even initialization */
>>>> -        int seq = 0;
>>>> +        int seq = 1;
>>>>          do {
>>>> +            seq++; /* 2 on the 1st/lockless path, otherwise odd */
>>>>              read_seqbegin_or_lock(&foo_seqlock, &seq);
>>>>              /* ... [[read-side critical section]] ... */
>>> It is kind of odd to initialize the sequence to 1 and add a sequence
>>> increment inside the loop.
>> Sure. But a) in this patch my only point is that the current
>> documentation is wrong, and b) the pseudo-code after this change becomes
>> correct and the new pattern already has users. For example,
>> do_io_accounting() and more.
> Thanks for letting me know, but I believe my suggested change will work
> with this modified loop iteration.
>>
>>> Perhaps we can do something like:
>> Perhaps. But could you please read the "RFC 2/1" thread? To me it is
>> kind of odd that the simple loops like this example have to even touch
>> the sequence counter inside the loop.
>>
>>> +static inline int need_seqretry_once(seqlock_t *lock, int *seq)
>>> +{
>>> +	int ret = !(*seq & 1) && read_seqretry(lock, *seq);
>>> +
>>> +	if (ret)
>>> +		*seq = 1;	/* Enforce locking in next iteration */
>>> +	return ret;
>>> +}
>> And this is exactly what I tried to propose in "RFC 2/1". Plus more...
>
> I had read that. You used an _xxx() suffix; I think a good choice would
> be _once(), to indicate that we only want one retry.

Note that with my suggested reading of the sequence count within
read_seqbegin_or_lock(), we may not really need this extra helper.
However, there is still a very slight chance that reader and writer are
perfectly synchronized in such a way that an even sequence number is
always read even though it is still increasing after each iteration. So
this new helper is for users that are paranoid about this rare case.

Cheers,
Longman
* Re: [PATCH 1/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
  2025-10-01 18:21 ` Waiman Long
  2025-10-01 19:06   ` Oleg Nesterov
@ 2025-10-02 11:01   ` Oleg Nesterov
  1 sibling, 0 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-10-02 11:01 UTC (permalink / raw)
To: Waiman Long
Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
    Linus Torvalds, Peter Zijlstra, Will Deacon, linux-kernel

Hi Waiman,

This is probably my fault, but I can't understand your emails. So let me
start from the very beginning and write another reply to this email.

I don't think that we can change include/linux/seqlock.h so that this
change will make the documentation correct without changing/breaking the
existing code. But perhaps I misunderstood you...

And just in case... of course need_seqretry_xxx() and/or XXX() must be
renamed.

On 10/01, Waiman Long wrote:
>
> On 9/28/25 12:20 PM, Oleg Nesterov wrote:
> >-        int seq = 0;
> >+        int seq = 1;
> >          do {
> >+            seq++; /* 2 on the 1st/lockless path, otherwise odd */
> >              read_seqbegin_or_lock(&foo_seqlock, &seq);
> >
> >              /* ... [[read-side critical section]] ... */
>
> It is kind of odd to initialize the sequence to 1 and add a sequence
> increment inside the loop.

Sure. That is why I am proposing the new helper which can be used instead
of need_seqretry(). You named it need_seqretry_once() below.

Now, from Documentation/locking/seqlock.rst before this patch:

	/* marker; even initialization */
	int seq = 0;
	do {
		read_seqbegin_or_lock(&foo_seqlock, &seq);

		/* ... [[read-side critical section]] ... */

	} while (need_seqretry(&foo_seqlock, seq));
	done_seqretry(&foo_seqlock, seq);

> Perhaps we can do something like:
>
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -1126,6 +1126,9 @@ read_sequnlock_excl_irqrestore(seqlock_t *sl, unsigned long flags)
>   */
>  static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
>  {
> +	if (!(*seq & 1))	/* Reread sequence # if even */
> +		*seq = seqprop_sequence(&lock->seqcount);
> +

Why? I don't understand this change...

> +static inline int need_seqretry_once(seqlock_t *lock, int *seq)
> +{
> +	int ret = !(*seq & 1) && read_seqretry(lock, *seq);
> +
> +	if (ret)
> +		*seq = 1;	/* Enforce locking in next iteration */
> +	return ret;
> +}
>
> With this, the current document should be good.

No? How can this change make the pseudo code above correct? It will never
take the lock. OK, unless CONFIG_PREEMPT_RT=y but this is another story.
And the documentation is still wrong in this respect.

> Users have the option of
> using need_seqretry_once() to enforce at most 1 iteration.

So we need to change the pseudo code above

	-	} while (need_seqretry(&foo_seqlock, seq));
	+	} while (need_seqretry_once(&foo_seqlock, &seq));

and this is exactly what I am trying to suggest in "RFC 2/1".

So I think we should fix the docs, and the new helper(s), and update the
current users one-by-one.

Oleg.
* [tip: locking/core] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
  2025-09-28 16:20 ` [PATCH 1/1] " Oleg Nesterov
  2025-10-01 18:21   ` Waiman Long
@ 2025-10-21 10:35   ` tip-bot2 for Oleg Nesterov
  1 sibling, 0 replies; 74+ messages in thread
From: tip-bot2 for Oleg Nesterov @ 2025-10-21 10:35 UTC (permalink / raw)
To: linux-tip-commits
Cc: Oleg Nesterov, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     28a0ee311960baad97bf85e1e995aed4a71e22a2
Gitweb:        https://git.kernel.org/tip/28a0ee311960baad97bf85e1e995aed4a71e22a2
Author:        Oleg Nesterov <oleg@redhat.com>
AuthorDate:    Sun, 28 Sep 2025 18:20:29 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 21 Oct 2025 12:31:56 +02:00

documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry

The comments and pseudo code in Documentation/locking/seqlock.rst are wrong:

	int seq = 0;
	do {
		read_seqbegin_or_lock(&foo_seqlock, &seq);

		/* ... [[read-side critical section]] ... */

	} while (need_seqretry(&foo_seqlock, seq));

read_seqbegin_or_lock() always returns with an even "seq" and need_seqretry()
doesn't change this counter. This means that seq is always even and thus the
locking pass is simply impossible.

IOW, "_or_lock" has no effect and this code doesn't differ from

	do {
		seq = read_seqbegin(&foo_seqlock);

		/* ... [[read-side critical section]] ... */

	} while (read_seqretry(&foo_seqlock, seq));

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/locking/seqlock.rst | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index 3fb7ea3..9899871 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -220,13 +220,14 @@ Read path, three categories:
     according to a passed marker. This is used to avoid lockless readers
     starvation (too much retry loops) in case of a sharp spike in write
     activity. First, a lockless read is tried (even marker passed). If
-    that trial fails (odd sequence counter is returned, which is used as
-    the next iteration marker), the lockless read is transformed to a
-    full locking read and no retry loop is necessary::
+    that trial fails (sequence counter doesn't match), make the marker
+    odd for the next iteration, the lockless read is transformed to a
+    full locking read and no retry loop is necessary, for example::
 
         /* marker; even initialization */
-        int seq = 0;
+        int seq = 1;
         do {
+            seq++; /* 2 on the 1st/lockless path, otherwise odd */
             read_seqbegin_or_lock(&foo_seqlock, &seq);
 
             /* ... [[read-side critical section]] ... */
* [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ?
  2025-09-28 16:19 [PATCH 0/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry Oleg Nesterov
  2025-09-28 16:20 ` [PATCH 1/1] " Oleg Nesterov
@ 2025-09-28 16:20 ` Oleg Nesterov
  2025-09-29  0:41   ` [????] " Li,Rongqing
    ` (2 more replies)
  2025-10-05 14:47 ` [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() Oleg Nesterov
    ` (3 subsequent siblings)
  5 siblings, 3 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-09-28 16:20 UTC (permalink / raw)
To: Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
    Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon
Cc: linux-kernel

Another problem is that this API is error prone. Two years ago half of the
read_seqbegin_or_lock() users were buggy (they followed the wrong example
from Documentation/locking/seqlock.rst). And even if the code is mostly
correct it is very easy to add a hard-to-detect mistake, see for example

	[PATCH][v3] afs: Remove erroneous seq |= 1 in volume lookup loop
	https://lore.kernel.org/all/20250910084235.2630-1-lirongqing@baidu.com/

Can we improve this API?

-------------------------------------------------------------------------------
To simplify, suppose we add the new helper

	static inline int need_seqretry_xxx(seqlock_t *lock, int *seq)
	{
		int ret = !(*seq & 1) && read_seqretry(lock, *seq);

		if (ret)
			++*seq; /* make this counter odd */

		return ret;
	}

which can be used instead of need_seqretry(). This way the users do not need
to modify "seq" in the main loop. For example, we can simplify
thread_group_cputime() as follows:

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -314,7 +314,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 	struct signal_struct *sig = tsk->signal;
 	u64 utime, stime;
 	struct task_struct *t;
-	unsigned int seq, nextseq;
+	unsigned int seq;
 	unsigned long flags;
 
 	/*
@@ -330,9 +330,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 
 	rcu_read_lock();
 	/* Attempt a lockless read on the first round. */
-	nextseq = 0;
+	seq = 0;
 	do {
-		seq = nextseq;
 		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
 		times->utime = sig->utime;
 		times->stime = sig->stime;
@@ -344,9 +343,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 			times->stime += stime;
 			times->sum_exec_runtime += read_sum_exec_runtime(t);
 		}
-		/* If lockless access failed, take the lock. */
-		nextseq = 1;
 	} while (need_seqretry(&sig->stats_lock, seq));
+	} while (need_seqretry_xxx(&sig->stats_lock, &seq));
 	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
 	rcu_read_unlock();
 }

most (if not all) of other users can be changed the same way.

-------------------------------------------------------------------------------
Or perhaps we can even add a helper that hides all the details, something like

	int xxx(seqlock_t *lock, int *seq, int lockless)
	{
		if (lockless) {
			*seq = read_seqbegin(lock);
			return 1;
		}

		if (*seq & 1) {
			read_sequnlock_excl(lock);
			return 0;
		}

		if (read_seqretry(lock, *seq)) {
			read_seqlock_excl(lock);
			*seq = 1;
			return 1;
		}

		return 0;
	}

	#define __XXX(lock, seq, lockless) \
		for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0)

	#define XXX(lock) \
		__XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless))

?

This way the users can simply do:

	seqlock_t sl;

	void func(void)
	{
		XXX(&sl) {
			... read-side critical section ...
		}
	}

using only the new XXX() helper. No need to declare/initialize seq, no need
for need_seqretry/done_seqretry.

What do you think?

Oleg.
* RE: [????] [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ?
  2025-09-28 16:20 ` [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? Oleg Nesterov
@ 2025-09-29  0:41   ` Li,Rongqing
  2025-09-29  6:47     ` Oleg Nesterov
  2025-09-30 22:09   ` David Howells
  2025-10-01 13:02   ` Peter Zijlstra
  2 siblings, 1 reply; 74+ messages in thread
From: Li,Rongqing @ 2025-09-29 0:41 UTC (permalink / raw)
To: Oleg Nesterov, Boqun Feng, David Howells, Ingo Molnar,
    Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon
Cc: linux-kernel@vger.kernel.org

> Another problem is that this API is error prone. Two years ago half of the
> read_seqbegin_or_lock() users were buggy (they followed the wrong example
> from Documentation/locking/seqlock.rst). And even if the code is mostly
> correct it is very easy to add a hard-to-detect mistake, see for example
>
> 	[PATCH][v3] afs: Remove erroneous seq |= 1 in volume lookup loop
> 	https://lore.kernel.org/all/20250910084235.2630-1-lirongqing@baidu.com/
>
> Can we improve this API?
>
> -------------------------------------------------------------------------------
> To simplify, suppose we add the new helper
>
> 	static inline int need_seqretry_xxx(seqlock_t *lock, int *seq)
> 	{
> 		int ret = !(*seq & 1) && read_seqretry(lock, *seq);
>
> 		if (ret)
> 			++*seq; /* make this counter odd */
>
> 		return ret;
> 	}
>
> which can be used instead of need_seqretry(). This way the users do not need
> to modify "seq" in the main loop. For example, we can simplify
> thread_group_cputime() as follows:
>
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -314,7 +314,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  	struct signal_struct *sig = tsk->signal;
>  	u64 utime, stime;
>  	struct task_struct *t;
> -	unsigned int seq, nextseq;
> +	unsigned int seq;
>  	unsigned long flags;
>
>  	/*
> @@ -330,9 +330,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>
>  	rcu_read_lock();
>  	/* Attempt a lockless read on the first round. */
> -	nextseq = 0;
> +	seq = 0;
>  	do {
> -		seq = nextseq;
>  		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
>  		times->utime = sig->utime;
>  		times->stime = sig->stime;
> @@ -344,9 +343,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  			times->stime += stime;
>  			times->sum_exec_runtime += read_sum_exec_runtime(t);
>  		}
> -		/* If lockless access failed, take the lock. */
> -		nextseq = 1;
> -	} while (need_seqretry(&sig->stats_lock, seq));
> +	} while (need_seqretry_xxx(&sig->stats_lock, &seq));
>  	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
>  	rcu_read_unlock();
>  }

If this API can be simplified, it should prevent future errors;

I submitted a patch, inspired by your previous patch, and hope that all
places use a fixed syntax, to prevent future errors;

https://lkml.org/lkml/2025/7/31/616

thanks

-Li

> most (if not all) of other users can be changed the same way.
>
> -------------------------------------------------------------------------------
> Or perhaps we can even add a helper that hides all the details, something like
>
> 	int xxx(seqlock_t *lock, int *seq, int lockless)
> 	{
> 		if (lockless) {
> 			*seq = read_seqbegin(lock);
> 			return 1;
> 		}
>
> 		if (*seq & 1) {
> 			read_sequnlock_excl(lock);
> 			return 0;
> 		}
>
> 		if (read_seqretry(lock, *seq)) {
> 			read_seqlock_excl(lock);
> 			*seq = 1;
> 			return 1;
> 		}
>
> 		return 0;
> 	}
>
> 	#define __XXX(lock, seq, lockless) \
> 		for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0)
>
> 	#define XXX(lock) \
> 		__XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless))
>
> ?
>
> This way the users can simply do:
>
> 	seqlock_t sl;
>
> 	void func(void)
> 	{
> 		XXX(&sl) {
> 			... read-side critical section ...
> 		}
> 	}
>
> using only the new XXX() helper. No need to declare/initialize seq, no need
> for need_seqretry/done_seqretry.
>
> What do you think?
>
> Oleg.
* Re: [????] [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ?
  2025-09-29  0:41 ` [????] " Li,Rongqing
@ 2025-09-29  6:47   ` Oleg Nesterov
  0 siblings, 0 replies; 74+ messages in thread
From: Oleg Nesterov @ 2025-09-29 6:47 UTC (permalink / raw)
To: Li,Rongqing
Cc: Boqun Feng, David Howells, Ingo Molnar, Linus Torvalds,
    Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel@vger.kernel.org

On 09/29, Li,Rongqing wrote:
>
> > Another problem is that this API is error prone. Two years ago half of the
> > read_seqbegin_or_lock() users were buggy (they followed the wrong example
> > from Documentation/locking/seqlock.rst). And even if the code is mostly
> > correct it is very easy to add a hard-to-detect mistake, see for example
> >
> > 	[PATCH][v3] afs: Remove erroneous seq |= 1 in volume lookup loop
> > 	https://lore.kernel.org/all/20250910084235.2630-1-lirongqing@baidu.com/
> >
> > Can we improve this API?
> >
> > -------------------------------------------------------------------------------
> > To simplify, suppose we add the new helper
> >
> > 	static inline int need_seqretry_xxx(seqlock_t *lock, int *seq)
> > 	{
> > 		int ret = !(*seq & 1) && read_seqretry(lock, *seq);
> >
> > 		if (ret)
> > 			++*seq; /* make this counter odd */
> >
> > 		return ret;
> > 	}
> >
> > which can be used instead of need_seqretry(). This way the users do not need
> > to modify "seq" in the main loop. For example, we can simplify
> > thread_group_cputime() as follows:
> >
> > --- a/kernel/sched/cputime.c
> > +++ b/kernel/sched/cputime.c
> > @@ -314,7 +314,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> >  	struct signal_struct *sig = tsk->signal;
> >  	u64 utime, stime;
> >  	struct task_struct *t;
> > -	unsigned int seq, nextseq;
> > +	unsigned int seq;
> >  	unsigned long flags;
> >
> >  	/*
> > @@ -330,9 +330,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> >
> >  	rcu_read_lock();
> >  	/* Attempt a lockless read on the first round. */
> > -	nextseq = 0;
> > +	seq = 0;
> >  	do {
> > -		seq = nextseq;
> >  		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
> >  		times->utime = sig->utime;
> >  		times->stime = sig->stime;
> > @@ -344,9 +343,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> >  			times->stime += stime;
> >  			times->sum_exec_runtime += read_sum_exec_runtime(t);
> >  		}
> > -		/* If lockless access failed, take the lock. */
> > -		nextseq = 1;
> > -	} while (need_seqretry(&sig->stats_lock, seq));
> > +	} while (need_seqretry_xxx(&sig->stats_lock, &seq));
> >  	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
> >  	rcu_read_unlock();
> >  }
>
> If this API can be simplified, it should prevent future errors;
>
> I submitted a patch, inspired by your previous patch, and hope that all
> places use a fixed syntax, to prevent future errors;
>
> https://lkml.org/lkml/2025/7/31/616

Well, I am not sure it makes a lot of sense to change thread_group_cputime()
this way; "nextseq" or the "seq++" trick is a matter of taste. I tried to
suggest a simplified API to avoid the manipulation of "seq" altogether.

Oleg.

> > most (if not all) of other users can be changed the same way.
> >
> > -------------------------------------------------------------------------------
> > Or perhaps we can even add a helper that hides all the details, something like
> >
> > 	int xxx(seqlock_t *lock, int *seq, int lockless)
> > 	{
> > 		if (lockless) {
> > 			*seq = read_seqbegin(lock);
> > 			return 1;
> > 		}
> >
> > 		if (*seq & 1) {
> > 			read_sequnlock_excl(lock);
> > 			return 0;
> > 		}
> >
> > 		if (read_seqretry(lock, *seq)) {
> > 			read_seqlock_excl(lock);
> > 			*seq = 1;
> > 			return 1;
> > 		}
> >
> > 		return 0;
> > 	}
> >
> > 	#define __XXX(lock, seq, lockless) \
> > 		for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0)
> >
> > 	#define XXX(lock) \
> > 		__XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless))
> >
> > ?
> >
> > This way the users can simply do:
> >
> > 	seqlock_t sl;
> >
> > 	void func(void)
> > 	{
> > 		XXX(&sl) {
> > 			... read-side critical section ...
> > 		}
> > 	}
> >
> > using only the new XXX() helper. No need to declare/initialize seq, no need
> > for need_seqretry/done_seqretry.
> >
> > What do you think?
> >
> > Oleg.
* Re: [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? 2025-09-28 16:20 ` [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? Oleg Nesterov 2025-09-29 0:41 ` [????] " Li,Rongqing @ 2025-09-30 22:09 ` David Howells 2025-10-01 11:51 ` Oleg Nesterov 2025-10-01 13:02 ` Peter Zijlstra 2 siblings, 1 reply; 74+ messages in thread From: David Howells @ 2025-09-30 22:09 UTC (permalink / raw) To: Oleg Nesterov Cc: dhowells, Boqun Feng, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel Oleg Nesterov <oleg@redhat.com> wrote: > > Can we improve this API? It would also be nice to fix the static lock-balance detection stuff that you get when you enable advanced checking during a kernel build. It doesn't seem to understand seqlocks. > - nextseq = 0; > + seq = 0; Perhaps an init function or macro that hides this bit? void init_read_seqlock(int *seq) { *seq = 0; } init_read_seqlock(&seq); or: #define INIT_READ_SEQBEGIN 0 seq = INIT_READ_SEQBEGIN; Though if we can fold the whole loop inside a macro, that might make it easier to use. d_walk() in fs/dcache.c might give you issues, though. David ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? 2025-09-30 22:09 ` David Howells @ 2025-10-01 11:51 ` Oleg Nesterov 0 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-01 11:51 UTC (permalink / raw) To: David Howells Cc: Boqun Feng, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On 09/30, David Howells wrote: > > It would also be nice to fix the static lock-balance detection stuff that you > get when you enable advanced checking during a kernel build. It doesn't > seem to understand seqlocks. > > > - nextseq = 0; > > + seq = 0; > > Perhaps an init function or macro that hides this bit? > > void init_read_seqlock(int *seq) > { > *seq = 0; > } > > init_read_seqlock(&seq); > > or: > > #define INIT_READ_SEQBEGIN 0 > > seq = INIT_READ_SEQBEGIN; Agreed, > Though if we can fold the whole loop inside a macro, that might make it easier > to use. Yes... > d_walk() in fs/dcache.c might give you issues, though. Yes. But note that it can use need_seqretry_xxx() too, see the patch below. So. Do you think it makes any sense to add the new helper for the start? Can you suggest a better name? Say, check_restart() ? Oleg. --- a/fs/dcache.c +++ b/fs/dcache.c @@ -1355,7 +1355,7 @@ static void d_walk(struct dentry *parent, void *data, spin_lock(&this_parent->d_lock); /* might go back up the wrong parent if we have had a rename. 
*/ - if (need_seqretry(&rename_lock, seq)) + if (need_seqretry_xxx(&rename_lock, &seq)) goto rename_retry; /* go into the first sibling still alive */ hlist_for_each_entry_continue(dentry, d_sib) { @@ -1366,7 +1366,7 @@ static void d_walk(struct dentry *parent, void *data, } goto ascend; } - if (need_seqretry(&rename_lock, seq)) + if (need_seqretry_xxx(&rename_lock, &seq)) goto rename_retry; rcu_read_unlock(); @@ -1378,11 +1378,8 @@ static void d_walk(struct dentry *parent, void *data, rename_retry: spin_unlock(&this_parent->d_lock); rcu_read_unlock(); - BUG_ON(seq & 1); - if (!retry) - return; - seq = 1; - goto again; + if (retry) + goto again; } struct check_mount { ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? 2025-09-28 16:20 ` [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? Oleg Nesterov 2025-09-29 0:41 ` [????] " Li,Rongqing 2025-09-30 22:09 ` David Howells @ 2025-10-01 13:02 ` Peter Zijlstra 2025-10-01 13:13 ` Oleg Nesterov 2 siblings, 1 reply; 74+ messages in thread From: Peter Zijlstra @ 2025-10-01 13:02 UTC (permalink / raw) To: Oleg Nesterov Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Waiman Long, Will Deacon, linux-kernel On Sun, Sep 28, 2025 at 06:20:54PM +0200, Oleg Nesterov wrote: > Can we improve this API? Please :-) > ------------------------------------------------------------------------------- > To simplify, suppose we add the new helper > > static inline int need_seqretry_xxx(seqlock_t *lock, int *seq) > { > int ret = !(*seq & 1) && read_seqretry(lock, *seq); > > if (ret) > ++*seq; /* make this counter odd */ > > return ret; > } How about need_seqretry_or_lock() to stay in theme with read_seqbegin_or_lock(). But then there's done_seqretry() without the _or_lock() :/ > which can be used instead of need_seqretry(). This way the users do not > need to modify "seq" in the main loop. For example, we can simplify > thread_group_cputime() as follows: > > --- a/kernel/sched/cputime.c > +++ b/kernel/sched/cputime.c > @@ -314,7 +314,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) > struct signal_struct *sig = tsk->signal; > u64 utime, stime; > struct task_struct *t; > - unsigned int seq, nextseq; > + unsigned int seq; > unsigned long flags; > > /* > @@ -330,9 +330,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) > > rcu_read_lock(); > /* Attempt a lockless read on the first round. 
*/ > - nextseq = 0; > + seq = 0; > do { > - seq = nextseq; > flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); > times->utime = sig->utime; > times->stime = sig->stime; > @@ -344,9 +343,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) > times->stime += stime; > times->sum_exec_runtime += read_sum_exec_runtime(t); > } > - /* If lockless access failed, take the lock. */ > - nextseq = 1; > - } while (need_seqretry(&sig->stats_lock, seq)); > + } while (need_seqretry_xxx(&sig->stats_lock, &seq)); > done_seqretry_irqrestore(&sig->stats_lock, seq, flags); > rcu_read_unlock(); > } > > most (if not all) of other users can be changed the same way. > > ------------------------------------------------------------------------------- > Or perhaps we can even add a helper that hides all the details, something like > > int xxx(seqlock_t *lock, int *seq, int lockless) > { > if (lockless) { > *seq = read_seqbegin(lock); > return 1; > } > > if (*seq & 1) { > read_sequnlock_excl(lock); > return 0; > } > > if (read_seqretry(lock, *seq)) { > read_seqlock_excl(lock); > *seq = 1; > return 1; > } > > return 0; > > } > > #define __XXX(lock, seq, lockless) \ > for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0) > > #define XXX(lock) \ > __XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless)) > > > ? Oh gawd, that thing had better not have control flow escape that loop. But yes, I suppose something like this is far more useable than the current thing. > This way the users can simply do: > > seqlock_t sl; > > void func(void) > { > XXX(&sl) { > ... read-side critical section ... > } > } > > using only the new XXX() helper. No need to declare/initialize seq, no need > for need_seqretry/done_seqretry. > > What do you think? I'm thinking we want something like this for the normal seqcount loops too :-) ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? 2025-10-01 13:02 ` Peter Zijlstra @ 2025-10-01 13:13 ` Oleg Nesterov 2025-10-01 13:46 ` Oleg Nesterov 2025-10-02 12:58 ` Oleg Nesterov 0 siblings, 2 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-01 13:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Waiman Long, Will Deacon, linux-kernel On 10/01, Peter Zijlstra wrote: > On Sun, Sep 28, 2025 at 06:20:54PM +0200, Oleg Nesterov wrote: > > > To simplify, suppose we add the new helper > > > > static inline int need_seqretry_xxx(seqlock_t *lock, int *seq) > > { > > int ret = !(*seq & 1) && read_seqretry(lock, *seq); > > > > if (ret) > > ++*seq; /* make this counter odd */ ^^^^^^ Hmm. just *seq = 1; makes more sense > How about need_seqretry_or_lock() to stay in theme with > read_seqbegin_or_lock(). I am fine with any name ;) This one looks good to me. > > #define __XXX(lock, seq, lockless) \ > > for (int lockless = 1, seq; xxx(lock, &seq, lockless); lockless = 0) > > > > #define XXX(lock) \ > > __XXX(lock, __UNIQUE_ID(seq), __UNIQUE_ID(lockless)) > > > > > > ? > > Oh gawd, that thing had better not have control flow escape that loop. Yes, yes. "continue" is fine, but break/return won't work. > But yes, I suppose something like this is far more useable than the > current thing. OK, great. So, modulo naming, how about the patch below? The new stuff should obviously go to include/linux/seqlock.h, xxx() can be probably uninlined. thread_group_cputime() is changed as an example. Oleg. 
--- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -306,6 +306,35 @@ static u64 read_sum_exec_runtime(struct task_struct *t) } #endif /* !CONFIG_64BIT */ +static inline int xxx(seqlock_t *lock, int lockless, int *seq, unsigned long *flags) +{ + if (lockless) { + *seq = read_seqbegin(lock); + return 1; + } else if (*seq & 1) { + if (flags) + read_sequnlock_excl_irqrestore(lock, *flags); + else + read_sequnlock_excl(lock); + return 0; + } else if (read_seqretry(lock, *seq)) { + if (flags) + read_seqlock_excl_irqsave(lock, *flags); + else + read_seqlock_excl(lock); + *seq = 1; + return 1; + } else { + return 0; + } +} + +#define __XXX(lock, lockless, seq, flags) \ + for (int lockless = 1, seq; xxx(lock, lockless, &seq, flags); lockless = 0) + +#define XXX(lock, flags) \ + __XXX(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq), flags) + /* * Accumulate raw cputime values of dead tasks (sig->[us]time) and live * tasks (sum on group iteration) belonging to @tsk's group. @@ -315,7 +344,6 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) struct signal_struct *sig = tsk->signal; u64 utime, stime; struct task_struct *t; - unsigned int seq, nextseq; unsigned long flags; /* @@ -330,11 +358,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) (void) task_sched_runtime(current); rcu_read_lock(); - /* Attempt a lockless read on the first round. */ - nextseq = 0; - do { - seq = nextseq; - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + XXX(&sig->stats_lock, &flags) { times->utime = sig->utime; times->stime = sig->stime; times->sum_exec_runtime = sig->sum_sched_runtime; @@ -345,10 +369,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) times->stime += stime; times->sum_exec_runtime += read_sum_exec_runtime(t); } - /* If lockless access failed, take the lock. 
*/ - nextseq = 1; - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + } rcu_read_unlock(); } ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? 2025-10-01 13:13 ` Oleg Nesterov @ 2025-10-01 13:46 ` Oleg Nesterov 2025-10-02 12:58 ` Oleg Nesterov 1 sibling, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-01 13:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Waiman Long, Will Deacon, linux-kernel On 10/01, Oleg Nesterov wrote: > > +static inline int xxx(seqlock_t *lock, int lockless, int *seq, unsigned long *flags) > +{ > + if (lockless) { > + *seq = read_seqbegin(lock); > + return 1; > + } else if (*seq & 1) { > + if (flags) > + read_sequnlock_excl_irqrestore(lock, *flags); > + else > + read_sequnlock_excl(lock); > + return 0; > + } else if (read_seqretry(lock, *seq)) { > + if (flags) > + read_seqlock_excl_irqsave(lock, *flags); > + else > + read_seqlock_excl(lock); > + *seq = 1; > + return 1; > + } else { > + return 0; > + } > +} > + > +#define __XXX(lock, lockless, seq, flags) \ > + for (int lockless = 1, seq; xxx(lock, lockless, &seq, flags); lockless = 0) > + > +#define XXX(lock, flags) \ > + __XXX(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq), flags) Note that __XXX() can have users too. See the patch below. Not that it makes a lot of sense, just for example. Oleg. 
diff --git a/fs/d_path.c b/fs/d_path.c index bb365511066b..b4cde188be4f 100644 --- a/fs/d_path.c +++ b/fs/d_path.c @@ -332,28 +332,23 @@ static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p) { const struct dentry *dentry; struct prepend_buffer b; - int seq = 0; rcu_read_lock(); -restart: - dentry = d; - b = *p; - read_seqbegin_or_lock(&rename_lock, &seq); - while (!IS_ROOT(dentry)) { - const struct dentry *parent = dentry->d_parent; + __XXX(&rename_lock, lockless, seq, NULL) { + dentry = d; + b = *p; + while (!IS_ROOT(dentry)) { + const struct dentry *parent = dentry->d_parent; - prefetch(parent); - if (!prepend_name(&b, &dentry->d_name)) - break; - dentry = parent; - } - if (!(seq & 1)) - rcu_read_unlock(); - if (need_seqretry(&rename_lock, seq)) { - seq = 1; - goto restart; + prefetch(parent); + if (!prepend_name(&b, &dentry->d_name)) + break; + dentry = parent; + } + if (lockless) + rcu_read_unlock(); } - done_seqretry(&rename_lock, seq); + if (b.len == p->len) prepend_char(&b, '/'); return extract_string(&b); ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? 2025-10-01 13:13 ` Oleg Nesterov 2025-10-01 13:46 ` Oleg Nesterov @ 2025-10-02 12:58 ` Oleg Nesterov 1 sibling, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-02 12:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Waiman Long, Will Deacon, linux-kernel On 10/01, Oleg Nesterov wrote: > > +static inline int xxx(seqlock_t *lock, int lockless, int *seq, unsigned long *flags) > +{ > + if (lockless) { > + *seq = read_seqbegin(lock); > + return 1; > + } else if (*seq & 1) { > + if (flags) > + read_sequnlock_excl_irqrestore(lock, *flags); > + else > + read_sequnlock_excl(lock); > + return 0; > + } else if (read_seqretry(lock, *seq)) { > + if (flags) > + read_seqlock_excl_irqsave(lock, *flags); > + else > + read_seqlock_excl(lock); > + *seq = 1; > + return 1; > + } else { > + return 0; > + } > +} > + > +#define __XXX(lock, lockless, seq, flags) \ > + for (int lockless = 1, seq; xxx(lock, lockless, &seq, flags); lockless = 0) > + > +#define XXX(lock, flags) \ > + __XXX(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq), flags) Or, better static inline int xxx(seqlock_t *lock, int *seq, unsigned long *flags) { int retry = 0; if (*seq & 1) { if (flags) read_sequnlock_excl_irqrestore(lock, *flags); else read_sequnlock_excl(lock); } else if (read_seqretry(lock, *seq)) { retry = *seq = 1; if (flags) read_seqlock_excl_irqsave(lock, *flags); else read_seqlock_excl(lock); } return retry; } #define __XXX(lock, lockless, seq, flags) \ for (int lockless = 1, seq = read_seqbegin(lock); \ lockless || xxx(lock, &seq, flags); \ lockless = 0) #define XXX(lock, flags) \ __XXX(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq), flags) With this version the change in thread_group_cputime() even shrinks .text a little bit. I'd like to send the patch, but I still can't find a good name... 
Peter, will you agree with SEQBEGIN_OR_LOCK(lock, flags) ? Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() 2025-09-28 16:19 [PATCH 0/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry Oleg Nesterov 2025-09-28 16:20 ` [PATCH 1/1] " Oleg Nesterov 2025-09-28 16:20 ` [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? Oleg Nesterov @ 2025-10-05 14:47 ` Oleg Nesterov 2025-10-05 14:49 ` Oleg Nesterov ` (2 subsequent siblings) 5 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 14:47 UTC (permalink / raw) To: Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel OK, let me name it SEQLOCK_READ_SECTION(). Al, could you look at 5/5? Please nack it if you think it makes no sense or wrong. I abused __dentry_path() to add another example. To simplify the review, this is how it looks after the patch: static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p) { const struct dentry *dentry; struct prepend_buffer b; rcu_read_lock(); __SEQLOCK_READ_SECTION(&rename_lock, lockless, seq, NULL) { dentry = d; b = *p; while (!IS_ROOT(dentry)) { const struct dentry *parent = dentry->d_parent; prefetch(parent); if (!prepend_name(&b, &dentry->d_name)) break; dentry = parent; } if (lockless) rcu_read_unlock(); } if (b.len == p->len) prepend_char(&b, '/'); return extract_string(&b); } TODO: add another trivial helper static inline int need_seqretry_or_lock(seqlock_t *lock, int *seq) { int ret = !(*seq & 1) && read_seqretry(lock, *seq); if (ret) *seq = 1; /* make this counter odd */ return ret; } which can be used when the read section is more complex. Say, d_walk(). Oleg. 
--- fs/d_path.c | 31 +++++++++++++------------------ fs/proc/array.c | 9 ++------- fs/proc/base.c | 9 ++------- include/linux/seqlock.h | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/cputime.c | 14 +++----------- 5 files changed, 69 insertions(+), 43 deletions(-) ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() 2025-09-28 16:19 [PATCH 0/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry Oleg Nesterov ` (2 preceding siblings ...) 2025-10-05 14:47 ` [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() Oleg Nesterov @ 2025-10-05 14:49 ` Oleg Nesterov 2025-10-05 14:50 ` [PATCH 1/5] " Oleg Nesterov ` (5 more replies) 2025-10-07 14:20 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Oleg Nesterov 2025-10-08 12:30 ` [PATCH v2 " Oleg Nesterov 5 siblings, 6 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 14:49 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel OK, let me name it SEQLOCK_READ_SECTION(). Al, could you look at 5/5? Please nack it if you think it makes no sense or wrong. I abused __dentry_path() to add another example. To simplify the review, this is how it looks after the patch: static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p) { const struct dentry *dentry; struct prepend_buffer b; rcu_read_lock(); __SEQLOCK_READ_SECTION(&rename_lock, lockless, seq, NULL) { dentry = d; b = *p; while (!IS_ROOT(dentry)) { const struct dentry *parent = dentry->d_parent; prefetch(parent); if (!prepend_name(&b, &dentry->d_name)) break; dentry = parent; } if (lockless) rcu_read_unlock(); } if (b.len == p->len) prepend_char(&b, '/'); return extract_string(&b); } TODO: add another trivial helper static inline int need_seqretry_or_lock(seqlock_t *lock, int *seq) { int ret = !(*seq & 1) && read_seqretry(lock, *seq); if (ret) *seq = 1; /* make this counter odd */ return ret; } which can be used when the read section is more complex. Say, d_walk(). Oleg. 
--- fs/d_path.c | 31 +++++++++++++------------------ fs/proc/array.c | 9 ++------- fs/proc/base.c | 9 ++------- include/linux/seqlock.h | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/cputime.c | 14 +++----------- 5 files changed, 69 insertions(+), 43 deletions(-) ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 1/5] seqlock: introduce SEQLOCK_READ_SECTION() 2025-10-05 14:49 ` Oleg Nesterov @ 2025-10-05 14:50 ` Oleg Nesterov 2025-10-05 15:34 ` Linus Torvalds 2025-10-05 14:50 ` [PATCH 2/5] seqlock: change thread_group_cputime() to use SEQLOCK_READ_SECTION() Oleg Nesterov ` (4 subsequent siblings) 5 siblings, 1 reply; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 14:50 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and error prone. With the new helper the "typical" code like nextseq = 0; do { seq = nextseq; flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq); // read-side critical section nextseq = 1; } while (need_seqretry(&seqlock, seq)); done_seqretry_irqrestore(&seqlock, seq, flags); can be rewritten as SEQLOCK_READ_SECTION(&seqlock, &flags) { // read-side critical section } Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- include/linux/seqlock.h | 49 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 5ce48eab7a2a..8cd36a7c1638 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -1209,4 +1209,53 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) if (seq & 1) read_sequnlock_excl_irqrestore(lock, flags); } + +/* internal helper for SEQLOCK_READ_SECTION() */ +static inline int +seqlock_read_section_retry(seqlock_t *lock, int *seq, unsigned long *flags) +{ + int retry = 0; + + if (*seq & 1) { + if (flags) + read_sequnlock_excl_irqrestore(lock, *flags); + else + read_sequnlock_excl(lock); + } else if (read_seqretry(lock, *seq)) { + retry = *seq = 1; + if (flags) + read_seqlock_excl_irqsave(lock, *flags); + else + read_seqlock_excl(lock); + } + + return retry; +} + +#define __SEQLOCK_READ_SECTION(lock, lockless, seq, flags) \ + for (int 
lockless = 1, seq = read_seqbegin(lock); \ + lockless || seqlock_read_section_retry(lock, &seq, flags); \ + lockless = 0) + +/** + * SEQLOCK_READ_SECTION(lock, flags) - execute the read side critical section + * without manual sequence counter handling + * or calls to other helpers + * @lock: pointer to the seqlock_t protecting the data + * @flags: pointer to unsigned long for irqsave/irqrestore or NULL + * + * Example: + * + * SEQLOCK_READ_SECTION(&lock, &flags) { + * // read-side critical section + * } + * + * Starts with a lockless pass first. If it fails, restarts the critical section + * with the lock held and, if @flags != NULL, with irqs disabled. + * + * The critical section must not contain control flow that escapes the loop. + */ +#define SEQLOCK_READ_SECTION(lock, flags) \ + __SEQLOCK_READ_SECTION(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq), flags) + #endif /* __LINUX_SEQLOCK_H */ -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH 1/5] seqlock: introduce SEQLOCK_READ_SECTION() 2025-10-05 14:50 ` [PATCH 1/5] " Oleg Nesterov @ 2025-10-05 15:34 ` Linus Torvalds 2025-10-05 16:07 ` Oleg Nesterov 0 siblings, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2025-10-05 15:34 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On Sun, 5 Oct 2025 at 07:51, Oleg Nesterov <oleg@redhat.com> wrote: > > +#define __SEQLOCK_READ_SECTION(lock, lockless, seq, flags) \ > + for (int lockless = 1, seq = read_seqbegin(lock); \ > + lockless || seqlock_read_section_retry(lock, &seq, flags); \ > + lockless = 0) > ... > +#define SEQLOCK_READ_SECTION(lock, flags) \ > + __SEQLOCK_READ_SECTION(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq), flags) Ok, I like the helper wrapper, but I don't actually think we need it to be shouty. As far as the users are concerned, the result doesn't end up being really any different from our scoped guards, so I'd actually suggest you just make this helper look like our scoped_guard() macro does. And instead of making people pass in a NULL 'flags', just do a separate version of it, exactly like we do for locking. Even if the internal implementation then ends up sharing most of the code, please don't make people pass in NULL just because they don't want the irqsave version. So make it two different things: scoped_seqlock_read(lock) { .... } scoped_seqlock__read_irqsave(lock, flags) { ... } or something. (Maybe 'flags' can even be local to that scope?) Hmm? Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 1/5] seqlock: introduce SEQLOCK_READ_SECTION() 2025-10-05 15:34 ` Linus Torvalds @ 2025-10-05 16:07 ` Oleg Nesterov 2025-10-05 16:35 ` Linus Torvalds 0 siblings, 1 reply; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 16:07 UTC (permalink / raw) To: Linus Torvalds Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel OK. I'll send v2 but let me ask... On 10/05, Linus Torvalds wrote: > > As far as the users are concerned, the result doesn't end up being > really any different from our scoped guards, so I'd actually suggest > you just make this helper look like our scoped_guard() macro does. I swear, I too thought about scoped_seqlock_xxx ;) > And instead of making people pass in a NULL 'flags', just do a > separate version of it, exactly like we do for locking. Even if the > internal implementation then ends up sharing most of the code, please > don't make people pass in NULL just because they don't want the > irqsave version. > > So make it two different things: > > scoped_seqlock_read(lock) { .... } > > scoped_seqlock__read_irqsave(lock, flags) { ... } OK. But if you don't object I'd like to avoid another DEFINE_LOCK_GUARD() or something like it in this case. To me it won't buy anything. And I think that the "generic" seqlock_read_section_retry(flags) makes sense, the "if (flags)" checks should not generate the extra code. Will you agree? > (Maybe 'flags' can even be local to that scope?) The problem is that you can't declare "int lockless/seq" and "unsigned long flags" inside "for (...)", but I'll try to think about it more. Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 1/5] seqlock: introduce SEQLOCK_READ_SECTION() 2025-10-05 16:07 ` Oleg Nesterov @ 2025-10-05 16:35 ` Linus Torvalds 0 siblings, 0 replies; 74+ messages in thread From: Linus Torvalds @ 2025-10-05 16:35 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On Sun, 5 Oct 2025 at 09:09, Oleg Nesterov <oleg@redhat.com> wrote: > > OK. But if you don't object I'd like to avoid another DEFINE_LOCK_GUARD() > or something like it in this case. To me it won't buy anything. Oh, absolutely. I didn't mean that you actually *use* the fancy GUARD infrastructure we have: this code doesn't do the whole '__cleanup()' thing at all. I only mean that as far as users are concerned, this all looks similar, so please use a similar interface even if from an implementation standpoint it is very different. > The problem is that you can't declare "int lockless/seq" and > "unsigned long flags" inside "for (...)", but I'll try to think about > it more. I agree, it's an annoying limitation of the for-loop variable declaration thing that you can only declare one type. There's a somewhat strange solution to it, though: declare a structure with multiple members of the types you want: for (struct { char *a; int b; } c = {NULL, 1}; ... } and then you use 'c.a' and 'c.b' and friends inside the loop. It looks pretty odd, but it actually works fine and we should probably use it more. [ The *really* hacky solution is to just use one type and pick the biggest type and then possibly mis-use that type. That's effectively what you did with 'lockless': it's declared as an 'int', but it's really effectively used as a 'bool' anyway. 
Of course, in that 'lockless' case it's not as noticeable, since we often use int/bool rather interchangeably in the kernel for historical reasons anyway, so you probably didn't think of it as a hacky solution ] Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 2/5] seqlock: change thread_group_cputime() to use SEQLOCK_READ_SECTION() 2025-10-05 14:49 ` Oleg Nesterov 2025-10-05 14:50 ` [PATCH 1/5] " Oleg Nesterov @ 2025-10-05 14:50 ` Oleg Nesterov 2025-10-05 14:50 ` [PATCH 3/5] seqlock: change do_task_stat() " Oleg Nesterov ` (3 subsequent siblings) 5 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 14:50 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel To simplify the code and make it more readable. While at it, change thread_group_cputime() to use __for_each_thread(sig). Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- kernel/sched/cputime.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 7097de2c8cda..2d691075b6d6 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -315,7 +315,6 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) struct signal_struct *sig = tsk->signal; u64 utime, stime; struct task_struct *t; - unsigned int seq, nextseq; unsigned long flags; /* @@ -330,25 +329,18 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) (void) task_sched_runtime(current); rcu_read_lock(); - /* Attempt a lockless read on the first round. */ - nextseq = 0; - do { - seq = nextseq; - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + SEQLOCK_READ_SECTION(&sig->stats_lock, &flags) { times->utime = sig->utime; times->stime = sig->stime; times->sum_exec_runtime = sig->sum_sched_runtime; - for_each_thread(tsk, t) { + __for_each_thread(sig, t) { task_cputime(t, &utime, &stime); times->utime += utime; times->stime += stime; times->sum_exec_runtime += read_sum_exec_runtime(t); } - /* If lockless access failed, take the lock. 
*/ - nextseq = 1; - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + } rcu_read_unlock(); } -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH 3/5] seqlock: change do_task_stat() to use SEQLOCK_READ_SECTION() 2025-10-05 14:49 ` Oleg Nesterov 2025-10-05 14:50 ` [PATCH 1/5] " Oleg Nesterov 2025-10-05 14:50 ` [PATCH 2/5] seqlock: change thread_group_cputime() to use SEQLOCK_READ_SECTION() Oleg Nesterov @ 2025-10-05 14:50 ` Oleg Nesterov 2025-10-05 14:50 ` [PATCH 4/5] seqlock: change do_io_accounting() " Oleg Nesterov ` (2 subsequent siblings) 5 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 14:50 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel To simplify the code and make it more readable. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/proc/array.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index d6a0369caa93..99bcaa58a2bd 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -483,7 +483,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, unsigned long flags; int exit_code = task->exit_code; struct signal_struct *sig = task->signal; - unsigned int seq = 1; state = *get_task_state(task); vsize = eip = esp = 0; @@ -540,10 +539,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, if (permitted && (!whole || num_threads < 2)) wchan = !task_is_running(task); - do { - seq++; /* 2 on the 1st/lockless path, otherwise odd */ - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); - + SEQLOCK_READ_SECTION(&sig->stats_lock, &flags) { cmin_flt = sig->cmin_flt; cmaj_flt = sig->cmaj_flt; cutime = sig->cutime; @@ -565,8 +561,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, } rcu_read_unlock(); } - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + } if (whole) { thread_group_cputime_adjusted(task, &utime, &stime); -- 2.25.1.362.g51ebf55 ^ permalink raw reply 
* [PATCH 4/5] seqlock: change do_io_accounting() to use SEQLOCK_READ_SECTION() 2025-10-05 14:49 ` Oleg Nesterov ` (2 preceding siblings ...) 2025-10-05 14:50 ` [PATCH 3/5] seqlock: change do_task_stat() " Oleg Nesterov @ 2025-10-05 14:50 ` Oleg Nesterov 2025-10-05 14:50 ` [PATCH 5/5] seqlock: change __dentry_path() to use __SEQLOCK_READ_SECTION() Oleg Nesterov 2025-10-05 15:30 ` [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() Al Viro 5 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 14:50 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel To simplify the code and make it more readable. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/proc/base.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 62d35631ba8c..7ec507efbb99 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3041,20 +3041,15 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh if (whole) { struct signal_struct *sig = task->signal; struct task_struct *t; - unsigned int seq = 1; unsigned long flags; rcu_read_lock(); - do { - seq++; /* 2 on the 1st/lockless path, otherwise odd */ - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); - + SEQLOCK_READ_SECTION(&sig->stats_lock, &flags) { acct = sig->ioac; __for_each_thread(sig, t) task_io_accounting_add(&acct, &t->ioac); - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + } rcu_read_unlock(); } else { acct = task->ioac; -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH 5/5] seqlock: change __dentry_path() to use __SEQLOCK_READ_SECTION() 2025-10-05 14:49 ` Oleg Nesterov ` (3 preceding siblings ...) 2025-10-05 14:50 ` [PATCH 4/5] seqlock: change do_io_accounting() " Oleg Nesterov @ 2025-10-05 14:50 ` Oleg Nesterov 2025-10-05 15:48 ` Linus Torvalds 2025-10-05 15:30 ` [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() Al Viro 5 siblings, 1 reply; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 14:50 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel To simplify the code and make it more readable. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/d_path.c | 31 +++++++++++++------------------ 1 file changed, 13 insertions(+), 18 deletions(-) diff --git a/fs/d_path.c b/fs/d_path.c index bb365511066b..0dab1ab1e78d 100644 --- a/fs/d_path.c +++ b/fs/d_path.c @@ -332,28 +332,23 @@ static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p) { const struct dentry *dentry; struct prepend_buffer b; - int seq = 0; rcu_read_lock(); -restart: - dentry = d; - b = *p; - read_seqbegin_or_lock(&rename_lock, &seq); - while (!IS_ROOT(dentry)) { - const struct dentry *parent = dentry->d_parent; + __SEQLOCK_READ_SECTION(&rename_lock, lockless, seq, NULL) { + dentry = d; + b = *p; + while (!IS_ROOT(dentry)) { + const struct dentry *parent = dentry->d_parent; - prefetch(parent); - if (!prepend_name(&b, &dentry->d_name)) - break; - dentry = parent; - } - if (!(seq & 1)) - rcu_read_unlock(); - if (need_seqretry(&rename_lock, seq)) { - seq = 1; - goto restart; + prefetch(parent); + if (!prepend_name(&b, &dentry->d_name)) + break; + dentry = parent; + } + if (lockless) + rcu_read_unlock(); } - done_seqretry(&rename_lock, seq); + if (b.len == p->len) prepend_char(&b, '/'); return extract_string(&b); -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH 5/5] seqlock: change __dentry_path() to use __SEQLOCK_READ_SECTION() 2025-10-05 14:50 ` [PATCH 5/5] seqlock: change __dentry_path() to use __SEQLOCK_READ_SECTION() Oleg Nesterov @ 2025-10-05 15:48 ` Linus Torvalds 0 siblings, 0 replies; 74+ messages in thread From: Linus Torvalds @ 2025-10-05 15:48 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On Sun, 5 Oct 2025 at 07:52, Oleg Nesterov <oleg@redhat.com> wrote: > > To simplify the code and make it more readable. Ok, so the other ones looked fine. This one I'm not convinced about. The end result makes no sense taken alone. It does that odd "rcu_read_unlock()" after the first loop, and it basically ends up being very incestuous with that implementation, which means that now you have to understand both pieces intimately to figure it out, and they are not near each other. It *might* be solved by just renaming 'lockless' to 'first_lockless_iteration' - not because it changes the code, but because it makes the logic much more explicit, and then that if (first_lockless_iteration) rcu_read_unlock(); test inside that loop would at least make a lot more conceptual sense without knowing the internal implementation of that macro. But honestly, while I think that would turn it from "too ugly to live" to "I don't love it but I can deal with it", I would wish for something better. Side note: that whole internal loop: while (!IS_ROOT(dentry)) { const struct dentry *parent = dentry->d_parent; prefetch(parent); if (!prepend_name(&b, &dentry->d_name)) break; dentry = parent; } should be a helper function of its own, I think. And if you do that, maybe you can switch the whole thing over to just making the first non-locked iteration be an explicitly separate phase? I dunno. I don't love that code in the existing format - but I think you ended up hiding that subtlety even more. 
Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() 2025-10-05 14:49 ` Oleg Nesterov ` (4 preceding siblings ...) 2025-10-05 14:50 ` [PATCH 5/5] seqlock: change __dentry_path() to use __SEQLOCK_READ_SECTION() Oleg Nesterov @ 2025-10-05 15:30 ` Al Viro 2025-10-05 17:40 ` Oleg Nesterov 5 siblings, 1 reply; 74+ messages in thread From: Al Viro @ 2025-10-05 15:30 UTC (permalink / raw) To: Oleg Nesterov Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On Sun, Oct 05, 2025 at 04:49:29PM +0200, Oleg Nesterov wrote: > OK, let me name it SEQLOCK_READ_SECTION(). > > Al, could you look at 5/5? Please nack it if you think it makes no sense > or wrong. I abused __dentry_path() to add another example. To simplify the > review, this is how it looks after the patch: > > static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p) > { > const struct dentry *dentry; > struct prepend_buffer b; > > rcu_read_lock(); > __SEQLOCK_READ_SECTION(&rename_lock, lockless, seq, NULL) { > dentry = d; > b = *p; > while (!IS_ROOT(dentry)) { > const struct dentry *parent = dentry->d_parent; > > prefetch(parent); > if (!prepend_name(&b, &dentry->d_name)) > break; > dentry = parent; > } > if (lockless) > rcu_read_unlock(); <cringe> > } > > if (b.len == p->len) > prepend_char(&b, '/'); > return extract_string(&b); > } > > > TODO: add another trivial helper > > static inline int need_seqretry_or_lock(seqlock_t *lock, int *seq) > { > int ret = !(*seq & 1) && read_seqretry(lock, *seq); > > if (ret) > *seq = 1; /* make this counter odd */ > > return ret; > } > > which can be used when the read section is more complex. Say, d_walk(). prepend_path() would be interesting to look at... ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() 2025-10-05 15:30 ` [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() Al Viro @ 2025-10-05 17:40 ` Oleg Nesterov 0 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-05 17:40 UTC (permalink / raw) To: Al Viro Cc: Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On 10/05, Al Viro wrote: > > On Sun, Oct 05, 2025 at 04:49:29PM +0200, Oleg Nesterov wrote: > > > static char *__dentry_path(const struct dentry *d, struct prepend_buffer *p) > > { > > const struct dentry *dentry; > > struct prepend_buffer b; > > > > rcu_read_lock(); > > __SEQLOCK_READ_SECTION(&rename_lock, lockless, seq, NULL) { > > dentry = d; > > b = *p; > > while (!IS_ROOT(dentry)) { > > const struct dentry *parent = dentry->d_parent; > > > > prefetch(parent); > > if (!prepend_name(&b, &dentry->d_name)) > > break; > > dentry = parent; > > } > > if (lockless) > > rcu_read_unlock(); > > <cringe> Weird or buggy? > prepend_path() would be interesting to look at... Yeah, I have already looked at it. Not sure yet what the "good" cleanup could do. Will think about it more. Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-09-28 16:19 [PATCH 0/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry Oleg Nesterov ` (3 preceding siblings ...) 2025-10-05 14:49 ` Oleg Nesterov @ 2025-10-07 14:20 ` Oleg Nesterov 2025-10-07 14:21 ` [PATCH 1/4] " Oleg Nesterov ` (4 more replies) 2025-10-08 12:30 ` [PATCH v2 " Oleg Nesterov 5 siblings, 5 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-07 14:20 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel OK, please see V2. scoped_seqlock_read_irqsave() uses the "for (struct {} s ..." hack to make "ulong flags" local. I can change scoped_seqlock_read() to do the same, to make it symmetric with _irqsave() if you think it makes sense. I've dropped 5/5 which changed __dentry_path() for now. Oleg. fs/proc/array.c | 9 ++------ fs/proc/base.c | 10 ++------ include/linux/seqlock.h | 61 +++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/cputime.c | 15 +++--------- 4 files changed, 68 insertions(+), 27 deletions(-) ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-07 14:20 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Oleg Nesterov @ 2025-10-07 14:21 ` Oleg Nesterov 2025-10-07 16:35 ` Waiman Long 2025-10-07 14:21 ` [PATCH 2/4] seqlock: change thread_group_cputime() to use scoped_seqlock_read_irqsave() Oleg Nesterov ` (3 subsequent siblings) 4 siblings, 1 reply; 74+ messages in thread From: Oleg Nesterov @ 2025-10-07 14:21 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and error prone. With the new helper the "typical" code like int seq, nextseq; unsigned long flags; nextseq = 0; do { seq = nextseq; flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq); // read-side critical section nextseq = 1; } while (need_seqretry(&seqlock, seq)); done_seqretry_irqrestore(&seqlock, seq, flags); can be rewritten as scoped_seqlock_read_irqsave (&seqlock) { // read-side critical section } Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- include/linux/seqlock.h | 61 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 5ce48eab7a2a..9012702fd0a8 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -1209,4 +1209,65 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) if (seq & 1) read_sequnlock_excl_irqrestore(lock, flags); } + +/* internal helper for scoped_seqlock_read/scoped_seqlock_read_irqsave */ +static inline int +scoped_seqlock_read_retry(seqlock_t *lock, int *seq, unsigned long *flags) +{ + int retry = 0; + + if (*seq & 1) { + if (flags) + read_sequnlock_excl_irqrestore(lock, *flags); + else + read_sequnlock_excl(lock); + } else if (read_seqretry(lock, *seq)) { + retry = *seq = 1; + if 
(flags) + read_seqlock_excl_irqsave(lock, *flags); + else + read_seqlock_excl(lock); + } + + return retry; +} + +#define __scoped_seqlock_read(lock, lockless, seq) \ + for (int lockless = 1, seq = read_seqbegin(lock); \ + lockless || scoped_seqlock_read_retry(lock, &seq, NULL); \ + lockless = 0) + +/** + * scoped_seqlock_read(lock) - execute the read side critical section + * without manual sequence counter handling + * or calls to other helpers + * @lock: pointer to the seqlock_t protecting the data + * + * Example: + * + * scoped_seqlock_read(&lock) { + * // read-side critical section + * } + * + * Starts with a lockless pass first. If it fails, restarts the critical + * section with the lock held. + * + * The critical section must not contain control flow that escapes the loop. + */ +#define scoped_seqlock_read(lock) \ + __scoped_seqlock_read(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq)) + +#define __scoped_seqlock_read_irqsave(lock, s) \ + for (struct { int lockless, seq; ulong flags; } s = { 1, read_seqbegin(lock) }; \ + s.lockless || scoped_seqlock_read_retry(lock, &s.seq, &s.flags); \ + s.lockless = 0) + +/** + * scoped_seqlock_read_irqsave(lock) - same as scoped_seqlock_read() but + * disables irqs on a locking pass + * @lock: pointer to the seqlock_t protecting the data + */ +#define scoped_seqlock_read_irqsave(lock) \ + __scoped_seqlock_read_irqsave(lock, __UNIQUE_ID(s)) + #endif /* __LINUX_SEQLOCK_H */ -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-07 14:21 ` [PATCH 1/4] " Oleg Nesterov @ 2025-10-07 16:35 ` Waiman Long 2025-10-07 17:18 ` Oleg Nesterov 0 siblings, 1 reply; 74+ messages in thread From: Waiman Long @ 2025-10-07 16:35 UTC (permalink / raw) To: Oleg Nesterov, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Will Deacon Cc: linux-kernel On 10/7/25 10:21 AM, Oleg Nesterov wrote: > The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and > error prone. With the new helper the "typical" code like > > int seq, nextseq; > unsigned long flags; > > nextseq = 0; > do { > seq = nextseq; > flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq); > > // read-side critical section > > nextseq = 1; > } while (need_seqretry(&seqlock, seq)); > done_seqretry_irqrestore(&seqlock, seq, flags); > > can be rewritten as > > scoped_seqlock_read_irqsave (&seqlock) { > // read-side critical section > } > > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > --- > include/linux/seqlock.h | 61 +++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 61 insertions(+) > > diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h > index 5ce48eab7a2a..9012702fd0a8 100644 > --- a/include/linux/seqlock.h > +++ b/include/linux/seqlock.h > @@ -1209,4 +1209,65 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) > if (seq & 1) > read_sequnlock_excl_irqrestore(lock, flags); > } > + > +/* internal helper for scoped_seqlock_read/scoped_seqlock_read_irqsave */ > +static inline int > +scoped_seqlock_read_retry(seqlock_t *lock, int *seq, unsigned long *flags) I would suggest adding the "__" prefix to indicate that this is an internal helper that shouldn't be called directly. 
> +{ > + int retry = 0; > + > + if (*seq & 1) { > + if (flags) > + read_sequnlock_excl_irqrestore(lock, *flags); > + else > + read_sequnlock_excl(lock); > + } else if (read_seqretry(lock, *seq)) { > + retry = *seq = 1; > + if (flags) > + read_seqlock_excl_irqsave(lock, *flags); > + else > + read_seqlock_excl(lock); > + } > + > + return retry; > +} > + > +#define __scoped_seqlock_read(lock, lockless, seq) \ > + for (int lockless = 1, seq = read_seqbegin(lock); \ > + lockless || scoped_seqlock_read_retry(lock, &seq, NULL); \ > + lockless = 0) I like Linus' suggestion of putting lockless and seq into a struct to make it more consistent with __scoped_seqlock_read_irqsave(). > + > +/** > + * scoped_seqlock_read(lock) - execute the read side critical section > + * without manual sequence counter handling > + * or calls to other helpers > + * @lock: pointer to the seqlock_t protecting the data > + * > + * Example: > + * > + * scoped_seqlock_read(&lock) { > + * // read-side critical section > + * } > + * > + * Starts with a lockless pass first. If it fails, restarts the critical > + * section with the lock held. > + * > + * The critical section must not contain control flow that escapes the loop. > + */ > +#define scoped_seqlock_read(lock) \ > + __scoped_seqlock_read(lock, __UNIQUE_ID(lockless), __UNIQUE_ID(seq)) > + > +#define __scoped_seqlock_read_irqsave(lock, s) \ > + for (struct { int lockless, seq; ulong flags; } s = { 1, read_seqbegin(lock) }; \ > + s.lockless || scoped_seqlock_read_retry(lock, &s.seq, &s.flags); \ > + s.lockless = 0) > + > +/** > + * scoped_seqlock_read_irqsave(lock) - same as scoped_seqlock_read() but > + * disables irqs on a locking pass > + * @lock: pointer to the seqlock_t protecting the data Maybe we should we should add a comment saying that this API is similar to scoped_seqlock_read() but with irqs disabled. 
> + */ > +#define scoped_seqlock_read_irqsave(lock) \ > + __scoped_seqlock_read_irqsave(lock, __UNIQUE_ID(s)) > + > #endif /* __LINUX_SEQLOCK_H */ Other than these minor nits, the patch looks good to me. Cheers, Longman ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-07 16:35 ` Waiman Long @ 2025-10-07 17:18 ` Oleg Nesterov 2025-10-07 17:21 ` Waiman Long 0 siblings, 1 reply; 74+ messages in thread From: Oleg Nesterov @ 2025-10-07 17:18 UTC (permalink / raw) To: Waiman Long Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Will Deacon, linux-kernel On 10/07, Waiman Long wrote: > > On 10/7/25 10:21 AM, Oleg Nesterov wrote: > >+ > >+/* internal helper for scoped_seqlock_read/scoped_seqlock_read_irqsave */ > >+static inline int > >+scoped_seqlock_read_retry(seqlock_t *lock, int *seq, unsigned long *flags) > I would suggest adding the "__" prefix to indicate that this is an internal > helper that shouldn't be called directly. OK, I will add "__", but I thought that "internal helper" makes it clear that it shouldn't be called directly. Nevermind, will do. > >+#define __scoped_seqlock_read(lock, lockless, seq) \ > >+ for (int lockless = 1, seq = read_seqbegin(lock); \ > >+ lockless || scoped_seqlock_read_retry(lock, &seq, NULL); \ > >+ lockless = 0) > > I like Linus' suggestion of putting lockless and seq into a struct to make > it more consistent with __scoped_seqlock_read_irqsave(). Again, will do. See my reply to Linus. > >+/** > >+ * scoped_seqlock_read_irqsave(lock) - same as scoped_seqlock_read() but > >+ * disables irqs on a locking pass > >+ * @lock: pointer to the seqlock_t protecting the data > Maybe we should we should add a comment saying that this API is similar to > scoped_seqlock_read() but with irqs disabled. Hmm... This is what the comment above tries to say... Do you think it can be improved? Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-07 17:18 ` Oleg Nesterov @ 2025-10-07 17:21 ` Waiman Long 0 siblings, 0 replies; 74+ messages in thread From: Waiman Long @ 2025-10-07 17:21 UTC (permalink / raw) To: Oleg Nesterov, Waiman Long Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Will Deacon, linux-kernel On 10/7/25 1:18 PM, Oleg Nesterov wrote: > On 10/07, Waiman Long wrote: >> On 10/7/25 10:21 AM, Oleg Nesterov wrote: >>> + >>> +/* internal helper for scoped_seqlock_read/scoped_seqlock_read_irqsave */ >>> +static inline int >>> +scoped_seqlock_read_retry(seqlock_t *lock, int *seq, unsigned long *flags) >> I would suggest adding the "__" prefix to indicate that this is an internal >> helper that shouldn't be called directly. > OK, I will add "__", but I thought that "internal helper" makes it clear that > it shouldn't be called directly. Nevermind, will do. > >>> +#define __scoped_seqlock_read(lock, lockless, seq) \ >>> + for (int lockless = 1, seq = read_seqbegin(lock); \ >>> + lockless || scoped_seqlock_read_retry(lock, &seq, NULL); \ >>> + lockless = 0) >> I like Linus' suggestion of putting lockless and seq into a struct to make >> it more consistent with __scoped_seqlock_read_irqsave(). > Again, will do. See my reply to Linus. > >>> +/** >>> + * scoped_seqlock_read_irqsave(lock) - same as scoped_seqlock_read() but >>> + * disables irqs on a locking pass >>> + * @lock: pointer to the seqlock_t protecting the data >> Maybe we should we should add a comment saying that this API is similar to >> scoped_seqlock_read() but with irqs disabled. > Hmm... This is what the comment above tries to say... Do you think it can > be improved? Sorry, I missed that. Never mind :-) Cheers, Longman ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH 2/4] seqlock: change thread_group_cputime() to use scoped_seqlock_read_irqsave() 2025-10-07 14:20 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Oleg Nesterov 2025-10-07 14:21 ` [PATCH 1/4] " Oleg Nesterov @ 2025-10-07 14:21 ` Oleg Nesterov 2025-10-07 14:21 ` [PATCH 3/4] seqlock: change do_task_stat() " Oleg Nesterov ` (2 subsequent siblings) 4 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-07 14:21 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel To simplify the code and make it more readable. While at it, change thread_group_cputime() to use __for_each_thread(sig). Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- kernel/sched/cputime.c | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 7097de2c8cda..94b445e947d2 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -315,8 +315,6 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) struct signal_struct *sig = tsk->signal; u64 utime, stime; struct task_struct *t; - unsigned int seq, nextseq; - unsigned long flags; /* * Update current task runtime to account pending time since last @@ -330,25 +328,18 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) (void) task_sched_runtime(current); rcu_read_lock(); - /* Attempt a lockless read on the first round. 
*/ - nextseq = 0; - do { - seq = nextseq; - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + scoped_seqlock_read_irqsave(&sig->stats_lock) { times->utime = sig->utime; times->stime = sig->stime; times->sum_exec_runtime = sig->sum_sched_runtime; - for_each_thread(tsk, t) { + __for_each_thread(sig, t) { task_cputime(t, &utime, &stime); times->utime += utime; times->stime += stime; times->sum_exec_runtime += read_sum_exec_runtime(t); } - /* If lockless access failed, take the lock. */ - nextseq = 1; - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + } rcu_read_unlock(); } -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH 3/4] seqlock: change do_task_stat() to use scoped_seqlock_read_irqsave() 2025-10-07 14:20 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Oleg Nesterov 2025-10-07 14:21 ` [PATCH 1/4] " Oleg Nesterov 2025-10-07 14:21 ` [PATCH 2/4] seqlock: change thread_group_cputime() to use scoped_seqlock_read_irqsave() Oleg Nesterov @ 2025-10-07 14:21 ` Oleg Nesterov 2025-10-07 14:21 ` [PATCH 4/4] seqlock: change do_io_accounting() " Oleg Nesterov 2025-10-07 15:38 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Linus Torvalds 4 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-07 14:21 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel To simplify the code and make it more readable. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/proc/array.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/fs/proc/array.c b/fs/proc/array.c index d6a0369caa93..e9c61448f3ee 100644 --- a/fs/proc/array.c +++ b/fs/proc/array.c @@ -483,7 +483,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, unsigned long flags; int exit_code = task->exit_code; struct signal_struct *sig = task->signal; - unsigned int seq = 1; state = *get_task_state(task); vsize = eip = esp = 0; @@ -540,10 +539,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, if (permitted && (!whole || num_threads < 2)) wchan = !task_is_running(task); - do { - seq++; /* 2 on the 1st/lockless path, otherwise odd */ - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); - + scoped_seqlock_read_irqsave(&sig->stats_lock) { cmin_flt = sig->cmin_flt; cmaj_flt = sig->cmaj_flt; cutime = sig->cutime; @@ -565,8 +561,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns, } rcu_read_unlock(); } - } while 
(need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + } if (whole) { thread_group_cputime_adjusted(task, &utime, &stime); -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH 4/4] seqlock: change do_io_accounting() to use scoped_seqlock_read_irqsave() 2025-10-07 14:20 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Oleg Nesterov ` (2 preceding siblings ...) 2025-10-07 14:21 ` [PATCH 3/4] seqlock: change do_task_stat() " Oleg Nesterov @ 2025-10-07 14:21 ` Oleg Nesterov 2025-10-07 15:38 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Linus Torvalds 4 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-07 14:21 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel To simplify the code and make it more readable. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- fs/proc/base.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 62d35631ba8c..08113bb6af1d 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3041,20 +3041,14 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh if (whole) { struct signal_struct *sig = task->signal; struct task_struct *t; - unsigned int seq = 1; - unsigned long flags; rcu_read_lock(); - do { - seq++; /* 2 on the 1st/lockless path, otherwise odd */ - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); - + scoped_seqlock_read_irqsave(&sig->stats_lock) { acct = sig->ioac; __for_each_thread(sig, t) task_io_accounting_add(&acct, &t->ioac); - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + } rcu_read_unlock(); } else { acct = task->ioac; -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-07 14:20 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Oleg Nesterov ` (3 preceding siblings ...) 2025-10-07 14:21 ` [PATCH 4/4] seqlock: change do_io_accounting() " Oleg Nesterov @ 2025-10-07 15:38 ` Linus Torvalds 2025-10-07 16:34 ` Oleg Nesterov 4 siblings, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2025-10-07 15:38 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On Tue, 7 Oct 2025 at 07:22, Oleg Nesterov <oleg@redhat.com> wrote: > > OK, please see V2. scoped_seqlock_read_irqsave() uses the > "for (struct {} s ..." hack to make "ulong flags" local. Well, patches 2-4 certainly look pretty. That's clearly a *much* nicer interface. The macros to introduce that nicer interface may not be winning any beauty contests, but hey, that's pretty much par for the course. So this does look like a clear improvement to the interface. > I can change scoped_seqlock_read() to do the same, to make > it symmetric with _irqsave() if you think it makes sense. I don't think it's visible to users, but maybe it would make the macros look slightly cleaner. And it would allow making 'lockless' an actual 'bool' - which you admittedly didn't actually do in the irqsave version either. Not that that would matter either - I don't think compilers would care one way or the other. So a matter of taste. I'd personally lean towards doing it just to make that 'use a struct in a for-loop' pattern less of an outlier, and perhaps make people more aware of it. For example, one advantage of doing it that way was that you only needed one single use of __UNIQUE_ID() in the scoped_seqlock_read_irqsave(), because the only ID that ends up being unique is the name of the struct, and then you can have multiple different members. 
I hadn't even thought of that detail, but I thought it was a nice bonus. But I really don't think it *matters*, so I'm happy either way. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-07 15:38 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Linus Torvalds @ 2025-10-07 16:34 ` Oleg Nesterov 0 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-07 16:34 UTC (permalink / raw) To: Linus Torvalds Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On 10/07, Linus Torvalds wrote: > > On Tue, 7 Oct 2025 at 07:22, Oleg Nesterov <oleg@redhat.com> wrote: > > > > I can change scoped_seqlock_read() to do the same, to make > > it symmetric with _irqsave() if you think it makes sense. > > I don't think it's visible to users, but maybe it would make the > macros look slightly cleaner. OK, agreed, will do tomorrow, > And it would allow making 'lockless' an actual 'bool' Yeees... This will slightly increase the "size" of defines, but I guess this doesn't really matter in this case. Thanks, Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH v2 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-09-28 16:19 [PATCH 0/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry Oleg Nesterov ` (4 preceding siblings ...) 2025-10-07 14:20 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Oleg Nesterov @ 2025-10-08 12:30 ` Oleg Nesterov 2025-10-08 12:30 ` [PATCH v2 1/4] " Oleg Nesterov ` (4 more replies) 5 siblings, 5 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-08 12:30 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel Only 1/4 was changed according to comments from Linus and Waiman, but let me resend 2-4 as well. Oleg. fs/proc/array.c | 9 ++----- fs/proc/base.c | 10 ++------ include/linux/seqlock.h | 64 +++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/cputime.c | 15 +++--------- 4 files changed, 71 insertions(+), 27 deletions(-) ^ permalink raw reply [flat|nested] 74+ messages in thread
* [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-08 12:30 ` [PATCH v2 " Oleg Nesterov @ 2025-10-08 12:30 ` Oleg Nesterov 2025-10-08 12:55 ` Peter Zijlstra 2025-10-08 16:05 ` Linus Torvalds 2025-10-08 12:30 ` [PATCH v2 2/4] seqlock: change thread_group_cputime() to use scoped_seqlock_read_irqsave() Oleg Nesterov ` (3 subsequent siblings) 4 siblings, 2 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-08 12:30 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and error prone. With the new helper the "typical" code like int seq, nextseq; unsigned long flags; nextseq = 0; do { seq = nextseq; flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq); // read-side critical section nextseq = 1; } while (need_seqretry(&seqlock, seq)); done_seqretry_irqrestore(&seqlock, seq, flags); can be rewritten as scoped_seqlock_read_irqsave (&seqlock) { // read-side critical section } Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- include/linux/seqlock.h | 64 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 5ce48eab7a2a..fa8d73668f4b 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -1209,4 +1209,68 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) if (seq & 1) read_sequnlock_excl_irqrestore(lock, flags); } + +/* internal helper for scoped_seqlock_read/scoped_seqlock_read_irqsave */ +static inline bool +__scoped_seqlock_read_retry(seqlock_t *lock, int *seq, unsigned long *flags) +{ + bool retry = false; + + if (*seq & 1) { + if (flags) + read_sequnlock_excl_irqrestore(lock, *flags); + else + read_sequnlock_excl(lock); + } else if (read_seqretry(lock, *seq)) { + *seq = 1; + retry = true; + if (flags) + 
read_seqlock_excl_irqsave(lock, *flags); + else + read_seqlock_excl(lock); + } + + return retry; +} + +#define __scoped_seqlock_read(lock, s) \ + for (struct { bool lockless; int seq; } \ + s = { .lockless = true, .seq = read_seqbegin(lock) }; \ + s.lockless || __scoped_seqlock_read_retry(lock, &s.seq, NULL); \ + s.lockless = false) + +#define __scoped_seqlock_read_irqsave(lock, s) \ + for (struct { bool lockless; int seq; unsigned long flags; } \ + s = { .lockless = true, .seq = read_seqbegin(lock) }; \ + s.lockless || __scoped_seqlock_read_retry(lock, &s.seq, &s.flags); \ + s.lockless = false) + +/** + * scoped_seqlock_read(lock) - execute the read side critical section + * without manual sequence counter handling + * or calls to other helpers + * @lock: pointer to the seqlock_t protecting the data + * + * Example: + * + * scoped_seqlock_read(&lock) { + * // read-side critical section + * } + * + * Starts with a lockless pass first. If it fails, restarts the critical + * section with the lock held. + * + * The critical section must not contain control flow that escapes the loop. + */ +#define scoped_seqlock_read(lock) \ + __scoped_seqlock_read(lock, __UNIQUE_ID(s)) + +/** + * scoped_seqlock_read_irqsave(lock) - same as scoped_seqlock_read() but + * disables irqs on a locking pass + * @lock: pointer to the seqlock_t protecting the data + */ +#define scoped_seqlock_read_irqsave(lock) \ + __scoped_seqlock_read_irqsave(lock, __UNIQUE_ID(s)) + #endif /* __LINUX_SEQLOCK_H */ -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-08 12:30 ` [PATCH v2 1/4] " Oleg Nesterov @ 2025-10-08 12:55 ` Peter Zijlstra 2025-10-08 12:59 ` Oleg Nesterov 2025-10-08 16:05 ` Linus Torvalds 1 sibling, 1 reply; 74+ messages in thread From: Peter Zijlstra @ 2025-10-08 12:55 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Waiman Long, Will Deacon, linux-kernel On Wed, Oct 08, 2025 at 02:30:45PM +0200, Oleg Nesterov wrote: > The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and > error prone. With the new helper the "typical" code like > > int seq, nextseq; > unsigned long flags; > > nextseq = 0; > do { > seq = nextseq; > flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq); > > // read-side critical section > > nextseq = 1; > } while (need_seqretry(&seqlock, seq)); > done_seqretry_irqrestore(&seqlock, seq, flags); > > can be rewritten as > > scoped_seqlock_read_irqsave (&seqlock) { > // read-side critical section > } > Hmm, on first reading I was expecting that to be: do { seq = read_seqbegin(&seqlock); // read-side section } while (read_seqretry(&seqlock, seq)); for lack of that _or_lock() wording, but I suppose we can make that something like: scoped_seqbegin_read (&seqlock) { // read-side section } which is distinctive enough. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-08 12:55 ` Peter Zijlstra @ 2025-10-08 12:59 ` Oleg Nesterov 2025-10-08 13:54 ` Peter Zijlstra 0 siblings, 1 reply; 74+ messages in thread From: Oleg Nesterov @ 2025-10-08 12:59 UTC (permalink / raw) To: Peter Zijlstra Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Waiman Long, Will Deacon, linux-kernel On 10/08, Peter Zijlstra wrote: > > On Wed, Oct 08, 2025 at 02:30:45PM +0200, Oleg Nesterov wrote: > > > > scoped_seqlock_read_irqsave (&seqlock) { > > // read-side critical section > > } > > > > Hmm, on first reading I was expecting that to be: > > do { > seq = read_seqbegin(&seqlock); > > // read-side section > > } while (read_seqretry(&seqlock, seq)); > > for lack of that _or_lock() wording, but I suppose we can make that > something like: > > scoped_seqbegin_read (&seqlock) { I was thinking about scoped_seqcount_read() but I agree with any naming Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-08 12:59 ` Oleg Nesterov @ 2025-10-08 13:54 ` Peter Zijlstra 0 siblings, 0 replies; 74+ messages in thread From: Peter Zijlstra @ 2025-10-08 13:54 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Waiman Long, Will Deacon, linux-kernel On Wed, Oct 08, 2025 at 02:59:03PM +0200, Oleg Nesterov wrote: > On 10/08, Peter Zijlstra wrote: > > > > On Wed, Oct 08, 2025 at 02:30:45PM +0200, Oleg Nesterov wrote: > > > > > > scoped_seqlock_read_irqsave (&seqlock) { > > > // read-side critical section > > > } > > > > > > > Hmm, on first reading I was expecting that to be: > > > > do { > > seq = read_seqbegin(&seqlock); > > > > // read-side section > > > > } while (read_seqretry(&seqlock, seq)); > > > > for lack of that _or_lock() wording, but I suppose we can make that > > something like: > > > > scoped_seqbegin_read (&seqlock) { > > I was thinking about scoped_seqcount_read() but I agree with any naming Ah, but that would take a seqcount_*t not a seqlock_t no? Or we can do: scoped_seqcount_read (&seqlock.seqcount) But yeah, for later I suppose. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-08 12:30 ` [PATCH v2 1/4] " Oleg Nesterov 2025-10-08 12:55 ` Peter Zijlstra @ 2025-10-08 16:05 ` Linus Torvalds 2025-10-08 16:55 ` Oleg Nesterov 2025-10-09 5:31 ` Linus Torvalds 1 sibling, 2 replies; 74+ messages in thread From: Linus Torvalds @ 2025-10-08 16:05 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel Bah. Now I'm reading this again, and going through it more carefully, and now I hate your helper. Why? Because I think that with the new organization, you can do so much better. So the old code used "seq" as two different values: - the first loop around, it's the sequence count for the lockless attempt and is always even because of how read_seqbegin() works (and because the original code had that whole 'nextseq' hackery). - the second loop around, it's just a flag that now we hold the lock and you converted the new scheme to that same model. Which works, but in the new scheme, it's just all *pointless*. Why do I say that? Because in the new scheme, you have that explicit "lockless" flag which is so much more legible in the first place. You use it in the for-loop, but you don't use it in the helper function. So all the games with if (*seq & 1) { ... *seq = 1; are horribly non-intuitive and shouldn't exist because it would be much more logical to just use the 'lockless' boolean instead. Those games only make the code harder to understand, and I think they also make the compiler unable to just unroll the "loop" twice. 
So instead of this: static inline bool __scoped_seqlock_read_retry(seqlock_t *lock, int *seq, unsigned long *flags) { bool retry = false; if (*seq & 1) { if (flags) read_sequnlock_excl_irqrestore(lock, *flags); else read_sequnlock_excl(lock); } else if (read_seqretry(lock, *seq)) { *seq = 1; retry = true; if (flags) read_seqlock_excl_irqsave(lock, *flags); else read_seqlock_excl(lock); } return retry; } I think that you should make that helper function be a macro - so that it can take the unnamed union type as its argument - and then make it do something like this instead: #define __scoped_seqlock_read_retry(s) ({ s.lockless ? \ (s.lockless = false) || read_seqretry(lock, s.seq) ? \ ({ read_seqlock_excl(lock); true }) : false : \ ({ read_sequnlock_excl(lock); false }) }) Ok, that hasn't even been close to a compiler and may be completely buggy, but basically the logic is - if lockless: set lockless to false (unconditionally to make it real easy for the compiler), then do the seq_retry, and if that says we should retry, it does the locking and returns true (so that the loop is re-done) - if not lockless, just unlock the lock and return false to say we're done And yes, you'd need the irqsafe version too, and yes, you could do it with the "if (flags)" thing or you could duplicate that macro, but you could also just pass the lock/unlock sequence as an argument, and now that macro looks even simpler #define __scoped_seqlock_retry(s, lock, unlock) ({ s.lockless ? \ (s.lockless = false) || read_seqretry(lock, s.seq) ? \ ({ lock ; true; }) : false : \ ({ unlock ; false }) }) although then the users of that macro will be a bit uglier (maybe do that with another level of macro to make each step simpler). And I think the compiler will be able to optimize this better, because the compiler actually can just follow the 's.lockless' tests and turn all those tests into static jumps when it unrolls that loop twice (first the lockless = true, then the lockless = false). 
Again: I DID NOT FEED THIS TO A COMPILER. It may have syntax errors. It may have logic errors. Or it may be me having a complete breakdown and spouting nonsense. But when re-reading your __scoped_seqlock_read_retry() helper, I really really hated how it played those games with even/odd *seq, and I think it's unnecessary. Am I making sense? Who knows? Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-08 16:05 ` Linus Torvalds @ 2025-10-08 16:55 ` Oleg Nesterov 2025-10-09 5:31 ` Linus Torvalds 1 sibling, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-08 16:55 UTC (permalink / raw) To: Linus Torvalds Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On 10/08, Linus Torvalds wrote: > > Bah. > > Now I'm reading this again, and going through it more carefully, and > now I hate your helper. OK ;) let me read/check your email and reply tomorrow, after sleep. Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-08 16:05 ` Linus Torvalds 2025-10-08 16:55 ` Oleg Nesterov @ 2025-10-09 5:31 ` Linus Torvalds 2025-10-09 7:04 ` Linus Torvalds 1 sibling, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2025-10-09 5:31 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On Wed, 8 Oct 2025 at 09:05, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I think that you should make that helper function be a macro - so that > it can take the unnamed union type as its argument - and then make it > do something like this instead: > > #define __scoped_seqlock_read_retry(s) ({ s.lockless ? \ > (s.lockless = false) || read_seqretry(lock, s.seq) ? \ > ({ read_seqlock_excl(lock); true }) : false : \ > ({ read_sequnlock_excl(lock); false }) }) Actually, it shouldn't do that "s.lockless = false" thing at all - because that's done by the for() loop. So that was just a thinko from trying to translate the whole "*seq = 1" hackery from the old model. But that hackery shouldn't even exist since it's all handled naturally and much more cleanly by the surrounding for-loop. The only thing that the macro needs to worry about is whether it should retry after the lockless case (and take the lock if so), or whether it should just unlock after the locked case. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 5:31 ` Linus Torvalds @ 2025-10-09 7:04 ` Linus Torvalds 2025-10-09 14:37 ` Oleg Nesterov 0 siblings, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2025-10-09 7:04 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel [-- Attachment #1: Type: text/plain, Size: 3140 bytes --] On Wed, 8 Oct 2025 at 22:31, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > But that hackery shouldn't even exist since it's all handled naturally > and much more cleanly by the surrounding for-loop. I'm batting zero. I still think it's true that this logic should be handled by the for-loop, but the "*seq=1" hackery did actually do something. Because it, together with the 'lockless' flag, enumerated three states, not two. It's just that the first state didn't _do_ anything, so it was kind of easy to miss it when I read through the code. So the three states are (a) before anything has been done (lockless, seq is even) (b) the "we've done the lockless, we need to end the loop or take the lock" (!lockless, seq is even) (c) we've done the locked, we need to unlock and end (seq is odd) and so relying on just a 'bool lockless' in the loop isn't enough. I do still think this could be done at the for-loop level, so that the compiler can statically see the stages and unroll it all to not be a loop at all. But I think it means that "lockless" would have to be a "phase", and go 0/1/2. And I did get it to work that way, and did get gcc to unroll it to generate really nice code. 
IOW, I can make this code scoped_seqlock_read(lock) { asm volatile("TEST"); } generate this assembly: .L12: movl (%rdi), %eax #* lock, _25 testb $1, %al #, _25 jne .L14 #, TEST movl (%rdi), %edx # MEM[(const volatile unsigned int *)lock_6(D)], _70 cmpl %edx, %eax # _70, _25 jne .L15 #, ret .L15: subq $8, %rsp #, addq $4, %rdi #, _74 movq %rdi, (%rsp) # _74, %sfp call _raw_spin_lock # TEST movq (%rsp), %rdi # %sfp, _74 addq $8, %rsp #, jmp _raw_spin_unlock # .L14: pause jmp .L12 # which is basically optimal, but I have to say, the hoops I had to jump through to get there makes me suspect it's not worth it. (Note above how there is no sign of 'phase' left in the code, and the only conditionals are on the actual sequence count state, and the nice default action of no sequence count problems is all linear fall-through code). The irqsave version looks the same, just (obviously) calling a different set of spinlock/unlock functions and having one extra register argument for that. But to get there, I had to use not only linux/unroll.h, but also make that for-loop explicitly have that s.phase < 3 test just to get the unrolling to actually trigger. I'm attaching the test-case I wrote that does this. It's not *horrific*, but the extra unrolling hackery makes me doubt that it's really worth it. So I suspect your patch is fine as-is, and I was not just wrong on relying on the (insufficient) "locked" boolean, but the more complete solution is just too damn ugly. But hey, I *did* get gcc to generate nice code. So you might look at my test-case below and decide that maybe there's something to be said for this. 
Linus [-- Attachment #2: t.c --] [-- Type: text/x-csrc, Size: 1357 bytes --] #include <linux/seqlock.h> #include <linux/unroll.h> #define __scoped_seqlock_read(lock, s, cond) \ unrolled_full for (struct { unsigned long phase, data; } s = { 0 }; \ s.phase < 3 && cond(lock, s.phase, &s.data); s.phase++) #define scoped_seqlock_read(lock) \ __scoped_seqlock_read(lock, __UNIQUE_ID(s), __seqlock_cond) #define scoped_seqlock_read_irqsave(lock) \ __scoped_seqlock_read(lock, __UNIQUE_ID(s), __seqlock_cond_irqsave) /* Do the phase action and return true if the loop should go on */ static __always_inline bool __seqlock_cond(seqlock_t *lock, int phase, unsigned long *data) { switch (phase) { case 0: *data = read_seqbegin(lock); return true; case 1: if (!read_seqretry(lock, *data)) return false; read_seqlock_excl(lock); return true; default: read_sequnlock_excl(lock); return false; } } static __always_inline bool __seqlock_cond_irqsave(seqlock_t *lock, int phase, unsigned long *data) { switch (phase) { case 0: *data = read_seqbegin(lock); return true; case 1: if (!read_seqretry(lock, *data)) return false; read_seqlock_excl_irqsave(lock, *data); return true; default: read_sequnlock_excl_irqrestore(lock, *data); return false; } } void testme(seqlock_t *lock); void testme(seqlock_t *lock) { scoped_seqlock_read_irqsave(lock) { asm volatile("TEST"); } } ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 7:04 ` Linus Torvalds @ 2025-10-09 14:37 ` Oleg Nesterov 2025-10-09 16:18 ` Linus Torvalds 2025-10-09 19:50 ` Peter Zijlstra 0 siblings, 2 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-09 14:37 UTC (permalink / raw) To: Linus Torvalds Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On 10/09, Linus Torvalds wrote: > > I'm batting zero. I still think it's true that this logic should be > handled by the for-loop, but the *seq=1" hackery did actually do > something. Yes... > But I think it means that "lockless" would have to be a "phase", and go 0/1/2. OK, I like your version more than mine. In particular, the fact that it puts seq/flags into the unsigned long data "union", somehow I didn't think about this option. Plus, yes, it removes the "make/check seq is odd". Nice. But do we really want to unroll the loop? This code should be optimized for the likely case when the lockless pass succeeds. Let me think a bit more before I send V3... > #define __scoped_seqlock_read(lock, s, cond) \ > unrolled_full for (struct { unsigned long phase, data; } s = { 0 }; \ > s.phase < 3 && cond(lock, s.phase, &s.data); s.phase++) I guess in your version "struct" doesn't buy too much, just saves one __UNIQUE_ID(). But this is a matter of taste. Thanks! Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 14:37 ` Oleg Nesterov @ 2025-10-09 16:18 ` Linus Torvalds 2025-10-09 19:50 ` Peter Zijlstra 1 sibling, 0 replies; 74+ messages in thread From: Linus Torvalds @ 2025-10-09 16:18 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Peter Zijlstra, Waiman Long, Will Deacon, linux-kernel On Thu, 9 Oct 2025 at 07:39, Oleg Nesterov <oleg@redhat.com> wrote: > > But do we really want to unroll the loop? This code should be optimized > for the likely case when the lockless pass succeeds. Dropping the unroll magic certainly makes the patch more palatable, and maybe it doesn't matter that much. It really was mainly me going "this should be easy for the compiler to see the flow statically", and the need to keep two state variables annoyed me. But I guess many of the users are complicated enough that you actually don't want to have two copies of the inner body. If they weren't complicated, they'd just use the simpler do { } while (read_seqcount_retry(...)); model. So you are right, I was probably chasing that optimal code generation for no real reason. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 14:37 ` Oleg Nesterov 2025-10-09 16:18 ` Linus Torvalds @ 2025-10-09 19:50 ` Peter Zijlstra 2025-10-09 20:11 ` Peter Zijlstra 1 sibling, 1 reply; 74+ messages in thread From: Peter Zijlstra @ 2025-10-09 19:50 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Thu, Oct 09, 2025 at 04:37:49PM +0200, Oleg Nesterov wrote: > Let me think a bit more before I send V3... How do we feel about something a little like so? diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 5ce48eab7a2a..9786b8d14164 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -1209,4 +1209,85 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) if (seq & 1) read_sequnlock_excl_irqrestore(lock, flags); } + +enum ss_state { + sss_done = 0, + sss_locked, + sss_lockless, +}; + +enum ss_type { + ss_lockless, + ss_lock, + ss_lock_irqsave, +}; + +struct ss_tmp { + enum ss_type type; + enum ss_state state; + int seq; + unsigned long flags; + spinlock_t *lock; +}; + +static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst) +{ + if (!sst->lock) + return; + + if (sst->type == ss_lock_irqsave) { + spin_unlock_irqrestore(sst->lock, sst->flags); + return; + } + + spin_unlock(sst->lock); +} + +static inline void +__scoped_seqlock_next(struct ss_tmp *sst, seqlock_t *lock) +{ + switch (sst->state) { + case sss_lockless: + if (!read_seqretry(lock, sst->seq)) { + sst->state = sss_done; + return; + } + + switch (sst->type) { + case ss_lock: + sst->lock = &lock->lock; + spin_lock(sst->lock); + sst->state = sss_locked; + return; + + case ss_lock_irqsave: + sst->lock = &lock->lock; + spin_lock_irqsave(sst->lock, sst->flags); + sst->state = sss_locked; + return; + + case ss_lockless: + sst->seq = read_seqbegin(lock); + return; + } + + case 
sss_locked: + sst->state = sss_done; + return; + + case sss_done: + BUG(); + } +} + +#define __scoped_seqlock_read(_seqlock, _type, _s) \ + for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) = { \ + .type = _type, .state = sss_lockless, \ + .seq = read_seqbegin(_seqlock), .lock = NULL }; \ + _s.state != sss_done; \ + __scoped_seqlock_next(&_s, _seqlock)) + +#define scoped_seqlock_read(_seqlock, _type) \ + __scoped_seqlock_read(_seqlock, _type, __UNIQUE_ID(seqlock)) + #endif /* __LINUX_SEQLOCK_H */ diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 7097de2c8cda..d2b3f987c888 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -313,10 +313,8 @@ static u64 read_sum_exec_runtime(struct task_struct *t) void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) { struct signal_struct *sig = tsk->signal; - u64 utime, stime; struct task_struct *t; - unsigned int seq, nextseq; - unsigned long flags; + u64 utime, stime; /* * Update current task runtime to account pending time since last @@ -329,27 +327,19 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) if (same_thread_group(current, tsk)) (void) task_sched_runtime(current); - rcu_read_lock(); - /* Attempt a lockless read on the first round. */ - nextseq = 0; - do { - seq = nextseq; - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + guard(rcu)(); + scoped_seqlock_read(&sig->stats_lock, ss_lock_irqsave) { times->utime = sig->utime; times->stime = sig->stime; times->sum_exec_runtime = sig->sum_sched_runtime; - for_each_thread(tsk, t) { + __for_each_thread(sig, t) { task_cputime(t, &utime, &stime); times->utime += utime; times->stime += stime; times->sum_exec_runtime += read_sum_exec_runtime(t); } - /* If lockless access failed, take the lock. 
*/ - nextseq = 1; - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); - rcu_read_unlock(); + } } #ifdef CONFIG_IRQ_TIME_ACCOUNTING ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 19:50 ` Peter Zijlstra @ 2025-10-09 20:11 ` Peter Zijlstra 2025-10-09 20:24 ` Linus Torvalds 2025-10-21 10:35 ` [tip: locking/core] seqlock: Introduce scoped_seqlock_read() tip-bot2 for Peter Zijlstra 0 siblings, 2 replies; 74+ messages in thread From: Peter Zijlstra @ 2025-10-09 20:11 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Thu, Oct 09, 2025 at 09:50:24PM +0200, Peter Zijlstra wrote: > On Thu, Oct 09, 2025 at 04:37:49PM +0200, Oleg Nesterov wrote: > > > Let me think a bit more before I send V3... > > How do we feel about something a little like so? Slightly nicer version that's actually compiled :-) --- diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 5ce48eab7a2a..7273fddc19a2 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -1209,4 +1209,87 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) if (seq & 1) read_sequnlock_excl_irqrestore(lock, flags); } + +enum ss_state { + ss_done = 0, + ss_lock, + ss_lock_irqsave, + ss_lockless, +}; + +struct ss_tmp { + enum ss_state state; + int seq; + unsigned long flags; + spinlock_t *lock; +}; + +static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst) +{ + if (!sst->lock) + return; + + if (sst->state == ss_lock_irqsave) { + spin_unlock_irqrestore(sst->lock, sst->flags); + return; + } + + spin_unlock(sst->lock); +} + +extern void __scoped_seqlock_fail(void); + +static inline void +__scoped_seqlock_next(struct ss_tmp *sst, seqlock_t *lock, enum ss_state target) +{ + switch (sst->state) { + case ss_lockless: + if (!read_seqretry(lock, sst->seq)) { + sst->state = ss_done; + return; + } + + switch (target) { + case ss_done: + __scoped_seqlock_fail(); + return; + + case ss_lock: + sst->lock = &lock->lock; + 
spin_lock(sst->lock); + sst->state = ss_lock; + return; + + case ss_lock_irqsave: + sst->lock = &lock->lock; + spin_lock_irqsave(sst->lock, sst->flags); + sst->state = ss_lock_irqsave; + return; + + case ss_lockless: + sst->seq = read_seqbegin(lock); + return; + } + BUG(); + + case ss_lock: + case ss_lock_irqsave: + sst->state = ss_done; + return; + + case ss_done: + BUG(); + } +} + +#define __scoped_seqlock_read(_seqlock, _target, _s) \ + for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) = { \ + .state = ss_lockless, .seq = read_seqbegin(_seqlock), \ + .lock = NULL }; \ + _s.state != ss_done; \ + __scoped_seqlock_next(&_s, _seqlock, _target)) + +#define scoped_seqlock_read(_seqlock, _target) \ + __scoped_seqlock_read(_seqlock, _target, __UNIQUE_ID(seqlock)) + #endif /* __LINUX_SEQLOCK_H */ diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 7097de2c8cda..d2b3f987c888 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -313,10 +313,8 @@ static u64 read_sum_exec_runtime(struct task_struct *t) void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) { struct signal_struct *sig = tsk->signal; - u64 utime, stime; struct task_struct *t; - unsigned int seq, nextseq; - unsigned long flags; + u64 utime, stime; /* * Update current task runtime to account pending time since last @@ -329,27 +327,19 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) if (same_thread_group(current, tsk)) (void) task_sched_runtime(current); - rcu_read_lock(); - /* Attempt a lockless read on the first round. 
*/ - nextseq = 0; - do { - seq = nextseq; - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + guard(rcu)(); + scoped_seqlock_read(&sig->stats_lock, ss_lock_irqsave) { times->utime = sig->utime; times->stime = sig->stime; times->sum_exec_runtime = sig->sum_sched_runtime; - for_each_thread(tsk, t) { + __for_each_thread(sig, t) { task_cputime(t, &utime, &stime); times->utime += utime; times->stime += stime; times->sum_exec_runtime += read_sum_exec_runtime(t); } - /* If lockless access failed, take the lock. */ - nextseq = 1; - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); - rcu_read_unlock(); + } } #ifdef CONFIG_IRQ_TIME_ACCOUNTING ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 20:11 ` Peter Zijlstra @ 2025-10-09 20:24 ` Linus Torvalds 2025-10-09 22:12 ` Peter Zijlstra 2025-10-21 10:35 ` [tip: locking/core] seqlock: Introduce scoped_seqlock_read() tip-bot2 for Peter Zijlstra 1 sibling, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2025-10-09 20:24 UTC (permalink / raw) To: Peter Zijlstra Cc: Oleg Nesterov, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Thu, 9 Oct 2025 at 13:12, Peter Zijlstra <peterz@infradead.org> wrote: > > Slightly nicer version that's actually compiled :-) I assume that "target of ss_lockless" is an intentional extension to just make the loop never take a lock at all? I do like it, but I think you'll find that having a separate 'seq' and 'flags' like this: > +struct ss_tmp { > + enum ss_state state; > + int seq; > + unsigned long flags; > + spinlock_t *lock; > +}; makes it unnecessarily waste a register. You never need both seq and flags at the same time, since if you take the spinlock the sequence number is pointless. So please make that a union, and I think it will help avoid wasting a register in the loop. Other than that I like it. Except that "BUG()" really bugs me. It will generate horrendous code for no reason and we *really* shouldn't add BUG statements anyway. Either that inline function is fine, or it isn't. Don't make it generate stupid code for "I'm not fine" that will also be a huge pain to debug because if that code is buggy it will presumably trigger in context where the machine will be dead, dead, dead. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 20:24 ` Linus Torvalds @ 2025-10-09 22:12 ` Peter Zijlstra 2025-10-09 22:55 ` Linus Torvalds 2025-10-09 23:20 ` Peter Zijlstra 0 siblings, 2 replies; 74+ messages in thread From: Peter Zijlstra @ 2025-10-09 22:12 UTC (permalink / raw) To: Linus Torvalds Cc: Oleg Nesterov, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Thu, Oct 09, 2025 at 01:24:51PM -0700, Linus Torvalds wrote: > On Thu, 9 Oct 2025 at 13:12, Peter Zijlstra <peterz@infradead.org> wrote: > > > > Slightly nicer version that's actually compiled :-) > > I assume that "target of ss_lockless" is an intentional extension to > just make the loop never take a lock at all? Yep. Almost came for free, so might as well do it. > I do like it, but I think you'll find that having a separate 'seq' and > 'flags' like this: > > > +struct ss_tmp { > > + enum ss_state state; > > + int seq; > > + unsigned long flags; > > + spinlock_t *lock; > > +}; > > makes it unnecessarily waste a register. > > You never need both seq and flags at the same time, since if you take > the spinlock the sequence number is pointless. > > So please make that a union, and I think it will help avoid wasting a > register in the loop. Sure; otoh compiler should be able to tell the same using liveness analysis I suppose, but perhaps they're not *that* clever. > Other than that I like it. Except that "BUG()" really bugs me. It will > generate horrendous code for no reason and we *really* shouldn't add > BUG statements anyway. > > Either that inline function is fine, or it isn't. Don't make it > generate stupid code for "I'm not fine" that will also be a huge pain > to debug because if that code is buggy it will presumably trigger in > context where the machine will be dead, dead, dead. 
So I thought they were fine; we handle all the enum cases with 'return' so it's impossible to not exit the switch() but the silly compiler was complaining about possible fall-through, so clearly it was getting confused. The ss_done case should never trip either -- perhaps I should make that one of those __scoped_seqlock_fail like things as well. Anyway, I'll play around with it some (and look at actual code-gen) tomorrow. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 22:12 ` Peter Zijlstra @ 2025-10-09 22:55 ` Linus Torvalds 2025-10-10 8:03 ` Peter Zijlstra 2025-10-09 23:20 ` Peter Zijlstra 1 sibling, 1 reply; 74+ messages in thread From: Linus Torvalds @ 2025-10-09 22:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Oleg Nesterov, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Thu, 9 Oct 2025 at 15:12, Peter Zijlstra <peterz@infradead.org> wrote: > > Sure; otoh compiler should be able to tell the same using liveness > analysis I suppose, but perhaps they're not *that* clever. They are that clever, but only if they end up unrolling the loop statically. If the loop remains a loop, the two variables end up live in the same code. And while a compiler _could_ in theory still see that they aren't actually live in the same _iteration_, I don't think any compiler actually ends up being that clever in practice. So making it a union then hopefully gets the compiler to basically use that explicit information. > So I thought they were fine; we handle all the enum cases with 'return' > so its impossible to not exit the switch() but the silly compiler was > complaining about possible fall-through, so clearly it was getting > confused. Yeah, I found the same thing with the 0/1/2 approach - the compiler wouldn't realize that the range was limited until I added a very explicit limit check that "shouldn't matter", but did. This might obviously end up depending on compiler version and other random things, but in general the whole value range analysis tends to be a pretty fragile thing. In practice, compilers tend to be good at doing value range analysis if they see particular patterns (like initializing it to some value, always incrementing it by one, and comparing against another value). But when it's written more like a state machine like this, it's clearly very hit and miss. 
Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 22:55 ` Linus Torvalds @ 2025-10-10 8:03 ` Peter Zijlstra 2025-10-10 12:32 ` Oleg Nesterov 0 siblings, 1 reply; 74+ messages in thread From: Peter Zijlstra @ 2025-10-10 8:03 UTC (permalink / raw) To: Linus Torvalds Cc: Oleg Nesterov, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Thu, Oct 09, 2025 at 03:55:15PM -0700, Linus Torvalds wrote: > On Thu, 9 Oct 2025 at 15:12, Peter Zijlstra <peterz@infradead.org> wrote: > > > > Sure; otoh compiler should be able to tell the same using liveness > > analysis I suppose, but perhaps they're not *that* clever. > > They are that clever, but only if they end up unrolling the loop > statically. If the loop remains a loop, the two variables end up live > in the same code. > > And while a compiler _could_ in theory still see that they aren't > actually live in the same _iteration_, I don't think any compiler > actually ends up being that clever in practice. > > So making it a union then hopefully gets the compiler to basically use > that explicit information. Right, so I had to use -Os to not make it unroll the thing, but then indeed, sharing the variable helps it. > > So I thought they were fine; we handle all the enum cases with 'return' > > so its impossible to not exit the switch() but the silly compiler was > > complaining about possible fall-through, so clearly it was getting > > confused. > > Yeah, I found the same thing with the 0/1/2 approach - the compiler > wouldn't realize that the range was limited until I added a very > explicit limit check that "shouldn't matter", but did. > > This might obviously end up depending on compiler version and other > random things, but in general the whole value range analysis tends to > be a pretty fragile thing. 
> > In practice, compilers tend to be good at doing value range analysis > if they see particular patterns (like initializing it to some value, > always incrementing it by one, and comparing against another value). > > But when it's written more like a state machine like this, it's > clearly very hit and miss. I reordered the code, it is happier now. Anyway, the below seems to generate decent code for {-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-) --- diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 5ce48eab7a2a..45fab026f7d6 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -1209,4 +1209,83 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) if (seq & 1) read_sequnlock_excl_irqrestore(lock, flags); } + +enum ss_state { + ss_done = 0, + ss_lock, + ss_lock_irqsave, + ss_lockless, +}; + +struct ss_tmp { + enum ss_state state; + unsigned long data; + spinlock_t *lock; + spinlock_t *lock_irqsave; +}; + +static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst) +{ + if (sst->lock) + spin_unlock(sst->lock); + if (sst->lock_irqsave) + spin_unlock_irqrestore(sst->lock, sst->data); +} + +extern void __scoped_seqlock_invalid_target(void); +extern void __scoped_seqlock_bug(void); + +static inline void +__scoped_seqlock_next(struct ss_tmp *sst, seqlock_t *lock, enum ss_state target) +{ + switch (sst->state) { + case ss_done: + __scoped_seqlock_bug(); + return; + + case ss_lock: + case ss_lock_irqsave: + sst->state = ss_done; + return; + + case ss_lockless: + if (!read_seqretry(lock, sst->data)) { + sst->state = ss_done; + return; + } + break; + } + + switch (target) { + case ss_done: + __scoped_seqlock_invalid_target(); + return; + + case ss_lock: + sst->lock = &lock->lock; + spin_lock(sst->lock); + sst->state = ss_lock; + return; + + case ss_lock_irqsave: + sst->lock_irqsave = &lock->lock; + spin_lock_irqsave(sst->lock, sst->data); + sst->state = ss_lock_irqsave; + return; + + case 
ss_lockless: + sst->data = read_seqbegin(lock); + return; + } +} + +#define __scoped_seqlock_read(_seqlock, _target, _s) \ + for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) = \ + { .state = ss_lockless, .data = read_seqbegin(_seqlock) }; \ + _s.state != ss_done; \ + __scoped_seqlock_next(&_s, _seqlock, _target)) + +#define scoped_seqlock_read(_seqlock, _target) \ + __scoped_seqlock_read(_seqlock, _target, __UNIQUE_ID(seqlock)) + #endif /* __LINUX_SEQLOCK_H */ diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 7097de2c8cda..d2b3f987c888 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -313,10 +313,8 @@ static u64 read_sum_exec_runtime(struct task_struct *t) void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) { struct signal_struct *sig = tsk->signal; - u64 utime, stime; struct task_struct *t; - unsigned int seq, nextseq; - unsigned long flags; + u64 utime, stime; /* * Update current task runtime to account pending time since last @@ -329,27 +327,19 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) if (same_thread_group(current, tsk)) (void) task_sched_runtime(current); - rcu_read_lock(); - /* Attempt a lockless read on the first round. */ - nextseq = 0; - do { - seq = nextseq; - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + guard(rcu)(); + scoped_seqlock_read(&sig->stats_lock, ss_lock_irqsave) { times->utime = sig->utime; times->stime = sig->stime; times->sum_exec_runtime = sig->sum_sched_runtime; - for_each_thread(tsk, t) { + __for_each_thread(sig, t) { task_cputime(t, &utime, &stime); times->utime += utime; times->stime += stime; times->sum_exec_runtime += read_sum_exec_runtime(t); } - /* If lockless access failed, take the lock. 
*/ - nextseq = 1; - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); - rcu_read_unlock(); + } } #ifdef CONFIG_IRQ_TIME_ACCOUNTING ^ permalink raw reply related [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-10 8:03 ` Peter Zijlstra @ 2025-10-10 12:32 ` Oleg Nesterov 2025-10-10 13:14 ` Oleg Nesterov 2025-10-10 15:30 ` Linus Torvalds 0 siblings, 2 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-10 12:32 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On 10/10, Peter Zijlstra wrote: > > I reordered the code, it is happier now. > > Anyway, the below seems to generate decent code for > {-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-) Another approach which looks better than mine ;) Linus's version is simpler, but yours can handle break/return and the "only lockless" case, good. I leave this patch to you and Linus, he seems to like your code too. Reviewed-by: Oleg Nesterov <oleg@redhat.com> But... perhaps we should not "export" the _target names and instead add the additional defines, something like scoped_seqlock_read() scoped_seqlock_read_or_lock() scoped_seqlock_read_or_lock_irqsave() ? Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-10 12:32 ` Oleg Nesterov @ 2025-10-10 13:14 ` Oleg Nesterov 2025-10-13 9:03 ` Peter Zijlstra 2025-10-10 15:30 ` Linus Torvalds 1 sibling, 1 reply; 74+ messages in thread From: Oleg Nesterov @ 2025-10-10 13:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On 10/10, Oleg Nesterov wrote: > > On 10/10, Peter Zijlstra wrote: > > > > I reordered the code, it is happier now. > > > > Anyway, the below seems to generate decent code for > > {-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-) > > Another approach which looks better than mine ;) > > Linus's version is simpler, but yours can handle break/return and > the "only lockless" case, good. > > I leave this patch to you and Linus, he seems to like your code too. > > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > > > But... perhaps we should not "export" the _target names and instead > add the additional defines, something like > > scoped_seqlock_read() > scoped_seqlock_read_or_lock() > scoped_seqlock_read_or_lock_irqsave() > > ? And... perhaps we can simplify this code a little bit? 
I mean enum ss_state { ss_lockless = 0, ss_lock = 1, ss_lock_irqsave = 2, ss_done = 4, }; struct ss_tmp { enum ss_state state; unsigned long data; seqlock_t *lock; }; static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst) { if (sst->state & ss_lock) spin_unlock(&sst->lock.lock); if (sst->state & ss_lock_irqsave) spin_unlock_irqrestore(&sst->lock.lock, sst->data); } static inline void __scoped_seqlock_next(struct ss_tmp *sst, enum ss_state target) { switch (sst->state) { case ss_lock: case ss_lock_irqsave: sst->state |= ss_done; return; case ss_lockless: if (!read_seqretry(sst->lock, sst->data)) { sst->state = ss_done; return; } break; } switch (target) { case ss_lock: spin_lock(&sst->lock.lock); sst->state = ss_lock; return; case ss_lock_irqsave: spin_lock_irqsave(&sst->lock.lock, sst->data); sst->state = ss_lock_irqsave; return; case ss_lockless: sst->data = read_seqbegin(sst->lock); return; } } #define __scoped_seqlock_read(_seqlock, _target, _s) \ for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) = \ { .state = ss_lockless, .data = read_seqbegin(_seqlock), .lock = __seqlock }; \ !(_s.state & ss_done); \ __scoped_seqlock_next(&_s, _target)) (I removed __scoped_seqlock_invalid_target/__scoped_seqlock_bug to lessen the code). Not sure this makes sense. Plus I didn't even try to compile this code and I have no idea how this change can affect the code generation. But let me ask anyway... Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-10 13:14 ` Oleg Nesterov @ 2025-10-13 9:03 ` Peter Zijlstra 2025-10-13 11:50 ` Oleg Nesterov 0 siblings, 1 reply; 74+ messages in thread From: Peter Zijlstra @ 2025-10-13 9:03 UTC (permalink / raw) To: Oleg Nesterov Cc: Linus Torvalds, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Fri, Oct 10, 2025 at 03:14:39PM +0200, Oleg Nesterov wrote: > On 10/10, Oleg Nesterov wrote: > > > > On 10/10, Peter Zijlstra wrote: > > > > > > I reordered the code, it is happier now. > > > > > > Anyway, the below seems to generate decent code for > > > {-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-) > > > > Another approach which looks better than mine ;) > > > > Linus's version is simpler, but yours can handle break/return and > > the "only lockless" case, good. > > > > I leave this patch to you and Linus, he seems to like your code too. > > > > Reviewed-by: Oleg Nesterov <oleg@redhat.com> > > > > > > But... perhaps we should not "export" the _target names and instead > > add the additional defines, something like > > > > scoped_seqlock_read() > > scoped_seqlock_read_or_lock() > > scoped_seqlock_read_or_lock_irqsave() > > > > ? > > And... perhaps we can simplify this code a little bit? 
I mean > > enum ss_state { > ss_lockless = 0, > ss_lock = 1, > ss_lock_irqsave = 2, > ss_done = 4, > }; > > struct ss_tmp { > enum ss_state state; > unsigned long data; > seqlock_t *lock; > }; > > static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst) > { > if (sst->state & ss_lock) > spin_unlock(&sst->lock.lock); > if (sst->state & ss_lock_irqsave) > spin_unlock_irqrestore(&sst->lock.lock, sst->data); > } > > static inline void > __scoped_seqlock_next(struct ss_tmp *sst, enum ss_state target) > { > switch (sst->state) { > case ss_lock: > case ss_lock_irqsave: > sst->state |= ss_done; > return; > > case ss_lockless: > if (!read_seqretry(sst->lock, sst->data)) { > sst->state = ss_done; > return; > } > break; > } > > switch (target) { > case ss_lock: > spin_lock(&sst->lock.lock); > sst->state = ss_lock; > return; > > case ss_lock_irqsave: > spin_lock_irqsave(&sst->lock.lock, sst->data); > sst->state = ss_lock_irqsave; > return; > > case ss_lockless: > sst->data = read_seqbegin(sst->lock); > return; > } > } > > #define __scoped_seqlock_read(_seqlock, _target, _s) \ > for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) = \ > { .state = ss_lockless, .data = read_seqbegin(_seqlock), .lock = __seqlock }; \ > !(_s.state & ss_done); \ > __scoped_seqlock_next(&_s, _target)) > > > (I removed __scoped_seqlock_invalid_target/__scoped_seqlock_bug to lessen the code). > > Not sure this makes sense. Plus I didn't even try to compile this code and I have > no idea how this change can affect the code generation. But let me ask anyway... So GCC is clever enough to see through this scheme, but Clang gets confused and generates worse code. Specifically it emits the whole __scoped_seqlock_cleanup() sequence, testing both bits and both unlock options. 
Where previously it would only have to discover which field was written to and could delete all code for the unwritten field, it now has to track the state and discover ss_lock|ss_done is not possible while ss_lock_irqsave|ss_done is. So while that additional pointer might seem wasteful, it actually makes the state tracking easier and allows the compiler to more easily throw away stuff. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-13 9:03 ` Peter Zijlstra @ 2025-10-13 11:50 ` Oleg Nesterov 0 siblings, 0 replies; 74+ messages in thread From: Oleg Nesterov @ 2025-10-13 11:50 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On 10/13, Peter Zijlstra wrote: > > On Fri, Oct 10, 2025 at 03:14:39PM +0200, Oleg Nesterov wrote: > > static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst) > > { > > if (sst->state & ss_lock) > > spin_unlock(&sst->lock.lock); > > if (sst->state & ss_lock_irqsave) > > spin_unlock_irqrestore(&sst->lock.lock, sst->data); > > } > > > > static inline void > > __scoped_seqlock_next(struct ss_tmp *sst, enum ss_state target) > > { > > switch (sst->state) { > > case ss_lock: > > case ss_lock_irqsave: > > sst->state |= ss_done; > > So GCC is clever enough to see through this scheme, but Clang gets > confused and generates worse code. Specifically it emits the whole > __scoped_seqlock_cleanup() sequence, testing both bits and both unlock > options. OK, thanks and sorry for the noise then. > So while that additional pointer might seem wasteful, it actually makes > the state tracking easier and allows the compiler to more easily throw > away stuff. Great ;) Oleg. ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-10 12:32 ` Oleg Nesterov 2025-10-10 13:14 ` Oleg Nesterov @ 2025-10-10 15:30 ` Linus Torvalds 1 sibling, 0 replies; 74+ messages in thread From: Linus Torvalds @ 2025-10-10 15:30 UTC (permalink / raw) To: Oleg Nesterov Cc: Peter Zijlstra, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Fri, 10 Oct 2025 at 05:33, Oleg Nesterov <oleg@redhat.com> wrote: > > I leave this patch to you and Linus, he seems to like your code too. Yeah, I think PeterZ's is the best of the bunch, I like how it looks in that latest iteration. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 22:12 ` Peter Zijlstra 2025-10-09 22:55 ` Linus Torvalds @ 2025-10-09 23:20 ` Peter Zijlstra 2025-10-09 23:26 ` Linus Torvalds 1 sibling, 1 reply; 74+ messages in thread From: Peter Zijlstra @ 2025-10-09 23:20 UTC (permalink / raw) To: Linus Torvalds Cc: Oleg Nesterov, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Fri, Oct 10, 2025 at 12:12:42AM +0200, Peter Zijlstra wrote: > On Thu, Oct 09, 2025 at 01:24:51PM -0700, Linus Torvalds wrote: > > On Thu, 9 Oct 2025 at 13:12, Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > Slightly nicer version that's actually compiled :-) > > > > I assume that "target of ss_lockless" is an intentional extension to > > just make the loop never take a lock at all? > > Yep. Almost came for free, so might as well do it. > > > I do like it, but I think you'll find that having a separate 'seq' and > > 'flags' like this: > > > > > +struct ss_tmp { > > > + enum ss_state state; > > > + int seq; > > > + unsigned long flags; > > > + spinlock_t *lock; > > > +}; > > > > makes it unnecessarily waste a register. > > > > You never need both seq and flags at the same time, since if you take > > the spinlock the sequence number is pointless. > > > > So please make that a union, and I think it will help avoid wasting a > > register in the loop. > > Sure; otoh compiler should be able to tell the same using liveness > analysis I suppose, but perhaps they're not *that* clever. The moment I use a union gcc-14 creates the whole structure on the stack, while without the union it mostly manages to keep the things in registers without too much spilling. Also, there's a rather silly bug in the code, the __cleanup function looks at ->state to determine which spin_unlock variant to pick, but ->state will be ss_done. 
Additionally, when someone does 'break' or 'goto' out of the scope the state isn't reliable anyway. The sanest code gen happens when I do: struct ss_tmp { enum ss_state state; int seq; unsigned long flags; spinlock_t *lock; spinlock_t *lock_irqsave; }; static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst) { if (sst->lock) spin_unlock(sst->lock); if (sst->lock_irqsave) spin_unlock_irqrestore(sst->lock_irqsave, sst->flags); } liveness analysis seems to work well, and it is able to determine which fields are active and only track those in registers and utterly discard the rest. Specifically in the thread_group_cputime code, there are only _raw_spin_lock_irqsave and _raw_spin_unlock_irqrestore calls left, all the other stuff is just optimized out. I'll look at what clang does ... but that's really tomorrow, or rather, somewhat later this morning :/ ^ permalink raw reply [flat|nested] 74+ messages in thread
* Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() 2025-10-09 23:20 ` Peter Zijlstra @ 2025-10-09 23:26 ` Linus Torvalds 0 siblings, 0 replies; 74+ messages in thread From: Linus Torvalds @ 2025-10-09 23:26 UTC (permalink / raw) To: Peter Zijlstra Cc: Oleg Nesterov, Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Waiman Long, Will Deacon, linux-kernel On Thu, 9 Oct 2025 at 16:20, Peter Zijlstra <peterz@infradead.org> wrote: > > The moment I use a union gcc-14 creates the whole structure on the > stack, while without the union it mostly manages to keep the things > in registers without too much spilling. Ouch. Do the union manually like I did by just using a "unsigned long" for both 'seq' and 'flags'. I bet that works fine - it seemed to work well for my - admittedly somewhat different - version. Linus ^ permalink raw reply [flat|nested] 74+ messages in thread
* [tip: locking/core] seqlock: Introduce scoped_seqlock_read() 2025-10-09 20:11 ` Peter Zijlstra 2025-10-09 20:24 ` Linus Torvalds @ 2025-10-21 10:35 ` tip-bot2 for Peter Zijlstra 1 sibling, 0 replies; 74+ messages in thread From: tip-bot2 for Peter Zijlstra @ 2025-10-21 10:35 UTC (permalink / raw) To: linux-tip-commits Cc: Oleg Nesterov, Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the locking/core branch of tip: Commit-ID: cc39f3872c0865bef992b713338df369554fa9e0 Gitweb: https://git.kernel.org/tip/cc39f3872c0865bef992b713338df369554fa9e0 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Thu, 09 Oct 2025 22:11:54 +02:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 21 Oct 2025 12:31:57 +02:00 seqlock: Introduce scoped_seqlock_read() The read_seqbegin/need_seqretry/done_seqretry API is cumbersome and error prone. With the new helper the "typical" code like int seq, nextseq; unsigned long flags; nextseq = 0; do { seq = nextseq; flags = read_seqbegin_or_lock_irqsave(&seqlock, &seq); // read-side critical section nextseq = 1; } while (need_seqretry(&seqlock, seq)); done_seqretry_irqrestore(&seqlock, seq, flags); can be rewritten as scoped_seqlock_read (&seqlock, ss_lock_irqsave) { // read-side critical section } Original idea by Oleg Nesterov; with contributions from Linus. 
Originally-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/seqlock.h | 111 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 111 insertions(+) diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 5ce48ea..b7bcc41 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -1209,4 +1209,115 @@ done_seqretry_irqrestore(seqlock_t *lock, int seq, unsigned long flags) if (seq & 1) read_sequnlock_excl_irqrestore(lock, flags); } + +enum ss_state { + ss_done = 0, + ss_lock, + ss_lock_irqsave, + ss_lockless, +}; + +struct ss_tmp { + enum ss_state state; + unsigned long data; + spinlock_t *lock; + spinlock_t *lock_irqsave; +}; + +static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst) +{ + if (sst->lock) + spin_unlock(sst->lock); + if (sst->lock_irqsave) + spin_unlock_irqrestore(sst->lock_irqsave, sst->data); +} + +extern void __scoped_seqlock_invalid_target(void); + +#if defined(CONFIG_CC_IS_GCC) && CONFIG_GCC_VERSION < 90000 +/* + * For some reason some GCC-8 architectures (nios2, alpha) have trouble + * determining that the ss_done state is impossible in __scoped_seqlock_next() + * below. + */ +static inline void __scoped_seqlock_bug(void) { } +#else +/* + * Canary for compiler optimization -- if the compiler doesn't realize this is + * an impossible state, it very likely generates sub-optimal code here. 
+ */ +extern void __scoped_seqlock_bug(void); +#endif + +static inline void +__scoped_seqlock_next(struct ss_tmp *sst, seqlock_t *lock, enum ss_state target) +{ + switch (sst->state) { + case ss_done: + __scoped_seqlock_bug(); + return; + + case ss_lock: + case ss_lock_irqsave: + sst->state = ss_done; + return; + + case ss_lockless: + if (!read_seqretry(lock, sst->data)) { + sst->state = ss_done; + return; + } + break; + } + + switch (target) { + case ss_done: + __scoped_seqlock_invalid_target(); + return; + + case ss_lock: + sst->lock = &lock->lock; + spin_lock(sst->lock); + sst->state = ss_lock; + return; + + case ss_lock_irqsave: + sst->lock_irqsave = &lock->lock; + spin_lock_irqsave(sst->lock_irqsave, sst->data); + sst->state = ss_lock_irqsave; + return; + + case ss_lockless: + sst->data = read_seqbegin(lock); + return; + } +} + +#define __scoped_seqlock_read(_seqlock, _target, _s) \ + for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) = \ + { .state = ss_lockless, .data = read_seqbegin(_seqlock) }; \ + _s.state != ss_done; \ + __scoped_seqlock_next(&_s, _seqlock, _target)) + +/** + * scoped_seqlock_read (lock, ss_state) - execute the read side critical + * section without manual sequence + * counter handling or calls to other + * helpers + * @lock: pointer to seqlock_t protecting the data + * @ss_state: one of {ss_lock, ss_lock_irqsave, ss_lockless} indicating + * the type of critical read section + * + * Example: + * + * scoped_seqlock_read (&lock, ss_lock) { + * // read-side critical section + * } + * + * Starts with a lockless pass first. If it fails, restarts the critical + * section with the lock held. + */ +#define scoped_seqlock_read(_seqlock, _target) \ + __scoped_seqlock_read(_seqlock, _target, __UNIQUE_ID(seqlock)) + #endif /* __LINUX_SEQLOCK_H */ ^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH v2 2/4] seqlock: change thread_group_cputime() to use scoped_seqlock_read_irqsave() 2025-10-08 12:30 ` [PATCH v2 " Oleg Nesterov 2025-10-08 12:30 ` [PATCH v2 1/4] " Oleg Nesterov @ 2025-10-08 12:30 ` Oleg Nesterov 2025-10-21 10:35 ` [tip: locking/core] seqlock: Change thread_group_cputime() to use scoped_seqlock_read() tip-bot2 for Oleg Nesterov 2025-10-08 12:30 ` [PATCH v2 3/4] seqlock: change do_task_stat() to use scoped_seqlock_read_irqsave() Oleg Nesterov ` (2 subsequent siblings) 4 siblings, 1 reply; 74+ messages in thread From: Oleg Nesterov @ 2025-10-08 12:30 UTC (permalink / raw) To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing, Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon Cc: linux-kernel To simplify the code and make it more readable. While at it, change thread_group_cputime() to use __for_each_thread(sig). Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- kernel/sched/cputime.c | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 7097de2c8cda..94b445e947d2 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -315,8 +315,6 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) struct signal_struct *sig = tsk->signal; u64 utime, stime; struct task_struct *t; - unsigned int seq, nextseq; - unsigned long flags; /* * Update current task runtime to account pending time since last @@ -330,25 +328,18 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) (void) task_sched_runtime(current); rcu_read_lock(); - /* Attempt a lockless read on the first round. 
*/ - nextseq = 0; - do { - seq = nextseq; - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + scoped_seqlock_read_irqsave(&sig->stats_lock) { times->utime = sig->utime; times->stime = sig->stime; times->sum_exec_runtime = sig->sum_sched_runtime; - for_each_thread(tsk, t) { + __for_each_thread(sig, t) { task_cputime(t, &utime, &stime); times->utime += utime; times->stime += stime; times->sum_exec_runtime += read_sum_exec_runtime(t); } - /* If lockless access failed, take the lock. */ - nextseq = 1; - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); + } rcu_read_unlock(); } -- 2.25.1.362.g51ebf55 ^ permalink raw reply related [flat|nested] 74+ messages in thread
* [tip: locking/core] seqlock: Change thread_group_cputime() to use scoped_seqlock_read() 2025-10-08 12:30 ` [PATCH v2 2/4] seqlock: change thread_group_cputime() to use scoped_seqlock_read_irqsave() Oleg Nesterov @ 2025-10-21 10:35 ` tip-bot2 for Oleg Nesterov 0 siblings, 0 replies; 74+ messages in thread From: tip-bot2 for Oleg Nesterov @ 2025-10-21 10:35 UTC (permalink / raw) To: linux-tip-commits Cc: Oleg Nesterov, Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the locking/core branch of tip: Commit-ID: 488f48b32654dc6be04d9cc12f75ce030c9cb21b Gitweb: https://git.kernel.org/tip/488f48b32654dc6be04d9cc12f75ce030c9cb21b Author: Oleg Nesterov <oleg@redhat.com> AuthorDate: Wed, 08 Oct 2025 14:30:52 +02:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 21 Oct 2025 12:31:57 +02:00 seqlock: Change thread_group_cputime() to use scoped_seqlock_read() To simplify the code and make it more readable. While at it, change thread_group_cputime() to use __for_each_thread(sig). 
[peterz: update to new interface] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- kernel/sched/cputime.c | 20 +++++--------------- 1 file changed, 5 insertions(+), 15 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 7097de2..4f97896 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -313,10 +313,8 @@ static u64 read_sum_exec_runtime(struct task_struct *t) void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) { struct signal_struct *sig = tsk->signal; - u64 utime, stime; struct task_struct *t; - unsigned int seq, nextseq; - unsigned long flags; + u64 utime, stime; /* * Update current task runtime to account pending time since last @@ -329,27 +327,19 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times) if (same_thread_group(current, tsk)) (void) task_sched_runtime(current); - rcu_read_lock(); - /* Attempt a lockless read on the first round. */ - nextseq = 0; - do { - seq = nextseq; - flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq); + guard(rcu)(); + scoped_seqlock_read (&sig->stats_lock, ss_lock_irqsave) { times->utime = sig->utime; times->stime = sig->stime; times->sum_exec_runtime = sig->sum_sched_runtime; - for_each_thread(tsk, t) { + __for_each_thread(sig, t) { task_cputime(t, &utime, &stime); times->utime += utime; times->stime += stime; times->sum_exec_runtime += read_sum_exec_runtime(t); } - /* If lockless access failed, take the lock. */ - nextseq = 1; - } while (need_seqretry(&sig->stats_lock, seq)); - done_seqretry_irqrestore(&sig->stats_lock, seq, flags); - rcu_read_unlock(); + } } #ifdef CONFIG_IRQ_TIME_ACCOUNTING ^ permalink raw reply related [flat|nested] 74+ messages in thread
* [PATCH v2 3/4] seqlock: change do_task_stat() to use scoped_seqlock_read_irqsave()

From: Oleg Nesterov @ 2025-10-08 12:30 UTC (permalink / raw)
To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
	Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon
Cc: linux-kernel

To simplify the code and make it more readable.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/array.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index d6a0369caa93..e9c61448f3ee 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -483,7 +483,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long flags;
 	int exit_code = task->exit_code;
 	struct signal_struct *sig = task->signal;
-	unsigned int seq = 1;
 
 	state = *get_task_state(task);
 	vsize = eip = esp = 0;
@@ -540,10 +539,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	if (permitted && (!whole || num_threads < 2))
 		wchan = !task_is_running(task);
 
-	do {
-		seq++; /* 2 on the 1st/lockless path, otherwise odd */
-		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
-
+	scoped_seqlock_read_irqsave(&sig->stats_lock) {
 		cmin_flt = sig->cmin_flt;
 		cmaj_flt = sig->cmaj_flt;
 		cutime = sig->cutime;
@@ -565,8 +561,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 		}
 		rcu_read_unlock();
 	}
-	} while (need_seqretry(&sig->stats_lock, seq));
-	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
+	}
 
 	if (whole) {
 		thread_group_cputime_adjusted(task, &utime, &stime);
-- 
2.25.1.362.g51ebf55
* [tip: locking/core] seqlock: Change do_task_stat() to use scoped_seqlock_read()

From: tip-bot2 for Oleg Nesterov @ 2025-10-21 10:35 UTC (permalink / raw)
To: linux-tip-commits
Cc: Oleg Nesterov, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     b76f72bea2c601afec81829ea427fc0d20f83216
Gitweb:        https://git.kernel.org/tip/b76f72bea2c601afec81829ea427fc0d20f83216
Author:        Oleg Nesterov <oleg@redhat.com>
AuthorDate:    Wed, 08 Oct 2025 14:30:59 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 21 Oct 2025 12:31:57 +02:00

seqlock: Change do_task_stat() to use scoped_seqlock_read()

To simplify the code and make it more readable.

[peterz: change to new interface]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/proc/array.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 2ae6318..cbd4bc4 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -481,7 +481,6 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long flags;
 	int exit_code = task->exit_code;
 	struct signal_struct *sig = task->signal;
-	unsigned int seq = 1;
 
 	state = *get_task_state(task);
 	vsize = eip = esp = 0;
@@ -538,10 +537,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	if (permitted && (!whole || num_threads < 2))
 		wchan = !task_is_running(task);
 
-	do {
-		seq++; /* 2 on the 1st/lockless path, otherwise odd */
-		flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
-
+	scoped_seqlock_read (&sig->stats_lock, ss_lock_irqsave) {
 		cmin_flt = sig->cmin_flt;
 		cmaj_flt = sig->cmaj_flt;
 		cutime = sig->cutime;
@@ -563,8 +559,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 		}
 		rcu_read_unlock();
 	}
-	} while (need_seqretry(&sig->stats_lock, seq));
-	done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
+	}
 
 	if (whole) {
 		thread_group_cputime_adjusted(task, &utime, &stime);
* [PATCH v2 4/4] seqlock: change do_io_accounting() to use scoped_seqlock_read_irqsave()

From: Oleg Nesterov @ 2025-10-08 12:31 UTC (permalink / raw)
To: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
	Linus Torvalds, Peter Zijlstra, Waiman Long, Will Deacon
Cc: linux-kernel

To simplify the code and make it more readable.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/base.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 62d35631ba8c..08113bb6af1d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3041,20 +3041,14 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh
 	if (whole) {
 		struct signal_struct *sig = task->signal;
 		struct task_struct *t;
-		unsigned int seq = 1;
-		unsigned long flags;
 
 		rcu_read_lock();
-		do {
-			seq++; /* 2 on the 1st/lockless path, otherwise odd */
-			flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
-
+		scoped_seqlock_read_irqsave(&sig->stats_lock) {
 			acct = sig->ioac;
 			__for_each_thread(sig, t)
 				task_io_accounting_add(&acct, &t->ioac);
-		} while (need_seqretry(&sig->stats_lock, seq));
-		done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
+		}
 		rcu_read_unlock();
 	} else {
 		acct = task->ioac;
-- 
2.25.1.362.g51ebf55
* [tip: locking/core] seqlock: Change do_io_accounting() to use scoped_seqlock_read()

From: tip-bot2 for Oleg Nesterov @ 2025-10-21 10:35 UTC (permalink / raw)
To: linux-tip-commits
Cc: Oleg Nesterov, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     795aab353d0650b2d04dc3aa2e22a51000cb2aaa
Gitweb:        https://git.kernel.org/tip/795aab353d0650b2d04dc3aa2e22a51000cb2aaa
Author:        Oleg Nesterov <oleg@redhat.com>
AuthorDate:    Wed, 08 Oct 2025 14:31:05 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 21 Oct 2025 12:31:57 +02:00

seqlock: Change do_io_accounting() to use scoped_seqlock_read()

To simplify the code and make it more readable.

[peterz: change to new interface]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 fs/proc/base.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6299878..407b41c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3043,21 +3043,14 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh
 	if (whole) {
 		struct signal_struct *sig = task->signal;
 		struct task_struct *t;
-		unsigned int seq = 1;
-		unsigned long flags;
-
-		rcu_read_lock();
-		do {
-			seq++; /* 2 on the 1st/lockless path, otherwise odd */
-			flags = read_seqbegin_or_lock_irqsave(&sig->stats_lock, &seq);
+		guard(rcu)();
+		scoped_seqlock_read (&sig->stats_lock, ss_lock_irqsave) {
 			acct = sig->ioac;
 			__for_each_thread(sig, t)
 				task_io_accounting_add(&acct, &t->ioac);
-		} while (need_seqretry(&sig->stats_lock, seq));
-		done_seqretry_irqrestore(&sig->stats_lock, seq, flags);
-		rcu_read_unlock();
+		}
 	} else {
 		acct = task->ioac;
 	}
* Re: [PATCH v2 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()

From: Peter Zijlstra @ 2025-10-08 12:56 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
	Linus Torvalds, Waiman Long, Will Deacon, linux-kernel

On Wed, Oct 08, 2025 at 02:30:14PM +0200, Oleg Nesterov wrote:
> Only 1/4 was changed according to comments from Linus and Waiman,
> but let me resend 2-4 as well.

Shall I put this in tip/locking/core once -rc1 happens or so?
* Re: [PATCH v2 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()

From: Oleg Nesterov @ 2025-10-08 13:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
	Linus Torvalds, Waiman Long, Will Deacon, linux-kernel

On 10/08, Peter Zijlstra wrote:
>
> On Wed, Oct 08, 2025 at 02:30:14PM +0200, Oleg Nesterov wrote:
> > Only 1/4 was changed according to comments from Linus and Waiman,
> > but let me resend 2-4 as well.
>
> Shall I put this in tip/locking/core once -rc1 happens or so?

Would be great ;) If nobody objects.

Can you also take the trivial

	[PATCH 1/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
	https://lore.kernel.org/all/20250928162029.GA3121@redhat.com/

?

OTOH, if this series is merged, this doc fix should be updated to
mention scoped_seqlock_read()...

Oleg.
* Re: [PATCH v2 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()

From: Peter Zijlstra @ 2025-10-08 13:55 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Alexander Viro, Boqun Feng, David Howells, Ingo Molnar, Li RongQing,
	Linus Torvalds, Waiman Long, Will Deacon, linux-kernel

On Wed, Oct 08, 2025 at 03:13:02PM +0200, Oleg Nesterov wrote:
> On 10/08, Peter Zijlstra wrote:
> >
> > On Wed, Oct 08, 2025 at 02:30:14PM +0200, Oleg Nesterov wrote:
> > > Only 1/4 was changed according to comments from Linus and Waiman,
> > > but let me resend 2-4 as well.
> >
> > Shall I put this in tip/locking/core once -rc1 happens or so?
>
> Would be great ;) If nobody objects.
>
> Can you also take the trivial
>
> 	[PATCH 1/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry
> 	https://lore.kernel.org/all/20250928162029.GA3121@redhat.com/
>
> ?

I've picked that up too and stuck it in front.

> OTOH, if this series is merged, this doc fix should be updated to
> mention scoped_seqlock_read()...

I see a patch [5/4] in your future ;-)
end of thread, other threads:[~2025-10-21 10:35 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --

2025-09-28 16:19 [PATCH 0/1] documentation: seqlock: fix the wrong documentation of read_seqbegin_or_lock/need_seqretry Oleg Nesterov
2025-09-28 16:20 ` [PATCH 1/1] " Oleg Nesterov
2025-10-01 18:21   ` Waiman Long
2025-10-01 19:06     ` Oleg Nesterov
2025-10-01 19:24       ` Waiman Long
2025-10-01 19:34         ` Waiman Long
2025-10-02 11:01           ` Oleg Nesterov
2025-10-21 10:35   ` [tip: locking/core] " tip-bot2 for Oleg Nesterov
2025-09-28 16:20 ` [RFC 2/1] seqlock: make the read_seqbegin_or_lock() API more simple and less error-prone ? Oleg Nesterov
2025-09-29  0:41   ` [????] " Li,Rongqing
2025-09-29  6:47     ` Oleg Nesterov
2025-09-30 22:09   ` David Howells
2025-10-01 11:51     ` Oleg Nesterov
2025-10-01 13:02   ` Peter Zijlstra
2025-10-01 13:13     ` Oleg Nesterov
2025-10-01 13:46       ` Oleg Nesterov
2025-10-02 12:58         ` Oleg Nesterov
2025-10-05 14:47 ` [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() Oleg Nesterov
2025-10-05 14:49   ` Oleg Nesterov
2025-10-05 14:50   ` [PATCH 1/5] " Oleg Nesterov
2025-10-05 15:34     ` Linus Torvalds
2025-10-05 16:07       ` Oleg Nesterov
2025-10-05 16:35         ` Linus Torvalds
2025-10-05 14:50   ` [PATCH 2/5] seqlock: change thread_group_cputime() to use SEQLOCK_READ_SECTION() Oleg Nesterov
2025-10-05 14:50   ` [PATCH 3/5] seqlock: change do_task_stat() " Oleg Nesterov
2025-10-05 14:50   ` [PATCH 4/5] seqlock: change do_io_accounting() " Oleg Nesterov
2025-10-05 14:50   ` [PATCH 5/5] seqlock: change __dentry_path() to use __SEQLOCK_READ_SECTION() Oleg Nesterov
2025-10-05 15:48     ` Linus Torvalds
2025-10-05 15:30   ` [PATCH 0/5] seqlock: introduce SEQLOCK_READ_SECTION() Al Viro
2025-10-05 17:40     ` Oleg Nesterov
2025-10-07 14:20 ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Oleg Nesterov
2025-10-07 14:21   ` [PATCH 1/4] " Oleg Nesterov
2025-10-07 16:35     ` Waiman Long
2025-10-07 17:18       ` Oleg Nesterov
2025-10-07 17:21         ` Waiman Long
2025-10-07 14:21   ` [PATCH 2/4] seqlock: change thread_group_cputime() to use scoped_seqlock_read_irqsave() Oleg Nesterov
2025-10-07 14:21   ` [PATCH 3/4] seqlock: change do_task_stat() " Oleg Nesterov
2025-10-07 14:21   ` [PATCH 4/4] seqlock: change do_io_accounting() " Oleg Nesterov
2025-10-07 15:38   ` [PATCH 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Linus Torvalds
2025-10-07 16:34     ` Oleg Nesterov
2025-10-08 12:30 ` [PATCH v2 " Oleg Nesterov
2025-10-08 12:30   ` [PATCH v2 1/4] " Oleg Nesterov
2025-10-08 12:55     ` Peter Zijlstra
2025-10-08 12:59       ` Oleg Nesterov
2025-10-08 13:54         ` Peter Zijlstra
2025-10-08 16:05     ` Linus Torvalds
2025-10-08 16:55       ` Oleg Nesterov
2025-10-09  5:31         ` Linus Torvalds
2025-10-09  7:04           ` Linus Torvalds
2025-10-09 14:37             ` Oleg Nesterov
2025-10-09 16:18               ` Linus Torvalds
2025-10-09 19:50                 ` Peter Zijlstra
2025-10-09 20:11                   ` Peter Zijlstra
2025-10-09 20:24                     ` Linus Torvalds
2025-10-09 22:12                       ` Peter Zijlstra
2025-10-09 22:55                         ` Linus Torvalds
2025-10-10  8:03                           ` Peter Zijlstra
2025-10-10 12:32                             ` Oleg Nesterov
2025-10-10 13:14                               ` Oleg Nesterov
2025-10-13  9:03                                 ` Peter Zijlstra
2025-10-13 11:50                                   ` Oleg Nesterov
2025-10-10 15:30                             ` Linus Torvalds
2025-10-09 23:20                   ` Peter Zijlstra
2025-10-09 23:26                     ` Linus Torvalds
2025-10-21 10:35     ` [tip: locking/core] seqlock: Introduce scoped_seqlock_read() tip-bot2 for Peter Zijlstra
2025-10-08 12:30   ` [PATCH v2 2/4] seqlock: change thread_group_cputime() to use scoped_seqlock_read_irqsave() Oleg Nesterov
2025-10-21 10:35     ` [tip: locking/core] seqlock: Change thread_group_cputime() to use scoped_seqlock_read() tip-bot2 for Oleg Nesterov
2025-10-08 12:30   ` [PATCH v2 3/4] seqlock: change do_task_stat() to use scoped_seqlock_read_irqsave() Oleg Nesterov
2025-10-21 10:35     ` [tip: locking/core] seqlock: Change do_task_stat() to use scoped_seqlock_read() tip-bot2 for Oleg Nesterov
2025-10-08 12:31   ` [PATCH v2 4/4] seqlock: change do_io_accounting() to use scoped_seqlock_read_irqsave() Oleg Nesterov
2025-10-21 10:35     ` [tip: locking/core] seqlock: Change do_io_accounting() to use scoped_seqlock_read() tip-bot2 for Oleg Nesterov
2025-10-08 12:56   ` [PATCH v2 0/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave() Peter Zijlstra
2025-10-08 13:13     ` Oleg Nesterov
2025-10-08 13:55       ` Peter Zijlstra