Re: [RFC] in-kernel rseq

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Laight <david.laight.linux@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Mark Rutland <mark.rutland@arm.com>,
	cmarinas@kernel.org, maddy@linux.ibm.com, hca@linux.ibm.com,
	ryan.roberts@arm.com
Subject: Re: [RFC] in-kernel rseq
Date: Tue, 24 Feb 2026 10:27:28 +0000	[thread overview]
Message-ID: <20260224102728.1273c080@pumpkin> (raw)
In-Reply-To: <20260223215436.GS1282955@noisy.programming.kicks-ass.net>

On Mon, 23 Feb 2026 22:54:36 +0100
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Feb 23, 2026 at 01:22:18PM -0500, Mathieu Desnoyers wrote:
> 
> > > I think it would be better as the address of the instruction after
> > > the 'store'.  
> > 
> > That's indeed what we do for userspace rseq.  
> 
> Either works I suppose. The only think to be careful about is that you
> must not restart once the store has happened.
> 
> > > You probably don't need separate 'begin' and 'restart' addresses.  
> > 
> > It's not needed as long as the abort behavior is only restart. It
> > becomes useful if another behavior is wanted on abort. But since
> > this is kernel code and not ABI, it can change if the need arise.  
> 
> Right, didn't want to limit to restart, although that is what is used
> here.
> 
> > > It might be enough to save the 'restart' address and a byte length
> > > directly in 'current', much simpler code.  
> > 
> > That would make it two stores to the task struct. Those would not be
> > single-instruction, so we'd have to deal with preemption coming between
> > those two stores. Also this would be more code: two stores compared
> > to a single pointer store to the task struct to begin the critical
> > section. AFAIU Peter's proposed approach is more efficient.  
> 
> Must indeed be a single store. Either we have it set in full, or we
> don't.

Not really, you can do two stores (to the task struct) provided you
check the second one - remember the data is being looked at by the
cpu that did the writes.

> > We could turn the end address into a length if we want, this would
> > make it more alike the userspace rseq ABI counterpart.  
> 
> I find 3 distinct addresses easier to fill out, but again it doesn't
> matter.

Actually if you save the end address you only need to check if the
current %pc is less that that address, if it is you back it up to
the start of the sequence.

> 
> > > How much it helps is another matter.
> > > I'm sure I remember something about per-cpu data being used for something
> > > because it was faster then using 'current' - not sure of the context.  
> > 
> > The problem with per-cpu data for this is how to handle migration ?
> > The whole point of this is to replace preempt disable.  
> 
> This; it cannot be a per-cpu address, if you need it to implement
> per-cpu ops.

Sorry yes, you are replacing a per-cpu data operation with a per-task one.
But I'm sure I remember something where the opposite was done because it
was unexpectedly faster to use per-cpu data.
I'm not sure where arm gets 'current' from, x86 'has it easy' because
of %fg and %gs.
(If current is loaded from per-cpu data that might explain why using
per-cpu data is faster.)

That makes me think (a bad sign)...
Are the compilers 'clever' enough to use %fs for current->member while
current()->member uses a #define to get the actual address?

preempt_disable() itself can be implemented using per-cpu or per-task
data. I think it varies between architectures, not sure which asm uses.

> > > The real problem with rseq is they don't scale.  
> > 
> > Not sure what you mean. They don't scale with respect to what ?  
> 
> He might be talking about forward progress instead of scaling. There are
> indeed foward progress concerns with rseq -- as there are with trivial
> LL/SC. But given the length of a slice vs the length of a rseq section,
> this should be a non-issue.

No scaling, in this case it is fine to add the rseq just before needing it.
But if they have to be set in advance then you start getting a long list
to check - I'm sure that must happen with userspace rseq?

> 
> Doing the restart on interrupt would be a bigger issue. Although even
> there I think that since the operations we're talking about are but a
> few instructions, it should all just work well enough.
> 
...

> > > I think that is just unlocked RMW of a per-cpu/thread variable.  
> 
> That's missing the point entirely. He might be stuck in x86_64 or
> something.

Not entirely, it doesn't matter if code is preempted between the read
and write in preempt_disable() because that can only happen when the
count is changing from 0 to 1.
What does matter is that the 1 is written to the correct place.

	David

next prev parent reply	other threads:[~2026-02-24 10:27 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-23 16:38 [RFC] in-kernel rseq Peter Zijlstra
2026-02-23 17:53 ` David Laight
2026-02-23 18:22   ` Mathieu Desnoyers
2026-02-23 21:54     ` Peter Zijlstra
2026-02-24 10:27       ` David Laight [this message]
2026-02-24 13:33         ` Mathieu Desnoyers
2026-02-24 14:49           ` David Laight
2026-02-24 16:15             ` Mathieu Desnoyers
2026-02-24 11:16 ` Heiko Carstens
2026-02-24 13:48   ` Mathieu Desnoyers
2026-02-24 14:59     ` David Laight
2026-02-24 16:18       ` Mathieu Desnoyers
2026-02-24 15:17   ` Peter Zijlstra
2026-02-24 15:20   ` Peter Zijlstra
2026-02-24 16:02     ` Heiko Carstens
2026-02-24 16:15       ` Heiko Carstens
2026-04-10 17:57 ` Shrikanth Hegde
2026-04-15  8:51   ` Heiko Carstens
2026-04-17  9:29     ` Shrikanth Hegde
2026-04-17  9:36     ` Shrikanth Hegde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260224102728.1273c080@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=cmarinas@kernel.org \
    --cc=hca@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maddy@linux.ibm.com \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=peterz@infradead.org \
    --cc=ryan.roberts@arm.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.