* Re: looking for an informed write up on how to handle serious multicore issues in rust
2025-04-22 14:44 looking for an informed write up on how to handle serious multicore issues in rust Mateusz Guzik
@ 2025-04-22 15:46 ` Mateusz Guzik
2025-04-22 19:37 ` Alice Ryhl
1 sibling, 0 replies; 3+ messages in thread
From: Mateusz Guzik @ 2025-04-22 15:46 UTC (permalink / raw)
To: rust-for-linux
On Tue, Apr 22, 2025 at 4:44 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> Hello.
>
> There is a lot of Rust marketing around "fearless concurrency". The
> features showcased don't cut it in projects with serious needs and I
> failed to find a write up on what to do in that case.
>
> Everything I found so far revolves around the following:
> - an embarrassingly parallel problem, where you can pawn off
> everything -- good, but not applicable to the vast majority of the kernel
> - type system which prevents non-mp safe code from being used in a
> multicore setting -- nice, but ultimately pretty minor
> - some atomic usage, sometimes mixed with unsafe -- does not cover
> what's needed (see below)
> - wrapping all accesses with a guard obtained by issuing .lock() on an
> object -- non-starter for code with serious needs (again see below)
>
> Resources I checked:
> - the official book (https://doc.rust-lang.org/stable/book/) -- maybe
> the usage is too advanced to mention there, but there are 0 pointers
> where to go should you need it
> - Rust for Rustaceans (https://rust-for-rustaceans.com/) -- same
> - Rust Atomics and Locks (https://marabos.nl/atomics/) -- it describes
> some of the nature of the problem, but does not show how to deal with it
> - Rust for C programmers -- same as first two
>
> I also found the crossbeam crate for reference -- again only has some
> of what's needed.
>
> Ultimately you can't fine-grained-lock your way to high performance,
> in terms of either scalability or single-threaded execution. The
> reality of hardware is that not only is lock contention terrible (let
> alone in a NUMA setting), but the very operation of acquiring a lock
> is expensive even when it is cache-hot. Locking for everything may be
> tolerable for small-scale userspace programs, but it's a no-go for a
> modern kernel intended to run on boxen with dozens of cores.
>
> Here is an example of what's needed, how C falls short in helping to
> validate correctness, and where Rust could help at least in
> principle (except *so far* I failed to find any resources on how to do
> it). Of course an individual determined to do something wrong will
> find a way to bypass any and all safety belts -- for cases like that I
> have no expectations of Rust. There are, however, plenty of honest
> mistakes to make here which Rust could help with.
>
> I'm condensing the crux of it into the following:
> struct meh {
>         /* read-mostly area */
>         seqcount_t seq;
>         long safe_to_read;
>         char name_safe_to_read[32];
>         atomic_t modifiable_without_the_lock;
>         /* ... some other fields */
>         /*
>          * The lock protecting the struct, explicitly separated from
>          * the read-mostly area.
>          */
>         spinlock_t lock ____cacheline_aligned_in_smp;
>         long need_the_lock_to_access_in_any_way;
> };
>
> The first 3 fields (seq, safe_to_read, name_safe_to_read) must only
> ever be written to with the spin lock held.
>
> However, everyone is free to read them without the lock and this
> happens frequently.
>
> Suppose the common case reads both safe_to_read and the name and needs
> the seq counter to ensure correctness of the result, maybe like this:
> long meh_name_to_key(struct meh *obj, const char *name) {
>         long key = -1; /* fallback if the name does not match */
>         unsigned seq;
>
>         seq = seq_read(obj->seq);
>         if (name_equal(obj->name_safe_to_read, name)) {
>                 key = READ_ONCE(obj->safe_to_read);
>                 if (seq_equal(obj->seq, seq)) {
>                         return key;
>                 }
>         }
>         spin_lock(&obj->lock);
>         if (name_equal(obj->name_safe_to_read, name))
>                 key = obj->safe_to_read;
>         spin_unlock(&obj->lock);
>         return key;
> }
>
> Pretend this is a part of a lockless hash lookup or similar.
>
> If one was to write C code to handle this *with* safety belts, it
> would be a lot of hand-rolled asserts and even then there is some
> possibility of a screw up.
>
> Note in this case the ->name_safe_to_read array might *change* as it
> is being used for the comparison in name_equal(). A false positive or
> false negative match is handled with the sequence counter.
>
> So I would like a code sample in Rust which handles the above in a
> safe and performance-friendly manner. Of course preferably code trying
> to misuse the struct would fail to compile, but a debug-only runtime
> check would do just fine.
>
> Most notably:
> - the code issuing name check + key read *must* validate the sequence
> counter before returning
> - need_the_lock_to_access_in_any_way must not be accessible without
> the lock (duh)
> - no writes to the read-mostly area without the lock held, except for
> the atomic var
>
> So how do you do this in Rust? If I did not cover enough, I can write a
> self-contained, fully-fledged C example.
>
Here is a more complete example. It may look silly as presented here,
but this very much *is* representative of what's needed -- safe
operation in the face of data changing as you access it, without
issuing any writes of your own. While this code is not something you
can readily test, I think it is sufficient to show what I'm after.
#define NAME_SIZE 32

struct meh {
        /* read-mostly area */
        seqcount_t seq;
        long safe_to_read;
        char name_safe_to_read[NAME_SIZE];
        atomic_t modifiable_without_the_lock;
        /* ... some other fields */
        /*
         * The lock protecting the struct, explicitly separated from the
         * read-mostly area.
         */
        spinlock_t lock ____cacheline_aligned_in_smp;
        long need_the_lock_to_access_in_any_way;
};
static bool meh_name_equal(const char *s1, const char *s2)
{
        /*
         * Inefficient for simplicity, to not get in the way of the crux
         * of the issue. Also intentionally ignores the case of s2 being
         * longer, for the same reason.
         */
        return strncmp(s1, s2, NAME_SIZE) == 0;
}
long meh_name_to_key(struct meh *obj, const char *name)
{
        long key = -1; /* fallback if the name does not match */
        unsigned seq;

        /*
         * Note there are no stores made to any shared areas in the
         * common case *and* we may be racing against
         * meh_store_name_key(), in which case it is fine to fail to
         * match the name. It is *not* fine to return the ->safe_to_read
         * value for the wrong name.
         */
        seq = raw_seqcount_begin(&obj->seq);
        if (likely(meh_name_equal(obj->name_safe_to_read, name))) {
                key = READ_ONCE(obj->safe_to_read);
                if (!read_seqcount_retry(&obj->seq, seq))
                        return key;
        }

        spin_lock(&obj->lock);
        if (meh_name_equal(obj->name_safe_to_read, name))
                key = obj->safe_to_read;
        spin_unlock(&obj->lock);
        return key;
}
EXPORT_SYMBOL(meh_name_to_key);
void meh_store_name_key(struct meh *obj, const char *name, long key)
{
        /*
         * For simplicity pretend the API contract requires it.
         */
        BUG_ON(strlen(name) > NAME_SIZE - 1);

        spin_lock(&obj->lock);
        write_seqcount_begin(&obj->seq);
        strcpy(obj->name_safe_to_read, name);
        obj->safe_to_read = key;
        write_seqcount_end(&obj->seq);
        spin_unlock(&obj->lock);
}
EXPORT_SYMBOL(meh_store_name_key);
pahole shows:
struct meh {
        seqcount_t                 seq;                  /*     0     4 */

        /* XXX 4 bytes hole, try to pack */

        long int                   safe_to_read;         /*     8     8 */
        char                       name_safe_to_read[32]; /*    16    32 */
        atomic_t                   modifiable_without_the_lock; /*    48     4 */

        /* XXX 12 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        spinlock_t                 lock __attribute__((__aligned__(64))); /*    64     4 */

        /* XXX 4 bytes hole, try to pack */

        long int                   need_the_lock_to_access_in_any_way; /*    72     8 */

        /* size: 128, cachelines: 2, members: 6 */
        /* sum members: 60, holes: 3, sum holes: 20 */
        /* padding: 48 */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 12 */
} __attribute__((__aligned__(64)));
I expect the Rust code to retain this layout, notably the separated lock.
--
Mateusz Guzik <mjguzik gmail.com>
* Re: looking for an informed write up on how to handle serious multicore issues in rust
2025-04-22 14:44 looking for an informed write up on how to handle serious multicore issues in rust Mateusz Guzik
2025-04-22 15:46 ` Mateusz Guzik
@ 2025-04-22 19:37 ` Alice Ryhl
1 sibling, 0 replies; 3+ messages in thread
From: Alice Ryhl @ 2025-04-22 19:37 UTC (permalink / raw)
To: Mateusz Guzik; +Cc: rust-for-linux
On Tue, Apr 22, 2025 at 04:44:20PM +0200, Mateusz Guzik wrote:
> If one was to write C code to handle this *with* safety belts, it
> would be a lot of hand-rolled asserts and even then there is some
> possibility of a screw up.
>
> Note in this case the ->name_safe_to_read array might *change* as it
> is being used for the comparison in name_equal(). A false positive or
> false negative match is handled with the sequence counter.
>
> So I would like a code sample in Rust which handles the above in a
> safe and performance-friendly manner. Of course preferably code trying
> to misuse the struct would fail to compile, but a debug-only runtime
> check would do just fine.
>
> Most notably:
> - the code issuing name check + key read *must* validate the sequence
> counter before returning
> - need_the_lock_to_access_in_any_way must not be accessible without
> the lock (duh)
> - no writes to the read-mostly area without the lock held, except for
> the atomic var
>
> So how do you do this in Rust? If I did not cover enough, I can write a
> self-contained, fully-fledged C example.
>
> --
> Mateusz Guzik <mjguzik gmail.com>
There isn't a single canonical implementation of seqlocks in Rust that I
can point you to, and we don't have one in the kernel. But we can go
through how we can design one. I'll start with a pretty bare bones API
and expand it.
To start with, I'm going to assume that we have a Seqnum implementation.
pub struct Seqnum { ... }

impl Seqnum {
    pub fn begin(&self) -> usize;
    pub fn retry(&self, num: usize) -> bool;
    pub fn write_begin(&self);
    pub fn write_end(&self);
}
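For concreteness, here is a rough sketch of one way such a counter could
be backed by an AtomicUsize in plain Rust (purely illustrative -- not
necessarily how a kernel version would be implemented; the orderings
follow the textbook seqlock pattern of acquire reads, release fencing
around writes, and an odd count meaning "write in progress"):

use core::sync::atomic::{fence, AtomicUsize, Ordering};

pub struct Seqnum(AtomicUsize);

impl Seqnum {
    pub const fn new() -> Self {
        Self(AtomicUsize::new(0))
    }

    // Wait until no write is in progress (count is even), return the count.
    pub fn begin(&self) -> usize {
        loop {
            let n = self.0.load(Ordering::Acquire);
            if n & 1 == 0 {
                return n;
            }
            core::hint::spin_loop();
        }
    }

    // True if a write overlapped the read section and it must be retried.
    pub fn retry(&self, num: usize) -> bool {
        fence(Ordering::Acquire);
        self.0.load(Ordering::Relaxed) != num
    }

    // Callers are expected to already hold the lock serializing writers.
    pub fn write_begin(&self) {
        let n = self.0.load(Ordering::Relaxed);
        self.0.store(n.wrapping_add(1), Ordering::Relaxed);
        fence(Ordering::Release);
    }

    pub fn write_end(&self) {
        let n = self.0.load(Ordering::Relaxed);
        self.0.store(n.wrapping_add(1), Ordering::Release);
    }
}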
Beyond a rough sketch like that, I will not get into how you implement
these for real. You can use the same implementation as C. I understand
that the kernel implements them with inline assembly, which we could
replicate. Or we could do a number of other things. Regardless, this is
enough to get us a working but bare-bones implementation:
#[repr(C)]
struct Meh {
    seq: Seqnum,
    safe_to_read: AtomicUsize,
    name_safe_to_read: [AtomicU8; NAME_SIZE],
    modifiable_without_the_lock: AtomicI32,
    // some other fields ...
    _cache: Cachealign, // type with size 0 and alignment 64
    lock: SpinLock<MehInner>,
}

struct MehInner {
    needs_the_lock: usize,
}
fn name_equal<const LEN: usize>(name: &[AtomicU8; LEN], cmp: &CStr) -> bool {
    // Left as an exercise for the reader.
    //
    // Probably you just call the C implementation of strncmp, and this
    // function is probably defined in `kernel`.
}

fn name_copy<const LEN: usize>(dst: &[AtomicU8; LEN], src: &CStr) -> bool {
    // Ditto.
}
impl Meh {
    fn name_to_key(&self, name: &CStr) -> Option<usize> {
        let seq = self.seq.begin();
        if name_equal(&self.name_safe_to_read, name) {
            let key = self.safe_to_read.load(Relaxed);
            if !self.seq.retry(seq) {
                return Some(key);
            }
        }

        let _guard = self.lock.lock();
        if name_equal(&self.name_safe_to_read, name) {
            Some(self.safe_to_read.load(Relaxed))
        } else {
            None
        }
    }

    fn store_name_key(&self, name: &CStr, key: usize) {
        assert!(name.len_with_nul() < NAME_SIZE);

        let _guard = self.lock.lock();
        self.seq.write_begin();
        name_copy(&self.name_safe_to_read, name);
        self.safe_to_read.store(key, Relaxed);
        self.seq.write_end();
    }
}
Your first reaction is probably "ATOMICS!?". But hold your horses for a
moment.
The reason we can't use the ordinary `usize` type is because we need to
*disallow* you from performing ordinary unsynchronized reads; similar to
how you used READ_ONCE instead of just reading the field normally. I
used atomics in this example to achieve that, but it is true that
atomics are stronger and more expensive than what we need. We might
replace AtomicUsize with a new type called SharedUsize (naming TBD!)
that has `load` and `store` methods which are weaker than the relaxed
atomic ops. For example, even relaxed loads on AtomicUsize have
additional costs associated with them because they're guaranteed to
avoid tearing.
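To make that concrete, a minimal sketch of what such a type could look
like (hypothetical -- naming and internals TBD; shown here simply
wrapping an AtomicUsize, with the restricted, explicitly-marked
interface being the point rather than the internals):

use core::sync::atomic::{AtomicUsize, Ordering};

pub struct SharedUsize(AtomicUsize);

impl SharedUsize {
    pub const fn new(v: usize) -> Self {
        Self(AtomicUsize::new(v))
    }

    // A marked read that is allowed to race with a concurrent writer.
    pub fn load(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }

    // A marked write; callers are expected to hold whatever serializes
    // writers.
    pub fn store(&self, v: usize) {
        self.0.store(v, Ordering::Relaxed)
    }
}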
But I digress. The point is that we need a special version of usize like
AtomicUsize that allows us to annotate that these loads and stores are
not normal loads and stores because other threads may also touch the
values in parallel. It has to be a separate type from usize to prevent
you from performing ordinary unsynchronized reads.
Let's get back to the example. The bigger problem is that there is no
enforcement of correct use at all. After all, we're just using atomics,
and any combination of reads and writes from different threads is safe
no matter what, so the compiler doesn't care.
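For instance, nothing stops a hypothetical helper like this from
compiling, even though it breaks every rule the struct relies on:

fn oops(meh: &Meh) {
    // No spinlock held and no seqcount write section entered -- yet this
    // is accepted, because an atomic store is always "safe" as far as the
    // compiler is concerned.
    meh.safe_to_read.store(0, Relaxed);
}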
To start with improving the API, we can wrap Seqnum in a SeqLock type.
It could look like this:
pub struct SeqLock<T> {
    seqnum: Seqnum,
    value: T,
}

impl<T> SeqLock<T> {
    pub fn try_read<U>(&self, f: impl FnOnce(&T) -> U) -> Option<U> {
        let seq = self.seqnum.begin();
        let ret = f(&self.value);
        if self.seqnum.retry(seq) {
            None
        } else {
            Some(ret)
        }
    }

    pub fn assume_no_writers(&self) -> &T {
        &self.value
    }

    pub fn write(&self) -> WriteGuard<'_, T> {
        self.seqnum.write_begin();
        WriteGuard(self)
    }
}

pub struct WriteGuard<'a, T>(&'a SeqLock<T>);

impl<T> Drop for WriteGuard<'_, T> {
    fn drop(&mut self) {
        self.0.seqnum.write_end();
    }
}

impl<T> Deref for WriteGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T { &self.0.value }
}
This API is still not fool-proof - we're missing the part where it knows
about the spinlock, but at the very least you have to explicitly
annotate each access with a call to `try_read`, `assume_no_writers`, or
`write`. This is how you might use it:
#[repr(C)]
struct Meh {
    seq: SeqLock<MehSeq>,
    modifiable_without_the_lock: AtomicI32,
    // some other fields ...
    _cache: Cachealign, // type with size 0 and alignment 64
    lock: SpinLock<MehInner>,
}

struct MehSeq {
    safe_to_read: SharedUsize,
    name_safe_to_read: [SharedU8; NAME_SIZE],
}

struct MehInner {
    needs_the_lock: usize,
}

impl Meh {
    fn name_to_key(&self, name: &CStr) -> Option<usize> {
        let ret = self.seq.try_read(|inner| {
            if name_equal(&inner.name_safe_to_read, name) {
                Some(inner.safe_to_read.load())
            } else {
                None
            }
        });
        if let Some(ret) = ret {
            return ret;
        }

        let _guard = self.lock.lock();
        let inner = self.seq.assume_no_writers();
        if name_equal(&inner.name_safe_to_read, name) {
            Some(inner.safe_to_read.load())
        } else {
            None
        }
    }

    fn store_name_key(&self, name: &CStr, key: usize) {
        assert!(name.len_with_nul() < NAME_SIZE);

        let _guard = self.lock.lock();
        let inner = self.seq.write();
        name_copy(&inner.name_safe_to_read, name);
        inner.safe_to_read.store(key);
        // destructor of inner ends the seqlock write region
    }
}
Notice how the fields protected by the seqlock are moved into a new
sub-struct similar to MehInner.
The `try_read` method is also interesting. It throws away your return
value if you need to retry to discourage you from using invalid values.
Anyway. There's still another big issue with the design: when you use
try_read or assume_no_writers, it still lets you write to the data.
After all, we're using atomics which we can write to at any time.
There are many ways to improve the API to prevent such writes. I'll
share one here: If you have write access, then that means that you have
a WriteGuard object. So we can have you pass a reference to it:
impl SharedUsize {
    pub fn load(&self) -> usize;
    pub fn store<T>(&self, value: usize, proof: &WriteGuard<'_, T>);
}
This way, readers can't call `store` because they don't have a
WriteGuard. The writer becomes:
fn store_name_key(&self, name: &CStr, key: usize) {
    assert!(name.len_with_nul() < NAME_SIZE);

    let _guard = self.lock.lock();
    let inner = self.seq.write();
    name_copy(&inner.name_safe_to_read, name, &inner);
    inner.safe_to_read.store(key, &inner);
    // destructor of inner ends the seqlock write region
}
This isn't the only way to solve this problem - there are others. Not
all of them are fool-proof, and this one isn't: for example, I could
get a WriteGuard from one SeqLock and use it with a SharedUsize stored
in *another* SeqLock to bypass it. Other solutions *are* fool-proof and
don't have this bypass. Some of the other Rust folks are probably happy
to tell you about field projections, an upcoming language feature that
lets you do this (and other things such as RCU) in a fool-proof way
with much more convenient syntax.
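Concretely, that bypass would look something like this (hypothetical
misuse, reusing the types from above):

fn bypass(a: &SeqLock<MehSeq>, b: &SeqLock<MehSeq>) {
    // Only a's sequence counter enters a write section...
    let guard = a.write();
    // ...yet b's field happily accepts a's guard as "proof", so readers
    // of b never observe a retry for this store.
    b.assume_no_writers().safe_to_read.store(42, &guard);
}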
Anyway, since the reads and writes are just atomics, Rust the language
doesn't care too much if there's a bypass. It's mostly something that we
might care about as API designers.
Ok, let's return to another problem in the design, which is that it's
not tied together with the spinlock. This means that you could call
write() or assume_no_writers() without holding it.
The most direct way of solving this is to make a single abstraction that
wraps the entire thing.
#[repr(C)]
pub struct SeqLock2<T, U, V> {
    seqnum: Seqnum,
    seq_data: T,
    extra_data: U,
    _cache: Cachealign,
    lock: SpinLock<V>,
}

impl<T, U, V> SeqLock2<T, U, V> {
    pub fn try_read<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        let seq = self.seqnum.begin();
        let ret = f(&self.seq_data);
        if self.seqnum.retry(seq) {
            None
        } else {
            Some(ret)
        }
    }

    pub fn extra_data(&self) -> &U {
        &self.extra_data
    }

    pub fn read_with_lock(&self) -> ReadWithLockGuard<'_, T, U, V> {
        ReadWithLockGuard {
            seq: self,
            guard: self.lock.lock(),
        }
    }

    pub fn read_retry<R>(&self, mut f: impl FnMut(&T) -> R) -> R {
        if let Some(ret) = self.try_read(&mut f) {
            return ret;
        }
        f(self.read_with_lock().seq_data())
    }

    pub fn write(&self) -> WriteGuard<'_, T, U, V> {
        let guard = self.lock.lock();
        self.seqnum.write_begin();
        WriteGuard {
            seq: self,
            guard,
        }
    }
}
pub struct ReadWithLockGuard<'a, T, U, V> {
    seq: &'a SeqLock2<T, U, V>,
    guard: SpinLockGuard<'a, V>,
}

impl<'a, T, U, V> ReadWithLockGuard<'a, T, U, V> {
    pub fn seq_data(&self) -> &T {
        &self.seq.seq_data
    }

    pub fn lock_data(&self) -> &V {
        // we hold the lock, so we also get to use the lock data
        &self.guard
    }
}

// Ditto for WriteGuard.
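That WriteGuard could look roughly like this (a sketch under the same
assumptions; its Drop impl is what ends the write section and then
releases the spinlock):

pub struct WriteGuard<'a, T, U, V> {
    seq: &'a SeqLock2<T, U, V>,
    guard: SpinLockGuard<'a, V>,
}

impl<'a, T, U, V> WriteGuard<'a, T, U, V> {
    pub fn seq_data(&self) -> &T {
        &self.seq.seq_data
    }

    pub fn lock_data(&mut self) -> &mut V {
        &mut self.guard
    }
}

impl<T, U, V> Drop for WriteGuard<'_, T, U, V> {
    fn drop(&mut self) {
        // Leave the seqcount write section; dropping self.guard afterwards
        // also drops the spinlock.
        self.seq.seqnum.write_end();
    }
}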
By using a combined type like the above, we enforce that the spinlock
and the seqlock are used together and make it pretty hard to get that
wrong. The example becomes:
#[repr(C)]
struct Meh {
    seq: SeqLock2<MehSeq, AtomicI32, MehInner>,
}

struct MehSeq {
    safe_to_read: SharedUsize,
    name_safe_to_read: [SharedU8; NAME_SIZE],
}

struct MehInner {
    needs_the_lock: usize,
}

impl Meh {
    fn name_to_key(&self, name: &CStr) -> Option<usize> {
        self.seq.read_retry(|inner| {
            if name_equal(&inner.name_safe_to_read, name) {
                Some(inner.safe_to_read.load())
            } else {
                None
            }
        })
    }

    fn store_name_key(&self, name: &CStr, key: usize) {
        assert!(name.len_with_nul() < NAME_SIZE);

        let guard = self.seq.write();
        let inner = guard.seq_data();
        name_copy(&inner.name_safe_to_read, name);
        inner.safe_to_read.store(key);
    }
}
This API is rather nice because it lets us avoid repeating the logic in
name_to_key. The seqlock provides the retry functionality for us.
Of course it also has the disadvantage that the layout is rather more
fixed than the C version. The layout used here is probably what you want
in many cases, but the more things you enforce in the API, the more
inflexible it becomes and the fewer cases it will support. This is a
trade-off in API design.
Another way to tie the seqlock with the spinlock could be to do the same
as LockedBy: https://rust.docs.kernel.org/kernel/sync/struct.LockedBy.html
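To sketch the rough idea (hypothetical, and not the existing LockedBy
API): the seq-protected value would only accept writes from a caller
that can present the data living inside the spinlock, which only a
lock holder can produce, and -- like LockedBy -- it could additionally
record which owner it belongs to and assert on that.

pub struct SeqnumLockedBy<T> {
    seqnum: Seqnum,
    value: T,
}

impl<T> SeqnumLockedBy<T> {
    pub fn try_read<U>(&self, f: impl FnOnce(&T) -> U) -> Option<U> {
        let seq = self.seqnum.begin();
        let ret = f(&self.value);
        if self.seqnum.retry(seq) { None } else { Some(ret) }
    }

    // Holding `&mut` to the lock-protected owner data is the proof that
    // the caller holds the lock serializing writers.
    pub fn write<V>(&self, _owner: &mut V, f: impl FnOnce(&T)) {
        self.seqnum.write_begin();
        f(&self.value);
        self.seqnum.write_end();
    }
}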
Anyway, this email has become long enough. I'll stop it here.
Alice