Date: Tue, 22 Apr 2025 19:37:34 +0000
X-Mailing-List: rust-for-linux@vger.kernel.org
Subject: Re: looking for an informed
write up on how to handle serious multicore issues in rust
From: Alice Ryhl
To: Mateusz Guzik
Cc: rust-for-linux@vger.kernel.org

On Tue, Apr 22, 2025 at 04:44:20PM +0200, Mateusz Guzik wrote:
> If one was to write C code to handle this *with* safety belts, it
> would be a lot of hand-rolled asserts, and even then there is some
> possibility of a screw-up.
>
> Note in this case the ->name_safe_to_read array might *change* as it
> is being used to compare in name_equal(). A false positive/negative
> mismatch is handled with the sequence counter.
>
> So I would like a code sample in Rust which handles the above in a
> safe and performance-friendly manner. Preferably, code trying to
> misuse the struct would fail to compile, but a debug-only runtime
> check would do just fine.
>
> Most notably:
> - the code issuing the name check + key read *must* validate the
>   sequence counter before returning
> - need_the_lock_to_access_in_any_way must not be accessible without
>   the lock (duh)
> - no writes to the read-mostly area without the lock held, except for
>   the atomic var
>
> So how do you do this in Rust? If I did not cover enough, I can write
> a self-contained, fully-fledged C example.
>
> --
> Mateusz Guzik

There isn't a single canonical implementation of seqlocks in Rust that
I can point you to, and we don't have one in the kernel. But we can go
through how we might design one. I'll start with a pretty bare-bones
API and expand it.

To start with, I'm going to assume that we have a Seqnum implementation:

	pub struct Seqnum { ... }

	impl Seqnum {
	    pub fn begin(&self) -> usize;
	    pub fn retry(&self, num: usize) -> bool;
	    pub fn write_begin(&self);
	    pub fn write_end(&self);
	}

I will not get into how you implement these. You can use the same
implementation as C. I understand that the kernel implements them with
inline assembly, which we could replicate, or we could do a number of
other things.
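For concreteness, here is a minimal userspace sketch of what such a Seqnum could look like, built on a std atomic counter (even = no writer, odd = write in progress). This is my own stand-in, not the kernel's implementation; the memory orderings are simplified relative to what a real seqcount needs, and the `new` constructor is an assumption:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

pub struct Seqnum {
    // Even: no write in progress. Odd: a writer is mid-update.
    count: AtomicUsize,
}

impl Seqnum {
    pub const fn new() -> Self {
        Seqnum { count: AtomicUsize::new(0) }
    }

    // Snapshot the counter before a lockless read.
    pub fn begin(&self) -> usize {
        self.count.load(Ordering::Acquire)
    }

    // A read must be retried if a write was in progress (odd snapshot)
    // or the counter moved while we were reading.
    pub fn retry(&self, num: usize) -> bool {
        num % 2 == 1 || self.count.load(Ordering::Acquire) != num
    }

    // Writers must be serialized externally (e.g. by a spinlock).
    pub fn write_begin(&self) {
        self.count.fetch_add(1, Ordering::Release);
    }

    pub fn write_end(&self) {
        self.count.fetch_add(1, Ordering::Release);
    }
}
```

The `retry` check is the heart of the pattern: a reader that saw an odd counter, or a counter that changed under it, must throw its result away.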
Regardless, this is enough to get us a working but bare-bones
implementation:

	#[repr(C)]
	struct Meh {
	    seq: Seqnum,
	    safe_to_read: AtomicUsize,
	    name_safe_to_read: [AtomicU8; NAME_SIZE],
	    modifiable_without_the_lock: AtomicI32,
	    // some other fields ...
	    _cache: Cachealign, /* type with size 0 and alignment 64 */
	    lock: SpinLock<MehInner>,
	}

	struct MehInner {
	    needs_the_lock: usize,
	}

	fn name_equal(name: &[AtomicU8; NAME_SIZE], cmp: &CStr) -> bool {
	    // Left as an exercise for the reader.
	    //
	    // Probably you just call the C implementation of strncmp, and
	    // this function is probably defined in `kernel`.
	}

	fn name_copy(dst: &[AtomicU8; NAME_SIZE], src: &CStr) {
	    // Ditto.
	}

	impl Meh {
	    fn name_to_key(&self, name: &CStr) -> Option<usize> {
	        let seq = self.seq.begin();
	        if name_equal(&self.name_safe_to_read, name) {
	            let key = self.safe_to_read.load(Relaxed);
	            if !self.seq.retry(seq) {
	                return Some(key);
	            }
	        }

	        let _guard = self.lock.lock();
	        if name_equal(&self.name_safe_to_read, name) {
	            Some(self.safe_to_read.load(Relaxed))
	        } else {
	            None
	        }
	    }

	    fn store_name_key(&self, name: &CStr, key: usize) {
	        assert!(name.len_with_nul() < NAME_SIZE);
	        let _guard = self.lock.lock();
	        self.seq.write_begin();
	        name_copy(&self.name_safe_to_read, name);
	        self.safe_to_read.store(key, Relaxed);
	        self.seq.write_end();
	    }
	}

Your first reaction is probably "ATOMICS!?". But hold your horses for a
moment. The reason we can't use the ordinary `usize` type is that we
need to *disallow* you from performing ordinary unsynchronized reads,
similar to how you used READ_ONCE instead of just reading the field
normally. I used atomics in this example to achieve that, but it is
true that atomics are stronger and more expensive than what we need. We
might replace AtomicUsize with a new type called SharedUsize (naming
TBD!) that has `load` and `store` methods which are weaker than the
relaxed atomic ops. For example, even relaxed loads on AtomicUsize have
additional costs associated with them because they're guaranteed to
avoid tearing. But I digress.
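As a sanity check, the sketch above can be rendered as compilable userspace code. This is an approximation, not kernel code: std's Mutex stands in for SpinLock, a NUL-terminated byte slice stands in for CStr, MehInner is elided, and the Relaxed orderings gloss over the real barrier requirements:

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering::Relaxed};
use std::sync::Mutex;

const NAME_SIZE: usize = 16;

// Stand-in seqcount: even means no writer, odd means write in progress.
struct Seqnum(AtomicUsize);

impl Seqnum {
    fn begin(&self) -> usize { self.0.load(Relaxed) }
    fn retry(&self, n: usize) -> bool {
        n % 2 == 1 || self.0.load(Relaxed) != n
    }
    fn write_begin(&self) { self.0.fetch_add(1, Relaxed); }
    fn write_end(&self) { self.0.fetch_add(1, Relaxed); }
}

struct Meh {
    seq: Seqnum,
    safe_to_read: AtomicUsize,
    name_safe_to_read: [AtomicU8; NAME_SIZE],
    lock: Mutex<()>, // MehInner elided for brevity
}

// `cmp` includes its trailing NUL, so a prefix can never match.
fn name_equal(name: &[AtomicU8; NAME_SIZE], cmp: &[u8]) -> bool {
    cmp.len() <= NAME_SIZE
        && cmp.iter().enumerate().all(|(i, &c)| name[i].load(Relaxed) == c)
}

impl Meh {
    fn new() -> Self {
        Meh {
            seq: Seqnum(AtomicUsize::new(0)),
            safe_to_read: AtomicUsize::new(0),
            name_safe_to_read: std::array::from_fn(|_| AtomicU8::new(0)),
            lock: Mutex::new(()),
        }
    }

    fn name_to_key(&self, name: &[u8]) -> Option<usize> {
        // Fast path: lockless read, validated by the sequence counter.
        let seq = self.seq.begin();
        if name_equal(&self.name_safe_to_read, name) {
            let key = self.safe_to_read.load(Relaxed);
            if !self.seq.retry(seq) {
                return Some(key);
            }
        }
        // Slow path: the lock excludes writers, so no retry is needed.
        let _guard = self.lock.lock().unwrap();
        if name_equal(&self.name_safe_to_read, name) {
            Some(self.safe_to_read.load(Relaxed))
        } else {
            None
        }
    }

    fn store_name_key(&self, name: &[u8], key: usize) {
        assert!(name.len() <= NAME_SIZE);
        let _guard = self.lock.lock().unwrap();
        self.seq.write_begin();
        for (i, &b) in name.iter().enumerate() {
            self.name_safe_to_read[i].store(b, Relaxed);
        }
        self.safe_to_read.store(key, Relaxed);
        self.seq.write_end();
    }
}
```

Nothing here stops a caller from misusing the struct, which is exactly the problem discussed next.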
The point is that we need a special version of usize like AtomicUsize
that allows us to annotate that these loads and stores are not normal
loads and stores, because other threads may also touch the values in
parallel. It has to be a separate type from usize to prevent you from
performing ordinary unsynchronized reads.

Let's get back to the example. The bigger problem is that there is no
enforcement of correct use at all. After all, we're just using atomics,
and any combination of reads and writes from different threads is safe
no matter what, so the compiler doesn't care.

To start improving the API, we can wrap Seqnum in a SeqLock type. It
could look like this:

	pub struct SeqLock<T> {
	    seqnum: Seqnum,
	    value: T,
	}

	impl<T> SeqLock<T> {
	    pub fn try_read<U>(&self, f: impl FnOnce(&T) -> U) -> Option<U> {
	        let seq = self.seqnum.begin();
	        let ret = f(&self.value);
	        if self.seqnum.retry(seq) {
	            None
	        } else {
	            Some(ret)
	        }
	    }

	    pub fn assume_no_writers(&self) -> &T {
	        &self.value
	    }

	    pub fn write(&self) -> WriteGuard<'_, T> {
	        self.seqnum.write_begin();
	        WriteGuard(self)
	    }
	}

	struct WriteGuard<'a, T>(&'a SeqLock<T>);

	impl<T> Drop for WriteGuard<'_, T> {
	    fn drop(&mut self) {
	        self.0.seqnum.write_end();
	    }
	}

	impl<T> Deref for WriteGuard<'_, T> {
	    type Target = T;
	    fn deref(&self) -> &T {
	        &self.0.value
	    }
	}

This API is still not fool-proof - we're missing the part where it
knows about the spinlock - but at the very least you have to explicitly
annotate each access with a call to `try_read`, `assume_no_writers`, or
`write`. This is how you might use it:

	#[repr(C)]
	struct Meh {
	    seq: SeqLock<MehSeq>,
	    modifiable_without_the_lock: AtomicI32,
	    // some other fields ...
	    _cache: Cachealign, /* type with size 0 and alignment 64 */
	    lock: SpinLock<MehInner>,
	}

	struct MehSeq {
	    safe_to_read: SharedUsize,
	    name_safe_to_read: [SharedU8; NAME_SIZE],
	}

	struct MehInner {
	    needs_the_lock: usize,
	}

	impl Meh {
	    fn name_to_key(&self, name: &CStr) -> Option<usize> {
	        let ret = self.seq.try_read(|inner| {
	            if name_equal(&inner.name_safe_to_read, name) {
	                Some(inner.safe_to_read.load())
	            } else {
	                None
	            }
	        });
	        if let Some(ret) = ret {
	            return ret;
	        }

	        let _guard = self.lock.lock();
	        let inner = self.seq.assume_no_writers();
	        if name_equal(&inner.name_safe_to_read, name) {
	            Some(inner.safe_to_read.load())
	        } else {
	            None
	        }
	    }

	    fn store_name_key(&self, name: &CStr, key: usize) {
	        assert!(name.len_with_nul() < NAME_SIZE);
	        let _guard = self.lock.lock();
	        let inner = self.seq.write();
	        name_copy(&inner.name_safe_to_read, name);
	        inner.safe_to_read.store(key);
	        // destructor of inner ends the seqlock write region
	    }
	}

Notice how the fields protected by the seqlock are moved into a new
sub-struct, similar to MehInner. The `try_read` method is also
interesting: it throws away your return value if you need to retry, to
discourage you from using invalid values.

Anyway. There's still another big issue with the design: when you use
try_read or assume_no_writers, it still lets you write to the data.
After all, we're using atomics, which we can write to at any time.
There are many ways to improve the API to prevent such writes. I'll
share one here: if you have write access, then that means that you have
a WriteGuard object. So we can have you pass a reference to it:

	impl SharedUsize {
	    pub fn load(&self) -> usize;
	    pub fn store<T>(&self, value: usize, proof: &WriteGuard<'_, T>);
	}

This way, readers can't call `store` because they don't have a
WriteGuard.
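Here is one way the SeqLock wrapper and this proof-parameter idea could fit together as compilable userspace code. Again a sketch under assumptions: the seqcount is an inlined std AtomicUsize with Relaxed orderings, writer serialization is left to the caller, and SharedUsize is backed by a plain atomic rather than a weaker primitive:

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

pub struct SeqLock<T> {
    seqnum: AtomicUsize, // inlined stand-in for Seqnum
    value: T,
}

pub struct WriteGuard<'a, T>(&'a SeqLock<T>);

// Storage whose `store` demands a borrow of some WriteGuard as proof
// that a seqlock write section is open.
pub struct SharedUsize(AtomicUsize);

impl SharedUsize {
    pub fn new(v: usize) -> Self { SharedUsize(AtomicUsize::new(v)) }
    pub fn load(&self) -> usize { self.0.load(Relaxed) }
    pub fn store<T>(&self, value: usize, _proof: &WriteGuard<'_, T>) {
        self.0.store(value, Relaxed);
    }
}

impl<T> SeqLock<T> {
    pub fn new(value: T) -> Self {
        SeqLock { seqnum: AtomicUsize::new(0), value }
    }

    pub fn try_read<U>(&self, f: impl FnOnce(&T) -> U) -> Option<U> {
        let seq = self.seqnum.load(Relaxed);
        let ret = f(&self.value);
        // Discard the result if a writer was, or became, active.
        if seq % 2 == 1 || self.seqnum.load(Relaxed) != seq {
            None
        } else {
            Some(ret)
        }
    }

    pub fn assume_no_writers(&self) -> &T {
        &self.value
    }

    // Writers must be serialized externally (the spinlock's job).
    pub fn write(&self) -> WriteGuard<'_, T> {
        self.seqnum.fetch_add(1, Relaxed);
        WriteGuard(self)
    }
}

impl<T> Drop for WriteGuard<'_, T> {
    fn drop(&mut self) {
        self.0.seqnum.fetch_add(1, Relaxed);
    }
}

impl<T> Deref for WriteGuard<'_, T> {
    type Target = T;
    fn deref(&self) -> &T { &self.0.value }
}
```

A reader holding only `&SeqLock<T>` has no WriteGuard to pass, so a call to `store` simply does not type-check from the read side.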
The writer becomes:

	fn store_name_key(&self, name: &CStr, key: usize) {
	    assert!(name.len_with_nul() < NAME_SIZE);
	    let _guard = self.lock.lock();
	    let inner = self.seq.write();
	    name_copy(&inner.name_safe_to_read, name, &inner);
	    inner.safe_to_read.store(key, &inner);
	    // destructor of inner ends the seqlock write region
	}

This isn't the only way to solve this problem - there are others. Not
all of them are fool-proof, including this one. For example, I could
get a WriteGuard from one SeqLock and use it with a SharedUsize stored
in *another* SeqLock to bypass it. Other solutions *are* fool-proof and
don't have this bypass. Some of the other Rust folks are probably happy
to tell you about field projections, an upcoming language feature that
lets you do this (and other things such as RCU) in a fool-proof way
with much more convenient syntax. Anyway, since the reads and writes
are just atomics, Rust the language doesn't care too much if there's a
bypass. It's mostly something that we might care about as API
designers.

Ok, let's return to another problem in the design, which is that it's
not tied together with the spinlock. This means that you could call
write() or assume_no_writers() without holding it. The most direct way
of solving this is to make a single abstraction that wraps the entire
thing.
	#[repr(C)]
	pub struct SeqLock2<T, U, V> {
	    seqnum: Seqnum,
	    seq_data: T,
	    extra_data: U,
	    _cache: Cachealign,
	    lock: SpinLock<V>,
	}

	impl<T, U, V> SeqLock2<T, U, V> {
	    pub fn try_read<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
	        let seq = self.seqnum.begin();
	        let ret = f(&self.seq_data);
	        if self.seqnum.retry(seq) {
	            None
	        } else {
	            Some(ret)
	        }
	    }

	    pub fn extra_data(&self) -> &U {
	        &self.extra_data
	    }

	    pub fn read_with_lock(&self) -> ReadWithLockGuard<'_, T, U, V> {
	        ReadWithLockGuard {
	            seq: self,
	            guard: self.lock.lock(),
	        }
	    }

	    pub fn read_retry<R>(&self, mut f: impl FnMut(&T) -> R) -> R {
	        if let Some(ret) = self.try_read(&mut f) {
	            return ret;
	        }
	        let locked = self.read_with_lock();
	        f(locked.seq_data())
	    }

	    pub fn write(&self) -> WriteGuard<'_, T, U, V> {
	        let guard = self.lock.lock();
	        self.seqnum.write_begin();
	        WriteGuard {
	            seq: self,
	            guard,
	        }
	    }
	}

	pub struct ReadWithLockGuard<'a, T, U, V> {
	    seq: &'a SeqLock2<T, U, V>,
	    guard: SpinLockGuard<'a, V>,
	}

	impl<'a, T, U, V> ReadWithLockGuard<'a, T, U, V> {
	    pub fn seq_data(&self) -> &T {
	        &self.seq.seq_data
	    }

	    pub fn lock_data(&self) -> &V {
	        // We hold the lock, so we also get to use the lock data.
	        &self.guard
	    }
	}

	// Ditto for WriteGuard.

By using a combined type like the above, we enforce that the spinlock
and the seqlock are used together, and make it pretty hard to get that
wrong. The example becomes:

	#[repr(C)]
	struct Meh {
	    seq: SeqLock2<MehSeq, AtomicI32, MehInner>,
	}

	struct MehSeq {
	    safe_to_read: SharedUsize,
	    name_safe_to_read: [SharedU8; NAME_SIZE],
	}

	struct MehInner {
	    needs_the_lock: usize,
	}

	impl Meh {
	    fn name_to_key(&self, name: &CStr) -> Option<usize> {
	        self.seq.read_retry(|inner| {
	            if name_equal(&inner.name_safe_to_read, name) {
	                Some(inner.safe_to_read.load())
	            } else {
	                None
	            }
	        })
	    }

	    fn store_name_key(&self, name: &CStr, key: usize) {
	        assert!(name.len_with_nul() < NAME_SIZE);
	        let guard = self.seq.write();
	        let inner = guard.seq_data();
	        name_copy(&inner.name_safe_to_read, name);
	        inner.safe_to_read.store(key);
	    }
	}

This API is rather nice because it lets us avoid repeating the logic in
name_to_key.
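A compilable userspace rendering of this combined type could look as follows. It is a reduced sketch: std's Mutex and MutexGuard stand in for SpinLock and SpinLockGuard, the extra_data field and Cachealign padding are omitted, ReadWithLockGuard is folded into read_retry for brevity, and the Relaxed orderings again simplify the real barrier story:

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};
use std::sync::{Mutex, MutexGuard};

pub struct SeqLock2<T, V> {
    seqnum: AtomicUsize, // inlined stand-in for Seqnum
    seq_data: T,
    lock: Mutex<V>,
}

pub struct WriteGuard<'a, T, V> {
    seq: &'a SeqLock2<T, V>,
    _guard: MutexGuard<'a, V>, // held until the write section ends
}

impl<T, V> SeqLock2<T, V> {
    pub fn new(seq_data: T, lock_data: V) -> Self {
        SeqLock2 {
            seqnum: AtomicUsize::new(0),
            seq_data,
            lock: Mutex::new(lock_data),
        }
    }

    pub fn try_read<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        let seq = self.seqnum.load(Relaxed);
        let ret = f(&self.seq_data);
        if seq % 2 == 1 || self.seqnum.load(Relaxed) != seq {
            None
        } else {
            Some(ret)
        }
    }

    pub fn read_retry<R>(&self, mut f: impl FnMut(&T) -> R) -> R {
        if let Some(ret) = self.try_read(&mut f) {
            return ret;
        }
        // Slow path: holding the lock excludes writers, so the data is
        // stable and no retry loop is needed.
        let _guard = self.lock.lock().unwrap();
        f(&self.seq_data)
    }

    // Taking the lock and opening the write section are one operation,
    // so a writer cannot forget either half.
    pub fn write(&self) -> WriteGuard<'_, T, V> {
        let guard = self.lock.lock().unwrap();
        self.seqnum.fetch_add(1, Relaxed);
        WriteGuard { seq: self, _guard: guard }
    }
}

impl<'a, T, V> WriteGuard<'a, T, V> {
    pub fn seq_data(&self) -> &T {
        &self.seq.seq_data
    }
}

impl<T, V> Drop for WriteGuard<'_, T, V> {
    fn drop(&mut self) {
        // Close the write section; the MutexGuard field unlocks after.
        self.seq.seqnum.fetch_add(1, Relaxed);
    }
}
```

The drop order matters: the explicit Drop impl ends the seqcount write section first, and only then does the MutexGuard field release the lock, mirroring write_end-before-unlock in the C pattern.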
The seqlock provides the retry functionality for us.

Of course, it also has the disadvantage that the layout is rather more
fixed than the C version. The layout used here is probably what you
want in many cases, but the more things you enforce in the API, the
more inflexible it becomes and the fewer cases it will support. This is
a trade-off in API design.

Another way to tie the seqlock to the spinlock could be to do the same
as LockedBy:
https://rust.docs.kernel.org/kernel/sync/struct.LockedBy.html

Anyway, this email has become long enough. I'll stop it here.

Alice