Rust for Linux List
 help / color / mirror / Atom feed
* [PATCH 0/1] rust: block: mq: make GenDisk Send impl sound
@ 2026-06-05  4:49 Yuan Tan
  2026-06-05  4:49 ` [PATCH 1/1] " Yuan Tan
  2026-06-05 12:04 ` [PATCH 0/1] " Andreas Hindborg
  0 siblings, 2 replies; 3+ messages in thread
From: Yuan Tan @ 2026-06-05  4:49 UTC (permalink / raw)
  To: a.hindborg, ojeda, boqun, rust-for-linux
  Cc: zhiyunq, ardalan, pgovind2, dzueck, Yuan Tan

Hi Linux kernel maintainers,

We are developing a tool called FerroLens to detect potential unsound
behavior in Rust code in the Linux kernel. FerroLens reported the following
bug in rust/kernel/gendisk.rs.

Gendisk is marked as Send although the fields it contains may not be.
Specifically, the QueueData held in the raw gendisk pointer may not be safe
to send across threads. Therefore, sending the Gendisk from one thread to
another and dropping on a different thread may cause unsound behavior.

Additionally, Gendisk contains an Arc<TagSet<T>>. This Arc would be Send
and Sync if the underlying TagSet<T> were Send and Sync. But this is not
explicitly derived, although it can be, since the API does not modify the
TagSet.

We're not Rust experts, so we may have gotten some things wrong. We'd greatly
appreciate any corrections.


I tried creating a PoC to trigger this bug and make our findings more
solid.

We actually hacked the kernel a bit to enable KCSAN for the Rust pieces,
and it did generate a few crashes, but they are highly unstable.

BUG: KCSAN: data-race in drop_in_place<...GenDisk...> / __srcu_check_read_flavor

Here is the PoC I used. Hopefully, sharing it here in case it can helps
anyone better understand the bug.

---
python3 guest_kcsan_inflight_teardown.py --rounds 20 --inflight-threads 2 --io-threads 2 --io-burst 8 --round-cooldown 0.02

#!/usr/bin/env python3
import os
import threading
import time
import traceback


BASE = "/sys/kernel/config/rnull"
SYS_BLOCK = "/sys/block"
DEVICE_SIZE_MIB = 64
BLOCK_SIZE = 4096
ROUNDS = 220
INFLIGHT_THREADS = 8
POWER_OFF_DELAY_SEC = 0.015
POST_POWEROFF_SPIN_SEC = 0.12


def write_file(path: str, data: str) -> None:
    with open(path, "w", encoding="ascii") as f:
        f.write(data)


def wait_for_path(path: str, timeout_sec: float) -> bool:
    deadline = time.time() + timeout_sec
    while time.time() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(0.001)
    return os.path.exists(path)


def create_device(name: str) -> str:
    path = f"{BASE}/{name}"
    os.mkdir(path)
    write_file(f"{path}/size", f"{DEVICE_SIZE_MIB}\n")
    write_file(f"{path}/blocksize", f"{BLOCK_SIZE}\n")
    write_file(f"{path}/irqmode", "1\n")
    write_file(f"{path}/power", "1\n")
    return path


def inflight_path(name: str) -> str:
    return f"{SYS_BLOCK}/{name}/inflight"


def inflight_worker(name: str, running: threading.Event, errors: list[str]) -> None:
    path = inflight_path(name)
    fd = None
    try:
        if not wait_for_path(path, 1.0):
            errors.append(f"{name}: inflight path not ready")
            return

        fd = os.open(path, os.O_RDONLY)
        while running.is_set():
            try:
                os.lseek(fd, 0, os.SEEK_SET)
                os.read(fd, 128)
            except OSError:
                pass
    except Exception as exc:
        errors.append(f"inflight {name}: {exc!r}\n{traceback.format_exc()}")
    finally:
        if fd is not None:
            try:
                os.close(fd)
            except OSError:
                pass


def destroy_device(path: str, errors: list[str]) -> None:
    for _ in range(1000):
        try:
            os.rmdir(path)
            return
        except OSError:
            time.sleep(0.002)
    errors.append(f"rmdir failed for {path}")


def device_round(name: str, errors: list[str]) -> None:
    path = create_device(name)
    running = threading.Event()
    running.set()

    workers = []
    for _ in range(INFLIGHT_THREADS):
        workers.append(threading.Thread(target=inflight_worker, args=(name, running, errors)))

    for worker in workers:
        worker.start()

    time.sleep(POWER_OFF_DELAY_SEC)
    write_file(f"{path}/power", "0\n")
    time.sleep(POST_POWEROFF_SPIN_SEC)
    running.clear()

    for worker in workers:
        worker.join()

    destroy_device(path, errors)


def main() -> int:
    errors: list[str] = []
    start = time.time()

    for round_id in range(ROUNDS):
        name = f"gi{round_id}"
        try:
            device_round(name, errors)
        except Exception as exc:
            errors.append(f"round {name}: {exc!r}\n{traceback.format_exc()}")

    duration = time.time() - start
    print(f"kcsan_inflight_teardown_done rounds={ROUNDS} duration={duration:.2f}s")
    print(f"errors={len(errors)}")
    for err in errors[:20]:
        print(err)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())



Yuan Tan (1):
  rust: block: mq: make GenDisk Send impl sound

 rust/kernel/block/mq/gen_disk.rs |  8 +++++---
 rust/kernel/block/mq/tag_set.rs  | 11 +++++++++++
 2 files changed, 16 insertions(+), 3 deletions(-)

-- 
2.43.2


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/1] rust: block: mq: make GenDisk Send impl sound
  2026-06-05  4:49 [PATCH 0/1] rust: block: mq: make GenDisk Send impl sound Yuan Tan
@ 2026-06-05  4:49 ` Yuan Tan
  2026-06-05 12:04 ` [PATCH 0/1] " Andreas Hindborg
  1 sibling, 0 replies; 3+ messages in thread
From: Yuan Tan @ 2026-06-05  4:49 UTC (permalink / raw)
  To: a.hindborg, ojeda, boqun, rust-for-linux
  Cc: zhiyunq, ardalan, pgovind2, dzueck, Yuan Tan, stable

GenDisk<T> has a manual Send implementation, but it stores an
Arc<TagSet<T>>. Since Arc<T> is Send only when T is both Send and Sync,
this relies on TagSet<T> being Sync.

TagSet<T> wraps struct blk_mq_tag_set in Opaque, which is based on
UnsafeCell and is therefore not Sync by default. This leaves the GenDisk<T>
Send implementation claiming that GenDisk can be moved across threads even
though one of the values it keeps alive does not satisfy the corresponding
thread-safety requirements.

Fix this by marking TagSet<T> as Send and Sync. TagSet<T> does not expose
safe Rust access to the interior fields of blk_mq_tag_set; the safe Rust
APIs only pass the opaque tag set pointer back to blk-mq, whose queue and
tag mutation paths provide their own locking, RCU, or SRCU synchronization.

Also make GenDisk<T>: Send depend on T::QueueData: Send instead of T: Send.
GenDisk<T> does not own a value of type T, but dropping it reclaims and
drops T::QueueData on the thread that drops the GenDisk.

Fixes: 3253aba3408a ("rust: block: introduce `kernel::block::mq` module")
Cc: stable@vger.kernel.org
Reported-by: Priya Bala Govindasamy <pgovind2@uci.edu>
Reported-by: Dylan Zueck <dzueck@uci.edu>
Signed-off-by: Yuan Tan <ytan089@ucr.edu>
---
 rust/kernel/block/mq/gen_disk.rs |  8 +++++---
 rust/kernel/block/mq/tag_set.rs  | 11 +++++++++++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/rust/kernel/block/mq/gen_disk.rs b/rust/kernel/block/mq/gen_disk.rs
index 912cb805caf5..77f9327c0d4d 100644
--- a/rust/kernel/block/mq/gen_disk.rs
+++ b/rust/kernel/block/mq/gen_disk.rs
@@ -198,9 +198,11 @@ pub struct GenDisk<T: Operations> {
     gendisk: *mut bindings::gendisk,
 }
 
-// SAFETY: `GenDisk` is an owned pointer to a `struct gendisk` and an `Arc` to a
-// `TagSet` It is safe to send this to other threads as long as T is Send.
-unsafe impl<T: Operations + Send> Send for GenDisk<T> {}
+// SAFETY: `GenDisk` owns a `struct gendisk` and keeps the associated `TagSet`
+// alive via the `Arc`; it does not own a value of type `T`. If a `GenDisk` is
+// dropped on another thread, `Drop::drop` reclaims `T::QueueData`, so
+// `T::QueueData` must be `Send`.
+unsafe impl<T: Operations> Send for GenDisk<T> where T::QueueData: Send {}
 
 impl<T: Operations> Drop for GenDisk<T> {
     fn drop(&mut self) {
diff --git a/rust/kernel/block/mq/tag_set.rs b/rust/kernel/block/mq/tag_set.rs
index dae9df408a86..2051f4305e6b 100644
--- a/rust/kernel/block/mq/tag_set.rs
+++ b/rust/kernel/block/mq/tag_set.rs
@@ -31,6 +31,17 @@ pub struct TagSet<T: Operations> {
     _p: PhantomData<T>,
 }
 
+// SAFETY: `TagSet` does not own a value of type `T`; `T` is only used to
+// select the blk-mq operations vtable. The wrapped `struct blk_mq_tag_set` is
+// only exposed to Rust as an opaque handle, and concurrent access to it is
+// synchronized by the blk-mq core.
+unsafe impl<T: Operations> Send for TagSet<T> {}
+
+// SAFETY: `TagSet` does not provide safe access to the interior of the wrapped
+// `struct blk_mq_tag_set`; concurrent access to it is synchronized by the
+// blk-mq core.
+unsafe impl<T: Operations> Sync for TagSet<T> {}
+
 impl<T: Operations> TagSet<T> {
     /// Try to create a new tag set
     pub fn new(
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 0/1] rust: block: mq: make GenDisk Send impl sound
  2026-06-05  4:49 [PATCH 0/1] rust: block: mq: make GenDisk Send impl sound Yuan Tan
  2026-06-05  4:49 ` [PATCH 1/1] " Yuan Tan
@ 2026-06-05 12:04 ` Andreas Hindborg
  1 sibling, 0 replies; 3+ messages in thread
From: Andreas Hindborg @ 2026-06-05 12:04 UTC (permalink / raw)
  To: Yuan Tan, ojeda, boqun, rust-for-linux
  Cc: zhiyunq, ardalan, pgovind2, dzueck, Yuan Tan

Hi,

Yuan Tan <ytan089@ucr.edu> writes:

> Hi Linux kernel maintainers,
>
> We are developing a tool called FerroLens to detect potential unsound
> behavior in Rust code in the Linux kernel. FerroLens reported the following
> bug in rust/kernel/gendisk.rs.
>
> Gendisk is marked as Send although the fields it contains may not be.
> Specifically, the QueueData held in the raw gendisk pointer may not be safe
> to send across threads. Therefore, sending the Gendisk from one thread to
> another and dropping on a different thread may cause unsound behavior.
>
> Additionally, Gendisk contains an Arc<TagSet<T>>. This Arc would be Send
> and Sync if the underlying TagSet<T> were Send and Sync. But this is not
> explicitly derived, although it can be, since the API does not modify the
> TagSet.

Thanks for the patch. The `GenDisk` `Send` fix looks correct to me.

The implementation of `Send` and `Sync` for `TagSet` is not a fix, and
it was already sent to list [1]. It does however suffer the same issue
that you described for `GenDisk`, so I will address this when I send v2.

FYI, I did not receive your email in my inbox, I saw it on list. It
might be rejected by Proton for some reason.

Best regards,
Andreas Hindborg

[1] https://lore.kernel.org/rust-for-linux/20260216-rnull-v6-19-rc5-send-v1-42-de9a7af4b469@kernel.org/



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-05 12:04 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-05  4:49 [PATCH 0/1] rust: block: mq: make GenDisk Send impl sound Yuan Tan
2026-06-05  4:49 ` [PATCH 1/1] " Yuan Tan
2026-06-05 12:04 ` [PATCH 0/1] " Andreas Hindborg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox