From: sashiko-bot@kernel.org
To: leixiang <leenollei@gmail.com>
Cc: kvm@vger.kernel.org, ojeda@kernel.org
Subject: Re: [RFC PATCH] KVM: rust: add Rust reimplementation of eventfd
Date: Fri, 26 Jun 2026 08:48:41 +0000 [thread overview]
Message-ID: <20260626084842.172581F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260626083212.39611-1-leixiang@kylinos.cn>
Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [Critical] Recursive spinlock acquisition causing guaranteed deadlock in `irqfd_wakeup`.
- [Critical] ABBA deadlock between `irqfds.lock` and waitqueue lock intentionally hidden from lockdep.
- [High] `kvm_deassign_ioeventfd_idx` fails to free the `Ioeventfd` device, leaking memory and eventfd references.
- [High] MutexGuard dropped too late, holding `irq_lock` across `synchronize_srcu_expedited`.
- [High] Logic error in `irqfd_wakeup` injects spurious interrupts for deactivating irqfds.
- [High] Unsafe RCU list traversal in `kvm_notify_irqfd_resampler` lacks `rcu_dereference` memory barriers.
- [Medium] Unaligned memory read causes undefined behavior in `ioeventfd_in_range`.
--
commit 3427a3ec37eb356c05fc7c711d26f74d692d5086
Author: ninol <leenollei@gmail.com>
Subject: KVM: rust: add Rust reimplementation of eventfd
This patch introduces a Rust reimplementation of KVM's eventfd support, providing irqfd and ioeventfd functionality. It uses RAII guards for lock management and is selectable via a new Kconfig option.
> diff --git a/rust/kernel/kvm/eventfd.rs b/rust/kernel/kvm/eventfd.rs
> new file mode 100644
> index 0000000000000..d9b12c7caccb0
> --- /dev/null
> +++ b/rust/kernel/kvm/eventfd.rs
[ ... ]
> +unsafe extern "C" fn irqfd_wakeup(
> + wait: *mut bindings::wait_queue_entry,
> + _mode: c_uint,
> + _sync: c_int,
> + key: *mut c_void,
> +) -> c_int {
> + // SAFETY: wait is embedded in kvm_kernel_irqfd.
> + let irqfd: *mut bindings::kvm_kernel_irqfd =
> + crate::container_of!(wait, bindings::kvm_kernel_irqfd, wait);
> + let flags = (key as usize) as bindings::__poll_t;
> + let kvm = (*irqfd).kvm;
> + let mut ret: c_int = 0;
> +
> + if (flags & bindings::POLLIN) != 0 {
> + let mut cnt: u64 = 0;
> + eventfd_ctx_do_read((*irqfd).eventfd, &raw mut cnt);
[Severity: Critical]
Does this cause a recursive spinlock deadlock? This irqfd_wakeup() function is
invoked from the eventfd waitqueue wakeup path where the waitqueue spinlock
(wqh.lock) is already held. Calling eventfd_ctx_do_read() here will try to
acquire that exact same spinlock again, resulting in an immediate deadlock.
> +
> + // SAFETY: kvm->irq_srcu is valid.
> + let _srcu_guard = unsafe { SrcuGuard::new(&raw mut (*kvm).irq_srcu) };
> +
> + // Read irq_entry under seqcount protection to avoid torn reads
> + // when kvm_irq_routing_update() writes concurrently on another CPU.
> + let mut irq = loop {
> + let seq = unsafe { bindings::read_seqcount_begin(&raw mut (*irqfd).irq_entry_sc) };
> + let entry = unsafe { core::ptr::read(&(*irqfd).irq_entry) };
> + if unsafe { bindings::read_seqcount_retry(&raw mut (*irqfd).irq_entry_sc, seq) } == 0 {
> + break entry;
> + }
> + };
> +
> + let deactivating = !irqfd_is_active(irqfd);
> + if deactivating
> + || unsafe {
> + bindings::kvm_arch_set_irq_inatomic(
> + &raw mut irq,
> + kvm,
> + bindings::KVM_USERSPACE_IRQ_SOURCE_ID as i32,
> + 1,
> + false,
> + )
> + } == -(bindings::EWOULDBLOCK as c_int)
> + {
> + // SAFETY: FFI call.
> + unsafe { bindings::schedule_work(&raw mut (*irqfd).inject) };
> + }
[Severity: High]
Does this logic mistakenly inject spurious interrupts when an irqfd is
deactivating? If the deactivating condition is true, the || operator
short-circuits and evaluates the entire condition to true, meaning we will
schedule work for deactivating irqfds. This could lead to use-after-free
issues if the work executes after the irqfd memory has been freed.
> +
> + // _srcu_guard is dropped here
> + ret = 1;
> + }
> +
> + if (flags & bindings::POLLHUP) != 0 {
> + // SAFETY: Taking irqfds.lock is safe here per original C comment.
> + // SAFETY: kvm is valid, irqfds.lock is valid.
> + let _spin_guard = unsafe { SpinLockIrqSaveGuard::new(&raw mut (*kvm).irqfds.lock) };
[Severity: Critical]
Is there an ABBA deadlock risk here? This acquires irqfds.lock while the
waitqueue lock (wqh.lock) is already held by the caller. However, in
kvm_irqfd_register(), irqfds.lock is acquired before calling
add_wait_queue_priority_exclusive(), which in turn acquires wqh.lock. This
lock inversion can trigger a deadlock when registering and closing eventfds
concurrently.
[ ... ]
> +unsafe extern "C" fn kvm_irqfd_register(
> + _file: *mut bindings::file,
> + wqh: *mut bindings::wait_queue_head,
> + pt: *mut bindings::poll_table,
> +) {
[ ... ]
> + // Initialize wait queue entry with our custom wakeup handler.
> + init_waitqueue_func_entry(&raw mut (*irqfd).wait, Some(irqfd_wakeup));
> +
> + // Temporarily lie to lockdep about holding irqfds.lock to avoid a
> + // false positive: add_wait_queue_priority_exclusive takes wqh->lock
> + // internally, and lockdep would see irqfds.lock -> wqh->lock, which
> + // matches the reverse order in irqfd_wakeup(EPOLLHUP).
> + // SAFETY: `kvm` is a valid pointer, and `irqfds.lock` is held.
> + unsafe { bindings::rust_helper_kvm_irqfds_spin_release(kvm) };
> +
> + // Add as priority waiter on eventfd.
> + (*p).ret = add_wait_queue_priority_exclusive(wqh, &raw mut (*irqfd).wait);
> +
> + // SAFETY: `kvm` is valid, lockdep state is restored after
> + // releasing wait queue lock.
> + unsafe { bindings::rust_helper_kvm_irqfds_spin_acquire(kvm) };
[Severity: Critical]
This isn't a false positive; silencing lockdep here hides the real ABBA
deadlock mentioned above between irqfd_wakeup() and this function. Could we
restructure the locking to avoid this inversion instead of hiding it from
lockdep?
[ ... ]
> +#[inline(always)]
> +unsafe fn ioeventfd_in_range(
> + p: &Ioeventfd,
> + addr: bindings::gpa_t,
> + len: c_int,
> + val: *const c_void,
> +) -> bool {
[ ... ]
> + if let Some(datamatch) = p.datamatch() {
> + // KVM guarantees val is naturally aligned; assert to match C's BUG_ON.
> + debug_assert!(
> + (val as usize) % (len as usize) == 0,
> + "ioeventfd_in_range: val pointer not aligned to len"
> + );
> + let _val: u64 = match len {
> + 1 => unsafe { (val as *const u8).read() as u64 },
> + 2 => unsafe { (val as *const u16).read() as u64 },
> + 4 => unsafe { (val as *const u32).read() as u64 },
> + 8 => unsafe { (val as *const u64).read() as u64 },
> + _ => return false,
> + };
[Severity: Medium]
Could this result in undefined behavior? In release builds, debug_assert!
compiles away, and ptr::read() strictly requires the input pointer to be
aligned. If the pointer happens to be unaligned, this invokes undefined
behavior. Using ptr::read_unaligned() would be safer here.
[ ... ]
> +unsafe fn kvm_deassign_ioeventfd_idx(
> + kvm: *mut bindings::kvm,
> + bus_idx: bindings::kvm_bus,
> + args: *const bindings::kvm_ioeventfd,
> +) -> c_int {
[ ... ]
> + // SAFETY: kvm_io_bus_unregister_dev will trigger the iodevice destructor,
> + // which implicitly performs list_del and kfree(p). Using `next` above avoids UAF.
> + unsafe { bindings::kvm_io_bus_unregister_dev(kvm, bus_idx, &raw mut (*p).dev) };
> + let bus = unsafe { bindings::kvm_get_bus(kvm, bus_idx) };
> + if !bus.is_null() {
> + unsafe { (*bus).ioeventfd_count -= 1 };
> + }
> + ret = 0;
> + break;
> + }
> + // _mutex_guard drops here, unlocking before eventfd_ctx_put.
> + }
[Severity: High]
Does this code leak the Ioeventfd and eventfd references? The assumption in
the comment that kvm_io_bus_unregister_dev() triggers the destructor is
incorrect; it only removes the device from the bus array. The caller is still
responsible for explicitly deleting the list entry and freeing the memory.
[ ... ]
> +pub unsafe extern "C" fn kvm_notify_irqfd_resampler(
> + kvm: *mut bindings::kvm,
> + irqchip: c_uint,
> + pin: c_uint,
> +) -> bool {
> + // SAFETY: kvm->irq_srcu is valid.
> + let _srcu_guard = unsafe { SrcuGuard::new(&raw mut (*kvm).irq_srcu) };
> + let gsi = unsafe { bindings::kvm_irq_map_chip_pin(kvm, irqchip, pin) };
> +
> + if gsi != -1 {
> + let mut pos = (*kvm).irqfds.resampler_list.next;
> + let head = &raw mut (*kvm).irqfds.resampler_list;
> + while pos != head {
> + let resampler: *mut bindings::kvm_kernel_irqfd_resampler =
> + crate::container_of!(pos, bindings::kvm_kernel_irqfd_resampler, link);
[Severity: High]
Can this plain pointer traversal cause a data race? Accessing an RCU list via
direct pos.next dereferences under an SRCU read lock lacks the memory barrier
provided by rcu_dereference(), which is required on weakly ordered
architectures to prevent reading uninitialized state.
[ ... ]
> +pub unsafe extern "C" fn kvm_unregister_irq_ack_notifier(
> + kvm: *mut bindings::kvm,
> + kian: *mut bindings::kvm_irq_ack_notifier,
> +) {
> + // SAFETY: Caller guarantees kvm is valid.
> + let _mutex_guard = unsafe { MutexGuard::new(&raw mut (*kvm).irq_lock) };
> + unsafe { bindings::rust_helper_hlist_del_init_rcu(&raw mut (*kian).link) };
> + // _mutex_guard dropped here
> + unsafe { bindings::synchronize_srcu_expedited(&raw mut (*kvm).irq_srcu) };
> + unsafe { bindings::kvm_arch_post_irq_ack_notifier_list_update(kvm) };
> +}
[Severity: High]
Is the _mutex_guard actually dropped where the comment claims? Because
_mutex_guard is not explicitly dropped (e.g., via drop(_mutex_guard) or
wrapping in an inner block), its lifetime extends to the end of the function
scope. This means irq_lock is held across the blocking
synchronize_srcu_expedited() call, violating the locking hierarchy and
potentially causing a deadlock.
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260626083212.39611-1-leixiang@kylinos.cn?part=1
next prev parent reply other threads:[~2026-06-26 8:48 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-26 8:29 [RFC PATCH] KVM: rust: add Rust reimplementation of eventfd leixiang
2026-06-26 8:48 ` sashiko-bot [this message]
2026-06-29 8:29 ` Onur Özkan
2026-06-30 7:16 ` ninol
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260626084842.172581F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=kvm@vger.kernel.org \
--cc=leenollei@gmail.com \
--cc=ojeda@kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox