From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Marc Zyngier <maz@kernel.org>,
Oliver Upton <oupton@kernel.org>,
Sean Christopherson <seanjc@google.com>
Subject: [PATCH 5.15 05/75] KVM: Dont clobber irqfd routing type when deassigning irqfd
Date: Mon, 9 Feb 2026 15:24:02 +0100 [thread overview]
Message-ID: <20260209142302.032848391@linuxfoundation.org> (raw)
In-Reply-To: <20260209142301.830618238@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Christopherson <seanjc@google.com>
commit b4d37cdb77a0015f51fee083598fa227cc07aaf1 upstream.
When deassigning a KVM_IRQFD, don't clobber the irqfd's copy of the IRQ's
routing entry as doing so breaks kvm_arch_irq_bypass_del_producer() on x86
and arm64, which explicitly look for KVM_IRQ_ROUTING_MSI. Instead, to
handle a concurrent routing update, verify that the irqfd is still active
before consuming the routing information. As evidenced by the x86 and
arm64 bugs, and another bug in kvm_arch_update_irqfd_routing() (see below),
clobbering the entry type without notifying arch code is surprising and
error prone.
As a bonus, checking that the irqfd is active provides a convenient
location for documenting _why_ KVM must not consume the routing entry for
an irqfd that is in the process of being deassigned: once the irqfd is
deleted from the list (which happens *before* the eventfd is detached), it
will no longer receive updates via kvm_irq_routing_update(), and so KVM
could deliver an event using stale routing information (relative to
KVM_SET_GSI_ROUTING returning to userspace).
As an even better bonus, explicitly checking for the irqfd being active
fixes a similar bug to the one the clobbering is trying to prevent: if an
irqfd is deactivated, and then its routing is changed,
kvm_irq_routing_update() won't invoke kvm_arch_update_irqfd_routing()
(because the irqfd isn't in the list). And so if the irqfd is in bypass
mode, IRQs will continue to be posted using the old routing information.
As for kvm_arch_irq_bypass_del_producer(), clobbering the routing type
results in KVM incorrectly keeping the IRQ in bypass mode, which is
especially problematic on AMD as KVM tracks IRQs that are being posted to
a vCPU in a list whose lifetime is tied to the irqfd.
Without the help of KASAN to detect use-after-free, the most common
sympton on AMD is a NULL pointer deref in amd_iommu_update_ga() due to
the memory for irqfd structure being re-allocated and zeroed, resulting
in irqfd->irq_bypass_data being NULL when read by
avic_update_iommu_vcpu_affinity():
BUG: kernel NULL pointer dereference, address: 0000000000000018
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 40cf2b9067 P4D 40cf2b9067 PUD 408362a067 PMD 0
Oops: Oops: 0000 [#1] SMP
CPU: 6 UID: 0 PID: 40383 Comm: vfio_irq_test
Tainted: G U W O 6.19.0-smp--5dddc257e6b2-irqfd #31 NONE
Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.78.2-0 09/05/2025
RIP: 0010:amd_iommu_update_ga+0x19/0xe0
Call Trace:
<TASK>
avic_update_iommu_vcpu_affinity+0x3d/0x90 [kvm_amd]
__avic_vcpu_load+0xf4/0x130 [kvm_amd]
kvm_arch_vcpu_load+0x89/0x210 [kvm]
vcpu_load+0x30/0x40 [kvm]
kvm_arch_vcpu_ioctl_run+0x45/0x620 [kvm]
kvm_vcpu_ioctl+0x571/0x6a0 [kvm]
__se_sys_ioctl+0x6d/0xb0
do_syscall_64+0x6f/0x9d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x46893b
</TASK>
---[ end trace 0000000000000000 ]---
If AVIC is inhibited when the irfd is deassigned, the bug will manifest as
list corruption, e.g. on the next irqfd assignment.
list_add corruption. next->prev should be prev (ffff8d474d5cd588),
but was 0000000000000000. (next=ffff8d8658f86530).
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:31!
Oops: invalid opcode: 0000 [#1] SMP
CPU: 128 UID: 0 PID: 80818 Comm: vfio_irq_test
Tainted: G U W O 6.19.0-smp--f19dc4d680ba-irqfd #28 NONE
Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.78.2-0 09/05/2025
RIP: 0010:__list_add_valid_or_report+0x97/0xc0
Call Trace:
<TASK>
avic_pi_update_irte+0x28e/0x2b0 [kvm_amd]
kvm_pi_update_irte+0xbf/0x190 [kvm]
kvm_arch_irq_bypass_add_producer+0x72/0x90 [kvm]
irq_bypass_register_consumer+0xcd/0x170 [irqbypass]
kvm_irqfd+0x4c6/0x540 [kvm]
kvm_vm_ioctl+0x118/0x5d0 [kvm]
__se_sys_ioctl+0x6d/0xb0
do_syscall_64+0x6f/0x9d0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
On Intel and arm64, the bug is less noisy, as the end result is that the
device keeps posting IRQs to the vCPU even after it's been deassigned.
Note, the worst of the breakage can be traced back to commit cb210737675e
("KVM: Pass new routing entries and irqfd when updating IRTEs"), as before
that commit KVM would pull the routing information from the per-VM routing
table. But as above, similar bugs have existed since support for IRQ
bypass was added. E.g. if a routing change finished before irq_shutdown()
invoked kvm_arch_irq_bypass_del_producer(), VMX and SVM would see stale
routing information and potentially leave the irqfd in bypass mode.
Alternatively, x86 could be fixed by explicitly checking irq_bypass_vcpu
instead of irq_entry.type in kvm_arch_irq_bypass_del_producer(), and arm64
could be modified to utilize irq_bypass_vcpu in a similar manner. But (a)
that wouldn't fix the routing updates bug, and (b) fixing core code doesn't
preclude x86 (or arm64) from adding such code as a sanity check (spoiler
alert).
Fixes: f70c20aaf141 ("KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'")
Fixes: cb210737675e ("KVM: Pass new routing entries and irqfd when updating IRTEs")
Fixes: a0d7e2fc61ab ("KVM: arm64: vgic-v4: Only attempt vLPI mapping for actual MSIs")
Cc: stable@vger.kernel.org
Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>
Link: https://patch.msgid.link/20260113174606.104978-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
virt/kvm/eventfd.c | 44 ++++++++++++++++++++++++--------------------
1 file changed, 24 insertions(+), 20 deletions(-)
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -147,21 +147,28 @@ irqfd_shutdown(struct work_struct *work)
}
-/* assumes kvm->irqfds.lock is held */
-static bool
-irqfd_is_active(struct kvm_kernel_irqfd *irqfd)
+static bool irqfd_is_active(struct kvm_kernel_irqfd *irqfd)
{
+ /*
+ * Assert that either irqfds.lock or SRCU is held, as irqfds.lock must
+ * be held to prevent false positives (on the irqfd being active), and
+ * while false negatives are impossible as irqfds are never added back
+ * to the list once they're deactivated, the caller must at least hold
+ * SRCU to guard against routing changes if the irqfd is deactivated.
+ */
+ lockdep_assert_once(lockdep_is_held(&irqfd->kvm->irqfds.lock) ||
+ srcu_read_lock_held(&irqfd->kvm->irq_srcu));
+
return list_empty(&irqfd->list) ? false : true;
}
/*
* Mark the irqfd as inactive and schedule it for removal
- *
- * assumes kvm->irqfds.lock is held
*/
-static void
-irqfd_deactivate(struct kvm_kernel_irqfd *irqfd)
+static void irqfd_deactivate(struct kvm_kernel_irqfd *irqfd)
{
+ lockdep_assert_held(&irqfd->kvm->irqfds.lock);
+
BUG_ON(!irqfd_is_active(irqfd));
list_del_init(&irqfd->list);
@@ -202,8 +209,15 @@ irqfd_wakeup(wait_queue_entry_t *wait, u
seq = read_seqcount_begin(&irqfd->irq_entry_sc);
irq = irqfd->irq_entry;
} while (read_seqcount_retry(&irqfd->irq_entry_sc, seq));
- /* An event has been signaled, inject an interrupt */
- if (kvm_arch_set_irq_inatomic(&irq, kvm,
+
+ /*
+ * An event has been signaled, inject an interrupt unless the
+ * irqfd is being deassigned (isn't active), in which case the
+ * routing information may be stale (once the irqfd is removed
+ * from the list, it will stop receiving routing updates).
+ */
+ if (unlikely(!irqfd_is_active(irqfd)) ||
+ kvm_arch_set_irq_inatomic(&irq, kvm,
KVM_USERSPACE_IRQ_SOURCE_ID, 1,
false) == -EWOULDBLOCK)
schedule_work(&irqfd->inject);
@@ -541,18 +555,8 @@ kvm_irqfd_deassign(struct kvm *kvm, stru
spin_lock_irq(&kvm->irqfds.lock);
list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
- if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
- /*
- * This clearing of irq_entry.type is needed for when
- * another thread calls kvm_irq_routing_update before
- * we flush workqueue below (we synchronize with
- * kvm_irq_routing_update using irqfds.lock).
- */
- write_seqcount_begin(&irqfd->irq_entry_sc);
- irqfd->irq_entry.type = 0;
- write_seqcount_end(&irqfd->irq_entry_sc);
+ if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi)
irqfd_deactivate(irqfd);
- }
}
spin_unlock_irq(&kvm->irqfds.lock);
next prev parent reply other threads:[~2026-02-09 14:53 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-09 14:23 [PATCH 5.15 00/75] 5.15.200-rc1 review Greg Kroah-Hartman
2026-02-09 14:23 ` [PATCH 5.15 01/75] x86/kfence: fix booting on 32bit non-PAE systems Greg Kroah-Hartman
2026-02-09 14:23 ` [PATCH 5.15 02/75] platform/x86: intel_telemetry: Fix swapped arrays in PSS output Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 03/75] rbd: check for EOD after exclusive lock is ensured to be held Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 04/75] ARM: 9468/1: fix memset64() on big-endian Greg Kroah-Hartman
2026-02-09 14:24 ` Greg Kroah-Hartman [this message]
2026-02-09 14:24 ` [PATCH 5.15 06/75] mm/kfence: randomize the freelist on initialization Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 07/75] netfilter: nft_set_pipapo: clamp maximum map bucket size to INT_MAX Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 08/75] Documentation: Remove bogus claim about del_timer_sync() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 09/75] ARM: spear: Do not use timer namespace for timer_shutdown() function Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 10/75] clocksource/drivers/arm_arch_timer: " Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 11/75] clocksource/drivers/sp804: " Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 12/75] timers: Get rid of del_singleshot_timer_sync() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 13/75] timers: Replace BUG_ON()s Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 14/75] timers: Rename del_timer() to timer_delete() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 15/75] Documentation: Replace del_timer/del_timer_sync() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 16/75] timers: Silently ignore timers with a NULL function Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 17/75] timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 18/75] timers: Add shutdown mechanism to the internal functions Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 19/75] timers: Provide timer_shutdown[_sync]() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 20/75] timers: Update the documentation to reflect on the new timer_shutdown() API Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 21/75] Bluetooth: hci_qca: Fix the teardown problem for real Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 22/75] timers: Fix NULL function pointer race in timer_shutdown_sync() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 23/75] binderfs: fix ida_alloc_max() upper bound Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 24/75] wifi: mac80211: ocb: skip rx_no_sta when interface is not joined Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 25/75] wifi: wlcore: ensure skb headroom before skb_push Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 26/75] net: usb: sr9700: support devices with virtual driver CD Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 27/75] block,bfq: fix aux stat accumulation destination Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 28/75] smb/server: call ksmbd_session_rpc_close() on error path in create_smb2_pipe() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 29/75] HID: multitouch: add MT_QUIRK_STICKY_FINGERS to MT_CLS_VTL Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 30/75] HID: intel-ish-hid: Reset enum_devices_done before enumeration Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 31/75] HID: playstation: Center initial joystick axes to prevent spurious events Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 32/75] ALSA: hda/realtek: add HP Laptop 15s-eq1xxx mute LED quirk Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 33/75] netfilter: replace -EEXIST with -EBUSY Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 34/75] HID: quirks: Add another Chicony HP 5MP Cameras to hid_ignore_list Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 35/75] HID: Apply quirk HID_QUIRK_ALWAYS_POLL to Edifier QR30 (2d99:a101) Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 36/75] ring-buffer: Avoid softlockup in ring_buffer_resize() during memory free Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 37/75] wifi: mac80211: collect station statistics earlier when disconnect Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 38/75] ASoC: davinci-evm: Fix reference leak in davinci_evm_probe Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 39/75] ASoC: tlv320adcx140: Propagate error codes during probe Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 40/75] wifi: cfg80211: Fix bitrate calculation overflow for HE rates Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 41/75] scsi: target: iscsi: Fix use-after-free in iscsit_dec_session_usage_count() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 42/75] scsi: target: iscsi: Fix use-after-free in iscsit_dec_conn_usage_count() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 43/75] wifi: mac80211: dont increment crypto_tx_tailroom_needed_cnt twice Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 44/75] platform/x86: toshiba_haps: Fix memory leaks in add/remove routines Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 45/75] platform/x86: intel_telemetry: Fix PSS event register mask Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 46/75] dpaa2-switch: prevent ZERO_SIZE_PTR dereference when num_ifs is zero Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 47/75] net: liquidio: Initialize netdev pointer before queue setup Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 48/75] net: liquidio: Fix off-by-one error in PF setup_nic_devices() cleanup Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 49/75] net: liquidio: Fix off-by-one error in VF " Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 50/75] dpaa2-switch: add bounds check for if_id in IRQ handler Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 51/75] macvlan: fix error recovery in macvlan_common_newlink() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 52/75] tipc: use kfree_sensitive() for session key material Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 53/75] hwmon: (occ) Mark occ_init_attribute() as __printf Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 54/75] netfilter: nf_tables: fix inverted genmask check in nft_map_catchall_activate() Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 55/75] nvmet-tcp: add an helper to free the cmd buffers Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 56/75] nvmet-tcp: fix memory leak when performing a controller reset Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 57/75] nvmet-tcp: fix regression in data_digest calculation Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 58/75] nvmet-tcp: dont map pages which cant come from HIGHMEM Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 59/75] nvmet-tcp: add bounds checks in nvmet_tcp_build_pdu_iovec Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 60/75] ASoC: amd: fix memory leak in acp3x pdm dma ops Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 61/75] riscv: uprobes: Add missing fence.i after building the XOL buffer Greg Kroah-Hartman
2026-02-09 14:24 ` [PATCH 5.15 62/75] hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc() Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 63/75] gfs2: Fix NULL pointer dereference in gfs2_log_flush Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 64/75] tracing: Fix ftrace event field alignments Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 65/75] gve: Fix stats report corruption on queue count change Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 66/75] gve: Correct ethtool rx_dropped calculation Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 67/75] Bluetooth: hci_event: call disconnect callback before deleting conn Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 68/75] iommu: disable SVA when CONFIG_X86 is set Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 69/75] spi: tegra210-quad: Return IRQ_HANDLED when timeout already processed transfer Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 70/75] spi: tegra210-quad: Move curr_xfer read inside spinlock Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 71/75] spi: tegra210-quad: Protect curr_xfer assignment in tegra_qspi_setup_transfer_one Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 72/75] spi: tegra210-quad: Protect curr_xfer in tegra_qspi_combined_seq_xfer Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 73/75] spi: tegra210-quad: Protect curr_xfer clearing in tegra_qspi_non_combined_seq_xfer Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 74/75] spi: tegra: Fix a memory leak in tegra_slink_probe() Greg Kroah-Hartman
2026-02-09 14:25 ` [PATCH 5.15 75/75] nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page() Greg Kroah-Hartman
2026-02-09 18:16 ` [PATCH 5.15 00/75] 5.15.200-rc1 review Brett A C Sheffield
2026-02-09 18:49 ` Florian Fainelli
2026-02-09 20:47 ` Hardik Garg
2026-02-09 20:55 ` Jon Hunter
2026-02-10 10:07 ` Ron Economos
2026-02-11 11:34 ` Greg Kroah-Hartman
2026-02-10 13:24 ` Mark Brown
2026-02-11 7:27 ` Vijayendra Suman
2026-02-11 10:44 ` Jeffrin Thalakkottoor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260209142302.032848391@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=maz@kernel.org \
--cc=oupton@kernel.org \
--cc=patches@lists.linux.dev \
--cc=seanjc@google.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox