Linux s390 Architecture development
* [PATCH v5 0/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject
@ 2026-05-05 17:37 Douglas Freimuth
  2026-05-05 17:37 ` [PATCH v5 1/4] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Douglas Freimuth @ 2026-05-05 17:37 UTC (permalink / raw)
  To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
	kvm, linux-s390, linux-kernel
  Cc: mjrosato, freimuth

The s390 architecture needs this series of four patches to enable a
non-blocking path for irqfd injection via kvm_arch_set_irq_inatomic().
Before these changes, kvm_arch_set_irq_inatomic() simply returned
-EWOULDBLOCK and placed every interrupt on the global work queue, to be
processed later by a different thread. This series implements an s390
version of the inatomic path; it is relevant to virtio-blk and virtio-net
and was tested against both virtio-pci and virtio-ccw.

The inatomic fast path cannot give up control since it runs with
interrupts disabled. This required the following changes relative to
today's slow path. First, the adapter_indicators page must be mapped ahead
of time, since it is accessed with interrupts disabled, so map/unmap
functions were added. Second, access to resources shared between the fast
and slow paths had to be converted from mutexes and semaphores to raw
spinlocks, which also keeps the fast path safe on RT kernels. Finally,
while the slow path allocates memory with GFP_KERNEL_ACCOUNT, the fast
path must allocate with GFP_ATOMIC. Each of these changes is required to
prevent blocking on the fast inject path.

Statistical counters have been added to enable analysis of irq injection
on the fast and slow paths: io_390_inatomic, io_flic_inject_airq,
io_set_adapter_int and io_390_inatomic_adapter_masked. Further counters
track map/unmap of the adapter indicator pages in non-Secure Execution
environments and the fencing of Fast Inject in Secure Execution
environments. To take advantage of this kernel series with virtio-pci, a
QEMU that includes the 's390x/pci: set kvm_msi_via_irqfd_allowed' fix is
needed. Additionally, the guest XML needs an I/O thread pool with a thread
explicitly assigned to each disk device, using the usual libvirt iothread
configuration.
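For reference, a minimal libvirt domain XML fragment of the kind meant
above (thread count and iothread id are illustrative, not prescriptive):

```xml
<domain type='kvm'>
  <iothreads>2</iothreads>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' iothread='1'/>
      <source file='/var/lib/libvirt/images/guest.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
```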

Patch 1 adds map/unmap of the adapter indicator pages; for Secure
Execution environments it avoids the long-term mapping.

v4->v5: Replace get_map_page() with pin functions using FOLL_WRITE | FOLL_LONGTERM
v4->v5: Add a shared function for destroy_adapters, unmap_all_adapters_pv
v4->v5: On failed injection on the fast path, clear only the summary bit
v4->v5: If allocation fails in inatomic, clear the summary bit and take the slow path
v4->v5: Change fi->lock to a raw spinlock to handle the RT case

Douglas Freimuth (4):
  KVM: s390: Add map/unmap ioctl and clean mappings post-guest
  KVM: s390: Enable adapter_indicators_set to use mapped pages
  KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject

 arch/s390/include/asm/kvm_host.h |  13 +-
 arch/s390/kvm/intercept.c        |   4 +-
 arch/s390/kvm/interrupt.c        | 497 +++++++++++++++++++++++++------
 arch/s390/kvm/kvm-s390.c         |  31 +-
 arch/s390/kvm/kvm-s390.h         |   5 +-
 5 files changed, 444 insertions(+), 106 deletions(-)

-- 
2.52.0



* [PATCH v5 1/4] KVM: s390: Add map/unmap ioctl and clean mappings post-guest
  2026-05-05 17:37 [PATCH v5 0/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
@ 2026-05-05 17:37 ` Douglas Freimuth
  2026-05-05 17:37 ` [PATCH v5 2/4] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Douglas Freimuth @ 2026-05-05 17:37 UTC (permalink / raw)
  To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
	kvm, linux-s390, linux-kernel
  Cc: mjrosato, freimuth

s390 needs map/unmap ioctls, which map the adapter set indicator pages so
that they can be accessed while interrupts are disabled. The mappings are
cleaned up when the guest is removed.

The map/unmap ioctls are fenced in Secure Execution environments to avoid
long-term pinning; there, the path of execution that existed before this
patch is followed.

Statistical counters are added for the map/unmap functions of the adapter
indicator pages. They can be used to analyze map/unmap behavior in
non-Secure Execution environments; in Secure Execution environments the
counters stay at zero, since the adapter indicator pages are not mapped
there.

Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |   5 +
 arch/s390/kvm/interrupt.c        | 190 ++++++++++++++++++++++++++-----
 arch/s390/kvm/kvm-s390.c         |   8 ++
 arch/s390/kvm/kvm-s390.h         |   2 +
 4 files changed, 176 insertions(+), 29 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 8a4f4a39f7a2..fbb2406b31d2 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -448,6 +448,8 @@ struct kvm_vcpu_arch {
 struct kvm_vm_stat {
 	struct kvm_vm_stat_generic generic;
 	u64 inject_io;
+	u64 io_390_adapter_map;
+	u64 io_390_adapter_unmap;
 	u64 inject_float_mchk;
 	u64 inject_pfault_done;
 	u64 inject_service_signal;
@@ -479,6 +481,9 @@ struct s390_io_adapter {
 	bool masked;
 	bool swap;
 	bool suppressible;
+	raw_spinlock_t maps_lock;
+	struct list_head maps;
+	unsigned int nr_maps;
 };
 
 #define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 07f59c3b9a7b..a9b418996225 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2429,6 +2429,9 @@ static int register_io_adapter(struct kvm_device *dev,
 	if (!adapter)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&adapter->maps);
+	raw_spin_lock_init(&adapter->maps_lock);
+	adapter->nr_maps = 0;
 	adapter->id = adapter_info.id;
 	adapter->isc = adapter_info.isc;
 	adapter->maskable = adapter_info.maskable;
@@ -2453,12 +2456,151 @@ int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked)
 	return ret;
 }
 
+static struct page *pin_map_page(struct kvm *kvm, u64 uaddr,
+				 unsigned int gup_flags)
+{
+	struct mm_struct *mm = kvm->mm;
+	struct page *page = NULL;
+	int locked = 1;
+
+	if (mmget_not_zero(mm)) {
+		mmap_read_lock(mm);
+		pin_user_pages_remote(mm, uaddr, 1, FOLL_WRITE | gup_flags,
+				      &page, &locked);
+		if (locked)
+			mmap_read_unlock(mm);
+		mmput(mm);
+	}
+
+	return page;
+}
+
+static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
+{
+	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+	struct s390_map_info *map;
+	unsigned long flags;
+	__u64 host_addr;
+	int ret, idx;
+
+	if (!adapter || !addr)
+		return -EINVAL;
+
+	map = kzalloc(sizeof(*map), GFP_KERNEL_ACCOUNT);
+	if (!map)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&map->list);
+	idx = srcu_read_lock(&kvm->srcu);
+	host_addr = gpa_to_hva(kvm, addr);
+	if (kvm_is_error_hva(host_addr)) {
+		srcu_read_unlock(&kvm->srcu, idx);
+		ret = -EFAULT;
+		goto out;
+	}
+	srcu_read_unlock(&kvm->srcu, idx);
+	map->guest_addr = addr;
+	map->addr = host_addr;
+	map->page = pin_map_page(kvm, host_addr, FOLL_LONGTERM);
+	if (!map->page) {
+		ret = -EINVAL;
+		goto out;
+	}
+	raw_spin_lock_irqsave(&adapter->maps_lock, flags);
+	if (adapter->nr_maps < MAX_S390_ADAPTER_MAPS) {
+		list_add_tail(&map->list, &adapter->maps);
+		adapter->nr_maps++;
+		ret = 0;
+	} else {
+		ret = -EINVAL;
+	}
+	raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
+	if (ret)
+		unpin_user_page(map->page);
+out:
+	if (ret)
+		kfree(map);
+	return ret;
+}
+
+static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
+{
+	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+	struct s390_map_info *map, *tmp, *map_to_free = NULL;
+	struct page *map_page_to_put = NULL;
+	u64 map_addr_to_mark = 0;
+	unsigned long flags;
+	int found = 0, idx;
+
+	if (!adapter || !addr)
+		return -EINVAL;
+
+	raw_spin_lock_irqsave(&adapter->maps_lock, flags);
+	list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
+		if (map->guest_addr == addr) {
+			found = 1;
+			adapter->nr_maps--;
+			list_del(&map->list);
+			map_page_to_put = map->page;
+			map_addr_to_mark = map->guest_addr;
+			map_to_free = map;
+			break;
+		}
+	}
+	raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
+
+	if (found) {
+		kfree(map_to_free);
+		idx = srcu_read_lock(&kvm->srcu);
+		mark_page_dirty(kvm, map_addr_to_mark >> PAGE_SHIFT);
+		set_page_dirty_lock(map_page_to_put);
+		srcu_read_unlock(&kvm->srcu, idx);
+		unpin_user_page(map_page_to_put);
+	}
+
+	return found ? 0 : -ENOENT;
+}
+
+void kvm_s390_unmap_all_adapters(struct kvm *kvm)
+{
+	struct s390_map_info *map, *tmp;
+	unsigned long flags;
+	int i, idx;
+
+	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
+		struct s390_io_adapter *adapter = kvm->arch.adapters[i];
+		LIST_HEAD(local_list);
+
+		if (!adapter)
+			continue;
+
+		raw_spin_lock_irqsave(&adapter->maps_lock, flags);
+		list_splice_init(&adapter->maps, &local_list);
+		adapter->nr_maps = 0;
+		raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
+
+		list_for_each_entry_safe(map, tmp, &local_list, list) {
+			list_del(&map->list);
+			idx = srcu_read_lock(&kvm->srcu);
+			mark_page_dirty(kvm, map->guest_addr >> PAGE_SHIFT);
+			set_page_dirty_lock(map->page);
+			srcu_read_unlock(&kvm->srcu, idx);
+			unpin_user_page(map->page);
+			kfree(map);
+		}
+	}
+}
+
 void kvm_s390_destroy_adapters(struct kvm *kvm)
 {
 	int i;
 
-	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++)
+	kvm_s390_unmap_all_adapters(kvm);
+
+	for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
 		kfree(kvm->arch.adapters[i]);
+		kvm->arch.adapters[i] = NULL;
+	}
 }
 
 static int modify_io_adapter(struct kvm_device *dev,
@@ -2480,14 +2622,22 @@ static int modify_io_adapter(struct kvm_device *dev,
 		if (ret > 0)
 			ret = 0;
 		break;
-	/*
-	 * The following operations are no longer needed and therefore no-ops.
-	 * The gpa to hva translation is done when an IRQ route is set up. The
-	 * set_irq code uses get_user_pages_remote() to do the actual write.
-	 */
 	case KVM_S390_IO_ADAPTER_MAP:
 	case KVM_S390_IO_ADAPTER_UNMAP:
-		ret = 0;
+		/* If in Secure Execution mode do not long term pin. */
+		mutex_lock(&dev->kvm->lock);
+		if (kvm_s390_pv_is_protected(dev->kvm)) {
+			mutex_unlock(&dev->kvm->lock);
+			return 0;
+		}
+		if (req.type == KVM_S390_IO_ADAPTER_MAP) {
+			dev->kvm->stat.io_390_adapter_map++;
+			ret = kvm_s390_adapter_map(dev->kvm, req.id, req.addr);
+		} else {
+			dev->kvm->stat.io_390_adapter_unmap++;
+			ret = kvm_s390_adapter_unmap(dev->kvm, req.id, req.addr);
+		}
+		mutex_unlock(&dev->kvm->lock);
 		break;
 	default:
 		ret = -EINVAL;
@@ -2733,24 +2883,6 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
 	return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
 }
 
-static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
-{
-	struct mm_struct *mm = kvm->mm;
-	struct page *page = NULL;
-	int locked = 1;
-
-	if (mmget_not_zero(mm)) {
-		mmap_read_lock(mm);
-		get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
-				      &page, &locked);
-		if (locked)
-			mmap_read_unlock(mm);
-		mmput(mm);
-	}
-
-	return page;
-}
-
 static int adapter_indicators_set(struct kvm *kvm,
 				  struct s390_io_adapter *adapter,
 				  struct kvm_s390_adapter_int *adapter_int)
@@ -2760,10 +2892,10 @@ static int adapter_indicators_set(struct kvm *kvm,
 	struct page *ind_page, *summary_page;
 	void *map;
 
-	ind_page = get_map_page(kvm, adapter_int->ind_addr);
+	ind_page = pin_map_page(kvm, adapter_int->ind_addr, 0);
 	if (!ind_page)
 		return -1;
-	summary_page = get_map_page(kvm, adapter_int->summary_addr);
+	summary_page = pin_map_page(kvm, adapter_int->summary_addr, 0);
 	if (!summary_page) {
 		put_page(ind_page);
 		return -1;
@@ -2784,8 +2916,8 @@ static int adapter_indicators_set(struct kvm *kvm,
 	set_page_dirty_lock(summary_page);
 	srcu_read_unlock(&kvm->srcu, idx);
 
-	put_page(ind_page);
-	put_page(summary_page);
+	unpin_user_page(ind_page);
+	unpin_user_page(summary_page);
 	return summary_set ? 0 : 1;
 }
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e09960c2e6ed..74f453f039a3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -68,6 +68,8 @@
 const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
 	STATS_DESC_COUNTER(VM, inject_io),
+	STATS_DESC_COUNTER(VM, io_390_adapter_map),
+	STATS_DESC_COUNTER(VM, io_390_adapter_unmap),
 	STATS_DESC_COUNTER(VM, inject_float_mchk),
 	STATS_DESC_COUNTER(VM, inject_pfault_done),
 	STATS_DESC_COUNTER(VM, inject_service_signal),
@@ -2497,6 +2499,11 @@ static int kvm_s390_pv_dmp(struct kvm *kvm, struct kvm_pv_cmd *cmd,
 	return r;
 }
 
+static void kvm_s390_unmap_all_adapters_pv(struct kvm *kvm)
+{
+	kvm_s390_unmap_all_adapters(kvm);
+}
+
 static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 {
 	const bool need_lock = (cmd->cmd != KVM_PV_ASYNC_CLEANUP_PERFORM);
@@ -2513,6 +2520,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
 		if (kvm_s390_pv_is_protected(kvm))
 			break;
 
+		kvm_s390_unmap_all_adapters_pv(kvm);
 		mmap_write_lock(kvm->mm);
 		/*
 		 * Disable creation of new THPs. Existing THPs can stay, they
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index dc0573b7aa4b..7ba885cb6bd1 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -560,6 +560,8 @@ void kvm_s390_gisa_disable(struct kvm *kvm);
 void kvm_s390_gisa_enable(struct kvm *kvm);
 int __init kvm_s390_gib_init(u8 nisc);
 void kvm_s390_gib_destroy(void);
+void kvm_s390_unmap_all_adapters(struct kvm *kvm);
+
 
 /* implemented in guestdbg.c */
 void kvm_s390_backup_guest_per_regs(struct kvm_vcpu *vcpu);
-- 
2.52.0



* [PATCH v5 2/4] KVM: s390: Enable adapter_indicators_set to use mapped pages
  2026-05-05 17:37 [PATCH v5 0/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
  2026-05-05 17:37 ` [PATCH v5 1/4] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
@ 2026-05-05 17:37 ` Douglas Freimuth
  2026-05-05 17:37 ` [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case Douglas Freimuth
  2026-05-05 17:37 ` [PATCH v5 4/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
  3 siblings, 0 replies; 13+ messages in thread
From: Douglas Freimuth @ 2026-05-05 17:37 UTC (permalink / raw)
  To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
	kvm, linux-s390, linux-kernel
  Cc: mjrosato, freimuth

The s390 adapter_indicators_set() function needs to be able to use mapped
pages so that work can be processed on a fast path while interrupts are
disabled. If the adapter indicator pages are not mapped, local mapping is
done on the slow path, as before this patch. Secure Execution
environments, for example, always take the local mapping path, as they
did prior to this patch.

Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
 arch/s390/kvm/interrupt.c | 94 ++++++++++++++++++++++++++++-----------
 1 file changed, 69 insertions(+), 25 deletions(-)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index a9b418996225..12d8d38c260d 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2883,41 +2883,85 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
 	return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
 }
 
+static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter,
+					  u64 addr)
+{
+	struct s390_map_info *map;
+
+	if (!adapter)
+		return NULL;
+
+	list_for_each_entry(map, &adapter->maps, list) {
+		if (map->addr == addr)
+			return map;
+	}
+	return NULL;
+}
+
 static int adapter_indicators_set(struct kvm *kvm,
 				  struct s390_io_adapter *adapter,
 				  struct kvm_s390_adapter_int *adapter_int)
 {
 	unsigned long bit;
 	int summary_set, idx;
-	struct page *ind_page, *summary_page;
+	struct s390_map_info *ind_info, *summary_info;
 	void *map;
+	struct page *ind_page, *summary_page;
+	unsigned long flags;
 
-	ind_page = pin_map_page(kvm, adapter_int->ind_addr, 0);
-	if (!ind_page)
-		return -1;
-	summary_page = pin_map_page(kvm, adapter_int->summary_addr, 0);
-	if (!summary_page) {
-		put_page(ind_page);
-		return -1;
+	raw_spin_lock_irqsave(&adapter->maps_lock, flags);
+	ind_info = get_map_info(adapter, adapter_int->ind_addr);
+	if (!ind_info) {
+		raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
+		ind_page = pin_map_page(kvm, adapter_int->ind_addr, 0);
+		if (!ind_page)
+			return -1;
+		idx = srcu_read_lock(&kvm->srcu);
+		map = page_address(ind_page);
+		bit = get_ind_bit(adapter_int->ind_addr,
+				  adapter_int->ind_offset, adapter->swap);
+		set_bit(bit, map);
+		mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
+		set_page_dirty_lock(ind_page);
+		srcu_read_unlock(&kvm->srcu, idx);
+	} else {
+		map = page_address(ind_info->page);
+		bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
+		set_bit(bit, map);
+		raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
+	}
+	raw_spin_lock_irqsave(&adapter->maps_lock, flags);
+	summary_info = get_map_info(adapter, adapter_int->summary_addr);
+	if (!summary_info) {
+		raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
+		summary_page = pin_map_page(kvm, adapter_int->summary_addr, 0);
+		if (!summary_page) {
+			if (!ind_info) {
+				WARN_ON_ONCE(!ind_page);
+				unpin_user_page(ind_page);
+			}
+			return -1;
+		}
+		idx = srcu_read_lock(&kvm->srcu);
+		map = page_address(summary_page);
+		bit = get_ind_bit(adapter_int->summary_addr,
+				  adapter_int->summary_offset, adapter->swap);
+		summary_set = test_and_set_bit(bit, map);
+		mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
+		set_page_dirty_lock(summary_page);
+		srcu_read_unlock(&kvm->srcu, idx);
+	} else {
+		map = page_address(summary_info->page);
+		bit = get_ind_bit(summary_info->addr, adapter_int->summary_offset,
+				  adapter->swap);
+		summary_set = test_and_set_bit(bit, map);
+		raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
 	}
 
-	idx = srcu_read_lock(&kvm->srcu);
-	map = page_address(ind_page);
-	bit = get_ind_bit(adapter_int->ind_addr,
-			  adapter_int->ind_offset, adapter->swap);
-	set_bit(bit, map);
-	mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
-	set_page_dirty_lock(ind_page);
-	map = page_address(summary_page);
-	bit = get_ind_bit(adapter_int->summary_addr,
-			  adapter_int->summary_offset, adapter->swap);
-	summary_set = test_and_set_bit(bit, map);
-	mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
-	set_page_dirty_lock(summary_page);
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	unpin_user_page(ind_page);
-	unpin_user_page(summary_page);
+	if (!ind_info)
+		unpin_user_page(ind_page);
+	if (!summary_info)
+		unpin_user_page(summary_page);
 	return summary_set ? 0 : 1;
 }
 
-- 
2.52.0



* [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-05 17:37 [PATCH v5 0/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
  2026-05-05 17:37 ` [PATCH v5 1/4] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
  2026-05-05 17:37 ` [PATCH v5 2/4] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
@ 2026-05-05 17:37 ` Douglas Freimuth
  2026-05-06  4:57   ` Heiko Carstens
  2026-05-05 17:37 ` [PATCH v5 4/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
  3 siblings, 1 reply; 13+ messages in thread
From: Douglas Freimuth @ 2026-05-05 17:37 UTC (permalink / raw)
  To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
	kvm, linux-s390, linux-kernel
  Cc: mjrosato, freimuth

s390 needs to maintain support for an RT kernel. This requires changing
the floating interrupt lock, fi->lock, to a raw spinlock, since fi->lock
may be taken with interrupts disabled in __inject_io().

Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |  2 +-
 arch/s390/kvm/intercept.c        |  4 +-
 arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++----------------
 arch/s390/kvm/kvm-s390.c         |  2 +-
 4 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index fbb2406b31d2..9dd8a4986592 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -353,7 +353,7 @@ struct kvm_s390_local_interrupt {
 struct kvm_s390_float_interrupt {
 	unsigned long pending_irqs;
 	unsigned long masked_irqs;
-	spinlock_t lock;
+	raw_spinlock_t lock;
 	struct list_head lists[FIRQ_LIST_COUNT];
 	int counters[FIRQ_MAX_COUNT];
 	struct kvm_s390_mchk_info mchk;
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 39aff324203e..6e9ad58c0e90 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -518,7 +518,7 @@ static int handle_pv_sclp(struct kvm_vcpu *vcpu)
 {
 	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
 
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	/*
 	 * 2 cases:
 	 * a: an sccb answering interrupt was already pending or in flight.
@@ -534,7 +534,7 @@ static int handle_pv_sclp(struct kvm_vcpu *vcpu)
 	fi->srv_signal.ext_params |= 0x43000;
 	set_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
 	clear_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 	return 0;
 }
 
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 12d8d38c260d..49ccdeccc70c 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -625,7 +625,7 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
 	int deliver = 0;
 	int rc = 0;
 
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	spin_lock(&li->lock);
 	if (test_bit(IRQ_PEND_MCHK_EX, &li->pending_irqs) ||
 	    test_bit(IRQ_PEND_MCHK_REP, &li->pending_irqs)) {
@@ -654,7 +654,7 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
 		deliver = 1;
 	}
 	spin_unlock(&li->lock);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 
 	if (deliver) {
 		VCPU_EVENT(vcpu, 3, "deliver: machine check mcic 0x%llx",
@@ -942,10 +942,10 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
 	struct kvm_s390_ext_info ext;
 
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs) ||
 	    !(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
-		spin_unlock(&fi->lock);
+		raw_spin_unlock(&fi->lock);
 		return 0;
 	}
 	ext = fi->srv_signal;
@@ -954,7 +954,7 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
 	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
 	if (kvm_s390_pv_cpu_is_protected(vcpu))
 		set_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 
 	if (!ext.ext_params)
 		return 0;
@@ -973,16 +973,16 @@ static int __must_check __deliver_service_ev(struct kvm_vcpu *vcpu)
 	struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
 	struct kvm_s390_ext_info ext;
 
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	if (!(test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs))) {
-		spin_unlock(&fi->lock);
+		raw_spin_unlock(&fi->lock);
 		return 0;
 	}
 	ext = fi->srv_signal;
 	/* only clear the event bits */
 	fi->srv_signal.ext_params &= ~SCCB_EVENT_PENDING;
 	clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 
 	VCPU_EVENT(vcpu, 4, "%s", "deliver: sclp parameter event");
 	vcpu->stat.deliver_service_signal++;
@@ -998,7 +998,7 @@ static int __must_check __deliver_pfault_done(struct kvm_vcpu *vcpu)
 	struct kvm_s390_interrupt_info *inti;
 	int rc = 0;
 
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	inti = list_first_entry_or_null(&fi->lists[FIRQ_LIST_PFAULT],
 					struct kvm_s390_interrupt_info,
 					list);
@@ -1008,7 +1008,7 @@ static int __must_check __deliver_pfault_done(struct kvm_vcpu *vcpu)
 	}
 	if (list_empty(&fi->lists[FIRQ_LIST_PFAULT]))
 		clear_bit(IRQ_PEND_PFAULT_DONE, &fi->pending_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 
 	if (inti) {
 		trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
@@ -1040,7 +1040,7 @@ static int __must_check __deliver_virtio(struct kvm_vcpu *vcpu)
 	struct kvm_s390_interrupt_info *inti;
 	int rc = 0;
 
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	inti = list_first_entry_or_null(&fi->lists[FIRQ_LIST_VIRTIO],
 					struct kvm_s390_interrupt_info,
 					list);
@@ -1058,7 +1058,7 @@ static int __must_check __deliver_virtio(struct kvm_vcpu *vcpu)
 	}
 	if (list_empty(&fi->lists[FIRQ_LIST_VIRTIO]))
 		clear_bit(IRQ_PEND_VIRTIO, &fi->pending_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 
 	if (inti) {
 		rc  = put_guest_lc(vcpu, EXT_IRQ_CP_SERVICE,
@@ -1119,7 +1119,7 @@ static int __must_check __deliver_io(struct kvm_vcpu *vcpu,
 
 	fi = &vcpu->kvm->arch.float_int;
 
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	isc = irq_type_to_isc(irq_type);
 	isc_list = &fi->lists[isc];
 	inti = list_first_entry_or_null(isc_list,
@@ -1146,7 +1146,7 @@ static int __must_check __deliver_io(struct kvm_vcpu *vcpu,
 	}
 	if (list_empty(isc_list))
 		clear_bit(irq_type, &fi->pending_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 
 	if (inti) {
 		rc = __do_deliver_io(vcpu, &(inti->io));
@@ -1663,7 +1663,7 @@ static struct kvm_s390_interrupt_info *get_io_int(struct kvm *kvm,
 	u16 id = (schid & 0xffff0000U) >> 16;
 	u16 nr = schid & 0x0000ffffU;
 
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	list_for_each_entry(iter, isc_list, list) {
 		if (schid && (id != iter->io.subchannel_id ||
 			      nr != iter->io.subchannel_nr))
@@ -1673,10 +1673,10 @@ static struct kvm_s390_interrupt_info *get_io_int(struct kvm *kvm,
 		fi->counters[FIRQ_CNTR_IO] -= 1;
 		if (list_empty(isc_list))
 			clear_bit(isc_to_irq_type(isc), &fi->pending_irqs);
-		spin_unlock(&fi->lock);
+		raw_spin_unlock(&fi->lock);
 		return iter;
 	}
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 	return NULL;
 }
 
@@ -1771,7 +1771,7 @@ static int __inject_service(struct kvm *kvm,
 	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
 
 	kvm->stat.inject_service_signal++;
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	fi->srv_signal.ext_params |= inti->ext.ext_params & SCCB_EVENT_PENDING;
 
 	/* We always allow events, track them separately from the sccb ints */
@@ -1791,7 +1791,7 @@ static int __inject_service(struct kvm *kvm,
 	fi->srv_signal.ext_params |= inti->ext.ext_params & SCCB_MASK;
 	set_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
 out:
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 	kfree(inti);
 	return 0;
 }
@@ -1802,15 +1802,15 @@ static int __inject_virtio(struct kvm *kvm,
 	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
 
 	kvm->stat.inject_virtio++;
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	if (fi->counters[FIRQ_CNTR_VIRTIO] >= KVM_S390_MAX_VIRTIO_IRQS) {
-		spin_unlock(&fi->lock);
+		raw_spin_unlock(&fi->lock);
 		return -EBUSY;
 	}
 	fi->counters[FIRQ_CNTR_VIRTIO] += 1;
 	list_add_tail(&inti->list, &fi->lists[FIRQ_LIST_VIRTIO]);
 	set_bit(IRQ_PEND_VIRTIO, &fi->pending_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 	return 0;
 }
 
@@ -1820,16 +1820,16 @@ static int __inject_pfault_done(struct kvm *kvm,
 	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
 
 	kvm->stat.inject_pfault_done++;
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	if (fi->counters[FIRQ_CNTR_PFAULT] >=
 		(ASYNC_PF_PER_VCPU * KVM_MAX_VCPUS)) {
-		spin_unlock(&fi->lock);
+		raw_spin_unlock(&fi->lock);
 		return -EBUSY;
 	}
 	fi->counters[FIRQ_CNTR_PFAULT] += 1;
 	list_add_tail(&inti->list, &fi->lists[FIRQ_LIST_PFAULT]);
 	set_bit(IRQ_PEND_PFAULT_DONE, &fi->pending_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 	return 0;
 }
 
@@ -1840,11 +1840,11 @@ static int __inject_float_mchk(struct kvm *kvm,
 	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
 
 	kvm->stat.inject_float_mchk++;
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	fi->mchk.cr14 |= inti->mchk.cr14 & (1UL << CR_PENDING_SUBCLASS);
 	fi->mchk.mcic |= inti->mchk.mcic;
 	set_bit(IRQ_PEND_MCHK_REP, &fi->pending_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 	kfree(inti);
 	return 0;
 }
@@ -1873,9 +1873,9 @@ static int __inject_io(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
 	}
 
 	fi = &kvm->arch.float_int;
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	if (fi->counters[FIRQ_CNTR_IO] >= KVM_S390_MAX_FLOAT_IRQS) {
-		spin_unlock(&fi->lock);
+		raw_spin_unlock(&fi->lock);
 		return -EBUSY;
 	}
 	fi->counters[FIRQ_CNTR_IO] += 1;
@@ -1890,7 +1890,7 @@ static int __inject_io(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
 	list = &fi->lists[FIRQ_LIST_IO_ISC_0 + isc];
 	list_add_tail(&inti->list, list);
 	set_bit(isc_to_irq_type(isc), &fi->pending_irqs);
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 	return 0;
 }
 
@@ -2181,7 +2181,7 @@ void kvm_s390_clear_float_irqs(struct kvm *kvm)
 	if (!kvm_s390_pv_is_protected(kvm))
 		fi->masked_irqs = 0;
 	mutex_unlock(&kvm->lock);
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	fi->pending_irqs = 0;
 	memset(&fi->srv_signal, 0, sizeof(fi->srv_signal));
 	memset(&fi->mchk, 0, sizeof(fi->mchk));
@@ -2189,7 +2189,7 @@ void kvm_s390_clear_float_irqs(struct kvm *kvm)
 		clear_irq_list(&fi->lists[i]);
 	for (i = 0; i < FIRQ_MAX_COUNT; i++)
 		fi->counters[i] = 0;
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 	kvm_s390_gisa_clear(kvm);
 };
 
@@ -2235,7 +2235,7 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
 		}
 	}
 	fi = &kvm->arch.float_int;
-	spin_lock(&fi->lock);
+	raw_spin_lock(&fi->lock);
 	for (i = 0; i < FIRQ_LIST_COUNT; i++) {
 		list_for_each_entry(inti, &fi->lists[i], list) {
 			if (n == max_irqs) {
@@ -2272,7 +2272,7 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
 }
 
 out:
-	spin_unlock(&fi->lock);
+	raw_spin_unlock(&fi->lock);
 out_nolock:
 	if (!ret && n > 0) {
 		if (copy_to_user(usrbuf, buf, sizeof(struct kvm_s390_irq) * n))
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 74f453f039a3..d8011b6d6801 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3263,7 +3263,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	}
 
 	mutex_init(&kvm->arch.float_int.ais_lock);
-	spin_lock_init(&kvm->arch.float_int.lock);
+	raw_spin_lock_init(&kvm->arch.float_int.lock);
 	for (i = 0; i < FIRQ_LIST_COUNT; i++)
 		INIT_LIST_HEAD(&kvm->arch.float_int.lists[i]);
 	init_waitqueue_head(&kvm->arch.ipte_wq);
-- 
2.52.0



* [PATCH v5 4/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
  2026-05-05 17:37 [PATCH v5 0/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
                   ` (2 preceding siblings ...)
  2026-05-05 17:37 ` [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case Douglas Freimuth
@ 2026-05-05 17:37 ` Douglas Freimuth
  3 siblings, 0 replies; 13+ messages in thread
From: Douglas Freimuth @ 2026-05-05 17:37 UTC (permalink / raw)
  To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
	kvm, linux-s390, linux-kernel
  Cc: mjrosato, freimuth

S390 needs a fast path for irq injection, and to that end this patch
introduces kvm_arch_set_irq_inatomic(). Instead of placing every interrupt
on the global work queue, as is done today, interrupts can now be injected
directly on the fast path.

The inatomic fast path must not sleep, since it runs with interrupts
disabled. This required the following changes relative to today's slow
path. First, the adapter_indicators page must already be mapped, because it
is accessed with interrupts disabled, so map/unmap functions were added.
Second, the locks protecting resources shared between the fast and slow
paths were converted from mutexes and semaphores to raw_spin_lock's.
Finally, while the slow path allocates memory with GFP_KERNEL_ACCOUNT, the
fast path has to allocate with GFP_ATOMIC. Each of these changes is
required to prevent blocking on the fast inject path.

Fast inject is fenced in Secure Execution environments by not mapping the
adapter indicator pages; such guests continue to follow the path of
execution that existed before this patch.

Statistical counters have been added to enable analysis of irq injection on
the fast and slow paths: io_390_inatomic, io_flic_inject_airq,
io_set_adapter_int and io_390_inatomic_adapter_masked.

Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
 arch/s390/include/asm/kvm_host.h |   6 +-
 arch/s390/kvm/interrupt.c        | 163 +++++++++++++++++++++++++++----
 arch/s390/kvm/kvm-s390.c         |  21 +++-
 arch/s390/kvm/kvm-s390.h         |   3 +-
 4 files changed, 170 insertions(+), 23 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 9dd8a4986592..b485dee4c766 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -359,7 +359,7 @@ struct kvm_s390_float_interrupt {
 	struct kvm_s390_mchk_info mchk;
 	struct kvm_s390_ext_info srv_signal;
 	int last_sleep_cpu;
-	struct mutex ais_lock;
+	raw_spinlock_t ais_lock;
 	u8 simm;
 	u8 nimm;
 };
@@ -450,6 +450,10 @@ struct kvm_vm_stat {
 	u64 inject_io;
 	u64 io_390_adapter_map;
 	u64 io_390_adapter_unmap;
+	u64 io_390_inatomic;
+	u64 io_flic_inject_airq;
+	u64 io_set_adapter_int;
+	u64 io_390_inatomic_adapter_masked;
 	u64 inject_float_mchk;
 	u64 inject_pfault_done;
 	u64 inject_service_signal;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 49ccdeccc70c..1c79ad072fce 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -1966,15 +1966,10 @@ static int __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
 }
 
 int kvm_s390_inject_vm(struct kvm *kvm,
-		       struct kvm_s390_interrupt *s390int)
+		       struct kvm_s390_interrupt *s390int, struct kvm_s390_interrupt_info *inti)
 {
-	struct kvm_s390_interrupt_info *inti;
 	int rc;
 
-	inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
-	if (!inti)
-		return -ENOMEM;
-
 	inti->type = s390int->type;
 	switch (inti->type) {
 	case KVM_S390_INT_VIRTIO:
@@ -2010,6 +2005,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
 				 2);
 
 	rc = __inject_vm(kvm, inti);
+	/* inti was allocated by the caller; on failure we free it here */
 	if (rc)
 		kfree(inti);
 	return rc;
@@ -2287,6 +2283,7 @@ static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
 {
 	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
 	struct kvm_s390_ais_all ais;
+	unsigned long flags;
 
 	if (attr->attr < sizeof(ais))
 		return -EINVAL;
@@ -2294,10 +2291,10 @@ static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
 	if (!test_kvm_facility(kvm, 72))
 		return -EOPNOTSUPP;
 
-	mutex_lock(&fi->ais_lock);
+	raw_spin_lock_irqsave(&fi->ais_lock, flags);
 	ais.simm = fi->simm;
 	ais.nimm = fi->nimm;
-	mutex_unlock(&fi->ais_lock);
+	raw_spin_unlock_irqrestore(&fi->ais_lock, flags);
 
 	if (copy_to_user((void __user *)attr->addr, &ais, sizeof(ais)))
 		return -EFAULT;
@@ -2674,6 +2671,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
 	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
 	struct kvm_s390_ais_req req;
 	int ret = 0;
+	unsigned long flags;
 
 	if (!test_kvm_facility(kvm, 72))
 		return -EOPNOTSUPP;
@@ -2690,7 +2688,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
 				       2 : KVM_S390_AIS_MODE_SINGLE :
 				       KVM_S390_AIS_MODE_ALL, req.mode);
 
-	mutex_lock(&fi->ais_lock);
+	raw_spin_lock_irqsave(&fi->ais_lock, flags);
 	switch (req.mode) {
 	case KVM_S390_AIS_MODE_ALL:
 		fi->simm &= ~AIS_MODE_MASK(req.isc);
@@ -2703,7 +2701,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
 	default:
 		ret = -EINVAL;
 	}
-	mutex_unlock(&fi->ais_lock);
+	raw_spin_unlock_irqrestore(&fi->ais_lock, flags);
 
 	return ret;
 }
@@ -2717,25 +2715,33 @@ static int kvm_s390_inject_airq(struct kvm *kvm,
 		.parm = 0,
 		.parm64 = isc_to_int_word(adapter->isc),
 	};
+	struct kvm_s390_interrupt_info *inti;
+	unsigned long flags;
+
 	int ret = 0;
 
+	inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
+	if (!inti)
+		return -ENOMEM;
+
 	if (!test_kvm_facility(kvm, 72) || !adapter->suppressible)
-		return kvm_s390_inject_vm(kvm, &s390int);
+		return kvm_s390_inject_vm(kvm, &s390int, inti);
 
-	mutex_lock(&fi->ais_lock);
+	raw_spin_lock_irqsave(&fi->ais_lock, flags);
 	if (fi->nimm & AIS_MODE_MASK(adapter->isc)) {
 		trace_kvm_s390_airq_suppressed(adapter->id, adapter->isc);
+		kfree(inti);
 		goto out;
 	}
 
-	ret = kvm_s390_inject_vm(kvm, &s390int);
+	ret = kvm_s390_inject_vm(kvm, &s390int, inti);
 	if (!ret && (fi->simm & AIS_MODE_MASK(adapter->isc))) {
 		fi->nimm |= AIS_MODE_MASK(adapter->isc);
 		trace_kvm_s390_modify_ais_mode(adapter->isc,
 					       KVM_S390_AIS_MODE_SINGLE, 2);
 	}
 out:
-	mutex_unlock(&fi->ais_lock);
+	raw_spin_unlock_irqrestore(&fi->ais_lock, flags);
 	return ret;
 }
 
@@ -2744,6 +2750,8 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr)
 	unsigned int id = attr->attr;
 	struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
 
+	kvm->stat.io_flic_inject_airq++;
+
 	if (!adapter)
 		return -EINVAL;
 
@@ -2754,6 +2762,7 @@ static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
 {
 	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
 	struct kvm_s390_ais_all ais;
+	unsigned long flags;
 
 	if (!test_kvm_facility(kvm, 72))
 		return -EOPNOTSUPP;
@@ -2761,10 +2770,10 @@ static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
 	if (copy_from_user(&ais, (void __user *)attr->addr, sizeof(ais)))
 		return -EFAULT;
 
-	mutex_lock(&fi->ais_lock);
+	raw_spin_lock_irqsave(&fi->ais_lock, flags);
 	fi->simm = ais.simm;
 	fi->nimm = ais.nimm;
-	mutex_unlock(&fi->ais_lock);
+	raw_spin_unlock_irqrestore(&fi->ais_lock, flags);
 
 	return 0;
 }
@@ -2930,6 +2939,7 @@ static int adapter_indicators_set(struct kvm *kvm,
 		set_bit(bit, map);
 		raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
 	}
+
 	raw_spin_lock_irqsave(&adapter->maps_lock, flags);
 	summary_info = get_map_info(adapter, adapter_int->summary_addr);
 	if (!summary_info) {
@@ -2965,6 +2975,44 @@ static int adapter_indicators_set(struct kvm *kvm,
 	return summary_set ? 0 : 1;
 }
 
+static int adapter_indicators_set_fast(struct kvm *kvm,
+				       struct s390_io_adapter *adapter,
+				       struct kvm_s390_adapter_int *adapter_int,
+				       int setbit)
+{
+	unsigned long bit;
+	int summary_set;
+	struct s390_map_info *ind_info, *summary_info;
+	void *map;
+
+	raw_spin_lock(&adapter->maps_lock);
+	ind_info = get_map_info(adapter, adapter_int->ind_addr);
+	if (!ind_info) {
+		raw_spin_unlock(&adapter->maps_lock);
+		return -EWOULDBLOCK;
+	}
+	map = page_address(ind_info->page);
+	bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
+	if (setbit)
+		set_bit(bit, map);
+	summary_info = get_map_info(adapter, adapter_int->summary_addr);
+	if (!summary_info) {
+		raw_spin_unlock(&adapter->maps_lock);
+		return -EWOULDBLOCK;
+	}
+	map = page_address(summary_info->page);
+	bit = get_ind_bit(summary_info->addr, adapter_int->summary_offset,
+			  adapter->swap);
+	/* If setbit then set the summary bit; otherwise we are falling back */
+	/* to the slow path, so clear the summary bit and let it re-inject */
+	if (setbit)
+		summary_set = test_and_set_bit(bit, map);
+	else
+		summary_set = test_and_clear_bit(bit, map);
+	raw_spin_unlock(&adapter->maps_lock);
+	return summary_set ? 0 : 1;
+}
+
 /*
  * < 0 - not injected due to error
  * = 0 - coalesced, summary indicator already active
@@ -2977,6 +3025,8 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
 	int ret;
 	struct s390_io_adapter *adapter;
 
+	kvm->stat.io_set_adapter_int++;
+
 	/* We're only interested in the 0->1 transition. */
 	if (!level)
 		return 0;
@@ -3045,7 +3095,6 @@ int kvm_set_routing_entry(struct kvm *kvm,
 	int idx;
 
 	switch (ue->type) {
-	/* we store the userspace addresses instead of the guest addresses */
 	case KVM_IRQ_ROUTING_S390_ADAPTER:
 		if (kvm_is_ucontrol(kvm))
 			return -EINVAL;
@@ -3636,3 +3685,83 @@ int __init kvm_s390_gib_init(u8 nisc)
 out:
 	return rc;
 }
+
+/*
+ * kvm_arch_set_irq_inatomic: fast-path for irqfd injection
+ */
+int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
+			      struct kvm *kvm, int irq_source_id, int level,
+			      bool line_status)
+{
+	int ret, setbit;
+	struct s390_io_adapter *adapter;
+	struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+	struct kvm_s390_interrupt_info *inti;
+	struct kvm_s390_interrupt s390int = {
+			.type = KVM_S390_INT_IO(1, 0, 0, 0),
+			.parm = 0,
+	};
+
+	kvm->stat.io_390_inatomic++;
+
+	/* We're only interested in the 0->1 transition. */
+	if (!level)
+		return -EWOULDBLOCK;
+	if (e->type != KVM_IRQ_ROUTING_S390_ADAPTER)
+		return -EWOULDBLOCK;
+
+	adapter = get_io_adapter(kvm, e->adapter.adapter_id);
+	if (!adapter)
+		return -EWOULDBLOCK;
+
+	s390int.parm64 = isc_to_int_word(adapter->isc);
+	setbit = 1;
+	ret = adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+	if (ret < 0)
+		return -EWOULDBLOCK;
+	if (!ret || adapter->masked) {
+		kvm->stat.io_390_inatomic_adapter_masked++;
+		return 0;
+	}
+
+	inti = kzalloc_obj(*inti, GFP_ATOMIC);
+	if (!inti) {
+		setbit = 0;
+		adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+		return -EWOULDBLOCK;
+	}
+
+	if (!test_kvm_facility(kvm, 72) || !adapter->suppressible) {
+		ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+		if (ret == 0) {
+			return ret;
+		} else {
+			setbit = 0;
+			adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+			return -EWOULDBLOCK;
+		}
+	}
+
+	raw_spin_lock(&fi->ais_lock);
+	if (fi->nimm & AIS_MODE_MASK(adapter->isc)) {
+		trace_kvm_s390_airq_suppressed(adapter->id, adapter->isc);
+		kfree(inti);
+		goto out;
+	}
+
+	ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+	if (!ret && (fi->simm & AIS_MODE_MASK(adapter->isc))) {
+		fi->nimm |= AIS_MODE_MASK(adapter->isc);
+		trace_kvm_s390_modify_ais_mode(adapter->isc,
+					       KVM_S390_AIS_MODE_SINGLE, 2);
+	} else if (ret) {
+		raw_spin_unlock(&fi->ais_lock);
+		setbit = 0;
+		adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+		return -EWOULDBLOCK;
+	}
+
+out:
+	raw_spin_unlock(&fi->ais_lock);
+	return 0;
+}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d8011b6d6801..11b62fa8634f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -70,6 +70,10 @@ const struct kvm_stats_desc kvm_vm_stats_desc[] = {
 	STATS_DESC_COUNTER(VM, inject_io),
 	STATS_DESC_COUNTER(VM, io_390_adapter_map),
 	STATS_DESC_COUNTER(VM, io_390_adapter_unmap),
+	STATS_DESC_COUNTER(VM, io_390_inatomic),
+	STATS_DESC_COUNTER(VM, io_flic_inject_airq),
+	STATS_DESC_COUNTER(VM, io_set_adapter_int),
+	STATS_DESC_COUNTER(VM, io_390_inatomic_adapter_masked),
 	STATS_DESC_COUNTER(VM, inject_float_mchk),
 	STATS_DESC_COUNTER(VM, inject_pfault_done),
 	STATS_DESC_COUNTER(VM, inject_service_signal),
@@ -2856,6 +2860,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 	void __user *argp = (void __user *)arg;
 	struct kvm_device_attr attr;
 	int r;
+	struct kvm_s390_interrupt_info *inti;
 
 	switch (ioctl) {
 	case KVM_S390_INTERRUPT: {
@@ -2864,7 +2869,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		r = -EFAULT;
 		if (copy_from_user(&s390int, argp, sizeof(s390int)))
 			break;
-		r = kvm_s390_inject_vm(kvm, &s390int);
+		inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
+		if (!inti)
+			return -ENOMEM;
+		r = kvm_s390_inject_vm(kvm, &s390int, inti);
 		break;
 	}
 	case KVM_CREATE_IRQCHIP: {
@@ -3262,7 +3270,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 		mutex_unlock(&kvm->lock);
 	}
 
-	mutex_init(&kvm->arch.float_int.ais_lock);
+	raw_spin_lock_init(&kvm->arch.float_int.ais_lock);
 	raw_spin_lock_init(&kvm->arch.float_int.lock);
 	for (i = 0; i < FIRQ_LIST_COUNT; i++)
 		INIT_LIST_HEAD(&kvm->arch.float_int.lists[i]);
@@ -4384,19 +4392,24 @@ int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clo
 }
 
 static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
-				      unsigned long token)
+				     unsigned long token)
 {
 	struct kvm_s390_interrupt inti;
 	struct kvm_s390_irq irq;
+	struct kvm_s390_interrupt_info *inti_mem = NULL;
 
 	if (start_token) {
 		irq.u.ext.ext_params2 = token;
 		irq.type = KVM_S390_INT_PFAULT_INIT;
 		WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &irq));
 	} else {
+		inti_mem = kzalloc_obj(*inti_mem, GFP_KERNEL_ACCOUNT);
+		if (WARN_ON_ONCE(!inti_mem))
+			return;
+
 		inti.type = KVM_S390_INT_PFAULT_DONE;
 		inti.parm64 = token;
-		WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti));
+		WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti, inti_mem));
 	}
 }
 
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 7ba885cb6bd1..6d2842fb71a3 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -376,7 +376,8 @@ int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
 void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu);
 void kvm_s390_clear_float_irqs(struct kvm *kvm);
 int __must_check kvm_s390_inject_vm(struct kvm *kvm,
-				    struct kvm_s390_interrupt *s390int);
+				    struct kvm_s390_interrupt *s390int,
+				    struct kvm_s390_interrupt_info *inti);
 int __must_check kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
 				      struct kvm_s390_irq *irq);
 static inline int kvm_s390_inject_prog_irq(struct kvm_vcpu *vcpu,
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-05 17:37 ` [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case Douglas Freimuth
@ 2026-05-06  4:57   ` Heiko Carstens
  2026-05-06 14:50     ` Douglas Freimuth
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2026-05-06  4:57 UTC (permalink / raw)
  To: Douglas Freimuth
  Cc: borntraeger, imbrenda, frankja, david, gor, agordeev, svens, kvm,
	linux-s390, linux-kernel, mjrosato

On Tue, May 05, 2026 at 07:37:27PM +0200, Douglas Freimuth wrote:
> s390 needs to maintain support for an RT kernel. This requires the
> floating interrupt lock, fi->lock to be changed to a raw spin lock 
> since the fi->lock maybe called with interrupts disabled in __inject_io.
> 
> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
> ---
>  arch/s390/include/asm/kvm_host.h |  2 +-
>  arch/s390/kvm/intercept.c        |  4 +-
>  arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++----------------
>  arch/s390/kvm/kvm-s390.c         |  2 +-
>  4 files changed, 38 insertions(+), 38 deletions(-)

s390 does not support RT, but I guess you are referring to a lockdep splat
which you would see without this change, similar to what we have seen in
other places.

Can you include the relevant parts of the splat for reference, please?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-06  4:57   ` Heiko Carstens
@ 2026-05-06 14:50     ` Douglas Freimuth
  2026-05-07  9:56       ` Heiko Carstens
  0 siblings, 1 reply; 13+ messages in thread
From: Douglas Freimuth @ 2026-05-06 14:50 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: borntraeger, imbrenda, frankja, david, gor, agordeev, svens, kvm,
	linux-s390, linux-kernel, mjrosato



On 5/6/26 12:57 AM, Heiko Carstens wrote:
> On Tue, May 05, 2026 at 07:37:27PM +0200, Douglas Freimuth wrote:
>> s390 needs to maintain support for an RT kernel. This requires the
>> floating interrupt lock, fi->lock to be changed to a raw spin lock
>> since the fi->lock maybe called with interrupts disabled in __inject_io.
>>
>> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
>> ---
>>   arch/s390/include/asm/kvm_host.h |  2 +-
>>   arch/s390/kvm/intercept.c        |  4 +-
>>   arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++----------------
>>   arch/s390/kvm/kvm-s390.c         |  2 +-
>>   4 files changed, 38 insertions(+), 38 deletions(-)
> 
> s390 does not support RT, but I guess you are referring to a lockdep splat
> which you would see without doing this change, similar like we have seen at
> other places.
> 
> Can you include the relevant parts of the splat for reference, please?

Heiko, thank you for your response. I don't recall trapping it with
lockdep (while it was enabled), but discussion on the mailing list about
an earlier version made us look closer (and the AI models that reviewed
the patch flagged it as well). It appears that while RT isn't supported
it can still be compiled into the kernel, so we wanted to mitigate the
issues we would otherwise add if someone does that, while not impacting
non-RT environments, the main use case.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-06 14:50     ` Douglas Freimuth
@ 2026-05-07  9:56       ` Heiko Carstens
  2026-05-07 13:17         ` Matthew Rosato
  0 siblings, 1 reply; 13+ messages in thread
From: Heiko Carstens @ 2026-05-07  9:56 UTC (permalink / raw)
  To: Douglas Freimuth
  Cc: borntraeger, imbrenda, frankja, david, gor, agordeev, svens, kvm,
	linux-s390, linux-kernel, mjrosato

On Wed, May 06, 2026 at 10:50:52AM -0400, Douglas Freimuth wrote:
> On 5/6/26 12:57 AM, Heiko Carstens wrote:
> > On Tue, May 05, 2026 at 07:37:27PM +0200, Douglas Freimuth wrote:
> > > s390 needs to maintain support for an RT kernel. This requires the
> > > floating interrupt lock, fi->lock to be changed to a raw spin lock
> > > since the fi->lock maybe called with interrupts disabled in __inject_io.
> > > 
> > > Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
> > > ---
> > >   arch/s390/include/asm/kvm_host.h |  2 +-
> > >   arch/s390/kvm/intercept.c        |  4 +-
> > >   arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++----------------
> > >   arch/s390/kvm/kvm-s390.c         |  2 +-
> > >   4 files changed, 38 insertions(+), 38 deletions(-)
> > 
> > s390 does not support RT, but I guess you are referring to a lockdep splat
> > which you would see without doing this change, similar like we have seen at
> > other places.
> > 
> > Can you include the relevant parts of the splat for reference, please?
> 
> Heiko, thank you for you response. I dont recall trapping it with lockdep
> (while it was on) but discussion on the mailing list in an earlier version
> made us look closer (and we saw it across the AI models that reviewed the
> patch.) It appears that while RT isn't supported it can still be compiled in
> to the kernel so we wanted to mitigate the issues we would add to if someone
> does that while not impacting non-RT environments, the main use case.

RT support cannot be compiled in for s390, because of the missing
"select ARCH_SUPPORTS_RT"; however, you can still enable lockdep checks
for raw_spinlock vs spinlock nesting, which seems to be what this is about?

See PROVE_RAW_LOCK_NESTING config option for a more detailed description.

Therefore my question about a lockdep splat. However, I don't see why
using spin_lock() instead of raw_spin_lock() alone in irq-disabled
context could be problematic. On the other hand, this patch does
introduce a

      raw_spin_lock();
      spin_lock();
      spin_unlock();
      raw_spin_unlock();

sequence in __deliver_machine_check() which seems to be incorrect and
indeed should generate a lockdep splat iff PROVE_RAW_LOCK_NESTING is
enabled.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-07  9:56       ` Heiko Carstens
@ 2026-05-07 13:17         ` Matthew Rosato
  2026-05-07 14:45           ` Heiko Carstens
  0 siblings, 1 reply; 13+ messages in thread
From: Matthew Rosato @ 2026-05-07 13:17 UTC (permalink / raw)
  To: Heiko Carstens, Douglas Freimuth
  Cc: borntraeger, imbrenda, frankja, david, gor, agordeev, svens, kvm,
	linux-s390, linux-kernel

On 5/7/26 5:56 AM, Heiko Carstens wrote:
> On Wed, May 06, 2026 at 10:50:52AM -0400, Douglas Freimuth wrote:
>> On 5/6/26 12:57 AM, Heiko Carstens wrote:
>>> On Tue, May 05, 2026 at 07:37:27PM +0200, Douglas Freimuth wrote:
>>>> s390 needs to maintain support for an RT kernel. This requires the
>>>> floating interrupt lock, fi->lock to be changed to a raw spin lock
>>>> since the fi->lock maybe called with interrupts disabled in __inject_io.
>>>>
>>>> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
>>>> ---
>>>>   arch/s390/include/asm/kvm_host.h |  2 +-
>>>>   arch/s390/kvm/intercept.c        |  4 +-
>>>>   arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++----------------
>>>>   arch/s390/kvm/kvm-s390.c         |  2 +-
>>>>   4 files changed, 38 insertions(+), 38 deletions(-)
>>>
>>> s390 does not support RT, but I guess you are referring to a lockdep splat
>>> which you would see without doing this change, similar like we have seen at
>>> other places.
>>>
>>> Can you include the relevant parts of the splat for reference, please?
>>
>> Heiko, thank you for you response. I dont recall trapping it with lockdep
>> (while it was on) but discussion on the mailing list in an earlier version
>> made us look closer (and we saw it across the AI models that reviewed the
>> patch.) It appears that while RT isn't supported it can still be compiled in
>> to the kernel so we wanted to mitigate the issues we would add to if someone
>> does that while not impacting non-RT environments, the main use case.
> 
> RT support cannot be compiled in for s390, because of the missing
> "select ARCH_SUPPORTS_RT", however you can still enable lockdep checks
> for raw_spinlock vs spinlock nesting, which this seems to appear about?
> 
> See PROVE_RAW_LOCK_NESTING config option for a more detailed description.
> 
> Therefore my question about a lockdep splat. However I don't see why
> using spin_lock() instead of raw_spin_lock() alone in irq disabled
> context could be problematic. On the other hand this patch does

Hi Heiko,

AFAIU it is only problematic if we (s390) should ever want to support RT
in the future.

As the name implies, the point of kvm_arch_set_irq_inatomic() is to
inject the interrupt without the possibility of sleeping, or
alternatively recognizing the need to sleep and fall back to a queued
"slow path" that can safely sleep while delivering it.

My original thinking was 'well, it won't hurt to use the raw spinlocks
in the new code', so I sent Doug down this road with my review comments --
I did not consider that there would be a need for additional fallout
like this patch, which means an increased chance of regressions (see
below) to accommodate a feature that we don't support today.

If you are saying it's OK to simply not care about RT for s390 now, then
AFAICT it should be fine to just use s/raw_spin_lock/spin_lock/ for
this whole series, drop this patch and then ignore the subsequent
Sashiko complaints about RT.

What do you think?

> introduce a
> 
>       raw_spin_lock();
>       spin_lock();
>       spin_unlock();
>       raw_spin_unlock();
> 
> sequence in __deliver_machine_check() which seems to be incorrect and
> indeed should generate a lockdep splat iff PROVE_RAW_LOCK_NESTING is
> enabled.

+1
Doug, I know you've run with lockdep enabled before on this series --
please make sure to test with lockdep for next version

Thanks,
Matt




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-07 13:17         ` Matthew Rosato
@ 2026-05-07 14:45           ` Heiko Carstens
  2026-05-07 14:49             ` Peter Zijlstra
  2026-05-08  2:46             ` Douglas Freimuth
  0 siblings, 2 replies; 13+ messages in thread
From: Heiko Carstens @ 2026-05-07 14:45 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: Douglas Freimuth, borntraeger, imbrenda, frankja, david, gor,
	agordeev, svens, kvm, linux-s390, linux-kernel, Peter Zijlstra

Adding Peter :)

On Thu, May 07, 2026 at 09:17:00AM -0400, Matthew Rosato wrote:
> On 5/7/26 5:56 AM, Heiko Carstens wrote:
> > On Wed, May 06, 2026 at 10:50:52AM -0400, Douglas Freimuth wrote:
> >> On 5/6/26 12:57 AM, Heiko Carstens wrote:
> >>> On Tue, May 05, 2026 at 07:37:27PM +0200, Douglas Freimuth wrote:
> >>>> s390 needs to maintain support for an RT kernel. This requires the
> >>>> floating interrupt lock, fi->lock to be changed to a raw spin lock
> >>>> since the fi->lock maybe called with interrupts disabled in __inject_io.
> >>>>
> >>>> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
> >>>> ---
> >>>>   arch/s390/include/asm/kvm_host.h |  2 +-
> >>>>   arch/s390/kvm/intercept.c        |  4 +-
> >>>>   arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++----------------
> >>>>   arch/s390/kvm/kvm-s390.c         |  2 +-
> >>>>   4 files changed, 38 insertions(+), 38 deletions(-)
> >>>
> >>> s390 does not support RT, but I guess you are referring to a lockdep splat
> >>> which you would see without doing this change, similar like we have seen at
> >>> other places.
> >>>
> >>> Can you include the relevant parts of the splat for reference, please?

...

> AFAIU it is only problematic if we (s390) should ever want to support RT
> in the future.

I don't see that coming, but nobody knows what happens in future.

...

> My original thinking was 'well, it won't hurt to use the raw spinlocks
> in the new code' so I set Doug down this road with my review comments --
> I did not consider that there would be a need for additional fallout
> like this patch, which means increased chance of regressions (see below)
> to accomodate a feature that we don't support today.
> 
> If you are saying it's OK to simply not care about RT for s390 now, then
> AFAICT it should be fine to just use s/raw_spin_)lock/spin_lock/ for
> this whole series, drop this patch and then ignore the subsequent
> Sashiko complaints about RT.
> 
> What do you think?

So... after having given this a second thought: we do not have
PROVE_RAW_LOCK_NESTING enabled in our debug_defconfig (either we missed it,
or somebody (cough) thought it is not relevant for s390). That said, I
believe we should enable it, fix all fallout and also make sure that new
code does not generate any lockdep splats with PROVE_RAW_LOCK_NESTING
enabled.

Rationale: even though it is not relevant for s390, we also change common
code; and by ignoring PROVE_RAW_LOCK_NESTING we might cause problems for
other architectures by introducing incorrect nesting of locks in common
code. So yes, your thinking is correct.

Peter, I just added you to cc, so you can correct me if I'm entirely wrong.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-07 14:45           ` Heiko Carstens
@ 2026-05-07 14:49             ` Peter Zijlstra
  2026-05-08  2:46             ` Douglas Freimuth
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2026-05-07 14:49 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Matthew Rosato, Douglas Freimuth, borntraeger, imbrenda, frankja,
	david, gor, agordeev, svens, kvm, linux-s390, linux-kernel

On Thu, May 07, 2026 at 04:45:49PM +0200, Heiko Carstens wrote:

> So... after having given this a second thought: we do not have
> PROVE_RAW_LOCK_NESTING enabled in our debug_defconfig (either we missed it,
> or somebody (cough) thought it is not relevant for s390). That said, I
> believe we should enable it, fix all fallout and also make sure that new
> code does not generate any lockdep splats with PROVE_RAW_LOCK_NESTING
> enabled.
> 
> Rationale: even though it is not relevant for s390, we also change common
> code; and by ignoring PROVE_RAW_LOCK_NESTING we might cause problems for
> other architectures by introducing incorrect nesting of locks in common
> code. So yes, your thinking is correct.
> 
> Peter, I just added you to cc, so you can correct me if I'm entirely wrong.

Makes sense to me; thanks for doing so!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-07 14:45           ` Heiko Carstens
  2026-05-07 14:49             ` Peter Zijlstra
@ 2026-05-08  2:46             ` Douglas Freimuth
  2026-05-08 10:27               ` Heiko Carstens
  1 sibling, 1 reply; 13+ messages in thread
From: Douglas Freimuth @ 2026-05-08  2:46 UTC (permalink / raw)
  To: Heiko Carstens, Matthew Rosato
  Cc: borntraeger, imbrenda, frankja, david, gor, agordeev, svens, kvm,
	linux-s390, linux-kernel, Peter Zijlstra



On 5/7/26 10:45 AM, Heiko Carstens wrote:
> Adding Peter :)
> 
> On Thu, May 07, 2026 at 09:17:00AM -0400, Matthew Rosato wrote:
>> On 5/7/26 5:56 AM, Heiko Carstens wrote:
>>> On Wed, May 06, 2026 at 10:50:52AM -0400, Douglas Freimuth wrote:
>>>> On 5/6/26 12:57 AM, Heiko Carstens wrote:
>>>>> On Tue, May 05, 2026 at 07:37:27PM +0200, Douglas Freimuth wrote:
>>>>>> s390 needs to maintain support for an RT kernel. This requires the
>>>>>> floating interrupt lock, fi->lock to be changed to a raw spin lock
>>>>>> since the fi->lock maybe called with interrupts disabled in __inject_io.
>>>>>>
>>>>>> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
>>>>>> ---
>>>>>>    arch/s390/include/asm/kvm_host.h |  2 +-
>>>>>>    arch/s390/kvm/intercept.c        |  4 +-
>>>>>>    arch/s390/kvm/interrupt.c        | 68 ++++++++++++++++----------------
>>>>>>    arch/s390/kvm/kvm-s390.c         |  2 +-
>>>>>>    4 files changed, 38 insertions(+), 38 deletions(-)
>>>>>
>>>>> s390 does not support RT, but I guess you are referring to a lockdep splat
>>>>> which you would see without doing this change, similar like we have seen at
>>>>> other places.
>>>>>
>>>>> Can you include the relevant parts of the splat for reference, please?
> 
> ...
> 
>> AFAIU it is only problematic if we (s390) should ever want to support RT
>> in the future.
> 
> I don't see that coming, but nobody knows what happens in future.
> 
> ...
> 
>> My original thinking was 'well, it won't hurt to use the raw spinlocks
>> in the new code' so I set Doug down this road with my review comments --
>> I did not consider that there would be a need for additional fallout
>> like this patch, which means increased chance of regressions (see below)
>> to accommodate a feature that we don't support today.
>>
>> If you are saying it's OK to simply not care about RT for s390 now, then
>> AFAICT it should be fine to just use s/raw_spin_lock/spin_lock/ for
>> this whole series, drop this patch and then ignore the subsequent
>> Sashiko complaints about RT.
>>
>> What do you think?
> 
> So... after having given this a second thought: we do not have
> PROVE_RAW_LOCK_NESTING enabled in our debug_defconfig (either we missed it,
> or somebody (cough) thought it is not relevant for s390). That said, I
> believe we should enable it, fix all fallout and also make sure that new
> code does not generate any lockdep splats with PROVE_RAW_LOCK_NESTING
> enabled.
> 
> Rationale: even though it is not relevant for s390, we also change common
> code; and by ignoring PROVE_RAW_LOCK_NESTING we might cause problems for
> other architectures by introducing incorrect nesting of locks in common
> code. So yes, your thinking is correct.

Heiko, to be complete, I went through the exercise of enabling 
PROVE_RAW_LOCK_NESTING. I created a small hack to trigger a 
__deliver_machine_check and trap the nested locking issue. The requested 
splat is below. Here the floating interrupt lock is a raw_spin_lock and 
the nested local interrupt lock is a spin_lock, hence the invalid 
nesting. No other nesting issues were found.
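The invalid nesting described above can be sketched as follows. This is a hypothetical illustration of the rule that PROVE_RAW_LOCK_NESTING enforces, not the actual KVM delivery code; the lock names are borrowed from the splat:

```c
/* Hypothetical sketch, not the actual KVM code: under PREEMPT_RT a
 * spinlock_t becomes a sleeping lock, so taking one while holding a
 * raw_spinlock_t is invalid nesting, which lockdep reports when
 * PROVE_RAW_LOCK_NESTING is enabled.
 */
raw_spin_lock(&fi->lock);   /* raw lock: spins even on RT */
spin_lock(&li->lock);       /* BUG on RT: may sleep under a raw lock */
/* ... deliver the pending interrupt ... */
spin_unlock(&li->lock);
raw_spin_unlock(&fi->lock);
```

With both locks raw (or both sleeping), the nesting is valid and lockdep stays quiet.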

Now we need to decide: do we keep the raw_spin_locks to cover the 
possibility of future RT support or changes to common code? In that case 
I would also make li->lock a raw_spin_lock. Or should I drop this 
raw_spin_lock patch and back out the other raw_spin_locks, since we 
don't currently support RT on s390? Either way, I would finish by 
testing again with PROVE_RAW_LOCK_NESTING.

[  187.278926] =============================
[  187.278927] [ BUG: Invalid wait context ]
[  187.278930] 7.1.0-rc1-gb8e991a47d4c-dirty #6 Not tainted
[  187.278932] -----------------------------
[  187.278933] CPU 0/KVM/4263 is trying to lock:
[  187.278935] 000000c7448982a0 
(&vcpu->arch.local_int.lock){+.+.}-{3:3}, at: 
__deliver_machine_check+0x44/0x1a0 [kvm]
[  187.278976] other info that might help us debug this:
[  187.278978] context-{5:5}
[  187.278979] 3 locks held by CPU 0/KVM/4263:
[  187.278981]  #0: 000000c7448980b8 (&vcpu->mutex){+.+.}-{4:4}, at: 
kvm_vcpu_ioctl+0xb8/0x9b0 [kvm]
[  187.279001]  #1: 000000c73a75b108 (&kvm->srcu){.+.+}-{0:0}, at: 
__vcpu_run+0x46/0x4f0 [kvm]
[  187.279024]  #2: 000000c73a758dd0 
(&kvm->arch.float_int.lock){+.+.}-{2:2}, at: 
__deliver_machine_check+0x3a/0x1a0 [kvm]
[  187.279046] stack backtrace:
[  187.279048] CPU: 10 UID: 107 PID: 4263 Comm: CPU 0/KVM Not tainted 
7.1.0-rc1-gb8e991a47d4c-dirty #6 PREEMPT
[  187.279050] Hardware name: IBM 9175 ME1 701 (LPAR)
[  187.279051] Call Trace:
[  187.279051]  [<000001cbdd2e7eea>] dump_stack_lvl+0xa2/0xe8
[  187.279054]  [<000001cbdd3ecd98>] __lock_acquire+0xe18/0x15c0
[  187.279057]  [<000001cbdd3ed62c>] lock_acquire.part.0+0xec/0x260
[  187.279059]  [<000001cbdd3ed84c>] lock_acquire+0xac/0x200
[  187.279061]  [<000001cbde401528>] _raw_spin_lock+0x58/0xb0
[  187.279063]  [<000001cb5dc5e734>] __deliver_machine_check+0x44/0x1a0 
[kvm]
[  187.279082]  [<000001cb5dc6057e>] 
kvm_s390_deliver_pending_interrupts+0x7e/0x990 [kvm]
[  187.279099]  [<000001cb5dc49934>] vcpu_pre_run+0x74/0x2d0 [kvm]
[  187.279117]  [<000001cb5dc558e8>] __vcpu_run+0xa8/0x4f0 [kvm]
[  187.279134]  [<000001cb5dc56400>] kvm_arch_vcpu_ioctl_run+0x140/0x320 
[kvm]
[  187.279152]  [<000001cb5dc35cc2>] kvm_vcpu_ioctl+0x142/0x9b0 [kvm]
[  187.279167]  [<000001cbdd7d0bda>] __s390x_sys_ioctl+0xea/0x120
[  187.279171]  [<000001cbde3ef868>] __do_syscall+0x168/0x750
[  187.279173]  [<000001cbde402d1a>] system_call+0x72/0x90
[  187.279175] INFO: lockdep is turned off.


> 
> Peter, I just added you to cc, so you can correct me if I'm entirely wrong.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case
  2026-05-08  2:46             ` Douglas Freimuth
@ 2026-05-08 10:27               ` Heiko Carstens
  0 siblings, 0 replies; 13+ messages in thread
From: Heiko Carstens @ 2026-05-08 10:27 UTC (permalink / raw)
  To: Douglas Freimuth
  Cc: Matthew Rosato, borntraeger, imbrenda, frankja, david, gor,
	agordeev, svens, kvm, linux-s390, linux-kernel, Peter Zijlstra

On Thu, May 07, 2026 at 10:46:44PM -0400, Douglas Freimuth wrote:
> > Rationale: even though it is not relevant for s390, we also change common
> > code; and by ignoring PROVE_RAW_LOCK_NESTING we might cause problems for
> > other architectures by introducing incorrect nesting of locks in common
> > code. So yes, your thinking is correct.
> 
> Heiko, to be complete, I went through the exercise of enabling
> PROVE_RAW_LOCK_NESTING. I created a small hack to trigger a
> __deliver_machine_check and trap the nested locking issue. The requested
> splat is below. Here the floating interrupt lock is a raw_spin_lock and the
> nested local interrupt lock is a spin_lock, hence the invalid nesting.
> No other nesting issues were found.
> 
> Now we need to decide: do we keep the raw_spin_locks to cover the
> possibility of future RT support or changes to common code? In that case I
> would also make li->lock a raw_spin_lock. Or should I drop this
> raw_spin_lock patch and back out the other raw_spin_locks, since we don't
> currently support RT on s390? Either way, I would finish by testing again
> with PROVE_RAW_LOCK_NESTING.

Doug, we are going to enable PROVE_RAW_LOCK_NESTING in our debug_defconfig for
the reasons I tried to outline above.

Or in other words: you need to convert li->lock too, since we want our code in
a way that it doesn't trigger any lockdep splats, regardless if s390 will or
will not support RT.
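For illustration, the conversion Heiko asks for amounts to something like the sketch below. The struct and field layout are assumed from the thread, not checked against the actual arch/s390/include/asm/kvm_host.h:

```c
/* Hypothetical sketch of the li->lock conversion; the real
 * kvm_s390_local_interrupt struct has more fields. */
struct kvm_s390_local_interrupt {
	raw_spinlock_t lock;            /* was: spinlock_t lock; */
	/* ... */
};

/* Callers change accordingly: */
raw_spin_lock(&li->lock);               /* was: spin_lock(&li->lock); */
/* ... deliver local interrupt ... */
raw_spin_unlock(&li->lock);             /* was: spin_unlock(&li->lock); */
```

With both fi->lock and li->lock raw, the raw-inside-raw nesting seen in the splat path becomes valid under PROVE_RAW_LOCK_NESTING.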

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-05-08 10:28 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-05 17:37 [PATCH v5 0/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
2026-05-05 17:37 ` [PATCH v5 1/4] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
2026-05-05 17:37 ` [PATCH v5 2/4] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
2026-05-05 17:37 ` [PATCH v5 3/4] KVM: s390: Change the fi->lock to a raw_spinlock for RT case Douglas Freimuth
2026-05-06  4:57   ` Heiko Carstens
2026-05-06 14:50     ` Douglas Freimuth
2026-05-07  9:56       ` Heiko Carstens
2026-05-07 13:17         ` Matthew Rosato
2026-05-07 14:45           ` Heiko Carstens
2026-05-07 14:49             ` Peter Zijlstra
2026-05-08  2:46             ` Douglas Freimuth
2026-05-08 10:27               ` Heiko Carstens
2026-05-05 17:37 ` [PATCH v5 4/4] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox