* [PATCH v10 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject
@ 2026-06-04 19:27 Douglas Freimuth
2026-06-04 19:27 ` [PATCH v10 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Douglas Freimuth @ 2026-06-04 19:27 UTC
To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
kvm, linux-s390, linux-kernel
Cc: mjrosato, freimuth
This series of three patches enables a non-blocking path for irqfd
injection on s390 via kvm_arch_set_irq_inatomic(). Before these changes,
kvm_arch_set_irq_inatomic() would just return -EWOULDBLOCK, so every
interrupt was placed on the global work queue and had to be processed
later by a different thread. The series implements an s390 version of
inatomic. It is relevant to virtio-blk and virtio-net and was tested
against virtio-pci and virtio-ccw.
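The fast-path/fallback relationship described above can be sketched in
plain user-space C. This is only an illustration under stated assumptions:
struct demo_route, demo_set_irq_inatomic() and demo_irqfd_signal() are
hypothetical names, the "deferred" counter stands in for queueing work, and
none of this is the kernel's irqfd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt route; not a kernel structure. */
struct demo_route {
	bool fast_path_ready;	/* e.g. indicator pages are already mapped */
	int fast_injected;	/* interrupts delivered without blocking */
	int deferred;		/* interrupts handed to a worker thread */
};

/* Models an inatomic hook: it must never sleep, but it may refuse. */
static int demo_set_irq_inatomic(struct demo_route *r)
{
	assert(r != NULL);
	if (!r->fast_path_ready)
		return -EWOULDBLOCK;
	r->fast_injected++;
	return 0;
}

/* Models the caller: try the fast path, defer only when it refuses. */
static int demo_irqfd_signal(struct demo_route *r)
{
	int rc = demo_set_irq_inatomic(r);

	if (rc == -EWOULDBLOCK) {
		r->deferred++;	/* stands in for scheduling deferred work */
		return 0;
	}
	return rc;
}
```

In this model, a hook that always returns -EWOULDBLOCK corresponds to the
behaviour before the series: every signal ends up in "deferred".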
The inatomic fast path cannot lose control since it is running with
interrupts disabled. This required changing three things that the slow
path does today. First, the adapter_indicators page needs to be mapped
since it is accessed with interrupts disabled, so we added map/unmap
functions. Second, access to resources shared between the fast and slow
paths needed to be changed from mutexes and semaphores to spinlocks.
Finally, the memory allocation on the slow path utilizes GFP_KERNEL_ACCOUNT
but we had to implement the fast path with GFP_ATOMIC allocation. Each of
these changes was required to prevent blocking on the fast inject path.
s390 doesn't support a PREEMPT_RT kernel and this patch doesn't either.
Given this fact, we are using regular spin_lock rather than raw_spin_lock.
Statistical counters have been added to enable analysis of irq injection on
the fast path and slow path, including io_390_inatomic,
io_flic_inject_airq, io_set_adapter_int and io_390_inatomic_no_inject.
Counters have also been added to analyze map/unmap of the adapter indicator
pages in non-Secure Execution environments and to track fencing of Fast
Inject in Secure Execution environments. To take advantage of this kernel
series with virtio-pci, a QEMU that includes the
's390x/pci: set kvm_msi_via_irqfd_allowed' fix is needed. Additionally,
the guest XML needs a thread pool and threads explicitly assigned per disk
device using the common way of defining threads for disks.
Patch 1 enables map/unmap of adapter indicator pages, but in Secure
Execution environments it avoids the long-term mapping.
v9->v10: Streamline logic in register_io_adapter()
Douglas Freimuth (3):
KVM: s390: Add map/unmap ioctl and clean mappings post-guest
KVM: s390: Enable adapter_indicators_set to use mapped pages
KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
arch/s390/include/asm/kvm_host.h | 11 +-
arch/s390/kvm/intercept.c | 5 +-
arch/s390/kvm/interrupt.c | 559 ++++++++++++++++++++++++-------
arch/s390/kvm/kvm-s390.c | 30 +-
arch/s390/kvm/kvm-s390.h | 5 +-
5 files changed, 488 insertions(+), 122 deletions(-)
--
2.54.0
* [PATCH v10 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest
2026-06-04 19:27 [PATCH v10 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
@ 2026-06-04 19:27 ` Douglas Freimuth
2026-06-04 19:27 ` [PATCH v10 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Douglas Freimuth @ 2026-06-04 19:27 UTC
To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
kvm, linux-s390, linux-kernel
Cc: mjrosato, freimuth
s390 needs map/unmap ioctls, which map the adapter set indicator pages,
so the pages can be accessed when interrupts are disabled. The mappings
are cleaned up when the guest is removed. pin_user_pages_remote() is used
for both the ioctl and the pin-on-demand logic in
adapter_indicators_set().
Map/Unmap ioctls are fenced in order to avoid the long-term pinning
in Secure Execution environments. In Secure Execution
environments the path of execution available before this patch is followed.
Statistical counters that count map/unmap calls for adapter indicator
pages are added. In non-Secure Execution environments the counters can be
used to analyze map/unmap activity. In Secure Execution environments the
counters are not incremented, because the adapter indicator pages are not
mapped.
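The map/unmap bookkeeping described above can be modelled in user space as
follows. This is a simplified sketch, not the patch's code: it uses a fixed
array instead of a list_head, takes no lock, pins no pages, and DEMO_MAX_MAPS
is a made-up cap standing in for MAX_S390_ADAPTER_MAPS. The return codes
(-EINVAL for a zero address or a full adapter, -ENOENT for an unknown
address) follow what the diff in this patch shows.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_MAPS 4	/* hypothetical cap, for illustration only */

struct demo_map {
	uint64_t guest_addr;	/* guest address the mapping was made for */
	int in_use;
};

struct demo_adapter {
	struct demo_map maps[DEMO_MAX_MAPS];
	unsigned int nr_maps;
};

/* Record a mapping; refuse a zero address or a full adapter. */
static int demo_adapter_map(struct demo_adapter *a, uint64_t guest_addr)
{
	size_t i;

	assert(a != NULL);
	if (!guest_addr || a->nr_maps >= DEMO_MAX_MAPS)
		return -EINVAL;
	for (i = 0; i < DEMO_MAX_MAPS; i++) {
		if (!a->maps[i].in_use) {
			a->maps[i].guest_addr = guest_addr;
			a->maps[i].in_use = 1;
			a->nr_maps++;
			return 0;
		}
	}
	return -EINVAL;	/* not reached while nr_maps is consistent */
}

/* Drop the mapping for guest_addr, or report that none exists. */
static int demo_adapter_unmap(struct demo_adapter *a, uint64_t guest_addr)
{
	size_t i;

	assert(a != NULL);
	if (!guest_addr)
		return -EINVAL;
	for (i = 0; i < DEMO_MAX_MAPS; i++) {
		if (a->maps[i].in_use &&
		    a->maps[i].guest_addr == guest_addr) {
			a->maps[i].in_use = 0;
			a->nr_maps--;
			return 0;
		}
	}
	return -ENOENT;
}
```

In the patch itself the list manipulation happens under maps_lock with
interrupts saved, and the page is unpinned outside the lock; the model
leaves both out to keep the bookkeeping visible.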
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
arch/s390/include/asm/kvm_host.h | 5 +
arch/s390/kvm/interrupt.c | 227 +++++++++++++++++++++++++------
arch/s390/kvm/kvm-s390.c | 3 +
arch/s390/kvm/kvm-s390.h | 2 +
4 files changed, 194 insertions(+), 43 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 8a4f4a39f7a2..0056cc9414a0 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -448,6 +448,8 @@ struct kvm_vcpu_arch {
struct kvm_vm_stat {
struct kvm_vm_stat_generic generic;
u64 inject_io;
+ u64 io_390_adapter_map;
+ u64 io_390_adapter_unmap;
u64 inject_float_mchk;
u64 inject_pfault_done;
u64 inject_service_signal;
@@ -479,6 +481,9 @@ struct s390_io_adapter {
bool masked;
bool swap;
bool suppressible;
+ spinlock_t maps_lock;
+ struct list_head maps;
+ unsigned int nr_maps;
};
#define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 3bcdbbbb6891..d066a282271e 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2411,34 +2411,46 @@ static int register_io_adapter(struct kvm_device *dev,
{
struct s390_io_adapter *adapter;
struct kvm_s390_io_adapter adapter_info;
+ int rc = 0;
+ mutex_lock(&dev->kvm->lock);
if (copy_from_user(&adapter_info,
- (void __user *)attr->addr, sizeof(adapter_info)))
- return -EFAULT;
-
- if (adapter_info.id >= MAX_S390_IO_ADAPTERS)
- return -EINVAL;
-
+ (void __user *)attr->addr, sizeof(adapter_info))) {
+ rc = -EFAULT;
+ goto out;
+ }
+ if (adapter_info.id >= MAX_S390_IO_ADAPTERS) {
+ rc = -EINVAL;
+ goto out;
+ }
adapter_info.id = array_index_nospec(adapter_info.id,
MAX_S390_IO_ADAPTERS);
- if (dev->kvm->arch.adapters[adapter_info.id] != NULL)
- return -EINVAL;
-
+ if (dev->kvm->arch.adapters[adapter_info.id] != NULL) {
+ rc = -EINVAL;
+ goto out;
+ }
adapter = kzalloc_obj(*adapter, GFP_KERNEL_ACCOUNT);
- if (!adapter)
- return -ENOMEM;
+ if (!adapter) {
+ rc = -ENOMEM;
+ goto out;
+ }
+ INIT_LIST_HEAD(&adapter->maps);
+ spin_lock_init(&adapter->maps_lock);
+ adapter->nr_maps = 0;
adapter->id = adapter_info.id;
adapter->isc = adapter_info.isc;
adapter->maskable = adapter_info.maskable;
adapter->masked = false;
adapter->swap = adapter_info.swap;
- adapter->suppressible = (adapter_info.flags) &
+ adapter->suppressible = adapter_info.flags &
KVM_S390_ADAPTER_SUPPRESSIBLE;
dev->kvm->arch.adapters[adapter->id] = adapter;
- return 0;
+out:
+ mutex_unlock(&dev->kvm->lock);
+ return rc;
}
int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked)
@@ -2453,12 +2465,151 @@ int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked)
return ret;
}
+static struct page *pin_map_page(struct kvm *kvm, u64 uaddr,
+ unsigned int gup_flags)
+{
+ struct mm_struct *mm = kvm->mm;
+ struct page *page = NULL;
+ int locked = 1;
+
+ if (mmget_not_zero(mm)) {
+ mmap_read_lock(mm);
+ pin_user_pages_remote(mm, uaddr, 1, FOLL_WRITE | gup_flags,
+ &page, &locked);
+ if (locked)
+ mmap_read_unlock(mm);
+ mmput(mm);
+ }
+
+ return page;
+}
+
+static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
+{
+ struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+ struct s390_map_info *map;
+ unsigned long flags;
+ __u64 host_addr;
+ int ret, idx;
+
+ if (!adapter || !addr)
+ return -EINVAL;
+
+ map = kzalloc_obj(*map, GFP_KERNEL_ACCOUNT);
+ if (!map)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&map->list);
+ idx = srcu_read_lock(&kvm->srcu);
+ host_addr = gpa_to_hva(kvm, addr);
+ if (kvm_is_error_hva(host_addr)) {
+ srcu_read_unlock(&kvm->srcu, idx);
+ ret = -EFAULT;
+ goto out;
+ }
+ srcu_read_unlock(&kvm->srcu, idx);
+ map->guest_addr = addr;
+ map->addr = host_addr;
+ map->page = pin_map_page(kvm, host_addr, FOLL_LONGTERM);
+ if (!map->page) {
+ ret = -EINVAL;
+ goto out;
+ }
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ if (adapter->nr_maps < MAX_S390_ADAPTER_MAPS) {
+ list_add_tail(&map->list, &adapter->maps);
+ adapter->nr_maps++;
+ ret = 0;
+ } else {
+ ret = -EINVAL;
+ }
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+ if (ret)
+ unpin_user_page(map->page);
+out:
+ if (ret)
+ kfree(map);
+ return ret;
+}
+
+static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
+{
+ struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+ struct s390_map_info *map, *tmp, *map_to_free;
+ struct page *map_page_to_put = NULL;
+ u64 map_addr_to_mark = 0;
+ unsigned long flags;
+ int found = 0, idx;
+
+ if (!adapter || !addr)
+ return -EINVAL;
+
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
+ if (map->guest_addr == addr) {
+ found = 1;
+ adapter->nr_maps--;
+ list_del(&map->list);
+ map_page_to_put = map->page;
+ map_addr_to_mark = map->guest_addr;
+ map_to_free = map;
+ break;
+ }
+ }
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+
+ if (found) {
+ kfree(map_to_free);
+ idx = srcu_read_lock(&kvm->srcu);
+ mark_page_dirty(kvm, map_addr_to_mark >> PAGE_SHIFT);
+ set_page_dirty_lock(map_page_to_put);
+ srcu_read_unlock(&kvm->srcu, idx);
+ unpin_user_page(map_page_to_put);
+ }
+
+ return found ? 0 : -ENOENT;
+}
+
+void kvm_s390_unmap_all_adapters(struct kvm *kvm)
+{
+ struct s390_map_info *map, *tmp;
+ unsigned long flags;
+ int i, idx;
+
+ for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
+ struct s390_io_adapter *adapter = kvm->arch.adapters[i];
+ LIST_HEAD(local_list);
+
+ if (!adapter)
+ continue;
+
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ list_splice_init(&adapter->maps, &local_list);
+ adapter->nr_maps = 0;
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+
+ list_for_each_entry_safe(map, tmp, &local_list, list) {
+ list_del(&map->list);
+ idx = srcu_read_lock(&kvm->srcu);
+ mark_page_dirty(kvm, map->guest_addr >> PAGE_SHIFT);
+ set_page_dirty_lock(map->page);
+ srcu_read_unlock(&kvm->srcu, idx);
+ unpin_user_page(map->page);
+ kfree(map);
+ }
+ }
+}
+
void kvm_s390_destroy_adapters(struct kvm *kvm)
{
int i;
- for (i = 0; i < MAX_S390_IO_ADAPTERS; i++)
+ kvm_s390_unmap_all_adapters(kvm);
+
+ for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
kfree(kvm->arch.adapters[i]);
+ kvm->arch.adapters[i] = NULL;
+ }
}
static int modify_io_adapter(struct kvm_device *dev,
@@ -2480,14 +2631,22 @@ static int modify_io_adapter(struct kvm_device *dev,
if (ret > 0)
ret = 0;
break;
- /*
- * The following operations are no longer needed and therefore no-ops.
- * The gpa to hva translation is done when an IRQ route is set up. The
- * set_irq code uses get_user_pages_remote() to do the actual write.
- */
case KVM_S390_IO_ADAPTER_MAP:
case KVM_S390_IO_ADAPTER_UNMAP:
- ret = 0;
+ /* If in Secure Execution mode do not long term pin. */
+ mutex_lock(&dev->kvm->lock);
+ if (kvm_s390_pv_is_protected(dev->kvm)) {
+ mutex_unlock(&dev->kvm->lock);
+ return 0;
+ }
+ if (req.type == KVM_S390_IO_ADAPTER_MAP) {
+ dev->kvm->stat.io_390_adapter_map++;
+ ret = kvm_s390_adapter_map(dev->kvm, req.id, req.addr);
+ } else {
+ dev->kvm->stat.io_390_adapter_unmap++;
+ ret = kvm_s390_adapter_unmap(dev->kvm, req.id, req.addr);
+ }
+ mutex_unlock(&dev->kvm->lock);
break;
default:
ret = -EINVAL;
@@ -2733,24 +2892,6 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
}
-static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
-{
- struct mm_struct *mm = kvm->mm;
- struct page *page = NULL;
- int locked = 1;
-
- if (mmget_not_zero(mm)) {
- mmap_read_lock(mm);
- get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
- &page, &locked);
- if (locked)
- mmap_read_unlock(mm);
- mmput(mm);
- }
-
- return page;
-}
-
static int adapter_indicators_set(struct kvm *kvm,
struct s390_io_adapter *adapter,
struct kvm_s390_adapter_int *adapter_int)
@@ -2760,12 +2901,12 @@ static int adapter_indicators_set(struct kvm *kvm,
struct page *ind_page, *summary_page;
void *map;
- ind_page = get_map_page(kvm, adapter_int->ind_addr);
+ ind_page = pin_map_page(kvm, adapter_int->ind_addr, 0);
if (!ind_page)
return -1;
- summary_page = get_map_page(kvm, adapter_int->summary_addr);
+ summary_page = pin_map_page(kvm, adapter_int->summary_addr, 0);
if (!summary_page) {
- put_page(ind_page);
+ unpin_user_page(ind_page);
return -1;
}
@@ -2784,8 +2925,8 @@ static int adapter_indicators_set(struct kvm *kvm,
set_page_dirty_lock(summary_page);
srcu_read_unlock(&kvm->srcu, idx);
- put_page(ind_page);
- put_page(summary_page);
+ unpin_user_page(ind_page);
+ unpin_user_page(summary_page);
return summary_set ? 0 : 1;
}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e09960c2e6ed..0d39c1375de2 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -68,6 +68,8 @@
const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_COUNTER(VM, inject_io),
+ STATS_DESC_COUNTER(VM, io_390_adapter_map),
+ STATS_DESC_COUNTER(VM, io_390_adapter_unmap),
STATS_DESC_COUNTER(VM, inject_float_mchk),
STATS_DESC_COUNTER(VM, inject_pfault_done),
STATS_DESC_COUNTER(VM, inject_service_signal),
@@ -2513,6 +2515,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
if (kvm_s390_pv_is_protected(kvm))
break;
+ kvm_s390_unmap_all_adapters(kvm);
mmap_write_lock(kvm->mm);
/*
* Disable creation of new THPs. Existing THPs can stay, they
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index dc0573b7aa4b..7ba885cb6bd1 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -560,6 +560,8 @@ void kvm_s390_gisa_disable(struct kvm *kvm);
void kvm_s390_gisa_enable(struct kvm *kvm);
int __init kvm_s390_gib_init(u8 nisc);
void kvm_s390_gib_destroy(void);
+void kvm_s390_unmap_all_adapters(struct kvm *kvm);
+
/* implemented in guestdbg.c */
void kvm_s390_backup_guest_per_regs(struct kvm_vcpu *vcpu);
--
2.54.0
* [PATCH v10 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages
2026-06-04 19:27 [PATCH v10 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
2026-06-04 19:27 ` [PATCH v10 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
@ 2026-06-04 19:27 ` Douglas Freimuth
2026-06-04 19:42 ` sashiko-bot
2026-06-04 19:27 ` [PATCH v10 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
2026-06-15 9:04 ` [PATCH v10 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Claudio Imbrenda
3 siblings, 1 reply; 7+ messages in thread
From: Douglas Freimuth @ 2026-06-04 19:27 UTC
To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
kvm, linux-s390, linux-kernel
Cc: mjrosato, freimuth
The s390 adapter_indicators_set() function can now use long-term mapped
pages when they are available, so that work can be processed on a fast
path when interrupts are disabled.
If the adapter indicator pages are not mapped, local mapping is done on a
slow path, as it was prior to this patch. For example, Secure Execution
environments take the local mapping path, as they did prior to this patch.
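The lookup-then-fallback decision described above can be sketched in user
space as follows. The names (demo_mapping, demo_summary_set, demo_stats) are
hypothetical, the bit numbering is plain LSB-first rather than what
get_ind_bit() computes, and the "slow" branch only stands in for the
pin/write/unpin sequence. The return value mirrors the
"summary_set ? 0 : 1" convention visible in the diff: 1 when the summary
bit was newly set, 0 when it was already set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_mapping {
	uint64_t addr;		/* host address the mapping was made for */
	unsigned char *page;	/* stands in for the long-term mapped page */
};

struct demo_stats {
	int fast;		/* writes that used an existing mapping */
	int slow;		/* writes that needed the fallback */
};

/* Set a bit and report whether it was already set. */
static int demo_test_and_set(unsigned char *page, unsigned int bit)
{
	unsigned char mask = (unsigned char)(1u << (bit % 8));
	int old = (page[bit / 8] & mask) != 0;

	page[bit / 8] |= mask;
	return old;
}

static const struct demo_mapping *
demo_lookup(const struct demo_mapping *maps, size_t n, uint64_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (maps[i].addr == addr)
			return &maps[i];
	return NULL;
}

/* Use the mapped page when one exists, otherwise take the fallback. */
static int demo_summary_set(const struct demo_mapping *maps, size_t n,
			    uint64_t addr, unsigned int bit,
			    unsigned char *unmapped_page,
			    struct demo_stats *st)
{
	const struct demo_mapping *m = demo_lookup(maps, n, addr);
	int was_set;

	assert(st != NULL);
	if (m) {
		st->fast++;
		was_set = demo_test_and_set(m->page, bit);
	} else {
		st->slow++;	/* models pin, write, unpin */
		was_set = demo_test_and_set(unmapped_page, bit);
	}
	return was_set ? 0 : 1;
}
```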
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
arch/s390/kvm/interrupt.c | 87 ++++++++++++++++++++++++++++-----------
1 file changed, 63 insertions(+), 24 deletions(-)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index d066a282271e..b5304816aaa0 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2892,41 +2892,80 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
}
+static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter,
+ u64 addr)
+{
+ struct s390_map_info *map;
+
+ if (!adapter)
+ return NULL;
+
+ list_for_each_entry(map, &adapter->maps, list) {
+ if (map->addr == addr)
+ return map;
+ }
+ return NULL;
+}
+
static int adapter_indicators_set(struct kvm *kvm,
struct s390_io_adapter *adapter,
struct kvm_s390_adapter_int *adapter_int)
{
unsigned long bit;
int summary_set, idx;
- struct page *ind_page, *summary_page;
+ struct s390_map_info *ind_info, *summary_info;
void *map;
+ struct page *ind_page, *summary_page;
+ unsigned long flags;
- ind_page = pin_map_page(kvm, adapter_int->ind_addr, 0);
- if (!ind_page)
- return -1;
- summary_page = pin_map_page(kvm, adapter_int->summary_addr, 0);
- if (!summary_page) {
+ ind_page = NULL;
+
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ ind_info = get_map_info(adapter, adapter_int->ind_addr);
+ if (!ind_info) {
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+ ind_page = pin_map_page(kvm, adapter_int->ind_addr, 0);
+ if (!ind_page)
+ return -1;
+ idx = srcu_read_lock(&kvm->srcu);
+ map = page_address(ind_page);
+ bit = get_ind_bit(adapter_int->ind_addr,
+ adapter_int->ind_offset, adapter->swap);
+ set_bit(bit, map);
+ mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
+ set_page_dirty_lock(ind_page);
+ srcu_read_unlock(&kvm->srcu, idx);
unpin_user_page(ind_page);
- return -1;
+ } else {
+ map = page_address(ind_info->page);
+ bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
+ set_bit(bit, map);
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+ }
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ summary_info = get_map_info(adapter, adapter_int->summary_addr);
+ if (!summary_info) {
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+ summary_page = pin_map_page(kvm, adapter_int->summary_addr, 0);
+ if (WARN_ON_ONCE(!summary_page))
+ return -1;
+ idx = srcu_read_lock(&kvm->srcu);
+ map = page_address(summary_page);
+ bit = get_ind_bit(adapter_int->summary_addr,
+ adapter_int->summary_offset, adapter->swap);
+ summary_set = test_and_set_bit(bit, map);
+ mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
+ set_page_dirty_lock(summary_page);
+ srcu_read_unlock(&kvm->srcu, idx);
+ unpin_user_page(summary_page);
+ } else {
+ map = page_address(summary_info->page);
+ bit = get_ind_bit(summary_info->addr, adapter_int->summary_offset,
+ adapter->swap);
+ summary_set = test_and_set_bit(bit, map);
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
}
- idx = srcu_read_lock(&kvm->srcu);
- map = page_address(ind_page);
- bit = get_ind_bit(adapter_int->ind_addr,
- adapter_int->ind_offset, adapter->swap);
- set_bit(bit, map);
- mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
- set_page_dirty_lock(ind_page);
- map = page_address(summary_page);
- bit = get_ind_bit(adapter_int->summary_addr,
- adapter_int->summary_offset, adapter->swap);
- summary_set = test_and_set_bit(bit, map);
- mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
- set_page_dirty_lock(summary_page);
- srcu_read_unlock(&kvm->srcu, idx);
-
- unpin_user_page(ind_page);
- unpin_user_page(summary_page);
return summary_set ? 0 : 1;
}
--
2.54.0
* [PATCH v10 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
2026-06-04 19:27 [PATCH v10 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
2026-06-04 19:27 ` [PATCH v10 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
2026-06-04 19:27 ` [PATCH v10 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
@ 2026-06-04 19:27 ` Douglas Freimuth
2026-06-04 19:46 ` sashiko-bot
2026-06-15 9:04 ` [PATCH v10 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Claudio Imbrenda
3 siblings, 1 reply; 7+ messages in thread
From: Douglas Freimuth @ 2026-06-04 19:27 UTC
To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
kvm, linux-s390, linux-kernel
Cc: mjrosato, freimuth
s390 needs a fast path for irq injection, so this patch introduces
kvm_arch_set_irq_inatomic(). Instead of placing all interrupts on the
global work queue, as is done today, interrupts can be injected on a fast
path.
The inatomic fast path cannot lose control since it is running with
interrupts disabled. This required changing three things that the slow
path does today. First, the adapter_indicators page needs to be mapped
since it is accessed with interrupts disabled, so we added map/unmap
functions. Second, access to resources shared between the fast and slow
paths needed to be changed from mutexes and semaphores to spinlocks.
Finally, the memory allocation on the slow path utilizes GFP_KERNEL_ACCOUNT
but we had to implement the fast path with GFP_ATOMIC allocation. Each of
these changes was required to prevent blocking on the fast inject path.
Fencing of Fast Inject in Secure Execution environments is enabled in the
patch series by not mapping adapter indicator pages. In Secure Execution
environments the path of execution available before this patch is followed.
Statistical counters have been added to enable analysis of irq injection on
the fast path and slow path, including io_390_inatomic,
io_flic_inject_airq, io_set_adapter_int and io_390_inatomic_no_inject. The
io_390_inatomic_no_inject counter captures interrupts that were not
injected because the adapter was masked or the interrupt was coalesced or
suppressed.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
arch/s390/include/asm/kvm_host.h | 6 +-
arch/s390/kvm/intercept.c | 5 +-
arch/s390/kvm/interrupt.c | 265 ++++++++++++++++++++++++-------
arch/s390/kvm/kvm-s390.c | 27 +++-
arch/s390/kvm/kvm-s390.h | 3 +-
5 files changed, 241 insertions(+), 65 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 0056cc9414a0..7422ded443ba 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -359,7 +359,7 @@ struct kvm_s390_float_interrupt {
struct kvm_s390_mchk_info mchk;
struct kvm_s390_ext_info srv_signal;
int last_sleep_cpu;
- struct mutex ais_lock;
+ spinlock_t ais_lock;
u8 simm;
u8 nimm;
};
@@ -450,6 +450,10 @@ struct kvm_vm_stat {
u64 inject_io;
u64 io_390_adapter_map;
u64 io_390_adapter_unmap;
+ u64 io_390_inatomic;
+ u64 io_flic_inject_airq;
+ u64 io_set_adapter_int;
+ u64 io_390_inatomic_no_inject;
u64 inject_float_mchk;
u64 inject_pfault_done;
u64 inject_service_signal;
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c
index 39aff324203e..1980df61ef30 100644
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -517,8 +517,9 @@ static int handle_pv_spx(struct kvm_vcpu *vcpu)
static int handle_pv_sclp(struct kvm_vcpu *vcpu)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
+ unsigned long flags;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
/*
* 2 cases:
* a: an sccb answering interrupt was already pending or in flight.
@@ -534,7 +535,7 @@ static int handle_pv_sclp(struct kvm_vcpu *vcpu)
fi->srv_signal.ext_params |= 0x43000;
set_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
clear_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return 0;
}
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index b5304816aaa0..9e3e6b0d72ad 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -624,8 +624,9 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
struct kvm_s390_mchk_info mchk = {};
int deliver = 0;
int rc = 0;
+ unsigned long flags;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
spin_lock(&li->lock);
if (test_bit(IRQ_PEND_MCHK_EX, &li->pending_irqs) ||
test_bit(IRQ_PEND_MCHK_REP, &li->pending_irqs)) {
@@ -654,7 +655,7 @@ static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu)
deliver = 1;
}
spin_unlock(&li->lock);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
if (deliver) {
VCPU_EVENT(vcpu, 3, "deliver: machine check mcic 0x%llx",
@@ -941,11 +942,12 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_ext_info ext;
+ unsigned long flags;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
if (test_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs) ||
!(test_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs))) {
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return 0;
}
ext = fi->srv_signal;
@@ -954,7 +956,7 @@ static int __must_check __deliver_service(struct kvm_vcpu *vcpu)
clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
if (kvm_s390_pv_cpu_is_protected(vcpu))
set_bit(IRQ_PEND_EXT_SERVICE, &fi->masked_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
if (!ext.ext_params)
return 0;
@@ -972,17 +974,18 @@ static int __must_check __deliver_service_ev(struct kvm_vcpu *vcpu)
{
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_ext_info ext;
+ unsigned long flags;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
if (!(test_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs))) {
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return 0;
}
ext = fi->srv_signal;
/* only clear the event bits */
fi->srv_signal.ext_params &= ~SCCB_EVENT_PENDING;
clear_bit(IRQ_PEND_EXT_SERVICE_EV, &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
VCPU_EVENT(vcpu, 4, "%s", "deliver: sclp parameter event");
vcpu->stat.deliver_service_signal++;
@@ -997,8 +1000,9 @@ static int __must_check __deliver_pfault_done(struct kvm_vcpu *vcpu)
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_interrupt_info *inti;
int rc = 0;
+ unsigned long flags;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
inti = list_first_entry_or_null(&fi->lists[FIRQ_LIST_PFAULT],
struct kvm_s390_interrupt_info,
list);
@@ -1008,7 +1012,7 @@ static int __must_check __deliver_pfault_done(struct kvm_vcpu *vcpu)
}
if (list_empty(&fi->lists[FIRQ_LIST_PFAULT]))
clear_bit(IRQ_PEND_PFAULT_DONE, &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
if (inti) {
trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id,
@@ -1039,8 +1043,9 @@ static int __must_check __deliver_virtio(struct kvm_vcpu *vcpu)
struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
struct kvm_s390_interrupt_info *inti;
int rc = 0;
+ unsigned long flags;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
inti = list_first_entry_or_null(&fi->lists[FIRQ_LIST_VIRTIO],
struct kvm_s390_interrupt_info,
list);
@@ -1058,7 +1063,7 @@ static int __must_check __deliver_virtio(struct kvm_vcpu *vcpu)
}
if (list_empty(&fi->lists[FIRQ_LIST_VIRTIO]))
clear_bit(IRQ_PEND_VIRTIO, &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
if (inti) {
rc = put_guest_lc(vcpu, EXT_IRQ_CP_SERVICE,
@@ -1116,10 +1121,11 @@ static int __must_check __deliver_io(struct kvm_vcpu *vcpu,
struct kvm_s390_io_info io;
u32 isc;
int rc = 0;
+ unsigned long flags;
fi = &vcpu->kvm->arch.float_int;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
isc = irq_type_to_isc(irq_type);
isc_list = &fi->lists[isc];
inti = list_first_entry_or_null(isc_list,
@@ -1146,7 +1152,7 @@ static int __must_check __deliver_io(struct kvm_vcpu *vcpu,
}
if (list_empty(isc_list))
clear_bit(irq_type, &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
if (inti) {
rc = __do_deliver_io(vcpu, &(inti->io));
@@ -1662,8 +1668,9 @@ static struct kvm_s390_interrupt_info *get_io_int(struct kvm *kvm,
struct kvm_s390_interrupt_info *iter;
u16 id = (schid & 0xffff0000U) >> 16;
u16 nr = schid & 0x0000ffffU;
+ unsigned long flags;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
list_for_each_entry(iter, isc_list, list) {
if (schid && (id != iter->io.subchannel_id ||
nr != iter->io.subchannel_nr))
@@ -1673,10 +1680,10 @@ static struct kvm_s390_interrupt_info *get_io_int(struct kvm *kvm,
fi->counters[FIRQ_CNTR_IO] -= 1;
if (list_empty(isc_list))
clear_bit(isc_to_irq_type(isc), &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return iter;
}
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return NULL;
}
@@ -1769,9 +1776,10 @@ static int __inject_service(struct kvm *kvm,
struct kvm_s390_interrupt_info *inti)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+ unsigned long flags;
kvm->stat.inject_service_signal++;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
fi->srv_signal.ext_params |= inti->ext.ext_params & SCCB_EVENT_PENDING;
/* We always allow events, track them separately from the sccb ints */
@@ -1791,7 +1799,7 @@ static int __inject_service(struct kvm *kvm,
fi->srv_signal.ext_params |= inti->ext.ext_params & SCCB_MASK;
set_bit(IRQ_PEND_EXT_SERVICE, &fi->pending_irqs);
out:
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
kfree(inti);
return 0;
}
@@ -1800,17 +1808,18 @@ static int __inject_virtio(struct kvm *kvm,
struct kvm_s390_interrupt_info *inti)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+ unsigned long flags;
kvm->stat.inject_virtio++;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
if (fi->counters[FIRQ_CNTR_VIRTIO] >= KVM_S390_MAX_VIRTIO_IRQS) {
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return -EBUSY;
}
fi->counters[FIRQ_CNTR_VIRTIO] += 1;
list_add_tail(&inti->list, &fi->lists[FIRQ_LIST_VIRTIO]);
set_bit(IRQ_PEND_VIRTIO, &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return 0;
}
@@ -1818,18 +1827,19 @@ static int __inject_pfault_done(struct kvm *kvm,
struct kvm_s390_interrupt_info *inti)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+ unsigned long flags;
kvm->stat.inject_pfault_done++;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
if (fi->counters[FIRQ_CNTR_PFAULT] >=
(ASYNC_PF_PER_VCPU * KVM_MAX_VCPUS)) {
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return -EBUSY;
}
fi->counters[FIRQ_CNTR_PFAULT] += 1;
list_add_tail(&inti->list, &fi->lists[FIRQ_LIST_PFAULT]);
set_bit(IRQ_PEND_PFAULT_DONE, &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return 0;
}
@@ -1838,13 +1848,14 @@ static int __inject_float_mchk(struct kvm *kvm,
struct kvm_s390_interrupt_info *inti)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+ unsigned long flags;
kvm->stat.inject_float_mchk++;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
fi->mchk.cr14 |= inti->mchk.cr14 & (1UL << CR_PENDING_SUBCLASS);
fi->mchk.mcic |= inti->mchk.mcic;
set_bit(IRQ_PEND_MCHK_REP, &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
kfree(inti);
return 0;
}
@@ -1855,6 +1866,7 @@ static int __inject_io(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
struct kvm_s390_float_interrupt *fi;
struct list_head *list;
int isc;
+ unsigned long flags;
kvm->stat.inject_io++;
isc = int_word_to_isc(inti->io.io_int_word);
@@ -1873,9 +1885,9 @@ static int __inject_io(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
}
fi = &kvm->arch.float_int;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
if (fi->counters[FIRQ_CNTR_IO] >= KVM_S390_MAX_FLOAT_IRQS) {
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return -EBUSY;
}
fi->counters[FIRQ_CNTR_IO] += 1;
@@ -1890,7 +1902,7 @@ static int __inject_io(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
list = &fi->lists[FIRQ_LIST_IO_ISC_0 + isc];
list_add_tail(&inti->list, list);
set_bit(isc_to_irq_type(isc), &fi->pending_irqs);
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
return 0;
}
@@ -1966,15 +1978,10 @@ static int __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
}
int kvm_s390_inject_vm(struct kvm *kvm,
- struct kvm_s390_interrupt *s390int)
+ struct kvm_s390_interrupt *s390int, struct kvm_s390_interrupt_info *inti)
{
- struct kvm_s390_interrupt_info *inti;
int rc;
- inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
- if (!inti)
- return -ENOMEM;
-
inti->type = s390int->type;
switch (inti->type) {
case KVM_S390_INT_VIRTIO:
@@ -2003,15 +2010,13 @@ int kvm_s390_inject_vm(struct kvm *kvm,
inti->io.io_int_word = s390int->parm64 & 0x00000000ffffffffull;
break;
default:
- kfree(inti);
return -EINVAL;
}
trace_kvm_s390_inject_vm(s390int->type, s390int->parm, s390int->parm64,
2);
rc = __inject_vm(kvm, inti);
- if (rc)
- kfree(inti);
+
return rc;
}
@@ -2176,12 +2181,13 @@ void kvm_s390_clear_float_irqs(struct kvm *kvm)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
int i;
+ unsigned long flags;
mutex_lock(&kvm->lock);
if (!kvm_s390_pv_is_protected(kvm))
fi->masked_irqs = 0;
mutex_unlock(&kvm->lock);
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
fi->pending_irqs = 0;
memset(&fi->srv_signal, 0, sizeof(fi->srv_signal));
memset(&fi->mchk, 0, sizeof(fi->mchk));
@@ -2189,7 +2195,7 @@ void kvm_s390_clear_float_irqs(struct kvm *kvm)
clear_irq_list(&fi->lists[i]);
for (i = 0; i < FIRQ_MAX_COUNT; i++)
fi->counters[i] = 0;
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
kvm_s390_gisa_clear(kvm);
};
@@ -2204,6 +2210,7 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
int ret = 0;
int n = 0;
int i;
+ unsigned long flags;
if (len > KVM_S390_FLIC_MAX_BUFFER || len == 0)
return -EINVAL;
@@ -2235,7 +2242,7 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
}
}
fi = &kvm->arch.float_int;
- spin_lock(&fi->lock);
+ spin_lock_irqsave(&fi->lock, flags);
for (i = 0; i < FIRQ_LIST_COUNT; i++) {
list_for_each_entry(inti, &fi->lists[i], list) {
if (n == max_irqs) {
@@ -2272,7 +2279,7 @@ static int get_all_floating_irqs(struct kvm *kvm, u8 __user *usrbuf, u64 len)
}
out:
- spin_unlock(&fi->lock);
+ spin_unlock_irqrestore(&fi->lock, flags);
out_nolock:
if (!ret && n > 0) {
if (copy_to_user(usrbuf, buf, sizeof(struct kvm_s390_irq) * n))
@@ -2287,6 +2294,7 @@ static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
struct kvm_s390_ais_all ais;
+ unsigned long flags;
if (attr->attr < sizeof(ais))
return -EINVAL;
@@ -2294,10 +2302,10 @@ static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
if (!test_kvm_facility(kvm, 72))
return -EOPNOTSUPP;
- mutex_lock(&fi->ais_lock);
+ spin_lock_irqsave(&fi->ais_lock, flags);
ais.simm = fi->simm;
ais.nimm = fi->nimm;
- mutex_unlock(&fi->ais_lock);
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
if (copy_to_user((void __user *)attr->addr, &ais, sizeof(ais)))
return -EFAULT;
@@ -2683,6 +2691,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
struct kvm_s390_ais_req req;
int ret = 0;
+ unsigned long flags;
if (!test_kvm_facility(kvm, 72))
return -EOPNOTSUPP;
@@ -2699,7 +2708,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
2 : KVM_S390_AIS_MODE_SINGLE :
KVM_S390_AIS_MODE_ALL, req.mode);
- mutex_lock(&fi->ais_lock);
+ spin_lock_irqsave(&fi->ais_lock, flags);
switch (req.mode) {
case KVM_S390_AIS_MODE_ALL:
fi->simm &= ~AIS_MODE_MASK(req.isc);
@@ -2712,7 +2721,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
default:
ret = -EINVAL;
}
- mutex_unlock(&fi->ais_lock);
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
return ret;
}
@@ -2726,25 +2735,41 @@ static int kvm_s390_inject_airq(struct kvm *kvm,
.parm = 0,
.parm64 = isc_to_int_word(adapter->isc),
};
+ struct kvm_s390_interrupt_info *inti;
+ unsigned long flags;
+
int ret = 0;
- if (!test_kvm_facility(kvm, 72) || !adapter->suppressible)
- return kvm_s390_inject_vm(kvm, &s390int);
+ inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
+ if (!inti)
+ return -ENOMEM;
- mutex_lock(&fi->ais_lock);
+ if (!test_kvm_facility(kvm, 72) || !adapter->suppressible) {
+ ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+ if (ret)
+ kfree(inti);
+ return ret;
+ }
+
+ spin_lock_irqsave(&fi->ais_lock, flags);
if (fi->nimm & AIS_MODE_MASK(adapter->isc)) {
trace_kvm_s390_airq_suppressed(adapter->id, adapter->isc);
- goto out;
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
+ kfree(inti);
+ return ret;
}
- ret = kvm_s390_inject_vm(kvm, &s390int);
+ ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+
if (!ret && (fi->simm & AIS_MODE_MASK(adapter->isc))) {
fi->nimm |= AIS_MODE_MASK(adapter->isc);
trace_kvm_s390_modify_ais_mode(adapter->isc,
KVM_S390_AIS_MODE_SINGLE, 2);
}
-out:
- mutex_unlock(&fi->ais_lock);
+
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
+ if (ret)
+ kfree(inti);
return ret;
}
@@ -2753,6 +2778,8 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr)
unsigned int id = attr->attr;
struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+ kvm->stat.io_flic_inject_airq++;
+
if (!adapter)
return -EINVAL;
@@ -2763,6 +2790,7 @@ static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
struct kvm_s390_ais_all ais;
+ unsigned long flags;
if (!test_kvm_facility(kvm, 72))
return -EOPNOTSUPP;
@@ -2770,10 +2798,10 @@ static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
if (copy_from_user(&ais, (void __user *)attr->addr, sizeof(ais)))
return -EFAULT;
- mutex_lock(&fi->ais_lock);
+ spin_lock_irqsave(&fi->ais_lock, flags);
fi->simm = ais.simm;
fi->nimm = ais.nimm;
- mutex_unlock(&fi->ais_lock);
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
return 0;
}
@@ -2942,6 +2970,7 @@ static int adapter_indicators_set(struct kvm *kvm,
set_bit(bit, map);
spin_unlock_irqrestore(&adapter->maps_lock, flags);
}
+
spin_lock_irqsave(&adapter->maps_lock, flags);
summary_info = get_map_info(adapter, adapter_int->summary_addr);
if (!summary_info) {
@@ -2969,6 +2998,44 @@ static int adapter_indicators_set(struct kvm *kvm,
return summary_set ? 0 : 1;
}
+static int adapter_indicators_set_fast(struct kvm *kvm,
+ struct s390_io_adapter *adapter,
+ struct kvm_s390_adapter_int *adapter_int,
+ int setbit)
+{
+ unsigned long bit;
+ int summary_set;
+ struct s390_map_info *ind_info, *summary_info;
+ void *map;
+
+ spin_lock(&adapter->maps_lock);
+ ind_info = get_map_info(adapter, adapter_int->ind_addr);
+ if (!ind_info) {
+ spin_unlock(&adapter->maps_lock);
+ return -EWOULDBLOCK;
+ }
+ map = page_address(ind_info->page);
+ bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
+ if (setbit)
+ set_bit(bit, map);
+ summary_info = get_map_info(adapter, adapter_int->summary_addr);
+ if (!summary_info) {
+ spin_unlock(&adapter->maps_lock);
+ return -EWOULDBLOCK;
+ }
+ map = page_address(summary_info->page);
+ bit = get_ind_bit(summary_info->addr, adapter_int->summary_offset,
+ adapter->swap);
+ /* If setbit then set summary bit. Else if falling back to the slow path */
+ /* with setbit==0 then clear the summary bit so the slow path re-injects */
+ if (setbit)
+ summary_set = test_and_set_bit(bit, map);
+ else
+ summary_set = test_and_clear_bit(bit, map);
+ spin_unlock(&adapter->maps_lock);
+ return summary_set ? 0 : 1;
+}
+
/*
* < 0 - not injected due to error
* = 0 - coalesced, summary indicator already active
@@ -2981,6 +3048,8 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
int ret;
struct s390_io_adapter *adapter;
+ kvm->stat.io_set_adapter_int++;
+
/* We're only interested in the 0->1 transition. */
if (!level)
return 0;
@@ -3049,7 +3118,6 @@ int kvm_set_routing_entry(struct kvm *kvm,
int idx;
switch (ue->type) {
- /* we store the userspace addresses instead of the guest addresses */
case KVM_IRQ_ROUTING_S390_ADAPTER:
if (kvm_is_ucontrol(kvm))
return -EINVAL;
@@ -3639,3 +3707,86 @@ int __init kvm_s390_gib_init(u8 nisc)
out:
return rc;
}
+
+/*
+ * kvm_arch_set_irq_inatomic: fast-path for irqfd injection
+ */
+int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm *kvm, int irq_source_id, int level,
+ bool line_status)
+{
+ int ret, setbit;
+ struct s390_io_adapter *adapter;
+ struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+ struct kvm_s390_interrupt_info *inti;
+ struct kvm_s390_interrupt s390int = {
+ .type = KVM_S390_INT_IO(1, 0, 0, 0),
+ .parm = 0,
+ };
+
+ kvm->stat.io_390_inatomic++;
+
+ /* We're only interested in the 0->1 transition. */
+ if (!level)
+ return 0;
+ if (e->type != KVM_IRQ_ROUTING_S390_ADAPTER)
+ return -EWOULDBLOCK;
+
+ adapter = get_io_adapter(kvm, e->adapter.adapter_id);
+ if (!adapter)
+ return -EWOULDBLOCK;
+
+ s390int.parm64 = isc_to_int_word(adapter->isc);
+ setbit = 1;
+ ret = adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+ if (ret < 0)
+ return -EWOULDBLOCK;
+ if (!ret || adapter->masked) {
+ kvm->stat.io_390_inatomic_no_inject++;
+ return 0;
+ }
+
+ inti = kzalloc_obj(*inti, GFP_ATOMIC);
+ if (!inti) {
+ setbit = 0;
+ adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+ return -EWOULDBLOCK;
+ }
+
+ if (!test_kvm_facility(kvm, 72) || !adapter->suppressible) {
+ ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+ if (ret == 0) {
+ return ret;
+ } else {
+ setbit = 0;
+ adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+ kfree(inti);
+ return -EWOULDBLOCK;
+ }
+ }
+
+ spin_lock(&fi->ais_lock);
+ if (fi->nimm & AIS_MODE_MASK(adapter->isc)) {
+ trace_kvm_s390_airq_suppressed(adapter->id, adapter->isc);
+ spin_unlock(&fi->ais_lock);
+ kfree(inti);
+ kvm->stat.io_390_inatomic_no_inject++;
+ return 0;
+ }
+
+ ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+ if (!ret && (fi->simm & AIS_MODE_MASK(adapter->isc))) {
+ fi->nimm |= AIS_MODE_MASK(adapter->isc);
+ trace_kvm_s390_modify_ais_mode(adapter->isc,
+ KVM_S390_AIS_MODE_SINGLE, 2);
+ } else if (ret) {
+ spin_unlock(&fi->ais_lock);
+ setbit = 0;
+ adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+ kfree(inti);
+ return -EWOULDBLOCK;
+ }
+
+ spin_unlock(&fi->ais_lock);
+ return 0;
+}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0d39c1375de2..98e7d807d620 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -70,6 +70,10 @@ const struct kvm_stats_desc kvm_vm_stats_desc[] = {
STATS_DESC_COUNTER(VM, inject_io),
STATS_DESC_COUNTER(VM, io_390_adapter_map),
STATS_DESC_COUNTER(VM, io_390_adapter_unmap),
+ STATS_DESC_COUNTER(VM, io_390_inatomic),
+ STATS_DESC_COUNTER(VM, io_flic_inject_airq),
+ STATS_DESC_COUNTER(VM, io_set_adapter_int),
+ STATS_DESC_COUNTER(VM, io_390_inatomic_no_inject),
STATS_DESC_COUNTER(VM, inject_float_mchk),
STATS_DESC_COUNTER(VM, inject_pfault_done),
STATS_DESC_COUNTER(VM, inject_service_signal),
@@ -2851,6 +2855,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
void __user *argp = (void __user *)arg;
struct kvm_device_attr attr;
int r;
+ struct kvm_s390_interrupt_info *inti;
switch (ioctl) {
case KVM_S390_INTERRUPT: {
@@ -2859,7 +2864,12 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
r = -EFAULT;
if (copy_from_user(&s390int, argp, sizeof(s390int)))
break;
- r = kvm_s390_inject_vm(kvm, &s390int);
+ inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
+ if (!inti)
+ return -ENOMEM;
+ r = kvm_s390_inject_vm(kvm, &s390int, inti);
+ if (r)
+ kfree(inti);
break;
}
case KVM_CREATE_IRQCHIP: {
@@ -3257,7 +3267,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
mutex_unlock(&kvm->lock);
}
- mutex_init(&kvm->arch.float_int.ais_lock);
+ spin_lock_init(&kvm->arch.float_int.ais_lock);
spin_lock_init(&kvm->arch.float_int.lock);
for (i = 0; i < FIRQ_LIST_COUNT; i++)
INIT_LIST_HEAD(&kvm->arch.float_int.lists[i]);
@@ -4379,19 +4389,28 @@ int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clo
}
static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
- unsigned long token)
+ unsigned long token)
{
struct kvm_s390_interrupt inti;
struct kvm_s390_irq irq;
+ struct kvm_s390_interrupt_info *inti_mem = NULL;
+ int ret = 0;
if (start_token) {
irq.u.ext.ext_params2 = token;
irq.type = KVM_S390_INT_PFAULT_INIT;
WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &irq));
} else {
+ inti_mem = kzalloc_obj(*inti_mem, GFP_KERNEL_ACCOUNT);
+ if (WARN_ON_ONCE(!inti_mem))
+ return;
+
inti.type = KVM_S390_INT_PFAULT_DONE;
inti.parm64 = token;
- WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti));
+ ret = kvm_s390_inject_vm(vcpu->kvm, &inti, inti_mem);
+ if (ret)
+ kfree(inti_mem);
+ WARN_ON_ONCE(ret);
}
}
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 7ba885cb6bd1..6d2842fb71a3 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -376,7 +376,8 @@ int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu);
void kvm_s390_clear_float_irqs(struct kvm *kvm);
int __must_check kvm_s390_inject_vm(struct kvm *kvm,
- struct kvm_s390_interrupt *s390int);
+ struct kvm_s390_interrupt *s390int,
+ struct kvm_s390_interrupt_info *inti);
int __must_check kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
struct kvm_s390_irq *irq);
static inline int kvm_s390_inject_prog_irq(struct kvm_vcpu *vcpu,
--
2.54.0
* Re: [PATCH v10 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages
2026-06-04 19:27 ` [PATCH v10 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
@ 2026-06-04 19:42 ` sashiko-bot
0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-04 19:42 UTC (permalink / raw)
To: Douglas Freimuth; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [High] Host-userspace reachable WARN_ON_ONCE when pinning the summary page fails.
- [High] Broken atomicity: The indicator bit is set in guest memory before validating the summary page, leaving the guest in an inconsistent state on error.
- [High] The fast path for long-term mapped adapter indicators modifies guest memory without updating KVM's dirty page tracking, breaking live migration.
--
commit eab5dd8ef6ef5a366acb6ed92a72330259c39430
Author: Douglas Freimuth <freimuth@linux.ibm.com>
KVM: s390: Enable adapter_indicators_set to use mapped pages
This commit optimizes the s390 adapter_indicators_set function to use
long-term mapped pages when available. It introduces a fast path that operates
under a spinlock, falling back to the existing pin-and-map slow path if
the mapped pages are not found.
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index d066a282271e2..b5304816aaa04 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
[ ... ]
> @@ -2892,41 +2892,80 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
[ ... ]
> static int adapter_indicators_set(struct kvm *kvm,
> struct s390_io_adapter *adapter,
> struct kvm_s390_adapter_int *adapter_int)
> {
[ ... ]
> + spin_lock_irqsave(&adapter->maps_lock, flags);
> + ind_info = get_map_info(adapter, adapter_int->ind_addr);
> + if (!ind_info) {
> + spin_unlock_irqrestore(&adapter->maps_lock, flags);
> + ind_page = pin_map_page(kvm, adapter_int->ind_addr, 0);
> + if (!ind_page)
> + return -1;
> + idx = srcu_read_lock(&kvm->srcu);
> + map = page_address(ind_page);
> + bit = get_ind_bit(adapter_int->ind_addr,
> + adapter_int->ind_offset, adapter->swap);
> + set_bit(bit, map);
[Severity: High]
If the indicator bit is set here, but the subsequent attempt to resolve or pin
the summary page fails and the function returns -1, does this code leave the
indicator bit permanently set in guest memory?
Prior to this patch, both pages were successfully pinned before any bits were
mutated, ensuring an all-or-nothing update.
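The ordering concern can be illustrated outside the kernel. The following is a minimal userspace sketch, not the patch's code: `struct page_ref` and `indicators_set_all_or_nothing()` are hypothetical names, the bit operations are plain (non-atomic) C, and s390 bit numbering is ignored. It only shows the "resolve both pages, then mutate" ordering that the pre-patch code followed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a resolved (mapped or pinned) guest page. */
struct page_ref {
	unsigned long *bits;	/* NULL models a failed lookup or pin */
};

/*
 * Resolve both pages first and mutate only once both are known good,
 * so a failure on the summary page leaves the indicator bit untouched.
 * Returns -1 on error, 0 if the summary bit was already set (coalesced),
 * 1 if it was newly set.
 */
static int indicators_set_all_or_nothing(struct page_ref *ind,
					 unsigned long ind_bit,
					 struct page_ref *summary,
					 unsigned long summary_bit)
{
	bool was_set;

	assert(ind_bit < SKETCH_BITS_PER_LONG);
	assert(summary_bit < SKETCH_BITS_PER_LONG);

	/* Validate everything before touching guest-visible state. */
	if (!ind->bits || !summary->bits)
		return -1;

	ind->bits[0] |= 1UL << ind_bit;
	was_set = summary->bits[0] & (1UL << summary_bit);
	summary->bits[0] |= 1UL << summary_bit;
	return was_set ? 0 : 1;
}
```

With this ordering, a failed summary lookup returns -1 before any bit in the indicator word changes.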
> + mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
> + set_page_dirty_lock(ind_page);
> + srcu_read_unlock(&kvm->srcu, idx);
> + unpin_user_page(ind_page);
> + } else {
> + map = page_address(ind_info->page);
> + bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
> + set_bit(bit, map);
> + spin_unlock_irqrestore(&adapter->maps_lock, flags);
[Severity: High]
By modifying the long-term mapped adapter indicator directly in the host
virtual mapping, does this bypass KVM's dirty page tracking?
The slow path calls mark_page_dirty() and set_page_dirty_lock(), but the fast
path skips them. Could this cause the migrated guest to lose interrupt
delivery notifications if the updated indicator pages are not transferred to
the destination VM?
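As a rough userspace illustration of the question (not kernel code), the sketch below pairs a write through a host mapping with an update of a simplified dirty log. `struct dirty_log`, `sketch_mark_page_dirty()` and `set_indicator_tracked()` are hypothetical names, and the one-word bitmap stands in for the memslot dirty bitmap. Whether the real mark_page_dirty() is usable under maps_lock on the mapped path is an assumption the patch author would need to confirm.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_NR_PAGES		64
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified dirty log: one bit per guest page frame. */
struct dirty_log {
	uint64_t bitmap;	/* covers SKETCH_NR_PAGES frames */
};

static void sketch_mark_page_dirty(struct dirty_log *log, uint64_t gfn)
{
	assert(gfn < SKETCH_NR_PAGES);
	log->bitmap |= UINT64_C(1) << gfn;
}

/*
 * Set an indicator bit through a host mapping and record the write in
 * the dirty log, mirroring the pairing the slow path uses
 * (set_bit() followed by mark_page_dirty()).
 */
static void set_indicator_tracked(struct dirty_log *log, unsigned long *map,
				  unsigned long bit, uint64_t gaddr)
{
	map[bit / SKETCH_BITS_PER_LONG] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
	sketch_mark_page_dirty(log, gaddr >> SKETCH_PAGE_SHIFT);
}
```

The point of the sketch is only that the write and the dirty-log update travel together; a write without the second step is invisible to a migration pass that relies on the log.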
> + }
> + spin_lock_irqsave(&adapter->maps_lock, flags);
> + summary_info = get_map_info(adapter, adapter_int->summary_addr);
> + if (!summary_info) {
> + spin_unlock_irqrestore(&adapter->maps_lock, flags);
> + summary_page = pin_map_page(kvm, adapter_int->summary_addr, 0);
> + if (WARN_ON_ONCE(!summary_page))
> + return -1;
[Severity: High]
Since the summary address is derived from guest-controlled KVM routing entries
and managed by host userspace, could a malicious or misbehaving userspace
process trigger this warning by unmapping the memory or providing an invalid
address?
The KVM subsystem guidelines note that asserts whose conditions can be driven
by unprivileged host userspace are a host-side denial of service, especially
if panic_on_warn is enabled. Is it better to just return -1 here without the
warning?
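A minimal sketch of the quieter alternative, written as plain userspace C under stated assumptions: `struct sketch_stats`, its `summary_pin_failures` counter and `check_summary_page()` are hypothetical and do not exist in the patch. The failure is counted and reported to the caller through the return value, with no assertion on a condition userspace can influence.

```c
#include <stddef.h>

/* Hypothetical per-VM counter; the patch defines no such statistic. */
struct sketch_stats {
	unsigned long summary_pin_failures;
};

/*
 * Treat a failed pin of the summary page as an ordinary error:
 * count it and return -1, with no WARN-style assertion involved.
 */
static int check_summary_page(const void *summary_page,
			      struct sketch_stats *stats)
{
	if (!summary_page) {
		stats->summary_pin_failures++;
		return -1;
	}
	return 0;
}
```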
> + idx = srcu_read_lock(&kvm->srcu);
> + map = page_address(summary_page);
> + bit = get_ind_bit(adapter_int->summary_addr,
> + adapter_int->summary_offset, adapter->swap);
> + summary_set = test_and_set_bit(bit, map);
> + mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
> + set_page_dirty_lock(summary_page);
> + srcu_read_unlock(&kvm->srcu, idx);
> + unpin_user_page(summary_page);
> + } else {
> + map = page_address(summary_info->page);
> + bit = get_ind_bit(summary_info->addr, adapter_int->summary_offset,
> + adapter->swap);
> + summary_set = test_and_set_bit(bit, map);
> + spin_unlock_irqrestore(&adapter->maps_lock, flags);
[Severity: High]
Similar to the indicator page fast path above, does this code also bypass KVM
dirty page tracking for the summary page when a mapped info structure is found?
> }
>
> - idx = srcu_read_lock(&kvm->srcu);
> - map = page_address(ind_page);
> - bit = get_ind_bit(adapter_int->ind_addr,
> - adapter_int->ind_offset, adapter->swap);
> - set_bit(bit, map);
> - mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
> - set_page_dirty_lock(ind_page);
> - map = page_address(summary_page);
> - bit = get_ind_bit(adapter_int->summary_addr,
> - adapter_int->summary_offset, adapter->swap);
> - summary_set = test_and_set_bit(bit, map);
> - mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
> - set_page_dirty_lock(summary_page);
> - srcu_read_unlock(&kvm->srcu, idx);
> -
> - unpin_user_page(ind_page);
> - unpin_user_page(summary_page);
> return summary_set ? 0 : 1;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260604192755.203143-1-freimuth@linux.ibm.com?part=2
* Re: [PATCH v10 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
2026-06-04 19:27 ` [PATCH v10 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
@ 2026-06-04 19:46 ` sashiko-bot
0 siblings, 0 replies; 7+ messages in thread
From: sashiko-bot @ 2026-06-04 19:46 UTC (permalink / raw)
To: Douglas Freimuth; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [High] Missing KVM Dirty Page Tracking during Fast IRQ Injection.
Pre-existing issues:
- [High] Guest-Triggerable WARNING / Denial of Service on memory allocation failure.
--
commit 87294ba489c0602b2b9baebf0c40ddd4f77205c0
Author: Douglas Freimuth <freimuth@linux.ibm.com>
Subject: KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
This patch introduces a fast path for irq injection on s390 via
kvm_arch_set_irq_inatomic. It avoids placing interrupts on the global
work queue by directly mapping adapter indicator pages and accessing
shared resources using spinlocks instead of mutexes.
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index b5304816aaa04..9e3e6b0d72ad6 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
[ ... ]
> @@ -2969,6 +2998,44 @@ static int adapter_indicators_set(struct kvm *kvm,
> return summary_set ? 0 : 1;
> }
>
> +static int adapter_indicators_set_fast(struct kvm *kvm,
> + struct s390_io_adapter *adapter,
> + struct kvm_s390_adapter_int *adapter_int,
> + int setbit)
> +{
> + unsigned long bit;
> + int summary_set;
> + struct s390_map_info *ind_info, *summary_info;
> + void *map;
> +
> + spin_lock(&adapter->maps_lock);
> + ind_info = get_map_info(adapter, adapter_int->ind_addr);
> + if (!ind_info) {
> + spin_unlock(&adapter->maps_lock);
> + return -EWOULDBLOCK;
> + }
> + map = page_address(ind_info->page);
> + bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
> + if (setbit)
> + set_bit(bit, map);
[Severity: High]
Does this bypass dirty page tracking? Since the guest memory is directly
mutated here using the kernel virtual address mapping, the hardware dirty
bit won't be set.
It appears KVM must explicitly invoke mark_page_dirty() (or an equivalent) to
record this modification in the memory slot's dirty bitmap. Without it, pending
interrupts injected via the fast path might be lost after live migration,
leading to guest I/O hangs or data corruption.
> + summary_info = get_map_info(adapter, adapter_int->summary_addr);
> + if (!summary_info) {
> + spin_unlock(&adapter->maps_lock);
> + return -EWOULDBLOCK;
> + }
> + map = page_address(summary_info->page);
> + bit = get_ind_bit(summary_info->addr, adapter_int->summary_offset,
> + adapter->swap);
> + /* If setbit then set summary bit. Else if falling back to the slow path */
> + /* with setbit==0 then clear the summary bit so the slow path re-injects */
> + if (setbit)
> + summary_set = test_and_set_bit(bit, map);
> + else
> + summary_set = test_and_clear_bit(bit, map);
[Severity: High]
Similarly, does the summary page modification here also need explicit dirty
page tracking?
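For readers following the set/clear toggle in the quoted function, here is a small userspace sketch of the same return convention using C11 atomics. The `sketch_*` helpers and `summary_update()` are hypothetical stand-ins for the kernel's test_and_set_bit()/test_and_clear_bit(), s390 bit numbering (get_ind_bit() and the swap flag) is ignored, and dirty tracking is deliberately left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set a bit and report whether it was set before. */
static bool sketch_test_and_set_bit(unsigned int bit, atomic_ulong *word)
{
	unsigned long mask;

	assert(bit < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << bit;
	return atomic_fetch_or(word, mask) & mask;
}

/* Atomically clear a bit and report whether it was set before. */
static bool sketch_test_and_clear_bit(unsigned int bit, atomic_ulong *word)
{
	unsigned long mask;

	assert(bit < sizeof(unsigned long) * CHAR_BIT);
	mask = 1UL << bit;
	return atomic_fetch_and(word, ~mask) & mask;
}

/*
 * Same return convention as the quoted function: 0 if the summary bit
 * was already set before the call, 1 otherwise. With setbit == 0 the
 * bit is cleared so that a later slow-path attempt sees a 0->1
 * transition again.
 */
static int summary_update(atomic_ulong *word, unsigned int bit, int setbit)
{
	bool was_set = setbit ? sketch_test_and_set_bit(bit, word)
			      : sketch_test_and_clear_bit(bit, word);

	return was_set ? 0 : 1;
}
```

Setting twice yields 1 and then 0 (coalesced); a rollback with setbit == 0 clears the bit, so the next set reports a fresh transition.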
> + spin_unlock(&adapter->maps_lock);
> + return summary_set ? 0 : 1;
> +}
[ ... ]
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 0d39c1375de21..98e7d807d6204 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
[ ... ]
> @@ -4379,19 +4389,28 @@ int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clo
> }
>
> static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
> - unsigned long token)
> + unsigned long token)
> {
> struct kvm_s390_interrupt inti;
> struct kvm_s390_irq irq;
> + struct kvm_s390_interrupt_info *inti_mem = NULL;
> + int ret = 0;
>
> if (start_token) {
> irq.u.ext.ext_params2 = token;
> irq.type = KVM_S390_INT_PFAULT_INIT;
> WARN_ON_ONCE(kvm_s390_inject_vcpu(vcpu, &irq));
> } else {
> + inti_mem = kzalloc_obj(*inti_mem, GFP_KERNEL_ACCOUNT);
> + if (WARN_ON_ONCE(!inti_mem))
> + return;
[Severity: High]
This isn't a bug introduced by this patch, but does this guest-triggerable
warning create a Denial of Service risk?
The KVM subsystem guidelines explicitly state there should be no guest- or
host-userspace-reachable WARN_ON or BUG_ON. If this allocation fails (e.g.
under host memory pressure, or if a malicious guest deliberately exhausts its
memory cgroup limits via asynchronous page faults), it will hit this warning.
On systems configured with panic_on_warn=1, an unprivileged guest could
intentionally crash the host.
> +
> inti.type = KVM_S390_INT_PFAULT_DONE;
> inti.parm64 = token;
> - WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti));
> + ret = kvm_s390_inject_vm(vcpu->kvm, &inti, inti_mem);
> + if (ret)
> + kfree(inti_mem);
> + WARN_ON_ONCE(ret);
> }
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260604192755.203143-1-freimuth@linux.ibm.com?part=3
* Re: [PATCH v10 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject
2026-06-04 19:27 [PATCH v10 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
` (2 preceding siblings ...)
2026-06-04 19:27 ` [PATCH v10 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
@ 2026-06-15 9:04 ` Claudio Imbrenda
3 siblings, 0 replies; 7+ messages in thread
From: Claudio Imbrenda @ 2026-06-15 9:04 UTC (permalink / raw)
To: Douglas Freimuth
Cc: borntraeger, frankja, david, hca, gor, agordeev, svens, kvm,
linux-s390, linux-kernel, mjrosato
On Thu, 4 Jun 2026 21:27:52 +0200
Douglas Freimuth <freimuth@linux.ibm.com> wrote:
> s390 needs this series of three patches in order to enable a non-blocking
> path for irqfd injection on s390 via kvm_arch_set_irq_inatomic(). Before
> these changes, kvm_arch_set_irq_inatomic() would just return -EWOULDBLOCK
> and place all interrupts on the global work queue, which must subsequently
> be processed by a different thread. This series of patches implements an
> s390 version of inatomic and is relevant to virtio-blk and virtio-net and
> was tested against virtio-pci and virtio-ccw.
>
> The inatomic fast path cannot lose control since it is running with
> interrupts disabled. This required changing three things that the slow
> path does today. First, the adapter_indicators page needs to be mapped
> since it is accessed with interrupts disabled, so we added map/unmap
> functions. Second, access to resources shared between the fast and slow
> paths needed to be changed from mutexes and semaphores to spinlocks.
> Finally, the memory allocation on the slow path utilizes GFP_KERNEL_ACCOUNT
> but we had to implement the fast path with GFP_ATOMIC allocation. Each of
> these changes was required to prevent blocking on the fast inject
> path.
>
> s390 doesn't support a PREEMPT_RT kernel and this patch doesn't either.
> Given this fact, we are using regular spin_lock rather than
> raw_spin_lock.
>
> Statistical counters have been added to enable analysis of irq injection on
> the fast path and slow path, including io_390_inatomic, io_flic_inject_airq,
> io_set_adapter_int and io_390_inatomic_no_inject. Counters have also been
> added to analyze map/unmap of the adapter_indicator pages in non-Secure
> Execution environments and to track fencing of Fast Inject in Secure
> Execution environments. In order to take advantage of this kernel series
> with virtio-pci, a QEMU that includes the
> 's390x/pci: set kvm_msi_via_irqfd_allowed' fix is needed. Additionally,
> the guest xml needs a thread pool and threads explicitly assigned per disk
> device using the common way of defining threads for disks.
>
> Patch 1 enables map/unmap of adapter indicator pages but for Secure
> Execution environments it avoids the long term mapping.
>
Whole series:
Acked-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> v9->v10: Streamline logic in register_io_adapter()
>
> Douglas Freimuth (3):
> KVM: s390: Add map/unmap ioctl and clean mappings post-guest
> KVM: s390: Enable adapter_indicators_set to use mapped pages
> KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
>
> arch/s390/include/asm/kvm_host.h | 11 +-
> arch/s390/kvm/intercept.c | 5 +-
> arch/s390/kvm/interrupt.c | 559 ++++++++++++++++++++++++-------
> arch/s390/kvm/kvm-s390.c | 30 +-
> arch/s390/kvm/kvm-s390.h | 5 +-
> 5 files changed, 488 insertions(+), 122 deletions(-)
>