* [PATCH v3 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject
@ 2026-04-06 6:44 Douglas Freimuth
2026-04-06 6:44 ` [PATCH v3 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Douglas Freimuth @ 2026-04-06 6:44 UTC (permalink / raw)
To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
kvm, linux-s390, linux-kernel
Cc: mjrosato, freimuth
S390 needs this series of three patches to enable a non-blocking
path for irqfd injection via kvm_arch_set_irq_inatomic(). Before
these changes, kvm_arch_set_irq_inatomic() simply returned -EWOULDBLOCK
and placed every interrupt on the global work queue, to be processed
later by a different thread. This series implements an s390 version of
the inatomic path; it is relevant to virtio-blk and virtio-net and was
tested with both the virtio-pci and virtio-ccw transports.
The inatomic fast path cannot yield the CPU, since it runs with
interrupts disabled. This required changing three things that exist on
the slow path today. First, the adapter indicator pages must be mapped
up front, because they are accessed with interrupts disabled; map/unmap
functions are added for this. Second, access to resources shared between
the fast and slow paths had to be converted from mutexes and semaphores
to spinlocks. Finally, the slow path allocates with GFP_KERNEL_ACCOUNT,
but the fast path has to allocate with GFP_ATOMIC. Each of these changes
is required to prevent blocking on the fast inject path.
Statistical counters have been added to enable analysis of irq injection
on the fast and slow paths: io_390_inatomic, io_flic_inject_airq,
io_set_adapter_int and io_390_inatomic_adapter_masked. Counters have
also been added to analyze map/unmap of the adapter indicator pages in
non-Secure Execution environments and to track the fencing of Fast
Inject in Secure Execution environments. To take advantage of this
kernel series with virtio-pci, a QEMU that includes the
's390x/pci: set kvm_msi_via_irqfd_allowed' fix is needed. Additionally,
the guest XML needs an iothread pool with threads explicitly assigned
per disk device, using the usual way of defining iothreads for disks.
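As an illustration of the guest XML mentioned above (the counts and thread assignment here are example values, not part of this series), an iothread pool can be defined and a thread assigned per disk roughly like this:

```xml
<domain type='kvm'>
  <!-- define a pool of I/O threads for the guest -->
  <iothreads>2</iothreads>
  <devices>
    <disk type='block' device='disk'>
      <!-- pin this disk's I/O to iothread 1 -->
      <driver name='qemu' type='raw' iothread='1'/>
      <!-- source/target elements omitted -->
    </disk>
  </devices>
</domain>
```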
Patch 1 enables map/unmap of adapter indicator pages; in Secure
Execution environments it avoids the long-term mapping.
v2->v3: GFP_KERNEL to GFP_KERNEL_ACCOUNT in one instance of allocation.
v2->v3: Fix alignment error.
v2->v3: Increment nr_maps after new map added to list
v2->v3: kvm_s390_adapter_unmap do mark_page_dirty and set_page_dirty_lock.
v2->v3: In unmap_all_adapters_pv do mark_page_dirty, set_page_dirty_lock.
v2->v3: Move kvm_s390_unmap_all_adapters_pv() to after check if in pv.
v2->v3: Move mutex_unlock after map/unmap in modify_io_adapter.
v2->v3: Add spin_lock to get maps->lock in kvm_s390_unmap_all_adapters_pv.
v2->v3: Only put_page(ind_page) if !ind_info which allocates ind_page.
v2->v3: Move the spin_lock inside of the adapter_indicators_set in patch 2.
v2->v3: On last conditional in kvm_arch_set_irq_inatomic, add else clause.
v2->v3: Clear ind/summ bits if inject fails upon return to inatomic.
Douglas Freimuth (3):
Add map/unmap ioctl and clean mappings post-guest
Enable adapter_indicators_set to use mapped pages
Introducing kvm_arch_set_irq_inatomic fast inject
arch/s390/include/asm/kvm_host.h | 11 +-
arch/s390/kvm/interrupt.c | 392 ++++++++++++++++++++++++++-----
arch/s390/kvm/kvm-s390.c | 51 +++-
arch/s390/kvm/kvm-s390.h | 3 +-
4 files changed, 387 insertions(+), 70 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest
2026-04-06 6:44 [PATCH v3 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
@ 2026-04-06 6:44 ` Douglas Freimuth
2026-04-06 13:39 ` Matthew Rosato
2026-04-06 6:44 ` [PATCH v3 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
2026-04-06 6:44 ` [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
2 siblings, 1 reply; 10+ messages in thread
From: Douglas Freimuth @ 2026-04-06 6:44 UTC (permalink / raw)
To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
kvm, linux-s390, linux-kernel
Cc: mjrosato, freimuth
S390 needs map/unmap ioctls that map the adapter set indicator pages
so the pages can be accessed while interrupts are disabled. The
mappings are cleaned up when the guest is removed.
The map/unmap ioctls are fenced in Secure Execution environments in
order to avoid long-term pinning there; in those environments the path
of execution available before this patch is followed.
Statistical counters for the map/unmap of adapter indicator pages are
added. They can be used to analyze map/unmap activity in non-Secure
Execution environments; in Secure Execution environments the counters
are never incremented, since the adapter indicator pages are not
mapped.
Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
arch/s390/include/asm/kvm_host.h | 5 +
arch/s390/kvm/interrupt.c | 151 +++++++++++++++++++++++++------
arch/s390/kvm/kvm-s390.c | 27 ++++++
3 files changed, 157 insertions(+), 26 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3039c88daa63..a078420751a1 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -448,6 +448,8 @@ struct kvm_vcpu_arch {
struct kvm_vm_stat {
struct kvm_vm_stat_generic generic;
u64 inject_io;
+ u64 io_390_adapter_map;
+ u64 io_390_adapter_unmap;
u64 inject_float_mchk;
u64 inject_pfault_done;
u64 inject_service_signal;
@@ -479,6 +481,9 @@ struct s390_io_adapter {
bool masked;
bool swap;
bool suppressible;
+ spinlock_t maps_lock;
+ struct list_head maps;
+ unsigned int nr_maps;
};
#define MAX_S390_IO_ADAPTERS ((MAX_ISC + 1) * 8)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 7cb8ce833b62..47bd6361c849 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2426,6 +2426,9 @@ static int register_io_adapter(struct kvm_device *dev,
if (!adapter)
return -ENOMEM;
+ INIT_LIST_HEAD(&adapter->maps);
+ spin_lock_init(&adapter->maps_lock);
+ adapter->nr_maps = 0;
adapter->id = adapter_info.id;
adapter->isc = adapter_info.isc;
adapter->maskable = adapter_info.maskable;
@@ -2450,12 +2453,109 @@ int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked)
return ret;
}
+static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
+{
+ struct mm_struct *mm = kvm->mm;
+ struct page *page = NULL;
+ int locked = 1;
+
+ if (mmget_not_zero(mm)) {
+ mmap_read_lock(mm);
+ get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
+ &page, &locked);
+ if (locked)
+ mmap_read_unlock(mm);
+ mmput(mm);
+ }
+
+ return page;
+}
+
+static int kvm_s390_adapter_map(struct kvm *kvm, unsigned int id, __u64 addr)
+{
+ struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+ struct s390_map_info *map;
+ unsigned long flags;
+ int ret;
+
+ if (!adapter || !addr)
+ return -EINVAL;
+
+ map = kzalloc_obj(*map, GFP_KERNEL_ACCOUNT);
+ if (!map)
+ return -ENOMEM;
+
+ map->page = get_map_page(kvm, addr);
+ if (!map->page) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ INIT_LIST_HEAD(&map->list);
+ map->guest_addr = addr;
+ map->addr = addr;
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ if (adapter->nr_maps < MAX_S390_ADAPTER_MAPS) {
+ list_add_tail(&map->list, &adapter->maps);
+ adapter->nr_maps++;
+ ret = 0;
+ } else {
+ put_page(map->page);
+ ret = -EINVAL;
+ }
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+out:
+ if (ret)
+ kfree(map);
+ return ret;
+}
+
+static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
+{
+ struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+ struct s390_map_info *map, *tmp;
+ unsigned long flags;
+ int found = 0, idx;
+
+ if (!adapter || !addr)
+ return -EINVAL;
+
+ list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
+ if (map->guest_addr == addr) {
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ found = 1;
+ adapter->nr_maps--;
+ list_del(&map->list);
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+ idx = srcu_read_lock(&kvm->srcu);
+ mark_page_dirty(kvm, map->addr >> PAGE_SHIFT);
+ set_page_dirty_lock(map->page);
+ srcu_read_unlock(&kvm->srcu, idx);
+ put_page(map->page);
+ kfree(map);
+ break;
+ }
+ }
+
+ return found ? 0 : -ENOENT;
+}
+
void kvm_s390_destroy_adapters(struct kvm *kvm)
{
int i;
+ struct s390_map_info *map, *tmp;
- for (i = 0; i < MAX_S390_IO_ADAPTERS; i++)
+ for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
+ if (!kvm->arch.adapters[i])
+ continue;
+ list_for_each_entry_safe(map, tmp,
+ &kvm->arch.adapters[i]->maps, list) {
+ list_del(&map->list);
+ put_page(map->page);
+ kfree(map);
+ }
kfree(kvm->arch.adapters[i]);
+ }
}
static int modify_io_adapter(struct kvm_device *dev,
@@ -2463,7 +2563,8 @@ static int modify_io_adapter(struct kvm_device *dev,
{
struct kvm_s390_io_adapter_req req;
struct s390_io_adapter *adapter;
- int ret;
+ __u64 host_addr;
+ int ret, idx;
if (copy_from_user(&req, (void __user *)attr->addr, sizeof(req)))
return -EFAULT;
@@ -2477,14 +2578,30 @@ static int modify_io_adapter(struct kvm_device *dev,
if (ret > 0)
ret = 0;
break;
- /*
- * The following operations are no longer needed and therefore no-ops.
- * The gpa to hva translation is done when an IRQ route is set up. The
- * set_irq code uses get_user_pages_remote() to do the actual write.
- */
case KVM_S390_IO_ADAPTER_MAP:
case KVM_S390_IO_ADAPTER_UNMAP:
- ret = 0;
+ /* If in Secure Execution mode do not long term pin. */
+ mutex_lock(&dev->kvm->lock);
+ if (kvm_s390_pv_is_protected(dev->kvm)) {
+ mutex_unlock(&dev->kvm->lock);
+ return 0;
+ }
+ idx = srcu_read_lock(&dev->kvm->srcu);
+ host_addr = gpa_to_hva(dev->kvm, req.addr);
+ if (kvm_is_error_hva(host_addr)) {
+ srcu_read_unlock(&dev->kvm->srcu, idx);
+ return -EFAULT;
+ }
+ srcu_read_unlock(&dev->kvm->srcu, idx);
+ if (req.type == KVM_S390_IO_ADAPTER_MAP) {
+ dev->kvm->stat.io_390_adapter_map++;
+ ret = kvm_s390_adapter_map(dev->kvm, req.id, host_addr);
+ mutex_unlock(&dev->kvm->lock);
+ } else {
+ dev->kvm->stat.io_390_adapter_unmap++;
+ ret = kvm_s390_adapter_unmap(dev->kvm, req.id, host_addr);
+ mutex_unlock(&dev->kvm->lock);
+ }
break;
default:
ret = -EINVAL;
@@ -2730,24 +2847,6 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
}
-static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
-{
- struct mm_struct *mm = kvm->mm;
- struct page *page = NULL;
- int locked = 1;
-
- if (mmget_not_zero(mm)) {
- mmap_read_lock(mm);
- get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
- &page, &locked);
- if (locked)
- mmap_read_unlock(mm);
- mmput(mm);
- }
-
- return page;
-}
-
static int adapter_indicators_set(struct kvm *kvm,
struct s390_io_adapter *adapter,
struct kvm_s390_adapter_int *adapter_int)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d7838334a338..4eada48c6e27 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -68,6 +68,8 @@
const struct kvm_stats_desc kvm_vm_stats_desc[] = {
KVM_GENERIC_VM_STATS(),
STATS_DESC_COUNTER(VM, inject_io),
+ STATS_DESC_COUNTER(VM, io_390_adapter_map),
+ STATS_DESC_COUNTER(VM, io_390_adapter_unmap),
STATS_DESC_COUNTER(VM, inject_float_mchk),
STATS_DESC_COUNTER(VM, inject_pfault_done),
STATS_DESC_COUNTER(VM, inject_service_signal),
@@ -2491,6 +2493,30 @@ static int kvm_s390_pv_dmp(struct kvm *kvm, struct kvm_pv_cmd *cmd,
return r;
}
+static void kvm_s390_unmap_all_adapters_pv(struct kvm *kvm)
+{
+ unsigned long flags;
+ struct s390_map_info *map, *tmp;
+ int i, idx;
+
+ for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
+ if (!kvm->arch.adapters[i])
+ continue;
+ list_for_each_entry_safe(map, tmp,
+ &kvm->arch.adapters[i]->maps, list) {
+ spin_lock_irqsave(&kvm->arch.adapters[i]->maps_lock, flags);
+ list_del(&map->list);
+ spin_unlock_irqrestore(&kvm->arch.adapters[i]->maps_lock, flags);
+ idx = srcu_read_lock(&kvm->srcu);
+ mark_page_dirty(kvm, map->addr >> PAGE_SHIFT);
+ set_page_dirty_lock(map->page);
+ srcu_read_unlock(&kvm->srcu, idx);
+ put_page(map->page);
+ kfree(map);
+ }
+ }
+}
+
static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
{
const bool need_lock = (cmd->cmd != KVM_PV_ASYNC_CLEANUP_PERFORM);
@@ -2507,6 +2533,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
if (kvm_s390_pv_is_protected(kvm))
break;
+ kvm_s390_unmap_all_adapters_pv(kvm);
mmap_write_lock(kvm->mm);
/*
* Disable creation of new THPs. Existing THPs can stay, they
--
2.52.0
* [PATCH v3 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages
2026-04-06 6:44 [PATCH v3 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
2026-04-06 6:44 ` [PATCH v3 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
@ 2026-04-06 6:44 ` Douglas Freimuth
2026-04-06 15:33 ` Matthew Rosato
2026-04-06 6:44 ` [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
2 siblings, 1 reply; 10+ messages in thread
From: Douglas Freimuth @ 2026-04-06 6:44 UTC (permalink / raw)
To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
kvm, linux-s390, linux-kernel
Cc: mjrosato, freimuth
The s390 adapter_indicators_set() function needs to be able to use
mapped pages so that work can be processed on a fast path while
interrupts are disabled. If the adapter indicator pages are not mapped,
local mapping is done on a slow path, as before this patch. Secure
Execution environments, for example, take the local-mapping path just
as they did before this patch.
Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
arch/s390/kvm/interrupt.c | 91 ++++++++++++++++++++++++++++-----------
1 file changed, 66 insertions(+), 25 deletions(-)
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 47bd6361c849..f3183c9ec7f1 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2847,41 +2847,82 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
}
+static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter,
+ u64 addr)
+{
+ struct s390_map_info *map;
+
+ if (!adapter)
+ return NULL;
+
+ list_for_each_entry(map, &adapter->maps, list) {
+ if (map->guest_addr == addr)
+ return map;
+ }
+ return NULL;
+}
+
static int adapter_indicators_set(struct kvm *kvm,
struct s390_io_adapter *adapter,
struct kvm_s390_adapter_int *adapter_int)
{
unsigned long bit;
int summary_set, idx;
- struct page *ind_page, *summary_page;
+ struct s390_map_info *ind_info, *summary_info;
void *map;
+ struct page *ind_page, *summary_page;
+ unsigned long flags;
- ind_page = get_map_page(kvm, adapter_int->ind_addr);
- if (!ind_page)
- return -1;
- summary_page = get_map_page(kvm, adapter_int->summary_addr);
- if (!summary_page) {
- put_page(ind_page);
- return -1;
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ ind_info = get_map_info(adapter, adapter_int->ind_addr);
+ if (!ind_info) {
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+ ind_page = get_map_page(kvm, adapter_int->ind_addr);
+ if (!ind_page)
+ return -1;
+ idx = srcu_read_lock(&kvm->srcu);
+ map = page_address(ind_page);
+ bit = get_ind_bit(adapter_int->ind_addr,
+ adapter_int->ind_offset, adapter->swap);
+ set_bit(bit, map);
+ mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
+ set_page_dirty_lock(ind_page);
+ srcu_read_unlock(&kvm->srcu, idx);
+ } else {
+ map = page_address(ind_info->page);
+ bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
+ set_bit(bit, map);
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+ }
+ spin_lock_irqsave(&adapter->maps_lock, flags);
+ summary_info = get_map_info(adapter, adapter_int->summary_addr);
+ if (!summary_info) {
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
+ summary_page = get_map_page(kvm, adapter_int->summary_addr);
+ if (!summary_page && !ind_info) {
+ put_page(ind_page);
+ return -1;
+ }
+ idx = srcu_read_lock(&kvm->srcu);
+ map = page_address(summary_page);
+ bit = get_ind_bit(adapter_int->summary_addr,
+ adapter_int->summary_offset, adapter->swap);
+ summary_set = test_and_set_bit(bit, map);
+ mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
+ set_page_dirty_lock(summary_page);
+ srcu_read_unlock(&kvm->srcu, idx);
+ } else {
+ map = page_address(summary_info->page);
+ bit = get_ind_bit(summary_info->addr, adapter_int->summary_offset,
+ adapter->swap);
+ summary_set = test_and_set_bit(bit, map);
+ spin_unlock_irqrestore(&adapter->maps_lock, flags);
}
- idx = srcu_read_lock(&kvm->srcu);
- map = page_address(ind_page);
- bit = get_ind_bit(adapter_int->ind_addr,
- adapter_int->ind_offset, adapter->swap);
- set_bit(bit, map);
- mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
- set_page_dirty_lock(ind_page);
- map = page_address(summary_page);
- bit = get_ind_bit(adapter_int->summary_addr,
- adapter_int->summary_offset, adapter->swap);
- summary_set = test_and_set_bit(bit, map);
- mark_page_dirty(kvm, adapter_int->summary_gaddr >> PAGE_SHIFT);
- set_page_dirty_lock(summary_page);
- srcu_read_unlock(&kvm->srcu, idx);
-
- put_page(ind_page);
- put_page(summary_page);
+ if (!ind_info)
+ put_page(ind_page);
+ if (!summary_info)
+ put_page(summary_page);
return summary_set ? 0 : 1;
}
--
2.52.0
* [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
2026-04-06 6:44 [PATCH v3 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
2026-04-06 6:44 ` [PATCH v3 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
2026-04-06 6:44 ` [PATCH v3 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
@ 2026-04-06 6:44 ` Douglas Freimuth
2026-04-06 16:15 ` Matthew Rosato
2026-04-06 16:25 ` Matthew Rosato
2 siblings, 2 replies; 10+ messages in thread
From: Douglas Freimuth @ 2026-04-06 6:44 UTC (permalink / raw)
To: borntraeger, imbrenda, frankja, david, hca, gor, agordeev, svens,
kvm, linux-s390, linux-kernel
Cc: mjrosato, freimuth
S390 needs a fast path for irq injection, so this patch introduces an
s390 implementation of kvm_arch_set_irq_inatomic(). Instead of placing
every interrupt on the global work queue, as is done today, interrupts
can now be injected directly from the irqfd wakeup context.
The inatomic fast path cannot yield the CPU, since it runs with
interrupts disabled. This required changing three things that exist on
the slow path today. First, the adapter indicator pages must be mapped
up front, because they are accessed with interrupts disabled; map/unmap
functions are added for this. Second, access to resources shared between
the fast and slow paths had to be converted from mutexes and semaphores
to spinlocks. Finally, the slow path allocates with GFP_KERNEL_ACCOUNT,
but the fast path has to allocate with GFP_ATOMIC. Each of these changes
is required to prevent blocking on the fast inject path.
Fast Inject is fenced in Secure Execution environments by not mapping
the adapter indicator pages there; in those environments the path of
execution available before this patch is followed.
Statistical counters have been added to enable analysis of irq injection
on the fast and slow paths: io_390_inatomic, io_flic_inject_airq,
io_set_adapter_int and io_390_inatomic_adapter_masked.
Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
---
arch/s390/include/asm/kvm_host.h | 6 +-
arch/s390/kvm/interrupt.c | 160 +++++++++++++++++++++++++++----
arch/s390/kvm/kvm-s390.c | 24 ++++-
arch/s390/kvm/kvm-s390.h | 3 +-
4 files changed, 169 insertions(+), 24 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index a078420751a1..90b1a19074ce 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -359,7 +359,7 @@ struct kvm_s390_float_interrupt {
struct kvm_s390_mchk_info mchk;
struct kvm_s390_ext_info srv_signal;
int last_sleep_cpu;
- struct mutex ais_lock;
+ spinlock_t ais_lock;
u8 simm;
u8 nimm;
};
@@ -450,6 +450,10 @@ struct kvm_vm_stat {
u64 inject_io;
u64 io_390_adapter_map;
u64 io_390_adapter_unmap;
+ u64 io_390_inatomic;
+ u64 io_flic_inject_airq;
+ u64 io_set_adapter_int;
+ u64 io_390_inatomic_adapter_masked;
u64 inject_float_mchk;
u64 inject_pfault_done;
u64 inject_service_signal;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index f3183c9ec7f1..ead54f968a79 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -1963,15 +1963,10 @@ static int __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti)
}
int kvm_s390_inject_vm(struct kvm *kvm,
- struct kvm_s390_interrupt *s390int)
+ struct kvm_s390_interrupt *s390int, struct kvm_s390_interrupt_info *inti)
{
- struct kvm_s390_interrupt_info *inti;
int rc;
- inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
- if (!inti)
- return -ENOMEM;
-
inti->type = s390int->type;
switch (inti->type) {
case KVM_S390_INT_VIRTIO:
@@ -2007,6 +2002,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
2);
rc = __inject_vm(kvm, inti);
+	/* inti is allocated by the caller; on failure it is freed here */
if (rc)
kfree(inti);
return rc;
@@ -2284,6 +2280,7 @@ static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
struct kvm_s390_ais_all ais;
+ unsigned long flags;
if (attr->attr < sizeof(ais))
return -EINVAL;
@@ -2291,10 +2288,10 @@ static int flic_ais_mode_get_all(struct kvm *kvm, struct kvm_device_attr *attr)
if (!test_kvm_facility(kvm, 72))
return -EOPNOTSUPP;
- mutex_lock(&fi->ais_lock);
+ spin_lock_irqsave(&fi->ais_lock, flags);
ais.simm = fi->simm;
ais.nimm = fi->nimm;
- mutex_unlock(&fi->ais_lock);
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
if (copy_to_user((void __user *)attr->addr, &ais, sizeof(ais)))
return -EFAULT;
@@ -2638,6 +2635,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
struct kvm_s390_ais_req req;
int ret = 0;
+ unsigned long flags;
if (!test_kvm_facility(kvm, 72))
return -EOPNOTSUPP;
@@ -2654,7 +2652,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
2 : KVM_S390_AIS_MODE_SINGLE :
KVM_S390_AIS_MODE_ALL, req.mode);
- mutex_lock(&fi->ais_lock);
+ spin_lock_irqsave(&fi->ais_lock, flags);
switch (req.mode) {
case KVM_S390_AIS_MODE_ALL:
fi->simm &= ~AIS_MODE_MASK(req.isc);
@@ -2667,7 +2665,7 @@ static int modify_ais_mode(struct kvm *kvm, struct kvm_device_attr *attr)
default:
ret = -EINVAL;
}
- mutex_unlock(&fi->ais_lock);
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
return ret;
}
@@ -2681,25 +2679,33 @@ static int kvm_s390_inject_airq(struct kvm *kvm,
.parm = 0,
.parm64 = isc_to_int_word(adapter->isc),
};
+ struct kvm_s390_interrupt_info *inti;
+ unsigned long flags;
+
int ret = 0;
+ inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
+ if (!inti)
+ return -ENOMEM;
+
if (!test_kvm_facility(kvm, 72) || !adapter->suppressible)
- return kvm_s390_inject_vm(kvm, &s390int);
+ return kvm_s390_inject_vm(kvm, &s390int, inti);
- mutex_lock(&fi->ais_lock);
+ spin_lock_irqsave(&fi->ais_lock, flags);
if (fi->nimm & AIS_MODE_MASK(adapter->isc)) {
trace_kvm_s390_airq_suppressed(adapter->id, adapter->isc);
+ kfree(inti);
goto out;
}
- ret = kvm_s390_inject_vm(kvm, &s390int);
+ ret = kvm_s390_inject_vm(kvm, &s390int, inti);
if (!ret && (fi->simm & AIS_MODE_MASK(adapter->isc))) {
fi->nimm |= AIS_MODE_MASK(adapter->isc);
trace_kvm_s390_modify_ais_mode(adapter->isc,
KVM_S390_AIS_MODE_SINGLE, 2);
}
out:
- mutex_unlock(&fi->ais_lock);
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
return ret;
}
@@ -2708,6 +2714,8 @@ static int flic_inject_airq(struct kvm *kvm, struct kvm_device_attr *attr)
unsigned int id = attr->attr;
struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
+ kvm->stat.io_flic_inject_airq++;
+
if (!adapter)
return -EINVAL;
@@ -2718,6 +2726,7 @@ static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
{
struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
struct kvm_s390_ais_all ais;
+ unsigned long flags;
if (!test_kvm_facility(kvm, 72))
return -EOPNOTSUPP;
@@ -2725,10 +2734,10 @@ static int flic_ais_mode_set_all(struct kvm *kvm, struct kvm_device_attr *attr)
if (copy_from_user(&ais, (void __user *)attr->addr, sizeof(ais)))
return -EFAULT;
- mutex_lock(&fi->ais_lock);
+ spin_lock_irqsave(&fi->ais_lock, flags);
fi->simm = ais.simm;
fi->nimm = ais.nimm;
- mutex_unlock(&fi->ais_lock);
+ spin_unlock_irqrestore(&fi->ais_lock, flags);
return 0;
}
@@ -2894,6 +2903,7 @@ static int adapter_indicators_set(struct kvm *kvm,
set_bit(bit, map);
spin_unlock_irqrestore(&adapter->maps_lock, flags);
}
+
spin_lock_irqsave(&adapter->maps_lock, flags);
summary_info = get_map_info(adapter, adapter_int->summary_addr);
if (!summary_info) {
@@ -2926,6 +2936,44 @@ static int adapter_indicators_set(struct kvm *kvm,
return summary_set ? 0 : 1;
}
+static int adapter_indicators_set_fast(struct kvm *kvm,
+ struct s390_io_adapter *adapter,
+ struct kvm_s390_adapter_int *adapter_int,
+ int setbit)
+{
+ unsigned long bit;
+ int summary_set;
+ struct s390_map_info *ind_info, *summary_info;
+ void *map;
+
+ spin_lock(&adapter->maps_lock);
+ ind_info = get_map_info(adapter, adapter_int->ind_addr);
+ if (!ind_info) {
+ spin_unlock(&adapter->maps_lock);
+ return -EWOULDBLOCK;
+ }
+ map = page_address(ind_info->page);
+ bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
+ if (setbit)
+ set_bit(bit, map);
+ else
+ clear_bit(bit, map);
+ summary_info = get_map_info(adapter, adapter_int->summary_addr);
+ if (!summary_info) {
+ spin_unlock(&adapter->maps_lock);
+ return -EWOULDBLOCK;
+ }
+ map = page_address(summary_info->page);
+ bit = get_ind_bit(summary_info->addr, adapter_int->summary_offset,
+ adapter->swap);
+ if (setbit)
+ summary_set = test_and_set_bit(bit, map);
+ else
+ summary_set = test_and_clear_bit(bit, map);
+ spin_unlock(&adapter->maps_lock);
+ return summary_set ? 0 : 1;
+}
+
/*
* < 0 - not injected due to error
* = 0 - coalesced, summary indicator already active
@@ -2938,6 +2986,8 @@ static int set_adapter_int(struct kvm_kernel_irq_routing_entry *e,
int ret;
struct s390_io_adapter *adapter;
+ kvm->stat.io_set_adapter_int++;
+
/* We're only interested in the 0->1 transition. */
if (!level)
return 0;
@@ -3006,7 +3056,6 @@ int kvm_set_routing_entry(struct kvm *kvm,
int idx;
switch (ue->type) {
- /* we store the userspace addresses instead of the guest addresses */
case KVM_IRQ_ROUTING_S390_ADAPTER:
if (kvm_is_ucontrol(kvm))
return -EINVAL;
@@ -3597,3 +3646,80 @@ int __init kvm_s390_gib_init(u8 nisc)
out:
return rc;
}
+
+/*
+ * kvm_arch_set_irq_inatomic: fast-path for irqfd injection
+ */
+int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm *kvm, int irq_source_id, int level,
+ bool line_status)
+{
+ int ret, setbit;
+ struct s390_io_adapter *adapter;
+ struct kvm_s390_float_interrupt *fi = &kvm->arch.float_int;
+ struct kvm_s390_interrupt_info *inti;
+ struct kvm_s390_interrupt s390int = {
+ .type = KVM_S390_INT_IO(1, 0, 0, 0),
+ .parm = 0,
+ };
+
+ kvm->stat.io_390_inatomic++;
+
+ /* We're only interested in the 0->1 transition. */
+ if (!level)
+ return -EWOULDBLOCK;
+ if (e->type != KVM_IRQ_ROUTING_S390_ADAPTER)
+ return -EWOULDBLOCK;
+
+ adapter = get_io_adapter(kvm, e->adapter.adapter_id);
+ if (!adapter)
+ return -EWOULDBLOCK;
+
+ s390int.parm64 = isc_to_int_word(adapter->isc);
+ setbit = 1;
+ ret = adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+ if (ret < 0)
+ return -EWOULDBLOCK;
+ if (!ret || adapter->masked) {
+ kvm->stat.io_390_inatomic_adapter_masked++;
+ return 0;
+ }
+
+ inti = kzalloc_obj(*inti, GFP_ATOMIC);
+ if (!inti)
+ return -EWOULDBLOCK;
+
+ if (!test_kvm_facility(kvm, 72) || !adapter->suppressible) {
+ ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+ if (ret == 0) {
+ return ret;
+ } else {
+ setbit = 0;
+ adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+ return -EWOULDBLOCK;
+ }
+ }
+
+ spin_lock(&fi->ais_lock);
+ if (fi->nimm & AIS_MODE_MASK(adapter->isc)) {
+ trace_kvm_s390_airq_suppressed(adapter->id, adapter->isc);
+ kfree(inti);
+ goto out;
+ }
+
+ ret = kvm_s390_inject_vm(kvm, &s390int, inti);
+ if (!ret && (fi->simm & AIS_MODE_MASK(adapter->isc))) {
+ fi->nimm |= AIS_MODE_MASK(adapter->isc);
+ trace_kvm_s390_modify_ais_mode(adapter->isc,
+ KVM_S390_AIS_MODE_SINGLE, 2);
+ } else if (ret) {
+ spin_unlock(&fi->ais_lock);
+ setbit = 0;
+ adapter_indicators_set_fast(kvm, adapter, &e->adapter, setbit);
+ return -EWOULDBLOCK;
+ }
+
+out:
+ spin_unlock(&fi->ais_lock);
+ return 0;
+}
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4eada48c6e27..72d083e9afa8 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -70,6 +70,10 @@ const struct kvm_stats_desc kvm_vm_stats_desc[] = {
STATS_DESC_COUNTER(VM, inject_io),
STATS_DESC_COUNTER(VM, io_390_adapter_map),
STATS_DESC_COUNTER(VM, io_390_adapter_unmap),
+ STATS_DESC_COUNTER(VM, io_390_inatomic),
+ STATS_DESC_COUNTER(VM, io_flic_inject_airq),
+ STATS_DESC_COUNTER(VM, io_set_adapter_int),
+ STATS_DESC_COUNTER(VM, io_390_inatomic_adapter_masked),
STATS_DESC_COUNTER(VM, inject_float_mchk),
STATS_DESC_COUNTER(VM, inject_pfault_done),
STATS_DESC_COUNTER(VM, inject_service_signal),
@@ -2869,6 +2873,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
void __user *argp = (void __user *)arg;
struct kvm_device_attr attr;
int r;
+ struct kvm_s390_interrupt_info *inti;
switch (ioctl) {
case KVM_S390_INTERRUPT: {
@@ -2877,7 +2882,10 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
r = -EFAULT;
if (copy_from_user(&s390int, argp, sizeof(s390int)))
break;
- r = kvm_s390_inject_vm(kvm, &s390int);
+ inti = kzalloc_obj(*inti, GFP_KERNEL_ACCOUNT);
+ if (!inti)
+ return -ENOMEM;
+ r = kvm_s390_inject_vm(kvm, &s390int, inti);
break;
}
case KVM_CREATE_IRQCHIP: {
@@ -3275,7 +3283,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
mutex_unlock(&kvm->lock);
}
- mutex_init(&kvm->arch.float_int.ais_lock);
+ spin_lock_init(&kvm->arch.float_int.ais_lock);
spin_lock_init(&kvm->arch.float_int.lock);
for (i = 0; i < FIRQ_LIST_COUNT; i++)
INIT_LIST_HEAD(&kvm->arch.float_int.lists[i]);
@@ -4396,11 +4404,16 @@ int kvm_s390_try_set_tod_clock(struct kvm *kvm, const struct kvm_s390_vm_tod_clo
return 1;
}
-static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
- unsigned long token)
+static int __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
+ unsigned long token)
{
struct kvm_s390_interrupt inti;
struct kvm_s390_irq irq;
+ struct kvm_s390_interrupt_info *inti_mem;
+
+ inti_mem = kzalloc_obj(*inti_mem, GFP_KERNEL_ACCOUNT);
+ if (!inti_mem)
+ return -ENOMEM;
if (start_token) {
irq.u.ext.ext_params2 = token;
@@ -4409,8 +4422,9 @@ static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
} else {
inti.type = KVM_S390_INT_PFAULT_DONE;
inti.parm64 = token;
- WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti));
+ WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti, inti_mem));
}
+ return 0;
}
bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index bf1d7798c1af..2f2da868a040 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -373,7 +373,8 @@ int __must_check kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu);
void kvm_s390_clear_float_irqs(struct kvm *kvm);
int __must_check kvm_s390_inject_vm(struct kvm *kvm,
- struct kvm_s390_interrupt *s390int);
+ struct kvm_s390_interrupt *s390int,
+ struct kvm_s390_interrupt_info *inti);
int __must_check kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu,
struct kvm_s390_irq *irq);
static inline int kvm_s390_inject_prog_irq(struct kvm_vcpu *vcpu,
--
2.52.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest
2026-04-06 6:44 ` [PATCH v3 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
@ 2026-04-06 13:39 ` Matthew Rosato
0 siblings, 0 replies; 10+ messages in thread
From: Matthew Rosato @ 2026-04-06 13:39 UTC (permalink / raw)
To: Douglas Freimuth, borntraeger, imbrenda, frankja, david, hca, gor,
agordeev, svens, kvm, linux-s390, linux-kernel
On 4/6/26 2:44 AM, Douglas Freimuth wrote:
> S390 needs map/unmap ioctls, which map the adapter set
> indicator pages, so the pages can be accessed when interrupts are
> disabled. The mappings are cleaned up when the guest is removed.
>
> The map/unmap ioctls are fenced in order to avoid long-term pinning
> in Secure Execution environments. In Secure Execution environments,
> the path of execution that existed before this patch is followed.
>
> Statistical counters for the map/unmap functions for adapter indicator
> pages are added. The counters can be used to analyze map/unmap activity
> in non-Secure Execution environments; in Secure Execution environments
> the counters are not incremented, since the adapter indicator pages are
> not mapped there.
>
> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
[...]
> +static int kvm_s390_adapter_unmap(struct kvm *kvm, unsigned int id, __u64 addr)
> +{
> + struct s390_io_adapter *adapter = get_io_adapter(kvm, id);
> + struct s390_map_info *map, *tmp;
> + unsigned long flags;
> + int found = 0, idx;
> +
> + if (!adapter || !addr)
> + return -EINVAL;
> +
> + list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
> + if (map->guest_addr == addr) {
> + spin_lock_irqsave(&adapter->maps_lock, flags);
This lock needs to be acquired before the list_for_each_entry_safe call,
so that it protects the list from changing until after you delete the
specified entry.
> + found = 1;
> + adapter->nr_maps--;
> + list_del(&map->list);
> + spin_unlock_irqrestore(&adapter->maps_lock, flags);
> + idx = srcu_read_lock(&kvm->srcu);
> + mark_page_dirty(kvm, map->addr >> PAGE_SHIFT);
This isn't the right address to mark dirty. Per
898885477e0f KVM: s390: Use guest address to mark guest page dirty
You need to keep track of the gaddr and use that to mark the page dirty.
> + set_page_dirty_lock(map->page);
> + srcu_read_unlock(&kvm->srcu, idx);
> + put_page(map->page);
> + kfree(map);
> + break;
> + }
> + }
> +
> + return found ? 0 : -ENOENT;
> +}
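Concretely, the lock-ordering fix described above would move the lock
acquisition outside the loop, roughly like this (a sketch of the intent,
not final code; the dirty-page handling is elided):

```
	spin_lock_irqsave(&adapter->maps_lock, flags);
	list_for_each_entry_safe(map, tmp, &adapter->maps, list) {
		if (map->guest_addr == addr) {
			found = 1;
			adapter->nr_maps--;
			list_del(&map->list);
			break;
		}
	}
	spin_unlock_irqrestore(&adapter->maps_lock, flags);
	/* dirty marking, put_page() and kfree() of the found entry
	 * can then happen here, after the lock is dropped */
```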
> +
> void kvm_s390_destroy_adapters(struct kvm *kvm)
> {
> int i;
> + struct s390_map_info *map, *tmp;
>
> - for (i = 0; i < MAX_S390_IO_ADAPTERS; i++)
> + for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
> + if (!kvm->arch.adapters[i])
> + continue;
Should also be holding the maps_lock over this until the list is emptied
(i.e., right up until the kfree(kvm->arch.adapters[i]) call)
> + list_for_each_entry_safe(map, tmp,
> + &kvm->arch.adapters[i]->maps, list) {
> + list_del(&map->list);
> + put_page(map->page);
> + kfree(map);
> + }
> kfree(kvm->arch.adapters[i]);
> + }
> }
>
> static int modify_io_adapter(struct kvm_device *dev,
> @@ -2463,7 +2563,8 @@ static int modify_io_adapter(struct kvm_device *dev,
> {
> struct kvm_s390_io_adapter_req req;
> struct s390_io_adapter *adapter;
> - int ret;
> + __u64 host_addr;
> + int ret, idx;
>
> if (copy_from_user(&req, (void __user *)attr->addr, sizeof(req)))
> return -EFAULT;
> @@ -2477,14 +2578,30 @@ static int modify_io_adapter(struct kvm_device *dev,
> if (ret > 0)
> ret = 0;
> break;
> - /*
> - * The following operations are no longer needed and therefore no-ops.
> - * The gpa to hva translation is done when an IRQ route is set up. The
> - * set_irq code uses get_user_pages_remote() to do the actual write.
> - */
> case KVM_S390_IO_ADAPTER_MAP:
> case KVM_S390_IO_ADAPTER_UNMAP:
> - ret = 0;
> + /* If in Secure Execution mode do not long term pin. */
> + mutex_lock(&dev->kvm->lock);
> + if (kvm_s390_pv_is_protected(dev->kvm)) {
> + mutex_unlock(&dev->kvm->lock);
> + return 0;
> + }
> + idx = srcu_read_lock(&dev->kvm->srcu);
> + host_addr = gpa_to_hva(dev->kvm, req.addr);
> + if (kvm_is_error_hva(host_addr)) {
> + srcu_read_unlock(&dev->kvm->srcu, idx);
dev->kvm->lock also needs to be dropped here.
> + return -EFAULT;
> + }
> + srcu_read_unlock(&dev->kvm->srcu, idx);
> + if (req.type == KVM_S390_IO_ADAPTER_MAP) {
> + dev->kvm->stat.io_390_adapter_map++;
> + ret = kvm_s390_adapter_map(dev->kvm, req.id, host_addr);
> + mutex_unlock(&dev->kvm->lock);
This unlock...
> + } else {
> + dev->kvm->stat.io_390_adapter_unmap++;
> + ret = kvm_s390_adapter_unmap(dev->kvm, req.id, host_addr);
> + mutex_unlock(&dev->kvm->lock);
and this unlock...
> + }
Could be combined and moved here
> break;
> default:
> ret = -EINVAL;
> @@ -2730,24 +2847,6 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
> return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
> }
>
> -static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
> -{
> - struct mm_struct *mm = kvm->mm;
> - struct page *page = NULL;
> - int locked = 1;
> -
> - if (mmget_not_zero(mm)) {
> - mmap_read_lock(mm);
> - get_user_pages_remote(mm, uaddr, 1, FOLL_WRITE,
> - &page, &locked);
> - if (locked)
> - mmap_read_unlock(mm);
> - mmput(mm);
> - }
> -
> - return page;
> -}
> -
> static int adapter_indicators_set(struct kvm *kvm,
> struct s390_io_adapter *adapter,
> struct kvm_s390_adapter_int *adapter_int)
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index d7838334a338..4eada48c6e27 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -68,6 +68,8 @@
> const struct kvm_stats_desc kvm_vm_stats_desc[] = {
> KVM_GENERIC_VM_STATS(),
> STATS_DESC_COUNTER(VM, inject_io),
> + STATS_DESC_COUNTER(VM, io_390_adapter_map),
> + STATS_DESC_COUNTER(VM, io_390_adapter_unmap),
> STATS_DESC_COUNTER(VM, inject_float_mchk),
> STATS_DESC_COUNTER(VM, inject_pfault_done),
> STATS_DESC_COUNTER(VM, inject_service_signal),
> @@ -2491,6 +2493,30 @@ static int kvm_s390_pv_dmp(struct kvm *kvm, struct kvm_pv_cmd *cmd,
> return r;
> }
>
> +static void kvm_s390_unmap_all_adapters_pv(struct kvm *kvm)
> +{
> + unsigned long flags;
> + struct s390_map_info *map, *tmp;
> + int i, idx;
> +
> + for (i = 0; i < MAX_S390_IO_ADAPTERS; i++) {
> + if (!kvm->arch.adapters[i])
> + continue;
> + list_for_each_entry_safe(map, tmp,
> + &kvm->arch.adapters[i]->maps, list) {
> + spin_lock_irqsave(&kvm->arch.adapters[i]->maps_lock, flags);
Same comment as kvm_s390_adapter_unmap: this needs to be acquired before
the list_for_each_entry_safe call.
Actually this one will need to be a bit more creative, since you need to
drop the spinlock before each call to set_page_dirty_lock(), but you'll
need to re-acquire it each time you go back to the list. That makes
list_for_each_entry_safe a bad choice.
Maybe use list_first_entry_or_null() each time you re-acquire the
spinlock, until you get a NULL (meaning the list is empty)?
> + list_del(&map->list);
You need to decrement nr_maps here before dropping the lock;
unlike kvm_s390_destroy_adapters we are not freeing the structure and if
we leave SE mode we could get more mappings later so the nr_maps value
has to be kept up-to-date.
> + spin_unlock_irqrestore(&kvm->arch.adapters[i]->maps_lock, flags);
> + idx = srcu_read_lock(&kvm->srcu);
> + mark_page_dirty(kvm, map->addr >> PAGE_SHIFT);
Same comment as above, need to use the gaddr
> + set_page_dirty_lock(map->page);
> + srcu_read_unlock(&kvm->srcu, idx);
> + put_page(map->page);
> + kfree(map);
> + }
> + }
> +}
> +
> static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
> {
> const bool need_lock = (cmd->cmd != KVM_PV_ASYNC_CLEANUP_PERFORM);
> @@ -2507,6 +2533,7 @@ static int kvm_s390_handle_pv(struct kvm *kvm, struct kvm_pv_cmd *cmd)
> if (kvm_s390_pv_is_protected(kvm))
> break;
>
> + kvm_s390_unmap_all_adapters_pv(kvm);
> mmap_write_lock(kvm->mm);
> /*
> * Disable creation of new THPs. Existing THPs can stay, they
* Re: [PATCH v3 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages
2026-04-06 6:44 ` [PATCH v3 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
@ 2026-04-06 15:33 ` Matthew Rosato
0 siblings, 0 replies; 10+ messages in thread
From: Matthew Rosato @ 2026-04-06 15:33 UTC (permalink / raw)
To: Douglas Freimuth, borntraeger, imbrenda, frankja, david, hca, gor,
agordeev, svens, kvm, linux-s390, linux-kernel
On 4/6/26 2:44 AM, Douglas Freimuth wrote:
> The S390 adapter_indicators_set function needs to be able to use mapped
> pages so that work can be processed on a fast path when interrupts are
> disabled. If the adapter indicator pages are not mapped, then local
> mapping is done on the slow path, as it is prior to this patch. For
> example, Secure Execution environments will take the local mapping
> path, as they do prior to this patch.
>
> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
> ---
> arch/s390/kvm/interrupt.c | 91 ++++++++++++++++++++++++++++-----------
> 1 file changed, 66 insertions(+), 25 deletions(-)
>
> diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
> index 47bd6361c849..f3183c9ec7f1 100644
> --- a/arch/s390/kvm/interrupt.c
> +++ b/arch/s390/kvm/interrupt.c
> @@ -2847,41 +2847,82 @@ static unsigned long get_ind_bit(__u64 addr, unsigned long bit_nr, bool swap)
> return swap ? (bit ^ (BITS_PER_LONG - 1)) : bit;
> }
>
> +static struct s390_map_info *get_map_info(struct s390_io_adapter *adapter,
> + u64 addr)
> +{
> + struct s390_map_info *map;
> +
> + if (!adapter)
> + return NULL;
> +
> + list_for_each_entry(map, &adapter->maps, list) {
> + if (map->guest_addr == addr)
> + return map;
> + }
> + return NULL;
> +}
> +
> static int adapter_indicators_set(struct kvm *kvm,
> struct s390_io_adapter *adapter,
> struct kvm_s390_adapter_int *adapter_int)
> {
> unsigned long bit;
> int summary_set, idx;
> - struct page *ind_page, *summary_page;
> + struct s390_map_info *ind_info, *summary_info;
> void *map;
> + struct page *ind_page, *summary_page;
> + unsigned long flags;
>
> - ind_page = get_map_page(kvm, adapter_int->ind_addr);
> - if (!ind_page)
> - return -1;
> - summary_page = get_map_page(kvm, adapter_int->summary_addr);
> - if (!summary_page) {
> - put_page(ind_page);
> - return -1;
> + spin_lock_irqsave(&adapter->maps_lock, flags);
> + ind_info = get_map_info(adapter, adapter_int->ind_addr);
> + if (!ind_info) {
> + spin_unlock_irqrestore(&adapter->maps_lock, flags);
> + ind_page = get_map_page(kvm, adapter_int->ind_addr);
> + if (!ind_page)
> + return -1;
> + idx = srcu_read_lock(&kvm->srcu);
> + map = page_address(ind_page);
> + bit = get_ind_bit(adapter_int->ind_addr,
> + adapter_int->ind_offset, adapter->swap);
> + set_bit(bit, map);
> + mark_page_dirty(kvm, adapter_int->ind_gaddr >> PAGE_SHIFT);
> + set_page_dirty_lock(ind_page);
> + srcu_read_unlock(&kvm->srcu, idx);
> + } else {
> + map = page_address(ind_info->page);
> + bit = get_ind_bit(ind_info->addr, adapter_int->ind_offset, adapter->swap);
> + set_bit(bit, map);
> + spin_unlock_irqrestore(&adapter->maps_lock, flags);
> + }
> + spin_lock_irqsave(&adapter->maps_lock, flags);
> + summary_info = get_map_info(adapter, adapter_int->summary_addr);
> + if (!summary_info) {
> + spin_unlock_irqrestore(&adapter->maps_lock, flags);
> + summary_page = get_map_page(kvm, adapter_int->summary_addr);
> + if (!summary_page && !ind_info) {
> + put_page(ind_page);
> + return -1;
1) Sashiko mentions that this now allows for a path where we already set
the indicator bits above but bail out early because we couldn't find the
summary page. The old code validated that it could find the indicator
page AND the summary page before setting any bits.
I think re-implementing to achieve that model is _preferable_ but I
think we are also OK with the current approach; in the unlikely event
that the indicator bits are set but we fail to set the summary bit, I
can envision at least 2 scenarios...
1a) an already-running interrupt handler in the guest happens to see the
indicator bits on because the summary bit was already set and proceeds
to handle it and clear the indicator bits. I suppose we might get an
over-indication of those bits later if the host attempts a re-delivery.
1b) Same as above but the summary bit was off (or the interrupt handler
never was given initiative to run in the first place) so the indicator
bits are not noticed and nothing is handled. But if re-delivery is
attempted then this code would re-indicate the same bits which were
already on -- this would not prevent an attempt at indicating the
summary bit again. Assuming that succeeds, it will result in an adapter
interrupt delivered.
If we continually fail to map the summary page I guess it could be an
awkward dance of indicating the bits but never being able to set the
summary bit. Realistically, if we are in this situation (cannot map the
summary page that was previously valid) it seems to me we've got a
bigger issue. Would a WARN_ON_ONCE make sense?
2) If you keep this approach then there is another issue here that is
definitely a valid concern -- if summary_page is NULL but info_info is
non-NULL, you continue on and use a summary_page of 0 below which is
wrong - I think you wanted to return -1 (but without a put_page) in this
case e.g.:
if (!summary_page) {
	if (!ind_info)
		put_page(ind_page);
	return -1;
}
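If the preferable model from (1) is pursued instead, the rough shape would
be to resolve both mappings before touching any bits (a sketch only, using
the field names from the posted patch; details elided):

```
	spin_lock_irqsave(&adapter->maps_lock, flags);
	ind_info = get_map_info(adapter, adapter_int->ind_addr);
	summary_info = get_map_info(adapter, adapter_int->summary_addr);
	if (ind_info && summary_info) {
		/* fast path: both pages mapped, set indicator and
		 * summary bits under the lock, then return */
		...
	}
	spin_unlock_irqrestore(&adapter->maps_lock, flags);
	/* slow path: resolve both pages via get_map_page() first, and
	 * only set any bits once both lookups have succeeded */
```

That restores the old invariant that no bit is indicated unless both pages
are reachable.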
* Re: [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
2026-04-06 6:44 ` [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
@ 2026-04-06 16:15 ` Matthew Rosato
2026-04-06 17:59 ` Sean Christopherson
2026-04-06 16:25 ` Matthew Rosato
1 sibling, 1 reply; 10+ messages in thread
From: Matthew Rosato @ 2026-04-06 16:15 UTC (permalink / raw)
To: Douglas Freimuth, borntraeger, imbrenda, frankja, david, hca, gor,
agordeev, svens, kvm, linux-s390, linux-kernel
On 4/6/26 2:44 AM, Douglas Freimuth wrote:
> S390 needs a fast path for irq injection, and along those lines we
> introduce kvm_arch_set_irq_inatomic. Instead of placing all interrupts
> on the global work queue, as is done today, this patch provides a
> non-blocking fast path for irq injection.
>
> The inatomic fast path cannot lose control since it is running with
> interrupts disabled. This meant making the following changes that exist on
> the slow path today. First, the adapter_indicators page needs to be mapped
> since it is accessed with interrupts disabled, so we added map/unmap
> functions. Second, access to shared resources between the fast and slow
> paths needed to be changed from mutex and semaphores to spin_lock's.
> Finally, the memory allocation on the slow path utilizes GFP_KERNEL_ACCOUNT
> but we had to implement the fast path with GFP_ATOMIC allocation. Each of
> these enhancements were required to prevent blocking on the fast inject
> path.
>
> Fencing of Fast Inject in Secure Execution environments is enabled in the
> patch series by not mapping adapter indicator pages. In Secure Execution
> environments the path of execution available before this patch is followed.
>
> Statistical counters have been added to enable analysis of irq injection on
> the fast path and slow path including io_390_inatomic, io_flic_inject_airq,
> io_set_adapter_int and io_390_inatomic_adapter_masked.
>
> Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
Sashiko complains about PREEMPT_RT kernels and spinlocks being sleepable
in this case which would break the whole point of kvm_arch_set_irq_inatomic.
I suspect actually the kvm_arch_set_irq_inatomic() call itself shouldn't
be used in this case, or in other words it wouldn't be an issue with
just this s390 implementation but rather with all of the arch
implementations?
I did not try enabling it and running a test, but I did search the
codebase and found at least one spinlock acquired somewhere along the
inatomic path in the existing implementations...
loongarch (pch_pic_set_irq)
arm64 (vgic_its_inject_cached_translation)
powerpc (icp_deliver_irq)
riscv (kvm_riscv_aia_aplic_inject)
For x86 I didn't find a spinlock -- maybe I didn't look hard enough! --
but I did find a path that uses RCU (kvm_irq_delivery_to_apic_fast)
which AFAIU would also become preemptible under PREEMPT_RT.
So for this series it seems reasonable to me to proceed as-is, with an
open question whether there should be a KVM-wide avoidance of
kvm_arch_set_irq_inatomic() under PREEMPT_RT?
* Re: [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
2026-04-06 6:44 ` [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
2026-04-06 16:15 ` Matthew Rosato
@ 2026-04-06 16:25 ` Matthew Rosato
1 sibling, 0 replies; 10+ messages in thread
From: Matthew Rosato @ 2026-04-06 16:25 UTC (permalink / raw)
To: Douglas Freimuth, borntraeger, imbrenda, frankja, david, hca, gor,
agordeev, svens, kvm, linux-s390, linux-kernel
>
> -static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
> - unsigned long token)
> +static int __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
> + unsigned long token)
> {
> struct kvm_s390_interrupt inti;
> struct kvm_s390_irq irq;
> + struct kvm_s390_interrupt_info *inti_mem;
> +
> + inti_mem = kzalloc_obj(*inti_mem, GFP_KERNEL_ACCOUNT);
> + if (!inti_mem)
> + return -ENOMEM;
You change this function to possibly return this value but you do not
change the callers of this routine to actually look at the new return value?
AFAICT there are 2 callers of this today in arch/s390/kvm/kvm-s390.c - I
assume one or both need updating, otherwise why do we need this change?
>
> if (start_token) {
> irq.u.ext.ext_params2 = token;
> @@ -4409,8 +4422,9 @@ static void __kvm_inject_pfault_token(struct kvm_vcpu *vcpu, bool start_token,
> } else {
> inti.type = KVM_S390_INT_PFAULT_DONE;
> inti.parm64 = token;
> - WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti));
> + WARN_ON_ONCE(kvm_s390_inject_vm(vcpu->kvm, &inti, inti_mem));
> }
> + return true;
Since return value is an integer, return 0?
* Re: [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
2026-04-06 16:15 ` Matthew Rosato
@ 2026-04-06 17:59 ` Sean Christopherson
2026-04-06 18:24 ` Matthew Rosato
0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2026-04-06 17:59 UTC (permalink / raw)
To: Matthew Rosato
Cc: Douglas Freimuth, borntraeger, imbrenda, frankja, david, hca, gor,
agordeev, svens, kvm, linux-s390, linux-kernel
On Mon, Apr 06, 2026, Matthew Rosato wrote:
> On 4/6/26 2:44 AM, Douglas Freimuth wrote:
> > S390 needs a fast path for irq injection, and along those lines we
> > introduce kvm_arch_set_irq_inatomic. Instead of placing all interrupts on
> > the global work queue as it does today, this patch provides a fast path for
> > irq injection.
> >
> > The inatomic fast path cannot lose control since it is running with
> > interrupts disabled. This meant making the following changes that exist on
> > the slow path today. First, the adapter_indicators page needs to be mapped
> > since it is accessed with interrupts disabled, so we added map/unmap
> > functions. Second, access to shared resources between the fast and slow
> > paths needed to be changed from mutex and semaphores to spin_lock's.
> > Finally, the memory allocation on the slow path utilizes GFP_KERNEL_ACCOUNT
> > but we had to implement the fast path with GFP_ATOMIC allocation. Each of
> > these enhancements were required to prevent blocking on the fast inject
> > path.
> >
> > Fencing of Fast Inject in Secure Execution environments is enabled in the
> > patch series by not mapping adapter indicator pages. In Secure Execution
> > environments the path of execution available before this patch is followed.
> >
> > Statistical counters have been added to enable analysis of irq injection on
> > the fast path and slow path including io_390_inatomic, io_flic_inject_airq,
> > io_set_adapter_int and io_390_inatomic_adapter_masked.
> >
> > Signed-off-by: Douglas Freimuth <freimuth@linux.ibm.com>
>
>
> Sashiko complains about PREEMPT_RT kernels and spinlocks being sleepable
> in this case which would break the whole point of kvm_arch_set_irq_inatomic.
Just make it a raw spinlock so that it stays an actual spinlock.
> I suspect actually the kvm_arch_set_irq_inatomic() call itself shouldn't
> be used in this case, or in other words it wouldn't be an issue with
> just this s390 implementation but rather all of arch implementations?
>
> I did not try enabling it and running a test, but I did do some
> searching of the codebase and I can found at least 1 spinlock acquired
> somewhere along the inatomic path for the existing implementations...
>
> loongarch (pch_pic_set_irq)
I doubt anyone runs PREEMPT_RT VMs on LoongArch at this point.
> arm64 (vgic_its_inject_cached_translation)
Uses raw.
> powerpc (icp_deliver_irq)
Presumably arch_spin_lock() is also a "raw" version? PPC KVM is barely maintained
at this point, so I wouldn't worry much about it.
> riscv (kvm_riscv_aia_aplic_inject)
Uses "raw".
> For x86 I didn't find a spinlock -- maybe I didn't look hard enough! --
> but I did find a path that uses RCU (kvm_irq_delivery_to_apic_fast)
> which AFAIU would also become preemptible under PREEMPT_RT.
This isn't about becoming preemptible per se, it's about non-raw spinlocks
becoming sleepable locks. RCU can be made preemptible, but rcu_read_lock()
doesn't become sleepable.
> So for this series it seems reasonable to me to proceed as-is, with an
> open question whether there should be a KVM-wide avoidance of
> kvm_arch_set_irq_inatomic() under PREEMPT_RT?
s390 should use a raw spinlock, same as arm64 and RISC-V.
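The conversion being suggested would amount to roughly the following in the
adapter structure and its users (a sketch; actual field and call sites per
the posted series):

```
-	spinlock_t maps_lock;
+	raw_spinlock_t maps_lock;

-	spin_lock_init(&adapter->maps_lock);
+	raw_spin_lock_init(&adapter->maps_lock);

-	spin_lock_irqsave(&adapter->maps_lock, flags);
+	raw_spin_lock_irqsave(&adapter->maps_lock, flags);
	...
-	spin_unlock_irqrestore(&adapter->maps_lock, flags);
+	raw_spin_unlock_irqrestore(&adapter->maps_lock, flags);
```

A raw_spinlock_t remains a true spinning lock even on PREEMPT_RT, so the
critical sections must stay short and non-sleeping, which is exactly the
contract the inatomic fast path already requires.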
* Re: [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject
2026-04-06 17:59 ` Sean Christopherson
@ 2026-04-06 18:24 ` Matthew Rosato
0 siblings, 0 replies; 10+ messages in thread
From: Matthew Rosato @ 2026-04-06 18:24 UTC (permalink / raw)
To: Sean Christopherson
Cc: Douglas Freimuth, borntraeger, imbrenda, frankja, david, hca, gor,
agordeev, svens, kvm, linux-s390, linux-kernel
>>
>> Sashiko complains about PREEMPT_RT kernels and spinlocks being sleepable
>> in this case which would break the whole point of kvm_arch_set_irq_inatomic.
>
> Just make it a raw spinlock so that it stays an actual spinlock.
[...]
>
> s390 should use a raw spinlock, same as arm64 and RISC-V.
Ahh, I missed that subtlety.
Thanks for the explanation!
end of thread, other threads:[~2026-04-06 18:24 UTC | newest]
Thread overview: 10+ messages
2026-04-06 6:44 [PATCH v3 0/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic Fast Inject Douglas Freimuth
2026-04-06 6:44 ` [PATCH v3 1/3] KVM: s390: Add map/unmap ioctl and clean mappings post-guest Douglas Freimuth
2026-04-06 13:39 ` Matthew Rosato
2026-04-06 6:44 ` [PATCH v3 2/3] KVM: s390: Enable adapter_indicators_set to use mapped pages Douglas Freimuth
2026-04-06 15:33 ` Matthew Rosato
2026-04-06 6:44 ` [PATCH v3 3/3] KVM: s390: Introducing kvm_arch_set_irq_inatomic fast inject Douglas Freimuth
2026-04-06 16:15 ` Matthew Rosato
2026-04-06 17:59 ` Sean Christopherson
2026-04-06 18:24 ` Matthew Rosato
2026-04-06 16:25 ` Matthew Rosato