* [patch 00/12] Ucontrol patches V3
@ 2011-12-09 11:23 Carsten Otte
From: Carsten Otte @ 2011-12-09 11:23 UTC
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner
Hi Avi, Hi Marcelo,
this round includes feedback from Sasha Levin:
KVM_VM_S390_UCONTROL renamed to KVM_VM_UCONTROL
KVM_CAP_S390_UCONTROL renamed to KVM_CAP_UCONTROL
and a bugfix for a possible host change bit underindication (race)
in SSKE that was reported by Joachim off-list.
@Heiko: since Martin is out, could you please sit in for him and
ack the pgtable.[ch] parts of patch #10 so that Avi can pick up
the whole series?
so long,
Carsten

* [patch 01/12] [PATCH] kvm-s390: add parameter for KVM_CREATE_VM
From: Carsten Otte @ 2011-12-09 11:23 UTC
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner, Alexander Graf, Xiantao Zhang

[-- Attachment #1: enable-ucontrol.patch --]
[-- Type: text/plain, Size: 7252 bytes --]

This patch introduces a new config option for user controlled kernel
virtual machines. It introduces an optional parameter to
KVM_CREATE_VM in order to create a user controlled virtual machine.
The parameter is passed to kvm_arch_init_vm for all architectures.
Valid values for the new parameter are KVM_VM_REGULAR (defined to 0
for backward compatibility to old KVM_CREATE_VM) and
KVM_VM_UCONTROL for s390 only.
Note that the user controlled virtual machines require CAP_SYS_ADMIN
privileges.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
---
 Documentation/virtual/kvm/api.txt |    7 ++++++-
 arch/ia64/kvm/kvm-ia64.c          |    5 ++++-
 arch/powerpc/kvm/powerpc.c        |    5 ++++-
 arch/s390/kvm/Kconfig             |    9 +++++++++
 arch/s390/kvm/kvm-s390.c          |   30 +++++++++++++++++++++++++-----
 arch/s390/kvm/kvm-s390.h          |   10 ++++++++++
 arch/x86/kvm/x86.c                |    5 ++++-
 include/linux/kvm.h               |    3 +++
 include/linux/kvm_host.h          |    2 +-
 virt/kvm/kvm_main.c               |   19 +++++++++++++------
 10 files changed, 79 insertions(+), 16 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -95,7 +95,7 @@ described as 'basic' will be available.
 Capability: basic
 Architectures: all
 Type: system ioctl
-Parameters: none
+Parameters: machine type identifier (KVM_VM_*)
 Returns: a VM fd that can be used to control the new virtual machine.

 The new VM has no virtual cpus and no memory. An mmap() of a VM fd
@@ -103,6 +103,11 @@ will access the virtual machine's physic
 corresponds to guest physical address zero. Use of mmap() on a VM fd
 is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
 available.
+You most certainly want to use KVM_VM_REGULAR as machine type.
+
+In order to create user controlled virtual machines on S390, check
+KVM_CAP_UCONTROL and use KVM_VM_UCONTROL as machine type as
+privileged user (CAP_SYS_ADMIN).

 4.3 KVM_GET_MSR_INDEX_LIST

--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -809,10 +809,13 @@ static void kvm_build_io_pmt(struct kvm
 #define GUEST_PHYSICAL_RR4	0x2739
 #define VMM_INIT_RR		0x1660

-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	BUG_ON(!kvm);

+	if (type != KVM_VM_REGULAR)
+		return -EINVAL;
+
 	kvm->arch.is_sn2 = ia64_platform_is("sn2");

 	kvm->arch.metaphysical_rr0 = GUEST_PHYSICAL_RR0;
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -171,8 +171,11 @@ void kvm_arch_check_processor_compat(voi
 	*(int *)rtn = kvmppc_core_check_processor_compat();
 }

-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	if (type != KVM_VM_REGULAR)
+		return -EINVAL;
+
 	return kvmppc_core_init_vm(kvm);
 }

--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -34,6 +34,15 @@ config KVM

 	  If unsure, say N.

+config KVM_UCONTROL
+	bool "Userspace controlled virtual machines"
+	depends on KVM
+	---help---
+	  Allow CAP_SYS_ADMIN users to create KVM virtual machines that are
+	  controlled by userspace.
+
+	  If unsure, say N.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/vhost/Kconfig
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -171,11 +171,28 @@ long kvm_arch_vm_ioctl(struct file *filp
 	return r;
 }

-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int rc;
 	char debug_name[16];

+	rc = -EINVAL;
+#ifdef CONFIG_KVM_UCONTROL
+	switch (type) {
+	case KVM_VM_REGULAR:
+		break;
+	case KVM_VM_UCONTROL:
+		if (!capable(CAP_SYS_ADMIN))
+			goto out_err;
+		break;
+	default:
+		goto out_err;
+	}
+#else
+	if (type != KVM_VM_REGULAR)
+		goto out_err;
+#endif
+
 	rc = s390_enable_sie();
 	if (rc)
 		goto out_err;
@@ -198,10 +215,13 @@ int kvm_arch_init_vm(struct kvm *kvm)
 	debug_register_view(kvm->arch.dbf, &debug_sprintf_view);
 	VM_EVENT(kvm, 3, "%s", "vm created");

-	kvm->arch.gmap = gmap_alloc(current->mm);
-	if (!kvm->arch.gmap)
-		goto out_nogmap;
-
+	if (type == KVM_VM_REGULAR) {
+		kvm->arch.gmap = gmap_alloc(current->mm);
+		if (!kvm->arch.gmap)
+			goto out_nogmap;
+	} else {
+		kvm->arch.gmap = NULL;
+	}
 	return 0;
 out_nogmap:
 	debug_unregister(kvm->arch.dbf);
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -47,6 +47,16 @@ static inline int __cpu_is_stopped(struc
 	return atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_STOP_INT;
 }

+static inline int kvm_is_ucontrol(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_UCONTROL
+	if (kvm->arch.gmap)
+		return 0;
+	return 1;
+#else
+	return 0;
+#endif
+}
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
 enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer);
 void kvm_s390_tasklet(unsigned long parm);
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5969,8 +5969,11 @@ void kvm_arch_vcpu_uninit(struct kvm_vcp
 	free_page((unsigned long)vcpu->arch.pio_data);
 }

-int kvm_arch_init_vm(struct kvm *kvm)
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
+	if (type != KVM_VM_REGULAR)
+		return -EINVAL;
+
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);

--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -431,6 +431,9 @@ struct kvm_ppc_pvinfo {

 #define KVMIO 0xAE

+#define KVM_VM_REGULAR  0
+#define KVM_VM_UCONTROL 1
+
 /*
  * ioctls for /dev/kvm fds:
  */
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -519,7 +519,7 @@ static inline void kvm_arch_free_vm(stru
 }
 #endif

-int kvm_arch_init_vm(struct kvm *kvm);
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type);
 void kvm_arch_destroy_vm(struct kvm *kvm);
 void kvm_free_all_assigned_devices(struct kvm *kvm);
 void kvm_arch_sync_events(struct kvm *kvm);
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -449,7 +449,7 @@ static void kvm_init_memslots_id(struct
 		slots->id_to_index[i] = slots->memslots[i].id = i;
 }

-static struct kvm *kvm_create_vm(void)
+static struct kvm *kvm_create_vm(unsigned long type)
 {
 	int r, i;
 	struct kvm *kvm = kvm_arch_alloc_vm();
@@ -457,7 +457,7 @@ static struct kvm *kvm_create_vm(void)
 	if (!kvm)
 		return ERR_PTR(-ENOMEM);

-	r = kvm_arch_init_vm(kvm);
+	r = kvm_arch_init_vm(kvm, type);
 	if (r)
 		goto out_err_nodisable;

@@ -2207,12 +2207,12 @@ static struct file_operations kvm_vm_fop
 	.llseek = noop_llseek,
 };

-static int kvm_dev_ioctl_create_vm(void)
+static int kvm_dev_ioctl_create_vm(unsigned long type)
 {
 	int r;
 	struct kvm *kvm;

-	kvm = kvm_create_vm();
+	kvm = kvm_create_vm(type);
 	if (IS_ERR(kvm))
 		return PTR_ERR(kvm);
 #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
@@ -2264,9 +2264,16 @@ static long kvm_dev_ioctl(struct file *f
 		break;
 	case KVM_CREATE_VM:
 		r = -EINVAL;
-		if (arg)
+		switch (arg) {
+		case KVM_VM_REGULAR:
+#ifdef CONFIG_KVM_UCONTROL
+		case KVM_VM_UCONTROL:
+#endif
+			r = kvm_dev_ioctl_create_vm(arg);
+			break;
+		default:
 			goto out;
-		r = kvm_dev_ioctl_create_vm();
+		}
 		break;
 	case KVM_CHECK_EXTENSION:
 		r = kvm_dev_ioctl_check_extension_generic(arg);

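For readers following the thread, the machine-type check that this patch adds to the s390 kvm_arch_init_vm() can be modelled in plain userspace C. This is only a minimal sketch, not kernel code: model_check_vm_type() is a hypothetical name, and its two boolean parameters are assumed stand-ins for CONFIG_KVM_UCONTROL and for capable(CAP_SYS_ADMIN). Only the two KVM_VM_* values are taken from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Values taken from the include/linux/kvm.h hunk of this patch. */
#define KVM_VM_REGULAR  0UL
#define KVM_VM_UCONTROL 1UL

/*
 * Hypothetical userspace model of the type check in the s390
 * kvm_arch_init_vm().  ucontrol_built_in stands in for
 * CONFIG_KVM_UCONTROL, has_sys_admin for capable(CAP_SYS_ADMIN).
 * Returns 0 if the VM may be created, -EINVAL otherwise.  The patch
 * reports -EINVAL (not -EPERM) for a missing capability as well,
 * because rc is preset to -EINVAL before the switch.
 */
int model_check_vm_type(unsigned long type, bool ucontrol_built_in,
			bool has_sys_admin)
{
	if (type == KVM_VM_REGULAR)
		return 0;
	if (ucontrol_built_in && type == KVM_VM_UCONTROL)
		return has_sys_admin ? 0 : -EINVAL;
	return -EINVAL;
}
```

On ia64, powerpc and x86 the patch accepts KVM_VM_REGULAR only, which corresponds to calling the model with ucontrol_built_in set to false.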
* Re: [patch 01/12] [PATCH] kvm-s390: add parameter for KVM_CREATE_VM
From: Alexander Graf @ 2011-12-09 11:32 UTC
To: Carsten Otte
Cc: Avi Kivity, Marcelo Tossati, Christian Borntraeger, Heiko Carstens,
Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar,
Jens Freimann, Constantin Werner, Xiantao Zhang

On 09.12.2011, at 12:23, Carsten Otte wrote:

> This patch introduces a new config option for user controlled kernel
> virtual machines. It introduces an optional parameter to
> KVM_CREATE_VM in order to create a user controlled virtual machine.
> The parameter is passed to kvm_arch_init_vm for all architectures.
> Valid values for the new parameter are KVM_VM_REGULAR (defined to 0
> for backward compatibility to old KVM_CREATE_VM) and
> KVM_VM_UCONTROL for s390 only.
> Note that the user controlled virtual machines require CAP_SYS_ADMIN
> privileges.
>
> Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> ---
> ---
>  Documentation/virtual/kvm/api.txt |    7 ++++++-
>  arch/ia64/kvm/kvm-ia64.c          |    5 ++++-
>  arch/powerpc/kvm/powerpc.c        |    5 ++++-
>  arch/s390/kvm/Kconfig             |    9 +++++++++
>  arch/s390/kvm/kvm-s390.c          |   30 +++++++++++++++++++++++++-----
>  arch/s390/kvm/kvm-s390.h          |   10 ++++++++++
>  arch/x86/kvm/x86.c                |    5 ++++-
>  include/linux/kvm.h               |    3 +++
>  include/linux/kvm_host.h          |    2 +-
>  virt/kvm/kvm_main.c               |   19 +++++++++++++------
>  10 files changed, 79 insertions(+), 16 deletions(-)
>
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -95,7 +95,7 @@ described as 'basic' will be available.
>  Capability: basic
>  Architectures: all
>  Type: system ioctl
> -Parameters: none
> +Parameters: machine type identifier (KVM_VM_*)
>  Returns: a VM fd that can be used to control the new virtual machine.
>
>  The new VM has no virtual cpus and no memory. An mmap() of a VM fd
> @@ -103,6 +103,11 @@ will access the virtual machine's physic
>  corresponds to guest physical address zero. Use of mmap() on a VM fd
>  is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
>  available.
> +You most certainly want to use KVM_VM_REGULAR as machine type.
> +
> +In order to create user controlled virtual machines on S390, check
> +KVM_CAP_UCONTROL

KVM_S390_CAP_UCONTROL

> and use KVM_VM_UCONTROL as machine type as
> +privileged user (CAP_SYS_ADMIN).

Why not KVM_ENABLE_CAP(KVM_S390_CAP_UCONTROL)? It looks like you could
switch from kernel-controlled to user-controlled mode during runtime.
All you need to do is remove the gmap again and you should be fine, no?

We do something similar on PPC where we just call ENABLE_CAP to switch
to PAPR mode. If that is otherwise too difficult, you can for example
also define that the ENABLE_CAP has to happen before your first
VCPU_RUN.

Alex

* Re: [patch 01/12] [PATCH] kvm-s390: add parameter for KVM_CREATE_VM
From: Carsten Otte @ 2011-12-09 11:50 UTC
To: Alexander Graf
Cc: Avi Kivity, Marcelo Tossati, borntrae, heicars2, mschwid2, huckc,
KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner,
Xiantao Zhang

On 09.12.2011 12:32, Alexander Graf wrote:
>> +KVM_CAP_UCONTROL
> KVM_S390_CAP_UCONTROL
I'm happy either way. It seemed to me that the discussion between Avi
and Sasha for V2 of the patch series settled this naming on
KVM_CAP_UCONTROL/KVM_VM_UCONTROL, without _S390 in it.

> KVM_ENABLE_CAP(KVM_S390_CAP_UCONTROL)? It looks like you could
> switch from kernel-controlled to user-controlled mode during runtime.
> All you need to do is remove the gmap again and you should be fine, no?
This was the case via an ioctl KVM_S390_ENABLE_UCONTROL in version 1.
Avi pointed out some possible race conditions with that, and
recommended switching it via KVM_CREATE_VM. I'm happy either way, just
let me know what's preferred.

> We do something similar on PPC where we just call ENABLE_CAP to switch
> to PAPR mode. If that is otherwise too difficult, you can for example
> also define that the ENABLE_CAP has to happen before your first
> VCPU_RUN.
Code looks to me like you do ENABLE_CAP per vcpu on ppc (chapter 4.37
of api.txt agrees with that). We need something per VM; we cannot
switch individual CPUs between ucontrol/regular because with ucontrol
the VM does not have a common address space.

* [patch 02/12] [PATCH] kvm-s390-ucontrol: per vcpu address spaces
From: Carsten Otte @ 2011-12-09 11:23 UTC
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner

[-- Attachment #1: address-spaces.patch --]
[-- Type: text/plain, Size: 4798 bytes --]

This patch introduces two ioctls for virtual cpus that are only
valid for kernel virtual machines that are controlled by userspace.
Each virtual cpu has its individual address space in this mode of
operation, and each address space is backed by the gmap
implementation just like the address space for regular KVM guests.
KVM_S390_UCAS_MAP maps a part of the user's virtual address space
into the vcpu's address space. Starting offset and length in both
the user and the vcpu address space need to be aligned to 1M.
KVM_S390_UCAS_UNMAP can be used to unmap a range of memory from a
virtual cpu in a similar way.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
---
 Documentation/virtual/kvm/api.txt |   38 ++++++++++++++++++++++++++++
 arch/s390/kvm/kvm-s390.c          |   50 +++++++++++++++++++++++++++++++++++++-
 include/linux/kvm.h               |   10 +++++++
 3 files changed, 97 insertions(+), 1 deletion(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1455,6 +1455,44 @@ is supported; 2 if the processor require
 an RMA, or 1 if the processor can use an RMA but doesn't require it,
 because it supports the Virtual RMA (VRMA) facility.

+4.64 KVM_S390_UCAS_MAP
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_ucas_mapping (in)
+Returns: 0 in case of success
+
+The parameter is defined like this:
+	struct kvm_s390_ucas_mapping {
+		__u64 user_addr;
+		__u64 vcpu_addr;
+		__u64 length;
+	};
+
+This ioctl maps the memory at "user_addr" with the length "length" to
+the vcpu's address space starting at "vcpu_addr". All parameters need to
+be aligned by 1 megabyte.
+
+4.65 KVM_S390_UCAS_UNMAP
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_ucas_mapping (in)
+Returns: 0 in case of success
+
+The parameter is defined like this:
+	struct kvm_s390_ucas_mapping {
+		__u64 user_addr;
+		__u64 vcpu_addr;
+		__u64 length;
+	};
+
+This ioctl unmaps the memory in the vcpu's address space starting at
+"vcpu_addr" with the length "length". The field "user_addr" is ignored.
+All parameters need to be aligned by 1 megabyte.
+
 5. The kvm_run structure

 Application code obtains a pointer to the kvm_run structure by
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -239,6 +239,10 @@ void kvm_arch_vcpu_destroy(struct kvm_vc
 		(__u64) vcpu->arch.sie_block)
 		vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
 	smp_mb();
+
+	if (kvm_is_ucontrol(vcpu->kvm))
+		gmap_free(vcpu->arch.gmap);
+
 	free_page((unsigned long)(vcpu->arch.sie_block));
 	kvm_vcpu_uninit(vcpu);
 	kfree(vcpu);
@@ -269,12 +273,20 @@ void kvm_arch_destroy_vm(struct kvm *kvm
 	kvm_free_vcpus(kvm);
 	free_page((unsigned long)(kvm->arch.sca));
 	debug_unregister(kvm->arch.dbf);
-	gmap_free(kvm->arch.gmap);
+	if (!kvm_is_ucontrol(kvm))
+		gmap_free(kvm->arch.gmap);
 }

 /* Section: vcpu related */
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
+	if (kvm_is_ucontrol(vcpu->kvm)) {
+		vcpu->arch.gmap = gmap_alloc(current->mm);
+		if (!vcpu->arch.gmap)
+			return -ENOMEM;
+		return 0;
+	}
+
 	vcpu->arch.gmap = vcpu->kvm->arch.gmap;
 	return 0;
 }
@@ -693,6 +705,42 @@ long kvm_arch_vcpu_ioctl(struct file *fi
 	case KVM_S390_INITIAL_RESET:
 		r = kvm_arch_vcpu_ioctl_initial_reset(vcpu);
 		break;
+#ifdef CONFIG_KVM_UCONTROL
+	case KVM_S390_UCAS_MAP: {
+		struct kvm_s390_ucas_mapping ucasmap;
+
+		if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
+			r = -EFAULT;
+			break;
+		}
+
+		if (!kvm_is_ucontrol(vcpu->kvm)) {
+			r = -EINVAL;
+			break;
+		}
+
+		r = gmap_map_segment(vcpu->arch.gmap, ucasmap.user_addr,
+				     ucasmap.vcpu_addr, ucasmap.length);
+		break;
+	}
+	case KVM_S390_UCAS_UNMAP: {
+		struct kvm_s390_ucas_mapping ucasmap;
+
+		if (copy_from_user(&ucasmap, argp, sizeof(ucasmap))) {
+			r = -EFAULT;
+			break;
+		}
+
+		if (!kvm_is_ucontrol(vcpu->kvm)) {
+			r = -EINVAL;
+			break;
+		}
+
+		r = gmap_unmap_segment(vcpu->arch.gmap, ucasmap.vcpu_addr,
+				       ucasmap.length);
+		break;
+	}
+#endif
 	default:
 		r = -EINVAL;
 	}
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -657,6 +657,16 @@ struct kvm_clock_data {
 					struct kvm_userspace_memory_region)
 #define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47)
 #define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)
+
+/* enable ucontrol for s390 */
+struct kvm_s390_ucas_mapping {
+	__u64 user_addr;
+	__u64 vcpu_addr;
+	__u64 length;
+};
+#define KVM_S390_UCAS_MAP _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
+#define KVM_S390_UCAS_UNMAP _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
+
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60)
 #define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level)

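The 1 megabyte alignment rule that the documentation above states for KVM_S390_UCAS_MAP and KVM_S390_UCAS_UNMAP can be checked in userspace before issuing the ioctl. The following is a minimal sketch under stated assumptions: ucas_range_is_aligned() and UCAS_SEGMENT_SIZE are hypothetical names that do not appear in the patch, and rejecting a zero length is an assumption of this sketch rather than something the patch text specifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 megabyte, the alignment named in the api.txt text of this patch. */
#define UCAS_SEGMENT_SIZE (1ULL << 20)

/*
 * Hypothetical helper: returns true if the three fields of a
 * struct kvm_s390_ucas_mapping (user_addr, vcpu_addr, length) are all
 * multiples of 1 MB.  A zero length is rejected as well, which is an
 * assumption of this sketch.
 */
bool ucas_range_is_aligned(uint64_t user_addr, uint64_t vcpu_addr,
			   uint64_t length)
{
	const uint64_t mask = UCAS_SEGMENT_SIZE - 1;

	if (length == 0)
		return false;
	/* OR-ing the values lets one mask test cover all three. */
	return ((user_addr | vcpu_addr | length) & mask) == 0;
}
```

For KVM_S390_UCAS_UNMAP the user_addr field is ignored according to the documentation, so a caller could pass 0 for it when reusing this check.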
* [patch 03/12] [PATCH] kvm-s390-ucontrol: export page faults to user
From: Carsten Otte @ 2011-12-09 11:23 UTC
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner

[-- Attachment #1: page-fault-exit.patch --]
[-- Type: text/plain, Size: 4561 bytes --]

This patch introduces a new exit reason in the kvm_run structure
named KVM_EXIT_UCONTROL. This exit indicates that a virtual cpu
has recognized a fault on the host page table. The idea is that
userspace can handle this fault by mapping memory at the fault
location into the cpu's address space and then continue to run the
virtual cpu.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
---
 Documentation/virtual/kvm/api.txt |   14 ++++++++++++++
 arch/s390/kvm/kvm-s390.c          |   32 +++++++++++++++++++++++++++-----
 arch/s390/kvm/kvm-s390.h          |    1 +
 include/linux/kvm.h               |    6 ++++++
 4 files changed, 48 insertions(+), 5 deletions(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1653,6 +1653,20 @@ s390 specific.

 s390 specific.

+		/* KVM_EXIT_UCONTROL */
+		struct {
+			__u64 trans_exc_code;
+			__u32 pgm_code;
+		} s390_ucontrol;
+
+s390 specific. A page fault has occurred for a user controlled virtual
+machine (KVM_VM_UCONTROL) on its host page table that cannot be
+resolved by the kernel.
+The program code and the translation exception code that were placed
+in the cpu's lowcore are presented here as defined by the z Architecture
+Principles of Operation Book in the Chapter for Dynamic Address Translation
+(DAT)
+
 		/* KVM_EXIT_DCR */
 		struct {
 			__u32 dcrn;
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -499,8 +499,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(stru
 	return -EINVAL; /* not implemented yet */
 }

-static void __vcpu_run(struct kvm_vcpu *vcpu)
+static int __vcpu_run(struct kvm_vcpu *vcpu)
 {
+	int rc;
+
 	memcpy(&vcpu->arch.sie_block->gg14, &vcpu->arch.guest_gprs[14], 16);

 	if (need_resched())
@@ -517,9 +519,15 @@ static void __vcpu_run(struct kvm_vcpu *
 	local_irq_enable();
 	VCPU_EVENT(vcpu, 6, "entering sie flags %x",
 		   atomic_read(&vcpu->arch.sie_block->cpuflags));
-	if (sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs)) {
-		VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
-		kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+	rc = sie64a(vcpu->arch.sie_block, vcpu->arch.guest_gprs);
+	if (rc) {
+		if (kvm_is_ucontrol(vcpu->kvm)) {
+			rc = SIE_INTERCEPT_UCONTROL;
+		} else {
+			VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
+			kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+			rc = 0;
+		}
 	}
 	VCPU_EVENT(vcpu, 6, "exit sie icptcode %d",
 		   vcpu->arch.sie_block->icptcode);
@@ -528,6 +536,7 @@ static void __vcpu_run(struct kvm_vcpu *
 	local_irq_enable();

 	memcpy(&vcpu->arch.guest_gprs[14], &vcpu->arch.sie_block->gg14, 16);
+	return rc;
 }

 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
@@ -548,6 +557,7 @@ rerun_vcpu:
 	case KVM_EXIT_UNKNOWN:
 	case KVM_EXIT_INTR:
 	case KVM_EXIT_S390_RESET:
+	case KVM_EXIT_UCONTROL:
 		break;
 	default:
 		BUG();
@@ -559,7 +569,9 @@ rerun_vcpu:
 	might_fault();

 	do {
-		__vcpu_run(vcpu);
+		rc = __vcpu_run(vcpu);
+		if (rc)
+			break;
 		rc = kvm_handle_sie_intercept(vcpu);
 	} while (!signal_pending(current) && !rc);

@@ -571,6 +583,16 @@ rerun_vcpu:
 		rc = -EINTR;
 	}

+#ifdef CONFIG_KVM_UCONTROL
+	if (rc == SIE_INTERCEPT_UCONTROL) {
+		kvm_run->exit_reason = KVM_EXIT_UCONTROL;
+		kvm_run->s390_ucontrol.trans_exc_code =
+			current->thread.gmap_addr;
+		kvm_run->s390_ucontrol.pgm_code = 0x10;
+		rc = 0;
+	}
+#endif
+
 	if (rc == -EOPNOTSUPP) {
 		/* intercept cannot be handled in-kernel, prepare kvm-run */
 		kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -26,6 +26,7 @@ typedef int (*intercept_handler_t)(struc

 /* negativ values are error codes, positive values for internal conditions */
 #define SIE_INTERCEPT_RERUNVCPU (1<<0)
+#define SIE_INTERCEPT_UCONTROL (1<<1)
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu);

 #define VM_EVENT(d_kvm, d_loglevel, d_string, d_args...)\
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -162,6 +162,7 @@ struct kvm_pit_config {
 #define KVM_EXIT_INTERNAL_ERROR 17
 #define KVM_EXIT_OSI 18
 #define KVM_EXIT_PAPR_HCALL 19
+#define KVM_EXIT_UCONTROL 20

 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -249,6 +250,11 @@ struct kvm_run {
 #define KVM_S390_RESET_CPU_INIT 8
 #define KVM_S390_RESET_IPL 16
 		__u64 s390_reset_flags;
+		/* KVM_EXIT_UCONTROL */
+		struct {
+			__u64 trans_exc_code;
+			__u32 pgm_code;
+		} s390_ucontrol;
 		/* KVM_EXIT_DCR */
 		struct {
 			__u32 dcrn;

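The commit message says userspace can resolve a KVM_EXIT_UCONTROL by mapping memory at the fault location and re-running the vcpu. One way userspace might turn the reported trans_exc_code into a request for KVM_S390_UCAS_MAP is sketched below. This is only an illustration under assumptions that are not in the patch: mapping_for_fault(), struct ucas_mapping and backing_base are hypothetical, and the sketch assumes the VMM backs guest memory linearly from a 1 MB aligned userspace address and maps one 1 MB segment per fault.

```c
#include <stdint.h>

#define SEGMENT_SIZE (1ULL << 20)	/* 1 MB granularity of UCAS mappings */

/* Mirrors the layout of struct kvm_s390_ucas_mapping with stdint types. */
struct ucas_mapping {
	uint64_t user_addr;
	uint64_t vcpu_addr;
	uint64_t length;
};

/*
 * Hypothetical helper: build a one-segment mapping that covers the
 * address reported in kvm_run->s390_ucontrol.trans_exc_code.
 * backing_base is an assumed, 1 MB aligned userspace address at which
 * guest address 0 is backed.  Masking to the segment boundary also
 * drops any low-order bits of the reported code.
 */
struct ucas_mapping mapping_for_fault(uint64_t trans_exc_code,
				      uint64_t backing_base)
{
	struct ucas_mapping m;

	m.vcpu_addr = trans_exc_code & ~(SEGMENT_SIZE - 1);
	m.user_addr = backing_base + m.vcpu_addr;
	m.length = SEGMENT_SIZE;
	return m;
}
```

The resulting structure would then be passed to the KVM_S390_UCAS_MAP ioctl on the vcpu fd before calling KVM_RUN again.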
* [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user
From: Carsten Otte @ 2011-12-09 11:23 UTC
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner, Alexander Graf, Xiantao Zhang

[-- Attachment #1: sie-control-block-to-user.patch --]
[-- Type: text/plain, Size: 3728 bytes --]

This patch exports the s390 SIE hardware control block to userspace
via the mapping of the vcpu file descriptor. In order to do so,
a new arch callback named kvm_arch_vcpu_fault is introduced for all
architectures. It allows to map architecture specific pages.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
---
 Documentation/virtual/kvm/api.txt |    5 +++++
 arch/ia64/kvm/kvm-ia64.c          |    5 +++++
 arch/powerpc/kvm/powerpc.c        |    5 +++++
 arch/s390/kvm/kvm-s390.c          |   13 +++++++++++++
 arch/x86/kvm/x86.c                |    5 +++++
 include/linux/kvm.h               |    1 +
 include/linux/kvm_host.h          |    1 +
 virt/kvm/kvm_main.c               |    2 +-
 8 files changed, 36 insertions(+), 1 deletion(-)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -218,6 +218,11 @@ allocation of vcpu ids. For example, if
 single-threaded guest vcpus, it should make all vcpu ids be a multiple
 of the number of vcpus per vcore.

+For virtual cpus that have been created with S390 user controlled virtual
+machines, the resulting vcpu fd can be memory mapped at page offset
+KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
+cpu's hardware control block.
+
 4.8 KVM_GET_DIRTY_LOG (vm ioctl)

 Capability: basic
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1566,6 +1566,11 @@ out:
 	return r;
 }

+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		struct kvm_memory_slot *memslot,
 		struct kvm_memory_slot old,
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -659,6 +659,11 @@ out:
 	return r;
 }

+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
 static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 {
 	u32 inst_lis = 0x3c000000;
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -769,6 +769,19 @@ long kvm_arch_vcpu_ioctl(struct file *fi
 	return r;
 }

+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+#ifdef CONFIG_KVM_UCONTROL
+	if ((vmf->pgoff == KVM_S390_SIE_PAGE_OFFSET)
+		 && (kvm_is_ucontrol(vcpu->kvm))) {
+		vmf->page = virt_to_page(vcpu->arch.sie_block);
+		get_page(vmf->page);
+		return 0;
+	}
+#endif
+	return VM_FAULT_SIGBUS;
+}
+
 /* Section: memory related */
 int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				   struct kvm_memory_slot *memslot,
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2790,6 +2790,11 @@ out:
 	return r;
 }

+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
+{
+	return VM_FAULT_SIGBUS;
+}
+
 static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr)
 {
 	int ret;
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -439,6 +439,7 @@ struct kvm_ppc_pvinfo {

 #define KVM_VM_REGULAR  0
 #define KVM_VM_UCONTROL 1
+#define KVM_S390_SIE_PAGE_OFFSET 1

 /*
  * ioctls for /dev/kvm fds:
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -449,6 +449,7 @@ long kvm_arch_dev_ioctl(struct file *fil
 			unsigned int ioctl, unsigned long arg);
 long kvm_arch_vcpu_ioctl(struct file *filp,
 			 unsigned int ioctl, unsigned long arg);
+int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf);

 int kvm_dev_ioctl_check_extension(long ext);

--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1657,7 +1657,7 @@ static int kvm_vcpu_fault(struct vm_area
 		page = virt_to_page(vcpu->kvm->coalesced_mmio_ring);
 #endif
 	else
-		return VM_FAULT_SIGBUS;
+		return kvm_arch_vcpu_fault(vcpu, vmf);
 	get_page(page);
 	vmf->page = page;
 	return 0;

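Since KVM_S390_SIE_PAGE_OFFSET is a page offset while mmap() takes a byte offset, userspace has to scale it by the page size. A minimal sketch of that conversion follows; sie_block_mmap_offset() is a hypothetical helper name, and only the value of KVM_S390_SIE_PAGE_OFFSET is taken from the patch.

```c
#include <sys/types.h>

/* Value taken from the include/linux/kvm.h hunk of this patch. */
#define KVM_S390_SIE_PAGE_OFFSET 1

/*
 * Hypothetical helper: byte offset to pass to mmap() on the vcpu fd
 * in order to reach the SIE control block page.  page_size would
 * normally come from sysconf(_SC_PAGESIZE).
 */
off_t sie_block_mmap_offset(long page_size)
{
	return (off_t)KVM_S390_SIE_PAGE_OFFSET * page_size;
}
```

A caller would presumably use it along the lines of mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu_fd, sie_block_mmap_offset(page_size)); the protection flags in that call are an assumption, not something the patch states.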
* Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user 2011-12-09 11:23 ` [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block " Carsten Otte @ 2011-12-09 11:37 ` Alexander Graf 2011-12-09 11:54 ` Carsten Otte 0 siblings, 1 reply; 33+ messages in thread From: Alexander Graf @ 2011-12-09 11:37 UTC (permalink / raw) To: Carsten Otte Cc: Avi Kivity, Marcelo Tossati, Christian Borntraeger, Heiko Carstens, Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner, Xiantao Zhang On 09.12.2011, at 12:23, Carsten Otte wrote: > This patch exports the s390 SIE hardware control block to userspace > via the mapping of the vcpu file descriptor. In order to do so, > a new arch callback named kvm_arch_vcpu_fault is introduced for all > architectures. It allows to map architecture specific pages. > > Signed-off-by: Carsten Otte <cotte@de.ibm.com> > --- > --- > Documentation/virtual/kvm/api.txt | 5 +++++ > arch/ia64/kvm/kvm-ia64.c | 5 +++++ > arch/powerpc/kvm/powerpc.c | 5 +++++ > arch/s390/kvm/kvm-s390.c | 13 +++++++++++++ > arch/x86/kvm/x86.c | 5 +++++ > include/linux/kvm.h | 1 + > include/linux/kvm_host.h | 1 + > virt/kvm/kvm_main.c | 2 +- > 8 files changed, 36 insertions(+), 1 deletion(-) > > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -218,6 +218,11 @@ allocation of vcpu ids. For example, if > single-threaded guest vcpus, it should make all vcpu ids be a multiple > of the number of vcpus per vcore. > > +For virtual cpus that have been created with S390 user controlled virtual > +machines, the resulting vcpu fd can be memory mapped at page offset > +KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual > +cpu's hardware control block. 
> + > 4.8 KVM_GET_DIRTY_LOG (vm ioctl) > > Capability: basic > --- a/arch/ia64/kvm/kvm-ia64.c > +++ b/arch/ia64/kvm/kvm-ia64.c > @@ -1566,6 +1566,11 @@ out: > return r; > } > > +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) > +{ > + return VM_FAULT_SIGBUS; > +} > + > int kvm_arch_prepare_memory_region(struct kvm *kvm, > struct kvm_memory_slot *memslot, > struct kvm_memory_slot old, > --- a/arch/powerpc/kvm/powerpc.c > +++ b/arch/powerpc/kvm/powerpc.c > @@ -659,6 +659,11 @@ out: > return r; > } > > +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) > +{ > + return VM_FAULT_SIGBUS; > +} > + > static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo) > { > u32 inst_lis = 0x3c000000; > --- a/arch/s390/kvm/kvm-s390.c > +++ b/arch/s390/kvm/kvm-s390.c > @@ -769,6 +769,19 @@ long kvm_arch_vcpu_ioctl(struct file *fi > return r; > } > > +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) > +{ > +#ifdef CONFIG_KVM_UCONTROL > + if ((vmf->pgoff == KVM_S390_SIE_PAGE_OFFSET) > + && (kvm_is_ucontrol(vcpu->kvm))) { > + vmf->page = virt_to_page(vcpu->arch.sie_block); > + get_page(vmf->page); > + return 0; > + } > +#endif > + return VM_FAULT_SIGBUS; > +} > + > /* Section: memory related */ > int kvm_arch_prepare_memory_region(struct kvm *kvm, > struct kvm_memory_slot *memslot, > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2790,6 +2790,11 @@ out: > return r; > } > > +int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) > +{ > + return VM_FAULT_SIGBUS; > +} > + > static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, unsigned long addr) > { > int ret; > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -439,6 +439,7 @@ struct kvm_ppc_pvinfo { > > #define KVM_VM_REGULAR 0 > #define KVM_VM_UCONTROL 1 > +#define KVM_S390_SIE_PAGE_OFFSET 1 Can we please make these a global number space? 
I don't want to have any user space code "accidentally" call this mmap while it's trying to find out if it can map the PIO page. We already have a global number space for CAPs.

Alex

^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block to user
2011-12-09 11:37 ` Alexander Graf
@ 2011-12-09 11:54 ` Carsten Otte
0 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 11:54 UTC (permalink / raw)
To: Alexander Graf
Cc: Avi Kivity, Marcelo Tossati, borntrae, heicars2, mschwid2, huckc,
KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner,
Xiantao Zhang

On 09.12.2011 12:37, Alexander Graf wrote:
>> +#define KVM_S390_SIE_PAGE_OFFSET 1
>
> Can we please make these a global number space? I don't want to have any user space code "accidentally" call this mmap while it's trying to find out if it can map the PIO page. We already have a global number space for CAPs.

I consider a vma with holes in it pretty ugly, and we've got the
mechanism for userspace to correctly find out where to find what.
I do prefer a contiguous vma and the existing capabilities/queries.

Avi?

^ permalink raw reply [flat|nested] 33+ messages in thread
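For readers following the offset discussion, here is a minimal userspace sketch of the mapping that patch 04 describes. It assumes the V3 value KVM_S390_SIE_PAGE_OFFSET == 1 from the quoted hunk and an already created vcpu fd of a user controlled VM; the helper names sie_mmap_offset and map_sie_block are hypothetical and not part of the series.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: value copied from the V3 patch; real code would take it
 * from <linux/kvm.h>. */
#ifndef KVM_S390_SIE_PAGE_OFFSET
#define KVM_S390_SIE_PAGE_OFFSET 1
#endif

/* mmap() takes a byte offset, so the page offset is scaled by the page size. */
off_t sie_mmap_offset(long page_size)
{
	return (off_t)KVM_S390_SIE_PAGE_OFFSET * page_size;
}

/*
 * Hypothetical helper: map the one-page SIE control block of a vcpu fd.
 * Returns NULL if the mapping cannot be established.
 */
void *map_sie_block(int vcpu_fd)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *p;

	if (page_size <= 0)
		return NULL;
	p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		 vcpu_fd, sie_mmap_offset(page_size));
	return p == MAP_FAILED ? NULL : p;
}
```

With the posted kvm_arch_vcpu_fault(), touching this mapping on a regular (non-ucontrol) VM would end in SIGBUS, so a caller would presumably check the ucontrol capability first.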
* [patch 05/12] [PATCH] kvm-s390-ucontrol: disable in-kernel handling of SIE intercepts
2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte
` (3 preceding siblings ...)
2011-12-09 11:23 ` [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block " Carsten Otte
@ 2011-12-09 11:23 ` Carsten Otte
2011-12-09 11:23 ` [patch 06/12] [PATCH] kvm-s390-ucontrol: disable in-kernel irq stack Carsten Otte
` (6 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 11:23 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner

[-- Attachment #1: no-sie-intercepts-for-ucontrol.patch --]
[-- Type: text/plain, Size: 823 bytes --]

This patch disables in-kernel handling of SIE intercepts for user
controlled virtual machines. All intercepts are passed to userspace
via KVM_EXIT_SIE exit reason just like SIE intercepts that cannot be
handled in-kernel for regular KVM guests.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===================================================================
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -572,7 +572,10 @@ rerun_vcpu:
 		rc = __vcpu_run(vcpu);
 		if (rc)
 			break;
-		rc = kvm_handle_sie_intercept(vcpu);
+		if (kvm_is_ucontrol(vcpu->kvm))
+			rc = -EOPNOTSUPP;
+		else
+			rc = kvm_handle_sie_intercept(vcpu);
 	} while (!signal_pending(current) && !rc);
 
 	if (rc == SIE_INTERCEPT_RERUNVCPU)

^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 06/12] [PATCH] kvm-s390-ucontrol: disable in-kernel irq stack
2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte
` (4 preceding siblings ...)
2011-12-09 11:23 ` [patch 05/12] [PATCH] kvm-s390-ucontrol: disable in-kernel handling of SIE intercepts Carsten Otte
@ 2011-12-09 11:23 ` Carsten Otte
2011-12-09 11:23 ` [patch 07/12] [PATCH] kvm-s390-ucontrol: interface to inject faults on a vcpu page table Carsten Otte
` (5 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 11:23 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner

[-- Attachment #1: kvm-disable-irq-stack.patch --]
[-- Type: text/plain, Size: 750 bytes --]

This patch disables the in-kernel interrupt stack for KVM virtual
machines that are controlled by user. Userspace has to take care
of handling interrupts on its own.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===================================================================
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -511,7 +511,8 @@ static int __vcpu_run(struct kvm_vcpu *v
 	if (test_thread_flag(TIF_MCCK_PENDING))
 		s390_handle_mcck();
 
-	kvm_s390_deliver_pending_interrupts(vcpu);
+	if (!kvm_is_ucontrol(vcpu->kvm))
+		kvm_s390_deliver_pending_interrupts(vcpu);
 
 	vcpu->arch.sie_block->icptcode = 0;
 	local_irq_disable();

^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 07/12] [PATCH] kvm-s390-ucontrol: interface to inject faults on a vcpu page table
2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte
` (5 preceding siblings ...)
2011-12-09 11:23 ` [patch 06/12] [PATCH] kvm-s390-ucontrol: disable in-kernel irq stack Carsten Otte
@ 2011-12-09 11:23 ` Carsten Otte
2011-12-09 11:23 ` [patch 08/12] [PATCH] kvm-s390-ucontrol: disable sca Carsten Otte
` (4 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 11:23 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner

[-- Attachment #1: guest-fault-interface.patch --]
[-- Type: text/plain, Size: 2502 bytes --]

This patch allows the user to fault in pages on a virtual cpu's
address space for user controlled virtual machines. Typically this
is superfluous because userspace can just create a mapping and
let the kernel's page fault logic take care of it. There is one
exception: SIE won't start if the lowcore is not present. Normally
the kernel takes care of this [handle_validity() in
arch/s390/kvm/intercept.c] but since the kernel does not handle
intercepts for user controlled virtual machines, userspace needs to
be able to handle this condition.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
---
 Documentation/virtual/kvm/api.txt |   16 ++++++++++++++++
 arch/s390/kvm/kvm-s390.c          |    6 ++++++
 include/linux/kvm.h               |    1 +
 3 files changed, 23 insertions(+)

--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1498,6 +1498,22 @@ This ioctl unmaps the memory in the vcpu
 "vcpu_addr" with the length "length". The field "user_addr" is ignored.
 All parameters need to be alligned by 1 megabyte.
 
+4.66 KVM_S390_VCPU_FAULT
+
+Capability: KVM_CAP_UCONTROL
+Architectures: s390
+Type: vcpu ioctl
+Parameters: vcpu absolute address (in)
+Returns: 0 in case of success
+
+This call creates a page table entry on the virtual cpu's address space
+(for user controlled virtual machines) or the virtual machine's address
+space (for regular virtual machines). This only works for minor faults,
+thus it's recommended to access subject memory page via the user page
+table upfront. This is useful to handle validity intercepts for user
+controlled virtual machines to fault in the virtual cpu's lowcore pages
+prior to calling the KVM_RUN ioctl.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -767,6 +767,12 @@ long kvm_arch_vcpu_ioctl(struct file *fi
 		break;
 	}
 #endif
+	case KVM_S390_VCPU_FAULT: {
+		r = gmap_fault(arg, vcpu->arch.gmap);
+		if (!IS_ERR_VALUE(r))
+			r = 0;
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -673,6 +673,7 @@ struct kvm_s390_ucas_mapping {
 };
 #define KVM_S390_UCAS_MAP _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
+#define KVM_S390_VCPU_FAULT _IOW(KVMIO, 0x52, unsigned long)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP _IO(KVMIO, 0x60)

^ permalink raw reply [flat|nested] 33+ messages in thread
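The usage pattern described in the 4.66 text (touch the page through the user mapping first, then issue the ioctl) might look roughly like this from userspace. This is only a sketch under stated assumptions: the ioctl number is copied from the hunk above, KVMIO is taken to be 0xAE as in <linux/kvm.h>, the address is passed by value because the posted handler hands `arg` straight to gmap_fault(), and the helper name fault_in_guest_page is hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Assumptions: fallbacks for systems whose headers lack these definitions. */
#ifndef KVMIO
#define KVMIO 0xAE
#endif
#ifndef KVM_S390_VCPU_FAULT
#define KVM_S390_VCPU_FAULT _IOW(KVMIO, 0x52, unsigned long)
#endif

/*
 * Hypothetical helper: make sure a guest page (for example the lowcore)
 * has a page table entry before KVM_RUN is called.
 * host_page points at the userspace mapping backing vcpu_addr.
 * Returns 0 on success or a negative errno value.
 */
int fault_in_guest_page(int vcpu_fd, const volatile unsigned char *host_page,
			unsigned long vcpu_addr)
{
	/* Read the page via the user page table first, so that the
	 * ioctl below only has to resolve a minor fault. */
	(void)*host_page;

	if (ioctl(vcpu_fd, KVM_S390_VCPU_FAULT, vcpu_addr) < 0)
		return -errno;
	return 0;
}
```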
* [patch 08/12] [PATCH] kvm-s390-ucontrol: disable sca 2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte ` (6 preceding siblings ...) 2011-12-09 11:23 ` [patch 07/12] [PATCH] kvm-s390-ucontrol: interface to inject faults on a vcpu page table Carsten Otte @ 2011-12-09 11:23 ` Carsten Otte 2011-12-09 11:23 ` [patch 09/12] [PATCH] kvm-s390: fix assumption for KVM_MAX_VCPUS Carsten Otte ` (3 subsequent siblings) 11 siblings, 0 replies; 33+ messages in thread From: Carsten Otte @ 2011-12-09 11:23 UTC (permalink / raw) To: Avi Kivity, Marcelo Tossati Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner [-- Attachment #1: do-not-add-ucontrol-cpus-to-sca.patch --] [-- Type: text/plain, Size: 2061 bytes --] This patch makes sure user controlled virtual machines do not use a system control area (sca). This is needed in order to create virtual machines with more cpus than the size of the sca [64]. Signed-off-by: Carsten Otte <cotte@de.ibm.com> --- Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c =================================================================== --- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c +++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c @@ -234,10 +234,13 @@ out_err: void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) { VCPU_EVENT(vcpu, 3, "%s", "free cpu"); - clear_bit(63 - vcpu->vcpu_id, (unsigned long *) &vcpu->kvm->arch.sca->mcn); - if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda == - (__u64) vcpu->arch.sie_block) - vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0; + if (!kvm_is_ucontrol(vcpu->kvm)) { + clear_bit(63 - vcpu->vcpu_id, + (unsigned long *) &vcpu->kvm->arch.sca->mcn); + if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda == + (__u64) vcpu->arch.sie_block) + vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0; + } smp_mb(); if (kvm_is_ucontrol(vcpu->kvm)) @@ -374,12 +377,19 @@ struct kvm_vcpu *kvm_arch_vcpu_create(st goto out_free_cpu; 
vcpu->arch.sie_block->icpua = id; - BUG_ON(!kvm->arch.sca); - if (!kvm->arch.sca->cpu[id].sda) - kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block; - vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32); - vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca; - set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn); + if (!kvm_is_ucontrol(kvm)) { + if (!kvm->arch.sca) { + WARN_ON_ONCE(1); + goto out_free_cpu; + } + if (!kvm->arch.sca->cpu[id].sda) + kvm->arch.sca->cpu[id].sda = + (__u64) vcpu->arch.sie_block; + vcpu->arch.sie_block->scaoh = + (__u32)(((__u64)kvm->arch.sca) >> 32); + vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca; + set_bit(63 - id, (unsigned long *) &kvm->arch.sca->mcn); + } spin_lock_init(&vcpu->arch.local_int.lock); INIT_LIST_HEAD(&vcpu->arch.local_int.list); ^ permalink raw reply [flat|nested] 33+ messages in thread
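To illustrate the limit mentioned in the description: the regular path sets bit `63 - id` in the 64-bit mcn field, so only cpu ids 0..63 have a bit there. A minimal userspace model follows; the names SCA_SLOTS and sca_mcn_bit are hypothetical and exist only for this sketch.

```c
#include <stdint.h>

/* Size of the sca as stated in the patch description. */
#define SCA_SLOTS 64

/*
 * Mask that set_bit(63 - id, &mcn) would set for a given cpu id.
 * Returns 0 when the id has no slot, which is the case the patch
 * avoids by skipping the sca for user controlled VMs.
 */
uint64_t sca_mcn_bit(unsigned int id)
{
	if (id >= SCA_SLOTS)
		return 0;
	return UINT64_C(1) << (63 - id);
}
```

Cpu 0 therefore maps to the most significant bit and cpu 63 to the least significant one.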
* [patch 09/12] [PATCH] kvm-s390: fix assumption for KVM_MAX_VCPUS
2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte
` (7 preceding siblings ...)
2011-12-09 11:23 ` [patch 08/12] [PATCH] kvm-s390-ucontrol: disable sca Carsten Otte
@ 2011-12-09 11:23 ` Carsten Otte
2011-12-09 11:23 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
` (2 subsequent siblings)
11 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 11:23 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner

[-- Attachment #1: allow-more-than-64-vcpus.patch --]
[-- Type: text/plain, Size: 854 bytes --]

This patch fixes definition of the idle_mask and the local_int array
in kvm_s390_float_interrupt. Previous definition had 64 cpus max
hardcoded instead of using KVM_MAX_VCPUS.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
Index: linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
===================================================================
--- linux-2.5-cecsim.orig/arch/s390/include/asm/kvm_host.h
+++ linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h
@@ -220,8 +220,9 @@ struct kvm_s390_float_interrupt {
 	struct list_head list;
 	atomic_t active;
 	int next_rr_cpu;
-	unsigned long idle_mask [(64 + sizeof(long) - 1) / sizeof(long)];
-	struct kvm_s390_local_interrupt *local_int[64];
+	unsigned long idle_mask[(KVM_MAX_VCPUS + sizeof(long) - 1)
+				/ sizeof(long)];
+	struct kvm_s390_local_interrupt *local_int[KVM_MAX_VCPUS];
 };
 
 

^ permalink raw reply [flat|nested] 33+ messages in thread
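A side note on the idle_mask sizing: sizeof(long) counts bytes, not bits, so the expression in the patch appears to reserve more words than a one-bit-per-cpu mask needs. As far as I can tell this is harmless over-allocation rather than a bug. The kernel's BITS_TO_LONGS() and DECLARE_BITMAP() helpers compute the bit-based size. A small userspace sketch of the difference, with hypothetical function names:

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in an unsigned long on the build machine. */
#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Array length as written in the patch: rounds up by bytes per long. */
size_t words_as_posted(size_t nbits)
{
	return (nbits + sizeof(long) - 1) / sizeof(long);
}

/* Array length a bitmask of nbits actually needs (what BITS_TO_LONGS does). */
size_t words_needed(size_t nbits)
{
	return (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
}
```

On a 64-bit build, 64 cpus need one word, while the posted expression yields eight.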
* [patch 10/12] [PATCH] kvm-s390: storage key interface 2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte ` (8 preceding siblings ...) 2011-12-09 11:23 ` [patch 09/12] [PATCH] kvm-s390: fix assumption for KVM_MAX_VCPUS Carsten Otte @ 2011-12-09 11:23 ` Carsten Otte 2011-12-09 12:04 ` Heiko Carstens [not found] ` <OFEDB7DC8E.0D8BE463-ONC1257961.004633DF-C1257961.0046C1E7@de.ibm.com> 2011-12-09 11:23 ` [patch 11/12] [PATCH] kvm-s390-ucontrol: announce capability for user controlled vms Carsten Otte 2011-12-09 11:23 ` [patch 12/12] [PATCH] kvm-s390: Fix return code for unknown ioctl numbers Carsten Otte 11 siblings, 2 replies; 33+ messages in thread From: Carsten Otte @ 2011-12-09 11:23 UTC (permalink / raw) To: Avi Kivity, Marcelo Tossati Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner [-- Attachment #1: skey-new.patch --] [-- Type: text/plain, Size: 9114 bytes --] This patch introduces an interface to access the guest visible storage keys. It supports three operations that model the behavior that SSKE/ISKE/RRBE instructions would have if they were issued by the guest. These instructions are all documented in the z architecture principles of operation book. Signed-off-by: Carsten Otte <cotte@de.ibm.com> --- --- Documentation/virtual/kvm/api.txt | 38 +++++++++++++ arch/s390/include/asm/kvm_host.h | 4 + arch/s390/include/asm/pgtable.h | 1 arch/s390/kvm/kvm-s390.c | 106 ++++++++++++++++++++++++++++++++++++-- arch/s390/mm/pgtable.c | 70 ++++++++++++++++++------- include/linux/kvm.h | 7 ++ 6 files changed, 205 insertions(+), 21 deletions(-) --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1514,6 +1514,44 @@ table upfront. This is useful to handle controlled virtual machines to fault in the virtual cpu's lowcore pages prior to calling the KVM_RUN ioctl. 
+4.67 KVM_S390_KEYOP + +Capability: KVM_CAP_UCONTROL +Architectures: s390 +Type: vm ioctl +Parameters: struct kvm_s390_keyop (in+out) +Returns: 0 in case of success + +The parameter looks like this: + struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; + }; + +user_addr contains the userspace address of a memory page +key contains the guest visible storage key as defined by the + z Architecture Principles of Operation book, including key + value for key controlled storage protection, the fetch + protection bit, and the reference and change indicator bits +operation indicates the key operation that should be performed + +The following operations are supported: +KVM_S390_KEYOP_SSKE: + This operation behaves just like the set storage key extended (SSKE) + instruction would, if it were issued by the guest. The storage key + provided in "key" is placed in the guest visible storage key. +KVM_S390_KEYOP_ISKE: + This operation behaves just like the insert storage key extended (ISKE) + instruction would, if it were issued by the guest. After this call, + the guest visible storage key is presented in the "key" field. +KVM_S390_KEYOP_RRBE: + This operation behaves just like the reset referenced bit extended + (RRBE) instruction would, if it were issued by the guest. The guest + visible reference bit is cleared, and the value presented in the "key" + field after this call has the reference bit set to 1 in case the + guest view of the reference bit was 1 prior to this call. + 5. 
The kvm_run structure Application code obtains a pointer to the kvm_run structure by --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -24,6 +24,10 @@ /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 +#define KVM_S390_KEYOP_SSKE 0x01 +#define KVM_S390_KEYOP_ISKE 0x02 +#define KVM_S390_KEYOP_RRBE 0x03 + struct sca_entry { atomic_t scn; __u32 reserved; --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1254,6 +1254,7 @@ static inline pte_t mk_swap_pte(unsigned extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); +extern pte_t *ptep_for_addr(unsigned long addr); /* * No page table caches to initialise --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -112,13 +112,113 @@ void kvm_arch_exit(void) { } +static long kvm_s390_keyop(struct kvm_s390_keyop *kop) +{ + unsigned long addr = kop->user_addr; + pte_t *ptep; + pgste_t pgste; + int r; + unsigned long skey; + unsigned long bits; + + /* make sure this process is a hypervisor */ + r = -EINVAL; + if (!mm_has_pgste(current->mm)) + goto out; + + r = -ENXIO; + if (addr >= PGDIR_SIZE) + goto out; + + spin_lock(&current->mm->page_table_lock); + ptep = ptep_for_addr(addr); + if (!ptep) + goto out_unlock; + + pgste = pgste_get_lock(ptep); + + switch (kop->operation) { + case KVM_S390_KEYOP_SSKE: + pgste = pgste_update_all(ptep, pgste); + /* set the real key back w/o rc bits */ + skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + if (pte_present(*ptep)) { + page_set_storage_key(pte_val(*ptep), skey, 1); + /* avoid race clobbering changed bit */ + pte_val(*ptep) |= _PAGE_SWC; + } + /* put acc+f plus guest refereced and changed into the pgste */ + pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT + | RCP_GC_BIT); + bits = (kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT)); + pgste_val(pgste) |= bits 
<< 56; + bits = (kop->key & (_PAGE_CHANGED | _PAGE_REFERENCED)); + pgste_val(pgste) |= bits << 48; + r = 0; + break; + case KVM_S390_KEYOP_ISKE: + if (pte_present(*ptep)) { + skey = page_get_storage_key(pte_val(*ptep)); + kop->key = skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } else { + skey = 0; + kop->key = (pgste_val(pgste) >> 56) & + (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } + kop->key |= skey & (_PAGE_CHANGED | _PAGE_REFERENCED); + kop->key |= (pgste_val(pgste) >> 48) & + (_PAGE_CHANGED | _PAGE_REFERENCED); + r = 0; + break; + case KVM_S390_KEYOP_RRBE: + pgste = pgste_update_all(ptep, pgste); + kop->key = 0; + if (pgste_val(pgste) & RCP_GR_BIT) + kop->key |= _PAGE_REFERENCED; + pgste_val(pgste) &= ~RCP_GR_BIT; + r = 0; + break; + default: + r = -EINVAL; + } + pgste_set_unlock(ptep, pgste); + +out_unlock: + spin_unlock(&current->mm->page_table_lock); +out: + return r; +} + /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - if (ioctl == KVM_S390_ENABLE_SIE) - return s390_enable_sie(); - return -EINVAL; + void __user *argp = (void __user *)arg; + int r; + + switch (ioctl) { + case KVM_S390_ENABLE_SIE: + r = s390_enable_sie(); + break; + case KVM_S390_KEYOP: { + struct kvm_s390_keyop kop; + r = -EFAULT; + if (copy_from_user(&kop, argp, sizeof(struct kvm_s390_keyop))) + break; + r = kvm_s390_keyop(&kop); + if (r) + break; + r = -EFAULT; + if (copy_to_user(argp, &kop, sizeof(struct kvm_s390_keyop))) + break; + r = 0; + break; + } + default: + r = -ENOTTY; + } + + return r; } int kvm_dev_ioctl_check_extension(long ext) --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -393,6 +393,33 @@ out_unmap: } EXPORT_SYMBOL_GPL(gmap_map_segment); +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr) +{ + struct vm_area_struct *vma; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + + vma = find_vma(mm, addr); + if (!vma) + return ERR_PTR(-EINVAL); + + pgd = pgd_offset(mm, addr); + pud = pud_alloc(mm, pgd, 
addr); + if (!pud) + return ERR_PTR(-ENOMEM); + + pmd = pmd_alloc(mm, pud, addr); + if (!pmd) + return ERR_PTR(-ENOMEM); + + if (!pmd_present(*pmd) && + __pte_alloc(mm, vma, pmd, addr)) + return ERR_PTR(-ENOMEM); + + return pmd; +} + /* * this function is assumed to be called with mmap_sem held */ @@ -402,10 +429,7 @@ unsigned long __gmap_fault(unsigned long struct mm_struct *mm; struct gmap_pgtable *mp; struct gmap_rmap *rmap; - struct vm_area_struct *vma; struct page *page; - pgd_t *pgd; - pud_t *pud; pmd_t *pmd; current->thread.gmap_addr = address; @@ -433,21 +457,11 @@ unsigned long __gmap_fault(unsigned long return mp->vmaddr | (address & ~PMD_MASK); } else if (segment & _SEGMENT_ENTRY_RO) { vmaddr = segment & _SEGMENT_ENTRY_ORIGIN; - vma = find_vma(mm, vmaddr); - if (!vma || vma->vm_start > vmaddr) - return -EFAULT; - - /* Walk the parent mm page table */ - pgd = pgd_offset(mm, vmaddr); - pud = pud_alloc(mm, pgd, vmaddr); - if (!pud) - return -ENOMEM; - pmd = pmd_alloc(mm, pud, vmaddr); - if (!pmd) - return -ENOMEM; - if (!pmd_present(*pmd) && - __pte_alloc(mm, vma, pmd, vmaddr)) - return -ENOMEM; + + pmd = __pmdp_for_addr(mm, vmaddr); + if (IS_ERR(pmd)) + return PTR_ERR(pmd); + /* pmd now points to a valid segment table entry. 
*/ rmap = kmalloc(sizeof(*rmap), GFP_KERNEL|__GFP_REPEAT); if (!rmap) @@ -806,6 +820,26 @@ int s390_enable_sie(void) } EXPORT_SYMBOL_GPL(s390_enable_sie); +pte_t *ptep_for_addr(unsigned long addr) +{ + pmd_t *pmd; + pte_t *rc; + + down_read(&current->mm->mmap_sem); + + pmd = __pmdp_for_addr(current->mm, addr); + if (IS_ERR(pmd)) { + rc = (pte_t *)pmd; + goto up_out; + } + + rc = pte_offset(pmd, addr); +up_out: + up_read(&current->mm->mmap_sem); + return rc; +} +EXPORT_SYMBOL_GPL(ptep_for_addr); + #if defined(CONFIG_DEBUG_PAGEALLOC) && defined(CONFIG_HIBERNATION) bool kernel_page_present(struct page *page) { --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -449,6 +449,13 @@ struct kvm_ppc_pvinfo { #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) + +struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; +}; +#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop) /* * Check if a kvm extension is available. Argument is extension number, * return is 1 (yes) or 0 (no, sorry). ^ permalink raw reply [flat|nested] 33+ messages in thread
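The pgste bit shuffling in kvm_s390_keyop() can be modelled in plain userspace C. The sketch below covers only the pgste side of SSKE, the not-present branch of ISKE, and the guest reference bit handling of RRBE; it leaves out pgste_update_all() and the real storage key entirely. The KEY_* values are assumptions that mirror what I recall of _PAGE_ACC_BITS, _PAGE_FP_BIT, _PAGE_REFERENCED and _PAGE_CHANGED in the s390 headers of that time, and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumed storage key layout: access bits, fetch protection, R, C. */
#define KEY_ACC_BITS   0xf0u
#define KEY_FP_BIT     0x08u
#define KEY_REFERENCED 0x04u
#define KEY_CHANGED    0x02u

/* SSKE: acc+f go to the top byte of the pgste, guest R/C 48 bits up. */
uint64_t pgste_pack_key(uint64_t pgste, uint8_t key)
{
	uint64_t accf = key & (KEY_ACC_BITS | KEY_FP_BIT);
	uint64_t rc = key & (KEY_REFERENCED | KEY_CHANGED);

	pgste &= ~(((uint64_t)(KEY_ACC_BITS | KEY_FP_BIT) << 56) |
		   ((uint64_t)(KEY_REFERENCED | KEY_CHANGED) << 48));
	return pgste | (accf << 56) | (rc << 48);
}

/* ISKE for a page that is not present: everything comes from the pgste. */
uint8_t pgste_unpack_key(uint64_t pgste)
{
	uint8_t key = (uint8_t)((pgste >> 56) & (KEY_ACC_BITS | KEY_FP_BIT));

	key |= (uint8_t)((pgste >> 48) & (KEY_REFERENCED | KEY_CHANGED));
	return key;
}

/* RRBE: report the old guest reference bit and clear it in the pgste. */
uint8_t pgste_rrbe(uint64_t *pgste)
{
	uint64_t gr = (uint64_t)KEY_REFERENCED << 48;
	uint8_t key = (*pgste & gr) ? KEY_REFERENCED : 0;

	*pgste &= ~gr;
	return key;
}
```

Under these assumptions a key of 0x9e (access key 9, fetch protected, referenced, changed) round-trips through pack and unpack unchanged.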
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
2011-12-09 11:23 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
@ 2011-12-09 12:04 ` Heiko Carstens
[not found] ` <OFEDB7DC8E.0D8BE463-ONC1257961.004633DF-C1257961.0046C1E7@de.ibm.com>
1 sibling, 0 replies; 33+ messages in thread
From: Heiko Carstens @ 2011-12-09 12:04 UTC (permalink / raw)
To: Carsten Otte
Cc: Avi Kivity, Marcelo Tossati, Christian Borntraeger,
Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar,
Jens Freimann, Constantin Werner

On Fri, Dec 09, 2011 at 12:23:36PM +0100, Carsten Otte wrote:
> This patch introduces an interface to access the guest visible
> storage keys. It supports three operations that model the behavior
> that SSKE/ISKE/RRBE instructions would have if they were issued by
> the guest. These instructions are all documented in the z architecture
> principles of operation book.
>
> Signed-off-by: Carsten Otte <cotte@de.ibm.com>

[...]

> +static long kvm_s390_keyop(struct kvm_s390_keyop *kop)
> +{
> +	unsigned long addr = kop->user_addr;
> +	pte_t *ptep;
> +	pgste_t pgste;
> +	int r;
> +	unsigned long skey;
> +	unsigned long bits;
> +
> +	/* make sure this process is a hypervisor */
> +	r = -EINVAL;
> +	if (!mm_has_pgste(current->mm))
> +		goto out;
> +
> +	r = -ENXIO;
> +	if (addr >= PGDIR_SIZE)
> +		goto out;

imho this should be -EFAULT.

> +	spin_lock(&current->mm->page_table_lock);
> +	ptep = ptep_for_addr(addr);
> +	if (!ptep)
> +		goto out_unlock;

ptep is a pointer and may contain an error code, like you implemented it
below. Therefore you need to check for IS_ERR() here.

> +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr)
> +{
> +	struct vm_area_struct *vma;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +
> +	vma = find_vma(mm, addr);
> +	if (!vma)
> +		return ERR_PTR(-EINVAL);

-EFAULT imho. Also, why is this check good enough? As far as I remember
find_vma() only guarantees that addr < vma_end, (if vma != NULL), but it
does not guarantee that addr >= vma_start.

> -		vma = find_vma(mm, vmaddr);
> -		if (!vma || vma->vm_start > vmaddr)
> -			return -EFAULT;

... you used to check for that and also used the proper return code, btw.
Or is there a different reason why the above code is correct?

> +pte_t *ptep_for_addr(unsigned long addr)
> +{
> +	pmd_t *pmd;
> +	pte_t *rc;

Would you mind renaming rc into pte?

> +
> +	down_read(&current->mm->mmap_sem);
> +
> +	pmd = __pmdp_for_addr(current->mm, addr);
> +	if (IS_ERR(pmd)) {
> +		rc = (pte_t *)pmd;
> +		goto up_out;
> +	}
> +
> +	rc = pte_offset(pmd, addr);
> +up_out:
> +	up_read(&current->mm->mmap_sem);
> +	return rc;

^ permalink raw reply [flat|nested] 33+ messages in thread
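Both review points (find_vma() may return an area that starts above the address, and an error-encoding pointer is never NULL) can be reproduced with a small userspace model. Everything here is a stand-in written for this sketch: struct vma, find_vma_model(), and the err_ptr/is_err helpers only imitate the kernel's vm_area_struct, find_vma(), ERR_PTR() and IS_ERR().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified area covering [vm_start, vm_end). */
struct vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/* Like find_vma(): first area with addr < vm_end, or NULL (areas sorted). */
const struct vma *find_vma_model(const struct vma *areas, size_t n,
				 unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < areas[i].vm_end)
			return &areas[i];
	return NULL;
}

/* Userspace imitation of the kernel's error pointer encoding. */
#define SKETCH_MAX_ERRNO 4095

static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Lookup that also rejects addresses falling into a hole below the area. */
const struct vma *lookup_mapped(const struct vma *areas, size_t n,
				unsigned long addr)
{
	const struct vma *v = find_vma_model(areas, n, addr);

	if (!v || v->vm_start > addr)
		return err_ptr(-EFAULT);
	return v;
}
```

For an address in a hole between two areas, find_vma_model() still returns the upper area, and the value returned by lookup_mapped() is non-NULL, so a plain `!ptr` test would let the error through.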
[parent not found: <OFEDB7DC8E.0D8BE463-ONC1257961.004633DF-C1257961.0046C1E7@de.ibm.com>]
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
[not found] ` <OFEDB7DC8E.0D8BE463-ONC1257961.004633DF-C1257961.0046C1E7@de.ibm.com>
@ 2011-12-09 13:37 ` Carsten Otte
0 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 13:37 UTC (permalink / raw)
To: Joachim von Buttlar
Cc: Avi Kivity, borntrae, Constantin Werner, heicars2, huckc,
Jens Freimann, KVM, mschwid2, Marcelo Tossati

On 09.12.2011 13:52, Joachim von Buttlar wrote:
> Shouldn't it be: page_set_storage_key(pte_val(*ptep), skey |
> _PAGE_CHANGED, 1);
>
> + /* avoid race clobbering changed bit
> */
> + pte_val(*ptep) |= _PAGE_SWC;

No, the guest GR/GC bits get set to the value userspace wants down
below (this is set storage key after all), and for the host we turn
on Martin's _PAGE_SWC software bit in the pte to make sure we don't
underindicate changed. As far as I can tell, this should be just fine.

> Typo: /* put acc+f plus guest referenced and changed into the

will fix.

^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 11/12] [PATCH] kvm-s390-ucontrol: announce capability for user controlled vms
2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte
` (9 preceding siblings ...)
2011-12-09 11:23 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
@ 2011-12-09 11:23 ` Carsten Otte
2011-12-09 11:23 ` [patch 12/12] [PATCH] kvm-s390: Fix return code for unknown ioctl numbers Carsten Otte
11 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 11:23 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner

[-- Attachment #1: announce-ucas.patch --]
[-- Type: text/plain, Size: 884 bytes --]

This patch announces a new capability KVM_CAP_UCONTROL that
indicates that kvm can now support virtual machines that are
controlled by userspace.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
---
 arch/s390/kvm/kvm-s390.c |    3 +++
 include/linux/kvm.h      |    1 +
 2 files changed, 4 insertions(+)

--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -229,6 +229,9 @@ int kvm_dev_ioctl_check_extension(long e
 	case KVM_CAP_S390_PSW:
 	case KVM_CAP_S390_GMAP:
 	case KVM_CAP_SYNC_MMU:
+#ifdef CONFIG_KVM_UCONTROL
+	case KVM_CAP_UCONTROL:
+#endif
 		r = 1;
 		break;
 	default:
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -574,6 +574,7 @@ struct kvm_s390_keyop {
 #define KVM_CAP_MAX_VCPUS 66 /* returns max vcpus per vm */
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_S390_GMAP 71
+#define KVM_CAP_UCONTROL 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 

^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 12/12] [PATCH] kvm-s390: Fix return code for unknown ioctl numbers
2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte
` (10 preceding siblings ...)
2011-12-09 11:23 ` [patch 11/12] [PATCH] kvm-s390-ucontrol: announce capability for user controlled vms Carsten Otte
@ 2011-12-09 11:23 ` Carsten Otte
11 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 11:23 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner

[-- Attachment #1: fix-ioctl-return-values.patch --]
[-- Type: text/plain, Size: 583 bytes --]

This patch fixes the return code of kvm_arch_vcpu_ioctl in case
of an unknown ioctl number.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
---
 arch/s390/kvm/kvm-s390.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
===================================================================
--- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c
+++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c
@@ -887,7 +887,7 @@ long kvm_arch_vcpu_ioctl(struct file *fi
 		break;
 	}
 	default:
-		r = -EINVAL;
+		r = -ENOTTY;
 	}
 	return r;
 }

^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 00/12] Ucontrol patchset V6
@ 2011-12-14 12:23 Carsten Otte
2011-12-14 12:23 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
0 siblings, 1 reply; 33+ messages in thread
From: Carsten Otte @ 2011-12-14 12:23 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, agraf

Hi Avi, Hi Marcelo,

I've integrated all review feedback:
- locking is fixed on the storage key path
- symbols have been renamed all over the place:
  CONFIG_KVM_UCONTROL => CONFIG_KVM_S390_UCONTROL
  KVM_CAP_UCONTROL => KVM_CAP_S390_UCONTROL
  KVM_VM_UCONTROL => KVM_VM_S390_UCONTROL
- KVM_VM_REGULAR is gone now, normal VMs should be created with
  KVM_CREATE_VM and 0 as parameter

so long,
Carsten

^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 10/12] [PATCH] kvm-s390: storage key interface 2011-12-14 12:23 [patch 00/12] Ucontrol patchset V6 Carsten Otte @ 2011-12-14 12:23 ` Carsten Otte 2011-12-15 10:28 ` Carsten Otte 0 siblings, 1 reply; 33+ messages in thread From: Carsten Otte @ 2011-12-14 12:23 UTC (permalink / raw) To: Avi Kivity, Marcelo Tossati Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, agraf [-- Attachment #1: skey-new.patch --] [-- Type: text/plain, Size: 9151 bytes --] This patch introduces an interface to access the guest visible storage keys. It supports three operations that model the behavior that SSKE/ISKE/RRBE instructions would have if they were issued by the guest. These instructions are all documented in the z architecture principles of operation book. Signed-off-by: Carsten Otte <cotte@de.ibm.com> --- --- Documentation/virtual/kvm/api.txt | 38 +++++++++++++ arch/s390/include/asm/kvm_host.h | 4 + arch/s390/include/asm/pgtable.h | 1 arch/s390/kvm/kvm-s390.c | 110 ++++++++++++++++++++++++++++++++++++-- arch/s390/mm/pgtable.c | 64 +++++++++++++++------- include/linux/kvm.h | 7 ++ 6 files changed, 203 insertions(+), 21 deletions(-) --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1514,6 +1514,44 @@ table upfront. This is useful to handle controlled virtual machines to fault in the virtual cpu's lowcore pages prior to calling the KVM_RUN ioctl. 
+4.67 KVM_S390_KEYOP + +Capability: KVM_CAP_S390_UCONTROL +Architectures: s390 +Type: vm ioctl +Parameters: struct kvm_s390_keyop (in+out) +Returns: 0 in case of success + +The parameter looks like this: + struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; + }; + +user_addr contains the userspace address of a memory page +key contains the guest visible storage key as defined by the + z Architecture Principles of Operation book, including key + value for key controlled storage protection, the fetch + protection bit, and the reference and change indicator bits +operation indicates the key operation that should be performed + +The following operations are supported: +KVM_S390_KEYOP_SSKE: + This operation behaves just like the set storage key extended (SSKE) + instruction would, if it were issued by the guest. The storage key + provided in "key" is placed in the guest visible storage key. +KVM_S390_KEYOP_ISKE: + This operation behaves just like the insert storage key extended (ISKE) + instruction would, if it were issued by the guest. After this call, + the guest visible storage key is presented in the "key" field. +KVM_S390_KEYOP_RRBE: + This operation behaves just like the reset referenced bit extended + (RRBE) instruction would, if it were issued by the guest. The guest + visible reference bit is cleared, and the value presented in the "key" + field after this call has the reference bit set to 1 in case the + guest view of the reference bit was 1 prior to this call. + 5. 
The kvm_run structure Application code obtains a pointer to the kvm_run structure by --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -24,6 +24,10 @@ /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 +#define KVM_S390_KEYOP_SSKE 0x01 +#define KVM_S390_KEYOP_ISKE 0x02 +#define KVM_S390_KEYOP_RRBE 0x03 + struct sca_entry { atomic_t scn; __u32 reserved; --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1254,6 +1254,7 @@ static inline pte_t mk_swap_pte(unsigned extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); +extern pte_t *ptep_for_addr(unsigned long addr); /* * No page table caches to initialise --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -112,13 +112,117 @@ void kvm_arch_exit(void) { } +static long kvm_s390_keyop(struct kvm_s390_keyop *kop) +{ + unsigned long addr = kop->user_addr; + pte_t *ptep; + pgste_t pgste; + int r; + unsigned long skey; + unsigned long bits; + + /* make sure this process is a hypervisor */ + r = -EINVAL; + if (!mm_has_pgste(current->mm)) + goto out; + + r = -EFAULT; + if (addr >= PGDIR_SIZE) + goto out; + + down_read(¤t->mm->mmap_sem); + ptep = ptep_for_addr(addr); + if (IS_ERR(ptep)) { + r = PTR_ERR(ptep); + goto out_unlock; + } + + spin_lock(¤t->mm->page_table_lock); + pgste = pgste_get_lock(ptep); + + switch (kop->operation) { + case KVM_S390_KEYOP_SSKE: + pgste = pgste_update_all(ptep, pgste); + /* set the real key back w/o rc bits */ + skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + if (pte_present(*ptep)) { + page_set_storage_key(pte_val(*ptep), skey, 1); + /* avoid race clobbering changed bit */ + pte_val(*ptep) |= _PAGE_SWC; + } + /* put acc+f plus guest referenced and changed into the pgste */ + pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT + | RCP_GC_BIT); + bits = 
(kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT)); + pgste_val(pgste) |= bits << 56; + bits = (kop->key & (_PAGE_CHANGED | _PAGE_REFERENCED)); + pgste_val(pgste) |= bits << 48; + r = 0; + break; + case KVM_S390_KEYOP_ISKE: + if (pte_present(*ptep)) { + skey = page_get_storage_key(pte_val(*ptep)); + kop->key = skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } else { + skey = 0; + kop->key = (pgste_val(pgste) >> 56) & + (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } + kop->key |= skey & (_PAGE_CHANGED | _PAGE_REFERENCED); + kop->key |= (pgste_val(pgste) >> 48) & + (_PAGE_CHANGED | _PAGE_REFERENCED); + r = 0; + break; + case KVM_S390_KEYOP_RRBE: + pgste = pgste_update_all(ptep, pgste); + kop->key = 0; + if (pgste_val(pgste) & RCP_GR_BIT) + kop->key |= _PAGE_REFERENCED; + pgste_val(pgste) &= ~RCP_GR_BIT; + r = 0; + break; + default: + r = -EINVAL; + } + pgste_set_unlock(ptep, pgste); + spin_unlock(¤t->mm->page_table_lock); + +out_unlock: + up_read(¤t->mm->mmap_sem); +out: + return r; +} + /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - if (ioctl == KVM_S390_ENABLE_SIE) - return s390_enable_sie(); - return -EINVAL; + void __user *argp = (void __user *)arg; + int r; + + switch (ioctl) { + case KVM_S390_ENABLE_SIE: + r = s390_enable_sie(); + break; + case KVM_S390_KEYOP: { + struct kvm_s390_keyop kop; + r = -EFAULT; + if (copy_from_user(&kop, argp, sizeof(struct kvm_s390_keyop))) + break; + r = kvm_s390_keyop(&kop); + if (r) + break; + r = -EFAULT; + if (copy_to_user(argp, &kop, sizeof(struct kvm_s390_keyop))) + break; + r = 0; + break; + } + default: + r = -ENOTTY; + } + + return r; } int kvm_dev_ioctl_check_extension(long ext) --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -393,6 +393,33 @@ out_unmap: } EXPORT_SYMBOL_GPL(gmap_map_segment); +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr) +{ + struct vm_area_struct *vma; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + + vma = find_vma(mm, 
addr); + if (!vma || (vma->vm_start > addr)) + return ERR_PTR(-EFAULT); + + pgd = pgd_offset(mm, addr); + pud = pud_alloc(mm, pgd, addr); + if (!pud) + return ERR_PTR(-ENOMEM); + + pmd = pmd_alloc(mm, pud, addr); + if (!pmd) + return ERR_PTR(-ENOMEM); + + if (!pmd_present(*pmd) && + __pte_alloc(mm, vma, pmd, addr)) + return ERR_PTR(-ENOMEM); + + return pmd; +} + /* * this function is assumed to be called with mmap_sem held */ @@ -402,10 +429,7 @@ unsigned long __gmap_fault(unsigned long struct mm_struct *mm; struct gmap_pgtable *mp; struct gmap_rmap *rmap; - struct vm_area_struct *vma; struct page *page; - pgd_t *pgd; - pud_t *pud; pmd_t *pmd; current->thread.gmap_addr = address; @@ -433,21 +457,11 @@ unsigned long __gmap_fault(unsigned long return mp->vmaddr | (address & ~PMD_MASK); } else if (segment & _SEGMENT_ENTRY_RO) { vmaddr = segment & _SEGMENT_ENTRY_ORIGIN; - vma = find_vma(mm, vmaddr); - if (!vma || vma->vm_start > vmaddr) - return -EFAULT; - - /* Walk the parent mm page table */ - pgd = pgd_offset(mm, vmaddr); - pud = pud_alloc(mm, pgd, vmaddr); - if (!pud) - return -ENOMEM; - pmd = pmd_alloc(mm, pud, vmaddr); - if (!pmd) - return -ENOMEM; - if (!pmd_present(*pmd) && - __pte_alloc(mm, vma, pmd, vmaddr)) - return -ENOMEM; + + pmd = __pmdp_for_addr(mm, vmaddr); + if (IS_ERR(pmd)) + return PTR_ERR(pmd); + /* pmd now points to a valid segment table entry. 
*/ rmap = kmalloc(sizeof(*rmap), GFP_KERNEL|__GFP_REPEAT); if (!rmap) @@ -806,6 +820,20 @@ int s390_enable_sie(void) } EXPORT_SYMBOL_GPL(s390_enable_sie); +pte_t *ptep_for_addr(unsigned long addr) +{ + pmd_t *pmd; + pte_t *pte; + + pmd = __pmdp_for_addr(current->mm, addr); + if (IS_ERR(pmd)) + return (pte_t *)pmd; + + pte = pte_offset(pmd, addr); + return pte; +} +EXPORT_SYMBOL_GPL(ptep_for_addr); + #if defined(CONFIG_DEBUG_PAGEALLOC) && defined(CONFIG_HIBERNATION) bool kernel_page_present(struct page *page) { --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -450,6 +450,13 @@ struct kvm_ppc_pvinfo { #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) + +struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; +}; +#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop) /* * Check if a kvm extension is available. Argument is extension number, * return is 1 (yes) or 0 (no, sorry). ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface 2011-12-14 12:23 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte @ 2011-12-15 10:28 ` Carsten Otte 2011-12-15 16:11 ` Heiko Carstens 2011-12-15 17:14 ` Martin Schwidefsky 0 siblings, 2 replies; 33+ messages in thread From: Carsten Otte @ 2011-12-15 10:28 UTC (permalink / raw) To: Avi Kivity, Marcelo Tossati Cc: borntrae, heicars2, mschwid2, huckc, KVM, Joachim von Buttlar, Jens Freimann, agraf New version below. Changes: - __pmdp_for_addr and ptep_for_addr now take a vma as argument - check if a vma exists has moved to gmap_fault and kvm_s390_keyop - kvm_s390_keyop verifies that a vma is writable so that it's safe to set the SWC bit Subject: [PATCH] kvm-s390: storage key interface From: Carsten Otte <cotte@de.ibm.com> This patch introduces an interface to access the guest visible storage keys. It supports three operations that model the behavior that SSKE/ISKE/RRBE instructions would have if they were issued by the guest. These instructions are all documented in the z architecture principles of operation book. Signed-off-by: Carsten Otte <cotte@de.ibm.com> --- --- Documentation/virtual/kvm/api.txt | 38 ++++++++++++++ arch/s390/include/asm/kvm_host.h | 4 + arch/s390/include/asm/pgtable.h | 1 arch/s390/kvm/kvm-s390.c | 103 ++++++++++++++++++++++++++++++++++++-- arch/s390/mm/pgtable.c | 70 +++++++++++++++++++------ include/linux/kvm.h | 7 ++ 6 files changed, 202 insertions(+), 21 deletions(-) Index: linux-2.5-cecsim/Documentation/virtual/kvm/api.txt =================================================================== --- linux-2.5-cecsim.orig/Documentation/virtual/kvm/api.txt +++ linux-2.5-cecsim/Documentation/virtual/kvm/api.txt @@ -1494,6 +1494,44 @@ table upfront. This is useful to handle controlled virtual machines to fault in the virtual cpu's lowcore pages prior to calling the KVM_RUN ioctl. 
+4.67 KVM_S390_KEYOP + +Capability: KVM_CAP_S390_UCONTROL +Architectures: s390 +Type: vm ioctl +Parameters: struct kvm_s390_keyop (in+out) +Returns: 0 in case of success + +The parameter looks like this: + struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; + }; + +user_addr contains the userspace address of a memory page +key contains the guest visible storage key as defined by the + z Architecture Principles of Operation book, including key + value for key controlled storage protection, the fetch + protection bit, and the reference and change indicator bits +operation indicates the key operation that should be performed + +The following operations are supported: +KVM_S390_KEYOP_SSKE: + This operation behaves just like the set storage key extended (SSKE) + instruction would, if it were issued by the guest. The storage key + provided in "key" is placed in the guest visible storage key. +KVM_S390_KEYOP_ISKE: + This operation behaves just like the insert storage key extended (ISKE) + instruction would, if it were issued by the guest. After this call, + the guest visible storage key is presented in the "key" field. +KVM_S390_KEYOP_RRBE: + This operation behaves just like the reset referenced bit extended + (RRBE) instruction would, if it were issued by the guest. The guest + visible reference bit is cleared, and the value presented in the "key" + field after this call has the reference bit set to 1 in case the + guest view of the reference bit was 1 prior to this call. + 5. 
The kvm_run structure Application code obtains a pointer to the kvm_run structure by Index: linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h =================================================================== --- linux-2.5-cecsim.orig/arch/s390/include/asm/kvm_host.h +++ linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h @@ -24,6 +24,10 @@ /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 +#define KVM_S390_KEYOP_SSKE 0x01 +#define KVM_S390_KEYOP_ISKE 0x02 +#define KVM_S390_KEYOP_RRBE 0x03 + struct sca_entry { atomic_t scn; __u32 reserved; Index: linux-2.5-cecsim/arch/s390/include/asm/pgtable.h =================================================================== --- linux-2.5-cecsim.orig/arch/s390/include/asm/pgtable.h +++ linux-2.5-cecsim/arch/s390/include/asm/pgtable.h @@ -1237,6 +1237,7 @@ static inline pte_t mk_swap_pte(unsigned extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); +extern pte_t *ptep_for_addr(unsigned long addr, struct vm_area_struct *); /* * No page table caches to initialise Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c =================================================================== --- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c +++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c @@ -112,13 +112,127 @@ void kvm_arch_exit(void) { } +static long kvm_s390_keyop(struct kvm_s390_keyop *kop) +{ + struct vm_area_struct *vma; + unsigned long addr = kop->user_addr; + pte_t *ptep; + pgste_t pgste; + int r; + unsigned long skey; + unsigned long bits; + + /* make sure this process is a hypervisor */ + r = -EINVAL; + if (!mm_has_pgste(current->mm)) + goto out; + + r = -EFAULT; + if (addr >= PGDIR_SIZE) + goto out; + + down_read(¤t->mm->mmap_sem); + r = -EFAULT; + vma = find_vma(current->mm, addr); + if (!vma || (vma->vm_start > addr)) + goto out_unlock; + + ptep = ptep_for_addr(addr, vma); + if 
(IS_ERR(ptep)) { + r = PTR_ERR(ptep); + goto out_unlock; + } + + spin_lock(¤t->mm->page_table_lock); + pgste = pgste_get_lock(ptep); + + switch (kop->operation) { + case KVM_S390_KEYOP_SSKE: + if (!(vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) { + r = -EACCES; + break; + } + pgste = pgste_update_all(ptep, pgste); + /* set the real key back w/o rc bits */ + skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + if (pte_present(*ptep)) { + page_set_storage_key(pte_val(*ptep), skey, 1); + /* avoid race clobbering changed bit */ + pte_val(*ptep) |= _PAGE_SWC; + } + /* put acc+f plus guest referenced and changed into the pgste */ + pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT + | RCP_GC_BIT); + bits = (kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT)); + pgste_val(pgste) |= bits << 56; + bits = (kop->key & (_PAGE_CHANGED | _PAGE_REFERENCED)); + pgste_val(pgste) |= bits << 48; + r = 0; + break; + case KVM_S390_KEYOP_ISKE: + if (pte_present(*ptep)) { + skey = page_get_storage_key(pte_val(*ptep)); + kop->key = skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } else { + skey = 0; + kop->key = (pgste_val(pgste) >> 56) & + (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } + kop->key |= skey & (_PAGE_CHANGED | _PAGE_REFERENCED); + kop->key |= (pgste_val(pgste) >> 48) & + (_PAGE_CHANGED | _PAGE_REFERENCED); + r = 0; + break; + case KVM_S390_KEYOP_RRBE: + pgste = pgste_update_all(ptep, pgste); + kop->key = 0; + if (pgste_val(pgste) & RCP_GR_BIT) + kop->key |= _PAGE_REFERENCED; + pgste_val(pgste) &= ~RCP_GR_BIT; + r = 0; + break; + default: + r = -EINVAL; + } + pgste_set_unlock(ptep, pgste); + spin_unlock(¤t->mm->page_table_lock); + +out_unlock: + up_read(¤t->mm->mmap_sem); +out: + return r; +} + /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - if (ioctl == KVM_S390_ENABLE_SIE) - return s390_enable_sie(); - return -EINVAL; + void __user *argp = (void __user *)arg; + int r; + + switch (ioctl) { + case KVM_S390_ENABLE_SIE: + r = 
s390_enable_sie(); + break; + case KVM_S390_KEYOP: { + struct kvm_s390_keyop kop; + r = -EFAULT; + if (copy_from_user(&kop, argp, sizeof(struct kvm_s390_keyop))) + break; + r = kvm_s390_keyop(&kop); + if (r) + break; + r = -EFAULT; + if (copy_to_user(argp, &kop, sizeof(struct kvm_s390_keyop))) + break; + r = 0; + break; + } + default: + r = -ENOTTY; + } + + return r; } int kvm_dev_ioctl_check_extension(long ext) Index: linux-2.5-cecsim/arch/s390/mm/pgtable.c =================================================================== --- linux-2.5-cecsim.orig/arch/s390/mm/pgtable.c +++ linux-2.5-cecsim/arch/s390/mm/pgtable.c @@ -382,6 +382,29 @@ out_unmap: } EXPORT_SYMBOL_GPL(gmap_map_segment); +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr) +{ + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + + pgd = pgd_offset(mm, addr); + pud = pud_alloc(mm, pgd, addr); + if (!pud) + return ERR_PTR(-ENOMEM); + + pmd = pmd_alloc(mm, pud, addr); + if (!pmd) + return ERR_PTR(-ENOMEM); + + if (!pmd_present(*pmd) && + __pte_alloc(mm, vma, pmd, addr)) + return ERR_PTR(-ENOMEM); + + return pmd; +} + /* * this function is assumed to be called with mmap_sem held */ @@ -391,10 +414,8 @@ unsigned long __gmap_fault(unsigned long struct mm_struct *mm; struct gmap_pgtable *mp; struct gmap_rmap *rmap; - struct vm_area_struct *vma; struct page *page; - pgd_t *pgd; - pud_t *pud; + struct vm_area_struct *vma; pmd_t *pmd; current->thread.gmap_addr = address; @@ -422,21 +443,15 @@ unsigned long __gmap_fault(unsigned long return mp->vmaddr | (address & ~PMD_MASK); } else if (segment & _SEGMENT_ENTRY_RO) { vmaddr = segment & _SEGMENT_ENTRY_ORIGIN; + vma = find_vma(mm, vmaddr); - if (!vma || vma->vm_start > vmaddr) + if (!vma || (vma->vm_start > vmaddr)) return -EFAULT; - /* Walk the parent mm page table */ - pgd = pgd_offset(mm, vmaddr); - pud = pud_alloc(mm, pgd, vmaddr); - if (!pud) - return -ENOMEM; - pmd = pmd_alloc(mm, pud, vmaddr); - if (!pmd) - return 
-ENOMEM; - if (!pmd_present(*pmd) && - __pte_alloc(mm, vma, pmd, vmaddr)) - return -ENOMEM; + pmd = __pmdp_for_addr(mm, vma, vmaddr); + if (IS_ERR(pmd)) + return PTR_ERR(pmd); + /* pmd now points to a valid segment table entry. */ rmap = kmalloc(sizeof(*rmap), GFP_KERNEL|__GFP_REPEAT); if (!rmap) @@ -795,6 +810,20 @@ int s390_enable_sie(void) } EXPORT_SYMBOL_GPL(s390_enable_sie); +pte_t *ptep_for_addr(unsigned long addr, struct vm_area_struct *vma) +{ + pmd_t *pmd; + pte_t *pte; + + pmd = __pmdp_for_addr(current->mm, vma, addr); + if (IS_ERR(pmd)) + return (pte_t *)pmd; + + pte = pte_offset(pmd, addr); + return pte; +} +EXPORT_SYMBOL_GPL(ptep_for_addr); + #if defined(CONFIG_DEBUG_PAGEALLOC) && defined(CONFIG_HIBERNATION) bool kernel_page_present(struct page *page) { Index: linux-2.5-cecsim/include/linux/kvm.h =================================================================== --- linux-2.5-cecsim.orig/include/linux/kvm.h +++ linux-2.5-cecsim/include/linux/kvm.h @@ -450,6 +450,13 @@ struct kvm_ppc_pvinfo { #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) + +struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; +}; +#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop) /* * Check if a kvm extension is available. Argument is extension number, * return is 1 (yes) or 0 (no, sorry). ^ permalink raw reply [flat|nested] 33+ messages in thread
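For readers following the api.txt hunk in the patch above, here is a minimal userspace sketch of how the proposed call might be driven. It rests on several assumptions: the struct layout, the operation codes and the ioctl number 0x09 are copied from this patch rather than from any released kernel header, KVMIO is taken to be 0xAE as in linux/kvm.h, and the helper names keyop_get and keyop_set are hypothetical. The documentation hunk describes a vm ioctl while the code dispatches the call from kvm_arch_dev_ioctl, so the sketch simply takes whichever file descriptor the caller considers correct.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Copied from the patch; assumed, not part of a released uapi header. */
struct kvm_s390_keyop {
	uint64_t user_addr;
	uint8_t  key;
	uint8_t  operation;
};

#define KVM_S390_KEYOP_SSKE 0x01
#define KVM_S390_KEYOP_ISKE 0x02
#define KVM_S390_KEYOP_RRBE 0x03

#define KVMIO          0xAE	/* assumed to match linux/kvm.h */
#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop)

/* Hypothetical helper: read the guest visible key of the page at 'page'. */
int keyop_get(int fd, void *page, uint8_t *key)
{
	struct kvm_s390_keyop kop = {
		.user_addr = (uint64_t)(uintptr_t)page,
		.operation = KVM_S390_KEYOP_ISKE,
	};

	if (ioctl(fd, KVM_S390_KEYOP, &kop) < 0)
		return -1;	/* errno is set by ioctl() */
	*key = kop.key;		/* the struct is in+out: key comes back here */
	return 0;
}

/* Hypothetical helper: set the guest visible key of the page at 'page'. */
int keyop_set(int fd, void *page, uint8_t key)
{
	struct kvm_s390_keyop kop = {
		.user_addr = (uint64_t)(uintptr_t)page,
		.key       = key,
		.operation = KVM_S390_KEYOP_SSKE,
	};

	return ioctl(fd, KVM_S390_KEYOP, &kop) < 0 ? -1 : 0;
}
```

On a kernel without this patch the call is expected to fail, so a caller would want to check the UCONTROL capability first and treat an error from either helper as "not supported".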
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
  2011-12-15 10:28   ` Carsten Otte
@ 2011-12-15 16:11     ` Heiko Carstens
  2011-12-15 16:49       ` Christian Borntraeger
  2011-12-15 17:14     ` Martin Schwidefsky
  1 sibling, 1 reply; 33+ messages in thread
From: Heiko Carstens @ 2011-12-15 16:11 UTC (permalink / raw)
To: Carsten Otte
Cc: Avi Kivity, Marcelo Tossati, borntrae, heicars2, mschwid2, huckc, KVM,
Joachim von Buttlar, Jens Freimann, agraf

On Thu, Dec 15, 2011 at 11:28:03AM +0100, Carsten Otte wrote:
> New version below. Changes:
> - __pmdp_for_addr and ptep_for_addr now take a vma as argument
> - check if a vma exists has moved to gmap_fault and kvm_s390_keyop
> - kvm_s390_keyop verifies that a vma is writable so that it's safe to
>   set the SWC bit

oh.. cool.

[...]

> +	spin_lock(&current->mm->page_table_lock);
> +	pgste = pgste_get_lock(ptep);
> +
> +	switch (kop->operation) {
> +	case KVM_S390_KEYOP_SSKE:
> +		if (!(vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) {
> +			r = -EACCES;
> +			break;
> +		}

Why again is this needed? Or put in other words: what prevents a guest from
changing the storage key contents via sske of a page that is mapped read-only
into the guest address space?
As far as I can see: nothing. Interestingly I could -in theory- do some nice
stuff like:
- map a file from a read-only filesystem (which doesn't have a writepage
  aops function) into guest address space
- let the guest set the change bit in the storage key of a page that belongs
  to that file mapping via sske
- watch the fun that happens when the host tries to write the page back

But of course I could be totally wrong ;)

This doesn't have anything to do with your patch, it's just that I think you
shouldn't check if the vma is writable or not. It doesn't matter.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
  2011-12-15 16:11     ` Heiko Carstens
@ 2011-12-15 16:49       ` Christian Borntraeger
  2011-12-15 17:18         ` Heiko Carstens
  0 siblings, 1 reply; 33+ messages in thread
From: Christian Borntraeger @ 2011-12-15 16:49 UTC (permalink / raw)
To: Heiko Carstens
Cc: Carsten Otte, Avi Kivity, Marcelo Tossati, borntrae, heicars2, mschwid2,
huckc, KVM, Joachim von Buttlar, Jens Freimann, agraf

On 15/12/11 17:11, Heiko Carstens wrote:
> Why again is this needed? Or put in other words: what prevents a guest from
> changing the storage key contents via sske of a page that is mapped read-only
> into the guest address space?
> As far as I can see: nothing. Interestingly I could -in theory- do some nice
> stuff like:
> - map a file from a read-only filesystem (which doesn't have a writepage
>   aops function) into guest address space
> - let the guest set the change bit in the storage key of a page that belongs
>   to that file mapping via sske
> - watch the fun that happens when the host tries to write the page back

Huh?
The guest itself can neither set the dirty bit of the real storage key nor
set the host change bit of the pgste via guest SSKE. The transition
0->1 will only be done in the guest change bit of the pgste. (Otherwise
we would not have a separate guest/host view of change/referenced)

This interface here is for userspace (to change the guest storage key on behalf
of the guest, e.g. for live guest relocation). Since we might have to touch the
real storage key, and this is host code, millicode will not protect us from
doing stupid things the way it does for guest code, so we had better check
before we touch the real storage key.

Or did I misread your question?

Christian

^ permalink raw reply	[flat|nested] 33+ messages in thread
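The separate guest view that Christian describes can be illustrated with a small model of the pgste bookkeeping from the patch. This is only a sketch under stated assumptions: the key bit values (0xf0, 0x08, 0x04, 0x02) are assumed to match the s390 _PAGE_ACC_BITS, _PAGE_FP_BIT, _PAGE_REFERENCED and _PAGE_CHANGED definitions, the shifts 56 and 48 are the ones used in kvm_s390_keyop, and the model_* names are hypothetical. It covers only the case where the pte is not present, so the real storage key is never touched.

```c
#include <stdint.h>

/* Guest visible storage key bits (assumed values, see lead-in). */
#define KEY_ACC_BITS   0xf0u	/* access-control bits */
#define KEY_FP_BIT     0x08u	/* fetch protection bit */
#define KEY_REFERENCED 0x04u
#define KEY_CHANGED    0x02u

/* Positions the patch uses inside the 64-bit pgste. */
#define PGSTE_ACC_SHIFT 56	/* acc + fp bits */
#define PGSTE_RC_SHIFT  48	/* guest referenced + changed bits */

#define PGSTE_GUEST_MASK \
	(((uint64_t)(KEY_ACC_BITS | KEY_FP_BIT) << PGSTE_ACC_SHIFT) | \
	 ((uint64_t)(KEY_REFERENCED | KEY_CHANGED) << PGSTE_RC_SHIFT))

/* SSKE: replace the guest view of the key, leave all other pgste bits. */
uint64_t model_sske(uint64_t pgste, uint8_t key)
{
	pgste &= ~PGSTE_GUEST_MASK;
	pgste |= (uint64_t)(key & (KEY_ACC_BITS | KEY_FP_BIT)) << PGSTE_ACC_SHIFT;
	pgste |= (uint64_t)(key & (KEY_REFERENCED | KEY_CHANGED)) << PGSTE_RC_SHIFT;
	return pgste;
}

/* ISKE: reassemble the guest visible key from the pgste. */
uint8_t model_iske(uint64_t pgste)
{
	uint8_t key;

	key  = (pgste >> PGSTE_ACC_SHIFT) & (KEY_ACC_BITS | KEY_FP_BIT);
	key |= (pgste >> PGSTE_RC_SHIFT) & (KEY_REFERENCED | KEY_CHANGED);
	return key;
}

/* RRBE: report the old guest reference bit, then clear it. */
uint8_t model_rrbe(uint64_t *pgste)
{
	const uint64_t gr = (uint64_t)KEY_REFERENCED << PGSTE_RC_SHIFT;
	uint8_t key = (*pgste & gr) ? KEY_REFERENCED : 0;

	*pgste &= ~gr;
	return key;
}
```

In this model a guest level SSKE never reaches any host bit: only the masked guest fields of the pgste change, which is the property the reply relies on.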
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
  2011-12-15 16:49       ` Christian Borntraeger
@ 2011-12-15 17:18         ` Heiko Carstens
  0 siblings, 0 replies; 33+ messages in thread
From: Heiko Carstens @ 2011-12-15 17:18 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Carsten Otte, Avi Kivity, Marcelo Tossati, borntrae, heicars2, mschwid2,
huckc, KVM, Joachim von Buttlar, Jens Freimann, agraf

On Thu, Dec 15, 2011 at 05:49:19PM +0100, Christian Borntraeger wrote:
> On 15/12/11 17:11, Heiko Carstens wrote:
> > Why again is this needed? Or put in other words: what prevents a guest from
> > changing the storage key contents via sske of a page that is mapped read-only
> > into the guest address space?
> > As far as I can see: nothing. Interestingly I could -in theory- do some nice
> > stuff like:
> > - map a file from a read-only filesystem (which doesn't have a writepage
> >   aops function) into guest address space
> > - let the guest set the change bit in the storage key of a page that belongs
> >   to that file mapping via sske
> > - watch the fun that happens when the host tries to write the page back
>
> Huh?
> The guest itself can neither set the dirty bit of the real storage key nor
> set the host change bit of the pgste via guest SSKE. The transition
> 0->1 will only be done in the guest change bit of the pgste. (Otherwise
> we would not have a separate guest/host view of change/referenced)

Yeah, I had a major braino..

> This interface here is for userspace (to change the guest storage key on behalf
> of the guest, e.g. for live guest relocation). Since we might have to touch the
> real storage key, and this is host code, millicode will not protect us from
> doing stupid things the way it does for guest code, so we had better check
> before we touch the real storage key.
>
> Or did I misread your question?

No, you did not. However, I still think it's wrong to have an early exit if
the vma is not writable. After all, the guest can set the guest change bit,
but that is not possible with this interface. However, I can see now that the
purpose was to avoid an overindication of the change bit. oh well...

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
  2011-12-15 10:28   ` Carsten Otte
  2011-12-15 16:11     ` Heiko Carstens
@ 2011-12-15 17:14     ` Martin Schwidefsky
  1 sibling, 0 replies; 33+ messages in thread
From: Martin Schwidefsky @ 2011-12-15 17:14 UTC (permalink / raw)
To: Carsten Otte
Cc: Avi Kivity, Marcelo Tossati, borntrae, heicars2, mschwid2, huckc, KVM,
Joachim von Buttlar, Jens Freimann, agraf

On Thu, 15 Dec 2011 11:28:03 +0100
Carsten Otte <cotte@de.ibm.com> wrote:

> +	case KVM_S390_KEYOP_SSKE:
> +		if (!(vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) {
> +			r = -EACCES;
> +			break;
> +		}

Unfortunately I just realized while discussing with Heiko that a check for
VM_WRITE is not enough. We could still have a read-only pte that points to
a file backed page which is purely read-only. A write access is allowed but
would cause copy-on-write. But we set the storage key of the original page
which would make the read-only page dirty. We need to solve the race on
the dirty bit in a clean way, otherwise there always will be a corner case.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* [patch 00/12] Ucontrol patchset V5
@ 2011-12-10 12:35 Carsten Otte
  2011-12-10 12:35 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
  0 siblings, 1 reply; 33+ messages in thread
From: Carsten Otte @ 2011-12-10 12:35 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner, agraf

Hi Avi, Hi Marcelo,

this iteration of the patchset has two changes:
- Handling of null PTEs is fixed (thanks Heiko)
- Typo in comment is fixed (thanks Joachim)

so long,
Carsten

^ permalink raw reply	[flat|nested] 33+ messages in thread
* [patch 10/12] [PATCH] kvm-s390: storage key interface 2011-12-10 12:35 [patch 00/12] Ucontrol patchset V5 Carsten Otte @ 2011-12-10 12:35 ` Carsten Otte 2011-12-11 21:57 ` Heiko Carstens 2011-12-12 10:06 ` Martin Schwidefsky 0 siblings, 2 replies; 33+ messages in thread From: Carsten Otte @ 2011-12-10 12:35 UTC (permalink / raw) To: Avi Kivity, Marcelo Tossati Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner, agraf [-- Attachment #1: skey-new.patch --] [-- Type: text/plain, Size: 9180 bytes --] This patch introduces an interface to access the guest visible storage keys. It supports three operations that model the behavior that SSKE/ISKE/RRBE instructions would have if they were issued by the guest. These instructions are all documented in the z architecture principles of operation book. Signed-off-by: Carsten Otte <cotte@de.ibm.com> --- --- Documentation/virtual/kvm/api.txt | 38 +++++++++++++ arch/s390/include/asm/kvm_host.h | 4 + arch/s390/include/asm/pgtable.h | 1 arch/s390/kvm/kvm-s390.c | 108 ++++++++++++++++++++++++++++++++++++-- arch/s390/mm/pgtable.c | 70 ++++++++++++++++++------ include/linux/kvm.h | 7 ++ 6 files changed, 207 insertions(+), 21 deletions(-) --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1514,6 +1514,44 @@ table upfront. This is useful to handle controlled virtual machines to fault in the virtual cpu's lowcore pages prior to calling the KVM_RUN ioctl. 
+4.67 KVM_S390_KEYOP + +Capability: KVM_CAP_UCONTROL +Architectures: s390 +Type: vm ioctl +Parameters: struct kvm_s390_keyop (in+out) +Returns: 0 in case of success + +The parameter looks like this: + struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; + }; + +user_addr contains the userspace address of a memory page +key contains the guest visible storage key as defined by the + z Architecture Principles of Operation book, including key + value for key controlled storage protection, the fetch + protection bit, and the reference and change indicator bits +operation indicates the key operation that should be performed + +The following operations are supported: +KVM_S390_KEYOP_SSKE: + This operation behaves just like the set storage key extended (SSKE) + instruction would, if it were issued by the guest. The storage key + provided in "key" is placed in the guest visible storage key. +KVM_S390_KEYOP_ISKE: + This operation behaves just like the insert storage key extended (ISKE) + instruction would, if it were issued by the guest. After this call, + the guest visible storage key is presented in the "key" field. +KVM_S390_KEYOP_RRBE: + This operation behaves just like the reset referenced bit extended + (RRBE) instruction would, if it were issued by the guest. The guest + visible reference bit is cleared, and the value presented in the "key" + field after this call has the reference bit set to 1 in case the + guest view of the reference bit was 1 prior to this call. + 5. 
The kvm_run structure Application code obtains a pointer to the kvm_run structure by --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -24,6 +24,10 @@ /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 +#define KVM_S390_KEYOP_SSKE 0x01 +#define KVM_S390_KEYOP_ISKE 0x02 +#define KVM_S390_KEYOP_RRBE 0x03 + struct sca_entry { atomic_t scn; __u32 reserved; --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1254,6 +1254,7 @@ static inline pte_t mk_swap_pte(unsigned extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); +extern pte_t *ptep_for_addr(unsigned long addr); /* * No page table caches to initialise --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -112,13 +112,115 @@ void kvm_arch_exit(void) { } +static long kvm_s390_keyop(struct kvm_s390_keyop *kop) +{ + unsigned long addr = kop->user_addr; + pte_t *ptep; + pgste_t pgste; + int r; + unsigned long skey; + unsigned long bits; + + /* make sure this process is a hypervisor */ + r = -EINVAL; + if (!mm_has_pgste(current->mm)) + goto out; + + r = -EFAULT; + if (addr >= PGDIR_SIZE) + goto out; + + spin_lock(¤t->mm->page_table_lock); + ptep = ptep_for_addr(addr); + if (IS_ERR(ptep)) { + r = PTR_ERR(ptep); + goto out_unlock; + } + + pgste = pgste_get_lock(ptep); + + switch (kop->operation) { + case KVM_S390_KEYOP_SSKE: + pgste = pgste_update_all(ptep, pgste); + /* set the real key back w/o rc bits */ + skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + if (pte_present(*ptep)) { + page_set_storage_key(pte_val(*ptep), skey, 1); + /* avoid race clobbering changed bit */ + pte_val(*ptep) |= _PAGE_SWC; + } + /* put acc+f plus guest referenced and changed into the pgste */ + pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT + | RCP_GC_BIT); + bits = (kop->key & (_PAGE_ACC_BITS | 
_PAGE_FP_BIT)); + pgste_val(pgste) |= bits << 56; + bits = (kop->key & (_PAGE_CHANGED | _PAGE_REFERENCED)); + pgste_val(pgste) |= bits << 48; + r = 0; + break; + case KVM_S390_KEYOP_ISKE: + if (pte_present(*ptep)) { + skey = page_get_storage_key(pte_val(*ptep)); + kop->key = skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } else { + skey = 0; + kop->key = (pgste_val(pgste) >> 56) & + (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } + kop->key |= skey & (_PAGE_CHANGED | _PAGE_REFERENCED); + kop->key |= (pgste_val(pgste) >> 48) & + (_PAGE_CHANGED | _PAGE_REFERENCED); + r = 0; + break; + case KVM_S390_KEYOP_RRBE: + pgste = pgste_update_all(ptep, pgste); + kop->key = 0; + if (pgste_val(pgste) & RCP_GR_BIT) + kop->key |= _PAGE_REFERENCED; + pgste_val(pgste) &= ~RCP_GR_BIT; + r = 0; + break; + default: + r = -EINVAL; + } + pgste_set_unlock(ptep, pgste); + +out_unlock: + spin_unlock(¤t->mm->page_table_lock); +out: + return r; +} + /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - if (ioctl == KVM_S390_ENABLE_SIE) - return s390_enable_sie(); - return -EINVAL; + void __user *argp = (void __user *)arg; + int r; + + switch (ioctl) { + case KVM_S390_ENABLE_SIE: + r = s390_enable_sie(); + break; + case KVM_S390_KEYOP: { + struct kvm_s390_keyop kop; + r = -EFAULT; + if (copy_from_user(&kop, argp, sizeof(struct kvm_s390_keyop))) + break; + r = kvm_s390_keyop(&kop); + if (r) + break; + r = -EFAULT; + if (copy_to_user(argp, &kop, sizeof(struct kvm_s390_keyop))) + break; + r = 0; + break; + } + default: + r = -ENOTTY; + } + + return r; } int kvm_dev_ioctl_check_extension(long ext) --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -393,6 +393,33 @@ out_unmap: } EXPORT_SYMBOL_GPL(gmap_map_segment); +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr) +{ + struct vm_area_struct *vma; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + + vma = find_vma(mm, addr); + if (!vma || (vma->vm_start > addr)) + return 
ERR_PTR(-EFAULT); + + pgd = pgd_offset(mm, addr); + pud = pud_alloc(mm, pgd, addr); + if (!pud) + return ERR_PTR(-ENOMEM); + + pmd = pmd_alloc(mm, pud, addr); + if (!pmd) + return ERR_PTR(-ENOMEM); + + if (!pmd_present(*pmd) && + __pte_alloc(mm, vma, pmd, addr)) + return ERR_PTR(-ENOMEM); + + return pmd; +} + /* * this function is assumed to be called with mmap_sem held */ @@ -402,10 +429,7 @@ unsigned long __gmap_fault(unsigned long struct mm_struct *mm; struct gmap_pgtable *mp; struct gmap_rmap *rmap; - struct vm_area_struct *vma; struct page *page; - pgd_t *pgd; - pud_t *pud; pmd_t *pmd; current->thread.gmap_addr = address; @@ -433,21 +457,11 @@ unsigned long __gmap_fault(unsigned long return mp->vmaddr | (address & ~PMD_MASK); } else if (segment & _SEGMENT_ENTRY_RO) { vmaddr = segment & _SEGMENT_ENTRY_ORIGIN; - vma = find_vma(mm, vmaddr); - if (!vma || vma->vm_start > vmaddr) - return -EFAULT; - - /* Walk the parent mm page table */ - pgd = pgd_offset(mm, vmaddr); - pud = pud_alloc(mm, pgd, vmaddr); - if (!pud) - return -ENOMEM; - pmd = pmd_alloc(mm, pud, vmaddr); - if (!pmd) - return -ENOMEM; - if (!pmd_present(*pmd) && - __pte_alloc(mm, vma, pmd, vmaddr)) - return -ENOMEM; + + pmd = __pmdp_for_addr(mm, vmaddr); + if (IS_ERR(pmd)) + return PTR_ERR(pmd); + /* pmd now points to a valid segment table entry. 
*/ rmap = kmalloc(sizeof(*rmap), GFP_KERNEL|__GFP_REPEAT); if (!rmap) @@ -806,6 +820,26 @@ int s390_enable_sie(void) } EXPORT_SYMBOL_GPL(s390_enable_sie); +pte_t *ptep_for_addr(unsigned long addr) +{ + pmd_t *pmd; + pte_t *pte; + + down_read(&current->mm->mmap_sem); + + pmd = __pmdp_for_addr(current->mm, addr); + if (IS_ERR(pmd)) { + pte = (pte_t *)pmd; + goto up_out; + } + + pte = pte_offset(pmd, addr); +up_out: + up_read(&current->mm->mmap_sem); + return pte; +} +EXPORT_SYMBOL_GPL(ptep_for_addr); + #if defined(CONFIG_DEBUG_PAGEALLOC) && defined(CONFIG_HIBERNATION) bool kernel_page_present(struct page *page) { --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -449,6 +449,13 @@ struct kvm_ppc_pvinfo { #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) + +struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; +}; +#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop) /* * Check if a kvm extension is available. Argument is extension number, * return is 1 (yes) or 0 (no, sorry). ^ permalink raw reply [flat|nested] 33+ messages in thread
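The bit shuffling in kvm_s390_keyop() can be tried out in userspace. The following is a minimal sketch of the pgste side only, assuming the page is not present, so the hardware storage key and pgste_update_all() are left out. The constant values are assumptions (they are not spelled out in the patch and are believed to match the s390 headers of that time), and the model_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed values; the patch only uses the names. */
#define PAGE_CHANGED     0x02u
#define PAGE_REFERENCED  0x04u
#define PAGE_FP_BIT      0x08u
#define PAGE_ACC_BITS    0xf0u

#define RCP_ACC_BITS 0xf000000000000000ULL
#define RCP_FP_BIT   0x0800000000000000ULL
#define RCP_GR_BIT   0x0004000000000000ULL
#define RCP_GC_BIT   0x0002000000000000ULL

/* SSKE: replace ACC, F and the guest R/C bits in the pgste with the key. */
uint64_t model_sske(uint64_t pgste, uint8_t key)
{
	uint64_t bits;

	pgste &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT | RCP_GC_BIT);
	bits = key & (PAGE_ACC_BITS | PAGE_FP_BIT);
	pgste |= bits << 56;
	bits = key & (PAGE_CHANGED | PAGE_REFERENCED);
	pgste |= bits << 48;
	return pgste;
}

/* ISKE for a page that is not present: rebuild the key from the pgste. */
uint8_t model_iske(uint64_t pgste)
{
	uint8_t key;

	key = (pgste >> 56) & (PAGE_ACC_BITS | PAGE_FP_BIT);
	key |= (pgste >> 48) & (PAGE_CHANGED | PAGE_REFERENCED);
	return key;
}

/* RRBE: report the guest reference bit, then clear it in the pgste. */
uint8_t model_rrbe(uint64_t *pgste)
{
	uint8_t key = 0;

	if (*pgste & RCP_GR_BIT)
		key |= PAGE_REFERENCED;
	*pgste &= ~RCP_GR_BIT;
	return key;
}
```

With these assumed values, the shifts by 56 and 48 land the key bits exactly on the RCP_* masks, which is why a key written by model_sske() reads back unchanged through model_iske(), apart from the lowest bit, which is not part of the key.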
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
2011-12-10 12:35 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
@ 2011-12-11 21:57 ` Heiko Carstens
2011-12-12 10:06 ` Martin Schwidefsky
1 sibling, 0 replies; 33+ messages in thread
From: Heiko Carstens @ 2011-12-11 21:57 UTC (permalink / raw)
To: Carsten Otte
Cc: Avi Kivity, Marcelo Tossati, Christian Borntraeger,
Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar,
Jens Freimann, Constantin Werner, agraf
On Sat, Dec 10, 2011 at 01:35:39PM +0100, Carsten Otte wrote:
> This patch introduces an interface to access the guest visible
> storage keys. It supports three operations that model the behavior
> that SSKE/ISKE/RRBE instructions would have if they were issued by
> the guest. These instructions are all documented in the z architecture
> principles of operation book.
>
> Signed-off-by: Carsten Otte <cotte@de.ibm.com>
[...]
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -112,13 +112,115 @@ void kvm_arch_exit(void)
>  {
>  }
>
> +static long kvm_s390_keyop(struct kvm_s390_keyop *kop)
> +{
> +	unsigned long addr = kop->user_addr;
> +	pte_t *ptep;
> +	pgste_t pgste;
> +	int r;
> +	unsigned long skey;
> +	unsigned long bits;
> +
> +	/* make sure this process is a hypervisor */
> +	r = -EINVAL;
> +	if (!mm_has_pgste(current->mm))
> +		goto out;
> +
> +	r = -EFAULT;
> +	if (addr >= PGDIR_SIZE)
> +		goto out;
> +
> +	spin_lock(&current->mm->page_table_lock);
> +	ptep = ptep_for_addr(addr);
Locking is broken; following order is possible:
kvm_s390_keyop()
  - spin_lock(&current->mm->page_table_lock)
  -> ptep_for_addr()
     - down_read(&current->mm->mmap_sem)
       ---> Bug 1, we might schedule here
     -> __pmdp_for_addr()
        -> __pte_alloc()
           - spin_lock(&mm->page_table_lock)
             ---> Bug 2, deadlock
^ permalink raw reply [flat|nested] 33+ messages in thread
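The two bugs in the call chain above can be replayed with a toy lock-state checker in userspace. This is only an illustrative sketch: it tracks which locks are held and counts rule violations, it does not use real kernel locking, and all names in it (lock_state, model_*, replay_*) are hypothetical. The second replay shows one ordering the checker accepts, with the semaphore taken first by the caller; it is not necessarily what the series ended up doing.

```c
#include <stdbool.h>

enum lock_id { PAGE_TABLE_LOCK, MMAP_SEM, NR_LOCKS };

struct lock_state {
	bool held[NR_LOCKS];
	int sleep_in_atomic;	/* sleeping lock taken under a spinlock */
	int self_deadlock;	/* spinlock taken while already held */
};

/* Spinlocks are not recursive: taking a held one can never succeed. */
void model_spin_lock(struct lock_state *s, enum lock_id id)
{
	if (s->held[id])
		s->self_deadlock++;
	s->held[id] = true;
}

void model_spin_unlock(struct lock_state *s, enum lock_id id)
{
	s->held[id] = false;
}

/* A semaphore may sleep, which is not allowed while a spinlock is held. */
void model_down_read(struct lock_state *s, enum lock_id id)
{
	if (s->held[PAGE_TABLE_LOCK])
		s->sleep_in_atomic++;
	s->held[id] = true;
}

void model_up_read(struct lock_state *s, enum lock_id id)
{
	s->held[id] = false;
}

/* Order from the review; returns the number of violations found. */
int replay_reviewed_order(struct lock_state *s)
{
	model_spin_lock(s, PAGE_TABLE_LOCK);	/* kvm_s390_keyop() */
	model_down_read(s, MMAP_SEM);		/* ptep_for_addr(): Bug 1 */
	model_spin_lock(s, PAGE_TABLE_LOCK);	/* __pte_alloc(): Bug 2 */
	return s->sleep_in_atomic + s->self_deadlock;
}

/* Semaphore first, spinlock only taken and dropped inside the walk. */
int replay_sem_first_order(struct lock_state *s)
{
	model_down_read(s, MMAP_SEM);		/* caller */
	model_spin_lock(s, PAGE_TABLE_LOCK);	/* inside __pte_alloc() */
	model_spin_unlock(s, PAGE_TABLE_LOCK);
	model_up_read(s, MMAP_SEM);		/* caller, after using the pte */
	return s->sleep_in_atomic + s->self_deadlock;
}
```

Starting from a zeroed lock_state, the reviewed order reports both violations and the semaphore-first order reports none.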
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
2011-12-10 12:35 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
2011-12-11 21:57 ` Heiko Carstens
@ 2011-12-12 10:06 ` Martin Schwidefsky
1 sibling, 0 replies; 33+ messages in thread
From: Martin Schwidefsky @ 2011-12-12 10:06 UTC (permalink / raw)
To: Carsten Otte
Cc: Avi Kivity, Marcelo Tossati, Christian Borntraeger,
Heiko Carstens, Cornelia Huck, KVM, Joachim von Buttlar,
Jens Freimann, Constantin Werner, agraf
On Sat, 10 Dec 2011 13:35:39 +0100
Carsten Otte <cotte@de.ibm.com> wrote:
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -393,6 +393,33 @@ out_unmap:
>  }
>  EXPORT_SYMBOL_GPL(gmap_map_segment);
>
> +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr)
> +{
> +	struct vm_area_struct *vma;
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +
> +	vma = find_vma(mm, addr);
> +	if (!vma || (vma->vm_start > addr))
> +		return ERR_PTR(-EFAULT);
> +
> +	pgd = pgd_offset(mm, addr);
> +	pud = pud_alloc(mm, pgd, addr);
> +	if (!pud)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pmd = pmd_alloc(mm, pud, addr);
> +	if (!pmd)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (!pmd_present(*pmd) &&
> +	    __pte_alloc(mm, vma, pmd, addr))
> +		return ERR_PTR(-ENOMEM);
> +
> +	return pmd;
> +}
> +
>  /*
>   * this function is assumed to be called with mmap_sem held
>   */
The __pmdp_for_addr function is fine for the usage in __gmap_fault.
> @@ -806,6 +820,26 @@ int s390_enable_sie(void)
>  }
>  EXPORT_SYMBOL_GPL(s390_enable_sie);
>
> +pte_t *ptep_for_addr(unsigned long addr)
> +{
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	down_read(&current->mm->mmap_sem);
> +
> +	pmd = __pmdp_for_addr(current->mm, addr);
> +	if (IS_ERR(pmd)) {
> +		pte = (pte_t *)pmd;
> +		goto up_out;
> +	}
> +
> +	pte = pte_offset(pmd, addr);
> +up_out:
> +	up_read(&current->mm->mmap_sem);
> +	return pte;
> +}
> +EXPORT_SYMBOL_GPL(ptep_for_addr);
> +
>  #if defined(CONFIG_DEBUG_PAGEALLOC) && defined(CONFIG_HIBERNATION)
>  bool kernel_page_present(struct page *page)
>  {
There is a fundamental locking snafu. The pointer to the page table entry
is only valid until the mmap_sem is released. The down_read/up_read has to
be done in the caller.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 00/12] Ucontrol patchset V4
@ 2011-12-09 12:49 Carsten Otte
2011-12-09 12:49 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
0 siblings, 1 reply; 33+ messages in thread
From: Carsten Otte @ 2011-12-09 12:49 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner
Hi Avi, Hi Marcelo,
this version includes review feedback from Heiko.
so long,
Carsten
^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 10/12] [PATCH] kvm-s390: storage key interface 2011-12-09 12:49 [patch 00/12] Ucontrol patchset V4 Carsten Otte @ 2011-12-09 12:49 ` Carsten Otte 2011-12-09 13:46 ` Heiko Carstens 0 siblings, 1 reply; 33+ messages in thread From: Carsten Otte @ 2011-12-09 12:49 UTC (permalink / raw) To: Avi Kivity, Marcelo Tossati Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner [-- Attachment #1: skey-new.patch --] [-- Type: text/plain, Size: 9212 bytes --] This patch introduces an interface to access the guest visible storage keys. It supports three operations that model the behavior that SSKE/ISKE/RRBE instructions would have if they were issued by the guest. These instructions are all documented in the z architecture principles of operation book. Signed-off-by: Carsten Otte <cotte@de.ibm.com> --- --- Documentation/virtual/kvm/api.txt | 38 +++++++++++++ arch/s390/include/asm/kvm_host.h | 4 + arch/s390/include/asm/pgtable.h | 1 arch/s390/kvm/kvm-s390.c | 110 ++++++++++++++++++++++++++++++++++++-- arch/s390/mm/pgtable.c | 70 +++++++++++++++++------- include/linux/kvm.h | 7 ++ 6 files changed, 209 insertions(+), 21 deletions(-) --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1514,6 +1514,44 @@ table upfront. This is useful to handle controlled virtual machines to fault in the virtual cpu's lowcore pages prior to calling the KVM_RUN ioctl. 
+4.67 KVM_S390_KEYOP + +Capability: KVM_CAP_UCONTROL +Architectures: s390 +Type: vm ioctl +Parameters: struct kvm_s390_keyop (in+out) +Returns: 0 in case of success + +The parameter looks like this: + struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; + }; + +user_addr contains the userspace address of a memory page +key contains the guest visible storage key as defined by the + z Architecture Principles of Operation book, including key + value for key controlled storage protection, the fetch + protection bit, and the reference and change indicator bits +operation indicates the key operation that should be performed + +The following operations are supported: +KVM_S390_KEYOP_SSKE: + This operation behaves just like the set storage key extended (SSKE) + instruction would, if it were issued by the guest. The storage key + provided in "key" is placed in the guest visible storage key. +KVM_S390_KEYOP_ISKE: + This operation behaves just like the insert storage key extended (ISKE) + instruction would, if it were issued by the guest. After this call, + the guest visible storage key is presented in the "key" field. +KVM_S390_KEYOP_RRBE: + This operation behaves just like the reset referenced bit extended + (RRBE) instruction would, if it were issued by the guest. The guest + visible reference bit is cleared, and the value presented in the "key" + field after this call has the reference bit set to 1 in case the + guest view of the reference bit was 1 prior to this call. + 5. 
The kvm_run structure Application code obtains a pointer to the kvm_run structure by --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -24,6 +24,10 @@ /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 +#define KVM_S390_KEYOP_SSKE 0x01 +#define KVM_S390_KEYOP_ISKE 0x02 +#define KVM_S390_KEYOP_RRBE 0x03 + struct sca_entry { atomic_t scn; __u32 reserved; --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1254,6 +1254,7 @@ static inline pte_t mk_swap_pte(unsigned extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); +extern pte_t *ptep_for_addr(unsigned long addr); /* * No page table caches to initialise --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -112,13 +112,117 @@ void kvm_arch_exit(void) { } +static long kvm_s390_keyop(struct kvm_s390_keyop *kop) +{ + unsigned long addr = kop->user_addr; + pte_t *ptep; + pgste_t pgste; + int r; + unsigned long skey; + unsigned long bits; + + /* make sure this process is a hypervisor */ + r = -EINVAL; + if (!mm_has_pgste(current->mm)) + goto out; + + r = -EFAULT; + if (addr >= PGDIR_SIZE) + goto out; + + spin_lock(&current->mm->page_table_lock); + ptep = ptep_for_addr(addr); + if (!ptep) + goto out_unlock; + if (IS_ERR(ptep)) { + r = PTR_ERR(ptep); + goto out_unlock; + } + + pgste = pgste_get_lock(ptep); + + switch (kop->operation) { + case KVM_S390_KEYOP_SSKE: + pgste = pgste_update_all(ptep, pgste); + /* set the real key back w/o rc bits */ + skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + if (pte_present(*ptep)) { + page_set_storage_key(pte_val(*ptep), skey, 1); + /* avoid race clobbering changed bit */ + pte_val(*ptep) |= _PAGE_SWC; + } + /* put acc+f plus guest refereced and changed into the pgste */ + pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT + | RCP_GC_BIT); + bits =
(kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT)); + pgste_val(pgste) |= bits << 56; + bits = (kop->key & (_PAGE_CHANGED | _PAGE_REFERENCED)); + pgste_val(pgste) |= bits << 48; + r = 0; + break; + case KVM_S390_KEYOP_ISKE: + if (pte_present(*ptep)) { + skey = page_get_storage_key(pte_val(*ptep)); + kop->key = skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } else { + skey = 0; + kop->key = (pgste_val(pgste) >> 56) & + (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } + kop->key |= skey & (_PAGE_CHANGED | _PAGE_REFERENCED); + kop->key |= (pgste_val(pgste) >> 48) & + (_PAGE_CHANGED | _PAGE_REFERENCED); + r = 0; + break; + case KVM_S390_KEYOP_RRBE: + pgste = pgste_update_all(ptep, pgste); + kop->key = 0; + if (pgste_val(pgste) & RCP_GR_BIT) + kop->key |= _PAGE_REFERENCED; + pgste_val(pgste) &= ~RCP_GR_BIT; + r = 0; + break; + default: + r = -EINVAL; + } + pgste_set_unlock(ptep, pgste); + +out_unlock: + spin_unlock(&current->mm->page_table_lock); +out: + return r; +} + /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - if (ioctl == KVM_S390_ENABLE_SIE) - return s390_enable_sie(); - return -EINVAL; + void __user *argp = (void __user *)arg; + int r; + + switch (ioctl) { + case KVM_S390_ENABLE_SIE: + r = s390_enable_sie(); + break; + case KVM_S390_KEYOP: { + struct kvm_s390_keyop kop; + r = -EFAULT; + if (copy_from_user(&kop, argp, sizeof(struct kvm_s390_keyop))) + break; + r = kvm_s390_keyop(&kop); + if (r) + break; + r = -EFAULT; + if (copy_to_user(argp, &kop, sizeof(struct kvm_s390_keyop))) + break; + r = 0; + break; + } + default: + r = -ENOTTY; + } + + return r; } int kvm_dev_ioctl_check_extension(long ext) --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -393,6 +393,33 @@ out_unmap: } EXPORT_SYMBOL_GPL(gmap_map_segment); +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr) +{ + struct vm_area_struct *vma; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + + vma = find_vma(mm, addr); + if (!vma ||
(vma->vm_start > addr)) + return ERR_PTR(-EFAULT); + + pgd = pgd_offset(mm, addr); + pud = pud_alloc(mm, pgd, addr); + if (!pud) + return ERR_PTR(-ENOMEM); + + pmd = pmd_alloc(mm, pud, addr); + if (!pmd) + return ERR_PTR(-ENOMEM); + + if (!pmd_present(*pmd) && + __pte_alloc(mm, vma, pmd, addr)) + return ERR_PTR(-ENOMEM); + + return pmd; +} + /* * this function is assumed to be called with mmap_sem held */ @@ -402,10 +429,7 @@ unsigned long __gmap_fault(unsigned long struct mm_struct *mm; struct gmap_pgtable *mp; struct gmap_rmap *rmap; - struct vm_area_struct *vma; struct page *page; - pgd_t *pgd; - pud_t *pud; pmd_t *pmd; current->thread.gmap_addr = address; @@ -433,21 +457,11 @@ unsigned long __gmap_fault(unsigned long return mp->vmaddr | (address & ~PMD_MASK); } else if (segment & _SEGMENT_ENTRY_RO) { vmaddr = segment & _SEGMENT_ENTRY_ORIGIN; - vma = find_vma(mm, vmaddr); - if (!vma || vma->vm_start > vmaddr) - return -EFAULT; - - /* Walk the parent mm page table */ - pgd = pgd_offset(mm, vmaddr); - pud = pud_alloc(mm, pgd, vmaddr); - if (!pud) - return -ENOMEM; - pmd = pmd_alloc(mm, pud, vmaddr); - if (!pmd) - return -ENOMEM; - if (!pmd_present(*pmd) && - __pte_alloc(mm, vma, pmd, vmaddr)) - return -ENOMEM; + + pmd = __pmdp_for_addr(mm, vmaddr); + if (IS_ERR(pmd)) + return PTR_ERR(pmd); + /* pmd now points to a valid segment table entry. 
*/ rmap = kmalloc(sizeof(*rmap), GFP_KERNEL|__GFP_REPEAT); if (!rmap) @@ -806,6 +820,26 @@ int s390_enable_sie(void) } EXPORT_SYMBOL_GPL(s390_enable_sie); +pte_t *ptep_for_addr(unsigned long addr) +{ + pmd_t *pmd; + pte_t *pte; + + down_read(&current->mm->mmap_sem); + + pmd = __pmdp_for_addr(current->mm, addr); + if (IS_ERR(pmd)) { + pte = (pte_t *)pmd; + goto up_out; + } + + pte = pte_offset(pmd, addr); +up_out: + up_read(&current->mm->mmap_sem); + return pte; +} +EXPORT_SYMBOL_GPL(ptep_for_addr); + #if defined(CONFIG_DEBUG_PAGEALLOC) && defined(CONFIG_HIBERNATION) bool kernel_page_present(struct page *page) { --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -449,6 +449,13 @@ struct kvm_ppc_pvinfo { #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) + +struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; +}; +#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop) /* * Check if a kvm extension is available. Argument is extension number, * return is 1 (yes) or 0 (no, sorry). ^ permalink raw reply [flat|nested] 33+ messages in thread
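From userspace, the proposed interface could be driven roughly as follows. This is a hedged sketch, not code from the series: the struct and ioctl number are copied from the patch because they are not in any released header, KVMIO is assumed to be 0xAE as in linux/kvm.h, the KVM_S390_KEYOP_* values are repeated locally because the patch defines them in a kernel-internal header, and keyop_iske() is a hypothetical helper name. The api.txt hunk calls this a vm ioctl while the code dispatches it from kvm_arch_dev_ioctl(), so which descriptor to pass is left to the caller here.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of the proposed uapi, taken from the patch. */
struct kvm_s390_keyop {
	uint64_t user_addr;
	uint8_t key;
	uint8_t operation;
};

#define KVMIO 0xAE	/* assumed, as in linux/kvm.h */
#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop)

#define KVM_S390_KEYOP_SSKE 0x01
#define KVM_S390_KEYOP_ISKE 0x02
#define KVM_S390_KEYOP_RRBE 0x03

/*
 * Read the guest visible storage key of the page at 'page'.
 * Returns 0 and fills *key on success, -1 with errno set otherwise.
 */
int keyop_iske(int kvm_fd, const void *page, uint8_t *key)
{
	struct kvm_s390_keyop kop;

	memset(&kop, 0, sizeof(kop));
	kop.user_addr = (uint64_t)(uintptr_t)page;
	kop.operation = KVM_S390_KEYOP_ISKE;

	if (ioctl(kvm_fd, KVM_S390_KEYOP, &kop) < 0)
		return -1;

	*key = kop.key;
	return 0;
}
```

Zeroing the structure first keeps the padding bytes after the two __u8 fields well defined, since the kernel side copies the whole structure in and out.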
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
2011-12-09 12:49 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
@ 2011-12-09 13:46 ` Heiko Carstens
2011-12-10 11:36 ` Carsten Otte
0 siblings, 1 reply; 33+ messages in thread
From: Heiko Carstens @ 2011-12-09 13:46 UTC (permalink / raw)
To: Carsten Otte
Cc: Avi Kivity, Marcelo Tossati, Christian Borntraeger,
Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar,
Jens Freimann, Constantin Werner
On Fri, Dec 09, 2011 at 01:49:35PM +0100, Carsten Otte wrote:
> This patch introduces an interface to access the guest visible
> storage keys. It supports three operations that model the behavior
> that SSKE/ISKE/RRBE instructions would have if they were issued by
> the guest. These instructions are all documented in the z architecture
> principles of operation book.
>
> Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> ---
[...]
> +	spin_lock(&current->mm->page_table_lock);
> +	ptep = ptep_for_addr(addr);
> +	if (!ptep)
> +		goto out_unlock;
FWIW, this is also a bit odd: if the guest would perform a storage key
operation on such an address it would succeed. If the host will do it,
it will fail (which doesn't match your description above).
No?
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [patch 10/12] [PATCH] kvm-s390: storage key interface
2011-12-09 13:46 ` Heiko Carstens
@ 2011-12-10 11:36 ` Carsten Otte
0 siblings, 0 replies; 33+ messages in thread
From: Carsten Otte @ 2011-12-10 11:36 UTC (permalink / raw)
To: heicars2
Cc: Avi Kivity, Marcelo Tossati, borntrae, mschwid2, huckc, KVM,
Joachim von Buttlar, Jens Freimann, Constantin Werner
On 09.12.2011 14:46, heicars2@linux.vnet.ibm.com wrote:
> On Fri, Dec 09, 2011 at 01:49:35PM +0100, Carsten Otte wrote:
>> This patch introduces an interface to access the guest visible
>> storage keys. It supports three operations that model the behavior
>> that SSKE/ISKE/RRBE instructions would have if they were issued by
>> the guest. These instructions are all documented in the z architecture
>> principles of operation book.
>>
>> Signed-off-by: Carsten Otte<cotte@de.ibm.com>
>> ---
>
> [...]
>
>> +	spin_lock(&current->mm->page_table_lock);
>> +	ptep = ptep_for_addr(addr);
>> +	if (!ptep)
>> +		goto out_unlock;
>
> FWIW, this is also a bit odd: if the guest would perform a storage key
> operation on such an address it would succeed. If the host will do it,
> it will fail (which doesn't match your description above).
> No?
Good catch, will fix.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 00/12] Ucontrol patchset V2
@ 2011-12-08 9:12 Carsten Otte
2011-12-08 9:12 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
0 siblings, 1 reply; 33+ messages in thread
From: Carsten Otte @ 2011-12-08 9:12 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner
Hi Avi, Hi Marcelo,
I think I've integrated all feedback from last round. The race between
KVM_S390_ENABLE_UCONTROL and creation of vcpus has been resolved by
adding a parameter to KVM_CREATE_VM. The default KVM_VM_REGULAR (==0) is
backward compatible to KVM_CREATE_VM without parameters, and
KVM_VM_S390_UCONTROL enables user controlled virtual machines for
CECSIM. The extra ioctl is gone.
The page table walk in the storage key patch has moved to arch/s390/mm,
and I think this patch should go with the rest of this series via your
tree after Martin reviewed it.
so long,
Carsten
^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 10/12] [PATCH] kvm-s390: storage key interface 2011-12-08 9:12 [patch 00/12] Ucontrol patchset V2 Carsten Otte @ 2011-12-08 9:12 ` Carsten Otte 0 siblings, 0 replies; 33+ messages in thread From: Carsten Otte @ 2011-12-08 9:12 UTC (permalink / raw) To: Avi Kivity, Marcelo Tossati Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner [-- Attachment #1: skey-new.patch --] [-- Type: text/plain, Size: 9031 bytes --] This patch introduces an interface to access the guest visible storage keys. It supports three operations that model the behavior that SSKE/ISKE/RRBE instructions would have if they were issued by the guest. These instructions are all documented in the z architecture principles of operation book. Signed-off-by: Carsten Otte <cotte@de.ibm.com> --- --- Documentation/virtual/kvm/api.txt | 38 ++++++++++++++ arch/s390/include/asm/kvm_host.h | 4 + arch/s390/include/asm/pgtable.h | 1 arch/s390/kvm/kvm-s390.c | 103 ++++++++++++++++++++++++++++++++++++-- arch/s390/mm/pgtable.c | 70 +++++++++++++++++++------ include/linux/kvm.h | 7 ++ 6 files changed, 202 insertions(+), 21 deletions(-) --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1514,6 +1514,44 @@ table upfront. This is useful to handle controlled virtual machines to fault in the virtual cpu's lowcore pages prior to calling the KVM_RUN ioctl. 
+4.67 KVM_S390_KEYOP + +Capability: KVM_CAP_UCONTROL +Architectures: s390 +Type: vm ioctl +Parameters: struct kvm_s390_keyop (in+out) +Returns: 0 in case of success + +The parameter looks like this: + struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; + }; + +user_addr contains the userspace address of a memory page +key contains the guest visible storage key as defined by the + z Architecture Principles of Operation book, including key + value for key controlled storage protection, the fetch + protection bit, and the reference and change indicator bits +operation indicates the key operation that should be performed + +The following operations are supported: +KVM_S390_KEYOP_SSKE: + This operation behaves just like the set storage key extended (SSKE) + instruction would, if it were issued by the guest. The storage key + provided in "key" is placed in the guest visible storage key. +KVM_S390_KEYOP_ISKE: + This operation behaves just like the insert storage key extended (ISKE) + instruction would, if it were issued by the guest. After this call, + the guest visible storage key is presented in the "key" field. +KVM_S390_KEYOP_RRBE: + This operation behaves just like the reset referenced bit extended + (RRBE) instruction would, if it were issued by the guest. The guest + visible reference bit is cleared, and the value presented in the "key" + field after this call has the reference bit set to 1 in case the + guest view of the reference bit was 1 prior to this call. + 5. 
The kvm_run structure Application code obtains a pointer to the kvm_run structure by --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -24,6 +24,10 @@ /* memory slots that does not exposed to userspace */ #define KVM_PRIVATE_MEM_SLOTS 4 +#define KVM_S390_KEYOP_SSKE 0x01 +#define KVM_S390_KEYOP_ISKE 0x02 +#define KVM_S390_KEYOP_RRBE 0x03 + struct sca_entry { atomic_t scn; __u32 reserved; --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -1254,6 +1254,7 @@ static inline pte_t mk_swap_pte(unsigned extern int vmem_add_mapping(unsigned long start, unsigned long size); extern int vmem_remove_mapping(unsigned long start, unsigned long size); extern int s390_enable_sie(void); +extern pte_t *ptep_for_addr(unsigned long addr); /* * No page table caches to initialise --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -112,13 +112,110 @@ void kvm_arch_exit(void) { } +static long kvm_s390_keyop(struct kvm_s390_keyop *kop) +{ + unsigned long addr = kop->user_addr; + pte_t *ptep; + pgste_t pgste; + int r; + unsigned long skey; + unsigned long bits; + + /* make sure this process is a hypervisor */ + r = -EINVAL; + if (!mm_has_pgste(current->mm)) + goto out; + + r = -ENXIO; + if (addr >= PGDIR_SIZE) + goto out; + + spin_lock(&current->mm->page_table_lock); + ptep = ptep_for_addr(addr); + if (!ptep) + goto out_unlock; + + pgste = pgste_get_lock(ptep); + + switch (kop->operation) { + case KVM_S390_KEYOP_SSKE: + pgste = pgste_update_all(ptep, pgste); + /* set the real key back w/o rc bits */ + skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + if (pte_present(*ptep)) + page_set_storage_key(pte_val(*ptep), skey, 1); + /* put acc+f plus guest refereced and changed into the pgste */ + pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT + | RCP_GC_BIT); + bits = (kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT)); + pgste_val(pgste) |= bits << 56; + bits = (kop->key & (_PAGE_CHANGED | _PAGE_REFERENCED)); +
pgste_val(pgste) |= bits << 48; + r = 0; + break; + case KVM_S390_KEYOP_ISKE: + if (pte_present(*ptep)) { + skey = page_get_storage_key(pte_val(*ptep)); + kop->key = skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } else { + skey = 0; + kop->key = (pgste_val(pgste) >> 56) & + (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } + kop->key |= skey & (_PAGE_CHANGED | _PAGE_REFERENCED); + kop->key |= (pgste_val(pgste) >> 48) & + (_PAGE_CHANGED | _PAGE_REFERENCED); + r = 0; + break; + case KVM_S390_KEYOP_RRBE: + pgste = pgste_update_all(ptep, pgste); + kop->key = 0; + if (pgste_val(pgste) & RCP_GR_BIT) + kop->key |= _PAGE_REFERENCED; + pgste_val(pgste) &= ~RCP_GR_BIT; + r = 0; + break; + default: + r = -EINVAL; + } + pgste_set_unlock(ptep, pgste); + +out_unlock: + spin_unlock(&current->mm->page_table_lock); +out: + return r; +} + /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - if (ioctl == KVM_S390_ENABLE_SIE) - return s390_enable_sie(); - return -EINVAL; + void __user *argp = (void __user *)arg; + int r; + + switch (ioctl) { + case KVM_S390_ENABLE_SIE: + r = s390_enable_sie(); + break; + case KVM_S390_KEYOP: { + struct kvm_s390_keyop kop; + r = -EFAULT; + if (copy_from_user(&kop, argp, sizeof(struct kvm_s390_keyop))) + break; + r = kvm_s390_keyop(&kop); + if (r) + break; + r = -EFAULT; + if (copy_to_user(argp, &kop, sizeof(struct kvm_s390_keyop))) + break; + r = 0; + break; + } + default: + r = -ENOTTY; + } + + return r; } int kvm_dev_ioctl_check_extension(long ext) --- a/arch/s390/mm/pgtable.c +++ b/arch/s390/mm/pgtable.c @@ -393,6 +393,33 @@ out_unmap: } EXPORT_SYMBOL_GPL(gmap_map_segment); +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr) +{ + struct vm_area_struct *vma; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + + vma = find_vma(mm, addr); + if (!vma) + return ERR_PTR(-EINVAL); + + pgd = pgd_offset(mm, addr); + pud = pud_alloc(mm, pgd, addr); + if (!pud) + return ERR_PTR(-ENOMEM); + + pmd =
pmd_alloc(mm, pud, addr); + if (!pmd) + return ERR_PTR(-ENOMEM); + + if (!pmd_present(*pmd) && + __pte_alloc(mm, vma, pmd, addr)) + return ERR_PTR(-ENOMEM); + + return pmd; +} + /* * this function is assumed to be called with mmap_sem held */ @@ -402,10 +429,7 @@ unsigned long __gmap_fault(unsigned long struct mm_struct *mm; struct gmap_pgtable *mp; struct gmap_rmap *rmap; - struct vm_area_struct *vma; struct page *page; - pgd_t *pgd; - pud_t *pud; pmd_t *pmd; current->thread.gmap_addr = address; @@ -433,21 +457,11 @@ unsigned long __gmap_fault(unsigned long return mp->vmaddr | (address & ~PMD_MASK); } else if (segment & _SEGMENT_ENTRY_RO) { vmaddr = segment & _SEGMENT_ENTRY_ORIGIN; - vma = find_vma(mm, vmaddr); - if (!vma || vma->vm_start > vmaddr) - return -EFAULT; - - /* Walk the parent mm page table */ - pgd = pgd_offset(mm, vmaddr); - pud = pud_alloc(mm, pgd, vmaddr); - if (!pud) - return -ENOMEM; - pmd = pmd_alloc(mm, pud, vmaddr); - if (!pmd) - return -ENOMEM; - if (!pmd_present(*pmd) && - __pte_alloc(mm, vma, pmd, vmaddr)) - return -ENOMEM; + + pmd = __pmdp_for_addr(mm, vmaddr); + if (IS_ERR(pmd)) + return PTR_ERR(pmd); + /* pmd now points to a valid segment table entry. 
*/ rmap = kmalloc(sizeof(*rmap), GFP_KERNEL|__GFP_REPEAT); if (!rmap) @@ -806,6 +820,26 @@ int s390_enable_sie(void) } EXPORT_SYMBOL_GPL(s390_enable_sie); +pte_t *ptep_for_addr(unsigned long addr) +{ + pmd_t *pmd; + pte_t *rc; + + down_read(&current->mm->mmap_sem); + + pmd = __pmdp_for_addr(current->mm, addr); + if (IS_ERR(pmd)) { + rc = (pte_t *)pmd; + goto up_out; + } + + rc = pte_offset(pmd, addr); +up_out: + up_read(&current->mm->mmap_sem); + return rc; +} +EXPORT_SYMBOL_GPL(ptep_for_addr); + #if defined(CONFIG_DEBUG_PAGEALLOC) && defined(CONFIG_HIBERNATION) bool kernel_page_present(struct page *page) { --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -449,6 +449,13 @@ struct kvm_ppc_pvinfo { #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) + +struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; +}; +#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop) /* * Check if a kvm extension is available. Argument is extension number, * return is 1 (yes) or 0 (no, sorry). ^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 00/12] User controlled virtual machines
@ 2011-12-01 12:57 Carsten Otte
2011-12-01 12:57 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
0 siblings, 1 reply; 33+ messages in thread
From: Carsten Otte @ 2011-12-01 12:57 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann,
Constantin Werner
Hi Avi, Hi Marcelo,
this patch series introduces an interface to allow a privileged userspace
program to control a KVM virtual machine. The interface is intended for
use by a machine simulator called CECSIM that can simulate an entire
mainframe machine with nested virtualization and I/O for the purpose of
testing and debugging its firmware prior to availability of silicon.
This patchset allows for concurrent use of the KVM device driver to drive
both regular and user controlled virtual machines at the same time.
I would kindly like to ask for review and inclusion of these patches.
with kind regards,
Carsten
^ permalink raw reply [flat|nested] 33+ messages in thread
* [patch 10/12] [PATCH] kvm-s390: storage key interface 2011-12-01 12:57 [patch 00/12] User controlled virtual machines Carsten Otte @ 2011-12-01 12:57 ` Carsten Otte 0 siblings, 0 replies; 33+ messages in thread From: Carsten Otte @ 2011-12-01 12:57 UTC (permalink / raw) To: Avi Kivity, Marcelo Tossati Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky, Cornelia Huck, KVM, Joachim von Buttlar, Jens Freimann, Constantin Werner [-- Attachment #1: skey.patch --] [-- Type: text/plain, Size: 5037 bytes --] This patch introduces an interface to access the guest visible storage keys. It supports three operations that model the behavior that SSKE/ISKE/RRBE instructions would have if they were issued by the guest. These instructions are all documented in the z architecture principles of operation book. Signed-off-by: Carsten Otte <cotte@de.ibm.com> --- Index: linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h =================================================================== --- linux-2.5-cecsim.orig/arch/s390/include/asm/kvm_host.h +++ linux-2.5-cecsim/arch/s390/include/asm/kvm_host.h @@ -25,6 +25,9 @@ #define KVM_PRIVATE_MEM_SLOTS 4 #define KVM_SIE_PAGE_OFFSET 1 +#define KVM_S390_KEYOP_SSKE 0x01 +#define KVM_S390_KEYOP_ISKE 0x02 +#define KVM_S390_KEYOP_RRBE 0x03 struct sca_entry { atomic_t scn; Index: linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c =================================================================== --- linux-2.5-cecsim.orig/arch/s390/kvm/kvm-s390.c +++ linux-2.5-cecsim/arch/s390/kvm/kvm-s390.c @@ -112,13 +112,144 @@ void kvm_arch_exit(void) { } +pte_t *ptep_for_addr(unsigned long addr) +{ + struct vm_area_struct *vma; + pgd_t *pgd; + pud_t *pud; + pmd_t *pmd; + pte_t *rc; + + down_read(&current->mm->mmap_sem); + + rc = NULL; + vma = find_vma(current->mm, addr); + if (!vma) + goto up_out; + + pgd = pgd_offset(current->mm, addr); + pud = pud_alloc(current->mm, pgd, addr); + if (!pud) + goto up_out; + + pmd = pmd_alloc(current->mm, pud, addr); + if (!pmd) +
goto up_out; + + if (!pmd_present(*pmd) && + __pte_alloc(current->mm, vma, pmd, addr)) + goto up_out; + + rc = pte_offset(pmd, addr); +up_out: + up_read(&current->mm->mmap_sem); + return rc; +} + +static long kvm_s390_keyop(struct kvm_s390_keyop *kop) +{ + unsigned long addr = kop->user_addr; + pte_t *ptep; + pgste_t pgste; + int r; + unsigned long skey; + unsigned long bits; + + /* make sure this process is a hypervisor */ + r = -EINVAL; + if (!mm_has_pgste(current->mm)) + goto out; + + r = -ENXIO; + if (addr >= PGDIR_SIZE) + goto out; + + spin_lock(&current->mm->page_table_lock); + ptep = ptep_for_addr(addr); + if (!ptep) + goto out_unlock; + + pgste = pgste_get_lock(ptep); + + switch (kop->operation) { + case KVM_S390_KEYOP_SSKE: + pgste = pgste_update_all(ptep, pgste); + /* set the real key back w/o rc bits */ + skey = kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + if (pte_present(*ptep)) + page_set_storage_key(pte_val(*ptep), skey, 1); + /* put acc+f plus guest refereced and changed into the pgste */ + pgste_val(pgste) &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT + | RCP_GC_BIT); + bits = (kop->key & (_PAGE_ACC_BITS | _PAGE_FP_BIT)); + pgste_val(pgste) |= bits << 56; + bits = (kop->key & (_PAGE_CHANGED | _PAGE_REFERENCED)); + pgste_val(pgste) |= bits << 48; + r = 0; + break; + case KVM_S390_KEYOP_ISKE: + if (pte_present(*ptep)) { + skey = page_get_storage_key(pte_val(*ptep)); + kop->key = skey & (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } else { + skey = 0; + kop->key = (pgste_val(pgste) >> 56) & + (_PAGE_ACC_BITS | _PAGE_FP_BIT); + } + kop->key |= skey & (_PAGE_CHANGED | _PAGE_REFERENCED); + kop->key |= (pgste_val(pgste) >> 48) & + (_PAGE_CHANGED | _PAGE_REFERENCED); + r = 0; + break; + case KVM_S390_KEYOP_RRBE: + pgste = pgste_update_all(ptep, pgste); + kop->key = 0; + if (pgste_val(pgste) & RCP_GR_BIT) + kop->key |= _PAGE_REFERENCED; + pgste_val(pgste) &= ~RCP_GR_BIT; + r = 0; + break; + default: + r = -EINVAL; + } + pgste_set_unlock(ptep, pgste); + +out_unlock: +
spin_unlock(¤t->mm->page_table_lock); +out: + return r; +} + /* Section: device related */ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { - if (ioctl == KVM_S390_ENABLE_SIE) - return s390_enable_sie(); - return -EINVAL; + void __user *argp = (void __user *)arg; + int r; + + switch (ioctl) { + case KVM_S390_ENABLE_SIE: + r = s390_enable_sie(); + break; + case KVM_S390_KEYOP: { + struct kvm_s390_keyop kop; + r = -EFAULT; + if (copy_from_user(&kop, argp, sizeof(struct kvm_s390_keyop))) + break; + r = kvm_s390_keyop(&kop); + if (r) + break; + r = -EFAULT; + if (copy_to_user(argp, &kop, sizeof(struct kvm_s390_keyop))) + break; + r = 0; + break; + } + default: + r = -ENOTTY; + } + + return r; } int kvm_dev_ioctl_check_extension(long ext) Index: linux-2.5-cecsim/include/linux/kvm.h =================================================================== --- linux-2.5-cecsim.orig/include/linux/kvm.h +++ linux-2.5-cecsim/include/linux/kvm.h @@ -445,6 +445,13 @@ struct kvm_ppc_pvinfo { #define KVM_GET_MSR_INDEX_LIST _IOWR(KVMIO, 0x02, struct kvm_msr_list) #define KVM_S390_ENABLE_SIE _IO(KVMIO, 0x06) + +struct kvm_s390_keyop { + __u64 user_addr; + __u8 key; + __u8 operation; +}; +#define KVM_S390_KEYOP _IOWR(KVMIO, 0x09, struct kvm_s390_keyop) /* * Check if a kvm extension is available. Argument is extension number, * return is 1 (yes) or 0 (no, sorry). ^ permalink raw reply [flat|nested] 33+ messages in thread
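For readers following the bit handling in kvm_s390_keyop() above, here is a minimal user-space sketch of how the guest key byte is packed into and read back from the PGSTE. It is only a model under stated assumptions: the constant values are assumed to match the s390 asm/page.h and asm/pgtable.h definitions of that kernel generation, the model_* function names are hypothetical, and the real code additionally calls pgste_update_all() and touches the hardware storage key, which this sketch leaves out.

```c
#include <assert.h>
#include <stdint.h>

/* Guest-visible storage key bits (values assumed from s390 asm/page.h). */
#define PAGE_CHANGED    0x02u
#define PAGE_REFERENCED 0x04u
#define PAGE_FP_BIT     0x08u
#define PAGE_ACC_BITS   0xf0u

/* PGSTE fields (values assumed from s390 asm/pgtable.h). */
#define RCP_ACC_BITS 0xf000000000000000ULL
#define RCP_FP_BIT   0x0800000000000000ULL
#define RCP_GR_BIT   0x0004000000000000ULL
#define RCP_GC_BIT   0x0002000000000000ULL

/* SSKE path: replace acc+f and the guest R/C bits in the pgste. */
static uint64_t model_sske(uint64_t pgste, uint8_t key)
{
	pgste &= ~(RCP_ACC_BITS | RCP_FP_BIT | RCP_GR_BIT | RCP_GC_BIT);
	pgste |= (uint64_t)(key & (PAGE_ACC_BITS | PAGE_FP_BIT)) << 56;
	pgste |= (uint64_t)(key & (PAGE_CHANGED | PAGE_REFERENCED)) << 48;
	return pgste;
}

/* ISKE path for a page that is not present: the key comes from the pgste only. */
static uint8_t model_iske_not_present(uint64_t pgste)
{
	uint8_t key = (uint8_t)((pgste >> 56) & (PAGE_ACC_BITS | PAGE_FP_BIT));

	key |= (uint8_t)((pgste >> 48) & (PAGE_CHANGED | PAGE_REFERENCED));
	return key;
}

/* RRBE path: report the old guest referenced bit, then clear it. */
static uint8_t model_rrbe(uint64_t *pgste)
{
	uint8_t key = (*pgste & RCP_GR_BIT) ? PAGE_REFERENCED : 0;

	*pgste &= ~RCP_GR_BIT;
	return key;
}

/* Small walk-through: set a key, read it back, reset the referenced bit. */
static void model_example(void)
{
	uint64_t pgste = model_sske(0, 0xf6); /* acc=0xf, R and C set */

	assert(model_iske_not_present(pgste) == 0xf6);
	assert(model_rrbe(&pgste) == PAGE_REFERENCED);
	assert(model_iske_not_present(pgste) == 0xf2); /* R is now clear */
}
```

The shifts by 56 and 48 work because, with the assumed values, the acc+f bits of the key byte line up with RCP_ACC_BITS/RCP_FP_BIT and the R/C bits line up with RCP_GR_BIT/RCP_GC_BIT.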
Thread overview: 33+ messages
2011-12-09 11:23 [patch 00/12] Ucontrol patches V3 Carsten Otte
2011-12-09 11:23 ` [patch 01/12] [PATCH] kvm-s390: add parameter for KVM_CREATE_VM Carsten Otte
2011-12-09 11:32 ` Alexander Graf
2011-12-09 11:50 ` Carsten Otte
2011-12-09 11:23 ` [patch 02/12] [PATCH] kvm-s390-ucontrol: per vcpu address spaces Carsten Otte
2011-12-09 11:23 ` [patch 03/12] [PATCH] kvm-s390-ucontrol: export page faults to user Carsten Otte
2011-12-09 11:23 ` [patch 04/12] [PATCH] kvm-s390-ucontrol: export SIE control block " Carsten Otte
2011-12-09 11:37 ` Alexander Graf
2011-12-09 11:54 ` Carsten Otte
2011-12-09 11:23 ` [patch 05/12] [PATCH] kvm-s390-ucontrol: disable in-kernel handling of SIE intercepts Carsten Otte
2011-12-09 11:23 ` [patch 06/12] [PATCH] kvm-s390-ucontrol: disable in-kernel irq stack Carsten Otte
2011-12-09 11:23 ` [patch 07/12] [PATCH] kvm-s390-ucontrol: interface to inject faults on a vcpu page table Carsten Otte
2011-12-09 11:23 ` [patch 08/12] [PATCH] kvm-s390-ucontrol: disable sca Carsten Otte
2011-12-09 11:23 ` [patch 09/12] [PATCH] kvm-s390: fix assumption for KVM_MAX_VCPUS Carsten Otte
2011-12-09 11:23 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
2011-12-09 12:04 ` Heiko Carstens
[not found] ` <OFEDB7DC8E.0D8BE463-ONC1257961.004633DF-C1257961.0046C1E7@de.ibm.com>
2011-12-09 13:37 ` Carsten Otte
2011-12-09 11:23 ` [patch 11/12] [PATCH] kvm-s390-ucontrol: announce capability for user controlled vms Carsten Otte
2011-12-09 11:23 ` [patch 12/12] [PATCH] kvm-s390: Fix return code for unknown ioctl numbers Carsten Otte
-- strict thread matches above, loose matches on Subject: below --
2011-12-14 12:23 [patch 00/12] Ucontrol patchset V6 Carsten Otte
2011-12-14 12:23 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
2011-12-15 10:28 ` Carsten Otte
2011-12-15 16:11 ` Heiko Carstens
2011-12-15 16:49 ` Christian Borntraeger
2011-12-15 17:18 ` Heiko Carstens
2011-12-15 17:14 ` Martin Schwidefsky
2011-12-10 12:35 [patch 00/12] Ucontrol patchset V5 Carsten Otte
2011-12-10 12:35 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
2011-12-11 21:57 ` Heiko Carstens
2011-12-12 10:06 ` Martin Schwidefsky
2011-12-09 12:49 [patch 00/12] Ucontrol patchset V4 Carsten Otte
2011-12-09 12:49 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
2011-12-09 13:46 ` Heiko Carstens
2011-12-10 11:36 ` Carsten Otte
2011-12-08 9:12 [patch 00/12] Ucontrol patchset V2 Carsten Otte
2011-12-08 9:12 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte
2011-12-01 12:57 [patch 00/12] User controlled virtual machines Carsten Otte
2011-12-01 12:57 ` [patch 10/12] [PATCH] kvm-s390: storage key interface Carsten Otte