* [patch 0/4] kvm-s390 patches
@ 2011-11-17 10:00 Carsten Otte
2011-11-17 10:00 ` [patch 1/4] [PATCH] kvm-s390: Fix RUNNING flag misinterpretation Carsten Otte
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Carsten Otte @ 2011-11-17 10:00 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM
Hi Avi,
these patches are bugfixes for kvm on s390. Could you please apply
them?
thanks,
Carsten
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 1/4] [PATCH] kvm-s390: Fix RUNNING flag misinterpretation.
2011-11-17 10:00 [patch 0/4] kvm-s390 patches Carsten Otte
@ 2011-11-17 10:00 ` Carsten Otte
2011-11-17 10:00 ` [patch 2/4] [PATCH] kvm-s390: handle SIGP sense running intercepts Carsten Otte
` (3 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Carsten Otte @ 2011-11-17 10:00 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM, stable
[-- Attachment #1: 500-kvm-running-flag.diff --]
[-- Type: text/plain, Size: 5411 bytes --]
From: Cornelia Huck <cornelia.huck@de.ibm.com>
CPUSTAT_RUNNING was implemented signifying that a vcpu is not stopped.
This is not, however, what the architecture says: RUNNING should be
set when the host is acting on the behalf of the guest operating
system.
CPUSTAT_RUNNING has been changed to be set in kvm_arch_vcpu_load()
and to be unset in kvm_arch_vcpu_put().
For signifying stopped state of a vcpu, a host-controlled bit has
been used and is set/unset basically on the reverse as the old
CPUSTAT_RUNNING bit (including pushing it down into stop handling
proper in handle_stop()).
Cc: stable@kernel.org
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
arch/s390/include/asm/kvm_host.h | 2 +-
arch/s390/kvm/diag.c | 2 +-
arch/s390/kvm/intercept.c | 3 ++-
arch/s390/kvm/interrupt.c | 1 +
arch/s390/kvm/kvm-s390.c | 10 +++++++---
arch/s390/kvm/sigp.c | 6 +++---
6 files changed, 15 insertions(+), 9 deletions(-)
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -47,7 +47,7 @@ struct sca_block {
#define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1))
#define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)
-#define CPUSTAT_HOST 0x80000000
+#define CPUSTAT_STOPPED 0x80000000
#define CPUSTAT_WAIT 0x10000000
#define CPUSTAT_ECALL_PEND 0x08000000
#define CPUSTAT_STOP_INT 0x04000000
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -42,7 +42,7 @@ static int __diag_ipl_functions(struct k
return -EOPNOTSUPP;
}
- atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
+ atomic_set_mask(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
vcpu->run->s390_reset_flags |= KVM_S390_RESET_SUBSYSTEM;
vcpu->run->s390_reset_flags |= KVM_S390_RESET_IPL;
vcpu->run->s390_reset_flags |= KVM_S390_RESET_CPU_INIT;
--- a/arch/s390/kvm/intercept.c
+++ b/arch/s390/kvm/intercept.c
@@ -132,7 +132,6 @@ static int handle_stop(struct kvm_vcpu *
int rc = 0;
vcpu->stat.exit_stop_request++;
- atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
spin_lock_bh(&vcpu->arch.local_int.lock);
if (vcpu->arch.local_int.action_bits & ACTION_STORE_ON_STOP) {
vcpu->arch.local_int.action_bits &= ~ACTION_STORE_ON_STOP;
@@ -149,6 +148,8 @@ static int handle_stop(struct kvm_vcpu *
}
if (vcpu->arch.local_int.action_bits & ACTION_STOP_ON_STOP) {
+ atomic_set_mask(CPUSTAT_STOPPED,
+ &vcpu->arch.sie_block->cpuflags);
vcpu->arch.local_int.action_bits &= ~ACTION_STOP_ON_STOP;
VCPU_EVENT(vcpu, 3, "%s", "cpu stopped");
rc = -EOPNOTSUPP;
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -252,6 +252,7 @@ static void __do_deliver_interrupt(struc
offsetof(struct _lowcore, restart_psw), sizeof(psw_t));
if (rc == -EFAULT)
exception = 1;
+ atomic_clear_mask(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
break;
case KVM_S390_PROGRAM_INT:
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -269,10 +269,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu
restore_fp_regs(&vcpu->arch.guest_fpregs);
restore_access_regs(vcpu->arch.guest_acrs);
gmap_enable(vcpu->arch.gmap);
+ atomic_set_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
+ atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
gmap_disable(vcpu->arch.gmap);
save_fp_regs(&vcpu->arch.guest_fpregs);
save_access_regs(vcpu->arch.guest_acrs);
@@ -300,7 +302,9 @@ static void kvm_s390_vcpu_initial_reset(
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
- atomic_set(&vcpu->arch.sie_block->cpuflags, CPUSTAT_ZARCH | CPUSTAT_SM);
+ atomic_set(&vcpu->arch.sie_block->cpuflags, CPUSTAT_ZARCH |
+ CPUSTAT_SM |
+ CPUSTAT_STOPPED);
vcpu->arch.sie_block->ecb = 6;
vcpu->arch.sie_block->eca = 0xC1002001U;
vcpu->arch.sie_block->fac = (int) (long) facilities;
@@ -427,7 +431,7 @@ static int kvm_arch_vcpu_ioctl_set_initi
{
int rc = 0;
- if (atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_RUNNING)
+ if (!(atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_STOPPED))
rc = -EBUSY;
else {
vcpu->run->psw_mask = psw.mask;
@@ -500,7 +504,7 @@ rerun_vcpu:
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
- atomic_set_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
+ atomic_clear_mask(CPUSTAT_STOPPED, &vcpu->arch.sie_block->cpuflags);
BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL);
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -57,8 +57,8 @@ static int __sigp_sense(struct kvm_vcpu
spin_lock(&fi->lock);
if (fi->local_int[cpu_addr] == NULL)
rc = 3; /* not operational */
- else if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
- & CPUSTAT_RUNNING) {
+ else if (!(atomic_read(fi->local_int[cpu_addr]->cpuflags)
+ & CPUSTAT_STOPPED)) {
*reg &= 0xffffffff00000000UL;
rc = 1; /* status stored */
} else {
@@ -251,7 +251,7 @@ static int __sigp_set_prefix(struct kvm_
spin_lock_bh(&li->lock);
/* cpu must be in stopped state */
- if (atomic_read(li->cpuflags) & CPUSTAT_RUNNING) {
+ if (!(atomic_read(li->cpuflags) & CPUSTAT_STOPPED)) {
rc = 1; /* incorrect state */
*reg &= SIGP_STAT_INCORRECT_STATE;
kfree(inti);
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 2/4] [PATCH] kvm-s390: handle SIGP sense running intercepts.
2011-11-17 10:00 [patch 0/4] kvm-s390 patches Carsten Otte
2011-11-17 10:00 ` [patch 1/4] [PATCH] kvm-s390: Fix RUNNING flag misinterpretation Carsten Otte
@ 2011-11-17 10:00 ` Carsten Otte
2011-11-17 10:15 ` Avi Kivity
2011-11-17 10:00 ` [patch 3/4] [PATCH] kvm: Fix tprot locking Carsten Otte
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Carsten Otte @ 2011-11-17 10:00 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM
[-- Attachment #1: 501-kvm-sense-running.diff --]
[-- Type: text/plain, Size: 3047 bytes --]
From: Cornelia Huck <cornelia.huck@de.ibm.com>
SIGP sense running may cause an intercept on higher level
virtualization, so handle it by checking the CPUSTAT_RUNNING flag.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
arch/s390/include/asm/kvm_host.h | 1 +
arch/s390/kvm/kvm-s390.c | 1 +
arch/s390/kvm/sigp.c | 39 +++++++++++++++++++++++++++++++++++++++
3 files changed, 41 insertions(+)
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -145,6 +145,7 @@ struct kvm_vcpu_stat {
u32 instruction_sigp_arch;
u32 instruction_sigp_prefix;
u32 instruction_sigp_restart;
+ u32 instruction_sigp_sense_running;
u32 diagnose_44;
};
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -71,6 +71,7 @@ struct kvm_stats_debugfs_item debugfs_en
{ "instruction_sigp_set_arch", VCPU_STAT(instruction_sigp_arch) },
{ "instruction_sigp_set_prefix", VCPU_STAT(instruction_sigp_prefix) },
{ "instruction_sigp_restart", VCPU_STAT(instruction_sigp_restart) },
+ { "instruction_sigp_sense_running", VCPU_STAT(instruction_sigp_sense_running) },
{ "diagnose_44", VCPU_STAT(diagnose_44) },
{ NULL }
};
--- a/arch/s390/kvm/sigp.c
+++ b/arch/s390/kvm/sigp.c
@@ -31,9 +31,11 @@
#define SIGP_SET_PREFIX 0x0d
#define SIGP_STORE_STATUS_ADDR 0x0e
#define SIGP_SET_ARCH 0x12
+#define SIGP_SENSE_RUNNING 0x15
/* cpu status bits */
#define SIGP_STAT_EQUIPMENT_CHECK 0x80000000UL
+#define SIGP_STAT_NOT_RUNNING 0x00000400UL
#define SIGP_STAT_INCORRECT_STATE 0x00000200UL
#define SIGP_STAT_INVALID_PARAMETER 0x00000100UL
#define SIGP_STAT_EXT_CALL_PENDING 0x00000080UL
@@ -275,6 +277,38 @@ out_fi:
return rc;
}
+static int __sigp_sense_running(struct kvm_vcpu *vcpu, u16 cpu_addr,
+ unsigned long *reg)
+{
+ int rc;
+ struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int;
+
+ if (cpu_addr >= KVM_MAX_VCPUS)
+ return 3; /* not operational */
+
+ spin_lock(&fi->lock);
+ if (fi->local_int[cpu_addr] == NULL)
+ rc = 3; /* not operational */
+ else {
+ if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
+ & CPUSTAT_RUNNING) {
+ /* running */
+ rc = 1;
+ } else {
+ /* not running */
+ *reg &= 0xffffffff00000000UL;
+ *reg |= SIGP_STAT_NOT_RUNNING;
+ rc = 0;
+ }
+ }
+ spin_unlock(&fi->lock);
+
+ VCPU_EVENT(vcpu, 4, "sensed running status of cpu %x rc %x", cpu_addr,
+ rc);
+
+ return rc;
+}
+
int kvm_s390_handle_sigp(struct kvm_vcpu *vcpu)
{
int r1 = (vcpu->arch.sie_block->ipa & 0x00f0) >> 4;
@@ -331,6 +365,11 @@ int kvm_s390_handle_sigp(struct kvm_vcpu
rc = __sigp_set_prefix(vcpu, cpu_addr, parameter,
&vcpu->arch.guest_gprs[r1]);
break;
+ case SIGP_SENSE_RUNNING:
+ vcpu->stat.instruction_sigp_sense_running++;
+ rc = __sigp_sense_running(vcpu, cpu_addr,
+ &vcpu->arch.guest_gprs[r1]);
+ break;
case SIGP_RESTART:
vcpu->stat.instruction_sigp_restart++;
/* user space must know about restart */
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 3/4] [PATCH] kvm: Fix tprot locking
2011-11-17 10:00 [patch 0/4] kvm-s390 patches Carsten Otte
2011-11-17 10:00 ` [patch 1/4] [PATCH] kvm-s390: Fix RUNNING flag misinterpretation Carsten Otte
2011-11-17 10:00 ` [patch 2/4] [PATCH] kvm-s390: handle SIGP sense running intercepts Carsten Otte
@ 2011-11-17 10:00 ` Carsten Otte
2011-11-17 10:27 ` Avi Kivity
2011-11-17 10:00 ` [patch 4/4] [PATCH] kvm: announce SYNC_MMU Carsten Otte
2011-11-17 10:35 ` [patch 0/4] kvm-s390 patches Avi Kivity
4 siblings, 1 reply; 13+ messages in thread
From: Carsten Otte @ 2011-11-17 10:00 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM
[-- Attachment #1: 504-kvm-tprot-locking.diff --]
[-- Type: text/plain, Size: 1783 bytes --]
From: Christian Borntraeger <borntraeger@de.ibm.com>
There is a potential host deadlock in the tprot intercept handling.
We must not hold the mmap semaphore while resolving the guest
address. If userspace is remapping, then the memory detection in
the guest is broken anyway so we can safely separate the
address translation from walking the vmas.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
arch/s390/kvm/priv.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
--- linux-2.6/arch/s390/kvm/priv.c 2011-10-24 09:10:05.000000000 +0200
+++ linux-2.6-patched/arch/s390/kvm/priv.c 2011-11-17 10:03:53.000000000 +0100
@@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
struct vm_area_struct *vma;
+ unsigned long user_address;
vcpu->stat.instruction_tprot++;
@@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
return -EOPNOTSUPP;
+ /* we must resolve the address without holding the mmap semaphore.
+ * This is ok since the userspace hypervisor is not supposed to change
+ * the mapping while the guest queries the memory. Otherwise the guest
+ * might crash or get wrong info anyway. */
+ user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
+
down_read(¤t->mm->mmap_sem);
- vma = find_vma(current->mm,
- (unsigned long) __guestaddr_to_user(vcpu, address1));
+ vma = find_vma(current->mm, user_address);
if (!vma) {
up_read(¤t->mm->mmap_sem);
return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
^ permalink raw reply [flat|nested] 13+ messages in thread
* [patch 4/4] [PATCH] kvm: announce SYNC_MMU
2011-11-17 10:00 [patch 0/4] kvm-s390 patches Carsten Otte
` (2 preceding siblings ...)
2011-11-17 10:00 ` [patch 3/4] [PATCH] kvm: Fix tprot locking Carsten Otte
@ 2011-11-17 10:00 ` Carsten Otte
2011-11-17 10:35 ` [patch 0/4] kvm-s390 patches Avi Kivity
4 siblings, 0 replies; 13+ messages in thread
From: Carsten Otte @ 2011-11-17 10:00 UTC (permalink / raw)
To: Avi Kivity, Marcelo Tossati
Cc: Christian Borntraeger, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM
[-- Attachment #1: 505-kvm-sync-mmu.diff --]
[-- Type: text/plain, Size: 721 bytes --]
From: Christian Borntraeger <borntraeger@de.ibm.com>
KVM on s390 always had a sync mmu. Any mapping change in userspace
mapping was always reflected immediately in the guest mapping.
- In older code the guest mapping was just an offset
- In newer code the last level page table is shared
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
arch/s390/kvm/kvm-s390.c | 1 +
1 file changed, 1 insertion(+)
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -127,6 +127,7 @@ int kvm_dev_ioctl_check_extension(long e
switch (ext) {
case KVM_CAP_S390_PSW:
case KVM_CAP_S390_GMAP:
+ case KVM_CAP_SYNC_MMU:
r = 1;
break;
default:
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/4] [PATCH] kvm-s390: handle SIGP sense running intercepts.
2011-11-17 10:00 ` [patch 2/4] [PATCH] kvm-s390: handle SIGP sense running intercepts Carsten Otte
@ 2011-11-17 10:15 ` Avi Kivity
2011-11-17 10:19 ` Christian Borntraeger
0 siblings, 1 reply; 13+ messages in thread
From: Avi Kivity @ 2011-11-17 10:15 UTC (permalink / raw)
To: Carsten Otte
Cc: Marcelo Tossati, Christian Borntraeger, Heiko Carstens,
Martin Schwidefsky, Cornelia Huck, KVM
On 11/17/2011 12:00 PM, Carsten Otte wrote:
> From: Cornelia Huck <cornelia.huck@de.ibm.com>
>
> SIGP sense running may cause an intercept on higher level
> virtualization, so handle it by checking the CPUSTAT_RUNNING flag.
>
What does "higher level virtualization" mean?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 2/4] [PATCH] kvm-s390: handle SIGP sense running intercepts.
2011-11-17 10:15 ` Avi Kivity
@ 2011-11-17 10:19 ` Christian Borntraeger
0 siblings, 0 replies; 13+ messages in thread
From: Christian Borntraeger @ 2011-11-17 10:19 UTC (permalink / raw)
To: Avi Kivity
Cc: Carsten Otte, Marcelo Tossati, Heiko Carstens, Martin Schwidefsky,
Cornelia Huck, KVM
On 17/11/11 11:15, Avi Kivity wrote:
> On 11/17/2011 12:00 PM, Carsten Otte wrote:
>> From: Cornelia Huck <cornelia.huck@de.ibm.com>
>>
>> SIGP sense running may cause an intercept on higher level
>> virtualization, so handle it by checking the CPUSTAT_RUNNING flag.
>>
>
> What does "higher level virtualization" mean?
Running KVM under z/VM under LPAR.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
2011-11-17 10:00 ` [patch 3/4] [PATCH] kvm: Fix tprot locking Carsten Otte
@ 2011-11-17 10:27 ` Avi Kivity
2011-11-17 11:15 ` Martin Schwidefsky
0 siblings, 1 reply; 13+ messages in thread
From: Avi Kivity @ 2011-11-17 10:27 UTC (permalink / raw)
To: Carsten Otte
Cc: Marcelo Tossati, Christian Borntraeger, Heiko Carstens,
Martin Schwidefsky, Cornelia Huck, KVM
On 11/17/2011 12:00 PM, Carsten Otte wrote:
> From: Christian Borntraeger <borntraeger@de.ibm.com>
>
> There is a potential host deadlock in the tprot intercept handling.
> We must not hold the mmap semaphore while resolving the guest
> address. If userspace is remapping, then the memory detection in
> the guest is broken anyway so we can safely separate the
> address translation from walking the vmas.
>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> ---
>
> arch/s390/kvm/priv.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> --- linux-2.6/arch/s390/kvm/priv.c 2011-10-24 09:10:05.000000000 +0200
> +++ linux-2.6-patched/arch/s390/kvm/priv.c 2011-11-17 10:03:53.000000000 +0100
> @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> struct vm_area_struct *vma;
> + unsigned long user_address;
>
> vcpu->stat.instruction_tprot++;
>
> @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> return -EOPNOTSUPP;
>
>
> + /* we must resolve the address without holding the mmap semaphore.
> + * This is ok since the userspace hypervisor is not supposed to change
> + * the mapping while the guest queries the memory. Otherwise the guest
> + * might crash or get wrong info anyway. */
> + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> +
> down_read(¤t->mm->mmap_sem);
> - vma = find_vma(current->mm,
> - (unsigned long) __guestaddr_to_user(vcpu, address1));
> + vma = find_vma(current->mm, user_address);
> if (!vma) {
> up_read(¤t->mm->mmap_sem);
> return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
>
Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
dereferences the guest page table? How can it assume that it is mapped?
I'm probably misreading the code.
A little closer to the patch, x86 handles the same issue by calling
get_user_pages_fast(). This should be more scalable than bouncing
mmap_sem, something to consider.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 0/4] kvm-s390 patches
2011-11-17 10:00 [patch 0/4] kvm-s390 patches Carsten Otte
` (3 preceding siblings ...)
2011-11-17 10:00 ` [patch 4/4] [PATCH] kvm: announce SYNC_MMU Carsten Otte
@ 2011-11-17 10:35 ` Avi Kivity
4 siblings, 0 replies; 13+ messages in thread
From: Avi Kivity @ 2011-11-17 10:35 UTC (permalink / raw)
To: Carsten Otte
Cc: Marcelo Tossati, Christian Borntraeger, Heiko Carstens,
Martin Schwidefsky, Cornelia Huck, KVM
On 11/17/2011 12:00 PM, Carsten Otte wrote:
> Hi Avi,
>
> these patches are bugfixes for kvm on s390. Could you please apply
> them?
>
All applied, thanks. Patch 2 had rejects (you didn't have the diagnose
0x10 patch in your tree), it should be good, but please double check
kvm.git next.
Should all four patches go to 3.2?
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
2011-11-17 10:27 ` Avi Kivity
@ 2011-11-17 11:15 ` Martin Schwidefsky
2011-11-17 11:32 ` Martin Schwidefsky
2011-11-20 12:02 ` Avi Kivity
0 siblings, 2 replies; 13+ messages in thread
From: Martin Schwidefsky @ 2011-11-17 11:15 UTC (permalink / raw)
To: Avi Kivity
Cc: Carsten Otte, Marcelo Tossati, Christian Borntraeger,
Heiko Carstens, Cornelia Huck, KVM
On Thu, 17 Nov 2011 12:27:41 +0200
Avi Kivity <avi@redhat.com> wrote:
> On 11/17/2011 12:00 PM, Carsten Otte wrote:
> > From: Christian Borntraeger <borntraeger@de.ibm.com>
> >
> > There is a potential host deadlock in the tprot intercept handling.
> > We must not hold the mmap semaphore while resolving the guest
> > address. If userspace is remapping, then the memory detection in
> > the guest is broken anyway so we can safely separate the
> > address translation from walking the vmas.
> >
> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> > ---
> >
> > arch/s390/kvm/priv.c | 10 ++++++++--
> > 1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> > --- linux-2.6/arch/s390/kvm/priv.c 2011-10-24 09:10:05.000000000 +0200
> > +++ linux-2.6-patched/arch/s390/kvm/priv.c 2011-11-17 10:03:53.000000000 +0100
> > @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> > u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> > u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> > struct vm_area_struct *vma;
> > + unsigned long user_address;
> >
> > vcpu->stat.instruction_tprot++;
> >
> > @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> > return -EOPNOTSUPP;
> >
> >
> > + /* we must resolve the address without holding the mmap semaphore.
> > + * This is ok since the userspace hypervisor is not supposed to change
> > + * the mapping while the guest queries the memory. Otherwise the guest
> > + * might crash or get wrong info anyway. */
> > + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> > +
> > down_read(¤t->mm->mmap_sem);
> > - vma = find_vma(current->mm,
> > - (unsigned long) __guestaddr_to_user(vcpu, address1));
> > + vma = find_vma(current->mm, user_address);
> > if (!vma) {
> > up_read(¤t->mm->mmap_sem);
> > return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> >
>
> Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
> dereferences the guest page table? How can it assume that it is mapped?
The gmap code does not assume that the code is mapped. If the individual
MB has not been mapped in the guest address space or the target memory
is not mapped in the process address space __gmap_fault() returns -EFAULT.
> I'm probably misreading the code.
>
> A little closer to the patch, x86 handles the same issue by calling
> get_user_pages_fast(). This should be more scalable than bouncing
> mmap_sem, something to consider.
I don't think that the frequency of asynchronous page faults will make
it necessary to use get_user_pages_fast(). We are talking about the
case where I/O is necessary to provide the page that the guest accessed.
The advantage of the way s390 does things is that after __gmap_fault
translated the guest address to a user space address we can just do a
standard page fault for the user space process. Only if that requires
I/O we go the long way. Makes sense?
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
2011-11-17 11:15 ` Martin Schwidefsky
@ 2011-11-17 11:32 ` Martin Schwidefsky
2011-11-20 12:05 ` Avi Kivity
2011-11-20 12:02 ` Avi Kivity
1 sibling, 1 reply; 13+ messages in thread
From: Martin Schwidefsky @ 2011-11-17 11:32 UTC (permalink / raw)
To: Avi Kivity
Cc: Carsten Otte, Marcelo Tossati, Christian Borntraeger,
Heiko Carstens, Cornelia Huck, KVM
On Thu, 17 Nov 2011 12:15:52 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> On Thu, 17 Nov 2011 12:27:41 +0200
> Avi Kivity <avi@redhat.com> wrote:
>
> > On 11/17/2011 12:00 PM, Carsten Otte wrote:
> > > From: Christian Borntraeger <borntraeger@de.ibm.com>
> > >
> > > There is a potential host deadlock in the tprot intercept handling.
> > > We must not hold the mmap semaphore while resolving the guest
> > > address. If userspace is remapping, then the memory detection in
> > > the guest is broken anyway so we can safely separate the
> > > address translation from walking the vmas.
> > >
> > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> > > ---
> > >
> > > arch/s390/kvm/priv.c | 10 ++++++++--
> > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> > > --- linux-2.6/arch/s390/kvm/priv.c 2011-10-24 09:10:05.000000000 +0200
> > > +++ linux-2.6-patched/arch/s390/kvm/priv.c 2011-11-17 10:03:53.000000000 +0100
> > > @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> > > u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> > > u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> > > struct vm_area_struct *vma;
> > > + unsigned long user_address;
> > >
> > > vcpu->stat.instruction_tprot++;
> > >
> > > @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> > > return -EOPNOTSUPP;
> > >
> > >
> > > + /* we must resolve the address without holding the mmap semaphore.
> > > + * This is ok since the userspace hypervisor is not supposed to change
> > > + * the mapping while the guest queries the memory. Otherwise the guest
> > > + * might crash or get wrong info anyway. */
> > > + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> > > +
> > > down_read(¤t->mm->mmap_sem);
> > > - vma = find_vma(current->mm,
> > > - (unsigned long) __guestaddr_to_user(vcpu, address1));
> > > + vma = find_vma(current->mm, user_address);
> > > if (!vma) {
> > > up_read(¤t->mm->mmap_sem);
> > > return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> > >
> >
> > Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
> > dereferences the guest page table? How can it assume that it is mapped?
>
> The gmap code does not assume that the code is mapped. If the individual
> MB has not been mapped in the guest address space or the target memory
> is not mapped in the process address space __gmap_fault() returns -EFAULT.
>
> > I'm probably misreading the code.
> >
> > A little closer to the patch, x86 handles the same issue by calling
> > get_user_pages_fast(). This should be more scalable than bouncing
> > mmap_sem, something to consider.
>
> I don't think that the frequency of asynchronous page faults will make
> it necessary to use get_user_pages_fast(). We are talking about the
> case where I/O is necessary to provide the page that the guest accessed.
>
> The advantage of the way s390 does things is that after __gmap_fault
> translated the guest address to a user space address we can just do a
> standard page fault for the user space process. Only if that requires
> I/O we go the long way. Makes sense?
Hmm, Carsten just made me aware that your question is not about pfault,
it is about the standard case of a guest fault.
For normal guest faults we use a cool trick that the s390 hardware
allows us. We have the paging table for the kvm process and we have the
guest page table for execution in the virtualized context. The trick is
that the guest page table reuses the lowest level of the process page
table. A fault that sets a pte in the process page table will
automatically make that pte visible in the guest page table as well
if the memory region has been mapped in the higher order page tables.
Even the invalidation of a pte will automatically (!!) remove the
referenced page from the guest page table as well, including the TLB
entries on all cpus. The IPTE instruction is your friend :-)
That is why we don't need mm notifiers.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
2011-11-17 11:15 ` Martin Schwidefsky
2011-11-17 11:32 ` Martin Schwidefsky
@ 2011-11-20 12:02 ` Avi Kivity
1 sibling, 0 replies; 13+ messages in thread
From: Avi Kivity @ 2011-11-20 12:02 UTC (permalink / raw)
To: Martin Schwidefsky
Cc: Carsten Otte, Marcelo Tossati, Christian Borntraeger,
Heiko Carstens, Cornelia Huck, KVM
On 11/17/2011 01:15 PM, Martin Schwidefsky wrote:
> On Thu, 17 Nov 2011 12:27:41 +0200
> Avi Kivity <avi@redhat.com> wrote:
>
> > On 11/17/2011 12:00 PM, Carsten Otte wrote:
> > > From: Christian Borntraeger <borntraeger@de.ibm.com>
> > >
> > > There is a potential host deadlock in the tprot intercept handling.
> > > We must not hold the mmap semaphore while resolving the guest
> > > address. If userspace is remapping, then the memory detection in
> > > the guest is broken anyway so we can safely separate the
> > > address translation from walking the vmas.
> > >
> > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> > > ---
> > >
> > > arch/s390/kvm/priv.c | 10 ++++++++--
> > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> > > --- linux-2.6/arch/s390/kvm/priv.c 2011-10-24 09:10:05.000000000 +0200
> > > +++ linux-2.6-patched/arch/s390/kvm/priv.c 2011-11-17 10:03:53.000000000 +0100
> > > @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> > > u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> > > u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> > > struct vm_area_struct *vma;
> > > + unsigned long user_address;
> > >
> > > vcpu->stat.instruction_tprot++;
> > >
> > > @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> > > return -EOPNOTSUPP;
> > >
> > >
> > > + /* we must resolve the address without holding the mmap semaphore.
> > > + * This is ok since the userspace hypervisor is not supposed to change
> > > + * the mapping while the guest queries the memory. Otherwise the guest
> > > + * might crash or get wrong info anyway. */
> > > + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> > > +
> > > down_read(¤t->mm->mmap_sem);
> > > - vma = find_vma(current->mm,
> > > - (unsigned long) __guestaddr_to_user(vcpu, address1));
> > > + vma = find_vma(current->mm, user_address);
> > > if (!vma) {
> > > up_read(¤t->mm->mmap_sem);
> > > return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> > >
> >
> > Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
> > dereferences the guest page table? How can it assume that it is mapped?
>
> The gmap code does not assume that the code is mapped. If the individual
> MB has not been mapped in the guest address space or the target memory
> is not mapped in the process address space __gmap_fault() returns -EFAULT.
>
> > I'm probably misreading the code.
I did misread it - I assumed it was guest page table, whereas those are
host page tables mapping guest memory (here, "guest page table" mean
guest-managed virt to phys translation).
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
2011-11-17 11:32 ` Martin Schwidefsky
@ 2011-11-20 12:05 ` Avi Kivity
0 siblings, 0 replies; 13+ messages in thread
From: Avi Kivity @ 2011-11-20 12:05 UTC (permalink / raw)
To: Martin Schwidefsky
Cc: Carsten Otte, Marcelo Tossati, Christian Borntraeger,
Heiko Carstens, Cornelia Huck, KVM
On 11/17/2011 01:32 PM, Martin Schwidefsky wrote:
> On Thu, 17 Nov 2011 12:15:52 +0100
> Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
>
> > On Thu, 17 Nov 2011 12:27:41 +0200
> > Avi Kivity <avi@redhat.com> wrote:
> >
> > > On 11/17/2011 12:00 PM, Carsten Otte wrote:
> > > > From: Christian Borntraeger <borntraeger@de.ibm.com>
> > > >
> > > > There is a potential host deadlock in the tprot intercept handling.
> > > > We must not hold the mmap semaphore while resolving the guest
> > > > address. If userspace is remapping, then the memory detection in
> > > > the guest is broken anyway so we can safely separate the
> > > > address translation from walking the vmas.
> > > >
> > > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > > Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> > > > ---
> > > >
> > > > arch/s390/kvm/priv.c | 10 ++++++++--
> > > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> > > > --- linux-2.6/arch/s390/kvm/priv.c 2011-10-24 09:10:05.000000000 +0200
> > > > +++ linux-2.6-patched/arch/s390/kvm/priv.c 2011-11-17 10:03:53.000000000 +0100
> > > > @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> > > > u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> > > > u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> > > > struct vm_area_struct *vma;
> > > > + unsigned long user_address;
> > > >
> > > > vcpu->stat.instruction_tprot++;
> > > >
> > > > @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> > > > return -EOPNOTSUPP;
> > > >
> > > >
> > > > + /* we must resolve the address without holding the mmap semaphore.
> > > > + * This is ok since the userspace hypervisor is not supposed to change
> > > > + * the mapping while the guest queries the memory. Otherwise the guest
> > > > + * might crash or get wrong info anyway. */
> > > > + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> > > > +
> > > > down_read(¤t->mm->mmap_sem);
> > > > - vma = find_vma(current->mm,
> > > > - (unsigned long) __guestaddr_to_user(vcpu, address1));
> > > > + vma = find_vma(current->mm, user_address);
> > > > if (!vma) {
> > > > up_read(¤t->mm->mmap_sem);
> > > > return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> > > >
> > >
> > > Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
> > > dereferences the guest page table? How can it assume that it is mapped?
> >
> > The gmap code does not assume that the code is mapped. If the individual
> > MB has not been mapped in the guest address space or the target memory
> > is not mapped in the process address space __gmap_fault() returns -EFAULT.
> >
> > > I'm probably misreading the code.
> > >
> > > A little closer to the patch, x86 handles the same issue by calling
> > > get_user_pages_fast(). This should be more scalable than bouncing
> > > mmap_sem, something to consider.
> >
> > I don't think that the frequency of asynchronous page faults will make
> > it necessary to use get_user_pages_fast(). We are talking about the
> > case where I/O is necessary to provide the page that the guest accessed.
> >
> > The advantage of the way s390 does things is that after __gmap_fault
> > translated the guest address to a user space address we can just do a
> > standard page fault for the user space process. Only if that requires
> > I/O we go the long way. Makes sense?
>
> Hmm, Carsten just made me aware that your question is not about pfault,
> it is about the standard case of a guest fault.
>
> For normal guest faults we use a cool trick that the s390 hardware
> allows us. We have the paging table for the kvm process and we have the
> guest page table for execution in the virtualized context. The trick is
> that the guest page table reuses the lowest level of the process page
> table. A fault that sets a pte in the process page table will
> automatically make that pte visible in the guest page table as well
> if the memory region has been mapped in the higher order page tables.
> Even the invalidation of a pte will automatically (!!) remove the
> referenced page from the guest page table as well, including the TLB
> entries on all cpus. The IPTE instruction is your friend :-)
> That is why we don't need mm notifiers.
Yes, that explains it perfectly. I congratulate you on having such
friendly hardware...
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2011-11-20 12:06 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-17 10:00 [patch 0/4] kvm-s390 patches Carsten Otte
2011-11-17 10:00 ` [patch 1/4] [PATCH] kvm-s390: Fix RUNNING flag misinterpretation Carsten Otte
2011-11-17 10:00 ` [patch 2/4] [PATCH] kvm-s390: handle SIGP sense running intercepts Carsten Otte
2011-11-17 10:15 ` Avi Kivity
2011-11-17 10:19 ` Christian Borntraeger
2011-11-17 10:00 ` [patch 3/4] [PATCH] kvm: Fix tprot locking Carsten Otte
2011-11-17 10:27 ` Avi Kivity
2011-11-17 11:15 ` Martin Schwidefsky
2011-11-17 11:32 ` Martin Schwidefsky
2011-11-20 12:05 ` Avi Kivity
2011-11-20 12:02 ` Avi Kivity
2011-11-17 10:00 ` [patch 4/4] [PATCH] kvm: announce SYNC_MMU Carsten Otte
2011-11-17 10:35 ` [patch 0/4] kvm-s390 patches Avi Kivity
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).