Re: [patch 3/4] [PATCH] kvm: Fix tprot locking

kvm.vger.kernel.org archive mirror

From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Avi Kivity <avi@redhat.com>
Cc: Carsten Otte <cotte@de.ibm.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Cornelia Huck <cornelia.huck@de.ibm.com>,
	KVM <kvm@vger.kernel.org>
Subject: Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
Date: Thu, 17 Nov 2011 12:32:33 +0100
Message-ID: <20111117123233.79e35a43@de.ibm.com>
In-Reply-To: <20111117121552.798a359b@de.ibm.com>

On Thu, 17 Nov 2011 12:15:52 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> On Thu, 17 Nov 2011 12:27:41 +0200
> Avi Kivity <avi@redhat.com> wrote:
> 
> > On 11/17/2011 12:00 PM, Carsten Otte wrote:
> > > From: Christian Borntraeger <borntraeger@de.ibm.com> 
> > >
> > > There is a potential host deadlock in the tprot intercept handling.
> > > We must not hold the mmap semaphore while resolving the guest
> > > address. If userspace is remapping, then the memory detection in
> > > the guest is broken anyway so we can safely separate the 
> > > address translation from walking the vmas.
> > >
> > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> 
> > > Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> > > ---
> > >
> > >  arch/s390/kvm/priv.c |   10 ++++++++--
> > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> > > --- linux-2.6/arch/s390/kvm/priv.c	2011-10-24 09:10:05.000000000 +0200
> > > +++ linux-2.6-patched/arch/s390/kvm/priv.c	2011-11-17 10:03:53.000000000 +0100
> > > @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> > >  	u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> > >  	u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> > >  	struct vm_area_struct *vma;
> > > +	unsigned long user_address;
> > >  
> > >  	vcpu->stat.instruction_tprot++;
> > >  
> > > @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> > >  		return -EOPNOTSUPP;
> > >  
> > >  
> > > +	/* we must resolve the address without holding the mmap semaphore.
> > > +	 * This is ok since the userspace hypervisor is not supposed to change
> > > +	 * the mapping while the guest queries the memory. Otherwise the guest
> > > +	 * might crash or get wrong info anyway. */
> > > +	user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> > > +
> > >  	down_read(&current->mm->mmap_sem);
> > > -	vma = find_vma(current->mm,
> > > -			(unsigned long) __guestaddr_to_user(vcpu, address1));
> > > +	vma = find_vma(current->mm, user_address);
> > >  	if (!vma) {
> > >  		up_read(&current->mm->mmap_sem);
> > >  		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> > >
> > 
> > Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
> > dereferences the guest page table?  How can it assume that it is mapped?
> 
> The gmap code does not assume that the code is mapped. If the individual
> MB has not been mapped in the guest address space or the target memory
> is not mapped in the process address space __gmap_fault() returns -EFAULT. 
> 
> > I'm probably misreading the code.
> > 
> > A little closer to the patch, x86 handles the same issue by calling
> > get_user_pages_fast().  This should be more scalable than bouncing
> > mmap_sem, something to consider.
> 
> I don't think that the frequency of asynchronous page faults will make
> it necessary to use get_user_pages_fast(). We are talking about the
> case where I/O is necessary to provide the page that the guest accessed.
> 
> The advantage of the way s390 does things is that after __gmap_fault
> translated the guest address to a user space address we can just do a
> standard page fault for the user space process. Only if that requires
> I/O we go the long way. Makes sense?

Hmm, Carsten just made me aware that your question is not about pfault
but about the standard case of a guest fault.

For normal guest faults we use a cool trick that the s390 hardware
allows. We have the page table for the kvm process and we have the
guest page table for execution in the virtualized context. The trick is
that the guest page table reuses the lowest level of the process page
table. A fault that sets a pte in the process page table automatically
makes that pte visible in the guest page table as well, provided the
memory region has been mapped in the higher-order page tables.
Even the invalidation of a pte automatically (!!) removes the
referenced page from the guest page table, including the TLB
entries on all cpus. The IPTE instruction is your friend :-)
That is why we don't need mm notifiers.
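
To make the sharing idea concrete, here is a toy user-space model in C.
It is only a sketch under loud assumptions: all names (toy_mm, toy_gmap,
leaf_table, the toy_* helpers) and the table sizes are made up for
illustration and are not the kernel's gmap code. It models nothing but
the pointer sharing; the TLB flush that IPTE performs in hardware is
not modeled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTES_PER_TABLE 256          /* toy size of a lowest-level table */
#define SEGMENTS       4            /* toy number of upper-level slots  */
#define PTE_INVALID    UINT64_C(0)  /* 0 stands for "no page mapped"    */

/* Lowest-level page table: one entry per page. */
struct leaf_table {
	uint64_t pte[PTES_PER_TABLE];
};

/* Hypothetical stand-in for the page table of the kvm process. */
struct toy_mm {
	struct leaf_table *segment[SEGMENTS];
};

/* Hypothetical stand-in for the guest page table. */
struct toy_gmap {
	struct leaf_table *segment[SEGMENTS];
};

/*
 * Map a guest segment by pointing it at the leaf table the process
 * already uses. No entries are copied; both upper levels now reference
 * the same lowest-level table.
 */
static void toy_gmap_link(struct toy_gmap *g, size_t gseg,
			  const struct toy_mm *mm, size_t useg)
{
	assert(gseg < SEGMENTS && useg < SEGMENTS);
	g->segment[gseg] = mm->segment[useg];
}

/* A process page fault fills in a pte in the process page table. */
static void toy_mm_set_pte(struct toy_mm *mm, size_t seg, size_t idx,
			   uint64_t pte)
{
	assert(seg < SEGMENTS && idx < PTES_PER_TABLE && mm->segment[seg]);
	mm->segment[seg]->pte[idx] = pte;
}

/* The host invalidates a pte in the process page table. */
static void toy_mm_invalidate_pte(struct toy_mm *mm, size_t seg, size_t idx)
{
	assert(seg < SEGMENTS && idx < PTES_PER_TABLE && mm->segment[seg]);
	mm->segment[seg]->pte[idx] = PTE_INVALID;
}

/* Guest-side lookup: walks the guest upper level, then the shared leaf. */
static uint64_t toy_gmap_lookup(const struct toy_gmap *g, size_t gseg,
				size_t idx)
{
	assert(gseg < SEGMENTS && idx < PTES_PER_TABLE);
	if (!g->segment[gseg])
		return PTE_INVALID;   /* region not mapped in the guest */
	return g->segment[gseg]->pte[idx];
}
```

In this model a guest lookup returns PTE_INVALID until the segment has
been linked. After toy_gmap_link(), every toy_mm_set_pte() and
toy_mm_invalidate_pte() on the process side is seen by
toy_gmap_lookup() without any notifier callback, because there is only
one copy of the pte.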

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.
