Re: [patch 3/4] [PATCH] kvm: Fix tprot locking

kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Avi Kivity <avi@redhat.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>,
	Marcelo Tossati <mtosatti@redhat.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Cornelia Huck <cornelia.huck@de.ibm.com>,
	KVM <kvm@vger.kernel.org>
Subject: Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
Date: Sun, 20 Nov 2011 14:05:56 +0200	[thread overview]
Message-ID: <4EC8ED24.3090901@redhat.com> (raw)
In-Reply-To: <20111117123233.79e35a43@de.ibm.com>

On 11/17/2011 01:32 PM, Martin Schwidefsky wrote:
> On Thu, 17 Nov 2011 12:15:52 +0100
> Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
>
> > On Thu, 17 Nov 2011 12:27:41 +0200
> > Avi Kivity <avi@redhat.com> wrote:
> > 
> > > On 11/17/2011 12:00 PM, Carsten Otte wrote:
> > > > From: Christian Borntraeger <borntraeger@de.ibm.com> 
> > > >
> > > > There is a potential host deadlock in the tprot intercept handling.
> > > > We must not hold the mmap semaphore while resolving the guest
> > > > address. If userspace is remapping, then the memory detection in
> > > > the guest is broken anyway so we can safely separate the 
> > > > address translation from walking the vmas.
> > > >
> > > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> 
> > > > Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> > > > ---
> > > >
> > > >  arch/s390/kvm/priv.c |   10 ++++++++--
> > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> > > > --- linux-2.6/arch/s390/kvm/priv.c	2011-10-24 09:10:05.000000000 +0200
> > > > +++ linux-2.6-patched/arch/s390/kvm/priv.c	2011-11-17 10:03:53.000000000 +0100
> > > > @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> > > >  	u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> > > >  	u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> > > >  	struct vm_area_struct *vma;
> > > > +	unsigned long user_address;
> > > >  
> > > >  	vcpu->stat.instruction_tprot++;
> > > >  
> > > > @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> > > >  		return -EOPNOTSUPP;
> > > >  
> > > >  
> > > > +	/* we must resolve the address without holding the mmap semaphore.
> > > > +	 * This is ok since the userspace hypervisor is not supposed to change
> > > > +	 * the mapping while the guest queries the memory. Otherwise the guest
> > > > +	 * might crash or get wrong info anyway. */
> > > > +	user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> > > > +
> > > >  	down_read(&current->mm->mmap_sem);
> > > > -	vma = find_vma(current->mm,
> > > > -			(unsigned long) __guestaddr_to_user(vcpu, address1));
> > > > +	vma = find_vma(current->mm, user_address);
> > > >  	if (!vma) {
> > > >  		up_read(&current->mm->mmap_sem);
> > > >  		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> > > >
> > > 
> > > Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
> > > dereferences the guest page table?  How can it assume that it is mapped?
> > 
> > The gmap code does not assume that the code is mapped. If the individual
> > MB has not been mapped in the guest address space or the target memory
> > is not mapped in the process address space __gmap_fault() returns -EFAULT. 
> > 
> > > I'm probably misreading the code.
> > > 
> > > A little closer to the patch, x86 handles the same issue by calling
> > > get_user_pages_fast().  This should be more scalable than bouncing
> > > mmap_sem, something to consider.
> > 
> > I don't think that the frequency of asynchronous page faults will make
> > it necessary to use get_user_pages_fast(). We are talking about the
> > case where I/O is necessary to provide the page that the guest accessed.
> > 
> > The advantage of the way s390 does things is that after __gmap_fault
> > translated the guest address to a user space address we can just do a
> > standard page fault for the user space process. Only if that requires
> > I/O we go the long way. Makes sense?
>
> Hmm, Carsten just made me aware that your question is not about pfault,
> it is about the standard case of a guest fault.
>
> For normal guest faults we use a cool trick that the s390 hardware
> allows us. We have the paging table for the kvm process and we have the
> guest page table for execution in the virtualized context. The trick is
> that the guest page table reuses the lowest level of the process page
> table. A fault that sets a pte in the process page table will
> automatically make that pte visible in the guest page table as well
> if the memory region has been mapped in the higher order page tables.
> Even the invalidation of a pte will automatically (!!) remove the
> referenced page from the guest page table as well, including the TLB
> entries on all cpus. The IPTE instruction is your friend :-)
> That is why we don't need mm notifiers.

Yes, that explains it perfectly.  I congratulate you on having such
friendly hardware...

-- 
error compiling committee.c: too many arguments to function

next prev parent reply	other threads:[~2011-11-20 12:06 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-17 10:00 [patch 0/4] kvm-s390 patches Carsten Otte
2011-11-17 10:00 ` [patch 1/4] [PATCH] kvm-s390: Fix RUNNING flag misinterpretation Carsten Otte
2011-11-17 10:00 ` [patch 2/4] [PATCH] kvm-s390: handle SIGP sense running intercepts Carsten Otte
2011-11-17 10:15   ` Avi Kivity
2011-11-17 10:19     ` Christian Borntraeger
2011-11-17 10:00 ` [patch 3/4] [PATCH] kvm: Fix tprot locking Carsten Otte
2011-11-17 10:27   ` Avi Kivity
2011-11-17 11:15     ` Martin Schwidefsky
2011-11-17 11:32       ` Martin Schwidefsky
2011-11-20 12:05         ` Avi Kivity [this message]
2011-11-20 12:02       ` Avi Kivity
2011-11-17 10:00 ` [patch 4/4] [PATCH] kvm: announce SYNC_MMU Carsten Otte
2011-11-17 10:35 ` [patch 0/4] kvm-s390 patches Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4EC8ED24.3090901@redhat.com \
    --to=avi@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=cotte@de.ibm.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=schwidefsky@de.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).