From: Avi Kivity <avi@redhat.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Carsten Otte <cotte@de.ibm.com>,
Marcelo Tossati <mtosatti@redhat.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Cornelia Huck <cornelia.huck@de.ibm.com>,
KVM <kvm@vger.kernel.org>
Subject: Re: [patch 3/4] [PATCH] kvm: Fix tprot locking
Date: Sun, 20 Nov 2011 14:05:56 +0200
Message-ID: <4EC8ED24.3090901@redhat.com>
In-Reply-To: <20111117123233.79e35a43@de.ibm.com>
On 11/17/2011 01:32 PM, Martin Schwidefsky wrote:
> On Thu, 17 Nov 2011 12:15:52 +0100
> Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
>
> > On Thu, 17 Nov 2011 12:27:41 +0200
> > Avi Kivity <avi@redhat.com> wrote:
> >
> > > On 11/17/2011 12:00 PM, Carsten Otte wrote:
> > > > From: Christian Borntraeger <borntraeger@de.ibm.com>
> > > >
> > > > There is a potential host deadlock in the tprot intercept handling.
> > > > We must not hold the mmap semaphore while resolving the guest
> > > > address. If userspace is remapping, then the memory detection in
> > > > the guest is broken anyway so we can safely separate the
> > > > address translation from walking the vmas.
> > > >
> > > > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> > > > Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> > > > ---
> > > >
> > > > arch/s390/kvm/priv.c | 10 ++++++++--
> > > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
> > > > --- linux-2.6/arch/s390/kvm/priv.c 2011-10-24 09:10:05.000000000 +0200
> > > > +++ linux-2.6-patched/arch/s390/kvm/priv.c 2011-11-17 10:03:53.000000000 +0100
> > > > @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
> > > > u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
> > > > u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
> > > > struct vm_area_struct *vma;
> > > > + unsigned long user_address;
> > > >
> > > > vcpu->stat.instruction_tprot++;
> > > >
> > > > @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
> > > > return -EOPNOTSUPP;
> > > >
> > > >
> > > > + /* we must resolve the address without holding the mmap semaphore.
> > > > + * This is ok since the userspace hypervisor is not supposed to change
> > > > + * the mapping while the guest queries the memory. Otherwise the guest
> > > > + * might crash or get wrong info anyway. */
> > > > + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
> > > > +
> > > > down_read(&current->mm->mmap_sem);
> > > > - vma = find_vma(current->mm,
> > > > - (unsigned long) __guestaddr_to_user(vcpu, address1));
> > > > + vma = find_vma(current->mm, user_address);
> > > > if (!vma) {
> > > > up_read(&current->mm->mmap_sem);
> > > > return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
> > > >
> > >
> > > Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
> > > dereferences the guest page table? How can it assume that it is mapped?
> >
> > The gmap code does not assume that the code is mapped. If the individual
> > MB has not been mapped in the guest address space or the target memory
> > is not mapped in the process address space __gmap_fault() returns -EFAULT.
> >
> > > I'm probably misreading the code.
> > >
> > > A little closer to the patch, x86 handles the same issue by calling
> > > get_user_pages_fast(). This should be more scalable than bouncing
> > > mmap_sem, something to consider.
> >
> > I don't think that the frequency of asynchronous page faults will make
> > it necessary to use get_user_pages_fast(). We are talking about the
> > case where I/O is necessary to provide the page that the guest accessed.
> >
> > The advantage of the way s390 does things is that after __gmap_fault
> > translated the guest address to a user space address we can just do a
> > standard page fault for the user space process. Only if that requires
> > I/O we go the long way. Makes sense?
>
> Hmm, Carsten just made me aware that your question is not about pfault,
> it is about the standard case of a guest fault.
>
> For normal guest faults we use a cool trick that the s390 hardware
> allows us. We have the paging table for the kvm process and we have the
> guest page table for execution in the virtualized context. The trick is
> that the guest page table reuses the lowest level of the process page
> table. A fault that sets a pte in the process page table will
> automatically make that pte visible in the guest page table as well
> if the memory region has been mapped in the higher order page tables.
> Even the invalidation of a pte will automatically (!!) remove the
> referenced page from the guest page table as well, including the TLB
> entries on all cpus. The IPTE instruction is your friend :-)
> That is why we don't need mm notifiers.
Yes, that explains it perfectly. I congratulate you on having such
friendly hardware...
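
To make the sharing concrete, here is a toy userspace model of what Martin
describes. It is only a sketch under assumptions: the struct names, the table
sizes, and the lookup()/share_leaf() helpers are invented for illustration and
are not the gmap code, and nothing here models the TLB or the IPTE
instruction. It shows only that a pte set or cleared in a leaf table shared by
two upper-level tables is seen through both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only; not the real s390 page table geometry. */
#define LEAF_ENTRIES  256
#define UPPER_ENTRIES 4

/* Hypothetical lowest-level table: one pte per page, 0 means invalid. */
struct leaf_table {
	uint64_t pte[LEAF_ENTRIES];
};

/* Hypothetical upper level: each slot points at a leaf table or is NULL. */
struct upper_table {
	struct leaf_table *leaf[UPPER_ENTRIES];
};

/*
 * Translate (seg, idx) through one upper table. Returns false if the
 * segment is not mapped at the upper level or the pte is invalid.
 */
static bool lookup(const struct upper_table *top, size_t seg, size_t idx,
		   uint64_t *out)
{
	if (seg >= UPPER_ENTRIES || idx >= LEAF_ENTRIES)
		return false;
	const struct leaf_table *leaf = top->leaf[seg];
	if (leaf == NULL || leaf->pte[idx] == 0)
		return false;
	*out = leaf->pte[idx];
	return true;
}

/*
 * Map a region into the guest by pointing a guest upper-level slot at
 * the leaf table the process already uses. No ptes are copied.
 */
static void share_leaf(struct upper_table *guest, size_t guest_seg,
		       const struct upper_table *host, size_t host_seg)
{
	assert(guest_seg < UPPER_ENTRIES && host_seg < UPPER_ENTRIES);
	guest->leaf[guest_seg] = host->leaf[host_seg];
}
```

After share_leaf(), storing a value into the leaf's pte array makes lookup()
succeed through both upper tables, and zeroing it makes both fail again,
without any callback between the two. In this model, that is the property that
stands in for not needing mm notifiers.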
--
error compiling committee.c: too many arguments to function