From: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>
To: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
Cc: kvm-devel <kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Subject: Re: [PATCH] Fix SMP shadow instantiation race
Date: Mon, 10 Dec 2007 17:22:07 -0500
Message-ID: <20071210222207.GA17457@dmt>
In-Reply-To: <475DAF51.8060804-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
On Mon, Dec 10, 2007 at 11:27:45PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
> > On Mon, Dec 10, 2007 at 07:07:54PM +0200, Avi Kivity wrote:
> >
> >> Marcelo Tosatti wrote:
> >>
> >>> There is a race where VCPU0 is shadowing a pagetable entry while VCPU1
> >>> is updating it, which results in a stale shadow copy.
> >>>
> >>> Fix that by comparing the contents of the cached guest pte with the
> >>> current guest pte after write-protecting the guest pagetable.
> >>>
> >>> Attached program kvm_shadow_race.c demonstrates the problem.
> >>>
> >>>
> >>>
> >> Where is it?
> >>
> >
> > Attached.
> >
> >
>
> Can you explain what it does? I get the same results on both host and
> guest (successful completion).
What it does is:
1) mmaps a 128MB file with PROT_READ|PROT_WRITE
2) starts 32 threads, each reading its 1/32 share of the mapping (128MB/32)
3) calls mprotect(PROT_READ) on that region
Steps 2) and 3) run concurrently.
I added a printk inside the fault handler to read and compare the
original and the just-shadowed pte, and you can sometimes see that they
differ: the shadowed PTE has the writeable bit set but the original
doesn't. It takes about 10-20 runs of the program to hit the race on a
4-way host with a 4-way guest. The program always succeeds, even when
the race happens.
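For anyone reading without the attachment, the three steps above could be
sketched roughly as follows. This is not kvm_shadow_race.c itself:
run_race(), struct reader_arg, the temporary file path and the use of
MAP_SHARED are all assumptions made for illustration.

```c
/* Userspace sketch of the steps listed above; NOT the attached program.
 * All names and the MAP_SHARED choice are hypothetical. */
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

struct reader_arg {
	const volatile unsigned char *start; /* first byte of this thread's slice */
	size_t len;                          /* slice length in bytes */
	unsigned long sum;                   /* sink so the reads are not elided */
};

/* Touch one byte per page of the slice, forcing a read fault on each page. */
static void *reader(void *p)
{
	struct reader_arg *a = p;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	assert(a != NULL);
	for (size_t off = 0; off < a->len; off += page)
		a->sum += a->start[off];
	return NULL;
}

/* Returns 0 on success, -1 on any setup failure. */
int run_race(size_t size, int nthreads)
{
	char path[] = "/tmp/shadow_race_XXXXXX";
	int fd, started = 0, rc = 0;

	if (nthreads <= 0)
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)size) != 0) {
		close(fd);
		return -1;
	}

	/* Step 1: map the file read/write. */
	unsigned char *map = mmap(NULL, size, PROT_READ | PROT_WRITE,
				  MAP_SHARED, fd, 0);
	close(fd);
	if (map == MAP_FAILED)
		return -1;

	pthread_t *tid = calloc((size_t)nthreads, sizeof(*tid));
	struct reader_arg *args = calloc((size_t)nthreads, sizeof(*args));
	if (!tid || !args) {
		free(tid);
		free(args);
		munmap(map, size);
		return -1;
	}

	/* Step 2: each thread reads its own slice of the mapping. */
	size_t slice = size / (size_t)nthreads;
	for (int i = 0; i < nthreads; i++) {
		args[i].start = map + (size_t)i * slice;
		args[i].len = slice;
		if (pthread_create(&tid[i], NULL, reader, &args[i]) != 0) {
			rc = -1;
			break;
		}
		started++;
	}

	/* Step 3: drop write permission while the readers are still faulting. */
	if (mprotect(map, size, PROT_READ) != 0)
		rc = -1;

	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	free(tid);
	free(args);
	munmap(map, size);
	return rc;
}
```

Calling run_race(128UL << 20, 32) matches the numbers above (build with
-pthread). Like the original, it only sets up the window for the race; it
does not detect it.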
A similar scenario could be the one you mentioned earlier: the guest
kernel is nuking a pte to reclaim a page while another CPU is
instantiating the shadow pte for it, in which case there would be a
shadow mapping for a now-freed page, resulting in data corruption.
To do that, the kernel sets the PTE to zero and then flushes the TLB,
but for KVM the TLB flush has no effect.
> >>> diff --git a/drivers/kvm/paging_tmpl.h b/drivers/kvm/paging_tmpl.h
> >>> index 72d4816..4fece01 100644
> >>> --- a/drivers/kvm/paging_tmpl.h
> >>> +++ b/drivers/kvm/paging_tmpl.h
> >>> @@ -66,6 +66,7 @@ struct guest_walker {
> >>> int level;
> >>> gfn_t table_gfn[PT_MAX_FULL_LEVELS];
> >>> pt_element_t pte;
> >>> + gpa_t pte_gpa;
> >>>
> >>>
> >> I think this needs to be an array like table_gfn[]. The guest may play
> >> with the pde (and upper entries) as well as the pte.
> >>
> >
> > I was working under the assumption that the only significant bits of
> > upper entries (WRITEABLE and PRESENT) that can be changed by the guest
> > must be reflected first in the lower level pte's.
> >
> > Isnt that a fair assumption to make?
> >
> >
>
> The other bits (including the physical addresses) may change too. There
> is no requirement that the changes to pde write/present bits be
> reflected on pte write/present bits.
>
> Consider a unix kernel implementing fork() by write-protecting the pud
> tables. It can write protect the entire user address space by clearing
> the write bit on the first 256 pgd entries.
>
> (I don't think Linux does that; maybe that's a worthwhile optimization)
OK, I'll do as you suggest and compare in mmu_get_page() just after
shadowing (which also gets rid of the copy/test if the page is already
shadowed).
Thanks