From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Marcelo Tosatti <marcelo-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>
Cc: kvm-devel <kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>
Subject: Re: [PATCH] Use cmpxchg for pte updates on walk_addr()
Date: Fri, 07 Dec 2007 07:06:26 +0200
Message-ID: <4758D4D2.8090208@qumranet.com>
In-Reply-To: <20071207023237.GA2841@dmt>
Marcelo Tosatti wrote:
> Right, patch at end of the message restarts the process if the pte
> changes under the walker. The goto is pretty ugly, but I fail to see any
> elegant way of doing that. Ideas?
>
>
goto is fine for that. But there's a subtle livelock here: suppose vcpu
0 is in guest mode, continuously updating a memory location, while vcpu 1
is faulting with that memory location acting as a pte. The walk on vcpu 1
can then restart indefinitely, and while we're in kernel mode we aren't
responding to signals like we should. So we need to abort the walk and
let the guest retry; that way we go through the signal_pending() check.
However, this is an intrusive change, so let's start with the goto and
drop it later in favor of an abort.
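
To make the abort variant concrete, here is a rough userspace model, not
kernel code. walk_once(), fault_loop() and the signal_pending flag are
hypothetical names for illustration; the real check is signal_pending()
in the vcpu run path. The idea is that a failed cmpxchg returns to an
outer loop that looks at pending signals before trying again, instead of
jumping straight back to the top of the walk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pt_element_t;
#define PT_ACCESSED_MASK (1ULL << 5)

enum walk_status { WALK_DONE, WALK_ABORTED };

/*
 * One walk attempt (hypothetical model). 'seen' is the pte value the
 * walker read earlier. If the guest changed the entry since then, the
 * cmpxchg fails and we abort instead of looping inside the walker.
 */
static enum walk_status walk_once(_Atomic pt_element_t *gpte, pt_element_t seen)
{
    if (!(seen & PT_ACCESSED_MASK)) {
        pt_element_t expected = seen;
        if (!atomic_compare_exchange_strong(gpte, &expected,
                                            seen | PT_ACCESSED_MASK))
            return WALK_ABORTED;
    }
    return WALK_DONE;
}

/*
 * Stand-in for the outer run loop: every retry passes through the
 * pending-signal check, so a guest that keeps rewriting the pte cannot
 * pin us in kernel mode. Returns 0 on success, -1 if interrupted.
 */
static int fault_loop(_Atomic pt_element_t *gpte,
                      const volatile bool *signal_pending)
{
    for (;;) {
        if (*signal_pending)
            return -1;
        pt_element_t seen = atomic_load(gpte); /* the retry re-reads the pte */
        if (walk_once(gpte, seen) == WALK_DONE)
            return 0;
    }
}
```

With the goto, the equivalent of fault_loop() has no exit other than a
successful cmpxchg, which is where the livelock comes from.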
>>> @@ -1510,6 +1510,9 @@ static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>>> {
>>> int ret;
>>>
>>> + /* No need for kvm_cmpxchg_guest_pte here, its the guest
>>> + * responsability to synchronize pte updates and page faults.
>>> + */
>>> ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
>>> if (ret < 0)
>>> return 0;
>>>
>> Hmm. What if an i386 pae guest carefully uses cmpxchg8b to atomically
>> set a pte? kvm_write_guest() doesn't guarantee atomicity, so an
>> intended atomic write can be seen splitted by the guest walker doing a
>> concurrent walk.
>>
>
> True, an atomic write is needed... a separate patch for that seems more
> appropriate.
>
>
>
Yes.
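
For illustration, a small userspace sketch of what a split write looks
like to a concurrent walker. It assumes the 8-byte pte is written as two
32-bit halves, low half first; kvm_write_guest() makes no promise about
the order or granularity, so this is only one possible interleaving.
torn_view() and write_pte_atomic() are hypothetical helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Value a walker would read if it looked at the pte after the low
 * 32 bits were updated but before the high 32 bits were (assumed order).
 */
static uint64_t torn_view(uint64_t old_pte, uint64_t new_pte)
{
    return (old_pte & 0xffffffff00000000ULL) |
           (new_pte & 0x00000000ffffffffULL);
}

/*
 * An atomic 64-bit store: a reader sees either the old or the new
 * value, never a mix of the two.
 */
static void write_pte_atomic(_Atomic uint64_t *gpte, uint64_t new_pte)
{
    atomic_store(gpte, new_pte);
}
```

With old = 0x0000000100000001 and new = 0x0000000200001001, the torn
value is 0x0000000100001001: a present pte pointing at a frame the guest
never wrote.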
> +static inline bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
> + gfn_t table_gfn, unsigned index,
> + pt_element_t orig_pte, pt_element_t new_pte)
> +{
> + pt_element_t ret;
> + pt_element_t *table;
> + struct page *page;
> +
> + page = gfn_to_page(kvm, table_gfn);
> + table = kmap_atomic(page, KM_USER0);
> +
> + ret = CMPXCHG(&table[index], orig_pte, new_pte);
> +
> + kunmap_atomic(page, KM_USER0);
> +
>
kvm_release_page_dirty() is missing here; the reference taken by
gfn_to_page() needs to be dropped. You could also move mark_page_dirty()
here.
No need to force inlining.
> + return (ret != orig_pte);
> +}
> +
> /*
> * Fetch a guest pte for a guest virtual address
> */
> @@ -91,6 +112,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
> gpa_t pte_gpa;
>
> pgprintk("%s: addr %lx\n", __FUNCTION__, addr);
> +walk:
> walker->level = vcpu->mmu.root_level;
> pte = vcpu->cr3;
> #if PTTYPE == 64
> @@ -135,8 +157,9 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
>
> if (!(pte & PT_ACCESSED_MASK)) {
> mark_page_dirty(vcpu->kvm, table_gfn);
> - pte |= PT_ACCESSED_MASK;
> - kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
> + if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
> + index, pte, pte|PT_ACCESSED_MASK))
> + goto walk;
>
We lose the accessed bit in the local variable pte here. Not sure if it
matters but let's play it safe.
> }
>
> if (walker->level == PT_PAGE_TABLE_LEVEL) {
> @@ -159,9 +182,13 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
> }
>
> if (write_fault && !is_dirty_pte(pte)) {
> + bool ret;
> mark_page_dirty(vcpu->kvm, table_gfn);
> - pte |= PT_DIRTY_MASK;
> - kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
> + ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
> + pte|PT_DIRTY_MASK);
> + if (ret)
> + goto walk;
> +
Again we lose a bit in pte. That ends up in walker->pte and is quite
important.
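
Something along these lines would keep the local copy in sync. This is a
userspace model using C11 atomics in place of the patch's CMPXCHG macro;
set_gpte_bits() is a hypothetical wrapper, and the return convention of
cmpxchg_gpte() (true means the entry changed under us) follows the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pt_element_t;
#define PT_ACCESSED_MASK (1ULL << 5)
#define PT_DIRTY_MASK    (1ULL << 6)

/* Returns true if table[index] no longer held orig_pte (update not done). */
static bool cmpxchg_gpte(_Atomic pt_element_t *table, unsigned index,
                         pt_element_t orig_pte, pt_element_t new_pte)
{
    pt_element_t expected = orig_pte;
    return !atomic_compare_exchange_strong(&table[index], &expected, new_pte);
}

/*
 * Set 'bits' in the guest entry and, only on success, in the walker's
 * local copy too, so whatever is later stored in walker->pte carries
 * the accessed/dirty bits. Returns false if the walk must be restarted.
 */
static bool set_gpte_bits(_Atomic pt_element_t *table, unsigned index,
                          pt_element_t *pte, pt_element_t bits)
{
    if (cmpxchg_gpte(table, index, *pte, *pte | bits))
        return false;
    *pte |= bits;
    return true;
}
```

In walk_addr() the two call sites would then be roughly
"if (!set_gpte_bits(...)) goto walk;", with pte updated as a side effect.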
--
Any sufficiently difficult bug is indistinguishable from a feature.
Thread overview: 9+ messages
2007-12-06 15:04 [PATCH] Use cmpxchg for pte updates on walk_addr() Marcelo Tosatti
2007-12-06 15:24 ` Avi Kivity
[not found] ` <47581418.8000506-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-07 2:32 ` Marcelo Tosatti
2007-12-07 5:06 ` Avi Kivity [this message]
[not found] ` <4758D4D2.8090208-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-12-07 12:56 ` Marcelo Tosatti
2007-12-07 17:54 ` Andrea Arcangeli
2007-12-09 8:38 ` Avi Kivity
2007-12-07 22:47 ` Marcelo Tosatti
2007-12-09 8:47 ` Avi Kivity