* KVM: guest: only batch user pte updates
@ 2009-02-10 21:45 Marcelo Tosatti
2009-02-10 22:17 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 6+ messages in thread
From: Marcelo Tosatti @ 2009-02-10 21:45 UTC (permalink / raw)
To: Avi Kivity; +Cc: kvm-devel, Jeremy Fitzhardinge
KVM's paravirt mmu pte batching has issues with, at least, kernel
updates from DEBUG_PAGEALLOC.
This has been experienced with slab allocation from irq context from
within lazy mmu sections:
https://bugzilla.redhat.com/show_bug.cgi?id=480822
DEBUG_PAGEALLOC will map/unmap the kernel pagetables to catch bad
accesses, with code such as:
__change_page_attr():
/*
* Do we really change anything ?
*/
if (pte_val(old_pte) != pte_val(new_pte)) {
set_pte_atomic(kpte, new_pte);
cpa->flags |= CPA_FLUSHTLB;
}
A present->nonpresent update can be queued but not yet committed to
memory. So the set_pte_atomic will be skipped, but the stale queued
update is flushed afterwards. And this is set_pte_ATOMIC.
Only allow batching from set_pte_at, which is the interesting case.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 478bca9..ba2086a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -86,7 +86,7 @@ static void kvm_deferred_mmu_op(void *buffer, int len)
state->mmu_queue_len += len;
}
-static void kvm_mmu_write(void *dest, u64 val)
+static void kvm_mmu_write(void *dest, u64 val, bool batch)
{
__u64 pte_phys;
struct kvm_mmu_op_write_pte wpte;
@@ -107,6 +107,8 @@ static void kvm_mmu_write(void *dest, u64 val)
wpte.pte_phys = pte_phys;
kvm_deferred_mmu_op(&wpte, sizeof wpte);
+ if (!batch)
+ mmu_queue_flush(kvm_para_state());
}
/*
@@ -117,54 +119,54 @@ static void kvm_mmu_write(void *dest, u64 val)
*/
static void kvm_set_pte(pte_t *ptep, pte_t pte)
{
- kvm_mmu_write(ptep, pte_val(pte));
+ kvm_mmu_write(ptep, pte_val(pte), false);
}
static void kvm_set_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
- kvm_mmu_write(ptep, pte_val(pte));
+ kvm_mmu_write(ptep, pte_val(pte), true);
}
static void kvm_set_pmd(pmd_t *pmdp, pmd_t pmd)
{
- kvm_mmu_write(pmdp, pmd_val(pmd));
+ kvm_mmu_write(pmdp, pmd_val(pmd), false);
}
#if PAGETABLE_LEVELS >= 3
#ifdef CONFIG_X86_PAE
static void kvm_set_pte_atomic(pte_t *ptep, pte_t pte)
{
- kvm_mmu_write(ptep, pte_val(pte));
+ kvm_mmu_write(ptep, pte_val(pte), false);
}
static void kvm_set_pte_present(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte)
{
- kvm_mmu_write(ptep, pte_val(pte));
+ kvm_mmu_write(ptep, pte_val(pte), false);
}
static void kvm_pte_clear(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
- kvm_mmu_write(ptep, 0);
+ kvm_mmu_write(ptep, 0, false);
}
static void kvm_pmd_clear(pmd_t *pmdp)
{
- kvm_mmu_write(pmdp, 0);
+ kvm_mmu_write(pmdp, 0, false);
}
#endif
static void kvm_set_pud(pud_t *pudp, pud_t pud)
{
- kvm_mmu_write(pudp, pud_val(pud));
+ kvm_mmu_write(pudp, pud_val(pud), false);
}
#if PAGETABLE_LEVELS == 4
static void kvm_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
- kvm_mmu_write(pgdp, pgd_val(pgd));
+ kvm_mmu_write(pgdp, pgd_val(pgd), false);
}
#endif
#endif /* PAGETABLE_LEVELS >= 3 */
* Re: KVM: guest: only batch user pte updates
2009-02-10 21:45 KVM: guest: only batch user pte updates Marcelo Tosatti
@ 2009-02-10 22:17 ` Jeremy Fitzhardinge
2009-02-10 22:41 ` Marcelo Tosatti
From: Jeremy Fitzhardinge @ 2009-02-10 22:17 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Avi Kivity, kvm-devel, Rusty Russell, Zachary Amsden
Marcelo Tosatti wrote:
> KVM's paravirt mmu pte batching has issues with, at least, kernel
> updates from DEBUG_PAGEALLOC.
>
> This has been experienced with slab allocation from irq context from
> within lazy mmu sections:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=480822
>
> DEBUG_PAGEALLOC will map/unmap the kernel pagetables to catch bad
> accesses, with code such as:
>
> __change_page_attr():
>
> /*
> * Do we really change anything ?
> */
> if (pte_val(old_pte) != pte_val(new_pte)) {
> set_pte_atomic(kpte, new_pte);
> cpa->flags |= CPA_FLUSHTLB;
> }
>
> A present->nonpresent update can be queued but not yet committed to
> memory. So the set_pte_atomic will be skipped, but the stale queued
> update is flushed afterwards. And this is set_pte_ATOMIC.
>
Are you saying that there's a queued update which means that old_pte is
a stale value which happens to equal new_pte, so new_pte is never set?
OK, sounds like a generic problem, of the same sort we've had with
kmap_atomic being used in interrupt routines in lazy mode.
In this case, I think the proper fix is to call
arch_flush_lazy_mmu_mode() before reading old_pte to make sure it's up
to date, and to call it again when processing CPA_FLUSHTLB. Could you
try the patch below instead?
(BTW, set_pte_atomic doesn't mean synchronous; it just means it's safe to
use on live ptes on 32-bit PAE machines which can't otherwise atomically
update a pte.)
J
commit 264d7d09de69b1f729adb43acc86bd504dd21251
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Tue Feb 10 14:15:52 2009 -0800
x86/cpa: make sure cpa is safe to call in lazy mmu mode
The CPA code may be called while we're in lazy mmu update mode - for
example, when using DEBUG_PAGE_ALLOC and doing a slab allocation
in an interrupt handler which interrupted a lazy mmu update. In this
case, the in memory pagetable state may be out of date due to pending
queued updates. We need to flush any pending updates before inspecting
the page table. Similarly, we must explicitly flush any modifications
CPA may have made (which comes down to flushing queued operations when
flushing the TLB).
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 84ba748..fb12f06 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -576,6 +576,13 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
else
address = *cpa->vaddr;
+ /*
+ * If we're called with lazy mmu updates enabled, the
+ * in-memory pte state may be stale. Flush pending updates to
+ * bring them up to date.
+ */
+ arch_flush_lazy_mmu_mode();
+
repeat:
kpte = lookup_address(address, &level);
if (!kpte)
@@ -854,6 +861,13 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
} else
cpa_flush_all(cache);
+ /*
+ * If we've been called with lazy mmu updates enabled, then
+ * make sure that everything gets flushed out before we
+ * return.
+ */
+ arch_flush_lazy_mmu_mode();
+
out:
return ret;
}
* Re: KVM: guest: only batch user pte updates
2009-02-10 22:17 ` Jeremy Fitzhardinge
@ 2009-02-10 22:41 ` Marcelo Tosatti
2009-02-10 23:14 ` Jeremy Fitzhardinge
From: Marcelo Tosatti @ 2009-02-10 22:41 UTC (permalink / raw)
To: Jeremy Fitzhardinge; +Cc: Avi Kivity, kvm-devel, Rusty Russell, Zachary Amsden
On Tue, Feb 10, 2009 at 02:17:49PM -0800, Jeremy Fitzhardinge wrote:
> Marcelo Tosatti wrote:
>> KVM's paravirt mmu pte batching has issues with, at least, kernel
>> updates from DEBUG_PAGEALLOC.
>>
>> This has been experienced with slab allocation from irq context from
>> within lazy mmu sections:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=480822
>>
>> DEBUG_PAGEALLOC will map/unmap the kernel pagetables to catch bad
>> accesses, with code such as:
>>
>> __change_page_attr():
>>
>> /*
>> * Do we really change anything ?
>> */
>> if (pte_val(old_pte) != pte_val(new_pte)) {
>> set_pte_atomic(kpte, new_pte);
>> cpa->flags |= CPA_FLUSHTLB;
>> }
>>
>> A present->nonpresent update can be queued but not yet committed to
>> memory. So the set_pte_atomic will be skipped, but the stale queued
>> update is flushed afterwards. And this is set_pte_ATOMIC.
>>
>
> Are you saying that there's a queued update which means that old_pte is
> a stale value which happens to equal new_pte, so new_pte is never set?
> OK, sounds like a generic problem, of the same sort we've had with
> kmap_atomic being used in interrupt routines in lazy mode.
Yes. It seems however that only set_pte_at/pte_update/_defer are
used under significantly long lazy mmu sections (long as in number of
updates). Is it worthwhile to bother (and risk) batching kernel pte updates?
Until someone forgets about arch_flush_lazy_mmu_mode again...
> In this case, I think the proper fix is to call
> arch_flush_lazy_mmu_mode() before reading old_pte to make sure its up to
> date, and calling it again when processing CPA_FLUSHTLB.
> Could you try the patch below instead?
It should work yes.
> (BTW, set_pte_atomic doesn't mean synchronous; it just means its safe to
> use on live ptes on 32-bit PAE machines which can't otherwise atomically
> update a pte.)
Doh, of course.
* Re: KVM: guest: only batch user pte updates
2009-02-10 22:41 ` Marcelo Tosatti
@ 2009-02-10 23:14 ` Jeremy Fitzhardinge
2009-02-11 11:56 ` Avi Kivity
From: Jeremy Fitzhardinge @ 2009-02-10 23:14 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Avi Kivity, kvm-devel, Rusty Russell, Zachary Amsden
Marcelo Tosatti wrote:
> On Tue, Feb 10, 2009 at 02:17:49PM -0800, Jeremy Fitzhardinge wrote:
>
>> Marcelo Tosatti wrote:
>>
>>> KVM's paravirt mmu pte batching has issues with, at least, kernel
>>> updates from DEBUG_PAGEALLOC.
>>>
>>> This has been experienced with slab allocation from irq context from
>>> within lazy mmu sections:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=480822
>>>
>>> DEBUG_PAGEALLOC will map/unmap the kernel pagetables to catch bad
>>> accesses, with code such as:
>>>
>>> __change_page_attr():
>>>
>>> /*
>>> * Do we really change anything ?
>>> */
>>> if (pte_val(old_pte) != pte_val(new_pte)) {
>>> set_pte_atomic(kpte, new_pte);
>>> cpa->flags |= CPA_FLUSHTLB;
>>> }
>>>
>>> A present->nonpresent update can be queued but not yet committed to
>>> memory. So the set_pte_atomic will be skipped, but the stale queued
>>> update is flushed afterwards. And this is set_pte_ATOMIC.
>>>
>>>
>> Are you saying that there's a queued update which means that old_pte is
>> a stale value which happens to equal new_pte, so new_pte is never set?
>> OK, sounds like a generic problem, of the same sort we've had with
>> kmap_atomic being used in interrupt routines in lazy mode.
>>
>
> Yes. It seems however that only set_pte_at/pte_update/_defer are
> used under significantly long lazy mmu sections (long as in number of
> updates). Is it worthwhile to bother (and risk) batching kernel pte updates?
>
Well, that depends on how expensive each update is. For something like
kunmap_atomic, I think combining the clear+tlb flush probably is worthwhile.
> Until someone forgets about arch_flush_lazy_mmu_mode again...
>
It has been surprisingly unproblematic, though this CPA issue came to
light. But given that there are only a few "correct" ways to update the
kernel mappings now (kmap/vmap/vmalloc, kmap_atomic and cpa, I think),
it should be easy to cover all the bases. (Hm, better check vmap.)
J
* Re: KVM: guest: only batch user pte updates
2009-02-10 23:14 ` Jeremy Fitzhardinge
@ 2009-02-11 11:56 ` Avi Kivity
2009-02-11 16:57 ` Jeremy Fitzhardinge
From: Avi Kivity @ 2009-02-11 11:56 UTC (permalink / raw)
To: Jeremy Fitzhardinge
Cc: Marcelo Tosatti, kvm-devel, Rusty Russell, Zachary Amsden
Jeremy Fitzhardinge wrote:
>>
>> Yes. It seems however that only set_pte_at/pte_update/_defer are
>> used under significantly long lazy mmu sections (long as in number of
>> updates). Is it worthwhile to bother (and risk) batching kernel pte
>> updates?
>>
>
> Well, that depends on how expensive each update is. For something
> like kunmap_atomic, I think combining the clear+tlb flush probably is
> worthwhile.
I agree, kmap_atomic() is fairly common.
--
error compiling committee.c: too many arguments to function
* Re: KVM: guest: only batch user pte updates
2009-02-11 11:56 ` Avi Kivity
@ 2009-02-11 16:57 ` Jeremy Fitzhardinge
From: Jeremy Fitzhardinge @ 2009-02-11 16:57 UTC (permalink / raw)
To: Avi Kivity; +Cc: Marcelo Tosatti, kvm-devel, Rusty Russell, Zachary Amsden
Avi Kivity wrote:
> Jeremy Fitzhardinge wrote:
>>>
>>> Yes. It seems however that only set_pte_at/pte_update/_defer are
>>> used under significantly long lazy mmu sections (long as in number of
>>> updates). Is it worthwhile to bother (and risk) batching kernel pte
>>> updates?
>>>
>>
>> Well, that depends on how expensive each update is. For something
>> like kunmap_atomic, I think combining the clear+tlb flush probably is
>> worthwhile.
>
> I agree, kmap_atomic() is fairly common.
(Not that we're actually batching it at present; we need to work out
proper semantics for nesting batches...)
J