* [PATCH 5/5] just realized that it's broken
@ 2009-01-30 10:48 Jan Beulich
2009-01-30 11:35 ` Keir Fraser
2009-02-04 8:27 ` Jan Beulich
0 siblings, 2 replies; 4+ messages in thread
From: Jan Beulich @ 2009-01-30 10:48 UTC (permalink / raw)
To: xen-devel
This patch was moving the cpumask out of struct page_info into the page
itself, but I just realized that this it not valid, since the purpose of the mask
is to avoid a later TLB flush, hence the base assumption is that the page
might still be in the TLB of some CPU, thus making it possible for a mis-
behaved guest to still write to that page.
Sorry for the mis-numbering,
Jan
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 5/5] just realized that it's broken
2009-01-30 10:48 [PATCH 5/5] just realized that it's broken Jan Beulich
@ 2009-01-30 11:35 ` Keir Fraser
2009-02-04 8:27 ` Jan Beulich
1 sibling, 0 replies; 4+ messages in thread
From: Keir Fraser @ 2009-01-30 11:35 UTC (permalink / raw)
To: Jan Beulich, xen-devel
Fortunately I think it can be got rid of and probably at little or no
performance cost. I will take a look maybe next week.
-- Keir
On 30/01/2009 10:48, "Jan Beulich" <jbeulich@novell.com> wrote:
> This patch was moving the cpumask out of struct page_info into the page
> itself, but I just realized that this it not valid, since the purpose of the
> mask
> is to avoid a later TLB flush, hence the base assumption is that the page
> might still be in the TLB of some CPU, thus making it possible for a mis-
> behaved guest to still write to that page.
>
> Sorry for the mis-numbering,
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 5/5] just realized that it's broken
2009-01-30 10:48 [PATCH 5/5] just realized that it's broken Jan Beulich
2009-01-30 11:35 ` Keir Fraser
@ 2009-02-04 8:27 ` Jan Beulich
2009-02-04 8:49 ` Keir Fraser
1 sibling, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2009-02-04 8:27 UTC (permalink / raw)
To: xen-devel
>>> "Jan Beulich" <jbeulich@novell.com> 30.01.09 11:48 >>>
>This patch was moving the cpumask out of struct page_info into the page
>itself, but I just realized that this it not valid, since the purpose of the mask
>is to avoid a later TLB flush, hence the base assumption is that the page
>might still be in the TLB of some CPU, thus making it possible for a mis-
>behaved guest to still write to that page.
>
>Sorry for the mis-numbering,
>Jan
Actually, after some more consideration I think the patch is actually correct
(minus an issue that already has been in place for a while anyway): All
access a domain may continue have to a page it's freeing (through stale
TLB entries) is read/execute. This is because when the type count of a
page (i.e. in particular a writeable one) a (domain-)global TLB flush is
performed anyway. Allowing the freeing domain (as well as the one
subsequently allocating) to read the cpumask does not present a problem
in my opinion.
The issue mentioned above that I think current code has even without
that patch (though I may easily have missed some aspect here) is that
there is no enforcement of an immediate TLB flush when a writeable
page's type count drops to zero - instead the flush is deferred until the
end of the current operation or batch. With the removal of the use of
the per-domain lock in these code paths it is no longer valid to defer
the flush this much - it must be carried out before the page in question
gets unlocked.
And I would think that invalidate_shadow_ldt() has a similar issue, plus
it seems questionable that it does a local flush when acting on the current
vCPU, but a global one for all others.
Jan
*******************************************************
Move the (potentially large) cpumask field of free pages into the page
itself, thus making sizeof(struct page_info) independent of the
configured number of CPUs, which avoids penalizing systems with few
CPUs just because the hypervisor is able to support many.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
--- 2009-01-27.orig/xen/common/page_alloc.c 2009-01-29 15:27:38.000000000 +0100
+++ 2009-01-27/xen/common/page_alloc.c 2009-01-29 15:33:09.000000000 +0100
@@ -376,7 +376,7 @@ static struct page_info *alloc_heap_page
BUG_ON(pg[i].count_info != 0);
/* Add in any extra CPUs that need flushing because of this page. */
- cpus_andnot(extra_cpus_mask, pg[i].u.free.cpumask, mask);
+ cpus_andnot(extra_cpus_mask, PFN_CPUMASK(&pg[i]), mask);
tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
cpus_or(mask, mask, extra_cpus_mask);
@@ -425,11 +425,11 @@ static void free_heap_pages(
if ( (d = page_get_owner(&pg[i])) != NULL )
{
pg[i].tlbflush_timestamp = tlbflush_current_time();
- pg[i].u.free.cpumask = d->domain_dirty_cpumask;
+ PFN_CPUMASK(&pg[i]) = d->domain_dirty_cpumask;
}
else
{
- cpus_clear(pg[i].u.free.cpumask);
+ cpus_clear(PFN_CPUMASK(&pg[i]));
}
}
--- 2009-01-27.orig/xen/include/asm-ia64/mm.h 2009-01-30 08:26:33.000000000 +0100
+++ 2009-01-27/xen/include/asm-ia64/mm.h 2009-01-29 14:24:42.000000000 +0100
@@ -35,8 +35,11 @@ typedef unsigned long page_flags_t;
* Every architecture must ensure the following:
* 1. 'struct page_info' contains a 'struct list_head list'.
* 2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ * 3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ * TLBs.
*/
-#define PFN_ORDER(_pfn) ((_pfn)->u.free.order)
+#define PFN_ORDER(_pfn) ((_pfn)->u.free.order)
+#define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
#define PRtype_info "016lx"
--- 2009-01-27.orig/xen/include/asm-x86/mm.h 2009-01-29 14:03:37.000000000 +0100
+++ 2009-01-27/xen/include/asm-x86/mm.h 2009-01-30 08:28:27.000000000 +0100
@@ -14,8 +14,18 @@
* Every architecture must ensure the following:
* 1. 'struct page_info' contains a 'struct page_list_entry list'.
* 2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ * 3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ * TLBs.
*/
#define PFN_ORDER(_pfn) ((_pfn)->v.free.order)
+#ifdef __i386__
+# define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
+#else
+# define PFN_CPUMASK(_pfn) (*(cpumask_t *)page_to_virt(_pfn))
+# if NR_CPUS > PAGE_SIZE * 8
+# error Cannot handle this many CPUs.
+# endif
+#endif
/*
* This definition is solely for the use in struct page_info (and
@@ -72,8 +82,10 @@ struct page_info
/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
struct {
+#ifdef __i386__
/* Mask of possibly-tainted TLBs. */
cpumask_t cpumask;
+#endif
} free;
} u;
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 5/5] just realized that it's broken
2009-02-04 8:27 ` Jan Beulich
@ 2009-02-04 8:49 ` Keir Fraser
0 siblings, 0 replies; 4+ messages in thread
From: Keir Fraser @ 2009-02-04 8:49 UTC (permalink / raw)
To: Jan Beulich, xen-devel
On 04/02/2009 08:27, "Jan Beulich" <jbeulich@novell.com> wrote:
> The issue mentioned above that I think current code has even without
> that patch (though I may easily have missed some aspect here) is that
> there is no enforcement of an immediate TLB flush when a writeable
> page's type count drops to zero - instead the flush is deferred until the
> end of the current operation or batch. With the removal of the use of
> the per-domain lock in these code paths it is no longer valid to defer
> the flush this much - it must be carried out before the page in question
> gets unlocked.
Oh, good point. We don't need the synchronous flush in free_page_type() any
more as we do not rely on correctness of the linear map any more (we use it,
but then verify sufficient correctness via get_page()/lock_page(), so now a
guest can only shoot itself in the foot).
I'll revert back to the lazy scheme we used to use. Which of course means
cpumask-in-the-page will not be safe. I will kill it off by other means.
For invalidate_shadow_ldt(), the LDT mappings are per-VCPU so only the local
CPU needs flushing. The reason for flush_tlb_mask() is I think a
conservative attempt at flushing the correct TLB if we are not the VCPU in
question. I would think it is better to use v->vcpu_dirty_cpumask, which
maybe didn't exist when i_s_l() was last looked at. I shall take a look.
-- Keir
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2009-02-04 8:49 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-30 10:48 [PATCH 5/5] just realized that it's broken Jan Beulich
2009-01-30 11:35 ` Keir Fraser
2009-02-04 8:27 ` Jan Beulich
2009-02-04 8:49 ` Keir Fraser
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.