All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 5/5] just realized that it's broken
@ 2009-01-30 10:48 Jan Beulich
  2009-01-30 11:35 ` Keir Fraser
  2009-02-04  8:27 ` Jan Beulich
  0 siblings, 2 replies; 4+ messages in thread
From: Jan Beulich @ 2009-01-30 10:48 UTC (permalink / raw)
  To: xen-devel

This patch was moving the cpumask out of struct page_info into the page
itself, but I just realized that this it not valid, since the purpose of the mask
is to avoid a later TLB flush, hence the base assumption is that the page
might still be in the TLB of some CPU, thus making it possible for a mis-
behaved guest to still write to that page.

Sorry for the mis-numbering,
Jan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 5/5] just realized that it's broken
  2009-01-30 10:48 [PATCH 5/5] just realized that it's broken Jan Beulich
@ 2009-01-30 11:35 ` Keir Fraser
  2009-02-04  8:27 ` Jan Beulich
  1 sibling, 0 replies; 4+ messages in thread
From: Keir Fraser @ 2009-01-30 11:35 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

Fortunately I think it can be got rid of and probably at little or no
performance cost. I will take a look maybe next week.

 -- Keir

On 30/01/2009 10:48, "Jan Beulich" <jbeulich@novell.com> wrote:

> This patch was moving the cpumask out of struct page_info into the page
> itself, but I just realized that this it not valid, since the purpose of the
> mask
> is to avoid a later TLB flush, hence the base assumption is that the page
> might still be in the TLB of some CPU, thus making it possible for a mis-
> behaved guest to still write to that page.
> 
> Sorry for the mis-numbering,
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 5/5] just realized that it's broken
  2009-01-30 10:48 [PATCH 5/5] just realized that it's broken Jan Beulich
  2009-01-30 11:35 ` Keir Fraser
@ 2009-02-04  8:27 ` Jan Beulich
  2009-02-04  8:49   ` Keir Fraser
  1 sibling, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2009-02-04  8:27 UTC (permalink / raw)
  To: xen-devel

>>> "Jan Beulich" <jbeulich@novell.com> 30.01.09 11:48 >>>
>This patch was moving the cpumask out of struct page_info into the page
>itself, but I just realized that this it not valid, since the purpose of the mask
>is to avoid a later TLB flush, hence the base assumption is that the page
>might still be in the TLB of some CPU, thus making it possible for a mis-
>behaved guest to still write to that page.
>
>Sorry for the mis-numbering,
>Jan

Actually, after some more consideration I think the patch is actually correct
(minus an issue that already has been in place for a while anyway): All
access a domain may continue have to a page it's freeing (through stale
TLB entries) is read/execute. This is because when the type count of a
page (i.e. in particular a writeable one) a (domain-)global TLB flush is
performed anyway. Allowing the freeing domain (as well as the one
subsequently allocating) to read the cpumask does not present a problem
in my opinion.

The issue mentioned above that I think current code has even without
that patch (though I may easily have missed some aspect here) is that
there is no enforcement of an immediate TLB flush when a writeable
page's type count drops to zero - instead the flush is deferred until the
end of the current operation or batch. With the removal of the use of
the per-domain lock in these code paths it is no longer valid to defer
the flush this much - it must be carried out before the page in question
gets unlocked.

And I would think that invalidate_shadow_ldt() has a similar issue, plus
it seems questionable that it does a local flush when acting on the current
vCPU, but a global one for all others.

Jan

*******************************************************

Move the (potentially large) cpumask field of free pages into the page
itself, thus making sizeof(struct page_info) independent of the
configured number of CPUs, which avoids penalizing systems with few
CPUs just because the hypervisor is able to support many.

Signed-off-by: Jan Beulich <jbeulich@novell.com>

--- 2009-01-27.orig/xen/common/page_alloc.c	2009-01-29 15:27:38.000000000 +0100
+++ 2009-01-27/xen/common/page_alloc.c	2009-01-29 15:33:09.000000000 +0100
@@ -376,7 +376,7 @@ static struct page_info *alloc_heap_page
         BUG_ON(pg[i].count_info != 0);
 
         /* Add in any extra CPUs that need flushing because of this page. */
-        cpus_andnot(extra_cpus_mask, pg[i].u.free.cpumask, mask);
+        cpus_andnot(extra_cpus_mask, PFN_CPUMASK(&pg[i]), mask);
         tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp);
         cpus_or(mask, mask, extra_cpus_mask);
 
@@ -425,11 +425,11 @@ static void free_heap_pages(
         if ( (d = page_get_owner(&pg[i])) != NULL )
         {
             pg[i].tlbflush_timestamp = tlbflush_current_time();
-            pg[i].u.free.cpumask     = d->domain_dirty_cpumask;
+            PFN_CPUMASK(&pg[i])      = d->domain_dirty_cpumask;
         }
         else
         {
-            cpus_clear(pg[i].u.free.cpumask);
+            cpus_clear(PFN_CPUMASK(&pg[i]));
         }
     }
 
--- 2009-01-27.orig/xen/include/asm-ia64/mm.h	2009-01-30 08:26:33.000000000 +0100
+++ 2009-01-27/xen/include/asm-ia64/mm.h	2009-01-29 14:24:42.000000000 +0100
@@ -35,8 +35,11 @@ typedef unsigned long page_flags_t;
  * Every architecture must ensure the following:
  *  1. 'struct page_info' contains a 'struct list_head list'.
  *  2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ *  3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ *     TLBs.
  */
-#define PFN_ORDER(_pfn)	((_pfn)->u.free.order)
+#define PFN_ORDER(_pfn)   ((_pfn)->u.free.order)
+#define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
 
 #define PRtype_info "016lx"
 
--- 2009-01-27.orig/xen/include/asm-x86/mm.h	2009-01-29 14:03:37.000000000 +0100
+++ 2009-01-27/xen/include/asm-x86/mm.h	2009-01-30 08:28:27.000000000 +0100
@@ -14,8 +14,18 @@
  * Every architecture must ensure the following:
  *  1. 'struct page_info' contains a 'struct page_list_entry list'.
  *  2. Provide a PFN_ORDER() macro for accessing the order of a free page.
+ *  3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-tainted
+ *     TLBs.
  */
 #define PFN_ORDER(_pfn) ((_pfn)->v.free.order)
+#ifdef __i386__
+# define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask)
+#else
+# define PFN_CPUMASK(_pfn) (*(cpumask_t *)page_to_virt(_pfn))
+# if NR_CPUS > PAGE_SIZE * 8
+#  error Cannot handle this many CPUs.
+# endif
+#endif
 
 /*
  * This definition is solely for the use in struct page_info (and
@@ -72,8 +82,10 @@ struct page_info
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+#ifdef __i386__
             /* Mask of possibly-tainted TLBs. */
             cpumask_t cpumask;
+#endif
         } free;
 
     } u;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 5/5] just realized that it's broken
  2009-02-04  8:27 ` Jan Beulich
@ 2009-02-04  8:49   ` Keir Fraser
  0 siblings, 0 replies; 4+ messages in thread
From: Keir Fraser @ 2009-02-04  8:49 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 04/02/2009 08:27, "Jan Beulich" <jbeulich@novell.com> wrote:

> The issue mentioned above that I think current code has even without
> that patch (though I may easily have missed some aspect here) is that
> there is no enforcement of an immediate TLB flush when a writeable
> page's type count drops to zero - instead the flush is deferred until the
> end of the current operation or batch. With the removal of the use of
> the per-domain lock in these code paths it is no longer valid to defer
> the flush this much - it must be carried out before the page in question
> gets unlocked.

Oh, good point. We don't need the synchronous flush in free_page_type() any
more as we do not rely on correctness of the linear map any more (we use it,
but then verify sufficient correctness via get_page()/lock_page(), so now a
guest can only shoot itself in the foot).

I'll revert back to the lazy scheme we used to use. Which of course means
cpumask-in-the-page will not be safe. I will kill it off by other means.

For invalidate_shadow_ldt(), the LDT mappings are per-VCPU so only the local
CPU needs flushing. The reason for flush_tlb_mask() is I think a
conservative attempt at flushing the correct TLB if we are not the VCPU in
question. I would think it is better to use v->vcpu_dirty_cpumask, which
maybe didn't exist when i_s_l() was last looked at. I shall take a look.

 -- Keir

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-02-04  8:49 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-30 10:48 [PATCH 5/5] just realized that it's broken Jan Beulich
2009-01-30 11:35 ` Keir Fraser
2009-02-04  8:27 ` Jan Beulich
2009-02-04  8:49   ` Keir Fraser

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.