From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Jan Beulich" Subject: Re: [PATCH 5/5] just realized that it's broken Date: Wed, 04 Feb 2009 08:27:24 +0000 Message-ID: <49895F7C.76E4.0078.0@novell.com> References: <4982E8F4.76E4.0078.0@novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <4982E8F4.76E4.0078.0@novell.com> Content-Disposition: inline List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org >>> "Jan Beulich" 30.01.09 11:48 >>> >This patch was moving the cpumask out of struct page_info into the page >itself, but I just realized that this it not valid, since the purpose of = the mask >is to avoid a later TLB flush, hence the base assumption is that the page >might still be in the TLB of some CPU, thus making it possible for a mis- >behaved guest to still write to that page. > >Sorry for the mis-numbering, >Jan Actually, after some more consideration I think the patch is actually = correct (minus an issue that already has been in place for a while anyway): All access a domain may continue have to a page it's freeing (through stale TLB entries) is read/execute. This is because when the type count of a page (i.e. in particular a writeable one) a (domain-)global TLB flush is performed anyway. Allowing the freeing domain (as well as the one subsequently allocating) to read the cpumask does not present a problem in my opinion. The issue mentioned above that I think current code has even without that patch (though I may easily have missed some aspect here) is that there is no enforcement of an immediate TLB flush when a writeable page's type count drops to zero - instead the flush is deferred until the end of the current operation or batch. With the removal of the use of the per-domain lock in these code paths it is no longer valid to defer the flush this much - it must be carried out before the page in question gets unlocked. And I would think that invalidate_shadow_ldt() has a similar issue, plus it seems questionable that it does a local flush when acting on the = current vCPU, but a global one for all others. Jan ******************************************************* Move the (potentially large) cpumask field of free pages into the page itself, thus making sizeof(struct page_info) independent of the configured number of CPUs, which avoids penalizing systems with few CPUs just because the hypervisor is able to support many. Signed-off-by: Jan Beulich --- 2009-01-27.orig/xen/common/page_alloc.c 2009-01-29 15:27:38.0000000= 00 +0100 +++ 2009-01-27/xen/common/page_alloc.c 2009-01-29 15:33:09.000000000 = +0100 @@ -376,7 +376,7 @@ static struct page_info *alloc_heap_page BUG_ON(pg[i].count_info !=3D 0); =20 /* Add in any extra CPUs that need flushing because of this page. = */ - cpus_andnot(extra_cpus_mask, pg[i].u.free.cpumask, mask); + cpus_andnot(extra_cpus_mask, PFN_CPUMASK(&pg[i]), mask); tlbflush_filter(extra_cpus_mask, pg[i].tlbflush_timestamp); cpus_or(mask, mask, extra_cpus_mask); =20 @@ -425,11 +425,11 @@ static void free_heap_pages( if ( (d =3D page_get_owner(&pg[i])) !=3D NULL ) { pg[i].tlbflush_timestamp =3D tlbflush_current_time(); - pg[i].u.free.cpumask =3D d->domain_dirty_cpumask; + PFN_CPUMASK(&pg[i]) =3D d->domain_dirty_cpumask; } else { - cpus_clear(pg[i].u.free.cpumask); + cpus_clear(PFN_CPUMASK(&pg[i])); } } =20 --- 2009-01-27.orig/xen/include/asm-ia64/mm.h 2009-01-30 08:26:33.0000000= 00 +0100 +++ 2009-01-27/xen/include/asm-ia64/mm.h 2009-01-29 14:24:42.0000000= 00 +0100 @@ -35,8 +35,11 @@ typedef unsigned long page_flags_t; * Every architecture must ensure the following: * 1. 'struct page_info' contains a 'struct list_head list'. * 2. Provide a PFN_ORDER() macro for accessing the order of a free = page. + * 3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-ta= inted + * TLBs. */ -#define PFN_ORDER(_pfn) ((_pfn)->u.free.order) +#define PFN_ORDER(_pfn) ((_pfn)->u.free.order) +#define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask) =20 #define PRtype_info "016lx" =20 --- 2009-01-27.orig/xen/include/asm-x86/mm.h 2009-01-29 14:03:37.0000000= 00 +0100 +++ 2009-01-27/xen/include/asm-x86/mm.h 2009-01-30 08:28:27.000000000 = +0100 @@ -14,8 +14,18 @@ * Every architecture must ensure the following: * 1. 'struct page_info' contains a 'struct page_list_entry list'. * 2. Provide a PFN_ORDER() macro for accessing the order of a free = page. + * 3. Provide a PFN_CPUMASK() macro for accessing the mask of possibly-ta= inted + * TLBs. */ #define PFN_ORDER(_pfn) ((_pfn)->v.free.order) +#ifdef __i386__ +# define PFN_CPUMASK(_pfn) ((_pfn)->u.free.cpumask) +#else +# define PFN_CPUMASK(_pfn) (*(cpumask_t *)page_to_virt(_pfn)) +# if NR_CPUS > PAGE_SIZE * 8 +# error Cannot handle this many CPUs. +# endif +#endif =20 /* * This definition is solely for the use in struct page_info (and @@ -72,8 +82,10 @@ struct page_info =20 /* Page is on a free list: ((count_info & PGC_count_mask) =3D=3D = 0). */ struct { +#ifdef __i386__ /* Mask of possibly-tainted TLBs. */ cpumask_t cpumask; +#endif } free; =20 } u;