From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tomasz Wroblewski Subject: Re: [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present Date: Tue, 27 Aug 2013 09:23:21 +0200 Message-ID: <521C53E9.9070003@citrix.com> References: <1377511404-3365-1-git-send-email-tomasz.wroblewski@citrix.com> <521B543902000078000EE55D@nat28.tlf.novell.com> <521B48FF.1040904@citrix.com> <521B6D7302000078000EE64E@nat28.tlf.novell.com> <521B89AC.1040509@citrix.com> <521C6DBC02000078000EE9D6@nat28.tlf.novell.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1VEDdN-0000CX-2P for xen-devel@lists.xenproject.org; Tue, 27 Aug 2013 07:24:13 +0000 In-Reply-To: <521C6DBC02000078000EE9D6@nat28.tlf.novell.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Jan Beulich Cc: Andrew Cooper , dgdegra@tycho.nsa.gov, xen-devel@lists.xenproject.org List-Id: xen-devel@lists.xenproject.org On 08/27/2013 09:13 AM, Jan Beulich wrote: >>>> On 26.08.13 at 19:00, Tomasz Wroblewski wrote: >> I've verified that replacing security_load_policy call >> completely with the following allocation/deallocaiton is enough to cause >> this crash: >> >> //ret = security_load_policy(policy_buffer, policy_size); >> { >> void ** p = xmalloc_array(void*, 768); >> xfree(p); >> } >> >> Note that this allocation succeeds, and also if you would not call xfree >> (which is not called if say a policy was succesfully loaded), there is >> no crash. So yeah my original patch accidentaly fixes it by just >> avoiding the alloc/free completely. > But I then understand, together with the below, that the crash isn't > down the xfree() path, ... > >> The shaky manually constructed call graph for the assertion failure: >> >> setup.c: init_idle_domain >> schedule.c: scheduler_init >> domain.c: domain_create >> domain.c: alloc_domain_struct >> domain.c: alloc_xenheap_pages >> .. >> page_alloc.c: alloc_heap_pages >> flushtlb.h: flush_tlb_mask >> flushtlb.h: flush_mask >> smp.c: flush_area_mask - hits ASSERT because interrupts are disabled here > ... but instead is on a _subsequent_ allocation. Hence the prior > free gets a heap page into a state that makes in non-suitable for > re-use. But you certainly noticed that free_heap_pages() sets a > page's u.free.need_tlbflush only if the page had an owner, > which shouldn't be the case for Xen-internal allocations. > > With that, I think I can see where the bug really is: The owner > field (v.inuse._domain) is in a union with the order field re-used > by the xmalloc() implementation for whole page allocations. The > fix therefore ought to be as simple as the patch below. Just tested that fix, works! Thanks > Jan > > --- a/xen/common/xmalloc_tlsf.c > +++ a/xen/common/xmalloc_tlsf.c > @@ -629,6 +629,7 @@ void xfree(void *p) > unsigned int i, order = get_order_from_pages(size); > > BUG_ON((unsigned long)p& ((PAGE_SIZE<< order) - 1)); > + PFN_ORDER(virt_to_page(p)) = 0; > for ( i = 0; ; ++i ) > { > if ( !(size& (1<< i)) ) > >