From: Andrew Cooper
Subject: Re: [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present
Date: Tue, 27 Aug 2013 09:50:35 +0100
Message-ID: <521C685B.6060707@citrix.com>
In-Reply-To: <521B89AC.1040509@citrix.com>
To: Tomasz Wroblewski
Cc: xen-devel@lists.xenproject.org, dgdegra@tycho.nsa.gov, Jan Beulich

On 26/08/2013 18:00, Tomasz Wroblewski wrote:
>
> The shaky manually constructed call graph for the assertion failure:
>
> setup.c: init_idle_domain
> schedule.c: scheduler_init
> domain.c: domain_create
> domain.c: alloc_domain_struct
> domain.c: alloc_xenheap_pages
> ..
> page_alloc.c: alloc_heap_pages
> flushtlb.h: flush_tlb_mask
> flushtlb.h: flush_mask
> smp.c: flush_area_mask - hits ASSERT because interrupts are disabled here
>
> I apparently can't get a real stack trace, because adding
> dump_execution_state in flush_area_mask just causes the "Unknown
> interrupt" error, similarly to what hitting the ASSERT failure does. I
> printed the assert condition manually to verify it, though, and interrupts
> are disabled there, so it's bound to fail.
>

Just for reference here, as I went digging in the code.
dump_execution_state() uses run_in_exception_handler(), which executes a ud2 instruction in order to get its hands on an exception frame from which to dump the state. The "Unknown interrupt" message means that the CPU is still running on the early boot IDT, set up in xen/arch/x86/boot/x86_64.S, which blanket-ignores all interrupts, exceptions included. The #UD raised by ud2 therefore never reaches the handler that would perform the dump.

We should probably look into setting up and using the arch traps earlier in boot. Failing that, an early_invalid_opcode() handler could at least hint that an early bug may have been hit, and give some details.

~Andrew
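To make the failure mode concrete, here is a toy user-space model in C. It is not Xen code: `idt`, `ignore_int`, `setup_early_idt`, `deliver` and `struct frame` are all hypothetical names, and `early_invalid_opcode` is only the name floated above. The model treats the IDT as a table of handlers indexed by vector, with a "frame" reduced to a vector number and an instruction pointer. It shows why a ud2 (#UD, vector 6) issued during early boot only yields "Unknown interrupt" when every vector points at one catch-all, and how a dedicated early #UD entry could at least report where it happened.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not Xen code: an IDT is a table of handlers
 * indexed by vector number. */
#define NR_VECTORS 256
#define VEC_UD     6    /* #UD (invalid opcode), which ud2 raises on x86 */

/* Stand-in for the state a real exception frame would carry. */
struct frame {
    unsigned int vector;
    unsigned long rip;
};

typedef const char *(*handler_t)(const struct frame *f);

static handler_t idt[NR_VECTORS];

/* Model of the early boot IDT: every vector, exceptions included,
 * lands in the same catch-all, which can only say "Unknown interrupt". */
static const char *ignore_int(const struct frame *f)
{
    (void)f;
    return "Unknown interrupt";
}

/* Hypothetical early #UD handler: it does not try to run the deferred
 * dump, it only reports that an early bug was probably hit, and where.
 * Uses a static buffer, so it is not reentrant (fine for a toy model). */
static const char *early_invalid_opcode(const struct frame *f)
{
    static char msg[96];
    snprintf(msg, sizeof(msg),
             "Early #UD at %#lx: probably an early BUG/ASSERT", f->rip);
    return msg;
}

/* Point every vector at the catch-all; optionally give #UD its own entry. */
static void setup_early_idt(int with_ud_hint)
{
    for (unsigned int v = 0; v < NR_VECTORS; v++)
        idt[v] = ignore_int;
    if (with_ud_hint)
        idt[VEC_UD] = early_invalid_opcode;
}

/* Deliver a vector through the current table, as the CPU would. */
static const char *deliver(unsigned int vector, unsigned long rip)
{
    assert(vector < NR_VECTORS);
    struct frame f = { vector, rip };
    return idt[vector](&f);
}
```

With `setup_early_idt(0)`, `deliver(VEC_UD, 0x1000)` returns the same "Unknown interrupt" as any other vector; with `setup_early_idt(1)` only the #UD entry changes, so every other vector still hits the catch-all. How such an entry would actually be wired into the early IDT built in xen/arch/x86/boot/x86_64.S is not modelled here.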