Re: [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xenproject.org, dgdegra@tycho.nsa.gov,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present
Date: Mon, 26 Aug 2013 19:00:28 +0200	[thread overview]
Message-ID: <521B89AC.1040509@citrix.com> (raw)
In-Reply-To: <521B6D7302000078000EE64E@nat28.tlf.novell.com>

On 08/26/2013 03:00 PM, Jan Beulich wrote:
>>>> On 26.08.13 at 14:24, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>  wrote:
>> On 08/26/2013 01:12 PM, Jan Beulich wrote:
>>>>>> On 26.08.13 at 12:03, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>   wrote:
>>>> Xen crashes on boot of xsm/flask enabled builds, if policy module is not
>>>> specified.
>>>> This seems to have worked on 4.1 at least.
>>> Looking at the code (4.1.5) I can't see what would prevent the
>>> same NULL pointer deref. Care to explain?
>> The crash doesn't happen at the NULL pointer dereference site though,
> Then does it deref the NULL pointer, or does it not? If it does and
> merely doesn't crash because something happens to be mapped
> there, that's still a bug.
So after bit more tracing it looks like the issue is not a null pointer 
deref (it is avoided as Daniel pointed out), but rather some 
xmalloc/xfree pairs which happen inside security_load_policy even if 
pointer is null.

security_load_policy calls policydb_init, then tries to read the policy 
which fails since length of is 0, so it calls policydb_destroy. Inside 
these functions there is some hashtable construction/destruction 
happening, and a reasonably sizable ones too (there are 7 or so 
hashtables, biggest one uses an array of 512 pointers, they total to use 
768 pointers). I've verified that replacing security_load_policy call 
completely with the following allocation/deallocaiton is enough to cause 
this crash:

     //ret = security_load_policy(policy_buffer, policy_size);
     {
             void ** p = xmalloc_array(void*, 768);
             xfree(p);
     }

Note that this allocation succeeds, and also if you would not call xfree 
(which is not called if say a policy was succesfully loaded), there is 
no crash. So yeah my original patch accidentaly fixes it by just 
avoiding the alloc/free completely.

The shaky manually constructed call graph for the assertion failure:

setup.c: init_idle_domain
schedule.c: scheduler_init
domain.c: domain_create
domain.c: alloc_domain_struct
domain.c: alloc_xenheap_pages
..
page_alloc.c: alloc_heap_pages
flushtlb.h: flush_tlb_mask
flushtlb.h: flush_mask
smp.c: flush_area_mask - hits ASSERT because interrupts are disabled here

I apparently can't get a real stacktrace because adding 
dump_execution_state in flush_area_mask just causes the "Unknown 
interrupt" error, similarily to what hitting the ASSERT fail does. I 
printed the assert condition manually to verify it tho and interrupts 
are disabled there so its bound to fail.

So it looks like the flush_tlb is called too early (with interrupts 
disabled) due to large deallocations in the policy loader code?

On 08/26/2013 03:00 PM, Jan Beulich wrote:
>>>> On 26.08.13 at 14:24, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>  wrote:
>> On 08/26/2013 01:12 PM, Jan Beulich wrote:
>>>>>> On 26.08.13 at 12:03, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>   wrote:
>>>> Xen crashes on boot of xsm/flask enabled builds, if policy module is not
>>>> specified.
>>>> This seems to have worked on 4.1 at least.
>>> Looking at the code (4.1.5) I can't see what would prevent the
>>> same NULL pointer deref. Care to explain?
>> The crash doesn't happen at the NULL pointer dereference site though,

>> but a bit later, when xen tries to flush tlbs for first time I believe,
>> which happens during page allocation for the initial domain structure. I
>> traced it to the following ASSERT in smp.c (so yes I should add this
>> particular crash likely is limited to debug builds then)
>>
>> void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int
>> flags)
>> {
>>       ASSERT(local_irq_is_enabled());
>>       ...
>>
>> The actual crash message is unhelpful since it's basically only
>>
>> ...
>> (XEN) Using scheduler: SMP Credit Scheduler (credit)
>> (XEN) Unknown interrupt (cr2=0000000000000000)
>>
>>
>> Either removing the assert (which is obviously bad), or checking for the
> The assertion is in no way bad. It's the too early use of the
> function that is the problem here.
>
>> null pointer deref as in the submitted patch seems to be fixing it. I'm
>> suspecting it was always broken somehow but just was hidden or had
>> different side effects on 4.1 than it does now. I do lack for a good
>> explanation why fiddling with null addresses breaks up this assert, though.
> Also, you didn't show the call trace that made things get here (yes,
> you may need to construct this manually). I'm in no way convinced
> that there's a NULL pointer involved here at all - the fact that CR2
> is zero doesn't mean a page fault occurred in the first place.
>
> Jan
>

next prev parent reply	other threads:[~2013-08-26 17:01 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-26 10:03 [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present Tomasz Wroblewski
2013-08-26 10:52 ` Andrew Cooper
2013-08-26 13:27   ` Daniel De Graaf
2013-08-26 13:32     ` Tomasz Wroblewski
2013-08-26 11:12 ` Jan Beulich
2013-08-26 12:24   ` Tomasz Wroblewski
2013-08-26 12:41     ` Andrew Cooper
2013-08-26 13:00     ` Jan Beulich
2013-08-26 13:34       ` Tomasz Wroblewski
2013-08-26 17:00       ` Tomasz Wroblewski [this message]
2013-08-27  7:13         ` Jan Beulich
2013-08-27  7:23           ` Tomasz Wroblewski
2013-08-27  7:47             ` [PATCH] xmalloc: make whole pages xfree() clear the order field (ab)used by xmalloc() Jan Beulich
2013-09-09 11:14               ` Keir Fraser
2013-08-27  8:50         ` [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=521B89AC.1040509@citrix.com \
    --to=tomasz.wroblewski@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dgdegra@tycho.nsa.gov \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).