Re: [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Tomasz Wroblewski <tomasz.wroblewski@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: xen-devel@lists.xenproject.org, dgdegra@tycho.nsa.gov,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present
Date: Mon, 26 Aug 2013 19:00:28 +0200	[thread overview]
Message-ID: <521B89AC.1040509@citrix.com> (raw)
In-Reply-To: <521B6D7302000078000EE64E@nat28.tlf.novell.com>

On 08/26/2013 03:00 PM, Jan Beulich wrote:
>>>> On 26.08.13 at 14:24, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>  wrote:
>> On 08/26/2013 01:12 PM, Jan Beulich wrote:
>>>>>> On 26.08.13 at 12:03, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>   wrote:
>>>> Xen crashes on boot of xsm/flask enabled builds, if policy module is not
>>>> specified.
>>>> This seems to have worked on 4.1 at least.
>>> Looking at the code (4.1.5) I can't see what would prevent the
>>> same NULL pointer deref. Care to explain?
>> The crash doesn't happen at the NULL pointer dereference site though,
> Then does it deref the NULL pointer, or does it not? If it does and
> merely doesn't crash because something happens to be mapped
> there, that's still a bug.
So after bit more tracing it looks like the issue is not a null pointer 
deref (it is avoided as Daniel pointed out), but rather some 
xmalloc/xfree pairs which happen inside security_load_policy even if 
pointer is null.

security_load_policy calls policydb_init, then tries to read the policy 
which fails since length of is 0, so it calls policydb_destroy. Inside 
these functions there is some hashtable construction/destruction 
happening, and a reasonably sizable ones too (there are 7 or so 
hashtables, biggest one uses an array of 512 pointers, they total to use 
768 pointers). I've verified that replacing security_load_policy call 
completely with the following allocation/deallocaiton is enough to cause 
this crash:

     //ret = security_load_policy(policy_buffer, policy_size);
     {
             void ** p = xmalloc_array(void*, 768);
             xfree(p);
     }

Note that this allocation succeeds, and also if you would not call xfree 
(which is not called if say a policy was succesfully loaded), there is 
no crash. So yeah my original patch accidentaly fixes it by just 
avoiding the alloc/free completely.

The shaky manually constructed call graph for the assertion failure:

setup.c: init_idle_domain
schedule.c: scheduler_init
domain.c: domain_create
domain.c: alloc_domain_struct
domain.c: alloc_xenheap_pages
..
page_alloc.c: alloc_heap_pages
flushtlb.h: flush_tlb_mask
flushtlb.h: flush_mask
smp.c: flush_area_mask - hits ASSERT because interrupts are disabled here

I apparently can't get a real stacktrace because adding 
dump_execution_state in flush_area_mask just causes the "Unknown 
interrupt" error, similarily to what hitting the ASSERT fail does. I 
printed the assert condition manually to verify it tho and interrupts 
are disabled there so its bound to fail.

So it looks like the flush_tlb is called too early (with interrupts 
disabled) due to large deallocations in the policy loader code?

On 08/26/2013 03:00 PM, Jan Beulich wrote:
>>>> On 26.08.13 at 14:24, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>  wrote:
>> On 08/26/2013 01:12 PM, Jan Beulich wrote:
>>>>>> On 26.08.13 at 12:03, Tomasz Wroblewski<tomasz.wroblewski@citrix.com>   wrote:
>>>> Xen crashes on boot of xsm/flask enabled builds, if policy module is not
>>>> specified.
>>>> This seems to have worked on 4.1 at least.
>>> Looking at the code (4.1.5) I can't see what would prevent the
>>> same NULL pointer deref. Care to explain?
>> The crash doesn't happen at the NULL pointer dereference site though,

>> but a bit later, when xen tries to flush tlbs for first time I believe,
>> which happens during page allocation for the initial domain structure. I
>> traced it to the following ASSERT in smp.c (so yes I should add this
>> particular crash likely is limited to debug builds then)
>>
>> void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int
>> flags)
>> {
>>       ASSERT(local_irq_is_enabled());
>>       ...
>>
>> The actual crash message is unhelpful since it's basically only
>>
>> ...
>> (XEN) Using scheduler: SMP Credit Scheduler (credit)
>> (XEN) Unknown interrupt (cr2=0000000000000000)
>>
>>
>> Either removing the assert (which is obviously bad), or checking for the
> The assertion is in no way bad. It's the too early use of the
> function that is the problem here.
>
>> null pointer deref as in the submitted patch seems to be fixing it. I'm
>> suspecting it was always broken somehow but just was hidden or had
>> different side effects on 4.1 than it does now. I do lack for a good
>> explanation why fiddling with null addresses breaks up this assert, though.
> Also, you didn't show the call trace that made things get here (yes,
> you may need to construct this manually). I'm in no way convinced
> that there's a NULL pointer involved here at all - the fact that CR2
> is zero doesn't mean a page fault occurred in the first place.
>
> Jan
>

next prev parent reply	other threads:[~2013-08-26 17:01 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-26 10:03 [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present Tomasz Wroblewski
2013-08-26 10:52 ` Andrew Cooper
2013-08-26 13:27   ` Daniel De Graaf
2013-08-26 13:32     ` Tomasz Wroblewski
2013-08-26 11:12 ` Jan Beulich
2013-08-26 12:24   ` Tomasz Wroblewski
2013-08-26 12:41     ` Andrew Cooper
2013-08-26 13:00     ` Jan Beulich
2013-08-26 13:34       ` Tomasz Wroblewski
2013-08-26 17:00       ` Tomasz Wroblewski [this message]
2013-08-27  7:13         ` Jan Beulich
2013-08-27  7:23           ` Tomasz Wroblewski
2013-08-27  7:47             ` [PATCH] xmalloc: make whole pages xfree() clear the order field (ab)used by xmalloc() Jan Beulich
2013-09-09 11:14               ` Keir Fraser
2013-08-27  8:50         ` [PATCH] Fix boot crash on xsm/flask enabled builds when no policy module is present Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=521B89AC.1040509@citrix.com \
    --to=tomasz.wroblewski@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=dgdegra@tycho.nsa.gov \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.