xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ross Lagerwall <ross.lagerwall@citrix.com>,
	xen-devel@lists.xen.org, Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Subject: Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
Date: Fri, 15 May 2015 18:41:09 +0100	[thread overview]
Message-ID: <55562FB5.2010502@citrix.com> (raw)
In-Reply-To: <1431706091-32288-1-git-send-email-ross.lagerwall@citrix.com>

On 15/05/15 17:08, Ross Lagerwall wrote:
> When an interrupt is received during an EFI runtime service call, Xen
> may call map_domain_page() while using the EFI runtime page tables.
> This fails because, although the EFI runtime page tables are a
> copy of the idle domain's page tables, current points at a different
> domain's vCPU.
>
> To fix this, return NULL from mapcache_current_vcpu() when using the EFI
> runtime page tables which is treated equivalently to running in an idle
> vCPU.
>
> This issue can be reproduced by repeatedly calling GetVariable() from
> dom0 while using VT-d, since VT-d frequently maps a page from interrupt
> context.
>
> Example call trace:
> [<ffff82d0801615dc>] __find_next_zero_bit+0x28/0x60
> [<ffff82d08016a10e>] map_domain_page+0x4c6/0x4eb
> [<ffff82d080156ae6>] map_vtd_domain_page+0xd/0xf
> [<ffff82d08015533a>] msi_msg_read_remap_rte+0xe3/0x1d8
> [<ffff82d08014e516>] iommu_read_msi_from_ire+0x31/0x34
> [<ffff82d08016ff6c>] set_msi_affinity+0x134/0x17a
> [<ffff82d0801737b5>] move_masked_irq+0x5c/0x98
> [<ffff82d080173816>] move_native_irq+0x25/0x36
> [<ffff82d08016ffcb>] ack_nonmaskable_msi_irq+0x19/0x20
> [<ffff82d08016ffdb>] ack_maskable_msi_irq+0x9/0x37
> [<ffff82d080173e8b>] do_IRQ+0x251/0x635
> [<ffff82d080234502>] common_interrupt+0x62/0x70
> [<00000000df7ed2be>] 00000000df7ed2be
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

This is actually a VT-d issue which I have fallen over before, and
didn't have time to fix back then.  I had a cascade crash from having to
use map_domain_page() to disable the IOMMU following a crash in the
middle of a context switch.

The VT-d code should use map_domain_page_global() for various of the
root structures, which will simply the code and reduce the runtime cost
of a lot of the codepaths.

However, the overall crash proves that map_domain_page() is not safe for
use in IRQ or exception context.  In the case that v == current,
map_domain_page() will degrade to mfn_to_virt() which is only safe if
one can guarantee that the frame being mapped is present in the direct
mapped region.  The map_domain_page() used by the VT-d code very
certainly isn't.

The stack trace Ross gave is actually only gives half the picture.  What
went on to happen was that the pagefault handler attempting to do a page
walk, which attempted to recursively lock the dcache->lock, and a
different CPU suffered a watchdog timeout because of waiting for the
time calibration rendezvous to start.

We probably want to avoid using map_domain_page() when printing a
hypervisor page walk to avoid deadlocks in situations like this.

~Andrew

> ---
>  xen/arch/x86/domain_page.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
> index 158a164..6fbc808 100644
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>          return NULL;
>  
>      /*
> +     * When using efi runtime page tables, we have the equivalent of the idle
> +     * domain's page tables but current may point at another domain's VCPU.
> +     * Return NULL as though current is not properly set up yet.
> +     */
> +    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
> +        return NULL;
> +
> +    /*
>       * If guest_table is NULL, and we are running a paravirtualised guest,
>       * then it means we are running on the idle domain's page table and must
>       * therefore use its mapcache.
>       */
>      if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
>      {
> -        unsigned long cr3;
> -
>          /* If we really are idling, perform lazy context switch now. */
>          if ( (v = idle_vcpu[smp_processor_id()]) == current )
>              sync_local_execstate();
>          /* We must now be running on the idle page table. */
> -        ASSERT((cr3 = read_cr3()) == __pa(idle_pg_table) ||
> -               (efi_enabled && cr3 == efi_rs_page_table()));
> +        ASSERT(read_cr3() == __pa(idle_pg_table));
>      }
>  
>      return v;

  reply	other threads:[~2015-05-15 17:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-15 16:08 [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables Ross Lagerwall
2015-05-15 17:41 ` Andrew Cooper [this message]
2015-05-18  9:02   ` Jan Beulich
2015-05-18 10:55     ` Andrew Cooper
2015-05-18  9:12 ` Jan Beulich
2015-05-18 14:58 ` Jan Beulich
2015-05-27 10:23   ` Ross Lagerwall
2015-05-27 11:59     ` Jan Beulich
2015-05-27 12:03       ` Ross Lagerwall
2015-05-27 12:17         ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55562FB5.2010502@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=keir@xen.org \
    --cc=ross.lagerwall@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).