From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ross Lagerwall <ross.lagerwall@citrix.com>,
xen-devel@lists.xen.org, Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Subject: Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
Date: Fri, 15 May 2015 18:41:09 +0100 [thread overview]
Message-ID: <55562FB5.2010502@citrix.com> (raw)
In-Reply-To: <1431706091-32288-1-git-send-email-ross.lagerwall@citrix.com>
On 15/05/15 17:08, Ross Lagerwall wrote:
> When an interrupt is received during an EFI runtime service call, Xen
> may call map_domain_page() while using the EFI runtime page tables.
> This fails because, although the EFI runtime page tables are a
> copy of the idle domain's page tables, current points at a different
> domain's vCPU.
>
> To fix this, return NULL from mapcache_current_vcpu() when using the EFI
> runtime page tables which is treated equivalently to running in an idle
> vCPU.
>
> This issue can be reproduced by repeatedly calling GetVariable() from
> dom0 while using VT-d, since VT-d frequently maps a page from interrupt
> context.
>
> Example call trace:
> [<ffff82d0801615dc>] __find_next_zero_bit+0x28/0x60
> [<ffff82d08016a10e>] map_domain_page+0x4c6/0x4eb
> [<ffff82d080156ae6>] map_vtd_domain_page+0xd/0xf
> [<ffff82d08015533a>] msi_msg_read_remap_rte+0xe3/0x1d8
> [<ffff82d08014e516>] iommu_read_msi_from_ire+0x31/0x34
> [<ffff82d08016ff6c>] set_msi_affinity+0x134/0x17a
> [<ffff82d0801737b5>] move_masked_irq+0x5c/0x98
> [<ffff82d080173816>] move_native_irq+0x25/0x36
> [<ffff82d08016ffcb>] ack_nonmaskable_msi_irq+0x19/0x20
> [<ffff82d08016ffdb>] ack_maskable_msi_irq+0x9/0x37
> [<ffff82d080173e8b>] do_IRQ+0x251/0x635
> [<ffff82d080234502>] common_interrupt+0x62/0x70
> [<00000000df7ed2be>] 00000000df7ed2be
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
This is actually a VT-d issue which I have fallen over before, and
didn't have time to fix back then. I had a cascade crash from having to
use map_domain_page() to disable the IOMMU following a crash in the
middle of a context switch.
The VT-d code should use map_domain_page_global() for various of the
root structures, which will simply the code and reduce the runtime cost
of a lot of the codepaths.
However, the overall crash proves that map_domain_page() is not safe for
use in IRQ or exception context. In the case that v == current,
map_domain_page() will degrade to mfn_to_virt() which is only safe if
one can guarantee that the frame being mapped is present in the direct
mapped region. The map_domain_page() used by the VT-d code very
certainly isn't.
The stack trace Ross gave is actually only gives half the picture. What
went on to happen was that the pagefault handler attempting to do a page
walk, which attempted to recursively lock the dcache->lock, and a
different CPU suffered a watchdog timeout because of waiting for the
time calibration rendezvous to start.
We probably want to avoid using map_domain_page() when printing a
hypervisor page walk to avoid deadlocks in situations like this.
~Andrew
> ---
> xen/arch/x86/domain_page.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
> index 158a164..6fbc808 100644
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
> return NULL;
>
> /*
> + * When using efi runtime page tables, we have the equivalent of the idle
> + * domain's page tables but current may point at another domain's VCPU.
> + * Return NULL as though current is not properly set up yet.
> + */
> + if ( efi_enabled && read_cr3() == efi_rs_page_table() )
> + return NULL;
> +
> + /*
> * If guest_table is NULL, and we are running a paravirtualised guest,
> * then it means we are running on the idle domain's page table and must
> * therefore use its mapcache.
> */
> if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
> {
> - unsigned long cr3;
> -
> /* If we really are idling, perform lazy context switch now. */
> if ( (v = idle_vcpu[smp_processor_id()]) == current )
> sync_local_execstate();
> /* We must now be running on the idle page table. */
> - ASSERT((cr3 = read_cr3()) == __pa(idle_pg_table) ||
> - (efi_enabled && cr3 == efi_rs_page_table()));
> + ASSERT(read_cr3() == __pa(idle_pg_table));
> }
>
> return v;
next prev parent reply other threads:[~2015-05-15 17:41 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-15 16:08 [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables Ross Lagerwall
2015-05-15 17:41 ` Andrew Cooper [this message]
2015-05-18 9:02 ` Jan Beulich
2015-05-18 10:55 ` Andrew Cooper
2015-05-18 9:12 ` Jan Beulich
2015-05-18 14:58 ` Jan Beulich
2015-05-27 10:23 ` Ross Lagerwall
2015-05-27 11:59 ` Jan Beulich
2015-05-27 12:03 ` Ross Lagerwall
2015-05-27 12:17 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55562FB5.2010502@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=jbeulich@suse.com \
--cc=keir@xen.org \
--cc=ross.lagerwall@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.