xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
@ 2015-05-15 16:08 Ross Lagerwall
  2015-05-15 17:41 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ross Lagerwall @ 2015-05-15 16:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, Ross Lagerwall

When an interrupt is received during an EFI runtime service call, Xen
may call map_domain_page() while using the EFI runtime page tables.
This fails because, although the EFI runtime page tables are a
copy of the idle domain's page tables, current points at a different
domain's vCPU.

To fix this, return NULL from mapcache_current_vcpu() when using the EFI
runtime page tables which is treated equivalently to running in an idle
vCPU.

This issue can be reproduced by repeatedly calling GetVariable() from
dom0 while using VT-d, since VT-d frequently maps a page from interrupt
context.

Example call trace:
[<ffff82d0801615dc>] __find_next_zero_bit+0x28/0x60
[<ffff82d08016a10e>] map_domain_page+0x4c6/0x4eb
[<ffff82d080156ae6>] map_vtd_domain_page+0xd/0xf
[<ffff82d08015533a>] msi_msg_read_remap_rte+0xe3/0x1d8
[<ffff82d08014e516>] iommu_read_msi_from_ire+0x31/0x34
[<ffff82d08016ff6c>] set_msi_affinity+0x134/0x17a
[<ffff82d0801737b5>] move_masked_irq+0x5c/0x98
[<ffff82d080173816>] move_native_irq+0x25/0x36
[<ffff82d08016ffcb>] ack_nonmaskable_msi_irq+0x19/0x20
[<ffff82d08016ffdb>] ack_maskable_msi_irq+0x9/0x37
[<ffff82d080173e8b>] do_IRQ+0x251/0x635
[<ffff82d080234502>] common_interrupt+0x62/0x70
[<00000000df7ed2be>] 00000000df7ed2be

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/x86/domain_page.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 158a164..6fbc808 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
         return NULL;
 
     /*
+     * When using efi runtime page tables, we have the equivalent of the idle
+     * domain's page tables but current may point at another domain's VCPU.
+     * Return NULL as though current is not properly set up yet.
+     */
+    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
+        return NULL;
+
+    /*
      * If guest_table is NULL, and we are running a paravirtualised guest,
      * then it means we are running on the idle domain's page table and must
      * therefore use its mapcache.
      */
     if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
     {
-        unsigned long cr3;
-
         /* If we really are idling, perform lazy context switch now. */
         if ( (v = idle_vcpu[smp_processor_id()]) == current )
             sync_local_execstate();
         /* We must now be running on the idle page table. */
-        ASSERT((cr3 = read_cr3()) == __pa(idle_pg_table) ||
-               (efi_enabled && cr3 == efi_rs_page_table()));
+        ASSERT(read_cr3() == __pa(idle_pg_table));
     }
 
     return v;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-15 16:08 [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables Ross Lagerwall
@ 2015-05-15 17:41 ` Andrew Cooper
  2015-05-18  9:02   ` Jan Beulich
  2015-05-18  9:12 ` Jan Beulich
  2015-05-18 14:58 ` Jan Beulich
  2 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2015-05-15 17:41 UTC (permalink / raw)
  To: Ross Lagerwall, xen-devel, Jan Beulich; +Cc: Keir Fraser

On 15/05/15 17:08, Ross Lagerwall wrote:
> When an interrupt is received during an EFI runtime service call, Xen
> may call map_domain_page() while using the EFI runtime page tables.
> This fails because, although the EFI runtime page tables are a
> copy of the idle domain's page tables, current points at a different
> domain's vCPU.
>
> To fix this, return NULL from mapcache_current_vcpu() when using the EFI
> runtime page tables which is treated equivalently to running in an idle
> vCPU.
>
> This issue can be reproduced by repeatedly calling GetVariable() from
> dom0 while using VT-d, since VT-d frequently maps a page from interrupt
> context.
>
> Example call trace:
> [<ffff82d0801615dc>] __find_next_zero_bit+0x28/0x60
> [<ffff82d08016a10e>] map_domain_page+0x4c6/0x4eb
> [<ffff82d080156ae6>] map_vtd_domain_page+0xd/0xf
> [<ffff82d08015533a>] msi_msg_read_remap_rte+0xe3/0x1d8
> [<ffff82d08014e516>] iommu_read_msi_from_ire+0x31/0x34
> [<ffff82d08016ff6c>] set_msi_affinity+0x134/0x17a
> [<ffff82d0801737b5>] move_masked_irq+0x5c/0x98
> [<ffff82d080173816>] move_native_irq+0x25/0x36
> [<ffff82d08016ffcb>] ack_nonmaskable_msi_irq+0x19/0x20
> [<ffff82d08016ffdb>] ack_maskable_msi_irq+0x9/0x37
> [<ffff82d080173e8b>] do_IRQ+0x251/0x635
> [<ffff82d080234502>] common_interrupt+0x62/0x70
> [<00000000df7ed2be>] 00000000df7ed2be
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

This is actually a VT-d issue which I have fallen over before, and
didn't have time to fix back then.  I had a cascade crash from having to
use map_domain_page() to disable the IOMMU following a crash in the
middle of a context switch.

The VT-d code should use map_domain_page_global() for various of the
root structures, which will simply the code and reduce the runtime cost
of a lot of the codepaths.

However, the overall crash proves that map_domain_page() is not safe for
use in IRQ or exception context.  In the case that v == current,
map_domain_page() will degrade to mfn_to_virt() which is only safe if
one can guarantee that the frame being mapped is present in the direct
mapped region.  The map_domain_page() used by the VT-d code very
certainly isn't.

The stack trace Ross gave is actually only gives half the picture.  What
went on to happen was that the pagefault handler attempting to do a page
walk, which attempted to recursively lock the dcache->lock, and a
different CPU suffered a watchdog timeout because of waiting for the
time calibration rendezvous to start.

We probably want to avoid using map_domain_page() when printing a
hypervisor page walk to avoid deadlocks in situations like this.

~Andrew

> ---
>  xen/arch/x86/domain_page.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
> index 158a164..6fbc808 100644
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>          return NULL;
>  
>      /*
> +     * When using efi runtime page tables, we have the equivalent of the idle
> +     * domain's page tables but current may point at another domain's VCPU.
> +     * Return NULL as though current is not properly set up yet.
> +     */
> +    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
> +        return NULL;
> +
> +    /*
>       * If guest_table is NULL, and we are running a paravirtualised guest,
>       * then it means we are running on the idle domain's page table and must
>       * therefore use its mapcache.
>       */
>      if ( unlikely(pagetable_is_null(v->arch.guest_table)) && is_pv_vcpu(v) )
>      {
> -        unsigned long cr3;
> -
>          /* If we really are idling, perform lazy context switch now. */
>          if ( (v = idle_vcpu[smp_processor_id()]) == current )
>              sync_local_execstate();
>          /* We must now be running on the idle page table. */
> -        ASSERT((cr3 = read_cr3()) == __pa(idle_pg_table) ||
> -               (efi_enabled && cr3 == efi_rs_page_table()));
> +        ASSERT(read_cr3() == __pa(idle_pg_table));
>      }
>  
>      return v;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-15 17:41 ` Andrew Cooper
@ 2015-05-18  9:02   ` Jan Beulich
  2015-05-18 10:55     ` Andrew Cooper
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2015-05-18  9:02 UTC (permalink / raw)
  To: Andrew Cooper, Ross Lagerwall, xen-devel; +Cc: Keir Fraser

>>> On 15.05.15 at 19:41, <andrew.cooper3@citrix.com> wrote:
> On 15/05/15 17:08, Ross Lagerwall wrote:
>> When an interrupt is received during an EFI runtime service call, Xen
>> may call map_domain_page() while using the EFI runtime page tables.
>> This fails because, although the EFI runtime page tables are a
>> copy of the idle domain's page tables, current points at a different
>> domain's vCPU.
>>
>> To fix this, return NULL from mapcache_current_vcpu() when using the EFI
>> runtime page tables which is treated equivalently to running in an idle
>> vCPU.
>>
>> This issue can be reproduced by repeatedly calling GetVariable() from
>> dom0 while using VT-d, since VT-d frequently maps a page from interrupt
>> context.
>>
>> Example call trace:
>> [<ffff82d0801615dc>] __find_next_zero_bit+0x28/0x60
>> [<ffff82d08016a10e>] map_domain_page+0x4c6/0x4eb
>> [<ffff82d080156ae6>] map_vtd_domain_page+0xd/0xf
>> [<ffff82d08015533a>] msi_msg_read_remap_rte+0xe3/0x1d8
>> [<ffff82d08014e516>] iommu_read_msi_from_ire+0x31/0x34
>> [<ffff82d08016ff6c>] set_msi_affinity+0x134/0x17a
>> [<ffff82d0801737b5>] move_masked_irq+0x5c/0x98
>> [<ffff82d080173816>] move_native_irq+0x25/0x36
>> [<ffff82d08016ffcb>] ack_nonmaskable_msi_irq+0x19/0x20
>> [<ffff82d08016ffdb>] ack_maskable_msi_irq+0x9/0x37
>> [<ffff82d080173e8b>] do_IRQ+0x251/0x635
>> [<ffff82d080234502>] common_interrupt+0x62/0x70
>> [<00000000df7ed2be>] 00000000df7ed2be
>>
>> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> This is actually a VT-d issue which I have fallen over before, and
> didn't have time to fix back then.  I had a cascade crash from having to
> use map_domain_page() to disable the IOMMU following a crash in the
> middle of a context switch.
> 
> The VT-d code should use map_domain_page_global() for various of the
> root structures, which will simply the code and reduce the runtime cost
> of a lot of the codepaths.

Yes.

> However, the overall crash proves that map_domain_page() is not safe for
> use in IRQ or exception context.  In the case that v == current,
> map_domain_page() will degrade to mfn_to_virt() which is only safe if
> one can guarantee that the frame being mapped is present in the direct
> mapped region.  The map_domain_page() used by the VT-d code very
> certainly isn't.

I'm not following: There's certainly no special casing of v == current
in map_domain_page(). And hence I can't see the IRQ/exception
context issue you try to describe here.

I'm wondering whether we shouldn't instead simply disable interrupts
while doing EFI runtime services calls.

> We probably want to avoid using map_domain_page() when printing a
> hypervisor page walk to avoid deadlocks in situations like this.

That wouldn't be very helpful - such printing happens when we're
about to halt the system due to a crash, and hence would hamper
analysis of the cause. Or did you mean to replace the use of
map_domain_page() there by e.g. a fixmap based mechanism, to
avoid the lock state dependency?

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-15 16:08 [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables Ross Lagerwall
  2015-05-15 17:41 ` Andrew Cooper
@ 2015-05-18  9:12 ` Jan Beulich
  2015-05-18 14:58 ` Jan Beulich
  2 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2015-05-18  9:12 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 15.05.15 at 18:08, <ross.lagerwall@citrix.com> wrote:
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>          return NULL;
>  
>      /*
> +     * When using efi runtime page tables, we have the equivalent of the idle
> +     * domain's page tables but current may point at another domain's VCPU.
> +     * Return NULL as though current is not properly set up yet.
> +     */
> +    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
> +        return NULL;

While I think this is an issue even without your change, it highlights
that we need to sync updates to idle_pg_table[] into efi_l4_pgtable[]
(particularly such done during memory hotplug, but I'd envision this
to be done generically in virt_to_xen_l3e()) for this to work.

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-18  9:02   ` Jan Beulich
@ 2015-05-18 10:55     ` Andrew Cooper
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2015-05-18 10:55 UTC (permalink / raw)
  To: Jan Beulich, Ross Lagerwall, xen-devel; +Cc: Keir Fraser

On 18/05/15 10:02, Jan Beulich wrote:
>>>> On 15.05.15 at 19:41, <andrew.cooper3@citrix.com> wrote:
>> On 15/05/15 17:08, Ross Lagerwall wrote:
>>> When an interrupt is received during an EFI runtime service call, Xen
>>> may call map_domain_page() while using the EFI runtime page tables.
>>> This fails because, although the EFI runtime page tables are a
>>> copy of the idle domain's page tables, current points at a different
>>> domain's vCPU.
>>>
>>> To fix this, return NULL from mapcache_current_vcpu() when using the EFI
>>> runtime page tables which is treated equivalently to running in an idle
>>> vCPU.
>>>
>>> This issue can be reproduced by repeatedly calling GetVariable() from
>>> dom0 while using VT-d, since VT-d frequently maps a page from interrupt
>>> context.
>>>
>>> Example call trace:
>>> [<ffff82d0801615dc>] __find_next_zero_bit+0x28/0x60
>>> [<ffff82d08016a10e>] map_domain_page+0x4c6/0x4eb
>>> [<ffff82d080156ae6>] map_vtd_domain_page+0xd/0xf
>>> [<ffff82d08015533a>] msi_msg_read_remap_rte+0xe3/0x1d8
>>> [<ffff82d08014e516>] iommu_read_msi_from_ire+0x31/0x34
>>> [<ffff82d08016ff6c>] set_msi_affinity+0x134/0x17a
>>> [<ffff82d0801737b5>] move_masked_irq+0x5c/0x98
>>> [<ffff82d080173816>] move_native_irq+0x25/0x36
>>> [<ffff82d08016ffcb>] ack_nonmaskable_msi_irq+0x19/0x20
>>> [<ffff82d08016ffdb>] ack_maskable_msi_irq+0x9/0x37
>>> [<ffff82d080173e8b>] do_IRQ+0x251/0x635
>>> [<ffff82d080234502>] common_interrupt+0x62/0x70
>>> [<00000000df7ed2be>] 00000000df7ed2be
>>>
>>> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
>> This is actually a VT-d issue which I have fallen over before, and
>> didn't have time to fix back then.  I had a cascade crash from having to
>> use map_domain_page() to disable the IOMMU following a crash in the
>> middle of a context switch.
>>
>> The VT-d code should use map_domain_page_global() for various of the
>> root structures, which will simply the code and reduce the runtime cost
>> of a lot of the codepaths.
> Yes.
>
>> However, the overall crash proves that map_domain_page() is not safe for
>> use in IRQ or exception context.  In the case that v == current,
>> map_domain_page() will degrade to mfn_to_virt() which is only safe if
>> one can guarantee that the frame being mapped is present in the direct
>> mapped region.  The map_domain_page() used by the VT-d code very
>> certainly isn't.
> I'm not following: There's certainly no special casing of v == current
> in map_domain_page(). And hence I can't see the IRQ/exception
> context issue you try to describe here.

My apologies.  I had confused myself.

>
> I'm wondering whether we shouldn't instead simply disable interrupts
> while doing EFI runtime services calls.

I would be hesitant doing this.  We have no idea how long EFI might
decide to stay in the runtime services.

>
>> We probably want to avoid using map_domain_page() when printing a
>> hypervisor page walk to avoid deadlocks in situations like this.
> That wouldn't be very helpful - such printing happens when we're
> about to halt the system due to a crash, and hence would hamper
> analysis of the cause. Or did you mean to replace the use of
> map_domain_page() there by e.g. a fixmap based mechanism, to
> avoid the lock state dependency?

I indented to find an alternative, not remove the printing.

~Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-15 16:08 [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables Ross Lagerwall
  2015-05-15 17:41 ` Andrew Cooper
  2015-05-18  9:12 ` Jan Beulich
@ 2015-05-18 14:58 ` Jan Beulich
  2015-05-27 10:23   ` Ross Lagerwall
  2 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2015-05-18 14:58 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 15.05.15 at 18:08, <ross.lagerwall@citrix.com> wrote:
> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>          return NULL;
>  
>      /*
> +     * When using efi runtime page tables, we have the equivalent of the idle
> +     * domain's page tables but current may point at another domain's VCPU.
> +     * Return NULL as though current is not properly set up yet.
> +     */
> +    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
> +        return NULL;

I'm okay with the patch in principle; what worries me is the CR3 read
that is now going to be necessary even in non-debug builds. With
this code being the only user of efi_rs_page_table(), I wonder if it
wouldn't make sense to alter that function to return non-zero only
when spin_is_locked(&efi_rs_lock), and then alter the code above
such that the CR3 read would happen only when we got a non-zero
value back.

And with that I would then also like to fold the potential double CR3
reads in debug builds to at most one.

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-18 14:58 ` Jan Beulich
@ 2015-05-27 10:23   ` Ross Lagerwall
  2015-05-27 11:59     ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Ross Lagerwall @ 2015-05-27 10:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

On 05/18/2015 03:58 PM, Jan Beulich wrote:
>>>> On 15.05.15 at 18:08, <ross.lagerwall@citrix.com> wrote:
>> --- a/xen/arch/x86/domain_page.c
>> +++ b/xen/arch/x86/domain_page.c
>> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>>           return NULL;
>>
>>       /*
>> +     * When using efi runtime page tables, we have the equivalent of the idle
>> +     * domain's page tables but current may point at another domain's VCPU.
>> +     * Return NULL as though current is not properly set up yet.
>> +     */
>> +    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
>> +        return NULL;
>
> I'm okay with the patch in principle; what worries me is the CR3 read
> that is now going to be necessary even in non-debug builds. With
> this code being the only user of efi_rs_page_table(), I wonder if it
> wouldn't make sense to alter that function to return non-zero only
> when spin_is_locked(&efi_rs_lock), and then alter the code above
> such that the CR3 read would happen only when we got a non-zero
> value back.

mapcache_current_vcpu() appears to be called from IRQ-enabled and 
IRQ-disabled callers which prevents us from using the spinlock. Is 
reading cr3 that bad, given that this is a slow path anyway?

>
> And with that I would then also like to fold the potential double CR3
> reads in debug builds to at most one.
>

CR3 needs to be re-read after the lazy context switch to the idle domain 
is completed.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-27 10:23   ` Ross Lagerwall
@ 2015-05-27 11:59     ` Jan Beulich
  2015-05-27 12:03       ` Ross Lagerwall
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2015-05-27 11:59 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 27.05.15 at 12:23, <ross.lagerwall@citrix.com> wrote:
> On 05/18/2015 03:58 PM, Jan Beulich wrote:
>>>>> On 15.05.15 at 18:08, <ross.lagerwall@citrix.com> wrote:
>>> --- a/xen/arch/x86/domain_page.c
>>> +++ b/xen/arch/x86/domain_page.c
>>> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>>>           return NULL;
>>>
>>>       /*
>>> +     * When using efi runtime page tables, we have the equivalent of the 
> idle
>>> +     * domain's page tables but current may point at another domain's VCPU.
>>> +     * Return NULL as though current is not properly set up yet.
>>> +     */
>>> +    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
>>> +        return NULL;
>>
>> I'm okay with the patch in principle; what worries me is the CR3 read
>> that is now going to be necessary even in non-debug builds. With
>> this code being the only user of efi_rs_page_table(), I wonder if it
>> wouldn't make sense to alter that function to return non-zero only
>> when spin_is_locked(&efi_rs_lock), and then alter the code above
>> such that the CR3 read would happen only when we got a non-zero
>> value back.
> 
> mapcache_current_vcpu() appears to be called from IRQ-enabled and 
> IRQ-disabled callers which prevents us from using the spinlock.

I didn't suggest to use any spin lock; I merely suggested checking
whether that particular one is being held by someone (to avoid the
CR3 read if that's not the case).

> Is reading cr3 that bad, given that this is a slow path anyway?

I never easily agree to making a code path slower that may be
executed frequently. And looking at the most likely path through
the function, I don't think it's all that slow anyway.

>> And with that I would then also like to fold the potential double CR3
>> reads in debug builds to at most one.
> 
> CR3 needs to be re-read after the lazy context switch to the idle domain 
> is completed.

That's a good point.

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-27 11:59     ` Jan Beulich
@ 2015-05-27 12:03       ` Ross Lagerwall
  2015-05-27 12:17         ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Ross Lagerwall @ 2015-05-27 12:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

On 05/27/2015 12:59 PM, Jan Beulich wrote:
>>>> On 27.05.15 at 12:23, <ross.lagerwall@citrix.com> wrote:
>> On 05/18/2015 03:58 PM, Jan Beulich wrote:
>>>>>> On 15.05.15 at 18:08, <ross.lagerwall@citrix.com> wrote:
>>>> --- a/xen/arch/x86/domain_page.c
>>>> +++ b/xen/arch/x86/domain_page.c
>>>> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>>>>            return NULL;
>>>>
>>>>        /*
>>>> +     * When using efi runtime page tables, we have the equivalent of the
>> idle
>>>> +     * domain's page tables but current may point at another domain's VCPU.
>>>> +     * Return NULL as though current is not properly set up yet.
>>>> +     */
>>>> +    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
>>>> +        return NULL;
>>>
>>> I'm okay with the patch in principle; what worries me is the CR3 read
>>> that is now going to be necessary even in non-debug builds. With
>>> this code being the only user of efi_rs_page_table(), I wonder if it
>>> wouldn't make sense to alter that function to return non-zero only
>>> when spin_is_locked(&efi_rs_lock), and then alter the code above
>>> such that the CR3 read would happen only when we got a non-zero
>>> value back.
>>
>> mapcache_current_vcpu() appears to be called from IRQ-enabled and
>> IRQ-disabled callers which prevents us from using the spinlock.
>
> I didn't suggest to use any spin lock; I merely suggested checking
> whether that particular one is being held by someone (to avoid the
> CR3 read if that's not the case).

spin_is_locked() calls check_lock() which causes a BUG_ON() even though 
you're not actually using the lock.

-- 
Ross Lagerwall

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables
  2015-05-27 12:03       ` Ross Lagerwall
@ 2015-05-27 12:17         ` Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2015-05-27 12:17 UTC (permalink / raw)
  To: Ross Lagerwall; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 27.05.15 at 14:03, <ross.lagerwall@citrix.com> wrote:
> On 05/27/2015 12:59 PM, Jan Beulich wrote:
>>>>> On 27.05.15 at 12:23, <ross.lagerwall@citrix.com> wrote:
>>> On 05/18/2015 03:58 PM, Jan Beulich wrote:
>>>>>>> On 15.05.15 at 18:08, <ross.lagerwall@citrix.com> wrote:
>>>>> --- a/xen/arch/x86/domain_page.c
>>>>> +++ b/xen/arch/x86/domain_page.c
>>>>> @@ -32,20 +32,25 @@ static inline struct vcpu *mapcache_current_vcpu(void)
>>>>>            return NULL;
>>>>>
>>>>>        /*
>>>>> +     * When using efi runtime page tables, we have the equivalent of the
>>> idle
>>>>> +     * domain's page tables but current may point at another domain's VCPU.
>>>>> +     * Return NULL as though current is not properly set up yet.
>>>>> +     */
>>>>> +    if ( efi_enabled && read_cr3() == efi_rs_page_table() )
>>>>> +        return NULL;
>>>>
>>>> I'm okay with the patch in principle; what worries me is the CR3 read
>>>> that is now going to be necessary even in non-debug builds. With
>>>> this code being the only user of efi_rs_page_table(), I wonder if it
>>>> wouldn't make sense to alter that function to return non-zero only
>>>> when spin_is_locked(&efi_rs_lock), and then alter the code above
>>>> such that the CR3 read would happen only when we got a non-zero
>>>> value back.
>>>
>>> mapcache_current_vcpu() appears to be called from IRQ-enabled and
>>> IRQ-disabled callers which prevents us from using the spinlock.
>>
>> I didn't suggest to use any spin lock; I merely suggested checking
>> whether that particular one is being held by someone (to avoid the
>> CR3 read if that's not the case).
> 
> spin_is_locked() calls check_lock() which causes a BUG_ON() even though 
> you're not actually using the lock.

Then some other mechanism - spin_is_locked() would have produced
false positives anyway. E.g. storing the CPU which managed to acquire
the lock in efi_rs_enter(), and clearing it in efi_rs_leave() before
dropping the lock. efi_rs_page_table() could then return non-zero on
only this one CPU.

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-05-27 12:17 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-15 16:08 [PATCH] x86: Don't crash when mapping a page using EFI runtime page tables Ross Lagerwall
2015-05-15 17:41 ` Andrew Cooper
2015-05-18  9:02   ` Jan Beulich
2015-05-18 10:55     ` Andrew Cooper
2015-05-18  9:12 ` Jan Beulich
2015-05-18 14:58 ` Jan Beulich
2015-05-27 10:23   ` Ross Lagerwall
2015-05-27 11:59     ` Jan Beulich
2015-05-27 12:03       ` Ross Lagerwall
2015-05-27 12:17         ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).