All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing
@ 2011-06-15  9:26 Jan Beulich
  2011-06-15  9:40 ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2011-06-15  9:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 13.06.11 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Introduce the boolean variable 'kexecing' which indicates to functions
> whether we are on the kexec path or not.  This is used by
> disable_local_APIC() to try and revert the APIC mode back to how it
> was found on boot.
> 
> We also need some fudging of the x2apic_enabled variable.  It is used
> in multiple places over the codebase to mean multiple things,
> including:
>     What did the user specifify on the command line?
>     Did the BIOS boot me in x2apic mode?
>     Is the BSP Local APIC in x2apic mode?
>     What mode is my Local APIC in?

I don't really follow the need for this, and a properly explaining
comment certainly also belongs at the place where the hack is.

> Therefore, set it up to prevent a protection fault when disabling the
> IOAPICs.  (In this case, it is used in the "What mode is my Local APIC
> in?" case, so the processor doesnt suffer a protection fault because
> of trying to use x2apic MSRs when it should be using xapic MMIO)
> 
> Finally, make sure that interrupts are disabled when jumping into the
> purgatory code.  It would be bad to service interrupts in the Xen
> context when the next kernel is booting.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/apic.c
> --- a/xen/arch/x86/apic.c	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/arch/x86/apic.c	Mon Jun 13 17:45:43 2011 +0100
> @@ -37,6 +37,7 @@
>  #include <asm/asm_defns.h> /* for BUILD_SMP_INTERRUPT */
>  #include <mach_apic.h>
>  #include <io_ports.h>
> +#include <xen/kexec.h>
>  
>  static bool_t tdt_enabled __read_mostly;
>  static bool_t tdt_enable __initdata = 1;
> @@ -345,6 +346,33 @@ void disable_local_APIC(void)
>          wrmsrl(MSR_IA32_APICBASE, msr_content &
>                 ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD));
>      }
> +
> +    if ( kexecing )
> +    {
> +        uint64_t msr_content;
> +        rdmsrl(MSR_IA32_APICBASE, msr_content);
> +        msr_content &= ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
> +        wrmsrl(MSR_IA32_APICBASE, msr_content);
> +
> +        switch ( apic_boot_mode )

Did you verify this gets executed only for the single remaining CPU?

> +        {
> +        case APIC_MODE_DISABLED:
> +            break; /* Nothing to do - we did this above */
> +        case APIC_MODE_XAPIC:
> +            msr_content |= MSR_IA32_APICBASE_ENABLE;
> +            wrmsrl(MSR_IA32_APICBASE, msr_content);
> +            break;
> +        case APIC_MODE_X2APIC:
> +            msr_content |= 
> (MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
> +            wrmsrl(MSR_IA32_APICBASE, msr_content);
> +            break;
> +        default:
> +            printk("Default case when reverting #%d lapic to boot state\n",
> +                   smp_processor_id());
> +            break;
> +        }
> +    }
> +
>  }
>  
>  /*
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/crash.c
> --- a/xen/arch/x86/crash.c	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/arch/x86/crash.c	Mon Jun 13 17:45:43 2011 +0100
> @@ -27,6 +27,7 @@
>  #include <asm/hvm/support.h>
>  #include <asm/apic.h>
>  #include <asm/io_apic.h>
> +#include <xen/iommu.h>
>  
>  static atomic_t waiting_for_crash_ipi;
>  static unsigned int crashing_cpu;
> @@ -82,6 +83,15 @@ static void nmi_shootdown_cpus(void)
>      iommu_crash_shutdown();
>  
>      __stop_this_cpu();
> +
> +    /* This is a bit of a hack due to the problems with the x2apic_enabled
> +     * variable, but we can't do any better without a significant 
> refactoring
> +     * of the APIC code */

Ugly, but I can't exclude it may indeed be necessary. But no matter
what, I think this belongs into apic.c.

> +    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
> +        x2apic_enabled = 1;
> +    else
> +        x2apic_enabled = 0;

Do you really need to force x2apic_enabled *both* ways to avoid
described protection fault? And really I still don't follow why the
variable at the end of the life of the system all of the sudden needs
tweaking, when the system lived happily with its "normal" value.

Jan

> +
>      disable_IO_APIC();
>  
>      local_irq_restore(flags);
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/machine_kexec.c
> --- a/xen/arch/x86/machine_kexec.c	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/arch/x86/machine_kexec.c	Mon Jun 13 17:45:43 2011 +0100
> @@ -91,6 +91,11 @@ void machine_kexec(xen_kexec_image_t *im
>      if ( hpet_broadcast_is_available() )
>          hpet_disable_legacy_broadcast();
>  
> +    /* We are about to permenantly jump out of the Xen context into the 
> kexec
> +     * purgatory code.  We really dont want to be still servicing 
> interupts.
> +     */
> +    local_irq_disable();
> +
>      /*
>       * compat_machine_kexec() returns to idle pagetables, which requires us
>       * to be running on a static GDT mapping (idle pagetables have no GDT
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/common/kexec.c
> --- a/xen/common/kexec.c	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/common/kexec.c	Mon Jun 13 17:45:43 2011 +0100
> @@ -29,6 +29,8 @@
>  #include <compat/kexec.h>
>  #endif
>  
> +bool_t kexecing = FALSE;
> +
>  static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
>  
>  static Elf_Note *xen_crash_note;
> @@ -220,6 +222,8 @@ void kexec_crash(void)
>      if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
>          return;
>  
> +    kexecing = TRUE;
> +
>      kexec_common_shutdown();
>      kexec_crash_save_cpu();
>      machine_crash_shutdown();
> @@ -232,6 +236,8 @@ static long kexec_reboot(void *_image)
>  {
>      xen_kexec_image_t *image = _image;
>  
> +    kexecing = TRUE;
> +
>      kexec_common_shutdown();
>      machine_reboot_kexec(image);
>  
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/include/xen/kexec.h
> --- a/xen/include/xen/kexec.h	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/include/xen/kexec.h	Mon Jun 13 17:45:43 2011 +0100
> @@ -12,6 +12,8 @@ typedef struct xen_kexec_reserve {
>  
>  extern xen_kexec_reserve_t kexec_crash_area;
>  
> +extern bool_t kexecing;
> +
>  void set_kexec_crash_area_size(u64 system_ram);
>  
>  /* We have space for 4 images to support atomic update
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com 
> http://lists.xensource.com/xen-devel 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic  state when kexecing
  2011-06-15  9:26 [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing Jan Beulich
@ 2011-06-15  9:40 ` Andrew Cooper
  2011-06-15 10:14   ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2011-06-15  9:40 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xensource.com



On 15/06/11 10:26, Jan Beulich wrote:
>>>> On 13.06.11 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> Introduce the boolean variable 'kexecing' which indicates to functions
>> whether we are on the kexec path or not.  This is used by
>> disable_local_APIC() to try and revert the APIC mode back to how it
>> was found on boot.
>>
>> We also need some fudging of the x2apic_enabled variable.  It is used
>> in multiple places over the codebase to mean multiple things,
>> including:
>>     What did the user specifify on the command line?
>>     Did the BIOS boot me in x2apic mode?
>>     Is the BSP Local APIC in x2apic mode?
>>     What mode is my Local APIC in?
> I don't really follow the need for this, and a properly explaining
> comment certainly also belongs at the place where the hack is.

It was more for reference when some unlucky person really has to go and
refactor the apic code.  This and the paragraph following it were meant
to convey the why we are choosing to re-evaluate the x2apic_enabled
variable.

>> Therefore, set it up to prevent a protection fault when disabling the
>> IOAPICs.  (In this case, it is used in the "What mode is my Local APIC
>> in?" case, so the processor doesnt suffer a protection fault because
>> of trying to use x2apic MSRs when it should be using xapic MMIO)
>>
>> Finally, make sure that interrupts are disabled when jumping into the
>> purgatory code.  It would be bad to service interrupts in the Xen
>> context when the next kernel is booting.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/apic.c
>> --- a/xen/arch/x86/apic.c	Mon Jun 13 17:45:43 2011 +0100
>> +++ b/xen/arch/x86/apic.c	Mon Jun 13 17:45:43 2011 +0100
>> @@ -37,6 +37,7 @@
>>  #include <asm/asm_defns.h> /* for BUILD_SMP_INTERRUPT */
>>  #include <mach_apic.h>
>>  #include <io_ports.h>
>> +#include <xen/kexec.h>
>>  
>>  static bool_t tdt_enabled __read_mostly;
>>  static bool_t tdt_enable __initdata = 1;
>> @@ -345,6 +346,33 @@ void disable_local_APIC(void)
>>          wrmsrl(MSR_IA32_APICBASE, msr_content &
>>                 ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD));
>>      }
>> +
>> +    if ( kexecing )
>> +    {
>> +        uint64_t msr_content;
>> +        rdmsrl(MSR_IA32_APICBASE, msr_content);
>> +        msr_content &= ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
>> +        wrmsrl(MSR_IA32_APICBASE, msr_content);
>> +
>> +        switch ( apic_boot_mode )
> Did you verify this gets executed only for the single remaining CPU?

It most definitely runs on all CPUs.  Because of the difference between
x2apic and xapic interrupts, it is stark-raving mad to try and run a
system with different lapics in different modes as the default operating
state.

>> +        {
>> +        case APIC_MODE_DISABLED:
>> +            break; /* Nothing to do - we did this above */
>> +        case APIC_MODE_XAPIC:
>> +            msr_content |= MSR_IA32_APICBASE_ENABLE;
>> +            wrmsrl(MSR_IA32_APICBASE, msr_content);
>> +            break;
>> +        case APIC_MODE_X2APIC:
>> +            msr_content |= 
>> (MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
>> +            wrmsrl(MSR_IA32_APICBASE, msr_content);
>> +            break;
>> +        default:
>> +            printk("Default case when reverting #%d lapic to boot state\n",
>> +                   smp_processor_id());
>> +            break;
>> +        }
>> +    }
>> +
>>  }
>>  
>>  /*
>> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/crash.c
>> --- a/xen/arch/x86/crash.c	Mon Jun 13 17:45:43 2011 +0100
>> +++ b/xen/arch/x86/crash.c	Mon Jun 13 17:45:43 2011 +0100
>> @@ -27,6 +27,7 @@
>>  #include <asm/hvm/support.h>
>>  #include <asm/apic.h>
>>  #include <asm/io_apic.h>
>> +#include <xen/iommu.h>
>>  
>>  static atomic_t waiting_for_crash_ipi;
>>  static unsigned int crashing_cpu;
>> @@ -82,6 +83,15 @@ static void nmi_shootdown_cpus(void)
>>      iommu_crash_shutdown();
>>  
>>      __stop_this_cpu();
>> +
>> +    /* This is a bit of a hack due to the problems with the x2apic_enabled
>> +     * variable, but we can't do any better without a significant 
>> refactoring
>> +     * of the APIC code */
> Ugly, but I can't exclude it may indeed be necessary. But no matter
> what, I think this belongs into apic.c.

I tried but couldn't get it to work in any sane way.  This is, as far as
I can tell, the only simple solution to the problem

>> +    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
>> +        x2apic_enabled = 1;
>> +    else
>> +        x2apic_enabled = 0;
> Do you really need to force x2apic_enabled *both* ways to avoid
> described protection fault? And really I still don't follow why the
> variable at the end of the life of the system all of the sudden needs
> tweaking, when the system lived happily with its "normal" value.
>
> Jan

You do need to force it both ways.  disable_IO_APIC() which is the
following call runs the risk of causing a protection fault, when setting
virtual wire mode back up.  However, in the alternate case where the
local APIC is in x2apic mode, and x2apic_enabled is false, all the APIC
code will attempt to use MMIO and get confused when twiddling it does
nothing.  (This is one of the problems the linux kdump kernel had until
very recently)

~Andrew

>> +
>>      disable_IO_APIC();
>>  
>>      local_irq_restore(flags);
>> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/machine_kexec.c
>> --- a/xen/arch/x86/machine_kexec.c	Mon Jun 13 17:45:43 2011 +0100
>> +++ b/xen/arch/x86/machine_kexec.c	Mon Jun 13 17:45:43 2011 +0100
>> @@ -91,6 +91,11 @@ void machine_kexec(xen_kexec_image_t *im
>>      if ( hpet_broadcast_is_available() )
>>          hpet_disable_legacy_broadcast();
>>  
>> +    /* We are about to permenantly jump out of the Xen context into the 
>> kexec
>> +     * purgatory code.  We really dont want to be still servicing 
>> interupts.
>> +     */
>> +    local_irq_disable();
>> +
>>      /*
>>       * compat_machine_kexec() returns to idle pagetables, which requires us
>>       * to be running on a static GDT mapping (idle pagetables have no GDT
>> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/common/kexec.c
>> --- a/xen/common/kexec.c	Mon Jun 13 17:45:43 2011 +0100
>> +++ b/xen/common/kexec.c	Mon Jun 13 17:45:43 2011 +0100
>> @@ -29,6 +29,8 @@
>>  #include <compat/kexec.h>
>>  #endif
>>  
>> +bool_t kexecing = FALSE;
>> +
>>  static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
>>  
>>  static Elf_Note *xen_crash_note;
>> @@ -220,6 +222,8 @@ void kexec_crash(void)
>>      if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
>>          return;
>>  
>> +    kexecing = TRUE;
>> +
>>      kexec_common_shutdown();
>>      kexec_crash_save_cpu();
>>      machine_crash_shutdown();
>> @@ -232,6 +236,8 @@ static long kexec_reboot(void *_image)
>>  {
>>      xen_kexec_image_t *image = _image;
>>  
>> +    kexecing = TRUE;
>> +
>>      kexec_common_shutdown();
>>      machine_reboot_kexec(image);
>>  
>> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/include/xen/kexec.h
>> --- a/xen/include/xen/kexec.h	Mon Jun 13 17:45:43 2011 +0100
>> +++ b/xen/include/xen/kexec.h	Mon Jun 13 17:45:43 2011 +0100
>> @@ -12,6 +12,8 @@ typedef struct xen_kexec_reserve {
>>  
>>  extern xen_kexec_reserve_t kexec_crash_area;
>>  
>> +extern bool_t kexecing;
>> +
>>  void set_kexec_crash_area_size(u64 system_ram);
>>  
>>  /* We have space for 4 images to support atomic update
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com 
>> http://lists.xensource.com/xen-devel 
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing
  2011-06-15  9:40 ` Andrew Cooper
@ 2011-06-15 10:14   ` Jan Beulich
  2011-06-15 10:36     ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2011-06-15 10:14 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xensource.com

>>> On 15.06.11 at 11:40, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 15/06/11 10:26, Jan Beulich wrote:
>>>>> On 13.06.11 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> @@ -345,6 +346,33 @@ void disable_local_APIC(void)
>>>          wrmsrl(MSR_IA32_APICBASE, msr_content &
>>>                 ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD));
>>>      }
>>> +
>>> +    if ( kexecing )
>>> +    {
>>> +        uint64_t msr_content;
>>> +        rdmsrl(MSR_IA32_APICBASE, msr_content);
>>> +        msr_content &= ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
>>> +        wrmsrl(MSR_IA32_APICBASE, msr_content);
>>> +
>>> +        switch ( apic_boot_mode )
>> Did you verify this gets executed only for the single remaining CPU?
> 
> It most definitely runs on all CPUs.  Because of the difference between
> x2apic and xapic interrupts, it is stark-raving mad to try and run a
> system with different lapics in different modes as the default operating
> state.

But you're not returning each CPU to its boot state - you're instead
forcing them all into the state the boot CPU was in (and hence
possibly out of sync with other firmware provided information).

>>> +    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
>>> +        x2apic_enabled = 1;
>>> +    else
>>> +        x2apic_enabled = 0;
>> Do you really need to force x2apic_enabled *both* ways to avoid
>> described protection fault? And really I still don't follow why the
>> variable at the end of the life of the system all of the sudden needs
>> tweaking, when the system lived happily with its "normal" value.
> 
> You do need to force it both ways.  disable_IO_APIC() which is the
> following call runs the risk of causing a protection fault, when setting
> virtual wire mode back up.  However, in the alternate case where the
> local APIC is in x2apic mode, and x2apic_enabled is false, all the APIC
> code will attempt to use MMIO and get confused when twiddling it does
> nothing.  (This is one of the problems the linux kdump kernel had until
> very recently)

How would the APIC end up in x2apic mode when x2apic_enabled
is not set (or vice versa)?

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic  state when kexecing
  2011-06-15 10:14   ` Jan Beulich
@ 2011-06-15 10:36     ` Andrew Cooper
  2011-06-15 10:44       ` Jan Beulich
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2011-06-15 10:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xensource.com



On 15/06/11 11:14, Jan Beulich wrote:
>>>> On 15.06.11 at 11:40, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 15/06/11 10:26, Jan Beulich wrote:
>>>>>> On 13.06.11 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> @@ -345,6 +346,33 @@ void disable_local_APIC(void)
>>>>          wrmsrl(MSR_IA32_APICBASE, msr_content &
>>>>                 ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD));
>>>>      }
>>>> +
>>>> +    if ( kexecing )
>>>> +    {
>>>> +        uint64_t msr_content;
>>>> +        rdmsrl(MSR_IA32_APICBASE, msr_content);
>>>> +        msr_content &= ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
>>>> +        wrmsrl(MSR_IA32_APICBASE, msr_content);
>>>> +
>>>> +        switch ( apic_boot_mode )
>>> Did you verify this gets executed only for the single remaining CPU?
>> It most definitely runs on all CPUs.  Because of the difference between
>> x2apic and xapic interrupts, it is stark-raving mad to try and run a
>> system with different lapics in different modes as the default operating
>> state.
> But you're not returning each CPU to its boot state - you're instead
> forcing them all into the state the boot CPU was in (and hence
> possibly out of sync with other firmware provided information).

My point was that the apic state of the boot processor really cant be
different to the boot state of any other processors, due to the triple
faulting fun you get when lapics receive interrupts in the wrong mode.

>>>> +    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
>>>> +        x2apic_enabled = 1;
>>>> +    else
>>>> +        x2apic_enabled = 0;
>>> Do you really need to force x2apic_enabled *both* ways to avoid
>>> described protection fault? And really I still don't follow why the
>>> variable at the end of the life of the system all of the sudden needs
>>> tweaking, when the system lived happily with its "normal" value.
>> You do need to force it both ways.  disable_IO_APIC() which is the
>> following call runs the risk of causing a protection fault, when setting
>> virtual wire mode back up.  However, in the alternate case where the
>> local APIC is in x2apic mode, and x2apic_enabled is false, all the APIC
>> code will attempt to use MMIO and get confused when twiddling it does
>> nothing.  (This is one of the problems the linux kdump kernel had until
>> very recently)
> How would the APIC end up in x2apic mode when x2apic_enabled
> is not set (or vice versa)?
>
> Jan
>
You get into that state when taring down the local APICs on the kexec
path, which is why we need to go and tweak the x2apic_enabled variable. 
Without this patch, there is nowhere in the Xen code which ever sets
x2apic_enabled to 0 (It defaults to 0 in the .data section).

~Andrew

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing
  2011-06-15 10:36     ` Andrew Cooper
@ 2011-06-15 10:44       ` Jan Beulich
  2011-06-15 11:59         ` Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2011-06-15 10:44 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xensource.com

>>> On 15.06.11 at 12:36, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

> 
> On 15/06/11 11:14, Jan Beulich wrote:
>>>>> On 15.06.11 at 11:40, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 15/06/11 10:26, Jan Beulich wrote:
>>>>>>> On 13.06.11 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>> @@ -345,6 +346,33 @@ void disable_local_APIC(void)
>>>>>          wrmsrl(MSR_IA32_APICBASE, msr_content &
>>>>>                 ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD));
>>>>>      }
>>>>> +
>>>>> +    if ( kexecing )
>>>>> +    {
>>>>> +        uint64_t msr_content;
>>>>> +        rdmsrl(MSR_IA32_APICBASE, msr_content);
>>>>> +        msr_content &= ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
>>>>> +        wrmsrl(MSR_IA32_APICBASE, msr_content);
>>>>> +
>>>>> +        switch ( apic_boot_mode )
>>>> Did you verify this gets executed only for the single remaining CPU?
>>> It most definitely runs on all CPUs.  Because of the difference between
>>> x2apic and xapic interrupts, it is stark-raving mad to try and run a
>>> system with different lapics in different modes as the default operating
>>> state.
>> But you're not returning each CPU to its boot state - you're instead
>> forcing them all into the state the boot CPU was in (and hence
>> possibly out of sync with other firmware provided information).
> 
> My point was that the apic state of the boot processor really cant be
> different to the boot state of any other processors, due to the triple
> faulting fun you get when lapics receive interrupts in the wrong mode.

As long as the APIC is (software) disabled, it won't receive any
interrupts (apart from the special SMP boot messages).

>>>>> +    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
>>>>> +        x2apic_enabled = 1;
>>>>> +    else
>>>>> +        x2apic_enabled = 0;
>>>> Do you really need to force x2apic_enabled *both* ways to avoid
>>>> described protection fault? And really I still don't follow why the
>>>> variable at the end of the life of the system all of the sudden needs
>>>> tweaking, when the system lived happily with its "normal" value.
>>> You do need to force it both ways.  disable_IO_APIC() which is the
>>> following call runs the risk of causing a protection fault, when setting
>>> virtual wire mode back up.  However, in the alternate case where the
>>> local APIC is in x2apic mode, and x2apic_enabled is false, all the APIC
>>> code will attempt to use MMIO and get confused when twiddling it does
>>> nothing.  (This is one of the problems the linux kdump kernel had until
>>> very recently)
>> How would the APIC end up in x2apic mode when x2apic_enabled
>> is not set (or vice versa)?
>>
>> Jan
>>
> You get into that state when taring down the local APICs on the kexec
> path, which is why we need to go and tweak the x2apic_enabled variable. 
> Without this patch, there is nowhere in the Xen code which ever sets
> x2apic_enabled to 0 (It defaults to 0 in the .data section).

Past that point there shouldn't be any accesses anymore. If there
are, I'd rather say that is what needs fixing.

Jan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic  state when kexecing
  2011-06-15 10:44       ` Jan Beulich
@ 2011-06-15 11:59         ` Andrew Cooper
  2011-06-15 15:41           ` [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing [Reformatted] Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2011-06-15 11:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel@lists.xensource.com



On 15/06/11 11:44, Jan Beulich wrote:
>>>> On 15.06.11 at 12:36, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 15/06/11 11:14, Jan Beulich wrote:
>>>>>> On 15.06.11 at 11:40, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> On 15/06/11 10:26, Jan Beulich wrote:
>>>>>>>> On 13.06.11 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>> @@ -345,6 +346,33 @@ void disable_local_APIC(void)
>>>>>>          wrmsrl(MSR_IA32_APICBASE, msr_content &
>>>>>>                 ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD));
>>>>>>      }
>>>>>> +
>>>>>> +    if ( kexecing )
>>>>>> +    {
>>>>>> +        uint64_t msr_content;
>>>>>> +        rdmsrl(MSR_IA32_APICBASE, msr_content);
>>>>>> +        msr_content &= ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
>>>>>> +        wrmsrl(MSR_IA32_APICBASE, msr_content);
>>>>>> +
>>>>>> +        switch ( apic_boot_mode )
>>>>> Did you verify this gets executed only for the single remaining CPU?
>>>> It most definitely runs on all CPUs.  Because of the difference between
>>>> x2apic and xapic interrupts, it is stark-raving mad to try and run a
>>>> system with different lapics in different modes as the default operating
>>>> state.
>>> But you're not returning each CPU to its boot state - you're instead
>>> forcing them all into the state the boot CPU was in (and hence
>>> possibly out of sync with other firmware provided information).
>> My point was that the apic state of the boot processor really cant be
>> different to the boot state of any other processors, due to the triple
>> faulting fun you get when lapics receive interrupts in the wrong mode.
> As long as the APIC is (software) disabled, it won't receive any
> interrupts (apart from the special SMP boot messages).

True, until the kdump kernel soft-enabled the local apic, tries to
configure it and falls over because it is using MMIO when Xen left the
lapic in x2apic mode.  (I guess one argument is "use a newer kernel" but
that is not helpful for those of us needing to support older kernels)

>>>>>> +    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
>>>>>> +        x2apic_enabled = 1;
>>>>>> +    else
>>>>>> +        x2apic_enabled = 0;
>>>>> Do you really need to force x2apic_enabled *both* ways to avoid
>>>>> described protection fault? And really I still don't follow why the
>>>>> variable at the end of the life of the system all of the sudden needs
>>>>> tweaking, when the system lived happily with its "normal" value.
>>>> You do need to force it both ways.  disable_IO_APIC() which is the
>>>> following call runs the risk of causing a protection fault, when setting
>>>> virtual wire mode back up.  However, in the alternate case where the
>>>> local APIC is in x2apic mode, and x2apic_enabled is false, all the APIC
>>>> code will attempt to use MMIO and get confused when twiddling it does
>>>> nothing.  (This is one of the problems the linux kdump kernel had until
>>>> very recently)
>>> How would the APIC end up in x2apic mode when x2apic_enabled
>>> is not set (or vice versa)?
>>>
>>> Jan
>>>
>> You get into that state when taring down the local APICs on the kexec
>> path, which is why we need to go and tweak the x2apic_enabled variable. 
>> Without this patch, there is nowhere in the Xen code which ever sets
>> x2apic_enabled to 0 (It defaults to 0 in the .data section).
> Past that point there shouldn't be any accesses anymore. If there
> are, I'd rather say that is what needs fixing.
>
> Jan
>
I tried swapping the order of disable_local_APIC and disable_IO_APICs on
the crash path, but that resulted in the kdump kernel not getting any
interrupts whatsoever, and hanging indefinitely when checking the hlt
instruction.  I am still not certain why that happened, but given that
the kdump kernel is fine when booting natively, I didn't explore the
problem very much.

I dislike this hack as much as you do, but I have not found any other
way of doing which is anything like as simple.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing [Reformatted]
  2011-06-15 11:59         ` Andrew Cooper
@ 2011-06-15 15:41           ` Andrew Cooper
  2011-06-15 15:50             ` Ian Campbell
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Cooper @ 2011-06-15 15:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xensource.com, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 326 bytes --]

Tweaked the patch to prevent IOMMU_WAIT_OP panicking on the kexec path. 
Unfortunately, this has to rely on the kexecing variable.

As with the other hacks in this patch, I cant see a better way of
solving the problem.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: kexec-fix-x2apic.patch --]
[-- Type: text/x-patch, Size: 6308 bytes --]

KEXEC: correctly revert x2apic state when kexecing

Introduce the boolean variable 'kexecing' which indicates to functions
whether we are on the kexec path or not.  This is used by
disable_local_APIC() to try and revert the APIC mode back to how it
was found on boot.

We also need some fudging of the x2apic_enabled variable.  It is used
in multiple places over the codebase to mean multiple things,
including:
    What did the user specifify on the command line?
    Did the BIOS boot me in x2apic mode?
    Is the BSP Local APIC in x2apic mode?
    What mode is my Local APIC in?

Therefore, set it up to prevent a protection fault when disabling the
IOAPICs.  (In this case, it is used in the "What mode is my Local APIC
in?" case, so the processor doesnt suffer a protection fault because
of trying to use x2apic MSRs when it should be using xapic MMIO)

Finally, make sure that interrupts are disabled when jumping into the
purgatory code.  It would be bad to service interrupts in the Xen
context when the next kernel is booting.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r e30942288f87 xen/arch/x86/apic.c
--- a/xen/arch/x86/apic.c	Wed Jun 15 15:55:20 2011 +0100
+++ b/xen/arch/x86/apic.c	Wed Jun 15 16:21:13 2011 +0100
@@ -37,6 +37,7 @@
 #include <asm/asm_defns.h> /* for BUILD_SMP_INTERRUPT */
 #include <mach_apic.h>
 #include <io_ports.h>
+#include <xen/kexec.h>
 
 static bool_t tdt_enabled __read_mostly;
 static bool_t tdt_enable __initdata = 1;
@@ -349,6 +350,33 @@ void disable_local_APIC(void)
         wrmsrl(MSR_IA32_APICBASE, msr_content &
                ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD));
     }
+
+    if ( kexecing )
+    {
+        uint64_t msr_content;
+        rdmsrl(MSR_IA32_APICBASE, msr_content);
+        msr_content &= ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
+        wrmsrl(MSR_IA32_APICBASE, msr_content);
+
+        switch ( apic_boot_mode )
+        {
+        case APIC_MODE_DISABLED:
+            break; /* Nothing to do - we did this above */
+        case APIC_MODE_XAPIC:
+            msr_content |= MSR_IA32_APICBASE_ENABLE;
+            wrmsrl(MSR_IA32_APICBASE, msr_content);
+            break;
+        case APIC_MODE_X2APIC:
+            msr_content |= (MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
+            wrmsrl(MSR_IA32_APICBASE, msr_content);
+            break;
+        default:
+            printk("Default case when reverting #%d lapic to boot state\n",
+                   smp_processor_id());
+            break;
+        }
+    }
+
 }
 
 /*
diff -r e30942288f87 xen/arch/x86/crash.c
--- a/xen/arch/x86/crash.c	Wed Jun 15 15:55:20 2011 +0100
+++ b/xen/arch/x86/crash.c	Wed Jun 15 16:21:13 2011 +0100
@@ -27,6 +27,7 @@
 #include <asm/hvm/support.h>
 #include <asm/apic.h>
 #include <asm/io_apic.h>
+#include <xen/iommu.h>
 
 static atomic_t waiting_for_crash_ipi;
 static unsigned int crashing_cpu;
@@ -82,6 +83,15 @@ static void nmi_shootdown_cpus(void)
     iommu_crash_shutdown();
 
     __stop_this_cpu();
+
+    /* This is a bit of a hack due to the problems with the x2apic_enabled
+     * variable, but we can't do any better without a significant refactoring
+     * of the APIC code */
+    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
+        x2apic_enabled = 1;
+    else
+        x2apic_enabled = 0;
+
     disable_IO_APIC();
 }
 
diff -r e30942288f87 xen/arch/x86/machine_kexec.c
--- a/xen/arch/x86/machine_kexec.c	Wed Jun 15 15:55:20 2011 +0100
+++ b/xen/arch/x86/machine_kexec.c	Wed Jun 15 16:21:13 2011 +0100
@@ -88,6 +88,11 @@ void machine_kexec(xen_kexec_image_t *im
     if ( hpet_broadcast_is_available() )
         hpet_disable_legacy_broadcast();
 
+    /* We are about to permenantly jump out of the Xen context into the kexec
+     * purgatory code.  We really dont want to be still servicing interupts.
+     */
+    local_irq_disable();
+
     /*
      * compat_machine_kexec() returns to idle pagetables, which requires us
      * to be running on a static GDT mapping (idle pagetables have no GDT
diff -r e30942288f87 xen/common/kexec.c
--- a/xen/common/kexec.c	Wed Jun 15 15:55:20 2011 +0100
+++ b/xen/common/kexec.c	Wed Jun 15 16:21:13 2011 +0100
@@ -29,6 +29,8 @@
 #include <compat/kexec.h>
 #endif
 
+bool_t kexecing = FALSE;
+
 static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
 
 static Elf_Note *xen_crash_note;
@@ -220,6 +222,8 @@ void kexec_crash(void)
     if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
         return;
 
+    kexecing = TRUE;
+
     kexec_common_shutdown();
     kexec_crash_save_cpu();
     machine_crash_shutdown();
@@ -232,6 +236,8 @@ static long kexec_reboot(void *_image)
 {
     xen_kexec_image_t *image = _image;
 
+    kexecing = TRUE;
+
     kexec_common_shutdown();
     machine_reboot_kexec(image);
 
diff -r e30942288f87 xen/drivers/passthrough/vtd/dmar.h
--- a/xen/drivers/passthrough/vtd/dmar.h	Wed Jun 15 15:55:20 2011 +0100
+++ b/xen/drivers/passthrough/vtd/dmar.h	Wed Jun 15 16:21:13 2011 +0100
@@ -23,6 +23,7 @@
 
 #include <xen/list.h>
 #include <xen/iommu.h>
+#include <xen/kexec.h>
 
 /* This one is for interrupt remapping */
 struct acpi_ioapic_unit {
@@ -99,8 +100,9 @@ do {                                    
         if ( cond )                                 \
             break;                                  \
         if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT )      \
-            panic("%s:%d:%s: DMAR hardware is malfunctional\n", \
-                  __FILE__, __LINE__, __func__);                \
+            if ( !kexecing )                                    \
+                panic("%s:%d:%s: DMAR hardware is malfunctional\n",\
+                      __FILE__, __LINE__, __func__);            \
         cpu_relax();                                            \
     }                                                           \
 } while (0)
diff -r e30942288f87 xen/include/xen/kexec.h
--- a/xen/include/xen/kexec.h	Wed Jun 15 15:55:20 2011 +0100
+++ b/xen/include/xen/kexec.h	Wed Jun 15 16:21:13 2011 +0100
@@ -12,6 +12,8 @@ typedef struct xen_kexec_reserve {
 
 extern xen_kexec_reserve_t kexec_crash_area;
 
+extern bool_t kexecing;
+
 void set_kexec_crash_area_size(u64 system_ram);
 
 /* We have space for 4 images to support atomic update

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing [Reformatted]
  2011-06-15 15:41           ` [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing [Reformatted] Andrew Cooper
@ 2011-06-15 15:50             ` Ian Campbell
  2011-06-15 16:03               ` [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing [Reformatted, v2] Andrew Cooper
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Campbell @ 2011-06-15 15:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel@lists.xensource.com, Jan Beulich

On Wed, 2011-06-15 at 16:41 +0100, Andrew Cooper wrote:
> Tweaked the patch to prevent IOMMU_WAIT_OP panicking on the kexec path. 
> Unfortunately, this has to rely on the kexecing variable.
> 
> As with the other hacks in this patch, I cant see a better way of
> solving the problem.
> 


> @@ -99,8 +100,9 @@ do {                                    
>          if ( cond )                                 \
>              break;                                  \
>          if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT )      \
> -            panic("%s:%d:%s: DMAR hardware is malfunctional\n", \
> -                  __FILE__, __LINE__, __func__);                \
> +            if ( !kexecing )                                    \
> +                panic("%s:%d:%s: DMAR hardware is malfunctional\n",\
> +                      __FILE__, __LINE__, __func__);            \
>          cpu_relax();                                            \
>      }                                                           \
>  } while (0)

I think you want an "else break" here to cause it to struggle onwards
rather than the infinite loop you get otherwise.

Ian

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing [Reformatted, v2]
  2011-06-15 15:50             ` Ian Campbell
@ 2011-06-15 16:03               ` Andrew Cooper
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Cooper @ 2011-06-15 16:03 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Jan, xen-devel@lists.xensource.com, Beulich

[-- Attachment #1: Type: text/plain, Size: 1384 bytes --]



On 15/06/11 16:50, Ian Campbell wrote:
> On Wed, 2011-06-15 at 16:41 +0100, Andrew Cooper wrote:
>> Tweaked the patch to prevent IOMMU_WAIT_OP panicking on the kexec path. 
>> Unfortunately, this has to rely on the kexecing variable.
>>
>> As with the other hacks in this patch, I cant see a better way of
>> solving the problem.
>>
>
>> @@ -99,8 +100,9 @@ do {                                    
>>          if ( cond )                                 \
>>              break;                                  \
>>          if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT )      \
>> -            panic("%s:%d:%s: DMAR hardware is malfunctional\n", \
>> -                  __FILE__, __LINE__, __func__);                \
>> +            if ( !kexecing )                                    \
>> +                panic("%s:%d:%s: DMAR hardware is malfunctional\n",\
>> +                      __FILE__, __LINE__, __func__);            \
>>          cpu_relax();                                            \
>>      }                                                           \
>>  } while (0)
> I think you want an "else break" here to cause it to struggle onwards
> rather than the infinite loop you get otherwise.
>
> Ian
Yep - Fixed and reformatted against staging again.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: kexec-prevent-panic.patch --]
[-- Type: text/x-patch, Size: 1540 bytes --]

KEXEC: prevent panic on the kexec path when talking to the DMAR hardware

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 23c068b10923 xen/drivers/passthrough/vtd/dmar.h
--- a/xen/drivers/passthrough/vtd/dmar.h	Wed Jun 15 16:16:41 2011 +0100
+++ b/xen/drivers/passthrough/vtd/dmar.h	Wed Jun 15 16:57:39 2011 +0100
@@ -23,6 +23,7 @@
 
 #include <xen/list.h>
 #include <xen/iommu.h>
+#include <xen/kexec.h>
 
 /* This one is for interrupt remapping */
 struct acpi_ioapic_unit {
@@ -98,9 +99,13 @@ do {                                    
         sts = op(iommu->reg, offset);               \
         if ( cond )                                 \
             break;                                  \
-        if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT )      \
-            panic("%s:%d:%s: DMAR hardware is malfunctional\n", \
-                  __FILE__, __LINE__, __func__);                \
+        if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT ) {    \
+            if ( !kexecing )                                    \
+                panic("%s:%d:%s: DMAR hardware is malfunctional\n",\
+                      __FILE__, __LINE__, __func__);            \
+            else                                                \
+                break;                                          \
+        }                                                       \
         cpu_relax();                                            \
     }                                                           \
 } while (0)

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2011-06-15 16:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-15  9:26 [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing Jan Beulich
2011-06-15  9:40 ` Andrew Cooper
2011-06-15 10:14   ` Jan Beulich
2011-06-15 10:36     ` Andrew Cooper
2011-06-15 10:44       ` Jan Beulich
2011-06-15 11:59         ` Andrew Cooper
2011-06-15 15:41           ` [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing [Reformatted] Andrew Cooper
2011-06-15 15:50             ` Ian Campbell
2011-06-15 16:03               ` [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing [Reformatted, v2] Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.