All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0 of 7] Fix kexec in Xen (take 4)
@ 2011-06-13 17:02 Andrew Cooper
  2011-06-13 17:02 ` [PATCH 1 of 7] APIC BUG: fix potential Protection Fault during shutdown Andrew Cooper
                   ` (7 more replies)
  0 siblings, 8 replies; 49+ messages in thread
From: Andrew Cooper @ 2011-06-13 17:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

This set of patches are partly bugfixed to problems I have noticed while working on this area of the codebase, and targeted fixes to get the kexec path working again on Xen 4.x

The first three patches are small bugfixes which are not directly related to the kexec problems, but are on the codepath.

The next four patches are directly related to fixing the kexec path.

These patches cause xen to track the BSP local APIC boot state and return to it before kexec'ing to a new kernel.  This prevents the kdump kernel falling over itself when booting in x2apic mode while expecting to be in xapic mode.

It also makes sure to disable IO virtualisation, along with fixing the current codepath related to disabling Interrup Remapping on Intel VTD boxes.


Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply	[flat|nested] 49+ messages in thread
* Re: [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing
@ 2011-06-15  9:26 Jan Beulich
  2011-06-15  9:40 ` Andrew Cooper
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2011-06-15  9:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

>>> On 13.06.11 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Introduce the boolean variable 'kexecing' which indicates to functions
> whether we are on the kexec path or not.  This is used by
> disable_local_APIC() to try and revert the APIC mode back to how it
> was found on boot.
> 
> We also need some fudging of the x2apic_enabled variable.  It is used
> in multiple places over the codebase to mean multiple things,
> including:
>     What did the user specifify on the command line?
>     Did the BIOS boot me in x2apic mode?
>     Is the BSP Local APIC in x2apic mode?
>     What mode is my Local APIC in?

I don't really follow the need for this, and a properly explaining
comment certainly also belongs at the place where the hack is.

> Therefore, set it up to prevent a protection fault when disabling the
> IOAPICs.  (In this case, it is used in the "What mode is my Local APIC
> in?" case, so the processor doesnt suffer a protection fault because
> of trying to use x2apic MSRs when it should be using xapic MMIO)
> 
> Finally, make sure that interrupts are disabled when jumping into the
> purgatory code.  It would be bad to service interrupts in the Xen
> context when the next kernel is booting.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/apic.c
> --- a/xen/arch/x86/apic.c	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/arch/x86/apic.c	Mon Jun 13 17:45:43 2011 +0100
> @@ -37,6 +37,7 @@
>  #include <asm/asm_defns.h> /* for BUILD_SMP_INTERRUPT */
>  #include <mach_apic.h>
>  #include <io_ports.h>
> +#include <xen/kexec.h>
>  
>  static bool_t tdt_enabled __read_mostly;
>  static bool_t tdt_enable __initdata = 1;
> @@ -345,6 +346,33 @@ void disable_local_APIC(void)
>          wrmsrl(MSR_IA32_APICBASE, msr_content &
>                 ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD));
>      }
> +
> +    if ( kexecing )
> +    {
> +        uint64_t msr_content;
> +        rdmsrl(MSR_IA32_APICBASE, msr_content);
> +        msr_content &= ~(MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
> +        wrmsrl(MSR_IA32_APICBASE, msr_content);
> +
> +        switch ( apic_boot_mode )

Did you verify this gets executed only for the single remaining CPU?

> +        {
> +        case APIC_MODE_DISABLED:
> +            break; /* Nothing to do - we did this above */
> +        case APIC_MODE_XAPIC:
> +            msr_content |= MSR_IA32_APICBASE_ENABLE;
> +            wrmsrl(MSR_IA32_APICBASE, msr_content);
> +            break;
> +        case APIC_MODE_X2APIC:
> +            msr_content |= 
> (MSR_IA32_APICBASE_ENABLE|MSR_IA32_APICBASE_EXTD);
> +            wrmsrl(MSR_IA32_APICBASE, msr_content);
> +            break;
> +        default:
> +            printk("Default case when reverting #%d lapic to boot state\n",
> +                   smp_processor_id());
> +            break;
> +        }
> +    }
> +
>  }
>  
>  /*
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/crash.c
> --- a/xen/arch/x86/crash.c	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/arch/x86/crash.c	Mon Jun 13 17:45:43 2011 +0100
> @@ -27,6 +27,7 @@
>  #include <asm/hvm/support.h>
>  #include <asm/apic.h>
>  #include <asm/io_apic.h>
> +#include <xen/iommu.h>
>  
>  static atomic_t waiting_for_crash_ipi;
>  static unsigned int crashing_cpu;
> @@ -82,6 +83,15 @@ static void nmi_shootdown_cpus(void)
>      iommu_crash_shutdown();
>  
>      __stop_this_cpu();
> +
> +    /* This is a bit of a hack due to the problems with the x2apic_enabled
> +     * variable, but we can't do any better without a significant 
> refactoring
> +     * of the APIC code */

Ugly, but I can't exclude it may indeed be necessary. But no matter
what, I think this belongs into apic.c.

> +    if ( current_local_apic_mode() == APIC_MODE_X2APIC )
> +        x2apic_enabled = 1;
> +    else
> +        x2apic_enabled = 0;

Do you really need to force x2apic_enabled *both* ways to avoid
described protection fault? And really I still don't follow why the
variable at the end of the life of the system all of the sudden needs
tweaking, when the system lived happily with its "normal" value.

Jan

> +
>      disable_IO_APIC();
>  
>      local_irq_restore(flags);
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/arch/x86/machine_kexec.c
> --- a/xen/arch/x86/machine_kexec.c	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/arch/x86/machine_kexec.c	Mon Jun 13 17:45:43 2011 +0100
> @@ -91,6 +91,11 @@ void machine_kexec(xen_kexec_image_t *im
>      if ( hpet_broadcast_is_available() )
>          hpet_disable_legacy_broadcast();
>  
> +    /* We are about to permenantly jump out of the Xen context into the 
> kexec
> +     * purgatory code.  We really dont want to be still servicing 
> interupts.
> +     */
> +    local_irq_disable();
> +
>      /*
>       * compat_machine_kexec() returns to idle pagetables, which requires us
>       * to be running on a static GDT mapping (idle pagetables have no GDT
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/common/kexec.c
> --- a/xen/common/kexec.c	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/common/kexec.c	Mon Jun 13 17:45:43 2011 +0100
> @@ -29,6 +29,8 @@
>  #include <compat/kexec.h>
>  #endif
>  
> +bool_t kexecing = FALSE;
> +
>  static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
>  
>  static Elf_Note *xen_crash_note;
> @@ -220,6 +222,8 @@ void kexec_crash(void)
>      if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
>          return;
>  
> +    kexecing = TRUE;
> +
>      kexec_common_shutdown();
>      kexec_crash_save_cpu();
>      machine_crash_shutdown();
> @@ -232,6 +236,8 @@ static long kexec_reboot(void *_image)
>  {
>      xen_kexec_image_t *image = _image;
>  
> +    kexecing = TRUE;
> +
>      kexec_common_shutdown();
>      machine_reboot_kexec(image);
>  
> diff -r 3ad737eb0a8d -r b6666fbafe33 xen/include/xen/kexec.h
> --- a/xen/include/xen/kexec.h	Mon Jun 13 17:45:43 2011 +0100
> +++ b/xen/include/xen/kexec.h	Mon Jun 13 17:45:43 2011 +0100
> @@ -12,6 +12,8 @@ typedef struct xen_kexec_reserve {
>  
>  extern xen_kexec_reserve_t kexec_crash_area;
>  
> +extern bool_t kexecing;
> +
>  void set_kexec_crash_area_size(u64 system_ram);
>  
>  /* We have space for 4 images to support atomic update
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com 
> http://lists.xensource.com/xen-devel 

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2011-06-16 13:13 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-06-13 17:02 [PATCH 0 of 7] Fix kexec in Xen (take 4) Andrew Cooper
2011-06-13 17:02 ` [PATCH 1 of 7] APIC BUG: fix potential Protection Fault during shutdown Andrew Cooper
2011-06-14  8:44   ` Jan Beulich
2011-06-14  9:44     ` Andrew Cooper
2011-06-13 17:02 ` [PATCH 2 of 7] KEXEC BUG: nmi_shootdown_cpus doesn't look after the interrupt flag Andrew Cooper
2011-06-14  8:46   ` Jan Beulich
2011-06-14  9:46     ` Keir Fraser
2011-06-15 11:01       ` [PATCH 2 of 7] KEXEC BUG: nmi_shootdown_cpus doesn't look after the interrupt flag [Reformatted] Andrew Cooper
2011-06-14  9:51     ` [PATCH 2 of 7] KEXEC BUG: nmi_shootdown_cpus doesn't look after the interrupt flag Andrew Cooper
2011-06-13 17:02 ` [PATCH 3 of 7] IOMMU: Sanitise pointer work Andrew Cooper
2011-06-13 18:13   ` Keir Fraser
2011-06-14  9:53     ` Andrew Cooper
2011-06-14 11:51       ` Keir Fraser
2011-06-13 17:02 ` [PATCH 4 of 7] APIC: record local APIC state on boot Andrew Cooper
2011-06-14  8:57   ` Jan Beulich
2011-06-14 10:48     ` Ian Campbell
2011-06-14 11:21       ` Jan Beulich
2011-06-15 12:33         ` [PATCH 4 of 7] APIC: record local APIC state on boot [Reformatted] Andrew Cooper
2011-06-15 12:42           ` Keir Fraser
2011-06-15 13:38             ` Andrew Cooper
2011-06-15 14:49               ` Andrew Cooper
2011-06-15 12:50           ` Jan Beulich
2011-06-13 17:02 ` [PATCH 5 of 7] IOMMU VTD BUG: disable Extended Interrupt Mode when disabling Interupt Remapping Andrew Cooper
2011-06-14  9:02   ` Jan Beulich
2011-06-14  9:59     ` Andrew Cooper
2011-06-14 21:20     ` Kay, Allen M
2011-06-15  6:48       ` Jan Beulich
2011-06-15  7:45         ` Ian Campbell
2011-06-15 14:49           ` [PATCH 5 of 7] IOMMU VTD BUG: disable Extended Interrupt Mode when disabling Interupt Remapping [Reformatted] Andrew Cooper
2011-06-14 21:45   ` [PATCH 5 of 7] IOMMU VTD BUG: disable Extended Interrupt Mode when disabling Interupt Remapping Kay, Allen M
2011-06-13 17:02 ` [PATCH 6 of 7] IOMMU: add crash_shutdown iommu_op Andrew Cooper
2011-06-14 12:10   ` Keir Fraser
2011-06-15 12:50     ` Andrew Cooper
2011-06-14 22:15   ` Kay, Allen M
2011-06-15 13:06     ` Andrew Cooper
2011-06-15 16:39       ` Kay, Allen M
2011-06-15 15:00     ` [PATCH 6 of 7] IOMMU: add crash_shutdown iommu_op [Reformatted] Andrew Cooper
2011-06-13 17:02 ` [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing Andrew Cooper
2011-06-14 12:11   ` Keir Fraser
2011-06-14 13:05     ` Andrew Cooper
2011-06-13 18:15 ` [PATCH 0 of 7] Fix kexec in Xen (take 4) Keir Fraser
2011-06-16 13:05   ` Andrew Cooper
2011-06-16 13:13     ` Keir Fraser
  -- strict thread matches above, loose matches on Subject: below --
2011-06-15  9:26 [PATCH 7 of 7] KEXEC: correctly revert x2apic state when kexecing Jan Beulich
2011-06-15  9:40 ` Andrew Cooper
2011-06-15 10:14   ` Jan Beulich
2011-06-15 10:36     ` Andrew Cooper
2011-06-15 10:44       ` Jan Beulich
2011-06-15 11:59         ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.