Re: kexec -e in PVHVM guests (and in PV).

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: xen-devel@lists.xenproject.org, daniel.kiper@oracle.com
Subject: Re: kexec -e in PVHVM guests (and in PV).
Date: Tue, 1 Jul 2014 11:34:01 -0400
Message-ID: <20140701153401.GA26227@laptop.dumpdata.com>
In-Reply-To: <87ha317b05.fsf@vitty.brq.redhat.com>

[-- Attachment #1: Type: text/plain, Size: 9027 bytes --]

On Tue, Jul 01, 2014 at 10:12:58AM +0200, Vitaly Kuznetsov wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> writes:
> 
> > Hey, 
> >
> > I had on my todo list a patch from Olaf that shuffles
> > the shared_page to be in the 0xFE700000 addr (in the "gap"
> > with newer QEMU's) which unfortunately did not work when
> > migrating on 32-bit PVHVM guests on Xen 4.1.
> >
> > The commit is 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f
> > "xen PVonHVM: use E820_Reserved area for shared_info" and it
> > ended up being reverted. I dusted it off and I think I found
> > the original bug (and fixed it), but while digging into this
> > I discovered a ton more issues.
> >
> > A bit about the use case - the 'kexec -e' allows one to
> > restart the Linux kernel without a reboot. It is not a crash kernel
> > so it is just meant to restart and work, and then restart, etc.
> >
> > The 'kdump -c' (crash) is a different use case and I had not
> > thought much about it. But I think that all of the solutions
> > I am thinking of will make it work as well (so you could
> > do kexec-crash -> kexec-e -> kexec-e -> kexec-crash -> kexec-e, and
> > so on, if you wanted to).
> >
> > The problem I uncovered was that the memory region where
> > the new kernel would be executed had bits of memory changed - which
> > meant that the purgatory code in kexec would detect the SHA1SUM
> > being incorrect and not load. That led me to find out that
> > VCPUOP_register_vcpu_info was the culprit (well, the xen_vcpu_info
> > was being modified, and its PFN was in the 'new' kernel image area).
> >
> > Anyhow, the end result of that is that I think to get this
> > working we would need to have:
> >
> >  1). A call symmetrical to VCPUOP_register_vcpu_info, say
> >      VCPUOP_unregister_vcpu_info, which would for a provided vcpuid
> >      set 'vcpu_info' to the shared_info, and 'vcpu_info_mfn' to
> >      INVALID_MFN. Naturally the vcpu_id has to be down (_VPF_down).
> >      A prototype patch along with a naive implementation in
> >      the Linux kernel made this work surprisingly well!
> >
> >      On shutdown the Linux kernel had to call
> >      disable_nonboot_cpus(), which would bring all the AP CPUs down.
> >      Each AP CPU would call said hypercall. Also on each CPU
> >      bringup we would call this (that is, the BSP would make this
> >      call before bringing the AP CPUs up - on bootup paths it
> >      would result in nothing, while for a kexec -c type kernel
> >      it would allow us to use the CPUs).
> >
> >  2). Ditto for VCPUOP_register_runtime and
> >      VCPUOP_register_runstate_memory_area.  They would need a
> >      similar 'unregister' call with similar semantics as the
> >      one above.
> >
> >  3). The shared_info. Olaf's patch stuck the shared_info in the
> >      "gaps" of the E820 or the E820_RSRV region. But the recent patches
> >      for PCI passthrough are making me twitchy and I think we would
> >      need to parse the E820 and /proc/ioports (so the 'resource API'
> >      in the Linux kernel) to figure out a good place to stash this. Or on
> >      shutdown (kexec -e) balloon out the shared region (need to
> >      double check that this is possible in the first place).
> >
> >  4). Balloon memory. I am not really sure how to deal with that. The
> >      guest might have ballooned out tons of memory but the new kernel
> >      won't know about it until the xen/balloon driver kicks in and
> >      figures this out based on XenStore. Then it will try to balloon
> >      out.. and depending on its luck - balloon out memory that was
> >      already ballooned out, or not.  Also during the bootup of
> >      the 'kexec -e' kernel it might touch pages that had been
> >      ballooned out - and try to use them!
> >
> >  5). Events. Olaf had written code a long time ago that would poke the
> >      events to see if they were already in use (-EEXIST) and if so
> >      re-use them - it works great albeit there are tons of messages
> >      in the Xen ring buffer. The Linux patch I wrote did a
> >      'disable_nonboot_cpus' and also tore down the BSP interrupts - that
> >      meant that all of the events were nicely torn down. This all works
> >      with non-FIFO events.  David Vrabel says that to make this work
> >      (re-use or teardown and bring up) would be hard.
> >
> >  6). QEMU PnP type devices, such as 'serial', 'i8042', and 'rtc', end up
> >      going through the EVTCHNOP_bind_pirq. Somehow on the 'kexec -e'
> >      kernel we end up doing OK, but the devices don't work anymore.
> >      That is - the serial input does not accept any more input (but
> >      it can output alright).
> >
> >  7). Grants. Andrew Cooper hinted at this and a bit of experimentation
> >      shows that Xen hypervisor will indeed smack down any guest that
> >      tries to re-use its "old" grants. I am not even sure if the
> >      GNTTAB_setup call is returning the "old" grant frames.
> >      His suggestion was 'GNTTAB_reset' to well, reset everything.
> >
> > My thinking is that a lot of this code is shared with PV (and PVH),
> > so once this is fixed we could do full scale 'kexec -e' in a PV
> > (or PVH) type guest. Doing dom0 kexec -e would be an interesting
> > experiment :-(
> >
> > I am unable to fix this for Xen 4.5 and I am not sure what other
> > issues are present. If folks have some ideas or would like to
> > chime in (or even pick some of these up!) - please do respond.
> >
> 
> I have one more issue related to kexec/kdump topic I'm investigating
> right now. 

Woot!
> 
> When kdump happens and the new kernel boots we have the /proc/vmcore
> device. There is no problem in reading from this device, however
> makedumpfile reads it with mmap() by default and that doesn't work for
> me.
> 
> I figured out the following: there are several pages (2 in my case) in
> vmcore which are not ram. read_from_oldmem() calls special pfn_is_ram()
> check (which does HVMOP_get_mem_type and these pages are reported as
> HVMMEM_mmio_dm) and skips them. mmap_vmcore() doesn't have this check
> and we got these pages mapped. When we do memcpy() from them we get
> stuck in case we try reading more than 16 bytes (that's weird).

Ooh, would it make sense to expand 'mmap_vmcore' to have this check?
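
A minimal userspace sketch of that idea, assuming a hypothetical
predicate in place of the kernel's pfn_is_ram() hook. It only models
the control flow (walk a PFN range, map the contiguous RAM runs, skip
the pages reported as non-RAM); the names map_ram_runs and
pfn_is_ram_model and the two "bad" PFNs are made up for illustration
and are not taken from fs/proc/vmcore.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the pfn_is_ram() check. Two arbitrary PFNs
 * play the role of the pages reported as HVMMEM_mmio_dm. */
static bool pfn_is_ram_model(unsigned long pfn)
{
	return pfn != 0x102 && pfn != 0x105;
}

/* Walk [start, end) and report every contiguous run of RAM PFNs that
 * would be mapped. Non-RAM PFNs are skipped and counted.
 * Returns the number of skipped PFNs. */
static unsigned long map_ram_runs(unsigned long start, unsigned long end,
				  bool (*is_ram)(unsigned long))
{
	unsigned long pfn, run_start = start, skipped = 0;

	assert(start <= end);

	for (pfn = start; pfn < end; pfn++) {
		if (is_ram(pfn))
			continue;
		/* Flush the RAM run that ended just before this PFN. */
		if (pfn > run_start)
			printf("map  pfn 0x%lx-0x%lx\n", run_start, pfn - 1);
		printf("skip pfn 0x%lx (not RAM)\n", pfn);
		skipped++;
		run_start = pfn + 1;
	}
	/* Flush the trailing run, if any. */
	if (end > run_start)
		printf("map  pfn 0x%lx-0x%lx\n", run_start, end - 1);

	return skipped;
}
```

Calling map_ram_runs(0x100, 0x108, pfn_is_ram_model) from a small
main() walks eight PFNs and skips the two flagged ones. Whether a real
mmap path should leave holes or substitute an empty page, as the
quick-and-dirty patch mentioned below does, is a separate design choice
this sketch does not settle.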
> 
> I have a 'quick and dirty' patch which brings pfn_is_ram() check to
> mmap_vmcore() and replaces all HVMMEM_mmio_dm pages with an empty
> page. I'm going to investigate a bit more here.

Ok.
> 
> I can try looking at something from the above as well. E.g. I was able
> to solve no.6 with the following (yes, dirty again) patch:

Yeey! That would be fantastic.

Heh. I was thinking something similar, albeit to do this also
from the 'xen_kexec_shutdown' path - in case we are booting into
a kernel that does not have these patches.

See the four attached patches - two for Xen, and two for Linux.
They are very much RFC and I believe they are still buggy. If you
want to try them out and improve, please be my guest.
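
For orientation, here is a rough userspace model of the semantics the
first Xen patch aims for. It is a sketch, not hypervisor code: every
model_* name is hypothetical, MODEL_LEGACY_MAX_VCPUS is an assumed
stand-in for XEN_LEGACY_MAX_VCPUS, and setting the pointer to NULL is a
simplification of whatever unmap_vcpu_info() really leaves behind:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LEGACY_MAX_VCPUS 32	/* assumed stand-in for XEN_LEGACY_MAX_VCPUS */

struct model_vcpu_info { int pending; };

struct model_domain {
	/* Default per-vCPU slots, as found inside the shared_info page. */
	struct model_vcpu_info shared_slots[MODEL_LEGACY_MAX_VCPUS];
};

struct model_vcpu {
	int id;
	bool down;				/* models _VPF_down */
	struct model_vcpu_info *vcpu_info;	/* where updates are written */
	struct model_domain *domain;
};

/* Models VCPUOP_register_vcpu_info: move the area to a guest-chosen page. */
static int model_register_vcpu_info(struct model_vcpu *v,
				    struct model_vcpu_info *where)
{
	if (v == NULL || where == NULL)
		return -EINVAL;
	v->vcpu_info = where;
	return 0;
}

/* Models the proposed VCPUOP_reset_vcpu_info: only legal while the vCPU
 * is down; afterwards updates go to the default shared_info slot again,
 * so the old guest page is no longer written to. */
static int model_reset_vcpu_info(struct model_vcpu *v)
{
	assert(v != NULL && v->domain != NULL);

	if (!v->down)
		return -EFAULT;		/* same error code as the RFC patch */

	v->vcpu_info = NULL;		/* simplified unmap_vcpu_info() */
	if (v->id >= 0 && v->id < MODEL_LEGACY_MAX_VCPUS)
		v->vcpu_info = &v->domain->shared_slots[v->id];
	return 0;
}
```

The point of the model is the ordering constraint: a reset on a running
vCPU is refused, which is why the Linux side takes the CPUs down before
issuing the call.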

Thank you for your interest!
> 
> commit 23a224c4ad664dfc6fe672f74f83549387efebda
> Author: Vitaly Kuznetsov <vkuznets@redhat.com>
> Date:   Wed Jun 18 14:12:15 2014 +0200
> 
>     wip: unmap all pirqs
>     
>     Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
> index dfa12a4..16af7e4 100644
> --- a/drivers/xen/events/events_base.c
> +++ b/drivers/xen/events/events_base.c
> @@ -1658,6 +1719,35 @@ void xen_callback_vector(void) {}
>  static bool fifo_events = true;
>  module_param(fifo_events, bool, 0);
>  
> +static void unmap_all_pirqs(void)
> +{
> +	struct evtchn_status status;
> +	int port, rc = -ENOENT;
> +	struct physdev_unmap_pirq unmap_irq;
> +	struct evtchn_close close;
> +
> +	memset(&status, 0, sizeof(status));
> +	for (port = 0; port < xen_evtchn_max_channels(); port++) {
> +		status.dom = DOMID_SELF;
> +		status.port = port;
> +		rc = HYPERVISOR_event_channel_op(EVTCHNOP_status, &status);
> +		if (rc < 0)
> +			continue;
> +		pr_warn("unmap_all_pirqs: port: %d, status: %d\n", status.port, status.status);
> +		if (status.status == EVTCHNSTAT_pirq) {
> +			close.port = port;
> +			if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
> +				pr_warn("EVTCHNSTAT_pirq: failed to close event channel %d\n", port);
> +			unmap_irq.pirq = status.u.pirq;
> +			unmap_irq.domid = DOMID_SELF;
> +			pr_warn("unmapping previously mapped pirq %d\n", unmap_irq.pirq);
> +			if (HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq) != 0)
> +				pr_warn("failed to unmap pirq %d\n", unmap_irq.pirq);
> +		}
> +	}
> +}
> +
> +
>  void __init xen_init_IRQ(void)
>  {
>  	int ret = -EINVAL;
> @@ -1686,6 +1776,8 @@ void __init xen_init_IRQ(void)
>  		xen_callback_vector();
>  
>  	if (xen_hvm_domain()) {
> +		unmap_all_pirqs();
> +
>  		native_init_IRQ();
>  		/* pci_xen_hvm_init must be called after native_init_IRQ so that
>  		 * __acpi_register_gsi can point at the right function */
> 
> -- 
>   Vitaly

[-- Attachment #2: 0001-VCPUOP_reset_vcpu_info.patch --]
[-- Type: text/plain, Size: 2281 bytes --]

>From ad6647cc24a09e872b244768ba2dcd7a46d171a9 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 27 Jun 2014 11:09:35 -0400
Subject: [PATCH 1/2] VCPUOP_reset_vcpu_info

Prototype.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/hvm/hvm.c    |    1 +
 xen/common/domain.c       |   16 ++++++++++++++++
 xen/include/public/vcpu.h |    6 ++++++
 3 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ef2411c..065abb4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4628,6 +4628,7 @@ static long hvm_vcpu_op(
     case VCPUOP_set_singleshot_timer:
     case VCPUOP_stop_singleshot_timer:
     case VCPUOP_register_vcpu_info:
+    case VCPUOP_reset_vcpu_info:
     case VCPUOP_register_vcpu_time_memory_area:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
diff --git a/xen/common/domain.c b/xen/common/domain.c
index c3a576e..f4536af 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1196,6 +1196,22 @@ long do_vcpu_op(int cmd, int vcpuid, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case VCPUOP_reset_vcpu_info:
+    {
+        struct domain *d = v->domain;
+
+        if (!test_bit(_VPF_down, &v->pause_flags))
+            return -EFAULT;
+
+        domain_lock(d);
+        unmap_vcpu_info(v);
+        if ( vcpuid < XEN_LEGACY_MAX_VCPUS )
+            v->vcpu_info = (vcpu_info_t *)&shared_info(d, vcpu_info[vcpuid]);
+        domain_unlock(d);
+        rc = 0;
+        break;
+    }
+
     case VCPUOP_register_runstate_memory_area:
     {
         struct vcpu_register_runstate_memory_area area;
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index e888daf..c0283bc 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -227,6 +227,12 @@ struct vcpu_register_time_memory_area {
 typedef struct vcpu_register_time_memory_area vcpu_register_time_memory_area_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_register_time_memory_area_t);
 
+/*
+ * Reset all of the vcpu_info information from their previous location
+ * to the default one used at bootup.
+ */
+#define VCPUOP_reset_vcpu_info   14
+
 #endif /* __XEN_PUBLIC_VCPU_H__ */
 
 /*
-- 
1.7.7.6


[-- Attachment #3: 0002-VCPU_reset-VCPU_up-VCPU_is_up-etc-for-HVM.patch --]
[-- Type: text/plain, Size: 880 bytes --]

>From 4e3113c34e2faa11bbb80026d437601ad2a79089 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 27 Jun 2014 17:31:14 -0400
Subject: [PATCH 2/2] VCPU_reset, VCPU_up, VCPU_is_up, etc for HVM.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 xen/arch/x86/hvm/hvm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 065abb4..c446a79 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4629,6 +4629,9 @@ static long hvm_vcpu_op(
     case VCPUOP_stop_singleshot_timer:
     case VCPUOP_register_vcpu_info:
     case VCPUOP_reset_vcpu_info:
+    case VCPUOP_is_up:
+    case VCPUOP_up:
+    case VCPUOP_down:
     case VCPUOP_register_vcpu_time_memory_area:
         rc = do_vcpu_op(cmd, vcpuid, arg);
         break;
-- 
1.7.7.6


[-- Attachment #4: 0001-xen-PVonHVM-use-E820_Reserved-area-for-shared_info.patch --]
[-- Type: text/plain, Size: 6147 bytes --]

>From 952f906faa699106a43cbc90a711f0d82f8b0cf7 Mon Sep 17 00:00:00 2001
From: Olaf Hering <olaf@aepfle.de>
Date: Thu, 1 Nov 2012 22:02:30 +0100
Subject: [PATCH 1/2] xen PVonHVM: use E820_Reserved area for shared_info

This is a respin of 00e37bdb0113a98408de42db85be002f21dbffd3
("xen PVonHVM: move shared_info to MMIO before kexec").

Currently kexec in a PVonHVM guest fails with a triple fault because the
new kernel overwrites the shared info page. The exact failure depends on
the size of the kernel image. This patch moves the pfn from RAM into an
E820 reserved memory area.

The pfn containing the shared_info is located somewhere in RAM. This will
cause trouble if the current kernel is doing a kexec boot into a new
kernel. The new kernel (and its startup code) can not know where the pfn
is, so it can not reserve the page. The hypervisor will continue to update
the pfn, and as a result memory corruption occurs in the new kernel.

The toolstack marks the memory area FC000000-FFFFFFFF as reserved in the
E820 map. Within that range newer toolstacks (4.3+) will keep 1MB
starting from FE700000 as reserved for guest use. Older Xen4 toolstacks
will usually not allocate areas up to FE700000, so FE700000 is expected
to work also with older toolstacks.

In Xen3 there is no reserved area at a fixed location. If the guest is
started on such old hosts the shared_info page will be placed in RAM. As
a result kexec can not be used.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit 9d02b43dee0d7fb18dfb13a00915550b1a3daa9f)

[On resume we need to reset the xen_vcpu_info, which the original
patch did not do]
---
 arch/x86/xen/enlighten.c |   74 ++++++++++++++++++++++++++++++++++-----------
 arch/x86/xen/suspend.c   |    2 +-
 arch/x86/xen/xen-ops.h   |    2 +-
 3 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index c34bfc4..7e1b951 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1722,23 +1722,29 @@ asmlinkage __visible void __init xen_start_kernel(void)
 #endif
 }
 
-void __ref xen_hvm_init_shared_info(void)
+#ifdef CONFIG_XEN_PVHVM
+#define HVM_SHARED_INFO_ADDR 0xFE700000UL
+static struct shared_info *xen_hvm_shared_info;
+static unsigned long xen_hvm_sip_phys;
+static int xen_major, xen_minor;
+
+static void xen_hvm_connect_shared_info(unsigned long pfn)
 {
-	int cpu;
 	struct xen_add_to_physmap xatp;
-	static struct shared_info *shared_info_page = 0;
 
-	if (!shared_info_page)
-		shared_info_page = (struct shared_info *)
-			extend_brk(PAGE_SIZE, PAGE_SIZE);
 	xatp.domid = DOMID_SELF;
 	xatp.idx = 0;
 	xatp.space = XENMAPSPACE_shared_info;
-	xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
+	xatp.gpfn = pfn;
 	if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
 		BUG();
 
-	HYPERVISOR_shared_info = (struct shared_info *)shared_info_page;
+}
+static void __init xen_hvm_set_shared_info(struct shared_info *sip)
+{
+	int cpu;
+
+	HYPERVISOR_shared_info = sip;
 
 	/* xen_vcpu is a pointer to the vcpu_info struct in the shared_info
 	 * page, we use it in the event channel upcall and in some pvclock
@@ -1756,20 +1762,39 @@ void __ref xen_hvm_init_shared_info(void)
 	}
 }
 
-#ifdef CONFIG_XEN_PVHVM
+/* Reconnect the shared_info pfn to a (new) mfn */
+void xen_hvm_resume_shared_info(void)
+{
+	xen_hvm_connect_shared_info(xen_hvm_sip_phys >> PAGE_SHIFT);
+	xen_hvm_set_shared_info(xen_hvm_shared_info);
+}
+
+/* Xen tools prior to Xen 4 do not provide a E820_Reserved area for guest usage.
+ * On these old tools the shared info page will be placed in E820_Ram.
+ * Xen 4 provides a E820_Reserved area at 0xFC000000, and this code expects
+ * that nothing is mapped up to HVM_SHARED_INFO_ADDR.
+ * Xen 4.3+ provides an explicit 1MB area at HVM_SHARED_INFO_ADDR which is used
+ * here for the shared info page. */
+static void __init xen_hvm_init_shared_info(void)
+{
+	if (xen_major < 4) {
+		xen_hvm_shared_info = extend_brk(PAGE_SIZE, PAGE_SIZE);
+		xen_hvm_sip_phys = __pa(xen_hvm_shared_info);
+	} else {
+		xen_hvm_sip_phys = HVM_SHARED_INFO_ADDR;
+		set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_hvm_sip_phys);
+		xen_hvm_shared_info =
+		(struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
+	}
+	xen_hvm_resume_shared_info();
+}
+
 static void __init init_hvm_pv_info(void)
 {
-	int major, minor;
-	uint32_t eax, ebx, ecx, edx, pages, msr, base;
+	uint32_t  ecx, edx, pages, msr, base;
 	u64 pfn;
 
 	base = xen_cpuid_base();
-	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
-
-	major = eax >> 16;
-	minor = eax & 0xffff;
-	printk(KERN_INFO "Xen version %d.%d.\n", major, minor);
-
 	cpuid(base + 2, &pages, &msr, &ecx, &edx);
 
 	pfn = __pa(hypercall_page);
@@ -1824,10 +1849,23 @@ static void __init xen_hvm_guest_init(void)
 
 static uint32_t __init xen_hvm_platform(void)
 {
+	uint32_t eax, ebx, ecx, edx, base;
+
 	if (xen_pv_domain())
 		return 0;
 
-	return xen_cpuid_base();
+	base = xen_cpuid_base();
+	if (!base)
+		return 0;
+
+	cpuid(base + 1, &eax, &ebx, &ecx, &edx);
+
+	xen_major = eax >> 16;
+	xen_minor = eax & 0xffff;
+
+	printk(KERN_INFO "Xen version %d.%d.\n", xen_major, xen_minor);
+
+	return 1;
 }
 
 bool xen_hvm_need_lapic(void)
diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 45329c8..ae8a00c 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -30,7 +30,7 @@ void xen_arch_hvm_post_suspend(int suspend_cancelled)
 {
 #ifdef CONFIG_XEN_PVHVM
 	int cpu;
-	xen_hvm_init_shared_info();
+	xen_hvm_resume_shared_info();
 	xen_callback_vector();
 	xen_unplug_emulated_devices();
 	if (xen_feature(XENFEAT_hvm_safe_pvclock)) {
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 1cb6f4c..002c3e9 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -40,7 +40,7 @@ void xen_enable_syscall(void);
 void xen_vcpu_restore(void);
 
 void xen_callback_vector(void);
-void xen_hvm_init_shared_info(void);
+void xen_hvm_resume_shared_info(void);
 void xen_unplug_emulated_devices(void);
 
 void __init xen_build_dynamic_phys_to_machine(void);
-- 
1.7.7.6


[-- Attachment #5: 0002-RFC-VCPU_reset_cpu_info.patch --]
[-- Type: text/plain, Size: 4940 bytes --]

>From 1b38e07f98241d7eb7901f9cbc40dded7ccc5bb4 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 27 Jun 2014 10:47:24 -0400
Subject: [PATCH 2/2] RFC: VCPU_reset_cpu_info

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/enlighten.c     |   29 ++++++++++++++++++++++++++++-
 arch/x86/xen/smp.c           |   26 +++++++++++++++++++++++++-
 arch/x86/xen/smp.h           |    1 +
 arch/x86/xen/xen-ops.h       |    1 +
 include/xen/interface/vcpu.h |    2 ++
 5 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 7e1b951..303883e 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -215,6 +215,11 @@ static void xen_vcpu_setup(int cpu)
 	   hypervisor has no unregister variant and this hypercall does not
 	   allow to over-write info.mfn and info.offset.
 	 */
+	err = HYPERVISOR_vcpu_op(VCPUOP_reset_vcpu_info, cpu, NULL);
+	if (err)
+		printk(KERN_DEBUG "%s: VCPUOP_reset_vcpu_info for CPU%d failed: %d\n",
+			__func__, cpu, err);
+
 	err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
 
 	if (err) {
@@ -228,6 +233,21 @@ static void xen_vcpu_setup(int cpu)
 	}
 }
 
+void xen_teardown_vcpu_setup(int cpu)
+{
+	int err;
+
+	if (!have_vcpu_info_placement)
+		return;
+
+	err = HYPERVISOR_vcpu_op(VCPUOP_reset_vcpu_info, cpu, NULL);
+	if (err) {
+		xen_raw_printk("%s: VCPUOP_reset_vcpu_info rc: %d\n", __func__, err);
+		return;
+	}
+	if (cpu < MAX_VIRT_CPUS)
+		per_cpu(xen_vcpu, cpu) = &HYPERVISOR_shared_info->vcpu_info[cpu];
+}
 /*
  * On restore, set the vcpu placement up again.
  * If it fails, then we're in a bad state, since
@@ -1828,7 +1848,11 @@ static int xen_hvm_cpu_notify(struct notifier_block *self, unsigned long action,
 static struct notifier_block xen_hvm_cpu_notifier = {
 	.notifier_call	= xen_hvm_cpu_notify,
 };
-
+static void xen_pvhvm_kexec_shutdown(void)
+{
+	xen_kexec_shutdown();
+	native_machine_shutdown();
+}
 static void __init xen_hvm_guest_init(void)
 {
 	init_hvm_pv_info();
@@ -1845,6 +1869,9 @@ static void __init xen_hvm_guest_init(void)
 	x86_init.irqs.intr_init = xen_init_IRQ;
 	xen_hvm_init_time_ops();
 	xen_hvm_init_mmu_ops();
+#ifdef CONFIG_KEXEC
+	machine_ops.shutdown = xen_pvhvm_kexec_shutdown;
+#endif
 }
 
 static uint32_t __init xen_hvm_platform(void)
diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 7005974..984a955 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -18,6 +18,7 @@
 #include <linux/smp.h>
 #include <linux/irq_work.h>
 #include <linux/tick.h>
+#include <linux/kexec.h>
 
 #include <asm/paravirt.h>
 #include <asm/desc.h>
@@ -494,7 +495,6 @@ static int xen_cpu_disable(void)
 	load_cr3(swapper_pg_dir);
 	return 0;
 }
-
 static void xen_cpu_die(unsigned int cpu)
 {
 	while (xen_pv_domain() && HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL)) {
@@ -504,6 +504,8 @@ static void xen_cpu_die(unsigned int cpu)
 	xen_smp_intr_free(cpu);
 	xen_uninit_lock_cpu(cpu);
 	xen_teardown_timer(cpu);
+
+	xen_teardown_vcpu_setup(cpu);
 }
 
 static void xen_play_dead(void) /* used only with HOTPLUG_CPU */
@@ -762,6 +764,28 @@ static void xen_hvm_cpu_die(unsigned int cpu)
 	native_cpu_die(cpu);
 }
 
+void xen_kexec_shutdown(void)
+{
+#ifdef CONFIG_KEXEC
+	int cpu;
+
+	if (!kexec_in_progress)
+		return;
+
+	cpu = smp_processor_id();
+
+	gnttab_suspend();
+	xen_arch_pre_suspend();
+
+	/* Stop all CPUs... */
+	disable_nonboot_cpus();
+
+	/* Bring down the IPIs, PIRQs, on the BSP. */
+	xen_raw_printk("CPU0 down.\n");
+	xen_cpu_die(cpu);
+	xen_raw_printk("CPU0 down done.\n");
+#endif
+}
 void __init xen_hvm_smp_init(void)
 {
 	if (!xen_have_vector_callback)
diff --git a/arch/x86/xen/smp.h b/arch/x86/xen/smp.h
index c7c2d89..1af0493 100644
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -8,4 +8,5 @@ extern void xen_send_IPI_allbutself(int vector);
 extern void xen_send_IPI_all(int vector);
 extern void xen_send_IPI_self(int vector);
 
+extern void xen_kexec_shutdown(void);
 #endif
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 002c3e9..978b79b 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -50,6 +50,7 @@ void xen_init_irq_ops(void);
 void xen_setup_timer(int cpu);
 void xen_setup_runstate_info(int cpu);
 void xen_teardown_timer(int cpu);
+void xen_teardown_vcpu_setup(int cpu);
 cycle_t xen_clocksource_read(void);
 void xen_setup_cpu_clockevents(void);
 void __init xen_init_time_ops(void);
diff --git a/include/xen/interface/vcpu.h b/include/xen/interface/vcpu.h
index b05288c..3e5e6e9 100644
--- a/include/xen/interface/vcpu.h
+++ b/include/xen/interface/vcpu.h
@@ -172,4 +172,6 @@ DEFINE_GUEST_HANDLE_STRUCT(vcpu_register_vcpu_info);
 
 /* Send an NMI to the specified VCPU. @extra_arg == NULL. */
 #define VCPUOP_send_nmi             11
+
+#define VCPUOP_reset_vcpu_info	    14
 #endif /* __XEN_PUBLIC_VCPU_H__ */
-- 
1.7.7.6


[-- Attachment #6: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Thread overview: 6+ messages
2014-06-30 15:36 kexec -e in PVHVM guests (and in PV) Konrad Rzeszutek Wilk
2014-06-30 16:05 ` David Vrabel
2014-06-30 16:21   ` Konrad Rzeszutek Wilk
2014-06-30 16:28 ` Olaf Hering
2014-07-01  8:12 ` Vitaly Kuznetsov
2014-07-01 15:34   ` Konrad Rzeszutek Wilk [this message]
