[RFC] KEXEC: allocate crash note buffers at boot time v2

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Keir Fraser <keir.xen@gmail.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Jan Beulich <JBeulich@suse.com>
Subject: [RFC] KEXEC: allocate crash note buffers at boot time v2
Date: Wed, 30 Nov 2011 17:24:37 +0000	[thread overview]
Message-ID: <4ED666D5.5060603@citrix.com> (raw)
In-Reply-To: <4ED62C50.7060204@citrix.com>

[-- Attachment #1: Type: text/plain, Size: 2564 bytes --]

Hello,

Presented is version 2 of this patch, which uses cpu hotplug
notifications to be rather safer when allocating buffers.

It occurs to me that there is a bit of an API problem with it comes to
combining a crashdump kernel with potential hotplug events.

If dom0 does not get notification of new/removed pcpus, and if it
doesn't re-evaluate /proc/iomem by recalling things like
KEXEC_CMD_get_range, then subsequent calls to /sbin/kexec -p will bundle
up stale information into the kdump magic bundle.

Even if dom0 does get a notification of pcpu hotplug events, and it
updates its iomem map, /sbin/kexec would still need to be called to
bundle the new information into the kdump magic bundle.  Possibly doable
off a udev CPU hotplug event.

Even if /sbin/kexec gets called after hotplug events, a crash before the
new kexec magic bundle has been loaded will still result in the old
bundle being used, with its stale information regarding the position and
number of crash notes.

Sadly, I don't see any easy or sensible solution to these problems. 
However, it is probably worth knowing as a potential limitation.

I have worked around this problem by never deallocating crash notes, so
even if information is stale as to the number of crash notes to be
expected, the stale physical addresses still point to allocated notes
buffers which have been partially initialized to have sensible note
headers in.  This unfortunately means that an offlined cpu which was
present at boot time will have a notes section with zero'd data.  It
also means that a cpu onlined after boot which crashes will not have its
register state presented to the kdump environment.

I would like to hope that both of these cases are unlikely, but again,
it is still a potential limitation.

The cpu crash notes themselves (PRSTATUS and XEN_ELFNOTE_CRASH_REGS)
don't contain a reference to which pcpu they represent.  This is
inferred by /sbin/kexec from the order in which they appear in dom0's
/proc/iomem, meaning that the kdump environments idea of which pcpu is
which might differ from Xen's. This depending on whether Xen allocates
the notes buffer in ascending order, whether dom0 sorts memory addresses
reported, and, as it currently gets it correct, whether either of these
behaviors change in the future.

This final issue is within my ability to fix and I will do so in the
near future, along with some other additions I already need to make to
the per cpu crash notes.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

[-- Attachment #2: KEXEC-setup-crash-notes-during-boot.patch --]
[-- Type: text/x-patch, Size: 8621 bytes --]

# HG changeset patch
# Parent b5ceec1ccccad6e79053a80820132303ff1e136f
KEXEC: Allocate crash notes on boot v2

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
question.

Although it certainly works in general, there are a few edge case problems:

1) There appears to be no guarentee that dom0 will make this hypercall
   for each pcpu on the system.  There appears to be variable
   behaviour depending on how many cpus dom0 is compiled for, and how
   many vcpus Xen gives to dom0.

2) The allocation of these buffers occur at the whim of dom0.  While
   this is typically very early on dom0 boot, but not guarenteed.

3) It is possible (although not sensible) for a crash
   kernel to be loaded without these (or some of these) hypercalls
   being made. Under these circumstances, a crash would cause the
   crash note code path will suffer a slew of null pointer deferences.

4) If the hypercalls are made after late in the day, it is possible
   for the hypercall to return -ENOMEM.  As code tends to be more
   fragile once memory is enhausted, the likelyhood of us needing the
   crash kernel is greater.

In addition, my forthcoming code to support 32bit kdump kernels on
64bit Xen on large (>64GB) boxes will require some guarentees as to
where the crash note buffers are actually allocated in physical
memory.  This is far easier to sort out at boot time, rather than
after dom0 has been booted and potentially using the physical memory
required.

Therefore, allocate the crash note buffers at boot time.

Changes since v1:

 *  Use cpu hotplug notifiers to handle allocating of the notes
    buffers rather than assuming the boot state of cpus will be the
    same as the crash state.

 *  Move crash_notes from being per_cpu.  This is because the kdump
    kernel elf binary put in the crash area is hard coded to physical
    addresses which the dom0 kernel typically obtains at boot time.
    If a cpu is offlined, its buffer should not be deallocated because
    the kdump kernel would read junk when trying to get the crash
    notes.  Similarly, the same problem would occur if the cpu was
    re-onlined later and its crash notes buffer was allocated elsewhere.

 *  Only attempt to allocate buffers if a crash area has been
    specified.  Else, allocating crash note buffers is a waste of
    space.  Along with this, change the test in kexec_get_cpu to
    return -EINVAL if no buffers have been allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r df7cec2c6c03 xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -25,13 +25,14 @@
 #include <xen/kexec.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
+#include <xen/cpu.h>
 #ifdef CONFIG_COMPAT
 #include <compat/kexec.h>
 #endif

 bool_t kexecing = FALSE;

-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
+static void * crash_notes[NR_CPUS];

 static Elf_Note *xen_crash_note;

@@ -165,7 +166,7 @@ static void one_cpu_only(void)
 void kexec_crash_save_cpu(void)
 {
     int cpu = smp_processor_id();
-    Elf_Note *note = per_cpu(crash_notes, cpu);
+    Elf_Note *note = crash_notes[cpu];
     ELF_Prstatus *prstatus;
     crash_xen_core_t *xencore;

@@ -245,25 +246,6 @@ static long kexec_reboot(void *_image)
     return 0;
 }

-static void do_crashdump_trigger(unsigned char key)
-{
-    printk("'%c' pressed -> triggering crashdump\n", key);
-    kexec_crash();
-    printk(" * no crash kernel loaded!\n");
-}
-
-static struct keyhandler crashdump_trigger_keyhandler = {
-    .u.fn = do_crashdump_trigger,
-    .desc = "trigger a crashdump"
-};
-
-static __init int register_crashdump_trigger(void)
-{
-    register_keyhandler('C', &crashdump_trigger_keyhandler);
-    return 0;
-}
-__initcall(register_crashdump_trigger);
-
 static void setup_note(Elf_Note *n, const char *name, int type, int descsz)
 {
     int l = strlen(name) + 1;
@@ -280,6 +262,110 @@ static int sizeof_note(const char *name,
             ELFNOTE_ALIGN(descsz));
 }

+/* Allocate a crash note buffer for a newly onlined cpu. */
+static int kexec_init_cpu_notes(const int cpu)
+{
+    Elf_Note * note;
+    int nr_bytes = 0;
+
+    BUG_ON( cpu < 0 || cpu >= NR_CPUS );
+
+    /* If already allocated, nothing to do. */
+    if ( crash_notes[cpu] )
+        return 0;
+
+    /* All CPUs present a PRSTATUS and crash_xen_core note. */
+    nr_bytes =
+        sizeof_note("CORE", sizeof(ELF_Prstatus)) +
+        sizeof_note("Xen", sizeof(crash_xen_core_t));
+
+    /* CPU0 also presents the crash_xen_info note. */
+    if ( 0 == cpu )
+        nr_bytes = nr_bytes +
+            sizeof_note("Xen", sizeof(crash_xen_info_t));
+
+    note = xmalloc_bytes(nr_bytes);
+    if ( ! note )
+        /* Ideally, this would be -ENOMEM.  However, there are more problems
+         * assocated with trying to deal with -ENOMEM sensibly than just
+         * pretending that the crash note area doesn't exist. */
+        return 0;
+
+    crash_notes[cpu] = note;
+
+    /* Setup CORE note. */
+    setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
+    note = ELFNOTE_NEXT(note);
+
+    /* Setup Xen CORE note. */
+    setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
+               sizeof(crash_xen_core_t));
+
+    if ( 0 == cpu )
+    {
+        /* Set up Xen Crash Info note. */
+        xen_crash_note = note = ELFNOTE_NEXT(note);
+        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
+                   sizeof(crash_xen_info_t));
+    }
+
+    return 0;
+}
+
+static void do_crashdump_trigger(unsigned char key)
+{
+    printk("'%c' pressed -> triggering crashdump\n", key);
+    kexec_crash();
+    printk(" * no crash kernel loaded!\n");
+}
+
+static struct keyhandler crashdump_trigger_keyhandler = {
+    .u.fn = do_crashdump_trigger,
+    .desc = "trigger a crashdump"
+};
+
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned int cpu = (unsigned long)hcpu;
+
+    /* Only hook on CPU_UP_PREPARE because once a crash_note has been reported
+     * to dom0, it must keep it around in case of a crash, as the crash kernel
+     * will be hard coded to the original physical address reported. */
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        kexec_init_cpu_notes(cpu);
+        break;
+    default:
+        break;
+    }
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
+static int __init kexec_init(void)
+{
+    void *cpu = (void *)(long)smp_processor_id();
+
+    register_keyhandler('C', &crashdump_trigger_keyhandler);
+
+    /* If no crash area, no need to allocate space for notes. */
+    if ( 0 == kexec_crash_area.size )
+        return 0;
+
+    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
+    register_cpu_notifier(&cpu_nfb);
+    return 0;
+}
+/* The reason for this to be a presmp_initcall as opposed to a regular
+ * __initcall is to allow the setup of the cpu hotplug handler before APs are
+ * brought up. */
+presmp_initcall(kexec_init);
+
 static int kexec_get_reserve(xen_kexec_range_t *range)
 {
     if ( kexec_crash_area.size > 0 && kexec_crash_area.start > 0) {
@@ -296,7 +382,7 @@ static int kexec_get_cpu(xen_kexec_range
     int nr = range->nr;
     int nr_bytes = 0;

-    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) )
+    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) || !crash_notes[nr] )
         return -EINVAL;

     nr_bytes += sizeof_note("CORE", sizeof(ELF_Prstatus));
@@ -306,31 +392,7 @@ static int kexec_get_cpu(xen_kexec_range
     if ( nr == 0 )
         nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));

-    if ( per_cpu(crash_notes, nr) == NULL )
-    {
-        Elf_Note *note;
-
-        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
-
-        if ( note == NULL )
-            return -ENOMEM;
-
-        /* Setup CORE note. */
-        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
-
-        /* Setup Xen CORE note. */
-        note = ELFNOTE_NEXT(note);
-        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, sizeof(crash_xen_core_t));
-
-        if (nr == 0)
-        {
-            /* Setup system wide Xen info note. */
-            xen_crash_note = note = ELFNOTE_NEXT(note);
-            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, sizeof(crash_xen_info_t));
-        }
-    }
-
-    range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
+    range->start = __pa((unsigned long)crash_notes[nr]);
     range->size = nr_bytes;
     return 0;
 }

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

next prev parent reply	other threads:[~2011-11-30 17:24 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-29 18:56 [RFC] KEXEC: allocate crash note buffers at boot time Andrew Cooper
2011-11-29 11:19 ` Keir Fraser
2011-11-30 13:14   ` Andrew Cooper
2011-11-30 17:24     ` Andrew Cooper [this message]
2011-12-01  9:08       ` [RFC] KEXEC: allocate crash note buffers at boot time v2 Jan Beulich
2011-12-01  9:49         ` Andrew Cooper
2011-12-01 10:01           ` Jan Beulich
2011-12-01 12:29             ` [RFC] KEXEC: allocate crash note buffers at boot time v3 Andrew Cooper
2011-12-01 12:56               ` Jan Beulich
2011-12-01  5:20                 ` Keir Fraser
2011-12-01 14:00                   ` Andrew Cooper
2011-12-01 13:59                 ` Andrew Cooper
2011-12-01 15:14                   ` Jan Beulich
2011-12-01 15:02                 ` Andrew Cooper
2011-12-01 15:15                   ` Jan Beulich
2011-12-01 17:14                     ` [RFC] KEXEC: allocate crash note buffers at boot time v4 Andrew Cooper
2011-12-02  8:02                       ` Jan Beulich
2011-12-02 12:33                         ` Andrew Cooper
2011-12-02 15:19                           ` KEXEC: allocate crash note buffers at boot time v5 Andrew Cooper
2011-12-02 16:04                             ` Jan Beulich
2011-12-02 16:10                               ` KEXEC: allocate crash note buffers at boot time v Andrew Cooper
2011-11-30  9:20 ` [RFC] KEXEC: allocate crash note buffers at boot time Jan Beulich
2011-11-30 14:01   ` Andrew Cooper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ED666D5.5060603@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=keir.xen@gmail.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.