All of lore.kernel.org
 help / color / mirror / Atom feed
* KEXEC: allocate crash note buffers at boot time v5
  2011-12-02 12:33                       ` Andrew Cooper
@ 2011-12-02 15:19                         ` Andrew Cooper
  2011-12-02 16:04                           ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2011-12-02 15:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel@lists.xensource.com

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

Here is v5.

I have tested quite carefully the impact of changing the error
conditions in kexec_get_cpu().  dom0 is happy to accept ERANGE as well
as EINVAL, and ERANGE is a more sensible response.  Moreover, this new
scheme (ignoring for a moment ENOMEM) allows for all nr_cpu_ids to give
note ranges to dom0 whether they are up or not at the moment.   It is
better to have more notes provided with some of them potentially not
filled in, than to miss a crashed CPU register state because it was
offline when the crash kernel was loaded.

All this testing has been done via backport to 4.1.2.  Minimal changes
were necessary, such as the lack of xzalloc_array() and NR_CPUS instead
of nr_cpu_ids, but it was otherwise functionally identical.

I am not sure whether to recomend this for backport into 4.1-testing or
not.  It does make the crashkernel more likely to get crash notes,
although in reality, it was only special circumstances which caused
problems.  If you consider it worth backporting, I can provide my
already-backported patch.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


[-- Attachment #2: KEXEC-setup-crash-notes-during-boot.patch --]
[-- Type: text/x-patch, Size: 11725 bytes --]

# HG changeset patch
# Parent b5ceec1ccccad6e79053a80820132303ff1e136f
KEXEC: Allocate crash notes on boot v5

Currently, the buffers for crash notes are allocated per CPU when a
KEXEC_CMD_kexec_get_range hypercall is made, referencing the CPU in
question.

Although it certainly works in general, there are a few edge case problems:

1) There appears to be no guarentee that dom0 will make this hypercall
   for each pcpu on the system.  There appears to be variable
   behaviour depending on how many cpus dom0 is compiled for, and how
   many vcpus Xen gives to dom0.

2) The allocation of these buffers occur at the whim of dom0.  While
   this is typically very early on dom0 boot, but not guarenteed.

3) It is possible (although not sensible) for a crash
   kernel to be loaded without these (or some of these) hypercalls
   being made. Under these circumstances, a crash would cause the
   crash note code path will suffer a slew of null pointer deferences.

4) If the hypercalls are made after late in the day, it is possible
   for the hypercall to return -ENOMEM.  As code tends to be more
   fragile once memory is enhausted, the likelyhood of us needing the
   crash kernel is greater.

In addition, my forthcoming code to support 32bit kdump kernels on
64bit Xen on large (>64GB) boxes will require some guarentees as to
where the crash note buffers are actually allocated in physical
memory.  This is far easier to sort out at boot time, rather than
after dom0 has been booted and potentially using the physical memory
required.

Therefore, allocate the crash note buffers at boot time.

Changes since v4:

 *  Replace the current cpu crash note scheme of using void pointers
    and hand calculating the size each time is needed, by a range
    structure containing a pointer and a size.  This removes duplicate
    times where the size is calculated.
 *  Tweak kexec_get_cpu().  Don't fail if a cpu is offline because it
    may already have crash notes, and may be up by the time a crash
    happens.  Split the error conditions up to return ERANGE for an
    out-of-range cpu request rather than EINVAL.  Finally, returning a
    range of zeros is acceptable, so do this in preference to failing.

Changes since v3:

 *  Alter the spinlocks to avoid calling xmalloc/xfree while holding
    the lock.
 *  Tidy up the coding style used.

Changes since v2:

 *  Allocate crash_notes dynamically using nr_cpu_ids at boot time,
    rather than statically using NR_CPUS.
 *  Fix the incorrect use of signed integers for cpu id.
 *  Fix collateral damage to do_crashdump_trigger() and
    crashdump_trigger_handler caused by reordering sizeof_note() and
    setup_note()
 *  Tweak the issue about returing -ENOMEM from kexec_init_cpu_note().
    No functional change.
 *  Change kexec_get_cpu() to attempt to allocate crash note buffers
    in case we have more free memory now than when the pcpu came up.
 *  Now that there are two codepaths possibly allocating crash notes,
    protect the allocation itself with a spinlock.

Changes since v1:

 *  Use cpu hotplug notifiers to handle allocating of the notes
    buffers rather than assuming the boot state of cpus will be the
    same as the crash state.
 *  Move crash_notes from being per_cpu.  This is because the kdump
    kernel elf binary put in the crash area is hard coded to physical
    addresses which the dom0 kernel typically obtains at boot time.
    If a cpu is offlined, its buffer should not be deallocated because
    the kdump kernel would read junk when trying to get the crash
    notes.  Similarly, the same problem would occur if the cpu was
    re-onlined later and its crash notes buffer was allocated elsewhere.
 *  Only attempt to allocate buffers if a crash area has been
    specified.  Else, allocating crash note buffers is a waste of
    space.  Along with this, change the test in kexec_get_cpu to
    return -EINVAL if no buffers have been allocated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r df7cec2c6c03 xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -25,13 +25,19 @@
 #include <xen/kexec.h>
 #include <public/elfnote.h>
 #include <xsm/xsm.h>
+#include <xen/cpu.h>
 #ifdef CONFIG_COMPAT
 #include <compat/kexec.h>
 #endif
 
 bool_t kexecing = FALSE;
 
-static DEFINE_PER_CPU_READ_MOSTLY(void *, crash_notes);
+/* Memory regions to store the per cpu register state etc. on a crash. */
+typedef struct { Elf_Note * start; size_t size; } crash_note_range_t;
+static crash_note_range_t * crash_notes;
+
+/* Lock to prevent race conditions when allocating the crash note buffers. */
+static DEFINE_SPINLOCK(crash_notes_lock);
 
 static Elf_Note *xen_crash_note;
 
@@ -165,13 +171,17 @@ static void one_cpu_only(void)
 void kexec_crash_save_cpu(void)
 {
     int cpu = smp_processor_id();
-    Elf_Note *note = per_cpu(crash_notes, cpu);
+    Elf_Note *note;
     ELF_Prstatus *prstatus;
     crash_xen_core_t *xencore;
 
+    BUG_ON( ! crash_notes );
+
     if ( cpumask_test_and_set_cpu(cpu, &crash_saved_cpus) )
         return;
 
+    note = crash_notes[cpu].start;
+
     prstatus = (ELF_Prstatus *)ELFNOTE_DESC(note);
 
     note = ELFNOTE_NEXT(note);
@@ -257,13 +267,6 @@ static struct keyhandler crashdump_trigg
     .desc = "trigger a crashdump"
 };
 
-static __init int register_crashdump_trigger(void)
-{
-    register_keyhandler('C', &crashdump_trigger_keyhandler);
-    return 0;
-}
-__initcall(register_crashdump_trigger);
-
 static void setup_note(Elf_Note *n, const char *name, int type, int descsz)
 {
     int l = strlen(name) + 1;
@@ -280,6 +283,129 @@ static int sizeof_note(const char *name,
             ELFNOTE_ALIGN(descsz));
 }
 
+/* Allocate a crash note buffer for a newly onlined cpu. */
+static int kexec_init_cpu_notes(const unsigned long cpu)
+{
+    Elf_Note * note;
+    int ret = 0;
+    int nr_bytes = 0;
+
+    BUG_ON( cpu >= nr_cpu_ids || ! crash_notes );
+
+    /* If already allocated, nothing to do. */
+    if ( crash_notes[cpu].start )
+        return ret;
+
+    /* All CPUs present a PRSTATUS and crash_xen_core note. */
+    nr_bytes =
+        sizeof_note("CORE", sizeof(ELF_Prstatus)) +
+        sizeof_note("Xen", sizeof(crash_xen_core_t));
+
+    /* CPU0 also presents the crash_xen_info note. */
+    if ( ! cpu )
+        nr_bytes = nr_bytes +
+            sizeof_note("Xen", sizeof(crash_xen_info_t));
+
+    note = xmalloc_bytes(nr_bytes);
+
+    /* Protect the write into crash_notes[] with a spinlock, as this function
+     * is on a hotplug path and a hypercall path. */
+    spin_lock(&crash_notes_lock);
+
+    /* If we are racing with another CPU and it has beaten us, give up
+     * gracefully. */
+    if ( crash_notes[cpu].start )
+    {
+        spin_unlock(&crash_notes_lock);
+        /* Always return ok, because whether we successfully allocated or not,
+         * another CPU has successfully allocated. */
+        if ( note )
+            xfree(note);
+    }
+    else
+    {
+        crash_notes[cpu].start = note;
+        crash_notes[cpu].size = nr_bytes;
+        spin_unlock(&crash_notes_lock);
+
+        /* If the allocation failed, and another CPU did not beat us, give
+         * up with ENOMEM. */
+        if ( ! note )
+            ret = -ENOMEM;
+        /* else all is good so lets set up the notes. */
+        else
+        {
+            /* Set up CORE note. */
+            setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
+            note = ELFNOTE_NEXT(note);
+
+            /* Set up Xen CORE note. */
+            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS,
+                       sizeof(crash_xen_core_t));
+
+            if ( ! cpu )
+            {
+                /* Set up Xen Crash Info note. */
+                xen_crash_note = note = ELFNOTE_NEXT(note);
+                setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO,
+                           sizeof(crash_xen_info_t));
+            }
+        }
+    }
+
+    return ret;
+}
+
+static int cpu_callback(
+    struct notifier_block *nfb, unsigned long action, void *hcpu)
+{
+    unsigned long cpu = (unsigned long)hcpu;
+
+    /* Only hook on CPU_UP_PREPARE because once a crash_note has been reported
+     * to dom0, it must keep it around in case of a crash, as the crash kernel
+     * will be hard coded to the original physical address reported. */
+    switch ( action )
+    {
+    case CPU_UP_PREPARE:
+        /* Ignore return value.  If this boot time, -ENOMEM will cause all
+         * manner of problems elsewhere very soon, and if it is during runtime,
+         * then failing to allocate crash notes is not a good enough reason to
+         * fail the CPU_UP_PREPARE */
+        kexec_init_cpu_notes(cpu);
+        break;
+    default:
+        break;
+    }
+    return NOTIFY_DONE;
+}
+
+static struct notifier_block cpu_nfb = {
+    .notifier_call = cpu_callback
+};
+
+static int __init kexec_init(void)
+{
+    void *cpu = (void *)(unsigned long)smp_processor_id();
+
+    /* If no crash area, no need to allocate space for notes. */
+    if ( !kexec_crash_area.size )
+        return 0;
+
+    register_keyhandler('C', &crashdump_trigger_keyhandler);
+
+    crash_notes = xzalloc_array(crash_note_range_t, nr_cpu_ids);
+    if ( ! crash_notes )
+        return -ENOMEM;
+
+    cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
+    register_cpu_notifier(&cpu_nfb);
+    return 0;
+}
+/* The reason for this to be a presmp_initcall as opposed to a regular
+ * __initcall is to allow the setup of the cpu hotplug handler before APs are
+ * brought up. */
+presmp_initcall(kexec_init);
+
 static int kexec_get_reserve(xen_kexec_range_t *range)
 {
     if ( kexec_crash_area.size > 0 && kexec_crash_area.start > 0) {
@@ -294,44 +420,23 @@ static int kexec_get_reserve(xen_kexec_r
 static int kexec_get_cpu(xen_kexec_range_t *range)
 {
     int nr = range->nr;
-    int nr_bytes = 0;
 
-    if ( nr < 0 || nr >= nr_cpu_ids || !cpu_online(nr) )
+    if ( nr < 0 || nr >= nr_cpu_ids )
+        return -ERANGE;
+
+    if ( ! crash_notes )
         return -EINVAL;
 
-    nr_bytes += sizeof_note("CORE", sizeof(ELF_Prstatus));
-    nr_bytes += sizeof_note("Xen", sizeof(crash_xen_core_t));
+    /* Try once again to allocate room for the crash notes.  It is just possible
+     * that more space has become available since we last tried.  If space has
+     * already been allocated, kexec_init_cpu_notes() will return early with 0.
+     */
+    kexec_init_cpu_notes(nr);
 
-    /* The Xen info note is included in CPU0's range. */
-    if ( nr == 0 )
-        nr_bytes += sizeof_note("Xen", sizeof(crash_xen_info_t));
-
-    if ( per_cpu(crash_notes, nr) == NULL )
-    {
-        Elf_Note *note;
-
-        note = per_cpu(crash_notes, nr) = xmalloc_bytes(nr_bytes);
-
-        if ( note == NULL )
-            return -ENOMEM;
-
-        /* Setup CORE note. */
-        setup_note(note, "CORE", NT_PRSTATUS, sizeof(ELF_Prstatus));
-
-        /* Setup Xen CORE note. */
-        note = ELFNOTE_NEXT(note);
-        setup_note(note, "Xen", XEN_ELFNOTE_CRASH_REGS, sizeof(crash_xen_core_t));
-
-        if (nr == 0)
-        {
-            /* Setup system wide Xen info note. */
-            xen_crash_note = note = ELFNOTE_NEXT(note);
-            setup_note(note, "Xen", XEN_ELFNOTE_CRASH_INFO, sizeof(crash_xen_info_t));
-        }
-    }
-
-    range->start = __pa((unsigned long)per_cpu(crash_notes, nr));
-    range->size = nr_bytes;
+    /* In the case of still not having enough memory to allocate buffer room,
+     * returning a range of 0,0 is still valid. */
+    range->start = __pa((unsigned long)crash_notes[nr].start);
+    range->size = crash_notes[nr].size;
     return 0;
 }
 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: KEXEC: allocate crash note buffers at boot time v5
  2011-12-02 15:19                         ` KEXEC: allocate crash note buffers at boot time v5 Andrew Cooper
@ 2011-12-02 16:04                           ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2011-12-02 16:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel@lists.xensource.com

(Andrew, my previous reply dropped all Cc-s, but I'm re-sending not only
because of that - I also adjusted the first part of the response.)

>>> On 02.12.11 at 16:19, Andrew Cooper <andrew.cooper3@citrix.com> wrote:

I just had another look at the Dom0 side of things, and I fail to see why
you think moving this out of per-CPU data is necessary: All Dom0 does
with the provided info is set up the resource tree. The data doesn't get
stored for any post-boot use. What am I overlooking?

>+    /* In the case of still not having enough memory to allocate buffer room,
>+     * returning a range of 0,0 is still valid. */
>+    range->start = __pa((unsigned long)crash_notes[nr].start);
>+    range->size = crash_notes[nr].size;

Comment and implementation don't match - __pa(NULL) is not zero
(and the cast to 'unsigned long' is pointless afaict).

Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: KEXEC: allocate crash note buffers at boot time v5
@ 2011-12-02 16:11 Jan Beulich
  2011-12-02 16:27 ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2011-12-02 16:11 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel@lists.xensource.com

>>> On 02.12.11 at 16:59, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 02/12/11 15:43, Jan Beulich wrote:
>> I just had another look at the Dom0 side of things, and I fail to see why
>> you think boot time allocation is necessary: All Dom0 does with the
>> provided info is set up the resource tree. The data doesn't get stored
>> for any post-boot use. What am I overlooking?
> 
> /sbin/kexec opens /proc/iomem and looks for "Crash note" and interprets
> the range values.  This is how it grabs the locations to pack into its
> magic binary package.

So how does the hotplug scenario then get handled on native? I can't
imagine they expect things to remain stable across CPU unplug and
re-activation.

Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: KEXEC: allocate crash note buffers at boot time v5
  2011-12-02 16:11 KEXEC: allocate crash note buffers at boot time v5 Jan Beulich
@ 2011-12-02 16:27 ` Andrew Cooper
  2011-12-02 16:38   ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2011-12-02 16:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel@lists.xensource.com

On 02/12/11 16:11, Jan Beulich wrote:
>>>> On 02.12.11 at 16:59, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 02/12/11 15:43, Jan Beulich wrote:
>>> I just had another look at the Dom0 side of things, and I fail to see why
>>> you think boot time allocation is necessary: All Dom0 does with the
>>> provided info is set up the resource tree. The data doesn't get stored
>>> for any post-boot use. What am I overlooking?
>> /sbin/kexec opens /proc/iomem and looks for "Crash note" and interprets
>> the range values.  This is how it grabs the locations to pack into its
>> magic binary package.
> So how does the hotplug scenario then get handled on native? I can't
> imagine they expect things to remain stable across CPU unplug and
> re-activation.
>
> Jan

I am not how (or even if) the hotplug condition is handled on native.  I
guess it depends on what is put into the resource tree on boot.  With my
patch, Xen will give crash areas for all pcpus up to nr_cpu_ids, which
covers all the cases.  The worst that will happen is that some crash
notes do not get written if certain cpus are offline at the time of a crash.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: KEXEC: allocate crash note buffers at boot time v5
  2011-12-02 16:27 ` Andrew Cooper
@ 2011-12-02 16:38   ` Jan Beulich
  2011-12-02 16:55     ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2011-12-02 16:38 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, xen-devel@lists.xensource.com

>>> On 02.12.11 at 17:27, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 02/12/11 16:11, Jan Beulich wrote:
>>>>> On 02.12.11 at 16:59, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 02/12/11 15:43, Jan Beulich wrote:
>>>> I just had another look at the Dom0 side of things, and I fail to see why
>>>> you think boot time allocation is necessary: All Dom0 does with the
>>>> provided info is set up the resource tree. The data doesn't get stored
>>>> for any post-boot use. What am I overlooking?
>>> /sbin/kexec opens /proc/iomem and looks for "Crash note" and interprets
>>> the range values.  This is how it grabs the locations to pack into its
>>> magic binary package.
>> So how does the hotplug scenario then get handled on native? I can't
>> imagine they expect things to remain stable across CPU unplug and
>> re-activation.
> 
> I am not how (or even if) the hotplug condition is handled on native.  I
> guess it depends on what is put into the resource tree on boot.  With my

I don't think native kexec depends on stuff being made visible in
/proc/iomem - grep-ing for "Crash note" in 2.6.18 as well as a current
tree hits exclusively the Xen instance.

Native uses a per-CPU allocation (i.e. address and contents can't be
expected to survive offlining of a CPU).

> patch, Xen will give crash areas for all pcpus up to nr_cpu_ids, which
> covers all the cases.  The worst that will happen is that some crash
> notes do not get written if certain cpus are offline at the time of a crash.

And did you check that nothing in the producer or consumer chain gets
confused by this new behavior?

Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: KEXEC: allocate crash note buffers at boot time v5
  2011-12-02 16:38   ` Jan Beulich
@ 2011-12-02 16:55     ` Andrew Cooper
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2011-12-02 16:55 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Keir Fraser, xen-devel@lists.xensource.com

On 02/12/11 16:38, Jan Beulich wrote:
>> patch, Xen will give crash areas for all pcpus up to nr_cpu_ids, which
>> covers all the cases.  The worst that will happen is that some crash
>> notes do not get written if certain cpus are offline at the time of a crash.
> And did you check that nothing in the producer or consumer chain gets
> confused by this new behavior?
>
> Jan
>

/proc/vmcore reported by the kdump kernel is fine, even with extra notes
which have 0 contents (after I deliberately caused kexec_get_cpu to
report 1 more cpu than was present for testing exactly this)

The producer/consumer chain will not change.  There was a case
previously where a CPU which was present at boot and subsequently
offlined would leave crash notes with 0's in them.  Therefore, if a tool
couldn't deal with that case, it wont be able to deal with this case. 
If a tool could deal with that case, it can deal with the new case.

I would hazard a guess that most of the time, we will boot on all CPUs
and crash with all of those CPUs still up, so it is more likely that
there will be no 0'd notes.

(When I say a 0'd note, I mean a note with correct header, type and
name, but 0's in the desc)

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2011-12-02 16:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-02 16:11 KEXEC: allocate crash note buffers at boot time v5 Jan Beulich
2011-12-02 16:27 ` Andrew Cooper
2011-12-02 16:38   ` Jan Beulich
2011-12-02 16:55     ` Andrew Cooper
  -- strict thread matches above, loose matches on Subject: below --
2011-11-29 11:19 [RFC] KEXEC: allocate crash note buffers at boot time Keir Fraser
2011-11-30 13:14 ` Andrew Cooper
2011-11-30 17:24   ` [RFC] KEXEC: allocate crash note buffers at boot time v2 Andrew Cooper
2011-12-01  9:08     ` Jan Beulich
2011-12-01  9:49       ` Andrew Cooper
2011-12-01 10:01         ` Jan Beulich
2011-12-01 12:29           ` [RFC] KEXEC: allocate crash note buffers at boot time v3 Andrew Cooper
2011-12-01 12:56             ` Jan Beulich
2011-12-01 15:02               ` Andrew Cooper
2011-12-01 15:15                 ` Jan Beulich
2011-12-01 17:14                   ` [RFC] KEXEC: allocate crash note buffers at boot time v4 Andrew Cooper
2011-12-02  8:02                     ` Jan Beulich
2011-12-02 12:33                       ` Andrew Cooper
2011-12-02 15:19                         ` KEXEC: allocate crash note buffers at boot time v5 Andrew Cooper
2011-12-02 16:04                           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.