Kexec Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Eric DeVolder <eric.devolder@oracle.com>
To: Sourabh Jain <sourabhjain@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	kexec@lists.infradead.org, ebiederm@xmission.com,
	dyoung@redhat.com, bhe@redhat.com, vgoyal@redhat.com
Cc: mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, nramas@linux.microsoft.com,
	thomas.lendacky@amd.com, robh@kernel.org, efault@gmx.de,
	rppt@kernel.org, david@redhat.com, konrad.wilk@oracle.com,
	boris.ostrovsky@oracle.com
Subject: Re: [PATCH v18 5/7] kexec: exclude hot remove cpu from elfcorehdr notes
Date: Fri, 10 Feb 2023 18:35:09 -0600	[thread overview]
Message-ID: <b3d5c730-60ee-8f1c-978c-e3df41e3a3f1@oracle.com> (raw)
In-Reply-To: <18c57fd0-2ad0-361a-9a53-ac49c372f021@linux.ibm.com>



On 2/10/23 00:29, Sourabh Jain wrote:
> 
> On 10/02/23 01:09, Eric DeVolder wrote:
>>
>>
>> On 2/9/23 12:43, Sourabh Jain wrote:
>>> Hello Eric,
>>>
>>> On 09/02/23 23:01, Eric DeVolder wrote:
>>>>
>>>>
>>>> On 2/8/23 07:44, Thomas Gleixner wrote:
>>>>> Eric!
>>>>>
>>>>> On Tue, Feb 07 2023 at 11:23, Eric DeVolder wrote:
>>>>>> On 2/1/23 05:33, Thomas Gleixner wrote:
>>>>>>
>>>>>> So my latest solution is introduce two new CPUHP states, CPUHP_AP_ELFCOREHDR_ONLINE
>>>>>> for onlining and CPUHP_BP_ELFCOREHDR_OFFLINE for offlining. I'm open to better names.
>>>>>>
>>>>>> The CPUHP_AP_ELFCOREHDR_ONLINE needs to be placed after CPUHP_BRINGUP_CPU. My
>>>>>> attempts at locating this state failed when inside the STARTING section, so I located
>>>>>> this just inside the ONLINE sectoin. The crash hotplug handler is registered on
>>>>>> this state as the callback for the .startup method.
>>>>>>
>>>>>> The CPUHP_BP_ELFCOREHDR_OFFLINE needs to be placed before CPUHP_TEARDOWN_CPU, and I
>>>>>> placed it at the end of the PREPARE section. This crash hotplug handler is also
>>>>>> registered on this state as the callback for the .teardown method.
>>>>>
>>>>> TBH, that's still overengineered. Something like this:
>>>>>
>>>>> bool cpu_is_alive(unsigned int cpu)
>>>>> {
>>>>>     struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>>>>>
>>>>>     return data_race(st->state) <= CPUHP_AP_IDLE_DEAD;
>>>>> }
>>>>>
>>>>> and use this to query the actual state at crash time. That spares all
>>>>> those callback heuristics.
>>>>>
>>>>>> I'm making my way though percpu crash_notes, elfcorehdr, vmcoreinfo,
>>>>>> makedumpfile and (the consumer of it all) the userspace crash utility,
>>>>>> in order to understand the impact of moving from for_each_present_cpu()
>>>>>> to for_each_online_cpu().
>>>>>
>>>>> Is the packing actually worth the trouble? What's the actual win?
>>>>>
>>>>> Thanks,
>>>>>
>>>>>          tglx
>>>>>
>>>>>
>>>>
>>>> Thomas,
>>>> I've investigated the passing of crash notes through the vmcore. What I've learned is that:
>>>>
>>>> - linux/fs/proc/vmcore.c (which makedumpfile references to do its job) does
>>>>   not care what the contents of cpu PT_NOTES are, but it does coalesce them together.
>>>>
>>>> - makedumpfile will count the number of cpu PT_NOTES in order to determine its
>>>>   nr_cpus variable, which is reported in a header, but otherwise unused (except
>>>>   for sadump method).
>>>>
>>>> - the crash utility, for the purposes of determining the cpus, does not appear to
>>>>   reference the elfcorehdr PT_NOTEs. Instead it locates the various
>>>>   cpu_[possible|present|online]_mask and computes nr_cpus from that, and also of
>>>>   course which are online. In addition, when crash does reference the cpu PT_NOTE,
>>>>   to get its prstatus, it does so by using a percpu technique directly in the vmcore
>>>>   image memory, not via the ELF structure. Said differently, it appears to me that
>>>>   crash utility doesn't rely on the ELF PT_NOTEs for cpus; rather it obtains them
>>>>   via kernel cpumasks and the memory within the vmcore.
>>>>
>>>> With this understanding, I did some testing. Perhaps the most telling test was that I
>>>> changed the number of cpu PT_NOTEs emitted in the crash_prepare_elf64_headers() to just 1,
>>>> hot plugged some cpus, then also took a few offline sparsely via chcpu, then generated a
>>>> vmcore. The crash utility had no problem loading the vmcore, it reported the proper number
>>>> of cpus and the number offline (despite only one cpu PT_NOTE), and changing to a different
>>>> cpu via 'set -c 30' and the backtrace was completely valid.
>>>>
>>>> My take away is that crash utility does not rely upon ELF cpu PT_NOTEs, it obtains the
>>>> cpu information directly from kernel data structures. Perhaps at one time crash relied
>>>> upon the ELF information, but no more. (Perhaps there are other crash dump analyzers
>>>> that might rely on the ELF info?)
>>>>
>>>> So, all this to say that I see no need to change crash_prepare_elf64_headers(). There
>>>> is no compelling reason to move away from for_each_present_cpu(), or modify the list for
>>>> online/offline.
>>>>
>>>> Which then leaves the topic of the cpuhp state on which to register. Perhaps reverting
>>>> back to the use of CPUHP_BP_PREPARE_DYN is the right answer. There does not appear to
>>>> be a compelling need to accurately track whether the cpu went online/offline for the
>>>> purposes of creating the elfcorehdr, as ultimately the crash utility pulls that from
>>>> kernel data structures, not the elfcorehdr.
>>>>
>>>> I think this is what Sourabh has known and has been advocating for an optimization
>>>> path that allows not regenerating the elfcorehdr on cpu changes (because all the percpu
>>>> structs are all laid out). I do think it best to leave that as an arch choice.
>>>
>>> Since things are clear on how the PT_NOTES are consumed in kdump kernel [fs/proc/vmcore.c],
>>> makedumpfile, and crash tool I need your opinion on this:
>>>
>>> Do we really need to regenerate elfcorehdr for CPU hotplug events?
>>> If yes, can you please list the elfcorehdr components that changes due to CPU hotplug.
>> Due to the use of for_each_present_cpu(), it is possible for the number of cpu PT_NOTEs
>> to fluctuate as cpus are un/plugged. Onlining/offlining of cpus does not impact the
>> number of cpu PT_NOTEs (as the cpus are still present).
>>
>>>
>>>  From what I understood, crash notes are prepared for possible CPUs as system boots and
>>> could be used to create a PT_NOTE section for each possible CPU while generating the elfcorehdr
>>> during the kdump kernel load.
>>>
>>> Now once the elfcorehdr is loaded with PT_NOTEs for every possible CPU there is no need to
>>> regenerate it for CPU hotplug events. Or do we?
>>
>> For onlining/offlining of cpus, there is no need to regenerate the elfcorehdr. However,
>> for actual hot un/plug of cpus, the answer is yes due to for_each_present_cpu(). The
>> caveat here of course is that if crash utility is the only coredump analyzer of concern,
>> then it doesn't care about these cpu PT_NOTEs and there would be no need to re-generate them.
>>
>> Also, I'm not sure if ARM cpu hotplug, which is just now coming into mainstream, impacts
>> any of this.
>>
>> Perhaps the one item that might help here is to distinguish between actual hot un/plug of
>> cpus, versus onlining/offlining. At the moment, I can not distinguish between a hot plug
>> event and an online event (and unplug/offline). If those were distinguishable, then we
>> could only regenerate on un/plug events.
>>
>> Or perhaps moving to for_each_possible_cpu() is the better choice?
> 
> Yes, because once elfcorehdr is built with possible CPUs we don't have to worry about
> hot[un]plug case.
> 
> Here is my view on how things should be handled if a core-dump analyzer is dependent on
> elfcorehdr PT_NOTEs to find online/offline CPUs.
> 
> A PT_NOTE in elfcorehdr holds the address of the corresponding crash notes (kernel has
> one crash note per CPU for every possible CPU). Though the crash notes are allocated
> during the boot time they are populated when the system is on the crash path.
> 
> This is how crash notes are populated on PowerPC and I am expecting it would be something
> similar on other architectures too.
> 
> The crashing CPU sends IPI to every other online CPU with a callback function that updates the
> crash notes of that specific CPU. Once the IPI completes the crashing CPU updates its own crash
> note and proceeds further.
> 
> The crash notes of CPUs remain uninitialized if the CPUs were offline or hot unplugged at the time
> system crash. The core-dump analyzer should be able to identify [un]/initialized crash notes
> and display the information accordingly.
> 
> Thoughts?
> 
> - Sourabh

In general, I agree with your points. You've presented a strong case to go with 
for_each_possible_cpu() in crash_prepare_elf64_headers() and those crash notes would always be 
present, and we can ignore changes to cpus wrt/ elfcorehdr updates.

But what do we do about kexec_load() syscall? The way the userspace utility works is it determines 
cpus by:
  nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
which is not the equivalent of possible_cpus. So the complete list of cpu PT_NOTEs is not generated 
up front. We would need a solution for that?

Thanks,
eric

PS. I'll be on vacation all of next week, returning 20feb.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

  reply	other threads:[~2023-02-11  0:35 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-31 22:42 [PATCH v18 0/7] crash: Kernel handling of CPU and memory hot un/plug Eric DeVolder
2023-01-31 22:42 ` [PATCH v18 1/7] crash: move a few code bits to setup support of crash hotplug Eric DeVolder
2023-01-31 22:42 ` [PATCH v18 2/7] crash: prototype change for crash_prepare_elf64_headers() Eric DeVolder
2023-01-31 22:42 ` [PATCH v18 3/7] crash: add generic infrastructure for crash hotplug support Eric DeVolder
2023-02-09 19:10   ` Sourabh Jain
2023-02-10 16:51     ` Eric DeVolder
2023-01-31 22:42 ` [PATCH v18 4/7] kexec: exclude elfcorehdr from the segment digest Eric DeVolder
2023-01-31 22:42 ` [PATCH v18 5/7] kexec: exclude hot remove cpu from elfcorehdr notes Eric DeVolder
2023-02-01 11:33   ` Thomas Gleixner
2023-02-06  8:12     ` Sourabh Jain
2023-02-06 13:03       ` Thomas Gleixner
2023-02-07 17:23     ` Eric DeVolder
2023-02-08 13:44       ` Thomas Gleixner
2023-02-09 17:31         ` Eric DeVolder
2023-02-09 18:43           ` Sourabh Jain
2023-02-09 19:39             ` Eric DeVolder
2023-02-10  6:29               ` Sourabh Jain
2023-02-11  0:35                 ` Eric DeVolder [this message]
2023-02-13  4:40                   ` Sourabh Jain
2023-02-13 12:52                     ` Thomas Gleixner
2023-02-15  2:53                       ` Sourabh Jain
2023-02-28 12:44                     ` Baoquan He
2023-02-28 18:52                       ` Eric DeVolder
2023-03-01 15:48                         ` Eric DeVolder
2023-03-02 10:51                           ` Baoquan He
2023-03-02  5:23                         ` Sourabh Jain
2023-02-23 20:34                 ` Eric DeVolder
2023-02-24  8:34                   ` Sourabh Jain
2023-02-24 20:16                     ` Eric DeVolder
2023-02-27  6:11                       ` Sourabh Jain
2023-02-28 21:50                         ` Eric DeVolder
2023-03-01  6:22                           ` Sourabh Jain
2023-03-01 14:16                             ` Eric DeVolder
2023-01-31 22:42 ` [PATCH v18 6/7] crash: memory and cpu hotplug sysfs attributes Eric DeVolder
2023-01-31 22:42 ` [PATCH v18 7/7] x86/crash: add x86 crash hotplug support Eric DeVolder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b3d5c730-60ee-8f1c-978c-e3df41e3a3f1@oracle.com \
    --to=eric.devolder@oracle.com \
    --cc=bhe@redhat.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=dyoung@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=efault@gmx.de \
    --cc=hpa@zytor.com \
    --cc=kexec@lists.infradead.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nramas@linux.microsoft.com \
    --cc=robh@kernel.org \
    --cc=rppt@kernel.org \
    --cc=sourabhjain@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=vgoyal@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox