Kexec Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Eric DeVolder <eric.devolder@oracle.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	kexec@lists.infradead.org, ebiederm@xmission.com,
	dyoung@redhat.com, bhe@redhat.com, vgoyal@redhat.com
Cc: mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, nramas@linux.microsoft.com,
	thomas.lendacky@amd.com, robh@kernel.org, efault@gmx.de,
	rppt@kernel.org, david@redhat.com, sourabhjain@linux.ibm.com,
	konrad.wilk@oracle.com, boris.ostrovsky@oracle.com
Subject: Re: [PATCH v17 3/6] crash: add generic infrastructure for crash hotplug support
Date: Fri, 20 Jan 2023 15:24:32 -0600	[thread overview]
Message-ID: <92fd9892-e1cf-79f6-529c-c4e5b1516802@oracle.com> (raw)
In-Reply-To: <878rhyi53l.ffs@tglx>


On 1/19/23 15:31, Thomas Gleixner wrote:
> Eric!
> 
> On Wed, Jan 18 2023 at 16:35, Eric DeVolder wrote:
>> CPU and memory change notifications are received in order to
>> regenerate the elfcorehdr.
>>
>> To support cpu hotplug, a callback is registered to capture the
>> CPUHP_AP_ONLINE_DYN online and offline events via
>> cpuhp_setup_state_nocalls().
> 
> This sentence does not make sense. The callback is not registered to
> capture CPUHP_AP_ONLINE_DYN events >
> What this does is: It installs a dynamic CPU hotplug state with
> callbacks for online and offline. These callbacks store information
> about a CPU coming up and going down. Right?

I agree, the wording is wrong; this code taps into that state, as you suggest, in order to handle 
the online and offline events.

> 
> But why are they required and what's the value?
> 
> This changelog tells WHAT it does and not WHY. I can see the WHAT from
> the patch itself.
> 
> Don't tell me the WHY is in the cover letter. The cover letter is not
> part of the commits and changelogs have to be self contained.
> 
> Now let me cite from your cover letter:
> 
>> When the kdump service is loaded, if a CPU or memory is hot
>> un/plugged, the crash elfcorehdr, which describes the CPUs
>> and memory in the system, must also be updated, else the resulting
>> vmcore is inaccurate (eg. missing either CPU context or memory
>> regions).
I'll work to improve the wording and why for the next iteration.

> 
> The CPU hotplug state you are using for this is patently inaccurate
> too. With your approach the CPU is tracked as online very late in the
> hotplug process and tracked as offline very early on unplug.
> 
> So if the kernel crashes before/after the plug/unplug tracking event
> then your recorded state is bogus and given the amount of callbacks
> between the real online/offline and the recording point there is a
> pretty large window.
> 
> You can argue that this is better than the current state and considered
> good enough for whatever reason, but such information wants to be in the
> changelog, no?
I agree!  I admit that CPUHP_AP_ONLINE_DYN may (is) not the best choice. I did spend time looking at 
the cpu hotplug infrastructure, but did not learn a better/correct way. Fwiw: 
https://lore.kernel.org/lkml/20211118174948.37435-1-eric.devolder@oracle.com/:

"The second problem is the use of CPUHP_AP_ONLINE_DYN.  The
cpuhp_setup_state_nocalls() is invoked with parameter
CPUHP_AP_ONLINE_DYN. While this works, when a CPU is being unplugged,
the CPU still shows up in foreach_present_cpu() during the
regeneration of the elfcorehdr, thus the need to explicitly check and
exclude the soon-to-be offlined CPU in crash_prepare_elf64_headers().
Perhaps if value(s) new/different than CPUHP_AP_ONLINE_DYN to
cpuhp_setup_state() was utilized, then the offline cpu would no longer
be in foreach_present_cpu(), and this change could be eliminated. I do
not understand cpuhp_setup_state() well enough to choose, or create,
appropriate value(s)."

The problem described (and worked around in this patch series) is the behavior/window you point out. 
I'd prefer to narrow the window, if possible. The states/values I tried did not work; any 
suggestions for a more appropriate state/value would be most welcomed!

> 
> Thanks,
> 
>          tglx
> 
> Hint: The requirements for changelogs are well documented in Documentation/process/
> 
> 
Thomas, thank you for looking at this!
eric

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

  reply	other threads:[~2023-01-20 21:25 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-18 21:35 [PATCH v17 0/6] crash: Kernel handling of CPU and memory hot un/plug Eric DeVolder
2023-01-18 21:35 ` [PATCH v17 1/6] crash: move a few code bits to setup support of crash hotplug Eric DeVolder
2023-01-18 21:35 ` [PATCH v17 2/6] crash: prototype change for crash_prepare_elf64_headers() Eric DeVolder
2023-01-18 21:35 ` [PATCH v17 3/6] crash: add generic infrastructure for crash hotplug support Eric DeVolder
2023-01-19 21:31   ` Thomas Gleixner
2023-01-20 21:24     ` Eric DeVolder [this message]
2023-01-18 21:35 ` [PATCH v17 4/6] kexec: exclude elfcorehdr from the segment digest Eric DeVolder
2023-01-18 21:35 ` [PATCH v17 5/6] kexec: exclude hot remove cpu from elfcorehdr notes Eric DeVolder
2023-01-18 21:35 ` [PATCH v17 6/6] x86/crash: add x86 crash hotplug support Eric DeVolder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=92fd9892-e1cf-79f6-529c-c4e5b1516802@oracle.com \
    --to=eric.devolder@oracle.com \
    --cc=bhe@redhat.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=dyoung@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=efault@gmx.de \
    --cc=hpa@zytor.com \
    --cc=kexec@lists.infradead.org \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nramas@linux.microsoft.com \
    --cc=robh@kernel.org \
    --cc=rppt@kernel.org \
    --cc=sourabhjain@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=vgoyal@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox