From: Baoquan He <bhe@redhat.com>
To: Eric DeVolder <eric.devolder@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	ebiederm@xmission.com, linux-kernel@vger.kernel.org
Subject: Re: [RFC]: kexec: change to handle memory/cpu changes
Date: Tue, 22 Dec 2020 09:40:39 +0800
Message-ID: <20201222014039.GA2237@MiWiFi-R3L-srv>
In-Reply-To: <b04ed259-dc5f-7f30-6661-c26f92d9096a@oracle.com>

On 12/14/20 at 10:50am, Eric DeVolder wrote:
...
> The cell contents show the number of seconds it took for the system to
> process all of the 3840 memblocks. The value in parentheses is the
> number of kdump unload-then-reload operations per second.
> 
>        |  1 480GB DIMM     |  480 1GB DIMMs   |
> -------+-------------------+------------------+
>  RHEL7 | 181s (21.2 ops/s) | 389s (9.8 ops/s) |
> -------+-------------------+------------------+
>  RHEL8 |  86s (44.7 ops/s) | 419s (9.2 ops/s) |
> -------+-------------------+------------------+
> 
> The scenario of adding 480 1GiB virtual DIMMs takes more time given
> the larger number of round trips of QEMU -> kernel -> udev -> kernel ->
> QEMU; both systems take roughly 400s.
> 
> The RHEL7 system processes all 3840 memblocks individually and performs
> 3840 kdump unload-then-reload operations.
> 
> However, RHEL8 data in the best case scenario (1 480GiB DIMM) suggests
> that approximately 86/4 = 21 kdump unload-then-reload operations
> happened, and in the worst case scenario (480 1GiB DIMMs), the data
> suggests that approximately 419/4 = 105 kdump unload-then-reload
> operations happened. For RHEL8, the final number of kdump
> unload-then-reload operations is 0.5% (21 of 3840) and 2.7% (105 of
> 3840), respectively, of that of the RHEL7 system.
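
The arithmetic in the quoted paragraph can be re-checked with a short sketch. It assumes the divisor 4 is the time in seconds per unload-then-reload operation (its derivation is not shown in this excerpt); the function and constant names are illustrative only.

```python
# Figures taken from the quoted message.
MEMBLOCKS = 3840
# Assumption: the "4" in 86/4 and 419/4 is seconds per reload operation.
SECONDS_PER_RELOAD = 4


def estimated_reloads(elapsed_seconds):
    """Estimate the reload count as elapsed time divided by time per reload."""
    return elapsed_seconds / SECONDS_PER_RELOAD


def percent_of_per_event(reloads):
    """Express a reload count as a percentage of one reload per memblock."""
    return 100.0 * reloads / MEMBLOCKS


# Best case (1 480GiB DIMM) and worst case (480 1GiB DIMMs) on RHEL8.
print(estimated_reloads(86), estimated_reloads(419))
# The message rounds these to 21 and 105 before taking percentages.
print(round(percent_of_per_event(21), 1), round(percent_of_per_event(105), 1))
```

The unrounded quotients are 21.5 and 104.75, so the quoted 21 and 105 are approximations, and the percentages follow from those rounded counts.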
> 
> The throttle approach is quite effective in reducing the number of
> kdump unload-then-reload operations. However, the kdump capture kernel
> is still reloaded multiple times, and each kdump capture kernel reload
> is a race window in which kdump can fail.
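
Why throttling cuts the reload count so sharply can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual RHEL8 udev rule: it assumes a fixed throttle window during which further hotplug events are coalesced into the reload already triggered, and `count_reloads` and its parameters are hypothetical names.

```python
def count_reloads(event_times, window_seconds):
    """Count reloads when events inside an open throttle window are coalesced.

    event_times: timestamps (seconds) of hotplug events.
    window_seconds: hypothetical throttle window; 0 means one reload per event.
    """
    reloads = 0
    window_end = float("-inf")
    for t in sorted(event_times):
        if t >= window_end:
            # No window is open: this event triggers a reload and opens one.
            reloads += 1
            window_end = t + window_seconds
        # Otherwise the event is absorbed by the reload already triggered.
    return reloads


events = list(range(10))  # ten events, one per second
print(count_reloads(events, 0))  # every event reloads
print(count_reloads(events, 4))  # events within 4s of a reload are coalesced
```

Even in this model each remaining reload is still a window in which the capture kernel is not loaded, which is the concern raised in the quoted paragraph.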
> 
> A quick peek at Ubuntu 20.04 LTS reveals it has 50-kdump-tools.rules
> that looks like:
> 
>   SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/usr/sbin/kdump-config try-reload"
>   SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/usr/sbin/kdump-config try-reload"
>   SUBSYSTEM=="cpu", ACTION=="add", PROGRAM="/usr/sbin/kdump-config try-reload"
>   SUBSYSTEM=="cpu", ACTION=="remove", PROGRAM="/usr/sbin/kdump-config try-reload"
>   SUBSYSTEM=="cpu", ACTION=="offline", PROGRAM="/usr/sbin/kdump-config try-reload"
> 
> which produces behavior equivalent to RHEL7, whereby every event
> results in a kdump capture kernel reload.
> 
> Fedora 33 and CentOS 8-stream behave the same as RHEL8.
> 
> Perhaps a better solution is to rewrite the vmcoreinfo structure that
> contains the memory and CPU layout information, as those changes to
> memory and CPUs occur. Rewriting vmcoreinfo is an in-kernel activity
> and would certainly avoid the relatively large unload-then-reload
> times of the kdump capture kernel. The pointer to the vmcoreinfo
> structure is provided to the capture kernel via the elfcorehdr=
> parameter to the capture kernel cmdline. Rewriting the vmcoreinfo
> structure as well as rewriting the capture kernel cmdline parameter is
> needed to utilize this approach.

Great investigation and conclusion, and a very nice idea below. When I
read the first half of this mail, I thought maybe we could add a new
option to the kexec-tools utility that updates only the elfcorehdr when
hotplug udev events are detected. Having come to this part, I would say
yes, doing it inside the kernel looks better. Special handling for
hotplug looks necessary, as you have said. I will check what we can do
and get back with some details. Thanks for doing this.

Thanks
Baoquan

> 
> Based upon some amount of examining code, I think the challenges
> involved in updating the CPU and memory layout in-kernel are:
> 
>  - adding call-outs on the add_memory()/try_remove_memory() and
>    cpu_up()/cpu_down() paths for notifying the kdump subsystem of
>    memory and/or CPU changes
> 
>  - updating the struct kimage with the memory or CPU changes
> 
>  - rewriting the vmcoreinfo structure from the data contained
>    in struct kimage, e.g. crash_prepare_elf64_headers()
> 
>  - installing the updated vmcoreinfo struct via
>    kimage_crash_copy_vmcoreinfo() and rewriting the kdump kernel
>    cmdline in order to update parameter elfcorehdr= with the
>    new address
> 
> As I am not yet overly familiar with all the code paths involved, I'm
> sure the devil is in the details. However, due to the kexec_file_load
> syscall, it appears most of the infrastructure is already in place,
> and we essentially need to tap into it again for memory and CPU
> changes.
> 
> It appears that this change could be applicable to both kexec_load and
> kexec_file_load. It has the potential to (eventually) simplify the
> userland kexec utility for kexec_load, and would eliminate the need
> for 98-kexec.rules and the associated churn.
> 
> Comments please!
> eric
> 