* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support [not found] ` <fd08c13d-a917-4cd6-85ec-267e0fe74c41@oracle.com> @ 2022-09-30 16:50 ` Borislav Petkov 2022-09-30 17:11 ` Eric DeVolder 0 siblings, 1 reply; 26+ messages in thread From: Borislav Petkov @ 2022-09-30 16:50 UTC (permalink / raw) To: Eric DeVolder, Oscar Salvador, Andrew Morton, david Cc: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On Fri, Sep 30, 2022 at 10:36:49AM -0500, Eric DeVolder wrote: > > Your help text talks about System RAM entries in /proc/iomem which means > > that those entries are present somewhere in the kernel and you can read > > them out and do the proper calculations dynamically instead of doing the > > static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing. > > The intent is to compute the max size buffer needed to contain a maximum > populated elfcorehdr, which is primarily based on the number of CPUs and > memory regions. Thus far I (and others involved) have not found a kernel > method to determine the maximum number of memory regions possible (if you > are aware of one, please let me know!). Thus CONFIG_CRASH_MAX_MEMORY_RANGES > was born (rather borrowed from kexec-tools). Let's ask some mm folks. mm folks, is there a way to enumerate all the memory regions a machine has? It looks to me like register_memory_resource() in mm/memory_hotplug.c does register the resource so there should be a way to count that list of resources or at least maintain a count somewhere so that kexec/crash code can know how big its elfcodehdr buffer should be instead of doing a clumsy Kconfig item where people would need to guess... Hmm. > > +#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES > So I think the use of CONFIG_CRASH_MAX_MEMORY_RANGES is not correct; it > still needs to be based on the cpu or memory hotplug options. You're kidding, right? +config CRASH_MAX_MEMORY_RANGES + depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG) ^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > @@ -622,6 +622,15 @@ static int __init crash_save_vmcoreinfo_init(void) > > subsys_initcall(crash_save_vmcoreinfo_init); > > #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG) > > + > > +void __weak *arch_map_crash_pages(unsigned long paddr, unsigned long size) > > +{ > > + return NULL; > > +} > > + > > +void __weak arch_unmap_crash_pages(void **ptr) { } > > +void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { } > > + > I was asked by Baoquan He to eliminate the use of __weak Because? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-09-30 16:50 ` [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support Borislav Petkov @ 2022-09-30 17:11 ` Eric DeVolder 2022-09-30 17:40 ` Borislav Petkov 0 siblings, 1 reply; 26+ messages in thread From: Eric DeVolder @ 2022-09-30 17:11 UTC (permalink / raw) To: Borislav Petkov, Oscar Salvador, Andrew Morton, david Cc: linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 9/30/22 11:50, Borislav Petkov wrote: > On Fri, Sep 30, 2022 at 10:36:49AM -0500, Eric DeVolder wrote: >>> Your help text talks about System RAM entries in /proc/iomem which means >>> that those entries are present somewhere in the kernel and you can read >>> them out and do the proper calculations dynamically instead of doing the >>> static CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES thing. >> >> The intent is to compute the max size buffer needed to contain a maximum >> populated elfcorehdr, which is primarily based on the number of CPUs and >> memory regions. Thus far I (and others involved) have not found a kernel >> method to determine the maximum number of memory regions possible (if you >> are aware of one, please let me know!). Thus CONFIG_CRASH_MAX_MEMORY_RANGES >> was born (rather borrowed from kexec-tools). > > Let's ask some mm folks. > > mm folks, is there a way to enumerate all the memory regions a machine > has? > > It looks to me like register_memory_resource() in mm/memory_hotplug.c > does register the resource so there should be a way to count that list > of resources or at least maintain a count somewhere so that kexec/crash > code can know how big its elfcodehdr buffer should be instead of doing a > clumsy Kconfig item where people would need to guess... > > Hmm. > There is of course a way to enumerate the memory regions in use on the machine, that is not what this code needs. In order to compute the maximum buffer size needed (this buffer size is computed once), the count of the maximum number of memory regions possible (even if not currently in use) is what is needed. >>> +#ifdef CONFIG_CRASH_MAX_MEMORY_RANGES >> So I think the use of CONFIG_CRASH_MAX_MEMORY_RANGES is not correct; it >> still needs to be based on the cpu or memory hotplug options. > > You're kidding, right? > > +config CRASH_MAX_MEMORY_RANGES > + depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG) > ^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Oh, that would be an error of haste on my part. This should be: depends on CRASH_DUMP && MEMORY_HOTPLUG > >>> @@ -622,6 +622,15 @@ static int __init crash_save_vmcoreinfo_init(void) >>> subsys_initcall(crash_save_vmcoreinfo_init); >>> #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG) >>> + >>> +void __weak *arch_map_crash_pages(unsigned long paddr, unsigned long size) >>> +{ >>> + return NULL; >>> +} >>> + >>> +void __weak arch_unmap_crash_pages(void **ptr) { } >>> +void __weak arch_crash_handle_hotplug_event(struct kimage *image, unsigned int hp_action) { } >>> + >> I was asked by Baoquan He to eliminate the use of __weak > > Because? > Baoquan pointed me to: https://lore.kernel.org/lkml/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com/T/ Thanks, eric ^ permalink raw reply [flat|nested] 26+ messages in thread
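As an aside, counting the "System RAM" regions currently registered is straightforward; the hard part, as Eric notes, is bounding how many regions hotplug could ever produce. A minimal sketch of the former, assuming walk_system_ram_res() from kernel/resource.c (the helper the kexec_file code already uses to walk System RAM); the wrapper name count_system_ram_regions() is made up for illustration and is not part of the patch series:

#include <linux/ioport.h>
#include <linux/kernel.h>

/* Callback invoked once per "System RAM" resource; just count them. */
static int count_one_ram_res(struct resource *res, void *arg)
{
	unsigned int *count = arg;

	(*count)++;
	return 0;
}

/* Illustrative only: number of System RAM regions present right now. */
static unsigned int count_system_ram_regions(void)
{
	unsigned int count = 0;

	walk_system_ram_res(0, ULONG_MAX, &count, count_one_ram_res);
	return count;
}

Such a count mirrors what /proc/iomem shows today, so it can only size the elfcorehdr for the current layout, not for the maximum that a later hot add could require.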
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-09-30 17:11 ` Eric DeVolder @ 2022-09-30 17:40 ` Borislav Petkov 2022-10-08 2:35 ` Baoquan He 2022-10-12 16:20 ` Eric DeVolder 0 siblings, 2 replies; 26+ messages in thread From: Borislav Petkov @ 2022-09-30 17:40 UTC (permalink / raw) To: Eric DeVolder Cc: Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec, ebiederm, dyoung, bhe, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On Fri, Sep 30, 2022 at 12:11:26PM -0500, Eric DeVolder wrote: > There is of course a way to enumerate the memory regions in use on the > machine, that is not what this code needs. In order to compute the maximum > buffer size needed (this buffer size is computed once), the count of the > maximum number of memory regions possible (even if not currently in use) is > what is needed. Isn't that max number documented somewhere in memory hotplug docs? Because then you don't need that Kconfig item either. Imagine you're a distro kernel distributor and you want crash to work on all machines your kernel works. So you go and set that number to max. And that would be the 99% of the kernel configs out there. Which means, you can just set it to max without a Kconfig item. > Oh, that would be an error of haste on my part. This should be: > depends on CRASH_DUMP && MEMORY_HOTPLUG You need a Kconfig item which enables all this gunk as MEMORY_HOTPLUG is not a omnipresent feature. And that Kconfig item should depend on the other Kconfig items of the technology you need. > Baoquan pointed me to: > > https://lore.kernel.org/lkml/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com/T/ In that thread says: "- arch_kexec_apply_relocations_add() is only overridden by x86 and s390. Retain the function prototype for those and move the weak implementation into the header as a static inline for other architectures." So yes, that's even better. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-09-30 17:40 ` Borislav Petkov @ 2022-10-08 2:35 ` Baoquan He 2022-10-12 17:46 ` Borislav Petkov 0 siblings, 1 reply; 26+ messages in thread From: Baoquan He @ 2022-10-08 2:35 UTC (permalink / raw) To: Borislav Petkov Cc: Eric DeVolder, Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 09/30/22 at 07:40pm, Borislav Petkov wrote: > On Fri, Sep 30, 2022 at 12:11:26PM -0500, Eric DeVolder wrote: > > There is of course a way to enumerate the memory regions in use on the > > machine, that is not what this code needs. In order to compute the maximum > > buffer size needed (this buffer size is computed once), the count of the > > maximum number of memory regions possible (even if not currently in use) is > > what is needed. > > Isn't that max number documented somewhere in memory hotplug docs? Memory hotplug is not limited by a certain or maximum number of memory regions, but by how large the linear mapping range is, into which physical memory can be mapped. E.g. on x86_64, with 4-level page tables, it has a 64TB linear mapping range by default. In principle, we can add 64TB of physical memory into the system altogether from boot and memory hotplug. Whereas with KASLR enabled, it has 10TB of linear mapping range by default, see CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING. That means only 10TB of physical memory is allowed to be added into the system. For the Kconfig CRASH_MAX_MEMORY_RANGES Eric added, it makes sense to me to set a fixed value which is enough in reality. Extreme testing with a special purpose could break it easily; people need to decide for themselves whether CONFIG_CRASH_MAX_MEMORY_RANGES should be enlarged or not. E.g. on x86_64, if we build a system with less than 64G of memory, the memory block size will be probed as 256M. Then, after bootup, we hot add many terabytes of physical memory with a shell script, but only every second memory block. It could be easier to manipulate this with virtio-mem. Please see the function probe_memory_block_size() on x86_64 for how the memory block size is probed. However, I don't think such a system could really exist in reality, booted with a tiny amount of memory and then having huge amounts of memory hot added sparsely. > > Because then you don't need that Kconfig item either. Imagine you're a > distro kernel distributor and you want crash to work on all machines > your kernel works. > > So you go and set that number to max. And that would be the 99% of the > kernel configs out there. > > Which means, you can just set it to max without a Kconfig item. > > > Oh, that would be an error of haste on my part. This should be: > > depends on CRASH_DUMP && MEMORY_HOTPLUG > > You need a Kconfig item which enables all this gunk as MEMORY_HOTPLUG is > not a omnipresent feature. And that Kconfig item should depend on the > other Kconfig items of the technology you need. > > > Baoquan pointed me to: > > > > https://lore.kernel.org/lkml/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com/T/ > > In that thread says: > > "- arch_kexec_apply_relocations_add() is only overridden by x86 and s390. > Retain the function prototype for those and move the weak > implementation into the header as a static inline for other > architectures." > > So yes, that's even better. > > -- > Regards/Gruss, > Boris.
> > https://people.kernel.org/tglx/notes-about-netiquette > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-08 2:35 ` Baoquan He @ 2022-10-12 17:46 ` Borislav Petkov 2022-10-12 20:19 ` Eric DeVolder 0 siblings, 1 reply; 26+ messages in thread From: Borislav Petkov @ 2022-10-12 17:46 UTC (permalink / raw) To: Baoquan He Cc: Eric DeVolder, Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On Sat, Oct 08, 2022 at 10:35:14AM +0800, Baoquan He wrote: > Memory hptlug is not limited by a certin or a max number of memory > regions, but limited by how large of the linear mapping range which > physical can be mapped into. Memory hotplug is not limited by some abstract range but by the *actual* possibility of how many DIMM slots on any motherboard can hotplug memory. Certainly not 32K. So you can choose a sane default which covers *all* actual systems out there. > For the Kconfig CRASH_MAX_MEMORY_RANGES Eric added, it's meaningful to > me to set a fixed value which is enough in reality. Yes, exactly. > For extreme testing with special purpose, it could be broken easily, > people need decide by self whether the CONFIG_CRASH_MAX_MEMORY_RANGES > is enlarged or not. I don't want for people to decide on one more thing where they have to go and read a bunch of specs just to know what is a good value. So we should set a sane, *practical* upper limit and simply go with it. Everything else is testing stuff and if you test the kernel, then you can change limits and values and so on as you want to. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-12 17:46 ` Borislav Petkov @ 2022-10-12 20:19 ` Eric DeVolder 2022-10-12 20:41 ` Borislav Petkov 2022-10-12 20:42 ` Eric DeVolder 0 siblings, 2 replies; 26+ messages in thread From: Eric DeVolder @ 2022-10-12 20:19 UTC (permalink / raw) To: Borislav Petkov, Baoquan He Cc: Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 10/12/22 12:46, Borislav Petkov wrote: > On Sat, Oct 08, 2022 at 10:35:14AM +0800, Baoquan He wrote: >> Memory hptlug is not limited by a certin or a max number of memory >> regions, but limited by how large of the linear mapping range which >> physical can be mapped into. > > Memory hotplug is not limited by some abstract range but by the *actual* > possibility of how many DIMM slots on any motherboard can hotplug > memory. Certainly not 32K. > > So you can choose a sane default which covers *all* actual systems out > there. We run here QEMU with the ability for 1024 DIMM slots. A DIMM can be any reasonable power of 2 size, and then that DIMM is further divided into memblocks, typically 128MiB. So, for example, 1TiB requires 1024 DIMMs of 1GiB each with 128MiB memblocks, that results in 8K possible memory regions. So just going to 4TiB reaches 32K memory regions. This I can attest for virtualized DIMMs, not sure about other memory hotplug technologies like virtio-mem or DynamicMemory. But it seems reasonable that those technologies could also easily reach into these number ranges. Eric > >> For the Kconfig CRASH_MAX_MEMORY_RANGES Eric added, it's meaningful to >> me to set a fixed value which is enough in reality. > > Yes, exactly. > >> For extreme testing with special purpose, it could be broken easily, >> people need decide by self whether the CONFIG_CRASH_MAX_MEMORY_RANGES >> is enlarged or not. > > I don't want for people to decide on one more thing where they have to > go and read a bunch of specs just to know what is a good value. So we > should set a sane, *practical* upper limit and simply go with it. > > Everything else is testing stuff and if you test the kernel, then you > can change limits and values and so on as you want to. > > Thx. > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-12 20:19 ` Eric DeVolder @ 2022-10-12 20:41 ` Borislav Petkov 2022-10-13 2:57 ` Baoquan He 2022-10-12 20:42 ` Eric DeVolder 1 sibling, 1 reply; 26+ messages in thread From: Borislav Petkov @ 2022-10-12 20:41 UTC (permalink / raw) To: Eric DeVolder Cc: Baoquan He, Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On Wed, Oct 12, 2022 at 03:19:19PM -0500, Eric DeVolder wrote: > We run here QEMU with the ability for 1024 DIMM slots. QEMU, haha. What is the highest count of DIMM slots which are hotpluggable on a real, *physical* system today? Are you saying you can have 1K DIMM slots on a board? I hardly doubt that. > So, for example, 1TiB requires 1024 DIMMs of 1GiB each with 128MiB > memblocks, that results in 8K possible memory regions. So just going > to 4TiB reaches 32K memory regions. Lemme see if I understand this correctly: when a system like that crashes, you want to kdump *all* those 4TiB in a vmcore? How long would that dump take to complete? A day? IOW, how does a realistic use case of this look like - not a QEMU one? Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-12 20:41 ` Borislav Petkov @ 2022-10-13 2:57 ` Baoquan He 2022-10-25 10:31 ` Borislav Petkov 0 siblings, 1 reply; 26+ messages in thread From: Baoquan He @ 2022-10-13 2:57 UTC (permalink / raw) To: Borislav Petkov Cc: Eric DeVolder, Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 10/12/22 at 10:41pm, Borislav Petkov wrote: > On Wed, Oct 12, 2022 at 03:19:19PM -0500, Eric DeVolder wrote: > > We run here QEMU with the ability for 1024 DIMM slots. > > QEMU, haha. > > What is the highest count of DIMM slots which are hotpluggable on a > real, *physical* system today? Are you saying you can have 1K DIMM slots > on a board? The concern to range number mainly is on Virt guest systems. On baremetal system, basically only very high end server support memory hotplug. I ever visited customer's lab and saw one server, it owns 8 slots, on each slot a box containing about 20 cpus and 2T memory at most can be plugged in at one time. So people won't make too many slots for hotplugging since it's too expensive. I checked user space kexec code, the maximum memory range number is honored to x86_64 because of a HPE SGI system. After that, nobody complains about it. Please see below user space kexec-tools commit in https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git The memory ranges may be not all made by different DIMM slots, could be firmware reservatoin, e.g efi/BIOS diggged out physical memory, or the cpu logical address space is occupied by pci or other stuffs. I don't have a HPE SGI system at hand to check. commit 4a6d67d9e938a7accf128aff23f8ad4bda67f729 Author: Xunlei Pang <xlpang@redhat.com> Date: Thu Mar 23 19:16:59 2017 +0800 x86: Support large number of memory ranges We got a problem on one SGI 64TB machine, the current kexec-tools failed to work due to the insufficient ranges(MAX_MEMORY_RANGES) allowed which is defined as 1024(less than the ranges on the machine). The kcore header is insufficient due to the same reason as well. To solve this, this patch simply doubles "MAX_MEMORY_RANGES" and "KCORE_ELF_HEADERS_SIZE". Signed-off-by: Xunlei Pang <xlpang@redhat.com> Tested-by: Frank Ramsay <frank.ramsay@hpe.com> Signed-off-by: Simon Horman <horms@verge.net.au> diff --git a/kexec/arch/i386/kexec-x86.h b/kexec/arch/i386/kexec-x86.h index 33df3524f4e2..51855f8db762 100644 --- a/kexec/arch/i386/kexec-x86.h +++ b/kexec/arch/i386/kexec-x86.h @@ -1,7 +1,7 @@ #ifndef KEXEC_X86_H #define KEXEC_X86_H -#define MAX_MEMORY_RANGES 1024 +#define MAX_MEMORY_RANGES 2048 > > I hardly doubt that. The questioning is reasonable. 32K truly looks too much. Now CONFIG_NR_CPUS has the maximum number as 8192. And user space kexec-tools has maximum memory range number as 2048. We can take the current 8192 + 2048 = 10K as default value conservatively. Or take 8192 + 2048 * 2 = 12K which has two times of maximum memory range bumber in kexec-tools. What do you think? > > > So, for example, 1TiB requires 1024 DIMMs of 1GiB each with 128MiB > > memblocks, that results in 8K possible memory regions. So just going > > to 4TiB reaches 32K memory regions. > > Lemme see if I understand this correctly: when a system like that > crashes, you want to kdump *all* those 4TiB in a vmcore? How long would > that dump take to complete? A day? That is not a problem. 
The time of vmcore dumping mainly depends on the actual memory size, not on the number of memory ranges. When dumping the vmcore, people use makedumpfile to filter zero pages, free pages, cache pages, or user data pages according to configuration. If memory is huge, they can use nr_cpus=x to enable multiple CPUs to do multi-threaded dumping. Kdump now supports dumping vmcores larger than 10 TB. ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-13 2:57 ` Baoquan He @ 2022-10-25 10:31 ` Borislav Petkov 2022-10-26 14:48 ` Baoquan He 0 siblings, 1 reply; 26+ messages in thread From: Borislav Petkov @ 2022-10-25 10:31 UTC (permalink / raw) To: Baoquan He Cc: Eric DeVolder, Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On Thu, Oct 13, 2022 at 10:57:28AM +0800, Baoquan He wrote: > The concern to range number mainly is on Virt guest systems. And why would virt emulate 1K hotpluggable DIMM slots and not emulate a real machine? > On baremetal system, basically only very high end server support > memory hotplug. I ever visited customer's lab and saw one server, > it owns 8 slots, on each slot a box containing about 20 cpus and 2T > memory at most can be plugged in at one time. So people won't make too > many slots for hotplugging since it's too expensive. There you have it - the persuading argument. > I checked user space kexec code, the maximum memory range number is > honored to x86_64 because of a HPE SGI system. After that, nobody > complains about it. Please see below user space kexec-tools commit in > https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git > > The memory ranges may be not all made by different DIMM slots, could be > firmware reservatoin, e.g efi/BIOS diggged out physical memory, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I don't know what that means. If it is firmware crap, you want to exclude that from kdump anyway. > Now CONFIG_NR_CPUS has the maximum number as 8192. And user space > kexec-tools has maximum memory range number as 2048. We can take > the current 8192 + 2048 = 10K as default value conservatively. Or > take 8192 + 2048 * 2 = 12K which has two times of maximum memory range > bumber in kexec-tools. What do you think? I still think that we should stick to reality and support what is possible not what is potentially and theoretically there. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-25 10:31 ` Borislav Petkov @ 2022-10-26 14:48 ` Baoquan He 2022-10-26 14:54 ` David Hildenbrand 2022-10-27 19:24 ` Eric DeVolder 0 siblings, 2 replies; 26+ messages in thread From: Baoquan He @ 2022-10-26 14:48 UTC (permalink / raw) To: Borislav Petkov, david Cc: Eric DeVolder, Oscar Salvador, Andrew Morton, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 10/25/22 at 12:31pm, Borislav Petkov wrote: > On Thu, Oct 13, 2022 at 10:57:28AM +0800, Baoquan He wrote: > > The concern to range number mainly is on Virt guest systems. > > And why would virt emulate 1K hotpluggable DIMM slots and not emulate a > real machine? Well, currently, memory hotplug is an important feature on virt systems for dynamically growing/shrinking the memory of a system. If it only emulated a real machine, it would be no different from a bare metal system. IIRC, the balloon driver or the virtio-mem feature can add a memory board, e.g. 1G with a 128M block size, so 8 blocks added. When shrinking this 1G of memory later, it takes a best-effort approach to hot remove memory. That means if a memory block is occupied, it is kept in place. In the end we may only be able to remove every second block, 4 blocks altogether. The remaining un-removed blocks then produce 4 separate memory regions. Like this, a virt guest can end up with many memory regions in the kernel after memory has been added and removed. If I am wrong, please correct me, David. > > > On baremetal system, basically only very high end server support > > memory hotplug. I ever visited customer's lab and saw one server, > > it owns 8 slots, on each slot a box containing about 20 cpus and 2T > > memory at most can be plugged in at one time. So people won't make too > > many slots for hotplugging since it's too expensive. > > There you have it - the persuading argument. > > > I checked user space kexec code, the maximum memory range number is > > honored to x86_64 because of a HPE SGI system. After that, nobody > > complains about it. Please see below user space kexec-tools commit in > > https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git > > > > The memory ranges may be not all made by different DIMM slots, could be > > firmware reservatoin, e.g efi/BIOS diggged out physical memory, > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > I don't know what that means. > > If it is firmware crap, you want to exclude that from kdump anyway. Yes, now assume we have an HPE SGI system and it has memory hotplug capability. The system itself already has more than 1024 memory regions. When we then hot add an extra memory board, we want to include the newly added memory regions in the elfcorehdr so that they will be dumped out in the kdump kernel. That's why I earlier suggested 2048 for the number of memory regions. > > > Now CONFIG_NR_CPUS has the maximum number as 8192. And user space > > kexec-tools has maximum memory range number as 2048. We can take > > the current 8192 + 2048 = 10K as default value conservatively. Or > > take 8192 + 2048 * 2 = 12K which has two times of maximum memory range > > bumber in kexec-tools. What do you think? > > I still think that we should stick to reality and support what is > possible not what is potentially and theoretically there. Yes, agreed. We should try to get a number which satisfies the needs in reality.
For Kconfig CRASH_MAX_MEMORY_RANGES in this patch, I have three items to suggest: 1) The name is not good; it doesn't reflect the fact that it's the number of program headers of the elfcorehdr, which includes the cpu note count and the memory region count. 2) For the default cpu number, I suggest 512 or 1024. The biggest number I ever saw in reality is 384. On virt systems, it won't be too big. Below is abstracted from arch/x86/Kconfig. A smaller one is also OK; we can enlarge it when people really have a super machine and run into the problem. config NR_CPUS_DEFAULT int depends on X86_64 default 8192 if MAXSMP default 64 if SMP default 1 if !SMP 3) For memory regions, I would suggest 2048. Likewise, a smaller value is also fine; we can enlarge it when a real system runs into this. I made a draft here for reference, based on my understanding. Please feel free to change it. +config CRASH_ELF_CORE_PHDRS_NUM + depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG) + int + default 3072 + help + For the kexec_file_load path, specify the default number of + phdr for the vmcore. E.g the memory regions represented by the + 'System RAM' entries in /proc/iomem, the cpu notes of each + present cpu stored in /sys/devices/system/cpu/cpuX/crash_notes. Thanks ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-26 14:48 ` Baoquan He @ 2022-10-26 14:54 ` David Hildenbrand 2022-10-27 13:52 ` Baoquan He 2022-10-27 19:24 ` Eric DeVolder 1 sibling, 1 reply; 26+ messages in thread From: David Hildenbrand @ 2022-10-26 14:54 UTC (permalink / raw) To: Baoquan He, Borislav Petkov Cc: Eric DeVolder, Oscar Salvador, Andrew Morton, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 26.10.22 16:48, Baoquan He wrote: > On 10/25/22 at 12:31pm, Borislav Petkov wrote: >> On Thu, Oct 13, 2022 at 10:57:28AM +0800, Baoquan He wrote: >>> The concern to range number mainly is on Virt guest systems. >> >> And why would virt emulate 1K hotpluggable DIMM slots and not emulate a >> real machine? IIRC, ACPI only allows for 256 slots. PPC dlpar might provide more. > > Well, currently, mem hotpug is an important feature on virt system to > dynamically increase/shrink memory on the system. If only emulating real > machine, it won't be different than bare metal system. > > IIRC, the ballon driver or virtio-mem feature can add memory board, e.g > 1G, block size is 128M, 8 blocks added. When shrinking this 1G memory > later, it will take best effort way to hot remove memory. Means if any > memory block is occupied, it will be kept there. Finally we could only > remove every second blocks, 4 blocks altogether. Then the left > un-removed blocks will produce 4 separate memory regions. Like this, a > virt guest could have many memory regions in kernel after memory > being added/removed. > > If I am wrong, Please correct me, David. Yes, virtio-mem (but also PPC dlpar) can result in many individual memory blocks with holes in between after hotunplug. Hotplug OTOH, usually tries to "plug" these holes and reduce the total number of memory blocks. It might be rare that our range will be heavily fragmented after unplug, but it's certainly possible. [...] > > Yes, now assume we have a HPE SGI system and it has memory hotplug > capacity. The system itself has already got memory regions more than > 1024. Then when we hot add extra memory board, we want to include the > newly added memory regions into elfcorehdr so that it will be dumped out > in kdump kernel. > > That's why I earlier suggested 2048 for number of memory regions. The more the better, unless "it hurts". Assuming a single memory block is 128 MiB, that would be 256 GiB. Usually, on big systems, the memory block size is 2 GiB. So 4 TiB. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 26+ messages in thread
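To put rough numbers on that (a worked example only, using the block sizes David mentions): the range count only matters once memory is fragmented. A fully contiguous 1 TiB is a single "System RAM" entry and needs one phdr, but with a 128 MiB block size it is 1 TiB / 128 MiB = 8192 blocks, and if every second block were offline that becomes 4096 disjoint ranges. Read that way, a budget of 2048 ranges covers 2048 * 128 MiB = 256 GiB of worst-case fragmented memory, or 2048 * 2 GiB = 4 TiB with 2 GiB blocks, which is where the 256 GiB and 4 TiB figures above come from.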
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-26 14:54 ` David Hildenbrand @ 2022-10-27 13:52 ` Baoquan He 2022-10-27 19:28 ` Eric DeVolder 0 siblings, 1 reply; 26+ messages in thread From: Baoquan He @ 2022-10-27 13:52 UTC (permalink / raw) To: David Hildenbrand, Borislav Petkov, Eric DeVolder Cc: Oscar Salvador, Andrew Morton, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 10/26/22 at 04:54pm, David Hildenbrand wrote: > On 26.10.22 16:48, Baoquan He wrote: > > On 10/25/22 at 12:31pm, Borislav Petkov wrote: > > > On Thu, Oct 13, 2022 at 10:57:28AM +0800, Baoquan He wrote: > > > > The concern to range number mainly is on Virt guest systems. > > > > > > And why would virt emulate 1K hotpluggable DIMM slots and not emulate a > > > real machine? > > IIRC, ACPI only allows for 256 slots. PPC dlpar might provide more. > > > > > Well, currently, mem hotpug is an important feature on virt system to > > dynamically increase/shrink memory on the system. If only emulating real > > machine, it won't be different than bare metal system. > > > > IIRC, the ballon driver or virtio-mem feature can add memory board, e.g > > 1G, block size is 128M, 8 blocks added. When shrinking this 1G memory > > later, it will take best effort way to hot remove memory. Means if any > > memory block is occupied, it will be kept there. Finally we could only > > remove every second blocks, 4 blocks altogether. Then the left > > un-removed blocks will produce 4 separate memory regions. Like this, a > > virt guest could have many memory regions in kernel after memory > > being added/removed. > > > > If I am wrong, Please correct me, David. > > Yes, virtio-mem (but also PPC dlpar) can result in many individual memory > blocks with holes in between after hotunplug. Hotplug OTOH, usually tries to > "plug" these holes and reduce the total number of memory blocks. It might be > rare that our range will be heavily fragmented after unplug, but it's > certainly possible. > > [...] > > > > > Yes, now assume we have a HPE SGI system and it has memory hotplug > > capacity. The system itself has already got memory regions more than > > 1024. Then when we hot add extra memory board, we want to include the > > newly added memory regions into elfcorehdr so that it will be dumped out > > in kdump kernel. > > > > That's why I earlier suggested 2048 for number of memory regions. > > The more the better, unless "it hurts". Assuming a single memory block is > 128 MiB, that would be 256 GiB. > > Usually, on big systems, the memory block size is 2 GiB. So 4 TiB. Thanks a lot for these valuable inputs, David. Hi Boris, Eric So what's your suggested value for the Kconfig option? 1) cpu number, 1024? 2) memory regions, 2048? About below draft, any comment? We can decide a value based on our knowledge, can adjust later if any real system has more than the number. +config CRASH_ELF_CORE_PHDRS_NUM + depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG) + int + default 3072 + help + For the kexec_file_load path, specify the default number of + phdr for the vmcore. E.g the memory regions represented by the + 'System RAM' entries in /proc/iomem, the cpu notes of each + present cpu stored in /sys/devices/system/cpu/cpuX/crash_notes. Thanks Baoquan ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-27 13:52 ` Baoquan He @ 2022-10-27 19:28 ` Eric DeVolder 2022-10-29 4:27 ` Baoquan He 0 siblings, 1 reply; 26+ messages in thread From: Eric DeVolder @ 2022-10-27 19:28 UTC (permalink / raw) To: Baoquan He, David Hildenbrand, Borislav Petkov Cc: Oscar Salvador, Andrew Morton, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 10/27/22 08:52, Baoquan He wrote: > On 10/26/22 at 04:54pm, David Hildenbrand wrote: >> On 26.10.22 16:48, Baoquan He wrote: >>> On 10/25/22 at 12:31pm, Borislav Petkov wrote: >>>> On Thu, Oct 13, 2022 at 10:57:28AM +0800, Baoquan He wrote: >>>>> The concern to range number mainly is on Virt guest systems. >>>> >>>> And why would virt emulate 1K hotpluggable DIMM slots and not emulate a >>>> real machine? >> >> IIRC, ACPI only allows for 256 slots. PPC dlpar might provide more. >> >>> >>> Well, currently, mem hotpug is an important feature on virt system to >>> dynamically increase/shrink memory on the system. If only emulating real >>> machine, it won't be different than bare metal system. >>> >>> IIRC, the ballon driver or virtio-mem feature can add memory board, e.g >>> 1G, block size is 128M, 8 blocks added. When shrinking this 1G memory >>> later, it will take best effort way to hot remove memory. Means if any >>> memory block is occupied, it will be kept there. Finally we could only >>> remove every second blocks, 4 blocks altogether. Then the left >>> un-removed blocks will produce 4 separate memory regions. Like this, a >>> virt guest could have many memory regions in kernel after memory >>> being added/removed. >>> >>> If I am wrong, Please correct me, David. >> >> Yes, virtio-mem (but also PPC dlpar) can result in many individual memory >> blocks with holes in between after hotunplug. Hotplug OTOH, usually tries to >> "plug" these holes and reduce the total number of memory blocks. It might be >> rare that our range will be heavily fragmented after unplug, but it's >> certainly possible. >> >> [...] >> >>> >>> Yes, now assume we have a HPE SGI system and it has memory hotplug >>> capacity. The system itself has already got memory regions more than >>> 1024. Then when we hot add extra memory board, we want to include the >>> newly added memory regions into elfcorehdr so that it will be dumped out >>> in kdump kernel. >>> >>> That's why I earlier suggested 2048 for number of memory regions. >> >> The more the better, unless "it hurts". Assuming a single memory block is >> 128 MiB, that would be 256 GiB. >> >> Usually, on big systems, the memory block size is 2 GiB. So 4 TiB. > > Thanks a lot for these valuable inputs, David. > > Hi Boris, Eric > > So what's your suggested value for the Kconfig option? > > 1) cpu number, 1024? > 2) memory regions, 2048? > > About below draft, any comment? We can decide a value based on our > knowledge, can adjust later if any real system has more than the number. > > +config CRASH_ELF_CORE_PHDRS_NUM > + depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG) > + int > + default 3072 > + help > + For the kexec_file_load path, specify the default number of > + phdr for the vmcore. E.g the memory regions represented by the > + 'System RAM' entries in /proc/iomem, the cpu notes of each > + present cpu stored in /sys/devices/system/cpu/cpuX/crash_notes. 
> > Thanks > Baoquan > I prefer to keep CRASH_MAX_MEMORY_RANGES, as explained in my response to your message on October 26. eric ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-27 19:28 ` Eric DeVolder @ 2022-10-29 4:27 ` Baoquan He 0 siblings, 0 replies; 26+ messages in thread From: Baoquan He @ 2022-10-29 4:27 UTC (permalink / raw) To: Eric DeVolder Cc: David Hildenbrand, Borislav Petkov, Oscar Salvador, Andrew Morton, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 10/27/22 at 02:28pm, Eric DeVolder wrote: > > > On 10/27/22 08:52, Baoquan He wrote: > > On 10/26/22 at 04:54pm, David Hildenbrand wrote: > > > On 26.10.22 16:48, Baoquan He wrote: > > > > On 10/25/22 at 12:31pm, Borislav Petkov wrote: > > > > > On Thu, Oct 13, 2022 at 10:57:28AM +0800, Baoquan He wrote: > > > > > > The concern to range number mainly is on Virt guest systems. > > > > > > > > > > And why would virt emulate 1K hotpluggable DIMM slots and not emulate a > > > > > real machine? > > > > > > IIRC, ACPI only allows for 256 slots. PPC dlpar might provide more. > > > > > > > > > > > Well, currently, mem hotpug is an important feature on virt system to > > > > dynamically increase/shrink memory on the system. If only emulating real > > > > machine, it won't be different than bare metal system. > > > > > > > > IIRC, the ballon driver or virtio-mem feature can add memory board, e.g > > > > 1G, block size is 128M, 8 blocks added. When shrinking this 1G memory > > > > later, it will take best effort way to hot remove memory. Means if any > > > > memory block is occupied, it will be kept there. Finally we could only > > > > remove every second blocks, 4 blocks altogether. Then the left > > > > un-removed blocks will produce 4 separate memory regions. Like this, a > > > > virt guest could have many memory regions in kernel after memory > > > > being added/removed. > > > > > > > > If I am wrong, Please correct me, David. > > > > > > Yes, virtio-mem (but also PPC dlpar) can result in many individual memory > > > blocks with holes in between after hotunplug. Hotplug OTOH, usually tries to > > > "plug" these holes and reduce the total number of memory blocks. It might be > > > rare that our range will be heavily fragmented after unplug, but it's > > > certainly possible. > > > > > > [...] > > > > > > > > > > > Yes, now assume we have a HPE SGI system and it has memory hotplug > > > > capacity. The system itself has already got memory regions more than > > > > 1024. Then when we hot add extra memory board, we want to include the > > > > newly added memory regions into elfcorehdr so that it will be dumped out > > > > in kdump kernel. > > > > > > > > That's why I earlier suggested 2048 for number of memory regions. > > > > > > The more the better, unless "it hurts". Assuming a single memory block is > > > 128 MiB, that would be 256 GiB. > > > > > > Usually, on big systems, the memory block size is 2 GiB. So 4 TiB. > > > > Thanks a lot for these valuable inputs, David. > > > > Hi Boris, Eric > > > > So what's your suggested value for the Kconfig option? > > > > 1) cpu number, 1024? > > 2) memory regions, 2048? > > > > About below draft, any comment? We can decide a value based on our > > knowledge, can adjust later if any real system has more than the number. > > > > +config CRASH_ELF_CORE_PHDRS_NUM > > + depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG) > > + int > > + default 3072 > > + help > > + For the kexec_file_load path, specify the default number of > > + phdr for the vmcore. 
E.g the memory regions represented by the > > + 'System RAM' entries in /proc/iomem, the cpu notes of each > > + present cpu stored in /sys/devices/system/cpu/cpuX/crash_notes. > > > > Thanks > > Baoquan > > > > I prefer to keep CRASH_MAX_MEMORY_RANGES, as explained in my response to your message on October 26. > eric Ah, sorry, I mixed it up with NR_CPUS. I was on an office outing yesterday; glad to see you and Boris have reached agreement on the code change and value. Thanks. > ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-26 14:48 ` Baoquan He 2022-10-26 14:54 ` David Hildenbrand @ 2022-10-27 19:24 ` Eric DeVolder 2022-10-28 10:19 ` Borislav Petkov 1 sibling, 1 reply; 26+ messages in thread From: Eric DeVolder @ 2022-10-27 19:24 UTC (permalink / raw) To: Baoquan He, Borislav Petkov, david Cc: Oscar Salvador, Andrew Morton, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 10/26/22 09:48, Baoquan He wrote: > On 10/25/22 at 12:31pm, Borislav Petkov wrote: >> On Thu, Oct 13, 2022 at 10:57:28AM +0800, Baoquan He wrote: >>> The concern to range number mainly is on Virt guest systems. >> >> And why would virt emulate 1K hotpluggable DIMM slots and not emulate a >> real machine? > > Well, currently, mem hotpug is an important feature on virt system to > dynamically increase/shrink memory on the system. If only emulating real > machine, it won't be different than bare metal system. > > IIRC, the ballon driver or virtio-mem feature can add memory board, e.g > 1G, block size is 128M, 8 blocks added. When shrinking this 1G memory > later, it will take best effort way to hot remove memory. Means if any > memory block is occupied, it will be kept there. Finally we could only > remove every second blocks, 4 blocks altogether. Then the left > un-removed blocks will produce 4 separate memory regions. Like this, a > virt guest could have many memory regions in kernel after memory > being added/removed. > > If I am wrong, Please correct me, David. > >> >>> On baremetal system, basically only very high end server support >>> memory hotplug. I ever visited customer's lab and saw one server, >>> it owns 8 slots, on each slot a box containing about 20 cpus and 2T >>> memory at most can be plugged in at one time. So people won't make too >>> many slots for hotplugging since it's too expensive. >> >> There you have it - the persuading argument. So after re-reading the exchanges, many times, I think I realized that I have introduced confusion by using "hotplug", and specifically "memory hotplug" and DIMMs in the same breath, and thus that perhaps equated hotplug with ACPI DIMMs in these discussions. Allow me to state that "hotplug" in this patch series refers to CPU and memory hot un/plug, and does *not* explicitly refer to any particular underlying technology to generate CPU and/or memory hot un/plug events. For example, I have been using DIMMs as my example because that has been my test vehicle for exercising this code; as such I think the discussion cornered itself into real world vs virt discussion about DIMMs, with no end in sight. To be plain, this patch series does not intend to convey or change anything specific about ACPI DIMMs. In reality, when I state "hotplug" in these patches, I am talking generically and therefore inclusive of any technology that can hot un/plug CPUs or memory. For memory specifically, this includes ACPI DIMMs (whether baremetal or QEMU), ballooning, virtio-mem, PPC dlpar (per David), Microsoft DynamicMemory, and the upcoming CXL.mem technology. Probably others that I am not aware. Any of these technologies can add or remove memory from a bare metal and/or virtual system. I apologize. What is important is the number of memory regions (ie. /proc/iomem entries) that can be considered to be the maximum. There is no kernel definition of such. 
The need to identify a maximum number is so that the buffer containing the elfcorehdr can be sized and allocated at kdump load time, *once*. This elfcorehdr buffer is then modified/updated repeatedly as hot un/plug events occur. It is *not* re-allocated on each hot un/plug event; that is what the current solution does, sort of. >> >>> I checked user space kexec code, the maximum memory range number is >>> honored to x86_64 because of a HPE SGI system. After that, nobody >>> complains about it. Please see below user space kexec-tools commit in >>> https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git >>> >>> The memory ranges may be not all made by different DIMM slots, could be >>> firmware reservatoin, e.g efi/BIOS diggged out physical memory, >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >> >> I don't know what that means. >> >> If it is firmware crap, you want to exclude that from kdump anyway. > > Yes, now assume we have a HPE SGI system and it has memory hotplug > capacity. The system itself has already got memory regions more than > 1024. Then when we hot add extra memory board, we want to include the > newly added memory regions into elfcorehdr so that it will be dumped out > in kdump kernel. > > That's why I earlier suggested 2048 for number of memory regions. > >> >>> Now CONFIG_NR_CPUS has the maximum number as 8192. And user space >>> kexec-tools has maximum memory range number as 2048. We can take >>> the current 8192 + 2048 = 10K as default value conservatively. Or >>> take 8192 + 2048 * 2 = 12K which has two times of maximum memory range >>> bumber in kexec-tools. What do you think? >> >> I still think that we should stick to reality and support what is >> possible not what is potentially and theoretically there. > > Yes, agree. We should try to get a number which satisfies needs in > reality. > > For Kconfig CRASH_MAX_MEMORY_RANGES in this patch, I have three items to > suggest: > > 1) the name is not good, it doesn't reflect the fact that it's the > number of program headers of elfcorehdr which includes the cpu note > numbers and memory region numers. The total number of program headers is, generally speaking, the number of CPUs and the number of memory regions (plus one for VMCOREINFO and maybe a few others). The NR_CPUS_DEFAULT conveys the maximum number of CPUs possible, and likewise the CRASH_MAX_MEMORY_RANGES is intended to convey the maximum number of memory regions possible. It is not misnamed, imho; rather, I think due to the confusion I outline above, misunderstood. > > 2) default cpu number, I suggest 512 or 1024. The biggest number I > ever saw in reality is 384. On virt system, it won't be too big. Below > is abstracted from arch/x86/Kconfig. A smaller one is also OK, we can > enlarge it when people really have a super machine and run into the > problem. > > config NR_CPUS_DEFAULT > int > depends on X86_64 > default 8192 if MAXSMP > default 64 if SMP > default 1 if !SMP I'm all for making this a sane number; I'm just not sure this patch series is the place to do this. > > 3) For memory regions, I would suggest 2048. Likewise, smaller value is > also fine, we can enlarge it when a real system run into this. As David points out, if the memblock size is 128MiB, then 2048 allows for 256GiB, which I believe to be too low for a maximum default. I'd at least target 1TiB with 128MiB memblock size, which would put the number at 8192. Should the memblock size be 2GiB, for really big systems, then 8192 entries allow handling 16TiB.
With sizeof(Elf64_Phdr) being 56 bytes, that means the elfcorehdr buffer/memory segment is roughly 448KiB. Be aware, in reality, that if the system was fully populated, it would not actually consume all 8192 phdrs. Rather /proc/iomem would essentially show a large contiguous address space which would require just a single phdr. The reason to consider having the larger number of phdrs is so that if the memory becomes fragmented due to hot un/plug events, then you need the phdrs to record the sparse mapping. The pathological case is that every other memblock is offlined/removed; not likely, but possible. > > I made a draft here for reference, with my undertanding. Please feel > free to change it. > > +config CRASH_ELF_CORE_PHDRS_NUM > + depends on CRASH_DUMP && KEXEC_FILE && (HOTPLUG_CPU || MEMORY_HOTPLUG) > + int > + default 3072 > + help > + For the kexec_file_load path, specify the default number of > + phdr for the vmcore. E.g the memory regions represented by the > + 'System RAM' entries in /proc/iomem, the cpu notes of each > + present cpu stored in /sys/devices/system/cpu/cpuX/crash_notes. > > Thanks > I'd prefer keeping CRASH_MAX_MEMORY_RANGES as that allows the maximum phdr number value to be reflective of CPUs and/or memory; not all systems support both CPU and memory hotplug. For example, I have queued up this change to reflect this: if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) { /* Ensure elfcorehdr segment large enough for hotplug changes */ unsigned long pnum = 2; /* VMCOREINFO and kernel_map */ if (IS_ENABLED(CONFIG_HOTPLUG_CPU)) pnum += CONFIG_NR_CPUS_DEFAULT; if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) pnum += CONFIG_CRASH_MAX_MEMORY_RANGES; if (pnum < (unsigned long)PN_XNUM) { kbuf.memsz = pnum * sizeof(Elf64_Phdr); kbuf.memsz += sizeof(Elf64_Ehdr); image->elfcorehdr_index = image->nr_segments; image->elfcorehdr_index_valid = true; /* Mark as usable to crash kernel, else crash kernel fails on boot */ image->elf_headers_sz = kbuf.memsz; } else { pr_err("number of Phdrs %lu exceeds max %lu\n", pnum, (unsigned long)PN_XNUM); } } I hope this helps clarify my intentions with this patch series. Thanks, eric ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-27 19:24 ` Eric DeVolder @ 2022-10-28 10:19 ` Borislav Petkov 2022-10-28 15:29 ` Eric DeVolder 0 siblings, 1 reply; 26+ messages in thread From: Borislav Petkov @ 2022-10-28 10:19 UTC (permalink / raw) To: Eric DeVolder Cc: Baoquan He, david, Oscar Salvador, Andrew Morton, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On Thu, Oct 27, 2022 at 02:24:11PM -0500, Eric DeVolder wrote: > Be aware, in reality, that if the system was fully populated, it would not > actually consume all 8192 phdrs. Rather /proc/iomem would essentially show a > large contiguous address space which would require just a single phdr. Then that from below: pnum += CONFIG_CRASH_MAX_MEMORY_RANGES; which then would end up allocating 8192 would be a total waste. So why don't you make that number dynamic then? You start with something sensible: total_num_pheaders = num_online_cpus() + "some number of regions" + "some few others" I.e., a number which is a good compromise on the majority of machines. Then, on hotplug events you count how many new regions are coming in and when you reach the total_num_pheaders number, you double it (or some other increase stragegy), reallocate the ELF header buffers etc needed for kdump and you're good. This way, you don't waste memory unnecessarily on the majority of systems and those who need more, get to allocate more. > I'd prefer keeping CRASH_MAX_MEMORY_RANGES as that allow the maximum phdr > number value to be reflective of CPUs and/or memory; not all systems support > both CPU and memory hotplug. For example, I have queued up this change to > reflect this: > > if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) { If you're going to keep CRASH_MAX_MEMORY_RANGES, then you can test only that thing as it expresses the dependency on CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG already. If you end up making the number dynamic, then you could make that a different Kconfig item which contains all that crash code as most of the people won't need it anyway. Hmm? -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 26+ messages in thread
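A rough sketch of that growth strategy, for illustration only; the resize helper and the baseline numbers below are hypothetical, not from the patch series, and Eric explains in the follow-up why resizing the elfcorehdr segment after load is not as simple as it looks:

#include <linux/cpumask.h>
#include <linux/elf.h>

/* Hypothetical helper, not in the tree: resize and re-post the elfcorehdr segment. */
int crash_resize_elfcorehdr(unsigned long new_bytes);

static unsigned long total_num_phdrs;

/* Double the reserved phdr count whenever a hotplug event would exceed it. */
static int crash_ensure_phdr_capacity(unsigned long needed)
{
	unsigned long want = total_num_phdrs;

	if (!want)	/* sensible starting point, as suggested above */
		want = num_online_cpus() + 256 /* memory regions */ + 16 /* misc */;

	while (needed > want)
		want *= 2;

	if (want == total_num_phdrs)
		return 0;

	total_num_phdrs = want;
	return crash_resize_elfcorehdr(want * sizeof(Elf64_Phdr) + sizeof(Elf64_Ehdr));
}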
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support 2022-10-28 10:19 ` Borislav Petkov @ 2022-10-28 15:29 ` Eric DeVolder 2022-10-28 17:06 ` Borislav Petkov 0 siblings, 1 reply; 26+ messages in thread From: Eric DeVolder @ 2022-10-28 15:29 UTC (permalink / raw) To: Borislav Petkov Cc: Baoquan He, david, Oscar Salvador, Andrew Morton, linux-kernel, x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm On 10/28/22 05:19, Borislav Petkov wrote: > On Thu, Oct 27, 2022 at 02:24:11PM -0500, Eric DeVolder wrote: >> Be aware, in reality, that if the system was fully populated, it would not >> actually consume all 8192 phdrs. Rather /proc/iomem would essentially show a >> large contiguous address space which would require just a single phdr. > > Then that from below: > > pnum += CONFIG_CRASH_MAX_MEMORY_RANGES; > > which then would end up allocating 8192 would be a total waste. > > So why don't you make that number dynamic then? > > You start with something sensible: > > total_num_pheaders = num_online_cpus() + "some number of regions" + "some few others" > > I.e., a number which is a good compromise on the majority of machines. > > Then, on hotplug events you count how many new regions are coming in > and when you reach the total_num_pheaders number, you double it (or some > other increase stragegy), reallocate the ELF header buffers etc needed > for kdump and you're good. > > This way, you don't waste memory unnecessarily on the majority of > systems and those who need more, get to allocate more. This patch series sizes and allocates the memory buffer/segment for the elfcorehdr once, at kdump load time. In order to dynamically resize the elcorehdr memory buffer/segment, that causes the following ripple effects: - Permitting resizing of the elfcorehdr requires a means to "allocate" a new size buffer from within the crash kernel reserved area. There is no allocator today; currently it is a kind of one-pass placement process that happens at load time. The critical side effect of allocating a new elfcorehdr buffer memory/segment is that it creates a new address for the elfcorehdr. - The elfcorehdr is passed to the crash kernel via the elfcorehdr= kernel cmdline option. As such, a dynamic change to the size of the elfcorehdr size necessarily invites a change of address of that buffer, and therefore a change to rewrite the crash kernel cmdline to reflect the new elfcorehdr buffer address. - A change to the cmdline, also invites a possible change of address of the buffer containing the cmdline, and thus a change to the x86 boot_params, which contains the cmdline pointer. - A change to the cmdline and/or boot_params, which are *not* excluded from the hash/digest, means that the purgatory hash/digest needs to be recomputed, and purgatory re-linked with the new hash/digest and replaced. A fair amount of work, but I have had this working in the past, around the v1 patch series timeframe. However, it only worked for the kexec_file_load() syscall as all the needed pieces of information were available; but for kexec_load(), it isn't possible to relink purgatory as by that point purgatory is but a user-space binary blob. 
It was feedback on the v1/v2 that pointed out that by excluding the elfcorehdr from the hash/digest, the "change of address" problem with the elfcorehdr buffer/segment goes away, and, in turn, negates the need to: introduce an allocator for the crash kernel reserved space, rewrite the crash kernel cmdline with a new elfcorehdr, update boot_params with a new cmdline and re-link and replace purgatory with the updated digest. And it enables this hotplug efforts to support kexec_load() syscall as well. So it is with this in mind that I suggest we stay with the statically sized elfcorehdr buffer. If that can be agreed upon, then it is "just a matter" of picking a useful elfcorehdr size. Currently that size is derived from the NR_DEFAULT_CPUS and CRASH_MAX_MEMORY_RANGES. So, there is still the CRASH_MAX_MEMORY_RANGES knob to help a dial in size, should there be some issue with the default value/size. Or if there is desire to drop computing the size from NR_DEFAULT_CPUs and CRASH_MAX_MEMORY_RANGES and simply go with CRASH_HOTPLUG_ELFCOREHDR_SZ which simply specifies the buffer size, then I'm also good with that. I still owe a much better explanation of how to size the elfcorehdr. I can use the comments and ideas from the discussion to provide the necessary insight when choosing this value, whether that be CRASH_MAX_MEMORY_RANGES or CRASH_HOTPLUG_ELFCOREHDR_SZ. > >> I'd prefer keeping CRASH_MAX_MEMORY_RANGES as that allow the maximum phdr >> number value to be reflective of CPUs and/or memory; not all systems support >> both CPU and memory hotplug. For example, I have queued up this change to >> reflect this: >> >> if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) { > > If you're going to keep CRASH_MAX_MEMORY_RANGES, then you can test only > that thing as it expresses the dependency on CONFIG_HOTPLUG_CPU and > CONFIG_MEMORY_HOTPLUG already. > > If you end up making the number dynamic, then you could make that a > different Kconfig item which contains all that crash code as most of the > people won't need it anyway. It is my intention to correct the CRASH_MAX_MEMORY_RANGES (if we keep it) as such: config CRASH_MAX_MEMORY_RANGES depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG CRASH_MAX_MEMORY_RANGES should have never had CPU_HOTPLUG as a dependency; that was a cut-n-paste error on my part. > > Hmm? > Thank you for the time and thought on this topic! eric ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-10-28 15:29 ` Eric DeVolder
@ 2022-10-28 17:06 ` Borislav Petkov
  2022-10-28 19:26 ` Eric DeVolder
  0 siblings, 1 reply; 26+ messages in thread
From: Borislav Petkov @ 2022-10-28 17:06 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: Baoquan He, david, Oscar Salvador, Andrew Morton, linux-kernel,
      x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen,
      hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain,
      linux-mm

On Fri, Oct 28, 2022 at 10:29:45AM -0500, Eric DeVolder wrote:
> So it is with this in mind that I suggest we stay with the statically sized elfcorehdr buffer.
>
> If that can be agreed upon, then it is "just a matter" of picking a useful
> elfcorehdr size. Currently that size is derived from NR_DEFAULT_CPUS and
> CRASH_MAX_MEMORY_RANGES. So, there is still the CRASH_MAX_MEMORY_RANGES knob
> to help dial in the size, should there be some issue with the default
> value/size.

Let's see

        kbuf.memsz =
                (CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES) *
                sizeof(Elf64_Phdr);

which, IINM, is

        (8192 + 32768) * 56

which is something like 2M.

(CONFIG_NR_CPUS_DEFAULT = 8192 - this is because of MAXSMP which gets
set on distro kernels)

Now, since userspace kexec tools uses 2048 for max memory ranges, that
size becomes smaller - around half a Mb. And since y'all wanna be on the
safe side, you can quadruple it and have

        (8192 + 8192) * 56

which is still under a megabyte. And that's fine, I guess, on a big
server.

> Or if there is desire to drop computing the size from NR_DEFAULT_CPUS and

I think you should leave the dependency on the Kconfig size so that
smaller machines which are configured this way, don't end up wasting
unnecessary memory.

> It is my intention to correct the CRASH_MAX_MEMORY_RANGES (if we keep it) as such:
>
>   config CRASH_MAX_MEMORY_RANGES
>       depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG

Yes, but don't leave it to the user to decide what number to choose
- choose a high enough number, explain why you've chosen this with a
comment and that's it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-10-28 17:06 ` Borislav Petkov
@ 2022-10-28 19:26 ` Eric DeVolder
  2022-10-28 20:30 ` Borislav Petkov
  0 siblings, 1 reply; 26+ messages in thread
From: Eric DeVolder @ 2022-10-28 19:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Baoquan He, david, Oscar Salvador, Andrew Morton, linux-kernel,
      x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen,
      hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain,
      linux-mm

On 10/28/22 12:06, Borislav Petkov wrote:
> On Fri, Oct 28, 2022 at 10:29:45AM -0500, Eric DeVolder wrote:
>> So it is with this in mind that I suggest we stay with the statically sized elfcorehdr buffer.
>>
>> If that can be agreed upon, then it is "just a matter" of picking a useful
>> elfcorehdr size. Currently that size is derived from NR_DEFAULT_CPUS and
>> CRASH_MAX_MEMORY_RANGES. So, there is still the CRASH_MAX_MEMORY_RANGES knob
>> to help dial in the size, should there be some issue with the default
>> value/size.
>
> Let's see
>
>         kbuf.memsz =
>                 (CONFIG_NR_CPUS_DEFAULT + CONFIG_CRASH_MAX_MEMORY_RANGES) *
>                 sizeof(Elf64_Phdr);
>
> which, IINM, is
>
>         (8192 + 32768) * 56
>
> which is something like 2M.
>
> (CONFIG_NR_CPUS_DEFAULT = 8192 - this is because of MAXSMP which gets
> set on distro kernels)
>
> Now, since userspace kexec tools uses 2048 for max memory ranges, that
> size becomes smaller - around half a Mb. And since y'all wanna be on the
> safe side, you can quadruple it and have
>
>         (8192 + 8192) * 56
>
> which is still under a megabyte. And that's fine, I guess, on a big
> server.

Excellent, I'll set CRASH_MAX_MEMORY_RANGES to 8192! That seems quite a
fair trade-off of elfcorehdr size vs system size (i.e. 1TiB w/ 128MiB
memblock size).

>
>> Or if there is desire to drop computing the size from NR_DEFAULT_CPUS and
>
> I think you should leave the dependency on the Kconfig size so that
> smaller machines which are configured this way, don't end up wasting
> unnecessary memory.

Excellent, I'll leave the computation as NR_DEFAULT_CPUS +
CRASH_MAX_MEMORY_RANGES.

>
>> It is my intention to correct the CRASH_MAX_MEMORY_RANGES (if we keep it) as such:
>>
>>   config CRASH_MAX_MEMORY_RANGES
>>       depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
>
> Yes, but don't leave it to the user to decide what number to choose
> - choose a high enough number, explain why you've chosen this with a
> comment and that's it.

I currently have the Kconfig item as:

config CRASH_MAX_MEMORY_RANGES
        depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
        int
        default 8192
        help
          For the kexec_file_load path, specify the maximum number of
          memory regions, eg. as represented by the 'System RAM' entries
          in /proc/iomem, that the elfcorehdr buffer/segment can accommodate.
          This value is combined with NR_CPUS and multiplied by Elf64_Phdr
          size to determine the final buffer size.

I'll work to provide a better explanation as to the 8192 number.

Thank you!
eric

>
> Thx.
>

^ permalink raw reply	[flat|nested] 26+ messages in thread
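(As a quick sanity check of the sizing arithmetic settled on above, the
following standalone program reproduces the numbers from this thread. The
8192 values stand in for CONFIG_NR_CPUS_DEFAULT and the proposed
CRASH_MAX_MEMORY_RANGES; the program is only an illustration of the
formula being discussed, not the kernel code itself.)

        #include <elf.h>
        #include <stdio.h>

        int main(void)
        {
                unsigned long nr_cpus = 8192;       /* CONFIG_NR_CPUS_DEFAULT (MAXSMP) */
                unsigned long nr_mem_ranges = 8192; /* proposed CRASH_MAX_MEMORY_RANGES */

                /* One Elf64_Phdr-sized slot (56 bytes) per CPU and per memory region */
                unsigned long memsz = (nr_cpus + nr_mem_ranges) * sizeof(Elf64_Phdr);

                /* Prints 917504 bytes (896 KiB) -- "still under a megabyte" */
                printf("elfcorehdr segment size: %lu bytes (%lu KiB)\n",
                       memsz, memsz / 1024);
                return 0;
        }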
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-10-28 19:26 ` Eric DeVolder
@ 2022-10-28 20:30 ` Borislav Petkov
  2022-10-28 20:34 ` Eric DeVolder
  2022-10-28 21:22 ` Eric DeVolder
  0 siblings, 2 replies; 26+ messages in thread
From: Borislav Petkov @ 2022-10-28 20:30 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: Baoquan He, david, Oscar Salvador, Andrew Morton, linux-kernel,
      x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen,
      hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain,
      linux-mm

On Fri, Oct 28, 2022 at 02:26:58PM -0500, Eric DeVolder wrote:
> config CRASH_MAX_MEMORY_RANGES
>         depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
>         int
>         default 8192
>         help
>           For the kexec_file_load path, specify the maximum number of
>           memory regions, eg. as represented by the 'System RAM' entries
>           in /proc/iomem, that the elfcorehdr buffer/segment can accommodate.
>           This value is combined with NR_CPUS and multiplied by Elf64_Phdr
>           size to determine the final buffer size.

No, do this:

config CRASH_MEMORY_HOTPLUG_SUPPORT
        depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
        help
          Help text explaining what this feature is

this thing will simply get enabled when the user enables MEMORY_HOTPLUG
and CRASH_DUMP.

and then you do in the code:

/*
 * A comment explaining how the 8192 value has been selected.
 */
#define CRASH_MAX_MEMORY_RANGES 8192

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-10-28 20:30 ` Borislav Petkov
@ 2022-10-28 20:34 ` Eric DeVolder
  0 siblings, 0 replies; 26+ messages in thread
From: Eric DeVolder @ 2022-10-28 20:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Baoquan He, david, Oscar Salvador, Andrew Morton, linux-kernel,
      x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen,
      hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain,
      linux-mm

On 10/28/22 15:30, Borislav Petkov wrote:
> On Fri, Oct 28, 2022 at 02:26:58PM -0500, Eric DeVolder wrote:
>> config CRASH_MAX_MEMORY_RANGES
>>         depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
>>         int
>>         default 8192
>>         help
>>           For the kexec_file_load path, specify the maximum number of
>>           memory regions, eg. as represented by the 'System RAM' entries
>>           in /proc/iomem, that the elfcorehdr buffer/segment can accommodate.
>>           This value is combined with NR_CPUS and multiplied by Elf64_Phdr
>>           size to determine the final buffer size.
>
> No, do this:
>
> config CRASH_MEMORY_HOTPLUG_SUPPORT
>         depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
>         help
>           Help text explaining what this feature is
>
> this thing will simply get enabled when the user enables MEMORY_HOTPLUG
> and CRASH_DUMP.
>
> and then you do in the code:
>
> /*
>  * A comment explaining how the 8192 value has been selected.
>  */
> #define CRASH_MAX_MEMORY_RANGES 8192
>
> Thx.
>

ok, will do!
thanks!
eric

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-10-28 20:30 ` Borislav Petkov
  2022-10-28 20:34 ` Eric DeVolder
@ 2022-10-28 21:22 ` Eric DeVolder
  2022-10-28 22:19 ` Borislav Petkov
  1 sibling, 1 reply; 26+ messages in thread
From: Eric DeVolder @ 2022-10-28 21:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Baoquan He, david, Oscar Salvador, Andrew Morton, linux-kernel,
      x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen,
      hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain,
      linux-mm

On 10/28/22 15:30, Borislav Petkov wrote:
> On Fri, Oct 28, 2022 at 02:26:58PM -0500, Eric DeVolder wrote:
>> config CRASH_MAX_MEMORY_RANGES
>>         depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
>>         int
>>         default 8192
>>         help
>>           For the kexec_file_load path, specify the maximum number of
>>           memory regions, eg. as represented by the 'System RAM' entries
>>           in /proc/iomem, that the elfcorehdr buffer/segment can accommodate.
>>           This value is combined with NR_CPUS and multiplied by Elf64_Phdr
>>           size to determine the final buffer size.
>
> No, do this:
>
> config CRASH_MEMORY_HOTPLUG_SUPPORT
>         depends on CRASH_DUMP && KEXEC_FILE && MEMORY_HOTPLUG
>         help
>           Help text explaining what this feature is
>
> this thing will simply get enabled when the user enables MEMORY_HOTPLUG
> and CRASH_DUMP.
>
> and then you do in the code:
>
> /*
>  * A comment explaining how the 8192 value has been selected.
>  */
> #define CRASH_MAX_MEMORY_RANGES 8192
>
> Thx.
>

How is this comment?

/*
 * For the kexec_file_load() syscall path, specify the maximum number of
 * memory regions that the elfcorehdr buffer/segment can accommodate.
 * These regions are obtained via walk_system_ram_res(); eg. the
 * 'System RAM' entries in /proc/iomem.
 * This value is combined with NR_CPUS and multiplied by sizeof(Elf64_Phdr)
 * to determine the final elfcorehdr memory buffer/segment size.
 * The value 8192, for example, covers a (sparsely populated) 1TiB system
 * consisting of 128MiB memblock size, while resulting in an elfcorehdr
 * memory buffer/segment size under 1MiB.
 */
#define CRASH_MAX_MEMORY_RANGES 8192

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-10-28 21:22 ` Eric DeVolder
@ 2022-10-28 22:19 ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2022-10-28 22:19 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: Baoquan He, david, Oscar Salvador, Andrew Morton, linux-kernel,
      x86, kexec, ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen,
      hpa, nramas, thomas.lendacky, robh, efault, rppt, sourabhjain,
      linux-mm

On Fri, Oct 28, 2022 at 04:22:54PM -0500, Eric DeVolder wrote:
> /*
>  * For the kexec_file_load() syscall path, specify the maximum number of
>  * memory regions that the elfcorehdr buffer/segment can accommodate.
>  * These regions are obtained via walk_system_ram_res(); eg. the
>  * 'System RAM' entries in /proc/iomem.
>  * This value is combined with NR_CPUS and multiplied by sizeof(Elf64_Phdr)

NR_CPUS_DEFAULT

>  * to determine the final elfcorehdr memory buffer/segment size.
>  * The value 8192, for example, covers a (sparsely populated) 1TiB system
>  * consisting of 128MiB memblock size, while resulting in an elfcorehdr
>  * memory buffer/segment size under 1MiB.

... and it is a sane choice trying to accommodate both actual baremetal
and VM configurations."

Yeah, it's a good start.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-10-12 20:19 ` Eric DeVolder
  2022-10-12 20:41 ` Borislav Petkov
@ 2022-10-12 20:42 ` Eric DeVolder
  1 sibling, 0 replies; 26+ messages in thread
From: Eric DeVolder @ 2022-10-12 20:42 UTC (permalink / raw)
  To: Borislav Petkov, Baoquan He
  Cc: Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec,
      ebiederm, dyoung, vgoyal, tglx, mingo, dave.hansen, hpa, nramas,
      thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm

On 10/12/22 15:19, Eric DeVolder wrote:
>
>
> On 10/12/22 12:46, Borislav Petkov wrote:
>> On Sat, Oct 08, 2022 at 10:35:14AM +0800, Baoquan He wrote:
>>> Memory hotplug is not limited by a certain or a max number of memory
>>> regions, but limited by how large the linear mapping range is into which
>>> physical memory can be mapped.
>>
>> Memory hotplug is not limited by some abstract range but by the *actual*
>> possibility of how many DIMM slots on any motherboard can hotplug
>> memory. Certainly not 32K.
>>
>> So you can choose a sane default which covers *all* actual systems out
>> there.
>
>
> We run here QEMU with the ability for 1024 DIMM slots. A DIMM can be any
> reasonable power of 2 size, and then that DIMM is further divided into
> memblocks, typically 128MiB.
>
> So, for example, 1TiB requires 1024 DIMMs of 1GiB each with 128MiB
> memblocks, that results in 8K possible memory regions. So just going to
> 4TiB reaches 32K memory regions.
>
> This I can attest for virtualized DIMMs, not sure about other memory
> hotplug technologies like virtio-mem or DynamicMemory. But it seems
> reasonable that those technologies could also easily reach into these
> number ranges.
>
> Eric

Oh, to be fair, if the above were fully populated, it would essentially
coalesce into a single reported region via crash_prepare_elf64_headers().

But in the sadistic case, where every other memblock was offlined, that
would result in the need to report half of the memory regions via the
elfcorehdr.

$0.02. eric

>
>>
>>> For the Kconfig CRASH_MAX_MEMORY_RANGES Eric added, it's meaningful to
>>> me to set a fixed value which is enough in reality.
>>
>> Yes, exactly.
>>
>>> For extreme testing with special purpose, it could be broken easily;
>>> people need to decide by themselves whether the
>>> CONFIG_CRASH_MAX_MEMORY_RANGES is enlarged or not.
>>
>> I don't want for people to decide on one more thing where they have to
>> go and read a bunch of specs just to know what is a good value. So we
>> should set a sane, *practical* upper limit and simply go with it.
>>
>> Everything else is testing stuff and if you test the kernel, then you
>> can change limits and values and so on as you want to.
>>
>> Thx.
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread
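(For reference, the memblock arithmetic behind the figures above, assuming
the 128MiB memblock size mentioned in the thread:

        1TiB = 1048576MiB; 1048576MiB / 128MiB = 8192 memblocks
        4TiB = 4194304MiB; 4194304MiB / 128MiB = 32768 memblocks

so the "every other memblock offlined" case would need on the order of half
those counts as separate entries in the elfcorehdr.)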
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-09-30 17:40 ` Borislav Petkov
  2022-10-08  2:35 ` Baoquan He
@ 2022-10-12 16:20 ` Eric DeVolder
  2022-10-25 10:39 ` Borislav Petkov
  1 sibling, 1 reply; 26+ messages in thread
From: Eric DeVolder @ 2022-10-12 16:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec,
      ebiederm, dyoung, bhe, vgoyal, tglx, mingo, dave.hansen, hpa,
      nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm

On 9/30/22 12:40, Borislav Petkov wrote:
> On Fri, Sep 30, 2022 at 12:11:26PM -0500, Eric DeVolder wrote:
>> There is of course a way to enumerate the memory regions in use on the
>> machine, that is not what this code needs. In order to compute the maximum
>> buffer size needed (this buffer size is computed once), the count of the
>> maximum number of memory regions possible (even if not currently in use) is
>> what is needed.
>
> Isn't that max number documented somewhere in memory hotplug docs?
>
> Because then you don't need that Kconfig item either. Imagine you're a
> distro kernel distributor and you want crash to work on all machines
> your kernel works on.
>
> So you go and set that number to max. And that would be the 99% of the
> kernel configs out there.
>
> Which means, you can just set it to max without a Kconfig item.
>
>> Oh, that would be an error of haste on my part. This should be:
>>   depends on CRASH_DUMP && MEMORY_HOTPLUG
>
> You need a Kconfig item which enables all this gunk as MEMORY_HOTPLUG is
> not an omnipresent feature. And that Kconfig item should depend on the
> other Kconfig items of the technology you need.

I once had CONFIG_CRASH_HOTPLUG, but you disagreed.

https://lore.kernel.org/lkml/Ylgot+LUDQl+G%2F5N@zn.tnic/

From there I simply went with

  #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)

which route do you prefer?
Thanks!
eric

>
>> Baoquan pointed me to:
>>
>> https://lore.kernel.org/lkml/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com/T/
>
> In that thread it says:
>
> "- arch_kexec_apply_relocations_add() is only overridden by x86 and s390.
>    Retain the function prototype for those and move the weak
>    implementation into the header as a static inline for other
>    architectures."
>
> So yes, that's even better.

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH v12 7/7] x86/crash: Add x86 crash hotplug support
  2022-10-12 16:20 ` Eric DeVolder
@ 2022-10-25 10:39 ` Borislav Petkov
  0 siblings, 0 replies; 26+ messages in thread
From: Borislav Petkov @ 2022-10-25 10:39 UTC (permalink / raw)
  To: Eric DeVolder
  Cc: Oscar Salvador, Andrew Morton, david, linux-kernel, x86, kexec,
      ebiederm, dyoung, bhe, vgoyal, tglx, mingo, dave.hansen, hpa,
      nramas, thomas.lendacky, robh, efault, rppt, sourabhjain, linux-mm

On Wed, Oct 12, 2022 at 11:20:59AM -0500, Eric DeVolder wrote:
> I once had CONFIG_CRASH_HOTPLUG, but you disagreed.
>
> https://lore.kernel.org/lkml/Ylgot+LUDQl+G%2F5N@zn.tnic/
>
> From there I simply went with
>
>   #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
>
> which route do you prefer?

If you do a single Kconfig item which depends on those two, it probably
is cleaner this way.

And if the max memory ranges are hardcoded you don't need the other
prompt asking the user something she most likely doesn't know how to
answer properly.

That is, unless you wanna have that crash hotplug built in all the time.
Because CONFIG_HOTPLUG_CPU is pretty much always enabled so you might
just as well add the crash hotplug support unconditionally, without any
Kconfig ifdeffery whatsoever except CONFIG_MEMORY_HOTPLUG as that is
special and not present on the majority of hardware.

But on a plain simple laptop or workstation which has CPU hotplug, would
it make sense for the crash ranges to get updated too when CPUs are
offlined? If so, I think you want this code present there too, without a
Kconfig item.

If this is server-only anyway, then a single Kconfig item sounds like not
such a bad idea.

I hope that makes some sense.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 26+ messages in thread
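(For illustration only, a single Kconfig item along the lines discussed
here might look like the following; the CRASH_HOTPLUG name and the exact
dependency list are assumptions drawn from this thread, not the final
patch.)

        config CRASH_HOTPLUG
                def_bool y
                depends on CRASH_DUMP && KEXEC_FILE
                depends on HOTPLUG_CPU || MEMORY_HOTPLUG
                help
                  Directly update the kdump elfcorehdr (the list of CPUs
                  and memory regions) on CPU and memory hotplug events,
                  instead of requiring a full kdump reload from userspace.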