* Re: [PATCH 2/4] x86, memblock: Fix crashkernel allocation
       [not found] ` <4CAA4DE2.1020406@kernel.org>
@ 2010-10-05 21:15   ` H. Peter Anvin
  2010-10-05 22:29   ` H. Peter Anvin
  1 sibling, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2010-10-05 21:15 UTC
To: Yinghai Lu
Cc: Jeremy Fitzhardinge, Benjamin Herrenschmidt, kexec@lists.infradead.org,
    linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, Vivek Goyal

On 10/04/2010 02:57 PM, Yinghai Lu wrote:
>
> +#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
>  static void __init reserve_crashkernel(void)
>  {
>      unsigned long long total_mem;
> @@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
>      if (crash_base <= 0) {
>          const unsigned long long alignment = 16<<20; /* 16M */
>
> -        crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> -                 alignment);
> +        /*
> +         * Assume half crash_size is for bzImage
> +         * kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
> +         */
> +        crash_base = memblock_find_in_range(alignment,
> +                DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
> +                crash_size, alignment);
> +
>          if (crash_base == MEMBLOCK_ERROR) {
> -            pr_info("crashkernel reservation failed - No suitable area found.\n");
> -            return;
> +            crash_base = memblock_find_in_range(alignment,
> +                     ULONG_MAX, crash_size, alignment);
> +
> +            if (crash_base == MEMBLOCK_ERROR) {
> +                pr_info("crashkernel reservation failed - No suitable area found.\n");
> +                return;
> +            }
>          }
>

Okay, this *really* doesn't make sense.

It's bad enough that kexec doesn't know what memory is safe for it, but
why the heck the heuristic that "half is for bzImage and the rest can go
beyond the heuristic limit"? Can't we at least simply cap the region to
the default, unless the kexec system has passed in some knowable
alternative? Furthermore, why bother having the "fallback" at all
(certainly without having a message!?) If we don't get the memory area
we need we're likely to randomly fail anyway.

Let me be completely clear -- it's obvious from all of this that kexec
is fundamentally broken by design: if kexec can't communicate the safe
memory to use it's busted seven ways to Sunday and it needs to be fixed.
However, in the meantime I can see capping the memory available to it
as a temporary band-aid, but a fallback to picking random memory is
nuts, especially on the motivation that "a future kexec version might be
able to use it." If so, the "future kexec tools" should SAY SO.

This is beyond crazy -- it's complete and total bonkers.

	-hpa

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* Re: [PATCH 2/4] x86, memblock: Fix crashkernel allocation
       [not found] ` <4CAA4DE2.1020406@kernel.org>
  2010-10-05 21:15   ` [PATCH 2/4] x86, memblock: Fix crashkernel allocation H. Peter Anvin
@ 2010-10-05 22:29   ` H. Peter Anvin
  2010-10-05 23:05     ` Yinghai Lu
  1 sibling, 1 reply; 7+ messages in thread
From: H. Peter Anvin @ 2010-10-05 22:29 UTC
To: Yinghai Lu
Cc: Jeremy Fitzhardinge, Benjamin Herrenschmidt, kexec@lists.infradead.org,
    linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, Vivek Goyal

On 10/04/2010 02:57 PM, Yinghai Lu wrote:
>
> +#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
>  static void __init reserve_crashkernel(void)
>  {
>      unsigned long long total_mem;
> @@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
>      if (crash_base <= 0) {
>          const unsigned long long alignment = 16<<20; /* 16M */
>
> -        crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
> -                 alignment);
> +        /*
> +         * Assume half crash_size is for bzImage
> +         * kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
> +         */
> +        crash_base = memblock_find_in_range(alignment,
> +                DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
> +                crash_size, alignment);
> +
>          if (crash_base == MEMBLOCK_ERROR) {
> -            pr_info("crashkernel reservation failed - No suitable area found.\n");
> -            return;
> +            crash_base = memblock_find_in_range(alignment,
> +                     ULONG_MAX, crash_size, alignment);
> +
> +            if (crash_base == MEMBLOCK_ERROR) {
> +                pr_info("crashkernel reservation failed - No suitable area found.\n");
> +                return;
> +            }
>          }
>

Okay, this *really* doesn't make sense.

It's bad enough that kexec doesn't know what memory is safe for it, but
why the heck the heuristic that "half is for bzImage and the rest can go
beyond the heuristic limit"? Can't we at least simply cap the region to
the default, unless the kexec system has passed in some knowable
alternative? Furthermore, why bother having the "fallback" at all
(certainly without having a message!?) If we don't get the memory area
we need we're likely to randomly fail anyway.

Let me be completely clear -- it's obvious from all of this that kexec
is fundamentally broken by design: if kexec can't communicate the safe
memory to use it's busted seven ways to Sunday and it needs to be fixed.
However, in the meantime I can see capping the memory available to it
as a temporary band-aid, but a fallback to picking random memory is
nuts, especially on the motivation that "a future kexec version might be
able to use it." If so, the "future kexec tools" should SAY SO.

This is beyond crazy -- it's complete and total bonkers.

	-hpa
* Re: [PATCH 2/4] x86, memblock: Fix crashkernel allocation
  2010-10-05 22:29   ` H. Peter Anvin
@ 2010-10-05 23:05     ` Yinghai Lu
       [not found]       ` <tip-9f4c13964b58608fbce05540743281ea3146c0e8@git.kernel.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Yinghai Lu @ 2010-10-05 23:05 UTC
To: H. Peter Anvin
Cc: Jeremy Fitzhardinge, Benjamin Herrenschmidt, kexec@lists.infradead.org,
    linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar, Vivek Goyal

On 10/05/2010 03:29 PM, H. Peter Anvin wrote:
> On 10/04/2010 02:57 PM, Yinghai Lu wrote:
>>
>> +#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
>>  static void __init reserve_crashkernel(void)
>>  {
>>      unsigned long long total_mem;
>> @@ -518,17 +519,28 @@ static void __init reserve_crashkernel(v
>>      if (crash_base <= 0) {
>>          const unsigned long long alignment = 16<<20; /* 16M */
>>
>> -        crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
>> -                 alignment);
>> +        /*
>> +         * Assume half crash_size is for bzImage
>> +         * kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
>> +         */
>> +        crash_base = memblock_find_in_range(alignment,
>> +                DEFAULT_BZIMAGE_ADDR_MAX + crash_size/2,
>> +                crash_size, alignment);
>> +
>>          if (crash_base == MEMBLOCK_ERROR) {
>> -            pr_info("crashkernel reservation failed - No suitable area found.\n");
>> -            return;
>> +            crash_base = memblock_find_in_range(alignment,
>> +                     ULONG_MAX, crash_size, alignment);
>> +
>> +            if (crash_base == MEMBLOCK_ERROR) {
>> +                pr_info("crashkernel reservation failed - No suitable area found.\n");
>> +                return;
>> +            }
>>          }
>>
>
> Okay, this *really* doesn't make sense.
>
> It's bad enough that kexec doesn't know what memory is safe for it, but
> why the heck the heuristic that "half is for bzImage and the rest can go
> beyond the heuristic limit"?

kdump wants half of that range for bzImage or half for initrd, and kexec
only checks whether bzImage can be put under the small range.

> Can't we at least simply cap the region to
> the default, unless the kexec system has passed in some knowable
> alternative?

+        crash_base = memblock_find_in_range(alignment,
+                DEFAULT_BZIMAGE_ADDR_MAX,
+                crash_size, alignment);

> Furthermore, why bother having the "fallback" at all
> (certainly without having a message!?) If we don't get the memory area
> we need we're likely to randomly fail anyway.

if kexec is fixed to work with bzImage with 64bit entry...

>
> Let me be completely clear -- it's obvious from all of this that kexec
> is fundamentally broken by design: if kexec can't communicate the safe
> memory to use it's busted seven ways to Sunday and it needs to be fixed.
> However, in the meantime I can see capping the memory available to it
> as a temporary band-aid, but a fallback to picking random memory is
> nuts, especially on the motivation that "a future kexec version might be
> able to use it." If so, the "future kexec tools" should SAY SO.

ok, please check

[PATCH -v6] x86, memblock: Fix crashkernel allocation

Cai Qian found that crashkernel is broken with the x86 memblock changes:
1. crashkernel=128M@32M always reports that the range is used, even when
   the first kernel is small and no one uses that range
2. "kexec -p" always gives the following report:
	Could not find a free area of memory of a000 bytes...
	locate_hole failed

The root cause is that the generic memblock_find_in_range() tries to get
a range top-down, but crashkernel needs a range from low memory, at the
specified location.

Let's limit the target range to crash_base + crash_size to make sure that
we get the exact range.

-v6: use DEFAULT_BZIMAGE_ADDR_MAX to limit the area that could be used by bzImage.

Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -501,6 +501,7 @@ static inline unsigned long long get_tot
     return total << PAGE_SHIFT;
 }
 
+#define DEFAULT_BZIMAGE_ADDR_MAX 0x37FFFFFF
 static void __init reserve_crashkernel(void)
 {
     unsigned long long total_mem;
@@ -518,8 +519,12 @@ static void __init reserve_crashkernel(v
     if (crash_base <= 0) {
         const unsigned long long alignment = 16<<20; /* 16M */
 
-        crash_base = memblock_find_in_range(alignment, ULONG_MAX, crash_size,
-                 alignment);
+        /*
+         *  kexec want bzImage is below DEFAULT_BZIMAGE_ADDR_MAX
+         */
+        crash_base = memblock_find_in_range(alignment,
+                   DEFAULT_BZIMAGE_ADDR_MAX, crash_size, alignment);
+
         if (crash_base == MEMBLOCK_ERROR) {
             pr_info("crashkernel reservation failed - No suitable area found.\n");
             return;
@@ -527,8 +532,8 @@ static void __init reserve_crashkernel(v
     } else {
         unsigned long long start;
 
-        start = memblock_find_in_range(crash_base, ULONG_MAX, crash_size,
-                 1<<20);
+        start = memblock_find_in_range(crash_base,
+                 crash_base + crash_size, crash_size, 1<<20);
         if (start != crash_base) {
             pr_info("crashkernel reservation failed - memory is in use.\n");
             return;
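The root cause named in the -v6 changelog (a top-down search handed an open-ended upper bound) can be reproduced with a small userspace model. The sketch below is not the kernel's memblock code: `toy_find_in_range()`, `struct toy_region` and `TOY_ERROR` are hypothetical names, and the model assumes a sorted list of free ranges and a search that always returns the highest aligned base that fits.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ERROR UINT64_MAX          /* stand-in for MEMBLOCK_ERROR */

struct toy_region {
    uint64_t start;                   /* first free byte */
    uint64_t end;                     /* one past the last free byte */
};

/* Round x down to a multiple of align (align must be a power of two). */
static uint64_t toy_align_down(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

/*
 * Top-down search: walk the free regions from highest to lowest and
 * return the highest aligned base such that [base, base + size) lies
 * inside both a free region and the window [start, end).
 */
uint64_t toy_find_in_range(const struct toy_region *free_map, int nr,
                           uint64_t start, uint64_t end,
                           uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    for (int i = nr - 1; i >= 0; i--) {
        uint64_t lo = free_map[i].start > start ? free_map[i].start : start;
        uint64_t hi = free_map[i].end < end ? free_map[i].end : end;

        if (hi <= lo || hi - lo < size)
            continue;                 /* window does not fit in this region */

        uint64_t base = toy_align_down(hi - size, align);
        if (base >= lo)
            return base;
    }
    return TOY_ERROR;
}
```

With a single free region from 1M to 2G and crashkernel=128M@32M, calling the model with an upper bound of UINT64_MAX returns 0x78000000 (the top of memory), which is not crash_base, so the "memory is in use" path would trigger even though 32M is free. Passing crash_base + crash_size as the upper bound returns exactly 32M. For the automatic case, a start of 16M, an upper bound of 0x37FFFFFF and 16M alignment give 0x2F000000 in this model.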
[parent not found: <tip-9f4c13964b58608fbce05540743281ea3146c0e8@git.kernel.org>]
[parent not found: <20101006151449.GA7378@redhat.com>]
[parent not found: <4CACF531.2060407@intel.com>]
[parent not found: <20101006224704.GD7378@redhat.com>]
[parent not found: <4CAD01A9.9050907@intel.com>]
* Re: [tip:core/memblock] x86, memblock: Fix crashkernel allocation
       [not found]           ` <4CAD01A9.9050907@intel.com>
@ 2010-10-07 18:18             ` Vivek Goyal
  2010-10-07 18:54               ` H. Peter Anvin
  0 siblings, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2010-10-07 18:18 UTC
To: H. Peter Anvin
Cc: caiqian@redhat.com, linux-tip-commits@vger.kernel.org, Kexec Mailing List,
    linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de,
    yinghai@kernel.org

On Wed, Oct 06, 2010 at 04:09:29PM -0700, H. Peter Anvin wrote:
> On 10/06/2010 03:47 PM, Vivek Goyal wrote:
> >
> > I really don't mind fixing the things properly in long term, just that I am
> > running out of ideas regarding how to fix it in proper way.
> >
> > To me the best thing would be that this whole allocation thing be dynamic
> > from user space where kexec will run, determine what it is loading,
> > determine what are the memory constraints on these segments (min, upper
> > limit, alignment etc), and then ask kernel for reserving contiguous
> > memory. This kind of dynamic reservation will remove lot of problems
> > associated with crashkernel= reservations.
> >
> > But I am not aware of any way of doing dynamic allocation and it certainly
> > does not seem to be easy to be able to allocate 128M of memory contiguously.
> >
> > Because we don't have a way to reserve memory dynamically later, we end up
> > doing a big chunk of reservation using kernel command line and later
> > figure out what to load where. Now with this approach kexec has not even run
> > so how it can tell you what are the memory constraints.
> >
> > So to me one of the ways of properly fixing is adding some kind of
> > capability to reserve the memory dynamically (may be using sys_kexec())
> > and get rid of this notion of reserving memory at boot time.
>
> The problem, of course, with allocating very large chunks of memory at
> runtime is that there are going to be some number of non-movable and
> non-evictable pages that are going to break up the contiguous ranges.
> However, the mm recently added support for moving most pages, which
> should make that kind of allocation a lot more feasible. I haven't
> experimented how well it works in practice, but I rather suspect that as
> long as the crashkernel is installed sufficiently early in the boot
> process it should have a very good probability of success.

Ok.

> Another
> option, although one which has its own hackiness issues, is to do a
> conservative allocation at boot time in preparation of the kexec call,
> which is then freed. This doesn't really address the issue of location,
> though, which is part of the problem here.
>
> > The other concern you raised is hiding constraints from kernel. At this
> > point of time the only problem with crashkernel=X@0 syntax is that it
> > does not tell you whether to look for memory bottom up or top down. How
> > about if we specify it explicitly in the syntax so that kernel does not
> > have to assume things?
>
> See below.
>
> > In fact the initial crashkernel syntax was crashkernel=X@Y. This meant
> > allocate X amount of memory at location Y. This left no ambiguity and
> > kernel did not have to assume things. It had the problem though that
> > we might not have physical RAM at location Y. So I think that's when
> > somebody came up with the idea of crashkernel=X@0 so that we ideally
> > want memory at location 0, but if you can't provide that, then provide
> > anything available next scanning bottom up.
> >
> > So the only part missing from syntax is explicitly specifying "next
> > available location scanning bottom up". If we add that to syntax then
> > kernel does not have to make assumptions. (except the alignment part).
> >
> > So how about modifying syntax to crashkernel=X@Y#BU.
> >
> > The "#BU" part can be optional and in that case kernel is free to allocate
> > memory either top down or bottom up.
> >
> > Or any other string which can communicate the bottom up part in a more
> > intuitive manner.
>
> The whole problem here is that "bottoms up" isn't the true constraint --
> it's a proxy for "this chunk needs < address X, this chunk needs <
> address Y, ..." which is the real issue. This is particularly messy
> since low memory is a (sometimes very) precious resource that is used by
> a lot of things (BIOS stubs, DMA-mask-limited hardware devices, and
> perhaps especially 1:1 mappable pages on 32 bits, and so on), and one of
> the major reasons we want to switch to a top-down allocation scheme is
> to not waste a precious resource when we don't have to.
>
> The one improvement one could make to the crashkernel= syntax is perhaps
> "crashkernel=X<Y" meaning "allocate entirely below Y", since that is (at
> least in part) the real constraint. It could even be extended to
> multiple segments: "crashkernel=X<Y,Z<W,..." if we really need to...
> that way you have your preallocation.

Ok, I was browsing through kexec-tools, x86 bzImage code and trying to
refresh my memory what segments were being loaded and what were memory
address concerns.

- relocatable bzImage (max addr 0x37ffffff, 896MB).
  Though I don't know/understand where that 896MB come from.

- initrd (max addr 0x37ffffff, 896MB)
  Don't know why 896MB as upper limit

- Purgatory (max addr 2G)

- A segment to keep elf headers (no limit)
  These are accessed when second kernel has fully booted so can be
  addressed in higher addresses.

- A backup segment to copy first 640K of memory (not aware of any limit)
- Setup/parameter segment (no limit)
	- We don't really execute anything here and just access it for
	  command line.

So at least for bzImage it looks that if we specify crashkernel=128M<896M, it
will work.

So I am fine with above additional syntax for crashkernel=. Maybe we shall
have to deprecate the crashkernel=X@0 syntax.

CCing kexec list, in case others have any comments.

Thanks
Vivek
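As a rough illustration of the "crashkernel=X<Y" form discussed above, the following userspace sketch parses a size and an upper limit. It is only a sketch of a syntax proposed in this thread, not an existing kernel parser: `toy_memparse()` and `toy_parse_size_below()` are hypothetical helpers, the K/M/G suffix handling is merely modeled on the way kernel size parameters are usually written, and the multi-segment "X<Y,Z<W" extension is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse an unsigned number with an optional K/M/G suffix.
 * On return *end points just past the consumed text; if nothing
 * could be parsed, *end == s and 0 is returned.
 */
static uint64_t toy_memparse(const char *s, const char **end)
{
    char *p;
    uint64_t value = strtoull(s, &p, 0);
    unsigned int shift = 0;

    if (p == s) {                     /* no digits at all */
        *end = s;
        return 0;
    }

    switch (*p) {
    case 'K': case 'k': shift = 10; break;
    case 'M': case 'm': shift = 20; break;
    case 'G': case 'g': shift = 30; break;
    default: break;
    }
    if (shift) {
        value <<= shift;
        p++;
    }
    *end = p;
    return value;
}

/*
 * Parse "X<Y" (the part after "crashkernel="): reserve X bytes
 * entirely below address Y. Returns 0 on success, -1 on malformed
 * input or when the size cannot fit below the limit.
 */
int toy_parse_size_below(const char *arg, uint64_t *size, uint64_t *limit)
{
    const char *cur;
    const char *limit_text;

    assert(arg && size && limit);

    *size = toy_memparse(arg, &cur);
    if (cur == arg || *size == 0 || *cur != '<')
        return -1;

    limit_text = cur + 1;
    *limit = toy_memparse(limit_text, &cur);
    if (cur == limit_text || *cur != '\0')
        return -1;

    return *limit >= *size ? 0 : -1;
}
```

For the example from the message, "128M<896M" yields a size of 128 MiB and a limit of 896 MiB, while the older "128M@0" form is rejected by this helper because it contains no '<'.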
* Re: [tip:core/memblock] x86, memblock: Fix crashkernel allocation
  2010-10-07 18:18             ` [tip:core/memblock] " Vivek Goyal
@ 2010-10-07 18:54               ` H. Peter Anvin
  2010-10-07 19:21                 ` Vivek Goyal
  0 siblings, 1 reply; 7+ messages in thread
From: H. Peter Anvin @ 2010-10-07 18:54 UTC
To: Vivek Goyal
Cc: caiqian@redhat.com, linux-tip-commits@vger.kernel.org, Kexec Mailing List,
    linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de,
    yinghai@kernel.org

On 10/07/2010 11:18 AM, Vivek Goyal wrote:
>
> Ok, I was browsing through kexec-tools, x86 bzImage code and trying to
> refresh my memory what segments were being loaded and what were memory
> address concerns.
>
> - relocatable bzImage (max addr 0x37ffffff, 896MB).
>   Though I don't know/understand where that 896MB come from.
>
> - initrd (max addr 0x37ffffff, 896MB)
>   Don't know why 896MB as upper limit

896 MB is presumably the (default!!) LOWMEM limit on 32 bits. This is
actually wrong if vmalloc= is also specified on command line, though, or
with nonstandard compile-time options.

> - Purgatory (max addr 2G)
>
> - A segment to keep elf headers (no limit)
>   These are accessed when second kernel has fully booted so can be
>   addressed in higher addresses.
>
> - A backup segment to copy first 640K of memory (not aware of any limit)
> - Setup/parameter segment (no limit)
> 	- We don't really execute anything here and just access it for
> 	  command line.

Probably has a 4 GB limit, since I believe it only has a 32-bit pointer.

> So at least for bzImage it looks that if we specify crashkernel=128M<896M, it
> will work.
>
> So I am fine with above additional syntax for crashkernel=. Maybe we shall
> have to deprecate the crashkernel=X@0 syntax.
>
> CCing kexec list, in case others have any comments.

It would be easy enough to either deprecate or make it an alias for
crashkernel=...<896M, which is basically what Yinghai's patch does.

	-hpa
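The 896 MB figure and the caveat about vmalloc= can be checked with simple arithmetic. The sketch below assumes the common 32-bit x86 layout in which the kernel has 1 GiB of virtual address space and reserves 128 MiB of it for vmalloc by default; it is a simplified model that ignores other fixed mappings, and `toy_lowmem_limit()` and `TOY_MIB` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MIB (1ULL << 20)

/*
 * Simplified model: directly mapped low memory is whatever remains of
 * the kernel's virtual address space after the vmalloc reservation.
 */
uint64_t toy_lowmem_limit(uint64_t kernel_va_bytes, uint64_t vmalloc_bytes)
{
    assert(kernel_va_bytes >= vmalloc_bytes);
    return kernel_va_bytes - vmalloc_bytes;
}
```

Under these assumptions, toy_lowmem_limit(1024 * TOY_MIB, 128 * TOY_MIB) is 0x38000000 (896 MiB), so the last addressable byte is 0x37FFFFFF, the value of DEFAULT_BZIMAGE_ADDR_MAX. With a larger reservation such as 256 MiB, the same model gives 768 MiB, and a hard-coded 0x37FFFFFF would then point above low memory, which is the case the message warns about.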
* Re: [tip:core/memblock] x86, memblock: Fix crashkernel allocation
  2010-10-07 18:54               ` H. Peter Anvin
@ 2010-10-07 19:21                 ` Vivek Goyal
  2010-10-07 20:44                   ` H. Peter Anvin
  0 siblings, 1 reply; 7+ messages in thread
From: Vivek Goyal @ 2010-10-07 19:21 UTC
To: H. Peter Anvin
Cc: caiqian@redhat.com, linux-tip-commits@vger.kernel.org, Kexec Mailing List,
    linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de,
    yinghai@kernel.org

On Thu, Oct 07, 2010 at 11:54:31AM -0700, H. Peter Anvin wrote:
> On 10/07/2010 11:18 AM, Vivek Goyal wrote:
> >
> > Ok, I was browsing through kexec-tools, x86 bzImage code and trying to
> > refresh my memory what segments were being loaded and what were memory
> > address concerns.
> >
> > - relocatable bzImage (max addr 0x37ffffff, 896MB).
> >   Though I don't know/understand where that 896MB come from.
> >
> > - initrd (max addr 0x37ffffff, 896MB)
> >   Don't know why 896MB as upper limit
>
> 896 MB is presumably the (default!!) LOWMEM limit on 32 bits. This is
> actually wrong if vmalloc= is also specified on command line, though, or
> with nonstandard compile-time options.
>
> > - Purgatory (max addr 2G)
> >
> > - A segment to keep elf headers (no limit)
> >   These are accessed when second kernel has fully booted so can be
> >   addressed in higher addresses.
> >
> > - A backup segment to copy first 640K of memory (not aware of any limit)
> > - Setup/parameter segment (no limit)
> > 	- We don't really execute anything here and just access it for
> > 	  command line.
>
> Probably has a 4 GB limit, since I believe it only has a 32-bit pointer.
>
> > So at least for bzImage it looks that if we specify crashkernel=128M<896M, it
> > will work.
> >
> > So I am fine with above additional syntax for crashkernel=. Maybe we shall
> > have to deprecate the crashkernel=X@0 syntax.
> >
> > CCing kexec list, in case others have any comments.
>
> It would be easy enough to either deprecate or make it an alias for
> crashkernel=...<896M, which is basically what Yinghai's patch does.

Agreed.

So Yinghai's patch is fine. I need to write a patch for introducing
crashkernel=X<Y syntax to make the behavior explicit. Will do...

Thanks
Vivek
* Re: [tip:core/memblock] x86, memblock: Fix crashkernel allocation
  2010-10-07 19:21                 ` Vivek Goyal
@ 2010-10-07 20:44                   ` H. Peter Anvin
  0 siblings, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2010-10-07 20:44 UTC
To: Vivek Goyal
Cc: caiqian@redhat.com, linux-tip-commits@vger.kernel.org, Kexec Mailing List,
    linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de,
    yinghai@kernel.org

On 10/07/2010 12:21 PM, Vivek Goyal wrote:
>>
>> It would be easy enough to either deprecate or make it an alias for
>> crashkernel=...<896M, which is basically what Yinghai's patch does.
>
> Agreed.
>
> So Yinghai's patch is fine. I need to write a patch for introducing
> crashkernel=X<Y syntax to make the behavior explicit. Will do...
>

Sounds like a plan. Thanks!

	-hpa
Thread overview: 7+ messages
[not found] <4CAA4BD5.4020505@kernel.org>
[not found] ` <4CAA4DE2.1020406@kernel.org>
2010-10-05 21:15 ` [PATCH 2/4] x86, memblock: Fix crashkernel allocation H. Peter Anvin
2010-10-05 22:29 ` H. Peter Anvin
2010-10-05 23:05 ` Yinghai Lu
[not found] ` <tip-9f4c13964b58608fbce05540743281ea3146c0e8@git.kernel.org>
[not found] ` <20101006151449.GA7378@redhat.com>
[not found] ` <4CACF531.2060407@intel.com>
[not found] ` <20101006224704.GD7378@redhat.com>
[not found] ` <4CAD01A9.9050907@intel.com>
2010-10-07 18:18 ` [tip:core/memblock] " Vivek Goyal
2010-10-07 18:54 ` H. Peter Anvin
2010-10-07 19:21 ` Vivek Goyal
2010-10-07 20:44 ` H. Peter Anvin