From: Vivek Goyal <vgoyal@redhat.com>
To: "H. Peter Anvin" <h.peter.anvin@intel.com>
Cc: "caiqian@redhat.com" <caiqian@redhat.com>,
"linux-tip-commits@vger.kernel.org"
<linux-tip-commits@vger.kernel.org>,
Kexec Mailing List <kexec@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"yinghai@kernel.org" <yinghai@kernel.org>
Subject: Re: [tip:core/memblock] x86, memblock: Fix crashkernel allocation
Date: Thu, 7 Oct 2010 14:18:05 -0400
Message-ID: <20101007181804.GE23308@redhat.com>
In-Reply-To: <4CAD01A9.9050907@intel.com>
On Wed, Oct 06, 2010 at 04:09:29PM -0700, H. Peter Anvin wrote:
> On 10/06/2010 03:47 PM, Vivek Goyal wrote:
> >
> > I really don't mind fixing things properly in the long term; it's just that I am
> > running out of ideas on how to fix it the proper way.
> >
> > To me the best thing would be for this whole allocation to be dynamic and
> > driven from user space: kexec would run, determine what it is loading,
> > determine the memory constraints on these segments (min, upper
> > limit, alignment, etc.), and then ask the kernel to reserve contiguous
> > memory. This kind of dynamic reservation would remove a lot of the problems
> > associated with crashkernel= reservations.
> >
> > But I am not aware of any way of doing dynamic allocation, and it certainly
> > does not seem easy to allocate 128M of memory contiguously.
> >
> > Because we don't have a way to reserve memory dynamically later, we end up
> > doing a big chunk of reservation using the kernel command line and later
> > figure out what to load where. With this approach kexec has not even run
> > yet, so how can it tell you what the memory constraints are?
> >
> > So to me one way of fixing this properly is adding some kind of
> > capability to reserve memory dynamically (maybe using sys_kexec())
> > and getting rid of this notion of reserving memory at boot time.
>
> The problem, of course, with allocating very large chunks of memory at
> runtime is that there are going to be some number of non-movable and
> non-evictable pages that are going to break up the contiguous ranges.
> However, the mm recently added support for moving most pages, which
> should make that kind of allocation a lot more feasible. I haven't
> experimented with how well it works in practice, but I rather suspect that as
> long as the crashkernel is installed sufficiently early in the boot
> process it should have a very good probability of success.
Ok.
> Another
> option, although one which has its own hackiness issues, is to do a
> conservative allocation at boot time in preparation for the kexec call,
> which is then freed. This doesn't really address the issue of location,
> though, which is part of the problem here.
>
> > The other concern you raised is hiding constraints from the kernel. At this
> > point in time the only problem with the crashkernel=X@0 syntax is that it
> > does not tell you whether to look for memory bottom up or top down. How
> > about if we specify it explicitly in the syntax so that the kernel does not
> > have to assume things?
>
> See below.
>
> > In fact the initial crashkernel syntax was crashkernel=X@Y. This meant:
> > allocate X amount of memory at location Y. This left no ambiguity and the
> > kernel did not have to assume things. It had the problem, though, that
> > we might not have physical RAM at location Y. So I think that's when
> > somebody came up with the idea of crashkernel=X@0, meaning that ideally we
> > want memory at location 0, but if that can't be provided, then provide
> > the next available memory, scanning bottom up.
> >
> > So the only part missing from the syntax is explicitly specifying "next
> > available location scanning bottom up". If we add that to the syntax then
> > the kernel does not have to make assumptions (except for the alignment part).
> >
> > So how about modifying the syntax to crashkernel=X@Y#BU?
> >
> > The "#BU" part can be optional, and in that case the kernel is free to
> > allocate memory either top down or bottom up.
> >
> > Or any other string which can communicate the bottom-up part in a more
> > intuitive manner.
>
> The whole problem here is that "bottom up" isn't the true constraint --
> it's a proxy for "this chunk needs < address X, this chunk needs <
> address Y, ..." which is the real issue. This is particularly messy
> since low memory is a (sometimes very) precious resource that is used by
> a lot of things (BIOS stubs, DMA-mask-limited hardware devices, and
> perhaps especially 1:1 mappable pages on 32 bits, and so on), and one of
> the major reasons we want to switch to a top-down allocation scheme is
> to not waste a precious resource when we don't have to.
>
> The one improvement one could make to the crashkernel= syntax is perhaps
> "crashkernel=X<Y" meaning "allocate entirely below Y", since that is (at
> least in part) the real constraint. It could even be extended to
> multiple segments: "crashkernel=X<Y,Z<W,..." if we really need to...
> that way you have your preallocation.
Ok, I was browsing through the kexec-tools x86 bzImage code, trying to
refresh my memory of which segments are loaded and what their memory
address constraints are.
- relocatable bzImage (max addr 0x37ffffff, 896MB)
  Though I don't know/understand where that 896MB comes from.
- initrd (max addr 0x37ffffff, 896MB)
  Don't know why 896MB is the upper limit.
- Purgatory (max addr 2G)
- A segment to keep ELF headers (no limit)
  These are accessed when the second kernel has fully booted, so they can
  be placed at higher addresses.
- A backup segment to copy the first 640K of memory (not aware of any limit)
- Setup/parameter segment (no limit)
  - We don't really execute anything here; we just access it for the
    command line.
So at least for bzImage, it looks like specifying crashkernel=128M<896M
will work.
So I am fine with the above additional syntax for crashkernel=. Maybe we
shall have to deprecate the crashkernel=X@0 syntax.
CCing the kexec list, in case others have any comments.
Thanks
Vivek