public inbox for linux-kernel@vger.kernel.org
From: Vivek Goyal <vgoyal@redhat.com>
To: "H. Peter Anvin" <h.peter.anvin@intel.com>
Cc: "mingo@redhat.com" <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"yinghai@kernel.org" <yinghai@kernel.org>,
	"caiqian@redhat.com" <caiqian@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-tip-commits@vger.kernel.org" 
	<linux-tip-commits@vger.kernel.org>,
	Kexec Mailing List <kexec@lists.infradead.org>
Subject: Re: [tip:core/memblock] x86, memblock: Fix crashkernel allocation
Date: Thu, 7 Oct 2010 14:18:05 -0400
Message-ID: <20101007181804.GE23308@redhat.com>
In-Reply-To: <4CAD01A9.9050907@intel.com>

On Wed, Oct 06, 2010 at 04:09:29PM -0700, H. Peter Anvin wrote:
> On 10/06/2010 03:47 PM, Vivek Goyal wrote:
> > 
> > I really don't mind fixing the things properly in long term, just that I am
> > running out of ideas regarding how to fix it in proper way.
> > 
> > To me the best thing would be that this whole allocation thing be dynamic
> > from user space where kexec will run, determine what it is loading,
> > determine what are the memory constraints on these segments (min, upper
> > limit, alignment etc), and then ask kernel for reserving contiguous
> > memory. This kind of dynamic reservation will remove lot of problems
> > associated with crashkernel= reservations.
> > 
> > But I am not aware of any way of doing dynamic allocation and it certainly
> > does not seem to be easy to allocate 128M of memory contiguously.
> > 
> > Because we don't have a way to reserve memory dynamically later, we end up
> > doing a big chunk of reservation using kernel command line and later
> > figure out what to load where. Now with this approach kexec has not even run,
> > so how can it tell you what the memory constraints are?
> > 
> > So to me one of the ways of properly fixing this is adding some kind of
> > capability to reserve the memory dynamically (maybe using sys_kexec())
> > and get rid of this notion of reserving memory at boot time.
> 
> The problem, of course, with allocating very large chunks of memory at
> runtime is that there are going to be some number of non-movable and
> non-evictable pages that are going to break up the contiguous ranges.
> However, the mm recently added support for moving most pages, which
> should make that kind of allocation a lot more feasible.  I haven't
> experimented how well it works in practice, but I rather suspect that as
> long as the crashkernel is installed sufficiently early in the boot
> process it should have a very good probability of success.

Ok.

>  Another
> option, although one which has its own hackiness issues, is to do a
> conservative allocation at boot time in preparation of the kexec call,
> which is then freed.  This doesn't really address the issue of location,
> though, which is part of the problem here.
> 
> > The other concern you raised is hiding constraints from kernel. At this
> > point of time the only problem with crashkernel=X@0 syntax is that it
> > does not tell you whether to look for memory bottom up or top down. How
> > about if we specify it explicitly in the syntax so that kernel does not
> > have to assume things?
> 
> See below.
> 
> > In fact the initial crashkernel syntax was crashkernel=X@Y. This meant
> > allocate X amount of memory at location Y. This left no ambiguity and
> > kernel did not have to assume things. It had the problem though that
> > we might not have physical RAM at location Y. So I think that's when
> > somebody came up with the idea of crashkernel=X@0 so that we ideally
> > want memory at location 0, but if you can't provide that, then provide
> > anything available next scanning bottom up. 
> > 
> > So the only part missing from syntax is explicitly specifying "next
> > available location scanning bottom up". If we add that to syntax then
> > kernel does not have to make assumptions. (except the alignment part).
> > 
> > So how about modifying syntax to crashkernel=X@Y#BU.
> > 
> > The "#BU" part can be optional and in that case kernel is free to allocate
> > memory either top down or bottom up.
> > 
> > Or any other string which can communicate the bottom up part in a more
> > intuitive manner.
> 
> The whole problem here is that "bottoms up" isn't the true constraint --
> it's a proxy for "this chunk needs < address X, this chunk needs <
> address Y, ..." which is the real issue.  This is particularly messy
> since low memory is a (sometimes very) precious resource that is used by
> a lot of things (BIOS stubs, DMA-mask-limited hardware devices, and
> perhaps especially 1:1 mappable pages on 32 bits, and so on), and one of
> the major reasons we want to switch to a top-down allocation scheme is
> to not waste a precious resource when we don't have to.
> 
> The one improvement one could make to the crashkernel= syntax is perhaps
> "crashkernel=X<Y" meaning "allocate entirely below Y", since that is (at
> least in part) the real constraint.  It could even be extended to
> multiple segments: "crashkernel=X<Y,Z<W,..." if we really need to...
> that way you have your preallocation.

Ok, I was browsing through the kexec-tools x86 bzImage code, trying to
refresh my memory about which segments are loaded and what their memory
address constraints are.

- relocatable bzImage (max addr 0x37ffffff, 896MB).
	Though I don't know/understand where that 896MB comes from.

- initrd (max addr 0x37ffffff, 896MB)
	Don't know why 896MB is the upper limit.

- Purgatory (max addr 2G)

- A segment to keep ELF headers (no limit)
	These are accessed when the second kernel has fully booted, so they
	can live at higher addresses.

- A backup segment to copy first 640K of memory (not aware of any limit)

- Setup/parameter segment (no limit)
	We don't really execute anything here and just access it for the
	command line.

So at least for bzImage it looks like if we specify crashkernel=128M<896M, it
will work.
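
As a rough sanity check of the list above, the per-segment limits can be
tabulated and compared against a reservation ceiling. This is only a sketch:
the segment names and the helper violated_segments() are made up for
illustration, and the limits are the ones I listed above from reading the
code, not values exported by kexec-tools.

```python
MB = 1 << 20
GB = 1 << 30

# Highest usable address per segment, as listed above.
# None means "no known limit". Names are illustrative only.
SEGMENT_MAX_ADDR = {
    "bzImage": 0x37FFFFFF,      # 896MB - 1
    "initrd": 0x37FFFFFF,       # 896MB - 1
    "purgatory": 2 * GB,        # "max addr 2G"
    "elf_headers": None,
    "backup_640k": None,
    "setup_params": None,
}


def violated_segments(ceiling, limits=SEGMENT_MAX_ADDR):
    """Return the segments whose limit could be exceeded if the reserved
    region may extend right up to (but not including) `ceiling`."""
    last_byte = ceiling - 1
    return [
        name
        for name, max_addr in limits.items()
        if max_addr is not None and last_byte > max_addr
    ]


if __name__ == "__main__":
    for ceiling in (896 * MB, 1 * GB):
        print(hex(ceiling), violated_segments(ceiling))
```

With these numbers a ceiling of 896M reports no violations, while a ceiling
of 1G flags bzImage and initrd.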

So I am fine with the above additional syntax for crashkernel=. Maybe we shall
have to deprecate the crashkernel=X@0 syntax.
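
To make the proposal concrete, here is a rough sketch of how the suggested
"crashkernel=X<Y[,Z<W,...]" form might be parsed. It is purely hypothetical:
nothing in the kernel or kexec-tools implements this syntax as of this
thread, the function names are invented, and it only handles decimal numbers
with an optional K/M/G suffix.

```python
import re

# Size suffixes accepted by this sketch (binary multiples).
_SUFFIX = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse a decimal size such as '128M' into a byte count."""
    m = re.fullmatch(r"(\d+)([KMG]?)", text.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * _SUFFIX[m.group(2).upper()]


def parse_crashkernel_below(value):
    """Parse 'X<Y[,Z<W...]' into a list of (size, ceiling) byte tuples.

    Each entry means: reserve `size` bytes entirely below `ceiling`.
    """
    segments = []
    for part in value.split(","):
        size_text, sep, ceiling_text = part.partition("<")
        if not sep:
            raise ValueError(f"expected SIZE<CEILING, got {part!r}")
        size = parse_size(size_text)
        ceiling = parse_size(ceiling_text)
        if size > ceiling:
            raise ValueError(f"{part!r}: size does not fit below ceiling")
        segments.append((size, ceiling))
    return segments


if __name__ == "__main__":
    print(parse_crashkernel_below("128M<896M"))
```

The old X@Y form is rejected here on purpose, so that the two syntaxes would
have to be told apart before this parser is called.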

CCing kexec list, in case others have any comments.

Thanks
Vivek
