From: Michael Ellerman <michael@ellerman.id.au>
To: Milton Miller <miltonm@bga.com>
Cc: linux-ppc <linuxppc-dev@ozlabs.org>,
Mohan Kumar <mohan@in.ibm.com>,
kexec@lists.infradead.org
Subject: Re: [PATCH] powerpc: check crash_base for relocatable kernel
Date: Thu, 08 Jan 2009 14:35:42 +1100
Message-ID: <1231385742.8294.31.camel@localhost>
In-Reply-To: <f378ca9151a64ea3cead1b7354e8153c@bga.com>
On Wed, 2009-01-07 at 08:57 -0600, Milton Miller wrote:
> [removed Paul from cc and fixed Mohan's email]
>
> On Jan 6, 2009, at 5:44 PM, Michael Ellerman wrote:
>
> > On Fri, 2009-01-02 at 14:46 -0600, Milton Miller wrote:
> >> @@ -94,10 +95,35 @@ void __init reserve_crashkernel(void)
> >> KDUMP_KERNELBASE);
> >>
> >> crashk_res.start = KDUMP_KERNELBASE;
> >> +#else
> >> + if (!crashk_res.start) {
> >> + /*
> >> + * unspecified address, choose a region of specified size
> >> + * can overlap with initrd (ignoring corruption when retained)
> >> + * ppc64 requires kernel and some stacks to be in first segment
> >> + */
> >> + crashk_res.start = KDUMP_KERNELBASE;
> >> + }
> >> +
> >> + crash_base = PAGE_ALIGN(crashk_res.start);
> >> + if (crash_base != crashk_res.start) {
> >> + printk("Crash kernel base must be aligned to 0x%lx\n",
> >> + PAGE_SIZE);
> >> + crashk_res.start = crash_base;
> >> + }
> >> +
> >> #endif
> >> crash_size = PAGE_ALIGN(crash_size);
> >> crashk_res.end = crashk_res.start + crash_size - 1;
> >>
> >> + /* The crash region must not overlap the current kernel */
> >> + if (overlaps_crashkernel(__pa(_stext), _end - _stext)) {
> >> + printk(KERN_WARNING
> >> + "Crash kernel can not overlap current kernel\n");
> >> + crashk_res.start = crashk_res.end = 0;
> >> + return;
> >> + }
> >
> > I think we can be smarter here. Why don't we adjust the crash kernel
> > region so that it doesn't overlap the first kernel? ie. move it up a
> > bit.
>
> How much? In addition to the size of the kernel, we have to allocate
> (1) the emergency stacks as we use them to bring up secondary cpus (2)
> the irq stacks in the first segment. While the second could be met
> easier on systems with 1TB slbs we don't take advantage of that yet.
Hmm, we could try and work it out though. I guess we don't know how many
CPUs we have at that point, which makes it a little trickier.
So we have the emergency stack and the hard & soft irq stacks per cpu,
which is 48KB AFAICT. So for a 256-way system that would be 12MB.
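
As a quick check of that arithmetic, here is a minimal sketch. It assumes each of the three per-CPU stacks is 16KB, which is what the 48KB figure implies; the macro and function names are made up for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: three 16KB stacks per CPU, matching the 48KB quoted above. */
#define STACK_SIZE_BYTES (16UL * 1024)
#define STACKS_PER_CPU   3UL	/* emergency, hard irq, soft irq */
#define PER_CPU_BYTES    (STACKS_PER_CPU * STACK_SIZE_BYTES)

/* Hypothetical helper: bytes of stack space needed for nr_cpus CPUs. */
static unsigned long stack_space_bytes(unsigned long nr_cpus)
{
	/* Guard against overflow in the multiplication below. */
	assert(nr_cpus <= ULONG_MAX / PER_CPU_BYTES);
	return nr_cpus * PER_CPU_BYTES;
}
```

With nr_cpus = 256 this gives 256 * 48KB = 12288KB, i.e. the 12MB mentioned above.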
I don't think I've seen an RMO smaller than 128MB, though I notice our
RPA note specifies 64M as the minimum we'll accept. That would probably
be a bit tight.
How about something like:

    min_space = _end + 16MB    (16 to be safe?)
    if min_space < rmo_size / 2:
        min_space = rmo_size / 2
    if crash_base < min_space:
        crash_base = min_space

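
The same idea written out in C, purely as a sketch of the pseudocode and not as a proposed patch. The function name, the plain integer parameters and the SAFETY_MARGIN constant are assumptions made for illustration; in the kernel the inputs would come from _end, the RMO size and crashk_res.start.

```c
#include <assert.h>
#include <stdint.h>

#define MB            (1024ULL * 1024ULL)
#define SAFETY_MARGIN (16 * MB)	/* the "16 to be safe?" guess */

/*
 * Hypothetical helper, not an existing kernel function.
 *   kernel_end: physical address of the end of the first kernel (_end)
 *   rmo_size:   size of the RMO in bytes
 *   crash_base: requested base of the crash kernel region
 * Returns the requested base, moved up if it is too low.
 */
static uint64_t clamp_crash_base(uint64_t kernel_end, uint64_t rmo_size,
				 uint64_t crash_base)
{
	uint64_t min_space;

	/* Guard against wrap-around when adding the margin. */
	assert(kernel_end <= UINT64_MAX - SAFETY_MARGIN);

	/* Leave room above the first kernel for stacks and the like. */
	min_space = kernel_end + SAFETY_MARGIN;

	/* Never hand the crash kernel more than the top half of the RMO. */
	if (min_space < rmo_size / 2)
		min_space = rmo_size / 2;

	/* Only ever move the region up, never down. */
	if (crash_base < min_space)
		crash_base = min_space;

	return crash_base;
}
```

For example, with the first kernel ending at 16MB, a 128MB RMO and a requested base of 32MB, min_space starts at 32MB, is raised to 64MB by the RMO rule, and the base ends up at 64MB.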
> > There's also the issue of the RMO, I'm not sure what we should do
> > there,
> > but I think the kernel needs some smarts otherwise users are going to
> > shoot themselves in the foot.
>
> I was looking at the code in kexec-tools for the rmo, and it seems
> extremely broken (ie it sets rmo_top on every memory block instead of
> the lowest; the clamp to 768M is the savior for systems with multiple
> blocks).
Oh surprise.
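
For illustration only, the behaviour Milton describes as the intended one (take rmo_top from the lowest memory block rather than from every block, then apply the 768M clamp) could be sketched as below. The struct, the function name and the flat array of blocks are hypothetical; this is not the kexec-tools code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory block: base address and size. */
struct mem_block {
	uint64_t base;
	uint64_t size;
};

#define RMO_CLAMP (768ULL * 1024 * 1024)	/* the 768M cap mentioned above */

/* Return the top of the RMO, taken from the lowest block only. */
static uint64_t rmo_top_from_blocks(const struct mem_block *blocks, size_t n)
{
	size_t i, lowest = 0;
	uint64_t top;

	assert(blocks != NULL && n > 0);

	/* Find the block with the lowest base address. */
	for (i = 1; i < n; i++)
		if (blocks[i].base < blocks[lowest].base)
			lowest = i;

	top = blocks[lowest].base + blocks[lowest].size;

	/* Apply the cap last, so it only matters for large first blocks. */
	return top < RMO_CLAMP ? top : RMO_CLAMP;
}
```

Picking the lowest block first means the result does not depend on the order in which the blocks are visited, which is the problem with overwriting rmo_top on every block.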
> Do we care about loading a kernel below a relocated kernel (between the
> interrupt vectors and the new kernel)? I ignored that for now,
> arguing that we always run the first kernel at 0.
No I don't think so.
> > We could ignore the @x setting and split the RMO between both kernels
> > somewhat intelligently.
> >
> > What might work is multiple crash regions, that way we could have some
> > space in the RMO for the second kernel (say 32MB?), but the rest
> > outside
> > - leaving some RMO for the first kernel. But I think that would require
> > some serious surgery.
> >
>
> Other archs have this, I guess because they read the memory out of
> /proc/iomem. The trick is knowing what has to be put in real space
> and what can go above the rmo. Also, we have that horrible hard-coded
> clamp of the rmo to 768M max because some platform (one of the cell
> ones?) didn't make the device tree show it. Maybe we can track it down
> and add linux,usable-mem-ranges to fix it up?
Dunno about the cell, but some of the early blades did have crufty
firmware.
> Does the generic code support loading into the split regions, or is it
> just for giving the kernel room to run?
I don't think so. I don't see any logic that deals with gaps in the
crashk region.
> So while all of these are nice, what do you think about merging this as
> an interim measure, especially for backporting to 2.6.28 stable (and any
> distro that wants to pick up relocatable kdump)?
I guess. I'd rather do something smarter, like I suggested above.
cheers
--
Michael Ellerman
OzLabs, IBM Australia Development Lab
wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)
We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person