public inbox for linux-kernel@vger.kernel.org
* Re: kexec reboot code buffer
       [not found] <3E31AC58.2020802@us.ibm.com>
@ 2003-01-25 14:16 ` Eric W. Biederman
  2003-01-27 21:55   ` Dave Hansen
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2003-01-25 14:16 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel

Dave Hansen <haveblue@us.ibm.com> writes:

> On my system, it appears to lock up in:
> kimage_alloc_reboot_code_pages()
> after the kexec -l.

O.k.  From what I have seen described it should come out of it
eventually, but the current algorithm is definitely inefficient on
your machine.
 
> I put a little printk in the loop:
>         list_for_each_safe(pos, next, &extra_pages) {
>                 struct page *page;
>                 int i;
>                 if( (listcount++%1000000) == 0 )
>                 printk("listcount:%d\n", listcount);
>                 page = list_entry(pos, struct page, list);
>                 for(i = 0; i < count; i++) {
>                         ClearPageReserved(pages +i);
>                 }
>                 list_del(&extra_pages);
>                 __free_pages(page, order);
>         }
> 
> I stopped it when it hit 1.2 billion:
> kimage_alloc_reboot_code_pages(): listcount:1213000001
> 
> First of all, this is a 16-way, 4-node NUMA-Q with 32GB of RAM.
> If the alloc_pages(GFP_HIGHUSER, order) doesn't happen on node 0,
> CPU[0-3], it will be guaranteed to get physical addresses >8GB, until
> that node is out of memory, when it will start falling over to other
> nodes' highmem.

Thanks.  I was afraid this might be a problem, but I was not certain
how much of a practical problem it would be.  On a NUMA machine on x86
with > 4GB of memory the fact that this hangs shows it is very
definitely a problem.
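
The allocate-and-discard pattern under discussion can be sketched in
userspace as follows.  This is only an illustration, not the kexec code:
fake_allocator, fake_alloc_page and alloc_below are hypothetical names, and
the allocator is assumed to hand out page frames from the top of memory
downward, as a highmem-first allocation on a far node would.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the page allocator: hands out page frame
 * numbers from the top of memory downward. */
struct fake_allocator {
    unsigned long next_pfn;   /* frames [0, next_pfn) are still free */
};

/* Returns the next free frame number, or -1 when memory is exhausted. */
static long fake_alloc_page(struct fake_allocator *a)
{
    if (a->next_pfn == 0)
        return -1;
    return (long)--a->next_pfn;
}

/* Allocate until a frame below limit_pfn turns up.  Rejected frames
 * have to stay allocated in the meantime (otherwise the allocator
 * would hand the same frame back), so *rejected is the real cost. */
static long alloc_below(struct fake_allocator *a, unsigned long limit_pfn,
                        unsigned long *rejected)
{
    long pfn;

    assert(rejected != NULL);
    *rejected = 0;
    while ((pfn = fake_alloc_page(a)) >= 0) {
        if ((unsigned long)pfn < limit_pfn)
            return pfn;       /* usable frame found */
        (*rejected)++;        /* too high: hold it and try again */
    }
    return -1;                /* nothing below the limit */
}
```

With 32 frames and a limit of 4, the sketch rejects 28 frames before it
returns a usable one, so the work grows with the amount of memory above
the limit rather than with the size of the request.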

> So, my real question is why you bother to allocate from HIGHMEM at all,
> when you know that you probably won't be getting what you want?  

Actually I did not know.  The common case is non-NUMA with PAE not
enabled.  In which case the odds are fairly high I will get what I
want.  And being able to allocate from 3GB instead of just 1GB is
much more polite.  The question then is how do I specify the zones
properly.

Additionally I did not start using high memory until recently, when I
rewrote my generic code.  With the rewrite it is more obvious
what I am trying to accomplish, but it obviously works much less
well in the NUMA-Q corner case.

Looking at this code a little more there is also another related
bug.  I use TASK_SIZE to figure out how large an address I can
allocate but on 64bit architectures with a 32bit subset TASK_SIZE
can be a variable instead of a constant.  For the moment I am
going to just ignore that issue as sys_kexec_load looks
like one system call that it does not make sense to emulate.  The
expected behaviors on 64bit vs. 32bit architectures are quite
different.

> What you want is RAM with physical addresses <3GB, right?

In this case, and then later I want to allocate from physical
addresses < 4GB.  The rest of the allocations will suffer
from the same problem on the NUMA-Q.

The problem is that I have not figured out how to tell the memory
allocator just what I need, and now it has been confirmed that
with too many false positives my code takes forever.   My goal has
always been to not make kexec an undue burden on the rest of the
kernel because I am not the common case.  

Allocating DMA-able memory that you can use for pci devices
has the same sort of issues.   And so far it appears the solution
there is not to allocate from memory outside of the kernel's virtual
address space.  As that case is improved I may be able to ride
its coattails.

I guess what I want to do then is add to asm-i386/kexec.h
#ifndef CONFIG_HIGHMEM64G
#define GFP_KEXEC GFP_HIGHUSER
#else
#define GFP_KEXEC GFP_KERNEL
#endif

And then in kernel/kexec.c
s/GFP_HIGHUSER/GFP_KEXEC/g

That should provide a usable mechanism to control this kind of thing,
it is not perfect, but at least I will be guaranteed to get back
memory that I can use.

I wonder if it is worth it to set up a special zone and zone list for
use with kexec?

I guess I would make the standard zones something like:
/*
 * ZONE_DMA       < 16 MB      ISA DMA capable memory
 * ZONE_NORMAL    16-896 MB    direct mapped by the kernel
 * ZONE_PHYSMEM   896-4096 MB  memory that is accessible with the MMU disabled
 * ZONE_HIGHMEM   > 4096 MB    only page cache and user processes
 */


Or something to that effect, so I could separate out memory
below 4GB and memory above 4GB.  For the reboot code buffer it really
does not matter, as that is just one chunk of physically contiguous
memory.  For the normal allocation being able to get memory anywhere
between 0-4GB on a 32bit platform is something I don't want to give
up.
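
The proposed zone split can be written down as a classification by physical
address.  The following is a minimal userspace sketch under that reading of
the comment above; zone_sketch and zone_for_phys are hypothetical names, and
ZONE_PHYSMEM is only a proposal in this mail, not an existing zone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical zone ids mirroring the proposed layout. */
enum zone_sketch { Z_DMA, Z_NORMAL, Z_PHYSMEM, Z_HIGHMEM };

#define MB(x) ((uint64_t)(x) << 20)

/* Map a physical address to the zone it would fall in. */
static enum zone_sketch zone_for_phys(uint64_t phys)
{
    if (phys < MB(16))
        return Z_DMA;       /* ISA DMA capable */
    if (phys < MB(896))
        return Z_NORMAL;    /* direct mapped by the kernel */
    if (phys < MB(4096))
        return Z_PHYSMEM;   /* reachable with the MMU disabled */
    return Z_HIGHMEM;       /* above 4GB */
}
```

Under this split, anything that is not Z_HIGHMEM is below 4GB, which is
the property the normal kexec allocations are after.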

Eric

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: kexec reboot code buffer
  2003-01-25 14:16 ` kexec reboot code buffer Eric W. Biederman
@ 2003-01-27 21:55   ` Dave Hansen
  2003-01-27 22:03     ` Martin J. Bligh
  2003-01-28  7:04     ` Eric W. Biederman
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Hansen @ 2003-01-27 21:55 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, Martin J. Bligh

Eric W. Biederman wrote:
> Dave Hansen <haveblue@us.ibm.com> writes:
>>On my system, it appears to lock up in:
>>kimage_alloc_reboot_code_pages()
>>after the kexec -l.
> 
> 
> O.k. It should come out of it eventually from what I have
> seen described, the current algorithm is definitely inefficient on
> your machine.

It does appear to completely hang in the free loop.  Something funny is
happening there.  I'll try to provide more details later.  BTW, do you
mind updating your patches for 2.5.59?  I'm having some other problems
and I want to make sure it isn't my bad merging that's at fault :)
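
One detail of the loop as quoted at the top of the thread may be relevant
here: it passes &extra_pages (the list head) to list_del() instead of pos.
Whether that is in the patch or only in the quoted copy is not established
in this thread.  Unlinking the head would leave the remaining entries in a
ring that never reaches the head again, which would be consistent with a
counter running into the billions.  Below is a minimal userspace sketch of
a drain loop that unlinks the entry itself; list_node, list_add_tail_sketch,
list_del_sketch and drain are stand-ins written for this example, not the
kernel's list.h.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel idiom. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *h) { h->next = h->prev = h; }

static void list_add_tail_sketch(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_sketch(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = NULL;
}

/* Remove every entry from the list; returns how many were removed. */
static int drain(struct list_node *head)
{
    struct list_node *pos, *next;
    int freed = 0;

    assert(head != NULL);
    /* "safe" iteration: remember next before pos is unlinked */
    for (pos = head->next, next = pos->next; pos != head;
         pos = next, next = pos->next) {
        list_del_sketch(pos);   /* unlink the entry, not the head */
        freed++;                /* the page would be freed here */
    }
    return freed;
}
```

After drain() the head points back at itself, so the loop terminates
after exactly one pass over the entries.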

> And being able to allocate from 3GB instead of just 1GB is
> much more polite.  The question then is how do I specify the zones
> properly.

Actually, I think that using lowmem is OK.  The machine is going away
soon anyway, and the necessary memory is a very small portion,
especially on a machine with this much RAM.

>>What you want is RAM with physical addresses <3GB, right?
> 
> In this case, and then later I want to allocate from physical
> addresses < 4GB.  The rest of the allocations will suffer
> from the same problem on the NUMA-Q.
> 
> The problem is that I have not figured out how to tell the memory
> allocator just what I need, 
<snip>
> I guess I would make the standard zones something like:
> /*
>  * ZONE_DMA	  < 16 MB	ISA DMA capable memory
>  * ZONE_NORMAL  16-896 MB	direct mapped by the kernel
>  * ZONE_PHYSMEM 896-4096 MB	memory that is accessible with the
>                               MMU disabled.
>  * ZONE_HIGHMEM > 4096MB      only page cache and user processes
>  */

I think this might be overkill.  ZONE_NORMAL gives you what you want,
and I don't think it's worth it to introduce a new one just for the
relatively short timespan where you have the new kernel loaded, but
haven't actually shut down.  I think a little comment next to the
allocation explaining this will be more than enough.

Martin, any ideas?
-- 
Dave Hansen
haveblue@us.ibm.com



* Re: kexec reboot code buffer
  2003-01-27 21:55   ` Dave Hansen
@ 2003-01-27 22:03     ` Martin J. Bligh
  2003-01-28  0:10       ` William Lee Irwin III
  2003-01-28  7:24       ` Eric W. Biederman
  2003-01-28  7:04     ` Eric W. Biederman
  1 sibling, 2 replies; 13+ messages in thread
From: Martin J. Bligh @ 2003-01-27 22:03 UTC (permalink / raw)
  To: Dave Hansen, Eric W. Biederman; +Cc: linux-kernel

>> The problem is that I have not figured out how to tell the memory
>> allocator just what I need, 
> <snip>
>> I guess I would make the standard zones something like:
>> /*
>>  * ZONE_DMA	  < 16 MB	ISA DMA capable memory
>>  * ZONE_NORMAL  16-896 MB	direct mapped by the kernel
>>  * ZONE_PHYSMEM 896-4096 MB	memory that is accessible with the
>>                               MMU disabled.
>>  * ZONE_HIGHMEM > 4096MB      only page cache and user processes
>>  */
> 
> I think this might be overkill.  ZONE_NORMAL gives you what you want,
> and I don't think it's worth it to introduce a new one just for the
> relatively short timespan where you have the new kernel loaded, but
> haven't actually shut down.  I think a little comment next to the
> allocation explaining this will be more than enough.
> 
> Martin, any ideas?

We talked about creating a new zone specifically for DMA32 (ie <4Gb)
for other reasons, but it's not there as yet. As Dave mentioned,
ZONE_NORMAL should be sufficient, though if you need it physically
contiguous, that might be a problem.

How much memory do you need? If it's only 2Mb or so, why don't we
statically reserve it at boot time and keep it set aside?

M.



* Re: kexec reboot code buffer
  2003-01-27 22:03     ` Martin J. Bligh
@ 2003-01-28  0:10       ` William Lee Irwin III
  2003-01-28  7:24       ` Eric W. Biederman
  1 sibling, 0 replies; 13+ messages in thread
From: William Lee Irwin III @ 2003-01-28  0:10 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Dave Hansen, Eric W. Biederman, linux-kernel

On Mon, Jan 27, 2003 at 02:03:24PM -0800, Martin J. Bligh wrote:
> We talked about creating a new zone specifically for DMA32 (ie <4Gb)
> for other reasons, but it's not there as yet. As Dave mentioned,
> ZONE_NORMAL should be sufficient, though if you need it physically
> contiguous, that might be a problem.

Slapping down the new zone type is trivial and has no obvious negative
consequences. There's no clear reason why it's not already been done.

-- wli


* Re: kexec reboot code buffer
  2003-01-27 21:55   ` Dave Hansen
  2003-01-27 22:03     ` Martin J. Bligh
@ 2003-01-28  7:04     ` Eric W. Biederman
  2003-01-28  7:18       ` William Lee Irwin III
  1 sibling, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2003-01-28  7:04 UTC (permalink / raw)
  To: Dave Hansen; +Cc: linux-kernel, Martin J. Bligh

Dave Hansen <haveblue@us.ibm.com> writes:

> Eric W. Biederman wrote:
> > Dave Hansen <haveblue@us.ibm.com> writes:
> >>On my system, it appears to lock up in:
> >>kimage_alloc_reboot_code_pages()
> >>after the kexec -l.
> > 
> > 
> > O.k. It should come out of it eventually from what I have
> > seen described, the current algorithm is definitely inefficient on
> > your machine.
> 
> It does appear to completely hang in the free loop.  Something funny is
> happening there.  I'll try to provide more details later.  BTW, do you
> mind updating your patches for 2.5.59?  

I will give it a shot shortly.  I have been intensely busy just
lately, so finding a free second is a bit difficult.  At the same

> I'm having some other problems
> and I want to make sure it isn't my bad merging that's at fault :)

I don't recall any merging issues at all with the stock kernel, just
some slight line changes.
> 
> > And being able to allocate from 3GB instead of just 1GB is
> > much more polite.  The question then is how do I specify the zones
> > properly.
> 
> Actually, I think that using lowmem is OK.  The machine is going away
> soon anyway, and the necessary memory is a very small portion,
> especially on a machine with this much RAM.

I agree that lowmem for the common case is fine.  For kexec on panic,
and some weird cases, using high mem is beneficial.  I don't have
a problem with changing it back to just lowmem for the time being.
 
Eric


* Re: kexec reboot code buffer
  2003-01-28  7:04     ` Eric W. Biederman
@ 2003-01-28  7:18       ` William Lee Irwin III
  2003-01-28  7:28         ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: William Lee Irwin III @ 2003-01-28  7:18 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Dave Hansen, linux-kernel, Martin J. Bligh

On Tue, Jan 28, 2003 at 12:04:19AM -0700, Eric W. Biederman wrote:
> I agree that lowmem for the common case is fine.  For kexec on panic,
> and some weird cases, using high mem is beneficial.  I don't have
> a problem with changing it back to just lowmem for the time being.

Well, there is the bit about dropping the PAE bit from %cr4 too.
Seriously, just plop down the fresh zone type and all will be well.
It's really incredibly easy.


-- wli


* Re: kexec reboot code buffer
  2003-01-27 22:03     ` Martin J. Bligh
  2003-01-28  0:10       ` William Lee Irwin III
@ 2003-01-28  7:24       ` Eric W. Biederman
  2003-01-28 16:15         ` Martin J. Bligh
  1 sibling, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2003-01-28  7:24 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Dave Hansen, linux-kernel

"Martin J. Bligh" <mbligh@aracnet.com> writes:

> >> The problem is that I have not figured out how to tell the memory
> >> allocator just what I need, 
> > <snip>
> >> I guess I would make the standard zones something like:
> >> /*
> >>  * ZONE_DMA	  < 16 MB	ISA DMA capable memory
> >>  * ZONE_NORMAL  16-896 MB	direct mapped by the kernel
> >>  * ZONE_PHYSMEM 896-4096 MB	memory that is accessible with the
> >>                               MMU disabled.
> >>  * ZONE_HIGHMEM > 4096MB      only page cache and user processes
> >>  */
> > 
> > I think this might be overkill.  ZONE_NORMAL gives you what you want,
> > and I don't think it's worth it to introduce a new one just for the
> > relatively short timespan where you have the new kernel loaded, but
> > haven't actually shut down.  I think a little comment next to the
> > allocation explaining this will be more than enough.
> > 
> > Martin, any ideas?
> 
> We talked about creating a new zone specifically for DMA32 (ie <4Gb)
> for other reasons, but it's not there as yet. 

Right.  And because of that I don't feel bad about asking for a zone
that ends at 4GB, as it is a fairly general need in the kernel, even
if the rest of the interfaces have a little catching up to do before
they can use it.  Although with IOMMUs I don't know how much such a
DMA32 zone is worth.

> As Dave mentioned,
> ZONE_NORMAL should be sufficient, though if you need it physically
> contiguous, that might be a problem.

I am fine with memory that is not physically contiguous.  The memory
I really want is what the kernel is currently sitting on.....

> How much memory do you need? If it's only 2Mb or so, why don't we
> statically reserve it at boot time and keep it set aside?

The largest I have heard of is currently 96MB.   Typical is
somewhere between 900K and 6MB.  You get some interesting
kernel+ramdisk combinations when people are network booting a diskless
system.   Theoretically I can accommodate a nearly 4GB image, with the
current code structure. 

So the 4GB instead of 960MB limit, and not pessimizing the kernel for
the cases where the new image sits in ram for a while (kexec on panic),
is why I modified my code to use high memory.

The nasty case comes with high memory when I am allocating memory on a
32GB NUMA box and am allocating memory on the wrong node.  In which
case my code needs to allocate 28GB before it starts getting the
memory it wants.
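
To put a number on the 28GB figure: the helper below is a small worked
calculation, not kexec code.  pages_to_reject is a hypothetical name, and
the 4KB page size and single-page (order 0) allocations are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (UINT64_C(1) << 30)

/* Worst case: every page above the limit is handed out before the
 * first page below it.  Returns how many pages get rejected. */
static uint64_t pages_to_reject(uint64_t total_bytes, uint64_t limit_bytes,
                                uint64_t page_size)
{
    assert(page_size != 0);
    if (total_bytes <= limit_bytes)
        return 0;
    return (total_bytes - limit_bytes) / page_size;
}
```

For 32GB of RAM, a 4GB limit and 4KB pages this comes to 7340032 pages
that have to be allocated and held before a usable one can appear.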

Eric


* Re: kexec reboot code buffer
  2003-01-28  7:18       ` William Lee Irwin III
@ 2003-01-28  7:28         ` Eric W. Biederman
  2003-01-28  7:31           ` William Lee Irwin III
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2003-01-28  7:28 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Dave Hansen, linux-kernel, Martin J. Bligh

William Lee Irwin III <wli@holomorphy.com> writes:

> On Tue, Jan 28, 2003 at 12:04:19AM -0700, Eric W. Biederman wrote:
> > I agree that lowmem for the common case is fine.  For kexec on panic,
> > and some weird cases, using high mem is beneficial.  I don't have
> > a problem with changing it back to just lowmem for the time being.
> 
> Well, there is the bit about dropping the PAE bit from %cr4 too.

Already done.  It actually doesn't bite me until the next kernel starts
to execute, as we only set and never clear the PAE bit during bootup.

> Seriously, just plop down the fresh zone type and all will be well.
> It's really incredibly easy.

I will certainly take a look, tracing through that code can get a little
hairy.

Eric


* Re: kexec reboot code buffer
  2003-01-28  7:28         ` Eric W. Biederman
@ 2003-01-28  7:31           ` William Lee Irwin III
  2003-01-28 15:21             ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: William Lee Irwin III @ 2003-01-28  7:31 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Dave Hansen, linux-kernel, Martin J. Bligh

William Lee Irwin III <wli@holomorphy.com> writes:
>> Seriously, just plop down the fresh zone type and all will be well.
>> It's really incredibly easy.

On Tue, Jan 28, 2003 at 12:28:04AM -0700, Eric W. Biederman wrote:
> I will certainly take a look, tracing through that code can get a little
> hairy.

It can really be approached much more cavalierly than that. The only
extant example aside from the original ZONE_DMA32 implementation I've
seen is Simon Winwood's MPSS patch, which needed something on the order
of 10 lines of code for a fresh zone type (for one arch).

And most of the bulk of the ZONE_DMA32 implementation was stringing up
the block layer to utilize it, not inserting the new zone type itself.


-- wli


* Re: kexec reboot code buffer
  2003-01-28  7:31           ` William Lee Irwin III
@ 2003-01-28 15:21             ` Eric W. Biederman
  0 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2003-01-28 15:21 UTC (permalink / raw)
  To: William Lee Irwin III; +Cc: Dave Hansen, linux-kernel, Martin J. Bligh

William Lee Irwin III <wli@holomorphy.com> writes:

> William Lee Irwin III <wli@holomorphy.com> writes:
> >> Seriously, just plop down the fresh zone type and all will be well.
> >> It's really incredibly easy.
> 
> On Tue, Jan 28, 2003 at 12:28:04AM -0700, Eric W. Biederman wrote:
> > I will certainly take a look, tracing through that code can get a little
> > hairy.
> 
> It can really be approached much more cavalierly than that. The only
> extant example aside from the original ZONE_DMA32 implementation I've
> seen is Simon Winwood's MPSS patch, which needed something on the order
> of 10 lines of code for a fresh zone type (for one arch).
> 
> And most of the bulk of the ZONE_DMA32 implementation was stringing up
> the block layer to utilize it, not inserting the new zone type itself.

Primarily it appears that just another ZONE needs to be added, and then
free_area_init needs to be passed the proper parameters.  

I still want to look closely at how the discontig mem case for NUMA is
set up.  It is probably nothing to worry about, but I want to make
certain it does not have any perverse behavior, and I also want to be
certain I know how to set up a NUMA system properly, since I am looking
at the code anyway.

Eric


* Re: kexec reboot code buffer
  2003-01-28  7:24       ` Eric W. Biederman
@ 2003-01-28 16:15         ` Martin J. Bligh
  2003-01-29 15:41           ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: Martin J. Bligh @ 2003-01-28 16:15 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Dave Hansen, linux-kernel

>> We talked about creating a new zone specifically for DMA32 (ie <4Gb)
>> for other reasons, but it's not there as yet. 
> 
> Right.  And because of that I don't feel bad about asking for a zone
> that ends at 4GB, as it is a fairly general need in the kernel, even
> if the rest of the interfaces have a little catching up to do before
> they can use it.  Although with IOMMUs I don't know how much such a
> DMA32 zone is worth.

It's probably too late for that sort of thing now in 2.5 though. 

> I am fine with memory that is not physically contiguous.  The memory
> I really want the kernel is currently sitting on.....

Oh, in that case you should have no problem getting it from ZONE_NORMAL,
especially if you can wake up kswapd and wait for a few seconds.
 
> The largest I have heard of is currently 96MB.   Typical is

Eeek! ;-)

> somewhere between 900K and 6MB.  You get some interesting
> kernel+ramdisk combinations when people are network booting a diskless
> system.   Theoretically I can accommodate a nearly 4GB image, with the
> current code structure. 

Personally I don't have a problem setting aside that much space at boot
time, but it's probably not a good solution for small boxes.

> So the 4GB instead of 960MB limit, and not pessimizing the kernel for
> the cases where the new image sits in ram for a while (kexec on panic),
> is why I modified my code to use high memory.

Maybe IFF you want to support kexec on panic, it should be statically
reserved at boot time? You don't want to be mucking around in the panic
path trying to swap out memory, etc. when your kernel is halfway down
the toilet already ... and that has nothing to do with memory placement,
it's just a space issue.

> The nasty case comes with highmemory when I am allocating memory on a
> 32GB NUMA box and am allocating memory on the wrong node.  In which
> case my code needs to allocate 28GB before it starts getting the
> memory it wants.

Oh, just do alloc_pages_node(0) (works on non NUMA as well, will just 
fall back). But I can show you a 32Gb SMP box as well ;-) ZONE_NORMAL 
is probably still easiest, and most general.

M.


* Re: kexec reboot code buffer
  2003-01-28 16:15         ` Martin J. Bligh
@ 2003-01-29 15:41           ` Eric W. Biederman
  2003-01-29 16:17             ` Martin J. Bligh
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2003-01-29 15:41 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Dave Hansen, linux-kernel

"Martin J. Bligh" <mbligh@aracnet.com> writes:

> >> We talked about creating a new zone specifically for DMA32 (ie <4Gb)
> >> for other reasons, but it's not there as yet. 
> > 
> > Right.  And because of that I don't feel bad about asking for a zone
> > that ends at 4GB, as it is a fairly general need in the kernel, even
> > if the rest of the interfaces have a little catching up to do before
> > they can use it.  Although with IOMMUs I don't know how much such a
> > DMA32 zone is worth.
> 
> It's probably too late for that sort of thing now in 2.5 though. 

Unless there are balancing issues, adding a new zone is very trivial.
However, any code in that direction will be an additional patch,
because ZONE_NORMAL works for most cases.  Alan Cox can have his high
memory case if he wants it.

> > I am fine with memory that is not physically contiguous.  The memory
> > I really want the kernel is currently sitting on.....
> 
> Oh, in that case you should have no problem getting it from ZONE_NORMAL,
> especially if you can wake up kswapd and wait for a few seconds.

Nope, kswapd will not free the kernel's text segment.  So in practice
I can use anything below 4GB. 
  
> > The largest I have heard of is currently 96MB.   Typical is
> 
> Eeek! ;-)

There is even a distribution built to be run completely out of a ramdisk. 
http://warewulf-cluster.org/
 
> > somewhere between 900K and 6MB.  You get some interesting
> > kernel+ramdisk combinations when people are network booting a diskless
> > system.   Theoretically I can accommodate a nearly 4GB image, with the
> > current code structure. 
> 
> Personally I don't have a problem setting aside that much space at boot
> time, but it's probably not a good solution for small boxes.
> 
> > So the 4GB instead of 960MB limit, and not pessimizing the kernel for
> > the cases where the new image sits in ram for a while (kexec on panic),
> > is why I modified my code to use high memory.
> 
> Maybe IFF you want to support kexec on panic, it should be statically
> reserved at boot time? You don't want to be mucking around in the panic
> path trying to swap out memory, etc. when your kernel is halfway down
> the toilet already ... and that has nothing to do with memory placement,
> it's just a space issue.

As it currently exists, kexec happens in two stages.  First there is
sys_kexec_load, which scatters the new image throughout memory, with
some care so that I can use memcpy, and some variant of memmove to
place the image at its final location in memory.   This is where all
of the memory allocation happens.


Then there is sys_reboot(LINUX_REBOOT_CMD_KEXEC), (or a kernel panic)
which triggers the work.  And except for cpu state changes nothing
happens.
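
The two stages can be pictured with a small userspace sketch.  It is an
assumption-laden illustration, not the kexec implementation: seg_entry,
overlaps_range and relocate are hypothetical names, PAGE_SZ is deliberately
tiny, and the "care" taken at load time is modelled as a check that no
parked source page lies inside the destination range, which is what makes
a plain memcpy safe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 16   /* tiny "page" so the example stays readable */

/* One record per page, built during the load stage. */
struct seg_entry {
    const unsigned char *src;  /* where the load stage parked the page */
    size_t dest_off;           /* final offset inside the image */
};

/* Does the page at src overlap the destination range [dest, dest+len)? */
static int overlaps_range(const unsigned char *src,
                          const unsigned char *dest, size_t len)
{
    uintptr_t s = (uintptr_t)src, d = (uintptr_t)dest;
    return s < d + len && d < s + PAGE_SZ;
}

/* Second stage: no allocation, just move each page to its final place. */
static void relocate(unsigned char *dest, size_t dest_len,
                     const struct seg_entry *e, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        assert(e[i].dest_off + PAGE_SZ <= dest_len);
        /* the property the load stage is assumed to guarantee */
        assert(!overlaps_range(e[i].src, dest, dest_len));
        memcpy(dest + e[i].dest_off, e[i].src, PAGE_SZ);
    }
}
```

Because all placement decisions are recorded up front, the second stage
reduces to a loop of copies, which matches the description that nothing
but the copy and cpu state changes happens at reboot or panic time.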

> > The nasty case comes with highmemory when I am allocating memory on a
> > 32GB NUMA box and am allocating memory on the wrong node.  In which
> > case my code needs to allocate 28GB before it starts getting the
> > memory it wants.
> 
> Oh, just do alloc_pages_node(0) (works on non NUMA as well, will just 
> fall back). But I can show you a 32Gb SMP box as well ;-) ZONE_NORMAL 
> is probably still easiest, and most general.

Right, it is not hard to do; it just takes a little more work.

Eric


* Re: kexec reboot code buffer
  2003-01-29 15:41           ` Eric W. Biederman
@ 2003-01-29 16:17             ` Martin J. Bligh
  0 siblings, 0 replies; 13+ messages in thread
From: Martin J. Bligh @ 2003-01-29 16:17 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Dave Hansen, linux-kernel

>> > I am fine with memory that is not physically contiguous.  The memory
>> > I really want the kernel is currently sitting on.....
>> 
>> Oh, in that case you should have no problem getting it from ZONE_NORMAL,
>> especially if you can wake up kswapd and wait for a few seconds.
> 
> Nope, kswapd will not free the kernel's text segment.  So in practice
> I can use anything below 4GB. 

Oh, I'm well aware that the kernel won't get swapped out ;-) 
I was referring to the getting memory that's "not physically contiguous"
by waking up kswapd ;-)

> There is even a distribution built to be run completely out of a ramdisk. 
> http://warewulf-cluster.org/

Terrifying ;-)
  

M.


