linux-mm.kvack.org archive mirror
* [RFC 1/1] bootmem: move big allocations behing 4G
@ 2010-01-18 22:56 Jiri Slaby
From: Jiri Slaby @ 2010-01-18 22:56 UTC
  To: linux-mm; +Cc: hannes, linux-kernel, jirislaby

Hi, I'm fighting a bug where Grub loads the kernel just fine, whereas
isolinux doesn't. I found out it's due to the initrd being loaded at
different addresses. On a machine with 128G of memory, in our case grub
loads the initrd at 895M, so the flat mem_map (2G long) does not fit
below 2G and is allocated above 4G, because 2-4G is reserved by the BIOS.

On the other hand, with isolinux 0-2G is free, so mem_map is placed
there, leaving no space for others. The kernel then panics because
swiotlb needs memory below 4G.

I'm using the patch below, but judging from the code, it seems it won't
work for section allocations.

Any ideas?

--

If a machine has a large amount of memory (128G) and 2G of the low
4G are reserved by the BIOS, the rest of the "low" memory is consumed
by mem_map when flat mapping is enabled.

Subsequent allocations limited to below 4G (e.g. swiotlb) then fail,
and the kernel panics.

On 64-bit, try to avoid that situation by placing allocations of 128M
or more above 4G when possible. With that, mem_map is allocated above
4G and there is enough space for others (swiotlb) in the low 4G.
---
 mm/bootmem.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/bootmem.c b/mm/bootmem.c
index 7d14868..365a0d1 100644
--- a/mm/bootmem.c
+++ b/mm/bootmem.c
@@ -486,6 +486,11 @@ static void * __init alloc_bootmem_core(struct bootmem_data *bdata,
 
 	step = max(align >> PAGE_SHIFT, 1UL);
 
+	/* on 64-bit: allocate 128M+ at 4G if satisfies limit */
+	if (BITS_PER_LONG == 64 && size >= (128UL << 20) &&
+			(4UL << 30) + size < (max << PAGE_SHIFT))
+		goal = 4UL << (30 - PAGE_SHIFT);
+
 	if (goal && min < goal && goal < max)
 		start = ALIGN(goal, step);
 	else
-- 
1.6.5.7
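A rough userspace model of the heuristic this RFC adds, useful for checking which requests it would move. It is a sketch, not the kernel code. `rfc_adjust_goal()` and the `SKETCH_` names are hypothetical, and 4K pages are assumed. `is_64bit` stands in for `BITS_PER_LONG == 64`. Goals are treated as page frame numbers, as the patch's `4UL << (30 - PAGE_SHIFT)` implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: 4K pages, as on x86_64. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical userspace model of the goal adjustment added by the RFC
 * patch.  goal_pfn and max_pfn are page frame numbers; size is in bytes.
 * is_64bit stands in for BITS_PER_LONG == 64.
 */
static uint64_t rfc_adjust_goal(uint64_t goal_pfn, uint64_t size,
				uint64_t max_pfn, bool is_64bit)
{
	const uint64_t big = UINT64_C(128) << 20;	/* 128M threshold */
	const uint64_t four_g = UINT64_C(4) << 30;	/* 4G in bytes */

	/* Keep max_pfn << SKETCH_PAGE_SHIFT from overflowing. */
	assert(max_pfn < (UINT64_C(1) << (64 - SKETCH_PAGE_SHIFT)));

	/* Large request that still fits entirely above 4G: aim at 4G. */
	if (is_64bit && size >= big &&
	    four_g + size < (max_pfn << SKETCH_PAGE_SHIFT))
		return four_g >> SKETCH_PAGE_SHIFT;

	/* Otherwise leave the caller's goal alone. */
	return goal_pfn;
}
```

On the 128G machine from the report (max_pfn = 128G >> 12), a 1.75G mem_map request gets a goal of PFN 0x100000, which is 4G. A 64M request keeps its original goal because it is below the 128M threshold.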

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

* Re: [RFC 1/1] bootmem: move big allocations behing 4G
  2010-01-18 22:56 [RFC 1/1] bootmem: move big allocations behing 4G Jiri Slaby
@ 2010-01-19 14:33 ` Johannes Weiner
From: Johannes Weiner @ 2010-01-19 14:33 UTC
  To: Jiri Slaby; +Cc: linux-mm, linux-kernel, jirislaby, Ralf Baechle, x86

Hello Jiri,

On Mon, Jan 18, 2010 at 11:56:30PM +0100, Jiri Slaby wrote:
> Hi, I'm fighting a bug where Grub loads the kernel just fine, whereas
> isolinux doesn't. I found out, it's due to different addresses of
> loaded initrd. On a machine with 128G of memory, grub loads the
> initrd at 895M in our case and flat mem_map (2G long) is allocated
> above 4G due to 2-4G BIOS reservation.
> 
> On the other hand, with isolinux, the 0-2G is free and mem_map is
> placed there leaving no space for others, hence kernel panics for
> swiotlb which needs to be below 4G.

Bootmem already protects the lower 16MB DMA zone for the obvious reasons,
how about shifting the default bootmem goal above the DMA32 zone if it exists?

I added Ralf and the x86 Team on Cc as this only affects x86 and mips, afaics.

> Any ideas?

I tested the below on a rather dull x86_64 machine and it seems to work.  Would
this work in your case as well?  The goal for mem_map should now be above 4G.
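The RFC hunk's `goal && min < goal && goal < max` check shows that the bootmem goal is a starting point, not a limit. That is why raising the default goal should be low-risk on machines without memory above 4G. The sketch below models this with a flat array of page frames. It also assumes that bootmem retries from the bottom when nothing fits above the goal. That fallback is an assumption and is not shown in this thread. `sketch_alloc()` and `find_free_run()` are hypothetical, and alignment, nodes and the allocation hint are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model of a goal-directed bootmem
 * allocation.  used[] has one entry per page frame; true means taken.
 */

/* First-fit search for npages free frames starting at frame 'start'.
 * Marks the run as used and returns its first frame, or -1. */
static long find_free_run(bool *used, size_t nframes, size_t npages,
			  size_t start)
{
	for (size_t i = start; i + npages <= nframes; i++) {
		size_t j = 0;

		while (j < npages && !used[i + j])
			j++;
		if (j == npages) {
			for (j = 0; j < npages; j++)
				used[i + j] = true;
			return (long)i;
		}
	}
	return -1;
}

/* Try from the goal first.  The goal is only a preference, so fall back
 * to the bottom of memory (assumed behaviour) if nothing fits above it. */
static long sketch_alloc(bool *used, size_t nframes, size_t npages,
			 size_t goal)
{
	long pfn = -1;

	assert(used != NULL && npages > 0);
	if (goal < nframes)
		pfn = find_free_run(used, nframes, npages, goal);
	if (pfn < 0 && goal != 0)
		pfn = find_free_run(used, nframes, npages, 0);
	return pfn;
}
```

Take 16 frames and a goal of frame 4. The first two requests land at or above the goal. Once that region is full, a third request falls back to frame 0 instead of failing. A goal beyond the end of memory behaves the same way.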

	Hannes

* Re: [RFC 1/1] bootmem: move big allocations behing 4G
  2010-01-19 14:33 ` Johannes Weiner
@ 2010-01-19 22:02   ` Jiri Slaby
From: Jiri Slaby @ 2010-01-19 22:02 UTC
  To: Johannes Weiner; +Cc: linux-mm, linux-kernel, Ralf Baechle, x86

On 01/19/2010 03:33 PM, Johannes Weiner wrote:
> On Mon, Jan 18, 2010 at 11:56:30PM +0100, Jiri Slaby wrote:
>> Hi, I'm fighting a bug where Grub loads the kernel just fine, whereas
>> isolinux doesn't. I found out, it's due to different addresses of
>> loaded initrd. On a machine with 128G of memory, grub loads the
>> initrd at 895M in our case and flat mem_map (2G long) is allocated
>> above 4G due to 2-4G BIOS reservation.
>>
>> On the other hand, with isolinux, the 0-2G is free and mem_map is
>> placed there leaving no space for others, hence kernel panics for
>> swiotlb which needs to be below 4G.
> 
> Bootmem already protects the lower 16MB DMA zone for the obvious reasons,
> how about shifting the default bootmem goal above the DMA32 zone if it exists?

Hi, I think it makes sense.

> I tested the below on a rather dull x86_64 machine and it seems to work.  Would
> this work in your case as well?  The goal for mem_map should now be above 4G.

It seems that it will. I'll give it a try later (it needs to be set up)
and report back.

> From 1c11ce1e82c6209f0eda72e3340ab0c55cd6f330 Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <jw@emlix.com>
> Date: Tue, 19 Jan 2010 14:14:44 +0100
> Subject: [patch] bootmem: avoid DMA32 zone, if any, by default
> 
> x86_64 and mips define a DMA32 zone additionally to the old DMA
> zone of 16MB.  Bootmem already avoids the old DMA zone if the
> allocation site did not request otherwise.
> 
> But since DMA32 is also a limited resource, avoid using it as well
> by default, if defined.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

So for the time being:
Reviewed-by: Jiri Slaby <jirislaby@gmail.com>

thanks,
-- 
js

* Re: [RFC 1/1] bootmem: move big allocations behing 4G
  2010-01-19 14:33 ` Johannes Weiner
  2010-01-19 22:02   ` Jiri Slaby
@ 2010-01-20 13:50   ` Jiri Slaby
From: Jiri Slaby @ 2010-01-20 13:50 UTC
  To: Johannes Weiner; +Cc: linux-mm, linux-kernel, Ralf Baechle, x86

On 01/19/2010 03:33 PM, Johannes Weiner wrote:
> --- a/include/linux/bootmem.h
> +++ b/include/linux/bootmem.h
> @@ -96,20 +96,26 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
>  				      unsigned long align,
>  				      unsigned long goal);
>  
> +#ifdef MAX_DMA32_PFN
> +#define BOOTMEM_DEFAULT_GOAL	(__pa(MAX_DMA32_PFN << PAGE_SHIFT))
> +#else
> +#define BOOTMEM_DEFAULT_GOAL	MAX_DMA_ADDRESS

I just noticed this should read:
#define BOOTMEM_DEFAULT_GOAL   __pa(MAX_DMA_ADDRESS)

> +#endif
> +
>  #define alloc_bootmem(x) \
> -	__alloc_bootmem(x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
> +	__alloc_bootmem(x, SMP_CACHE_BYTES, BOOTMEM_DEFAULT_GOAL)


-- 
js

* Re: [RFC 1/1] bootmem: move big allocations behing 4G
  2010-01-20 13:50   ` Jiri Slaby
@ 2010-01-20 15:30     ` Johannes Weiner
From: Johannes Weiner @ 2010-01-20 15:30 UTC
  To: Jiri Slaby; +Cc: linux-mm, linux-kernel, Ralf Baechle, x86

Hi Jiri,

On Wed, Jan 20, 2010 at 02:50:13PM +0100, Jiri Slaby wrote:
> On 01/19/2010 03:33 PM, Johannes Weiner wrote:
> > --- a/include/linux/bootmem.h
> > +++ b/include/linux/bootmem.h
> > @@ -96,20 +96,26 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
> >  				      unsigned long align,
> >  				      unsigned long goal);
> >  
> > +#ifdef MAX_DMA32_PFN
> > +#define BOOTMEM_DEFAULT_GOAL	(__pa(MAX_DMA32_PFN << PAGE_SHIFT))
> > +#else
> > +#define BOOTMEM_DEFAULT_GOAL	MAX_DMA_ADDRESS
> 
> I just noticed this should write:
> #define BOOTMEM_DEFAULT_GOAL   __pa(MAX_DMA_ADDRESS)

Pardon my sloppiness, it's all backwards.  The other case should
be without the __pa(), of course.
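The placement of `__pa()` matters because `MAX_DMA_ADDRESS` is a virtual address in the kernel's direct mapping, while `MAX_DMA32_PFN << PAGE_SHIFT` is already physical. Below is a simplified model assuming 4K pages and a direct-mapping base of 0xffff880000000000 (an x86_64-style value assumed here). The `sketch_`/`SKETCH_` names and `goal_*()` helpers are hypothetical. The real `__pa()` handles more cases than a plain subtraction.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model, assumptions labelled: 4K pages and an x86_64-style
 * direct-mapping base.  The real __pa() also handles the kernel text
 * mapping; only the direct-map case is modelled here.
 */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_OFFSET	UINT64_C(0xffff880000000000)

/* Direct-mapped virtual address -> physical address. */
static uint64_t sketch_pa(uint64_t vaddr)
{
	return vaddr - SKETCH_PAGE_OFFSET;
}

/* A *virtual* address: the end of the 16M DMA zone in the direct map. */
#define SKETCH_MAX_DMA_ADDRESS	(SKETCH_PAGE_OFFSET + (UINT64_C(16) << 20))
/* A page frame number: shifting it yields a *physical* address. */
#define SKETCH_MAX_DMA32_PFN	((UINT64_C(4) << 30) >> SKETCH_PAGE_SHIFT)

/* As in the corrected version: __pa() only where the input is virtual. */
static uint64_t goal_dma(void)   { return sketch_pa(SKETCH_MAX_DMA_ADDRESS); }
static uint64_t goal_dma32(void) { return SKETCH_MAX_DMA32_PFN << SKETCH_PAGE_SHIFT; }

/* The first draft, "all backwards": __pa() on a physical address and
 * no __pa() on a virtual one.  Both give absurd physical goals. */
static uint64_t goal_dma32_draft(void)
{
	return sketch_pa(SKETCH_MAX_DMA32_PFN << SKETCH_PAGE_SHIFT);
}
static uint64_t goal_dma_draft(void) { return SKETCH_MAX_DMA_ADDRESS; }
```

In this model the corrected goals come out as 16M and 4G. Both draft variants yield addresses far beyond the 128G of RAM in the report.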

I'll send a fixed and tested version later.

Thanks,
	Hannes

* [PATCH] bootmem: avoid DMA32 zone by default
  2010-01-20 15:30     ` Johannes Weiner
@ 2010-01-20 22:53       ` Johannes Weiner
From: Johannes Weiner @ 2010-01-20 22:53 UTC
  To: Andrew Morton
  Cc: Jiri Slaby, linux-mm, linux-kernel, Ralf Baechle, x86, stable

Bootmem already tries normal allocations above the DMA zone to reserve
it for users that cannot cope with higher addresses.

The same principle applies to the DMA32 zone, which is currently not
spared from normal allocations.

This can lead to exhaustion of this limited amount of address space
through things that can easily live elsewhere, such as the mem_map.

Raise bootmem's default goal beyond DMA32 for architectures with this
zone defined.  For now, these are x86 and mips.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Jiri Slaby <jslaby@suse.cz>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: x86@kernel.org
Cc: stable@kernel.org
---
 include/linux/bootmem.h |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

I cc'd stable because this affects already released kernels.  But since this is
the first report of DMA32 memory exhaustion through bootmem that I have heard of,
you guys might want to skip this patch due to the fragile nature of early memory
management.

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index b10ec49..52c8272 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -96,20 +96,26 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 				      unsigned long align,
 				      unsigned long goal);
 
+#ifdef MAX_DMA32_PFN
+#define BOOTMEM_DEFAULT_GOAL	(MAX_DMA32_PFN << PAGE_SHIFT)
+#else
+#define BOOTMEM_DEFAULT_GOAL	__pa(MAX_DMA_ADDRESS)
+#endif
+
 #define alloc_bootmem(x) \
-	__alloc_bootmem(x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem(x, SMP_CACHE_BYTES, BOOTMEM_DEFAULT_GOAL)
 #define alloc_bootmem_nopanic(x) \
-	__alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_nopanic(x, SMP_CACHE_BYTES, BOOTMEM_DEFAULT_GOAL)
 #define alloc_bootmem_pages(x) \
-	__alloc_bootmem(x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem(x, PAGE_SIZE, BOOTMEM_DEFAULT_GOAL)
 #define alloc_bootmem_pages_nopanic(x) \
-	__alloc_bootmem_nopanic(x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_nopanic(x, PAGE_SIZE, BOOTMEM_DEFAULT_GOAL)
 #define alloc_bootmem_node(pgdat, x) \
-	__alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node(pgdat, x, SMP_CACHE_BYTES, BOOTMEM_DEFAULT_GOAL)
 #define alloc_bootmem_pages_node(pgdat, x) \
-	__alloc_bootmem_node(pgdat, x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node(pgdat, x, PAGE_SIZE, BOOTMEM_DEFAULT_GOAL)
 #define alloc_bootmem_pages_node_nopanic(pgdat, x) \
-	__alloc_bootmem_node_nopanic(pgdat, x, PAGE_SIZE, __pa(MAX_DMA_ADDRESS))
+	__alloc_bootmem_node_nopanic(pgdat, x, PAGE_SIZE, BOOTMEM_DEFAULT_GOAL)
 
 #define alloc_bootmem_low(x) \
 	__alloc_bootmem_low(x, SMP_CACHE_BYTES, 0)
-- 
1.6.5.4

* Re: [PATCH] bootmem: avoid DMA32 zone by default
  2010-01-20 22:53       ` [PATCH] bootmem: avoid DMA32 zone by default Johannes Weiner
@ 2010-01-20 23:12         ` Jiri Slaby
From: Jiri Slaby @ 2010-01-20 23:12 UTC
  To: Johannes Weiner
  Cc: Andrew Morton, linux-mm, linux-kernel, Ralf Baechle, x86, stable

On 01/20/2010 11:53 PM, Johannes Weiner wrote:
> I cc'd stable because this affects already released kernels.  But since this is
> the first report of DMA32 memory exhaustion through bootmem that I hear of,

For reference, this is what the setup looks like:
128G of RAM, flat mapping
sizeof(struct page)=56
0-1.75G mem_map
1.75-2G vfs caches, console and others. initrd reservation
2-4G reserved by BIOS

The kernel panics with out of memory when swiotlb tries to allocate 64M of
"low" bootmem.
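A little arithmetic confirms the numbers in this layout. The inputs come from the report: 128G of RAM, sizeof(struct page) = 56, roughly 256M of other early allocations and a 2G BIOS hole. 4K pages are an assumption, and `mem_map_bytes()`/`low_free_bytes()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Size of a flat mem_map: one struct page per page frame. */
static uint64_t mem_map_bytes(uint64_t ram, uint64_t page_size,
			      uint64_t struct_page)
{
	assert(page_size != 0);
	return ram / page_size * struct_page;
}

/* Memory left below 4G after mem_map, other early allocations and the
 * BIOS-reserved hole have all been placed there (simplified: assumes
 * the pieces do not overlap). */
static uint64_t low_free_bytes(uint64_t mem_map, uint64_t others,
			       uint64_t bios_hole)
{
	uint64_t used = mem_map + others + bios_hole;

	return used >= 4 * GiB ? 0 : 4 * GiB - used;
}
```

128G / 4K = 32M pages, and at 56 bytes each that gives 1792M (1.75G), matching the mem_map range listed. Add the 256M of other allocations and the 2G BIOS hole, and nothing below 4G is left for the 64M swiotlb request. With mem_map moved above 4G, about 1.75G of low memory remains.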

-- 
js
suse labs

Thread overview: 7+ messages
2010-01-18 22:56 [RFC 1/1] bootmem: move big allocations behing 4G Jiri Slaby
2010-01-19 14:33 ` Johannes Weiner
2010-01-19 22:02   ` Jiri Slaby
2010-01-20 13:50   ` Jiri Slaby
2010-01-20 15:30     ` Johannes Weiner
2010-01-20 22:53       ` [PATCH] bootmem: avoid DMA32 zone by default Johannes Weiner
2010-01-20 23:12         ` Jiri Slaby
