Re: [PATCH] mm: fix boundary checking in free_bootmem_core


From: Andrew Morton <akpm@linux-foundation.org>
To: "Yinghai Lu" <yhlu.kernel@gmail.com>
Cc: andi@firstfloor.org, ak@suse.de, mingo@elte.hu, clameter@sgi.com,
	linux-kernel@vger.kernel.org, y-goto@jp.fujitsu.com,
	kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: [PATCH] mm: fix boundary checking in free_bootmem_core
Date: Fri, 21 Mar 2008 12:44:04 -0700
Message-ID: <20080321124404.f9d74052.akpm@linux-foundation.org>
In-Reply-To: <86802c440803141036m4a508a91o2cf6706157231429@mail.gmail.com>

On Fri, 14 Mar 2008 10:36:52 -0700
"Yinghai Lu" <yhlu.kernel@gmail.com> wrote:

> On Fri, Mar 14, 2008 at 9:53 AM, Andi Kleen <andi@firstfloor.org> wrote:
> > On Fri, Mar 14, 2008 at 09:44:50AM -0700, Yinghai Lu wrote:
> >  > On 14 Mar 2008 12:58:44 +0100, Andi Kleen <andi@firstfloor.org> wrote:
> >  > > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
> >  > >  >
> >  > >  > Then I tried to reserve 64M or 128M of RAM before that, and free it
> >  > >  > before gart/swiotlb tries to alloc_bootmem under 4G.
> >  > >
> >  > >  Sounds like an incredible hack. There are far better ways to do that
> >  > >  for bootmem allocations, e.g. you can just specify a high enough "goal".
> >  > >  That is how swiotlb solves a similar problem (at least before my
> >  > >  mask allocator rewrite).
> >  >
> >  > I don't think so.
> >  >
> >  > Anyway, another way to work around it is to
> >  > change
> >  >                 return __earlyonly_bootmem_alloc(node, size, size,
> >  >                                 __pa(MAX_DMA_ADDRESS));
> >  > in vmemmap_alloc_block to
> >  >                 return __earlyonly_bootmem_alloc(node, size, size,
> >  >                                 __pa(MAX_DMA_ADDRESS + (1<<27)));
> >  > to make room for the gart. But that is a global change and may affect other
> >  > platforms.
> >
> >  You can just make it an optional architecture-defined macro.
> It is hard to use a macro: if someone has allyesconfig, that will make
> the kernel code use a lot.
> >
> >
> >  > and it doesn't guarantee that the gart will get it.
> >
> >  Has nothing to do with the gart?
> >
> >
> >
> >  >
> >  > Also, I assume swiotlb needs that range to be below 4G.
> >
> >  The normal rule is that anybody who needs big bootmem allocations
> >  needs to make sure they're high enough not to fill up the first 4GB.
> >  For small allocations like most of bootmem it doesn't matter because
> >  they're, um, small.
> >
> >  If vmemmap doesn't do that vmemmap needs to be fixed.
> 
> How do you define big?
> It has hundreds of 2M blocks. When NUMA is on, they are spread across all
> nodes; when NUMA is off, they sit on first_online_node.
> 
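Andi's suggestion of a high allocation "goal" can be illustrated with a toy model. The sketch below is hypothetical userspace C, not the kernel's bootmem allocator: the toy_ names, the 32-page map, and page 16 standing in for the 4GB boundary are all assumptions. A large allocation starts its first-fit scan at the goal and retries from page 0 only if nothing fits there. Yinghai's workaround above is the same idea, with the vmemmap goal raised by 1<<27 bytes (128 MiB).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy memory: 32 pages; pages below TOY_LOW_LIMIT play "under 4G". */
#define TOY_PAGES     32
#define TOY_LOW_LIMIT 16

static bool toy_used[TOY_PAGES]; /* true = page already allocated */

static void toy_reset(void)
{
	for (unsigned long i = 0; i < TOY_PAGES; i++)
		toy_used[i] = false;
}

/*
 * First-fit search for npages contiguous free pages that end at or below
 * page `limit`. The scan starts at `goal`; if nothing fits there, it is
 * retried from page 0. Returns the first page index, or -1 on failure.
 */
static long toy_alloc(unsigned long npages, unsigned long goal, unsigned long limit)
{
	const unsigned long pass_start[2] = { goal, 0 };

	assert(limit <= TOY_PAGES);
	for (int pass = 0; pass < 2; pass++) {
		for (unsigned long start = pass_start[pass]; start + npages <= limit; start++) {
			unsigned long i = 0;

			/* Count how many pages from `start` onward are free. */
			while (i < npages && !toy_used[start + i])
				i++;
			if (i == npages) {
				for (i = 0; i < npages; i++)
					toy_used[start + i] = true;
				return (long)start;
			}
		}
	}
	return -1;
}
```

With goal = TOY_LOW_LIMIT, an 8-page "vmemmap" block lands at page 16, and a later 12-page request capped at TOY_LOW_LIMIT still fits at page 0. With goal = 0, the 8 pages land at page 0 and the same 12-page low request fails, which is the situation the thread wants to avoid for the gart aperture.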

So Ingo has now merged some x86 patches which apparently had a dependency
upon this patch: "otherwise free_bootmem_node in dma32_free could do sth
bad.".

I had this patch on hold awaiting conclusive feedback from Andi.  It looks
like it needs to be merged asap and any remaining problems should be
addressed separately.

Here's what I have:


From: "Yinghai Lu" <yhlu.kernel@gmail.com>

With NUMA enabled, some callers could have a range of memory on one node but
try to free it on another node.  This can cause some pages to be freed
wrongly.

For example, we may allocate 128M of boot RAM early for gart/swiotlb and free
that range later, so that gart/swiotlb can get a suitable range afterwards.

With this patch, callers don't need to know which node holds the range; they
can just loop over all online nodes and call free_bootmem_node for each.

This patch makes free_bootmem_core() more robust by trimming sidx and eidx
according to the RAM range that the node covers.

It also makes free_bootmem_core() handle the out-of-range case, so
free_bootmem() can walk bdata_list and be sure the range gets freed.  Callers
then no longer need to loop over online nodes and can use free_bootmem()
directly.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/bootmem.c |   25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff -puN mm/bootmem.c~mm-fix-boundary-checking-in-free_bootmem_core mm/bootmem.c
--- a/mm/bootmem.c~mm-fix-boundary-checking-in-free_bootmem_core
+++ a/mm/bootmem.c
@@ -125,6 +125,7 @@ static int __init reserve_bootmem_core(b
 	BUG_ON(!size);
 	BUG_ON(PFN_DOWN(addr) >= bdata->node_low_pfn);
 	BUG_ON(PFN_UP(addr + size) > bdata->node_low_pfn);
+	BUG_ON(addr < bdata->node_boot_start);
 
 	sidx = PFN_DOWN(addr - bdata->node_boot_start);
 	eidx = PFN_UP(addr + size - bdata->node_boot_start);
@@ -156,21 +157,31 @@ static void __init free_bootmem_core(boo
 	unsigned long sidx, eidx;
 	unsigned long i;
 
+	BUG_ON(!size);
+
+	/* out range */
+	if (addr + size < bdata->node_boot_start ||
+		PFN_DOWN(addr) > bdata->node_low_pfn)
+		return;
 	/*
 	 * round down end of usable mem, partially free pages are
 	 * considered reserved.
 	 */
-	BUG_ON(!size);
-	BUG_ON(PFN_DOWN(addr + size) > bdata->node_low_pfn);
 
-	if (addr < bdata->last_success)
+	if (addr >= bdata->node_boot_start && addr < bdata->last_success)
 		bdata->last_success = addr;
 
 	/*
-	 * Round up the beginning of the address.
+	 * Round up to index to the range.
 	 */
-	sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
+	if (PFN_UP(addr) > PFN_DOWN(bdata->node_boot_start))
+		sidx = PFN_UP(addr) - PFN_DOWN(bdata->node_boot_start);
+	else
+		sidx = 0;
+
 	eidx = PFN_DOWN(addr + size - bdata->node_boot_start);
+	if (eidx > bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start))
+		eidx = bdata->node_low_pfn - PFN_DOWN(bdata->node_boot_start);
 
 	for (i = sidx; i < eidx; i++) {
 		if (unlikely(!test_and_clear_bit(i, bdata->node_bootmem_map)))
@@ -421,7 +432,9 @@ int __init reserve_bootmem(unsigned long
 
 void __init free_bootmem(unsigned long addr, unsigned long size)
 {
-	free_bootmem_core(NODE_DATA(0)->bdata, addr, size);
+	bootmem_data_t *bdata;
+	list_for_each_entry(bdata, &bdata_list, list)
+		free_bootmem_core(bdata, addr, size);
 }
 
 unsigned long __init free_all_bootmem(void)
_
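To make the index arithmetic of the patched free_bootmem_core() and free_bootmem() easier to follow, here is a hypothetical userspace model (not kernel code). The toy_ names, the one-bool-per-page map, the 64-page cap, and 4 KiB pages are assumptions; the real code uses bootmem_data_t, a bitmap, test_and_clear_bit() and bdata_list. The clamping of sidx and eidx follows the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel macros (assumes 4 KiB pages). */
#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define PFN_UP(x)     (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)   ((x) >> PAGE_SHIFT)
#define TOY_MAX_PAGES 64

/* Hypothetical, simplified stand-in for bootmem_data_t. */
struct toy_bdata {
	unsigned long node_boot_start; /* physical address where the node starts */
	unsigned long node_low_pfn;    /* first pfn past the end of the node */
	bool reserved[TOY_MAX_PAGES];  /* one flag per page, indexed from node start */
};

/* Mirrors the patched free_bootmem_core(): clamp [sidx, eidx) to this node. */
static unsigned long toy_free_bootmem_core(struct toy_bdata *b,
					   unsigned long addr, unsigned long size)
{
	unsigned long start_pfn = PFN_DOWN(b->node_boot_start);
	unsigned long node_pages = b->node_low_pfn - start_pfn;
	unsigned long sidx, eidx, i, freed = 0;

	assert(size != 0);
	assert(node_pages <= TOY_MAX_PAGES);

	/* Range lies entirely before or after this node: nothing to do. */
	if (addr + size < b->node_boot_start || PFN_DOWN(addr) > b->node_low_pfn)
		return 0;

	/* Round the start up (partial pages stay reserved); clamp to index 0. */
	sidx = PFN_UP(addr) > start_pfn ? PFN_UP(addr) - start_pfn : 0;

	/* Round the end down; clamp to the node's last page. */
	eidx = PFN_DOWN(addr + size - b->node_boot_start);
	if (eidx > node_pages)
		eidx = node_pages;

	for (i = sidx; i < eidx; i++) {
		/* The kernel treats a not-reserved page here as a bug; the toy skips it. */
		if (b->reserved[i]) {
			b->reserved[i] = false;
			freed++;
		}
	}
	return freed;
}

/* Like the patched free_bootmem(): offer the range to every node. */
static unsigned long toy_free_bootmem(struct toy_bdata *nodes, size_t nr_nodes,
				      unsigned long addr, unsigned long size)
{
	unsigned long freed = 0;

	for (size_t n = 0; n < nr_nodes; n++)
		freed += toy_free_bootmem_core(&nodes[n], addr, size);
	return freed;
}

/* Test helper: a node covering pfns [start_pfn, end_pfn), all pages reserved. */
static void toy_init_node(struct toy_bdata *b, unsigned long start_pfn,
			  unsigned long end_pfn)
{
	assert(end_pfn - start_pfn <= TOY_MAX_PAGES);
	b->node_boot_start = start_pfn << PAGE_SHIFT;
	b->node_low_pfn = end_pfn;
	for (unsigned long i = 0; i < TOY_MAX_PAGES; i++)
		b->reserved[i] = i < end_pfn - start_pfn;
}
```

For example, with node 0 covering pfns 0-15 and node 1 covering pfns 16-31 (all reserved), freeing 8 pages starting at pfn 12 releases pages 12-15 on node 0 and indices 0-3 on node 1. The pre-patch free_bootmem() only looked at node 0 and would have hit BUG_ON(PFN_DOWN(addr + size) > bdata->node_low_pfn) for the same call. If the start address is not page aligned, the partially covered first page stays reserved, matching the "partially free pages are considered reserved" rule.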

Thread overview: 14+ messages
2008-03-12  1:01 [PATCH] mm: fix boundary checking in free_bootmem_core Yinghai Lu
2008-03-12 23:21 ` Yinghai Lu
2008-03-12 23:33   ` Andrew Morton
2008-03-13  1:11     ` Yinghai Lu
2008-03-13  1:22       ` Andrew Morton
2008-03-13 21:59         ` Andi Kleen
2008-03-13 22:22           ` Yinghai Lu
2008-03-14 11:58             ` Andi Kleen
2008-03-14 16:44               ` Yinghai Lu
2008-03-14 16:53                 ` Andi Kleen
2008-03-14 17:36                   ` Yinghai Lu
2008-03-21 19:44                     ` Andrew Morton [this message]
2008-03-21 20:00                       ` Ingo Molnar
2008-03-21 21:54                       ` Thomas Gleixner
