From: Johannes Weiner <hannes@saeurebad.de>
To: "Yinghai Lu" <yhlu.kernel@gmail.com>
Cc: "Ingo Molnar" <mingo@elte.hu>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"H. Peter Anvin" <hpa@zytor.com>,
jbarnes@virtuousgeek.org,
"Siddha, Suresh B" <suresh.b.siddha@intel.com>
Subject: Re: [patch] mm: node-setup agnostic free_bootmem()
Date: Wed, 30 Apr 2008 19:52:17 +0200
Message-ID: <87mynbz0vi.fsf@saeurebad.de>
In-Reply-To: <86802c440804300922l6f4371aayc99ba8b55646204a@mail.gmail.com> (Yinghai Lu's message of "Wed, 30 Apr 2008 09:22:26 -0700")

Hi,

"Yinghai Lu" <yhlu.kernel@gmail.com> writes:

> On Wed, Apr 30, 2008 at 3:50 AM, Johannes Weiner <hannes@saeurebad.de> wrote:
>>
>> Hi,
>>
>> "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>>
>> > On Mon, Apr 28, 2008 at 12:11 PM, Yinghai Lu <yhlu.kernel@gmail.com> wrote:
>> >>
>> >> On Mon, Apr 28, 2008 at 9:54 AM, Johannes Weiner <hannes@saeurebad.de> wrote:
>> >> > Hi Yinghai,
>> >> >
>> >> >
>> >> >
>> >> > "Yinghai Lu" <yhlu.kernel@gmail.com> writes:
>> >> >
>> >> > > On Sun, Apr 27, 2008 at 5:40 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> >> > >>
>> >> > >> * Johannes Weiner <hannes@saeurebad.de> wrote:
>> >> > >>
>> >> > >> > > so i very much agree that your changes are cleaner, i just wanted to
>> >> > >> > > have one that has all the fixes included.
>> >> > >> >
>> >> > >> > I had planned this to be another patch because there is more than one
>> >> > >> > boundary check I wanted to tighten. I can merge them though if you
>> >> > >> > like.
>> >> > >>
>> >> > >> no, better to have them in separate patches.
>> >> > >>
>> >> > >> > > Would you like to post a patch against current -git or should i
>> >> > >> > > extract the cleaner reserve_bootmem() from your previous patch?
>> >> > >> >
>> >> > >> > I just moved and have only sporadic internet access and free time
>> >> > >> > slots available. Would be nice if you could do it!
>> >> > >>
>> >> > >> sure, find the merged patch below, against latest -git, boot-tested on
>> >> > >> x86. Is this what you had in mind?
>> >> > >>
>> >> > >> Ingo
>> >> > >>
>> >> > >> ---------------->
>> >> > >> Subject: mm: node-setup agnostic free_bootmem()
>> >> > >> From: Johannes Weiner <hannes@saeurebad.de>
>> >> > >> Date: Wed, 16 Apr 2008 13:36:31 +0200
>> >> > >>
>> >> > >> Make free_bootmem() look up the node holding the specified address
>> >> > >> range which lets it work transparently on single-node and multi-node
>> >> > >> configurations.
>> >> > >>
>> >> > >> If the address range exceeds the node range, it will be marked free
>> >> > >> across node boundaries, too.
>> >> > >>
>> >> > >> Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
>> >> > >> CC: Andi Kleen <andi@firstfloor.org>
>> >> > >> CC: Yinghai Lu <yhlu.kernel@gmail.com>
>> >> > >> CC: Yasunori Goto <y-goto@jp.fujitsu.com>
>> >> > >> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> >> > >> CC: Christoph Lameter <clameter@sgi.com>
>> >> > >> CC: Andrew Morton <akpm@linux-foundation.org>
>> >> > >> Signed-off-by: Ingo Molnar <mingo@elte.hu>
>> >> > >> ---
>> >> > >> mm/bootmem.c | 27 +++++++++++++++++++++++++--
>> >> > >> 1 file changed, 25 insertions(+), 2 deletions(-)
>> >> > >>
>> >> > >> Index: linux-x86.q/mm/bootmem.c
>> >> > >> ===================================================================
>> >> > >> --- linux-x86.q.orig/mm/bootmem.c
>> >> > >> +++ linux-x86.q/mm/bootmem.c
>> >> > >> @@ -493,8 +493,31 @@ int __init reserve_bootmem(unsigned long
>> >> > >> void __init free_bootmem(unsigned long addr, unsigned long size)
>> >> > >> {
>> >> > >> bootmem_data_t *bdata;
>> >> > >> - list_for_each_entry(bdata, &bdata_list, list)
>> >> > >> - free_bootmem_core(bdata, addr, size);
>> >> > >> + unsigned long pos = addr;
>> >> > >> + unsigned long partsize = size;
>> >> > >> +
>> >> > >> + list_for_each_entry(bdata, &bdata_list, list) {
>> >> > >> + unsigned long remainder = 0;
>> >> > >> +
>> >> > >> + if (pos < bdata->node_boot_start)
>> >> > >> + continue;
>> >> > >> +
>> >> > >> + if (PFN_DOWN(pos + partsize) > bdata->node_low_pfn) {
>> >> > >> + remainder = PFN_DOWN(pos + partsize) - bdata->node_low_pfn;
>> >> > >> + partsize -= remainder;
>> >> > >> + }
>> >> > >> +
>> >> > >> + free_bootmem_core(bdata, pos, partsize);
>> >> > >> +
>> >> > >> + if (!remainder)
>> >> > >> + return;
>> >> > >> +
>> >> > >> + pos = PFN_PHYS(bdata->node_low_pfn + 1);
>> >> > >> + }
>> >> > >> + printk(KERN_ERR "free_bootmem: request: addr=%lx, size=%lx, "
>> >> > >> + "state: pos=%lx, partsize=%lx\n", addr, size,
>> >> > >> + pos, partsize);
>> >> > >> + BUG();
>> >> > >> }
>> >> > >>
>> >> > >> unsigned long __init free_all_bootmem(void)
>> >> > >>
>> >> > >
>> >> > > it will not work with cross nodes.
>> >> > >
>> >> > > for example: node 0: 0-2g, 4-6g, node1: 2-4g, 6-8g.
>> >> > > and if ramdisk sit cross 2G boundary. you will only free the range
>> >> > > before 2g.
>> >> >
>> >> > Yes, you stated that several times but this is not a technical argument:
>> >> > These setups are afaik not yet supported by the kernel at all. And you
>> >> > could not explain the node layout with the patch that implements support
>> >> > for these configurations.
>> >>
>> >> I looked at Suresh's patch, and it still only has one bdata for one node.
>> >
>> > Suresh's patch already in the Linus tree.
>> > commit 6ec6e0d9f2fd7cb6ca6bc3bfab5ae7b5cdd8c36f
>> > Author: Suresh Siddha <suresh.b.siddha@intel.com>
>> > Date: Tue Mar 25 10:14:35 2008 -0700
>> >
>> > srat, x86: add support for nodes spanning other nodes
>> >
>> > For example, If the physical address layout on a two node system with 8 GB
>> > memory is something like:
>> > node 0: 0-2GB, 4-6GB
>> > node 1: 2-4GB, 6-8GB
>> >
>> > Current kernels fail to boot/detect this NUMA topology.
>> >
>> > ACPI SRAT tables can expose such a topology which needs to be supported.
>> >
>> > Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
>> > Signed-off-by: Ingo Molnar <mingo@elte.hu>
>> > Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>>
>> Okay, so we have one bdata for node 0 and one for node 1. Does that mean
>> that both have overlapping pfn ranges?
>>
>> [1 ||||| ]
>> [2 ||||| ]
>>
>> Like this? How are the ||||| represented in the bootmem maps of each bdata?
>
> Yes.

Okay. So they share the same PFNs. Now imagine the following scenario:

	node0: 0-2GB, 4-6GB
	node1: 2-4GB, 6-8GB

	/* Marks the range free on node0 and node1 */
	free_bootmem(1.5G, 2G);

	/* Frees all bootmem on both nodes */
	free_all_bootmem_node(NODE_DATA(0));
	free_all_bootmem_node(NODE_DATA(1));

Aren't the same page descriptors sent to __free_bootmem_pages() twice?
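
To make the concern concrete, here is a small userspace sketch of the scenario above. It is not kernel code. It assumes that each node's map covers the node's whole span (node0: 0-6GB, node1: 2-8GB, so 2-6GB is covered twice) and that freeing a range clears the matching bits in every map whose span it touches. All names (toy_bdata, toy_free_core, toy_free_all_node, toy_scenario) are made up for illustration, and one toy "page" is 512MB to keep the maps tiny.

```c
#include <string.h>

#define NR_UNITS 16	/* 8GB of address space in 512MB toy "pages" */

/* Hypothetical stand-in for one node's bootmem map. */
struct toy_bdata {
	unsigned long start;			/* first unit in the node span */
	unsigned long end;			/* one past the last unit */
	unsigned char reserved[NR_UNITS];	/* 1 = reserved, 0 = free */
};

/* How often each unit was handed to the (imaginary) page allocator. */
static int times_freed[NR_UNITS];

static void toy_init(struct toy_bdata *b, unsigned long start,
		     unsigned long end)
{
	b->start = start;
	b->end = end;
	memset(b->reserved, 1, sizeof(b->reserved));
}

/* Mark the part of [first, last) that lies inside the node span free. */
static void toy_free_core(struct toy_bdata *b, unsigned long first,
			  unsigned long last)
{
	unsigned long i;

	if (first < b->start)
		first = b->start;
	if (last > b->end)
		last = b->end;
	for (i = first; i < last; i++)
		b->reserved[i] = 0;
}

/* Release every unit this node's map considers free. */
static void toy_free_all_node(const struct toy_bdata *b)
{
	unsigned long i;

	for (i = b->start; i < b->end; i++)
		if (!b->reserved[i])
			times_freed[i]++;
}

/* Replay the scenario; return how many units were released twice. */
static int toy_scenario(void)
{
	struct toy_bdata node0, node1;
	int i, doubles = 0;

	memset(times_freed, 0, sizeof(times_freed));
	toy_init(&node0, 0, 12);	/* spans 0-6GB */
	toy_init(&node1, 4, 16);	/* spans 2-8GB */

	/* free_bootmem(1.5G, 2G): units 3..6, applied to both maps */
	toy_free_core(&node0, 3, 7);
	toy_free_core(&node1, 3, 7);

	toy_free_all_node(&node0);
	toy_free_all_node(&node1);

	for (i = 0; i < NR_UNITS; i++)
		if (times_freed[i] > 1)
			doubles++;
	return doubles;
}
```

Under these assumptions toy_scenario() returns 3: the units covering 2GB-3.5GB are marked free in both maps and so are released once per node. Whether the real maps behave like this is exactly the question.
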
Hannes