linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Tang Chen <tangchen@cn.fujitsu.com>
Cc: tglx@linutronix.de, mingo@elte.hu, hpa@zytor.com,
	akpm@linux-foundation.org, trenn@suse.de, yinghai@kernel.org,
	jiang.liu@huawei.com, wency@cn.fujitsu.com, laijs@cn.fujitsu.com,
	isimatu.yasuaki@jp.fujitsu.com, mgorman@suse.de,
	minchan@kernel.org, mina86@mina86.com, gong.chen@linux.intel.com,
	vasilis.liaskovitis@profitbricks.com, lwoodman@redhat.com,
	riel@redhat.com, jweiner@redhat.com, prarit@redhat.com,
	x86@kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [Part1 PATCH v5 00/22] x86, ACPI, numa: Parse numa info earlier
Date: Tue, 18 Jun 2013 10:21:29 -0700	[thread overview]
Message-ID: <20130618172129.GH2767@htj.dyndns.org> (raw)
In-Reply-To: <51BFF464.809@cn.fujitsu.com>

Hey, Tang.

On Tue, Jun 18, 2013 at 01:47:16PM +0800, Tang Chen wrote:
> [approach]
> Parse SRAT earlier before memblock starts to work, because there is a
> bit in SRAT specifying which memory is hotpluggable.
> 
> I'm not saying this is the best approach. I can also see that this
> patch-set touches a lot of boot code. But i think parsing SRAT earlier
> is reasonable because this is the only way for now to know which memory
> is hotpluggable from firmware.

Touching a lot of code is not a problem but it feels like it's trying
to boot strap itself while walking and achieves that by carefully
sequencing all operations which may allocate from memblock before NUMA
info is available without any way to enforce or verify that.

> >Can't you just move memblock arrays after NUMA init is complete?
> >That'd be a lot simpler and way more robust than the proposed changes,
> >no?
> 
> Sorry, I don't quite understand the approach you are suggesting. If we
> move memblock arrays, we need to update all the pointers pointing to
> the moved memory. How can we do this ?

So, there are two things involved here - memblock itself and consumers
of memblock, right?  I get that the latter shouldn't allocate memory
from memblock before NUMA info is entered into memblock, so please
reorder as necessary *and* make sure memblock complains if something
violates that.  Temporary memory areas which are return are fine.
Just complain if there are memory regions remaining which are
allocated before NUMA info is available after boot is complete.  No
need to make booting more painful than it currently is.

As for memblock itself, there's no need to walk carefully around it.
Just let it do its thing and implement
memblock_relocate_to_numa_node_0() or whatever after NUMA information
is available.  memblock already does relocate itself whenever it's
expanding the arrays anyway, so implementation should be trivial.

Maybe I'm missing something but having a working memory allocator as
soon as possible is *way* less painful than trying to bootstrap around
it.  Allow boot path to allocate memory areas from memblock as soon as
possible but just ensure that none of the ones which may violate the
hotplug requirements is remaining once boot is complete.  Temporaray
regions won't matter then and the few which need persistent areas can
either be reordered to happen after NUMA init or they can allocate a
new area and move to there after NUMA info is available.  Let's please
minimize this walking-and-trying-to-tie-shoestrings-at-the-same-time
thing.  It's painful and extremely fragile.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-06-18 17:21 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-13 13:02 [Part1 PATCH v5 00/22] x86, ACPI, numa: Parse numa info earlier Tang Chen
2013-06-13 13:02 ` [Part1 PATCH v5 01/22] x86: Change get_ramdisk_{image|size}() to global Tang Chen
2013-06-13 13:02 ` [Part1 PATCH v5 02/22] x86, microcode: Use common get_ramdisk_{image|size}() Tang Chen
2013-06-13 13:02 ` [Part1 PATCH v5 03/22] x86, ACPI, mm: Kill max_low_pfn_mapped Tang Chen
2013-06-17 21:04   ` Tejun Heo
2013-06-17 21:13     ` Yinghai Lu
2013-06-17 23:08       ` Tejun Heo
2013-06-13 13:02 ` [Part1 PATCH v5 04/22] x86, ACPI: Search buffer above 4GB in a second try for acpi initrd table override Tang Chen
2013-06-17 21:06   ` Tejun Heo
2013-06-13 13:02 ` [Part1 PATCH v5 05/22] x86, ACPI: Increase acpi initrd override tables number limit Tang Chen
2013-06-13 13:02 ` [Part1 PATCH v5 06/22] x86, ACPI: Split acpi_initrd_override() into find/copy two steps Tang Chen
2013-06-13 13:02 ` [Part1 PATCH v5 07/22] x86, ACPI: Store override acpi tables phys addr in cpio files info array Tang Chen
2013-06-17 23:38   ` Tejun Heo
2013-06-17 23:40     ` Yinghai Lu
2013-06-17 23:52   ` Tejun Heo
2013-06-13 13:02 ` [Part1 PATCH v5 08/22] x86, ACPI: Make acpi_initrd_override_find work with 32bit flat mode Tang Chen
2013-06-18  0:07   ` Tejun Heo
2013-06-13 13:02 ` [Part1 PATCH v5 09/22] x86, ACPI: Find acpi tables in initrd early from head_32.S/head64.c Tang Chen
2013-06-18  0:33   ` Tejun Heo
2013-06-13 13:02 ` [Part1 PATCH v5 10/22] x86, mm, numa: Move two functions calling on successful path later Tang Chen
2013-06-18  0:53   ` Tejun Heo
2013-06-13 13:02 ` [Part1 PATCH v5 11/22] x86, mm, numa: Call numa_meminfo_cover_memory() checking early Tang Chen
2013-06-18  1:05   ` Tejun Heo
2013-06-13 13:02 ` [Part1 PATCH v5 12/22] x86, mm, numa: Move node_map_pfn_alignment() to x86 Tang Chen
2013-06-18  1:08   ` Tejun Heo
2013-06-13 13:03 ` [Part1 PATCH v5 13/22] x86, mm, numa: Use numa_meminfo to check node_map_pfn alignment Tang Chen
2013-06-18  1:40   ` Tejun Heo
2013-06-13 13:03 ` [Part1 PATCH v5 14/22] x86, mm, numa: Set memblock nid later Tang Chen
2013-06-18  1:45   ` Tejun Heo
2013-06-13 13:03 ` [Part1 PATCH v5 15/22] x86, mm, numa: Move node_possible_map setting later Tang Chen
2013-06-13 13:03 ` [Part1 PATCH v5 16/22] x86, mm, numa: Move numa emulation handling down Tang Chen
2013-06-18  1:58   ` Tejun Heo
2013-06-18  6:22     ` Yinghai Lu
2013-06-18  7:13       ` Yinghai Lu
2013-06-19 21:25       ` Yinghai Lu
2013-06-13 13:03 ` [Part1 PATCH v5 17/22] x86, ACPI, numa, ia64: split SLIT handling out Tang Chen
2013-06-13 13:03 ` [Part1 PATCH v5 18/22] x86, mm, numa: Add early_initmem_init() stub Tang Chen
2013-06-13 13:03 ` [Part1 PATCH v5 19/22] x86, mm: Parse numa info earlier Tang Chen
2013-06-13 13:03 ` [Part1 PATCH v5 20/22] x86, mm: Add comments for step_size shift Tang Chen
2013-06-13 13:03 ` [Part1 PATCH v5 21/22] x86, mm: Make init_mem_mapping be able to be called several times Tang Chen
2013-06-13 18:35   ` Konrad Rzeszutek Wilk
2013-06-13 22:47     ` Yinghai Lu
2013-06-14  5:08       ` Tang Chen
2013-06-13 13:03 ` [Part1 PATCH v5 22/22] x86, mm, numa: Put pagetable on local node ram for 64bit Tang Chen
2013-06-18  2:03 ` [Part1 PATCH v5 00/22] x86, ACPI, numa: Parse numa info earlier Tejun Heo
2013-06-18  5:47   ` Tang Chen
2013-06-18 17:21     ` Tejun Heo [this message]
2013-06-20  5:52       ` Tang Chen
2013-06-20  6:17         ` Tejun Heo
2013-06-21  9:19           ` Tang Chen
2013-06-21 18:25             ` Tejun Heo
2013-06-24  3:51               ` Tang Chen
2013-06-24  7:26                 ` Tang Chen
2013-06-24 19:59                   ` Tejun Heo
2013-06-18 17:10 ` Vasilis Liaskovitis
2013-06-18 20:19   ` Yinghai Lu
2013-06-19 10:05     ` Vasilis Liaskovitis
2013-06-20 18:42       ` Yinghai Lu
2013-06-24  9:40   ` Gu Zheng
2013-06-21  5:19 ` H. Peter Anvin
2013-06-21  6:06   ` Tang Chen
2013-06-21  6:10     ` H. Peter Anvin
2013-06-21  6:20       ` Tang Chen
2013-06-21  6:26         ` Tejun Heo
2013-06-21 20:18   ` Yinghai Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130618172129.GH2767@htj.dyndns.org \
    --to=tj@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=gong.chen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=isimatu.yasuaki@jp.fujitsu.com \
    --cc=jiang.liu@huawei.com \
    --cc=jweiner@redhat.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lwoodman@redhat.com \
    --cc=mgorman@suse.de \
    --cc=mina86@mina86.com \
    --cc=minchan@kernel.org \
    --cc=mingo@elte.hu \
    --cc=prarit@redhat.com \
    --cc=riel@redhat.com \
    --cc=tangchen@cn.fujitsu.com \
    --cc=tglx@linutronix.de \
    --cc=trenn@suse.de \
    --cc=vasilis.liaskovitis@profitbricks.com \
    --cc=wency@cn.fujitsu.com \
    --cc=x86@kernel.org \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).