From: Mel Gorman <mgorman@suse.de>
To: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>,
Tejun Heo <tj@kernel.org>, Len Brown <lenb@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
"H. Peter Anvin" <hpa@zytor.com>, Toshi Kani <toshi.kani@hp.com>,
Wanpeng Li <liwanp@linux.vnet.ibm.com>,
Thomas Renninger <trenn@suse.de>, Yinghai Lu <yinghai@kernel.org>,
Jiang Liu <jiang.liu@huawei.com>,
Wen Congyang <wency@cn.fujitsu.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
Taku Izumi <izumi.taku@jp.fujitsu.com>,
Minchan Kim <minchan@kernel.org>,
"mina86@mina86.com" <mina86@mina86.com>,
"gong.chen@linux.intel.com" <gong.chen@linux.intel.com>,
Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>,
"lwoodman@redhat.com" <lwoodman@redhat.com>,
Rik van Riel <riel@redhat.com>,
"jweiner@redhat.com" <jweiner@redhat.com>,
Prarit Bhargava <prarit@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>, Chen Tang <imtangchen@gmail.com>,
Zhang Yanfei <zhangyanfei.yes@gmail.com>
Subject: Re: [PATCH RESEND part2 v2 1/8] x86: get pg_data_t's memory from other node
Date: Tue, 11 Feb 2014 11:08:42 +0000
Message-ID: <20140211110842.GI6732@suse.de>
In-Reply-To: <52F86745.2060204@cn.fujitsu.com>

On Mon, Feb 10, 2014 at 01:44:37PM +0800, Tang Chen wrote:
> Hi Mel,
>
> On 02/06/2014 06:12 PM, Mel Gorman wrote:
> >Any comment on this or are the issues just going to be waved away?
>
> Sorry for the delay.
>
> >
> ......
> >>Again, booting is fine, but let's say it's an 8-node machine; that then
> >>implies the Normal:Movable ratio will be 1:8. All page table pages, inodes,
> >>dentries etc. will have to fit in that 1/8th of memory with all the associated
> >>costs including remote access penalties. In extreme cases it may not be
> >>possible to use all of memory because the management structures cannot be
> >>allocated. Users may want the option of adjusting what this ratio is so
> >>they can unplug some memory while not completely sacrificing performance.
> >>
> >>Minimally, the kernel should print a big fat warning if the ratio is equal
> >>to or more than 1:3 Normal:Movable. That ratio selection is arbitrary. I do not
> >>recall ever seeing any major Normal:Highmem bugs on 4G 32-bit machines so it
> >>is a conservative choice. The last Normal:Highmem bug I remember was related
> >>to a 16G 32-bit machine (https://bugzilla.kernel.org/show_bug.cgi?id=42578);
> >>a 1:15 ratio feels very optimistic for a very large machine.
> ......
> >>>
> >>>For now, yes. We expect firmware and hardware to give the basic
> >>>ratio (how much memory is hotpluggable), and the user decides how
> >>>to arrange the memory (decide the size of the normal zone and the
> >>>movable zone).
> >>>
> >>
> >>There seem to be big gaps in the configuration options here. The user
> >>can either ask for it to be automatically assigned and have no control over
> >>the ratio, or manually hot-add the memory, which is a relatively heavy
> >>administrative burden.
>
> Yes.
>
> 1. Automatic assignment is done by the movable_node boot option, which
>    is the main work of this patch-set. It depends on SRAT (firmware).
>

I know, but I'm concerned that this means the firmware can request a
setup with an insane Normal:Movable ratio.
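
A minimal sketch of the kind of ratio check discussed above, assuming the
page counts of the two zones are already known. The function name, the
macro and the 1:3 threshold constant are hypothetical illustrations, not
existing kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: warn once Movable is at least 3x Normal (1:3). */
#define MOVABLE_RATIO_WARN 3

/*
 * Return true when the Normal:Movable ratio is 1:MOVABLE_RATIO_WARN or
 * higher. With no Normal memory at all, any Movable memory triggers the
 * warning.
 */
static bool movable_ratio_too_high(uint64_t normal_pages,
				   uint64_t movable_pages)
{
	if (normal_pages == 0)
		return movable_pages > 0;

	/* Integer division avoids overflowing normal_pages * threshold. */
	return movable_pages / normal_pages >= MOVABLE_RATIO_WARN;
}
```

With these assumptions, a 1:2 split passes silently while 1:3, 1:8 and
1:15 would all trigger the warning.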

> 2. Manual assignment has been possible since 2012, with the following patch-set.
>
> https://lkml.org/lkml/2012/8/6//113
>
> This patch-set allowed users to online memory as normal or movable.
> But it is not that easy to use, so I also think a user-space tool is
> needed, and I am planning to work on this soon.
>

Ok.

> >>
> >>I think they should be warned if the ratio is high and have an option of
> >>specifying a ratio manually even if that means that additional nodes
> >>will not be hot-removable.
>
> I think this is easy to do: provide an option for users to specify a
> Normal:Movable ratio. It is not a physical address, so it is easy to use.
>

Yes. It would even be some help if the parameter forced some NUMA nodes
to be Normal instead of Movable regardless of what SRAT says. There
would still be an administrative burden in discovering which nodes are
now pluggable, but they must have been dealing with this already.
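
A minimal sketch of such an override, assuming a hypothetical parameter
that takes a comma-separated list of node ids (for example "1,3"). The
function names, the list format and the MAX_NODES limit are assumptions
made for illustration only; no such parameter is defined in this thread:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NODES 64	/* hypothetical limit for this sketch */

/*
 * Parse a comma-separated node list such as "1,3" into a bitmask.
 * Returns false on malformed input or on a node id >= MAX_NODES;
 * *mask is only written on success.
 */
static bool parse_normal_nodes(const char *arg, uint64_t *mask)
{
	uint64_t m = 0;

	for (;;) {
		char *end;
		unsigned long node;

		/* Every entry must start with a digit (rejects "", "1,,3"). */
		if (*arg < '0' || *arg > '9')
			return false;

		node = strtoul(arg, &end, 10);
		if (node >= MAX_NODES)
			return false;
		m |= UINT64_C(1) << node;

		if (*end == '\0')
			break;
		if (*end != ',')
			return false;
		arg = end + 1;
	}

	*mask = m;
	return true;
}

/* The SRAT hotplug hint applies unless the node was forced to Normal. */
static bool node_is_movable(unsigned int node, bool srat_hotpluggable,
			    uint64_t forced_normal)
{
	if (node < MAX_NODES && ((forced_normal >> node) & 1))
		return false;
	return srat_hotpluggable;
}
```

Under these assumptions, passing "1,3" keeps nodes 1 and 3 as Normal even
if SRAT marks them hotpluggable, at the cost of those nodes no longer
being hot-removable.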

> >>
> >>This is all still a kludge around the fact that node memory hot-remove
> >>did not try and cope with full migration by breaking some of the 1:1
> >>virt:phys mapping assumptions when hot-remove was enabled.
>
> As I said before, the current implementation can only be a temporary
> solution for memory hotplug, since it would take us a lot of time to
> deal with the 1:1 mapping.
>
> But regarding "breaking some of the 1:1 mapping", could you please give
> me a hint? I want to do it too, but I cannot see where to start.
>

Some hints on how it might be tackled were given back in November 2012
(https://lkml.org/lkml/2012/11/29/190), but I never researched it in
detail.

--
Mel Gorman
SUSE Labs