From: Toshi Kani <toshi.kani@hp.com>
To: Tejun Heo <tj@kernel.org>
Cc: Zhang Yanfei <zhangyanfei.yes@gmail.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J . Wysocki" <rjw@sisk.pl>,
	"lenb@kernel.org" <lenb@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	"mingo@elte.hu" <mingo@elte.hu>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Thomas Renninger <trenn@suse.de>, Yinghai Lu <yinghai@kernel.org>,
	Jiang Liu <jiang.liu@huawei.com>,
	Wen Congyang <wency@cn.fujitsu.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	"isimatu.yasuaki@jp.fujitsu.com" <isimatu.yasuaki@jp.fujitsu.com>,
	"izumi.taku@jp.fujitsu.com" <izumi.taku@jp.fujitsu.com>,
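	Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
	"mina86@mina86.com" <mina86@mina86.com>,
	"gong.chen@linux.intel.com" <gong.chen@linux.intel.com>,
	"vasilis.liaskovitis@profitbricks.com"
	<vasilis.liaskovitis@profitbricks.com>,
	"lwoodman@redhat.com" <lwoodman@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	"jweiner@redhat.com" <jweiner@redhat.com>,
	"prarit@redhat.com" <prarit@redhat.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"imtangchen@gmail.com" <imtangchen@gmail.com>,
	Zhang Yanfei <zhangyanfei@cn.fujitsu.com>,
	Tang Chen <tangchen@cn.fujitsu.com>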
Subject: Re: [PATCH part1 v6 4/6] x86/mem-hotplug: Support initialize page tables in bottom-up
Date: Thu, 10 Oct 2013 08:36:49 -0600
Message-ID: <1381415809.24268.40.camel@misato.fc.hp.com>
In-Reply-To: <20131010010029.GA10900@mtj.dyndns.org>

On Thu, 2013-10-10 at 01:00 +0000, Tejun Heo wrote:
> Hello, Toshi.
> 
> On Wed, Oct 09, 2013 at 05:58:55PM -0600, Toshi Kani wrote:
> > Well, there was a plan before, which considered enhancing it to
> > memory device granularity at step 3.  But we had a major replan at step
> > 1 per your suggestion.
> > 
> > https://lkml.org/lkml/2013/6/19/73
> 
> Where?
> 
>  "3. Improve memory hotplug to support local device pagetable."
> 
> How can the above possibly be considered as a plan for finer
> granularity?  Forget about the "how" part.  The stated goal doesn't
> even mention finer granularity.  

The word "device" above refers to memory-device-level granularity.

> Are firmware writers gonna be
> required to split SRAT entries into multiple sub-nodes to support it?

Yes, and that's part of the ACPI spec.  That's not something the OS
asks firmware to do.  If a memory range has a different attribute,
firmware has to put it in a separate entry.

> Is segregating zones further for this even a good idea?  Adding more
> NUMA nodes has its own overhead and the mm code isn't written
> expecting it to be repurposed for segmenting the same NUMA node for
> hotplug underneath it.

I agree.  But my point is that it is an issue today with the current
kernel implementation.  This issue is not introduced by using SRAT.

> Maybe zoning is a viable approach.  Maybe it is not.  I don't know,
> but you guys don't seem to be too interested in actual long term
> planning while pushing for something invasive which may or may not be
> viable in the longer term, which can often lead to silly situations.
> It isn't even clear whether SRAT is the right interface for this.  If
> it's gonna require firmware writer's cooperation anyway, why not
> provide the information as extended part of e820?  It doesn't seem to
> have much to do with NUMA or zones.  The only information the kernel
> needs to know is whether certain memory areas should only be used for
> page cache.

SRAT and the _EJ0 method are the only interfaces that define ejectability
in the standard spec.  Are you suggesting that we change the e820 spec,
or that we not comply with the spec?  I do not think either approach
would work.

> At this point, at least to me, it doesn't seem reasonably clear how
> this is gonna develop and the whole thing feels like a kludge, which
> can be fine too, but seriously if you guys wanna push for an invasive
> approach, it should really be backed by longer term plan, vision,
> justification and the ability to make the necessary changes in the
> various involved layers.  Maybe I'm being too pessimistic but I feel
> that there is a lot missing in most of those areas, which makes it
> quite risky to commit to invasive changes.
> 
> If the zone based kludgy approach is something meaningfully useful,
> I'd suggest sticking to it at least for now.  Some of it would be
> useful anyway and if it doesn't fan out the added maintenance overhead
> is fairly low.

I think memory hotplug was originally implemented on ia64 with node
granularity.  I share your concerns, but that was done a long time
ago.  It's too late to complain about the past.  This SRAT work is not
introducing that restriction.

Thanks,
-Toshi

