From: Dave Hansen
Subject: Re: [RFC v2 0/5] surface heterogeneous memory performance information
Date: Thu, 6 Jul 2017 16:30:08 -0700
In-Reply-To: <20170706230803.GE2919@redhat.com>
References: <20170706215233.11329-1-ross.zwisler@linux.intel.com>
 <20170706230803.GE2919@redhat.com>
To: Jerome Glisse, Ross Zwisler
Cc: "Box, David E", linux-mm, "Zheng, Lv", linux-nvdimm,
 "Rafael J. Wysocki", "Anaczkowski, Lukasz", "Moore, Robert",
 linux-acpi@vger.kernel.org, "Odzioba, Lukasz", "Schmauss, Erik",
 Len Brown, devel, "Kogut, Jaroslaw", Greg Kroah-Hartman,
 "Nachimuthu, Murugasamy", "Rafael J. Wysocki", linux-kernel,
 "Lahtinen, Joonas", Andrew Morton, Tim Chen
List-Id: linux-acpi@vger.kernel.org

On 07/06/2017 04:08 PM, Jerome Glisse wrote:
>> So, for applications that need to differentiate between memory ranges based
>> on their performance, what option would work best for you?  Is the local
>> (initiator,target) performance provided by patch 5 enough, or do you
>> require performance information for all possible (initiator,target)
>> pairings?
>
> Am i right in assuming that HBM or any faster memory will be relatively small
> (1GB - 8GB maybe 16GB ?) and of fix amount (ie size will depend on the exact
> CPU model you have) ?

For HBM, that's certainly consistent with the Xeon Phi MCDRAM.  But,
please remember that this patch set is for fast memory *and* slow
memory (vs. plain DRAM).

> If so i am wondering if we should not restrict NUMA placement policy for such
> node to vma only. Forbid any policy that would prefer those node globally at
> thread/process level. This would avoid wide thread policy to exhaust this
> smaller pool of memory.

You would like to take the NUMA APIs and bifurcate them?  Make some of
them able to work on this memory, and others not?  So, set_mempolicy()
would work if you passed it one of these "special" nodes with
MPOL_F_ADDR, but would fail otherwise?

> Drawback of doing so would be that existing applications would not benefit
> from it. So workload where is acceptable to exhaust such memory wouldn't
> benefit until their application are updated.

I think the guys running 40-year-old Fortran binaries might not be so
keen on this restriction.  I bet there are a pretty substantial number
of folks out there that would love to get new hardware and just do:

	numactl --membind=fast-node ./old-binary

If I were working for a hardware company, I'd sure like to just be able
to sell somebody some fancy new hardware and have their existing
software "just work" with a minimal wrapper.
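
To make the VMA-scoped vs. task-scoped distinction concrete, here's a
rough userspace sketch (untested; the node number is made up, and error
handling is omitted) using the libnuma syscall wrappers from <numaif.h>.
mbind() is the per-range policy Jerome is talking about restricting
things to; set_mempolicy() is the task-wide policy that numactl
--membind effectively sets up for an unmodified old binary:

/*
 * Sketch only -- build with: gcc sketch.c -lnuma
 */
#include <numaif.h>		/* mbind(), set_mempolicy(), MPOL_* */
#include <stdlib.h>

#define FAST_NODE 1		/* pretend node 1 is the small, fast memory */

int main(void)
{
	unsigned long nodemask = 1UL << FAST_NODE;
	size_t len = 64UL << 20;			/* 64MB region */
	void *buf = aligned_alloc(4096, len);		/* page-aligned for mbind() */

	/* VMA-scoped: bind only this address range to the fast node */
	mbind(buf, len, MPOL_BIND, &nodemask,
	      sizeof(nodemask) * 8, MPOL_MF_MOVE);

	/*
	 * Task-scoped: every future allocation in this task is restricted
	 * to the fast node.  This is the "wide" policy that could exhaust
	 * the small pool, and it is also what lets a 40-year-old binary
	 * use the fast memory with no source changes.
	 */
	set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask) * 8);

	return 0;
}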