From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
To: Dave Hansen <dave.hansen@intel.com>,
	Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: mhocko@suse.com, vbabka@suse.cz, mgorman@suse.de,
	minchan@kernel.org, aneesh.kumar@linux.vnet.ibm.com,
	bsingharora@gmail.com, srikar@linux.vnet.ibm.com,
	haren@linux.vnet.ibm.com, jglisse@redhat.com,
	dan.j.williams@intel.com
Subject: Re: [RFC V2 02/12] mm: Isolate HugeTLB allocations away from CDM nodes
Date: Wed, 1 Feb 2017 19:29:00 +0530
Message-ID: <d1995ee9-246f-5920-8a75-61868c2a209e@linux.vnet.ibm.com>
In-Reply-To: <db9e7345-da08-5011-22ae-b20927b174f4@intel.com>

On 01/31/2017 07:07 AM, Dave Hansen wrote:
> On 01/30/2017 05:03 PM, Anshuman Khandual wrote:
>> On 01/30/2017 10:49 PM, Dave Hansen wrote:
>>> On 01/29/2017 07:35 PM, Anshuman Khandual wrote:
>>>> HugeTLB allocation/release/accounting currently spans across all the nodes
>>>> under N_MEMORY node mask. Coherent memory nodes should not be part of these
>>>> allocations. So use system_ram() call to fetch system RAM only nodes on the
>>>> platform which can then be used for HugeTLB allocation purpose instead of
>>>> N_MEMORY node mask. This isolates coherent device memory nodes from HugeTLB
>>>> allocations.
>>>
>>> Does this end up making it impossible to use hugetlbfs to access device
>>> memory?
>>
>> Right, thats the implementation at the moment. But going forward if we need
>> to have HugeTLB pages on the CDM node, then we can implement through the
>> sysfs interface from individual NUMA node paths instead of changing the
>> generic HugeTLB path. I wrote this up in the cover letter but should also
>> have mentioned in the comment section of this patch as well. Does this
>> approach look okay ?
> 
> The cover letter is not the most approachable document I've ever seen. :)

Hmm,

So should we write all these details in the comment section of each
patch, after the SOB statement, to make them more visible? Or somewhere
in the code as documentation, marked FIXME or XXX or something similar?
These paragraphs are a bit large, hence I was wondering.

> 
>> "Now, we ensure complete HugeTLB allocation isolation from CDM nodes. Going
>> forward if we need to support HugeTLB allocation on CDM nodes on targeted
>> basis, then we would have to enable those allocations through the
>> /sys/devices/system/node/nodeN/hugepages/hugepages-16384kB/nr_hugepages
>> interface while still ensuring isolation from other generic sysctl and
>> /sys/kernel/mm/hugepages/hugepages-16384kB/nr_hugepages interfaces."
> 
> That would be passable if that's the only way you can allocate hugetlbfs
> pages.  But we also have the fault-based allocations that can pull stuff
> right out of the buddy allocator.  This approach would break that path
> entirely.

There are two distinct points which I think will prevent the problem you
just mentioned.

* No regular node has CDM memory in its fallback zonelist. Hence any
  allocation attempt without __GFP_THISNODE will never go into CDM memory
  zones. If the allocation happens with the __GFP_THISNODE flag, it will
  only happen from that exact node. Remember that we have removed CDM
  nodes from the global nodemask iterators. So how can preallocated
  reserve HugeTLB pages come from CDM nodes?

* Page faults (which will probably use __GFP_THISNODE) cannot come from the
  CDM nodes as they don't have any CPUs.
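
To make the two points above concrete, here is a toy userspace model in C.
It is not kernel code: struct toy_node, toy_build_fallback() and
toy_alloc_page() are hypothetical names, and the boolean "thisnode"
argument only stands in for __GFP_THISNODE. The fallback order used for a
CDM node itself is an assumption of the sketch, not something stated here.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Toy stand-in for a NUMA node; the kernel uses pg_data_t and zonelists. */
struct toy_node {
	bool is_cdm;     /* coherent device memory node (memory, no CPUs) */
	int free_pages;  /* pages still available on this node */
};

/*
 * Build the fallback order for node 'nid': the node itself first, then
 * every other node that is NOT a CDM node. CDM nodes therefore never
 * show up in the fallback order of a system RAM node.
 */
int toy_build_fallback(const struct toy_node *nodes, int nr, int nid,
		       int *order)
{
	int n = 0;
	int i;

	assert(nr <= MAX_NODES && nid >= 0 && nid < nr);
	order[n++] = nid;
	for (i = 0; i < nr; i++) {
		if (i != nid && !nodes[i].is_cdm)
			order[n++] = i;
	}
	return n;
}

/*
 * Take one page starting at node 'nid'. With 'thisnode' set, only that
 * exact node is tried (modelling __GFP_THISNODE); otherwise the fallback
 * order is walked. Returns the node that served the page, or -1.
 */
int toy_alloc_page(struct toy_node *nodes, int nr, int nid, bool thisnode)
{
	int order[MAX_NODES];
	int count;
	int i;

	if (thisnode) {
		assert(nr <= MAX_NODES && nid >= 0 && nid < nr);
		order[0] = nid;
		count = 1;
	} else {
		count = toy_build_fallback(nodes, nr, nid, order);
	}

	for (i = 0; i < count; i++) {
		if (nodes[order[i]].free_pages > 0) {
			nodes[order[i]].free_pages--;
			return order[i];
		}
	}
	return -1;
}
```

In this model, with two system RAM nodes and one CDM node, requests that
start on a RAM node without "thisnode" drain the RAM nodes and then fail
instead of spilling into the CDM node. The CDM node's pages are only
handed out when it is named explicitly with "thisnode" set.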

I did a quick scan of all the allocation paths leading up to the allocation
functions alloc_pages_node() and __alloc_pages_node() inside the hugetlb.c
file. I might be missing something here.

> 
> FWIW, I think you really need to separate the true "CDM" stuff that's
> *really* device-specific from the parts of this from which you really
> just want to implement isolation.

IIUC, are you suggesting something like a pure CDM HugeTLB implementation
which is completely separate from the generic one?

