linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Hugh Dickins <hughd@google.com>
Cc: Albert Huang <huangjie.albert@bytedance.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andi Kleen <ak@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
Date: Fri, 14 Oct 2022 09:56:23 -0700	[thread overview]
Message-ID: <Y0mUt84TctGP3BtT@monkey> (raw)
In-Reply-To: <5f7ef6ee-6241-9912-f434-962be53272c@google.com>

On 10/12/22 12:45, Hugh Dickins wrote:
> On Wed, 12 Oct 2022, Albert Huang wrote:
> 
> > From: "huangjie.albert" <huangjie.albert@bytedance.com>
> > 
> > implement these two functions so that we can set the mempolicy to
> > the inode of the hugetlb file. This ensures that the mempolicy of
> > all processes sharing this huge page file is consistent.
> > 
> > In some scenarios where huge pages are shared:
> > if we need to limit the memory usage of vm within node0, so I set qemu's
> > mempilciy bind to node0, but if there is a process (such as virtiofsd)
> > shared memory with the vm, in this case. If the page fault is triggered
> > by virtiofsd, the allocated memory may go to node1 which  depends on
> > virtiofsd.
> > 
> > Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>

Thanks for the patch Albert, and thank you Hugh for the comments!

> Aha!  Congratulations for noticing, after all this time.  hugetlbfs
> contains various little pieces of code that pretend to be supporting
> shared NUMA mempolicy, but in fact there was nothing connecting it up.

I actually had to look this up to verify it was not supported.  However, the
documentation is fairly clear.
From admin-guide/mm/numa_memory_policy.rst.

"As of 2.6.22, only shared memory segments, created by shmget() or
 mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  When shared
 policy support was added to Linux, the associated data structures were
 added to hugetlbfs shmem segments.  At the time, hugetlbfs did not
 support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
 shmem segments were never "hooked up" to the shared policy support.
 Although hugetlbfs segments now support lazy allocation, their support
 for shared policy has not been completed."

It is somewhat embarrassing that this has been known for so long and
nothing has changed.

> It will be for Mike to decide, but personally I oppose adding
> shared NUMA mempolicy support to hugetlbfs, after eighteen years.
> 
> The thing is, it will change the behaviour of NUMA on hugetlbfs:
> in ways that would have been sensible way back then, yes; but surely
> those who have invested in NUMA and hugetlbfs have developed other
> ways of administering it successfully, without shared NUMA mempolicy.
> 
> At the least, I would expect some tests to break (I could easily be
> wrong), and there's a chance that some app or tool would break too.
> 
> I have carried the reverse of Albert's patch for a long time, stripping
> out the pretence of shared NUMA mempolicy support from hugetlbfs: I
> wanted that, so that I could work on modifying the tmpfs implementation,
> without having to worry about other users.
> 
> Mike, if you would prefer to see my patch stripping out the pretence,
> let us know: it has never been a priority to send in, but I can update
> it to 6.1-rc1 if you'd like to see it.  (Once upon a time, it removed
> all need for struct hugetlbfs_inode_info, but nowadays that's still
> required for the memfd seals.)
> 
> Whether Albert's patch is complete and correct, I haven't begun to think
> about: I am not saying it isn't, but shared NUMA mempolicy adds another
> dimension of complexity, and need for support, that I think hugetlbfs
> would be better off continuing to survive without.

To be honest, I have not looked into the complexities of shared NUMA
mempolicy and exactly what is required for it's support.  With my limited
knowledge, it appears that this patch adds some type of support for shared
policy, but it may not provide all support mentioned in the documentation.

At the very least, this patch should also update documentation to state
what type of support is provided.

Albert, can you look into what would be required for full support?  I can take
a look as well but have some other higher priority tasks to work first.

TBH, I like Hugh's idea of removing the 'pretence of shared policy support'.
We are currently wasting memory carrying around extra unused fields in
hugetlbfs_inode_info. :(
-- 
Mike Kravetz


  reply	other threads:[~2022-10-14 16:56 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-12  8:15 [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops Albert Huang
2022-10-12 19:45 ` Hugh Dickins
2022-10-14 16:56   ` Mike Kravetz [this message]
2022-10-17  3:35     ` [External] " 黄杰
2022-10-19  9:29     ` [PATCH v2] mm: hugetlb: support for shared memory policy Albert Huang
2022-10-19 11:49       ` Aneesh Kumar K V
2022-10-19  9:33   ` [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops 黄杰
2022-10-23 20:16     ` Hugh Dickins
2022-10-17  8:44 ` David Hildenbrand
2022-10-17  9:48   ` [External] " 黄杰
2022-10-17 11:33     ` David Hildenbrand
2022-10-17 11:46       ` 黄杰
2022-10-17 12:00         ` David Hildenbrand
2022-10-18  9:27           ` 黄杰
2022-10-18  9:35             ` David Hildenbrand
2022-10-17 17:59       ` [External] " Mike Kravetz
2022-10-18  9:24         ` 黄杰

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y0mUt84TctGP3BtT@monkey \
    --to=mike.kravetz@oracle.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=huangjie.albert@bytedance.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).