Re: [External] Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: 黄杰 <huangjie.albert@bytedance.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>,
	Muchun Song <songmuchun@bytedance.com>,
	 Andi Kleen <ak@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org
Subject: Re: [External] Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
Date: Mon, 17 Oct 2022 11:35:09 +0800	[thread overview]
Message-ID: <CABKxMyOMEKdYWqvePNsn5C-vi9nK+XzmDscfCmFc9hQijfMTAA@mail.gmail.com> (raw)
In-Reply-To: <Y0mUt84TctGP3BtT@monkey>

Mike Kravetz <mike.kravetz@oracle.com> 于2022年10月15日周六 00:56写道：
>
> On 10/12/22 12:45, Hugh Dickins wrote:
> > On Wed, 12 Oct 2022, Albert Huang wrote:
> >
> > > From: "huangjie.albert" <huangjie.albert@bytedance.com>
> > >
> > > implement these two functions so that we can set the mempolicy to
> > > the inode of the hugetlb file. This ensures that the mempolicy of
> > > all processes sharing this huge page file is consistent.
> > >
> > > In some scenarios where huge pages are shared:
> > > if we need to limit the memory usage of vm within node0, so I set qemu's
> > > mempilciy bind to node0, but if there is a process (such as virtiofsd)
> > > shared memory with the vm, in this case. If the page fault is triggered
> > > by virtiofsd, the allocated memory may go to node1 which  depends on
> > > virtiofsd.
> > >
> > > Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
>
> Thanks for the patch Albert, and thank you Hugh for the comments!
>
> > Aha!  Congratulations for noticing, after all this time.  hugetlbfs
> > contains various little pieces of code that pretend to be supporting
> > shared NUMA mempolicy, but in fact there was nothing connecting it up.
>
> I actually had to look this up to verify it was not supported.  However, the
> documentation is fairly clear.
> From admin-guide/mm/numa_memory_policy.rst.
>
> "As of 2.6.22, only shared memory segments, created by shmget() or
>  mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  When shared
>  policy support was added to Linux, the associated data structures were
>  added to hugetlbfs shmem segments.  At the time, hugetlbfs did not
>  support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
>  shmem segments were never "hooked up" to the shared policy support.
>  Although hugetlbfs segments now support lazy allocation, their support
>  for shared policy has not been completed."
>
> It is somewhat embarrassing that this has been known for so long and
> nothing has changed.
>
> > It will be for Mike to decide, but personally I oppose adding
> > shared NUMA mempolicy support to hugetlbfs, after eighteen years.
> >
> > The thing is, it will change the behaviour of NUMA on hugetlbfs:
> > in ways that would have been sensible way back then, yes; but surely
> > those who have invested in NUMA and hugetlbfs have developed other
> > ways of administering it successfully, without shared NUMA mempolicy.
> >
> > At the least, I would expect some tests to break (I could easily be
> > wrong), and there's a chance that some app or tool would break too.
> >
> > I have carried the reverse of Albert's patch for a long time, stripping
> > out the pretence of shared NUMA mempolicy support from hugetlbfs: I
> > wanted that, so that I could work on modifying the tmpfs implementation,
> > without having to worry about other users.
> >
> > Mike, if you would prefer to see my patch stripping out the pretence,
> > let us know: it has never been a priority to send in, but I can update
> > it to 6.1-rc1 if you'd like to see it.  (Once upon a time, it removed
> > all need for struct hugetlbfs_inode_info, but nowadays that's still
> > required for the memfd seals.)
> >
> > Whether Albert's patch is complete and correct, I haven't begun to think
> > about: I am not saying it isn't, but shared NUMA mempolicy adds another
> > dimension of complexity, and need for support, that I think hugetlbfs
> > would be better off continuing to survive without.
>
> To be honest, I have not looked into the complexities of shared NUMA
> mempolicy and exactly what is required for it's support.  With my limited
> knowledge, it appears that this patch adds some type of support for shared
> policy, but it may not provide all support mentioned in the documentation.
>
> At the very least, this patch should also update documentation to state
> what type of support is provided.
>
> Albert, can you look into what would be required for full support?  I can take
> a look as well but have some other higher priority tasks to work first.
>

Lucky to do this job, let me think about it.

> TBH, I like Hugh's idea of removing the 'pretence of shared policy support'.
> We are currently wasting memory carrying around extra unused fields in
> hugetlbfs_inode_info. :(
> --
> Mike Kravetz

next prev parent reply	other threads:[~2022-10-17  3:35 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-12  8:15 [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops Albert Huang
2022-10-12 19:45 ` Hugh Dickins
2022-10-14 16:56   ` Mike Kravetz
2022-10-17  3:35     ` 黄杰 [this message]
2022-10-19  9:29     ` [PATCH v2] mm: hugetlb: support for shared memory policy Albert Huang
2022-10-19 11:49       ` Aneesh Kumar K V
2022-10-19  9:33   ` [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops 黄杰
2022-10-23 20:16     ` Hugh Dickins
2022-10-17  8:44 ` David Hildenbrand
2022-10-17  9:48   ` [External] " 黄杰
2022-10-17 11:33     ` David Hildenbrand
2022-10-17 11:46       ` 黄杰
2022-10-17 12:00         ` David Hildenbrand
2022-10-18  9:27           ` 黄杰
2022-10-18  9:35             ` David Hildenbrand
2022-10-17 17:59       ` [External] " Mike Kravetz
2022-10-18  9:24         ` 黄杰

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABKxMyOMEKdYWqvePNsn5C-vi9nK+XzmDscfCmFc9hQijfMTAA@mail.gmail.com \
    --to=huangjie.albert@bytedance.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=songmuchun@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).