From: 黄杰 <huangjie.albert@bytedance.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>,
Muchun Song <songmuchun@bytedance.com>,
Andi Kleen <ak@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [External] Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
Date: Mon, 17 Oct 2022 11:35:09 +0800 [thread overview]
Message-ID: <CABKxMyOMEKdYWqvePNsn5C-vi9nK+XzmDscfCmFc9hQijfMTAA@mail.gmail.com> (raw)
In-Reply-To: <Y0mUt84TctGP3BtT@monkey>
Mike Kravetz <mike.kravetz@oracle.com> 于2022年10月15日周六 00:56写道:
>
> On 10/12/22 12:45, Hugh Dickins wrote:
> > On Wed, 12 Oct 2022, Albert Huang wrote:
> >
> > > From: "huangjie.albert" <huangjie.albert@bytedance.com>
> > >
> > > implement these two functions so that we can set the mempolicy to
> > > the inode of the hugetlb file. This ensures that the mempolicy of
> > > all processes sharing this huge page file is consistent.
> > >
> > > In some scenarios where huge pages are shared:
> > > if we need to limit the memory usage of vm within node0, so I set qemu's
> > > mempilciy bind to node0, but if there is a process (such as virtiofsd)
> > > shared memory with the vm, in this case. If the page fault is triggered
> > > by virtiofsd, the allocated memory may go to node1 which depends on
> > > virtiofsd.
> > >
> > > Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
>
> Thanks for the patch Albert, and thank you Hugh for the comments!
>
> > Aha! Congratulations for noticing, after all this time. hugetlbfs
> > contains various little pieces of code that pretend to be supporting
> > shared NUMA mempolicy, but in fact there was nothing connecting it up.
>
> I actually had to look this up to verify it was not supported. However, the
> documentation is fairly clear.
> From admin-guide/mm/numa_memory_policy.rst.
>
> "As of 2.6.22, only shared memory segments, created by shmget() or
> mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy. When shared
> policy support was added to Linux, the associated data structures were
> added to hugetlbfs shmem segments. At the time, hugetlbfs did not
> support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
> shmem segments were never "hooked up" to the shared policy support.
> Although hugetlbfs segments now support lazy allocation, their support
> for shared policy has not been completed."
>
> It is somewhat embarrassing that this has been known for so long and
> nothing has changed.
>
> > It will be for Mike to decide, but personally I oppose adding
> > shared NUMA mempolicy support to hugetlbfs, after eighteen years.
> >
> > The thing is, it will change the behaviour of NUMA on hugetlbfs:
> > in ways that would have been sensible way back then, yes; but surely
> > those who have invested in NUMA and hugetlbfs have developed other
> > ways of administering it successfully, without shared NUMA mempolicy.
> >
> > At the least, I would expect some tests to break (I could easily be
> > wrong), and there's a chance that some app or tool would break too.
> >
> > I have carried the reverse of Albert's patch for a long time, stripping
> > out the pretence of shared NUMA mempolicy support from hugetlbfs: I
> > wanted that, so that I could work on modifying the tmpfs implementation,
> > without having to worry about other users.
> >
> > Mike, if you would prefer to see my patch stripping out the pretence,
> > let us know: it has never been a priority to send in, but I can update
> > it to 6.1-rc1 if you'd like to see it. (Once upon a time, it removed
> > all need for struct hugetlbfs_inode_info, but nowadays that's still
> > required for the memfd seals.)
> >
> > Whether Albert's patch is complete and correct, I haven't begun to think
> > about: I am not saying it isn't, but shared NUMA mempolicy adds another
> > dimension of complexity, and need for support, that I think hugetlbfs
> > would be better off continuing to survive without.
>
> To be honest, I have not looked into the complexities of shared NUMA
> mempolicy and exactly what is required for it's support. With my limited
> knowledge, it appears that this patch adds some type of support for shared
> policy, but it may not provide all support mentioned in the documentation.
>
> At the very least, this patch should also update documentation to state
> what type of support is provided.
>
> Albert, can you look into what would be required for full support? I can take
> a look as well but have some other higher priority tasks to work first.
>
Lucky to do this job, let me think about it.
> TBH, I like Hugh's idea of removing the 'pretence of shared policy support'.
> We are currently wasting memory carrying around extra unused fields in
> hugetlbfs_inode_info. :(
> --
> Mike Kravetz
next prev parent reply other threads:[~2022-10-17 3:35 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-12 8:15 [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops Albert Huang
2022-10-12 19:45 ` Hugh Dickins
2022-10-14 16:56 ` Mike Kravetz
2022-10-17 3:35 ` 黄杰 [this message]
2022-10-19 9:29 ` [PATCH v2] mm: hugetlb: support for shared memory policy Albert Huang
2022-10-19 11:49 ` Aneesh Kumar K V
2022-10-19 9:33 ` [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops 黄杰
2022-10-23 20:16 ` Hugh Dickins
2022-10-17 8:44 ` David Hildenbrand
2022-10-17 9:48 ` [External] " 黄杰
2022-10-17 11:33 ` David Hildenbrand
2022-10-17 11:46 ` 黄杰
2022-10-17 12:00 ` David Hildenbrand
2022-10-18 9:27 ` 黄杰
2022-10-18 9:35 ` David Hildenbrand
2022-10-17 17:59 ` [External] " Mike Kravetz
2022-10-18 9:24 ` 黄杰
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CABKxMyOMEKdYWqvePNsn5C-vi9nK+XzmDscfCmFc9hQijfMTAA@mail.gmail.com \
--to=huangjie.albert@bytedance.com \
--cc=ak@linux.intel.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).