From: Zach O'Keefe <zokeefe@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
David Hildenbrand <david@redhat.com>,
David Rientjes <rientjes@google.com>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Rongwei Wang <rongwei.wang@linux.alibaba.com>,
SeongJae Park <sj@kernel.org>, Song Liu <songliubraving@fb.com>,
Vlastimil Babka <vbabka@suse.cz>, Yang Shi <shy828301@gmail.com>,
Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Arnd Bergmann <arnd@arndb.de>,
Axel Rasmussen <axelrasmussen@google.com>,
Chris Kennelly <ckennelly@google.com>,
Chris Zankel <chris@zankel.net>, Helge Deller <deller@gmx.de>,
Hugh Dickins <hughd@google.com>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
Jens Axboe <axboe@kernel.dk>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matt Turner <mattst88@gmail.com>,
Max Filippov <jcmvbkbc@gmail.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Minchan Kim <minchan@kernel.org>,
Patrick Xia <patrickx@google.com>,
Pavel Begunkov <asml.silence@gmail.com>,
Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Subject: Re: [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled
Date: Thu, 30 Jun 2022 07:17:52 -0700 [thread overview]
Message-ID: <Yr2wkNX1LIZ6masw@google.com> (raw)
In-Reply-To: <Yr0LR5bBmOEOudMw@xz-m1.local>
On Jun 29 22:32, Peter Xu wrote:
> On Wed, Jun 29, 2022 at 06:42:25PM -0700, Zach O'Keefe wrote:
> > On Jun 29 19:21, Peter Xu wrote:
> > > On Fri, Jun 03, 2022 at 05:39:57PM -0700, Zach O'Keefe wrote:
> > > > Add enforce_thp_enabled flag to struct collapse_control that allows context
> > > > to ignore constraints imposed by /sys/kernel/transparent_hugepage/enabled.
> > > >
> > > > This flag is set in khugepaged collapse context to preserve existing
> > > > khugepaged behavior.
> > > >
> > > > This flag will be used (unset) when introducing madvise collapse
> > > > context since the desired THP semantics of MADV_COLLAPSE aren't coupled
> > > > to sysfs THP settings. Most notably, for the purpose of eventual
> > > > madvise_collapse(2) support, this allows userspace to trigger THP collapse
> > > > on behalf of another processes, without adding support to meddle with
> > > > the VMA flags of said process, or change sysfs THP settings.
> > > >
> > > > For now, limit this flag to /sys/kernel/transparent_hugepage/enabled,
> > > > but it can be expanded to include
> > > > /sys/kernel/transparent_hugepage/shmem_enabled later.
> > > >
> > > > Link: https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/
> > > >
> > > > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > > > ---
> > > > mm/khugepaged.c | 34 +++++++++++++++++++++++++++-------
> > > > 1 file changed, 27 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index c3589b3e238d..4ad04f552347 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -94,6 +94,11 @@ struct collapse_control {
> > > > */
> > > > bool enforce_page_heuristics;
> > > >
> > > > + /* Enforce constraints of
> > > > + * /sys/kernel/mm/transparent_hugepage/enabled
> > > > + */
> > > > + bool enforce_thp_enabled;
> > >
> > > Small nitpick that we could have merged the two booleans if they always
> > > match, but no strong opinions if you think these two are clearer. Or maybe
> > > there's other plan of using them?
> > >
> > > > +
> > > > /* Num pages scanned per node */
> > > > int node_load[MAX_NUMNODES];
> > > >
> > > > @@ -893,10 +898,12 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
> > > > */
> > > >
> > > > static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > - struct vm_area_struct **vmap)
> > > > + struct vm_area_struct **vmap,
> > > > + struct collapse_control *cc)
> > > > {
> > > > struct vm_area_struct *vma;
> > > > unsigned long hstart, hend;
> > > > + unsigned long vma_flags;
> > > >
> > > > if (unlikely(khugepaged_test_exit(mm)))
> > > > return SCAN_ANY_PROCESS;
> > > > @@ -909,7 +916,18 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > hend = vma->vm_end & HPAGE_PMD_MASK;
> > > > if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > > return SCAN_ADDRESS_RANGE;
> > > > - if (!hugepage_vma_check(vma, vma->vm_flags))
> > > > +
> > > > + /*
> > > > + * If !cc->enforce_thp_enabled, set VM_HUGEPAGE so that
> > > > + * hugepage_vma_check() can pass even if
> > > > + * TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG is set (i.e. "madvise" mode).
> > > > + * Note that hugepage_vma_check() doesn't enforce that
> > > > + * TRANSPARENT_HUGEPAGE_FLAG or TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG
> > > > + * must be set (i.e. "never" mode).
> > > > + */
> > > > + vma_flags = cc->enforce_thp_enabled ? vma->vm_flags
> > > > + : vma->vm_flags | VM_HUGEPAGE;
> > >
> > > Another nitpick..
> > >
> > > We could get a weird vm_flags when VM_NOHUGEPAGE is set. I don't think
> > > it'll go wrong since hugepage_vma_check() checks NOHUGEPAGE first, but IMHO
> > > we shouldn't rely on that as it seems error prone (e.g. when accidentally
> > > moved things around).
> > >
> > > So maybe nicer to only apply VM_HUGEPAGE if !VM_NOHUGEPAGE? Or pass over
> > > "enforce_thp_enabled" into hugepage_vma_check() should work too, iiuc.
> > > Passing in the boolean has one benefit that we don't really need the
> > > complicated comment above since the code should be able to explain itself.
> >
> > Hey Peter, thanks again for taking the time to review.
> >
> > Answering both of the above at the time:
> >
> > As in this series so far, I've tried to keep context functionally-declarative -
> > specifying the intended behavior (e.g. "enforce_page_heuristics") rather than
> > adding "if (khugepaged) { .. } else if (madv_collapse) { .. } else if { .. }"
> > around the code which, IMO, makes it difficult to follow. Unfortunately, I've
> > ran into the 2 problems you've stated here:
> >
> > 1) *Right now* all the behavior knobs are either off/on at the same time
> > 2) For hugepage_vma_check() (now in mm/huge_memory.c and acting as the central
> > authority on THP eligibility), things are complicated enough that I
> > couldn't find a clean way to describe the parameters of the context without
> > explicitly mentioning the caller.
> >
> > For (2), instead of adding another arg to specify MADV_COLLAPSE's behavior,
> > I think we need to package these contexts into a single set of flags:
> >
> > enum thp_ctx_flags {
> > THP_CTX_ANON_FAULT = 1 << 1,
> > THP_CTX_KHUGEPAGED = 1 << 2,
> > THP_CTX_SMAPS = 1 << 3,
> > THP_CTX_MADVISE_COLLAPSE = 1 << 4,
> > };
> >
> > That will avoid hacking vma flags passed to hugepage_vma_check().
> >
> > And, if we have these anyways, I might as well do away with some of the
> > (semantically meaningful but functionally redundant) flags in
> > struct collapse_control and just specify a single .thp_ctx_flags member. I'm
> > not entirely happy with it - but that's what I'm planning.
> >
> > WDYT?
>
> Firstly I think I wrongly sent previous email privately.. :( Let me try to
> add the list back..
>
> IMHO we don't need to worry too much on the "if... else if... else",
> because they shouldn't be more complicated than when you spread the
> meanings into multiple flags, or how could it be? :) IMHO it should
> literally be as simple as applying:
>
> s/enforce_{A|B|C|...}/khugepaged_initiated/g
>
> Throughout the patches, then we squash the patches introducing enforce_X.
Right, the code today will be virtually identical. The attempt to describe
contexts in terms of behaviors is based on an unfounded assumption that
successive contexts could reuse said behaviors - but there are currently no
plans for other collapsing contexts.
> If you worry it's not clear on "what does khugepaged_initiated mean", we
> could add whatever comment above the variable explaining A/B/C/D will be
> covered when this is set, and we could postpone to do the flag split only
> until there're real user.
>
> Adding these flags could add unnecessary bit-and instructions into the code
> generated at last, and if it's only about readability issue that's really
> what comment is for?
Ya maybe I'm overthinking it and will just do the most straightforward thing for
now.
As always, thanks for taking the time to review / offer suggestions!
Best,
Zach
> Thanks,
>
> --
> Peter Xu
>
next prev parent reply other threads:[~2022-06-30 14:18 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-04 0:39 [PATCH v6 00/15] mm: userspace hugepage collapse Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 01/15] mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA Zach O'Keefe
2022-06-06 18:25 ` Yang Shi
2022-06-29 20:49 ` Peter Xu
2022-06-30 1:15 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 02/15] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP Zach O'Keefe
2022-06-06 20:45 ` Yang Shi
2022-06-07 16:01 ` Zach O'Keefe
2022-06-07 19:32 ` Zach O'Keefe
2022-06-07 21:27 ` Yang Shi
2022-06-08 0:27 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 03/15] mm/khugepaged: add struct collapse_control Zach O'Keefe
2022-06-06 2:41 ` kernel test robot
2022-06-06 16:40 ` Zach O'Keefe
2022-06-06 20:20 ` Yang Shi
2022-06-06 21:22 ` Yang Shi
2022-06-06 22:23 ` Andrew Morton
2022-06-06 23:53 ` Yang Shi
2022-06-08 0:42 ` Zach O'Keefe
2022-06-08 1:00 ` Yang Shi
2022-06-08 1:06 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 04/15] mm/khugepaged: dedup and simplify hugepage alloc and charging Zach O'Keefe
2022-06-06 20:50 ` Yang Shi
2022-06-29 21:58 ` Peter Xu
2022-06-30 20:14 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 05/15] mm/khugepaged: make allocation semantics context-specific Zach O'Keefe
2022-06-06 20:58 ` Yang Shi
2022-06-07 19:56 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 06/15] mm/khugepaged: pipe enum scan_result codes back to callers Zach O'Keefe
2022-06-06 22:39 ` Yang Shi
2022-06-07 0:17 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 07/15] mm/khugepaged: add flag to ignore khugepaged heuristics Zach O'Keefe
2022-06-06 22:51 ` Yang Shi
2022-06-04 0:39 ` [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled Zach O'Keefe
2022-06-06 23:02 ` Yang Shi
[not found] ` <YrzehlUoo2iMMLC2@xz-m1.local>
[not found] ` <CAAa6QmRXD5KboM8=ZZRPThOmcLEPtxzf0XyjkCeY_vgR7VOPqg@mail.gmail.com>
2022-06-30 2:32 ` Peter Xu
2022-06-30 14:17 ` Zach O'Keefe [this message]
2022-06-04 0:39 ` [PATCH v6 09/15] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse Zach O'Keefe
2022-06-06 23:53 ` Yang Shi
2022-06-07 22:48 ` Zach O'Keefe
2022-06-08 0:39 ` Yang Shi
2022-06-09 17:35 ` Zach O'Keefe
2022-06-09 18:51 ` Yang Shi
2022-06-10 14:51 ` Zach O'Keefe
2022-06-04 0:39 ` [PATCH v6 10/15] mm/khugepaged: rename prefix of shared collapse functions Zach O'Keefe
2022-06-06 23:56 ` Yang Shi
2022-06-07 0:31 ` Zach O'Keefe
2022-06-04 0:40 ` [PATCH v6 11/15] mm/madvise: add MADV_COLLAPSE to process_madvise() Zach O'Keefe
2022-06-07 19:14 ` Yang Shi
2022-06-04 0:40 ` [PATCH v6 12/15] selftests/vm: modularize collapse selftests Zach O'Keefe
2022-06-04 0:40 ` [PATCH v6 13/15] selftests/vm: add MADV_COLLAPSE collapse context to selftests Zach O'Keefe
2022-06-04 0:40 ` [PATCH v6 14/15] selftests/vm: add selftest to verify recollapse of THPs Zach O'Keefe
2022-06-04 0:40 ` [PATCH v6 15/15] tools headers uapi: add MADV_COLLAPSE madvise mode to tools Zach O'Keefe
2022-06-06 23:58 ` Yang Shi
2022-06-07 0:24 ` Zach O'Keefe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yr2wkNX1LIZ6masw@google.com \
--to=zokeefe@google.com \
--cc=James.Bottomley@hansenpartnership.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=alex.shi@linux.alibaba.com \
--cc=arnd@arndb.de \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=axelrasmussen@google.com \
--cc=chris@zankel.net \
--cc=ckennelly@google.com \
--cc=david@redhat.com \
--cc=deller@gmx.de \
--cc=hughd@google.com \
--cc=ink@jurassic.park.msu.ru \
--cc=jcmvbkbc@gmail.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linmiaohe@huawei.com \
--cc=linux-mm@kvack.org \
--cc=mattst88@gmail.com \
--cc=mhocko@suse.com \
--cc=minchan@kernel.org \
--cc=pasha.tatashin@soleen.com \
--cc=patrickx@google.com \
--cc=peterx@redhat.com \
--cc=rientjes@google.com \
--cc=rongwei.wang@linux.alibaba.com \
--cc=shy828301@gmail.com \
--cc=sj@kernel.org \
--cc=songliubraving@fb.com \
--cc=tsbogend@alpha.franken.de \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).