From: Zach O'Keefe <zokeefe@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
	David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Rongwei Wang <rongwei.wang@linux.alibaba.com>,
	SeongJae Park <sj@kernel.org>, Song Liu <songliubraving@fb.com>,
	Vlastimil Babka <vbabka@suse.cz>, Yang Shi <shy828301@gmail.com>,
	Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Chris Kennelly <ckennelly@google.com>,
	Chris Zankel <chris@zankel.net>, Helge Deller <deller@gmx.de>,
	Hugh Dickins <hughd@google.com>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Jens Axboe <axboe@kernel.dk>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matt Turner <mattst88@gmail.com>,
	Max Filippov <jcmvbkbc@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Patrick Xia <patrickx@google.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Subject: Re: [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled
Date: Thu, 30 Jun 2022 07:17:52 -0700
Message-ID: <Yr2wkNX1LIZ6masw@google.com>
In-Reply-To: <Yr0LR5bBmOEOudMw@xz-m1.local>

On Jun 29 22:32, Peter Xu wrote:
> On Wed, Jun 29, 2022 at 06:42:25PM -0700, Zach O'Keefe wrote:
> > On Jun 29 19:21, Peter Xu wrote:
> > > On Fri, Jun 03, 2022 at 05:39:57PM -0700, Zach O'Keefe wrote:
> > > > Add enforce_thp_enabled flag to struct collapse_control that allows context
> > > > to ignore constraints imposed by /sys/kernel/mm/transparent_hugepage/enabled.
> > > >
> > > > This flag is set in khugepaged collapse context to preserve existing
> > > > khugepaged behavior.
> > > >
> > > > This flag will be used (unset) when introducing madvise collapse
> > > > context since the desired THP semantics of MADV_COLLAPSE aren't coupled
> > > > to sysfs THP settings.  Most notably, for the purpose of eventual
> > > > madvise_collapse(2) support, this allows userspace to trigger THP collapse
> > > > on behalf of another process, without adding support to meddle with
> > > > the VMA flags of said process, or change sysfs THP settings.
> > > >
> > > > For now, limit this flag to /sys/kernel/mm/transparent_hugepage/enabled,
> > > > but it can be expanded to include
> > > > /sys/kernel/mm/transparent_hugepage/shmem_enabled later.
> > > >
> > > > Link: https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/
> > > >
> > > > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > > > ---
> > > >  mm/khugepaged.c | 34 +++++++++++++++++++++++++++-------
> > > >  1 file changed, 27 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index c3589b3e238d..4ad04f552347 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -94,6 +94,11 @@ struct collapse_control {
> > > >      */
> > > >     bool enforce_page_heuristics;
> > > >
> > > > +   /* Enforce constraints of
> > > > +    * /sys/kernel/mm/transparent_hugepage/enabled
> > > > +    */
> > > > +   bool enforce_thp_enabled;
> > >
> > > Small nitpick: we could have merged the two booleans if they always
> > > match, but no strong opinions if you think these two are clearer.  Or maybe
> > > there's another plan for using them?
> > >
> > > > +
> > > >     /* Num pages scanned per node */
> > > >     int node_load[MAX_NUMNODES];
> > > >
> > > > @@ -893,10 +898,12 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
> > > >   */
> > > >
> > > >  static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > > -           struct vm_area_struct **vmap)
> > > > +                              struct vm_area_struct **vmap,
> > > > +                              struct collapse_control *cc)
> > > >  {
> > > >     struct vm_area_struct *vma;
> > > >     unsigned long hstart, hend;
> > > > +   unsigned long vma_flags;
> > > >
> > > >     if (unlikely(khugepaged_test_exit(mm)))
> > > >             return SCAN_ANY_PROCESS;
> > > > @@ -909,7 +916,18 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > >     hend = vma->vm_end & HPAGE_PMD_MASK;
> > > >     if (address < hstart || address + HPAGE_PMD_SIZE > hend)
> > > >             return SCAN_ADDRESS_RANGE;
> > > > -   if (!hugepage_vma_check(vma, vma->vm_flags))
> > > > +
> > > > +   /*
> > > > +    * If !cc->enforce_thp_enabled, set VM_HUGEPAGE so that
> > > > +    * hugepage_vma_check() can pass even if
> > > > +    * TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG is set (i.e. "madvise" mode).
> > > > +    * Note that hugepage_vma_check() doesn't enforce that
> > > > +    * TRANSPARENT_HUGEPAGE_FLAG or TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG
> > > > +    * must be set (i.e. "never" mode).
> > > > +    */
> > > > +   vma_flags = cc->enforce_thp_enabled ?  vma->vm_flags
> > > > +                   : vma->vm_flags | VM_HUGEPAGE;
> > >
> > > Another nitpick..
> > >
> > > We could get a weird vm_flags when VM_NOHUGEPAGE is set.  I don't think
> > > it'll go wrong since hugepage_vma_check() checks NOHUGEPAGE first, but IMHO
> > > we shouldn't rely on that as it seems error prone (e.g. if things
> > > accidentally get moved around).
> > >
> > > So maybe nicer to only apply VM_HUGEPAGE if !VM_NOHUGEPAGE?  Or passing
> > > "enforce_thp_enabled" into hugepage_vma_check() should work too, iiuc.
> > > Passing in the boolean has one benefit: we don't really need the
> > > complicated comment above, since the code should be able to explain itself.
> > 
> > Hey Peter, thanks again for taking the time to review.
> > 
> > Answering both of the above at the same time:
> > 
> > As in this series so far, I've tried to keep context functionally-declarative -
> > specifying the intended behavior (e.g. "enforce_page_heuristics") rather than
> > adding "if (khugepaged) { .. } else if (madv_collapse) { .. } else { .. }"
> > around the code which, IMO, makes it difficult to follow. Unfortunately, I've
> > run into the 2 problems you've stated here:
> > 
> > 1) *Right now* all the behavior knobs are either off/on at the same time
> > 2) For hugepage_vma_check() (now in mm/huge_memory.c and acting as the central
> >    authority on THP eligibility), things are complicated enough that I
> >    couldn't find a clean way to describe the parameters of the context without
> >    explicitly mentioning the caller.
> > 
> > For (2), instead of adding another arg to specify MADV_COLLAPSE's behavior,
> > I think we need to package these contexts into a single set of flags:
> > 
> > enum thp_ctx_flags {
> >         THP_CTX_ANON_FAULT              = 1 << 1,
> >         THP_CTX_KHUGEPAGED              = 1 << 2,
> >         THP_CTX_SMAPS                   = 1 << 3,
> >         THP_CTX_MADVISE_COLLAPSE        = 1 << 4,
> > };
> > 
> > That will avoid hacking vma flags passed to hugepage_vma_check().
> > 
> > And, if we have these anyways, I might as well do away with some of the
> > (semantically meaningful but functionally redundant) flags in
> > struct collapse_control and just specify a single .thp_ctx_flags member. I'm
> > not entirely happy with it - but that's what I'm planning.
> > 
> > WDYT?
> 
> Firstly, I think I mistakenly sent my previous email privately.. :( Let me
> try to add the list back..
> 
> IMHO we don't need to worry too much about the "if... else if... else",
> because they shouldn't be more complicated than when you spread the
> meanings into multiple flags, or how could it be? :) IMHO it should
> literally be as simple as applying:
> 
>   s/enforce_{A|B|C|...}/khugepaged_initiated/g
> 
> Throughout the patches, then we squash the patches introducing enforce_X.

Right, the code today will be virtually identical. The attempt to describe
contexts in terms of behaviors is based on an unfounded assumption that
successive contexts could reuse said behaviors - but there are currently no
plans for other collapsing contexts.
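
The point about the knobs always moving together can be shown with a small
sketch. This is a userspace model rather than kernel code: the struct names
collapse_control_split and collapse_control_merged and the helper
from_merged() are hypothetical, made up for illustration, and only two of the
enforce_* knobs are shown.

```c
#include <stdbool.h>

/* Behavior-per-knob form, as in the v6 patches (two knobs shown). */
struct collapse_control_split {
	bool enforce_page_heuristics;
	bool enforce_thp_enabled;
};

/* Context form suggested in review: one boolean names the caller. */
struct collapse_control_merged {
	bool khugepaged_initiated;
};

/* Hypothetical helper: expand the merged form into the per-knob form. */
static struct collapse_control_split
from_merged(struct collapse_control_merged cc)
{
	struct collapse_control_split out = {
		.enforce_page_heuristics = cc.khugepaged_initiated,
		.enforce_thp_enabled = cc.khugepaged_initiated,
	};
	return out;
}
```

With only khugepaged and MADV_COLLAPSE as contexts, every knob takes the value
of khugepaged_initiated, so the split form carries no extra information until
some context needs the knobs to differ.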

> If you worry it's not clear "what does khugepaged_initiated mean", we
> could add whatever comment above the variable explaining that A/B/C/D will
> be covered when this is set, and we could postpone the flag split until
> there's a real user.
> 
> Adding these flags could add unnecessary bit-and instructions to the
> generated code, and if it's only a readability issue, that's really what
> comments are for?

Ya maybe I'm overthinking it and will just do the most straightforward thing for
now.
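
Roughly, the straightforward version might look like the sketch below. It is
a userspace model under stated assumptions, not the code that will land in
mm/khugepaged.c: collapse_vma_flags() is a hypothetical helper, the VM_*
values are local stand-ins for the kernel's definitions, and the
VM_NOHUGEPAGE guard follows the earlier review suggestion to apply
VM_HUGEPAGE only if !VM_NOHUGEPAGE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in values for this sketch; the kernel has its own definitions. */
#define VM_HUGEPAGE   0x20000000UL
#define VM_NOHUGEPAGE 0x40000000UL

struct collapse_control {
	/*
	 * Set for khugepaged, clear for other contexts (MADV_COLLAPSE).
	 * When set, the khugepaged page heuristics and the sysfs THP
	 * "enabled" setting are both enforced.
	 */
	bool khugepaged_initiated;
};

/* Hypothetical helper: flags to hand to the THP eligibility check. */
static unsigned long collapse_vma_flags(const struct collapse_control *cc,
					unsigned long vm_flags)
{
	assert(cc != NULL);

	/* khugepaged: honor the VMA flags exactly as they are. */
	if (cc->khugepaged_initiated)
		return vm_flags;

	/* Never override an explicit opt-out from THP. */
	if (vm_flags & VM_NOHUGEPAGE)
		return vm_flags;

	/* Other contexts: behave as if the VMA had opted in to THP. */
	return vm_flags | VM_HUGEPAGE;
}
```

Passing the boolean straight into hugepage_vma_check() instead would avoid
building a synthetic flag word at all; the helper above only illustrates the
guard.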

As always, thanks for taking the time to review / offer suggestions!

Best,
Zach

> Thanks,
> 
> -- 
> Peter Xu
> 

