All of lore.kernel.org
 help / color / mirror / Atom feed
From: Zach O'Keefe <zokeefe@google.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
	David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Peter Xu <peterx@redhat.com>,
	Rongwei Wang <rongwei.wang@linux.alibaba.com>,
	SeongJae Park <sj@kernel.org>, Song Liu <songliubraving@fb.com>,
	Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
	Linux MM <linux-mm@kvack.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Chris Kennelly <ckennelly@google.com>,
	Chris Zankel <chris@zankel.net>, Helge Deller <deller@gmx.de>,
	Hugh Dickins <hughd@google.com>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Jens Axboe <axboe@kernel.dk>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matt Turner <mattst88@gmail.com>,
	Max Filippov <jcmvbkbc@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Patrick Xia <patrickx@google.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Subject: Re: [mm-unstable v7 06/18] mm/khugepaged: add flag to predicate khugepaged-only behavior
Date: Tue, 12 Jul 2022 10:06:46 -0700	[thread overview]
Message-ID: <Ys2qJm6FaOQcxkha@google.com> (raw)
In-Reply-To: <CAHbLzkoWuhbpeGmyOogYqZa4YjBhEYiBP0nx1qCmhRdOUyATZA@mail.gmail.com>

On Jul 11 13:43, Yang Shi wrote:
> On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > Add .is_khugepaged flag to struct collapse_control so
> > khugepaged-specific behavior can be elided by MADV_COLLAPSE context.
> >
> > Start by protecting khugepaged-specific heuristics by this flag. In
> > MADV_COLLAPSE, the user presumably has reason to believe the collapse
> > will be beneficial and khugepaged heuristics shouldn't prevent the user
> > from doing so:
> >
> > 1) sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared]
> >
> > 2) requirement that some pages in region being collapsed be young or
> >    referenced
> >
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > ---
> >
> > v6 -> v7: There is no functional change here from v6, just a renaming of
> >           flags to explicitly be predicated on khugepaged.
> 
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> 
> Just a nit, some conditions check is_khugepaged first, some don't. Why
> not make them more consistent to check is_khugepaged first?
>

Again, thank you for taking the time to review. Agreed the inconsistency is
ugly, and have updated to check is_khugepaged consistently first. Thanks for the
suggestion.

Zach

> > ---
> >  mm/khugepaged.c | 62 ++++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 43 insertions(+), 19 deletions(-)
> >
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 147f5828f052..d89056d8cbad 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -73,6 +73,8 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
> >   * default collapse hugepages if there is at least one pte mapped like
> >   * it would have happened if the vma was large enough during page
> >   * fault.
> > + *
> > + * Note that these are only respected if collapse was initiated by khugepaged.
> >   */
> >  static unsigned int khugepaged_max_ptes_none __read_mostly;
> >  static unsigned int khugepaged_max_ptes_swap __read_mostly;
> > @@ -86,6 +88,8 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
> >  #define MAX_PTE_MAPPED_THP 8
> >
> >  struct collapse_control {
> > +       bool is_khugepaged;
> > +
> >         /* Num pages scanned per node */
> >         int node_load[MAX_NUMNODES];
> >
> > @@ -554,6 +558,7 @@ static bool is_refcount_suitable(struct page *page)
> >  static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                                         unsigned long address,
> >                                         pte_t *pte,
> > +                                       struct collapse_control *cc,
> >                                         struct list_head *compound_pagelist)
> >  {
> >         struct page *page = NULL;
> > @@ -567,7 +572,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                 if (pte_none(pteval) || (pte_present(pteval) &&
> >                                 is_zero_pfn(pte_pfn(pteval)))) {
> >                         if (!userfaultfd_armed(vma) &&
> > -                           ++none_or_zero <= khugepaged_max_ptes_none) {
> > +                           (++none_or_zero <= khugepaged_max_ptes_none ||
> > +                            !cc->is_khugepaged)) {
> >                                 continue;
> >                         } else {
> >                                 result = SCAN_EXCEED_NONE_PTE;
> > @@ -587,8 +593,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >
> >                 VM_BUG_ON_PAGE(!PageAnon(page), page);
> >
> > -               if (page_mapcount(page) > 1 &&
> > -                               ++shared > khugepaged_max_ptes_shared) {
> > +               if (cc->is_khugepaged && page_mapcount(page) > 1 &&
> > +                   ++shared > khugepaged_max_ptes_shared) {
> >                         result = SCAN_EXCEED_SHARED_PTE;
> >                         count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >                         goto out;
> > @@ -654,10 +660,14 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                 if (PageCompound(page))
> >                         list_add_tail(&page->lru, compound_pagelist);
> >  next:
> > -               /* There should be enough young pte to collapse the page */
> > -               if (pte_young(pteval) ||
> > -                   page_is_young(page) || PageReferenced(page) ||
> > -                   mmu_notifier_test_young(vma->vm_mm, address))
> > +               /*
> > +                * If collapse was initiated by khugepaged, check that there is
> > +                * enough young pte to justify collapsing the page
> > +                */
> > +               if (cc->is_khugepaged &&
> > +                   (pte_young(pteval) || page_is_young(page) ||
> > +                    PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
> > +                                                                    address)))
> >                         referenced++;
> >
> >                 if (pte_write(pteval))
> > @@ -666,7 +676,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >
> >         if (unlikely(!writable)) {
> >                 result = SCAN_PAGE_RO;
> > -       } else if (unlikely(!referenced)) {
> > +       } else if (unlikely(cc->is_khugepaged && !referenced)) {
> >                 result = SCAN_LACK_REFERENCED_PAGE;
> >         } else {
> >                 result = SCAN_SUCCEED;
> > @@ -745,6 +755,7 @@ static void khugepaged_alloc_sleep(void)
> >
> >
> >  struct collapse_control khugepaged_collapse_control = {
> > +       .is_khugepaged = true,
> >         .last_target_node = NUMA_NO_NODE,
> >  };
> >
> > @@ -1023,7 +1034,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >         mmu_notifier_invalidate_range_end(&range);
> >
> >         spin_lock(pte_ptl);
> > -       result =  __collapse_huge_page_isolate(vma, address, pte,
> > +       result =  __collapse_huge_page_isolate(vma, address, pte, cc,
> >                                                &compound_pagelist);
> >         spin_unlock(pte_ptl);
> >
> > @@ -1114,7 +1125,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> >              _pte++, _address += PAGE_SIZE) {
> >                 pte_t pteval = *_pte;
> >                 if (is_swap_pte(pteval)) {
> > -                       if (++unmapped <= khugepaged_max_ptes_swap) {
> > +                       if (++unmapped <= khugepaged_max_ptes_swap ||
> > +                           !cc->is_khugepaged) {
> >                                 /*
> >                                  * Always be strict with uffd-wp
> >                                  * enabled swap entries.  Please see
> > @@ -1133,7 +1145,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> >                 }
> >                 if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
> >                         if (!userfaultfd_armed(vma) &&
> > -                           ++none_or_zero <= khugepaged_max_ptes_none) {
> > +                           (++none_or_zero <= khugepaged_max_ptes_none ||
> > +                            !cc->is_khugepaged)) {
> >                                 continue;
> >                         } else {
> >                                 result = SCAN_EXCEED_NONE_PTE;
> > @@ -1163,8 +1176,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> >                         goto out_unmap;
> >                 }
> >
> > -               if (page_mapcount(page) > 1 &&
> > -                               ++shared > khugepaged_max_ptes_shared) {
> > +               if (cc->is_khugepaged &&
> > +                   page_mapcount(page) > 1 &&
> > +                   ++shared > khugepaged_max_ptes_shared) {
> >                         result = SCAN_EXCEED_SHARED_PTE;
> >                         count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
> >                         goto out_unmap;
> > @@ -1218,14 +1232,22 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma,
> >                         result = SCAN_PAGE_COUNT;
> >                         goto out_unmap;
> >                 }
> > -               if (pte_young(pteval) ||
> > -                   page_is_young(page) || PageReferenced(page) ||
> > -                   mmu_notifier_test_young(vma->vm_mm, address))
> > +
> > +               /*
> > +                * If collapse was initiated by khugepaged, check that there is
> > +                * enough young pte to justify collapsing the page
> > +                */
> > +               if (cc->is_khugepaged &&
> > +                   (pte_young(pteval) || page_is_young(page) ||
> > +                    PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm,
> > +                                                                    address)))
> >                         referenced++;
> >         }
> >         if (!writable) {
> >                 result = SCAN_PAGE_RO;
> > -       } else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) {
> > +       } else if (cc->is_khugepaged &&
> > +                  (!referenced ||
> > +                   (unmapped && referenced < HPAGE_PMD_NR / 2))) {
> >                 result = SCAN_LACK_REFERENCED_PAGE;
> >         } else {
> >                 result = SCAN_SUCCEED;
> > @@ -1894,7 +1916,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> >                         continue;
> >
> >                 if (xa_is_value(page)) {
> > -                       if (++swap > khugepaged_max_ptes_swap) {
> > +                       if (cc->is_khugepaged &&
> > +                           ++swap > khugepaged_max_ptes_swap) {
> >                                 result = SCAN_EXCEED_SWAP_PTE;
> >                                 count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
> >                                 break;
> > @@ -1945,7 +1968,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file,
> >         rcu_read_unlock();
> >
> >         if (result == SCAN_SUCCEED) {
> > -               if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
> > +               if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
> > +                   cc->is_khugepaged) {
> >                         result = SCAN_EXCEED_NONE_PTE;
> >                         count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
> >                 } else {
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >


  reply	other threads:[~2022-07-12 17:06 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-06 23:59 [mm-unstable v7 00/18] mm: userspace hugepage collapse Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 01/18] mm/khugepaged: remove redundant transhuge_vma_suitable() check Zach O'Keefe
2022-07-11 20:38   ` Yang Shi
2022-07-12 17:14     ` Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 02/18] mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 03/18] mm/khugepaged: add struct collapse_control Zach O'Keefe
2022-07-08 21:01   ` Andrew Morton
2022-07-11 18:29     ` Zach O'Keefe
2022-07-11 18:45       ` Andrew Morton
2022-07-12 14:17         ` Zach O'Keefe
2022-07-11 21:51       ` Yang Shi
2022-07-06 23:59 ` [mm-unstable v7 04/18] mm/khugepaged: dedup and simplify hugepage alloc and charging Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 05/18] mm/khugepaged: pipe enum scan_result codes back to callers Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 06/18] mm/khugepaged: add flag to predicate khugepaged-only behavior Zach O'Keefe
2022-07-11 20:43   ` Yang Shi
2022-07-12 17:06     ` Zach O'Keefe [this message]
2022-07-06 23:59 ` [mm-unstable v7 07/18] mm/thp: add flag to enforce sysfs THP in hugepage_vma_check() Zach O'Keefe
2022-07-11 20:57   ` Yang Shi
2022-07-12 16:58     ` Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 08/18] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds hugepage Zach O'Keefe
2022-07-11 21:03   ` Yang Shi
2022-07-12 16:50     ` Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 09/18] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse Zach O'Keefe
2022-07-11 21:22   ` Yang Shi
2022-07-12 16:54     ` Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 10/18] mm/khugepaged: rename prefix of shared collapse functions Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 11/18] mm/madvise: add huge_memory:mm_madvise_collapse tracepoint Zach O'Keefe
2022-07-11 21:32   ` Yang Shi
2022-07-12 16:21     ` Zach O'Keefe
2022-07-12 17:05       ` Yang Shi
2022-07-12 17:30         ` Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 12/18] mm/madvise: add MADV_COLLAPSE to process_madvise() Zach O'Keefe
2022-07-08 20:47   ` Andrew Morton
2022-07-13  1:05     ` Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 13/18] proc/smaps: add PMDMappable field to smaps Zach O'Keefe
2022-07-11 21:37   ` Yang Shi
2022-07-12 16:31     ` Zach O'Keefe
2022-07-12 17:27       ` Yang Shi
2022-07-12 17:57         ` Zach O'Keefe
2022-07-13 18:02           ` Andrew Morton
2022-07-13 18:40             ` Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 14/18] selftests/vm: modularize collapse selftests Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 15/18] selftests/vm: dedup hugepage allocation logic Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 16/18] selftests/vm: add MADV_COLLAPSE collapse context to selftests Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 17/18] selftests/vm: add selftest to verify recollapse of THPs Zach O'Keefe
2022-07-06 23:59 ` [mm-unstable v7 18/18] selftests/vm: add selftest to verify multi THP collapse Zach O'Keefe
2022-07-14 18:55 ` [RFC] mm: userspace hugepage collapse: file/shmem semantics Zach O'Keefe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ys2qJm6FaOQcxkha@google.com \
    --to=zokeefe@google.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=arnd@arndb.de \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=axelrasmussen@google.com \
    --cc=chris@zankel.net \
    --cc=ckennelly@google.com \
    --cc=david@redhat.com \
    --cc=deller@gmx.de \
    --cc=hughd@google.com \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jcmvbkbc@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-mm@kvack.org \
    --cc=mattst88@gmail.com \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=patrickx@google.com \
    --cc=peterx@redhat.com \
    --cc=rientjes@google.com \
    --cc=rongwei.wang@linux.alibaba.com \
    --cc=shy828301@gmail.com \
    --cc=sj@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.