All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vishal Moola <vishal.moola@gmail.com>
To: James Houghton <jthoughton@google.com>
Cc: Peter Xu <peterx@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	linux-mm@kvack.org, Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Rientjes <rientjes@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Muchun Song <songmuchun@bytedance.com>
Subject: Re: A mapcount riddle
Date: Wed, 25 Jan 2023 11:26:10 -0800	[thread overview]
Message-ID: <Y9GCUsY2UJQSrNeg@fedora> (raw)
In-Reply-To: <CADrL8HW5V5c6YyO30pRMjHm0r3rCfOqm-9O0Dnm6Ft9KNQQWyA@mail.gmail.com>

On Wed, Jan 25, 2023 at 08:22:24AM -0800, James Houghton wrote:
> On Wed, Jan 25, 2023 at 7:54 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Wed, Jan 25, 2023 at 07:26:49AM -0800, James Houghton wrote:
> > > > At first thought this seems bad.  However, I believe this has been the
> > > > behavior since hugetlb PMD sharing was introduced in 2006 and I am
> > > > unaware of any reported issues.  I did a audit of code looking at
> > > > mapcount.  In addition to the above issue with smaps, there appears
> > > > to be an issue with 'migrate_pages' where shared pages could be migrated
> > > > without appropriate privilege.
> > > >
> > > >         /* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> > > >         if (flags & (MPOL_MF_MOVE_ALL) ||
> > > >             (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> > > >                 if (isolate_hugetlb(page, qp->pagelist) &&
> > > >                         (flags & MPOL_MF_STRICT))
> > > >                         /*
> > > >                          * Failed to isolate page but allow migrating pages
> > > >                          * which have been queued.
> > > >                          */
> > > >                         ret = 1;
> > > >         }
> > >
> > > This isn't the exact same problem you're fixing Mike, but I want to
> > > point out a related problem.
> > >
> > > This is the generic-mm-equivalent of the hugetlb code above:
> > >
> > > static int migrate_page_add(struct page *page, struct list_head
> > > *pagelist, unsigned long flags)
> > > {
> > >         struct page *head = compound_head(page);
> > >         /*
> > >         * Avoid migrating a page that is shared with others.
> > >         */
> > >         if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) {
> > >                 if (!isolate_lru_page(head)) {
> > >                         list_add_tail(&head->lru, pagelist);
> > >                         mod_node_page_state(page_pgdat(head),
> > >                                 NR_ISOLATED_ANON + page_is_file_lru(head),
> > >                                 thp_nr_pages(head));
> > > ...
> > > }
> > >

There's also 3 functions in migrate that appear to check for a similar
condition - add_page_for_migration(), numamigrate_isolate_page(), and
migrate_misplaced_page(). 

> > > If you have a partially PTE-mapped THP, page_mapcount(head) will not
> > > accurately determine if a page is mapped in multiple VMAs or not (it
> > > only tells you how many times the head page is mapped).
> > >
> > > For example...
> > > 1) You could have the THP PMD-mapped in one VMA, and then one tail
> > > page of the THP can be mapped in another. page_mapcount(head) will be
> > > 1.
> > > 2) You could have two VMAs map two separate tail pages of the THP, in
> > > which case page_mapcount(head) will be 0.
> > >
> > > I bring this up because we have the same problem with HugeTLB
> > > high-granularity mapping.
> >
> > Maybe a better match here is total_mapcount() rather than page_mapcount()
> > (despite the overheads on the sub-page loop)?
> 
> This would kind of fix the problem, but it would be too conservative now. :)

I agree. Interestingly, numamigrate_isolate_page() does take the
total_mapcount() approach right now, so I'm not sure how much of a
difference being more conservative makes.

> In both example 1 and 2 above, total_mapcount(head) for both would be
> 2, so that's ok. But now consider: you have one VMA that is
> PTE-mapping two pieces of the same THP. total_mapcount(head) is still
> 2, even though only a single VMA is mapping the page.


  reply	other threads:[~2023-01-25 19:26 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-24 20:56 A mapcount riddle Mike Kravetz
2023-01-24 23:00 ` Peter Xu
2023-01-24 23:29   ` Yang Shi
2023-01-25 16:02     ` Peter Xu
2023-01-25 18:26       ` Yang Shi
2023-01-24 23:35   ` Mike Kravetz
2023-01-25 16:46     ` Peter Xu
2023-01-25 18:16       ` Mike Kravetz
2023-01-25 20:13         ` Peter Xu
2023-01-25  8:24 ` Michal Hocko
2023-01-25 17:59   ` Mike Kravetz
2023-01-26  9:16     ` Michal Hocko
2023-01-26 17:51       ` Mike Kravetz
2023-01-27  9:56         ` Michal Hocko
2023-01-25  9:09 ` David Hildenbrand
2023-01-25 15:26 ` James Houghton
2023-01-25 15:54   ` Peter Xu
2023-01-25 16:22     ` James Houghton
2023-01-25 19:26       ` Vishal Moola [this message]
2023-01-26  9:15       ` David Hildenbrand
2023-01-26 18:22         ` Yang Shi
2023-01-26  9:10   ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y9GCUsY2UJQSrNeg@fedora \
    --to=vishal.moola@gmail.com \
    --cc=david@redhat.com \
    --cc=jthoughton@google.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=rientjes@google.com \
    --cc=songmuchun@bytedance.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.