linux-mm.kvack.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org, Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	David Rientjes <rientjes@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Muchun Song <songmuchun@bytedance.com>
Subject: Re: A mapcount riddle
Date: Wed, 25 Jan 2023 11:46:58 -0500	[thread overview]
Message-ID: <Y9FdAiWxeja6stdR@x1n> (raw)
In-Reply-To: <Y9BrSsYFPdm7eFFH@monkey>

On Tue, Jan 24, 2023 at 03:35:38PM -0800, Mike Kravetz wrote:
> On 01/24/23 18:00, Peter Xu wrote:
> > On Tue, Jan 24, 2023 at 12:56:24PM -0800, Mike Kravetz wrote:
> > > At first thought this seems bad.  However, I believe this has been the
> > > behavior since hugetlb PMD sharing was introduced in 2006 and I am
> > > unaware of any reported issues.  I did an audit of code looking at
> > > mapcount.  In addition to the above issue with smaps, there appears
> > > to be an issue with 'migrate_pages' where shared pages could be migrated
> > > without appropriate privilege.
> > > 
> > > 	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
> > > 	if (flags & (MPOL_MF_MOVE_ALL) ||
> > > 	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
> > > 		if (isolate_hugetlb(page, qp->pagelist) &&
> > > 			(flags & MPOL_MF_STRICT))
> > > 			/*
> > > 			 * Failed to isolate page but allow migrating pages
> > > 			 * which have been queued.
> > > 			 */
> > > 			ret = 1;
> > > 	}
> > > 
> > > I will prepare fixes for both of these.  However, I wanted to ask if
> > > anyone has ideas about other potential issues with this?
> > 
> > This made me wonder whether things should already be checked before this
> > happens.  E.g. when trying to share a pmd, does it make sense to check
> > the vma mempolicy before doing so?
> 
> Not sure I understand your question.  Are you questioning whether we should
> enter into pmd sharing if mempolicy allows movement to another node?
> Wouldn't this be the 'normal' case on a multi-node system?
> 
> > Then the question is: if pmd sharing only happens between vmas that share
> > the same memory policy, would the above mapcount==1 check be acceptable
> > even if the page is shared by multiple processes?
> 
> I am not a mempolicy expert, but that would still involve moving pages
> mapped by another process.  For that CAP_SYS_NICE is required.  So, my
> opinion would be that it is not allowed even if mempolicy is the same.

Makes sense.

> 
> > Besides, I'm also curious on the planned fix too regarding the two issues
> > mentioned.
> 
> My planned 'fix' is to simply check for a shared PMD
> (page_count(virt_to_page(pte))) to determine whether a page with
> mapcount == 1 is shared.

I think having only the current pte* won't easily work; we'll need to walk
all the page tables that map this page.

To be explicit, one page can be mapped at pgtable1 which is shared by proc1
& proc2, and it can also be mapped at pgtable2 which is shared by proc3 &
proc4.  Then (assuming pte1* points to pgtable1):

  page_count(virt_to_page(pte1)) + page_mapcount(page)

Won't be the right mapcount we're looking for.

But then if we're going to rmap walk all the mappings it seems even easier
to just count how many times the page is mapped when walking, rather than
relying on page_mapcount.

What David said about maintaining map/ref counts during sharing is another
approach; I'm wondering whether there's a better way to do this.

Thanks,

-- 
Peter Xu



Thread overview: 22+ messages
2023-01-24 20:56 A mapcount riddle Mike Kravetz
2023-01-24 23:00 ` Peter Xu
2023-01-24 23:29   ` Yang Shi
2023-01-25 16:02     ` Peter Xu
2023-01-25 18:26       ` Yang Shi
2023-01-24 23:35   ` Mike Kravetz
2023-01-25 16:46     ` Peter Xu [this message]
2023-01-25 18:16       ` Mike Kravetz
2023-01-25 20:13         ` Peter Xu
2023-01-25  8:24 ` Michal Hocko
2023-01-25 17:59   ` Mike Kravetz
2023-01-26  9:16     ` Michal Hocko
2023-01-26 17:51       ` Mike Kravetz
2023-01-27  9:56         ` Michal Hocko
2023-01-25  9:09 ` David Hildenbrand
2023-01-25 15:26 ` James Houghton
2023-01-25 15:54   ` Peter Xu
2023-01-25 16:22     ` James Houghton
2023-01-25 19:26       ` Vishal Moola
2023-01-26  9:15       ` David Hildenbrand
2023-01-26 18:22         ` Yang Shi
2023-01-26  9:10   ` David Hildenbrand
