linux-mm.kvack.org archive mirror
From: Mike Kravetz <mike.kravetz@oracle.com>
To: Hugh Dickins <hughd@google.com>
Cc: Qian Cai <cai@lca.pw>,
	js1304@gmail.com, Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@lge.com, Vlastimil Babka <vbabka@suse.cz>,
	Christoph Hellwig <hch@infradead.org>,
	Roman Gushchin <guro@fb.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Michal Hocko <mhocko@suse.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: Re: [PATCH v3 7/8] mm/mempolicy: use a standard migration target allocation callback
Date: Fri, 9 Oct 2020 17:25:30 -0700	[thread overview]
Message-ID: <42779a0b-d174-a4e5-abbf-4a77879d1e51@oracle.com> (raw)
In-Reply-To: <alpine.LSU.2.11.2010091521460.12013@eggly.anvils>

On 10/9/20 3:23 PM, Hugh Dickins wrote:
> On Fri, 9 Oct 2020, Mike Kravetz wrote:
>> On 10/8/20 10:50 PM, Hugh Dickins wrote:
>>>
>>> It's a problem I've faced before in tmpfs, keeping a hold on the
>>> mapping while page lock is dropped.  Quite awkward: igrab() looks as
>>> if it's the right thing to use, but turns out to give no protection
>>> against umount.  Last time around, I ended up with a stop_eviction
>>> count in the shmem inode, which shmem_evict_inode() waits on if
>>> necessary.  Something like that could be done for hugetlbfs too,
>>> but I'd prefer to do it without adding extra, if there is a way.
>>
>> Thanks.
> 
> I failed to come up with anything neater than a stop_eviction count
> in the hugetlbfs inode.
> 
> But that's not unlike a very special purpose rwsem added into the
> hugetlbfs inode: and now that you're reconsidering how i_mmap_rwsem
> got repurposed, perhaps you will end up with an rwsem of your own in
> the hugetlbfs inode.

We have this in the Oracle UEK kernel.
https://github.com/oracle/linux-uek/commit/89260f55df008bb01c841714fb2ad26c300c8c88
Usage has been expanded to cover more cases.

When I proposed the same upstream, the suggestion was to try to use
i_mmap_rwsem.  That is what I have been trying to do.  Certainly, a
hugetlbfs-specific rwsem is easier to manage and less disruptive.

> So I won't distract you with a stop_eviction patch unless you ask:
> that's easy, what you're looking into is hard - good luck!

Thanks Hugh!

>> In parallel to working the locking issue, I am also exploring enhanced
>> backout code to handle all the needed cases.

I'm now confident that we need this or some other locking in place.  Why?

I went back to a code base before locking changes.  My idea was to check
for races and backout changes as necessary.  Page fault handling code will
do something like this:

ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
... do some stuff, possibly allocate a page ...
ptl = huge_pte_lock(h, mm, ptep);

Because of PMD sharing, we cannot be sure the ptep is valid until
after holding the ptl.  So, I was going to add something like this
after obtaining the ptl:

while (ptep != huge_pte_offset(mm, haddr, huge_page_size(h))) {
	/* The PMD was unshared under us; drop the stale ptl and retry. */
	spin_unlock(ptl);
	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
	if (!ptep) {
		ret = VM_FAULT_OOM;
		goto backout_unlocked;
	}
	ptl = huge_pte_lock(h, mm, ptep);
}

Unfortunately, the problem is worse than I thought.  I knew the PMD
pointed to by the ptep could be unshared before attempting to acquire
the ptl.  However, it is possible that the page which was the PMD may
be repurposed before we even try to acquire the ptl.  Since the ptl is
a spinlock within the struct page of what was the PMD, it may no longer
be valid.  I actually experienced addressing exceptions in the spinlock
code within huge_pte_lock. :(
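The hazard exists because, with split PMD locks, the spinlock taken by huge_pte_lock() lives in the struct page of the PMD page itself, so freeing that page frees the lock out from under the locker.  A hypothetical userspace analogue (the names are invented; a mutex stands in for the ptl):

```c
#include <pthread.h>
#include <stdlib.h>

/* The lock lives *inside* the dynamically allocated "PMD page".
 * If another thread frees or repurposes the page between looking the
 * lock up and acquiring it, the acquire dereferences freed memory --
 * the addressing exception described above. */
struct pmd_page {
	pthread_mutex_t ptl;	/* analogue of the ptl in struct page */
	long entries[512];	/* analogue of the PMD entries */
};

struct pmd_page *alloc_pmd_page(void)
{
	struct pmd_page *p = malloc(sizeof(*p));

	if (p)
		pthread_mutex_init(&p->ptl, NULL);
	return p;
}

/* Safe only if something external prevents a concurrent free. */
void lock_pmd(struct pmd_page *p)   { pthread_mutex_lock(&p->ptl); }
void unlock_pmd(struct pmd_page *p) { pthread_mutex_unlock(&p->ptl); }
```

Nothing inside lock_pmd() can defend against the page being freed first; the serialization has to come from outside, which is the point of the next paragraph.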

So, it seems that we need some way to prevent PMD unsharing between the
huge_pte_alloc() and huge_pte_lock().  It is going to have to be
i_mmap_rwsem or some other rwsem.
-- 
Mike Kravetz


