Re: [PATCH] mm/hmm: Fix bad subpage pointer in try_to_unmap_one

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jerome Glisse <jglisse@redhat.com>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Jason Gunthorpe <jgg@mellanox.com>
Subject: Re: [PATCH] mm/hmm: Fix bad subpage pointer in try_to_unmap_one
Date: Tue, 16 Jul 2019 11:33:37 -0400	[thread overview]
Message-ID: <20190716153337.GA3490@redhat.com> (raw)
In-Reply-To: <6a52c2a0-8d27-2ce4-e797-7cae653df21a@nvidia.com>

On Mon, Jul 15, 2019 at 11:14:31PM -0700, John Hubbard wrote:
> On 7/15/19 5:38 PM, Ralph Campbell wrote:
> > On 7/15/19 4:34 PM, John Hubbard wrote:
> > > On 7/15/19 3:00 PM, Andrew Morton wrote:
> > > > On Tue, 9 Jul 2019 18:24:57 -0700 Ralph Campbell <rcampbell@nvidia.com> wrote:
> > > > 
> > > >   mm/rmap.c |    1 +
> > > >   1 file changed, 1 insertion(+)
> > > > 
> > > > --- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
> > > > +++ a/mm/rmap.c
> > > > @@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
> > > >                * No need to invalidate here it will synchronize on
> > > >                * against the special swap migration pte.
> > > >                */
> > > > +            subpage = page;
> > > >               goto discard;
> > > >           }
> > > 
> > > Hi Ralph and everyone,
> > > 
> > > While the above prevents a crash, I'm concerned that it is still not
> > > an accurate fix. This fix leads to repeatedly removing the rmap, against the
> > > same struct page, which is odd, and also doesn't directly address the
> > > root cause, which I understand to be: this routine can't handle migrating
> > > the zero page properly--over and back, anyway. (We should also mention more
> > > about how this is triggered, in the commit description.)
> > > 
> > > I'll take a closer look at possible fixes (I have to step out for a bit) soon,
> > > but any more experienced help is also appreciated here.
> > > 
> > > thanks,
> > 
> > I'm not surprised at the confusion. It took me quite awhile to
> > understand how migrate_vma() works with ZONE_DEVICE private memory.
> > The big point to be aware of is that when migrating a page to
> > device private memory, the source page's page->mapping pointer
> > is copied to the ZONE_DEVICE struct page and the page_mapcount()
> > is increased. So, the kernel sees the page as being "mapped"
> > but the page table entry as being is_swap_pte() so the CPU will fault
> > if it tries to access the mapped address.
> 
> Thanks for humoring me here...
> 
> The part about the source page's page->mapping pointer being *copied*
> to the ZONE_DEVICE struct page is particularly interesting, and belongs
> maybe even in a comment (if not already there). Definitely at least in
> the commit description, for now.
> 
> > So yes, the source anon page is unmapped, DMA'ed to the device,
> > and then mapped again. Then on a CPU fault, the zone device page
> > is unmapped, DMA'ed to system memory, and mapped again.
> > The rmap_walk() is used to clear the temporary migration pte so
> > that is another important detail of how migrate_vma() works.
> > At the moment, only single anon private pages can migrate to
> > device private memory so there are no subpages and setting it to "page"
> > should be correct for now. I'm looking at supporting migration of
> > transparent huge pages but that is a work in progress.
> 
> Well here, I worry, because subpage != tail page, right? subpage is a
> strange variable name, and here it is used to record the page that
> corresponds to *each* mapping that is found during the reverse page
> mapping walk.
> 
> And that makes me suspect that if there were more than one of these
> found (which is unlikely, given the light testing that we have available
> so far, I realize), then there could possibly be a problem with the fix,
> yes?

No THP when migrating to device memory so no tail vs head page here.

Cheers,
Jérôme

next prev parent reply	other threads:[~2019-07-16 15:33 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-09 22:35 [PATCH] mm/hmm: Fix bad subpage pointer in try_to_unmap_one Ralph Campbell
2019-07-10  0:28 ` Andrew Morton
2019-07-10  1:24   ` Ralph Campbell
2019-07-15 22:00     ` Andrew Morton
2019-07-15 23:34       ` John Hubbard
2019-07-16  0:38         ` Ralph Campbell
2019-07-16  6:14           ` John Hubbard
2019-07-16 15:33             ` Jerome Glisse [this message]
2019-07-16 21:10           ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190716153337.GA3490@redhat.com \
    --to=jglisse@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=jgg@mellanox.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=rcampbell@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.