From: Jan Stancek <jstancek@redhat.com>
To: ltp@lists.linux.it
Subject: [LTP] [bug] problems with migration of huge pages with v4.20-10214-ge1ef035d272e
Date: Thu, 3 Jan 2019 12:06:09 -0500 (EST)
Message-ID: <495081357.93179893.1546535169172.JavaMail.zimbra@redhat.com>
In-Reply-To: <1808265696.93134171.1546519652798.JavaMail.zimbra@redhat.com>



----- Original Message -----
<snip>

> > That commit does cause BUGs for migration and page poisoning of anon huge
> > pages.  The patch was trying to take care of i_mmap_rwsem locking outside
> > try_to_unmap infrastructure.  This is because try_to_unmap will take the
> > semaphore in read mode (for file mappings) and we really need it to be
> > taken in write mode.
> > 
> > The patch below continues to take the semaphore outside try_to_unmap for
> > the file mapping case.  For anon mappings, the locking is done as a special
> > case in try_to_unmap_one.  This is something I was trying to avoid as it
> > is harder to follow/understand.  Any suggestions on how to restructure this
> > or make it more clear are welcome.
> > 
> > Adding Andrew on Cc as he already sent the commit causing the BUGs
> > upstream.
> > 
> > From: Mike Kravetz <mike.kravetz@oracle.com>
> > 
> > hugetlbfs: fix migration and poisoning of anon huge pages
> > 
> > Expanded use of i_mmap_rwsem for pmd sharing synchronization incorrectly
> > used page_mapping() of anon huge pages to get to address_space
> > i_mmap_rwsem.  Since page_mapping() is NULL for pages of anon mappings,
> > an "unable to handle kernel NULL pointer" BUG would occur with stack
> > similar to:
> > 
> > RIP: 0010:down_write+0x1b/0x40
> > Call Trace:
> >  migrate_pages+0x81f/0xb90
> >  __ia32_compat_sys_migrate_pages+0x190/0x190
> >  do_move_pages_to_node.isra.53.part.54+0x2a/0x50
> >  kernel_move_pages+0x566/0x7b0
> >  __x64_sys_move_pages+0x24/0x30
> >  do_syscall_64+0x5b/0x180
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > To fix, only use page_mapping() for non-anon or file pages.  For anon
> > pages wait until we find a vma in which the page is mapped and get the
> > address_space from vm_file.
> > 
> > Fixes: b43a99900559 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
> > synchronization")
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Mike,
> 
> 1) with LTP move_pages12 (MAP_PRIVATE version of reproducer)
> Patch below fixes the panic for me.
> It didn't apply cleanly to latest master, but conflicts were easy to resolve.
> 
> 2) with MAP_SHARED version of reproducer
> It still hangs in user-space.
> v4.19 kernel appears to work fine so I've started a bisect.

My bisect with the MAP_SHARED version arrived at the same 2 commits:
  c86aa7bbfd55 hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
  b43a99900559 hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization

Maybe a deadlock between page lock and mapping->i_mmap_rwsem?

thread1:
  hugetlbfs_evict_inode
    i_mmap_lock_write(mapping);
    remove_inode_hugepages
      lock_page(page);

thread2:
  __unmap_and_move
    trylock_page(page) / lock_page(page)
      remove_migration_ptes
        rmap_walk_file
          i_mmap_lock_read(mapping);
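
The suspected inversion can be modelled in a few lines of user-space C. This is only a sketch of the ordering argument, not kernel code: the enum labels, the function names and the two arrays are hypothetical and simply restate the two call chains above. The model assumes thread2 ends up in the blocking lock_page() rather than trylock_page(), and that a read-mode acquisition of i_mmap_rwsem has to wait while a writer holds it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the two locks named in the call chains above. */
enum lock_id { LOCK_I_MMAP_RWSEM, LOCK_PAGE };

/* Position of `id` in an acquisition sequence, or -1 if it is never taken. */
int lock_pos(const enum lock_id *seq, size_t n, enum lock_id id)
{
    for (size_t i = 0; i < n; i++)
        if (seq[i] == id)
            return (int)i;
    return -1;
}

/*
 * Return true if path `a` and path `b` take some pair of locks in opposite
 * order (the ABBA pattern). Lock modes (read vs. write) and trylock
 * fallbacks are deliberately ignored in this model.
 */
bool abba_inversion(const enum lock_id *a, size_t na,
                    const enum lock_id *b, size_t nb)
{
    for (size_t i = 0; i < na; i++) {
        for (size_t j = i + 1; j < na; j++) {
            int pi = lock_pos(b, nb, a[i]);
            int pj = lock_pos(b, nb, a[j]);

            /* `a` takes a[i] before a[j]; `b` takes a[j] before a[i]. */
            if (pi >= 0 && pj >= 0 && pj < pi)
                return true;
        }
    }
    return false;
}

/* thread1: hugetlbfs_evict_inode -> i_mmap_lock_write, then lock_page. */
const enum lock_id evict_path[] = { LOCK_I_MMAP_RWSEM, LOCK_PAGE };

/* thread2: __unmap_and_move -> lock_page, then i_mmap_lock_read. */
const enum lock_id migrate_path[] = { LOCK_PAGE, LOCK_I_MMAP_RWSEM };
```

Calling abba_inversion(evict_path, 2, migrate_path, 2) reports an inversion, while comparing either path with itself does not. Whether the kernel actually reaches this interleaving is still the open question here.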

Here's strace output:
<snip>
1196  11:27:16 mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x7f646c400000
1197  11:27:16 set_robust_list(0x7f646d5b0e60, 24) = 0
1197  11:27:16 getppid()                = 1196
1197  11:27:16 move_pages(1196, 1024, [0x7f646c400000, 0x7f646c401000, 0x7f646c402000, 0x7f646c403000, 0x7f646c404000, 0x7f646c405000, 0x7f646c406000, 0x7f646c407000, 0x7f646c408000, 0x7f646c409000, 0x7f646c40a000, 0x7f646c40b000, 0x7f646c40c000, 0x7f646c40d000, 0x7f646c40e000, 0x7f646c40f000, 0x7f646c410000, 0x7f646c411000, 0x7f646c412000, 0x7f646c413000, 0x7f646c414000, 0x7f646c415000, 0x7f646c416000, 0x7f646c417000, 0x7f646c418000, 0x7f646c419000, 0x7f646c41a000, 0x7f646c41b000, 0x7f646c41c000, 0x7f646c41d000, 0x7f646c41e000, 0x7f646c41f000, ...], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [-ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, -ENOENT, ...], MPOL_MF_MOVE_ALL) = 0
1197  11:27:16 move_pages(1196, 1024, [0x7f646c400000, 0x7f646c401000, 0x7f646c402000, 0x7f646c403000, 0x7f646c404000, 0x7f646c405000, 0x7f646c406000, 0x7f646c407000, 0x7f646c408000, 0x7f646c409000, 0x7f646c40a000, 0x7f646c40b000, 0x7f646c40c000, 0x7f646c40d000, 0x7f646c40e000, 0x7f646c40f000, 0x7f646c410000, 0x7f646c411000, 0x7f646c412000, 0x7f646c413000, 0x7f646c414000, 0x7f646c415000, 0x7f646c416000, 0x7f646c417000, 0x7f646c418000, 0x7f646c419000, 0x7f646c41a000, 0x7f646c41b000, 0x7f646c41c000, 0x7f646c41d000, 0x7f646c41e000, 0x7f646c41f000, ...], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...], [1, -EACCES, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...], MPOL_MF_MOVE_ALL) = 0
1197  11:27:16 move_pages(1196, 1024, [0x7f646c400000, 0x7f646c401000, 0x7f646c402000, 0x7f646c403000, 0x7f646c404000, 0x7f646c405000, 0x7f646c406000, 0x7f646c407000, 0x7f646c408000, 0x7f646c409000, 0x7f646c40a000, 0x7f646c40b000, 0x7f646c40c000, 0x7f646c40d000, 0x7f646c40e000, 0x7f646c40f000, 0x7f646c410000, 0x7f646c411000, 0x7f646c412000, 0x7f646c413000, 0x7f646c414000, 0x7f646c415000, 0x7f646c416000, 0x7f646c417000, 0x7f646c418000, 0x7f646c419000, 0x7f646c41a000, 0x7f646c41b000, 0x7f646c41c000, 0x7f646c41d000, 0x7f646c41e000, 0x7f646c41f000, ...], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...],  <unfinished ...>
1196  11:27:16 munmap(0x7f646c400000, 4194304 <unfinished ...>
<hangs>
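
For reference, the argument arrays in the move_pages() calls above follow directly from the mmap() length and the 0x1000 stride visible in the trace. The sketch below only rebuilds those arrays; it does not call move_pages() itself. The macro and function names are made up for illustration, the constants are read off the strace output, and a 64-bit uintptr_t is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Values read off the strace output above. */
#define MAP_LEN   4194304UL             /* length passed to mmap()/munmap() */
#define PAGE_STEP 0x1000UL              /* stride between listed addresses */
#define NR_PAGES  (MAP_LEN / PAGE_STEP) /* 1024, the move_pages() count */

/*
 * Fill pages[] with one address per PAGE_STEP across the mapping starting
 * at `base`, and nodes[] with the same target node for every entry, as in
 * the alternating all-0 / all-1 node lists in the trace.
 */
void fill_move_pages_args(uintptr_t base, int node,
                          void *pages[], int nodes[])
{
    for (size_t i = 0; i < NR_PAGES; i++) {
        pages[i] = (void *)(base + i * PAGE_STEP);
        nodes[i] = node;
    }
}
```

With base 0x7f646c400000 this gives 0x7f646c401000 for the second entry and 0x7f646c41f000 for entry 31, matching the addresses strace printed before truncating the list. The arrays could then be handed to move_pages() together with a status array and MPOL_MF_MOVE_ALL, as the trace shows.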

Regards,
Jan

WARNING: multiple messages have this Message-ID (diff)
From: Jan Stancek <jstancek@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: linux-mm@kvack.org,
	kirill shutemov <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	ltp@lists.linux.it, mhocko@kernel.org,
	Rachel Sibley <rasibley@redhat.com>,
	hughd@google.com, n-horiguchi@ah.jp.nec.com, aarcange@redhat.com,
	aneesh kumar <aneesh.kumar@linux.vnet.ibm.com>,
	dave@stgolabs.net, prakash sangappa <prakash.sangappa@oracle.com>,
	colin king <colin.king@canonical.com>
Subject: Re: [bug] problems with migration of huge pages with v4.20-10214-ge1ef035d272e
Date: Thu, 3 Jan 2019 12:06:09 -0500 (EST)
Message-ID: <495081357.93179893.1546535169172.JavaMail.zimbra@redhat.com>
In-Reply-To: <1808265696.93134171.1546519652798.JavaMail.zimbra@redhat.com>



