Re: [PATCH 14/15] mm: numa: Flush TLB if NUMA hinting faults race with PTE scan update

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: Rik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>, Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	hhuang@redhat.com
Subject: Re: [PATCH 14/15] mm: numa: Flush TLB if NUMA hinting faults race with PTE scan update
Date: Fri, 6 Dec 2013 09:24:00 +0000	[thread overview]
Message-ID: <20131206092400.GJ11295@suse.de> (raw)
In-Reply-To: <52A0DC7F.7050403@redhat.com>

On Thu, Dec 05, 2013 at 03:05:19PM -0500, Rik van Riel wrote:
> On 12/05/2013 02:54 PM, Mel Gorman wrote:
> 
> >I think that's a better fit and a neater fix. Thanks! I think it barriers
> >more than it needs to (definite cost vs maybe cost), the flush can be
> >deferred until we are definitely trying to migrate and the pte case is
> >not guaranteed to be flushed before migration due to pte_mknonnuma causing
> >a flush in ptep_clear_flush to be avoided later. Mashing the two patches
> >together yields this.
> 
> I think this would fix the numa migrate case.
> 

Good. So far I have not been seeing any problems with it at least.

> However, I believe the same issue is also present in
> mprotect(..., PROT_NONE) vs. compaction, for programs
> that trap SIGSEGV for garbage collection purposes.
> 

I'm not 100% convinced we need to be concerned with races with
mprotect(PROT_NONE) and a parallel reference to that area from userspace. I
would consider it to be a buggy application if two threads were not
co-ordinating the protection of a region and referencing it.  I would also
expect garbage collectors to be managing smart pointers and using reference
counting to copy between heap generations (or similar mechanisms) instead
of trapping sigsegv.

Intel's architectural manual 3A covers what happens for delayed TLB
invalidations in section 4.10.4.4 (in the version I'm looking at at
least). The following two snippets are the most important

	Software developers should understand that, between the modification
	of a paging-structure entry and execution of the invalidation
	instruction recommended in Section 4.10.4.2, the processor may
	use translations based on either the old value or the new value
	of the paging- structure entry. The following items describe some
	of the potential consequences of delayed invalidation:

	o If a paging-structure entry is modified to change from 1 to 0 the P
	flag from 1 to 0, an access to a linear address whose translation is
	controlled by this entry may or may not cause a page-fault exception.

	o If a paging-structure entry is modified to change the R/W flag
	from 0 to 1, write accesses to linear addresses whose translation is
	controlled by this entry may or may not cause a page-fault exception.

After the PROT_NONE may happen until after the deferred TLB flush. In a
race with mprotect(PROT_NONE) it'll either complete the access or receive
SIGSEGV signal due to failed protections but this is pretty much
expected and unpredictable.

I do not think the present bit gets cleared on mprotect(PROT_NONE) due
to the relevant bits been

#define _PAGE_CHG_MASK  (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \
                         _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
#define PAGE_NONE   __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)

If the present bit remains then compaction should flush the TLB on the
call to ptep_clear_flush as pte_accessible check is based on the present
bit. So even though it is possible for a write to complete during a call
to mprotect(PROT_NONE), the same is not true for compaction.

> They could lose modifications done in-between when
> the pte was set to PROT_NONE, and the actual TLB
> flush, if compaction moves the page around in-between
> those two events.
> 
> I don't know if this is a case we need to worry about
> at all, but I think the same fix would apply to that
> code path, so I guess we might as well make it...

I might be going "la la la la we're fine" and deluding myself but we
appear to be covered here and it would be a shame to add expense to a
path unnecessarily.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2013-12-06  9:24 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-03  8:51 [PATCH 00/14] NUMA balancing segmentation faults candidate fix on large machines Mel Gorman
2013-12-03  8:51 ` [PATCH 01/15] mm: numa: Do not batch handle PMD pages Mel Gorman
2013-12-03  8:51 ` [PATCH 02/15] mm: hugetlbfs: fix hugetlbfs optimization Mel Gorman
2013-12-03  8:51 ` [PATCH 03/15] mm: thp: give transparent hugepage code a separate copy_page Mel Gorman
2013-12-04 16:59   ` Alex Thorlton
2013-12-05 13:35     ` Mel Gorman
2013-12-03  8:51 ` [PATCH 04/15] mm: numa: Serialise parallel get_user_page against THP migration Mel Gorman
2013-12-03 23:07   ` Rik van Riel
2013-12-03 23:54     ` Mel Gorman
2013-12-03  8:51 ` [PATCH 05/15] mm: numa: Call MMU notifiers on " Mel Gorman
2013-12-03  8:51 ` [PATCH 06/15] mm: Clear pmd_numa before invalidating Mel Gorman
2013-12-03  8:51 ` [PATCH 07/15] mm: numa: Do not clear PMD during PTE update scan Mel Gorman
2013-12-03  8:51 ` [PATCH 08/15] mm: numa: Do not clear PTE for pte_numa update Mel Gorman
2013-12-03  8:51 ` [PATCH 09/15] mm: numa: Ensure anon_vma is locked to prevent parallel THP splits Mel Gorman
2013-12-03  8:51 ` [PATCH 10/15] mm: numa: Avoid unnecessary work on the failure path Mel Gorman
2013-12-03  8:51 ` [PATCH 11/15] sched: numa: Skip inaccessible VMAs Mel Gorman
2013-12-03  8:51 ` [PATCH 12/15] Clear numa on mprotect Mel Gorman
2013-12-03  8:52 ` [PATCH 13/15] mm: numa: Avoid unnecessary disruption of NUMA hinting during migration Mel Gorman
2013-12-03  8:52 ` [PATCH 14/15] mm: numa: Flush TLB if NUMA hinting faults race with PTE scan update Mel Gorman
2013-12-03 23:07   ` Rik van Riel
2013-12-03 23:46     ` Mel Gorman
2013-12-04 14:33       ` Rik van Riel
2013-12-04 16:07         ` Mel Gorman
2013-12-05 15:40           ` Rik van Riel
2013-12-05 19:54             ` Mel Gorman
2013-12-05 20:05               ` Rik van Riel
2013-12-06  9:24                 ` Mel Gorman [this message]
2013-12-06 17:38                   ` Alex Thorlton
2013-12-06 18:32                     ` Mel Gorman
2013-12-06 19:13           ` [PATCH 14/15] mm: fix TLB flush race between migration, and change_protection_range Rik van Riel
2013-12-06 20:32             ` Christoph Lameter
2013-12-06 21:21               ` Rik van Riel
2013-12-07  0:25                 ` Christoph Lameter
2013-12-07  3:14                   ` Rik van Riel
2013-12-09 16:00                     ` Christoph Lameter
2013-12-09 16:27                       ` Mel Gorman
2013-12-09 16:59                         ` Christoph Lameter
2013-12-09 21:01                           ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131206092400.GJ11295@suse.de \
    --to=mgorman@suse.de \
    --cc=athorlton@sgi.com \
    --cc=hhuang@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).