linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Rik van Riel <riel@redhat.com>
To: Mel Gorman <mgorman@suse.de>, Linux-MM <linux-mm@kvack.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/4] mm: Send a single IPI to TLB flush multiple pages when unmapping
Date: Wed, 15 Apr 2015 17:03:16 -0400	[thread overview]
Message-ID: <552ED214.3050105@redhat.com> (raw)
In-Reply-To: <1429094576-5877-3-git-send-email-mgorman@suse.de>

On 04/15/2015 06:42 AM, Mel Gorman wrote:
> An IPI is sent to flush remote TLBs when a page is unmapped that was
> recently accessed by other CPUs. There are many circumstances where this
> happens but the obvious one is kswapd reclaiming pages belonging to a
> running process as kswapd and the task are likely running on separate CPUs.
> 
> On small machines, this is not a significant problem but as machine
> gets larger with more cores and more memory, the cost of these IPIs can
> be high. This patch uses a structure similar in principle to a pagevec
> to collect a list of PFNs and CPUs that require flushing. It then sends
> one IPI to flush the list of PFNs. A new TLB flush helper is required for
> this and one is added for x86. Other architectures will need to decide if
> batching like this is both safe and worth the memory overhead. Specifically
> the requirement is;
> 
> 	If a clean page is unmapped and not immediately flushed, the
> 	architecture must guarantee that a write to that page from a CPU
> 	with a cached TLB entry will trap a page fault.
> 
> This is essentially what the kernel already depends on but the window is
> much larger with this patch applied and is worth highlighting.

This means we already have a (hard to hit?) data corruption
issue in the kernel.  We can lose data if we unmap a writable
but not dirty pte from a file page, and the task writes before
we flush the TLB.

I can only see one way to completely close the window, and that
is to make the pte(s) read-only, and flush the TLB before unmapping
and then flushing the TLB again. Luckily this is only true for
ptes that are both writeable and clean.

This would of course not be acceptable overhead when flushing things
one page at a time, but if we are moving to batched TLB flushes
anyway, there may be a way around this...

1) Check whether the to-be-unmapped pte is read-only, or the page is
   already marked dirty, if either is true, we can go straight to (4).
2) Mark a larger number of ptes read-only in one go (one page table
   page worth of ptes perhaps?)
3) Flush the TLBs for the task(s) with recently turned read-only ptes.
4) Unmap PTEs like your patch series does.
5) Flush the TLBs like your patch series does.

This might require some protection in the page fault code, to ensure
do_wp_page does not mark the pte read-write again in-between (2) and
(4). Then again, do_wp_page does mark the page dirty so we may be ok.

As an aside, it may be worth just doing a global tlb flush if the number
of entries in a ubc exceeds a certain number.

It may also be worth moving try_to_unmap_flush() from shrink_lruvec()
to shrink_zone(), so it is called once per zone and not once per cgroup
inside the zone. I guess we do need to call it before we call
should_continue_reclaim(), though :)



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2015-04-15 21:03 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-15 10:42 [RFC PATCH 0/4] TLB flush multiple pages with a single IPI Mel Gorman
2015-04-15 10:42 ` [PATCH 1/4] x86, mm: Trace when an IPI is about to be sent Mel Gorman
2015-04-15 10:42 ` [PATCH 2/4] mm: Send a single IPI to TLB flush multiple pages when unmapping Mel Gorman
2015-04-15 21:03   ` Rik van Riel [this message]
2015-04-15 21:16     ` Hugh Dickins
2015-04-15 21:28       ` Mel Gorman
2015-04-15 21:32         ` Dave Hansen
2015-04-16  6:38         ` Minchan Kim
2015-04-16  8:07           ` Mel Gorman
2015-04-16  8:29             ` Minchan Kim
2015-04-16  9:19               ` Mel Gorman
2015-04-16 23:30                 ` Minchan Kim
2015-04-15 22:20   ` Andi Kleen
2015-04-15 22:53     ` Mel Gorman
2015-04-15 10:42 ` [PATCH 3/4] mm: Gather more PFNs before sending a TLB to flush unmapped pages Mel Gorman
2015-04-15 11:42   ` Peter Zijlstra
2015-04-15 12:15     ` Mel Gorman
2015-04-15 12:24       ` Peter Zijlstra
2015-04-15 12:56         ` Mel Gorman
2015-04-15 10:42 ` [PATCH 4/4] mm: migrate: Batch TLB flushing when unmapping pages for migration Mel Gorman
2015-04-15 21:06   ` Hugh Dickins
2015-04-15 21:44     ` Mel Gorman
2015-04-15 23:50       ` Hugh Dickins
  -- strict thread matches above, loose matches on Subject: below --
2015-04-16 10:22 [RFC PATCH 0/4] TLB flush multiple pages with a single IPI v2 Mel Gorman
2015-04-16 10:22 ` [PATCH 2/4] mm: Send a single IPI to TLB flush multiple pages when unmapping Mel Gorman
2015-04-16 15:52   ` Rik van Riel
2015-04-16 19:21   ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=552ED214.3050105@redhat.com \
    --to=riel@redhat.com \
    --cc=andi@firstfloor.org \
    --cc=dave.hansen@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).