Re: [PATCH 3/6] mm: Defer TLB flush after unmap as long as possible

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: Rik van Riel <riel@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/6] mm: Defer TLB flush after unmap as long as possible
Date: Tue, 21 Apr 2015 22:17:04 +0100	[thread overview]
Message-ID: <20150421211704.GC14842@suse.de> (raw)
In-Reply-To: <5536B386.4050808@redhat.com>

On Tue, Apr 21, 2015 at 04:31:02PM -0400, Rik van Riel wrote:
> On 04/21/2015 06:41 AM, Mel Gorman wrote:
> > If a PTE is unmapped and it's dirty then it was writable recently. Due
> > to deferred TLB flushing, it's best to assume a writable TLB cache entry
> > exists. With that assumption, the TLB must be flushed before any IO can
> > start or the page is freed to avoid lost writes or data corruption. Prior
> > to this patch, such PFNs were simply flushed immediately. In this patch,
> > the caller is informed that such entries potentially exist and it's up to
> > the caller to flush before pages are freed or IO can start.
> > 
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> 
> > @@ -1450,10 +1455,11 @@ static int page_not_mapped(struct page *page)
> >   * page, used in the pageout path.  Caller must hold the page lock.
> >   * Return values are:
> >   *
> > - * SWAP_SUCCESS	- we succeeded in removing all mappings
> > - * SWAP_AGAIN	- we missed a mapping, try again later
> > - * SWAP_FAIL	- the page is unswappable
> > - * SWAP_MLOCK	- page is mlocked.
> > + * SWAP_SUCCESS	       - we succeeded in removing all mappings
> > + * SWAP_SUCCESS_CACHED - Like SWAP_SUCCESS but a writable TLB entry may exist
> > + * SWAP_AGAIN	       - we missed a mapping, try again later
> > + * SWAP_FAIL	       - the page is unswappable
> > + * SWAP_MLOCK	       - page is mlocked.
> >   */
> >  int try_to_unmap(struct page *page, enum ttu_flags flags)
> >  {
> > @@ -1481,7 +1487,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
> >  	ret = rmap_walk(page, &rwc);
> >  
> >  	if (ret != SWAP_MLOCK && !page_mapped(page))
> > -		ret = SWAP_SUCCESS;
> > +		ret = (ret == SWAP_AGAIN_CACHED) ? SWAP_SUCCESS_CACHED : SWAP_SUCCESS;
> > +
> >  	return ret;
> >  }
> 
> This wants a big fat comment explaining why SWAP_AGAIN_CACHED is turned
> into SWAP_SUCCESS_CACHED.
> 

I'll add something in V4. SWAP_AGAIN_CACHED indicates to rmap_walk that
it should continue the rmap but that a write cached PTE was encountered.
SWAP_SUCCESS is what callers of try_to_unmap() expect so
SWAP_SUCCESS_CACHED is the equivalent.

> I think I understand why this is happening, but I am not sure how to
> explain it...
> 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 12ec298087b6..0ad3f435afdd 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -860,6 +860,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  	unsigned long nr_reclaimed = 0;
> >  	unsigned long nr_writeback = 0;
> >  	unsigned long nr_immediate = 0;
> > +	bool tlb_flush_required = false;
> >  
> >  	cond_resched();
> >  
> > @@ -1032,6 +1033,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  				goto keep_locked;
> >  			case SWAP_MLOCK:
> >  				goto cull_mlocked;
> > +			case SWAP_SUCCESS_CACHED:
> > +				/* Must flush before free, fall through */
> > +				tlb_flush_required = true;
> >  			case SWAP_SUCCESS:
> >  				; /* try to free the page below */
> >  			}
> > @@ -1176,7 +1180,8 @@ keep:
> >  	}
> >  
> >  	mem_cgroup_uncharge_list(&free_pages);
> > -	try_to_unmap_flush();
> > +	if (tlb_flush_required)
> > +		try_to_unmap_flush();
> >  	free_hot_cold_page_list(&free_pages, true);
> 
> Don't we have to flush the TLB before calling pageout() on the page?
> 

Not any more. It got removed in patch 2 up and I forgot to reintroduce it
with a tlb_flush_required check here. Thanks for that.

> In other words, we would also have to batch up calls to pageout(), if
> we want to do batched TLB flushing.
> 
> This could be accomplished by putting the SWAP_SUCCESS_CACHED pages on
> a special list, instead of calling pageout() on them immediately, and
> then calling pageout() on all the pages on that list after the batch
> flush.
> 

True. We had discussed something like that on IRC. It's a good idea but
a separate patch because it's less clear-cut. We're taking a partial pass
through the list in an attempt to reduce IPIs.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Mel Gorman <mgorman@suse.de>
To: Rik van Riel <riel@redhat.com>
Cc: Linux-MM <linux-mm@kvack.org>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/6] mm: Defer TLB flush after unmap as long as possible
Date: Tue, 21 Apr 2015 22:17:04 +0100	[thread overview]
Message-ID: <20150421211704.GC14842@suse.de> (raw)
In-Reply-To: <5536B386.4050808@redhat.com>

On Tue, Apr 21, 2015 at 04:31:02PM -0400, Rik van Riel wrote:
> On 04/21/2015 06:41 AM, Mel Gorman wrote:
> > If a PTE is unmapped and it's dirty then it was writable recently. Due
> > to deferred TLB flushing, it's best to assume a writable TLB cache entry
> > exists. With that assumption, the TLB must be flushed before any IO can
> > start or the page is freed to avoid lost writes or data corruption. Prior
> > to this patch, such PFNs were simply flushed immediately. In this patch,
> > the caller is informed that such entries potentially exist and it's up to
> > the caller to flush before pages are freed or IO can start.
> > 
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> 
> > @@ -1450,10 +1455,11 @@ static int page_not_mapped(struct page *page)
> >   * page, used in the pageout path.  Caller must hold the page lock.
> >   * Return values are:
> >   *
> > - * SWAP_SUCCESS	- we succeeded in removing all mappings
> > - * SWAP_AGAIN	- we missed a mapping, try again later
> > - * SWAP_FAIL	- the page is unswappable
> > - * SWAP_MLOCK	- page is mlocked.
> > + * SWAP_SUCCESS	       - we succeeded in removing all mappings
> > + * SWAP_SUCCESS_CACHED - Like SWAP_SUCCESS but a writable TLB entry may exist
> > + * SWAP_AGAIN	       - we missed a mapping, try again later
> > + * SWAP_FAIL	       - the page is unswappable
> > + * SWAP_MLOCK	       - page is mlocked.
> >   */
> >  int try_to_unmap(struct page *page, enum ttu_flags flags)
> >  {
> > @@ -1481,7 +1487,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
> >  	ret = rmap_walk(page, &rwc);
> >  
> >  	if (ret != SWAP_MLOCK && !page_mapped(page))
> > -		ret = SWAP_SUCCESS;
> > +		ret = (ret == SWAP_AGAIN_CACHED) ? SWAP_SUCCESS_CACHED : SWAP_SUCCESS;
> > +
> >  	return ret;
> >  }
> 
> This wants a big fat comment explaining why SWAP_AGAIN_CACHED is turned
> into SWAP_SUCCESS_CACHED.
> 

I'll add something in V4. SWAP_AGAIN_CACHED indicates to rmap_walk that
it should continue the rmap but that a write cached PTE was encountered.
SWAP_SUCCESS is what callers of try_to_unmap() expect so
SWAP_SUCCESS_CACHED is the equivalent.

> I think I understand why this is happening, but I am not sure how to
> explain it...
> 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 12ec298087b6..0ad3f435afdd 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -860,6 +860,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  	unsigned long nr_reclaimed = 0;
> >  	unsigned long nr_writeback = 0;
> >  	unsigned long nr_immediate = 0;
> > +	bool tlb_flush_required = false;
> >  
> >  	cond_resched();
> >  
> > @@ -1032,6 +1033,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  				goto keep_locked;
> >  			case SWAP_MLOCK:
> >  				goto cull_mlocked;
> > +			case SWAP_SUCCESS_CACHED:
> > +				/* Must flush before free, fall through */
> > +				tlb_flush_required = true;
> >  			case SWAP_SUCCESS:
> >  				; /* try to free the page below */
> >  			}
> > @@ -1176,7 +1180,8 @@ keep:
> >  	}
> >  
> >  	mem_cgroup_uncharge_list(&free_pages);
> > -	try_to_unmap_flush();
> > +	if (tlb_flush_required)
> > +		try_to_unmap_flush();
> >  	free_hot_cold_page_list(&free_pages, true);
> 
> Don't we have to flush the TLB before calling pageout() on the page?
> 

Not any more. It got removed in patch 2 up and I forgot to reintroduce it
with a tlb_flush_required check here. Thanks for that.

> In other words, we would also have to batch up calls to pageout(), if
> we want to do batched TLB flushing.
> 
> This could be accomplished by putting the SWAP_SUCCESS_CACHED pages on
> a special list, instead of calling pageout() on them immediately, and
> then calling pageout() on all the pages on that list after the batch
> flush.
> 

True. We had discussed something like that on IRC. It's a good idea but
a separate patch because it's less clear-cut. We're taking a partial pass
through the list in an attempt to reduce IPIs.

-- 
Mel Gorman
SUSE Labs

next prev parent reply	other threads:[~2015-04-21 21:17 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-21 10:41 [RFC PATCH 0/6] TLB flush multiple pages with a single IPI v3 Mel Gorman
2015-04-21 10:41 ` Mel Gorman
2015-04-21 10:41 ` [PATCH 1/6] x86, mm: Trace when an IPI is about to be sent Mel Gorman
2015-04-21 10:41   ` Mel Gorman
2015-04-21 10:41 ` [PATCH 2/6] mm: Send a single IPI to TLB flush multiple pages when unmapping Mel Gorman
2015-04-21 10:41   ` Mel Gorman
2015-04-21 10:41 ` [PATCH 3/6] mm: Defer TLB flush after unmap as long as possible Mel Gorman
2015-04-21 10:41   ` Mel Gorman
2015-04-21 20:31   ` Rik van Riel
2015-04-21 20:31     ` Rik van Riel
2015-04-21 21:17     ` Mel Gorman [this message]
2015-04-21 21:17       ` Mel Gorman
2015-04-21 10:41 ` [PATCH 4/6] mm, migrate: Drop references to successfully migrated pages at the same time Mel Gorman
2015-04-21 10:41   ` Mel Gorman
2015-04-21 10:41 ` [PATCH 5/6] mm, migrate: Batch TLB flushing when unmapping pages for migration Mel Gorman
2015-04-21 10:41   ` Mel Gorman
2015-04-21 10:41 ` [PATCH 6/6] mm: Gather more PFNs before sending a TLB to flush unmapped pages Mel Gorman
2015-04-21 10:41   ` Mel Gorman
2015-04-24 14:46 ` [RFC PATCH 0/6] TLB flush multiple pages with a single IPI v3 Vlastimil Babka
2015-04-24 14:46   ` Vlastimil Babka
2015-04-24 15:16   ` Mel Gorman
2015-04-24 15:16     ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150421211704.GC14842@suse.de \
    --to=mgorman@suse.de \
    --cc=andi@firstfloor.org \
    --cc=dave.hansen@intel.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.