linux-mm.kvack.org archive mirror
From: aarcange@redhat.com
To: linux-mm@kvack.org
Cc: Mel Gorman <mel@csn.ul.ie>, Johannes Weiner <jweiner@redhat.com>,
	Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>
Subject: [PATCH 2 of 3] mremap: avoid sending one IPI per page
Date: Sat, 06 Aug 2011 18:58:04 +0200	[thread overview]
Message-ID: <cbe9e822c59a912e9f76.1312649884@localhost> (raw)
In-Reply-To: <patchbomb.1312649882@localhost>

From: Andrea Arcangeli <aarcange@redhat.com>

This replaces ptep_clear_flush() with ptep_get_and_clear() and a single
flush_tlb_range() at the end of the loop, to avoid sending one IPI for each
page.

The mmu_notifier_invalidate_range_start/end section is enlarged accordingly, but
this does not fundamentally change things. It was more by accident that the
region under mremap remained mostly available to secondary MMUs: the primary
MMU was never allowed to reliably access that region for the duration of the
mremap (short of trapping SIGSEGV on the old address range, which sounds
impractical and flaky). If users want secondary MMUs not to lose access to a
large region under mremap, they should reduce the mremap size accordingly in
userland and run multiple calls. Overall this will run faster, so it actually
reduces the time the region is under mremap for the primary MMU, which should
be a net benefit to apps.

For KVM this is a noop because guest physical memory is never mremapped;
there is simply no point in moving it while the guest runs. One target of this
optimization is the JVM garbage collector (so unrelated to the mmu notifier
logic).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---

diff --git a/mm/mremap.c b/mm/mremap.c
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -80,11 +80,7 @@ static void move_ptes(struct vm_area_str
 	struct mm_struct *mm = vma->vm_mm;
 	pte_t *old_pte, *new_pte, pte;
 	spinlock_t *old_ptl, *new_ptl;
-	unsigned long old_start;
 
-	old_start = old_addr;
-	mmu_notifier_invalidate_range_start(vma->vm_mm,
-					    old_start, old_end);
 	if (vma->vm_file) {
 		/*
 		 * Subtle point from Rajesh Venkatasubramanian: before
@@ -111,7 +107,7 @@ static void move_ptes(struct vm_area_str
 				   new_pte++, new_addr += PAGE_SIZE) {
 		if (pte_none(*old_pte))
 			continue;
-		pte = ptep_clear_flush(vma, old_addr, old_pte);
+		pte = ptep_get_and_clear(mm, old_addr, old_pte);
 		pte = move_pte(pte, new_vma->vm_page_prot, old_addr, new_addr);
 		set_pte_at(mm, new_addr, new_pte, pte);
 	}
@@ -123,7 +119,6 @@ static void move_ptes(struct vm_area_str
 	pte_unmap_unlock(old_pte - 1, old_ptl);
 	if (mapping)
 		mutex_unlock(&mapping->i_mmap_mutex);
-	mmu_notifier_invalidate_range_end(vma->vm_mm, old_start, old_end);
 }
 
 #define LATENCY_LIMIT	(64 * PAGE_SIZE)
@@ -134,10 +129,13 @@ unsigned long move_page_tables(struct vm
 {
 	unsigned long extent, next, old_end;
 	pmd_t *old_pmd, *new_pmd;
+	bool need_flush = false;
 
 	old_end = old_addr + len;
 	flush_cache_range(vma, old_addr, old_end);
 
+	mmu_notifier_invalidate_range_start(vma->vm_mm, old_addr, old_end);
+
 	for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
 		cond_resched();
 		next = (old_addr + PMD_SIZE) & PMD_MASK;
@@ -158,7 +156,12 @@ unsigned long move_page_tables(struct vm
 			extent = LATENCY_LIMIT;
 		move_ptes(vma, old_pmd, old_addr, old_addr + extent,
 				new_vma, new_pmd, new_addr);
+		need_flush = true;
 	}
+	if (likely(need_flush))
+		flush_tlb_range(vma, old_end-len, old_addr);
+
+	mmu_notifier_invalidate_range_end(vma->vm_mm, old_end-len, old_end);
 
 	return len + old_addr - old_end;	/* how much done */
 }


