* [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas
@ 2013-08-05 14:31 Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 1/6] mm: putback_lru_page: remove unnecessary call to page_lru_base_type() Vlastimil Babka
` (7 more replies)
0 siblings, 8 replies; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-05 14:31 UTC (permalink / raw)
To: joern; +Cc: mgorman, linux-mm, Vlastimil Babka
Hi everyone and apologies for any mistakes in my first attempt at linux-mm
contribution :)
The goal of this patch series is to improve the performance of munlock() of
large mlocked memory areas on systems without THP. This is motivated by
reports of very long crash recovery times for processes with such areas,
where munlock() can take several seconds. See http://lwn.net/Articles/548108/
The work was driven by a simple benchmark (to be included in mmtests) that
mmap()s e.g. 56GB with MAP_LOCKED | MAP_POPULATE and measures the time of
munlock(). Profiling was performed by attaching operf --pid to the process
and sending a signal to trigger the munlock() part; the process then notifies
the monitoring wrapper back to stop operf, so that only munlock() appears in
the profile.
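For illustration, the timed part of the benchmark is roughly equivalent to
the following minimal standalone sketch (the actual mmtests harness adds the
operf/signal wrapping described above; the code below is only an illustrative
approximation and needs root or a sufficient RLIMIT_MEMLOCK):

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <sys/mman.h>

int main(void)
{
	/* 56GB as in the measurements below; adjust to available memory */
	size_t size = 56UL << 30;
	struct timespec t1, t2;

	void *area = mmap(NULL, size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_LOCKED | MAP_POPULATE,
			-1, 0);
	if (area == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* only the munlock() call itself is timed */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	if (munlock(area, size)) {
		perror("munlock");
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t2);

	printf("munlock: %.2f s\n", (t2.tv_sec - t1.tv_sec)
			+ (t2.tv_nsec - t1.tv_nsec) / 1e9);
	return 0;
}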
The profiles have shown that CPU time is spent mostly on atomic operations
and locking, which the patches aim to reduce, progressing from simpler to
more complex changes.
Patch 1 performs a simple cleanup in putback_lru_page() so that the page lru
base type is not determined when it is not actually needed.
Patch 2 removes an unnecessary call to lru_add_drain() which drains the per-cpu
pagevec after each munlocked page is put there.
Patch 3 changes munlock_vma_range() to use an on-stack pagevec for isolating
multiple non-THP pages under a single lru_lock instead of locking and
processing each page separately.
Patch 4 changes the NR_MLOCK accounting to be done only once per pagevec
introduced by the previous patch.
Patch 5 uses the introduced pagevec to also batch the work of
putback_lru_page() when possible, bypassing the per-cpu pvec and the
associated overhead.
Patch 6 removes a redundant get_page/put_page pair, which saves costly atomic
operations.
Measurements were made using 3.11-rc3 as a baseline.
timedmunlock
3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3
0 1 2 3 4 5 6
Elapsed min 3.38 ( 0.00%) 3.39 ( -0.14%) 3.00 ( 11.35%) 2.73 ( 19.48%) 2.72 ( 19.50%) 2.34 ( 30.78%) 2.16 ( 36.23%)
Elapsed mean 3.39 ( 0.00%) 3.39 ( -0.05%) 3.01 ( 11.25%) 2.73 ( 19.54%) 2.73 ( 19.41%) 2.36 ( 30.30%) 2.17 ( 36.00%)
Elapsed stddev 0.01 ( 0.00%) 0.00 ( 71.98%) 0.01 (-71.14%) 0.00 ( 89.12%) 0.01 (-48.55%) 0.03 (-277.27%) 0.01 (-85.75%)
Elapsed max 3.41 ( 0.00%) 3.40 ( 0.39%) 3.04 ( 10.81%) 2.73 ( 19.96%) 2.76 ( 19.09%) 2.43 ( 28.64%) 2.20 ( 35.41%)
Elapsed range 0.02 ( 0.00%) 0.01 ( 74.99%) 0.04 (-66.12%) 0.00 ( 88.12%) 0.03 (-39.24%) 0.09 (-274.85%) 0.04 (-81.04%)
Vlastimil Babka (6):
mm: putback_lru_page: remove unnecessary call to page_lru_base_type()
mm: munlock: remove unnecessary call to lru_add_drain()
mm: munlock: batch non-THP page isolation and munlock+putback using
pagevec
mm: munlock: batch NR_MLOCK zone state updates
mm: munlock: bypass per-cpu pvec for putback_lru_page
mm: munlock: remove redundant get_page/put_page pair on the fast path
mm/mlock.c | 259 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
mm/vmscan.c | 12 +--
2 files changed, 224 insertions(+), 47 deletions(-)
--
1.8.1.4
* [RFC PATCH 1/6] mm: putback_lru_page: remove unnecessary call to page_lru_base_type()
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
@ 2013-08-05 14:32 ` Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 2/6] mm: munlock: remove unnecessary call to lru_add_drain() Vlastimil Babka
` (6 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-05 14:32 UTC (permalink / raw)
To: joern; +Cc: mgorman, linux-mm, Vlastimil Babka
Since commit c53954a092 ("mm: remove lru parameter from __lru_cache_add and
lru_cache_add_lru"), putback_lru_page() no longer needs to determine the lru
list via page_lru_base_type().
This patch replaces it with a simple flag is_unevictable which says whether
the page was put on the unevictable list. This is the only information that
matters in subsequent tests.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/vmscan.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2cff0d4..0fa537e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -545,7 +545,7 @@ int remove_mapping(struct address_space *mapping, struct page *page)
*/
void putback_lru_page(struct page *page)
{
- int lru;
+ bool is_unevictable;
int was_unevictable = PageUnevictable(page);
VM_BUG_ON(PageLRU(page));
@@ -560,14 +560,14 @@ redo:
* unevictable page on [in]active list.
* We know how to handle that.
*/
- lru = page_lru_base_type(page);
+ is_unevictable = false;
lru_cache_add(page);
} else {
/*
* Put unevictable pages directly on zone's unevictable
* list.
*/
- lru = LRU_UNEVICTABLE;
+ is_unevictable = true;
add_page_to_unevictable_list(page);
/*
* When racing with an mlock or AS_UNEVICTABLE clearing
@@ -587,7 +587,7 @@ redo:
* page is on unevictable list, it never be freed. To avoid that,
* check after we added it to the list, again.
*/
- if (lru == LRU_UNEVICTABLE && page_evictable(page)) {
+ if (is_unevictable && page_evictable(page)) {
if (!isolate_lru_page(page)) {
put_page(page);
goto redo;
@@ -598,9 +598,9 @@ redo:
*/
}
- if (was_unevictable && lru != LRU_UNEVICTABLE)
+ if (was_unevictable && !is_unevictable)
count_vm_event(UNEVICTABLE_PGRESCUED);
- else if (!was_unevictable && lru == LRU_UNEVICTABLE)
+ else if (!was_unevictable && is_unevictable)
count_vm_event(UNEVICTABLE_PGCULLED);
put_page(page); /* drop ref from isolate */
--
1.8.1.4
* [RFC PATCH 2/6] mm: munlock: remove unnecessary call to lru_add_drain()
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 1/6] mm: putback_lru_page: remove unnecessary call to page_lru_base_type() Vlastimil Babka
@ 2013-08-05 14:32 ` Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec Vlastimil Babka
` (5 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-05 14:32 UTC (permalink / raw)
To: joern; +Cc: mgorman, linux-mm, Vlastimil Babka
In munlock_vma_range(), lru_add_drain() is currently called in a loop before
each munlock_vma_page() call.
This is suboptimal for performance when munlocking many pages. The benefit
of the per-cpu pagevec for batching the LRU putback is lost, since the pagevec
holds at most one page from the previous loop's iteration.
The lru_add_drain() call also does not serve any purpose for correctness - it
does not even drain the pagevecs of all CPUs. The munlock code already expects
and handles situations where a page cannot be isolated from the LRU (e.g.
because it is on some per-cpu pagevec).
The history of the (uncommented) call also suggests that it appears there as
an oversight rather than intentionally. Before commit ff6a6da6 ("mm: accelerate
munlock() treatment of THP pages") the call happened only once upon entering
the function. That commit moved the call into the while loop. So while the
other changes in the commit improved munlock performance for THP pages, it
introduced the above-mentioned suboptimal per-cpu pagevec usage.
Further in history, before commit 408e82b7 ("mm: munlock use follow_page"),
munlock_vma_pages_range() was just a wrapper around __mlock_vma_pages_range
which performed both mlock and munlock depending on a flag. However, before
ba470de4 ("mmap: handle mlocked pages during map, remap, unmap") the function
handled only mlock, not munlock. The lru_add_drain call thus comes from the
implementation in commit b291f000 ("mlock: mlocked pages are unevictable") and
was intended only for mlocking, not munlocking. The original intention of
draining the LRU pagevec at mlock time was to ensure the pages were on the LRU
before the lock operation so that they could be placed on the unevictable list
immediately. There is very little motivation to do the same in the munlock
path, particularly for every single page.
This patch therefore removes the call completely. After removing the call, a
10% speedup was measured for munlock() of a 56GB large memory area with THP
disabled.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/mlock.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index 79b7cf7..b85f1e8 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -247,7 +247,6 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
&page_mask);
if (page && !IS_ERR(page)) {
lock_page(page);
- lru_add_drain();
/*
* Any THP page found by follow_page_mask() may have
* gotten split before reaching munlock_vma_page(),
--
1.8.1.4
* [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 1/6] mm: putback_lru_page: remove unnecessary call to page_lru_base_type() Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 2/6] mm: munlock: remove unnecessary call to lru_add_drain() Vlastimil Babka
@ 2013-08-05 14:32 ` Vlastimil Babka
2013-08-05 17:21 ` Jörn Engel
2013-08-05 14:32 ` [RFC PATCH 4/6] mm: munlock: batch NR_MLOCK zone state updates Vlastimil Babka
` (4 subsequent siblings)
7 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-05 14:32 UTC (permalink / raw)
To: joern; +Cc: mgorman, linux-mm, Vlastimil Babka
Currently, munlock_vma_range() calls munlock_vma_page() on each page in a
loop, which results in repeated taking and releasing of the lru_lock spinlock
for isolating pages one by one. This patch batches the munlock operations
using an on-stack pagevec, so that the isolation is done under a single
lru_lock. For THP pages, the old behavior is preserved as they might be split
while being put into the pagevec. After this patch, a 9% speedup was measured
for munlocking a 56GB large memory area with THP disabled.
A new function __munlock_pagevec() is introduced that takes a pagevec and:
1) It clears PageMlocked and isolates all pages under the lru_lock. Zone page
stats can also be updated using the variant which assumes disabled interrupts.
2) It finishes the munlock and lru putback on all pages under their lock_page.
Note that previously, lock_page also covered the PageMlocked clearing and page
isolation, but it is not needed for those operations.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/mlock.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 157 insertions(+), 40 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index b85f1e8..08689b6 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -11,6 +11,7 @@
#include <linux/swap.h>
#include <linux/swapops.h>
#include <linux/pagemap.h>
+#include <linux/pagevec.h>
#include <linux/mempolicy.h>
#include <linux/syscalls.h>
#include <linux/sched.h>
@@ -18,6 +19,8 @@
#include <linux/rmap.h>
#include <linux/mmzone.h>
#include <linux/hugetlb.h>
+#include <linux/memcontrol.h>
+#include <linux/mm_inline.h>
#include "internal.h"
@@ -87,6 +90,47 @@ void mlock_vma_page(struct page *page)
}
}
+/*
+ * Finish munlock after successful page isolation
+ *
+ * Page must be locked. This is a wrapper for try_to_munlock()
+ * and putback_lru_page() with munlock accounting.
+ */
+static void __munlock_isolated_page(struct page *page)
+{
+ int ret = SWAP_AGAIN;
+
+ /*
+ * Optimization: if the page was mapped just once, that's our mapping
+ * and we don't need to check all the other vmas.
+ */
+ if (page_mapcount(page) > 1)
+ ret = try_to_munlock(page);
+
+ /* Did try_to_unlock() succeed or punt? */
+ if (ret != SWAP_MLOCK)
+ count_vm_event(UNEVICTABLE_PGMUNLOCKED);
+
+ putback_lru_page(page);
+}
+
+/*
+ * Accounting for page isolation fail during munlock
+ *
+ * Performs accounting when page isolation fails in munlock. There is nothing
+ * else to do because it means some other task has already removed the page
+ * from the LRU. putback_lru_page() will take care of removing the page from
+ * the unevictable list, if necessary. vmscan [page_referenced()] will move
+ * the page back to the unevictable list if some other vma has it mlocked.
+ */
+static void __munlock_isolation_failed(struct page *page)
+{
+ if (PageUnevictable(page))
+ count_vm_event(UNEVICTABLE_PGSTRANDED);
+ else
+ count_vm_event(UNEVICTABLE_PGMUNLOCKED);
+}
+
/**
* munlock_vma_page - munlock a vma page
* @page - page to be unlocked
@@ -112,37 +156,10 @@ unsigned int munlock_vma_page(struct page *page)
unsigned int nr_pages = hpage_nr_pages(page);
mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
page_mask = nr_pages - 1;
- if (!isolate_lru_page(page)) {
- int ret = SWAP_AGAIN;
-
- /*
- * Optimization: if the page was mapped just once,
- * that's our mapping and we don't need to check all the
- * other vmas.
- */
- if (page_mapcount(page) > 1)
- ret = try_to_munlock(page);
- /*
- * did try_to_unlock() succeed or punt?
- */
- if (ret != SWAP_MLOCK)
- count_vm_event(UNEVICTABLE_PGMUNLOCKED);
-
- putback_lru_page(page);
- } else {
- /*
- * Some other task has removed the page from the LRU.
- * putback_lru_page() will take care of removing the
- * page from the unevictable list, if necessary.
- * vmscan [page_referenced()] will move the page back
- * to the unevictable list if some other vma has it
- * mlocked.
- */
- if (PageUnevictable(page))
- count_vm_event(UNEVICTABLE_PGSTRANDED);
- else
- count_vm_event(UNEVICTABLE_PGMUNLOCKED);
- }
+ if (!isolate_lru_page(page))
+ __munlock_isolated_page(page);
+ else
+ __munlock_isolation_failed(page);
}
return page_mask;
@@ -210,6 +227,74 @@ static int __mlock_posix_error_return(long retval)
}
/*
+ * Munlock a batch of pages from the same zone
+ *
+ * The work is split to two main phases. First phase clears the Mlocked flag
+ * and attempts to isolate the pages, all under a single zone lru lock.
+ * The second phase finishes the munlock only for pages where isolation
+ * succeeded.
+ */
+static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
+{
+ int i;
+ int nr = pagevec_count(pvec);
+
+ /* Phase 1: page isolation */
+ spin_lock_irq(&zone->lru_lock);
+ for (i = 0; i < nr; i++) {
+ struct page *page = pvec->pages[i];
+
+ if (TestClearPageMlocked(page)) {
+ struct lruvec *lruvec;
+ int lru;
+
+ /* we have disabled interrupts */
+ __mod_zone_page_state(zone, NR_MLOCK, -1);
+
+ switch (__isolate_lru_page(page,
+ ISOLATE_UNEVICTABLE)) {
+ case 0:
+ lruvec = mem_cgroup_page_lruvec(page, zone);
+ lru = page_lru(page);
+ del_page_from_lru_list(page, lruvec, lru);
+ break;
+
+ case -EINVAL:
+ __munlock_isolation_failed(page);
+ goto skip_munlock;
+
+ default:
+ BUG();
+ }
+ } else {
+skip_munlock:
+ /*
+ * We won't be munlocking this page in the next phase
+ * but we still need to release the follow_page_mask()
+ * pin.
+ */
+ pvec->pages[i] = NULL;
+ put_page(page);
+ }
+ }
+ spin_unlock_irq(&zone->lru_lock);
+
+ /* Phase 2: page munlock and putback */
+ for (i = 0; i < nr; i++) {
+ struct page *page = pvec->pages[i];
+
+ if (unlikely(!page))
+ continue;
+
+ lock_page(page);
+ __munlock_isolated_page(page);
+ unlock_page(page);
+ put_page(page); /* pin from follow_page_mask() */
+ }
+ pagevec_reinit(pvec);
+}
+
+/*
* munlock_vma_pages_range() - munlock all pages in the vma range.'
* @vma - vma containing range to be munlock()ed.
* @start - start address in @vma of the range
@@ -230,11 +315,16 @@ static int __mlock_posix_error_return(long retval)
void munlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
+ struct pagevec pvec;
+ struct zone *zone = NULL;
+
+ pagevec_init(&pvec, 0);
vma->vm_flags &= ~VM_LOCKED;
while (start < end) {
struct page *page;
unsigned int page_mask, page_increm;
+ struct zone *pagezone;
/*
* Although FOLL_DUMP is intended for get_dump_page(),
@@ -246,20 +336,47 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
page = follow_page_mask(vma, start, FOLL_GET | FOLL_DUMP,
&page_mask);
if (page && !IS_ERR(page)) {
- lock_page(page);
- /*
- * Any THP page found by follow_page_mask() may have
- * gotten split before reaching munlock_vma_page(),
- * so we need to recompute the page_mask here.
- */
- page_mask = munlock_vma_page(page);
- unlock_page(page);
- put_page(page);
+ pagezone = page_zone(page);
+ /* The whole pagevec must be in the same zone */
+ if (pagezone != zone) {
+ if (pagevec_count(&pvec))
+ __munlock_pagevec(&pvec, zone);
+ zone = pagezone;
+ }
+ if (PageTransHuge(page)) {
+ /*
+ * THP pages are not handled by pagevec due
+ * to their possible split (see below).
+ */
+ if (pagevec_count(&pvec))
+ __munlock_pagevec(&pvec, zone);
+ lock_page(page);
+ /*
+ * Any THP page found by follow_page_mask() may
+ * have gotten split before reaching
+ * munlock_vma_page(), so we need to recompute
+ * the page_mask here.
+ */
+ page_mask = munlock_vma_page(page);
+ unlock_page(page);
+ put_page(page); /* follow_page_mask() */
+ } else {
+ /*
+ * Non-huge pages are handled in batches
+ * via pagevec. The pin from
+ * follow_page_mask() prevents them from
+ * collapsing by THP.
+ */
+ if (pagevec_add(&pvec, page) == 0)
+ __munlock_pagevec(&pvec, zone);
+ }
}
page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask);
start += page_increm * PAGE_SIZE;
cond_resched();
}
+ if (pagevec_count(&pvec))
+ __munlock_pagevec(&pvec, zone);
}
/*
--
1.8.1.4
* [RFC PATCH 4/6] mm: munlock: batch NR_MLOCK zone state updates
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
` (2 preceding siblings ...)
2013-08-05 14:32 ` [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec Vlastimil Babka
@ 2013-08-05 14:32 ` Vlastimil Babka
2013-08-05 17:23 ` Jörn Engel
2013-08-05 14:32 ` [RFC PATCH 5/6] mm: munlock: bypass per-cpu pvec for putback_lru_page Vlastimil Babka
` (3 subsequent siblings)
7 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-05 14:32 UTC (permalink / raw)
To: joern; +Cc: mgorman, linux-mm, Vlastimil Babka
Depending on the previous patch, which introduced batched isolation in
munlock_vma_range(), we can also batch the updates of the NR_MLOCK
page stats. After the whole pagevec is processed for page isolation,
the stats are updated only once with the number of successful isolations.
There were, however, no measurable performance gains.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/mlock.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index 08689b6..d112e06 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -238,6 +238,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
{
int i;
int nr = pagevec_count(pvec);
+ int delta_munlocked = -nr;
/* Phase 1: page isolation */
spin_lock_irq(&zone->lru_lock);
@@ -248,9 +249,6 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
struct lruvec *lruvec;
int lru;
- /* we have disabled interrupts */
- __mod_zone_page_state(zone, NR_MLOCK, -1);
-
switch (__isolate_lru_page(page,
ISOLATE_UNEVICTABLE)) {
case 0:
@@ -275,8 +273,10 @@ skip_munlock:
*/
pvec->pages[i] = NULL;
put_page(page);
+ delta_munlocked++;
}
}
+ __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
spin_unlock_irq(&zone->lru_lock);
/* Phase 2: page munlock and putback */
--
1.8.1.4
* [RFC PATCH 5/6] mm: munlock: bypass per-cpu pvec for putback_lru_page
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
` (3 preceding siblings ...)
2013-08-05 14:32 ` [RFC PATCH 4/6] mm: munlock: batch NR_MLOCK zone state updates Vlastimil Babka
@ 2013-08-05 14:32 ` Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 6/6] mm: munlock: remove redundant get_page/put_page pair on the fast path Vlastimil Babka
` (2 subsequent siblings)
7 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-05 14:32 UTC (permalink / raw)
To: joern; +Cc: mgorman, linux-mm, Vlastimil Babka
After introducing batching by pagevecs into munlock_vma_range(), we can
further improve performance by bypassing the copying into the per-cpu pagevec
and the get_page/put_page pair associated with that. Instead we perform LRU
putback directly from our pagevec. However, this is possible only for
single-mapped pages that are evictable after munlock. Unevictable pages
require rechecking after being put on the unevictable list, so for those we
fall back to putback_lru_page(), which handles that.
After this patch, a 13% speedup was measured for munlocking a 56GB large memory
area with THP disabled.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/mlock.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 72 insertions(+), 13 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index d112e06..5c38475 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -227,6 +227,49 @@ static int __mlock_posix_error_return(long retval)
}
/*
+ * Prepare page for fast batched LRU putback via putback_lru_evictable_pagevec()
+ *
+ * The fast path is available only for evictable pages with single mapping.
+ * Then we can bypass the per-cpu pvec and get better performance.
+ * when mapcount > 1 we need try_to_munlock() which can fail.
+ * when !page_evictable(), we need the full redo logic of putback_lru_page to
+ * avoid leaving evictable page in unevictable list.
+ *
+ * In case of success, @page is added to @pvec and @pgrescued is incremented
+ * in case that the page was previously unevictable. @page is also unlocked.
+ */
+static bool __putback_lru_fast_prepare(struct page *page, struct pagevec *pvec,
+ int *pgrescued)
+{
+ VM_BUG_ON(PageLRU(page));
+ VM_BUG_ON(!PageLocked(page));
+
+ if (page_mapcount(page) <= 1 && page_evictable(page)) {
+ pagevec_add(pvec, page);
+ if (TestClearPageUnevictable(page))
+ *pgrescued++;
+ unlock_page(page);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Putback multiple evictable pages to the LRU
+ *
+ * Batched putback of evictable pages that bypasses the per-cpu pvec. Some of
+ * the pages might have meanwhile become unevictable but that is OK.
+ */
+static void __putback_lru_fast(struct pagevec *pvec, int pgrescued)
+{
+ count_vm_events(UNEVICTABLE_PGMUNLOCKED, pagevec_count(pvec));
+ /* This includes put_page so we don't call it explicitly */
+ __pagevec_lru_add(pvec);
+ count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
+}
+
+/*
* Munlock a batch of pages from the same zone
*
* The work is split to two main phases. First phase clears the Mlocked flag
@@ -239,6 +282,8 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
int i;
int nr = pagevec_count(pvec);
int delta_munlocked = -nr;
+ struct pagevec pvec_putback;
+ int pgrescued = 0;
/* Phase 1: page isolation */
spin_lock_irq(&zone->lru_lock);
@@ -249,21 +294,18 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
struct lruvec *lruvec;
int lru;
- switch (__isolate_lru_page(page,
- ISOLATE_UNEVICTABLE)) {
- case 0:
+ if (PageLRU(page)) {
lruvec = mem_cgroup_page_lruvec(page, zone);
lru = page_lru(page);
- del_page_from_lru_list(page, lruvec, lru);
- break;
- case -EINVAL:
+ get_page(page);
+ ClearPageLRU(page);
+ del_page_from_lru_list(page, lruvec, lru);
+ } else {
__munlock_isolation_failed(page);
goto skip_munlock;
-
- default:
- BUG();
}
+
} else {
skip_munlock:
/*
@@ -279,7 +321,8 @@ skip_munlock:
__mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
spin_unlock_irq(&zone->lru_lock);
- /* Phase 2: page munlock and putback */
+ /* Phase 2: page munlock */
+ pagevec_init(&pvec_putback, 0);
for (i = 0; i < nr; i++) {
struct page *page = pvec->pages[i];
@@ -287,10 +330,26 @@ skip_munlock:
continue;
lock_page(page);
- __munlock_isolated_page(page);
- unlock_page(page);
- put_page(page); /* pin from follow_page_mask() */
+ if (!__putback_lru_fast_prepare(page, &pvec_putback,
+ &pgrescued)) {
+ /* Slow path */
+ __munlock_isolated_page(page);
+ unlock_page(page);
+ }
}
+
+ /* Phase 3: page putback for pages that qualified for the fast path */
+ if (pagevec_count(&pvec_putback))
+ __putback_lru_fast(&pvec_putback, pgrescued);
+
+ /* Phase 4: put_page to return pin from follow_page_mask() */
+ for (i = 0; i < nr; i++) {
+ struct page *page = pvec->pages[i];
+
+ if (likely(page))
+ put_page(page);
+ }
+
pagevec_reinit(pvec);
}
--
1.8.1.4
* [RFC PATCH 6/6] mm: munlock: remove redundant get_page/put_page pair on the fast path
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
` (4 preceding siblings ...)
2013-08-05 14:32 ` [RFC PATCH 5/6] mm: munlock: bypass per-cpu pvec for putback_lru_page Vlastimil Babka
@ 2013-08-05 14:32 ` Vlastimil Babka
2013-08-05 17:31 ` [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Jörn Engel
2013-08-06 16:39 ` Jörn Engel
7 siblings, 0 replies; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-05 14:32 UTC (permalink / raw)
To: joern; +Cc: mgorman, linux-mm, Vlastimil Babka
The performance of the fast path in munlock_vma_range() can be further
improved by avoiding the atomic ops of a redundant get_page()/put_page() pair.
When calling get_page() during page isolation, we already have the pin from
follow_page_mask(). This pin will then be returned by __pagevec_lru_add(),
after which we do not reference the pages anymore.
After this patch, an 8% speedup was measured for munlocking a 56GB large memory
area with THP disabled.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
mm/mlock.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index 5c38475..b0e897a 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -297,8 +297,10 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
if (PageLRU(page)) {
lruvec = mem_cgroup_page_lruvec(page, zone);
lru = page_lru(page);
-
- get_page(page);
+ /*
+ * We already have pin from follow_page_mask()
+ * so we can spare the get_page() here.
+ */
ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, lru);
} else {
@@ -332,24 +334,24 @@ skip_munlock:
lock_page(page);
if (!__putback_lru_fast_prepare(page, &pvec_putback,
&pgrescued)) {
- /* Slow path */
+ /*
+ * Slow path. We don't want to lose the last pin
+ * before unlock_page()
+ */
+ get_page(page); /* for putback_lru_page() */
__munlock_isolated_page(page);
unlock_page(page);
+ put_page(page); /* from follow_page_mask() */
}
}
- /* Phase 3: page putback for pages that qualified for the fast path */
+ /*
+ * Phase 3: page putback for pages that qualified for the fast path
+ * This will also call put_page() to return pin from follow_page_mask()
+ */
if (pagevec_count(&pvec_putback))
__putback_lru_fast(&pvec_putback, pgrescued);
- /* Phase 4: put_page to return pin from follow_page_mask() */
- for (i = 0; i < nr; i++) {
- struct page *page = pvec->pages[i];
-
- if (likely(page))
- put_page(page);
- }
-
pagevec_reinit(pvec);
}
--
1.8.1.4
* Re: [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec
2013-08-05 14:32 ` [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec Vlastimil Babka
@ 2013-08-05 17:21 ` Jörn Engel
2013-08-06 13:27 ` Vlastimil Babka
0 siblings, 1 reply; 13+ messages in thread
From: Jörn Engel @ 2013-08-05 17:21 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: mgorman, linux-mm
On Mon, 5 August 2013 16:32:02 +0200, Vlastimil Babka wrote:
>
> /*
> + * Munlock a batch of pages from the same zone
> + *
> + * The work is split to two main phases. First phase clears the Mlocked flag
> + * and attempts to isolate the pages, all under a single zone lru lock.
> + * The second phase finishes the munlock only for pages where isolation
> + * succeeded.
> + */
> +static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
> +{
> + int i;
> + int nr = pagevec_count(pvec);
> +
> + /* Phase 1: page isolation */
> + spin_lock_irq(&zone->lru_lock);
> + for (i = 0; i < nr; i++) {
> + struct page *page = pvec->pages[i];
> +
> + if (TestClearPageMlocked(page)) {
> + struct lruvec *lruvec;
> + int lru;
> +
> + /* we have disabled interrupts */
> + __mod_zone_page_state(zone, NR_MLOCK, -1);
> +
> + switch (__isolate_lru_page(page,
> + ISOLATE_UNEVICTABLE)) {
> + case 0:
> + lruvec = mem_cgroup_page_lruvec(page, zone);
> + lru = page_lru(page);
> + del_page_from_lru_list(page, lruvec, lru);
> + break;
> +
> + case -EINVAL:
> + __munlock_isolation_failed(page);
> + goto skip_munlock;
> +
> + default:
> + BUG();
> + }
On purely aesthetic grounds I don't like the switch too much. A bit
more serious is that you don't handle -EBUSY gracefully. I guess you
would have to mlock() the empty zero page to exercise this code path.
> + } else {
> +skip_munlock:
> + /*
> + * We won't be munlocking this page in the next phase
> + * but we still need to release the follow_page_mask()
> + * pin.
> + */
> + pvec->pages[i] = NULL;
> + put_page(page);
> + }
> + }
> + spin_unlock_irq(&zone->lru_lock);
> +
> + /* Phase 2: page munlock and putback */
> + for (i = 0; i < nr; i++) {
> + struct page *page = pvec->pages[i];
> +
> + if (unlikely(!page))
> + continue;
Whenever I see likely() or unlikely() I wonder whether it really makes
a difference or whether it is just cargo-cult programming. My best
guess is that about half of them are cargo-cult.
> + lock_page(page);
> + __munlock_isolated_page(page);
> + unlock_page(page);
> + put_page(page); /* pin from follow_page_mask() */
> + }
> + pagevec_reinit(pvec);
> +}
> +
> +/*
> * munlock_vma_pages_range() - munlock all pages in the vma range.'
> * @vma - vma containing range to be munlock()ed.
> * @start - start address in @vma of the range
> @@ -230,11 +315,16 @@ static int __mlock_posix_error_return(long retval)
> void munlock_vma_pages_range(struct vm_area_struct *vma,
> unsigned long start, unsigned long end)
> {
> + struct pagevec pvec;
> + struct zone *zone = NULL;
> +
> + pagevec_init(&pvec, 0);
> vma->vm_flags &= ~VM_LOCKED;
>
> while (start < end) {
> struct page *page;
> unsigned int page_mask, page_increm;
> + struct zone *pagezone;
>
> /*
> * Although FOLL_DUMP is intended for get_dump_page(),
> @@ -246,20 +336,47 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
> page = follow_page_mask(vma, start, FOLL_GET | FOLL_DUMP,
> &page_mask);
> if (page && !IS_ERR(page)) {
> - lock_page(page);
> - /*
> - * Any THP page found by follow_page_mask() may have
> - * gotten split before reaching munlock_vma_page(),
> - * so we need to recompute the page_mask here.
> - */
> - page_mask = munlock_vma_page(page);
> - unlock_page(page);
> - put_page(page);
> + pagezone = page_zone(page);
> + /* The whole pagevec must be in the same zone */
> + if (pagezone != zone) {
> + if (pagevec_count(&pvec))
> + __munlock_pagevec(&pvec, zone);
> + zone = pagezone;
> + }
> + if (PageTransHuge(page)) {
> + /*
> + * THP pages are not handled by pagevec due
> + * to their possible split (see below).
> + */
> + if (pagevec_count(&pvec))
> + __munlock_pagevec(&pvec, zone);
Should you re-initialize the pvec after this call?
> + lock_page(page);
> + /*
> + * Any THP page found by follow_page_mask() may
> + * have gotten split before reaching
> + * munlock_vma_page(), so we need to recompute
> + * the page_mask here.
> + */
> + page_mask = munlock_vma_page(page);
> + unlock_page(page);
> + put_page(page); /* follow_page_mask() */
> + } else {
> + /*
> + * Non-huge pages are handled in batches
> + * via pagevec. The pin from
> + * follow_page_mask() prevents them from
> + * collapsing by THP.
> + */
> + if (pagevec_add(&pvec, page) == 0)
> + __munlock_pagevec(&pvec, zone);
> + }
> }
> page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask);
> start += page_increm * PAGE_SIZE;
> cond_resched();
> }
> + if (pagevec_count(&pvec))
> + __munlock_pagevec(&pvec, zone);
> }
The rest looks good to my untrained eyes.
Jörn
--
One of my most productive days was throwing away 1000 lines of code.
-- Ken Thompson.
* Re: [RFC PATCH 4/6] mm: munlock: batch NR_MLOCK zone state updates
2013-08-05 14:32 ` [RFC PATCH 4/6] mm: munlock: batch NR_MLOCK zone state updates Vlastimil Babka
@ 2013-08-05 17:23 ` Jörn Engel
0 siblings, 0 replies; 13+ messages in thread
From: Jörn Engel @ 2013-08-05 17:23 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: mgorman, linux-mm
On Mon, 5 August 2013 16:32:03 +0200, Vlastimil Babka wrote:
>
> Depending on previous batch which introduced batched isolation in
> munlock_vma_range(), we can batch also the updates of NR_MLOCK
> page stats. After the whole pagevec is processed for page isolation,
> the stats are updated only once with the number of successful isolations.
> There were however no measurable perfomance gains.
Neat. This answers a question I had when reading patch 3/6.
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/mlock.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 08689b6..d112e06 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -238,6 +238,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
> {
> int i;
> int nr = pagevec_count(pvec);
> + int delta_munlocked = -nr;
>
> /* Phase 1: page isolation */
> spin_lock_irq(&zone->lru_lock);
> @@ -248,9 +249,6 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
> struct lruvec *lruvec;
> int lru;
>
> - /* we have disabled interrupts */
> - __mod_zone_page_state(zone, NR_MLOCK, -1);
> -
> switch (__isolate_lru_page(page,
> ISOLATE_UNEVICTABLE)) {
> case 0:
> @@ -275,8 +273,10 @@ skip_munlock:
> */
> pvec->pages[i] = NULL;
> put_page(page);
> + delta_munlocked++;
> }
> }
> + __mod_zone_page_state(zone, NR_MLOCK, delta_munlocked);
> spin_unlock_irq(&zone->lru_lock);
>
> /* Phase 2: page munlock and putback */
> --
> 1.8.1.4
>
Jörn
--
Doubt is not a pleasant condition, but certainty is an absurd one.
-- Voltaire
* Re: [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
` (5 preceding siblings ...)
2013-08-05 14:32 ` [RFC PATCH 6/6] mm: munlock: remove redundant get_page/put_page pair on the fast path Vlastimil Babka
@ 2013-08-05 17:31 ` Jörn Engel
2013-08-06 16:39 ` Jörn Engel
7 siblings, 0 replies; 13+ messages in thread
From: Jörn Engel @ 2013-08-05 17:31 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: mgorman, linux-mm
On Mon, 5 August 2013 16:31:59 +0200, Vlastimil Babka wrote:
>
> timedmunlock
> 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3
> 0 1 2 3 4 5 6
> Elapsed min 3.38 ( 0.00%) 3.39 ( -0.14%) 3.00 ( 11.35%) 2.73 ( 19.48%) 2.72 ( 19.50%) 2.34 ( 30.78%) 2.16 ( 36.23%)
> Elapsed mean 3.39 ( 0.00%) 3.39 ( -0.05%) 3.01 ( 11.25%) 2.73 ( 19.54%) 2.73 ( 19.41%) 2.36 ( 30.30%) 2.17 ( 36.00%)
> Elapsed stddev 0.01 ( 0.00%) 0.00 ( 71.98%) 0.01 (-71.14%) 0.00 ( 89.12%) 0.01 (-48.55%) 0.03 (-277.27%) 0.01 (-85.75%)
> Elapsed max 3.41 ( 0.00%) 3.40 ( 0.39%) 3.04 ( 10.81%) 2.73 ( 19.96%) 2.76 ( 19.09%) 2.43 ( 28.64%) 2.20 ( 35.41%)
> Elapsed range 0.02 ( 0.00%) 0.01 ( 74.99%) 0.04 (-66.12%) 0.00 ( 88.12%) 0.03 (-39.24%) 0.09 (-274.85%) 0.04 (-81.04%)
Impressive numbers. Patches 1,2,4,6 look good to me (for whatever
that is worth). Patch 5 exceeded my review capacity for now, I will
give you feedback once my brain returns from vacation.
Thank you for the patchset! Work in this area is very much
appreciated.
Jörn
--
The rabbit runs faster than the fox, because the rabbit is running for
his life while the fox is only running for his dinner.
-- Aesop
* Re: [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec
2013-08-05 17:21 ` Jörn Engel
@ 2013-08-06 13:27 ` Vlastimil Babka
2013-08-06 16:21 ` Jörn Engel
0 siblings, 1 reply; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-06 13:27 UTC (permalink / raw)
To: Jörn Engel; +Cc: mgorman, linux-mm
On 08/05/2013 07:21 PM, Jörn Engel wrote:
Hi and thanks for the review!
> On Mon, 5 August 2013 16:32:02 +0200, Vlastimil Babka wrote:
>>
>> /*
>> + * Munlock a batch of pages from the same zone
>> + *
>> + * The work is split to two main phases. First phase clears the Mlocked flag
>> + * and attempts to isolate the pages, all under a single zone lru lock.
>> + * The second phase finishes the munlock only for pages where isolation
>> + * succeeded.
>> + */
>> +static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
>> +{
>> + int i;
>> + int nr = pagevec_count(pvec);
>> +
>> + /* Phase 1: page isolation */
>> + spin_lock_irq(&zone->lru_lock);
>> + for (i = 0; i < nr; i++) {
>> + struct page *page = pvec->pages[i];
>> +
>> + if (TestClearPageMlocked(page)) {
>> + struct lruvec *lruvec;
>> + int lru;
>> +
>> + /* we have disabled interrupts */
>> + __mod_zone_page_state(zone, NR_MLOCK, -1);
>> +
>> + switch (__isolate_lru_page(page,
>> + ISOLATE_UNEVICTABLE)) {
>> + case 0:
>> + lruvec = mem_cgroup_page_lruvec(page, zone);
>> + lru = page_lru(page);
>> + del_page_from_lru_list(page, lruvec, lru);
>> + break;
>> +
>> + case -EINVAL:
>> + __munlock_isolation_failed(page);
>> + goto skip_munlock;
>> +
>> + default:
>> + BUG();
>> + }
> On purely aesthetic grounds I don't like the switch too much. A bit
Right, I just saw this function used like this elsewhere so it seemed
like the right thing to do if I was to reuse as much existing code as
possible. But I already got a suggestion that this is too big of a
hammer for this call path where three simple statements are sufficient
instead, and subsequent patches also replace this.
> more serious is that you don't handle -EBUSY gracefully. I guess you
> would have to mlock() the empty zero page to exercise this code path.
>
From what I see in the implementation, -EBUSY can only happen with flags
that I don't use, or when get_page_unless_zero() fails. But that cannot fail
since I already hold the get_page() pin from follow_page_mask(). (The
function checks for zero get_page() pins, not for the zero page.)
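For reference, the tail of __isolate_lru_page() looks roughly like this
(paraphrased sketch of mm/vmscan.c around 3.11, not verbatim, with the
mode-flag checks I don't use omitted):

	/* paraphrased tail of __isolate_lru_page() */
	ret = -EBUSY;
	if (likely(get_page_unless_zero(page))) {
		/*
		 * Speculative reference taken: the page is not being freed
		 * elsewhere, so it is safe to clear PageLRU and isolate it.
		 */
		ClearPageLRU(page);
		ret = 0;
	}
	return ret;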
>> + } else {
>> +skip_munlock:
>> + /*
>> + * We won't be munlocking this page in the next phase
>> + * but we still need to release the follow_page_mask()
>> + * pin.
>> + */
>> + pvec->pages[i] = NULL;
>> + put_page(page);
>> + }
>> + }
>> + spin_unlock_irq(&zone->lru_lock);
>> +
>> + /* Phase 2: page munlock and putback */
>> + for (i = 0; i < nr; i++) {
>> + struct page *page = pvec->pages[i];
>> +
>> + if (unlikely(!page))
>> + continue;
> Whenever I see likely() or unlikely() I wonder whether it really makes
> a difference or whether it is just cargo-cult programming. My best
> guess is that about half of them are cargo-cult.
Yeah that's another thing I saw being used around and seemed to make
sense. But in truth I'm also not sure if contemporary processors gain
anything from it. I will drop it then.
> + }
> + if (PageTransHuge(page)) {
> + /*
> + * THP pages are not handled by pagevec due
> + * to their possible split (see below).
> + */
> + if (pagevec_count(&pvec))
> + __munlock_pagevec(&pvec, zone);
> Should you re-initialize the pvec after this call?
__munlock_pagevec() does it as the last thing.
Thanks,
Vlastimil
* Re: [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec
2013-08-06 13:27 ` Vlastimil Babka
@ 2013-08-06 16:21 ` Jörn Engel
0 siblings, 0 replies; 13+ messages in thread
From: Jörn Engel @ 2013-08-06 16:21 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: mgorman, linux-mm
On Tue, 6 August 2013 15:27:53 +0200, Vlastimil Babka wrote:
> On 08/05/2013 07:21 PM, Jörn Engel wrote:
> > On Mon, 5 August 2013 16:32:02 +0200, Vlastimil Babka wrote:
> >>
> >> /*
> >> + * Munlock a batch of pages from the same zone
> >> + *
> >> + * The work is split to two main phases. First phase clears the Mlocked flag
> >> + * and attempts to isolate the pages, all under a single zone lru lock.
> >> + * The second phase finishes the munlock only for pages where isolation
> >> + * succeeded.
> >> + */
> >> +static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
> >> +{
> >> + int i;
> >> + int nr = pagevec_count(pvec);
> >> +
> >> + /* Phase 1: page isolation */
> >> + spin_lock_irq(&zone->lru_lock);
> >> + for (i = 0; i < nr; i++) {
> >> + struct page *page = pvec->pages[i];
> >> +
> >> + if (TestClearPageMlocked(page)) {
> >> + struct lruvec *lruvec;
> >> + int lru;
> >> +
> >> + /* we have disabled interrupts */
> >> + __mod_zone_page_state(zone, NR_MLOCK, -1);
> >> +
> >> + switch (__isolate_lru_page(page,
> >> + ISOLATE_UNEVICTABLE)) {
> >> + case 0:
> >> + lruvec = mem_cgroup_page_lruvec(page, zone);
> >> + lru = page_lru(page);
> >> + del_page_from_lru_list(page, lruvec, lru);
> >> + break;
> >> +
> >> + case -EINVAL:
> >> + __munlock_isolation_failed(page);
> >> + goto skip_munlock;
> >> +
> >> + default:
> >> + BUG();
> >> + }
> > more serious is that you don't handle -EBUSY gracefully. I guess you
> > would have to mlock() the empty zero page to exercise this code path.
>
> From what I see in the implementation, -EBUSY can only happen with flags
> that I don't use, or when get_page_unless_zero() fails. But it cannot
> fail since I already have get_page() from follow_page_mask(). (the
> function is about zero get_page() pins, not about being zero page).
You are right. Not sure if this should be explained in a comment in
the code as well.
> > + }
> > + if (PageTransHuge(page)) {
> > + /*
> > + * THP pages are not handled by pagevec due
> > + * to their possible split (see below).
> > + */
> > + if (pagevec_count(&pvec))
> > + __munlock_pagevec(&pvec, zone);
> > Should you re-initialize the pvec after this call?
> __munlock_pagevec() does it as the last thing
Right you are.
Jörn
--
The key to performance is elegance, not battalions of special cases.
-- Jon Bentley and Doug McIlroy
* Re: [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
` (6 preceding siblings ...)
2013-08-05 17:31 ` [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Jörn Engel
@ 2013-08-06 16:39 ` Jörn Engel
7 siblings, 0 replies; 13+ messages in thread
From: Jörn Engel @ 2013-08-06 16:39 UTC (permalink / raw)
To: Vlastimil Babka; +Cc: mgorman, linux-mm
On Mon, 5 August 2013 16:31:59 +0200, Vlastimil Babka wrote:
>
> timedmunlock
> 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3 3.11-rc3
> 0 1 2 3 4 5 6
> Elapsed min 3.38 ( 0.00%) 3.39 ( -0.14%) 3.00 ( 11.35%) 2.73 ( 19.48%) 2.72 ( 19.50%) 2.34 ( 30.78%) 2.16 ( 36.23%)
> Elapsed mean 3.39 ( 0.00%) 3.39 ( -0.05%) 3.01 ( 11.25%) 2.73 ( 19.54%) 2.73 ( 19.41%) 2.36 ( 30.30%) 2.17 ( 36.00%)
> Elapsed stddev 0.01 ( 0.00%) 0.00 ( 71.98%) 0.01 (-71.14%) 0.00 ( 89.12%) 0.01 (-48.55%) 0.03 (-277.27%) 0.01 (-85.75%)
> Elapsed max 3.41 ( 0.00%) 3.40 ( 0.39%) 3.04 ( 10.81%) 2.73 ( 19.96%) 2.76 ( 19.09%) 2.43 ( 28.64%) 2.20 ( 35.41%)
> Elapsed range 0.02 ( 0.00%) 0.01 ( 74.99%) 0.04 (-66.12%) 0.00 ( 88.12%) 0.03 (-39.24%) 0.09 (-274.85%) 0.04 (-81.04%)
>
>
> Vlastimil Babka (6):
> mm: putback_lru_page: remove unnecessary call to page_lru_base_type()
> mm: munlock: remove unnecessary call to lru_add_drain()
> mm: munlock: batch non-THP page isolation and munlock+putback using
> pagevec
> mm: munlock: batch NR_MLOCK zone state updates
> mm: munlock: bypass per-cpu pvec for putback_lru_page
> mm: munlock: remove redundant get_page/put_page pair on the fast path
>
> mm/mlock.c | 259 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
> mm/vmscan.c | 12 +--
> 2 files changed, 224 insertions(+), 47 deletions(-)
Finally walked through 5/6 as well. The entire patchset looks good to
me. Feel free to attach my Reviewed-By: to the patchset.
Jörn
--
tglx1 thinks that joern should get a (TM) for "Thinking Is Hard"
-- Thomas Gleixner