linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas
@ 2013-08-05 14:31 Vlastimil Babka
  2013-08-05 14:32 ` [RFC PATCH 1/6] mm: putback_lru_page: remove unnecessary call to page_lru_base_type() Vlastimil Babka
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Vlastimil Babka @ 2013-08-05 14:31 UTC (permalink / raw)
  To: joern; +Cc: mgorman, linux-mm, Vlastimil Babka

Hi everyone and apologies for any mistakes in my first attempt at linux-mm
contribution :)

The goal of this patch series is to improve performance of munlock() of large
mlocked memory areas on systems without THP. This is motivated by reported very
long times of crash recovery of processes with such areas, where munlock() can
take several seconds. See http://lwn.net/Articles/548108/

The work was driven by a simple benchmark (to be included in mmtests) that
mmaps() e.g. 56GB with MAP_LOCKED | MAP_POPULATE and measures the time of
munlock(). Profiling was performed by attaching operf --pid to the process
and sending a signal to trigger the munlock() part and then notify bach
the monitoring wrapper to stop operf, so that only munlock() appears in the
profile.

The profiles have shown that CPU time is spent mostly by atomic operations
and locking, which the patches aim to reduce, starting from easier to more
complex changes.

Patch 1 performs a simple cleanup in putback_lru_page() so that page lru base
	type is not determined without being actually needed.

Patch 2 removes an unnecessary call to lru_add_drain() which drains the per-cpu
	pagevec after each munlocked page is put there.

Patch 3 changes munlock_vma_range() to use an on-stack pagevec for isolating
	multiple non-THP pages under a single lru_lock instead of locking and
	processing each page separately.

Patch 4 changes the NR_MLOCK accounting to be called only once per the pvec
	introduced by previous patch.

Patch 5 uses the introduced pagevec to batch also the work of putback_lru_page
	when possible, bypassing the per-cpu pvec and associated overhead.

Patch 6 Removes a redundant get_page/put_page pair which saves costly atomic
	operations.

Measurements were made using 3.11-rc3 as a baseline.

timedmunlock
                            3.11-rc3              3.11-rc3              3.11-rc3              3.11-rc3              3.11-rc3              3.11-rc3              3.11-rc3
                                   0                     1                     2                     3                     4                     5                     6
Elapsed min           3.38 (  0.00%)        3.39 ( -0.14%)        3.00 ( 11.35%)        2.73 ( 19.48%)        2.72 ( 19.50%)        2.34 ( 30.78%)        2.16 ( 36.23%)
Elapsed mean          3.39 (  0.00%)        3.39 ( -0.05%)        3.01 ( 11.25%)        2.73 ( 19.54%)        2.73 ( 19.41%)        2.36 ( 30.30%)        2.17 ( 36.00%)
Elapsed stddev        0.01 (  0.00%)        0.00 ( 71.98%)        0.01 (-71.14%)        0.00 ( 89.12%)        0.01 (-48.55%)        0.03 (-277.27%)        0.01 (-85.75%)
Elapsed max           3.41 (  0.00%)        3.40 (  0.39%)        3.04 ( 10.81%)        2.73 ( 19.96%)        2.76 ( 19.09%)        2.43 ( 28.64%)        2.20 ( 35.41%)
Elapsed range         0.02 (  0.00%)        0.01 ( 74.99%)        0.04 (-66.12%)        0.00 ( 88.12%)        0.03 (-39.24%)        0.09 (-274.85%)        0.04 (-81.04%)


Vlastimil Babka (6):
  mm: putback_lru_page: remove unnecessary call to page_lru_base_type()
  mm: munlock: remove unnecessary call to lru_add_drain()
  mm: munlock: batch non-THP page isolation and munlock+putback using
    pagevec
  mm: munlock: batch NR_MLOCK zone state updates
  mm: munlock: bypass per-cpu pvec for putback_lru_page
  mm: munlock: remove redundant get_page/put_page pair on the fast path

 mm/mlock.c  | 259 ++++++++++++++++++++++++++++++++++++++++++++++++++----------
 mm/vmscan.c |  12 +--
 2 files changed, 224 insertions(+), 47 deletions(-)

-- 
1.8.1.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-08-06 18:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-08-05 14:31 [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 1/6] mm: putback_lru_page: remove unnecessary call to page_lru_base_type() Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 2/6] mm: munlock: remove unnecessary call to lru_add_drain() Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 3/6] mm: munlock: batch non-THP page isolation and munlock+putback using pagevec Vlastimil Babka
2013-08-05 17:21   ` Jörn Engel
2013-08-06 13:27     ` Vlastimil Babka
2013-08-06 16:21       ` Jörn Engel
2013-08-05 14:32 ` [RFC PATCH 4/6] mm: munlock: batch NR_MLOCK zone state updates Vlastimil Babka
2013-08-05 17:23   ` Jörn Engel
2013-08-05 14:32 ` [RFC PATCH 5/6] mm: munlock: bypass per-cpu pvec for putback_lru_page Vlastimil Babka
2013-08-05 14:32 ` [RFC PATCH 6/6] mm: munlock: remove redundant get_page/put_page pair on the fast path Vlastimil Babka
2013-08-05 17:31 ` [RFC PATCH 0/6] Improving munlock() performance for large non-THP areas Jörn Engel
2013-08-06 16:39 ` Jörn Engel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).