* [PATCH 00 of 30] Transparent Hugepage support #3
@ 2010-01-21  6:20 Andrea Arcangeli
From: Andrea Arcangeli @ 2010-01-21  6:20 UTC
  To: linux-mm
  Cc: Marcelo Tosatti, Adam Litke, Avi Kivity, Izik Eidus, Hugh Dickins,
	Nick Piggin, Rik van Riel, Mel Gorman, Andi Kleen, Dave Hansen,
	Benjamin Herrenschmidt, Ingo Molnar, Mike Travis,
	KAMEZAWA Hiroyuki, Christoph Lameter, Chris Wright

Hello,

This is the latest version of my patchset. It has all the cleanups requested
in the previous review on linux-mm plus more fixes, and it ships the first
working khugepaged daemon.

This seems feature complete as far as KVM is concerned: "madvise" mode for
both /sys/kernel/mm/transparent_hugepage/enabled and
/sys/kernel/mm/transparent_hugepage/khugepaged/enabled is enough for
hypervisor utilization. The default of the patchset is "always" for both, to
be sure the new code gets exercised by all apps that could benefit from it
(yes, khugepaged is transparently enabled on all mappings, with what I think
is a negligible/unmeasurable overhead, and perhaps it becomes beneficial for
long-living vmas with intensive computations on them).
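
To check which mode is active on a running system, something like the
following could be used. This is only a sketch: it assumes the two sysfs files
list all modes on one line with the active one in square brackets (for example
"always [madvise] never"), which is an assumption about the file format and is
not spelled out in this mail. The helper names are made up for illustration.

```python
from pathlib import Path

# Path named in this mail; the khugepaged/enabled sibling is derived from it.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_mode(text):
    """Return the bracketed word of e.g. 'always [madvise] never', or None.

    Assumption: the active setting is the one wrapped in square brackets.
    """
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_mode(path=THP_ENABLED):
    """Read a sysfs mode file; None if it is missing or unreadable."""
    try:
        return active_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("enabled:", read_mode())
    print("khugepaged/enabled:",
          read_mode(THP_ENABLED.parent / "khugepaged" / "enabled"))
```

On a kernel without the patchset both calls simply return None.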

TODO (first things that come to mind):

- at least smaps should stop calling split_huge_page

- find a way to fix the lru statistics so they show the right amount of ram
  (statistics code these days seems almost more complex than the real useful
  code; the isolated lru counter especially seems very dubious in its
  usefulness, and it's a further pain to deal with all over the VM). Fixing
  these counters is after all low priority because I know of no app that is
  aware of the VM internals and depends on the exact size of the
  inactive/active/anon/file/unevictable lru lists. Fixing smaps not to split
  hugepages is much higher priority. The stats don't overflow or underflow,
  they're just not right.

- maybe add some other stat in addition to AnonHugePages in /proc/meminfo. You
  can monitor the effect of khugepaged or of an mprotect calling
  split_huge_page trivially with "grep Anon /proc/meminfo"

- I need to integrate Mel's memory compaction code to be used by khugepaged and
  by the page faults if the "defrag" sysfs file setting requires it. His results
  (especially with the bug fixes that decreased reclaim a lot) look promising.

- likely we'll need a slab front allocator too allocating in 2m chunks, but
  this should be re-evaluated after merging Mel's work, maybe he already did
  that.

- khugepaged isn't yet capable of merging readonly shared anon pages, that
  isn't needed by KVM (KVM uses MADV_DONTFORK) but it might want to learn it
  for other apps

- khugepaged should also learn to skip the copy and collapse the hugepage
  in-place, if possible (to undo the effect of spurious split_huge_page)
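
For the "grep Anon /proc/meminfo" monitoring mentioned in the TODO list, a
small parser can turn the AnonHugePages figure into a hugepage count. This is a
sketch, not part of the patchset: the function names are invented here, and the
2048 kB hugepage size is an assumption that holds for 2M pages on x86-64. The
sample text is the first AnonPages/AnonHugePages pair quoted further down in
this mail.

```python
HUGEPAGE_KB = 2048  # assumption: 2 MiB transparent hugepages (x86-64)

# First meminfo sample quoted later in this mail.
SAMPLE = """\
AnonPages:       1751352 kB
AnonHugePages:   2021376 kB
"""


def parse_meminfo(text):
    """Map each 'Name:   value kB' line to {name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def anon_summary(fields):
    """Report the two Anon counters and the number of whole hugepages."""
    huge_kb = fields.get("AnonHugePages", 0)
    return {
        "AnonPages_kB": fields.get("AnonPages", 0),
        "AnonHugePages_kB": huge_kb,
        "hugepages": huge_kb // HUGEPAGE_KB,
    }


if __name__ == "__main__":
    print(anon_summary(parse_meminfo(SAMPLE)))
    # On a live system, pass open("/proc/meminfo").read() instead of SAMPLE.
```

With the sample above, 2021376 kB corresponds to 987 hugepages of 2048 kB.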

I'm leaving this under continuous stress with scan_sleep_millisecs and
defrag_sleep_millisecs set to 0 and a 5G swap storm + ~4G in ram. Before
settling in pause(), the swap storm calls madvise to split all hugepages in ram
and then runs a further memset to swap everything in a second time.
Eventually it settles, and khugepaged remerges as many hugepages as are fully
mapped in userland (mapped as swapcache is ok, but khugepaged will not trigger
swapin I/O or a swapcache minor fault), provided enough unfragmented hugepages
are available.

This is shortly after start.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
0  5 3938052  67688    208   3684 1219 2779  1239  2779  396 1061  1  5 75 20
2  5 3937092  61612    208   3712 25120 24112 25120 24112 7420 5396  0  8 44 48
0  5 3932116  55536    208   3780 26444 21468 26444 21468 7532 5399  0  8 52 40
0  5 3927264  46724    208   3296 28208 22528 28328 22528 7871 5722  0  7 52 41
AnonPages:       1751352 kB
AnonHugePages:   2021376 kB
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
0  5 3935604  58092    208   3864 1233 2787  1253  2787  400 1061  1  5 74 20
0  5 3933924  54248    208   3508 23748 23548 23904 23548 7112 4829  0  6 49 45
1  4 3937708  60696    208   3704 24680 28680 24760 28680 7034 5112  0  8 50 42
1  4 3934508  59084    208   3304 24096 21020 24156 21020 6832 5015  0  7 48 46
AnonPages:       1746296 kB
AnonHugePages:   2023424 kB

This is after it settled and it's waiting in pause(). When khugepaged is not
copying, with defrag_sleep/scan_sleep both = 0, it just triggers a
superoverschedule, but as you can see it's extremely low overhead, only taking
8% of 4 cores or 32% of 1 core. Likely most of the cpu is taken by schedule().
So you can imagine how low the overhead is when sleep is set to a "production"
level and not a stress test level. Default sleep is 10 seconds and not 2usec...

1  0 5680228 106028    396   5060    0    0     0     0  534 341005  0  8 92  0
1  0 5680228 106028    396   5060    0    0     0     0  517 349159  0  9 91  0
1  0 5680228 106028    396   5060    0    0     0     0  518 346356  0  6 94  0
0  0 5680228 106028    396   5060    0    0     0     0  511 348478  0  8 92  0
AnonPages:        392396 kB
AnonHugePages:   3371008 kB
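
The "8% of 4 cores or 32% of 1 core" figure can be rechecked from the four
settled vmstat lines above. The sketch below averages the "sy" and "cs"
columns; the column order is taken from the vmstat headers earlier in this
mail, the 4-core count is the one stated in the text, and the helper names are
made up for illustration.

```python
# The four vmstat lines captured after the workload settled in pause().
SETTLED = """\
1  0 5680228 106028    396   5060    0    0     0     0  534 341005  0  8 92  0
1  0 5680228 106028    396   5060    0    0     0     0  517 349159  0  9 91  0
1  0 5680228 106028    396   5060    0    0     0     0  518 346356  0  6 94  0
0  0 5680228 106028    396   5060    0    0     0     0  511 348478  0  8 92  0
"""

# Column names in the order of the vmstat header shown in this mail.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()
NCPUS = 4  # from the text: "8% of 4 cores"


def parse_vmstat(text):
    """Return one dict per all-numeric data line; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(COLUMNS) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(COLUMNS, map(int, parts))))
    return rows


def mean(rows, column):
    """Arithmetic mean of one column across the parsed rows."""
    return sum(row[column] for row in rows) / len(rows)


if __name__ == "__main__":
    rows = parse_vmstat(SETTLED)
    sy = mean(rows, "sy")
    print(f"system time: {sy:.2f}% of {NCPUS} cores, "
          f"about {sy * NCPUS:.0f}% of one core")
    print(f"context switches per second: {mean(rows, 'cs'):.1f}")
```

The four samples average 7.75% system time, i.e. about 31% of one core, with
roughly 346000 context switches per second, which is consistent with most of
the cpu going to schedule().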

So it looks good so far.

I think it's probably time to port the patchset to mmotm.
Further review welcome!

Andrea

