From: "Jim Schutt" <jaschut@sandia.gov>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
Mel Gorman <mel@csn.ul.ie>,
kamezawa.hiroyu@jp.fujitsu.com, minchan@kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH -mm] mm: have order>0 compaction start off where it left
Date: Thu, 28 Jun 2012 11:16:49 -0600 [thread overview]
Message-ID: <4FEC9181.9060000@sandia.gov> (raw)
In-Reply-To: <20120627233742.53225fc7@annuminas.surriel.com>
On 06/27/2012 09:37 PM, Rik van Riel wrote:
> Order > 0 compaction stops when enough free pages of the correct
> page order have been coalesced. When doing subsequent higher order
> allocations, it is possible for compaction to be invoked many times.
>
> However, the compaction code always starts out looking for things to
> compact at the start of the zone, and for free pages to compact things
> to at the end of the zone.
>
> This can cause quadratic behaviour, with isolate_freepages starting
> at the end of the zone each time, even though previous invocations
> of the compaction code already filled up all free memory on that end
> of the zone.
>
> This can cause isolate_freepages to take enormous amounts of CPU
> with certain workloads on larger memory systems.
>
> The obvious solution is to have isolate_freepages remember where
> it left off last time, and continue at that point the next time
> it gets invoked for an order > 0 compaction. This could cause
> compaction to fail if cc->free_pfn and cc->migrate_pfn are close
> together initially; in that case, we restart from the end of the
> zone and try once more.
>
> Forced full (order == -1) compactions are left alone.
>
> Reported-by: Jim Schutt <jaschut@sandia.gov>
> Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Jim Schutt <jaschut@sandia.gov>
Please let me know if you further refine this patch
and would like me to test it with my workload.
> ---
> CAUTION: due to the time of day, I have only COMPILE tested this code
>
> include/linux/mmzone.h | 4 ++++
> mm/compaction.c | 25 +++++++++++++++++++++++--
> mm/internal.h | 1 +
> mm/page_alloc.c | 4 ++++
> 4 files changed, 32 insertions(+), 2 deletions(-)
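For anyone else following the thread, here is my reading of the
approach as a rough sketch, paraphrased from the changelog and
diffstat above. This is not the actual diff; the helper name
compact_init_scanners is mine, though compact_cached_free_pfn,
cc->free_pfn, and cc->migrate_pfn follow the patch:

    /*
     * Sketch only. The zone caches where the free-page scanner
     * stopped, so the next order > 0 compaction resumes there
     * instead of rescanning from the zone end every time.
     */
    static void compact_init_scanners(struct zone *zone,
                                      struct compact_control *cc)
    {
            unsigned long zone_end = zone->zone_start_pfn +
                                     zone->spanned_pages;

            /* The migrate scanner still starts at the zone bottom. */
            cc->migrate_pfn = zone->zone_start_pfn;

            if (cc->order > 0)
                    /* Resume where the previous run left off. */
                    cc->free_pfn = zone->compact_cached_free_pfn;
            else
                    /* Forced full (order == -1) compaction is
                     * left alone: scan down from the zone end,
                     * rounded to a pageblock boundary. */
                    cc->free_pfn = zone_end &
                                   ~(pageblock_nr_pages - 1);

            /*
             * If the cached free_pfn starts out too close to
             * migrate_pfn, the run can fail early; on that failure
             * the cached value is reset to the zone end and
             * compaction is retried once from the top.
             */
    }

The win is that repeated order > 0 compactions stop rescanning the
already-filled end of the zone, which is what was eating all the
CPU in isolate_freepages on my machines.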
This patch is working great for me.
FWIW, here's a typical vmstat report after ~20 minutes of my Ceph load:
2012-06-28 10:59:16.887-06:00
vmstat -w 4 16
procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st
23 21 0 393128 480 36883448 0 0 8 49583 90 273 6 25 58 11 0
6 18 0 397892 480 36912832 0 0 281 2293321 203321 168790 11 43 22 24 0
17 23 0 394540 480 36921356 0 0 262 2227505 202744 163158 11 45 20 23 0
25 17 0 359404 480 36972884 0 0 205 2243941 201087 167874 11 42 23 24 0
21 20 0 367400 480 36934416 0 0 232 2310577 200666 156693 12 50 17 22 0
12 18 0 378048 480 36890624 0 0 232 2235455 196480 165692 11 44 22 24 0
17 18 0 372444 480 36874484 0 0 280 2185592 195885 168416 11 43 24 23 0
51 16 0 372760 480 36841148 0 0 245 2211135 195711 158012 11 46 23 20 0
23 17 0 375272 480 36847292 0 0 228 2323708 207079 164988 12 49 19 20 0
10 26 0 373540 480 36889240 0 0 341 2290586 201708 167954 11 46 19 23 0
44 14 0 303828 480 37020940 0 0 302 2180893 199958 168619 11 40 23 26 0
24 14 0 359320 480 36970272 0 0 345 2173978 197097 163760 11 47 22 20 0
32 19 0 355744 480 36917372 0 0 267 2276251 200123 167776 11 46 19 23 0
34 19 0 360824 480 36900032 0 0 259 2252057 200942 170912 11 43 21 25 0
13 17 0 361288 480 36919360 0 0 253 2149189 188426 170940 10 40 27 23 0
15 16 0 341828 480 36883988 0 0 317 2272817 205203 173732 11 48 19 21 0
Also FWIW, here's a typical "perf top" report with the patch applied:
PerfTop: 17575 irqs/sec kernel:80.4% exact: 0.0% [1000Hz cycles], (all, 24 CPUs)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ___________________________ ________________________________________________________________________________________
27583.00 11.6% copy_user_generic_string /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
18387.00 7.8% __crc32c_le /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
17264.00 7.3% _raw_spin_lock_irqsave /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
13890.00 5.9% ceph_crc32c_le /usr/bin/ceph-osd
5952.00 2.5% __copy_user_nocache /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
4663.00 2.0% memmove /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
3141.00 1.3% _raw_spin_lock /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2939.00 1.2% rb_prev /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2933.00 1.2% clflush_cache_range /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2586.00 1.1% __list_del_entry /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2357.00 1.0% intel_idle /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2168.00 0.9% __set_page_dirty_nobuffers /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2110.00 0.9% get_pageblock_flags_group /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2103.00 0.9% set_page_dirty /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
2090.00 0.9% futex_wake /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1959.00 0.8% __memcpy /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1696.00 0.7% generic_bin_search /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
1628.00 0.7% btree_set_page_dirty /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
1606.00 0.7% _raw_spin_unlock_irqrestore /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1516.00 0.6% map_private_extent_buffer /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/fs/btrfs/btrfs.ko
1481.00 0.6% futex_requeue /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1447.00 0.6% isolate_migratepages_range /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1365.00 0.6% __schedule /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1361.00 0.6% memcpy /lib64/libc-2.12.so
1263.00 0.5% trace_hardirqs_off /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1238.00 0.5% tg_load_down /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1220.00 0.5% move_freepages_block /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1198.00 0.5% find_iova /lib/modules/3.5.0-rc4-00012-g3986cf7/build/vmlinux
1139.00 0.5% process_responses /lib/modules/3.5.0-rc4-00012-g3986cf7/kernel/drivers/net/ethernet/chelsio/cxgb4/cxgb4.ko
So far I've run a total of ~20 TB of data through 12 machines
running this patch, over fifty minutes or so; no hint of
trouble, and great performance.
Without this patch I would typically start having trouble
after just a few minutes of this load.
Thanks!
-- Jim