* Re: [PATCH 19/25] mm, vmscan: Account in vmstat for pages skipped during reclaim
@ 2015-06-12 8:49 Hillf Danton
0 siblings, 0 replies; 2+ messages in thread
From: Hillf Danton @ 2015-06-12 8:49 UTC (permalink / raw)
To: Mel Gorman
Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
Michal Hocko
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1326,6 +1326,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>
> for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
> struct page *page;
> + struct zone *zone;
> int nr_pages;
>
> page = lru_to_page(src);
> @@ -1333,8 +1334,11 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>
> VM_BUG_ON_PAGE(!PageLRU(page), page);
>
> - if (page_zone_id(page) > sc->reclaim_idx)
> + zone = page_zone(page);
> + if (page_zone_id(page) > sc->reclaim_idx) {
> list_move(&page->lru, &pages_skipped);
> + __count_zone_vm_events(PGSCAN_SKIP, page_zone(page), 1);
> + }
The newly added local variable zone is never used; the accounting call looks up
page_zone(page) a second time instead.
>
> switch (__isolate_lru_page(page, mode)) {
> case 0:
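One way to resolve that (an untested sketch, not a final patch) would be to pass
the cached pointer to the accounting call:

```diff
-	if (page_zone_id(page) > sc->reclaim_idx)
+	zone = page_zone(page);
+	if (page_zone_id(page) > sc->reclaim_idx) {
 		list_move(&page->lru, &pages_skipped);
+		__count_zone_vm_events(PGSCAN_SKIP, zone, 1);
+	}
```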
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* [RFC PATCH 00/25] Move LRU page reclaim from zones to nodes
@ 2015-06-08 13:56 Mel Gorman
2015-06-08 13:56 ` [PATCH 19/25] mm, vmscan: Account in vmstat for pages skipped during reclaim Mel Gorman
0 siblings, 1 reply; 2+ messages in thread
From: Mel Gorman @ 2015-06-08 13:56 UTC (permalink / raw)
To: Linux-MM; +Cc: Rik van Riel, Johannes Weiner, Michal Hocko, LKML, Mel Gorman
This is an RFC series against 4.0 that moves LRUs from the zones to the
node. In concept this is straightforward, but there are a lot of details,
so I'm posting it early to see what people think. The motivations are:
1. Currently, reclaim on node 0 behaves differently from node 1, with subtly different
   aging rules. As a result, workloads may exhibit different behaviour depending on
   which node they were scheduled on.
2. The residency of a page partially depends on which zone the page was
   allocated from. The fair zone allocation policy mitigates this, but it is
   only a partial solution and introduces overhead in the page allocator paths.
3. kswapd and the page allocator play special games with the order in which they
   scan zones to avoid interfering with each other, but the result is unpredictable.
4. The different scan activity and ordering for zone reclaim is very difficult
to predict.
5. slab shrinkers are node-based which makes relating page reclaim to
slab reclaim harder than it should be.
The reason we have zone-based reclaim is that we used to have
large highmem zones in common configurations and it was necessary
to quickly find ZONE_NORMAL pages for reclaim. Today, this is much
less of a concern as machines with lots of memory will (or should) use
64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are
rare. Machines that do use highmem should have lower highmem:lowmem
ratios than those we worried about in the past.
Conceptually, moving to node LRUs should be easier to understand. The
page allocator plays fewer tricks to game reclaim and reclaim behaves
similarly on all nodes.
The series is very long and bisection will be hazardous, since intermediate
results can be misleading while infrastructure is being reshuffled. The
rational bisection points are:
[PATCH 01/25] mm, vmstat: Add infrastructure for per-node vmstats
[PATCH 19/25] mm, vmscan: Account in vmstat for pages skipped during reclaim
[PATCH 21/25] mm, page_alloc: Defer zlc_setup until it is known it is required
[PATCH 23/25] mm, page_alloc: Delete the zonelist_cache
[PATCH 25/25] mm: page_alloc: Take fewer passes when allocating to the low watermark
It was tested on a UMA (8 cores single socket) and a NUMA machine (48 cores,
4 sockets). The page allocator tests showed marginal differences in aim9,
page fault microbenchmark, page allocator micro-benchmark and ebizzy. This
was expected as the affected paths are small in comparison to the overall
workloads.
I also tested using fstest on zero-length files to stress slab reclaim. It
showed no major differences in performance or stats.
A THP-based test case that stresses compaction was inconclusive. It showed
differences in the THP allocation success rate and both gains and losses in
the time it takes to allocate THP depending on the number of threads running.
Tests did show there were differences in the pages allocated from each zone.
This is because the fair zone allocation policy is removed; with node-based
LRU reclaim, it *should* not be necessary. It would be preferable if the
original database workload that motivated the introduction of that policy
was retested with this series, though.
The raw figures as such are not that interesting -- things perform more
or less the same which is what you'd hope.
arch/s390/appldata/appldata_mem.c | 2 +-
arch/tile/mm/pgtable.c | 18 +-
drivers/base/node.c | 73 +--
drivers/staging/android/lowmemorykiller.c | 12 +-
fs/fs-writeback.c | 8 +-
fs/fuse/file.c | 8 +-
fs/nfs/internal.h | 2 +-
fs/nfs/write.c | 2 +-
fs/proc/meminfo.c | 14 +-
include/linux/backing-dev.h | 2 +-
include/linux/memcontrol.h | 15 +-
include/linux/mm_inline.h | 4 +-
include/linux/mmzone.h | 224 ++++------
include/linux/swap.h | 11 +-
include/linux/topology.h | 2 +-
include/linux/vm_event_item.h | 11 +-
include/linux/vmstat.h | 94 +++-
include/linux/writeback.h | 2 +-
include/trace/events/vmscan.h | 10 +-
include/trace/events/writeback.h | 6 +-
kernel/power/snapshot.c | 10 +-
kernel/sysctl.c | 4 +-
mm/backing-dev.c | 14 +-
mm/compaction.c | 25 +-
mm/filemap.c | 16 +-
mm/huge_memory.c | 14 +-
mm/internal.h | 11 +-
mm/memcontrol.c | 37 +-
mm/memory-failure.c | 4 +-
mm/memory_hotplug.c | 2 +-
mm/mempolicy.c | 2 +-
mm/migrate.c | 31 +-
mm/mlock.c | 12 +-
mm/mmap.c | 4 +-
mm/nommu.c | 4 +-
mm/page-writeback.c | 109 ++---
mm/page_alloc.c | 489 ++++++--------------
mm/rmap.c | 16 +-
mm/shmem.c | 12 +-
mm/swap.c | 66 +--
mm/swap_state.c | 4 +-
mm/truncate.c | 2 +-
mm/vmscan.c | 718 ++++++++++++++----------------
mm/vmstat.c | 308 ++++++++++---
mm/workingset.c | 49 +-
45 files changed, 1225 insertions(+), 1258 deletions(-)
--
2.3.5
* [PATCH 19/25] mm, vmscan: Account in vmstat for pages skipped during reclaim
2015-06-08 13:56 [RFC PATCH 00/25] Move LRU page reclaim from zones to nodes Mel Gorman
@ 2015-06-08 13:56 ` Mel Gorman
0 siblings, 0 replies; 2+ messages in thread
From: Mel Gorman @ 2015-06-08 13:56 UTC (permalink / raw)
To: Linux-MM; +Cc: Rik van Riel, Johannes Weiner, Michal Hocko, LKML, Mel Gorman
Low reclaim efficiency occurs when many pages are scanned that cannot be
reclaimed, for example when pages are dirty or under writeback. Node-based
LRU reclaim introduces a new source: reclaim for allocation requests that
require lower zones will skip pages belonging to higher zones. This patch
adds vmstat counters for pages that were skipped because the calling
context could not use pages from that zone. This will help distinguish one
source of low reclaim efficiency.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
include/linux/vm_event_item.h | 1 +
mm/vmscan.c | 6 +++++-
mm/vmstat.c | 2 ++
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 4ce4d59d361e..95cdd56c65bf 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -25,6 +25,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
FOR_ALL_ZONES(PGALLOC),
PGFREE, PGACTIVATE, PGDEACTIVATE,
PGFAULT, PGMAJFAULT,
+ FOR_ALL_ZONES(PGSCAN_SKIP),
PGREFILL,
PGSTEAL_KSWAPD,
PGSTEAL_DIRECT,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 69916bb9acba..3cb0cc70ddbd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1326,6 +1326,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
struct page *page;
+ struct zone *zone;
int nr_pages;
page = lru_to_page(src);
@@ -1333,8 +1334,11 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
VM_BUG_ON_PAGE(!PageLRU(page), page);
- if (page_zone_id(page) > sc->reclaim_idx)
+ zone = page_zone(page);
+ if (page_zone_id(page) > sc->reclaim_idx) {
list_move(&page->lru, &pages_skipped);
+ __count_zone_vm_events(PGSCAN_SKIP, page_zone(page), 1);
+ }
switch (__isolate_lru_page(page, mode)) {
case 0:
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4a9f73c4140b..d805df47d3ae 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -957,6 +957,8 @@ const char * const vmstat_text[] = {
"pgfault",
"pgmajfault",
+ TEXTS_FOR_ZONES("pgskip")
+
"pgrefill",
"pgsteal_kswapd",
"pgsteal_direct",
--
2.3.5