linux-mm.kvack.org archive mirror
* [RFC PATCH 00/25] Move LRU page reclaim from zones to nodes
@ 2015-06-08 13:56 Mel Gorman
  2015-06-08 13:56 ` [PATCH 01/25] mm, vmstat: Add infrastructure for per-node vmstats Mel Gorman
                   ` (25 more replies)
  0 siblings, 26 replies; 30+ messages in thread
From: Mel Gorman @ 2015-06-08 13:56 UTC (permalink / raw)
  To: Linux-MM; +Cc: Rik van Riel, Johannes Weiner, Michal Hocko, LKML, Mel Gorman

This is an RFC series against 4.0 that moves LRUs from the zones to the
node. In concept, this is straightforward but there are a lot of details
so I'm posting it early to see what people think. The motivations are:

1. Currently, reclaim on node 0 behaves differently to node 1, with subtly
   different aging rules. As a result, a workload may behave differently
   depending on which node it was scheduled on.

2. The residency of a page partially depends on what zone the page was
   allocated from.  The fair zone allocation policy combats this, but it
   is only a partial solution and it introduces overhead in the page
   allocator paths.

3. kswapd and the page allocator play special games with the order they
   scan zones to avoid interfering with each other, but the outcome is
   unpredictable.

4. The different scan activity and ordering for zone reclaim is very difficult
   to predict.

5. slab shrinkers are node-based which makes relating page reclaim to
   slab reclaim harder than it should be.

The reason we have zone-based reclaim is that we used to have
large highmem zones in common configurations and it was necessary
to quickly find ZONE_NORMAL pages for reclaim. Today, this is much
less of a concern as machines with lots of memory will (or should) use
64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are
rare. Machines that do use highmem should have lower highmem:lowmem
ratios than we worried about in the past.

Conceptually, moving to node LRUs should be easier to understand. The
page allocator plays fewer tricks to game reclaim and reclaim behaves
similarly on all nodes. 
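
As a rough illustration of that layout (a toy userspace model, not the
kernel code): the LRU vector lives in the node and every zone reaches it
through its back-pointer to the node. The names pglist_data, lruvec,
zone_pgdat and zone_lruvec() are taken from the hunks quoted for patch 03,
but the definitions below are simplified assumptions, and nr_pages,
node_init() and demo_shared_lru() are hypothetical stand-ins.

```c
#include <assert.h>

/* Toy stand-in for struct lruvec: one counter instead of the real lists. */
struct lruvec {
	unsigned long nr_pages;	/* hypothetical: pages on this LRU */
};

struct pglist_data;

/* Zones no longer embed an lruvec; they only point back to their node. */
struct zone {
	const char *name;
	struct pglist_data *zone_pgdat;
};

/* One LRU per node, shared by every zone in that node. */
struct pglist_data {
	struct lruvec lruvec;
	struct zone zones[2];
};

/* Assumed shape of the helper: all zones of a node map to one lruvec. */
static struct lruvec *zone_lruvec(struct zone *zone)
{
	return &zone->zone_pgdat->lruvec;
}

/* Hypothetical setup helper: wire both zones back to their node. */
static void node_init(struct pglist_data *pgdat)
{
	pgdat->lruvec.nr_pages = 0;
	pgdat->zones[0].name = "DMA32";
	pgdat->zones[1].name = "Normal";
	for (int i = 0; i < 2; i++)
		pgdat->zones[i].zone_pgdat = pgdat;
}

/* Pages from either zone are accounted on the same node-wide LRU. */
unsigned long demo_shared_lru(void)
{
	struct pglist_data node;

	node_init(&node);
	assert(zone_lruvec(&node.zones[0]) == zone_lruvec(&node.zones[1]));
	zone_lruvec(&node.zones[0])->nr_pages += 3;
	zone_lruvec(&node.zones[1])->nr_pages += 5;
	return node.lruvec.nr_pages;
}
```

In this model there is nothing zone-specific left to age, so reclaim on a
node sees one list regardless of which zone a page was allocated from.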

The series is very long and bisection will be hazardous because results
will be misleading while infrastructure is reshuffled. The rational
bisection points are:

  [PATCH 01/25] mm, vmstat: Add infrastructure for per-node vmstats
  [PATCH 19/25] mm, vmscan: Account in vmstat for pages skipped during reclaim
  [PATCH 21/25] mm, page_alloc: Defer zlc_setup until it is known it is required
  [PATCH 23/25] mm, page_alloc: Delete the zonelist_cache
  [PATCH 25/25] mm: page_alloc: Take fewer passes when allocating to the low watermark

It was tested on a UMA (8 cores single socket) and a NUMA machine (48 cores,
4 sockets). The page allocator tests showed marginal differences in aim9,
a page fault microbenchmark, a page allocator microbenchmark and ebizzy. This
was expected as the affected paths are small in comparison to the overall
workloads.

I also tested using fstest on zero-length files to stress slab reclaim. It
showed no major differences in performance or stats.

A THP-based test case that stresses compaction was inconclusive. It showed
differences in the THP allocation success rate and both gains and losses in
the time it takes to allocate THP depending on the number of threads running.

Tests did show there were differences in the pages allocated from each zone.
This is because the series removes the fair zone allocation policy; with
node-based LRU reclaim, it *should* not be necessary. It would be preferable
if the original database workload that motivated the introduction of that
policy were retested with this series though.

The raw figures as such are not that interesting -- things perform more
or less the same which is what you'd hope.

 arch/s390/appldata/appldata_mem.c         |   2 +-
 arch/tile/mm/pgtable.c                    |  18 +-
 drivers/base/node.c                       |  73 +--
 drivers/staging/android/lowmemorykiller.c |  12 +-
 fs/fs-writeback.c                         |   8 +-
 fs/fuse/file.c                            |   8 +-
 fs/nfs/internal.h                         |   2 +-
 fs/nfs/write.c                            |   2 +-
 fs/proc/meminfo.c                         |  14 +-
 include/linux/backing-dev.h               |   2 +-
 include/linux/memcontrol.h                |  15 +-
 include/linux/mm_inline.h                 |   4 +-
 include/linux/mmzone.h                    | 224 ++++------
 include/linux/swap.h                      |  11 +-
 include/linux/topology.h                  |   2 +-
 include/linux/vm_event_item.h             |  11 +-
 include/linux/vmstat.h                    |  94 +++-
 include/linux/writeback.h                 |   2 +-
 include/trace/events/vmscan.h             |  10 +-
 include/trace/events/writeback.h          |   6 +-
 kernel/power/snapshot.c                   |  10 +-
 kernel/sysctl.c                           |   4 +-
 mm/backing-dev.c                          |  14 +-
 mm/compaction.c                           |  25 +-
 mm/filemap.c                              |  16 +-
 mm/huge_memory.c                          |  14 +-
 mm/internal.h                             |  11 +-
 mm/memcontrol.c                           |  37 +-
 mm/memory-failure.c                       |   4 +-
 mm/memory_hotplug.c                       |   2 +-
 mm/mempolicy.c                            |   2 +-
 mm/migrate.c                              |  31 +-
 mm/mlock.c                                |  12 +-
 mm/mmap.c                                 |   4 +-
 mm/nommu.c                                |   4 +-
 mm/page-writeback.c                       | 109 ++---
 mm/page_alloc.c                           | 489 ++++++--------------
 mm/rmap.c                                 |  16 +-
 mm/shmem.c                                |  12 +-
 mm/swap.c                                 |  66 +--
 mm/swap_state.c                           |   4 +-
 mm/truncate.c                             |   2 +-
 mm/vmscan.c                               | 718 ++++++++++++++----------------
 mm/vmstat.c                               | 308 ++++++++++---
 mm/workingset.c                           |  49 +-
 45 files changed, 1225 insertions(+), 1258 deletions(-)

-- 
2.3.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH 03/25] mm, vmscan: Move LRU lists to node
@ 2015-06-11  7:12 Hillf Danton
  2015-06-15  8:19 ` Mel Gorman
  0 siblings, 1 reply; 30+ messages in thread
From: Hillf Danton @ 2015-06-11  7:12 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
	Michal Hocko

> @@ -774,6 +764,21 @@ typedef struct pglist_data {
>  	ZONE_PADDING(_pad1_)
>  	spinlock_t		lru_lock;
> 
> +	/* Fields commonly accessed by the page reclaim scanner */
> +	struct lruvec		lruvec;
> +
> +	/* Evictions & activations on the inactive file list */
> +	atomic_long_t		inactive_age;
> +
> +	/*
> +	 * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
> +	 * this zone's LRU.  Maintained by the pageout code.
> +	 */

The comment has to be updated.

> +	unsigned int inactive_ratio;
> +
> +	unsigned long		flags;
> +
> +	ZONE_PADDING(_pad2_)
>  	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
>  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
>  } pg_data_t;
> @@ -1185,7 +1185,7 @@ struct lruvec *mem_cgroup_zone_lruvec(struct zone *zone,
>  	struct lruvec *lruvec;
> 
>  	if (mem_cgroup_disabled()) {
> -		lruvec = &zone->lruvec;
> +		lruvec = zone_lruvec(zone);
>  		goto out;
>  	}
> 
> @@ -1197,8 +1197,8 @@ out:
>  	 * we have to be prepared to initialize lruvec->zone here;
>  	 * and if offlined then reonlined, we need to reinitialize it.
>  	 */
> -	if (unlikely(lruvec->zone != zone))
> -		lruvec->zone = zone;
> +	if (unlikely(lruvec->pgdat != zone->zone_pgdat))
> +		lruvec->pgdat = zone->zone_pgdat;

See below please.

>  	return lruvec;
>  }
> 
> @@ -1211,14 +1211,14 @@ out:
>   * and putback protocol: the LRU lock must be held, and the page must
>   * either be PageLRU() or the caller must have isolated/allocated it.
>   */
> -struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct zone *zone)
> +struct lruvec *mem_cgroup_page_lruvec(struct page *page, struct pglist_data *pgdat)
>  {
>  	struct mem_cgroup_per_zone *mz;
>  	struct mem_cgroup *memcg;
>  	struct lruvec *lruvec;
> 
>  	if (mem_cgroup_disabled()) {
> -		lruvec = &zone->lruvec;
> +		lruvec = &pgdat->lruvec;
>  		goto out;
>  	}
> 
> @@ -1238,8 +1238,8 @@ out:
>  	 * we have to be prepared to initialize lruvec->zone here;
>  	 * and if offlined then reonlined, we need to reinitialize it.
>  	 */
> -	if (unlikely(lruvec->zone != zone))
> -		lruvec->zone = zone;
> +	if (unlikely(lruvec->pgdat != pgdat))
> +		lruvec->pgdat = pgdat;

Given &pgdat->lruvec, we no longer need (or are able) to set lruvec->pgdat.

>  	return lruvec;
>  }
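
One possible reading of the remark above, sketched in userspace C: when the
lruvec is the one embedded in the node, its owner can be derived from its
address, so a stored back-pointer carries no extra information. This is only
an illustration under that assumption and not something the series is
confirmed to do. The helpers embedded_lruvec_pgdat(), lruvec_fixup_pgdat()
and demo_fixup() are hypothetical names, and container_of() is redefined
locally to mimic the kernel macro.

```c
#include <assert.h>
#include <stddef.h>

struct pglist_data;

struct lruvec {
	struct pglist_data *pgdat;	/* back-pointer, as in the quoted hunk */
};

struct pglist_data {
	int node_id;
	struct lruvec lruvec;		/* node-embedded LRU vector */
};

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Valid only for the embedded lruvec: derive the node from the address. */
static struct pglist_data *embedded_lruvec_pgdat(struct lruvec *lruvec)
{
	return container_of(lruvec, struct pglist_data, lruvec);
}

/* Lazy fix-up in the style of the quoted code; returns 1 if it wrote. */
static int lruvec_fixup_pgdat(struct lruvec *lruvec, struct pglist_data *pgdat)
{
	if (lruvec->pgdat != pgdat) {
		lruvec->pgdat = pgdat;
		return 1;
	}
	return 0;
}

/* The fix-up only ever stores what the address already implies. */
void demo_fixup(void)
{
	struct pglist_data node = { .node_id = 0 };

	assert(embedded_lruvec_pgdat(&node.lruvec) == &node);
	lruvec_fixup_pgdat(&node.lruvec, &node);
	assert(node.lruvec.pgdat == embedded_lruvec_pgdat(&node.lruvec));
}
```

The address trick does not apply to an lruvec that is allocated separately
from the node (for example per-memcg structures), which is where a stored
pointer and the lazy fix-up would still matter.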


end of thread, other threads:[~2015-06-21 14:04 UTC | newest]

Thread overview: 30+ messages
2015-06-08 13:56 [RFC PATCH 00/25] Move LRU page reclaim from zones to nodes Mel Gorman
2015-06-08 13:56 ` [PATCH 01/25] mm, vmstat: Add infrastructure for per-node vmstats Mel Gorman
2015-06-08 13:56 ` [PATCH 02/25] mm, vmscan: Move lru_lock to the node Mel Gorman
2015-06-08 13:56 ` [PATCH 03/25] mm, vmscan: Move LRU lists to node Mel Gorman
2015-06-08 13:56 ` [PATCH 04/25] mm, vmscan: Begin reclaiming pages on a per-node basis Mel Gorman
2015-06-08 13:56 ` [PATCH 05/25] mm, vmscan: Have kswapd only scan based on the highest requested zone Mel Gorman
2015-06-08 13:56 ` [PATCH 06/25] mm, vmscan: Avoid a second search through zones checking if compaction is required Mel Gorman
2015-06-08 13:56 ` [PATCH 07/25] mm, vmscan: Make kswapd think of reclaim in terms of nodes Mel Gorman
2015-06-08 13:56 ` [PATCH 08/25] mm, vmscan: By default have direct reclaim only shrink once per node Mel Gorman
2015-06-08 13:56 ` [PATCH 09/25] mm, vmscan: Clear congestion, dirty and need for compaction on a per-node basis Mel Gorman
2015-06-08 13:56 ` [PATCH 10/25] mm, vmscan: Make shrink_node decisions more node-centric Mel Gorman
2015-06-08 13:56 ` [PATCH 11/25] mm, workingset: Make working set detection node-aware Mel Gorman
2015-06-08 13:56 ` [PATCH 12/25] mm, page_alloc: Consider dirtyable memory in terms of nodes Mel Gorman
2015-06-08 13:56 ` [PATCH 13/25] mm: Move NR_FILE_MAPPED accounting to the node Mel Gorman
2015-06-08 13:56 ` [PATCH 14/25] mm: Rename NR_ANON_PAGES to NR_ANON_MAPPED Mel Gorman
2015-06-08 13:56 ` [PATCH 15/25] mm: Move most file-based accounting to the node Mel Gorman
2015-06-08 13:56 ` [PATCH 16/25] mm, vmscan: Update classzone_idx if buffer_heads_over_limit Mel Gorman
2015-06-08 13:56 ` [PATCH 17/25] mm, vmscan: Check if cpusets are enabled during direct reclaim Mel Gorman
2015-06-08 13:56 ` [PATCH 18/25] mm, vmscan: Only wakeup kswapd once per node for the requested classzone Mel Gorman
2015-06-08 13:56 ` [PATCH 19/25] mm, vmscan: Account in vmstat for pages skipped during reclaim Mel Gorman
2015-06-08 13:56 ` [PATCH 20/25] mm, page_alloc: Remove fair zone allocation policy Mel Gorman
2015-06-08 13:56 ` [PATCH 21/25] mm, page_alloc: Defer zlc_setup until it is known it is required Mel Gorman
2015-06-08 13:56 ` [PATCH 22/25] mm: Convert zone_reclaim to node_reclaim Mel Gorman
2015-06-08 13:56 ` [PATCH 23/25] mm, page_alloc: Delete the zonelist_cache Mel Gorman
2015-06-08 13:56 ` [PATCH 24/25] mm, page_alloc: Use ac->classzone_idx instead of zone_idx(preferred_zone) Mel Gorman
2015-06-08 13:56 ` [PATCH 25/25] mm: page_alloc: Take fewer passes when allocating to the low watermark Mel Gorman
2015-06-19 17:01 ` [RFC PATCH 00/25] Move LRU page reclaim from zones to nodes Johannes Weiner
2015-06-21 14:04   ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2015-06-11  7:12 [PATCH 03/25] mm, vmscan: Move LRU lists to node Hillf Danton
2015-06-15  8:19 ` Mel Gorman
