* Re: [PATCH v4 07/11] mm: multigenerational lru: aging
@ 2021-08-18 13:36 kernel test robot
From: kernel test robot @ 2021-08-18 13:36 UTC
  To: kbuild

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210818063107.2696454-8-yuzhao@google.com>
References: <20210818063107.2696454-8-yuzhao@google.com>
TO: Yu Zhao <yuzhao@google.com>

Hi Yu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on linus/master v5.14-rc6]
[cannot apply to hnaz-linux-mm/master tip/x86/core next-20210818]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Yu-Zhao/Multigenerational-LRU-Framework/20210818-143330
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
:::::: branch date: 7 hours ago
:::::: commit date: 7 hours ago
config: x86_64-randconfig-s021-20210817 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-348-gf0e6938b-dirty
        # https://github.com/0day-ci/linux/commit/e51fff45b9822540a91efdeed86f4ee9dabcb1c4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Yu-Zhao/Multigenerational-LRU-Framework/20210818-143330
        git checkout e51fff45b9822540a91efdeed86f4ee9dabcb1c4
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   mm/rmap.c: note: in included file (through include/linux/ksm.h):
   include/linux/rmap.h:222:28: sparse: sparse: context imbalance in 'page_referenced_one' - unexpected unlock
>> mm/rmap.c:1127:6: sparse: sparse: context imbalance in 'do_page_add_anon_rmap' - different lock contexts for basic block
   include/linux/rmap.h:222:28: sparse: sparse: context imbalance in 'try_to_unmap_one' - unexpected unlock
   include/linux/rmap.h:222:28: sparse: sparse: context imbalance in 'try_to_migrate_one' - unexpected unlock
   include/linux/rmap.h:222:28: sparse: sparse: context imbalance in 'page_mlock_one' - unexpected unlock

vim +/do_page_add_anon_rmap +1127 mm/rmap.c

ad8c2ee801ad7a Rik van Riel            2010-08-09  1121  
ad8c2ee801ad7a Rik van Riel            2010-08-09  1122  /*
ad8c2ee801ad7a Rik van Riel            2010-08-09  1123   * Special version of the above for do_swap_page, which often runs
ad8c2ee801ad7a Rik van Riel            2010-08-09  1124   * into pages that are exclusively owned by the current process.
ad8c2ee801ad7a Rik van Riel            2010-08-09  1125   * Everybody else should continue to use page_add_anon_rmap above.
ad8c2ee801ad7a Rik van Riel            2010-08-09  1126   */
ad8c2ee801ad7a Rik van Riel            2010-08-09 @1127  void do_page_add_anon_rmap(struct page *page,
d281ee61451835 Kirill A. Shutemov      2016-01-15  1128  	struct vm_area_struct *vma, unsigned long address, int flags)
9617d95e6e9ffd Nicholas Piggin         2006-01-06  1129  {
d281ee61451835 Kirill A. Shutemov      2016-01-15  1130  	bool compound = flags & RMAP_COMPOUND;
53f9263baba69f Kirill A. Shutemov      2016-01-15  1131  	bool first;
53f9263baba69f Kirill A. Shutemov      2016-01-15  1132  
be5d0a74c62d8d Johannes Weiner         2020-06-03  1133  	if (unlikely(PageKsm(page)))
be5d0a74c62d8d Johannes Weiner         2020-06-03  1134  		lock_page_memcg(page);
be5d0a74c62d8d Johannes Weiner         2020-06-03  1135  	else
be5d0a74c62d8d Johannes Weiner         2020-06-03  1136  		VM_BUG_ON_PAGE(!PageLocked(page), page);
be5d0a74c62d8d Johannes Weiner         2020-06-03  1137  
53f9263baba69f Kirill A. Shutemov      2016-01-15  1138  	if (compound) {
53f9263baba69f Kirill A. Shutemov      2016-01-15  1139  		atomic_t *mapcount;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  1140  		VM_BUG_ON_PAGE(!PageLocked(page), page);
53f9263baba69f Kirill A. Shutemov      2016-01-15  1141  		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
53f9263baba69f Kirill A. Shutemov      2016-01-15  1142  		mapcount = compound_mapcount_ptr(page);
53f9263baba69f Kirill A. Shutemov      2016-01-15  1143  		first = atomic_inc_and_test(mapcount);
53f9263baba69f Kirill A. Shutemov      2016-01-15  1144  	} else {
53f9263baba69f Kirill A. Shutemov      2016-01-15  1145  		first = atomic_inc_and_test(&page->_mapcount);
53f9263baba69f Kirill A. Shutemov      2016-01-15  1146  	}
53f9263baba69f Kirill A. Shutemov      2016-01-15  1147  
53f9263baba69f Kirill A. Shutemov      2016-01-15  1148  	if (first) {
6c357848b44b40 Matthew Wilcox (Oracle) 2020-08-14  1149  		int nr = compound ? thp_nr_pages(page) : 1;
bea04b073292b2 Jianyu Zhan             2014-06-04  1150  		/*
bea04b073292b2 Jianyu Zhan             2014-06-04  1151  		 * We use the irq-unsafe __{inc|mod}_zone_page_stat because
bea04b073292b2 Jianyu Zhan             2014-06-04  1152  		 * these counters are not modified in interrupt context, and
bea04b073292b2 Jianyu Zhan             2014-06-04  1153  		 * pte lock(a spinlock) is held, which implies preemption
bea04b073292b2 Jianyu Zhan             2014-06-04  1154  		 * disabled.
bea04b073292b2 Jianyu Zhan             2014-06-04  1155  		 */
65c453778aea37 Kirill A. Shutemov      2016-07-26  1156  		if (compound)
69473e5de87389 Muchun Song             2021-02-24  1157  			__mod_lruvec_page_state(page, NR_ANON_THPS, nr);
be5d0a74c62d8d Johannes Weiner         2020-06-03  1158  		__mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
79134171df2381 Andrea Arcangeli        2011-01-13  1159  	}
5ad6468801d28c Hugh Dickins            2009-12-14  1160  
be5d0a74c62d8d Johannes Weiner         2020-06-03  1161  	if (unlikely(PageKsm(page))) {
be5d0a74c62d8d Johannes Weiner         2020-06-03  1162  		unlock_page_memcg(page);
be5d0a74c62d8d Johannes Weiner         2020-06-03  1163  		return;
be5d0a74c62d8d Johannes Weiner         2020-06-03  1164  	}
53f9263baba69f Kirill A. Shutemov      2016-01-15  1165  
5dbe0af47f8a8f Hugh Dickins            2011-05-28  1166  	/* address might be in next vma when migration races vma_adjust */
5ad6468801d28c Hugh Dickins            2009-12-14  1167  	if (first)
d281ee61451835 Kirill A. Shutemov      2016-01-15  1168  		__page_set_anon_rmap(page, vma, address,
d281ee61451835 Kirill A. Shutemov      2016-01-15  1169  				flags & RMAP_EXCLUSIVE);
69029cd550284e KAMEZAWA Hiroyuki       2008-07-25  1170  	else
c97a9e10eaee32 Nicholas Piggin         2007-05-16  1171  		__page_check_anon_rmap(page, vma, address);
^1da177e4c3f41 Linus Torvalds          2005-04-16  1172  }
^1da177e4c3f41 Linus Torvalds          2005-04-16  1173  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org


* [PATCH v4 00/11] Multigenerational LRU Framework
@ 2021-08-18  6:30 Yu Zhao
  2021-08-18  6:31 ` [PATCH v4 07/11] mm: multigenerational lru: aging Yu Zhao
From: Yu Zhao @ 2021-08-18  6:30 UTC
  To: linux-mm; +Cc: linux-kernel, Hillf Danton, page-reclaim, Yu Zhao

TLDR
====
The current page reclaim is too expensive in terms of CPU usage and it
often makes poor choices about what to evict. This patchset offers an
alternative solution that is performant, versatile and
straightforward.

Repo
====
git fetch https://linux-mm.googlesource.com/page-reclaim refs/changes/81/1281/1

Problems
========
Active/inactive
---------------
Data centers need to predict whether a job can successfully land on a
machine without actually impacting the existing jobs. The granularity
of the active/inactive lists is too coarse to be useful for job
schedulers making such decisions. In addition, data centers need to
monitor their memory utilization for horizontal scaling. Active and
inactive are relative terms and therefore cannot give any insight into
a pool of machines, e.g., aggregating the active/inactive sizes across
multiple machines without a common frame of reference yields no
meaningful results.

Phones and laptops need to make good choices about what to evict
because they are more sensitive to major faults and power consumption.
Major faults can cause jank, i.e., slow UI rendering, and negatively
impact user experience. The selection between anon and file types has
been suboptimal because it is difficult to compare the access patterns
of the two types. On phones and laptops, executable pages are
frequently evicted despite the fact that there are many less
frequently used anon pages. Conversely, on workstations building large
projects, anon pages are sometimes swapped out while there are many
less recently used file pages.

Fundamentally, the notion of active/inactive has very limited ability
to measure temporal locality.

Rmap walk
---------
Traversing a list of pages and searching the rmap for PTEs mapping
each page can be very expensive because those pages are likely to be
unrelated. For workloads using a high percentage of anon memory, the
rmap becomes a bottleneck in page reclaim. For example, kswapd can
easily spend more CPU time in the rmap than in anything else on
laptops running Chrome, and the kernel can spend more CPU time in the
rmap than in any other function on servers that heavily overcommit
anon memory.

Simply put, page reclaim does not take advantage of spatial locality
when it uses the rmap to test the accessed bit over a large number of
pages.

Solutions
=========
Generations
-----------
This solution introduces a temporal dimension. Each generation is a
dot on the timeline and its population includes all mapped pages that
have been accessed since the birth of this generation.

All eviction choices are made based on generation numbers, which are
simple and yet effective. A large number of pages can be spread out
across many generations. Since each generation is timestamped at
birth, its population is aggregatable across different machines. This
is especially useful for data centers that require working set
estimation and proactive reclaim.
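
A minimal userspace sketch of why timestamped generations aggregate
across machines, under stated assumptions: struct gen_sample, its
fields and pages_accessed_within() are hypothetical names for
illustration and are not part of this patchset. Because each sample
carries an absolute birth time, the same time window can be applied to
every machine and the results added together.

```c
#include <stddef.h>

/* Hypothetical snapshot of one generation; names are illustrative. */
struct gen_sample {
	unsigned long long birth_ms;	/* when the generation was created */
	unsigned long long nr_pages;	/* its population at snapshot time */
};

/*
 * Sum the populations of all generations born within the last
 * window_ms before now_ms. Every page counted here belongs to a
 * generation younger than the window, so it was accessed inside it.
 */
unsigned long long pages_accessed_within(const struct gen_sample *gens,
					  size_t nr_gens,
					  unsigned long long now_ms,
					  unsigned long long window_ms)
{
	unsigned long long total = 0;
	size_t i;

	for (i = 0; i < nr_gens; i++) {
		/* Ignore samples whose birth time lies in the future. */
		if (gens[i].birth_ms > now_ms)
			continue;
		if (now_ms - gens[i].birth_ms <= window_ms)
			total += gens[i].nr_pages;
	}
	return total;
}
```

Calling this once per machine with the same now_ms and window_ms gives
per-machine figures that share a frame of reference, which is what the
active/inactive sizes lack.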

Page table walk
---------------
Each walk traverses an mm_struct list to scan PTEs for accessed pages
only. Processes that have been sleeping since the last walk are
skipped. The cost of this solution is roughly proportional to the
number of accessed pages. Since page tables usually have good spatial
locality for workloads using a high percentage of anon memory, the end
result is generally a significant reduction in kswapd CPU usage.

Note that page table walks are conditional and therefore do not
replace the rmap. For workloads that have sparse mappings, this
solution falls back to the rmap.
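
The walk described above can be modeled roughly as follows. This is a
userspace sketch under stated assumptions, not the code in this
patchset: struct mm_model, PTE_ACCESSED and walk_mm_list() are
hypothetical, and a flat array stands in for a real page table. It
only illustrates skipping processes that have been sleeping and
harvesting the accessed bit; it scans every entry of a process that
ran, so it does not reproduce the cost characteristics of the real
walk.

```c
#include <stdbool.h>
#include <stddef.h>

#define PTE_ACCESSED 0x1u	/* illustrative flag, not a real PTE layout */

/* Hypothetical stand-in for an mm_struct on the walk list. */
struct mm_model {
	struct mm_model *next;
	bool ran_since_last_walk;	/* false: process has been sleeping */
	unsigned int *ptes;		/* flat array modeling the page table */
	size_t nr_ptes;
};

/* Walk the list; return how many entries had the accessed bit set. */
size_t walk_mm_list(struct mm_model *head)
{
	size_t young = 0;
	struct mm_model *mm;

	for (mm = head; mm; mm = mm->next) {
		size_t i;

		/* Sleeping processes cannot have set any accessed bits. */
		if (!mm->ran_since_last_walk)
			continue;

		for (i = 0; i < mm->nr_ptes; i++) {
			if (mm->ptes[i] & PTE_ACCESSED) {
				/* Clear the bit so the next walk sees new accesses only. */
				mm->ptes[i] &= ~PTE_ACCESSED;
				young++;
			}
		}
		mm->ran_since_last_walk = false;
	}
	return young;
}
```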

Use cases
=========
Page cache overcommit
---------------------
Tiers within each generation are specifically designed to improve the
performance of page cache under memory pressure. The fio/io_uring
benchmark shows a 14% increase in IOPS for buffered I/O.

Without this patchset, the profile of fio/io_uring looks like:
  12.03%  __page_cache_alloc
   6.53%  shrink_active_list
   2.53%  mark_page_accessed

With this patchset, it looks like:
   9.45%  __page_cache_alloc
   0.52%  mark_page_accessed

Essentially, the idea of tiers is a feedback loop based on trial and
error. Instead of unconditionally moving file pages to the active list
upon the second access, this solution monitors refaults and
conditionally protects file pages with outlying refaults.
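
One way to picture the feedback loop is the sketch below. It assumes
that "outlying" means a tier whose refault rate exceeds that of a base
tier; this criterion, struct tier_stats and
tier_should_be_protected() are assumptions made for illustration, and
the test actually used by this patchset may differ.

```c
#include <stdbool.h>

/* Hypothetical per-tier counters; names are illustrative. */
struct tier_stats {
	unsigned long evicted;		/* pages evicted from this tier */
	unsigned long refaulted;	/* evicted pages that faulted back in */
};

/*
 * Protect a tier only if its refault rate (refaulted / evicted) is
 * higher than the base tier's. The rates are compared by
 * cross-multiplication to avoid division.
 */
bool tier_should_be_protected(const struct tier_stats *base,
			      const struct tier_stats *tier)
{
	/* No evictions yet means no evidence for protection. */
	if (!tier->evicted)
		return false;

	return (unsigned long long)tier->refaulted * base->evicted >
	       (unsigned long long)base->refaulted * tier->evicted;
}
```

The point of the sketch is that protection is earned from observed
refaults rather than granted unconditionally on the second access.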

Anon memory overcommit
----------------------
Our real-world benchmark that browses popular websites in multiple
Chrome tabs demonstrates 51% less CPU usage from kswapd and 52% less
PSI (full).

Without this patchset, the profile of kswapd looks like:
  31.03%  page_vma_mapped_walk
  25.59%  lzo1x_1_do_compress
   4.63%  do_raw_spin_lock
   3.89%  vma_interval_tree_iter_next
   3.33%  vma_interval_tree_subtree_search

With this patchset, it looks like:
  49.36%  lzo1x_1_do_compress
   4.54%  page_vma_mapped_walk
   4.45%  memset_erms
   3.47%  walk_pte_range
   2.88%  zram_bvec_rw

In addition, direct reclaim latency is reduced by 22% at the 99th
percentile and the number of refaults is reduced by 7%. Both metrics
are important to phones and laptops as they are highly correlated with
user experience.

Working set estimation
----------------------
Userspace can invoke the aging by writing "+ memcg_id node_id max_gen
[swappiness]" to /sys/kernel/debug/lru_gen. Reading this debugfs
interface returns the birth time and the population of each
generation.
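
A minimal sketch of how userspace might build and submit the aging
command, assuming the format quoted above. The helper names
format_aging_cmd() and write_cmd(), the argument types, and the
convention that a negative swappiness omits the optional field are all
assumptions of this sketch; only the command format and the debugfs
path come from this cover letter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path given in the cover letter. */
#define LRU_GEN_PATH "/sys/kernel/debug/lru_gen"

/*
 * Build "+ memcg_id node_id max_gen [swappiness]". A negative
 * swappiness (a convention of this sketch only) omits the optional
 * field. Returns the snprintf() result.
 */
int format_aging_cmd(char *buf, size_t len, unsigned int memcg_id,
		     unsigned int node_id, unsigned long max_gen,
		     int swappiness)
{
	assert(buf && len > 0);

	if (swappiness < 0)
		return snprintf(buf, len, "+ %u %u %lu",
				memcg_id, node_id, max_gen);
	return snprintf(buf, len, "+ %u %u %lu %d",
			memcg_id, node_id, max_gen, swappiness);
}

/* Write one command string to the given file; returns 0 on success. */
int write_cmd(const char *path, const char *cmd)
{
	size_t n = strlen(cmd);
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fwrite(cmd, 1, n, f) == n;
	/* fclose() flushes, so a failed write may only surface here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would format the command into a buffer and pass it to
write_cmd(LRU_GEN_PATH, buf); doing so typically requires root and a
mounted debugfs.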

Given a pool of machines, by periodically invoking the aging, a job
scheduler is able to rank these machines based on the sizes of their
working sets and in turn select the most suitable ones to land new
jobs on.

Proactive reclaim
-----------------
Userspace can invoke the eviction by writing "- memcg_id node_id
min_gen [swappiness] [nr_to_reclaim]" to /sys/kernel/debug/lru_gen.
Multiple command lines are supported, as is concatenation with
delimiters "," and ";".
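
A minimal sketch of batching several eviction commands into one
string, assuming the format and the ";" delimiter quoted above.
append_evict_cmd() and its argument types are hypothetical, and the
optional swappiness and nr_to_reclaim fields are left out to keep the
example short.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append "- memcg_id node_id min_gen" to buf, which must already hold
 * a NUL-terminated string (possibly empty). Commands after the first
 * are joined with ';'. Returns 0 on success, or -1 if the result would
 * not fit, in which case buf is left unchanged.
 */
int append_evict_cmd(char *buf, size_t len, unsigned int memcg_id,
		     unsigned int node_id, unsigned long min_gen)
{
	size_t used = strlen(buf);
	int n;

	if (used >= len)
		return -1;

	n = snprintf(buf + used, len - used, "%s- %u %u %lu",
		     used ? ";" : "", memcg_id, node_id, min_gen);
	if (n < 0 || (size_t)n >= len - used) {
		buf[used] = '\0';	/* undo the partial append */
		return -1;
	}
	return 0;
}
```

The resulting string can then be written to /sys/kernel/debug/lru_gen
in a single write.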

A typical use case is that a job scheduler invokes the eviction in
anticipation of new jobs. The savings from proactive reclaim can
provide a certain SLA for landing these new jobs.

Yu Zhao (11):
  mm: x86, arm64: add arch_has_hw_pte_young()
  mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG
  mm/vmscan.c: refactor shrink_node()
  mm: multigenerational lru: groundwork
  mm: multigenerational lru: protection
  mm: multigenerational lru: mm_struct list
  mm: multigenerational lru: aging
  mm: multigenerational lru: eviction
  mm: multigenerational lru: user interface
  mm: multigenerational lru: Kconfig
  mm: multigenerational lru: documentation

 Documentation/vm/index.rst          |    1 +
 Documentation/vm/multigen_lru.rst   |  134 ++
 arch/Kconfig                        |    9 +
 arch/arm64/include/asm/cpufeature.h |   19 +-
 arch/arm64/include/asm/pgtable.h    |   10 +-
 arch/arm64/kernel/cpufeature.c      |   19 +
 arch/arm64/mm/proc.S                |   12 -
 arch/arm64/tools/cpucaps            |    1 +
 arch/x86/Kconfig                    |    1 +
 arch/x86/include/asm/pgtable.h      |    9 +-
 arch/x86/mm/pgtable.c               |    5 +-
 fs/exec.c                           |    2 +
 fs/fuse/dev.c                       |    3 +-
 include/linux/cgroup.h              |   15 +-
 include/linux/memcontrol.h          |    9 +
 include/linux/mm.h                  |   34 +
 include/linux/mm_inline.h           |  201 ++
 include/linux/mm_types.h            |  107 ++
 include/linux/mmzone.h              |  103 ++
 include/linux/nodemask.h            |    1 +
 include/linux/oom.h                 |   16 +
 include/linux/page-flags-layout.h   |   19 +-
 include/linux/page-flags.h          |    4 +-
 include/linux/pgtable.h             |   16 +-
 include/linux/sched.h               |    3 +
 include/linux/swap.h                |    1 +
 kernel/bounds.c                     |    3 +
 kernel/cgroup/cgroup-internal.h     |    1 -
 kernel/exit.c                       |    1 +
 kernel/fork.c                       |   10 +
 kernel/kthread.c                    |    1 +
 kernel/sched/core.c                 |    2 +
 mm/Kconfig                          |   59 +
 mm/huge_memory.c                    |    3 +-
 mm/memcontrol.c                     |   28 +
 mm/memory.c                         |   21 +-
 mm/mm_init.c                        |    6 +-
 mm/mmzone.c                         |    2 +
 mm/oom_kill.c                       |    4 +-
 mm/rmap.c                           |    7 +
 mm/swap.c                           |   55 +-
 mm/swapfile.c                       |    2 +
 mm/vmscan.c                         | 2674 ++++++++++++++++++++++++++-
 mm/workingset.c                     |  119 +-
 44 files changed, 3591 insertions(+), 161 deletions(-)
 create mode 100644 Documentation/vm/multigen_lru.rst

-- 
2.33.0.rc1.237.g0d66db33f3-goog


