Re: [PATCH] mm: throttle LRU pages skipping on rmap_lock contention

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, John Dias <joaodias@google.com>,
	Tim Murray <timmurray@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Martin Liu <liumartin@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH] mm: throttle LRU pages skipping on rmap_lock contention
Date: Tue, 31 May 2022 11:26:09 -0700	[thread overview]
Message-ID: <YpZdwRDuX/aQoAGu@google.com> (raw)
In-Reply-To: <Yo+0HMJYuhiJv+Ak@google.com>

Bump up.

On Thu, May 26, 2022 at 10:08:44AM -0700, Minchan Kim wrote:
> On Thu, May 12, 2022 at 12:55:16PM -0700, Minchan Kim wrote:
> > On Wed, May 11, 2022 at 07:05:23PM -0700, Andrew Morton wrote:
> > > On Wed, 11 May 2022 15:57:09 -0700 Minchan Kim <minchan@kernel.org> wrote:
> > > 
> > > > > 
> > > > > Could we burn much CPU time pointlessly churning though the LRU?  Could
> > > > > it mess up aging decisions enough to be performance-affecting in any
> > > > > workload?
> > > > 
> > > > Yes, correct. However, we are already churning LRUs by several
> > > > ways. For example, isolate and putback from LRU list for page
> > > > migration from several sources(typical example is compaction)
> > > > and trylock_page and sc->gfp_mask not allowing page to be
> > > > reclaimed in shrink_page_list.
> > > 
> > > Well.  "we're already doing a risky thing so it's OK to do more of that
> > > thing"?
> > 
> > I meant the aging is not rocket science.
> > 
> > 
> > > 
> > > > > 
> > > > > Something else?
> > > > 
> > > > One thing I am worry about was the granularity of the churning.
> > > > Example above was page granuarity churning so might be execuse
> > > > but this one is address space's churning, especically for file LRU
> > > > (i_mmap_rwsem) which might cause too many rotating and live-lock
> > > > in the end(keey rotating in small LRU with heavy memory pressure).
> > > > 
> > > > If it could be a problem, maybe we use sc->priority to stop
> > > > the skipping on a certain level of memory pressure.
> > > > 
> > > > Any thought? Do we really need it?
> > > 
> > > Are we able to think of a test which might demonstrate any worst case? 
> > > Whip that up and see what the numbers say?
> > 
> > Yeah, let me create a worst test case to see how it goes.
> > 
> > A thread keep reading a file-backed vma with 2xRAM file but other threads
> > keep changing other vmas mapped at the same file so heavy i_mmap_rwsem
> > contention in aging path.
> 
> Forking new thread
> 
> I checked what happens the worst case. I am not sure how the worst
> case is realistic but would be great to have safety net.
> 
> From 5ccc8b170af5496f803243732e96b131419d7462 Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Thu, 19 May 2022 19:48:12 -0700
> Subject: [PATCH] mm: throttle LRU pages skipping on rmap_lock contention
> 
> On heavy contention on rmap_lock(e.g., i_mmap_rwsem), VM can keep
> skipping LRU pages so reclaim efficiency(steal/scanning) would drop
> from 48% to 27% and workingset would be reclaimed faster than old
> so workingset_refault rate increased to 240%.
> 
> We need a safe net to throttle the skipping LRU pages. This patch
> throttle the skipping policy using (DEF_PRIRORITY - 2) magic value
> VM has used for indicating non-light memory pressure.
> IOW, let's skip rmap_lock contendeded pages only when
> only when sc->priority >= (DEF_PRIRORITY - 2).
> 
> The test scenario to see the worst case:
> 
> 1. A thread mmap a big file(e.g., 2x times of RAM) and keep touching
>    the address space up to three times.
> 2. B thread keeps doing mmap/munmap with the same file to cause
>    heavy lock contention in i_mmap_rwsem until the A thread finish
>    the job.
> 3. measure vmstat and thread A's elapsed time.
> 
> Thread's elapsed time:
> 
> 1. vanilla
> 24.64sec(5.04%)
> 
> 2. rmap_skip(i.e., mm-dont-be-stuck-to-rmap-lock-on-reclaim-path.patch)
> 25.20sec(4.16%)
> 
> 3. priority(2 + this patch)
> 23.62sec(6.61%)
> 
> Vmstat Comparison:
> 				     vanilla    rmap_skip    priority
> 	     allocstall_movable          582         9772       14643
> 		     pgactivate          232        25865        4906
>       		   pgdeactivate           78        17265         651
>         	     pgmajfault           58        10639        1376
>     		 pgsteal_kswapd     15947857     15133195    15095445
>     		 pgsteal_direct       105439       583092      943195
>      	          pgscan_kswapd     24647536     52768898    28103170
>      		  pgscan_direct      8398139      3767100     7966353
> 	workingset_refault_file     12582926     12248353    12565934
> 
> B test scenario
> 
> 1. A thread mmap a big file(e.g., 2x times of RAM) and keep touching
>    the address space up to three times.
> 2. B thread keeps doing mmap/munmap with the same file to cause
>    heavy lock contention in i_mmap_rwsem until the A thread finish
>    the job.
> 3. C thread keep reading other big file using read(2) syscall
> 4. measure vmstat and thread A's elapsed time.
> 
> 1. vanilla
> 27.24sec(5.29%)
> 
> 2. rmap_skip
> 33.54sec(3.20%)
> 
> 3. priority
> 28.68sec(1.26%)
> 
> Vmstat Comparison:
> 				     vanilla    rmap_skip    priority
> 	     allocstall_movable        15262        81258       21644
>         	     pgactivate      3042004      3086906     3502959
>       		   pgdeactivate      2307849      8959162     3605768
>         	     pgmajfault          566         1059	  557
>     		 pgsteal_kswapd     17557735     30861283    18385674
>     		 pgsteal_direct       955389      6353527     1233605
>      		  pgscan_kswapd     31622695     59670433    35372575
> 		  pgscan_direct      4924052     13939254     4310247
> 	workingset_refault_file     13466538     32193161    14588019
> 
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  include/linux/rmap.h | 5 +++--
>  mm/rmap.c            | 6 ++++--
>  mm/vmscan.c          | 6 ++++--
>  3 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 9ec23138e410..2893da3f1cd3 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -296,7 +296,8 @@ static inline int page_try_share_anon_rmap(struct page *page)
>   * Called from mm/vmscan.c to handle paging out
>   */
>  int folio_referenced(struct folio *, int is_locked,
> -			struct mem_cgroup *memcg, unsigned long *vm_flags);
> +			struct mem_cgroup *memcg, unsigned long *vm_flags,
> +			bool rmap_try_lock);
>  
>  void try_to_migrate(struct folio *folio, enum ttu_flags flags);
>  void try_to_unmap(struct folio *, enum ttu_flags flags);
> @@ -418,7 +419,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma);
>  
>  static inline int folio_referenced(struct folio *folio, int is_locked,
>  				  struct mem_cgroup *memcg,
> -				  unsigned long *vm_flags)
> +				  unsigned long *vm_flags, bool rmap_try_lock)
>  {
>  	*vm_flags = 0;
>  	return 0;
> diff --git a/mm/rmap.c b/mm/rmap.c
> index d4cf3ea1b616..a75c7f7a0392 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -888,6 +888,7 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
>   * @is_locked: Caller holds lock on the folio.
>   * @memcg: target memory cgroup
>   * @vm_flags: A combination of all the vma->vm_flags which referenced the folio.
> + * @rmap_try_lock: bail out if the rmap lock is contended
>   *
>   * Quick test_and_clear_referenced for all mappings of a folio,
>   *
> @@ -895,7 +896,8 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
>   * the function bailed out due to rmap lock contention.
>   */
>  int folio_referenced(struct folio *folio, int is_locked,
> -		     struct mem_cgroup *memcg, unsigned long *vm_flags)
> +		     struct mem_cgroup *memcg, unsigned long *vm_flags,
> +		     bool rmap_try_lock)
>  {
>  	int we_locked = 0;
>  	struct folio_referenced_arg pra = {
> @@ -906,7 +908,7 @@ int folio_referenced(struct folio *folio, int is_locked,
>  		.rmap_one = folio_referenced_one,
>  		.arg = (void *)&pra,
>  		.anon_lock = folio_lock_anon_vma_read,
> -		.try_lock = true,
> +		.try_lock = rmap_try_lock,
>  	};
>  
>  	*vm_flags = 0;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ac168f4b0492..f0987e027aba 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1381,7 +1381,8 @@ static enum page_references folio_check_references(struct folio *folio,
>  	unsigned long vm_flags;
>  
>  	referenced_ptes = folio_referenced(folio, 1, sc->target_mem_cgroup,
> -					   &vm_flags);
> +					   &vm_flags,
> +					   sc->priority >= DEF_PRIORITY - 2);
>  	referenced_folio = folio_test_clear_referenced(folio);
>  
>  	/*
> @@ -2497,7 +2498,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  
>  		/* Referenced or rmap lock contention: rotate */
>  		if (folio_referenced(folio, 0, sc->target_mem_cgroup,
> -				     &vm_flags) != 0) {
> +				     &vm_flags,
> +				     sc->priority >= DEF_PRIORITY - 2) != 0) {
>  			/*
>  			 * Identify referenced, file-backed active pages and
>  			 * give them one more trip around the active list. So
> -- 
> 2.36.1.124.g0e6072fb45-goog
>

     prev parent reply	other threads:[~2022-05-31 18:26 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-26 17:08 [PATCH] mm: throttle LRU pages skipping on rmap_lock contention Minchan Kim
2022-05-31 18:26 ` Minchan Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YpZdwRDuX/aQoAGu@google.com \
    --to=minchan@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=joaodias@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liumartin@google.com \
    --cc=mhocko@suse.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=vdavydov.dev@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.