From: Minchan Kim <minchan@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, John Dias <joaodias@google.com>,
Tim Murray <timmurray@google.com>,
Matthew Wilcox <willy@infradead.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Martin Liu <liumartin@google.com>,
Johannes Weiner <hannes@cmpxchg.org>
Subject: [PATCH] mm: throttle LRU pages skipping on rmap_lock contention
Date: Thu, 26 May 2022 10:08:44 -0700
Message-ID: <Yo+0HMJYuhiJv+Ak@google.com>
On Thu, May 12, 2022 at 12:55:16PM -0700, Minchan Kim wrote:
> On Wed, May 11, 2022 at 07:05:23PM -0700, Andrew Morton wrote:
> > On Wed, 11 May 2022 15:57:09 -0700 Minchan Kim <minchan@kernel.org> wrote:
> >
> > > >
> > > > Could we burn much CPU time pointlessly churning through the LRU? Could
> > > > it mess up aging decisions enough to be performance-affecting in any
> > > > workload?
> > >
> > > Yes, correct. However, we are already churning LRUs in several
> > > ways. For example, pages are isolated from and put back on the LRU list
> > > for page migration from several sources (a typical example is compaction),
> > > and trylock_page and sc->gfp_mask can keep a page from being
> > > reclaimed in shrink_page_list.
> >
> > Well. "we're already doing a risky thing so it's OK to do more of that
> > thing"?
>
> I meant the aging is not rocket science.
>
>
> >
> > > >
> > > > Something else?
> > >
> > > One thing I was worried about is the granularity of the churning.
> > > The examples above churn at page granularity, so they might be excusable,
> > > but this one churns at address-space granularity, especially for the file LRU
> > > (i_mmap_rwsem), which might cause too much rotation and, in the end, live-lock
> > > (pages keep rotating in a small LRU under heavy memory pressure).
> > >
> > > If it turns out to be a problem, maybe we could use sc->priority to stop
> > > the skipping at a certain level of memory pressure.
> > >
> > > Any thoughts? Do we really need it?
> >
> > Are we able to think of a test which might demonstrate any worst case?
> > Whip that up and see what the numbers say?
>
> Yeah, let me create a worst-case test to see how it goes.
>
> One thread keeps reading a file-backed vma of a 2xRAM file while other threads
> keep changing other vmas mapped from the same file, causing heavy i_mmap_rwsem
> contention in the aging path.
Forking a new thread.
I checked what happens in the worst case. I am not sure how realistic the
worst case is, but it would be great to have a safety net.
From 5ccc8b170af5496f803243732e96b131419d7462 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Thu, 19 May 2022 19:48:12 -0700
Subject: [PATCH] mm: throttle LRU pages skipping on rmap_lock contention
Under heavy contention on the rmap lock (e.g., i_mmap_rwsem), the VM can keep
skipping LRU pages, so reclaim efficiency (steal/scan) drops from 48% to 27%
and the workingset is reclaimed faster than before, raising the
workingset_refault rate to 240% in the tests below.
We need a safety net to throttle skipping LRU pages. This patch throttles
the skipping policy using the (DEF_PRIORITY - 2) magic value, which the VM
has used to indicate non-light memory pressure.
IOW, skip rmap_lock-contended pages only when
sc->priority >= (DEF_PRIORITY - 2).
The test scenario to see the worst case:
1. Thread A mmaps a big file (e.g., 2x the size of RAM) and keeps touching
the address space, up to three times.
2. Thread B keeps doing mmap/munmap on the same file to cause
heavy lock contention on i_mmap_rwsem until thread A finishes
its job.
3. Measure vmstat and thread A's elapsed time.
Thread A's elapsed time:
1. vanilla: 24.64sec (5.04%)
2. rmap_skip (i.e., mm-dont-be-stuck-to-rmap-lock-on-reclaim-path.patch): 25.20sec (4.16%)
3. priority (rmap_skip + this patch): 23.62sec (6.61%)
Vmstat Comparison:
                             vanilla   rmap_skip    priority
allocstall_movable               582        9772       14643
pgactivate                       232       25865        4906
pgdeactivate                      78       17265         651
pgmajfault                        58       10639        1376
pgsteal_kswapd              15947857    15133195    15095445
pgsteal_direct                105439      583092      943195
pgscan_kswapd               24647536    52768898    28103170
pgscan_direct                8398139     3767100     7966353
workingset_refault_file     12582926    12248353    12565934
Second test scenario:
1. Thread A mmaps a big file (e.g., 2x the size of RAM) and keeps touching
the address space, up to three times.
2. Thread B keeps doing mmap/munmap on the same file to cause
heavy lock contention on i_mmap_rwsem until thread A finishes
its job.
3. Thread C keeps reading another big file using the read(2) syscall.
4. Measure vmstat and thread A's elapsed time.
Thread A's elapsed time:
1. vanilla: 27.24sec (5.29%)
2. rmap_skip: 33.54sec (3.20%)
3. priority: 28.68sec (1.26%)
Vmstat Comparison:
                             vanilla   rmap_skip    priority
allocstall_movable             15262       81258       21644
pgactivate                   3042004     3086906     3502959
pgdeactivate                 2307849     8959162     3605768
pgmajfault                       566        1059         557
pgsteal_kswapd              17557735    30861283    18385674
pgsteal_direct                955389     6353527     1233605
pgscan_kswapd               31622695    59670433    35372575
pgscan_direct                4924052    13939254     4310247
workingset_refault_file     13466538    32193161    14588019
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
include/linux/rmap.h | 5 +++--
mm/rmap.c | 6 ++++--
mm/vmscan.c | 6 ++++--
3 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 9ec23138e410..2893da3f1cd3 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -296,7 +296,8 @@ static inline int page_try_share_anon_rmap(struct page *page)
* Called from mm/vmscan.c to handle paging out
*/
int folio_referenced(struct folio *, int is_locked,
- struct mem_cgroup *memcg, unsigned long *vm_flags);
+ struct mem_cgroup *memcg, unsigned long *vm_flags,
+ bool rmap_try_lock);
void try_to_migrate(struct folio *folio, enum ttu_flags flags);
void try_to_unmap(struct folio *, enum ttu_flags flags);
@@ -418,7 +419,7 @@ void page_unlock_anon_vma_read(struct anon_vma *anon_vma);
static inline int folio_referenced(struct folio *folio, int is_locked,
struct mem_cgroup *memcg,
- unsigned long *vm_flags)
+ unsigned long *vm_flags, bool rmap_try_lock)
{
*vm_flags = 0;
return 0;
diff --git a/mm/rmap.c b/mm/rmap.c
index d4cf3ea1b616..a75c7f7a0392 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -888,6 +888,7 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
* @is_locked: Caller holds lock on the folio.
* @memcg: target memory cgroup
* @vm_flags: A combination of all the vma->vm_flags which referenced the folio.
+ * @rmap_try_lock: bail out if the rmap lock is contended
*
* Quick test_and_clear_referenced for all mappings of a folio,
*
@@ -895,7 +896,8 @@ static bool invalid_folio_referenced_vma(struct vm_area_struct *vma, void *arg)
* the function bailed out due to rmap lock contention.
*/
int folio_referenced(struct folio *folio, int is_locked,
- struct mem_cgroup *memcg, unsigned long *vm_flags)
+ struct mem_cgroup *memcg, unsigned long *vm_flags,
+ bool rmap_try_lock)
{
int we_locked = 0;
struct folio_referenced_arg pra = {
@@ -906,7 +908,7 @@ int folio_referenced(struct folio *folio, int is_locked,
.rmap_one = folio_referenced_one,
.arg = (void *)&pra,
.anon_lock = folio_lock_anon_vma_read,
- .try_lock = true,
+ .try_lock = rmap_try_lock,
};
*vm_flags = 0;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ac168f4b0492..f0987e027aba 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1381,7 +1381,8 @@ static enum page_references folio_check_references(struct folio *folio,
unsigned long vm_flags;
referenced_ptes = folio_referenced(folio, 1, sc->target_mem_cgroup,
- &vm_flags);
+ &vm_flags,
+ sc->priority >= DEF_PRIORITY - 2);
referenced_folio = folio_test_clear_referenced(folio);
/*
@@ -2497,7 +2498,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
/* Referenced or rmap lock contention: rotate */
if (folio_referenced(folio, 0, sc->target_mem_cgroup,
- &vm_flags) != 0) {
+ &vm_flags,
+ sc->priority >= DEF_PRIORITY - 2) != 0) {
/*
* Identify referenced, file-backed active pages and
* give them one more trip around the active list. So
--
2.36.1.124.g0e6072fb45-goog