From: Minchan Kim <minchan@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kuo-Hsin Yang <vovoy@chromium.org>,
Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@suse.com>, Sonny Rao <sonnyrao@chromium.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v2] mm: vmscan: fix not scanning anonymous pages when detecting file refaults
Date: Sat, 29 Jun 2019 08:36:49 +0900
Message-ID: <20190628233649.GB245333@google.com>
In-Reply-To: <20190628143201.GB17212@cmpxchg.org>

On Fri, Jun 28, 2019 at 10:32:01AM -0400, Johannes Weiner wrote:
> On Fri, Jun 28, 2019 at 07:16:27PM +0800, Kuo-Hsin Yang wrote:
> > When file refaults are detected and there are many inactive file pages,
> > the system never reclaims anonymous pages. The file pages are dropped
> > aggressively while there are still a lot of cold anonymous pages, and
> > the system thrashes. This issue impacts the performance of applications
> > with large executables, e.g. chrome.
>
> This is good.
>
> > Commit 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache
> > workingset transition") introduced actual_reclaim parameter. When file
> > refaults are detected, inactive_list_is_low() may return different
> > values depending on the actual_reclaim parameter. Vmscan would only scan
> > active/inactive file lists in the file thrashing state when the following
> > 2 conditions are satisfied.
> >
> > 1) inactive_list_is_low() returns false in get_scan_count() to trigger
> > scanning file lists only.
> > 2) inactive_list_is_low() returns true in shrink_list() to allow
> > scanning active file list.
> >
> > This patch makes the return value of inactive_list_is_low() independent
> > of actual_reclaim and renames the parameter back to trace.
>
> This is not. The root cause for the problem you describe isn't the
> patch you point to. The root cause is our decision to force-scan the
> file LRU based on relative inactive:active size alone, without taking
> file thrashing into account at all. This is a much older problem.
>
> After the referenced patch, we're taking thrashing into account when
> deciding whether to deactivate active file pages or not. To solve the
> problem pointed out here, we can extend that same principle to the
> decision whether to force-scan files and skip the anon LRUs.
>
> The patch you're pointing to isn't the culprit. On the contrary, it
> provides the infrastructure to solve a much older problem.
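
To make the two call sites concrete, here is a rough userspace model of the
decision being discussed. It is a sketch, not the kernel code: struct
lru_model and the three function names are hypothetical, the
inactive:active ratio is fixed at 1 for brevity, and "inactive_file > 0"
stands in for the check that the inactive list has pages to scan at the
current priority.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the file LRU state of one lruvec. */
struct lru_model {
	unsigned long inactive_file;	/* pages on the inactive file list */
	unsigned long active_file;	/* pages on the active file list */
	bool refaults_seen;		/* file refaults observed since last check */
};

/*
 * Simplified inactive_list_is_low(): the inactive:active ratio is fixed
 * at 1 here (an assumption made for brevity) and drops to 0 when file
 * refaults are honoured, which makes the list count as "low" whenever
 * the active list is non-empty.
 */
static bool model_inactive_is_low(const struct lru_model *l, bool honour_refaults)
{
	unsigned long ratio = (honour_refaults && l->refaults_seen) ? 0 : 1;

	return l->inactive_file * ratio < l->active_file;
}

/* Force-scan decision that ignores refaults: size comparison only. */
static bool force_file_scan_old(const struct lru_model *l)
{
	return !model_inactive_is_low(l, false) && l->inactive_file > 0;
}

/* Same decision with thrashing taken into account. */
static bool force_file_scan_new(const struct lru_model *l)
{
	return !model_inactive_is_low(l, true) && l->inactive_file > 0;
}
```

With inactive_file = 4000, active_file = 1000 and refaults_seen set,
force_file_scan_old() returns true (anon LRUs are skipped) while
model_inactive_is_low(l, true) also returns true (active file pages get
deactivated), which is the thrashing combination from the two conditions
quoted above. force_file_scan_new() returns false for the same input, so
anon pages are scanned as well.
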
>
> > The problem can be reproduced by the following test program.
> >
> > ---8<---
> > #define _GNU_SOURCE
> > #include <fcntl.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/mman.h>
> > #include <sys/stat.h>
> > #include <sys/time.h>
> > #include <unistd.h>
> >
> > /* Create the backing file unless it already exists with enough size. */
> > void fallocate_file(const char *filename, off_t size)
> > {
> > 	struct stat st;
> > 	int fd;
> >
> > 	if (!stat(filename, &st) && st.st_size >= size)
> > 		return;
> >
> > 	fd = open(filename, O_WRONLY | O_CREAT, 0600);
> > 	if (fd < 0) {
> > 		perror("create file");
> > 		exit(1);
> > 	}
> > 	if (posix_fallocate(fd, 0, size)) {
> > 		perror("fallocate");
> > 		exit(1);
> > 	}
> > 	close(fd);
> > }
> >
> > /* Allocate anonymous memory and write to it so every page is populated. */
> > long *alloc_anon(long size)
> > {
> > 	long *start = malloc(size);
> >
> > 	if (!start) {
> > 		perror("malloc");
> > 		exit(1);
> > 	}
> > 	memset(start, 1, size);
> > 	return start;
> > }
> >
> > /* Touch one byte in every page of the file, once per round, and time it. */
> > long access_file(const char *filename, long size, long rounds)
> > {
> > 	int fd, i;
> > 	volatile char *start1, *end1, *start2;
> > 	const int page_size = getpagesize();
> > 	long sum = 0;
> >
> > 	fd = open(filename, O_RDONLY);
> > 	if (fd == -1) {
> > 		perror("open");
> > 		exit(1);
> > 	}
> >
> > 	/*
> > 	 * Some applications, e.g. chrome, use a lot of executable file
> > 	 * pages, map some of the pages with PROT_EXEC flag to simulate
> > 	 * the behavior.
> > 	 */
> > 	start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED,
> > 		      fd, 0);
> > 	if (start1 == MAP_FAILED) {
> > 		perror("mmap");
> > 		exit(1);
> > 	}
> > 	end1 = start1 + size / 2;
> >
> > 	start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2);
> > 	if (start2 == MAP_FAILED) {
> > 		perror("mmap");
> > 		exit(1);
> > 	}
> >
> > 	for (i = 0; i < rounds; ++i) {
> > 		struct timeval before, after;
> > 		volatile char *ptr1 = start1, *ptr2 = start2;
> >
> > 		gettimeofday(&before, NULL);
> > 		for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size)
> > 			sum += *ptr1 + *ptr2;
> > 		gettimeofday(&after, NULL);
> > 		printf("File access time, round %d: %f (sec)\n", i,
> > 		       (after.tv_sec - before.tv_sec) +
> > 		       (after.tv_usec - before.tv_usec) / 1000000.0);
> > 	}
> > 	return sum;
> > }
> >
> > int main(int argc, char *argv[])
> > {
> > 	const long MB = 1024 * 1024;
> > 	long anon_mb, file_mb, file_rounds;
> > 	const char filename[] = "large";
> > 	long *ret1;
> > 	long ret2;
> >
> > 	if (argc != 4) {
> > 		printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS\n");
> > 		exit(0);
> > 	}
> > 	anon_mb = atoi(argv[1]);
> > 	file_mb = atoi(argv[2]);
> > 	file_rounds = atoi(argv[3]);
> >
> > 	fallocate_file(filename, file_mb * MB);
> > 	printf("Allocate %ld MB anonymous pages\n", anon_mb);
> > 	ret1 = alloc_anon(anon_mb * MB);
> > 	printf("Access %ld MB file pages\n", file_mb);
> > 	ret2 = access_file(filename, file_mb * MB, file_rounds);
> > 	printf("Print result to prevent optimization: %ld\n",
> > 	       *ret1 + ret2);
> > 	return 0;
> > }
> > ---8<---
> >
> > Running the test program on a 2GB RAM VM with kernel 5.2.0-rc5, the
> > program fills RAM with 2048 MB of memory, then accesses a 200 MB file
> > 10 times. Without this patch, the file cache is dropped aggressively and
> > every access to the file is from disk.
> >
> > $ ./thrash 2048 200 10
> > Allocate 2048 MB anonymous pages
> > Access 200 MB file pages
> > File access time, round 0: 2.489316 (sec)
> > File access time, round 1: 2.581277 (sec)
> > File access time, round 2: 2.487624 (sec)
> > File access time, round 3: 2.449100 (sec)
> > File access time, round 4: 2.420423 (sec)
> > File access time, round 5: 2.343411 (sec)
> > File access time, round 6: 2.454833 (sec)
> > File access time, round 7: 2.483398 (sec)
> > File access time, round 8: 2.572701 (sec)
> > File access time, round 9: 2.493014 (sec)
> >
> > With this patch, these file pages can be cached.
> >
> > $ ./thrash 2048 200 10
> > Allocate 2048 MB anonymous pages
> > Access 200 MB file pages
> > File access time, round 0: 2.475189 (sec)
> > File access time, round 1: 2.440777 (sec)
> > File access time, round 2: 2.411671 (sec)
> > File access time, round 3: 1.955267 (sec)
> > File access time, round 4: 0.029924 (sec)
> > File access time, round 5: 0.000808 (sec)
> > File access time, round 6: 0.000771 (sec)
> > File access time, round 7: 0.000746 (sec)
> > File access time, round 8: 0.000738 (sec)
> > File access time, round 9: 0.000747 (sec)
>
> This is all good again.
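
For a rough sense of scale: the test loop reads one byte from every page of
the file in each round, so assuming 4 KiB pages (an assumption; the program
itself uses getpagesize()) a 200 MB file means 51200 page touches per
round. The two helpers below are hypothetical and only spell out that
arithmetic.

```c
/* Pages touched per round: one byte is read from every page of the file. */
static long pages_per_round(long file_bytes, long page_size)
{
	return file_bytes / page_size;
}

/* Average time in seconds spent per touched page in one round. */
static double seconds_per_page(double round_seconds, long pages)
{
	return round_seconds / (double)pages;
}
```

Under that assumption, round 0 without the patch (2.489316 sec) works out
to roughly 49 microseconds per page, while round 9 with the patch
(0.000747 sec) is roughly 15 nanoseconds per page, which is consistent with
the pages being served from disk in the first case and from the page cache
in the second.
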
>
> > Fixes: 2a2e48854d70 ("mm: vmscan: fix IO/refault regression in cache workingset transition")
>
> Please replace this line with the two Fixes: lines that I provided
> earlier in this thread.

Can't we add "Cc: <stable@vger.kernel.org> # 4.12+" so that the fix reaches
the kernels which have thrashing/workingset transition detection?