From: Wu Fengguang <fengguang.wu@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>, Elladan <elladan@eskimo.com>,
Nick Piggin <npiggin@suse.de>,
Johannes Weiner <hannes@cmpxchg.org>,
Peter Zijlstra <peterz@infradead.org>,
Rik van Riel <riel@redhat.com>, "tytso@mit.edu" <tytso@mit.edu>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"minchan.kim@gmail.com" <minchan.kim@gmail.com>
Subject: Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
Date: Tue, 19 May 2009 13:09:32 +0800
Message-ID: <20090519050932.GB8769@localhost>
In-Reply-To: <20090519133422.4ECC.A69D9226@jp.fujitsu.com>
[-- Attachment #1: Type: text/plain, Size: 4860 bytes --]
On Tue, May 19, 2009 at 12:41:38PM +0800, KOSAKI Motohiro wrote:
> Hi
>
> Thanks for the great work.
>
>
> > SUMMARY
> > =======
> > The patch decreases the number of major faults from 50 to 3 during 10% cache hot reads.
> >
> >
> > SCENARIO
> > ========
> > The test scenario is to do 100001 preads (size=110 pages, offset=(i*100) pages),
> > so each read overlaps the previous one by 10 pages and 10% of the pages will be activated:
> >
> > for i in `seq 0 100 10000000`; do echo $i 110; done > pattern-hot-10
> > iotrace.rb --load pattern-hot-10 --play /b/sparse
>
>
> Where can I download iotrace.rb?
It's in the attachment. It relies on some Ruby libraries.
> > and monitor /proc/vmstat during the run. The test box has 2GB of memory.
> >
> >
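Why this pattern is "10% cache hot": each 110-page read starts 100 pages after the previous one, so its first 10 pages are a second reference to pages the previous read just brought in. Below is a minimal C sketch that replays the pattern in memory (no real I/O) and counts the references. simulate_pattern() is a hypothetical helper, not part of iotrace.rb, and it assumes one reference per page per read:

```c
#include <stdlib.h>

#define STRIDE_PAGES 100	/* read i starts at i * 100 pages */
#define READ_PAGES   110	/* each pread() covers 110 pages */

/*
 * Replay nr_reads reads of the pattern in memory and count the distinct
 * pages touched and the pages touched a second time.
 * Returns 0 on success, -1 on bad input or allocation failure.
 */
static int simulate_pattern(long nr_reads, long *distinct, long *reread)
{
	long nr_pages, i, p;
	unsigned char *refs;

	if (nr_reads < 1)
		return -1;
	nr_pages = (nr_reads - 1) * STRIDE_PAGES + READ_PAGES;
	refs = calloc(nr_pages, 1);
	if (refs == NULL)
		return -1;

	for (i = 0; i < nr_reads; i++)
		for (p = i * STRIDE_PAGES; p < i * STRIDE_PAGES + READ_PAGES; p++)
			refs[p]++;	/* at most 2, since READ_PAGES < 2 * STRIDE_PAGES */

	*distinct = 0;
	*reread = 0;
	for (p = 0; p < nr_pages; p++) {
		if (refs[p] >= 1)
			(*distinct)++;
		if (refs[p] >= 2)
			(*reread)++;
	}
	free(refs);
	return 0;
}
```

For 1000 reads it reports 100010 distinct pages, of which 9990 (about 10%) are referenced twice. Those second references are what the scenario relies on to activate 10% of the pages.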
> > ANALYSIS
> > ========
> >
> > I carried out two runs on a freshly booted, console-mode 2.6.29 with the
> > VM_EXEC patch, and fetched the vmstat numbers at
> >
> > (1) begin: shortly after the big read IO starts;
> > (2) end: just before the big read IO stops;
> > (3) restore: after the big read IO stops and the zsh working set is restored.
> >
> >          nr_mapped nr_active_file nr_inactive_file pgmajfault pgdeactivate   pgfree
> > begin:        2481           2237             8694        630            0   574299
> > end:           275         231976           233914        633       776271 20933042
> > restore:       370         232154           234524        691       777183 20958453
> >
> > begin:        2434           2237             8493        629            0   574195
> > end:           284         231970           233536        632       771918 20896129
> > restore:       399         232218           234789        690       774526 20957909
> >
> > and another run on 2.6.30-rc4-mm with the VM_EXEC logic disabled:
>
> I don't think this is a proper comparison.
> You need one of the following comparisons; otherwise we insert a lot of
> guesswork into the analysis.
>
> - 2.6.29 with and without VM_EXEC patch
> - 2.6.30-rc4-mm with and without VM_EXEC patch
I don't think it matters that much when it comes to "relative" numbers.
But anyway, I guess you want to try it on a more typical desktop ;)
Unfortunately, Xorg is currently broken on my test box.
> >
> >          nr_mapped nr_active_file nr_inactive_file pgmajfault pgdeactivate   pgfree
> > begin:        2479           2344             9659        210            0   579643
> > end:           284         232010           234142        260       772776 20917184
> > restore:       379         232159           234371        301       774888 20967849
> >
> > The numbers show that
> >
> > - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
> > I'd attribute that improvement to the mmap readahead improvements :-)
> >
> > - The pgmajfault increment during the big read IO is 633-630=3 (2.6.29 with
> > the patch) vs 260-210=50 (2.6.30-rc4-mm with the VM_EXEC logic disabled).
> > That's a huge improvement, which means that with the VM_EXEC protection
> > logic, active mmap pages are pretty safe even under partially cache hot
> > streaming IO.
> >
> > - When the active:inactive file LRU sizes reach 1:1, their scan rates are
> > 1:20.8 under 10% cache hot IO (computed as Δpgdeactivate:Δpgfree, i.e. the
> > ratio of the two counters' increments). That roughly means the active mmap
> > pages get 20.8 times as many chances to be re-referenced and stay in memory.
> >
> > - The absolute nr_mapped drops considerably, to about 1/9, during the big
> > IO, and the dropped pages are mostly inactive ones. The patch has almost no
> > impact in this respect, which means it won't unnecessarily increase memory
> > pressure. (In contrast, your 20% mmap protection ratio would keep them all,
> > and therefore eliminate the extra 41 major faults needed to restore the
> > working set of zsh etc.)
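The deltas in the bullets above are plain differences between two /proc/vmstat snapshots. Below is a minimal C sketch, assuming the snapshots have already been read into a hypothetical struct vmstat_sample that holds only the counters used here:

```c
/* A subset of /proc/vmstat counters, as in the tables above (hypothetical type). */
struct vmstat_sample {
	long nr_mapped;
	long pgmajfault;
	long pgdeactivate;
	long pgfree;
};

/* Major faults taken between two samples. */
static long majfault_delta(const struct vmstat_sample *from,
			   const struct vmstat_sample *to)
{
	return to->pgmajfault - from->pgmajfault;
}

/* Factor by which nr_mapped shrank between two samples. */
static double mapped_shrink_factor(const struct vmstat_sample *from,
				   const struct vmstat_sample *to)
{
	return (double)from->nr_mapped / (double)to->nr_mapped;
}

/*
 * Dpgfree / Dpgdeactivate over a window: pages freed (mostly reclaimed from
 * the inactive list during streaming IO) per page deactivated from the
 * active list.  Yields infinity if nothing was deactivated in the window.
 */
static double scan_ratio(const struct vmstat_sample *from,
			 const struct vmstat_sample *to)
{
	return (double)(to->pgfree - from->pgfree) /
	       (double)(to->pgdeactivate - from->pgdeactivate);
}
```

With the begin/end rows of the first 2.6.29 run, majfault_delta() returns 3 and mapped_shrink_factor() about 9.0; with the 2.6.30-rc4-mm rows, majfault_delta() returns 50. scan_ratio() is only meaningful over a window where active:inactive is already about 1:1, as in the 1:20.8 bullet. Over the whole begin/end interval, which starts far from 1:1, it yields a different value (about 26 for the first run).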
>
> I'm surprised by this.
> Why doesn't your patch protect mapped pages from streaming IO?
It is only protecting the *active* mapped pages, as expected.
But yes, the active percentage is much lower than expected :-)
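Below is a toy model of that decision in C. It assumes the VM_EXEC logic works as this thread describes: when the active file list is scanned, referenced pages of executable file mappings are kept active, and everything else is deactivated as usual. The names are hypothetical and this is not the kernel code. It only illustrates why pages that are already inactive get no protection:

```c
#include <stdbool.h>

/* Toy page state; hypothetical, not a kernel structure. */
struct toy_page {
	bool referenced;	/* accessed since the last scan */
	bool vm_exec;		/* mapped by at least one executable mapping */
	bool file_backed;	/* page cache page rather than anonymous */
};

enum toy_lru { TOY_ACTIVE, TOY_INACTIVE };

/*
 * Destination of a page taken off the active file list during a scan:
 * referenced pages of executable file mappings get another round on the
 * active list; everything else is deactivated.  Pages that are already on
 * the inactive list never reach this decision at all.
 */
static enum toy_lru toy_scan_active(const struct toy_page *page)
{
	if (page->referenced && page->vm_exec && page->file_backed)
		return TOY_ACTIVE;
	return TOY_INACTIVE;
}
```

In this model, a mapped page that has already been deactivated never reaches the check and is reclaimed like any other inactive page. This is consistent with the observation above that the dropped mapped pages are mostly inactive ones.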
> I'd very much like to reproduce this myself; please tell me how.
OK.
Firstly:
for i in `seq 0 100 10000000`; do echo $i 110; done > pattern-hot-10
dd if=/dev/zero of=/tmp/sparse bs=1M count=1 seek=1024000
Then boot into the desktop and run the following concurrently:
iotrace.rb --load pattern-hot-10 --play /tmp/sparse
vmmon nr_mapped nr_active_file nr_inactive_file pgmajfault pgdeactivate pgfree
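The dd command above writes only 1 MiB of zeros at an offset of 1024000 MiB. Everything before that offset stays a hole, which reads back as zeros on file systems that support holes. The same file can be created from C. make_sparse() below is a hypothetical helper, and unlike dd it does not truncate an existing file, so use it on a fresh path:

```c
#define _FILE_OFFSET_BITS 64	/* 64-bit off_t even on 32-bit hosts */
#define _XOPEN_SOURCE 700	/* pwrite(), mkstemp() */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define MIB (1024L * 1024L)

/*
 * Rough equivalent of "dd if=/dev/zero of=PATH bs=1M count=COUNT seek=SEEK"
 * for a new file: write COUNT MiB of zeros starting SEEK MiB into the file,
 * leaving the skipped range as a hole.  Short writes are treated as errors.
 * Returns 0 on success, -1 on error.
 */
static int make_sparse(const char *path, long seek_mib, long count_mib)
{
	char *zeros = calloc(1, MIB);
	int fd, ret = 0;
	long i;

	if (zeros == NULL)
		return -1;
	fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd < 0) {
		free(zeros);
		return -1;
	}
	for (i = 0; i < count_mib && ret == 0; i++) {
		off_t off = (off_t)(seek_mib + i) * MIB;

		if (pwrite(fd, zeros, MIB, off) != MIB)
			ret = -1;
	}
	if (close(fd) != 0)
		ret = -1;
	free(zeros);
	return ret;
}
```

make_sparse("/tmp/sparse", 1024000, 1) corresponds to the dd invocation above.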
Note that I created the sparse file on btrfs, which happens to be
very slow at reading sparse files:
151.194384MB/s 284.198252s 100001x 450560b --load pattern-hot-10 --play /b/sparse
In that case, the inactive list is rotated at about 250MB/s,
so a full scan of it takes about 3.5 seconds, while a full scan
of the active file list takes about 77 seconds.
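The numbers above can be cross-checked with a little arithmetic. Below is a C sketch, assuming 4 KiB pages and that "MB" in the iotrace.rb output means MiB:

```c
#define PAGE_BYTES 4096.0		/* assumes 4 KiB pages */
#define MIB_BYTES  (1024.0 * 1024.0)

/* Throughput in MiB/s for nr_reads reads of read_bytes each, taking secs. */
static double replay_mib_per_sec(long nr_reads, long read_bytes, double secs)
{
	return (double)nr_reads * (double)read_bytes / MIB_BYTES / secs;
}

/* Seconds needed to cycle through nr_pages pages at mib_per_sec. */
static double full_scan_secs(long nr_pages, double mib_per_sec)
{
	return (double)nr_pages * PAGE_BYTES / MIB_BYTES / mib_per_sec;
}
```

replay_mib_per_sec(100001, 110 * 4096, 284.198252) gives about 151.19, matching the reported 151.194384MB/s. (seq 0 100 10000000 emits 100001 offsets, and 110 pages are 450560 bytes.) full_scan_secs(234000, 250.0) uses roughly the end-of-run nr_inactive_file from the tables above and gives about 3.7 seconds, in line with the "about 3.5 seconds" estimate.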
The source code for both of the above tools is attached.
Thanks,
Fengguang
[-- Attachment #2: iotrace.rb --]
[-- Type: application/x-ruby, Size: 8999 bytes --]
[-- Attachment #3: vmmon.c --]
[-- Type: text/x-csrc, Size: 2410 bytes --]
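#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/time.h>

static int raw = 1;	/* print raw counters; defaults to 1, so -r is a no-op */
static int delay = 1;	/* seconds between samples (-d N) */
static int nr_fields;	/* number of /proc/vmstat fields requested */
static char **fields;	/* their names, taken from argv */
static FILE *f;		/* the open /proc/vmstat */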

/* Read the current value of each requested field from /proc/vmstat. */
static void acquire(long *values)
{
	char buf[1024];

	rewind(f);
	memset(values, 0, nr_fields * sizeof(*values));
	while (fgets(buf, sizeof(buf), f)) {
		int i;

		for (i = 0; i < nr_fields; i++) {
			char *p;

			/* prefix match: a field sums every line starting with it */
			if (strncmp(buf, fields[i], strlen(fields[i])))
				continue;
			p = strchr(buf, ' ');
			if (p == NULL) {
				fprintf(stderr, "vmmon: error parsing /proc\n");
				exit(1);
			}
			values[i] += strtoul(p, NULL, 10);
			break;
		}
	}
}

/* Print one row: raw values, or per-second rates since the previous sample. */
static void display(long *new_values, long *prev_values,
		    unsigned long long usecs)
{
	int i;

	for (i = 0; i < nr_fields; i++) {
		if (raw)
			printf(" %16ld", new_values[i]);
		else {
			long long diff;
			double ddiff;

			ddiff = new_values[i] - prev_values[i];
			ddiff *= 1000000;
			ddiff /= usecs;
			diff = ddiff;
			printf(" %16lld", diff);
		}
	}
	printf("\n");
	fflush(stdout);		/* show each row immediately, even when piped */
}

/* Sleep for one interval, sample the counters and print a row. */
static void do1(long *prev_values)
{
	struct timeval start;
	struct timeval end;
	long long usecs;
	long new_values[nr_fields];

	gettimeofday(&start, NULL);
	sleep(delay);
	gettimeofday(&end, NULL);
	acquire(new_values);

	/* elapsed time in microseconds */
	usecs = end.tv_sec - start.tv_sec;
	usecs *= 1000000;
	usecs += end.tv_usec - start.tv_usec;

	display(new_values, prev_values, usecs);
	memcpy(prev_values, new_values, nr_fields * sizeof(*prev_values));
}

static void heading(void)
{
	int i;

	printf("\n");
	for (i = 0; i < nr_fields; i++)
		printf(" %16s", fields[i]);
	printf("\n");
}

static void doit(void)
{
	int line = 0;
	long prev_values[nr_fields];

	acquire(prev_values);
	for ( ; ; ) {
		if (line == 0)
			heading();
		do1(prev_values);
		line++;
		if (line == 24)
			line = 0;
	}
}

static void usage(void)
{
	fprintf(stderr, "usage: vmmon [-r] [-d N] field [field ...]\n");
	fprintf(stderr, " -d N : delay N seconds\n");
	fprintf(stderr, " -r : show raw numbers instead of diff\n");
	exit(1);
}

int main(int argc, char *argv[])
{
	int c;

	while ((c = getopt(argc, argv, "rd:")) != -1) {
		switch (c) {
		case 'r':
			raw = 1;
			break;	/* -r takes no argument: don't fall into 'd' */
		case 'd':
			delay = strtol(optarg, NULL, 10);
			break;
		default:
			usage();
		}
	}

	if (optind == argc)
		usage();

	nr_fields = argc - optind;
	fields = argv + optind;

	f = fopen("/proc/vmstat", "r");
	if (f == NULL) {
		perror("vmmon: /proc/vmstat");
		exit(1);
	}
	doit();
	exit(0);
}
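acquire() above matches field names by prefix and sums every matching line. A field such as pgscan therefore adds up every counter whose name starts with it. Where an exact match is wanted instead, one "name value" line of /proc/vmstat can be parsed with sscanf(). parse_vmstat_line() is a hypothetical helper and not part of vmmon.c:

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one "name value" line of /proc/vmstat.  Returns 1 and stores the
 * value if the counter name equals want exactly, 0 otherwise.
 */
static int parse_vmstat_line(const char *line, const char *want,
			     unsigned long *value)
{
	char name[64];
	unsigned long v;

	if (sscanf(line, "%63s %lu", name, &v) != 2)
		return 0;	/* malformed line */
	if (strcmp(name, want) != 0)
		return 0;	/* a different counter (or only a prefix match) */
	*value = v;
	return 1;
}
```

Calling it in place of the strncmp() test in acquire() would make each field match exactly one counter.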