From: Wu Fengguang <fengguang.wu@intel.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>, Elladan <elladan@eskimo.com>,
	Nick Piggin <npiggin@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Rik van Riel <riel@redhat.com>, "tytso@mit.edu" <tytso@mit.edu>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"minchan.kim@gmail.com" <minchan.kim@gmail.com>
Subject: Re: [PATCH 2/3] vmscan: make mapped executable pages the first class citizen
Date: Tue, 19 May 2009 13:09:32 +0800
Message-ID: <20090519050932.GB8769@localhost>
In-Reply-To: <20090519133422.4ECC.A69D9226@jp.fujitsu.com>

[-- Attachment #1: Type: text/plain, Size: 4860 bytes --]

On Tue, May 19, 2009 at 12:41:38PM +0800, KOSAKI Motohiro wrote:
> Hi
> 
> Thanks for the great work.
> 
> 
> > SUMMARY
> > =======
> > The patch decreases the number of major faults from 50 to 3 during 10% cache hot reads.
> > 
> > 
> > SCENARIO
> > ========
> > The test scenario is to do 100000 pread(size=110 pages, offset=(i*100) pages),
> > where 10% of the pages will be activated:
> > 
> >         for i in `seq 0 100 10000000`; do echo $i 110;  done > pattern-hot-10
> >         iotrace.rb --load pattern-hot-10 --play /b/sparse
> 
> 
> Where can I download iotrace.rb?

In the attachment. It relies on some ruby libraries.

> > and monitor /proc/vmstat during the time. The test box has 2G memory.
> > 
> > 
> > ANALYSIS
> > ========
> > 
> > I carried out two runs on a freshly booted, console-mode 2.6.29 with the
> > VM_EXEC patch, and fetched the vmstat numbers at
> > 
> > (1) begin:   shortly after the big read IO starts;
> > (2) end:     just before the big read IO stops;
> > (3) restore: the big read IO stops and the zsh working set restored
> > 
> >         nr_mapped   nr_active_file nr_inactive_file       pgmajfault     pgdeactivate           pgfree
> > begin:       2481             2237             8694              630                0           574299
> > end:          275           231976           233914              633           776271         20933042
> > restore:      370           232154           234524              691           777183         20958453
> > 
> > begin:       2434             2237             8493              629                0           574195
> > end:          284           231970           233536              632           771918         20896129
> > restore:      399           232218           234789              690           774526         20957909
> > 
> > and another run on 2.6.30-rc4-mm with the VM_EXEC logic disabled:
> 
> I don't think this is a proper comparison.
> You need one of the following comparisons; otherwise we introduce a lot of guesswork into the analysis.
> 
>  - 2.6.29 with and without VM_EXEC patch
>  - 2.6.30-rc4-mm with and without VM_EXEC patch

I think it doesn't matter that much when it comes to "relative" numbers.
But anyway, I guess you want to try a more typical desktop ;)
Unfortunately, Xorg is currently broken on my test box.

> > 
> > begin:       2479             2344             9659              210                0           579643
> > end:          284           232010           234142              260           772776         20917184
> > restore:      379           232159           234371              301           774888         20967849
> > 
> > The numbers show that
> > 
> > - The startup pgmajfault of 2.6.30-rc4-mm is merely 1/3 that of 2.6.29.
> >   I'd attribute that improvement to the mmap readahead improvements :-)
> > 
> > - The pgmajfault increment during the file copy is 633-630=3 vs 260-210=50.
> >   That's a huge improvement - which means with the VM_EXEC protection logic,
> >   active mmap pages are pretty safe even under partially cache hot streaming IO.
> > 
> > - When the active:inactive file lru size reaches 1:1, their scan rate ratio is
> >   1:20.8 under 10% cache hot IO (computed with the formula Dpgdeactivate:Dpgfree).
> >   That roughly means the active mmap pages get 20.8 times more chances to get
> >   re-referenced and stay in memory.
> > 
> > - The absolute nr_mapped drops considerably to 1/9 during the big IO, and the
> >   dropped pages are mostly inactive ones. The patch has almost no impact in
> >   this aspect, that means it won't unnecessarily increase memory pressure.
> >   (In contrast, your 20% mmap protection ratio will keep them all, and
> >   therefore eliminate the extra 41 major faults to restore working set
> >   of zsh etc.)
> 
> I'm surprised by this.
> Why doesn't your patch protect mapped pages from streaming IO?

It is only protecting the *active* mapped pages, as expected.
But yes, the active percent is much lower than expected :-)

> I would very much like to reproduce this myself; please tell me how.

OK. 

Firstly:

         for i in `seq 0 100 10000000`; do echo $i 110;  done > pattern-hot-10
         dd if=/dev/zero of=/tmp/sparse bs=1M count=1 seek=1024000

Then boot into desktop and run concurrently:

         iotrace.rb --load pattern-hot-10 --play /tmp/sparse
         vmmon  nr_mapped nr_active_file nr_inactive_file   pgmajfault pgdeactivate pgfree
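
For reference, the pattern file holds one "offset size" pair per line, both
in pages. Consecutive reads advance by 100 pages but cover 110, so the last
10 pages of each read are read again by the next one; that is where the 10%
hot ratio comes from. A minimal C sketch of that arithmetic follows. The
helper names are made up for illustration and are not part of iotrace.rb:

```c
#include <stdio.h>

/* Number of lines `seq 0 stride last` prints (both ends inclusive). */
static long pattern_lines(long stride, long last)
{
	return last / stride + 1;
}

/*
 * Percentage of pages that two consecutive reads have in common,
 * for reads of `size` pages issued every `stride` pages.
 */
static double hot_percent(long stride, long size)
{
	long overlap = size - stride;

	if (overlap < 0)
		overlap = 0;		/* reads leave gaps, nothing is re-read */
	if (overlap > stride)
		overlap = stride;	/* every page is read at least twice */
	return 100.0 * overlap / stride;
}

/* Write the same "offset size" lines as the shell loop above. */
static void write_pattern(FILE *out, long stride, long size, long last)
{
	long off;

	for (off = 0; off <= last; off += stride)
		fprintf(out, "%ld %ld\n", off, size);
}
```

With stride 100, size 110 and a last offset of 10000000, pattern_lines()
gives 100001 reads, matching the 100001x that iotrace.rb reports, and
hot_percent() gives 10.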

Note that I created the sparse file on btrfs, which happens to be
very slow at reading sparse files:

        151.194384MB/s 284.198252s 100001x 450560b --load pattern-hot-10 --play /b/sparse

In that case, the inactive list is rotated at a speed of 250MB/s,
so a full scan of it takes about 3.5 seconds, while a full scan
of the active file list takes about 77 seconds.
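
As a rough cross-check of those figures, here is a small C sketch. It assumes
4KB pages and that MB/s means 2^20 bytes per second, and it takes the list
size from the "end" rows of the vmstat table quoted earlier.
full_scan_seconds() is a made-up helper, not something from vmmon or the
kernel:

```c
#define PAGE_BYTES 4096.0	/* assumption: 4KB pages */
#define MB (1024.0 * 1024.0)	/* assumption: MB/s means 2^20 bytes/s */

/* Seconds needed to walk nr_pages once at the given scan speed. */
static double full_scan_seconds(long nr_pages, double mb_per_sec)
{
	return nr_pages * PAGE_BYTES / (mb_per_sec * MB);
}
```

full_scan_seconds(233914, 250.0) comes out between 3.6 and 3.7, in the same
ballpark as the 3.5 seconds above. The two quoted scan times are in a ratio
of 77:3.5 = 22:1, close to the 1:20.8 scan rate ratio computed from
Dpgdeactivate:Dpgfree.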

Attached source code for both of the above tools.

Thanks,
Fengguang

[-- Attachment #2: iotrace.rb --]
[-- Type: application/x-ruby, Size: 8999 bytes --]

[-- Attachment #3: vmmon.c --]
[-- Type: text/x-csrc, Size: 2410 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/time.h>

static int raw   = 1;	/* raw counters by default, so -r is currently a no-op */
static int delay = 1;
static int nr_fields;
static char **fields;
static FILE *f;

static void acquire(long *values)
{
	char buf[1024];

	rewind(f);
	memset(values, 0, nr_fields * sizeof(*values));
	while (fgets(buf, sizeof(buf), f)) {
		int i;

		for (i = 0; i < nr_fields; i++) {
			char *p;

			/* prefix match: counters sharing this prefix are summed */
			if (strncmp(buf, fields[i], strlen(fields[i])))
				continue;
			p = strchr(buf, ' ');
			if (p == NULL) {
				fprintf(stderr, "vmmon: error parsing /proc\n");
				exit(1);
			}
			values[i] += strtoul(p, NULL, 10);
			break;
		}
	}
}

static void display(long *new_values, long *prev_values,
			unsigned long long usecs)
{
	int i;

	for (i = 0; i < nr_fields; i++) {
		if (raw)
			printf(" %16ld", new_values[i]);
		else {
			long long diff;
			double ddiff;
			ddiff = new_values[i] - prev_values[i];
			ddiff *= 1000000;
			ddiff /= usecs;
			diff = ddiff;
			printf(" %16lld", diff);
		}
	}
	printf("\n");
}

static void do1(long *prev_values)
{
	struct timeval start;
	struct timeval end;
	long long usecs;
	long new_values[nr_fields];

	gettimeofday(&start, NULL);
	sleep(delay);
	gettimeofday(&end, NULL);
	acquire(new_values);
	usecs = end.tv_sec - start.tv_sec;
	usecs *= 1000000;
	usecs += end.tv_usec - start.tv_usec;
	display(new_values, prev_values, usecs);
	memcpy(prev_values, new_values, nr_fields * sizeof(*prev_values));
}

static void heading(void)
{
	int i;

	printf("\n");
	for (i = 0; i < nr_fields; i++)
		printf(" %16s", fields[i]);
	printf("\n");
}

static void doit(void)
{
	int line = 0;
	long prev_values[nr_fields];

	acquire(prev_values);
	for ( ; ; ) {
		if (line == 0)
			heading();
		do1(prev_values);
		line++;
		if (line == 24)
			line = 0;
	}
}

static void usage(void)
{
	fprintf(stderr, "usage: vmmon [-r] [-d N] field [field ...]\n");
	fprintf(stderr, "   -d N             : delay N seconds\n");
	fprintf(stderr, "   -r               : show raw numbers instead of diff\n");
	exit(1);
}

int main(int argc, char *argv[])
{
	int c;

	while ((c = getopt(argc, argv, "rd:")) != -1) {
		switch (c) {
		case 'r':
			raw = 1;
			break;	/* don't fall through: optarg is NULL for -r */
		case 'd':
			delay = strtol(optarg, NULL, 10);
			break;
		default:
			usage();
		}
	}

	if (optind == argc)
		usage();

	nr_fields = argc - optind;
	fields = argv + optind;
	f = fopen("/proc/vmstat", "r");
	if (f == NULL) {
		perror("vmmon: /proc/vmstat");
		exit(1);
	}
	doit();
	exit(0);
}
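
Because acquire() compares only the first strlen(field) characters of each
line, a field name that is a prefix of several counters gets the sum of all
of them. That is handy for grouped counters but can surprise. If exact
matching is wanted, one possible variant is sketched below. It assumes the
"name value" line layout of /proc/vmstat; match_vmstat_line() is a
hypothetical helper and not part of the attached vmmon.c:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "name value" line as found in /proc/vmstat.
 * Returns 1 and stores the value if the name equals `field` exactly,
 * 0 if the name differs or no number follows it.
 */
static int match_vmstat_line(const char *line, const char *field,
			     unsigned long *value)
{
	size_t len = strlen(field);
	const char *num;
	char *end;

	/* the name must be followed directly by the separating space */
	if (strncmp(line, field, len) != 0 || line[len] != ' ')
		return 0;
	num = line + len + 1;
	*value = strtoul(num, &end, 10);
	return end != num;	/* at least one digit was consumed */
}
```

In acquire() this would replace the strncmp()/strchr() pair, with
values[i] set from *value when the helper returns 1.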
