Re: filesystem benchmarking fun

From: Chris Mason <chris.mason@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Chuck Ebbert <cebbert@redhat.com>, linux-kernel@vger.kernel.org
Subject: Re: filesystem benchmarking fun
Date: Wed, 16 May 2007 17:02:06 -0400
Message-ID: <20070516210206.GH26766@think.oraclecorp.com>
In-Reply-To: <20070516133726.0c68a65f.akpm@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 2068 bytes --]

On Wed, May 16, 2007 at 01:37:26PM -0700, Andrew Morton wrote:
> On Wed, 16 May 2007 16:14:14 -0400
> Chris Mason <chris.mason@oracle.com> wrote:
> 
> > On Wed, May 16, 2007 at 01:04:13PM -0700, Andrew Morton wrote:
> > > > The good news is that if you let it run long enough, the times
> > > > stabilize.  The bad news is:
> > > > 
> > > > create dir kernel-86 222MB in 15.85 seconds (14.03 MB/s)
> > > > create dir kernel-87 222MB in 28.67 seconds (7.76 MB/s)
> > > > create dir kernel-88 222MB in 18.12 seconds (12.27 MB/s)
> > > > create dir kernel-89 222MB in 19.77 seconds (11.25 MB/s)
> > > 
> > > well hang on.  Doesn't this just mean that the first few runs were writing
> > > into pagecache and the later ones were blocking due to dirty-memory limits?
> > > 
> > > Or do you have a sync in there?
> > > 
> > There's no sync,  but if you watch vmstat you can clearly see the log
> > flushes, even when the overall create times are 11MB/s.  vmstat goes
> > 30MB/s -> 4MB/s or less, then back up to 30MB/s.
> 
> How do you know that it is a log flush rather than, say, pdflush
> hitting the blockdev inode and doing a big seeky write?

I don't...it gets especially tricky because ext3_writepage starts
a transaction, and so pdflush does hit the log flushing code too.

So, in comes systemtap.  I instrumented submit_bh to look for seeks
(defined as writes more than 16 blocks apart) while the process was
inside __log_wait_for_space.  The probe is attached; it is _really_
quick and dirty because I'm about to run out the door.
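
For anyone who wants to play with the seek definition without loading a
probe, here is a rough userspace model of it in Python. It is only a
sketch: the function name count_seeks and the constant SEEK_THRESHOLD
are made up for illustration, and it takes a plain list of block numbers
rather than hooking submit_bh. It keeps the probe's convention that a
previous block of 0 means "no previous write yet".

```python
SEEK_THRESHOLD = 16  # blocks; same cutoff the attached probe uses


def count_seeks(blocks, threshold=SEEK_THRESHOLD):
    """Return (writes, seeks) for a sequence of block numbers.

    A write counts as a seek when it lands more than `threshold`
    blocks away from the previous write, in either direction.
    """
    writes = 0
    seeks = 0
    last = 0  # 0 is the "nothing written yet" sentinel, as in the probe
    for block in blocks:
        writes += 1
        if last != 0 and abs(block - last) > threshold:
            seeks += 1
        last = block
    return writes, seeks
```

Note that a gap of exactly 16 blocks does not count as a seek; only
gaps strictly larger than the threshold do.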

Watching vmstat, every time __log_wait_for_space hits lots of seeks,
vmstat drops into the 2-4MB/s range.  Not a scientific match-up, but
here's some sample output:

7824 ext3 done waiting for space total wrote 3155 blocks seeks 2241
7827 ext3 done waiting for space total wrote 855 blocks seeks 598
7827 ext3 done waiting for space total wrote 2547 blocks seeks 1759
7653 ext3 done waiting for space total wrote 2273 blocks seeks 1609
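
To put a number on those lines, a small Python sketch like the one below
can parse the probe's "done waiting" output and work out what share of
the writes were seeks. The names LINE_RE, SAMPLE, seek_ratios and
overall_ratio are illustrative, not part of the probe; SAMPLE is just
the four lines quoted above. For those four lines each wait comes out at
roughly 70% seeks.

```python
import re

# Matches the line printed by the probe's .return handler.
LINE_RE = re.compile(
    r"^(\d+) ext3 done waiting for space total wrote (\d+) blocks seeks (\d+)$"
)

# The four sample lines from this message.
SAMPLE = """\
7824 ext3 done waiting for space total wrote 3155 blocks seeks 2241
7827 ext3 done waiting for space total wrote 855 blocks seeks 598
7827 ext3 done waiting for space total wrote 2547 blocks seeks 1759
7653 ext3 done waiting for space total wrote 2273 blocks seeks 1609
"""


def seek_ratios(text):
    """Return a list of (pid, blocks, seeks, seeks/blocks) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a "done waiting" line
        pid, blocks, seeks = (int(g) for g in m.groups())
        ratio = seeks / blocks if blocks else 0.0
        rows.append((pid, blocks, seeks, ratio))
    return rows


def overall_ratio(rows):
    """Seeks divided by blocks written, summed over all rows."""
    total_blocks = sum(r[1] for r in rows)
    total_seeks = sum(r[2] for r in rows)
    return total_seeks / total_blocks if total_blocks else 0.0


if __name__ == "__main__":
    for pid, blocks, seeks, ratio in seek_ratios(SAMPLE):
        print(f"pid {pid}: {seeks}/{blocks} writes were seeks ({ratio:.1%})")
```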

I also recorded the total size of each seek; 66% of them were 6000
blocks or more.
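
A figure like that can be pulled out of the per-seek lines the probe
prints ("seek log write pid ... diff ..."). The Python below is one
possible way to do it, not the method actually used here; the names
SEEK_RE and large_seek_fraction are hypothetical, and the 6000-block
cutoff is just the default argument.

```python
import re

# Matches the per-seek line printed from the submit_bh probe.
SEEK_RE = re.compile(
    r"seek log write pid (\d+) last (\d+) this (\d+) diff (\d+)"
)


def large_seek_fraction(text, min_blocks=6000):
    """Fraction of logged seeks whose distance is >= min_blocks.

    Returns None when the text contains no seek lines at all.
    """
    diffs = [int(m.group(4)) for m in SEEK_RE.finditer(text)]
    if not diffs:
        return None
    large = sum(1 for d in diffs if d >= min_blocks)
    return large / len(diffs)
```

Feed it the captured stap output as a single string; lines that are not
seek records are ignored.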

-chris


[-- Attachment #2: jbd.tap --]
[-- Type: text/plain, Size: 1049 bytes --]


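# Per-process state, indexed by pid().
global in_process   # 1 while the pid is inside __log_wait_for_space
global writers      # buffers submitted during the current wait
global last         # block number of the previous write (0 = none yet)
global seeks        # writes more than 16 blocks away from the previous one

# Entering __log_wait_for_space: reset the counters for this pid.
probe kernel.function("__log_wait_for_space@fs/jbd/checkpoint.c") {
    p = pid()
    printf("%d ext3 waiting for space\n", p)
    writers[p] = 0
    in_process[p] = 1
    last[p] = 0
    seeks[p] = 0
}

# Leaving __log_wait_for_space: report the totals for this wait.
probe kernel.function("__log_wait_for_space@fs/jbd/checkpoint.c").return {
    p = pid()
    in_process[p] = 0
    printf("%d ext3 done waiting for space total wrote %d blocks seeks %d\n",
           p, writers[p], seeks[p])
}

# Every buffer submitted while the pid is waiting for log space.
probe kernel.function("submit_bh") {
    p = pid()
    in_proc = in_process[p]
    if (in_proc != 0) {
        writers[p] += 1
        block = $bh->b_blocknr
        last_block = last[p]

        # diff stays 0 unless this write is more than 16 blocks from the
        # previous one, in either direction.
        diff = 0
        if (last_block != 0) {
            if (last_block < block && block - last_block > 16) {
                diff = block - last_block
            }
            if (last_block > block && last_block - block > 16) {
                diff = last_block - block
            }
        }

        last[p] = block
        if (diff != 0) {
            printf("seek log write pid %d last %d this %d diff %d\n",
                   p, last_block, block, diff)
            seeks[p] += 1
        }
    }
}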
