Re: Lockless page cache test results

From: Linus Torvalds <torvalds@osdl.org>
To: Andrew Morton <akpm@osdl.org>
Cc: Jens Axboe <axboe@suse.de>,
	linux-kernel@vger.kernel.org, npiggin@suse.de,
	linux-mm@kvack.org
Subject: Re: Lockless page cache test results
Date: Wed, 26 Apr 2006 12:00:37 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0604261144290.3701@g5.osdl.org>
In-Reply-To: <20060426111054.2b4f1736.akpm@osdl.org>

On Wed, 26 Apr 2006, Andrew Morton wrote:

> Jens Axboe <axboe@suse.de> wrote:
> > 
> > Once per page, it's basically exercising the generic_file_splice_read()
> > path. Basically X number of "clients" open the same file, and fill those
> > pages into a pipe using splice. The output end of the pipe is then
> > spliced to /dev/null to toss it away again.
> 
> OK.  That doesn't sound like something which a real application is likely
> to do ;)

True, but on the other hand, it does kind of "distill" one (small) part of 
something that real apps _are_ likely to do.

The whole 'splice to /dev/null' part can be seen as totally irrelevant, 
but at the same time a way to ignore all the other parts of normal page 
cache usage (ie the other parts of page cache usage tend to be the "map it 
into user space" or the actual "memcpy_to/from_user()" or the "TCP send" 
part).

The question, of course, is whether the part that remains (the actual page 
lookup) is important enough to matter, once it is part of a bigger chain 
in a real application.

In other words, the splice() thing is just a way to isolate one part of a 
chain that is usually much more involved, and micro-benchmark just that 
one part.

Splice itself can be optimized to do the lookup locking only once per N 
pages (where N currently is on the order of ~16), but that may not be as 
easy for some other paths (ie the normal read path).
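
The effect of taking the lookup lock once per N pages instead of once per page can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPageCache`, `read_per_page` and `read_batched` are hypothetical names, a plain dict and a `threading.Lock` stand in for the kernel's page cache structures, and N = 16 is taken from the figure mentioned above.

```python
import threading


class ToyPageCache:
    """Toy stand-in for a per-file page cache guarded by a single lock."""

    def __init__(self, nr_pages):
        self._lock = threading.Lock()
        self._pages = {i: f"page-{i}" for i in range(nr_pages)}
        self.lock_acquisitions = 0  # how often the lock was taken

    def lookup(self, index):
        # One lock round trip for every single page.
        with self._lock:
            self.lock_acquisitions += 1
            return self._pages.get(index)

    def gang_lookup(self, start, nr):
        # One lock round trip for up to nr consecutive pages.
        with self._lock:
            self.lock_acquisitions += 1
            return [self._pages[i] for i in range(start, start + nr)
                    if i in self._pages]


def read_per_page(cache, nr_pages):
    return [cache.lookup(i) for i in range(nr_pages)]


def read_batched(cache, nr_pages, batch=16):
    pages = []
    for start in range(0, nr_pages, batch):
        pages.extend(cache.gang_lookup(start, min(batch, nr_pages - start)))
    return pages
```

In this model a 64-page file costs 64 lock acquisitions when read page by page and 4 when read in batches of 16, while a 3-page file only goes from 3 acquisitions to 1.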

And the "reading from the same file in multiple threads" _is_ a real load. 
It may sound stupid, but it would happen for any server that has a lot of 
locality across clients (and that's very much true for web-servers, for 
example).

That said, under most real loads, the page cache lookup is obviously 
always going to be just a tiny tiny part (as shown by the fact that Jens 
quotes 35 GB/s throughput - possible only because splice to /dev/null 
doesn't need to actually ever even _touch_ the data).

The fact that it drops to "just" 3GB/s for four clients is somewhat 
interesting, though, since that does put a limit on how well we can serve 
the same file (of course, 3GB/s is still a lot faster than any modern 
network will ever be able to push things around, but it's getting closer 
to the possibilities for real hardware (ie IB over PCI-X should be able to 
do about 1GB/s in "real life")).

So the fact that basically just lookup/locking overhead can limit things 
to 3GB/s is absolutely not totally uninteresting. Even if in practice 
there are other limits that would probably hit us much earlier.

It would be interesting to see where doing gang-lookup moves the target, 
but on the other hand, with smaller files (and small files are still 
common), gang lookup isn't going to help as much.

Of course, with small files, the actual filename lookup is likely to be 
the real limiter.

			Linus

