From: Linus Torvalds <torvalds@osdl.org>
To: Andrew Morton <akpm@osdl.org>
Cc: Jens Axboe <axboe@suse.de>, linux-kernel@vger.kernel.org,
    npiggin@suse.de, linux-mm@kvack.org
Subject: Re: Lockless page cache test results
Date: Wed, 26 Apr 2006 12:00:37 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0604261144290.3701@g5.osdl.org>
In-Reply-To: <20060426111054.2b4f1736.akpm@osdl.org>
On Wed, 26 Apr 2006, Andrew Morton wrote:
> Jens Axboe <axboe@suse.de> wrote:
> >
> > Once per page, it's basically exercising the generic_file_splice_read()
> > path. Basically X number of "clients" open the same file, and fill those
> > pages into a pipe using splice. The output end of the pipe is then
> > spliced to /dev/null to toss it away again.
>
> OK. That doesn't sound like something which a real application is likely
> to do ;)
True, but on the other hand, it does kind of "distill" one (small) part of
something that real apps _are_ likely to do.
The whole 'splice to /dev/null' part can be seen as totally irrelevant,
but at the same time a way to ignore all the other parts of normal page
cache usage (ie the other parts of page cache usage tend to be the "map it
into user space" or the actual "memcpy_to/from_user()" or the "TCP send"
part).
The question, of course, is whether the part that remains (the actual page
lookup) is important enough to matter, once it is part of a bigger chain
in a real application.
In other words, the splice() thing is just a way to isolate one part of a
chain that is usually much more involved, and micro-benchmark just that
one part.
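
A minimal sketch of the benchmark loop described above, for one "client": splice a file into a pipe, then splice the pipe into /dev/null. This is not Jens' actual test program; the function name, the 64 KiB chunk size and the error handling are assumptions made for illustration. It uses only the Linux splice(2) call.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/*
 * Splice the whole file at `path` through a pipe into /dev/null.
 * Returns the number of bytes moved, or -1 on any error.
 * (Hypothetical helper, not taken from the thread.)
 */
long long splice_file_to_devnull(const char *path)
{
    int pfd[2] = { -1, -1 };
    long long total = -1;
    int in = open(path, O_RDONLY);
    int out = open("/dev/null", O_WRONLY);

    if (in < 0 || out < 0 || pipe(pfd) < 0)
        goto done;

    total = 0;
    for (;;) {
        /* file -> pipe: this is the part that does the page cache lookups */
        ssize_t n = splice(in, NULL, pfd[1], NULL, 64 * 1024, SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break; /* end of file */
        total += n;

        /* pipe -> /dev/null: throw the pages away without touching the data */
        while (n > 0) {
            ssize_t m = splice(pfd[0], NULL, out, NULL, (size_t)n, SPLICE_F_MOVE);
            if (m <= 0) {
                total = -1;
                goto done;
            }
            n -= m;
        }
    }

done:
    if (pfd[0] >= 0)
        close(pfd[0]);
    if (pfd[1] >= 0)
        close(pfd[1]);
    if (in >= 0)
        close(in);
    if (out >= 0)
        close(out);
    return total;
}
```

Running this function from several processes or threads against the same file would approximate the "X clients" setup; how the original test was actually parallelised is not stated here.
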
Splice itself can be optimized to do the lookup locking only once per N
pages (where N currently is on the order of ~16), but that may not be as
easy for some other paths (ie the normal read path).
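
To illustrate why taking the lookup lock once per N pages helps, here is a toy userspace sketch. It is not the kernel's page cache code: the structure, the spinlock built on C11 atomic_flag, and all names are hypothetical stand-ins. It only counts how many lock acquisitions each style of lookup needs.

```c
#include <stdatomic.h>
#include <stddef.h>

#define CACHE_PAGES 64   /* toy cache size (assumption) */
#define GANG        16   /* batch size, matching the ~16 mentioned above */

struct toy_cache {
    atomic_flag lock;                 /* stand-in for the lookup lock */
    void *pages[CACHE_PAGES];         /* stand-in for the page index */
    unsigned long lock_acquisitions;  /* how often the lock was taken */
};

static void toy_lock(struct toy_cache *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ; /* spin until the lock is free */
    c->lock_acquisitions++;
}

static void toy_unlock(struct toy_cache *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

void toy_cache_init(struct toy_cache *c)
{
    atomic_flag_clear(&c->lock);
    c->lock_acquisitions = 0;
    for (size_t i = 0; i < CACHE_PAGES; i++)
        c->pages[i] = &c->pages[i]; /* any non-NULL dummy "page" */
}

/* One lock round trip per page. */
void *lookup_one(struct toy_cache *c, size_t index)
{
    void *page;

    toy_lock(c);
    page = index < CACHE_PAGES ? c->pages[index] : NULL;
    toy_unlock(c);
    return page;
}

/* One lock round trip for up to `nr` consecutive pages. */
size_t lookup_gang(struct toy_cache *c, size_t start, size_t nr, void **out)
{
    size_t found = 0;

    toy_lock(c);
    while (found < nr && start + found < CACHE_PAGES) {
        out[found] = c->pages[start + found];
        found++;
    }
    toy_unlock(c);
    return found;
}
```

Walking all 64 toy pages with lookup_one() takes the lock 64 times; walking them with lookup_gang() in batches of GANG takes it 4 times. A file of only one or two pages gets no such reduction.
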
And the "reading from the same file in multiple threads" _is_ a real load.
It may sound stupid, but it would happen for any server that has a lot of
locality across clients (and that's very much true for web-servers, for
example).
That said, under most real loads, the page cache lookup is obviously
always going to be just a tiny tiny part (as shown by the fact that Jens
quotes 35 GB/s throughput - possible only because splice to /dev/null
doesn't need to actually ever even _touch_ the data).
The fact that it drops to "just" 3GB/s for four clients is somewhat
interesting, though, since that does put a limit on how well we can serve
the same file (of course, 3GB/s is still a lot faster than any modern
network will ever be able to push things around, but it's getting closer
to the possibilities for real hardware (ie IB over PCI-X should be able to
do about 1GB/s in "real life")).
So the fact that basically just lookup/locking overhead can limit things
to 3GB/s is absolutely not totally uninteresting. Even if in practice
there are other limits that would probably hit us much earlier.
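
As a rough back-of-the-envelope check, the throughput figures can be turned into a time budget per page. The sketch below assumes 4 KiB pages, decimal gigabytes, and that the quoted numbers are total throughput; none of these is stated in the thread, and the function name is hypothetical.

```c
/*
 * Time available per page, in nanoseconds, at a given throughput.
 * bytes_per_sec: sustained throughput in bytes per second
 * page_bytes:    page size in bytes (4096 is an assumption, see above)
 */
double ns_per_page(double bytes_per_sec, double page_bytes)
{
    return page_bytes / bytes_per_sec * 1e9;
}
```

Under those assumptions, ns_per_page(3e9, 4096) comes to roughly 1365 ns per page, against roughly 117 ns per page for ns_per_page(35e9, 4096).
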
It would be interesting to see where doing gang-lookup moves the target,
but on the other hand, with smaller files (and small files are still
common), gang lookup isn't going to help as much.
Of course, with small files, the actual filename lookup is likely to be
the real limiter.
Linus