From: torvalds@transmeta.com (Linus Torvalds)
To: linux-kernel@vger.kernel.org
Subject: Re: [Bench] New benchmark showing fileserver problem in 2.4.12
Date: Wed, 17 Oct 2001 16:44:49 +0000 (UTC)
Message-ID: <9qkci1$h9g$1@penguin.transmeta.com>
In-Reply-To: <3BCD8269.B4E003E5@anu.edu.au>

In article <3BCD8269.B4E003E5@anu.edu.au>,
Robert Cohen <robert.cohen@anu.edu.au> wrote:
>
>Factor 3: the performance problem only happens for I/O that is due to
>network traffic, not I/O that was generated locally. I realise this is
>extremely strange and I have no idea how it knows that I/O is due to
>network traffic let alone why it cares. But I can assure you that it
>does make a difference.

I'll bet you $5 AUD that this happens because you don't block your
output into nicely aligned chunks.
When you have an existing file, and you write 1500 bytes to the middle
of it, performance will degrade _horribly_ compared to the case of
writing a full block, or writing to a position that hasn't been written
yet.
Your benchmark probably just does the equivalent of

	for (;;) {
		int bytes = read(in, buf, BUFSIZE);
		if (bytes <= 0)
			break;
		write(out, buf, bytes);
	}

am I right? The above is obvious code, but it happens to be bad code.
Now, when you read from the network, you will NOT get reads that are a
nice multiple of BUFSIZE, you'll get reads that are a multiple of the
packet data load (~1460 bytes on TCP over ethernet), and you'll end up
doing unaligned writes that require a read-modify-write cycle and thus
end up doing twice as much IO.
And not only does it do twice as much IO (and potentially more with
read-ahead), the read will obviously be _synchronous_, so the slowdown
is more than twice as much.
In contrast, when the source is a local file (or a pipe that ends up
chunking stuff up in 4kB chunks instead of 1500-byte packets), you'll
have nice write patterns that fill the whole buffer and make the read
unnecessary. Which gets you nice streaming writes to disk.
With most fast disks, this is not unlikely to be a performance
difference of an order of magnitude.
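
To make the alignment argument concrete, here is a rough sketch that counts how many writes in a back-to-back stream fail to cover whole blocks. The 4096-byte block size and the helper names `partial_blocks` and `unaligned_writes` are assumptions for illustration only; the real block size depends on the filesystem, and a partially covered block only costs a disk read when it is not already cached.

```c
#include <assert.h>

/* Assumed block size; the mail only mentions 4kB chunks for pipes. */
#define BLOCK_SIZE 4096UL

/* Number of blocks that a write of `len` bytes at `off` covers only partially. */
static unsigned partial_blocks(unsigned long off, unsigned long len)
{
	unsigned long end = off + len;
	unsigned head = (off % BLOCK_SIZE) != 0;	/* starts mid-block */
	unsigned tail = (end % BLOCK_SIZE) != 0;	/* ends mid-block */

	if (len == 0)
		return 0;
	if (off / BLOCK_SIZE == (end - 1) / BLOCK_SIZE)
		return head || tail;	/* write sits inside a single block */
	return head + tail;
}

/* Of `nwrites` back-to-back writes of `chunk` bytes, how many touch a partial block? */
static unsigned long unaligned_writes(unsigned long chunk, unsigned long nwrites)
{
	unsigned long i, count = 0;

	assert(chunk > 0);
	for (i = 0; i < nwrites; i++)
		if (partial_blocks(i * chunk, chunk) > 0)
			count++;
	return count;
}
```

Under these assumptions, `unaligned_writes(1460, 100)` reports that all 100 packet-sized writes leave a block partially covered, while `unaligned_writes(4096, 100)` reports none.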
And there is _nothing_ the kernel can do about it. Your benchmark is
bad, and has different behaviour depending on the source.
In short, fix your program. Change the loop to be something like

	unsigned int so_far = 0;
	for (;;) {
		int bytes = read(in, buf+so_far, BUFSIZE-so_far);
		if (bytes <= 0)
			break;
		so_far += bytes;
		if (so_far < BUFSIZE)
			continue;
		write(out, buf, BUFSIZE);
		so_far = 0;
	}
	if (so_far)
		write(out, buf, so_far);

which will act the same for partial and full reads, and I bet you'll see
the same performance for local and networking I/O (modulo the speed
difference in the _source_, of course).
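
For anyone who wants to try the accumulate-then-write loop in a real program, a slightly more defensive sketch follows. It is not from the original mail: `write_all` and `copy_blocked` are hypothetical names, and the handling of EINTR and short writes is an added assumption about what a robust version would want.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `len` bytes are out. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, just retry */
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Copy in -> out, issuing only bufsize-sized writes except for the tail. */
static int copy_blocked(int in, int out, size_t bufsize)
{
	size_t so_far = 0;
	int ret = -1;
	char *buf;

	assert(bufsize > 0);
	buf = malloc(bufsize);
	if (!buf)
		return -1;

	for (;;) {
		ssize_t n = read(in, buf + so_far, bufsize - so_far);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			goto done;
		}
		if (n == 0)
			break;			/* end of input */
		so_far += (size_t)n;
		if (so_far < bufsize)
			continue;		/* keep filling the buffer */
		if (write_all(out, buf, bufsize) < 0)
			goto done;
		so_far = 0;
	}
	if (so_far && write_all(out, buf, so_far) < 0)
		goto done;			/* flushing the partial tail failed */
	ret = 0;
done:
	free(buf);
	return ret;
}
```

A call such as `copy_blocked(sock_fd, file_fd, 8192)` (both descriptors hypothetical) would then write in 8kB units no matter how the reads arrive. Picking a buffer size that is a multiple of the filesystem block size is what keeps the writes aligned.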
Oh, and I bet you that once you do something like the above, you won't
see much difference between an 8kB buffer and a 256kB buffer. The
smaller buffer will generate more system calls, but it won't much matter
(and sometimes the smaller buffer performs better due to better data
cache locality and better overlapping IO - system calls under Linux
aren't slow, other factors can easily dominate).
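
As a back-of-the-envelope illustration of the system call counts involved, the sketch below computes how many write() calls a transfer needs at a given buffer size. The helper name `write_calls` and the 1 GiB example size are assumptions chosen for illustration, and the count says nothing about how long each call takes.

```c
#include <assert.h>

/* Number of write() calls needed to push `total` bytes in `bufsize` chunks. */
static unsigned long long write_calls(unsigned long long total,
				      unsigned long long bufsize)
{
	assert(bufsize > 0);
	return (total + bufsize - 1) / bufsize;	/* round up for the tail */
}
```

For a hypothetical 1 GiB transfer this gives 131072 calls with an 8kB buffer against 4096 calls with a 256kB buffer, a factor of 32 in call count.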
Linus