From: Marr <marr@flex.com>
To: linux-kernel@vger.kernel.org
Cc: reiserfs-dev@namesys.com
Subject: Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
Date: Mon, 27 Feb 2006 15:24:13 -0500
Message-ID: <200602271524.13693.marr@flex.com>
In-Reply-To: <20060224211650.569248d0.akpm@osdl.org>
On Saturday 25 February 2006 12:16am, Andrew Morton wrote:
> Marr <marr@flex.com> wrote:
> > ..
> >
> > When switching from kernel 2.4.31 to 2.6.13 (with everything else the
> > same), there is a drastic increase in the time required to perform
> > 'fseek()' on larger files (e.g. 4.3 MB, using ReiserFS [in case it
> > matters], in my test case).
> >
> > It seems that any seeks in a range larger than 128KB (regardless of the
> > file size or the position within the file) cause the performance to drop
> > precipitously.
>
> Interesting.
>
> What's happening is that glibc does a read from the file within each
> fseek(). Which might seem a bit silly because the app could seek somewhere
> else without doing any IO. But then the app would be silly too.
>
> Also, glibc is using the value returned in struct stat's blksize (a hint as
> to this file's preferred read chunk size) as, umm, a hint as to this file's
> preferred read size.
>
> Most filesystems return 4k in stat.blksize. But in 2.6, reiserfs bumped
> that to 128k to get good I/O patterns. Consequently this:
> > for (j=0; j < max_calls; j++) {
> >     pos = (int)(((double)random() / (double)RAND_MAX) * 4000000.0);
> >     if (fseek(inp_fh, pos, SEEK_SET)) {
> >         printf("Error ('%s') seeking to position %d!\n",
> >                strerror(errno), pos);
> >     }
> > }
>
> runs like a dog on 2.6's reiserfs. libc is doing a (probably) 128k read
> on every fseek.
(...snip...)
> - You can alter the filesystem's behaviour by mounting with the
> `nolargeio=1' option. That sets stat.blksize back to 4k.
Greetings again,

*** Please CC: me on replies -- I'm not subscribed.

First off, many thanks to all who replied. A special "thank you" to Andrew
Morton for his valuable insight -- very much appreciated!

Apologies for my delay in replying. I wanted to do some proper testing in
order to have something intelligent to report.

Based on Andrew's excellent advice, I've re-tested. As before, I tested under
the stock (Slackware 10.2) 2.4.31 and 2.6.13 kernels. This time, I tested
ext2, ext3, and reiserfs (with and without the 'nolargeio=1' mount option)
filesystems.

Some notes on the testing:

(1) This is on a faster machine and a faster hard disk drive than the
testing from my initial email, so the absolute times are not meaningful in
comparison.

(2) I found (unsurprisingly) that ext2 and ext3 times were very similar, so
I'm reporting them as one here.

(3) I'm only reporting the times for the 2nd and subsequent runs of the
'fdisk_seek' test. In all cases (except for the 2.6.13 kernel with reiserfs
without the 'nolargeio=1' setting), the 1st run after mounting the filesystem
was predictably slower (uncached file content). The 2nd and subsequent runs
are all close enough to be considered identical.

(4) All tests were done on the same 4MB zero-filled file described in my
initial email.
Timing tests with 200,000 randomized 'fseek()' calls:

Kernel 2.4.31:
    ext2/3:                         2.8s
    reiserfs (w/o 'nolargeio=1'):   2.8s

Kernel 2.6.13:
    ext2/3:                         3.0s
    reiserfs (w/o 'nolargeio=1'):   2m12s (ouch!)
    reiserfs (with 'nolargeio=1'):  3.0s
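For anyone wanting to repeat the measurement, here is a minimal sketch of a seek loop along the lines of the one quoted earlier in this message. It is not the original test program: the function names `random_seeks` and `timed_random_seeks` are made up for illustration, `rand()` is swapped in for `random()` to stay within standard C, and `clock()` reports processor time rather than wall-clock time, so the figures will not match `time(1)` output exactly.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Issue `calls` fseek()s to random offsets in [0, range] and return how many
 * of them failed. rand() stands in for the random() used in the quoted loop
 * (an assumption made for portability, not the original code). */
long random_seeks(FILE *fh, long calls, double range)
{
    long failures = 0;

    for (long j = 0; j < calls; j++) {
        long pos = (long)(((double)rand() / (double)RAND_MAX) * range);

        if (fseek(fh, pos, SEEK_SET)) {
            fprintf(stderr, "Error ('%s') seeking to position %ld!\n",
                    strerror(errno), pos);
            failures++;
        }
    }
    return failures;
}

/* Return the processor time in seconds (as reported by clock()) spent in
 * random_seeks(), or -1.0 if any seek failed. */
double timed_random_seeks(FILE *fh, long calls, double range)
{
    clock_t start = clock();
    long failures = random_seeks(fh, calls, range);
    clock_t end = clock();

    return failures ? -1.0 : (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `timed_random_seeks(fh, 200000, 4000000.0)` on a stream opened with `fopen(path, "r")` approximates the test above. As a rough sanity check on the numbers: if every seek really did trigger a full 128 KB read, 200,000 calls would copy about 24 GiB in total, against under 1 GiB at 4 KB per read.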
Basically, the "reiserfs without 'nolargeio=1' option on a 2.6.x kernel" is
the "problem child". Every run, from the 1st to the nth, takes the same
amount of time and is _incredibly_ slow for any application which is doing a
lot of file seeking outside of a 128KB window.

Clearly, however, there are 2 workarounds when using a 2.6.x kernel: (A) Use
ext2/ext3 or (B) use the 'nolargeio=1' mount option when using reiserfs.
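One way to confirm that workaround (B) took effect is to look at the block-size hint that stat() reports for a file on the mounted filesystem, since that is the value Andrew describes glibc as using. The sketch below assumes a POSIX system; the helper name `preferred_blksize` is hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Return the st_blksize hint that stat() reports for `path`,
 * or -1 if stat() fails (the error is printed to stderr). */
long preferred_blksize(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0) {
        perror(path);
        return -1;
    }
    return (long)sb.st_blksize;
}
```

Going by Andrew's description, a file on 2.6 reiserfs should report 131072 without the option and 4096 with 'nolargeio=1'. Those are the figures from his explanation, not values verified by this sketch.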
Aside: For some reason, the 'nolargeio' option for the 'reiserfs' filesystem
is not mentioned on their page of such info:
http://www.namesys.com/mount-options.html
On Saturday 25 February 2006 12:16am, Andrew Morton wrote:
> No happy answers there, sorry. But a workaround.
Actually, 2 workarounds, both good ones. Thanks again, Andrew, for your
excellent advice!
*** Please CC: me on replies -- I'm not subscribed.
Bill Marr