From: Hans Reiser <reiser@namesys.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>, drepper@redhat.com
Cc: Ingo Oeser <ioe-lkml@rameria.de>,
	linux-kernel@vger.kernel.org, Andrew Morton <akpm@osdl.org>,
	Marr <marr@flex.com>,
	reiserfs-dev@namesys.com
Subject: Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
Date: Tue, 28 Feb 2006 10:51:12 -0800
Message-ID: <44049BA0.5080007@namesys.com>
In-Reply-To: <44039A83.4050604@yahoo.com.au>

Ulrich, it seems that glibc is doing something that looks like some sort
of attempt at a filesystem optimization for fseek() which really ought
to be in the filesystems instead of glibc.  Could you comment, and
assuming you agree, fix it for us?

It particularly affects ReiserFS V3 performance in a highly negative
way, because we set stat.blksize to 128k.  stat.blksize is intended to
hint what the preferred IO size is for an FS.

Could you read this thread and contribute to it?

Hans

The most important part of the thread to read was:

Marr <marr@flex.com> wrote:
  

>>
>> ..
>>
>> When switching from kernel 2.4.31 to 2.6.13 (with everything else the same), 
>> there is a drastic increase in the time required to perform 'fseek()' on 
>> larger files (e.g. 4.3 MB, using ReiserFS [in case it matters], in my test 
>> case).
>> 
>> It seems that any seeks in a range larger than 128KB (regardless of the file 
>> size or the position within the file) cause the performance to drop 
>> precipitously.
>>

Interesting.

What's happening is that glibc does a read from the file within each
fseek().  Which might seem a bit silly because the app could seek somewhere
else without doing any IO.  But then the app would be silly too.

Also, glibc is using the value returned in struct stat's blksize (a hint as
to this file's preferred read chunk size) as, umm, a hint as to this file's
preferred read size.
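
For anyone who wants to check what their own filesystem reports, the hint can be read with stat(2). This is only a minimal sketch: the helper name preferred_io_size() is made up for illustration, and the value printed depends on the filesystem and kernel in use.

```c
#include <sys/stat.h>

/* Hypothetical helper (not from the thread): return the preferred I/O
 * size hint that the filesystem reports for `path` in st_blksize,
 * or -1 if stat() fails. */
long preferred_io_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;              /* errno is set by stat() */

    return (long)st.st_blksize; /* the hint discussed above */
}
```

Calling preferred_io_size() on a file on the filesystem in question and printing the result with "%ld" shows the value stdio would be given as its hint.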

Most filesystems return 4k in stat.blksize.  But in 2.6, reiserfs bumped
that to 128k to get good I/O patterns.   Consequently this:

  

>>          for (j=0; j < max_calls; j++) {
>>             pos = (int)(((double)random() / (double)RAND_MAX) * 4000000.0);
>>             if (fseek(inp_fh, pos, SEEK_SET)) {
>>                printf("Error ('%s') seeking to position %d!\n", 
>>                       strerror(errno), pos);
>>             }
>>          }

runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
on every fseek.
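
One application-side mitigation that might be worth testing is to give the stream a smaller buffer with setvbuf() before the first seek. This sketch assumes that glibc sizes the read it performs inside fseek() from the stream's buffer, which is not confirmed anywhere in this thread, so measure before relying on it. The helper name open_with_small_buffer() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: open `path` for reading and attach a caller-supplied
 * stdio buffer of `buf_size` bytes, instead of letting the C library size
 * the buffer from st_blksize.  `buf` must stay valid until fclose().
 * Returns NULL on failure. */
FILE *open_with_small_buffer(const char *path, char *buf, size_t buf_size)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;

    /* setvbuf() has to be called before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, buf_size) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

With a 4096-byte static buffer passed in, the fseek() loop quoted above can be rerun unchanged on the returned stream to see whether the per-seek cost changes.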



Nick Piggin wrote:

> Hans Reiser wrote:
>
>> Sounds like the real problem is that glibc is doing filesystem
>> optimizations without making them conditional on the filesystem type.
>
>
> I'm not sure that it should even be conditional on the filesystem type...
> To me it seems silly to even bother doing it, although I guess there
> is another level of buffering involved which might mean it makes more
> sense.
>
>
>> My entry for the ugliest thought of the day: I wonder if the kernel can
>> test the glibc version and.....
>>
>> Hans
>>
>> Nick Piggin wrote:
>>
>>
>>> Actually glibc tries to turn this pre-read off if the seek is to a page
>>> aligned offset, presumably to handle this case. However a big write
>>> would only have to RMW the first and last partial pages, so pre-reading
>>> 128KB in this case is wrong.
>>>
>>> And I would also say a 4K read is wrong as well, because a big read
>>> will be less efficient due to the extra syscall and small IO.
>>>


