public inbox for linux-kernel@vger.kernel.org
* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()'
From: Al Boldi @ 2006-03-13 11:37 UTC
  To: Marr; +Cc: linux-kernel

Marr wrote:
> The 2.6.13 kernel on ReiserFS (without using 
> 'nolargeio=1' as a mount option) still takes about 4m35s to fseek 200,000 
> times on that 4MB file, even with 'hdparm -a0 /dev/hda' in effect.

try this magic number:

        echo 192 > /sys/block/hda/queue/max_sectors_kb
        echo 192 > /sys/block/hda/queue/read_ahead_kb

Anything outside 132-255 affects throughput negatively.

Also, can you dump hdparm -I /dev/hda?

Thanks!

--
Al



* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()'
From: Marr @ 2006-03-13 20:01 UTC
  To: Al Boldi
  Cc: linux-kernel, reiserfs-dev, Andrew Morton, Mark Lord, Linda Walsh,
	Bill Davidsen, marr, Hans Reiser

On Monday 13 March 2006 6:37am, Al Boldi wrote:
> Marr wrote:
> > The 2.6.13 kernel on ReiserFS (without using
> > 'nolargeio=1' as a mount option) still takes about 4m35s to fseek 200,000
> > times on that 4MB file, even with 'hdparm -a0 /dev/hda' in effect.
>
> try this magic number:
>
>         echo 192 > /sys/block/hda/queue/max_sectors_kb
>         echo 192 > /sys/block/hda/queue/read_ahead_kb
>
> Anything outside 132-255 affects throughput negatively.

I tried this, but it seems that neither of these settings will take any value 
over 128 (which is what they both started at before I tried to change them). 
It seems that I can set values _lower_ than 128, but nothing higher (stock 
Slackware 10.2 2.6.13 kernel).

> Also, can you dump hdparm -I /dev/hda?

Sure. Here are the results:

----------------------------------------

/dev/hda:

ATA device, with non-removable media
	Model Number:       TOSHIBA MK2023GAS                       
	Serial Number:      54594043T
	Firmware Revision:  MB001A  
Standards:
	Supported: 5 4 3 2 
	Likely used: 6
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	--
	CHS current addressable sectors:   16514064
	LBA    user addressable sectors:   39070080
	device size with M = 1024*1024:       19077 MBytes
	device size with M = 1000*1000:       20003 MBytes (20 GB)
Capabilities:
	LBA, IORDY(can be disabled)
	bytes avail on r/w long: 48	Queue depth: 1
	Standby timer values: spec'd by Standard, no device specific minimum
	R/W multiple sector transfer: Max = 16	Current = ?
	Advanced power management level: unknown setting (0x0080)
	DMA: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4 udma5 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	NOP cmd
	   *	READ BUFFER cmd
	   *	WRITE BUFFER cmd
	   *	Host Protected Area feature set
	   *	Look-ahead
	   *	Write cache
	   *	Power Management feature set
		Security Mode feature set
		SMART feature set
	   *	Mandatory FLUSH CACHE command 
	   *	Device Configuration Overlay feature set 
		SET MAX security extension
	   *	Advanced Power Management feature set
	   *	SMART self-test 
	   *	SMART error logging 
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
	not	supported: enhanced erase
	24min for SECURITY ERASE UNIT. 
HW reset results:
	CBLID- above Vih
	Device num = 0 determined by the jumper
Checksum: correct

----------------------------------------

I don't think anyone is implying this, but for the record, this problem is not 
peculiar to this particular hard disk drive. I seem to see the problem on any 
2.6.x kernel with a ReiserFS filesystem that has _not_ enabled the 
(undocumented, as near as I can see) 'nolargeio=1' option on the mount.

Also, I haven't heard anything more after Hans Reiser queried Ulrich Drepper 
about the 'glibc' angle on this problem, so I'm wondering if there's been any 
progress on that front. Oh, I just noticed that Hans has re-pinged Ulrich on 
this issue -- thanks, Hans!

Anyway, as always, thanks to all who've contributed to this thread.

*** Please CC: me on replies -- I'm not subscribed.

Regards,
Bill Marr


* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()'
From: Marr @ 2006-03-26 22:25 UTC
  To: linux-kernel, reiserfs-dev, drepper, Hans Reiser
  Cc: Andrew Morton, Mark Lord, Linda Walsh, Bill Davidsen, Gerold Jury,
	Robert Hancock, Al Boldi, Ingo Oeser, Nick Piggin,
	Arjan van de Ven, marr

Greetings, Ulrich, Hans, et al,

*** Please CC: me on replies -- I'm not subscribed.

After some more testing and some input (off-list) from others, here is a 
summary of this problem and its various work-arounds to date....

On Monday 27 February 2006 4:53pm, Hans Reiser wrote:
> Andrew Morton wrote:
> >runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
> >on every fseek.
> >
> >- There may be a libc stdio function which allows you to tune this
> >  behaviour.

It turns out that there is just such a function. Thanks to some sage 
(off-list) advice from Gerold Jury, this is an effective way to switch the 
file's stream to "unbuffered" mode:

   setvbuf( inp_fh, 0, _IONBF, 0 );

This results in incredible speedups on the ReiserFS+2.6.x setup, without the 
need to even use the 'nolargeio=1' mount option. Basically, we're going from 
128KB read-ahead on every 'fseek()' call to no read-ahead.
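A minimal sketch of how the 'setvbuf()' call above might be used in a seek-heavy reader. The helper names 'open_unbuffered()' and 'read_at()' are hypothetical and not taken from the application discussed in this thread. The one firm requirement is that 'setvbuf()' runs after 'fopen()' and before any other operation on the stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a file for reading and switch the stream to
 * unbuffered mode. setvbuf() must be called before any other operation
 * on the stream, so it is done right after fopen(). */
static FILE *open_unbuffered(const char *path)
{
    FILE *fh = fopen(path, "rb");
    if (fh == NULL)
        return NULL;
    if (setvbuf(fh, NULL, _IONBF, 0) != 0) {
        fclose(fh);
        return NULL;
    }
    return fh;
}

/* Hypothetical helper: seek to an absolute offset and read `len` bytes.
 * Returns the number of bytes actually read (0 if the seek failed). */
static size_t read_at(FILE *fh, long off, void *buf, size_t len)
{
    assert(fh != NULL && buf != NULL);
    if (fseek(fh, off, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, len, fh);
}
```

With an unbuffered stream, each 'fread()' asks the kernel only for roughly the bytes requested, so this suits workloads made of many small, scattered reads. Long sequential reads may well be slower this way.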

> >
> >- libc should probably be a bit more defensive about this anyway -
> >  plainly the filesystem is being silly.
>
> I really thank you for isolating the problem, but I don't see how you
> can do other than blame glibc for this.  The recommended IO size is only
> relevant to uncached data, and glibc is using it regardless of whether
> or not it is cached or uncached.   Do I misunderstand something myself
> here?

To date, I've not seen anyone address this implicit question/issue that Hans 
raised. To wit: Is the "recommended I/O size" only relevant to _uncached_ 
data???

If not, then anyone using ReiserFS on a 2.6.x kernel had best be well aware 
that 128KB read-aheads are going to occur with every 'fseek()' call, 
degrading performance drastically. This seems like a good reason for the 
ReiserFS folks to re-evaluate the use of 128KB as the default value for 
read-ahead.

Alternatively, if "recommended I/O size" _is_ (intended to be) only relevant 
to _uncached_ data, then the question becomes this: Is 'glibc' erroneously 
using that recommended size regardless of whether the data is cached or 
uncached?

Ulrich, we'd really appreciate your input on this matter. Please advise. Even 
a simple reply of "buzz off" would be useful at this point! ;^)

------------------------------

In summary, the problem still exists, but any of the following work-arounds 
is effective, ordered here from best to worst:

(A) Use a 'setvbuf()' call in the target application to disable (or reduce) 
buffering on the input stream. 

Under certain conditions, this should be useful even when not using ReiserFS 
and/or when not running a 2.6.x kernel. However, it's almost essential 
(currently) with ReiserFS and 2.6.x kernels, for apps which do a lot of file 
seeks using ANSI C file I/O (i.e. 'fseek()').

OR

(B) Use the 'nolargeio=1' option when mounting a ReiserFS partition under 
2.6.x kernels. This effectively changes the recommended I/O read-ahead after 
each 'fseek()' call from 128KB to 4KB.

Unlike option (A) above, this is useful for situations where you don't have 
access to the source code of the target application(s).

However, Andrew Morton mentioned this possible negative side-effect:

>   This will alter the behaviour of every reiserfs filesystem in the
>   machine.  Even the already mounted ones.

OR

(C) Don't use ReiserFS (v3) under 2.6.x kernels (for apps which do a lot of 
file seeks using ANSI C file I/O).

For example, the 'ext2'/'ext3' filesystems seem to still use the 4KB 
read-ahead, resulting in _much_ better performance when performing multiple 
seeks (outside the range of the 'read-ahead' setting).
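Work-around (A) mentions reducing the buffering rather than disabling it. A minimal sketch of that variant follows, using a caller-supplied 4KB buffer to match the 4KB figure quoted for 'nolargeio=1' and ext2/ext3. The helper name 'open_with_buffer()' is hypothetical. The buffer is passed in explicitly because the C standard only says the size argument "may" be honoured when no buffer is supplied, so an implementation is free to ignore it in that case.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a file for reading with a fully buffered
 * stream that uses the caller's buffer of `size` bytes. The buffer must
 * stay valid until the stream is closed. */
static FILE *open_with_buffer(const char *path, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    FILE *fh = fopen(path, "rb");
    if (fh == NULL)
        return NULL;
    if (setvbuf(fh, buf, _IOFBF, size) != 0) {
        fclose(fh);
        return NULL;
    }
    return fh;
}
```

A typical call would pass a 'static char iobuf[4096];' together with 'sizeof iobuf'. Whether 4KB is the best size for a given workload is something to measure rather than assume.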

------------------------------

Of course, the unmentioned option (which basically bypasses the whole issue) 
is to convert the underlying application to use raw, unbuffered Unix file I/O 
(i.e. 'lseek() + read()' [or even just 'pread()', as suggested by Andrew 
Morton]) instead of ANSI C file I/O ('fseek() + fread()'), but that is 
considered out-of-scope for purposes of this discussion.
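For completeness, a rough sketch of the 'pread()' approach mentioned above. The helper names 'open_readonly()' and 'read_at_fd()' are hypothetical. 'pread()' reads at an explicit offset without moving the file position, so no separate 'lseek()' call is needed and no stdio buffer is involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a file read-only, returning the descriptor
 * or -1 on failure. */
static int open_readonly(const char *path)
{
    return open(path, O_RDONLY);
}

/* Hypothetical helper: read up to `len` bytes starting at offset `off`.
 * Retries on EINTR and on short reads. Returns the number of bytes read
 * (less than `len` only at end of file), or -1 on error. */
static ssize_t read_at_fd(int fd, off_t off, void *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```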

-----------------------------

Thanks to all who supplied input. Special thanks to Andrew Morton and Gerold 
Jury who supplied what effectively turned out to be the most-useful 
work-arounds.

*** Please CC: me on replies -- I'm not subscribed.

Bill Marr


* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()'
From: Hans Reiser @ 2006-03-27 18:50 UTC
  To: Marr
  Cc: linux-kernel, reiserfs-dev, drepper, Andrew Morton, Mark Lord,
	Linda Walsh, Bill Davidsen, Gerold Jury, Robert Hancock, Al Boldi,
	Ingo Oeser, Nick Piggin, Arjan van de Ven

Thanks Marr.

My concern here is with the users who have no idea what fseek is, and
just see their apps getting slow.  libc is to my mind doing the clearly
incorrect thing here.

Is there a libc developers mailing list, maybe we should try them if
Ulrich is no longer active in libc maintaining?

Hans




* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()'
From: Marr @ 2006-03-27 19:12 UTC
  To: Hans Reiser, libc-alpha
  Cc: linux-kernel, reiserfs-dev, drepper, Andrew Morton, Mark Lord,
	Linda Walsh, Bill Davidsen, Gerold Jury, Robert Hancock, Al Boldi,
	Ingo Oeser, Nick Piggin, Arjan van de Ven, marr

On Monday 27 March 2006 1:50pm, Hans Reiser wrote:
> Thanks Marr.
>
> My concern here is with the users who have no idea what fseek is, and
> just see their apps getting slow.  libc is to my mind doing the clearly
> incorrect thing here.
>
> Is there a libc developers mailing list, maybe we should try them if
> Ulrich is no longer active in libc maintaining?

Good point. I've found a 'glibc' developers' mailing list, so I'm including 
them on this reply. Hopefully someone there will pick up on this thread and 
respond.

Bill Marr

