* fsync delays for a long time.
@ 2002-02-14 16:03 Alexander Moibenko
  2002-02-14 17:39 ` Alan Cox
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Moibenko @ 2002-02-14 16:03 UTC (permalink / raw)
  To: linux-kernel

Hi,
we are using gdbm in our application. We have noticed that whenever
a disk-intensive job is running, our application hangs for a very long time.
This is the scenario that causes trouble:
run my gdbm application and the bonnie benchmark on the same device.
When gdbm reaches the point where it calls fsync, the call stalls for a
long time.
The delay depends on CPU and disk speed, but it is always intolerably long:
from a few tens of seconds to minutes.
It does not seem to depend on the size of the DB.
The application runs on machines with 2.2.x kernels.
Has anyone seen the same problem?
I've seen a discussion on this list about poor performance of SCSI versus
IDE drives with MySQL. But we tried both, with the same (bad)
result. IDE is even worse in our case. That discussion also said
that fsync was modified in 2.4.x. But does that fix the problem?
Thanks in advance for comments and suggestions.

--------------------------------------------------------------------------
Alexander N. Moibenko, Integrated Systems Development, CD, Fermilab
email: moibenko@fnal.fnal.gov
--------------------------------------------------------------------------
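
One way to put a number on the stall described above is to time the
fsync() call itself while the competing I/O load is running. The
following is a minimal sketch, assuming a POSIX system that provides
clock_gettime(); the helper names timed_fsync() and report_fsync() are
made up for illustration and are not part of gdbm.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Returns the wall-clock seconds spent inside fsync(fd), or -1.0 on error.
 * Hypothetical helper, not taken from gdbm or the kernel. */
double timed_fsync(int fd)
{
    struct timespec start, end;

    assert(fd >= 0);
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    if (fsync(fd) != 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Times one fsync() on fd and prints the result with a caller-chosen label. */
void report_fsync(const char *label, int fd)
{
    double secs = timed_fsync(fd);

    if (secs < 0.0)
        perror("fsync");
    else
        printf("%s: fsync took %.3f s\n", label, secs);
}
```

Calling report_fsync() on the database file descriptor once with the
benchmark idle and once with it running should show whether the delay
tracks the competing load rather than the size of the file.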


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 16:03 fsync delays for a long time Alexander Moibenko
@ 2002-02-14 17:39 ` Alan Cox
  2002-02-14 20:51   ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Cox @ 2002-02-14 17:39 UTC (permalink / raw)
  To: Alexander Moibenko; +Cc: linux-kernel

> run my gdbm application and bonnie test on the same device.
> When gdbm comes to the point when it calls fsync it delays for a long
> time.

fsync on a very large file is very slow on the 2.2 kernels

> result. IDE is even worse in our case. In the discussion it was also said
> that fsync for 2.4.x is modified. But does it fix a problem?

2.4 is a lot smarter about flushing only the things it needs to. That
makes it dependent on the number of blocks to write, not some embarrassingly
large power of the file size.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 17:39 ` Alan Cox
@ 2002-02-14 20:51   ` Andrew Morton
  2002-02-14 21:09     ` Alan Cox
  2002-02-15 17:24     ` Simon Kirby
  0 siblings, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2002-02-14 20:51 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alexander Moibenko, linux-kernel

Alan Cox wrote:
> 
> > run my gdbm application and bonnie test on the same device.
> > When gdbm comes to the point when it calls fsync it delays for a long
> > time.
> 
> fsync on a very large file is very slow on the 2.2 kernels

This could very well be due to request allocation starvation.
fsync is sleeping in __get_request_wait() while bonnie keeps
on stealing all the requests.

Recall that patch you dropped on Tuesday? :)

> > result. IDE is even worse in our case. In the discussion it was also said
> > that fsync for 2.4.x is modified. But does it fix a problem?
> 
> 2.4 is a lot smarter about flushing only the things it needs to. That
> makes it dependant on the number of blocks to write not some embarrasingly
> large power of the file size

OBTW: It seems that fsync_inode_data_buffers() will livelock if
another process is writing to the same file.  gargh.

-
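
The livelock mentioned above can be illustrated with a toy model. This
is only a sketch and not the kernel's code: the queue, the function
names and the busy_writer() hook are all hypothetical. It contrasts a
sync that keeps going until the dirty queue is empty with one that
stops at the queue position recorded on entry.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 1024

/* Toy dirty-block queue: blocks between head and tail still need writing. */
struct dirty_queue {
    int blocks[QUEUE_CAP];
    size_t head;   /* next block to flush */
    size_t tail;   /* next free slot */
};

void mark_dirty(struct dirty_queue *q, int block)
{
    assert(q->tail < QUEUE_CAP);
    q->blocks[q->tail++] = block;
}

/* Models a concurrent writer: dirties one new block per flushed block. */
void busy_writer(struct dirty_queue *q)
{
    if (q->tail < QUEUE_CAP)
        mark_dirty(q, (int)q->tail);
}

/* Flushes only what was dirty on entry; work is bounded by the snapshot. */
size_t sync_dirty_on_entry(struct dirty_queue *q,
                           void (*on_flush)(struct dirty_queue *))
{
    size_t stop = q->tail;   /* snapshot taken once, on entry */
    size_t flushed = 0;

    while (q->head < stop) {
        q->head++;           /* stands in for writing the block */
        flushed++;
        if (on_flush)
            on_flush(q);
    }
    return flushed;
}

/* Flushes until the queue is empty; max_steps only keeps the demo finite. */
size_t sync_until_empty(struct dirty_queue *q,
                        void (*on_flush)(struct dirty_queue *),
                        size_t max_steps)
{
    size_t flushed = 0;

    while (q->head < q->tail && flushed < max_steps) {
        q->head++;
        flushed++;
        if (on_flush)
            on_flush(q);
    }
    return flushed;
}
```

With four blocks dirty on entry and busy_writer() installed,
sync_dirty_on_entry() returns after four flushes, while
sync_until_empty() only stops because of the artificial max_steps cap.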

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 20:51   ` Andrew Morton
@ 2002-02-14 21:09     ` Alan Cox
  2002-02-14 21:12       ` Alexander Moibenko
  2002-02-14 21:20       ` Andrew Morton
  2002-02-15 17:24     ` Simon Kirby
  1 sibling, 2 replies; 17+ messages in thread
From: Alan Cox @ 2002-02-14 21:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, Alexander Moibenko, linux-kernel

> > fsync on a very large file is very slow on the 2.2 kernels
> 
> This could very well be due to request allocation starvation.
> fsync is sleeping in __get_request_wait() while bonnie keeps
> on stealing all the requests.

That may amplify it but in the 2.2 case fsync on any sensible sized file
is already horribly performing. It hits databases like solid quite badly

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 21:09     ` Alan Cox
@ 2002-02-14 21:12       ` Alexander Moibenko
  2002-02-14 21:37         ` Alan Cox
  2002-02-14 21:20       ` Andrew Morton
  1 sibling, 1 reply; 17+ messages in thread
From: Alexander Moibenko @ 2002-02-14 21:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, linux-kernel

On Thu, 14 Feb 2002, Alan Cox wrote:

> > > fsync on a very large file is very slow on the 2.2 kernels
> >
> > This could very well be due to request allocation starvation.
> > fsync is sleeping in __get_request_wait() while bonnie keeps
> > on stealing all the requests.
>
> That may amplify it but in the 2.2 case fsync on any sensible sized file
> is already horribly performing. It hits databases like solid quite badly
>
Please elaborate on "sensible sized". In my case it is less than 20MB.
--------------------------------------------------------------------------
Alexander N. Moibenko, Integrated Systems Development, CD, Fermilab
Tel: (630)840-3937
email: moibenko@fnal.fnal.gov
--------------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 21:09     ` Alan Cox
  2002-02-14 21:12       ` Alexander Moibenko
@ 2002-02-14 21:20       ` Andrew Morton
  2002-02-14 21:38         ` Alan Cox
  2002-02-15 16:48         ` Stephen C. Tweedie
  1 sibling, 2 replies; 17+ messages in thread
From: Andrew Morton @ 2002-02-14 21:20 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alexander Moibenko, linux-kernel

Alan Cox wrote:
> 
> > > fsync on a very large file is very slow on the 2.2 kernels
> >
> > This could very well be due to request allocation starvation.
> > fsync is sleeping in __get_request_wait() while bonnie keeps
> > on stealing all the requests.
> 
> That may amplify it but in the 2.2 case fsync on any sensible sized file
> is already horribly performing. It hits databases like solid quite badly

I'm surprised.  ext2's fsync in 2.2 is in fact quite optimal: a single
pass across the block tree, in probable-LBA-order.  No livelock potential
there.  Optimal.  Note that it implements "only sync the stuff which was
dirty on entry" semantics.

But msync() is a different kettle of fish.  It calls file_fsync(), which
syncs the entire device, livelockably.  Are you sure `solid' is not
using msync?

-
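
For anyone checking whether an application takes the msync() path
rather than fsync(), the pattern to look for is roughly the one below.
This is a minimal sketch using only standard POSIX calls; the function
name write_via_mmap() is hypothetical and nothing here is taken from
`solid' or gdbm.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes len bytes of data at the start of fd through a shared mapping,
 * then forces them out with msync(). Returns 0 on success, -1 on failure. */
int write_via_mmap(int fd, const char *data, size_t len)
{
    void *map;
    int rc;

    assert(fd >= 0 && data != NULL && len > 0);

    /* The file must be at least as long as the mapping we write through. */
    if (ftruncate(fd, (off_t)len) != 0) {
        perror("ftruncate");
        return -1;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    memcpy(map, data, len);

    /* MS_SYNC blocks until the dirty pages have been written back. */
    rc = msync(map, len, MS_SYNC);
    if (rc != 0)
        perror("msync");

    if (munmap(map, len) != 0)
        rc = -1;
    return rc;
}
```

An application that only uses write() followed by fsync() never reaches
msync(), so the two cases can be told apart from the source or from a
system-call trace.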

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 21:37         ` Alan Cox
@ 2002-02-14 21:26           ` Alexander Moibenko
  2002-02-14 21:41             ` Andrew Morton
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Moibenko @ 2002-02-14 21:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andrew Morton, linux-kernel

On Thu, 14 Feb 2002, Alan Cox wrote:

> > > > This could very well be due to request allocation starvation.
> > > > fsync is sleeping in __get_request_wait() while bonnie keeps
> > > > on stealing all the requests.
> > >
> > > That may amplify it but in the 2.2 case fsync on any sensible sized file
> > > is already horribly performing. It hits databases like solid quite badly
> > >
> > please elaborate on "sensible sized". In my case it is less then 20MB.
>
> That ought to be ok. Andrew may well be right in that case.
>
Then what is your advice? Switch to 2.4.x?


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 21:12       ` Alexander Moibenko
@ 2002-02-14 21:37         ` Alan Cox
  2002-02-14 21:26           ` Alexander Moibenko
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Cox @ 2002-02-14 21:37 UTC (permalink / raw)
  To: Alexander Moibenko; +Cc: Alan Cox, Andrew Morton, linux-kernel

> > > This could very well be due to request allocation starvation.
> > > fsync is sleeping in __get_request_wait() while bonnie keeps
> > > on stealing all the requests.
> >
> > That may amplify it but in the 2.2 case fsync on any sensible sized file
> > is already horribly performing. It hits databases like solid quite badly
> >
> please elaborate on "sensible sized". In my case it is less then 20MB.

That ought to be ok. Andrew may well be right in that case.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 21:20       ` Andrew Morton
@ 2002-02-14 21:38         ` Alan Cox
  2002-02-15 16:48         ` Stephen C. Tweedie
  1 sibling, 0 replies; 17+ messages in thread
From: Alan Cox @ 2002-02-14 21:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, Alexander Moibenko, linux-kernel

> there.  Optimal.  Note that it implements "only sync the stuff which was
> dirty on entry" semantics.
> 
> But msync() is a different kettle of fish.  It calls file_fsync(), which
> syncs the entire device, livelockably.  Are you sure `solid' is not
> using msync?

Could be. I'm going on second hand reports here. 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 21:26           ` Alexander Moibenko
@ 2002-02-14 21:41             ` Andrew Morton
  2002-02-14 22:18               ` Alexander Moibenko
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2002-02-14 21:41 UTC (permalink / raw)
  To: Alexander Moibenko; +Cc: Alan Cox, linux-kernel

Alexander Moibenko wrote:
> 
> On Thu, 14 Feb 2002, Alan Cox wrote:
> 
> > > > > This could very well be due to request allocation starvation.
> > > > > fsync is sleeping in __get_request_wait() while bonnie keeps
> > > > > on stealing all the requests.
> > > >
> > > > That may amplify it but in the 2.2 case fsync on any sensible sized file
> > > > is already horribly performing. It hits databases like solid quite badly
> > > >
> > > please elaborate on "sensible sized". In my case it is less then 20MB.
> >
> > That ought to be ok. Andrew may well be right in that case.
> >
> Then what is your advise. Switch to 2.4.x?

I would recommend that, yes.  One consideration: if the problem
is still appearing in 2.4 then it is about 1000 times more
likely to get fixed.

What filesystem were you using, BTW?  ext2?

If you do test on 2.4, and the problem still appears, please try

wget http://www.zip.com.au/~akpm/linux/2.4/2.4.18-pre9/make_request.patch
patch -p1 < make_request.patch

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 21:41             ` Andrew Morton
@ 2002-02-14 22:18               ` Alexander Moibenko
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Moibenko @ 2002-02-14 22:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, linux-kernel

On Thu, 14 Feb 2002, Andrew Morton wrote:

> Alexander Moibenko wrote:
> >
> > On Thu, 14 Feb 2002, Alan Cox wrote:
> >
> > > > > > This could very well be due to request allocation starvation.
> > > > > > fsync is sleeping in __get_request_wait() while bonnie keeps
> > > > > > on stealing all the requests.
> > > > >
> > > > > That may amplify it but in the 2.2 case fsync on any sensible sized file
> > > > > is already horribly performing. It hits databases like solid quite badly
> > > > >
> > > > please elaborate on "sensible sized". In my case it is less then 20MB.
> > >
> > > That ought to be ok. Andrew may well be right in that case.
> > >
> > Then what is your advise. Switch to 2.4.x?
>
> I would recommend that, yes.  One consideration: if the problem
> is still appearing in 2.4 then it is about 1000 times more
> likely to get fixed.
Thanks a lot.
>
> What filesystem were you using, BTW?  ext2?
Yes, it is ext2
>
> If you do test on 2.4, and the problem still appears, please try
>
> wget http://www.zip.com.au/~akpm/linux/2.4/2.4.18-pre9/make_request.patch
> patch -p1 < make_request.patch
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 21:20       ` Andrew Morton
  2002-02-14 21:38         ` Alan Cox
@ 2002-02-15 16:48         ` Stephen C. Tweedie
  1 sibling, 0 replies; 17+ messages in thread
From: Stephen C. Tweedie @ 2002-02-15 16:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, Alexander Moibenko, linux-kernel, Stephen Tweedie

Hi,

On Thu, Feb 14, 2002 at 01:20:57PM -0800, Andrew Morton wrote:

> I'm surprised.  ext2's fsync in 2.2 is in fact quite optimal: a single
> pass across the block tree, in probable-LBA-order.

Except that it pages the entire indirect tree into memory, which is
_really_ painful if you're just appending to the end of a 1GB file.
See ext2_sync_file(): in 2.2 it uses the indirect tree to locate all
possible buffer_heads which might need flushing, and then it does a
buffer cache lookup for every block in the file.  It's _way_
suboptimal for large files, just in terms of the CPU cost and memory
cost of maintaining and walking the block tree.

2.4 doesn't need to search physically for dirty buffers, so is much
much faster.

Cheers,
 Stephen
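
The cost difference described above can be sketched with a toy model.
This is not the ext2 code: the array of dirty flags, the explicit dirty
list and both function names are assumptions made for illustration. The
first function looks at every block of the file, as a walk of the whole
block tree would; the second visits only blocks recorded as dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct flush_stats {
    size_t lookups;   /* blocks examined */
    size_t flushed;   /* blocks actually written */
};

/* Scan model: examine every block in the file to find the dirty ones. */
struct flush_stats flush_by_scan(bool *dirty, size_t nblocks)
{
    struct flush_stats s = {0, 0};

    for (size_t b = 0; b < nblocks; b++) {
        s.lookups++;
        if (dirty[b]) {
            dirty[b] = false;   /* stands in for writing the block */
            s.flushed++;
        }
    }
    return s;
}

/* List model: visit only the blocks that were recorded as dirty. */
struct flush_stats flush_by_list(bool *dirty, size_t nblocks,
                                 const size_t *dirty_list, size_t ndirty)
{
    struct flush_stats s = {0, 0};

    for (size_t i = 0; i < ndirty; i++) {
        size_t b = dirty_list[i];

        assert(b < nblocks);
        s.lookups++;
        if (dirty[b]) {
            dirty[b] = false;
            s.flushed++;
        }
    }
    return s;
}
```

Assuming 4 KB blocks, a 1GB file has 262144 blocks. After an append
that dirties three of them, the scan model performs 262144 lookups and
the list model performs three, for the same three blocks written.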

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-14 20:51   ` Andrew Morton
  2002-02-14 21:09     ` Alan Cox
@ 2002-02-15 17:24     ` Simon Kirby
  2002-02-15 19:07       ` Andrew Morton
  2002-02-17 12:47       ` Bill Davidsen
  1 sibling, 2 replies; 17+ messages in thread
From: Simon Kirby @ 2002-02-15 17:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, linux-kernel

On Thu, Feb 14, 2002 at 12:51:14PM -0800, Andrew Morton wrote:

> Alan Cox wrote:
> > 
> > > run my gdbm application and bonnie test on the same device.
> > > When gdbm comes to the point when it calls fsync it delays for a long
> > > time.
> > 
> > fsync on a very large file is very slow on the 2.2 kernels
> 
> This could very well be due to request allocation starvation.
> fsync is sleeping in __get_request_wait() while bonnie keeps
> on stealing all the requests.
> 
> Recall that patch you dropped on Tuesday? :)

Not sure if this is related, but I still can't get 2.4 or 2.5 kernels to
actually read and write at the same time during a large file copy between
two totally separate devices (eg: from hda1 to hdc1).  "vmstat 1" shows
reads with no writing for about 6-8 seconds followed by writes with no
reading for about 5-6 seconds, repeat.

Is there a patch available that could fix this?

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-15 17:24     ` Simon Kirby
@ 2002-02-15 19:07       ` Andrew Morton
  2002-02-15 19:21         ` Simon Kirby
  2002-02-17 12:47       ` Bill Davidsen
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2002-02-15 19:07 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Alan Cox, linux-kernel

Simon Kirby wrote:
> 
> Not sure if this is related, but I still can't get 2.4 or 2.5 kernels to
> actually read and write at the same time during a large file copy between
> two totally separate devices (eg: from hda1 to hdc1).  "vmstat 1" shows
> reads with no writing for about 6-8 seconds followed by writes with no
> reading for about 5-6 seconds, repeat.

That's different.

It tends to be the case that when the dirty-data-generator hits
a particular threshold, it blocks while we write out vast amounts
of data.  So the throughput is very lumpy.

It's probable that it can be tamed a bit by fiddling with the
/proc/sys/vm/bdflush parameters. 

> Is there a patch available that could fix this?

The -aa patches fiddle extensively with the bdflush thresholds and logic.
There's stuff in there which might address this.

-

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-15 19:07       ` Andrew Morton
@ 2002-02-15 19:21         ` Simon Kirby
  0 siblings, 0 replies; 17+ messages in thread
From: Simon Kirby @ 2002-02-15 19:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alan Cox, linux-kernel

On Fri, Feb 15, 2002 at 11:07:26AM -0800, Andrew Morton wrote:

> Simon Kirby wrote:
> > 
> > Not sure if this is related, but I still can't get 2.4 or 2.5 kernels to
> > actually read and write at the same time during a large file copy between
> > two totally separate devices (eg: from hda1 to hdc1).  "vmstat 1" shows
> > reads with no writing for about 6-8 seconds followed by writes with no
> > reading for about 5-6 seconds, repeat.
> 
> That's different.
> 
> It tends to be the case that when the dirty-data-generator hits
> a particular threshold, it blocks while we write out vast amounts
> of data.  So the throughput is very lumpy.
> 
> It's probable that it can be tamed a bit by fiddling with the
> /proc/sys/vm/bdflush parameters. 

I did try fiddling with bdflush, and I was able to get them to read and
write at what looked like the same time, but the resolution of "vmstat 1"
wasn't really good enough to see.  Also, I think the overall throughput
was the same, whereas it should be roughly twice as high (shouldn't it
be possible to read and write as fast as the lowest speed of both
drives?).
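
The "roughly twice as high" expectation can be checked with a
back-of-the-envelope model. This is a sketch under simplifying
assumptions: constant transfer rates, no seek or cache effects, and
function names invented for the purpose.

```c
#include <assert.h>

/* Seconds to copy mb megabytes when reading and writing strictly alternate:
 * every megabyte costs its read time plus its write time. */
double copy_time_alternating(double mb, double read_mbps, double write_mbps)
{
    assert(read_mbps > 0.0 && write_mbps > 0.0);
    return mb / read_mbps + mb / write_mbps;
}

/* Seconds to copy mb megabytes when reads and writes fully overlap:
 * the slower of the two drives sets the pace. */
double copy_time_overlapped(double mb, double read_mbps, double write_mbps)
{
    double slowest;

    assert(read_mbps > 0.0 && write_mbps > 0.0);
    slowest = read_mbps < write_mbps ? read_mbps : write_mbps;
    return mb / slowest;
}
```

With two drives of equal speed the alternating copy takes twice as long
as the overlapped one; the further apart the two speeds are, the smaller
the gain from overlapping.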

> > Is there a patch available that could fix this?
> 
> The -aa patches fiddle extensively with the bdflush thresholds and logic.
> There's stuff in there which might addresses this.

I'll take a look at this.

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-15 17:24     ` Simon Kirby
  2002-02-15 19:07       ` Andrew Morton
@ 2002-02-17 12:47       ` Bill Davidsen
  2002-02-17 16:45         ` Simon Kirby
  1 sibling, 1 reply; 17+ messages in thread
From: Bill Davidsen @ 2002-02-17 12:47 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Linux Kernel Mailing List

On Fri, 15 Feb 2002, Simon Kirby wrote:

> On Thu, Feb 14, 2002 at 12:51:14PM -0800, Andrew Morton wrote:
> Not sure if this is related, but I still can't get 2.4 or 2.5 kernels to
> actually read and write at the same time during a large file copy between
> two totally separate devices (eg: from hda1 to hdc1).  "vmstat 1" shows
> reads with no writing for about 6-8 seconds followed by writes with no
> reading for about 5-6 seconds, repeat.

You don't have enough memory... you can probably tune bdflush to make the
system flush writes more aggressively, but one cause is that you fill all
available memory before bdflush even runs. Try setting the time way down,
say one sec, and see if things change.

Note that this may not make the system run faster in any significant way,
it may just get all the drive lights blinking.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fsync delays for a long time.
  2002-02-17 12:47       ` Bill Davidsen
@ 2002-02-17 16:45         ` Simon Kirby
  0 siblings, 0 replies; 17+ messages in thread
From: Simon Kirby @ 2002-02-17 16:45 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Sun, Feb 17, 2002 at 07:47:38AM -0500, Bill Davidsen wrote:

> On Fri, 15 Feb 2002, Simon Kirby wrote:
> 
> > On Thu, Feb 14, 2002 at 12:51:14PM -0800, Andrew Morton wrote:
> > Not sure if this is related, but I still can't get 2.4 or 2.5 kernels to
> > actually read and write at the same time during a large file copy between
> > two totally separate devices (eg: from hda1 to hdc1).  "vmstat 1" shows
> > reads with no writing for about 6-8 seconds followed by writes with no
> > reading for about 5-6 seconds, repeat.
> 
> You don't have enough memory... you can probably tune bdflush to make the
> system flush writes more aggressively, but one cause is that you fill all
> available memory before bdflush even runs. Try setting the time way down,
> say one sec, and see if things change.
> 
> Note that this may not make the system run faster in any significant way,
> it may just get all the drive lights blinking.

This is happening on boxes with 1 GB of memory, etc... Besides that,
bdflush's first argument is a percentage, and that's what I was
adjusting.  And yes, I've set the percentage way down and it has
increased the rate at which it switches back and forth to make it look
like it's reading and writing at the same time, but it's not.  Throughput
never goes up.

This sort of thing works fine in 2.2 kernels...

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

^ permalink raw reply	[flat|nested] 17+ messages in thread
