* Delaying writes to disk when there's no need
@ 2003-03-26 20:31 Erik Hensema
  2003-03-27  9:06 ` Helge Hafting
  2003-03-28 23:12 ` Pavel Machek
  0 siblings, 2 replies; 23+ messages in thread
From: Erik Hensema @ 2003-03-26 20:31 UTC (permalink / raw)
  To: linux-kernel

In all kernels I've tested writes to disk are delayed a long time even when
there's no need to do so.

A very simple test shows this: on an otherwise idle system, create a tar of
a NFS-mounted filesystem to a local disk. The kernel starts writing out the
data after 30 seconds, while a slow and steady stream would be much nicer
to the system, I think.

On 2.4.x this can block the system for several seconds. 2.5.6x and
2.5.6x-mm (with AS) also show this behaviour, but the system doesn't block
anymore. I'm using a preemptible kernel.

I only started to notice this behaviour when I upgraded from 256 MB ram to
512 MB. In other words: Linux behaves more nicely with 256 MB.

Attached is a vmstat trace; it starts at the moment the tar process starts.
The command is a simple tar cf /local/test.tar /home (/home being mounted
over NFS).

Further data:
SIS5513: IDE controller at PCI slot 00:02.5
SIS5513: chipset revision 208
SIS5513: not 100% native mode: will probe irqs later
SiS745    ATA 100 controller
    ide0: BM-DMA at 0xff00-0xff07, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xff08-0xff0f, BIOS settings: hdc:DMA, hdd:DMA
hda: MAXTOR 6L080J4, ATA DISK drive

It's an 80 GB ATA 133 drive.

AMD Athlon XP 1800+, 512 MB ram.

Attached vmstat was created on:
Linux bender 2.5.66-mm1 #10 Wed Mar 26 15:16:17 CET 2003 i686 unknown

tar output was going to a logical volume formatted with reiserfs.

   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 5  0  0      0 123720  45068 200044    0    0     9    10 1038   766 73  1 26
 0  1  0      0 144640  45488 178568    0    0   420     0 1587  2139  9 15 76
 0  1  0      0 137976  45496 184624    0    0     4     0 4543  3415 17 28 54
 0  1  0      0 134848  45496 188112    0    0     0     0 2877  2311 21 29 50
 0  1  0      0 124992  45500 197848    0    0     0     0 5313  2795 32 59  9
 4  1  0      0 110528  45604 212088    0    0     0   496 7460  3510 24 72  4
 3  1  0      0 105408  45608 217068    0    0     0     0 3594  2436 20 27 53
 6  0  0      0 104192  45608 218160    0    0     0     0 2037  2068 15 18 68
 3  0  0      0  99328  45608 222736    0    0     0     0 4064  3484 16 26 57
 4  1  0      0  95040  45612 226888    0    0     0     0 3379  2500 24 30 45
 5  1  0      0  90752  45656 230936    0    0     0   116 3497  2821 19 21 60
 1  1  0      0  85056  45660 236568    0    0     0     0 3744  2369 25 45 30
 1  1  0      0  83200  45660 238308    0    0     0     0 2786  3216 40 18 42
 3  1  0      0  81400  45660 239876    0    0     0     0 3020  3587 22 21 57
 1  1  0      0  71608  45664 249512    0    0     0     0 5742  3378 23 48 29
 0  1  1      0  63872  45716 257040    0    0     0   124 5063  3405 25 36 39
 2  1  1      0  56064  45720 264752    0    0     0     0 4652  2694 26 52 22
 2  1  0      0  47936  45724 272660    0    0     0     0 5443  3850 27 41 32
 1  1  0      0  46912  45724 273664    0    0     0     0 2300  4372 66 21 13
 3  1  0      0  41728  45728 278732    0    0     0     0 3755  3370 50 37 13
 2  1  0      0  36928  45780 283368    0    0     0   124 4549  4094 33 33 35
 1  1  1      0  31680  45780 288508    0    0     0     0 4116  2931 23 31 46
 0  1  0      0  28672  45780 291436    0    0     0     0 2694  2183 33 33 33
 3  1  0      0  24448  45784 295556    0    0     0     0 3152  2322 19 30 51
 0  1  0      0  24256  45784 295724    0    0     0     0 1354  1414  6  4 90
 4  0  1      0  20800  45824 298184    0    0     0 21120 3267  3374 22 20 57
 4  0  1      0  18304  45828 300440    0    0     0 36328 3583  4013 47 53  0
 3  0  0      0  16640  45828 302468    0    0     0     0 3594  4084 16 15 69
 4  1  1      0  14456  45828 304332    0    0     0     0 3531  3946 23 20 57


-- 
Erik Hensema <erik@hensema.net>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Delaying writes to disk when there's no need
  2003-03-26 20:31 Delaying writes to disk when there's no need Erik Hensema
@ 2003-03-27  9:06 ` Helge Hafting
  2003-03-27 11:22   ` Erik Hensema
  2003-03-28 23:12 ` Pavel Machek
  1 sibling, 1 reply; 23+ messages in thread
From: Helge Hafting @ 2003-03-27  9:06 UTC (permalink / raw)
  To: erik; +Cc: linux-kernel

Erik Hensema wrote:
> In all kernels I've tested writes to disk are delayed a long time even when
> there's no need to do so.
> 
Short answer - it is supposed to do that!

> A very simple test shows this: on an otherwise idle system, create a tar of
> a NFS-mounted filesystem to a local disk. The kernel starts writing out the
> data after 30 seconds, while a slow and steady stream would be much nicer
> to the system, I think.
>
You're wrong then.  There's no need for a slow steady stream; why do
you want that?  Of course, you can set up cron to run sync at
regular (short) intervals to achieve this.

> On 2.4.x this can block the system for several seconds. 2.5.6x and
> 2.5.6x-mm (with AS) also show this behaviour, but the system doesn't block
> anymore. I'm using a preemptible kernel.
> 
Writing out stuff is not supposed to block the machine, and as you say,
it is fixed in 2.5.  No need for the steady writing.

> I only started to notice this behaviour when I upgraded from 256 MB ram to
> 512 MB. In other words: Linux behaves more nicely with 256 MB.
> 
Why do you think that is more nice?

Writing is delayed because that accumulates bigger writes and
fewer seeks.  This helps performance a lot.  Delaying writes
has another advantage - some writes won't be done at all,
saving 100% of the writing time.  This is the case for temporary
files that get written to, read, and deleted before they
get written to disk. It all happens in cache, improving
performance tremendously.  To see the alternative,
try booting with mem=4M or 16M or some such, with _no_ swapping.

Another case is a file that gets overwritten several times.
This all happens in memory because of the delay; only the final
version gets written to disk.  This is very common, even for files
that are written only once.  (Slowly extending a file
and writing it to disk every time _will_ write the same stuff
over and over, because it is impossible to add only a few bytes.
You always write a whole block; adding anything less than that
involves reading the half-full block, updating it, and writing it
back.  Keeping such operations in memory saves quite a few
disk writes.)
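
The block arithmetic in the parenthesis above can be made concrete with a
small model. This is only a sketch under stated assumptions: the function
names are hypothetical, the file starts empty, and "immediate" means every
append is pushed to disk at once, rewriting each block it touches. It counts
device block writes with and without the delay.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: count the device block writes needed
 * to append `total` bytes to an empty file in `chunk`-byte pieces. */

/* Every append goes straight to disk and rewrites each block it touches. */
size_t block_writes_immediate(size_t total, size_t chunk, size_t block)
{
    assert(chunk > 0 && block > 0);
    size_t writes = 0;
    for (size_t off = 0; off < total; off += chunk) {
        size_t len = (total - off < chunk) ? total - off : chunk;
        size_t first = off / block;              /* first block touched */
        size_t last = (off + len - 1) / block;   /* last block touched */
        writes += last - first + 1;
    }
    return writes;
}

/* Appends are collected in memory first, so each block is written once. */
size_t block_writes_delayed(size_t total, size_t block)
{
    assert(block > 0);
    return (total + block - 1) / block;          /* round up to whole blocks */
}
```

For example, appending 4096 bytes in 100-byte pieces to a file with 4096-byte
blocks costs 41 block writes when each append goes straight to disk, against
a single write when the appends are first collected in memory.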

For more detailed information, read a book about how filesystems and
disk caching work.

Helge Hafting



* Re: Delaying writes to disk when there's no need
  2003-03-27  9:06 ` Helge Hafting
@ 2003-03-27 11:22   ` Erik Hensema
  0 siblings, 0 replies; 23+ messages in thread
From: Erik Hensema @ 2003-03-27 11:22 UTC (permalink / raw)
  To: linux-kernel

Helge Hafting (helgehaf@aitel.hist.no) wrote:
> Erik Hensema wrote:
>> In all kernels I've tested writes to disk are delayed a long time even when
>> there's no need to do so.
>> 
> Short answer - it is supposed to do that!
> 
>> A very simple test shows this: on an otherwise idle system, create a tar of
>> a NFS-mounted filesystem to a local disk. The kernel starts writing out the
>> data after 30 seconds, while a slow and steady stream would be much nicer
>> to the system, I think.
>>
> You're wrong then.  There's no need for a slow steady stream, why do
> you want that.  Of course you can set up cron to run sync at
> regular (short) intervals to achieve this.
> 
>> On 2.4.x this can block the system for several seconds. 2.5.6x and
>> 2.5.6x-mm (with AS) also show this behaviour, but the system doesn't block
>> anymore. I'm using a preemptible kernel.
>> 
> Writing out stuff is not supposed to block the machine, and as you say,
> it is fixed in 2.5.  No need for the steady writing.
> 
>> I only started to notice this behaviour when I upgraded from 256 MB ram to
>> 512 MB. In other words: Linux behaves more nicely with 256 MB.
>> 
> Why do you think that is more nice?

Because the interactivity of the system is better with less memory.

> Writing is delayed because that accumulates bigger writes and
> fewer seeks.  This helps performance a lot.  Delaying writes
> has another advantage - some writes won't be done at all,
> saving 100% of the writing time.  This is the case for temporary
> files that get written to, read, and deleted before they
> get written to disk. It all happens in cache, improving
> performance tremendously.  To see the alternative,
> try booting with mem=4M or 16M or some such, with _no_ swapping.

I see that. However, I don't see why the kernel is writing out data
as aggressively as it does now. Delaying a write for 30 seconds isn't the
problem: the aggressive writes are. Since the disks are otherwise idle, the
kernel can gently start writing out the dirty cache. No need to try and
write 40 MB in 1 sec when you can write 10 MB/sec in 4 seconds.

[...]

> For more detailed information, read a book about how filesystems and
> disk caching works.

I'm just reporting what's happening to me in practice, I don't really care
about what should happen in theory.

-- 
Erik Hensema <erik@hensema.net>


* Re: Delaying writes to disk when there's no need
       [not found]   ` <20030327113014$37b4@gated-at.bofh.it>
@ 2003-03-28 10:18     ` Tim Connors
  2003-03-30 17:38       ` Helge Hafting
  0 siblings, 1 reply; 23+ messages in thread
From: Tim Connors @ 2003-03-28 10:18 UTC (permalink / raw)
  To: linux-kernel

In linux.kernel, you wrote:
> Helge Hafting (helgehaf@aitel.hist.no) wrote:
>> Erik Hensema wrote:
>>> In all kernels I've tested writes to disk are delayed a long time even when
>>> there's no need to do so.
>>> 
>> Short answer - it is supposed to do that!
>> 
>>> A very simple test shows this: on an otherwise idle system, create a tar of
>>> a NFS-mounted filesystem to a local disk. The kernel starts writing out the
>>> data after 30 seconds, while a slow and steady stream would be much nicer
>>> to the system, I think.

Agreed. We have a cluster which is writing on average something like
20 Megs/sec/node. We had to lower the write threshold from 30% to 0%,
because with the constant writing, linux will buffer it for 30 secs,
fill up RAM, try to empty the write-cache, stall, wash, rinse,
repeat. Because it was being filled up at roughly the rate it was
being emptied, once it got 30% behind, there was no catching up, so
the realtime system would lose data. Ouch.
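
A rough sketch of the arithmetic behind this, not a description of the actual
cluster: with a 30-second delay and 20 MB/sec of input, about 600 MB of dirty
data is waiting when writeback begins, and how long it takes to work that off
depends entirely on how much faster the disk is than the producer. The
function names and the 25 MB/sec disk rate used in the note below are
assumptions for illustration.

```c
#include <assert.h>

/* Dirty data (MB) that piles up while writeback is still being delayed. */
double backlog_after_delay(double produce_mb_s, double delay_s)
{
    assert(produce_mb_s >= 0.0 && delay_s >= 0.0);
    return produce_mb_s * delay_s;
}

/* Seconds needed to drain `backlog_mb` while the producer keeps writing.
 * Returns -1.0 when the disk is no faster than the producer, i.e. the
 * backlog never shrinks. */
double seconds_to_drain(double backlog_mb, double produce_mb_s,
                        double disk_mb_s)
{
    assert(backlog_mb >= 0.0 && produce_mb_s >= 0.0 && disk_mb_s >= 0.0);
    if (disk_mb_s <= produce_mb_s)
        return -1.0;
    return backlog_mb / (disk_mb_s - produce_mb_s);
}
```

With the assumed 25 MB/sec disk, backlog_after_delay(20, 30) gives 600 MB and
seconds_to_drain(600, 20, 25) gives 120 seconds; with a 20 MB/sec disk the
function returns -1 because the backlog can never be worked off.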

>> You're wrong then.  There's no need for a slow steady stream, why do
>> you want that.  Of course you can set up cron to run sync at
>> regular (short) intervals to achieve this.

Last time I checked, cron had 1 minute resolution.
 
> I see that. However, I don't see why the kernel is writing out data
> as aggressively as it does now. Delaying a write for 30 seconds isn't the
> problem: the aggressive writes are. Since the disks are otherwise idle, the
> kernel can gently start writing out the dirty cache. No need to try and
> write 40 MB in 1 sec when you can write 10 MB/sec in 4 seconds.
> 
> [...]
> 
>> For more detailed information, read a book about how filesystems and
>> disk caching works.
> 
> I'm just reporting what's happening to me in practice, I don't really care
> about what should happen in theory.

Exactly. 

Helge's comment about /tmp files and rewriting files multiple times:
in real life, how often does this happen? How often do you overwrite
one file many times in 30 seconds? The occasional 20 kilobyte /tmp
file perhaps, but I doubt it matters in real life. In real life, when
writing to disk constantly (not just scientific applications - I
believe this happens in the real world too!), waiting for 30 seconds
is a liability!


-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/

White dwarf seeks red giant star


* Re: Delaying writes to disk when there's no need
  2003-03-26 20:31 Delaying writes to disk when there's no need Erik Hensema
  2003-03-27  9:06 ` Helge Hafting
@ 2003-03-28 23:12 ` Pavel Machek
  2003-03-31 12:00   ` Erik Hensema
  1 sibling, 1 reply; 23+ messages in thread
From: Pavel Machek @ 2003-03-28 23:12 UTC (permalink / raw)
  To: Erik Hensema; +Cc: linux-kernel

Hi!

> In all kernels I've tested writes to disk are delayed a long time even when
> there's no need to do so.
> 
> A very simple test shows this: on an otherwise idle system, create a tar of
> a NFS-mounted filesystem to a local disk. The kernel starts writing out the
> data after 30 seconds, while a slow and steady stream would be much nicer
> to the system, I think.
> 

Well, doing writeback sooner when disks
are idle might be a good idea; detecting
whether a disk is idle might not be too easy, though.

OTOH, raid resync already has some
such detection?
				Pavel
-- 
				Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...



* Re: Delaying writes to disk when there's no need
  2003-03-28 10:18     ` Tim Connors
@ 2003-03-30 17:38       ` Helge Hafting
  0 siblings, 0 replies; 23+ messages in thread
From: Helge Hafting @ 2003-03-30 17:38 UTC (permalink / raw)
  To: Tim Connors; +Cc: linux-kernel

On Fri, Mar 28, 2003 at 09:18:59PM +1100, Tim Connors wrote:
> In linux.kernel, you wrote:
> > Helge Hafting (helgehaf@aitel.hist.no) wrote:
> >> Erik Hensema wrote:
> >>> In all kernels I've tested writes to disk are delayed a long time even when
> >>> there's no need to do so.
> >>> 
> >> Short answer - it is supposed to do that!
> >> 
> >>> A very simple test shows this: on an otherwise idle system, create a tar of
> >>> a NFS-mounted filesystem to a local disk. The kernel starts writing out the
> >>> data after 30 seconds, while a slow and steady stream would be much nicer
> >>> to the system, I think.
> 
> Agreed. We have a cluster which is writing on average something like
> 20 Megs/sec/node. We had to lower the write threshold from 30% to 0%,
> because with the constant writing, linux will buffer it for 30 secs,
> fill up RAM, try to empty the write-cache, stall, wash, rinse,
> repeat. Because it was being filled up at roughly the rate it was
> being emptied, once it got 30% behind, there was no catching up, so
> the realtime system would lose data. Ouch.
> 
Nothing can help you if you're getting input at the rate you
can write it.  You need the ability to write at least somewhat faster,
no matter how the buffering is done.

> >> You're wrong then.  There's no need for a slow steady stream, why do
> >> you want that.  Of course you can set up cron to run sync at
> >> regular (short) intervals to achieve this.
> 
> Last time I checked, cron had 1 minute resolution.

If you need to sync more often than once per minute, consider
this shellscript:
for ((;;)) ; do sleep 1 ; sync ; done

[...]
> Helge's comment about /tmp files and rewriting files multiple times:
> in real life, how often does this happen? How often do you overwrite
> one file many times in 30 seconds?

_Every_ time you write a file in chunks not perfectly aligned
on block boundaries. 

> The occasional 20 kilobyte /tmp
> file perhaps, but I doubt it matters in real life. In real life, when
> writing to disk constantly (not just scientific applications - I
> believe this happens in the real world too!), waiting for 30 seconds
> is a liability!

Only if that 30-second wait makes linux buffer more than it can handle.
That may indeed be a problem, but buffering up _some_ data
before writing is still a good idea.

Helge Hafting


* Re: Delaying writes to disk when there's no need
  2003-03-28 23:12 ` Pavel Machek
@ 2003-03-31 12:00   ` Erik Hensema
  2003-03-31 13:42     ` Helge Hafting
  0 siblings, 1 reply; 23+ messages in thread
From: Erik Hensema @ 2003-03-31 12:00 UTC (permalink / raw)
  To: linux-kernel

Pavel Machek (pavel@suse.cz) wrote:
> Hi!
> 
>> In all kernels I've tested writes to disk are delayed a long time even when
>> there's no need to do so.
>> 
>> A very simple test shows this: on an otherwise idle system, create a tar of
>> a NFS-mounted filesystem to a local disk. The kernel starts writing out the
>> data after 30 seconds, while a slow and steady stream would be much nicer
>> to the system, I think.
>> 
> 
> Well, doing writeback sooner when disks
> are idle might be a good idea; detecting
> whether a disk is idle might not be too easy, though.

Helge Hafting already pointed out that writing out the data earlier isn't
desirable. The problem isn't in the waiting: the problem is in the writing.
I think the current kernel tries to write too much data too fast when
there's absolutely no reason to do so. It should probably gently write out
small amounts of data until there is a more pressing need for memory.

-- 
Erik Hensema <erik@hensema.net>


* Re: Delaying writes to disk when there's no need
  2003-03-31 12:00   ` Erik Hensema
@ 2003-03-31 13:42     ` Helge Hafting
  2003-03-31 14:45       ` Oliver Neukum
  2003-03-31 22:02       ` Nick Piggin
  0 siblings, 2 replies; 23+ messages in thread
From: Helge Hafting @ 2003-03-31 13:42 UTC (permalink / raw)
  To: erik; +Cc: linux-kernel

Erik Hensema wrote:
[...]
> Helge Hafting already pointed out that writing out the data earlier isn't
> desirable. The problem isn't in the waiting: the problem is in the writing.
> I think the current kernel tries to write too much data too fast when
> there's absolutely no reason to do so. It should probably gently write out
> small amounts of data until there is a more pressing need for memory.
> 
I don't think the problem is "writing a large chunk", rather that this
chunk is scheduled for writing a bit too late.  Memory is filling up
and the process producing data is throttled while waiting for
the write to free up pages.  Then the "huge chunk" of pages is released,
and memory is allowed to fill up for too long again.

Seems to me the correct solution is to start writing out
things long before memory gets so full that we need to
throttle the producer.

This will result in somewhat smaller chunks and a somewhat
steadier stream of data.  It will work better, but not because
the chunks are smaller. (Block devices are supposed to handle
enormous chunks with no problems, and the bandwidth utilization
is generally better the bigger the chunks you can get.  50M to
disk in one go isn't "pushing" anything - it is "nice".)
The reason an earlier start works better is that memory never fills up
to the point where the producer is throttled, assuming
the io system can keep up with the producer forever.
Throttling will _always_ happen when that isn't the case.

The tricky part here is knowing the bandwidth of the output
device, and starting to write at such a time that memory
won't have time to fill up in the case where a
producer is almost as fast as the output device.

The problem is that this depends on several things:
1. How much more memory is there?  (Varies a lot, but
    the kernel knows this one.)
2. How fast is the output device?  (Varies a lot; different
    areas on a disk have different speeds.  Different
    disks have different speeds.  The speed of nfs depends
    on network speed, network congestion,
    roundtrip time, server load, and server disk speed.
    You probably cannot get good estimates for all cases,
    particularly not nfs in a shared net.)
    To get this right we need both bandwidth and latency.
3. How fast is data produced?  A global estimate may
    be possible, looking at how fast memory is dirtied.
    I have no idea if such an estimate is possible per
    block device.
4. The big problem is that there may be several unrelated
    processes dirtying memory to be written to several
    very different block devices.
    For this to work automatically we need a low estimate
    for the bandwidth for each block device/filesystem,
    and the memory dirtying rate for each.


This seems hard to solve automatically.  A specific
case of a realtime program writing near disk speed is solvable
by having an extra thread that issues an fsync whenever the
amount of written but unsynced data gets near the point
where the time necessary to write it is long enough
to fill memory at the same rate of producing data.
Of course one wants a substantial safety margin here,
perhaps an assumption that only one third or so of memory
actually will be available for caching the important stuff.
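
A minimal sketch of that idea, with one simplification named plainly: instead
of an extra thread, the writer itself calls fsync() once the count of written
but unsynced bytes reaches a limit. The struct and function names are
hypothetical, and choosing the limit (for instance from the
one-third-of-memory margin mentioned above) is left to the caller. Only
open(), write() and fsync() from POSIX are used.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: tracks how much has been written since the last sync. */
struct paced_writer {
    int fd;
    size_t unsynced;      /* bytes written since the last fsync() */
    size_t limit;         /* fsync() once unsynced reaches this many bytes */
    unsigned int syncs;   /* number of fsync() calls issued so far */
};

/* Open (create/truncate) the output file. Returns 0 on success, -1 on error. */
int paced_open(struct paced_writer *w, const char *path, size_t limit)
{
    assert(w != NULL && path != NULL && limit > 0);
    w->fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    w->unsynced = 0;
    w->limit = limit;
    w->syncs = 0;
    return w->fd < 0 ? -1 : 0;
}

/* Write the whole buffer, then force it to disk if too much is unsynced.
 * Returns 0 on success, -1 on error with errno set by write() or fsync(). */
int paced_write(struct paced_writer *w, const char *buf, size_t len)
{
    assert(w != NULL && buf != NULL);
    while (len > 0) {
        ssize_t n = write(w->fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
        w->unsynced += (size_t)n;
    }
    if (w->unsynced >= w->limit) {
        if (fsync(w->fd) < 0)
            return -1;
        w->unsynced = 0;
        w->syncs++;
    }
    return 0;
}
```

With a 4096-byte limit and 1024-byte writes, the first three calls only
accumulate and the fourth triggers the fsync(). Doing the sync inline blocks
the producer while the data goes out, which is exactly what a separate sync
thread would avoid; the sketch only shows the bookkeeping.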

A manual solution is possible if we can have two "knobs"
for this:
1. Threshold for when to start writing out stuff
2. Threshold for when to throttle processes.

The latter may or may not be necessary, the point is that the former
should kick in long before throttling is necessary.

This is usually expressed as the percentage of memory that is dirty, but
I'm not sure that is the right thing.  It assumes that 100% will be
available after cleaning, which may be way off.

A better measure might be something like the % of memory that is still
available (free, or instantly freeable by reclaiming clean unpinned cache).
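
The two knobs could be sketched like this, assuming both are expressed as a
percentage of memory still available rather than a percentage dirty. The enum,
the function name and its parameters are hypothetical; this is not how any
kernel implements it.

```c
#include <assert.h>

enum wb_action {
    WB_NOTHING,          /* plenty of memory available: keep dirty data cached */
    WB_START_WRITEOUT,   /* knob 1 crossed: begin background writeout */
    WB_THROTTLE          /* knob 2 crossed: also slow down the producer */
};

/* avail_pages counts pages that are free or instantly freeable.
 * start_pct and throttle_pct are the two knobs, as percentages of total
 * memory; throttling must kick in later (at less available memory) than
 * writeout, so throttle_pct has to be below start_pct. */
enum wb_action writeback_action(unsigned long avail_pages,
                                unsigned long total_pages,
                                unsigned int start_pct,
                                unsigned int throttle_pct)
{
    assert(total_pages > 0 && avail_pages <= total_pages);
    assert(throttle_pct < start_pct && start_pct <= 100);

    /* 64-bit intermediate so the multiplication cannot overflow. */
    unsigned long long avail_pct = 100ULL * avail_pages / total_pages;

    if (avail_pct < throttle_pct)
        return WB_THROTTLE;
    if (avail_pct < start_pct)
        return WB_START_WRITEOUT;
    return WB_NOTHING;
}
```

With knobs of 50 and 10, a machine with 40% of memory available starts
writeout without throttling anybody, and the producer is only slowed once
less than 10% remains.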

Helge Hafting






* Re: Delaying writes to disk when there's no need
  2003-03-31 13:42     ` Helge Hafting
@ 2003-03-31 14:45       ` Oliver Neukum
  2003-03-31 22:02       ` Nick Piggin
  1 sibling, 0 replies; 23+ messages in thread
From: Oliver Neukum @ 2003-03-31 14:45 UTC (permalink / raw)
  To: Helge Hafting, erik; +Cc: linux-kernel


> A manual solution is possible if we can have two "knobs"
> for this:
> 1. Threshold for when to start writing out stuff
> 2. Threshold for when to throttle processes.
>
> The latter may or may not be necessary, the point is that the former
> should kick in long before throttling is necessary.
>
> This is usually expressed as how many % of memory that is dirty, but
> I'm not sure that is the right thing.  It assumes that 100% will be
> available after cleaning, which may be way off.
>
> Something like % of memory that is still available (free,
> or instantly freeable by reclaiming clean unpinned cache)

Is there any sense in allowing a task to keep dirty a certain percentage
of free memory? If you have a task that has to be throttled anyway,
is any memory that this task keeps dirty wasted anyway, if it's more
than needed to send efficient io requests to the device? Somebody
else might have better uses for that memory.

	Regards
		Oliver



* Re: Delaying writes to disk when there's no need
  2003-03-31 13:42     ` Helge Hafting
  2003-03-31 14:45       ` Oliver Neukum
@ 2003-03-31 22:02       ` Nick Piggin
  2003-03-31 22:22         ` Chris Friesen
  2003-03-31 22:45         ` Andrew Morton
  1 sibling, 2 replies; 23+ messages in thread
From: Nick Piggin @ 2003-03-31 22:02 UTC (permalink / raw)
  To: Helge Hafting; +Cc: erik, linux-kernel

Helge Hafting wrote:

> Erik Hensema wrote:
> [...]
>
>> Helge Hafting already pointed out that writing out the data earlier isn't
>> desirable. The problem isn't in the waiting: the problem is in the writing.
>> I think the current kernel tries to write too much data too fast when
>> there's absolutely no reason to do so. It should probably gently write out
>> small amounts of data until there is a more pressing need for memory.
>>
> I don't think the problem is "writing a large chunk", rather that this
> chunk is scheduled for writing a bit too late.  Memory is filling up
> and the process producing data is throttled while waiting for
> the write to free up pages.  Then the "huge chunk" of pages is released,
> and memory is allowed to fill up for too long again.
>
> Seems to me the correct solution is to start writing out
> things long before memory gets so full that we need to
> throttle the producer. 

I haven't thought about this much, but it seems to me that
doing writeout whenever the disk would otherwise be idle
(and we have dirty memory to write out) would be a good
solution.



* Re: Delaying writes to disk when there's no need
  2003-03-31 22:02       ` Nick Piggin
@ 2003-03-31 22:22         ` Chris Friesen
  2003-03-31 22:35           ` Nick Piggin
  2003-03-31 22:45         ` Andrew Morton
  1 sibling, 1 reply; 23+ messages in thread
From: Chris Friesen @ 2003-03-31 22:22 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Helge Hafting, erik, linux-kernel

Nick Piggin wrote:

> I haven't thought about this much, but it seems to me that
> doing writeout whenever the disk would otherwise be idle
> (and we have dirty memory to write out) would be a good
> solution.

The whole argument about waiting though is that there may be another write 
coming to the same place, in which case you could save the cost of the first 
write because it didn't have to be written.

Writing to disk isn't free, even if the disk would otherwise be idle.  You have 
the cost of the setup as well as the memory and pci bus traffic.  You may have 
disk bandwidth available but be already maxing out the PCI bus, in which case 
your "free" disk write takes I/O away from other things.

Ultimately it's all a tradeoff.  Do you write now, or do you hold off and hope
that you can throw away some of the writes because new stuff will home in to 
overwrite them?

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com



* Re: Delaying writes to disk when there's no need
  2003-03-31 22:22         ` Chris Friesen
@ 2003-03-31 22:35           ` Nick Piggin
  2003-03-31 22:51             ` John Bradford
  0 siblings, 1 reply; 23+ messages in thread
From: Nick Piggin @ 2003-03-31 22:35 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Helge Hafting, erik, linux-kernel



Chris Friesen wrote:

> Nick Piggin wrote:
>
>> I haven't thought about this much, but it seems to me that
>> doing writeout whenever the disk would otherwise be idle
>> (and we have dirty memory to write out) would be a good
>> solution.
>
>
> The whole argument about waiting though is that there may be another 
> write coming to the same place, in which case you could save the cost 
> of the first write because it didn't have to be written.
>
> Writing to disk isn't free, even if the disk would otherwise be idle.  
> You have the cost of the setup as well as the memory and pci bus 
> traffic.  You may have disk bandwidth available but be already maxing 
> out the PCI bus, in which case your "free" disk write takes I/O away 
> from other things. 

Only if the memory gets dirtied again; otherwise the earlier the better.
If the memory does get written to again before the writeout timeout, then
yeah, it's used some cpu, memory, pci, etc. that it didn't have to.

>
>
> Ultimately it's all a tradeoff.  Do you write now, or do you hold off
> and hope that you can throw away some of the writes because new stuff 
> will home in to overwrite them?

Yes it is a tradeoff. Having an idle disk gives more weight to "write now".



* Re: Delaying writes to disk when there's no need
  2003-03-31 22:02       ` Nick Piggin
  2003-03-31 22:22         ` Chris Friesen
@ 2003-03-31 22:45         ` Andrew Morton
  2003-03-31 23:03           ` Nick Piggin
                             ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Andrew Morton @ 2003-03-31 22:45 UTC (permalink / raw)
  To: Nick Piggin; +Cc: helgehaf, erik, linux-kernel

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> it seems to me that
> doing writeout whenever the disk would otherwise be idle
> (and we have dirty memory to write out) would be a good
> solution.

This is what the recently-removed BDI_read_active flag in backing_dev_info
was supposed to be for.  I let it go because I don't think it's terribly
important and it's time to stop fiddling with the vfs writeout code and it
wasn't right anyway.

Note that 2.5 starts pdflush writeout at 10% of memory dirty.  Or even lower
if there is a lot of mapped memory around.  Whereas 2.4 will start background
writeout at 30% or 40% dirty.  That's a fairly significant tuning change.

The algorithm for utilisation of an idle disk should be, in
balance_dirty_pages():

	if (ps.nr_dirty + ps.nr_writeback < background_thresh) {
		if (time_after(jiffies, bdi->last_read + HZ/100)) {
			if (bdi->write_requests_in_flight < 2) {
				struct writeback_control wbc = {
					.bdi		= bdi,
					.sync_mode	= WB_SYNC_NONE,
					.nr_to_write	= write_chunk,
				};

				writeback_inodes(&wbc);
			}
		}
		return;
	}


Or something like that.  It's pretty close.

It could have pretty bad failure modes.  Short-lived files in /tmp now
perform writeout, which needs to be waited on when those files are removed.



* Re: Delaying writes to disk when there's no need
  2003-03-31 22:35           ` Nick Piggin
@ 2003-03-31 22:51             ` John Bradford
  2003-03-31 22:58               ` Nick Piggin
  0 siblings, 1 reply; 23+ messages in thread
From: John Bradford @ 2003-03-31 22:51 UTC (permalink / raw)
  To: Nick Piggin; +Cc: cfriesen, helgehaf, erik, linux-kernel

> If the memory does get written to again before the writeout timeout
> then yeah it's used some cpu, memory, pci, etc that it didn't have
> to.

It will presumably also have filled the cache with the writeout data.

> > Ultimately it's all a tradeoff.  Do you write now, or do you hold off
> > and hope that you can throw away some of the writes because new stuff 
> > will home in to overwrite them?
> 
> Yes it is a tradeoff. Having an idle disk gives more weight to "write now".

Not necessarily.  What if you are using a solid state disk which only
allows a relatively low number of re-write cycles?  What if the disk
is spun down, and spinning it up uses a lot of power?  On a laptop,
you don't necessarily want the disk spinning up just to write one
sector.

John.


* Re: Delaying writes to disk when there's no need
  2003-03-31 22:51             ` John Bradford
@ 2003-03-31 22:58               ` Nick Piggin
  0 siblings, 0 replies; 23+ messages in thread
From: Nick Piggin @ 2003-03-31 22:58 UTC (permalink / raw)
  To: John Bradford; +Cc: cfriesen, helgehaf, erik, linux-kernel

John Bradford wrote:

>>If the memory does get written to again before the writeout timeout
>>then yeah it's used some cpu, memory, pci, etc that it didn't have
>>to.
>>
>
>It will presumably also have filled the cache with the writeout data.
>
What cache?

>
>
>>>Ultimately it's all a tradeoff.  Do you write now, or do you hold off
>>>and hope that you can throw away some of the writes because new stuff 
>>>will home in to overwrite them?
>>>
>>Yes it is a tradeoff. Having an idle disk gives more weight to "write now".
>>
>
>Not necessarily.  What if you are using a solid state disk which only
>allows a relatively low number of re-write cycles?  What if the disk
>is spun down, and spinning it up uses a lot of power?  On a laptop,
>you don't necessarily want the disk spinning up just to write one
>sector.
>
Yes it does. The factors you mention just add (a lot) more
weight to "hold off".



* Re: Delaying writes to disk when there's no need
  2003-03-31 22:45         ` Andrew Morton
@ 2003-03-31 23:03           ` Nick Piggin
  2003-03-31 23:32           ` Ingo Oeser
  2003-04-01  0:43           ` Daniel Pittman
  2 siblings, 0 replies; 23+ messages in thread
From: Nick Piggin @ 2003-03-31 23:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: helgehaf, erik, linux-kernel



Andrew Morton wrote:

>Nick Piggin <piggin@cyberone.com.au> wrote:
>  
>
>>it seems to me that
>>doing writeout whenever the disk would otherwise be idle
>>(and we have dirty memory to write out) would be a good
>>solution.
>>    
>>
>
>This is what the recently-removed BDI_read_active flag in backing_dev_info
>was supposed to be for.  I let it go because I don't think it's terribly
>important and it's time to stop fiddling with the vfs writeout code and it
>wasn't right anyway.
>
>Note that 2.5 starts pdflush writeout at 10% of memory dirty.  Or even lower
>if there is a lot of mapped memory around.  Whereas 2.4 will start background
>writeout at 30% or 40% dirty.  That's a fairly significant tuning change.
>
>The algorithm for utilisation of an idle disk should be, in
>balance_dirty_pages():
>
>	if (ps.nr_dirty + ps.nr_writeback < background_thresh) {
>		if (time_after(jiffies, bdi->last_read + HZ/100)) {
>			if (bdi->write_requests_in_flight < 2) {
>				struct writeback_control wbc = {
>					.bdi		= bdi,
>					.sync_mode	= WB_SYNC_NONE,
>					.nr_to_write	= write_chunk,
>				};
>
>				writeback_inodes(&wbc);
>			}
>		}
>		return;
>	}
>
>
>Or something like that.  It's pretty close.
>
Yeah something like that looks alright.
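
As a rough userspace illustration of the idle-disk test quoted above (not
kernel code): the struct and field names are hypothetical stand-ins for ps,
bdi->last_read and bdi->write_requests_in_flight, time is counted in
milliseconds instead of jiffies, and 10 ms stands for HZ/100. The function
name is an assumption as well.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-device state named in the sketch. */
struct disk_state {
	unsigned long last_read_ms;		/* when the last read was issued */
	unsigned int write_requests_in_flight;	/* writes queued on the device */
};

/* Hypothetical stand-in for the global page-state counters. */
struct dirty_state {
	unsigned long nr_dirty;
	unsigned long nr_writeback;
	unsigned long background_thresh;
};

#define IDLE_READ_GAP_MS 10UL	/* HZ/100 jiffies is 10 ms */

/*
 * Return true when opportunistic writeout should be started: dirty memory is
 * still below the background threshold, no read was issued during the last
 * 10 ms, and fewer than two writes are already in flight.
 */
bool should_start_idle_writeout(const struct dirty_state *ds,
				const struct disk_state *disk,
				unsigned long now_ms)
{
	unsigned long deadline = disk->last_read_ms + IDLE_READ_GAP_MS;

	if (ds->nr_dirty + ds->nr_writeback >= ds->background_thresh)
		return false;	/* the regular background writeout path applies */
	/* Signed difference, like time_after(), so counter wraparound is tolerated. */
	if ((long)(deadline - now_ms) >= 0)
		return false;	/* a read was seen too recently: disk is not idle */
	return disk->write_requests_in_flight < 2;
}
```

The three conditions are tested in the same order as in the quoted fragment,
so a caller would start a writeback chunk only when this returns true.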

>
>It could have pretty bad failure modes.  Short-lived files in /tmp now
>perform writeout, which needs to be waited on when those files are removed.
>  
>
I didn't think of that.


* Re: Delaying writes to disk when there's no need
  2003-03-31 22:45         ` Andrew Morton
  2003-03-31 23:03           ` Nick Piggin
@ 2003-03-31 23:32           ` Ingo Oeser
  2003-04-01  0:02             ` Andrew Morton
  2003-04-01  0:43           ` Daniel Pittman
  2 siblings, 1 reply; 23+ messages in thread
From: Ingo Oeser @ 2003-03-31 23:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Nick Piggin, helgehaf, erik, linux-kernel

On Mon, Mar 31, 2003 at 02:45:00PM -0800, Andrew Morton wrote:
> It could have pretty bad failure modes.  Short-lived files in /tmp now
> perform writeout, which needs to be waited on when those files are removed.

/tmp is not a problem, because this can be fixed by using tmpfs
(I use 2GB of it with 1GB of RAM).

The real problem is the small writes generated by the proposed behavior.

The disk is idle, so this is not about performance but about power
consumption. Spinning up a disk costs around 1-2 seconds, so for a
spun-down disk you should arrive with at least as much data as you can
write in 1-2 seconds.

Regards

Ingo Oeser
-- 
Marketing ist die Kunst, Leuten Sachen zu verkaufen, die sie
nicht brauchen, mit Geld, was sie nicht haben, um Leute zu
beeindrucken, die sie nicht moegen.

* Re: Delaying writes to disk when there's no need
  2003-03-31 23:32           ` Ingo Oeser
@ 2003-04-01  0:02             ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2003-04-01  0:02 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: piggin, helgehaf, erik, linux-kernel

Ingo Oeser <ingo.oeser@informatik.tu-chemnitz.de> wrote:
>
> On Mon, Mar 31, 2003 at 02:45:00PM -0800, Andrew Morton wrote:
> > It could have pretty bad failure modes.  Short-lived files in /tmp now
> > perform writeout, which needs to be waited on when those files are removed.
> 
> /tmp is not a problem, because this can be fixed by using tmpfs
> (I use 2GB of it with 1GB of RAM).

I don't.   These files get unlinked before they hit disk.

> The disk is idle, so this is not about performance, but power
> consumption. Spinning up a disk costs around 1-2 seconds, so you
> should come in with at least the amount of data you write in 1-2
> seconds for a spun down disk.

The requirements for portable computers are totally different.  You'd turn
the whole thing off for them.


* Re: Delaying writes to disk when there's no need
  2003-03-31 22:45         ` Andrew Morton
  2003-03-31 23:03           ` Nick Piggin
  2003-03-31 23:32           ` Ingo Oeser
@ 2003-04-01  0:43           ` Daniel Pittman
  2003-04-01  1:09             ` Andrew Morton
  2 siblings, 1 reply; 23+ messages in thread
From: Daniel Pittman @ 2003-04-01  0:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Mon, 31 Mar 2003, Andrew Morton wrote:
> Nick Piggin <piggin@cyberone.com.au> wrote:
>>
>> it seems to me that
>> doing writeout whenever the disk would otherwise be idle
>> (and we have dirty memory to write out) would be a good
>> solution.
> 
> This is what the recently-removed BDI_read_active flag in
> backing_dev_info was supposed to be for. I let it go because I don't
> think it's terribly important and it's time to stop fiddling with the
> vfs writeout code and it wasn't right anyway.
> 
> Note that 2.5 starts pdflush writeout at 10% of memory dirty. Or even
> lower if there is a lot of mapped memory around. Whereas 2.4 will
> start background writeout at 30% or 40% dirty. That's a fairly
> significant tuning change.

I don't figure it's a very important thing, but even this change doesn't
resolve one of the issues I have with the default writeout scheduler.

Capturing a real-time video stream from an IEEE1394 DV stream means
writing a steady 3.5MB per second for two to two and a half hours.

Linux isn't great at this, using the default writeout policy, even as
recent as 2.5.64. The writer goes OK for a while but, eventually, blocks
on writeout for long enough to drop a frame -- more than 8/25ths of a
second.


This can be resolved by tuning the default delay before write-out starts
to 5 seconds, down from 30, or by running sync every second, or by doing
fsync tricks.
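
To put numbers on the figures above, here is a small arithmetic sketch. It
assumes decimal megabytes (3.5MB = 3,500,000 bytes), that the capture process
is the only source of dirty data, and that only the age-based delay triggers
writeout, so the backlog figure is an upper bound. The function names are
made up for illustration.

```c
#include <stdint.h>

/* Longest stall, in milliseconds, before `frames` frames at `fps` are lost. */
uint64_t stall_budget_ms(uint64_t frames, uint64_t fps)
{
	return frames * 1000 / fps;
}

/* Dirty data built up at `bytes_per_sec` over a delay of `centisecs`. */
uint64_t dirty_backlog_bytes(uint64_t bytes_per_sec, uint64_t centisecs)
{
	return bytes_per_sec * centisecs / 100;
}
```

With these, stall_budget_ms(8, 25) is 320 ms, dirty_backlog_bytes(3500000,
3000) is 105,000,000 bytes for the 30 second delay, and
dirty_backlog_bytes(3500000, 500) is 17,500,000 bytes for the 5 second one.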


I think it's a good thing that you can delay writes for a long time, in
general, but there are cases where blocking *really* sucks and on a
system that does nothing else but produce 3.5MB per second of dirty
memory and write that to disk...

Well, something that allowed only that data stream to be preemptively
written out would be good without the need for the thread-and-fsync
trick.

        Daniel

-- 
Anyone who stops learning is old, whether at twenty or eighty. Anyone who keeps
learning stays young. The greatest thing in life is to keep your mind young.
        -- Henry Ford

* Re: Delaying writes to disk when there's no need
  2003-04-01  0:43           ` Daniel Pittman
@ 2003-04-01  1:09             ` Andrew Morton
  2003-04-01  1:34               ` Daniel Pittman
       [not found]               ` <3E88EB3D.6020409@cyberone.com.au>
  0 siblings, 2 replies; 23+ messages in thread
From: Andrew Morton @ 2003-04-01  1:09 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: linux-kernel

Daniel Pittman <daniel@rimspace.net> wrote:
>
> Capturing a real-time video stream from an IEEE1394 DV stream means
> writing a steady 3.5MB per second for two to two and a half hours.
> 
> Linux isn't great at this, using the default writeout policy, even as
> recent as 2.5.64. The writer goes OK for a while but, eventually, blocks
> on writeout for long enough to drop a frame -- more than 8/25ths of a
> second.
> 
> 
> This can be resolved by tuning the default delay before write-out starts
> to 5 seconds, down from 30, or by running sync every second, or by doing
> fsync tricks.

Interesting.

Yes, I expect that you could fix that up by altering dirty_background_ratio
and dirty_expire_centisecs.
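
A minimal sketch of setting those two knobs from a program, assuming they are
exposed as /proc/sys/vm/dirty_background_ratio and
/proc/sys/vm/dirty_expire_centisecs and that the caller is root. The helper
name write_vm_knob is hypothetical.

```c
#include <stdio.h>

/*
 * Write a decimal value to a sysctl file such as
 * /proc/sys/vm/dirty_expire_centisecs.  Returns 0 on success, -1 on failure.
 */
int write_vm_knob(const char *path, long value)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fprintf(f, "%ld\n", value) < 0)
		ret = -1;
	/* The buffered write reaches the kernel here, so check fclose() too. */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}
```

write_vm_knob("/proc/sys/vm/dirty_expire_centisecs", 500) would correspond to
the 5 second delay mentioned above. No value for dirty_background_ratio is
given in this thread, so that one is left to the reader.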

The problem with fsync() is that it waits on the writeout.  You don't want
that to happen - you just want to tell the kernel "I won't be overwriting or
deleting this data".  Make the kernel queue up and start the IO but not wait
on its completion.

It is quite appropriate to do this in fadvise(FADV_DONTNEED) - as a
lower-latency fsync().  The app would need to call it once per second or so. 

It would also throw away any written-back pagecache inside your (start, len)
which is exactly what your application wants to happen, so the app should be
calling fadvise _anyway_.

What do you think?
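
From the application side, the usage described above might look like the
following sketch. posix_fadvise() and POSIX_FADV_DONTNEED are the standard
userspace interface; the helper name stream_chunk is hypothetical. Whether
the call also starts writeback depends on the kernel behaving as the patch in
this message proposes; without that it only drops clean pagecache.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Append one chunk at the current file position, then tell the kernel that
 * the range just written will not be touched again.  `offset` is the file
 * offset at which the chunk starts; the caller keeps track of it.
 * Returns the offset following the chunk, or -1 if write() fails.
 */
off_t stream_chunk(int fd, const void *buf, size_t len, off_t offset)
{
	const char *p = buf;
	size_t left = len;

	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted: retry the write */
			return -1;
		}
		p += n;
		left -= (size_t)n;
	}
	/* The hint is advisory, so a failure here is deliberately ignored. */
	(void)posix_fadvise(fd, offset, (off_t)len, POSIX_FADV_DONTNEED);
	return offset + (off_t)len;
}
```

To match the "once per second or so" suggestion, a capture program would pass
roughly one second of data per call, which is about 3.5MB for the DV case.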


 25-akpm/include/linux/fs.h |    1 +
 25-akpm/mm/fadvise.c       |    1 +
 25-akpm/mm/filemap.c       |   18 ++++++++++++++++--
 3 files changed, 18 insertions(+), 2 deletions(-)

diff -puN include/linux/fs.h~fadvise-flush-data include/linux/fs.h
--- 25/include/linux/fs.h~fadvise-flush-data	Mon Mar 31 17:03:39 2003
+++ 25-akpm/include/linux/fs.h	Mon Mar 31 17:03:39 2003
@@ -1112,6 +1112,7 @@ unsigned long invalidate_inode_pages(str
 extern void invalidate_inode_pages2(struct address_space *mapping);
 extern void write_inode_now(struct inode *, int);
 extern int filemap_fdatawrite(struct address_space *);
+extern int filemap_flush(struct address_space *);
 extern int filemap_fdatawait(struct address_space *);
 extern void sync_supers(void);
 extern void sync_filesystems(int wait);
diff -puN mm/fadvise.c~fadvise-flush-data mm/fadvise.c
--- 25/mm/fadvise.c~fadvise-flush-data	Mon Mar 31 17:03:39 2003
+++ 25-akpm/mm/fadvise.c	Mon Mar 31 17:03:39 2003
@@ -61,6 +61,7 @@ long sys_fadvise64(int fd, loff_t offset
 			ret = 0;
 		break;
 	case POSIX_FADV_DONTNEED:
+		filemap_flush(mapping);
 		invalidate_mapping_pages(mapping, offset >> PAGE_CACHE_SHIFT,
 				(len >> PAGE_CACHE_SHIFT) + 1);
 		break;
diff -puN mm/filemap.c~fadvise-flush-data mm/filemap.c
--- 25/mm/filemap.c~fadvise-flush-data	Mon Mar 31 17:03:39 2003
+++ 25-akpm/mm/filemap.c	Mon Mar 31 17:03:39 2003
@@ -122,11 +122,11 @@ static inline int sync_page(struct page 
  * if a dirty page/buffer is encountered, it must be waited upon, and not just
  * skipped over.
  */
-int filemap_fdatawrite(struct address_space *mapping)
+static int __filemap_fdatawrite(struct address_space *mapping, int sync_mode)
 {
 	int ret;
 	struct writeback_control wbc = {
-		.sync_mode = WB_SYNC_ALL,
+		.sync_mode = sync_mode,
 		.nr_to_write = mapping->nrpages * 2,
 	};
 
@@ -140,6 +140,20 @@ int filemap_fdatawrite(struct address_sp
 	return ret;
 }
 
+int filemap_fdatawrite(struct address_space *mapping)
+{
+	return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
+}
+
+/*
+ * This is a mostly non-blocking flush.  Not suitable for data-integrity
+ * purposes.
+ */
+int filemap_flush(struct address_space *mapping)
+{
+	return __filemap_fdatawrite(mapping, WB_SYNC_NONE);
+}
+
 /**
  * filemap_fdatawait - walk the list of locked pages of the given address
  *                     space and wait for all of them.

_


* Re: Delaying writes to disk when there's no need
  2003-04-01  1:09             ` Andrew Morton
@ 2003-04-01  1:34               ` Daniel Pittman
  2003-04-01  1:45                 ` Andrew Morton
       [not found]               ` <3E88EB3D.6020409@cyberone.com.au>
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Pittman @ 2003-04-01  1:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Mon, 31 Mar 2003, Andrew Morton wrote:
> Daniel Pittman <daniel@rimspace.net> wrote:
>>
>> Capturing a real-time video stream from an IEEE1394 DV stream means
>> writing a steady 3.5MB per second for two to two and a half hours.
>> 
>> Linux isn't great at this, using the default writeout policy, even as
>> recent as 2.5.64. The writer goes OK for a while but, eventually,
>> blocks on writeout for long enough to drop a frame -- more than
>> 8/25ths of a second.
>> 
>> 
>> This can be resolved by tuning the default delay before write-out
>> starts to 5 seconds, down from 30, or by running sync every second, or
>> by doing fsync tricks.
> 
> Interesting.
> 
> Yes, I expect that you could fix that up by altering
> dirty_background_ratio and dirty_expire_centisecs.

Those are, in fact, the precise knobs I turned. Well, those and the XFS
pagebuf layer equivalents.

> The problem with fsync() is that it waits on the writeout. You don't
> want that to happen - you just want to tell the kernel "I won't be
> overwriting or deleting this data". Make the kernel queue up and start
> the IO but not wait on its completion.

Yes, that would be good, because then I wouldn't need to write an IPC
thing and fork or thread, so that the second thread can be busy blocking
on the writeout for me.

> It is quite appropriate to do this in fadvise(FADV_DONTNEED) - as a
> lower-latency fsync(). The app would need to call it once per second
> or so.
> 
> It would also throw away any written-back pagecache inside your
> (start, len) which is exactly what your application wants to happen,
> so the app should be calling fadvise _anyway_.
> 
> What do you think?

I will apply the patch and test later today.  This, however, looks like
a *really* good thing to me.

  Daniel

-- 
there's a party going on 
we'll all be here dancing underground 
      there's a riot going on 
we'll all be here dancing underground
        -- Covenant, _Riot_

* Re: Delaying writes to disk when there's no need
       [not found]               ` <3E88EB3D.6020409@cyberone.com.au>
@ 2003-04-01  1:39                 ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2003-04-01  1:39 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel

Nick Piggin <piggin@cyberone.com.au> wrote:
>
> Would the writeout on disk idle solve this without using the fadvise?

Yes it would schedule the I/O in the desired manner.  But it would do that
for _all_ files, not just the desired one.

And that app needs to be changed to use fadvise anyway, to take down the
useless pagecache.

> How often is balance_dirty_pages called? Enough to keep an otherwise
> idle disk busy?

Approximately once per 1000 dirtied pages per cpu.  Say 4 megs.  A nice
chunk.
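
The "4 megs" figure can be checked with a short calculation. This is only a
sketch: it assumes 4096-byte pages (as on i386) and a single CPU, and it
borrows the 3.5MB per second rate from the DV capture case discussed earlier
in the thread. The function names are made up for illustration.

```c
#include <stdint.h>

/* Bytes dirtied between two calls, given the page count and page size. */
uint64_t bytes_per_call(uint64_t pages, uint64_t page_size)
{
	return pages * page_size;
}

/* Milliseconds between calls for a writer dirtying `bytes_per_sec`. */
uint64_t call_interval_ms(uint64_t bytes_between_calls, uint64_t bytes_per_sec)
{
	return bytes_between_calls * 1000 / bytes_per_sec;
}
```

bytes_per_call(1000, 4096) gives 4,096,000 bytes, and at 3,500,000 bytes per
second call_interval_ms() puts the calls about 1170 ms apart, so roughly one
call per second for that workload.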

> Would it be possible to fix the /tmp files case? Could you cancel IO
> to a file that gets deleted?

Well...  One could place a hint in the inode somewhere. 
balance_dirty_pages() is given the inode, so it could notice that this is an
"early sync" inode and flush it every 4 megabytes.

That would be appropriate for O_STREAMING, which is a better and more
efficient interface than fadvise.

But really, the right thing to do here is to modify the app to use fadvise.

> On a similar note, would it be useful and not difficult to do
> speculative LIFO swapin in case of lots of free memory and an idle
> disk? Probably too hard. I guess lowering swappiness would help.

Might do.  I'm not sure how though.

Another possibility would be to perform speculative writes.  Write some
random data to a disk block and later see if that was the data which the
application actually wanted to write.  If so, we can optimise away the later
I/O.


* Re: Delaying writes to disk when there's no need
  2003-04-01  1:34               ` Daniel Pittman
@ 2003-04-01  1:45                 ` Andrew Morton
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Morton @ 2003-04-01  1:45 UTC (permalink / raw)
  To: Daniel Pittman; +Cc: linux-kernel

Daniel Pittman <daniel@rimspace.net> wrote:
>
> > What do you think?
> 
> I will apply the patch and test later today.  This, however, looks like
> a *really* good thing to me.

I think so too.

A possible enhancement is to only do the flush if the backing queue is not
write-congested.  So the syscall will very probably be async.  If the queue
is write congested then it's a sure bet that someone else is saturating the
disk anyway.  We don't want to block in that case.


 25-akpm/include/linux/fs.h |    1 +
 25-akpm/mm/fadvise.c       |    2 ++
 25-akpm/mm/filemap.c       |   18 ++++++++++++++++--
 3 files changed, 19 insertions(+), 2 deletions(-)

diff -puN include/linux/fs.h~fadvise-flush-data include/linux/fs.h
--- 25/include/linux/fs.h~fadvise-flush-data	Mon Mar 31 17:03:39 2003
+++ 25-akpm/include/linux/fs.h	Mon Mar 31 17:43:45 2003
@@ -1112,6 +1112,7 @@ unsigned long invalidate_inode_pages(str
 extern void invalidate_inode_pages2(struct address_space *mapping);
 extern void write_inode_now(struct inode *, int);
 extern int filemap_fdatawrite(struct address_space *);
+extern int filemap_flush(struct address_space *);
 extern int filemap_fdatawait(struct address_space *);
 extern void sync_supers(void);
 extern void sync_filesystems(int wait);
diff -puN mm/fadvise.c~fadvise-flush-data mm/fadvise.c
--- 25/mm/fadvise.c~fadvise-flush-data	Mon Mar 31 17:03:39 2003
+++ 25-akpm/mm/fadvise.c	Mon Mar 31 17:44:49 2003
@@ -61,6 +61,8 @@ long sys_fadvise64(int fd, loff_t offset
 			ret = 0;
 		break;
 	case POSIX_FADV_DONTNEED:
+		if (!bdi_write_congested(mapping->backing_dev_info))
+			filemap_flush(mapping);
 		invalidate_mapping_pages(mapping, offset >> PAGE_CACHE_SHIFT,
 				(len >> PAGE_CACHE_SHIFT) + 1);
 		break;
diff -puN mm/filemap.c~fadvise-flush-data mm/filemap.c
--- 25/mm/filemap.c~fadvise-flush-data	Mon Mar 31 17:03:39 2003
+++ 25-akpm/mm/filemap.c	Mon Mar 31 17:03:39 2003
@@ -122,11 +122,11 @@ static inline int sync_page(struct page 
  * if a dirty page/buffer is encountered, it must be waited upon, and not just
  * skipped over.
  */
-int filemap_fdatawrite(struct address_space *mapping)
+static int __filemap_fdatawrite(struct address_space *mapping, int sync_mode)
 {
 	int ret;
 	struct writeback_control wbc = {
-		.sync_mode = WB_SYNC_ALL,
+		.sync_mode = sync_mode,
 		.nr_to_write = mapping->nrpages * 2,
 	};
 
@@ -140,6 +140,20 @@ int filemap_fdatawrite(struct address_sp
 	return ret;
 }
 
+int filemap_fdatawrite(struct address_space *mapping)
+{
+	return __filemap_fdatawrite(mapping, WB_SYNC_ALL);
+}
+
+/*
+ * This is a mostly non-blocking flush.  Not suitable for data-integrity
+ * purposes.
+ */
+int filemap_flush(struct address_space *mapping)
+{
+	return __filemap_fdatawrite(mapping, WB_SYNC_NONE);
+}
+
 /**
  * filemap_fdatawait - walk the list of locked pages of the given address
  *                     space and wait for all of them.

_


Thread overview: 23+ messages
2003-03-26 20:31 Delaying writes to disk when there's no need Erik Hensema
2003-03-27  9:06 ` Helge Hafting
2003-03-27 11:22   ` Erik Hensema
2003-03-28 23:12 ` Pavel Machek
2003-03-31 12:00   ` Erik Hensema
2003-03-31 13:42     ` Helge Hafting
2003-03-31 14:45       ` Oliver Neukum
2003-03-31 22:02       ` Nick Piggin
2003-03-31 22:22         ` Chris Friesen
2003-03-31 22:35           ` Nick Piggin
2003-03-31 22:51             ` John Bradford
2003-03-31 22:58               ` Nick Piggin
2003-03-31 22:45         ` Andrew Morton
2003-03-31 23:03           ` Nick Piggin
2003-03-31 23:32           ` Ingo Oeser
2003-04-01  0:02             ` Andrew Morton
2003-04-01  0:43           ` Daniel Pittman
2003-04-01  1:09             ` Andrew Morton
2003-04-01  1:34               ` Daniel Pittman
2003-04-01  1:45                 ` Andrew Morton
     [not found]               ` <3E88EB3D.6020409@cyberone.com.au>
2003-04-01  1:39                 ` Andrew Morton
     [not found] <20030326204012$188c@gated-at.bofh.it>
     [not found] ` <20030327091007$22a5@gated-at.bofh.it>
     [not found]   ` <20030327113014$37b4@gated-at.bofh.it>
2003-03-28 10:18     ` Tim Connors
2003-03-30 17:38       ` Helge Hafting