linux-fsdevel.vger.kernel.org archive mirror
* Re: [Bug 421482] Firefox 3 uses fsync excessively
       [not found] ` <200805260513.m4Q5DAU8018498@mrapp54.mozilla.org>
@ 2008-05-26  7:05   ` Andrew Morton
  2008-05-26 10:07     ` Theodore Tso
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-05-26  7:05 UTC (permalink / raw)
  To: linux-ext4, linux-fsdevel


This:

On Sun, 25 May 2008 22:13:10 -0700 bugzilla-daemon@mozilla.org wrote:

> Do not reply to this email.  You can add comments to this bug at
> https://bugzilla.mozilla.org/show_bug.cgi?id=421482
> 
> --- Comment #152 from Karl Tomlinson (:karlt) <mozbugz@karlt.net>  2008-05-25 22:12:23 PDT ---
> Created an attachment (id=322475)
>  --> (https://bugzilla.mozilla.org/attachment.cgi?id=322475)
> fdatasync/sync_file_range test program
> 
> This first creates a file of length 1 then does one fsync on the new file.
> Then the file is continually modified without changing the length and synced
> after each modification using one of three methods (somewhat randomly
> selected): fsync/fdatasync/sync_file_range.
> 
> The I/O load for the test results below was produced using dd with a small
> blocksize to limit the I/O some:
> 
> dd if=/dev/zero of=large bs=64 count=$((3*1024*1024*1024/64))
> 
> I used ltrace instead of strace as my strace didn't find sync_file_range (and
> my glibc-2.5 libraries don't seem to have a sync_file_range function), so
> sync_file_range appears below as "syscall(277".
> 
> rm -f datasync-test.tmp &&
> ltrace -t -T -e trace=,fsync,fdatasync,syscall ./a.out
> 
> 16:12:59 fsync(3) = 0                   <11.864858>
> 16:13:13 fdatasync(3) = 0               <14.706356>
> 16:13:30 fsync(3) = 0                   <12.832373>
> 16:13:45 syscall(277, 3, 0, 1, 7) = 0   <0.343116>
> 16:13:49 fdatasync(3) = 0               <8.231468>
> 16:14:01 syscall(277, 3, 0, 1, 7) = 0   <2.347144>
> 16:14:06 fsync(3) = 0                   <6.938656>
> 16:14:16 fdatasync(3) = 0               <8.359644>
> 16:14:27 fsync(3) = 0                   <5.928242>
> 16:14:35 syscall(277, 3, 0, 1, 7) = 0   <0.009531>
> 16:14:39 fdatasync(3) = 0               <7.356126>
> 16:14:50 fsync(3) = 0                   <6.402128>
> 16:14:59 syscall(277, 3, 0, 1, 7) = 0   <0.802706>
> 16:15:03 syscall(277, 3, 0, 1, 7) = 0   <2.985404>
> 16:15:08 fsync(3) = 0                   <4.722020>
> 16:15:15 fdatasync(3) = 0               <6.532945>
> 16:15:24 fdatasync(3) = 0               <2.294488>
> 16:15:30 fsync(3) = 0                   <7.986250>
> 16:15:40 syscall(277, 3, 0, 1, 7) = 0   <1.409809>
> 16:15:45 fdatasync(3) = 0               <5.404190>
> 
> The results are consistent with fdatasync being implemented as fsync on ext3.
> 
> They show the potential for considerable savings from growing (and shrinking)
> files in large hunks and using sync_file_range (which also should reduce the
> impact on the rest of the filesystem).

is wrong, isn't it?

It's purportedly showing that fdatasync() on ext3 is syncing the whole
world in fsync()-fashion even with an application which does not grow
the file size.

But fdatasync() shouldn't do that.  Even if the inode is dirty from
atime or mtime updates, that shouldn't cause fdatasync() to run an
ext3 commit?
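
For readers without the attachment, the test described in the quoted
comment can be approximated as follows.  This is only a sketch, not
Karl's actual program: the function names and the timing method are
assumptions.  The one detail taken from the trace is the
sync_file_range() call: syscall 277 is sync_file_range on x86-64
(assuming that is the architecture of the traced machine), and the
flags value 7 is SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
SYNC_FILE_RANGE_WAIT_AFTER.

```c
#define _GNU_SOURCE            /* for sync_file_range() */
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* The three sync methods compared in the quoted results. */
enum sync_method { USE_FSYNC, USE_FDATASYNC, USE_SYNC_FILE_RANGE };

/* Create (or truncate) the file at `path` with length 1 and fsync it
 * once, as in the quoted description.  Returns the fd, or -1 on error. */
int create_one_byte_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    if (pwrite(fd, "a", 1, 0) != 1 || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Overwrite byte 0 in place (the length stays 1), sync with the chosen
 * method, and return the seconds spent in the sync call, or -1.0 on error. */
double timed_overwrite_and_sync(int fd, enum sync_method method, char byte)
{
    struct timespec start, end;
    int rc;

    if (pwrite(fd, &byte, 1, 0) != 1)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    switch (method) {
    case USE_FSYNC:
        rc = fsync(fd);
        break;
    case USE_FDATASYNC:
        rc = fdatasync(fd);
        break;
    default:
        /* offset 0, nbytes 1, flags 7: same arguments as in the trace */
        rc = sync_file_range(fd, 0, 1,
                             SYNC_FILE_RANGE_WAIT_BEFORE |
                             SYNC_FILE_RANGE_WRITE |
                             SYNC_FILE_RANGE_WAIT_AFTER);
        break;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (rc != 0)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling timed_overwrite_and_sync() in a loop with a randomly chosen
method, while the dd load from the quoted comment runs in the
background, should give numbers comparable in kind to the ltrace
output above.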



* Re: [Bug 421482] Firefox 3 uses fsync excessively
  2008-05-26  7:05   ` [Bug 421482] Firefox 3 uses fsync excessively Andrew Morton
@ 2008-05-26 10:07     ` Theodore Tso
  2008-05-26 11:10       ` Jörn Engel
  2008-05-26 18:49       ` [Bug 421482] Firefox 3 uses fsync excessively Andrew Morton
  0 siblings, 2 replies; 15+ messages in thread
From: Theodore Tso @ 2008-05-26 10:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-ext4, linux-fsdevel

On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> It's purportedly showing that fdatasync() on ext3 is syncing the whole
> world in fsync()-fashion even with an application which does not grow
> the file size.
> 
> But fdatasync() shouldn't do that.  Even if the inode is dirty from
> atime or mtime updates, that shouldn't cause fdatasync() to run an
> ext3 commit?

Well, ideally it shouldn't, although POSIX allows fdatasync() to be
implemented in terms of fsync().  It is at the moment.  :-/

The problem is we don't currently have a way of distinguishing between
a "smudged" inode (only the mtime/atime has changed) and a "dirty"
inode (even if the number of blocks hasn't changed, if i_size has
changed, or i_mode, or anything else, including extended attributes
inline in the inode).  We're not tracking that difference.  If we only
allow mtime/atime changes through setattr (see Christoph's patches),
and don't set the VFS dirty bit, but our own "smudged" bit, we could
do it --- but at the moment, we're not.

							- Ted


* Re: [Bug 421482] Firefox 3 uses fsync excessively
  2008-05-26 10:07     ` Theodore Tso
@ 2008-05-26 11:10       ` Jörn Engel
  2008-05-26 11:38         ` Theodore Tso
  2008-05-26 18:49       ` [Bug 421482] Firefox 3 uses fsync excessively Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Jörn Engel @ 2008-05-26 11:10 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Andrew Morton, linux-ext4, linux-fsdevel

On Mon, 26 May 2008 06:07:51 -0400, Theodore Tso wrote:
> On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> > It's purportedly showing that fdatasync() on ext3 is syncing the whole
> > world in fsync()-fashion even with an application which does not grow
> > the file size.
> > 
> > But fdatasync() shouldn't do that.  Even if the inode is dirty from
> > atime or mtime updates, that shouldn't cause fdatasync() to run an
> > ext3 commit?
> 
> Well, ideally it shouldn't, although POSIX allows fdatasync() to be
> implemented in terms of fsync().  It is at the moment.  :-/
> 
> The problem is we don't currently have a way of distinguishing between
> a "smudged" inode (only the mtime/atime has changed) and a "dirty"
> inode (even if the number of blocks hasn't changed, if i_size has
> changed, or i_mode, or anything else, including extended attributes
> inline in the inode).  We're not tracking that difference.  If we only
> allow mtime/atime changes through setattr (see Christoph's patches),
> and don't set the VFS dirty bit, but our own "smudged" bit, we could
> do it --- but at the moment, we're not.

Don't we already have this bit since Linux 2.4.0-test12?  I_DIRTY_SYNC
is admittedly not well-named for "smudged".  But it used to mean just
that.  I_DIRTY_DATASYNC was the real dirty bit.  Which, in I_DIRTY_PAGES,
has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.

Now we just have to use sane names.

Jörn

-- 
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
   Raph Levien, 1979
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: [Bug 421482] Firefox 3 uses fsync excessively
  2008-05-26 11:10       ` Jörn Engel
@ 2008-05-26 11:38         ` Theodore Tso
  2008-05-26 12:52           ` Jörn Engel
  0 siblings, 1 reply; 15+ messages in thread
From: Theodore Tso @ 2008-05-26 11:38 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Andrew Morton, linux-ext4, linux-fsdevel

On Mon, May 26, 2008 at 01:10:16PM +0200, Jörn Engel wrote:
> Don't we already have this bit since Linux 2.4.0-test12?  I_DIRTY_SYNC
> is admittedly not well-named for "smudged".  But it used to mean just
> that.  I_DIRTY_DATASYNC was the real dirty bit.  Which, in I_DIRTY_PAGES,
> has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.
> 
> Now we just have to use sane names.

We're currently forcing a new commit if I_DIRTY_SYNC or
I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set.  If
I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
through the code and prove it to myself :-), then this might be a very
easy fix.  We'll need to make sure that at unmount time we do actually
force out all inodes even if only I_DIRTY_SYNC is set.

(And then, we should rename things to more sane names.  :-)

						- Ted

* Re: [Bug 421482] Firefox 3 uses fsync excessively
  2008-05-26 11:38         ` Theodore Tso
@ 2008-05-26 12:52           ` Jörn Engel
  2008-05-26 20:22             ` Jamie Lokier
  0 siblings, 1 reply; 15+ messages in thread
From: Jörn Engel @ 2008-05-26 12:52 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Andrew Morton, linux-ext4, linux-fsdevel

On Mon, 26 May 2008 07:38:46 -0400, Theodore Tso wrote:
> 
> On Mon, May 26, 2008 at 01:10:16PM +0200, Jörn Engel wrote:
> > Don't we already have this bit since Linux 2.4.0-test12?  I_DIRTY_SYNC
> > is admittedly not well-named for "smudged".  But it used to mean just
> > that.  I_DIRTY_DATASYNC was the real dirty bit.  Which, in I_DIRTY_PAGES,
                                                               ^^^^^^^^^^^^^
That should have been "2.4.0-prerelease".

> > has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.
> > 
> > Now we just have to use sane names.
> 
> We're currently forcing a new commit if I_DIRTY_SYNC or
> I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set.  If
> I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
> through the code and prove it to myself :-),

Proving it to yourself is good advice indeed.  I'm sure it used to mean
"smudged" in 2.4.0 time.  Whether any changes since have damaged that
property I haven't checked.

> then this might be a very
> easy fix.  We'll need to make sure that at unmount time we do actually
> force out all inodes even if only I_DIRTY_SYNC is set.
> 
> (And then, we should rename things to more sane names.  :-)

Jörn

-- 
Joern's library part 11:
http://www.unicom.com/pw/reply-to-harmful.html

* Re: [Bug 421482] Firefox 3 uses fsync excessively
  2008-05-26 10:07     ` Theodore Tso
  2008-05-26 11:10       ` Jörn Engel
@ 2008-05-26 18:49       ` Andrew Morton
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2008-05-26 18:49 UTC (permalink / raw)
  To: Theodore Tso; +Cc: linux-ext4, linux-fsdevel

On Mon, 26 May 2008 06:07:51 -0400 Theodore Tso <tytso@MIT.EDU> wrote:

> On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> > It's purportedly showing that fdatasync() on ext3 is syncing the whole
> > world in fsync()-fashion even with an application which does not grow
> > the file size.
> > 
> > But fdatasync() shouldn't do that.  Even if the inode is dirty from
> > atime or mtime updates, that shouldn't cause fdatasync() to run an
> > ext3 commit?
> 
> Well, ideally it shouldn't, although POSIX allows fdatasync() to be
> implemented in terms of fsync().  It is at the moment.  :-/

Well..

> The problem is we don't currently have a way of distinguishing between
> a "smudged" inode (only the mtime/atime has changed) and a "dirty"
> inode (even if the number of blocks hasn't changed, if i_size has
> changed, or i_mode, or anything else, including extended attributes
> inline in the inode).

Who do you mean by "we"?  ext3 or the kernel as a whole?

>  We're not tracking that difference.  If we only
> allow mtime/atime changes through setattr (see Christoph's patches),
> and don't set the VFS dirty bit, but our own "smudged" bit, we could
> do it --- but at the moment, we're not.

But the VFS _does_ track these things, via the eternally
incomprehensible I_DIRTY_SYNC and I_DIRTY_DATASYNC.

We have:

	if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
		goto out;

which _should_ cause the fs to skip the commit during fdatasync() if
only mtime and ctime have changed?
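
To make the two positions concrete, here is a small user-space model of
the decision being argued about.  It is a sketch under assumptions, not
kernel code: the MODEL_* names and their numeric values are invented
for illustration, and the meaning given to each bit is the one the
participants describe in this thread, which nobody had yet re-verified
against the current source.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the inode state bits named in the thread.
 * The values are arbitrary and not copied from the kernel headers. */
enum {
    MODEL_DIRTY_SYNC     = 1 << 0, /* "smudged": e.g. only timestamps changed */
    MODEL_DIRTY_DATASYNC = 1 << 1, /* data-related metadata changed, e.g. i_size */
    MODEL_DIRTY_PAGES    = 1 << 2  /* dirty data pages; written out either way */
};

/* Behaviour Ted describes for ext3: either inode-dirty bit forces a
 * commit, so fdatasync() ends up as expensive as fsync(). */
bool commit_needed_current(unsigned i_state, bool datasync)
{
    (void)datasync; /* the datasync hint makes no difference here */
    return (i_state & (MODEL_DIRTY_SYNC | MODEL_DIRTY_DATASYNC)) != 0;
}

/* Behaviour the quoted check is meant to give: fdatasync() skips the
 * commit when only the "smudged" bit is set. */
bool commit_needed_with_check(unsigned i_state, bool datasync)
{
    if (datasync && !(i_state & MODEL_DIRTY_DATASYNC))
        return false;
    return (i_state & (MODEL_DIRTY_SYNC | MODEL_DIRTY_DATASYNC)) != 0;
}
```

The two functions differ only for the case the Firefox test exercises:
an in-place overwrite that leaves the inode with MODEL_DIRTY_SYNC set
and MODEL_DIRTY_DATASYNC clear, followed by fdatasync().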



* Re: [Bug 421482] Firefox 3 uses fsync excessively
  2008-05-26 12:52           ` Jörn Engel
@ 2008-05-26 20:22             ` Jamie Lokier
  2008-05-29 17:08               ` fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively) Bryan Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: Jamie Lokier @ 2008-05-26 20:22 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Theodore Tso, Andrew Morton, linux-ext4, linux-fsdevel

Jörn Engel wrote:
> > We're currently forcing a new commit if I_DIRTY_SYNC or
> > I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set.  If
> > I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
> > through the code and prove it to myself :-),
> 
> Proving it to yourself is good advice indeed.  I'm sure it used to mean
> "smudged" in 2.4.0 time.  Whether any changes since have damaged that
> property I haven't checked.

I noticed fdatasync() doing a full fsync(), and had a look at those
flags a few kernels ago, to implement fdatasync().  I wasn't convinced
the flags were being used in that way, but now I don't remember why.

So, yes, do check what they mean _now_.  And then, please, make us all
happy and implement fdatasync() :-)

Here's a thought for someone implementing fdatasync().  If a database
uses O_DIRECT writes (typically with aio), then wants data which it's
written to be committed to the hard disk platter, and the filesystem
is mounted "barrier=1" - should it call fdatasync()?  Should that emit
the barrier?  If another application uses normal (not O_DIRECT)
writes, and then _is delayed_ so long that kernel writeback occurs and
all cache is clean, and then calls fdatasync(), should that call emit
a barrier in that case?  (Answers imho: yes and yes).
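
The first case, seen from the application side, looks roughly like the
sketch below.  The function name, the 4096-byte alignment, and the
fallback to a buffered open are assumptions made so that the example
runs in most places; a real database would query the device for its
alignment requirements and would typically use aio.  Whether the
fdatasync() call here reaches the platter is exactly the open question.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_BLOCK 4096      /* assumed alignment and I/O size */

/* Write one aligned block with O_DIRECT, then ask for it to be made
 * stable with fdatasync().  Returns 0 on success, -1 on error. */
int direct_write_then_fdatasync(const char *path)
{
    void *buf = NULL;
    int rc = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);

    /* Some filesystems reject O_DIRECT with EINVAL; fall back to a
     * buffered open so the sketch still runs there. */
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* O_DIRECT needs an aligned buffer, offset, and length. */
    if (posix_memalign(&buf, SKETCH_BLOCK, SKETCH_BLOCK) == 0) {
        memset(buf, 'x', SKETCH_BLOCK);
        /* The write bypasses the page cache, so the kernel holds no
         * dirty pages for this file when fdatasync() is called. */
        if (pwrite(fd, buf, SKETCH_BLOCK, 0) == SKETCH_BLOCK &&
            fdatasync(fd) == 0)
            rc = 0;
        free(buf);
    }
    close(fd);
    return rc;
}
```

The second case is the same call sequence without O_DIRECT, with a long
enough pause before fdatasync() that kernel writeback has already
cleaned the page cache.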

> > (And then, we should rename things to more sane names.  :-)

Please, yes!  The names made sense instinctively, until I looked at
the code then they didn't :-)

-- Jamie

* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
  2008-05-26 20:22             ` Jamie Lokier
@ 2008-05-29 17:08               ` Bryan Henderson
  2008-05-29 18:46                 ` jim owens
  0 siblings, 1 reply; 15+ messages in thread
From: Bryan Henderson @ 2008-05-29 17:08 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Andrew Morton, Jörn Engel, linux-ext4, linux-fsdevel,
	Theodore Tso

> Here's a thought for someone implementing fdatasync().  If a database
> uses O_DIRECT writes (typically with aio), then wants data which it's
> written to be committed to the hard disk platter, and the filesystem
> is mounted "barrier=1" - should it call fdatasync()?  Should that emit
> the barrier?  If another application uses normal (not O_DIRECT)
> writes, and then _is delayed_ so long that kernel writeback occurs and
> all cache is clean, and then calls fdatasync(), should that call emit
> a barrier in that case?  (Answers imho: yes and yes).

I don't get it.  What would be the value of emitting the barrier?

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems



* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
  2008-05-29 17:08               ` fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively) Bryan Henderson
@ 2008-05-29 18:46                 ` jim owens
  2008-05-29 23:15                   ` Bryan Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: jim owens @ 2008-05-29 18:46 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel

Bryan Henderson wrote:
>>Here's a thought for someone implementing fdatasync().  If a database
>>uses O_DIRECT writes (typically with aio), then wants data which it's
>>written to be committed to the hard disk platter, and the filesystem
>>is mounted "barrier=1" - should it call fdatasync()?  Should that emit
>>the barrier?  If another application uses normal (not O_DIRECT)
>>writes, and then _is delayed_ so long that kernel writeback occurs and
>>all cache is clean, and then calls fdatasync(), should that call emit
>>a barrier in that case?  (Answers imho: yes and yes).
> 
> 
> I don't get it.  What would be the value of emitting the barrier?

In both cases the FS must flush the drive write cache.

So which of Jamie's traps got you ...

    EMIT (SEND) the barrier, not OMIT.

   "all cache is clean": meaning KERNEL cache, not DRIVE cache.

? :) jim


* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
  2008-05-29 18:46                 ` jim owens
@ 2008-05-29 23:15                   ` Bryan Henderson
  2008-05-30  4:00                     ` Timothy Shimmin
  0 siblings, 1 reply; 15+ messages in thread
From: Bryan Henderson @ 2008-05-29 23:15 UTC (permalink / raw)
  To: jim owens; +Cc: linux-fsdevel

jim owens <jowens@hp.com> wrote on 05/29/2008 11:46:10 AM:

> Bryan Henderson wrote:
> >>Here's a thought for someone implementing fdatasync().  If a database
> >>uses O_DIRECT writes (typically with aio), then wants data which it's
> >>written to be committed to the hard disk platter, and the filesystem
> >>is mounted "barrier=1" - should it call fdatasync()?  Should that emit
> >>the barrier?  If another application uses normal (not O_DIRECT)
> >>writes, and then _is delayed_ so long that kernel writeback occurs and
> >>all cache is clean, and then calls fdatasync(), should that call emit
> >>a barrier in that case?  (Answers imho: yes and yes).
> > 
> > 
> > I don't get it.  What would be the value of emitting the barrier?
> 
> In both cases the FS must flush the drive write cache.
> 
> So which of Jamie's traps got you ...

Must have been where he assumes we think of a barrier as something that 
causes a flush of the drive write cache.  That actually didn't cross my 
mind in reading the proposal; it's probably some context I missed from 
earlier in the thread.

If the idea is for fdatasync() to have that sync-to-platter function, 
fdatasync() should just tell the block layer to sync previously written 
data (now in the drive cache) to the platter; it has an interface for 
that, doesn't it?

A barrier is rather the opposite: it doesn't say to sync some data.  It 
says _don't_ sync some data.  I can believe it has a side effect of 
cleaning the drive's write cache, but I wouldn't want to depend on it for 
that.

The other question -- whether fdatasync ought to sync the data all the way 
to the platter instead of just to the drive -- is separate.  Hasn't that 
been discussed before?  Unfortunately, there are lots of levels of storage 
stability and POSIX just gives us the means to specify one, and the two 
sides of that interface have been locked in a battle for as long as I can 
remember to control the stability/performance tradeoff.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems



* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
  2008-05-29 23:15                   ` Bryan Henderson
@ 2008-05-30  4:00                     ` Timothy Shimmin
  2008-05-30 14:14                       ` jim owens
  0 siblings, 1 reply; 15+ messages in thread
From: Timothy Shimmin @ 2008-05-30  4:00 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: jim owens, linux-fsdevel

Bryan Henderson wrote:
> jim owens <jowens@hp.com> wrote on 05/29/2008 11:46:10 AM:
> 
>> Bryan Henderson wrote:
>>>> Here's a thought for someone implementing fdatasync().  If a database
>>>> uses O_DIRECT writes (typically with aio), then wants data which it's
>>>> written to be committed to the hard disk platter, and the filesystem
>>>> is mounted "barrier=1" - should it call fdatasync()?  Should that emit
>>>> the barrier?  If another application uses normal (not O_DIRECT)
>>>> writes, and then _is delayed_ so long that kernel writeback occurs and
>>>> all cache is clean, and then calls fdatasync(), should that call emit
>>>> a barrier in that case?  (Answers imho: yes and yes).
>>>
>>> I don't get it.  What would be the value of emitting the barrier?
>> In both cases the FS must flush the drive write cache.
>>
>> So which of Jamie's traps got you ...
> 
> Must have been where he assumes we think of a barrier as something that 
> causes a flush of the drive write cache.  That actually didn't cross my 
> mind in reading the proposal; it's probably some context I missed from 
> earlier in the thread.
> 
> If the idea is for fdatasync() to have that sync-to-platter function, 
> fdatasync() should just tell the block layer to sync previously written 
> data (now in the drive cache) to the platter; it has an interface for 
> that, doesn't it?
> 
blkdev_issue_flush() do you mean?

--Tim


* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
  2008-05-30  4:00                     ` Timothy Shimmin
@ 2008-05-30 14:14                       ` jim owens
  2008-05-30 16:25                         ` Bryan Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: jim owens @ 2008-05-30 14:14 UTC (permalink / raw)
  To: Timothy Shimmin, Bryan Henderson; +Cc: linux-fsdevel

Timothy Shimmin wrote:

>>>Bryan Henderson wrote:
>>
>>Must have been where he assumes we think of a barrier as something that 
>>causes a flush of the drive write cache.

In my case maybe I only assume the barrier will do that because
that is what I want to happen and I have not had time to
really dig into the docs and code.

>>If the idea is for fdatasync() to have that sync-to-platter function, 
>>fdatasync() should just tell the block layer to sync previously written 
>>data (now in the drive cache) to the platter; it has an interface for 
>>that, doesn't it?
>>
> 
> blkdev_issue_flush() do you mean?

My understanding (but I don't know this as fact) is:

Instead of a "flush-all-drive-cache" command, the FS
should issue the proper barrier(s) to the blkdev layer
so it knows this set of data must sync-to-platter.

The key is "this set of data", not "all data".

The blkdev should know what the device supports for
caching and tagging I/Os and how to sync-to-platter
that "set of data".  If we are lucky, the device and
layers under the FS can sync-to-platter without a full
drive cache flush.  If not, then the device cache should
be flushed.

My further understanding is that some layers (and devices)
have bugs and don't sync-to-platter.  In my opinion those
are problems to fix or document so users can make the
right choices to protect their data.

jim



* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
  2008-05-30 14:14                       ` jim owens
@ 2008-05-30 16:25                         ` Bryan Henderson
  2008-05-30 18:48                           ` jim owens
  0 siblings, 1 reply; 15+ messages in thread
From: Bryan Henderson @ 2008-05-30 16:25 UTC (permalink / raw)
  To: jim owens; +Cc: linux-fsdevel, Timothy Shimmin

> Instead of a "flush-all-drive-cache" command, the FS
> should issue the proper barrier(s) to the blkdev layer
> so it knows this set of data must sync-to-platter.

But that's not what a barrier does.  In fact, I'm pretty sure no disk 
device provides the facilities to make that possible.

A barrier doesn't say any particular data must sync-to-platter.  What it 
says is that writes requested _after_ now should _not_ sync-to-platter 
until those requested before have done so.  It could still be arbitrarily 
long before the data previously written gets to the platter.

A pure barrier doesn't even give the requester any way to know when the 
data has hit the platter; its essential purpose is to make it so the 
requester doesn't have to know; it's a way for the requester to say, "I 
would have waited here for all previous writes to harden before starting 
any more; so that I don't have to suffer the slowdown of a dry queue, 
please do that ordering _for_ me while I continue to feed you requests."

But the Linux implementation does provide notification when the barrier 
moves through, so a requester could abuse it as a way to synchronize some 
other activity with his data hitting the platter.

For fdatasync() purposes, the fact that blkdev_issue_flush() syncs all 
data previously written, even though the user requires only one file's 
data to be synced, is a problem.  Maybe that's the best reason not to do 
it.  At least not unconditionally.  A barrier would have that same problem 
while simultaneously needlessly delaying later writes.
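
The difference between the two operations can be shown with a toy
model.  This is purely illustrative and does not correspond to any
block layer interface; struct model_write, may_harden() and flush_all()
are hypothetical names.  Each queued write carries the number of
barriers issued before it; a barrier constrains only the order in which
writes may reach the platter, while a flush forces everything out.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_write {
    unsigned epoch;   /* number of barriers issued before this write */
    bool hardened;    /* true once the data is on the platter */
};

/* Barrier semantics: write i may be hardened only after every write
 * from an earlier epoch has been hardened.  Nothing here says *when*
 * any write gets hardened, only in what order. */
bool may_harden(const struct model_write *q, size_t n, size_t i)
{
    for (size_t j = 0; j < n; j++)
        if (q[j].epoch < q[i].epoch && !q[j].hardened)
            return false;
    return true;
}

/* Flush semantics: everything written so far is hardened before the
 * call returns, whichever file it belongs to. */
void flush_all(struct model_write *q, size_t n)
{
    for (size_t j = 0; j < n; j++)
        q[j].hardened = true;
}
```

In this model a barrier alone never sets hardened on anything, which is
the point made above: it orders later writes behind earlier ones, but
the earlier data may still take arbitrarily long to reach the platter.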

> My further understanding is that some layers (and devices)
> have bugs and don't sync-to-platter.  In my opinion those
> are problems to fix or document so users can make the
> right choices to protect their data.

Those aren't bugs.  They're conscious design choices, so the worst you can 
say about them is they are design defects.  The designer decided that the 
user would be more upset by constant slowness than by exposure to data 
loss in certain situations.  Yes, even though the user's program or OS 
explicitly requested sync-to-platter.  But I agree the behavior should be 
documented -- probably in every listing of the device's specifications.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems



* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
  2008-05-30 16:25                         ` Bryan Henderson
@ 2008-05-30 18:48                           ` jim owens
  2008-06-02 17:31                             ` Bryan Henderson
  0 siblings, 1 reply; 15+ messages in thread
From: jim owens @ 2008-05-30 18:48 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: linux-fsdevel

Bryan Henderson wrote:

> A barrier doesn't say any particular data must sync-to-platter.

I was told by a blkdev expert that there are barrier sequences
that will do this... which probably means I asked the wrong questions.

>>My further understanding is that some layers (and devices)
>>have bugs and don't sync-to-platter.
> 
> Those aren't bugs.  They're conscious design choices, so the worst you can 
> say about them is they are design defects.  The designer decided that the 
> user would be more upset by constant slowness than by exposure to data 
> loss in certain situations.  Yes, even though the user's program or OS 
> explicitly requested sync-to-platter.  But I agree the behavior should be 
> documented -- probably in every listing of the device's specifications.

I know it is often a design choice for some system vendors to
say they are posix compliant while not meeting the data
integrity requirements just so they can win benchmarks.  They
don't document it, they hope they never get caught.  Or do you
think the specs don't require data to reach non-volatile storage?

I'm not worried about devices since I can tell customers to buy
ones that work. I'm worried if the kernel won't save user data.

Trying to convince customers to move off proprietary systems and
onto linux is a tough sell if we don't really protect their data.
So I think I'll put finding a solution to fsync somewhere near the
top of my own todo list.

The large commercial users we (HP) want to pay my expenses would
be a little unforgiving about fsync not working... and they keep
packs of underfed lawyers in kennels :)

jim


* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
  2008-05-30 18:48                           ` jim owens
@ 2008-06-02 17:31                             ` Bryan Henderson
  0 siblings, 0 replies; 15+ messages in thread
From: Bryan Henderson @ 2008-06-02 17:31 UTC (permalink / raw)
  To: jim owens; +Cc: linux-fsdevel

> I know it is often a design choice for some system vendors to
> say they are posix compliant while not meeting the data
> integrity requirements just so they can win benchmarks.  They
> don't document it, they hope they never get caught.  Or do you
> think the specs don't require data to reach non-volatile storage?

Saying you're POSIX compliant isn't a design choice; it's a marketing 
choice.  But I do think POSIX doesn't require data to reach the platter. I 
looked into this a while back when I was designing a filesystem type that 
had lots of levels of data stability and found that POSIX is intentionally 
ambiguous about what fsync means.  Its words are "stable storage." 
Stability is relative.  That the data can't disappear if the OS crashes is 
one important kind of stability.  Even data on the platter is not 
perfectly stable, as data has been known to be not retrievable from a 
platter.  Electronic storage can be as stable and non-volatile as magnetic 
media with a good enough UPS.

But all that is kind of irrelevant, because users don't want POSIX 
compliance; POSIX is just a word.  If the user wants a system where a 
storage device power failure doesn't cause data loss, he needs a certain 
fsync behavior.  If he wants one where he can do N transactions a second 
on a single disk drive and doesn't have a risk of storage device power 
failure, he might need a different fsync behavior.

I don't know if filesystem driver or storage device vendors are 
intentionally misleading customers about how stable the storage is; it 
doesn't seem like something one could get away with, considering who buys 
these.  But I do know there's plenty of misunderstanding, and that's bad.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

