* Re: [Bug 421482] Firefox 3 uses fsync excessively
From: Andrew Morton @ 2008-05-26 7:05 UTC
To: linux-ext4, linux-fsdevel

This:

On Sun, 25 May 2008 22:13:10 -0700 bugzilla-daemon@mozilla.org wrote:

> Do not reply to this email. You can add comments to this bug at
> https://bugzilla.mozilla.org/show_bug.cgi?id=421482
>
> --- Comment #152 from Karl Tomlinson (:karlt) <mozbugz@karlt.net> 2008-05-25 22:12:23 PDT ---
> Created an attachment (id=322475)
> --> (https://bugzilla.mozilla.org/attachment.cgi?id=322475)
> fdatasync/sync_file_range test program
>
> fdatasync/sync_file_range test program
>
> This first creates a file of length 1 then does one fsync on the new file.
> Then the file is continually modified without changing the length and synced
> after each modification using one of three methods (somewhat randomly
> selected): fsync/fdatasync/sync_file_range.
>
> The I/O load for the test results below was produced using dd with a small
> blocksize to limit the I/O some:
>
> dd if=/dev/zero of=large bs=64 count=$((3*1024*1024*1024/64))
>
> I used ltrace instead of strace as my strace didn't find sync_file_range (and
> my glibc-2.5 libraries don't seem to have a sync_file_range function), so
> sync_file_range appears below as "syscall(277".
>
> rm -f datasync-test.tmp &&
> ltrace -t -T -e trace=,fsync,fdatasync,syscall ./a.out
>
> 16:12:59 fsync(3) = 0 <11.864858>
> 16:13:13 fdatasync(3) = 0 <14.706356>
> 16:13:30 fsync(3) = 0 <12.832373>
> 16:13:45 syscall(277, 3, 0, 1, 7) = 0 <0.343116>
> 16:13:49 fdatasync(3) = 0 <8.231468>
> 16:14:01 syscall(277, 3, 0, 1, 7) = 0 <2.347144>
> 16:14:06 fsync(3) = 0 <6.938656>
> 16:14:16 fdatasync(3) = 0 <8.359644>
> 16:14:27 fsync(3) = 0 <5.928242>
> 16:14:35 syscall(277, 3, 0, 1, 7) = 0 <0.009531>
> 16:14:39 fdatasync(3) = 0 <7.356126>
> 16:14:50 fsync(3) = 0 <6.402128>
> 16:14:59 syscall(277, 3, 0, 1, 7) = 0 <0.802706>
> 16:15:03 syscall(277, 3, 0, 1, 7) = 0 <2.985404>
> 16:15:08 fsync(3) = 0 <4.722020>
> 16:15:15 fdatasync(3) = 0 <6.532945>
> 16:15:24 fdatasync(3) = 0 <2.294488>
> 16:15:30 fsync(3) = 0 <7.986250>
> 16:15:40 syscall(277, 3, 0, 1, 7) = 0 <1.409809>
> 16:15:45 fdatasync(3) = 0 <5.404190>
>
> The results are consistent with fdatasync being implemented as fsync on ext3.
>
> They show the potential for considerable savings from growing (and shrinking)
> files in large hunks and using sync_file_range (which also should reduce the
> impact on the rest of the filesystem).

is wrong, isn't it?

It's purportedly showing that fdatasync() on ext3 is syncing the whole
world in fsync()-fashion even with an application which does not grow
the file size.

But fdatasync() shouldn't do that. Even if the inode is dirty from
atime or mtime updates, that shouldn't cause fdatasync() to run an
ext3 commit?

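The test loop described in the quoted comment can be sketched roughly as below. This is not the attached program (attachment 322475), only a minimal illustration of one iteration: overwrite the single byte so the file length stays 1, then time one sync call. The names `modify_and_sync`, `enum sync_method` and `RANGE_FLAGS` are made up for the sketch. Like the report, it reaches sync_file_range through `syscall()`; that `SYS_sync_file_range` corresponds to the 277 seen in the trace is an assumption that holds on x86_64 Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* lets <fcntl.h> expose SYNC_FILE_RANGE_* */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Fallback values in case the libc headers do not provide the flags. */
#ifndef SYNC_FILE_RANGE_WAIT_BEFORE
#define SYNC_FILE_RANGE_WAIT_BEFORE 1
#define SYNC_FILE_RANGE_WRITE       2
#define SYNC_FILE_RANGE_WAIT_AFTER  4
#endif

/* 1 | 2 | 4 == 7: the last argument in "syscall(277, 3, 0, 1, 7)". */
#define RANGE_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE | \
                     SYNC_FILE_RANGE_WAIT_AFTER)

/* The three sync methods the test picks between (hypothetical names). */
enum sync_method { USE_FSYNC, USE_FDATASYNC, USE_SYNC_FILE_RANGE };

/* Overwrite byte 0 (the length stays 1), then sync with the chosen method.
 * Returns the seconds spent in the sync call, or -1.0 on any error. */
static double modify_and_sync(int fd, char value, enum sync_method method)
{
    struct timespec start, end;
    long rc;

    assert(fd >= 0);
    if (pwrite(fd, &value, 1, 0) != 1)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    switch (method) {
    case USE_FSYNC:
        rc = fsync(fd);
        break;
    case USE_FDATASYNC:
        rc = fdatasync(fd);
        break;
    case USE_SYNC_FILE_RANGE:
        /* offset 0, length 1: only the byte that was just rewritten */
        rc = syscall(SYS_sync_file_range, fd, 0L, 1L, RANGE_FLAGS);
        break;
    default:
        rc = -1;
        break;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (rc != 0)
        return -1.0;
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling `modify_and_sync()` in a loop with a randomly chosen method, while a dd like the one above runs in the background, would give timings comparable in kind to the ltrace output. Note that sync_file_range only pushes data pages and never writes metadata, so it is not a general replacement for fdatasync().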
* Re: [Bug 421482] Firefox 3 uses fsync excessively
From: Theodore Tso @ 2008-05-26 10:07 UTC
To: Andrew Morton; +Cc: linux-ext4, linux-fsdevel

On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> It's purportedly showing that fdatasync() on ext3 is syncing the whole
> world in fsync()-fashion even with an application which does not grow
> the file size.
>
> But fdatasync() shouldn't do that. Even if the inode is dirty from
> atime or mtime updates, that shouldn't cause fdatasync() to run an
> ext3 commit?

Well, ideally it shouldn't, although POSIX allows fdatasync() to be
implemented in terms of fsync(). It is at the moment. :-/

The problem is we don't currently have a way of distinguishing between
a "smudged" inode (only the mtime/atime has changed) and a "dirty"
inode (even if the number of blocks hasn't changed, if i_size has
changed, or i_mode, or anything else, including extended attributes
inline in the inode). We're not tracking that difference. If we only
allow mtime/atime changes through setattr (see Cristoph's patches),
and don't set the VFS dirty bit, but our own "smudged" bit, we could
do it --- but at the moment, we're not.

- Ted

* Re: [Bug 421482] Firefox 3 uses fsync excessively
From: Jörn Engel @ 2008-05-26 11:10 UTC
To: Theodore Tso; +Cc: Andrew Morton, linux-ext4, linux-fsdevel

On Mon, 26 May 2008 06:07:51 -0400, Theodore Tso wrote:
> On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> > It's purportedly showing that fdatasync() on ext3 is syncing the whole
> > world in fsync()-fashion even with an application which does not grow
> > the file size.
> >
> > But fdatasync() shouldn't do that. Even if the inode is dirty from
> > atime or mtime updates, that shouldn't cause fdatasync() to run an
> > ext3 commit?
>
> Well, ideally it shouldn't, although POSIX allows fdatasync() to be
> implemented in terms of fsync(). It is at the moment. :-/
>
> The problem is we don't currently have a way of distinguishing between
> a "smudged" inode (only the mtime/atime has changed) and a "dirty"
> inode (even if the number of blocks hasn't changed, if i_size has
> changed, or i_mode, or anything else, including extended attributes
> inline in the inode). We're not tracking that difference. If we only
> allow mtime/atime changes through setattr (see Cristoph's patches),
> and don't set the VFS dirty bit, but our own "smudged" bit, we could
> do it --- but at the moment, we're not.

Don't we already have this bit since Linux 2.4.0-test12? I_DIRTY_SYNC
is admittedly not well-named for "smudged". But it used to mean just
that. I_DIRTY_DATASYNC was the real dirty bit. Which, in I_DIRTY_PAGES,
has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.

Now we just have to use sane names.

Jörn

--
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
Raph Levien, 1979

* Re: [Bug 421482] Firefox 3 uses fsync excessively
From: Theodore Tso @ 2008-05-26 11:38 UTC
To: Jörn Engel; +Cc: Andrew Morton, linux-ext4, linux-fsdevel

On Mon, May 26, 2008 at 01:10:16PM +0200, Jörn Engel wrote:
> Don't we already have this bit since Linux 2.4.0-test12? I_DIRTY_SYNC
> is admittedly not well-named for "smudged". But it used to mean just
> that. I_DIRTY_DATASYNC was the real dirty bit. Which, in I_DIRTY_PAGES,
> has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.
>
> Now we just have to use sane names.

We're currently forcing a new commit if I_DIRTY_SYNC or
I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set. If
I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
through the code and prove it to myself :-), then this might be a very
easy fix. We'll need to make sure that unmount time we do actually
force out all inodes even if only I_DIRTY_SYNC is set.

(And then, we should rename things to more sane names. :-)

- Ted

* Re: [Bug 421482] Firefox 3 uses fsync excessively
From: Jörn Engel @ 2008-05-26 12:52 UTC
To: Theodore Tso; +Cc: Andrew Morton, linux-ext4, linux-fsdevel

On Mon, 26 May 2008 07:38:46 -0400, Theodore Tso wrote:
>
> On Mon, May 26, 2008 at 01:10:16PM +0200, Jörn Engel wrote:
> > Don't we already have this bit since Linux 2.4.0-test12? I_DIRTY_SYNC
> > is admittedly not well-named for "smudged". But it used to mean just
> > that. I_DIRTY_DATASYNC was the real dirty bit. Which, in I_DIRTY_PAGES,
                                                             ^^^^^^^^^^^^^
That should have been "2.4.0-prerelease".

> > has been split into I_DIRTY_DATASYNC and I_DIRTY_PAGES.
> >
> > Now we just have to use sane names.
>
> We're currently forcing a new commit if I_DIRTY_SYNC or
> I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set. If
> I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
> through the code and prove it to myself :-),

Proving it to yourself is good advice indeed. I'm sure it used to mean
"smudged" in 2.4.0 time. Whether any changes since have damaged that
property I haven't checked.

> then this might be a very
> easy fix. We'll need to make sure that unmount time we do actually
> force out all inodes even if only I_DIRTY_SYNC is set.
>
> (And then, we should rename things to more sane names. :-)

Jörn

--
Joern's library part 11:
http://www.unicom.com/pw/reply-to-harmful.html

* Re: [Bug 421482] Firefox 3 uses fsync excessively
From: Jamie Lokier @ 2008-05-26 20:22 UTC
To: Jörn Engel; +Cc: Theodore Tso, Andrew Morton, linux-ext4, linux-fsdevel

Jörn Engel wrote:
> > We're currently forcing a new commit if I_DIRTY_SYNC or
> > I_DIRTY_DATASYNC (but not necessarily I_DIRTY_PAGES) is set. If
> > I_DIRTY_SYNC really means "smudged" (I believe you but I'll want to go
> > through the code and prove it to myself :-),
>
> Proving it to yourself is good advice indeed. I'm sure it used to mean
> "smudged" in 2.4.0 time. Whether any changes since have damaged that
> property I haven't checked.

I noticed fdatasync() doing a full fsync(), and had a look at those
flags a few kernels ago, to implement fdatasync(). I wasn't convinced
the flags were being used in that way, but now I don't remember why.
So, yes, do check what they mean _now_.

And then, please, make us all happy and implement fdatasync() :-)

Here's a thought for someone implementing fdatasync(). If a database
uses O_DIRECT writes (typically with aio), then wants data which it's
written to be committed to the hard disk platter, and the filesystem
is mounted "barrier=1" - should it call fdatasync()? Should that emit
the barrier? If another application uses normal (not O_DIRECT)
writes, and then _is delayed_ so long that kernel writeback occurs and
all cache is clean, and then calls fdatasync(), should that call emit
a barrier in that case? (Answers imho: yes and yes).

> > (And then, we should rename things to more sane names. :-)

Please, yes! The names made sense instinctively, until I looked at
the code then they didn't :-)

-- Jamie

* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
From: Bryan Henderson @ 2008-05-29 17:08 UTC
To: Jamie Lokier
Cc: Andrew Morton, Jörn Engel, linux-ext4, linux-fsdevel, Theodore Tso

> Here's a thought for someone implementing fdatasync(). If a database
> uses O_DIRECT writes (typically with aio), then wants data which it's
> written to be committed to the hard disk platter, and the filesystem
> is mounted "barrier=1" - should it call fdatasync()? Should that emit
> the barrier? If another application uses normal (not O_DIRECT)
> writes, and then _is delayed_ so long that kernel writeback occurs and
> all cache is clean, and then calls fdatasync(), should that call emit
> a barrier in that case? (Answers imho: yes and yes).

I don't get it. What would be the value of emitting the barrier?

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
From: jim owens @ 2008-05-29 18:46 UTC
To: Bryan Henderson; +Cc: linux-fsdevel

Bryan Henderson wrote:
>>Here's a thought for someone implementing fdatasync(). If a database
>>uses O_DIRECT writes (typically with aio), then wants data which it's
>>written to be committed to the hard disk platter, and the filesystem
>>is mounted "barrier=1" - should it call fdatasync()? Should that emit
>>the barrier? If another application uses normal (not O_DIRECT)
>>writes, and then _is delayed_ so long that kernel writeback occurs and
>>all cache is clean, and then calls fdatasync(), should that call emit
>>a barrier in that case? (Answers imho: yes and yes).
>
>
> I don't get it. What would be the value of emitting the barrier?

In both cases the FS must flush the drive write cache.

So which of Jamie's traps got you ...

EMIT (SEND) the barrier, not OMIT.

"all cache is clean": meaning KERNEL cache, not DRIVE cache.

? :)

jim

* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
From: Bryan Henderson @ 2008-05-29 23:15 UTC
To: jim owens; +Cc: linux-fsdevel

jim owens <jowens@hp.com> wrote on 05/29/2008 11:46:10 AM:

> Bryan Henderson wrote:
> >>Here's a thought for someone implementing fdatasync(). If a database
> >>uses O_DIRECT writes (typically with aio), then wants data which it's
> >>written to be committed to the hard disk platter, and the filesystem
> >>is mounted "barrier=1" - should it call fdatasync()? Should that emit
> >>the barrier? If another application uses normal (not O_DIRECT)
> >>writes, and then _is delayed_ so long that kernel writeback occurs and
> >>all cache is clean, and then calls fdatasync(), should that call emit
> >>a barrier in that case? (Answers imho: yes and yes).
> >
> >
> > I don't get it. What would be the value of emitting the barrier?
>
> In both cases the FS must flush the drive write cache.
>
> So which of Jamie's traps got you ...

Must have been where he assumes we think of a barrier as something that
causes a flush of the drive write cache. That actually didn't cross my
mind in reading the proposal; it's probably some context I missed from
earlier in the thread.

If the idea is for fdatasync() to have that sync-to-platter function,
fdatasync() should just tell the block layer to sync previously written
data (now in the drive cache) to the platter; it has an interface for
that, doesn't it?

A barrier is rather the opposite: it doesn't say to sync some data. It
says _don't_ sync some data. I can believe it has a side effect of
cleaning the drive's write cache, but I wouldn't want to depend on it
for that.

The other question -- whether fdatasync ought to sync the data all the
way to the platter instead of just to the drive -- is separate. Hasn't
that been discussed before? Unfortunately, there are lots of levels of
storage stability and POSIX just gives us the means to specify one, and
the two sides of that interface have been locked in a battle for as long
as I can remember to control the stability/performance tradeoff.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
From: Timothy Shimmin @ 2008-05-30 4:00 UTC
To: Bryan Henderson; +Cc: jim owens, linux-fsdevel

Bryan Henderson wrote:
> jim owens <jowens@hp.com> wrote on 05/29/2008 11:46:10 AM:
>
>> Bryan Henderson wrote:
>>>> Here's a thought for someone implementing fdatasync(). If a database
>>>> uses O_DIRECT writes (typically with aio), then wants data which it's
>>>> written to be committed to the hard disk platter, and the filesystem
>>>> is mounted "barrier=1" - should it call fdatasync()? Should that emit
>>>> the barrier? If another application uses normal (not O_DIRECT)
>>>> writes, and then _is delayed_ so long that kernel writeback occurs and
>>>> all cache is clean, and then calls fdatasync(), should that call emit
>>>> a barrier in that case? (Answers imho: yes and yes).
>>>
>>> I don't get it. What would be the value of emitting the barrier?
>> In both cases the FS must flush the drive write cache.
>>
>> So which of Jamie's traps got you ...
>
> Must have been where he assumes we think of a barrier as something that
> causes a flush of the drive write cache. That actually didn't cross my
> mind in reading the proposal; it's probably some context I missed from
> earlier in the thread.
>
> If the idea is for fdatasync() to have that sync-to-platter function,
> fdatasync() should just tell the block layer to sync previously written
> data (now in the drive cache) to the platter; it has an interface for
> that, doesn't it?
>
blkdev_issue_flush() do you mean?

--Tim

* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
From: jim owens @ 2008-05-30 14:14 UTC
To: Timothy Shimmin, Bryan Henderson; +Cc: linux-fsdevel

Timothy Shimmin wrote:
>>>Bryan Henderson wrote:
>>
>>Must have been where he assumes we think of a barrier as something that
>>causes a flush of the drive write cache.

In my case maybe I only assume the barrier will do that
because that is what I want to happen and I have not
had time to really dig into the docs and code.

>>If the idea is for fdatasync() to have that sync-to-platter function,
>>fdatasync() should just tell the block layer to sync previously written
>>data (now in the drive cache) to the platter; it has an interface for
>>that, doesn't it?
>>
>
> blkdev_issue_flush() do you mean?

My understanding (but I don't know this as fact) is:

Instead of a "flush-all-drive-cache" command, the FS
should issue the proper barrier(s) to the blkdev layer
so it knows this set of data must sync-to-platter.

The key is "this set of data", not "all data".

The blkdev should know what the device supports for
caching and tagging I/Os and how to sync-to-platter
that "set of data".

If we are lucky, the device and layers under the FS
can sync-to-platter without a full drive cache flush.
If not, then the device cache should be flushed.

My further understanding is that some layers (and devices)
have bugs and don't sync-to-platter. In my opinion those
are problems to fix or document so users can make the
right choices to protect their data.

jim

* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
From: Bryan Henderson @ 2008-05-30 16:25 UTC
To: jim owens; +Cc: linux-fsdevel, Timothy Shimmin

> Instead of a "flush-all-drive-cache" command, the FS
> should issue the proper barrier(s) to the blkdev layer
> so it knows this set of data must sync-to-platter.

But that's not what a barrier does. In fact, I'm pretty sure no disk
device provides the facilities to make that possible.

A barrier doesn't say any particular data must sync-to-platter. What it
says is that writes requested _after_ now should _not_ sync-to-platter
until those requested before have done so. It could still be arbitrarily
long before the data previously written gets to the platter.

A pure barrier doesn't even give the requester any way to know when the
data has hit the platter; its essential purpose is to make it so the
requester doesn't have to know; it's a way for the requester to say, "I
would have waited here for all previous writes to harden before starting
any more; so that I don't have to suffer the slowdown of a dry queue,
please do that ordering _for_ me while I continue to feed you requests."

But the Linux implementation does provide notification when the barrier
moves through, so a requester could abuse it as a way to synchronize some
other activity with his data hitting the platter.

For fdatasync() purposes, the fact that blkdev_issue_flush() syncs all
data previously written, even though the user requires only one file's
data to be synced, is a problem. Maybe that's the best reason not to do
it. At least not unconditionally. A barrier would have that same problem
while simultaneously needlessly delaying later writes.

> My further understanding is that some layers (and devices)
> have bugs and don't sync-to-platter. In my opinion those
> are problems to fix or document so users can make the
> right choices to protect their data.

Those aren't bugs. They're conscious design choices, so the worst you can
say about them is they are design defects. The designer decided that the
user would be more upset by constant slowness than by exposure to data
loss in certain situations. Yes, even though the user's program or OS
explicitly requested sync-to-platter. But I agree the behavior should be
documented -- probably in every listing of the device's specifications.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

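The ordering-versus-durability distinction drawn in this message can be pictured with a toy model. Everything below is hypothetical (the `toy_drive` struct and the `toy_*` functions are invented for illustration and correspond to no kernel or device interface); it only encodes the two rules as stated: a barrier constrains the order in which cached writes may harden, while a flush actually hardens them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITES 16

/* Toy model of a drive with a volatile write cache. */
struct toy_drive {
    int    epoch[MAX_WRITES];       /* barrier epoch each write was queued in */
    bool   on_platter[MAX_WRITES];  /* has the write left the cache? */
    size_t count;                   /* number of writes queued so far */
    int    current_epoch;           /* bumped by every barrier */
};

/* Queue a write in the cache; returns its index, or -1 if the model is full. */
static int toy_write(struct toy_drive *d)
{
    if (d->count >= MAX_WRITES)
        return -1;
    d->epoch[d->count] = d->current_epoch;
    d->on_platter[d->count] = false;
    return (int)d->count++;
}

/* A barrier only opens a new ordering epoch. It hardens nothing. */
static void toy_barrier(struct toy_drive *d)
{
    d->current_epoch++;
}

/* Ordering rule: write i may harden only after every write queued in an
 * earlier epoch has hardened. */
static bool toy_may_harden(const struct toy_drive *d, size_t i)
{
    for (size_t j = 0; j < d->count; j++)
        if (d->epoch[j] < d->epoch[i] && !d->on_platter[j])
            return false;
    return true;
}

/* A flush forces everything queued so far onto the platter. */
static void toy_flush(struct toy_drive *d)
{
    for (size_t j = 0; j < d->count; j++)
        d->on_platter[j] = true;
}
```

In this model, a write followed by `toy_barrier()` is still only in the cache, possibly for an arbitrarily long time; only `toy_flush()` changes `on_platter`. The flush also hardens every queued write rather than one file's worth, which is the cost the message raises for fdatasync().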
* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
From: jim owens @ 2008-05-30 18:48 UTC
To: Bryan Henderson; +Cc: linux-fsdevel

Bryan Henderson wrote:

> A barrier doesn't say any particular data must sync-to-platter.

I was told by a blkdev expert that there are barrier sequences
that will do this... which probably means I asked the wrong
questions.

>>My further understanding is that some layers (and devices)
>>have bugs and don't sync-to-platter.
>
> Those aren't bugs. They're conscious design choices, so the worst you can
> say about them is they are design defects. The designer decided that the
> user would be more upset by constant slowness than by exposure to data
> loss in certain situations. Yes, even though the user's program or OS
> explicitly requested sync-to-platter. But I agree the behavior should be
> documented -- probably in every listing of the device's specifications.

I know it is often a design choice for some system vendors to
say they are posix compliant while not meeting the data
integrity requirements just so they can win benchmarks. They
don't document it, they hope they never get caught. Or do you
think the specs don't require data to reach non-volatile storage?

I'm not worried about devices since I can tell customers to buy
ones that work. I'm worried if the kernel won't save user data.

Trying to convince customers to move off proprietary systems
and onto linux is a tough sell if we don't really protect
their data. So I think I'll put finding a solution to fsync
somewhere near the top of my own todo list.

The large commercial users we (HP) want to pay my expenses
would be a little unforgiving about fsync not working...
and they keep packs of underfed lawyers in kennels :)

jim

* Re: fdatasync/barriers (was : [Bug 421482] Firefox 3 uses fsync excessively)
From: Bryan Henderson @ 2008-06-02 17:31 UTC
To: jim owens; +Cc: linux-fsdevel

> I know it is often a design choice for some system vendors to
> say they are posix compliant while not meeting the data
> integrity requirements just so they can win benchmarks. They
> don't document it, they hope they never get caught. Or do you
> think the specs don't require data to reach non-volatile storage?

Saying you're POSIX compliant isn't a design choice; it's a marketing
choice.

But I do think POSIX doesn't require data to reach the platter. I looked
into this a while back when I was designing a filesystem type that had
lots of levels of data stability and found that POSIX is intentionally
ambiguous about what fsync means. Its words are "stable storage."
Stability is relative. That the data can't disappear if the OS crashes is
one important kind of stability. Even data on the platter is not
perfectly stable, as data has been known to be not retrievable from a
platter. Electronic storage can be as stable and non-volatile as magnetic
media with a good enough UPS.

But all that is kind of irrelevant, because users don't want POSIX
compliance; POSIX is just a word. If the user wants a system where a
storage device power failure doesn't cause data loss, he needs a certain
fsync behavior. If he wants one where he can do N transactions a second
on a single disk drive and doesn't have a risk of storage device power
failure, he might need a different fsync behavior.

I don't know if filesystem driver or storage device vendors are
intentionally misleading customers about how stable the storage is; it
doesn't seem like something one could get away with, considering who buys
these. But I do know there's plenty of misunderstanding, and that's bad.

--
Bryan Henderson                     IBM Almaden Research Center
San Jose CA                         Filesystems

* Re: [Bug 421482] Firefox 3 uses fsync excessively
From: Andrew Morton @ 2008-05-26 18:49 UTC
To: Theodore Tso; +Cc: linux-ext4, linux-fsdevel

On Mon, 26 May 2008 06:07:51 -0400 Theodore Tso <tytso@MIT.EDU> wrote:

> On Mon, May 26, 2008 at 12:05:06AM -0700, Andrew Morton wrote:
> > It's purportedly showing that fdatasync() on ext3 is syncing the whole
> > world in fsync()-fashion even with an application which does not grow
> > the file size.
> >
> > But fdatasync() shouldn't do that. Even if the inode is dirty from
> > atime or mtime updates, that shouldn't cause fdatasync() to run an
> > ext3 commit?
>
> Well, ideally it shouldn't, although POSIX allows fdatasync() to be
> implemented in terms of fsync(). It is at the moment. :-/

Well..

> The problem is we don't currently have a way of distinguishing between
> a "smudged" inode (only the mtime/atime has changed) and a "dirty"
> inode (even if the number of blocks hasn't changed, if i_size has
> changed, or i_mode, or anything else, including extended attributes
> inline in the inode).

Who do you mean by "we"? ext3 or the kernel as a whole?

> We're not tracking that difference. If we only
> allow mtime/atime changes through setattr (see Cristoph's patches),
> and don't set the VFS dirty bit, but our own "smudged" bit, we could
> do it --- but at the moment, we're not.

But the VFS _does_ track these things, via the eternally
incomprehensible I_DIRTY_SYNC and I_DIRTY_DATASYNC. We have:

	if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
		goto out;

which _should_ cause the fs to skip the commit during fdatasync() if
only mtime and ctime have changed?

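The check quoted above can be modelled in user space to see which dirty states would let fdatasync() skip the commit. This is only a sketch: `needs_inode_commit` is a hypothetical helper, the bit values are stand-ins chosen for the example, and the final "commit if either inode bit is set" line follows Ted's description of the current ext3 behaviour earlier in the thread rather than any code shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the inode dirty-state bits discussed in the thread.
 * The numeric values are assumptions made for this sketch. */
#define I_DIRTY_SYNC      (1u << 0)  /* inode dirty, e.g. timestamps only ("smudged") */
#define I_DIRTY_DATASYNC  (1u << 1)  /* inode changes that fdatasync() must write */
#define I_DIRTY_PAGES     (1u << 2)  /* dirty data pages */

/* Returns true when the sync call would have to write the inode
 * (on ext3, force a journal commit). */
static bool needs_inode_commit(unsigned int i_state, bool datasync)
{
    /* Mirrors the quoted check: fdatasync() bails out early unless
     * I_DIRTY_DATASYNC is set. */
    if (datasync && !(i_state & I_DIRTY_DATASYNC))
        return false;

    /* Otherwise commit when either inode-dirty bit is set. */
    return (i_state & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) != 0;
}
```

Under this model an inode carrying only I_DIRTY_SYNC needs a commit for fsync() but not for fdatasync(). Whether the flags are really set that way in current kernels is exactly what Ted and Jamie say still has to be verified.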