* RE: XFS corruption during power-blackout [not found] <20050629001847.GB850@frodo> @ 2005-06-29 4:53 ` Al Boldi 2005-06-29 16:38 ` Christian Rice 2005-06-29 17:02 ` Chris Wedgwood 0 siblings, 2 replies; 36+ messages in thread From: Al Boldi @ 2005-06-29 4:53 UTC (permalink / raw) To: 'Nathan Scott' Cc: linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list Hi Nathan, You wrote: { On Tue, Jun 28, 2005 at 12:08:05PM +0300, Al Boldi wrote: > True now, not so around 2.4.20 when XFS was rock-solid. I think they > tried to improve on performance and broke something. I wish they would > fix that because it forced me back to ext3, as in consistency over > performance any time. Can you provide any details... } Specifically, in 2.4.20 I did an acid test: Spawn 10 cp -a on some big dir like /usr. Let it run for a few seconds, then pull the plug. Don't reset-button, reset is different than pulling the plug. Don't poweroff-button, poweroff is different than pulling the plug. On reboot diff the dirs spawned. What I found were 4 things in the dest dir: 1. Missing Dirs,Files. That's OK. 2. Files of size 0. That's acceptable. 3. Corrupted Files. That's unacceptable. 4. Corrupted Files with original fingerprint. That's ABSOLUTELY unacceptable. Ext3 performed best with minimal files of size 0. XFS was second with more files of size 0. Reiser,JFS was worst with corruptions. When XFS was added into the vanilla-Kernel it caused corruptions like Reiser and JFS, which forced me back to Ext3. ^ permalink raw reply [flat|nested] 36+ messages in thread
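The four outcomes listed in the acid test can be scored mechanically after the reboot. What follows is a minimal sketch, not the procedure actually used in the test: the function name `classify_copy` and the bucket names are hypothetical, and "original fingerprint" is assumed here to mean "same size and same modification time as the source, but different contents".

```python
import filecmp
import os


def classify_copy(src_root, dst_root):
    """Bucket every regular file under src_root by the state of its copy.

    Assumption: "original fingerprint" is modelled as identical size and
    mtime (to the second) with differing contents.
    """
    report = {
        "ok": [],
        "missing": [],
        "zero_size": [],
        "corrupted": [],
        "corrupted_same_stat": [],
    }
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue  # only regular files are compared in this sketch
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                report["missing"].append(rel)
                continue
            s, d = os.stat(src), os.stat(dst)
            if d.st_size == 0 and s.st_size != 0:
                report["zero_size"].append(rel)
            elif filecmp.cmp(src, dst, shallow=False):
                # shallow=False forces a byte-by-byte comparison
                report["ok"].append(rel)
            elif (s.st_size, int(s.st_mtime)) == (d.st_size, int(d.st_mtime)):
                report["corrupted_same_stat"].append(rel)
            else:
                report["corrupted"].append(rel)
    for bucket in report.values():
        bucket.sort()
    return report
```

Running it once per destination directory and comparing the bucket sizes across filesystems would give the kind of ranking described above.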
* Re: XFS corruption during power-blackout 2005-06-29 4:53 ` XFS corruption during power-blackout Al Boldi @ 2005-06-29 16:38 ` Christian Rice 2005-06-29 17:02 ` Chris Wedgwood 1 sibling, 0 replies; 36+ messages in thread From: Christian Rice @ 2005-06-29 16:38 UTC (permalink / raw) To: Al Boldi Cc: 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list Al Boldi wrote: >Hi Nathan, >You wrote: { >On Tue, Jun 28, 2005 at 12:08:05PM +0300, Al Boldi wrote: > > >>True now, not so around 2.4.20 when XFS was rock-solid. I think they >>tried to improve on performance and broke something. I wish they would >>fix that because it forced me back to ext3, as in consistency over >>performance any time. >> >> > >Can you provide any details... >} > >Specifically, in 2.4.20 I did an acid test: >Spawn 10 cp -a on some big dir like /usr. >Let it run for a few seconds, then pull the plug. >Don't reset-button, reset is different then pulling the plug. >Don't poweroff-button, poweroff is different then pulling the plug. >On reboot diff the dirs spawned. > >What I found were 4 things in the dest dir: >1. Missing Dirs,Files. That's OK. >2. Files of size 0. That's acceptable. >3. Corrupted Files. That's unacceptable. >4. Corrupted Files with original fingerprint. That's ABSOLUTELY >unacceptable. > >Ext3 performed best with minimal files of size 0. >XFS was second with more files of size 0. >Reiser,JFS was worst with corruptions. > >When XFS was added into the vanilla-Kernel it caused corruptions like Reiser >and JFS, which forced me back to Ext3. > > > > > Pardon me if I haven't seen the whole thread. Do you have hard drive write cache turned off or, if it's a raid card, a battery backup on the write cache? That makes a big difference when operators begin doing things like pulling plugs and hitting reset. Again, no offense, just one of those "have you taken it out of the box, plugged it in and turned it on" kind of questions. 
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-29 4:53 ` XFS corruption during power-blackout Al Boldi 2005-06-29 16:38 ` Christian Rice @ 2005-06-29 17:02 ` Chris Wedgwood 2005-06-29 17:56 ` Steve Lord 2005-07-01 8:17 ` David Masover 1 sibling, 2 replies; 36+ messages in thread From: Chris Wedgwood @ 2005-06-29 17:02 UTC (permalink / raw) To: Al Boldi Cc: 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > What I found were 4 things in the dest dir: > 1. Missing Dirs,Files. That's OK. > 2. Files of size 0. That's acceptable. > 3. Corrupted Files. That's unacceptable. > 4. Corrupted Files with original fingerprint. That's ABSOLUTELY > unacceptable. Disks usually default to write caching these days and can lose data as a result; disable that. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-29 17:02 ` Chris Wedgwood @ 2005-06-29 17:56 ` Steve Lord 2005-06-29 20:56 ` Chris Wedgwood 2005-06-29 21:10 ` Nathan Scott 2005-07-01 8:17 ` David Masover 1 sibling, 2 replies; 36+ messages in thread From: Steve Lord @ 2005-06-29 17:56 UTC (permalink / raw) To: Chris Wedgwood Cc: Al Boldi, 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list Chris Wedgwood wrote: > On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > >>What I found were 4 things in the dest dir: >>1. Missing Dirs,Files. That's OK. >>2. Files of size 0. That's acceptable. >>3. Corrupted Files. That's unacceptable. >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY >>unacceptable. > > > disk usually default to caching these days and can lose data as a > result, disable that > There are IDE drives where the vendor will tell you that you will drastically shorten the life of a drive if you turn off caching. There are also cool bits of technology which use the rotational energy of the spinning down drive to dump the cache out to a special track (or this may be an urban legend, not sure). Problem is, no one but the vendors really knows what any particular disk is going to do when you pull the plug. I did spend a bunch of time once ensuring that when you typed sync on xfs you could pull the power right after that and everything from before the sync survived. There have been a lot of changes both in xfs and the surrounding kernel since then. I do not know if anyone has attempted this effort again recently. If you care sufficiently about your data to want to do power fail testing then, even assuming the filesystem works perfectly: a) have a working, tested, regular backup policy b) keep the backups in a different building c) buy a UPS. Steve ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-29 17:56 ` Steve Lord @ 2005-06-29 20:56 ` Chris Wedgwood 2005-06-30 16:30 ` Bryan Henderson 2005-06-29 21:10 ` Nathan Scott 1 sibling, 1 reply; 36+ messages in thread From: Chris Wedgwood @ 2005-06-29 20:56 UTC (permalink / raw) To: Steve Lord Cc: Al Boldi, 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list On Wed, Jun 29, 2005 at 12:56:12PM -0500, Steve Lord wrote: > There are also cool bits of technology which use the rotational > energy of the spinning down drive to dump the cache out to a special > track (or this may be an urban legend, not sure). This seems only to be true for very small writes. I suspect on power loss a drive can finish writing the current sector. Anyhow, I've tested power loss on drives with caching enabled and they definitely do lose data. Sometimes a couple of MBs worth. I don't know if this is true for all drives but NONE of the ones I had access to when testing did anything like save the cache --- pretty much all data that was inflight got lost. > I did spend a bunch of time once ensuring that when you typed sync > on xfs you could pull the power right after that and everything from > before the sync survived. I think this is probably still true. If I sync then drop power I don't seem to have any problems provided caching is off. If caching is enabled I still lose data. Linux does have a concept of write barriers but these are not presently implemented for XFS. Once they are I assume sync + poweroff will be reliable with caching enabled. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-29 20:56 ` Chris Wedgwood @ 2005-06-30 16:30 ` Bryan Henderson 2005-06-30 18:46 ` Chris Wedgwood 0 siblings, 1 reply; 36+ messages in thread From: Bryan Henderson @ 2005-06-30 16:30 UTC (permalink / raw) To: Chris Wedgwood Cc: Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list >I don't know if this is true for all drives but NONE of the ones I had >access to when testing did anything like save the cache --- pretty >much all data that was inflight got lost. For another point of reference - were these ATA (personal class) or SCSI (commercial class) drives or both? Is write caching the default on typical SCSI devices? >Linux does have a concept of >write barriers but these are presently not implemented for XFS right >now. Once they are I assume sync + poweroff will be reliable with >caching enabled. But be careful with the 'sync' program/system call. As defined by POSIX, it is not a synchronizing operation. It's supposed to cause buffered writes to get hardened some time soon, not right now. So in theory, you can't pull the plug after typing "sync." In Linux, the implementation has changed a few times in this respect. In some versions, it at least _tries_ to implement "everything that was buffered when sync() started is hardened before sync() returns." In others, it implements "everything that was buffered when sync() started is hardened before the next sync() returns," and some 'sync' programs do multiple sync()s. And it's also filesystem-type-dependent. I don't know exactly what the present state is. fsync(), on the other hand, is a true synchronizing operation. -- Bryan Henderson IBM Almaden Research Center San Jose CA Filesystems ^ permalink raw reply [flat|nested] 36+ messages in thread
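The distinction drawn above between sync() and fsync() can be made concrete from user space. This is a minimal sketch, assuming a Linux-like system where a directory can be opened read-only and passed to fsync(); the helper name `write_and_fsync` is hypothetical. It only asks the kernel to push the data to the device, and says nothing about what a drive with write caching enabled does afterwards.

```python
import os


def write_and_fsync(path, data):
    """Write data to path and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done
            written = os.write(fd, view)
            view = view[written:]
        # Unlike sync(), fsync() is specified not to return until the
        # transfer for this one file has completed or failed.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Assumption: also fsync the containing directory so that the new
    # directory entry itself is pushed out (common practice on Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

By contrast, `os.sync()` takes no file argument and, per the POSIX wording discussed above, carries no portable promise about when the data is hardened.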
* Re: XFS corruption during power-blackout 2005-06-30 16:30 ` Bryan Henderson @ 2005-06-30 18:46 ` Chris Wedgwood 2005-06-30 19:44 ` Jörn Engel ` (3 more replies) 0 siblings, 4 replies; 36+ messages in thread From: Chris Wedgwood @ 2005-06-30 18:46 UTC (permalink / raw) To: Bryan Henderson Cc: Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list On Thu, Jun 30, 2005 at 12:30:20PM -0400, Bryan Henderson wrote: > For another point of reference - were these ATA (personal class) or > SCSI (commercial class) drives or both? IDE were some old Maxtor 60GB disks and some not-so-old 200GB WD drives. Maxtor has 2MB cache. WD 8MB. The SCSI disks were 10K RPM SCA somethings. I think they were Seagate (they've since been taken or else I would check). I have no idea what the cache is on those. > Is write caching the default on typical SCSI devices? I'm not sure. It seemed to be off by default for the SCSI disks and on by default for IDE when I checked. I can't rule out the bios/controller doing something there though. > But be careful with the 'sync' program/system call. As defined by > POSIX, it is not a synchronizing operation. Yes, but POSIX is broken in places. The Linux implementation (now and for some time but not always) won't return until all dirty data is flushed. POSIX is a bit more sane about fsync(): The fsync() function can be used by an application to indicate that all data for the open file description named by fildes is to be transferred to the storage device associated with the file described by fildes in an implementation-dependent manner. The fsync() function does not return until the system has completed that action or until an error is detected. > It's supposed to cause buffered writes to get hardened some time > soon, not right now. So in theory, you can't pull the plug after > typing "sync." Data loss internal to the disks aside, you can under Linux. I do it all the time.
Various other people do and this is something some people do test. > In others, it implements "everything that was buffered when sync() > started is hardened before the next sync() returns," That is what happens now. I'm not sure any other behavior makes sense does it? If it happens differently I would call that a bug. > and some 'sync' programs do multiple sync()s. Such programs are arguably broken (grub maybe?). If one doesn't work, then why should doing it <n>-times? > And it's also filesystem-type-dependent. If a filesystem doesn't flush reliably with sync, I would call that a bug. > fsync(), on the other hand, is a true synchronizing operation. Again that requires the fs to behave correctly so if it fails it should be reported as a bug. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-30 18:46 ` Chris Wedgwood @ 2005-06-30 19:44 ` Jörn Engel 2005-06-30 20:32 ` Chris Wedgwood 2005-06-30 20:49 ` Bryan Henderson ` (2 subsequent siblings) 3 siblings, 1 reply; 36+ messages in thread From: Jörn Engel @ 2005-06-30 19:44 UTC (permalink / raw) To: Chris Wedgwood Cc: Bryan Henderson, Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list On Thu, 30 June 2005 11:46:27 -0700, Chris Wedgwood wrote: > On Thu, Jun 30, 2005 at 12:30:20PM -0400, Bryan Henderson wrote: > > > In others, it implements "everything that was buffered when sync() > > started is hardened before the next sync() returns," > > That is what happens now. I'm not sure any other behavior makes sense > does it? > > If it happens differently I would call that a bug. While I agree with all the rest, this part confuses me. Do you mean that sync() should actually return immediately, but the second sync() block until all data present at the time of the previous sync() is hardened? Or do you rather mean that a single sync() should block until all data currently present is hardened? Jörn -- It is better to die of hunger having lived without grief and fear, than to live with a troubled spirit amid abundance. -- Epictetus - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-30 19:44 ` Jörn Engel @ 2005-06-30 20:32 ` Chris Wedgwood 2005-06-30 21:07 ` Jörn Engel 2005-07-01 12:36 ` Ric Wheeler 0 siblings, 2 replies; 36+ messages in thread From: Chris Wedgwood @ 2005-06-30 20:32 UTC (permalink / raw) To: Jörn Engel Cc: Bryan Henderson, Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list On Thu, Jun 30, 2005 at 09:44:37PM +0200, Jörn Engel wrote: > Or do you rather mean that a single sync() should block until all data > currently present is hardened? Logically sync() should return only after all dirty buffers that existed before sync() was called are flushed. Anything more than this (i.e. waiting on newly (since sync was called but before it returns) dirtied buffers) could live-lock (actually, this used to happen sometimes, I don't know if that's the case). ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-30 20:32 ` Chris Wedgwood @ 2005-06-30 21:07 ` Jörn Engel 2005-07-01 12:36 ` Ric Wheeler 1 sibling, 0 replies; 36+ messages in thread From: Jörn Engel @ 2005-06-30 21:07 UTC (permalink / raw) To: Chris Wedgwood Cc: Bryan Henderson, Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list On Thu, 30 June 2005 13:32:23 -0700, Chris Wedgwood wrote: > On Thu, Jun 30, 2005 at 09:44:37PM +0200, Jörn Engel wrote: > > > Or do you rather mean that a single sync() should block until all data > > currently present is hardened? > > Logically sync() should return only after all dirty buffers that > existed before sync() was called are flushed. That's what I thought. Thanks for the confirmation. > Anything more than this (i.e. waiting on newly (since sync was called > but before it returns) dirtied buffers) could live-lock (actually, > this used to happen sometimes, I don't know if that's the case). ... and would be totally useless anyway, yep. Jörn -- The strong give up and move away, while the weak give up and stay. -- unknown ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-30 20:32 ` Chris Wedgwood 2005-06-30 21:07 ` Jörn Engel @ 2005-07-01 12:36 ` Ric Wheeler 2005-07-01 12:56 ` Jens Axboe 1 sibling, 1 reply; 36+ messages in thread From: Ric Wheeler @ 2005-07-01 12:36 UTC (permalink / raw) To: Chris Wedgwood Cc: Jörn Engel, Bryan Henderson, Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list Chris Wedgwood wrote: >On Thu, Jun 30, 2005 at 09:44:37PM +0200, Jörn Engel wrote: > > > >>Or do you rather mean that a single sync() should block until all data >>currently present is hardened? >> >> > >Logically sync() should return only after all dirty buffers that >existed before sync() was called are flushed. > >Anything more than this (i.e. waiting on newly (since sync was called >but before it returns) dirtied buffers) could live-lock (actually, >this used to happen sometimes, I don't know if that's the case). > > I think that we need one more stage in sync() behavior to make sure that the data is safely on the platter. For file systems with supported write barriers, the last IO should be wrapped with a barrier to flush the disk cache. It doesn't seem that sync() does that in today's code. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-01 12:36 ` Ric Wheeler @ 2005-07-01 12:56 ` Jens Axboe 0 siblings, 0 replies; 36+ messages in thread From: Jens Axboe @ 2005-07-01 12:56 UTC (permalink / raw) To: Ric Wheeler Cc: Chris Wedgwood, Jörn Engel, Bryan Henderson, Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list On Fri, Jul 01 2005, Ric Wheeler wrote: > Chris Wedgwood wrote: > > >On Thu, Jun 30, 2005 at 09:44:37PM +0200, Jörn Engel wrote: > > > > > > > >>Or do you rather mean that a single sync() should block until all data > >>currently present is hardened? > >> > >> > > > >Logically sync() should return only after all dirty buffers that > >existed before sync() was called are flushed. > > > >Anything more than this (i.e. waiting on newly (since sync was called > >but before it returns) dirtied buffers) could live-lock (actually, > >this used to happen sometimes, I don't know if that's the case). > > > > > I think that we need one more stage in sync() behavior to make sure that > the data is safely on the platter. For file systems with supported > write barriers, the last IO should be wrapped with a barrier to flush > the disk cache. > > It doesn't seem that sync() does that in today's code. That is true, sync() really only guarantees that the io has been issued and the drive signalled completion, with write back caching on it might not be on platter yet. -- Jens Axboe ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-30 18:46 ` Chris Wedgwood 2005-06-30 19:44 ` Jörn Engel @ 2005-06-30 20:49 ` Bryan Henderson 2005-07-01 12:53 ` Ric Wheeler 2005-07-01 1:09 ` Stewart Smith 2005-07-05 15:53 ` Sonny Rao 3 siblings, 1 reply; 36+ messages in thread From: Bryan Henderson @ 2005-06-30 20:49 UTC (permalink / raw) To: Chris Wedgwood Cc: Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list >POSIX is broken in places ... >If it happens differently I would call that a bug. I think you're confusing goodness with correctness. POSIX is a definition; it can't be broken. A bug is where you don't meet your own specification. So if the spec doesn't say you have to be synchronous, it's not a bug not to be synchronous. Call it a design flaw if you want. >> In others, it implements "everything that was buffered when sync() >> started is hardened before the next sync() returns," > >That is what happens now. I'm not sure any other behavior makes sense >does it? I think you quoted the wrong part. From context, I think you meant "everything that was buffered when sync() started is hardened before sync() returns." And it's also my understanding that current Linux does that. Another Linux sync() behavior is that it keeps syncing super blocks until every super block is clean at the same moment. That has given me fits. I don't know what the goal of that is -- it came in around 2.4.10. >POSIX is a bit more sane about fsync(): > > The fsync() function can be used by an application to indicate > that all data for the open file description named by fildes is > to be transferred to the storage device associated with the file > described by fildes in an implementation-dependent manner. The > fsync() function does not return until the system has completed > that action or until an error is detected. Strange; that's not the way I remember it.
I remember it being much more vague; in particular, I remember a specification that did not assume that a file is associated with a particular device and referred instead to "stable storage," the definition of which was entirely up to the implementation. In other words, the definition I've been working from was more grown-up. I wonder what the difference is. >> and some 'sync' programs do multiple sync()s. > >Such programs are arguably broken (grub maybe?). If one doesn't work, >then why should doing it <n>-times? It's because of the words before that: "everything that was buffered when sync() started is hardened before the next sync() returns." The point is that the second sync() is the one that waits (it actually waits for the previous one to finish before it starts). By the way, I'm not talking about Linux at this point. I'm talking about so-called POSIX systems in general. But it does sound like Linux has a pretty firm philosophy of synchronous sync (I see it documented in an old man page), so I guess it's OK to rely on it. There are scenarios where you'd rather not have a process tied up while syncing takes place. Stepping back, I would guess the primary original purpose of sync() was to allow you to make a sync daemon. Early Unix systems did not have in-kernel safety clean timers. A user space process did that. -- Bryan Henderson IBM Almaden Research Center San Jose CA Filesystems ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-30 20:49 ` Bryan Henderson @ 2005-07-01 12:53 ` Ric Wheeler 2005-07-01 18:24 ` Bryan Henderson 0 siblings, 1 reply; 36+ messages in thread From: Ric Wheeler @ 2005-07-01 12:53 UTC (permalink / raw) To: Bryan Henderson Cc: Chris Wedgwood, Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list Bryan Henderson wrote: > >It's because of the words before that: "everything that was buffered when >sync() >started is hardened before the next sync() returns." The point is that >the second sync() is the one that waits (it actually waits for the >previous one to finish before it starts). By the way, I'm not talking >about Linux at this point. I'm talking about so-called POSIX systems in >general. > >But it does sound like Linux has a pretty firm philosophy of synchronous >sync (I see it documented in an old man page), so I guess it's OK to rely >on it. > >There are scenarios where you'd rather not have a process tied up while >syncing takes place. Stepping back, I would guess the primary original >purpose of sync() was to allow you to make a sync daemon. Early Unix >systems did not have in-kernel safety clean timers. A user space process >did that. > >-- >Bryan Henderson IBM Almaden Research Center >San Jose CA Filesystems > > We have been playing around with various sync techniques that allow you to get good data safety for a large batch of files (think of a restore of a file system or a migration of lots of files from one server to another). You can always restart a restore if the box goes down in the middle, but once you are done, you want a hard promise that all files are safely on the disk platter. Using system level sync() has all of the disadvantages that you mention along with the lack of a per-file system barrier flush. 
You can try to hack in a flush by issuing an fsync() call on one file per file system after the sync() completes, but whether or not the file system issues a barrier operation is file system dependent. Doing an fsync() per file is slow but safe. Writing the files without syncing and then reopening and fsync()'ing each one in reasonable batch size is much faster, but still kludgey. An attractive, but as far as I can see missing feature, would be the ability to do a file system specific sync() command. Another option would be a batched AIO like fsync() with a bit vector of descriptors to sync. Not surprisingly, the best performance is reached when you let the writing phase work asynchronously, let the underlying file system do its thing, and wrap it up with a group cache to disk sync and a single disk write cache invalidate (barrier) at the end. ^ permalink raw reply [flat|nested] 36+ messages in thread
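The "write everything first, then reopen and fsync() in batches" approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `write_batch_then_fsync`, the `files` mapping of path to bytes, and the default batch size of 64 are all hypothetical, and whether each fsync() also flushes the drive's write cache remains filesystem dependent, as noted in the message.

```python
import os


def write_batch_then_fsync(files, batch_size=64):
    """Write all files without syncing, then fsync them in batches.

    files: dict mapping path -> bytes (hypothetical input shape).
    Returns the number of files that were fsync()'ed.
    """
    paths = []
    # Phase 1: plain buffered writes, letting the filesystem schedule the I/O.
    for path, data in files.items():
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)

    # Phase 2: reopen and fsync in batches so that only a bounded number
    # of descriptors is open at any one time.
    synced = 0
    for start in range(0, len(paths), batch_size):
        fds = [os.open(p, os.O_WRONLY) for p in paths[start:start + batch_size]]
        try:
            for fd in fds:
                os.fsync(fd)
                synced += 1
        finally:
            for fd in fds:
                os.close(fd)
    return synced
```

Because most of the dirty data is usually already on its way to disk by the time phase 2 starts, the individual fsync() calls tend to return quickly, which is presumably where the speedup over fsync-per-write comes from.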
* Re: XFS corruption during power-blackout 2005-07-01 12:53 ` Ric Wheeler @ 2005-07-01 18:24 ` Bryan Henderson 2005-07-01 19:58 ` David Masover 0 siblings, 1 reply; 36+ messages in thread From: Bryan Henderson @ 2005-07-01 18:24 UTC (permalink / raw) To: Ric Wheeler Cc: Al Boldi, Chris Wedgwood, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list >We have been playing around with various sync techniques that allow you >to get good data safety for a large batch of files (think of a restore >of a file system or a migration of lots of files from one server to >another). You can always restart a restore if the box goes down in the >middle, but once you are done, you want a hard promise that all files >are safely on the disk platter. > >Using system level sync() has all of the disadvantages that you mention >along with the lack of a per-file system barrier flush. > >You can try to hack in a flush by issuing an fsync() call on one file >per file system after the sync() completes, but whether or not the file >system issues a barrier operation is file system dependent. > >Doing an fsync() per file is slow but safe. Writing the files without >syncing and then reopening and fsync()'ing each one in reasonable batch >size is much faster, but still kludgey. > >An attractive, but as far as I can see missing feature, would be the >ability to do a file system specific sync() command. Another option >would be a batched AIO like fsync() with a bit vector of descriptors to >sync. Not surprising, but the best performance is reached when you let >the writing phase working asynchronously and let the underlying file >system do its thing and wrap it up with a group cache to disk sync and a >single disk write cache invalidate (barrier) at the end. Hear, hear to all of that. sync() has gotten to be really old-fashioned. 
You can sync an individual filesystem image if the filesystem is on a block device or a suitable simulation of one, by opening a block device special file for the device and doing fsync(). What you'd really like is to fsync a multi-file unit of work (transaction) -- and not just among open files. You'd like to open, write, and close 1000 files in a single transaction and then commit that transaction, with no syncing due to timers in the meantime. If you're really greedy, you'd also ask for complete rollback if the system fails before the commit. I've always found it awkward that any user can do a sync(), when it's a system-wide control operation. In the Storage Tank Linux filesystem driver I designed, you could turn off safety cleaning with a mount option (and could mount the filesystem multiple times in order to work with multiple options). You could also turn it off for a particular file with a "temporary file" attribute, and a file which was not linked to a directory was also understood to be temporary. Safety cleaning is what sync() and the internal timers do. Safety cleaning doesn't make much sense unless it goes down inside the storage device as well. -- Bryan Henderson IBM Almaden Research Center San Jose CA Filesystems ^ permalink raw reply [flat|nested] 36+ messages in thread
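No interface discussed in this thread provides such a multi-file transaction, but a common user-space approximation is to stage the files in a scratch directory and publish them with a single rename(). The sketch below uses that swapped-in technique; it is not a filesystem transaction, and the names `commit_files` and `_fsync_dir` are hypothetical. It assumes the staging directory and the target live on the same filesystem and that the target does not yet exist (or is an empty directory).

```python
import os
import shutil
import tempfile


def _fsync_dir(path):
    """fsync a directory so its entries are pushed to the device."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def commit_files(target_dir, files):
    """Create target_dir containing files (name -> bytes) all-or-nothing.

    Stage everything in a sibling scratch directory, fsync it, then
    rename it into place. On any failure the staging area is removed,
    which plays the role of the rollback.
    """
    parent = os.path.dirname(os.path.abspath(target_dir))
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        for name, data in files.items():
            with open(os.path.join(staging, name), "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
        _fsync_dir(staging)
        os.rename(staging, target_dir)  # the "commit" point
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    _fsync_dir(parent)  # make the rename itself durable
```

After a crash, either `target_dir` exists with every file in it or it does not exist at all; a leftover `.staging-*` directory can simply be deleted. This gives the all-or-nothing property but not the "no syncing until commit" performance that the message asks for.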
* Re: XFS corruption during power-blackout 2005-07-01 18:24 ` Bryan Henderson @ 2005-07-01 19:58 ` David Masover 2005-07-01 21:10 ` Jörn Engel 0 siblings, 1 reply; 36+ messages in thread From: David Masover @ 2005-07-01 19:58 UTC (permalink / raw) To: Bryan Henderson Cc: Ric Wheeler, Al Boldi, Chris Wedgwood, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list Bryan Henderson wrote: [...] > What you'd really like is to fsync a multi-file unit of work (transaction) > -- and not just among open files. You'd like to open, write, and close > 1000 files in a single transaction and then commit that transaction, with > no syncing due to timers in the meantime. If you're really greedy, you'd > also ask for complete rollback if the system fails before the commit. Both of these are planned for Reiser4. Or is it 4.1? I would like said interface to be able to not necessarily flush to disk right away, though. It should certainly be an option (I'm sure MySQL would use that option), but sometimes you want the performance, especially if there are dozens of these transactions firing all at once -- better to let RAM fill up and then flush them all. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-01 19:58 ` David Masover @ 2005-07-01 21:10 ` Jörn Engel 2005-07-01 21:39 ` David Masover 0 siblings, 1 reply; 36+ messages in thread From: Jörn Engel @ 2005-07-01 21:10 UTC (permalink / raw) To: David Masover Cc: Bryan Henderson, Ric Wheeler, Al Boldi, Chris Wedgwood, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list On Fri, 1 July 2005 14:58:39 -0500, David Masover wrote: > Bryan Henderson wrote: > [...] > >What you'd really like is to fsync a multi-file unit of work (transaction) > >-- and not just among open files. You'd like to open, write, and close > >1000 files in a single transaction and then commit that transaction, with > >no syncing due to timers in the meantime. If you're really greedy, you'd > >also ask for complete rollback if the system fails before the commit. > > Both of these are planned for Reiser4. Or is it 4.1? Both are pretty trivial to implement for a tree-based fs like reiserfs. Non-trivial is the user interface. Not sure if sys_reiser is the answer to that. Jörn -- When people work hard for you for a pat on the back, you've got to give them that pat. -- Robert Heinlein ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-01 21:10 ` Jörn Engel @ 2005-07-01 21:39 ` David Masover 0 siblings, 0 replies; 36+ messages in thread From: David Masover @ 2005-07-01 21:39 UTC (permalink / raw) To: Jörn Engel Cc: Bryan Henderson, Ric Wheeler, Al Boldi, Chris Wedgwood, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list Jörn Engel wrote: > On Fri, 1 July 2005 14:58:39 -0500, David Masover wrote: > >>Bryan Henderson wrote: >>[...] >> >>>What you'd really like is to fsync a multi-file unit of work (transaction) >>>-- and not just among open files. You'd like to open, write, and close >>>1000 files in a single transaction and then commit that transaction, with >>>no syncing due to timers in the meantime. If you're really greedy, you'd >>>also ask for complete rollback if the system fails before the commit. >> >>Both of these are planned for Reiser4. Or is it 4.1? > > > Both are pretty trivial to implement for a tree-based fs like > reiserfs. Non-trivial is the user interface. Not sure if sys_reiser > is the answer to that. It is intended to be, I think. But sys_reiser has been pushed off to 4.1, last I checked. From the general attitude here, I'm guessing that it should *not* be called sys_reiser. We're already doing the meta-files interface for doing anything we want to do with reiser, which means sys_reiser currently only does two things: allows simultaneous access to lots of small files efficiently (versus open()-ing each of them), and transactions. While the two may or may not belong in the same system call, I don't believe they should be Reiser-specific. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-30 18:46 ` Chris Wedgwood 2005-06-30 19:44 ` Jörn Engel 2005-06-30 20:49 ` Bryan Henderson @ 2005-07-01 1:09 ` Stewart Smith 2005-07-05 15:53 ` Sonny Rao 3 siblings, 0 replies; 36+ messages in thread From: Stewart Smith @ 2005-07-01 1:09 UTC (permalink / raw) To: Chris Wedgwood Cc: Bryan Henderson, Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list On Thu, 2005-06-30 at 11:46 -0700, Chris Wedgwood wrote: > Yes, but POSIX is broken in places. The linux implementation (now and > for some time but not always) won't return until all dirty data is > flushed. POSIX, in regard to fsync(), provides "flexibility for the implementation" - maybe your environment is special and you don't buffer anything, so fsync() is null. Or perhaps you cannot control some of the disk caches, so fsync() is null. In newer systems, you can check for the flag _POSIX_SYNCHRONIZED_IO (or similar) that, if set, guarantees that fsync() is synchronously flushing buffers to disk. However, this only came into the spec in 99 or 2000 I think, so there are still a lot of systems in which you have to know the behaviour. > > and some 'sync' programs do multiple sync()s. > > Such programs are arguably broken (grub maybe?). If one doesn't work, > then why should doing it <n>-times? It's a legacy from the days when it was an async operation. The idea went that the time it took to type sync and press enter three times (note, no using up-arrow, enter - typing) would be long enough for the buffers that started to get flushed on the first sync to have hit disk. > > And it's also filesystem-type-dependent. > > If a filesystem doesn't flush reliably with sync, I would call that a > bug. > > > fsync(), on the other hand, is a true synchronizing operation. > > Again that requires the fs to behave correctly so if it fails it > should be reported as a bug. 
It's all fun and games - reliably getting data to disk is not fun. If Linux can reliably follow the idea that fsync() is synchronous and really does flush everything to disk, then it will be a lot better off than a lot of other platforms. Also, it'd be useful to have a list of where bugs affecting this have been found and in what kernels - explicitly coding in exceptions (read: big warnings to users) for these systems is not out of the question. I guess a list of known-bad drives and controllers could be useful too. Doubly useful if the kernel could report this, but a userspace list would also be good. -- Stewart Smith (stewart@flamingspork.com) http://www.flamingspork.com/ ^ permalink raw reply [flat|nested] 36+ messages in thread
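A minimal sketch of the check discussed above, written in Python and resting on stated assumptions: the helper names `synchronized_io_supported` and `durable_write` are hypothetical, and a positive answer from `sysconf` only reports what the system claims about synchronized I/O, not what a drive's write cache actually does.

```python
import os


def synchronized_io_supported():
    """Report whether the system claims POSIX synchronized I/O support.

    Returns True or False when the system answers, None when the
    SC_SYNCHRONIZED_IO name is unknown on this platform.
    """
    name = "SC_SYNCHRONIZED_IO"
    if name not in os.sysconf_names:
        return None
    try:
        # sysconf() returns -1 when the option is not supported.
        return os.sysconf(name) > 0
    except (OSError, ValueError):
        return None


def durable_write(path, data):
    """Write data to path and call fsync() before closing (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's dirty data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Whether the data is on the platter after `os.fsync()` returns still depends on the kernel, the filesystem, and the drive, which is the point of the message above.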
* Re: XFS corruption during power-blackout 2005-06-30 18:46 ` Chris Wedgwood ` (2 preceding siblings ...) 2005-07-01 1:09 ` Stewart Smith @ 2005-07-05 15:53 ` Sonny Rao 3 siblings, 0 replies; 36+ messages in thread From: Sonny Rao @ 2005-07-05 15:53 UTC (permalink / raw) To: Chris Wedgwood Cc: Bryan Henderson, Al Boldi, linux-fsdevel, linux-xfs, Steve Lord, 'Nathan Scott', reiserfs-list On Thu, Jun 30, 2005 at 11:46:27AM -0700, Chris Wedgwood wrote: > On Thu, Jun 30, 2005 at 12:30:20PM -0400, Bryan Henderson wrote: > > > For another point of reference - were these ATA (personal class) or > > SCSI (commercial class) drives or both? > > IDE were some old Maxtor 60GB disks and some not-so-old 200GB > WD drives. Maxtor has 2MB cache. WD 8MB. > > The SCSI disks were 10K RPM SCA somethings. I think they were Seagate > (they've since been taken or else I would check). I have no idea what > the cache is on those. > > > Is write caching the default on typical SCSI devices? > > I'm not sure. It seemed to be off by default for the SCSI disks and > on by default for IDE when I checked. I can't rule out the > BIOS/controller doing something there though. On all the SCSI drives shipped w/ servers write-caching is turned off for this very reason. This is true of all the IBM equipment I've seen, not sure about the smaller mom & pop outfits or drives sold through retail channels though. Sonny ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-29 17:56 ` Steve Lord 2005-06-29 20:56 ` Chris Wedgwood @ 2005-06-29 21:10 ` Nathan Scott 1 sibling, 0 replies; 36+ messages in thread From: Nathan Scott @ 2005-06-29 21:10 UTC (permalink / raw) To: Steve Lord Cc: Chris Wedgwood, Al Boldi, linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list On Wed, Jun 29, 2005 at 12:56:12PM -0500, Steve Lord wrote: > I did spend a bunch of time once ensuring that when you typed > sync on xfs you could pull the power right after that and > everything from before the sync survived. There have been a > lot of changes both in xfs and the surrounding kernel since > then. I do not know if anyone has attempted this effort > again recently. Yep, someone has, a number of times. And as Homer would say "its still good!". cheers. -- Nathan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-06-29 17:02 ` Chris Wedgwood 2005-06-29 17:56 ` Steve Lord @ 2005-07-01 8:17 ` David Masover 2005-07-01 9:24 ` Jens Axboe 1 sibling, 1 reply; 36+ messages in thread From: David Masover @ 2005-07-01 8:17 UTC (permalink / raw) To: Chris Wedgwood Cc: Al Boldi, 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list Chris Wedgwood wrote: > On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > >>What I found were 4 things in the dest dir: >>1. Missing Dirs,Files. That's OK. >>2. Files of size 0. That's acceptable. >>3. Corrupted Files. That's unacceptable. >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY >>unacceptable. > > > disk usually default to caching these days and can lose data as a > result, disable that Not always possible. Some disks lie and leave caching on anyway. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-01 8:17 ` David Masover @ 2005-07-01 9:24 ` Jens Axboe [not found] ` <20050701131950.GA15180@ime.usp.br> 2005-07-01 14:05 ` Al Boldi 0 siblings, 2 replies; 36+ messages in thread From: Jens Axboe @ 2005-07-01 9:24 UTC (permalink / raw) To: David Masover Cc: Chris Wedgwood, Al Boldi, 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list On Fri, Jul 01 2005, David Masover wrote: > Chris Wedgwood wrote: > >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > > > > >>What I found were 4 things in the dest dir: > >>1. Missing Dirs,Files. That's OK. > >>2. Files of size 0. That's acceptable. > >>3. Corrupted Files. That's unacceptable. > >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY > >>unacceptable. > > > > > >disk usually default to caching these days and can lose data as a > >result, disable that > > Not always possible. Some disks lie and leave caching on anyway. And the same (and others) disks will not honor a flush anyways. Moral of that story - avoid bad hardware. -- Jens Axboe ^ permalink raw reply [flat|nested] 36+ messages in thread
[parent not found: <20050701131950.GA15180@ime.usp.br>]
* Re: XFS corruption during power-blackout [not found] ` <20050701131950.GA15180@ime.usp.br> @ 2005-07-01 13:57 ` Ric Wheeler 2005-07-01 18:37 ` Bryan Henderson 0 siblings, 1 reply; 36+ messages in thread From: Ric Wheeler @ 2005-07-01 13:57 UTC (permalink / raw) To: Rogério Brito; +Cc: linux-kernel, Brett Russ, linux-fsdevel Rogério Brito wrote: >On Jul 01 2005, Jens Axboe wrote: > > >>On Fri, Jul 01 2005, David Masover wrote: >> >> >>>Not always possible. Some disks lie and leave caching on anyway. >>> >>> >>And the same (and others) disks will not honor a flush anyways. >>Moral of that story - avoid bad hardware. >> >> > >But how does the end-user know what hardware is "good hardware"? Which >vendors don't lie (or, at least, lie less than others) regarding HDs? > > >Thanks, Rogério Brito. > > > The only real way is to test the drive (and retest when you get a new versions of firmware) and the whole fsync -> write barrier code path. We use a bus analyzer to make sure that when you fsync() a file, you will see a cache flush command coming across the bus. Of course, that is the easy step ;-) The second step is to test your system across power failures. We have a "wbtest" code that we have used to catch bugs. The basic idea is to write a file to a disk with the cache turned off, write the same file to the disk with the write barrier (and working cache flush command) and then randomly drop power to the box. It is important to really drop power to the whole box since a "reset button" push often does not drop power to the drives and will give you false passes. Our wbtest used to be good at finding holes in the write barrier code using 2.4 kernels and PATA drives, but we have had no luck yet in catching known bugs with this test on 2.6 with S-ATA drives. 
Ideas on how to get a more effective test are welcome - it is a very small window that you need to hit to catch a misbehaving drive (i.e., your write cache flush command has returned, you want to drop power and on reboot, validate that the platter contains that last IO correctly). If you had enough NVRAM in a test system, you might be able to substitute a NVRAM backed file system for the write-cache disabled drive and get closer to catching the window. The alternative is to either run with the write cache disabled (again, you will need to validate that the drive really disabled the cache) or to buy a mid-range or better storage array that provides a non-volatile (battery backed) write cache. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
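The power-failure test described above can be sketched roughly as follows. This is not the "wbtest" code, which is not shown in the thread; it is a hypothetical Python illustration in which the record layout, `append_and_flush` and `last_intact_sequence` are all assumptions. The idea is that each record is acknowledged only after fsync() returns, the acknowledgements are logged somewhere that survives the power cut (another machine, a serial console), and after reboot the highest intact sequence number on disk is compared with the last acknowledged one.

```python
import os
import struct
import zlib

RECORD_PAYLOAD = 4096                 # bytes of payload per record (assumed size)
HEADER = struct.Struct("<QI")         # sequence number, CRC32 of the payload


def make_record(seq):
    """Build one record: header followed by a payload derived from seq."""
    payload = struct.pack("<Q", seq) * (RECORD_PAYLOAD // 8)
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def append_and_flush(fd, seq):
    """Append record seq and return it only after fsync() has returned.

    The caller should log the returned number off the box; that log is
    the list of writes the drive claimed to have made durable.
    """
    view = memoryview(make_record(seq))
    while view:
        written = os.write(fd, view)
        view = view[written:]
    os.fsync(fd)
    return seq


def last_intact_sequence(path):
    """After reboot: highest sequence number n such that records 0..n are intact."""
    last = -1
    record_size = HEADER.size + RECORD_PAYLOAD
    with open(path, "rb") as f:
        while True:
            record = f.read(record_size)
            if len(record) < record_size:
                break                 # torn or missing tail record
            seq, crc = HEADER.unpack_from(record)
            payload = record[HEADER.size:]
            if seq != last + 1 or zlib.crc32(payload) != crc:
                break                 # out-of-order or corrupted record
            last = seq
    return last
```

A run counts as a failure when `last_intact_sequence()` after reboot is lower than the last sequence number that was acknowledged before power was dropped. As the message notes, the window is small, so many power cycles are likely needed.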
* Re: XFS corruption during power-blackout 2005-07-01 13:57 ` Ric Wheeler @ 2005-07-01 18:37 ` Bryan Henderson 2005-07-01 18:41 ` Jens Axboe 0 siblings, 1 reply; 36+ messages in thread From: Bryan Henderson @ 2005-07-01 18:37 UTC (permalink / raw) To: Ric Wheeler; +Cc: linux-fsdevel, Rogério Brito, Brett Russ >>But how does the end-user know what hardware is "good hardware"? Which >>vendors don't lie (or, at least, lie less than others) regarding HDs? >> > >The only real way is to test the drive (and retest when you get a new >versions of firmware) and the whole fsync -> write barrier code path. Wouldn't a commercial class drive that ignores explicit flushes be infamous? I'm ready to accept that there are SCSI drives that cache writes in volatile storage by default (but frankly, I'm still skeptical), but I'm not ready to accept that there are drives out there secretly ignoring explicit commands to harden data, thus jeopardizing millions of dollars' worth of data. I'd need more evidence. -- Bryan Henderson IBM Almaden Research Center San Jose CA Filesystems ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-01 18:37 ` Bryan Henderson @ 2005-07-01 18:41 ` Jens Axboe 2005-07-11 12:53 ` Ric Wheeler 0 siblings, 1 reply; 36+ messages in thread From: Jens Axboe @ 2005-07-01 18:41 UTC (permalink / raw) To: Bryan Henderson Cc: Ric Wheeler, linux-fsdevel, Rogério Brito, Brett Russ On Fri, Jul 01 2005, Bryan Henderson wrote: > >>But how does the end-user know what hardware is "good hardware"? Which > >>vendors don't lie (or, at least, lie less than others) regarding HDs? > >> > > > >The only real way is to test the drive (and retest when you get a new > >versions of firmware) and the whole fsync -> write barrier code path. > > Wouldn't a commercial class drive that ignores explicit flushes be > infamous? I'm ready to accept that there are SCSI drives that cache > writes in volatile storage by default (but frankly, I'm still skeptical), > but I'm not ready to accept that there are drives out there secretly > ignoring explicit commands to harden data, thus jeopardizing millions of > dollars' worth of data. I'd need more evidence. I'm pretty sure I have an IBM drive that does so (its flush cache command is _really_ fast), as a matter of fact :-) I need to locate it and put it in a test box to re-ensure this. I'm not sure such drives would necessarily be infamous, hardly anyone would notice anything wrong in a desktop type machine. Which is what these drives were made for. -- Jens Axboe ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-01 18:41 ` Jens Axboe @ 2005-07-11 12:53 ` Ric Wheeler 0 siblings, 0 replies; 36+ messages in thread From: Ric Wheeler @ 2005-07-11 12:53 UTC (permalink / raw) To: Jens Axboe; +Cc: Bryan Henderson, linux-fsdevel, Rogério Brito, Brett Russ Jens Axboe wrote: >On Fri, Jul 01 2005, Bryan Henderson wrote: > > >>Wouldn't a commercial class drive that ignores explicit flushes be >>infamous? I'm ready to accept that there are SCSI drives that cache >>writes in volatile storage by default (but frankly, I'm still skeptical), >>but I'm not ready to accept that there are drives out there secretly >>ignoring explicit commands to harden data, thus jeopardizing millions of >>dollars' worth of data. I'd need more evidence. >> >> > >I'm pretty sure I have an IBM drive that does so (its flush cache >command is _really_ fast), as a matter of fact :-) I need to locate it >and put it in a test box to re-ensure this. > >I'm not sure such drives would necessarily be infamous, hardly anyone >would notice anything wrong in a desktop type machine. Which is what >these drives were made for. > > One other thing to keep in mind is that drive firmware can have bugs just like any other bit of code, so a drive may have a bug in one firmware revision that gets fixed in a following one. I am not sure how much that other operating system uses flush cache commands, but until the write barrier patch, it has been a relatively rarely issued command for Linux and breakage would not be noticed. ^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: XFS corruption during power-blackout 2005-07-01 9:24 ` Jens Axboe [not found] ` <20050701131950.GA15180@ime.usp.br> @ 2005-07-01 14:05 ` Al Boldi 2005-07-01 16:35 ` Alistair John Strachan 2005-07-05 15:49 ` Sonny Rao 1 sibling, 2 replies; 36+ messages in thread From: Al Boldi @ 2005-07-01 14:05 UTC (permalink / raw) To: 'Jens Axboe', 'David Masover' Cc: 'Chris Wedgwood', 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list Jens Axboe wrote: { On Fri, Jul 01 2005, David Masover wrote: > Chris Wedgwood wrote: > >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > > > > >>What I found were 4 things in the dest dir: > >>1. Missing Dirs,Files. That's OK. > >>2. Files of size 0. That's acceptable. > >>3. Corrupted Files. That's unacceptable. > >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY > >>unacceptable. > > > > > >disk usually default to caching these days and can lose data as a > >result, disable that > > Not always possible. Some disks lie and leave caching on anyway. And the same (and others) disks will not honor a flush anyways. Moral of that story - avoid bad hardware. } 1. Sync is not the issue. The issue is whether a journaled FS can detect corrupted files and flag them after a power-blackout! 2. Moral of the story is: What's ext3 doing the others aren't? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-01 14:05 ` Al Boldi @ 2005-07-01 16:35 ` Alistair John Strachan 2005-07-05 15:49 ` Sonny Rao 1 sibling, 0 replies; 36+ messages in thread From: Alistair John Strachan @ 2005-07-01 16:35 UTC (permalink / raw) To: Al Boldi Cc: 'Jens Axboe', 'David Masover', 'Chris Wedgwood', 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list On Friday 01 Jul 2005 15:05, Al Boldi wrote: > Jens Axboe wrote: { > > On Fri, Jul 01 2005, David Masover wrote: > > Chris Wedgwood wrote: > > >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > >>What I found were 4 things in the dest dir: > > >>1. Missing Dirs,Files. That's OK. > > >>2. Files of size 0. That's acceptable. > > >>3. Corrupted Files. That's unacceptable. > > >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY > > >>unacceptable. > > > > > >disk usually default to caching these days and can lose data as a > > >result, disable that > > > > Not always possible. Some disks lie and leave caching on anyway. > > And the same (and others) disks will not honor a flush anyways. > Moral of that story - avoid bad hardware. > } > > 1. Sync is not the issue. The issue is whether a journaled FS can detect > corrupted files and flag them after a power-blackout! > 2. Moral of the story is: What's ext3 doing the others aren't? > I agree, I've used XFS for about three years on Linux now, and whilst I love the performance and self-repair attributes of the filesystem, I do think it leaves a lot to be desired when it comes to file corruption. In my experience, using a standard XFS log/volume setup on the same physical, cheap IDE HD, any files open at the time as a power down or hardware lockup end up being filled either with zeros, or garbage. However, I'd far rather lose a few files once in a blue moon than have to sit through 10 minute fsck's every time the kernel crashes or I kick out the plugs. -- Cheers, Alistair. 
personal: alistair()devzero!co!uk university: s0348365()sms!ed!ac!uk student: CS/CSim Undergraduate contact: 1F2 55 South Clerk Street, Edinburgh. EH8 9PP. ^ permalink raw reply [flat|nested] 36+ messages in thread
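The four outcomes quoted above (missing, size 0, corrupted, corrupted with original fingerprint) can be told apart mechanically after a reboot. The following Python sketch is one hypothetical way to do that; `classify_copy` is an invented name, and treating "original fingerprint" as "same size and same modification time as the source" is an assumption, since the thread does not define the term.

```python
import filecmp
import os


def classify_copy(src_root, dst_root):
    """Compare a copied tree against its source and bucket every source file."""
    result = {
        "ok": [],
        "missing": [],            # 1. not present in the destination
        "empty": [],              # 2. present but of size 0
        "corrupt": [],            # 3. content differs
        "corrupt_same_stat": [],  # 4. content differs, size and mtime match
    }
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(src_root, rel)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(src):
                continue          # skip broken symlinks and special files
            if not os.path.isfile(dst):
                result["missing"].append(rel)
                continue
            s, d = os.stat(src), os.stat(dst)
            if d.st_size == 0 and s.st_size != 0:
                result["empty"].append(rel)
            elif filecmp.cmp(src, dst, shallow=False):
                result["ok"].append(rel)   # byte-for-byte identical
            elif s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime):
                result["corrupt_same_stat"].append(rel)
            else:
                result["corrupt"].append(rel)
    for bucket in result.values():
        bucket.sort()
    return result
```

The last bucket is the dangerous one: a check that looks only at size and timestamps would report such a file as fine.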
* Re: XFS corruption during power-blackout 2005-07-01 14:05 ` Al Boldi 2005-07-01 16:35 ` Alistair John Strachan @ 2005-07-05 15:49 ` Sonny Rao 2005-07-05 17:25 ` Al Boldi 1 sibling, 1 reply; 36+ messages in thread From: Sonny Rao @ 2005-07-05 15:49 UTC (permalink / raw) To: Al Boldi Cc: 'Jens Axboe', 'David Masover', 'Chris Wedgwood', 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list On Fri, Jul 01, 2005 at 05:05:11PM +0300, Al Boldi wrote: > Jens Axboe wrote: { > On Fri, Jul 01 2005, David Masover wrote: > > Chris Wedgwood wrote: > > >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > > > > > > > >>What I found were 4 things in the dest dir: > > >>1. Missing Dirs,Files. That's OK. > > >>2. Files of size 0. That's acceptable. > > >>3. Corrupted Files. That's unacceptable. > > >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY > > >>unacceptable. > > > > > > > > >disk usually default to caching these days and can lose data as a > > >result, disable that > > > > Not always possible. Some disks lie and leave caching on anyway. > > And the same (and others) disks will not honor a flush anyways. > Moral of that story - avoid bad hardware. > } > > 1. Sync is not the issue. The issue is whether a journaled FS can detect > corrupted files and flag them after a power-blackout! Journaling implies filesystem consistency, not data integrity, AFAIK. > 2. Moral of the story is: What's ext3 doing the others aren't? Ext3 has stronger guaranties than basic filesystem consistency. I.e. in ordered mode, file data is always written before metadata, so the worst that could happen is a growing file's new data is written but the metadata isn't updated before a power failure... so the new writes wouldn't be seen afterwards. You should try the same test w/ ext3 in "writeback" mode and see if it fares better or worse in terms of file corruption. Sonny ^ permalink raw reply [flat|nested] 36+ messages in thread
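The "data before metadata" ordering described above happens inside ext3, but an application can impose a similar ordering itself on any filesystem. The Python sketch below shows the common write, fsync, rename pattern; it is an application-level analogue and not what ext3 does internally, and `replace_file_durably` and the `.tmp` suffix are hypothetical choices.

```python
import os


def replace_file_durably(path, data):
    """Replace path with data so that a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Step 1: file data is flushed before any name points at it.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Step 2: the metadata change (rename) happens only after the data flush.
    os.rename(tmp_path, path)
    # Step 3: flush the directory so the rename itself is recorded.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This still relies on fsync() and the drive honouring flushes, as discussed elsewhere in the thread; it only removes the window in which a file name refers to data that was never written.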
* RE: XFS corruption during power-blackout 2005-07-05 15:49 ` Sonny Rao @ 2005-07-05 17:25 ` Al Boldi 2005-07-05 18:10 ` Sonny Rao 0 siblings, 1 reply; 36+ messages in thread From: Al Boldi @ 2005-07-05 17:25 UTC (permalink / raw) To: 'Sonny Rao' Cc: 'Jens Axboe', 'David Masover', 'Chris Wedgwood', 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list Sonny Rao wrote: { > > >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > >>What I found were 4 things in the dest dir: > > >>1. Missing Dirs,Files. That's OK. > > >>2. Files of size 0. That's acceptable. > > >>3. Corrupted Files. That's unacceptable. > > >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY > > >>unacceptable. > > > > 2. Moral of the story is: What's ext3 doing the others aren't? Ext3 has stronger guaranties than basic filesystem consistency. I.e. in ordered mode, file data is always written before metadata, so the worst that could happen is a growing file's new data is written but the metadata isn't updated before a power failure... so the new writes wouldn't be seen afterwards. } Sonny, Thanks for your input! Is there an option in XFS,ReiserFS,JFS to enable ordered mode? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-05 17:25 ` Al Boldi @ 2005-07-05 18:10 ` Sonny Rao 2005-07-05 19:24 ` Dieter Nützel 2005-07-06 4:24 ` Al Boldi 0 siblings, 2 replies; 36+ messages in thread From: Sonny Rao @ 2005-07-05 18:10 UTC (permalink / raw) To: Al Boldi Cc: 'Jens Axboe', 'David Masover', 'Chris Wedgwood', 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list On Tue, Jul 05, 2005 at 08:25:11PM +0300, Al Boldi wrote: > Sonny Rao wrote: { > > > >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > > >>What I found were 4 things in the dest dir: > > > >>1. Missing Dirs,Files. That's OK. > > > >>2. Files of size 0. That's acceptable. > > > >>3. Corrupted Files. That's unacceptable. > > > >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY > > > >>unacceptable. > > > > > > 2. Moral of the story is: What's ext3 doing the others aren't? > > Ext3 has stronger guaranties than basic filesystem consistency. > I.e. in ordered mode, file data is always written before metadata, so the > worst that could happen is a growing file's new data is written but the > metadata isn't updated before a power failure... so the new writes wouldn't > be seen afterwards. > > } > > Sonny, > Thanks for your input! > Is there an option in XFS,ReiserFS,JFS to enable ordered mode? I believe in newer 2.6 kernels that Reiser has ordered mode (IIRC, courtesy of Chris Mason), but XFS and JFS do not support it. I seem to remember Shaggy (JFS maintainer) saying in older 2.4 kernels he tried to write file data before metadata but had to change that behavior in 2.6, not really sure why or anything beyond that. Sonny ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-05 18:10 ` Sonny Rao @ 2005-07-05 19:24 ` Dieter Nützel 2005-07-06 4:24 ` Al Boldi 1 sibling, 0 replies; 36+ messages in thread From: Dieter Nützel @ 2005-07-05 19:24 UTC (permalink / raw) To: reiserfs-list Cc: Sonny Rao, Al Boldi, 'Jens Axboe', 'David Masover', 'Chris Wedgwood', 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel On Tuesday, 5 July 2005 20:10, Sonny Rao wrote: > On Tue, Jul 05, 2005 at 08:25:11PM +0300, Al Boldi wrote: > > Sonny Rao wrote: { > > > > > > >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > > > >>What I found were 4 things in the dest dir: > > > > >>1. Missing Dirs,Files. That's OK. > > > > >>2. Files of size 0. That's acceptable. > > > > >>3. Corrupted Files. That's unacceptable. > > > > >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY > > > > >>unacceptable. > > > > > > 2. Moral of the story is: What's ext3 doing the others aren't? > > > > Ext3 has stronger guaranties than basic filesystem consistency. > > I.e. in ordered mode, file data is always written before metadata, so the > > worst that could happen is a growing file's new data is written but the > > metadata isn't updated before a power failure... so the new writes > > wouldn't be seen afterwards. > > > > } > > > > Sonny, > > Thanks for your input! > > Is there an option in XFS,ReiserFS,JFS to enable ordered mode? > > I believe in newer 2.6 kernels that Reiser has ordered mode (IIRC, courtesy > of Chris Mason), And SuSE, ack. ftp://ftp.suse.com/pub/people/mason/patches/data-logging They have been around for some time ;-) > but XFS and JFS do not support it. I seem to remember > Shaggy (JFS maintainer) saying in older 2.4 kernels he tried to write > file data before metadata but had to change that behavior in 2.6, not > really sure why or anything beyond that. Greetings, Dieter -- Dieter Nützel @home: <Dieter () nuetzel-hh ! de> ^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: XFS corruption during power-blackout 2005-07-05 18:10 ` Sonny Rao 2005-07-05 19:24 ` Dieter Nützel @ 2005-07-06 4:24 ` Al Boldi 2005-07-06 4:46 ` Nathan Scott 1 sibling, 1 reply; 36+ messages in thread From: Al Boldi @ 2005-07-06 4:24 UTC (permalink / raw) To: 'Sonny Rao' Cc: 'Jens Axboe', 'David Masover', 'Chris Wedgwood', 'Nathan Scott', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list Sonny Rao wrote: { > > > >On Wed, Jun 29, 2005 at 07:53:09AM +0300, Al Boldi wrote: > > > >>What I found were 4 things in the dest dir: > > > >>1. Missing Dirs,Files. That's OK. > > > >>2. Files of size 0. That's acceptable. > > > >>3. Corrupted Files. That's unacceptable. > > > >>4. Corrupted Files with original fingerprint. That's ABSOLUTELY > > > >>unacceptable. > > > > > > 2. Moral of the story is: What's ext3 doing the others aren't? > > Ext3 has stronger guaranties than basic filesystem consistency. > I.e. in ordered mode, file data is always written before metadata, so > the worst that could happen is a growing file's new data is written > but the metadata isn't updated before a power failure... so the new > writes wouldn't be seen afterwards. > I believe in newer 2.6 kernels that Reiser has ordered mode (IIRC, courtesy of Chris Mason), but XFS and JFS do not support it. } Was ordered mode disabled/removed when XFS was added to the vanilla-kernel? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout 2005-07-06 4:24 ` Al Boldi @ 2005-07-06 4:46 ` Nathan Scott 0 siblings, 0 replies; 36+ messages in thread From: Nathan Scott @ 2005-07-06 4:46 UTC (permalink / raw) To: Al Boldi Cc: 'Sonny Rao', 'Jens Axboe', 'David Masover', 'Chris Wedgwood', linux-xfs, linux-kernel, linux-fsdevel, reiserfs-list On Wed, Jul 06, 2005 at 07:24:03AM +0300, Al Boldi wrote: > Was ordered mode disabled/removed when XFS was add to the vanilla-kernel? No, XFS has never supported such a mode. cheers. -- Nathan ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: XFS corruption during power-blackout
@ 2005-07-16 7:02 Al Boldi
0 siblings, 0 replies; 36+ messages in thread
From: Al Boldi @ 2005-07-16 7:02 UTC (permalink / raw)
To: rhowe; +Cc: linux-kernel, linux-fsdevel, linux-xfs, 'Nathan Scott'
Russell Howe wrote: {
XFS only journals metadata, not data.
So, you are supposed to get a consistent filesystem structure, but your
data consistency isn't guaranteed.
}
What did XFS do to detect filedata-corruption before it was added to the
vanilla-kernel?
Maybe it did not update the metadata before the fs was sync'd?
Really, it should wait for fs sync and then update metadata!
This would imply 2 syncs in succession to ensure updated filedata/metadata
consistency, which is OK.
Is it possible to instruct XFS to delay metadata update until after a
filedata sync?
Thanks!
Al
^ permalink raw reply [flat|nested] 36+ messages in thread
Thread overview: 36+ messages
[not found] <20050629001847.GB850@frodo>
2005-06-29 4:53 ` XFS corruption during power-blackout Al Boldi
2005-06-29 16:38 ` Christian Rice
2005-06-29 17:02 ` Chris Wedgwood
2005-06-29 17:56 ` Steve Lord
2005-06-29 20:56 ` Chris Wedgwood
2005-06-30 16:30 ` Bryan Henderson
2005-06-30 18:46 ` Chris Wedgwood
2005-06-30 19:44 ` Jörn Engel
2005-06-30 20:32 ` Chris Wedgwood
2005-06-30 21:07 ` Jörn Engel
2005-07-01 12:36 ` Ric Wheeler
2005-07-01 12:56 ` Jens Axboe
2005-06-30 20:49 ` Bryan Henderson
2005-07-01 12:53 ` Ric Wheeler
2005-07-01 18:24 ` Bryan Henderson
2005-07-01 19:58 ` David Masover
2005-07-01 21:10 ` Jörn Engel
2005-07-01 21:39 ` David Masover
2005-07-01 1:09 ` Stewart Smith
2005-07-05 15:53 ` Sonny Rao
2005-06-29 21:10 ` Nathan Scott
2005-07-01 8:17 ` David Masover
2005-07-01 9:24 ` Jens Axboe
[not found] ` <20050701131950.GA15180@ime.usp.br>
2005-07-01 13:57 ` Ric Wheeler
2005-07-01 18:37 ` Bryan Henderson
2005-07-01 18:41 ` Jens Axboe
2005-07-11 12:53 ` Ric Wheeler
2005-07-01 14:05 ` Al Boldi
2005-07-01 16:35 ` Alistair John Strachan
2005-07-05 15:49 ` Sonny Rao
2005-07-05 17:25 ` Al Boldi
2005-07-05 18:10 ` Sonny Rao
2005-07-05 19:24 ` Dieter Nützel
2005-07-06 4:24 ` Al Boldi
2005-07-06 4:46 ` Nathan Scott
2005-07-16 7:02 Al Boldi