XFS: how to NOT null files on fsck?

public inbox for linux-kernel@vger.kernel.org

* XFS: how to NOT null files on fsck?
@ 2004-07-05  5:47 Norberto Bensa
  2004-07-09 16:37 ` L A Walsh
  0 siblings, 1 reply; 52+ messages in thread
From: Norberto Bensa @ 2004-07-05  5:47 UTC (permalink / raw)
  To: linux-kernel

Hello,

how do I set up XFS to not null files after a bad shutdown?

Thanks,
Norberto

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-05  5:47 XFS: how to NOT null files on fsck? Norberto Bensa
@ 2004-07-09 16:37 ` L A Walsh
  2004-07-09 21:59   ` Chris Wedgwood
  2004-07-29  1:30   ` Nathan Scott
  0 siblings, 2 replies; 52+ messages in thread
From: L A Walsh @ 2004-07-09 16:37 UTC (permalink / raw)
  To: Norberto Bensa; +Cc: linux-kernel

It's a feature! :-)

It's been in the code for years to randomly write nulls to some files
that have been modified in the past few days after a bad shutdown.
Reported on XFS list and got same overwhelming response there.

Apparently not easily reproduced, no one has a clue why it does it.
Just does.  Even after multiple syncs, files edited within the past
few days will sometimes go mysteriously null.  Good reason to do daily
backups as the backups will usually contain the correct file...

Now if we could just come up with a reproducible test case...but when I
try to reproduce it, it doesn't.  Grrr....it knows when I'm
scrutinizing!! :-)

-l

Norberto Bensa wrote:

>Hello,
>
>how do I set up XFS to not null files after a bad shutdown?
>
>Thanks,
>Norberto


* Re: XFS: how to NOT null files on fsck?
  2004-07-09 16:37 ` L A Walsh
@ 2004-07-09 21:59   ` Chris Wedgwood
  2004-07-10 18:33     ` L A Walsh
  2004-07-10 18:43     ` Jan Knutar
  2004-07-29  1:30   ` Nathan Scott
  1 sibling, 2 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-09 21:59 UTC (permalink / raw)
  To: L A Walsh; +Cc: Norberto Bensa, linux-kernel

On Fri, Jul 09, 2004 at 09:37:48AM -0700, L A Walsh wrote:

> Even after multiple syncs, files edited within the past few days
> will sometimes go mysteriously null.  Good reason to do daily
> backups as the backups will usually contain the correct file...

I *never* see this even when beating the hell out of machines and
trying to break things.

I do see nulls in cases where the metadata was updated and the data
didn't flush, that's supposed to happen.

> Now if we could just come up with a reproducible test case...but
> when I try to reproduce it, it doesn't.  Grrr....it knows when I'm
> scrutinizing!! :-)

Use anything that handles dotfiles or configuration badly (e.g. KDE),
make some changes or just 'run it' for a bit.  Every now and then
something rewrites some files.  Yank the power a few times and sooner
or later you'll end up with problems, under KDE certainly.

Sane applications (MTAs like postfix for example) don't have this
problem because they were written with more clue.  If they did have
this problem people would scream, because mail would get lost...  and
large mail servers might have tens of thousands of files moving about
in-flight, much more strenuous than a few dot-files.

  --cw


* Re: XFS: how to NOT null files on fsck?
  2004-07-09 21:59   ` Chris Wedgwood
@ 2004-07-10 18:33     ` L A Walsh
  2004-07-10 18:43       ` Chris Wedgwood
  2004-07-12 23:03       ` Bernd Eckenfels
  2004-07-10 18:43     ` Jan Knutar
  1 sibling, 2 replies; 52+ messages in thread
From: L A Walsh @ 2004-07-10 18:33 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: L A Walsh, Norberto Bensa, linux-kernel

My cases have been "vim" edited files.  I'd sorta think once vim has
exited, the data has been flushed, but that's just a WAG...

-l

Chris Wedgwood wrote:

>On Fri, Jul 09, 2004 at 09:37:48AM -0700, L A Walsh wrote:
>
>>Even after multiple syncs, files edited within the past few days
>>will sometimes go mysteriously null.  Good reason to do daily
>>backups as the backups will usually contain the correct file...
>
>I *never* see this even when beating the hell out of machines and
>trying to break things.
>
>I do see nulls in cases where the metadata was updated and the data
>didn't flush, that's supposed to happen.
>
>>Now if we could just come up with a reproducible test case...but
>>when I try to reproduce it, it doesn't.  Grrr....it knows when I'm
>>scrutinizing!! :-)
>
>Use anything that handles dotfiles or configuration badly (e.g. KDE),
>make some changes or just 'run it' for a bit.  Every now and then
>something rewrites some files.  Yank the power a few times and sooner
>or later you'll end up with problems, under KDE certainly.

No desktop on this machine...it's a server I log into remotely for
the most part.



* Re: XFS: how to NOT null files on fsck?
  2004-07-10 18:33     ` L A Walsh
@ 2004-07-10 18:43       ` Chris Wedgwood
  2004-07-10 21:24         ` Bernd Eckenfels
  2004-07-12 23:03       ` Bernd Eckenfels
  1 sibling, 1 reply; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-10 18:43 UTC (permalink / raw)
  To: L A Walsh; +Cc: Norberto Bensa, linux-kernel

On Sat, Jul 10, 2004 at 11:33:09AM -0700, L A Walsh wrote:

> My cases have been "vim" edited files.  I'd sorta think once vim has
> exited, the data has been flushed, but that's just a WAG...

No, that's not the case.  Normally when files are written the data
isn't flushed immediately, it sits in memory (the page-cache) for
some (usually) small amount of time.

If the data is critical, applications should fsync (or similar) as
required.
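
A minimal sketch of that advice, in Python purely for brevity.  The
helper name write_durably is hypothetical and not something from this
thread; the only point is the os.fsync() call before the descriptor is
closed.  Without it, the data may still be sitting in the page-cache
when the process exits.

```python
import os
import tempfile


def write_durably(path: str, data: bytes) -> None:
    """Write data, then ask the kernel to push it to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may exist only in the page-cache.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "critical.dat")
        write_durably(target, b"important\n")
        with open(target, "rb") as f:
            print(f.read())
```

Note that this still truncates the file first, so a crash between the
open and the fsync can leave it empty or short.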

FWIW my standard method of shutdown is:

     sync ; poweroff -f

sorta thing.  I don't lose any data doing this (at least nothing
I've noticed).

   --cw


* Re: XFS: how to NOT null files on fsck?
  2004-07-10 18:43       ` Chris Wedgwood
@ 2004-07-10 21:24         ` Bernd Eckenfels
  2004-07-11 21:54           ` Helge Hafting
  0 siblings, 1 reply; 52+ messages in thread
From: Bernd Eckenfels @ 2004-07-10 21:24 UTC (permalink / raw)
  To: linux-kernel

In article <20040710184357.GA5014@taniwha.stupidest.org> you wrote:
> No, that's not the case.  Normally when files are written the data
> isn't flushed immediately, it sits in memory (the page-cache) for
> some (usually) small amount of time.

Does that mean that closing a tempfile and then renaming the file is
not a reliable way to tell that the data is persisted?  I usually use
an atomic rename to have a point from which on I can tell if the data
is complete and persisted.

I thought close() has fsync() semantics?

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/


* Re: XFS: how to NOT null files on fsck?
  2004-07-10 21:24         ` Bernd Eckenfels
@ 2004-07-11 21:54           ` Helge Hafting
  2004-07-12 17:56             ` H. Peter Anvin
  0 siblings, 1 reply; 52+ messages in thread
From: Helge Hafting @ 2004-07-11 21:54 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

On Sat, Jul 10, 2004 at 11:24:53PM +0200, Bernd Eckenfels wrote:
> In article <20040710184357.GA5014@taniwha.stupidest.org> you wrote:
> > No, that's not the case.  Normally when files are written the data
> > isn't flushed immediately, it sits in memory (the page-cache) for
> > some (usually) small amount of time.
> 
> Does that mean that closing a tempfile and then renaming the file is
> not a reliable way to tell that the data is persisted?  I usually use
> an atomic rename to have a point from which on I can tell if the data
> is complete and persisted.
> 
> I thought close() has fsync() semantics?
> 
No, it doesn't.

close() will flush the C library buffer.  That means the data
moves from those buffers to the pagecache.  The program crashing
after that will have no effect on the file.  It can still
be lost if the _kernel_ crashes though.
If you want the pagecache flushed to disk, use fsync (or sync).

Helge Hafting


* Re: XFS: how to NOT null files on fsck?
  2004-07-11 21:54           ` Helge Hafting
@ 2004-07-12 17:56             ` H. Peter Anvin
  2004-07-12 19:59               ` Chris Wedgwood
  0 siblings, 1 reply; 52+ messages in thread
From: H. Peter Anvin @ 2004-07-12 17:56 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20040711215446.GA21443@hh.idb.hist.no>
By author:    Helge Hafting <helgehaf@aitel.hist.no>
In newsgroup: linux.dev.kernel
> > 
> No, it doesn't.
> 
> close() will flush the C library buffer.  That means the data
> moves from those buffers to the pagecache.  The program crashing
> after that will have no effect on the file.  It can still
> be lost if the _kernel_ crashes though.
> If you want the pagecache flushed to disk, use fsync (or sync).
> 

No it won't, since if you're using file descriptors there *is* no C
library buffer.  fclose() will, though, and then call close().
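
The layers in this exchange can be made visible with a small sketch,
in Python only for brevity (observe_layers is a hypothetical name).  A
buffered Python file object stands in for a C FILE*: a small write
stays in the user-space buffer until flush(), which hands it to the
kernel, and only fsync asks the kernel to write it to the device.

```python
import os
import tempfile


def observe_layers() -> tuple:
    """Return the file size seen by stat() before and after flush()."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "layers.txt")
        f = open(path, "wb")            # buffered, comparable to a C FILE*
        try:
            f.write(b"hello")           # small write stays in the user-space buffer
            before = os.stat(path).st_size
            f.flush()                   # user-space buffer -> kernel page-cache
            after = os.stat(path).st_size
            os.fsync(f.fileno())        # ask the kernel: page-cache -> storage device
        finally:
            f.close()
        return before, after


if __name__ == "__main__":
    print(observe_layers())
```

With a raw descriptor (os.open / os.write) there is no user-space
buffer at all, so the first step disappears and only the fsync step
remains.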

	-hpa


* Re: XFS: how to NOT null files on fsck?
  2004-07-12 17:56             ` H. Peter Anvin
@ 2004-07-12 19:59               ` Chris Wedgwood
  2004-07-12 20:32                 ` H. Peter Anvin
  2004-07-12 22:29                 ` Bernd Eckenfels
  0 siblings, 2 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-12 19:59 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On Mon, Jul 12, 2004 at 05:56:11PM +0000, H. Peter Anvin wrote:

> No it won't, since if you're using file descriptors there *is* no C
> library buffer.  fclose() will, though, and then call close().

Data sits in the page-cache though, and if you lose power before
that's flushed you will lose data.  This is why fsync is needed to be
sure.


   --cw


* Re: XFS: how to NOT null files on fsck?
  2004-07-12 19:59               ` Chris Wedgwood
@ 2004-07-12 20:32                 ` H. Peter Anvin
  2004-07-12 22:29                 ` Bernd Eckenfels
  1 sibling, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2004-07-12 20:32 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel

Chris Wedgwood wrote:
> On Mon, Jul 12, 2004 at 05:56:11PM +0000, H. Peter Anvin wrote:
> 
> 
>>No it won't, since if you're using file descriptors there *is* no C
>>library buffer.  fclose() will, though, and then call close().
> 
> 
> Data sits in the page-cache though, and if you lose power before
> that's flushed you will lose data.  This is why fsync is needed to be
> sure.
> 

Correct.

	-hpa


* Re: XFS: how to NOT null files on fsck?
  2004-07-12 19:59               ` Chris Wedgwood
  2004-07-12 20:32                 ` H. Peter Anvin
@ 2004-07-12 22:29                 ` Bernd Eckenfels
  1 sibling, 0 replies; 52+ messages in thread
From: Bernd Eckenfels @ 2004-07-12 22:29 UTC (permalink / raw)
  To: linux-kernel

In article <20040712195956.GA14105@taniwha.stupidest.org> you wrote:
> Data sits in the page-cache though, and if you lose power before
> that's flushed you will lose data.  This is why fsync is needed to be
> sure.

Yes right, I was confusing that with networked filesystems with
commit-on-close semantics.

Greetings
Bernd


BTW:
I was stracing java, and it is enough to do "fos.getFD().sync();
fos.close()" on a FileOutputStream to get an fsync(fd) followed by close(fd).
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/


* Re: XFS: how to NOT null files on fsck?
  2004-07-10 18:33     ` L A Walsh
  2004-07-10 18:43       ` Chris Wedgwood
@ 2004-07-12 23:03       ` Bernd Eckenfels
  2004-07-12 23:14         ` Chris Wedgwood
  1 sibling, 1 reply; 52+ messages in thread
From: Bernd Eckenfels @ 2004-07-12 23:03 UTC (permalink / raw)
  To: linux-kernel

Hello,

In article <40F03665.90108@tlinx.org> you wrote:
> My cases have been "vim" edited files.  I'd sorta think once vim has
> exited, the data has been flushed, but that's just a WAG...

just a small background investigation, I checked joe:

   open("test.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
   write(3, "test\ntest\n", 10)            = 10
   close(3)                                = 0

... which does not fsync... (it is also not an option in the source)

and vim:

   rename("test.txt", "test.txz~")         = 0
   open("test.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
   write(3, " test\ntest\n", 11)           = 11
   close(3)                                = 0
   chmod("test.txt", 0100664)              = 0

... does no fsync, either.

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/


* Re: XFS: how to NOT null files on fsck?
  2004-07-12 23:03       ` Bernd Eckenfels
@ 2004-07-12 23:14         ` Chris Wedgwood
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-12 23:14 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

On Tue, Jul 13, 2004 at 01:03:04AM +0200, Bernd Eckenfels wrote:

>    open("test.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

old data blocks released (truncated), transaction for this written to
journal more-or-less synchronously

>    write(3, "test\ntest\n", 10)            = 10
>    close(3)                                = 0

new data sitting in page-cache, not written to disk (in the case of
XFS the new blocks probably aren't even allocated at this stage).  the
file size being extended is i assume recorded in the journal though.

if you crash now, you see nulls or a truncated file.  i think this is
what people are getting with dotfiles

KDE is especially good at triggering this it seems
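
One commonly used way to avoid the truncate-then-write window
described above is to write a temporary file, fsync it, and only then
rename it over the original.  The following is a sketch under stated
assumptions, not what any of the editors in this thread do: Python is
used for brevity, replace_atomically is a hypothetical name, and the
directory fsync assumes a Linux/POSIX system.

```python
import os
import tempfile


def replace_atomically(path: str, data: bytes) -> None:
    """Replace path so that readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # user-space buffer -> page-cache
            os.fsync(tmp.fileno())   # page-cache -> disk, before the rename
        os.replace(tmp_path, path)   # rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory as well so the rename itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Because the old file is never truncated, a crash before the rename
should leave the old contents in place.  mkstemp() creates the file
with mode 0600, so a real application would also need to carry over
the original permissions.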


   --cw


* Re: XFS: how to NOT null files on fsck?
  2004-07-09 21:59   ` Chris Wedgwood
  2004-07-10 18:33     ` L A Walsh
@ 2004-07-10 18:43     ` Jan Knutar
  2004-07-10 18:46       ` Chris Wedgwood
  1 sibling, 1 reply; 52+ messages in thread
From: Jan Knutar @ 2004-07-10 18:43 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: L A Walsh, Norberto Bensa, linux-kernel

On Saturday 10 July 2004 00:59, Chris Wedgwood wrote:

> I *never* see this even when beating the hell out of machines and
> trying to break things.

I've seen this on a partition with NO other activity than me editing a .c
file with emacs in a project consisting of about 4 files in total, compiling
and testing occasionally, editing again, etc... Then one day, power loss;
when power came back, the file was nothing but nulls. At least it had the
correct size and timestamp though, great comfort, that. :)


* Re: XFS: how to NOT null files on fsck?
  2004-07-10 18:43     ` Jan Knutar
@ 2004-07-10 18:46       ` Chris Wedgwood
  2004-07-10 18:55         ` Norberto Bensa
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-10 18:46 UTC (permalink / raw)
  To: Jan Knutar; +Cc: L A Walsh, Norberto Bensa, linux-kernel

On Sat, Jul 10, 2004 at 09:43:49PM +0300, Jan Knutar wrote:

> I've seen this on a partition with NO other activity than me
> editing a .c file with emacs in a project consisting of about 4
> files in total, compiling and testing occasionally, editing again,
> etc... Then one day, power loss; when power came back, the file was
> nothing but nulls. At least it had the correct size and timestamp
> though, great comfort, that. :)


This is expected.  XFS does not journal data.  If you want that then
use ext3 or reiserfs.


   --cw


* Re: XFS: how to NOT null files on fsck?
  2004-07-10 18:46       ` Chris Wedgwood
@ 2004-07-10 18:55         ` Norberto Bensa
  2004-07-10 19:19           ` Chris Wedgwood
                             ` (2 more replies)
  0 siblings, 3 replies; 52+ messages in thread
From: Norberto Bensa @ 2004-07-10 18:55 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Jan Knutar, L A Walsh, linux-kernel

Chris Wedgwood wrote:
> XFS does not journal data.

I think we all know that. The point is: why the hell does it null files?


* Re: XFS: how to NOT null files on fsck?
  2004-07-10 18:55         ` Norberto Bensa
@ 2004-07-10 19:19           ` Chris Wedgwood
  2004-07-12 21:20             ` Chris Wedgwood
       [not found]             ` <2hgxc-5x9-9@gated-at.bofh.it>
  2004-07-10 19:33           ` Andreas Schwab
  2004-07-11  1:21           ` Gopikrishnan Sidhardhan
  2 siblings, 2 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-10 19:19 UTC (permalink / raw)
  To: Norberto Bensa; +Cc: Jan Knutar, L A Walsh, linux-kernel

On Sat, Jul 10, 2004 at 03:55:26PM -0300, Norberto Bensa wrote:

> I think we all know that. The point is: why the hell does it null
> files?

A decision was made somewhere that this is better than showing
potentially bogus or confidential data, so on log-replay some parts of
files may be zeroed.  I can see arguments for and against this and
clearly for a lot of people the zeroing is a real pain.

It would be nice for some people to prevent log-replay zeroing files
but then something would have to be able to determine whether or not
these blocks were newly allocated (and thus might contain confidential
data and need to be zeroed) or previously part of the file, in which
case we probably would like them left alone.

I don't know any of the code well enough to know how easy this is or
even if I'm telling the truth :) Hopefully someone who does can speak
up on this.

  --cw


* Re: XFS: how to NOT null files on fsck?
  2004-07-10 19:19           ` Chris Wedgwood
@ 2004-07-12 21:20             ` Chris Wedgwood
  2004-07-12 22:40               ` L A Walsh
       [not found]             ` <2hgxc-5x9-9@gated-at.bofh.it>
  1 sibling, 1 reply; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-12 21:20 UTC (permalink / raw)
  To: Norberto Bensa; +Cc: Jan Knutar, L A Walsh, linux-kernel

On Sat, Jul 10, 2004 at 12:19:14PM -0700, Chris Wedgwood wrote:

> It would be nice for some people to prevent log-replay zeroing files
> but then something would have to be able to determine whether or not
> these blocks were newly allocated (and this might contain
> confidential data and need to be zeroed) or previously part of the
> file in which case we probably would like them left alone.

I told lies.

> I don't know any of the code well enough to know how easy this is or
> even if I'm telling the truth :) Hopefully someone who does can
> speak up on this.

I knew I was completely full of shit.

XFS does *not* zero files, it simply returns zeros for unwritten
extents.  If you open an existing file and scribble all over it, you
might see the old data after a crash, or the new data if it was
flushed.  You shouldn't see zeros though.

What does happen though, is that dotfiles are truncated and rewritten;
if the data blocks aren't flushed you will get zeros back because the
extents were unwritten.  This is really the only sensible thing to do
given the circumstances.

My guess is that with other fs' (when journaling metadata only) the
blocks allocated for the newly written data are *usually* the same as
the recently freed blocks from the truncate so things appear to work
but in reality it's probably mostly luck.  XFS could behave the same
way, but sooner or later you will still lose when you get crap back
instead of old data.

Some applications just need to be fixed.

   --cw


* Re: XFS: how to NOT null files on fsck?
  2004-07-12 21:20             ` Chris Wedgwood
@ 2004-07-12 22:40               ` L A Walsh
  2004-07-12 22:53                 ` Chris Wedgwood
  0 siblings, 1 reply; 52+ messages in thread
From: L A Walsh @ 2004-07-12 22:40 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Norberto Bensa, Jan Knutar, L A Walsh, linux-kernel

<aside>
Chris, I'd never say you were full of shit or lied in any circumstance.
Mistakes are human -- they are not being "full of shit" or "lying".  Get
over it.  Don't inflate error.  It seems to be related to a pervasive
belief that people have to be either all good or all bad, all perfect or
flawed, all white hat or black hat, God or imperfect, and that to be
"good" in the programming field one must be perfect and never be _known_
to make a mistake (like some who portray others as dangerous or fools
because they may hold knowledge of that name-caller's faults).  The ones
who pose the real danger are those who censor others because the
"masses" can't handle the truth.  Anyway, it's not useful to demean
yourself or others.  </aside>

If it is of any help (I doubt it, it perplexes me)...the files I've
written out with vim and have returned "nulls" have been files that were
written out 2-3 _DAYS_ earlier -- often with more recent write having
been saved fine. 

I've also seen sections in log files where blocks would return zero in
the middle of a log.  Obviously blocks before and after successfully
made it to disk, but in _RARE_ circumstances (crashes and unplanned
shutdowns are already rare enough, so it's a rare bug that only shows
up on a 'rare' occasion...:-).

Almost (shot in the dark), like some code that was supposed to zero
unused but allocated datablocks got pointed at the wrong blocks, since
these files are readable as having been written (yes may all be out of
membuffs) and are often recoverable from the day's backup.

If it was a file I just edited and then it crashed, that I could
understand more than having files I haven't touched for a few days be
zapped.

-l



* Re: XFS: how to NOT null files on fsck?
  2004-07-12 22:40               ` L A Walsh
@ 2004-07-12 22:53                 ` Chris Wedgwood
  2004-07-13  1:44                   ` Bernd Eckenfels
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-12 22:53 UTC (permalink / raw)
  To: L A Walsh; +Cc: Norberto Bensa, Jan Knutar, linux-kernel

On Mon, Jul 12, 2004 at 03:40:08PM -0700, L A Walsh wrote:

> If it is of any help (I doubt it, it perplexes me)...the files I've
> written out with vim and have returned "nulls" have been files that
> were written out 2-3 _DAYS_ earlier -- often with more recent write
> having been saved fine.

I've heard this before and you're not the only person to claim this.
For a period of time the buffer-flushing code was broken and this was
probably possible then, even sync/fsync failed to write stuff out.

But that was a long time ago (last year) and I'm not sure that is
still the case.  It could be, the flushing code is quite complicated
and I don't understand it fully, but testing seems to indicate it does
work.

To be quite honest I've never seen nulls in files that are days old, and
I have scripts which checksum (md5) my files (hundreds of gigabytes)
which would notice this, so knowing how to reproduce it would be nice.
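
A checksum only says that a file changed, not how.  As a rough sketch
of a more targeted check (Python for brevity; is_all_nulls and
find_nulled_files are hypothetical names, and this is not the md5
script mentioned above), one could walk a tree and report files that
consist entirely of NUL bytes:

```python
import os
import tempfile


def is_all_nulls(path: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the file is non-empty and every byte is NUL."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            seen_data = True
            if chunk.strip(b"\x00"):
                # Something other than NUL bytes is present.
                return False
    return seen_data


def find_nulled_files(root: str):
    """Yield regular files under root that contain only NUL bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if is_all_nulls(path):
                yield path
```

Run after an unclean shutdown, something like this might help narrow
down which files are affected and when they were last modified.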

> I've also seen sections in log files where blocks would return zero
> in the middle of a log.

Log was being appended, system crashed, you get nulls at the end; when
rebooted, the logger opens the file in append mode and starts writing
stuff, so the nulls end up in the middle.  Arguably this is expected.

> Obviously blocks before and after successfully made it to disk, but
> in _RARE_ circumstances (crashes and unplanned shutdowns are already
> rare enough, so it's a rare bug that only shows up on a 'rare'
> occasion... :-)

It can't be blocks before and after, if that was the case you wouldn't
see the nulls.  I'm pretty sure for you the nulls are not really
on-disk, looking at the raw device you won't see them.  The nulls are
returned for unwritten extents just as nulls are returned for holes in
sparse files.
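
The sparse-file half of that comparison is easy to see from user
space.  A small sketch, in Python for brevity with a hypothetical
helper name read_hole: the file is given a size without any data ever
being written, and reading it back returns zeros.  This shows a hole,
not an XFS unwritten extent, which cannot be created this simply.

```python
import os
import tempfile


def read_hole(size: int = 4096) -> bytes:
    """Create a file with a size but no written data, then read it back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.bin")
        with open(path, "wb") as f:
            f.truncate(size)      # extend the file without writing any data
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    data = read_hole()
    print(len(data), data.strip(b"\x00") == b"")
```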

> Almost (shot in the dark), like some code that was supposed to zero
> unused but allocated datablocks got pointed at the wrong blocks,
> since these files are readable as having been written (yes may all
> be out of membuffs) and are often recoverable from the day's backup.

I can't see how.  It seems to me that if block pointers got all messed
up, xfs_repair would scream bloody murder and things would explode and
die on a live fs.  I don't see reports that look like this.

> If it was a file I just edited and then it crashed, that I could
> understand more than having files I haven't touched for a few days
> be zapped.

My gut feeling is these files really are being changed.  Stat should
show if this is the case.

  --cw


* Re: XFS: how to NOT null files on fsck?
  2004-07-12 22:53                 ` Chris Wedgwood
@ 2004-07-13  1:44                   ` Bernd Eckenfels
  2004-07-13  5:24                     ` Chris Wedgwood
  0 siblings, 1 reply; 52+ messages in thread
From: Bernd Eckenfels @ 2004-07-13  1:44 UTC (permalink / raw)
  To: linux-kernel

In article <20040712225338.GD23623@taniwha.stupidest.org> you wrote:
> To be quite honest I've never seen nulls in files that are days old, and
> I have scripts which checksum (md5) my files (hundreds of gigabytes)
> which would notice this, so knowing how to reproduce it would be nice.

I can say that nulls in files are most common at the end of (sys)log files,
filling up to the next block boundary. I always assumed this is due to the
fact that the filesize in the metadata was not written but the last
half-finished block was already linked in the inode structure.

I have never seen null-filled data or config files other than that, but I
do not have busy servers crashing often on me.

> Log was being appended, system crashed, you get nulls at the end; when
> rebooted, the logger opens the file in append mode and starts writing
> stuff, so the nulls end up in the middle.  Arguably this is expected.

Yes, and it is normally easy to spot, since the messages after the nulls are
boot messages.

> see the nulls.  I'm pretty sure for you the nulls are not really
> on-disk, looking at the raw device you won't see them.  The nulls are
> returned for unwritten extents just as nulls are returned for holes in
> sparse files.

ls -s compared with ls -l should make that visible?

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/


* Re: XFS: how to NOT null files on fsck?
  2004-07-13  1:44                   ` Bernd Eckenfels
@ 2004-07-13  5:24                     ` Chris Wedgwood
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-13  5:24 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

On Tue, Jul 13, 2004 at 03:44:52AM +0200, Bernd Eckenfels wrote:

> I can say that nulls in files are most common at the end of
> (sys)log files, filling up to the next block boundary.

Ideally syslog would rewind back past any nulls when it opens files.
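
What such a rewind could look like, as a sketch only: Python is used
for brevity, trim_trailing_nulls is a hypothetical name, and no syslog
daemon is claimed to work this way.  The function scans backwards from
the end of the file, drops the trailing NUL bytes, and returns the new
size, so a later open in append mode continues right after the last
real data.

```python
import os
import tempfile


def trim_trailing_nulls(path: str, chunk_size: int = 4096) -> int:
    """Truncate trailing NUL bytes from path and return the new size."""
    with open(path, "r+b") as f:
        end = f.seek(0, os.SEEK_END)
        while end > 0:
            start = max(0, end - chunk_size)
            f.seek(start)
            chunk = f.read(end - start)
            stripped = chunk.rstrip(b"\x00")
            if stripped:
                # Found real data: keep everything up to its last byte.
                end = start + len(stripped)
                break
            end = start           # chunk was all NULs, keep scanning backwards
        f.truncate(end)
        return end
```

This assumes a text log in which NUL bytes never occur legitimately;
for binary formats it would cut off valid data.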

> ls -s compared with ls -l should make that visible?

No, unwritten extents have an on-disk place, just the data isn't
written.  I'm not sure if there is an easy way to tell if an extent is
unwritten or not, I guess you could use xfs_bmap -p if that's working
right for you.
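
For reference, the ls -l versus ls -s comparison boils down to two
stat() fields.  A sketch in Python for brevity (size_report is a
hypothetical name; the 512-byte unit for st_blocks is how Linux
reports it and is an assumption elsewhere):

```python
import os
import tempfile


def size_report(path: str) -> tuple:
    """Return (apparent size, allocated bytes) for path."""
    st = os.stat(path)
    # st_blocks counts 512-byte units on Linux; assumed here, not guaranteed.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.bin")
        with open(p, "wb") as f:
            f.truncate(1 << 20)   # 1 MiB apparent size, no data written
        print(size_report(p))
```

This can reveal holes, where the allocated size is smaller than the
apparent size.  As explained above, it would not reveal unwritten
extents, because those do have blocks allocated on disk.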

  --cw


[parent not found: <2hgxc-5x9-9@gated-at.bofh.it>]

* Re: XFS: how to NOT null files on fsck?
       [not found]             ` <2hgxc-5x9-9@gated-at.bofh.it>
@ 2004-07-13  7:25               ` Anton Ertl
  2004-07-13  8:09                 ` Chris Wedgwood
  2004-07-13 22:24                 ` Helge Hafting
  0 siblings, 2 replies; 52+ messages in thread
From: Anton Ertl @ 2004-07-13  7:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Wedgwood, Jan Knutar, L A Walsh

Chris Wedgwood <cw@f00f.org> writes:
>XFS does *not* zero files, it simply returns zeros for unwritten
>extents.  If you open an existing file and scribble all over it, you
>might see the old data during a crash, or the new data if it was
>flushed.  You shouldn't see zeros though.
>
>What does happen though, is that dotfiles are truncated and rewritten,
>if the data blocks aren't flushed you will get zeros back because the
>extents were unwritten.  This is really the only sensible thing to do
>given the circumstances.
>
>My guess is that with other fs' (when journaling metadata only) the
>blocks allocated for the newly written data are *usually* the same as
>the recently freed blocks from the truncate so things appear to work
>but in reality it's probably mostly luck.

A secure FS must ensure that other people's deleted data does not end
up in the file.  AFAIK FSs don't record owners for free blocks, so
they can only ensure this by zeroing the blocks.  So I doubt that you
will see any different behaviour from an FS that keeps only meta-data
consistent and writes meta-data before data.

>Some applications just need to be fixed.

It's too hard to fix the applications, since there is no easy way to
test that they are really fixed.  Also, the number of applications is
much higher than the number of file systems.

The way to go is to fix the file system (well, often it means a new
FS).

The file system should provide something that I call in-order
semantics, i.e., that the disk state always represents an existing
(possibly old) logical state of the FS, not some state that never
existed, or some existing state with missing data.

My favourite approach to achieve these semantics is based on
log-structured file systems (see
<http://www.complang.tuwien.ac.at/anton/lfs/> for some ideas and also
a longer description of in-order semantics), but there are also other
approaches: I believe that Soft Updates, when implemented correctly,
provide in-order semantics, and Reiser4 may provide them, too.

- anton
-- 
M. Anton Ertl                    Some things have to be seen to be believed
anton@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13  7:25               ` Anton Ertl
@ 2004-07-13  8:09                 ` Chris Wedgwood
  2004-07-13  9:34                   ` Anton Ertl
  2004-07-13 22:24                 ` Helge Hafting
  1 sibling, 1 reply; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-13  8:09 UTC (permalink / raw)
  To: Anton Ertl; +Cc: linux-kernel, Jan Knutar, L A Walsh

On Tue, Jul 13, 2004 at 07:25:29AM +0000, Anton Ertl wrote:

> A secure FS must ensure that other people's deleted data does not
> end up in the file.  AFAIK FSs don't record owners for free blocks,
> so they can only ensure this by zeroing the blocks.

How can free blocks have an owner?  They wouldn't be free then.

> So I doubt that you will see any different behaviour from an FS that
> keeps only meta-data consistent and writes meta-data before data.

You do, some fs' will return stale data.

> It's too hard to fix the applications, since there is no easy way to
> test that they are really fixed.

No, it's not hard to fix the applications and it's easy to tell if
they are fixed.

> Also, the number of applications is much higher than the number of
> file systems.

You don't fix all applications, only ones where data is critical and
their handling of it is poor.  MTAs like postfix don't have a problem
for example, they are generally written well.

> The file system should provide something that I call in-order
> semantics, i.e., that the disk state always represents an existing
> (possibly old) logical state of the FS, not some state that never
> existed, or some existing state with missing data.

ext3 and reiserfs have what amounts to this as an option right now.
It has some performance implications but I'm told works great.

I don't think the current XFS behaviour is undesirable or broken.


   --cw

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13  8:09                 ` Chris Wedgwood
@ 2004-07-13  9:34                   ` Anton Ertl
  2004-07-13  9:53                     ` Chris Wedgwood
  0 siblings, 1 reply; 52+ messages in thread
From: Anton Ertl @ 2004-07-13  9:34 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel, Jan Knutar, L A Walsh

Chris Wedgwood wrote:
> 
> On Tue, Jul 13, 2004 at 07:25:29AM +0000, Anton Ertl wrote:
> 
> > A secure FS must ensure that other people's deleted data does not
> > end up in the file.  AFAIK FSs don't record owners for free blocks,
> > so they can only ensure this by zeroing the blocks.
> 
> How can free blocks have an owner?  They wouldn't be free then.

It would be the former owner of the block.

> > So I doubt that you will see any different behaviour from an FS that
> > keeps only meta-data consistent and writes meta-data before data.
> 
> You do, some fs' will return stale data.

Stale data yes, but probably not stale data from blocks that were
formerly free (or the file system is insecure).

> > It's too hard to fix the applications, since there is no easy way to
> > test that they are really fixed.
> 
> No, it's not hard to fix the applications and it's easy to tell if
> they are fixed.

So, how do you tell?

> > Also, the number of applications is much higher than the number of
> > file systems.
> 
> You don't fix all applications, only ones where data is critical and
> their handling of it is poor.  MTAs like postfix don't have a problem
> for example, they are generally written well.

Where is data not critical?  I had such a problem even with a
widely-used application like GNU Emacs (many years ago, may be fixed
now), casting doubt on your claim that fixing the application is easy.

> > The file system should provide something that I call in-order
> > semantics, i.e., that the disk state always represents an existing
> > (possibly old) logical state of the FS, not some state that never
> > existed, or some existing state with missing data.
> 
> ext3 and reiserfs have what amounts to this as an option right now.
> It has some performance implications but I'm told works great.

You mean ext3 data=journal?  The last I heard about it was that it was
broken.

ext3 data=ordered will probably also work better in most cases than an
FS with eager meta-data updates (like, apparently, XFS), but I don't
think it guarantees in-order semantics.

- anton

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13  9:34                   ` Anton Ertl
@ 2004-07-13  9:53                     ` Chris Wedgwood
  2004-07-13 10:27                       ` Tim Connors
  2004-07-13 13:33                       ` Anton Ertl
  0 siblings, 2 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-13  9:53 UTC (permalink / raw)
  To: Anton Ertl; +Cc: linux-kernel, Jan Knutar, L A Walsh

On Tue, Jul 13, 2004 at 11:34:54AM +0200, Anton Ertl wrote:

> It would be the former owner of the block.

there might not be a former owner (in most cases there probably isn't)

> Stale data yes, but probably not stale data from blocks that were
> formerly free (or the file system is insecure).

some, like reiserfs, apparently do (or did, it may be different now,
i've not used reiserfs for a long time)

> So, how do you tell?

code inspection and/or testing

> Where is data not critical?

that depends on the person and situation, for me personally lots of my
data isn't critical.  certainly it's annoying to lose data but
probably not life threatening

> I had such a problem even with a widely-used application like GNU
> Emacs (many years ago, may be fixed now), casting doubt on your
> claim that fixing the application is easy.

emacs will usually rename the old file so at the very least you have
that

i've had emacs crash and whilst it's frustrating, it certainly isn't
as bad as losing an email (which may or may not be important, i'll
decide that after i read it)

> ext3 data=ordered will probably also work better in most cases than an
> FS with eager meta-data updates (like, apparently, XFS), but I don't
> think it guarantees in-order semantics.

i thought that was the point of it?  as best as i can tell the
metadata changes will become visible after the data has updated

however, in the case of something like kde/emacs/whatever you can
*still* lose data

consider something like:

	open with truncate
	crash

or more likely:

	open with truncate
	write some data
	crash

there is also an even more common case than either of these:

	open with truncate
	write data, get -ENOSPC
	application terminates/aborts

at which point you've stomped on your file.  it's not uncommon for
KDE to do this (even though the window would apparently be very small)
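
A minimal sketch of the last scenario, in Python, for anyone who wants
to see it happen.  The names careless_save and demo are made up for
this illustration, and the -ENOSPC is simulated by raising the error by
hand rather than by filling a disk.

```python
import errno
import os
import tempfile


def careless_save(path, data, fail_with_enospc=False):
    """Overwrite path in place: open with truncate, then write."""
    # O_TRUNC throws the old contents away before a single new byte exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        if fail_with_enospc:
            # Stand-in for a real write() failing on a full disk.
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    """Return what is left of a settings file after a failed save."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "settings")
        careless_save(path, b"old settings\n")
        try:
            careless_save(path, b"new settings\n", fail_with_enospc=True)
        except OSError:
            pass  # the application gives up here
        with open(path, "rb") as f:
            return f.read()
```

demo() comes back empty: the old settings were gone as soon as the
open returned, no matter which filesystem sits underneath.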



  --cw

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re:  XFS: how to NOT null files on fsck?
  2004-07-13  9:53                     ` Chris Wedgwood
@ 2004-07-13 10:27                       ` Tim Connors
  2004-07-13 10:38                         ` ismail dönmez
  2004-07-13 10:58                         ` Chris Wedgwood
  2004-07-13 13:33                       ` Anton Ertl
  1 sibling, 2 replies; 52+ messages in thread
From: Tim Connors @ 2004-07-13 10:27 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Anton Ertl, linux-kernel, Jan Knutar, L A Walsh

Chris Wedgwood <cw@f00f.org> said on Tue, 13 Jul 2004 02:53:00 -0700:
> at which point you've stomped on your file.  it's not uncommon for
> KDE to do this (even though the window would apparently be very small)

KDE is a piece of shit with regards to file handling.

It seems they never learnt the lessons of writing files in Unix that
have been learnt over the last 30 years.

How the hell can you afford to hose your entire WM because KDE decides
to write some obscure file at some time when the NFS servers just
happen to be temporarily down? What ever happened to the standard
practice of write to temp file, then atomic rename? What ever happened
to making backups of critical files before overwriting them? Furrfu.

Makes me glad I use a much more sane WM, but I pity those 3 users in
the space of a few minutes who lost all of their settings.

BTW, I have submitted the occasional bug to Debian because packages
will cause data loss to an /etc file if the disk happens to run out at
the wrong moment (quite a common occurrence for me). Furrfu people -
this is so bloody simple to get right.

-- 
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
The prolonged application of polysyllabic vocabulary infallibly
exercises a deleterious influence on the fecundity of expression,
rendering the ultimate tendancy apocryphal.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 10:27                       ` Tim Connors
@ 2004-07-13 10:38                         ` ismail dönmez
  2004-07-13 11:16                           ` Nick Piggin
  2004-07-13 10:58                         ` Chris Wedgwood
  1 sibling, 1 reply; 52+ messages in thread
From: ismail dönmez @ 2004-07-13 10:38 UTC (permalink / raw)
  To: Tim Connors
  Cc: Chris Wedgwood, Anton Ertl, linux-kernel, Jan Knutar, L A Walsh

Trying to start a flame war with bitching about KDE? How about trying
to solve or at least work around it? No? Then please shut the fuck up.

On Tue, 13 Jul 2004 20:27:30 +1000, Tim Connors
<tconnors@astro.swin.edu.au> wrote:
> KDE is a piece of shit with regards to file handling.
> 
> It seems they never learnt the lessons of writing files in Unix that
> have been learnt over the last 30 years.
> 
> How the hell can you afford to hose your entire WM because KDE decides
> to write some obscure file at some time when the NFS servers just
> happen to be temporarily down? What ever happened to the standard
> practice of write to temp file, then atomic rename? What ever happened
> to making backups of critical files before overwriting them? Furrfu.
> 
> Makes me glad I use a much more sane WM, but I pity those 3 users in
> the space of a few minutes who lost all of their settings.
> 
> BTW, I have submitted the occasional bug to Debian because packages
> will cause data loss to an /etc file if the disk happens to run out at
> the wrong moment (quite a common occurrence for me). Furrfu people -
> this is so bloody simple to get right.
> 


-- 
Time is what you make of it

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 10:38                         ` ismail dönmez
@ 2004-07-13 11:16                           ` Nick Piggin
  2004-07-13 12:52                             ` ismail dönmez
  0 siblings, 1 reply; 52+ messages in thread
From: Nick Piggin @ 2004-07-13 11:16 UTC (permalink / raw)
  To: ismail dönmez
  Cc: Tim Connors, Chris Wedgwood, Anton Ertl, linux-kernel, Jan Knutar,
	L A Walsh

ismail dönmez wrote:
> Trying to start a flame war with bitching about KDE? How about trying
> to solve or at least work around it? No? Then please shut the fuck up.
> 

This isn't really acceptable on this mailing list.

If you are offended by someone bitching about KDE, I politely
suggest that you unsubscribe.

Nick

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 11:16                           ` Nick Piggin
@ 2004-07-13 12:52                             ` ismail dönmez
  0 siblings, 0 replies; 52+ messages in thread
From: ismail dönmez @ 2004-07-13 12:52 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Tim Connors, Chris Wedgwood, Anton Ertl, linux-kernel, Jan Knutar,
	L A Walsh

OK, sorry for the bad language, but as an XFS & KDE user I would like to
see a better discussion; there may be some workarounds for this apart
from dumping KDE or XFS.

As for the KDE config handling, it happens in kconfig.cpp, which lives
under the kdelibs/kdecore directory in CVS, if you want to look at it.

Again, sorry for the bad language.

Cheers,
ismail

On Tue, 13 Jul 2004 21:16:42 +1000, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> This isn't really acceptable on this mailing list.
> 
> If you are offended by someone bitching about KDE, I politely
> suggest that you unsubscribe.
> 
> Nick
> 

-- 
Time is what you make of it

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 10:27                       ` Tim Connors
  2004-07-13 10:38                         ` ismail dönmez
@ 2004-07-13 10:58                         ` Chris Wedgwood
  1 sibling, 0 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-13 10:58 UTC (permalink / raw)
  To: Tim Connors, ismail dönmez
  Cc: Anton Ertl, linux-kernel, Jan Knutar, L A Walsh

On Tue, Jul 13, 2004 at 08:27:30PM +1000, Tim Connors wrote:

> KDE is a piece of shit with regards to file handling.

I personally would like to see KDE made more robust here (since I use
it myself).  I'm guessing it's probably not hard but I don't have a
good feeling as the few times I have hacked KDE I was pretty
disappointed how bad the code is.

That said, my guess is common code handles most of this stuff so the
right fixes in one or two places would probably cover everything.

> Makes me glad I use a much more sane WM, but I pity those 3 users in
> the space of a few minutes who lost all of their settings.

I back up my .kde every now and then as a precaution.  It's generally not
a problem for me but as mentioned I am aware KDE could be better in
this regard.

Losing window manager settings is a pain, losing data from knotes
and your bookmarks is very much more frustrating though.

On Tue, Jul 13, 2004 at 01:38:40PM +0300, ismail dönmez wrote:

> Trying to start a flame war with bitching about KDE?

I'm not sure he was.

> How about trying to solve or at least work around it?

He doesn't use it, why would he bother?

On the other hand, one day I might (I hope someone else does before
me, the KDE code is scary).

Since so many files are involved (I have 350 in my .kde) I suspect
properly fixing this is going to be more involved than write, fsync,
rename but it probably wouldn't be a bad place to start (only 43 of
them were modified in the last day).
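
For reference, the write, fsync, rename sequence for a single file
could look roughly like the sketch below.  It is Python rather than
anything taken from KDE, the names atomic_save and demo are invented
for the example, and it ignores details a real fix would need (such as
preserving the permissions of the file being replaced).

```python
import os
import tempfile


def atomic_save(path, data):
    """Replace path with data so readers see the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so that the rename
    # stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data reaches the disk before the rename
        os.replace(tmp, path)      # atomic rename over the old file
    except BaseException:
        # Any failure (ENOSPC included) leaves the old file untouched.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry as well (works on POSIX systems).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def demo():
    """Save twice, then fail a third save; return contents and listing."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bookmarks")
        atomic_save(path, b"old\n")
        atomic_save(path, b"new\n")
        try:
            atomic_save(path, None)   # the write fails, as ENOSPC would
        except TypeError:
            pass
        with open(path, "rb") as f:
            return f.read(), sorted(os.listdir(d))
```

After the failed third save the file still holds the second version and
no temp file is left behind.  Doing this for hundreds of small files
costs an fsync each, which is part of why the proper fix is more
involved.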

  --cw

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13  9:53                     ` Chris Wedgwood
  2004-07-13 10:27                       ` Tim Connors
@ 2004-07-13 13:33                       ` Anton Ertl
  2004-07-13 20:32                         ` Chris Wedgwood
  1 sibling, 1 reply; 52+ messages in thread
From: Anton Ertl @ 2004-07-13 13:33 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel, Jan Knutar, L A Walsh

Chris Wedgwood wrote:
> 
> On Tue, Jul 13, 2004 at 11:34:54AM +0200, Anton Ertl wrote:
> 
> > It would be the former owner of the block.
> 
> there might not be a former owner (in most cases there probably isn't)

If the owner of the file is not the former owner of the block, the FS
certainly should not put the block in the file.

> > So, how do you tell?
> 
> code inspection and/or testing

How do you test?

Code inspection is good, but I think it needs to be complemented by
testing.

> > Where is data not critical?
> 
> that depends on the person and situation, for me personally lots of my
> data isn't critical.  certainly it's annoying to lose data but
> probably not life threatening

We are balancing three things: making the file system nicer; working
around non-nice file-systems in the applications; and losing data
(even if it's just annoying rather than life-threatening).  IMO losing
data is the worst of these alternatives, and making file system nicer
is the best one.

> > I had such a problem even with a widely-used application like GNU
> > Emacs (many years ago, may be fixed now), casting doubt on your
> > claim that fixing the application is easy.
> 
> emacs will usually rename the old file so at the very least you have
> that

Emacs does that only once per session, and I tend to stay in an Emacs
session for days or weeks (and others probably do so, too).  Then
there is the auto-save file, but unfortunately eager meta-data updates
trash that, too (see
<http://www.complang.tuwien.ac.at/anton/sync-metadata-updates.html>).

> > ext3 data=ordered will probably also work better in most cases than an
> > FS with eager meta-data updates (like, apparently, XFS), but I don't
> > think it guarantees in-order semantics.
> 
> i thought that was the point of it?  as best as i can tell the
> metadata changes will become visible after the data has updated

Right, but that's not sufficient.  I am not an expert on ext3, but
from the description I have read that's all it guarantees.  If an
application does a meta-data update, and then a data update, the disk
state on crash might be that the data update was done and the
meta-data update was not, which is not any of the states that ever
existed logically.

> however, in the case of something like kde/emacs/whatever you can
> *still* lose data
> 
> consider something like:
> 
> 	open with truncate
> 	crash
> 
> or more likely:
> 
> 	open with truncate
> 	write some data
> 	crash
> 
> there is also an even more common case than either of these:
> 
> 	open with truncate
> 	write data, get -ENOSPC
> 	application terminates/aborts
> 
> at which point you've stomped on your file.  it's not uncommon for
> KDE to do this (even though the window would apparently be very small)

There are certainly ways that an application can lose data even with a
fully synchronous file system (which is the semantically nicest thing
you can ask for (ignoring transactions)), but I am not talking about
that.  Applications can be tested against that relatively easily by
killing the application and seeing if the files are ok.

I am talking about ways that data can be lost because the file system
does not have the nice semantics of a fully synchronous one.  The
in-order guarantee is something that can be implemented relatively
efficiently and that does not add any local ways that data can be lost
or become inconsistent (it does add ways to become inconsistent in
distributed applications, though, but there are fewer of these
applications around, and their programmers are more used to thinking
about concurrency, and thus hopefully better prepared to insert fsyncs
etc. at the right place).

- anton

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 13:33                       ` Anton Ertl
@ 2004-07-13 20:32                         ` Chris Wedgwood
  2004-07-13 22:42                           ` Bernd Eckenfels
  2004-07-14 18:49                           ` Anton Ertl
  0 siblings, 2 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-13 20:32 UTC (permalink / raw)
  To: Anton Ertl; +Cc: linux-kernel, Jan Knutar, L A Walsh

On Tue, Jul 13, 2004 at 03:33:23PM +0200, Anton Ertl wrote:

> If the owner of the file is not the former owner of the block, the FS
> certainly should not put the block in the file.

sorry, i don't understand that

> How do you test?

running the code and pressing reset or similar

> We are balancing three things: making the file system nicer; working
> around non-nice file-systems in the applications; and losing data
> (even if it's just annoying rather than life-threatening).  IMO losing
> data is the worst of these alternatives, and making file system nicer
> is the best one.

all these things have trade-offs, plenty of people are happy with the
current balance

for those that are not you can use something else

> Right, but that's not sufficient.  I am not an expert on ext3, but
> from the description I have read that's all it guarantees.  If an
> application does a meta-data update, and then a data update, the
> disk state on crash might be that the data update was done and the
> meta-data update was not, which is not any of the states that ever
> existed logically.

i don't see how for ordered updates that can occur,  otherwise they
wouldn't be ordered

> Applications can be tested against that relatively easily by killing
> the application and seeing if the files are ok.

i've seen both KDE and emacs lose data by crashing, does the fix for that
belong in the fs too?

> I am talking about ways that data can be lost because the file
> system does not have the nice semantics of a fully synchronous one.

mount -o sync

> The in-order guarantee is something that can be implemented
> relatively efficiently

let's see a patch, please give details of performance differences

i don't think the current situation is all bad or even undesirable,
yes, it is a balance and i think it's fine as-is

what you want is much more high-level semantics in the filesystem which
possibly will have large performance implications.  i'm not sure such
semantics are *required* to be in the fs or should be there

also, this is fixing the relatively rare case where the system
crashes, which to be quite honest is a bigger concern, why not seek
solutions that deal with more common failure modes like applications
crashing or behaving badly?


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 20:32                         ` Chris Wedgwood
@ 2004-07-13 22:42                           ` Bernd Eckenfels
  2004-07-14 18:49                           ` Anton Ertl
  1 sibling, 0 replies; 52+ messages in thread
From: Bernd Eckenfels @ 2004-07-13 22:42 UTC (permalink / raw)
  To: linux-kernel

In article <20040713203246.GB6614@taniwha.stupidest.org> you wrote:
> running the code and pressing reset or similar

hmm... perhaps an LD_PRELOAD wrapper (based on fakeroot) which logs all
filenames of writes with no fsync (in addition to renames and unlinks) may
make it easy to find them by name.

let me check that out, it could even overwrite close() (which will for sure
make the system slower)
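
The bookkeeping such a wrapper would have to do can be sketched without
the LD_PRELOAD part.  The Python below is only a model: it takes a list
of (operation, path) events in a made-up format, as might be parsed
from a trace, and reports paths that were written and then closed with
no fsync in between.  The names unsynced_files and TRACE are
hypothetical, and renames and unlinks are left out.

```python
def unsynced_files(events):
    """Return paths that were written and then closed without an fsync.

    events is an iterable of (operation, path) pairs; the format is
    invented for this sketch.
    """
    dirty = set()      # paths with writes not yet followed by an fsync
    suspects = []      # paths closed while still dirty, in order seen
    for op, path in events:
        if op == "write":
            dirty.add(path)
        elif op == "fsync":
            dirty.discard(path)
        elif op == "close" and path in dirty:
            dirty.discard(path)
            if path not in suspects:
                suspects.append(path)
    return suspects


# A made-up trace: one careless writer and one careful one.
TRACE = [
    ("write", "kderc"),
    ("close", "kderc"),
    ("write", "mailbox"),
    ("fsync", "mailbox"),
    ("close", "mailbox"),
]
```

unsynced_files(TRACE) picks out "kderc" only.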

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 20:32                         ` Chris Wedgwood
  2004-07-13 22:42                           ` Bernd Eckenfels
@ 2004-07-14 18:49                           ` Anton Ertl
  2004-07-14 19:00                             ` Chris Wedgwood
  1 sibling, 1 reply; 52+ messages in thread
From: Anton Ertl @ 2004-07-14 18:49 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: linux-kernel, Jan Knutar, L A Walsh

Chris Wedgwood wrote:
> 
> On Tue, Jul 13, 2004 at 03:33:23PM +0200, Anton Ertl wrote:
> 
> > If the owner of the file is not the former owner of the block, the FS
> > certainly should not put the block in the file.
> 
> sorry, i don't understand that

I'll try to put it another way:

If a free block was last allocated to a file belonging to user U, then
it may be ok (it's not a security problem) to put the block in a
file belonging to user U on recovery; if not, then it's certainly not
ok to put it into such a file without erasing it first.

If you don't understand that, please let me know where I am losing
you.

> > How do you test?
> 
> running the code and pressing reset or similar

Ok, I was thinking about this testing methodology, too.  That's not
what I call easy, and it has led to the current situation where many
applications are not safe against the not-so-nice crash semantics of
many file systems.

> > Right, but that's not sufficient.  I am not an expert on ext3, but
> > from the description I have read that's all it guarantees.  If an
> > application does a meta-data update, and then a data update, the
> > disk state on crash might be that the data update was done and the
> > meta-data update was not, which is not any of the states that ever
> > existed logically.
> 
> i don't see how for ordered updates that can occur,  otherwise they
> wouldn't be ordered

Full-blown ordering is hard in a file system that overwrites allocated
blocks.  E.g., consider writing a little bit to block A, then writing
something to block B, then writing something to block A again.  For
proper in-order semantics these writes have to occur in that order,
and the first write to block A must not already include the second
write; this becomes complicated with lazy writing.  Soft Updates do
funny things with the cache to get the ordering of operations right.
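
To make the A, B, A example concrete, here is a toy model in Python.
It is not how any real file system or cache works: it assumes a cache
that keeps only the latest contents of each block and may flush dirty
blocks in any order, with a crash possible after any flush.  All names
are invented for the sketch.

```python
from itertools import permutations


def logical_states(initial, writes):
    """All states the FS passed through when writes are applied in order."""
    state = dict(initial)
    states = [dict(state)]
    for block, value in writes:
        state[block] = value
        states.append(dict(state))
    return states


def crash_states_lazy(initial, writes):
    """Disk states reachable with a coalescing cache and unordered flushes."""
    cache = {}
    for block, value in writes:
        cache[block] = value          # a later write replaces an earlier one
    reachable = [dict(initial)]       # crash before anything was flushed
    for order in permutations(cache):
        disk = dict(initial)
        for block in order:
            disk[block] = cache[block]
            if disk not in reachable:
                reachable.append(dict(disk))
    return reachable


def never_existed(initial, writes):
    """Crash states that match no logical state of the file system."""
    history = logical_states(initial, writes)
    return [s for s in crash_states_lazy(initial, writes) if s not in history]


INITIAL = {"A": "a0", "B": "b0"}
WRITES = [("A", "a1"), ("B", "b1"), ("A", "a2")]
```

never_existed(INITIAL, WRITES) returns two states: block A already
holding the second write while B is still old, and B updated while A
holds neither write.  Neither ever existed logically, which is the
violation of in-order semantics described above.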

I don't know if ext3 data=ordered does any of this, but the
description "data updates are flushed to disk before transactions
commit" does not sound like it does.

OTOH, the data=ordered approach may be good enough for most
applications (which deal with whole files rather than changes to parts
of a file), so maybe any further effort will not provide enough
benefit to gain much popularity.  It's certainly much nicer than any
eager-meta-data-update system like (apparently) XFS.

> > Applications can be tested against that relatively easily by killing
> > the application and seeing if the files are ok.
> 
> i've seen both KDE and emacs lose data by crashing, does the fix for that
> belong in the fs too?

Application crashing?  No; I don't see how the file system can fix
that.

I have never seen Emacs lose data from crashing or (more frequently)
being killed.  Do you have an idea what went wrong in your case and
how they 

In any case, if the developers have a hard time protecting even
against application crashes/kills, I would not expect them to go to
the effort and succeed in protecting against not-so-nice FS crash
semantics.

> > I am talking about ways that data can be lost because the file
> > system does not have the nice semantics of a fully synchronous one.
> 
> mount -o sync

Very slow, and I would not trust it, because it probably receives very
little testing.

> > The in-order guarantee is something that can be implemented
> > relatively efficiently
> 
> let's see a patch, please give details of performance differences

Take a look at <http://www.complang.tuwien.ac.at/czezatke/lfs.html>.
For performance results look at Section 7 of
<http://www.complang.tuwien.ac.at/papers/czezatke%26ertl00/>.  I would
not recommend using that stuff instead of any of the established FSs,
but it may be good enough to answer your questions.

> what you want is much more high-level semantics in the filesystem which
> possibly will have large performance implications.

I don't think that the performance implications are large in typical
situations, in contrast to the solution you proposed (mount -o sync).

>  i'm not sure such
> semantics are *required* to be in the fs or should be there

Required by whom?  Me?  Yes!

> also, this is fixing the relatively rare case where the system
> crashes, which to be quite honest is a bigger concern, why not seek
> solutions that deal with more common failure modes like applications
> crashing or behaving badly?

What I am proposing is extending the solutions for that to also work
for the system crash case.  This will increase the incentive for the
programmers to fix the application crash case.

BTW, the way my current hardware acts up, system crashes are more
frequent than application crashes, and certainly more frequent than
applications behaving badly.

- anton

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-14 18:49                           ` Anton Ertl
@ 2004-07-14 19:00                             ` Chris Wedgwood
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-14 19:00 UTC (permalink / raw)
  To: Anton Ertl; +Cc: linux-kernel, Jan Knutar, L A Walsh

On Wed, Jul 14, 2004 at 08:49:03PM +0200, Anton Ertl wrote:

> If a free block was last allocated to a file belonging to user U,
> then it may be ok (it's not a security problem) to put the block in
> a file belonging to user U on recovery; if not, then it's certainly
> not ok to put it into such a file without erasing it first.

that's still a big security problem, consider files with restricted
paths all of a sudden appearing or globally visible root-owned files
appearing with old root-only data in them

> I have never seen Emacs lose data from crashing or (more frequently)
> being killed.  Do you have an idea what went wrong in your case and
> how they

no idea, for a while it would segfault when you resized the window and
you would lose everything, (no crash handler to attempt to save things
i guess)

> Take a look at <http://www.complang.tuwien.ac.at/czezatke/lfs.html>.

apples and oranges

> BTW, the way my current hardware acts up, system crashes are more
> frequent than application crashes, and certainly more frequent than
> applications behaving badly.

you need new hardware or a new system then

this entire thread is dragging on and seems to have become a religious
discussion about how XFS should behave because various people don't like
its current behavior, despite things having worked that way for many,
many years

i don't care if people use XFS or not,  there are plenty of
alternatives


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13  7:25               ` Anton Ertl
  2004-07-13  8:09                 ` Chris Wedgwood
@ 2004-07-13 22:24                 ` Helge Hafting
  2004-07-13 22:39                   ` Chris Wedgwood
                                     ` (2 more replies)
  1 sibling, 3 replies; 52+ messages in thread
From: Helge Hafting @ 2004-07-13 22:24 UTC (permalink / raw)
  To: Anton Ertl; +Cc: linux-kernel, Chris Wedgwood, Jan Knutar, L A Walsh

On Tue, Jul 13, 2004 at 07:25:29AM +0000, Anton Ertl wrote:
> Chris Wedgwood <cw@f00f.org> writes:
> >XFS does *not* zero files, it simply returns zeros for unwritten
> >extents.  If you open an existing file and scribble all over it, you
> >might see the old data during a crash, or the new data if it was
> >flushed.  You shouldn't see zeros though.
> >
> >What does happen though, is that dotfiles are truncated and rewritten,
> >if the data blocks aren't flushed you will get zeros back because the
> >extents were unwritten.  This is really the only sensible thing to do
> >given the circumstances.
> >
> >My guess is that with other fs' (when journaling metadata only) the
> >blocks allocated for the newly written data are *usually* the same as
> >the recently freed blocks from the truncate so things appear to work
> >but in reality it's probably mostly luck.
> 
> A secure FS must ensure that other people's deleted data does not end
> up in the file.  AFAIK FSs don't record owners for free blocks, so
> they can only ensure this by zeroing the blocks.  So I doubt that you
> will see any different behaviour from an FS that keeps only meta-data
> consistent and writes meta-data before data.
> 

There is another solution - zero blocks when freeing them. (Or
put them on a list for later zeroing when the fs isn't busy,
in order to keep good performance)

With this approach you don't need to zero a half-written
block after a crash, which means you destroy less data.

Helge Hafting 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 22:24                 ` Helge Hafting
@ 2004-07-13 22:39                   ` Chris Wedgwood
  2004-07-13 23:23                   ` Bernd Eckenfels
  2004-07-14 18:53                   ` Anton Ertl
  2 siblings, 0 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-13 22:39 UTC (permalink / raw)
  To: Helge Hafting; +Cc: Anton Ertl, linux-kernel, Jan Knutar, L A Walsh

On Wed, Jul 14, 2004 at 12:24:11AM +0200, Helge Hafting wrote:

> There is another solution - zero blocks when freeing them.

slow

> (Or put them on a list for later zeroing when the fs isn't busy, in
> order to keep good performance)

complicated, doesn't buy us anything, it also means the blocks are
tied up whilst they are being zeroed (consider a truncate on a
multi-gb file, fairly common)

> With this approach you don't need to zero a half-written
> block after a crash, which means you destroy less data.

it doesn't zero after a crash, what happens is the blocks never make
it to disk and the metadata (which did make it to disk) reflects this
so read returns nulls

as is, you can truncate a multi-gb file, write over it and the only IO
you see will be the new data being written out,  zeroing in between
would be horribly painful


  --cw

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 22:24                 ` Helge Hafting
  2004-07-13 22:39                   ` Chris Wedgwood
@ 2004-07-13 23:23                   ` Bernd Eckenfels
  2004-07-14 18:53                   ` Anton Ertl
  2 siblings, 0 replies; 52+ messages in thread
From: Bernd Eckenfels @ 2004-07-13 23:23 UTC (permalink / raw)
  To: linux-kernel

In article <20040713222411.GA1035@hh.idb.hist.no> you wrote:
> With this approach you don't need to zero a half-written
> block after a crash, which means you destroy less data.

which  does not change the fact that the block contains zeros if it was not
written. :)

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-13 22:24                 ` Helge Hafting
  2004-07-13 22:39                   ` Chris Wedgwood
  2004-07-13 23:23                   ` Bernd Eckenfels
@ 2004-07-14 18:53                   ` Anton Ertl
  2 siblings, 0 replies; 52+ messages in thread
From: Anton Ertl @ 2004-07-14 18:53 UTC (permalink / raw)
  To: Helge Hafting; +Cc: linux-kernel, Chris Wedgwood, Jan Knutar, L A Walsh

Helge Hafting wrote:
> There is another solution - zero blocks when freeing them. (Or
> put them on a list for later zeroing when the fs isn't busy,
> in order to keep good performance)
> 
> With this approach you don't need to zero a half-written
> block after a crash, which means you destroy less data.

I don't think half-written blocks are the problem (at least not a
frequent one).  More typical is written meta-data without written
data.  In that case your solution will give the same result as the
current solution, just at higher cost.

- anton


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-10 18:55         ` Norberto Bensa
  2004-07-10 19:19           ` Chris Wedgwood
@ 2004-07-10 19:33           ` Andreas Schwab
  2004-07-10 19:40             ` Chris Wedgwood
  2004-07-10 19:46             ` Norberto Bensa
  2004-07-11  1:21           ` Gopikrishnan Sidhardhan
  2 siblings, 2 replies; 52+ messages in thread
From: Andreas Schwab @ 2004-07-10 19:33 UTC (permalink / raw)
  To: Norberto Bensa; +Cc: Chris Wedgwood, Jan Knutar, L A Walsh, linux-kernel

Norberto Bensa <norberto+linux-kernel@bensa.ath.cx> writes:

> Chris Wedgwood wrote:
>> XFS does not journal data.
>
> I think we all know that. The point, why the hell does it null files?

Security.  You don't want old contents of /etc/shadow appear in random
files after a crash.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-10 19:33           ` Andreas Schwab
@ 2004-07-10 19:40             ` Chris Wedgwood
  2004-07-10 19:46             ` Norberto Bensa
  1 sibling, 0 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-10 19:40 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Norberto Bensa, Jan Knutar, L A Walsh, linux-kernel

On Sat, Jul 10, 2004 at 09:33:34PM +0200, Andreas Schwab wrote:

> Security.  You don't want old contents of /etc/shadow appear in
> random files after a crash.

If we had a different log format we could determine if the blocks were
newly allocated and avoid zeroing that for existing files, we could
even do the code to aggregate transactions which would be *really*
nice for some things.  Lots of work though.


  --cw

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-10 19:33           ` Andreas Schwab
  2004-07-10 19:40             ` Chris Wedgwood
@ 2004-07-10 19:46             ` Norberto Bensa
  2004-07-10 20:03               ` Chris Wedgwood
  1 sibling, 1 reply; 52+ messages in thread
From: Norberto Bensa @ 2004-07-10 19:46 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Chris Wedgwood, Jan Knutar, L A Walsh, linux-kernel

Andreas Schwab wrote:
> Norberto Bensa <norberto+linux-kernel@bensa.ath.cx> writes:
> > Chris Wedgwood wrote:
> >> XFS does not journal data.
> >
> > I think we all know that. The point, why the hell does it null files?
>
> Security.  You don't want old contents of /etc/shadow appear in random
> files after a crash.

Wow. You're telling me that XFS doesn't know if a given piece of the log is 
from file-a or file-b and just in case it zeroes its contents? 

If that's true, XFS has moved to my never-ever-use-it-again list.

Thanks,
Norberto

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-10 19:46             ` Norberto Bensa
@ 2004-07-10 20:03               ` Chris Wedgwood
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wedgwood @ 2004-07-10 20:03 UTC (permalink / raw)
  To: Norberto Bensa; +Cc: Andreas Schwab, Jan Knutar, L A Walsh, linux-kernel

On Sat, Jul 10, 2004 at 04:46:27PM -0300, Norberto Bensa wrote:

> Wow. You're telling me that XFS doesn't know if a given piece of the
> log is from file-a or file-b and just in case it zeroes its
> contents?

No.

The log-replay can't tell where that block came from --- it might have
been newly allocated and therefore need zeroing.

> If that's true, XFS has moved to my never-ever-use-it-again list.

There are many alternatives.


   --cw

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-10 18:55         ` Norberto Bensa
  2004-07-10 19:19           ` Chris Wedgwood
  2004-07-10 19:33           ` Andreas Schwab
@ 2004-07-11  1:21           ` Gopikrishnan Sidhardhan
  2 siblings, 0 replies; 52+ messages in thread
From: Gopikrishnan Sidhardhan @ 2004-07-11  1:21 UTC (permalink / raw)
  To: linux-kernel

Norberto Bensa wrote:
> Chris Wedgwood wrote:
> 
>>XFS does not journal data.
> 
> 
> I think we all know that. The point, why the hell does it null files?

See http://www-106.ibm.com/developerworks/linux/library/l-fs9.html
- under the section 'Journaling'.

Thanks,
--GS


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-07-09 16:37 ` L A Walsh
  2004-07-09 21:59   ` Chris Wedgwood
@ 2004-07-29  1:30   ` Nathan Scott
  2004-08-03 18:31     ` L A Walsh
  1 sibling, 1 reply; 52+ messages in thread
From: Nathan Scott @ 2004-07-29  1:30 UTC (permalink / raw)
  To: Norberto Bensa, L A Walsh; +Cc: linux-kernel, linux-xfs

On Fri, Jul 09, 2004 at 09:37:48AM -0700, L A Walsh wrote:
> It's a feature! :-)
> 
> It's been in the code for years to randomly write nulls to some files 

Pfft, nonsense.  The problem relates to an updated inode size
being flushed ahead of the data behind it (hence a size update
can make it out before delayed allocate extents do, and we end
up with a hole beyond the end of file, which reads as zeroes).

> Apparently not easily reproduced, no one has a clue why it does it.  
> Just does. 

No, it's actually well known why it behaves this way.
We are looking into ways to address this, and have some
ideas - the trick is fixing it without hurting write
performance - which we will do, it's just not trivial.

There are several techniques to reduce the impact of this
behaviour, as others have described (or see the linux-xfs
archives).

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 52+ messages in thread
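
The symptom Nathan describes (a size update reaching disk without the
data behind it, so the range reads back as zeroes) can be imitated from
user space with a sparse file.  What follows is only a minimal sketch,
assuming a POSIX system.  The function name and the scratch path are
made up for illustration, and it mimics the symptom only, not what XFS
does internally during delayed allocation.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Extend an empty file to `size` bytes without writing any data, then
 * read it back.  Returns 1 if every byte read is zero, 0 if any byte is
 * non-zero, and -1 on error.  (Hypothetical helper, illustration only.)
 */
int hole_reads_as_zeroes(const char *path, off_t size)
{
	unsigned char buf[4096];
	ssize_t n;
	off_t total = 0;
	int result = 1;
	int fd;

	assert(path != NULL && size > 0);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;

	/* Only the file size changes here; no data blocks are written. */
	if (ftruncate(fd, size) != 0) {
		close(fd);
		unlink(path);
		return -1;
	}

	/* Reading the hole returns zero bytes up to the recorded size. */
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		for (ssize_t i = 0; i < n; i++)
			if (buf[i] != 0)
				result = 0;
		total += n;
	}

	close(fd);
	unlink(path);
	if (n < 0 || total != size)
		return -1;
	return result;
}
```

Called with a scratch path and a size of 1335, this should report a
file of the right length that contains nothing but zero bytes, which is
what a file looks like when its size made it to disk and its data did
not.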

* Re: XFS: how to NOT null files on fsck?
  2004-07-29  1:30   ` Nathan Scott
@ 2004-08-03 18:31     ` L A Walsh
  2004-08-04  0:48       ` Andi Kleen
  2004-08-05  8:16       ` Helge Hafting
  0 siblings, 2 replies; 52+ messages in thread
From: L A Walsh @ 2004-08-03 18:31 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-kernel, linux-xfs

On 07-28-04 Nathan Scott blissfully wrote:

>On Fri, Jul 09, 2004 at 09:37:48AM -0700, L A Walsh wrote:
>
>>It's a feature! :-)
>>It's been in the code for years to randomly write nulls to some files 
>>
>Pfft, nonsense. 
>
The above was meant somewhat tongue-in-cheek, ya know...

> The problem relates to an updated inode size
>being flushed ahead of the data behind it (hence a size update
>can make it out before delayed allocate extents do, and we end
>up with a hole beyond the end of file, which reads as zeroes).
>
I believe I understand the scenario you are talking about, but I don't
think it fits the examples I have referred to.  In particular, "/etc/fstab".
I update 'fstab' on Tuesday, say, it works fine...gets backed up just
fine...and I forget about it and move on.  Then, 2-3 days later, my
system crashes and doesn't want to come up.  That's odd, usually after
a crash, it just burps a bit and comes back up.  I grumble and go for
single user.  Turns out my 1.2k fstab file is all "nulls".  Coincidentally,
I find, _maybe_, a couple of other files written around the same time,
also nulled, including times when the nulls appeared in the system log
for that time period! 

Now I know it takes a while before data may end up on disk and that it
may not go out to disk in an ordered fashion, but 2-3 days?  This isn't
a case of a multi-extent file.  My current fstab is only 1335 bytes long.
I doubt it has ever been more than 2.  

My filesystems all use the Allocation unit (AU) size allowed.  I wish
for something larger than a 4k AU size but I'm told it is limited by
the linux page size and to find a PC that uses the IA64 page size to
use larger file AU size (but I haven't seen too many of these IA64 machines
available from Dell or Gateway...:-)  Maybe the code in FAT32 that handles
larger AU's could be ported to XFS?  If FAT32 can do it...nevermind...
I'm sure there are more important issues on the plate.

>>Apparently not easily reproduced, no one has a clue why it does it.  
>>Just does. 
>>
>No, it's actually well known why it behaves this way.
>We are looking into ways to address this, and have some
>ideas - the trick is fixing it without hurting write
>performance - which we will do, it's just not trivial.
>
You could increase the max AU size :-)  But more seriously, is my
example of writing a 1 AU sized file that becomes zeroed days later
an example of the problem you are speaking of?

>There are several techniques to reduce the impact of this
>behaviour, as others have described (or see the linux-xfs
>archives).
>
Like setting the disk for synchronous writes?  Why not something
in between, like guaranteeing the info on a mostly quiescent machine
will be written to disk within an hour or so?  Or is that not "it"?

I haven't seen an incidence of this behavior in several months on
my machines so my particular problem may have been fixed and the
problem you speak of is unrelated to my own, but the number of unplanned 
shutdowns on my system has only increased recently, since I upgraded
to the stable 2.6 series, whereas before, with 2.4, it could be months
between "blue screens".

Sad was the day that it was decided that the linux-kernel "corp" decided
on feature development vs. stability in the "stable" kernel series. 
Isn't that criticism lodged most often against MS?  It seems most 
"companies",
incorporated or not, seem to follow similar growth patterns.  Wasn't
there an Eastern saying about choosing your enemies wisely for you
will eventually become like them?

-l

^ permalink raw reply	[flat|nested] 52+ messages in thread
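
To check after a crash whether a file such as the fstab described above
has come back as all nulls, something along these lines would do.  It
is a sketch only; the helper name is hypothetical and nothing like it
is mentioned in the thread.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if the file is non-empty and every byte in it is NUL,
 * 0 if it is empty or contains any other byte, and -1 if it cannot
 * be opened or read.  (Hypothetical helper, illustration only.)
 */
int file_is_all_nulls(const char *path)
{
	FILE *f;
	long count = 0;
	int c, err;

	assert(path != NULL);

	f = fopen(path, "rb");
	if (f == NULL)
		return -1;

	while ((c = getc(f)) != EOF) {
		if (c != 0) {
			/* Found real data, so the file was not nulled. */
			fclose(f);
			return 0;
		}
		count++;
	}

	err = ferror(f);
	fclose(f);
	if (err)
		return -1;
	return count > 0 ? 1 : 0;
}
```

Running it over files modified in the days before the crash (for
example, everything under /etc) would show which ones need restoring
from backup.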

* Re: XFS: how to NOT null files on fsck?
  2004-08-03 18:31     ` L A Walsh
@ 2004-08-04  0:48       ` Andi Kleen
  2004-08-04  6:37         ` L A Walsh
  2004-08-05  8:16       ` Helge Hafting
  1 sibling, 1 reply; 52+ messages in thread
From: Andi Kleen @ 2004-08-04  0:48 UTC (permalink / raw)
  To: L A Walsh; +Cc: linux-kernel, linux-xfs, nathans

L A Walsh <lkml@tlinx.org> writes:

> Now I know it takes a while before data may end up on disk and that it
> may not go out to disk in an ordered fashion, but 2-3 days?  This isn't
> a case of a multi-extent file.  My current fstab is only 1335 bytes long.
> I doubt it has ever been more than 2.

Is this perhaps on a laptop? Some scripts for laptop use configure
insanely long data flush times to conserve HD spin time. Sometimes
it is even completely turned off (laptop mode). The extent 
flush is dependent on the configured bdflush or pdflush data
timeouts.

The truncate is independent from this because it is flushed with a 
different path.

-Andi

^ permalink raw reply	[flat|nested] 52+ messages in thread
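
On 2.6 kernels the pdflush timeouts Andi refers to are exposed under
/proc/sys/vm, so it is easy to see whether a laptop script has
stretched them.  A small sketch for reading them; the function name is
made up for illustration, and the tunable names given in the note below
are the ones documented for 2.6, which may differ on other kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Read one integer tunable from /proc/sys/vm/<name>.
 * Returns 0 and stores the value in *out on success, -1 on failure
 * (for example, if the file does not exist on this kernel).
 * (Hypothetical helper, illustration only.)
 */
int read_vm_tunable(const char *name, long *out)
{
	char path[256];
	FILE *f;
	int len, ok;

	assert(name != NULL && out != NULL);

	len = snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

The two values of interest are "dirty_expire_centisecs" (how old dirty
data may get before it is written out) and "dirty_writeback_centisecs"
(how often the flusher wakes up), both in hundredths of a second.  On
many 2.6 kernels the defaults are 3000 and 500; values far above that
would support the laptop-script theory.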

* Re: XFS: how to NOT null files on fsck?
  2004-08-04  0:48       ` Andi Kleen
@ 2004-08-04  6:37         ` L A Walsh
  0 siblings, 0 replies; 52+ messages in thread
From: L A Walsh @ 2004-08-04  6:37 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, linux-xfs, nathans

Not laptop, 2-CPU workstation used as home "server". :-)



Andi Kleen wrote:

>L A Walsh <lkml@tlinx.org> writes:
>
>  
>
>>Now I know it takes a while before data may end up on disk and that it
>>may not go out to disk in an ordered fashion, but 2-3 days?  This isn't
>>a case of a multi-extent file.  My current fstab is only 1335 bytes long.
>>I doubt it has ever been more than 2.
>>    
>>
>
>Is this perhaps on a laptop? Some scripts for laptop use configure
>insanely long data flush times to conserve HD spin time. Sometimes
>it is even completely turned off (laptop mode). The extent 
>flush is dependent on the configured bdflush or pdflush data
>timeouts.
>
>The truncate is independent from this because it is flushed with a 
>different path.
>
>-Andi
>
>
>  
>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-08-03 18:31     ` L A Walsh
  2004-08-04  0:48       ` Andi Kleen
@ 2004-08-05  8:16       ` Helge Hafting
  2004-08-06  1:10         ` Nathan Scott
  1 sibling, 1 reply; 52+ messages in thread
From: Helge Hafting @ 2004-08-05  8:16 UTC (permalink / raw)
  To: L A Walsh; +Cc: linux-kernel

L A Walsh wrote:

>
>> The problem relates to an updated inode size
>> being flushed ahead of the data behind it (hence a size update
>> can make it out before delayed allocate extents do, and we end
>> up with a hole beyond the end of file, which reads as zeroes).
>>
> I believe I understand the scenario you are talking about, but I don't
> think it fits the examples I have referred to.  In particular, 
> "/etc/fstab".
> I update 'fstab' on Tuesday, say, it works fine...gets backed up just
> fine...and I forget about it and move on.  Then, 2-3 days later, my
> system crashes and doesn't want to come up.  That's odd, usually after
> a crash, it just burps a bit and comes back up.  I grumble and go for
> single user.  Turns out my 1.2k fstab file is all "nulls".  
> Coincidentally,
> I find, _maybe_, a couple of other files written around the same time,
> also nulled, including times when the nulls appeared in the system log
> for that time period!
> Now I know it takes a while before data may end up on disk and that it
> may not go out to disk in an ordered fashion, but 2-3 days?  

Seems strange to me, but the amount of delay is entirely up to the 
filesystem.

> This isn't
> a case of a multi-extent file.  My current fstab is only 1335 bytes long.
> I doubt it has ever been more than 2. 
> My filesystems all use the Allocation unit (AU) size allowed.  I wish
> for something larger than a 4k AU size but I'm told it is limited by
> the linux page size and to find a PC that uses the IA64 page size to
> use larger file AU size (but I haven't seen too many of these IA64 
> machines
> available from Dell or Gateway...:-)  Maybe the code in FAT32 that 
> handles
> larger AU's could be ported to XFS?  If FAT32 can do it...nevermind...
> I'm sure there are more important issues on the plate.
>
>>> Apparently not easily reproduced, no one has a clue why it does it.  
>>> Just does.
>>
>> No, it's actually well known why it behaves this way.
>> We are looking into ways to address this, and have some
>> ideas - the trick is fixing it without hurting write
>> performance - which we will do, it's just not trivial.
>>
> You could increase the max AU size :-)  But more seriously, is my
> example of writing a 1 AU sized file that becomes zeroed days later
> an example of the problem you are speaking of?
>
>> There are several techniques to reduce the impact of this
>> behaviour, as others have described (or see the linux-xfs
>> archives).
>>
> Like setting the disk for synchronous writes?  Why not something
> in between, like guaranteeing the info on a mostly quiescent machine
> will be written to disk within an hour or so?  Or is that not "it"?
>
This should be trivial.  Edit your crontab, so that cron will
run "sync" once per hour.  Everything queued for writing when the
"sync" command is issued will be on disk when the command finishes.
So this guarantees that nothing waits more than 1 hour. (Sync is usually
over in a few seconds on a home machine.  There should be no more
lost "old" files unless the fs has a bug.)

You may also want to run "sync" manually before doing something that
risks crashing. (Such as moving a live machine, dubious hotplugging,
testing beta device drivers . . .)

> I haven't seen an incidence of this behavior in several months on
> my machines so my particular problem may have been fixed and the
> problem you speak of is unrelated to my own, but the number of 
> unplanned shutdowns on my system has only increased recently, since I 
> upgraded
> to the stable 2.6 series, whereas before, with 2.4, it could be months
> between "blue screens".

You may want to keep using 2.4 for a while then - it probably _is_ a lot 
more stable.
It has been stabilizing for the entire 2.5 development time,
2.6 stabilization has just begun!

>
> Sad was the day that it was decided that the linux-kernel "corp" decided
> on feature development vs. stability in the "stable" kernel series. 
> Isn't that criticism lodged most often against MS. 

Not a big problem in this case.  If XFS isn't stable enough for you, 
consider one
of the many other filesystems.  ext3 or reiserfs for journalling, or 
plain old
ext2.  The nice thing about having many features, is that you have a set
to choose from.

Helge Hafting

^ permalink raw reply	[flat|nested] 52+ messages in thread
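
Besides an hourly "sync" from cron, a program that rewrites small files
can flush them itself instead of waiting for the kernel.  The sketch
below is not something proposed in this thread; it is the usual
write-to-temporary, fsync(), rename() pattern, shown under the
assumption of a POSIX system.  The function name and the ".tmp" suffix
are invented for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Replace `path` with `len` bytes of `data` so that, after a crash,
 * the file holds either the old contents or the new ones.
 * Returns 0 on success, -1 on error.
 * (Hypothetical helper, illustration only.)
 */
int replace_file_durably(const char *path, const char *data, size_t len)
{
	char tmp[4096];
	size_t done = 0;
	int n, fd;

	assert(path != NULL && data != NULL);

	n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* write() may be short, so loop until everything is handed over. */
	while (done < len) {
		ssize_t w = write(fd, data + done, len - done);
		if (w < 0) {
			close(fd);
			unlink(tmp);
			return -1;
		}
		done += (size_t)w;
	}

	/* Force the new data to disk before the name is switched over. */
	if (fsync(fd) != 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) != 0) {
		unlink(tmp);
		return -1;
	}

	/* rename() swaps the new file in place of the old one atomically. */
	if (rename(tmp, path) != 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

The old file is never truncated in place, so there is no window in
which its size has been updated but its data has not.  A stricter
version would also fsync() the containing directory after the rename;
that step is left out here to keep the sketch short.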

* Re: XFS: how to NOT null files on fsck?
  2004-08-05  8:16       ` Helge Hafting
@ 2004-08-06  1:10         ` Nathan Scott
  2004-08-06  1:34           ` Andrew Morton
  0 siblings, 1 reply; 52+ messages in thread
From: Nathan Scott @ 2004-08-06  1:10 UTC (permalink / raw)
  To: L A Walsh, Helge Hafting; +Cc: linux-kernel

On Thu, Aug 05, 2004 at 10:16:02AM +0200, Helge Hafting wrote:
> L A Walsh wrote:
> >Now I know it takes a while before data may end up on disk and that it
> >may not go out to disk in an ordered fashion, but 2-3 days?  
> 
> Seems strange to me, but the amount of delay is entirely up to the 
> filesystem.

The flushing of dirty file data is actually performed by
kernel threads outside of the individual filesystems.

I cannot explain a 2/3 day wait for data to get flushed,
something really strange going on for you there.

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: XFS: how to NOT null files on fsck?
  2004-08-06  1:10         ` Nathan Scott
@ 2004-08-06  1:34           ` Andrew Morton
  0 siblings, 0 replies; 52+ messages in thread
From: Andrew Morton @ 2004-08-06  1:34 UTC (permalink / raw)
  To: Nathan Scott; +Cc: lkml, helge.hafting, linux-kernel

Nathan Scott <nathans@sgi.com> wrote:
>
>  On Thu, Aug 05, 2004 at 10:16:02AM +0200, Helge Hafting wrote:
>  > L A Walsh wrote:
>  > >Now I know it takes a while before data may end up on disk and that it
>  > >may not go out to disk in an ordered fashion, but 2-3 days?  
>  > 
>  > Seems strange to me, but the amount of delay is entirely up to the 
>  > filesystem.
> 
>  The flushing of dirty file data is actually performed by
>  kernel threads outside of the individual filesystems.
> 
>  I cannot explain a 2/3 day wait for data to get flushed,
>  something really strange going on for you there.

Well there was a writeback bug which could cause files to not get written
back ever.  Perhaps an unmount would cause writeback but nothing else would.

It was fixed by the below patch.  The situation will only arise with a
combination of a race and a data-synchronising writeback (O_SYNC, fsync,
etc).

It's unlikely that this is the cause of this report though.




From: Miklos Szeredi <miklos@szeredi.hu>

This patch fixes a hard-to-trigger condition, where the inode is on the
inode_in_use list while its state is dirty.  In this state dirty pages are
not written back in sync() or from kupdate, only from direct page reclaim. 
And this causes a livelock in balance_dirty_pages after a while.

The actual sequence of events required to get into this state is:

thread   function                             inode state         inode list
----------------------------------------------------------------------------
1 __sync_single_inode (background)            I_DIRTY             sb->s_io
1 do_writepages ...                           I_LOCKED
2 __writeback_single_inode (sync) sleeps      I_LOCKED
1 __sync_single_inode (background) finish     0                   inode_in_use
2 __writeback_single_inode (sync) wakeup      0
2 __sync_single_inode (sync)                  0  
2 do_writepages ...                           I_LOCKED
3 __mark_inode_dirty                          I_LOCKED | I_DIRTY
2 __sync_single_inode (sync) finish           I_DIRTY             left on
                                                                  inode_in_use

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/fs/fs-writeback.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

diff -puN fs/fs-writeback.c~fix-inode-state-corruption-268-rc1-bk1 fs/fs-writeback.c
--- 25/fs/fs-writeback.c~fix-inode-state-corruption-268-rc1-bk1	Fri Jul 16 15:06:57 2004
+++ 25-akpm/fs/fs-writeback.c	Fri Jul 16 15:06:57 2004
@@ -213,8 +213,9 @@ __sync_single_inode(struct inode *inode,
 		} else if (inode->i_state & I_DIRTY) {
 			/*
 			 * Someone redirtied the inode while were writing back
-			 * the pages: nothing to do.
+			 * the pages.
 			 */
+			list_move(&inode->i_list, &sb->s_dirty);
 		} else if (atomic_read(&inode->i_count)) {
 			/*
 			 * The inode is clean, inuse
_


^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread

Thread overview: 52+ messages
2004-07-05  5:47 XFS: how to NOT null files on fsck? Norberto Bensa
2004-07-09 16:37 ` L A Walsh
2004-07-09 21:59   ` Chris Wedgwood
2004-07-10 18:33     ` L A Walsh
2004-07-10 18:43       ` Chris Wedgwood
2004-07-10 21:24         ` Bernd Eckenfels
2004-07-11 21:54           ` Helge Hafting
2004-07-12 17:56             ` H. Peter Anvin
2004-07-12 19:59               ` Chris Wedgwood
2004-07-12 20:32                 ` H. Peter Anvin
2004-07-12 22:29                 ` Bernd Eckenfels
2004-07-12 23:03       ` Bernd Eckenfels
2004-07-12 23:14         ` Chris Wedgwood
2004-07-10 18:43     ` Jan Knutar
2004-07-10 18:46       ` Chris Wedgwood
2004-07-10 18:55         ` Norberto Bensa
2004-07-10 19:19           ` Chris Wedgwood
2004-07-12 21:20             ` Chris Wedgwood
2004-07-12 22:40               ` L A Walsh
2004-07-12 22:53                 ` Chris Wedgwood
2004-07-13  1:44                   ` Bernd Eckenfels
2004-07-13  5:24                     ` Chris Wedgwood
     [not found]             ` <2hgxc-5x9-9@gated-at.bofh.it>
2004-07-13  7:25               ` Anton Ertl
2004-07-13  8:09                 ` Chris Wedgwood
2004-07-13  9:34                   ` Anton Ertl
2004-07-13  9:53                     ` Chris Wedgwood
2004-07-13 10:27                       ` Tim Connors
2004-07-13 10:38                         ` ismail dönmez
2004-07-13 11:16                           ` Nick Piggin
2004-07-13 12:52                             ` ismail dönmez
2004-07-13 10:58                         ` Chris Wedgwood
2004-07-13 13:33                       ` Anton Ertl
2004-07-13 20:32                         ` Chris Wedgwood
2004-07-13 22:42                           ` Bernd Eckenfels
2004-07-14 18:49                           ` Anton Ertl
2004-07-14 19:00                             ` Chris Wedgwood
2004-07-13 22:24                 ` Helge Hafting
2004-07-13 22:39                   ` Chris Wedgwood
2004-07-13 23:23                   ` Bernd Eckenfels
2004-07-14 18:53                   ` Anton Ertl
2004-07-10 19:33           ` Andreas Schwab
2004-07-10 19:40             ` Chris Wedgwood
2004-07-10 19:46             ` Norberto Bensa
2004-07-10 20:03               ` Chris Wedgwood
2004-07-11  1:21           ` Gopikrishnan Sidhardhan
2004-07-29  1:30   ` Nathan Scott
2004-08-03 18:31     ` L A Walsh
2004-08-04  0:48       ` Andi Kleen
2004-08-04  6:37         ` L A Walsh
2004-08-05  8:16       ` Helge Hafting
2004-08-06  1:10         ` Nathan Scott
2004-08-06  1:34           ` Andrew Morton
