public inbox for linux-xfs@vger.kernel.org
* massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
@ 2008-12-29 18:20 Martin Steigerwald
  2008-12-29 19:03 ` Chris Wedgwood
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Martin Steigerwald @ 2008-12-29 18:20 UTC (permalink / raw)
  To: xfs


Hi!

Remember

http://oss.sgi.com/pipermail/xfs/2008-November/037399.html

?

I thought it was resolved, and with a later TuxOnIce and sync things are
certainly better. That was all with barriers and write cache enabled.

But this time I had a hard crash while shutting down the system normally,
and the KDE address book, KDE settings and the additional sidebar were all
lost due to truncated files. This was without barriers but also without
write cache.

Curious about the safety of my data, I tried to simulate the problem. I
shouldn't have done that with my production data, but here are the
results:

I just switched the machine off after having made a backup of my KDE
configuration and after closing my usual apps. Then I waited 30-40
seconds. The first time was fine; the second time the KDE colors were lost
again. The third time I didn't wait that long, and the sidebar was lost.
The fourth time I pressed power off just after *starting* KDE. Lots of
stuff was lost, including:

- colors
- sidebar
- kpanel settings
- kgpg settings
- one kwallet digital wallet with passwords and so on: a complete file of
130 KB was only 60 bytes afterwards

I cannot remember having seen this kind of behavior anywhere between
2.6.17.7 and 2.6.26! And I did have sudden interruptions of write activity
from time to time.

I can't prove anything right now. I possibly could if I dared to test this
again with 2.6.26! But in my experience it was never this massive.
Prior to the null file fixes, a file or two might have been corrupted, and
not even every time. That's to be expected if those are the files that were
being written out at the time. But now it seems that almost every file that
is opened for writing, or perhaps not even just for writing, is seriously
truncated on a sudden interruption of write activity. Before, it appeared
that usually either the change was made or it was not, at least for
small files. Now the file is truncated: no holes, just far fewer bytes
than before.

I think I will go back to 2.6.26 for now, with write barriers, because
that's what used to work. I already went too far with my tests, because
it's difficult to be sure that I found all truncated files, even when I
close all productivity applications during the tests. Although it seems I
was able to recover everything every time by merging the current data set
with the broken files restored from the last backup, this puts my data at
too high a risk.
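
One way to make "did I find all truncated files?" less of a guess is to compare the live tree against the backup by size. The following is only a rough sketch, not something used in this thread: the function name `find_truncated` and the two directory arguments are made up for illustration, and a file that legitimately shrank since the backup would be flagged too, so the result is a list of suspects rather than proof.

```python
import os


def find_truncated(current_root, backup_root):
    """Return (relative_path, backup_size, current_size) for every regular
    file in the backup that is missing or smaller in the current tree.
    current_size is None when the file is missing."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            backup_path = os.path.join(dirpath, name)
            # Skip symlinks, sockets and other non-regular entries.
            if os.path.islink(backup_path) or not os.path.isfile(backup_path):
                continue
            rel = os.path.relpath(backup_path, backup_root)
            backup_size = os.path.getsize(backup_path)
            try:
                current_size = os.path.getsize(os.path.join(current_root, rel))
            except OSError:
                current_size = None  # missing in the current tree
            if current_size is None or current_size < backup_size:
                suspects.append((rel, backup_size, current_size))
    return sorted(suspects, key=lambda item: item[0])
```

Called as `find_truncated(os.path.expanduser("~/.kde"), "/path/to/backup/.kde")` (both paths hypothetical), it would list the files worth restoring or inspecting by hand.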

Do you have any idea how I could help get to the cause of this without
risking precious data? Has anyone else seen this? Does anyone use XFS on
laptops and has had recent power losses or crashes?

I have seen this on 2.6.27.7 and 2.6.28 with TuxOnIce patches. Syncing
before a crash occurs seems to avoid the issue. Did something change in
how aggressively the kernel writes data out?

I think the settings have been something along the lines of

shambhala:/proc/sys/vm> cat dirty_expire_centisecs
2999

shambhala:/proc/sys/fs/xfs> cat xfsbufd_centisecs xfssyncd_centisecs
100
3000

in all recent kernels!

I expect to lose the changes to a dirtied file that's in the page cache.
But I do not expect to lose the current (old) file on disk in that case,
unless the crash happens just as it is actually being written out. And
that appears to be highly unlikely, especially just after KDE started up,
when I had not used any application yet. I would be surprised if the first
thing applications did was to write out what they had just read in. And
even then I would be surprised if XFS wrote to all the files at once. So I
just don't understand what I have seen here, and I think I see a
regression. I am willing to look deeper once I have found out how to do so
safely enough.

Is there an xfsqa test that simulates sudden interruption of write
activity?

Actually I am considering switching to ext3/4. Maybe the people who say
"don't use XFS on commodity hardware" really have a point. But then it did
work very well from 2.6.17.7 to 2.6.26, so I think what I face here is a
behavioral regression. It might be a performance improvement at the same
time, but for laptops and commodity workstations this is too risky IMHO.
Is there interest in digging into this? I can accept it if you tell me not
to use XFS on my laptop. But actually I think something changed between
2.6.26 and 2.6.27, and maybe that's worth looking at.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 18:20 massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28 Martin Steigerwald
@ 2008-12-29 19:03 ` Chris Wedgwood
  2008-12-29 19:08 ` Eric Sandeen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Chris Wedgwood @ 2008-12-29 19:03 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: xfs

On Mon, Dec 29, 2008 at 07:20:33PM +0100, Martin Steigerwald wrote:

> But I had a hard crash this time while shutting down the system
> normally and the KDE address book, KDE settings, additional sidebar
> were all lost due to truncated files. This was without barriers but
> also without write cache.

I've seen this but not for a very long time.

It used to be (and perhaps still is) the case that KDE updates
configuration files by opening them with O_TRUNC and rewriting them.

This means there is a window when you can lose data.


Some time ago I suggested that they should open a temp file, write, fsync
and then rename, and I recall seeing some changes in CVS the next day to
do that (but I think that was only with ktmpfile or something).
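
As a rough illustration of the "open temp, write, fsync, then rename" sequence, here is a minimal sketch in Python. It is not KDE's code; `atomic_write` is a made-up name, and the final fsync on the directory is an extra step that is not mentioned in this mail but is commonly added on POSIX systems so that the rename itself is durable.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) without ever
    truncating the old file: temp file, fsync, then rename over it."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # data is on disk before the rename
        os.replace(tmp_path, path)  # atomic replacement on POSIX
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption beyond the mail: also fsync the directory (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

After a crash, a reader of `path` should then see either the complete old contents or the complete new contents. Note that `mkstemp` creates the file with mode 0600, so an application that cares about the original permissions would have to copy them over before the rename.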

The other thing is that XFS now has a much smaller window than it used to
in the open-with-truncate case; I think writeout now begins as soon as
the file is closed.


Older versions of Firefox did this with bookmarks too, so you would
get cases where you lost data.  Now it uses sqlite as a store,
which is much more sane in its write patterns.

> Do you have any idea on how to help to get down to the cause of this
> - without risking precious data?

ball-peen hammer?

> Did anyone else see this? Does anyone use XFS on laptops and had
> recent power losses or crashes?

I use XFS on laptops, have done for years and don't typically see
this.

> I expect to lose the changes for a dirtied file that's in the page
> cache.

Right.

> But I do not expect to lose the current (old) file on disk in that
> case, unless the crash happens when it's actually written out at that
> time.

But you do: when it opens the old file and truncates it, that event is
logged, at which point the file is zero-length and contains nothing.

The data hits the disk later on and the size is updated.

If you lose power before then, you get zero-length files.
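
The window is easy to see from user space. The sketch below (the helper name `size_during_rewrite` is made up) rewrites a file in place and reports the size immediately after the truncating open, before any new data has been written. It only shows the in-memory view; what is actually on disk after a power loss depends on the filesystem and on timing.

```python
import os
import tempfile


def size_during_rewrite(path, new_data):
    """Rewrite `path` in place and return its size right after the
    truncating open, before any of `new_data` has been written."""
    with open(path, "wb") as f:  # mode "wb" opens with O_TRUNC
        size_after_open = os.fstat(f.fileno()).st_size
        f.write(new_data)
    return size_after_open


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "wallet")
        with open(path, "wb") as f:
            f.write(b"x" * 130 * 1024)  # the old 130 KB file
        print(size_during_rewrite(path, b"y" * 60))
```

The old contents are already gone at the moment of the open, so anything that interrupts the process between the open and the completed write leaves a short or empty file behind.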

> Actually I am considering to switch to ext3/4.

If it is what I explained above, you can still see this there, though
it's much harder.

Basically, developers shouldn't rewrite critical data in place.

(didn't Jim Gray say something like "Update in Place is a Poison
Apple"?)

> Is there interest in digging this?

Check how KDE writes out configuration files, strace might be easier
than figuring it out from the code.


* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 18:20 massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28 Martin Steigerwald
  2008-12-29 19:03 ` Chris Wedgwood
@ 2008-12-29 19:08 ` Eric Sandeen
  2008-12-29 20:00   ` Martin Steigerwald
  2008-12-30  0:14   ` Chris Wedgwood
  2008-12-29 19:09 ` Russell Cattelan
  2008-12-29 19:48 ` safe writing in applications (was: Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28) Martin Steigerwald
  3 siblings, 2 replies; 14+ messages in thread
From: Eric Sandeen @ 2008-12-29 19:08 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: xfs

Martin Steigerwald wrote:
> Hi!
> 
> Remember
> 
> http://oss.sgi.com/pipermail/xfs/2008-November/037399.html
> 
> ?
> 
> I thought it was resolved and with later TuxOnIce and sync all is better 
> for sure. This all was with barriers and write cache enabled.
> 
> But I had a hard crash this time while shutting down the system normally
> and the KDE address book, KDE settings, additional sidebar were all lost
> due to truncated files. This was without barriers but also without write
> cache.

Some actual data here would be helpful; when you say "truncated files"
what do you mean; are they 0 length?  Or shorter than they should be?
How much shorter, and how do you know what they "should be?"

It is certainly at least possible that whatever is writing the KDE files
is not following good practices for data integrity... I can't say that
for sure, but apps have responsibility here, too.  :)

> Curious about the safety of my data I tried to simulate the thing. I 
> shouldn't have done that with my productive data but here are the 
> results:
> 
> I just switched the machine off after having made a backup of my KDE 
> configuration and after closing my usual apps. Then I waited 30-40 
> seconds. First time was fine, second time KDE colors were lost again. 
> Third time I didn't wait that long. Side bar was lost. Fourth time I 
> pressed power off after *starting* KDE. Lots of stuff was lost, 
> including:
> 
> - colors
> - sidebar
> - kpanel settings
> - kgpg settings
> - one kwallet digital wallet with passwords and stuff, a complete file of 
> 130 KB was just 60 bytes anymore

Ah, data!  So it went from 130KB to 60 bytes?  Were the first 60 bytes
valid data, or could you tell?

> I cannot remember having seen this kind of behavior anywhere between 
> 2.6.17.7 and 2.6.26! And I had sudden interruptions of write activity 
> from time to time. 
> 
> I can't prove anything right now. I possibly could if I dare to test this 
> again with 2.6.26! But from my experiences this never was so massive. 
> Prior to the null file fixes a file or two might have been corrupted and 
> that not all the times. That's to be expected if that's the file that was
> written out at the time. But now it seems that almost every file that is 
> opened for writing or not even just for writing is truncated seriously at 
> sudden interruption of write activity. Whereas before it appeared that 
> usually either the change was not made or it was made - at least for 
> small files. Now the file is truncated, no holes, just lots less bytes 
> than before.
> 
> I think I will go back to 2.6.26 for now - with write barriers, because
> that's what used to work. I went too far already with my tests, because it's
> difficult to be sure that I found all truncated files even when I close
> all productivity applications in my tests. Although it seems I was able to
> recover everything every time by mixing the current data set with the
> broken stuff restored from the last backup this is setting my data at a 
> too high risk.
> 
> Do you have any idea on how to help to get down to the cause of this - 
> without risking precious data? Did anyone else see this? Does anyone use 
> XFS on laptops and had recent power losses or crashes?
> 
> I have seen this on a 2.6.27.7, 2.6.28 with tuxonice patches. 

Seems it'd be worth testing w/o tuxonice, too.  I don't know what all is
in there, honestly.

> syncing 
> before a crash occurs seems to fix the issue. Did something change with 
> how aggressively the kernel writes data out?
> 
> I think it was something along
> 
> shambhala:/proc/sys/vm> cat dirty_expire_centisecs
> 2999
> 
> shambhala:/proc/sys/fs/xfs> cat xfsbufd_centisecs xfssyncd_centisecs
> 100
> 3000
> 
> in all recent kernels!

I don't think those have changed any time recently.

> I expect to lose the changes for a dirtied file that's in the page cache.
> But I do not expect to lose the current (old) file on disk in that case,
> unless the crash happens when it's actually written out at that time.

This will depend on what the application is doing, though.

> And 
> that appears to be highly unlikely especially at the time just after KDE
> started up when I did not use any application yet. I would be surprised 
> when the first things applications would be doing was to write out what 
> they just read in. And even then I would be surprised when XFS did write 
> to all the files at once. So I just don't get what I have seen here and I 
> think I see a regression. I am willing to look deeper when I found how to 
> do so safely enough.

I take it that you see this even for files which you have not
(intentionally) modified?

> Is there an xfsqa test that simulates sudden interruption of write
> activity?

There are tests which interrupt IO with the XFS_IOC_GOINGDOWN ioctl,
which simulates a filesystem shutdown, which is not exactly the same as
a crash or a power loss, though.

> Actually I am considering to switch to ext3/4. Maybe the people that say 
> don't use XFS on commodity hardware really have a point.

No.  :)

> But then it did 
> work very well from 2.6.17.7 to 2.6.26, so I think what I face here is a 
> behavioral regression. It might be a performance improvement at the same
> time, but for laptops and commodity workstations this is too risky IMHO.
> Is there interest in digging into this? I can accept when you tell me not to
> use XFS on my laptop. But actually I think something changed between
> 2.6.26 and 2.6.27 and maybe that's worth looking at.

If you know what is writing to the files that you often see truncated,
an strace of that pid might be interesting, to see what sorts of IO it
is doing.

ls -l /proc/$PID/fd/* | grep $FILE

might give a clue if anyone has these files open, then strace that pid
to see if there is any interesting activity on them?
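
For anyone who prefers a script to the shell one-liner, roughly the same lookup can be sketched in Python by walking `/proc/<pid>/fd` and comparing the symlink targets with the file of interest. This is Linux-specific, the function name `pids_with_file_open` and the `proc_root` parameter are made up for illustration, and processes owned by other users are silently skipped unless it runs as root.

```python
import os


def pids_with_file_open(target, proc_root="/proc"):
    """Return the sorted PIDs that currently have `target` open,
    based on the symlinks in <proc_root>/<pid>/fd (Linux only)."""
    target = os.path.realpath(target)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            if link == target:
                pids.append(int(entry))
                break
    return sorted(pids)
```

Each PID it returns is a candidate for `strace -p`. A file that is only opened briefly, rewritten and closed again will usually not show up this way, so an empty result does not mean nobody writes to the file.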

Otherwise, if you're highly motivated, and have a test box, do a little
regression testing and see when you think this behavior changed.  But
I'd start w/ pristine upstream kernels.

-Eric


* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 18:20 massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28 Martin Steigerwald
  2008-12-29 19:03 ` Chris Wedgwood
  2008-12-29 19:08 ` Eric Sandeen
@ 2008-12-29 19:09 ` Russell Cattelan
  2008-12-29 19:20   ` Christoph Hellwig
  2008-12-29 19:29   ` Chris Wedgwood
  2008-12-29 19:48 ` safe writing in applications (was: Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28) Martin Steigerwald
  3 siblings, 2 replies; 14+ messages in thread
From: Russell Cattelan @ 2008-12-29 19:09 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: xfs

Martin Steigerwald wrote:
> [... full quote of the original message ...]

I would have to check exactly when Dave's rewrite of the inode cache/fs
sync code went in, but it could be around the time of 2.6.27.  xfssyncd
also has special handling for laptop mode so that it does not tickle the
disk too often, so maybe that needs to be looked at?

The question that I have is in regard to KDE apps.
The "null file" issue mainly shows up when an app truncates a file to 0
and then rewrites the entire contents (vim is the most common app doing
this). So does KDE do this on a regular basis: open a file, read it in,
truncate it to 0 and then write it back out at some point? And why? Are
these files that are modified often?

The other thing that is odd is why there are still files full of nulls.
XFS changed its behavior to write out size changes at flush time and not
before; previously, size changes would be synced out prior to the data
being synced out, thus creating "null files", or rather files with a size
but no extent data.

The "null files" problem should be an "empty files" problem at worst
now, so it is really curious that you are still seeing null files.

-Russell


* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 19:09 ` Russell Cattelan
@ 2008-12-29 19:20   ` Christoph Hellwig
  2008-12-29 19:29   ` Chris Wedgwood
  1 sibling, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2008-12-29 19:20 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Martin Steigerwald, xfs

On Mon, Dec 29, 2008 at 01:09:18PM -0600, Russell Cattelan wrote:
> I would have to look for sure when Dave's rewrite of the inode cache/fs
> sync code went in but it could be around the time of 2.6.27.

That's all in the 2.6.29 queue.

> The other thing that is odd is why files full of nulls still? xfs
> changed its behavior to write out size changes at flush time and not
> before, previously size changes would be synced out prior to the data
> being synced out, thus creating "null files" or rather a file with size
> but no extent data.
>
> The "null files" problem should be an "empty files" problem at worst
> now, so it is really curious that you are seeing null files still.

One good way would be to mount the partition with -o sync.  That way
you get data integrity for all files at the expense of really sucky
performance.


* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 19:09 ` Russell Cattelan
  2008-12-29 19:20   ` Christoph Hellwig
@ 2008-12-29 19:29   ` Chris Wedgwood
  2008-12-29 20:09     ` Russell Cattelan
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Wedgwood @ 2008-12-29 19:29 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Martin Steigerwald, xfs

On Mon, Dec 29, 2008 at 01:09:18PM -0600, Russell Cattelan wrote:

> The question that I have is regards to kde apps.

I just did a quick strace of something; I see it do:


open newfile
write data
close file
rename newfile over oldfile

There is no fsync before the close...


This will bite XFS more than ext3 with ordered mode.


* safe writing in applications (was: Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28)
  2008-12-29 18:20 massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28 Martin Steigerwald
                   ` (2 preceding siblings ...)
  2008-12-29 19:09 ` Russell Cattelan
@ 2008-12-29 19:48 ` Martin Steigerwald
  2008-12-29 19:54   ` Christoph Hellwig
  3 siblings, 1 reply; 14+ messages in thread
From: Martin Steigerwald @ 2008-12-29 19:48 UTC (permalink / raw)
  To: xfs

Am Montag 29 Dezember 2008 schrieb Martin Steigerwald:
> Hi!
> Remember
>
> http://oss.sgi.com/pipermail/xfs/2008-November/037399.html
>
> ?

[... about truncated KDE configuration files ...]

> I cannot remember having seen this kind of behavior anywhere between
> 2.6.17.7 and 2.6.26! And I had sudden interruptions of write activity
> from time to time.
>
> I can't prove anything right now. I possibly could if I dare to test
> this again with 2.6.26! But from my experiences this never was so
> massive. Prior to the null file fixes a file or two might have been
> corrupted and that not all the times. That's to be expected if that's the
> file that was written out at the time. But now it seems that almost
> every file that is opened for writing or not even just for writing is
> truncated seriously at sudden interruption of write activity. Whereas
> before it appeared that usually either the change was not made or it
> was made - at least for small files. Now the file is truncated, no
> holes, just lots less bytes than before.

Ok, I had to test this. So I made a backup of my current KDE configuration
to an external drive and tested with 2.6.25.10 and 2.6.26.5! It happens
there too. So what I have observed here is nothing new, including the
case of massively truncated files when testing directly after KDE login.
Why all those applications appear to write out their configuration files
when they have only just been started is a bit beyond me, but that seems
to be the case.

So it seems that with sudden power interruptions on kernels before 2.6.27
and 2.6.28 I had *lots of luck*. Or there is a very subtle difference in
the likelihood of truncated files happening. During today's tests I had
the impression that, at least with 2.6.25.10 and 2.6.26.5, truncated files
were a little less likely, but I have no statistics to back that up.

And I do not yet have a comparison with ext3/ext4 either.

So I jumped to conclusions too early, or I need to test even earlier
kernels. I already held back an earlier mail about this, but this time I
thought I'd write an email. Sorry for the noise.

It might be wise, however, to file enhancement requests for the KDE
applications where I observed this behavior, if safer writing within the
applications is possible. Any hints on what application developers should
keep in mind when writing out config files?

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: safe writing in applications (was: Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28)
  2008-12-29 19:48 ` safe writing in applications (was: Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28) Martin Steigerwald
@ 2008-12-29 19:54   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2008-12-29 19:54 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: xfs

On Mon, Dec 29, 2008 at 08:48:40PM +0100, Martin Steigerwald wrote:
> It might be wise however to file enhancement requests for the KDE 
> applications where I observed this behavior if safer writing within the 
> applications is possible. Any hints on what application developers should 
> keep in mind when writing out config files?

Preferably use O_SYNC.  Never truncate and then rewrite; when in doubt,
write a new file and rename it to the right place after it has been
fsync'ed (the mail server trick).

In the meantime, a nice way to hack around this is to do chattr +S on all
these files, which forces synchronous writes.  It doesn't help if they
actually use the rename trick above sometimes, though.


* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 19:08 ` Eric Sandeen
@ 2008-12-29 20:00   ` Martin Steigerwald
  2008-12-30  0:14   ` Chris Wedgwood
  1 sibling, 0 replies; 14+ messages in thread
From: Martin Steigerwald @ 2008-12-29 20:00 UTC (permalink / raw)
  To: xfs

Am Montag 29 Dezember 2008 schrieb Eric Sandeen:
> Martin Steigerwald wrote:
> > Hi!
> >
> > Remember
> >
> > http://oss.sgi.com/pipermail/xfs/2008-November/037399.html
> >
> > ?
> >
> > I thought it was resolved and with later TuxOnIce and sync all is
> > better for sure. This all was with barriers and write cache enabled.
> >
> > But I had a hard crash this time while shutting down the system
> > normally and the KDE address book, KDE settings, additional sidebar
> > were all lost due to truncated files. This was without barriers but
> > also without write cache.
>
> Some actual data here would be helpful; when you say "truncated files"
> what do you mean; are they 0 length?  Or shorter than they should be?
> How much shorter, and how do you know what they "should be?"

They are shortened by different amounts of bytes. Sometimes from 130 KB to 
60 bytes. Sometimes a file is 0 bytes.

http://oss.sgi.com/pipermail/xfs/2008-November/037399.html

> It is certainly at least possible that whatever is writing the KDE
> files is not following good practices for data integrity... I can't say
> that for sure, but apps have responsibility here, too.  :)

Yeah. I am willing to file enhancement requests where applicable.

> > Curious about the safety of my data I tried to simulate the thing. I
> > shouldn't have done that with my productive data but here are the
> > results:
> >
> > I just switched the machine off after having made a backup of my KDE
> > configuration and after closing my usual apps. Then I waited 30-40
> > seconds. First time was fine, second time KDE colors were lost again.
> > Third time I didn't wait that long. Side bar was lost. Fourth time I
> > pressed power off after *starting* KDE. Lots of stuff was lost,
> > including:
> >
> > - colors
> > - sidebar
> > - kpanel settings
> > - kgpg settings
> > - one kwallet digital wallet with passwords and stuff, a complete
> > file of 130 KB was just 60 bytes anymore
>
> Ah, data!  So it went from 130KB to 60 bytes?  Were the first 60 bytes
> valid data, or could you tell.

I do not have that one at hand anymore - I was panicking quite a bit and
forgot to make a copy of the broken ~/.kde directory before fixing it. But
see

http://oss.sgi.com/pipermail/xfs/2008-November/037399.html

for some examples. The contents up to the truncation point were fine as
far as I looked back then.

No holes either. Just fewer bytes than the ones in the backup that I made
just before doing today's tests.

> > I have seen this on a 2.6.27.7, 2.6.28 with tuxonice patches.
>
> Seems it'd be worth testing w/o tuxonice, too.  I don't know what all
> is in there, honestly.

Hmmm... I did not test suspend/resume cycles. I just booted up once and
shut the system down by holding the power button long enough.

> > syncing
> > before a crash occurs seems to fix the issue. Did something change
> > with how aggressively the kernel writes data out?
> >
> > I think it was something along
> >
> > shambhala:/proc/sys/vm> cat dirty_expire_centisecs
> > 2999
> >
> > shambhala:/proc/sys/fs/xfs> cat xfsbufd_centisecs xfssyncd_centisecs
> > 100
> > 3000
> >
> > in all recent kernels!
>
> I don't think those have changed any time recently.

I am thinking of lowering them for now, until I have found the cause of
those random lockups that *appear* to be related to switching between X11
and the console and are off-topic for this list.

> > I expect to lose the changes for a dirtied file that's in the page
> > cache. But I do not expect to lose the current (old) file on disk in
> > that case, unless the crash happens when it's actually written out at
> > that time.
>
> This will depend on what the application is doing, though.

Any hints or link on what it *should* be doing?

> > And
> > that appears to be highly unlikely especially at the time just after
> > KDE started up when I did not use any application yet. I would be
> > surprised when the first things applications would be doing was to
> > write out what they just read in. And even then I would be surprised
> > when XFS did write to all the files at once. So I just don't get what
> > I have seen here and I think I see a regression. I am willing to look
> > deeper when I found how to do so safely enough.
>
> I take it that you see this even for files which you have not
> (intentionally) modified?

Yes. But then the "try it directly after starting KDE" case isn't the best 
one. Maybe KDE applications just write out lots of files when KDE is 
started. Hmmm, maybe I could get a glimpse of that with iotop.

> > Is there an xfsqa test that simulates sudden interruption of write
> > activity?
>
> There are tests which interrupt IO with the XFS_IOC_GOINGDOWN ioctl,
> which simulates a filesystem shutdown, which is not exactly the same as
> a crash or a power loss, though.
>
> > Actually I am considering to switch to ext3/4. Maybe the people that
> > say don't use XFS on commodity hardware really have a point.
>
> No.  :)

No what? No, they don't have a point?

> > But then it did
> > work very well from 2.6.17.7 to 2.6.26, so I think what I face here
> > is a behavorial regression. It might be a performance improvement at
> > the same time, but for laptops and commodity workstations this is too
> > risky IMHO. Is there interest in digging this? I can accept when you
> > tell my not to use XFS on my laptop. But actually I think something
> > changed between 2.6.26 andf 2.6.27 and maybe thats worth looking at.
>
> If you know what is writing to the files that you often see truncated,
> an strace of that pid might be interesting, to see what sorts of IO it
> is doing.
>
> ls -l /proc/$PID/fd/* | grep $FILE
>
> might give a clue if anyone has these files open, then strace that pid
> to see if there is any interesting activity on them?

I could try that for the file kdeglobals. It seems to be written quite 
recently and in there are the desktop colors. It is basically likely to get 
truncated even when the notebook has idled for more than 30 seconds.
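
The same lookup as the ls/grep above can be sketched in Python, which 
makes it easier to poll in a loop. This is only an illustration; the 
function name is made up. It needs a Linux /proc, and it only sees 
processes whose fd directory the current user may read. A writer that 
opens, writes and closes quickly, or that writes a differently named 
temporary file and renames it, may never show up.

```python
import os


def pids_with_file_open(target_path):
    """Return sorted PIDs that have target_path open, by scanning /proc/<pid>/fd."""
    target = os.path.realpath(target_path)
    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    pids.add(int(entry))
                    break
            except OSError:
                continue  # fd was closed while we were looking
    return sorted(pids)
```

Any PID it returns can then be handed to strace -p.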

> Otherwise, if you're highly motivated, and have a test box, do a little
> regression testing and see when you think this behavior changed.  But
> I'd start w/ pristine upstream kernels.

I think I will look at the contents of the tuxonice patch. I am not sure 
whether it patches anything in block/ or fs/.

More tomorrow. See also my "safe writing in applications" mail: I tested 
with 2.6.26 and 2.6.25, and there might only have been subtle changes, if 
any at all.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 19:29   ` Chris Wedgwood
@ 2008-12-29 20:09     ` Russell Cattelan
  2008-12-29 20:17       ` Chris Wedgwood
  2008-12-29 21:56       ` Eric Sandeen
  0 siblings, 2 replies; 14+ messages in thread
From: Russell Cattelan @ 2008-12-29 20:09 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Martin Steigerwald, Russell Cattelan, xfs

Chris Wedgwood wrote:
> On Mon, Dec 29, 2008 at 01:09:18PM -0600, Russell Cattelan wrote:
>
>   
>> The question that I have is regards to kde apps.
>>     
>
> i just did a quick strace of something, i see it do:
>
>
> open newfile
> write data
> close file
> rename newfile over oldfile
>
> no fsync before close...
>   
Hmm, that is worse than truncate to 0, since now we have a new file vs. 
one that has been truncated.
But really the same net result.
Still, why is the file size making it to disk before the data and, more 
importantly, the extent transaction to the log?
That should have been fixed.

>
> this will bite xfs more than ext3 w/ ordered mode
>   
Delayed allocation is a factor (and this will be true of any fs 
supporting delayed allocation):
holding off data flushes helps reduce fragmentation by allowing larger 
segments to be flushed out,
but it increases the time data is held in cache and thus creates a larger 
window for data loss.

-Russell


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 20:09     ` Russell Cattelan
@ 2008-12-29 20:17       ` Chris Wedgwood
  2008-12-29 21:25         ` Russell Cattelan
  2008-12-29 21:56       ` Eric Sandeen
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Wedgwood @ 2008-12-29 20:17 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Martin Steigerwald, xfs

On Mon, Dec 29, 2008 at 02:09:33PM -0600, Russell Cattelan wrote:

> Still why is the file size making it to disk before the data and
> more importantly the extent transaction to the log?

well, as you know, it's logged, the data isn't

> that should have been fixed.

the window was shrunk so that write out begins on close for existing
files that are opened with truncate (i think nathans did that some time
ago?)

new files won't be affected by that change
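
For the write-new-then-rename case, the application-side fix is to fsync 
the new file before the rename. A minimal Python sketch of that pattern 
follows; it is not what KDE does or any real library's API, and the 
function name, the ".new" suffix and the 0o600 mode are all assumptions 
chosen for illustration.

```python
import os


def replace_file_durably(path, data):
    """Write data to a temporary file, fsync it, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".new"  # hypothetical temp name, same directory as path
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than asked; keep going.
            view = view[os.write(fd, view):]
        os.fsync(fd)  # ask the kernel to flush the new file's data first
    finally:
        os.close(fd)
    os.rename(tmp_path, path)  # atomically replaces the old name
    # fsync the directory as well so the rename itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this ordering a crash should leave either the complete old file or 
the complete new one under the final name, provided the drive honours 
the flush (write cache and barrier settings still matter).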


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 20:17       ` Chris Wedgwood
@ 2008-12-29 21:25         ` Russell Cattelan
  0 siblings, 0 replies; 14+ messages in thread
From: Russell Cattelan @ 2008-12-29 21:25 UTC (permalink / raw)
  To: Chris Wedgwood; +Cc: Martin Steigerwald, Russell Cattelan, xfs

Chris Wedgwood wrote:
> On Mon, Dec 29, 2008 at 02:09:33PM -0600, Russell Cattelan wrote:
>
>   
>> Still why is the file size making it to disk before the data and
>> more importantly the extent transaction to the log?
>>     
>
> well, as you know, it's logged, the data isn't
>   
yes but the whole deal with null files is no extents for a file size 
that should have extents.

So if the extent creation transaction is logged then it should be safe 
to update the file size on disk,
if not then the file "last flushed" size should be on disk. In this case 
I would assume 0, since that would
be the last valid flush size.


>   
>> that should have been fixed.
>>     
>
> the window was shrunk to write out begins on close for existing files
> the are opened with truncate (i think nathans did that some time
> ago?)
>   
correct but that change/hack has apparently been removed at some point? 
maybe along with the "last flush size" changes?


> new files won't be affected by that change
>   
Correct. Even if the sync-on-close-after-truncate code were still there, 
it apparently would not help kde apps.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 20:09     ` Russell Cattelan
  2008-12-29 20:17       ` Chris Wedgwood
@ 2008-12-29 21:56       ` Eric Sandeen
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2008-12-29 21:56 UTC (permalink / raw)
  To: Russell Cattelan; +Cc: Martin Steigerwald, Chris Wedgwood, xfs

Russell Cattelan wrote:
> Chris Wedgwood wrote:

>> this will bite xfs more than ext3 w/ ordered mode
>>   
> Delayed allocation is a factor (and this will be true of any fs 
> supporting delayed allocation)
> holding of data flushes helps reduce fragmentation by allowing larger 
> segments to be flushed out,
> but it increases the time data is held in cache and thus create a larger 
> window for data loss.

That's not quite accurate AFAIK; yes, xfs has delayed allocation, but it
pushes data to disk on the same schedule (by default) as any other
filesystem; when pdflush goes off (30s) or under memory pressure.

The only difference is that xfs (or any delalloc fs) allocates at flush
time not at write time.

But this does not imply that xfs is holding off flushes for longer due
to delayed allocation; I don't want it to sound like xfs is putting data
integrity at risk due to delalloc, because it's not ...

-Eric


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28
  2008-12-29 19:08 ` Eric Sandeen
  2008-12-29 20:00   ` Martin Steigerwald
@ 2008-12-30  0:14   ` Chris Wedgwood
  1 sibling, 0 replies; 14+ messages in thread
From: Chris Wedgwood @ 2008-12-30  0:14 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Martin Steigerwald, xfs

On Mon, Dec 29, 2008 at 01:08:53PM -0600, Eric Sandeen wrote:

> It is certainly at least possible that whatever is writing the KDE
> files is not following good practices for data integrity... I can't
> say that for sure, but apps have responsibility here, too.  :)

BTW, it's not just KDE that does this.  A lot of apps that IMO should
be more careful aren't.

For example apt/dpkg on debian.  It's possible if you lose
power/oops/whatever during upgrade you can eat those files and cause
much pain.


It's bad enough I started work on a replacement for these that uses
sqlite.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2008-12-30  7:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-29 18:20 massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28 Martin Steigerwald
2008-12-29 19:03 ` Chris Wedgwood
2008-12-29 19:08 ` Eric Sandeen
2008-12-29 20:00   ` Martin Steigerwald
2008-12-30  0:14   ` Chris Wedgwood
2008-12-29 19:09 ` Russell Cattelan
2008-12-29 19:20   ` Christoph Hellwig
2008-12-29 19:29   ` Chris Wedgwood
2008-12-29 20:09     ` Russell Cattelan
2008-12-29 20:17       ` Chris Wedgwood
2008-12-29 21:25         ` Russell Cattelan
2008-12-29 21:56       ` Eric Sandeen
2008-12-29 19:48 ` safe writing in applications (was: Re: massively truncated files with XFS with sudden power loss on 2.6.27 and 2.6.28) Martin Steigerwald
2008-12-29 19:54   ` Christoph Hellwig
