wishlist: xfs_repair should detect files with too small sizes

* wishlist: xfs_repair should detect files with too small sizes
@ 2013-05-14 21:55 Andras Korn
  2013-05-15  0:47 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Andras Korn @ 2013-05-14 21:55 UTC (permalink / raw)
  To: xfs

Hi,

I have thousands of files on xfs whose inode claims their size is zero, but
they have blocks allocated to them; du(1) reports a nonzero size. xfs_repair
3.1.9 ignores this. xfs_db can be used to recover the contents of the files
(using commands like inode 1234; write core.size 4567).
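
A minimal sketch of how such files might be listed from user space, assuming
the symptom of interest is simply "st_size is zero but st_blocks is not". The
function name find_zero_size_with_blocks is hypothetical, and the check cannot
tell a lost size update from legitimate preallocation beyond EOF; it only
produces candidates for manual inspection.

```python
import os
import stat


def find_zero_size_with_blocks(root):
    """Yield (path, allocated_bytes) for regular files under root whose
    inode size is 0 but which still have blocks allocated."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is not accessible; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, devices, sockets, ...
            # st_blocks counts 512-byte units, independent of the
            # filesystem block size; this is what du(1) adds up.
            if st.st_size == 0 and st.st_blocks > 0:
                yield path, st.st_blocks * 512
```

Iterating over find_zero_size_with_blocks("/some/mountpoint") (path chosen for
illustration) gives each suspect file together with its allocated byte count.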

David Chinner explained to me that xfs_repair ignores these files because
it's legitimate to have blocks beyond eof (e.g. due to fallocate()), and
that unwritten extent flagging doesn't help because such extents don't need
to be flagged as it's impossible to read them.

My zero-sized files were likely affected by a crash (certainly not
fallocate()).

I think it would be useful to have the ability to distinguish between files
that legitimately have extents beyond EOF and files whose inode incorrectly
reports a too-small size.

Maybe an allocated-size field could be added to the inode, or extents
assigned to files via fallocate() could be flagged somehow? And if files
with incorrect sizes (i.e. where allocated-size div blocksize < number_of_blocks
OR allocated-size < core.size OR where a file contains extents beyond EOF
that are not fallocate-flagged) are found, xfs_repair could at least report
them?

-- 
                     Andras Korn <korn at elan.rulez.org>
            WARNING: Do NOT look into laser with remaining eyeball.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: wishlist: xfs_repair should detect files with too small sizes
  2013-05-14 21:55 wishlist: xfs_repair should detect files with too small sizes Andras Korn
@ 2013-05-15  0:47 ` Dave Chinner
  2013-05-15  8:03   ` Andras Korn
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2013-05-15  0:47 UTC (permalink / raw)
  To: Andras Korn; +Cc: xfs

On Tue, May 14, 2013 at 11:55:50PM +0200, Andras Korn wrote:
> Hi,
> 
> I have thousands of files on xfs whose inode claims their size is zero, but
> they have blocks allocated to them; du(1) reports a nonzero size. xfs_repair
> 3.1.9 ignores this. xfs_db can be used to recover the contents of the files
> (using commands like inode 1234; write core.size 4567).
> 
> David Chinner explained to me that xfs_repair ignores these files because
> it's legitimate to have blocks beyond eof (e.g. due to fallocate()), and

Actually due to speculative preallocation done by delayed
allocation.

> that unwritten extent flagging doesn't help because such extents don't need
> to be flagged as it's impossible to read them.

fallocate will leave unwritten extents beyond EOF, in which case we
can detect it, but we know there's nothing to be done as there's no
data....

> My zero-sized files were likely affected by a crash (certainly not
> fallocate()).
> 
> I think it would be useful to have the ability to distinguish between files
> that legitimately have extents beyond EOF and files whose inode incorrectly
> reports a too-small size.

How? Add a transaction to track the data that has been written?

Well, we already do that - with the inode size.

How do we prevent that from going missing when the application
doesn't use fsync()? By making all inode size update transactions
synchronous.  i.e. really, really slow.

Really, the problem you see is that the application is not using
fsync, and so there's no guarantee what part of the change has got
to disk when a crash occurs.
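
As a rough illustration of what "using fsync" can look like on the
application side, here is a sketch. The helper name write_durably and the
".tmp" suffix are hypothetical, and the write-to-temporary-then-rename
pattern is a common convention rather than something prescribed in this
thread.

```python
import os


def write_durably(path, data):
    """Write data to path and return only after both the file contents and
    the directory entry have been flushed with fsync()."""
    tmp = path + ".tmp"  # hypothetical temporary-name convention
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # data and inode size reach stable storage here
    finally:
        os.close(fd)
    os.replace(tmp, path)  # atomically put the complete file in place
    # fsync the containing directory so the rename itself is durable
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

The point of the sketch is only that the size update is forced to disk
together with the data before the application treats the write as complete.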

> Maybe an allocated-size field could be added to the inode,

That's in the extent map.

> or extents
> assigned to files via fallocate() could be flagged somehow?

They are flagged as unwritten, even beyond EOF.

> And if files
> with incorrect sizes (i.e. where allocated-size div blocksize < number_of_blocks
> OR allocated-size < core.size OR where a file contains extents beyond EOF
> that are not fallocate-flagged) are found, xfs_repair could at least report
> them?

Like I said - how do you reliably determine that there's data in the
blocks if you can lose the update that indicates that there's data
in the blocks?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: wishlist: xfs_repair should detect files with too small sizes
  2013-05-15  0:47 ` Dave Chinner
@ 2013-05-15  8:03   ` Andras Korn
  2013-05-15 21:41     ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Andras Korn @ 2013-05-15  8:03 UTC (permalink / raw)
  To: xfs

On Wed, May 15, 2013 at 10:47:36AM +1000, Dave Chinner wrote:

> > I have thousands of files on xfs whose inode claims their size is zero, but
> > they have blocks allocated to them; du(1) reports a nonzero size. xfs_repair
> > 3.1.9 ignores this. xfs_db can be used to recover the contents of the files
> > (using commands like inode 1234; write core.size 4567).
> > 
> > David Chinner explained to me that xfs_repair ignores these files because
> > it's legitimate to have blocks beyond eof (e.g. due to fallocate()), and
> 
> Actually due to speculative preallocation done by delayed
> allocation.

But if the space is preallocated speculatively and then remains unused,
shouldn't it get freed up somehow, eventually?

> > that unwritten extent flagging doesn't help because such extents don't need
> > to be flagged as it's impossible to read them.
> 
> fallocate will leave unwritten extents beyond EOF, in which case we
> can detect it, but we know there's nothing to be done as there's no
> data....

OK, so we have the following cases:

1. fallocate. The file has more space allocated to it than the size field in
its inode says, and the extra extents are flagged as unwritten.

2. speculative preallocation. Same as above, but the extents are not flagged
as unwritten.

3. corruption. The file's inode reports an incorrect size for whatever
reason. If it's too much, that's easy to detect; the problem is telling if
it's too little.

Distinguishing between #1 and #2 is possible based on the unwritten flags.
In case #2, I think the extra space should be possible to free up (perhaps
by xfs_repair?). Of course, I suppose you could just run truncate(1) on all
files...

The problem is recognising #3.

Do I have that right?
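
The three cases above can be sketched as a small classifier. This is an
illustration only: the Extent tuple, the function name and the 4096-byte
default block size are assumptions, not XFS data structures. It also shows
why #3 is the hard one: from the extent map alone it looks the same as #2.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    offset: int      # byte offset of the extent within the file
    length: int      # extent length in bytes
    unwritten: bool  # True if the extent carries the unwritten flag


def classify_beyond_eof(inode_size: int, extents: List[Extent],
                        block_size: int = 4096) -> str:
    # The block containing EOF is normal, so round the size up to a
    # block boundary before looking for extents past it.
    eof_block_end = -(-inode_size // block_size) * block_size
    beyond = [e for e in extents if e.offset + e.length > eof_block_end]
    if not beyond:
        return "no blocks beyond EOF"
    if all(e.unwritten for e in beyond):
        return "case 1: unwritten preallocation (e.g. fallocate)"
    # Written-state extents past EOF: nothing in the extent map says
    # whether data was ever written into them.
    return "case 2 or 3: speculative preallocation or lost size update"
```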

> > I think it would be useful to have the ability to distinguish between files
> > that legitimately have extents beyond EOF and files whose inode incorrectly
> > reports a too-small size.
> 
> How? Add a transaction to track the data that has been written?
> 
> Well, we already do that - with the inode size.

Ah, yes, that's true.

OK, thinking about it I realise there doesn't appear to be a good way of
preventing the problem, but I'm still not sure some heuristic couldn't be
invented to detect and partially remedy it after the fact.

-- 
                     Andras Korn <korn at elan.rulez.org>
  C++ is like an onion. It's got so many layers that you can't help but cry.


* Re: wishlist: xfs_repair should detect files with too small sizes
  2013-05-15  8:03   ` Andras Korn
@ 2013-05-15 21:41     ` Dave Chinner
  2013-05-16  3:56       ` Andras Korn
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2013-05-15 21:41 UTC (permalink / raw)
  To: Andras Korn; +Cc: xfs

On Wed, May 15, 2013 at 10:03:55AM +0200, Andras Korn wrote:
> On Wed, May 15, 2013 at 10:47:36AM +1000, Dave Chinner wrote:
> 
> > > I have thousands of files on xfs whose inode claims their size is zero, but
> > > they have blocks allocated to them; du(1) reports a nonzero size. xfs_repair
> > > 3.1.9 ignores this. xfs_db can be used to recover the contents of the files
> > > (using commands like inode 1234; write core.size 4567).
> > > 
> > > David Chinner explained to me that xfs_repair ignores these files because
> > > it's legitimate to have blocks beyond eof (e.g. due to fallocate()), and
> > 
> > Actually due to speculative preallocation done by delayed
> > allocation.
> 
> But if the space is preallocated speculatively and then remains unused,
> shouldn't it get freed up somehow, eventually?

It does, as soon as any one of the following happens:

	- the file is closed if the "keep on close" heuristic hasn't
	  been triggered
	- the file has been clean for 5 minutes (background sweeper)
	- the inode is reclaimed from cache.

If the system crashes, it is not cleaned up on reboot until the file
is read into cache again and the above can happen...

> > > that unwritten extent flagging doesn't help because such extents don't need
> > > to be flagged as it's impossible to read them.
> > 
> > fallocate will leave unwritten extents beyond EOF, in which case we
> > can detect it, but we know there's nothing to be done as there's no
> > data....
> 
> OK, so we have the following cases:
> 
> 1. fallocate. The file has more space allocated to it than the size field in
> its inode says, and the extra extents are flagged as unwritten.
> 
> 2. speculative preallocation. Same as above, but the extents are not flagged
> as unwritten.
> 
> 3. corruption. The file's inode reports an incorrect size for whatever
> reason. If it's too much, that's easy to detect; the problem is telling if
> it's too little.
> 
> Distinguishing between #1 and #2 is possible based on the unwritten flags.
> In case #2, I think the extra space should be possible to free up (perhaps
> by xfs_repair?). Of course, I suppose you could just run truncate(1) on all
> files...
> 
> The problem is recognising #3.
> 
> Do I have that right?

Yes.

But in case #2 it is not xfs_repair's place to remove blocks that
are legitimately attached to an inode and correctly referenced. If
it does that, then you lose any hope of recovering from a missing
file size update on a crash as you lose all references to the data
that is on disk. i.e. you can't find the data to recover it.

> > > I think it would be useful to have the ability to distinguish between files
> > > that legitimately have extents beyond EOF and files whose inode incorrectly
> > > reports a too-small size.
> > 
> > How? Add a transaction to track the data that has been written?
> > 
> > Well, we already do that - with the inode size.
> 
> Ah, yes, that's true.
> 
> OK, thinking about it I realise there doesn't appear to be a good way of
> preventing the problem, but I'm still not sure some heuristic couldn't be
> invented to detect and partially remedy it after the fact.

Trying to remedy it in xfs_repair does more harm than good. What
happens now allows recovery of data if the inode size was wrong. If
we remove the blocks beyond EOF, we lose that ability and hence make
unrecoverable data loss more likely in common failure scenarios.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: wishlist: xfs_repair should detect files with too small sizes
  2013-05-15 21:41     ` Dave Chinner
@ 2013-05-16  3:56       ` Andras Korn
  2013-05-17  0:16         ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Andras Korn @ 2013-05-16  3:56 UTC (permalink / raw)
  To: xfs

On Thu, May 16, 2013 at 07:41:05AM +1000, Dave Chinner wrote:

> > OK, thinking about it I realise there doesn't appear to be a good way of
> > preventing the problem, but I'm still not sure some heuristic couldn't be
> > invented to detect and partially remedy it after the fact.
> 
> Trying to remedy it in xfs_repair does more harm than good. What
> happens now allows recovery of data if the inode size was wrong. If
> we remove the blocks beyond EOF, we lose that ability and hence make
> unrecoverable data loss more likely in common failure scenarios.

That's clear (xfs_repair not freeing up the space is what allowed me to
recover the data). I meant "remedy" as in _either_ increase the inode size
OR free up the extra space. Perhaps xfs_db could be extended to do this?

Of course, increasing the size as stored in the inode can add garbage (at
the very least, binary zeroes) to the end of files, but if the data would
otherwise have been lost, this is probably still preferable. I can even
imagine an xfs_db command that increases file size up to the last non-zero
data byte in the allocated space.

-- 
                     Andras Korn <korn at elan.rulez.org>
                    No wanna work. Wanna bang on keyboard.


* Re: wishlist: xfs_repair should detect files with too small sizes
  2013-05-16  3:56       ` Andras Korn
@ 2013-05-17  0:16         ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2013-05-17  0:16 UTC (permalink / raw)
  To: Andras Korn; +Cc: xfs

On Thu, May 16, 2013 at 05:56:51AM +0200, Andras Korn wrote:
> On Thu, May 16, 2013 at 07:41:05AM +1000, Dave Chinner wrote:
> 
> > > OK, thinking about it I realise there doesn't appear to be a good way of
> > > preventing the problem, but I'm still not sure some heuristic couldn't be
> > > invented to detect and partially remedy it after the fact.
> > 
> > Trying to remedy it in xfs_repair does more harm than good. What
> > happens now allows recovery of data if the inode size was wrong. If
> > we remove the blocks beyond EOF, we lose that ability and hence make
> > unrecoverable data loss more likely in common failure scenarios.
> 
> That's clear (xfs_repair not freeing up the space is what allowed me to
> recover the data). I meant "remedy" as in _either_ increase the inode size
> OR free up the extra space. Perhaps xfs_db could be extended to do this?

You can already change the inode size with xfs_db by writing the
field directly.
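
A small sketch of how the xfs_db commands quoted at the start of the thread
might be assembled into one non-interactive invocation. The helper name and
the device path are hypothetical; -x (expert mode, which enables write) and
-c (run a command) are standard xfs_db options. The sketch only builds and
prints the command line, it does not run it, and it assumes the filesystem
would be unmounted before anything is written.

```python
import shlex


def xfs_db_set_size_argv(device, inode_number, new_size):
    """Build the argv for setting core.size on one inode with xfs_db."""
    return [
        "xfs_db",
        "-x",                                 # expert mode, needed for "write"
        "-c", f"inode {inode_number}",        # select the inode
        "-c", f"write core.size {new_size}",  # overwrite the size field
        device,
    ]


# Hypothetical device name; inode number and size taken from the example
# commands earlier in the thread.
print(shlex.join(xfs_db_set_size_argv("/dev/sdX1", 1234, 4567)))
```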

> Of course, increasing the size as stored in the inode can add garbage (at
> the very least, binary zeroes) to the end of files, but if the data would
> otherwise have been lost, this is probably still preferable.

No, it's not preferable, because if data wasn't written after the
extents are allocated, extending the file size exposes stale data
that is already on disk that the owner of the file should not have
access to.

You are free to do this yourself, but we are not going to add
potential stale data exposure holes into repair/db if this situation
arises.

> I can even
> imagine an xfs_db command that increases file size up to the last non-zero
> data byte in the allocated space.

Stale data regions rarely contain zero.
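
To make that concrete, here is a sketch of the proposed "last non-zero byte"
heuristic, with invented sample buffers. The function name is hypothetical.
When the blocks past the real data happen to be zero the heuristic finds the
right size, but when they hold leftovers from a previously deleted file it
extends the file over that stale data.

```python
def last_nonzero_offset(buf: bytes) -> int:
    """Return the offset one past the last non-zero byte (0 if all zero)."""
    return len(buf.rstrip(b"\x00"))


written = b"real data"
stale = b"old secret from a deleted file"  # invented stale block contents

# Allocated space that was zeroed: the heuristic recovers the true length.
zero_tail = written + bytes(16)

# Allocated space holding stale data: the heuristic overshoots and would
# expose the old contents to the file's owner.
stale_tail = written + stale + bytes(4)

print(last_nonzero_offset(zero_tail) == len(written))
print(last_nonzero_offset(stale_tail) == len(written) + len(stale))
```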

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

