public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Michael Monnerie <michael.monnerie@is.it-management.at>
Cc: xfs@oss.sgi.com
Subject: Re: rsync and corrupt inodes (was xfs_dump problem)
Date: Fri, 2 Jul 2010 12:42:35 +1000	[thread overview]
Message-ID: <20100702024235.GX24712@dastard> (raw)
In-Reply-To: <201007011025.04391@zmi.at>

On Thu, Jul 01, 2010 at 10:25:03AM +0200, Michael Monnerie wrote:
> On Donnerstag, 1. Juli 2010 Dave Chinner wrote:
> > > From another Linux ("saturn"), I do an rsync via an rsync-module,
> > > and have already 4 Versions where the ".vhd" file of that Windows
> > > Backup is destroyed on "saturn". So the corruption happens when
> > > starting rsync @saturn, copying orion->saturn, both having XFS.
> > 
> > Are you running rsync locally on saturn (i.e. pulling data)? If so,
> > can you get an strace of the rsync of that file so we can see what
> > the order or operations being done on the file is. If you are
> > pushing data to saturn, does the problem go away if you pull it (and
> > vice versa)?
> 
> Oh dear, I made a mistake. It's a push @orion, doing
> rsync -aPvHAXy / saturn::orionbackup/
> 
> The problem is: I cannot 100% replicate it. I found the problem once, 
> moved the dir with the broken file away and synced again. Again broken. 
> Then I reported here. Meanwhile, Windows has done a new backup, that 
> file doesn't seem to get broken. But with another fresh Windows backup, 
> it came again. I don't know if it depends on the file, it happened 4 
> times until now.

So it's the rsync daemon on saturn that is doing all the IO?

> I rsynced today 3 times, twice with the openSUSE kernel and once with 
> 2.6.34, no problem. Sorry (or maybe "lucky me"?).
> 
> > > 852c268f-cf1a-11de-b09b-806e6f6e6963.vhd* ??????????? ? ?    ?     
> > >       ?            ? 852c2690-cf1a-11de-b09b-806e6f6e6963.vhd
> > 
> > On the source machine, can you get a list of the xattrs on the
> > inode?
>
> How would I do that? "getfattr" on that file gives no return, does that 
> mean it doesn't have anything to say? I never do that things, so there 
> shouldn't be any attributes set.

"getfattr -d"

> > > and on dmesg:
> > > [125903.343714] Filesystem "dm-0": corrupt inode 649642 ((a)extents
> > > = 5).  Unmount and run xfs_repair. [125903.343735]
> > > ffff88011e34ca00: 49 4e 81 c0 02 02 00 00 00 00 03 e8 00 00 00 64 
> > > IN.............d [125903.343756] Filesystem "dm-0": XFS internal
> > > error xfs_iformat_extents(1) at line 558 of file
> > > /usr/src/packages/BUILD/kernel-desktop-2.6.31.12/linux-2.6.31/fs/xf
> > >s/xfs_inode.c.  Caller 0xffffffffa032c0ad
> > 
> > That seems like a different problem to what linda is seeing
> > because this is on-disk corruption. can you dump the bad inode via:
> > 
> > # xfs_db -x -r -c "inode 649642" -c p <dev>
> 
> Uh, that's a long output.
> 
> # xfs_db -x -r -c "inode 649642" -c p /dev/swraid0/backup 
.....
> u.bmx[0-4] = [startoff,startblock,blockcount,extentflag] 0:
> [0,549849376,2097151,0] 1:[2097151,551946527,2097151,0] 2:
> [4194302,554043678,2097151,0] 3:[6291453,556140829,2097151,0] 4:
> [8388604,558237980,539421,0]
> a.sfattr.hdr.totsize = 4
> a.sfattr.hdr.count = 40
> a.sfattr.list[0].namelen = 35
> a.sfattr.list[0].valuelen = 136
> a.sfattr.list[0].root = 1
> a.sfattr.list[0].secure = 0
> a.sfattr.list[0].name =
> "\035GI_ACL_FILE\000\000\000\005\000\000\000\001\377\377\377\377\000\a\000\000\000\000\000\002\000\000\004"
> a.sfattr.list[0].value = 
> "\346\000\a\000\000\000\000\000\004\377\377\377\377\000\006\000\000\000\000\000\020\377\377\377\377\000\000\000\000\000\000\000
> \377\377\377\377\000\000\000\000\000IN\201\377\002\002\000\000\000\000\003\350\000\000\000d\000\000\000\001\000\000\000\000\000\000\000\000\000\000\000\002L\025\356\025\000\000\000\000L\022\337\316\000\000\000\000L\025\356\025\024\'\314\214\000\000\000\000\000\000\004\242\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\c\001\000\000\000\000\000\000\000\000\006\273"

>From the metadump, I can see that other valid .vhd files are in
local format with:

core.forkoff = 9
a.sfattr.hdr.totsize = 83
a.sfattr.hdr.count = 1
a.sfattr.list[0].namelen = 12
a.sfattr.list[0].valuelen = 64
a.sfattr.list[0].root = 1
a.sfattr.list[0].secure = 0
a.sfattr.list[0].name = "SGI_ACL_FILE"
a.sfattr.list[0].value = <snipped>


All the broken inodes are in the same format as the valid .vhd files,
but the shortform attribute header is completely toast. Once I correct the
header and the lengths, the only thing that looks wrong is:

xfs_db> p a.sfattr.list[0].name
a.sfattr.list[0].name = "\035GI_ACL_FILE"

The first character of the name is bad, everything after that -
including the attribute value - is identical to that on other
inodes.  What this implies is that we've overwritten the start of
the attribute fork with something, and that looks exactly like the
swap extents problems that we've fixed recently....

> > Hmmmm - do you run xfs_fsr? The errors reported and the corrutpion
> > above are exactly what I'd expect from the swap extent bugs we fixed
> > a while back....
> 
> Yes, xfs_fsdr was running. Disabled it now, and compiled and changed to 
> kernel 2.6.34 now. Hope that's OK ;-)

Ok, so we have identified a potential cause. Either disabling fsr or
upgrading to 2.6.34 should be sufficient to avoid the problem. If no
problem show up now you are on 2.6.34, then I'd switch fsr back on
and see if they show up again...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2010-07-02  2:39 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-27  1:10 WARNING xfsdump [still] Cannot allocate memory for list of [root|non-root] attributes for nondir ino xxyz Linda A. Walsh
2010-06-28  2:27 ` Dave Chinner
2010-06-29 22:33   ` xfs file system in process of becoming corrupt; though xfs_repair thinks it's fine! ; -/ (was xfs_dump problem...) Linda Walsh
2010-06-29 23:25     ` xfs file system in process of becoming corrupt; though xfs_repair thinks it's fine! ;-/ " Dave Chinner
2010-06-29 23:55       ` Michael Weissenbacher
2010-06-30  0:42         ` xfs file system in process of becoming corrupt; though xfs_repair thinks it's fine! ; -/ " Linda A. Walsh
2010-06-30  1:16           ` Dave Chinner
2010-06-30  2:45             ` Linda A. Walsh
2010-07-01 23:58               ` Dave Chinner
2010-07-07  3:18                 ` Linda A. Walsh
2010-07-07  5:56                   ` Linda Walsh
2010-07-07  6:36                     ` Dave Chinner
2010-07-07  9:30                       ` Linda A. Walsh
2010-07-07 21:01                         ` Linda Walsh
2010-06-30  0:01       ` Linda A. Walsh
2010-06-30  1:06         ` xfs file system in process of becoming corrupt; though xfs_repair thinks it's fine! ;-/ " Dave Chinner
2010-06-30  1:52           ` xfs file system in process of becoming corrupt; though xfs_repair thinks it's fine! ; -/ " Linda A. Walsh
2010-06-30 21:01             ` Stan Hoeppner
2010-07-07 21:40               ` utf-8' chars from Winxp machine may be problem related (was Re: xfs file system in process of becoming corrupt; though xfs_repair...) Linda A. Walsh
2010-07-07 23:40                 ` Stan Hoeppner
2010-07-08  0:38                   ` Linda A. Walsh
2010-06-30 18:25     ` xfs file system in process of becoming corrupt; though xfs_repair thinks it's fine! ; -/ (was xfs_dump problem...) Michael Monnerie
2010-06-30 23:30       ` rsync and corrupt inodes (was xfs_dump problem) Dave Chinner
2010-07-01  8:25         ` Michael Monnerie
2010-07-02  2:42           ` Dave Chinner [this message]
2010-07-02  6:21             ` Michael Monnerie
2010-07-04 22:53               ` Dave Chinner
2010-07-12 11:28                 ` Michael Monnerie
2010-07-07 21:56         ` Linda Walsh
  -- strict thread matches above, loose matches on Subject: below --
2010-07-15 20:58 Michael Monnerie
2010-07-15 22:57 ` Dave Chinner
2010-07-16 14:17   ` Michael Monnerie
2010-07-16 14:40   ` Michael Monnerie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100702024235.GX24712@dastard \
    --to=david@fromorbit.com \
    --cc=michael.monnerie@is.it-management.at \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox