public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Wayne Walker <wwalker@crossroads.com>
Cc: xfs@oss.sgi.com
Subject: Re: File system corruption
Date: Sat, 13 Oct 2012 11:14:25 +1100
Message-ID: <20121013001425.GN2739@dastard>
In-Reply-To: <50789076.7040402@crossroads.com>

[cc'd the list again so everyone can see what is happening]

On Fri, Oct 12, 2012 at 04:49:42PM -0500, Wayne Walker wrote:
> On 10/11/2012 04:07 PM, Dave Chinner wrote:
> <snip>
> >Ok, so having looked at the stack trace, the AGF block that was read
> >contained zeros, not valid metadata, which is why the allocation
> >failed.
> >
> >Can you remake the filesystem at will? If so, can you run mkfs.xfs
> >as per above, then run the following command?
> >
> ># echo 3 > /proc/sys/vm/drop_caches
> ># for i in `seq 0 4`; do
> >>xfs_db -l /dev/sda5 -c "sb $i" -c p -c "agf $i" -c p /dev/sde1
> >>done
> >So that we can see what mkfs put on disk? Can you then mount the
> >filesystem, unmount it again, and run the same commands? Then mount
> >the filesystem, run the copy/sync to trigger the error, then unmount
> >and run the commands again?
> >
> >What I'm interested in is whether xfs_db sees the AGF (whichever
> >one it is) as zero, or whether only the kernel is seeing that.
> 
> Thank you for the help.  I believe this has everything you asked for Dave.
....
> bash-4.1# uname -a
> Linux t30-2.commstor.crossroads.com 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux
> bash-4.1# /sbin/mkfs.xfs -f -l logdev=/dev/sda5 -b size=4096 -d su=1024k,sw=4 /dev/sde1
> meta-data=/dev/sde1              isize=256    agcount=5, agsize=268435200 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=1183011584, imaxpct=5
>          =                       sunit=256    swidth=1024 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =/dev/sda5              bsize=4096   blocks=97280, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> bash-4.1# echo 3 > /proc/sys/vm/drop_caches
> bash-4.1# for i in `seq 0 4`; do xfs_db -l /dev/sda5 -c "sb $i" -c p -c "agf $i" -c p /dev/sde1; done
> magicnum = 0x58465342
> blocksize = 4096
.....

All superblocks and AGF headers look good.

> bash-4.1# mount -t xfs -o defaults,noatime,logdev=/dev/sda5 /dev/sde1 /dtfs_data/data1
> bash-4.1# cp random_data.1G /dtfs_data/data1/foo2
> bash-4.1# sync
> bash-4.1# cp random_data.1G /dtfs_data/data1/foo3
> bash-4.1# sync
> bash-4.1# dmesg | tail -100
.....
> Filesystem "sde1": Disabling barriers, not supported with external log device
> XFS mounting filesystem sde1
> Ending clean XFS mount for filesystem: sde1
> ffff881808615200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> Filesystem "sde1": XFS internal error xfs_alloc_read_agf at line 2157 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa01d7989
.....
> bash-4.1# umount /dtfs_data/data1
> bash-4.1# echo 3 > /proc/sys/vm/drop_caches
> bash-4.1# for i in `seq 0 4`; do xfs_db -l /dev/sda5 -c "sb $i" -c p -c "agf $i" -c p /dev/sde1; done
> xfs_db: cannot init perag data (117)

xfs_db sees the corruption, too. What is corrupted?

> magicnum = 0x58465342
> blocksize = 4096
> dblocks = 1183011584

sb 0 is fine.

> magicnum = 0x58414746
> versionnum = 1
> seqno = 0

AGF 0 is fine.

So are SB/AGF 1.

> magicnum = 0
> blocksize = 0
> dblocks = 0

SB 2 is zeroed.

> magicnum = 0
> versionnum = 0
> seqno = 0

AGF 2 is zeroed.

> magicnum = 0x58465342
> blocksize = 4096
> dblocks = 1183011584

And SB/AGF 3 and 4 are ok, too.
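If you need to repeat this check a few times, the by-eye pass over the xfs_db output can be scripted. This is a minimal sketch, assuming each iteration of the loop prints the superblock before the AGF, so the "magicnum" lines alternate sb/agf for each AG in turn. The function name check_xfs_headers is made up for illustration; the expected magic numbers are the ones printed for the good headers above.

```shell
# Hypothetical helper: summarise output of the xfs_db sb/agf loop.
# Assumption: per AG, "sb N" + p is printed before "agf N" + p, so the
# Nth "magicnum = ..." line is sb when N is even and agf when N is odd.
# Usage (from the session above):
#   for i in `seq 0 4`; do xfs_db -l /dev/sda5 -c "sb $i" -c p \
#       -c "agf $i" -c p /dev/sde1; done 2>&1 | check_xfs_headers
check_xfs_headers() {
    awk '
        $1 == "magicnum" && $2 == "=" {
            ag   = int(n / 2)                            # AG this header belongs to
            kind = (n % 2 == 0) ? "sb" : "agf"
            want = (kind == "sb") ? "0x58465342" : "0x58414746"
            status = ($3 == want) ? "ok" : "BAD (magicnum = " $3 ")"
            printf "AG %d %-3s %s\n", ag, kind, status
            n++
        }
    '
}
```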

So the filesystem headers at the start of AG 2, at roughly the 2TB
offset, are zero. That tends to point to a block device problem, as
2TB is where a 32 bit sector count overflows (2^32 sectors x 512
bytes = 2 TiB). Next step
is to run blktrace/blkparse on the cp workload that generates the
error to see if anything actually writes to the 2TB offset region,
and if so, where it comes from.
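The 2TB arithmetic can be checked against the mkfs geometry quoted above. A quick sketch that prints each AG's first 512-byte sector relative to 2^32; the variable and function names are just for illustration, the values are agsize, bsize and agcount from the mkfs output.

```shell
# Sketch: starting sector of each AG versus the 2^32-sector (2 TiB) mark.
# Geometry copied from the mkfs.xfs output above; the names AGSIZE, BSIZE,
# AGCOUNT, LIMIT and ag_start_sector are illustrative only.
AGSIZE=268435200       # agsize, in filesystem blocks
BSIZE=4096             # bsize, bytes per filesystem block
AGCOUNT=5              # agcount
LIMIT=$(( 1 << 32 ))   # first 512-byte sector number that needs 33 bits

ag_start_sector() {
    # $1 = AG number; prints that AG's first 512-byte sector
    echo $(( $1 * AGSIZE * BSIZE / 512 ))
}

for ag in $(seq 0 $(( AGCOUNT - 1 ))); do
    s=$(ag_start_sector "$ag")
    printf 'AG %d: sector %d (%d relative to 2^32)\n' "$ag" "$s" $(( s - LIMIT ))
done
```

With this geometry AG 2 starts at sector 4294963200, i.e. 4096 sectors (2 MiB) below 2^32, so the AG 2 superblock and AGF sit immediately below the point where a 32 bit sector number wraps.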

Probably best to compress the resultant blkparse output file - it
might be quite large but the text will compress well.
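For the blktrace/blkparse step, a rough sketch. The capture commands in the comments use the device, log device and mount point from this thread; the output file names, the 65536-sector window and the function name writes_near are assumptions for illustration. The filter assumes blkparse's default text output (device, cpu, sequence, time, pid, action, RWBS, sector + count, process) and prints queued writes whose range overlaps a window around a given sector. Depending on whether the trace reports partition-relative or whole-disk sectors, you may need to add the partition's start sector (from /sys/block/sde/sde1/start) to the centre you pass in.

```shell
# Capture (as root), roughly:
#   blktrace -d /dev/sde1 -o sde1 &       # writes sde1.blktrace.<cpu>
#   cp random_data.1G /dtfs_data/data1/foo3; sync
#   kill -INT $!; wait
#   blkparse -i sde1 -o sde1.blkparse.txt
#   xz -9 sde1.blkparse.txt               # text compresses well
#
# Hypothetical filter: queued ("Q") writes overlapping [center-window,
# center+window). Assumes default blkparse fields:
#   dev cpu seq time pid action rwbs sector + nsectors [process]
writes_near() {
    local center=${1:-$(( 1 << 32 ))}     # sector of interest (default 2^32)
    local window=${2:-65536}              # sectors either side of it
    awk -v lo=$(( center - window )) -v hi=$(( center + window )) '
        $6 == "Q" && $7 ~ /W/ && $9 == "+" {
            start = $8; end = $8 + $10    # request covers [start, end)
            if (start < hi && end > lo) print
        }
    '
}
```

For example, `xzcat sde1.blkparse.txt.xz | writes_near` would list any writes queued near sector 2^32.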

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 8+ messages
2012-10-11 17:52 File system corruption Wayne Walker
2012-10-11 18:03 ` Wayne Walker
2012-10-11 21:07 ` Dave Chinner
     [not found]   ` <50789076.7040402@crossroads.com>
2012-10-13  0:14     ` Dave Chinner [this message]
2012-10-24 21:19       ` Wayne Walker
2012-10-24 22:51         ` Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2009-07-16 18:08 John Quigley
2009-07-16 19:20 ` Eric Sandeen
