From: Eric Sandeen <sandeen@sandeen.net>
To: Jason Detring <detringj@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: Read corruption on ARM
Date: Wed, 27 Feb 2013 15:10:17 -0600
Message-ID: <512E7639.20205@sandeen.net>
In-Reply-To: <CA+AKrqDq5xCNQo1X=MeRBq54ka0FGJEV5Rn6OzwY7eBfJ+8Wkw@mail.gmail.com>

On 2/27/13 12:15 PM, Jason Detring wrote:
> On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
>> On 2/27/13 10:28 AM, Jason Detring wrote:
>>>             find-502   [000]   207.983594: xfs_da_btree_corrupt: dev 7:0 bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller xfs_dir2_leaf_readbuf
>>
>> Was this on the same image as you sent earlier?
> 
> Yes, sorry, I should have said that.  I'm now using the demo image
> with the RasPi exclusively for testing.
> 
> 
>> Ok, so this tells us that it was trying to read sector nr. 0x5a4f8 (369912),
>> or fsblock 46239
>>
>> What's really on disk there?
>>
>> $ xfs_db problemimage.xfs
>> xfs_db> blockget -n
>> xfs_db> daddr 369912
>> xfs_db> blockuse
>> block 49152 (3/0) type sb
>> xfs_db> type text
>> xfs_db> p
>> 000:  58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0 d3  XFSB............
>> ...
>>
>> So it really did have a superblock location that it was reading
>> at that point - the backup SB in the 3rd allocation group, to be exact.
>> But it shouldn't have been trying to read a superblock at this point
>> in the code...
>>
>> Hm, maybe I should have had you enable all xfs tracepoints to get
>> more info about where we thought we were on disk when we were doing this.
>> If you used trace-cmd you can do "trace-cmd record -e xfs*" IIRC.
>> You can do similar echo 1 > /<blah>/xfs*/enable I think for the sysfs
>> route.
>>
>> Can you identify which directory it was that tripped the above error?
> 
> # modprobe xfs-O1-g
> # mount -o loop,ro /xfsdebug/problemimage.xfs /loop
> # find /loop -type d -print0 > list.txt
> # umount /loop
> # rmmod xfs
> # modprobe xfs-O2-g
> # mount -o loop,ro /xfsdebug/problemimage.xfs /loop
> # cat list.txt | xargs -0 -P1 -n1 -I{} sh -c '(dir="{}" ; ls "${dir}" > /dev/null ; sleep 0.1 ; dmesg | tail -n1 | grep Corruption && echo "${dir} is causing problems")'
> ls: reading directory /loop/ruby/1.9.1: Structure needs cleaning
> [35689.975822] XFS (loop0): Corruption detected. Unmount and run xfs_repair
> /loop/ruby/1.9.1 is causing problems
> ...
> 
> OK, I now have a name.  Rebooting to get a clean slate.

Ok, and an inode number:

134 test/ruby/1.9.1

xfs_db> inode 134
xfs_db> p
core.format = 2 (extents)
...
core.aformat = 2 (extents)
...
u.bmx[0-1] = [startoff,startblock,blockcount,extentflag] 0:[0,53675,1,0] 1:[8388608,60304,1,0]

So those are the filesystem blocks it should live in.

Or, if you prefer:

# xfs_bmap -vv test/ruby/1.9.1
test/ruby/1.9.1:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          406096..406103    3 (36184..36191)       8
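
As a cross-check, the startblock values in the extent list can be converted to disk addresses by hand. A minimal sketch in Python, assuming 4096-byte blocks (per the superblock dump), agblklog = 14 (inferred from fsblock 49152 showing up as 3/0 in blockuse) and 15413 blocks per AG (inferred from AG 3 starting at sector 369912). None of these were read from the sb fields directly, and the helper name is made up for illustration:

```python
# Assumed geometry for this image, inferred from the xfs_db output in this
# thread rather than read from the superblock fields.
AGBLKLOG = 14            # fsblock 49152 == (3 << 14) is reported as AG 3, block 0
AGBLOCKS = 15413         # 369912 sectors / 8 sectors per block / 3 AGs
SECTORS_PER_FSB = 4096 // 512


def fsb_to_daddr(fsbno, agblklog=AGBLKLOG, agblocks=AGBLOCKS,
                 sectors_per_fsb=SECTORS_PER_FSB):
    """Convert an XFS fsblock number (agno packed above agblklog bits)
    into a 512-byte sector address on the device."""
    agno = fsbno >> agblklog                  # allocation group number
    agbno = fsbno & ((1 << agblklog) - 1)     # block offset within that AG
    return (agno * agblocks + agbno) * sectors_per_fsb


# First extent of inode 134, and the block the trace says was actually read.
print(fsb_to_daddr(53675))       # sector of the directory's first block
print(hex(fsb_to_daddr(49152)))  # sector of AG 3, block 0
```

With those assumptions, startblock 53675 comes out at sector 406096, matching the xfs_bmap output, while sector 0x5a4f8 corresponds to block 0 of AG 3, i.e. the same AG with the offset within the AG missing.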

Here's the relevant part of the trace, from the readdir of that inode (ino 0x86 is inode 134):

   ls-520   xfs_readdir:          ino 0x86
   ls-520   xfs_perag_get:        agno 3 refcount 2 caller _xfs_buf_find
   ls-520   xfs_perag_put:        agno 3 refcount 1 caller _xfs_buf_find
   ls-520   xfs_buf_init:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ caller xfs_buf_get_map

By here we're already looking for a block (bno 0x5a4f8) which isn't related to the dir.

   ls-520   xfs_perag_get:        agno 3 refcount 2 caller _xfs_buf_find
   ls-520   xfs_buf_get:          bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 flags READ caller xfs_buf_read_map
   ls-520   xfs_buf_read:         bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 flags READ caller xfs_trans_read_buf_map
   ls-520   xfs_buf_iorequest:    bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_read
   ls-520   xfs_buf_hold:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   ls-520   xfs_buf_rele:         bno 0x5a4f8 nblks 0x8 hold 2 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   ls-520   xfs_buf_iowait:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_read
loop0-514   xfs_buf_ioerror:      bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 error 0 flags READ|PAGES caller xfs_buf_bio_end_io
loop0-514   xfs_buf_iodone:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_ioend
   ls-520   xfs_buf_iowait_done:  bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller _xfs_buf_read
   ls-520   xfs_da_btree_corrupt: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller xfs_dir2_leaf_readbuf

And here's where we notice that fact, I think.

   ls-520   xfs_buf_unlock:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 1 flags DONE|PAGES caller xfs_trans_brelse
   ls-520   xfs_buf_rele:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 1 flags DONE|PAGES caller xfs_trans_brelse
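
For picking this sort of thing out of a longer trace, a rough sketch in Python that pulls the bno out of each trace line and reports any buffer that falls outside the sector range the directory is expected to occupy. The expected range is taken from the xfs_bmap output for this directory; the function name and the sample lines are only for illustration:

```python
import re

# Sector range of the directory's first extent, from the xfs_bmap output.
EXPECTED = [(406096, 406103)]

# Matches the "bno 0x..." field printed by the xfs_buf_* tracepoints.
BNO_RE = re.compile(r"\bbno (0x[0-9a-fA-F]+)")


def unexpected_bnos(trace_lines, expected=EXPECTED):
    """Return the sorted buffer sector numbers that lie outside every
    (first, last) range in `expected`. Lines without a bno are skipped."""
    seen = set()
    for line in trace_lines:
        match = BNO_RE.search(line)
        if match is None:
            continue
        bno = int(match.group(1), 16)
        if not any(first <= bno <= last for first, last in expected):
            seen.add(bno)
    return sorted(seen)


sample = [
    "ls-520   xfs_readdir:          ino 0x86",
    "ls-520   xfs_buf_init:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 "
    "lock 0 flags READ caller xfs_buf_get_map",
    "ls-520   xfs_da_btree_corrupt: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 "
    "lock 0 flags DONE|PAGES caller xfs_dir2_leaf_readbuf",
]
print([hex(b) for b in unexpected_bnos(sample)])
```

This only checks addresses against a hand-supplied range, so it says nothing about why the wrong block was requested.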

Not yet sure what's up here.  I'd probably need to get a cross-compiled xfs.ko going on my rpi to do more debugging...

-Eric

