From: Eric Sandeen <sandeen@sandeen.net>
To: Jason Detring <detringj@gmail.com>
Cc: xfs@oss.sgi.com
Subject: Re: Read corruption on ARM
Date: Wed, 27 Feb 2013 15:10:17 -0600
Message-ID: <512E7639.20205@sandeen.net>
In-Reply-To: <CA+AKrqDq5xCNQo1X=MeRBq54ka0FGJEV5Rn6OzwY7eBfJ+8Wkw@mail.gmail.com>

On 2/27/13 12:15 PM, Jason Detring wrote:
> On 2/27/13, Eric Sandeen <sandeen@sandeen.net> wrote:
>> On 2/27/13 10:28 AM, Jason Detring wrote:
>>>             find-502   [000]   207.983594: xfs_da_btree_corrupt: dev 7:0 bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller xfs_dir2_leaf_readbuf
>>
>> Was this on the same image as you sent earlier?
> 
> Yes, sorry, I should have said that.  I'm now using the demo image
> with the RasPi exclusively for testing.
> 
> 
>> Ok, so this tells us that it was trying to read sector nr. 0x5a4f8 (369912),
>> or fsblock 46239 (369912 sectors / 8 sectors per 4096-byte block)
>>
>> What's really on disk there?
>>
>> $ xfs_db problemimage.xfs
>> xfs_db> blockget -n
>> xfs_db> daddr 369912
>> xfs_db> blockuse
>> block 49152 (3/0) type sb
>> xfs_db> type text
>> xfs_db> p
>> 000:  58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0 d3  XFSB............
>> ...
>>
>> So it really was reading a superblock location at that point: the backup
>> SB in allocation group 3 (AG 3, block 0), to be exact.
>> But it shouldn't have been trying to read a superblock at this point
>> in the code...
>>
>> Hm, maybe I should have had you enable all xfs tracepoints to get
>> more info about where we thought we were on disk when we were doing this.
>> If you used trace-cmd you can do "trace-cmd record -e xfs*" IIRC.
>> You can do something similar with echo 1 > /<blah>/xfs*/enable, I think,
>> for the debugfs tracing route.
>>
>> Can you identify which directory it was that tripped the above error?
> 
> # modprobe xfs-O1-g
> # mount -o loop,ro /xfsdebug/problemimage.xfs /loop
> # find /loop -type d -print0 > list.txt
> # umount /loop
> # rmmod xfs
> # modprobe xfs-O2-g
> # mount -o loop,ro /xfsdebug/problemimage.xfs /loop
> # cat list.txt | xargs -0 -P1 -n1 -I{} sh -c '(dir="{}" ; ls "${dir}" > /dev/null ; sleep 0.1 ; dmesg | tail -n1 | grep Corruption && echo "${dir} is causing problems")'
> ls: reading directory /loop/ruby/1.9.1: Structure needs cleaning
> [35689.975822] XFS (loop0): Corruption detected. Unmount and run xfs_repair
> /loop/ruby/1.9.1 is causing problems
> ...
> 
> OK, I now have a name.  Rebooting to get a clean slate.
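
Since ls itself fails with "Structure needs cleaning" (the strerror text for EUCLEAN, which is what XFS returns when it detects corruption), the directory hunt above could also key off ls's exit status instead of grepping the last dmesg line after a sleep. A minimal sketch, assuming the same NUL-separated list.txt built while the known-good module was loaded; check_dirs is a made-up name:

```shell
# check_dirs LIST: hypothetical helper. LIST holds NUL-separated directory
# names, as written by `find ... -print0` under the known-good module.
# Prints every directory that `ls` fails to read under the module now loaded.
check_dirs() {
    xargs -0 sh -c '
        for dir do
            # ls exits non-zero on errors such as EUCLEAN
            if ! ls "$dir" > /dev/null 2>&1; then
                printf "%s is causing problems\n" "$dir"
            fi
        done
    ' sh < "$1"
}

# Example: check_dirs list.txt
```

A failing ls can have other causes (permissions, a directory that no longer exists), so the kernel log is still worth checking for the XFS "Corruption detected" message on anything this flags.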

Ok, and an inode number:

134 test/ruby/1.9.1

xfs_db> inode 134
xfs_db> p
core.format = 2 (extents)
...
core.aformat = 2 (extents)
...
u.bmx[0-1] = [startoff,startblock,blockcount,extentflag] 0:[0,53675,1,0] 1:[8388608,60304,1,0]

so those are the blocks the directory should live in (the second extent, at
file offset 8388608 = 32 GiB / 4096, is its leaf block).
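
The startblock values above are xfs_db fsblock numbers, which keep the AG number in the bits above agblklog, while xfs_bmap and the trace use linear 512-byte disk addresses (daddrs). A minimal sketch of the conversion; the geometry (4096-byte blocks, 15413 blocks per AG, agblklog 14) is an assumption inferred from the numbers in this thread, not read from the image, so check it against the superblock (e.g. xfs_db -r -c 'sb 0' -c p problemimage.xfs) before reusing it. The function names are hypothetical:

```shell
# Assumed geometry, inferred from the numbers in this thread, not read
# from the image: 4096-byte blocks, 15413 blocks per AG, agblklog = 14.
AGBLOCKS=15413
AGBLKLOG=14
SECTS_PER_BLK=8   # 4096-byte block / 512-byte sector

# xfs_db fsblock numbers keep the AG number in the bits above AGBLKLOG.
fsb_to_agno()  { echo $(( $1 >> AGBLKLOG )); }
fsb_to_agbno() { echo $(( $1 & ((1 << AGBLKLOG) - 1) )); }

# Convert an xfs_db fsblock to a linear 512-byte disk address (daddr),
# the unit of xfs_bmap's BLOCK-RANGE and of the trace's bno field.
fsb_to_daddr() {
    echo $(( (($1 >> AGBLKLOG) * AGBLOCKS + ($1 & ((1 << AGBLKLOG) - 1))) * SECTS_PER_BLK ))
}

for fsb in 53675 60304 49152; do
    echo "$fsb: AG $(fsb_to_agno "$fsb") block $(fsb_to_agbno "$fsb") daddr $(fsb_to_daddr "$fsb")"
done
```

Under those assumptions the first line reproduces the 406096 that xfs_bmap reports for extent 0, and the last turns blockuse's "3/0" superblock (fsblock 49152) into daddr 369912, i.e. 0x5a4f8, the bno of the failing read. The middle line is the second bmx extent.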

Or, if you prefer:

# xfs_bmap -vv test/ruby/1.9.1
test/ruby/1.9.1:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          406096..406103    3 (36184..36191)       8

Here's the relevant part of the trace, from the readdir of that inode:

   ls-520   xfs_readdir:          ino 0x86
   ls-520   xfs_perag_get:        agno 3 refcount 2 caller _xfs_buf_find
   ls-520   xfs_perag_put:        agno 3 refcount 1 caller _xfs_buf_find
   ls-520   xfs_buf_init:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ caller xfs_buf_get_map

By this point we're already looking up a block (bno 0x5a4f8) that has nothing to do with the directory.
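
The trace prints bno as a hex daddr (0x5a4f8 = 369912), and nblks 0x8 sectors is the same 4096 bytes as len 0x1000. A minimal sketch that turns such a daddr back into an AG number and AG-relative block, assuming 4096-byte blocks and 15413 blocks per AG (inferred from the numbers in this thread, not read from the image); daddr_to_ag is a made-up name:

```shell
# Assumed geometry, inferred from this thread rather than read from the image.
AGBLOCKS=15413     # blocks per AG
SECTS_PER_BLK=8    # 4096-byte blocks / 512-byte sectors

# daddr_to_ag DADDR: print "AG <agno> block <agbno>" for a 512-byte disk
# address; DADDR may be hex (0x...) as printed in the trace's bno field.
daddr_to_ag() {
    blk=$(( $1 / SECTS_PER_BLK ))          # linear filesystem block
    echo "AG $(( blk / AGBLOCKS )) block $(( blk % AGBLOCKS ))"
}

daddr_to_ag 0x5a4f8    # the block the trace shows being read
daddr_to_ag 406096     # extent 0 of the directory, per xfs_bmap
printf '%d\n' 0x86     # the trace's ino field, in decimal
```

This gives AG 3 block 0 (the backup superblock) for the traced read, versus AG 3 block 4523 for the directory's first block, and confirms that the trace's ino 0x86 is inode 134.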

   ls-520   xfs_perag_get:        agno 3 refcount 2 caller _xfs_buf_find
   ls-520   xfs_buf_get:          bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 flags READ caller xfs_buf_read_map
   ls-520   xfs_buf_read:         bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 flags READ caller xfs_trans_read_buf_map
   ls-520   xfs_buf_iorequest:    bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_read
   ls-520   xfs_buf_hold:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   ls-520   xfs_buf_rele:         bno 0x5a4f8 nblks 0x8 hold 2 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
   ls-520   xfs_buf_iowait:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_read
loop0-514   xfs_buf_ioerror:      bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 error 0 flags READ|PAGES caller xfs_buf_bio_end_io
loop0-514   xfs_buf_iodone:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_ioend
   ls-520   xfs_buf_iowait_done:  bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller _xfs_buf_read
   ls-520   xfs_da_btree_corrupt: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller xfs_dir2_leaf_readbuf

And here, I think, is where we notice that.
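
To double-check what the verifier saw without going through xfs_db, one can dump the first four bytes of the sector at a given daddr. A minimal sketch; peek_magic is a made-up helper, and it assumes dd plus a head that supports -c (GNU coreutils and busybox both do):

```shell
# peek_magic IMAGE DADDR: hypothetical helper. Print the first four bytes of
# the 512-byte sector at DADDR (decimal or 0x-hex) in IMAGE.
peek_magic() {
    dd if="$1" bs=512 skip=$(( $2 )) count=1 2>/dev/null | head -c 4
    echo
}

# Example: peek_magic /xfsdebug/problemimage.xfs 0x5a4f8
```

Going by the xfs_db dump earlier in the thread, that example should print XFSB. A directory data block would normally start with XD2D (XFS_DIR2_DATA_MAGIC), which is presumably the check that failed here.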

   ls-520   xfs_buf_unlock:       bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 1 flags DONE|PAGES caller xfs_trans_brelse
   ls-520   xfs_buf_rele:         bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 1 flags DONE|PAGES caller xfs_trans_brelse

Not yet sure what's up here.  I'd probably need to get a cross-compiled xfs.ko going on my rpi to do more debugging...

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

