public inbox for linux-xfs@vger.kernel.org
From: Eric Sandeen <sandeen@sandeen.net>
To: "Török Edwin" <edwin@skylable.com>, "Brian Foster" <bfoster@redhat.com>
Cc: Christopher Squires <christopher.squires@hgst.com>,
	Wayne Burri <wayne.burri@hgst.com>,
	Luca Gibelli <luca@skylable.com>,
	xfs@oss.sgi.com
Subject: Re: PROBLEM: XFS on ARM corruption 'Structure needs cleaning'
Date: Thu, 11 Jun 2015 10:58:44 -0500
Message-ID: <5579B034.4070503@sandeen.net>
In-Reply-To: <5579AE85.5080203@sandeen.net>

On 6/11/15 10:51 AM, Eric Sandeen wrote:
> On 6/11/15 10:28 AM, Török Edwin wrote:
>> On 06/11/2015 06:16 PM, Brian Foster wrote:
>>> On Thu, Jun 11, 2015 at 09:23:38AM +0300, Török Edwin wrote:
>>>> [1.] XFS on ARM corruption 'Structure needs cleaning'
>>>> [2.] Full description of the problem/report:
>>>>
>>>> I have been running XFS successfully on x86-64 for years, however I'm having trouble running it on ARM.
>>>>
>>>> Running the testcase below [7.] reliably reproduces the filesystem corruption starting from a freshly
>>>> created XFS filesystem: running ls after 'sxadm node --new --batch /export/dfs/a/b' shows a 'Structure needs cleaning' error,
>>>> and dmesg shows a corruption error [6.].
>>>> xfs_repair 3.1.9 is not able to repair the corruption: after mounting the repaired filesystem
>>>> I still get the 'Structure needs cleaning' error.
>>>>
>>>> Note: using /export/dfs/a/b is important for reproducing the problem: if I only use one level of directories in /export/dfs then the problem
>>>> doesn't reproduce. Also if I use a tuned version of sxadm that creates fewer database files then the problem doesn't reproduce either.
>>>>
>>>> [3.] Keywords: filesystems, XFS corruption, ARM
>>>> [4.] Kernel information
>>>> [4.1.] Kernel version (from /proc/version):
>>>> Linux hornet34 3.14.3-00088-g7651c68 #24 Thu Apr 9 16:13:46 MDT 2015 armv7l GNU/Linux
>>>>
>>> ...
>>>> [5.] Most recent kernel version which did not have the bug: Unknown, this is the first kernel I have tried on ARM
>>>>
>>>> [6.] dmesg stacktrace
>>>>
>>>> [4627578.440000] XFS (sda4): Mounting Filesystem
>>>> [4627578.510000] XFS (sda4): Ending clean mount
>>>> [4627621.470000] dd6ee000: 58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00  XFSB........7@!.
>>>> [4627621.480000] dd6ee010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>> [4627621.490000] dd6ee020: 5b 08 7f 79 0e 3a 46 3d 9b ea 26 ad 9d 62 17 8d  [..y.:F=..&..b..
>>>> [4627621.490000] dd6ee030: 00 00 00 00 20 00 00 04 00 00 00 00 00 00 00 80  .... ...........
>>>
>>> Just a data point... the magic number here looks like a superblock magic
>>> (XFSB) rather than one of the directory magic numbers. I'm wondering if
>>> a buffer disk address has gone bad somehow or another.
>>>
>>> Does this happen to be a large block device? I don't see any partition
>>> or xfs_info data below. If so, it would be interesting to see if this
>>> reproduces on a smaller device. It does appear that the large block
>>> device option is enabled in the kernel config above, however, so maybe
>>> that's unrelated.
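
The observation about the magic number can be checked by decoding the 64 dumped bytes as the start of an on-disk XFS superblock. This is only a sketch: the field order (magic, block size, data blocks, realtime blocks, realtime extents, UUID, log start, root inode, all big-endian) is the standard superblock layout, but the names `DUMP`, `SbHead` and `decode` are made up for illustration and nothing here comes from the thread itself.

```python
import struct
import uuid
from collections import namedtuple

# The 64 bytes from the dmesg hex dump quoted above (dd6ee000..dd6ee03f).
DUMP = bytes.fromhex(
    "58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "5b 08 7f 79 0e 3a 46 3d 9b ea 26 ad 9d 62 17 8d "
    "00 00 00 00 20 00 00 04 00 00 00 00 00 00 00 80"
)

# Hypothetical helper type: the first eight superblock fields.
SbHead = namedtuple(
    "SbHead",
    "magic blocksize dblocks rblocks rextents uuid logstart rootino",
)

def decode(buf):
    """Unpack the first 64 bytes of a buffer as big-endian superblock fields."""
    fields = struct.unpack(">4sIQQQ16sQQ", buf[:64])
    return SbHead(*fields)

sb = decode(DUMP)
print("magic     :", sb.magic)       # superblock magic is b"XFSB"
print("blocksize :", sb.blocksize)
print("dblocks   :", sb.dblocks)
print("uuid      :", uuid.UUID(bytes=sb.uuid))
print("rootino   :", sb.rootino)

# v2 directory blocks would start with one of these magics instead.
DIR2_MAGICS = {b"XD2B", b"XD2D", b"XD2F"}
print("looks like a directory block:", sb.magic in DIR2_MAGICS)
```

Decoded this way, the data block count in the dump equals the blocks= value that mkfs.xfs reports for /dev/sda4 further down, which is consistent with the buffer holding a copy of that filesystem's superblock rather than directory data.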
>>
>> This is mkfs.xfs /dev/sda4:
>> meta-data=/dev/sda4              isize=256    agcount=4, agsize=231737408 blks
>>          =                       sectsz=512   attr=2, projid32bit=0
>> data     =                       bsize=4096   blocks=926949632, imaxpct=5
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>> log      =internal log           bsize=4096   blocks=452612, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>>
>> But it also reproduces with this small loopback file:
>> meta-data=/tmp/xfs.test          isize=256    agcount=2, agsize=5120 blks
>>          =                       sectsz=512   attr=2, projid32bit=0
>> data     =                       bsize=4096   blocks=10240, imaxpct=25
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>> log      =internal log           bsize=4096   blocks=1200, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> ok so not a block number overflow issue, thanks.
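
The arithmetic behind that conclusion can be sketched as follows. The assumption, which is not spelled out in the thread, is that the overflow being ruled out is a disk address in 512-byte sectors no longer fitting in 32 bits; the helper names are hypothetical.

```python
SECTOR_SIZE = 512  # basic block size used for disk addresses

def sectors(blocks, bsize=4096):
    """Number of 512-byte sectors covered by `blocks` filesystem blocks."""
    return blocks * bsize // SECTOR_SIZE

def fits_in_32_bits(n):
    """True if n can be stored in an unsigned 32-bit integer."""
    return n < 2**32

# Geometry taken from the two mkfs.xfs outputs quoted above.
sda4_blocks = 4 * 231737408      # agcount * agsize
loop_blocks = 2 * 5120

for name, blocks in (("/dev/sda4", sda4_blocks), ("/tmp/xfs.test", loop_blocks)):
    n = sectors(blocks)
    print(f"{name}: {blocks} blocks, {n} sectors, "
          f"{blocks * 4096 / 2**20:.0f} MiB, "
          f"sector count fits in 32 bits: {fits_in_32_bits(n)}")
```

Under that assumption the large partition could in principle hit a 32-bit sector overflow, but the 40 MiB loopback image cannot, so a reproduction on the small image points away from address overflow.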
> 
>> You can have a look at xfs.test here: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs.test.gz
>>
>> If I loopback mount that on an x86-64 box it doesn't show the corruption message though ...
> 
> FWIW, this is the 2nd report we've had of something similar, both on ARMv7, both ok on x86_64.
> 
> I'll take a look at your xfs.test; that's presumably copied after it reported the error, and you unmounted it before uploading, correct?  And it was mkfs'd on armv7, never mounted or manipulated in any way on x86_64?

Oh, and what were the kernel messages when you produced the corruption with xfs.test?

thanks,
-Eric
