From: Brian Foster <bfoster@redhat.com>
To: "Török Edwin" <edwin@skylable.com>
Cc: Karanvir Singh <karanvir.singh@hgst.com>,
	Eric Sandeen <sandeen@sandeen.net>,
	Luca Gibelli <luca@skylable.com>,
	xfs@oss.sgi.com,
	Christopher Squires <christopher.squires@hgst.com>,
	Wayne Burri <wayne.burri@hgst.com>
Subject: Re: PROBLEM: XFS on ARM corruption 'Structure needs cleaning'
Date: Fri, 12 Jun 2015 09:54:04 -0400
Message-ID: <20150612135404.GC60661@bfoster.bfoster>
In-Reply-To: <557AD4D4.3010901@skylable.com>

On Fri, Jun 12, 2015 at 03:47:16PM +0300, Török Edwin wrote:
> On 06/12/2015 03:21 PM, Brian Foster wrote:
> > On Thu, Jun 11, 2015 at 07:32:04PM +0300, Török Edwin wrote:
> >> On 06/11/2015 06:58 PM, Eric Sandeen wrote:
> >>> On 6/11/15 10:51 AM, Eric Sandeen wrote:
> >>>> On 6/11/15 10:28 AM, Török Edwin wrote:
> >>>>> On 06/11/2015 06:16 PM, Brian Foster wrote:
> >>>>>> On Thu, Jun 11, 2015 at 09:23:38AM +0300, Török Edwin wrote:
> >>>>>>> [1.] XFS on ARM corruption 'Structure needs cleaning'
> >>>>>>> [2.] Full description of the problem/report:
> >>>>>>>
> >>>>>>> I have been running XFS successfully on x86-64 for years; however, I'm having trouble running it on ARM.
> >>>>>>>
> >>>>>>> Running the testcase below [7.] reliably reproduces the filesystem corruption starting from a freshly
> >>>>>>> created XFS filesystem: running ls after 'sxadm node --new --batch /export/dfs/a/b' shows a 'Structure needs cleaning' error,
> >>>>>>> and dmesg shows a corruption error [6.].
> >>>>>>> xfs_repair 3.1.9 is not able to repair the corruption: after mounting the repaired filesystem
> >>>>>>> I still get the 'Structure needs cleaning' error.
> >>>>>>>
> >>>>>>> Note: using /export/dfs/a/b is important for reproducing the problem: if I only use one level of directories in /export/dfs then the problem
> >>>>>>> doesn't reproduce. Also if I use a tuned version of sxadm that creates fewer database files then the problem doesn't reproduce either.
> >>>>>>>
> >>>>>>> [3.] Keywords: filesystems, XFS corruption, ARM
> >>>>>>> [4.] Kernel information
> >>>>>>> [4.1.] Kernel version (from /proc/version):
> >>>>>>> Linux hornet34 3.14.3-00088-g7651c68 #24 Thu Apr 9 16:13:46 MDT 2015 armv7l GNU/Linux
> >>>>>>>
> >>>>>> ...
> >>>>>>> [5.] Most recent kernel version which did not have the bug: Unknown, this is the first kernel I have tried on ARM
> >>>>>>>
> >>>>>>> [6.] dmesg stacktrace
> >>>>>>>
> >>>>>>> [4627578.440000] XFS (sda4): Mounting Filesystem
> >>>>>>> [4627578.510000] XFS (sda4): Ending clean mount
> >>>>>>> [4627621.470000] dd6ee000: 58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00  XFSB........7@!.
> >>>>>>> [4627621.480000] dd6ee010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>>>>> [4627621.490000] dd6ee020: 5b 08 7f 79 0e 3a 46 3d 9b ea 26 ad 9d 62 17 8d  [..y.:F=..&..b..
> >>>>>>> [4627621.490000] dd6ee030: 00 00 00 00 20 00 00 04 00 00 00 00 00 00 00 80  .... ...........
> >>>>>>
> >>>>>> Just a data point... the magic number here looks like a superblock magic
> >>>>>> (XFSB) rather than one of the directory magic numbers. I'm wondering if
> >>>>>> a buffer disk address has gone bad somehow or another.
> >>>>>>
> >>>>>> Does this happen to be a large block device? I don't see any partition
> >>>>>> or xfs_info data below. If so, it would be interesting to see if this
> >>>>>> reproduces on a smaller device. It does appear that the large block
> >>>>>> device option is enabled in the kernel config above, however, so maybe
> >>>>>> that's unrelated.
> >>>>>
> >>>>> This is mkfs.xfs /dev/sda4:
> >>>>> meta-data=/dev/sda4              isize=256    agcount=4, agsize=231737408 blks
> >>>>>          =                       sectsz=512   attr=2, projid32bit=0
> >>>>> data     =                       bsize=4096   blocks=926949632, imaxpct=5
> >>>>>          =                       sunit=0      swidth=0 blks
> >>>>> naming   =version 2              bsize=4096   ascii-ci=0
> >>>>> log      =internal log           bsize=4096   blocks=452612, version=2
> >>>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> >>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
> >>>>>
> >>>>> But it also reproduces with this small loopback file:
> >>>>> meta-data=/tmp/xfs.test          isize=256    agcount=2, agsize=5120 blks
> >>>>>          =                       sectsz=512   attr=2, projid32bit=0
> >>>>> data     =                       bsize=4096   blocks=10240, imaxpct=25
> >>>>>          =                       sunit=0      swidth=0 blks
> >>>>> naming   =version 2              bsize=4096   ascii-ci=0
> >>>>> log      =internal log           bsize=4096   blocks=1200, version=2
> >>>>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> >>>>> realtime =none                   extsz=4096   blocks=0, rtextents=0
> >>>>
> >>>> ok so not a block number overflow issue, thanks.
> >>>>
> >>>>> You can have a look at xfs.test here: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs.test.gz
> >>>>>
> >>>>> If I loopback mount that on an x86-64 box it doesn't show the corruption message though ...
> >>>>
> >>>> FWIW, this is the 2nd report we've had of something similar, both on ARMv7, both ok on x86_64.
> >>>>
> >>>> I'll take a look at your xfs.test; that's presumably copied after it reported the error, and you unmounted it before uploading, correct?  And it was mkfs'd on armv7, never mounted or manipulated in any way on x86_64?
> >>
> >> Thanks, yes it was mkfs.xfs on ARMv7 and unmounted.
> >>
> >>>
> >>> Oh, and what were the kernel messages when you produced the corruption with xfs.test?
> >>
> >> It takes only a couple of minutes to reproduce the issue, so I've prepared a fresh set of xfs2.test and corresponding kernel messages to make sure it's all consistent.
> >> Freshly created XFS by mkfs.xfs: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs2.test.orig.gz
> >> The corrupted XFS: http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs2.test.corrupted.gz
> >>
> > 
> > I managed to get an updated kernel on a beaglebone I had sitting around,
> > but I don't reproduce any errors with the "corrupted" image (I think
> > we've established that the image is fine on-disk and something is going
> > awry at runtime):
> > 
> > root@beaglebone:~# uname -a
> > Linux beaglebone 3.14.1+ #5 SMP Thu Jun 11 20:58:02 EDT 2015 armv7l GNU/Linux
> > root@beaglebone:~# mount ./xfs2.test.corrupted /mnt/
> > root@beaglebone:~# ls -al /mnt/a/
> > total 12
> > drwxr-xr-x 3 root root   14 Jun 11 16:11 .
> > drwxr-xr-x 3 root root   14 Jun 11 16:11 ..
> > drwxr-x--- 2 root root 8192 Jun 11 16:11 b
> > root@beaglebone:~# ls -al /mnt/a/b/
> > total 17996
> > drwxr-x--- 2 root root    8192 Jun 11 16:11 .
> > drwxr-xr-x 3 root root      14 Jun 11 16:11 ..
> > -rw-r--r-- 1 root root   12288 Jun 11 16:11 events.db
> > -rw-r--r-- 1 root root   15360 Jun 11 16:11 f00000000.db
> > -rw-r--r-- 1 root root   15360 Jun 11 16:11 f00000001.db
> > -rw-r--r-- 1 root root   15360 Jun 11 16:11 f00000002.db
> > -rw-r--r-- 1 root root   15360 Jun 11 16:11 f00000003.db
> > ...
> > root@beaglebone:~#
> > 
> > I echo Dave's suggestion down thread with regard to toolchain. This
> > kernel was compiled with the following cross-gcc (installed via Fedora
> > package):
> > 
> > 	gcc version 4.9.2 20150212 (Red Hat Cross 4.9.2-5) (GCC) 
> > 
> > Are you using something different?
> 
> /proc/version says:
> 
> Linux version 3.14.3-00088-g7651c68 (jenkins@boulder-jenkins) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #24 Thu Apr 9 16:13:46 MDT 2015
> 
> I'll get back to you when I have a new kernel running.
> 

Ok. FWIW, I just tried rebuilding with the following 4.6.3 toolchain:

https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.6.3/x86_64-gcc-4.6.3-nolibc_arm-unknown-linux-gnueabi.tar.xz

... and still didn't reproduce any errors. Of course, this probably
doesn't have whatever patches and whatnot might be included in the
distro 4.6.3 toolchain. It could be worth a try depending on what
happens with a newer kernel, though.

Brian

> Best regards,
> --Edwin
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
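
The observation that the dumped buffer starts with the superblock magic
(XFSB) rather than a directory magic can be checked by hand. Below is a
minimal sketch that decodes the first 16 bytes of the dmesg dump quoted
above and compares them with the mkfs.xfs output for /dev/sda4. It assumes
the usual XFS on-disk superblock layout (big-endian sb_magicnum,
sb_blocksize, sb_dblocks), which is taken from the XFS on-disk format and
not from this thread; the variable names are illustrative only.

```python
import struct

# First line of the buffer dump from the report (address dd6ee000).
dump_line = "58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00"
raw = bytes.fromhex(dump_line.replace(" ", ""))

# Assumed layout (XFS on-disk format, all big-endian):
#   sb_magicnum  4 bytes
#   sb_blocksize 4 bytes (u32)
#   sb_dblocks   8 bytes (u64)
magic, blocksize, dblocks = struct.unpack(">4sIQ", raw)

# Geometry copied from the mkfs.xfs output for /dev/sda4 in the thread.
mkfs_bsize = 4096
mkfs_blocks = 926949632
mkfs_agcount = 4
mkfs_agsize = 231737408

looks_like_superblock = (magic == b"XFSB")
matches_mkfs = (blocksize == mkfs_bsize and dblocks == mkfs_blocks)
ag_total = mkfs_agcount * mkfs_agsize

# Illustration only: the same device expressed in 512-byte sectors, to show
# why a block-number overflow was worth ruling out on a 32-bit platform.
sectors_512 = dblocks * blocksize // 512
needs_64bit_sector = sectors_512 > 2**32 - 1

print("magic:", magic)
print("blocksize:", blocksize)
print("dblocks:", dblocks)
print("looks like superblock:", looks_like_superblock)
print("matches mkfs output:", matches_mkfs)
print("agcount * agsize:", ag_total)
print("512-byte sectors:", sectors_512, "needs 64-bit:", needs_64bit_sector)
```

If the layout assumption holds, the dumped buffer is the filesystem's own
superblock (block size and block count agree with mkfs.xfs), which fits the
theory that a directory read was handed the wrong buffer at runtime rather
than that the directory block was damaged on disk.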
