From: Emmanuel Florac <eflorac@intellique.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: easily reproducible filesystem crash on rebuilding array
Date: Tue, 16 Dec 2014 12:34:05 +0100 [thread overview]
Message-ID: <20141216123405.111c7ac0@harpe.intellique.com> (raw)
In-Reply-To: <20141215201036.GQ24183@dastard>
On Tue, 16 Dec 2014 07:10:36 +1100,
Dave Chinner <david@fromorbit.com> wrote:
>
> Deprecated Sysctls
> ==================
>
> fs.xfs.xfsbufd_centisecs (Min: 50 Default: 100 Max: 3000)
> Dirty metadata is now tracked by the log subsystem and
> flushing is driven by log space and idling demands. The
> xfsbufd no longer exists, so this sysctl does nothing.
>
> Due for removal in 3.14.
>
> Seems like the removal patch is overdue....
Probably, because /proc/sys/fs/xfs/xfsbufd_centisecs is still present on
my 3.16.7....
>
> > Trying
> > right away... Any advice welcome.
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
I think I included some of that info in the first message:
kernel: plain vanilla 3.16.7
xfsprogs: 3.2.1
CPU: Opteron 6212, 8 cores
MemTotal: 16451948 kB
MemFree: 145756 kB
MemAvailable: 16190184 kB
Buffers: 146780 kB
Cached: 15457656 kB
SwapCached: 0 kB
Active: 304216 kB
Inactive: 15389180 kB
Active(anon): 80012 kB
Inactive(anon): 12844 kB
Active(file): 224204 kB
Inactive(file): 15376336 kB
Unevictable: 3444 kB
Mlocked: 3444 kB
SwapTotal: 976892 kB
SwapFree: 976892 kB
Dirty: 1334032 kB
Writeback: 0 kB
AnonPages: 92444 kB
Mapped: 30116 kB
Shmem: 1688 kB
Slab: 528524 kB
SReclaimable: 504668 kB
SUnreclaim: 23856 kB
KernelStack: 5008 kB
PageTables: 6204 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 9202864 kB
Committed_AS: 614164 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 334792 kB
VmallocChunk: 34359296000 kB
HardwareCorrupted: 0 kB
DirectMap4k: 10816 kB
DirectMap2M: 2068480 kB
DirectMap1G: 14680064 kB
# cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / reiserfs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=1645196k,mode=755 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,relatime,size=10240k,mode=755 0 0
tmpfs /run/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=3485760k 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
guitare:/mnt/raid/partage /mnt/partage nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.1.5,mountvers=3,mountport=50731,mountproto=udp,local_lock=none,addr=10.0.1.5 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
taiko:/mnt/raid/shared/partage /mnt/shared nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.1.12,mountvers=3,mountport=56679,mountproto=udp,local_lock=none,addr=10.0.1.12 0 0
/dev/mapper/vg0-raid /mnt/raid xfs rw,relatime,attr2,nobarrier,inode64,noquota 0 0
# cat /proc/partitions
major minor #blocks name
8 0 54683228160 sda
8 1 4881408 sda1
8 2 976896 sda2
8 3 4882432 sda3
8 5 54672484352 sda5
254 0 54672482304 dm-0
The RAID hardware is an Adaptec 71685 running the latest firmware
(32033). This is a 16-drive RAID-6 array of 4 TB HGST drives. The
problem occurs repeatedly with any combination of 7xx5 controllers and
3 or 4 TB HGST drives in various RAID-6 configurations, with XFS or JFS
(it never occurs with either ext4 or reiserfs).
As I mentioned, when the disk drives' write cache is on, the corruption
is serious. With the disk cache off, the corruption is minimal; however,
the filesystem still shuts down.
There's an LVM volume on sda5, which is the exercised partition.
The filesystem has been primed with a few (23) terabytes of mixed data:
small (a few KB or less), medium, and big (a few gigabytes or more)
files. Two simultaneous, long-running copies are made (cp -a somedir
someotherdir) while three simultaneous, long-running read operations
run (md5sum -c mydir.md5), all while the array is busy rebuilding. Disk
usage (as reported by iostat -mx 5) stays solidly at 100%, with a
continuous throughput of a few hundred megabytes per second. The full
test runs for about 12 hours (when not failing), and ends up copying
6 TB or so and md5summing 12 TB or so.
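For reference, here is a miniature sketch of that workload, scaled down
to throwaway temp files (the directory names are placeholders, not the
real data set, and the real runs last hours rather than seconds):

```shell
#!/bin/sh
# Miniature of the stress workload: concurrent copies plus concurrent
# checksum verification. Paths are throwaway stand-ins for the real
# multi-terabyte data set on the XFS volume.
set -e
base=$(mktemp -d)
mkdir -p "$base/somedir"
printf 'payload\n' > "$base/somedir/file"
( cd "$base" && md5sum somedir/file > mydir.md5 )

# Two simultaneous copies (cp -a somedir someotherdir)...
cp -a "$base/somedir" "$base/otherdir1" &
cp -a "$base/somedir" "$base/otherdir2" &

# ...while read-side checksum verification runs concurrently.
( cd "$base" && md5sum -c --quiet mydir.md5 ) &

wait
echo "workload pass complete"
```

In the real test this runs against the rebuilding array, with iostat
watching utilization.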
> I'd start with upgrading the firmware on your RAID controller and
> turning the XFS error level up to 11....
The firmware is the latest available. How do I turn the error level up
to 11, please?
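I suppose you mean the fs.xfs.error_level sysctl? If so, presumably
something like this sysctl.conf fragment would do it (guessing at the
file name; the knob is also writable at /proc/sys/fs/xfs/error_level):

```
# /etc/sysctl.d/xfs-debug.conf -- hypothetical config fragment
# fs.xfs.error_level: higher values = more verbose XFS error reporting
fs.xfs.error_level = 11
```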
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs