Weird XFS Corruption Error

* Weird XFS Corruption Error
@ 2014-01-22 16:09 Sascha Askani
  2014-01-22 23:31 ` Dave Chinner
  2014-01-23 14:19 ` Emmanuel Florac
  0 siblings, 2 replies; 7+ messages in thread
From: Sascha Askani @ 2014-01-22 16:09 UTC
  To: xfs

Hi everybody,

We experienced a weird XFS corruption yesterday, and I am desperately trying to find out what happened.
First, the setup:

* ProLiant DL380p Gen8
* 256GB RAM
* HP SmartArray P420i Controller
** 1 GB BBWC
** Firmware Version 4.68
** 20x MK0100GCTYU 100GB SSD drives
** RAID 1+0
* LVM
* Ubuntu 12.04 LTS
* Kernel 3.11.0-15-generic #23~precise1-Ubuntu

fstab entry:
/dev/vg00/opt_mysqlbackup   /opt/mysqlbackup            xfs     nobarrier,noatime,nodiratime,logbufs=8,logbsize=256k       0 2

We created a 120GB LV mounted on /opt/mysqlbackup, which (obviously) hosts our MariaDB backups temporarily until they are transferred to tape. We use mylvmbackup (http://www.lenzg.net/mylvmbackup/) to create a tar.gz file (approx. 55GB) containing the dump. While testing, I created hardlinks to 2 files in a subdirectory ("safe") and forgot about them for a day, during which the "original" files were deleted and replaced by the next day's backup.
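For anyone trying to reproduce this, the sequence of operations described above (hardlink into safe/, delete and recreate the originals, then remove safe/) can be replayed with a short script. This is a minimal sketch under stated assumptions: the file names are hypothetical stand-ins for the real backup files, it uses tiny files instead of the ~55GB archive, and it runs in a temporary directory. To exercise XFS itself, pass a directory on the affected LV, e.g. `tempfile.TemporaryDirectory(dir="/opt/mysqlbackup")`. It only replays the namespace operations, not the I/O load of a real backup run.

```python
import os
import shutil
import tempfile


def replay_hardlink_sequence(base: str) -> dict:
    """Replay the link / replace / rm -rf sequence inside `base`.

    The file names are hypothetical stand-ins for the real backup files.
    Returns the link counts observed along the way.
    """
    names = ["daily_snapshot.tar.gz", "daily_snapshot.tar.gz.md5"]
    safe = os.path.join(base, "safe")
    os.mkdir(safe)
    counts = {}

    # Day 1: write the backup files and hardlink them into safe/.
    for name in names:
        original = os.path.join(base, name)
        with open(original, "wb") as f:
            f.write(b"day 1 backup\n")
        os.link(original, os.path.join(safe, name))
    counts["after_link"] = os.stat(os.path.join(safe, names[0])).st_nlink

    # Day 2: delete the originals and write the next day's backup.
    # The inodes still referenced from safe/ survive with one link each.
    for name in names:
        original = os.path.join(base, name)
        os.remove(original)
        with open(original, "wb") as f:
            f.write(b"day 2 backup\n")
    counts["after_replace"] = os.stat(os.path.join(safe, names[0])).st_nlink

    # Cleanup: the equivalent of `rm -rf safe/`.
    shutil.rmtree(safe)
    counts["safe_exists"] = os.path.exists(safe)
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(replay_hardlink_sequence(tmp))
```

On a healthy filesystem the link count drops from 2 to 1 once the originals are replaced, and the final remove succeeds; an Input/output error at the `rmtree` step would match the failure reported here.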

When I tried cleaning up the no longer needed files, I encountered the following:

---------------------------------------------------------
me@hsoi-gts3-de02:/opt/mysqlbackup$ sudo rm -rf safe/
[sudo] password for saskani:
rm: cannot remove `safe/daily_snapshot.tar.gz.md5': Input/output error
---------------------------------------------------------

dmesg told me:
---------------------------------------------------------
[964199.138848] XFS (dm-8): Internal error xfs_bmbt_read_verify at line 789 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_bmap_btree.c.  Caller 0xffffffffa0164495
[964199.138848]
[964199.138850] CPU: 1 PID: 3694 Comm: kworker/1:1H Tainted: GF            3.11.0-15-generic #23~precise1-Ubuntu
[964199.138851] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013
[964199.138874] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
[964199.138876]  0000000000000001 ffff881c6be6fd18 ffffffff8173bc0e 0000000000004364
[964199.138878]  ffff883f9061c000 ffff881c6be6fd38 ffffffffa016629f ffffffffa0164495
[964199.138879]  0000000000000001 ffff881c6be6fd78 ffffffffa016630e ffff881c6be6fda8
[964199.138880] Call Trace:
[964199.138886]  [<ffffffff8173bc0e>] dump_stack+0x46/0x58
[964199.138906]  [<ffffffffa016629f>] xfs_error_report+0x3f/0x50 [xfs]
[964199.138913]  [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138921]  [<ffffffffa016630e>] xfs_corruption_error+0x5e/0x90 [xfs]
[964199.138928]  [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138939]  [<ffffffffa01944d6>] xfs_bmbt_read_verify+0x76/0xf0 [xfs]
[964199.138946]  [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138949]  [<ffffffff81095bb2>] ? finish_task_switch+0x52/0xf0
[964199.138969]  [<ffffffffa0164495>] xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138972]  [<ffffffff81081060>] process_one_work+0x170/0x4a0
[964199.138973]  [<ffffffff81082121>] worker_thread+0x121/0x390
[964199.138975]  [<ffffffff81082000>] ? manage_workers.isra.21+0x170/0x170
[964199.138977]  [<ffffffff81088fe0>] kthread+0xc0/0xd0
[964199.138979]  [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[964199.138981]  [<ffffffff817508ac>] ret_from_fork+0x7c/0xb0
[964199.138983]  [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[964199.138984] XFS (dm-8): Corruption detected. Unmount and run xfs_repair
[964199.139014] XFS (dm-8): metadata I/O error: block 0x1f0 ("xfs_trans_read_buf_map") error 117 numblks 8
[964199.139016] XFS (dm-8): xfs_do_force_shutdown(0x1) called from line 367 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_trans_buf.c.  Return address = 0xffffffffa01cadbc
[964199.139324] XFS (dm-8): I/O Error Detected. Shutting down filesystem
[964199.139325] XFS (dm-8): Please umount the filesystem and rectify the problem(s)
[964212.367300] XFS (dm-8): xfs_log_force: error 5 returned.
[964242.477283] XFS (dm-8): xfs_log_force: error 5 returned.
---------------------------------------------------------
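The numeric values in the log can be decoded with a few lines of Python. This is a sketch assuming a Linux errno table (where 117 is EUCLEAN, whose message is the "Structure needs cleaning" that mount reported, and 5 is EIO, the "Input/output error" that rm reported) and assuming that the block number in the "metadata I/O error" line is given in 512-byte basic blocks, which is how XFS expresses its disk addresses. The helper names are illustrative only.

```python
import errno
import os

# Values copied from the dmesg output above.
METADATA_ERROR = 117   # "metadata I/O error: ... error 117"
LOG_FORCE_ERROR = 5    # "xfs_log_force: error 5 returned."
DADDR = 0x1F0          # "block 0x1f0"
NUMBLKS = 8            # "numblks 8"

# Assumption: XFS disk addresses are counted in 512-byte basic blocks.
BASIC_BLOCK_SIZE = 512


def describe_errno(code: int) -> str:
    """Return the symbolic name and message for an errno value."""
    return f"{errno.errorcode.get(code, '?')}: {os.strerror(code)}"


def byte_range(daddr: int, numblks: int) -> tuple:
    """Convert a disk address and length in basic blocks to a byte range."""
    start = daddr * BASIC_BLOCK_SIZE
    return start, start + numblks * BASIC_BLOCK_SIZE


if __name__ == "__main__":
    print(describe_errno(METADATA_ERROR))   # EUCLEAN on Linux
    print(describe_errno(LOG_FORCE_ERROR))  # EIO on Linux
    print(byte_range(DADDR, NUMBLKS))       # (253952, 258048)
```

Under that assumption, the block that failed verification is a single 4 KiB buffer starting 248 KiB from the beginning of dm-8.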

After that, I tried the following (in order):

1. ran xfs_repair, which did not find the primary superblock and started scanning the LV; after finding a secondary superblock, it told me there was still something in the log, so I
2. mounted the filesystem, which failed with "Structure needs cleaning" after a couple of seconds
3. tried mounting again for good measure: same error, "Structure needs cleaning"
4. ran xfs_repair -L, which repaired everything and effectively emptied my filesystem in the process.
5. mounted the filesystem, only to find it empty.

Since then, I have been desperately trying to reproduce the problem, unfortunately to no avail. Can somebody give me some insight into the errors I encountered? I previously operated 4.5PB worth of XFS filesystems for 3 years and never saw an error like this.

Best regards
Sascha


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs