* Weird XFS Corruption Error
@ 2014-01-22 16:09 Sascha Askani
2014-01-22 23:31 ` Dave Chinner
2014-01-23 14:19 ` Emmanuel Florac
0 siblings, 2 replies; 7+ messages in thread
From: Sascha Askani @ 2014-01-22 16:09 UTC (permalink / raw)
To: xfs
Hi everybody,
We experienced a weird XFS corruption yesterday, and I'm desperately trying to find out what happened.
First, the setup:
* ProLiant DL380p Gen8
* 256GB RAM
* HP SmartArray P420i Controller
** 1 GB BBWC
** Firmware Version 4.68
** 20x MK0100GCTYU 100GB SSD Drives
** Raid 1+0
* LVM
* Ubuntu 12.10 LTS
* Kernel 3.11.0-15-generic #23~precise1-Ubuntu
fstab Entry:
/dev/vg00/opt_mysqlbackup /opt/mysqlbackup xfs nobarrier,noatime,nodiratime,logbufs=8,logbsize=256k 0 2
We created a 120GB LV mounted on /opt/mysqlbackup, which temporarily hosts our MariaDB backups until they are transferred to tape. We use mylvmbackup (http://www.lenzg.net/mylvmbackup/) to create an (approx. 55GB) tar.gz file containing the dump. While testing, I created hardlinks for 2 files in a subdir ("safe") and forgot them for a day, while the "original" file was deleted and replaced by the next day's backup.
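The hardlink setup described above can be sketched with ordinary shell commands (the paths and file contents here are hypothetical stand-ins, not the actual backup layout):

```shell
# Sketch: link a backup file into a "safe" subdir, then delete the
# original. A hardlink is a second directory entry for the same inode,
# so the data stays reachable through the surviving name.
mkdir -p /tmp/backup_demo/safe
cd /tmp/backup_demo
echo "stand-in for the 55GB dump" > daily_snapshot.tar.gz
ln daily_snapshot.tar.gz safe/daily_snapshot.tar.gz   # hardlink, not a copy
rm daily_snapshot.tar.gz                              # "original" removed, as on the next day
cat safe/daily_snapshot.tar.gz                        # data still accessible via the link
```

This is exactly why the files in "safe" survived the daily backup rotation until the cleanup attempt below.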
When I tried cleaning up the no longer needed files, I encountered the following:
---------------------------------------------------------
me@hsoi-gts3-de02:/opt/mysqlbackup$ sudo rm -rf safe/
sudo rm -rf safe/
[sudo] password for saskani:
rm: cannot remove `safe/daily_snapshot.tar.gz.md5': Input/output error
---------------------------------------------------------
dmesg told me:
---------------------------------------------------------
[964199.138848] XFS (dm-8): Internal error xfs_bmbt_read_verify at line 789 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_bmap_btree.c. Caller 0xffffffffa0164495
[964199.138848]
[964199.138850] CPU: 1 PID: 3694 Comm: kworker/1:1H Tainted: GF 3.11.0-15-generic #23~precise1-Ubuntu
[964199.138851] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 09/18/2013
[964199.138874] Workqueue: xfslogd xfs_buf_iodone_work [xfs]
[964199.138876] 0000000000000001 ffff881c6be6fd18 ffffffff8173bc0e 0000000000004364
[964199.138878] ffff883f9061c000 ffff881c6be6fd38 ffffffffa016629f ffffffffa0164495
[964199.138879] 0000000000000001 ffff881c6be6fd78 ffffffffa016630e ffff881c6be6fda8
[964199.138880] Call Trace:
[964199.138886] [<ffffffff8173bc0e>] dump_stack+0x46/0x58
[964199.138906] [<ffffffffa016629f>] xfs_error_report+0x3f/0x50 [xfs]
[964199.138913] [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138921] [<ffffffffa016630e>] xfs_corruption_error+0x5e/0x90 [xfs]
[964199.138928] [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138939] [<ffffffffa01944d6>] xfs_bmbt_read_verify+0x76/0xf0 [xfs]
[964199.138946] [<ffffffffa0164495>] ? xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138949] [<ffffffff81095bb2>] ? finish_task_switch+0x52/0xf0
[964199.138969] [<ffffffffa0164495>] xfs_buf_iodone_work+0x95/0xc0 [xfs]
[964199.138972] [<ffffffff81081060>] process_one_work+0x170/0x4a0
[964199.138973] [<ffffffff81082121>] worker_thread+0x121/0x390
[964199.138975] [<ffffffff81082000>] ? manage_workers.isra.21+0x170/0x170
[964199.138977] [<ffffffff81088fe0>] kthread+0xc0/0xd0
[964199.138979] [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[964199.138981] [<ffffffff817508ac>] ret_from_fork+0x7c/0xb0
[964199.138983] [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
[964199.138984] XFS (dm-8): Corruption detected. Unmount and run xfs_repair
[964199.139014] XFS (dm-8): metadata I/O error: block 0x1f0 ("xfs_trans_read_buf_map") error 117 numblks 8
[964199.139016] XFS (dm-8): xfs_do_force_shutdown(0x1) called from line 367 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_trans_buf.c. Return address = 0xffffffffa01cadbc
[964199.139324] XFS (dm-8): I/O Error Detected. Shutting down filesystem
[964199.139325] XFS (dm-8): Please umount the filesystem and rectify the problem(s)
[964212.367300] XFS (dm-8): xfs_log_force: error 5 returned.
[964242.477283] XFS (dm-8): xfs_log_force: error 5 returned.
---------------------------------------------------------
After that, I tried the following (in order):
1. xfs_repair, which did not find the superblock and started scanning the LV; after finding the secondary superblock, it told me there was still something in the log, so I
2. mounted the filesystem, which gave me a "Structure needs cleaning" after a couple of seconds
3. tried mounting again for good measure, same error: "Structure needs cleaning"
4. xfs_repair -L, which repaired everything and effectively cleaned my filesystem in the process.
5. mounted the filesystem to find it empty.
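(As an aside on the numbers in the logs above: "error 117" in the dmesg output and the "Structure needs cleaning" message at mount time are the same thing - on Linux, errno 117 is EUCLEAN. A quick way to confirm this, assuming a Linux errno table and python3 available from the shell:)

```shell
# errno 117 (EUCLEAN on Linux) is the code behind both "error 117" in
# the XFS log lines and the "Structure needs cleaning" mount error.
python3 -c 'import errno, os; print(errno.errorcode[117], "->", os.strerror(117))'
# EUCLEAN -> Structure needs cleaning
```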
Since then, I'm desperately trying to reproduce the problem, but unfortunately to no avail. Can somebody give some insight on the errors I encountered? I have previously operated 4.5PB worth of XFS filesystems for 3 years and never got an error similar to this.
Best regards
Sascha
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Weird XFS Corruption Error
  2014-01-22 16:09 Weird XFS Corruption Error Sascha Askani
@ 2014-01-22 23:31 ` Dave Chinner
  2014-01-24  7:56   ` Sascha Askani
  2014-01-23 14:19 ` Emmanuel Florac
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2014-01-22 23:31 UTC (permalink / raw)
To: Sascha Askani; +Cc: xfs

On Wed, Jan 22, 2014 at 05:09:10PM +0100, Sascha Askani wrote:
> Hi everybody,
>
> We experienced a weird XFS corruption yesterday, and I'm desperately
> trying to find out what happened.
[...]
> dmesg told me:
> ---------------------------------------------------------
> [964199.138848] XFS (dm-8): Internal error xfs_bmbt_read_verify at line 789 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_bmap_btree.c. Caller 0xffffffffa0164495
[...]
> [964199.138984] XFS (dm-8): Corruption detected. Unmount and run xfs_repair
> [964199.139014] XFS (dm-8): metadata I/O error: block 0x1f0 ("xfs_trans_read_buf_map") error 117 numblks 8
> [964199.139016] XFS (dm-8): xfs_do_force_shutdown(0x1) called from line 367 of file /build/buildd/linux-lts-saucy-3.11.0/fs/xfs/xfs_trans_buf.c. Return address = 0xffffffffa01cadbc

So, an inode extent map btree block failed verification for some
reason. Hmmm - there should have been 4 lines of hexdump output
there as well. Can you post that as well? Or have you modified
/proc/sys/fs/xfs/error_level to have a value of 0 so it is not
emitted?

And note the disk address of the buffer: 0x1f0 - it's right near the
start of the volume.

> [964199.139324] XFS (dm-8): I/O Error Detected. Shutting down filesystem
> [964199.139325] XFS (dm-8): Please umount the filesystem and rectify the problem(s)
> [964212.367300] XFS (dm-8): xfs_log_force: error 5 returned.
> [964242.477283] XFS (dm-8): xfs_log_force: error 5 returned.
> ---------------------------------------------------------
>
> After that, I tried the following (in order):

Do you have the output and log messages from these steps? That would
be really helpful in confirming any diagnosis.

> 1. xfs_repair, which did not find the superblock and started scanning the LV; after finding the secondary superblock, it told me there was still something in the log, so I

Oh, wow. Ok, if the primary superblock is gone, along with metadata
in the first few blocks of the filesystem, then something has
overwritten the start of the block device the filesystem is on.

> 2. mounted the filesystem, which gave me a "Structure needs cleaning" after a couple of seconds
> 3. tried mounting again for good measure, same error: "Structure needs cleaning"

Right - the kernel can't read a valid superblock, either.

> 4. xfs_repair -L, which repaired everything and effectively cleaned my filesystem in the process.

Recreating the primary superblock from the backup superblocks.

> 5. mounted the filesystem to find it empty.

Because the root inode was lost, along with AGI 0, and so all the
inodes in the first AG were completely lost, as all the redundant
information that is used to find them was trashed.

> Since then, I'm desperately trying to reproduce the problem,
> but unfortunately to no avail. Can somebody give some insight on
> the errors I encountered? I have previously operated 4.5PB worth
> of XFS filesystems for 3 years and never got an error similar to
> this.

This doesn't look like an XFS problem. This looks like something
overwrote the start of the block device underneath the XFS
filesystem. I've seen this happen before with faulty SSDs, I've also
seen it when someone issued a discard to the wrong location on a
block device (you didn't run fstrim on the block device, did you?),
and I've seen faulty RAID controllers cause similar issues. So right
now I'd be looking at logs and so on for hardware/storage issues
that occurred in the past couple of days as potential causes.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Weird XFS Corruption Error
  2014-01-22 23:31 ` Dave Chinner
@ 2014-01-24  7:56   ` Sascha Askani
  2014-01-24 21:52     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Sascha Askani @ 2014-01-24 7:56 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs

Hi Dave,

thanks for your reply, and I'm sorry for the delayed answer...

On 23.01.2014 at 00:31, Dave Chinner <david@fromorbit.com> wrote:

> On Wed, Jan 22, 2014 at 05:09:10PM +0100, Sascha Askani wrote:
>
> So, an inode extent map btree block failed verification for some
> reason. Hmmm - there should have been 4 lines of hexdump output
> there as well. Can you post that as well? Or have you modified
> /proc/sys/fs/xfs/error_level to have a value of 0 so it is not
> emitted?

/proc/sys/fs/xfs/error_level is set to 3; sorry for not including this
in my original post. The hexdump is pretty "boring" (or interesting,
depending on your point of view):

[964197.435322] ffff881f8e59b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[964197.862037] ffff881f8e59b010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[964198.288694] ffff881f8e59b020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[964198.712093] ffff881f8e59b030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

> And note the disk address of the buffer: 0x1f0 - it's right near the
> start of the volume.
>
>> After that, I tried the following (in order):
>
> Do you have the output and log messages from these steps? That would
> be really helpful in confirming any diagnosis.

Unfortunately, the output got lost due to a reboot, but basically
xfs_repair scanned the whole volume after failing to find a primary
superblock, emitting millions of dots in the process.

>> 1. xfs_repair, which did not find the superblock and started scanning the LV; after finding the secondary superblock, it told me there was still something in the log, so I
>
> Oh, wow. Ok, if the primary superblock is gone, along with metadata
> in the first few blocks of the filesystem, then something has
> overwritten the start of the block device the filesystem is on.
>
>> 2. mounted the filesystem, which gave me a "Structure needs cleaning" after a couple of seconds
>> 3. tried mounting again for good measure, same error: "Structure needs cleaning"
>
> Right - the kernel can't read a valid superblock, either.

Just saw these messages in the log, which were emitted when trying to
mount the FS:

[964606.038733] XFS (dm-8): metadata I/O error: block 0x200 ("xlog_recover_do..(read#2)") error 117 numblks 16
[964606.515048] XFS (dm-8): log mount/recovery failed: error 117
[964606.515386] XFS (dm-8): log mount failed

>> 4. xfs_repair -L, which repaired everything and effectively cleaned my filesystem in the process.
>
> Recreating the primary superblock from the backup superblocks.
>
>> 5. mounted the filesystem to find it empty.
>
> Because the root inode was lost, along with AGI 0, and so all the
> inodes in the first AG were completely lost, as all the redundant
> information that is used to find them was trashed.

Yes, and since at the time of the error there were only 2 files,
1 directory and 2 hardlinks on the fs, it's kind of probable that
everything is lost.

>> Since then, I'm desperately trying to reproduce the problem,
>> but unfortunately to no avail.
>
> This doesn't look like an XFS problem. This looks like something
> overwrote the start of the block device underneath the XFS
> filesystem. I've seen this happen before with faulty SSDs, I've also
> seen it when someone issued a discard to the wrong location on a
> block device (you didn't run fstrim on the block device, did you?),
> and I've seen faulty RAID controllers cause similar issues.

No, we did not perform any kind of trimming on the device; also, there
are no "discard" options set anywhere (mount options, lvm.conf, ...).
We have a pretty active MariaDB slave running on the same controller
logical drive / LVM VG and no errors on the other filesystems so far;
also, mylvmbackup does not seem to have any problems.

Thanks for your insights so far; if you need any more information, I'd
be happy to provide it if possible.

Best regards,
Sascha
* Re: Weird XFS Corruption Error
  2014-01-24  7:56 ` Sascha Askani
@ 2014-01-24 21:52   ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2014-01-24 21:52 UTC (permalink / raw)
To: Sascha Askani; +Cc: xfs

On Fri, Jan 24, 2014 at 08:56:32AM +0100, Sascha Askani wrote:
> Hi Dave,
>
> thanks for your reply, and I'm sorry for the delayed answer...
>
> /proc/sys/fs/xfs/error_level is set to 3; sorry for not including this
> in my original post. The hexdump is pretty "boring" (or interesting,
> depending on your point of view):
>
> [964197.435322] ffff881f8e59b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [964197.862037] ffff881f8e59b010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [964198.288694] ffff881f8e59b020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> [964198.712093] ffff881f8e59b030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

Yeah, that confirms what I suspected - the buffer has been overwritten
with zeros. That tends to imply *something* has zeroed the start of
the block device, and that's the cause of all the problems.

> Just saw these messages in the log, which were emitted when trying to
> mount the FS:
>
> [964606.038733] XFS (dm-8): metadata I/O error: block 0x200 ("xlog_recover_do..(read#2)") error 117 numblks 16
> [964606.515048] XFS (dm-8): log mount/recovery failed: error 117
> [964606.515386] XFS (dm-8): log mount failed

Yup, that's trying to read an inode cluster. It's also right near the
start of the filesystem (0x200 * 512 bytes = 256k into the filesystem).
So log recovery is trying to replay an inode change and finding the
inodes that underlie the change in the log are corrupt.

This really looks like something outside the filesystem caused the
problem. It's probably too late to find out what caused it either, but
I'd be checking with your HW vendor(s) about known problems with their
hardware/firmware....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
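Dave's offset arithmetic, and a way to check for a zeroed region like the one the hexdump revealed, can be sketched in shell (a scratch file stands in for the real block device here - do not run dd against an actual device):

```shell
# XFS reports these metadata block numbers in 512-byte sectors, so both
# failing blocks sit in the first quarter-megabyte of the filesystem:
echo $((0x1f0 * 512))   # 253952 bytes - first failing block
echo $((0x200 * 512))   # 262144 bytes (256KiB) - inode cluster hit in log recovery

# Simulate "something zeroed the start of the device" on a scratch file,
# then detect it by comparing the first 256KiB against /dev/zero:
dd if=/dev/urandom of=/tmp/fakedev bs=64K count=16 2>/dev/null
dd if=/dev/zero of=/tmp/fakedev bs=64K count=4 conv=notrunc 2>/dev/null
cmp -s -n $((256 * 1024)) /tmp/fakedev /dev/zero && echo "first 256KiB are all zero"
```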
* Re: Weird XFS Corruption Error
  2014-01-22 16:09 Weird XFS Corruption Error Sascha Askani
  2014-01-22 23:31 ` Dave Chinner
@ 2014-01-23 14:19 ` Emmanuel Florac
  2014-01-23 14:29   ` Emmanuel Florac
  2014-01-24  8:08   ` Sascha Askani
  1 sibling, 2 replies; 7+ messages in thread
From: Emmanuel Florac @ 2014-01-23 14:19 UTC (permalink / raw)
To: Sascha Askani; +Cc: xfs

On Wed, 22 Jan 2014 17:09:10 +0100, Sascha Askani <saskani@inovex.de> wrote:

> Internal error xfs_bmbt_read_verify at line 789
...
> metadata I/O error: block 0x1f0 ("xfs_trans_read_buf_map") error 117
> numblks 8

A quick glance at the source code seems to indicate that there was an
actual CRC error (on disk) at this point, so that could be a RAID
problem; OTOH it looks really similar to an older bug:

http://oss.sgi.com/archives/xfs/2013-02/msg00482.html

AFAIK this one could be related to the current Ubuntu LTS lvm stack...
We need Dave's advice on this one.

Was there nothing in the xfs_repair output? Nothing in lost+found
either?

--
Emmanuel Florac | Direction technique | Intellique
<eflorac@intellique.com> | +33 1 78 94 84 02
* Re: Weird XFS Corruption Error
  2014-01-23 14:19 ` Emmanuel Florac
@ 2014-01-23 14:29   ` Emmanuel Florac
  0 siblings, 0 replies; 7+ messages in thread
From: Emmanuel Florac @ 2014-01-23 14:29 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: Sascha Askani, xfs

On Thu, 23 Jan 2014 15:19:43 +0100, Emmanuel Florac <eflorac@intellique.com> wrote:

> OTOH it looks really similar to an older bug:
>
> http://oss.sgi.com/archives/xfs/2013-02/msg00482.html
>
> AFAIK this one could be related to the current Ubuntu LTS lvm stack...
> We need Dave's advice on this one.

Yes, I guess we're getting closer:

https://www.redhat.com/archives/dm-devel/2013-February/msg00113.html

"It's reproducable on lots of different kernels, apparently - 3.8,
3.4.33, CentOS 6.3, debian sid/wheezy and Fedora 18 were mentioned
specifically by the OP - so it doesn't look like a recent regression
or constrained to a specific kernel."

And https://www.redhat.com/archives/dm-devel/2013-February/msg00122.html:

"Was issue_discards enabled in lvm.conf? If so, as Alasdair said, this
lvm2 2.02.97 fix is needed:
http://git.fedorahosted.org/cgit/lvm2.git/commit/?id=07a25c249b3e"

What is the version of lvm (dpkg -l lvm2)?

--
Emmanuel Florac | Direction technique | Intellique
<eflorac@intellique.com> | +33 1 78 94 84 02
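The check Emmanuel asks about can be done with a grep against lvm.conf; it is shown here against a scratch copy, since the real file lives at /etc/lvm/lvm.conf and its contents vary per system:

```shell
# If issue_discards is enabled on an lvm2 older than 2.02.97, the fix
# Emmanuel references applies. A scratch copy keeps this sketch
# self-contained; on a real box, grep /etc/lvm/lvm.conf instead.
cat > /tmp/lvm.conf.sample <<'EOF'
devices {
    issue_discards = 1
}
EOF
grep -E '^[[:space:]]*issue_discards' /tmp/lvm.conf.sample
```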
* Re: Weird XFS Corruption Error
  2014-01-23 14:19 ` Emmanuel Florac
  2014-01-23 14:29   ` Emmanuel Florac
@ 2014-01-24  8:08   ` Sascha Askani
  1 sibling, 0 replies; 7+ messages in thread
From: Sascha Askani @ 2014-01-24 8:08 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: xfs

Hi Emmanuel,

thanks for your reply, and sorry for the delayed answer...

On 23.01.2014 at 15:19, Emmanuel Florac <eflorac@intellique.com> wrote:

> On Wed, 22 Jan 2014 17:09:10 +0100, Sascha Askani <saskani@inovex.de> wrote:
>
>> Internal error xfs_bmbt_read_verify at line 789
> ...
>> metadata I/O error: block 0x1f0 ("xfs_trans_read_buf_map") error 117
>> numblks 8
>
> A quick glance at the source code seems to indicate that there was an
> actual CRC error (on disk) at this point, so that could be a RAID
> problem;

(Un)fortunately, I don't see any kind of controller errors, but I
definitely second your opinion.

> OTOH it looks really similar to an older bug:
>
> http://oss.sgi.com/archives/xfs/2013-02/msg00482.html
>
> AFAIK this one could be related to the current Ubuntu LTS lvm stack...
> We need Dave's advice on this one.
>
> Was there nothing in the xfs_repair output? Nothing in lost+found
> either?

xfs_repair output (which I unfortunately do not have) was sparse
(ignoring the many dots it printed after having failed to find the
primary superblock); basically, it removed invalid references.
lost+found contains 1 directory (which is empty).

As for the bug you referenced above: yes, it looks similar, but we have
no pvmove going on anywhere at any time. I'll see if I can reproduce
the error with the many files (kernel source) as described in the bug.

For reference, the LVM versions currently in use:

root@hsoi-gts3-de02:/# lvdisplay --version
  LVM version:     2.02.66(2) (2010-05-20)
  Library version: 1.02.48 (2010-05-20)
  Driver version:  4.25.0

Thanks so far :)

Best regards,
Sascha
end of thread, other threads:[~2014-01-24 21:53 UTC | newest]

Thread overview: 7+ messages
2014-01-22 16:09 Weird XFS Corruption Error Sascha Askani
2014-01-22 23:31 ` Dave Chinner
2014-01-24  7:56   ` Sascha Askani
2014-01-24 21:52     ` Dave Chinner
2014-01-23 14:19 ` Emmanuel Florac
2014-01-23 14:29   ` Emmanuel Florac
2014-01-24  8:08   ` Sascha Askani