From: Dave Chinner <david@fromorbit.com>
To: "Török Edwin" <edwintorok@gmail.com>
Cc: xfs-masters@oss.sgi.com,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com
Subject: Re: XFS internal error (memory corruption)
Date: Tue, 5 Jul 2011 23:09:32 +1000
Message-ID: <20110705130932.GF1026@dastard>
In-Reply-To: <4E12A927.9020102@gmail.com>
On Tue, Jul 05, 2011 at 09:03:19AM +0300, Török Edwin wrote:
> Hi,
>
> Yesterday when running 'shutdown -Pfh now', it hung using 99% CPU in sys [*]
> Looking at the console there was a message about XFS "Corruption of in-memory data detected", and about XFS_WANT_CORRUPTED_GOTO.
So you had a btree corruption.
> Had to shutdown the machine via SysRQ u + o.
>
> Today when I booted I got this message:
> [ 9.786494] XFS (md1p2): Mounting Filesystem
> [ 9.927590] XFS (md1p2): Starting recovery (logdev: /dev/disk/by-id/scsi-SATA_WDC_WD740ADFD-0_WD-WMARF1007797-part5)
> [ 10.385941] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1638 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff8122b80e
> [ 10.385943]
> [ 10.386007] Pid: 1990, comm: mount Not tainted 3.0.0-rc5 #155
> [ 10.386009] Call Trace:
> [ 10.386014] [<ffffffff812551ca>] xfs_error_report+0x3a/0x40
> [ 10.386017] [<ffffffff8122b80e>] ? xfs_free_extent+0xce/0x120
> [ 10.386019] [<ffffffff81227e06>] ? xfs_alloc_lookup_eq+0x16/0x20
> [ 10.386021] [<ffffffff81228f4a>] xfs_free_ag_extent+0x6aa/0x780
> [ 10.386023] [<ffffffff8122b80e>] xfs_free_extent+0xce/0x120
> [ 10.386026] [<ffffffff8127b0ff>] ? kmem_zone_alloc+0x5f/0xe0
> [ 10.386029] [<ffffffff81268e9f>] xlog_recover_process_efi+0x15f/0x1a0
> [ 10.386031] [<ffffffff8126ab26>] xlog_recover_process_efis.isra.4+0x76/0xc0
> [ 10.386033] [<ffffffff8126de62>] xlog_recover_finish+0x22/0xc0
> [ 10.386035] [<ffffffff81265aa4>] xfs_log_mount_finish+0x24/0x30
> [ 10.386038] [<ffffffff81270aab>] xfs_mountfs+0x45b/0x720
> [ 10.386040] [<ffffffff81288741>] xfs_fs_fill_super+0x1f1/0x2e0
> [ 10.386042] [<ffffffff811573aa>] mount_bdev+0x1aa/0x1f0
> [ 10.386044] [<ffffffff81288550>] ? xfs_parseargs+0xb90/0xb90
> [ 10.386046] [<ffffffff812866b0>] xfs_fs_mount+0x10/0x20
> [ 10.386048] [<ffffffff81157c3e>] mount_fs+0x3e/0x1b0
> [ 10.386051] [<ffffffff81171807>] vfs_kern_mount+0x57/0xa0
> [ 10.386052] [<ffffffff81171c4f>] do_kern_mount+0x4f/0x100
> [ 10.386054] [<ffffffff811732dc>] do_mount+0x19c/0x840
> [ 10.386057] [<ffffffff8110fa12>] ? __get_free_pages+0x12/0x50
> [ 10.386059] [<ffffffff81172fc5>] ? copy_mount_options+0x35/0x170
> [ 10.386061] [<ffffffff81173d0b>] sys_mount+0x8b/0xe0
> [ 10.386064] [<ffffffff814c19fb>] system_call_fastpath+0x16/0x1b
> [ 10.386071] XFS (md1p2): Failed to recover EFIs
> [ 10.386097] XFS (md1p2): log mount finish failed
> [ 10.428562] XFS (md1p3): Mounting Filesystem
> [ 10.609949] XFS (md1p3): Ending clean mount
>
> FWIW I got a message about EFIs yesterday too, but everything else worked:
> Jul 4 09:42:54 debian kernel: [ 11.439861] XFS (md1p2): Mounting Filesystem
> Jul 4 09:42:54 debian kernel: [ 11.599815] XFS (md1p2): Starting recovery (logdev: /dev/disk/by-id/scsi-SATA_WDC_WD740ADFD-0_WD-WMARF1007797-part5)
> Jul 4 09:42:54 debian kernel: [ 11.787980] XFS (md1p2): I/O error occurred: meta-data dev md1p2 block 0x117925a8 ("xfs_trans_read_buf") error 5 buf count 4096
> Jul 4 09:42:54 debian kernel: [ 11.788044] XFS (md1p2): Failed to recover EFIs
> Jul 4 09:42:54 debian kernel: [ 11.788065] XFS (md1p2): log mount finish failed
> Jul 4 09:42:54 debian kernel: [ 11.831077] XFS (md1p3): Mounting Filesystem
> Jul 4 09:42:54 debian kernel: [ 12.009647] XFS (md1p3): Ending clean mount
Looks like you might have a dying disk. That's an IO error on read
that has been reported back to XFS, and it warned that bad things
happened. Maybe XFS should have shut down, though.
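To make the fields in that I/O error line easier to read, here is a minimal Python sketch that pulls them apart. It assumes the message layout seen in the quoted log and, as a labeled assumption, that the reported block number is in 512-byte units; the `decode` helper and its field names are hypothetical and exist only for illustration. The "error 5" value maps to EIO in the standard Linux errno table.

```python
import errno
import os
import re

# Log line copied from the quoted dmesg output, with the wrapped
# "buf count" rejoined onto one line.
LINE = ('XFS (md1p2): I/O error occurred: meta-data dev md1p2 '
        'block 0x117925a8 ("xfs_trans_read_buf") error 5 buf count 4096')

# Matches only the message layout seen in this thread; other kernel
# versions may format the message differently.
PATTERN = re.compile(
    r'dev (?P<dev>\S+) block (?P<block>0x[0-9a-f]+)\s+'
    r'\("(?P<func>\w+)"\) error (?P<err>\d+) buf count (?P<count>\d+)')


def decode(line, sector_size=512):
    """Return the fields of an XFS I/O error line, or None if no match.

    ASSUMPTION: the block number is counted in 512-byte units, so the
    byte offset is block * sector_size. Verify against your kernel.
    """
    m = PATTERN.search(line)
    if m is None:
        return None
    block = int(m.group('block'), 16)
    err = int(m.group('err'))
    return {
        'dev': m.group('dev'),
        'func': m.group('func'),
        'block': block,
        'byte_offset': block * sector_size,
        'errno_name': errno.errorcode.get(err, 'unknown'),
        'errno_text': os.strerror(err),
        'count': int(m.group('count')),
    }


if __name__ == '__main__':
    for key, value in decode(LINE).items():
        print(f'{key}: {value}')
```

The resulting offset is relative to the md1p2 device named in the message, not to any individual member disk of the array, so it does not by itself say which physical disk returned the error.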
> UUID=6f7c65b9-40b2-4b05-9157-522a67f65c4a /mnt/var_data xfs defaults,noatime,nodiratime,logbufs=8,logbsize=256k,logdev=/dev/disk/by-id/scsi-SATA_WDC_WD740ADFD-0_WD-WMARF1007797-part5 0 2
>
> I can't mount the FS anymore:
> mount: Structure needs cleaning
Obviously - you've got corrupted free space btrees thanks to the IO
error during recovery and the later operations that were done on it.
Now log recovery can't complete without hitting those corruptions.
> So I used xfs_repair /dev/md1p2 -l /dev/sdi5 -L, and then I could mount the log.
Did you replace the faulty disk? If not, this will just happen
again...
> I did save the faulty log-file, let me know if you need it:
> -rw-r--r-- 1 edwin edwin 2.9M Jul 5 09:00 sdi5.xz
>
> This is on a 3.0-rc5 kernel, my .config is below:
>
> I've run perf top with the hung shutdown, and it showed me something like this:
> 1964.00 16.3% filemap_fdatawait_range /lib/modules/3.0.0-rc5/build/vmlinux
> 1831.00 15.2% _raw_spin_lock /lib/modules/3.0.0-rc5/build/vmlinux
> 1316.00 10.9% iput /lib/modules/3.0.0-rc5/build/vmlinux
> 1265.00 10.5% _atomic_dec_and_lock /lib/modules/3.0.0-rc5/build/vmlinux
> 998.00 8.3% _raw_spin_unlock /lib/modules/3.0.0-rc5/build/vmlinux
> 731.00 6.1% sync_inodes_sb /lib/modules/3.0.0-rc5/build/vmlinux
> 724.00 6.0% find_get_pages_tag /lib/modules/3.0.0-rc5/build/vmlinux
> 580.00 4.8% radix_tree_gang_lookup_tag_slot /lib/modules/3.0.0-rc5/build/vmlinux
> 525.00 4.3% __rcu_read_unlock /lib/modules/3.0.0-rc5/build/vmlinux
Looks like it is running around trying to write back data, stuck
somewhere in the code outside XFS. I haven't seen anything like this
before.
Still, the root cause is likely a bad disk or driver, so finding and
fixing that is probably the first thing you should do...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com