* XFS corruption of in-memory data detected with KVM
@ 2018-02-21 10:57 Andrea Mazzocchi
2018-02-21 11:46 ` Carlos Maiolino
0 siblings, 1 reply; 8+ messages in thread
From: Andrea Mazzocchi @ 2018-02-21 10:57 UTC (permalink / raw)
To: linux-xfs
Hello everybody.
We are experiencing crashes on our SSD VPSes, all running on KVM;
other VPSes hosted at another VPS provider
using VMware have never given us trouble, while the ones on KVM
occasionally crash under unknown circumstances.
We use the same CentOS 7 ISO on all our hosts (both KVM and VMware).
Only the hosts on KVM crash, and we don't understand why.
Here's the dmesg of the crashed host in the emergency console.
Any suggestion is more than welcome!
Best regards
[ 1.781684] systemd[1]: Found device /dev/mapper/centos-root.
[ 1.781919] systemd[1]: Starting File System Check on
/dev/mapper/centos-root...
[ 1.798487] systemd-fsck[3751]: /sbin/fsck.xfs: XFS file system.
[ 1.799458] systemd[1]: Started File System Check on /dev/mapper/centos-root.
[ 1.838485] systemd[1]: Started dracut initqueue hook.
[ 1.838637] systemd[1]: Reached target Remote File Systems (Pre).
[ 1.838764] systemd[1]: Starting Remote File Systems (Pre).
[ 1.838886] systemd[1]: Reached target Remote File Systems.
[ 1.839070] systemd[1]: Starting Remote File Systems.
[ 1.839211] systemd[1]: Started dracut pre-mount hook.
[ 1.839357] systemd[1]: Mounting /sysroot...
[ 2.235562] kernel: SGI XFS with ACLs, security attributes, no debug enabled
[ 2.237759] kernel: XFS (dm-0): Mounting V5 Filesystem
[ 5.413560] kernel: XFS (dm-0): Starting recovery (logdev: internal)
[ 5.436057] kernel: XFS (dm-0): Internal error
XFS_WANT_CORRUPTED_GOTO at line 3171 of file
fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x402/0x780 [xfs]
[ 5.437201] kernel: CPU: 1 PID: 390 Comm: mount Not tainted
3.10.0-693.11.1.el7.x86_64 #1
[ 5.438265] kernel: Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org
04/01/2014
[ 5.440392] kernel: ffff8801361368d8 0000000073ff73da
ffff8801363bfa88 ffffffff816a3e61
[ 5.441585] kernel: ffff8801363bfa28 ffffffffc022c46b
ffffffffc01e94d2 ffff8801363bfa98
[ 5.442682] kernel: ffffffffc0206543 ffff8801363bfafc
0000000000000000 00000000ffffffff
[ 5.443783] kernel: Call Trace:
[ 5.444828] kernel: [<ffffffff816a3e61>] dump_stack+0x19/0x1b
[ 5.445958] kernel: [<ffffffffc022c46b>] xfs_error_report+0x3b/0x40 [xfs]
[ 5.447159] kernel: [<ffffffffc01e94d2>] ?
xfs_free_ag_extent+0x402/0x780 [xfs]
[ 5.448357] kernel: [<ffffffffc0206543>] xfs_btree_insert+0x1a3/0x1b0 [xfs]
[ 5.449683] kernel: [<ffffffffc01e94d2>] xfs_free_ag_extent+0x402/0x780 [xfs]
[ 5.450844] kernel: [<ffffffffc01ebe6c>] xfs_free_extent+0xfc/0x130 [xfs]
[ 5.452168] kernel: [<ffffffffc025a6b6>]
xfs_trans_free_extent+0x26/0x60 [xfs]
[ 5.453378] kernel: [<ffffffffc0252a5e>]
xlog_recover_process_efi+0x17e/0x1c0 [xfs]
[ 5.454788] kernel: [<ffffffffc0254db7>]
xlog_recover_process_efis.isra.30+0x77/0xe0 [xfs]
[ 5.455927] kernel: [<ffffffffc02586e1>] xlog_recover_finish+0x21/0xb0 [xfs]
[ 5.457896] kernel: [<ffffffffc024b814>] xfs_log_mount_finish+0x34/0x50 [xfs]
[ 5.458273] kernel: [<ffffffffc0241ea1>] xfs_mountfs+0x5d1/0x8b0 [xfs]
[ 5.459589] kernel: [<ffffffffc02301a0>] ?
xfs_filestream_get_parent+0x80/0x80 [xfs]
[ 5.460727] kernel: [<ffffffffc0244ceb>] xfs_fs_fill_super+0x3bb/0x4d0 [xfs]
[ 5.461933] kernel: [<ffffffff81204b10>] mount_bdev+0x1b0/0x1f0
[ 5.463161] kernel: [<ffffffffc0244930>] ?
xfs_test_remount_options.isra.11+0x70/0x70 [xfs]
[ 5.464485] kernel: [<ffffffffc0243655>] xfs_fs_mount+0x15/0x20 [xfs]
[ 5.465668] kernel: [<ffffffff81205389>] mount_fs+0x39/0x1b0
[ 5.466951] kernel: [<ffffffff811a5f05>] ? __alloc_percpu+0x15/0x20
[ 5.468185] kernel: [<ffffffff81221e57>] vfs_kern_mount+0x67/0x110
[ 5.469497] kernel: [<ffffffff81224363>] do_mount+0x233/0xaf0
[ 5.470715] kernel: [<ffffffff811a100b>] ? strndup_user+0x4b/0xa0
[ 5.471909] kernel: [<ffffffff81224fa6>] SyS_mount+0x96/0xf0
[ 5.473063] kernel: [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
[ 5.474233] kernel: XFS (dm-0): Internal error xfs_trans_cancel at
line 984 of file fs/xfs/xfs_trans.c. Caller
xlog_recover_process_efi+0x18e/0x1c0 [xfs]
[ 5.476668] kernel: CPU: 1 PID: 390 Comm: mount Not tainted
3.10.0-693.11.1.el7.x86_64 #1
[ 5.477942] kernel: Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org
04/01/2014
[ 5.480466] kernel: ffff880136010000 0000000073ff73da
ffff8801363bfbd0 ffffffff816a3e61
[ 5.481751] kernel: ffff8801363bfbe8 ffffffffc022c46b
ffffffffc0252a6e ffff8801363bfc10
[ 5.483028] kernel: ffffffffc02488cd ffff8800369d6000
0000000000000000 ffff8800369d6198
[ 5.484313] kernel: Call Trace:
[ 5.485801] kernel: [<ffffffff816a3e61>] dump_stack+0x19/0x1b
[ 5.487129] kernel: [<ffffffffc022c46b>] xfs_error_report+0x3b/0x40 [xfs]
[ 5.488464] kernel: [<ffffffffc0252a6e>] ?
xlog_recover_process_efi+0x18e/0x1c0 [xfs]
[ 5.489797] kernel: [<ffffffffc02488cd>] xfs_trans_cancel+0xbd/0xe0 [xfs]
[ 5.491123] kernel: [<ffffffffc0252a6e>]
xlog_recover_process_efi+0x18e/0x1c0 [xfs]
[ 5.492429] kernel: [<ffffffffc0254db7>]
xlog_recover_process_efis.isra.30+0x77/0xe0 [xfs]
[ 5.493748] kernel: [<ffffffffc02586e1>] xlog_recover_finish+0x21/0xb0 [xfs]
[ 5.495001] kernel: [<ffffffffc024b814>] xfs_log_mount_finish+0x34/0x50 [xfs]
[ 5.496286] kernel: [<ffffffffc0241ea1>] xfs_mountfs+0x5d1/0x8b0 [xfs]
[ 5.497508] kernel: [<ffffffffc02301a0>] ?
xfs_filestream_get_parent+0x80/0x80 [xfs]
[ 5.498700] kernel: [<ffffffffc0244ceb>] xfs_fs_fill_super+0x3bb/0x4d0 [xfs]
[ 5.499866] kernel: [<ffffffff81204b10>] mount_bdev+0x1b0/0x1f0
[ 5.501050] kernel: [<ffffffffc0244930>] ?
xfs_test_remount_options.isra.11+0x70/0x70 [xfs]
[ 5.502232] kernel: [<ffffffffc0243655>] xfs_fs_mount+0x15/0x20 [xfs]
[ 5.503406] kernel: [<ffffffff81205389>] mount_fs+0x39/0x1b0
[ 5.504556] kernel: [<ffffffff811a5f05>] ? __alloc_percpu+0x15/0x20
[ 5.505713] kernel: [<ffffffff81221e57>] vfs_kern_mount+0x67/0x110
[ 5.506893] kernel: [<ffffffff81224363>] do_mount+0x233/0xaf0
[ 5.508048] kernel: [<ffffffff811a100b>] ? strndup_user+0x4b/0xa0
[ 5.509222] kernel: [<ffffffff81224fa6>] SyS_mount+0x96/0xf0
[ 5.510366] kernel: [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
[ 5.511554] kernel: XFS (dm-0): xfs_do_force_shutdown(0x8) called
from line 985 of file fs/xfs/xfs_trans.c. Return address =
0xffffffffc02488e6
[ 5.514064] kernel: XFS (dm-0): Corruption of in-memory data
detected. Shutting down file system
[ 5.515795] kernel: XFS (dm-0): Please umount the filesystem and
rectify the problem(s)
[ 5.517485] kernel: XFS (dm-0): Failed to recover EFIs
[ 5.519112] kernel: XFS (dm-0): log mount finish failed
[ 5.208098] mount[390]: mount: mount /dev/mapper/centos-root on
/sysroot failed: Structure needs cleaning
[ 5.208670] systemd[1]: sysroot.mount mount process exited,
code=exited status=32
[ 5.208864] systemd[1]: Failed to mount /sysroot.
[ 5.209318] systemd[1]: Dependency failed for Initrd Root File System.
[ 5.214710] systemd[1]: Dependency failed for Reload Configuration
from the Real Root.
[ 5.214839] systemd[1]: Job initrd-parse-etc.service/start failed
with result 'dependency'.
[ 5.214948] systemd[1]: Triggering OnFailure= dependencies of
initrd-parse-etc.service.
[ 5.215120] systemd[1]: Job initrd-root-fs.target/start failed with
result 'dependency'.
[ 5.215248] systemd[1]: Triggering OnFailure= dependencies of
initrd-root-fs.target.
[ 5.215350] systemd[1]: Unit sysroot.mount entered failed state.
[ 5.215557] systemd[1]: Stopped dracut pre-pivot and cleanup hook.
[ 5.215753] systemd[1]: Stopped target Initrd Default Target.
[ 5.217627] systemd[1]: Reached target Initrd File Systems.
[ 5.217760] systemd[1]: Starting Initrd File Systems.
[ 5.218946] systemd[1]: Stopped dracut pre-udev hook.
[ 5.219191] systemd[1]: Stopping dracut pre-udev hook...
[ 5.221870] systemd[1]: Stopped dracut cmdline hook.
[ 5.222055] systemd[1]: Stopping dracut cmdline hook...
[ 5.227280] systemd[1]: Stopped dracut mount hook.
[ 5.230291] systemd[1]: Stopped target Basic System.
[ 5.230486] systemd[1]: Stopping Basic System.
[ 5.232450] systemd[1]: Stopped dracut initqueue hook.
[ 5.232621] systemd[1]: Stopping dracut initqueue hook...
[ 5.235783] systemd[1]: Stopped target System Initialization.
[ 5.235904] systemd[1]: Stopping System Initialization.
[ 5.236560] systemd[1]: Started Emergency Shell.
[ 5.236688] systemd[1]: Starting Emergency Shell...
[ 5.236825] systemd[1]: Reached target Emergency Mode.
[ 5.236947] systemd[1]: Startup finished in 806ms (kernel) + 0
(initrd) + 4.430s (userspace) = 5.236s.
[ 5.237073] systemd[1]: Starting Emergency Mode.
[ 5.256981] systemd[1]: Received SIGRTMIN+21 from PID 274 (plymouthd).
* Re: XFS corruption of in-memory data detected with KVM
2018-02-21 10:57 XFS corruption of in-memory data detected with KVM Andrea Mazzocchi
@ 2018-02-21 11:46 ` Carlos Maiolino
2018-02-21 15:23 ` Andrea Mazzocchi
0 siblings, 1 reply; 8+ messages in thread
From: Carlos Maiolino @ 2018-02-21 11:46 UTC (permalink / raw)
To: Andrea Mazzocchi; +Cc: linux-xfs
On Wed, Feb 21, 2018 at 11:57:01AM +0100, Andrea Mazzocchi wrote:
> Hello everybody.
>
> We are experiencing crashes on our SSD VPSes, all running on KVM;
> other VPSes hosted at another VPS provider
> using VMware have never given us trouble, while the ones on KVM
> occasionally crash under unknown circumstances.
> We use the same CentOS 7 ISO on all our hosts (both KVM and VMware).
> Only the hosts on KVM crash, and we don't understand why.
>
> Here's the dmesg of the crashed host in the emergency console.
> Any suggestion is more than welcome!
>
At a quick look, it seems your machine is falling into emergency mode due to a
failure to mount the root filesystem, caused by metadata corruption. Have you
tried running xfs_repair on this filesystem to see if it catches anything?
Looks like XFS found an on-disk corruption while trying to process an extent
free intent found in the log.
Note, though, that if you can't properly mount/unmount the filesystem (to
replay the log) before running xfs_repair, you might need to zero out the log
(the -L option).
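For reference, a minimal sketch of the sequence (device name taken from your
log; since this is the root filesystem, you'd run it from a rescue
environment):

    xfs_repair -n /dev/mapper/centos-root   # dry run, reports problems only
    xfs_repair /dev/mapper/centos-root      # actual repair
    xfs_repair -L /dev/mapper/centos-root   # only if the log cannot be replayed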
Also, you are running a very old kernel, so please make sure you run a newer
xfs_repair.
Have you tried xfs_repair already? Did the same problem happen again
afterwards? Have you tried an updated kernel? Your kernel is old, and we can't
track what has or hasn't been fixed by the distro, which is why I suggest
trying a newer kernel anyway.
Also, this is more a guess than anything: if you see this happening often
(even after xfs_repair), you might want to double-check your storage stack and
see if it is not corrupting anything; badly configured storage stacks in
virtual environments are very common culprits in filesystem corruption cases.
> Best regards
>
> [dmesg log snipped -- quoted in full above]
--
Carlos
* Re: XFS corruption of in-memory data detected with KVM
2018-02-21 11:46 ` Carlos Maiolino
@ 2018-02-21 15:23 ` Andrea Mazzocchi
2018-02-21 16:55 ` Emmanuel Florac
0 siblings, 1 reply; 8+ messages in thread
From: Andrea Mazzocchi @ 2018-02-21 15:23 UTC (permalink / raw)
To: Andrea Mazzocchi, linux-xfs
> Also, you are running a very old kernel, so please make sure you run a newer
> xfs_repair.
Yesterday we installed 3.10.0-693.17.1.el7. I know that CentOS and Red Hat
keep old stable kernel versions and backport important fixes: do you think
that upgrading to a more recent kernel (4.x and above) would be better,
even if less stable?
> Also, this is more a guess than anything: if you see this happening often
> (even after xfs_repair), you might want to double-check your storage stack and
> see if it is not corrupting anything; badly configured storage stacks in
> virtual environments are very common culprits in filesystem corruption cases.
How could we check our storage stack and see if it is the one to blame?
Thanks, best regards
* Re: XFS corruption of in-memory data detected with KVM
2018-02-21 15:23 ` Andrea Mazzocchi
@ 2018-02-21 16:55 ` Emmanuel Florac
2018-02-22 10:22 ` Carlos Maiolino
[not found] ` <CAJbUkCQd_TOOSxVwWzDca7Do6_g+dMsKU3fObMig4gG_0HHg-w@mail.gmail.com>
0 siblings, 2 replies; 8+ messages in thread
From: Emmanuel Florac @ 2018-02-21 16:55 UTC (permalink / raw)
To: Andrea Mazzocchi; +Cc: linux-xfs
[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]
On Wed, 21 Feb 2018 16:23:43 +0100
Andrea Mazzocchi <mazzocchiandrea24@gmail.com> wrote:
> > Also, you are running a very old kernel, so, please make sure you
> > try to run a newer xfs_repair.
>
> We installed yesterday 3.10.0-693.17.1.el7. I know that CentOS and
> RedHat keep old stable kernel version and backport important stuff:
> do you think that upgrading to a more recent kernel (4 and above)
> would be better, even if less stable?
Actually the 3.10 from CentOS 7 has a lot of things backported; for
instance, it supports XFS with CRC metadata, which was introduced in 3.16.
However, in recent years I've never found any reason NOT to use a
recent LTS kernel (like 4.9 or 4.14). I don't really understand why
Red Hat sticks to these absurdly old kernel releases as a basis (and
takes the pain of backporting stuff for years and years).
> > Also, this is more a guess than anything: if you see this happening
> > often (even after xfs_repair), you might want to double-check your
> > storage stack and see if it is not corrupting anything; badly
> > configured storage stacks in virtual environments are very common
> > culprits in filesystem corruption cases.
>
> How could we check our storage stack and see if it is the one to
> blame?
Hard to say. What KVM disk format are you using? Raw, qcow2, LVM
volumes? If these are files (raw or qcow2), what kind of filesystem and
hardware stack are they living on? Are there any errors on the hosting
system?
At the VM level, do you see any I/O errors? Are you using the virtio disk
driver or something else?
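For instance -- assuming libvirt is used on the host, which we can't verify
here -- something like this would answer most of those questions (GUESTNAME
and the disk path are placeholders, of course):

    # on the host: disk format, backing file and cache mode of the guest disk
    virsh dumpxml GUESTNAME | grep -A4 '<disk'
    qemu-img info /path/to/guest-disk.img
    # inside the guest: virtio disks show up as /dev/vdX
    lsblk
    dmesg | grep -iE 'virtio|i/o error'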
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
* Re: XFS corruption of in-memory data detected with KVM
2018-02-21 16:55 ` Emmanuel Florac
@ 2018-02-22 10:22 ` Carlos Maiolino
2018-02-22 11:31 ` Andrea Mazzocchi
[not found] ` <CAJbUkCQd_TOOSxVwWzDca7Do6_g+dMsKU3fObMig4gG_0HHg-w@mail.gmail.com>
1 sibling, 1 reply; 8+ messages in thread
From: Carlos Maiolino @ 2018-02-22 10:22 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: Andrea Mazzocchi, linux-xfs
On Wed, Feb 21, 2018 at 05:55:40PM +0100, Emmanuel Florac wrote:
> Le Wed, 21 Feb 2018 16:23:43 +0100
> Andrea Mazzocchi <mazzocchiandrea24@gmail.com> wrote:
>
> > > Also, you are running a very old kernel, so, please make sure you
> > > try to run a newer xfs_repair.
> >
> > We installed yesterday 3.10.0-693.17.1.el7. I know that CentOS and
> > RedHat keep old stable kernel version and backport important stuff:
> > do you think that upgrading to a more recent kernel (4 and above)
> > would be better, even if less stable?
>
Yes, there are several things backported into enterprise-distributed kernels,
but the upstream community can't offer support on that, since we can't easily
keep track of what has or hasn't been backported; that's why we usually tell
people who report bugs here while running an enterprise-distributed kernel to
report the bug directly to their support representatives.
> Actually the 3.10 from CentOS 7 has a lot of things backported; for
> instance, it supports XFS with CRC metadata, which was introduced in 3.16.
>
> However, in recent years I've never found any reason NOT to use a
> recent LTS kernel (like 4.9 or 4.14). I don't really understand why
> Red Hat sticks to these absurdly old kernel releases as a basis (and
> takes the pain of backporting stuff for years and years).
There are several reasons for that, and it is not only Red Hat that does it.
One of the reasons is ABI stability.
>
> > Also, this is more a guess than anything: if you see this happening
> > often (even after xfs_repair), you might want to double-check your
> > storage stack and see if it is not corrupting anything; badly
> > configured storage stacks in virtual environments are very common
> > culprits in filesystem corruption cases.
> >
> > How could we check our storage stack and see if it is the one to
> > blame?
>
> Hard to say. What KVM disk format are you using? Raw, qcow2, LVM
> volumes? If these are files (raw or qcow2), what kind of filesystem and
> hardware stack are they living on? Are there any errors on the hosting
> system?
>
> At the VM level, do you see any I/O errors? Are you using the virtio disk
> driver or something else?
>
^ A good start. Also, the caching policy in use is good information to have;
in my experience, using any cache policy other than 'none' for VM disks is a
good way to start having problems.
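A sketch of where that lives -- on a plain qemu command line the cache mode is
part of the -drive option (the path and format here are assumptions), and with
libvirt it is the cache= attribute on the <driver> element of each <disk>:

    qemu-system-x86_64 ... \
        -drive file=/path/to/guest.qcow2,format=qcow2,if=virtio,cache=none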
Also:
http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
Don't forget xfs_repair :)
> --
> ------------------------------------------------------------------------
> Emmanuel Florac | Direction technique
> | Intellique
> | <eflorac@intellique.com>
> | +33 1 78 94 84 02
> ------------------------------------------------------------------------
--
Carlos
* Re: XFS corruption of in-memory data detected with KVM
2018-02-22 10:22 ` Carlos Maiolino
@ 2018-02-22 11:31 ` Andrea Mazzocchi
2018-02-22 13:50 ` Emmanuel Florac
0 siblings, 1 reply; 8+ messages in thread
From: Andrea Mazzocchi @ 2018-02-22 11:31 UTC (permalink / raw)
To: Emmanuel Florac, Andrea Mazzocchi, linux-xfs
3.10.0-693.17.1.el7 means Enterprise Linux 7, right?
Enterprise Linux 7 shipped with Linux kernel 3.10.0-123
(https://en.wikipedia.org/wiki/Red_Hat_Enterprise_Linux#RHEL_7),
and even if it backports important things, I saw that kernels 4.9 and 4.14
introduce various bug fixes and improvements to the XFS file system that
may greatly help us.
In the 3.10.0-693.17.1.el7 changelog there are at least 1400 entries for "XFS"
(http://rpmfind.net/linux/RPM/centos/updates/7.4.1708/x86_64/Packages/kernel-devel-3.10.0-693.17.1.el7.x86_64.html):
how could I see which kernel would be best?
Do I have to manually check the XFS bug fixes between 3.10.0-693.17.1.el7
and 4.14.20 (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5791577963426c5a2db51fff57e9fcd72061e2c3)
to see what best suits us?
We do not know what caused the crash, so we do not know what bug had
to be fixed and which kernel actually fixed it...
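The only mechanical check I can think of is grepping the distro changelog,
e.g.:

    rpm -q --changelog kernel | grep -i xfs | less

but that still doesn't tell me which fix (if any) matters for this crash.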
* Re: XFS corruption of in-memory data detected with KVM
[not found] ` <CAJbUkCQd_TOOSxVwWzDca7Do6_g+dMsKU3fObMig4gG_0HHg-w@mail.gmail.com>
@ 2018-02-22 11:33 ` Emmanuel Florac
0 siblings, 0 replies; 8+ messages in thread
From: Emmanuel Florac @ 2018-02-22 11:33 UTC (permalink / raw)
To: linux-xfs; +Cc: Andrea Mazzocchi
[-- Attachment #1: Type: text/plain, Size: 1606 bytes --]
On Thu, 22 Feb 2018 10:11:08 +0100
Andrea Mazzocchi <mazzocchiandrea24@gmail.com> wrote:
Please keep the list cc'ed, in case someone comes with another idea.
> > Hard to say, what KVM disk format are you using? Raw, qcow2, LVM
> > volumes? If these are files (raw or qcow2), what kind of filesystem
> > and hardware stack are they living on? Are there any error on the
> > hosting system?
>
> The software configuration of the provider's hypervisor host nodes is
> confidential:
> I can't know the KVM version and configuration, unless something is
> shown in dmesg.
Hmm, so we can't know for sure whether something went wrong outside the VM;
annoying.
> > At tne VM level, do you see any IO error? Are you using the virtio
> > disk driver or something else?
>
> The errors we saw were: we failed to establish an SSH connection,
> so we went to the provider's console for the VPS, where the dmesg was
> shown: the log seemed fine until "systemd[1]: Found device
> /dev/mapper/centos-root"
> (which is where the dmesg I posted begins).
> We are using the virtio disk driver.
I see. Now you'll need to boot the VM from some rescue system (possibly
the CentOS installation image) to run xfs_repair before going any
further, unfortunately.
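Something along these lines (a sketch only; the volume group name is taken
from the /dev/mapper/centos-root path in your log):

    # from the CentOS ISO: Troubleshooting -> Rescue a CentOS system,
    # then skip mounting the installed system
    vgchange -ay centos                     # activate the LVM volume group
    xfs_repair -n /dev/mapper/centos-root   # dry run first
    xfs_repair /dev/mapper/centos-root      # then the real repair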
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
* Re: XFS corruption of in-memory data detected with KVM
2018-02-22 11:31 ` Andrea Mazzocchi
@ 2018-02-22 13:50 ` Emmanuel Florac
0 siblings, 0 replies; 8+ messages in thread
From: Emmanuel Florac @ 2018-02-22 13:50 UTC (permalink / raw)
To: Andrea Mazzocchi; +Cc: linux-xfs
[-- Attachment #1: Type: text/plain, Size: 1061 bytes --]
On Thu, 22 Feb 2018 12:31:28 +0100
Andrea Mazzocchi <mazzocchiandrea24@gmail.com> wrote:
> In the 3.10.0-693.17.1.el7 changelog there are at least 1400 entries for
> "XFS" (http://rpmfind.net/linux/RPM/centos/updates/7.4.1708/x86_64/Packages/kernel-devel-3.10.0-693.17.1.el7.x86_64.html):
> how could I see which kernel would be best?
> Do I have to manually check the XFS bug fixes between
> 3.10.0-693.17.1.el7 and 4.14.20
> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5791577963426c5a2db51fff57e9fcd72061e2c3)
> to see what best suits us?
So far I haven't run 4.14 extensively, but 4.9 is really safe.
You should definitely give it a try. Were you able to repair your root FS?
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
end of thread, other threads:[~2018-02-22 13:50 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-21 10:57 XFS corruption of in-memory data detected with KVM Andrea Mazzocchi
2018-02-21 11:46 ` Carlos Maiolino
2018-02-21 15:23 ` Andrea Mazzocchi
2018-02-21 16:55 ` Emmanuel Florac
2018-02-22 10:22 ` Carlos Maiolino
2018-02-22 11:31 ` Andrea Mazzocchi
2018-02-22 13:50 ` Emmanuel Florac
[not found] ` <CAJbUkCQd_TOOSxVwWzDca7Do6_g+dMsKU3fObMig4gG_0HHg-w@mail.gmail.com>
2018-02-22 11:33 ` Emmanuel Florac
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).