* [Bug 217769] New: XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-07 16:35 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
Bug ID: 217769
Summary: XFS crash on mount on kernels >= 6.1
Product: File System
Version: 2.5
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P3
Component: XFS
Assignee: filesystem_xfs@kernel-bugs.kernel.org
Reporter: xani666@gmail.com
Regression: No
Created attachment 304789
--> https://bugzilla.kernel.org/attachment.cgi?id=304789&action=edit
full boot sequence where XFS fails
Background:
I've seen this happen on a few machines now, both consumer (my personal
laptop) and enterprise (VMs on a server with ECC memory), so I think any kind
of disk corruption can be ruled out here.
The common thread is that all of the machines were on Debian and had been
dist-upgraded multiple times. IIRC the filesystem was originally created on a
4.9 kernel, so basically XFS was formatted a long time ago:
xfs_db> version
versionnum [0xb4b4+0x8a] =
V4,NLINK,DIRV2,ATTR,ALIGN,LOGV2,EXTFLG,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT
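For reference, the one-shot equivalent of that xfs_db prompt session (a
sketch using this report's device; adjust the path) is:

  xfs_db -r -c version /dev/vda2    # -r opens the device read-only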
xfs_info /dev/vda2
meta-data=/dev/vda2              isize=256    agcount=4, agsize=624064 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0, rmapbt=0
         =                       reflink=0    bigtime=0 inobtcount=0 nrext64=0
data     =                       bsize=4096   blocks=2496256, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
And now *some* of them have started to crash near-immediately on boot. After
upgrading to Debian's 6.1 I get the crash; booting an older kernel (5.17,
5.10 or 4.19) works fine. I also tried 6.5~rc4 from experimental with the
same result, and 6.3 on my laptop (where only one of the partitions crashes,
as in the attachment).
The other interesting part is that the server mentioned in the logs has an HA
pair that was formatted on the same date and sees near-identical use, but so
far doesn't show the same behaviour.
In both cases XFS complains about in-memory corruption, but I don't believe
that is actually the case: aside from one of the systems having ECC memory
(and a dozen other VMs running without problems), I also ran memtest for
about an hour just to be sure, and booting the affected VM on another host
gave the same result.
The log in question:
[ 16.115185] ------------[ cut here ]------------
[ 16.115856] WARNING: CPU: 2 PID: 646 at fs/xfs/xfs_inode.c:1831 xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.118200] Modules linked in: tcp_diag inet_diag binfmt_misc ext4
ghash_clmulni_intel sha512_ssse3 sha512_generic crc16 mbcache jbd2 aesni_intel
crypto_simd cryptd cirrus drm_shmem_helper drm_kms_helper i6300esb
virtio_balloon watchdog pcspkr button joydev evdev serio_raw loop fuse drm
efi_pstore dm_mod configfs qemu_fw_cfg virtio_rng rng_core ip_tables x_tables
autofs4 hid_generic usbhid hid xfs libcrc32c crc32c_generic ata_generic
virtio_net virtio_blk net_failover failover uhci_hcd ata_piix crct10dif_pclmul
crct10dif_common ehci_hcd libata virtio_pci crc32_pclmul floppy crc32c_intel
virtio_pci_legacy_dev virtio_pci_modern_dev virtio scsi_mod usbcore psmouse
scsi_common virtio_ring usb_common i2c_piix4
[ 16.129279] CPU: 2 PID: 646 Comm: 6_dirty_io_sche Not tainted 6.5.0-0-amd64 #1 Debian 6.5~rc4-1~exp1
[ 16.129290] Hardware name: hq hqblade212.non.3dart.com, BIOS 0.5.1 01/01/2007
[ 16.129293] RIP: 0010:xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.134173] Code: 89 4c 24 04 72 2d e8 ea 5f f1 d8 8b 74 24 04 48 8d bd e0 00 00 00 e8 8a 40 94 d9 48 89 c3 48 85 c0 74 07 48 83 78 20 00 75 26 <0f> 0b e8 24 9c f1 d8 eb 13 f3 0f 1e fa 48 c7 c6 be fe 64 c0 4c 89
[ 16.134177] RSP: 0018:ffffae6740c67b60 EFLAGS: 00010246
[ 16.134189] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 16.134192] RDX: 0000000000000000 RSI: ffff9810817bfd98 RDI: 0000000000088b40
[ 16.134194] RBP: ffff9810809cd800 R08: ffff9810817bff28 R09: ffff9810809cd8e0
[ 16.134196] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9810861e2ae0
[ 16.134199] R13: 00000000000101c0 R14: ffff981081642900 R15: ffff981084217c00
[ 16.146474] FS: 00007f89734e06c0(0000) GS:ffff981119d00000(0000) knlGS:0000000000000000
[ 16.146478] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 16.146480] CR2: 00007f8972b04000 CR3: 00000001024a6002 CR4: 00000000000206e0
[ 16.146493] Call Trace:
[ 16.146533] <TASK>
[ 16.146536] ? xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.153113] ? __warn+0x81/0x130
[ 16.153193] ? xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.154796] ? report_bug+0x191/0x1c0
[ 16.155433] ? handle_bug+0x3c/0x80
[ 16.155468] ? exc_invalid_op+0x17/0x70
[ 16.155472] ? asm_exc_invalid_op+0x1a/0x20
[ 16.155498] ? xfs_iunlink+0x165/0x1e0 [xfs]
[ 16.159010] xfs_rename+0xaf9/0xe50 [xfs]
[ 16.159285] xfs_vn_rename+0xfe/0x170 [xfs]
[ 16.161080] ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[ 16.161137] vfs_rename+0xb7e/0xd40
[ 16.163241] ? do_renameat2+0x57a/0x5f0
[ 16.163256] do_renameat2+0x57a/0x5f0
[ 16.163277] __x64_sys_rename+0x43/0x50
[ 16.165707] do_syscall_64+0x60/0xc0
[ 16.165740] ? do_syscall_64+0x6c/0xc0
[ 16.165746] ? do_syscall_64+0x6c/0xc0
[ 16.165749] ? do_syscall_64+0x6c/0xc0
[ 16.165754] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 16.165776] RIP: 0033:0x7f89b7277997
[ 16.170373] Code: e8 ce 0f 0a 00 f7 d8 19 c0 5b c3 0f 1f 84 00 00 00 00 00 b8 ff ff ff ff 5b c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 94 17 00 f7 d8 64 89 02 b8
[ 16.170377] RSP: 002b:00007f89734dfc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 16.170382] RAX: ffffffffffffffda RBX: 00007f89734dfca0 RCX: 00007f89b7277997
[ 16.170384] RDX: 00007f8972b5fbb9 RSI: 00007f8972b5fb80 RDI: 00007f8972b5fb30
[ 16.170386] RBP: 00007f8975544480 R08: 0000000000000009 R09: 0000000000000fa0
[ 16.170388] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f89734dfca0
[ 16.170399] R13: 00007f89734dfcd0 R14: 00007f8972f5e3e8 R15: 00007f8975544480
[ 16.170407] </TASK>
[ 16.170408] ---[ end trace 0000000000000000 ]---
[ 16.170419] XFS (vda2): Internal error xfs_trans_cancel at line 1104 of file fs/xfs/xfs_trans.c. Caller xfs_rename+0x613/0xe50 [xfs]
[ 16.201330] CPU: 2 PID: 646 Comm: 6_dirty_io_sche Tainted: G W 6.5.0-0-amd64 #1 Debian 6.5~rc4-1~exp1
[ 16.201335] Hardware name: hq hqblade212.non.3dart.com, BIOS 0.5.1 01/01/2007
[ 16.201337] Call Trace:
[ 16.201377] <TASK>
[ 16.201382] dump_stack_lvl+0x47/0x60
[ 16.201412] xfs_trans_cancel+0x131/0x150 [xfs]
[ 16.214058] xfs_rename+0x613/0xe50 [xfs]
[ 16.218956] xfs_vn_rename+0xfe/0x170 [xfs]
[ 16.218956] ? __pfx_bpf_lsm_inode_permission+0x10/0x10
[ 16.222083] vfs_rename+0xb7e/0xd40
[ 16.222083] ? do_renameat2+0x57a/0x5f0
[ 16.222083] do_renameat2+0x57a/0x5f0
[ 16.222083] __x64_sys_rename+0x43/0x50
[ 16.222083] do_syscall_64+0x60/0xc0
[ 16.222083] ? do_syscall_64+0x6c/0xc0
[ 16.222083] ? do_syscall_64+0x6c/0xc0
[ 16.222083] ? do_syscall_64+0x6c/0xc0
[ 16.222083] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 16.222083] RIP: 0033:0x7f89b7277997
[ 16.222083] Code: e8 ce 0f 0a 00 f7 d8 19 c0 5b c3 0f 1f 84 00 00 00 00 00 b8 ff ff ff ff 5b c3 66 0f 1f 84 00 00 00 00 00 b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 39 94 17 00 f7 d8 64 89 02 b8
[ 16.222083] RSP: 002b:00007f89734dfc88 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ 16.222083] RAX: ffffffffffffffda RBX: 00007f89734dfca0 RCX: 00007f89b7277997
[ 16.222083] RDX: 00007f8972b5fbb9 RSI: 00007f8972b5fb80 RDI: 00007f8972b5fb30
[ 16.222083] RBP: 00007f8975544480 R08: 0000000000000009 R09: 0000000000000fa0
[ 16.222083] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f89734dfca0
[ 16.222083] R13: 00007f89734dfcd0 R14: 00007f8972f5e3e8 R15: 00007f8975544480
[ 16.222083] </TASK>
[ 16.223078] XFS (vda2): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x14a/0x150 [xfs] (fs/xfs/xfs_trans.c:1105). Shutting down filesystem.
[ 16.223941] systemd-journald[251]: /var/log/journal/48d6a6734183b423cc1f60686f4553bb/system.journal: IO error, rotating.
[ 16.224548] XFS (vda2): Please unmount the filesystem and rectify the problem(s)
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-07 17:17 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
Eric Sandeen (sandeen@sandeen.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandeen@sandeen.net
--- Comment #1 from Eric Sandeen (sandeen@sandeen.net) ---
Please try running xfs_repair on the filesystems in question, and capture the
output. (You can use xfs_repair -n for a dry run if you prefer; it will yield
the same basic information.)
My guess is that you will find complaints about unlinked inodes - please let
us know.
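For example, a sketch using the device from this report (the filesystem must
be unmounted first):

  umount /dev/vda2           # xfs_repair will not run on a mounted fs
  xfs_repair -n /dev/vda2    # -n: dry run; report problems, change nothing
  xfs_repair /dev/vda2       # real repair, once the report looks sane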
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-07 18:06 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #2 from Mariusz Gronczewski (xani666@gmail.com) ---
It did, thanks for the help! Why is that reported as "corruption of in-memory
data"?
The filesystem on the other machine also had exactly 3 disconnected inodes:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
agi unlinked bucket 0 is 559936 in ag 2 (inode=34114368)
agi unlinked bucket 42 is 175466 in ag 2 (inode=33729898)
agi unlinked bucket 53 is 198581 in ag 2 (inode=33753013)
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 33729898, would move to lost+found
disconnected inode 33753013, would move to lost+found
disconnected inode 34114368, would move to lost+found
Phase 7 - verify link counts...
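For anyone curious, those AGI buckets can also be inspected directly with
xfs_db - a sketch, read-only, against this report's device:

  xfs_db -r /dev/vda2
  xfs_db> agi 2              # seek to allocation group 2's AGI header
  xfs_db> print unlinked     # the 64 unlinked hash buckets; non-empty
                             # entries are AG-relative inode numbers
                             # (e.g. 175466 % 64 == 42, hence bucket 42)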
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-07 19:01 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #3 from Eric Sandeen (sandeen@sandeen.net) ---
It's essentially an unexpected/inconsistent in-memory state, as opposed to an
on-disk structure that was found to be corrupt.
I presume that it boots ok now post-repair?
Do you know if this was the root or /boot filesystem, or something else? It's
still a mystery how filesystems get into this state; we should never see a
clean filesystem - one requiring no log recovery - that still carries
unlinked inodes ... recovery is supposed to clear that.
It may have persisted on this filesystem for a very long time and it's just
recent code changes that have started tripping over it, but I've always had a
hunch that /boot seems to show the problem more often.
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-07 19:40 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #4 from Mariusz Gronczewski (xani666@gmail.com) ---
> It's essentially an unexpected/inconsistent in-memory state, as opposed to an
> on-disk structure that was found to be corrupt.
Shouldn't that also mark the filesystem as "dirty"? The problem is that this
basically makes the system unbootable without intervention: the OS thinks the
filesystem is clean, so it doesn't run xfs_repair; the driver then crashes
without marking it unclean; reboot, and the cycle repeats. The crash also
takes down every mounted XFS filesystem, so even if the log partition doesn't
have the problem, none of the logs persist. I had to reformat /var/log as
ext4 just to gather them on my laptop.
> I presume that it boots ok now post-repair?
Yes
> Do you know if this was the root or /boot filesystem or something else? It's
> still a mystery about how filesystems get into this state; we should never
> have a clean filesystem that requires no log recovery, but with unlinked
> inodes ... recovery is supposed to clear that.
It was root in both cases; we keep /boot on ext4.
So far (well, we have a few hundred more machines left to upgrade) I've only
seen it on old ones - might this be some bug that was fixed long ago but left
its mark on the filesystem?
> It may have persisted on this filesystem for a very long time and it's just
> recent code changes that have started tripping over it, but I've always had a
> hunch that /boot seems to show the problem more often.
That would track; I've only seen it on old machines (I think they were
formatted around the 4.9 kernel release, some even earlier). I just had
another case on a machine where, this time, reading certain files triggered
it.
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-07 20:15 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #5 from Eric Sandeen (sandeen@sandeen.net) ---
Ok, thanks - the other instance of this problem I saw recently likely also
started on a rather old kernel.
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-07 22:34 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #6 from Mariusz Gronczewski (xani666@gmail.com) ---
I still have an image of the VM with the problem, if you want me to check
something on it.
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-08 13:39 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #7 from Eric Sandeen (sandeen@sandeen.net) ---
What might be most useful is to create an xfs_metadump image of the problematic
filesystem (with -o if you are ok with showing filenames in the clear) and from
there we can examine things.
A metadump image is metadata only, no file data, and compresses well. This can
then be turned back into a filesystem image for analysis.
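Something like this sketch (substitute your device and file names; run it on
the unmounted device):

  xfs_metadump -o /dev/vda2 vda2.metadump   # -o: keep filenames unobfuscated
  gzip vda2.metadump                        # metadata only, compresses well
  # analysis side: turn it back into a mountable filesystem image
  gunzip vda2.metadump.gz
  xfs_mdrestore vda2.metadump vda2.img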
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-08 14:51 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #8 from Mariusz Gronczewski (xani666@gmail.com) ---
Created attachment 304795
--> https://bugzilla.kernel.org/attachment.cgi?id=304795&action=edit
Metadata dump
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-08 14:54 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #9 from Mariusz Gronczewski (xani666@gmail.com) ---
Created attachment 304796
--> https://bugzilla.kernel.org/attachment.cgi?id=304796&action=edit
Metadata dump after xfs_repair
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-08 14:55 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #10 from Mariusz Gronczewski (xani666@gmail.com) ---
I've attached metadata dumps from before and after xfs_repair.
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-08 21:54 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #11 from Eric Sandeen (sandeen@sandeen.net) ---
Another question: did the systems repeatedly fail to boot, or fail once and
then succeed?
If anything at all had happened on the filesystem prior to encountering the
problem, I think that the next boot should have seen a dirty log, and cleaned
up the problem as a result.
But if the first action on the fs was to create a tmpfile or unlink a file, we
might shut down before the log ever gets dirty and requires replay.
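As an illustration (paths made up), either of these as the very first
metadata change after a clean mount goes down the xfs_iunlink() path from
the trace above:

  mv -f /mnt/new /mnt/existing       # rename over an existing name: the
                                     # displaced inode loses its last link
                                     # and is parked on the AGI unlinked
                                     # list (xfs_rename -> xfs_iunlink)
  exec 3</mnt/file; rm /mnt/file     # unlinking a still-open file keeps the
                                     # inode on the unlinked list until the
                                     # last close

(an O_TMPFILE create starts life on that list the same way)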
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-08 22:07 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #12 from Mariusz Gronczewski (xani666@gmail.com) ---
The filesystem was never marked dirty, so fsck (which fixed the problem)
wasn't run on boot. So yes - once I upgraded the kernel, if it failed the
first time then it failed on every reboot after that, until I manually ran
fsck or downgraded the kernel. On the one machine where it failed after boot
(reading certain files triggered it), that too was repeatable.
Maybe on "corruption of in-memory data" the driver should also mark the
filesystem dirty? After all, if the data really is corrupted by a memory
error (and not, as here, by loading bad data), there is a nonzero chance
some of it was written back to disk.
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-09 16:43 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #13 from Richard W.M. Jones (rjones@redhat.com) ---
Was VMware (the hypervisor) ever involved here, e.g. were these VMware
guests, was VMware Tools installed, were they converted from VMware to KVM,
or anything similar?
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-09 18:31 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #14 from Mariusz Gronczewski (xani666@gmail.com) ---
Nope, and I've seen the same problem on bare metal.
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-08-09 19:10 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
--- Comment #15 from Richard W.M. Jones (rjones@redhat.com) ---
No problems. We had a similar bug reported internally that
happens on VMware guests, and I'm just trying to rule out VMware
as a factor.
* Re: [Bug 217769] XFS crash on mount on kernels >= 6.1
From: Darrick J. Wong @ 2023-08-29 23:42 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-xfs
On Wed, Aug 09, 2023 at 07:10:03PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217769
>
> --- Comment #15 from Richard W.M. Jones (rjones@redhat.com) ---
> No problems. We had a similar bug reported internally that
> happens on VMware guests, and I'm just trying to rule out VMware
> as a factor.
Does this:
https://lore.kernel.org/linux-xfs/20230829232043.GE28186@frogsfrogsfrogs/T/#u
help in any way?
--D
* [Bug 217769] XFS crash on mount on kernels >= 6.1
From: bugzilla-daemon @ 2023-11-14 15:57 UTC (permalink / raw)
To: linux-xfs
https://bugzilla.kernel.org/show_bug.cgi?id=217769
Grant Millar (grant@cylo.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |grant@cylo.net
--- Comment #17 from Grant Millar (grant@cylo.net) ---
We're experiencing the same bug following a data migration to new servers.
The servers are all running a fresh install of Debian 12 with brand new
hardware.
So far in the past 3 days we've had 2 mounts fail with:
[28797.357684] XFS (sdn): Internal error xfs_trans_cancel at line 1097 of file fs/xfs/xfs_trans.c. Caller xfs_rename+0x61a/0xea0 [xfs]
[28797.488475] XFS (sdn): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x146/0x150 [xfs] (fs/xfs/xfs_trans.c:1098). Shutting down filesystem.
[28797.488595] XFS (sdn): Please unmount the filesystem and rectify the problem(s)
Both occurred in the same function, xfs_rename+0x61a/0xea0, on separate
servers. Neither mount is the root filesystem.
versionnum [0xbcf5+0x18a] =
V5,NLINK,DIRV2,ATTR,QUOTA,ALIGN,LOGV2,EXTFLG,SECTOR,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT,CRC,FTYPE,FINOBT,SPARSE_INODES,REFLINK,INOBTCNT,BIGTIME
meta-data=/dev/sdk               isize=512    agcount=17, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=4394582016, imaxpct=50
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Please let me know if I can provide more information.
* Re: [Bug 217769] XFS crash on mount on kernels >= 6.1
From: Darrick J. Wong @ 2023-11-14 23:46 UTC (permalink / raw)
To: bugzilla-daemon; +Cc: linux-xfs
On Tue, Nov 14, 2023 at 03:57:06PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=217769
>
> Grant Millar (grant@cylo.net) changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |grant@cylo.net
>
> --- Comment #17 from Grant Millar (grant@cylo.net) ---
> We're experiencing the same bug following a data migration to new servers.
>
> The servers are all running a fresh install of Debian 12 with brand new
> hardware.
>
> So far in the past 3 days we've had 2 mounts fail with:
>
> [28797.357684] XFS (sdn): Internal error xfs_trans_cancel at line 1097 of file
> fs/xfs/xfs_trans.c. Caller xfs_rename+0x61a/0xea0 [xfs]
> [28797.488475] XFS (sdn): Corruption of in-memory data (0x8) detected at
> xfs_trans_cancel+0x146/0x150 [xfs] (fs/xfs/xfs_trans.c:1098). Shutting down
> filesystem.
> [28797.488595] XFS (sdn): Please unmount the filesystem and rectify the
> problem(s)
>
> Both occurred in the same function on separate servers: xfs_rename+0x61a/0xea0
>
> Neither mounts are the root filesystem.
This should be fixed in 6.6 - could you try that and report back?
(See "xfs: reload entire unlinked bucket lists".)
--D
>
> versionnum [0xbcf5+0x18a] =
> V5,NLINK,DIRV2,ATTR,QUOTA,ALIGN,LOGV2,EXTFLG,SECTOR,MOREBITS,ATTR2,LAZYSBCOUNT,PROJID32BIT,CRC,FTYPE,FINOBT,SPARSE_INODES,REFLINK,INOBTCNT,BIGTIME
>
> meta-data=/dev/sdk isize=512 agcount=17, agsize=268435455 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=1 finobt=1, sparse=1, rmapbt=0
> = reflink=1 bigtime=1 inobtcount=1 nrext64=0
> data = bsize=4096 blocks=4394582016, imaxpct=50
> = sunit=0 swidth=0 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=521728, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> Please let me know if I can provide more information.