From: bugzilla-daemon@kernel.org
To: linux-xfs@vger.kernel.org
Subject: [Bug 217572] Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13
Date: Tue, 20 Jun 2023 15:10:49 +0000 [thread overview]
Message-ID: <bug-217572-201763-CDEPKlpRnd@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-217572-201763@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=217572
--- Comment #1 from Christian Theune (ct@flyingcircus.io) ---
Hi,
is there anything I can do to raise interest on this one? :)
> On 19. Jun 2023, at 10:29, bugzilla-daemon@kernel.org wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=217572
>
> Bug ID: 217572
> Summary: Initial blocked tasks causing deterioration over hours
> until (nearly) complete system lockup and data loss
> with PostgreSQL 13
> Product: File System
> Version: 2.5
> Hardware: All
> OS: Linux
> Status: NEW
> Severity: high
> Priority: P3
> Component: XFS
> Assignee: filesystem_xfs@kernel-bugs.kernel.org
> Reporter: ct@flyingcircus.io
> Regression: No
>
> Last Friday we experienced the following hung task messages with PostgreSQL
> while performing our nightly backup using pg_dump. Normally this takes at
> most
> a few minutes with IO being stressed. This time it caused high SYS CPU time,
> went on for almost 20 minutes and caused the PostgreSQL dump to fail with
> inconsistent data.
>
> Around 3:50 AM we got this:
>
> [330289.821046] INFO: task .postgres-wrapp:11884 blocked for more than 122
> seconds.
> [330289.821830] Not tainted 6.1.31 #1-NixOS
> [330289.822285] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [330289.823098] task:.postgres-wrapp state:D stack:0 pid:11884 ppid:11881
> flags:0x00000002
> [330289.823911] Call Trace:
> [330289.824221] <TASK>
> [330289.824451] __schedule+0x35d/0x1370
> [330289.824858] ? mntput_no_expire+0x4a/0x250
> [330289.825307] schedule+0x5d/0xe0
> [330289.825630] rwsem_down_write_slowpath+0x34e/0x730
> [330289.826128] xfs_ilock+0xeb/0xf0 [xfs]
> [330289.826599] xfs_file_buffered_write+0x119/0x300 [xfs]
> [330289.827212] ? selinux_file_permission+0x10b/0x150
> [330289.827683] vfs_write+0x244/0x400
> [330289.828049] __x64_sys_pwrite64+0x94/0xc0
> [330289.828459] do_syscall_64+0x3a/0x90
> [330289.828801] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330289.829302] RIP: 0033:0x7ff8de90e7f7
> [330289.829681] RSP: 002b:00007fff52069b08 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000012
> [330289.830408] RAX: ffffffffffffffda RBX: 0000562bb434f510 RCX:
> 00007ff8de90e7f7
> [330289.831073] RDX: 0000000000002000 RSI: 00007ff888b52e80 RDI:
> 000000000000003b
> [330289.831762] RBP: 00000000000021b0 R08: 000000000a000010 R09:
> 0000000000000040
> [330289.832440] R10: 0000000004ed8000 R11: 0000000000000202 R12:
> 0000000000002000
> [330289.833130] R13: 0000000004ed8000 R14: 00007ff8de8176c8 R15:
> 0000562bb434af75
> [330289.833803] </TASK>
> [330289.834064] INFO: task .postgres-wrapp:1245532 blocked for more than 122
> seconds.
> [330289.834771] Not tainted 6.1.31 #1-NixOS
> [330289.835209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [330289.835926] task:.postgres-wrapp state:D stack:0 pid:1245532
> ppid:11881
> flags:0x00000002
> [330289.836752] Call Trace:
> [330289.837010] <TASK>
> [330289.837258] __schedule+0x35d/0x1370
> [330289.837622] ? page_add_file_rmap+0xba/0x2f0
> [330289.838047] ? do_set_pte+0x174/0x1c0
> [330289.838420] ? unix_stream_read_generic+0x223/0xa60
> [330289.838887] schedule+0x5d/0xe0
> [330289.839265] schedule_preempt_disabled+0x14/0x30
> [330289.839758] rwsem_down_read_slowpath+0x29e/0x4f0
> [330289.840521] down_read+0x47/0xb0
> [330289.840853] xfs_ilock+0x79/0xf0 [xfs]
> [330289.841346] xfs_file_buffered_read+0x44/0xd0 [xfs]
> [330289.841945] xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330289.842496] vfs_read+0x23c/0x310
> [330289.842845] __x64_sys_pread64+0x94/0xc0
> [330289.843303] do_syscall_64+0x3a/0x90
> [330289.843857] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330289.844420] RIP: 0033:0x7ff8de90e747
> [330289.844898] RSP: 002b:00007fff5206aae8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330289.845705] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330289.846488] RDX: 0000000000002000 RSI: 00007ff8896e0e80 RDI:
> 0000000000000010
> [330289.847133] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330289.847627] R10: 0000000018fb8000 R11: 0000000000000202 R12:
> 00007ff8cdd91278
> [330289.848113] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5e89e70
> [330289.848631] </TASK>
>
>
> This keeps going on for a few minutes until:
>
> [330584.618978] rcu: INFO: rcu_preempt self-detected stall on CPU
> [330584.619413] rcu: 1-....: (20999 ticks this GP)
> idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=5231
> [330584.620052] (t=21000 jiffies g=142859597 q=21168 ncpus=3)
> [330584.620409] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [330584.620880] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [330584.621588] RIP: 0010:xas_load+0xb/0x40
> [330584.621844] Code: 04 48 8b 44 c2 08 c3 cc cc cc cc 48 8b 07 48 8b 40 08
> c3
> cc cc cc cc 0f 1f 84 00 00 00 00 00 e8 3b ff ff ff 48 89 c2 83 e2 03 <48> 83
> fa
> 02 75 08 48 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f
> [330584.622996] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000202
> [330584.623337] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX:
> 0000000000000002
> [330584.623787] RDX: 0000000000000002 RSI: ffff98871f9d2920 RDI:
> ffffb427c3387c00
> [330584.624239] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [330584.624686] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [330584.625136] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [330584.625591] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [330584.626097] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [330584.626467] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [330584.626954] Call Trace:
> [330584.627125] <IRQ>
> [330584.627281] ? rcu_dump_cpu_stacks+0xc8/0x100
> [330584.627567] ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [330584.627882] ? sched_slice+0x87/0x140
> [330584.628126] ? timekeeping_update+0xdd/0x130
> [330584.628414] ? __cgroup_account_cputime_field+0x5b/0xa0
> [330584.628751] ? update_process_times+0x77/0xb0
> [330584.629036] ? update_wall_time+0xc/0x20
> [330584.629300] ? tick_sched_handle+0x34/0x50
> [330584.629564] ? tick_sched_timer+0x6f/0x80
> [330584.629823] ? tick_sched_do_timer+0xa0/0xa0
> [330584.630103] ? __hrtimer_run_queues+0x112/0x2b0
> [330584.630404] ? hrtimer_interrupt+0xfe/0x220
> [330584.630678] ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [330584.631016] ? sysvec_apic_timer_interrupt+0x99/0xc0
> [330584.631339] </IRQ>
> [330584.631485] <TASK>
> [330584.631631] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [330584.631974] ? xas_load+0xb/0x40
> [330584.632195] ? xas_load+0x30/0x40
> [330584.632415] filemap_get_read_batch+0x16e/0x250
> [330584.632710] filemap_get_pages+0xa9/0x630
> [330584.632972] ? memcg_check_events+0xda/0x210
> [330584.633259] ? free_unref_page_commit+0x7c/0x170
> [330584.633560] ? _raw_spin_unlock_irqrestore+0x23/0x40
> [330584.633880] ? free_unref_page+0x1ac/0x220
> [330584.634146] filemap_read+0xd2/0x340
> [330584.634389] xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [330584.634778] xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330584.635123] vfs_read+0x23c/0x310
> [330584.635354] __x64_sys_pread64+0x94/0xc0
> [330584.635609] do_syscall_64+0x3a/0x90
> [330584.635846] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330584.636174] RIP: 0033:0x7ff8de90e747
> [330584.636437] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [330584.637582] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330584.638056] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330584.638506] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [330584.638954] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330584.639411] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [330584.639862] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [330584.640313] </TASK>
> [330584.721995] rcu: INFO: rcu_preempt detected expedited stalls on
> CPUs/tasks:
> { 1-.... } 21015 jiffies s: 7297 root: 0x2/.
> [330584.722930] rcu: blocking rcu_node structures (internal RCU debug):
> [330584.723428] Sending NMI from CPU 0 to CPUs 1:
> [330584.723763] NMI backtrace for cpu 1
> [330584.723769] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [330584.723772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [330584.723775] RIP: 0010:xas_load+0x29/0x40
> [330584.723783] Code: 00 e8 3b ff ff ff 48 89 c2 83 e2 03 48 83 fa 02 75 08
> 48
> 3d 00 10 00 00 77 05 c3 cc cc cc cc 0f b6 4f 10 48 8d 70 fe 38 48 fe <72> ee
> e8
> 20 fe ff ff 80 3e 00 75 d0 c3 cc cc cc cc 66 0f 1f 44 00
> [330584.723785] RSP: 0018:ffffb427c3387bf8 EFLAGS: 00000246
> [330584.723787] RAX: ffff98871f8dbdaa RBX: ffffb427c3387d70 RCX:
> 0000000000000000
> [330584.723788] RDX: 0000000000000002 RSI: ffff98871f8dbda8 RDI:
> ffffb427c3387c00
> [330584.723789] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [330584.723790] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [330584.723791] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [330584.723793] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [330584.723794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [330584.723795] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [330584.723798] Call Trace:
> [330584.723801] <NMI>
> [330584.723805] ? nmi_cpu_backtrace.cold+0x1b/0x76
> [330584.723810] ? nmi_cpu_backtrace_handler+0xd/0x20
> [330584.723814] ? nmi_handle+0x5d/0x120
> [330584.723817] ? xas_load+0x29/0x40
> [330584.723818] ? default_do_nmi+0x69/0x170
> [330584.723821] ? exc_nmi+0x13c/0x170
> [330584.723823] ? end_repeat_nmi+0x16/0x67
> [330584.723828] ? xas_load+0x29/0x40
> [330584.723830] ? xas_load+0x29/0x40
> [330584.723832] ? xas_load+0x29/0x40
> [330584.723834] </NMI>
> [330584.723834] <TASK>
> [330584.723835] filemap_get_read_batch+0x16e/0x250
> [330584.723840] filemap_get_pages+0xa9/0x630
> [330584.723842] ? memcg_check_events+0xda/0x210
> [330584.723845] ? free_unref_page_commit+0x7c/0x170
> [330584.723849] ? _raw_spin_unlock_irqrestore+0x23/0x40
> [330584.723851] ? free_unref_page+0x1ac/0x220
> [330584.723852] filemap_read+0xd2/0x340
> [330584.723857] xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [330584.723935] xfs_file_read_iter+0x6a/0xd0 [xfs]
> [330584.723990] vfs_read+0x23c/0x310
> [330584.723995] __x64_sys_pread64+0x94/0xc0
> [330584.723997] do_syscall_64+0x3a/0x90
> [330584.724000] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [330584.724003] RIP: 0033:0x7ff8de90e747
> [330584.724019] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [330584.724020] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [330584.724021] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [330584.724022] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [330584.724023] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [330584.724023] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [330584.724024] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [330584.724026] </TASK>
>
>
> This keeps repeating until around 4:03 which ends in systemd-journal getting
> coredumped:
>
> [331277.846966] rcu: INFO: rcu_preempt self-detected stall on CPU
> [331277.847413] rcu: 1-....: (713858 ticks this GP)
> idle=1204/1/0x4000000000000000 softirq=53073547/53073547 fqs=311996
> [331277.848088] (t=714253 jiffies g=142859597 q=1821602 ncpus=3)
> [331277.848462] CPU: 1 PID: 1246481 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [331277.848941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [331277.849667] RIP: 0010:xas_descend+0x22/0x70
> [331277.849952] Code: cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea
> 83
> e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 <48> 83
> f9
> 02 75 08 48 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc
> [331277.851120] RSP: 0018:ffffb427c3387bf0 EFLAGS: 00000202
> [331277.851468] RAX: ffff988876d3c002 RBX: ffffb427c3387d70 RCX:
> 0000000000000002
> [331277.851928] RDX: 000000000000000f RSI: ffff98871f9d2920 RDI:
> ffffb427c3387c00
> [331277.852386] RBP: 000000000000f161 R08: ffffb427c3387e68 R09:
> ffffda1444482000
> [331277.852847] R10: 0000000000000001 R11: 0000000000000001 R12:
> 000000000000f161
> [331277.853303] R13: ffff9887ad3c2700 R14: 000000000000f160 R15:
> ffffb427c3387e90
> [331277.853766] FS: 00007ff8de817800(0000) GS:ffff98887ac80000(0000)
> knlGS:0000000000000000
> [331277.854278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [331277.854653] CR2: 0000562bb434e2d8 CR3: 0000000104bd6000 CR4:
> 00000000000006e0
> [331277.855111] Call Trace:
> [331277.855284] <IRQ>
> [331277.855433] ? rcu_dump_cpu_stacks+0xc8/0x100
> [331277.855730] ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [331277.856049] ? sched_slice+0x87/0x140
> [331277.856294] ? timekeeping_update+0xdd/0x130
> [331277.856587] ? __cgroup_account_cputime_field+0x5b/0xa0
> [331277.856930] ? update_process_times+0x77/0xb0
> [331277.857222] ? update_wall_time+0xc/0x20
> [331277.857482] ? tick_sched_handle+0x34/0x50
> [331277.857758] ? tick_sched_timer+0x6f/0x80
> [331277.858024] ? tick_sched_do_timer+0xa0/0xa0
> [331277.858306] ? __hrtimer_run_queues+0x112/0x2b0
> [331277.858613] ? hrtimer_interrupt+0xfe/0x220
> [331277.858891] ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [331277.859235] ? sysvec_apic_timer_interrupt+0x99/0xc0
> [331277.859558] </IRQ>
> [331277.859710] <TASK>
> [331277.859857] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [331277.860206] ? xas_descend+0x22/0x70
> [331277.860446] xas_load+0x30/0x40
> [331277.860665] filemap_get_read_batch+0x16e/0x250
> [331277.860967] filemap_get_pages+0xa9/0x630
> [331277.861233] ? memcg_check_events+0xda/0x210
> [331277.861517] ? free_unref_page_commit+0x7c/0x170
> [331277.861834] ? _raw_spin_unlock_irqrestore+0x23/0x40
> [331277.862158] ? free_unref_page+0x1ac/0x220
> [331277.862427] filemap_read+0xd2/0x340
> [331277.862677] xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [331277.863072] xfs_file_read_iter+0x6a/0xd0 [xfs]
> [331277.863424] vfs_read+0x23c/0x310
> [331277.863657] __x64_sys_pread64+0x94/0xc0
> [331277.863917] do_syscall_64+0x3a/0x90
> [331277.864159] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [331277.864488] RIP: 0033:0x7ff8de90e747
> [331277.864752] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [331277.865917] RSP: 002b:00007fff5206aaa8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [331277.866395] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [331277.866848] RDX: 0000000000002000 RSI: 00007ff888b68e80 RDI:
> 0000000000000230
> [331277.867298] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [331277.867750] R10: 000000000f160000 R11: 0000000000000202 R12:
> 00007ff8cdd92688
> [331277.868199] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5fe04d0
> [331277.868658] </TASK>
> Fri Jun 16 04:03:23 AM CEST 2023 -- SERIAL CONSOLE IS LIVE --
> [331289.987247] systemd[1]: Starting Serial console liveness marker...
> [331289.992645] systemd[1]: nscd.service: Deactivated successfully.
> [331290.011838] systemd[1]: nscd.service: Consumed 3min 16.251s CPU time,
> received 12.0M IP traffic, sent 8.2M IP traffic.
> [331290.020289] systemd[1]: serial-console-liveness.service: Deactivated
> successfully.
> [331290.035818] systemd[1]: Finished Serial console liveness marker.
> [331290.068776] systemd[1]: Started Logrotate Service.
> [331290.069700] systemd[1]: systemd-journald.service: Watchdog timeout (limit
> 3min)!
> [331290.070475] systemd[1]: systemd-journald.service: Killing process 528
> (systemd-journal) with signal SIGABRT.
> [331290.096246] systemd[1]: logrotate.service: Deactivated successfully.
> [331290.285285] systemd-coredump[1267441]: Process 528 (systemd-journal) of
> user 0 dumped core.
> [331290.286002] systemd-coredump[1267441]: Coredump diverted to
>
> /var/lib/systemd/coredump/core.systemd-journal.0.39df468c96764e8c8fffc53a0d0a47d1.528.1686881003000000.zst
> [331290.287045] systemd-coredump[1267441]: Module libaudit.so.1 without
> build-id.
> [331290.287574] systemd-coredump[1267441]: Module libnftnl.so.11 without
> build-id.
> [331290.288111] systemd-coredump[1267441]: Module libmnl.so.0 without
> build-id.
> [331290.288587] systemd-coredump[1267441]: Module libgpg-error.so.0 without
> build-id.
> [331290.289076] systemd-coredump[1267441]: Module libattr.so.1 without
> build-id.
> [331290.289541] systemd-coredump[1267441]: Module libzstd.so.1 without
> build-id.
> [331290.289991] systemd-coredump[1267441]: Module liblzma.so.5 without
> build-id.
> [331290.290446] systemd-coredump[1267441]: Module libseccomp.so.2 without
> build-id.
> [331290.290940] systemd-coredump[1267441]: Module libpam.so.0 without
> build-id.
> [331290.291419] systemd-coredump[1267441]: Module liblz4.so.1 without
> build-id.
> [331290.291893] systemd-coredump[1267441]: Module libkmod.so.2 without
> build-id.
> [331290.292370] systemd-coredump[1267441]: Module libip4tc.so.2 without
> build-id.
> [331290.295165] systemd-coredump[1267441]: Module libgcrypt.so.20 without
> build-id.
> [331290.295678] systemd-coredump[1267441]: Module libcrypt.so.2 without
> build-id.
> [331290.296160] systemd-coredump[1267441]: Module libcap.so.2 without
> build-id.
> [331290.296708] systemd-coredump[1267441]: Module libacl.so.1 without
> build-id.
> [331290.297206] systemd-coredump[1267441]: Module libsystemd-shared-253.so
> without build-id.
> [331290.297768] systemd-coredump[1267441]: Module systemd-journald without
> build-id.
> [331290.298273] systemd-coredump[1267441]: Stack trace of thread 528:
> [331290.298714] systemd-coredump[1267441]: #0 0x00007f3e96431de8
> check_object_header (libsystemd-shared-253.so + 0x231de8)
> [331290.299435] systemd-coredump[1267441]: #1 0x00007f3e964330d2
> journal_file_move_to_object (libsystemd-shared-253.so + 0x2330d2)
> [331290.300199] systemd-coredump[1267441]: #2 0x00007f3e96434a71
> journal_file_find_data_object_with_hash (libsystemd-shared-253.so + 0x234a71)
> [331290.302207] systemd-coredump[1267441]: #3 0x00007f3e96434d3b
> journal_file_append_data (libsystemd-shared-253.so + 0x234d3b)
> [331290.302983] systemd-coredump[1267441]: #4 0x00007f3e96437243
> journal_file_append_entry (libsystemd-shared-253.so + 0x237243)
> [331290.303767] systemd-coredump[1267441]: #5 0x000056536a4a7cd5
> server_dispatch_message_real (systemd-journald + 0x10cd5)
> [331290.304504] systemd-coredump[1267441]: #6 0x000056536a4a134b
> dev_kmsg_record (systemd-journald + 0xa34b)
> [331290.305195] systemd-coredump[1267441]: #7 0x000056536a4a182b
> server_read_dev_kmsg (systemd-journald + 0xa82b)
> [331290.305922] systemd-coredump[1267441]: #8 0x00007f3e9645c140
> source_dispatch (libsystemd-shared-253.so + 0x25c140)
> [331290.306640] systemd-coredump[1267441]: #9 0x00007f3e9645c42d
> sd_event_dispatch (libsystemd-shared-253.so + 0x25c42d)
> [331290.307350] systemd-coredump[1267441]: #10 0x00007f3e9645cb48
> sd_event_run
> (libsystemd-shared-253.so + 0x25cb48)
> [331290.308043] systemd-coredump[1267441]: #11 0x000056536a4a0568 main
> (systemd-journald + 0x9568)
> [331290.308652] systemd-coredump[1267441]: #12 0x00007f3e9603dace
> __libc_start_call_main (libc.so.6 + 0x23ace)
> [331290.309292] systemd-coredump[1267441]: #13 0x00007f3e9603db89
> __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x23b89)
> [331290.310048] systemd-coredump[1267441]: #14 0x000056536a4a0835 _start
> (systemd-journald + 0x9835)
> [331290.310692] systemd-coredump[1267441]: ELF object binary architecture:
> AMD
> x86-64
>
>
> This go back to normal until around 20:03 (around 14 hours later):
>
> [389370.873026] rcu: INFO: rcu_preempt self-detected stall on CPU
> [389370.873481] rcu: 0-....: (21000 ticks this GP)
> idle=91fc/1/0x4000000000000000 softirq=85252827/85252827 fqs=4704
> [389370.874126] (t=21002 jiffies g=167843445 q=13889 ncpus=3)
> [389370.874500] CPU: 0 PID: 2202919 Comm: .postgres-wrapp Not tainted 6.1.31
> #1-NixOS
> [389370.874979] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> [389370.875710] RIP: 0010:xas_descend+0x26/0x70
> [389370.875991] Code: cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89
> d0
> 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 <75> 08
> 48
> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02
> [389370.877164] RSP: 0018:ffffb427c4917bf0 EFLAGS: 00000246
> [389370.877512] RAX: ffff98871f8dbdaa RBX: ffffb427c4917d70 RCX:
> 0000000000000002
> [389370.877967] RDX: 0000000000000005 RSI: ffff988876d3c000 RDI:
> ffffb427c4917c00
> [389370.878426] RBP: 000000000000f177 R08: ffffb427c4917e68 R09:
> ffff988846485d38
> [389370.878880] R10: ffffb427c4917e60 R11: ffff988846485d38 R12:
> 000000000000f177
> [389370.879341] R13: ffff988827b4ae00 R14: 000000000000f176 R15:
> ffffb427c4917e90
> [389370.879798] FS: 00007ff8de817800(0000) GS:ffff98887ac00000(0000)
> knlGS:0000000000000000
> [389370.880308] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [389370.880683] CR2: 00007ff881c8c000 CR3: 000000010dfea000 CR4:
> 00000000000006f0
> [389370.881139] Call Trace:
> [389370.881309] <IRQ>
> [389370.881463] ? rcu_dump_cpu_stacks+0xc8/0x100
> [389370.881755] ? rcu_sched_clock_irq.cold+0x15b/0x2fb
> [389370.882077] ? sched_slice+0x87/0x140
> [389370.882325] ? perf_event_task_tick+0x64/0x370
> [389370.882629] ? __cgroup_account_cputime_field+0x5b/0xa0
> [389370.882968] ? update_process_times+0x77/0xb0
> [389370.883258] ? tick_sched_handle+0x34/0x50
> [389370.883538] ? tick_sched_timer+0x6f/0x80
> [389370.883802] ? tick_sched_do_timer+0xa0/0xa0
> [389370.884084] ? __hrtimer_run_queues+0x112/0x2b0
> [389370.884388] ? hrtimer_interrupt+0xfe/0x220
> [389370.884666] ? __sysvec_apic_timer_interrupt+0x7f/0x170
> [389370.885006] ? sysvec_apic_timer_interrupt+0x99/0xc0
> [389370.885338] </IRQ>
> [389370.885488] <TASK>
> [389370.885636] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> [389370.885983] ? xas_descend+0x26/0x70
> [389370.886224] xas_load+0x30/0x40
> [389370.886456] filemap_get_read_batch+0x16e/0x250
> [389370.886758] filemap_get_pages+0xa9/0x630
> [389370.887026] ? atime_needs_update+0x104/0x180
> [389370.887317] ? touch_atime+0x46/0x1f0
> [389370.887573] filemap_read+0xd2/0x340
> [389370.887817] xfs_file_buffered_read+0x4f/0xd0 [xfs]
> [389370.888210] xfs_file_read_iter+0x6a/0xd0 [xfs]
> [389370.888571] vfs_read+0x23c/0x310
> [389370.888797] __x64_sys_pread64+0x94/0xc0
> [389370.889058] do_syscall_64+0x3a/0x90
> [389370.889299] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [389370.889634] RIP: 0033:0x7ff8de90e747
> [389370.889900] Code: 48 e8 9d dc f2 ff 41 b8 02 00 00 00 e9 38 f6 ff ff 66
> 90
> f3 0f 1e fa 80 3d bd ac 0e 00 00 49 89 ca 74 10 b8 11 00 00 00 0f 05 <48> 3d
> 00
> f0 ff ff 77 59 c3 48 83 ec 28 48 89 54 24 10 48 89 74 24
> [389370.891063] RSP: 002b:00007fff5206a848 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000011
> [389370.891549] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff8de90e747
> [389370.892003] RDX: 0000000000002000 RSI: 00007ff881c8ce80 RDI:
> 00000000000001c7
> [389370.892466] RBP: 0000000000000001 R08: 000000000a00000d R09:
> 0000000000000000
> [389370.892923] R10: 000000000f176000 R11: 0000000000000202 R12:
> 00007ff8cdd91278
> [389370.893386] R13: 0000562bb434af75 R14: 0000562bb434f510 R15:
> 0000562bb5e55828
> [389370.893843] </TASK>
>
>
> At this point we have to restart the VM externally because it didn't respond
> properly any longer.
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are watching the assignee of the bug.
Liebe Grüße,
Christian Theune
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
next prev parent reply other threads:[~2023-06-20 15:10 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-19 8:29 [Bug 217572] New: Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13 bugzilla-daemon
2023-06-20 15:10 ` Christian Theune
2023-06-20 15:11 ` Christian Theune
2023-06-20 15:10 ` bugzilla-daemon [this message]
2023-06-20 15:13 ` [Bug 217572] " bugzilla-daemon
2023-06-20 15:21 ` bugzilla-daemon
2023-06-20 17:26 ` bugzilla-daemon
2023-07-03 14:10 ` bugzilla-daemon
2023-07-03 19:56 ` bugzilla-daemon
2023-07-03 22:30 ` Dave Chinner
2023-07-03 22:30 ` bugzilla-daemon
2023-07-04 4:22 ` bugzilla-daemon
2023-07-05 22:07 ` bugzilla-daemon
2023-09-28 12:39 ` bugzilla-daemon
2023-09-28 22:44 ` Dave Chinner
2023-09-28 13:06 ` bugzilla-daemon
2023-09-28 22:44 ` bugzilla-daemon
2023-09-29 4:54 ` bugzilla-daemon
2023-09-29 5:01 ` bugzilla-daemon
2023-10-05 14:31 ` bugzilla-daemon
2023-10-08 17:35 ` bugzilla-daemon
2023-10-08 22:13 ` bugzilla-daemon
2023-11-02 15:27 ` bugzilla-daemon
2023-11-02 20:58 ` Dave Chinner
2023-11-02 15:28 ` bugzilla-daemon
2023-11-02 15:29 ` bugzilla-daemon
2023-11-02 16:23 ` bugzilla-daemon
2023-11-02 20:59 ` bugzilla-daemon
2023-11-03 12:52 ` bugzilla-daemon
2023-11-07 10:11 ` bugzilla-daemon
2023-11-07 10:25 ` bugzilla-daemon
2023-11-07 14:12 ` bugzilla-daemon
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bug-217572-201763-CDEPKlpRnd@https.bugzilla.kernel.org/ \
--to=bugzilla-daemon@kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).