From: bugzilla-daemon@kernel.org
To: linux-xfs@vger.kernel.org
Subject: [Bug 217572] Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13
Date: Thu, 02 Nov 2023 15:27:58 +0000
Message-ID: <bug-217572-201763-LUmZsDeuuk@https.bugzilla.kernel.org/>
In-Reply-To: <bug-217572-201763@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=217572
--- Comment #18 from Christian Theune (ct@flyingcircus.io) ---
We updated a while ago and our fleet is not seeing improved results. Things
actually seem to have gotten worse, judging by the number of alerts we've
seen.
We've had a multitude of crashes in the last weeks with the following
statistics:
6.1.31 - 2 affected machines
6.1.35 - 1 affected machine
6.1.37 - 1 affected machine
6.1.51 - 5 affected machines
6.1.55 - 2 affected machines
6.1.57 - 2 affected machines
Here's the more detailed behaviour of one of the machines with 6.1.57.
$ uptime
16:10:23 up 13 days 19:00, 1 user, load average: 3.21, 1.24, 0.57
$ uname -a
Linux ts00 6.1.57 #1-NixOS SMP PREEMPT_DYNAMIC Tue Oct 10 20:00:46 UTC 2023 x86_64 GNU/Linux
And here's the stall:
[654042.623386] rcu: INFO: rcu_preempt self-detected stall on CPU
[654042.624109] rcu: 1-....: (21079 ticks this GP) idle=380c/1/0x4000000000000000 softirq=136208646/136208648 fqs=7552
[654042.625253] (t=21000 jiffies g=210623333 q=40912 ncpus=2)
[654042.625871] CPU: 1 PID: 1230375 Comm: nix-build Not tainted 6.1.57 #1-NixOS
[654042.626650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[654042.627898] RIP: 0010:xas_descend+0x22/0x90
[654042.628379] Code: cc cc cc cc cc cc cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 <48> 83 f9 02 75 08 48 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc
[654042.630402] RSP: 0018:ffffa213c4c07bf8 EFLAGS: 00000202
[654042.630993] RAX: ffff8f9da3bca492 RBX: ffffa213c4c07d78 RCX: 0000000000000002
[654042.631782] RDX: 0000000000000004 RSI: ffff8f9eb8700248 RDI: ffffa213c4c07c08
[654042.632570] RBP: 000000000000010f R08: ffffa213c4c07e70 R09: ffff8f9e54dc2138
[654042.633352] R10: ffffa213c4c07e68 R11: ffff8f9e54dc2138 R12: 000000000000010f
[654042.634140] R13: ffff8f9d44c7ad00 R14: 0000000000000100 R15: ffffa213c4c07e98
[654042.634934] FS: 00007faf9514ff80(0000) GS:ffff8f9ebad00000(0000) knlGS:0000000000000000
[654042.635823] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[654042.636468] CR2: 00007faf78168000 CR3: 00000000366d2000 CR4: 00000000000006e0
[654042.637264] Call Trace:
[654042.637560] <IRQ>
[654042.637809] ? rcu_dump_cpu_stacks+0xc8/0x100
[654042.638305] ? rcu_sched_clock_irq.cold+0x15b/0x2fb
[654042.638862] ? sched_slice+0x87/0x140
[654042.639281] ? timekeeping_update+0xdd/0x130
[654042.639781] ? __cgroup_account_cputime_field+0x5b/0xa0
[654042.640363] ? update_process_times+0x77/0xb0
[654042.640862] ? update_wall_time+0xc/0x20
[654042.641305] ? tick_sched_handle+0x34/0x50
[654042.641773] ? tick_sched_timer+0x6f/0x80
[654042.642224] ? tick_sched_do_timer+0xa0/0xa0
[654042.642710] ? __hrtimer_run_queues+0x112/0x2b0
[654042.643220] ? hrtimer_interrupt+0xfe/0x220
[654042.643703] ? __sysvec_apic_timer_interrupt+0x7f/0x170
[654042.644286] ? sysvec_apic_timer_interrupt+0x99/0xc0
[654042.644849] </IRQ>
[654042.645101] <TASK>
[654042.645353] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[654042.645956] ? xas_descend+0x22/0x90
[654042.646366] xas_load+0x30/0x40
[654042.646738] filemap_get_read_batch+0x16e/0x250
[654042.647253] filemap_get_pages+0xa9/0x630
[654042.647714] filemap_read+0xd2/0x340
[654042.648124] ? __mod_memcg_lruvec_state+0x6e/0xd0
[654042.648670] xfs_file_buffered_read+0x4f/0xd0 [xfs]
[654042.649307] xfs_file_read_iter+0x6a/0xd0 [xfs]
[654042.649887] vfs_read+0x23c/0x310
[654042.650276] ksys_read+0x6b/0xf0
[654042.650658] do_syscall_64+0x3a/0x90
[654042.651071] entry_SYSCALL_64_after_hwframe+0x64/0xce
[654042.651650] RIP: 0033:0x7faf968ee78c
[654042.652085] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 a9 bb f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 ff bb f8 ff 48
[654042.654113] RSP: 002b:00007fff8d7e72e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[654042.654954] RAX: ffffffffffffffda RBX: 00005572a3d2c5f0 RCX: 00007faf968ee78c
[654042.655745] RDX: 0000000000010000 RSI: 00005572a3d2c5f0 RDI: 000000000000000c
[654042.656540] RBP: 00007fff8d7e7380 R08: 0000000000000000 R09: 0000000000000000
[654042.657327] R10: 0000000000000022 R11: 0000000000000246 R12: 000000000000000c
[654042.658119] R13: 00007faf96dfe6a8 R14: 0000000000000001 R15: 0000000000000001
[654042.658916] </TASK>
In previous situations this self-detected stall only happened after other
errors had occurred first. AFAICT it is now happening "standalone", without
those other errors. Maybe this is new info?
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.
Thread overview: 32+ messages
2023-06-19 8:29 [Bug 217572] New: Initial blocked tasks causing deterioration over hours until (nearly) complete system lockup and data loss with PostgreSQL 13 bugzilla-daemon
2023-06-20 15:10 ` Christian Theune
2023-06-20 15:11 ` Christian Theune
2023-06-20 15:10 ` [Bug 217572] " bugzilla-daemon
2023-06-20 15:13 ` bugzilla-daemon
2023-06-20 15:21 ` bugzilla-daemon
2023-06-20 17:26 ` bugzilla-daemon
2023-07-03 14:10 ` bugzilla-daemon
2023-07-03 19:56 ` bugzilla-daemon
2023-07-03 22:30 ` Dave Chinner
2023-07-03 22:30 ` bugzilla-daemon
2023-07-04 4:22 ` bugzilla-daemon
2023-07-05 22:07 ` bugzilla-daemon
2023-09-28 12:39 ` bugzilla-daemon
2023-09-28 22:44 ` Dave Chinner
2023-09-28 13:06 ` bugzilla-daemon
2023-09-28 22:44 ` bugzilla-daemon
2023-09-29 4:54 ` bugzilla-daemon
2023-09-29 5:01 ` bugzilla-daemon
2023-10-05 14:31 ` bugzilla-daemon
2023-10-08 17:35 ` bugzilla-daemon
2023-10-08 22:13 ` bugzilla-daemon
2023-11-02 15:27 ` bugzilla-daemon [this message]
2023-11-02 20:58 ` Dave Chinner
2023-11-02 15:28 ` bugzilla-daemon
2023-11-02 15:29 ` bugzilla-daemon
2023-11-02 16:23 ` bugzilla-daemon
2023-11-02 20:59 ` bugzilla-daemon
2023-11-03 12:52 ` bugzilla-daemon
2023-11-07 10:11 ` bugzilla-daemon
2023-11-07 10:25 ` bugzilla-daemon
2023-11-07 14:12 ` bugzilla-daemon