From: bugzilla-daemon@bugzilla.kernel.org
To: linux-xfs@vger.kernel.org
Subject: [Bug 214767] xfs seems to hang due to race condition? maybe related to (gratuitous) thaw.
Date: Wed, 20 Oct 2021 16:31:07 +0000
X-Bugzilla-Reason: None
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: AssignedTo filesystem_xfs@kernel-bugs.kernel.org
X-Bugzilla-Product: File System
X-Bugzilla-Component: XFS
X-Bugzilla-Version: 2.5
X-Bugzilla-Keywords:
X-Bugzilla-Severity: high
X-Bugzilla-Who: ct@flyingcircus.io
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: filesystem_xfs@kernel-bugs.kernel.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
X-Bugzilla-URL: https://bugzilla.kernel.org/
Auto-Submitted: auto-generated
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

https://bugzilla.kernel.org/show_bug.cgi?id=214767

--- Comment #3 from Christian Theune (ct@flyingcircus.io) ---
I have another machine that managed to break free after 20 minutes which did
around 3 thaws around that time:

Oct 20 18:05:22 pixometerstag04 kernel: INFO: task nix-daemon:1387736 blocked for more than 1228 seconds.
Oct 20 18:05:22 pixometerstag04 kernel: Not tainted 5.10.70 #1-NixOS
Oct 20 18:05:22 pixometerstag04 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 20 18:05:22 pixometerstag04 kernel: task:nix-daemon state:D stack:    0 pid:1387736 ppid: 1356 flags:0x00000000
Oct 20 18:05:22 pixometerstag04 kernel: Call Trace:
Oct 20 18:05:22 pixometerstag04 kernel: __schedule+0x271/0x860
Oct 20 18:05:22 pixometerstag04 kernel: schedule+0x46/0xb0
Oct 20 18:05:22 pixometerstag04 kernel: xfs_log_commit_cil+0x6a4/0x800 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: ? wake_up_q+0xa0/0xa0
Oct 20 18:05:22 pixometerstag04 kernel: __xfs_trans_commit+0x9d/0x310 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: xfs_create+0x472/0x560 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: xfs_generic_create+0x247/0x320 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: ? xfs_lookup+0x55/0x100 [xfs]
Oct 20 18:05:22 pixometerstag04 kernel: path_openat+0xdd7/0x1070
Oct 20 18:05:22 pixometerstag04 kernel: do_filp_open+0x88/0x130
Oct 20 18:05:22 pixometerstag04 kernel: ? getname_flags.part.0+0x29/0x1a0
Oct 20 18:05:22 pixometerstag04 kernel: do_sys_openat2+0x97/0x150
Oct 20 18:05:22 pixometerstag04 kernel: __x64_sys_openat+0x54/0x90
Oct 20 18:05:22 pixometerstag04 kernel: do_syscall_64+0x33/0x40
Oct 20 18:05:22 pixometerstag04 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Oct 20 18:05:22 pixometerstag04 kernel: RIP: 0033:0x7f3e3114dea8
Oct 20 18:05:22 pixometerstag04 kernel: RSP: 002b:00007ffd92f7df10 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
Oct 20 18:05:22 pixometerstag04 kernel: RAX: ffffffffffffffda RBX: 00000000000800c1 RCX: 00007f3e3114dea8
Oct 20 18:05:22 pixometerstag04 kernel: RDX: 00000000000800c1 RSI: 000000000195cb50 RDI: 00000000ffffff9c
Oct 20 18:05:22 pixometerstag04 kernel: RBP: 000000000195cb50 R08: 0000000000000000 R09: 0000000000000003
Oct 20 18:05:22 pixometerstag04 kernel: R10: 00000000000001b6 R11: 0000000000000293 R12: 00007ffd92f7e300
Oct 20 18:05:22 pixometerstag04 kernel: R13: 00007ffd92f7dfc0 R14: 00007ffd92f7dfb0 R15: 00007ffd92f7fd30

However, the first thaw came between when the machine already reported 122 and
245s of hangs and the last thaw came after it broke free.

Interestingly it also logs a variety of things that worked while those were
logged, maybe it only happened for the /tmp filesystem and not / ... apparently
postgresql was still writing stuff on the disk in that period, so I'd guess
this only happened on /tmp

This is square in the middle of the reported blocks:

Oct 20 17:52:00 pixometerstag04 postgres[1008]: user=,db= LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.004 s, sync=0.001 s, total=0.018 s; sync fi>

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.