From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6EF7ACDB47E for ; Sun, 15 Oct 2023 19:06:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230261AbjJOTGg (ORCPT ); Sun, 15 Oct 2023 15:06:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42514 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230200AbjJOTGe (ORCPT ); Sun, 15 Oct 2023 15:06:34 -0400 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 934E9C5 for ; Sun, 15 Oct 2023 12:06:32 -0700 (PDT) Received: by smtp.kernel.org (Postfix) with ESMTPS id 3FD77C433CB for ; Sun, 15 Oct 2023 19:06:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1697396792; bh=3+hqk5hn4BpMotv9H/yuIHv2sIcrLzkaHVdVIqdAR30=; h=From:To:Subject:Date:In-Reply-To:References:From; b=T+vOMS4i5/HZp909pfFfUuAA+CI/5CwF/+GO4w6X3+IGch2/k28/pjOj8MMQrQBPR pHyhjwfbU+8hwXgZAAFpc54q7HGzdShTFMKRbU/1zXBSVhH4yuaNs7Ll819qHQ8QOz eKnoBCO75C9mcZwWmuSkl7jzNwB7Lukt9C9v1zRejOfb8Zp5HiXBWelQYB6kG0iUIx rLAVhLXVAAKCry0AB/tJ5GjR0csI4P0VS9aTfgsC5tk3y29pLFOAlamLb9iXNWmZjk siob2/Up7qcCbU4PLZilEcogv+yPW4PEARxBzdbSk2kNtQ4Z45lsi0RtiFN1vbpayx +a3UN0tRaWpvQ== Received: by aws-us-west-2-korg-bugzilla-1.web.codeaurora.org (Postfix, from userid 48) id 25C4DC53BD2; Sun, 15 Oct 2023 19:06:32 +0000 (UTC) From: bugzilla-daemon@kernel.org To: linux-ext4@vger.kernel.org Subject: [Bug 218011] ext4 root filesystem related hangs on 6.5 kernels Date: Sun, 15 Oct 2023 19:06:31 +0000 X-Bugzilla-Reason: None X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: AssignedTo fs_ext4@kernel-bugs.osdl.org X-Bugzilla-Product: File System X-Bugzilla-Component: ext4 X-Bugzilla-Version: 2.5 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: tytso@mit.edu X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: fs_ext4@kernel-bugs.osdl.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugzilla.kernel.org/ Auto-Submitted: auto-generated MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org https://bugzilla.kernel.org/show_bug.cgi?id=3D218011 Theodore Tso (tytso@mit.edu) changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |tytso@mit.edu --- Comment #6 from Theodore Tso (tytso@mit.edu) --- It would be really nice to get a translation from the stack trace offsets to line numbers, but what appears to be happening is that we're starting a jou= rnal commit, and to complete the journal, since we are in the default data=3Dord= ered mode, we call ext4_journalled_submit_inode_data_buffers(), which in turn ca= lls write_cache_pages() to flush out modified data blocks associated with an in= ode which had newly allocated blocks (so that we don't accidentally expose stale data blocks if there is a crash, which is a guarantee of data=3Dordered mod= e). The write_cache_pages() function is then calling some function in mm/filema= p.c (this is where a line number translation would be happy), which calls folio_wait_bit_common(), which presumably is waiting for some memory folio which is undergoing writeback, or otherwise busy, to complete. This then calls io_schedule() --- because we're waiting for some I/O to complete, and this apparently never completes, thus stalling the jbd2 commit operation, a= nd then all of the other processes which are trying to make changes to the file system are waiting for the commit complete, leading to all of the other sta= ck traces. The question is why is this happening on your system? It could be becaus= e of some kind of missed I/O completion interrupt, or some other problem in the block device layer or NVMe driver ---but normally if that were the case, th= ere should have been some kind of kernel log messages from those parts of the I= /O stack. Were there any that you could see (that perhaps were excerpted out= in the bug report, since "obviously" it was assumed this was an ext4 problem, = as opposed to ext4 simply being an innocent victim of problems lower down on t= he storage stack? The other question that might be worth asking is what sort of workload does your server run, and how might this be different from what other users migh= t be doing, or what we exercise with out regression tests? --=20 You may reply to this email to add a comment. You are receiving this mail because: You are watching the assignee of the bug.=