To: linux-xfs@vger.kernel.org, djwong@kernel.org, hch@lst.de, dgc@kernel.org
From: yebin
Subject: [bug report] kernel BUG at fs/xfs/xfs_message.c:102!
Message-ID: <6A031038.9030708@huaweicloud.com>
Date: Tue, 12 May 2026 19:34:16 +0800

Hello Darrick and all,

Recently I hit a problem where a BUG was triggered during writeback. The details are as follows:

```
XFS (sde): Corruption of in-memory data (0x8) detected at xfs_trans_mod_sb+0xaa6/0xc60 (fs/xfs/xfs_trans.c:351). Shutting.
XFS (sde): Please unmount the filesystem and rectify the problem(s)
XFS: Assertion failed: tp->t_blk_res || tp->t_fdblocks_delta >= 0, file: fs/xfs/xfs_trans.c, line: 610
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
RIP: 0010:assfail+0x9f/0xb0
Code: fe 84 db 75 20 e8 51 2e 33 fe 0f 0b 5b 5d 41 5c 41 5d c3 cc cc cc cc 48 c7 c7 58 ae 2b 8d e8 08 73 a2 fe eb cc e8 310
RSP: 0018:ffffc9000f6372e0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff838c91a6
RDX: ffff8881a856bb00 RSI: ffffffff838c91cf RDI: 0000000000000001
RBP: 0000000000000000 R08: 0000000000000001 R09: fffff52001ec6ded
R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff8a956520
R13: 0000000000000262 R14: 0000000000000000 R15: ffffffffffffffff
FS: 00007f7ee1f5b740(0000) GS:ffff88878bb45000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0e632788f0 CR3: 00000001b524a000 CR4: 00000000000006f0
Call Trace:
 xfs_trans_unreserve_and_mod_sb+0xb86/0xd00
 __xfs_trans_commit+0x38b/0xe00
 xfs_trans_commit+0xeb/0x1a0
 xfs_bmapi_convert_one_delalloc+0xbca/0x1270
 xfs_bmapi_convert_delalloc+0x101/0x350
 xfs_writeback_range+0x76c/0x12d0
 iomap_writeback_folio+0x9ed/0x2100
 iomap_writepages+0x13c/0x2a0
 xfs_vm_writepages+0x278/0x330
 do_writepages+0x247/0x5c0
 filemap_writeback+0x22c/0x2e0
 xfs_file_release+0x442/0x580
 __fput+0x407/0xb50
 fput_close_sync+0x114/0x210
 __x64_sys_close+0x94/0x120
 do_syscall_64+0xc4/0xf80
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
```

After analyzing the above issues, the possible triggering process is as follows:

```
xfs_bmapi_convert_delalloc
  xfs_bmapi_convert_one_delalloc
    xfs_bmapi_allocate
      xfs_bmap_add_extent_delay_real
        da_old = startblockval(PREV.br_startblock); // da_old = 5
        case BMAP_LEFT_FILLING:
          ifp->if_nextents++; // 21 + 1 = 22
          if (xfs_bmap_needs_btree(bma->ip, whichfork)) // 22 > 21
            xfs_bmap_extents_to_btree // convert to btree
              cur->bc_ino.allocated++;
        da_new = XFS_FILBLKS_MIN(xfs_bmap_worst_indlen(bma->ip, temp),
                   startblockval(PREV.br_startblock) -
                   (bma->cur ? bma->cur->bc_ino.allocated : 0)); // da_new = 5 - 1 = 4
        PREV.br_startblock = nullstartblock(da_new);
// xfs_bmapi_convert_one_delalloc() returns
xfs_bmap_del_extent_real
  case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING:
    ifp->if_nextents--; // 22 - 1 = 21
    if (xfs_bmap_needs_btree(ip, whichfork))
      xfs_bmap_extents_to_btree
    else
      xfs_bmap_btree_to_extents // convert to extents
... // Alternate a few times in the middle.
da_old = 4
da_old = 3
da_old = 2
da_old = 1
...
xfs_bmapi_convert_delalloc
  xfs_bmapi_convert_one_delalloc
    error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, 0, 0, XFS_TRANS_RESERVE, &tp); // Both blocks and rtextents are 0
      tp = kmem_cache_zalloc(xfs_trans_cache, GFP_KERNEL | __GFP_NOFAIL);
      error = xfs_trans_reserve(tp, resp, blocks, rtextents);
        if (blocks > 0)
          error = xfs_mod_fdblocks(mp, -((int64_t)blocks), rsvd);
          tp->t_blk_res += blocks; // The value of blocks is 0, so the value of tp->t_blk_res is 0
    xfs_bmapi_allocate
      xfs_bmap_add_extent_delay_real
        da_old = startblockval(PREV.br_startblock); // da_old = 0
        case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING: // The current delay extent is just exhausted.
          ifp->if_nextents++; // 21 + 1 = 22
          if (xfs_bmap_needs_btree(bma->ip, whichfork)) // 22 > 21
            error = xfs_bmap_extents_to_btree(bma->tp, bma->ip, &bma->cur, da_old > 0, &tmp_logflags, whichfork); // Converted to btree. da_old > 0 is false.
              args.wasdel = wasdel; // wasdel is false
              error = xfs_alloc_vextent(&args);
                xfs_alloc_ag_vextent(args, 0)
                  xfs_ag_resv_alloc_extent(args->pag, args->resv, args);
                    case XFS_AG_RESV_NONE:
                      field = args->wasdel ?
                        XFS_TRANS_SB_RES_FDBLOCKS : XFS_TRANS_SB_FDBLOCKS; // args->wasdel == false
                      xfs_trans_mod_sb(args->tp, field, -(int64_t)args->len);
                        case XFS_TRANS_SB_FDBLOCKS:
                          if (delta < 0)
                            tp->t_blk_res_used += (uint)-delta;
                            if (tp->t_blk_res_used > tp->t_blk_res) // ***tp->t_blk_res is 0, thus triggering xfs_force_shutdown()***
                              xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
```

I constructed the sequence above deliberately so that the problem is easy to reproduce. Besides the fork being converted back and forth between XFS_DINODE_FMT_BTREE and XFS_DINODE_FMT_EXTENTS, btree splitting is another scenario that can lead to it.

The core reason is that, in xfs_bmapi_convert_delalloc(), the call to xfs_bmap_worst_indlen() calculates the worst-case number of reserved blocks, that is, the number of additional blocks required once the entire delayed extent has been converted. It assumes that the whole conversion is atomic, but the current process cannot guarantee that. On a fragmented filesystem, the most extreme case is that every block conversion triggers a full btree split, and then the reserved blocks are far from sufficient. When this issue was triggered, the filesystem in the environment was indeed severely fragmented.

Further analysis of this failure model shows that, because the reserved blocks are consumed continuously, consumption may eventually exceed the reserved amount. When free space is nearly exhausted, xfs_bmap_extents_to_btree() may fail to allocate blocks, triggering a warning. Failing to allocate these additional blocks can in turn cause problems for normal block allocation. Additionally, in xfs_bmap_add_extent_delay_real(), if a delayed extent is split in two, xfs_bmap_worst_indlen() is recalculated to reserve blocks. When space is nearly exhausted, it may be impossible to reserve the newly required blocks, which leads to a writeback failure.
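
To make the accounting in the sequence above concrete, here is a small user-space C model of it. This is only a sketch under simplifying assumptions, not kernel code: `struct model_trans`, `model_alloc_bmbt_block()` and `model_conversions_until_shutdown()` are hypothetical names, each conversion is assumed to allocate exactly one bmap btree block, and every transaction is assumed to start with the same block reservation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few transaction fields involved. */
struct model_trans {
	uint32_t blk_res;      /* models tp->t_blk_res */
	uint32_t blk_res_used; /* models tp->t_blk_res_used */
	bool shutdown;         /* set where the kernel would call xfs_force_shutdown() */
};

/*
 * Model one bmap btree block allocation during delalloc conversion.
 * *indlen models the indirect-block reservation still carried by the
 * delalloc extent (da_old in the sequence above).
 */
static void model_alloc_bmbt_block(struct model_trans *tp, uint32_t *indlen)
{
	bool wasdel = *indlen > 0; /* mirrors the "da_old > 0" argument */

	if (wasdel) {
		/* XFS_TRANS_SB_RES_FDBLOCKS case: paid for by the delalloc reservation. */
		(*indlen)--;
		return;
	}

	/* XFS_TRANS_SB_FDBLOCKS case: charged against the transaction reservation. */
	tp->blk_res_used++;
	if (tp->blk_res_used > tp->blk_res)
		tp->shutdown = true;
}

/*
 * Run up to max_rounds conversions, each in a fresh transaction.
 * Returns how many conversions completed before the modeled shutdown,
 * or -1 if no shutdown happened within max_rounds.
 */
int model_conversions_until_shutdown(uint32_t indlen,
				     uint32_t blk_res_per_trans,
				     int max_rounds)
{
	for (int i = 0; i < max_rounds; i++) {
		struct model_trans tp = { .blk_res = blk_res_per_trans };

		model_alloc_bmbt_block(&tp, &indlen);
		if (tp.shutdown)
			return i;
	}
	return -1;
}
```

With an initial indirect reservation of 5 and a per-transaction reservation of 0, as in the sequence above, `model_conversions_until_shutdown(5, 0, 16)` returns 5: five conversions are paid for by the delalloc reservation and the sixth trips the check. With a per-transaction reservation of 1 the model never reaches the shutdown.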

Reserving more blocks up front to cover the worst case would tie up a lot of extra space, which is not very practical. I was thinking that we could convert all the delayed extents at once to ensure atomicity, which would avoid both problems analyzed above. However, I am not sure what negative impacts this approach might have. The only one I can think of is that the reserved space would be repeatedly allocated and released, but I believe the current logic already has similar situations.

I haven't come up with a better solution so far. Does anyone have any good ideas?

Thanks,
Ye Bin