From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 6 Apr 2026 08:29:51 +1000
From: Dave Chinner
To: Matthew Wilcox
Cc: linux-xfs@vger.kernel.org
Subject: Re: Hang with xfs/285 on 2026-03-02 kernel
X-Mailing-List: linux-xfs@vger.kernel.org

On Sat, Apr 04, 2026 at 09:40:37PM +0100, Matthew Wilcox wrote:
> On Sat, Apr 04, 2026 at 10:42:59PM +1100, Dave Chinner wrote:
> > On Fri, Apr 03, 2026 at 04:35:46PM +0100, Matthew Wilcox wrote:
> > > This is with commit 5619b098e2fb so after 7.0-rc6
> > >
> > > INFO: task fsstress:3762792 blocked on a semaphore likely last held by task fsstress:3762793
> > > task:fsstress state:D stack:0 pid:3762793 tgid:3762793 ppid:3762783 task_flags:0x440140 flags:0x00080800
> > > Call Trace:
> > >
> > >  __schedule+0x560/0xfc0
> > >  schedule+0x3e/0x140
> > >  schedule_timeout+0x84/0x110
> > >  ? __pfx_process_timeout+0x10/0x10
> > >  io_schedule_timeout+0x5b/0x80
> > >  xfs_buf_alloc+0x793/0x7d0
> >
> > -ENOMEM.
> >
> > It'll be looping here:
> >
> > fallback:
> > 	for (;;) {
> > 		bp->b_addr = __vmalloc(size, gfp_mask);
> > 		if (bp->b_addr)
> > 			break;
> > 		if (flags & XBF_READ_AHEAD)
> > 			return -ENOMEM;
> > 		XFS_STATS_INC(bp->b_mount, xb_page_retries);
> > 		memalloc_retry_wait(gfp_mask);
> > 	}
> >
> > If it is looping here long enough to trigger the hang check timer,
> > then the MM subsystem is not making progress reclaiming memory. This
> > is probably a 16kB allocation (it's an inode cluster buffer), and
> > the allocation context is NOFAIL because it is within a transaction
> > (this loop pre-dates __vmalloc() supporting __GFP_NOFAIL)....
>
> There may be something else going on.  I reproduced it again and ssh'd
> into the VM.
> > # free > total used free shared buff/cache available > Mem: 3988260 1197132 240080 144 3147496 2791128 > Swap: 2097148 258128 1839020 > > There are five instances of fsstress running. Very slowly, but they are > accumulating seconds of CPU time: > > root@deadly-kvm:~# ps -aux |grep fsstress > root 3745227 0.0 0.0 2664 1476 ? S 06:48 0:00 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root 3745236 7.5 1.6 127928 65256 ? D 06:48 42:54 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root 3745237 7.6 1.5 124644 61308 ? D 06:48 42:55 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root 3745238 7.6 1.6 130844 65584 ? D 06:48 43:01 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root 3745239 7.6 1.6 126524 66536 ? D 06:48 42:58 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root@deadly-kvm:~# ps -aux |grep fsstress > root 3745227 0.0 0.0 2664 1476 ? S 06:48 0:00 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root 3745236 5.5 1.6 133116 66708 ? R 06:48 45:44 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root 3745237 5.5 1.5 130136 62516 ? R 06:48 45:45 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root 3745238 5.5 1.6 136520 65944 ? R 06:48 45:52 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > root 3745239 5.5 1.7 131988 67884 ? R 06:48 45:50 ./ltp/fsstress -p 4 -d /mnt/scratch -n 2000000 > > # cat /proc/3745239/stack > [<0>] xfs_buf_lock+0x4b/0x170 > [<0>] xfs_buf_find_lock+0x69/0x140 > [<0>] xfs_buf_get_map+0x265/0xbd0 > [<0>] xfs_buf_read_map+0x59/0x2e0 > [<0>] xfs_trans_read_buf_map+0x1bb/0x560 > [<0>] xfs_read_agi+0xab/0x1a0 > (...) It would be helpful to quote the full stack traces... > # cat /proc/3745238/stack > [<0>] xfs_buf_alloc+0x793/0x7d0 > [<0>] xfs_buf_get_map+0x651/0xbd0 > [<0>] xfs_buf_readahead_map+0x3b/0x1b0 > [<0>] xfs_iwalk_ichunk_ra+0xe9/0x130 > [<0>] xfs_iwalk_ag+0x185/0x2d0 > (...) However, how is memory allocation stuck here? That's the readahead path, which triggers an early exit from the __vmalloc() fallback loop. i.e. 
xfs_buf_alloc() does not loop forever on readahead - it tries once
and then exits.

Yes, this bulkstat path is holding the AGI buffer locked, and the
previous thread is waiting on the AGI buffer lock, but that doesn't
mean the system is deadlocked - it's just lockstepping on the AGI
buffer lock due to the long hold in the bulkstat path....

i.e. these traces do not indicate that there is any sort of memory
allocation problem in the system, just bulkstat slowing down other
operations...

-Dave.

-- 
Dave Chinner
dgc@kernel.org