From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C5B1030B535 for ; Tue, 14 Oct 2025 21:04:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760475849; cv=none; b=Vb1gJs0YVeR5nh3WFRfrrMK5fkoGhxXocwS0siR7SqVqQeSJuax0C61fSGfS51Syl56XMS4nfgZUlE4J6a6SuSa5SbRwusq/pQ75EL4gE91THjQo/ZBUWf5buwVm8znJS74uqExn5xRkaD0c79B+Q7el/30AuxtgN8NHCtJ4RCE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760475849; c=relaxed/simple; bh=7F84UB/wOAVzbYoLhR/QLzxALIfrhhT9J8300HHcVJk=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=PDs5NoTfn55J5qLx8TufmkxTbMCmjMpt4cjDyx34+Pi9FQFnOmzr1US0ydbB6oyi899aTCXQADlhxbw20vSuD1dFIIOpuPctPLtXiTT6TetdcxC30lJav4TzgMlIXCjyH7RPVLAcRFzXsP902TcsgCRTCb3lpFwAqej0/iahCgE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rOXUo45B; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="rOXUo45B" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 29B88C4CEE7; Tue, 14 Oct 2025 21:04:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1760475847; bh=7F84UB/wOAVzbYoLhR/QLzxALIfrhhT9J8300HHcVJk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=rOXUo45BqoGVjHuJJlKcYj5YjhKYDn51ct1tgyfZd1JxbDOHK3PMdX0jT5h+H2F3t v8J1JN+MayTeUdroYLLuzKZJIfpyTxXDUAsQGBqgUmXCbsNQhC6dMzHmmdEOOZYzbl uqjWbAgjhvCJv3Icho17oCph4dGvAhus4b39h/k4d4psmLFCEKlORR+k0JD462k6Hj CbHrb8svjWOX1Fgwro+YCoeAhZ4RsD9q3Hq8lurjBk1RbNgGBZ1NHrXH+95pawOdw5 afaOin9/HUunYQHV6MMcKP5HO3ipEDFRNP8qmnUx/qQ4VcUg7UNqT0OqS3TFhuuOoy GkED7ik41yGUQ== Date: Tue, 14 Oct 2025 14:04:06 -0700 From: "Darrick J. Wong" To: Damien Le Moal Cc: linux-xfs@vger.kernel.org, Carlos Maiolino , Christoph Hellwig , Hans Holmberg Subject: Re: [PATCH v4] xfs: do not tightly pack-write large files Message-ID: <20251014210406.GD6188@frogsfrogsfrogs> References: <20251014041945.760013-1-dlemoal@kernel.org> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20251014041945.760013-1-dlemoal@kernel.org> On Tue, Oct 14, 2025 at 01:19:45PM +0900, Damien Le Moal wrote: > When using a zoned realtime device, tightly packing of data blocks > belonging to multiple closed files into the same realtime group (RTG) > is very efficient at improving write performance. This is especially > true with SMR HDDs as this can reduce, and even suppress, disk head > seeks. > > However, such tight packing does not make sense for large files that > require at least a full RTG. If tight packing placement is applied for > such files, the VM writeback thread switching between inodes result in > the large files to be fragmented, thus increasing the garbage collection > penalty later when the RTG needs to be reclaimed. > > This problem can be avoided with a simple heuristic: if the size of the > inode being written back is at least equal to the RTG size, do not use > tight-packing. Modify xfs_zoned_pack_tight() to always return false in > this case. > > With this change, a multi-writer workload writing files of 256 MB on a > file system backed by an SMR HDD with 256 MB zone size as a realtime > device sees all files occupying exactly one RTG (i.e. one device zone), > thus completely removing the heavy fragmentation observed without this > change. > > Signed-off-by: Damien Le Moal Seems reasonable to me, it's like tail packing of the old days. Only now the blocks are 256M, like mkp says. ;) Reviewed-by: "Darrick J. Wong" --D > --- > Changes from v1: > - Improved commit message > - Improved code comments > Changes from v2: > - Fixed typos in the commit message > Changes from v3: > - Changed code comment as suggested by Christoph. > > fs/xfs/xfs_zone_alloc.c | 19 +++++++++++++++---- > 1 file changed, 15 insertions(+), 4 deletions(-) > > diff --git a/fs/xfs/xfs_zone_alloc.c b/fs/xfs/xfs_zone_alloc.c > index 1147bacb2da8..1b462cd5d8fa 100644 > --- a/fs/xfs/xfs_zone_alloc.c > +++ b/fs/xfs/xfs_zone_alloc.c > @@ -614,14 +614,25 @@ static inline enum rw_hint xfs_inode_write_hint(struct xfs_inode *ip) > } > > /* > - * Try to pack inodes that are written back after they were closed tight instead > - * of trying to open new zones for them or spread them to the least recently > - * used zone. This optimizes the data layout for workloads that untar or copy > - * a lot of small files. Right now this does not separate multiple such > + * Try to tightly pack small files that are written back after they were closed > + * instead of trying to open new zones for them or spread them to the least > + * recently used zone. This optimizes the data layout for workloads that untar > + * or copy a lot of small files. Right now this does not separate multiple such > * streams. > */ > static inline bool xfs_zoned_pack_tight(struct xfs_inode *ip) > { > + struct xfs_mount *mp = ip->i_mount; > + size_t zone_capacity = > + XFS_FSB_TO_B(mp, mp->m_groups[XG_TYPE_RTG].blocks); > + > + /* > + * Do not pack write files that are already using a full zone to avoid > + * fragmentation. > + */ > + if (i_size_read(VFS_I(ip)) >= zone_capacity) > + return false; > + > return !inode_is_open_for_write(VFS_I(ip)) && > !(ip->i_diflags & XFS_DIFLAG_APPEND); > } > -- > 2.51.0 > >