Date: Wed, 2 Jul 2025 11:44:39 -0700
From: "Darrick J. Wong"
To: Kundan Kumar
Cc: Anuj gupta, Christoph Hellwig, Anuj Gupta/Anuj Gupta, Kundan Kumar,
    jaegeuk@kernel.org, chao@kernel.org, viro@zeniv.linux.org.uk,
    brauner@kernel.org, jack@suse.cz, miklos@szeredi.hu, agruenba@redhat.com,
    trondmy@kernel.org, anna@kernel.org, akpm@linux-foundation.org,
    willy@infradead.org, mcgrof@kernel.org, clm@meta.com, david@fromorbit.com,
    amir73il@gmail.com, axboe@kernel.dk, ritesh.list@gmail.com,
    dave@stgolabs.net, p.raghav@samsung.com, da.gomez@samsung.com,
    linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
    gfs2@lists.linux.dev, linux-nfs@vger.kernel.org, linux-mm@kvack.org,
    gost.dev@samsung.com
Subject: Re: [PATCH 00/13] Parallelizing filesystem writeback
Message-ID: <20250702184439.GD9991@frogsfrogsfrogs>
References: <20250529111504.89912-1-kundan.kumar@samsung.com>
 <20250602141904.GA21996@lst.de>
 <20250603132434.GA10865@lst.de>
 <20250611155144.GD6138@frogsfrogsfrogs>

On Tue, Jun 24, 2025 at 11:29:28AM +0530, Kundan Kumar wrote:
> On Wed, Jun 11, 2025 at 9:21 PM Darrick J. Wong wrote:
> >
> > On Wed, Jun 04, 2025 at 02:52:34PM +0530, Kundan Kumar wrote:
> > > > > > For xfs used this command:
> > > > > > xfs_io -c "stat" /mnt/testfile
> > > > > > And for ext4 used this:
> > > > > > filefrag /mnt/testfile
> > > > >
> > > > > filefrag merges contiguous extents, and only counts up for discontiguous
> > > > > mappings, while fsxattr.nextents counts all extents even if they are
> > > > > contiguous. So you probably want to use filefrag for both cases.
> > > > Got it -- thanks for the clarification. We'll switch to using filefrag
> > > > and will share updated extent count numbers accordingly.
> > > Using filefrag, we recorded extent counts on xfs and ext4 at three
> > > stages:
> > > a. Just after a 1G random write,
> > > b. After a 30-second wait,
> > > c. After unmounting and remounting the filesystem.
> > >
> > > xfs
> > > Base
> > > a. 6251 b. 2526 c. 2526
> > > Parallel writeback
> > > a. 6183 b. 2326 c. 2326
> >
> > Interesting that the mapping record count goes down...
> >
> > I wonder, you said the xfs filesystem has 4 AGs and 12 cores, so I guess
> > wb_ctx_arr[] is 12? I wonder, do you see a knee point in writeback
> > throughput when the # of wb contexts exceeds the AG count?
> >
> > Though I guess for the (hopefully common) case of pure overwrites, we
> > don't have to do any metadata updates so we wouldn't really hit a
> > scaling limit due to ag count or log contention or whatever. Does that
> > square with what you see?
> >
>
> Hi Darrick,
>
> We analyzed AG count vs. number of writeback contexts to identify any
> knee point. Earlier, wb_ctx_arr[] was fixed at 12; now we varied
> nr_wb_ctx and measured the impact.
>
> We implemented a configurable number of writeback contexts to make
> throughput measurements easier. This feature will be exposed in the
> next series. To configure it, we used:
> echo <nr_wb_ctx> > /sys/class/bdi/259:2/nwritebacks
>
> In our test, writing 1G across 12 directories, bandwidth improved up to
> the number of allocation groups (AGs), which is roughly where the knee
> point sits; gains tapered off beyond that. We also see a large increase
> in bandwidth, about 16x, from base to nr_wb_ctx = 6.
>
> Base (single threaded)              : 9799KiB/s
> Parallel Writeback (nr_wb_ctx = 1)  : 9727KiB/s
> Parallel Writeback (nr_wb_ctx = 2)  : 18.1MiB/s
> Parallel Writeback (nr_wb_ctx = 3)  : 46.4MiB/s
> Parallel Writeback (nr_wb_ctx = 4)  : 135MiB/s
> Parallel Writeback (nr_wb_ctx = 5)  : 160MiB/s
> Parallel Writeback (nr_wb_ctx = 6)  : 163MiB/s

Heh, nice!
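If it helps anyone else reproduce the sweep, something along these lines
(untested sketch; it assumes the per-bdi "nwritebacks" knob from your
series and the job_nvme.fio file from the script quoted below) avoids
hardcoding the 259:2 bdi and loops over the context counts:

DEV=/dev/nvme0n1
# Resolve the bdi (MAJ:MIN) for the test device instead of hardcoding 259:2.
BDI=$(lsblk -dno MAJ:MIN "$DEV" | tr -d '[:space:]')

for n in $(seq 1 12); do
	mkfs.xfs -f "$DEV" > /dev/null
	mount "$DEV" /mnt
	# "nwritebacks" is the knob from this patch series; adjust if renamed.
	echo "$n" > "/sys/class/bdi/$BDI/nwritebacks"
	for i in $(seq 1 12); do mkdir -p "/mnt/dir$i"; done
	sync
	echo 3 > /proc/sys/vm/drop_caches
	fio job_nvme.fio --output="wb_ctx_$n.log"
	umount /mnt
done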

> Parallel Writeback (nr_wb_ctx = 7)  : 162MiB/s
> Parallel Writeback (nr_wb_ctx = 8)  : 154MiB/s
> Parallel Writeback (nr_wb_ctx = 9)  : 152MiB/s
> Parallel Writeback (nr_wb_ctx = 10) : 145MiB/s
> Parallel Writeback (nr_wb_ctx = 11) : 145MiB/s
> Parallel Writeback (nr_wb_ctx = 12) : 138MiB/s
>
>
> System config
> =============
> Number of CPUs = 12
> System RAM = 9G
> Number of XFS AGs = 4
> NVMe SSD: 3.84 TB (Enterprise SSD PM1733a)
>
> Script
> ======
> mkfs.xfs -f /dev/nvme0n1
> mount /dev/nvme0n1 /mnt
> echo <nr_wb_ctx> > /sys/class/bdi/259:2/nwritebacks
> sync
> echo 3 > /proc/sys/vm/drop_caches
>
> for i in {1..12}; do
>     mkdir -p /mnt/dir$i
> done
>
> fio job_nvme.fio
>
> umount /mnt
> echo 3 > /proc/sys/vm/drop_caches
> sync
>
> fio job
> =======
> [global]
> bs=4k
> iodepth=1
> rw=randwrite
> ioengine=io_uring
> nrfiles=12
> numjobs=1        # Each job writes to a different file
> size=1g
> direct=0         # Buffered I/O to trigger writeback
> group_reporting=1
> create_on_open=1
> name=test
>
> [job1]
> directory=/mnt/dir1
>
> [job2]
> directory=/mnt/dir2
> ...
> ...
> [job12]
> directory=/mnt/dir12
> > > ext4
> > > Base
> > > a. 7080 b. 7080 c. 11
> > > Parallel writeback
> > > a. 5961 b. 5961 c. 11
> >
> > Hum, that's particularly ... interesting. I wonder what the mapping
> > count behaviors are when you turn off delayed allocation?
> >
> > --D
> >
>
> I attempted to disable delayed allocation by setting allocsize=4096
> during mount (mount -o allocsize=4096 /dev/pmem0 /mnt), but still
> observed a reduction in file fragments after a delay. Is there something
> I'm overlooking?

Not that I know of. Maybe we should just take the win. :)

--D

> -Kundan
>
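P.S. For the next round of extent numbers, here's roughly how I'd script
the three-stage filefrag comparison so xfs and ext4 get counted the same
way (untested sketch; the mount point, device, and the dir*/ file layout
are just the ones from your fio job above, adjust as needed):

MNT=/mnt
DEV=/dev/nvme0n1

count_extents() {
	# filefrag prints "<file>: <N> extents found"; sum N over the test files.
	local total=0
	for f in "$MNT"/dir*/*; do
		total=$((total + $(filefrag "$f" | awk '{print $2}')))
	done
	echo "$1: $total extents"
}

count_extents "just after write"
sleep 30
count_extents "after 30 seconds"
umount "$MNT"
mount "$DEV" "$MNT"
count_extents "after remount"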