From: Dave Chinner <david@fromorbit.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org,
"Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: Re: [LSF/MM/BPF TOPIC] Improving large folio writeback performance
Date: Wed, 15 Jan 2025 12:21:37 +1100 [thread overview]
Message-ID: <Z4cNoWIWnC7XwCT8@dread.disaster.area> (raw)
In-Reply-To: <CAJnrk1a38pv3OgFZRfdTiDMXuPWuBgN8KY47XfOsYHj=N2wxAg@mail.gmail.com>
On Tue, Jan 14, 2025 at 04:50:53PM -0800, Joanne Koong wrote:
> Hi all,
>
> I would like to propose a discussion topic about improving large folio
> writeback performance. As more filesystems adopt large folios, it
> becomes increasingly important that writeback is made to be as
> performant as possible. There are two areas I'd like to discuss:
>
>
> == Granularity of dirty pages writeback ==
> Currently, the granularity of writeback is at the folio level. If one
> byte in a folio is dirty, the entire folio will be written back. This
> becomes unscalable for larger folios and significantly degrades
> performance, especially for workloads that employ random writes.
This sounds familiar, probably because we fixed this exact issue in
the iomap infrastructure a while ago:
commit 4ce02c67972211be488408c275c8fbf19faf29b3
Author: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Date: Mon Jul 10 14:12:43 2023 -0700
iomap: Add per-block dirty state tracking to improve performance
When filesystem blocksize is less than folio size (either with
mapping_large_folio_support() or with blocksize < pagesize) and when the
folio is uptodate in pagecache, then even a byte write can cause
an entire folio to be written to disk during writeback. This happens
because we currently don't have a mechanism to track per-block dirty
state within struct iomap_folio_state. We currently only track uptodate
state.
This patch implements support for tracking per-block dirty state in
iomap_folio_state->state bitmap. This should help improve the filesystem
write performance and help reduce write amplification.
Performance testing of below fio workload reveals ~16x performance
improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
1. <test_randwrite.fio>
[global]
ioengine=psync
rw=randwrite
overwrite=1
pre_read=1
direct=0
bs=4k
size=1G
dir=./
numjobs=8
fdatasync=1
runtime=60
iodepth=64
group_reporting=1
[fio-run]
2. Also our internal performance team reported that this patch improves
their database workload performance by around ~83% (with XFS on Power)
Reported-by: Aravinda Herle <araherle@in.ibm.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> One idea is to track dirty pages at a smaller granularity using a
> 64-bit bitmap stored inside the folio struct where each bit tracks a
> smaller chunk of the folio (eg for 2 MB folios, each bit would track
> a 32 KiB chunk), and only write back dirty chunks rather than the
> entire folio.
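The chunk-bitmap idea quoted above can be modelled in user space. The
sketch below is illustrative only, not kernel code: the names
(mark_dirty, writeback_dirty, CHUNK_SIZE) are invented here. One 64-bit
word covers a 2 MiB folio at 32 KiB granularity, and "writeback" walks
only the set bits instead of submitting the whole folio.

```c
#include <stddef.h>
#include <stdint.h>

#define FOLIO_SIZE (2u << 20)          /* 2 MiB folio            */
#define CHUNK_SIZE (FOLIO_SIZE / 64)   /* 32 KiB per bitmap bit  */

/* Mark the chunks covering [off, off + len) dirty in the bitmap. */
static void mark_dirty(uint64_t *map, size_t off, size_t len)
{
	size_t first = off / CHUNK_SIZE;
	size_t last = (off + len - 1) / CHUNK_SIZE;

	for (size_t i = first; i <= last; i++)
		*map |= (uint64_t)1 << i;
}

/* Walk the bitmap and return how many bytes writeback would submit. */
static size_t writeback_dirty(uint64_t map)
{
	size_t written = 0;

	for (size_t i = 0; i < 64; i++)
		if (map & ((uint64_t)1 << i))
			written += CHUNK_SIZE;  /* would issue I/O here */
	return written;
}
```

With this layout, a one-byte write dirties a single 32 KiB chunk, so
writeback submits 32 KiB rather than the full 2 MiB folio.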
Have a look at how sub-folio state is tracked via the
folio->iomap_folio_state->state{} bitmaps.
Essentially it is up to the subsystem to track sub-folio state if
it requires it; there is some generic filesystem infrastructure
support already in place (like iomap), but if that doesn't fit a
filesystem then it will need to provide its own dirty/uptodate
tracking....
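For illustration, the iomap-style layout can be sketched as a single
per-folio bitmap whose first half tracks uptodate state and second half
tracks dirty state. This is a simplified user-space model, not the
kernel's struct iomap_folio_state (which uses a spinlock and an
unsigned long array sized at runtime); the names here are invented.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of per-folio block state: bits [0, nr_blocks) are
 * uptodate bits, bits [nr_blocks, 2 * nr_blocks) are dirty bits.
 * A single uint64_t limits this model to 32 blocks per folio.
 */
struct folio_state {
	unsigned int nr_blocks;  /* blocks (fs blocksize units) in folio */
	uint64_t state;
};

static void fs_set_dirty(struct folio_state *fs, unsigned int block)
{
	fs->state |= (uint64_t)1 << (fs->nr_blocks + block);
}

static bool fs_block_is_dirty(const struct folio_state *fs,
			      unsigned int block)
{
	return fs->state & ((uint64_t)1 << (fs->nr_blocks + block));
}
```

Writeback then tests each block's dirty bit and only maps and submits
the dirty ranges, which is what yields the write-amplification win the
commit message above reports.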
-Dave.
--
Dave Chinner
david@fromorbit.com