From: Dave Chinner <david@fromorbit.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>
Subject: Re: [LSF/MM/BPF TOPIC] Improving large folio writeback performance
Date: Wed, 15 Jan 2025 12:21:37 +1100
Message-ID: <Z4cNoWIWnC7XwCT8@dread.disaster.area>
In-Reply-To: <CAJnrk1a38pv3OgFZRfdTiDMXuPWuBgN8KY47XfOsYHj=N2wxAg@mail.gmail.com>

On Tue, Jan 14, 2025 at 04:50:53PM -0800, Joanne Koong wrote:
> Hi all,
> 
> I would like to propose a discussion topic about improving large folio
> writeback performance. As more filesystems adopt large folios, it
> becomes increasingly important that writeback is made to be as
> performant as possible. There are two areas I'd like to discuss:
> 
> 
> == Granularity of dirty pages writeback ==
> Currently, the granularity of writeback is at the folio level. If one
> byte in a folio is dirty, the entire folio will be written back. This
> becomes unscalable for larger folios and significantly degrades
> performance, especially for workloads that employ random writes.

This sounds familiar, probably because we fixed this exact issue in
the iomap infrastructure a while ago.

commit 4ce02c67972211be488408c275c8fbf19faf29b3
Author: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Date:   Mon Jul 10 14:12:43 2023 -0700

    iomap: Add per-block dirty state tracking to improve performance
    
    When filesystem blocksize is less than folio size (either with
    mapping_large_folio_support() or with blocksize < pagesize) and when the
    folio is uptodate in pagecache, then even a byte write can cause
    an entire folio to be written to disk during writeback. This happens
    because we currently don't have a mechanism to track per-block dirty
    state within struct iomap_folio_state. We currently only track uptodate
    state.
    
    This patch implements support for tracking per-block dirty state in
    iomap_folio_state->state bitmap. This should help improve the filesystem
    write performance and help reduce write amplification.
    
    Performance testing of the below fio workload reveals ~16x performance
    improvement using nvme with XFS (4k blocksize) on Power (64K pagesize).
    FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
    
    1. <test_randwrite.fio>
    [global]
            ioengine=psync
            rw=randwrite
            overwrite=1
            pre_read=1
            direct=0
            bs=4k
            size=1G
            dir=./
            numjobs=8
            fdatasync=1
            runtime=60
            iodepth=64
            group_reporting=1
    
    [fio-run]
    
    2. Also our internal performance team reported that this patch improves
       their database workload performance by around ~83% (with XFS on Power)
    
    Reported-by: Aravinda Herle <araherle@in.ibm.com>
    Reported-by: Brian Foster <bfoster@redhat.com>
    Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

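For the quoted thread, the mechanism behind that commit can be sketched as a
small userspace model (not kernel code; the struct and function names below
are invented for illustration): one dirty bit per filesystem block within a
folio, so writeback only has to issue the dirty blocks rather than the whole
folio.

```c
/*
 * Simplified userspace model of per-block dirty tracking, loosely in
 * the spirit of iomap_folio_state.  4k blocks in a 64k folio are
 * illustrative values; all names here are hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u
#define FOLIO_SIZE 65536u
#define BLOCKS     (FOLIO_SIZE / BLOCK_SIZE)   /* 16 blocks per folio */

struct folio_state {
	uint32_t dirty;   /* bit n set => block n of the folio is dirty */
};

/* Mark the blocks covering byte range [off, off + len) dirty. */
static void mark_dirty(struct folio_state *fs, size_t off, size_t len)
{
	size_t first = off / BLOCK_SIZE;
	size_t last = (off + len - 1) / BLOCK_SIZE;

	for (size_t b = first; b <= last; b++)
		fs->dirty |= 1u << b;
}

/* Bytes writeback must issue: dirty blocks only, not the whole folio. */
static size_t writeback_bytes(const struct folio_state *fs)
{
	size_t bytes = 0;

	for (size_t b = 0; b < BLOCKS; b++)
		if (fs->dirty & (1u << b))
			bytes += BLOCK_SIZE;
	return bytes;
}
```

With this model, a one-byte write dirties a single 4k block, so writeback
issues 4k instead of the full 64k folio, which is the kind of write
amplification reduction the commit message measured on 64K-pagesize Power.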

> One idea is to track dirty pages at a smaller granularity using a
> 64-bit bitmap stored inside the folio struct where each bit tracks a
> smaller chunk of pages (eg for 2 MB folios, each bit would track a
> 32 KB chunk), and only write back dirty chunks rather than the entire folio.

Have a look at how sub-folio state is tracked via the
folio->iomap_folio_state->state{} bitmaps.

Essentially it is up to the subsystem to track sub-folio state if
it requires it; there is some generic filesystem infrastructure
support already in place (like iomap), but if that doesn't fit a
filesystem then it will need to provide its own dirty/uptodate
tracking....

-Dave.
-- 
Dave Chinner
david@fromorbit.com


