From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Darrick J . Wong" <djwong@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Christoph Hellwig <hch@infradead.org>,
Brian Foster <bfoster@redhat.com>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
Disha Goel <disgoel@linux.ibm.com>,
Aravinda Herle <araherle@in.ibm.com>
Subject: Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
Date: Mon, 19 Jun 2023 22:25:44 +0530 [thread overview]
Message-ID: <87legfmlwv.fsf@doe.com> (raw)
In-Reply-To: <CAHc6FU70U9HXe3=THWO6K5uzvz7c0BH38K0GytUbZdgiXMfh+Q@mail.gmail.com>
Andreas Gruenbacher <agruenba@redhat.com> writes:
> On Mon, Jun 19, 2023 at 4:29 AM Ritesh Harjani (IBM)
> <ritesh.list@gmail.com> wrote:
>> When filesystem blocksize is less than folio size (either with
>> mapping_large_folio_support() or with blocksize < pagesize) and when the
>> folio is uptodate in pagecache, then even a byte write can cause
>> an entire folio to be written to disk during writeback. This happens
>> because we currently don't have a mechanism to track per-block dirty
>> state within struct iomap_folio_state. We currently only track uptodate
>> state.
>>
>> This patch implements support for tracking per-block dirty state in
>> iomap_folio_state->state bitmap. This should help improve the filesystem
>> write performance and help reduce write amplification.
>>
>> Performance testing of below fio workload reveals ~16x performance
>> improvement using nvme with XFS (4k blocksize) on Power (64K pagesize)
>> FIO reported write bw scores improved from around ~28 MBps to ~452 MBps.
>>
>> 1. <test_randwrite.fio>
>> [global]
>> ioengine=psync
>> rw=randwrite
>> overwrite=1
>> pre_read=1
>> direct=0
>> bs=4k
>> size=1G
>> dir=./
>> numjobs=8
>> fdatasync=1
>> runtime=60
>> iodepth=64
>> group_reporting=1
>>
>> [fio-run]
>>
>> 2. Also our internal performance team reported that this patch improves
>> their database workload performance by around ~83% (with XFS on Power)
>>
>> Reported-by: Aravinda Herle <araherle@in.ibm.com>
>> Reported-by: Brian Foster <bfoster@redhat.com>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> fs/gfs2/aops.c | 2 +-
>> fs/iomap/buffered-io.c | 189 ++++++++++++++++++++++++++++++++++++-----
>> fs/xfs/xfs_aops.c | 2 +-
>> fs/zonefs/file.c | 2 +-
>> include/linux/iomap.h | 1 +
>> 5 files changed, 171 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
>> index a5f4be6b9213..75efec3c3b71 100644
>> --- a/fs/gfs2/aops.c
>> +++ b/fs/gfs2/aops.c
>> @@ -746,7 +746,7 @@ static const struct address_space_operations gfs2_aops = {
>> .writepages = gfs2_writepages,
>> .read_folio = gfs2_read_folio,
>> .readahead = gfs2_readahead,
>> - .dirty_folio = filemap_dirty_folio,
>> + .dirty_folio = iomap_dirty_folio,
>> .release_folio = iomap_release_folio,
>> .invalidate_folio = iomap_invalidate_folio,
>> .bmap = gfs2_bmap,
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 391d918ddd22..50f5840bb5f9 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -25,7 +25,7 @@
>>
>> typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
>> /*
>> - * Structure allocated for each folio to track per-block uptodate state
>> + * Structure allocated for each folio to track per-block uptodate, dirty state
>> * and I/O completions.
>> */
>> struct iomap_folio_state {
>> @@ -35,31 +35,55 @@ struct iomap_folio_state {
>> unsigned long state[];
>> };
>>
>> +enum iomap_block_state {
>> + IOMAP_ST_UPTODATE,
>> + IOMAP_ST_DIRTY,
>> +
>> + IOMAP_ST_MAX,
>> +};
>> +
>> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
>> + enum iomap_block_state state, unsigned int *first_blkp,
>> + unsigned int *nr_blksp)
>> +{
>> + struct inode *inode = folio->mapping->host;
>> + unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> + unsigned int first = off >> inode->i_blkbits;
>> + unsigned int last = (off + len - 1) >> inode->i_blkbits;
>> +
>> + *first_blkp = first + (state * blks_per_folio);
>> + *nr_blksp = last - first + 1;
>> +}
>> +
>> static struct bio_set iomap_ioend_bioset;
>>
>> static inline bool ifs_is_fully_uptodate(struct folio *folio,
>> struct iomap_folio_state *ifs)
>> {
>> struct inode *inode = folio->mapping->host;
>> + unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> + unsigned int nr_blks = (IOMAP_ST_UPTODATE + 1) * blks_per_folio;
>
> This nr_blks calculation doesn't make sense.
>
About this, I have replied with more details here [1]
[1]: https://lore.kernel.org/linux-xfs/87o7lbmnam.fsf@doe.com/
>> - return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
>> + return bitmap_full(ifs->state, nr_blks);
>
> Could you please change this to:
>
> BUILD_BUG_ON(IOMAP_ST_UPTODATE != 0);
ditto
> return bitmap_full(ifs->state, blks_per_folio);
>
> Also, I'm seeing that the value of i_blocks_per_folio() is assigned to
> local variables with various names in several places (blks_per_folio,
> nr_blocks, nblocks). Maybe this could be made consistent.
>
I remember giving it a try, but then it was really not worth it because
all of above naming also does make sense in the way they are getting
used currently in the code. So, I think as long as it is clear behind
what those variables means and how are they getting used, it should be
ok, IMO.
-ritesh
next prev parent reply other threads:[~2023-06-19 16:56 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-19 2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 1/8] iomap: Rename iomap_page to iomap_folio_state and others Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 2/8] iomap: Drop ifs argument from iomap_set_range_uptodate() Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 3/8] iomap: Add some uptodate state handling helpers for ifs state bitmap Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 4/8] iomap: Fix possible overflow condition in iomap_write_delalloc_scan Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 5/8] iomap: Use iomap_punch_t typedef Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 6/8] iomap: Refactor iomap_write_delalloc_punch() function out Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 7/8] iomap: Allocate ifs in ->write_begin() early Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance Ritesh Harjani (IBM)
2023-06-19 12:43 ` Andreas Gruenbacher
2023-06-19 16:55 ` Ritesh Harjani [this message]
2023-06-19 14:26 ` Matthew Wilcox
2023-06-19 16:25 ` Ritesh Harjani
2023-06-19 16:54 ` Matthew Wilcox
2023-06-19 17:29 ` Ritesh Harjani
2023-06-20 5:01 ` Darrick J. Wong
2023-06-20 5:58 ` Ritesh Harjani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87legfmlwv.fsf@doe.com \
--to=ritesh.list@gmail.com \
--cc=agruenba@redhat.com \
--cc=araherle@in.ibm.com \
--cc=bfoster@redhat.com \
--cc=disgoel@linux.ibm.com \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.