From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Andreas Gruenbacher <agruenba@redhat.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Darrick J . Wong" <djwong@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Christoph Hellwig <hch@infradead.org>,
Brian Foster <bfoster@redhat.com>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
Disha Goel <disgoel@linux.ibm.com>,
Aravinda Herle <araherle@in.ibm.com>
Subject: Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
Date: Mon, 19 Jun 2023 22:25:44 +0530 [thread overview]
Message-ID: <87legfmlwv.fsf@doe.com> (raw)
In-Reply-To: <CAHc6FU70U9HXe3=THWO6K5uzvz7c0BH38K0GytUbZdgiXMfh+Q@mail.gmail.com>
Andreas Gruenbacher <agruenba@redhat.com> writes:
> On Mon, Jun 19, 2023 at 4:29 AM Ritesh Harjani (IBM)
> <ritesh.list@gmail.com> wrote:
>> When filesystem blocksize is less than folio size (either with
>> mapping_large_folio_support() or with blocksize < pagesize) and when the
>> folio is uptodate in pagecache, then even a byte write can cause
>> an entire folio to be written to disk during writeback. This happens
>> because we currently don't have a mechanism to track per-block dirty
>> state within struct iomap_folio_state. We currently only track uptodate
>> state.
>>
>> This patch implements support for tracking per-block dirty state in
>> iomap_folio_state->state bitmap. This should help improve the filesystem
>> write performance and help reduce write amplification.
>>
>> Performance testing of the below fio workload reveals a ~16x performance
>> improvement using nvme with XFS (4k blocksize) on Power (64K pagesize).
>> FIO-reported write bandwidth improved from around ~28 MBps to ~452 MBps.
>>
>> 1. <test_randwrite.fio>
>> [global]
>> ioengine=psync
>> rw=randwrite
>> overwrite=1
>> pre_read=1
>> direct=0
>> bs=4k
>> size=1G
>> dir=./
>> numjobs=8
>> fdatasync=1
>> runtime=60
>> iodepth=64
>> group_reporting=1
>>
>> [fio-run]
>>
>> 2. Also our internal performance team reported that this patch improves
>> their database workload performance by around ~83% (with XFS on Power)
>>
>> Reported-by: Aravinda Herle <araherle@in.ibm.com>
>> Reported-by: Brian Foster <bfoster@redhat.com>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> fs/gfs2/aops.c | 2 +-
>> fs/iomap/buffered-io.c | 189 ++++++++++++++++++++++++++++++++++++-----
>> fs/xfs/xfs_aops.c | 2 +-
>> fs/zonefs/file.c | 2 +-
>> include/linux/iomap.h | 1 +
>> 5 files changed, 171 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
>> index a5f4be6b9213..75efec3c3b71 100644
>> --- a/fs/gfs2/aops.c
>> +++ b/fs/gfs2/aops.c
>> @@ -746,7 +746,7 @@ static const struct address_space_operations gfs2_aops = {
>> .writepages = gfs2_writepages,
>> .read_folio = gfs2_read_folio,
>> .readahead = gfs2_readahead,
>> - .dirty_folio = filemap_dirty_folio,
>> + .dirty_folio = iomap_dirty_folio,
>> .release_folio = iomap_release_folio,
>> .invalidate_folio = iomap_invalidate_folio,
>> .bmap = gfs2_bmap,
>> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
>> index 391d918ddd22..50f5840bb5f9 100644
>> --- a/fs/iomap/buffered-io.c
>> +++ b/fs/iomap/buffered-io.c
>> @@ -25,7 +25,7 @@
>>
>> typedef int (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length);
>> /*
>> - * Structure allocated for each folio to track per-block uptodate state
>> + * Structure allocated for each folio to track per-block uptodate, dirty state
>> * and I/O completions.
>> */
>> struct iomap_folio_state {
>> @@ -35,31 +35,55 @@ struct iomap_folio_state {
>> unsigned long state[];
>> };
>>
>> +enum iomap_block_state {
>> + IOMAP_ST_UPTODATE,
>> + IOMAP_ST_DIRTY,
>> +
>> + IOMAP_ST_MAX,
>> +};
>> +
>> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
>> + enum iomap_block_state state, unsigned int *first_blkp,
>> + unsigned int *nr_blksp)
>> +{
>> + struct inode *inode = folio->mapping->host;
>> + unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> + unsigned int first = off >> inode->i_blkbits;
>> + unsigned int last = (off + len - 1) >> inode->i_blkbits;
>> +
>> + *first_blkp = first + (state * blks_per_folio);
>> + *nr_blksp = last - first + 1;
>> +}
>> +
>> static struct bio_set iomap_ioend_bioset;
>>
>> static inline bool ifs_is_fully_uptodate(struct folio *folio,
>> struct iomap_folio_state *ifs)
>> {
>> struct inode *inode = folio->mapping->host;
>> + unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
>> + unsigned int nr_blks = (IOMAP_ST_UPTODATE + 1) * blks_per_folio;
>
> This nr_blks calculation doesn't make sense.
>
I have replied with more details on this here [1]:
[1]: https://lore.kernel.org/linux-xfs/87o7lbmnam.fsf@doe.com/
>> - return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
>> + return bitmap_full(ifs->state, nr_blks);
>
> Could you please change this to:
>
> BUILD_BUG_ON(IOMAP_ST_UPTODATE != 0);
ditto
> return bitmap_full(ifs->state, blks_per_folio);
>
> Also, I'm seeing that the value of i_blocks_per_folio() is assigned to
> local variables with various names in several places (blks_per_folio,
> nr_blocks, nblocks). Maybe this could be made consistent.
>
I remember giving it a try, but it really wasn't worth it, because all of
the above names do make sense in the contexts where they are currently
used in the code. So as long as it is clear what those variables mean and
how they are used, it should be ok, IMO.
-ritesh