From: "Darrick J. Wong" <djwong@kernel.org>
To: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Andreas Gruenbacher <agruenba@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
Brian Foster <bfoster@redhat.com>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>,
Disha Goel <disgoel@linux.ibm.com>,
Aravinda Herle <araherle@in.ibm.com>
Subject: Re: [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance
Date: Mon, 19 Jun 2023 22:01:02 -0700 [thread overview]
Message-ID: <20230620050102.GF11467@frogsfrogsfrogs> (raw)
In-Reply-To: <87ilbjmkd6.fsf@doe.com>
On Mon, Jun 19, 2023 at 10:59:09PM +0530, Ritesh Harjani wrote:
> Matthew Wilcox <willy@infradead.org> writes:
>
> > On Mon, Jun 19, 2023 at 09:55:53PM +0530, Ritesh Harjani wrote:
> >> Matthew Wilcox <willy@infradead.org> writes:
> >>
> >> > On Mon, Jun 19, 2023 at 07:58:51AM +0530, Ritesh Harjani (IBM) wrote:
> >> >> +static void ifs_calc_range(struct folio *folio, size_t off, size_t len,
> >> >> + enum iomap_block_state state, unsigned int *first_blkp,
> >> >> + unsigned int *nr_blksp)
> >> >> +{
> >> >> + struct inode *inode = folio->mapping->host;
> >> >> + unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);
> >> >> + unsigned int first = off >> inode->i_blkbits;
> >> >> + unsigned int last = (off + len - 1) >> inode->i_blkbits;
> >> >> +
> >> >> + *first_blkp = first + (state * blks_per_folio);
> >> >> + *nr_blksp = last - first + 1;
> >> >> +}
> >> >
> >> > As I said, this is not 'first_blkp'. It's first_bitp. I think this
> >> > misunderstanding is related to Andreas' complaint, but it's not quite
> >> > the same.
> >> >
> >>
> >> We represent each FS block as a bit in the bitmap. So first_blkp or
> >> first_bitp or first_blkbitp essentially means the same.
> >> I went with first_blk, first_blkp in the first place based on your
> >> suggestion itself [1].
> >
> > No, it's not the same! If you have 1kB blocks in a 64kB page, they're
> > numbered 0-63. If you 'calc_range' for any of the dirty bits, you get
> > back a number in the range 64-127. That's not a block number! It's
> > the number of the bit you want to refer to. Calling it blkp is going
> > to lead to confusion -- as you yourself seem to be confused.
> >
> >> [1]: https://lore.kernel.org/linux-xfs/Y%2FvxlVUJ31PZYaRa@casper.infradead.org/
> >
> > Those _were_ block numbers! off >> inode->i_blkbits calculates a block
> > number. (off >> inode->i_blkbits) + blocks_per_folio() does not calculate
> > a block number, it calculates a bit number.
> >
>
> Yes, I don't mind changing it to _bit. It is derived out of an FS block
> representation only. But I agree with your above argument using _bit in
> variable name makes it explicit and clear.
>
> >> >> - return bitmap_full(ifs->state, i_blocks_per_folio(inode, folio));
> >> >> + return bitmap_full(ifs->state, nr_blks);
> >> >
> >> > I think we have a gap in our bitmap APIs. We don't have a
> >> > 'bitmap_range_full(src, pos, nbits)'. We could use find_next_zero_bit(),
> >> > but that's going to do more work than necessary.
> >> >
> >> > Given this lack, perhaps it's time to say that you're making all of
> >> > this too hard by using an enum, and pretending that we can switch the
> >> > positions of 'uptodate' and 'dirty' in the bitmap just by changing
> >> > the enum.
> >>
> >> Actually I never wanted to use the the enum this way. That's why I was
> >> not fond of the idea behind using enum in all the bitmap state
> >> manipulation APIs (test/set/).
> >>
> >> It was only intended to be passed as a state argument to ifs_calc_range()
> >> function to keep all the first_blkp and nr_blksp calculation at one
> >> place. And just use it's IOMAP_ST_MAX value while allocating state bitmap.
> >> It was never intended to be used like this.
> >>
> >> We can even now go back to this original idea and keep the use of the
> >> enum limited to what I just mentioned above i.e. for ifs_calc_range().
> >>
> >> And maybe just use this in ifs_alloc()?
> >> BUILD_BUG_ON(IOMAP_ST_UPTODATE == 0);
> >> BUILD_BUG_ON(IOMAP_ST_DIRTY == 1);
> >>
> >> > Define the uptodate bits to be the first ones in the bitmap,
> >> > document it (and why), and leave it at that.
> >>
> >> Do you think we can go with above suggestion, or do you still think we
> >> need to drop it?
> >>
> >> In case if we drop it, then should we open code the calculations for
> >> first_blk, last_blk? These calculations are done in exact same fashion
> >> at 3 places ifs_set_range_uptodate(), ifs_clear_range_dirty() and
> >> ifs_set_range_dirty().
> >> Thoughts?
> >
> > I disliked the enum from the moment I saw it, but didn't care enough to
> > say so.
> >
> > Look, an abstraction should have a _purpose_. The enum doesn't. I'd
> > ditch this calc_range function entirely; it's just not worth it.
>
> I guess enum is creating more confusion with almost everyone than adding value.
> So I don't mind ditching it (unless anyone else opposes for keeping it).
>
> Also it would be helpful if you could let me know of any other review
> comments on the rest of the patch? Does the rest looks good to you?
I deleted my entire angry rant about how this review has turned a
fairly simple design change into a big mess that even the reviewers
don't understand anymore. I'm on vacation, I DGAF anymore.
Ritesh: Dump the enum; "because btrfs does it" is not sufficient
justification. The rest is good enough, I'll put it in iomap-for-next
along with willy's thing as soon as 6.5-rc1 closes, and if you all have
further complaints, send your own patches.
--D
> -ritesh
next prev parent reply other threads:[~2023-06-20 5:01 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-19 2:28 [PATCHv10 0/8] iomap: Add support for per-block dirty state to improve write performance Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 1/8] iomap: Rename iomap_page to iomap_folio_state and others Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 2/8] iomap: Drop ifs argument from iomap_set_range_uptodate() Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 3/8] iomap: Add some uptodate state handling helpers for ifs state bitmap Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 4/8] iomap: Fix possible overflow condition in iomap_write_delalloc_scan Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 5/8] iomap: Use iomap_punch_t typedef Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 6/8] iomap: Refactor iomap_write_delalloc_punch() function out Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 7/8] iomap: Allocate ifs in ->write_begin() early Ritesh Harjani (IBM)
2023-06-19 2:28 ` [PATCHv10 8/8] iomap: Add per-block dirty state tracking to improve performance Ritesh Harjani (IBM)
2023-06-19 12:43 ` Andreas Gruenbacher
2023-06-19 16:55 ` Ritesh Harjani
2023-06-19 14:26 ` Matthew Wilcox
2023-06-19 16:25 ` Ritesh Harjani
2023-06-19 16:54 ` Matthew Wilcox
2023-06-19 17:29 ` Ritesh Harjani
2023-06-20 5:01 ` Darrick J. Wong [this message]
2023-06-20 5:58 ` Ritesh Harjani
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230620050102.GF11467@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=agruenba@redhat.com \
--cc=araherle@in.ibm.com \
--cc=bfoster@redhat.com \
--cc=disgoel@linux.ibm.com \
--cc=hch@infradead.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=ritesh.list@gmail.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).