From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org,
Ojaswin Mujoo <ojaswin@linux.ibm.com>, Jan Kara <jack@suse.cz>
Subject: Re: [RFCv3 7/7] iomap: Optimize data access patterns for filesystems with indirect mappings
Date: Sat, 27 Apr 2024 00:27:52 +0530 [thread overview]
Message-ID: <871q6symrz.fsf@gmail.com> (raw)
In-Reply-To: <Zivu0gzb4aiazSNu@casper.infradead.org>
Matthew Wilcox <willy@infradead.org> writes:
> On Fri, Apr 26, 2024 at 11:25:25PM +0530, Ritesh Harjani wrote:
>> Matthew Wilcox <willy@infradead.org> writes:
>> > The approach I suggested was to initialise read_bytes_pending to
>> > folio_size() at the start. Then subtract off blocksize for each
>> > uptodate block, whether you find it already uptodate, or as the
>> > completion handler runs.
>> >
>> > Is there a reason that doesn't work?
>>
>> That is what this patch series does right. The current patch does work
>> as far as my testing goes.
>>
>> For e.g. This is what initializes the r_b_p for the first time when
>> ifs->r_b_p is 0.
>>
>> + loff_t to_read = min_t(loff_t, iter->len - offset,
>> + folio_size(folio) - offset_in_folio(folio, orig_pos));
>> <..>
>> + if (!ifs->read_bytes_pending)
>> + ifs->read_bytes_pending = to_read;
>>
>>
>> Then this is where we subtract r_b_p for blocks which are uptodate.
>> + padjust = pos - orig_pos;
>> + ifs->read_bytes_pending -= padjust;
>>
>>
>> This is when we adjust r_b_p when we directly zero the folio.
>> if (iomap_block_needs_zeroing(iter, pos)) {
>> + if (ifs) {
>> + spin_lock_irq(&ifs->state_lock);
>> + ifs->read_bytes_pending -= plen;
>> + if (!ifs->read_bytes_pending)
>> + rbp_finished = true;
>> + spin_unlock_irq(&ifs->state_lock);
>> + }
>>
>> But as you see this requires surgery throughout read paths. What if
>> we add a state flag to ifs only for BH_BOUNDARY. Maybe that could
>> result in a more simplified approach?
>> Because all we require is to know whether the folio should be unlocked
>> or not at the time of completion.
>>
>> Do you think we should try that part or you think the current approach
>> looks ok?
>
> You've really made life hard for yourself. I had something more like
> this in mind. I may have missed a few places that need to be changed,
> but this should update read_bytes_pending everwhere that we set bits
> in the uptodate bitmap, so it should be right?
Please correct me if I am wrong.
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 41c8f0c68ef5..f87ca8ee4d19 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -79,6 +79,7 @@ static void iomap_set_range_uptodate(struct folio *folio, size_t off,
> if (ifs) {
> spin_lock_irqsave(&ifs->state_lock, flags);
> uptodate = ifs_set_range_uptodate(folio, ifs, off, len);
> + ifs->read_bytes_pending -= len;
> spin_unlock_irqrestore(&ifs->state_lock, flags);
> }
iomap_set_range_uptodate() gets called from ->write_begin() and
->write_end() too. So what we are saying is we are updating
the state of read_bytes_pending even though we are not in
->read_folio() or ->readahead() call?
>
> @@ -208,6 +209,8 @@ static struct iomap_folio_state *ifs_alloc(struct inode *inode,
> spin_lock_init(&ifs->state_lock);
> if (folio_test_uptodate(folio))
> bitmap_set(ifs->state, 0, nr_blocks);
> + else
> + ifs->read_bytes_pending = folio_size(folio);
We might not come till here during ->read_folio -> ifs_alloc(). Since we
might have a cached ifs which was allocated during write to this folio.
But unless you are saying that during writes, we would have set
ifs->r_b_p to folio_size() and when the read call happens, we use
the same value of the cached ifs.
Ok, I see. I was mostly focusing on updating ifs->r_b_p value only when
the reads bytes are actually pending during ->read_folio() or
->readahead() and not updating r_b_p during writes.
...One small problem which I see with this approach is - we might have
some non-zero value in ifs->r_b_p when ifs_free() gets called and it
might give a warning of non-zero ifs->r_b_p, because we updated
ifs->r_b_p during writes to a non-zero value, but the reads
never happend. Then during a call to ->release_folio, it will complain
of a non-zero ifs->r_b_p.
> if (folio_test_dirty(folio))
> bitmap_set(ifs->state, nr_blocks, nr_blocks);
> folio_attach_private(folio, ifs);
> @@ -396,12 +399,6 @@ static loff_t iomap_readpage_iter(const struct iomap_iter *iter,
> }
>
> ctx->cur_folio_in_bio = true;
> - if (ifs) {
> - spin_lock_irq(&ifs->state_lock);
> - ifs->read_bytes_pending += plen;
> - spin_unlock_irq(&ifs->state_lock);
> - }
> -
> sector = iomap_sector(iomap, pos);
> if (!ctx->bio ||
> bio_end_sector(ctx->bio) != sector ||
-ritesh
next prev parent reply other threads:[~2024-04-26 18:57 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-25 13:28 [RFCv3 0/7] ext2 iomap changes and iomap improvements Ritesh Harjani (IBM)
2024-04-25 13:28 ` [RFCv3 1/7] ext2: Remove comment related to journal handle Ritesh Harjani (IBM)
2024-04-26 15:21 ` Darrick J. Wong
2024-04-25 13:28 ` [RFCv3 2/7] ext2: Convert ext2 regular file buffered I/O to use iomap Ritesh Harjani (IBM)
2024-04-26 15:29 ` Darrick J. Wong
2024-04-25 13:28 ` [RFCv3 3/7] ext2: Enable large folio support Ritesh Harjani (IBM)
2024-04-26 15:30 ` Darrick J. Wong
2024-04-25 13:28 ` [RFCv3 4/7] ext2: Implement seq counter for validating cached iomap Ritesh Harjani (IBM)
2024-04-26 15:41 ` Darrick J. Wong
2024-05-12 13:20 ` Jan Kara
2024-04-25 13:28 ` [RFCv3 5/7] iomap: Fix iomap_adjust_read_range for plen calculation Ritesh Harjani (IBM)
2024-04-26 6:49 ` Christoph Hellwig
2024-04-26 8:52 ` Ritesh Harjani
2024-04-26 15:43 ` Darrick J. Wong
2024-04-25 13:28 ` [RFCv3 6/7] iomap: Optimize iomap_read_folio Ritesh Harjani (IBM)
2024-04-25 13:53 ` Matthew Wilcox
2024-04-26 6:53 ` Christoph Hellwig
2024-04-26 8:50 ` Ritesh Harjani
2024-04-27 4:44 ` Christoph Hellwig
2024-04-25 13:28 ` [RFCv3 7/7] iomap: Optimize data access patterns for filesystems with indirect mappings Ritesh Harjani (IBM)
2024-04-26 16:24 ` Darrick J. Wong
2024-04-26 17:17 ` Ritesh Harjani
2024-04-26 17:25 ` Ritesh Harjani
2024-04-26 17:37 ` Matthew Wilcox
2024-04-26 17:55 ` Ritesh Harjani
2024-04-26 18:13 ` Matthew Wilcox
2024-04-26 18:57 ` Ritesh Harjani [this message]
2024-04-26 19:19 ` Matthew Wilcox
2024-04-27 4:54 ` Christoph Hellwig
2024-04-27 6:03 ` Ritesh Harjani
2024-04-27 4:47 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=871q6symrz.fsf@gmail.com \
--to=ritesh.list@gmail.com \
--cc=djwong@kernel.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=ojaswin@linux.ibm.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.