linux-fsdevel.vger.kernel.org archive mirror
From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, djwong@kernel.org,
	hch@infradead.org, brauner@kernel.org, jack@suse.cz,
	yi.zhang@huawei.com, chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH 5/6] iomap: drop unnecessary state_lock when setting ifs uptodate bits
Date: Fri, 2 Aug 2024 10:57:41 +0800	[thread overview]
Message-ID: <b40a510d-37b3-da50-79db-d56ebd870bf0@huaweicloud.com> (raw)
In-Reply-To: <Zqwi48H74g2EX56c@dread.disaster.area>

On 2024/8/2 8:05, Dave Chinner wrote:
> On Wed, Jul 31, 2024 at 05:13:04PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> Commit 1cea335d1db1 ("iomap: fix sub-page uptodate handling") fixed a
>> race when submitting multiple read bios for a page that spans more
>> than one file system block, by adding a spinlock (now named
>> state_lock) to serialise the page uptodate update. However, the race
>> condition only happens between the read I/O submission and completion threads,
> 
> When we do writeback on a folio that has multiple blocks in it, we
> can submit multiple bios for that, too. Hence the write completions
> can race with each other and with write submission, too.
> 
> Yes, write bio submission and completion only need to update ifs
> accounting using an atomic operation, but the same race condition
> exists even though the folio is fully locked at the point of bio
> submission.
> 
> 
>> it's
>> sufficient to use page lock to protect other paths, e.g. buffered write
>                     ^^^^ folio
>> path.
>>
>> Now that large folios are supported, the spinlock has a bigger
>> impact on buffered write performance, so dropping it reduces some
>> unnecessary locking overhead.
> 
> From the point of view of simple to understand and maintain code, I
> think this is a bad idea. The data structure is currently protected
> by the state lock in all situations, but this change now makes it
> protected by the state lock in one case and the folio lock in a
> different case.

Yeah, I agree that this is a side-effect of this change. After this patch,
we have to be careful to distinguish between the two cases B1 and B2 below,
as Willy mentioned.

B. If ifs_set_range_uptodate() is called from iomap_set_range_uptodate(),
   either we know:
B1. The caller of iomap_set_range_uptodate() holds the folio lock, and this
    is the only place that can call ifs_set_range_uptodate() for this folio
B2. The caller of iomap_set_range_uptodate() holds the state lock

> 
> Making this change also misses the elephant in the room: the
> buffered write path still needs the ifs->state_lock to update the
> dirty bitmap. Hence we're effectively changing the serialisation
> mechanism for only one of the two ifs state bitmaps that the
> buffered write path has to update.
> 
> Indeed, we can't get rid of the ifs->state_lock from the dirty range
> updates because iomap_dirty_folio() can be called without the folio
> being locked through folio_mark_dirty() calling the ->dirty_folio()
> aop.
> 

Sorry, I don't understand: why can folio_mark_dirty() be called without the
folio lock (isn't that supposed to be a bug)? IIUC, all file-backed folios
must be locked before being marked dirty. Are there any exceptions, or am
I missing something?

> IOWs, getting rid of the state lock from the uptodate range
> changes does not actually get rid of it from the buffered IO path.
> We still have to take it to update the dirty range, and so there's
> an obvious way to optimise the state lock usage without changing any
> of the bitmap access serialisation behaviour, i.e. we combine the
> uptodate and dirty range updates in __iomap_write_end() into a
> single lock context such as:
> 
> static void iomap_set_range_dirty_uptodate(struct folio *folio,
> 		size_t off, size_t len)
> {
> 	struct iomap_folio_state *ifs = folio->private;
> 	struct inode *inode;
> 	unsigned int blks_per_folio;
> 	unsigned int first_blk;
> 	unsigned int last_blk;
> 	unsigned int nr_blks;
> 	unsigned long flags;
> 
> 	if (!ifs)
> 		return;
> 
> 	inode = folio->mapping->host;
> 	blks_per_folio = i_blocks_per_folio(inode, folio);
> 	first_blk = off >> inode->i_blkbits;
> 	last_blk = (off + len - 1) >> inode->i_blkbits;
> 	nr_blks = last_blk - first_blk + 1;
> 
> 	spin_lock_irqsave(&ifs->state_lock, flags);
> 	/* uptodate bits live in the first half of ifs->state... */
> 	bitmap_set(ifs->state, first_blk, nr_blks);
> 	/* ...and dirty bits in the second half */
> 	bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks);
> 	spin_unlock_irqrestore(&ifs->state_lock, flags);
> }
> 
> This means we calculate the bitmap offsets only once, we take the
> state lock only once, and we don't do anything if there is no
> sub-folio state.
> 
> If we then fix the __iomap_write_begin() code as Willy pointed out
> to elide the erroneous uptodate range update, then we end up only
> taking the state lock once per buffered write instead of 3 times per
> write.
> 
> This patch only reduces it to twice per buffered write, so doing the
> above should provide even better performance without needing to
> change the underlying serialisation mechanism at all.
> 

Thanks for the suggestion. I thought about this solution too, but I
didn't think we need the state_lock when setting the ifs dirty bits since
the folio lock should be sufficient, so I changed my mind and planned to
drop all ifs state_lock usage in the write path (please see patch 6).
Please let me know if I'm wrong.

Thanks,
Yi.


  reply	other threads:[~2024-08-02  2:57 UTC|newest]

Thread overview: 19+ messages
2024-07-31  9:12 [PATCH 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
2024-07-31  9:13 ` [PATCH 1/6] iomap: correct the range of a partial dirty clear Zhang Yi
2024-07-31  9:13 ` [PATCH 2/6] iomap: support invalidating partial folios Zhang Yi
2024-07-31  9:13 ` [PATCH 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio Zhang Yi
2024-07-31  9:13 ` [PATCH 4/6] iomap: correct the dirty length in page mkwrite Zhang Yi
2024-07-31  9:13 ` [PATCH 5/6] iomap: drop unnecessary state_lock when setting ifs uptodate bits Zhang Yi
2024-07-31 16:52   ` Matthew Wilcox
2024-08-01  1:52     ` Zhang Yi
2024-08-01  4:24       ` Matthew Wilcox
2024-08-01  9:19         ` Zhang Yi
2024-08-02  0:05   ` Dave Chinner
2024-08-02  2:57     ` Zhang Yi [this message]
2024-08-02  6:29       ` Dave Chinner
2024-08-02 11:13         ` Zhang Yi
2024-08-05 12:42           ` Jan Kara
2024-08-05 14:00             ` Jan Kara
2024-08-05 15:48               ` Matthew Wilcox
2024-08-07 11:39                 ` Zhang Yi
2024-07-31  9:13 ` [PATCH 6/6] iomap: drop unnecessary state_lock when changing ifs dirty bits Zhang Yi
