public inbox for linux-xfs@vger.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Zhang Yi <yi.zhang@huaweicloud.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, djwong@kernel.org,
	hch@infradead.org, brauner@kernel.org, jack@suse.cz,
	yi.zhang@huawei.com, chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH 5/6] iomap: drop unnecessary state_lock when setting ifs uptodate bits
Date: Fri, 2 Aug 2024 10:05:55 +1000	[thread overview]
Message-ID: <Zqwi48H74g2EX56c@dread.disaster.area> (raw)
In-Reply-To: <20240731091305.2896873-6-yi.zhang@huaweicloud.com>

On Wed, Jul 31, 2024 at 05:13:04PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> Commit 1cea335d1db1 ("iomap: fix sub-page uptodate handling") fixed a
> race when submitting multiple read bios for a page that spans more
> than one file system block by adding a spinlock (now named
> state_lock) to make the page uptodate update synchronous. However,
> the race condition only happened between the read I/O submitting and
> completing threads,

when we do writeback on a folio that has multiple blocks on it, we
can submit multiple bios for it, too. Hence the write completions
can race with each other and with write submission, too.

Yes, write bio submission and completion only need to update ifs
accounting using an atomic operation, but the same race condition
exists even though the folio is fully locked at the point of bio
submission.


> it's
> sufficient to use page lock to protect other paths, e.g. buffered write
                    ^^^^ folio
> path.
>
> After large folios were supported, the spinlock can affect the
> buffered write performance more, so dropping it could reduce some
> unnecessary locking overhead.

From the point of view of code that is simple to understand and
maintain, I think this is a bad idea. The data structure is
currently protected by the state lock in all situations, but this
change makes it protected by the state lock in one case and by the
folio lock in a different case.

Making this change also misses the elephant in the room: the
buffered write path still needs the ifs->state_lock to update the
dirty bitmap. Hence we're effectively changing the serialisation
mechanism for only one of the two ifs state bitmaps that the
buffered write path has to update.

Indeed, we can't get rid of the ifs->state_lock from the dirty range
updates because iomap_dirty_folio() can be called without the folio
being locked through folio_mark_dirty() calling the ->dirty_folio()
aop.

IOWs, getting rid of the state lock from the uptodate range
changes does not actually get rid of it in the buffered IO path.
We still have to take it to update the dirty range, and so there's
an obvious way to optimise the state lock usage without changing any
of the bitmap access serialisation behaviour: we combine the
uptodate and dirty range updates in __iomap_write_end() into a
single lock context, such as:

static void
iomap_set_range_dirty_uptodate(struct folio *folio, size_t off, size_t len)
{
	struct iomap_folio_state *ifs = folio->private;
	struct inode *inode;
	unsigned int blks_per_folio;
	unsigned int first_blk;
	unsigned int last_blk;
	unsigned int nr_blks;
	unsigned long flags;

	if (!ifs)
		return;

	inode = folio->mapping->host;
	blks_per_folio = i_blocks_per_folio(inode, folio);
	first_blk = off >> inode->i_blkbits;
	last_blk = (off + len - 1) >> inode->i_blkbits;
	nr_blks = last_blk - first_blk + 1;

	spin_lock_irqsave(&ifs->state_lock, flags);
	/* uptodate bits first, then dirty bits at blks_per_folio offset */
	bitmap_set(ifs->state, first_blk, nr_blks);
	bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks);
	spin_unlock_irqrestore(&ifs->state_lock, flags);
}

This means we calculate the bitmap offsets only once, we take the
state lock only once, and we don't do anything if there is no
sub-folio state.

If we then fix the __iomap_write_begin() code as Willy pointed out
to elide the erroneous uptodate range update, then we end up only
taking the state lock once per buffered write instead of 3 times per
write.

This patch only reduces it to twice per buffered write, so doing the
above should provide even better performance without needing to
change the underlying serialisation mechanism at all.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 19+ messages
2024-07-31  9:12 [PATCH 0/6] iomap: some minor non-critical fixes and improvements when block size < folio size Zhang Yi
2024-07-31  9:13 ` [PATCH 1/6] iomap: correct the range of a partial dirty clear Zhang Yi
2024-07-31  9:13 ` [PATCH 2/6] iomap: support invalidating partial folios Zhang Yi
2024-07-31  9:13 ` [PATCH 3/6] iomap: advance the ifs allocation if we have more than one blocks per folio Zhang Yi
2024-07-31  9:13 ` [PATCH 4/6] iomap: correct the dirty length in page mkwrite Zhang Yi
2024-07-31  9:13 ` [PATCH 5/6] iomap: drop unnecessary state_lock when setting ifs uptodate bits Zhang Yi
2024-07-31 16:52   ` Matthew Wilcox
2024-08-01  1:52     ` Zhang Yi
2024-08-01  4:24       ` Matthew Wilcox
2024-08-01  9:19         ` Zhang Yi
2024-08-02  0:05   ` Dave Chinner [this message]
2024-08-02  2:57     ` Zhang Yi
2024-08-02  6:29       ` Dave Chinner
2024-08-02 11:13         ` Zhang Yi
2024-08-05 12:42           ` Jan Kara
2024-08-05 14:00             ` Jan Kara
2024-08-05 15:48               ` Matthew Wilcox
2024-08-07 11:39                 ` Zhang Yi
2024-07-31  9:13 ` [PATCH 6/6] iomap: drop unnecessary state_lock when changing ifs dirty bits Zhang Yi
