public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* code questions about ext4_inode_datasync_dirty()
@ 2021-01-12 11:45 Xiaoguang Wang
  2021-01-13 17:19 ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Xiaoguang Wang @ 2021-01-12 11:45 UTC (permalink / raw)
  To: Ext4 Developers List; +Cc: joseph qi

hi,

I use io_uring to evaluate ext4 randread performance(direct io), observed
obvious overhead in jbd2_transaction_committed():
Samples: 124K of event 'cycles:ppp', Event count (approx.): 80630088951
Overhead  Command          Shared Object      Symbol
    7.02%  io_uring-sq-per  [kernel.kallsyms]  [k] jbd2_transaction_committed

The codes:
	/*
	 * Writes that span EOF might trigger an I/O size update on completion,
	 * so consider them to be dirty for the purpose of O_DSYNC, even if
	 * there is no other metadata changes being made or are pending.
	 */
	iomap->flags = 0;
	if (ext4_inode_datasync_dirty(inode) ||
	    offset + length > i_size_read(inode))
		iomap->flags |= IOMAP_F_DIRTY;

ext4_inode_datasync_dirty() calls jbd2_transaction_committed(). Sorry, I don't spend
much time to learn iomap codes yet, just ask a quick question here. Do we need to call
ext4_inode_datasync_dirty() for a read operation?

If we must call ext4_inode_datasync_dirty() for a read operation, can we improve
jbd2_transaction_committed() a bit, for example, have a quick check between
inode->i_datasync_tid and j_commit_sequence, if inode->i_datasync_tid is less than
or equal to j_commit_sequence, we also don't call jbd2_transaction_committed()?

Regards,
Xiaoguang Wang

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: code questions about ext4_inode_datasync_dirty()
  2021-01-12 11:45 code questions about ext4_inode_datasync_dirty() Xiaoguang Wang
@ 2021-01-13 17:19 ` Jan Kara
  2021-01-19  3:40   ` Andreas Dilger
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Kara @ 2021-01-13 17:19 UTC (permalink / raw)
  To: Xiaoguang Wang; +Cc: Ext4 Developers List, joseph qi

Hi!

On Tue 12-01-21 19:45:06, Xiaoguang Wang wrote:
> I use io_uring to evaluate ext4 randread performance(direct io), observed
> obvious overhead in jbd2_transaction_committed():
> Samples: 124K of event 'cycles:ppp', Event count (approx.): 80630088951
> Overhead  Command          Shared Object      Symbol
>    7.02%  io_uring-sq-per  [kernel.kallsyms]  [k] jbd2_transaction_committed

Hum, that's quite a bit. Likely due to cacheline contention on
j_state_lock. Thanks for reporting this!

> The codes:
> 	/*
> 	 * Writes that span EOF might trigger an I/O size update on completion,
> 	 * so consider them to be dirty for the purpose of O_DSYNC, even if
> 	 * there is no other metadata changes being made or are pending.
> 	 */
> 	iomap->flags = 0;
> 	if (ext4_inode_datasync_dirty(inode) ||
> 	    offset + length > i_size_read(inode))
> 		iomap->flags |= IOMAP_F_DIRTY;
> 
> ext4_inode_datasync_dirty() calls jbd2_transaction_committed(). Sorry, I
> don't spend much time to learn iomap codes yet, just ask a quick question
> here. Do we need to call ext4_inode_datasync_dirty() for a read
> operation?

So strictly speaking we don't need to know the value of IOMAP_F_DIRTY in
that path (or any read path for that matter) but I'm somewhat reluctant to
optimize out setting of this flag because later some user might start to
depend on it being set correctly.

> If we must call ext4_inode_datasync_dirty() for a read operation, can we improve
> jbd2_transaction_committed() a bit, for example, have a quick check between
> inode->i_datasync_tid and j_commit_sequence, if inode->i_datasync_tid is less than
> or equal to j_commit_sequence, we also don't call jbd2_transaction_committed()?

Well, the modification would belong directly to
jbd2_transaction_committed(). Using j_commit_sequence is somewhat awkward
since TIDs can wrap around and so very old TIDs can suddently start to give
false positives leading to a strange results (this was one of motivations
to switch jbd2_transaction_committed() to a comparison for equality). But
it could certainly be done.

But j_state_lock is a scalability bottleneck in other cases as well. So
what I rather have in mind is that we could change transactions to be RCU
freed and then a majority of places where we use

  read_lock(&journal->j_state_lock);

can be switched to using plain RCU which should significantly reduce the
contention on the lock.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: code questions about ext4_inode_datasync_dirty()
  2021-01-13 17:19 ` Jan Kara
@ 2021-01-19  3:40   ` Andreas Dilger
  0 siblings, 0 replies; 3+ messages in thread
From: Andreas Dilger @ 2021-01-19  3:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: Xiaoguang Wang, Ext4 Developers List, joseph qi

[-- Attachment #1: Type: text/plain, Size: 504 bytes --]

On Jan 13, 2021, at 10:19 AM, Jan Kara <jack@suse.cz> wrote:
> 
> Hi!
> 
> On Tue 12-01-21 19:45:06, Xiaoguang Wang wrote:
>> I use io_uring to evaluate ext4 randread performance(direct io), observed
>> obvious overhead in jbd2_transaction_committed():

I was going to ask about this - is the filesystem mounted with noatime or
relatime or lazytime?  Otherwise, it may be a lot of atime updates that are
causing all of this journal traffic in what _should_ be a read-only workload?

Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-01-19  3:41 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-01-12 11:45 code questions about ext4_inode_datasync_dirty() Xiaoguang Wang
2021-01-13 17:19 ` Jan Kara
2021-01-19  3:40   ` Andreas Dilger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox