linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] ext4/jbd2: drop jbd2_transaction_committed()
@ 2024-05-13  7:21 Zhang Yi
  2024-05-15  0:25 ` Jan Kara
  2024-05-27  6:37 ` kernel test robot
  0 siblings, 2 replies; 6+ messages in thread
From: Zhang Yi @ 2024-05-13  7:21 UTC (permalink / raw)
  To: linux-ext4
  Cc: linux-fsdevel, linux-kernel, tytso, adilger.kernel, jack,
	ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

jbd2_transaction_committed() is used to check whether a transaction with
the given tid has already committed. It holds j_state_lock in read mode
and checks the tid against the currently running and committing
transactions, but taking the j_state_lock is expensive.

We already store the sequence number of the most recently committed
transaction in journal->j_commit_sequence, so we can do this check by
comparing it against the given tid instead. If the given tid is not
larger than j_commit_sequence, the given transaction has been committed.
That way we can drop the expensive lock and achieve about 10% ~ 20%
performance gains in concurrent DIOs on my virtual machine with a 100G
ramdisk.

fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
    -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
    -group_reporting

Before:
  overwrite       IOPS=88.2k, BW=344MiB/s
  read            IOPS=95.7k, BW=374MiB/s
  rand overwrite  IOPS=98.7k, BW=386MiB/s
  randread        IOPS=102k, BW=397MiB/s

After:
  overwrite:      IOPS=105k, BW=410MiB/s
  read:           IOPS=112k, BW=436MiB/s
  rand overwrite: IOPS=104k, BW=404MiB/s
  randread:       IOPS=111k, BW=432MiB/s

CC: Dave Chinner <david@fromorbit.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/linux-ext4/493ab4c5-505c-a351-eefa-7d2677cdf800@huaweicloud.com/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c      |  4 ++--
 fs/jbd2/journal.c    | 17 -----------------
 include/linux/jbd2.h |  1 -
 3 files changed, 2 insertions(+), 20 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 537803250ca9..e8e2865bf9ac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3199,8 +3199,8 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
 	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
 
 	if (journal) {
-		if (jbd2_transaction_committed(journal,
-			EXT4_I(inode)->i_datasync_tid))
+		if (tid_geq(journal->j_commit_sequence,
+			    EXT4_I(inode)->i_datasync_tid))
 			return false;
 		if (test_opt2(inode->i_sb, JOURNAL_FAST_COMMIT))
 			return !list_empty(&EXT4_I(inode)->i_fc_list);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index b6c114c11b97..73737cd1106f 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -786,23 +786,6 @@ int jbd2_fc_end_commit_fallback(journal_t *journal)
 }
 EXPORT_SYMBOL(jbd2_fc_end_commit_fallback);
 
-/* Return 1 when transaction with given tid has already committed. */
-int jbd2_transaction_committed(journal_t *journal, tid_t tid)
-{
-	int ret = 1;
-
-	read_lock(&journal->j_state_lock);
-	if (journal->j_running_transaction &&
-	    journal->j_running_transaction->t_tid == tid)
-		ret = 0;
-	if (journal->j_committing_transaction &&
-	    journal->j_committing_transaction->t_tid == tid)
-		ret = 0;
-	read_unlock(&journal->j_state_lock);
-	return ret;
-}
-EXPORT_SYMBOL(jbd2_transaction_committed);
-
 /*
  * When this function returns the transaction corresponding to tid
  * will be completed.  If the transaction has currently running, start
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 971f3e826e15..e15ae324169d 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1643,7 +1643,6 @@ extern void	jbd2_clear_buffer_revoked_flags(journal_t *journal);
 int jbd2_log_start_commit(journal_t *journal, tid_t tid);
 int jbd2_journal_start_commit(journal_t *journal, tid_t *tid);
 int jbd2_log_wait_commit(journal_t *journal, tid_t tid);
-int jbd2_transaction_committed(journal_t *journal, tid_t tid);
 int jbd2_complete_transaction(journal_t *journal, tid_t tid);
 int jbd2_log_do_checkpoint(journal_t *journal);
 int jbd2_trans_will_send_data_barrier(journal_t *journal, tid_t tid);
-- 
2.39.2



* Re: [PATCH] ext4/jbd2: drop jbd2_transaction_committed()
  2024-05-13  7:21 [PATCH] ext4/jbd2: drop jbd2_transaction_committed() Zhang Yi
@ 2024-05-15  0:25 ` Jan Kara
  2024-05-16  8:27   ` Zhang Yi
  2024-05-27  6:37 ` kernel test robot
  1 sibling, 1 reply; 6+ messages in thread
From: Jan Kara @ 2024-05-15  0:25 UTC (permalink / raw)
  To: Zhang Yi
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Mon 13-05-24 15:21:19, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@huawei.com>
> 
> jbd2_transaction_committed() is used to check whether a transaction with
> the given tid has already committed. It holds j_state_lock in read mode
> and checks the tid against the currently running and committing
> transactions, but taking the j_state_lock is expensive.
> 
> We already store the sequence number of the most recently committed
> transaction in journal->j_commit_sequence, so we can do this check by
> comparing it against the given tid instead. If the given tid is not
> larger than j_commit_sequence, the given transaction has been committed.
> That way we can drop the expensive lock and achieve about 10% ~ 20%
> performance gains in concurrent DIOs on my virtual machine with a 100G
> ramdisk.
> 
> fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
>     -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
>     -group_reporting
> 
> Before:
>   overwrite       IOPS=88.2k, BW=344MiB/s
>   read            IOPS=95.7k, BW=374MiB/s
>   rand overwrite  IOPS=98.7k, BW=386MiB/s
>   randread        IOPS=102k, BW=397MiB/s
> 
> After:
>   overwrite:      IOPS=105k, BW=410MiB/s
>   read:           IOPS=112k, BW=436MiB/s
>   rand overwrite: IOPS=104k, BW=404MiB/s
>   randread:       IOPS=111k, BW=432MiB/s
> 
> CC: Dave Chinner <david@fromorbit.com>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Link: https://lore.kernel.org/linux-ext4/493ab4c5-505c-a351-eefa-7d2677cdf800@huaweicloud.com/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

I agree this is a workable solution and the performance benefits are nice.
But I have some comments regarding the implementation:

> @@ -3199,8 +3199,8 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
>  	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
>  
>  	if (journal) {
> -		if (jbd2_transaction_committed(journal,
> -			EXT4_I(inode)->i_datasync_tid))
> +		if (tid_geq(journal->j_commit_sequence,
> +			    EXT4_I(inode)->i_datasync_tid))

Please leave the helper jbd2_transaction_committed(), just make the
implementation more efficient. Also accessing j_commit_sequence without any
lock is theoretically problematic wrt compiler optimization. You should have
READ_ONCE() there and the places modifying j_commit_sequence need to use
WRITE_ONCE().

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] ext4/jbd2: drop jbd2_transaction_committed()
  2024-05-15  0:25 ` Jan Kara
@ 2024-05-16  8:27   ` Zhang Yi
  2024-05-20  8:49     ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Zhang Yi @ 2024-05-16  8:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/5/15 8:25, Jan Kara wrote:
> On Mon 13-05-24 15:21:19, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@huawei.com>
>>
>> jbd2_transaction_committed() is used to check whether a transaction with
>> the given tid has already committed. It holds j_state_lock in read mode
>> and checks the tid against the currently running and committing
>> transactions, but taking the j_state_lock is expensive.
>>
>> We already store the sequence number of the most recently committed
>> transaction in journal->j_commit_sequence, so we can do this check by
>> comparing it against the given tid instead. If the given tid is not
>> larger than j_commit_sequence, the given transaction has been committed.
>> That way we can drop the expensive lock and achieve about 10% ~ 20%
>> performance gains in concurrent DIOs on my virtual machine with a 100G
>> ramdisk.
>>
>> fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
>>     -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
>>     -group_reporting
>>
>> Before:
>>   overwrite       IOPS=88.2k, BW=344MiB/s
>>   read            IOPS=95.7k, BW=374MiB/s
>>   rand overwrite  IOPS=98.7k, BW=386MiB/s
>>   randread        IOPS=102k, BW=397MiB/s
>>
>> After:
>>   overwrite:      IOPS=105k, BW=410MiB/s
>>   read:           IOPS=112k, BW=436MiB/s
>>   rand overwrite: IOPS=104k, BW=404MiB/s
>>   randread:       IOPS=111k, BW=432MiB/s
>>
>> CC: Dave Chinner <david@fromorbit.com>
>> Suggested-by: Dave Chinner <david@fromorbit.com>
>> Link: https://lore.kernel.org/linux-ext4/493ab4c5-505c-a351-eefa-7d2677cdf800@huaweicloud.com/T/#m6a14df5d085527a188c5a151191e87a3252dc4e2
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> I agree this is a workable solution and the performance benefits are nice.
> But I have some comments regarding the implementation:
> 
>> @@ -3199,8 +3199,8 @@ static bool ext4_inode_datasync_dirty(struct inode *inode)
>>  	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
>>  
>>  	if (journal) {
>> -		if (jbd2_transaction_committed(journal,
>> -			EXT4_I(inode)->i_datasync_tid))
>> +		if (tid_geq(journal->j_commit_sequence,
>> +			    EXT4_I(inode)->i_datasync_tid))
> 
> Please leave the helper jbd2_transaction_committed(), just make the
> implementation more efficient. 

Sure.

> Also accessing j_commit_sequence without any
> lock is theoretically problematic wrt compiler optimization. You should have
> READ_ONCE() there and the places modifying j_commit_sequence need to use
> WRITE_ONCE().
> 

Thanks for pointing this out, but I'm not sure we really need READ_ONCE()
here. IIUC, if we add READ_ONCE(), we make sure to get the latest
j_commit_sequence; if not, there is a window (it might become larger) where
we could get an old value and jbd2_transaction_committed() could return
false even if the given transaction was just committed. But that window is
always there anyway, so it doesn't look like a big problem, is that right?

Thanks,
Yi.



* Re: [PATCH] ext4/jbd2: drop jbd2_transaction_committed()
  2024-05-16  8:27   ` Zhang Yi
@ 2024-05-20  8:49     ` Jan Kara
  2024-05-20 12:39       ` Zhang Yi
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kara @ 2024-05-20  8:49 UTC (permalink / raw)
  To: Zhang Yi
  Cc: Jan Kara, linux-ext4, linux-fsdevel, linux-kernel, tytso,
	adilger.kernel, ritesh.list, yi.zhang, chengzhihao1, yukuai3

On Thu 16-05-24 16:27:25, Zhang Yi wrote:
> On 2024/5/15 8:25, Jan Kara wrote:
> > On Mon 13-05-24 15:21:19, Zhang Yi wrote:
> > Also accessing j_commit_sequence without any
> > lock is theoretically problematic wrt compiler optimization. You should have
> > READ_ONCE() there and the places modifying j_commit_sequence need to use
> > WRITE_ONCE().
> > 
> 
> Thanks for pointing this out, but I'm not sure we really need READ_ONCE()
> here. IIUC, if we add READ_ONCE(), we make sure to get the latest
> j_commit_sequence; if not, there is a window (it might become larger) where
> we could get an old value and jbd2_transaction_committed() could return
> false even if the given transaction was just committed. But that window is
> always there anyway, so it doesn't look like a big problem, is that right?

Well, all accesses to any memory should use READ_ONCE(), be protected by a
lock, or use types that handle atomicity on the assembly level (like
atomic_t, or atomic bit operations and similar). Otherwise the compiler is
free to assume the underlying memory cannot change and generate potentially
invalid code. In this case, I don't think any compiler will realistically
do it, but it is still good practice and it also saves us from KCSAN
warnings. If you want to know more details about possible problems, see

  tools/memory-model/Documentation/explanation.txt

chapter "PLAIN ACCESSES AND DATA RACES".

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH] ext4/jbd2: drop jbd2_transaction_committed()
  2024-05-20  8:49     ` Jan Kara
@ 2024-05-20 12:39       ` Zhang Yi
  0 siblings, 0 replies; 6+ messages in thread
From: Zhang Yi @ 2024-05-20 12:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-ext4, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	ritesh.list, yi.zhang, chengzhihao1, yukuai3

On 2024/5/20 16:49, Jan Kara wrote:
> On Thu 16-05-24 16:27:25, Zhang Yi wrote:
>> On 2024/5/15 8:25, Jan Kara wrote:
>>> On Mon 13-05-24 15:21:19, Zhang Yi wrote:
>>> Also accessing j_commit_sequence without any
>>> lock is theoretically problematic wrt compiler optimization. You should have
>>> READ_ONCE() there and the places modifying j_commit_sequence need to use
>>> WRITE_ONCE().
>>>
>>
>> Thanks for pointing this out, but I'm not sure we really need READ_ONCE()
>> here. IIUC, if we add READ_ONCE(), we make sure to get the latest
>> j_commit_sequence; if not, there is a window (it might become larger) where
>> we could get an old value and jbd2_transaction_committed() could return
>> false even if the given transaction was just committed. But that window is
>> always there anyway, so it doesn't look like a big problem, is that right?
> 
> Well, all accesses to any memory should use READ_ONCE(), be protected by a
> lock, or use types that handle atomicity on the assembly level (like
> atomic_t, or atomic bit operations and similar). Otherwise the compiler is
> free to assume the underlying memory cannot change and generate potentially
> invalid code. In this case, I don't think any compiler will realistically
> do it, but it is still good practice and it also saves us from KCSAN
> warnings. If you want to know more details about possible problems, see
> 
>   tools/memory-model/Documentation/explanation.txt
> 
> chapter "PLAIN ACCESSES AND DATA RACES".
> 

Sure, this document is really helpful. I'll add READ_ONCE() and
WRITE_ONCE() here, thanks a lot.

Yi.



* Re: [PATCH] ext4/jbd2: drop jbd2_transaction_committed()
  2024-05-13  7:21 [PATCH] ext4/jbd2: drop jbd2_transaction_committed() Zhang Yi
  2024-05-15  0:25 ` Jan Kara
@ 2024-05-27  6:37 ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-05-27  6:37 UTC (permalink / raw)
  To: Zhang Yi
  Cc: oe-lkp, lkp, Dave Chinner, linux-ext4, ying.huang, feng.tang,
	fengwei.yin, linux-fsdevel, linux-kernel, tytso, adilger.kernel,
	jack, ritesh.list, yi.zhang, yi.zhang, chengzhihao1, yukuai3,
	oliver.sang



Hello,

kernel test robot noticed a 674.0% improvement of stress-ng.fiemap.ops_per_sec on:


commit: 8a7952dd64b38f3ac2754ca550b52ac3ca921c1a ("[PATCH] ext4/jbd2: drop jbd2_transaction_committed()")
url: https://github.com/intel-lab-lkp/linux/commits/Zhang-Yi/ext4-jbd2-drop-jbd2_transaction_committed/20240513-153311
base: https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev
patch link: https://lore.kernel.org/all/20240513072119.2335346-1-yi.zhang@huaweicloud.com/
patch subject: [PATCH] ext4/jbd2: drop jbd2_transaction_committed()

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: ext4
	test: fiemap
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/202405271406.9b9763f6-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/1HDD/ext4/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fiemap/stress-ng/60s

commit: 
  26770a717c ("jbd2: add prefix 'jbd2' for 'shrink_type'")
  8a7952dd64 ("ext4/jbd2: drop jbd2_transaction_committed()")

26770a717cac5704 8a7952dd64b38f3ac2754ca550b 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1.13 ±  5%     +52.7%       1.73 ±  3%  iostat.cpu.user
      0.01 ± 47%      +0.0        0.04 ± 44%  mpstat.cpu.all.iowait%
      0.02 ±  4%      -0.0        0.01 ± 15%  mpstat.cpu.all.soft%
      1.15 ±  5%      +0.6        1.75 ±  3%  mpstat.cpu.all.usr%
    100427 ± 42%     -59.1%      41106 ± 66%  numa-vmstat.node0.nr_anon_pages
    149979 ± 34%    +205.2%     457720 ± 31%  numa-vmstat.node1.nr_inactive_anon
    149980 ± 34%    +205.2%     457720 ± 31%  numa-vmstat.node1.nr_zone_inactive_anon
    402376 ± 42%     -59.1%     164453 ± 66%  numa-meminfo.node0.AnonPages
    424656 ± 39%     -57.6%     180029 ± 58%  numa-meminfo.node0.AnonPages.max
    620453 ± 33%    +198.4%    1851452 ± 30%  numa-meminfo.node1.Inactive
    601997 ± 34%    +204.4%    1832749 ± 31%  numa-meminfo.node1.Inactive(anon)
      2387 ±  2%     +12.8%       2693 ±  4%  vmstat.io.bo
     10.17 ± 13%     +83.8%      18.69 ±  3%  vmstat.procs.b
    201402 ±  9%    +631.9%    1474068 ±  6%  vmstat.system.cs
    167899            +5.5%     177141        vmstat.system.in
    347.50 ± 16%     +53.4%     533.00 ± 10%  perf-c2c.DRAM.local
      4232 ± 48%    +166.1%      11262 ± 24%  perf-c2c.DRAM.remote
     23484 ± 10%    +119.8%      51621 ±  7%  perf-c2c.HITM.local
      3377 ± 56%    +179.0%       9424 ± 27%  perf-c2c.HITM.remote
     26861 ±  7%    +127.3%      61046 ±  5%  perf-c2c.HITM.total
   7612782 ±  7%    +676.3%   59099460 ±  6%  stress-ng.fiemap.ops
    124734 ±  8%    +674.0%     965466 ±  6%  stress-ng.fiemap.ops_per_sec
  12752270 ± 10%    +643.8%   94847653 ±  5%  stress-ng.time.involuntary_context_switches
     19.02 ±  6%    +124.6%      42.70 ±  5%  stress-ng.time.user_time
     62295           +17.8%      73411 ±  4%  stress-ng.time.voluntary_context_switches
    262799 ±  5%     +90.1%     499700 ± 43%  meminfo.Active
   3750501           +38.7%    5203048 ±  2%  meminfo.Cached
   3989040           +36.2%    5432649 ±  2%  meminfo.Committed_AS
   1154859 ±  3%    +105.8%    2376633 ± 12%  meminfo.Inactive
   1118790 ±  3%    +109.2%    2340758 ± 12%  meminfo.Inactive(anon)
    362491 ±  7%    +153.0%     917217 ± 19%  meminfo.Mapped
   5902699           +25.8%    7426870        meminfo.Memused
    546333 ±  3%    +265.7%    1998111 ±  5%  meminfo.Shmem
   5963280           +25.0%    7456799        meminfo.max_used_kB
      0.92 ± 16%     -21.5%       0.72 ±  8%  sched_debug.cfs_rq:/.h_nr_running.stddev
    809.29 ±  9%     -15.5%     683.50 ±  7%  sched_debug.cfs_rq:/.runnable_avg.stddev
    567.16 ± 10%    +123.7%       1268 ±  9%  sched_debug.cfs_rq:/.util_est.avg
      2015 ±  8%     +41.9%       2859 ± 11%  sched_debug.cfs_rq:/.util_est.max
      0.75 ±223%  +23377.8%     176.08 ± 20%  sched_debug.cfs_rq:/.util_est.min
    447.08 ±  8%     +19.1%     532.35 ± 12%  sched_debug.cfs_rq:/.util_est.stddev
      0.92 ± 16%     -22.1%       0.71 ±  8%  sched_debug.cpu.nr_running.stddev
    101043 ±  9%    +621.7%     729210 ±  5%  sched_debug.cpu.nr_switches.avg
    176826 ± 18%    +423.9%     926474 ±  5%  sched_debug.cpu.nr_switches.max
     17586 ± 35%    +762.4%     151656 ± 44%  sched_debug.cpu.nr_switches.min
     54968 ± 44%    +170.9%     148885 ±  5%  sched_debug.cpu.nr_switches.stddev
      5712            +5.1%       6002 ±  2%  proc-vmstat.nr_active_file
    939938           +38.6%    1302470 ±  2%  proc-vmstat.nr_file_pages
    280225 ±  3%    +108.7%     584958 ± 12%  proc-vmstat.nr_inactive_anon
     91383 ±  8%    +149.7%     228197 ± 19%  proc-vmstat.nr_mapped
    136324 ±  3%    +265.7%     498577 ±  5%  proc-vmstat.nr_shmem
     25123            +2.8%      25834        proc-vmstat.nr_slab_reclaimable
      5712            +5.1%       6002 ±  2%  proc-vmstat.nr_zone_active_file
    280225 ±  3%    +108.7%     584958 ± 12%  proc-vmstat.nr_zone_inactive_anon
     15324 ± 30%    +359.4%      70400 ±  9%  proc-vmstat.numa_hint_faults
     10098 ± 24%    +283.5%      38729 ± 19%  proc-vmstat.numa_hint_faults_local
    604637           +86.4%    1127295 ±  4%  proc-vmstat.numa_hit
    538359           +97.1%    1061202 ±  4%  proc-vmstat.numa_local
    532388           +13.4%     603511 ±  3%  proc-vmstat.numa_pte_updates
    723145           +72.1%    1244321 ±  3%  proc-vmstat.pgalloc_normal
    383685 ±  2%     +21.4%     465726 ±  2%  proc-vmstat.pgfault
    152450           +13.4%     172900 ±  4%  proc-vmstat.pgpgout
 2.951e+09 ±  6%    +571.9%  1.983e+10 ±  2%  perf-stat.i.branch-instructions
      0.64 ±  4%      -0.2        0.40 ±  2%  perf-stat.i.branch-miss-rate%
  19046341 ±  4%    +291.2%   74517781        perf-stat.i.branch-misses
   9666270 ± 31%    +318.8%   40484483 ± 19%  perf-stat.i.cache-misses
  70884859 ±  6%    +367.8%  3.316e+08 ±  2%  perf-stat.i.cache-references
    213832 ±  9%    +623.2%    1546476 ±  6%  perf-stat.i.context-switches
     15.20 ±  7%     -84.9%       2.30 ±  2%  perf-stat.i.cpi
    414.63 ±  5%     +10.9%     459.73 ±  8%  perf-stat.i.cpu-migrations
     26710 ± 31%     -77.7%       5960 ± 22%  perf-stat.i.cycles-between-cache-misses
 1.501e+10 ±  6%    +556.2%  9.849e+10 ±  2%  perf-stat.i.instructions
      0.07 ±  6%    +515.9%       0.44 ±  2%  perf-stat.i.ipc
      3.33 ±  9%    +626.6%      24.22 ±  6%  perf-stat.i.metric.K/sec
      4908 ±  3%     +26.6%       6216 ±  3%  perf-stat.i.minor-faults
      4908 ±  3%     +26.6%       6216 ±  3%  perf-stat.i.page-faults
      0.65 ±  5%      -0.3        0.38 ±  2%  perf-stat.overall.branch-miss-rate%
     15.14 ±  6%     -84.8%       2.30 ±  2%  perf-stat.overall.cpi
     25496 ± 29%     -77.2%       5824 ± 21%  perf-stat.overall.cycles-between-cache-misses
      0.07 ±  6%    +556.1%       0.44 ±  2%  perf-stat.overall.ipc
   2.9e+09 ±  6%    +572.3%   1.95e+10 ±  2%  perf-stat.ps.branch-instructions
  18850245 ±  4%    +289.0%   73330498        perf-stat.ps.branch-misses
   9571442 ± 31%    +315.9%   39805993 ± 19%  perf-stat.ps.cache-misses
  70175556 ±  6%    +365.5%  3.267e+08 ±  2%  perf-stat.ps.cache-references
    205492 ±  9%    +636.9%    1514212 ±  6%  perf-stat.ps.context-switches
    405.62 ±  5%     +10.8%     449.30 ±  8%  perf-stat.ps.cpu-migrations
 1.475e+10 ±  6%    +556.5%  9.684e+10 ±  2%  perf-stat.ps.instructions
      4823 ±  3%     +25.8%       6069 ±  4%  perf-stat.ps.minor-faults
      4823 ±  3%     +25.8%       6069 ±  4%  perf-stat.ps.page-faults
 9.213e+11 ±  6%    +555.4%  6.038e+12 ±  2%  perf-stat.total.instructions
     85.73           -85.7        0.00        perf-profile.calltrace.cycles-pp.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
     86.11           -84.1        1.99 ±  2%  perf-profile.calltrace.cycles-pp.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
     50.51 ±  3%     -50.5        0.00        perf-profile.calltrace.cycles-pp._raw_read_lock.jbd2_transaction_committed.ext4_set_iomap.ext4_iomap_begin_report.iomap_iter
     96.50           -12.4       84.06        perf-profile.calltrace.cycles-pp.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
     97.49            -5.6       91.94        perf-profile.calltrace.cycles-pp.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
     98.08            -1.5       96.62        perf-profile.calltrace.cycles-pp.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe
     98.13            -1.1       96.99        perf-profile.calltrace.cycles-pp.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
     98.14            -1.1       97.05        perf-profile.calltrace.cycles-pp.__x64_sys_ioctl.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
     98.18            -1.1       97.12        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ioctl
     98.19            -1.0       97.14        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ioctl
     98.22            -0.9       97.31        perf-profile.calltrace.cycles-pp.ioctl
      0.62 ±  9%      +0.3        0.94        perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.62 ±  9%      +0.3        0.96        perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.70 ±  9%      +0.5        1.18        perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.87 ±  8%      +0.7        1.58        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.87 ±  8%      +0.7        1.59        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
      0.00            +1.0        1.00 ±  4%  perf-profile.calltrace.cycles-pp.ext4_sb_block_valid.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      0.94 ±  8%      +1.0        1.98 ±  2%  perf-profile.calltrace.cycles-pp.__sched_yield
      0.00            +1.1        1.10 ±  2%  perf-profile.calltrace.cycles-pp._copy_to_user.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
      0.00            +1.3        1.35 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks
      0.00            +1.5        1.48 ±  2%  perf-profile.calltrace.cycles-pp.__check_block_validity.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
      0.00            +1.6        1.58 ±  7%  perf-profile.calltrace.cycles-pp._raw_spin_lock.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report
      0.00            +3.2        3.22 ±  2%  perf-profile.calltrace.cycles-pp.fiemap_fill_next_extent.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl.do_syscall_64
      4.96 ±  6%      +4.8        9.72 ±  2%  perf-profile.calltrace.cycles-pp._raw_read_lock.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      0.78 ±  7%      +5.4        6.22 ±  2%  perf-profile.calltrace.cycles-pp.iomap_iter_advance.iomap_iter.iomap_fiemap.do_vfs_ioctl.__x64_sys_ioctl
      0.62 ±  6%     +50.6       51.27        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter
      9.55 ±  7%     +66.3       75.88        perf-profile.calltrace.cycles-pp.ext4_es_lookup_extent.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap
     10.12 ±  7%     +69.9       79.98        perf-profile.calltrace.cycles-pp.ext4_map_blocks.ext4_iomap_begin_report.iomap_iter.iomap_fiemap.do_vfs_ioctl
     85.84           -85.8        0.00        perf-profile.children.cycles-pp.jbd2_transaction_committed
     86.14           -84.0        2.10 ±  2%  perf-profile.children.cycles-pp.ext4_set_iomap
     55.59 ±  3%     -45.8        9.83 ±  2%  perf-profile.children.cycles-pp._raw_read_lock
     96.56           -12.1       84.50        perf-profile.children.cycles-pp.ext4_iomap_begin_report
     97.54            -5.3       92.26        perf-profile.children.cycles-pp.iomap_iter
     98.11            -1.3       96.84        perf-profile.children.cycles-pp.iomap_fiemap
     98.14            -1.1       97.00        perf-profile.children.cycles-pp.do_vfs_ioctl
     98.14            -1.1       97.06        perf-profile.children.cycles-pp.__x64_sys_ioctl
     98.22            -0.9       97.34        perf-profile.children.cycles-pp.ioctl
     99.42            -0.7       98.76        perf-profile.children.cycles-pp.do_syscall_64
     99.43            -0.6       98.81        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.39 ± 21%      -0.1        0.25 ±  6%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.14 ± 24%      -0.1        0.03 ±105%  perf-profile.children.cycles-pp.build_id__mark_dso_hit
      0.19 ±  5%      +0.0        0.21 ±  6%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.08 ± 13%      +0.0        0.11        perf-profile.children.cycles-pp.switch_fpu_return
      0.09 ±  7%      +0.0        0.12 ±  5%  perf-profile.children.cycles-pp.update_process_times
      0.11 ±  5%      +0.0        0.14 ±  6%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.10 ±  5%      +0.0        0.14 ±  5%  perf-profile.children.cycles-pp.tick_nohz_handler
      0.00            +0.1        0.05        perf-profile.children.cycles-pp.__fdget
      0.00            +0.1        0.05 ±  7%  perf-profile.children.cycles-pp.__update_load_avg_se
      0.00            +0.1        0.06 ±  9%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.___perf_sw_event
      0.00            +0.1        0.06        perf-profile.children.cycles-pp.__dequeue_entity
      0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.__switch_to_asm
      0.00            +0.1        0.06 ±  7%  perf-profile.children.cycles-pp.pick_eevdf
      0.00            +0.1        0.07        perf-profile.children.cycles-pp.__switch_to
      0.00            +0.1        0.08 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00            +0.1        0.08 ±  4%  perf-profile.children.cycles-pp.rseq_ip_fixup
      0.00            +0.1        0.09 ±  4%  perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
      0.27 ± 22%      +0.1        0.37 ±  9%  perf-profile.children.cycles-pp.perf_session__process_events
      0.27 ± 22%      +0.1        0.37 ±  9%  perf-profile.children.cycles-pp.record__finish_output
      0.26 ± 25%      +0.1        0.36 ±  9%  perf-profile.children.cycles-pp.reader__read_event
      0.00            +0.1        0.11 ±  3%  perf-profile.children.cycles-pp.set_next_entity
      0.00            +0.1        0.12 ±  4%  perf-profile.children.cycles-pp.update_load_avg
      0.08 ±  6%      +0.1        0.20        perf-profile.children.cycles-pp.do_sched_yield
      0.00            +0.1        0.12 ±  4%  perf-profile.children.cycles-pp.put_prev_entity
      0.02 ±141%      +0.1        0.15 ±  2%  perf-profile.children.cycles-pp.update_curr
      0.09 ± 36%      +0.1        0.23 ± 18%  perf-profile.children.cycles-pp.process_simple
      0.00            +0.1        0.15 ±  3%  perf-profile.children.cycles-pp.__rseq_handle_notify_resume
      0.07 ± 57%      +0.1        0.22 ± 19%  perf-profile.children.cycles-pp.queue_event
      0.00            +0.1        0.15 ±  4%  perf-profile.children.cycles-pp.clear_bhb_loop
      0.00            +0.1        0.15 ±  4%  perf-profile.children.cycles-pp.stress_fiemap
      0.07 ± 57%      +0.2        0.22 ± 19%  perf-profile.children.cycles-pp.ordered_events__queue
      0.00            +0.2        0.16 ±  2%  perf-profile.children.cycles-pp.yield_task_fair
      0.00            +0.2        0.16 ±  3%  perf-profile.children.cycles-pp.ext4_inode_block_valid
      0.20 ±  5%      +0.2        0.41        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.09 ± 10%      +0.3        0.38 ±  2%  perf-profile.children.cycles-pp.pick_next_task_fair
      0.62 ±  9%      +0.3        0.95        perf-profile.children.cycles-pp.__schedule
      0.63 ± 10%      +0.3        0.97        perf-profile.children.cycles-pp.schedule
      0.06 ± 11%      +0.4        0.48 ±  3%  perf-profile.children.cycles-pp.iomap_to_fiemap
      0.70 ±  9%      +0.5        1.18        perf-profile.children.cycles-pp.__x64_sys_sched_yield
      0.13 ±  8%      +0.9        1.05 ±  3%  perf-profile.children.cycles-pp.ext4_sb_block_valid
      0.94 ±  8%      +1.1        2.03 ±  2%  perf-profile.children.cycles-pp.__sched_yield
      0.16 ±  7%      +1.1        1.27 ±  2%  perf-profile.children.cycles-pp._copy_to_user
      0.20 ±  8%      +1.4        1.59 ±  2%  perf-profile.children.cycles-pp.__check_block_validity
      0.20 ±  7%      +1.4        1.64 ±  7%  perf-profile.children.cycles-pp._raw_spin_lock
      0.41 ±  7%      +2.9        3.30 ±  2%  perf-profile.children.cycles-pp.fiemap_fill_next_extent
      0.80 ±  7%      +5.6        6.35 ±  2%  perf-profile.children.cycles-pp.iomap_iter_advance
      0.64 ±  7%     +50.8       51.40        perf-profile.children.cycles-pp.percpu_counter_add_batch
      9.60 ±  7%     +66.7       76.28        perf-profile.children.cycles-pp.ext4_es_lookup_extent
     10.16 ±  7%     +70.2       80.39        perf-profile.children.cycles-pp.ext4_map_blocks
     55.43 ±  3%     -45.7        9.70 ±  2%  perf-profile.self.cycles-pp._raw_read_lock
     34.41 ±  6%     -34.4        0.00        perf-profile.self.cycles-pp.jbd2_transaction_committed
      0.10 ± 10%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.prepare_task_switch
      0.08 ± 11%      +0.0        0.12 ±  4%  perf-profile.self.cycles-pp.__schedule
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.___perf_sw_event
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.__update_load_avg_se
      0.00            +0.1        0.05        perf-profile.self.cycles-pp.update_curr
      0.00            +0.1        0.05 ±  7%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.00            +0.1        0.06 ±  9%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00            +0.1        0.06 ± 13%  perf-profile.self.cycles-pp.__sched_yield
      0.00            +0.1        0.06 ±  6%  perf-profile.self.cycles-pp.__switch_to_asm
      0.00            +0.1        0.07        perf-profile.self.cycles-pp.__switch_to
      0.20 ±  7%      +0.1        0.28 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock
      0.00            +0.1        0.09 ±  4%  perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
      0.00            +0.1        0.11 ±  4%  perf-profile.self.cycles-pp.ext4_inode_block_valid
      0.00            +0.1        0.13 ±  5%  perf-profile.self.cycles-pp.stress_fiemap
      0.07 ± 57%      +0.1        0.22 ± 19%  perf-profile.self.cycles-pp.queue_event
      0.00            +0.1        0.15 ±  4%  perf-profile.self.cycles-pp.clear_bhb_loop
      0.05 ± 45%      +0.3        0.38 ±  2%  perf-profile.self.cycles-pp.__check_block_validity
      0.03 ± 70%      +0.3        0.38 ±  3%  perf-profile.self.cycles-pp.iomap_to_fiemap
      0.13 ±  7%      +0.9        1.00 ±  4%  perf-profile.self.cycles-pp.ext4_sb_block_valid
      0.13 ±  7%      +0.9        1.05 ±  2%  perf-profile.self.cycles-pp.iomap_fiemap
      0.16 ±  7%      +1.1        1.24 ±  2%  perf-profile.self.cycles-pp._copy_to_user
      0.19 ±  7%      +1.3        1.52 ±  2%  perf-profile.self.cycles-pp.iomap_iter
      0.29 ± 11%      +1.6        1.88 ±  2%  perf-profile.self.cycles-pp.ext4_set_iomap
      0.25 ±  8%      +1.8        2.04 ±  2%  perf-profile.self.cycles-pp.fiemap_fill_next_extent
      0.27 ±  8%      +1.9        2.21 ±  2%  perf-profile.self.cycles-pp.ext4_iomap_begin_report
      0.38 ± 13%      +2.3        2.70 ±  4%  perf-profile.self.cycles-pp.ext4_map_blocks
      0.78 ±  7%      +5.4        6.23 ±  2%  perf-profile.self.cycles-pp.iomap_iter_advance
      3.98 ±  8%     +11.1       15.12 ±  2%  perf-profile.self.cycles-pp.ext4_es_lookup_extent
      0.56 ±  7%     +48.9       49.48        perf-profile.self.cycles-pp.percpu_counter_add_batch




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
2024-05-13  7:21 [PATCH] ext4/jbd2: drop jbd2_transaction_committed() Zhang Yi
2024-05-15  0:25 ` Jan Kara
2024-05-16  8:27   ` Zhang Yi
2024-05-20  8:49     ` Jan Kara
2024-05-20 12:39       ` Zhang Yi
2024-05-27  6:37 ` kernel test robot
