public inbox for linux-ext4@vger.kernel.org
 help / color / mirror / Atom feed
* [linus:master] [jbd2]  6a3afb6ac6:  fileio.latency_95th_ms 92.5% regression
@ 2024-01-02  7:31 kernel test robot
  2024-01-03  3:31 ` Zhang Yi
  0 siblings, 1 reply; 5+ messages in thread
From: kernel test robot @ 2024-01-02  7:31 UTC (permalink / raw)
  To: Zhang Yi
  Cc: oe-lkp, lkp, linux-kernel, Theodore Ts'o, Jan Kara,
	linux-ext4, ying.huang, feng.tang, fengwei.yin, oliver.sang



Hello,

kernel test robot noticed a 92.5% regression of fileio.latency_95th_ms on:


commit: 6a3afb6ac6dfab158ebdd4b87941178f58c8939f ("jbd2: increase the journal IO's priority")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: fileio
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

	period: 600s
	nr_threads: 100%
	disk: 1HDD
	fs: ext4
	size: 64G
	filenum: 1024f
	rwmode: seqwr
	iomode: sync
	cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202401021525.a27b9444-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240102/202401021525.a27b9444-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/disk/filenum/fs/iomode/kconfig/nr_threads/period/rootfs/rwmode/size/tbox_group/testcase:
  gcc-12/performance/1HDD/1024f/ext4/sync/x86_64-rhel-8.3/100%/600s/debian-11.1-x86_64-20220510.cgz/seqwr/64G/lkp-icl-2sp5/fileio

commit: 
  8555922721 ("jbd2: correct the printing of write_flags in jbd2_write_superblock()")
  6a3afb6ac6 ("jbd2: increase the journal IO's priority")

85559227211020b2 6a3afb6ac6dfab158ebdd4b8794 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     37.99 ±  9%     +92.5%      73.13        fileio.latency_95th_ms
     19872 ± 16%     -24.1%      15078 ± 21%  numa-meminfo.node1.Active(file)
     24.74 ±  6%     +97.3%      48.81 ±  2%  perf-sched.wait_time.avg.ms.kjournald2.kthread.ret_from_fork.ret_from_fork_asm
   8157448            -5.7%    7691858 ±  2%  proc-vmstat.pgpgout
     13464            -5.7%      12700 ±  2%  vmstat.io.bo
      4968 ± 16%     -24.1%       3768 ± 21%  numa-vmstat.node1.nr_active_file
      4968 ± 16%     -24.1%       3768 ± 21%  numa-vmstat.node1.nr_zone_active_file
      1413            +2.8%       1452 ±  2%  perf-stat.i.cycles-between-cache-misses
      1.59            -2.9%       1.54        perf-stat.overall.MPKI
      1368            +2.5%       1402        perf-stat.overall.cycles-between-cache-misses
    161.13 ±  9%     +17.4%     189.23 ± 10%  sched_debug.cpu.curr->pid.avg
     25531 ±  6%     -12.4%      22376 ±  8%  sched_debug.cpu.nr_switches.stddev
    -93.66           +38.8%    -130.00        sched_debug.cpu.nr_uninterruptible.min




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [linus:master] [jbd2] 6a3afb6ac6: fileio.latency_95th_ms 92.5% regression
  2024-01-02  7:31 [linus:master] [jbd2] 6a3afb6ac6: fileio.latency_95th_ms 92.5% regression kernel test robot
@ 2024-01-03  3:31 ` Zhang Yi
  2024-01-03  5:53   ` Oliver Sang
  2024-01-03  9:49   ` Jan Kara
  0 siblings, 2 replies; 5+ messages in thread
From: Zhang Yi @ 2024-01-03  3:31 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Theodore Ts'o, Jan Kara,
	linux-ext4, ying.huang, feng.tang, fengwei.yin, yukuai3

On 2024/1/2 15:31, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a 92.5% regression of fileio.latency_95th_ms on:

Hello,

This seems a little weird, the tests doesn't use blk-cgroup, and the patch
increase IO priority in WBT, so there shouldn't be any negative influence in
theory. I've tested sysbench on my machine with Intel Xeon Gold 6240 CPU,
400GB memory with HDD disk, and couldn't reproduce this regression.

==
Without 6a3afb6ac6 ("jbd2: increase the journal IO's priority")
==

 $ sysbench fileio --events=0 --threads=128 --time=600 --file-test-mode=seqwr --file-total-size=68719476736 --file-io-mode=sync --file-num=1024 run

  sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)

  Running the test with following options:
  Number of threads: 128
  Initializing random number generator from current time


  Extra file open flags: (none)
  1024 files, 64MiB each
  64GiB total file size
  Block size 16KiB
  Periodic FSYNC enabled, calling fsync() each 100 requests.
  Calling fsync() at the end of test, Enabled.
  Using synchronous I/O mode
  Doing sequential write (creation) test
  Initializing worker threads...

  Threads started!


  Throughput:
           read:  IOPS=0.00 0.00 MiB/s (0.00 MB/s)
           write: IOPS=31961.19 499.39 MiB/s (523.65 MB/s)
           fsync: IOPS=327500.24

  Latency (ms):
           min:                                  0.00
           avg:                                  0.33
           max:                                920.89
           95th percentile:                      0.33
           sum:                            71212319.19

==
With 6a3afb6ac6 ("jbd2: increase the journal IO's priority")
==

 $ sysbench fileio --events=0 --threads=128 --time=600 --file-test-mode=seqwr --file-total-size=68719476736 --file-io-mode=sync --file-num=1024 run

  sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)

  Running the test with following options:
  Number of threads: 128
  Initializing random number generator from current time


  Extra file open flags: (none)
  1024 files, 64MiB each
  64GiB total file size
  Block size 16KiB
  Periodic FSYNC enabled, calling fsync() each 100 requests.
  Calling fsync() at the end of test, Enabled.
  Using synchronous I/O mode
  Doing sequential write (creation) test
  Initializing worker threads...

  Threads started!


  Throughput:
           read:  IOPS=0.00 0.00 MiB/s (0.00 MB/s)
           write: IOPS=31710.38 495.47 MiB/s (519.54 MB/s)
           fsync: IOPS=324931.88

  Latency (ms):
           min:                                  0.00
           avg:                                  0.33
           max:                               1051.69
           95th percentile:                      0.32
           sum:                            71309894.62

Thanks,
Yi.

> 
> 
> commit: 6a3afb6ac6dfab158ebdd4b87941178f58c8939f ("jbd2: increase the journal IO's priority")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> testcase: fileio
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> parameters:
> 
> 	period: 600s
> 	nr_threads: 100%
> 	disk: 1HDD
> 	fs: ext4
> 	size: 64G
> 	filenum: 1024f
> 	rwmode: seqwr
> 	iomode: sync
> 	cpufreq_governor: performance
> 
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202401021525.a27b9444-oliver.sang@intel.com
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240102/202401021525.a27b9444-oliver.sang@intel.com
> 
> =========================================================================================
> compiler/cpufreq_governor/disk/filenum/fs/iomode/kconfig/nr_threads/period/rootfs/rwmode/size/tbox_group/testcase:
>   gcc-12/performance/1HDD/1024f/ext4/sync/x86_64-rhel-8.3/100%/600s/debian-11.1-x86_64-20220510.cgz/seqwr/64G/lkp-icl-2sp5/fileio
> 
> commit: 
>   8555922721 ("jbd2: correct the printing of write_flags in jbd2_write_superblock()")
>   6a3afb6ac6 ("jbd2: increase the journal IO's priority")
> 
> 85559227211020b2 6a3afb6ac6dfab158ebdd4b8794 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>      37.99 ±  9%     +92.5%      73.13        fileio.latency_95th_ms
>      19872 ± 16%     -24.1%      15078 ± 21%  numa-meminfo.node1.Active(file)
>      24.74 ±  6%     +97.3%      48.81 ±  2%  perf-sched.wait_time.avg.ms.kjournald2.kthread.ret_from_fork.ret_from_fork_asm
>    8157448            -5.7%    7691858 ±  2%  proc-vmstat.pgpgout
>      13464            -5.7%      12700 ±  2%  vmstat.io.bo
>       4968 ± 16%     -24.1%       3768 ± 21%  numa-vmstat.node1.nr_active_file
>       4968 ± 16%     -24.1%       3768 ± 21%  numa-vmstat.node1.nr_zone_active_file
>       1413            +2.8%       1452 ±  2%  perf-stat.i.cycles-between-cache-misses
>       1.59            -2.9%       1.54        perf-stat.overall.MPKI
>       1368            +2.5%       1402        perf-stat.overall.cycles-between-cache-misses
>     161.13 ±  9%     +17.4%     189.23 ± 10%  sched_debug.cpu.curr->pid.avg
>      25531 ±  6%     -12.4%      22376 ±  8%  sched_debug.cpu.nr_switches.stddev
>     -93.66           +38.8%    -130.00        sched_debug.cpu.nr_uninterruptible.min
> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [linus:master] [jbd2] 6a3afb6ac6: fileio.latency_95th_ms 92.5% regression
  2024-01-03  3:31 ` Zhang Yi
@ 2024-01-03  5:53   ` Oliver Sang
  2024-01-03  9:49   ` Jan Kara
  1 sibling, 0 replies; 5+ messages in thread
From: Oliver Sang @ 2024-01-03  5:53 UTC (permalink / raw)
  To: Zhang Yi
  Cc: oe-lkp, lkp, linux-kernel, Theodore Ts'o, Jan Kara,
	linux-ext4, ying.huang, feng.tang, fengwei.yin, yukuai3,
	oliver.sang

hi, Yi,

On Wed, Jan 03, 2024 at 11:31:39AM +0800, Zhang Yi wrote:
> On 2024/1/2 15:31, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed a 92.5% regression of fileio.latency_95th_ms on:
> 
> Hello,
> 
> This seems a little weird, the tests doesn't use blk-cgroup, and the patch
> increase IO priority in WBT, so there shouldn't be any negative influence in
> theory. I've tested sysbench on my machine with Intel Xeon Gold 6240 CPU,
> 400GB memory with HDD disk, and couldn't reproduce this regression.

we double confirmed the latency_95th_ms data is stable in multi runs on our
test platform (Ice Lake).
however, we cannot gurantee the regression could be reproduced on another test
machine.

our team's goal is to report issues (both performance and func) we observed in
our tests to community to help author/community to review then improve code
quality if possible. so just FYI. it's up to author/community to judge the next
step.

regarding this report, if you want to dig further, we will be very happen to
do further tests if you give us some debug patches. Thanks a lot!


> 
> ==
> Without 6a3afb6ac6 ("jbd2: increase the journal IO's priority")
> ==
> 
>  $ sysbench fileio --events=0 --threads=128 --time=600 --file-test-mode=seqwr --file-total-size=68719476736 --file-io-mode=sync --file-num=1024 run
> 
>   sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
> 
>   Running the test with following options:
>   Number of threads: 128
>   Initializing random number generator from current time
> 
> 
>   Extra file open flags: (none)
>   1024 files, 64MiB each
>   64GiB total file size
>   Block size 16KiB
>   Periodic FSYNC enabled, calling fsync() each 100 requests.
>   Calling fsync() at the end of test, Enabled.
>   Using synchronous I/O mode
>   Doing sequential write (creation) test
>   Initializing worker threads...
> 
>   Threads started!
> 
> 
>   Throughput:
>            read:  IOPS=0.00 0.00 MiB/s (0.00 MB/s)
>            write: IOPS=31961.19 499.39 MiB/s (523.65 MB/s)
>            fsync: IOPS=327500.24
> 
>   Latency (ms):
>            min:                                  0.00
>            avg:                                  0.33
>            max:                                920.89
>            95th percentile:                      0.33
>            sum:                            71212319.19
> 
> ==
> With 6a3afb6ac6 ("jbd2: increase the journal IO's priority")
> ==
> 
>  $ sysbench fileio --events=0 --threads=128 --time=600 --file-test-mode=seqwr --file-total-size=68719476736 --file-io-mode=sync --file-num=1024 run
> 
>   sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
> 
>   Running the test with following options:
>   Number of threads: 128
>   Initializing random number generator from current time
> 
> 
>   Extra file open flags: (none)
>   1024 files, 64MiB each
>   64GiB total file size
>   Block size 16KiB
>   Periodic FSYNC enabled, calling fsync() each 100 requests.
>   Calling fsync() at the end of test, Enabled.
>   Using synchronous I/O mode
>   Doing sequential write (creation) test
>   Initializing worker threads...
> 
>   Threads started!
> 
> 
>   Throughput:
>            read:  IOPS=0.00 0.00 MiB/s (0.00 MB/s)
>            write: IOPS=31710.38 495.47 MiB/s (519.54 MB/s)
>            fsync: IOPS=324931.88
> 
>   Latency (ms):
>            min:                                  0.00
>            avg:                                  0.33
>            max:                               1051.69
>            95th percentile:                      0.32
>            sum:                            71309894.62
> 
> Thanks,
> Yi.
> 
> > 
> > 
> > commit: 6a3afb6ac6dfab158ebdd4b87941178f58c8939f ("jbd2: increase the journal IO's priority")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > 
> > testcase: fileio
> > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> > parameters:
> > 
> > 	period: 600s
> > 	nr_threads: 100%
> > 	disk: 1HDD
> > 	fs: ext4
> > 	size: 64G
> > 	filenum: 1024f
> > 	rwmode: seqwr
> > 	iomode: sync
> > 	cpufreq_governor: performance
> > 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [linus:master] [jbd2] 6a3afb6ac6: fileio.latency_95th_ms 92.5% regression
  2024-01-03  3:31 ` Zhang Yi
  2024-01-03  5:53   ` Oliver Sang
@ 2024-01-03  9:49   ` Jan Kara
  2024-01-03 13:28     ` Zhang Yi
  1 sibling, 1 reply; 5+ messages in thread
From: Jan Kara @ 2024-01-03  9:49 UTC (permalink / raw)
  To: Zhang Yi
  Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Theodore Ts'o,
	Jan Kara, linux-ext4, ying.huang, feng.tang, fengwei.yin, yukuai3

Hello!

On Wed 03-01-24 11:31:39, Zhang Yi wrote:
> On 2024/1/2 15:31, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed a 92.5% regression of fileio.latency_95th_ms on:
> 
> This seems a little weird, the tests doesn't use blk-cgroup, and the patch
> increase IO priority in WBT, so there shouldn't be any negative influence in
> theory.

I don't have a great explanation either but there could be some impact e.g.
due to a different request merging of IO done by JBD2 vs the flush worker or
something like that. Note that the throughput reduction is only 5.7% so it
is not huge.

> I've tested sysbench on my machine with Intel Xeon Gold 6240 CPU,
> 400GB memory with HDD disk, and couldn't reproduce this regression.
> 
> ==
> Without 6a3afb6ac6 ("jbd2: increase the journal IO's priority")
> ==
> 
>  $ sysbench fileio --events=0 --threads=128 --time=600 --file-test-mode=seqwr --file-total-size=68719476736 --file-io-mode=sync --file-num=1024 run
> 
>   sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
> 
>   Running the test with following options:
>   Number of threads: 128
>   Initializing random number generator from current time
> 
> 
>   Extra file open flags: (none)
>   1024 files, 64MiB each
>   64GiB total file size
>   Block size 16KiB
>   Periodic FSYNC enabled, calling fsync() each 100 requests.
>   Calling fsync() at the end of test, Enabled.
>   Using synchronous I/O mode
>   Doing sequential write (creation) test
>   Initializing worker threads...
> 
>   Threads started!
> 
> 
>   Throughput:
>            read:  IOPS=0.00 0.00 MiB/s (0.00 MB/s)
>            write: IOPS=31961.19 499.39 MiB/s (523.65 MB/s)
>            fsync: IOPS=327500.24

Well, your setup seems to be very different from what LKP was using. You
are achieving ~500 MB/s (likely because all the files fit into the cache
and more or less even within the dirty limit of the page cache) while LKP
run achieves only ~54 MB/s (i.e., we are pretty much bound by the rather
slow disk). I'd try running with something like 32GB of RAM to really see
the disk speed impact...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [linus:master] [jbd2] 6a3afb6ac6: fileio.latency_95th_ms 92.5% regression
  2024-01-03  9:49   ` Jan Kara
@ 2024-01-03 13:28     ` Zhang Yi
  0 siblings, 0 replies; 5+ messages in thread
From: Zhang Yi @ 2024-01-03 13:28 UTC (permalink / raw)
  To: Jan Kara
  Cc: kernel test robot, oe-lkp, lkp, linux-kernel, Theodore Ts'o,
	linux-ext4, ying.huang, feng.tang, fengwei.yin, yukuai3

On 2024/1/3 17:49, Jan Kara wrote:
> Hello!
> 
> On Wed 03-01-24 11:31:39, Zhang Yi wrote:
>> On 2024/1/2 15:31, kernel test robot wrote:
>>>
>>>
>>> Hello,
>>>
>>> kernel test robot noticed a 92.5% regression of fileio.latency_95th_ms on:
>>
>> This seems a little weird, the tests doesn't use blk-cgroup, and the patch
>> increase IO priority in WBT, so there shouldn't be any negative influence in
>> theory.
> 
> I don't have a great explanation either but there could be some impact e.g.
> due to a different request merging of IO done by JBD2 vs the flush worker or
> something like that. Note that the throughput reduction is only 5.7% so it
> is not huge.

Yeah, make sense, this should be one explanation that can be thought of at
the moment.

> 
>> I've tested sysbench on my machine with Intel Xeon Gold 6240 CPU,
>> 400GB memory with HDD disk, and couldn't reproduce this regression.
>>
>> ==
>> Without 6a3afb6ac6 ("jbd2: increase the journal IO's priority")
>> ==
>>
>>  $ sysbench fileio --events=0 --threads=128 --time=600 --file-test-mode=seqwr --file-total-size=68719476736 --file-io-mode=sync --file-num=1024 run
>>
>>   sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
>>
>>   Running the test with following options:
>>   Number of threads: 128
>>   Initializing random number generator from current time
>>
>>
>>   Extra file open flags: (none)
>>   1024 files, 64MiB each
>>   64GiB total file size
>>   Block size 16KiB
>>   Periodic FSYNC enabled, calling fsync() each 100 requests.
>>   Calling fsync() at the end of test, Enabled.
>>   Using synchronous I/O mode
>>   Doing sequential write (creation) test
>>   Initializing worker threads...
>>
>>   Threads started!
>>
>>
>>   Throughput:
>>            read:  IOPS=0.00 0.00 MiB/s (0.00 MB/s)
>>            write: IOPS=31961.19 499.39 MiB/s (523.65 MB/s)
>>            fsync: IOPS=327500.24
> 
> Well, your setup seems to be very different from what LKP was using. You
> are achieving ~500 MB/s (likely because all the files fit into the cache
> and more or less even within the dirty limit of the page cache) while LKP
> run achieves only ~54 MB/s (i.e., we are pretty much bound by the rather
> slow disk). I'd try running with something like 32GB of RAM to really see
> the disk speed impact...
> 

I'm afraid I missed the vmstat.io.bo changes, I will limit the dirty ratio
and test it again tomorrow.

Thanks,
Yi.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-01-03 13:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-02  7:31 [linus:master] [jbd2] 6a3afb6ac6: fileio.latency_95th_ms 92.5% regression kernel test robot
2024-01-03  3:31 ` Zhang Yi
2024-01-03  5:53   ` Oliver Sang
2024-01-03  9:49   ` Jan Kara
2024-01-03 13:28     ` Zhang Yi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox