Linux-mm Archive on lore.kernel.org
* [linux-next:master] [mm]  7b32f64bc5: pts.svt-av1.Preset13.Bosphorus4K.frames_per_second 45.8% regression
From: kernel test robot @ 2026-06-18  8:00 UTC
  To: Frederick Mayle
  Cc: oe-lkp, lkp, Andrew Morton, Jan Kara, Kalesh Singh,
	David Hildenbrand, Lorenzo Stoakes, Matthew Wilcox,
	Suren Baghdasaryan, linux-fsdevel, linux-mm, oliver.sang



Hello,

kernel test robot noticed a 45.8% regression of pts.svt-av1.Preset13.Bosphorus4K.frames_per_second on:


commit: 7b32f64bc512b40b268776c5ac4d354b325b3197 ("mm: limit filemap_fault readahead to VMA boundaries")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

[still regression on linux-next/master ec039126b7fac4e3af35ebccaa7c6f9b6875ba81]

testcase: pts
config: x86_64-rhel-9.4
compiler: gcc-14
test machine: 256 threads 2 sockets Intel(R) Xeon(R) 6767P  CPU @ 2.4GHz (Granite Rapids) with 256G memory
parameters:

	test: svt-av1-2.11.1
	option_a: 1
	option_b: Bosphorus 4K
	cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202606181547.617a6967-lkp@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260618/202606181547.617a6967-lkp@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/option_a/option_b/rootfs/tbox_group/test/testcase:
  gcc-14/performance/x86_64-rhel-9.4/1/Bosphorus 4K/debian-12-x86_64-phoronix/lkp-gnr-2sp3/svt-av1-2.11.1/pts

commit: 
  0b20c36c11 ("mm/madvise: reject invalid process_madvise() advice for zero-length vectors")
  7b32f64bc5 ("mm: limit filemap_fault readahead to VMA boundaries")

0b20c36c118d2122 7b32f64bc512b40b268776c5ac4 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    169.95 ±  6%     -45.8%      92.15 ± 10%  pts.svt-av1.Preset13.Bosphorus4K.frames_per_second
    220.57 ±  3%    +870.9%       2141        pts.time.major_page_faults
   5898121            -3.7%    5682645        pts.time.maximum_resident_set_size
   1408522            -1.3%    1390537        pts.time.minor_page_faults
    370.43 ±  3%     -12.1%     325.57 ±  6%  pts.time.percent_of_cpu_this_job_got
    645228 ±  2%      +7.6%     694407 ±  3%  pts.time.voluntary_context_switches
    162542 ±  7%     +22.4%     198946 ±  8%  sched_debug.cpu.avg_idle.stddev
    274573 ±  3%      -9.3%     249101 ±  3%  vmstat.io.bi
 6.779e+09 ±  3%     +11.3%  7.545e+09 ±  4%  cpuidle..time
   7639156 ±  3%     +10.5%    8439210 ±  3%  cpuidle..usage
      3.73 ± 39%      -2.2        1.52 ± 52%  perf-profile.calltrace.cycles-pp.pv_native_safe_halt.acpi_safe_halt.acpi_idle_do_entry.acpi_idle_enter.cpuidle_enter_state
      0.65 ± 87%      +1.8        2.47 ± 60%  perf-profile.children.cycles-pp.link_path_walk
      0.24 ± 46%     -80.9%       0.05 ±143%  perf-sched.sch_delay.avg.ms.perf_trace_sched_switch.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.65 ± 18%     -33.7%       0.43 ± 17%  perf-sched.wait_and_delay.avg.ms.perf_trace_sched_switch.do_wait.kernel_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.02 ±  7%      +0.0        0.06 ±  8%  mpstat.cpu.all.iowait%
      1.08 ±  4%      -0.2        0.93 ±  6%  mpstat.cpu.all.usr%
     22.00 ±  2%     +14.3%      25.14 ±  2%  mpstat.max_utilization.seconds
     10.00 ±  3%     -40.5%       5.94 ±  2%  mpstat.max_utilization_pct
   5916248            +6.9%    6324576        meminfo.Active
   3659476 ±  2%     +11.4%    4078190 ±  2%  meminfo.Active(anon)
    760816 ±  3%     +16.9%     889730 ±  3%  meminfo.AnonHugePages
   2263661 ±  4%     +18.5%    2682180 ±  3%  meminfo.AnonPages
   6611024           +11.9%    7400589 ±  2%  meminfo.Committed_AS
  11386404            +5.1%   11964152        meminfo.Memused
     21554            +5.6%      22770 ±  2%  meminfo.PageTables
    916081 ±  2%     +11.4%    1020691 ±  2%  proc-vmstat.nr_active_anon
    567129 ±  3%     +18.4%     671694 ±  3%  proc-vmstat.nr_anon_pages
    370.88 ±  3%     +17.2%     434.57 ±  3%  proc-vmstat.nr_anon_transparent_hugepages
     48477            +1.6%      49272        proc-vmstat.nr_kernel_stack
      5399            +5.4%       5691 ±  2%  proc-vmstat.nr_page_table_pages
    916081 ±  2%     +11.4%    1020691 ±  2%  proc-vmstat.nr_zone_active_anon
   4614742            -1.5%    4547741        proc-vmstat.pgalloc_normal
   1667517            -1.0%    1650517        proc-vmstat.pgfault
   2150014            -2.9%    2087812        proc-vmstat.pgfree
    224.57 ±  3%    +862.1%       2160        proc-vmstat.pgmajfault
      1021            -7.2%     948.14        proc-vmstat.thp_fault_alloc
 2.679e+09 ±  3%      -7.5%  2.477e+09 ±  2%  perf-stat.i.branch-instructions
      0.80            +0.1        0.86        perf-stat.i.branch-miss-rate%
  27722463 ±  3%      -8.0%   25498433 ±  2%  perf-stat.i.branch-misses
  47056892 ±  6%     -11.2%   41804370 ±  6%  perf-stat.i.cache-misses
    397.43 ±  3%      -7.4%     368.14 ±  2%  perf-stat.i.cpu-migrations
      0.03 ±  4%      +0.0        0.03 ±  4%  perf-stat.i.dTLB-load-miss-rate%
   3146082 ±  5%     -13.9%    2707716 ±  5%  perf-stat.i.dTLB-load-misses
   1739613 ±  5%     -12.5%    1521663 ±  4%  perf-stat.i.dTLB-store-misses
      9.97 ±  5%    +712.5%      81.01 ±  3%  perf-stat.i.major-faults
     40.27 ±  4%      -8.2%      36.99 ±  3%  perf-stat.i.metric.M/sec
     58347 ±  4%     -11.5%      51627 ±  3%  perf-stat.i.minor-faults
     63.09 ±  6%      -5.8       57.29 ±  8%  perf-stat.i.node-load-miss-rate%
   6474191 ±  5%     -14.1%    5560267 ±  5%  perf-stat.i.node-loads
     58356 ±  4%     -11.4%      51708 ±  3%  perf-stat.i.page-faults
      0.98            -0.0        0.96        perf-stat.overall.branch-miss-rate%
    435.13 ±  3%     +10.0%     478.43 ±  4%  perf-stat.overall.cycles-between-cache-misses
      0.06 ±  2%      -0.0        0.06 ±  4%  perf-stat.overall.dTLB-load-miss-rate%
      0.08 ±  2%      -0.0        0.07 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
 2.646e+09 ±  3%      -8.7%  2.415e+09 ±  2%  perf-stat.ps.branch-instructions
  25808992 ±  3%     -10.2%   23188954 ±  2%  perf-stat.ps.branch-misses
  45057464 ±  5%     -14.4%   38574053 ±  6%  perf-stat.ps.cache-misses
 1.428e+08 ±  3%      -9.7%  1.289e+08 ±  2%  perf-stat.ps.cache-references
    367.11 ±  2%      -7.4%     339.78 ±  2%  perf-stat.ps.cpu-migrations
   2972584 ±  5%     -17.3%    2458144 ±  5%  perf-stat.ps.dTLB-load-misses
 4.801e+09 ±  3%      -9.2%  4.359e+09 ±  2%  perf-stat.ps.dTLB-loads
   1715051 ±  5%     -14.7%    1462728 ±  4%  perf-stat.ps.dTLB-store-misses
 2.193e+09 ±  3%      -9.2%  1.992e+09 ±  2%  perf-stat.ps.dTLB-stores
  2.06e+10 ±  3%      -9.2%  1.871e+10 ±  2%  perf-stat.ps.instructions
      8.54 ±  4%    +768.3%      74.19 ±  2%  perf-stat.ps.major-faults
     59802 ±  3%     -11.0%      53218 ±  3%  perf-stat.ps.minor-faults
   6145046 ±  4%     -17.9%    5046537 ±  4%  perf-stat.ps.node-loads
     59810 ±  3%     -10.9%      53292 ±  3%  perf-stat.ps.page-faults




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [linux-next:master] [mm]  7b32f64bc5: pts.svt-av1.Preset13.Bosphorus4K.frames_per_second 45.8% regression
From: Jan Kara @ 2026-06-18  9:30 UTC
  To: kernel test robot
  Cc: Frederick Mayle, oe-lkp, lkp, Andrew Morton, Jan Kara,
	Kalesh Singh, David Hildenbrand, Lorenzo Stoakes, Matthew Wilcox,
	Suren Baghdasaryan, linux-fsdevel, linux-mm

On Thu 18-06-26 16:00:42, kernel test robot wrote:
> Hello,
> 
> kernel test robot noticed a 45.8% regression of pts.svt-av1.Preset13.Bosphorus4K.frames_per_second on:

This one looks serious enough and real. It would be good to figure out what
happens in this benchmark that it benefits from the readahead across VMA
boundaries so much...

								Honza


-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [linux-next:master] [mm]  7b32f64bc5: pts.svt-av1.Preset13.Bosphorus4K.frames_per_second 45.8% regression
From: Lorenzo Stoakes @ 2026-06-18 14:30 UTC
  To: Jan Kara
  Cc: kernel test robot, Frederick Mayle, oe-lkp, lkp, Andrew Morton,
	Kalesh Singh, David Hildenbrand, Matthew Wilcox,
	Suren Baghdasaryan, linux-fsdevel, linux-mm

On Thu, Jun 18, 2026 at 11:30:47AM +0200, Jan Kara wrote:
> On Thu 18-06-26 16:00:42, kernel test robot wrote:
> > Hello,
> >
> > kernel test robot noticed a 45.8% regression of pts.svt-av1.Preset13.Bosphorus4K.frames_per_second on:
>
> This one looks serious enough and real. It would be good to figure out what
> happens in this benchmark that it benefits from the readahead across VMA
> boundaries so much...

I think a revert first no? This seems pretty huge for something that isn't key
to the kernel, then a new attempt can be tried with this issue addressed
perhaps?

I seem to recall objecting to this in the past, had a look around and saw [0]
which has some discussion I think specifically on the VMA boundaries thing.

[0]:https://lore.kernel.org/linux-mm/CAC_TJvfG8GcwG_2w1o6GOTZS8tfEx2h9A91qsenYfYsX8Te=Bg@mail.gmail.com/

(Sorry to be naggy buuut :) - I see there wasn't maintainer [Matthew] signoff on
this, we should really be in the habit of _requiring_ that for merge).

Thanks, Lorenzo


* Re: [linux-next:master] [mm] 7b32f64bc5: pts.svt-av1.Preset13.Bosphorus4K.frames_per_second 45.8% regression
From: Suren Baghdasaryan @ 2026-06-18 16:03 UTC
  To: Lorenzo Stoakes
  Cc: Jan Kara, kernel test robot, Frederick Mayle, oe-lkp, lkp,
	Andrew Morton, Kalesh Singh, David Hildenbrand, Matthew Wilcox,
	linux-fsdevel, linux-mm

On Thu, Jun 18, 2026 at 2:30 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
>
> On Thu, Jun 18, 2026 at 11:30:47AM +0200, Jan Kara wrote:
> > On Thu 18-06-26 16:00:42, kernel test robot wrote:
> > > Hello,
> > >
> > > kernel test robot noticed a 45.8% regression of pts.svt-av1.Preset13.Bosphorus4K.frames_per_second on:
> >
> > This one looks serious enough and real. It would be good to figure out what
> > happens in this benchmark that it benefits from the readahead across VMA
> > boundaries so much...
>
> I think a revert first no? This seems pretty huge for something that isn't key
> to the kernel, then a new attempt can be tried with this issue addressed
> perhaps?

A quick search yields: "The
pts.svt-av1.Preset13.Bosphorus4K.frames_per_second is a benchmarking
metric from the Phoronix Test Suite that measures how many frames per
second a CPU can encode using the open-source SVT-AV1 video encoder."

If this is a video encoding benchmark I would expect it to explicitly
prefetch the data from the disk before measuring the encoding speed.
If limiting readahead caused this regression, I suspect the benchmark
doesn't explicitly prefetch the data...
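
For reference, an explicit prefetch of the kind described above could look
roughly like the sketch below. It uses posix_fadvise() with
POSIX_FADV_WILLNEED, which asks the kernel to start reading a file range into
the page cache. The helper name prefetch_range() is hypothetical, and nothing
here is taken from SVT-AV1 or the Phoronix Test Suite; it only illustrates the
mechanism.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_fadvise() under strict -std= modes */
#endif

#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from SVT-AV1): hint that [offset, offset + len)
 * of an open file will be needed soon, so the kernel can start readahead.
 * Returns 0 on success or a positive error number, which is how
 * posix_fadvise() itself reports failure (it does not set errno).
 */
int prefetch_range(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len >= 0);
    return posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}
```

A caller would typically issue this once after open(), for example
prefetch_range(fd, 0, 0), where a length of 0 means "up to the end of the
file". It is only a hint; the kernel may read less than requested.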

>
> I seem to recall objecting to this in the past, had a look around and saw [0]
> which has some discussion I think specifically on the VMA boundaries thing.
>
> [0]:https://lore.kernel.org/linux-mm/CAC_TJvfG8GcwG_2w1o6GOTZS8tfEx2h9A91qsenYfYsX8Te=Bg@mail.gmail.com/
>
> (Sorry to be naggy buuut :) - I see there wasn't maintainer [Matthew] signoff on
> this, we should really be in the habit of _requiring_ that for merge).
>
> Thanks, Lorenzo


* Re: [linux-next:master] [mm] 7b32f64bc5: pts.svt-av1.Preset13.Bosphorus4K.frames_per_second 45.8% regression
From: Pedro Falcato @ 2026-06-18 16:32 UTC
  To: Suren Baghdasaryan
  Cc: Lorenzo Stoakes, Jan Kara, kernel test robot, Frederick Mayle,
	oe-lkp, lkp, Andrew Morton, Kalesh Singh, David Hildenbrand,
	Matthew Wilcox, linux-fsdevel, linux-mm

On Thu, Jun 18, 2026 at 04:03:43PM +0000, Suren Baghdasaryan wrote:
> On Thu, Jun 18, 2026 at 2:30 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
> >
> > On Thu, Jun 18, 2026 at 11:30:47AM +0200, Jan Kara wrote:
> > > On Thu 18-06-26 16:00:42, kernel test robot wrote:
> > > > Hello,
> > > >
> > > > kernel test robot noticed a 45.8% regression of pts.svt-av1.Preset13.Bosphorus4K.frames_per_second on:
> > >
> > > This one looks serious enough and real. It would be good to figure out what
> > > happens in this benchmark that it benefits from the readahead across VMA
> > > boundaries so much...
> >
> > I think a revert first no? This seems pretty huge for something that isn't key
> > to the kernel, then a new attempt can be tried with this issue addressed
> > perhaps?
> 
> A quick search yields: "The
> pts.svt-av1.Preset13.Bosphorus4K.frames_per_second is a benchmarking
> metric from the Phoronix Test Suite that measures how many frames per
> second a CPU can encode using the open-source SVT-AV1 video encoder."
> 
> If this is a video encoding benchmark I would expect it to explicitly
> prefetch the data from the disk before measuring the encoding speed.
> If limiting readahead caused this regression, I suspect the benchmark
> doesn't explicitly prefetch the data...

Well, commonly video data doesn't actually fit in memory :)

A quick look at the code (I think it's https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/Source/App/app_process_cmd.c#L821)
suggests it is progressively mapping the file data for a given frame
(or frames?). So the old behavior would result in page faults for a given
frame starting readahead for the next few frames. This looks reasonable.
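
To make the access pattern concrete, here is a minimal sketch of "map one
frame, consume it, unmap it". It is not the SVT-AV1 code; struct frame_map,
page_floor(), map_frame() and unmap_frame() are hypothetical names, and the
file is assumed to hold fixed-size raw frames back to back. Each map_frame()
call creates its own VMA, so if fault-time readahead stops at the VMA end, a
fault in frame N cannot pull in any of frame N+1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* One mapping per frame; every map_frame() call creates a separate VMA. */
struct frame_map {
    void *base;          /* address returned by mmap() */
    size_t map_len;      /* length passed to mmap() */
    const uint8_t *data; /* first byte of the frame inside the mapping */
};

/* mmap() needs a page-aligned file offset, so round down to a page boundary. */
uint64_t page_floor(uint64_t off, uint64_t page_size)
{
    return off - (off % page_size);
}

/* Map frame number frame_index of a file holding fixed-size raw frames. */
int map_frame(int fd, uint64_t frame_index, size_t frame_bytes,
              struct frame_map *out)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(out != NULL && page_size > 0);

    uint64_t start = frame_index * (uint64_t)frame_bytes;
    uint64_t map_off = page_floor(start, (uint64_t)page_size);
    size_t slack = (size_t)(start - map_off); /* bytes before the frame */

    void *p = mmap(NULL, frame_bytes + slack, PROT_READ, MAP_PRIVATE,
                   fd, (off_t)map_off);
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    out->base = p;
    out->map_len = frame_bytes + slack;
    out->data = (const uint8_t *)p + slack;
    return 0;
}

/* Drop the mapping once the frame has been consumed. */
void unmap_frame(struct frame_map *m)
{
    munmap(m->base, m->map_len);
}
```

A reader loop would call map_frame() for frame 0, 1, 2, ... and touch
out->data; the first touch of each page not yet in the page cache is a major
fault.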

FWIW I suspected this was a really weird case regarding mprotect or
something, and I'm happy it isn't; but at least I had a suggestion for that -
for this, maybe dropping the change (for now?) is the best course of action.

-- 
Pedro


* Re: [linux-next:master] [mm] 7b32f64bc5: pts.svt-av1.Preset13.Bosphorus4K.frames_per_second 45.8% regression
From: Suren Baghdasaryan @ 2026-06-18 16:51 UTC
  To: Pedro Falcato
  Cc: Lorenzo Stoakes, Jan Kara, kernel test robot, Frederick Mayle,
	oe-lkp, lkp, Andrew Morton, Kalesh Singh, David Hildenbrand,
	Matthew Wilcox, linux-fsdevel, linux-mm

On Thu, Jun 18, 2026 at 9:32 AM Pedro Falcato <pfalcato@suse.de> wrote:
>
> On Thu, Jun 18, 2026 at 04:03:43PM +0000, Suren Baghdasaryan wrote:
> > On Thu, Jun 18, 2026 at 2:30 PM Lorenzo Stoakes <ljs@kernel.org> wrote:
> > >
> > > On Thu, Jun 18, 2026 at 11:30:47AM +0200, Jan Kara wrote:
> > > > On Thu 18-06-26 16:00:42, kernel test robot wrote:
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed a 45.8% regression of pts.svt-av1.Preset13.Bosphorus4K.frames_per_second on:
> > > >
> > > > This one looks serious enough and real. It would be good to figure out what
> > > > happens in this benchmark that it benefits from the readahead across VMA
> > > > boundaries so much...
> > >
> > > I think a revert first no? This seems pretty huge for something that isn't key
> > > to the kernel, then a new attempt can be tried with this issue addressed
> > > perhaps?
> >
> > A quick search yields: "The
> > pts.svt-av1.Preset13.Bosphorus4K.frames_per_second is a benchmarking
> > metric from the Phoronix Test Suite that measures how many frames per
> > second a CPU can encode using the open-source SVT-AV1 video encoder."
> >
> > If this is a video encoding benchmark I would expect it to explicitly
> > prefetch the data from the disk before measuring the encoding speed.
> > If limiting readahead caused this regression, I suspect the benchmark
> > doesn't explicitly prefetch the data...
>
> Well, commonly video data doesn't actually fit in memory :)
>
> A quick look at the code (I think it's https://gitlab.com/AOMediaCodec/SVT-AV1/-/blob/master/Source/App/app_process_cmd.c#L821)
> suggests it is progressively mapping the file data for a given frame
> (or frames?). So the old behavior would result in page faults for a given
> frame starting readahead for the next few frames. This looks reasonable.

Yeah, looks like it's mapping one frame at a time and even that is
done with 3 mmap calls - it maps luma_read_size and then 2
chroma_read_size chunks which are consecutive and could have been
mapped with one syscall. Seems very inefficient but I guess it's
legit.
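
As a rough illustration of why the three chunks form one contiguous range, the
sketch below lays out a single planar frame. It assumes 4:2:0 chroma
subsampling and a Y, Cb, Cr plane order; struct plane_layout and layout_420()
are hypothetical names, and only luma_read_size and chroma_read_size come from
the code discussed above.

```c
#include <assert.h>
#include <stdint.h>

struct plane_layout {
    uint64_t luma_off, cb_off, cr_off; /* byte offsets of the three planes */
    uint64_t luma_read_size;           /* bytes in the luma (Y) plane */
    uint64_t chroma_read_size;         /* bytes in each chroma (Cb, Cr) plane */
    uint64_t total;                    /* bytes covered by the whole frame */
};

/* Lay out one 4:2:0 frame that starts at byte frame_start in the file. */
struct plane_layout layout_420(uint64_t frame_start, uint32_t width,
                               uint32_t height, uint32_t bytes_per_sample)
{
    struct plane_layout l;
    assert(bytes_per_sample > 0);

    l.luma_read_size = (uint64_t)width * height * bytes_per_sample;
    /* 4:2:0 halves both chroma dimensions (rounded up for odd sizes). */
    l.chroma_read_size = (uint64_t)((width + 1) / 2) * ((height + 1) / 2)
                         * bytes_per_sample;

    /* Each plane begins exactly where the previous one ends. */
    l.luma_off = frame_start;
    l.cb_off = l.luma_off + l.luma_read_size;
    l.cr_off = l.cb_off + l.chroma_read_size;
    l.total = l.luma_read_size + 2 * l.chroma_read_size;
    return l;
}
```

For a 3840x2160 frame with one byte per sample (an assumption about the input
format) this gives 8,294,400 bytes of luma and 2,073,600 bytes per chroma
plane, 12,441,600 bytes in total, so [luma_off, luma_off + total) is a single
unbroken range that one mmap call could cover.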

>
> FWIW I suspected this was a really weird case regarding mprotect or
> something, and I'm happy it isn't; but at least I had a suggestion for that -
> for this, maybe dropping the change (for now?) is the best course of action.
>
> --
> Pedro

