From: Matthew Brost <matthew.brost@intel.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
"Christian König" <christian.koenig@amd.com>,
"Danilo Krummrich" <dakr@kernel.org>,
"David Airlie" <airlied@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Philipp Stanner" <phasta@kernel.org>,
"Simona Vetter" <simona@ffwll.ch>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
Date: Tue, 24 Mar 2026 19:33:15 -0700
Message-ID: <acNJa4W2qndtbxJg@lstrano-desk.jf.intel.com>
In-Reply-To: <acK2apOn5DMJFb1+@lstrano-desk.jf.intel.com>
On Tue, Mar 24, 2026 at 09:06:02AM -0700, Matthew Brost wrote:
> On Tue, Mar 24, 2026 at 10:23:45AM +0100, Boris Brezillon wrote:
> > On Mon, 23 Mar 2026 11:38:06 -0700
> > Matthew Brost <matthew.brost@intel.com> wrote:
> >
> > >
> > > Ok, getting stats is easier than I thought...
> > >
> > > ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions /home/mbrost/xe/source/drivers.gpu.i915.igt-gpu-tools/build/tests/xe_exec_threads --r threads-basic
> > >
> > > This test creates one thread per engine instance (7 instances on this BMG
> > > device) and submits 1k exec IOCTLs per thread, each performing a DW
> > > write. Each exec IOCTL typically does not have unsignaled input dependencies.
> > >
> > > With IRQ putting of jobs off + no bypass (drm_dep_queue_flags = 0):
> > >
> > > 8,449 context-switches
> > > 412 cpu-migrations
> > > 2,531.43 msec task-clock
> > > 1,847,846,588 cpu_atom/cycles/
> > > 1,847,856,947 cpu_core/cycles/
> > > <not supported> cpu_atom/instructions/
> > > 460,744,020 cpu_core/instructions/
> > >
> > > With IRQ putting of jobs off + bypass (drm_dep_queue_flags =
> > > DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
> > >
> > > 8,655 context-switches
> > > 229 cpu-migrations
> > > 2,571.33 msec task-clock
> > > 855,900,607 cpu_atom/cycles/
> > > 855,900,272 cpu_core/cycles/
> > > <not supported> cpu_atom/instructions/
> > > 403,651,469 cpu_core/instructions/
> > >
> > > With IRQ putting of jobs on + bypass (drm_dep_queue_flags =
> > > DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED |
> > > DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
> > >
> > > 5,361 context-switches
> > > 169 cpu-migrations
> > > 2,577.44 msec task-clock
> > > 685,769,153 cpu_atom/cycles/
> > > 685,768,407 cpu_core/cycles/
> > > <not supported> cpu_atom/instructions/
> > > 321,336,297 cpu_core/instructions/
> >
> > Thanks for sharing those numbers. For completeness, can you also add the
> > "With IRQ putting of jobs on + no bypass" case?
> >
>
> Yes, I also will share a DRM sched baseline too + I figured out power
> can be measured too - initial results confirm what I expected too - less
> power.
>
> I'm putting together a doc based on running glxgears and another
> benchmark on top of Ubuntu 24.10 + Wayland which has explicit sync
> (linux-drm-syncobj, behaves like SurfaceFlinger when rendering flag to
> not pass in fences to draw jobs).
>
> Almost have all the data. Will share here once I have it.
>
Here are some numbers based on glxgears and weston-simple-egl.
5 configurations tested:
DRM sched
DRM dep (no opt flags)
DRM dep + bypass flag
DRM dep + IRQ-safe flag
DRM dep + bypass + IRQ-safe flags
Each configuration was run 3× on both glxgears and weston-simple-egl.
Raptor Lake CPU, BMG G21.
Summary:
DRM dep reduces power usage, CPU cycles, and context switches. Enabling
both the bypass and IRQ-safe flags further reduces all of these metrics.
I’d say this test case best models something like scrolling on a phone
or using a laptop for non-GPU-intensive workloads where the screen still
needs to refresh.
I’ve also run more intensive benchmarks (glmark2 and Unigine Heaven).
The results are somewhat noisy between boots, but I think the same
conclusion holds.
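To make the summary easier to check against the raw output, here is a minimal Python sketch that averages the three glxgears runs per configuration for two of the counters (context-switches and power/energy-cores/). The values are copied from the perf output in this message; the helper names `summarize` and `percent_change` are illustrative only and not part of any tool used here.

```python
from statistics import mean

# glxgears runs (3 per configuration), copied from the raw perf output
# in this message. Each entry: (context-switches, energy-cores in Joules).
GLXGEARS = {
    "DRM sched": [(71548, 57.78), (71720, 58.33), (70871, 56.97)],
    "DRM dep (no opt flags)": [(66233, 44.18), (68389, 47.43), (67692, 46.76)],
    "DRM dep + bypass": [(56788, 44.70), (56388, 44.30), (56167, 43.33)],
    "DRM dep + IRQ-safe": [(60305, 40.79), (59401, 40.66), (59381, 40.86)],
    "DRM dep + bypass + IRQ-safe": [(46934, 40.10), (47691, 40.16), (47129, 40.08)],
}


def summarize(runs):
    """Return {config: (mean context switches, mean energy-cores Joules)}."""
    return {
        name: (mean(r[0] for r in samples), mean(r[1] for r in samples))
        for name, samples in runs.items()
    }


def percent_change(value, baseline):
    """Signed change of value relative to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


summary = summarize(GLXGEARS)
base_cs, base_joules = summary["DRM sched"]
for name, (cs, joules) in summary.items():
    print(
        f"{name:30s} ctx-switches {cs:9.0f} ({percent_change(cs, base_cs):+6.1f}%)"
        f"  energy-cores {joules:6.2f} J ({percent_change(joules, base_joules):+6.1f}%)"
    )
```

With these inputs the mean context-switch count goes from roughly 71.4k (DRM sched) to roughly 47.3k (DRM dep with both flags). Three runs per configuration is a small sample, so the averages are only a rough guide.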
Raw numbers (bit of a firehose):
DRM sched:
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.565 FPS
300 frames in 5.0 seconds = 60.000 FPS
301 frames in 5.0 seconds = 60.001 FPS
Performance counter stats for 'system wide':
71,548 context-switches
1,466 cpu-migrations
320,440.96 msec task-clock
9,140,249,815 cpu_atom/cycles/
9,140,253,058 cpu_core/cycles/
<not supported> cpu_atom/instructions/
7,071,794,806 cpu_core/instructions/
168.76 Joules power/energy-pkg/
57.78 Joules power/energy-cores/
20.029126614 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.642 FPS
300 frames in 5.0 seconds = 59.988 FPS
301 frames in 5.0 seconds = 60.001 FPS
Performance counter stats for 'system wide':
71,720 context-switches
1,581 cpu-migrations
320,530.64 msec task-clock
8,990,313,521 cpu_atom/cycles/
8,990,315,400 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,988,827,285 cpu_core/instructions/
172.15 Joules power/energy-pkg/
58.33 Joules power/energy-cores/
20.034862844 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.741 FPS
299 frames in 5.0 seconds = 59.798 FPS
299 frames in 5.0 seconds = 59.799 FPS
Performance counter stats for 'system wide':
70,871 context-switches
1,980 cpu-migrations
320,558.82 msec task-clock
8,861,481,467 cpu_atom/cycles/
8,861,485,448 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,665,294,516 cpu_core/instructions/
167.82 Joules power/energy-pkg/
56.97 Joules power/energy-cores/
20.035713155 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps
Performance counter stats for 'system wide':
27,398 context-switches
678 cpu-migrations
160,255.17 msec task-clock
5,002,546,782 cpu_atom/cycles/
5,002,549,920 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,498,672,077 cpu_core/instructions/
93.41 Joules power/energy-pkg/
23.91 Joules power/energy-cores/
10.017552274 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps
Performance counter stats for 'system wide':
27,322 context-switches
580 cpu-migrations
160,307.12 msec task-clock
4,783,734,059 cpu_atom/cycles/
4,783,737,645 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,224,510,206 cpu_core/instructions/
91.89 Joules power/energy-pkg/
23.28 Joules power/energy-cores/
10.020629190 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps
Performance counter stats for 'system wide':
27,356 context-switches
573 cpu-migrations
160,362.30 msec task-clock
5,112,653,847 cpu_atom/cycles/
5,112,658,503 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,395,873,668 cpu_core/instructions/
94.40 Joules power/energy-pkg/
24.58 Joules power/energy-cores/
10.023979647 seconds time elapsed
No opt (drm_dep_queue_flags = 0):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.597 FPS
300 frames in 5.0 seconds = 59.989 FPS
297 frames in 5.0 seconds = 59.232 FPS
Performance counter stats for 'system wide':
66,233 context-switches
1,820 cpu-migrations
320,586.39 msec task-clock
9,028,164,726 cpu_atom/cycles/
9,028,178,052 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,541,478,243 cpu_core/instructions/
178.47 Joules power/energy-pkg/
44.18 Joules power/energy-cores/
20.036849235 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.691 FPS
297 frames in 5.0 seconds = 59.393 FPS
300 frames in 5.0 seconds = 59.803 FPS
Performance counter stats for 'system wide':
68,389 context-switches
2,034 cpu-migrations
320,457.18 msec task-clock
8,736,092,056 cpu_atom/cycles/
8,736,096,958 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,511,630,145 cpu_core/instructions/
183.23 Joules power/energy-pkg/
47.43 Joules power/energy-cores/
20.031469459 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.458 FPS
299 frames in 5.0 seconds = 59.606 FPS
298 frames in 5.0 seconds = 59.590 FPS
Performance counter stats for 'system wide':
67,692 context-switches
1,877 cpu-migrations
320,524.05 msec task-clock
8,837,946,224 cpu_atom/cycles/
8,837,949,628 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,018,812,170 cpu_core/instructions/
187.63 Joules power/energy-pkg/
46.76 Joules power/energy-cores/
20.034428856 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps
Performance counter stats for 'system wide':
27,259 context-switches
313 cpu-migrations
160,538.29 msec task-clock
5,079,653,975 cpu_atom/cycles/
5,079,657,432 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,166,877,411 cpu_core/instructions/
90.72 Joules power/energy-pkg/
21.70 Joules power/energy-cores/
10.034716719 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps
Performance counter stats for 'system wide':
26,933 context-switches
449 cpu-migrations
160,334.74 msec task-clock
4,851,027,105 cpu_atom/cycles/
4,851,054,678 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,042,177,215 cpu_core/instructions/
87.33 Joules power/energy-pkg/
21.85 Joules power/energy-cores/
10.021873082 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps
Performance counter stats for 'system wide':
27,101 context-switches
351 cpu-migrations
160,333.98 msec task-clock
4,903,047,240 cpu_atom/cycles/
4,903,055,111 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,884,284,727 cpu_core/instructions/
87.68 Joules power/energy-pkg/
21.36 Joules power/energy-cores/
10.021938190 seconds time elapsed
Bypass (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.718 FPS
299 frames in 5.0 seconds = 59.615 FPS
299 frames in 5.0 seconds = 59.795 FPS
Performance counter stats for 'system wide':
56,788 context-switches
2,576 cpu-migrations
320,610.02 msec task-clock
9,056,383,522 cpu_atom/cycles/
9,056,385,629 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,285,652,796 cpu_core/instructions/
164.29 Joules power/energy-pkg/
44.70 Joules power/energy-cores/
20.041318795 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.734 FPS
300 frames in 5.0 seconds = 59.983 FPS
300 frames in 5.0 seconds = 60.000 FPS
Performance counter stats for 'system wide':
56,388 context-switches
2,326 cpu-migrations
320,581.07 msec task-clock
8,789,215,827 cpu_atom/cycles/
8,789,217,484 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,251,346,200 cpu_core/instructions/
162.67 Joules power/energy-pkg/
44.30 Joules power/energy-cores/
20.037648324 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.950 FPS
300 frames in 5.0 seconds = 59.993 FPS
300 frames in 5.0 seconds = 59.806 FPS
Performance counter stats for 'system wide':
56,167 context-switches
2,434 cpu-migrations
320,594.69 msec task-clock
8,700,873,664 cpu_atom/cycles/
8,700,877,150 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,405,556,662 cpu_core/instructions/
162.55 Joules power/energy-pkg/
43.33 Joules power/energy-cores/
20.038448851 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps
Performance counter stats for 'system wide':
24,747 context-switches
1,254 cpu-migrations
160,543.42 msec task-clock
5,047,832,024 cpu_atom/cycles/
5,047,823,996 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,124,591,155 cpu_core/instructions/
80.28 Joules power/energy-pkg/
21.49 Joules power/energy-cores/
10.034654628 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps
Performance counter stats for 'system wide':
24,953 context-switches
921 cpu-migrations
160,375.32 msec task-clock
5,197,283,835 cpu_atom/cycles/
5,197,287,623 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,393,363,950 cpu_core/instructions/
83.36 Joules power/energy-pkg/
21.92 Joules power/energy-cores/
10.024899366 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
298 frames in 5 seconds: 59.599998 fps
Performance counter stats for 'system wide':
24,576 context-switches
966 cpu-migrations
160,339.37 msec task-clock
4,915,705,971 cpu_atom/cycles/
4,915,709,503 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,968,947,722 cpu_core/instructions/
79.96 Joules power/energy-pkg/
21.08 Joules power/energy-cores/
10.022743041 seconds time elapsed
IRQ (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.643 FPS
298 frames in 5.0 seconds = 59.599 FPS
295 frames in 5.0 seconds = 58.998 FPS
Performance counter stats for 'system wide':
60,305 context-switches
1,994 cpu-migrations
320,528.79 msec task-clock
8,518,549,937 cpu_atom/cycles/
8,518,573,906 cpu_core/cycles/
<not supported> cpu_atom/instructions/
5,813,890,066 cpu_core/instructions/
184.52 Joules power/energy-pkg/
40.79 Joules power/energy-cores/
20.032795872 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.759 FPS
299 frames in 5.0 seconds = 59.790 FPS
301 frames in 5.0 seconds = 60.003 FPS
Performance counter stats for 'system wide':
59,401 context-switches
2,256 cpu-migrations
320,475.03 msec task-clock
8,581,759,828 cpu_atom/cycles/
8,581,763,986 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,748,269,548 cpu_core/instructions/
179.76 Joules power/energy-pkg/
40.66 Joules power/energy-cores/
20.029861532 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.653 FPS
298 frames in 5.0 seconds = 59.404 FPS
300 frames in 5.0 seconds = 59.990 FPS
Performance counter stats for 'system wide':
59,381 context-switches
1,800 cpu-migrations
320,616.35 msec task-clock
8,829,473,025 cpu_atom/cycles/
8,829,477,019 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,505,926,710 cpu_core/instructions/
180.38 Joules power/energy-pkg/
40.86 Joules power/energy-cores/
20.040016190 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
298 frames in 5 seconds: 59.599998 fps
Performance counter stats for 'system wide':
27,341 context-switches
786 cpu-migrations
160,478.01 msec task-clock
4,681,440,843 cpu_atom/cycles/
4,681,443,905 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,969,039,615 cpu_core/instructions/
91.74 Joules power/energy-pkg/
20.84 Joules power/energy-cores/
10.031116623 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps
Performance counter stats for 'system wide':
24,626 context-switches
429 cpu-migrations
160,367.44 msec task-clock
4,828,015,355 cpu_atom/cycles/
4,828,019,887 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,675,419,833 cpu_core/instructions/
90.35 Joules power/energy-pkg/
21.10 Joules power/energy-cores/
10.024476921 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps
Performance counter stats for 'system wide':
24,679 context-switches
340 cpu-migrations
160,303.90 msec task-clock
4,500,129,961 cpu_atom/cycles/
4,500,132,697 cpu_core/cycles/
<not supported> cpu_atom/instructions/
2,766,150,592 cpu_core/instructions/
88.01 Joules power/energy-pkg/
19.76 Joules power/energy-cores/
10.019653353 seconds time elapsed
IRQ plus bypass (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED | DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.958 FPS
299 frames in 5.0 seconds = 59.607 FPS
299 frames in 5.0 seconds = 59.603 FPS
Performance counter stats for 'system wide':
46,934 context-switches
1,558 cpu-migrations
320,569.83 msec task-clock
7,976,414,449 cpu_atom/cycles/
7,976,417,934 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,126,973,947 cpu_core/instructions/
178.36 Joules power/energy-pkg/
40.10 Joules power/energy-cores/
20.037681420 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.696 FPS
299 frames in 5.0 seconds = 59.616 FPS
299 frames in 5.0 seconds = 59.781 FPS
Performance counter stats for 'system wide':
47,691 context-switches
1,994 cpu-migrations
320,602.83 msec task-clock
8,270,567,663 cpu_atom/cycles/
8,270,572,484 cpu_core/cycles/
<not supported> cpu_atom/instructions/
4,361,204,861 cpu_core/instructions/
181.56 Joules power/energy-pkg/
40.16 Joules power/energy-cores/
20.038511163 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh. The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.911 FPS
298 frames in 5.0 seconds = 59.597 FPS
300 frames in 5.0 seconds = 59.803 FPS
Performance counter stats for 'system wide':
47,129 context-switches
1,921 cpu-migrations
320,491.09 msec task-clock
8,054,513,204 cpu_atom/cycles/
8,054,518,711 cpu_core/cycles/
<not supported> cpu_atom/instructions/
6,131,796,639 cpu_core/instructions/
178.54 Joules power/energy-pkg/
40.08 Joules power/energy-cores/
20.032444923 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps
Performance counter stats for 'system wide':
21,991 context-switches
286 cpu-migrations
160,343.73 msec task-clock
4,497,475,288 cpu_atom/cycles/
4,497,477,011 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,042,007,163 cpu_core/instructions/
89.14 Joules power/energy-pkg/
20.09 Joules power/energy-cores/
10.021642254 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps
Performance counter stats for 'system wide':
22,366 context-switches
225 cpu-migrations
160,386.68 msec task-clock
4,398,432,348 cpu_atom/cycles/
4,398,435,205 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,086,156,274 cpu_core/instructions/
89.07 Joules power/energy-pkg/
19.68 Joules power/energy-cores/
10.024827902 seconds time elapsed
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps
Performance counter stats for 'system wide':
22,515 context-switches
286 cpu-migrations
160,481.91 msec task-clock
4,447,740,222 cpu_atom/cycles/
4,447,743,314 cpu_core/cycles/
<not supported> cpu_atom/instructions/
3,217,285,071 cpu_core/instructions/
90.15 Joules power/energy-pkg/
19.65 Joules power/energy-cores/
10.029135743 seconds time elapsed
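For anyone who wants to post-process these blocks, the energy counters can be turned into average power by dividing Joules by the elapsed time. The sketch below is one possible way to do that in Python, assuming the counter lines look like the ones above (number, optional unit, event name). The regular expression and the `parse_perf_stat` helper are hypothetical, not something perf provides.

```python
import re

# One counter line: a number, an optional unit, then the event name.
COUNTER = re.compile(
    r"^\s*([\d,]+(?:\.\d+)?)\s+(?:(msec|Joules|seconds)\s+)?(\S.*?)\s*$"
)


def parse_perf_stat(text):
    """Map event name -> numeric value for each counter line in text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER.match(line)
        if match is None:
            # Lines that do not start with a number are skipped,
            # e.g. "<not supported> cpu_atom/instructions/".
            continue
        value, _unit, name = match.groups()
        counters[name] = float(value.replace(",", ""))
    return counters


# Counter block of the first DRM sched glxgears run in this message.
SAMPLE = """\
71,548 context-switches
1,466 cpu-migrations
320,440.96 msec task-clock
9,140,249,815 cpu_atom/cycles/
9,140,253,058 cpu_core/cycles/
<not supported> cpu_atom/instructions/
7,071,794,806 cpu_core/instructions/
168.76 Joules power/energy-pkg/
57.78 Joules power/energy-cores/
20.029126614 seconds time elapsed
"""

counters = parse_perf_stat(SAMPLE)
elapsed = counters["time elapsed"]
pkg_watts = counters["power/energy-pkg/"] / elapsed
cores_watts = counters["power/energy-cores/"] / elapsed
print(f"package: {pkg_watts:.2f} W, cores: {cores_watts:.2f} W")
```

For this run that works out to roughly 8.4 W package and 2.9 W cores. Only the counter block should be fed to the parser, since the glxgears "frames in 5.0 seconds" lines also start with a number and would be picked up as counters.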
Matt
> > I'm a bit surprised by the difference in number of context switches
> > given I'd expect the local-CPU to be picked in priority, and so queuing
> > work items on the same wq from another work item to be almost free in
> > terms of scheduling. But I guess there's some load-balancing happening
> > when you execute jobs at such a high rate.
> >
> > Also, I don't know if that's just noise or if it's reproducible, but
> > task-clock seems to be ~40usec lower with the deferred cleanup and
> > no-bypass (higher throughput because you're not blocking the dequeuing
> > of the next job on the cleanup of the previous one, I suspect).
>
> I think that is just noise of what the test is doing in user space -
> that bounces around a bit.
>
> Matt
>
> >