From: Matthew Brost <matthew.brost@intel.com>
To: Boris Brezillon <boris.brezillon@collabora.com>
Cc: intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Philipp Stanner" <phasta@kernel.org>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
Date: Tue, 24 Mar 2026 19:33:15 -0700
Message-ID: <acNJa4W2qndtbxJg@lstrano-desk.jf.intel.com>
In-Reply-To: <acK2apOn5DMJFb1+@lstrano-desk.jf.intel.com>

On Tue, Mar 24, 2026 at 09:06:02AM -0700, Matthew Brost wrote:
> On Tue, Mar 24, 2026 at 10:23:45AM +0100, Boris Brezillon wrote:
> > On Mon, 23 Mar 2026 11:38:06 -0700
> > Matthew Brost <matthew.brost@intel.com> wrote:
> > 
> > > 
> > > Ok, getting stats is easier than I thought...
> > > 
> > > ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions /home/mbrost/xe/source/drivers.gpu.i915.igt-gpu-tools/build/tests/xe_exec_threads --r threads-basic
> > > 
> > > This test creates one thread per engine instance (7 instances on this BMG
> > > device) and submits 1k exec IOCTLs per thread, each performing a DW
> > > write. Each exec IOCTL typically does not have unsignaled input dependencies.
> > > 
> > > With IRQ putting of jobs off + no bypass (drm_dep_queue_flags = 0):
> > > 
> > >              8,449      context-switches
> > >                412      cpu-migrations
> > >           2,531.43 msec task-clock
> > >      1,847,846,588      cpu_atom/cycles/
> > >      1,847,856,947      cpu_core/cycles/
> > >    <not supported>      cpu_atom/instructions/
> > >        460,744,020      cpu_core/instructions/
> > > 
> > > With IRQ putting of jobs off + bypass (drm_dep_queue_flags =
> > > DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
> > > 
> > >              8,655      context-switches
> > >                229      cpu-migrations
> > >           2,571.33 msec task-clock
> > >        855,900,607      cpu_atom/cycles/
> > >        855,900,272      cpu_core/cycles/
> > >    <not supported>      cpu_atom/instructions/
> > >        403,651,469      cpu_core/instructions/
> > > 
> > > With IRQ putting of jobs on + bypass (drm_dep_queue_flags =
> > > DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED |
> > > DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
> > > 
> > >              5,361      context-switches
> > >                169      cpu-migrations
> > >           2,577.44 msec task-clock
> > >        685,769,153      cpu_atom/cycles/
> > >        685,768,407      cpu_core/cycles/
> > >    <not supported>      cpu_atom/instructions/
> > >        321,336,297      cpu_core/instructions/
> > 
> > Thanks for sharing those numbers. For completeness, can you also add the
> > "With IRQ putting of jobs on + no bypass" case?
> > 
> 
> Yes, I also will share a DRM sched baseline too + I figured out power
> can be measured too - initial results confirm what I expected too - less
> power.
> 
> I'm putting together a doc based on running glxgears and another
> benchmark on top of Ubuntu 24.10 + Wayland, which has explicit sync
> (linux-drm-syncobj, behaves like SurfaceFlinger when rendering flag to
> not pass in fences to draw jobs).
> 
> Almost have all the data. Will share here once I have it.
> 

Here are some numbers based on glxgears and weston-simple-egl.

5 configurations tested:
DRM sched
DRM dep (no opt flags)
DRM dep + bypass flag
DRM dep + IRQ-safe flag
DRM dep + bypass + IRQ-safe flags

Each configuration was run 3× on both glxgears and weston-simple-egl.
Raptor Lake CPU, BMG G21.

Summary:
DRM dep reduces power usage, CPU cycles, and context switches. Enabling
both the bypass and IRQ-safe flags further reduces all of these metrics.

I’d say this test case best models something like scrolling on a phone
or using a laptop for non-GPU-intensive workloads where the screen still
needs to refresh.

I’ve also run more intensive benchmarks (glmark2 and Unigine Heaven).
The results are somewhat noisy between boots, but I think the same
conclusion holds.
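
For anyone who wants to compare the configurations without reading every
block, here is a minimal sketch that averages the three glxgears runs per
configuration and prints the change relative to DRM sched. The values are
copied by hand from the perf stat output in this mail; the helper names
(averages, relative_change) and the choice of three metrics are only for
illustration and are not part of the series.

```python
from statistics import mean

# glxgears runs (3 per configuration), copied from the perf stat output in this mail.
# Tuple order: (context-switches, energy-pkg in Joules, energy-cores in Joules).
RUNS = {
    "DRM sched": [
        (71548, 168.76, 57.78), (71720, 172.15, 58.33), (70871, 167.82, 56.97)],
    "DRM dep (no opt flags)": [
        (66233, 178.47, 44.18), (68389, 183.23, 47.43), (67692, 187.63, 46.76)],
    "DRM dep + bypass": [
        (56788, 164.29, 44.70), (56388, 162.67, 44.30), (56167, 162.55, 43.33)],
    "DRM dep + IRQ-safe": [
        (60305, 184.52, 40.79), (59401, 179.76, 40.66), (59381, 180.38, 40.86)],
    "DRM dep + bypass + IRQ-safe": [
        (46934, 178.36, 40.10), (47691, 181.56, 40.16), (47129, 178.54, 40.08)],
}
METRICS = ("context-switches", "energy-pkg (J)", "energy-cores (J)")


def averages(runs):
    """Return {config: tuple of per-metric means over that config's runs}."""
    return {cfg: tuple(mean(col) for col in zip(*rows)) for cfg, rows in runs.items()}


def relative_change(avgs, baseline="DRM sched"):
    """Return {config: tuple of percent change of each mean vs. the baseline}."""
    base = avgs[baseline]
    return {
        cfg: tuple((v - b) / b * 100.0 for v, b in zip(vals, base))
        for cfg, vals in avgs.items()
    }


if __name__ == "__main__":
    avgs = averages(RUNS)
    deltas = relative_change(avgs)
    for cfg in RUNS:
        print(cfg)
        for name, avg, pct in zip(METRICS, avgs[cfg], deltas[cfg]):
            print(f"  {name:18s} {avg:12.2f}  ({pct:+6.1f}% vs DRM sched)")
```

With only three runs per configuration this is a rough comparison, not a
statistical test; the weston-simple-egl runs can be added to RUNS in the
same way.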

Raw numbers (bit of a firehose):

DRM sched:
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.565 FPS
300 frames in 5.0 seconds = 60.000 FPS
301 frames in 5.0 seconds = 60.001 FPS

 Performance counter stats for 'system wide':

            71,548        context-switches
             1,466        cpu-migrations
        320,440.96 msec   task-clock
     9,140,249,815        cpu_atom/cycles/
     9,140,253,058        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     7,071,794,806        cpu_core/instructions/
            168.76 Joules power/energy-pkg/
             57.78 Joules power/energy-cores/

      20.029126614 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.642 FPS
300 frames in 5.0 seconds = 59.988 FPS
301 frames in 5.0 seconds = 60.001 FPS

 Performance counter stats for 'system wide':

            71,720        context-switches
             1,581        cpu-migrations
        320,530.64 msec   task-clock
     8,990,313,521        cpu_atom/cycles/
     8,990,315,400        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,988,827,285        cpu_core/instructions/
            172.15 Joules power/energy-pkg/
             58.33 Joules power/energy-cores/

      20.034862844 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.741 FPS
299 frames in 5.0 seconds = 59.798 FPS
299 frames in 5.0 seconds = 59.799 FPS

 Performance counter stats for 'system wide':

            70,871        context-switches
             1,980        cpu-migrations
        320,558.82 msec   task-clock
     8,861,481,467        cpu_atom/cycles/
     8,861,485,448        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,665,294,516        cpu_core/instructions/
            167.82 Joules power/energy-pkg/
             56.97 Joules power/energy-cores/

      20.035713155 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            27,398        context-switches
               678        cpu-migrations
        160,255.17 msec   task-clock
     5,002,546,782        cpu_atom/cycles/
     5,002,549,920        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,498,672,077        cpu_core/instructions/
             93.41 Joules power/energy-pkg/
             23.91 Joules power/energy-cores/

      10.017552274 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            27,322        context-switches
               580        cpu-migrations
        160,307.12 msec   task-clock
     4,783,734,059        cpu_atom/cycles/
     4,783,737,645        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,224,510,206        cpu_core/instructions/
             91.89 Joules power/energy-pkg/
             23.28 Joules power/energy-cores/

      10.020629190 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            27,356        context-switches
               573        cpu-migrations
        160,362.30 msec   task-clock
     5,112,653,847        cpu_atom/cycles/
     5,112,658,503        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,395,873,668        cpu_core/instructions/
             94.40 Joules power/energy-pkg/
             24.58 Joules power/energy-cores/

      10.023979647 seconds time elapsed

No opt (drm_dep_queue_flags = 0):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.597 FPS
300 frames in 5.0 seconds = 59.989 FPS
297 frames in 5.0 seconds = 59.232 FPS

 Performance counter stats for 'system wide':

            66,233        context-switches
             1,820        cpu-migrations
        320,586.39 msec   task-clock
     9,028,164,726        cpu_atom/cycles/
     9,028,178,052        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,541,478,243        cpu_core/instructions/
            178.47 Joules power/energy-pkg/
             44.18 Joules power/energy-cores/

      20.036849235 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.691 FPS
297 frames in 5.0 seconds = 59.393 FPS
300 frames in 5.0 seconds = 59.803 FPS

 Performance counter stats for 'system wide':

            68,389        context-switches
             2,034        cpu-migrations
        320,457.18 msec   task-clock
     8,736,092,056        cpu_atom/cycles/
     8,736,096,958        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,511,630,145        cpu_core/instructions/
            183.23 Joules power/energy-pkg/
             47.43 Joules power/energy-cores/

      20.031469459 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
303 frames in 5.0 seconds = 60.458 FPS
299 frames in 5.0 seconds = 59.606 FPS
298 frames in 5.0 seconds = 59.590 FPS

 Performance counter stats for 'system wide':

            67,692        context-switches
             1,877        cpu-migrations
        320,524.05 msec   task-clock
     8,837,946,224        cpu_atom/cycles/
     8,837,949,628        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,018,812,170        cpu_core/instructions/
            187.63 Joules power/energy-pkg/
             46.76 Joules power/energy-cores/

      20.034428856 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            27,259        context-switches
               313        cpu-migrations
        160,538.29 msec   task-clock
     5,079,653,975        cpu_atom/cycles/
     5,079,657,432        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,166,877,411        cpu_core/instructions/
             90.72 Joules power/energy-pkg/
             21.70 Joules power/energy-cores/

      10.034716719 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            26,933        context-switches
               449        cpu-migrations
        160,334.74 msec   task-clock
     4,851,027,105        cpu_atom/cycles/
     4,851,054,678        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,042,177,215        cpu_core/instructions/
             87.33 Joules power/energy-pkg/
             21.85 Joules power/energy-cores/

      10.021873082 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            27,101        context-switches
               351        cpu-migrations
        160,333.98 msec   task-clock
     4,903,047,240        cpu_atom/cycles/
     4,903,055,111        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,884,284,727        cpu_core/instructions/
             87.68 Joules power/energy-pkg/
             21.36 Joules power/energy-cores/

      10.021938190 seconds time elapsed

Bypass (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.718 FPS
299 frames in 5.0 seconds = 59.615 FPS
299 frames in 5.0 seconds = 59.795 FPS

 Performance counter stats for 'system wide':

            56,788        context-switches
             2,576        cpu-migrations
        320,610.02 msec   task-clock
     9,056,383,522        cpu_atom/cycles/
     9,056,385,629        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,285,652,796        cpu_core/instructions/
            164.29 Joules power/energy-pkg/
             44.70 Joules power/energy-cores/

      20.041318795 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.734 FPS
300 frames in 5.0 seconds = 59.983 FPS
300 frames in 5.0 seconds = 60.000 FPS

 Performance counter stats for 'system wide':

            56,388        context-switches
             2,326        cpu-migrations
        320,581.07 msec   task-clock
     8,789,215,827        cpu_atom/cycles/
     8,789,217,484        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,251,346,200        cpu_core/instructions/
            162.67 Joules power/energy-pkg/
             44.30 Joules power/energy-cores/

      20.037648324 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.950 FPS
300 frames in 5.0 seconds = 59.993 FPS
300 frames in 5.0 seconds = 59.806 FPS

 Performance counter stats for 'system wide':

            56,167        context-switches
             2,434        cpu-migrations
        320,594.69 msec   task-clock
     8,700,873,664        cpu_atom/cycles/
     8,700,877,150        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,405,556,662        cpu_core/instructions/
            162.55 Joules power/energy-pkg/
             43.33 Joules power/energy-cores/

      20.038448851 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            24,747        context-switches
             1,254        cpu-migrations
        160,543.42 msec   task-clock
     5,047,832,024        cpu_atom/cycles/
     5,047,823,996        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,124,591,155        cpu_core/instructions/
             80.28 Joules power/energy-pkg/
             21.49 Joules power/energy-cores/

      10.034654628 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            24,953        context-switches
               921        cpu-migrations
        160,375.32 msec   task-clock
     5,197,283,835        cpu_atom/cycles/
     5,197,287,623        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,393,363,950        cpu_core/instructions/
             83.36 Joules power/energy-pkg/
             21.92 Joules power/energy-cores/

      10.024899366 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
298 frames in 5 seconds: 59.599998 fps

 Performance counter stats for 'system wide':

            24,576        context-switches
               966        cpu-migrations
        160,339.37 msec   task-clock
     4,915,705,971        cpu_atom/cycles/
     4,915,709,503        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,968,947,722        cpu_core/instructions/
             79.96 Joules power/energy-pkg/
             21.08 Joules power/energy-cores/

      10.022743041 seconds time elapsed

IRQ (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.643 FPS
298 frames in 5.0 seconds = 59.599 FPS
295 frames in 5.0 seconds = 58.998 FPS

 Performance counter stats for 'system wide':

            60,305        context-switches
             1,994        cpu-migrations
        320,528.79 msec   task-clock
     8,518,549,937        cpu_atom/cycles/
     8,518,573,906        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     5,813,890,066        cpu_core/instructions/
            184.52 Joules power/energy-pkg/
             40.79 Joules power/energy-cores/

      20.032795872 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.759 FPS
299 frames in 5.0 seconds = 59.790 FPS
301 frames in 5.0 seconds = 60.003 FPS

 Performance counter stats for 'system wide':

            59,401        context-switches
             2,256        cpu-migrations
        320,475.03 msec   task-clock
     8,581,759,828        cpu_atom/cycles/
     8,581,763,986        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,748,269,548        cpu_core/instructions/
            179.76 Joules power/energy-pkg/
             40.66 Joules power/energy-cores/

      20.029861532 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.653 FPS
298 frames in 5.0 seconds = 59.404 FPS
300 frames in 5.0 seconds = 59.990 FPS

 Performance counter stats for 'system wide':

            59,381        context-switches
             1,800        cpu-migrations
        320,616.35 msec   task-clock
     8,829,473,025        cpu_atom/cycles/
     8,829,477,019        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,505,926,710        cpu_core/instructions/
            180.38 Joules power/energy-pkg/
             40.86 Joules power/energy-cores/

      20.040016190 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
298 frames in 5 seconds: 59.599998 fps

 Performance counter stats for 'system wide':

            27,341        context-switches
               786        cpu-migrations
        160,478.01 msec   task-clock
     4,681,440,843        cpu_atom/cycles/
     4,681,443,905        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,969,039,615        cpu_core/instructions/
             91.74 Joules power/energy-pkg/
             20.84 Joules power/energy-cores/

      10.031116623 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            24,626        context-switches
               429        cpu-migrations
        160,367.44 msec   task-clock
     4,828,015,355        cpu_atom/cycles/
     4,828,019,887        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,675,419,833        cpu_core/instructions/
             90.35 Joules power/energy-pkg/
             21.10 Joules power/energy-cores/

      10.024476921 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            24,679        context-switches
               340        cpu-migrations
        160,303.90 msec   task-clock
     4,500,129,961        cpu_atom/cycles/
     4,500,132,697        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     2,766,150,592        cpu_core/instructions/
             88.01 Joules power/energy-pkg/
             19.76 Joules power/energy-cores/

      10.019653353 seconds time elapsed

IRQ plus bypass (drm_dep_queue_flags = DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED | DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.958 FPS
299 frames in 5.0 seconds = 59.607 FPS
299 frames in 5.0 seconds = 59.603 FPS

 Performance counter stats for 'system wide':

            46,934        context-switches
             1,558        cpu-migrations
        320,569.83 msec   task-clock
     7,976,414,449        cpu_atom/cycles/
     7,976,417,934        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,126,973,947        cpu_core/instructions/
            178.36 Joules power/energy-pkg/
             40.10 Joules power/energy-cores/

      20.037681420 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
304 frames in 5.0 seconds = 60.696 FPS
299 frames in 5.0 seconds = 59.616 FPS
299 frames in 5.0 seconds = 59.781 FPS

 Performance counter stats for 'system wide':

            47,691        context-switches
             1,994        cpu-migrations
        320,602.83 msec   task-clock
     8,270,567,663        cpu_atom/cycles/
     8,270,572,484        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     4,361,204,861        cpu_core/instructions/
            181.56 Joules power/energy-pkg/
             40.16 Joules power/energy-cores/

      20.038511163 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 20s glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
305 frames in 5.0 seconds = 60.911 FPS
298 frames in 5.0 seconds = 59.597 FPS
300 frames in 5.0 seconds = 59.803 FPS

 Performance counter stats for 'system wide':

            47,129        context-switches
             1,921        cpu-migrations
        320,491.09 msec   task-clock
     8,054,513,204        cpu_atom/cycles/
     8,054,518,711        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     6,131,796,639        cpu_core/instructions/
            178.54 Joules power/energy-pkg/
             40.08 Joules power/energy-cores/

      20.032444923 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            21,991        context-switches
               286        cpu-migrations
        160,343.73 msec   task-clock
     4,497,475,288        cpu_atom/cycles/
     4,497,477,011        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,042,007,163        cpu_core/instructions/
             89.14 Joules power/energy-pkg/
             20.09 Joules power/energy-cores/

      10.021642254 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
300 frames in 5 seconds: 60.000000 fps

 Performance counter stats for 'system wide':

            22,366        context-switches
               225        cpu-migrations
        160,386.68 msec   task-clock
     4,398,432,348        cpu_atom/cycles/
     4,398,435,205        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,086,156,274        cpu_core/instructions/
             89.07 Joules power/energy-pkg/
             19.68 Joules power/energy-cores/

      10.024827902 seconds time elapsed

root@DUT6235BMGFRD:mbrost# ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions,power/energy-pkg/,power/energy-cores/ timeout 10 weston-simple-egl -f
Using config: r8g8b8a8
has EGL_EXT_buffer_age and EGL_EXT_swap_buffers_with_damage
has EGL_EXT_surface_compression
299 frames in 5 seconds: 59.799999 fps

 Performance counter stats for 'system wide':

            22,515        context-switches
               286        cpu-migrations
        160,481.91 msec   task-clock
     4,447,740,222        cpu_atom/cycles/
     4,447,743,314        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     3,217,285,071        cpu_core/instructions/
             90.15 Joules power/energy-pkg/
             19.65 Joules power/energy-cores/

      10.029135743 seconds time elapsed
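
If you would rather script the comparison than copy values by hand, the
sketch below pulls the counters out of a perf stat text block like the ones
above. It assumes the exact layout shown in this mail (value, optional
"msec" or "Joules" unit, event name); parse_perf_stat is a hypothetical
helper, not an existing tool, and other perf versions or options may print
a different layout.

```python
import re

# Matches counter lines such as:
#   "            71,548        context-switches"
#   "        320,440.96 msec   task-clock"
#   "            168.76 Joules power/energy-pkg/"
_COUNTER = re.compile(r"^\s*([\d,]+(?:\.\d+)?)\s+(?:(msec|Joules)\s+)?(\S+)\s*$")

# First DRM sched glxgears block from this mail, used as sample input.
SAMPLE = """\
 Performance counter stats for 'system wide':

            71,548        context-switches
             1,466        cpu-migrations
        320,440.96 msec   task-clock
     9,140,249,815        cpu_atom/cycles/
     9,140,253,058        cpu_core/cycles/
   <not supported>        cpu_atom/instructions/
     7,071,794,806        cpu_core/instructions/
            168.76 Joules power/energy-pkg/
             57.78 Joules power/energy-cores/

      20.029126614 seconds time elapsed
"""


def parse_perf_stat(text):
    """Return {event_name: float} for every supported counter line in text."""
    counters = {}
    for line in text.splitlines():
        m = _COUNTER.match(line)
        # "<not supported>" lines and the "seconds time elapsed" line do not match.
        if m:
            value, _unit, event = m.groups()
            counters[event] = float(value.replace(",", ""))
    return counters


if __name__ == "__main__":
    for event, value in parse_perf_stat(SAMPLE).items():
        print(f"{event:28s} {value}")
```

The unit is discarded here for brevity, so task-clock stays in msec and the
energy counters in Joules, as printed by perf.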

Matt

> > I'm a bit surprised by the difference in number of context switches
> > given I'd expect the local-CPU to be picked in priority, and so queuing
> > work items on the same wq from another work item to be almost free in
> > terms of scheduling. But I guess there's some load-balancing happening
> > when you execute jobs at such a high rate.
> > 
> > Also, I don't know if that's just noise or if it's reproducible, but
> > task-clock seems to be ~40msec lower with the deferred cleanup and
> > no-bypass (higher throughput because you're not blocking the dequeuing
> > of the next job on the cleanup of the previous one, I suspect).
> 
> I think that is just noise of what the test is doing in user space -
> that bounces around a bit.
> 
> Matt
> 
> > 
