Date: Tue, 24 Mar 2026 10:23:45 +0100
From: Boris Brezillon
To: Matthew Brost
Cc: Tvrtko Ursulin, Rodrigo Vivi, Thomas Hellström, Christian König,
 Danilo Krummrich, David Airlie, Maarten Lankhorst, Maxime Ripard,
 Philipp Stanner, Simona Vetter, Sumit Semwal, Thomas Zimmermann
Subject: Re: [RFC PATCH 02/12] drm/dep: Add DRM dependency queue layer
Message-ID: <20260324102345.17742bef@fedora>
References: <20260316043255.226352-1-matthew.brost@intel.com>
 <20260316043255.226352-3-matthew.brost@intel.com>
 <20260317155512.7250be13@fedora>
 <20260319101153.169c7f36@fedora>
 <20260323105504.2d9ae741@fedora>
Organization: Collabora
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 23 Mar 2026 11:38:06 -0700
Matthew Brost wrote:

> Ok, getting stats is easier than I thought...
>
> ./perf stat -a -e context-switches,cpu-migrations,task-clock,cycles,instructions /home/mbrost/xe/source/drivers.gpu.i915.igt-gpu-tools/build/tests/xe_exec_threads --r threads-basic
>
> This test creates one thread per engine instance (7 instances this BMG
> device) and submits 1k exec IOCTLs per thread, each performing a DW
> write. Each exec IOCTL typically does not have unsignaled input dependencies.
>
> With IRQ putting of jobs off + no bypass (drm_dep_queue_flags = 0):
>
>          8,449      context-switches
>            412      cpu-migrations
>       2,531.43 msec task-clock
>  1,847,846,588      cpu_atom/cycles/
>  1,847,856,947      cpu_core/cycles/
>                     cpu_atom/instructions/
>    460,744,020      cpu_core/instructions/
>
> With IRQ putting of jobs off + bypass (drm_dep_queue_flags =
> DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED):
>
>          8,655      context-switches
>            229      cpu-migrations
>       2,571.33 msec task-clock
>    855,900,607      cpu_atom/cycles/
>    855,900,272      cpu_core/cycles/
>                     cpu_atom/instructions/
>    403,651,469      cpu_core/instructions/
>
> With IRQ putting of jobs on + bypass (drm_dep_queue_flags =
> DRM_DEP_QUEUE_FLAGS_BYPASS_SUPPORTED |
> DRM_DEP_QUEUE_FLAGS_JOB_PUT_IRQ_SAFE):
>
>          5,361      context-switches
>            169      cpu-migrations
>       2,577.44 msec task-clock
>    685,769,153      cpu_atom/cycles/
>    685,768,407      cpu_core/cycles/
>                     cpu_atom/instructions/
>    321,336,297      cpu_core/instructions/

Thanks for sharing those numbers. For completeness, can you also add the
"With IRQ putting of jobs on + no bypass" case?

I'm a bit surprised by the difference in the number of context switches:
I'd expect the local CPU to be picked preferentially, so queuing work
items on the same wq from another work item should be almost free in
terms of scheduling. But I guess there's some load-balancing happening
when you execute jobs at such a high rate.

Also, I don't know if that's just noise or if it's reproducible, but
task-clock seems to be ~40 msec lower with the deferred cleanup and
no bypass (higher throughput because you're not blocking the dequeuing
of the next job on the cleanup of the previous one, I suspect).
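
To make the comparison above easier to eyeball, here is a rough Python sketch that computes the relative change of each counter against the "IRQ put off + no bypass" run. The figures are copied from the quoted perf stat output (cpu_core counters only); the dictionary keys, the run labels and the helper names are made up for illustration and are not part of perf or of the patch series. A single run per configuration says nothing about noise, so treat the percentages as indicative only.

```python
# Counters copied from the three quoted perf stat runs (cpu_core values).
# Labels and key names below are illustrative, not perf output fields.
BASELINE = "irq-put off, no bypass"

runs = {
    "irq-put off, no bypass": {
        "context_switches": 8449,
        "cpu_migrations": 412,
        "task_clock_msec": 2531.43,
        "cycles": 1_847_856_947,
        "instructions": 460_744_020,
    },
    "irq-put off, bypass": {
        "context_switches": 8655,
        "cpu_migrations": 229,
        "task_clock_msec": 2571.33,
        "cycles": 855_900_272,
        "instructions": 403_651_469,
    },
    "irq-put on, bypass": {
        "context_switches": 5361,
        "cpu_migrations": 169,
        "task_clock_msec": 2577.44,
        "cycles": 685_768_407,
        "instructions": 321_336_297,
    },
}


def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return 100.0 * (new - old) / old


def compare(all_runs, baseline):
    """Per-counter percentage change of every run against the baseline run."""
    base = all_runs[baseline]
    return {
        name: {key: pct_change(counters[key], base[key]) for key in base}
        for name, counters in all_runs.items()
        if name != baseline
    }


deltas = compare(runs, BASELINE)

for name, changes in deltas.items():
    print(f"{name} vs {BASELINE}:")
    for key, value in changes.items():
        print(f"  {key:18s} {value:+7.2f}%")
```

Absolute differences can be read straight from the `runs` table, for example `runs["irq-put off, bypass"]["task_clock_msec"] - runs[BASELINE]["task_clock_msec"]` for the task-clock gap mentioned above.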