Date: Tue, 4 Jun 2024 02:48:43 +0200
From: Andi Shyti
To: Janusz Krzysztofik
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
    Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, Tvrtko Ursulin,
    Chris Wilson, Matthew Auld, Andi Shyti, Nirmoy Das, Jonathan Cavitt
Subject: Re: [PATCH] drm/i915/gt: Fix potential UAF by revoke of fence registers
References: <20240603195446.297690-2-janusz.krzysztofik@linux.intel.com>
In-Reply-To: <20240603195446.297690-2-janusz.krzysztofik@linux.intel.com>
List-Id: Intel graphics driver community testing & development

Hi Janusz,

On Mon, Jun 03, 2024 at 09:54:45PM +0200, Janusz Krzysztofik wrote:
> CI has been sporadically reporting the following issue triggered by
> igt@i915_selftest@live@hangcheck on ADL-P and similar machines:
>
> <6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
> ...
> <6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
> <6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
> <3> [414.070354] Unable to pin Y-tiled fence; err:-4
> <3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
> ...
> <4>[ 609.603992] ------------[ cut here ]------------
> <2>[ 609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
> <4>[ 609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <4>[ 609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G U W 6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
> <4>[ 609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
> <4>[ 609.604010] Workqueue: i915 __i915_gem_free_work [i915]
> <4>[ 609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
> ...
> <4>[ 609.604271] Call Trace:
> <4>[ 609.604273]
> ...
> <4>[ 609.604716] __i915_vma_evict+0x2e9/0x550 [i915]
> <4>[ 609.604852] __i915_vma_unbind+0x7c/0x160 [i915]
> <4>[ 609.604977] force_unbind+0x24/0xa0 [i915]
> <4>[ 609.605098] i915_vma_destroy+0x2f/0xa0 [i915]
> <4>[ 609.605210] __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
> <4>[ 609.605330] __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
> <4>[ 609.605440] process_scheduled_works+0x351/0x690
> ...
>
> In the past, there were similar failures reported by CI from other IGT
> tests, observed on other platforms.
>
> Before commit 63baf4f3d587 ("drm/i915/gt: Only wait for GPU activity
> before unbinding a GGTT fence"), i915_vma_revoke_fence() was waiting for
> idleness of vma->active via fence_update(). That commit introduced
> vma->fence->active in order for the fence_update() to be able to wait
> selectively on that one instead of vma->active since only idleness of
> fence registers was needed. But then, another commit 0d86ee35097a
> ("drm/i915/gt: Make fence revocation unequivocal") replaced the call to
> fence_update() in i915_vma_revoke_fence() with only fence_write(), and
> also added that GEM_BUG_ON(!i915_active_is_idle(&fence->active)) in front.
> No justification was provided on why we might then expect idleness of
> vma->fence->active without first waiting on it.
>
> The issue can be potentially caused by a race among revocation of fence
> registers on one side and sequential execution of signal callbacks invoked
> on completion of a request that was using them on the other, still
> processed in parallel to revocation of those fence registers. Fix it by
> waiting for idleness of vma->fence->active in i915_vma_revoke_fence().
>
> Fixes: 0d86ee35097a ("drm/i915/gt: Make fence revocation unequivocal")
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/10021
> Signed-off-by: Janusz Krzysztofik
> Cc: stable@vger.kernel.org # v5.8+

Just wondering whether we really need the stable kernel here. We
have just an alleged failure reported on a selftest. I think we
can drop the stable requirement.

Otherwise,

Reviewed-by: Andi Shyti

Thanks,
Andi