From: Matthew Brost
To: intel-xe@lists.freedesktop.org
Subject: [PATCH v6 25/30] drm/xe/vf: Workaround for race condition in GuC firmware during VF pause
Date: Mon, 6 Oct 2025 04:10:33 -0700
Message-Id: <20251006111038.2234860-26-matthew.brost@intel.com>
In-Reply-To: <20251006111038.2234860-1-matthew.brost@intel.com>
References: <20251006111038.2234860-1-matthew.brost@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A race in the GuC firmware allows an H2G request from a paused VF to be
processed and subsequently rejected. The rejection reaches the KMD as a
FAST_REQ failure; the KMD then declares the CT dead via the dead worker
and triggers a GT reset, an undesirable outcome.

Work around the issue by checking whether a VF post-migration recovery
is in progress and, if so, skipping these adverse actions.

The GuC firmware will address this bug in an upcoming release. Once that
version is available and VF migration depends on it, this workaround can
be safely removed.

Signed-off-by: Matthew Brost
Reviewed-by: Tomasz Lis
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index c0d261abf735..dd593e9b0fe5 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1395,6 +1395,10 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
 
 		fast_req_report(ct, fence);
 
+		/* FIXME: W/A race in the GuC, will get in firmware soon */
+		if (xe_gt_recovery_pending(gt))
+			return 0;
+
 		CT_DEAD(ct, NULL, PARSE_G2H_RESPONSE);
 
 		return -EPROTO;
-- 
2.34.1
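
For readers without the driver tree at hand, the control flow this patch introduces can be modelled in isolation. The following is a minimal user-space sketch, not driver code: struct model_gt, struct model_ct and model_fast_req_failure() are hypothetical stand-ins invented for illustration. The recovery_pending flag plays the role of xe_gt_recovery_pending(gt), and the dead flag plays the role of the effect of CT_DEAD().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; not the real xe structures. */
struct model_gt {
	bool recovery_pending;	/* models xe_gt_recovery_pending(gt) */
};

struct model_ct {
	struct model_gt *gt;
	bool dead;		/* models the effect of CT_DEAD() */
};

/*
 * Models the handling of a failed FAST_REQ response.
 * Returns 0 when the failure is ignored, -EPROTO when the CT is
 * declared dead.
 */
static int model_fast_req_failure(struct model_ct *ct)
{
	assert(ct != NULL && ct->gt != NULL);

	/* Workaround: a failure seen during VF recovery is treated as benign. */
	if (ct->gt->recovery_pending)
		return 0;

	/* Otherwise escalate, as the code did before the patch. */
	ct->dead = true;
	return -EPROTO;
}
```

With recovery_pending set, the model returns 0 and leaves the CT alive; with it clear, the model marks the CT dead and returns -EPROTO, which matches the pre-patch behaviour.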