From: Matthew Brost
To: intel-xe@lists.freedesktop.org
Subject: [PATCH v8 28/33] drm/xe/vf: Workaround for race condition in GuC firmware during VF pause
Date: Tue, 7 Oct 2025 06:05:00 -0700
Message-Id: <20251007130505.2694829-29-matthew.brost@intel.com>
In-Reply-To: <20251007130505.2694829-1-matthew.brost@intel.com>
References: <20251007130505.2694829-1-matthew.brost@intel.com>

A race condition in the GuC firmware allows an H2G request from a paused
VF to be processed and then rejected. The rejection is delivered to the
KMD as a FAST_REQ failure, which then tears down the CT via the dead-CT
worker and triggers a GT reset, an undesirable outcome.

This workaround mitigates the issue by checking whether a VF
post-migration recovery is pending and, if so, skipping these actions:
the FAST_REQ failure is still reported, but the CT is not marked dead and
no error is returned.

The GuC firmware will fix this bug in an upcoming release. Once that
version is available and VF migration depends on it, this workaround can
be safely removed.
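The control flow this workaround changes can be modeled outside the
kernel. The C sketch below is a hypothetical, self-contained model, not
driver code: struct sketch_ct, sketch_fast_req_report(),
sketch_handle_fast_req_failure() and sketch_selftest() are invented names
standing in for the CT state, fast_req_report() and the FAST_REQ failure
branch of parse_g2h_response(), and the recovery_pending flag stands in
for xe_gt_recovery_pending(). It shows that the failure is reported in
both cases, while only the non-recovery path marks the CT dead and
returns -EPROTO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the state involved; these are NOT the xe driver's
 * structures or functions, only illustrative stand-ins.
 */
struct sketch_ct {
	bool recovery_pending;  /* stands in for xe_gt_recovery_pending(gt) */
	bool dead;              /* set where the driver would invoke CT_DEAD() */
	int fast_req_failures;  /* number of FAST_REQ failures reported */
};

/* Stands in for fast_req_report(): record the failed FAST_REQ H2G. */
static void sketch_fast_req_report(struct sketch_ct *ct)
{
	ct->fast_req_failures++;
}

/*
 * Models the FAST_REQ failure branch of parse_g2h_response() with the
 * workaround applied: always report the failure, but while VF
 * post-migration recovery is pending, neither mark the CT dead nor
 * return an error.
 */
int sketch_handle_fast_req_failure(struct sketch_ct *ct)
{
	sketch_fast_req_report(ct);

	/* Workaround for the GuC race: tolerate the rejection during recovery. */
	if (ct->recovery_pending)
		return 0;

	/* Per the commit message, CT_DEAD() in the driver leads to a GT reset. */
	ct->dead = true;
	return -EPROTO;
}

/* Exercises both paths of the model. */
void sketch_selftest(void)
{
	struct sketch_ct paused = { .recovery_pending = true };
	struct sketch_ct normal = { .recovery_pending = false };

	assert(sketch_handle_fast_req_failure(&paused) == 0);
	assert(!paused.dead && paused.fast_req_failures == 1);

	assert(sketch_handle_fast_req_failure(&normal) == -EPROTO);
	assert(normal.dead && normal.fast_req_failures == 1);
}
```

Calling sketch_selftest() checks both paths. The model keeps the report
unconditional because, in this patch, fast_req_report() runs before the
recovery check.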
Signed-off-by: Matthew Brost
Reviewed-by: Tomasz Lis
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index d0666d6d12f8..9da3339fba69 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1397,6 +1397,10 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
 
 		fast_req_report(ct, fence);
 
+		/* FIXME: W/A race in the GuC, will get in firmware soon */
+		if (xe_gt_recovery_pending(gt))
+			return 0;
+
 		CT_DEAD(ct, NULL, PARSE_G2H_RESPONSE);
 
 		return -EPROTO;
-- 
2.34.1