From: Matthew Brost
To: intel-xe@lists.freedesktop.org
Subject: [PATCH v9 29/34] drm/xe/vf: Workaround for race condition in GuC firmware during VF pause
Date: Wed, 8 Oct 2025 11:04:55 -0700
Message-Id: <20251008180500.3261209-30-matthew.brost@intel.com>
In-Reply-To: <20251008180500.3261209-1-matthew.brost@intel.com>
References: <20251008180500.3261209-1-matthew.brost@intel.com>

A race condition exists where a paused VF's H2G request can be processed
and subsequently rejected. This rejection results in a FAST_REQ failure
being delivered to the KMD, which then terminates the CT via a dead
worker and triggers a GT reset, an undesirable outcome.

This workaround mitigates the issue by checking whether a VF
post-migration recovery is in progress and, if so, skipping these
adverse actions.

The GuC firmware will address this bug in an upcoming release. Once that
version is available and VF migration depends on it, this workaround can
be safely removed.

Signed-off-by: Matthew Brost
Reviewed-by: Tomasz Lis
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 3472e4ea2609..3ae1e8db143a 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1398,6 +1398,10 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
 
 			fast_req_report(ct, fence);
 
+			/* FIXME: W/A race in the GuC, will get in firmware soon */
+			if (xe_gt_recovery_pending(gt))
+				return 0;
+
 			CT_DEAD(ct, NULL, PARSE_G2H_RESPONSE);
 
 			return -EPROTO;
-- 
2.34.1
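
For readers without the surrounding source at hand, the control flow the hunk adds can be modelled in isolation. The following is only a minimal sketch, not the driver code: `struct mock_ct` and `handle_fast_req_failure()` are hypothetical stand-ins, with `recovery_pending` modelling the result of `xe_gt_recovery_pending(gt)` and `dead` modelling the effect of `CT_DEAD()`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CT state; NOT the real struct xe_guc_ct. */
struct mock_ct {
	bool recovery_pending;	/* models xe_gt_recovery_pending(gt) */
	bool dead;		/* models the effect of CT_DEAD() */
};

/*
 * Hypothetical helper mirroring the patched branch: a FAST_REQ failure
 * that arrives while VF post-migration recovery is pending is ignored;
 * otherwise the CT is marked dead and a protocol error is returned.
 */
static int handle_fast_req_failure(struct mock_ct *ct)
{
	if (ct->recovery_pending)
		return 0;	/* workaround path: no dead worker, no reset */

	ct->dead = true;	/* pre-existing path */
	return -EPROTO;
}
```

In this model, a failure seen with `recovery_pending` set leaves `dead` false and returns 0, while the same failure outside recovery still marks the CT dead and returns `-EPROTO`, which matches the behaviour the commit message describes.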