From: Matthew Brost
To: intel-xe@lists.freedesktop.org
Subject: [PATCH v7 27/32] drm/xe/vf: Workaround for race condition in GuC firmware during VF pause
Date: Tue, 7 Oct 2025 04:26:36 -0700
Message-Id: <20251007112641.2669655-28-matthew.brost@intel.com>
In-Reply-To: <20251007112641.2669655-1-matthew.brost@intel.com>
References: <20251007112641.2669655-1-matthew.brost@intel.com>

A race condition exists in which an H2G request from a paused VF can be
processed and subsequently rejected by the GuC. The rejection is delivered
to the KMD as a FAST_REQ failure, and the KMD then declares the CT dead
(through the dead worker) and triggers a GT reset. This is an undesirable
outcome.

This workaround mitigates the issue by checking whether VF post-migration
recovery is pending and, if so, returning early instead of declaring the
CT dead.

The GuC firmware will address this bug in an upcoming release. Once that
version is available and VF migration depends on it, this workaround can
be safely removed.
Signed-off-by: Matthew Brost
Reviewed-by: Tomasz Lis
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index d0666d6d12f8..9da3339fba69 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1397,6 +1397,10 @@ static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
 
 		fast_req_report(ct, fence);
 
+		/* FIXME: W/A race in the GuC, will get in firmware soon */
+		if (xe_gt_recovery_pending(gt))
+			return 0;
+
 		CT_DEAD(ct, NULL, PARSE_G2H_RESPONSE);
 
 		return -EPROTO;
-- 
2.34.1