From: Matthew Brost
To: intel-xe@lists.freedesktop.org
Subject: [PATCH v7 23/32] drm/xe/vf: Abort VF post migration recovery on failure
Date: Tue, 7 Oct 2025 04:26:32 -0700
Message-Id: <20251007112641.2669655-24-matthew.brost@intel.com>
In-Reply-To: <20251007112641.2669655-1-matthew.brost@intel.com>

If VF post-migration recovery fails, the device is wedged. However, the
submission queues must still be re-enabled so that they can be cleaned up
properly. In that case, call into the GuC submission backend to restart
all queues that were paused for recovery.
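The failure path described above can be modelled in user space. This is a simplified, hypothetical sketch, not driver code: `struct vf_state`, `recovery_run()`, `recovery_abort()` and `restart_paused_queues()` are illustrative names, a C11 `atomic_flag` spin loop stands in for `spin_lock_irq()`, and a counter stands in for the paused GuC submission queues. It shows the ordering the patch adds on the `fail:` path: clear the in-progress flag under the lock, restart the paused queues, then wedge the device. The success path is a placeholder and does not reflect the driver's exact sequence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-GT VF migration state. */
struct vf_state {
	atomic_flag lock;          /* models migration.lock (a spinlock) */
	bool recovery_inprogress;  /* set while post-migration recovery runs */
	int paused_queues;         /* submission queues paused for recovery */
	bool wedged;               /* models xe_device_declare_wedged() */
};

static void state_lock(struct vf_state *s)
{
	/* Spin until the flag was previously clear, i.e. we now own it. */
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;
}

static void state_unlock(struct vf_state *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Stand-in for the GuC backend: restart every queue paused for recovery. */
static void restart_paused_queues(struct vf_state *s)
{
	s->paused_queues = 0;
}

/* Failure path: drop the in-progress flag under the lock, then restart queues. */
static void recovery_abort(struct vf_state *s)
{
	state_lock(s);
	s->recovery_inprogress = false;
	state_unlock(s);

	restart_paused_queues(s);
}

/*
 * Recovery worker. fixup_err simulates the combined result of the fixup
 * steps; any nonzero value takes the "fail" path.
 */
static int recovery_run(struct vf_state *s, int fixup_err)
{
	assert(s->recovery_inprogress);

	if (fixup_err)
		goto fail;

	/* Placeholder success path: resume queues and clear the flag. */
	restart_paused_queues(s);
	state_lock(s);
	s->recovery_inprogress = false;
	state_unlock(s);
	return 0;

fail:
	/* Without this call the queues would stay paused on a wedged device. */
	recovery_abort(s);
	s->wedged = true;
	return fixup_err;
}
```

After a failed run, the model leaves the flag cleared, the lock released, no queues paused, and the device wedged.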
v3:
 - s/Avort/Abort (Tomasz)

Signed-off-by: Matthew Brost
Reviewed-by: Tomasz Lis
---
 drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 10 ++++++++++
 drivers/gpu/drm/xe/xe_guc_submit.c  | 20 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_submit.h  |  1 +
 3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
index 675bb0d43343..c14a3f1724bb 100644
--- a/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
+++ b/drivers/gpu/drm/xe/xe_gt_sriov_vf.c
@@ -1138,6 +1138,15 @@ static void vf_post_migration_kickstart(struct xe_gt *gt)
 	xe_guc_submit_unpause(&gt->uc.guc);
 }
 
+static void vf_post_migration_abort(struct xe_gt *gt)
+{
+	spin_lock_irq(&gt->sriov.vf.migration.lock);
+	WRITE_ONCE(gt->sriov.vf.migration.recovery_inprogress, false);
+	spin_unlock_irq(&gt->sriov.vf.migration.lock);
+
+	xe_guc_submit_pause_abort(&gt->uc.guc);
+}
+
 static int vf_post_migration_notify_resfix_done(struct xe_gt *gt)
 {
 	bool skip_resfix = false;
@@ -1196,6 +1205,7 @@ static void vf_post_migration_recovery(struct xe_gt *gt)
 	xe_gt_sriov_notice(gt, "migration recovery ended\n");
 	return;
 fail:
+	vf_post_migration_abort(gt);
 	xe_pm_runtime_put(xe);
 	xe_gt_sriov_err(gt, "migration recovery failed (%pe)\n", ERR_PTR(err));
 	xe_device_declare_wedged(xe);
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index 7f0ea35f4f0a..4a45c4934dce 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -2098,6 +2098,26 @@ void xe_guc_submit_unpause(struct xe_guc *guc)
 	wake_up_all(&guc->ct.wq);
 }
 
+/**
+ * xe_guc_submit_pause_abort - Abort all paused submission tasks on given GuC.
+ * @guc: the &xe_guc struct instance whose scheduler is to be aborted
+ */
+void xe_guc_submit_pause_abort(struct xe_guc *guc)
+{
+	struct xe_exec_queue *q;
+	unsigned long index;
+
+	mutex_lock(&guc->submission_state.lock);
+	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
+		struct xe_gpu_scheduler *sched = &q->guc->sched;
+
+		xe_sched_submission_start(sched);
+		if (exec_queue_killed_or_banned_or_wedged(q))
+			xe_guc_exec_queue_trigger_cleanup(q);
+	}
+	mutex_unlock(&guc->submission_state.lock);
+}
+
 static struct xe_exec_queue *
 g2h_exec_queue_lookup(struct xe_guc *guc, u32 guc_id)
 {
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.h b/drivers/gpu/drm/xe/xe_guc_submit.h
index f535fe3895e5..fe82c317048e 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.h
+++ b/drivers/gpu/drm/xe/xe_guc_submit.h
@@ -22,6 +22,7 @@ void xe_guc_submit_stop(struct xe_guc *guc);
 int xe_guc_submit_start(struct xe_guc *guc);
 void xe_guc_submit_pause(struct xe_guc *guc);
 void xe_guc_submit_unpause(struct xe_guc *guc);
+void xe_guc_submit_pause_abort(struct xe_guc *guc);
 void xe_guc_submit_wedge(struct xe_guc *guc);
 int xe_guc_read_stopped(struct xe_guc *guc);
 
-- 
2.34.1
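`xe_guc_submit_pause_abort()` restarts the scheduler of every exec queue registered with the GuC and triggers cleanup only for queues that are killed, banned or wedged. The sketch below models that loop in plain C under stated assumptions: `struct queue`, `pause_abort()` and its boolean fields are hypothetical, a fixed array replaces the `exec_queue_lookup` xarray (so unused slots are skipped explicitly, whereas `xa_for_each()` visits only present entries), and the `submission_state.lock` mutex is omitted because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified exec queue: scheduler state plus status bits. */
struct queue {
	bool registered;        /* slot in use (the driver uses an xarray lookup) */
	bool sched_running;     /* models the GPU scheduler being started */
	bool killed;
	bool banned;
	bool wedged;
	bool cleanup_triggered; /* models xe_guc_exec_queue_trigger_cleanup() */
};

static bool killed_or_banned_or_wedged(const struct queue *q)
{
	return q->killed || q->banned || q->wedged;
}

/*
 * Restart every registered queue and schedule cleanup for queues that can
 * no longer run. The driver holds submission_state.lock around this loop;
 * this single-threaded sketch omits it.
 */
static void pause_abort(struct queue *queues, size_t n)
{
	assert(queues != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct queue *q = &queues[i];

		if (!q->registered)
			continue;

		q->sched_running = true;
		if (killed_or_banned_or_wedged(q))
			q->cleanup_triggered = true;
	}
}
```

The sketch keeps the patch's order within each iteration: start the scheduler first, then trigger cleanup if the queue is killed, banned or wedged.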