From: Aravind Iddamsetty
To: intel-xe@lists.freedesktop.org
Cc: riana.tauro@intel.com, rodrigo.vivi@intel.com, himal.prasad.ghimiray@intel.com, anshuman.gupta@intel.com
Subject: [PATCH 10/10] drm/xe: Clear all SoC errors post warm reset.
Date: Wed, 30 Jul 2025 11:18:14 +0530
Message-Id: <20250730054814.1376770-11-aravind.iddamsetty@linux.intel.com>
In-Reply-To: <20250730054814.1376770-1-aravind.iddamsetty@linux.intel.com>
References: <20250730054814.1376770-1-aravind.iddamsetty@linux.intel.com>

From: Himal Prasad Ghimiray

In some scenarios, errors are reported from the SoC uncore to the IEH
but are not propagated to the SG unit. Because they never reach the SG
unit, the driver cannot clear them as part of xe_process_hw_errors().
Hence, clear all SoC error registers after xe_process_hw_errors()
during driver load.

v2 - Fix commit message.
v3 - Limit check to PVC.
v4 - Fix check.

Cc: Aravind Iddamsetty
Reviewed-by: Aravind Iddamsetty
Signed-off-by: Himal Prasad Ghimiray
---
 drivers/gpu/drm/xe/xe_hw_error.c | 41 ++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_hw_error.c b/drivers/gpu/drm/xe/xe_hw_error.c
index a77779eb6ce8..6a7cd59caac1 100644
--- a/drivers/gpu/drm/xe/xe_hw_error.c
+++ b/drivers/gpu/drm/xe/xe_hw_error.c
@@ -510,6 +510,46 @@ xe_gt_hw_error_log_vector_reg(struct xe_gt *gt, const enum hardware_error hw_err
 	}
 }
 
+static void xe_clear_all_soc_errors(struct xe_device *xe)
+{
+	enum hardware_error hw_err;
+	u32 base, slave_base;
+	struct xe_tile *tile;
+	struct xe_gt *gt;
+	unsigned int i;
+
+	if (xe->info.platform != XE_PVC)
+		return;
+
+	base = SOC_PVC_BASE;
+	slave_base = SOC_PVC_SLAVE_BASE;
+
+	hw_err = HARDWARE_ERROR_CORRECTABLE;
+
+	for_each_tile(tile, xe, i) {
+		gt = tile->primary_gt;
+
+		while (hw_err < HARDWARE_ERROR_MAX) {
+			for (i = 0; i < XE_SOC_NUM_IEH; i++)
+				xe_mmio_write32(&gt->tile->mmio, SOC_GSYSEVTCTL_REG(base, slave_base, i),
+						~REG_BIT(hw_err));
+
+			xe_mmio_write32(&gt->tile->mmio, SOC_GLOBAL_ERR_STAT_MASTER_REG(base, hw_err),
+					REG_GENMASK(31, 0));
+			xe_mmio_write32(&gt->tile->mmio, SOC_LOCAL_ERR_STAT_MASTER_REG(base, hw_err),
+					REG_GENMASK(31, 0));
+			xe_mmio_write32(&gt->tile->mmio, SOC_GLOBAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
+					REG_GENMASK(31, 0));
+			xe_mmio_write32(&gt->tile->mmio, SOC_LOCAL_ERR_STAT_SLAVE_REG(slave_base, hw_err),
+					REG_GENMASK(31, 0));
+			hw_err++;
+		}
+		for (i = 0; i < XE_SOC_NUM_IEH; i++)
+			xe_mmio_write32(&gt->tile->mmio, SOC_GSYSEVTCTL_REG(base, slave_base, i),
+					(HARDWARE_ERROR_MAX << 1) + 1);
+	}
+}
+
 static void
 xe_gt_hw_error_handler(struct xe_gt *gt, const enum hardware_error hw_err)
 {
@@ -852,4 +892,5 @@ void xe_init_hw_errors(struct xe_device *xe)
 {
 	xe_assign_hw_err_regs(xe);
 	xe_process_hw_errors(xe);
+	xe_clear_all_soc_errors(xe);
 }
-- 
2.25.1
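
The clearing order in xe_clear_all_soc_errors() may be easier to follow
against a small user-space model. The sketch below is not driver code:
`mock_tile`, `w1c_write`, `NUM_IEH` and the three-severity enum are
hypothetical stand-ins, and the status registers are assumed to be
write-1-to-clear, which the all-ones writes suggest but the patch does
not state. Unlike the hunk, which shares `i` between the tile iterator
and the IEH loop and initialises `hw_err` once before the tile loop, the
sketch uses separate counters and restarts the severity walk for each
tile. Whether that matches the intended behaviour is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the xe driver. */
enum hw_err { HW_ERR_CORRECTABLE, HW_ERR_NONFATAL, HW_ERR_FATAL, HW_ERR_MAX };

#define NUM_IEH 2u	/* assumed number of IEH instances per tile */

struct mock_tile {
	uint32_t gsysevtctl[NUM_IEH];		/* plain read/write control register */
	uint32_t master_stat[2][HW_ERR_MAX];	/* [0] global, [1] local; assumed W1C */
	uint32_t slave_stat[2][HW_ERR_MAX];	/* same layout for the second IEH */
};

/* Assumed write-1-to-clear semantics: every 1 bit written clears that bit. */
static void w1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

static void mock_clear_all_soc_errors(struct mock_tile *tiles, unsigned int ntiles)
{
	assert(tiles != NULL || ntiles == 0);

	for (unsigned int t = 0; t < ntiles; t++) {
		struct mock_tile *tile = &tiles[t];

		/* Restart the severity walk for every tile. */
		for (enum hw_err err = HW_ERR_CORRECTABLE; err < HW_ERR_MAX; err++) {
			/* Separate counter, so the tile index t is left alone. */
			for (unsigned int n = 0; n < NUM_IEH; n++)
				tile->gsysevtctl[n] = ~(UINT32_C(1) << err);

			/* All-ones write to the global and local status registers. */
			for (unsigned int k = 0; k < 2; k++) {
				w1c_write(&tile->master_stat[k][err], UINT32_MAX);
				w1c_write(&tile->slave_stat[k][err], UINT32_MAX);
			}
		}

		/* (HW_ERR_MAX << 1) + 1 is 7 here: bits 0..2, one per severity. */
		for (unsigned int n = 0; n < NUM_IEH; n++)
			tile->gsysevtctl[n] = (HW_ERR_MAX << 1) + 1;
	}
}
```

Calling mock_clear_all_soc_errors() on an array of two tiles whose status
words are preloaded with stale bits leaves every status word at zero and
every control word at 7 on both tiles.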