From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <intel-xe-bounces@lists.freedesktop.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 2D8D2F36B9A
	for <intel-xe@archiver.kernel.org>; Fri, 10 Apr 2026 01:37:36 +0000 (UTC)
Received: from gabe.freedesktop.org (localhost [127.0.0.1])
	by gabe.freedesktop.org (Postfix) with ESMTP id E395910E88D;
	Fri, 10 Apr 2026 01:37:35 +0000 (UTC)
Authentication-Results: gabe.freedesktop.org;
	dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="be4bUIQ1";
	dkim-atps=neutral
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 0DC8710E88D
 for <intel-xe@lists.freedesktop.org>; Fri, 10 Apr 2026 01:37:34 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1775785054; x=1807321054;
 h=date:message-id:from:to:cc:subject:in-reply-to:
 references:mime-version;
 bh=gZ8f1aLlXHOTDWSJnkpidRohUYSQCEdeJE//e5vz46Y=;
 b=be4bUIQ1DMicFClVfrm49xfdz8iQgKJgZXOmJwiU52DDEhpnCG0eDtwk
 JeBktj9n5CCKgDEJrpBIeOvlMtRbNkO9rCZPLK79CKuJDKPL7tLXaa3VE
 8ahUwvsZ14cmB5W8IKtx60HRCGKTV+cjz9D55kxaA2kMPsfSm0w4E3jgb
 2Ppc3CJ+n7D5fQyrpZsK0T9gcRVr1aLrbahCR7SXVEmSUJvhTpmRY+7wO
 oKmToc1yEzAEVinFjyYM0uNnddmgqkhsxt8B6Www0YHqZu2Icqa2WPn9j
 /TshF8JwqH+k49bSAjpkJAZ7av5Mb8KfmtP0J1EreJqr962MLg2ehC8Fi g==;
X-CSE-ConnectionGUID: 0zHmaifeTxOumMU88dAS4w==
X-CSE-MsgGUID: /KIeif2mQuG3QwI3SxAMlg==
X-IronPort-AV: E=McAfee;i="6800,10657,11754"; a="75971488"
X-IronPort-AV: E=Sophos;i="6.23,170,1770624000"; d="scan'208";a="75971488"
Received: from orviesa002.jf.intel.com ([10.64.159.142])
 by fmvoesa112.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 09 Apr 2026 18:37:33 -0700
X-CSE-ConnectionGUID: OXb38DFwTMWQTRHGPmymOQ==
X-CSE-MsgGUID: iM0pcmawTvWYLf7MnWuKGQ==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="6.23,170,1770624000"; d="scan'208";a="259421236"
Received: from cmaltby-mobl.amr.corp.intel.com (HELO adixit-MOBL3.intel.com)
 ([10.125.66.140])
 by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 09 Apr 2026 18:37:32 -0700
Date: Thu, 09 Apr 2026 18:37:31 -0700
Message-ID: <87a4vb8ujo.wl-ashutosh.dixit@intel.com>
From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Harish Chegondi <harish.chegondi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <felix.j.degrood@intel.com>,
 <matias.a.cabral@intel.com>, 
 <joshua.santosh.ranjan@intel.com>, <umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH v4 1/1] drm/xe/eustall: Return ENODEV from read if EU
 stall registers get reset
In-Reply-To: <4360d082795f1e16e338de7f253926b1680e0beb.1775023744.git.harish.chegondi@intel.com>
References: <4360d082795f1e16e338de7f253926b1680e0beb.1775023744.git.harish.chegondi@intel.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0
 Emacs/30.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset=US-ASCII
X-BeenThere: intel-xe@lists.freedesktop.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Intel Xe graphics driver <intel-xe.lists.freedesktop.org>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-xe>,
 <mailto:intel-xe-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-xe>
List-Post: <mailto:intel-xe@lists.freedesktop.org>
List-Help: <mailto:intel-xe-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-xe>,
 <mailto:intel-xe-request@lists.freedesktop.org?subject=subscribe>
Errors-To: intel-xe-bounces@lists.freedesktop.org
Sender: "Intel-xe" <intel-xe-bounces@lists.freedesktop.org>

On Tue, 31 Mar 2026 23:15:44 -0700, Harish Chegondi wrote:
>
> If a reset (GT or engine) happens during EU stall data sampling, all the
> EU stall registers can get reset to 0. This will result in EU stall data
> buffers' read and write pointer register values to be out of sync with
> the cached values. This will result in read() returning invalid data. To
> prevent this, check the value of a EU stall base register. If it is zero,
> it indicates a reset may have happened that wiped the register to zero.
> If this happens, return ENODEV from read() upon which the user space
> should disable and enable EU stall data sampling or close the fd and
> open a new fd for a new EU stall data collection session.
>
> This patch has been tested by running two IGT tests simultaneously
> xe_eu_stall and xe_exec_reset.
>
> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> ---
> v2: Move base register check from read to the poll function
> v3: Don't reschedule work item after reset detected (Ashutosh)
> v4: Return ENODEV as errno instead of EBADFD
>     Reset reset_detected in enable()
>
>  drivers/gpu/drm/xe/xe_eu_stall.c | 39 +++++++++++++++++++++++++-------
>  1 file changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index c34408cfd292..8863d8ebd5d1 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -44,6 +44,7 @@ struct per_xecore_buf {
>  struct xe_eu_stall_data_stream {
>	bool pollin;
>	bool enabled;
> +	bool reset_detected;
>	int wait_num_reports;
>	int sampling_rate_mult;
>	wait_queue_head_t poll_wq;
> @@ -428,9 +429,20 @@ static bool eu_stall_data_buf_poll(struct xe_eu_stall_data_stream *stream)
>			set_bit(xecore, stream->data_drop.mask);
>		xecore_buf->write = write_ptr;
>	}
> +	/* If a GT or engine reset happens during EU stall sampling,
> +	 * all EU stall registers get reset to 0 and the cached values of
> +	 * the EU stall data buffers' read pointers are out of sync with
> +	 * the register values. This causes invalid data to be returned
> +	 * from read(). To prevent this, check the value of a EU stall base
> +	 * register. If it is zero, there has been a reset.
> +	 */
> +	if (unlikely(!xe_gt_mcr_unicast_read_any(gt, XEHPC_EUSTALL_BASE)))
> +		stream->reset_detected = true;
> +
> +	stream->pollin = min_data_present || stream->reset_detected;
>	mutex_unlock(&stream->xecore_buf_lock);
>
> -	return min_data_present;
> +	return stream->pollin;
>  }
>
>  static void clear_dropped_eviction_line_bit(struct xe_gt *gt, u16 group, u16 instance)
> @@ -554,6 +566,15 @@ static ssize_t xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *st
>		}
>		stream->data_drop.reported_to_user = false;
>	}
> +	/* If EU stall registers got reset due to a GT/engine reset,
> +	 * continuing with the read() will return invalid data to
> +	 * the user space. Just return -ENODEV instead.
> +	 */
> +	if (unlikely(stream->reset_detected)) {
> +		xe_gt_dbg(gt, "EU stall base register has been reset\n");
> +		mutex_unlock(&stream->xecore_buf_lock);
> +		return -ENODEV;
> +	}

Likely OK, but could you quickly check if this new if statement should be
moved above the previous 'if (bitmap_weight...)' if statement.

Otherwise, this is now:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>
>	for_each_dss_steering(xecore, gt, group, instance) {
>		ret = xe_eu_stall_data_buf_read(stream, buf, count, &total_size,
> @@ -609,7 +630,8 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf,
>	 * We don't want to block the next read() when there is data in the buffer
>	 * now, but couldn't be accommodated in the small user buffer.
>	 */
> -	stream->pollin = false;
> +	if (!stream->reset_detected)
> +		stream->pollin = false;
>
>	return ret;
>  }
> @@ -692,6 +714,7 @@ static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream)
>		xecore_buf->write = write_ptr;
>		xecore_buf->read = write_ptr;
>	}
> +	stream->reset_detected = false;
>	stream->data_drop.reported_to_user = false;
>	bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS);
>
> @@ -717,13 +740,13 @@ static void eu_stall_data_buf_poll_work_fn(struct work_struct *work)
>		container_of(work, typeof(*stream), buf_poll_work.work);
>	struct xe_gt *gt = stream->gt;
>
> -	if (eu_stall_data_buf_poll(stream)) {
> -		stream->pollin = true;
> +	if (eu_stall_data_buf_poll(stream))
>		wake_up(&stream->poll_wq);
> -	}
> -	queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
> -			   &stream->buf_poll_work,
> -			   msecs_to_jiffies(POLL_PERIOD_MS));
> +
> +	if (!stream->reset_detected)
> +		queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
> +				   &stream->buf_poll_work,
> +				   msecs_to_jiffies(POLL_PERIOD_MS));
>  }
>
>  static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream,
> --
> 2.43.0
>