Re: [PATCH v4 1/1] drm/xe/eustall: Return ENODEV from read if EU stall registers get reset

From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Harish Chegondi <harish.chegondi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <felix.j.degrood@intel.com>,
	<matias.a.cabral@intel.com>,  <joshua.santosh.ranjan@intel.com>,
	<umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH v4 1/1] drm/xe/eustall: Return ENODEV from read if EU stall registers get reset
Date: Thu, 09 Apr 2026 18:37:31 -0700
Message-ID: <87a4vb8ujo.wl-ashutosh.dixit@intel.com>
In-Reply-To: <4360d082795f1e16e338de7f253926b1680e0beb.1775023744.git.harish.chegondi@intel.com>

On Tue, 31 Mar 2026 23:15:44 -0700, Harish Chegondi wrote:
>
> If a reset (GT or engine) happens during EU stall data sampling, all the
> EU stall registers can get reset to 0. This leaves the EU stall data
> buffers' read and write pointer register values out of sync with the
> cached values, so read() returns invalid data. To prevent this, check
> the value of an EU stall base register. If it is zero, a reset may have
> happened that wiped the register to zero. In that case, return ENODEV
> from read(), upon which user space should either disable and re-enable
> EU stall data sampling, or close the fd and open a new fd for a new EU
> stall data collection session.
>
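
For user space, the recovery path described above could be driven by a
small classifier on the result of read(). This is only a sketch: the enum
and function names below are hypothetical and not part of the patch or of
the xe uAPI, and treating EAGAIN/EINTR as retryable is ordinary POSIX
practice rather than anything this patch defines.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical names, for illustration only. */
enum eu_stall_action {
	EU_STALL_CONSUME,	/* ret bytes of stall data are valid */
	EU_STALL_RETRY,		/* transient: poll() and read() again */
	EU_STALL_RESTART,	/* registers were reset: restart sampling */
	EU_STALL_FAIL,		/* anything else: give up on this stream */
};

/*
 * Classify the outcome of one read() on the EU stall stream fd.
 * 'ret' is the return value of read(), 'err' is errno captured
 * immediately after the call.
 */
enum eu_stall_action eu_stall_classify_read(ssize_t ret, int err)
{
	if (ret >= 0)
		return EU_STALL_CONSUME;

	switch (err) {
	case EAGAIN:
	case EINTR:
		return EU_STALL_RETRY;
	case ENODEV:
		/* Per the commit message: disable and re-enable sampling,
		 * or close the fd and open a new one. */
		return EU_STALL_RESTART;
	default:
		return EU_STALL_FAIL;
	}
}
```

On EU_STALL_RESTART the caller would discard any partially parsed data
from the failed session before restarting, since the cached pointers no
longer match the hardware.
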
> This patch has been tested by running two IGT tests simultaneously:
> xe_eu_stall and xe_exec_reset.
>
> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> ---
> v2: Move base register check from read to the poll function
> v3: Don't reschedule work item after reset detected (Ashutosh)
> v4: Return ENODEV as errno instead of EBADFD
>     Reset reset_detected in enable()
>
>  drivers/gpu/drm/xe/xe_eu_stall.c | 39 +++++++++++++++++++++++++-------
>  1 file changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index c34408cfd292..8863d8ebd5d1 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -44,6 +44,7 @@ struct per_xecore_buf {
>  struct xe_eu_stall_data_stream {
>	bool pollin;
>	bool enabled;
> +	bool reset_detected;
>	int wait_num_reports;
>	int sampling_rate_mult;
>	wait_queue_head_t poll_wq;
> @@ -428,9 +429,20 @@ static bool eu_stall_data_buf_poll(struct xe_eu_stall_data_stream *stream)
>			set_bit(xecore, stream->data_drop.mask);
>		xecore_buf->write = write_ptr;
>	}
> +	/* If a GT or engine reset happens during EU stall sampling,
> +	 * all EU stall registers get reset to 0 and the cached values of
> +	 * the EU stall data buffers' read pointers are out of sync with
> +	 * the register values. This causes invalid data to be returned
> +	 * from read(). To prevent this, check the value of a EU stall base
> +	 * register. If it is zero, there has been a reset.
> +	 */
> +	if (unlikely(!xe_gt_mcr_unicast_read_any(gt, XEHPC_EUSTALL_BASE)))
> +		stream->reset_detected = true;
> +
> +	stream->pollin = min_data_present || stream->reset_detected;
>	mutex_unlock(&stream->xecore_buf_lock);
>
> -	return min_data_present;
> +	return stream->pollin;
>  }
>
>  static void clear_dropped_eviction_line_bit(struct xe_gt *gt, u16 group, u16 instance)
> @@ -554,6 +566,15 @@ static ssize_t xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *st
>		}
>		stream->data_drop.reported_to_user = false;
>	}
> +	/* If EU stall registers got reset due to a GT/engine reset,
> +	 * continuing with the read() will return invalid data to
> +	 * the user space. Just return -ENODEV instead.
> +	 */
> +	if (unlikely(stream->reset_detected)) {
> +		xe_gt_dbg(gt, "EU stall base register has been reset\n");
> +		mutex_unlock(&stream->xecore_buf_lock);
> +		return -ENODEV;
> +	}

Likely OK, but could you quickly check whether this new if statement should
be moved above the previous 'if (bitmap_weight...)' if statement?
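
To make the ordering question concrete, here is a small standalone model,
not driver code. It assumes the existing drop check reports an error
(taken here to be -EIO) once per drop event and lets the following read()
through. The struct and function names are hypothetical stand-ins for the
stream state.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the stream state. */
struct stream_model {
	bool reset_detected;
	bool drop_pending;	/* models bitmap_weight(data_drop.mask) != 0 */
	bool drop_reported;	/* models data_drop.reported_to_user */
};

/* Assumed shape of the existing drop check: fail once, then pass. */
static int check_drop(struct stream_model *s)
{
	if (s->drop_pending) {
		if (!s->drop_reported) {
			s->drop_reported = true;
			return -EIO;	/* assumption: errno used for drops */
		}
		s->drop_reported = false;
	}
	return 0;
}

/*
 * Model of the checks at the top of the locked read path.
 * reset_first == false: order as in this patch (drop check, then reset).
 * reset_first == true:  reset check moved above the drop check.
 */
int read_precheck(struct stream_model *s, bool reset_first)
{
	int ret;

	if (reset_first && s->reset_detected)
		return -ENODEV;

	ret = check_drop(s);
	if (ret)
		return ret;

	if (s->reset_detected)
		return -ENODEV;

	return 0;
}
```

Under these assumptions, with both a reset and a pending drop, the order
in the patch returns the drop error on the first read() and -ENODEV only
on the second, while the reordered variant returns -ENODEV immediately.
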

Otherwise, this is now:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>
>	for_each_dss_steering(xecore, gt, group, instance) {
>		ret = xe_eu_stall_data_buf_read(stream, buf, count, &total_size,
> @@ -609,7 +630,8 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf,
>	 * We don't want to block the next read() when there is data in the buffer
>	 * now, but couldn't be accommodated in the small user buffer.
>	 */
> -	stream->pollin = false;
> +	if (!stream->reset_detected)
> +		stream->pollin = false;
>
>	return ret;
>  }
> @@ -692,6 +714,7 @@ static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream)
>		xecore_buf->write = write_ptr;
>		xecore_buf->read = write_ptr;
>	}
> +	stream->reset_detected = false;
>	stream->data_drop.reported_to_user = false;
>	bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS);
>
> @@ -717,13 +740,13 @@ static void eu_stall_data_buf_poll_work_fn(struct work_struct *work)
>		container_of(work, typeof(*stream), buf_poll_work.work);
>	struct xe_gt *gt = stream->gt;
>
> -	if (eu_stall_data_buf_poll(stream)) {
> -		stream->pollin = true;
> +	if (eu_stall_data_buf_poll(stream))
>		wake_up(&stream->poll_wq);
> -	}
> -	queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
> -			   &stream->buf_poll_work,
> -			   msecs_to_jiffies(POLL_PERIOD_MS));
> +
> +	if (!stream->reset_detected)
> +		queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
> +				   &stream->buf_poll_work,
> +				   msecs_to_jiffies(POLL_PERIOD_MS));
>  }
>
>  static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream,
> --
> 2.43.0
>

Thread overview: 5+ messages
2026-04-01  6:15 [PATCH v4 1/1] drm/xe/eustall: Return ENODEV from read if EU stall registers get reset Harish Chegondi
2026-04-01  6:22 ` ✓ CI.KUnit: success for series starting with [v4,1/1] " Patchwork
2026-04-01  6:56 ` ✓ Xe.CI.BAT: " Patchwork
2026-04-01 12:09 ` ✗ Xe.CI.FULL: failure " Patchwork
2026-04-10  1:37 ` Dixit, Ashutosh [this message]