From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Harish Chegondi <harish.chegondi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <felix.j.degrood@intel.com>,
<matias.a.cabral@intel.com>, <joshua.santosh.ranjan@intel.com>,
<umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH v4 1/1] drm/xe/eustall: Return ENODEV from read if EU stall registers get reset
Date: Thu, 09 Apr 2026 18:37:31 -0700
Message-ID: <87a4vb8ujo.wl-ashutosh.dixit@intel.com>
In-Reply-To: <4360d082795f1e16e338de7f253926b1680e0beb.1775023744.git.harish.chegondi@intel.com>

On Tue, 31 Mar 2026 23:15:44 -0700, Harish Chegondi wrote:
>
> If a reset (GT or engine) happens during EU stall data sampling, all the
> EU stall registers can get reset to 0. This will result in EU stall data
> buffers' read and write pointer register values to be out of sync with
> the cached values. This will result in read() returning invalid data. To
> prevent this, check the value of a EU stall base register. If it is zero,
> it indicates a reset may have happened that wiped the register to zero.
> If this happens, return ENODEV from read() upon which the user space
> should disable and enable EU stall data sampling or close the fd and
> open a new fd for a new EU stall data collection session.
>
> This patch has been tested by running two IGT tests simultaneously
> xe_eu_stall and xe_exec_reset.
>
> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> ---
> v2: Move base register check from read to the poll function
> v3: Don't reschedule work item after reset detected (Ashutosh)
> v4: Return ENODEV as errno instead of EBADFD
>     Reset reset_detected in enable()
>
> drivers/gpu/drm/xe/xe_eu_stall.c | 39 +++++++++++++++++++++++++-------
> 1 file changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index c34408cfd292..8863d8ebd5d1 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -44,6 +44,7 @@ struct per_xecore_buf {
> struct xe_eu_stall_data_stream {
> bool pollin;
> bool enabled;
> + bool reset_detected;
> int wait_num_reports;
> int sampling_rate_mult;
> wait_queue_head_t poll_wq;
> @@ -428,9 +429,20 @@ static bool eu_stall_data_buf_poll(struct xe_eu_stall_data_stream *stream)
> set_bit(xecore, stream->data_drop.mask);
> xecore_buf->write = write_ptr;
> }
> + /* If a GT or engine reset happens during EU stall sampling,
> + * all EU stall registers get reset to 0 and the cached values of
> + * the EU stall data buffers' read pointers are out of sync with
> + * the register values. This causes invalid data to be returned
> + * from read(). To prevent this, check the value of a EU stall base
> + * register. If it is zero, there has been a reset.
> + */
> + if (unlikely(!xe_gt_mcr_unicast_read_any(gt, XEHPC_EUSTALL_BASE)))
> + stream->reset_detected = true;
> +
> + stream->pollin = min_data_present || stream->reset_detected;
> mutex_unlock(&stream->xecore_buf_lock);
>
> - return min_data_present;
> + return stream->pollin;
> }
>
> static void clear_dropped_eviction_line_bit(struct xe_gt *gt, u16 group, u16 instance)
> @@ -554,6 +566,15 @@ static ssize_t xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *st
> }
> stream->data_drop.reported_to_user = false;
> }
> + /* If EU stall registers got reset due to a GT/engine reset,
> + * continuing with the read() will return invalid data to
> + * the user space. Just return -ENODEV instead.
> + */
> + if (unlikely(stream->reset_detected)) {
> + xe_gt_dbg(gt, "EU stall base register has been reset\n");
> + mutex_unlock(&stream->xecore_buf_lock);
> + return -ENODEV;
> + }

Likely OK, but could you quickly check whether this new if statement should
be moved above the previous 'if (bitmap_weight...)' if statement?

Otherwise, this is now:

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>

>
> for_each_dss_steering(xecore, gt, group, instance) {
> ret = xe_eu_stall_data_buf_read(stream, buf, count, &total_size,
> @@ -609,7 +630,8 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf,
> * We don't want to block the next read() when there is data in the buffer
> * now, but couldn't be accommodated in the small user buffer.
> */
> - stream->pollin = false;
> + if (!stream->reset_detected)
> + stream->pollin = false;
>
> return ret;
> }
> @@ -692,6 +714,7 @@ static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream)
> xecore_buf->write = write_ptr;
> xecore_buf->read = write_ptr;
> }
> + stream->reset_detected = false;
> stream->data_drop.reported_to_user = false;
> bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS);
>
> @@ -717,13 +740,13 @@ static void eu_stall_data_buf_poll_work_fn(struct work_struct *work)
> container_of(work, typeof(*stream), buf_poll_work.work);
> struct xe_gt *gt = stream->gt;
>
> - if (eu_stall_data_buf_poll(stream)) {
> - stream->pollin = true;
> + if (eu_stall_data_buf_poll(stream))
> wake_up(&stream->poll_wq);
> - }
> - queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
> - &stream->buf_poll_work,
> - msecs_to_jiffies(POLL_PERIOD_MS));
> +
> + if (!stream->reset_detected)
> + queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
> + &stream->buf_poll_work,
> + msecs_to_jiffies(POLL_PERIOD_MS));
> }
>
> static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream,
> --
> 2.43.0
>
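
For reference, the flag handling in the quoted patch can be sketched as a
small user-space model. This is only an illustration under stated
assumptions, not driver code: struct stall_stream_model and the model_*()
helpers are hypothetical names, the base register is passed in as a plain
integer instead of being read with xe_gt_mcr_unicast_read_any(), and
locking, the wait queue and the data-drop path are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two stream flags the patch touches. */
struct stall_stream_model {
	bool pollin;
	bool reset_detected;
};

/* Mirrors the tail of eu_stall_data_buf_poll(): a zero base register
 * latches reset_detected, and pollin is set for data or for a reset.
 */
static bool model_poll(struct stall_stream_model *s, unsigned int base_reg,
		       bool min_data_present)
{
	assert(s != NULL);
	if (base_reg == 0)
		s->reset_detected = true;
	s->pollin = min_data_present || s->reset_detected;
	return s->pollin;
}

/* Mirrors the work function: stop rescheduling once a reset is seen. */
static bool model_should_requeue(const struct stall_stream_model *s)
{
	assert(s != NULL);
	return !s->reset_detected;
}

/* Mirrors read(): fail with -ENODEV after a reset and leave pollin set,
 * otherwise hand back the data and clear pollin.
 */
static long model_read(struct stall_stream_model *s, long bytes_ready)
{
	assert(s != NULL);
	if (s->reset_detected)
		return -ENODEV;
	s->pollin = false;
	return bytes_ready;
}

/* Mirrors enable(): a fresh session clears the latched reset. */
static void model_enable(struct stall_stream_model *s)
{
	assert(s != NULL);
	s->reset_detected = false;
}
```

In this model, once model_poll() sees a zero base register, every
model_read() returns -ENODEV and pollin stays set, so a waiter is not left
blocked, until model_enable() starts a new session. That matches the
recovery path described in the commit message (disable and enable, or close
and reopen the fd).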