From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Harish Chegondi <harish.chegondi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <felix.j.degrood@intel.com>,
	<matias.a.cabral@intel.com>, <joshua.santosh.ranjan@intel.com>,
	<umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH v4 1/1] drm/xe/eustall: Return ENODEV from read if EU stall registers get reset
Date: Thu, 09 Apr 2026 18:37:31 -0700
Message-ID: <87a4vb8ujo.wl-ashutosh.dixit@intel.com>
In-Reply-To: <4360d082795f1e16e338de7f253926b1680e0beb.1775023744.git.harish.chegondi@intel.com>

On Tue, 31 Mar 2026 23:15:44 -0700, Harish Chegondi wrote:
>
> If a reset (GT or engine) happens during EU stall data sampling, all the
> EU stall registers can get reset to 0. This will result in EU stall data
> buffers' read and write pointer register values to be out of sync with
> the cached values. This will result in read() returning invalid data. To
> prevent this, check the value of a EU stall base register. If it is zero,
> it indicates a reset may have happened that wiped the register to zero.
> If this happens, return ENODEV from read() upon which the user space
> should disable and enable EU stall data sampling or close the fd and
> open a new fd for a new EU stall data collection session.
>
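
For reference, the recovery sequence described above could look roughly
like this from user space. This is only a sketch under assumptions:
struct stall_stream_ops, stall_read_with_restart() and the fake_* helpers
are hypothetical names, and the two callbacks stand in for read() on the
stream fd and for whatever disable + enable (or close + reopen) sequence
the tool uses.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical callbacks: read() on the EU stall stream fd, and the
 * disable + enable (or close + reopen) sequence used to restart sampling.
 */
struct stall_stream_ops {
	ssize_t (*read)(void *ctx, void *buf, size_t len);
	int (*restart)(void *ctx);
};

/*
 * Read one chunk of EU stall data. If the read fails with ENODEV,
 * restart sampling and retry, at most max_restarts times.
 * Returns the number of bytes read, or -1 with errno set.
 */
static ssize_t stall_read_with_restart(const struct stall_stream_ops *ops,
				       void *ctx, void *buf, size_t len,
				       int max_restarts)
{
	for (;;) {
		ssize_t n = ops->read(ctx, buf, len);

		if (n >= 0 || errno != ENODEV)
			return n;	/* data, or an error unrelated to reset */
		if (max_restarts-- <= 0)
			return -1;	/* give up; errno is still ENODEV */
		if (ops->restart(ctx) < 0)
			return -1;	/* restart failed; errno set by callback */
	}
}

/* Fake stream for demonstration only: fails with ENODEV until restarted. */
struct fake_stream {
	int reset_pending;
	int restarts;
};

static ssize_t fake_read(void *ctx, void *buf, size_t len)
{
	struct fake_stream *fs = ctx;
	size_t n = len < 8 ? len : 8;

	if (fs->reset_pending) {
		errno = ENODEV;
		return -1;
	}
	memset(buf, 0, n);	/* pretend n bytes of stall data arrived */
	return (ssize_t)n;
}

static int fake_restart(void *ctx)
{
	struct fake_stream *fs = ctx;

	fs->reset_pending = 0;
	fs->restarts++;
	return 0;
}

static const struct stall_stream_ops fake_ops = {
	.read = fake_read,
	.restart = fake_restart,
};
```

Bounding the number of restarts keeps a tool from spinning if resets keep
happening back to back.
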
> This patch has been tested by running two IGT tests simultaneously
> xe_eu_stall and xe_exec_reset.
>
> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> ---
> v2: Move base register check from read to the poll function
> v3: Don't reschedule work item after reset detected (Ashutosh)
> v4: Return ENODEV as errno instead of EBADFD
>     Reset reset_detected in enable()
>
> drivers/gpu/drm/xe/xe_eu_stall.c | 39 +++++++++++++++++++++++++-------
> 1 file changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index c34408cfd292..8863d8ebd5d1 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -44,6 +44,7 @@ struct per_xecore_buf {
>  struct xe_eu_stall_data_stream {
>  	bool pollin;
>  	bool enabled;
> +	bool reset_detected;
>  	int wait_num_reports;
>  	int sampling_rate_mult;
>  	wait_queue_head_t poll_wq;
> @@ -428,9 +429,20 @@ static bool eu_stall_data_buf_poll(struct xe_eu_stall_data_stream *stream)
>  			set_bit(xecore, stream->data_drop.mask);
>  		xecore_buf->write = write_ptr;
>  	}
> +	/* If a GT or engine reset happens during EU stall sampling,
> +	 * all EU stall registers get reset to 0 and the cached values of
> +	 * the EU stall data buffers' read pointers are out of sync with
> +	 * the register values. This causes invalid data to be returned
> +	 * from read(). To prevent this, check the value of a EU stall base
> +	 * register. If it is zero, there has been a reset.
> +	 */
> +	if (unlikely(!xe_gt_mcr_unicast_read_any(gt, XEHPC_EUSTALL_BASE)))
> +		stream->reset_detected = true;
> +
> +	stream->pollin = min_data_present || stream->reset_detected;
>  	mutex_unlock(&stream->xecore_buf_lock);
>
> -	return min_data_present;
> +	return stream->pollin;
>  }
>
>  static void clear_dropped_eviction_line_bit(struct xe_gt *gt, u16 group, u16 instance)
> @@ -554,6 +566,15 @@ static ssize_t xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *st
>  		}
>  		stream->data_drop.reported_to_user = false;
>  	}
> +	/* If EU stall registers got reset due to a GT/engine reset,
> +	 * continuing with the read() will return invalid data to
> +	 * the user space. Just return -ENODEV instead.
> +	 */
> +	if (unlikely(stream->reset_detected)) {
> +		xe_gt_dbg(gt, "EU stall base register has been reset\n");
> +		mutex_unlock(&stream->xecore_buf_lock);
> +		return -ENODEV;
> +	}

Likely OK, but could you quickly check whether this new if statement
should be moved above the previous 'if (bitmap_weight...)' statement?

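To make the ordering question concrete, here is a simplified standalone
model of the two checks. It is not the driver code: struct model_stream
and the model_* functions are hypothetical, drop_pending stands in for a
non-zero bitmap_weight() of data_drop.mask, and the drop mask is assumed
to stay set because the read never reaches the per-XeCore loop.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the stream state touched by the two checks. */
struct model_stream {
	bool drop_pending;	/* models bitmap_weight(data_drop.mask) != 0 */
	bool drop_reported;	/* models data_drop.reported_to_user */
	bool reset_detected;
};

/* Models the existing data-drop check: returns -EIO or 0. */
static int model_check_drop(struct model_stream *s)
{
	if (s->drop_pending) {
		if (!s->drop_reported) {
			s->drop_reported = true;
			return -EIO;
		}
		s->drop_reported = false;
	}
	return 0;
}

/* Order as posted in v4: drop check first, then reset check. */
static int model_read_v4_order(struct model_stream *s)
{
	int ret = model_check_drop(s);

	if (ret)
		return ret;
	return s->reset_detected ? -ENODEV : 0;
}

/* Alternative order: reset check first, drop state left untouched. */
static int model_read_reset_first(struct model_stream *s)
{
	if (s->reset_detected)
		return -ENODEV;
	return model_check_drop(s);
}
```

In this model, with a drop pending when the reset is detected, the v4
order returns -EIO on the first read() and -ENODEV only on the next one,
while the alternative order returns -ENODEV right away.
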
Otherwise, this is now:
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
>
>  	for_each_dss_steering(xecore, gt, group, instance) {
>  		ret = xe_eu_stall_data_buf_read(stream, buf, count, &total_size,
> @@ -609,7 +630,8 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf,
>  	 * We don't want to block the next read() when there is data in the buffer
>  	 * now, but couldn't be accommodated in the small user buffer.
>  	 */
> -	stream->pollin = false;
> +	if (!stream->reset_detected)
> +		stream->pollin = false;
>
>  	return ret;
>  }
> @@ -692,6 +714,7 @@ static int xe_eu_stall_stream_enable(struct xe_eu_stall_data_stream *stream)
>  		xecore_buf->write = write_ptr;
>  		xecore_buf->read = write_ptr;
>  	}
> +	stream->reset_detected = false;
>  	stream->data_drop.reported_to_user = false;
>  	bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS);
>
> @@ -717,13 +740,13 @@ static void eu_stall_data_buf_poll_work_fn(struct work_struct *work)
>  		container_of(work, typeof(*stream), buf_poll_work.work);
>  	struct xe_gt *gt = stream->gt;
>
> -	if (eu_stall_data_buf_poll(stream)) {
> -		stream->pollin = true;
> +	if (eu_stall_data_buf_poll(stream))
>  		wake_up(&stream->poll_wq);
> -	}
> -	queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
> -			   &stream->buf_poll_work,
> -			   msecs_to_jiffies(POLL_PERIOD_MS));
> +
> +	if (!stream->reset_detected)
> +		queue_delayed_work(gt->eu_stall->buf_ptr_poll_wq,
> +				   &stream->buf_poll_work,
> +				   msecs_to_jiffies(POLL_PERIOD_MS));
>  }
>
>  static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream,
> --
> 2.43.0
>