From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Harish Chegondi <harish.chegondi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH 1/1] drm/xe/eustall: Return EBADFD from read if EU stall registers get reset
Date: Thu, 18 Dec 2025 11:53:02 -0800 [thread overview]
Message-ID: <87fr97worl.wl-ashutosh.dixit@intel.com> (raw)
In-Reply-To: <6d78578c015b12e7ae243727ca7ed4b93551075d.1765174462.git.harish.chegondi@intel.com>
On Sun, 07 Dec 2025 22:16:11 -0800, Harish Chegondi wrote:
>
Hi Harish,
> If a reset (GT or engine) happens during EU stall data sampling, all the
> EU stall registers can get reset to 0. This will result in EU stall data
> buffers' read and write pointer register values to be out of sync with
> the cached values. This can result in read() returning invalid data. To
> prevent this, check the value of a EU stall base register. If it is zero,
> it indicates a reset may have happened that wiped the register to zero.
> If this happens, return EBADFD from read() upon which the user space
> should close the fd and open a new fd for a new EU stall data
> collection session.
>
> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_eu_stall.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> index 97dfb7945b7a..02c0beb4559f 100644
> --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> @@ -541,9 +541,24 @@ static ssize_t xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *st
> size_t total_size = 0;
> u16 group, instance;
> unsigned int xecore;
> + u32 base_reg_value;
> int ret = 0;
>
> mutex_lock(&stream->xecore_buf_lock);
> + /* If a GT or engine reset happens during EU stall data sampling,
> + * all EU stall registers get reset to 0 and the cached values of
> + * EU stall data buffers' read and write pointers are out of sync
> + * with the register values. This can cause invalid data to be
> + * returned from read(). To prevent this, check the value of a
> + * EU stall base register. If it is zero, return -EBADFD. The
> + * user is expected to close the fd and open a new fd.
> + */
> + base_reg_value = xe_gt_mcr_unicast_read_any(gt, XEHPC_EUSTALL_BASE);
> + if (unlikely(!base_reg_value)) {
> + xe_gt_dbg(gt, "EU stall base register has been reset to 0\n");
> + mutex_unlock(&stream->xecore_buf_lock);
> + return -EBADFD;
> + }
So I am seeing two problems here:
1. We are doing register read every read() call, rather than just when a
reset happens.
2. The other issue is should reset itself unblock a blocked poll() or
blocking read() call? If we don't do that, it is possible that poll()
or blocking read() remains blocked indefinitely and so either the
non-blocking read() doesn't get called at all, or a blocking read()
remains indefinitely blocked. So that we never actually return -EBADFD
even though a reset has happened.
(Note that, for exec(), I believe any blocked fences will unblock and
return error etc. if a reset happens during an exec() call (see
reset_status()), so EU stall should probably do something similar).
So to address these two issues how about doing something like this:
1. Call an EU stall callback from xe_guc_exec_queue_reset_handler(). In the
callback, if an EU stall stream is open on that gt, check if
XEHPC_EUSTALL_BASE is 0 and set a stream variable stream->reset under a
suitable lock (likely xecore_buf_lock).
2. From eu_stall_data_buf_poll(), if stream->reset is set, return true to
wake up any waiters. We may also need to set POLLERR or POLLHUP revents.
3. Now from read(), if stream->reset is set return -EBADFD.
So I think something like this solves both problems mentioned above.
So could you please look into this and see if this is possible? Or any
other thoughts about this?
Thanks.
--
Ashutosh
> if (bitmap_weight(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS)) {
> if (!stream->data_drop.reported_to_user) {
> stream->data_drop.reported_to_user = true;
> --
> 2.43.0
>
next prev parent reply other threads:[~2025-12-18 19:53 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-08 6:16 [PATCH 1/1] drm/xe/eustall: Return EBADFD from read if EU stall registers get reset Harish Chegondi
2025-12-08 6:32 ` ✓ CI.KUnit: success for series starting with [1/1] " Patchwork
2025-12-08 7:56 ` ✓ Xe.CI.BAT: " Patchwork
2025-12-08 8:48 ` ✗ Xe.CI.Full: failure " Patchwork
2025-12-12 21:18 ` [PATCH 1/1] " Dixit, Ashutosh
2025-12-16 23:53 ` Harish Chegondi
2025-12-18 19:53 ` Dixit, Ashutosh [this message]
2025-12-22 22:37 ` Harish Chegondi
2025-12-23 5:08 ` Dixit, Ashutosh
2025-12-23 23:39 ` Harish Chegondi
2025-12-24 1:47 ` Dixit, Ashutosh
-- strict thread matches above, loose matches on Subject: below --
2025-10-01 6:38 Harish Chegondi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fr97worl.wl-ashutosh.dixit@intel.com \
--to=ashutosh.dixit@intel.com \
--cc=harish.chegondi@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=umesh.nerlige.ramappa@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox