Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Harish Chegondi <harish.chegondi@intel.com>
To: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
	Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Subject: Re: [PATCH 1/1] drm/xe/eustall: Return EBADFD from read if EU stall registers get reset
Date: Tue, 23 Dec 2025 15:39:47 -0800	[thread overview]
Message-ID: <aUsoQy89xwWAAkmM@intel.com> (raw)
In-Reply-To: <87ecolwzt7.wl-ashutosh.dixit@intel.com>

On Mon, Dec 22, 2025 at 09:08:04PM -0800, Dixit, Ashutosh wrote:
> On Mon, 22 Dec 2025 14:37:53 -0800, Harish Chegondi wrote:
> >
> > On Thu, Dec 18, 2025 at 11:53:02AM -0800, Dixit, Ashutosh wrote:
> > > On Sun, 07 Dec 2025 22:16:11 -0800, Harish Chegondi wrote:
> > > >
> > >
> > Hi Ashutosh,
> > > Hi Harish,
> > >
> > > > If a reset (GT or engine) happens during EU stall data sampling, all the
> > > > EU stall registers can get reset to 0. This will result in EU stall data
> > > > buffers' read and write pointer register values to be out of sync with
> > > > the cached values. This can result in read() returning invalid data. To
> > > > prevent this, check the value of a EU stall base register. If it is zero,
> > > > it indicates a reset may have happened that wiped the register to zero.
> > > > If this happens, return EBADFD from read() upon which the user space
> > > > should close the fd and open a new fd for a new EU stall data
> > > > collection session.
> > > >
> > > > Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > > > Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
> > > > Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_eu_stall.c | 15 +++++++++++++++
> > > >  1 file changed, 15 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> > > > index 97dfb7945b7a..02c0beb4559f 100644
> > > > --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> > > > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> > > > @@ -541,9 +541,24 @@ static ssize_t xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *st
> > > >	size_t total_size = 0;
> > > >	u16 group, instance;
> > > >	unsigned int xecore;
> > > > +	u32 base_reg_value;
> > > >	int ret = 0;
> > > >
> > > >	mutex_lock(&stream->xecore_buf_lock);
> > > > +	/* If a GT or engine reset happens during EU stall data sampling,
> > > > +	 * all EU stall registers get reset to 0 and the cached values of
> > > > +	 * EU stall data buffers' read and write pointers are out of sync
> > > > +	 * with the register values. This can cause invalid data to be
> > > > +	 * returned from read(). To prevent this, check the value of a
> > > > +	 * EU stall base register. If it is zero, return -EBADFD. The
> > > > +	 * user is expected to close the fd and open a new fd.
> > > > +	 */
> > > > +	base_reg_value = xe_gt_mcr_unicast_read_any(gt, XEHPC_EUSTALL_BASE);
> > > > +	if (unlikely(!base_reg_value)) {
> > > > +		xe_gt_dbg(gt, "EU stall base register has been reset to 0\n");
> > > > +		mutex_unlock(&stream->xecore_buf_lock);
> > > > +		return -EBADFD;
> > > > +	}
> > >
> > > So I am seeing two problems here:
> > >
> > > 1. We are doing register read every read() call, rather than just when a
> > >    reset happens.
> > >
> > > 2. The other issue is should reset itself unblock a blocked poll() or
> > >    blocking read() call? If we don't do that, it is possible that poll()
> > >    or blocking read() remains blocked indefinitely and so either the
> > >    non-blocking read() doesn't get called at all, or a blocking read()
> > >    remains indefinitely blocked. So that we never actually return -EBADFD
> > >    even though a reset has happened.
> > >
> > >    (Note that, for exec(), I believe any blocked fences will unblock and
> > >    return error etc. if a reset happens during an exec() call (see
> > >    reset_status()), so EU stall should probably do something similar).
> > >
> > > So to address these two issues how about doing something like this:
> > >
> > > 1. Call an EU stall callback from xe_guc_exec_queue_reset_handler(). In the
> > >    callback, if an EU stall stream is open on that gt, check if
> > >    XEHPC_EUSTALL_BASE is 0 and set a stream variable stream->reset under a
> > >    suitable lock (likely xecore_buf_lock).
> >
> > I thought about this and I think this can be racy - if read() is called
> > after the engine and the EU stall registers got reset but before the
> > stream->reset is set, the read() would return bad EU stall data. I think
> > the XEHPC_EUSTALL_BASE register should be checked either in the polling
> > thread or in read (as in this patch) to avoid any race conditions.
> 
> In that case we need to ask GuC team about "pre context reset" g2h/h2g
> messages/callbacks.
> 
> Is it possible to deduce that reset has happened from the write_ptr
> register which is read in eu_stall_data_buf_poll? Say what happens to the
> mask bits when reset occurs (why are the mask bits there in the first place
> in a register which cannot be written by SW)? Then we can use that to set
> stream->reset, and not have to introduce an extra register read.
The write_ptr gets reset to 0, but it happens when the buffer wraps
around too. I am not sure about the mask bits, but I can check what
happens to them after a reset.
> 
> In a sense read() cannot return bad data since it uses cached read/wrt
> ptr's. So as long as we catch the reset while updating write_ptr, we should
> be ok.
> 
> Or, can we can drop the constraint that read() will never return bad
> data. If a reset happens, read() can return bad data.
> 
> Or do you have any other ideas to address the issues I mentioned above?
You earlier suggested about checking the base register in the polling
thread. If we set the stream->reset in the polling thread, it can be
used in both read() and poll(). Do you think this would address the
issues you mentioned above ?

Thank You
Harish.


> 
> >
> > >
> > > 2. From eu_stall_data_buf_poll(), if stream->reset is set, return true to
> > >    wake up any waiters. We may also need to set POLLERR or POLLHUP revents.
> > >
> > > 3. Now from read(), if stream->reset is set return -EBADFD.
> > >
> > > So I think something like this solves both problems mentioned above.
> > >
> > > So could you please look into this and see if this is possible? Or any
> > > other thoughts about this?
> > >
> > > Thanks.
> > > --
> > > Ashutosh
> > Thank You
> > Harish.
> > >
> > > >	if (bitmap_weight(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS)) {
> > > >		if (!stream->data_drop.reported_to_user) {
> > > >			stream->data_drop.reported_to_user = true;
> > > > --
> > > > 2.43.0
> > > >

  reply	other threads:[~2025-12-23 23:39 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-08  6:16 [PATCH 1/1] drm/xe/eustall: Return EBADFD from read if EU stall registers get reset Harish Chegondi
2025-12-08  6:32 ` ✓ CI.KUnit: success for series starting with [1/1] " Patchwork
2025-12-08  7:56 ` ✓ Xe.CI.BAT: " Patchwork
2025-12-08  8:48 ` ✗ Xe.CI.Full: failure " Patchwork
2025-12-12 21:18 ` [PATCH 1/1] " Dixit, Ashutosh
2025-12-16 23:53   ` Harish Chegondi
2025-12-18 19:53 ` Dixit, Ashutosh
2025-12-22 22:37   ` Harish Chegondi
2025-12-23  5:08     ` Dixit, Ashutosh
2025-12-23 23:39       ` Harish Chegondi [this message]
2025-12-24  1:47         ` Dixit, Ashutosh
  -- strict thread matches above, loose matches on Subject: below --
2025-10-01  6:38 Harish Chegondi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aUsoQy89xwWAAkmM@intel.com \
    --to=harish.chegondi@intel.com \
    --cc=ashutosh.dixit@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=umesh.nerlige.ramappa@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox