intel-xe.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Harish Chegondi <harish.chegondi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v8 4/7] drm/xe/eustall: Return -EIO error from read() if HW drops data
Date: Fri, 31 Jan 2025 12:19:46 -0800	[thread overview]
Message-ID: <85wmeaenx9.wl-ashutosh.dixit@intel.com> (raw)
In-Reply-To: <Z50ku-fhUr3ip6lz@intel.com>

On Fri, 31 Jan 2025 11:30:03 -0800, Harish Chegondi wrote:
>
> On Wed, Jan 29, 2025 at 08:45:59PM -0800, Dixit, Ashutosh wrote:
> > On Wed, 15 Jan 2025 12:02:10 -0800, Harish Chegondi wrote:
> > >
> > > If the user space doesn't read the EU stall data fast enough,
> > > it is possible that the EU stall data buffer can get filled,
> > > and if the hardware wants to write more data, it simply drops
> > > data due to unavailable buffer space. In that case, hardware
> > > sets a bit in a register. If the driver detects data drop,
> > > the driver read() returns -EIO error to let the user space
> > > know that HW has dropped data. The -EIO error is returned
> > > even if there is EU stall data in the buffer. A subsequent
> > > read by the user space returns the remaining EU stall data.
> >
> > As I mentioned earlier, entire dropped packet handling should be in this
> > patch, so we can see the entire logic around this. So data_drop struct
> > should be defined in this patch.
> I worded the commit message that this commit is about read() returning
> -EIO when data is dropped. So, I didn't put all the data drop code in
> this patch. Sure, I can reword the commit message and move the code
> into this patch.

Yeah, the "unit of functionality" is not returning -EIO, it is "dropped
data handling", so that whole unit should be in a separate patch. This one,
I don't want to compromise on.

> >
> > >
> > > Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_eu_stall.c | 12 ++++++++++++
> > >  drivers/gpu/drm/xe/xe_eu_stall.h |  1 +
> > >  2 files changed, 13 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> > > index c388d733b857..437782f8433c 100644
> > > --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> > > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> > > @@ -472,6 +472,7 @@ xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *stream,
> > >   * before calling read().
> > >   *
> > >   * Returns: The number of bytes copied or a negative error code on failure.
> > > + *	    -EIO if HW drops any EU stall data when the buffer is full.
> > >   */
> > >  static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf,
> > >				       size_t count, loff_t *ppos)
> > > @@ -485,6 +486,16 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf,
> > >		return -EINVAL;
> > >	}
> > >
> > > +	if (bitmap_weight(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS)) {
> >
> > Since data_drop.mask is being touched elsewhere under xecore_buf->lock,
> > here also it should be accessed under the same lock. So this returning -EIO
> > should probably be moved into xe_eu_stall_stream_read_locked?
> data_drop.mask is being accessed via set_bit(), clear_bit(), test_bit()
> and bitmap_weight(). set_bit() and clear_bit() are atomic operations,
> but test_bit() and bitmap_weight() are not atomic. So, not all the code
> accessing the mask need to be under lock. The code that is under lock is
> under the buffer lock, whereas xe_eu_stall_stream_read_locked() is under
> gt->eu_stall->lock. So, moving this code into xe_eu_stall_stream_read_locked
> would not make any difference. I think this code can exist outside of a
> lock. If one read() just misses a data drop, the subsequent read would
> report the data drop.

OK, let me see what happens with my suggestion below (starting with "I had
also outlined another...") and then we can see if the locking will be ok or
not.

> >
> > > +		if (!stream->data_drop.reported_to_user) {
> > > +			stream->data_drop.reported_to_user = true;
> > > +			xe_gt_dbg(gt, "EU stall data dropped in XeCores: %*pb\n",
> > > +				  XE_MAX_DSS_FUSE_BITS, stream->data_drop.mask);
> > > +			return -EIO;
> > > +		}
> > > +		stream->data_drop.reported_to_user = false;
> >
> > I don't think this logic is correct. We should set this to false only after
> > we have cleared all set bits (e.g. only after bitmap_weight) otherwise we
> > might keep returning -EIO multiple times?
> If the subsequent read() reads all the data from all the subslices, it
> would clear all the bits. But if the user buffer is small and it doesn't
> read all the data from all the subslices, some bits can continue to be
> set and can cause multiple alternate -EIO returns. Ideally, the user
> buffer should be big enough to accomodate all the data from the kernel
> buffer.

Afaik we are not exposing a minimum user buffer size in the uapi, but we
could too.

I had also outlined another simple way of doing this in my follow up to
this email, which doesn't have such issues. What do you think of that?

> >
> > If HW continues to drop data and keep setting the line, while we are
> > resetting the bit, it is possible bitmap_weight might never become 0. I
> > think that is ok, we have returned -EIO at least once to indicate to
> > userspace that it is not reading data fast enough and HW is dropping data.
> >
> > Or we may return -EIO multiple times as is happening here, where
> > reported_to_user is set to 0 before all bits might have been cleared. So
> > what is happening here might be ok too.
> >
> > To see this clearly and evaluate it is why I am saying move all of this
> > data drop handling and -EIO return into this one patch. So we can decide
> > which approach to take: return -EIO just once or return multiple times.
> >
> > We can also maybe defer this patch and merge the other stuff first if it's
> > a separate patch.
> >
> > So maybe this is ok, maybe not, anyway something to think about.
> >
> > > +	}
> > > +
> > >	if (!(file->f_flags & O_NONBLOCK)) {
> > >		do {
> > >			if (!stream->pollin) {
> > > @@ -680,6 +691,7 @@ static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream,
> > >	if (!stream->xecore_buf)
> > >		return -ENOMEM;
> > >
> > > +	stream->data_drop.reported_to_user = false;
> > >	bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS);
> >
> > Stream is kzalloc'd, why do you need to init these?
> >
> > >
> > >	xe_pm_runtime_get(gt_to_xe(gt));
> > > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.h b/drivers/gpu/drm/xe/xe_eu_stall.h
> > > index f97c8bf8e852..8bc44e9e98af 100644
> > > --- a/drivers/gpu/drm/xe/xe_eu_stall.h
> > > +++ b/drivers/gpu/drm/xe/xe_eu_stall.h
> > > @@ -31,6 +31,7 @@ struct xe_eu_stall_data_stream {
> > >	struct xe_bo *bo;
> > >	struct per_xecore_buf *xecore_buf;
> > >	struct {
> > > +		bool reported_to_user;
> > >		xe_dss_mask_t mask;
> > >	} data_drop;
> > >	struct hrtimer poll_check_timer;
> > > --
> > > 2.47.1
> > >

  reply	other threads:[~2025-01-31 20:19 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-15 20:02 [PATCH v8 0/7] Add support for EU stall sampling Harish Chegondi
2025-01-15 20:02 ` [PATCH v8 1/7] drm/xe/topology: Add a function to find the index of the last enabled DSS in a mask Harish Chegondi
2025-01-17 17:25   ` Dixit, Ashutosh
2025-01-22  5:18     ` Harish Chegondi
2025-01-15 20:02 ` [PATCH v8 2/7] drm/xe/uapi: Introduce API for EU stall sampling Harish Chegondi
2025-01-17 19:02   ` Dixit, Ashutosh
2025-01-22 23:44     ` Harish Chegondi
2025-01-23  2:19       ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 3/7] drm/xe/eustall: Implement EU stall sampling APIs for Xe_HPC Harish Chegondi
2025-01-18  2:34   ` Dixit, Ashutosh
2025-01-23 18:51   ` Dixit, Ashutosh
2025-01-25  3:09   ` [PATCH v8 3 " Dixit, Ashutosh
2025-01-29  4:12   ` [PATCH v8 4 " Dixit, Ashutosh
2025-01-29  4:32     ` Dixit, Ashutosh
2025-01-30 18:46     ` Harish Chegondi
2025-01-31  3:23       ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 4/7] drm/xe/eustall: Return -EIO error from read() if HW drops data Harish Chegondi
2025-01-30  4:45   ` Dixit, Ashutosh
2025-01-30 17:05     ` Dixit, Ashutosh
2025-01-31 21:50       ` Harish Chegondi
2025-01-31 19:30     ` Harish Chegondi
2025-01-31 20:19       ` Dixit, Ashutosh [this message]
2025-01-31 22:59         ` Harish Chegondi
2025-02-01  0:13           ` Dixit, Ashutosh
2025-02-01  6:57             ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 5/7] drm/xe/eustall: Add EU stall sampling support for Xe2 Harish Chegondi
2025-01-30  4:55   ` Dixit, Ashutosh
2025-02-05  1:16     ` Olson, Matthew
2025-02-05  1:57       ` Dixit, Ashutosh
2025-02-05 19:03         ` Olson, Matthew
2025-02-05 20:02           ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 6/7] drm/xe/uapi: Add a device query to get EU stall sampling information Harish Chegondi
2025-01-16 22:34   ` Dixit, Ashutosh
2025-01-22  2:48     ` Harish Chegondi
2025-01-22  3:00       ` Dixit, Ashutosh
2025-01-30 17:36   ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 7/7] drm/xe/eustall: Add workaround 22016596838 which applies to PVC Harish Chegondi
2025-01-30  5:14   ` Dixit, Ashutosh
2025-01-15 20:46 ` ✓ CI.Patch_applied: success for Add support for EU stall sampling Patchwork
2025-01-15 20:46 ` ✗ CI.checkpatch: warning " Patchwork
2025-01-15 20:48 ` ✓ CI.KUnit: success " Patchwork
2025-01-15 21:14 ` ✓ CI.Build: " Patchwork
2025-01-15 21:16 ` ✗ CI.Hooks: failure " Patchwork
2025-01-15 21:18 ` ✓ CI.checksparse: success " Patchwork
2025-01-15 21:43 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-01-16  0:37 ` ✗ Xe.CI.Full: " Patchwork
2025-01-16  0:51 ` [PATCH v8 0/7] " Degrood, Felix J
2025-01-16 21:50 ` Olson, Matthew
2025-01-18  5:19   ` Harish Chegondi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85wmeaenx9.wl-ashutosh.dixit@intel.com \
    --to=ashutosh.dixit@intel.com \
    --cc=harish.chegondi@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).