From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Harish Chegondi <harish.chegondi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v8 4/7] drm/xe/eustall: Return -EIO error from read() if HW drops data
Date: Thu, 30 Jan 2025 09:05:10 -0800 [thread overview]
Message-ID: <8534h0fd15.wl-ashutosh.dixit@intel.com> (raw)
In-Reply-To: <857c6cgb94.wl-ashutosh.dixit@intel.com>
On Wed, 29 Jan 2025 20:45:59 -0800, Dixit, Ashutosh wrote:
>
> On Wed, 15 Jan 2025 12:02:10 -0800, Harish Chegondi wrote:
> >
> > If the user space doesn't read the EU stall data fast enough,
> > it is possible that the EU stall data buffer can get filled,
> > and if the hardware wants to write more data, it simply drops
> > data due to unavailable buffer space. In that case, hardware
> > sets a bit in a register. If the driver detects data drop,
> > the driver read() returns -EIO error to let the user space
> > know that HW has dropped data. The -EIO error is returned
> > even if there is EU stall data in the buffer. A subsequent
> > read by the user space returns the remaining EU stall data.
>
> As I mentioned earlier, entire dropped packet handling should be in this
> patch, so we can see the entire logic around this. So data_drop struct
> should be defined in this patch.
>
> >
> > Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_eu_stall.c | 12 ++++++++++++
> > drivers/gpu/drm/xe/xe_eu_stall.h | 1 +
> > 2 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.c b/drivers/gpu/drm/xe/xe_eu_stall.c
> > index c388d733b857..437782f8433c 100644
> > --- a/drivers/gpu/drm/xe/xe_eu_stall.c
> > +++ b/drivers/gpu/drm/xe/xe_eu_stall.c
> > @@ -472,6 +472,7 @@ xe_eu_stall_stream_read_locked(struct xe_eu_stall_data_stream *stream,
> > * before calling read().
> > *
> > * Returns: The number of bytes copied or a negative error code on failure.
> > + * -EIO if HW drops any EU stall data when the buffer is full.
> > */
> > static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf,
> > size_t count, loff_t *ppos)
> > @@ -485,6 +486,16 @@ static ssize_t xe_eu_stall_stream_read(struct file *file, char __user *buf,
> > return -EINVAL;
> > }
> >
> > + if (bitmap_weight(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS)) {
>
> Since data_drop.mask is being touched elsewhere under xecore_buf->lock,
> here also it should be accessed under the same lock. So this returning -EIO
> should probably be moved into xe_eu_stall_stream_read_locked?
>
> > + if (!stream->data_drop.reported_to_user) {
> > + stream->data_drop.reported_to_user = true;
> > + xe_gt_dbg(gt, "EU stall data dropped in XeCores: %*pb\n",
> > + XE_MAX_DSS_FUSE_BITS, stream->data_drop.mask);
> > + return -EIO;
> > + }
> > + stream->data_drop.reported_to_user = false;
>
> I don't think this logic is correct. We should set this to false only after
> we have cleared all set bits (e.g. only after bitmap_weight) otherwise we
> might keep returning -EIO multiple times?
>
> If HW continues to drop data and keep setting the line, while we are
> resetting the bit, it is possible bitmap_weight might never become 0. I
> think that is ok, we have returned -EIO at least once to indicate to
> userspace that it is not reading data fast enough and HW is dropping data.
>
> Or we may return -EIO multiple times as is happening here, where
> reported_to_user is set to 0 before all bits might have been cleared. So
> what is happening here might be ok too.
>
> To see this clearly and evaluate it is why I am saying move all of this
> data drop handling and -EIO return into this one patch. So we can decide
> which approach to take: return -EIO just once or return multiple times.
>
> We can also maybe defer this patch and merge the other stuff first if it's
> a separate patch.
>
> So maybe this is ok, maybe not, anyway something to think about.
Because we don't want to complicate this, here's another simple idea:
In eu_stall_data_buf_check(), when we see the drop bit set for a particular
dss, move the read pointer (effectively discarding any data from that dss)
and also clear_dropped_eviction_line_bit(), but the dss with dropped data
is saved off in data_drop.mask. So basically we are clearing and saving the
error condition. Discarding data from that dss might be ok, since HW has
already dropped data.
Now in the next read_locked, if any data_drop.mask bit is set, return -EIO
and clear data_drop.mask. This way we don't need
data_drop.reported_to_user. So this way the entire situation is cleared and
handled once we have returned -EIO.
If more data gets dropped before the next read, same process repeats and
-EIO will be returned again.
>
> > + }
> > +
> > if (!(file->f_flags & O_NONBLOCK)) {
> > do {
> > if (!stream->pollin) {
> > @@ -680,6 +691,7 @@ static int xe_eu_stall_stream_init(struct xe_eu_stall_data_stream *stream,
> > if (!stream->xecore_buf)
> > return -ENOMEM;
> >
> > + stream->data_drop.reported_to_user = false;
> > bitmap_zero(stream->data_drop.mask, XE_MAX_DSS_FUSE_BITS);
>
> Stream is kzalloc'd, why do you need to init these?
>
> >
> > xe_pm_runtime_get(gt_to_xe(gt));
> > diff --git a/drivers/gpu/drm/xe/xe_eu_stall.h b/drivers/gpu/drm/xe/xe_eu_stall.h
> > index f97c8bf8e852..8bc44e9e98af 100644
> > --- a/drivers/gpu/drm/xe/xe_eu_stall.h
> > +++ b/drivers/gpu/drm/xe/xe_eu_stall.h
> > @@ -31,6 +31,7 @@ struct xe_eu_stall_data_stream {
> > struct xe_bo *bo;
> > struct per_xecore_buf *xecore_buf;
> > struct {
> > + bool reported_to_user;
> > xe_dss_mask_t mask;
> > } data_drop;
> > struct hrtimer poll_check_timer;
> > --
> > 2.47.1
> >
next prev parent reply other threads:[~2025-01-30 17:05 UTC|newest]
Thread overview: 49+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-15 20:02 [PATCH v8 0/7] Add support for EU stall sampling Harish Chegondi
2025-01-15 20:02 ` [PATCH v8 1/7] drm/xe/topology: Add a function to find the index of the last enabled DSS in a mask Harish Chegondi
2025-01-17 17:25 ` Dixit, Ashutosh
2025-01-22 5:18 ` Harish Chegondi
2025-01-15 20:02 ` [PATCH v8 2/7] drm/xe/uapi: Introduce API for EU stall sampling Harish Chegondi
2025-01-17 19:02 ` Dixit, Ashutosh
2025-01-22 23:44 ` Harish Chegondi
2025-01-23 2:19 ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 3/7] drm/xe/eustall: Implement EU stall sampling APIs for Xe_HPC Harish Chegondi
2025-01-18 2:34 ` Dixit, Ashutosh
2025-01-23 18:51 ` Dixit, Ashutosh
2025-01-25 3:09 ` [PATCH v8 3 " Dixit, Ashutosh
2025-01-29 4:12 ` [PATCH v8 4 " Dixit, Ashutosh
2025-01-29 4:32 ` Dixit, Ashutosh
2025-01-30 18:46 ` Harish Chegondi
2025-01-31 3:23 ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 4/7] drm/xe/eustall: Return -EIO error from read() if HW drops data Harish Chegondi
2025-01-30 4:45 ` Dixit, Ashutosh
2025-01-30 17:05 ` Dixit, Ashutosh [this message]
2025-01-31 21:50 ` Harish Chegondi
2025-01-31 19:30 ` Harish Chegondi
2025-01-31 20:19 ` Dixit, Ashutosh
2025-01-31 22:59 ` Harish Chegondi
2025-02-01 0:13 ` Dixit, Ashutosh
2025-02-01 6:57 ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 5/7] drm/xe/eustall: Add EU stall sampling support for Xe2 Harish Chegondi
2025-01-30 4:55 ` Dixit, Ashutosh
2025-02-05 1:16 ` Olson, Matthew
2025-02-05 1:57 ` Dixit, Ashutosh
2025-02-05 19:03 ` Olson, Matthew
2025-02-05 20:02 ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 6/7] drm/xe/uapi: Add a device query to get EU stall sampling information Harish Chegondi
2025-01-16 22:34 ` Dixit, Ashutosh
2025-01-22 2:48 ` Harish Chegondi
2025-01-22 3:00 ` Dixit, Ashutosh
2025-01-30 17:36 ` Dixit, Ashutosh
2025-01-15 20:02 ` [PATCH v8 7/7] drm/xe/eustall: Add workaround 22016596838 which applies to PVC Harish Chegondi
2025-01-30 5:14 ` Dixit, Ashutosh
2025-01-15 20:46 ` ✓ CI.Patch_applied: success for Add support for EU stall sampling Patchwork
2025-01-15 20:46 ` ✗ CI.checkpatch: warning " Patchwork
2025-01-15 20:48 ` ✓ CI.KUnit: success " Patchwork
2025-01-15 21:14 ` ✓ CI.Build: " Patchwork
2025-01-15 21:16 ` ✗ CI.Hooks: failure " Patchwork
2025-01-15 21:18 ` ✓ CI.checksparse: success " Patchwork
2025-01-15 21:43 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-01-16 0:37 ` ✗ Xe.CI.Full: " Patchwork
2025-01-16 0:51 ` [PATCH v8 0/7] " Degrood, Felix J
2025-01-16 21:50 ` Olson, Matthew
2025-01-18 5:19 ` Harish Chegondi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8534h0fd15.wl-ashutosh.dixit@intel.com \
--to=ashutosh.dixit@intel.com \
--cc=harish.chegondi@intel.com \
--cc=intel-xe@lists.freedesktop.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.