From: "Dixit, Ashutosh" <ashutosh.dixit@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, Jose Souza <jose.souza@intel.com>,
Lionel Landwerlin <lionel.g.landwerlin@intel.com>,
Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>,
Jonathan Cavitt <jonathan.cavitt@intel.com>
Subject: Re: [PATCH 4/7] drm/xe/oa: Signal output fences
Date: Wed, 21 Aug 2024 08:20:49 -0700 [thread overview]
Message-ID: <875xrt9a26.wl-ashutosh.dixit@intel.com> (raw)
In-Reply-To: <ZsTtIUxkOuPxNlMe@DUT025-TGLU.fm.intel.com>
On Tue, 20 Aug 2024 12:23:13 -0700, Matthew Brost wrote:
>
Hi Matt,
Thanks for all the feedback. Yes, my first encounter with dma_fence's so
not everything is smooth yet.
> On Mon, Aug 19, 2024 at 05:58:05PM -0700, Ashutosh Dixit wrote:
> > Introduce 'struct xe_oa_fence' which includes the dma_fence used to signal
> > output fences in the xe_sync array. The fences are signaled
> > asynchronously. When there are no output fences to signal, the OA
> > configuration wait is synchronously re-introduced into the ioctl.
> >
> > v2: Don't wait in the work, use callback + delayed work (Matt B)
> > Use a single, not a per-fence spinlock (Matt Brost)
> >
> > Suggested-by: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_oa.c | 110 +++++++++++++++++++++++++++----
> > drivers/gpu/drm/xe/xe_oa_types.h | 3 +
> > 2 files changed, 100 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> > index cad8f54500a10..1478d88722170 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -100,6 +100,15 @@ struct xe_oa_config_bo {
> > struct xe_bb *bb;
> > };
> >
> > +struct xe_oa_fence {
> > + /* @base: dma fence base */
> > + struct dma_fence base;
> > + /* @work: work to signal @base */
> > + struct delayed_work work;
> > + /* @cb: callback to schedule @work */
> > + struct dma_fence_cb cb;
> > +};
> > +
> > #define DRM_FMT(x) DRM_XE_OA_FMT_TYPE_##x
> >
> > static const struct xe_oa_format oa_formats[] = {
> > @@ -945,13 +954,62 @@ xe_oa_alloc_config_buffer(struct xe_oa_stream *stream, struct xe_oa_config *oa_c
> > return oa_bo;
> > }
> >
> > +static void xe_oa_fence_work_fn(struct work_struct *w)
> > +{
> > + struct xe_oa_fence *ofence = container_of(w, typeof(*ofence), work.work);
> > +
> > + /* Signal fence to indicate new OA configuration is active */
> > + dma_fence_signal(&ofence->base);
> > + dma_fence_put(&ofence->base);
> > +}
> > +
> > +static void xe_oa_config_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
> > +{
> > + /* Additional empirical delay needed for NOA programming after registers are written */
> > +#define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
> > +
> > + struct xe_oa_fence *ofence = container_of(cb, typeof(*ofence), cb);
> > +
> > + INIT_DELAYED_WORK(&ofence->work, xe_oa_fence_work_fn);
> > + queue_delayed_work(system_unbound_wq, &ofence->work,
> > + usecs_to_jiffies(NOA_PROGRAM_ADDITIONAL_DELAY_US));
> > + dma_fence_put(fence);
> > +}
> > +
> > +static const char *xe_oa_get_driver_name(struct dma_fence *fence)
> > +{
> > + return "xe_oa";
> > +}
> > +
> > +static const char *xe_oa_get_timeline_name(struct dma_fence *fence)
> > +{
> > + return "unbound";
> > +}
> > +
> > +static const struct dma_fence_ops xe_oa_fence_ops = {
> > + .get_driver_name = xe_oa_get_driver_name,
> > + .get_timeline_name = xe_oa_get_timeline_name,
> > +};
> > +
> > +static struct xe_oa_fence *xe_oa_fence_arm(struct xe_oa_stream *stream)
> > +{
> > + struct xe_oa_fence *ofence;
> > +
> > + ofence = kzalloc(sizeof(*ofence), GFP_KERNEL);
> > + if (!ofence)
> > + return ERR_PTR(-ENOMEM);
>
> I'd split this out so the malloc is done before submitting the job and
> done dma_fence_init after. This way once the job submitted there are no
> failure points.
OK, done in v3, definitely simplifies code flow.
> Also doing malloc after a job is submitted plays into dma-fence rules
> too, you have malloc in the path a signaling a user dma-fence too.
I believe you are referring to this:
* * Drivers are allowed to call dma_fence_wait() from their &shrinker
* callbacks. This means any code required for fence completion cannot
* allocate memory with GFP_KERNEL.
> It probably works the way you have it, but best practices we to be follow
> the changes I suggest.
Agreed.
>
> > +
> > + dma_fence_init(&ofence->base, &xe_oa_fence_ops, &stream->oa_fence_lock, 0, 0);
> > + return ofence;
> > +}
> > +
> > static int xe_oa_emit_oa_config(struct xe_oa_stream *stream, struct xe_oa_config *config)
> > {
> > #define NOA_PROGRAM_ADDITIONAL_DELAY_US 500
> > struct xe_oa_config_bo *oa_bo;
> > - int err = 0, us = NOA_PROGRAM_ADDITIONAL_DELAY_US;
> > + struct xe_oa_fence *ofence;
> > + int i, err, num_signal = 0;
> > struct dma_fence *fence;
> > - long timeout;
> >
> > /* Emit OA configuration batch */
> > oa_bo = xe_oa_alloc_config_buffer(stream, config);
> > @@ -966,18 +1024,43 @@ static int xe_oa_emit_oa_config(struct xe_oa_stream *stream, struct xe_oa_config
> > goto exit;
> > }
> >
> > - /* Wait till all previous batches have executed */
> > - timeout = dma_fence_wait_timeout(fence, false, 5 * HZ);
> > - dma_fence_put(fence);
> > - if (timeout < 0)
> > - err = timeout;
> > - else if (!timeout)
> > - err = -ETIME;
> > - if (err)
> > - drm_dbg(&stream->oa->xe->drm, "dma_fence_wait_timeout err %d\n", err);
> > + /* Initialize and set fence to signal */
> > + ofence = xe_oa_fence_arm(stream);
> > + if (IS_ERR(ofence)) {
> > + err = PTR_ERR(ofence);
> > + goto put_fence;
> > + }
> >
> > - /* Additional empirical delay needed for NOA programming after registers are written */
> > - usleep_range(us, 2 * us);
> > + for (i = 0; i < stream->num_syncs; i++) {
> > + if (stream->syncs[i].flags & DRM_XE_SYNC_FLAG_SIGNAL)
> > + num_signal++;
> > + xe_sync_entry_signal(&stream->syncs[i], &ofence->base);
> > + }
> > +
> > + /* Add job fence callback to schedule work to signal ofence->base */
> > + err = dma_fence_add_callback(fence, &ofence->cb, xe_oa_config_cb);
> > + if (err == -ENOENT)
> > + xe_oa_config_cb(fence, &ofence->cb);
> > + else if (err)
>
> I'd just assert here rather than fail. The only return currently from
> dma_fence_add_callback is -ENOENT, in other code paths we just assert
> too. See invalidation_fence_init in xe_pt.c.
Done in v3.
>
> > + goto put_ofence;
> > +
> > + /* If nothing needs to be signaled we wait synchronously */
> > + if (!num_signal)
> > + dma_fence_wait(&ofence->base, true);
>
> I think you have a UAF here. The worker which signals the fence puts
> '&ofence->base'. So I think you need an extra ref for !num_signal before
> calling dma_fence_add_callback which is dropped after dma_fence_wait.
Good catch, but I think it is safe because the fence cannot really be freed
till the xe_sync_entry_cleanup() below (even if userspace closes the
sync handle). But in any case, to be safe, I've added an additional
dma_fence_get/put.
> Also since you have interruptable wait here, you likely need to return
> an error to the user to retry the IOCTL upon interruption, right?
Good catch, I've changed that to a non-interruptible wait (didn't really
need an interruptible wait, was just jittery about hangs i.e. fence not
signaling at all :/ )
Thanks a lot,
Ashutosh
>
> > +
> > + /* Done with syncs */
> > + for (i = 0; i < stream->num_syncs; i++)
> > + xe_sync_entry_cleanup(&stream->syncs[i]);
> > + kfree(stream->syncs);
> > +
> > + return 0;
> > +put_ofence:
> > + for (i = 0; i < stream->num_syncs; i++)
> > + xe_sync_entry_cleanup(&stream->syncs[i]);
> > + kfree(stream->syncs);
> > + dma_fence_put(&ofence->base);
> > +put_fence:
> > + dma_fence_put(fence);
> > exit:
> > return err;
> > }
> > @@ -1480,6 +1563,7 @@ static int xe_oa_stream_init(struct xe_oa_stream *stream,
> > goto err_free_oa_buf;
> > }
> >
> > + spin_lock_init(&stream->oa_fence_lock);
> > ret = xe_oa_enable_metric_set(stream);
> > if (ret) {
> > drm_dbg(&stream->oa->xe->drm, "Unable to enable metric set\n");
> > diff --git a/drivers/gpu/drm/xe/xe_oa_types.h b/drivers/gpu/drm/xe/xe_oa_types.h
> > index c1ca960af9305..412f1460c1437 100644
> > --- a/drivers/gpu/drm/xe/xe_oa_types.h
> > +++ b/drivers/gpu/drm/xe/xe_oa_types.h
> > @@ -239,6 +239,9 @@ struct xe_oa_stream {
> > /** @no_preempt: Whether preemption and timeslicing is disabled for stream exec_q */
> > u32 no_preempt;
> >
> > + /** @oa_fence_lock: Lock for struct xe_oa_fence */
> > + spinlock_t oa_fence_lock;
> > +
> > /** @num_syncs: size of @syncs array */
> > u32 num_syncs;
> >
> > --
> > 2.41.0
> >
next prev parent reply other threads:[~2024-08-21 15:32 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-20 0:58 [PATCH v2 0/7] drm/xe/oa: xe_syncs for OA Ashutosh Dixit
2024-08-20 0:58 ` [PATCH 1/7] drm/xe/oa: Separate batch submission from waiting for completion Ashutosh Dixit
2024-08-20 0:58 ` [PATCH 2/7] drm/xe/oa/uapi: Define and parse OA sync properties Ashutosh Dixit
2024-08-20 0:58 ` [PATCH 3/7] drm/xe/oa: Add input fence dependencies Ashutosh Dixit
2024-08-20 0:58 ` [PATCH 4/7] drm/xe/oa: Signal output fences Ashutosh Dixit
2024-08-20 19:23 ` Matthew Brost
2024-08-21 15:20 ` Dixit, Ashutosh [this message]
2024-08-21 16:02 ` Matthew Brost
2024-08-20 0:58 ` [PATCH 5/7] drm/xe/oa: Move functions up so they can be reused for config ioctl Ashutosh Dixit
2024-08-20 0:58 ` [PATCH 6/7] drm/xe/oa: Add syncs support to OA " Ashutosh Dixit
2024-08-20 0:58 ` [PATCH 7/7] drm/xe/oa: Allow only certain property changes from config Ashutosh Dixit
2024-08-20 14:15 ` Cavitt, Jonathan
2024-08-21 15:19 ` Dixit, Ashutosh
2024-08-20 1:03 ` ✓ CI.Patch_applied: success for drm/xe/oa: xe_syncs for OA (rev2) Patchwork
2024-08-20 1:04 ` ✓ CI.checkpatch: " Patchwork
2024-08-20 1:05 ` ✓ CI.KUnit: " Patchwork
2024-08-20 1:17 ` ✓ CI.Build: " Patchwork
2024-08-20 1:19 ` ✓ CI.Hooks: " Patchwork
2024-08-20 1:20 ` ✓ CI.checksparse: " Patchwork
2024-08-20 1:40 ` ✓ CI.BAT: " Patchwork
2024-08-20 9:37 ` ✗ CI.FULL: failure " Patchwork
-- strict thread matches above, loose matches on Subject: below --
2024-08-21 15:28 [PATCH v3 0/7] drm/xe/oa: xe_syncs for OA Ashutosh Dixit
2024-08-21 15:28 ` [PATCH 4/7] drm/xe/oa: Signal output fences Ashutosh Dixit
2024-08-21 15:49 ` Matthew Brost
2024-08-28 1:50 [PATCH v4 0/7] drm/xe/oa: xe_syncs for OA Ashutosh Dixit
2024-08-28 1:50 ` [PATCH 4/7] drm/xe/oa: Signal output fences Ashutosh Dixit
2024-08-30 22:16 [PATCH v5 0/7] drm/xe/oa: xe_syncs for OA Ashutosh Dixit
2024-08-30 22:16 ` [PATCH 4/7] drm/xe/oa: Signal output fences Ashutosh Dixit
2024-08-30 22:15 ` Dixit, Ashutosh
2024-08-30 22:45 ` Matthew Brost
2024-08-30 22:58 ` Dixit, Ashutosh
2024-09-17 22:18 ` Dixit, Ashutosh
2024-09-17 23:38 ` Matthew Brost
2024-09-18 19:59 ` Dixit, Ashutosh
2024-09-18 19:53 [PATCH v6 0/7] drm/xe/oa: xe_syncs for OA Ashutosh Dixit
2024-09-18 19:53 ` [PATCH 4/7] drm/xe/oa: Signal output fences Ashutosh Dixit
2024-10-22 20:03 [PATCH v7 0/7] drm/xe/oa: xe_syncs for OA Ashutosh Dixit
2024-10-22 20:03 ` [PATCH 4/7] drm/xe/oa: Signal output fences Ashutosh Dixit
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=875xrt9a26.wl-ashutosh.dixit@intel.com \
--to=ashutosh.dixit@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=jonathan.cavitt@intel.com \
--cc=jose.souza@intel.com \
--cc=lionel.g.landwerlin@intel.com \
--cc=matthew.brost@intel.com \
--cc=umesh.nerlige.ramappa@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.