From: Matthew Brost <matthew.brost@intel.com>
To: "Summers, Stuart" <stuart.summers@intel.com>
Cc: "intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v4 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend
Date: Tue, 13 Jan 2026 16:25:38 -0800
Message-ID: <aWbigiMoVLMzSBux@lstrano-desk.jf.intel.com>
In-Reply-To: <c020446d914494e6f5d08eaaa055cef5ab0fbc0d.camel@intel.com>
On Tue, Jan 13, 2026 at 04:25:57PM -0700, Summers, Stuart wrote:
> On Tue, 2026-01-13 at 14:08 -0800, Matthew Brost wrote:
> > On Tue, Jan 13, 2026 at 02:23:57PM -0700, Summers, Stuart wrote:
> > > On Mon, 2026-01-12 at 18:52 -0800, Matthew Brost wrote:
> > > > Introduce context-based invalidation support to the GuC TLB invalidation
> > > > backend. This is implemented by iterating over each exec queue per GT
> > > > within a VM, skipping inactive queues, and issuing a context-based (GuC
> > > > ID) H2G TLB invalidation. All H2G messages, except the final one, are
> > > > sent with an invalid seqno, which the G2H handler drops to ensure the
> > > > TLB invalidation fence is only signaled once all H2G messages are
> > > > completed.
> > > >
> > > > A watermark mechanism is also added to switch between context-based TLB
> > > > invalidations and full device-wide invalidations, as the return on
> > > > investment for context-based invalidation diminishes when many exec
> > > > queues are mapped.
> > > >
> > > > v2:
> > > > - Fix checkpatch warnings
> > > > v3:
> > > > - Rebase on PRL
> > > > - Use ref counting to avoid racing with deregisters
> > > > v4:
> > > > - Extra braces (Stuart)
> > > > - Use per GT list (CI)
> > > > - Reorder put
> > > >
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > > drivers/gpu/drm/xe/xe_guc_tlb_inval.c | 145 +++++++++++++++++++++++++-
> > > > 1 file changed, 141 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > index 070d2e2cb7c9..ced58f46f846 100644
> > > > --- a/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > +++ b/drivers/gpu/drm/xe/xe_guc_tlb_inval.c
> > > > @@ -6,15 +6,19 @@
> > > > #include "abi/guc_actions_abi.h"
> > > >
> > > > #include "xe_device.h"
> > > > +#include "xe_exec_queue.h"
> > > > +#include "xe_exec_queue_types.h"
> > > > #include "xe_gt_stats.h"
> > > > #include "xe_gt_types.h"
> > > > #include "xe_guc.h"
> > > > #include "xe_guc_ct.h"
> > > > +#include "xe_guc_exec_queue_types.h"
> > > > #include "xe_guc_tlb_inval.h"
> > > > #include "xe_force_wake.h"
> > > > #include "xe_mmio.h"
> > > > #include "xe_sa.h"
> > > > #include "xe_tlb_inval.h"
> > > > +#include "xe_vm.h"
> > > >
> > > > #include "regs/xe_guc_regs.h"
> > > >
> > > > @@ -156,10 +160,16 @@ static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start,
> > > >  {
> > > >  #define MAX_TLB_INVALIDATION_LEN 7
> > > >  	struct xe_gt *gt = guc_to_gt(guc);
> > > > +	struct xe_device *xe = guc_to_xe(guc);
> > > >  	u32 action[MAX_TLB_INVALIDATION_LEN];
> > > >  	u64 length = end - start;
> > > >  	int len = 0, err;
> > > >
> > > > +	xe_gt_assert(gt, (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE &&
> > > > +			  !xe->info.has_ctx_tlb_inval) ||
> > > > +			 (type == XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX &&
> > > > +			  xe->info.has_ctx_tlb_inval));
> > > > +
> > > >  	action[len++] = XE_GUC_ACTION_TLB_INVALIDATION;
> > > >  	action[len++] = !prl_sa ? seqno : TLB_INVALIDATION_SEQNO_INVALID;
> > > >  	if (!gt_to_xe(gt)->info.has_range_tlb_inval ||
> > > > @@ -168,9 +178,11 @@ static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start,
> > > >  	} else {
> > > >  		u64 normalize_len = normalize_invalidation_range(gt, &start,
> > > >  								 &end);
> > > > +		bool need_flush = !prl_sa &&
> > > > +			seqno != TLB_INVALIDATION_SEQNO_INVALID;
> > > >
> > > >  		/* Flush on NULL case, Media is not required to modify flush due to no PPC so NOP */
> > > > -		action[len++] = MAKE_INVAL_OP_FLUSH(type, !prl_sa);
> > > > +		action[len++] = MAKE_INVAL_OP_FLUSH(type, need_flush);
> > > >  		action[len++] = id;
> > > >  		action[len++] = lower_32_bits(start);
> > > >  		action[len++] = upper_32_bits(start);
> > > > @@ -181,8 +193,10 @@ static int send_tlb_inval_ppgtt(struct xe_guc *guc, u32 seqno, u64 start,
> > > >  #undef MAX_TLB_INVALIDATION_LEN
> > > >
> > > >  	err = send_tlb_inval(guc, action, len);
> > > > -	if (!err && prl_sa)
> > > > +	if (!err && prl_sa) {
> > > > +		xe_gt_assert(gt, seqno != TLB_INVALIDATION_SEQNO_INVALID);
> > > >  		err = send_page_reclaim(guc, seqno, xe_sa_bo_gpu_addr(prl_sa));
> > > > +	}
> > > >  	return err;
> > > >  }
> > > >
> > > > @@ -201,6 +215,114 @@ static int send_tlb_inval_asid_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
> > > >  				     XE_GUC_TLB_INVAL_PAGE_SELECTIVE, prl_sa);
> > > >  }
> > > >
> > > > +static int send_tlb_inval_ctx_ppgtt(struct xe_tlb_inval *tlb_inval, u32 seqno,
> > > > +				    u64 start, u64 end, u32 asid,
> > > > +				    struct drm_suballoc *prl_sa)
> > > > +{
> > > > +	struct xe_guc *guc = tlb_inval->private;
> > > > +	struct xe_device *xe = guc_to_xe(guc);
> > > > +	struct xe_exec_queue *q, *next, *last_q = NULL;
> > > > +	struct xe_vm *vm;
> > > > +	LIST_HEAD(tlb_inval_list);
> > > > +	int err = 0, id = guc_to_gt(guc)->info.id;
> > > > +
> > > > +	lockdep_assert_held(&tlb_inval->seqno_lock);
> > > > +
> > > > +	if (xe->info.force_execlist)
> > > > +		return -ECANCELED;
> > > > +
> > > > +	vm = xe_device_asid_to_vm(xe, asid);
> > > > +	if (IS_ERR(vm))
> > > > +		return PTR_ERR(vm);
> > > > +
> > > > +	down_read(&vm->exec_queues.lock);
> > > > +
> > > > +	/*
> > > > +	 * XXX: Randomly picking a threshold for now. This will need to be
> > > > +	 * tuned based on expected UMD queue counts and performance profiling.
> > > > +	 */
> > > > +#define EXEC_QUEUE_COUNT_FULL_THRESHOLD 8
> > > > +	if (vm->exec_queues.count[id] >= EXEC_QUEUE_COUNT_FULL_THRESHOLD) {
> > > > +		u32 action[] = {
> > > > +			XE_GUC_ACTION_TLB_INVALIDATION,
> > > > +			seqno,
> > > > +			MAKE_INVAL_OP(XE_GUC_TLB_INVAL_FULL),
> > > > +		};
> > > > +
> > > > +		err = send_tlb_inval(guc, action, ARRAY_SIZE(action));
> > > > +		goto err_unlock;
> > > > +	}
> > > > +#undef EXEC_QUEUE_COUNT_FULL_THRESHOLD
> > > > +
> > > > +	/*
> > > > +	 * Move exec queues to a temporary list to issue invalidations. The exec
> > > > +	 * queue must be active and a reference must be taken to prevent
> > > > +	 * concurrent deregistrations.
> > > > +	 *
> > > > +	 * List modification is safe because we hold 'vm->exec_queues.lock' for
> > > > +	 * reading, which prevents external modifications. Using a per-GT list
> > > > +	 * is also safe since 'tlb_inval->seqno_lock' ensures no other GT users
> > > > +	 * can enter this code path.
> > > > +	 */
> > > > +	list_for_each_entry_safe(q, next, &vm->exec_queues.list[id],
> > > > +				 vm_exec_queue_link) {
> > > > +		if (q->ops->active(q) && xe_exec_queue_get_unless_zero(q)) {
> > > > +			last_q = q;
> > > > +			list_move_tail(&q->vm_exec_queue_link, &tlb_inval_list);
> > >
> > > It would be nice if we could consolidate this into a single list
> > > instead of moving things around like this, but I realize this makes it
> > > easier to track the last active queue so no issue...
> > >
> >
> > Yes, the active check is only stable for a single pass, which would
> > make it impossible to find last_q if we did two passes.
> >
> >
> > > I don't see anything blocking us from deregistering a queue though
> > > after it has been moved to the tlb_inval_list here. Should we add that
> >
> > The xe_exec_queue_get_unless_zero is what makes this safe.
> > Deregistrations are now completely tied to the reference count, which
> > changed here [1]. A reference count of zero means all jobs on the
> > queue have signaled, so there's no need to issue an invalidation.
>
> Oh I hadn't seen this patch yet, thanks for the link!
>
> So basically, before, we were holding this extra reference, which meant
> a get/put here would not trigger anything until the deregistration
> happened later as part of that schedule disable (assuming it didn't
> time out or something). By dropping that extra ref, the expectation is
> that if scheduling ends after we check active here, i.e. someone does
> an exec_queue_put(), then as soon as we do the put in this new routine,
> it will trigger exec_queue_destroy(), which sets that pending disable.
> Is that right?
>
Schedule disable is a bit different. It can happen for several reasons:
the TDR kicks something off the hardware, a preempt fence triggers, or
we switch from page fault mode to DMA fence mode. It's safe if any of
these race; ->active() just opportunistically checks whether a queue is
disabled and skips the invalidation. A racing disable is harmless
because the queue is still registered, so the TLB invalidation will
still work.
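
Roughly the shape of that check, as a sketch only (the state bit name
below is made up for illustration, this is not the actual backend code):

	/*
	 * Sketch: a GuC backend can implement ->active() by reporting
	 * whether the queue is still enabled on the scheduler. A stale
	 * "true" is tolerated: the queue is still registered, so the
	 * context invalidation still lands.
	 */
	static bool guc_exec_queue_active(struct xe_exec_queue *q)
	{
		/* Hypothetical state bit; racing disables are fine */
		return !(q->guc->state & EXEC_QUEUE_STATE_DISABLED);
	}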
Yes, the refcount protects against guc_exec_queue_destroy being entered.
We also issue a schedule disable here (or more precisely, the message
scheduler work item does this), and when that returns, we issue a
deregistration; the latter is what we care about and want to avoid. It's
possible that the TLB invalidation code triggers guc_exec_queue_destroy
upon put, but that's fine.
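
The guard itself is the standard kref lookup pattern; what
xe_exec_queue_get_unless_zero() boils down to is roughly this (sketch;
the refcount field name is an assumption):

	#include <linux/kref.h>

	/*
	 * Only take a reference if the count is still non-zero, so a
	 * queue whose final put has already kicked off teardown is
	 * skipped rather than revived.
	 */
	static inline bool xe_exec_queue_get_unless_zero(struct xe_exec_queue *q)
	{
		return kref_get_unless_zero(&q->refcount);
	}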
> And I guess since we're doing the get_unless_zero here, we're also
> protected in the situation where a schedule disable is in progress but
> we haven't yet hit that pending disable?
>
The refcount prevents queue cleanup from starting. If the refcount is
zero, cleanup has already started, so there's no need for an
invalidation. And since queue cleanup is the only place that issues a
deregistration, this is all safe.
The vm->exec_queues.lock also protects against any queue memory
disappearing in the TLB invalidation code.
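
Putting the two together, the reader side is the classic pattern (again
a sketch under the same assumptions, not a copy of the patch):

	down_read(&vm->exec_queues.lock);	/* queue structs stay alive */
	list_for_each_entry_safe(q, next, &vm->exec_queues.list[id],
				 vm_exec_queue_link) {
		if (!q->ops->active(q) || !xe_exec_queue_get_unless_zero(q))
			continue;	/* disabled or already tearing down */
		/* ... issue context invalidation for q->guc->id ... */
		xe_exec_queue_put(q);
	}
	up_read(&vm->exec_queues.lock);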
Matt
> Still need to get some more testing in here to prove all of that, but I
> agree it makes sense.
>
> >
> > [1]
> > https://patchwork.freedesktop.org/patch/697820/?series=155314&rev=10
> >
> > > seqno_lock in the register/deregister routines to be safe? Or maybe
> > > block until tlb_inval_list is empty? Or if I'm missing it otherwise,
> > > how are you preventing the deregistration after this is on the list?
> > >
> >
> > See above; I'm pretty sure this is safe. Refcounting tricks wouldn't be
> > my first choice, but a cross-component lock would be considerably worse.
>
> Agreed.
>
> Thanks,
> Stuart
>
> >
> > Matt
> >
> > > Thanks,
> > > Stuart
> > >
> > > > +		}
> > > > +	}
> > > > +
> > > > +	if (!last_q) {
> > > > +		/*
> > > > +		 * We can't break fence ordering for TLB invalidation jobs; if
> > > > +		 * TLB invalidations are in flight, issue a dummy invalidation
> > > > +		 * to maintain ordering. Nor can we safely move the seqno_recv
> > > > +		 * when returning -ECANCELED if TLB invalidations are in
> > > > +		 * flight. Use GGTT invalidation as the dummy invalidation
> > > > +		 * given ASID invalidations are unsupported here.
> > > > +		 */
> > > > +		if (xe_tlb_inval_idle(tlb_inval))
> > > > +			err = -ECANCELED;
> > > > +		else
> > > > +			err = send_tlb_inval_ggtt(tlb_inval, seqno);
> > > > +		goto err_unlock;
> > > > +	}
> > > > +
> > > > +	list_for_each_entry_safe(q, next, &tlb_inval_list, vm_exec_queue_link) {
> > > > +		struct drm_suballoc *__prl_sa = NULL;
> > > > +		int __seqno = TLB_INVALIDATION_SEQNO_INVALID;
> > > > +		u32 type = XE_GUC_TLB_INVAL_PAGE_SELECTIVE_CTX;
> > > > +
> > > > +		xe_assert(xe, q->vm == vm);
> > > > +
> > > > +		if (err)
> > > > +			goto unref;
> > > > +
> > > > +		if (last_q == q) {
> > > > +			__prl_sa = prl_sa;
> > > > +			__seqno = seqno;
> > > > +		}
> > > > +
> > > > +		err = send_tlb_inval_ppgtt(guc, __seqno, start, end,
> > > > +					   q->guc->id, type, __prl_sa);
> > > > +
> > > > +unref:
> > > > +		/*
> > > > +		 * Must always return the exec queue to the original list and
> > > > +		 * drop the reference.
> > > > +		 */
> > > > +		list_move_tail(&q->vm_exec_queue_link,
> > > > +			       &vm->exec_queues.list[id]);
> > > > +		xe_exec_queue_put(q);
> > > > +	}
> > > > +
> > > > +err_unlock:
> > > > +	up_read(&vm->exec_queues.lock);
> > > > +	xe_vm_put(vm);
> > > > +
> > > > +	return err;
> > > > +}
> > > > +
> > > >  static bool tlb_inval_initialized(struct xe_tlb_inval *tlb_inval)
> > > >  {
> > > >  	struct xe_guc *guc = tlb_inval->private;
> > > > @@ -228,7 +350,7 @@ static long tlb_inval_timeout_delay(struct xe_tlb_inval *tlb_inval)
> > > >  	return hw_tlb_timeout + 2 * delay;
> > > >  }
> > > >
> > > > -static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > > > +static const struct xe_tlb_inval_ops guc_tlb_inval_asid_ops = {
> > > >  	.all = send_tlb_inval_all,
> > > >  	.ggtt = send_tlb_inval_ggtt,
> > > >  	.ppgtt = send_tlb_inval_asid_ppgtt,
> > > > @@ -237,6 +359,15 @@ static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > > >  	.timeout_delay = tlb_inval_timeout_delay,
> > > >  };
> > > >
> > > > +static const struct xe_tlb_inval_ops guc_tlb_inval_ctx_ops = {
> > > > +	.ggtt = send_tlb_inval_ggtt,
> > > > +	.all = send_tlb_inval_all,
> > > > +	.ppgtt = send_tlb_inval_ctx_ppgtt,
> > > > +	.initialized = tlb_inval_initialized,
> > > > +	.flush = tlb_inval_flush,
> > > > +	.timeout_delay = tlb_inval_timeout_delay,
> > > > +};
> > > > +
> > > >  /**
> > > >   * xe_guc_tlb_inval_init_early() - Init GuC TLB invalidation early
> > > >   * @guc: GuC object
> > > > @@ -248,8 +379,14 @@ static const struct xe_tlb_inval_ops guc_tlb_inval_ops = {
> > > >  void xe_guc_tlb_inval_init_early(struct xe_guc *guc,
> > > >  				 struct xe_tlb_inval *tlb_inval)
> > > >  {
> > > > +	struct xe_device *xe = guc_to_xe(guc);
> > > > +
> > > >  	tlb_inval->private = guc;
> > > > -	tlb_inval->ops = &guc_tlb_inval_ops;
> > > > +
> > > > +	if (xe->info.has_ctx_tlb_inval)
> > > > +		tlb_inval->ops = &guc_tlb_inval_ctx_ops;
> > > > +	else
> > > > +		tlb_inval->ops = &guc_tlb_inval_asid_ops;
> > > >  }
> > > >
> > > > /**
> > >
>
Thread overview: 28+ messages
2026-01-13 2:52 [PATCH v4 00/12] Context based TLB invalidations Matthew Brost
2026-01-13 2:52 ` [PATCH v4 01/12] drm/xe: Add normalize_invalidation_range Matthew Brost
2026-01-13 2:52 ` [PATCH v4 02/12] drm/xe: Make usm.asid_to_vm allocation use GFP_NOWAIT Matthew Brost
2026-01-13 2:52 ` [PATCH v4 03/12] drm/xe: Add has_ctx_tlb_inval to device info Matthew Brost
2026-01-13 21:24 ` Summers, Stuart
2026-01-13 2:52 ` [PATCH v4 04/12] drm/xe: Add xe_device_asid_to_vm helper Matthew Brost
2026-01-13 2:52 ` [PATCH v4 05/12] drm/xe: Add vm to exec queues association Matthew Brost
2026-01-13 21:26 ` Summers, Stuart
2026-01-13 2:52 ` [PATCH v4 06/12] drm/xe: Taint TLB invalidation seqno lock with GFP_KERNEL Matthew Brost
2026-01-13 2:52 ` [PATCH v4 07/12] drm/xe: Rename send_tlb_inval_ppgtt to send_tlb_inval_asid_ppgtt Matthew Brost
2026-01-13 2:52 ` [PATCH v4 08/12] drm/xe: Add send_tlb_inval_ppgtt helper Matthew Brost
2026-01-13 2:52 ` [PATCH v4 09/12] drm/xe: Add xe_tlb_inval_idle helper Matthew Brost
2026-01-13 2:52 ` [PATCH v4 10/12] drm/xe: Add exec queue active vfunc Matthew Brost
2026-01-13 21:32 ` Summers, Stuart
2026-01-13 22:16 ` Matthew Brost
2026-01-13 22:21 ` Summers, Stuart
2026-01-14 21:17 ` Summers, Stuart
2026-01-13 2:52 ` [PATCH v4 11/12] drm/xe: Add context-based invalidation to GuC TLB invalidation backend Matthew Brost
2026-01-13 21:23 ` Summers, Stuart
2026-01-13 22:08 ` Matthew Brost
2026-01-13 23:25 ` Summers, Stuart
2026-01-14 0:25 ` Matthew Brost [this message]
2026-01-13 2:52 ` [PATCH v4 12/12] drm/xe: Enable context TLB invalidations for CI Matthew Brost
2026-01-13 3:00 ` ✓ CI.KUnit: success for Context based TLB invalidations (rev4) Patchwork
2026-01-13 3:41 ` ✓ Xe.CI.BAT: " Patchwork
2026-01-13 9:45 ` ✓ Xe.CI.Full: " Patchwork
2026-01-16 0:25 ` [PATCH v4 00/12] Context based TLB invalidations Summers, Stuart
2026-01-16 1:52 ` Matthew Brost