From: Matthew Brost <matthew.brost@intel.com>
To: Nirmoy Das <nirmoy.das@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <rodrigo.vivi@intel.com>
Subject: Re: [PATCH] drm/xe/vm: Keep the device awake for TLB inval
Date: Tue, 16 Jul 2024 15:45:24 +0000 [thread overview]
Message-ID: <ZpaVlInvZh0XRUMH@DUT025-TGLU.fm.intel.com> (raw)
In-Reply-To: <20240716133855.12015-1-nirmoy.das@intel.com>
On Tue, Jul 16, 2024 at 03:38:55PM +0200, Nirmoy Das wrote:
> GT can suspend while TLB invalidation is happening in the background.
> This would cause a TLB timeout when that happens. Keep the device awake
> when using fence which doesn't wait for the TLB invalidation to finish.
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
+ Rodrigo our local PM expert.
> ---
> Adding strace here for more information:
>
> xe_pm-18095 [001] ..... 3493.481048: xe_vma_unbind: dev=0000:00:02.0, vma=ffff8881c3062b00, asid=0x0000f, start=0x0000001a0000, end=0x0000001a1fff, userptr=0x000000000000,
> xe_pm-18095 [001] ..... 3493.481063: xe_vm_cpu_bind: dev=0000:00:02.0, vm=ffff88812a00d000, asid=0x0000f
> xe_pm-18095 [001] ..... 3493.481093: xe_gt_tlb_invalidation_fence_create: dev=0000:00:02.0, fence=ffff88811bf3d000, seqno=0
> xe_pm-18095 [001] ..... 3493.481095: xe_gt_tlb_invalidation_fence_work_func: dev=0000:00:02.0, fence=ffff88811bf3d000, seqno=0
> xe_pm-18095 [001] ..... 3493.481097: xe_gt_tlb_TL_fence_send: dev=0000:00:02.0, fence=ffff88811bf3d000, seqno=93
> xe_pm-18095 [001] d..1. 3493.481097: xe_guc_ctb_h2g: H2G CTB: dev=0000:00:02.0, gt0: action=0x7000, len=8, tail=44, head=36
> kworker/1:2-17900 [001] ..... 3493.481302: xe_exec_queue_stop: dev=0000:00:02.0, 3:0x2, gt=0, width=1, guc_id=0, guc_state=0x0, flags=0x13
> kworker/1:2-17900 [001] ..... 3493.481303: xe_exec_queue_stop: dev=0000:00:02.0, 3:0x1, gt=0, width=1, guc_id=1, guc_state=0x0, flags=0x4
> kworker/1:2-17900 [001] ..... 3493.481305: xe_exec_queue_stop: dev=0000:00:02.0, 0:0x1, gt=0, width=1, guc_id=2, guc_state=0x0, flags=0x0
> xe_pm-18095 [001] ..... 3493.756294: xe_guc_ctb_h2g: H2G CTB: dev=0000:00:02.0, gt0: action=0x3003, len=5, tail=5, head=0
> xe_pm-18095 [001] d..1. 3493.756470: xe_guc_ctb_h2g: H2G CTB: dev=0000:00:02.0, gt0: action=0x3003, len=5, tail=10, head=5
> kworker/u32:1-17912 [006] d..1. 3493.756535: xe_guc_ctb_g2h: G2H CTB: dev=0000:00:02.0, gt0: action=0x0, len=2, tail=2, head=2
> xe_pm-18095 [001] ..... 3493.756557: xe_guc_ctb_h2g: H2G CTB: dev=0000:00:02.0, gt0: action=0x3003, len=5, tail=15, head=10
> xe_pm-18095 [001] ..... 3493.756559: xe_guc_ctb_h2g: H2G CTB: dev=0000:00:02.0, gt0: action=0x3004, len=3, tail=18, head=10
> kworker/1:2-17900 [001] d..1. 3497.951783: xe_gt_tlb_invalidation_fence_timeout: dev=0000:00:02.0, fence=ffff88811bf3d000, seqno=93
>
How do you know from this the device is suspending? I can't tell that is
happening. I do think this raises a good point that suspend / resume
should be added to ftrace as that is useful information.
> drivers/gpu/drm/xe/xe_vm.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index b6932cc98ff9..241b7ea00d5f 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -2700,6 +2700,7 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
> struct dma_fence *fence;
> int err;
>
> + xe_pm_runtime_get(vm->xe);
While I agree the device shouldn't enter suspend while TLB invalidations
are inflight I don't think this patch will help with this.
This code path is called in various places in where we should have PM
ref (VM bind IOCTL, exec IOCTL for rebind, or preempt rebind worker). If
we don't have PM ref when this function is called, that is a bug that
needs to be fixed at the outer most layers. Beyond that, GT TLB
invalidations are async and pipelined (e.g. they can be sent after this
function returns and completion can returns sometime later).
With this, I believe correct place to fix this is either in the CT layer
or perhaps hook into GT TLB invalidation fence (Arming of fence
takes a ref, signaling of fence drops a ref). If we choose the latter
option I think following series will help as we will use GT TLB
invalidation fences everywhere for waits [1]/
Rodrigo - I know we had talked about something like above but it doesn't
appear this has gotten implemented. WIP or did this get lost in the PM
work?
Matt
[1] https://patchwork.freedesktop.org/series/135809/
> lockdep_assert_held_write(&vm->lock);
>
> drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT |
> @@ -2721,6 +2722,7 @@ static int vm_bind_ioctl_ops_execute(struct xe_vm *vm,
>
> unlock:
> drm_exec_fini(&exec);
> + xe_pm_runtime_put(vm->xe);
> return err;
> }
>
> --
> 2.42.0
>
next prev parent reply other threads:[~2024-07-16 15:46 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-16 13:38 [PATCH] drm/xe/vm: Keep the device awake for TLB inval Nirmoy Das
2024-07-16 14:00 ` ✓ CI.Patch_applied: success for " Patchwork
2024-07-16 14:00 ` ✓ CI.checkpatch: " Patchwork
2024-07-16 14:02 ` ✓ CI.KUnit: " Patchwork
2024-07-16 14:13 ` ✓ CI.Build: " Patchwork
2024-07-16 14:16 ` ✓ CI.Hooks: " Patchwork
2024-07-16 14:17 ` ✓ CI.checksparse: " Patchwork
2024-07-16 14:42 ` ✓ CI.BAT: " Patchwork
2024-07-16 15:45 ` Matthew Brost [this message]
2024-07-16 16:25 ` [PATCH] " Nirmoy Das
2024-07-16 16:32 ` Matthew Brost
2024-07-17 9:12 ` Nirmoy Das
2024-07-17 12:36 ` Nirmoy Das
2024-07-16 15:45 ` ✗ CI.FULL: failure for " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZpaVlInvZh0XRUMH@DUT025-TGLU.fm.intel.com \
--to=matthew.brost@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=nirmoy.das@intel.com \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox