Re: [Intel-gfx] [PATCH v8 3/7] drm/i915: Define and use GuC and CTB TLB invalidation routines

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: John Harrison <john.c.harrison@intel.com>,
	Jonathan Cavitt <jonathan.cavitt@intel.com>,
	intel-gfx@lists.freedesktop.org
Cc: janusz.krzysztofik@intel.com, andi.shyti@intel.com,
	matthew.d.roper@intel.com, chris.p.wilson@linux.intel.com,
	nirmoy.das@intel.com
Subject: Re: [Intel-gfx] [PATCH v8 3/7] drm/i915: Define and use GuC and CTB TLB invalidation routines
Date: Tue, 10 Oct 2023 10:00:42 +0100	[thread overview]
Message-ID: <8d3db6e2-9456-7e4d-3262-034e48435a09@linux.intel.com> (raw)
In-Reply-To: <ad5e3e47-aa7a-4c96-8dac-42691390ce4e@intel.com>


On 09/10/2023 20:14, John Harrison wrote:
> On 10/9/2023 01:56, Tvrtko Ursulin wrote:
>> On 06/10/2023 19:20, Jonathan Cavitt wrote:
>>> From: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
>>>
>>> The GuC firmware had defined the interface for Translation Look-Aside
>>> Buffer (TLB) invalidation.  We should use this interface when
>>> invalidating the engine and GuC TLBs.
>>> Add additional functionality to intel_gt_invalidate_tlb, invalidating
>>> the GuC TLBs and falling back to GT invalidation when the GuC is
>>> disabled.
>>> The invalidation is done by sending a request directly to the GuC
>>> tlb_lookup that invalidates the table.  The invalidation is submitted as
>>> a wait request and is performed in the CT event handler.  This means we
>>> cannot perform this TLB invalidation path if the CT is not enabled.
>>> If the request isn't fulfilled in two seconds, this would constitute
>>> an error in the invalidation as that would constitute either a lost
>>> request or a severe GuC overload.
>>>
>>> With this new invalidation routine, we can perform GuC-based GGTT
>>> invalidations.  GuC-based GGTT invalidation is incompatible with
>>> MMIO invalidation so we should not perform MMIO invalidation when
>>> GuC-based GGTT invalidation is expected.
>>>
>>> Purpose of xarray:
>>> The tlb_lookup table is allocated as an xarray because the set of
>>> pending TLB invalidations may have no upper bound.  The consequence of
>>> this is that all actions interfacing with this table need to use the
>>> xarray functions, such as xa_alloc_cyclic_irq for array insertion.
>>>
>>> Purpose of must_wait_woken:
>>> Our wait for the G2H ack for the completion of a TLB invalidation is
>>> mandatory; we must wait for the HW to confirm that the physical
>>> addresses are no longer accessible before we return those to the system.
>>>
>>> On switching to using the wait_woken() convenience routine, we
>>> introduced ourselves to an issue where wait_woken() may complete early
>>> under a kthread that is stopped. Since we send a TLB invalidation when
>>> we try to release pages from the shrinker, we can be called from any
>>> process; including kthreads.
>>>
>>> Using wait_woken() from any process context causes another issue. The
>>> use of is_kthread_should_stop() assumes that any task with PF_KTHREAD
>>> set was made by kthread_create() and has called set_kthread_struct().
>>> This is not true for the raw kernel_thread():
>>
>> This explanation misses the main point of my ask - which is to explain 
>> why a simpler scheme isn't sufficient. Simpler scheme aka not needed 
>> the xarray or any flavour of wait_token().
>>
>> In other words it is obvious we have to wait for the invalidation ack, 
>> but not obvious why we need a complicated scheme.
> The alternative being to simply serialise all TLB invalidation requests? 
> Thus, no complex tracking required as there is only one in flight at a 
> time? That seems inefficient and a potential performance impact if a 
> bunch of invalidations are required back to back. But given that the 
> current scheme is global invalidation only (no support for ranges / per 
> page invalidation yet), is it possible to get multiple back-to-back 
> requests?

It is possible to get a stream of invalidation requests but all that 
come with put_pages() are serialized on the gt->tlb.invalidate_lock 
anyway. So for them only benefit with the complicated approach versus 
the dumb single wait queue is avoiding wake up storms.

Then the second source of TLB invalidations is ggtt->invalidate(). I am 
not sure if those are frequent enough to warrant parallelism. Definitely 
shouldn't be for things like context images and ringbuffers. So I was 
asking if maybe framebuffers but don't know.

Regards,

Tvrtko

next prev parent reply	other threads:[~2023-10-10  9:00 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-06 18:20 [Intel-gfx] [PATCH v8 0/7] drm/i915: Define and use GuC and CTB TLB invalidation routines Jonathan Cavitt
2023-10-06 18:20 ` [Intel-gfx] [PATCH v8 1/7] drm/i915: Add GuC TLB Invalidation device info flags Jonathan Cavitt
2023-10-06 18:20 ` [Intel-gfx] [PATCH v8 2/7] drm/i915/guc: Add CT size delay helper Jonathan Cavitt
2023-10-06 18:20 ` [Intel-gfx] [PATCH v8 3/7] drm/i915: Define and use GuC and CTB TLB invalidation routines Jonathan Cavitt
2023-10-06 21:56   ` kernel test robot
2023-10-09  8:56   ` Tvrtko Ursulin
2023-10-09 11:40     ` Andi Shyti
2023-10-09 12:49       ` Tvrtko Ursulin
2023-10-09 15:02     ` Cavitt, Jonathan
2023-10-09 19:14     ` John Harrison
2023-10-10  9:00       ` Tvrtko Ursulin [this message]
2023-10-09 12:12   ` Nirmoy Das
2023-10-09 12:54     ` Tvrtko Ursulin
2023-10-09 13:24       ` Nirmoy Das
2023-10-06 18:20 ` [Intel-gfx] [PATCH v8 4/7] drm/i915: No TLB invalidation on suspended GT Jonathan Cavitt
2023-10-09  9:00   ` Tvrtko Ursulin
2023-10-09  9:31     ` Jani Nikula
2023-10-06 18:20 ` [Intel-gfx] [PATCH v8 5/7] drm/i915: No TLB invalidation on wedged GT Jonathan Cavitt
2023-10-06 18:20 ` [Intel-gfx] [PATCH v8 6/7] drm/i915/gt: Increase sleep in gt_tlb selftest sanitycheck Jonathan Cavitt
2023-10-06 18:20 ` [Intel-gfx] [PATCH v8 7/7] drm/i915: Enable GuC TLB invalidations for MTL Jonathan Cavitt
2023-10-06 21:27 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Define and use GuC and CTB TLB invalidation routines Patchwork
2023-10-06 21:27 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-10-06 21:45 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2023-10-07  6:43 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Define and use GuC and CTB TLB invalidation routines (rev2) Patchwork
2023-10-07  6:43 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-10-07  6:57 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-10-07 16:47 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2023-10-09 11:52 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Define and use GuC and CTB TLB invalidation routines (rev3) Patchwork
2023-10-09 11:52 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2023-10-09 12:05 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2023-10-09 14:40 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8d3db6e2-9456-7e4d-3262-034e48435a09@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=andi.shyti@intel.com \
    --cc=chris.p.wilson@linux.intel.com \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=janusz.krzysztofik@intel.com \
    --cc=john.c.harrison@intel.com \
    --cc=jonathan.cavitt@intel.com \
    --cc=matthew.d.roper@intel.com \
    --cc=nirmoy.das@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.