From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 14/15] drm/i915: Async GPU relocation processing
Date: Fri, 17 Mar 2017 13:15:33 +0200 [thread overview]
Message-ID: <1489749333.2892.11.camel@linux.intel.com> (raw)
In-Reply-To: <20170316132006.7976-15-chris@chris-wilson.co.uk>
On to, 2017-03-16 at 13:20 +0000, Chris Wilson wrote:
> If the user requires patching of their batch or auxiliary buffers, we
> currently make the alterations on the cpu. If they are active on the GPU
> at the time, we wait under the struct_mutex for them to finish executing
> before we rewrite the contents. This happens when shared relocation
> trees are used between different contexts with separate address spaces
> (the buffers then have different addresses in each), so the 3D state
> needs to be adjusted between executions on each context. However, we
> don't need
> to use the CPU to do the relocation patching, as we could queue commands
> to the GPU to perform it and use fences to serialise the operation with
> the current activity and future - so the operation on the GPU appears
> just as atomic as performing it immediately. Performing the relocation
> rewrites on the GPU is not free, in terms of pure throughput, the number
> of relocations/s is about halved - but more importantly so is the time
> under the struct_mutex.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
<SNIP>
> /* Must be a variable in the struct to allow GCC to unroll. */
> + cache->gen = INTEL_GEN(i915);
> cache->has_llc = HAS_LLC(i915);
> - cache->has_fence = INTEL_GEN(i915) < 4;
> - cache->use_64bit_reloc = HAS_64BIT_RELOC(i915);
> + cache->has_fence = cache->gen < 4;
> + cache->use_64bit_reloc = cache->gen >= 8;
Keep using HAS_64BIT_RELOC(i915) here rather than open-coding the gen >= 8 check.
> +static u32 *reloc_gpu(struct i915_execbuffer *eb,
> + struct i915_vma *vma,
> + unsigned int len)
> +{
> + struct reloc_cache *cache = &eb->reloc_cache;
> + u32 *cmd;
> +
> + if (cache->rq_size > PAGE_SIZE/sizeof(u32) - (len + 1))
There's no overflow risk here, so I'd write this as
cache->rq_size + len + 1 > PAGE_SIZE / sizeof(u32).
> + reloc_gpu_flush(cache);
> +
> + if (!cache->rq) {
> + struct drm_i915_gem_object *obj;
> + struct drm_i915_gem_request *rq;
> + struct i915_vma *batch;
> + int err;
> +
> + GEM_BUG_ON(vma->obj->base.write_domain & I915_GEM_DOMAIN_CPU);
Use == here instead of &.
I'll just close my eyes for the remainder of this function and expect v2 to
have a proper teardown :P
Also, the vma/obj pair naming differs from what they usually are, so I'd
consider renaming one of them to lessen confusion.
> @@ -1012,6 +1148,67 @@ relocate_entry(struct i915_vma *vma,
> bool wide = eb->reloc_cache.use_64bit_reloc;
> void *vaddr;
>
> + if (!eb->reloc_cache.vaddr &&
> + (DBG_FORCE_RELOC == FORCE_GPU_RELOC ||
> + !reservation_object_test_signaled_rcu(obj->resv, true))) {
> + const unsigned int gen = eb->reloc_cache.gen;
> + unsigned int len;
> + u32 *batch;
> + u64 addr;
> +
> + if (wide)
> + len = offset & 7 ? 8 : 5;
> + else if (gen >= 4)
> + len = 4;
> + else if (gen >= 3)
> + len = 3;
> + else /* On gen2 MI_STORE_DWORD_IMM uses a physical address */
> + goto repeat;
> +
> + batch = reloc_gpu(eb, vma, len);
> + if (IS_ERR(batch))
> + goto repeat;
> +
> + addr = gen8_canonical_addr(vma->node.start + offset);
> + if (wide) {
> + if (offset & 7) {
> + *batch++ = MI_STORE_DWORD_IMM_GEN4;
This indent level calls for a new function, e.g. __relocate_entry_gpu().
Couldn't we share some of the STORE IMM code we already have around? I don't
want to crawl through the specs every time :(
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx