All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ville Syrjälä" <ville.syrjala@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org, Kenneth Graunke <kenneth@whitecape.org>
Subject: Re: [PATCH 3/3] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga)
Date: Thu, 10 Jan 2019 18:03:21 +0200	[thread overview]
Message-ID: <20190110160321.GY20097@intel.com> (raw)
In-Reply-To: <20190110103807.16477-4-chris@chris-wilson.co.uk>

On Thu, Jan 10, 2019 at 10:38:07AM +0000, Chris Wilson wrote:
> Broadwater and the rest of gen4  do support being able to saving and
> reloading context specific registers between contexts, providing isolation
> of the basic GPU state (as programmable by userspace). This allows
> userspace to assume that the GPU retains their state from one batch to the
> next, minimising the amount of state it needs to reload and manually save
> across batches.
> 
> v2: CONSTANT_BUFFER woes
> 
> Running through piglit turned up an interesting issue, a GPU hang inside
> the context load. The context image includes the CONSTANT_BUFFER command
> that loads an address into a on-gpu buffer, and the context load was
> executing that immediately. However, since it was reading from the GTT
> there is no guarantee that the GTT retains the same configuration as
> when the context was saved, resulting in stray reads and a GPU hang.
> 
> Having tried issuing a CONSTANT_BUFFER (to disable the command) from the
> ring before saving the context to no avail, we resort to patching out
> the instruction inside the context image before loading.
> 
> This does impose that gen4 always reissues CONSTANT_BUFFER commands on
> each batch, but due to the use of a shared GTT that was and will remain
> a requirement.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Kenneth Graunke <kenneth@whitecape.org>
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> #v1
> ---
>  drivers/gpu/drm/i915/intel_engine_cs.c    |  2 +-
>  drivers/gpu/drm/i915/intel_gpu_commands.h |  3 +++
>  drivers/gpu/drm/i915/intel_ringbuffer.c   | 17 +++++++++++++++++
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index f89b8f199e3f..88109e0de051 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -219,6 +219,7 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
>  			return round_up(GEN6_CXT_TOTAL_SIZE(cxt_size) * 64,
>  					PAGE_SIZE);
>  		case 5:
> +		case 4:
>  			/*
>  			 * There is a discrepancy here between the size reported
>  			 * by the register and the size of the context layout
> @@ -235,7 +236,6 @@ __intel_engine_context_size(struct drm_i915_private *dev_priv, u8 class)
>  					 cxt_size * 64,
>  					 cxt_size - 1);
>  			return round_up(cxt_size * 64, PAGE_SIZE);
> -		case 4:
>  		case 3:
>  		case 2:
>  		/* For the special day when i810 gets merged. */
> diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
> index 105e2a9e874a..00c0175c37ed 100644
> --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
> @@ -266,6 +266,9 @@
>  #define GFX_OP_3DSTATE_BINDING_TABLE_EDIT_PS \
>  	((0x3<<29)|(0x3<<27)|(0x0<<24)|(0x47<<16))
>  
> +#define GFX_OP_CONSTANT_BUFFER \
> +	(0x3 << 29 | 0x0 << 27 | 0x0 << 24 | 0x2 << 16)
> +
>  #define MFX_WAIT  ((0x3<<29)|(0x1<<27)|(0x0<<16))
>  
>  #define COLOR_BLT     ((0x2<<29)|(0x40<<22))
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 889f3de79dd0..21bd71cf2e94 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1632,6 +1632,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
>  	else if (IS_GEN(i915, 5))
>  		len += 2;
> +	else if (IS_GEN(i915, 4))
> +		len += 4;
>  	if (flags & MI_FORCE_RESTORE) {
>  		GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
>  		flags &= ~MI_FORCE_RESTORE;
> @@ -1668,6 +1670,21 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>  		 * this should never take effect and so be a no-op!
>  		 */
>  		*cs++ = MI_SUSPEND_FLUSH | MI_SUSPEND_FLUSH_EN;
> +	} else if (IS_GEN(i915, 4)) {
> +		/*
> +		 * Disable CONSTANT_BUFFER before it is loaded from the context
> +		 * image. For as it is loaded, it is executed and the stored
> +		 * address may no longer be valid, leading to a GPU hang.
> +		 *
> +		 * This imposes the requirement that userspace reload their
> +		 * CONSTANT_BUFFER on every batch, fortunately a requirement
> +		 * they are already accustomed to from before contexts were
> +		 * enabled.
> +		 */
> +		*cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT;
> +		*cs++ = 0;
> +		*cs++ = i915_ggtt_offset(rq->hw_context->state) + 0x1d4;

Is that offset correct for ctg/elk? The docs suggest that it is not.
Though it looks like it'd land inside 3DSTATE_SAMPLER_PALETTE_LOAD_1
so probably not something anyone would notice.

> +		*cs++ = GFX_OP_CONSTANT_BUFFER; /* inactive */

I'm thinking there are a few ways in which we could even end up
with an inconsistent curbe setup in the context.

Eg. something like this:
CS_URB_STATE(num>0)
URB_FENCE(cs>0)
CONSTANT_BUFFER(len>0)
CS_URB_STATE(num=0)
URB_FENCE(cs=0)
ctx save
ctx restore

The restore would end up trying to issue CONSTANT_BUFFER with
len > 0 even though nothing is allocated for the CS unit in
the URB.

That didn't seem to be the case in your earlier hang though
so not quite sure why it was hanging. And I'm not sure this
would even cause a hang. The spec just says "undefined".

Bspec has this to say about CONSTANT_BUFFER:
"Programming Notes
 Constant Buffers can only be allocated in linear (not tiled) graphics memory
 Constant Buffers can only be mapped to Main Memory (UC)"

Maybe we were violating one of those? Though I don't think we could
even violate the tiled vs. linear restriction unless the memory access
goes through the gtt fence for some reason. I didn't think gen4+ do
that anymore. So that maybe leaves the possibility that a snooped
bo got mapped to the same address. But again I'm not sure if that
would cause a hang or just potentiall stale data being loaded into
the CURBE.

Another mystery is why g4x/ilk wouldn't need this. There's nothing in
the docs to suggest that it should behave any differently. Hmm. Maybe
MI_INVALIDATE_ISP could have something to do with this? Maybe that
forces CONSTANT_BUFFER to be saved with valid==0 always? I guess it
might be semi-interesting to peek at the resulting context image
for.

After a bit of digging I found a few more potentially related
tidbits:

ECOSPKD[4] Constant Buffer Save/Restore Disable [DevBW-C1+]

ILK_Chicken_1[7] ILK-C0: Curbe 8HW drop ECO
 0: Fix is enabled to complete the 8HW CURBE restore FIFO clear
 1: Fix is disable. Behavior is same as ILK B0

ILK_Chicken_1[0] [DevILK-B0]
 “1” CS will force URB modify_enables set after context restore
 “0” CS will not force URB modify_enables set after context restore

All of those default to 0 AFAICS, so only ILK_Chicken_1[7] seems like it
could have any real effect. So wouldn't explain g4x working.

>  	}
>  
>  	if (force_restore) {
> -- 
> 2.20.1

-- 
Ville Syrjälä
Intel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-01-10 16:03 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-10 10:38 Logical (HW) contexts for gen4/5 Chris Wilson
2019-01-10 10:38 ` [PATCH 1/3] drm/i915/ringbuffer: EMIT_INVALIDATE *before* switch context Chris Wilson
2019-01-10 10:38 ` [PATCH 2/3] drm/i915: Enable render context support for Ironlake (gen5) Chris Wilson
2019-01-10 10:38 ` [PATCH 3/3] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga) Chris Wilson
2019-01-10 16:03   ` Ville Syrjälä [this message]
2019-01-10 16:32     ` Chris Wilson
2019-01-12 21:28     ` Chris Wilson
2019-01-10 12:34 ` ✓ Fi.CI.BAT: success for series starting with [1/3] drm/i915/ringbuffer: EMIT_INVALIDATE *before* switch context Patchwork
2019-01-10 15:56 ` ✓ Fi.CI.IGT: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2019-01-28 21:52 [PATCH 1/3] " Chris Wilson
2019-01-28 21:52 ` [PATCH 3/3] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga) Chris Wilson
2019-04-19 11:17 [PATCH 1/3] drm/i915/ringbuffer: EMIT_INVALIDATE *before* switch context Chris Wilson
2019-04-19 11:17 ` [PATCH 3/3] drm/i915: Enable render context support for gen4 (Broadwater to Cantiga) Chris Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190110160321.GY20097@intel.com \
    --to=ville.syrjala@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=kenneth@whitecape.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.