All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ian Romanick <idr@freedesktop.org>
To: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, miku@iki.fi
Subject: Re: [PATCH v2 16/16] drm/i915: add i915_gem_context_get_reset_status_ioctl
Date: Mon, 18 Mar 2013 13:26:26 -0700	[thread overview]
Message-ID: <51477872.2080608@freedesktop.org> (raw)
In-Reply-To: <1363276337-12509-17-git-send-email-mika.kuoppala@intel.com>

On 03/14/2013 08:52 AM, Mika Kuoppala wrote:
> This ioctl returns context reset status for specified context.
>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> CC: idr@freedesktop.org
> ---
>   drivers/gpu/drm/i915/i915_dma.c |    1 +
>   drivers/gpu/drm/i915/i915_drv.c |   61 +++++++++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_drv.h |    2 ++
>   include/uapi/drm/i915_drm.h     |   28 ++++++++++++++++++
>   4 files changed, 92 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 7902d97..c919832 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1903,6 +1903,7 @@ struct drm_ioctl_desc i915_ioctls[] = {
>   	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_UNLOCKED),
>   	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_UNLOCKED),
>   	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_UNLOCKED),
> +	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATUS, i915_gem_context_get_reset_status_ioctl, DRM_UNLOCKED),
>   };
>
>   int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 69c9856..a4d06f2 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1267,3 +1267,64 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>
>   	return 0;
>   }
> +
> +int i915_gem_context_get_reset_status_ioctl(struct drm_device *dev,
> +					    void *data, struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_ring_buffer *ring;
> +	struct drm_i915_reset_status *args = data;
> +	struct ctx_reset_state *rs = NULL;
> +	unsigned long reset_cnt;
> +	u32 reset_status = I915_RESET_UNKNOWN;
> +	int ret;
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	ring = &dev_priv->ring[RCS];
> +
> +	ret = i915_gem_context_get_reset_state(ring,
> +					       file,
> +					       args->ctx_id,
> +					       &rs);
> +	if (ret)
> +		goto out;
> +
> +	BUG_ON(!rs);
> +
> +	reset_cnt = atomic_read(&dev_priv->gpu_error.reset_counter);
> +
> +	if (reset_cnt & I915_RESET_IN_PROGRESS_FLAG ||

In this case, I believe we're supposed to return the reset state to the 
application.  The ARB_robustness spec says:

     "If a reset status other than NO_ERROR is returned and subsequent
     calls return NO_ERROR, the context reset was encountered and
     completed. If a reset status is repeatedly returned, the context may
     be in the process of resetting."

If the reset takes a long time, it seems that even a well-behaved app 
could run afoul of the 'banned' logic.

> +	    reset_cnt == I915_WEDGED) {
> +		goto out;
> +	}
> +
> +	/* Set guilty/innocent status if only one reset was
> +	 * observed and if only one guilty was found
> +	 */
> +	if ((rs->reset_cnt + 2) == reset_cnt &&
> +	    (rs->guilty_cnt + 1) == dev_priv->gpu_error.guilty_cnt) {

This logic seems... wrong, or at least weird.  "rs->reset_cnt + 2" is 
confusing next to "if only one reset was observed".

dev_priv->gpu_error.reset_counter is the global GPU reset count since 
start-up, and rs->reset_cnt is the global GPU count since start-up when 
the context was created.  Right?

If that's the case, this will cause a context that was completely idle 
(i.e., didn't actually lose anything) to get a reset notification. 
That's an absolute deal breaker.

If that's not the case, then this architecture needs a lot more 
documentation so that people new to it can understand what's happening.

> +		reset_status = 0;
> +
> +		if (rs->guilty)
> +			reset_status |= I915_RESET_BATCH_ACTIVE;
> +
> +		if (rs->innocent)
> +			reset_status |= I915_RESET_BATCH_PENDING;
> +
> +		if (reset_status == 0)
> +			reset_status = I915_RESET_UNKNOWN;
> +	} else if (rs->reset_cnt == reset_cnt) {
> +		reset_status = I915_RESET_NO_ERROR;
> +	}
> +
> +out:
> +	if (!ret)
> +		args->reset_status = reset_status;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +
> +	return ret ? -EINVAL : 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3e11acf..2e5e8e7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1712,6 +1712,8 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   				  struct drm_file *file);
>   int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   				   struct drm_file *file);
> +int i915_gem_context_get_reset_status_ioctl(struct drm_device *dev,
> +					    void *data, struct drm_file *file);
>
>   /* i915_gem_gtt.c */
>   void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev);
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 07d5941..a195e0e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -198,6 +198,7 @@ typedef struct _drm_i915_sarea {
>   #define DRM_I915_GEM_SET_CACHING	0x2f
>   #define DRM_I915_GEM_GET_CACHING	0x30
>   #define DRM_I915_REG_READ		0x31
> +#define DRM_I915_GET_RESET_STATUS	0x32
>
>   #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>   #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> @@ -247,6 +248,7 @@ typedef struct _drm_i915_sarea {
>   #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
>   #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
>   #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
> +#define DRM_IOCTL_I915_GET_RESET_STATUS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATUS, struct drm_i915_reset_status)
>
>   /* Allow drivers to submit batchbuffers directly to hardware, relying
>    * on the security mechanisms provided by hardware.
> @@ -980,4 +982,30 @@ struct drm_i915_reg_read {
>   	__u64 offset;
>   	__u64 val; /* Return value */
>   };
> +
> +/* No reset observed */
> +#define I915_RESET_NO_ERROR      0
> +
> +/* Context had batch processing active while
> +   gpu hung and batch was guilty of gpu hang */
> +#define I915_RESET_BATCH_ACTIVE  (1 << 0)
> +
> +/* Context had batch queued for processing while
> +   reset occurred and guilty batch was found:
> +   I915_RESET_BATCH_ACTIVE was set for this or
> +   some other context */
> +#define I915_RESET_BATCH_PENDING (1 << 1)
> +
> +/* Context observed gpu hung and reset but guilty context
> +   was not found: I915_RESET_BATCH_ACTIVE and
> +   I915_RESET_BATCH_PENDING were not set for any context */
> +#define I915_RESET_UNKNOWN       (1 << 2)
> +
> +struct drm_i915_reset_status {
> +	__u32 ctx_id;
> +	__u32 flags;
> +	__u32 reset_status;
> +	__u32 pad;
> +};
> +
>   #endif /* _UAPI_I915_DRM_H_ */
>

  parent reply	other threads:[~2013-03-18 20:26 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-14 15:52 [PATCH v2 00/16] arb robustness enablers v2 Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 01/16] drm/i915: return context from i915_switch_context() Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 02/16] drm/i915: cleanup i915_add_request Mika Kuoppala
2013-03-15  9:43   ` Chris Wilson
2013-03-14 15:52 ` [PATCH v2 03/16] drm/i915: reference count for i915_hw_contexts Mika Kuoppala
2013-03-15  9:44   ` Chris Wilson
2013-04-02 22:45   ` [PATCH 1/2] [v3] " Ben Widawsky
2013-04-03  1:27     ` Ben Widawsky
2013-04-03 11:56       ` Chris Wilson
2013-04-03 16:37         ` Daniel Vetter
2013-04-02 22:45   ` [PATCH 2/2] drm/i915: Print all contexts in debugfs Ben Widawsky
2013-03-14 15:52 ` [PATCH v2 04/16] drm/i915: Resurrect ring kicking for semaphores, selectively Mika Kuoppala
2013-03-17 21:53   ` Daniel Vetter
2013-03-14 15:52 ` [PATCH v2 05/16] drm/i915: pass seqno to i915_hangcheck_ring_idle Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 06/16] drm/i915: track ring progression using seqnos Mika Kuoppala
2013-03-15  9:48   ` Chris Wilson
2013-03-14 15:52 ` [PATCH v2 07/16] drm/i915: introduce i915_hangcheck_ring_hung Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 08/16] drm/i915: detect hang using per ring hangcheck_score Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 09/16] drm/i915: remove i915_hangcheck_hung Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 10/16] drm/i915: add struct ctx_reset_state Mika Kuoppala
2013-03-15  9:52   ` Chris Wilson
2013-03-17 21:51     ` Daniel Vetter
2013-03-18 20:18   ` Ian Romanick
2013-03-14 15:52 ` [PATCH v2 11/16] drm/i915: add reset_state for hw_contexts Mika Kuoppala
2013-03-15  9:53   ` Chris Wilson
2013-03-14 15:52 ` [PATCH v2 12/16] drm/i915: add batch object and context to i915_add_request() Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 13/16] drm/i915: mark rings which were waiting when hang happened Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 14/16] drm/i915: find guilty batch buffer on ring resets Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 15/16] drm/i915: refuse to submit more batchbuffers from guilty context Mika Kuoppala
2013-03-14 15:52 ` [PATCH v2 16/16] drm/i915: add i915_gem_context_get_reset_status_ioctl Mika Kuoppala
2013-03-15 10:01   ` Chris Wilson
2013-03-18 20:26   ` Ian Romanick [this message]
2013-03-19 12:58     ` Mika Kuoppala
2013-03-19 19:02       ` Ian Romanick
2013-03-19 19:21         ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51477872.2080608@freedesktop.org \
    --to=idr@freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=mika.kuoppala@linux.intel.com \
    --cc=miku@iki.fi \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.