From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Cc: Ben Widawsky <ben@bwidawsk.net>,
	Eugeni Dodonov <eugeni.dodonov@intel.com>
Subject: Re: [PATCH 1/1] drm/i915: track first and last processes that touch gem objects
Date: Fri, 03 Feb 2012 18:02:38 +0000
Message-ID: <c55c5d$1s237a@AZSMGA002.ch.intel.com>
In-Reply-To: <1328280205-1509-1-git-send-email-eugeni.dodonov@intel.com>

On Fri,  3 Feb 2012 12:43:25 -0200, Eugeni Dodonov <eugeni.dodonov@intel.com> wrote:
> This allows to hopefully find out who was responsible for the GPU death.
> We record the 1st and last process to touch each object, to keep track of
> the process which created the object originally and the last process to
> touch it.
> 
> To simplify post-mortem analysis, we also search for the processes names
> when gathering the i915_error_state and when peeking at the list of active
> gem objects in debugfs. This is not perfect for tracking all the
> processes, as they can quit or die before their batchbuffers got executed,
> but having to track them during the entire object lifetime would be
> excessively memcpy hungry.

I think you've slightly missed the point here. Tracking who created a
buffer and who last used it is interesting, but you really also need to
track on whose behalf each request (i.e. each batch) is executing.

For the goal of recording creator, you could just use:

  obj->creator = current ? current->pid : 0;

in i915_gem_object_init, with 0 as the special value for objects created by
the driver outside of process context. And similarly for i915_add_request,
though I'd associate those with the owner of the file_priv.  The important
point here is that a buffer may be associated with multiple batches
submitted by one or more clients before a hang is detected, and so unless
the dispatch pid is tracked you do not know who submitted the erroneous
batch. (Even a single batch may be submitted more than once, by many
clients, given sufficient pathology.) So adding the request queue to the
i915_error_state would also be interesting, especially with each request's
jiffies and ring->tail.
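
To make the per-request idea concrete, here is a rough userspace sketch, not
driver code. Every name in it (model_request, model_ring, model_add_request,
model_find_hung_pid, model_dump_requests) is hypothetical and only stands in
for the real request and ring structures. It assumes seqnos increase
monotonically and ignores wraparound.

```c
#include <stdio.h>

#define MODEL_MAX_REQUESTS 16

/* Hypothetical, simplified stand-in for a request on a ring. */
struct model_request {
	unsigned int seqno;
	unsigned long emitted_jiffies;	/* when the request was emitted */
	unsigned int tail;		/* ring tail after emission */
	int pid;			/* submitter; 0 = no process context */
};

/* Hypothetical ring model; zero-initialise before use. */
struct model_ring {
	struct model_request queue[MODEL_MAX_REQUESTS];
	unsigned int count;
	unsigned int next_seqno;
};

/* Queue a request and remember who dispatched it.
 * Returns the new seqno, or 0 if the queue is full. */
static unsigned int model_add_request(struct model_ring *ring, int pid,
				      unsigned long now, unsigned int tail)
{
	struct model_request *rq;

	if (ring->count >= MODEL_MAX_REQUESTS)
		return 0;

	rq = &ring->queue[ring->count++];
	rq->seqno = ++ring->next_seqno;
	rq->emitted_jiffies = now;
	rq->tail = tail;
	rq->pid = pid;
	return rq->seqno;
}

/* Return the pid of the first request that has not completed,
 * i.e. the first one past completed_seqno, or -1 if none. */
static int model_find_hung_pid(const struct model_ring *ring,
			       unsigned int completed_seqno)
{
	unsigned int i;

	for (i = 0; i < ring->count; i++)
		if (ring->queue[i].seqno > completed_seqno)
			return ring->queue[i].pid;
	return -1;
}

/* Print the request queue, as an error-state dump might. */
static void model_dump_requests(const struct model_ring *ring, FILE *out)
{
	unsigned int i;

	for (i = 0; i < ring->count; i++)
		fprintf(out, "seqno %u, pid %d, emitted %lu, tail 0x%08x\n",
			ring->queue[i].seqno, ring->queue[i].pid,
			ring->queue[i].emitted_jiffies, ring->queue[i].tail);
}
```

With the dispatch pid stored per request, the same buffer can appear in
batches from several clients and the dump still names the submitter of the
first incomplete request.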

Also note that there is no direct link between i915_gem_fault() and usage
of the object. The place to add the obj->last_used_by tracking is domain
management, which catches the usage of CPU mappings as well as
move-to-active.
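
As an illustration of hooking the tracking into domain management rather
than the fault handler, here is a small userspace model. The struct, the
domain flags and the model_* functions are all made up for this sketch and
are not the driver's actual interfaces. The assumption is that both a
set-domain call and move-to-active count as "use", and that the creator
counts as the first user.

```c
#include <stdint.h>

/* Hypothetical domain flags, for illustration only. */
#define MODEL_DOMAIN_CPU	0x1u
#define MODEL_DOMAIN_GPU	0x2u

/* Hypothetical, simplified stand-in for a gem object. */
struct model_obj {
	uint32_t read_domains;
	uint32_t write_domain;
	int active;		/* non-zero while referenced by the GPU */
	int creator;		/* pid at creation; 0 = driver context */
	int last_used_by;	/* pid of the most recent user */
};

/* Creation: record the creator once; it is also the first user. */
static void model_object_init(struct model_obj *obj, int pid)
{
	obj->read_domains = MODEL_DOMAIN_CPU;
	obj->write_domain = MODEL_DOMAIN_CPU;
	obj->active = 0;
	obj->creator = pid;
	obj->last_used_by = pid;
}

/* Domain change: covers CPU mappings as well as GPU access. */
static void model_set_domain(struct model_obj *obj, uint32_t read_domains,
			     uint32_t write_domain, int pid)
{
	obj->read_domains = read_domains;
	obj->write_domain = write_domain;
	obj->last_used_by = pid;
}

/* Move-to-active: the object is now part of a submitted batch. */
static void model_move_to_active(struct model_obj *obj, int pid)
{
	obj->active = 1;
	obj->last_used_by = pid;
}
```

The creator field is written once and never updated, while last_used_by
follows every domain transition, so no separate hook in the fault path is
needed in this model.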
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
