[PATCH 0/8] DPF (GPU l3 parity detection) improvements

From: Ben Widawsky <benjamin.widawsky@intel.com>
To: intel-gfx@lists.freedesktop.org
Cc: bryan.j.bell@intel.com,
	Ben Widawsky <benjamin.widawsky@intel.com>,
	vishnu.venkatesh@intel.com
Subject: [PATCH 0/8] DPF (GPU l3 parity detection) improvements
Date: Thu, 12 Sep 2013 22:28:26 -0700
Message-ID: <1379050122-12774-1-git-send-email-benjamin.widawsky@intel.com>

Since IVB, our driver has supported GPU L3 cacheline remapping for
parity errors. This is known as "DPF", for Dynamic Parity Feature. I am
told such an error is a good predictor of a subsequent error in the
same part of the cache. To address this for workloads requiring
precise and correct data, like GPGPU workloads, the HW has extra space
in the cache which can be dynamically remapped to replace the old,
faulting parts of the cache. I should also note that, to my knowledge,
no such error has actually been seen on either Ivybridge or Haswell in
the wild.
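
As a rough illustration of the remapping idea only, here is a toy model
in C. It is not the hardware behavior and not the i915 code: the sizes,
the struct and the function names are all hypothetical, and the real
cache geometry is not described in this mail. It shows a faulting line
being redirected to spare storage, with remaps refused once the spare
space runs out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes; the real hardware geometry is not given here. */
#define TOY_NUM_LINES  16  /* normal cache lines in the toy model */
#define TOY_NUM_SPARES  4  /* spare lines available for remapping */
#define TOY_NO_REMAP  (-1)

struct toy_l3 {
	int remap[TOY_NUM_LINES]; /* spare index per line, or TOY_NO_REMAP */
	int spares_used;          /* number of spares handed out so far */
};

static void toy_l3_init(struct toy_l3 *l3)
{
	for (size_t i = 0; i < TOY_NUM_LINES; i++)
		l3->remap[i] = TOY_NO_REMAP;
	l3->spares_used = 0;
}

/*
 * Handle a parity error on 'line'. Returns true if the line is backed
 * by a spare afterwards, false if no spare space is left.
 */
static bool toy_l3_remap(struct toy_l3 *l3, int line)
{
	assert(line >= 0 && line < TOY_NUM_LINES);

	if (l3->remap[line] != TOY_NO_REMAP)
		return true;  /* already remapped, nothing to do */
	if (l3->spares_used == TOY_NUM_SPARES)
		return false; /* out of spare space */

	l3->remap[line] = l3->spares_used++;
	return true;
}

/*
 * Resolve a line to its backing storage. In this toy model the spares
 * are numbered directly after the normal lines.
 */
static int toy_l3_resolve(const struct toy_l3 *l3, int line)
{
	assert(line >= 0 && line < TOY_NUM_LINES);

	if (l3->remap[line] == TOY_NO_REMAP)
		return line;
	return TOY_NUM_LINES + l3->remap[line];
}
```

In this sketch, calling toy_l3_remap() on a line that has already been
remapped does not consume a second spare, which matches the intent of
replacing a faulting region once and then leaving it alone.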

Note, and reminder: GPU L3 is not the same thing as "L3." It is a
special (usually incoherent) cache that is only used by certain
components within the GPU.

Included in the patches:
1. Fix HSW test cases previously submitted and bikeshedded by Ville.
2. Support for an extra area of L3 added in certain HSW SKUs.
3. Error injection support from user space, for testing.
4. A reference daemon for listening to the parity error events.

Caveats:
* I've not implemented the "hang" injection. It was not clear to me what it
  does, and I don't really see how it benefits testing the software I have
  written.

* I am currently missing a test which uses the error injection.
  Volunteers who want to help, please raise your hand. If not, I'll get
  to it as soon as possible.

* We do have a race with the udev mechanism of error delivery. If I
  understand the way udev works, if we have more than 1 event before the
  daemon is woken, the properties will give us the failing cache location
  of the last error only. I think this is okay because of the earlier
  statement that a parity error is a good indicator of a future parity
  error. One thing I've not done is track when errors are missed, which
  should be possible even if the location of a missed error can't be
  retrieved.

* There is no way to read out the per context remapping information through
  sysfs. I only expose whether or not a context has outstanding remaps through
  debugfs. This does affect the testability a bit, but the implementation is
  simple enough that I'm not terribly worried.
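
The missed-error tracking mentioned in the udev caveat above could be
sketched roughly as follows. This is a toy model under stated
assumptions, not code from the series and not the udev API: a single
slot holds the most recent error location and is overwritten by each
new error, and a running sequence count lets the listener work out how
many errors it never saw. All struct, field and function names here are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error location; field names are illustrative only. */
struct toy_l3_error {
	uint32_t row, bank, subbank;
};

/* Producer side: one slot which every new error overwrites. */
struct toy_error_slot {
	struct toy_l3_error last; /* location of the most recent error */
	uint64_t seq;             /* total number of errors reported */
};

/* Consumer side: state kept by the listening daemon. */
struct toy_daemon {
	uint64_t seen_seq; /* value of seq at the last wakeup */
	uint64_t missed;   /* errors whose location was lost */
};

static void toy_report_error(struct toy_error_slot *slot,
			     struct toy_l3_error err)
{
	slot->last = err; /* the previous location is lost here */
	slot->seq++;
}

/*
 * Daemon wakeup. Returns the number of errors since the previous
 * wakeup. If that is non-zero, *out receives the location of the most
 * recent one, and all the earlier ones are counted as missed.
 */
static uint64_t toy_daemon_wake(struct toy_daemon *d,
				const struct toy_error_slot *slot,
				struct toy_l3_error *out)
{
	uint64_t fresh = slot->seq - d->seen_seq;

	assert(out != NULL);
	if (fresh == 0)
		return 0;

	*out = slot->last;
	d->missed += fresh - 1;
	d->seen_seq = slot->seq;
	return fresh;
}
```

With two errors reported before a wakeup, the daemon in this sketch
sees only the second location but records one missed error, which is
the information the caveat says should still be recoverable.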

Ben Widawsky (8):
  drm/i915: Remove extra "ring"
  drm/i915: Round l3 parity reads down
  drm/i915: Fix l3 parity user buffer offset
  drm/i915: Fix HSW parity test
  drm/i915: Add second slice l3 remapping
  drm/i915: Make l3 remapping use the ring
  drm/i915: Keep a list of all contexts
  drm/i915: Do remaps for all contexts

 drivers/gpu/drm/i915/i915_debugfs.c     | 23 ++++++---
 drivers/gpu/drm/i915/i915_drv.h         | 13 +++--
 drivers/gpu/drm/i915/i915_gem.c         | 46 +++++++++---------
 drivers/gpu/drm/i915/i915_gem_context.c | 20 +++++++-
 drivers/gpu/drm/i915/i915_irq.c         | 84 +++++++++++++++++++++------------
 drivers/gpu/drm/i915/i915_reg.h         |  6 +++
 drivers/gpu/drm/i915/i915_sysfs.c       | 57 +++++++++++++++-------
 drivers/gpu/drm/i915/intel_ringbuffer.c |  6 +--
 include/uapi/drm/i915_drm.h             |  8 ++--
 9 files changed, 175 insertions(+), 88 deletions(-)

-- 
1.8.4
