From: Ben Widawsky <ben@bwidawsk.net>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: intel-gfx@lists.freedesktop.org, vishnu.venkatesh@intel.com,
bryan.j.bell@intel.com,
Ben Widawsky <benjamin.widawsky@intel.com>
Subject: Re: [PATCH 15/16] intel_l3_parity: Support error injection
Date: Fri, 13 Sep 2013 08:54:52 -0700 [thread overview]
Message-ID: <20130913155452.GA17672@bwidawsk.net> (raw)
In-Reply-To: <20130913091211.GF5459@phenom.ffwll.local>
On Fri, Sep 13, 2013 at 11:12:11AM +0200, Daniel Vetter wrote:
> On Thu, Sep 12, 2013 at 10:28:41PM -0700, Ben Widawsky wrote:
> > Haswell added the ability to inject errors which is extremely useful for
> > testing. Add two arguments to the tool to inject, and uninject.
> >
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
>
> Do we run any risk that a concurrent write/read to the same register range
> could hang the machine due to the same-cacheline w/a we need? Just want to
> make sure that when we integrate this into a testcase there's no surprises
> like with intel_gpu_top ...
> -Daniel
The race against the kernel is ever present on all tests/tools. Are we
running parallel igt yet? If so, I can make the read/write functions
threadsafe.
On this note in particular I suppose we can make a debugfs entry like
the forcewake one to allow user space to do register accesses.
Interestingly, this also reminds me of another caveat I meant to put in
the commit message and forgot... the error injection register is also
per context, which makes it a pain to clear (and the pain in writing the
test case). I'm even beginning to think maybe a debugfs for this
register is the way to go.
As a side note, the injection feature is entirely debug only - but
agreed, random hangs in the test suite is not good.
[snip]
>
> > ---
> > tests/sysfs_l3_parity | 2 +-
> > tools/intel_l3_parity.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++--
> > 2 files changed, 68 insertions(+), 3 deletions(-)
> >
> > diff --git a/tests/sysfs_l3_parity b/tests/sysfs_l3_parity
> > index a0dfad9..e9d4411 100755
> > --- a/tests/sysfs_l3_parity
> > +++ b/tests/sysfs_l3_parity
> > @@ -21,7 +21,7 @@ fi
> > $SOURCE_DIR/../tools/intel_l3_parity -r 0 -b 0 -s 0 -e
> >
> > #Check that we can clear remaps
> > -if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -c` != "0" ] ; then
> > +if [ `$SOURCE_DIR/../tools/intel_l3_parity -l | wc -l` != 1 ] ; then
> > echo "Fail 2"
> > exit 1
> > fi
> > diff --git a/tools/intel_l3_parity.c b/tools/intel_l3_parity.c
> > index cf15541..cd6754e 100644
> > --- a/tools/intel_l3_parity.c
> > +++ b/tools/intel_l3_parity.c
> > @@ -79,6 +79,20 @@ static int which_slice = -1;
> > (__i) < ((which_slice == -1) ? MAX_SLICES : (which_slice + 1)); \
> > (__i)++)
> >
> > +static void decode_dft(uint32_t dft)
> > +{
> > + if (IS_IVYBRIDGE(devid) || !(dft & 1)) {
> > + printf("Error injection disabled\n");
> > + return;
> > + }
> > + printf("Error injection enabled\n");
> > + printf(" Hang = %s\n", (dft >> 28) & 0x1 ? "yes" : "no");
> > + printf(" Row = %d\n", (dft >> 7) & 0x7ff);
> > + printf(" Bank = %d\n", (dft >> 2) & 0x3);
> > + printf(" Subbank = %d\n", (dft >> 4) & 0x7);
> > + printf(" Slice = %d\n", (dft >> 1) & 0x1);
> > +}
> > +
> > static void dumpit(int slice)
> > {
> > int i, j;
> > @@ -150,7 +164,9 @@ static void usage(const char *name)
> > " -l, --list List the current L3 logs\n"
> > " -a, --clear-all Clear all disabled rows\n"
> > " -e, --enable Enable row, bank, subbank (undo -d)\n"
> > - " -d, --disable=<row,bank,subbank> Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead\n",
> > + " -d, --disable=<row,bank,subbank> Disable row, bank, subbank (inline arguments are deprecated. Please use -r, -b, -s instead\n"
> > + " -i, --inject [HSW only] Cause hardware to inject a row errors\n"
> > + " -u, --uninject [HSW only] Turn off hardware error injectection (undo -i)\n",
> > name);
> > }
> >
> > @@ -158,6 +174,7 @@ int main(int argc, char *argv[])
> > {
> > const int device = drm_get_card();
> > char *path[REAL_MAX_SLICES];
> > + uint32_t dft;
> > int row = 0, bank = 0, sbank = 0;
> > int fd[REAL_MAX_SLICES] = {0}, ret, i;
> > int action = '0';
> > @@ -167,6 +184,8 @@ int main(int argc, char *argv[])
> > if (intel_gen(devid) < 7)
> > exit(EXIT_SUCCESS);
> >
> > + assert(intel_register_access_init(intel_get_pci_device(), 0) == 0);
> > +
> > ret = asprintf(&path[0], "/sys/class/drm/card%d/l3_parity", device);
> > assert(ret != -1);
> > ret = asprintf(&path[1], "/sys/class/drm/card%d/l3_parity_slice_1", device);
> > @@ -183,6 +202,7 @@ int main(int argc, char *argv[])
> > assert(lseek(fd[i], 0, SEEK_SET) == 0);
> > }
> >
> > + dft = intel_register_read(0xb038);
> >
> > while (1) {
> > int c, option_index = 0;
> > @@ -192,6 +212,8 @@ int main(int argc, char *argv[])
> > { "clear-all", no_argument, 0, 'a' },
> > { "enable", no_argument, 0, 'e' },
> > { "disable", optional_argument, 0, 'd' },
> > + { "inject", no_argument, 0, 'i' },
> > + { "uninject", no_argument, 0, 'u' },
> > { "hw-info", no_argument, 0, 'H' },
> > { "row", required_argument, 0, 'r' },
> > { "bank", required_argument, 0, 'b' },
> > @@ -200,7 +222,7 @@ int main(int argc, char *argv[])
> > {0, 0, 0, 0}
> > };
> >
> > - c = getopt_long(argc, argv, "hHr:b:s:w:aled::", long_options,
> > + c = getopt_long(argc, argv, "hHr:b:s:w:aled::iu", long_options,
> > &option_index);
> > if (c == -1)
> > break;
> > @@ -215,6 +237,7 @@ int main(int argc, char *argv[])
> > printf("Number of banks: %d\n", num_banks());
> > printf("Subbanks per bank: %d\n", NUM_SUBBANKS);
> > printf("Max L3 size: %dK\n", L3_SIZE >> 10);
> > + printf("Has error injection: %s\n", IS_HASWELL(devid) ? "yes" : "no");
> > exit(EXIT_SUCCESS);
> > case 'r':
> > row = atoi(optarg);
> > @@ -236,6 +259,12 @@ int main(int argc, char *argv[])
> > if (which_slice >= MAX_SLICES)
> > exit(EXIT_FAILURE);
> > break;
> > + case 'i':
> > + case 'u':
> > + if (!IS_HASWELL(devid)) {
> > + fprintf(stderr, "Error injection supported on HSW+ only\n");
> > + exit(EXIT_FAILURE);
> > + }
> > case 'd':
> > if (optarg) {
> > ret = sscanf(optarg, "%d,%d,%d", &row, &bank, &sbank);
> > @@ -256,6 +285,23 @@ int main(int argc, char *argv[])
> > }
> > }
> >
> > + if (action == 'i') {
> > + if (((dft >> 1) & 1) != which_slice) {
> > + fprintf(stderr, "DFT register already has slice %d enabled, and we don't support multiple slices. Try modifying -w; but sometimes the register sticks in the wrong way\n", (dft >> 1) & 1);
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + if (which_slice == -1) {
> > + fprintf(stderr, "Cannot inject errors to multiple slices (modify -w)\n");
> > + exit(EXIT_FAILURE);
> > + }
> > + if (dft & 1 && ((dft >> 1) && 1) == which_slice)
> > + printf("warning: overwriting existing injections. This is very dangerous.\n");
> > + }
> > +
> > + if (action == 'l')
> > + decode_dft(dft);
> > +
> > /* Per slice operations */
> > for_each_slice(i) {
> > switch (action) {
> > @@ -271,11 +317,30 @@ int main(int argc, char *argv[])
> > case 'd':
> > assert(disable_rbs(row, bank, sbank, i) == 0);
> > break;
> > + case 'i':
> > + if (bank == 3) {
> > + fprintf(stderr, "The hardware does not support error inject on bank 3.\n");
> > + exit(EXIT_FAILURE);
> > + }
> > + dft |= row << 7;
> > + dft |= sbank << 4;
> > + dft |= bank << 2;
> > + assert(i < 2);
> > + dft |= i << 1; /* slice */
> > + dft |= 1 << 0; /* enable */
> > + intel_register_write(0xb038, dft);
> > + break;
> > + case 'u':
> > + intel_register_write(0xb038, dft & ~(1<<0));
> > + break;
> > + case 'L':
> > + break;
> > default:
> > abort();
> > }
> > }
> >
> > + intel_register_access_fini();
> > if (action == 'l')
> > exit(EXIT_SUCCESS);
> >
> > --
> > 1.8.4
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
Ben Widawsky, Intel Open Source Technology Center
next prev parent reply other threads:[~2013-09-13 15:55 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-09-13 5:28 [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ben Widawsky
2013-09-13 5:28 ` [PATCH 1/8] drm/i915: Remove extra "ring" Ben Widawsky
2013-09-13 5:28 ` [PATCH 2/8] drm/i915: Round l3 parity reads down Ben Widawsky
2013-09-13 5:28 ` [PATCH 3/8] drm/i915: Fix l3 parity user buffer offset Ben Widawsky
2013-09-13 12:56 ` Daniel Vetter
2013-09-13 5:28 ` [PATCH 4/8] drm/i915: Fix HSW parity test Ben Widawsky
2013-09-13 8:17 ` Ville Syrjälä
2013-09-13 5:28 ` [PATCH 5/8] drm/i915: Add second slice l3 remapping Ben Widawsky
2013-09-13 9:38 ` Ville Syrjälä
2013-09-17 18:45 ` Ben Widawsky
2013-09-17 18:51 ` Bell, Bryan J
2013-09-17 19:02 ` Ville Syrjälä
2013-09-17 19:08 ` Bell, Bryan J
2013-09-13 5:28 ` [PATCH 6/8] drm/i915: Make l3 remapping use the ring Ben Widawsky
2013-09-13 16:16 ` Daniel Vetter
2013-09-13 5:28 ` [PATCH 7/8] drm/i915: Keep a list of all contexts Ben Widawsky
2013-09-13 5:28 ` [PATCH 8/8] drm/i915: Do remaps for " Ben Widawsky
2013-09-13 9:17 ` Ville Syrjälä
2013-09-13 9:20 ` Ville Syrjälä
2013-09-17 20:42 ` Ben Widawsky
2013-09-13 5:28 ` [PATCH 09/16] intel_l3_parity: Fix indentation Ben Widawsky
2013-09-13 5:28 ` [PATCH 10/16] intel_l3_parity: Assert all GEN7+ support Ben Widawsky
2013-09-16 18:18 ` Bell, Bryan J
2013-09-17 23:52 ` Ben Widawsky
2013-09-17 23:59 ` Ben Widawsky
2013-09-13 5:28 ` [PATCH 11/16] intel_l3_parity: Use getopt for the l3 parity tool Ben Widawsky
2013-09-13 5:28 ` [PATCH 12/16] intel_l3_parity: Hardware info argument Ben Widawsky
2013-09-13 5:28 ` [PATCH 13/16] intel_l3_parity: slice support Ben Widawsky
2013-09-13 5:28 ` [PATCH 14/16] intel_l3_parity: Actually support multiple slices Ben Widawsky
2013-09-13 5:28 ` [PATCH 15/16] intel_l3_parity: Support error injection Ben Widawsky
2013-09-13 9:12 ` Daniel Vetter
2013-09-13 15:54 ` Ben Widawsky [this message]
2013-09-13 16:14 ` Daniel Vetter
2013-09-13 16:29 ` Ben Widawsky
2013-09-13 5:28 ` [PATCH 16/16] intel_l3_parity: Support a daemonic mode Ben Widawsky
2013-09-13 9:44 ` [PATCH 0/8] DPF (GPU l3 parity detection) improvements Ville Syrjälä
2013-09-17 0:52 ` Bell, Bryan J
2013-09-17 4:15 ` Ben Widawsky
2013-09-17 7:27 ` Daniel Vetter
2013-09-17 18:23 ` Bell, Bryan J
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130913155452.GA17672@bwidawsk.net \
--to=ben@bwidawsk.net \
--cc=benjamin.widawsky@intel.com \
--cc=bryan.j.bell@intel.com \
--cc=daniel@ffwll.ch \
--cc=intel-gfx@lists.freedesktop.org \
--cc=vishnu.venkatesh@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.