* [benjamin.widawsky@intel.com: intel_gpu_top broken for HSW. Ideas needed]
From: Ben Widawsky @ 2013-07-12 17:12 UTC (permalink / raw)
To: Intel GFX; +Cc: mesa-dev
FWD'd from our internal list now that we have more insight.
----- Forwarded message from Ben Widawsky <benjamin.widawsky@intel.com> -----
Date: Thu, 11 Jul 2013 10:32:03 -0700
From: Ben Widawsky <benjamin.widawsky@intel.com>
To: linux-gfx@linux.intel.com
Subject: intel_gpu_top broken for HSW. Ideas needed
Message-ID: <20130711173202.GB8802@intel.com>
Hi everybody.
While investigating a hard hang on Haswell, Eero noticed that
intel_gpu_top helped to trigger the hang faster. I used this in the test
case I sent to validation, and they suspect it is a known issue which we
have not yet worked around (and cannot reasonably work around).
[internal bug sighting redacted]
To sum up, we cannot concurrently access registers within the same
cacheline. It has the potential to hit a known bug.
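A minimal sketch of what "within the same cacheline" means for two register offsets. The 64-byte cacheline size and the example offsets used in the usage note are assumptions for illustration; the thread does not state either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. The thread does not give the size. */
#define CACHELINE_BYTES 64u

static_assert((CACHELINE_BYTES & (CACHELINE_BYTES - 1u)) == 0,
              "cacheline size must be a power of two");

/* Index of the cacheline that contains the given register offset. */
static uint32_t cacheline_of(uint32_t reg_offset)
{
    return reg_offset / CACHELINE_BYTES;
}

/* True if two register offsets fall within the same cacheline. */
static bool same_cacheline(uint32_t reg_a, uint32_t reg_b)
{
    return cacheline_of(reg_a) == cacheline_of(reg_b);
}
```

With hypothetical offsets, same_cacheline(0x2030, 0x2034) is true and same_cacheline(0x2030, 0x2080) is false, so only the first pair would be at risk if two agents touched them at the same time.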
I see some choices:
1. Don't do anything.
2. Try to eliminate shared registers as much as possible. Instdone is
used by the hangcheck, and we can disable hangcheck with a
module parameter. Eero, can you try this as a workaround, btw?
3. Somehow make the kernel collect the top data and serialize access
there.
Anyone else have input? I personally do not use top very much, so I
won't be volunteering to do any of these.
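A rough sketch of choice 3, assuming every register access goes through one owner that holds a lock. This is a userspace model, not kernel code: struct fake_gpu, FAKE_REG_COUNT and all function names are hypothetical, and a plain array stands in for the MMIO registers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_REG_COUNT 64

/* Hypothetical stand-in for the register file plus the lock guarding it. */
struct fake_gpu {
    atomic_flag lock;               /* spinlock serializing all accesses */
    uint32_t regs[FAKE_REG_COUNT];
};

static void fake_gpu_init(struct fake_gpu *gpu)
{
    atomic_flag_clear(&gpu->lock);
    for (size_t i = 0; i < FAKE_REG_COUNT; i++)
        gpu->regs[i] = 0;
}

static void regs_lock(struct fake_gpu *gpu)
{
    /* Spin until the flag was previously clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set_explicit(&gpu->lock, memory_order_acquire))
        ;
}

static void regs_unlock(struct fake_gpu *gpu)
{
    atomic_flag_clear_explicit(&gpu->lock, memory_order_release);
}

/* Single write path: no caller touches regs[] without the lock. */
static void reg_write(struct fake_gpu *gpu, size_t idx, uint32_t val)
{
    assert(idx < FAKE_REG_COUNT);
    regs_lock(gpu);
    gpu->regs[idx] = val;
    regs_unlock(gpu);
}

/* Single read path, serialized against writes and other reads. */
static uint32_t reg_read(struct fake_gpu *gpu, size_t idx)
{
    assert(idx < FAKE_REG_COUNT);
    regs_lock(gpu);
    uint32_t val = gpu->regs[idx];
    regs_unlock(gpu);
    return val;
}

/* Collect everything a monitoring tool wants in one locked pass. */
static void sample_registers(struct fake_gpu *gpu, const size_t *idx,
                             size_t n, uint32_t *out)
{
    regs_lock(gpu);
    for (size_t i = 0; i < n; i++) {
        assert(idx[i] < FAKE_REG_COUNT);
        out[i] = gpu->regs[idx[i]];
    }
    regs_unlock(gpu);
}
```

The point of the design is that a tool like intel_gpu_top would ask for a snapshot through sample_registers() instead of reading the registers itself, so its reads can never overlap with the driver's.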
----- End forwarded message -----
--
Ben Widawsky, Intel Open Source Technology Center
* Re: [benjamin.widawsky@intel.com: intel_gpu_top broken for HSW. Ideas needed]
From: Daniel Vetter @ 2013-07-12 17:16 UTC (permalink / raw)
To: Ben Widawsky; +Cc: mesa-dev, Intel GFX
For now I'd just vote for a warning on gen6+ on the intel_gpu_top
screen that this might hang the hw. If anyone cares we could add a debugfs
interface (or finally get real approval for the performance counters
the hw has and expose them properly). Not an intel_gpu_top user myself
though.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
* Re: [benjamin.widawsky@intel.com: intel_gpu_top broken for HSW. Ideas needed]
From: Ben Widawsky @ 2013-07-12 17:27 UTC (permalink / raw)
To: Daniel Vetter; +Cc: mesa-dev, Intel GFX, Tamminen, Eero T, Ben Widawsky
Eero: I meant to add, by the way, that ring head/tail are used as much as
instdone, so maybe we can get rid of those reads for the ring fullness check.
We're *very* likely to hit that one.
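One way to picture a ring fullness check that does not read the head/tail registers is to compute free space from values the software already holds. This is only a sketch under assumptions: struct ring_state, RING_GAP_BYTES and ring_space() are hypothetical names, and it presumes the head position can be learned from some other source (for example, completed requests) rather than from the register.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bytes kept unused so a full ring never looks like an empty one. */
#define RING_GAP_BYTES 8

/* Hypothetical software-side view of a ring; nothing here touches hardware. */
struct ring_state {
    uint32_t size;  /* ring size in bytes */
    uint32_t head;  /* cached consumer offset, not read from the register */
    uint32_t tail;  /* producer offset, which the software writes itself */
};

/* Free bytes available to the producer, using cached values only. */
static uint32_t ring_space(const struct ring_state *r)
{
    assert(r->size > 0 && r->head < r->size && r->tail < r->size);

    int64_t space = (int64_t)r->head - (int64_t)r->tail - RING_GAP_BYTES;
    if (space < 0)
        space += r->size;   /* tail is ahead of head: wrap around */
    if (space < 0)
        space = 0;          /* tail sits inside the reserved gap */
    return (uint32_t)space;
}
```

A stale cached head only makes the estimate pessimistic (less free space than is really available), so the check stays safe at the cost of occasionally waiting longer than necessary.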
--
Ben Widawsky, Intel Open Source Technology Center
* Re: [benjamin.widawsky@intel.com: intel_gpu_top broken for HSW. Ideas needed]
From: Ben Widawsky @ 2013-07-12 17:35 UTC (permalink / raw)
To: Ben Widawsky; +Cc: mesa-dev, Intel GFX
BTW, of course any tool which reads or writes registers is subject to
the same problem. GPU top is just the one that kind of depends upon us
not synchronizing with the kernel.
--
Ben Widawsky, Intel Open Source Technology Center
* Re: [benjamin.widawsky@intel.com: intel_gpu_top broken for HSW. Ideas needed]
From: Mika Kuoppala @ 2013-07-15 9:42 UTC (permalink / raw)
To: Ben Widawsky, Intel GFX; +Cc: mesa-dev
Ben Widawsky <benjamin.widawsky@intel.com> writes:
> I see some choices:
> 1. Don't do anything.
> 2. Try to eliminate shared registers as much as possible. Instdone is
> used by the hangcheck, and we can eliminate hangcheck with a
> module parameter. Eero, can you try this as a workaround, btw?
Commit 92cab7345131db7af18f630a799ce6b2e8e624c5 gets rid of
instdone in hangcheck.
-Mika