5 bugs

* 5 bugs
@ 2011-06-16  1:10 Bryce Harrington
  2011-06-16 11:37 ` Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Bryce Harrington @ 2011-06-16  1:10 UTC (permalink / raw)
  To: Alt, Maxim; +Cc: intel-gfx

Hi Max,

I currently am tracking 6 bug reports with the intel driver so far for
the oneiric development cycle, of which 5 have been forwarded upstream:

  https://bugs.freedesktop.org/show_bug.cgi?id=36515
  https://bugs.freedesktop.org/show_bug.cgi?id=37393
  https://bugs.freedesktop.org/show_bug.cgi?id=37526
  https://bugs.freedesktop.org/show_bug.cgi?id=28798
  https://bugs.freedesktop.org/show_bug.cgi?id=38191

The first two would be the higher priorities; I've seen several other
bug reports with similar symptoms/error codes.  These are mainly
leftover priorities from natty; the deluge of oneiric bug reports
probably won't start for a couple more weeks.

There's also one mesa 7.10.2 bug report showing up on my list right now:

  https://bugs.freedesktop.org/show_bug.cgi?id=35234 (lp: #35234)

That's reported when using mutter, which we're not really focused on,
but the bug looks tractable (stacktrace / assert triggered).

Thanks,
Bryce

* Re: 5 bugs
  2011-06-16  1:10 5 bugs Bryce Harrington
@ 2011-06-16 11:37 ` Chris Wilson
  2011-06-16 22:46   ` Bryce Harrington
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2011-06-16 11:37 UTC (permalink / raw)
  To: Bryce Harrington, Alt, Maxim; +Cc: intel-gfx

On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> Hi Max,
> 
> I currently am tracking 6 bug reports with the intel driver so far for
> the oneiric development cycle, of which 5 have been forwarded upstream:
> 
>   https://bugs.freedesktop.org/show_bug.cgi?id=36515

This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
thought we had beaten into submission. The other reports provide more
circumstantial evidence to suggest that the hang coincides with a hotplug
event. I think the cause is a race between the kernel turning the pipe off
in response to the hotplug and reprobe, and that uevent reaching the ddx.
In the meantime, we've queued another video frame to execute on the dead
pipe. Worse, we may have queued it long before the hotplug event, and due
to buffering in the GPU command stream it only gets executed afterwards.
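A toy model may make the ordering problem easier to see. This is purely illustrative and not driver code: `ToyGpu`, its methods and the pipe names are hypothetical, and the ring is reduced to a queue of (command, pipe) pairs. It shows that a WAIT_FOR_EVENT queued while pipe B was still live stalls the ring if the pipe is switched off before the GPU reaches that command.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional, Tuple


@dataclass
class ToyGpu:
    """Hypothetical model: one command ring plus per-pipe enable state."""

    pipe_enabled: Dict[str, bool] = field(default_factory=lambda: {"A": True, "B": True})
    ring: Deque[Tuple[str, Optional[str]]] = field(default_factory=deque)

    def queue_wait_for_vblank(self, pipe: str) -> None:
        # The ddx emits a WAIT_FOR_EVENT that only completes on that pipe's vblank.
        self.ring.append(("WAIT_FOR_EVENT", pipe))

    def queue_blit(self) -> None:
        self.ring.append(("BLIT", None))

    def disable_pipe(self, pipe: str) -> None:
        # The kernel turns the pipe off (e.g. on hotplug) without inspecting the ring.
        self.pipe_enabled[pipe] = False

    def execute(self) -> str:
        """Drain the ring in order; report where it stalls, if anywhere."""
        while self.ring:
            op, pipe = self.ring[0]
            if op == "WAIT_FOR_EVENT" and pipe is not None and not self.pipe_enabled[pipe]:
                # A disabled pipe never produces the awaited event, so the ring stops here.
                return f"hung at WAIT_FOR_EVENT on pipe {pipe}"
            self.ring.popleft()
        return "idle"


if __name__ == "__main__":
    gpu = ToyGpu()
    gpu.queue_wait_for_vblank("B")  # video frame queued while pipe B is still live
    gpu.queue_blit()
    gpu.disable_pipe("B")           # hotplug handling switches pipe B off first
    print(gpu.execute())            # hung at WAIT_FOR_EVENT on pipe B
```

Swapping the last two calls (executing before disabling) leaves the ring idle; the hang depends entirely on the pipe going away while the wait is still buffered.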

commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Nov 13 09:49:11 2010 +0000

    drm/i915: Retire any pending operations on the old scanout when switching

Handles the case where we are changing modes. Unfortunately, disabling an
output takes a different path, though I think we can apply a similar big
hammer approach there as well.
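The "big hammer" for the disable path could look roughly like the sketch below: before an output's pipe is switched off, wait for the GPU to retire everything already queued, so nothing is left to execute against the dead pipe. This is an illustration under assumptions, not the i915 implementation: `ToyCrtc`, `wait_for_idle` and the command strings are hypothetical, and waiting is modelled as simply draining the queue.

```python
from collections import deque
from typing import Deque, List


class ToyCrtc:
    """Hypothetical model of one pipe and the GPU commands still queued for it."""

    def __init__(self) -> None:
        self.enabled = True
        self.pending: Deque[str] = deque()

    def queue(self, cmd: str) -> None:
        self.pending.append(cmd)

    def wait_for_idle(self) -> int:
        # Big hammer: let the GPU retire everything while the pipe is still live.
        retired = len(self.pending)
        self.pending.clear()
        return retired

    def disable(self, retire_first: bool) -> List[str]:
        """Switch the pipe off; return commands that will now run against a dead pipe."""
        if retire_first:
            self.wait_for_idle()
        self.enabled = False
        return list(self.pending)


if __name__ == "__main__":
    for retire_first in (False, True):
        crtc = ToyCrtc()
        crtc.queue("WAIT_FOR_EVENT pipe B")
        print(retire_first, crtc.disable(retire_first))
```

The cost is a stall on every output disable, which is what makes it a big hammer rather than a targeted fix.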

>   https://bugs.freedesktop.org/show_bug.cgi?id=37393

Doesn't look to be the typical unresolved VGA on Arrandale failure.
At first glance it suggests the misreporting of DPMS issue.

>   https://bugs.freedesktop.org/show_bug.cgi?id=37526

drm "hotplug" polling claims another victim. Daniel last volunteered to
fix drm locking, which will help the general symptoms of a stall every
10s, but doesn't root cause the blanking.

>   https://bugs.freedesktop.org/show_bug.cgi?id=28798

I've fixed all the rendering issues I could find for sna. The best
solution I can see to fix uxa is to remove the attempts to get it to use
the 3D pipeline for core rendering routines.

>   https://bugs.freedesktop.org/show_bug.cgi?id=38191

Tiling artefacts. I haven't investigated ff5 yet, so I've no idea where
the issue lies or whether it is a recurrence of a much older GM45 tiling
bug under compiz.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: 5 bugs
  2011-06-16 11:37 ` Chris Wilson
@ 2011-06-16 22:46   ` Bryce Harrington
  2011-06-16 23:12     ` Chris Wilson
  0 siblings, 1 reply; 5+ messages in thread
From: Bryce Harrington @ 2011-06-16 22:46 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > Hi Max,
> > 
> > I currently am tracking 6 bug reports with the intel driver so far for
> > the oneiric development cycle, of which 5 have been forwarded upstream:
> > 
> >   https://bugs.freedesktop.org/show_bug.cgi?id=36515
> 
> This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
> thought we had beaten into submission. The other reports provide more
> circumstantial evidence to suggest that the hang coincides with a hotplug
> event. I think the cause is a race between the kernel turning the pipe off
> in response to the hotplug and reprobe, and that uevent reaching the ddx.
> In the meantime, we've queued another video frame to execute on the dead
> pipe. Worse, we may have queued it long before the hotplug event, and due
> to buffering in the GPU command stream it only gets executed afterwards.
> 
> commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sat Nov 13 09:49:11 2010 +0000
> 
>     drm/i915: Retire any pending operations on the old scanout when switching
> 
> Handles the case where we are changing modes. Unfortunately, disabling an
> output takes a different path, though I think we can apply a similar big
> hammer approach there as well.

As luck would have it, my own i965 laptop locked up today with what I
guess is this same bug.  IPEHR=0x01820000
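The IPEHR value can be split by hand. The sketch below assumes the i915 `MI_INSTR(opcode, flags)` encoding, `((opcode) << 23) | (flags)`, with bits 31:29 selecting the command client (0 for MI commands); the opcode table covers only a few entries and `decode_header` is a hypothetical name. For 0x01820000 it yields MI opcode 0x03, MI_WAIT_FOR_EVENT, which fits the WAIT_EVENT theory; the remaining flag bits select which event is awaited and are deliberately not decoded here.

```python
from typing import Dict

# A few MI opcodes, following the MI_INSTR(opcode, flags) definitions in i915_reg.h.
MI_OPCODES: Dict[int, str] = {
    0x00: "MI_NOOP",
    0x02: "MI_USER_INTERRUPT",
    0x03: "MI_WAIT_FOR_EVENT",
    0x04: "MI_FLUSH",
    0x0A: "MI_BATCH_BUFFER_END",
}


def decode_header(dword: int) -> Dict[str, object]:
    """Split a 32-bit command header (e.g. the IPEHR value) into its fields."""
    client = (dword >> 29) & 0x7  # bits 31:29: command client, 0 = MI
    fields: Dict[str, object] = {"client": client}
    if client == 0:
        opcode = (dword >> 23) & 0x3F  # bits 28:23: MI opcode
        fields["opcode"] = opcode
        fields["name"] = MI_OPCODES.get(opcode, "unknown")
        fields["flags"] = dword & 0x7FFFFF  # bits 22:0: command-specific flags
    return fields


if __name__ == "__main__":
    print(decode_header(0x01820000))
```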

Before I restart it, is there any data which could be gathered that
would assist you?

Otherwise, I can boot and test the patch you posted to the bug.

One of the difficulties with this type of bug is that it's so
intermittent and uncertain to reproduce (and so easily confused with
other unrelated freezes), that it's hard to tell for certain if a given
patch has definitively helped the situation.  Do you have suggestions on
ways of measuring this better, or techniques to help in triggering the
bug more reliably?

> >   https://bugs.freedesktop.org/show_bug.cgi?id=37393
> 
> Doesn't look to be the typical unresolved VGA on Arrandale failure.
> At first glance it suggests the misreporting of DPMS issue.
> 
> >   https://bugs.freedesktop.org/show_bug.cgi?id=37526
> 
> drm "hotplug" polling claims another victim. Daniel last volunteered to
> fix drm locking, which will help the general symptoms of a stall every
> 10s, but doesn't root cause the blanking.
> 
> >   https://bugs.freedesktop.org/show_bug.cgi?id=28798
> 
> I've fixed all the rendering issues I could find for sna. The best
> solution I can see to fix uxa is to remove the attempts to get it to use
> the 3D pipeline for core rendering routines.
> 
> >   https://bugs.freedesktop.org/show_bug.cgi?id=38191
> 
> Tiling artefacts. I haven't investigated ff5 yet, so I've no idea where
> the issue lies or whether it is a recurrence of a much older GM45 tiling
> bug under compiz.

Thanks.  Let me know if there are actions you need me to take on any of
these.

Bryce

* Re: 5 bugs
  2011-06-16 22:46   ` Bryce Harrington
@ 2011-06-16 23:12     ` Chris Wilson
  2011-06-16 23:54       ` Bryce Harrington
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2011-06-16 23:12 UTC (permalink / raw)
  To: Bryce Harrington; +Cc: intel-gfx

On Thu, 16 Jun 2011 15:46:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> > On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > > Hi Max,
> > > 
> > > I currently am tracking 6 bug reports with the intel driver so far for
> > > the oneiric development cycle, of which 5 have been forwarded upstream:
> > > 
> > >   https://bugs.freedesktop.org/show_bug.cgi?id=36515
> > 
> > This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
> > thought we had beaten into submission. The other reports provide more
> > circumstantial evidence to suggest that the hang coincides with a hotplug
> > event. I think the cause is a race between the kernel turning the pipe off
> > in response to the hotplug and reprobe, and that uevent reaching the ddx.
> > In the meantime, we've queued another video frame to execute on the dead
> > pipe. Worse, we may have queued it long before the hotplug event, and due
> > to buffering in the GPU command stream it only gets executed afterwards.
> > 
> > commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Sat Nov 13 09:49:11 2010 +0000
> > 
> >     drm/i915: Retire any pending operations on the old scanout when switching
> > 
> > Handles the case where we are changing modes. Unfortunately, disabling an
> > output takes a different path, though I think we can apply a similar big
> > hammer approach there as well.
> 
> As luck would have it, my own i965 laptop locked up today with what I
> guess is this same bug.  IPEHR=0x01820000
> 
> Before I restart it, is there any data which could be gathered that
> would assist you?

My theory is based upon this still being a WAIT_EVENT on a disabled pipe.
The error state should support this if the DSP*CNTR is disabled for the
pipe we are waiting on. But the other observation to make is whether you
know if a modeset happened at around the same time as the hang.
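Checking the error state for this comes down to one bit per plane. A minimal sketch, assuming the plane control register values (DSPACNTR, DSPBCNTR) have been copied out of the error state by hand, that `DISPLAY_PLANE_ENABLE` is bit 31 as in i915_reg.h, and that plane A drives pipe A and plane B drives pipe B. The helper names and example register values are hypothetical, and which pipe the ring was waiting on has to be taken from the WAIT_FOR_EVENT flags.

```python
from typing import Dict

DISPLAY_PLANE_ENABLE = 1 << 31  # enable bit in DSPACNTR / DSPBCNTR


def plane_enabled(dspcntr: int) -> bool:
    return bool(dspcntr & DISPLAY_PLANE_ENABLE)


def waiting_on_dead_plane(waited_pipe: str, dspcntr: Dict[str, int]) -> bool:
    """True if the plane on the pipe the ring is waiting for is switched off."""
    return not plane_enabled(dspcntr[waited_pipe])


if __name__ == "__main__":
    # Hypothetical values: plane A enabled, plane B disabled.
    regs = {"A": 0x98000000, "B": 0x18000000}
    print(waiting_on_dead_plane("B", regs))
```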

> 
> Otherwise, I can boot and test the patch you posted to the bug.

I'm confident that that patch closes another window for the bug. I'm
less confident that that's the only race condition we have.

> One of the difficulties with this type of bug is that it's so
> intermittent and uncertain to reproduce (and so easily confused with
> other unrelated freezes), that it's hard to tell for certain if a given
> patch has definitively helped the situation.  Do you have suggestions on
> ways of measuring this better, or techniques to help in triggering the
> bug more reliably?

If I am right, then we have two paths that cause WAIT_FOR_EVENT:
windowed swapbuffers (or sub_copy_swap) and video. So playing a number
of video streams in parallel with looping xrandr mode changes, in
particular disabling outputs, should increase the likelihood of the bug.
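A stress loop along those lines might be sketched as follows. It only drives the xrandr half, using the standard `--output NAME --off` and `--output NAME --auto` options; the output name "VGA1", the cycle count and the delay are placeholders for the machine under test, and the video players are assumed to be started separately. With `dry_run=True` (the default) it only prints the commands.

```python
import subprocess
import time
from typing import List


def mode_cycle_commands(output: str, cycles: int) -> List[List[str]]:
    """xrandr invocations that repeatedly disable and re-enable one output."""
    cmds: List[List[str]] = []
    for _ in range(cycles):
        cmds.append(["xrandr", "--output", output, "--off"])
        cmds.append(["xrandr", "--output", output, "--auto"])
    return cmds


def run(cmds: List[List[str]], delay_s: float = 2.0, dry_run: bool = True) -> None:
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
            continue
        # check=False: keep cycling even if one xrandr call fails.
        subprocess.run(cmd, check=False)
        time.sleep(delay_s)  # give the ddx time to react before the next change


if __name__ == "__main__":
    # Start several video players first (not shown), then run with dry_run=False.
    run(mode_cycle_commands("VGA1", cycles=3))
```

Disabling the output, rather than just switching modes, is what exercises the separate disable path mentioned earlier in the thread.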
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: 5 bugs
  2011-06-16 23:12     ` Chris Wilson
@ 2011-06-16 23:54       ` Bryce Harrington
  0 siblings, 0 replies; 5+ messages in thread
From: Bryce Harrington @ 2011-06-16 23:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Jun 17, 2011 at 12:12:16AM +0100, Chris Wilson wrote:
> On Thu, 16 Jun 2011 15:46:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> > > On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > > >   https://bugs.freedesktop.org/show_bug.cgi?id=36515
> > > 
> > > This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
> > > thought we had beaten into submission. The other reports provide more
> > > circumstantial evidence to suggest that the hang coincides with a hotplug
> > > event. I think the cause is a race between the kernel turning the pipe off
> > > in response to the hotplug and reprobe, and that uevent reaching the ddx.
> > > In the meantime, we've queued another video frame to execute on the dead
> > > pipe. Worse, we may have queued it long before the hotplug event, and due
> > > to buffering in the GPU command stream it only gets executed afterwards.
> > > 
> > > commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Sat Nov 13 09:49:11 2010 +0000
> > > 
> > >     drm/i915: Retire any pending operations on the old scanout when switching
> > > 
> > > Handles the case where we are changing modes. Unfortunately, disabling an
> > > output takes a different path, though I think we can apply a similar big
> > > hammer approach there as well.
> > 
> > As luck would have it, my own i965 laptop locked up today with what I
> > guess is this same bug.  IPEHR=0x01820000
> > 
> > Before I restart it, is there any data which could be gathered that
> > would assist you?
> 
> My theory is based upon this still being a WAIT_EVENT on a disabled pipe.
> The error state should support this if the DSP*CNTR is disabled for the
> pipe we are waiting on. But the other observation to make is whether you
> know if a modeset happened at around the same time as the hang.

The hang occurred while the system was preparing for sleep, triggered by
a lid close event.

From my kern.log:

Jun 14 23:40:40 lynmouth kernel: [511433.780066] tg3 0000:08:00.0: eth0: Link is down
Jun 14 23:40:41 lynmouth kernel: [511434.597257] PM: Syncing filesystems ... done.
Jun 14 23:40:41 lynmouth kernel: [511434.615699] PM: Preparing system for mem sleep
Jun 14 23:40:45 lynmouth kernel: [511439.284049] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 14 23:40:45 lynmouth kernel: [511439.284823] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1680764 at 1680757, next 1680765)
Jun 14 23:40:46 lynmouth kernel: [511439.788055] [drm:i915_reset] *ERROR* Failed to reset chip.
Jun 16 15:02:15 lynmouth kernel: [511439.916240] Freezing user space processes ... (elapsed 0.01 seconds) done.
Jun 16 15:02:15 lynmouth kernel: [511439.932109] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Jun 16 15:02:15 lynmouth kernel: [511439.948084] PM: Entering mem sleep
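The i915_do_wait_request line can be read with a little arithmetic. On Linux, -11 is -EAGAIN, and the driver was waiting for sequence number 1680764 while the GPU had only completed 1680757, so it was 7 requests behind. The sketch below reproduces that check with a wrap-safe comparison in the style of the kernel's `i915_seqno_passed()` (a signed 32-bit difference); the function names here are hypothetical.

```python
import errno
from typing import Dict


def seqno_passed(seq1: int, seq2: int) -> bool:
    """Wrap-safe 'seq1 is at or after seq2' for 32-bit sequence numbers."""
    diff = (seq1 - seq2) & 0xFFFFFFFF
    if diff >= 0x80000000:  # reinterpret as a signed 32-bit value
        diff -= 0x100000000
    return diff >= 0


def read_wait_error(ret: int, awaiting: int, completed: int) -> Dict[str, object]:
    return {
        "is_eagain": ret == -errno.EAGAIN,  # EAGAIN is 11 on Linux
        "done": seqno_passed(completed, awaiting),
        "requests_behind": (awaiting - completed) & 0xFFFFFFFF,
    }


if __name__ == "__main__":
    # Values copied from the kern.log line above.
    print(read_wait_error(-11, awaiting=1680764, completed=1680757))
```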

I don't see a modeset event, but it could be that one happens without
producing a log entry.  I'll flip on more debugging output and check.

The log shows the system has an uptime of 15 days and has gone through
suspend/resume cycles roughly daily.  I do play videos on it from time
to time, although I hadn't been at the time of this suspend/resume
cycle.

The system does occasionally lose its dualhead configuration during
suspend/resume, and comes back mirrored.  I've assumed it to be a
gnome-settings-daemon bug, but could be a symptom of this problem.  It
does hint that perhaps some modeset or output hotplug event or something
does occur during resume.

> > Otherwise, I can boot and test the patch you posted to the bug.
> 
> I'm confident that that patch closes another window for the bug. I'm
> less confident that that's the only race condition we have.
>
> > One of the difficulties with this type of bug is that it's so
> > intermittent and uncertain to reproduce (and so easily confused with
> > other unrelated freezes), that it's hard to tell for certain if a given
> > patch has definitively helped the situation.  Do you have suggestions on
> > ways of measuring this better, or techniques to help in triggering the
> > bug more reliably?
> 
> If I am right, then we have two paths that cause WAIT_FOR_EVENT:
> windowed swapbuffers (or sub_copy_swap) and video. So playing a number
> of video streams in parallel with looping xrandr mode changes, in
> particular disabling outputs, should increase the likelihood of the bug.

Awesome, can do.

The reason I ask is that, the way Ubuntu's stable updates process works,
if I can demonstrate that a patch improves things in a way that a non-X
person (i.e. the archive admin team) can understand, I can get the patch
released to all Ubuntu users.  If I can't prove or demonstrate it in
some fashion, it'll get rejected or significantly delayed.
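One way to put a number on "the patch helped", sketched under a loud assumption: hangs under a fixed stress workload (such as the video plus xrandr loop suggested above) arrive independently at a constant rate, i.e. as a Poisson process. If the patch changed nothing, the chance of a hang-free run of length t is then exp(-rate * t). The function names and the example figures (4 hangs in 8 hours unpatched) are hypothetical.

```python
import math


def p_no_hangs_if_unchanged(hangs_before: int, hours_before: float, hours_after: float) -> float:
    """Probability of zero hangs in hours_after if the old hang rate still applied."""
    rate = hangs_before / hours_before  # hangs per hour on the unpatched build
    return math.exp(-rate * hours_after)


def hang_free_hours_needed(hangs_before: int, hours_before: float, alpha: float = 0.05) -> float:
    """Hang-free test time after which 'the patch changed nothing' falls below alpha."""
    rate = hangs_before / hours_before
    return math.log(1.0 / alpha) / rate


if __name__ == "__main__":
    # Hypothetical figures: 4 hangs in 8 hours of stress testing without the patch.
    print(p_no_hangs_if_unchanged(4, 8.0, 6.0))  # exp(-3), about 0.0498
    print(hang_free_hours_needed(4, 8.0))        # about 6.0 hours
```

A hang-free run only means something if the unpatched build hangs reliably under the same workload, so the reproducer matters more than the statistics.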

Bryce

Thread overview: 5 messages
2011-06-16  1:10 5 bugs Bryce Harrington
2011-06-16 11:37 ` Chris Wilson
2011-06-16 22:46   ` Bryce Harrington
2011-06-16 23:12     ` Chris Wilson
2011-06-16 23:54       ` Bryce Harrington
