* 5 bugs
From: Bryce Harrington @ 2011-06-16 1:10 UTC
To: Alt, Maxim; +Cc: intel-gfx

Hi Max,

I currently am tracking 6 bug reports with the intel driver so far for
the oneiric development cycle, of which 5 have been forwarded upstream:

  https://bugs.freedesktop.org/show_bug.cgi?id=36515
  https://bugs.freedesktop.org/show_bug.cgi?id=37393
  https://bugs.freedesktop.org/show_bug.cgi?id=37526
  https://bugs.freedesktop.org/show_bug.cgi?id=28798
  https://bugs.freedesktop.org/show_bug.cgi?id=38191

The first two would be the higher priorities; I've seen several other
bug reports with similar symptoms/error codes.

These are mainly leftover priorities from natty; the deluge of oneiric
bug reports probably won't start for a couple more weeks.

There's also one mesa 7.10.2 bug report showing up on my list right now:

  https://bugs.freedesktop.org/show_bug.cgi?id=35234 (lp: #35234)

That's reported when using mutter, which we're not really focused on,
but the bug looks tractable (stacktrace / assert triggered).

Thanks,
Bryce

* Re: 5 bugs
From: Chris Wilson @ 2011-06-16 11:37 UTC
To: Bryce Harrington, Alt, Maxim; +Cc: intel-gfx

On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> Hi Max,
>
> I currently am tracking 6 bug reports with the intel driver so far for
> the oneiric development cycle, of which 5 have been forwarded upstream:
>
> https://bugs.freedesktop.org/show_bug.cgi?id=36515

This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
thought we had beaten into submission. The other reports provide more
circumstantial evidence to suggest that the hang coincides with a hotplug
event. I think the cause is a race between the kernel turning the pipe off
due to the hotplug and reprobing, and that uevent reaching the ddx. In the
meantime, we've queued another video frame to execute on the dead pipe.
Worse, we may have queued it up long before the hotplug event, and due to
buffering in the GPU command stream it only gets executed afterwards.

commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Nov 13 09:49:11 2010 +0000

    drm/i915: Retire any pending operations on the old scanout when switching

Handles the case where we are changing modes. Unfortunately, disabling an
output takes a different path. Though, I think we can apply a similar
big-hammer approach there as well.

> https://bugs.freedesktop.org/show_bug.cgi?id=37393

Doesn't look to be the typical unresolved VGA on Arrandale failure.
At first glance it suggests the DPMS misreporting issue.

> https://bugs.freedesktop.org/show_bug.cgi?id=37526

drm "hotplug" polling claims another victim. Daniel last volunteered to
fix drm locking, which will help the general symptoms of a stall every
10s, but doesn't root cause the blanking.

> https://bugs.freedesktop.org/show_bug.cgi?id=28798

I've fixed all the rendering issues I could find for sna. The best
solution I can see to fix uxa is to remove the attempts to get it to use
the 3D pipeline for core rendering routines.

> https://bugs.freedesktop.org/show_bug.cgi?id=38191

Tiling artefacts, haven't investigated ff5 yet, so I've no idea where
the issue lies or if it is a reoccurrence of a much older GM45 tiling
bug under compiz.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

* Re: 5 bugs
From: Bryce Harrington @ 2011-06-16 22:46 UTC
To: Chris Wilson; +Cc: intel-gfx

On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > Hi Max,
> >
> > I currently am tracking 6 bug reports with the intel driver so far for
> > the oneiric development cycle, of which 5 have been forwarded upstream:
> >
> > https://bugs.freedesktop.org/show_bug.cgi?id=36515
>
> This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
> thought we had beaten into submission. The other reports provide more
> circumstantial evidence to suggest that the hang coincides with a hotplug
> event. I think the cause is a race between the kernel turning the pipe off
> due to the hotplug and reprobing, and that uevent reaching the ddx. In the
> meantime, we've queued another video frame to execute on the dead pipe.
> Worse, we may have queued it up long before the hotplug event, and due to
> buffering in the GPU command stream it only gets executed afterwards.
>
> commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Sat Nov 13 09:49:11 2010 +0000
>
>     drm/i915: Retire any pending operations on the old scanout when switching
>
> Handles the case where we are changing modes. Unfortunately, disabling an
> output takes a different path. Though, I think we can apply a similar
> big-hammer approach there as well.

As luck would have it, my own i965 laptop locked up today with, I guess,
this same bug. IPEHR=0x01820000

Before I restart it, is there any data which could be gathered that
would assist you?

Otherwise, I can boot and test the patch you posted to the bug.

One of the difficulties with this type of bug is that it's so
intermittent and uncertain to reproduce (and so easily confused with
other unrelated freezes), that it's hard to tell for certain if a given
patch has definitively helped the situation. Do you have suggestions on
ways of measuring this better, or techniques to help in triggering the
bug more reliably?

> > https://bugs.freedesktop.org/show_bug.cgi?id=37393
>
> Doesn't look to be the typical unresolved VGA on Arrandale failure.
> At first glance it suggests the DPMS misreporting issue.
>
> > https://bugs.freedesktop.org/show_bug.cgi?id=37526
>
> drm "hotplug" polling claims another victim. Daniel last volunteered to
> fix drm locking, which will help the general symptoms of a stall every
> 10s, but doesn't root cause the blanking.
>
> > https://bugs.freedesktop.org/show_bug.cgi?id=28798
>
> I've fixed all the rendering issues I could find for sna. The best
> solution I can see to fix uxa is to remove the attempts to get it to use
> the 3D pipeline for core rendering routines.
>
> > https://bugs.freedesktop.org/show_bug.cgi?id=38191
>
> Tiling artefacts, haven't investigated ff5 yet, so I've no idea where
> the issue lies or if it is a reoccurrence of a much older GM45 tiling
> bug under compiz.

Thanks. Let me know if there are actions you need me to take on any of
these.

Bryce
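
For readers following along: the IPEHR value reported in this message can be checked against the WAIT_EVENT theory with a few lines of Python. This is a minimal sketch, assuming the usual i915 MI command layout in which the opcode sits in bits 28:23 of the header dword and MI_WAIT_FOR_EVENT is opcode 0x03 (as in the kernel's MI_INSTR(0x03, 0) definition). The function name and the returned dictionary keys are made up for illustration, and the meaning of the individual event bits is generation-dependent, so they are only listed here, not named.

```python
# Sketch only: decode an MI command header such as the IPEHR value above.
# Assumption: MI commands carry client type 0 in bits 31:29 and the opcode
# in bits 28:23; MI_WAIT_FOR_EVENT is opcode 0x03.
MI_OPCODE_SHIFT = 23
MI_OPCODE_MASK = 0x3F
MI_WAIT_FOR_EVENT_OPCODE = 0x03


def decode_mi_header(dword):
    """Split a command header dword into client, opcode and set flag bits."""
    client = (dword >> 29) & 0x7
    opcode = (dword >> MI_OPCODE_SHIFT) & MI_OPCODE_MASK
    # Everything below the opcode field is treated as raw flag bits.
    flag_bits = [bit for bit in range(MI_OPCODE_SHIFT) if dword & (1 << bit)]
    return {
        "client": client,
        "opcode": opcode,
        "is_wait_for_event": client == 0 and opcode == MI_WAIT_FOR_EVENT_OPCODE,
        "flag_bits": flag_bits,
    }


if __name__ == "__main__":
    info = decode_mi_header(0x01820000)
    print(info)
```

Run on 0x01820000, this reports an MI command with opcode 3 and a single flag bit set, which fits a hang on a wait-for-event instruction. Which event that bit selects has to be looked up in the register definitions for the hardware generation in question.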

* Re: 5 bugs
From: Chris Wilson @ 2011-06-16 23:12 UTC
To: Bryce Harrington; +Cc: intel-gfx

On Thu, 16 Jun 2011 15:46:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> > On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > > Hi Max,
> > >
> > > I currently am tracking 6 bug reports with the intel driver so far for
> > > the oneiric development cycle, of which 5 have been forwarded upstream:
> > >
> > > https://bugs.freedesktop.org/show_bug.cgi?id=36515
> >
> > This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
> > thought we had beaten into submission. The other reports provide more
> > circumstantial evidence to suggest that the hang coincides with a hotplug
> > event. I think the cause is a race between the kernel turning the pipe off
> > due to the hotplug and reprobing, and that uevent reaching the ddx. In the
> > meantime, we've queued another video frame to execute on the dead pipe.
> > Worse, we may have queued it up long before the hotplug event, and due to
> > buffering in the GPU command stream it only gets executed afterwards.
> >
> > commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Sat Nov 13 09:49:11 2010 +0000
> >
> >     drm/i915: Retire any pending operations on the old scanout when switching
> >
> > Handles the case where we are changing modes. Unfortunately, disabling an
> > output takes a different path. Though, I think we can apply a similar
> > big-hammer approach there as well.
>
> As luck would have it, my own i965 laptop locked up today with, I guess,
> this same bug. IPEHR=0x01820000
>
> Before I restart it, is there any data which could be gathered that
> would assist you?

My theory is based upon this still being a WAIT_EVENT on a disabled pipe.
The error state should support this if the DSP*CNTR is disabled for the
pipe we are waiting on. But the other observation to make is whether you
know if a modeset happened at around the same time as the hang.

> Otherwise, I can boot and test the patch you posted to the bug.

I'm confident that that patch closes another window for the bug. I'm
less confident that that's the only race condition we have.

> One of the difficulties with this type of bug is that it's so
> intermittent and uncertain to reproduce (and so easily confused with
> other unrelated freezes), that it's hard to tell for certain if a given
> patch has definitively helped the situation. Do you have suggestions on
> ways of measuring this better, or techniques to help in triggering the
> bug more reliably?

If I am right, then we have two paths that cause WAIT_FOR_EVENT:
windowed swapbuffers (or sub_copy_swap) and video. So playing a number
of video streams should increase the likelihood of the bug, run in
parallel with looping xrandr mode changes - in particular disabling
outputs.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
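
The reproduction recipe in this message (video playback in parallel with looping xrandr output changes) could be scripted roughly as below. This is only a sketch under stated assumptions: the output name "VGA1", the cycle count and the pause length are placeholders, and the helper names are invented for illustration. It covers only the xrandr half, using the standard `--output NAME --off` and `--output NAME --auto` options; the video streams are expected to be started separately with whatever player is at hand.

```python
# Sketch only: loop xrandr output disable/enable while videos play elsewhere.
# Assumptions: "VGA1" is a placeholder output name (check `xrandr -q` for the
# real one); cycle count and pause are arbitrary.
import subprocess
import time


def xrandr_toggle_commands(output, cycles):
    """Build argv lists that turn an output off and back on, `cycles` times."""
    commands = []
    for _ in range(cycles):
        commands.append(["xrandr", "--output", output, "--off"])
        commands.append(["xrandr", "--output", output, "--auto"])
    return commands


def run_toggle_loop(output, cycles, pause_s=2.0, dry_run=True):
    """Print the commands (dry run) or execute them with a pause in between."""
    for argv in xrandr_toggle_commands(output, cycles):
        if dry_run:
            print(" ".join(argv))
        else:
            # check=False: keep looping even if one xrandr call fails.
            subprocess.run(argv, check=False)
            time.sleep(pause_s)


if __name__ == "__main__":
    # Dry run by default so nothing changes on the display unless requested.
    run_toggle_loop("VGA1", cycles=3, dry_run=True)
```

Keeping the default as a dry run makes it possible to review the exact commands before letting the loop touch a live display; passing dry_run=False executes them.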

* Re: 5 bugs
From: Bryce Harrington @ 2011-06-16 23:54 UTC
To: Chris Wilson; +Cc: intel-gfx

On Fri, Jun 17, 2011 at 12:12:16AM +0100, Chris Wilson wrote:
> On Thu, 16 Jun 2011 15:46:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> > > On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > > > https://bugs.freedesktop.org/show_bug.cgi?id=36515
> > >
> > > This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
> > > thought we had beaten into submission. The other reports provide more
> > > circumstantial evidence to suggest that the hang coincides with a hotplug
> > > event. I think the cause is a race between the kernel turning the pipe off
> > > due to the hotplug and reprobing, and that uevent reaching the ddx. In the
> > > meantime, we've queued another video frame to execute on the dead pipe.
> > > Worse, we may have queued it up long before the hotplug event, and due to
> > > buffering in the GPU command stream it only gets executed afterwards.
> > >
> > > commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Sat Nov 13 09:49:11 2010 +0000
> > >
> > >     drm/i915: Retire any pending operations on the old scanout when switching
> > >
> > > Handles the case where we are changing modes. Unfortunately, disabling an
> > > output takes a different path. Though, I think we can apply a similar
> > > big-hammer approach there as well.
> >
> > As luck would have it, my own i965 laptop locked up today with, I guess,
> > this same bug. IPEHR=0x01820000
> >
> > Before I restart it, is there any data which could be gathered that
> > would assist you?
>
> My theory is based upon this still being a WAIT_EVENT on a disabled pipe.
> The error state should support this if the DSP*CNTR is disabled for the
> pipe we are waiting on. But the other observation to make is whether you
> know if a modeset happened at around the same time as the hang.

The hang occurred while the system was preparing for sleep, triggered by
a lid close event.

From my kern.log:

Jun 14 23:40:40 lynmouth kernel: [511433.780066] tg3 0000:08:00.0: eth0: Link is down
Jun 14 23:40:41 lynmouth kernel: [511434.597257] PM: Syncing filesystems ... done.
Jun 14 23:40:41 lynmouth kernel: [511434.615699] PM: Preparing system for mem sleep
Jun 14 23:40:45 lynmouth kernel: [511439.284049] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 14 23:40:45 lynmouth kernel: [511439.284823] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1680764 at 1680757, next 1680765)
Jun 14 23:40:46 lynmouth kernel: [511439.788055] [drm:i915_reset] *ERROR* Failed to reset chip.
Jun 16 15:02:15 lynmouth kernel: [511439.916240] Freezing user space processes ... (elapsed 0.01 seconds) done.
Jun 16 15:02:15 lynmouth kernel: [511439.932109] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Jun 16 15:02:15 lynmouth kernel: [511439.948084] PM: Entering mem sleep

I don't see a modeset event, but it could be that one happens without
causing a log entry. I'll flip on more debugging output and check.

The log shows the system has an uptime of 15 days and has gone through
suspend/resume cycles roughly daily. I do play videos on it from time to
time, although I hadn't been at the time of this suspend/resume cycle.

The system does occasionally lose its dualhead configuration during
suspend/resume, and comes back mirrored. I've assumed it to be a
gnome-settings-daemon bug, but it could be a symptom of this problem. It
does hint that perhaps some modeset or output hotplug event does occur
during resume.

> > Otherwise, I can boot and test the patch you posted to the bug.
>
> I'm confident that that patch closes another window for the bug. I'm
> less confident that that's the only race condition we have.
>
> > One of the difficulties with this type of bug is that it's so
> > intermittent and uncertain to reproduce (and so easily confused with
> > other unrelated freezes), that it's hard to tell for certain if a given
> > patch has definitively helped the situation. Do you have suggestions on
> > ways of measuring this better, or techniques to help in triggering the
> > bug more reliably?
>
> If I am right, then we have two paths that cause WAIT_FOR_EVENT:
> windowed swapbuffers (or sub_copy_swap) and video. So playing a number
> of video streams should increase the likelihood of the bug, run in
> parallel with looping xrandr mode changes - in particular disabling
> outputs.

Awesome, can do.

The reason I ask is because of the way Ubuntu's stable updates process
works: if I can demonstrate that a patch improves things, in a way that
a non-X person (i.e. the archive admin team) can understand, I can get
the patch released to all Ubuntu users. If I can't prove it or
demonstrate it in some fashion, it'll get rejected or significantly
delayed.

Bryce
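
One way to put numbers behind "did the patch help" is to count how often a GPU hang follows a suspend attempt in kern.log, before and after applying a fix. The sketch below scans log lines for the three messages quoted in this message and reports the delay between "PM: Preparing system for mem sleep" and the next "GPU hung" line. The function names, the marker table and the 30-second window are assumptions chosen for illustration, not part of any existing tool, and the sample lines are copied from the excerpt above.

```python
# Sketch only: find GPU hangs that follow a suspend attempt in kern.log text.
# Assumptions: lines carry a bracketed kernel timestamp and arrive in log
# order; the 30 s window is an arbitrary cut-off.
import re

TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")
MARKERS = {
    "suspend_prepare": "PM: Preparing system for mem sleep",
    "gpu_hung": "GPU hung",
    "reset_failed": "Failed to reset chip",
}


def find_events(lines):
    """Return (kernel_time, kind) tuples for lines matching a known marker."""
    events = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue
        for kind, needle in MARKERS.items():
            if needle in line:
                events.append((float(match.group(1)), kind))
    return events


def hangs_during_suspend(events, window_s=30.0):
    """Delays (seconds) from a suspend-prepare line to a following GPU hang."""
    delays = []
    last_prepare = None
    for stamp, kind in events:
        if kind == "suspend_prepare":
            last_prepare = stamp
        elif kind == "gpu_hung" and last_prepare is not None:
            delay = stamp - last_prepare
            if delay <= window_s:
                delays.append(delay)
    return delays


SAMPLE = [
    "Jun 14 23:40:41 lynmouth kernel: [511434.615699] PM: Preparing system for mem sleep",
    "Jun 14 23:40:45 lynmouth kernel: [511439.284049] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung",
    "Jun 14 23:40:46 lynmouth kernel: [511439.788055] [drm:i915_reset] *ERROR* Failed to reset chip.",
]

if __name__ == "__main__":
    for delay in hangs_during_suspend(find_events(SAMPLE)):
        print(f"GPU hang {delay:.2f}s after suspend preparation began")
```

Dividing the number of such hangs by the number of suspend attempts over a comparable period, with and without the patch, would give a simple rate that can be shown to reviewers who do not work on X.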