From: Bryce Harrington <bryce@canonical.com>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: 5 bugs
Date: Thu, 16 Jun 2011 16:54:45 -0700
Message-ID: <20110616235445.GW459@bryceharrington.org>
In-Reply-To: <013811$gef6o@fmsmga002.fm.intel.com>
On Fri, Jun 17, 2011 at 12:12:16AM +0100, Chris Wilson wrote:
> On Thu, 16 Jun 2011 15:46:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > On Thu, Jun 16, 2011 at 12:37:00PM +0100, Chris Wilson wrote:
> > > On Wed, 15 Jun 2011 18:10:29 -0700, Bryce Harrington <bryce@canonical.com> wrote:
> > > > https://bugs.freedesktop.org/show_bug.cgi?id=36515
> > >
> > > This looks to be a continuation of the WAIT_EVENT on a dead pipe that we
> > > thought we had beaten into submission. The other reports provide more
> > > circumstantial evidence to suggest that the hang coincides with a hotplug
> > > event. I think the cause is a race between the kernel turning the pipe off
> > > due to the hotplug and reprobing and that uevent reaching the ddx. In the
> > > meantime, we've queued another video frame to execute on the dead pipe.
> > > Worse we may have queued it up long before the hotplug event and due to
> > > buffering in the GPU command stream it only gets executed afterwards.
> > >
> > > commit 85345517fe6d4de27b0d6ca19fef9d28ac947c4a
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date: Sat Nov 13 09:49:11 2010 +0000
> > >
> > > drm/i915: Retire any pending operations on the old scanout when switching
> > >
> > > Handles the case where we are changing modes. Unfortunately, disabling an
> > > output takes a different path. Though I think we can take a similar
> > > big-hammer approach there as well.
> >
> > As luck would have it, my own i965 laptop locked up today with I guess
> > this same bug. IPEHR=0x01820000
> >
> > Before I restart it, is there any data which could be gathered that
> > would assist you?
>
> My theory is based upon this still being a WAIT_EVENT on a disabled pipe.
> The error state should support this if the DSP*CNTR is disabled for the
> pipe we are waiting on. But the other observation to make is whether you
> know if a modeset happened at around the same time as the hang.

The hang occurred while the system was preparing for sleep, triggered by
a lid close event.

From my kern.log:

Jun 14 23:40:40 lynmouth kernel: [511433.780066] tg3 0000:08:00.0: eth0: Link is down
Jun 14 23:40:41 lynmouth kernel: [511434.597257] PM: Syncing filesystems ... done.
Jun 14 23:40:41 lynmouth kernel: [511434.615699] PM: Preparing system for mem sleep
Jun 14 23:40:45 lynmouth kernel: [511439.284049] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Jun 14 23:40:45 lynmouth kernel: [511439.284823] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 1680764 at 1680757, next 1680765)
Jun 14 23:40:46 lynmouth kernel: [511439.788055] [drm:i915_reset] *ERROR* Failed to reset chip.
Jun 16 15:02:15 lynmouth kernel: [511439.916240] Freezing user space processes ... (elapsed 0.01 seconds) done.
Jun 16 15:02:15 lynmouth kernel: [511439.932109] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Jun 16 15:02:15 lynmouth kernel: [511439.948084] PM: Entering mem sleep
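
To check whether other suspend cycles in the same log show this pattern,
something like the following could be used. It is only a sketch: the
function name and the 10-second window are my own assumptions, and it
relies on nothing more than the message strings and the bracketed kernel
timestamps visible in the excerpt above.

```python
import re

# Matches the kernel's bracketed timestamp, e.g. "[511439.284049]".
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def hangs_near_suspend(lines, window=10.0):
    """Return (suspend_time, hang_time) pairs where a GPU hang was logged
    within `window` seconds after a "Preparing system for mem sleep" line.

    `window` is an arbitrary, assumed threshold; tune it as needed.
    """
    pairs = []
    last_suspend = None
    for line in lines:
        m = STAMP.search(line)
        if m is None:
            continue  # no kernel timestamp on this line
        t = float(m.group(1))
        if "PM: Preparing system for mem sleep" in line:
            last_suspend = t
        elif "Hangcheck timer elapsed" in line:
            if last_suspend is not None and t - last_suspend <= window:
                pairs.append((last_suspend, t))
    return pairs
```

Passing it an open kern.log file object (any iterable of lines works)
returns the suspend and hang timestamps for each suspend attempt that was
followed closely by a hangcheck error.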

I don't see a modeset event, but it could be that one happens without
producing a log entry. I'll turn on more debugging output and check.

The log shows the system has an uptime of 15 days and has gone through
suspend/resume cycles roughly daily. I do play videos on it from time
to time, although I wasn't playing any at the time of this
suspend/resume cycle.

The system does occasionally lose its dualhead configuration during
suspend/resume and comes back mirrored. I had assumed that to be a
gnome-settings-daemon bug, but it could be a symptom of this problem. It
does hint that some modeset or output hotplug event occurs during
resume.

> > Otherwise, I can boot and test the patch you posted to the bug.
>
> I'm confident that that patch closes another window for the bug. I'm
> less confident that that's the only race condition we have.
>
> > One of the difficulties with this type of bug is that it's so
> > intermittent and uncertain to reproduce (and so easily confused with
> > other unrelated freezes), that it's hard to tell for certain if a given
> > patch has definitively helped the situation. Do you have suggestions on
> > ways of measuring this better, or techniques to help in triggering the
> > bug more reliably?
>
> If I am right, then we have two paths that cause WAIT_FOR_EVENT,
> windowed swapbuffers (or sub_copy_swap) and video. So playing a number
> of video streams should increase the likelihood of the bug, run in
> parallel with looping xrandr mode changes - in particular disabling
> outputs.
Awesome, can do.
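
For the xrandr half of that, a minimal sketch of a loop that repeatedly
disables and re-enables one output while videos play in other windows.
The output name, cycle count and delay are assumptions to adjust per
machine (the names xrandr reports differ between systems); only the
standard `--output`, `--off` and `--auto` options are used.

```python
import subprocess
import time


def toggle_commands(output, cycles):
    """Build the xrandr command lines for `cycles` off/on rounds."""
    cmds = []
    for _ in range(cycles):
        cmds.append(["xrandr", "--output", output, "--off"])   # disable the output
        cmds.append(["xrandr", "--output", output, "--auto"])  # re-enable at preferred mode
    return cmds


def stress(output, cycles=100, delay=2.0, run=subprocess.call):
    """Run the toggle loop, pausing `delay` seconds after each command.

    `run` is injectable so the loop can be dry-run without touching X.
    """
    for cmd in toggle_commands(output, cycles):
        run(cmd)
        time.sleep(delay)
```

Calling `stress("VGA1")`, with "VGA1" standing in for whatever external
output name xrandr lists, runs the loop. Counting how many cycles
complete before a hang, with and without the patch, would give a number
to compare.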

The reason I ask is that, the way Ubuntu's stable updates process
works, if I can demonstrate that a patch improves things in a way that
a non-X person (i.e. the archive admin team) can understand, I can get
the patch released to all Ubuntu users. If I can't prove or demonstrate
it in some fashion, it will get rejected or significantly delayed.

Bryce