public inbox for intel-gfx@lists.freedesktop.org
 help / color / mirror / Atom feed
From: Ben Widawsky <ben@bwidawsk.net>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: Re: [PATCH 1/2] drm/i915: fix up semaphore_waits_for
Date: Tue, 25 Mar 2014 07:54:09 -0700	[thread overview]
Message-ID: <20140325145408.GA777@bwidawsk.net> (raw)
In-Reply-To: <20140325094209.GN26878@phenom.ffwll.local>

On Tue, Mar 25, 2014 at 10:42:09AM +0100, Daniel Vetter wrote:
> On Mon, Mar 24, 2014 at 03:37:12PM -0700, Ben Widawsky wrote:
> > On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote:
> > > On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote:
> > > > Hi,
> > > > 
> > > > Daniel Vetter <daniel.vetter@ffwll.ch> writes:
> > > > 
> > > > > There's an entire pile of issues in here:
> > > > >
> > > > > - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt
> > > > >   offset of the batch buffer when a batch is executed. Semaphores are
> > > > >   always emitted to the main ring, so we always want to look at that.
> > > > 
> > > > The ipehr check should make sure that we are at the ring and
> > > > acthd should not be at batch.
> > > > 
> > > > Regardless I agree that RING_HEAD is much more safer. When I have
> > > > tried to get bottom of the snb reset hang, I have noticed that
> > > > after reset the acthd is sometimes 0x0c even tho head is 0x00,
> > > > on snb.
> > > 
> > > Hm, should we maybe mask out the lower bits, too? If you can confirm this,
> > > can you please add a follow-up patch?
> > > 
> > > > > - Mask the obtained HEAD pointer with the actual ring size, which is
> > > > >   much smaller. Together with the above issue this resulted us in
> > > > >   trying to dereference a pointer way outside of the ring mmio
> > > > >   mapping. The resulting invalid access in interrupt context
> > > > >   (hangcheck is executed from timers) lead to a full blown kernel
> > > > >   panic. The fbcon panic handler then tried to frob our driver harder,
> > > > >   resulting in a full machine hang at least on my snb here where I've
> > > > >   stumbled over this.
> > > > >
> > > > > - Handle ring wrapping correctly and be a bit more explicit about how
> > > > >   many dwords we're scanning. We probably should also scan more than
> > > > >   just 4 ...
> > > > 
> > > > ipehr dont update on MI_NOOPS and the head should point to
> > > > the extra MI_NOOP after semaphore sequence. So 4 should be
> > > > enough. Perhaps revisit this when bdw gains semaphores.
> > > 
> > > Yeah, Chris also mentioned that the HEAD should point at one dword past
> > > the end of the ring, so should even work when there are no MI_NOOPs. In
> > > any case this code is more robust in case hw engineers suddenly change the
> > > behaviour.
> > > 
> > > > > - Space out some of teh computations for readability.
> > > > >
> > > > > This prevents hard-hangs on my snb here. Verdict from QA is still
> > > > > pending, but the symptoms match.
> > > > >
> > > > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > > > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100
> > > > 
> > > > The hard hangs are not so frequent with this patch but
> > > > they are still there. This patch should take care of
> > > > the oops we have been seeing, but there is still
> > > > something else to be found until #74100 can be marked as
> > > > fixed.
> > > > 
> > > > Very often after reset, when igt has pushed the quietence
> > > > batches into rings, blitter and video ring heads gets
> > > > moved properly but all updates to hardware status page and to
> > > > the sync registers are missing. And result is hang by hangcheck.
> > > > Repeat enough times and it is a hard hang.
> > > > 
> > > > > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > 
> > > > Please remove the blowup comment and also update the
> > > > bugzilla tag. 
> > > 
> > > Yeah, QA also says that it doesn't fix the hard hangs, only seems to
> > > reduce them a bit on certain setups. I've updated the commit message.
> > > 
> > > btw are you testing with FBDEV=n? The lack of a fbcon panic handler should
> > > greatly increase the chances that the last few message get correctly
> > > transmitted through other means like netconsole.
> > > 
> > > > Patches 1-2 and the followup one are
> > > > 
> > > > Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
> > > 
> > > Thanks for the review, patches merged.
> > > -Daniel
> > 
> > Patch 2 was trivial to implement for gen8. This functionality is a lot
> > less trivial. I volunteered to do patch 2, are you going to port this to
> > gen8?
> 
> If you fill out the bdw pieces for patch 2&3 the only thing you need to
> change here is to adjsut the number of dwords we walk backwards to make
> sure that the bdw semaphore cmd still fits. Or at least that's been my
> idea behind all this, maybe I've overlooked something.
> -Daniel

I don't think it's quite that easy, but I took it as a, "yes." Which one
is patch 3?

-- 
Ben Widawsky, Intel Open Source Technology Center

  reply	other threads:[~2014-03-25 14:54 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-14 23:08 [PATCH 1/2] drm/i915: fix up semaphore_waits_for Daniel Vetter
2014-03-14 23:08 ` [PATCH 2/2] drm/i915: Add FIXME for bdw semaphore detection in hancheck Daniel Vetter
2014-03-15  0:38   ` Ben Widawsky
2014-03-15 11:44     ` Daniel Vetter
2014-03-18  9:26       ` [PATCH] drm/i915: make semaphore signaller detection more robust Daniel Vetter
2014-03-19 15:05         ` Chris Wilson
2014-03-19 15:35           ` Daniel Vetter
2014-03-15 15:46 ` [PATCH 1/2] drm/i915: fix up semaphore_waits_for Chris Wilson
2014-03-15 16:31   ` Daniel Vetter
2014-03-15 23:13     ` Chris Wilson
2014-03-21 17:33 ` Mika Kuoppala
2014-03-22 17:52   ` Daniel Vetter
2014-03-24 22:37     ` Ben Widawsky
2014-03-25  9:42       ` Daniel Vetter
2014-03-25 14:54         ` Ben Widawsky [this message]
2014-03-25 15:24           ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140325145408.GA777@bwidawsk.net \
    --to=ben@bwidawsk.net \
    --cc=daniel.vetter@ffwll.ch \
    --cc=daniel@ffwll.ch \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox