Intel-GFX Archive on lore.kernel.org
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: Tarun Vyas <tarun.vyas@intel.com>
Cc: Deak@otc-chromeosbuild-5, Pandiyan@otc-chromeosbuild-5,
	Dhinakaran <dhinakaran.pandiyan@intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/i915: Wait for PSR exit before checking for vblank evasion for an atomic update
Date: Thu, 3 May 2018 09:58:56 -0700
Message-ID: <20180503165856.GB17530@intel.com>
In-Reply-To: <20180502223115.GA29751@otc-chromeosbuild-5>

On Wed, May 02, 2018 at 03:31:15PM -0700, Tarun Vyas wrote:
> On Wed, May 02, 2018 at 01:04:06PM -0700, Vivi, Rodrigo wrote:
> > On Wed, May 02, 2018 at 09:51:43PM +0300, Ville Syrjälä wrote:
> > > On Wed, May 02, 2018 at 11:19:14AM -0700, Tarun Vyas wrote:
> > > > On Mon, Apr 30, 2018 at 10:19:33AM -0700, Rodrigo Vivi wrote:
> > > > > On Sun, Apr 29, 2018 at 09:00:18PM -0700, Tarun Vyas wrote:
> > > > > > From: Tarun <tarun.vyas@intel.com>
> > > > > >
> > > > > > The PIPEDSL freezes on PSR entry and if PSR hasn't fully exited, then
> > > > > > the pipe_update_start call schedules itself out to check back later.
> > > > > >
> > > > > > On ChromeOS-4.4 kernel, which is fairly up-to-date w.r.t drm/i915 but
> > > > > > lags w.r.t core kernel code, hot plugging an external display triggers
> > > > > > tons of "potential atomic update errors" in the dmesg, on *pipe A*. A
> > > > > > closer analysis reveals that we try to read the scanline 3 times and
> > > > > > eventually timeout, b/c PSR hasn't exited fully leading to a PIPEDSL
> > > > > > stuck @ 1599. This issue is not seen on upstream kernels, b/c for *some*
> > > > > > > reason we loop inside intel_pipe_update_start for ~2+ msec which in this
> > > > > > case is more than enough to exit PSR fully, hence an *unstuck* PIPEDSL
> > > > > > counter, hence no error. On the other hand, the ChromeOS kernel spends
> > > > > > ~1.1 msec looping inside intel_pipe_update_start and hence errors out
> > > > > > b/c the source is still in PSR.
> > > > > >
> > > > > > Regardless, we should wait for PSR exit (if PSR is supported and active
> > > > > > on the current pipe) before reading the PIPEDSL, b/c if we haven't
> > > > > > fully exited PSR, then checking for vblank evasion isn't actually
> > > > > > applicable.
> > > > > >
> > > > > > This scenario applies to a configuration with an additional pipe,
> > > > > > as of now.
> > > > >
> > > > > > I honestly believe you're picking the wrong culprit here. By "coincidence".
> > > > > > PSR will allow DC state with screen on and DC state will mess up all
> > > > > > register reads....
> > > > > >
> > > > > > probably what you are missing in your kernel is some power domain
> > > > > > grab that would keep DC_OFF and consequently a sane read of these
> > > > > > registers.
> > > > >
> > > > > Maybe Imre has a quick idea of what you could be missing on your kernel
> > > > > that we already have on upstream one.
> > > > >
> > > > > Thanks,
> > > > > Rodrigo.
> > > > >
> > > > Thanks for the quick response Rodrigo !
> > > > Some key observations based on my experiments so far:
> > > > >         for (;;) {
> > > >                 /*
> > > >                  * prepare_to_wait() has a memory barrier, which guarantees
> > > >                  * other CPUs can see the task state update by the time we
> > > >                  * read the scanline.
> > > >                  */
> > > >                 prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > >
> > > >                 scanline = intel_get_crtc_scanline(crtc);
> > > >                 if (scanline < min || scanline > max)
> > > >                         break;
> > > >
> > > >                 if (timeout <= 0) {
> > > >                         DRM_ERROR("Potential atomic update failure on pipe %c\n",
> > > >                                   pipe_name(crtc->pipe));
> > > >                         break;
> > > >                 }
> > > >
> > > >                 local_irq_enable();
> > > >
> > > >                 timeout = schedule_timeout(timeout);
> > > >
> > > >                 local_irq_disable();
> > > >         }
> > > > > 1. In the above loop inside pipe_update_start, the *first time* we read the PIPEDSL, with PSR1 and an external display connected, it always reads 1599, for *both* kernels (upstream and ChromeOS-4.4). The PSR_STATUS also reads exactly the same for *both* kernels and shows that we haven't *fully* exited PSR.
> > > > >
> > > > > 2. The difference between the two kernels comes after this first read of the PIPEDSL. ChromeOS-4.4 spends ~1 msec inside that loop and upstream spends ~2 msec. I suspect that it is because of the scheduling changes between the two kernels, b/c I can't find any i915 specific code running in that loop, except for vblank processing.
> > > >
> > > > 3. So to summarize it, both the kernels are in the same state w.r.t PSR and PIPEDSL value when they read the PIPEDSL for the first time inside the loop. *When* the kernels *transition* to a *full PSR exit* is what is differing.
> > 
> > Oh! So you really are getting reliable counters....
> > 
> > > >
> > > > My rationale for this patch is that, the pipe_update_start function is meant to evade 100 usec before a vblank, but, *if* we haven't *fully* exited PSR (which is true for both the kernels for the first PIPEDSL read), then vblank evasion is *not applicable* b/c the PIPEDSL will be messed up. So we shouldn't bother evading vblank until we have fully exited PSR.
> > >
> > > Yeah, I think this is the right direction. The problem really is the
> > > extra vblank pulse that the hardware generates (or at least can
> > > generate depending on a chicken bit) when it exits PSR. We have no
> > > control over when that happens and hence we have no control over when
> > > the registers get latched. And yet we still have to somehow prevent
> > > > the register latching from occurring while we're in the middle of
> > > reprogramming them.
> > 
> > I see the problem now. Thanks for the explanation.
> > 
> > >
> > > There are a couple of ways to avoid this:
> > > 1) Set the chicken bit so that we don't get the vblank pulse. The
> > >    pipe should restart from the vblank start, so we would have one
> > > >    full frame to reprogram the registers. However IIRC DK told me
> > >    there is no way to fully eliminate it in all cases so this
> > >    option is probably out. There was also some implication for FBC
> > >    which I already forgot.
> > > > 2) Make sure we've exited PSR before reprogramming the registers
> > >    (ie. what you do).
> > > 3) Use the DOUBLE_BUFFER_CTL to prevent the extra vblank pulse from
> > >    latching the registers while we're still reprogramming them.
> > >    This feature only exists on SKL+ so is not a solution for
> > >    HSW/BDW. But maybe HSW/BDW didn't even have the extra vblank
> > >    pulse?
> > >
> > > Option 2) does provide a consistent behaviour on all platforms, so I
> > > > do kinda like it. It also avoids a bigger rework on account of the
> > > DOUBLE_BUFFER_CTL. I do think we'll have to start using
> > > DOUBLE_BUFFER_CTL anyway due to other issues, but at least this way
> > > we don't block PSR progress on that work.
> > 
> > > My vote is for option 2. Seems more straightforward and more broad.
> > 
> > DK?
> > 
> > > My only request on the patch itself would be to create a function
> > > in intel_psr.c, intel_psr_wait_for_idle... or something like this,
> > > and put the register wait logic inside it instead of spreading
> > > the psr code around.
> > 
> > Thanks,
> > Rodrigo.
> > 
> > >
> > > --
> > > Ville Syrjälä
> > > Intel
> Thanks for the comments, Ville and Rodrigo. I'll rework this to move the wait to intel_psr.c. There is a psr_wait_for_idle() in there, but there are some PSR locks being passed around inside it (eventually released by the caller). Also, the max timeout specified there is 50 msec, which might be way too much?

ouch! that function is ugly.... unlock then lock back again...
(especially unlock without any assert locked... :/)

If you can improve that or split it in a way that we reuse some code it would be nice...
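
For illustration only, one possible shape of such a split: a polling
helper that takes no locks and receives its budget from the caller, so
that a long-timeout caller and a short-timeout caller in the pipe update
path could share it. This is a userspace sketch and not i915 code.
Every name in it (struct fake_psr, PSR_STATE_MASK, read_psr_status(),
psr_poll_idle()) is hypothetical, and the "register" is simulated by a
counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state field of a simulated PSR status register. */
#define PSR_STATE_MASK 0x7u

struct fake_psr {
	uint32_t polls_until_idle; /* simulated exit latency, in polls */
};

/* Simulated register read: reports a busy state until the latency
 * counter has run down, then reports idle (state field == 0). */
static uint32_t read_psr_status(struct fake_psr *psr)
{
	if (psr->polls_until_idle == 0)
		return 0;
	psr->polls_until_idle--;
	return 0x2; /* arbitrary "still exiting" value for the sketch */
}

/*
 * Poll until the state field reads idle or max_polls is used up.
 * Takes no locks: the caller owns any locking and picks the budget.
 * Returns true if idle was observed, false on timeout.
 */
static bool psr_poll_idle(struct fake_psr *psr, unsigned int max_polls)
{
	assert(psr);

	for (unsigned int i = 0; i < max_polls; i++) {
		if ((read_psr_status(psr) & PSR_STATE_MASK) == 0)
			return true;
	}
	return false;
}
```

With a simulated latency of 3 polls, psr_poll_idle(&psr, 10) returns
true and psr_poll_idle(&psr, 3) returns false, which is the knob the
50 msec question is about: the budget becomes a per-caller decision
instead of being fixed inside the helper.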

>
> Best,
> Tarun
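
To make the failure mode discussed in this thread concrete, here is a
toy model of the evasion loop quoted above, with a scanline counter that
stays frozen until PSR has fully exited. It is a sketch under stated
assumptions, not driver code: struct fake_pipe, get_scanline() and
evade_vblank() are hypothetical names, time is counted in abstract
ticks, and the counter restarting from 0 after PSR exit is an assumption
of the model. Only the stuck reading of 1599 comes from the report.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_pipe {
	int psr_exit_tick;   /* tick at which PSR has fully exited */
	int frozen_scanline; /* value the counter reads while frozen */
	int tick;            /* simulated time, one unit per loop pass */
};

/* While PSR has not exited the counter is stuck; afterwards it is
 * modeled as restarting from 0 and advancing one line per tick. */
static int get_scanline(const struct fake_pipe *p)
{
	if (p->tick < p->psr_exit_tick)
		return p->frozen_scanline;
	return p->tick - p->psr_exit_tick;
}

/*
 * Same shape as the quoted loop: leave once the scanline is outside
 * [min, max], or give up when the budget runs out.
 * Returns true if evasion succeeded, false on timeout.
 */
static bool evade_vblank(struct fake_pipe *p, int min, int max,
			 int timeout_ticks, bool wait_for_psr_exit)
{
	assert(p && min <= max);

	/* Option 2 from the thread: make sure PSR has exited first. */
	if (wait_for_psr_exit && p->tick < p->psr_exit_tick)
		p->tick = p->psr_exit_tick;

	for (;;) {
		int scanline = get_scanline(p);

		if (scanline < min || scanline > max)
			return true;
		if (timeout_ticks <= 0)
			return false;
		p->tick++;
		timeout_ticks--;
	}
}
```

With PSR exiting at tick 20 and a hypothetical evasion window of
[1590, 1600], a budget of 10 ticks without the wait times out (the
counter never leaves 1599), a budget of 30 ticks happens to succeed
because PSR exits while the loop is still spinning, and waiting for PSR
exit first succeeds regardless of the budget.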
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
