From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mario Kleiner Subject: Re: Funky new vblank counter regressions in Linux 4.4-rc1 Date: Wed, 25 Nov 2015 20:04:26 +0100 Message-ID: <5656063A.8080504@gmail.com> References: <20151119182051.GU4437@intel.com> <564E1F18.5070508@gmail.com> <20151119194557.GB4437@intel.com> <564F3B42.5050803@gmail.com> <20151120153435.GR4437@intel.com> <56532F69.9080001@gmail.com> <20151123155109.GB4437@intel.com> <565353CA.5010909@gmail.com> <20151123202426.GC4437@intel.com> <5655EEBD.5040403@gmail.com> <20151125174600.GV4437@intel.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------040700090304030501080604" Return-path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by gabe.freedesktop.org (Postfix) with ESMTP id 01D946E66C for ; Mon, 30 Nov 2015 12:13:49 -0800 (PST) Resent-Message-ID: <20151130201338.GL4437@intel.com> Resent-To: dri-devel@lists.freedesktop.org In-Reply-To: <20151125174600.GV4437@intel.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: =?UTF-8?B?VmlsbGUgU3lyasOkbMOk?= Cc: =?UTF-8?Q?Michel_D=c3=a4nzer?= List-Id: dri-devel@lists.freedesktop.org This is a multi-part message in MIME format. --------------040700090304030501080604 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: quoted-printable On 11/25/2015 06:46 PM, Ville Syrj=E4l=E4 wrote: > On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote: >> On 11/23/2015 09:24 PM, Ville Syrj=E4l=E4 wrote: >>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote: >>>> >>>> >>>> On 11/23/2015 04:51 PM, Ville Syrj=E4l=E4 wrote: >>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote: >>>>>> On 11/20/2015 04:34 PM, Ville Syrj=E4l=E4 wrote: >>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote: >>>>>> >>>>>> ... >>>>>> Ok, but why would that be a bad thing? I think we want it to think= it is >>>>>> in the previous frame if it is called outside the vblank irq conte= xt. >>>>>> The only reason we fudge it to the next frames vblank if i vblank = irq is >>>>>> because we know the vblank irq handler we are executing atm. was m= eant >>>>>> to execute within the upcoming vblank for the next frame, so we fu= dge >>>>>> the scanout positions and thereby timestamp to correspond to that = new >>>>>> frame. But if something called outside irq context it should get a >>>>>> scanout position/timestamp that corresponds to "reality". >>>>> >>>>> It would be a bad thing since it would cause the timestamp to jump >>>>> backwards, and that would also cause the frame count guesstimate to= go >>>>> backwards. >>>>> >>>> >>>> But only if we don't use the dev->driver->get_vblank_counter() metho= d, >>>> which we try to use on AMD. >>> >>> Well, if you do it that way then you have the problem of the hw count= er >>> seeming to jump forward by one after crossing the start of vblank (wh= en >>> compared to the value you sampled when you processed the early vblank >>> interrupt). >>> >> >> Ok, finally i see the bad scenario that wouldn't get prevented by our >> current locking with the new vblank counting in the core. The vblank >> enable path is safe due to locking and discounting of redundant >> timestamps etc. But the disable path could go wrong: >> >> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(), >> updates timestamps and counts "as if" in vblank -> incremented vblank >> count and timestamp now set in the future. >> >> 2. After vblank irq finishes, but just before leading edge of vblank, >> vblank_disable_and_save() executes, doesn't get bumped timestamp or >> count because before vblank and not in vblank irq. Now >> drm_update_vblank_count() would process a >> "new" timestamp and count from the past and we'd have time and counts >> going backwards, and bad things would happen. >> >> I haven't observed such a thing happening during testing so far, >> probably because the time window in which it could happen is tiny, but >> given how awfully bad it would be, it needs to be prevented. >> >> I had a look at the description of the Vblank irq in the "M76 Register >> Reference Guide" for older asics and the description suggests that the >> vblank irq fires when the crtc's line buffer is finished reading pixel >> data from the scanout buffer in memory for a frame, ie., when the line >> buffer read "enters" vblank. That would explain why the irq happens a >> few scanlines before actual vblank, because line buffer refills must >> obviously happen before the crtc can send pixel data from the line >> buffer to the encoders, so it would lead a bit in time. That also mean= s >> we can't delay the vblank irq to actually happen at start of vblank an= d >> have to deal with the early vblank irq. >> >>> I guess one silly idea would be to defer the vblank interrupt process= ing >>> to a timer, and just schedule it a bit into the future from the actua= l >>> interrupt handler. >>> >> >> Timer would be bad because then we get problems with the pageflip >> completion irq sometimes being processed before the vblank irq, > > You you'd need to move page flip completion to happen from the vblank > timer too I suppose. > >> and >> because we want to be fast in vblank irq handling, delivering vblank >> events etc. I wouldn't trust a timer to be reliable enough for such >> short waits. > > hrtimers should be accurate. But maybe more expensive than the timer > wheel. > Sounds all a bit complex and fraught with new possible complications. I=20 can't spend much more time on this than i did so far, and we need to get=20 this into 4.4-rc, so i am aiming for the most simple solution that would=20 work. >> Busy waiting wouldn't be great either in irq. >> >> So what about this relatively simple one? >> >> 1. In radeon_get_crtc_scanoutpos() we artifically define the >> vblank_start line to be, e.g, 5 scanlines before the true start of >> vblank, so for the purpose of vblank counter queries and timestamping >> our "vblank" would start a bit earlier and the vblank irq would always >> execute in "vblank". Non-Irq invocations like vblank_disable_and_save(= ) >> would also be treated to this early vblank start and so what the DRM >> core observes would always be consistent. >> >> 2. At least in radeon-kms we also use radeon_get_crtc_scanoutpos() >> internally for "dynpm" dynamic power management/reclocking, and to >> implement pageflip completion detection on asics older than DCE3 which >> don't have pageflip interrupts. For those cases we need to use the tru= e >> start of vblank, so for this internal use we pass in some special flag >> into radeon_get_crtc_scanoutpos() to tell it to not shift the vblank >> start around. >> >> 3. I've added another check to my patch for drm_update_vblank_count() = to >> catch timestamps going backwards and discounting such cases, for a bit >> of extra robustness against driver trouble. >> >> Any ideas why this would be a stupid idea? I'll try this now and just >> hope meanwhile nobody finds a reason why this would be bad. > > What I don't like is leaking any of this into the core code. All the > hacks should live in the hw specific code. We've managed to keep all > the i915 hacks hidden that way. > > So if you would: > - fudge your get_scanout_position as you suggested > - _not_ expose the hw counter to the vblank core since it > disagrees with the scanout position > - keep the internal get_scanout_position variant/flag > purely internal > > then I think I might like it. That way working hardware doesn't have to > be exposed to these hacks, and there's possible a bit less danger that = it > all gets broken again next time the core needs some work. > Ok. Exposing the fudged hw counter shouldn't be a problem though, given=20 that it would be fudged in a consistent way with the fudged scanout=20 positions and to have incremented at time of drm_handle_vblank(). We'd=20 bump the hw counter to the count of the new vblank at the same time when=20 the scanout positions would start counting backwards from minus=20 something to 0, showing how many vblank lines to go until start of=20 scanout, etc. The only difference to reality would be that this=20 simulated vblank would start 5 scanlines earlier than the true hw=20 vblank, but i can't see how the core would care about that. > One problem with that approach could be that the vblank event and page > flip event would be delievered at different times if you keep using the > page flip interrupt, but I'm not sure that would be a problem. At least > they should both have the same timestamp and counter value. > That's the same now, and the timestamps and counts be the same. I'll=20 check with my measurement equipment that the timestamps will be still=20 accurate wrt. to reality. Attached is my current patch i wanted to submit for the drm core's=20 drm_update_vblank_count(). I think it's good to make the core somewhat=20 robust against potential kms driver bugs or glitches. But if you=20 wouldn't like that patch, there wouldn't be much of a point sending it=20 out at all. thanks, -mario --------------040700090304030501080604 Content-Type: text/x-patch; name="0001-drm-irq-Make-drm_update_vblank_count-more-robust.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-drm-irq-Make-drm_update_vblank_count-more-robust.patch" >>From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001 From: Mario Kleiner Date: Wed, 25 Nov 2015 18:48:31 +0100 Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust. The changes to drm_update_vblank_count() for Linux 4.4-rc made the function more fragile wrt. some hw quirks. E.g., at dev->driver->enable_vblank(), AMD gpu's fire a spurious redundant vblank irq shortly after enabling vblank irqs, not locked to vblank. This causes a redundant call which needs to be suppressed to avoid miscounting. To increase robustness, shuffle things around a bit: On drivers with high precision vblank timestamping always evaluate the timestamp difference between current timestamp and previously recorded timestamp to detect such redundant invocations and no-op in that case. Also detect and warn about timestamps going backwards to catch potential kms driver bugs. This patch is meant for Linux 4.4-rc and later. Signed-off-by: Mario Kleiner --- drivers/gpu/drm/drm_irq.c | 53 ++++++++++++++++++++++++++++++++++++----------- 1 file changed, 41 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 819b8c1..8728c3c 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, unsigned long flags) { struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; - u32 cur_vblank, diff; + u32 cur_vblank, diff = 0; bool rc; struct timeval t_vblank; + const struct timeval *t_old; + u64 diff_ns; int count = DRM_TIMESTAMP_MAXRETRIES; int framedur_ns = vblank->framedur_ns; @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0); - if (dev->max_vblank_count != 0) { - /* trust the hw counter when it's around */ - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; - } else if (rc && framedur_ns) { - const struct timeval *t_old; - u64 diff_ns; - + /* + * Always use vblank timestamping based method if supported to reject + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up + * due to some irq handling quirk. + * + * This also sets the diff value for use as fallback below in case the + * hw does not support a suitable hw vblank counter. + */ + if (rc && framedur_ns) { t_old = &vblanktimestamp(dev, pipe, vblank->count); diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe, */ diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) + /* Catch driver timestamping bugs and prevent worse. */ + if ((s32) diff < 0) { + DRM_DEBUG_VBL("crtc %u: Timestamp going backward!" + " diff_ns = %lld, framedur_ns = %d)\n", + pipe, (long long) diff_ns, framedur_ns); + return; + } + + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." - " diff_ns = %lld, framedur_ns = %d)\n", - pipe, (long long) diff_ns, framedur_ns); - } else { + " diff_ns = %lld, framedur_ns = %d)\n", + pipe, (long long) diff_ns, framedur_ns); + + /* Treat this redundant invocation as no-op. */ + WARN_ON_ONCE(cur_vblank != vblank->last); + return; + } + } + + /* + * hw counters, if marked as supported via max_vblank_count != 0, + * *must* increment at leading edge of vblank or at least before + * the firing of the hw vblank irq, otherwise we have a race here + * between gpu and us, where small variations in our execution + * timing can cause off-by-one miscounting of vblanks. + */ + if (dev->max_vblank_count != 0) { + /* trust the hw counter when it's around */ + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; + } else if (!(rc && framedur_ns)) { /* some kind of default for drivers w/o accurate vbl timestamping */ diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; } -- 1.9.1 --------------040700090304030501080604 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHA6Ly9saXN0 cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9kcmktZGV2ZWwK --------------040700090304030501080604--