From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mario Kleiner Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon Date: Mon, 25 Jan 2016 17:38:30 +0100 Message-ID: <56A64F86.9080605@gmail.com> References: <56A07D97.6030606@daenzer.net> <20160121075849.GH19130@phenom.ffwll.local> <56A0989E.30006@daenzer.net> <20160121100905.GL19130@phenom.ffwll.local> <56A19C98.8020208@daenzer.net> <20160122151835.GM23290@intel.com> <56A5A171.7000205@daenzer.net> <56A6203D.3030803@gmail.com> <20160125132310.GS23290@intel.com> <56A626D5.2040808@gmail.com> <20160125145309.GT23290@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: In-Reply-To: <20160125145309.GT23290@intel.com> Sender: linux-kernel-owner@vger.kernel.org To: =?UTF-8?B?VmlsbGUgU3lyasOkbMOk?= Cc: =?UTF-8?Q?Michel_D=c3=a4nzer?= , Alex Deucher , Vlastimil Babka , LKML , dri-devel@lists.freedesktop.org, =?UTF-8?Q?Christian_K=c3=b6nig?= , Daniel Vetter List-Id: dri-devel@lists.freedesktop.org Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrj=E4l=E4 wrote: > On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: >> >> >> On 01/25/2016 02:23 PM, Ville Syrj=E4l=E4 wrote: >>> On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: >>>> >>>> >>>> On 01/25/2016 05:15 AM, Michel D=E4nzer wrote: >>>>> On 23.01.2016 00:18, Ville Syrj=E4l=E4 wrote: >>>>>> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel D=E4nzer wrote: >>>>>>> >>>>>>> [ Trimming KDE folks from Cc ] >>>>>>> >>>>>>> On 21.01.2016 19:09, Daniel Vetter wrote: >>>>>>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel D=E4nzer wrot= e: >>>>>>>>> On 21.01.2016 16:58, Daniel Vetter wrote: >>>>>>>>>> >>>>>>>>>> Can you please point me at the vblank on/off jump bug please= ? >>>>>>>>> >>>>>>>>> AFAIR I originally reported it in response to >>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/0= 87841.html >>>>>>>>> , but I can't find that in the archives, so maybe that was ju= st on IRC. >>>>>>>>> See >>>>>>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/= 099122.html >>>>>>>>> . Basically, I ran into the bug fixed by your patch because t= he counter >>>>>>>>> jumped forward on every DPMS off, so it hit the 32-bit bounda= ry after >>>>>>>>> just a few days. >>>>>>>> >>>>>>>> Ok, so just uncovered the overflow bug. >>>>>>> >>>>>>> Not sure what you mean by "just", but to be clear: The drm_vbla= nk_on/off >>>>>>> counter jumping bug (similar to the bug this thread is about), = which >>>>>>> exposed the overflow bug, is still alive and kicking in 4.5. It= seems >>>>>>> to happen when turning off the CRTC: >>>>>>> >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: = current=3D218104694, diff=3D0, hw=3D916 hw_last=3D916 >>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank st= art 3 >>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(21= 99,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] >>>>>>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank st= art 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: = current=3D218104694, diff=3D16776301, hw=3D1 hw_last=3D916 >>>>>> >>>>>> Not sure what bug we're talking about here, but here the hw coun= ter >>>>>> clearly jumps backwards. >>>>>> >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: = current=3D0, diff=3D0, hw=3D0 hw_last=3D0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: = current=3D0, diff=3D0, hw=3D0 hw_last=3D0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: = current=3D0, diff=3D0, hw=3D0 hw_last=3D0 >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>>>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,= 0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] >>>>>>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 >>>>>>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: = current=3D234880995, diff=3D16777215, hw=3D0 hw_last=3D1 >>>>>> >>>>>> Same here. >>>>> >>>>> At least one of the jumps is expected, because this is around tur= ning >>>>> off the CRTC for DPMS off. Don't know yet why there are two jumps= back >>>>> though. >>>>> >>>>> >>>>>> These things just don't happen on i915 because drm_vblank_off() = and >>>>>> drm_vblank_on() are always called around the times when the hw c= ounter >>>>>> might get reset. Or at least that's how it should be. >>>>> >>>>> Which is of course the idea of Daniel's patch (which is what I'm = getting >>>>> the above with) or Mario's patch as well, but clearly something's= still >>>>> wrong. It's certainly possible that it's something in the driver,= but >>>>> since calling drm_vblank_pre/post_modeset from the same places se= ems to >>>>> work fine (ignoring the regression discussed in this thread)... D= o >>>>> drm_vblank_on/off require something else to handle this correctly= ? >>>>> >>>>> >>>> >>>> I suspect it is because vblank_disable_and_save calls >>>> drm_update_vblank_count() unconditionally, even if vblank irqs are >>>> already off. >>>> >>>> So on a manual display disable -> reenable you get something like >>>> >>>> At disable: >>>> >>>> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_o= ff -> >>>> vblank_disable_and_save -> irqs off, drm_update_vblank_count() com= putes >>>> final count. >>>> >>>> Then the crtc is shut down and its hw counter resets to zero. >>>> >>>> At reenable: >>>> >>>> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crt= c) -> >>>> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> >>>> drm_vblank_off -> vblank_disable_and_save -> A pointless >>>> drm_update_vblank_count() while the hw counter is already reset to= zero >>>> --> Unwanted counter jump. >>>> >>>> >>>> The problem doesn't happen on a pure modeset to a different video >>>> resolution/refresh rate, as then we only have one call into >>>> atombios_crtc_dpms(DPMS_OFF). >>>> >>>> I think the fix is to fix vblank_disable_and_save() to only call >>>> drm_update_vblank_count() if vblank irqs get actually disabled, no= t on >>>> no-op calls. I will try that now. >>> >>> It does that on purpose. Otherwise the vblank counter would appear = to >>> have stalled while the interrupt was off. >>> >> >> Ok, that's what the comments there say, although i don't see atm. wh= y >> that perceived stall would be a big problem. I checked all callers o= f >> vblank_disable_and_save(). They are all careful to not call that >> function if vblanks are already disabled. The only exception is >> drm_vblank_off(). If drm_vblank_off/on is supposed to protect kms >> drivers which have resetting hw counters or other problematic behavi= our >> during modesets etc. then this will break. E.g., calling the vblank >> timestamping stuff is also not safe/well-defined during modesets whe= n >> the timestamping constants are not (yet) updated to reflect the new = mode >> timing of the modeset in progress. > > The idea is to maintain the appearance that the counter ticks all the > time as long as the crtc is active. While that may not be really > required in case if no one is currently interested in the vblank > counter, I think it's a nice thing to have just to make the behaviour > of the counter consistent. > > As far as calling drm_vblank_off() after the hw counter got reset, we= ll, > that not correct. It should be called before the reset. What radeon does is calling drm_vblank_off at beginning of DPMS_OFF. Th= e=20 first call to DMPS_OFF will call drm_vblank_off() and really disable=20 vblank-irqs if they were running, updating the counts/ts a last time.=20 But then the dpms off will reset the hw counter to zero. When one=20 reenables the display, a second call to DPMS_OFF will now call=20 drm_vblank_off again when it apparently shouldn't. I just tested this patch, which fixes the counter jumps on radeon-kms=20 with my or Daniel's drm_vblank_off patches to radeon: diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index 607f493..d739d93 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -1313,7 +1313,10 @@ void drm_vblank_off(struct drm_device *dev,=20 unsigned int pipe) spin_lock_irqsave(&dev->event_lock, irqflags); spin_lock(&dev->vbl_lock); - vblank_disable_and_save(dev, pipe); + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,=20 vblank->enabled); + + if (vblank->enabled) + vblank_disable_and_save(dev, pipe); wake_up(&vblank->queue); /* @@ -1415,6 +1418,8 @@ void drm_vblank_on(struct drm_device *dev,=20 unsigned int pipe) return; spin_lock_irqsave(&dev->vbl_lock, irqflags); + DRM_DEBUG_VBL("crtc %d, vblank enabled %d\n", pipe,=20 vblank->enabled); + /* Drop our private "prevent drm_vblank_get" refcount */ if (vblank->inmodeset) { atomic_dec(&vblank->refcount); Another, maybe better, approach might be to no-op redundant calls to=20 drm_vblank_off() iff vblank->inmodeset and no-op redundant calls to=20 drm_vblank_on() iff !vblank->inmodeset. -mario > >> >> -mario >> >> >>>> >>>> Otherwise kms drivers would have to be careful to never call >>>> drm_vblank_off multiple times before calling drm_vblank_on, but th= e help >>>> text to drm_vblank_on() claims that unbalanced calls to these func= tions >>>> are perfectly fine. >>>> >>>> -mario >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>> >