From: Ville Syrjälä
Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
Date: Mon, 25 Jan 2016 15:23:10 +0200
Message-ID: <20160125132310.GS23290@intel.com>
References: <56A06D2E.4000008@gmail.com> <56A07CF9.5060506@daenzer.net> <56A07D97.6030606@daenzer.net> <20160121075849.GH19130@phenom.ffwll.local> <56A0989E.30006@daenzer.net> <20160121100905.GL19130@phenom.ffwll.local> <56A19C98.8020208@daenzer.net> <20160122151835.GM23290@intel.com> <56A5A171.7000205@daenzer.net> <56A6203D.3030803@gmail.com>
In-Reply-To: <56A6203D.3030803@gmail.com>
To: Mario Kleiner
Cc: Michel Dänzer, Alex Deucher, Vlastimil Babka, LKML, dri-devel@lists.freedesktop.org, Christian König
List-Id: dri-devel@lists.freedesktop.org

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:
>
>
> On 01/25/2016 05:15 AM, Michel Dänzer wrote:
> > On 23.01.2016 00:18, Ville Syrjälä wrote:
> >> On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:
> >>>
> >>> [ Trimming KDE folks from Cc ]
> >>>
> >>> On 21.01.2016 19:09, Daniel Vetter wrote:
> >>>> On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:
> >>>>> On 21.01.2016 16:58, Daniel Vetter wrote:
> >>>>>>
> >>>>>> Can you please point me at the vblank on/off jump bug please?
> >>>>>
> >>>>> AFAIR I originally reported it in response to
> >>>>> http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
> >>>>> , but I can't find that in the archives, so maybe that was just on IRC.
> >>>>> See
> >>>>> http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
> >>>>> . Basically, I ran into the bug fixed by your patch because the counter
> >>>>> jumped forward on every DPMS off, so it hit the 32-bit boundary after
> >>>>> just a few days.
> >>>>
> >>>> Ok, so just uncovered the overflow bug.
> >>>
> >>> Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
> >>> counter jumping bug (similar to the bug this thread is about), which
> >>> exposed the overflow bug, is still alive and kicking in 4.5. It seems
> >>> to happen when turning off the CRTC:
> >>>
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916
> >>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
> >>> [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916
> >>
> >> Not sure what bug we're talking about here, but here the hw counter
> >> clearly jumps backwards.
> >>
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 3
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>> [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep]
> >>> [drm:radeon_get_vblank_counter_kms] Query failed! stat 1
> >>> [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1
> >>
> >> Same here.
> >
> > At least one of the jumps is expected, because this is around turning
> > off the CRTC for DPMS off. Don't know yet why there are two jumps back
> > though.
> >
> >
> >> These things just don't happen on i915 because drm_vblank_off() and
> >> drm_vblank_on() are always called around the times when the hw counter
> >> might get reset. Or at least that's how it should be.
> >
> > Which is of course the idea of Daniel's patch (which is what I'm getting
> > the above with) or Mario's patch as well, but clearly something's still
> > wrong. It's certainly possible that it's something in the driver, but
> > since calling drm_vblank_pre/post_modeset from the same places seems to
> > work fine (ignoring the regression discussed in this thread)... Do
> > drm_vblank_on/off require something else to handle this correctly?
> >
> >
>
> I suspect it is because vblank_disable_and_save calls
> drm_update_vblank_count() unconditionally, even if vblank irqs are
> already off.
>
> So on a manual display disable -> reenable you get something like
>
> At disable:
>
> Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
> final count.
>
> Then the crtc is shut down and its hw counter resets to zero.
>
> At reenable:
>
> Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
> drm_vblank_off -> vblank_disable_and_save -> A pointless
> drm_update_vblank_count() while the hw counter is already reset to zero
> --> Unwanted counter jump.
>
>
> The problem doesn't happen on a pure modeset to a different video
> resolution/refresh rate, as then we only have one call into
> atombios_crtc_dpms(DPMS_OFF).
>
> I think the fix is to fix vblank_disable_and_save() to only call
> drm_update_vblank_count() if vblank irqs get actually disabled, not on
> no-op calls. I will try that now.

It does that on purpose. Otherwise the vblank counter would appear to
have stalled while the interrupt was off.

>
> Otherwise kms drivers would have to be careful to never call
> drm_vblank_off multiple times before calling drm_vblank_on, but the help
> text to drm_vblank_on() claims that unbalanced calls to these functions
> are perfectly fine.
>
> -mario

-- 
Ville Syrjälä
Intel OTC