From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ville Syrjälä
Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
Date: Fri, 15 Jan 2016 14:26:29 +0200
Message-ID: <20160115122629.GC23290@intel.com>
References: <5698CB20.9050602@suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
Content-Disposition: inline
In-Reply-To: <5698CB20.9050602@suse.cz>
Sender: linux-kernel-owner@vger.kernel.org
To: Vlastimil Babka
Cc: Alex Deucher, Christian König, Daniel Vetter, mgraesslin@kde.org, David Airlie, dri-devel@lists.freedesktop.org, LKML, Mario Kleiner, kwin@kde.org
List-Id: dri-devel@lists.freedesktop.org

On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
> Hi,
>
> since kernel 4.4 I'm unable to login to kde5 desktop (on openSUSE
> Tumbleweed). There's a screen with progressbar showing the startup,
> which normally fades away after reaching 100%. But with kernel 4.4, the
> progress gets stuck somewhere between 1/2 and 3/4 (not always the same).
> Top shows that kwin is using few % of CPU's but mostly sleeps in poll().
> When I kill it from another console, I see that everything has actually
> started up, just the progressbar screen was obscuring it. The windows
> obviously don't have decorations etc. Starting kwin manually again shows
> me again the progressbar screen at the same position.

Hmm. Sounds like it could then be waiting for a vblank in the distant
future. There's that 1<<23 limit in the code though, but even with that
we end up with a max wait of ~38 hours assuming a 60Hz refresh rate.

Stuff to try might include enabling drm.debug=0x2f, though that'll
generate a lot of stuff.

Another option would be to use the drm vblank tracepoints to try and
catch what seq number it's waiting for and where we're at currently.

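To make the ~38 hour figure concrete, here is a minimal userspace sketch of the arithmetic, not kernel code. The helper names `frames_until()` and `frames_to_seconds()` and the macro `MAX_VBLANK_WAIT_FRAMES` are made up for illustration, and the sketch assumes that the 1<<23 limit is the largest number of frames a wait may lie in the future and that the counter is 32 bits wide.

```c
#include <stdint.h>

/* Assumption: the "1<<23 limit" is the largest distance, in frames,
 * that a requested sequence number may lie in the future. */
#define MAX_VBLANK_WAIT_FRAMES (1u << 23)

/* Hypothetical helper: number of frames from `current` until
 * `requested`. Unsigned 32-bit subtraction wraps modulo 2^32, so the
 * result stays correct when the counter wraps around in between. */
static uint32_t frames_until(uint32_t current, uint32_t requested)
{
	return requested - current;
}

/* Hypothetical helper: time in seconds that `frames` vblanks take at
 * a fixed refresh rate given in Hz. */
static double frames_to_seconds(uint32_t frames, double refresh_hz)
{
	return (double)frames / refresh_hz;
}
```

With these, `frames_to_seconds(MAX_VBLANK_WAIT_FRAMES, 60.0) / 3600.0` comes out at roughly 38.8 hours, and the same conversion can turn the distance between a traced seq number and the current count into a wait time.
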
Or I suppose you could just hack up drm_wait_vblank() to print an error
message or something if the requested seq number is in the future by,
say, more than a few seconds, and if that's the case then we could
try to figure out why that happens.

>
> I have suspected that kwin is waiting for some event, but nevertheless
> tried bisecting the kernel between 4.3 and 4.4, which lead to:
>
> # first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use
> vblank timestamps to guesstimate how many vblanks were missed
>
> I can confirm that 4.4 works if I revert the following commits:
> 63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank
> counter/ts for new drm_update_vblank_count() (v3)"
>
> d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank
> counter/ts for new drm_update_vblank_count() (v2)"

The sha1s don't seem to match what I have, so not sure which kernel tree
you have, but looking at the radeon commit at least one thing
immediately caught my attention:

+			/* Bump counter if we are at >= leading edge of vblank,
+			 * but before vsync where vpos would turn negative and
+			 * the hw counter really increments.
+			 */
+			if (vpos >= 0)
+				count++;

It's rather hard to see what it's really doing since the custom flags to
the get_scanout_position now cause it to return non-standard things. But if
I'm reading things correctly it should really say something like:

if (vpos >= 0 && vpos < (vsync_start - vblank_start))
	count++;

Hmm. Actually even that might not be correct since it could be using the
"fake" vblank start here, so might be it'd need to be something like:

if (vpos >= 0 && vpos < (vsync_start - vblank_start + lb_vblank_lead_lines))
	count++;

Also might be worth a shot to just ignore the hw frame counter.
Eg.:

index e266ffc520d2..db728580549a 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -492,7 +492,6 @@ static struct drm_driver kms_driver = {
 	.lastclose = radeon_driver_lastclose_kms,
 	.set_busid = drm_pci_set_busid,
 	.unload = radeon_driver_unload_kms,
-	.get_vblank_counter = radeon_get_vblank_counter_kms,
 	.enable_vblank = radeon_enable_vblank_kms,
 	.disable_vblank = radeon_disable_vblank_kms,
 	.get_vblank_timestamp = radeon_get_vblank_timestamp_kms,
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index 979f3bf65f2c..3c5fcab74152 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -152,11 +152,6 @@ int radeon_driver_irq_postinstall_kms(struct drm_device *dev)
 {
 	struct radeon_device *rdev = dev->dev_private;
 
-	if (ASIC_IS_AVIVO(rdev))
-		dev->max_vblank_count = 0x00ffffff;
-	else
-		dev->max_vblank_count = 0x001fffff;
-
 	return 0;
 }

assuming I'm reading the code correctly.

>
> 31ace027c9f1f8e0a2b09bbf961e4db7b1f6cf19 "drm: Don't zero vblank
> timestamps from the irq handler"
>
> ac0567a4b132fa66e3edf3f913938af9daf7f916 "drm: Add DRM_DEBUG_VBL()"
>
> 4dfd64862ff852df7b1198d667dda778715ee88f "drm: Use vblank timestamps to
> guesstimate how many vblanks were missed"
>
> All clean reverts, just needs some fixup on top to use abs() instead of
> abs64() due to 79211c8ed19c055ca105502c8733800d442a0ae6.
>
> Unfortunately I don't know if this is a kernel problem or kwin problem.
> I tried to CC maintainers of both, advices what to try or what info to
> provide welcome. The card is "CAICOS" with 1GB memory.
>
> Thanks,
> Vlastimil

-- 
Ville Syrjälä
Intel OTC