linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mario Kleiner <mario.kleiner.de@gmail.com>
To: "Ville Syrjälä" <ville.syrjala@linux.intel.com>,
	"Vlastimil Babka" <vbabka@suse.cz>
Cc: "Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	mgraesslin@kde.org, "David Airlie" <airlied@linux.ie>,
	dri-devel@lists.freedesktop.org,
	LKML <linux-kernel@vger.kernel.org>,
	kwin@kde.org
Subject: Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
Date: Sat, 16 Jan 2016 05:24:05 +0100	[thread overview]
Message-ID: <5699C5E5.90702@gmail.com> (raw)
In-Reply-To: <20160115122629.GC23290@intel.com>



On 01/15/2016 01:26 PM, Ville Syrjälä wrote:
> On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:
>> Hi,
>>
>> since kernel 4.4 I'm unable to login to kde5 desktop (on openSUSE
>> Tumbleweed). There's a screen with progressbar showing the startup,
>> which normally fades away after reaching 100%. But with kernel 4.4, the
>> progress gets stuck somewhere between 1/2 and 3/4 (not always the same).
>> Top shows that kwin is using few % of CPU's but mostly sleeps in poll().
>> When I kill it from another console, I see that everything has actually
>> started up, just the progressbar screen was obscuring it. The windows
>> obviously don't have decorations etc. Starting kwin manually again shows
>> me again the progressbar screen at the same position.
>

Depressing. I was stress-testing those patches with Linux 4.4 for days 
on 2 AMD gpu's (HD-4000 RV 730 and HD-5770) under KDE 5 Plasma 5.4.2 
(KUbuntu 15.10, XOrg 1.17) and just retested Linux 4.4 on 
nouveau/radeon/intel also with XOrg 1.18 and XOrg master) with Linux 4.4 
a few days ago and never encountered such a hang or other vblank related 
problem on KDE-5 or GNOME-3.

I'm currently running...

while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done

... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770  and so far i 
can't trigger a hang after hundreds of runs.

Does this also hang for you?

> Hmm. Sounds like it could then be waiting for a vblank in the distant
> future. There's that 1<<23 limit in the code though, but even with that
> we end up with a max wait of ~38 hours assuming a 60Hz refresh rate.
>

xtrace suggests that ksplashqml seems to use classic OpenGL  + 
glXSwapBuffers under DRI2. So no clever swap scheduling based on vblank 
counter values.

> Stuff to try might include enabling drm.debug=0x2f, though that'll
> generate a lot of stuff. Another option would be to use the drm vblank
> tracepoints to try and catch what seq number it's waiting for and
> where we're at currently. Or I suppose you could just hack
> up drm_wait_vblank() to print an error message or something if the
> requested seq number is in the future by, say, more than a few seconds,
> and if that's the case then we could try to figure out why that happens.
>
>>
>> I have suspected that kwin is waiting for some event, but nevertheless
>> tried bisecting the kernel between 4.3 and 4.4, which lead to:
>>
>> # first bad commit: [4dfd64862ff852df7b1198d667dda778715ee88f] drm: Use
>> vblank timestamps to guesstimate how many vblanks were missed
>>
>> I can confirm that 4.4 works if I revert the following commits:
>> 63154ff230fc9255cc507af6277cd181943c50a1 "drm/amdgpu: Fixup hw vblank
>> counter/ts for new drm_update_vblank_count() (v3)"
>>
>> d1145ad1e41b6c33758a856163198cb53bb96a50 "drm/radeon: Fixup hw vblank
>> counter/ts for new drm_update_vblank_count() (v2)"
>
> The sha1s don't seem to match what I have, so not sure which kernel tree
> you have, but looking at the radeon commit at least one thing
> immediately caught my attention;
>
> +                       /* Bump counter if we are at >= leading edge of vblank,
> +                        * but before vsync where vpos would turn negative and
> +                        * the hw counter really increments.
> +                        */
> +                       if (vpos >= 0)
> +                               count++;
>
> It's rather hard to see what it's really doing since the custom flags to
> the get_scanout_position now cause it return non-standard things. But if
> I'm reading things correctly it should really say something like:
>
> if (vpos >= 0 && vpos < (vsync_start - vblank_start))
> 	count++;
>
> Hmm. Actually even that might not be correct since it could be using the
> "fake" vblank start here, so might be it'd need to be something like:
>
> if (vpos >= 0 && vpos < (vsync_start - vblank_start + lb_vblank_lead_lines)
> 	count++;
>

The current code should be correct. vpos here returns the distance of hw 
vertical scanout position to the start of vblank. According to Alex and 
Harry Wentland of AMD's display team, and my testing of my two cards the 
hw vertical scanout position resets to zero at start line of vsync, 
therefore the "vpos" in that code becomes negative at start of vsync. At 
the same time the hw frame counter increments by one, making that 
"count++" to bump the returned count by +1 no longer neccessary.

If the reset of hw vertical scanout pos to zero and the increment of hw 
frame counter wouldn't happen at exactly the same time at start of vsync 
i could see how that could cause two successive queries of 
driver->get_vblank_counter() could report a count of N+1 and then N if 
the timing of both calls would be just perfectly right. That would cause 
the DRM code to falsely detect counter wraparound and jump the vblank 
counter forward by 2^24.

My tested gpu's had DCE-3 or DCE-4 display engines, Caicos has DCE-5, so 
could this be some hw quirk for DCE-5?


> Also might be worth a shot to just ignore the hw frame counter. Eg.:
>
> index e266ffc520d2..db728580549a 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -492,7 +492,6 @@ static struct drm_driver kms_driver = {
>          .lastclose = radeon_driver_lastclose_kms,
>          .set_busid = drm_pci_set_busid,
>          .unload = radeon_driver_unload_kms,
> -       .get_vblank_counter = radeon_get_vblank_counter_kms,
>          .enable_vblank = radeon_enable_vblank_kms,
>          .disable_vblank = radeon_disable_vblank_kms,
>          .get_vblank_timestamp = radeon_get_vblank_timestamp_kms,
> diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> index 979f3bf65f2c..3c5fcab74152 100644
> --- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
> @@ -152,11 +152,6 @@ int radeon_driver_irq_postinstall_kms(struct drm_device *dev)
>   {
>          struct radeon_device *rdev = dev->dev_private;
>
> -       if (ASIC_IS_AVIVO(rdev))
> -               dev->max_vblank_count = 0x00ffffff;
> -       else
> -               dev->max_vblank_count = 0x001fffff;
> -
>          return 0;
>   }
>
> assuming I'm reading the code correctly.
>
>>
>> 31ace027c9f1f8e0a2b09bbf961e4db7b1f6cf19 "drm: Don't zero vblank
>> timestamps from the irq handler"
>>
>> ac0567a4b132fa66e3edf3f913938af9daf7f916 "drm: Add DRM_DEBUG_VBL()"
>>
>> 4dfd64862ff852df7b1198d667dda778715ee88f "drm: Use vblank timestamps to
>> guesstimate how many vblanks were missed"
>>
>> All clean reverts, just needs some fixup on top to use abs() instead of
>> abs64() due to 79211c8ed19c055ca105502c8733800d442a0ae6.
>>
>> Unfortunately I don't know if this is a kernel problem or kwin problem.
>> I tried to CC maintainers of both, advices what to try or what info to
>> provide welcome. The card is "CAICOS" with 1GB memory.
>>

I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank" 
should probably give useful info around the time of the hang.

Maybe also check XOrg.0.log for (WW) warnings related to flip.

thanks,
-mario


>> Thanks,
>> Vlastimil
>

  parent reply	other threads:[~2016-01-16  4:24 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-15 10:34 linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon Vlastimil Babka
2016-01-15 12:26 ` Ville Syrjälä
2016-01-15 12:40   ` Vlastimil Babka
2016-01-16  4:24   ` Mario Kleiner [this message]
2016-01-18 10:49     ` Vlastimil Babka
2016-01-18 14:06       ` Vlastimil Babka
2016-01-18 14:14         ` Christian König
2016-01-20 20:25       ` Vlastimil Babka
2016-01-20 20:32       ` Mario Kleiner
2016-01-21  3:43         ` Michel Dänzer
2016-01-21  5:31           ` Mario Kleiner
2016-01-21  6:38             ` Michel Dänzer
2016-01-21  6:41               ` Michel Dänzer
2016-01-21  7:58                 ` Daniel Vetter
2016-01-21  8:36                   ` Michel Dänzer
2016-01-21 10:09                     ` Daniel Vetter
2016-01-22  3:06                       ` Michel Dänzer
2016-01-22 15:18                         ` Ville Syrjälä
2016-01-22 18:29                           ` Mario Kleiner
2016-01-23 18:23                             ` Mario Kleiner
2016-01-25  4:15                           ` Michel Dänzer
2016-01-25 13:16                             ` Mario Kleiner
2016-01-25 13:23                               ` Ville Syrjälä
2016-01-25 13:44                                 ` Mario Kleiner
2016-01-25 14:53                                   ` Ville Syrjälä
2016-01-25 16:38                                     ` Mario Kleiner
2016-01-25 18:51                                       ` Daniel Vetter
2016-01-25 19:30                                         ` Mario Kleiner
2016-01-25 20:32                                           ` Daniel Vetter
2016-01-25 21:42                                             ` Mario Kleiner
2016-01-25 22:05                                               ` Daniel Vetter
2016-01-21  8:28               ` Mario Kleiner
2016-01-21  9:15                 ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5699C5E5.90702@gmail.com \
    --to=mario.kleiner.de@gmail.com \
    --cc=airlied@linux.ie \
    --cc=alexander.deucher@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kwin@kde.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgraesslin@kde.org \
    --cc=vbabka@suse.cz \
    --cc=ville.syrjala@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).