dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
@ 2016-02-08  1:13 ` Mario Kleiner
  2016-02-09 10:00   ` Daniel Vetter
  2016-02-11 13:03   ` Vlastimil Babka
  0 siblings, 2 replies; 12+ messages in thread
From: Mario Kleiner @ 2016-02-08  1:13 UTC (permalink / raw)
  To: dri-devel
  Cc: daniel.vetter, michel, linux, stable, alexander.deucher,
	christian.koenig, vbabka

Changes to drm_update_vblank_count() in Linux 4.4 broke the
behaviour of the pre/post modeset functions as the new update
code doesn't deal with hw vblank counter resets inbetween calls
to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
should.

This causes mistreatment of such hw counter resets as counter
wraparound, and thereby large forward jumps of the software
vblank counter which in turn cause vblank event dispatching
and vblank waits to fail/hang --> userspace clients hang.

This symptom was reported on radeon-kms to cause a infinite
hang of KDE Plasma 5 shell's login procedure, preventing users
from logging in.

Fix this by detecting when drm_update_vblank_count() is called
inside a pre->post modeset interval. If so, clamp valid vblank
increments to the safe values 0 and 1, pretty much restoring
the update behavior of the old update code of Linux 4.3 and
earlier. Also reset the last recorded hw vblank count at call
to drm_vblank_post_modeset() to be safe against hw that after
modesetting, dpms on etc. only fires its first vblank irq after
drm_vblank_post_modeset() was already called.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index aa2c74b..5c27ad3 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -222,6 +222,21 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 	}
 
 	/*
+	 * Within a drm_vblank_pre_modeset - drm_vblank_post_modeset
+	 * interval? If so then vblank irqs keep running and it will likely
+	 * happen that the hardware vblank counter is not trustworthy as it
+	 * might reset at some point in that interval and vblank timestamps
+	 * are not trustworthy either in that interval. Iow. this can result
+	 * in a bogus diff >> 1 which must be avoided as it would cause
+	 * random large forward jumps of the software vblank counter.
+	 */
+	if (diff > 1 && (vblank->inmodeset & 0x2)) {
+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u"
+			      " due to pre-modeset.\n", pipe, diff);
+		diff = 1;
+	}
+
+	/*
 	 * Restrict the bump of the software vblank counter to a safe maximum
 	 * value of +1 whenever there is the possibility that concurrent readers
 	 * of vblank timestamps could be active at the moment, as the current
@@ -1573,6 +1588,7 @@ void drm_vblank_post_modeset(struct drm_device *dev, unsigned int pipe)
 	if (vblank->inmodeset) {
 		spin_lock_irqsave(&dev->vbl_lock, irqflags);
 		dev->vblank_disable_allowed = true;
+		drm_reset_vblank_timestamp(dev, pipe);
 		spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 
 		if (vblank->inmodeset & 0x2)
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  2016-02-08  1:13 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
@ 2016-02-09 10:00   ` Daniel Vetter
  2016-02-11 13:03   ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: Daniel Vetter @ 2016-02-09 10:00 UTC (permalink / raw)
  To: Mario Kleiner
  Cc: dri-devel, linux, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

On Mon, Feb 08, 2016 at 02:13:26AM +0100, Mario Kleiner wrote:
> Changes to drm_update_vblank_count() in Linux 4.4 broke the
> behaviour of the pre/post modeset functions as the new update
> code doesn't deal with hw vblank counter resets inbetween calls
> to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
> should.
> 
> This causes mistreatment of such hw counter resets as counter
> wraparound, and thereby large forward jumps of the software
> vblank counter which in turn cause vblank event dispatching
> and vblank waits to fail/hang --> userspace clients hang.
> 
> This symptom was reported on radeon-kms to cause a infinite
> hang of KDE Plasma 5 shell's login procedure, preventing users
> from logging in.
> 
> Fix this by detecting when drm_update_vblank_count() is called
> inside a pre->post modeset interval. If so, clamp valid vblank
> increments to the safe values 0 and 1, pretty much restoring
> the update behavior of the old update code of Linux 4.3 and
> earlier. Also reset the last recorded hw vblank count at call
> to drm_vblank_post_modeset() to be safe against hw that after
> modesetting, dpms on etc. only fires its first vblank irq after
> drm_vblank_post_modeset() was already called.
> 
> Reported-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 4.4+
> Cc: michel@daenzer.net
> Cc: vbabka@suse.cz
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> Cc: alexander.deucher@amd.com
> Cc: christian.koenig@amd.com

We need to untangle the new vblank stuff that assumes solide (atomic)
drivers and all the old stuff much more. But that can be done later on.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  drivers/gpu/drm/drm_irq.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index aa2c74b..5c27ad3 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -222,6 +222,21 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>  	}
>  
>  	/*
> +	 * Within a drm_vblank_pre_modeset - drm_vblank_post_modeset
> +	 * interval? If so then vblank irqs keep running and it will likely
> +	 * happen that the hardware vblank counter is not trustworthy as it
> +	 * might reset at some point in that interval and vblank timestamps
> +	 * are not trustworthy either in that interval. Iow. this can result
> +	 * in a bogus diff >> 1 which must be avoided as it would cause
> +	 * random large forward jumps of the software vblank counter.
> +	 */
> +	if (diff > 1 && (vblank->inmodeset & 0x2)) {
> +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u"
> +			      " due to pre-modeset.\n", pipe, diff);
> +		diff = 1;
> +	}
> +
> +	/*
>  	 * Restrict the bump of the software vblank counter to a safe maximum
>  	 * value of +1 whenever there is the possibility that concurrent readers
>  	 * of vblank timestamps could be active at the moment, as the current
> @@ -1573,6 +1588,7 @@ void drm_vblank_post_modeset(struct drm_device *dev, unsigned int pipe)
>  	if (vblank->inmodeset) {
>  		spin_lock_irqsave(&dev->vbl_lock, irqflags);
>  		dev->vblank_disable_allowed = true;
> +		drm_reset_vblank_timestamp(dev, pipe);
>  		spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
>  
>  		if (vblank->inmodeset & 0x2)
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  2016-02-08  1:13 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
  2016-02-09 10:00   ` Daniel Vetter
@ 2016-02-11 13:03   ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2016-02-11 13:03 UTC (permalink / raw)
  To: Mario Kleiner, dri-devel
  Cc: linux, stable, michel, ville.syrjala, daniel.vetter,
	alexander.deucher, christian.koenig

On 02/08/2016 02:13 AM, Mario Kleiner wrote:
> Changes to drm_update_vblank_count() in Linux 4.4 broke the
> behaviour of the pre/post modeset functions as the new update
> code doesn't deal with hw vblank counter resets inbetween calls
> to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
> should.
>
> This causes mistreatment of such hw counter resets as counter
> wraparound, and thereby large forward jumps of the software
> vblank counter which in turn cause vblank event dispatching
> and vblank waits to fail/hang --> userspace clients hang.
>
> This symptom was reported on radeon-kms to cause a infinite
> hang of KDE Plasma 5 shell's login procedure, preventing users
> from logging in.
>
> Fix this by detecting when drm_update_vblank_count() is called
> inside a pre->post modeset interval. If so, clamp valid vblank
> increments to the safe values 0 and 1, pretty much restoring
> the update behavior of the old update code of Linux 4.3 and
> earlier. Also reset the last recorded hw vblank count at call
> to drm_vblank_post_modeset() to be safe against hw that after
> modesetting, dpms on etc. only fires its first vblank irq after
> drm_vblank_post_modeset() was already called.
>
> Reported-by: Vlastimil Babka <vbabka@suse.cz>

FWIW, I've applied the whole patchset to 4.4 and the kde5 login problem 
didn't occur. I can test the next version too.

Thanks,
Vlastimil

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Respin: drm vblank regression fixes for Linux 4.4+ (v2)
@ 2016-02-12 19:30 Mario Kleiner
  2016-02-12 19:30 ` [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off() (v2) Mario Kleiner
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Mario Kleiner @ 2016-02-12 19:30 UTC (permalink / raw)
  To: dri-devel

Here the respin of my series of vblank bug fixes for stable
Linux 4.4/4.5 with all review comments by Daniel and Ville
taken into account and reviewd-by's and tested-by's added.

I've removed the stable patch "drm: Prevent vblank counter jumps
with timestamp based update method." for the moment. I'll send a
respin of that patch later for drm-next / Linux 4.6, taking
Daniels and Villes comments into account.

Instead a stable patch 6/6 for nouveau now solves the problem of large
vblank counter jumps for that driver on suspend/resume. As nouveau is
currently the only user of the timestamping based vblank counter path,
this should fix that regression properly for Linux 4.4/4.5. Tested with
a couple of suspend/resume cycles composited/non-composited, with/without
vblank clients on a laptop with nv-50.

thanks,
-mario

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off() (v2)
  2016-02-12 19:30 Respin: drm vblank regression fixes for Linux 4.4+ (v2) Mario Kleiner
@ 2016-02-12 19:30 ` Mario Kleiner
  2016-02-12 19:30 ` [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2) Mario Kleiner
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Mario Kleiner @ 2016-02-12 19:30 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

Otherwise if a kms driver calls into drm_vblank_off() more than once
before calling drm_vblank_on() again, the redundant calls to
vblank_disable_and_save() will call drm_update_vblank_count()
while hw vblank counters and vblank timestamping are in a undefined
state during modesets, dpms off etc.

At least with the legacy drm helpers it is not unusual to
get multiple calls to drm_vblank_off and drm_vblank_on, e.g.,
half a dozen calls to drm_vblank_off and two calls to drm_vblank_on
were observed on radeon-kms during dpms-off -> dpms-on transition.

We don't no-op calls from atomic modesetting drivers, as they
should do a proper job of tracking hw state.

Fixes large jumps of the software maintained vblank counter due to
the hardware vblank counter resetting to zero during dpms off or
modeset, e.g., if radeon-kms is modified to use drm_vblank_off/on
instead of drm_vblank_pre/post_modeset().

This fixes a regression caused by the changes made to
drm_update_vblank_count() in Linux 4.4.

v2: Don't no-op on atomic modesetting drivers, per suggestion
    of Daniel Vetter.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 607f493..1a18033 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1313,7 +1313,13 @@ void drm_vblank_off(struct drm_device *dev, unsigned int pipe)
 	spin_lock_irqsave(&dev->event_lock, irqflags);
 
 	spin_lock(&dev->vbl_lock);
-	vblank_disable_and_save(dev, pipe);
+	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
+		      pipe, vblank->enabled, vblank->inmodeset);
+
+	/* Avoid redundant vblank disables without previous drm_vblank_on(). */
+	if (drm_core_check_feature(dev, DRIVER_ATOMIC) || !vblank->inmodeset)
+		vblank_disable_and_save(dev, pipe);
+
 	wake_up(&vblank->queue);
 
 	/*
@@ -1415,6 +1421,9 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
 		return;
 
 	spin_lock_irqsave(&dev->vbl_lock, irqflags);
+	DRM_DEBUG_VBL("crtc %d, vblank enabled %d, inmodeset %d\n",
+		      pipe, vblank->enabled, vblank->inmodeset);
+
 	/* Drop our private "prevent drm_vblank_get" refcount */
 	if (vblank->inmodeset) {
 		atomic_dec(&vblank->refcount);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2)
  2016-02-12 19:30 Respin: drm vblank regression fixes for Linux 4.4+ (v2) Mario Kleiner
  2016-02-12 19:30 ` [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off() (v2) Mario Kleiner
@ 2016-02-12 19:30 ` Mario Kleiner
  2016-02-12 19:30 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Mario Kleiner @ 2016-02-12 19:30 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

This fixes a regression introduced by the new drm_update_vblank_count()
implementation in Linux 4.4:

Restrict the bump of the software vblank counter in drm_update_vblank_count()
to a safe maximum value of +1 whenever there is the possibility that
concurrent readers of vblank timestamps could be active at the moment,
as the current implementation of the timestamp caching and updating is
not safe against concurrent readers for calls to store_vblank() with a
bump of anything but +1. A bump != 1 would very likely return corrupted
timestamps to userspace, because the same slot in the cache could
be concurrently written by store_vblank() and read by one of those
readers in a non-atomic fashion and without the read-retry logic
detecting this collision.

Concurrent readers can exist while drm_update_vblank_count() is called
from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
irq callers. However, all those calls are happening with the vbl_lock
locked thereby preventing a drm_vblank_get(), so the vblank refcount
can't increase while drm_update_vblank_count() is executing. Therefore
a zero vblank refcount during execution of that function signals that
is safe for arbitrary counter bumps if called from outside vblank irq,
whereas a non-zero count is not safe.

Whenever the function is called from vblank irq, we have to assume concurrent
readers could show up any time during its execution, even if the refcount
is currently zero, as vblank irqs are usually only enabled due to the
presence of readers, and because when it is called from vblank irq it
can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
Therefore also restrict bumps to +1 when the function is called from vblank
irq.

Such bumps of more than +1 can happen at other times than reenabling
vblank irqs, e.g., when regular vblank interrupts get delayed by more
than 1 frame due to long held locks, long irq off periods, realtime
preemption on RT kernels, or system management interrupts.

A better solution would be to rewrite the timestamp caching to use
full seqlocks to allow concurrent writes and reads for arbitrary
vblank counter increments.

v2: Add code comment that this is essentially a hack and should
    be replaced by a full seqlock implementation for caching of
    timestamps.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 1a18033..92ad62f 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -221,6 +221,49 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
 	}
 
+	/*
+	 * FIMXE: Need to replace this hack with proper seqlocks.
+	 *
+	 * Restrict the bump of the software vblank counter to a safe maximum
+	 * value of +1 whenever there is the possibility that concurrent readers
+	 * of vblank timestamps could be active at the moment, as the current
+	 * implementation of the timestamp caching and updating is not safe
+	 * against concurrent readers for calls to store_vblank() with a bump
+	 * of anything but +1. A bump != 1 would very likely return corrupted
+	 * timestamps to userspace, because the same slot in the cache could
+	 * be concurrently written by store_vblank() and read by one of those
+	 * readers without the read-retry logic detecting the collision.
+	 *
+	 * Concurrent readers can exist when we are called from the
+	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
+	 * irq callers. However, all those calls to us are happening with the
+	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
+	 * can't increase while we are executing. Therefore a zero refcount at
+	 * this point is safe for arbitrary counter bumps if we are called
+	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
+	 * we must also accept a refcount of 1, as whenever we are called from
+	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
+	 * we must let that one pass through in order to not lose vblank counts
+	 * during vblank irq off - which would completely defeat the whole
+	 * point of this routine.
+	 *
+	 * Whenever we are called from vblank irq, we have to assume concurrent
+	 * readers exist or can show up any time during our execution, even if
+	 * the refcount is currently zero, as vblank irqs are usually only
+	 * enabled due to the presence of readers, and because when we are called
+	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
+	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
+	 * called from vblank irq.
+	 */
+	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
+	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
+			      "refcount %u, vblirq %u\n", pipe, diff,
+			      atomic_read(&vblank->refcount),
+			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
+		diff = 1;
+	}
+
 	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
 		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
 		      pipe, vblank->count, diff, cur_vblank, vblank->last);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  2016-02-12 19:30 Respin: drm vblank regression fixes for Linux 4.4+ (v2) Mario Kleiner
  2016-02-12 19:30 ` [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off() (v2) Mario Kleiner
  2016-02-12 19:30 ` [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2) Mario Kleiner
@ 2016-02-12 19:30 ` Mario Kleiner
  2016-02-24  8:59   ` Vlastimil Babka
  2016-02-12 19:30 ` [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2) Mario Kleiner
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Mario Kleiner @ 2016-02-12 19:30 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

Changes to drm_update_vblank_count() in Linux 4.4 broke the
behaviour of the pre/post modeset functions as the new update
code doesn't deal with hw vblank counter resets inbetween calls
to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
should.

This causes mistreatment of such hw counter resets as counter
wraparound, and thereby large forward jumps of the software
vblank counter which in turn cause vblank event dispatching
and vblank waits to fail/hang --> userspace clients hang.

This symptom was reported on radeon-kms to cause a infinite
hang of KDE Plasma 5 shell's login procedure, preventing users
from logging in.

Fix this by detecting when drm_update_vblank_count() is called
inside a pre->post modeset interval. If so, clamp valid vblank
increments to the safe values 0 and 1, pretty much restoring
the update behavior of the old update code of Linux 4.3 and
earlier. Also reset the last recorded hw vblank count at call
to drm_vblank_post_modeset() to be safe against hw that after
modesetting, dpms on etc. only fires its first vblank irq after
drm_vblank_post_modeset() was already called.

Reported-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Vlastimil Babka <vbabka@suse.cz>

Cc: <stable@vger.kernel.org> # 4.4+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 92ad62f..055b0fa 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -222,6 +222,21 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
 	}
 
 	/*
+	 * Within a drm_vblank_pre_modeset - drm_vblank_post_modeset
+	 * interval? If so then vblank irqs keep running and it will likely
+	 * happen that the hardware vblank counter is not trustworthy as it
+	 * might reset at some point in that interval and vblank timestamps
+	 * are not trustworthy either in that interval. Iow. this can result
+	 * in a bogus diff >> 1 which must be avoided as it would cause
+	 * random large forward jumps of the software vblank counter.
+	 */
+	if (diff > 1 && (vblank->inmodeset & 0x2)) {
+		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u"
+			      " due to pre-modeset.\n", pipe, diff);
+		diff = 1;
+	}
+
+	/*
 	 * FIMXE: Need to replace this hack with proper seqlocks.
 	 *
 	 * Restrict the bump of the software vblank counter to a safe maximum
@@ -1575,6 +1590,7 @@ void drm_vblank_post_modeset(struct drm_device *dev, unsigned int pipe)
 	if (vblank->inmodeset) {
 		spin_lock_irqsave(&dev->vbl_lock, irqflags);
 		dev->vblank_disable_allowed = true;
+		drm_reset_vblank_timestamp(dev, pipe);
 		spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 
 		if (vblank->inmodeset & 0x2)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2)
  2016-02-12 19:30 Respin: drm vblank regression fixes for Linux 4.4+ (v2) Mario Kleiner
                   ` (2 preceding siblings ...)
  2016-02-12 19:30 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
@ 2016-02-12 19:30 ` Mario Kleiner
  2016-02-12 19:30 ` [PATCH 5/6] drm/radeon/pm: Handle failure of drm_vblank_get Mario Kleiner
  2016-02-12 19:30 ` [PATCH 6/6] drm/nouveau/display: Enable vblank irqs after display engine is on again Mario Kleiner
  5 siblings, 0 replies; 12+ messages in thread
From: Mario Kleiner @ 2016-02-12 19:30 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, stable, michel, vbabka, ville.syrjala,
	daniel.vetter, alexander.deucher, christian.koenig

drm_vblank_offdelay can have three different types of values:

< 0 is to be always treated the same as dev->vblank_disable_immediate
= 0 is to be treated as "never disable vblanks"
> 0 is to be treated as disable immediate if kms driver wants it
    that way via dev->vblank_disable_immediate. Otherwise it is
    a disable timeout in msecs.

This got broken in Linux 3.18+ for the implementation of
drm_vblank_on. If the user specified a value of zero which should
always reenable vblank irqs in this function, a kms driver could
override the users choice by setting vblank_disable_immediate
to true. This patch fixes the regression and keeps the user in
control.

v2: Only reenable vblank if there are clients left or the user
    requested to "never disable vblanks" via offdelay 0. Enabling
    vblanks even in the "delayed disable" case (offdelay > 0) was
    specifically added by Ville in commit cd19e52aee922
    ("drm: Kick start vblank interrupts at drm_vblank_on()"),
    but after discussion it turns out that this was done by accident.

    Citing Ville: "I think it just ended up as a mess due to changing
    some of the semantics of offdelay<0 vs. offdelay==0 vs.
    disable_immediate during the review of the series. So yeah, given
    how drm_vblank_put() works now, I'd just make this check for
    offdelay==0."

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cc: <stable@vger.kernel.org> # 3.18+
Cc: michel@daenzer.net
Cc: vbabka@suse.cz
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/drm_irq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 055b0fa..8090989 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1494,8 +1494,7 @@ void drm_vblank_on(struct drm_device *dev, unsigned int pipe)
 	 * re-enable interrupts if there are users left, or the
 	 * user wishes vblank interrupts to be enabled all the time.
 	 */
-	if (atomic_read(&vblank->refcount) != 0 ||
-	    (!dev->vblank_disable_immediate && drm_vblank_offdelay == 0))
+	if (atomic_read(&vblank->refcount) != 0 || drm_vblank_offdelay == 0)
 		WARN_ON(drm_vblank_enable(dev, pipe));
 	spin_unlock_irqrestore(&dev->vbl_lock, irqflags);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/6] drm/radeon/pm: Handle failure of drm_vblank_get.
  2016-02-12 19:30 Respin: drm vblank regression fixes for Linux 4.4+ (v2) Mario Kleiner
                   ` (3 preceding siblings ...)
  2016-02-12 19:30 ` [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2) Mario Kleiner
@ 2016-02-12 19:30 ` Mario Kleiner
  2016-02-12 19:30 ` [PATCH 6/6] drm/nouveau/display: Enable vblank irqs after display engine is on again Mario Kleiner
  5 siblings, 0 replies; 12+ messages in thread
From: Mario Kleiner @ 2016-02-12 19:30 UTC (permalink / raw)
  To: dri-devel; +Cc: alexander.deucher, michel, christian.koenig

Make sure that drm_vblank_get/put() stay balanced in
case drm_vblank_get fails, by skipping the corresponding
put.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: michel@daenzer.net
Cc: dri-devel@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Cc: christian.koenig@amd.com
---
 drivers/gpu/drm/radeon/radeon_pm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
index 59abebd..a94979f 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -276,8 +276,12 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev)
 	if (rdev->irq.installed) {
 		for (i = 0; i < rdev->num_crtc; i++) {
 			if (rdev->pm.active_crtcs & (1 << i)) {
-				rdev->pm.req_vblank |= (1 << i);
-				drm_vblank_get(rdev->ddev, i);
+				/* This can fail if a modeset is in progress */
+				if (drm_vblank_get(rdev->ddev, i) == 0)
+					rdev->pm.req_vblank |= (1 << i);
+				else
+					DRM_DEBUG_DRIVER("crtc %d no vblank, can glitch\n",
+							 i);
 			}
 		}
 	}
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/6] drm/nouveau/display: Enable vblank irqs after display engine is on again.
  2016-02-12 19:30 Respin: drm vblank regression fixes for Linux 4.4+ (v2) Mario Kleiner
                   ` (4 preceding siblings ...)
  2016-02-12 19:30 ` [PATCH 5/6] drm/radeon/pm: Handle failure of drm_vblank_get Mario Kleiner
@ 2016-02-12 19:30 ` Mario Kleiner
  2016-02-14 18:28   ` Daniel Vetter
  5 siblings, 1 reply; 12+ messages in thread
From: Mario Kleiner @ 2016-02-12 19:30 UTC (permalink / raw)
  To: dri-devel
  Cc: mario.kleiner.de, stable, Ben Skeggs, ville.syrjala,
	daniel.vetter

In the display resume path, move the calls to drm_vblank_on()
after the point when the display engine is running again.

Since changes were made to drm_update_vblank_count() in Linux 4.4+
to emulate hw vblank counters via vblank timestamping, the function
drm_vblank_on() now needs working high precision vblank timestamping
and therefore working scanout position queries at time of call.
These don't work before the display engine gets restarted, causing
miscalculation of vblank counter increments and thereby large forward
jumps in vblank count at display resume. These jumps can cause client
hangs on resume, or desktop hangs in the case of composited desktops.

Fix this Linux 4.4 regression by reordering calls accordingly.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org> # 4.4+
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: ville.syrjala@linux.intel.com
Cc: daniel.vetter@ffwll.ch
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/nouveau/nouveau_display.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 18676b8..1f8e51b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -634,10 +634,6 @@ nouveau_display_resume(struct drm_device *dev, bool runtime)
 		nv_crtc->lut.depth = 0;
 	}
 
-	/* Make sure that drm and hw vblank irqs get resumed if needed. */
-	for (head = 0; head < dev->mode_config.num_crtc; head++)
-		drm_vblank_on(dev, head);
-
 	/* This should ensure we don't hit a locking problem when someone
 	 * wakes us up via a connector.  We should never go into suspend
 	 * while the display is on anyways.
@@ -647,6 +643,10 @@ nouveau_display_resume(struct drm_device *dev, bool runtime)
 
 	drm_helper_resume_force_mode(dev);
 
+	/* Make sure that drm and hw vblank irqs get resumed if needed. */
+	for (head = 0; head < dev->mode_config.num_crtc; head++)
+		drm_vblank_on(dev, head);
+
 	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
 		struct nouveau_crtc *nv_crtc = nouveau_crtc(crtc);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6] drm/nouveau/display: Enable vblank irqs after display engine is on again.
  2016-02-12 19:30 ` [PATCH 6/6] drm/nouveau/display: Enable vblank irqs after display engine is on again Mario Kleiner
@ 2016-02-14 18:28   ` Daniel Vetter
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Vetter @ 2016-02-14 18:28 UTC (permalink / raw)
  To: Mario Kleiner; +Cc: dri-devel, stable, Ben Skeggs, ville.syrjala, daniel.vetter

On Fri, Feb 12, 2016 at 08:30:32PM +0100, Mario Kleiner wrote:
> In the display resume path, move the calls to drm_vblank_on()
> after the point when the display engine is running again.
> 
> Since changes were made to drm_update_vblank_count() in Linux 4.4+
> to emulate hw vblank counters via vblank timestamping, the function
> drm_vblank_on() now needs working high precision vblank timestamping
> and therefore working scanout position queries at time of call.
> These don't work before the display engine gets restarted, causing
> miscalculation of vblank counter increments and thereby large forward
> jumps in vblank count at display resume. These jumps can cause client
> hangs on resume, or desktop hangs in the case of composited desktops.
> 
> Fix this Linux 4.4 regression by reordering calls accordingly.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org> # 4.4+
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: ville.syrjala@linux.intel.com
> Cc: daniel.vetter@ffwll.ch
> Cc: dri-devel@lists.freedesktop.org
> ---
>  drivers/gpu/drm/nouveau/nouveau_display.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
> index 18676b8..1f8e51b 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
> @@ -634,10 +634,6 @@ nouveau_display_resume(struct drm_device *dev, bool runtime)
>  		nv_crtc->lut.depth = 0;
>  	}
>  
> -	/* Make sure that drm and hw vblank irqs get resumed if needed. */
> -	for (head = 0; head < dev->mode_config.num_crtc; head++)
> -		drm_vblank_on(dev, head);
> -
>  	/* This should ensure we don't hit a locking problem when someone
>  	 * wakes us up via a connector.  We should never go into suspend
>  	 * while the display is on anyways.
> @@ -647,6 +643,10 @@ nouveau_display_resume(struct drm_device *dev, bool runtime)
>  
>  	drm_helper_resume_force_mode(dev);
>  
> +	/* Make sure that drm and hw vblank irqs get resumed if needed. */
> +	for (head = 0; head < dev->mode_config.num_crtc; head++)
> +		drm_vblank_on(dev, head);

Hm, imo correct place for these is in the crtc->enable/disable hooks, but
those are only guaranteed to be called symmetrically on atomic drivers.
Anyway as an interim fix this looks good.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> +
>  	list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
>  		struct nouveau_crtc *nv_crtc = nouveau_crtc(crtc);
>  
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  2016-02-12 19:30 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
@ 2016-02-24  8:59   ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2016-02-24  8:59 UTC (permalink / raw)
  To: Mario Kleiner, dri-devel
  Cc: stable, michel, ville.syrjala, daniel.vetter, alexander.deucher,
	christian.koenig

On 02/12/2016 08:30 PM, Mario Kleiner wrote:
> Changes to drm_update_vblank_count() in Linux 4.4 broke the
> behaviour of the pre/post modeset functions as the new update
> code doesn't deal with hw vblank counter resets inbetween calls
> to drm_vblank_pre_modeset an drm_vblank_post_modeset, as it
> should.
>
> This causes mistreatment of such hw counter resets as counter
> wraparound, and thereby large forward jumps of the software
> vblank counter which in turn cause vblank event dispatching
> and vblank waits to fail/hang --> userspace clients hang.
>
> This symptom was reported on radeon-kms to cause a infinite
> hang of KDE Plasma 5 shell's login procedure, preventing users
> from logging in.
>
> Fix this by detecting when drm_update_vblank_count() is called
> inside a pre->post modeset interval. If so, clamp valid vblank
> increments to the safe values 0 and 1, pretty much restoring
> the update behavior of the old update code of Linux 4.3 and
> earlier. Also reset the last recorded hw vblank count at call
> to drm_vblank_post_modeset() to be safe against hw that after
> modesetting, dpms on etc. only fires its first vblank irq after
> drm_vblank_post_modeset() was already called.
>
> Reported-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Tested-by: Vlastimil Babka <vbabka@suse.cz>

FWIW I have applied patches 1-4 of this v2 to 4.4.2 and it works fine so 
far on my radeon. Patch 6/6 was for nouveau so I ignored it, and 5/6 
didn't have stable tag (and from the description I couldn't decide if I 
should include it or not).

Thanks, Vlastimil

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-02-24  8:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-12 19:30 Respin: drm vblank regression fixes for Linux 4.4+ (v2) Mario Kleiner
2016-02-12 19:30 ` [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off() (v2) Mario Kleiner
2016-02-12 19:30 ` [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2) Mario Kleiner
2016-02-12 19:30 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
2016-02-24  8:59   ` Vlastimil Babka
2016-02-12 19:30 ` [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2) Mario Kleiner
2016-02-12 19:30 ` [PATCH 5/6] drm/radeon/pm: Handle failure of drm_vblank_get Mario Kleiner
2016-02-12 19:30 ` [PATCH 6/6] drm/nouveau/display: Enable vblank irqs after display engine is on again Mario Kleiner
2016-02-14 18:28   ` Daniel Vetter
  -- strict thread matches above, loose matches on Subject: below --
2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
2016-02-08  1:13 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
2016-02-09 10:00   ` Daniel Vetter
2016-02-11 13:03   ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).