dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Mario Kleiner <mario.kleiner.de@gmail.com>
To: "Daniel Vetter" <daniel@ffwll.ch>,
	"Ville Syrjälä" <ville.syrjala@linux.intel.com>
Cc: daniel.vetter@ffwll.ch, michel@daenzer.net,
	linux@bernd-steinhauser.de, stable@vger.kernel.org,
	dri-devel@lists.freedesktop.org, alexander.deucher@amd.com,
	christian.koenig@amd.com, vbabka@suse.cz
Subject: Re: [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method.
Date: Wed, 10 Feb 2016 17:28:26 +0100	[thread overview]
Message-ID: <56BB652A.1010107@gmail.com> (raw)
In-Reply-To: <20160209150347.GZ11240@phenom.ffwll.local>

On 02/09/2016 04:03 PM, Daniel Vetter wrote:
> On Tue, Feb 09, 2016 at 04:11:49PM +0200, Ville Syrjälä wrote:
>> On Tue, Feb 09, 2016 at 02:53:30PM +0100, Mario Kleiner wrote:
>>> On 02/09/2016 11:09 AM, Daniel Vetter wrote:
>>>> On Mon, Feb 08, 2016 at 02:13:28AM +0100, Mario Kleiner wrote:
>>>>> The changes to drm_update_vblank_count() in Linux 4.4 added a
>>>>> method to emulate a hardware vblank counter by use of high
>>>>> precision vblank timestamps if a kms driver supports those,
>>>>> but doesn't suppport hw vblank counters.
>>>>>
>>>>> That method assumes that the old timestamp from a previous
>>>>> invocation is valid, but that is not always the case. E.g.,
>>>>> if drm_reset_vblank_timestamp() gets called during drm_vblank_on()
>>>>> or drm_update_vblank_count() gets called outside vblank irq and
>>>>> the high precision timestamping can't deliver a precise timestamp,
>>>>> ie. drm_get_last_vbltimestamp() delivers a return value of false,
>>>>> then those functions will initialize the old timestamp to zero
>>>>> to mark it as invalid.
>>>>>
>>>>> A following call to drm_update_vblank_count() would then calculate
>>>>> elapsed time with vblank irqs off as current vblank timestamp minus
>>>>> the zero old timestamp and compute a software vblank counter increment
>>>>> that corresponds to system uptime, causing a large forward jump of the
>>>>> software vblank counter. That jump in turn can cause too long waits
>>>>> in drmWaitVblank and very long delays in delivery of vblank events,
>>>>> resulting in hangs of userspace clients.
>>>>>
>>>>> This problem can be observed on nouveau-kms during machine
>>>>> suspend->resume cycles, where drm_vblank_off is called during
>>>>> suspend, drm_vblank_on is called during resume and the first
>>>>> queries to drm_get_last_vbltimestamp() don't deliver high
>>>>> precision timestamps, resulting in a large harmful counter jump.
>>>>
>>>> Why does nouveau enable vblank interrupts before it can get valid
>>>> timestamps? That sounds like the core bug here, and papering over that in
>>>> the vblank code feels very wrong to me. Maybe we should instead just not
>>>> sample the vblank at all if the timestamp fails and splat a big WARN_ON
>>>> about this, to give driver writers a chance to fix up their mess?
>>>> -Daniel
>>>>
>>>
>>> The high precision timestamping is allowed to fail for a kms driver
>>> under some conditions which are not implementation errors of the driver,
>>> or getting disabled by user override, so we should handle that robustly.
>>> We handle it robustly everywhere else in the drm, so we should do it
>>> here as well.
>>>
>>> E.g., nouveau generally supports timestamping/scanout position queries,
>>> but can't support it on old pre NV-50 hardware with some output type
>>> (either on analog VGA, or DVI-D, can't remember atm. which one is
>>> unsupported).
>>
>> I think the surprising thing here is that it fails first and then
>> succeeds on the same crtc/output combo presumably.
>
> Yeah exactly this. Failing consistently is ok imo (and probably should be
> handled). Failing first and then later on working (without changing the
> mode config in between) seems suspicous. That shouldn't ever happen really
> and seems like a driver bug (like enabling vblanks too early, before the
> pipe is fully up&running).
> -Daniel
>
>>
>> I guess in theory the driver could fail during random times for whatever
>> reason, though I tend to think that the driver should either make it
>> robust or not even pretend to support it.
>>
>> I suppose one failure mode the driver can't really do anything about is
>> some random SMI crap or something stalling the timestamp queries enough
>> that we fail the precisions tests and exhaust the retry limits. So yeah,
>> making it robust against that kind of nastyness sounds line it might be
>> a good idea. Though perhaps it should be something a bit more severe
>> than a DRM_DEBUG since I think it really shouldn't happen when the
>> driver and system are otherwise sane. Of course if it routinely fails
>> with some driver then simply making it spew errors into dmesg isn't
>> so nice, unless the driver also gets fixed.
>>

I think i have an idea what might go wrong with nouveau, so i'll see if 
i can add a fixup patch.

There's another scenario where this zero-ts case can be hit. If the 
driver drm_vblank_init()'s - setting all timestamps to zero - and then 
code starts using vblanks (drm_vblank_get()) without drm_vblank_on 
beforehand, which is afaics the case with nouveau. Unless that's 
considered an error as well, we'd need to init the timestamps to 
something non-zero but harmless like 1 usecs at drm_vblank_init() time?
What makes sense as output here? DRM_WARN_ONCE?

-mario

>>>
>>> There are also new Soc drivers showing up which support those methods,
>>> but at least i didn't verify or test if their implementations are good
>>> enough for the needs of the new drm_udpate_vblank_count() which is more
>>> sensitive to kms drivers being even slightly off.
>>>
>>> -mario
>>>
>>>>>
>>>>> Fix this by checking if the old timestamp used for this calculations
>>>>> is zero == invalid. If so, perform a counter increment of +1 to
>>>>> prevent large counter jumps and reinitialize the timestamps to
>>>>> sane values.
>>>>>
>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>>> Cc: <stable@vger.kernel.org> # 4.4+
>>>>> Cc: michel@daenzer.net
>>>>> Cc: vbabka@suse.cz
>>>>> Cc: ville.syrjala@linux.intel.com
>>>>> Cc: daniel.vetter@ffwll.ch
>>>>> Cc: dri-devel@lists.freedesktop.org
>>>>> Cc: alexander.deucher@amd.com
>>>>> Cc: christian.koenig@amd.com
>>>>> ---
>>>>>    drivers/gpu/drm/drm_irq.c | 7 +++++++
>>>>>    1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>>>>> index fb17c45..88bdf19 100644
>>>>> --- a/drivers/gpu/drm/drm_irq.c
>>>>> +++ b/drivers/gpu/drm/drm_irq.c
>>>>> @@ -216,6 +216,13 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>>>>    			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>>>>>    				      " diff_ns = %lld, framedur_ns = %d)\n",
>>>>>    				      pipe, (long long) diff_ns, framedur_ns);
>>>>> +
>>>>> +		/* No valid t_old to calculate diff? Bump +1 to force reinit. */
>>>>> +		if (t_old->tv_sec == 0 && t_old->tv_usec == 0) {
>>>>> +			DRM_DEBUG_VBL("crtc %u: No baseline ts. Bump +1.\n",
>>>>> +				      pipe);
>>>>> +			diff = 1;
>>>>> +		}
>>>>>    	} else {
>>>>>    		/* some kind of default for drivers w/o accurate vbl timestamping */
>>>>>    		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>>>> --
>>>>> 1.9.1
>>>>>
>>>>
>>
>> --
>> Ville Syrjälä
>> Intel OTC
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2016-02-10 16:28 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-08  1:13 drm vblank regression fixes for Linux 4.4+ Mario Kleiner
2016-02-08  1:13 ` [PATCH 1/6] drm: No-Op redundant calls to drm_vblank_off() Mario Kleiner
2016-02-09  9:54   ` Daniel Vetter
2016-02-09 13:27     ` Mario Kleiner
2016-02-08  1:13 ` [PATCH 2/6] drm: Prevent vblank counter bumps > 1 with active vblank clients Mario Kleiner
2016-02-09  9:56   ` Daniel Vetter
2016-02-09 10:07     ` Ville Syrjälä
2016-02-09 10:23       ` Daniel Vetter
2016-02-09 13:39         ` Mario Kleiner
2016-02-09 14:29           ` Daniel Vetter
2016-02-09 16:18             ` Mario Kleiner
2016-02-08  1:13 ` [PATCH 3/6] drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4 Mario Kleiner
2016-02-09 10:00   ` Daniel Vetter
2016-02-11 13:03   ` Vlastimil Babka
2016-02-08  1:13 ` [PATCH 4/6] drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() Mario Kleiner
2016-02-09 10:06   ` Daniel Vetter
2016-02-09 11:10     ` Ville Syrjälä
2016-02-09 13:29       ` Mario Kleiner
2016-02-09 13:41         ` Ville Syrjälä
2016-02-09 14:31           ` Daniel Vetter
2016-02-08  1:13 ` [PATCH 5/6] drm: Prevent vblank counter jumps with timestamp based update method Mario Kleiner
2016-02-09 10:09   ` Daniel Vetter
2016-02-09 13:53     ` Mario Kleiner
2016-02-09 14:11       ` Ville Syrjälä
2016-02-09 15:03         ` Daniel Vetter
2016-02-10 16:28           ` Mario Kleiner [this message]
2016-02-10 17:17             ` Daniel Vetter
2016-02-10 18:36               ` Mario Kleiner
2016-02-10 19:34                 ` Daniel Vetter
2016-02-08  1:13 ` [PATCH 6/6] drm/radeon/pm: Handle failure of drm_vblank_get Mario Kleiner
2016-02-09 10:10   ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56BB652A.1010107@gmail.com \
    --to=mario.kleiner.de@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux@bernd-steinhauser.de \
    --cc=michel@daenzer.net \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=ville.syrjala@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).