* Re: [PATCH weston 1/1] compositor: Abort on bad page flip timestamps
[not found] ` <545B1E88.5070702@daenzer.net>
@ 2014-11-06 16:42 ` Pekka Paalanen
[not found] ` <20141106184242.4e281eb9-DA5HUFbJnAM@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Pekka Paalanen @ 2014-11-06 16:42 UTC (permalink / raw)
To: Michel Dänzer; +Cc: Frederic Plourde, dri-devel, wayland-devel
On Thu, 06 Nov 2014 16:08:56 +0900
Michel Dänzer <michel@daenzer.net> wrote:
> On 06.11.2014 03:06, Frederic Plourde wrote:
> > Many features, like animations, hardly depend on page flip timestamps
> > to work properly, but some DRM drivers do not correctly support page flip
> > timestamps (or not at all) and in that case, things start to go wrong.
> >
> > This patch adds sanity check to weston_output_finish_frame. By solely
> > verifying that page flip timestamps are monotonically increasing, we
> > make sure that :
> >
> > 1) Underlying driver is not throwing zeroed-out timestamp series at us.
> > 2) We have not mistakenly jumped backwards because of integer overflow.
> >
> > If a pathological case is detected, we gracefully exit Weston
> > with an appropriate exit code to help developers debug their drivers.
>
> That seems a bit harsh. IIRC, zero can be returned for the timestamp
> intermittently if no accurate timestamp value can be determined, e.g.
> because the CRTC is disabled. At the very least, I'd recommend
> double-checking this with Mario Kleiner (Cc'd) and the dri-devel mailing
> list.
Can that really happen if we are not doing stupid things like
attempting to flip on a disabled crtc or output?
Or can it happen, if we schedule a flip, then disable the crtc
before the flip completes? Or maybe when an output is hot-unplugged?
Is zero a special timestamp that simply cannot be produced during
normal operations, like due to clock wrap-around?
Thanks,
pq
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH weston 1/1] compositor: Abort on bad page flip timestamps
[not found] ` <20141106184242.4e281eb9-DA5HUFbJnAM@public.gmane.org>
@ 2014-11-07 6:04 ` Mario Kleiner
[not found] ` <545C60E0.8090405-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Mario Kleiner @ 2014-11-07 6:04 UTC (permalink / raw)
To: Pekka Paalanen, Michel Dänzer
Cc: Frederic Plourde, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
wayland-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On 06/11/14 17:42, Pekka Paalanen wrote:
> On Thu, 06 Nov 2014 16:08:56 +0900
> Michel Dänzer <michel@daenzer.net> wrote:
>
>> On 06.11.2014 03:06, Frederic Plourde wrote:
>>> Many features, like animations, hardly depend on page flip timestamps
>>> to work properly, but some DRM drivers do not correctly support page flip
>>> timestamps (or not at all) and in that case, things start to go wrong.
>>>
>>> This patch adds sanity check to weston_output_finish_frame. By solely
>>> verifying that page flip timestamps are monotonically increasing, we
>>> make sure that :
>>>
>>> 1) Underlying driver is not throwing zeroed-out timestamp series at us.
>>> 2) We have not mistakenly jumped backwards because of integer overflow.
>>>
>>> If a pathological case is detected, we gracefully exit Weston
>>> with an appropriate exit code to help developers debug their drivers.
>>
>> That seems a bit harsh. IIRC, zero can be returned for the timestamp
>> intermittently if no accurate timestamp value can be determined, e.g.
>> because the CRTC is disabled. At the very least, I'd recommend
>> double-checking this with Mario Kleiner (Cc'd) and the dri-devel mailing
>> list.
>
> Can that really happen if we are not doing stupid things like
> attempting to flip on a disabled crtc or output?
>
I don't think it could happen for non-stupid regular use and would
indicate a driver bug.
> Or can it happen, if we schedule a flip, then disable the crtc
> before the flip completes? Or maybe when an output is hot-unplugged?
>
On kernels <= 3.17 disabling vblank irqs will clear the cached timestamp
to zero, so a waitvblank ioctl() for a pure query of msc/ust could
return zero as a signal of "invalid/undefined timestamp" on some drivers
under some circumstances. Basically a -EAGAIN error code.
But that shouldn't ever happen for kms-pageflip completion timestamps,
because vblank irqs don't get disabled while a pageflip is pending, as
the pending pageflip keeps the vblank reference count > 0.
There are also some new patches into Linux 3.18rc which should prevent
returning zero timestamps even for pure waitvblank ioctl() queries if
vblank irqs get disabled for one reason or the other, cfe.
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/drm_irq.c?id=844b03f27739135fe1fed2fef06da0ffc4c7a081
kms drivers should usually use the drm_send_vblank_event() helper in
their pageflip completion routine. That helper will ideally send out the
cached vblank count and high precision timestamps which were computed
during the most recent vblank interrupt for the crtc that pageflipped.
There's also some fallback there which, if no crtc is specified (crtc ==
-1), will simply return a msc of zero and the current system time as
timestamp. The fallback is for gpus with weird flip completion
behaviour. Currently nouveau uses it for old < NV-50 gpu's where we
couldn't find a better way to make pageflip completion behave
properly/reliably due to some hardware weirdness.
> Is zero a special timestamp that simply cannot be produced during
> normal operations, like due to clock wrap-around?
>
The timestamps are CLOCK_MONOTONIC uint32 seconds.microseconds since
bootup, so a wraparound would ony happen after 2^32 seconds or 136
years, so normal operation shouldn't cause a zero timestamp.
So i think observing zero pageflip timestamps would be a sign that the
kms-driver needs fixing.
-mario
>
> Thanks,
> pq
>
_______________________________________________
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/wayland-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH weston 1/1] compositor: Abort on bad page flip timestamps
[not found] ` <545C60E0.8090405-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-11-07 9:36 ` Pekka Paalanen
2014-11-07 9:56 ` Michel Dänzer
0 siblings, 1 reply; 4+ messages in thread
From: Pekka Paalanen @ 2014-11-07 9:36 UTC (permalink / raw)
To: Mario Kleiner, Michel Dänzer
Cc: Frederic Plourde, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
wayland-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On Fri, 07 Nov 2014 07:04:16 +0100
Mario Kleiner <mario.kleiner.de@gmail.com> wrote:
> On 06/11/14 17:42, Pekka Paalanen wrote:
> > On Thu, 06 Nov 2014 16:08:56 +0900
> > Michel Dänzer <michel@daenzer.net> wrote:
> >
> >> On 06.11.2014 03:06, Frederic Plourde wrote:
> >>> Many features, like animations, hardly depend on page flip timestamps
> >>> to work properly, but some DRM drivers do not correctly support page flip
> >>> timestamps (or not at all) and in that case, things start to go wrong.
> >>>
> >>> This patch adds sanity check to weston_output_finish_frame. By solely
> >>> verifying that page flip timestamps are monotonically increasing, we
> >>> make sure that :
> >>>
> >>> 1) Underlying driver is not throwing zeroed-out timestamp series at us.
> >>> 2) We have not mistakenly jumped backwards because of integer overflow.
> >>>
> >>> If a pathological case is detected, we gracefully exit Weston
> >>> with an appropriate exit code to help developers debug their drivers.
> >>
> >> That seems a bit harsh. IIRC, zero can be returned for the timestamp
> >> intermittently if no accurate timestamp value can be determined, e.g.
> >> because the CRTC is disabled. At the very least, I'd recommend
> >> double-checking this with Mario Kleiner (Cc'd) and the dri-devel mailing
> >> list.
> >
> > Can that really happen if we are not doing stupid things like
> > attempting to flip on a disabled crtc or output?
> >
>
> I don't think it could happen for non-stupid regular use and would
> indicate a driver bug.
>
> > Or can it happen, if we schedule a flip, then disable the crtc
> > before the flip completes? Or maybe when an output is hot-unplugged?
> >
>
> On kernels <= 3.17 disabling vblank irqs will clear the cached timestamp
> to zero, so a waitvblank ioctl() for a pure query of msc/ust could
> return zero as a signal of "invalid/undefined timestamp" on some drivers
> under some circumstances. Basically a -EAGAIN error code.
>
> But that shouldn't ever happen for kms-pageflip completion timestamps,
> because vblank irqs don't get disabled while a pageflip is pending, as
> the pending pageflip keeps the vblank reference count > 0.
>
> There are also some new patches into Linux 3.18rc which should prevent
> returning zero timestamps even for pure waitvblank ioctl() queries if
> vblank irqs get disabled for one reason or the other, cfe.
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/drm_irq.c?id=844b03f27739135fe1fed2fef06da0ffc4c7a081
>
> kms drivers should usually use the drm_send_vblank_event() helper in
> their pageflip completion routine. That helper will ideally send out the
> cached vblank count and high precision timestamps which were computed
> during the most recent vblank interrupt for the crtc that pageflipped.
> There's also some fallback there which, if no crtc is specified (crtc ==
> -1), will simply return a msc of zero and the current system time as
> timestamp. The fallback is for gpus with weird flip completion
> behaviour. Currently nouveau uses it for old < NV-50 gpu's where we
> couldn't find a better way to make pageflip completion behave
> properly/reliably due to some hardware weirdness.
>
> > Is zero a special timestamp that simply cannot be produced during
> > normal operations, like due to clock wrap-around?
> >
>
> The timestamps are CLOCK_MONOTONIC uint32 seconds.microseconds since
> bootup, so a wraparound would ony happen after 2^32 seconds or 136
> years, so normal operation shouldn't cause a zero timestamp.
>
> So i think observing zero pageflip timestamps would be a sign that the
> kms-driver needs fixing.
Cool, thank you for that. So zero timestamp is basically a driver bug.
But, is the assumption of monotonic timestamps similarly valid? Should
we consider getting the same timestamp more than once from a page-flip
completion event to be a driver bug or not? I'm not quite sure as you
mentioned the sending of cached timestamps.
If we still feel that exiting Weston on the first timestamp failure
(zero or not increasing) is too harsh, we could simply log a warning
and add again some heuristics on when things are too far wrong to
continue.
E.g. getting a single timestamp failure during the first 5 seconds of
compositor uptime warrants an exit, but later probably not. We want to
catch cases where the compositor cannot reliably work to begin with,
but killing the user's session much later is pretty blunt if it already
worked somewhat.
Would that be good?
I would prefer to not fall back to clock_gettime() for timestamps on
DRM, so that there would be more pressure to fix the kernel drivers if
any are broken.
Thanks,
pq
_______________________________________________
wayland-devel mailing list
wayland-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/wayland-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH weston 1/1] compositor: Abort on bad page flip timestamps
2014-11-07 9:36 ` Pekka Paalanen
@ 2014-11-07 9:56 ` Michel Dänzer
0 siblings, 0 replies; 4+ messages in thread
From: Michel Dänzer @ 2014-11-07 9:56 UTC (permalink / raw)
To: Pekka Paalanen, Mario Kleiner; +Cc: Frederic Plourde, dri-devel, wayland-devel
On 07.11.2014 18:36, Pekka Paalanen wrote:
>
> If we still feel that exiting Weston on the first timestamp failure
> (zero or not increasing) is too harsh, we could simply log a warning
> and add again some heuristics on when things are too far wrong to
> continue.
>
> E.g. getting a single timestamp failure during the first 5 seconds of
> compositor uptime warrants an exit, but later probably not. We want to
> catch cases where the compositor cannot reliably work to begin with,
> but killing the user's session much later is pretty blunt if it already
> worked somewhat.
>
> Would that be good?
Yeah, that would address my main concern.
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-11-07 9:56 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1415210783-16839-1-git-send-email-frederic.plourde@collabora.co.uk>
[not found] ` <545B1E88.5070702@daenzer.net>
2014-11-06 16:42 ` [PATCH weston 1/1] compositor: Abort on bad page flip timestamps Pekka Paalanen
[not found] ` <20141106184242.4e281eb9-DA5HUFbJnAM@public.gmane.org>
2014-11-07 6:04 ` Mario Kleiner
[not found] ` <545C60E0.8090405-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-11-07 9:36 ` Pekka Paalanen
2014-11-07 9:56 ` Michel Dänzer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.