* drm: fix one flawed mutex grab and remove some spurious mutex grabs
@ 2011-10-26 2:20 Ilija Hadzic
2011-10-26 2:20 ` [PATCH 1/3] drm: no need to hold global mutex for static data Ilija Hadzic
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-26 2:20 UTC (permalink / raw)
To: airlied, dri-devel
The following three patches remove unecessary locks around ioctls
in drm module.
First two:
[PATCH 1/3] drm: no need to hold global mutex for static data
[PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex
are rather trivial and straight forward and probably do not
need much explanation.
The third one:
[PATCH 3/3] drm: do not sleep on vblank while holding a mutex
is more serious and clears a clog that can occur if multiple
processes call drm_wait_vblank as explained in patch commit
message.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/3] drm: no need to hold global mutex for static data
2011-10-26 2:20 drm: fix one flawed mutex grab and remove some spurious mutex grabs Ilija Hadzic
@ 2011-10-26 2:20 ` Ilija Hadzic
2011-10-26 7:44 ` Daniel Vetter
2011-10-26 2:20 ` [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex Ilija Hadzic
2011-10-26 2:20 ` [PATCH 3/3] drm: do not sleep on vblank while holding a mutex Ilija Hadzic
2 siblings, 1 reply; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-26 2:20 UTC (permalink / raw)
To: airlied, dri-devel
drm_getcap and drm_version ioctls only reads static data,
there is no need to protect them with drm_global_mutex,
so make them DRM_UNLOCKED
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
---
drivers/gpu/drm/drm_drv.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 7a87e08..e159f17 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -60,14 +60,14 @@ static int drm_version(struct drm_device *dev, void *data,
/** Ioctl table */
static struct drm_ioctl_desc drm_ioctls[] = {
- DRM_IOCTL_DEF(DRM_IOCTL_VERSION, drm_version, 0),
+ DRM_IOCTL_DEF(DRM_IOCTL_VERSION, drm_version, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0),
DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0),
- DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, 0),
+ DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER),
DRM_IOCTL_DEF(DRM_IOCTL_SET_UNIQUE, drm_setunique, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
--
1.7.7
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex
2011-10-26 2:20 drm: fix one flawed mutex grab and remove some spurious mutex grabs Ilija Hadzic
2011-10-26 2:20 ` [PATCH 1/3] drm: no need to hold global mutex for static data Ilija Hadzic
@ 2011-10-26 2:20 ` Ilija Hadzic
2011-10-26 7:47 ` Daniel Vetter
2011-10-26 2:20 ` [PATCH 3/3] drm: do not sleep on vblank while holding a mutex Ilija Hadzic
2 siblings, 1 reply; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-26 2:20 UTC (permalink / raw)
To: airlied, dri-devel
drm_getmap, drm_getclient and drm_getstats are all
protected with their own mutex (dev->struct_mutex)
no need to hold global mutex; make them DRM_UNLOCKED
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
---
drivers/gpu/drm/drm_drv.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index e159f17..dbabcb0 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -64,9 +64,9 @@ static struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_GET_UNIQUE, drm_getunique, 0),
DRM_IOCTL_DEF(DRM_IOCTL_GET_MAGIC, drm_getmagic, 0),
DRM_IOCTL_DEF(DRM_IOCTL_IRQ_BUSID, drm_irq_by_busid, DRM_MASTER|DRM_ROOT_ONLY),
- DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, 0),
- DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, 0),
- DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, 0),
+ DRM_IOCTL_DEF(DRM_IOCTL_GET_MAP, drm_getmap, DRM_UNLOCKED),
+ DRM_IOCTL_DEF(DRM_IOCTL_GET_CLIENT, drm_getclient, DRM_UNLOCKED),
+ DRM_IOCTL_DEF(DRM_IOCTL_GET_STATS, drm_getstats, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_GET_CAP, drm_getcap, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_SET_VERSION, drm_setversion, DRM_MASTER),
--
1.7.7
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-26 2:20 drm: fix one flawed mutex grab and remove some spurious mutex grabs Ilija Hadzic
2011-10-26 2:20 ` [PATCH 1/3] drm: no need to hold global mutex for static data Ilija Hadzic
2011-10-26 2:20 ` [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex Ilija Hadzic
@ 2011-10-26 2:20 ` Ilija Hadzic
2011-10-26 7:54 ` Daniel Vetter
2011-10-26 8:02 ` Michel Dänzer
2 siblings, 2 replies; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-26 2:20 UTC (permalink / raw)
To: airlied, dri-devel
holding the drm_global_mutex in drm_wait_vblank and then
going to sleep (via DRM_WAIT_ON) is a bad idea in general
because it can block other processes that may issue ioctls
that also grab drm_global_mutex. Blocking can occur until
next vblank which is in the tens of microseconds order of
magnitude.
fix it, but making drm_wait_vblank DRM_UNLOCKED call and
then grabbing the mutex inside the drm_wait_vblank function
and only for the duration of the critical section (i.e.,
from first manipulation of vblank sequence number until
putting the task in the wait queue).
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
---
drivers/gpu/drm/drm_drv.c | 2 +-
drivers/gpu/drm/drm_irq.c | 16 +++++++++++-----
include/drm/drm_os_linux.h | 25 +++++++++++++++++++++++++
3 files changed, 37 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index dbabcb0..dc0eb0b 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = {
DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
- DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0),
+ DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0),
diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 3830e9e..d85d604 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1191,6 +1191,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
DRM_DEBUG("failed to acquire vblank counter, %d\n", ret);
return ret;
}
+ mutex_lock(&drm_global_mutex);
seq = drm_vblank_count(dev, crtc);
switch (vblwait->request.type & _DRM_VBLANK_TYPES_MASK) {
@@ -1200,12 +1201,15 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
case _DRM_VBLANK_ABSOLUTE:
break;
default:
+ mutex_unlock(&drm_global_mutex);
ret = -EINVAL;
goto done;
}
- if (flags & _DRM_VBLANK_EVENT)
+ if (flags & _DRM_VBLANK_EVENT) {
+ mutex_unlock(&drm_global_mutex);
return drm_queue_vblank_event(dev, crtc, vblwait, file_priv);
+ }
if ((flags & _DRM_VBLANK_NEXTONMISS) &&
(seq - vblwait->request.sequence) <= (1<<23)) {
@@ -1215,10 +1219,12 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
vblwait->request.sequence, crtc);
dev->last_vblank_wait[crtc] = vblwait->request.sequence;
- DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
- (((drm_vblank_count(dev, crtc) -
- vblwait->request.sequence) <= (1 << 23)) ||
- !dev->irq_enabled));
+ /* DRM_WAIT_ON_LOCKED will release drm_global_mutex */
+ /* before going to sleep */
+ DRM_WAIT_ON_LOCKED(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
+ (((drm_vblank_count(dev, crtc) -
+ vblwait->request.sequence) <= (1 << 23)) ||
+ !dev->irq_enabled));
if (ret != -EINTR) {
struct timeval now;
diff --git a/include/drm/drm_os_linux.h b/include/drm/drm_os_linux.h
index 3933691..fc6aaf4 100644
--- a/include/drm/drm_os_linux.h
+++ b/include/drm/drm_os_linux.h
@@ -123,5 +123,30 @@ do { \
remove_wait_queue(&(queue), &entry); \
} while (0)
+#define DRM_WAIT_ON_LOCKED( ret, queue, timeout, condition ) \
+do { \
+ DECLARE_WAITQUEUE(entry, current); \
+ unsigned long end = jiffies + (timeout); \
+ add_wait_queue(&(queue), &entry); \
+ mutex_unlock(&drm_global_mutex); \
+ \
+ for (;;) { \
+ __set_current_state(TASK_INTERRUPTIBLE); \
+ if (condition) \
+ break; \
+ if (time_after_eq(jiffies, end)) { \
+ ret = -EBUSY; \
+ break; \
+ } \
+ schedule_timeout((HZ/100 > 1) ? HZ/100 : 1); \
+ if (signal_pending(current)) { \
+ ret = -EINTR; \
+ break; \
+ } \
+ } \
+ __set_current_state(TASK_RUNNING); \
+ remove_wait_queue(&(queue), &entry); \
+} while (0)
+
#define DRM_WAKEUP( queue ) wake_up( queue )
#define DRM_INIT_WAITQUEUE( queue ) init_waitqueue_head( queue )
--
1.7.7
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/3] drm: no need to hold global mutex for static data
2011-10-26 2:20 ` [PATCH 1/3] drm: no need to hold global mutex for static data Ilija Hadzic
@ 2011-10-26 7:44 ` Daniel Vetter
0 siblings, 0 replies; 25+ messages in thread
From: Daniel Vetter @ 2011-10-26 7:44 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: dri-devel
On Tue, Oct 25, 2011 at 10:20:12PM -0400, Ilija Hadzic wrote:
> drm_getcap and drm_version ioctls only reads static data,
> there is no need to protect them with drm_global_mutex,
> so make them DRM_UNLOCKED
>
> Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex
2011-10-26 2:20 ` [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex Ilija Hadzic
@ 2011-10-26 7:47 ` Daniel Vetter
2011-10-26 16:11 ` Ilija Hadzic
0 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2011-10-26 7:47 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: dri-devel
On Tue, Oct 25, 2011 at 10:20:13PM -0400, Ilija Hadzic wrote:
> drm_getmap, drm_getclient and drm_getstats are all
> protected with their own mutex (dev->struct_mutex)
> no need to hold global mutex; make them DRM_UNLOCKED
>
> Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Just to check before I dig into reviewing this: Have you check all the
other users of these data structure that these functions touch whether
they don't accidentally rely on the global lock being taken? Just because
there's some locking around doesn't mean it actually protects much ...
I ask because I've just recently waded through the non-kms drm cruft again
and there's some horrible stuff there.
Yours, Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-26 2:20 ` [PATCH 3/3] drm: do not sleep on vblank while holding a mutex Ilija Hadzic
@ 2011-10-26 7:54 ` Daniel Vetter
2011-10-26 22:33 ` Ilija Hadzic
2011-10-26 8:02 ` Michel Dänzer
1 sibling, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2011-10-26 7:54 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: dri-devel
On Tue, Oct 25, 2011 at 10:20:14PM -0400, Ilija Hadzic wrote:
> holding the drm_global_mutex in drm_wait_vblank and then
> going to sleep (via DRM_WAIT_ON) is a bad idea in general
> because it can block other processes that may issue ioctls
> that also grab drm_global_mutex. Blocking can occur until
> next vblank which is in the tens of microseconds order of
> magnitude.
>
> fix it, but making drm_wait_vblank DRM_UNLOCKED call and
> then grabbing the mutex inside the drm_wait_vblank function
> and only for the duration of the critical section (i.e.,
> from first manipulation of vblank sequence number until
> putting the task in the wait queue).
>
> Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
While you mess around with this code, please use the standard linux
wait_event_* infrastructure. I want to stop that DRM_FOO yelling from
spreading around - the idea of the drm core being os agnostic died quite a
while ago.
Again, have you carefully checked whether this is safe and how the
relevant data structures (vblank counting) are proctected?
One comment below, mind though that I've only glanced over the patch.
-Daniel
> ---
> drivers/gpu/drm/drm_drv.c | 2 +-
> drivers/gpu/drm/drm_irq.c | 16 +++++++++++-----
> include/drm/drm_os_linux.h | 25 +++++++++++++++++++++++++
> 3 files changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index dbabcb0..dc0eb0b 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -124,7 +124,7 @@ static struct drm_ioctl_desc drm_ioctls[] = {
> DRM_IOCTL_DEF(DRM_IOCTL_SG_ALLOC, drm_sg_alloc_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
> DRM_IOCTL_DEF(DRM_IOCTL_SG_FREE, drm_sg_free, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
>
> - DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, 0),
> + DRM_IOCTL_DEF(DRM_IOCTL_WAIT_VBLANK, drm_wait_vblank, DRM_UNLOCKED),
>
> DRM_IOCTL_DEF(DRM_IOCTL_MODESET_CTL, drm_modeset_ctl, 0),
>
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index 3830e9e..d85d604 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1191,6 +1191,7 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
> DRM_DEBUG("failed to acquire vblank counter, %d\n", ret);
> return ret;
> }
> + mutex_lock(&drm_global_mutex);
> seq = drm_vblank_count(dev, crtc);
>
> switch (vblwait->request.type & _DRM_VBLANK_TYPES_MASK) {
> @@ -1200,12 +1201,15 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
> case _DRM_VBLANK_ABSOLUTE:
> break;
> default:
> + mutex_unlock(&drm_global_mutex);
> ret = -EINVAL;
> goto done;
> }
>
> - if (flags & _DRM_VBLANK_EVENT)
> + if (flags & _DRM_VBLANK_EVENT) {
> + mutex_unlock(&drm_global_mutex);
> return drm_queue_vblank_event(dev, crtc, vblwait, file_priv);
> + }
>
> if ((flags & _DRM_VBLANK_NEXTONMISS) &&
> (seq - vblwait->request.sequence) <= (1<<23)) {
> @@ -1215,10 +1219,12 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
> DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
> vblwait->request.sequence, crtc);
> dev->last_vblank_wait[crtc] = vblwait->request.sequence;
> - DRM_WAIT_ON(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
> - (((drm_vblank_count(dev, crtc) -
> - vblwait->request.sequence) <= (1 << 23)) ||
> - !dev->irq_enabled));
> + /* DRM_WAIT_ON_LOCKED will release drm_global_mutex */
> + /* before going to sleep */
> + DRM_WAIT_ON_LOCKED(ret, dev->vbl_queue[crtc], 3 * DRM_HZ,
> + (((drm_vblank_count(dev, crtc) -
> + vblwait->request.sequence) <= (1 << 23)) ||
> + !dev->irq_enabled));
>
> if (ret != -EINTR) {
> struct timeval now;
> diff --git a/include/drm/drm_os_linux.h b/include/drm/drm_os_linux.h
> index 3933691..fc6aaf4 100644
> --- a/include/drm/drm_os_linux.h
> +++ b/include/drm/drm_os_linux.h
> @@ -123,5 +123,30 @@ do { \
> remove_wait_queue(&(queue), &entry); \
> } while (0)
>
> +#define DRM_WAIT_ON_LOCKED( ret, queue, timeout, condition ) \
> +do { \
> + DECLARE_WAITQUEUE(entry, current); \
> + unsigned long end = jiffies + (timeout); \
> + add_wait_queue(&(queue), &entry); \
> + mutex_unlock(&drm_global_mutex); \
Hiding a lock-dropping in this fashion is horrible.
> + \
> + for (;;) { \
> + __set_current_state(TASK_INTERRUPTIBLE); \
> + if (condition) \
> + break; \
> + if (time_after_eq(jiffies, end)) { \
> + ret = -EBUSY; \
> + break; \
> + } \
> + schedule_timeout((HZ/100 > 1) ? HZ/100 : 1); \
> + if (signal_pending(current)) { \
> + ret = -EINTR; \
> + break; \
> + } \
> + } \
> + __set_current_state(TASK_RUNNING); \
> + remove_wait_queue(&(queue), &entry); \
> +} while (0)
> +
> #define DRM_WAKEUP( queue ) wake_up( queue )
> #define DRM_INIT_WAITQUEUE( queue ) init_waitqueue_head( queue )
> --
> 1.7.7
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-26 2:20 ` [PATCH 3/3] drm: do not sleep on vblank while holding a mutex Ilija Hadzic
2011-10-26 7:54 ` Daniel Vetter
@ 2011-10-26 8:02 ` Michel Dänzer
2011-10-26 22:50 ` Ilija Hadzic
1 sibling, 1 reply; 25+ messages in thread
From: Michel Dänzer @ 2011-10-26 8:02 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: dri-devel
On Die, 2011-10-25 at 22:20 -0400, Ilija Hadzic wrote:
> holding the drm_global_mutex in drm_wait_vblank and then
> going to sleep (via DRM_WAIT_ON) is a bad idea in general
> because it can block other processes that may issue ioctls
> that also grab drm_global_mutex. Blocking can occur until
> next vblank which is in the tens of microseconds order of
> magnitude.
>
> fix it, but making drm_wait_vblank DRM_UNLOCKED call and
> then grabbing the mutex inside the drm_wait_vblank function
> and only for the duration of the critical section (i.e.,
> from first manipulation of vblank sequence number until
> putting the task in the wait queue).
Does it really need drm_global_mutex at all, as opposed to e.g.
dev->struct_mutex?
> diff --git a/include/drm/drm_os_linux.h b/include/drm/drm_os_linux.h
> index 3933691..fc6aaf4 100644
> --- a/include/drm/drm_os_linux.h
> +++ b/include/drm/drm_os_linux.h
> @@ -123,5 +123,30 @@ do
> { \
> remove_wait_queue(&(queue), &entry); \
> } while (0)
>
> +#define DRM_WAIT_ON_LOCKED( ret, queue, timeout, condition ) \
> +do { \
> + DECLARE_WAITQUEUE(entry, current); \
> + unsigned long end = jiffies + (timeout); \
> + add_wait_queue(&(queue), &entry); \
> + mutex_unlock(&drm_global_mutex); \
I agree with Daniel's sentiment on this. AFAICT add_wait_queue()
protects against concurrent access to the wait queue, so I think it
would be better to just drop the mutex explicitly before calling
DRM_WAIT_ON.
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex
2011-10-26 7:47 ` Daniel Vetter
@ 2011-10-26 16:11 ` Ilija Hadzic
2011-10-26 20:01 ` Daniel Vetter
0 siblings, 1 reply; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-26 16:11 UTC (permalink / raw)
To: Daniel Vetter; +Cc: dri-devel
On Wed, 26 Oct 2011, Daniel Vetter wrote:
>
> Just to check before I dig into reviewing this: Have you check all the
> other users of these data structure that these functions touch whether
> they don't accidentally rely on the global lock being taken?
I did and thought it was safe. I re-checked this morning prompted by
your note and of course there is one (easily fixable) thing that I missed.
Here is the full "report":
drm_getclient is safe: the only thing that should be protected there is
dev->filelist and that is all protected in other functions using
struct_mutex.
drm_getstats is safe: actually I think there is no need for any mutex there
because the loop runs through quasi-static (set at load time only) data,
and the actual count access is done with atomic_read()
drm_getmap has one problem which I can fix (and it's the hardest to
follow): the structure that should be protected there is the dev->maplist.
There are three functions that may potentially be an issue:
drm_getsarea doesn't grab any lock before touching dev->maplist (so
global mutex won't help here either). However, drivers seem to call
it only at initialization time so it probably doesn't matter
drm_master_destroy doesn't grab any lock either, but it is called
from drm_master_put which is protected with dev->struct_mutex
everywhere else in drm module. I also see a couple of calls
to drm_master_destroy from vmwgfx driver that look unlocked
to me, but drm_global_mutex won't help here either
drm_getsareactx uses struct_mutex, but it apparently releases it
too early (that's the problem I missed initially). If it's released
after it's done with dev->maplist, we should be fine.
I'll redo this patch with a fix to drm_getsareactx and that should do it.
I would still appreciate another pair of eyes to make sure I didn't miss
anything else
thanks,
-- Ilija
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex
2011-10-26 16:11 ` Ilija Hadzic
@ 2011-10-26 20:01 ` Daniel Vetter
0 siblings, 0 replies; 25+ messages in thread
From: Daniel Vetter @ 2011-10-26 20:01 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: dri-devel
On Wed, Oct 26, 2011 at 11:11:14AM -0500, Ilija Hadzic wrote:
> On Wed, 26 Oct 2011, Daniel Vetter wrote:
> >
> >Just to check before I dig into reviewing this: Have you check all the
> >other users of these data structure that these functions touch whether
> >they don't accidentally rely on the global lock being taken?
>
> I did and thought it was safe. I re-checked this morning prompted by
> your note and of course there is one (easily fixable) thing that I
> missed.
> Here is the full "report":
>
> drm_getclient is safe: the only thing that should be protected there
> is dev->filelist and that is all protected in other functions using
> struct_mutex.
>
> drm_getstats is safe: actually I think there is no need for any mutex there
> because the loop runs through quasi-static (set at load time only)
> data, and the actual count access is done with atomic_read()
>
> drm_getmap has one problem which I can fix (and it's the hardest to
> follow): the structure that should be protected there is the
> dev->maplist. There are three functions that may potentially be an
> issue:
>
> drm_getsarea doesn't grab any lock before touching dev->maplist (so
> global mutex won't help here either). However, drivers seem to call
> it only at initialization time so it probably doesn't matter
>
> drm_master_destroy doesn't grab any lock either, but it is called
> from drm_master_put which is protected with dev->struct_mutex
> everywhere else in drm module. I also see a couple of calls
> to drm_master_destroy from vmwgfx driver that look unlocked
> to me, but drm_global_mutex won't help here either
>
> drm_getsareactx uses struct_mutex, but it apparently releases it
> too early (that's the problem I missed initially). If it's released
> after it's done with dev->maplist, we should be fine.
>
> I'll redo this patch with a fix to drm_getsareactx and that should do it.
> I would still appreciate another pair of eyes to make sure I didn't
> miss anything else
Awesome. Please include this in the patch description, and if the
description gets too complicated, maybe even split up the patches into the
individual cases. Also, when you can convince yourself that drm_getstats
doesn't need the struct_mutex, I think it'd be best to drop it. Locking
that doesn't actually protect anything just confuses, especially here in
drm where the locking isn't too clear to begin with.
Yours, Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-26 7:54 ` Daniel Vetter
@ 2011-10-26 22:33 ` Ilija Hadzic
2011-10-27 10:43 ` Daniel Vetter
0 siblings, 1 reply; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-26 22:33 UTC (permalink / raw)
To: Daniel Vetter; +Cc: dri-devel
On Wed, 26 Oct 2011, Daniel Vetter wrote:
>
> While you mess around with this code, please use the standard linux
> wait_event_* infrastructure.
Right now the order of entering the queue is guaranteed to be the same as
the order of entering the ioctl, which reflects the order of progressing
vblank sequence. I wanted to preserve this semantic, so I need a wait
function that puts the task into the queue and then unlocks the mutex
(essentially a kernel equivalent of pthread_cond_wait that POSIX threads
library implements).
The closest "approximation" of that wait_event_interruptable_locked, but
that requires a spinlock, not mutex (and thus the rework to
drm_wait_vblank ioctl would be more radical.
> I want to stop that DRM_FOO yelling from
> spreading around - the idea of the drm core being os agnostic died quite a
> while ago.
>
Is DRM support for other OS-es officially dead ? I know it's not in best
shape on BSD and friends, but I am not sure if I should make it worse (or
inconsistent with the current code shape and form).
Anyway, the primary reason for implementing the DRM_WAIT_ON_LOCKED
is because I didn't have the function that enqueues the task and then
releases the mutex.
> Again, have you carefully checked whether this is safe and how the
> relevant data structures (vblank counting) are proctected?
>
I am only moving the global lock further down in the function. The
structures moved out of the critical section are only local vars. and
arguments. So it's safe. Had I changed the mutex to something other than
global one (which is the right thing to do in a long run) then a more
careful review would be warranted.
Also note that running drm_wait_ioctl as DRM_LOCKED is a severe problem
that we have to address quickly. So I'd like to get *some* kind of decent
fix in this merge window and then follow up with more polishing later.
For example, try compiling and running this code (which any bozo in the
userland can do):
#include <stdio.h>
#include <xf86drm.h>
main()
{
int fd;
drmVBlank vbl;
fd = open("/dev/dri/card0", 0);
printf("fd=%d\n", fd);
while(1) {
vbl.request.type = DRM_VBLANK_RELATIVE ;
vbl.request.sequence = 60;
printf("waiting on vblank\n");
ioctl(fd, DRM_IOCTL_WAIT_VBLANK, &vbl);
}
}
Then start glxgears (on CRTC-0) and watch the hell break loose. The
hostile code will cause any OpenGL application to hang for a full second
before it can even enter the vblank_wait. Now because all vblank waits go
through a common function in DDX, the whole windowing system will
consequently go into a crawl. Run it with Compiz and things are even worse
(the whole windowing system goes "stupid" as soon as the hostile
application is started).
>> + add_wait_queue(&(queue), &entry); \
>> + mutex_unlock(&drm_global_mutex); \
>
> Hiding a lock-dropping in this fashion is horrible.
>
I explained above why am I have to release the lock inside the macro.
That's equivalent to what you can find in the userland in POSIX threads,
like I said. So that's not bad.
What *is* bad is that I should have passed the mutex as an argument to
DRM_WAIT_ON_LOCKED and that I'll fix.
-- Ilija
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-26 8:02 ` Michel Dänzer
@ 2011-10-26 22:50 ` Ilija Hadzic
0 siblings, 0 replies; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-26 22:50 UTC (permalink / raw)
To: Michel Dänzer; +Cc: dri-devel
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=UTF-8; format=flowed, Size: 2068 bytes --]
On Wed, 26 Oct 2011, Michel [ISO-8859-1] Dänzer wrote:
>
> Does it really need drm_global_mutex at all, as opposed to e.g.
> dev->struct_mutex?
>
>
It doesn't. Actually, it would probably suffice to have a mutex that is
one-per-CRTC, because vlbank waits on different CRTCs don't step on each
other, but I was going for a conservative fix. I tried to avoid
having to hunt around in other sections of code for lines that possibly
assume that a global lock is held while touching vblank counter structures
(one of concerns raised by Daniel which is a non-issue because I didn't
change the mutex being used).
The "urgent" part of this patch is to make sure we don't go to sleep while
holding the mutex. In my response to Daniel, I sent an example of a simple
userland application that can hang up the whole windowing system. That's a
big deal for which we have to get the fix in quickly.
After that we can follow up with more polishing (e.g. use per-CRTC mutex,
review and potentially fix other parts of the code that deal with vblank
and maybe assume global mutex protection).
> I agree with Daniel's sentiment on this. AFAICT add_wait_queue()
> protects against concurrent access to the wait queue, so I think it
> would be better to just drop the mutex explicitly before calling
> DRM_WAIT_ON.
>
The problem is that if I release the mutex just before calling
DRM_WAIT_ON, the task can be scheduled away right at that point.
Then another task can go all the way into the DRM_WAIT_ON and enter
the queue. Then the first task wakes up and enters the queue.
Now the last task in the queue is *not* the task that updated
the dev->last_vblank_wait[crtc] value last.
If you or someone who knows better than me can tell me that this potential
reordering is guaranteed harmless, I'll gladly drop the mutex earlier and
then I no longer need the DRM_WAIT_ON_LOCKED macro.
However, if the preservation of the order is actually important, then it
will be an ugly race condition to track later.
-- Ilija
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-26 22:33 ` Ilija Hadzic
@ 2011-10-27 10:43 ` Daniel Vetter
2011-10-27 14:10 ` Alan Coopersmith
2011-10-28 3:19 ` Ilija Hadzic
0 siblings, 2 replies; 25+ messages in thread
From: Daniel Vetter @ 2011-10-27 10:43 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: dri-devel
On Wed, Oct 26, 2011 at 05:33:24PM -0500, Ilija Hadzic wrote:
> Is DRM support for other OS-es officially dead ? I know it's not in
> best shape on BSD and friends, but I am not sure if I should make it
> worse (or inconsistent with the current code shape and form).
I think the idea of sharing kernel drm code is pretty much dead, yeah.
> main()
> {
> int fd;
> drmVBlank vbl;
>
> fd = open("/dev/dri/card0", 0);
> printf("fd=%d\n", fd);
>
> while(1) {
> vbl.request.type = DRM_VBLANK_RELATIVE ;
> vbl.request.sequence = 60;
> printf("waiting on vblank\n");
> ioctl(fd, DRM_IOCTL_WAIT_VBLANK, &vbl);
> }
> }
>
> Then start glxgears (on CRTC-0) and watch the hell break loose. The
> hostile code will cause any OpenGL application to hang for a full
> second before it can even enter the vblank_wait. Now because all
> vblank waits go through a common function in DDX, the whole
> windowing system will consequently go into a crawl. Run it with
> Compiz and things are even worse (the whole windowing system goes
> "stupid" as soon as the hostile application is started).
This is really just horrible.
>
>
> >>+ add_wait_queue(&(queue), &entry); \
> >>+ mutex_unlock(&drm_global_mutex); \
> >
> >Hiding a lock-dropping in this fashion is horrible.
>
> I explained above why am I have to release the lock inside the
> macro. That's equivalent to what you can find in the userland in
> POSIX threads, like I said. So that's not bad.
I disagree - this is just the blk magic reinvented. Also I don't like to
rush locking changes, because the hard part is reviewing all the drivers,
and I prefer to do that just once for the real solution.
Also explaining locking changes (especially in drm) starting with
"assuming the old code is correct, this is safe" isn't great. Because your
changes might simply enlarge the already existing race.
On a quick glance
- drm_vblank functions call by wait_vblank seem to do correct locking
already using various spinlocks and atomic ops.
- linux waitqueues have their own locking
- dev->last_vblank_wait is only used for a procfs file. I don't think it
makes much sense given that we can have more than one vblank waiter
- there's no selective wakeup going on, all waiters get woken for every
vblank and call schedule again if it's not the right vblank
The only really hairy thing going on is racing modeset ioctls against
vblank waiters. But modeset is protected by the dev->mode_config.mutex
and hence already races against wait_vblank with the current code, so I'm
slightly inclined to ignore this for the moment. Would be great if you
coudl check that in-depth, too.
So I think the right thing to do is
- Kill dev->last_vblank_wait (in a prep patch).
- Imo we also have a few too many atomic ops going on, e.g.
dev->vblank_refcount looks like it's atomic only because or the procfs
file, so maybe kill that procfs file entirely.
- Audit the vblank locking, maybe resulting in a patch with updated
comments, if there's anything misleading or tricky going on. And it's
always good when the locking of structe members is explained in the
header file ...
- Drop the mutex locking because it's not needed.
While locking at the code I also noticed that wait_vblank calls
drm_vblank_get, but not always drm_vblank_put. The code is correct, the
missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
the patch to move that around to not trip up reviewers the next time
around?
Yours, Daniel
-
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-27 10:43 ` Daniel Vetter
@ 2011-10-27 14:10 ` Alan Coopersmith
2011-10-27 14:20 ` Ilija Hadzic
2011-10-28 3:19 ` Ilija Hadzic
1 sibling, 1 reply; 25+ messages in thread
From: Alan Coopersmith @ 2011-10-27 14:10 UTC (permalink / raw)
To: Daniel Vetter; +Cc: dri-devel
On 10/27/11 03:43, Daniel Vetter wrote:
> On Wed, Oct 26, 2011 at 05:33:24PM -0500, Ilija Hadzic wrote:
>> Is DRM support for other OS-es officially dead ? I know it's not in
>> best shape on BSD and friends, but I am not sure if I should make it
>> worse (or inconsistent with the current code shape and form).
>
> I think the idea of sharing kernel drm code is pretty much dead, yeah.
We are still trying to keep it alive, despite what some may think.
--
-Alan Coopersmith- alan.coopersmith@oracle.com
Oracle Solaris Platform Engineering: X Window System
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-27 14:10 ` Alan Coopersmith
@ 2011-10-27 14:20 ` Ilija Hadzic
0 siblings, 0 replies; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-27 14:20 UTC (permalink / raw)
To: Alan Coopersmith; +Cc: dri-devel
On Thu, 27 Oct 2011, Alan Coopersmith wrote:
>>
>> I think the idea of sharing kernel drm code is pretty much dead, yeah.
>
> We are still trying to keep it alive, despite what some may think.
>
In the context of this patch (in progress), it's probably a moot topic
because per comments that Daniel sent, most of the locks in
drm_vblank_wait will become unecessary after I rework the patch
and I won't need to introduce (new) DRM_WAIT_ON_LOCKED any more.
I will, however, *not* convert existing DRM_WAIT_ON to a Linux-centric
function, so I will not be the one to kill (or contribute to the killing
of) the portability of that file. That I leave to someone else ;-).
P.S. If the "fight" starts I will side with the "keep the portability
alive" camp. I have "emotional" ties with other (less widely used)
operating systems. ;-)
-- Ilija
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-27 10:43 ` Daniel Vetter
2011-10-27 14:10 ` Alan Coopersmith
@ 2011-10-28 3:19 ` Ilija Hadzic
2011-10-28 4:36 ` Ilija Hadzic
` (2 more replies)
1 sibling, 3 replies; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-28 3:19 UTC (permalink / raw)
To: Daniel Vetter; +Cc: dri-devel
On Thu, 27 Oct 2011, Daniel Vetter wrote:
>
> On a quick glance
> - drm_vblank functions call by wait_vblank seem to do correct locking
> already using various spinlocks and atomic ops.
> - linux waitqueues have their own locking
> - dev->last_vblank_wait is only used for a procfs file. I don't think it
> makes much sense given that we can have more than one vblank waiter
> - there's no selective wakeup going on, all waiters get woken for every
> vblank and call schedule again if it's not the right vblank
>
I concur.
> The only really hairy thing going on is racing modeset ioctls against
> vblank waiters. But modeset is protected by the dev->mode_config.mutex
> and hence already races against wait_vblank with the current code, so I'm
> slightly inclined to ignore this for the moment. Would be great if you
> coudl check that in-depth, too.
>
Will leave this for some other patch at some other time; the critical path
is to fix the hang/crawl that I explained in my previous note.
> So I think the right thing to do is
> - Kill dev->last_vblank_wait (in a prep patch).
Agreed. Also drm_vblank_info function can go away
> - Imo we also have a few too many atomic ops going on, e.g.
> dev->vblank_refcount looks like it's atomic only because or the procfs
> file, so maybe kill that procfs file entirely.
I am not sure about that. dev->vblank_refcount looks critical to me: it is
incremented through drm_vblank_get which is called from several different
contexts: page-flip functions in several drivers (which can run in the
context of multiple user processes), **_pm_** stuff in radeon driver
(which looks like it runs in the context of kernel worker thread). Mutexes
used at all these different places are not quite consistent. If anything,
as the result of a later audit, some other mutexes may be rendered
unecessary, but definitely, I would keep vblank_refcount atomic.
> - Audit the vblank locking, maybe resulting in a patch with updated
> comments, if there's anything misleading or tricky going on. And it's
> always good when the locking of structe members is explained in the
> header file ...
I'll add comments that I feel make sense for this set of patches (if
anything). Full audit, should come later at a slower pace. There are quite
a few mutexes and locks many of which are overlapping in function and some
are spurious. It will take more than just myself to untangle this know.
> - Drop the mutex locking because it's not needed.
>
Agreed. Will drop.
> While locking at the code I also noticed that wait_vblank calls
> drm_vblank_get, but not always drm_vblank_put. The code is correct, the
> missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
> the patch to move that around to not trip up reviewers the next time
> around?
>
Actually, there is something fishy here. Look at the 'else' branch under
the 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at the end of
the drm_queue_vblank_event. It doesn't have the 'drm_vblank_put'. It looks
wrong to me, but I need to first convince myself that there is not some
other obscure place where drm_vblank_put is accounted for. If that branch
is really missing a drm_vblank_put, then it will be easy pull the
drm_vblank_put out. Otherwise, it will be another knot to untangle.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-28 3:19 ` Ilija Hadzic
@ 2011-10-28 4:36 ` Ilija Hadzic
2011-10-28 6:59 ` Michel Dänzer
2011-10-28 9:30 ` Daniel Vetter
2 siblings, 0 replies; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-28 4:36 UTC (permalink / raw)
To: Daniel Vetter; +Cc: dri-devel
On Thu, 27 Oct 2011, Ilija Hadzic wrote:
>
>> While locking at the code I also noticed that wait_vblank calls
>> drm_vblank_get, but not always drm_vblank_put. The code is correct, the
>> missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
>> the patch to move that around to not trip up reviewers the next time
>> around?
>>
>
> Actually, there is something fishy here. Look at the 'else' branch under the
> 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at the end of the
> drm_queue_vblank_event. It doesn't have the 'drm_vblank_put'. It looks wrong
> to me, but I need to first convince myself that there is not some other
> obscure place where drm_vblank_put is accounted for. If that branch is really
> missing a drm_vblank_put, then it will be easy pull the drm_vblank_put out.
> Otherwise, it will be another knot to untangle.
>
>
OK, I checked this. Although the code is a little hard to read, it is done
the way it is with the reason. Whenever the drm_queue_vblank_event and
that 'else' branch is caught, the drm_vblank_put should *not* be called.
The reason is that the relevant vblank has not come in yet, so the
reference must remain up so that the vblank interrupts are not disabled
until the event occurs.
The seemingly missing drm_vblank_put will be accounted for in
drm_handle_vblank_events.
So I should not move anything around. If anything, this should be
annotated with comments to explain what's going on.
-- Ilija
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-28 3:19 ` Ilija Hadzic
2011-10-28 4:36 ` Ilija Hadzic
@ 2011-10-28 6:59 ` Michel Dänzer
2011-10-28 9:20 ` Daniel Vetter
2011-10-28 9:30 ` Daniel Vetter
2 siblings, 1 reply; 25+ messages in thread
From: Michel Dänzer @ 2011-10-28 6:59 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: dri-devel
On Don, 2011-10-27 at 22:19 -0500, Ilija Hadzic wrote:
> On Thu, 27 Oct 2011, Daniel Vetter wrote:
>
> > So I think the right thing to do is
> > - Kill dev->last_vblank_wait (in a prep patch).
>
> Agreed. Also drm_vblank_info function can go away
Actually, I was rather going to submit a patch to hook it up again —
AFAICT it was unhooked without any justification. It could be useful for
debugging vblank related hangs. Any issues with it, such as
last_vblank_wait not being guaranteed to really be the last one, can
always be improved later on.
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-28 6:59 ` Michel Dänzer
@ 2011-10-28 9:20 ` Daniel Vetter
2011-10-28 12:10 ` Ilija Hadzic
0 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2011-10-28 9:20 UTC (permalink / raw)
To: Michel Dänzer; +Cc: dri-devel
On Fri, Oct 28, 2011 at 08:59:04AM +0200, Michel Dänzer wrote:
> On Don, 2011-10-27 at 22:19 -0500, Ilija Hadzic wrote:
> > On Thu, 27 Oct 2011, Daniel Vetter wrote:
> >
> > > So I think the right thing to do is
> > > - Kill dev->last_vblank_wait (in a prep patch).
> >
> > Agreed. Also drm_vblank_info function can go away
>
> Actually, I was rather going to submit a patch to hook it up again —
> AFAICT it was unhooked without any justification. It could be useful for
> debugging vblank related hangs. Any issues with it, such as
> last_vblank_wait not being guaranteed to really be the last one, can
> always be improved later on.
I've thought a bit about the usefulness of it for debugging before
proposing to kill it and I think it can die: It's only really useful for a
complete hangs, if we have an issue we just missing a wakeup somewhere,
that's not gonna catch things. Hence I think something that allows you to
watch things while it's running is much better, i.e. either a drm debug
prinkt or a tracecall.
But if you're firmly attached to that debug file it should be pretty easy
to shove that under the protection of one of the vblank spinlocks.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-28 3:19 ` Ilija Hadzic
2011-10-28 4:36 ` Ilija Hadzic
2011-10-28 6:59 ` Michel Dänzer
@ 2011-10-28 9:30 ` Daniel Vetter
2 siblings, 0 replies; 25+ messages in thread
From: Daniel Vetter @ 2011-10-28 9:30 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: dri-devel
On Thu, Oct 27, 2011 at 10:19:39PM -0500, Ilija Hadzic wrote:
> On Thu, 27 Oct 2011, Daniel Vetter wrote:
> >The only really hairy thing going on is racing modeset ioctls against
> >vblank waiters. But modeset is protected by the dev->mode_config.mutex
> >and hence already races against wait_vblank with the current code, so I'm
> >slightly inclined to ignore this for the moment. Would be great if you
> >coudl check that in-depth, too.
> >
>
> Will leave this for some other patch at some other time; the
> critical path is to fix the hang/crawl that I explained in my
> previous note.
Agreed, one thing at a time. It's complicated enough as is.
> >So I think the right thing to do is
> >- Kill dev->last_vblank_wait (in a prep patch).
>
> Agreed. Also drm_vblank_info function can go away
>
> >- Imo we also have a few too many atomic ops going on, e.g.
> > dev->vblank_refcount looks like it's atomic only because or the procfs
> > file, so maybe kill that procfs file entirely.
>
> I am not sure about that. dev->vblank_refcount looks critical to me:
> it is incremented through drm_vblank_get which is called from
> several different contexts: page-flip functions in several drivers
> (which can run in the context of multiple user processes), **_pm_**
> stuff in radeon driver (which looks like it runs in the context of
> kernel worker thread). Mutexes used at all these different places
> are not quite consistent. If anything, as the result of a later
> audit, some other mutexes may be rendered unecessary, but
> definitely, I would keep vblank_refcount atomic.
I've re-read the code a bit and in _get it's protected by dev->vbl_lock,
but not so in _put (because we issue a timer call to actually switch it
off). I think we could just shove it under the protection of dev->vbl_lock
completely. Another fuzzzy area is the relation between dev->vbl_lock and
dev->vblank_time_lock. The latter looks a bit rendundant.
> >- Audit the vblank locking, maybe resulting in a patch with updated
> > comments, if there's anything misleading or tricky going on. And it's
> > always good when the locking of structe members is explained in the
> > header file ...
>
> I'll add comments that I feel make sense for this set of patches (if
> anything). Full audit, should come later at a slower pace. There are
> quite a few mutexes and locks many of which are overlapping in
> function and some are spurious. It will take more than just myself
> to untangle this know.
Yeah, agreed. Let's first drop the mutex around this and untangle the
spinlock/atomic mess in a second step. This is too hairy.
> >- Drop the mutex locking because it's not needed.
> >
>
> Agreed. Will drop.
>
> >While locking at the code I also noticed that wait_vblank calls
> >drm_vblank_get, but not always drm_vblank_put. The code is correct, the
> >missing vblank_put is hidden in drm_queue_vblank_event. Can you also write
> >the patch to move that around to not trip up reviewers the next time
> >around?
> >
>
> Actually, there is something fishy here. Look at the 'else' branch
> under the 'if ((seq - vblwait->request.sequence) <= (1 << 23))' at
> the end of the drm_queue_vblank_event. It doesn't have the
> 'drm_vblank_put'. It looks wrong to me, but I need to first convince
> myself that there is not some other obscure place where
> drm_vblank_put is accounted for. If that branch is really missing a
> drm_vblank_put, then it will be easy pull the drm_vblank_put out.
> Otherwise, it will be another knot to untangle.
I've re-read the code and agree with your follow-up analysis.
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-28 9:20 ` Daniel Vetter
@ 2011-10-28 12:10 ` Ilija Hadzic
2011-10-28 14:51 ` Daniel Vetter
0 siblings, 1 reply; 25+ messages in thread
From: Ilija Hadzic @ 2011-10-28 12:10 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Michel Dänzer, dri-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1806 bytes --]
On Fri, 28 Oct 2011, Daniel Vetter wrote:
> On Fri, Oct 28, 2011 at 08:59:04AM +0200, Michel Dänzer wrote:
>> On Don, 2011-10-27 at 22:19 -0500, Ilija Hadzic wrote:
>>> On Thu, 27 Oct 2011, Daniel Vetter wrote:
>>>
>>>> So I think the right thing to do is
>>>> - Kill dev->last_vblank_wait (in a prep patch).
>>>
>>> Agreed. Also drm_vblank_info function can go away
>>
>> Actually, I was rather going to submit a patch to hook it up again —
>> AFAICT it was unhooked without any justification. It could be useful for
>> debugging vblank related hangs. Any issues with it, such as
>> last_vblank_wait not being guaranteed to really be the last one, can
>> always be improved later on.
>
> I've thought a bit about the usefulness of it for debugging before
> proposing to kill it and I think it can die: It's only really useful for a
> complete hangs, if we have an issue we just missing a wakeup somewhere,
> that's not gonna catch things. Hence I think something that allows you to
> watch things while it's running is much better, i.e. either a drm debug
> prinkt or a tracecall.
>
> But if you're firmly attached to that debug file it should be pretty easy
> to shove that under the protection of one of the vblank spinlocks.
I'll keep it then and figure out the best mutex/spinlock to use. It can be
anything that exists on one-per-CRTC basis (vblank waits on different CTCs
are not contending). The critical section is from that switch in which
vblwait->request.sequence is incremented until it is assigned to
dev->last_vblank_wait[crtc] (and we are only protecting that section
against itself, executed in contexts of different PIDs).
I guess we settled now on this patch (other comments will be
addressed in a different set of patches).
-- Ilija
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-28 12:10 ` Ilija Hadzic
@ 2011-10-28 14:51 ` Daniel Vetter
0 siblings, 0 replies; 25+ messages in thread
From: Daniel Vetter @ 2011-10-28 14:51 UTC (permalink / raw)
To: Ilija Hadzic; +Cc: Michel Dänzer, dri-devel
On Fri, Oct 28, 2011 at 07:10:51AM -0500, Ilija Hadzic wrote:
> I'll keep it then and figure out the best mutex/spinlock to use. It
> can be anything that exists on one-per-CRTC basis (vblank waits on
> different CTCs are not contending). The critical section is from
> that switch in which vblwait->request.sequence is incremented until
> it is assigned to dev->last_vblank_wait[crtc] (and we are only
> protecting that section against itself, executed in contexts of
> different PIDs).
>
> I guess we settled now on this patch (other comments will be
> addressed in a different set of patches).
If you settle on keeping ->last_vblank_wait I think the easiest is to wrap
it with the dev->vbl_lock spinlock everywhere.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
[not found] <mailman.37.1319803862.23620.dri-devel@lists.freedesktop.org>
@ 2011-10-28 18:15 ` Mario Kleiner
2011-10-28 19:15 ` Daniel Vetter
0 siblings, 1 reply; 25+ messages in thread
From: Mario Kleiner @ 2011-10-28 18:15 UTC (permalink / raw)
To: dri-devel; +Cc: Daniel Vetter, Mario Kleiner
> Message: 5
> Date: Fri, 28 Oct 2011 11:30:38 +0200
> From: Daniel Vetter <daniel@ffwll.ch>
> Subject: Re: [PATCH 3/3] drm: do not sleep on vblank while holding a
> mutex
> To: Ilija Hadzic <ihadzic@research.bell-labs.com>
> Cc: dri-devel@lists.freedesktop.org
> Message-ID: <20111028093038.GB2919@phenom.ffwll.local>
> Content-Type: text/plain; charset=us-ascii
>>
>>> - Imo we also have a few too many atomic ops going on, e.g.
>>> dev->vblank_refcount looks like it's atomic only because or the
>>> procfs
>>> file, so maybe kill that procfs file entirely.
>>
>> I am not sure about that. dev->vblank_refcount looks critical to me:
>> it is incremented through drm_vblank_get which is called from
>> several different contexts: page-flip functions in several drivers
>> (which can run in the context of multiple user processes), **_pm_**
>> stuff in radeon driver (which looks like it runs in the context of
>> kernel worker thread). Mutexes used at all these different places
>> are not quite consistent. If anything, as the result of a later
>> audit, some other mutexes may be rendered unecessary, but
>> definitely, I would keep vblank_refcount atomic.
>
Hi,
be careful with vblank_refcount. I think it probably should stay
atomic. The drm_vblank_put() is often used in interrupt handlers of
the kms drivers where you don't want to uneccessary wait on a lock,
but be as quick as possible.
> I've re-read the code a bit and in _get it's protected by dev-
> >vbl_lock,
> but not so in _put (because we issue a timer call to actually
> switch it
> off). I think we could just shove it under the protection of dev-
> >vbl_lock
> completely. Another fuzzzy area is the relation between dev-
> >vbl_lock and
> dev->vblank_time_lock. The latter looks a bit rendundant.
>
The vblank_time_lock serves a different purpose from vbl_lock. It is
used in the vblank irq handler drm_handle_vblank() and in the vblank
irq on/off functions to protect the vblank timestamping and counting
against races between the irq handler and the on/off code, and some
funny races between the cpu reinitializing vblank counts and
timestamps in non-irq path and the gpu firing vblank interrupts and/
or updating its hardware vblank counter at the "wrong" moment.
vblank_time_lock basically blocks/delays gpu vblank irq processing
during such problematic periods. Timing there is criticial for
reliable vblank timestamping, kms page flipping and artifact free
dynamic gpu power-management. It should be locked as seldomly and
held as short as possible. I initially tried to get away only with
vbl_lock, but there were uses of vbl_lock that looked as if using it
in the interrupt handler as well could cause long delays in irq
processing.
thanks,
-mario
*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany
e-mail: mario.kleiner@tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax: +49 (0)7071/601-616
www: http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-28 18:15 ` Mario Kleiner
@ 2011-10-28 19:15 ` Daniel Vetter
2011-10-28 19:53 ` Mario Kleiner
0 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2011-10-28 19:15 UTC (permalink / raw)
To: Mario Kleiner; +Cc: Daniel Vetter, dri-devel
On Fri, Oct 28, 2011 at 08:15:11PM +0200, Mario Kleiner wrote:
> be careful with vblank_refcount. I think it probably should stay
> atomic. The drm_vblank_put() is often used in interrupt handlers of
> the kms drivers where you don't want to uneccessary wait on a lock,
> but be as quick as possible.
I don't think that's a real issue - if we have lock contention anywhere
with the current code, we already have a problem - and it looks like we
have ;-) And the spinlocks are all already switching off irqs locally, so
we should already strive for the shortest possible lock hold times.
[snip]
> The vblank_time_lock serves a different purpose from vbl_lock. It is
> used in the vblank irq handler drm_handle_vblank() and in the vblank
> irq on/off functions to protect the vblank timestamping and counting
> against races between the irq handler and the on/off code, and some
> funny races between the cpu reinitializing vblank counts and
> timestamps in non-irq path and the gpu firing vblank interrupts and/
> or updating its hardware vblank counter at the "wrong" moment.
> vblank_time_lock basically blocks/delays gpu vblank irq processing
> during such problematic periods. Timing there is criticial for
> reliable vblank timestamping, kms page flipping and artifact free
> dynamic gpu power-management. It should be locked as seldomly and
> held as short as possible. I initially tried to get away only with
> vbl_lock, but there were uses of vbl_lock that looked as if using it
> in the interrupt handler as well could cause long delays in irq
> processing.
I've gone through the code, and in almost all case where the vbl_lock is
hold for a bit more than just a few load and stores we are immediately
taking the vblank_time_lock and hold it almost as long. Which is why I
think this is redundant. The only exeption is the irq uninstall code,
which isn't really a fast path anyway.
Also, all these paths with longer lock-hold times are in the initial
vblank_get/final vblank_put. And because we delay the vblank disabling by
a full second, we should never hit that in steady state.
But I think you're one of the most timing-sensitive users of this vblank
stuff, so when I get around to clean this up I'll certainly put you on cc
so you can test for any adverse effects. Hopefully ripping out unecessary
atomic ops makes stuff faster, not slower.
-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/3] drm: do not sleep on vblank while holding a mutex
2011-10-28 19:15 ` Daniel Vetter
@ 2011-10-28 19:53 ` Mario Kleiner
0 siblings, 0 replies; 25+ messages in thread
From: Mario Kleiner @ 2011-10-28 19:53 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Mario Kleiner, dri-devel
On Oct 28, 2011, at 9:15 PM, Daniel Vetter wrote:
> On Fri, Oct 28, 2011 at 08:15:11PM +0200, Mario Kleiner wrote:
>> be careful with vblank_refcount. I think it probably should stay
>> atomic. The drm_vblank_put() is often used in interrupt handlers of
>> the kms drivers where you don't want to uneccessary wait on a lock,
>> but be as quick as possible.
>
> I don't think that's a real issue - if we have lock contention
> anywhere
> with the current code, we already have a problem - and it looks
> like we
> have ;-) And the spinlocks are all already switching off irqs
> locally, so
> we should already strive for the shortest possible lock hold times.
>
> [snip]
>
>> The vblank_time_lock serves a different purpose from vbl_lock. It is
>> used in the vblank irq handler drm_handle_vblank() and in the vblank
>> irq on/off functions to protect the vblank timestamping and counting
>> against races between the irq handler and the on/off code, and some
>> funny races between the cpu reinitializing vblank counts and
>> timestamps in non-irq path and the gpu firing vblank interrupts and/
>> or updating its hardware vblank counter at the "wrong" moment.
>> vblank_time_lock basically blocks/delays gpu vblank irq processing
>> during such problematic periods. Timing there is criticial for
>> reliable vblank timestamping, kms page flipping and artifact free
>> dynamic gpu power-management. It should be locked as seldomly and
>> held as short as possible. I initially tried to get away only with
>> vbl_lock, but there were uses of vbl_lock that looked as if using it
>> in the interrupt handler as well could cause long delays in irq
>> processing.
>
> I've gone through the code, and in almost all case where the
> vbl_lock is
> hold for a bit more than just a few load and stores we are immediately
> taking the vblank_time_lock and hold it almost as long. Which is why I
> think this is redundant. The only exeption is the irq uninstall code,
> which isn't really a fast path anyway.
Ok, if that is the case in the current code, fair enough. We would
have to stress the importance of *very low* vbl_lock hold times in
the documentation of the code if we want to get rid of the
vblank_time_lock. The distinctive lock for that purpose is also meant
as a marker that you should try twice as hard to be quick while you
hold that lock than with other spinlocks. A few dozen usecs extra
delay there could make a difference between met and missed pageflip
deadlines and well working or not well working dynamic gpu
reclocking, e.g., on radeon, where this stuff is triggered by vblank
irq's and quite timing sensitive.
>
> Also, all these paths with longer lock-hold times are in the initial
> vblank_get/final vblank_put. And because we delay the vblank
> disabling by
> a full second, we should never hit that in steady state.
5 seconds by default, i think. We wanted to lower that to something
like < 100 msecs at some point, but there's still some work to do to
remove some potential race with the gpu in the on/off path which
could cause lost/wrong vblank counts, and the nouveau kms driver
doesn't implement hardware vblank counter at all atm, which makes
small values on nouveau unsafe atm. With higher on/off frequencies
the impact of too long lock hold times would become worse.
> But I think you're one of the most timing-sensitive users of this
> vblank
> stuff, so when I get around to clean this up I'll certainly put you
> on cc
> so you can test for any adverse effects. Hopefully ripping out
> unecessary
> atomic ops makes stuff faster, not slower.
Yes, the current timing/timestamp precision and some of the related
dri2 features of the free graphics stack is a very strong selling
point to lure quite many timing sensitive users of my userspace
toolkit away from other proprietary solutions.
Ok, cc'ing me sounds like a plan.
Thanks,
-mario
> -Daniel
> --
> Daniel Vetter
> Mail: daniel@ffwll.ch
> Mobile: +41 (0)79 365 57 48
*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany
e-mail: mario.kleiner@tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax: +49 (0)7071/601-616
www: http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2011-10-28 19:53 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-26 2:20 drm: fix one flawed mutex grab and remove some spurious mutex grabs Ilija Hadzic
2011-10-26 2:20 ` [PATCH 1/3] drm: no need to hold global mutex for static data Ilija Hadzic
2011-10-26 7:44 ` Daniel Vetter
2011-10-26 2:20 ` [PATCH 2/3] drm: make DRM_UNLOCKED ioctls with their own mutex Ilija Hadzic
2011-10-26 7:47 ` Daniel Vetter
2011-10-26 16:11 ` Ilija Hadzic
2011-10-26 20:01 ` Daniel Vetter
2011-10-26 2:20 ` [PATCH 3/3] drm: do not sleep on vblank while holding a mutex Ilija Hadzic
2011-10-26 7:54 ` Daniel Vetter
2011-10-26 22:33 ` Ilija Hadzic
2011-10-27 10:43 ` Daniel Vetter
2011-10-27 14:10 ` Alan Coopersmith
2011-10-27 14:20 ` Ilija Hadzic
2011-10-28 3:19 ` Ilija Hadzic
2011-10-28 4:36 ` Ilija Hadzic
2011-10-28 6:59 ` Michel Dänzer
2011-10-28 9:20 ` Daniel Vetter
2011-10-28 12:10 ` Ilija Hadzic
2011-10-28 14:51 ` Daniel Vetter
2011-10-28 9:30 ` Daniel Vetter
2011-10-26 8:02 ` Michel Dänzer
2011-10-26 22:50 ` Ilija Hadzic
[not found] <mailman.37.1319803862.23620.dri-devel@lists.freedesktop.org>
2011-10-28 18:15 ` Mario Kleiner
2011-10-28 19:15 ` Daniel Vetter
2011-10-28 19:53 ` Mario Kleiner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.