[PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops
@ 2012-11-05 13:31 Thomas Hellstrom
  2012-11-05 13:31 ` [PATCH 1/4] drm: Make hashtab rcu-safe Thomas Hellstrom
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:31 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel

A patch series for next that removes a substantial number of read-modify-write
operations from TTM command submission, in particular if TTM objects are used
to export objects to user-space. The only per-object atomic r-m-w operations
left during a typical execbuf call should be refcount up and down.

In-Reply-To: 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] drm: Make hashtab rcu-safe
  2012-11-05 13:31 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops Thomas Hellstrom
@ 2012-11-05 13:31 ` Thomas Hellstrom
  2012-11-05 13:31 ` [PATCH 2/4] kref: Implement kref_get_unless_zero Thomas Hellstrom
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:31 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

TTM base objects will be the first consumer.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/drm_hashtab.c |   18 +++++++-----------
 1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_hashtab.c b/drivers/gpu/drm/drm_hashtab.c
index c3745c4..5729e39 100644
--- a/drivers/gpu/drm/drm_hashtab.c
+++ b/drivers/gpu/drm/drm_hashtab.c
@@ -67,10 +67,8 @@ void drm_ht_verbose_list(struct drm_open_hash *ht, unsigned long key)
 	hashed_key = hash_long(key, ht->order);
 	DRM_DEBUG("Key is 0x%08lx, Hashed key is 0x%08x\n", key, hashed_key);
 	h_list = &ht->table[hashed_key];
-	hlist_for_each(list, h_list) {
-		entry = hlist_entry(list, struct drm_hash_item, head);
+	hlist_for_each_entry_rcu(entry, list, h_list, head)
 		DRM_DEBUG("count %d, key: 0x%08lx\n", count++, entry->key);
-	}
 }
 
 static struct hlist_node *drm_ht_find_key(struct drm_open_hash *ht,
@@ -83,8 +81,7 @@ static struct hlist_node *drm_ht_find_key(struct drm_open_hash *ht,
 
 	hashed_key = hash_long(key, ht->order);
 	h_list = &ht->table[hashed_key];
-	hlist_for_each(list, h_list) {
-		entry = hlist_entry(list, struct drm_hash_item, head);
+	hlist_for_each_entry_rcu(entry, list, h_list, head) {
 		if (entry->key == key)
 			return list;
 		if (entry->key > key)
@@ -105,8 +102,7 @@ int drm_ht_insert_item(struct drm_open_hash *ht, struct drm_hash_item *item)
 	hashed_key = hash_long(key, ht->order);
 	h_list = &ht->table[hashed_key];
 	parent = NULL;
-	hlist_for_each(list, h_list) {
-		entry = hlist_entry(list, struct drm_hash_item, head);
+	hlist_for_each_entry_rcu(entry, list, h_list, head) {
 		if (entry->key == key)
 			return -EINVAL;
 		if (entry->key > key)
@@ -114,9 +110,9 @@ int drm_ht_insert_item(struct drm_open_hash *ht, struct drm_hash_item *item)
 		parent = list;
 	}
 	if (parent) {
-		hlist_add_after(parent, &item->head);
+		hlist_add_after_rcu(parent, &item->head);
 	} else {
-		hlist_add_head(&item->head, h_list);
+		hlist_add_head_rcu(&item->head, h_list);
 	}
 	return 0;
 }
@@ -171,7 +167,7 @@ int drm_ht_remove_key(struct drm_open_hash *ht, unsigned long key)
 
 	list = drm_ht_find_key(ht, key);
 	if (list) {
-		hlist_del_init(list);
+		hlist_del_init_rcu(list);
 		return 0;
 	}
 	return -EINVAL;
@@ -179,7 +175,7 @@ int drm_ht_remove_key(struct drm_open_hash *ht, unsigned long key)
 
 int drm_ht_remove_item(struct drm_open_hash *ht, struct drm_hash_item *item)
 {
-	hlist_del_init(&item->head);
+	hlist_del_init_rcu(&item->head);
 	return 0;
 }
 EXPORT_SYMBOL(drm_ht_remove_item);
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] kref: Implement kref_get_unless_zero
  2012-11-05 13:31 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops Thomas Hellstrom
  2012-11-05 13:31 ` [PATCH 1/4] drm: Make hashtab rcu-safe Thomas Hellstrom
@ 2012-11-05 13:31 ` Thomas Hellstrom
  2012-11-05 13:31 ` [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups Thomas Hellstrom
  2012-11-05 13:31 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom
  3 siblings, 0 replies; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:31 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

This function is intended to simplify locking around refcounting for
objects that can be looked up from a lookup structure, and which are
removed from that lookup structure in the object destructor.
Operations on such objects require at least a read lock around
lookup + kref_get, and a write lock around kref_put + remove from lookup
structure. Furthermore, RCU implementations become extremely tricky.
With a lookup followed by a kref_get_unless_zero *with return value check*
locking in the kref_put path can be deferred to the actual removal from
the lookup structure and RCU lookups become trivial.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 include/linux/kref.h |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/kref.h b/include/linux/kref.h
index 65af688..fd16042 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -111,4 +111,25 @@ static inline int kref_put_mutex(struct kref *kref,
 	}
 	return 0;
 }
+
+/**
+ * kref_get_unless_zero - Increment refcount for object unless it is zero.
+ * @kref: object.
+ *
+ * Return 0 if the increment succeeded. Otherwise return non-zero.
+
+ * This function is intended to simplify locking around refcounting for
+ * objects that can be looked up from a lookup structure, and which are
+ * removed from that lookup structure in the object destructor.
+ * Operations on such objects require at least a read lock around
+ * lookup + kref_get, and a write lock around kref_put + remove from lookup
+ * structure. Furthermore, RCU implementations become extremely tricky.
+ * With a lookup followed by a kref_get_unless_zero *with return value check*
+ * locking in the kref_put path can be deferred to the actual removal from
+ * the lookup structure and RCU lookups become trivial.
+ */
+static inline int __must_check kref_get_unless_zero(struct kref *kref)
+{
+	return !atomic_add_unless(&kref->refcount, 1, 0);
+}
 #endif /* _KREF_H_ */
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups
  2012-11-05 13:31 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops Thomas Hellstrom
  2012-11-05 13:31 ` [PATCH 1/4] drm: Make hashtab rcu-safe Thomas Hellstrom
  2012-11-05 13:31 ` [PATCH 2/4] kref: Implement kref_get_unless_zero Thomas Hellstrom
@ 2012-11-05 13:31 ` Thomas Hellstrom
  2012-11-05 13:34   ` Thomas Hellstrom
  2012-11-05 13:31 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom
  3 siblings, 1 reply; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:31 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

The mostly used lookup+get put+potential_destroy path of TTM objects
is converted to use RCU locks. This will substantially decrease the amount
of locked bus cycles during normal operation.
Since we use kfree_rcu to free the objects, no rcu synchronization is needed
at module unload time.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_object.c         |   30 +++++++++++-------------------
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |    8 ++++----
 include/drm/ttm/ttm_object.h             |    4 ++++
 include/linux/kref.h                     |    2 +-
 4 files changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_object.c b/drivers/gpu/drm/ttm/ttm_object.c
index c785787..9d7f674 100644
--- a/drivers/gpu/drm/ttm/ttm_object.c
+++ b/drivers/gpu/drm/ttm/ttm_object.c
@@ -80,7 +80,7 @@ struct ttm_object_file {
  */
 
 struct ttm_object_device {
-	rwlock_t object_lock;
+	spinlock_t object_lock;
 	struct drm_open_hash object_hash;
 	atomic_t object_count;
 	struct ttm_mem_global *mem_glob;
@@ -157,12 +157,12 @@ int ttm_base_object_init(struct ttm_object_file *tfile,
 	base->refcount_release = refcount_release;
 	base->ref_obj_release = ref_obj_release;
 	base->object_type = object_type;
-	write_lock(&tdev->object_lock);
+	spin_lock(&tdev->object_lock);
 	kref_init(&base->refcount);
 	ret = drm_ht_just_insert_please(&tdev->object_hash,
 					&base->hash,
 					(unsigned long)base, 31, 0, 0);
-	write_unlock(&tdev->object_lock);
+	spin_unlock(&tdev->object_lock);
 	if (unlikely(ret != 0))
 		goto out_err0;
 
@@ -186,30 +186,22 @@ static void ttm_release_base(struct kref *kref)
 	    container_of(kref, struct ttm_base_object, refcount);
 	struct ttm_object_device *tdev = base->tfile->tdev;
 
+	spin_lock(&tdev->object_lock);
 	(void)drm_ht_remove_item(&tdev->object_hash, &base->hash);
-	write_unlock(&tdev->object_lock);
+	spin_unlock(&tdev->object_lock);
 	if (base->refcount_release) {
 		ttm_object_file_unref(&base->tfile);
 		base->refcount_release(&base);
 	}
-	write_lock(&tdev->object_lock);
 }
 
 void ttm_base_object_unref(struct ttm_base_object **p_base)
 {
 	struct ttm_base_object *base = *p_base;
-	struct ttm_object_device *tdev = base->tfile->tdev;
 
 	*p_base = NULL;
 
-	/*
-	 * Need to take the lock here to avoid racing with
-	 * users trying to look up the object.
-	 */
-
-	write_lock(&tdev->object_lock);
 	kref_put(&base->refcount, ttm_release_base);
-	write_unlock(&tdev->object_lock);
 }
 EXPORT_SYMBOL(ttm_base_object_unref);
 
@@ -221,14 +213,14 @@ struct ttm_base_object *ttm_base_object_lookup(struct ttm_object_file *tfile,
 	struct drm_hash_item *hash;
 	int ret;
 
-	read_lock(&tdev->object_lock);
+	rcu_read_lock();
 	ret = drm_ht_find_item(&tdev->object_hash, key, &hash);
 
 	if (likely(ret == 0)) {
 		base = drm_hash_entry(hash, struct ttm_base_object, hash);
-		kref_get(&base->refcount);
+		ret = kref_get_unless_zero(&base->refcount);
 	}
-	read_unlock(&tdev->object_lock);
+	rcu_read_unlock();
 
 	if (unlikely(ret != 0))
 		return NULL;
@@ -426,7 +418,7 @@ struct ttm_object_device *ttm_object_device_init(struct ttm_mem_global
 		return NULL;
 
 	tdev->mem_glob = mem_glob;
-	rwlock_init(&tdev->object_lock);
+	spin_lock_init(&tdev->object_lock);
 	atomic_set(&tdev->object_count, 0);
 	ret = drm_ht_create(&tdev->object_hash, hash_order);
 
@@ -444,9 +436,9 @@ void ttm_object_device_release(struct ttm_object_device **p_tdev)
 
 	*p_tdev = NULL;
 
-	write_lock(&tdev->object_lock);
+	spin_lock(&tdev->object_lock);
 	drm_ht_remove(&tdev->object_hash);
-	write_unlock(&tdev->object_lock);
+	spin_unlock(&tdev->object_lock);
 
 	kfree(tdev);
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index da3c6b5..ae675c6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -351,7 +351,7 @@ static void vmw_user_context_free(struct vmw_resource *res)
 	    container_of(res, struct vmw_user_context, res);
 	struct vmw_private *dev_priv = res->dev_priv;
 
-	kfree(ctx);
+	ttm_base_object_kfree(ctx, base);
 	ttm_mem_global_free(vmw_mem_glob(dev_priv),
 			    vmw_user_context_size);
 }
@@ -1147,7 +1147,7 @@ static void vmw_user_surface_free(struct vmw_resource *res)
 	kfree(srf->offsets);
 	kfree(srf->sizes);
 	kfree(srf->snooper.image);
-	kfree(user_srf);
+	ttm_base_object_kfree(user_srf, base);
 	ttm_mem_global_free(vmw_mem_glob(dev_priv), size);
 }
 
@@ -1575,7 +1575,7 @@ static void vmw_user_dmabuf_destroy(struct ttm_buffer_object *bo)
 {
 	struct vmw_user_dma_buffer *vmw_user_bo = vmw_user_dma_buffer(bo);
 
-	kfree(vmw_user_bo);
+	ttm_base_object_kfree(vmw_user_bo, base);
 }
 
 static void vmw_user_dmabuf_release(struct ttm_base_object **p_base)
@@ -1763,7 +1763,7 @@ static void vmw_user_stream_free(struct vmw_resource *res)
 	    container_of(res, struct vmw_user_stream, stream.res);
 	struct vmw_private *dev_priv = res->dev_priv;
 
-	kfree(stream);
+	ttm_base_object_kfree(stream, base);
 	ttm_mem_global_free(vmw_mem_glob(dev_priv),
 			    vmw_user_stream_size);
 }
diff --git a/include/drm/ttm/ttm_object.h b/include/drm/ttm/ttm_object.h
index b01c563..fc0cf06 100644
--- a/include/drm/ttm/ttm_object.h
+++ b/include/drm/ttm/ttm_object.h
@@ -40,6 +40,7 @@
 #include <linux/list.h>
 #include <drm/drm_hashtab.h>
 #include <linux/kref.h>
+#include <linux/rcupdate.h>
 #include <ttm/ttm_memory.h>
 
 /**
@@ -120,6 +121,7 @@ struct ttm_object_device;
  */
 
 struct ttm_base_object {
+	struct rcu_head rhead;
 	struct drm_hash_item hash;
 	enum ttm_object_type object_type;
 	bool shareable;
@@ -268,4 +270,6 @@ extern struct ttm_object_device *ttm_object_device_init
 
 extern void ttm_object_device_release(struct ttm_object_device **p_tdev);
 
+#define ttm_base_object_kfree(__object, __base)\
+	kfree_rcu(__object, __base.rhead)
 #endif
diff --git a/include/linux/kref.h b/include/linux/kref.h
index fd16042..bae91d0 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -117,7 +117,7 @@ static inline int kref_put_mutex(struct kref *kref,
  * @kref: object.
  *
  * Return 0 if the increment succeeded. Otherwise return non-zero.
-
+ *
  * This function is intended to simplify locking around refcounting for
  * objects that can be looked up from a lookup structure, and which are
  * removed from that lookup structure in the object destructor.
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] drm/ttm: Optimize reservation slightly
  2012-11-05 13:31 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops Thomas Hellstrom
                   ` (2 preceding siblings ...)
  2012-11-05 13:31 ` [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups Thomas Hellstrom
@ 2012-11-05 13:31 ` Thomas Hellstrom
  2012-11-05 14:01   ` Maarten Lankhorst
  3 siblings, 1 reply; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:31 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

Reservation locking currently always takes place under the LRU spinlock.
Hence, strictly there is no need for an atomic_cmpxchg call; we can use
atomic_read followed by atomic_write since nobody else will ever reserve
without the lru spinlock held.
At least on Intel this should remove a locked bus cycle on successful
reserve.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bf6e4b5..46008ea 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -220,7 +220,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
 	struct ttm_bo_global *glob = bo->glob;
 	int ret;
 
-	while (unlikely(atomic_cmpxchg(&bo->reserved, 0, 1) != 0)) {
+	while (unlikely(atomic_read(&bo->reserved) != 0)) {
 		/**
 		 * Deadlock avoidance for multi-bo reserving.
 		 */
@@ -249,6 +249,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
 			return ret;
 	}
 
+	atomic_set(&bo->reserved, 1);
 	if (use_sequence) {
 		/**
 		 * Wake up waiters that may need to recheck for deadlock,
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups
  2012-11-05 13:31 ` [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups Thomas Hellstrom
@ 2012-11-05 13:34   ` Thomas Hellstrom
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:34 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: airlied, airlied, dri-devel, linux-kernel

On 11/05/2012 02:31 PM, Thomas Hellstrom wrote:
> The mostly used lookup+get put+potential_destroy path of TTM objects
> is converted to use RCU locks. This will substantially decrease the amount
> of locked bus cycles during normal operation.
> Since we use kfree_rcu to free the objects, no rcu synchronization is needed
> at module unload time.
>
> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
> ---
>   drivers/gpu/drm/ttm/ttm_object.c         |   30 +++++++++++-------------------
>   drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |    8 ++++----
>   include/drm/ttm/ttm_object.h             |    4 ++++
>   include/linux/kref.h                     |    2 +-
>   4 files changed, 20 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_object.c b/drivers/gpu/drm/ttm/ttm_object.c
> index c785787..9d7f674 100644
> --- a/drivers/gpu/drm/ttm/ttm_object.c
> +++ b/drivers/gpu/drm/ttm/ttm_object.c
> @@ -80,7 +80,7 @@ struct ttm_object_file {
>    */
>   
>   struct ttm_object_device {
> -	rwlock_t object_lock;
> +	spinlock_t object_lock;
>   	struct drm_open_hash object_hash;
>   	atomic_t object_count;
>   	struct ttm_mem_global *mem_glob;
> @@ -157,12 +157,12 @@ int ttm_base_object_init(struct ttm_object_file *tfile,
>   	base->refcount_release = refcount_release;
>   	base->ref_obj_release = ref_obj_release;
>   	base->object_type = object_type;
> -	write_lock(&tdev->object_lock);
> +	spin_lock(&tdev->object_lock);
>   	kref_init(&base->refcount);
>   	ret = drm_ht_just_insert_please(&tdev->object_hash,
>   					&base->hash,
>   					(unsigned long)base, 31, 0, 0);
> -	write_unlock(&tdev->object_lock);
> +	spin_unlock(&tdev->object_lock);
>   	if (unlikely(ret != 0))
>   		goto out_err0;
>   
> @@ -186,30 +186,22 @@ static void ttm_release_base(struct kref *kref)
>   	    container_of(kref, struct ttm_base_object, refcount);
>   	struct ttm_object_device *tdev = base->tfile->tdev;
>   
> +	spin_lock(&tdev->object_lock);
>   	(void)drm_ht_remove_item(&tdev->object_hash, &base->hash);
> -	write_unlock(&tdev->object_lock);
> +	spin_unlock(&tdev->object_lock);
>   	if (base->refcount_release) {
>   		ttm_object_file_unref(&base->tfile);
>   		base->refcount_release(&base);
>   	}
> -	write_lock(&tdev->object_lock);
>   }
>   
>   void ttm_base_object_unref(struct ttm_base_object **p_base)
>   {
>   	struct ttm_base_object *base = *p_base;
> -	struct ttm_object_device *tdev = base->tfile->tdev;
>   
>   	*p_base = NULL;
>   
> -	/*
> -	 * Need to take the lock here to avoid racing with
> -	 * users trying to look up the object.
> -	 */
> -
> -	write_lock(&tdev->object_lock);
>   	kref_put(&base->refcount, ttm_release_base);
> -	write_unlock(&tdev->object_lock);
>   }
>   EXPORT_SYMBOL(ttm_base_object_unref);
>   
> @@ -221,14 +213,14 @@ struct ttm_base_object *ttm_base_object_lookup(struct ttm_object_file *tfile,
>   	struct drm_hash_item *hash;
>   	int ret;
>   
> -	read_lock(&tdev->object_lock);
> +	rcu_read_lock();
>   	ret = drm_ht_find_item(&tdev->object_hash, key, &hash);
>   
>   	if (likely(ret == 0)) {
>   		base = drm_hash_entry(hash, struct ttm_base_object, hash);
> -		kref_get(&base->refcount);
> +		ret = kref_get_unless_zero(&base->refcount);
>   	}
> -	read_unlock(&tdev->object_lock);
> +	rcu_read_unlock();
>   
>   	if (unlikely(ret != 0))
>   		return NULL;
> @@ -426,7 +418,7 @@ struct ttm_object_device *ttm_object_device_init(struct ttm_mem_global
>   		return NULL;
>   
>   	tdev->mem_glob = mem_glob;
> -	rwlock_init(&tdev->object_lock);
> +	spin_lock_init(&tdev->object_lock);
>   	atomic_set(&tdev->object_count, 0);
>   	ret = drm_ht_create(&tdev->object_hash, hash_order);
>   
> @@ -444,9 +436,9 @@ void ttm_object_device_release(struct ttm_object_device **p_tdev)
>   
>   	*p_tdev = NULL;
>   
> -	write_lock(&tdev->object_lock);
> +	spin_lock(&tdev->object_lock);
>   	drm_ht_remove(&tdev->object_hash);
> -	write_unlock(&tdev->object_lock);
> +	spin_unlock(&tdev->object_lock);
>   
>   	kfree(tdev);
>   }
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> index da3c6b5..ae675c6 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> @@ -351,7 +351,7 @@ static void vmw_user_context_free(struct vmw_resource *res)
>   	    container_of(res, struct vmw_user_context, res);
>   	struct vmw_private *dev_priv = res->dev_priv;
>   
> -	kfree(ctx);
> +	ttm_base_object_kfree(ctx, base);
>   	ttm_mem_global_free(vmw_mem_glob(dev_priv),
>   			    vmw_user_context_size);
>   }
> @@ -1147,7 +1147,7 @@ static void vmw_user_surface_free(struct vmw_resource *res)
>   	kfree(srf->offsets);
>   	kfree(srf->sizes);
>   	kfree(srf->snooper.image);
> -	kfree(user_srf);
> +	ttm_base_object_kfree(user_srf, base);
>   	ttm_mem_global_free(vmw_mem_glob(dev_priv), size);
>   }
>   
> @@ -1575,7 +1575,7 @@ static void vmw_user_dmabuf_destroy(struct ttm_buffer_object *bo)
>   {
>   	struct vmw_user_dma_buffer *vmw_user_bo = vmw_user_dma_buffer(bo);
>   
> -	kfree(vmw_user_bo);
> +	ttm_base_object_kfree(vmw_user_bo, base);
>   }
>   
>   static void vmw_user_dmabuf_release(struct ttm_base_object **p_base)
> @@ -1763,7 +1763,7 @@ static void vmw_user_stream_free(struct vmw_resource *res)
>   	    container_of(res, struct vmw_user_stream, stream.res);
>   	struct vmw_private *dev_priv = res->dev_priv;
>   
> -	kfree(stream);
> +	ttm_base_object_kfree(stream, base);
>   	ttm_mem_global_free(vmw_mem_glob(dev_priv),
>   			    vmw_user_stream_size);
>   }
> diff --git a/include/drm/ttm/ttm_object.h b/include/drm/ttm/ttm_object.h
> index b01c563..fc0cf06 100644
> --- a/include/drm/ttm/ttm_object.h
> +++ b/include/drm/ttm/ttm_object.h
> @@ -40,6 +40,7 @@
>   #include <linux/list.h>
>   #include <drm/drm_hashtab.h>
>   #include <linux/kref.h>
> +#include <linux/rcupdate.h>
>   #include <ttm/ttm_memory.h>
>   
>   /**
> @@ -120,6 +121,7 @@ struct ttm_object_device;
>    */
>   
>   struct ttm_base_object {
> +	struct rcu_head rhead;
>   	struct drm_hash_item hash;
>   	enum ttm_object_type object_type;
>   	bool shareable;
> @@ -268,4 +270,6 @@ extern struct ttm_object_device *ttm_object_device_init
>   
>   extern void ttm_object_device_release(struct ttm_object_device **p_tdev);
>   
> +#define ttm_base_object_kfree(__object, __base)\
> +	kfree_rcu(__object, __base.rhead)
>   #endif
> diff --git a/include/linux/kref.h b/include/linux/kref.h
> index fd16042..bae91d0 100644
> --- a/include/linux/kref.h
> +++ b/include/linux/kref.h
> @@ -117,7 +117,7 @@ static inline int kref_put_mutex(struct kref *kref,
>    * @kref: object.
>    *
>    * Return 0 if the increment succeeded. Otherwise return non-zero.
> -
> + *
>    * This function is intended to simplify locking around refcounting for
>    * objects that can be looked up from a lookup structure, and which are
>    * removed from that lookup structure in the object destructor.

Hmm, This patch looks bad. I'll respin it.
/Thomas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/4] drm/ttm: Optimize reservation slightly
  2012-11-05 13:55 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v2 Thomas Hellstrom
@ 2012-11-05 13:55 ` Thomas Hellstrom
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:55 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

Reservation locking currently always takes place under the LRU spinlock.
Hence, strictly there is no need for an atomic_cmpxchg call; we can use
atomic_read followed by atomic_write since nobody else will ever reserve
without the lru spinlock held.
At least on Intel this should remove a locked bus cycle on successful
reserve.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bf6e4b5..46008ea 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -220,7 +220,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
 	struct ttm_bo_global *glob = bo->glob;
 	int ret;
 
-	while (unlikely(atomic_cmpxchg(&bo->reserved, 0, 1) != 0)) {
+	while (unlikely(atomic_read(&bo->reserved) != 0)) {
 		/**
 		 * Deadlock avoidance for multi-bo reserving.
 		 */
@@ -249,6 +249,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
 			return ret;
 	}
 
+	atomic_set(&bo->reserved, 1);
 	if (use_sequence) {
 		/**
 		 * Wake up waiters that may need to recheck for deadlock,
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] drm/ttm: Optimize reservation slightly
  2012-11-05 13:31 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom
@ 2012-11-05 14:01   ` Maarten Lankhorst
  2012-11-05 14:09     ` Thomas Hellstrom
  0 siblings, 1 reply; 10+ messages in thread
From: Maarten Lankhorst @ 2012-11-05 14:01 UTC (permalink / raw)
  To: Thomas Hellstrom; +Cc: airlied, airlied, linux-kernel, dri-devel

Hey,

Op 05-11-12 14:31, Thomas Hellstrom schreef:
> Reservation locking currently always takes place under the LRU spinlock.
> Hence, strictly there is no need for an atomic_cmpxchg call; we can use
> atomic_read followed by atomic_write since nobody else will ever reserve
> without the lru spinlock held.
> At least on Intel this should remove a locked bus cycle on successful
> reserve.
>
> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>
Is that really a good thing to submit when I am busy killing lru lock around reserve? :-)

-	while (unlikely(atomic_cmpxchg(&bo->reserved, 0, 1) != 0)) {
+	while (unlikely(atomic_xchg(&bo->reserved, 1) != 0)) {

Works without lru lock too!

In fact mutexes are done in a similar way[1], except with some more magic, and unlocked state is 1, not 0.
However I do think that to get that right (saves a irq disable in unlock path, and less wakeups in contended
case), I should really just post the mutex extension patches for reservations and ride the flames. It's
getting too close to real mutexes so I really want it to be a mutex in that case. So lets convert it.. Soon! :-)

~Maarten

[1] See linux/include/asm-generic/mutex-xchg.h and linux/include/asm-generic/mutex-dec.h for how
archs generally implement mutex fastpaths.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] drm/ttm: Optimize reservation slightly
  2012-11-05 14:01   ` Maarten Lankhorst
@ 2012-11-05 14:09     ` Thomas Hellstrom
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 14:09 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: airlied, airlied, linux-kernel, dri-devel

On 11/05/2012 03:01 PM, Maarten Lankhorst wrote:
> Hey,
>
> Op 05-11-12 14:31, Thomas Hellstrom schreef:
>> Reservation locking currently always takes place under the LRU spinlock.
>> Hence, strictly there is no need for an atomic_cmpxchg call; we can use
>> atomic_read followed by atomic_write since nobody else will ever reserve
>> without the lru spinlock held.
>> At least on Intel this should remove a locked bus cycle on successful
>> reserve.
>>
>> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
>>
> Is that really a good thing to submit when I am busy killing lru lock around reserve? :-)
>

If your patch series makes it into the same kernel, let's kill this 
patch. Otherwise it may live
at least for a kernel release. It's not a big thing to rebase against, 
and I won't complain if your
patch adds another atomic read-modify-write op here. :)

/Thomas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/4] drm/ttm: Optimize reservation slightly
  2012-11-06 11:31 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v3 Thomas Hellstrom
@ 2012-11-06 11:31 ` Thomas Hellstrom
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Hellstrom @ 2012-11-06 11:31 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

Reservation locking currently always takes place under the LRU spinlock.
Hence, strictly there is no need for an atomic_cmpxchg call; we can use
atomic_read followed by atomic_write since nobody else will ever reserve
without the lru spinlock held.
At least on Intel this should remove a locked bus cycle on successful
reserve.

Note that thit commit may be obsoleted by the cross-device reservation work.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bf6e4b5..46008ea 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -220,7 +220,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
 	struct ttm_bo_global *glob = bo->glob;
 	int ret;
 
-	while (unlikely(atomic_cmpxchg(&bo->reserved, 0, 1) != 0)) {
+	while (unlikely(atomic_read(&bo->reserved) != 0)) {
 		/**
 		 * Deadlock avoidance for multi-bo reserving.
 		 */
@@ -249,6 +249,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
 			return ret;
 	}
 
+	atomic_set(&bo->reserved, 1);
 	if (use_sequence) {
 		/**
 		 * Wake up waiters that may need to recheck for deadlock,
-- 
1.7.4.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-11-06 11:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-05 13:31 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops Thomas Hellstrom
2012-11-05 13:31 ` [PATCH 1/4] drm: Make hashtab rcu-safe Thomas Hellstrom
2012-11-05 13:31 ` [PATCH 2/4] kref: Implement kref_get_unless_zero Thomas Hellstrom
2012-11-05 13:31 ` [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups Thomas Hellstrom
2012-11-05 13:34   ` Thomas Hellstrom
2012-11-05 13:31 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom
2012-11-05 14:01   ` Maarten Lankhorst
2012-11-05 14:09     ` Thomas Hellstrom
  -- strict thread matches above, loose matches on Subject: below --
2012-11-05 13:55 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v2 Thomas Hellstrom
2012-11-05 13:55 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom
2012-11-06 11:31 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v3 Thomas Hellstrom
2012-11-06 11:31 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.