* [RFC] Refcounting of objects part of a lockfree collection
@ 2004-07-14  4:53 Ravikiran G Thirumalai
  2004-07-14  7:07 ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: Ravikiran G Thirumalai @ 2004-07-14  4:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: dipankar

Traditionally, the refcounting technique is used to avoid or minimise
lock hold times on individual objects that are part of a collection.  The
'collection' of such objects (lists/arrays/etc.) is protected by traditional
locks such as spinlocks/reader-writer locks.  These locks protect
access to the collection data structure (walking the objects in
the collection, adding/deleting objects to the collection). Access to
individual objects is protected through the refcounting mechanism.

It makes sense, for performance's sake, to make access to a read-mostly
collection of objects lock-free with RCU techniques.  Introducing RCU
in place of traditional locking to protect such a collection creates some
refcounting issues.  These issues come up because, with RCU, unlike
traditional locking, writers and readers can execute concurrently.

The attached patch provides infrastructure for refcounting of objects
in an RCU protected collection.

This infrastructure was a result of our attempt at making the fd lookup lock-free.
fget currently holds files_struct.file_lock to get the
struct file pointer of the requested fd from the process' fd array.
The spinlock in fget is used to serialise read access to
files_struct.fd[] and incrementing the refcount.  We can save the
spin_lock and spin_unlock on the fd array lookup if a reference can be
taken on the struct file object in a safe and lock-free manner
(under rcu_read_lock/rcu_read_unlock and RCU on the update side).  This
optimization should improve performance for threaded I/O intensive workloads.
A patch to make use of the lock-free refcounting infrastructure to do
just that follows. With the lock-free fd lookup patch, tiobench performance
increases by 13% for sequential reads and 21% for random reads on a
4-processor PIII Xeon.

Comments welcome.

Thanks,
Kiran

Signed-off-by: Ravikiran Thirumalai <kiran@in.ibm.com>

---

diff -ruN -X dontdiff linux-2.6.6/include/linux/refcount.h refcount-2.6.6/include/linux/refcount.h
--- linux-2.6.6/include/linux/refcount.h	1970-01-01 05:30:00.000000000 +0530
+++ refcount-2.6.6/include/linux/refcount.h	2004-05-24 18:31:25.523860952 +0530
@@ -0,0 +1,244 @@
+/* 
+ * Refcounter framework for elements of lists/arrays protected by
+ * RCU.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2004
+ *
+ * Author: Ravikiran Thirumalai <kiran@in.ibm.com>
+ * Credits: Dipankar Sarma <dipankar@in.ibm.com>.  refcount_get_rcu has
+ *	    been abstracted out from Dipankar's implementation of rcu based
+ *	    fd management.
+ * 
+ * Refcounting on elements of lists which are protected by traditional
+ * reader/writer spinlocks or semaphores is straightforward, as in:
+ * 
+ * 1.					2.
+ * add()				search_and_reference()
+ * {					{
+ * 	alloc_object				read_lock(&list_lock);
+ *	...					search_for_element
+ *	refcount_init(&el->rc)			refcount_get(&el->rc);	
+ *	write_lock(&list_lock);			...
+ *	add_element				read_unlock(&list_lock);
+ *	...					...
+ *	write_unlock(&list_lock);	}
+ * }					
+ *
+ * 3.					4.
+ * release_referenced()			delete()
+ * {					{
+ *	...				write_lock(&list_lock);
+ *	if (refcount_put(&el->rc))	...
+ *		start_cleanup_object	...
+ *		free_object		delete_element
+ *	...				write_unlock(&list_lock);
+ * }					...
+ *					if (refcount_put(&el->rc))
+ *						start_cleanup_object
+ *						free_object
+ *					}
+ *
+ * If this list/array is made lock-free using RCU, i.e. by changing the
+ * write_lock in add() and delete() to spin_lock and changing read_lock
+ * in search_and_reference to rcu_read_lock(), the refcount_get in
+ * search_and_reference could potentially hold a reference to an element
+ * which has already been deleted from the list/array.  refcount_get_rcu
+ * takes care of this scenario. search_and_reference should then look like:
+ * 2. 
+ * search_and_reference()
+ * {
+ *	rcu_read_lock();
+ *	search_for_element
+ *	if (!refcount_get_rcu(&el->rc)) {
+ *		rcu_read_unlock();
+ *		return FAIL;
+ *	}
+ *	...
+ *	...
+ *	rcu_read_unlock();
+ * }
+ * 
+ * Of course, free_object after refcount_put should be batched using call_rcu.
+ * Sometimes, a reference to the element needs to be obtained in the
+ * update (write) stream.  In such cases, refcount_get_rcu might be overkill
+ * since the spinlock serialising list updates is already held. refcount_get
+ * is to be used in such cases.
+ * Note: Even though these refcount primitives can be used for lists which
+ * are protected by traditional locking, if the arch doesn't have cmpxchg,
+ * there might be extra penalties in refcount_get.
+ *
+ */
+
+#ifndef _LINUX_REFCOUNT_H
+#define _LINUX_REFCOUNT_H
+#ifdef  __KERNEL__
+
+#include <asm/atomic.h>
+#include <asm/system.h>
+
+typedef struct { atomic_t count; } refcount_t;
+
+/* 
+ * Refcount primitives for lists/array elements protected by traditional
+ * locking
+ */
+
+static inline void refcount_init(refcount_t *rc)	
+{
+	atomic_set(&rc->count, 1);
+}
+
+static inline int refcount_put(refcount_t *rc)
+{
+	return atomic_dec_and_test(&rc->count);
+}
+
+static inline void refcount_dec(refcount_t *rc)
+{
+	atomic_dec(&rc->count);
+}
+
+static inline int refcount_put_and_lock(refcount_t *rc, spinlock_t *lock)
+{
+	return atomic_dec_and_lock(&rc->count, lock);
+}
+
+static inline int refcount_read(refcount_t *rc)
+{
+	return atomic_read(&rc->count);
+}
+
+#ifdef __HAVE_ARCH_CMPXCHG
+
+static inline void refcount_get(refcount_t *rc)
+{
+	atomic_inc(&rc->count);
+}
+
+/* 
+ * Refcount primitives for lists/array elements protected by rcu
+ */
+
+/* 
+ * cmpxchg is needed on UP too, if deletions from the list/array can happen
+ * in interrupt context.
+ */
+static inline int refcount_get_rcu(refcount_t *rc)
+{
+	int c, old;
+	c = atomic_read(&rc->count);
+	while ( c && (old = cmpxchg(&rc->count.counter, c, c+1)) != c) 
+		c = old;
+	return c;
+}
+
+#else /* !__HAVE_ARCH_CMPXCHG */
+
+/*
+ * We use an array of hashed spinlocks for the refcounts -- similar to the
+ * spinlock arrays in 32-bit sparc atomic_t implementations -- along with a
+ * similar hash function.
+ * Can't help multiprocessors which do not have cmpxchg :(
+ */
+
+#ifdef	CONFIG_SMP
+#define REFCOUNT_HASH_SIZE	4
+#define REFCOUNT_HASH(r) \
+	(&__refcount_hash[(((unsigned long)r)>>8) & (REFCOUNT_HASH_SIZE-1)])
+#else
+#define	REFCOUNT_HASH_SIZE	1
+#define REFCOUNT_HASH(r) 	&__refcount_hash[0]
+#endif /* CONFIG_SMP */
+
+extern spinlock_t __refcount_hash[REFCOUNT_HASH_SIZE];
+
+static inline void refcount_init(refcoun_t *rc)	
+{
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter = 1;
+	spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+}
+
+static inline int refcount_put(refcount_t *rc)
+{
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter--;
+	if (!rc->count.counter) {
+		spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+		return 1;
+	} else {
+		spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+		return 0;
+	}
+}
+
+static inline void refcount_dec(refcount_t *rc)
+{
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter--;
+	spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+}
+
+static inline int refcount_put_and_lock(refcount_t *rc, spinlock_t *lock)
+{
+	unsigned long flags;
+	spin_lock(lock);
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter--;
+	if (!rc->count.counter) {
+		spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+		return 1;
+	} else {
+		spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+		spin_unlock(lock);
+		return 0;
+	}
+}
+/* Spinlocks are not needed in this case, if atomic_read is used */
+static inline int refcount_read(refcount_t *rc)
+{
+	return atomic_read(&rc->count);
+}
+
+static inline void refcount_get(refcount_t *rc)
+{
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter++;
+	spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+}
+
+static inline int refcount_get_rcu(refcount_t *rc)
+{
+	int ret;
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	if (rc->count.counter) 
+		ret = rc->count.counter++;
+	else
+		ret = 0;
+	spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+	return ret;
+}
+
+#endif /* __HAVE_ARCH_CMPXCHG */
+
+#endif
+#endif /* _LINUX_REFCOUNT_H */
diff -ruN -X dontdiff linux-2.6.6/kernel/rcupdate.c refcount-2.6.6/kernel/rcupdate.c
--- linux-2.6.6/kernel/rcupdate.c	2004-05-10 08:01:58.000000000 +0530
+++ refcount-2.6.6/kernel/rcupdate.c	2004-05-24 15:41:15.000000000 +0530
@@ -44,6 +44,7 @@
 #include <linux/notifier.h>
 #include <linux/rcupdate.h>
 #include <linux/cpu.h>
+#include <linux/refcount.h>
 
 /* Definition for rcupdate control block. */
 struct rcu_ctrlblk rcu_ctrlblk = 
@@ -325,6 +326,17 @@
 	wait_for_completion(&completion);
 }
 
+/*
+ * spinlock_t array for arches which do not have cmpxchg.  This is for
+ * RCU based refcounting.
+ */
+
+#ifndef __HAVE_ARCH_CMPXCHG
+spinlock_t __refcount_hash[REFCOUNT_HASH_SIZE] = {
+	[0 ... (REFCOUNT_HASH_SIZE-1)] = SPIN_LOCK_UNLOCKED
+};
+EXPORT_SYMBOL(__refcount_hash);
+#endif
 
 EXPORT_SYMBOL(call_rcu);
 EXPORT_SYMBOL(synchronize_kernel);

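The hashed-spinlock fallback in the patch can be modelled in user space; in the sketch below, pthread mutexes stand in for the kernel spinlocks, and the hash size, shift, and names are illustrative choices mirroring the patch, not the kernel implementation:

```c
#include <assert.h>
#include <pthread.h>

/* A small array of locks, indexed by a hash of the counter's address,
 * guards a plain integer counter -- the same idea the patch uses when
 * the architecture has no cmpxchg. */
#define REFCOUNT_HASH_SIZE 4

static pthread_mutex_t refcount_hash[REFCOUNT_HASH_SIZE] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

typedef struct { int counter; } refcount_t;

static pthread_mutex_t *rc_lock(refcount_t *rc)
{
	return &refcount_hash[(((unsigned long)rc) >> 8)
			      & (REFCOUNT_HASH_SIZE - 1)];
}

static void refcount_init(refcount_t *rc)
{
	pthread_mutex_lock(rc_lock(rc));
	rc->counter = 1;
	pthread_mutex_unlock(rc_lock(rc));
}

/* Conditional get: the lock makes the test-and-increment atomic, so a
 * zero count (dead object) can never be bumped back to one.
 * Returns the old (non-zero) count on success, 0 on failure. */
static int refcount_get_rcu(refcount_t *rc)
{
	int ret;

	pthread_mutex_lock(rc_lock(rc));
	ret = rc->counter ? rc->counter++ : 0;
	pthread_mutex_unlock(rc_lock(rc));
	return ret;
}
```

Hashing on the object's address means unrelated counters may share a lock, but that only costs occasional contention, never correctness.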
^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14  4:53 Ravikiran G Thirumalai
@ 2004-07-14  7:07 ` Greg KH
  2004-07-14  8:26   ` Dipankar Sarma
  2004-07-14  8:57   ` Ravikiran G Thirumalai
  0 siblings, 2 replies; 18+ messages in thread
From: Greg KH @ 2004-07-14  7:07 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: linux-kernel, dipankar

On Wed, Jul 14, 2004 at 10:23:50AM +0530, Ravikiran G Thirumalai wrote:
> 
> The attached patch provides infrastructure for refcounting of objects
> in an RCU protected collection.

This is really close to the kref implementation.  Why not just use that
instead?

Oh, and I think you need to use atomic_set() instead of initializing the
atomic_t by hand.

thanks,

greg k-h


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14  7:07 ` Greg KH
@ 2004-07-14  8:26   ` Dipankar Sarma
  2004-07-14 14:26     ` Greg KH
  2004-07-14  8:57   ` Ravikiran G Thirumalai
  1 sibling, 1 reply; 18+ messages in thread
From: Dipankar Sarma @ 2004-07-14  8:26 UTC (permalink / raw)
  To: Greg KH; +Cc: Ravikiran G Thirumalai, linux-kernel

On Wed, Jul 14, 2004 at 12:07:00AM -0700, Greg KH wrote:
> On Wed, Jul 14, 2004 at 10:23:50AM +0530, Ravikiran G Thirumalai wrote:
> > 
> > The attached patch provides infrastructure for refcounting of objects
> > in an RCU protected collection.
> 
> This is really close to the kref implementation.  Why not just use that
> instead?

Well, the kref has the same get/put race if used in a lock-free
look-up. When you do a kref_get() it is assumed that another
cpu will not see a 1-to-0 transition of the reference count.
If that indeed happens, ->release() will get invoked more
than once for that object which is bad. Kiran's patch actually
solves this fundamental lock-free ref-counting problem.

The other issue is that there are many refcounted data structures
like dentry, dst_entry, file etc. that do not use kref. If everybody
were to use kref, we could possibly apply Kiran's lock-free extensions
to kref itself and be done with it. Until then, we need the lock-free
refcounting support for non-kref refcounted objects.

Thanks
Dipankar


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14  7:07 ` Greg KH
  2004-07-14  8:26   ` Dipankar Sarma
@ 2004-07-14  8:57   ` Ravikiran G Thirumalai
  2004-07-14 17:08     ` Greg KH
  1 sibling, 1 reply; 18+ messages in thread
From: Ravikiran G Thirumalai @ 2004-07-14  8:57 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, dipankar

On Wed, Jul 14, 2004 at 12:07:00AM -0700, Greg KH wrote:
> On Wed, Jul 14, 2004 at 10:23:50AM +0530, Ravikiran G Thirumalai wrote:
> > 
> > The attached patch provides infrastructure for refcounting of objects
> > in an RCU protected collection.
> 
> This is really close to the kref implementation.  Why not just use that
> instead?

Close, but not the same.  I just had a quick look at krefs.
Actually, this reference count infrastructure I am proposing is not for
traditional refcounting.  This is for refcounting of elements of a list
or array (collection) which can be 'read' in a lock-free manner.

For example, with traditional refcounting, you can have:

1.                                  2.
add()                               search_and_reference()
{                                   {
    alloc_object                            read_lock(&list_lock);
    ...                                     search_for_element
    refcount_init(&el->rc)                  refcount_get(&el->rc);
    write_lock(&list_lock);                 ...
    add_element                             read_unlock(&list_lock);
    ...                                     ...
    write_unlock(&list_lock);       }
}

3.                                  4.
release_referenced()                delete()
{                                   {
    ...                             write_lock(&list_lock);
    if (refcount_put(&el->rc))      ...
            start_cleanup_object    ...
            free_object             delete_element
    ...                             write_unlock(&list_lock);
}                                   ...
                                    if (refcount_put(&el->rc))
                                            start_cleanup_object
                                            free_object
                                    }

add() puts the refcounted element into the system with the list_lock
taken for write, search_and_reference() takes the list_lock for read
and gets the refcount.  Now if the list was a read mostly kind, then
we could replace the list_lock rw lock with a spinlock,
serialise updates to the list with the spinlock taken (in the add())
and use rcu_read_lock() and rcu_read_unlock() on the reader side
(search_and_reference()) replacing the read_locks in 2. above. 

But, with RCU, search_and_reference could see stale elements, that is,
elements which have been taken off the list by delete(). Using call_rcu
to free the object from release_referenced() (free_object in 3.) will defer
freeing, so that the &el memory location above is not freed until the reader
comes out of rcu_read_lock protected code, _but_ the
search_and_reference thread could potentially get a reference to a
deleted list element.  Hence, under lock-free circumstances, just an
atomic_inc of the refcount is not sufficient.
We do:

	...
	c = atomic_read(&rc->count);
	while ( c && (old = cmpxchg(&rc->count.counter, c, c+1)) != c)
		c = old;
	return c;


which is abstracted out as refcount_get_rcu()

Hence, in the example above, search_and_reference would look like

search_and_reference()
{
    rcu_read_lock();
    search_for_element
    if (!refcount_get_rcu(&el->rc)) {
            rcu_read_unlock();
            return FAIL;
    }
    ...
    ...
    rcu_read_unlock();
}

Hope that clears things up.
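The cmpxchg loop above can be modelled in user space with C11 atomics.  The sketch below is illustrative only -- the names mirror the patch, but this is not the kernel implementation:

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space model of refcount_get_rcu: the increment succeeds only
 * while the count is still non-zero, so a lock-free reader can never
 * resurrect an object whose last reference has already been dropped. */
typedef struct { atomic_int count; } refcount_t;

static void refcount_init(refcount_t *rc)
{
	atomic_store(&rc->count, 1);
}

/* Returns the old (non-zero) count on success, 0 on failure. */
static int refcount_get_rcu(refcount_t *rc)
{
	int c = atomic_load(&rc->count);

	/* On CAS failure, c is reloaded with the current value;
	 * retry until the count hits zero or the increment lands. */
	while (c != 0 &&
	       !atomic_compare_exchange_weak(&rc->count, &c, c + 1))
		;
	return c;
}

/* Returns 1 when this put dropped the last reference. */
static int refcount_put(refcount_t *rc)
{
	return atomic_fetch_sub(&rc->count, 1) == 1;
}
```

A caller that sees refcount_get_rcu() return 0 must treat the element as already deleted and retry or fail the lookup, exactly as in the search_and_reference() example.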

> 
> Oh, and I think you need to use atomic_set() instead of initializing the
> atomic_t by hand.

I have used atomic_set for the case where the arch has cmpxchg.  But for
arches lacking cmpxchg, I need to use hashed spinlocks to implement
refcount_get_rcu.
No point in having more atomic operations when I hold spinlocks.  Admittedly,
it might be a bit yucky to assume atomic_t internals, but it is just one header
file :) <ducks>

Thanks,
Kiran


* Re: [RFC] Refcounting of objects part of a lockfree collection
@ 2004-07-14 10:24 Oleg Nesterov
  0 siblings, 0 replies; 18+ messages in thread
From: Oleg Nesterov @ 2004-07-14 10:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ravikiran G Thirumalai

Hello.

> #ifdef __HAVE_ARCH_CMPXCHG

is in the wrong place.

static inline void refcount_init(refcount_t *rc)
...
#ifdef __HAVE_ARCH_CMPXCHG
...
#else /* !__HAVE_ARCH_CMPXCHG */
...
static inline void refcount_init(refcoun_t *rc)  <--- redefinition
                                 ^^^^^^^^        <--- and typo
Oleg.


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14  8:26   ` Dipankar Sarma
@ 2004-07-14 14:26     ` Greg KH
  2004-07-14 15:22       ` Dipankar Sarma
  2004-07-15  6:21       ` Ravikiran G Thirumalai
  0 siblings, 2 replies; 18+ messages in thread
From: Greg KH @ 2004-07-14 14:26 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: Ravikiran G Thirumalai, linux-kernel

On Wed, Jul 14, 2004 at 01:56:22PM +0530, Dipankar Sarma wrote:
> On Wed, Jul 14, 2004 at 12:07:00AM -0700, Greg KH wrote:
> > On Wed, Jul 14, 2004 at 10:23:50AM +0530, Ravikiran G Thirumalai wrote:
> > > 
> > > The attached patch provides infrastructure for refcounting of objects
> > > in an RCU protected collection.
> > 
> > This is really close to the kref implementation.  Why not just use that
> > instead?
> 
> Well, the kref has the same get/put race if used in a lock-free
> look-up. When you do a kref_get() it is assumed that another
> cpu will not see a 1-to-0 transition of the reference count.

You mean kref_put(), right?

> If that indeed happens, ->release() will get invoked more
> than once for that object which is bad.

As kref_put() uses an atomic_t, how can that transition happen twice?

What can happen is kref_get() and kref_put() can race if the last
kref_put() happens at the same time that kref_get().  But that is solved
by having the caller guarantee that this can not happen (see my 2004 OLS
paper for more info about this.)

> Kiran's patch actually solves this fundamental lock-free ref-counting
> problem.

Hm, maybe I'm missing something, but I don't see how it does so, based
on the implementation of refcount_get() and refcount_put().  Sure, the
*_rcu implementations are a bit different, but if that's the only
difference between this code and kref, why not just add the rcu versions to
the kref code?

> The other issue is that there are many refcounted data structures
> like dentry, dst_entry, file etc. that do not use kref.

At this time, sure.  But you could always change that :)
(and yes, to do so, we can always shrink the size of struct kref if
really needed...)

> If everybody were to use kref, we could possibly apply Kiran's
> lock-free extensions to kref itself and be done with it.

Ok, sounds like a plan to me.  Having 2 refcount implementations in the
kernel that work alike, yet a bit different, is not acceptable.  Please
rework struct kref to do this.

> Until then, we need the lock-free refcounting support for non-kref
> refcounted objects.

We've lived without it until now somehow :)

thanks,

greg k-h


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14 14:26     ` Greg KH
@ 2004-07-14 15:22       ` Dipankar Sarma
  2004-07-14 17:03         ` Greg KH
  2004-07-15  6:21       ` Ravikiran G Thirumalai
  1 sibling, 1 reply; 18+ messages in thread
From: Dipankar Sarma @ 2004-07-14 15:22 UTC (permalink / raw)
  To: Greg KH; +Cc: Ravikiran G Thirumalai, linux-kernel

On Wed, Jul 14, 2004 at 07:26:14AM -0700, Greg KH wrote:
> On Wed, Jul 14, 2004 at 01:56:22PM +0530, Dipankar Sarma wrote:
> > Well, the kref has the same get/put race if used in a lock-free
> > look-up. When you do a kref_get() it is assumed that another
> > cpu will not see a 1-to-0 transition of the reference count.
> 
> You mean kref_put(), right?

No, I meant kref_get(). See below.

> > If that indeed happens, ->release() will get invoked more
> > than once for that object which is bad.
> 
> As kref_put() uses a atomic_t, how can that transistion happen twice?
> 
> What can happen is kref_get() and kref_put() can race if the last
> kref_put() happens at the same time that kref_get().  But that is solved
> by having the caller guarantee that this can not happen (see my 2004 OLS
> paper for more info about this.)

Yes, and how do the callers guarantee that ? Using a lock, right ?
What Kiran's patch does is to allow those callers to use lock-free
algorithms. Let's look at the race -

---------------------------------------------------------------

CPU #0                           CPU #1
------                           ------

                                 my_put() from a user who did my_lookup() ...

                                 [ ->count is 1 ]

In my_lookup() ...               atomic_dec_and_test(&m->count)

                                 [ ->count is 0 ]

m = my_get(my_list[i]);

[ ->count is 1 ]                 call_rcu(&m->head, free, m);

return m;

[This CPU can now context
 switch and allow RCU to
 proceed]

                                 free(m);

Somebody dereferences m and
invalid memory reference
---------------------------------------------------------------

This can happen if my_lookup() is lock-free.
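The difference between a bare atomic increment and the conditional get can be replayed in user space once CPU #1's atomic_dec_and_test() has taken ->count from 1 to 0.  This is an illustrative single-threaded sketch; get_plain and get_conditional are hypothetical names, not real kernel APIs:

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_int count; } refcount_t;

/* What a bare atomic increment does: it "succeeds" unconditionally,
 * even when the object is already dead and queued for freeing. */
static int get_plain(refcount_t *rc)
{
	atomic_fetch_add(&rc->count, 1);
	return 1;
}

/* The conditional get: refuse once the count has reached zero,
 * returning the old (non-zero) count on success and 0 on failure. */
static int get_conditional(refcount_t *rc)
{
	int c = atomic_load(&rc->count);

	while (c != 0 &&
	       !atomic_compare_exchange_weak(&rc->count, &c, c + 1))
		;
	return c;
}
```

With the count already at zero, get_plain() hands the caller a reference to memory that call_rcu() will free, while get_conditional() fails and forces the lookup to bail out.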

> > The other issue is that there are many refcounted data structures
> > like dentry, dst_entry, file etc. that do not use kref.
> 
> At this time, sure.  But you could always change that :)
> (and yes, to do so, we can always shrink the size of struct kref if
> really needed...)

How are you going to shrink it ? You need the ->release() method
and that is a nice way for drivers to get rid of objects.

> 
> > If everybody were to use kref, we could possibly apply Kiran's
> > lock-free extensions to kref itself and be done with it.
> 
> Ok, sounds like a plan to me.  Having 2 refcount implementations in the
> kernel that work alike, yet a bit different, is not acceptable.  Please
> rework struct kref to do this.

And I suspect Andrew would thwack me for trying to increase the dentry size :)
Anyway, the summary is this - Kiran is not trying to introduce
a new refcounting API. He is just adding lock-free support to
an existing refcounting mechanism that is used in VFS. If kref users need to do
lock-free lookups, sure, we should add it to the kref_xxx APIs also.

> > Until then, we need the lock-free refcounting support for non-kref
> > refcounted objects.
>
> We've lived without it until now somehow :)

Actually, we already use lock-free refcounting in route cache, dcache. In those
cases, we work around this race using a different algorithm.

Thanks
Dipankar


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14 15:22       ` Dipankar Sarma
@ 2004-07-14 17:03         ` Greg KH
  2004-07-14 17:49           ` Dipankar Sarma
  0 siblings, 1 reply; 18+ messages in thread
From: Greg KH @ 2004-07-14 17:03 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: Ravikiran G Thirumalai, linux-kernel

On Wed, Jul 14, 2004 at 08:52:35PM +0530, Dipankar Sarma wrote:
> On Wed, Jul 14, 2004 at 07:26:14AM -0700, Greg KH wrote:
> > On Wed, Jul 14, 2004 at 01:56:22PM +0530, Dipankar Sarma wrote:
> > > Well, the kref has the same get/put race if used in a lock-free
> > > look-up. When you do a kref_get() it is assumed that another
> > > cpu will not see a 1-to-0 transition of the reference count.
> > 
> > You mean kref_put(), right?
> 
> No, I meant kref_get(). See below.

Ok, we are both talking about the same thing, just from different
angles.  Yes, I agree with you about this.

> Yes, and how do the callers guarantee that ? Using a lock, right ?
> What Kiran's patch does is to allow those callers to use lock-free
> algorithms.

So modify kref to also use those algorithms.  That's all I'm asking.

> > > The other issue is that there are many refcounted data structures
> > > like dentry, dst_entry, file etc. that do not use kref.
> > 
> > At this time, sure.  But you could always change that :)
> > (and yes, to do so, we can always shrink the size of struct kref if
> > really needed...)
> 
> How are you going to shrink it ? You need the ->release() method
> and that is a nice way for drivers to get rid of objects.

struct kref can get rid of the release pointer, and require it as part
of the kref_put() call.

> > > If everybody were to use kref, we could possibly apply Kiran's
> > > lock-free extensions to kref itself and be done with it.
> > 
> > Ok, sounds like a plan to me.  Having 2 refcount implementations in the
> > kernel that work alike, yet a bit different, is not acceptable.  Please
> > rework struct kref to do this.
> 
> And I suspect Andrew would thwack me for trying to increase the dentry size :)
> Anyway, the summary is this - Kiran is not trying to introduce
> a new refcounting API.

Yes he is.  He's calling it refcount.h, and creating a typedef called
refcount_t.  Sure looks like a new refcount API to me :)

> He is just adding lock-free support to an existing refcounting
> mechanism that is used in VFS.

If this is true, then I strongly object to the naming of this file, and
the name of the typedef (which shouldn't be a typedef at all) and this
should be made a private data structure to the vfs so no one else tries
to use it.  Otherwise it will be used.

> If kref users need to do lock-free lookup, sure we should add it to
> kref_xxx APIs also.

I still think you should just use a kref and add the apis.

> > > Until then, we need the lock-free refcounting support for non-kref
> > > refcounted objects.
> >
> > We've lived without it until now somehow :)
> 
> Actually, we already use lock-free refcounting in route cache, dcache. In those
> cases, we work around this race using a different algorithm.

I realize that.  Then, again, this shouldn't be advertised as such a
general api that everyone can use.  That's my main objection here.  If
you want to make it private to the specific file implementation in the
vfs, fine with me.

thanks,

greg k-h


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14  8:57   ` Ravikiran G Thirumalai
@ 2004-07-14 17:08     ` Greg KH
  2004-07-14 18:17       ` Dipankar Sarma
  2004-07-15  8:02       ` Ravikiran G Thirumalai
  0 siblings, 2 replies; 18+ messages in thread
From: Greg KH @ 2004-07-14 17:08 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: linux-kernel, dipankar

On Wed, Jul 14, 2004 at 02:27:58PM +0530, Ravikiran G Thirumalai wrote:
> On Wed, Jul 14, 2004 at 12:07:00AM -0700, Greg KH wrote:
> > On Wed, Jul 14, 2004 at 10:23:50AM +0530, Ravikiran G Thirumalai wrote:
> > > 
> > > The attached patch provides infrastructure for refcounting of objects
> > > in an RCU protected collection.
> > 
> > This is really close to the kref implementation.  Why not just use that
> > instead?
> 
> Close, but not the same.  I just had a quick look at krefs.
> Actually, this reference count infrastructure I am proposing is not for
> traditional refcounting.

But you are advertising it as such by calling it a refcount_t and
putting it in a file called refcount.h.

> > Oh, and I think you need to use atomic_set() instead of initializing the
> > atomic_t by hand.
> 
> I have used atomic_set for the case where the arch has cmpxchg.  But for
> arches lacking cmpxchg, I need to use hashed spinlocks to implement
> refcount_get_rcu.
> No point in having more atomic operations when I hold spinlocks.  Admittedly,
> might be a bit yucky to assume atomic_t internals, but it is just one header
> file :) <ducks>

I still think you need to fix this, manipulating atomic_t variables by
hand is not always guaranteed to work on all arches, from what I
remember.

And what arches do not support cmpxchg?  How does this change affect the
performance of them?

thanks,

greg k-h


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14 17:03         ` Greg KH
@ 2004-07-14 17:49           ` Dipankar Sarma
  2004-07-14 18:03             ` Greg KH
  0 siblings, 1 reply; 18+ messages in thread
From: Dipankar Sarma @ 2004-07-14 17:49 UTC (permalink / raw)
  To: Greg KH; +Cc: Ravikiran G Thirumalai, linux-kernel

On Wed, Jul 14, 2004 at 10:03:37AM -0700, Greg KH wrote:
> On Wed, Jul 14, 2004 at 08:52:35PM +0530, Dipankar Sarma wrote:
> > And I suspect Andrew would thwack me for trying to increase the dentry size :)
> > Anyway, the summary is this - Kiran is not trying to introduce
> > a new refcounting API.
> 
> Yes he is.  He's calling it refcount.h, and creating a typedef called
> refcount_t.  Sure looks like a new refcount API to me :)

OK, in terms of code, it is a new API. Just the refcounting mechanism
is the same as existing non-kref refcounts.

> > He is just adding lock-free support to an existing refcounting
> > mechanism that is used in VFS.
> 
> If this is true, then I strongly object to the naming of this file, and
> the name of the typedef (which shouldn't be a typedef at all) and this
> should be made a private data structure to the vfs so no one else tries
> to use it.  Otherwise it will be used.

I am reasonably sure that when that patch was done (months ago) kref wasn't
there. Now that kref.[ch] is around, everything can be put there.

Now, if struct kref is shrunk (want a patch? ;-), all this
can possibly be nicely collapsed into one set of APIs for refcounting.
There aren't many users of kref yet, so this seems like a good
time to do it. Was there any objection to shrinking it?


Thanks
Dipankar


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14 17:49           ` Dipankar Sarma
@ 2004-07-14 18:03             ` Greg KH
  0 siblings, 0 replies; 18+ messages in thread
From: Greg KH @ 2004-07-14 18:03 UTC (permalink / raw)
  To: Dipankar Sarma; +Cc: Ravikiran G Thirumalai, linux-kernel

On Wed, Jul 14, 2004 at 11:19:49PM +0530, Dipankar Sarma wrote:
> On Wed, Jul 14, 2004 at 10:03:37AM -0700, Greg KH wrote:
> > On Wed, Jul 14, 2004 at 08:52:35PM +0530, Dipankar Sarma wrote:
> > > He is just adding lock-free support to an existing refcounting
> > > mechanism that is used in VFS.
> > 
> > If this is true, then I strongly object to the naming of this file, and
> > the name of the typedef (which shouldn't be a typedef at all) and this
> > should be made a private data structure to the vfs so no one else tries
> > to use it.  Otherwise it will be used.
> 
> I am reasonably sure that when that patch was done (months ago) kref wasn't
> there. Now that kref.[ch] is around, everything can be put there.

I agree.

> Now, if struct kref is shrunk (want a patch? ;-), all this 
> can possibly be nicely collapsed into one set of APIs for refcounting.

Sounds good to me.

> There aren't many users of kref yet, so this seems like a good
> time to do it. Was there any objection to shrinking it?

None that I know of.  In fact, it's on my list of things to do, so a
patch from someone else to fix this would be greatly appreciated.

thanks,

greg k-h


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14 17:08     ` Greg KH
@ 2004-07-14 18:17       ` Dipankar Sarma
  2004-07-15  8:02       ` Ravikiran G Thirumalai
  1 sibling, 0 replies; 18+ messages in thread
From: Dipankar Sarma @ 2004-07-14 18:17 UTC (permalink / raw)
  To: Greg KH; +Cc: Ravikiran G Thirumalai, linux-kernel

On Wed, Jul 14, 2004 at 10:08:00AM -0700, Greg KH wrote:
> On Wed, Jul 14, 2004 at 02:27:58PM +0530, Ravikiran G Thirumalai wrote:
> > might be a bit yucky to assume atomic_t internals, but it is just one header
> > file :) <ducks>
> 
> I still think you need to fix this, manipulating atomic_t variables by
> hand is not always guaranteed to work on all arches, from what I
> remember.

AFAICS, the hash-locked refcounting grabs a spin lock for all
operations on the atomic_t. Any reason why that should not be safe ?
Of course, I can't see why we can't have two versions of the
reference counter depending on __HAVE_ARCH_CMPXCHG. Kiran ?

> 
> And what arches do not support cmpxchg?  How does this change affect the
> performance of them?

mips64, smp arm ?? ;-)

With a hashed lock, it should not be all that bad on low-end SMPs.
Besides, we already use such a thing in the gettimeofday implementation
with a global lock. However, this is a valid issue, and performance numbers
from users of those arches would be useful.

Thanks
Dipankar


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14 14:26     ` Greg KH
  2004-07-14 15:22       ` Dipankar Sarma
@ 2004-07-15  6:21       ` Ravikiran G Thirumalai
  2004-07-15  6:56         ` Dipankar Sarma
  1 sibling, 1 reply; 18+ messages in thread
From: Ravikiran G Thirumalai @ 2004-07-15  6:21 UTC (permalink / raw)
  To: Greg KH; +Cc: Dipankar Sarma, linux-kernel, oleg

On Wed, Jul 14, 2004 at 07:26:14AM -0700, Greg KH wrote:
> ... 
> As kref_put() uses an atomic_t, how can that transition happen twice?
> 
> What can happen is that kref_get() and kref_put() can race if the last
> kref_put() happens at the same time as the kref_get().  But that is solved
> by having the caller guarantee that this can not happen (see my 2004 OLS
> paper for more info about this.)

The last kref_put (the zero transition) should happen after the refcounted
object is removed from the list, and the removal is serialised
through the traditional rw/spinlock (see my example earlier).
With traditional lock protection, if the reader holds the lock, the writer
cannot remove the element from the list, so there can be no subsequent last
put; and if the writer has already removed the element, the reader won't
even see the element when he walks the list, so there is no question of
taking a reference on the deleted element.
Lockfree lookup with RCU changes all this.  Hence, we need refcount_get_rcu.

Now it would have been nice if we did not have to use cmpxchg 
for refcount_get_rcu, or at least if all arches implemented cmpxchg in
hardware.  Since neither is true, for arches with no cmpxchg we use
the hashed spinlock method.  When we use the hashed spinlock for the
refcount_get_rcu method, we need to use the same spinlock to protect the
refcount for the usual refcount_gets too.  So no atomic_inc for
refcount_gets on arches with no cmpxchg.  
Now, one might ask, why refcount_get when you are already using
refcount_get_rcu for a refcount?  With the fd patch, there are places where
the refcount_get happens with traditional serialisation, and places where
refcount_get happens lockfree.  So it makes sense for performance's sake to
have a simple atomic_inc (refcount_get) instead of the whole cmpxchg shebang
for the same refcounter when the traditional lock is held.  Hence the
refcount_get infrastructure.  The whole refcount infrastructure 
provided is _not_ meant for traditional refcounting, which kref 
accomplishes.  If it is used for traditional refcounting (i.e. refcount_get
on a refcounter is used without any refcount_get_rcus for the same
counter), arches with no cmpxchg will end up using hashed spinlocks for
simple refcount_gets :).  A note was included in the header file as
a warning:

<in the patch>
 * Note: Even though these refcount primitives can be used for lists which
 * are protected by traditional locking, if the arch doesn't have cmpxchg,
 * there might be extra penalties in refcount_get.
</>

To summarise, this infrastructure is not meant to replace kref, nor is it 
meant to be used for traditional refcounting.  All primitives are for lockfree
refcounting and associated functions.  And the infrastructure came in 
because there is more than one application for it: files_struct is one,
fs_struct, on which I am working, is another.  There might be more later :).

As Oleg rightly pointed out, the #ifdef __HAVE_ARCH_CMPXCHG is at the wrong
place.  My bad.  Maybe this confusion wouldn't have happened if it had been
at the right place to begin with.  Here is the corrected refcount patch.

Thanks,
Kiran


diff -ruN -X dontdiff linux-2.6.7/include/linux/refcount.h files_struct-2.6.7/include/linux/refcount.h
--- linux-2.6.7/include/linux/refcount.h	1970-01-01 05:30:00.000000000 +0530
+++ files_struct-2.6.7/include/linux/refcount.h	2004-07-15 11:13:08.000000000 +0530
@@ -0,0 +1,245 @@
+/* 
+ * Refcounter framework for elements of lists/arrays protected by
+ * RCU.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2004
+ *
+ * Author: Ravikiran Thirumalai <kiran@in.ibm.com>
+ * Credits: Dipankar Sarma <dipankar@in.ibm.com>.  refcount_get_rcu has
+ *	    been abstracted out from Dipankar's implementation of rcu based
+ *	    fd management.
+ * 
+ * Refcounting on elements of lists which are protected by traditional
+ * reader/writer spinlocks or semaphores is straightforward, as in:
+ * 
+ * 1.					2.
+ * add()				search_and_reference()
+ * {					{
+ * 	alloc_object				read_lock(&list_lock);
+ *	...					search_for_element
+ *	refcount_init(&el->rc)			refcount_get(&el->rc);	
+ *	write_lock(&list_lock);			...
+ *	add_element				read_unlock(&list_lock);
+ *	...					...
+ *	write_unlock(&list_lock);	}
+ * }					
+ *
+ * 3.					4.
+ * release_referenced()			delete()
+ * {					{
+ *	...				write_lock(&list_lock);
+ *	if (refcount_put(&el->rc))	...
+ *		start_cleanup_object	...
+ *		free_object		delete_element
+ *	...				write_unlock(&list_lock);
+ * }					...
+ *					if (refcount_put(&el->rc))
+ *						start_cleanup_object
+ *						free_object
+ *					}
+ *
+ * If this list/array is made lock-free using RCU, i.e. by changing the
+ * write_lock in add() and delete() to spin_lock and changing the read_lock
+ * in search_and_reference() to rcu_read_lock(), the refcount_get in
+ * search_and_reference() could potentially take a reference to an element
+ * which has already been deleted from the list/array.  refcount_get_rcu
+ * takes care of this scenario.  search_and_reference should look as:
+ * 2. 
+ * search_and_reference()
+ * {
+ *	rcu_read_lock();
+ *	search_for_element
+ *	if (!refcount_get_rcu(&el->rc)) {
+ *		rcu_read_unlock();
+ *		return FAIL;
+ *	}
+ *	...
+ *	...
+ *	rcu_read_unlock();
+ * }
+ * 
+ * Of course, free_object after refcount_put should be batched using call_rcu.
+ * Sometimes, a reference to the element needs to be obtained in the
+ * update (write) stream.  In such cases, refcount_get_rcu might be overkill
+ * since the spinlock serialising list updates is held; refcount_get
+ * is to be used in such cases.
+ * Note: Even though these refcount primitives can be used for lists which
+ * are protected by traditional locking, if the arch doesn't have cmpxchg,
+ * there might be extra penalties in refcount_get.
+ *
+ */
+
+#ifndef _LINUX_REFCOUNT_H
+#define _LINUX_REFCOUNT_H
+#ifdef  __KERNEL__
+
+#include <asm/atomic.h>
+#include <asm/system.h>
+
+typedef struct { atomic_t count; } refcount_t;
+
+/* 
+ * Refcount primitives for lists/array elements protected by traditional
+ * locking
+ */
+
+#ifdef __HAVE_ARCH_CMPXCHG
+
+static inline void refcount_init(refcount_t *rc)	
+{
+	atomic_set(&rc->count, 1);
+}
+
+static inline int refcount_put(refcount_t *rc)
+{
+	return atomic_dec_and_test(&rc->count);
+}
+
+static inline void refcount_dec(refcount_t *rc)
+{
+	atomic_dec(&rc->count);
+}
+
+static inline int refcount_put_and_lock(refcount_t *rc, spinlock_t *lock)
+{
+	return atomic_dec_and_lock(&rc->count, lock);
+}
+
+static inline int refcount_read(refcount_t *rc)
+{
+	return atomic_read(&rc->count);
+}
+
+
+static inline void refcount_get(refcount_t *rc)
+{
+	atomic_inc(&rc->count);
+}
+
+/* 
+ * Refcount primitives for lists/array elements protected by rcu
+ */
+
+/* 
+ * cmpxchg is needed on UP too, if deletions to the list/array can happen
+ * in interrupt context.
+ */
+static inline int refcount_get_rcu(refcount_t *rc)
+{
+	int c, old;
+	c = atomic_read(&rc->count);
+	while (c && (old = cmpxchg(&rc->count.counter, c, c+1)) != c)
+		c = old;
+	return c;
+}
+
+#else /* !__HAVE_ARCH_CMPXCHG */
+
+/* 
+ * We use an array of spinlocks for the refcounts -- similar to the ones
+ * in sparc's 32-bit atomic_t implementation, with a similar hash
+ * function, for our refcounting needs.
+ * Can't help multiprocessors which do not have cmpxchg :(
+ */
+
+#ifdef	CONFIG_SMP
+#define REFCOUNT_HASH_SIZE	4
+#define REFCOUNT_HASH(r) \
+	(&__refcount_hash[(((unsigned long)r)>>8) & (REFCOUNT_HASH_SIZE-1)])
+#else
+#define	REFCOUNT_HASH_SIZE	1
+#define REFCOUNT_HASH(r) 	&__refcount_hash[0]
+#endif /* CONFIG_SMP */
+
+extern spinlock_t __refcount_hash[REFCOUNT_HASH_SIZE];
+
+static inline void refcount_init(refcount_t *rc)	
+{
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter = 1;
+	spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+}
+
+static inline int refcount_put(refcount_t *rc)
+{
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter--;
+	if (!rc->count.counter) {
+		spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+		return 1;
+	} else {
+		spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+		return 0;
+	}
+}
+
+static inline void refcount_dec(refcount_t *rc)
+{
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter--;
+	spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+}
+
+static inline int refcount_put_and_lock(refcount_t *rc, spinlock_t *lock)
+{
+	unsigned long flags;
+	spin_lock(lock);
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter--;
+	if (!rc->count.counter) {
+		spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+		return 1;
+	} else {
+		spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+		spin_unlock(lock);
+		return 0;
+	}
+}
+/* Spinlocks are not needed in this case, if atomic_read is used */
+static inline int refcount_read(refcount_t *rc)
+{
+	return atomic_read(&rc->count);
+}
+
+static inline void refcount_get(refcount_t *rc)
+{
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	rc->count.counter++;
+	spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+}
+
+static inline int refcount_get_rcu(refcount_t *rc)
+{
+	int ret;
+	unsigned long flags;
+	spin_lock_irqsave(REFCOUNT_HASH(rc), flags);
+	if (rc->count.counter)
+		ret = rc->count.counter++;
+	else
+		ret = 0;
+	spin_unlock_irqrestore(REFCOUNT_HASH(rc), flags);
+	return ret;
+}
+
+#endif /* __HAVE_ARCH_CMPXCHG */
+
+#endif
+#endif /* _LINUX_REFCOUNT_H */
diff -ruN -X dontdiff linux-2.6.7/kernel/rcupdate.c files_struct-2.6.7/kernel/rcupdate.c
--- linux-2.6.7/kernel/rcupdate.c	2004-06-16 10:48:38.000000000 +0530
+++ files_struct-2.6.7/kernel/rcupdate.c	2004-07-15 10:56:25.000000000 +0530
@@ -44,6 +44,7 @@
 #include <linux/notifier.h>
 #include <linux/rcupdate.h>
 #include <linux/cpu.h>
+#include <linux/refcount.h>
 
 /* Definition for rcupdate control block. */
 struct rcu_ctrlblk rcu_ctrlblk = 
@@ -325,6 +326,17 @@
 	wait_for_completion(&completion);
 }
 
+/*
+ * spinlock_t array for arches which do not have cmpxchg.  This is for
+ * RCU-based refcounting.
+ */
+
+#ifndef __HAVE_ARCH_CMPXCHG
+spinlock_t __refcount_hash[REFCOUNT_HASH_SIZE] = {
+	[0 ... (REFCOUNT_HASH_SIZE-1)] = SPIN_LOCK_UNLOCKED
+};
+EXPORT_SYMBOL(__refcount_hash);
+#endif
 
 EXPORT_SYMBOL(call_rcu);
 EXPORT_SYMBOL(synchronize_kernel);


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-15  6:21       ` Ravikiran G Thirumalai
@ 2004-07-15  6:56         ` Dipankar Sarma
  0 siblings, 0 replies; 18+ messages in thread
From: Dipankar Sarma @ 2004-07-15  6:56 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Greg KH, linux-kernel, oleg

On Thu, Jul 15, 2004 at 11:51:04AM +0530, Ravikiran G Thirumalai wrote:
> Now it would have been nice if we did not have to use cmpxchg 
> for refcount_get_rcu, or at least if all arches implemented cmpxchg in
> hardware.  Since neither is true, for arches with no cmpxchg we use
> the hashed spinlock method.  When we use the hashed spinlock for the
> refcount_get_rcu method, we need to use the same spinlock to protect the
> refcount for the usual refcount_gets too.  So no atomic_inc for
> refcount_gets on arches with no cmpxchg.  
> Now, one might ask, why refcount_get when you are already using
> refcount_get_rcu for a refcount?  With the fd patch, there are places where
> the refcount_get happens with traditional serialisation, and places where
> refcount_get happens lockfree.  So it makes sense for performance's sake to
> have a simple atomic_inc (refcount_get) instead of the whole cmpxchg shebang
> for the same refcounter when the traditional lock is held.  Hence the
> refcount_get infrastructure.  The whole refcount infrastructure 
> provided is _not_ meant for traditional refcounting, which kref 
> accomplishes.  If it is used for traditional refcounting (i.e. refcount_get
> on a refcounter is used without any refcount_get_rcus for the same
> counter), arches with no cmpxchg will end up using hashed spinlocks for
> simple refcount_gets :).  A note was included in the header file as
> a warning:

Would this work -

kref_get - just atomic inc
kref_put - just atomic dec
kref_get_rcu - cmpxchg if arch has or hashed spinlock
kref_put_rcu - dec if has cmpxchg else hashed spinlock

If the object has lock-free look-up, it must use kref_get_rcu.
You can use kref_get on such an object if the look-up is being
done with lock. Is there a situation that is not covered by this ?

Thanks
Dipankar


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-14 17:08     ` Greg KH
  2004-07-14 18:17       ` Dipankar Sarma
@ 2004-07-15  8:02       ` Ravikiran G Thirumalai
  2004-07-15  9:36         ` Dipankar Sarma
  2004-07-16 14:32         ` Greg KH
  1 sibling, 2 replies; 18+ messages in thread
From: Ravikiran G Thirumalai @ 2004-07-15  8:02 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, dipankar

On Wed, Jul 14, 2004 at 10:08:00AM -0700, Greg KH wrote:
> > ...
> > Close, but not the same.  I just had a quick look at krefs.
> > Actually, this reference count infrastructure I am proposing is not for 
> > traditional refcounting.
> 
> But you are advertising it as such by calling it a refcount_t and
> putting it in a file called refcount.h.

The naming is bad, I agree.  But as Dipankar pointed out earlier, there
was no kref when I did this.  We (Dipankar and myself) had a discussion
and decided:
1. I will make a patch to shrink kref and feed it to Greg
2. Add a new set of kref APIs for lockfree refcounting --
	kref_lf_xxx  (kref_lf_get, kref_lf_get_rcu, etc.)
3. Change the fd lookup patch to use the kref_lf_xxx API

Does that sound ok?

Thanks,
Kiran


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-15  8:02       ` Ravikiran G Thirumalai
@ 2004-07-15  9:36         ` Dipankar Sarma
  2004-07-16 14:32         ` Greg KH
  1 sibling, 0 replies; 18+ messages in thread
From: Dipankar Sarma @ 2004-07-15  9:36 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Greg KH, linux-kernel

On Thu, Jul 15, 2004 at 01:32:04PM +0530, Ravikiran G Thirumalai wrote:
> The naming is bad, I agree.  But as Dipankar pointed out earlier, there
> was no kref when I did this.  We (Dipankar and myself) had a discussion
> and decided:
> 1. I will make a patch to shrink kref and feed it to Greg

1.5 Convert files_struct to use kref


> 2. Add new set kref api for lockfree refcounting --
> 	kref_lf_xxx.  (kref_lf_get, kref_lf_get_rcu etc.,)
> 3. Change the fd lookup patch to use kref_lf_xxx api

Thanks
Dipankar


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-15  8:02       ` Ravikiran G Thirumalai
  2004-07-15  9:36         ` Dipankar Sarma
@ 2004-07-16 14:32         ` Greg KH
  2004-07-16 15:50           ` Ravikiran G Thirumalai
  1 sibling, 1 reply; 18+ messages in thread
From: Greg KH @ 2004-07-16 14:32 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: linux-kernel, dipankar

On Thu, Jul 15, 2004 at 01:32:04PM +0530, Ravikiran G Thirumalai wrote:
> The naming is bad, I agree. But as Dipankar pointed out earlier, there
> was no kref when I did this.

No offence meant, but lib/kref.c has been in the kernel tree since
2.6.5-rc1, which was released 4 months ago.  Did refcount.h take 4 months
to get released?  :)

> We (Dipankar and myslef) had a discussion
> and decided:
> 1. I will make a patch to shrink kref and feed it to Greg
> 2. Add new set kref api for lockfree refcounting --
> 	kref_lf_xxx.  (kref_lf_get, kref_lf_get_rcu etc.,)

kref_*_rcu() as Dipankar noted is much nicer.

thanks,

greg k-h


* Re: [RFC] Refcounting of objects part of a lockfree collection
  2004-07-16 14:32         ` Greg KH
@ 2004-07-16 15:50           ` Ravikiran G Thirumalai
  0 siblings, 0 replies; 18+ messages in thread
From: Ravikiran G Thirumalai @ 2004-07-16 15:50 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, dipankar

On Fri, Jul 16, 2004 at 07:32:35AM -0700, Greg KH wrote:
>... 
> > We (Dipankar and myself) had a discussion
> > and decided:
> > 1. I will make a patch to shrink kref and feed it to Greg
> > 2. Add new set kref api for lockfree refcounting --
> > 	kref_lf_xxx.  (kref_lf_get, kref_lf_get_rcu etc.,)
> 
> kref_*_rcu() as Dipankar noted is much nicer.

Do you mean what Dipankar had noted, as in:

<Dipankar's earlier post>
> Would this work -
> 
> kref_get - just atomic inc
> kref_put - just atomic dec
> kref_get_rcu - cmpxchg if arch has or hashed spinlock
> kref_put_rcu - dec if has cmpxchg else hashed spinlock

> If the object has lock-free look-up, it must use kref_get_rcu.
> You can use kref_get on such an object if the look-up is being
> done with lock. Is there a situation that is not covered by this ?

</>

If that is what you want, the API cannot work well.
The problem with this case is that I cannot use the hashed spinlock _just_
for kref_get_rcu as Dipankar mentions above.  If I do that, the atomic_incs
in kref_get will race with the 'hashed spinlock' kref_get_rcu when 
both have to be used on the same refcounter (I mentioned this in an 
earlier post).  So I have to take the same hashed spinlock for kref_gets as 
well to maintain correctness -- obviously not a good thing for arches 
with no cmpxchg which use krefs for non-lock-free purposes only.  
Hence, the proposal now is to add kref_lf_get, kref_lf_put, and
kref_lf_get_rcu to the kref API.  kref_lf_xxx is to be used when the kref
refcounter is used lock free in at least one place -- that is, when
kref_lf_get_rcu is used at least once on the struct kref.  Does that sound ok?

Thanks,
Kiran


end of thread, other threads:[~2004-07-16 15:51 UTC | newest]

Thread overview: 18+ messages
2004-07-14 10:24 [RFC] Refcounting of objects part of a lockfree collection Oleg Nesterov
  -- strict thread matches above, loose matches on Subject: below --
2004-07-14  4:53 Ravikiran G Thirumalai
2004-07-14  7:07 ` Greg KH
2004-07-14  8:26   ` Dipankar Sarma
2004-07-14 14:26     ` Greg KH
2004-07-14 15:22       ` Dipankar Sarma
2004-07-14 17:03         ` Greg KH
2004-07-14 17:49           ` Dipankar Sarma
2004-07-14 18:03             ` Greg KH
2004-07-15  6:21       ` Ravikiran G Thirumalai
2004-07-15  6:56         ` Dipankar Sarma
2004-07-14  8:57   ` Ravikiran G Thirumalai
2004-07-14 17:08     ` Greg KH
2004-07-14 18:17       ` Dipankar Sarma
2004-07-15  8:02       ` Ravikiran G Thirumalai
2004-07-15  9:36         ` Dipankar Sarma
2004-07-16 14:32         ` Greg KH
2004-07-16 15:50           ` Ravikiran G Thirumalai
