public inbox for linux-kernel@vger.kernel.org
* [PATCH V4 1/1] rcu: introduce kfree_rcu()
@ 2011-03-15  9:46 Lai Jiangshan
  2011-03-15 10:15 ` Arnd Bergmann
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Lai Jiangshan @ 2011-03-15  9:46 UTC (permalink / raw)
  To: Ingo Molnar, Paul E. McKenney, LKML, Manfred Spraul

kfree_rcu(), which Lai originally proposed 2.5 years ago, is one of
the most important entries on the RCU TODO list. Lai and Manfred have both
worked on patches for it. This V4 patch is based on Manfred's patch and
V1 of Lai's patch. (The two patches are almost identical in
implementation; this one is based mainly on Manfred's.)

Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
Manfred's patch: http://lkml.org/lkml/2009/1/2/115
RCU TODO list: http://www.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html

The newly introduced kfree_rcu() primitive kfree()s the specified memory
after an RCU grace period elapses.

It replaces the many simple "call_rcu(head, simple_kfree_callback)" calls
whose simple_kfree_callback() does nothing more than

	kfree(container_of(head, struct whatever_struct, rcu_member));

These simple_kfree_callback() instances are just duplicated code; we need
a generic function for them.

kfree_rcu() also helps unloadable modules: it does not queue any
function that belongs to the module, so rcu_barrier() can be avoided
at module exit. (If the module queues any other function with call_rcu(),
rcu_barrier() is still needed.)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
 include/linux/rcupdate.h |   40 ++++++++++++++++++++++++++++++++++++++++
 kernel/rcutiny.c         |    2 +-
 kernel/rcutree.c         |    2 +-
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 7d62909..18f7ade 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -777,4 +777,44 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
 }
 #endif	/* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
 
+static __always_inline bool __is_kfree_rcu_offset(unsigned long offset)
+{
+	return offset < 4096;
+}
+
+static __always_inline
+void __kfree_rcu(struct rcu_head *head, unsigned long offset)
+{
+	typedef void (*rcu_callback)(struct rcu_head *);
+
+	BUILD_BUG_ON(!__builtin_constant_p(offset));
+	BUILD_BUG_ON(!__is_kfree_rcu_offset(offset));
+
+	call_rcu(head, (rcu_callback)offset);
+}
+
+extern void kfree(const void *);
+
+static inline void __rcu_reclaim(struct rcu_head *head)
+{
+	unsigned long offset = (unsigned long)head->func;
+
+	if (__is_kfree_rcu_offset(offset))
+		kfree((void *)head - offset);
+	else
+		head->func(head);
+}
+
+/**
+ * kfree_rcu() - kfree an object after a grace period.
+ * @ptr:	pointer to kfree
+ * @rcu_head:	the name of the struct rcu_head within the type of @ptr.
+ *
+ * Many rcu callbacks just call kfree() on the base structure. This helper
+ * function calls kfree internally. The rcu_head structure must be embedded
+ * in the to be freed structure.
+ */
+#define kfree_rcu(ptr, rcu_head)					\
+	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
+
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 0c343b9..4d60fbc 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -167,7 +167,7 @@ static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
 		prefetch(next);
 		debug_rcu_head_unqueue(list);
 		local_bh_disable();
-		list->func(list);
+		__rcu_reclaim(list);
 		local_bh_enable();
 		list = next;
 		RCU_TRACE(cb_count++);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index dd4aea8..b3c1aed 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1143,7 +1143,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 		next = list->next;
 		prefetch(next);
 		debug_rcu_head_unqueue(list);
-		list->func(list);
+		__rcu_reclaim(list);
 		list = next;
 		if (++count >= rdp->blimit)
 			break;
-- 
1.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread
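
The offset-in-the-function-pointer trick that the patch relies on can be tried out in user space. The following is a minimal sketch, not the kernel code: free() stands in for kfree(), and struct item, encode_kfree(), reclaim() and demo_offset_reclaim() are hypothetical names introduced only for illustration. It also assumes, as the patch does, that no real function lives in the first 4096 bytes of the address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's struct rcu_head (illustration only). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

typedef void (*rcu_callback_t)(struct rcu_head *head);

#define KFREE_OFFSET_LIMIT 4096UL	/* same threshold as the patch */

static int freed_by_offset;	/* counters so the demo can be checked */
static int called_back;

static int is_kfree_offset(unsigned long offset)
{
	return offset < KFREE_OFFSET_LIMIT;
}

/* Store the offset in the slot that normally holds a function pointer. */
static void encode_kfree(struct rcu_head *head, unsigned long offset)
{
	assert(is_kfree_offset(offset));
	head->func = (rcu_callback_t)(uintptr_t)offset;
}

/* Mirrors __rcu_reclaim(): small values are offsets, the rest are callbacks. */
static void reclaim(struct rcu_head *head)
{
	unsigned long offset = (unsigned long)(uintptr_t)head->func;

	if (is_kfree_offset(offset)) {
		free((char *)head - offset);	/* free() stands in for kfree() */
		freed_by_offset++;
	} else {
		head->func(head);
		called_back++;
	}
}

/* Hypothetical RCU-managed object with an embedded rcu_head. */
struct item {
	int value;
	struct rcu_head rcu;
};

/* The kind of one-line callback that kfree_rcu() is meant to replace. */
static void item_callback(struct rcu_head *head)
{
	free((char *)head - offsetof(struct item, rcu));
}

/* Reclaims one object through each path; returns 11 on the first call. */
static int demo_offset_reclaim(void)
{
	struct item *a = malloc(sizeof(*a));
	struct item *b = malloc(sizeof(*b));

	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}
	encode_kfree(&a->rcu, offsetof(struct item, rcu));
	b->rcu.func = item_callback;
	reclaim(&a->rcu);
	reclaim(&b->rcu);
	return freed_by_offset * 10 + called_back;
}
```

Converting an integer to a function pointer is implementation-defined in standard C, which is one reason this is only a sketch of the idea rather than portable code.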

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15  9:46 [PATCH V4 1/1] rcu: introduce kfree_rcu() Lai Jiangshan
@ 2011-03-15 10:15 ` Arnd Bergmann
  2011-03-15 11:27   ` Paul E. McKenney
  2011-03-16  2:23   ` [PATCH V4 " Lai Jiangshan
  2011-03-15 11:30 ` Paul E. McKenney
  2011-03-15 13:11 ` Eric Dumazet
  2 siblings, 2 replies; 19+ messages in thread
From: Arnd Bergmann @ 2011-03-15 10:15 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Ingo Molnar, Paul E. McKenney, LKML, Manfred Spraul

On Tuesday 15 March 2011 10:46:20 Lai Jiangshan wrote:
> +static __always_inline bool __is_kfree_rcu_offset(unsigned long offset)
> +{
> +       return offset < 4096;
> +}

So this relies on the assumptions that 

a) the rcu_head is within the first 4 KB of the data structure to be freed
b) no callback ever gets called in the first 4 KB of virtual address space

It's probably a reasonable assumption, but I think it should be documented
more explicitly, especially the first one. It's entirely possible that
an RCU managed data structure is larger than 4 KB.
Another alternative might be to encode the difference between a
function pointer and an offset in one of the lower bits of the address.

	Arnd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 10:15 ` Arnd Bergmann
@ 2011-03-15 11:27   ` Paul E. McKenney
  2011-03-15 12:02     ` Arnd Bergmann
  2011-03-16  2:23   ` [PATCH V4 " Lai Jiangshan
  1 sibling, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2011-03-15 11:27 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Lai Jiangshan, Ingo Molnar, LKML, Manfred Spraul

On Tue, Mar 15, 2011 at 11:15:54AM +0100, Arnd Bergmann wrote:
> On Tuesday 15 March 2011 10:46:20 Lai Jiangshan wrote:
> > +static __always_inline bool __is_kfree_rcu_offset(unsigned long offset)
> > +{
> > +       return offset < 4096;
> > +}
> 
> So this relies on the assumptions that 
> 
> a) the rcu_head is within the first 4 KB of the data structure to be freed
> b) no callback ever gets called in the first 4 KB of virtual address space
> 
> It's probably a reasonable assumption, but I think it should be documented
> more explicitly, especially the first one. It's entirely possible that
> an RCU managed data structure is larger than 4 KB.

Good catch, this does indeed need to be explicitly documented, for
example in the docbook header.

> Another alternative might be to encode the difference between a
> function pointer and an offset in one of the lower bits of the address.

We discussed this some time back, and it turned out that there were
CPUs that could legitimately have any combination of low-order bits
set -- functions could start at any byte address.

If this has changed, I would prefer to use the low-order bits, but
if it has not, we can't.  :-(

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15  9:46 [PATCH V4 1/1] rcu: introduce kfree_rcu() Lai Jiangshan
  2011-03-15 10:15 ` Arnd Bergmann
@ 2011-03-15 11:30 ` Paul E. McKenney
  2011-03-16  2:50   ` Lai Jiangshan
  2011-03-15 13:11 ` Eric Dumazet
  2 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2011-03-15 11:30 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Ingo Molnar, LKML, Manfred Spraul

On Tue, Mar 15, 2011 at 05:46:20PM +0800, Lai Jiangshan wrote:
> kfree_rcu() which was original proposed by Lai 2.5 years ago is one of
> the most important RCU TODO list entries, Lai and Manfred have worked on
> patches for this. This V4 patch is based on the Manfred's patch and
> the V1 of Lai's patch. (These two patches are almost the same
> in implementation, and this patch is mainly based on the Manfred's).
> 
> Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
> Manfred's patch: http://lkml.org/lkml/2009/1/2/115
> RCU TODO list: http://www.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html
> 
> This new introduced API kfree_rcu() primitive kfree()s the specified memory
> after a RCU grace period elapses.
> 
> It replaces many simple "call_rcu(head, simple_kfree_callback)";
> These many simple_kfree_callback() instances just does
> 
> 	kfree(containerof(head,struct whatever_struct,rcu_member));
> 
> These simple_kfree_callback() instances are just duplicate code, we need
> a generic function for them.
> 
> And kfree_rcu() is also help for unloadable modules, kfree_rcu() does not
> queue any function which belong to the module, so a rcu_barrier() can
> be avoid when module exit. (If we queue any other function by call_rcu(),
> rcu_barrier() is still needed.)

Thank you for putting this together!  It does represent a nice
reduction in code size.

Once it settles out a bit, I intend to queue this patch.  It would be
best if the subsystems queue their own patches using kfree_rcu() once
this patch reaches mainline.

Seem reasonable?

							Thanx, Paul

> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> ---
>  include/linux/rcupdate.h |   40 ++++++++++++++++++++++++++++++++++++++++
>  kernel/rcutiny.c         |    2 +-
>  kernel/rcutree.c         |    2 +-
>  3 files changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 7d62909..18f7ade 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -777,4 +777,44 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
>  }
>  #endif	/* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
> 
> +static __always_inline bool __is_kfree_rcu_offset(unsigned long offset)
> +{
> +	return offset < 4096;
> +}
> +
> +static __always_inline
> +void __kfree_rcu(struct rcu_head *head, unsigned long offset)
> +{
> +	typedef void (*rcu_callback)(struct rcu_head *);
> +
> +	BUILD_BUG_ON(!__builtin_constant_p(offset));
> +	BUILD_BUG_ON(!__is_kfree_rcu_offset(offset));
> +
> +	call_rcu(head, (rcu_callback)offset);
> +}
> +
> +extern void kfree(const void *);
> +
> +static inline void __rcu_reclaim(struct rcu_head *head)
> +{
> +	unsigned long offset = (unsigned long)head->func;
> +
> +	if (__is_kfree_rcu_offset(offset))
> +		kfree((void *)head - offset);
> +	else
> +		head->func(head);
> +}
> +
> +/**
> + * kfree_rcu() - kfree an object after a grace period.
> + * @ptr:	pointer to kfree
> + * @rcu_head:	the name of the struct rcu_head within the type of @ptr.
> + *
> + * Many rcu callbacks just call kfree() on the base structure. This helper
> + * function calls kfree internally. The rcu_head structure must be embedded
> + * in the to be freed structure.
> + */
> +#define kfree_rcu(ptr, rcu_head)					\
> +	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
> +
>  #endif /* __LINUX_RCUPDATE_H */
> diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
> index 0c343b9..4d60fbc 100644
> --- a/kernel/rcutiny.c
> +++ b/kernel/rcutiny.c
> @@ -167,7 +167,7 @@ static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
>  		prefetch(next);
>  		debug_rcu_head_unqueue(list);
>  		local_bh_disable();
> -		list->func(list);
> +		__rcu_reclaim(list);
>  		local_bh_enable();
>  		list = next;
>  		RCU_TRACE(cb_count++);
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index dd4aea8..b3c1aed 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1143,7 +1143,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
>  		next = list->next;
>  		prefetch(next);
>  		debug_rcu_head_unqueue(list);
> -		list->func(list);
> +		__rcu_reclaim(list);
>  		list = next;
>  		if (++count >= rdp->blimit)
>  			break;
> -- 
> 1.7.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 11:27   ` Paul E. McKenney
@ 2011-03-15 12:02     ` Arnd Bergmann
  2011-03-15 12:19       ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Arnd Bergmann @ 2011-03-15 12:02 UTC (permalink / raw)
  To: paulmck; +Cc: Lai Jiangshan, Ingo Molnar, LKML, Manfred Spraul

On Tuesday 15 March 2011, Paul E. McKenney wrote:
> > Another alternative might be to encode the difference between a
> > function pointer and an offset in one of the lower bits of the address.
> 
> We discussed this some time back, and it turned out that there were
> CPUs that could legitimately have any combination of low-order bits
> set -- functions could start at any byte address.
> 
> If this has changed, I would prefer to use the low-order bits, but
> if it has not, we can't.  :-(

Ok, I see.

I just had another idea, which may or may not have new problems:

static inline void *kzalloc_rcu(size_t len, gfp_t flags)
{
	struct rcu_head *head = kzalloc(len + sizeof (struct rcu_head), flags);
	return head + 1;
}

void __kfree_rcu(struct rcu_head *head)
{
	kfree(head);
}

static inline void kfree_rcu(void *p)
{
	struct rcu_head *head = p - sizeof (struct rcu_head);
	call_rcu(head, __kfree_rcu);
}

The only disadvantage I can see right now is that it messes
with the alignment of the structure.

	Arnd

^ permalink raw reply	[flat|nested] 19+ messages in thread
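
The hidden-header idea above can be checked in user space. This is a rough sketch under stated assumptions: calloc() and free() stand in for kzalloc() and kfree(), no grace period is modelled, and zalloc_with_head(), head_of(), free_with_head() and demo_hidden_head() are hypothetical names. Unlike the version in the message, it returns NULL when the allocation fails instead of returning head + 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's struct rcu_head (illustration only). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Allocate len zeroed bytes with an rcu_head hidden in front of them. */
static void *zalloc_with_head(size_t len)
{
	struct rcu_head *head = calloc(1, sizeof(*head) + len);

	return head ? head + 1 : NULL;
}

/* Recover the hidden rcu_head from the pointer handed to the caller. */
static struct rcu_head *head_of(void *p)
{
	return (struct rcu_head *)((char *)p - sizeof(struct rcu_head));
}

/* Immediate free; a real version would defer this past a grace period. */
static void free_with_head(void *p)
{
	if (p)
		free(head_of(p));
}

/* Returns 1 if the payload is zeroed and sits right behind the header. */
static int demo_hidden_head(void)
{
	int *p = zalloc_with_head(4 * sizeof(int));
	ptrdiff_t shift;
	int zeroed;

	if (!p)
		return -1;
	zeroed = (p[0] == 0 && p[3] == 0);
	p[0] = 42;
	shift = (char *)p - (char *)head_of(p);
	free_with_head(p);
	return zeroed && shift == (ptrdiff_t)sizeof(struct rcu_head);
}
```

The payload always starts sizeof(struct rcu_head) bytes into the allocation, so an object that needs a stricter alignment than the header's size provides (a cache line, for example) no longer gets it from the allocator. That is the alignment concern raised in the message.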

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 12:02     ` Arnd Bergmann
@ 2011-03-15 12:19       ` Paul E. McKenney
  2011-03-15 13:07         ` Arnd Bergmann
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2011-03-15 12:19 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Lai Jiangshan, Ingo Molnar, LKML, Manfred Spraul

On Tue, Mar 15, 2011 at 01:02:09PM +0100, Arnd Bergmann wrote:
> On Tuesday 15 March 2011, Paul E. McKenney wrote:
> > > Another alternative might be to encode the difference between a
> > > function pointer and an offset in one of the lower bits of the address.
> > 
> > We discussed this some time back, and it turned out that there were
> > CPUs that could legitimately have any combination of low-order bits
> > set -- functions could start at any byte address.
> > 
> > If this has changed, I would prefer to use the low-order bits, but
> > if it has not, we can't.  :-(
> 
> Ok, I see.
> 
> I just had another idea, which may or may not have new problems:
> 
> static inline void *kzalloc_rcu(size_t len, gfp_t flags)
> {
> 	struct rcu_head *head = kzalloc(len + sizeof (struct rcu_head), flags);
> 	return head + 1;
> }
> 
> void __kfree_rcu(struct rcu_head *head)
> {
> 	kfree(head);
> }
> 
> static inline void kfree_rcu(void *p)
> {
> 	struct rcu_head *head = p - sizeof (struct rcu_head);
> 	call_rcu(head, __kfree_rcu);
> }
> 
> The only disadvantage I can see right now is that it messes
> with the alignment of the structure.

And it makes use of statically allocated structures a bit clunky.

The other approach I could imagine would be to create the RCU callback
functions on the fly at compile/link time by creating a new section into
which offsets and places for function pointers are placed.  A link-time
utility could scan the contents of the section, generate the needed
functions, compile them, and place pointers to them into the section.

One disadvantage of this approach (in addition to the changes required
to kbuild) is that it would not allow rcu_barrier() to be removed.

Yet another approach is to use the low-order bit of the rcu_head pointer,
given that the rcu_head structure does have to be aligned.  If this bit
is set, then the function pointer could be interpreted as an offset.
This approach might also allow a slab_free_rcu() to be constructed, given
that the full 32 bits of the function pointer would be available.
For example, if the upper 16 bits are zero, the low-order 16 bits are
the offset.  If the upper 16 bits are 0x1, then the low-order 16 bits
might be an index that selects the desired slab cache.

Other possible approaches?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread
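
One possible reading of the low-order-bit scheme described above, written as a user-space sketch. Everything here is an assumption made for illustration: the names (tag_head(), decode_queued(), encode_kfree_word(), encode_slab_word()), the choice of bit 0 as the tag, and the exact layout of the 32-bit word. The message only outlines the idea and does not fix these details.

```c
#include <assert.h>
#include <stdint.h>

enum reclaim_kind {
	RECLAIM_CALLBACK,	/* untagged: the word is a real function pointer */
	RECLAIM_KFREE,		/* upper 16 bits == 0: low 16 bits are an offset */
	RECLAIM_SLAB,		/* upper 16 bits == 1: low 16 bits pick a slab cache */
	RECLAIM_INVALID
};

struct decoded {
	enum reclaim_kind kind;
	uintptr_t head;		/* rcu_head address with the tag bit cleared */
	uint16_t arg;		/* offset or slab-cache index */
};

/* Mark a queued rcu_head address as "the function slot is not a function". */
static uintptr_t tag_head(uintptr_t head_addr)
{
	assert((head_addr & 1u) == 0);	/* only works if rcu_head is aligned */
	return head_addr | 1u;
}

static uint32_t encode_kfree_word(uint16_t offset)
{
	return offset;
}

static uint32_t encode_slab_word(uint16_t cache_index)
{
	return (UINT32_C(1) << 16) | cache_index;
}

/* Work out what to do with one queued entry. */
static struct decoded decode_queued(uintptr_t queued, uint32_t word)
{
	struct decoded d = { RECLAIM_CALLBACK, queued & ~(uintptr_t)1, 0 };

	if (!(queued & 1u))
		return d;	/* ordinary call_rcu() callback */

	d.arg = (uint16_t)(word & 0xffff);
	switch (word >> 16) {
	case 0:
		d.kind = RECLAIM_KFREE;
		break;
	case 1:
		d.kind = RECLAIM_SLAB;
		break;
	default:
		d.kind = RECLAIM_INVALID;
		break;
	}
	return d;
}
```

The assert() in tag_head() is the weak point: the scheme is only safe if every rcu_head handed to it is at least 2-byte aligned.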

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 12:19       ` Paul E. McKenney
@ 2011-03-15 13:07         ` Arnd Bergmann
  2011-03-16  2:58           ` Lai Jiangshan
  2011-03-16  4:02           ` Paul E. McKenney
  0 siblings, 2 replies; 19+ messages in thread
From: Arnd Bergmann @ 2011-03-15 13:07 UTC (permalink / raw)
  To: paulmck; +Cc: Lai Jiangshan, Ingo Molnar, LKML, Manfred Spraul

On Tuesday 15 March 2011, Paul E. McKenney wrote:
> And it makes use of statically allocated structures a bit clunky.

How do statically allocated structures relate to this? I would
expect that you never call kfree_rcu on them, so it shouldn't
matter.

> Yet another approach is to use the low-order bit of the rcu_head pointer,
> given that the rcu_head structure does have to be aligned.  If this bit
> is set, then the function pointer could be interpreted as an offset.
> This approach might also allow a slab_free_rcu() to be constructed, given
> that the full 32 bits of the function pointer would be available.
> For example, if the upper 16 bits are zero, the low-order 16 bits are
> the offset.  If the upper 16 bits are 0x1, then the low-order 16 bits
> might be an index that selects the desired slab cache.

This solution sounds like a clear improvement over the patch that Lai
Jiangshan posted, without any downsides.

	Arnd

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15  9:46 [PATCH V4 1/1] rcu: introduce kfree_rcu() Lai Jiangshan
  2011-03-15 10:15 ` Arnd Bergmann
  2011-03-15 11:30 ` Paul E. McKenney
@ 2011-03-15 13:11 ` Eric Dumazet
  2011-03-16  4:03   ` Paul E. McKenney
  2 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2011-03-15 13:11 UTC (permalink / raw)
  To: Lai Jiangshan, Paul E. McKenney; +Cc: Ingo Molnar, LKML, Manfred Spraul

On Tuesday, 15 March 2011 at 17:46 +0800, Lai Jiangshan wrote:


> --- a/kernel/rcutiny.c
> +++ b/kernel/rcutiny.c
> @@ -167,7 +167,7 @@ static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
>  		prefetch(next);
>  		debug_rcu_head_unqueue(list);
>  		local_bh_disable();
> -		list->func(list);
> +		__rcu_reclaim(list);
>  		local_bh_enable();
>  		list = next;
>  		RCU_TRACE(cb_count++);

Paul, I am just wondering why we disable BH before calling list->func().

Shouldn't this be done only in the callbacks that really need it?

At least the disable/enable pair is not necessary before calling kfree().




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 10:15 ` Arnd Bergmann
  2011-03-15 11:27   ` Paul E. McKenney
@ 2011-03-16  2:23   ` Lai Jiangshan
  1 sibling, 0 replies; 19+ messages in thread
From: Lai Jiangshan @ 2011-03-16  2:23 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Ingo Molnar, Paul E. McKenney, LKML, Manfred Spraul

On 03/15/2011 06:15 PM, Arnd Bergmann wrote:
> On Tuesday 15 March 2011 10:46:20 Lai Jiangshan wrote:
>> +static __always_inline bool __is_kfree_rcu_offset(unsigned long offset)
>> +{
>> +       return offset < 4096;
>> +}
> 
> So this relies on the assumptions that 
> 
> a) the rcu_head is within the first 4 KB of the data structure to be freed
> b) no callback ever gets called in the first 4 KB of virtual address space
> 
> It's probably a reasonable assumption, but I think it should be documented
> more explicitly, especially the first one. It's entirely possible that
> an RCU managed data structure is larger than 4 KB.

The first one is neither a problem nor an assumption: if the rcu_head offset is
4096 or larger, the BUILD_BUG_ON() triggers, and the user can fall back to the
original call_rcu() instead.

b) is not a problem either: the TEXT section is not in the first 4 KB of the virtual address space.

> Another alternative might be to encode the difference between a
> function pointer and an offset in one of the lower bits of the address.
> 
> 	Arnd
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread
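
The compile-time check mentioned in this reply can be imitated outside the kernel with C11 static_assert. This is a sketch only: struct small_obj, struct big_obj and the OFFSET_FITS() macro are hypothetical, and static_assert stands in for the kernel's BUILD_BUG_ON().

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof */

#define KFREE_OFFSET_LIMIT 4096	/* same threshold as the patch */

/* User-space stand-in for the kernel's struct rcu_head (illustration only). */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* rcu_head near the start of the object: the offset fits. */
struct small_obj {
	int id;
	struct rcu_head rcu;
};

/* rcu_head behind an 8 KB payload: the offset does not fit. */
struct big_obj {
	char payload[8192];
	struct rcu_head rcu;
};

#define OFFSET_FITS(type, member) \
	(offsetof(type, member) < KFREE_OFFSET_LIMIT)

/* Compiles: the offset is a small compile-time constant. */
static_assert(OFFSET_FITS(struct small_obj, rcu),
	      "rcu_head offset must be below 4096");

/*
 * The same assertion for struct big_obj would stop the build, which is
 * the point at which a caller would go back to plain call_rcu():
 *
 * static_assert(OFFSET_FITS(struct big_obj, rcu),
 *		 "rcu_head offset must be below 4096");
 */

static int small_fits(void)
{
	return OFFSET_FITS(struct small_obj, rcu);
}

static int big_fits(void)
{
	return OFFSET_FITS(struct big_obj, rcu);
}
```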

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 11:30 ` Paul E. McKenney
@ 2011-03-16  2:50   ` Lai Jiangshan
  2011-03-16  4:29     ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Lai Jiangshan @ 2011-03-16  2:50 UTC (permalink / raw)
  To: paulmck; +Cc: Ingo Molnar, LKML, Manfred Spraul

On 03/15/2011 07:30 PM, Paul E. McKenney wrote:
> On Tue, Mar 15, 2011 at 05:46:20PM +0800, Lai Jiangshan wrote:
>> kfree_rcu() which was original proposed by Lai 2.5 years ago is one of
>> the most important RCU TODO list entries, Lai and Manfred have worked on
>> patches for this. This V4 patch is based on the Manfred's patch and
>> the V1 of Lai's patch. (These two patches are almost the same
>> in implementation, and this patch is mainly based on the Manfred's).
>>
>> Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
>> Manfred's patch: http://lkml.org/lkml/2009/1/2/115
>> RCU TODO list: http://www.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html
>>
>> This new introduced API kfree_rcu() primitive kfree()s the specified memory
>> after a RCU grace period elapses.
>>
>> It replaces many simple "call_rcu(head, simple_kfree_callback)";
>> These many simple_kfree_callback() instances just does
>>
>> 	kfree(containerof(head,struct whatever_struct,rcu_member));
>>
>> These simple_kfree_callback() instances are just duplicate code, we need
>> a generic function for them.
>>
>> And kfree_rcu() is also help for unloadable modules, kfree_rcu() does not
>> queue any function which belong to the module, so a rcu_barrier() can
>> be avoid when module exit. (If we queue any other function by call_rcu(),
>> rcu_barrier() is still needed.)
> 
> Thank you for putting this together!  It does represent a nice
> reduction in code size.
> 
> Once it settles out a bit, I intend to queue this patch.  It would be
> best if the subsystems queue their own patches using kfree_rcu() once
> this patch reaches mainline.
> 

It seems that the subsystem maintainers just Ack the patches.
I hope Ingo will queue the Acked patches that use kfree_rcu() into -tip;
that will help kfree_rcu() reach mainline earlier.

Thanks,
Lai

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 13:07         ` Arnd Bergmann
@ 2011-03-16  2:58           ` Lai Jiangshan
  2011-03-16  4:38             ` Paul E. McKenney
  2011-03-16  4:02           ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Lai Jiangshan @ 2011-03-16  2:58 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: paulmck, Ingo Molnar, LKML, Manfred Spraul

On 03/15/2011 09:07 PM, Arnd Bergmann wrote:
> On Tuesday 15 March 2011, Paul E. McKenney wrote:
>> And it makes use of statically allocated structures a bit clunky.
> 
> How do statically allocated structures relate to this? I would
> expect that you never call kfree_rcu on them, so it shouldn't
> matter.
> 
>> Yet another approach is to use the low-order bit of the rcu_head pointer,
>> given that the rcu_head structure does have to be aligned.  If this bit
>> is set, then the function pointer could be interpreted as an offset.
>> This approach might also allow a slab_free_rcu() to be constructed, given
>> that the full 32 bits of the function pointer would be available.
>> For example, if the upper 16 bits are zero, the low-order 16 bits are
>> the offset.  If the upper 16 bits are 0x1, then the low-order 16 bits
>> might be an index that selects the desired slab cache.
> 
> This solution sounds like a clear improvement over the patch that Lai
> Jiangshan posted, without any downsides.
> 

This solution is good, but it changes too much code. I think we should switch to
it only if my posted solution turns out not to work in some really bad
situation.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 13:07         ` Arnd Bergmann
  2011-03-16  2:58           ` Lai Jiangshan
@ 2011-03-16  4:02           ` Paul E. McKenney
  2011-03-18  3:15             ` [PATCH V5 " Lai Jiangshan
  1 sibling, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2011-03-16  4:02 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Lai Jiangshan, Ingo Molnar, LKML, Manfred Spraul

On Tue, Mar 15, 2011 at 02:07:24PM +0100, Arnd Bergmann wrote:
> On Tuesday 15 March 2011, Paul E. McKenney wrote:
> > And it makes use of statically allocated structures a bit clunky.
> 
> How do statically allocated structures relate to this? I would
> expect that you never call kfree_rcu on them, so it shouldn't
> matter.
> 
> > Yet another approach is to use the low-order bit of the rcu_head pointer,
> > given that the rcu_head structure does have to be aligned.  If this bit
> > is set, then the function pointer could be interpreted as an offset.
> > This approach might also allow a slab_free_rcu() to be constructed, given
> > that the full 32 bits of the function pointer would be available.
> > For example, if the upper 16 bits are zero, the low-order 16 bits are
> > the offset.  If the upper 16 bits are 0x1, then the low-order 16 bits
> > might be an index that selects the desired slab cache.
> 
> This solution sounds like a clear improvement over the patch that Lai
> Jiangshan posted, without any downsides.

Except that I was forgetting that we don't really have any way to stop
people from handing us misaligned rcu_head structures -- that topic came
up last time as well.  Or were the people mentioning that possibility
being overly paranoid?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-15 13:11 ` Eric Dumazet
@ 2011-03-16  4:03   ` Paul E. McKenney
  2011-03-17  9:28     ` Lai Jiangshan
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2011-03-16  4:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Lai Jiangshan, Ingo Molnar, LKML, Manfred Spraul

On Tue, Mar 15, 2011 at 02:11:33PM +0100, Eric Dumazet wrote:
> On Tuesday, 15 March 2011 at 17:46 +0800, Lai Jiangshan wrote:
> 
> 
> > --- a/kernel/rcutiny.c
> > +++ b/kernel/rcutiny.c
> > @@ -167,7 +167,7 @@ static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
> >  		prefetch(next);
> >  		debug_rcu_head_unqueue(list);
> >  		local_bh_disable();
> > -		list->func(list);
> > +		__rcu_reclaim(list);
> >  		local_bh_enable();
> >  		list = next;
> >  		RCU_TRACE(cb_count++);
> 
> Paul, I am just wondering why we disable BH before calling list->func()
> 
> This should be done in callbacks that really need it ?
> 
> At least the disable/enable pair is not necessary before calling kfree()

Good point, we could bury the enable/disable pair in __rcu_reclaim().

Lai, am I forgetting any reason why we disable BH?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-16  2:50   ` Lai Jiangshan
@ 2011-03-16  4:29     ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2011-03-16  4:29 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Ingo Molnar, LKML, Manfred Spraul

On Wed, Mar 16, 2011 at 10:50:32AM +0800, Lai Jiangshan wrote:
> On 03/15/2011 07:30 PM, Paul E. McKenney wrote:
> > On Tue, Mar 15, 2011 at 05:46:20PM +0800, Lai Jiangshan wrote:
> >> kfree_rcu() which was original proposed by Lai 2.5 years ago is one of
> >> the most important RCU TODO list entries, Lai and Manfred have worked on
> >> patches for this. This V4 patch is based on the Manfred's patch and
> >> the V1 of Lai's patch. (These two patches are almost the same
> >> in implementation, and this patch is mainly based on the Manfred's).
> >>
> >> Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
> >> Manfred's patch: http://lkml.org/lkml/2009/1/2/115
> >> RCU TODO list: http://www.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html
> >>
> >> This new introduced API kfree_rcu() primitive kfree()s the specified memory
> >> after a RCU grace period elapses.
> >>
> >> It replaces many simple "call_rcu(head, simple_kfree_callback)";
> >> These many simple_kfree_callback() instances just does
> >>
> >> 	kfree(containerof(head,struct whatever_struct,rcu_member));
> >>
> >> These simple_kfree_callback() instances are just duplicate code, we need
> >> a generic function for them.
> >>
> >> And kfree_rcu() is also help for unloadable modules, kfree_rcu() does not
> >> queue any function which belong to the module, so a rcu_barrier() can
> >> be avoid when module exit. (If we queue any other function by call_rcu(),
> >> rcu_barrier() is still needed.)
> > 
> > Thank you for putting this together!  It does represent a nice
> > reduction in code size.
> > 
> > Once it settles out a bit, I intend to queue this patch.  It would be
> > best if the subsystems queue their own patches using kfree_rcu() once
> > this patch reaches mainline.
> > 
> 
> It seems that the subsystems maintainers just Ack the patches.
> I hope Ingo queue the Acked using kfree_rcu() patches into -tip,
> it will help the kfree_rcu() reaches mainline earlier.

Yep, I am comfortable pushing the patches that have received acks.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-16  2:58           ` Lai Jiangshan
@ 2011-03-16  4:38             ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2011-03-16  4:38 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Arnd Bergmann, Ingo Molnar, LKML, Manfred Spraul

On Wed, Mar 16, 2011 at 10:58:14AM +0800, Lai Jiangshan wrote:
> On 03/15/2011 09:07 PM, Arnd Bergmann wrote:
> > On Tuesday 15 March 2011, Paul E. McKenney wrote:
> >> And it makes use of statically allocated structures a bit clunky.
> > 
> > How do statically allocated structures relate to this? I would
> > expect that you never call kfree_rcu on them, so it shouldn't
> > matter.
> > 
> >> Yet another approach is to use the low-order bit of the rcu_head pointer,
> >> given that the rcu_head structure does have to be aligned.  If this bit
> >> is set, then the function pointer could be interpreted as an offset.
> >> This approach might also allow a slab_free_rcu() to be constructed, given
> >> that the full 32 bits of the function pointer would be available.
> >> For example, if the upper 16 bits are zero, the low-order 16 bits are
> >> the offset.  If the upper 16 bits are 0x1, then the low-order 16 bits
> >> might be an index that selects the desired slab cache.
> > 
> > This solution sounds like a clear improvement over the patch that Lai
> > Jiangshan posted, without any downsides.
> 
> This solution is good, but it changes too much code, I think we will switch to
> this solution until my posted solution can't work under some real bad situation
> happened.

Indeed, the bit patterns are totally internal to this patch, so we can
change as needed -- for example, if we later want to apply this same
technique to slab_free() as well as kfree().

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-16  4:03   ` Paul E. McKenney
@ 2011-03-17  9:28     ` Lai Jiangshan
  2011-03-17 17:50       ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Lai Jiangshan @ 2011-03-17  9:28 UTC (permalink / raw)
  To: paulmck; +Cc: Eric Dumazet, Ingo Molnar, LKML, Manfred Spraul

On 03/16/2011 12:03 PM, Paul E. McKenney wrote:
> On Tue, Mar 15, 2011 at 02:11:33PM +0100, Eric Dumazet wrote:
>> On Tuesday, 15 March 2011 at 17:46 +0800, Lai Jiangshan wrote:
>>
>>
>>> --- a/kernel/rcutiny.c
>>> +++ b/kernel/rcutiny.c
>>> @@ -167,7 +167,7 @@ static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
>>>  		prefetch(next);
>>>  		debug_rcu_head_unqueue(list);
>>>  		local_bh_disable();
>>> -		list->func(list);
>>> +		__rcu_reclaim(list);
>>>  		local_bh_enable();
>>>  		list = next;
>>>  		RCU_TRACE(cb_count++);
>>
>> Paul, I am just wondering why we disable BH before calling list->func()
>>
>> This should be done in callbacks that really need it ?
>>
>> At least the disable/enable pair is not necessary before calling kfree()
> 
> Good point, we could bury the enable/disable pair in __rcu_reclaim().
> 
> Lai, am I forgetting any reason why we disable BH?
> 
> 							Thanx, Paul
> 

RCU callbacks have been invoked in BH context ever since RCU was added to the
kernel, and some callbacks assume they are always called in BH. So we have to
disable BH before calling list->func() to avoid bad results. It's a
*historical* reason.

I agree that the disable/enable pair is not necessary before calling kfree(),
but __rcu_reclaim() is also called from rcutree, whose rcu_process_callbacks()
currently runs in BH. I don't want to write two different versions of
__rcu_reclaim() (one for rcutree, another for rcutiny).

rcutree's rcu_process_callbacks() will be moved to process context; we may
remove the disable/enable BH pair for kfree() then.

Thanks,
Lai.


* Re: [PATCH V4 1/1] rcu: introduce kfree_rcu()
  2011-03-17  9:28     ` Lai Jiangshan
@ 2011-03-17 17:50       ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2011-03-17 17:50 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Eric Dumazet, Ingo Molnar, LKML, Manfred Spraul

On Thu, Mar 17, 2011 at 05:28:49PM +0800, Lai Jiangshan wrote:
> On 03/16/2011 12:03 PM, Paul E. McKenney wrote:
> > On Tue, Mar 15, 2011 at 02:11:33PM +0100, Eric Dumazet wrote:
> >> Le mardi 15 mars 2011 à 17:46 +0800, Lai Jiangshan a écrit :
> >>
> >>
> >>> --- a/kernel/rcutiny.c
> >>> +++ b/kernel/rcutiny.c
> >>> @@ -167,7 +167,7 @@ static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
> >>>  		prefetch(next);
> >>>  		debug_rcu_head_unqueue(list);
> >>>  		local_bh_disable();
> >>> -		list->func(list);
> >>> +		__rcu_reclaim(list);
> >>>  		local_bh_enable();
> >>>  		list = next;
> >>>  		RCU_TRACE(cb_count++);
> >>
> >> Paul, I am just wondering why we disable BH before calling list->func()
> >>
> >> This should be done in callbacks that really need it ?
> >>
> >> At least the disable/enable pair is not necessary before calling kfree()
> > 
> > Good point, we could bury the enable/disable pair in __rcu_reclaim().
> > 
> > Lai, am I forgetting any reason why we disable BH?
> > 
> > 							Thanx, Paul
> > 
> 
> RCU callbacks have been invoked in BH context ever since RCU was added to the
> kernel, and some callbacks assume they are always called in BH. So we have to
> disable BH before calling list->func() to avoid bad results. It's a
> *historical* reason.
> 
> I agree that the disable/enable pair is not necessary before calling kfree(),
> but __rcu_reclaim() is also called from rcutree, whose rcu_process_callbacks()
> currently runs in BH. I don't want to write two different versions of
> __rcu_reclaim() (one for rcutree, another for rcutiny).
> 
> rcutree's rcu_process_callbacks() will be moved to process context; we may
> remove the disable/enable BH pair for kfree() then.

OK, so if I sequence your patches after the rcutree priority boosting,
which threadifies rcutree's callback processing, I should be able to
omit BH for the kfree() case.

							Thanx, Paul
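
For illustration only, a user-space sketch of what "burying" the BH pair in
the reclaim helper might look like: the offset (kfree) path skips the pair,
and only real callbacks run with BH disabled. The stubs stub_bh_disable() and
stub_bh_enable(), the helper reclaim_bh_aware(), and struct item are
hypothetical stand-ins, and free() replaces kfree(); this is not the code
that was merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Hypothetical object with an embedded rcu_head. */
struct item {
	long payload;
	struct rcu_head rcu;
};

/* Counters standing in for local_bh_disable()/local_bh_enable(). */
static int bh_disable_depth;
static int bh_disable_calls;

static void stub_bh_disable(void)
{
	bh_disable_depth++;
	bh_disable_calls++;
}

static void stub_bh_enable(void)
{
	bh_disable_depth--;
}

/* Example callback: records the BH-disable depth it was invoked with. */
static int depth_seen_by_callback = -1;

static void record_depth_callback(struct rcu_head *head)
{
	(void)head;
	depth_seen_by_callback = bh_disable_depth;
}

/* Offsets below 4096 mean "just free the enclosing object". */
static void reclaim_bh_aware(struct rcu_head *head)
{
	uintptr_t offset = (uintptr_t)head->func;

	if (offset < 4096) {
		/* Freeing memory does not depend on BH being disabled. */
		free((char *)head - offset);
	} else {
		/* Legacy callbacks may still assume BH context. */
		stub_bh_disable();
		head->func(head);
		stub_bh_enable();
	}
}
```

With this arrangement both callers could share one helper, provided the
caller itself no longer wraps the call in its own disable/enable pair.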


* [PATCH V5 1/1] rcu: introduce kfree_rcu()
  2011-03-16  4:02           ` Paul E. McKenney
@ 2011-03-18  3:15             ` Lai Jiangshan
  2011-03-18  8:14               ` Arnd Bergmann
  0 siblings, 1 reply; 19+ messages in thread
From: Lai Jiangshan @ 2011-03-18  3:15 UTC (permalink / raw)
  To: paulmck, Ingo Molnar; +Cc: Arnd Bergmann, LKML, Manfred Spraul

kfree_rcu(), which was originally proposed by Lai 2.5 years ago, is one of
the most important RCU TODO list entries. Lai and Manfred have both worked on
patches for it. This V5 patch is based on Manfred's patch and V1 of Lai's
patch. (The two patches are almost the same in implementation, and this
patch is mainly based on Manfred's.)

Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
Manfred's patch: http://lkml.org/lkml/2009/1/2/115
RCU TODO list: http://www.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html

The newly introduced kfree_rcu() primitive kfree()s the specified memory
after an RCU grace period elapses.

It replaces many simple "call_rcu(head, simple_kfree_callback)" invocations.
These simple_kfree_callback() instances just do

	kfree(container_of(head, struct whatever_struct, rcu_member));

They are just duplicated code, so we need a generic function for them.

kfree_rcu() also helps unloadable modules: it does not queue any function
that belongs to the module, so rcu_barrier() can be avoided at module exit.
(If we queue any other function via call_rcu(), rcu_barrier() is still
needed.)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 7d62909..e45ed82 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -777,4 +777,58 @@ static inline void debug_rcu_head_unqueue(struct rcu_head *head)
 }
 #endif	/* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
 
+static __always_inline bool __is_kfree_rcu_offset(unsigned long offset)
+{
+	return offset < 4096;
+}
+
+static __always_inline
+void __kfree_rcu(struct rcu_head *head, unsigned long offset)
+{
+	typedef void (*rcu_callback)(struct rcu_head *);
+
+	BUILD_BUG_ON(!__builtin_constant_p(offset));
+
+	/* See the comments of kfree_rcu(), the "Note:" section. */
+	BUILD_BUG_ON(!__is_kfree_rcu_offset(offset));
+
+	call_rcu(head, (rcu_callback)offset);
+}
+
+extern void kfree(const void *);
+
+static inline void __rcu_reclaim(struct rcu_head *head)
+{
+	unsigned long offset = (unsigned long)head->func;
+
+	if (__is_kfree_rcu_offset(offset))
+		kfree((void *)head - offset);
+	else
+		head->func(head);
+}
+
+/**
+ * kfree_rcu() - kfree an object after a grace period.
+ * @ptr:	pointer to kfree
+ * @rcu_head:	the name of the struct rcu_head within the type of @ptr.
+ *
+ * Many RCU callbacks just call kfree() on the base structure. This helper
+ * calls kfree() internally. The rcu_head structure must be embedded in
+ * the structure to be freed.
+ *
+ * Unlike call_rcu(), kfree_rcu() neither requires nor creates any RCU
+ * callback function that belongs to the caller's module, so the caller
+ * does not need to wait for such a callback to complete before the
+ * module is unloaded.
+ *
+ * Note: if the offset of the struct rcu_head within the type of @ptr
+ * is 4096 or larger, it triggers a BUILD_BUG_ON() compile-time error
+ * in __kfree_rcu(). In that case the user of kfree_rcu() should
+ * rearrange the fields of the type of @ptr to make the offset smaller,
+ * use call_rcu() instead, or ask the RCU maintainer to change the
+ * limit.
+ */
+#define kfree_rcu(ptr, rcu_head)					\
+	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
+
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c
index 0c343b9..4d60fbc 100644
--- a/kernel/rcutiny.c
+++ b/kernel/rcutiny.c
@@ -167,7 +167,7 @@ static void rcu_process_callbacks(struct rcu_ctrlblk *rcp)
 		prefetch(next);
 		debug_rcu_head_unqueue(list);
 		local_bh_disable();
-		list->func(list);
+		__rcu_reclaim(list);
 		local_bh_enable();
 		list = next;
 		RCU_TRACE(cb_count++);
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index dd4aea8..b3c1aed 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1143,7 +1143,7 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp)
 		next = list->next;
 		prefetch(next);
 		debug_rcu_head_unqueue(list);
-		list->func(list);
+		__rcu_reclaim(list);
 		list = next;
 		if (++count >= rdp->blimit)
 			break;
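
The offset trick used by the patch can be tried outside the kernel. The
following is a minimal user-space sketch, not the kernel code itself: free()
stands in for kfree(), there is no grace period, and struct foo,
queue_free(), reclaim(), plain_callback() and the counters are hypothetical
names chosen for the example. It relies on integer/function-pointer casts
that are implementation-defined in ISO C but behave as expected with GCC and
Clang on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space stand-in for the kernel's struct rcu_head. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

/* Hypothetical object with an embedded rcu_head, as kfree_rcu() requires. */
struct foo {
	int data;
	struct rcu_head rcu;
};

/* Same threshold as the patch: small values are offsets, not code addresses. */
static int is_kfree_offset(uintptr_t offset)
{
	return offset < 4096;
}

/* Demonstration counters for the two dispatch paths. */
static int freed_by_offset;
static int called_back;

/* An ordinary callback, to exercise the non-offset path. */
static void plain_callback(struct rcu_head *head)
{
	(void)head;
	called_back++;
}

/* Stores the offset in the callback slot, as __kfree_rcu() does. */
static void queue_free(struct rcu_head *head, size_t offset)
{
	assert(is_kfree_offset(offset));
	head->func = (void (*)(struct rcu_head *))(uintptr_t)offset;
}

/* Mirrors the dispatch in __rcu_reclaim(). */
static void reclaim(struct rcu_head *head)
{
	uintptr_t offset = (uintptr_t)head->func;

	if (is_kfree_offset(offset)) {
		/* Step back from the rcu_head to the start of the object. */
		free((char *)head - offset);
		freed_by_offset++;
	} else {
		head->func(head);
	}
}
```

A caller would do queue_free(&p->rcu, offsetof(struct foo, rcu)) and later
reclaim(&p->rcu); no per-structure callback function is needed.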


* Re: [PATCH V5 1/1] rcu: introduce kfree_rcu()
  2011-03-18  3:15             ` [PATCH V5 " Lai Jiangshan
@ 2011-03-18  8:14               ` Arnd Bergmann
  0 siblings, 0 replies; 19+ messages in thread
From: Arnd Bergmann @ 2011-03-18  8:14 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: paulmck, Ingo Molnar, LKML, Manfred Spraul

On Friday 18 March 2011, Lai Jiangshan wrote:
> kfree_rcu(), which was originally proposed by Lai 2.5 years ago, is one of
> the most important RCU TODO list entries. Lai and Manfred have both worked on
> patches for it. This V5 patch is based on Manfred's patch and V1 of Lai's
> patch. (The two patches are almost the same in implementation, and this
> patch is mainly based on Manfred's.)
> 
> Lai's V1 patch: http://lkml.org/lkml/2008/9/18/1
> Manfred's patch: http://lkml.org/lkml/2009/1/2/115
> RCU TODO list: http://www.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html
> 
> The newly introduced kfree_rcu() primitive kfree()s the specified memory
> after an RCU grace period elapses.
> 
> It replaces many simple "call_rcu(head, simple_kfree_callback)" invocations.
> These simple_kfree_callback() instances just do
> 
>         kfree(container_of(head, struct whatever_struct, rcu_member));
> 
> They are just duplicated code, so we need a generic function for them.
> 
> kfree_rcu() also helps unloadable modules: it does not queue any function
> that belongs to the module, so rcu_barrier() can be avoided at module exit.
> (If we queue any other function via call_rcu(), rcu_barrier() is still
> needed.)
> 
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>

Acked-by: Arnd Bergmann <arnd@arndb.de>


end of thread, other threads:[~2011-03-18  8:14 UTC | newest]

Thread overview: 19+ messages
2011-03-15  9:46 [PATCH V4 1/1] rcu: introduce kfree_rcu() Lai Jiangshan
2011-03-15 10:15 ` Arnd Bergmann
2011-03-15 11:27   ` Paul E. McKenney
2011-03-15 12:02     ` Arnd Bergmann
2011-03-15 12:19       ` Paul E. McKenney
2011-03-15 13:07         ` Arnd Bergmann
2011-03-16  2:58           ` Lai Jiangshan
2011-03-16  4:38             ` Paul E. McKenney
2011-03-16  4:02           ` Paul E. McKenney
2011-03-18  3:15             ` [PATCH V5 " Lai Jiangshan
2011-03-18  8:14               ` Arnd Bergmann
2011-03-16  2:23   ` [PATCH V4 " Lai Jiangshan
2011-03-15 11:30 ` Paul E. McKenney
2011-03-16  2:50   ` Lai Jiangshan
2011-03-16  4:29     ` Paul E. McKenney
2011-03-15 13:11 ` Eric Dumazet
2011-03-16  4:03   ` Paul E. McKenney
2011-03-17  9:28     ` Lai Jiangshan
2011-03-17 17:50       ` Paul E. McKenney
