* [RFC, PATCH] kernel/rcu: add kfree_rcu
@ 2009-01-02 11:25 Manfred Spraul
2009-01-02 18:55 ` Paul E. McKenney
2009-01-04 5:39 ` Lai Jiangshan
0 siblings, 2 replies; 16+ messages in thread
From: Manfred Spraul @ 2009-01-02 11:25 UTC (permalink / raw)
To: linux-kernel; +Cc: paulmck, akpm, laijs
Based on the idea from Lai Jiangshan:
There are 23 instances where the rcu callback just does
kfree(containerof(rcu_member,struct whatever_struct,head));
The 23 instances exist because there are 23 'struct whatever_struct' with
their individual rcu_member.
The attached patch creates a generic kfree_rcu function that removes the need
for these 23 helpers: The offset from the container struct to the rcu member
is calculated at compile time and stored in head->func instead of the
function pointer.
Effect on .text: (x86_64)
+23 bytes in rcutree.c
- 9 bytes in ipc/sem.c
Depending on the number of call_rcu instances in the .config, adding
kfree_rcu should save between 100 and 200 bytes in .text.
What do you think?
Is the interface ok? Incorrect use should generate compile time errors.
Is it ok to use IS_ERR(p)?
And: How many seperate patches should I create?
I would propose 24 patches: one with the interface, 23 that
replace the individual callbacks.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
include/linux/rcupdate.h | 33 +++++++++++++++++++++++++++++++++
ipc/sem.c | 10 ++--------
kernel/rcuclassic.c | 2 +-
kernel/rcupreempt.c | 2 +-
kernel/rcutree.c | 2 +-
5 files changed, 38 insertions(+), 11 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 1168fbc..d120014 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -269,6 +269,29 @@ extern void call_rcu(struct rcu_head *head,
extern void call_rcu_bh(struct rcu_head *head,
void (*func)(struct rcu_head *head));
+/**
+ * kfree_rcu - kfree after a grace period
+ * @ptr: pointer to kfree
+ * @rcu_member: name of the struct rcu_head member.
+ *
+ * Many rcu callbacks just call kfree() on the base structure. This helper
+ * function calls kfree internally. The rcu_head structure must be embedded
+ * in the to be freed structure.
+ */
+
+static inline void __kfree_rcu(void *prcu, unsigned long offset)
+{
+ void (*pfnc)(struct rcu_head *head) = (void (*)(struct rcu_head *head))offset;
+
+ BUILD_BUG_ON(!__builtin_constant_p(-offset));
+ BUILD_BUG_ON(!IS_ERR(pfnc));
+
+ call_rcu(prcu, pfnc);
+}
+
+#define kfree_rcu(pobj, entry) \
+ __kfree_rcu(&((pobj)->entry), -offsetof(typeof(*(pobj)), entry))
+
/* Exported common interfaces */
extern void synchronize_rcu(void);
extern void rcu_barrier(void);
@@ -279,4 +302,14 @@ extern void rcu_barrier_sched(void);
extern void rcu_init(void);
extern int rcu_needs_cpu(int cpu);
+/* helper functions */
+static inline void rcu_docallback(struct rcu_head *head)
+{
+ if (IS_ERR(head->func)) {
+ kfree(((char *)head) + (unsigned long)head->func);
+ } else {
+ head->func(head);
+ }
+}
+
#endif /* __LINUX_RCUPDATE_H */
diff --git a/ipc/sem.c b/ipc/sem.c
index 0821224..4a68cf2 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -504,12 +504,6 @@ static int count_semzcnt (struct sem_array * sma, ushort semnum)
return semzcnt;
}
-static void free_un(struct rcu_head *head)
-{
- struct sem_undo *un = container_of(head, struct sem_undo, rcu);
- kfree(un);
-}
-
/* Free a semaphore set. freeary() is called with sem_ids.rw_mutex locked
* as a writer and the spinlock for this semaphore set hold. sem_ids.rw_mutex
* remains locked on exit.
@@ -528,7 +522,7 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
un->semid = -1;
list_del_rcu(&un->list_proc);
spin_unlock(&un->ulp->lock);
- call_rcu(&un->rcu, free_un);
+ kfree_rcu(un, rcu);
}
/* Wake up all pending processes and let them fail with EIDRM. */
@@ -1347,7 +1341,7 @@ void exit_sem(struct task_struct *tsk)
update_queue(sma);
sem_unlock(sma);
- call_rcu(&un->rcu, free_un);
+ kfree_rcu(un, rcu);
}
kfree(ulp);
}
diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
index e503a00..b2f97d2 100644
--- a/kernel/rcuclassic.c
+++ b/kernel/rcuclassic.c
@@ -336,7 +336,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
while (list) {
next = list->next;
prefetch(next);
- list->func(list);
+ rcu_docallback(list);
list = next;
if (++count >= rdp->blimit)
break;
diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
index 0498265..8440dac 100644
--- a/kernel/rcupreempt.c
+++ b/kernel/rcupreempt.c
@@ -1110,7 +1110,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
spin_unlock_irqrestore(&rdp->lock, flags);
while (list) {
next = list->next;
- list->func(list);
+ rcu_docallback(list);
list = next;
RCU_TRACE_ME(rcupreempt_trace_invoke);
}
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index a342b03..29c88ce 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -901,7 +901,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
while (list) {
next = list->next;
prefetch(next);
- list->func(list);
+ rcu_docallback(list);
list = next;
if (++count >= rdp->blimit)
break;
--
1.6.0.6
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-02 11:25 [RFC, PATCH] kernel/rcu: add kfree_rcu Manfred Spraul
@ 2009-01-02 18:55 ` Paul E. McKenney
2009-01-03 14:59 ` Manfred Spraul
2009-01-04 5:50 ` Lai Jiangshan
2009-01-04 5:39 ` Lai Jiangshan
1 sibling, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2009-01-02 18:55 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel, akpm, laijs
On Fri, Jan 02, 2009 at 12:25:58PM +0100, Manfred Spraul wrote:
> Based on the idea from Lai Jiangshan:
>
> There are 23 instances where the rcu callback just does
>
> kfree(containerof(rcu_member,struct whatever_struct,head));
>
> The 23 instances exist because there are 23 'struct whatever_struct' with
> their individual rcu_member.
>
> The attached patch creates a generic kfree_rcu function that removes the need
> for these 23 helpers: The offset from the container struct to the rcu member
> is calculated at compile time and stored in head->func instead of the
> function pointer.
OK, these seem to me to avoid Nick Piggin's vfree() question in response
to a roughly similar posting from Lai Jiangshan on November 18th
(http://lkml.org/lkml/2008/11/18/83).
> Effect on .text: (x86_64)
> +23 bytes in rcutree.c
> - 9 bytes in ipc/sem.c
>
> Depending on the number of call_rcu instances in the .config, adding
> kfree_rcu should save between 100 and 200 bytes in .text.
>
> What do you think?
I am in favor, given changes noted below. This is a much nicer way to
implement some function in the original RCU patch for Linux.
> Is the interface ok? Incorrect use should generate compile time errors.
> Is it ok to use IS_ERR(p)?
I would suggest instead using the bottom bit to differentiate between
these two cases, especially given that your approach makes it impossible
for callback processing to notice a NULL function pointer. In addition,
this approach would allow different types of allocators to be specified
should this later prove to be helpful. You should not have to shift the
offset because the rcu_head offset should always be a multiple of four
(or eight on 64-bit architectures).
And we really are running into bugs that are detected by RCU's seeing a
null function pointer in the rcu_head structure at callback-invocation
time. So, whatever encoding you choose, please leave a function-pointer
value of zero as an invalid value!
> And: How many seperate patches should I create?
> I would propose 24 patches: one with the interface, 23 that
> replace the individual callbacks.
I defer to the various maintainers on this one. ;-)
Thanx, Paul
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> ---
> include/linux/rcupdate.h | 33 +++++++++++++++++++++++++++++++++
> ipc/sem.c | 10 ++--------
> kernel/rcuclassic.c | 2 +-
> kernel/rcupreempt.c | 2 +-
> kernel/rcutree.c | 2 +-
> 5 files changed, 38 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 1168fbc..d120014 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -269,6 +269,29 @@ extern void call_rcu(struct rcu_head *head,
> extern void call_rcu_bh(struct rcu_head *head,
> void (*func)(struct rcu_head *head));
>
> +/**
> + * kfree_rcu - kfree after a grace period
> + * @ptr: pointer to kfree
> + * @rcu_member: name of the struct rcu_head member.
> + *
> + * Many rcu callbacks just call kfree() on the base structure. This helper
> + * function calls kfree internally. The rcu_head structure must be embedded
> + * in the to be freed structure.
> + */
> +
> +static inline void __kfree_rcu(void *prcu, unsigned long offset)
> +{
> + void (*pfnc)(struct rcu_head *head) = (void (*)(struct rcu_head *head))offset;
> +
> + BUILD_BUG_ON(!__builtin_constant_p(-offset));
> + BUILD_BUG_ON(!IS_ERR(pfnc));
This allows us to drop the above check in favor of something like the
following above (though a symbolic value in place of 0x1 would be nice):
void (*pfnc)(struct rcu_head *head) = (void (*)(struct rcu_head *head))(offset | 0x1);
> + call_rcu(prcu, pfnc);
> +}
> +
> +#define kfree_rcu(pobj, entry) \
> + __kfree_rcu(&((pobj)->entry), -offsetof(typeof(*(pobj)), entry))
> +
> /* Exported common interfaces */
> extern void synchronize_rcu(void);
> extern void rcu_barrier(void);
> @@ -279,4 +302,14 @@ extern void rcu_barrier_sched(void);
> extern void rcu_init(void);
> extern int rcu_needs_cpu(int cpu);
>
> +/* helper functions */
> +static inline void rcu_docallback(struct rcu_head *head)
> +{
Experience indicates that we should have some sort of WARN_ON() in
the case of a NULL rcu_head, either here or where this is called. :-/
> + if (IS_ERR(head->func)) {
And in this "if" statement, just check the bottom bit.
> + kfree(((char *)head) + (unsigned long)head->func);
> + } else {
> + head->func(head);
> + }
> +}
> +
> #endif /* __LINUX_RCUPDATE_H */
> diff --git a/ipc/sem.c b/ipc/sem.c
> index 0821224..4a68cf2 100644
> --- a/ipc/sem.c
> +++ b/ipc/sem.c
> @@ -504,12 +504,6 @@ static int count_semzcnt (struct sem_array * sma, ushort semnum)
> return semzcnt;
> }
>
> -static void free_un(struct rcu_head *head)
> -{
> - struct sem_undo *un = container_of(head, struct sem_undo, rcu);
> - kfree(un);
> -}
> -
> /* Free a semaphore set. freeary() is called with sem_ids.rw_mutex locked
> * as a writer and the spinlock for this semaphore set hold. sem_ids.rw_mutex
> * remains locked on exit.
> @@ -528,7 +522,7 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
> un->semid = -1;
> list_del_rcu(&un->list_proc);
> spin_unlock(&un->ulp->lock);
> - call_rcu(&un->rcu, free_un);
> + kfree_rcu(un, rcu);
This -is- nice! Short and sweet!!!
> }
>
> /* Wake up all pending processes and let them fail with EIDRM. */
> @@ -1347,7 +1341,7 @@ void exit_sem(struct task_struct *tsk)
> update_queue(sma);
> sem_unlock(sma);
>
> - call_rcu(&un->rcu, free_un);
> + kfree_rcu(un, rcu);
> }
> kfree(ulp);
> }
> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
> index e503a00..b2f97d2 100644
> --- a/kernel/rcuclassic.c
> +++ b/kernel/rcuclassic.c
> @@ -336,7 +336,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> while (list) {
> next = list->next;
> prefetch(next);
> - list->func(list);
> + rcu_docallback(list);
> list = next;
> if (++count >= rdp->blimit)
> break;
> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
> index 0498265..8440dac 100644
> --- a/kernel/rcupreempt.c
> +++ b/kernel/rcupreempt.c
> @@ -1110,7 +1110,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
> spin_unlock_irqrestore(&rdp->lock, flags);
> while (list) {
> next = list->next;
> - list->func(list);
> + rcu_docallback(list);
> list = next;
> RCU_TRACE_ME(rcupreempt_trace_invoke);
> }
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index a342b03..29c88ce 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -901,7 +901,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> while (list) {
> next = list->next;
> prefetch(next);
> - list->func(list);
> + rcu_docallback(list);
Good, you got all three of them! ;-)
> list = next;
> if (++count >= rdp->blimit)
> break;
> --
> 1.6.0.6
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-02 18:55 ` Paul E. McKenney
@ 2009-01-03 14:59 ` Manfred Spraul
2009-01-03 23:46 ` Paul E. McKenney
2009-01-04 5:50 ` Lai Jiangshan
1 sibling, 1 reply; 16+ messages in thread
From: Manfred Spraul @ 2009-01-03 14:59 UTC (permalink / raw)
To: paulmck; +Cc: linux-kernel, akpm, laijs
Paul E. McKenney wrote:
> I would suggest instead using the bottom bit to differentiate between
> these two cases, especially given that your approach makes it impossible
> for callback processing to notice a NULL function pointer. In addition,
> this approach would allow different types of allocators to be specified
> should this later prove to be helpful. You should not have to shift the
> offset because the rcu_head offset should always be a multiple of four
> (or eight on 64-bit architectures).
>
We must be careful: rcu_head might be always aligned, but are function
pointers always aligned?
The x86 hardware allows arbitrary function pointers, I'm not sure what
gcc would do if '--falign-functions=0' is used.
Are there other codepaths that assume that the lowest bit of a function
pointer is never set?
> And we really are running into bugs that are detected by RCU's seeing a
> null function pointer in the rcu_head structure at callback-invocation
> time. So, whatever encoding you choose, please leave a function-pointer
> value of zero as an invalid value!
>
Ok.
>> --- a/kernel/rcutree.c
>> +++ b/kernel/rcutree.c
>> @@ -901,7 +901,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>> while (list) {
>> next = list->next;
>> prefetch(next);
>> - list->func(list);
>> + rcu_docallback(list);
>>
>
> Good, you got all three of them! ;-)
>
>
The patch was tested against rcutree ;-)
--
Manfred
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-03 14:59 ` Manfred Spraul
@ 2009-01-03 23:46 ` Paul E. McKenney
0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2009-01-03 23:46 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel, akpm, laijs
On Sat, Jan 03, 2009 at 03:59:40PM +0100, Manfred Spraul wrote:
> Paul E. McKenney wrote:
>> I would suggest instead using the bottom bit to differentiate between
>> these two cases, especially given that your approach makes it impossible
>> for callback processing to notice a NULL function pointer. In addition,
>> this approach would allow different types of allocators to be specified
>> should this later prove to be helpful. You should not have to shift the
>> offset because the rcu_head offset should always be a multiple of four
>> (or eight on 64-bit architectures).
>>
> We must be careful: rcu_head might be always aligned, but are function
> pointers always aligned?
> The x86 hardware allows arbitrary function pointers, I'm not sure what gcc
> would do if '--falign-functions=0' is used.
> Are there other codepaths that assume that the lowest bit of a function
> pointer is never set?
Good point. I guess that we will have to worry about expandability when
and if the need arises. I see a couple of ways of doing it, but they
are pretty ugly...
>> And we really are running into bugs that are detected by RCU's seeing a
>> null function pointer in the rcu_head structure at callback-invocation
>> time. So, whatever encoding you choose, please leave a function-pointer
>> value of zero as an invalid value!
>>
> Ok.
>
>>> --- a/kernel/rcutree.c
>>> +++ b/kernel/rcutree.c
>>> @@ -901,7 +901,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>>> while (list) {
>>> next = list->next;
>>> prefetch(next);
>>> - list->func(list);
>>> + rcu_docallback(list);
>>>
>>
>> Good, you got all three of them! ;-)
>>
>>
> The patch was tested against rcutree ;-)
;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-02 11:25 [RFC, PATCH] kernel/rcu: add kfree_rcu Manfred Spraul
2009-01-02 18:55 ` Paul E. McKenney
@ 2009-01-04 5:39 ` Lai Jiangshan
2009-01-04 7:07 ` Manfred Spraul
1 sibling, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2009-01-04 5:39 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel, paulmck, akpm
Hi, Manfred Spraul,
Thanks you for re-propose kfree_rcu().
I NAK. this patch.
Comparing with my patch(V1), your patch has several improvements.
But core idea is the same as my patch.
This idea(share offset with function pointer) has some disadvantages.
And I had implemented V2 and the patches for using kfree_rcu().
The V2 patch has not tested enough, but It will be done in one month.
Thanks!
Lai.
Manfred Spraul wrote:
> Based on the idea from Lai Jiangshan:
>
> There are 23 instances where the rcu callback just does
>
> kfree(containerof(rcu_member,struct whatever_struct,head));
>
> The 23 instances exist because there are 23 'struct whatever_struct' with
> their individual rcu_member.
>
> The attached patch creates a generic kfree_rcu function that removes the need
> for these 23 helpers: The offset from the container struct to the rcu member
> is calculated at compile time and stored in head->func instead of the
> function pointer.
>
>
> Effect on .text: (x86_64)
> +23 bytes in rcutree.c
> - 9 bytes in ipc/sem.c
>
> Depending on the number of call_rcu instances in the .config, adding
> kfree_rcu should save between 100 and 200 bytes in .text.
>
> What do you think?
> Is the interface ok? Incorrect use should generate compile time errors.
> Is it ok to use IS_ERR(p)?
> And: How many seperate patches should I create?
> I would propose 24 patches: one with the interface, 23 that
> replace the individual callbacks.
>
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> ---
> include/linux/rcupdate.h | 33 +++++++++++++++++++++++++++++++++
> ipc/sem.c | 10 ++--------
> kernel/rcuclassic.c | 2 +-
> kernel/rcupreempt.c | 2 +-
> kernel/rcutree.c | 2 +-
> 5 files changed, 38 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 1168fbc..d120014 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -269,6 +269,29 @@ extern void call_rcu(struct rcu_head *head,
> extern void call_rcu_bh(struct rcu_head *head,
> void (*func)(struct rcu_head *head));
>
> +/**
> + * kfree_rcu - kfree after a grace period
> + * @ptr: pointer to kfree
> + * @rcu_member: name of the struct rcu_head member.
> + *
> + * Many rcu callbacks just call kfree() on the base structure. This helper
> + * function calls kfree internally. The rcu_head structure must be embedded
> + * in the to be freed structure.
> + */
> +
> +static inline void __kfree_rcu(void *prcu, unsigned long offset)
> +{
> + void (*pfnc)(struct rcu_head *head) = (void (*)(struct rcu_head *head))offset;
> +
> + BUILD_BUG_ON(!__builtin_constant_p(-offset));
> + BUILD_BUG_ON(!IS_ERR(pfnc));
> +
> + call_rcu(prcu, pfnc);
> +}
> +
> +#define kfree_rcu(pobj, entry) \
> + __kfree_rcu(&((pobj)->entry), -offsetof(typeof(*(pobj)), entry))
> +
> /* Exported common interfaces */
> extern void synchronize_rcu(void);
> extern void rcu_barrier(void);
> @@ -279,4 +302,14 @@ extern void rcu_barrier_sched(void);
> extern void rcu_init(void);
> extern int rcu_needs_cpu(int cpu);
>
> +/* helper functions */
> +static inline void rcu_docallback(struct rcu_head *head)
> +{
> + if (IS_ERR(head->func)) {
> + kfree(((char *)head) + (unsigned long)head->func);
> + } else {
> + head->func(head);
> + }
> +}
> +
> #endif /* __LINUX_RCUPDATE_H */
> diff --git a/ipc/sem.c b/ipc/sem.c
> index 0821224..4a68cf2 100644
> --- a/ipc/sem.c
> +++ b/ipc/sem.c
> @@ -504,12 +504,6 @@ static int count_semzcnt (struct sem_array * sma, ushort semnum)
> return semzcnt;
> }
>
> -static void free_un(struct rcu_head *head)
> -{
> - struct sem_undo *un = container_of(head, struct sem_undo, rcu);
> - kfree(un);
> -}
> -
> /* Free a semaphore set. freeary() is called with sem_ids.rw_mutex locked
> * as a writer and the spinlock for this semaphore set hold. sem_ids.rw_mutex
> * remains locked on exit.
> @@ -528,7 +522,7 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
> un->semid = -1;
> list_del_rcu(&un->list_proc);
> spin_unlock(&un->ulp->lock);
> - call_rcu(&un->rcu, free_un);
> + kfree_rcu(un, rcu);
> }
>
> /* Wake up all pending processes and let them fail with EIDRM. */
> @@ -1347,7 +1341,7 @@ void exit_sem(struct task_struct *tsk)
> update_queue(sma);
> sem_unlock(sma);
>
> - call_rcu(&un->rcu, free_un);
> + kfree_rcu(un, rcu);
> }
> kfree(ulp);
> }
> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
> index e503a00..b2f97d2 100644
> --- a/kernel/rcuclassic.c
> +++ b/kernel/rcuclassic.c
> @@ -336,7 +336,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> while (list) {
> next = list->next;
> prefetch(next);
> - list->func(list);
> + rcu_docallback(list);
> list = next;
> if (++count >= rdp->blimit)
> break;
> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
> index 0498265..8440dac 100644
> --- a/kernel/rcupreempt.c
> +++ b/kernel/rcupreempt.c
> @@ -1110,7 +1110,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
> spin_unlock_irqrestore(&rdp->lock, flags);
> while (list) {
> next = list->next;
> - list->func(list);
> + rcu_docallback(list);
> list = next;
> RCU_TRACE_ME(rcupreempt_trace_invoke);
> }
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index a342b03..29c88ce 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -901,7 +901,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> while (list) {
> next = list->next;
> prefetch(next);
> - list->func(list);
> + rcu_docallback(list);
> list = next;
> if (++count >= rdp->blimit)
> break;
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-02 18:55 ` Paul E. McKenney
2009-01-03 14:59 ` Manfred Spraul
@ 2009-01-04 5:50 ` Lai Jiangshan
2009-01-04 19:48 ` Paul E. McKenney
1 sibling, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2009-01-04 5:50 UTC (permalink / raw)
To: paulmck; +Cc: Manfred Spraul, linux-kernel, akpm
Paul E. McKenney wrote:
> On Fri, Jan 02, 2009 at 12:25:58PM +0100, Manfred Spraul wrote:
>> Based on the idea from Lai Jiangshan:
>>
>> There are 23 instances where the rcu callback just does
>>
>> kfree(containerof(rcu_member,struct whatever_struct,head));
>>
>> The 23 instances exist because there are 23 'struct whatever_struct' with
>> their individual rcu_member.
>>
>> The attached patch creates a generic kfree_rcu function that removes the need
>> for these 23 helpers: The offset from the container struct to the rcu member
>> is calculated at compile time and stored in head->func instead of the
>> function pointer.
>
> OK, these seem to me to avoid Nick Piggin's vfree() question in response
> to a roughly similar posting from Lai Jiangshan on November 18th
> (http://lkml.org/lkml/2008/11/18/83).
Why?
I think these two are different questions.
vfree() still can not be called from softirq context now.
And I proposed vfree_atomic() for RCU, but it can not be accepted.
Lai
>
>> Effect on .text: (x86_64)
>> +23 bytes in rcutree.c
>> - 9 bytes in ipc/sem.c
>>
>> Depending on the number of call_rcu instances in the .config, adding
>> kfree_rcu should save between 100 and 200 bytes in .text.
>>
>> What do you think?
>
> I am in favor, given changes noted below. This is a much nicer way to
> implement some function in the original RCU patch for Linux.
>
>> Is the interface ok? Incorrect use should generate compile time errors.
>> Is it ok to use IS_ERR(p)?
>
> I would suggest instead using the bottom bit to differentiate between
> these two cases, especially given that your approach makes it impossible
> for callback processing to notice a NULL function pointer. In addition,
> this approach would allow different types of allocators to be specified
> should this later prove to be helpful. You should not have to shift the
> offset because the rcu_head offset should always be a multiple of four
> (or eight on 64-bit architectures).
>
> And we really are running into bugs that are detected by RCU's seeing a
> null function pointer in the rcu_head structure at callback-invocation
> time. So, whatever encoding you choose, please leave a function-pointer
> value of zero as an invalid value!
>
>> And: How many seperate patches should I create?
>> I would propose 24 patches: one with the interface, 23 that
>> replace the individual callbacks.
>
> I defer to the various maintainers on this one. ;-)
>
> Thanx, Paul
>
>> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>> ---
>> include/linux/rcupdate.h | 33 +++++++++++++++++++++++++++++++++
>> ipc/sem.c | 10 ++--------
>> kernel/rcuclassic.c | 2 +-
>> kernel/rcupreempt.c | 2 +-
>> kernel/rcutree.c | 2 +-
>> 5 files changed, 38 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index 1168fbc..d120014 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -269,6 +269,29 @@ extern void call_rcu(struct rcu_head *head,
>> extern void call_rcu_bh(struct rcu_head *head,
>> void (*func)(struct rcu_head *head));
>>
>> +/**
>> + * kfree_rcu - kfree after a grace period
>> + * @ptr: pointer to kfree
>> + * @rcu_member: name of the struct rcu_head member.
>> + *
>> + * Many rcu callbacks just call kfree() on the base structure. This helper
>> + * function calls kfree internally. The rcu_head structure must be embedded
>> + * in the to be freed structure.
>> + */
>> +
>> +static inline void __kfree_rcu(void *prcu, unsigned long offset)
>> +{
>> + void (*pfnc)(struct rcu_head *head) = (void (*)(struct rcu_head *head))offset;
>> +
>> + BUILD_BUG_ON(!__builtin_constant_p(-offset));
>> + BUILD_BUG_ON(!IS_ERR(pfnc));
>
> This allows us to drop the above check in favor of something like the
> following above (though a symbolic value in place of 0x1 would be nice):
>
> void (*pfnc)(struct rcu_head *head) = (void (*)(struct rcu_head *head))(offset | 0x1);
>
>> + call_rcu(prcu, pfnc);
>> +}
>> +
>> +#define kfree_rcu(pobj, entry) \
>> + __kfree_rcu(&((pobj)->entry), -offsetof(typeof(*(pobj)), entry))
>> +
>> /* Exported common interfaces */
>> extern void synchronize_rcu(void);
>> extern void rcu_barrier(void);
>> @@ -279,4 +302,14 @@ extern void rcu_barrier_sched(void);
>> extern void rcu_init(void);
>> extern int rcu_needs_cpu(int cpu);
>>
>> +/* helper functions */
>> +static inline void rcu_docallback(struct rcu_head *head)
>> +{
>
> Experience indicates that we should have some sort of WARN_ON() in
> the case of a NULL rcu_head, either here or where this is called. :-/
>
>> + if (IS_ERR(head->func)) {
>
> And in this "if" statement, just check the bottom bit.
>
>> + kfree(((char *)head) + (unsigned long)head->func);
>> + } else {
>> + head->func(head);
>> + }
>> +}
>> +
>> #endif /* __LINUX_RCUPDATE_H */
>> diff --git a/ipc/sem.c b/ipc/sem.c
>> index 0821224..4a68cf2 100644
>> --- a/ipc/sem.c
>> +++ b/ipc/sem.c
>> @@ -504,12 +504,6 @@ static int count_semzcnt (struct sem_array * sma, ushort semnum)
>> return semzcnt;
>> }
>>
>> -static void free_un(struct rcu_head *head)
>> -{
>> - struct sem_undo *un = container_of(head, struct sem_undo, rcu);
>> - kfree(un);
>> -}
>> -
>> /* Free a semaphore set. freeary() is called with sem_ids.rw_mutex locked
>> * as a writer and the spinlock for this semaphore set hold. sem_ids.rw_mutex
>> * remains locked on exit.
>> @@ -528,7 +522,7 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
>> un->semid = -1;
>> list_del_rcu(&un->list_proc);
>> spin_unlock(&un->ulp->lock);
>> - call_rcu(&un->rcu, free_un);
>> + kfree_rcu(un, rcu);
>
> This -is- nice! Short and sweet!!!
>
>> }
>>
>> /* Wake up all pending processes and let them fail with EIDRM. */
>> @@ -1347,7 +1341,7 @@ void exit_sem(struct task_struct *tsk)
>> update_queue(sma);
>> sem_unlock(sma);
>>
>> - call_rcu(&un->rcu, free_un);
>> + kfree_rcu(un, rcu);
>> }
>> kfree(ulp);
>> }
>> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
>> index e503a00..b2f97d2 100644
>> --- a/kernel/rcuclassic.c
>> +++ b/kernel/rcuclassic.c
>> @@ -336,7 +336,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>> while (list) {
>> next = list->next;
>> prefetch(next);
>> - list->func(list);
>> + rcu_docallback(list);
>> list = next;
>> if (++count >= rdp->blimit)
>> break;
>> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
>> index 0498265..8440dac 100644
>> --- a/kernel/rcupreempt.c
>> +++ b/kernel/rcupreempt.c
>> @@ -1110,7 +1110,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
>> spin_unlock_irqrestore(&rdp->lock, flags);
>> while (list) {
>> next = list->next;
>> - list->func(list);
>> + rcu_docallback(list);
>> list = next;
>> RCU_TRACE_ME(rcupreempt_trace_invoke);
>> }
>> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>> index a342b03..29c88ce 100644
>> --- a/kernel/rcutree.c
>> +++ b/kernel/rcutree.c
>> @@ -901,7 +901,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>> while (list) {
>> next = list->next;
>> prefetch(next);
>> - list->func(list);
>> + rcu_docallback(list);
>
> Good, you got all three of them! ;-)
>
>> list = next;
>> if (++count >= rdp->blimit)
>> break;
>> --
>> 1.6.0.6
>>
>
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-04 5:39 ` Lai Jiangshan
@ 2009-01-04 7:07 ` Manfred Spraul
2009-01-04 8:30 ` Lai Jiangshan
0 siblings, 1 reply; 16+ messages in thread
From: Manfred Spraul @ 2009-01-04 7:07 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: linux-kernel, paulmck, akpm
Lai Jiangshan wrote:
> Hi, Manfred Spraul,
>
> Thanks you for re-propose kfree_rcu().
> I NAK. this patch.
>
> Comparing with my patch(V1), your patch has several improvements.
> But core idea is the same as my patch.
> This idea(share offset with function pointer) has some disadvantages.
> And I had implemented V2 and the patches for using kfree_rcu().
> The V2 patch has not tested enough, but It will be done in one month.
>
Did you already post V2, did you cc me?
I couldn't find it in my e-mails.
--
Manfred
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-04 7:07 ` Manfred Spraul
@ 2009-01-04 8:30 ` Lai Jiangshan
2009-01-04 12:22 ` Manfred Spraul
0 siblings, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2009-01-04 8:30 UTC (permalink / raw)
To: Manfred Spraul; +Cc: linux-kernel, paulmck, akpm
Manfred Spraul wrote:
> Lai Jiangshan wrote:
>> Hi, Manfred Spraul,
>>
>> Thanks you for re-propose kfree_rcu().
>> I NAK. this patch.
>>
>> Comparing with my patch(V1), your patch has several improvements.
>> But core idea is the same as my patch.
>> This idea(share offset with function pointer) has some disadvantages.
>> And I had implemented V2 and the patches for using kfree_rcu().
>> The V2 patch has not tested enough, but It will be done in one month.
>>
> Did you already post V2, did you cc me?
> I couldn't find it in my e-mails.
>
I have not posted it. -:)
Lai
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-04 8:30 ` Lai Jiangshan
@ 2009-01-04 12:22 ` Manfred Spraul
2009-01-04 20:06 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Manfred Spraul @ 2009-01-04 12:22 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: linux-kernel, paulmck, akpm
Lai Jiangshan wrote:
> I have not posted it. -:)
>
Could you post it?
Paul: What would break if we stop processing rcu entries in (cpu) order?
The head->func(head) in rcu_do_batch() is probably a nightmare for the
branch target predictor.
What about:
- shrinking struct rcu_head to just a pointer (let's start with the goodie)
- Adding a register_rcu_callback() function.
It allocates the per-cpu storage for the rcu grace period lists.
Seperate lists for each registered callback - thus no need to copy the
callback target into each rcu_head structure.
It returns a pointer/handle to these lists.
- call_rcu gets that handle instead of the plain function pointer.
- rcu_do_batch enumerates all registered callbacks. Thus first all
callback_struct->func(head) calls for the first registered callback,
then the calls for the 2nd callback, etc.
Better for the icache, better for the branch predictor.
Paul: Do you have a test case that is suitable for benchmarking rcu?
Any workloads were rcu appears significantly in oprofile?
And: Do you know how many rcu entries are typically alive? How much
memory is used for the function pointers?
--
Manfred
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-04 5:50 ` Lai Jiangshan
@ 2009-01-04 19:48 ` Paul E. McKenney
2009-01-05 2:48 ` Lai Jiangshan
2009-01-05 7:15 ` Manfred Spraul
0 siblings, 2 replies; 16+ messages in thread
From: Paul E. McKenney @ 2009-01-04 19:48 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: Manfred Spraul, linux-kernel, akpm
On Sun, Jan 04, 2009 at 01:50:17PM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > On Fri, Jan 02, 2009 at 12:25:58PM +0100, Manfred Spraul wrote:
> >> Based on the idea from Lai Jiangshan:
> >>
> >> There are 23 instances where the rcu callback just does
> >>
> >> kfree(containerof(rcu_member,struct whatever_struct,head));
> >>
> >> The 23 instances exist because there are 23 'struct whatever_struct' with
> >> their individual rcu_member.
> >>
> >> The attached patch creates a generic kfree_rcu function that removes the need
> >> for these 23 helpers: The offset from the container struct to the rcu member
> >> is calculated at compile time and stored in head->func instead of the
> >> function pointer.
> >
> > OK, these seem to me to avoid Nick Piggin's vfree() question in response
> > to a roughly similar posting from Lai Jiangshan on November 18th
> > (http://lkml.org/lkml/2008/11/18/83).
>
> Why?
Nick seemed to me to be objecting to vfree() being involved, and
Manfred's patch does not do vfree(). Hence "avoid" rather than "fix".
> I think these two are different questions.
> vfree() still can not be called from softirq context now.
> And I proposed vfree_atomic() for RCU, but it can not be accepted.
And one would indeed either need to have a vfree_atomic() or have some
mechanism that sent the vfree() to a workqueue or some such.
In any case, I do apologize -- since I didn't see anything from you I
incorrectly assumed that you had given up on this patch. Please accept
my apologies!
Would it be possible for you and Manfred to agree on a common patch, and
to have one of you submit it with both of you having Signed-off-by on it?
Thanx, Paul
> Lai
>
> >
> >> Effect on .text: (x86_64)
> >> +23 bytes in rcutree.c
> >> - 9 bytes in ipc/sem.c
> >>
> >> Depending on the number of call_rcu instances in the .config, adding
> >> kfree_rcu should save between 100 and 200 bytes in .text.
> >>
> >> What do you think?
> >
> > I am in favor, given changes noted below. This is a much nicer way to
> > implement some function in the original RCU patch for Linux.
> >
> >> Is the interface ok? Incorrect use should generate compile time errors.
> >> Is it ok to use IS_ERR(p)?
> >
> > I would suggest instead using the bottom bit to differentiate between
> > these two cases, especially given that your approach makes it impossible
> > for callback processing to notice a NULL function pointer. In addition,
> > this approach would allow different types of allocators to be specified
> > should this later prove to be helpful. You should not have to shift the
> > offset because the rcu_head offset should always be a multiple of four
> > (or eight on 64-bit architectures).
> >
> > And we really are running into bugs that are detected by RCU's seeing a
> > null function pointer in the rcu_head structure at callback-invocation
> > time. So, whatever encoding you choose, please leave a function-pointer
> > value of zero as an invalid value!
> >
> >> And: How many seperate patches should I create?
> >> I would propose 24 patches: one with the interface, 23 that
> >> replace the individual callbacks.
> >
> > I defer to the various maintainers on this one. ;-)
> >
> > Thanx, Paul
> >
> >> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> >> ---
> >> include/linux/rcupdate.h | 33 +++++++++++++++++++++++++++++++++
> >> ipc/sem.c | 10 ++--------
> >> kernel/rcuclassic.c | 2 +-
> >> kernel/rcupreempt.c | 2 +-
> >> kernel/rcutree.c | 2 +-
> >> 5 files changed, 38 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >> index 1168fbc..d120014 100644
> >> --- a/include/linux/rcupdate.h
> >> +++ b/include/linux/rcupdate.h
> >> @@ -269,6 +269,29 @@ extern void call_rcu(struct rcu_head *head,
> >> extern void call_rcu_bh(struct rcu_head *head,
> >> void (*func)(struct rcu_head *head));
> >>
> >> +/**
> >> + * kfree_rcu - kfree after a grace period
> >> + * @ptr: pointer to kfree
> >> + * @rcu_member: name of the struct rcu_head member.
> >> + *
> >> + * Many rcu callbacks just call kfree() on the base structure. This helper
> >> + * function calls kfree internally. The rcu_head structure must be embedded
> >> + * in the to be freed structure.
> >> + */
> >> +
> >> +static inline void __kfree_rcu(void *prcu, unsigned long offset)
> >> +{
> >> + void (*pfnc)(struct rcu_head *head) = (void (*)(struct rcu_head *head))offset;
> >> +
> >> + BUILD_BUG_ON(!__builtin_constant_p(-offset));
> >> + BUILD_BUG_ON(!IS_ERR(pfnc));
> >
> > This allows us to drop the above check in favor of something like the
> > following above (though a symbolic value in place of 0x1 would be nice):
> >
> > void (*pfnc)(struct rcu_head *head) = (void (*)(struct rcu_head *head))(offset | 0x1);
> >
> >> + call_rcu(prcu, pfnc);
> >> +}
> >> +
> >> +#define kfree_rcu(pobj, entry) \
> >> + __kfree_rcu(&((pobj)->entry), -offsetof(typeof(*(pobj)), entry))
> >> +
> >> /* Exported common interfaces */
> >> extern void synchronize_rcu(void);
> >> extern void rcu_barrier(void);
> >> @@ -279,4 +302,14 @@ extern void rcu_barrier_sched(void);
> >> extern void rcu_init(void);
> >> extern int rcu_needs_cpu(int cpu);
> >>
> >> +/* helper functions */
> >> +static inline void rcu_docallback(struct rcu_head *head)
> >> +{
> >
> > Experience indicates that we should have some sort of WARN_ON() in
> > the case of a NULL rcu_head, either here or where this is called. :-/
> >
> >> + if (IS_ERR(head->func)) {
> >
> > And in this "if" statement, just check the bottom bit.
> >
> >> + kfree(((char *)head) + (unsigned long)head->func);
> >> + } else {
> >> + head->func(head);
> >> + }
> >> +}
> >> +
> >> #endif /* __LINUX_RCUPDATE_H */
> >> diff --git a/ipc/sem.c b/ipc/sem.c
> >> index 0821224..4a68cf2 100644
> >> --- a/ipc/sem.c
> >> +++ b/ipc/sem.c
> >> @@ -504,12 +504,6 @@ static int count_semzcnt (struct sem_array * sma, ushort semnum)
> >> return semzcnt;
> >> }
> >>
> >> -static void free_un(struct rcu_head *head)
> >> -{
> >> - struct sem_undo *un = container_of(head, struct sem_undo, rcu);
> >> - kfree(un);
> >> -}
> >> -
> >> /* Free a semaphore set. freeary() is called with sem_ids.rw_mutex locked
> >> * as a writer and the spinlock for this semaphore set hold. sem_ids.rw_mutex
> >> * remains locked on exit.
> >> @@ -528,7 +522,7 @@ static void freeary(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
> >> un->semid = -1;
> >> list_del_rcu(&un->list_proc);
> >> spin_unlock(&un->ulp->lock);
> >> - call_rcu(&un->rcu, free_un);
> >> + kfree_rcu(un, rcu);
> >
> > This -is- nice! Short and sweet!!!
> >
> >> }
> >>
> >> /* Wake up all pending processes and let them fail with EIDRM. */
> >> @@ -1347,7 +1341,7 @@ void exit_sem(struct task_struct *tsk)
> >> update_queue(sma);
> >> sem_unlock(sma);
> >>
> >> - call_rcu(&un->rcu, free_un);
> >> + kfree_rcu(un, rcu);
> >> }
> >> kfree(ulp);
> >> }
> >> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
> >> index e503a00..b2f97d2 100644
> >> --- a/kernel/rcuclassic.c
> >> +++ b/kernel/rcuclassic.c
> >> @@ -336,7 +336,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >> while (list) {
> >> next = list->next;
> >> prefetch(next);
> >> - list->func(list);
> >> + rcu_docallback(list);
> >> list = next;
> >> if (++count >= rdp->blimit)
> >> break;
> >> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
> >> index 0498265..8440dac 100644
> >> --- a/kernel/rcupreempt.c
> >> +++ b/kernel/rcupreempt.c
> >> @@ -1110,7 +1110,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
> >> spin_unlock_irqrestore(&rdp->lock, flags);
> >> while (list) {
> >> next = list->next;
> >> - list->func(list);
> >> + rcu_docallback(list);
> >> list = next;
> >> RCU_TRACE_ME(rcupreempt_trace_invoke);
> >> }
> >> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> >> index a342b03..29c88ce 100644
> >> --- a/kernel/rcutree.c
> >> +++ b/kernel/rcutree.c
> >> @@ -901,7 +901,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >> while (list) {
> >> next = list->next;
> >> prefetch(next);
> >> - list->func(list);
> >> + rcu_docallback(list);
> >
> > Good, you got all three of them! ;-)
> >
> >> list = next;
> >> if (++count >= rdp->blimit)
> >> break;
> >> --
> >> 1.6.0.6
> >>
> >
> >
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-04 12:22 ` Manfred Spraul
@ 2009-01-04 20:06 ` Paul E. McKenney
2009-01-12 17:43 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2009-01-04 20:06 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Lai Jiangshan, linux-kernel, akpm
On Sun, Jan 04, 2009 at 01:22:00PM +0100, Manfred Spraul wrote:
> Lai Jiangshan wrote:
>> I have not posted it. -:)
>>
> Could you post it?
>
> Paul: What would break if we stop processing rcu entries in (cpu) order?
If I understand, you are suggesting that a given CPU process its RCU
callbacks out of order. This would break rcu_barrier(), so please do
not do this.
If I misunderstood what you are suggesting, please enlighten me!
> The head->func(head) in rcu_do_batch() is probably a nightmare for the
> branch target predictor.
>
> What about:
> - shrinking struct rcu_head to just a pointer (let's start with the goodie)
> - Adding a register_rcu_callback() function.
> It allocates the per-cpu storage for the rcu grace period lists.
> Seperate lists for each registered callback - thus no need to copy the
> callback target into each rcu_head structure.
> It returns a pointer/handle to these lists.
> - call_rcu gets that handle instead of the plain function pointer.
> - rcu_do_batch enumerates all registered callbacks. Thus first all
> callback_struct->func(head) calls for the first registered callback, then
> the calls for the 2nd callback, etc.
> Better for the icache, better for the branch predictor.
Hmmm... I guess that rcu_barrier() could put a callback on each of the
resulting per-CPU lists for each CPU. Making rcu_barrier() more
expensive is probably not a problem. But there would need to be a way
of marking rcu_barrier()'s rcu_head structures, perhaps the bottom bit
of the pointer (shudder!).
The rcu_offline code will of course need to traverse these lists in
order to move the callbacks from an outgoing CPU.
It would also be necessary to inspect the current call_rcu() invocations
in the kernel (not too big a job, as there are only about 100 of them).
If there are any that rely on callbacks being invoked in order, these
would need to be addressed if we are to do something like what you
are suggesting. I do not recall ever suggesting that people rely on
such ordering, but given that people can read the code and see that
rcu_barrier() already relies on it...
So if we do go this way, we will need to update the documentation.
The deep embedded guys would like a single-pointer rcu_head, and your
approach seems better than the one I came up with a couple of years ago
on page 11 of:
http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf
At least assuming that the problems can be resolved.
I don't see how this helps the icache at all, but could see how it might
help branch prediction.
> Paul: Do you have a test case that is suitable for benchmarking rcu?
> Any workloads were rcu appears significantly in oprofile?
> And: Do you know how many rcu entries are typically alive? How much memory
> is used for the function pointers?
The test cases I know of are those used to validate the performance of
various RCU patches, most of which have been quite insensitive to the
update-side overhead. The only workloads that I am aware of where RCU
update-side processing shows up are those running on hundreds of CPUs
(hence hierarchical RCU). Some workloads have many thousands of RCU
callbacks in flight -- I believe that Dipankar Sarma measured something
like 1600 per grace period on a file-system benchmark some years back.
The amount of memory used for the function pointers can be large, though
many cases now union this space with other storage (e.g., struct dentry).
The deep embedded guys have worried about it in the past, though I have
not heard much from them in the past few years -- something about even
cellphones having hundreds of megabytes of DRAM, I guess. ;-)
So, in short, I am not sure that this will be worth the increase in code
complexity, but it does sound like an interesting possibility.
Thanx, Paul
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-04 19:48 ` Paul E. McKenney
@ 2009-01-05 2:48 ` Lai Jiangshan
2009-01-05 4:40 ` Paul E. McKenney
2009-01-05 7:15 ` Manfred Spraul
1 sibling, 1 reply; 16+ messages in thread
From: Lai Jiangshan @ 2009-01-05 2:48 UTC (permalink / raw)
To: paulmck; +Cc: Manfred Spraul, linux-kernel, akpm
Paul E. McKenney wrote:
>
> In any case, I do apologize -- since I didn't see anything from you I
> incorrectly assumed that you had given up on this patch. Please accept
> my apologies!
It's my fault.
I'm a very low-yield developer.
I have given up on unaccepted cleanup-patches, kfree_rcu() is not a
cleanup for reducing lines of code, it has some significances for kernel.
>
> Would it be possible for you and Manfred to agree on a common patch, and
> to have one of you submit it with both of you having Signed-off-by on it?
>
All are OK.
I NAK-ed only for deferring kfree_rcu() being accepted.
I'm not concerned about whether I'm patches' Author or I'm in the
Signed-off-by list.
Thanks,
Lai.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-05 2:48 ` Lai Jiangshan
@ 2009-01-05 4:40 ` Paul E. McKenney
2009-01-06 22:17 ` Paul E. McKenney
0 siblings, 1 reply; 16+ messages in thread
From: Paul E. McKenney @ 2009-01-05 4:40 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: Manfred Spraul, linux-kernel, akpm
On Mon, Jan 05, 2009 at 10:48:11AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> >
> > In any case, I do apologize -- since I didn't see anything from you I
> > incorrectly assumed that you had given up on this patch. Please accept
> > my apologies!
>
> It's my fault.
>
> I'm a very low-yield developer.
Well, I am certainly not the highest-yield developer around, and I have
probably put in at least 30,000 hours of programming over the past 35
years, and probably half of those between 1981 and 1985. Of course,
a fair fraction of those hours were in languages and environments that
are pretty much irrelevant these days. Nevertheless, my guess is that
you need to invest about 10,000 hours to really master programming, which
works out to about five years at 40 hours per week of doing nothing but
designing, coding, and debugging.
So please don't give up!
> I have given up on unaccepted cleanup-patches, kfree_rcu() is not a
> cleanup for reducing lines of code, it has some significances for kernel.
I agree that it would be a nice addition.
> > Would it be possible for you and Manfred to agree on a common patch, and
> > to have one of you submit it with both of you having Signed-off-by on it?
>
> All are OK.
>
> I NAK-ed only for deferring kfree_rcu() being accepted.
>
> I'm not concerned about whether I'm patches' Author or I'm in the
> Signed-off-by list.
I greatly appreciate the fact that you are more interested in the
quality of the Linux kernel than in credit. That said, I would like the
solution to be something that both you and Manfred agree on.
Thanx, Paul
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-04 19:48 ` Paul E. McKenney
2009-01-05 2:48 ` Lai Jiangshan
@ 2009-01-05 7:15 ` Manfred Spraul
1 sibling, 0 replies; 16+ messages in thread
From: Manfred Spraul @ 2009-01-05 7:15 UTC (permalink / raw)
To: paulmck; +Cc: Lai Jiangshan, linux-kernel, akpm
Paul E. McKenney wrote:
>
>> I think these two are different questions.
>> vfree() still can not be called from softirq context now.
>> And I proposed vfree_atomic() for RCU, but it can not be accepted.
>>
>
> And one would indeed either need to have a vfree_atomic() or have some
> mechanism that sent the vfree() to a workqueue or some such.
>
ipc/util.c uses a workqueue (ipc_schedule_free):
The semaphore array structure can be large (documented with SEMMSL=8000
around 16 kB, but there is no hard limit), thus vmalloc is used.
But as long as there are just one or two users, I doubt that a generic
system is worth the effort.
--
Manfred
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-05 4:40 ` Paul E. McKenney
@ 2009-01-06 22:17 ` Paul E. McKenney
0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2009-01-06 22:17 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: Manfred Spraul, linux-kernel, akpm
On Sun, Jan 04, 2009 at 08:40:23PM -0800, Paul E. McKenney wrote:
> On Mon, Jan 05, 2009 at 10:48:11AM +0800, Lai Jiangshan wrote:
> > Paul E. McKenney wrote:
> > >
> > > In any case, I do apologize -- since I didn't see anything from you I
> > > incorrectly assumed that you had given up on this patch. Please accept
> > > my apologies!
> >
> > It's my fault.
> >
> > I'm a very low-yield developer.
>
> Well, I am certainly not the highest-yield developer around, and I have
> probably put in at least 30,000 hours of programming over the past 35
> years, and probably half of those between 1981 and 1985. Of course,
> a fair fraction of those hours were in languages and environments that
> are pretty much irrelevant these days. Nevertheless, my guess is that
> you need to invest about 10,000 hours to really master programming, which
> works out to about five years at 40 hours per week of doing nothing but
> designing, coding, and debugging.
>
> So please don't give up!
Two more things:
1. I believe that reviewing code counts towards the 10,000 hours.
Just in case you were curious. ;-)
2. To help prevent more collisions, I have posted a rough RCU
to-do list at:
http://www.kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html.
Thanx, Paul
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC, PATCH] kernel/rcu: add kfree_rcu
2009-01-04 20:06 ` Paul E. McKenney
@ 2009-01-12 17:43 ` Paul E. McKenney
0 siblings, 0 replies; 16+ messages in thread
From: Paul E. McKenney @ 2009-01-12 17:43 UTC (permalink / raw)
To: Manfred Spraul; +Cc: Lai Jiangshan, linux-kernel, akpm
On Sun, Jan 04, 2009 at 12:06:58PM -0800, Paul E. McKenney wrote:
> On Sun, Jan 04, 2009 at 01:22:00PM +0100, Manfred Spraul wrote:
> > Lai Jiangshan wrote:
> >> I have not posted it. -:)
> >>
> > Could you post it?
> >
> > Paul: What would break if we stop processing rcu entries in (cpu) order?
>
> If I understand, you are suggesting that a given CPU process its RCU
> callbacks out of order. This would break rcu_barrier(), so please do
> not do this.
>
> If I misunderstood what you are suggesting, please enlighten me!
One other thing that might be really cool is for memory freed via RCU to
be treated as if it was cache-cold, which it is unless the RCU callback
needs to write to the memory block. In the case of kfree_rcu(), the
callback should not need to do writes, so it might make sense to handle
the block differently than the typical hot-in-cache free.
Thanx, Paul
> > The head->func(head) in rcu_do_batch() is probably a nightmare for the
> > branch target predictor.
> >
> > What about:
> > - shrinking struct rcu_head to just a pointer (let's start with the goodie)
> > - Adding a register_rcu_callback() function.
> > It allocates the per-cpu storage for the rcu grace period lists.
> > Seperate lists for each registered callback - thus no need to copy the
> > callback target into each rcu_head structure.
> > It returns a pointer/handle to these lists.
> > - call_rcu gets that handle instead of the plain function pointer.
> > - rcu_do_batch enumerates all registered callbacks. Thus first all
> > callback_struct->func(head) calls for the first registered callback, then
> > the calls for the 2nd callback, etc.
> > Better for the icache, better for the branch predictor.
>
> Hmmm... I guess that rcu_barrier() could put a callback on each of the
> resulting per-CPU lists for each CPU. Making rcu_barrier() more
> expensive is probably not a problem. But there would need to be a way
> of marking rcu_barrier()'s rcu_head structures, perhaps the bottom bit
> of the pointer (shudder!).
>
> The rcu_offline code will of course need to traverse these lists in
> order to move the callbacks from an outgoing CPU.
>
> It would also be necessary to inspect the current call_rcu() invocations
> in the kernel (not too big a job, as there are only about 100 of them).
> If there are any that rely on callbacks being invoked in order, these
> would need to be addressed if we are to do something like what you
> are suggesting. I do not recall ever suggesting that people rely on
> such ordering, but given that people can read the code and see that
> rcu_barrier() already relies on it...
>
> So if we do go this way, we will need to update the documentation.
>
> The deep embedded guys would like a single-pointer rcu_head, and your
> approach seems better than the one I came up with a couple of years ago
> on page 11 of:
>
> http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf
>
> At least assuming that the problems can be resolved.
>
> I don't see how this helps the icache at all, but could see how it might
> help branch prediction.
>
> > Paul: Do you have a test case that is suitable for benchmarking rcu?
> > Any workloads were rcu appears significantly in oprofile?
> > And: Do you know how many rcu entries are typically alive? How much memory
> > is used for the function pointers?
>
> The test cases I know of are those used to validate the performance of
> various RCU patches, most of which have been quite insensitive to the
> update-side overhead. The only workloads that I am aware of where RCU
> update-side processing shows up are those running on hundreds of CPUs
> (hence hierarchical RCU). Some workloads have many thousands of RCU
> callbacks in flight -- I believe that Dipankar Sarma measured something
> like 1600 per grace period on a file-system benchmark some years back.
>
> The amount of memory used for the function pointers can be large, though
> many cases now union this space with other storage (e.g., struct dentry).
> The deep embedded guys have worried about it in the past, though I have
> not heard much from them in the past few years -- something about even
> cellphones having hundreds of megabytes of DRAM, I guess. ;-)
>
> So, in short, I am not sure that this will be worth the increase in code
> complexity, but it does sound like an interesting possibility.
>
> Thanx, Paul
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2009-01-12 17:43 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-02 11:25 [RFC, PATCH] kernel/rcu: add kfree_rcu Manfred Spraul
2009-01-02 18:55 ` Paul E. McKenney
2009-01-03 14:59 ` Manfred Spraul
2009-01-03 23:46 ` Paul E. McKenney
2009-01-04 5:50 ` Lai Jiangshan
2009-01-04 19:48 ` Paul E. McKenney
2009-01-05 2:48 ` Lai Jiangshan
2009-01-05 4:40 ` Paul E. McKenney
2009-01-06 22:17 ` Paul E. McKenney
2009-01-05 7:15 ` Manfred Spraul
2009-01-04 5:39 ` Lai Jiangshan
2009-01-04 7:07 ` Manfred Spraul
2009-01-04 8:30 ` Lai Jiangshan
2009-01-04 12:22 ` Manfred Spraul
2009-01-04 20:06 ` Paul E. McKenney
2009-01-12 17:43 ` Paul E. McKenney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).