* [RFC PATCH] rcu: introduce kfree_rcu()
@ 2008-09-18 4:18 Lai Jiangshan
2008-09-18 4:37 ` Andrew Morton
2008-09-18 6:44 ` Paul E. McKenney
0 siblings, 2 replies; 12+ messages in thread
From: Lai Jiangshan @ 2008-09-18 4:18 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linux Kernel Mailing List, Paul E. McKenney, Dipankar Sarma,
Andrew Morton, Peter Zijlstra, manfred
sometimes a rcu callback is just calling kfree() to free a struct's memory
(we say this callback is a trivial callback.).
this patch introduce kfree_rcu() to do these things directly, easily.
There are 4 reasons that we need kfree_rcu():
1) unloadable modules:
a module(rcu callback is defined in this module) using rcu must
call rcu_barrier() when unload. rcu_barrier() will increase
the system's overhead(the more cpus the worse) and
rcu_barrier() is very time-consuming. if all rcu callback defined
in this module are trivial callback, we can just call kfree_rcu()
instead, save a rcu_barrier() when unload.
2) duplicate code:
all trivial callback are duplicate code though the structs to be freed
are different. it's just a container_of() and a kfree().
There are about 50% callbacks are trivial callbacks for call_rcu() in
current kernel code.
3) cache:
the instructions of trivial callback is not in the cache supposedly.
calling a trivial callback will let to cache missing very likely.
the more trivial callback the more cache missing. OK, this is
not a problem now or in a few days: Only less than 1% trivial callback
are called in running kernel.
4) future:
the number of user of rcu is increasing. new code for rcu is
trivial callback very likely. it means more modules using rcu
and more duplicate code(may come to 90% of callbacks is trivial
callbacks) and more cache missing.
Implementation:
there were a lot of ideas came out when i implemented kfree_rcu().
I chose the simplest one as this patch shows. but these implementation
may cannot be used for to free a struct larger than 16KBytes.
kfree_rcu_bh()? kfree_rcu_sched()?
these two are not need current. call_rcu_bh() & call_rcu_sched()
are hardly be called(and hardly be called for trivial callback).
vfree_rcu()?
No, vfree() is not atomic function, will not be called in softirq.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index e8b4039..04c654f 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
extern void rcu_init(void);
extern int rcu_needs_cpu(int cpu);
+#define __KFREE_RCU_MAX_OFFSET 4095
+#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
+
+#define __rcu_reclaim(head) \
+do { \
+ unsigned long __offset = (unsigned long)head->func; \
+ if (__offset <= __KFREE_RCU_MAX_OFFSET) \
+ kfree((void *)head - sizeof(void *) * __offset); \
+ else \
+ head->func(head); \
+} while(0)
+
+
+/**
+ * kfree_rcu - free previously allocated memory after a grace period.
+ * @ptr: pointer returned by kmalloc.
+ * @head: structure to be used for queueing the RCU updates. This structure
+ * is a part of previously allocated memory @ptr.
+ */
+extern void kfree_rcu(const void *ptr, struct rcu_head *head);
+
#endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
index aad93cd..5a14190 100644
--- a/kernel/rcuclassic.c
+++ b/kernel/rcuclassic.c
@@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
while (list) {
next = list->next;
prefetch(next);
- list->func(list);
+ __rcu_reclaim(list);
list = next;
if (++count >= rdp->blimit)
break;
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index 467d594..aa9b56a 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
}
EXPORT_SYMBOL_GPL(rcu_barrier_sched);
+void kfree_rcu(const void *ptr, struct rcu_head *head)
+{
+ unsigned long offset;
+ typedef void (*rcu_callback)(struct rcu_head *);
+
+ offset = (void *)head - (void *)ptr;
+ BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
+
+ call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
+}
+EXPORT_SYMBOL_GPL(kfree_rcu);
+
void __init rcu_init(void)
{
__rcu_init();
diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
index 2782793..62a9e54 100644
--- a/kernel/rcupreempt.c
+++ b/kernel/rcupreempt.c
@@ -1108,7 +1108,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
spin_unlock_irqrestore(&rdp->lock, flags);
while (list) {
next = list->next;
- list->func(list);
+ __rcu_reclaim(list);
list = next;
RCU_TRACE_ME(rcupreempt_trace_invoke);
}
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 4:18 [RFC PATCH] rcu: introduce kfree_rcu() Lai Jiangshan
@ 2008-09-18 4:37 ` Andrew Morton
2008-09-18 16:52 ` Manfred Spraul
2008-09-19 2:31 ` Lai Jiangshan
2008-09-18 6:44 ` Paul E. McKenney
1 sibling, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2008-09-18 4:37 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Ingo Molnar, Linux Kernel Mailing List, Paul E. McKenney,
Dipankar Sarma, Peter Zijlstra, manfred
On Thu, 18 Sep 2008 12:18:28 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
> sometimes a rcu callback is just calling kfree() to free a struct's memory
> (we say this callback is a trivial callback.).
> this patch introduce kfree_rcu() to do these things directly, easily.
>
> There are 4 reasons that we need kfree_rcu():
>
> 1) unloadable modules:
> a module(rcu callback is defined in this module) using rcu must
> call rcu_barrier() when unload. rcu_barrier() will increase
> the system's overhead(the more cpus the worse) and
> rcu_barrier() is very time-consuming. if all rcu callback defined
> in this module are trivial callback, we can just call kfree_rcu()
> instead, save a rcu_barrier() when unload.
>
> 2) duplicate code:
> all trivial callback are duplicate code though the structs to be freed
> are different. it's just a container_of() and a kfree().
> There are about 50% callbacks are trivial callbacks for call_rcu() in
> current kernel code.
>
> 3) cache:
> the instructions of trivial callback is not in the cache supposedly.
> calling a trivial callback will let to cache missing very likely.
> the more trivial callback the more cache missing. OK, this is
> not a problem now or in a few days: Only less than 1% trivial callback
> are called in running kernel.
>
> 4) future:
> the number of user of rcu is increasing. new code for rcu is
> trivial callback very likely. it means more modules using rcu
> and more duplicate code(may come to 90% of callbacks is trivial
> callbacks) and more cache missing.
>
> Implementation:
> there were a lot of ideas came out when i implemented kfree_rcu().
> I chose the simplest one as this patch shows. but these implementation
> may cannot be used for to free a struct larger than 16KBytes.
>
> kfree_rcu_bh()? kfree_rcu_sched()?
> these two are not need current. call_rcu_bh() & call_rcu_sched()
> are hardly be called(and hardly be called for trivial callback).
>
> vfree_rcu()?
> No, vfree() is not atomic function, will not be called in softirq.
>
This is all rather mysterious.
> ---
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index e8b4039..04c654f 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
> extern void rcu_init(void);
> extern int rcu_needs_cpu(int cpu);
>
> +#define __KFREE_RCU_MAX_OFFSET 4095
> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
> +
> +#define __rcu_reclaim(head) \
> +do { \
> + unsigned long __offset = (unsigned long)head->func; \
> + if (__offset <= __KFREE_RCU_MAX_OFFSET) \
> + kfree((void *)head - sizeof(void *) * __offset); \
> + else \
> + head->func(head); \
> +} while(0)
All the above could do with some comments explaining what it does.
Please use checkpatch.
Surely it cannot be resirable to have a "function" which can call kfree
either synchronously or asynchronously depending upon the size of the
object which was passed to it? If the caller wants rcu treatment of
the freeing then that is what the caller must be given. I mean, if the
caller can tolerate a synchrnous call to kfree() then the caller should
have directly called kfree?
I think (and pray) that the above could have been implemented as an
out-of-line C function?
> +
> +/**
> + * kfree_rcu - free previously allocated memory after a grace period.
> + * @ptr: pointer returned by kmalloc.
> + * @head: structure to be used for queueing the RCU updates. This structure
> + * is a part of previously allocated memory @ptr.
> + */
> +extern void kfree_rcu(const void *ptr, struct rcu_head *head);
rcu kerneldoc is implemented in the .c file, not in the .h file.
> #endif /* __LINUX_RCUPDATE_H */
> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
> index aad93cd..5a14190 100644
> --- a/kernel/rcuclassic.c
> +++ b/kernel/rcuclassic.c
> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> while (list) {
> next = list->next;
> prefetch(next);
> - list->func(list);
> + __rcu_reclaim(list);
> list = next;
> if (++count >= rdp->blimit)
> break;
> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> index 467d594..aa9b56a 100644
> --- a/kernel/rcupdate.c
> +++ b/kernel/rcupdate.c
> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
> }
> EXPORT_SYMBOL_GPL(rcu_barrier_sched);
>
> +void kfree_rcu(const void *ptr, struct rcu_head *head)
> +{
> + unsigned long offset;
> + typedef void (*rcu_callback)(struct rcu_head *);
> +
> + offset = (void *)head - (void *)ptr;
> + BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
> +
> + call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
How can this work? We take the difference between two pointers, divide
that by 4 or 8, then treat the resulting number as the address of an
RCU callback function.
I think I'm missing something here.
> +}
> +EXPORT_SYMBOL_GPL(kfree_rcu);
> +
> void __init rcu_init(void)
> {
> __rcu_init();
> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
> index 2782793..62a9e54 100644
> --- a/kernel/rcupreempt.c
> +++ b/kernel/rcupreempt.c
> @@ -1108,7 +1108,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
> spin_unlock_irqrestore(&rdp->lock, flags);
> while (list) {
> next = list->next;
> - list->func(list);
> + __rcu_reclaim(list);
> list = next;
> RCU_TRACE_ME(rcupreempt_trace_invoke);
> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 4:18 [RFC PATCH] rcu: introduce kfree_rcu() Lai Jiangshan
2008-09-18 4:37 ` Andrew Morton
@ 2008-09-18 6:44 ` Paul E. McKenney
2008-09-18 8:59 ` Lai Jiangshan
` (2 more replies)
1 sibling, 3 replies; 12+ messages in thread
From: Paul E. McKenney @ 2008-09-18 6:44 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Ingo Molnar, Linux Kernel Mailing List, Dipankar Sarma,
Andrew Morton, Peter Zijlstra, manfred
On Thu, Sep 18, 2008 at 12:18:28PM +0800, Lai Jiangshan wrote:
>
> sometimes a rcu callback is just calling kfree() to free a struct's memory
> (we say this callback is a trivial callback.).
> this patch introduce kfree_rcu() to do these things directly, easily.
Interesting! Please see questions and comments below.
> There are 4 reasons that we need kfree_rcu():
>
> 1) unloadable modules:
> a module(rcu callback is defined in this module) using rcu must
> call rcu_barrier() when unload. rcu_barrier() will increase
> the system's overhead(the more cpus the worse) and
> rcu_barrier() is very time-consuming. if all rcu callback defined
> in this module are trivial callback, we can just call kfree_rcu()
> instead, save a rcu_barrier() when unload.
You lost me on this one. Suppose that the following sequence of
events occurred:
a. The module invokes call_rcu() or kfree_rcu(). The callback
is queued on CPU 0.
b. Perhaps a grace period completes, and the callback is therefore
moved to CPU 0's donelist. But CPU 0 is busy, so doesn't get
around to invoking the callback. (For example, ksoftirqd.)
c. The module is unloaded, and uses kfree_rcu() instead of
rcu_barrier(). The callback is queued on CPU 1.
d. A grace period completes, and CPU 1 is relatively idle, so
invokes its callback quickly. The module is therefore unloaded.
e. CPU 0 finally gets around to executing its callback, but the
module has been unloaded, so there is nothingness where the
callback function used to be. We get an oops.
What prevents this sequence of events from happening?
> 2) duplicate code:
> all trivial callback are duplicate code though the structs to be freed
> are different. it's just a container_of() and a kfree().
> There are about 50% callbacks are trivial callbacks for call_rcu() in
> current kernel code.
Indeed! There was something similar to kfree_rcu() proposed some
years back, but it was rejected because it contained more code than
did the trivial callbacks. :-/
But there are more such callbacks these days, so might be worth
revisiting.
> 3) cache:
> the instructions of trivial callback is not in the cache supposedly.
> calling a trivial callback will let to cache missing very likely.
> the more trivial callback the more cache missing. OK, this is
> not a problem now or in a few days: Only less than 1% trivial callback
> are called in running kernel.
Reducing code footprint would be a good thing. Do you have stats on
the kernel text size, before and after?
> 4) future:
> the number of user of rcu is increasing. new code for rcu is
> trivial callback very likely. it means more modules using rcu
> and more duplicate code(may come to 90% of callbacks is trivial
> callbacks) and more cache missing.
Ditto.
> Implementation:
> there were a lot of ideas came out when i implemented kfree_rcu().
> I chose the simplest one as this patch shows. but these implementation
> may cannot be used for to free a struct larger than 16KBytes.
>
> kfree_rcu_bh()? kfree_rcu_sched()?
> these two are not need current. call_rcu_bh() & call_rcu_sched()
> are hardly be called(and hardly be called for trivial callback).
>
> vfree_rcu()?
> No, vfree() is not atomic function, will not be called in softirq.
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> ---
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index e8b4039..04c654f 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
> extern void rcu_init(void);
> extern int rcu_needs_cpu(int cpu);
>
> +#define __KFREE_RCU_MAX_OFFSET 4095
> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
> +
> +#define __rcu_reclaim(head) \
> +do { \
> + unsigned long __offset = (unsigned long)head->func; \
> + if (__offset <= __KFREE_RCU_MAX_OFFSET) \
> + kfree((void *)head - sizeof(void *) * __offset); \
> + else \
> + head->func(head); \
> +} while(0)
OK, so the idea is that structures whose rcu_head is near the front
of the structure have the offset of the rcu_head put into the
->func field instead of a pointer to the callback function?
Of course, it doesn't need to be too near the beginning of the
function...
All arches are guaranteed not to have kernel text in the low 16K
of memory (for 32-bit arches) or low 32K of memory (for 64-bit arches)?
> +/**
> + * kfree_rcu - free previously allocated memory after a grace period.
> + * @ptr: pointer returned by kmalloc.
> + * @head: structure to be used for queueing the RCU updates. This structure
> + * is a part of previously allocated memory @ptr.
> + */
> +extern void kfree_rcu(const void *ptr, struct rcu_head *head);
> +
> #endif /* __LINUX_RCUPDATE_H */
> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
> index aad93cd..5a14190 100644
> --- a/kernel/rcuclassic.c
> +++ b/kernel/rcuclassic.c
> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> while (list) {
> next = list->next;
> prefetch(next);
> - list->func(list);
> + __rcu_reclaim(list);
OK, consistent with above.
> list = next;
> if (++count >= rdp->blimit)
> break;
> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> index 467d594..aa9b56a 100644
> --- a/kernel/rcupdate.c
> +++ b/kernel/rcupdate.c
> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
> }
> EXPORT_SYMBOL_GPL(rcu_barrier_sched);
>
> +void kfree_rcu(const void *ptr, struct rcu_head *head)
> +{
> + unsigned long offset;
> + typedef void (*rcu_callback)(struct rcu_head *);
> +
> + offset = (void *)head - (void *)ptr;
> + BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
> +
> + call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
OK, so we pass in the pointer to the rcu_head structure, followed
by the offset in pointer-sized units, but with the latter cast to
a pointer to a callback function? Hmmm.... Kinky....
Then after the grace period completes, the __rcu_reclaim() sorts
things out.
> +}
> +EXPORT_SYMBOL_GPL(kfree_rcu);
> +
> void __init rcu_init(void)
> {
> __rcu_init();
> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
> index 2782793..62a9e54 100644
> --- a/kernel/rcupreempt.c
> +++ b/kernel/rcupreempt.c
> @@ -1108,7 +1108,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
> spin_unlock_irqrestore(&rdp->lock, flags);
> while (list) {
> next = list->next;
> - list->func(list);
> + __rcu_reclaim(list);
And we do this for preemptable RCU as well.
> list = next;
> RCU_TRACE_ME(rcupreempt_trace_invoke);
> }
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 6:44 ` Paul E. McKenney
@ 2008-09-18 8:59 ` Lai Jiangshan
2008-09-18 17:15 ` Paul E. McKenney
2008-09-18 16:56 ` Manfred Spraul
2008-09-19 1:04 ` Lai Jiangshan
2 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2008-09-18 8:59 UTC (permalink / raw)
To: paulmck
Cc: Ingo Molnar, Linux Kernel Mailing List, Dipankar Sarma,
Andrew Morton, Peter Zijlstra, manfred
How to usage kfree_rcu:
struct my_struct {
int data;
struct rcu_head rcu;
};
----------------original code:--------------------------
void my_struct_release_rcu(struct rcu_head *rcu)
{
struct my_struct *p;
item = container_of(rcu, struct my_struct, rcu);
kfree(p);
}
void some_fuction()
{
struct my_struct *p;
.....;
call_rcu(&p->rcu, my_struct_release_rcu);
.....;
}
---end---
-----------------after use kfree_rcu:--------------------
/* my_struct_release_rcu() was removed */
void some_fuction()
{
struct my_struct *p;
.....;
kfree_rcu(p, &p->rcu);
.....;
}
---end---
1) unloadable modules:
A) use my_struct_release_rcu():
when we unload this modules, we need call rcu_barrier() to wait
all my_struct_release_rcu() had called.
B) use kfree_rcu():
if all trivial callback are removed and kfree_rcu() are used instead,
we do not need to wait anything. just quick finish unloading.
2) duplicate code:
A) use my_struct_release_rcu():
All trivial callback are very like my_struct_release_rcu(),
all are duplicate code.
B) use kfree_rcu():
all trivial callback are removed, not duplicate code like
my_struct_release_rcu().
3) cache:
A) use my_struct_release_rcu():
my_struct_release_rcu() is called rarely, when my_struct_release_rcu()
is being called, cache missing will occur.
B) use kfree_rcu():
my_struct_release_rcu() is removed, not such cache missing.
4) future:
A) use my_struct_release_rcu():
when new user use rcu, the most callback is trivial callback
like my_struct_release_rcu(). this is the common of using rcu.
so the problems of above are more and more heavy.
B) use kfree_rcu():
fix these problems for ever.
Paul E. McKenney wrote:
> On Thu, Sep 18, 2008 at 12:18:28PM +0800, Lai Jiangshan wrote:
>> sometimes a rcu callback is just calling kfree() to free a struct's memory
>> (we say this callback is a trivial callback.).
>> this patch introduce kfree_rcu() to do these things directly, easily.
>
> Interesting! Please see questions and comments below.
>
>> There are 4 reasons that we need kfree_rcu():
>>
>> 1) unloadable modules:
>> a module(rcu callback is defined in this module) using rcu must
>> call rcu_barrier() when unload. rcu_barrier() will increase
>> the system's overhead(the more cpus the worse) and
>> rcu_barrier() is very time-consuming. if all rcu callback defined
>> in this module are trivial callback, we can just call kfree_rcu()
>> instead, save a rcu_barrier() when unload.
>
> You lost me on this one. Suppose that the following sequence of
> events occurred:
>
> a. The module invokes call_rcu() or kfree_rcu(). The callback
> is queued on CPU 0.
>
> b. Perhaps a grace period completes, and the callback is therefore
> moved to CPU 0's donelist. But CPU 0 is busy, so doesn't get
> around to invoking the callback. (For example, ksoftirqd.)
>
> c. The module is unloaded, and uses kfree_rcu() instead of
> rcu_barrier(). The callback is queued on CPU 1.
uses kfree_rcu() instead of trivial callback, not rcu_barrier()
>
> d. A grace period completes, and CPU 1 is relatively idle, so
> invokes its callback quickly. The module is therefore unloaded.
>
> e. CPU 0 finally gets around to executing its callback, but the
> module has been unloaded, so there is nothingness where the
> callback function used to be. We get an oops.
>
we done need wait anything if not callback is defined in this module.
> What prevents this sequence of events from happening?
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 4:37 ` Andrew Morton
@ 2008-09-18 16:52 ` Manfred Spraul
2008-09-19 2:31 ` Lai Jiangshan
1 sibling, 0 replies; 12+ messages in thread
From: Manfred Spraul @ 2008-09-18 16:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Lai Jiangshan, Ingo Molnar, Linux Kernel Mailing List,
Paul E. McKenney, Dipankar Sarma, Peter Zijlstra
Andrew Morton wrote:
> On Thu, 18 Sep 2008 12:18:28 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
>
>> sometimes a rcu callback is just calling kfree() to free a struct's memory
>> (we say this callback is a trivial callback.).
>> this patch introduce kfree_rcu() to do these things directly, easily.
>>
>> There are 4 reasons that we need kfree_rcu():
>>
>> 1) unloadable modules:
>> a module(rcu callback is defined in this module) using rcu must
>> call rcu_barrier() when unload. rcu_barrier() will increase
>> the system's overhead(the more cpus the worse) and
>> rcu_barrier() is very time-consuming. if all rcu callback defined
>> in this module are trivial callback, we can just call kfree_rcu()
>> instead, save a rcu_barrier() when unload.
>>
>> 2) duplicate code:
>> all trivial callback are duplicate code though the structs to be freed
>> are different. it's just a container_of() and a kfree().
>> There are about 50% callbacks are trivial callbacks for call_rcu() in
>> current kernel code.
>>
>> 3) cache:
>> the instructions of trivial callback is not in the cache supposedly.
>> calling a trivial callback will let to cache missing very likely.
>> the more trivial callback the more cache missing. OK, this is
>> not a problem now or in a few days: Only less than 1% trivial callback
>> are called in running kernel.
>>
>> 4) future:
>> the number of user of rcu is increasing. new code for rcu is
>> trivial callback very likely. it means more modules using rcu
>> and more duplicate code(may come to 90% of callbacks is trivial
>> callbacks) and more cache missing.
>>
>> Implementation:
>> there were a lot of ideas came out when i implemented kfree_rcu().
>> I chose the simplest one as this patch shows. but these implementation
>> may cannot be used for to free a struct larger than 16KBytes.
>>
>> kfree_rcu_bh()? kfree_rcu_sched()?
>> these two are not need current. call_rcu_bh() & call_rcu_sched()
>> are hardly be called(and hardly be called for trivial callback).
>>
>> vfree_rcu()?
>> No, vfree() is not atomic function, will not be called in softirq.
>>
>>
>
> This is all rather mysterious.
>
>
>> ---
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index e8b4039..04c654f 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
>> extern void rcu_init(void);
>> extern int rcu_needs_cpu(int cpu);
>>
>> +#define __KFREE_RCU_MAX_OFFSET 4095
>> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
>> +
>> +#define __rcu_reclaim(head) \
>> +do { \
>> + unsigned long __offset = (unsigned long)head->func; \
>> + if (__offset <= __KFREE_RCU_MAX_OFFSET) \
>> + kfree((void *)head - sizeof(void *) * __offset); \
>> + else \
>> + head->func(head); \
>> +} while(0)
>>
>
> All the above could do with some comments explaining what it does.
>
__rcu_reclaim either treats head->func as an offset for kfree or as a
function pointer.
>> #endif /* __LINUX_RCUPDATE_H */
>> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
>> index aad93cd..5a14190 100644
>> --- a/kernel/rcuclassic.c
>> +++ b/kernel/rcuclassic.c
>> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>> while (list) {
>> next = list->next;
>> prefetch(next);
>> - list->func(list);
>> + __rcu_reclaim(list);
>>
Here it's used:
the softirq that is called after the grace period calls kfree directly
instead of calling a wrapper function around kfree.
>> list = next;
>> if (++count >= rdp->blimit)
>> break;
>> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
>> index 467d594..aa9b56a 100644
>> --- a/kernel/rcupdate.c
>> +++ b/kernel/rcupdate.c
>> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
>> }
>> EXPORT_SYMBOL_GPL(rcu_barrier_sched);
>>
>> +void kfree_rcu(const void *ptr, struct rcu_head *head)
>> +{
>> + unsigned long offset;
>> + typedef void (*rcu_callback)(struct rcu_head *);
>> +
>> + offset = (void *)head - (void *)ptr;
>>
What about offset_of? the computation is known at compile time.
>> + BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
>> +
>>
I'd try to make that a compile time error. Is that possible? perhaps
with some __builtin_constant_p (head-ptr) or something like that. Or
with offset_of.
>> + call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
>>
>
> How can this work? We take the difference between two pointers, divide
> that by 4 or 8, then treat the resulting number as the address of an
> RCU callback function.
>
> I think I'm missing something here.
>
>
__rcu_reclaim() knows that function pointers < 4096 are actually offsets
for kfree.
I like the idea:
- the call to list->func() is probably very difficult to predict for a
branch target predictor.
- it's just a waste not to call kfree directly.
- I'm not sure about the implementation.
--
Manfred
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 6:44 ` Paul E. McKenney
2008-09-18 8:59 ` Lai Jiangshan
@ 2008-09-18 16:56 ` Manfred Spraul
2008-09-18 17:46 ` Paul E. McKenney
2008-09-19 1:04 ` Lai Jiangshan
2 siblings, 1 reply; 12+ messages in thread
From: Manfred Spraul @ 2008-09-18 16:56 UTC (permalink / raw)
To: paulmck
Cc: Lai Jiangshan, Ingo Molnar, Linux Kernel Mailing List,
Dipankar Sarma, Andrew Morton, Peter Zijlstra
Paul E. McKenney wrote:
> On Thu, Sep 18, 2008 at 12:18:28PM +0800, Lai Jiangshan wrote:
>
>> sometimes a rcu callback is just calling kfree() to free a struct's memory
>> (we say this callback is a trivial callback.).
>> this patch introduce kfree_rcu() to do these things directly, easily.
>>
>
> Interesting! Please see questions and comments below.
>
>
>> There are 4 reasons that we need kfree_rcu():
>>
>> 1) unloadable modules:
>> a module(rcu callback is defined in this module) using rcu must
>> call rcu_barrier() when unload. rcu_barrier() will increase
>> the system's overhead(the more cpus the worse) and
>> rcu_barrier() is very time-consuming. if all rcu callback defined
>> in this module are trivial callback, we can just call kfree_rcu()
>> instead, save a rcu_barrier() when unload.
>>
Hmm: why is rcu_barrier() sufficient to prevent races?
Offlining a cpu reorders rcu callbacks - rcu_barrier() can return before
all previous call_rcu() callbacks were called.
--
Manfred
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 8:59 ` Lai Jiangshan
@ 2008-09-18 17:15 ` Paul E. McKenney
0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2008-09-18 17:15 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Ingo Molnar, Linux Kernel Mailing List, Dipankar Sarma,
Andrew Morton, Peter Zijlstra, manfred
On Thu, Sep 18, 2008 at 04:59:33PM +0800, Lai Jiangshan wrote:
[ . . . ]
> 1) unloadable modules:
> A) use my_struct_release_rcu():
> when we unload this modules, we need call rcu_barrier() to wait
> all my_struct_release_rcu() had called.
> B) use kfree_rcu():
> if all trivial callback are removed and kfree_rcu() are used instead,
> we do not need to wait anything. just quick finish unloading.
OK, so the trick is that the module -never- uses call_rcu()
directly, instead using -only- kfree_rcu(), along perhaps also with
synchronize_rcu(). Because kfree_rcu() does not reference module text,
you then don't need to wait at all. Good point!
Thanx, Paul
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 16:56 ` Manfred Spraul
@ 2008-09-18 17:46 ` Paul E. McKenney
2008-09-19 16:03 ` Manfred Spraul
0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2008-09-18 17:46 UTC (permalink / raw)
To: Manfred Spraul
Cc: Lai Jiangshan, Ingo Molnar, Linux Kernel Mailing List,
Dipankar Sarma, Andrew Morton, Peter Zijlstra
On Thu, Sep 18, 2008 at 06:56:11PM +0200, Manfred Spraul wrote:
> Paul E. McKenney wrote:
>> On Thu, Sep 18, 2008 at 12:18:28PM +0800, Lai Jiangshan wrote:
>>
>>> sometimes a rcu callback is just calling kfree() to free a struct's
>>> memory
>>> (we say this callback is a trivial callback.).
>>> this patch introduce kfree_rcu() to do these things directly, easily.
>>>
>>
>> Interesting! Please see questions and comments below.
>>
>>
>>> There are 4 reasons that we need kfree_rcu():
>>>
>>> 1) unloadable modules:
>>> a module(rcu callback is defined in this module) using rcu must
>>> call rcu_barrier() when unload. rcu_barrier() will increase
>>> the system's overhead(the more cpus the worse) and
>>> rcu_barrier() is very time-consuming. if all rcu callback defined
>>> in this module are trivial callback, we can just call kfree_rcu()
>>> instead, save a rcu_barrier() when unload.
>>>
> Hmm: why is rcu_barrier() sufficient to prevent races?
> Offlining a cpu reorders rcu callbacks - rcu_barrier() can return before
> all previous call_rcu() callbacks were called.
The rcu_barrier() family of functions registers a callback on each CPU,
and waits until all these callbacks have been invoked. The CPU offlining
process preserves the order of the callbacks that were registered on a
given CPU. Thus, when rcu_barrier() returns, all RCU callbacks previously
registered are guaranteed to have already been invoked, regardless of
what CPUs might have been offlined and onlined in the meantime.
Thanx, Paul
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 6:44 ` Paul E. McKenney
2008-09-18 8:59 ` Lai Jiangshan
2008-09-18 16:56 ` Manfred Spraul
@ 2008-09-19 1:04 ` Lai Jiangshan
2008-09-19 3:58 ` Paul E. McKenney
2 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2008-09-19 1:04 UTC (permalink / raw)
To: paulmck
Cc: Ingo Molnar, Linux Kernel Mailing List, Dipankar Sarma,
Andrew Morton, Peter Zijlstra, manfred
Paul E. McKenney wrote:
> On Thu, Sep 18, 2008 at 12:18:28PM +0800, Lai Jiangshan wrote:
>> sometimes a rcu callback is just calling kfree() to free a struct's memory
>> (we say this callback is a trivial callback.).
>> this patch introduce kfree_rcu() to do these things directly, easily.
>
> Interesting! Please see questions and comments below.
>
>> There are 4 reasons that we need kfree_rcu():
>>
>> 1) unloadable modules:
>> a module(rcu callback is defined in this module) using rcu must
>> call rcu_barrier() when unload. rcu_barrier() will increase
>> the system's overhead(the more cpus the worse) and
>> rcu_barrier() is very time-consuming. if all rcu callback defined
>> in this module are trivial callback, we can just call kfree_rcu()
>> instead, save a rcu_barrier() when unload.
>
> You lost me on this one. Suppose that the following sequence of
> events occurred:
>
> a. The module invokes call_rcu() or kfree_rcu(). The callback
> is queued on CPU 0.
>
> b. Perhaps a grace period completes, and the callback is therefore
> moved to CPU 0's donelist. But CPU 0 is busy, so doesn't get
> around to invoking the callback. (For example, ksoftirqd.)
>
> c. The module is unloaded, and uses kfree_rcu() instead of
> rcu_barrier(). The callback is queued on CPU 1.
>
> d. A grace period completes, and CPU 1 is relatively idle, so
> invokes its callback quickly. The module is therefore unloaded.
>
> e. CPU 0 finally gets around to executing its callback, but the
> module has been unloaded, so there is nothingness where the
> callback function used to be. We get an oops.
>
> What prevents this sequence of events from happening?
We save a rcu_barrier() only when all rcu callback defined in this
module are trivial callback and we use kfree_rcu to instead them.
trivial callbacks are the most common callbacks, so some module may used
trivial callback only.
>
>> 2) duplicate code:
>> all trivial callback are duplicate code though the structs to be freed
>> are different. it's just a container_of() and a kfree().
>> There are about 50% callbacks are trivial callbacks for call_rcu() in
>> current kernel code.
>
> Indeed! There was something similar to kfree_rcu() proposed some
> years back, but it was rejected because it contained more code than
> did the trivial callbacks. :-/
>
> But there are more such callbacks these days, so might be worth
> revisiting.
>
>> 3) cache:
>> the instructions of trivial callback is not in the cache supposedly.
>> calling a trivial callback will let to cache missing very likely.
>> the more trivial callback the more cache missing. OK, this is
>> not a problem now or in a few days: Only less than 1% trivial callback
>> are called in running kernel.
>
> Reducing code footprint would be a good thing. Do you have stats on
> the kernel text size, before and after?
I did not have stats on the kernel text size, I think these cache
missing are caused by lots of different trivial callbacks in everywhere,
not too big kernel text.
>
>> 4) future:
>> the number of user of rcu is increasing. new code for rcu is
>> trivial callback very likely. it means more modules using rcu
>> and more duplicate code(may come to 90% of callbacks is trivial
>> callbacks) and more cache missing.
>
> Ditto.
>
>> Implementation:
>> there were a lot of ideas came out when i implemented kfree_rcu().
>> I chose the simplest one as this patch shows. but these implementation
>> may cannot be used for to free a struct larger than 16KBytes.
>>
>> kfree_rcu_bh()? kfree_rcu_sched()?
>> these two are not need current. call_rcu_bh() & call_rcu_sched()
>> are hardly be called(and hardly be called for trivial callback).
>>
>> vfree_rcu()?
>> No, vfree() is not atomic function, will not be called in softirq.
>>
>> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
>> ---
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index e8b4039..04c654f 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
>> extern void rcu_init(void);
>> extern int rcu_needs_cpu(int cpu);
>>
>> +#define __KFREE_RCU_MAX_OFFSET 4095
>> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
>> +
>> +#define __rcu_reclaim(head) \
>> +do { \
>> + unsigned long __offset = (unsigned long)head->func; \
>> + if (__offset <= __KFREE_RCU_MAX_OFFSET) \
>> + kfree((void *)head - sizeof(void *) * __offset); \
>> + else \
>> + head->func(head); \
>> +} while(0)
>
> OK, so the idea is that structures whose rcu_head is near the front
> of the structure have the offset of the rcu_head put into the
> ->func field instead of a pointer to the callback function?
>
> Of course, it doesn't need to be too near the beginning of the
> function...
>
> All arches are guaranteed not to have kernel text in the low 16K
> of memory (for 32-bit arches) or low 32K of memory (for 64-bit arches)?
(unsigned long)head->func is always <= 4095, not 14K or 32K.
we just guaranteed not to have kernel text in the low 4k of memory.
the real offset is (sizeof(void *) * (unsigned long)head->func),
it's 16K or 32K.
>
>> +/**
>> + * kfree_rcu - free previously allocated memory after a grace period.
>> + * @ptr: pointer returned by kmalloc.
>> + * @head: structure to be used for queueing the RCU updates. This structure
>> + * is a part of previously allocated memory @ptr.
>> + */
>> +extern void kfree_rcu(const void *ptr, struct rcu_head *head);
>> +
>> #endif /* __LINUX_RCUPDATE_H */
>> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
>> index aad93cd..5a14190 100644
>> --- a/kernel/rcuclassic.c
>> +++ b/kernel/rcuclassic.c
>> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>> while (list) {
>> next = list->next;
>> prefetch(next);
>> - list->func(list);
>> + __rcu_reclaim(list);
>
> OK, consistent with above.
>
>> list = next;
>> if (++count >= rdp->blimit)
>> break;
>> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
>> index 467d594..aa9b56a 100644
>> --- a/kernel/rcupdate.c
>> +++ b/kernel/rcupdate.c
>> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
>> }
>> EXPORT_SYMBOL_GPL(rcu_barrier_sched);
>>
>> +void kfree_rcu(const void *ptr, struct rcu_head *head)
>> +{
>> + unsigned long offset;
>> + typedef void (*rcu_callback)(struct rcu_head *);
>> +
>> + offset = (void *)head - (void *)ptr;
>> + BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
>> +
>> + call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
>
> OK, so we pass in the pointer to the rcu_head structure, followed
> by the offset in pointer-sized units, but with the latter cast to
> a pointer to a callback function? Hmmm.... Kinky....
>
> Then after the grace period completes, the __rcu_reclaim() sorts
> things out.
Yes, kernel pointers have redundant information, we use the low 4k
as offset. when ->func < 4k, it stand for offset, when ->func >= 4k,
it stand for function pointer.
>
>> +}
>> +EXPORT_SYMBOL_GPL(kfree_rcu);
>> +
>> void __init rcu_init(void)
>> {
>> __rcu_init();
>> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
>> index 2782793..62a9e54 100644
>> --- a/kernel/rcupreempt.c
>> +++ b/kernel/rcupreempt.c
>> @@ -1108,7 +1108,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
>> spin_unlock_irqrestore(&rdp->lock, flags);
>> while (list) {
>> next = list->next;
>> - list->func(list);
>> + __rcu_reclaim(list);
>
> And we do this for preemptable RCU as well.
>
>> list = next;
>> RCU_TRACE_ME(rcupreempt_trace_invoke);
>> }
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 4:37 ` Andrew Morton
2008-09-18 16:52 ` Manfred Spraul
@ 2008-09-19 2:31 ` Lai Jiangshan
1 sibling, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2008-09-19 2:31 UTC (permalink / raw)
To: Andrew Morton
Cc: Ingo Molnar, Linux Kernel Mailing List, Paul E. McKenney,
Dipankar Sarma, Peter Zijlstra, manfred
Andrew Morton wrote:
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index e8b4039..04c654f 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
>> extern void rcu_init(void);
>> extern int rcu_needs_cpu(int cpu);
>>
>> +#define __KFREE_RCU_MAX_OFFSET 4095
>> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
>> +
>> +#define __rcu_reclaim(head) \
>> +do { \
>> + unsigned long __offset = (unsigned long)head->func; \
>> + if (__offset <= __KFREE_RCU_MAX_OFFSET) \
>> + kfree((void *)head - sizeof(void *) * __offset); \
>> + else \
>> + head->func(head); \
>> +} while(0)
>
> All the above could do with some comments explaining what it does.
>
> Please use checkpatch.
>
> Surely it cannot be resirable to have a "function" which can call kfree
> either synchronously or asynchronously depending upon the size of the
> object which was passed to it? If the caller wants rcu treatment of
> the freeing then that is what the caller must be given. I mean, if the
> caller can tolerate a synchrnous call to kfree() then the caller should
> have directly called kfree?
>
> I think (and pray) that the above could have been implemented as an
> out-of-line C function?
>
__rcu_reclaim(head) is called when @head 's grace period had completed.
__rcu_reclaim(head) uses "if (__offset <= __KFREE_RCU_MAX_OFFSET)" to
check whether @head is queued by kfree_rcu() or normal call_rcu().
if @head is queued by kfree_rcu(), ((void *)head - sizeof(void *) * __offset)
is the pointer that the memory chunk need to be freed.
otherwise call head->func(head) as original code.
__rcu_reclaim has __prefix, will not be used by user, it is a helper
for rcu-machine. it is always called asynchronously. sorry for not comments
for it.
>> +
>> +/**
>> + * kfree_rcu - free previously allocated memory after a grace period.
>> + * @ptr: pointer returned by kmalloc.
>> + * @head: structure to be used for queueing the RCU updates. This structure
>> + * is a part of previously allocated memory @ptr.
>> + */
>> +extern void kfree_rcu(const void *ptr, struct rcu_head *head);
>
> rcu kerneldoc is implemented in the .c file, not in the .h file.
Good, Thanks.
>
>> #endif /* __LINUX_RCUPDATE_H */
>> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
>> index aad93cd..5a14190 100644
>> --- a/kernel/rcuclassic.c
>> +++ b/kernel/rcuclassic.c
>> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>> while (list) {
>> next = list->next;
>> prefetch(next);
>> - list->func(list);
>> + __rcu_reclaim(list);
>> list = next;
>> if (++count >= rdp->blimit)
>> break;
>> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
>> index 467d594..aa9b56a 100644
>> --- a/kernel/rcupdate.c
>> +++ b/kernel/rcupdate.c
>> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
>> }
>> EXPORT_SYMBOL_GPL(rcu_barrier_sched);
>>
>> +void kfree_rcu(const void *ptr, struct rcu_head *head)
>> +{
>> + unsigned long offset;
>> + typedef void (*rcu_callback)(struct rcu_head *);
>> +
>> + offset = (void *)head - (void *)ptr;
>> + BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
>> +
>> + call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
>
> How can this work? We take the difference between two pointers, divide
> that by 4 or 8, then treat the resulting number as the address of an
> RCU callback function.
>
> I think I'm missing something here.
We will use __rcu_reclaim(head) sort it out when it's grace period has
completed.
>
>> +}
>> +EXPORT_SYMBOL_GPL(kfree_rcu);
>> +
>> void __init rcu_init(void)
>> {
>> __rcu_init();
>> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
>> index 2782793..62a9e54 100644
>> --- a/kernel/rcupreempt.c
>> +++ b/kernel/rcupreempt.c
>> @@ -1108,7 +1108,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
>> spin_unlock_irqrestore(&rdp->lock, flags);
>> while (list) {
>> next = list->next;
>> - list->func(list);
>> + __rcu_reclaim(list);
>> list = next;
>> RCU_TRACE_ME(rcupreempt_trace_invoke);
>> }
>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-19 1:04 ` Lai Jiangshan
@ 2008-09-19 3:58 ` Paul E. McKenney
0 siblings, 0 replies; 12+ messages in thread
From: Paul E. McKenney @ 2008-09-19 3:58 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Ingo Molnar, Linux Kernel Mailing List, Dipankar Sarma,
Andrew Morton, Peter Zijlstra, manfred
On Fri, Sep 19, 2008 at 09:04:12AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > On Thu, Sep 18, 2008 at 12:18:28PM +0800, Lai Jiangshan wrote:
> >> sometimes a rcu callback is just calling kfree() to free a struct's memory
> >> (we say this callback is a trivial callback.).
> >> this patch introduce kfree_rcu() to do these things directly, easily.
> >
> > Interesting! Please see questions and comments below.
> >
> >> There are 4 reasons that we need kfree_rcu():
> >>
> >> 1) unloadable modules:
> >> a module(rcu callback is defined in this module) using rcu must
> >> call rcu_barrier() when unload. rcu_barrier() will increase
> >> the system's overhead(the more cpus the worse) and
> >> rcu_barrier() is very time-consuming. if all rcu callback defined
> >> in this module are trivial callback, we can just call kfree_rcu()
> >> instead, save a rcu_barrier() when unload.
> >
> > You lost me on this one. Suppose that the following sequence of
> > events occurred:
> >
> > a. The module invokes call_rcu() or kfree_rcu(). The callback
> > is queued on CPU 0.
> >
> > b. Perhaps a grace period completes, and the callback is therefore
> > moved to CPU 0's donelist. But CPU 0 is busy, so doesn't get
> > around to invoking the callback. (For example, ksoftirqd.)
> >
> > c. The module is unloaded, and uses kfree_rcu() instead of
> > rcu_barrier(). The callback is queued on CPU 1.
> >
> > d. A grace period completes, and CPU 1 is relatively idle, so
> > invokes its callback quickly. The module is therefore unloaded.
> >
> > e. CPU 0 finally gets around to executing its callback, but the
> > module has been unloaded, so there is nothingness where the
> > callback function used to be. We get an oops.
> >
> > What prevents this sequence of events from happening?
>
> We save a rcu_barrier() only when all rcu callback defined in this
> module are trivial callback and we use kfree_rcu to instead them.
>
> trivial callbacks are the most common callbacks, so some module may used
> trivial callback only.
Understood.
> >
> >> 2) duplicate code:
> >> all trivial callback are duplicate code though the structs to be freed
> >> are different. it's just a container_of() and a kfree().
> >> There are about 50% callbacks are trivial callbacks for call_rcu() in
> >> current kernel code.
> >
> > Indeed! There was something similar to kfree_rcu() proposed some
> > years back, but it was rejected because it contained more code than
> > did the trivial callbacks. :-/
> >
> > But there are more such callbacks these days, so might be worth
> > revisiting.
> >
> >> 3) cache:
> >> the instructions of trivial callback is not in the cache supposedly.
> >> calling a trivial callback will let to cache missing very likely.
> >> the more trivial callback the more cache missing. OK, this is
> >> not a problem now or in a few days: Only less than 1% trivial callback
> >> are called in running kernel.
> >
> > Reducing code footprint would be a good thing. Do you have stats on
> > the kernel text size, before and after?
>
> I did not have stats on the kernel text size, I think these cache
> missing are caused by lots of different trivial callbacks in everywhere,
> not too big kernel text.
The Tiny Linux guys might be interested in even a small reduction in
kernel text size.
> >> 4) future:
> >> the number of user of rcu is increasing. new code for rcu is
> >> trivial callback very likely. it means more modules using rcu
> >> and more duplicate code(may come to 90% of callbacks is trivial
> >> callbacks) and more cache missing.
> >
> > Ditto.
> >
> >> Implementation:
> >> there were a lot of ideas came out when i implemented kfree_rcu().
> >> I chose the simplest one as this patch shows. but these implementation
> >> may cannot be used for to free a struct larger than 16KBytes.
> >>
> >> kfree_rcu_bh()? kfree_rcu_sched()?
> >> these two are not need current. call_rcu_bh() & call_rcu_sched()
> >> are hardly be called(and hardly be called for trivial callback).
> >>
> >> vfree_rcu()?
> >> No, vfree() is not atomic function, will not be called in softirq.
> >>
> >> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> >> ---
> >> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >> index e8b4039..04c654f 100644
> >> --- a/include/linux/rcupdate.h
> >> +++ b/include/linux/rcupdate.h
> >> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
> >> extern void rcu_init(void);
> >> extern int rcu_needs_cpu(int cpu);
> >>
> >> +#define __KFREE_RCU_MAX_OFFSET 4095
> >> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
> >> +
> >> +#define __rcu_reclaim(head) \
> >> +do { \
> >> + unsigned long __offset = (unsigned long)head->func; \
> >> + if (__offset <= __KFREE_RCU_MAX_OFFSET) \
> >> + kfree((void *)head - sizeof(void *) * __offset); \
> >> + else \
> >> + head->func(head); \
> >> +} while(0)
> >
> > OK, so the idea is that structures whose rcu_head is near the front
> > of the structure have the offset of the rcu_head put into the
> > ->func field instead of a pointer to the callback function?
> >
> > Of course, it doesn't need to be too near the beginning of the
> > function...
> >
> > All arches are guaranteed not to have kernel text in the low 16K
> > of memory (for 32-bit arches) or low 32K of memory (for 64-bit arches)?
>
> (unsigned long)head->func is always <= 4095, not 14K or 32K.
> we just guaranteed not to have kernel text in the low 4k of memory.
>
> the real offset is (sizeof(void *) * (unsigned long)head->func),
> it's 16K or 32K.
Good point!
Thanx, Paul
> >> +/**
> >> + * kfree_rcu - free previously allocated memory after a grace period.
> >> + * @ptr: pointer returned by kmalloc.
> >> + * @head: structure to be used for queueing the RCU updates. This structure
> >> + * is a part of previously allocated memory @ptr.
> >> + */
> >> +extern void kfree_rcu(const void *ptr, struct rcu_head *head);
> >> +
> >> #endif /* __LINUX_RCUPDATE_H */
> >> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
> >> index aad93cd..5a14190 100644
> >> --- a/kernel/rcuclassic.c
> >> +++ b/kernel/rcuclassic.c
> >> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >> while (list) {
> >> next = list->next;
> >> prefetch(next);
> >> - list->func(list);
> >> + __rcu_reclaim(list);
> >
> > OK, consistent with above.
> >
> >> list = next;
> >> if (++count >= rdp->blimit)
> >> break;
> >> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> >> index 467d594..aa9b56a 100644
> >> --- a/kernel/rcupdate.c
> >> +++ b/kernel/rcupdate.c
> >> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
> >> }
> >> EXPORT_SYMBOL_GPL(rcu_barrier_sched);
> >>
> >> +void kfree_rcu(const void *ptr, struct rcu_head *head)
> >> +{
> >> + unsigned long offset;
> >> + typedef void (*rcu_callback)(struct rcu_head *);
> >> +
> >> + offset = (void *)head - (void *)ptr;
> >> + BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
> >> +
> >> + call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
> >
> > OK, so we pass in the pointer to the rcu_head structure, followed
> > by the offset in pointer-sized units, but with the latter cast to
> > a pointer to a callback function? Hmmm.... Kinky....
> >
> > Then after the grace period completes, the __rcu_reclaim() sorts
> > things out.
>
> Yes, kernel pointers have redundant information, we use the low 4k
> as offset. when ->func < 4k, it stand for offset, when ->func >= 4k,
> it stand for function pointer.
>
> >
> >> +}
> >> +EXPORT_SYMBOL_GPL(kfree_rcu);
> >> +
> >> void __init rcu_init(void)
> >> {
> >> __rcu_init();
> >> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
> >> index 2782793..62a9e54 100644
> >> --- a/kernel/rcupreempt.c
> >> +++ b/kernel/rcupreempt.c
> >> @@ -1108,7 +1108,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
> >> spin_unlock_irqrestore(&rdp->lock, flags);
> >> while (list) {
> >> next = list->next;
> >> - list->func(list);
> >> + __rcu_reclaim(list);
> >
> > And we do this for preemptable RCU as well.
> >
> >> list = next;
> >> RCU_TRACE_ME(rcupreempt_trace_invoke);
> >> }
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
> >
> >
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH] rcu: introduce kfree_rcu()
2008-09-18 17:46 ` Paul E. McKenney
@ 2008-09-19 16:03 ` Manfred Spraul
0 siblings, 0 replies; 12+ messages in thread
From: Manfred Spraul @ 2008-09-19 16:03 UTC (permalink / raw)
To: paulmck
Cc: Lai Jiangshan, Ingo Molnar, Linux Kernel Mailing List,
Dipankar Sarma, Andrew Morton, Peter Zijlstra
Paul E. McKenney wrote:
> The rcu_barrier() family of functions registers a callback on each CPU,
> and waits until all these callbacks have been invoked. The CPU offlining
> process preserves the order of the callbacks that were registered on a
> given CPU. Thus, when rcu_barrier() returns, all RCU callbacks previously
> registered are guaranteed to have already been invoked, regardless of
> what CPUs might have been offlined and onlined in the meantime.
>
>
You are right: I mixed up rcu_barrier() and synchronize_rcu().
--
Manfred
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2008-09-19 16:03 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-18 4:18 [RFC PATCH] rcu: introduce kfree_rcu() Lai Jiangshan
2008-09-18 4:37 ` Andrew Morton
2008-09-18 16:52 ` Manfred Spraul
2008-09-19 2:31 ` Lai Jiangshan
2008-09-18 6:44 ` Paul E. McKenney
2008-09-18 8:59 ` Lai Jiangshan
2008-09-18 17:15 ` Paul E. McKenney
2008-09-18 16:56 ` Manfred Spraul
2008-09-18 17:46 ` Paul E. McKenney
2008-09-19 16:03 ` Manfred Spraul
2008-09-19 1:04 ` Lai Jiangshan
2008-09-19 3:58 ` Paul E. McKenney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).