From: Manfred Spraul <manfred@colorfullife.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>, Ingo Molnar <mingo@elte.hu>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Dipankar Sarma <dipankar@in.ibm.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH] rcu: introduce kfree_rcu()
Date: Thu, 18 Sep 2008 18:52:46 +0200 [thread overview]
Message-ID: <48D2875E.3090300@colorfullife.com> (raw)
In-Reply-To: <20080917213748.7571fbdf.akpm@linux-foundation.org>
Andrew Morton wrote:
> On Thu, 18 Sep 2008 12:18:28 +0800 Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
>
>
>> sometimes a rcu callback is just calling kfree() to free a struct's memory
>> (we say this callback is a trivial callback.).
>> this patch introduce kfree_rcu() to do these things directly, easily.
>>
>> There are 4 reasons that we need kfree_rcu():
>>
>> 1) unloadable modules:
>> a module(rcu callback is defined in this module) using rcu must
>> call rcu_barrier() when unload. rcu_barrier() will increase
>> the system's overhead(the more cpus the worse) and
>> rcu_barrier() is very time-consuming. if all rcu callback defined
>> in this module are trivial callback, we can just call kfree_rcu()
>> instead, save a rcu_barrier() when unload.
>>
>> 2) duplicate code:
>> all trivial callback are duplicate code though the structs to be freed
>> are different. it's just a container_of() and a kfree().
>> There are about 50% callbacks are trivial callbacks for call_rcu() in
>> current kernel code.
>>
>> 3) cache:
>> the instructions of trivial callback is not in the cache supposedly.
>> calling a trivial callback will let to cache missing very likely.
>> the more trivial callback the more cache missing. OK, this is
>> not a problem now or in a few days: Only less than 1% trivial callback
>> are called in running kernel.
>>
>> 4) future:
>> the number of user of rcu is increasing. new code for rcu is
>> trivial callback very likely. it means more modules using rcu
>> and more duplicate code(may come to 90% of callbacks is trivial
>> callbacks) and more cache missing.
>>
>> Implementation:
>> there were a lot of ideas came out when i implemented kfree_rcu().
>> I chose the simplest one as this patch shows. but these implementation
>> may cannot be used for to free a struct larger than 16KBytes.
>>
>> kfree_rcu_bh()? kfree_rcu_sched()?
>> these two are not need current. call_rcu_bh() & call_rcu_sched()
>> are hardly be called(and hardly be called for trivial callback).
>>
>> vfree_rcu()?
>> No, vfree() is not atomic function, will not be called in softirq.
>>
>>
>
> This is all rather mysterious.
>
>
>> ---
>> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> index e8b4039..04c654f 100644
>> --- a/include/linux/rcupdate.h
>> +++ b/include/linux/rcupdate.h
>> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
>> extern void rcu_init(void);
>> extern int rcu_needs_cpu(int cpu);
>>
>> +#define __KFREE_RCU_MAX_OFFSET 4095
>> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
>> +
>> +#define __rcu_reclaim(head) \
>> +do { \
>> + unsigned long __offset = (unsigned long)head->func; \
>> + if (__offset <= __KFREE_RCU_MAX_OFFSET) \
>> + kfree((void *)head - sizeof(void *) * __offset); \
>> + else \
>> + head->func(head); \
>> +} while(0)
>>
>
> All the above could do with some comments explaining what it does.
>
__rcu_reclaim either treats head->func as an offset for kfree or as a
function pointer.
>> #endif /* __LINUX_RCUPDATE_H */
>> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
>> index aad93cd..5a14190 100644
>> --- a/kernel/rcuclassic.c
>> +++ b/kernel/rcuclassic.c
>> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
>> while (list) {
>> next = list->next;
>> prefetch(next);
>> - list->func(list);
>> + __rcu_reclaim(list);
>>
Here it's used:
the softirq that is called after the grace period calls kfree directly
instead of calling a wrapper function around kfree.
>> list = next;
>> if (++count >= rdp->blimit)
>> break;
>> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
>> index 467d594..aa9b56a 100644
>> --- a/kernel/rcupdate.c
>> +++ b/kernel/rcupdate.c
>> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
>> }
>> EXPORT_SYMBOL_GPL(rcu_barrier_sched);
>>
>> +void kfree_rcu(const void *ptr, struct rcu_head *head)
>> +{
>> + unsigned long offset;
>> + typedef void (*rcu_callback)(struct rcu_head *);
>> +
>> + offset = (void *)head - (void *)ptr;
>>
What about offset_of? the computation is known at compile time.
>> + BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
>> +
>>
I'd try to make that a compile time error. Is that possible? perhaps
with some __builtin_constant_p (head-ptr) or something like that. Or
with offset_of.
>> + call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
>>
>
> How can this work? We take the difference between two pointers, divide
> that by 4 or 8, then treat the resulting number as the address of an
> RCU callback function.
>
> I think I'm missing something here.
>
>
__rcu_reclaim() knows that function pointers < 4096 are actually offsets
for kfree.
I like the idea:
- the call to list->func() is probably very difficult to predict for a
branch target predictor.
- it's just a waste not to call kfree directly.
- I'm not sure about the implementation.
--
Manfred
next prev parent reply other threads:[~2008-09-18 16:52 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-09-18 4:18 [RFC PATCH] rcu: introduce kfree_rcu() Lai Jiangshan
2008-09-18 4:37 ` Andrew Morton
2008-09-18 16:52 ` Manfred Spraul [this message]
2008-09-19 2:31 ` Lai Jiangshan
2008-09-18 6:44 ` Paul E. McKenney
2008-09-18 8:59 ` Lai Jiangshan
2008-09-18 17:15 ` Paul E. McKenney
2008-09-18 16:56 ` Manfred Spraul
2008-09-18 17:46 ` Paul E. McKenney
2008-09-19 16:03 ` Manfred Spraul
2008-09-19 1:04 ` Lai Jiangshan
2008-09-19 3:58 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48D2875E.3090300@colorfullife.com \
--to=manfred@colorfullife.com \
--cc=akpm@linux-foundation.org \
--cc=dipankar@in.ibm.com \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.