All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Dipankar Sarma <dipankar@in.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	manfred@colorfullife.com
Subject: Re: [RFC PATCH] rcu: introduce kfree_rcu()
Date: Thu, 18 Sep 2008 20:58:42 -0700	[thread overview]
Message-ID: <20080919035842.GD7482@linux.vnet.ibm.com> (raw)
In-Reply-To: <48D2FA8C.2010207@cn.fujitsu.com>

On Fri, Sep 19, 2008 at 09:04:12AM +0800, Lai Jiangshan wrote:
> Paul E. McKenney wrote:
> > On Thu, Sep 18, 2008 at 12:18:28PM +0800, Lai Jiangshan wrote:
> >> sometimes a rcu callback is just calling kfree() to free a struct's memory
> >> (we say this callback is a trivial callback.).
> >> this patch introduce kfree_rcu() to do these things directly, easily.
> > 
> > Interesting!  Please see questions and comments below.
> > 
> >> There are 4 reasons that we need kfree_rcu():
> >>
> >> 1) unloadable modules:
> >>    a module(rcu callback is defined in this module) using rcu must
> >>    call rcu_barrier() when unload. rcu_barrier() will increase
> >>    the system's overhead(the more cpus the worse) and
> >>    rcu_barrier() is very time-consuming. if all rcu callback defined
> >>    in this module are trivial callback, we can just call kfree_rcu()
> >>    instead, save a rcu_barrier() when unload.
> > 
> > You lost me on this one.  Suppose that the following sequence of
> > events occurred:
> > 
> > a.	The module invokes call_rcu() or kfree_rcu().  The callback
> > 	is queued on CPU 0.
> > 
> > b.	Perhaps a grace period completes, and the callback is therefore
> > 	moved to CPU 0's donelist.  But CPU 0 is busy, so doesn't get
> > 	around to invoking the callback.  (For example, ksoftirqd.)
> > 
> > c.	The module is unloaded, and uses kfree_rcu() instead of
> > 	rcu_barrier().  The callback is queued on CPU 1.
> > 
> > d.	A grace period completes, and CPU 1 is relatively idle, so
> > 	invokes its callback quickly.  The module is therefore unloaded.
> > 
> > e.	CPU 0 finally gets around to executing its callback, but the
> > 	module has been unloaded, so there is nothingness where the
> > 	callback function used to be.  We get an oops.
> > 
> > What prevents this sequence of events from happening?
> 
> We save a rcu_barrier() only when all rcu callback defined in this
> module are trivial callback and we use kfree_rcu to instead them.
> 
> trivial callbacks are the most common callbacks, so some module may used
> trivial callback only.

Understood.

> > 
> >> 2) duplicate code:
> >>    all trivial callback are duplicate code though the structs to be freed
> >>    are different. it's just a container_of() and a kfree().
> >>    There are about 50% callbacks are trivial callbacks for call_rcu() in
> >>    current kernel code.
> > 
> > Indeed!  There was something similar to kfree_rcu() proposed some
> > years back, but it was rejected because it contained more code than
> > did the trivial callbacks.  :-/
> > 
> > But there are more such callbacks these days, so might be worth
> > revisiting.
> > 
> >> 3) cache:
> >>    the instructions of trivial callback is not in the cache supposedly.
> >>    calling a trivial callback will let to cache missing very likely.
> >>    the more trivial callback the more cache missing. OK, this is
> >>    not a problem now or in a few days: Only less than 1% trivial callback
> >>    are called in running kernel.
> > 
> > Reducing code footprint would be a good thing.  Do you have stats on
> > the kernel text size, before and after?
> 
> I did not have stats on the kernel text size, I think these cache
> missing are caused by lots of different trivial callbacks in everywhere,
> not too big kernel text.

The Tiny Linux guys might be interested in even a small reduction in
kernel text size.

> >> 4) future:
> >>    the number of user of rcu is increasing. new code for rcu is
> >>    trivial callback very likely. it means more modules using rcu
> >>    and more duplicate code(may come to 90% of callbacks is trivial
> >>    callbacks) and more cache missing.
> > 
> > Ditto.
> > 
> >> Implementation:
> >>    there were a lot of ideas came out when i implemented kfree_rcu().
> >>    I chose the simplest one as this patch shows. but these implementation
> >>    may cannot be used for to free a struct larger than 16KBytes.
> >>
> >> kfree_rcu_bh()? kfree_rcu_sched()?
> >>    these two are not need current. call_rcu_bh() & call_rcu_sched()
> >>    are hardly be called(and hardly be called for trivial callback).
> >>
> >> vfree_rcu()?
> >>    No, vfree() is not atomic function, will not be called in softirq.
> >>
> >> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> >> ---
> >> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >> index e8b4039..04c654f 100644
> >> --- a/include/linux/rcupdate.h
> >> +++ b/include/linux/rcupdate.h
> >> @@ -253,4 +253,25 @@ extern void rcu_barrier_sched(void);
> >>  extern void rcu_init(void);
> >>  extern int rcu_needs_cpu(int cpu);
> >>
> >> +#define __KFREE_RCU_MAX_OFFSET 4095
> >> +#define KFREE_RCU_MAX_OFFSET (sizeof(void *) * __KFREE_RCU_MAX_OFFSET)
> >> +
> >> +#define __rcu_reclaim(head) \
> >> +do { \
> >> +	unsigned long __offset = (unsigned long)head->func; \
> >> +	if (__offset <= __KFREE_RCU_MAX_OFFSET) \
> >> +		kfree((void *)head - sizeof(void *) * __offset); \
> >> +	else \
> >> +		head->func(head); \
> >> +} while(0)
> > 
> > OK, so the idea is that structures whose rcu_head is near the front
> > of the structure have the offset of the rcu_head put into the
> > ->func field instead of a pointer to the callback function?
> > 
> > Of course, it doesn't need to be too near the beginning of the
> > function...
> > 
> > All arches are guaranteed not to have kernel text in the low 16K
> > of memory (for 32-bit arches) or low 32K of memory (for 64-bit arches)?
> 
> (unsigned long)head->func is always <= 4095, not 14K or 32K.
> we just guaranteed not to have kernel text in the low 4k of memory.
> 
> the real offset is (sizeof(void *) * (unsigned long)head->func),
> it's 16K or 32K.

Good point!

						Thanx, Paul

> >> +/**
> >> + * kfree_rcu - free previously allocated memory after a grace period.
> >> + * @ptr:  pointer returned by kmalloc.
> >> + * @head: structure to be used for queueing the RCU updates. This structure
> >> + *        is a part of previously allocated memory @ptr.
> >> + */
> >> +extern void kfree_rcu(const void *ptr, struct rcu_head *head);
> >> +
> >>  #endif /* __LINUX_RCUPDATE_H */
> >> diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
> >> index aad93cd..5a14190 100644
> >> --- a/kernel/rcuclassic.c
> >> +++ b/kernel/rcuclassic.c
> >> @@ -232,7 +232,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >>  	while (list) {
> >>  		next = list->next;
> >>  		prefetch(next);
> >> -		list->func(list);
> >> +		__rcu_reclaim(list);
> > 
> > OK, consistent with above.
> > 
> >>  		list = next;
> >>  		if (++count >= rdp->blimit)
> >>  			break;
> >> diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
> >> index 467d594..aa9b56a 100644
> >> --- a/kernel/rcupdate.c
> >> +++ b/kernel/rcupdate.c
> >> @@ -162,6 +162,18 @@ void rcu_barrier_sched(void)
> >>  }
> >>  EXPORT_SYMBOL_GPL(rcu_barrier_sched);
> >>
> >> +void kfree_rcu(const void *ptr, struct rcu_head *head)
> >> +{
> >> +	unsigned long offset;
> >> +	typedef void (*rcu_callback)(struct rcu_head *);
> >> +
> >> +	offset = (void *)head - (void *)ptr;
> >> +	BUG_ON(offset > KFREE_RCU_MAX_OFFSET);
> >> +
> >> +	call_rcu(head, (rcu_callback)(offset / sizeof(void *)));
> > 
> > OK, so we pass in the pointer to the rcu_head structure, followed
> > by the offset in pointer-sized units, but with the latter cast to
> > a pointer to a callback function?  Hmmm....  Kinky....
> > 
> > Then after the grace period completes, the __rcu_reclaim() sorts
> > things out.
> 
> Yes, kernel pointers have redundant information, we use the low 4k
> as offset. when ->func < 4k, it stand for offset, when ->func >= 4k,
> it stand for function pointer.
> 
> > 
> >> +}
> >> +EXPORT_SYMBOL_GPL(kfree_rcu);
> >> +
> >>  void __init rcu_init(void)
> >>  {
> >>  	__rcu_init();
> >> diff --git a/kernel/rcupreempt.c b/kernel/rcupreempt.c
> >> index 2782793..62a9e54 100644
> >> --- a/kernel/rcupreempt.c
> >> +++ b/kernel/rcupreempt.c
> >> @@ -1108,7 +1108,7 @@ static void rcu_process_callbacks(struct softirq_action *unused)
> >>  	spin_unlock_irqrestore(&rdp->lock, flags);
> >>  	while (list) {
> >>  		next = list->next;
> >> -		list->func(list);
> >> +		__rcu_reclaim(list);
> > 
> > And we do this for preemptable RCU as well.
> > 
> >>  		list = next;
> >>  		RCU_TRACE_ME(rcupreempt_trace_invoke);
> >>  	}
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> > 
> > 
> 
> 

      reply	other threads:[~2008-09-19  3:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-09-18  4:18 [RFC PATCH] rcu: introduce kfree_rcu() Lai Jiangshan
2008-09-18  4:37 ` Andrew Morton
2008-09-18 16:52   ` Manfred Spraul
2008-09-19  2:31   ` Lai Jiangshan
2008-09-18  6:44 ` Paul E. McKenney
2008-09-18  8:59   ` Lai Jiangshan
2008-09-18 17:15     ` Paul E. McKenney
2008-09-18 16:56   ` Manfred Spraul
2008-09-18 17:46     ` Paul E. McKenney
2008-09-19 16:03       ` Manfred Spraul
2008-09-19  1:04   ` Lai Jiangshan
2008-09-19  3:58     ` Paul E. McKenney [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080919035842.GD7482@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=dipankar@in.ibm.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=manfred@colorfullife.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.