* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking [not found] ` <1152226204.21787.2093.camel@stark> @ 2006-07-06 23:39 ` Paul E. McKenney [not found] ` <Pine.LNX.4.44L0.0607071051430.17135-100000@iolanthe.rowland.org> 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2006-07-06 23:39 UTC (permalink / raw) To: Matt Helsley Cc: Alan Stern, linux-kernel, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen On Thu, Jul 06, 2006 at 03:50:03PM -0700, Matt Helsley wrote: > On Thu, 2006-07-06 at 16:28 -0400, Alan Stern wrote: > > I've been trying to come up with a way to allow SRCU structures to be > > initialized statically rather than dynamically. The per-cpu data makes it > > quite hard. Not only do you have to use different routines to access > > static vs. dynamic per-cpu data, there's just no good way to write a > > static initializer. This is because the per-cpu data requires its own > > separate definition, and there's no way to call DEFINE_PER_CPU from within > > an initializer. > > > > Here, in outline, is the best I've been able to come up with. It uses a > > function pointer member to select the appropriate sort of per-cpu data > > access. You would use it like this: > > > > PREDEFINE_SRCU(s); > > static DEFINE_SRCU(s); > > ... > > idx = srcu_read_lock(&s); > > ... etc ... > > > > Alternative possibilities involve an entire parallel implementation for > > statically-initialized structures (which seems excessive) or using a > > runtime test instead of a function pointer to select the dereferencing > > mechanism. > > > > Can anybody suggest anything better? > > > > Alan Stern > > I started to come up with something similar but did not get as far. I > suspect the runtime test you're suggesting would look like: > > #include <asm/sections.h> > > ... > if ((per_cpu_ptr >= __per_cpu_start) && (per_cpu_ptr < __per_cpu_end)) { > /* staticly-allocated per-cpu data */ > ... 
> } else { > /* dynamically-allocated per-cpu data */ > ... > } > ... > > I think that's easier to read and understand than following a function > pointer. Is this what the two of you are getting at? #define DEFINE_SRCU_STRUCT(name) \ DEFINE_PER_CPU(struct srcu_struct_array, name) = { 0, 0 }; \ struct srcu_struct name = { \ .completed = 0, \ .per_cpu_ref = NULL, \ .mutex = __MUTEX_INITIALIZER(name.mutex) \ } #define srcu_read_lock(ss) \ ({ \ if ((ss)->per_cpu_ref != NULL) \ srcu_read_lock_dynamic(&ss); \ else { \ int ret; \ \ preempt_disable(); \ ret = srcu_read_lock_static(&ss, &__get_cpu_var(ss)); \ preempt_enable(); \ ret; \ } \ }) int srcu_read_lock_dynamic(struct srcu_struct *sp) { int idx; preempt_disable(); idx = sp->completed & 0x1; barrier(); /* ensure compiler looks -once- at sp->completed. */ per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++; srcu_barrier(); /* ensure compiler won't misorder critical section. */ preempt_enable(); return idx; } int srcu_read_lock_static(struct srcu_struct *sp, srcu_struct_array *cp) { int idx; idx = sp->completed & 0x1; barrier(); /* ensure compiler looks -once- at sp->completed. */ cp->c[idx]++; srcu_barrier(); /* ensure compiler won't misorder critical section. */ return idx; } And similarly for srcu_read_unlock()? I sure hope that there is a better way!!! For one thing, you cannot pass a pointer in to srcu_read_lock(), since __get_cpu_var's name mangling would fail in that case... Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
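Matt's address-range check can be modelled in plain userspace C. In the kernel the bounds would be the linker-provided symbols __per_cpu_start and __per_cpu_end from <asm/sections.h>; here an ordinary static array stands in for the per-CPU section, and every name below is invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the proposed runtime test: a per-CPU pointer is
 * "statically allocated" if it falls inside the per-CPU linker section
 * (delimited by __per_cpu_start/__per_cpu_end in the kernel), and
 * "dynamically allocated" otherwise.  A plain array stands in for the
 * linker section here.
 */
static char fake_per_cpu_section[64];	/* stand-in for the per-CPU section */
static char fake_other_data;		/* some object outside that section */

/* Return 1 if ptr lies in the "static" per-CPU region, 0 otherwise. */
static int is_static_per_cpu(const void *ptr)
{
	uintptr_t p     = (uintptr_t)ptr;
	uintptr_t start = (uintptr_t)fake_per_cpu_section;
	uintptr_t end   = start + sizeof(fake_per_cpu_section);

	return p >= start && p < end;	/* end is exclusive, as in the sketch */
}
```

The read-side cost of this scheme is one pair of pointer comparisons per access, which is the overhead being weighed against a function-pointer indirection.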
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking [not found] ` <Pine.LNX.4.44L0.0607071051430.17135-100000@iolanthe.rowland.org> @ 2006-07-07 16:33 ` Paul E. McKenney [not found] ` <Pine.LNX.4.44L0.0607071345270.6793-100000@iolanthe.rowland.org> 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2006-07-07 16:33 UTC (permalink / raw) To: Alan Stern Cc: Matt Helsley, linux-kernel, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen On Fri, Jul 07, 2006 at 10:58:28AM -0400, Alan Stern wrote: > On Thu, 6 Jul 2006, Paul E. McKenney wrote: > > > Is this what the two of you are getting at? > > > > #define DEFINE_SRCU_STRUCT(name) \ > > DEFINE_PER_CPU(struct srcu_struct_array, name) = { 0, 0 }; \ > > struct srcu_struct name = { \ > > .completed = 0, \ > > .per_cpu_ref = NULL, \ > > .mutex = __MUTEX_INITIALIZER(name.mutex) \ > > } > > Note that this approach won't work when you need to do something like: > > struct xyz { > struct srcu_struct s; > } the_xyz = { > .s = /* What goes here? */ > }; Yep, this the same issue leading to my complaint below about not being able to pass a pointer to the resulting srcu_struct. > > #define srcu_read_lock(ss) \ > > ({ \ > > if ((ss)->per_cpu_ref != NULL) \ > > srcu_read_lock_dynamic(&ss); \ > > else { \ > > int ret; \ > > \ > > preempt_disable(); \ > > ret = srcu_read_lock_static(&ss, &__get_cpu_var(ss)); \ > > preempt_enable(); \ > > ret; \ > > } \ > > }) > > > > int srcu_read_lock_dynamic(struct srcu_struct *sp) > > { > > int idx; > > > > preempt_disable(); > > idx = sp->completed & 0x1; > > barrier(); /* ensure compiler looks -once- at sp->completed. */ > > per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++; > > srcu_barrier(); /* ensure compiler won't misorder critical section. 
*/ > > preempt_enable(); > > return idx; > > } > > > > int srcu_read_lock_static(struct srcu_struct *sp, srcu_struct_array *cp) > > { > > int idx; > > > > idx = sp->completed & 0x1; > > barrier(); /* ensure compiler looks -once- at sp->completed. */ > > cp->c[idx]++; > > srcu_barrier(); /* ensure compiler won't misorder critical section. */ > > return idx; > > } > > > > And similarly for srcu_read_unlock()? > > > > I sure hope that there is a better way!!! For one thing, you cannot pass > > a pointer in to srcu_read_lock(), since __get_cpu_var's name mangling would > > fail in that case... > > No, that's not what we had in mind. Another approach I looked at was statically allocating a struct percpu_data, but initializing it seems to be problematic. So here are the three approaches that seem to have some chance of working: 1. Your approach of dynamically selecting between the per_cpu_ptr() and per_cpu() APIs based on a flag within the structure. 2. Creating a pair of SRCU APIs, reflecting the two underlying per-CPU APIs (one for staticly allocated per-CPU variables, the other for dynamically allocated per-CPU variables). 3. A compile-time translation layer, making use of two different structure types and a bit of gcc type comparison. The idea would be to create a srcu_struct_static and a srcu_struct_dynamic structure that contained a pointer to the corresponding per-CPU variable and an srcu_struct, and to have a set of macros that did a typeof comparison, selecting the appropriate underlying primitive from the set of two. This is essentially #2, but with some cpp/typeof magic to make it look to the user of SRCU that there is but one API. The goal I believe we are trying to attain with SRCU include: a. Minimal read-side overhead. This goal favors 2 and 3. (Yes, blocking is so expensive that the extra check is "in the noise" if we block on the read side -- but I expect uses where blocking can happen but is extremely rare.) b. Minimal API expansion. 
This goal favors 1 and 3. c. Simple and straightforward use of well-understood and timeworn features of gcc. This goal favors 1 and 2. Based on this breakdown, we have a three-way tie. I tend to pay much less attention to (c), which would lead me to choose #2. Thoughts? Other important goals? Better yet, other approaches? Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
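Approach 3's typeof comparison can be sketched with gcc's __builtin_types_compatible_p() and __builtin_choose_expr() builtins. Everything below -- the two structure types, the *_impl functions, and the return-value tags -- is invented purely to show the compile-time dispatch mechanism, not the real SRCU implementation:

```c
#include <assert.h>

/*
 * Sketch of approach 3: one macro front end that dispatches at compile
 * time between a "static" and a "dynamic" implementation based on the
 * type of its argument.  Only the chosen branch is ever called.
 */
struct srcu_struct_static  { int completed; };
struct srcu_struct_dynamic { int completed; };

static int srcu_read_lock_static_impl(struct srcu_struct_static *sp)
{
	return (sp->completed & 0x1) | 0x10;	/* 0x10 tags the static path */
}

static int srcu_read_lock_dynamic_impl(struct srcu_struct_dynamic *sp)
{
	return (sp->completed & 0x1) | 0x20;	/* 0x20 tags the dynamic path */
}

/* Select the implementation whose parameter type matches *sp. */
#define demo_srcu_read_lock(sp)						\
	(__builtin_choose_expr(						\
		__builtin_types_compatible_p(__typeof__(*(sp)),		\
					     struct srcu_struct_static), \
		srcu_read_lock_static_impl,				\
		srcu_read_lock_dynamic_impl)(sp))

static struct srcu_struct_static  demo_ss = { 0 };
static struct srcu_struct_dynamic demo_ds = { 1 };
```

The caller sees one API, but the selection happens entirely at compile time, so there is no read-side branch -- at the price of the cpp/typeof machinations mentioned in goal (c).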
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking [not found] ` <Pine.LNX.4.44L0.0607071345270.6793-100000@iolanthe.rowland.org> @ 2006-07-07 18:59 ` Paul E. McKenney 2006-07-07 19:59 ` Alan Stern 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2006-07-07 18:59 UTC (permalink / raw) To: Alan Stern Cc: Matt Helsley, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen, linux-kernel On Fri, Jul 07, 2006 at 02:02:27PM -0400, Alan Stern wrote: > On Fri, 7 Jul 2006, Paul E. McKenney wrote: > > > > Note that this approach won't work when you need to do something like: > > > > > > struct xyz { > > > struct srcu_struct s; > > > } the_xyz = { > > > .s = /* What goes here? */ > > > }; > > > > Yep, this the same issue leading to my complaint below about not being > > able to pass a pointer to the resulting srcu_struct. > > No, not really. The problem here is that you can't use DEFINE_PER_CPU > inside the initializer for the_xyz. The problem about not being able to > pass a pointer is easily fixed; this problem is not so easy. Both symptoms of the same problem in my view, but I agree that other perspectives are possible and perhaps even useful. ;-) We agree on the important thing, which is that the approach I was calling out in the earlier email has some severe shortcomings, and that we therefore need to do something different. > > Another approach I looked at was statically allocating a struct > > percpu_data, but initializing it seems to be problematic. > > > > So here are the three approaches that seem to have some chance > > of working: > > > > 1. Your approach of dynamically selecting between the > > per_cpu_ptr() and per_cpu() APIs based on a flag > > within the structure. > > Or a function pointer within the structure. Agreed, either a function pointer or a flag. > > 2. 
Creating a pair of SRCU APIs, reflecting the two > > underlying per-CPU APIs (one for staticly allocated > > per-CPU variables, the other for dynamically allocated > > per-CPU variables). > > This seems ridiculous. It would be much better IMO to come up with a > least-common-multiple API that would apply to both sorts of variables. > For example, per-cpu data could be represented by _both_ a pointer and a > table instead of just a pointer (static) or just a table (dynamic). No argument here. > > 3. A compile-time translation layer, making use of > > two different structure types and a bit of gcc > > type comparison. The idea would be to create > > a srcu_struct_static and a srcu_struct_dynamic > > structure that contained a pointer to the corresponding > > per-CPU variable and an srcu_struct, and to have > > a set of macros that did a typeof comparison, selecting > > the appropriate underlying primitive from the set > > of two. > > > > This is essentially #2, but with some cpp/typeof > > magic to make it look to the user of SRCU that there > > is but one API. > > This would add tremendous complexity, in terms of how the API is > implemented, for no very good reason. Programming is hard enough > already... Leaving out the "tremendous", yes, there would be some machinations. It would certainly be OK by me if this can be avoided. ;-) > > The goal I believe we are trying to attain with SRCU include: > > > > a. Minimal read-side overhead. This goal favors 2 and 3. > > (Yes, blocking is so expensive that the extra check is > > "in the noise" if we block on the read side -- but I > > expect uses where blocking can happen but is extremely > > rare.) > > > > b. Minimal API expansion. This goal favors 1 and 3. > > > > c. Simple and straightforward use of well-understood and > > timeworn features of gcc. This goal favors 1 and 2. > > > > Based on this breakdown, we have a three-way tie. I tend to pay less > > much attention to (c), which would lead me to choose #2. 
> > > > Thoughts? Other important goals? Better yet, other approaches? > > I think it's foolish for us to waste a tremendous amount of time on this > when the real problem is the poor design of the per-cpu API. Fix that, > and most of the difficulties will be gone. If the per-CPU API was reasonably unifiable, I expect that it would already be unified. The problem is that the easy ways to unify it hit some extremely hot code paths with extra cache misses -- for example, one could add a struct percpu_data to each and every static DEFINE_PERCPU(), but at the cost of an extra cache line touched and extra indirection -- which I believe was deemed unacceptable -- and would introduce initialization difficulties for the static case. So, a fourth possibility -- can a call from start_kernel() to some function in yours and Matt's code invoke init_srcu_struct() to get a statically allocated srcu_struct initialized? Or, if this is part of a module, can the module initialization function do this work? (Hey, I had to ask!) Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking 2006-07-07 18:59 ` Paul E. McKenney @ 2006-07-07 19:59 ` Alan Stern 2006-07-07 21:11 ` Matt Helsley 0 siblings, 1 reply; 18+ messages in thread From: Alan Stern @ 2006-07-07 19:59 UTC (permalink / raw) To: Paul E. McKenney Cc: Matt Helsley, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen, linux-kernel On Fri, 7 Jul 2006, Paul E. McKenney wrote: > > I think it's foolish for us to waste a tremendous amount of time on this > > when the real problem is the poor design of the per-cpu API. Fix that, > > and most of the difficulties will be gone. > > If the per-CPU API was reasonably unifiable, I expect that it would > already be unified. The problem is that the easy ways to unify it hit > some extremely hot code paths with extra cache misses -- for example, one > could add a struct percpu_data to each and every static DEFINE_PERCPU(), > but at the cost of an extra cache line touched and extra indirection > -- which I believe was deemed unacceptable -- and would introduce > initialization difficulties for the static case. Here's a sketch of a possible approach. Generalizing it and making it look pretty are left as exercises for the reader. 
:-) In srcu_struct, along with struct srcu_struct_array *per_cpu_ref; add struct percpu_data *per_cpu_table; Dynamic initialization does: sp->per_cpu_ref = NULL; sp->per_cpu_table = alloc_percpu(...); Static initialization does: sp->per_cpu_ref = PER_CPU_ADDRESS(...); /* My macro from before; gives the address of the static variable */ sp->per_cpu_table = (struct percpu_data *) ~(unsigned long) __per_cpu_offset; Then the unified_per_cpu_ptr(ref, table, cpu) macro would expand to something like this: ({ struct percpu_data *t = (struct percpu_data *)~(unsigned long)(table); RELOC_HIDE(ref, t->ptrs[cpu]); }) Making this work right would of course require knowledge of the intimate details of both include/linux/percpu.h and include/asmXXX/percpu.h. There's some ambiguity about what t above points to: a structure containing an array of pointers to void, or an array of unsigned longs. Fortunately I think it doesn't matter. Doing it this way would not incur any extra cache misses, except for the need to store an extra member in srcu_struct. > So, a fourth possibility -- can a call from start_kernel() invoke some > function in yours and Matt's code invoke init_srcu_struct() to get a > statically allocated srcu_struct initialized? Or, if this is part of > a module, can the module initialization function do this work? > > (Hey, I had to ask!) That is certainly a viable approach: just force everyone to use dynamic initialization. Changes to existing code would be relatively few. I'm not sure where the right place would be to add these initialization calls. After kmalloc is working but before the relevant notifier chains get used at all. Is there such a place? I guess it depends on which notifier chains we convert. We might want to leave some chains using the existing rw-semaphore API. It's more appropriate when there's a high frequency of write-locking (i.e., things registering or unregistering on the notifier chain). 
The SRCU approach is more appropriate when the chain is called a lot and needs to have low overhead, but (un)registration is uncommon. Matt's task notifiers are a good example. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
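The complement trick in Alan's sketch relies on bitwise NOT being an involution: storing ~ptr lets the accessor always recover the original pointer. A userspace model follows; it simplifies by making the static/dynamic distinction an explicit flag and complementing only the static entry, whereas Alan's sketch keys off per_cpu_ref and encodes everything in the one table field. All demo_* names are invented -- this is not the kernel's percpu_data layout:

```c
#include <assert.h>
#include <stdint.h>

/*
 * One pointer-sized field serves both cases: it holds either a real
 * table pointer (dynamic case) or the bitwise complement of one
 * (static case).  The accessor undoes the complement when needed.
 */
struct demo_percpu {
	int is_static;		/* stand-in for the per_cpu_ref test */
	uintptr_t table;	/* pointer, or ~pointer in the static case */
};

static void demo_init_dynamic(struct demo_percpu *p, int *tbl)
{
	p->is_static = 0;
	p->table = (uintptr_t)tbl;
}

static void demo_init_static(struct demo_percpu *p, int *tbl)
{
	p->is_static = 1;
	p->table = ~(uintptr_t)tbl;	/* store the complement, as in the sketch */
}

/* Unified accessor: recover the real table either way. */
static int *demo_per_cpu_table(struct demo_percpu *p)
{
	return (int *)(p->is_static ? ~p->table : p->table);
}
```

Because the complement is computed once at initialization and undone with a single NOT on access, the unified path adds no extra cache misses beyond the field itself, which is the property Alan's sketch is after.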
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking 2006-07-07 19:59 ` Alan Stern @ 2006-07-07 21:11 ` Matt Helsley 2006-07-07 21:47 ` Paul E. McKenney 2006-07-10 19:11 ` SRCU-based notifier chains Alan Stern 0 siblings, 2 replies; 18+ messages in thread From: Matt Helsley @ 2006-07-07 21:11 UTC (permalink / raw) To: Alan Stern Cc: Paul E. McKenney, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen, LKML On Fri, 2006-07-07 at 15:59 -0400, Alan Stern wrote: > On Fri, 7 Jul 2006, Paul E. McKenney wrote: <snip> > > So, a fourth possibility -- can a call from start_kernel() invoke some > > function in yours and Matt's code invoke init_srcu_struct() to get a > > statically allocated srcu_struct initialized? Or, if this is part of > > a module, can the module initialization function do this work? > > > > (Hey, I had to ask!) > > That is certainly a viable approach: just force everyone to use dynamic > initialization. Changes to existing code would be relatively few. Works for me. I've been working on patches for Andrew's multi-chain proposal and I could use an init function there anyway. Should be faster too -- dynamically-allocated per-cpu memory can take advantage of node-local memory whereas, to my knowledge, statically-allocated cannot. > I'm not sure where the right place would be to add these initialization > calls. After kmalloc is working but before the relevant notifier chains > get used at all. Is there such a place? I guess it depends on which > notifier chains we convert. > > We might want to leave some chains using the existing rw-semaphore API. > It's more appropriate when there's a high frequency of write-locking > (i.e., things registering or unregistering on the notifier chain). The > SRCU approach is more appropriate when the chain is called a lot and > needs to have low overhead, but (un)registration is uncommon. Matt's task > notifiers are a good example. Yes, it is an excellent example. 
> Alan Stern Cheers, -Matt Helsley ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking 2006-07-07 21:11 ` Matt Helsley @ 2006-07-07 21:47 ` Paul E. McKenney 2006-07-10 19:11 ` SRCU-based notifier chains Alan Stern 1 sibling, 0 replies; 18+ messages in thread From: Paul E. McKenney @ 2006-07-07 21:47 UTC (permalink / raw) To: Matt Helsley Cc: Alan Stern, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen, LKML On Fri, Jul 07, 2006 at 02:11:26PM -0700, Matt Helsley wrote: > On Fri, 2006-07-07 at 15:59 -0400, Alan Stern wrote: > > On Fri, 7 Jul 2006, Paul E. McKenney wrote: > > <snip> > > > > So, a fourth possibility -- can a call from start_kernel() invoke some > > > function in yours and Matt's code invoke init_srcu_struct() to get a > > > statically allocated srcu_struct initialized? Or, if this is part of > > > a module, can the module initialization function do this work? > > > > > > (Hey, I had to ask!) > > > > That is certainly a viable approach: just force everyone to use dynamic > > initialization. Changes to existing code would be relatively few. > > Works for me. I've been working on patches for Andrew's multi-chain > proposal and I could use an init function there anyway. Should be faster > too -- dynamically-allocated per-cpu memory can take advantage of > node-local memory whereas, to my knowledge, statically-allocated cannot. Sounds very good to me! ;-) > > I'm not sure where the right place would be to add these initialization > > calls. After kmalloc is working but before the relevant notifier chains > > get used at all. Is there such a place? I guess it depends on which > > notifier chains we convert. > > > > We might want to leave some chains using the existing rw-semaphore API. > > It's more appropriate when there's a high frequency of write-locking > > (i.e., things registering or unregistering on the notifier chain). 
The > > SRCU approach is more appropriate when the chain is called a lot and > > needs to have low overhead, but (un)registration is uncommon. Matt's task > > notifiers are a good example. > > Yes, it is an excellent example. Good!!! Please let me know how it goes. I will shelve the idea of statically allocated per-CPU data for srcu_struct for the moment. If some other application shows up that needs it, I will revisit. Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* SRCU-based notifier chains 2006-07-07 21:11 ` Matt Helsley 2006-07-07 21:47 ` Paul E. McKenney @ 2006-07-10 19:11 ` Alan Stern 2006-07-11 17:39 ` Paul E. McKenney 1 sibling, 1 reply; 18+ messages in thread From: Alan Stern @ 2006-07-10 19:11 UTC (permalink / raw) To: Matt Helsley Cc: Paul E. McKenney, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen, LKML On Fri, 7 Jul 2006, Matt Helsley wrote: > On Fri, 2006-07-07 at 15:59 -0400, Alan Stern wrote: > > On Fri, 7 Jul 2006, Paul E. McKenney wrote: > > <snip> > > > > So, a fourth possibility -- can a call from start_kernel() invoke some > > > function in yours and Matt's code invoke init_srcu_struct() to get a > > > statically allocated srcu_struct initialized? Or, if this is part of > > > a module, can the module initialization function do this work? > > > > > > (Hey, I had to ask!) > > > > That is certainly a viable approach: just force everyone to use dynamic > > initialization. Changes to existing code would be relatively few. > > Works for me. I've been working on patches for Andrew's multi-chain > proposal and I could use an init function there anyway. Should be faster > too -- dynamically-allocated per-cpu memory can take advantage of > node-local memory whereas, to my knowledge, statically-allocated cannot. > > > I'm not sure where the right place would be to add these initialization > > calls. After kmalloc is working but before the relevant notifier chains > > get used at all. Is there such a place? I guess it depends on which > > notifier chains we convert. > > > > We might want to leave some chains using the existing rw-semaphore API. > > It's more appropriate when there's a high frequency of write-locking > > (i.e., things registering or unregistering on the notifier chain). The > > SRCU approach is more appropriate when the chain is called a lot and > > needs to have low overhead, but (un)registration is uncommon. Matt's task > > notifiers are a good example. 
> > Yes, it is an excellent example. Okay, here is a patch -- completely untested but it compiles -- that adds a new kind of notifier head, using SRCU to manage the list consistency. At the moment I don't have any good candidates for blocking notifier chains that should be converted to SRCU notifier chains, although some of the things in the neworking core probably qualify. Anyway, you can try this out with your task notifiers to make sure it works as desired. Alan Stern P.S.: For this to work, the patch had to add "#ifndef _LINUX_SRCU_H" guards to include/linux/srcu.h. They undoubtedly belong there regardless. Index: usb-2.6/kernel/sys.c =================================================================== --- usb-2.6.orig/kernel/sys.c +++ usb-2.6/kernel/sys.c @@ -151,7 +151,7 @@ static int __kprobes notifier_call_chain /* * Atomic notifier chain routines. Registration and unregistration - * use a mutex, and call_chain is synchronized by RCU (no locks). + * use a spinlock, and call_chain is synchronized by RCU (no locks). */ /** @@ -399,6 +399,128 @@ int raw_notifier_call_chain(struct raw_n EXPORT_SYMBOL_GPL(raw_notifier_call_chain); +/* + * SRCU notifier chain routines. Registration and unregistration + * use a mutex, and call_chain is synchronized by SRCU (no locks). + */ + +/** + * srcu_notifier_chain_register - Add notifier to an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @n: New entry in notifier chain + * + * Adds a notifier to an SRCU notifier chain. + * Must be called in process context. + * + * Currently always returns zero. + */ + +int srcu_notifier_chain_register(struct srcu_notifier_head *nh, + struct notifier_block *n) +{ + int ret; + + /* + * This code gets used during boot-up, when task switching is + * not yet working and interrupts must remain disabled. At + * such times we must not call mutex_lock(). 
+ */ + if (unlikely(system_state == SYSTEM_BOOTING)) + return notifier_chain_register(&nh->head, n); + + mutex_lock(&nh->mutex); + ret = notifier_chain_register(&nh->head, n); + mutex_unlock(&nh->mutex); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_chain_register); + +/** + * srcu_notifier_chain_unregister - Remove notifier from an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @n: Entry to remove from notifier chain + * + * Removes a notifier from an SRCU notifier chain. + * Must be called from process context. + * + * Returns zero on success or %-ENOENT on failure. + */ +int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh, + struct notifier_block *n) +{ + int ret; + + /* + * This code gets used during boot-up, when task switching is + * not yet working and interrupts must remain disabled. At + * such times we must not call mutex_lock(). + */ + if (unlikely(system_state == SYSTEM_BOOTING)) + return notifier_chain_unregister(&nh->head, n); + + mutex_lock(&nh->mutex); + ret = notifier_chain_unregister(&nh->head, n); + mutex_unlock(&nh->mutex); + synchronize_srcu(&nh->srcu); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_chain_unregister); + +/** + * srcu_notifier_call_chain - Call functions in an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @val: Value passed unmodified to notifier function + * @v: Pointer passed unmodified to notifier function + * + * Calls each function in a notifier chain in turn. The functions + * run in a process context, so they are allowed to block. + * + * If the return value of the notifier can be and'ed + * with %NOTIFY_STOP_MASK then srcu_notifier_call_chain + * will return immediately, with the return value of + * the notifier function which halted execution. + * Otherwise the return value is the return value + * of the last notifier function called. 
+ */ + +int srcu_notifier_call_chain(struct srcu_notifier_head *nh, + unsigned long val, void *v) +{ + int ret; + int idx; + + idx = srcu_read_lock(&nh->srcu); + ret = notifier_call_chain(&nh->head, val, v); + srcu_read_unlock(&nh->srcu, idx); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_call_chain); + +/** + * srcu_init_notifier_head - Initialize an SRCU notifier head + * @nh: Pointer to head of the srcu notifier chain + * + * Unlike other sorts of notifier heads, SRCU notifier heads require + * dynamic initialization. Be sure to call this routine before + * calling any of the other SRCU notifier routines for this head. + * + * If an SRCU notifier head is deallocated, it must first be cleaned + * up by calling srcu_cleanup_notifier_head(). Otherwise the head's + * per-cpu data (used by the SRCU mechanism) will leak. + */ + +void srcu_init_notifier_head(struct srcu_notifier_head *nh) +{ + mutex_init(&nh->mutex); + init_srcu_struct(&nh->srcu); + nh->head = NULL; +} + +EXPORT_SYMBOL_GPL(srcu_init_notifier_head); + /** * register_reboot_notifier - Register function to be called at reboot time * @nb: Info about notifier function to be called Index: usb-2.6/include/linux/notifier.h =================================================================== --- usb-2.6.orig/include/linux/notifier.h +++ usb-2.6/include/linux/notifier.h @@ -12,9 +12,10 @@ #include <linux/errno.h> #include <linux/mutex.h> #include <linux/rwsem.h> +#include <linux/srcu.h> /* - * Notifier chains are of three types: + * Notifier chains are of four types: * * Atomic notifier chains: Chain callbacks run in interrupt/atomic * context. Callouts are not allowed to block. @@ -23,13 +24,27 @@ * Raw notifier chains: There are no restrictions on callbacks, * registration, or unregistration. All locking and protection * must be provided by the caller. + * SRCU notifier chains: A variant of blocking notifier chains, with + * the same restrictions. 
* * atomic_notifier_chain_register() may be called from an atomic context, - * but blocking_notifier_chain_register() must be called from a process - * context. Ditto for the corresponding _unregister() routines. + * but blocking_notifier_chain_register() and srcu_notifier_chain_register() + * must be called from a process context. Ditto for the corresponding + * _unregister() routines. * - * atomic_notifier_chain_unregister() and blocking_notifier_chain_unregister() - * _must not_ be called from within the call chain. + * atomic_notifier_chain_unregister(), blocking_notifier_chain_unregister(), + * and srcu_notifier_chain_unregister() _must not_ be called from within + * the call chain. + * + * SRCU notifier chains are an alternative form of blocking notifier chains. + * They use SRCU (Sleepable Read-Copy Update) instead of rw-semaphores for + * protection of the chain links. This means there is _very_ low overhead + * in srcu_notifier_call_chain(): no cache misses and no memory barriers. + * As compensation, srcu_notifier_chain_unregister() is rather expensive. + * SRCU notifier chains should be used when the chain will be called very + * often but notifier_blocks will seldom be removed. Also, SRCU notifier + * chains are slightly more difficult to use because they require dynamic + * runtime initialization. 
*/ struct notifier_block { @@ -52,6 +67,12 @@ struct raw_notifier_head { struct notifier_block *head; }; +struct srcu_notifier_head { + struct mutex mutex; + struct srcu_struct srcu; + struct notifier_block *head; +}; + #define ATOMIC_INIT_NOTIFIER_HEAD(name) do { \ spin_lock_init(&(name)->lock); \ (name)->head = NULL; \ @@ -64,6 +85,11 @@ struct raw_notifier_head { (name)->head = NULL; \ } while (0) +/* srcu_notifier_heads must be initialized and cleaned up dynamically */ +extern void srcu_init_notifier_head(struct srcu_notifier_head *nh); +#define srcu_cleanup_notifier_head(name) \ + cleanup_srcu_struct(&(name)->srcu); + #define ATOMIC_NOTIFIER_INIT(name) { \ .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ .head = NULL } @@ -72,6 +98,7 @@ struct raw_notifier_head { .head = NULL } #define RAW_NOTIFIER_INIT(name) { \ .head = NULL } +/* srcu_notifier_heads cannot be initialized statically */ #define ATOMIC_NOTIFIER_HEAD(name) \ struct atomic_notifier_head name = \ @@ -91,6 +118,8 @@ extern int blocking_notifier_chain_regis struct notifier_block *); extern int raw_notifier_chain_register(struct raw_notifier_head *, struct notifier_block *); +extern int srcu_notifier_chain_register(struct srcu_notifier_head *, + struct notifier_block *); extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *, struct notifier_block *); @@ -98,6 +127,8 @@ extern int blocking_notifier_chain_unreg struct notifier_block *); extern int raw_notifier_chain_unregister(struct raw_notifier_head *, struct notifier_block *); +extern int srcu_notifier_chain_unregister(struct srcu_notifier_head *, + struct notifier_block *); extern int atomic_notifier_call_chain(struct atomic_notifier_head *, unsigned long val, void *v); @@ -105,6 +136,8 @@ extern int blocking_notifier_call_chain( unsigned long val, void *v); extern int raw_notifier_call_chain(struct raw_notifier_head *, unsigned long val, void *v); +extern int srcu_notifier_call_chain(struct srcu_notifier_head *, + unsigned long val, 
void *v); #define NOTIFY_DONE 0x0000 /* Don't care */ #define NOTIFY_OK 0x0001 /* Suits me */ Index: usb-2.6/include/linux/srcu.h =================================================================== --- usb-2.6.orig/include/linux/srcu.h +++ usb-2.6/include/linux/srcu.h @@ -24,6 +24,9 @@ * */ +#ifndef _LINUX_SRCU_H +#define _LINUX_SRCU_H + struct srcu_struct_array { int c[2]; }; @@ -47,3 +50,5 @@ void srcu_read_unlock(struct srcu_struct void synchronize_srcu(struct srcu_struct *sp); long srcu_batches_completed(struct srcu_struct *sp); void cleanup_srcu_struct(struct srcu_struct *sp); + +#endif ^ permalink raw reply [flat|nested] 18+ messages in thread
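The chain walk that srcu_notifier_call_chain() wraps in srcu_read_lock()/srcu_read_unlock() has the early-stop semantics described in the kerneldoc above. A userspace model of just that walk (the SRCU locking and the real notifier_block layout are elided, and all demo_* names are invented):

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000	/* don't care */
#define NOTIFY_OK        0x0001	/* suits me */
#define NOTIFY_STOP_MASK 0x8000	/* halt the chain */

struct demo_notifier_block {
	int (*notifier_call)(struct demo_notifier_block *,
			     unsigned long, void *);
	struct demo_notifier_block *next;
};

/*
 * Model of notifier_call_chain(): call each block in turn, stopping
 * early when a callback sets NOTIFY_STOP_MASK, and return the last
 * callback's value.  In the kernel this runs inside the SRCU
 * read-side critical section taken by srcu_notifier_call_chain().
 */
static int demo_notifier_call_chain(struct demo_notifier_block *head,
				    unsigned long val, void *v)
{
	int ret = NOTIFY_DONE;

	while (head) {
		ret = head->notifier_call(head, val, v);
		if (ret & NOTIFY_STOP_MASK)
			break;		/* a callback halted the chain */
		head = head->next;
	}
	return ret;
}

/* Two sample callbacks: one counts invocations, one stops the walk. */
static int demo_calls;

static int demo_count(struct demo_notifier_block *nb, unsigned long val,
		      void *v)
{
	(void)nb; (void)val; (void)v;
	demo_calls++;
	return NOTIFY_OK;
}

static int demo_stop(struct demo_notifier_block *nb, unsigned long val,
		     void *v)
{
	(void)nb; (void)val; (void)v;
	return NOTIFY_OK | NOTIFY_STOP_MASK;
}

/* Chain: count -> stop -> count; the last entry must never run. */
static struct demo_notifier_block demo_b3 = { demo_count, NULL };
static struct demo_notifier_block demo_b2 = { demo_stop, &demo_b3 };
static struct demo_notifier_block demo_b1 = { demo_count, &demo_b2 };
```

The walk itself takes no locks, which is why the read side of the SRCU variant is so cheap: srcu_read_lock() contributes only the per-CPU counter increment, and the list traversal above is the entire remaining cost.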
* Re: SRCU-based notifier chains 2006-07-10 19:11 ` SRCU-based notifier chains Alan Stern @ 2006-07-11 17:39 ` Paul E. McKenney 2006-07-11 18:03 ` Alan Stern 2006-07-11 18:18 ` [PATCH] Add " Alan Stern 0 siblings, 2 replies; 18+ messages in thread From: Paul E. McKenney @ 2006-07-11 17:39 UTC (permalink / raw) To: Alan Stern Cc: Matt Helsley, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen, LKML On Mon, Jul 10, 2006 at 03:11:31PM -0400, Alan Stern wrote: > On Fri, 7 Jul 2006, Matt Helsley wrote: > > On Fri, 2006-07-07 at 15:59 -0400, Alan Stern wrote: [ . . . ] > > > We might want to leave some chains using the existing rw-semaphore API. > > > It's more appropriate when there's a high frequency of write-locking > > > (i.e., things registering or unregistering on the notifier chain). The > > > SRCU approach is more appropriate when the chain is called a lot and > > > needs to have low overhead, but (un)registration is uncommon. Matt's task > > > notifiers are a good example. > > > > Yes, it is an excellent example. > > Okay, here is a patch -- completely untested but it compiles -- that adds > a new kind of notifier head, using SRCU to manage the list consistency. > > At the moment I don't have any good candidates for blocking notifier > chains that should be converted to SRCU notifier chains, although some of > the things in the networking core probably qualify. > > Anyway, you can try this out with your task notifiers to make sure it > works as desired. > > Alan Stern > > P.S.: For this to work, the patch had to add "#ifndef _LINUX_SRCU_H" > guards to include/linux/srcu.h. They undoubtedly belong there regardless. Looks sane to me. A couple of minor comments interspersed. Thanx, Paul Acked-by: Paul E. 
McKenney <paulmck@us.ibm.com> > Index: usb-2.6/kernel/sys.c > =================================================================== > --- usb-2.6.orig/kernel/sys.c > +++ usb-2.6/kernel/sys.c > @@ -151,7 +151,7 @@ static int __kprobes notifier_call_chain > > /* > * Atomic notifier chain routines. Registration and unregistration > - * use a mutex, and call_chain is synchronized by RCU (no locks). > + * use a spinlock, and call_chain is synchronized by RCU (no locks). > */ > > /** > @@ -399,6 +399,128 @@ int raw_notifier_call_chain(struct raw_n > > EXPORT_SYMBOL_GPL(raw_notifier_call_chain); > > +/* > + * SRCU notifier chain routines. Registration and unregistration > + * use a mutex, and call_chain is synchronized by SRCU (no locks). > + */ Hmmm... Probably my just failing to pay attention, but haven't noticed the double-header-comment style before. > +/** > + * srcu_notifier_chain_register - Add notifier to an SRCU notifier chain > + * @nh: Pointer to head of the SRCU notifier chain > + * @n: New entry in notifier chain > + * > + * Adds a notifier to an SRCU notifier chain. > + * Must be called in process context. > + * > + * Currently always returns zero. > + */ > + > +int srcu_notifier_chain_register(struct srcu_notifier_head *nh, > + struct notifier_block *n) > +{ > + int ret; > + > + /* > + * This code gets used during boot-up, when task switching is > + * not yet working and interrupts must remain disabled. At > + * such times we must not call mutex_lock(). 
> + */ > + if (unlikely(system_state == SYSTEM_BOOTING)) > + return notifier_chain_register(&nh->head, n); > + > + mutex_lock(&nh->mutex); > + ret = notifier_chain_register(&nh->head, n); > + mutex_unlock(&nh->mutex); > + return ret; > +} > + > +EXPORT_SYMBOL_GPL(srcu_notifier_chain_register); > + > +/** > + * srcu_notifier_chain_unregister - Remove notifier from an SRCU notifier chain > + * @nh: Pointer to head of the SRCU notifier chain > + * @n: Entry to remove from notifier chain > + * > + * Removes a notifier from an SRCU notifier chain. > + * Must be called from process context. > + * > + * Returns zero on success or %-ENOENT on failure. > + */ > +int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh, > + struct notifier_block *n) > +{ > + int ret; > + > + /* > + * This code gets used during boot-up, when task switching is > + * not yet working and interrupts must remain disabled. At > + * such times we must not call mutex_lock(). > + */ > + if (unlikely(system_state == SYSTEM_BOOTING)) > + return notifier_chain_unregister(&nh->head, n); > + > + mutex_lock(&nh->mutex); > + ret = notifier_chain_unregister(&nh->head, n); > + mutex_unlock(&nh->mutex); > + synchronize_srcu(&nh->srcu); > + return ret; > +} > + > +EXPORT_SYMBOL_GPL(srcu_notifier_chain_unregister); > + > +/** > + * srcu_notifier_call_chain - Call functions in an SRCU notifier chain > + * @nh: Pointer to head of the SRCU notifier chain > + * @val: Value passed unmodified to notifier function > + * @v: Pointer passed unmodified to notifier function > + * > + * Calls each function in a notifier chain in turn. The functions > + * run in a process context, so they are allowed to block. > + * > + * If the return value of the notifier can be and'ed > + * with %NOTIFY_STOP_MASK then srcu_notifier_call_chain > + * will return immediately, with the return value of > + * the notifier function which halted execution. 
> + * Otherwise the return value is the return value > + * of the last notifier function called. > + */ > + > +int srcu_notifier_call_chain(struct srcu_notifier_head *nh, > + unsigned long val, void *v) > +{ > + int ret; > + int idx; > + > + idx = srcu_read_lock(&nh->srcu); > + ret = notifier_call_chain(&nh->head, val, v); > + srcu_read_unlock(&nh->srcu, idx); > + return ret; > +} > + > +EXPORT_SYMBOL_GPL(srcu_notifier_call_chain); > + > +/** > + * srcu_init_notifier_head - Initialize an SRCU notifier head > + * @nh: Pointer to head of the srcu notifier chain > + * > + * Unlike other sorts of notifier heads, SRCU notifier heads require > + * dynamic initialization. Be sure to call this routine before > + * calling any of the other SRCU notifier routines for this head. > + * > + * If an SRCU notifier head is deallocated, it must first be cleaned > + * up by calling srcu_cleanup_notifier_head(). Otherwise the head's > + * per-cpu data (used by the SRCU mechanism) will leak. > + */ > + > +void srcu_init_notifier_head(struct srcu_notifier_head *nh) > +{ > + mutex_init(&nh->mutex); > + init_srcu_struct(&nh->srcu); > + nh->head = NULL; > +} > + > +EXPORT_SYMBOL_GPL(srcu_init_notifier_head); > + > /** > * register_reboot_notifier - Register function to be called at reboot time > * @nb: Info about notifier function to be called > Index: usb-2.6/include/linux/notifier.h > =================================================================== > --- usb-2.6.orig/include/linux/notifier.h > +++ usb-2.6/include/linux/notifier.h > @@ -12,9 +12,10 @@ > #include <linux/errno.h> > #include <linux/mutex.h> > #include <linux/rwsem.h> > +#include <linux/srcu.h> > > /* > - * Notifier chains are of three types: > + * Notifier chains are of four types: Is it possible to subsume one of the other three types? Might not be, but have to ask... > * > * Atomic notifier chains: Chain callbacks run in interrupt/atomic > * context. Callouts are not allowed to block. 
> @@ -23,13 +24,27 @@ > * Raw notifier chains: There are no restrictions on callbacks, > * registration, or unregistration. All locking and protection > * must be provided by the caller. > + * SRCU notifier chains: A variant of blocking notifier chains, with > + * the same restrictions. > * > * atomic_notifier_chain_register() may be called from an atomic context, > - * but blocking_notifier_chain_register() must be called from a process > - * context. Ditto for the corresponding _unregister() routines. > + * but blocking_notifier_chain_register() and srcu_notifier_chain_register() > + * must be called from a process context. Ditto for the corresponding > + * _unregister() routines. > * > - * atomic_notifier_chain_unregister() and blocking_notifier_chain_unregister() > - * _must not_ be called from within the call chain. > + * atomic_notifier_chain_unregister(), blocking_notifier_chain_unregister(), > + * and srcu_notifier_chain_unregister() _must not_ be called from within > + * the call chain. > + * > + * SRCU notifier chains are an alternative form of blocking notifier chains. > + * They use SRCU (Sleepable Read-Copy Update) instead of rw-semaphores for > + * protection of the chain links. This means there is _very_ low overhead > + * in srcu_notifier_call_chain(): no cache misses and no memory barriers. > + * As compensation, srcu_notifier_chain_unregister() is rather expensive. > + * SRCU notifier chains should be used when the chain will be called very > + * often but notifier_blocks will seldom be removed. Also, SRCU notifier > + * chains are slightly more difficult to use because they require dynamic > + * runtime initialization. 
> */ > > struct notifier_block { > @@ -52,6 +67,12 @@ struct raw_notifier_head { > struct notifier_block *head; > }; > > +struct srcu_notifier_head { > + struct mutex mutex; > + struct srcu_struct srcu; > + struct notifier_block *head; > +}; > + > #define ATOMIC_INIT_NOTIFIER_HEAD(name) do { \ > spin_lock_init(&(name)->lock); \ > (name)->head = NULL; \ > @@ -64,6 +85,11 @@ struct raw_notifier_head { > (name)->head = NULL; \ > } while (0) > > +/* srcu_notifier_heads must be initialized and cleaned up dynamically */ > +extern void srcu_init_notifier_head(struct srcu_notifier_head *nh); > +#define srcu_cleanup_notifier_head(name) \ > + cleanup_srcu_struct(&(name)->srcu); > + > #define ATOMIC_NOTIFIER_INIT(name) { \ > .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ > .head = NULL } > @@ -72,6 +98,7 @@ struct raw_notifier_head { > .head = NULL } > #define RAW_NOTIFIER_INIT(name) { \ > .head = NULL } > +/* srcu_notifier_heads cannot be initialized statically */ > > #define ATOMIC_NOTIFIER_HEAD(name) \ > struct atomic_notifier_head name = \ > @@ -91,6 +118,8 @@ extern int blocking_notifier_chain_regis > struct notifier_block *); > extern int raw_notifier_chain_register(struct raw_notifier_head *, > struct notifier_block *); > +extern int srcu_notifier_chain_register(struct srcu_notifier_head *, > + struct notifier_block *); > > extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *, > struct notifier_block *); > @@ -98,6 +127,8 @@ extern int blocking_notifier_chain_unreg > struct notifier_block *); > extern int raw_notifier_chain_unregister(struct raw_notifier_head *, > struct notifier_block *); > +extern int srcu_notifier_chain_unregister(struct srcu_notifier_head *, > + struct notifier_block *); > > extern int atomic_notifier_call_chain(struct atomic_notifier_head *, > unsigned long val, void *v); > @@ -105,6 +136,8 @@ extern int blocking_notifier_call_chain( > unsigned long val, void *v); > extern int raw_notifier_call_chain(struct raw_notifier_head *, > 
unsigned long val, void *v); > +extern int srcu_notifier_call_chain(struct srcu_notifier_head *, > + unsigned long val, void *v); > > #define NOTIFY_DONE 0x0000 /* Don't care */ > #define NOTIFY_OK 0x0001 /* Suits me */ > Index: usb-2.6/include/linux/srcu.h > =================================================================== > --- usb-2.6.orig/include/linux/srcu.h > +++ usb-2.6/include/linux/srcu.h > @@ -24,6 +24,9 @@ > * > */ > > +#ifndef _LINUX_SRCU_H > +#define _LINUX_SRCU_H > + > struct srcu_struct_array { > int c[2]; > }; > @@ -47,3 +50,5 @@ void srcu_read_unlock(struct srcu_struct > void synchronize_srcu(struct srcu_struct *sp); > long srcu_batches_completed(struct srcu_struct *sp); > void cleanup_srcu_struct(struct srcu_struct *sp); > + > +#endif > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: SRCU-based notifier chains 2006-07-11 17:39 ` Paul E. McKenney @ 2006-07-11 18:03 ` Alan Stern 2006-07-11 18:22 ` Paul E. McKenney 2006-07-11 18:18 ` [PATCH] Add " Alan Stern 1 sibling, 1 reply; 18+ messages in thread From: Alan Stern @ 2006-07-11 18:03 UTC (permalink / raw) To: Paul E. McKenney Cc: Matt Helsley, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen, LKML On Tue, 11 Jul 2006, Paul E. McKenney wrote: > Looks sane to me. A couple of minor comments interspersed. Okay, I'll submit it with a proper writeup. > > +/* > > + * SRCU notifier chain routines. Registration and unregistration > > + * use a mutex, and call_chain is synchronized by SRCU (no locks). > > + */ > > Hmmm... Probably my just failing to pay attention, but haven't noticed > the double-header-comment style before. As far as I know, I made it up. It seemed appropriate, since the first header applies to the entire group of three routines that follow whereas the second header is kerneldoc just for the next function. > > /* > > - * Notifier chains are of three types: > > + * Notifier chains are of four types: > > Is it possible to subsume one of the other three types? > > Might not be, but have to ask... In principle we could replace blocking notifiers, but in practice we can't. We can't just substitute one for the other for two reasons: SRCU notifiers need special initialization which the blocking notifiers don't have, and SRCU notifiers have different time/space tradeoffs which might not be appropriate for all existing blocking notifiers. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: SRCU-based notifier chains 2006-07-11 18:03 ` Alan Stern @ 2006-07-11 18:22 ` Paul E. McKenney 0 siblings, 0 replies; 18+ messages in thread From: Paul E. McKenney @ 2006-07-11 18:22 UTC (permalink / raw) To: Alan Stern Cc: Matt Helsley, Andrew Morton, dipankar, Ingo Molnar, tytso, Darren Hart, oleg, Jes Sorensen, LKML On Tue, Jul 11, 2006 at 02:03:50PM -0400, Alan Stern wrote: > On Tue, 11 Jul 2006, Paul E. McKenney wrote: > > > Looks sane to me. A couple of minor comments interspersed. > > Okay, I'll submit it with a proper writeup. > > > > +/* > > > + * SRCU notifier chain routines. Registration and unregistration > > > + * use a mutex, and call_chain is synchronized by SRCU (no locks). > > > + */ > > > > Hmmm... Probably my just failing to pay attention, but haven't noticed > > the double-header-comment style before. > > As far as I know, I made it up. It seemed appropriate, since the first > header applies to the entire group of three routines that follow whereas > the second header is kerneldoc just for the next function. Fair enough -- I missed the fact that the first header applies to all three functions. > > > /* > > > - * Notifier chains are of three types: > > > + * Notifier chains are of four types: > > > > Is it possible to subsume one of the other three types? > > > > Might not be, but have to ask... > > In principle we could replace blocking notifiers, but in practice we > can't. > > We can't just substitute one for the other for two reasons: SRCU notifiers > need special initialization which the blocking notifiers don't have, and > SRCU notifiers have different time/space tradeoffs which might not be > appropriate for all existing blocking notifiers. Again, fair enough! Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH] Add SRCU-based notifier chains 2006-07-11 17:39 ` Paul E. McKenney 2006-07-11 18:03 ` Alan Stern @ 2006-07-11 18:18 ` Alan Stern 2006-07-11 18:30 ` Paul E. McKenney 2006-07-12 0:56 ` Chandra Seetharaman 1 sibling, 2 replies; 18+ messages in thread From: Alan Stern @ 2006-07-11 18:18 UTC (permalink / raw) To: Andrew Morton Cc: Chandra Seetharaman, Paul E. McKenney, Matt Helsley, Benjamin LaHaise, Kernel development list This patch (as751) adds a new type of notifier chain, based on the SRCU (Sleepable Read-Copy Update) primitives recently added to the kernel. An SRCU notifier chain is much like a blocking notifier chain, in that it must be called in process context and its callout routines are allowed to sleep. The difference is that the chain's links are protected by the SRCU mechanism rather than by an rw-semaphore, so calling the chain has extremely low overhead: no memory barriers and no cache-line bouncing. On the other hand, unregistering from the chain is expensive and the chain head requires special runtime initialization (plus cleanup if it is to be deallocated). SRCU notifiers are appropriate for notifiers that will be called very frequently and for which unregistration occurs very seldom. The proposed "task notifier" scheme qualifies, as may some of the network notifiers. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> --- Index: usb-2.6/kernel/sys.c =================================================================== --- usb-2.6.orig/kernel/sys.c +++ usb-2.6/kernel/sys.c @@ -151,7 +151,7 @@ static int __kprobes notifier_call_chain /* * Atomic notifier chain routines. Registration and unregistration - * use a mutex, and call_chain is synchronized by RCU (no locks). + * use a spinlock, and call_chain is synchronized by RCU (no locks). */ /** @@ -399,6 +399,128 @@ int raw_notifier_call_chain(struct raw_n EXPORT_SYMBOL_GPL(raw_notifier_call_chain); +/* + * SRCU notifier chain routines. 
Registration and unregistration + * use a mutex, and call_chain is synchronized by SRCU (no locks). + */ + +/** + * srcu_notifier_chain_register - Add notifier to an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @n: New entry in notifier chain + * + * Adds a notifier to an SRCU notifier chain. + * Must be called in process context. + * + * Currently always returns zero. + */ + +int srcu_notifier_chain_register(struct srcu_notifier_head *nh, + struct notifier_block *n) +{ + int ret; + + /* + * This code gets used during boot-up, when task switching is + * not yet working and interrupts must remain disabled. At + * such times we must not call mutex_lock(). + */ + if (unlikely(system_state == SYSTEM_BOOTING)) + return notifier_chain_register(&nh->head, n); + + mutex_lock(&nh->mutex); + ret = notifier_chain_register(&nh->head, n); + mutex_unlock(&nh->mutex); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_chain_register); + +/** + * srcu_notifier_chain_unregister - Remove notifier from an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @n: Entry to remove from notifier chain + * + * Removes a notifier from an SRCU notifier chain. + * Must be called from process context. + * + * Returns zero on success or %-ENOENT on failure. + */ +int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh, + struct notifier_block *n) +{ + int ret; + + /* + * This code gets used during boot-up, when task switching is + * not yet working and interrupts must remain disabled. At + * such times we must not call mutex_lock(). 
+ */ + if (unlikely(system_state == SYSTEM_BOOTING)) + return notifier_chain_unregister(&nh->head, n); + + mutex_lock(&nh->mutex); + ret = notifier_chain_unregister(&nh->head, n); + mutex_unlock(&nh->mutex); + synchronize_srcu(&nh->srcu); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_chain_unregister); + +/** + * srcu_notifier_call_chain - Call functions in an SRCU notifier chain + * @nh: Pointer to head of the SRCU notifier chain + * @val: Value passed unmodified to notifier function + * @v: Pointer passed unmodified to notifier function + * + * Calls each function in a notifier chain in turn. The functions + * run in a process context, so they are allowed to block. + * + * If the return value of the notifier can be and'ed + * with %NOTIFY_STOP_MASK then srcu_notifier_call_chain + * will return immediately, with the return value of + * the notifier function which halted execution. + * Otherwise the return value is the return value + * of the last notifier function called. + */ + +int srcu_notifier_call_chain(struct srcu_notifier_head *nh, + unsigned long val, void *v) +{ + int ret; + int idx; + + idx = srcu_read_lock(&nh->srcu); + ret = notifier_call_chain(&nh->head, val, v); + srcu_read_unlock(&nh->srcu, idx); + return ret; +} + +EXPORT_SYMBOL_GPL(srcu_notifier_call_chain); + +/** + * srcu_init_notifier_head - Initialize an SRCU notifier head + * @nh: Pointer to head of the srcu notifier chain + * + * Unlike other sorts of notifier heads, SRCU notifier heads require + * dynamic initialization. Be sure to call this routine before + * calling any of the other SRCU notifier routines for this head. + * + * If an SRCU notifier head is deallocated, it must first be cleaned + * up by calling srcu_cleanup_notifier_head(). Otherwise the head's + * per-cpu data (used by the SRCU mechanism) will leak. 
+ */ + +void srcu_init_notifier_head(struct srcu_notifier_head *nh) +{ + mutex_init(&nh->mutex); + init_srcu_struct(&nh->srcu); + nh->head = NULL; +} + +EXPORT_SYMBOL_GPL(srcu_init_notifier_head); + /** * register_reboot_notifier - Register function to be called at reboot time * @nb: Info about notifier function to be called Index: usb-2.6/include/linux/notifier.h =================================================================== --- usb-2.6.orig/include/linux/notifier.h +++ usb-2.6/include/linux/notifier.h @@ -12,9 +12,10 @@ #include <linux/errno.h> #include <linux/mutex.h> #include <linux/rwsem.h> +#include <linux/srcu.h> /* - * Notifier chains are of three types: + * Notifier chains are of four types: * * Atomic notifier chains: Chain callbacks run in interrupt/atomic * context. Callouts are not allowed to block. @@ -23,13 +24,27 @@ * Raw notifier chains: There are no restrictions on callbacks, * registration, or unregistration. All locking and protection * must be provided by the caller. + * SRCU notifier chains: A variant of blocking notifier chains, with + * the same restrictions. * * atomic_notifier_chain_register() may be called from an atomic context, - * but blocking_notifier_chain_register() must be called from a process - * context. Ditto for the corresponding _unregister() routines. + * but blocking_notifier_chain_register() and srcu_notifier_chain_register() + * must be called from a process context. Ditto for the corresponding + * _unregister() routines. * - * atomic_notifier_chain_unregister() and blocking_notifier_chain_unregister() - * _must not_ be called from within the call chain. + * atomic_notifier_chain_unregister(), blocking_notifier_chain_unregister(), + * and srcu_notifier_chain_unregister() _must not_ be called from within + * the call chain. + * + * SRCU notifier chains are an alternative form of blocking notifier chains. + * They use SRCU (Sleepable Read-Copy Update) instead of rw-semaphores for + * protection of the chain links. 
This means there is _very_ low overhead + * in srcu_notifier_call_chain(): no cache bounces and no memory barriers. + * As compensation, srcu_notifier_chain_unregister() is rather expensive. + * SRCU notifier chains should be used when the chain will be called very + * often but notifier_blocks will seldom be removed. Also, SRCU notifier + * chains are slightly more difficult to use because they require special + * runtime initialization. */ struct notifier_block { @@ -52,6 +67,12 @@ struct raw_notifier_head { struct notifier_block *head; }; +struct srcu_notifier_head { + struct mutex mutex; + struct srcu_struct srcu; + struct notifier_block *head; +}; + #define ATOMIC_INIT_NOTIFIER_HEAD(name) do { \ spin_lock_init(&(name)->lock); \ (name)->head = NULL; \ @@ -64,6 +85,11 @@ struct raw_notifier_head { (name)->head = NULL; \ } while (0) +/* srcu_notifier_heads must be initialized and cleaned up dynamically */ +extern void srcu_init_notifier_head(struct srcu_notifier_head *nh); +#define srcu_cleanup_notifier_head(name) \ + cleanup_srcu_struct(&(name)->srcu); + #define ATOMIC_NOTIFIER_INIT(name) { \ .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ .head = NULL } @@ -72,6 +98,7 @@ struct raw_notifier_head { .head = NULL } #define RAW_NOTIFIER_INIT(name) { \ .head = NULL } +/* srcu_notifier_heads cannot be initialized statically */ #define ATOMIC_NOTIFIER_HEAD(name) \ struct atomic_notifier_head name = \ @@ -91,6 +118,8 @@ extern int blocking_notifier_chain_regis struct notifier_block *); extern int raw_notifier_chain_register(struct raw_notifier_head *, struct notifier_block *); +extern int srcu_notifier_chain_register(struct srcu_notifier_head *, + struct notifier_block *); extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *, struct notifier_block *); @@ -98,6 +127,8 @@ extern int blocking_notifier_chain_unreg struct notifier_block *); extern int raw_notifier_chain_unregister(struct raw_notifier_head *, struct notifier_block *); +extern int 
srcu_notifier_chain_unregister(struct srcu_notifier_head *, + struct notifier_block *); extern int atomic_notifier_call_chain(struct atomic_notifier_head *, unsigned long val, void *v); @@ -105,6 +136,8 @@ extern int blocking_notifier_call_chain( unsigned long val, void *v); extern int raw_notifier_call_chain(struct raw_notifier_head *, unsigned long val, void *v); +extern int srcu_notifier_call_chain(struct srcu_notifier_head *, + unsigned long val, void *v); #define NOTIFY_DONE 0x0000 /* Don't care */ #define NOTIFY_OK 0x0001 /* Suits me */ Index: usb-2.6/include/linux/srcu.h =================================================================== --- usb-2.6.orig/include/linux/srcu.h +++ usb-2.6/include/linux/srcu.h @@ -24,6 +24,9 @@ * */ +#ifndef _LINUX_SRCU_H +#define _LINUX_SRCU_H + struct srcu_struct_array { int c[2]; }; @@ -47,3 +50,5 @@ void srcu_read_unlock(struct srcu_struct void synchronize_srcu(struct srcu_struct *sp); long srcu_batches_completed(struct srcu_struct *sp); void cleanup_srcu_struct(struct srcu_struct *sp); + +#endif ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] Add SRCU-based notifier chains 2006-07-11 18:18 ` [PATCH] Add " Alan Stern @ 2006-07-11 18:30 ` Paul E. McKenney 2006-07-12 0:56 ` Chandra Seetharaman 1 sibling, 0 replies; 18+ messages in thread From: Paul E. McKenney @ 2006-07-11 18:30 UTC (permalink / raw) To: Alan Stern Cc: Andrew Morton, Chandra Seetharaman, Matt Helsley, Benjamin LaHaise, Kernel development list On Tue, Jul 11, 2006 at 02:18:53PM -0400, Alan Stern wrote: > This patch (as751) adds a new type of notifier chain, based on the SRCU > (Sleepable Read-Copy Update) primitives recently added to the kernel. An > SRCU notifier chain is much like a blocking notifier chain, in that it > must be called in process context and its callout routines are allowed to > sleep. The difference is that the chain's links are protected by the SRCU > mechanism rather than by an rw-semaphore, so calling the chain has > extremely low overhead: no memory barriers and no cache-line bouncing. > On the other hand, unregistering from the chain is expensive and the chain > head requires special runtime initialization (plus cleanup if it is to be > deallocated). > > SRCU notifiers are appropriate for notifiers that will be called very > frequently and for which unregistration occurs very seldom. The proposed > "task notifier" scheme qualifies, as may some of the network notifiers. Looks good from an SRCU perspective! Looks like notifier_chain_register() already contains the required rcu_assign_pointer(), and notifier_call_chain() already contains the required rcu_dereference(). Acked-by: Paul E. McKenney <paulmck@us.ibm.com> > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > > --- > > Index: usb-2.6/kernel/sys.c > =================================================================== > --- usb-2.6.orig/kernel/sys.c > +++ usb-2.6/kernel/sys.c > @@ -151,7 +151,7 @@ static int __kprobes notifier_call_chain > > /* > * Atomic notifier chain routines. 
Registration and unregistration > - * use a mutex, and call_chain is synchronized by RCU (no locks). > + * use a spinlock, and call_chain is synchronized by RCU (no locks). > */ > > /** > @@ -399,6 +399,128 @@ int raw_notifier_call_chain(struct raw_n > > EXPORT_SYMBOL_GPL(raw_notifier_call_chain); > > +/* > + * SRCU notifier chain routines. Registration and unregistration > + * use a mutex, and call_chain is synchronized by SRCU (no locks). > + */ > + > +/** > + * srcu_notifier_chain_register - Add notifier to an SRCU notifier chain > + * @nh: Pointer to head of the SRCU notifier chain > + * @n: New entry in notifier chain > + * > + * Adds a notifier to an SRCU notifier chain. > + * Must be called in process context. > + * > + * Currently always returns zero. > + */ > + > +int srcu_notifier_chain_register(struct srcu_notifier_head *nh, > + struct notifier_block *n) > +{ > + int ret; > + > + /* > + * This code gets used during boot-up, when task switching is > + * not yet working and interrupts must remain disabled. At > + * such times we must not call mutex_lock(). > + */ > + if (unlikely(system_state == SYSTEM_BOOTING)) > + return notifier_chain_register(&nh->head, n); > + > + mutex_lock(&nh->mutex); > + ret = notifier_chain_register(&nh->head, n); > + mutex_unlock(&nh->mutex); > + return ret; > +} > + > +EXPORT_SYMBOL_GPL(srcu_notifier_chain_register); > + > +/** > + * srcu_notifier_chain_unregister - Remove notifier from an SRCU notifier chain > + * @nh: Pointer to head of the SRCU notifier chain > + * @n: Entry to remove from notifier chain > + * > + * Removes a notifier from an SRCU notifier chain. > + * Must be called from process context. > + * > + * Returns zero on success or %-ENOENT on failure. > + */ > +int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh, > + struct notifier_block *n) > +{ > + int ret; > + > + /* > + * This code gets used during boot-up, when task switching is > + * not yet working and interrupts must remain disabled. 
* Re: [PATCH] Add SRCU-based notifier chains 2006-07-11 18:18 ` [PATCH] Add " Alan Stern 2006-07-11 18:30 ` Paul E. McKenney @ 2006-07-12 0:56 ` Chandra Seetharaman 1 sibling, 0 replies; 18+ messages in thread From: Chandra Seetharaman @ 2006-07-12 0:56 UTC (permalink / raw) To: Alan Stern Cc: Andrew Morton, Paul E. McKenney, Matt Helsley, Benjamin LaHaise, Kernel development list On Tue, 2006-07-11 at 14:18 -0400, Alan Stern wrote: > This patch (as751) adds a new type of notifier chain, based on the SRCU > (Sleepable Read-Copy Update) primitives recently added to the kernel. An > SRCU notifier chain is much like a blocking notifier chain, in that it > must be called in process context and its callout routines are allowed to > sleep. The difference is that the chain's links are protected by the SRCU > mechanism rather than by an rw-semaphore, so calling the chain has > extremely low overhead: no memory barriers and no cache-line bouncing. > On the other hand, unregistering from the chain is expensive and the chain > head requires special runtime initialization (plus cleanup if it is to be > deallocated). > > SRCU notifiers are appropriate for notifiers that will be called very > frequently and for which unregistration occurs very seldom. The proposed > "task notifier" scheme qualifies, as may some of the network notifiers. > > Looks good from the notifier mechanism perspective. Acked-by: Chandra Seetharaman <sekharan@us.ibm.com> > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > > --- > > Index: usb-2.6/kernel/sys.c > =================================================================== > --- usb-2.6.orig/kernel/sys.c > +++ usb-2.6/kernel/sys.c > @@ -151,7 +151,7 @@ static int __kprobes notifier_call_chain > > /* > * Atomic notifier chain routines. Registration and unregistration > - * use a mutex, and call_chain is synchronized by RCU (no locks). > + * use a spinlock, and call_chain is synchronized by RCU (no locks). 
> */ > > /** > @@ -399,6 +399,128 @@ int raw_notifier_call_chain(struct raw_n > > EXPORT_SYMBOL_GPL(raw_notifier_call_chain); > > +/* > + * SRCU notifier chain routines. Registration and unregistration > + * use a mutex, and call_chain is synchronized by SRCU (no locks). > + */ > + > +/** > + * srcu_notifier_chain_register - Add notifier to an SRCU notifier chain > + * @nh: Pointer to head of the SRCU notifier chain > + * @n: New entry in notifier chain > + * > + * Adds a notifier to an SRCU notifier chain. > + * Must be called in process context. > + * > + * Currently always returns zero. > + */ > + > +int srcu_notifier_chain_register(struct srcu_notifier_head *nh, > + struct notifier_block *n) > +{ > + int ret; > + > + /* > + * This code gets used during boot-up, when task switching is > + * not yet working and interrupts must remain disabled. At > + * such times we must not call mutex_lock(). > + */ > + if (unlikely(system_state == SYSTEM_BOOTING)) > + return notifier_chain_register(&nh->head, n); > + > + mutex_lock(&nh->mutex); > + ret = notifier_chain_register(&nh->head, n); > + mutex_unlock(&nh->mutex); > + return ret; > +} > + > +EXPORT_SYMBOL_GPL(srcu_notifier_chain_register); > + > +/** > + * srcu_notifier_chain_unregister - Remove notifier from an SRCU notifier chain > + * @nh: Pointer to head of the SRCU notifier chain > + * @n: Entry to remove from notifier chain > + * > + * Removes a notifier from an SRCU notifier chain. > + * Must be called from process context. > + * > + * Returns zero on success or %-ENOENT on failure. > + */ > +int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh, > + struct notifier_block *n) > +{ > + int ret; > + > + /* > + * This code gets used during boot-up, when task switching is > + * not yet working and interrupts must remain disabled. At > + * such times we must not call mutex_lock(). 
> + */ > + if (unlikely(system_state == SYSTEM_BOOTING)) > + return notifier_chain_unregister(&nh->head, n); > + > + mutex_lock(&nh->mutex); > + ret = notifier_chain_unregister(&nh->head, n); > + mutex_unlock(&nh->mutex); > + synchronize_srcu(&nh->srcu); > + return ret; > +} > + > +EXPORT_SYMBOL_GPL(srcu_notifier_chain_unregister); > + > +/** > + * srcu_notifier_call_chain - Call functions in an SRCU notifier chain > + * @nh: Pointer to head of the SRCU notifier chain > + * @val: Value passed unmodified to notifier function > + * @v: Pointer passed unmodified to notifier function > + * > + * Calls each function in a notifier chain in turn. The functions > + * run in a process context, so they are allowed to block. > + * > + * If the return value of the notifier can be and'ed > + * with %NOTIFY_STOP_MASK then srcu_notifier_call_chain > + * will return immediately, with the return value of > + * the notifier function which halted execution. > + * Otherwise the return value is the return value > + * of the last notifier function called. > + */ > + > +int srcu_notifier_call_chain(struct srcu_notifier_head *nh, > + unsigned long val, void *v) > +{ > + int ret; > + int idx; > + > + idx = srcu_read_lock(&nh->srcu); > + ret = notifier_call_chain(&nh->head, val, v); > + srcu_read_unlock(&nh->srcu, idx); > + return ret; > +} > + > +EXPORT_SYMBOL_GPL(srcu_notifier_call_chain); > + > +/** > + * srcu_init_notifier_head - Initialize an SRCU notifier head > + * @nh: Pointer to head of the srcu notifier chain > + * > + * Unlike other sorts of notifier heads, SRCU notifier heads require > + * dynamic initialization. Be sure to call this routine before > + * calling any of the other SRCU notifier routines for this head. > + * > + * If an SRCU notifier head is deallocated, it must first be cleaned > + * up by calling srcu_cleanup_notifier_head(). Otherwise the head's > + * per-cpu data (used by the SRCU mechanism) will leak. 
> + */ > + > +void srcu_init_notifier_head(struct srcu_notifier_head *nh) > +{ > + mutex_init(&nh->mutex); > + init_srcu_struct(&nh->srcu); > + nh->head = NULL; > +} > + > +EXPORT_SYMBOL_GPL(srcu_init_notifier_head); > + > /** > * register_reboot_notifier - Register function to be called at reboot time > * @nb: Info about notifier function to be called > Index: usb-2.6/include/linux/notifier.h > =================================================================== > --- usb-2.6.orig/include/linux/notifier.h > +++ usb-2.6/include/linux/notifier.h > @@ -12,9 +12,10 @@ > #include <linux/errno.h> > #include <linux/mutex.h> > #include <linux/rwsem.h> > +#include <linux/srcu.h> > > /* > - * Notifier chains are of three types: > + * Notifier chains are of four types: > * > * Atomic notifier chains: Chain callbacks run in interrupt/atomic > * context. Callouts are not allowed to block. > @@ -23,13 +24,27 @@ > * Raw notifier chains: There are no restrictions on callbacks, > * registration, or unregistration. All locking and protection > * must be provided by the caller. > + * SRCU notifier chains: A variant of blocking notifier chains, with > + * the same restrictions. > * > * atomic_notifier_chain_register() may be called from an atomic context, > - * but blocking_notifier_chain_register() must be called from a process > - * context. Ditto for the corresponding _unregister() routines. > + * but blocking_notifier_chain_register() and srcu_notifier_chain_register() > + * must be called from a process context. Ditto for the corresponding > + * _unregister() routines. > * > - * atomic_notifier_chain_unregister() and blocking_notifier_chain_unregister() > - * _must not_ be called from within the call chain. > + * atomic_notifier_chain_unregister(), blocking_notifier_chain_unregister(), > + * and srcu_notifier_chain_unregister() _must not_ be called from within > + * the call chain. > + * > + * SRCU notifier chains are an alternative form of blocking notifier chains. 
> + * They use SRCU (Sleepable Read-Copy Update) instead of rw-semaphores for > + * protection of the chain links. This means there is _very_ low overhead > + * in srcu_notifier_call_chain(): no cache bounces and no memory barriers. > + * As compensation, srcu_notifier_chain_unregister() is rather expensive. > + * SRCU notifier chains should be used when the chain will be called very > + * often but notifier_blocks will seldom be removed. Also, SRCU notifier > + * chains are slightly more difficult to use because they require special > + * runtime initialization. > */ > > struct notifier_block { > @@ -52,6 +67,12 @@ struct raw_notifier_head { > struct notifier_block *head; > }; > > +struct srcu_notifier_head { > + struct mutex mutex; > + struct srcu_struct srcu; > + struct notifier_block *head; > +}; > + > #define ATOMIC_INIT_NOTIFIER_HEAD(name) do { \ > spin_lock_init(&(name)->lock); \ > (name)->head = NULL; \ > @@ -64,6 +85,11 @@ struct raw_notifier_head { > (name)->head = NULL; \ > } while (0) > > +/* srcu_notifier_heads must be initialized and cleaned up dynamically */ > +extern void srcu_init_notifier_head(struct srcu_notifier_head *nh); > +#define srcu_cleanup_notifier_head(name) \ > + cleanup_srcu_struct(&(name)->srcu); > + > #define ATOMIC_NOTIFIER_INIT(name) { \ > .lock = __SPIN_LOCK_UNLOCKED(name.lock), \ > .head = NULL } > @@ -72,6 +98,7 @@ struct raw_notifier_head { > .head = NULL } > #define RAW_NOTIFIER_INIT(name) { \ > .head = NULL } > +/* srcu_notifier_heads cannot be initialized statically */ > > #define ATOMIC_NOTIFIER_HEAD(name) \ > struct atomic_notifier_head name = \ > @@ -91,6 +118,8 @@ extern int blocking_notifier_chain_regis > struct notifier_block *); > extern int raw_notifier_chain_register(struct raw_notifier_head *, > struct notifier_block *); > +extern int srcu_notifier_chain_register(struct srcu_notifier_head *, > + struct notifier_block *); > > extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *, > struct 
notifier_block *); > @@ -98,6 +127,8 @@ extern int blocking_notifier_chain_unreg > struct notifier_block *); > extern int raw_notifier_chain_unregister(struct raw_notifier_head *, > struct notifier_block *); > +extern int srcu_notifier_chain_unregister(struct srcu_notifier_head *, > + struct notifier_block *); > > extern int atomic_notifier_call_chain(struct atomic_notifier_head *, > unsigned long val, void *v); > @@ -105,6 +136,8 @@ extern int blocking_notifier_call_chain( > unsigned long val, void *v); > extern int raw_notifier_call_chain(struct raw_notifier_head *, > unsigned long val, void *v); > +extern int srcu_notifier_call_chain(struct srcu_notifier_head *, > + unsigned long val, void *v); > > #define NOTIFY_DONE 0x0000 /* Don't care */ > #define NOTIFY_OK 0x0001 /* Suits me */ > Index: usb-2.6/include/linux/srcu.h > =================================================================== > --- usb-2.6.orig/include/linux/srcu.h > +++ usb-2.6/include/linux/srcu.h > @@ -24,6 +24,9 @@ > * > */ > > +#ifndef _LINUX_SRCU_H > +#define _LINUX_SRCU_H > + > struct srcu_struct_array { > int c[2]; > }; > @@ -47,3 +50,5 @@ void srcu_read_unlock(struct srcu_struct > void synchronize_srcu(struct srcu_struct *sp); > long srcu_batches_completed(struct srcu_struct *sp); > void cleanup_srcu_struct(struct srcu_struct *sp); > + > +#endif > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sekharan@us.ibm.com | .......you may get it. ---------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 18+ messages in thread
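[Editor's note] The core that srcu_notifier_call_chain() wraps between srcu_read_lock() and srcu_read_unlock() is notifier_call_chain(): a plain list walk that halts early when a callout returns a value with %NOTIFY_STOP_MASK set, exactly as the kerneldoc above describes. A minimal user-space model of that walk (the `nb`, `call_chain_model`, and `count_*` names are hypothetical, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

struct nb {
	int (*call)(struct nb *nb, unsigned long val, void *v);
	struct nb *next;
};

/* Model of notifier_call_chain(): invoke each block in turn, halting
 * as soon as a callout returns a value with NOTIFY_STOP_MASK set. */
static int call_chain_model(struct nb *head, unsigned long val, void *v)
{
	int ret = NOTIFY_DONE;

	while (head != NULL) {
		ret = head->call(head, val, v);
		if (ret & NOTIFY_STOP_MASK)
			break;
		head = head->next;
	}
	return ret;
}

/* Two toy callouts: one that lets the walk continue, one that halts it.
 * Each counts how many callouts actually ran via the opaque pointer. */
static int count_ok(struct nb *nb, unsigned long val, void *v)
{
	(void)nb; (void)val;
	++*(int *)v;
	return NOTIFY_OK;
}

static int count_stop(struct nb *nb, unsigned long val, void *v)
{
	(void)nb; (void)val;
	++*(int *)v;
	return NOTIFY_STOP_MASK | NOTIFY_OK;
}
```

Because this walk performs only reads of the chain links, the SRCU variant can run it under srcu_read_lock() with no locks at all; only register/unregister need the mutex.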
[parent not found: <20060711172530.GA93@oleg>]
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking [not found] <20060711172530.GA93@oleg> @ 2006-07-11 14:56 ` Alan Stern 2006-07-11 18:21 ` Paul E. McKenney 0 siblings, 1 reply; 18+ messages in thread From: Alan Stern @ 2006-07-11 14:56 UTC (permalink / raw) To: Oleg Nesterov Cc: Paul E. McKenney, Andrew Morton, matthltc, dipankar, mingo, tytso, dvhltc, jes, Kernel development list On Tue, 11 Jul 2006, Oleg Nesterov wrote: > Let's look at the 3-rd synchronize_sched() call. > > Suppose that the reader writes to the rcu-protected memory and then > does srcu_read_unlock(). It is possible that the writer sees the result > of srcu_read_unlock() (so that srcu_readers_active_idx() returns 0) > but does not see other writes yet. The writer does synchronize_sched(), > the reader (according to 2) above) does mb(). But this doesn't guarantee > that all preceding mem ops were finished, so it is possible that > synchronize_srcu() returns and probably frees that memory too early. > > If it were possible to "insert" mb() into srcu_read_unlock() before ->c[]--, > we are ok, but synchronize_sched() is a "shot in the dark". > > In a similar manner, the 2-nd synchronize_sched() requires (I think) that > all changes which were done by the current CPU should be visible to other > CPUs before return. > > Does it make sense? Yes, it's a valid concern. Paul will correct me if this is wrong... As I understand it, the code which responds to a synchronize_sched() call (that is, the code which runs on other CPUs) does lots of stuff, involving context changes and who-knows-what else. There are lots of memory barriers in there, buried in the middle, along with plenty of other memory reads and writes. 
In outline, the reader's CPU responds to synchronize_sched() something like this: At the next safe time (context switch): mb(); Tell the writer's CPU that the preempt count on the reader's CPU is 0 Since that "tell the writer's CPU" step won't complete until after the mb() has forced all preceding memory operations to complete, we are safe. So perhaps the documentation for synchronize_sched() and synchronize_rcu() should say that when the call returns, all CPUs (including the caller's) will have executed a memory barrier and so will have completed all memory operations that started before the synchronize_xxx() call was issued. Alan Stern ^ permalink raw reply [flat|nested] 18+ messages in thread
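[Editor's note] The ordering in Alan's outline — all of the reader's memory operations forced to complete before the "tell the writer's CPU" report — is the classic release/acquire pairing. A hedged user-space analogue using C11 atomics (illustrative only: the kernel path goes through the scheduler and real mb() calls, not these primitives, and the names here are invented):

```c
#include <assert.h>
#include <stdatomic.h>

static int critical_data;	/* a plain store inside the read-side critical section */
static atomic_int quiescent;	/* the "tell the writer's CPU" report */

/* Reader side: the mb() at the context switch orders all earlier
 * memory operations before the quiescent-state report, modeled here
 * as a release store. */
static void reader_report(void)
{
	critical_data = 42;
	atomic_store_explicit(&quiescent, 1, memory_order_release);
}

/* Writer side: observing the report with acquire ordering guarantees
 * the reader's earlier stores are visible as well. */
static int writer_observe(void)
{
	while (!atomic_load_explicit(&quiescent, memory_order_acquire))
		;
	return critical_data;	/* guaranteed to see 42 */
}
```

The release/acquire pair is what makes "that step won't complete until after the mb()" a safety argument rather than a hope: the report carries the earlier writes with it.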
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking 2006-07-11 14:56 ` [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking Alan Stern @ 2006-07-11 18:21 ` Paul E. McKenney 0 siblings, 0 replies; 18+ messages in thread From: Paul E. McKenney @ 2006-07-11 18:21 UTC (permalink / raw) To: Alan Stern Cc: Oleg Nesterov, Andrew Morton, matthltc, dipankar, mingo, tytso, dvhltc, jes, Kernel development list On Tue, Jul 11, 2006 at 10:56:05AM -0400, Alan Stern wrote: > On Tue, 11 Jul 2006, Oleg Nesterov wrote: > > > Let's look at the 3-rd synchronize_sched() call. > > > > Suppose that the reader writes to the rcu-protected memory and then > > does srcu_read_unlock(). It is possible that the writer sees the result > > of srcu_read_unlock() (so that srcu_readers_active_idx() returns 0) > > but does not see other writes yet. The writer does synchronize_sched(), > > the reader (according to 2) above) does mb(). But this doesn't guarantee > > that all preceding mem ops were finished, so it is possible that > > synchronize_srcu() returns and probably frees that memory too early. > > > > If it were possible to "insert" mb() into srcu_read_unlock() before ->c[]--, > > we are ok, but synchronize_sched() is a "shot in the dark". > > > > In a similar manner, the 2-nd synchronize_sched() requires (I think) that > > all changes which were done by the current CPU should be visible to other > > CPUs before return. > > > > Does it make sense? > > Yes, it's a valid concern. > > Paul will correct me if this is wrong... As I understand it, the code > which responds to a synchronize_sched() call (that is, the code which runs > on other CPUs) does lots of stuff, involving context changes and > who-knows-what else. There are lots of memory barriers in there, buried > in the middle, along with plenty of other memory reads and writes. 
> > In outline, the reader's CPU responds to synchronize_sched() something > like this: > > At the next safe time (context switch): > mb(); > Tell the writer's CPU that the preempt count on > the reader's CPU is 0 > > Since that "tell the writer's CPU" step won't complete until after the > mb() has forced all preceding memory operations to complete, we are safe. > > So perhaps the documentation for synchronize_sched() and synchronize_rcu() > should say that when the call returns, all CPUs (including the caller's) > will have executed a memory barrier and so will have completed all memory > operations that started before the synchronize_xxx() call was issued. I would instead think of this in the context of memory-barrier pairings as documented in Documentation/memory-barriers.txt. As Oleg notes above, synchronize_sched() is a "shot in the dark", since there is nothing that lines up the synchronize_srcu() with any concurrent srcu_read_lock() or srcu_read_unlock(). The trick here is that the synchronize_sched() eliminates one of the possibilities (taking this from the viewpoint of the last synchronize_sched() in synchronize_srcu() on CPU 0 interacting with a concurrent srcu_read_unlock() on CPU 1): 1. srcu_read_unlock()'s counter decrement is perceived by CPU 1 as happening after all of the read-side critical-section accesses. This is OK, because this means that the critical section completes before synchronize_srcu()'s checks, and thus before any destructive operations following the synchronize_srcu(). 2. srcu_read_unlock()'s counter decrement is perceived by CPU 1 as happening -before- some of the read-side critical-section accesses. This could be bad, but... The synchronize_srcu() forces CPU 0 to execute a memory barrier on each active CPU, including CPU 1. 
CPU 0 must subsequently execute a memory barrier to become aware of the grace period ending, so that it sees -all- CPU 1's earlier memory accesses (required, because CPU 0 had to have seen, perhaps indirectly, CPU 1's indication that it had gone through a quiescent state). Therefore, by the time CPU 0 is done with the synchronize_sched(), the memory accesses that CPU 1 executed as part of the SRCU read-side critical section must all be visible. Any sequence of events that has the srcu_read_unlock() happening after the synchronize_sched() starts is prohibited, since the synchronize_sched() won't start until the counters all hit zero. In case people were wondering, the ornateness of the above reasoning is one reason I have been resisting any attempt to further complicate the SRCU stuff. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
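[Editor's note] Oleg's hypothetical alternative — a barrier inside srcu_read_unlock() before the ->c[]-- decrement — can be sketched in user space with a C11 fence standing in for mb() (a release fence is the closest portable analogue; the posted patch instead gets the equivalent guarantee from synchronize_sched() running a barrier on every CPU). All names below are invented for illustration:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int reader_count[2];	/* one counter pair, as in srcu_struct_array */

/* Model of srcu_read_lock(): count the reader under the current index. */
static int lock_model(int completed)
{
	int idx = completed & 0x1;

	atomic_fetch_add_explicit(&reader_count[idx], 1, memory_order_relaxed);
	return idx;
}

/* The hypothetical unlock variant: a fence before the decrement orders
 * every critical-section access before the decrement becomes visible,
 * so a writer that sees the counter hit zero also sees those accesses. */
static void unlock_with_barrier(int idx)
{
	atomic_thread_fence(memory_order_release);
	atomic_fetch_sub_explicit(&reader_count[idx], 1, memory_order_relaxed);
}
```

With this variant, ordering case 2 above (decrement perceived before some critical-section accesses) cannot arise, which is exactly why the extra synchronize_sched() would become unnecessary.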
* [PATCH 0/2] srcu-3: add RCU variant that permits read-side blocking @ 2006-07-06 17:14 Paul E. McKenney 2006-07-06 17:20 ` [PATCH 1/2] srcu-3: RCU variant permitting " Paul E. McKenney 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2006-07-06 17:14 UTC (permalink / raw) To: linux-kernel Cc: akpm, matthltc, dipankar, stern, mingo, tytso, dvhltc, oleg, jes Version 3 of the SRCU patchset. This patch incorporates a number of improvements, many of which came up in off-list discussions with Alan Stern. Neither of us are sure why these discussions ended up off-list, so I have summarized them below. o Fixes some "zombie code" -- excess curly braces and the like. o Gets rid of the double-flip in favor of an additional synchronize_sched(). This turned out to be safe, despite my saying otherwise at http://lkml.org/lkml/2006/6/27/486. The trick that I was missing is that synchronize_sched() forces all CPUs to execute at least one memory barrier during the synchronize_sched()'s execution, which forces all CPUs to see synchronize_srcu()'s counter increment as happening after any memory manipulations prior to the synchronize_srcu(). Upgraded comments to indicate what the synchronize_sched() calls are needed for. o Added a barrier() to both srcu_read_lock() and srcu_read_unlock() to prevent the compiler from performing optimizations that would cause the critical section to move outside of the enclosing srcu_read_lock() and srcu_read_unlock(). However, these barrier()s in srcu_read_lock() and srcu_read_unlock() are needed only in non-CONFIG_PREEMPT kernels, so they compile to nothing in CONFIG_PREEMPT kernels (where the preempt_disable() and preempt_enable() calls supply the needed barrier() call). o Added a check to synchronize_srcu() that permits this primitive to take advantage of grace periods induced by concurrent executions in other tasks. 
This can be useful in cases where you are using a single srcu_struct to handle all the individually-locked chains of a hash table, for example. o cleanup_srcu_struct() now contains error checks to catch cases where readers are still using the srcu_struct in question. It does a WARN_ON() and leaks the srcu_struct's per-CPU data in that case. o There is an srcu_readers_active() that returns the number of readers (approximate!) currently using the specified srcu_struct. This can be useful when terminating use of an srcu_struct, e.g., at module-unload time. o Improved the RCU torture tests, increasing the skew on reader times and providing implementation-specific delay functions. ^ permalink raw reply [flat|nested] 18+ messages in thread
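[Editor's note] The single flip that replaced the double-flip can be modeled in miniature: readers count themselves under the current index, and the grace-period path flips the index so new readers use the other rank, then waits for the old rank to drain to zero. A single-threaded sketch with hypothetical `model_*` names (no barriers shown — those are the whole subject of the thread):

```c
#include <assert.h>

#define NCPU 4

struct srcu_model {
	int completed;
	int c[NCPU][2];	/* per-"CPU" counter pairs, as in srcu_struct_array */
};

/* Model of srcu_read_lock(): count the reader under the current index. */
static int model_read_lock(struct srcu_model *sp, int cpu)
{
	int idx = sp->completed & 0x1;

	sp->c[cpu][idx]++;
	return idx;
}

/* Model of srcu_read_unlock(): drop the count under the saved index. */
static void model_read_unlock(struct srcu_model *sp, int cpu, int idx)
{
	sp->c[cpu][idx]--;
}

/* Model of srcu_readers_active_idx(): sum one rank across all CPUs. */
static int model_readers_active_idx(struct srcu_model *sp, int idx)
{
	int cpu, sum = 0;

	for (cpu = 0; cpu < NCPU; cpu++)
		sum += sp->c[cpu][idx];
	return sum;
}

/* Model of the flip in synchronize_srcu(): advance ->completed so new
 * readers use the other rank; the caller then waits for the old rank
 * to drain to zero before destructive operations proceed. */
static int model_flip(struct srcu_model *sp)
{
	int old_idx = sp->completed & 0x1;

	sp->completed++;
	return old_idx;
}
```

The check that "permits this primitive to take advantage of grace periods induced by concurrent executions" amounts to observing that someone else's flip already drained the rank your readers were counted under.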
* [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking 2006-07-06 17:14 [PATCH 0/2] srcu-3: add RCU variant that permits " Paul E. McKenney @ 2006-07-06 17:20 ` Paul E. McKenney [not found] ` <20060709235029.GA194@oleg> 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2006-07-06 17:20 UTC (permalink / raw) To: linux-kernel Cc: akpm, matthltc, dipankar, stern, mingo, tytso, dvhltc, oleg, jes Updated patch adding a variant of RCU that permits sleeping in read-side critical sections. SRCU is as follows: o Each use of SRCU creates its own srcu_struct, and each srcu_struct has its own set of grace periods. This is critical, as it prevents one subsystem with a blocking reader from holding up SRCU grace periods for other subsystems. o The SRCU primitives (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) all take a pointer to a srcu_struct. o The SRCU primitives must be called from process context. o srcu_read_lock() returns an int that must be passed to the matching srcu_read_unlock(). Realtime RCU avoids the need for this by storing the state in the task struct, but SRCU needs to allow a given code path to pass through multiple SRCU domains -- storing state in the task struct would therefore require either arbitrary space in the task struct or arbitrary limits on SRCU nesting. So I kicked the state-storage problem up to the caller. Of course, it is not permitted to call synchronize_srcu() while in an SRCU read-side critical section. o There is no call_srcu(). It would not be hard to implement one, but it seems like too easy a way to OOM the system. (Hey, we have enough trouble with call_rcu(), which does -not- permit readers to sleep!!!) So, if you want it, please tell me why... Signed-off-by: Paul E. 
McKenney <paulmck@us.ibm.com> --- Documentation/RCU/checklist.txt | 38 ++++++ Documentation/RCU/rcu.txt | 3 Documentation/RCU/whatisRCU.txt | 3 include/linux/srcu.h | 49 ++++++++ kernel/Makefile | 2 kernel/srcu.c | 238 ++++++++++++++++++++++++++++++++++++++++ 6 files changed, 331 insertions(+), 2 deletions(-) diff -urpNa -X dontdiff linux-2.6.17-torturercu_bh/Documentation/RCU/checklist.txt linux-2.6.17-srcu/Documentation/RCU/checklist.txt --- linux-2.6.17-torturercu_bh/Documentation/RCU/checklist.txt 2006-06-17 18:49:35.000000000 -0700 +++ linux-2.6.17-srcu/Documentation/RCU/checklist.txt 2006-06-30 17:07:02.000000000 -0700 @@ -183,3 +183,41 @@ over a rather long period of time, but i disable irq on a given acquisition of that lock will result in deadlock as soon as the RCU callback happens to interrupt that acquisition's critical section. + +13. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) + may only be invoked from process context. Unlike other forms of + RCU, it -is- permissible to block in an SRCU read-side critical + section (demarked by srcu_read_lock() and srcu_read_unlock()), + hence the "SRCU": "sleepable RCU". Please note that if you + don't need to sleep in read-side critical sections, you should + be using RCU rather than SRCU, because RCU is almost always + faster and easier to use than is SRCU. + + Also unlike other forms of RCU, explicit initialization + and cleanup is required via init_srcu_struct() and + cleanup_srcu_struct(). These are passed a "struct srcu_struct" + that defines the scope of a given SRCU domain. Once initialized, + the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock() + and synchronize_srcu(). A given synchronize_srcu() waits only + for SRCU read-side critical sections governed by srcu_read_lock() + and srcu_read_unlock() calls that have been passed the same + srcu_struct.
This property is what makes sleeping read-side + critical sections tolerable -- a given subsystem delays only + its own updates, not those of other subsystems using SRCU. + Therefore, SRCU is less prone to OOM the system than RCU would + be if RCU's read-side critical sections were permitted to + sleep. + + The ability to sleep in read-side critical sections does not + come for free. First, corresponding srcu_read_lock() and + srcu_read_unlock() calls must be passed the same srcu_struct. + Second, grace-period-detection overhead is amortized only + over those updates sharing a given srcu_struct, rather than + being globally amortized as they are for other forms of RCU. + Therefore, SRCU should be used in preference to rw_semaphore + only in extremely read-intensive situations, or in situations + requiring SRCU's read-side deadlock immunity or low read-side + realtime latency. + + Note that, rcu_assign_pointer() and rcu_dereference() relate to + SRCU just as they do to other forms of RCU. diff -urpNa -X dontdiff linux-2.6.17-torturercu_bh/Documentation/RCU/rcu.txt linux-2.6.17-srcu/Documentation/RCU/rcu.txt --- linux-2.6.17-torturercu_bh/Documentation/RCU/rcu.txt 2006-06-17 18:49:35.000000000 -0700 +++ linux-2.6.17-srcu/Documentation/RCU/rcu.txt 2006-06-24 08:04:07.000000000 -0700 @@ -45,7 +45,8 @@ o How can I see where RCU is currently u Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu", "rcu_read_lock_bh", "rcu_read_unlock_bh", "call_rcu_bh", - "synchronize_rcu", and "synchronize_net". + "srcu_read_lock", "srcu_read_unlock", "synchronize_rcu", + "synchronize_net", and "synchronize_srcu". o What guidelines should I follow when writing code that uses RCU? 
diff -urpNa -X dontdiff linux-2.6.17-torturercu_bh/Documentation/RCU/whatisRCU.txt linux-2.6.17-srcu/Documentation/RCU/whatisRCU.txt
--- linux-2.6.17-torturercu_bh/Documentation/RCU/whatisRCU.txt	2006-06-17 18:49:35.000000000 -0700
+++ linux-2.6.17-srcu/Documentation/RCU/whatisRCU.txt	2006-06-24 08:04:07.000000000 -0700
@@ -767,6 +767,8 @@ Markers for RCU read-side critical secti
 	rcu_read_unlock
 	rcu_read_lock_bh
 	rcu_read_unlock_bh
+	srcu_read_lock
+	srcu_read_unlock
 
 RCU pointer/list traversal:
@@ -794,6 +796,7 @@ RCU grace period:
 	synchronize_net
 	synchronize_sched
 	synchronize_rcu
+	synchronize_srcu
 
 	call_rcu
 	call_rcu_bh
diff -urpNa -X dontdiff linux-2.6.17-torturercu_bh/include/linux/srcu.h linux-2.6.17-srcu/include/linux/srcu.h
--- linux-2.6.17-torturercu_bh/include/linux/srcu.h	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.17-srcu/include/linux/srcu.h	2006-07-02 07:27:32.000000000 -0700
@@ -0,0 +1,49 @@
+/*
+ * Sleepable Read-Copy Update mechanism for mutual exclusion
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ *
+ * Author: Paul McKenney <paulmck@us.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ *	Documentation/RCU/ *.txt
+ *
+ */
+
+struct srcu_struct_array {
+	int c[2];
+};
+
+struct srcu_struct {
+	int completed;
+	struct srcu_struct_array *per_cpu_ref;
+	struct mutex mutex;
+};
+
+#ifndef CONFIG_PREEMPT
+#define srcu_barrier() barrier()
+#else /* #ifndef CONFIG_PREEMPT */
+#define srcu_barrier()
+#endif /* #else #ifndef CONFIG_PREEMPT */
+
+void init_srcu_struct(struct srcu_struct *sp);
+void cleanup_srcu_struct(struct srcu_struct *sp);
+int srcu_read_lock(struct srcu_struct *sp);
+void srcu_read_unlock(struct srcu_struct *sp, int idx);
+void synchronize_srcu(struct srcu_struct *sp);
+long srcu_batches_completed(struct srcu_struct *sp);
+void cleanup_srcu_struct(struct srcu_struct *sp);
diff -urpNa -X dontdiff linux-2.6.17-torturercu_bh/kernel/Makefile linux-2.6.17-srcu/kernel/Makefile
--- linux-2.6.17-torturercu_bh/kernel/Makefile	2006-06-17 18:49:35.000000000 -0700
+++ linux-2.6.17-srcu/kernel/Makefile	2006-06-24 08:04:07.000000000 -0700
@@ -8,7 +8,7 @@
 obj-y     = sched.o fork.o exec_domain.o signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-	    hrtimer.o
+	    hrtimer.o srcu.o
 
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
 obj-$(CONFIG_FUTEX) += futex.o
diff -urpNa -X dontdiff linux-2.6.17-torturercu_bh/kernel/srcu.c linux-2.6.17-srcu/kernel/srcu.c
--- linux-2.6.17-torturercu_bh/kernel/srcu.c	1969-12-31 16:00:00.000000000 -0800
+++ linux-2.6.17-srcu/kernel/srcu.c	2006-07-04 18:49:13.000000000 -0700
@@ -0,0 +1,238 @@
+/*
+ * Sleepable Read-Copy Update mechanism for mutual exclusion.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ *
+ * Author: Paul McKenney <paulmck@us.ibm.com>
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ *	Documentation/RCU/ *.txt
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/srcu.h>
+
+/**
+ * init_srcu_struct - initialize a sleep-RCU structure
+ * @sp: structure to initialize.
+ *
+ * Must invoke this on a given srcu_struct before passing that srcu_struct
+ * to any other function.  Each srcu_struct represents a separate domain
+ * of SRCU protection.
+ */
+void init_srcu_struct(struct srcu_struct *sp)
+{
+	sp->completed = 0;
+	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+	mutex_init(&sp->mutex);
+}
+
+/*
+ * srcu_readers_active_idx -- returns approximate number of readers
+ *	active on the specified rank of per-CPU counters.
+ */
+
+static int srcu_readers_active_idx(struct srcu_struct *sp, int idx)
+{
+	int cpu;
+	int sum;
+
+	sum = 0;
+	for_each_possible_cpu(cpu)
+		sum += per_cpu_ptr(sp->per_cpu_ref, cpu)->c[idx];
+	return (sum);
+}
+
+/**
+ * srcu_readers_active - returns approximate number of readers.
+ * @sp: which srcu_struct to count active readers (holding srcu_read_lock).
+ *
+ * Note that this is not an atomic primitive, and can therefore suffer
+ * severe errors when invoked on an active srcu_struct.  That said, it
+ * can be useful as an error check at cleanup time.
+ */
+int srcu_readers_active(struct srcu_struct *sp)
+{
+	return srcu_readers_active_idx(sp, 0) + srcu_readers_active_idx(sp, 1);
+}
+
+/**
+ * cleanup_srcu_struct - deconstruct a sleep-RCU structure
+ * @sp: structure to clean up.
+ *
+ * Must invoke this after you are finished using a given srcu_struct that
+ * was initialized via init_srcu_struct(), else you leak memory.
+ */
+void cleanup_srcu_struct(struct srcu_struct *sp)
+{
+	int sum;
+
+	sum = srcu_readers_active(sp);
+	WARN_ON(sum);  /* Leakage unless caller handles error. */
+	if (sum != 0)
+		return;
+	free_percpu(sp->per_cpu_ref);
+	sp->per_cpu_ref = NULL;
+}
+
+/**
+ * srcu_read_lock - register a new reader for an SRCU-protected structure.
+ * @sp: srcu_struct in which to register the new reader.
+ *
+ * Counts the new reader in the appropriate per-CPU element of the
+ * srcu_struct.  Must be called from process context.
+ * Returns an index that must be passed to the matching srcu_read_unlock().
+ */
+int srcu_read_lock(struct srcu_struct *sp)
+{
+	int idx;
+
+	preempt_disable();
+	idx = sp->completed & 0x1;
+	barrier();  /* ensure compiler looks -once- at sp->completed. */
+	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]++;
+	srcu_barrier();  /* ensure compiler won't misorder critical section. */
+	preempt_enable();
+	return idx;
+}
+
+/**
+ * srcu_read_unlock - unregister an old reader from an SRCU-protected structure.
+ * @sp: srcu_struct in which to unregister the old reader.
+ * @idx: return value from corresponding srcu_read_lock().
+ *
+ * Removes the count for the old reader from the appropriate per-CPU
+ * element of the srcu_struct.  Note that this may well be a different
+ * CPU than that which was incremented by the corresponding srcu_read_lock().
+ * Must be called from process context.
+ */
+void srcu_read_unlock(struct srcu_struct *sp, int idx)
+{
+	preempt_disable();
+	srcu_barrier();  /* ensure compiler won't misorder critical section. */
+	per_cpu_ptr(sp->per_cpu_ref, smp_processor_id())->c[idx]--;
+	preempt_enable();
+}
+
+/**
+ * synchronize_srcu - wait for prior SRCU read-side critical-section completion
+ * @sp: srcu_struct with which to synchronize.
+ *
+ * Flip the completed counter, and wait for the old count to drain to zero.
+ * As with classic RCU, the updater must use some separate means of
+ * synchronizing concurrent updates.  Can block; must be called from
+ * process context.
+ */
+void synchronize_srcu(struct srcu_struct *sp)
+{
+	int idx;
+	int sum;
+
+	idx = sp->completed;
+	mutex_lock(&sp->mutex);
+
+	/*
+	 * Check to see if someone else did the work for us while we were
+	 * waiting to acquire the lock.  We need -two- advances of
+	 * the counter, not just one.  If there was but one, we might have
+	 * shown up -after- our helper's first synchronize_sched(), thus
+	 * having failed to prevent CPU-reordering races with concurrent
+	 * srcu_read_unlock()s on other CPUs (see comment below).  So we
+	 * either (1) wait for two or (2) supply the second ourselves.
+	 */
+
+	if ((sp->completed - idx) >= 2) {
+		mutex_unlock(&sp->mutex);
+		return;
+	}
+
+	synchronize_sched();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * The preceding synchronize_sched() ensures that any CPU that
+	 * sees the new value of sp->completed will also see any preceding
+	 * changes to data structures made by this CPU.  This prevents
+	 * some other CPU from reordering the accesses in its SRCU
+	 * read-side critical section to precede the corresponding
+	 * srcu_read_lock() -- ensuring that such references will in
+	 * fact be protected.
+	 *
+	 * So it is now safe to do the flip.
+	 */
+
+	idx = sp->completed & 0x1;
+	sp->completed++;
+
+	synchronize_sched();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * At this point, because of the preceding synchronize_sched(),
+	 * all srcu_read_lock() calls using the old counters have completed.
+	 * Their corresponding critical sections might well be still
+	 * executing, but the srcu_read_lock() primitives themselves
+	 * will have finished executing.
+	 */
+
+	for (;;) {
+		sum = srcu_readers_active_idx(sp, idx);
+		if (sum == 0)
+			break;
+		schedule_timeout_interruptible(1);
+	}
+
+	synchronize_sched();  /* Force memory barrier on all CPUs. */
+
+	/*
+	 * The preceding synchronize_sched() forces all srcu_read_unlock()
+	 * primitives that were executing concurrently with the preceding
+	 * for_each_possible_cpu() loop to have completed by this point.
+	 * More importantly, it also forces the corresponding SRCU read-side
+	 * critical sections to have also completed, and the corresponding
+	 * references to SRCU-protected data items to be dropped.
+	 */
+
+	mutex_unlock(&sp->mutex);
+}
+
+/**
+ * srcu_batches_completed - return batches completed.
+ * @sp: srcu_struct on which to report batch completion.
+ *
+ * Report the number of batches, correlated with, but not necessarily
+ * precisely the same as, the number of grace periods that have elapsed.
+ */
+
+long srcu_batches_completed(struct srcu_struct *sp)
+{
+	return sp->completed;
+}
+
+EXPORT_SYMBOL_GPL(init_srcu_struct);
+EXPORT_SYMBOL_GPL(cleanup_srcu_struct);
+EXPORT_SYMBOL_GPL(srcu_read_lock);
+EXPORT_SYMBOL_GPL(srcu_read_unlock);
+EXPORT_SYMBOL_GPL(synchronize_srcu);
+EXPORT_SYMBOL_GPL(srcu_batches_completed);
+EXPORT_SYMBOL_GPL(srcu_readers_active);

^ permalink raw reply	[flat|nested] 18+ messages in thread
[parent not found: <20060709235029.GA194@oleg>]
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking
  [not found] ` <20060709235029.GA194@oleg>
@ 2006-07-10 16:51   ` Paul E. McKenney
  [not found]     ` <44B29212.1070301@yahoo.com.au>
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2006-07-10 16:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, akpm, matthltc, dipankar, stern, mingo, tytso,
	dvhltc, jes, dhowells

On Mon, Jul 10, 2006 at 03:50:29AM +0400, Oleg Nesterov wrote:
> On 07/06, Paul E. McKenney wrote:
> >
> > Updated patch adding a variant of RCU that permits sleeping in read-side
> > critical sections.
>
> I do not see any problems with this patch, but I have a couple of
> questions, so your help is needed again.

Thank you for looking it over!

> > +void synchronize_srcu(struct srcu_struct *sp)
> > +{
> > +	[... snip ...]
> > +
> > +	synchronize_sched();  /* Force memory barrier on all CPUs. */
> > +
> > +	/*
> > +	 * The preceding synchronize_sched() forces all srcu_read_unlock()
> > +	 * primitives that were executing concurrently with the preceding
> > +	 * for_each_possible_cpu() loop to have completed by this point.
> > +	 * More importantly, it also forces the corresponding SRCU read-side
> > +	 * critical sections to have also completed, and the corresponding
> > +	 * references to SRCU-protected data items to be dropped.
> > +	 */
> > +
> > +	mutex_unlock(&sp->mutex);
> > +}
>
> Isn't it possible to unlock ->mutex earlier, before the last
> synchronize_sched()?

It seems possible, but I would like to think carefully about this one
first, and, if it still seems plausible, test it heavily.

If I understand your line of reasoning, the thought is that the first
synchronize_sched() at the beginning of synchronize_srcu() ensures that
all of the counter updates pertaining to the last instance of
synchronize_srcu() have been committed.  The same reasoning might well
cover the sp->completed fastpath as well.  In any case, this is a
performance boost off the fastpath.
A good boost, if it works, but I will be much more excited if you find
a way of speeding up srcu_read_lock() or srcu_read_unlock().  ;-)

> Another question: what is the semantics of synchronize_sched() ?
>
> I am not talking about the current implementation, it is very clear.
> The question is: what is the _definition_ of synchronize_sched()
> (which must be valid for "any" RCU implementation) ?
>
> 1) The comment in include/linux/rcupdate.h states that "all preempt_disable
>    code sequences will have completed before this primitive returns".
>
> 2) kernel/srcu.c claims that this primitive "forces memory barrier on all
>    CPUs".  (so the comment in rcupdate.h is not complete).
>
> (I understand this so that each cpu does something which implies mb()
> semantics).
>
> As I see it, 1) + 2) is NOT enough for synchronize_srcu() to be correct
> (the 2-nd and 3-rd synchronize_sched() calls).  I think synchronize_sched()
> should also guarantee the completion of mem ops on all CPUs before return,
> not just mb() (which does not have any timing guaranties).
>
> Could you clarify this issue?
>
> (Again, I do not see any problems with the current RCU implementation).

However, this -does- seem to be a problem with the comment headers and
the documentation.  Does the following patch make things better?

David, would it be worthwhile adding this global-memory-barrier effect
of synchronize_rcu(), synchronize_sched(), and synchronize_srcu() to
Documentation/memory-barriers.txt?

						Thanx, Paul

Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
---

 Documentation/RCU/checklist.txt |    4 ++++
 include/linux/rcupdate.h        |    3 +++
 kernel/rcupdate.c               |    3 +++
 kernel/srcu.c                   |    3 ++-
 4 files changed, 12 insertions(+), 1 deletion(-)

diff -urpNa -X dontdiff linux-2.6.17-srcu-LKML-4/Documentation/RCU/checklist.txt linux-2.6.17-srcu-LKML-5/Documentation/RCU/checklist.txt
--- linux-2.6.17-srcu-LKML-4/Documentation/RCU/checklist.txt	2006-07-06 16:45:01.000000000 -0700
+++ linux-2.6.17-srcu-LKML-5/Documentation/RCU/checklist.txt	2006-07-10 09:43:19.000000000 -0700
@@ -221,3 +221,7 @@ over a rather long period of time, but i
 	Note that, rcu_assign_pointer() and rcu_dereference() relate to
 	SRCU just as they do to other forms of RCU.
+
+14.	The synchronize_rcu(), synchronize_sched(), and synchronize_srcu()
+	primitives force at least one memory barrier to be executed on
+	each active CPU before they return.
diff -urpNa -X dontdiff linux-2.6.17-srcu-LKML-4/include/linux/rcupdate.h linux-2.6.17-srcu-LKML-5/include/linux/rcupdate.h
--- linux-2.6.17-srcu-LKML-4/include/linux/rcupdate.h	2006-06-17 18:49:35.000000000 -0700
+++ linux-2.6.17-srcu-LKML-5/include/linux/rcupdate.h	2006-07-10 09:48:51.000000000 -0700
@@ -251,6 +251,9 @@ extern int rcu_needs_cpu(int cpu);
  * guarantees that rcu_read_lock() sections will have completed.
  * In "classic RCU", these two guarantees happen to be one and
  * the same, but can differ in realtime RCU implementations.
+ *
+ * In addition, this primitive guarantees that every active CPU has
+ * executed at least one memory barrier before it returns.
  */
 #define synchronize_sched() synchronize_rcu()
diff -urpNa -X dontdiff linux-2.6.17-srcu-LKML-4/kernel/rcupdate.c linux-2.6.17-srcu-LKML-5/kernel/rcupdate.c
--- linux-2.6.17-srcu-LKML-4/kernel/rcupdate.c	2006-06-17 18:49:35.000000000 -0700
+++ linux-2.6.17-srcu-LKML-5/kernel/rcupdate.c	2006-07-10 09:48:32.000000000 -0700
@@ -597,6 +597,9 @@ static void wakeme_after_rcu(struct rcu_
  * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
  * and may be nested.
  *
+ * This primitive also causes each active CPU to execute at least one
+ * memory barrier before it returns.
+ *
  * If your read-side code is not protected by rcu_read_lock(), do -not-
  * use synchronize_rcu().
  */
diff -urpNa -X dontdiff linux-2.6.17-srcu-LKML-4/kernel/srcu.c linux-2.6.17-srcu-LKML-5/kernel/srcu.c
--- linux-2.6.17-srcu-LKML-4/kernel/srcu.c	2006-07-06 16:50:23.000000000 -0700
+++ linux-2.6.17-srcu-LKML-5/kernel/srcu.c	2006-07-10 09:48:09.000000000 -0700
@@ -143,7 +143,8 @@ void srcu_read_unlock(struct srcu_struct
  * Flip the completed counter, and wait for the old count to drain to zero.
  * As with classic RCU, the updater must use some separate means of
  * synchronizing concurrent updates.  Can block; must be called from
- * process context.
+ * process context.  Has the side-effect of forcing a memory barrier on
+ * each active CPU before returning.
  *
  * Note that it is illegal to call synchronize_srcu() from the corresponding
  * SRCU read-side critical section; doing so will result in deadlock.

^ permalink raw reply	[flat|nested] 18+ messages in thread
[parent not found: <44B29212.1070301@yahoo.com.au>]
* Re: [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking
  [not found]     ` <44B29212.1070301@yahoo.com.au>
@ 2006-07-11 14:19       ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2006-07-11 14:19 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Oleg Nesterov, linux-kernel, akpm, matthltc, dipankar, stern,
	mingo, tytso, dvhltc, jes, dhowells

On Tue, Jul 11, 2006 at 03:44:50AM +1000, Nick Piggin wrote:
> Paul E. McKenney wrote:
> >On Mon, Jul 10, 2006 at 03:50:29AM +0400, Oleg Nesterov wrote:
>
> >>As I see it, 1) + 2) is NOT enough for synchronize_srcu() to be correct
> >>(the 2-nd and 3-rd synchronize_sched() calls).  I think synchronize_sched()
> >>should also guarantee the completion of mem ops on all CPUs before return,
> >>not just mb() (which does not have any timing guaranties).
> >>
> >>Could you clarify this issue?
> >>
> >>(Again, I do not see any problems with the current RCU implementation).
> >
> >However, this -does- seem to be a problem with the comment headers
> >and the documentation.  Does the following patch make things better?
> >
> >David, would it be worthwhile adding this global-memory-barrier effect
> >of synchronize_rcu(), synchronize_sched(), and synchronize_srcu() to
>
> And call_rcu? (or is that already tucked away in the documentation
> somewhere?) ie. there is a memory barrier between the call_rcu() call
> and the actual callback.
>
> This is something I needed clarification with (as you might remember),
> which might not be clear from an RCU API user's point of view.

Good point -- since synchronize_rcu() is just a wrapper around
call_rcu(), they do have the same properties.  How about the following?

						Thanx, Paul

Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>

 Documentation/RCU/checklist.txt |    4 +++-
 include/linux/srcu.h            |    5 +++++
 kernel/rcupdate.c               |    4 ++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff -urpNa -X dontdiff linux-2.6.17-srcu-LKML-5/Documentation/RCU/checklist.txt linux-2.6.17-srcu-LKML-6/Documentation/RCU/checklist.txt
--- linux-2.6.17-srcu-LKML-5/Documentation/RCU/checklist.txt	2006-07-10 09:43:19.000000000 -0700
+++ linux-2.6.17-srcu-LKML-6/Documentation/RCU/checklist.txt	2006-07-11 07:12:25.000000000 -0700
@@ -224,4 +224,6 @@ over a rather long period of time, but i
 
 14.	The synchronize_rcu(), synchronize_sched(), and synchronize_srcu()
 	primitives force at least one memory barrier to be executed on
-	each active CPU before they return.
+	each active CPU before they return.  Similarly, call_rcu()
+	forces at least one memory barrier to be executed on each active
+	CPU before the corresponding callback is invoked.
diff -urpNa -X dontdiff linux-2.6.17-srcu-LKML-5/kernel/rcupdate.c linux-2.6.17-srcu-LKML-6/kernel/rcupdate.c
--- linux-2.6.17-srcu-LKML-5/kernel/rcupdate.c	2006-07-10 09:48:32.000000000 -0700
+++ linux-2.6.17-srcu-LKML-6/kernel/rcupdate.c	2006-07-11 07:11:07.000000000 -0700
@@ -116,6 +116,10 @@ static inline void force_quiescent_state
  * read-side critical sections have completed.  RCU read-side critical
  * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
  * and may be nested.
+ *
+ * There will be at least one memory barrier executed on each active
+ * CPU between the time call_rcu() is invoked and the time that the
+ * corresponding callback is invoked.
  */
 void fastcall call_rcu(struct rcu_head *head,
 				void (*func)(struct rcu_head *rcu))

^ permalink raw reply	[flat|nested] 18+ messages in thread
end of thread, other threads:[~2006-07-12 0:57 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <Pine.LNX.4.44L0.0607061603320.5768-100000@iolanthe.rowland.org>
[not found] ` <1152226204.21787.2093.camel@stark>
2006-07-06 23:39 ` [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking Paul E. McKenney
[not found] ` <Pine.LNX.4.44L0.0607071051430.17135-100000@iolanthe.rowland.org>
2006-07-07 16:33 ` Paul E. McKenney
[not found] ` <Pine.LNX.4.44L0.0607071345270.6793-100000@iolanthe.rowland.org>
2006-07-07 18:59 ` Paul E. McKenney
2006-07-07 19:59 ` Alan Stern
2006-07-07 21:11 ` Matt Helsley
2006-07-07 21:47 ` Paul E. McKenney
2006-07-10 19:11 ` SRCU-based notifier chains Alan Stern
2006-07-11 17:39 ` Paul E. McKenney
2006-07-11 18:03 ` Alan Stern
2006-07-11 18:22 ` Paul E. McKenney
2006-07-11 18:18 ` [PATCH] Add " Alan Stern
2006-07-11 18:30 ` Paul E. McKenney
2006-07-12 0:56 ` Chandra Seetharaman
[not found] <20060711172530.GA93@oleg>
2006-07-11 14:56 ` [PATCH 1/2] srcu-3: RCU variant permitting read-side blocking Alan Stern
2006-07-11 18:21 ` Paul E. McKenney
2006-07-06 17:14 [PATCH 0/2] srcu-3: add RCU variant that permits " Paul E. McKenney
2006-07-06 17:20 ` [PATCH 1/2] srcu-3: RCU variant permitting " Paul E. McKenney
[not found] ` <20060709235029.GA194@oleg>
2006-07-10 16:51 ` Paul E. McKenney
[not found] ` <44B29212.1070301@yahoo.com.au>
2006-07-11 14:19 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox