* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask
@ 2005-12-08 19:31 Oleg Nesterov
2005-12-09 2:46 ` Srivatsa Vaddagiri
2005-12-09 2:56 ` Srivatsa Vaddagiri
0 siblings, 2 replies; 33+ messages in thread
From: Oleg Nesterov @ 2005-12-08 19:31 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: linux-kernel, Dipankar Sarma, Paul E. McKenney, Andrew Morton
Srivatsa Vaddagiri wrote:
>
> Accessing nohz_cpu_mask before incrementing rcp->cur is racy. It can
> cause tickless idle CPUs to be included in rsp->cpumask, which will
> extend grace periods unnecessarily.
> ...
> @@ -244,15 +244,15 @@ static void rcu_start_batch(struct rcu_c
>
> if (rcp->next_pending &&
> rcp->completed == rcp->cur) {
> - /* Can't change, since spin lock held. */
> - cpus_andnot(rsp->cpumask, cpu_online_map, nohz_cpu_mask);
> -
> rcp->next_pending = 0;
> /* next_pending == 0 must be visible in __rcu_process_callbacks()
> * before it can see new value of cur.
> */
> smp_wmb();
> rcp->cur++;
> + smp_mb();
> + cpus_andnot(rsp->cpumask, cpu_online_map, nohz_cpu_mask);
> +
Could you explain why this patch makes any difference?
grep shows that the only users of nohz_cpu_mask are in
arch/s390/kernel/time.c: start_hz_timer() and stop_hz_timer().
I can't see how this change can prevent idle cpus from being included in
->cpumask; a cpu can add itself to nohz_cpu_mask right after some other
cpu has started a new grace period.
I think a cpu should call cpu_quiet() after adding itself to the nohz mask
to eliminate this race.
No?
Oleg.
^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask
From: Srivatsa Vaddagiri @ 2005-12-09  2:46 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, Dipankar Sarma, Paul E. McKenney, Andrew Morton

On Thu, Dec 08, 2005 at 10:31:06PM +0300, Oleg Nesterov wrote:
> I can't see how this change can prevent idle cpus from being included in
> ->cpumask; a cpu can add itself to nohz_cpu_mask right after some other
> cpu has started a new grace period.

Yes, that can happen, but if they check for rcu_pending right after that,
it will at least prevent them from going tickless (which will prevent
grace periods from being unnecessarily extended). Something like below:

	CPU0				CPU1

	rcp->cur++; /* New GP */

	smp_mb();

	rsp->cpumask = 0x3
					cpu_set(1, nohz_cpu_mask);

					rcu_pending()? - Yes,
					cpu_clear(1, nohz_cpu_mask);
					Abort going tickless

Ideally we would have needed a smp_mb() in CPU1 also, between setting CPU1
in nohz_cpu_mask and checking for rcu_pending(), but I guess it is not
needed on s390 because of its strong ordering?

--
Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask
From: Oleg Nesterov @ 2005-12-09 19:17 UTC (permalink / raw)
To: vatsa; +Cc: linux-kernel, Dipankar Sarma, Paul E. McKenney, Andrew Morton

Srivatsa Vaddagiri wrote:
>
> On Thu, Dec 08, 2005 at 10:31:06PM +0300, Oleg Nesterov wrote:
> > I can't see how this change can prevent idle cpus from being included in
> > ->cpumask; a cpu can add itself to nohz_cpu_mask right after some other
> > cpu has started a new grace period.
>
> Yes, that can happen, but if they check for rcu_pending right after that,
> it will at least prevent them from going tickless (which will prevent
> grace periods from being unnecessarily extended). Something like below:
>
> 	CPU0				CPU1
>
> 	rcp->cur++; /* New GP */
>
> 	smp_mb();

I think I need some education on memory barriers.

Does this mb() guarantee that the new value of ->cur will be visible
on other cpus immediately after smp_mb() (so that rcu_pending() will
notice it)?

My understanding is that it only guarantees that all stores before it
must be visible before any store after the mb. (Yes, mb implies rmb, but
I think it does not matter if CPU1 adds itself to the nohz mask after
CPU0 reads nohz_cpu_mask.) This means that CPU1 can read a stale
value of ->cur. I guess I am wrong, but I can't prove it to myself.

Could you please clarify this?

Even simpler question:

	CPU0
	var = 1;
	wmb();

after that CPU1 does rmb().

Does it guarantee that CPU1 will see the new value of var?

> Ideally we would have needed a smp_mb() in CPU1 also, between setting CPU1
> in nohz_cpu_mask and checking for rcu_pending(), but I guess it is not
> needed on s390 because of its strong ordering?

I don't know, but please note that s390's definition of
smp_mb__after_atomic_inc() is not a nop.

Oleg.
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask
From: Srivatsa Vaddagiri @ 2005-12-10 15:19 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, Dipankar Sarma, Paul E. McKenney, Andrew Morton

On Fri, Dec 09, 2005 at 10:17:38PM +0300, Oleg Nesterov wrote:
> I think I need some education on memory barriers.
>
> Does this mb() guarantee that the new value of ->cur will be visible
> on other cpus immediately after smp_mb() (so that rcu_pending() will
> notice it)?

AFAIK the new value of ->cur should be visible to other CPUs when smp_mb()
returns. Here's a definition of smp_mb() from Paul's article:

  smp_mb(): "memory barrier" that orders both loads and stores. This means
  loads and stores preceding the memory barrier are committed to memory
  before any loads and stores following the memory barrier.

[ http://www.linuxjournal.com/article/8211 ]

> Even simpler question:
>
> 	CPU0
> 	var = 1;
> 	wmb();
>
> after that CPU1 does rmb().
>
> Does it guarantee that CPU1 will see the new value of var?

Again, going by the above article, I would expect CPU1 to see the new
value of var.
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask
From: Oleg Nesterov @ 2005-12-10 18:55 UTC (permalink / raw)
To: vatsa; +Cc: linux-kernel, Dipankar Sarma, Paul E. McKenney, Andrew Morton

Srivatsa Vaddagiri wrote:
>
> AFAIK the new value of ->cur should be visible to other CPUs when smp_mb()
> returns. Here's a definition of smp_mb() from Paul's article:
>
>   smp_mb(): "memory barrier" that orders both loads and stores. This means
>   loads and stores preceding the memory barrier are committed to memory
>   before any loads and stores following the memory barrier.

Thanks, but this definition talks about ordering; it does not say
anything about the time when stores are really committed to memory
(and does it also mean that cache lines are invalidated on other
cpus?).

> [ http://www.linuxjournal.com/article/8211 ]

And thanks for this link, now I have some understanding of
read_barrier_depends() ...

Oleg.
* Semantics of smp_mb() [was: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask]
From: Srivatsa Vaddagiri @ 2005-12-11 17:41 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, Dipankar Sarma, Paul E. McKenney, Andrew Morton,
	Ingo Molnar, Rusty Russell

[Changed the subject line to be more generic in the interest of a wider
audience.]

We seem to be having some confusion over the exact semantics of smp_mb().

Specifically, are all stores preceding smp_mb() guaranteed to have finished
(committed to memory / corresponding cache lines on other CPUs invalidated)
*before* successive loads are issued?

From the IA-32 manual (download.intel.com/design/Pentium4/manuals/25366817.pdf,
page 271):

  "Memory mapped devices and other I/O devices on the bus are often
  sensitive to the order of writes to their I/O buffers. I/O instructions
  (the IN and OUT instructions) can be used to impose strong write ordering
  on such accesses as follows. Prior to executing an I/O instruction, the
  processor *waits for all previous instructions in the program to complete
  and for all buffered writes to drain to memory*. Only instruction fetches
  and page table walks can pass I/O instructions. Execution of subsequent
  instructions does not begin until the processor determines that the I/O
  instruction has been completed.

  Synchronization mechanisms in multiple-processor systems may depend upon
  a strong memory-ordering model. Here, a program can use a locking
  instruction such as the XCHG instruction or the LOCK prefix to insure
  that a read-modify-write operation on memory is carried out atomically.
  Locking operations typically operate like I/O operations in that they
  *wait for all previous instructions to complete and for all buffered
  writes to drain to memory* (see Section 7.1.2, "Bus Locking").

  Program synchronization can also be carried out with serializing
  instructions (see Section 7.4). These instructions are typically used at
  critical procedure or task boundaries to force completion of all previous
  instructions before a jump to a new section of code or a context switch
  occurs. Like the I/O and locking instructions, the processor *waits until
  all previous instructions have been completed and all buffered writes
  have been drained to memory* before executing the serializing
  instruction."

From this, it looks like we can interpret that an IA-32 processor will wait
for all writes to drain to memory (which implies cache invalidation on
other CPUs?) *before* executing the synchronizing instruction.

The question is: can this be generalized to other CPUs too?

On Sat, Dec 10, 2005 at 09:55:35PM +0300, Oleg Nesterov wrote:
> Thanks, but this definition talks about ordering; it does not say
> anything about the time when stores are really committed to memory
> (and does it also mean that cache lines are invalidated on other
> cpus?).
* Re: Semantics of smp_mb() [was: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask]
From: Andrew James Wade @ 2005-12-11 21:21 UTC (permalink / raw)
To: vatsa
Cc: Oleg Nesterov, linux-kernel, Dipankar Sarma, Paul E. McKenney,
	Andrew Morton, Ingo Molnar, Rusty Russell

On Sunday 11 December 2005 12:41, Srivatsa Vaddagiri wrote:
> We seem to be having some confusion over the exact semantics of smp_mb().
>
> Specifically, are all stores preceding smp_mb() guaranteed to have finished
> (committed to memory / corresponding cache lines on other CPUs invalidated)
> *before* successive loads are issued?

I doubt it. That's definitely not true of smp_wmb(), which boils down to
__asm__ __volatile__ ("": : :"memory") on SMP i386 (which constrains
how the compiler orders write instructions, but is otherwise a nop; i386
has in-order writes).

And it makes sense that wmb() wouldn't wait for writes: RCU needs
constraints on the order in which writes become visible, but has very weak
constraints on when they do. Waiting for writes to flush would hurt
performance.

Andrew Wade
* Re: Semantics of smp_mb() [was: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask]
From: Rusty Russell @ 2005-12-11 23:45 UTC (permalink / raw)
To: ajwade
Cc: vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On Sun, 2005-12-11 at 16:21 -0500, Andrew James Wade wrote:
> I doubt it. That's definitely not true of smp_wmb(), ...
>
> And it makes sense that wmb() wouldn't wait for writes: RCU needs
> constraints on the order in which writes become visible, but has very weak
> constraints on when they do. Waiting for writes to flush would hurt
> performance.

On the contrary. I did some digging and asking and thinking about this
for the Unreliable Guide to Kernel Locking, years ago:

wmb() means all writes preceding it will complete before any writes
following it are started.
rmb() means all reads preceding it will complete before any reads
following it are started.
mb() means all reads and writes preceding it will complete before any
reads and writes following it are started.

This does not map to all the possibilities, nor does it take advantage of
all architectures, but it's generally sufficient without getting insanely
complex.

Hope that clarifies,
Rusty.
--
ccontrol: http://ozlabs.org/~rusty/ccontrol
* Re: Semantics of smp_mb() [was: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask]
From: Keith Owens @ 2005-12-12 0:49 UTC (permalink / raw)
To: Rusty Russell
Cc: ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On Mon, 12 Dec 2005 10:45:16 +1100, Rusty Russell <rusty@rustcorp.com.au> wrote:
>On the contrary. I did some digging and asking and thinking about this
>for the Unreliable Guide to Kernel Locking, years ago:
>
>wmb() means all writes preceding it will complete before any writes
>following it are started.
>rmb() means all reads preceding it will complete before any reads
>following it are started.
>mb() means all reads and writes preceding it will complete before any
>reads and writes following it are started.

FWIW, wmb() on IA64 does not require that preceding stores are flushed
to main memory. It only requires that they be "made visible to other
processors in the coherence domain". "Visible" means that the updated
value must reach (at least) an externally snooped cache. There is no
requirement that the preceding stores be flushed all the way to main
memory; the updates only have to get as far as a cache level that other
cpus can see. The cache snooping takes care of flushing to main memory
when necessary.

IA64 does have a memory fence that stalls the cpu until the data is
"accepted by the external platform". That form of fence is expensive and
is only used for memory mapped I/O, where the data really does have to
reach the memory before the cpu can perform its next operation. For
example, in the mmiowb() case.
* Re: Semantics of smp_mb() [was: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask]
From: Srivatsa Vaddagiri @ 2005-12-12 8:41 UTC (permalink / raw)
To: Keith Owens
Cc: Rusty Russell, ajwade, Oleg Nesterov, linux-kernel, Dipankar Sarma,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On Mon, Dec 12, 2005 at 11:49:07AM +1100, Keith Owens wrote:
> FWIW, wmb() on IA64 does not require that preceding stores are flushed
> to main memory. It only requires that they be "made visible to other
> processors in the coherence domain". "Visible" means that the updated
> value must reach (at least) an externally snooped cache. There is no
> requirement that the preceding stores be flushed all the way to main
> memory; the updates only have to get as far as a cache level that other
> cpus can see. The cache snooping takes care of flushing to main memory
> when necessary.

For the context of the problem that we are dealing with, I think the fact
that writes are made "visible" to other CPUs (before smp_mb() finishes and
before other reads are started) is good enough.

Oleg, with all these inputs, I consider the patch I had sent to be correct.
Let me know if you still have some lingering doubts!

P.S: Thanks to everybody who responded clarifying this subject.
* Re: Semantics of smp_mb() [was: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask]
From: Oleg Nesterov @ 2005-12-12 19:33 UTC (permalink / raw)
To: vatsa
Cc: Keith Owens, Rusty Russell, ajwade, linux-kernel, Dipankar Sarma,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

Srivatsa Vaddagiri wrote:
>
> Oleg, with all these inputs, I consider the patch I had sent to be correct.

Yes, I think so. My suspicions were due to my misunderstanding.

I was wrong when I said that we can ignore the rmb() part of mb() in the
case when start_hz_timer() runs after rcu_start_batch().

	rcp->cur++;
			// ->cur will be visible to other cpus
			// _before_ we *READ* nohz_cpu_mask.
			// We don't have any 'timing' problems.
			// In other words: if another cpu does not
			// see the new value - we did not read this
			// mask yet.
	smp_mb();

Thanks for your patience.

> P.S: Thanks to everybody who responded clarifying this subject.

Yes! It was really helpful, thanks to all. I think it would be great
to have Paul's very clear (and short!) explanation somewhere in
Documentation/.

Oleg.
* Re: Semantics of smp_mb() [was: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask]
From: Paul E. McKenney @ 2005-12-13 5:20 UTC (permalink / raw)
To: Oleg Nesterov
Cc: vatsa, Keith Owens, Rusty Russell, ajwade, linux-kernel,
	Dipankar Sarma, Andrew Morton, Ingo Molnar

On Mon, Dec 12, 2005 at 10:33:22PM +0300, Oleg Nesterov wrote:
> Srivatsa Vaddagiri wrote:
> > P.S: Thanks to everybody who responded clarifying this subject.
>
> Yes! It was really helpful, thanks to all. I think it would be great
> to have Paul's very clear (and short!) explanation somewhere in
> Documentation/.

Glad you liked it! I need to do an RCU documentation update soon anyway,
so will include memory barriers in that update.

						Thanx, Paul
* Re: Semantics of smp_mb() [was: Re: [PATCH] Fix RCU race in access of nohz_cpu_mask]
From: Andrew James Wade @ 2005-12-13 5:07 UTC (permalink / raw)
To: Rusty Russell
Cc: vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma,
	Paul E. McKenney, Andrew Morton, Ingo Molnar

On Sunday 11 December 2005 18:45, Rusty Russell wrote:
> On Sun, 2005-12-11 at 16:21 -0500, Andrew James Wade wrote:
> > I doubt it. That's definitely not true of smp_wmb(), which boils down to
> > __asm__ __volatile__ ("": : :"memory") on SMP i386 (which constrains
> > how the compiler orders write instructions, but is otherwise a nop; i386
> > has in-order writes).

Hrrm, after doing a bit of reading on cache coherency, verifying that the
corresponding cache lines on other CPUs are invalidated can (sometimes)
be done quite quickly, so waiting for that before issuing reads might not
destroy performance. I still doubt that i386s do things this way, but I
don't think it matters (see below).

> > And it makes sense that wmb() wouldn't wait for writes: ...
>
> On the contrary. I did some digging and asking and thinking about this
> for the Unreliable Guide to Kernel Locking, years ago:
>
> wmb() means all writes preceding it will complete before any writes
> following it are started.

What does it mean for a write to start? For that matter, what does it mean
for a write to complete?

I think my focusing on the hardware details (of which I am woefully
ignorant) was a mistake: the hardware can really do whatever it wants so
long as it implements the expected abstract machine model, and it is
ordering that matters in that model. So it makes sense that ordering, not
time, is what the definitions of the memory barriers focus on.

I think that Oleg's question:
> Does this mb() guarantee that the new value of ->cur will be visible
> on other cpus immediately after smp_mb() (so that rcu_pending() will
> notice it)?
is really just about the timeliness with which writes before a smp_mb()
become visible to other CPUs. Does Linux run on architectures in which
writes are not propagated in a timely manner (that is: other CPUs can read
stale values for a "considerable" time)? I rather doubt it.

But with a proviso to my answer: the compiler will happily hoist reads out
of loops. So writes won't necessarily get read in a timely manner.

Andrew
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-13 5:07 ` Andrew James Wade @ 2005-12-13 5:43 ` Paul E. McKenney 2005-12-13 11:20 ` Andi Kleen 1 sibling, 0 replies; 33+ messages in thread From: Paul E. McKenney @ 2005-12-13 5:43 UTC (permalink / raw) To: ajwade Cc: Rusty Russell, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar On Tue, Dec 13, 2005 at 12:07:47AM -0500, Andrew James Wade wrote: > On Sunday 11 December 2005 18:45, Rusty Russell wrote: > > On Sun, 2005-12-11 at 16:21 -0500, Andrew James Wade wrote: > > > On Sunday 11 December 2005 12:41, Srivatsa Vaddagiri wrote: > > > > [Changed the subject line to be more generic in the interest of wider audience] > > > > > > > > We seem to be having some confusion over the exact semantics of smp_mb(). > > > > > > > > Specifically, are all stores preceding smp_mb() guaranteed to have finished > > > > (committed to memory/corresponding cache-lines on other CPUs invalidated) > > > > *before* successive loads are issued? > > > > > > I doubt it. That's definitely not true of smp_wmb(), which boils down to > > > __asm__ __volatile__ ("": : :"memory") on SMP i386 (which the constrains > > > how the compiler orders write instructions, but is otherwise a nop. i386 > > > has in-order writes.). > > Hrrm, after doing a bit of reading on cache-coherency, verifying that the > corresponding cache-lines on other CPUs are invalidated can (sometimes) > be done quite quickly, so waiting for that before issuing reads might not > destroy performance. I still doubt that i386es do thing this way, but I > don't think it matters (see below). Hardware designers need only be concerned with how things -appear- to the software. They can and do use optimizations that get the effect of waiting without actually waiting. For example, an x86 CPU need not -really- keep writes ordered as long as the software cannot tell that they have become misordered. 
This may sound strange, but one important example would be a CPU doing writes that are to variables that it knows do not appear in any other CPU's cache. In this sort of case, the CPU could reorder the writes until such time as it detected a request from another CPU requsting a cache line containing one of the variables. > > > And it makes sense that wmb() wouldn't wait for writes: RCU needs > > > constraints on the order in which writes become visible, but has very week > > > constraints on when they do. Waiting for writes to flush would hurt > > > performance. > > > > On the contrary. I did some digging and asking and thinking about this > > for the Unreliable Guide to Kernel Locking, years ago: > > > > wmb() means all writes preceeding will complete before any writes > > following are started. > > What does it mean for a write to start? For that matter, what does it mean > for a write to complete? Potentially many things in both cases: o Fetching the store instruction that will do the write. o Determining that all instructions whose execution logically precedes the write are free of fatal errors (e.g., page faults, exceptions, traps, ...). o Determining the exact value that is to be written (which might have been computed by prior instructions). o Determining the exact address that is to be written to (which also might have been computed by prior instructions). o Determining that the store instruction itself is free of fatal errors. Note that it might be necessary to compute the address to be sure. o Determining that no hardware interrupt will precede the completion of the store instruction. o Storing the value to one of the CPU's write buffers (which is not yet in the coherence region represented by the CPU's cache). o Starting the process of fetching the cache line into which the value is to be stored. o Starting the process of invalidating the corresponding cache line from all the other CPUs' caches. 
o Receiving the cache line into which the value is to be stored. o Storing the value to the CPU's cache, which involves combining the value to be written with the rest of the cache line. o Responding to the first request for the corresponding cache line from some other CPU. o Completing the process of invalidating the corresponding cache line from all the other CPUs' caches. Note that Alpha's weak memory-barrier semantics allow this step to complete long after the value is stored into the writing CPU's cache. o Storing the value to physical DRAM. Most of these steps can execute in parallel or out of order. And a CPU designer would gleefully point out -lots- of steps I have omitted, some for the sake of brevity, others out of ignorance. You asked!!! > I think my focusing on the hardware details (of which I am woefully > ignorant) was a mistake: the hardware can really do whatever it wants so > long as it implements the expected abstract machine model, and it is > ordering that matters in that model. So it makes sense that ordering, not > time, is what the definitions of the memory barriers focus on. Yes, maintaining one's sanity requires taking a very simplified view of memory ordering, especially since no two CPUs order memory references in exactly the same way. > I think that Oleg's question: > > Does this mb() garantees that the new value of ->cur will be visible > > on other cpus immediately after smp_mb() (so that rcu_pending() will > > notice it) ? > is really just about the timeliness with which writes before a smp_mb() > become visible to other CPUs. Does Linux run on architectures in which > writes are not propagated in a timely manner (that is: other CPUs can read > stale values for a "considerable" time)? I rather doubt it. Remember that smp_mb() is officially about controlling the order in which values are communicated between the CPU executing the smp_mb() and the interconnect, -not- between a pair of CPUs. 
In the general case (which includes Alpha), controlling the order in
which values are communicated from one CPU to another requires that
-both- CPUs execute memory barriers.

> But with a proviso to my answer: the compiler will happily hoist reads out
> of loops. So writes won't necessarily get read in a timely manner.

Very good! And this is in fact why smp_wmb() and friends also contain
directives instructing the compiler not to mess with ordering.

						Thanx, Paul
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-13 5:07 ` Andrew James Wade 2005-12-13 5:43 ` Paul E. McKenney @ 2005-12-13 11:20 ` Andi Kleen 2005-12-13 16:20 ` Paul E. McKenney 1 sibling, 1 reply; 33+ messages in thread From: Andi Kleen @ 2005-12-13 11:20 UTC (permalink / raw) To: ajwade Cc: vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Paul E. McKenney, Andrew Morton, Ingo Molnar

Andrew James Wade <andrew.j.wade@gmail.com> writes:
>
> What does it mean for a write to start? For that matter, what does it mean
> for a write to complete?

wmb() or smp_wmb() really makes no guarantees when a write completes
(e.g. when another CPU or a device sees the new value). It just
guarantees ordering. This means when you do e.g.

	*p = 1;
	wmb();
	*p = 2;
	wmb();

then another CPU will never see 2 before 1 (but possibly it might miss
the "1" state completely). When it actually sees the values (or if it
keeps using whatever value was in *p before the first assignment) is
undefined.

The definition quoted above is wrong I think. Rusty can you perhaps
fix it up?

> I think my focusing on the hardware details (of which I am woefully
> ignorant) was a mistake: the hardware can really do whatever it wants so
> long as it implements the expected abstract machine model, and it is
> ordering that matters in that model. So it makes sense that ordering, not
> time, is what the definitions of the memory barriers focus on.

Exactly.

> > Does this mb() guarantee that the new value of ->cur will be visible
> > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > notice it) ?
> is really just about the timeliness with which writes before a smp_mb()
> become visible to other CPUs. Does Linux run on architectures in which
> writes are not propagated in a timely manner (that is: other CPUs can read
> stale values for a "considerable" time)? I rather doubt it.
>
> But with a proviso to my answer: the compiler will happily hoist reads out
> of loops. So writes won't necessarily get read in a timely manner.

Or just consider interrupts. Any CPU can stop for a very long time at
any time as seen by the other CPUs. Or it might take a machine check
and come back. You can never make any assumptions about when another
CPU sees a write unless you add explicit synchronization like a
spinlock.

-Andi
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-13 11:20 ` Andi Kleen @ 2005-12-13 16:20 ` Paul E. McKenney 2005-12-13 22:27 ` Keith Owens 0 siblings, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2005-12-13 16:20 UTC (permalink / raw) To: Andi Kleen Cc: ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar

On Tue, Dec 13, 2005 at 04:20:13AM -0700, Andi Kleen wrote:
> Andrew James Wade <andrew.j.wade@gmail.com> writes:
> >
> > What does it mean for a write to start? For that matter, what does it mean
> > for a write to complete?
>
> wmb() or smp_wmb() really makes no guarantees when a write
> completes (e.g. when another CPU or a device sees the new value).
> It just guarantees ordering. This means when you do e.g.
>
> 	*p = 1;
> 	wmb();
> 	*p = 2;
> 	wmb();
>
> then another CPU will never see 2 before 1 (but possibly it might
> miss the "1" state completely). When it actually sees the values
> (or if it keeps using whatever value was in *p before the first
> assignment) is undefined.

It is OK on the write side, but the corresponding barriers are required
on the read side (even on non-Alpha CPUs, since the value of the
pointer remains constant):

	Writer				Reader
	------				------
	p = 1;				p1 = p;
	smp_wmb();			smp_rmb();
	p = 2;				p2 = p;

With these memory barriers in place, if p2 is set to 1, then p1 must
also be set to 1. Similarly, if p1 is set to 2, then p2 must also be
set to 2. If the smp_rmb() is not present on the read side, then the
reading CPU (and compiler) are free to interchange the order of the
two assignment statements.

As Andi says, memory barriers do not necessarily make reads or writes
happen faster; instead they only enforce limited ordering constraints.

If the variable p references MMIO rather than normal memory, then
wmb() and rmb() are needed instead of smp_wmb() and smp_rmb(). This
is because the I/O device needs to see the accesses ordered even in a
UP system.
						Thanx, Paul

> The definition quoted above is wrong I think. Rusty can you
> perhaps fix it up?
>
> > I think my focusing on the hardware details (of which I am woefully
> > ignorant) was a mistake: the hardware can really do whatever it wants so
> > long as it implements the expected abstract machine model, and it is
> > ordering that matters in that model. So it makes sense that ordering, not
> > time, is what the definitions of the memory barriers focus on.
>
> Exactly.
>
> > > Does this mb() guarantee that the new value of ->cur will be visible
> > > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > > notice it) ?
> > is really just about the timeliness with which writes before a smp_mb()
> > become visible to other CPUs. Does Linux run on architectures in which
> > writes are not propagated in a timely manner (that is: other CPUs can read
> > stale values for a "considerable" time)? I rather doubt it.
>
> > But with a proviso to my answer: the compiler will happily hoist reads out
> > of loops. So writes won't necessarily get read in a timely manner.
>
> Or just consider interrupts. Any CPU can stop for a very long time
> at any time as seen by the other CPUs. Or it might take a machine check
> and come back. You can never make any assumptions about when another
> CPU sees a write unless you add explicit synchronization like a spinlock.
>
> -Andi

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-13 16:20 ` Paul E. McKenney @ 2005-12-13 22:27 ` Keith Owens 2005-12-13 22:50 ` Paul E. McKenney ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Keith Owens @ 2005-12-13 22:27 UTC (permalink / raw) To: paulmck Cc: Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar On Tue, 13 Dec 2005 08:20:27 -0800, "Paul E. McKenney" <paulmck@us.ibm.com> wrote: >If the variable p references MMIO rather than normal memory, then >wmb() and rmb() are needed instead of smp_wmb() and smp_rmb(). mmiowb(), not wmb(). IA64 has a different form of memory fence for I/O space compared to normal memory. MIPS also has a non-empty form of mmiowb(). >This is because the I/O device needs to see the accesses ordered >even in a UP system. Why does mmiowb() map to empty on most systems, including Alpha? Should it not map to wmb() for everything except IA64 and MIPS? ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-13 22:27 ` Keith Owens @ 2005-12-13 22:50 ` Paul E. McKenney 2005-12-14 1:12 ` Andi Kleen 2005-12-15 21:15 ` Semantics of smp_mb() Roland Dreier 2005-12-16 7:46 ` Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] Jeremy Higdon 2 siblings, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2005-12-13 22:50 UTC (permalink / raw) To: Keith Owens Cc: Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar On Wed, Dec 14, 2005 at 09:27:41AM +1100, Keith Owens wrote: > On Tue, 13 Dec 2005 08:20:27 -0800, > "Paul E. McKenney" <paulmck@us.ibm.com> wrote: > >If the variable p references MMIO rather than normal memory, then > >wmb() and rmb() are needed instead of smp_wmb() and smp_rmb(). > > mmiowb(), not wmb(). IA64 has a different form of memory fence for I/O > space compared to normal memory. MIPS also has a non-empty form of > mmiowb(). New one on me! > >This is because the I/O device needs to see the accesses ordered > >even in a UP system. > > Why does mmiowb() map to empty on most systems, including Alpha? > Should it not map to wmb() for everything except IA64 and MIPS? Sounds like I am not the only one that it is new to... There are over four hundred instances of wmb() still in the drivers tree in 2.6.14. I suspect that most of them are for MMIO fencing -- the ones I looked at quickly certainly seem to be. But, given mmiowb(), what is the point of having wmb()? Why are write memory barriers needed in UP kernels if not for MMIO and other hardware-specific accesses? Is your thought that use of wmb() should be phased out in favor of mmiowb()? (Might be a good idea, but doesn't look like we are there yet.) Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-13 22:50 ` Paul E. McKenney @ 2005-12-14 1:12 ` Andi Kleen 2005-12-14 1:46 ` Paul E. McKenney 0 siblings, 1 reply; 33+ messages in thread From: Andi Kleen @ 2005-12-14 1:12 UTC (permalink / raw) To: Paul E. McKenney Cc: Keith Owens, Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar On Tue, Dec 13, 2005 at 02:50:59PM -0800, Paul E. McKenney wrote: > On Wed, Dec 14, 2005 at 09:27:41AM +1100, Keith Owens wrote: > > On Tue, 13 Dec 2005 08:20:27 -0800, > > "Paul E. McKenney" <paulmck@us.ibm.com> wrote: > > >If the variable p references MMIO rather than normal memory, then > > >wmb() and rmb() are needed instead of smp_wmb() and smp_rmb(). > > > > mmiowb(), not wmb(). IA64 has a different form of memory fence for I/O > > space compared to normal memory. MIPS also has a non-empty form of > > mmiowb(). > > New one on me! Didn't it make only a difference on the Altix or something like that? I suppose they added it only on the drivers for devices supported by SGI. -Andi ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-14 1:12 ` Andi Kleen @ 2005-12-14 1:46 ` Paul E. McKenney 0 siblings, 0 replies; 33+ messages in thread From: Paul E. McKenney @ 2005-12-14 1:46 UTC (permalink / raw) To: Andi Kleen Cc: Keith Owens, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar On Wed, Dec 14, 2005 at 02:12:53AM +0100, Andi Kleen wrote: > On Tue, Dec 13, 2005 at 02:50:59PM -0800, Paul E. McKenney wrote: > > On Wed, Dec 14, 2005 at 09:27:41AM +1100, Keith Owens wrote: > > > On Tue, 13 Dec 2005 08:20:27 -0800, > > > "Paul E. McKenney" <paulmck@us.ibm.com> wrote: > > > >If the variable p references MMIO rather than normal memory, then > > > >wmb() and rmb() are needed instead of smp_wmb() and smp_rmb(). > > > > > > mmiowb(), not wmb(). IA64 has a different form of memory fence for I/O > > > space compared to normal memory. MIPS also has a non-empty form of > > > mmiowb(). > > > > New one on me! > > Didn't it make only a difference on the Altix or something like that? > I suppose they added it only on the drivers for devices supported by SGI. It could potentially help on a few other CPUs, but quite a few driver changes would be needed to really bring out the full benefits. I am concerned that the current state leaves a number of CPU families broken -- the empty definitions cannot be good for other weakly ordered CPUs! Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() 2005-12-13 22:27 ` Keith Owens 2005-12-13 22:50 ` Paul E. McKenney @ 2005-12-15 21:15 ` Roland Dreier 2005-12-16 7:46 ` Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] Jeremy Higdon 2 siblings, 0 replies; 33+ messages in thread From: Roland Dreier @ 2005-12-15 21:15 UTC (permalink / raw) To: Keith Owens Cc: paulmck, Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar

    Keith> Why does mmiowb() map to empty on most systems, including
    Keith> Alpha? Should it not map to wmb() for everything except
    Keith> IA64 and MIPS?

I think the intention (as spelled out in
Documentation/DocBook/deviceiobook.tmpl) is that mmiowb() must be used
in conjunction with spinlocks or some other SMP synchronization
mechanism. The locks themselves are sufficient ordering on the archs
where mmiowb() is a NOP.

One way of thinking about this is that the usual barrier operations
like wmb() affect the order of a single CPU's operations -- that is,
wmb() is saying that all writes from the current thread of execution
before the wmb() become visible before any of the writes from the
current thread after the wmb(). But wmb() doesn't say anything about
how one CPU's writes are ordered against anything a different CPU does.

mmiowb() is something else -- it controls the visibility of writes
from different CPUs. It says that all writes before the mmiowb()
become visible before any writes from any CPU after the mmiowb().
However, this isn't sensible unless we can order the writes between
CPUs, and that only makes sense if there's a lock that lets us say
that one CPU is executing something after the mmiowb().

 - R.
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-13 22:27 ` Keith Owens 2005-12-13 22:50 ` Paul E. McKenney 2005-12-15 21:15 ` Semantics of smp_mb() Roland Dreier @ 2005-12-16 7:46 ` Jeremy Higdon 2006-03-13 18:39 ` Paul E. McKenney 2 siblings, 1 reply; 33+ messages in thread From: Jeremy Higdon @ 2005-12-16 7:46 UTC (permalink / raw) To: Keith Owens Cc: paulmck, Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar Roland Dreier got this right. The purpose of the mmiowb is to ensure that writes to I/O devices while holding a spinlock are ordered with respect to writes issued after the original processor releases and a second processor acquires said spinlock. An MMIO read would be sufficient, but is much heavier weight. On the SGI MIPS-based systems, the "sync" instruction was used. On the Altix systems, a register on the hub chip is read. From comments by jejb, we're looking at modifying the mmiowb API by adding an argument which would be a register to read from if the architecture in question needs ordering in this way but does not have a lighter weight mechanism like the Altix mmiowb. Since there will now need to be a width indication, mmiowb will be replaced with mmiowb[bwlq]. jeremy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2005-12-16 7:46 ` Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] Jeremy Higdon @ 2006-03-13 18:39 ` Paul E. McKenney 2006-03-31 4:56 ` Jeremy Higdon 0 siblings, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2006-03-13 18:39 UTC (permalink / raw) To: Jeremy Higdon Cc: Keith Owens, Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar On Thu, Dec 15, 2005 at 11:46:26PM -0800, Jeremy Higdon wrote: > Roland Dreier got this right. The purpose of the mmiowb is > to ensure that writes to I/O devices while holding a spinlock > are ordered with respect to writes issued after the original > processor releases and a second processor acquires said > spinlock. > > A MMIO read would be sufficient, but is much heavier weight. > > On the SGI MIPS-based systems, the "sync" instruction was used. > On the Altix systems, a register on the hub chip is read. > > >From comments by jejb, we're looking at modifying the mmiowb > API by adding an argument which would be a register to read > from if the architecture in question needs ordering in this > way but does not have a lighter weight mechanism like the Altix > mmiowb. Since there will now need to be a width indication, > mmiowb will be replaced with mmiowb[bwlq]. Any progress on this front? I figured that I would wait to update the ordering document until after this change happened, but if it is going to be awhile, I should proceed with the current API. Thoughts? Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2006-03-13 18:39 ` Paul E. McKenney @ 2006-03-31 4:56 ` Jeremy Higdon 2006-03-31 6:18 ` Paul E. McKenney 2006-03-31 23:38 ` Jesse Barnes 0 siblings, 2 replies; 33+ messages in thread From: Jeremy Higdon @ 2006-03-31 4:56 UTC (permalink / raw) To: Paul E. McKenney, bcasavan Cc: Keith Owens, Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar On Mon, Mar 13, 2006 at 10:39:32AM -0800, Paul E. McKenney wrote: > On Thu, Dec 15, 2005 at 11:46:26PM -0800, Jeremy Higdon wrote: > > Roland Dreier got this right. The purpose of the mmiowb is > > to ensure that writes to I/O devices while holding a spinlock > > are ordered with respect to writes issued after the original > > processor releases and a second processor acquires said > > spinlock. > > > > An MMIO read would be sufficient, but is much heavier weight. > > > > On the SGI MIPS-based systems, the "sync" instruction was used. > > On the Altix systems, a register on the hub chip is read. > > > > From comments by jejb, we're looking at modifying the mmiowb > > API by adding an argument which would be a register to read > > from if the architecture in question needs ordering in this > > way but does not have a lighter weight mechanism like the Altix > > mmiowb. Since there will now need to be a width indication, > > mmiowb will be replaced with mmiowb[bwlq]. > > Any progress on this front? I figured that I would wait to update > the ordering document until after this change happened, but if it > is going to be awhile, I should proceed with the current API. > > Thoughts? > > Thanx, Paul Brent Casavant was going to be working on this. I'll CC him so that he can indicate the status. jeremy ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2006-03-31 4:56 ` Jeremy Higdon @ 2006-03-31 6:18 ` Paul E. McKenney 0 siblings, 0 replies; 33+ messages in thread From: Paul E. McKenney @ 2006-03-31 6:18 UTC (permalink / raw) To: Jeremy Higdon Cc: bcasavan, Keith Owens, Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar On Thu, Mar 30, 2006 at 08:56:27PM -0800, Jeremy Higdon wrote: > On Mon, Mar 13, 2006 at 10:39:32AM -0800, Paul E. McKenney wrote: > > On Thu, Dec 15, 2005 at 11:46:26PM -0800, Jeremy Higdon wrote: > > > Roland Dreier got this right. The purpose of the mmiowb is > > > to ensure that writes to I/O devices while holding a spinlock > > > are ordered with respect to writes issued after the original > > > processor releases and a second processor acquires said > > > spinlock. > > > > > > An MMIO read would be sufficient, but is much heavier weight. > > > > > > On the SGI MIPS-based systems, the "sync" instruction was used. > > > On the Altix systems, a register on the hub chip is read. > > > > > > From comments by jejb, we're looking at modifying the mmiowb > > > API by adding an argument which would be a register to read > > > from if the architecture in question needs ordering in this > > > way but does not have a lighter weight mechanism like the Altix > > > mmiowb. Since there will now need to be a width indication, > > > mmiowb will be replaced with mmiowb[bwlq]. > > > > Any progress on this front? I figured that I would wait to update > > the ordering document until after this change happened, but if it > > is going to be awhile, I should proceed with the current API. > > > > Thoughts? > > Brent Casavant was going to be working on this. I'll CC him so that > he can indicate the status. David Howells is documenting memory barriers in the Documentation directory as well. 
Thanx, Paul ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] 2006-03-31 4:56 ` Jeremy Higdon 2006-03-31 6:18 ` Paul E. McKenney @ 2006-03-31 23:38 ` Jesse Barnes 1 sibling, 0 replies; 33+ messages in thread From: Jesse Barnes @ 2006-03-31 23:38 UTC (permalink / raw) To: Jeremy Higdon Cc: Paul E. McKenney, bcasavan, Keith Owens, Andi Kleen, ajwade, vatsa, Oleg Nesterov, linux-kernel, Dipankar Sarma, Andrew Morton, Ingo Molnar > > > From comments by jejb, we're looking at modifying the mmiowb > > > API by adding an argument which would be a register to read > > > from if the architecture in question needs ordering in this > > > way but does not have a lighter weight mechanism like the Altix > > > mmiowb. Since there will now need to be a width indication, > > > mmiowb will be replaced with mmiowb[bwlq]. > > > > Any progress on this front? I figured that I would wait to update > > the ordering document until after this change happened, but if it > > is going to be awhile, I should proceed with the current API. I avoided doing this initially on the premise that I shouldn't do things 'just because we might need it in the future' since that way seems to lead to madness. Is there actual hardware that needs an argument to mmiowb() (i.e. that can't just do a read from the system's single bridge or something)? Jesse ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask 2005-12-10 18:55 ` Oleg Nesterov 2005-12-11 17:41 ` Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] Srivatsa Vaddagiri @ 2005-12-12 3:10 ` Paul E. McKenney 2005-12-12 4:32 ` Andrew Morton 1 sibling, 1 reply; 33+ messages in thread From: Paul E. McKenney @ 2005-12-12 3:10 UTC (permalink / raw) To: Oleg Nesterov; +Cc: vatsa, linux-kernel, Dipankar Sarma, Andrew Morton

On Sat, Dec 10, 2005 at 09:55:35PM +0300, Oleg Nesterov wrote:
> Srivatsa Vaddagiri wrote:
> >
> > On Fri, Dec 09, 2005 at 10:17:38PM +0300, Oleg Nesterov wrote:
> > > > 	rcp->cur++; /* New GP */
> > > >
> > > > 	smp_mb();
> > >
> > > I think I need some education on memory barriers.
> > >
> > > Does this mb() guarantee that the new value of ->cur will be visible
> > > on other cpus immediately after smp_mb() (so that rcu_pending() will
> > > notice it) ?
> >
> > AFAIK the new value of ->cur should be visible to other CPUs when smp_mb()
> > returns. Here's a definition of smp_mb() from Paul's article:
> >
> > smp_mb(): "memory barrier" that orders both loads and stores. This means loads
> > and stores preceding the memory barrier are committed to memory before any
> > loads and stores following the memory barrier.
>
> Thanks, but this definition talks about ordering, it does not say
> anything about the time when stores are really committed to memory
> (and does it mean also that cache-lines are invalidated on other
> cpus ?).

Think of a pair of CPUs (creatively named "CPU 0" and "CPU 1") with a
cache-coherent interconnect between them. Then:

1. wmb() guarantees that any writes preceding the wmb() will
   be seen by the interconnect before any writes following the
   wmb(). But this applies -only- to the writes executed by
   the CPU doing the wmb().

2. rmb() guarantees that any changes seen by the interconnect
   preceding the rmb() will be seen by any reads following the
   rmb().
   Again, this applies only to reads executed by the CPU doing
   the rmb(). However, the changes might be due to any CPU.

3. mb() combines the guarantees made by rmb() and wmb().

All CPUs but Alpha make stronger guarantees. On those CPUs, you can
replace "interconnect" in #1 above with "all CPUs", and you can replace
"seen by the interconnect" in #2 above with "caused by any CPU". Rumor
has it that more recent Alpha CPUs also have stronger memory barriers,
but I have thus far been unable to confirm this.

On with the rest of the definitions:

4. smp_wmb(), smp_rmb(), and smp_mb() do nothing in UP kernels, but do
   the corresponding memory barrier in SMP kernels.

5. read_barrier_depends() is rmb() on Alpha, and a nop on other CPUs.

Non-Alpha CPUs guarantee that if CPU 0 does the following, where p->a
is initially zero and global_pointer->a is initially 1:

	p->a = 2;
	smp_wmb();
	global_pointer = p;

and if CPU 1 does:

	x = global_pointer->a;

then the value of x will be either 1 or 2, never zero. On Alpha,
strange though it may seem (and it did seem strange to me when I first
encountered it!), the value of x could well be zero. To get the same
guarantee on Alpha as on the other CPUs, the assignment to x must
instead be as follows:

	tmp = global_pointer;
	read_barrier_depends();
	x = tmp->a;

or, equivalently:

	x = rcu_dereference(global_pointer)->a;

There is an smp_read_barrier_depends() that takes effect only in an
SMP kernel, similar to smp_wmb() and friends.

This example may seem quite strange, but the need for the barrier
follows directly from definition #1, which makes its ordering guarantee
only to the -interconnect-, -not- to the other CPUs.

> > [ http://www.linuxjournal.com/article/8211 ]

Hmm... That one does look familiar. ;-)

> And thanks for this link, now I have some understanding about
> read_barrier_depends() ...

Let's just say it required much patience and perseverance on the part
of the Alpha architects to explain it to me the first time around.
;-)

						Thanx, Paul
^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask 2005-12-12 3:10 ` [PATCH] Fix RCU race in access of nohz_cpu_mask Paul E. McKenney @ 2005-12-12 4:32 ` Andrew Morton 2005-12-12 4:38 ` David S. Miller ` (2 more replies) 0 siblings, 3 replies; 33+ messages in thread From: Andrew Morton @ 2005-12-12 4:32 UTC (permalink / raw) To: paulmck; +Cc: oleg, vatsa, linux-kernel, dipankar

"Paul E. McKenney" <paulmck@us.ibm.com> wrote:
>
> 1. wmb() guarantees that any writes preceding the wmb() will
>    be seen by the interconnect before any writes following the
>    wmb(). But this applies -only- to the writes executed by
>    the CPU doing the wmb().
>
> 2. rmb() guarantees that any changes seen by the interconnect
>    preceding the rmb() will be seen by any reads following the
>    rmb(). Again, this applies only to reads executed by the
>    CPU doing the rmb(). However, the changes might be due to
>    any CPU.
>
> 3. mb() combines the guarantees made by rmb() and wmb().

So foo_mb() in preemptible code is potentially buggy.

I guess we assume that a context switch accidentally did enough of the
right types of barriers for things to work OK.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask 2005-12-12 4:32 ` Andrew Morton @ 2005-12-12 4:38 ` David S. Miller 2005-12-12 4:47 ` Nick Piggin 2005-12-12 4:49 ` Paul Mackerras 2005-12-12 6:27 ` Keith Owens 2 siblings, 1 reply; 33+ messages in thread From: David S. Miller @ 2005-12-12 4:38 UTC (permalink / raw) To: akpm; +Cc: paulmck, oleg, vatsa, linux-kernel, dipankar From: Andrew Morton <akpm@osdl.org> Date: Sun, 11 Dec 2005 20:32:26 -0800 > So foo_mb() in preemptible code is potentially buggy. > > I guess we assume that a context switch accidentally did enough of the > right types of barriers for things to work OK. A trap ought to flush all memory operations. There are some incredibly difficult memory error handling cases if the cpu does not synchronize all pending memory operations when a trap occurs. Failing that, yes, to be absolutely safe we'd need to have some explicit memory barrier in the context switch. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask 2005-12-12 4:38 ` David S. Miller @ 2005-12-12 4:47 ` Nick Piggin 0 siblings, 0 replies; 33+ messages in thread From: Nick Piggin @ 2005-12-12 4:47 UTC (permalink / raw) To: David S. Miller; +Cc: akpm, paulmck, oleg, vatsa, linux-kernel, dipankar David S. Miller wrote: > From: Andrew Morton <akpm@osdl.org> > Date: Sun, 11 Dec 2005 20:32:26 -0800 > > >>So foo_mb() in preemptible code is potentially buggy. >> >>I guess we assume that a context switch accidentally did enough of the >>right types of barriers for things to work OK. > > > A trap ought to flush all memory operations. > > There are some incredibly difficult memory error handling cases if the > cpu does not synchronize all pending memory operations when a trap > occurs. > > Failing that, yes, to be absolutely safe we'd need to have some > explicit memory barrier in the context switch. But it isn't that mbs in preemptible code are buggy. If they are scheduled off then back on the same CPU, the barrier will still be executed in the expected instruction sequence for that process. I think the minimum needed is for cpu *migrations* to be memory barriers. Again, this isn't any particular problem of mb() instructions - if cpu migrations weren't memory barriers, then preemptible code isn't even guaranteed ordering with its own memory operations, which would be quite interesting :) -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask 2005-12-12 4:32 ` Andrew Morton 2005-12-12 4:38 ` David S. Miller @ 2005-12-12 4:49 ` Paul Mackerras 2005-12-12 6:27 ` Keith Owens 2 siblings, 0 replies; 33+ messages in thread From: Paul Mackerras @ 2005-12-12 4:49 UTC (permalink / raw) To: Andrew Morton; +Cc: paulmck, oleg, vatsa, linux-kernel, dipankar Andrew Morton writes: > So foo_mb() in preemptible code is potentially buggy. > > I guess we assume that a context switch accidentally did enough of the > right types of barriers for things to work OK. The context switch code on powerpc includes an mb() for exactly this reason. Paul. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask 2005-12-12 4:32 ` Andrew Morton 2005-12-12 4:38 ` David S. Miller 2005-12-12 4:49 ` Paul Mackerras @ 2005-12-12 6:27 ` Keith Owens 2 siblings, 0 replies; 33+ messages in thread From: Keith Owens @ 2005-12-12 6:27 UTC (permalink / raw) To: Andrew Morton; +Cc: paulmck, oleg, vatsa, linux-kernel, dipankar

On Sun, 11 Dec 2005 20:32:26 -0800,
Andrew Morton <akpm@osdl.org> wrote:
>"Paul E. McKenney" <paulmck@us.ibm.com> wrote:
>>
>> 1. wmb() guarantees that any writes preceding the wmb() will
>>    be seen by the interconnect before any writes following the
>>    wmb(). But this applies -only- to the writes executed by
>>    the CPU doing the wmb().
>>
>> 2. rmb() guarantees that any changes seen by the interconnect
>>    preceding the rmb() will be seen by any reads following the
>>    rmb(). Again, this applies only to reads executed by the
>>    CPU doing the rmb(). However, the changes might be due to
>>    any CPU.
>>
>> 3. mb() combines the guarantees made by rmb() and wmb().
>
>So foo_mb() in preemptible code is potentially buggy.
>
>I guess we assume that a context switch accidentally did enough of the
>right types of barriers for things to work OK.

Not by accident. Any context switch must flush the memory state from
the old cpu's internal buffers, and that flush must get at least as
far as the globally snoopable cache. Otherwise the old cpu could still
own partial memory updates from the process, even though the process
was now running on a new cpu.

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH] Fix RCU race in access of nohz_cpu_mask
  2005-12-08 19:31 [PATCH] Fix RCU race in access of nohz_cpu_mask Oleg Nesterov
  2005-12-09  2:46 ` Srivatsa Vaddagiri
@ 2005-12-09  2:56 ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 33+ messages in thread
From: Srivatsa Vaddagiri @ 2005-12-09 2:56 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, Dipankar Sarma, Paul E. McKenney, Andrew Morton
On Thu, Dec 08, 2005 at 10:31:06PM +0300, Oleg Nesterov wrote:
> I think cpu should call cpu_quiet() after adding itself to nohz mask
> to eliminate this race.

That would require rsp->lock to be taken on every idle CPU that wishes
to go tickless. IMO that may not be a good idea.

-- 
Thanks and Regards,
Srivatsa Vaddagiri,
Linux Technology Center,
IBM Software Labs,
Bangalore, INDIA - 560017
^ permalink raw reply	[flat|nested] 33+ messages in thread
end of thread, other threads:[~2006-03-31 23:38 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2005-12-08 19:31 [PATCH] Fix RCU race in access of nohz_cpu_mask Oleg Nesterov
2005-12-09  2:46 ` Srivatsa Vaddagiri
2005-12-09 19:17   ` Oleg Nesterov
2005-12-10 15:19     ` Srivatsa Vaddagiri
2005-12-10 18:55       ` Oleg Nesterov
2005-12-11 17:41         ` Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] Srivatsa Vaddagiri
2005-12-11 21:21           ` Andrew James Wade
2005-12-11 23:45           ` Rusty Russell
2005-12-12  0:49             ` Keith Owens
2005-12-12  8:41               ` Srivatsa Vaddagiri
2005-12-12 19:33                 ` Oleg Nesterov
2005-12-13  5:20                   ` Paul E. McKenney
2005-12-13  5:07               ` Andrew James Wade
2005-12-13  5:43                 ` Paul E. McKenney
2005-12-13 11:20                   ` Andi Kleen
2005-12-13 16:20                     ` Paul E. McKenney
2005-12-13 22:27                       ` Keith Owens
2005-12-13 22:50                         ` Paul E. McKenney
2005-12-14  1:12                           ` Andi Kleen
2005-12-14  1:46                             ` Paul E. McKenney
2005-12-15 21:15                               ` Semantics of smp_mb() Roland Dreier
2005-12-16  7:46           ` Semantics of smp_mb() [was : Re: [PATCH] Fix RCU race in access of nohz_cpu_mask ] Jeremy Higdon
2006-03-13 18:39             ` Paul E. McKenney
2006-03-31  4:56               ` Jeremy Higdon
2006-03-31  6:18                 ` Paul E. McKenney
2006-03-31 23:38                   ` Jesse Barnes
2005-12-12  3:10         ` [PATCH] Fix RCU race in access of nohz_cpu_mask Paul E. McKenney
2005-12-12  4:32           ` Andrew Morton
2005-12-12  4:38             ` David S. Miller
2005-12-12  4:47               ` Nick Piggin
2005-12-12  4:49             ` Paul Mackerras
2005-12-12  6:27             ` Keith Owens
2005-12-09  2:56 ` Srivatsa Vaddagiri