* Question of usage of per_thread()
@ 2018-09-16 8:37 Akira Yokosawa
2018-09-16 15:08 ` Paul E. McKenney
0 siblings, 1 reply; 4+ messages in thread
From: Akira Yokosawa @ 2018-09-16 8:37 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa
Hi Paul,
Every time I review code under CodeSamples/,
I find myself confused where to use READ_ONCE/WRITE_ONCEs.
I'm looking at Listing 5.3 of current master.
There are two cases which lack READ_ONCE/WRITE_ONCE to access potentially
shared variables, namely on line 5 (__get_thread_var(counter)++;) and
on line 14 (sum += per_thread(counter, t);).
Line 5 looks like a good candidate to be optimized out when inlined.
But the performance result indicates "gcc -O3" keeps it inside the loop.
Is this because the definition of __get_thread_var() contains
a call to smp_thread_id() and complicated enough not to be optimized
out?
As for line 14, as per_thread() was derived from per_cpu() of kernel
API, I looked for call sites of per_cpu() in the kernel source tree.
There are very few cases where READ_ONCE/WRITE_ONCE is used along
with per_cpu(). There are two READ_ONCEs with per_cpu() in
kernel/rcu/srcutree.c, whose author is none other than you.
Are those READ_ONCEs necessary?
I don't grasp the actual definition of per_cpu() macro.
Definition of per_thread() macro under CodeSamples/api-pthreads/
does not look so complicated, but contains array indexing,
which might be good enough to prevent optimization in the loop.
I'm not sure, but my gut feeling is that READ_ONCE/WRITE_ONCE
is necessary to access an unannotated variable. If we need
volatility for sure, we could modify the definition of annotating
macros/functions.
Can you enlighten me?
Thanks, Akira
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: Question of usage of per_thread() 2018-09-16 8:37 Question of usage of per_thread() Akira Yokosawa @ 2018-09-16 15:08 ` Paul E. McKenney 2018-10-01 22:48 ` Akira Yokosawa 0 siblings, 1 reply; 4+ messages in thread From: Paul E. McKenney @ 2018-09-16 15:08 UTC (permalink / raw) To: Akira Yokosawa; +Cc: Paul E. McKenney, perfbook On Sun, Sep 16, 2018 at 05:37:23PM +0900, Akira Yokosawa wrote: > Hi Paul, > > Every time I review code under CodeSamples/, > I find myself confused where to use READ_ONCE/WRITE_ONCEs. > > I'm looking at Listing 5.3 of current master. > > There are two cases which lack READ_ONCE/WRITE_ONCE to access potentially > shared variables, namely on line 5 (__get_thread_var(counter)++;) and > on line 14 (sum += per_thread(counter, t);). > > Line 5 looks like a good candidate to be optimized out when inlined. > But the performance result indicates "gcc -O3" keeps it inside the loop. > > Is this because the definition of __get_thread_var() contains > a call to smp_thread_id() and complicated enough not to be optimized > out? > > As for line 14, as per_thread() was derived from per_cpu() of kernel > API, I looked for call sites of per_cpu() in the kernel source tree. > > There are very few cases where READ_ONCE/WRITE_ONCE is used along > with per_cpu(). There are two READ_ONCEs with per_cpu() in > kernel/rcu/srcutree.c, whose author is none other than you. > Are those READ_ONCEs necessary? > > I don't grasp the actual definition of per_cpu() macro. > Definition of per_thread() macro under CodeSamples/api-pthreads/ > does not look so complicated, but contains array indexing, > which might be good enough to prevent optimization in the loop. > > I'm not sure, but my gut feeling is that READ_ONCE/WRITE_ONCE > is necessary to access an unannotated variable. If we need > volatility for sure, we could modify the definition of annotating > macros/functions. > > Can you enlighten me? Yes, I do need to expand on this, both in perfbook and in order to inspect my usage in Linux-kernel RCU. Please see below for my current outline, and any and all feedback would be deeply appreciated! Thanx, Paul ------------------------------------------------------------------------ Plain accesses, that is, without either READ_ONCE() or WRITE_ONCE(): o Only accessed under lock. o Only accessed by owning CPU/thread. o Only modified by owning CPU/thread under lock, and otherwise accessed only either while holding lock or by owning CPU/thread. WRITE_ONCE() for modifications, READ_ONCE() for non-owning CPUs/threads not holding lock: o Only modified while holding lock by owning CPU/thread, could be read from anywhere by anyone. WRITE_ONCE() for modifications, READ_ONCE() for CPUs/threads not holding lock: o Only modified while holding lock, could be read anywhere by anyone. WRITE_ONCE() for modifications, READ_ONCE() for non-owning CPUs/threads: o Only modified by owning CPU/threads, could be read anywhere by anyone. Otherwise, READ_ONCE() and WRITE_ONCE() everywhere. But note that READ_ONCE() and WRITE_ONCE() provide absolutely no ordering guarantees unless: o There is only one variable being accesses, and all of those accesses are of the same size and alignment. o There is only one CPU/thread doing the accesses to all of the variables during the timeframe of interest. o There are other operations involved, for example, smp_mb(). ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Question of usage of per_thread() 2018-09-16 15:08 ` Paul E. McKenney @ 2018-10-01 22:48 ` Akira Yokosawa 2018-10-02 4:00 ` Paul E. McKenney 0 siblings, 1 reply; 4+ messages in thread From: Akira Yokosawa @ 2018-10-01 22:48 UTC (permalink / raw) To: Paul E. McKenney; +Cc: perfbook, Akira Yokosawa On 2018/09/17 0:08, Paul E. McKenney wrote: > On Sun, Sep 16, 2018 at 05:37:23PM +0900, Akira Yokosawa wrote: >> Hi Paul, >> >> Every time I review code under CodeSamples/, >> I find myself confused where to use READ_ONCE/WRITE_ONCEs. >> >> I'm looking at Listing 5.3 of current master. >> >> There are two cases which lack READ_ONCE/WRITE_ONCE to access potentially >> shared variables, namely on line 5 (__get_thread_var(counter)++;) and >> on line 14 (sum += per_thread(counter, t);). >> >> Line 5 looks like a good candidate to be optimized out when inlined. >> But the performance result indicates "gcc -O3" keeps it inside the loop. >> >> Is this because the definition of __get_thread_var() contains >> a call to smp_thread_id() and complicated enough not to be optimized >> out? >> >> As for line 14, as per_thread() was derived from per_cpu() of kernel >> API, I looked for call sites of per_cpu() in the kernel source tree. >> >> There are very few cases where READ_ONCE/WRITE_ONCE is used along >> with per_cpu(). There are two READ_ONCEs with per_cpu() in >> kernel/rcu/srcutree.c, whose author is none other than you. >> Are those READ_ONCEs necessary? >> >> I don't grasp the actual definition of per_cpu() macro. >> Definition of per_thread() macro under CodeSamples/api-pthreads/ >> does not look so complicated, but contains array indexing, >> which might be good enough to prevent optimization in the loop. >> >> I'm not sure, but my gut feeling is that READ_ONCE/WRITE_ONCE >> is necessary to access an unannotated variable. If we need >> volatility for sure, we could modify the definition of annotating >> macros/functions. >> >> Can you enlighten me? > > Yes, I do need to expand on this, both in perfbook and in order to inspect > my usage in Linux-kernel RCU. Please see below for my current outline, > and any and all feedback would be deeply appreciated! > So, I applied your outline to accesses in Listing 5.3. See inline comments below for my observation. > Thanx, Paul > > ------------------------------------------------------------------------ > > > Plain accesses, that is, without either READ_ONCE() or WRITE_ONCE(): > o Only accessed under lock. > o Only accessed by owning CPU/thread. > o Only modified by owning CPU/thread under lock, and otherwise > accessed only either while holding lock or by owning CPU/thread. > > WRITE_ONCE() for modifications, READ_ONCE() for non-owning CPUs/threads > not holding lock: > o Only modified while holding lock by owning CPU/thread, > could be read from anywhere by anyone. > > WRITE_ONCE() for modifications, READ_ONCE() for CPUs/threads not holding > lock: > o Only modified while holding lock, could be read anywhere by > anyone. > > WRITE_ONCE() for modifications, READ_ONCE() for non-owning > CPUs/threads: > o Only modified by owning CPU/threads, could be read anywhere > by anyone. Listing 5.3's per-thread variable "counter" looks covered here. So the write part of the increment on line 5 needs WRITE_ONCE(). Read access on line 14 also needs READ_ONCE(). Read part of the increment doesn't need READ_ONCE(), I guess. Line 5 would look like: WRITE_ONCE(__get_thread_var(counter), __get_thread_var(counter) + 1); . Considering the cost of __get_thread_var(), there should be a room to help compilers to optimize by using a local pointer as follows: static __inline__ void inc_count(void) { long *p_counter = &__get_thread_var(counter); WRITE_ONCE(*p_counter, *p_counter + 1); } . Does this look reasonable to you? Or is it better to also use READ_ONCE()? But if __get_thread_var() does imply volatility, we can keep the current way of increment: __get_thread_var(counter)++; Thoughts? Thanks, Akira > > Otherwise, READ_ONCE() and WRITE_ONCE() everywhere. > > But note that READ_ONCE() and WRITE_ONCE() provide absolutely no > ordering guarantees unless: > > o There is only one variable being accesses, and all of those > accesses are of the same size and alignment. > > o There is only one CPU/thread doing the accesses to all > of the variables during the timeframe of interest. > > o There are other operations involved, for example, smp_mb(). > ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Question of usage of per_thread() 2018-10-01 22:48 ` Akira Yokosawa @ 2018-10-02 4:00 ` Paul E. McKenney 0 siblings, 0 replies; 4+ messages in thread From: Paul E. McKenney @ 2018-10-02 4:00 UTC (permalink / raw) To: Akira Yokosawa; +Cc: perfbook On Tue, Oct 02, 2018 at 07:48:15AM +0900, Akira Yokosawa wrote: > On 2018/09/17 0:08, Paul E. McKenney wrote: > > On Sun, Sep 16, 2018 at 05:37:23PM +0900, Akira Yokosawa wrote: > >> Hi Paul, > >> > >> Every time I review code under CodeSamples/, > >> I find myself confused where to use READ_ONCE/WRITE_ONCEs. > >> > >> I'm looking at Listing 5.3 of current master. > >> > >> There are two cases which lack READ_ONCE/WRITE_ONCE to access potentially > >> shared variables, namely on line 5 (__get_thread_var(counter)++;) and > >> on line 14 (sum += per_thread(counter, t);). > >> > >> Line 5 looks like a good candidate to be optimized out when inlined. > >> But the performance result indicates "gcc -O3" keeps it inside the loop. > >> > >> Is this because the definition of __get_thread_var() contains > >> a call to smp_thread_id() and complicated enough not to be optimized > >> out? > >> > >> As for line 14, as per_thread() was derived from per_cpu() of kernel > >> API, I looked for call sites of per_cpu() in the kernel source tree. > >> > >> There are very few cases where READ_ONCE/WRITE_ONCE is used along > >> with per_cpu(). There are two READ_ONCEs with per_cpu() in > >> kernel/rcu/srcutree.c, whose author is none other than you. > >> Are those READ_ONCEs necessary? > >> > >> I don't grasp the actual definition of per_cpu() macro. > >> Definition of per_thread() macro under CodeSamples/api-pthreads/ > >> does not look so complicated, but contains array indexing, > >> which might be good enough to prevent optimization in the loop. > >> > >> I'm not sure, but my gut feeling is that READ_ONCE/WRITE_ONCE > >> is necessary to access an unannotated variable. If we need > >> volatility for sure, we could modify the definition of annotating > >> macros/functions. > >> > >> Can you enlighten me? > > > > Yes, I do need to expand on this, both in perfbook and in order to inspect > > my usage in Linux-kernel RCU. Please see below for my current outline, > > and any and all feedback would be deeply appreciated! > > > > So, I applied your outline to accesses in Listing 5.3. > See inline comments below for my observation. > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > > > Plain accesses, that is, without either READ_ONCE() or WRITE_ONCE(): > > o Only accessed under lock. > > o Only accessed by owning CPU/thread. > > o Only modified by owning CPU/thread under lock, and otherwise > > accessed only either while holding lock or by owning CPU/thread. > > > > WRITE_ONCE() for modifications, READ_ONCE() for non-owning CPUs/threads > > not holding lock: > > o Only modified while holding lock by owning CPU/thread, > > could be read from anywhere by anyone. > > > > WRITE_ONCE() for modifications, READ_ONCE() for CPUs/threads not holding > > lock: > > o Only modified while holding lock, could be read anywhere by > > anyone. > > > > WRITE_ONCE() for modifications, READ_ONCE() for non-owning > > CPUs/threads: > > o Only modified by owning CPU/threads, could be read anywhere > > by anyone. > > Listing 5.3's per-thread variable "counter" looks covered here. > So the write part of the increment on line 5 needs WRITE_ONCE(). > Read access on line 14 also needs READ_ONCE(). > > Read part of the increment doesn't need READ_ONCE(), I guess. > > Line 5 would look like: > > WRITE_ONCE(__get_thread_var(counter), __get_thread_var(counter) + 1); > > . Considering the cost of __get_thread_var(), there should be a room > to help compilers to optimize by using a local pointer as follows: > > static __inline__ void inc_count(void) > { > long *p_counter = &__get_thread_var(counter); > > WRITE_ONCE(*p_counter, *p_counter + 1); > } > > . Does this look reasonable to you? Or is it better to also use > READ_ONCE()? Yes, this is fine. You don't need READ_ONCE() because threads only modify the counters that they own, and this case is a thread reading its own counter -- thus concurrent update cannot happen. > But if __get_thread_var() does imply volatility, we can keep the > current way of increment: > > __get_thread_var(counter)++; Unfortunately, it does not, at least not the way that it is currently implemented. > Thoughts? Your taking the pointer and using WRITE_ONCE() looks good! Thanx, Paul > Thanks, Akira > > > > > Otherwise, READ_ONCE() and WRITE_ONCE() everywhere. > > > > But note that READ_ONCE() and WRITE_ONCE() provide absolutely no > > ordering guarantees unless: > > > > o There is only one variable being accesses, and all of those > > accesses are of the same size and alignment. > > > > o There is only one CPU/thread doing the accesses to all > > of the variables during the timeframe of interest. > > > > o There are other operations involved, for example, smp_mb(). > > > ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2018-10-02 10:42 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-09-16 8:37 Question of usage of per_thread() Akira Yokosawa 2018-09-16 15:08 ` Paul E. McKenney 2018-10-01 22:48 ` Akira Yokosawa 2018-10-02 4:00 ` Paul E. McKenney
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.