* [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() [not found] <20150325130011.709478161@goodmis.org> @ 2015-03-25 13:00 ` Steven Rostedt 2015-03-27 19:41 ` Christoph Lameter 0 siblings, 1 reply; 7+ messages in thread From: Steven Rostedt @ 2015-03-25 13:00 UTC (permalink / raw) To: linux-kernel Cc: Ingo Molnar, Andrew Morton, stable, Christoph Lameter, Uwe Kleine-Koenig [-- Attachment #1: 0001-ring-buffer-Replace-this_cpu_-with-__this_cpu_.patch --] [-- Type: text/plain, Size: 3003 bytes --] From: Steven Rostedt <rostedt@goodmis.org> It has come to my attention that this_cpu_read/write are horrible on architectures other than x86. Worse yet, they actually disable preemption or interrupts! This caused some unexpected tracing results on ARM. 101.356868: preempt_count_add <-ring_buffer_lock_reserve 101.356870: preempt_count_sub <-ring_buffer_lock_reserve The ring_buffer_lock_reserve has recursion protection that requires accessing a per cpu variable. But since preempt_disable() is traced, it too got traced while accessing the variable that is suppose to prevent recursion like this. The generic version of this_cpu_read() and write() are: #define this_cpu_generic_read(pcp) \ ({ typeof(pcp) ret__; \ preempt_disable(); \ ret__ = *this_cpu_ptr(&(pcp)); \ preempt_enable(); \ ret__; \ }) #define this_cpu_generic_to_op(pcp, val, op) \ do { \ unsigned long flags; \ raw_local_irq_save(flags); \ *__this_cpu_ptr(&(pcp)) op val; \ raw_local_irq_restore(flags); \ } while (0) Which is unacceptable for locations that know they are within preempt disabled or interrupt disabled locations. Paul McKenney stated that __this_cpu_() versions produce much better code on other architectures than this_cpu_() does, if we know that the call is done in a preempt disabled location. I also changed the recursive_unlock() to use two local variables instead of accessing the per_cpu variable twice. Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home Cc: stable@vger.kernel.org Acked-by: Christoph Lameter <cl@linux.com> Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Tested-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> --- kernel/trace/ring_buffer.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 5040d44fe5a3..922048a0f7ea 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2679,7 +2679,7 @@ static DEFINE_PER_CPU(unsigned int, current_context); static __always_inline int trace_recursive_lock(void) { - unsigned int val = this_cpu_read(current_context); + unsigned int val = __this_cpu_read(current_context); int bit; if (in_interrupt()) { @@ -2696,18 +2696,17 @@ static __always_inline int trace_recursive_lock(void) return 1; val |= (1 << bit); - this_cpu_write(current_context, val); + __this_cpu_write(current_context, val); return 0; } static __always_inline void trace_recursive_unlock(void) { - unsigned int val = this_cpu_read(current_context); + unsigned int val = __this_cpu_read(current_context); - val--; - val &= this_cpu_read(current_context); - this_cpu_write(current_context, val); + val &= val & (val - 1); + __this_cpu_write(current_context, val); } #else -- 2.1.4 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() 2015-03-25 13:00 ` [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() Steven Rostedt @ 2015-03-27 19:41 ` Christoph Lameter 2015-03-27 20:11 ` Steven Rostedt 2015-03-27 21:50 ` [PATCH] ring-buffer: Remove duplicate use of '&' in recursive code Steven Rostedt 0 siblings, 2 replies; 7+ messages in thread From: Christoph Lameter @ 2015-03-27 19:41 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable, Uwe Kleine-Koenig On Wed, 25 Mar 2015, Steven Rostedt wrote: > It has come to my attention that this_cpu_read/write are horrible on > architectures other than x86. Worse yet, they actually disable > preemption or interrupts! This caused some unexpected tracing results > on ARM. This isnt something new and I thought the comment was dropped from the patch? This is a plain error in using this_cpu_* where __this_cpu_* would have been sufficient. Code was uselessly disabling preemption twice. > Which is unacceptable for locations that know they are within preempt > disabled or interrupt disabled locations. Well yes. Thats why the __this_cpu ops are there to avoid this overhead. > I also changed the recursive_unlock() to use two local variables instead > of accessing the per_cpu variable twice. Ok gotta look at that. > static __always_inline void trace_recursive_unlock(void) > { > - unsigned int val = this_cpu_read(current_context); > + unsigned int val = __this_cpu_read(current_context); > > - val--; > - val &= this_cpu_read(current_context); > - this_cpu_write(current_context, val); > + val &= val & (val - 1); > + __this_cpu_write(current_context, val); > } Ummm... This is does not look like an equivalent thing. Should this not be: unsigned int val = __this_cpu_read(current_context); unsigned int newval = val - 1; newval &= val; __this_cpu_write(current_context, newval); or more compact unsigned int val = __this_cpu_read(current_context); __this_cpu_write(current_context, val & (val - 1)); ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() 2015-03-27 19:41 ` Christoph Lameter @ 2015-03-27 20:11 ` Steven Rostedt 2015-03-30 12:44 ` Christoph Lameter 2015-03-27 21:50 ` [PATCH] ring-buffer: Remove duplicate use of '&' in recursive code Steven Rostedt 1 sibling, 1 reply; 7+ messages in thread From: Steven Rostedt @ 2015-03-27 20:11 UTC (permalink / raw) To: Christoph Lameter Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable, Uwe Kleine-Koenig On Fri, 27 Mar 2015 14:41:44 -0500 (CDT) Christoph Lameter <cl@linux.com> wrote: > On Wed, 25 Mar 2015, Steven Rostedt wrote: > > > It has come to my attention that this_cpu_read/write are horrible on > > architectures other than x86. Worse yet, they actually disable > > preemption or interrupts! This caused some unexpected tracing results > > on ARM. > > This isnt something new and I thought the comment was dropped from the > patch? This is a plain error in using this_cpu_* where __this_cpu_* would > have been sufficient. Code was uselessly disabling preemption twice. > Where in the patch do you see the comment? Or were you talking about the change log? The original patch did have a comment, an it was dropped, that's what I thought you were talking about. > > Which is unacceptable for locations that know they are within preempt > > disabled or interrupt disabled locations. > > Well yes. Thats why the __this_cpu ops are there to avoid this > overhead. > > > I also changed the recursive_unlock() to use two local variables instead > > of accessing the per_cpu variable twice. > > Ok gotta look at that. > > > static __always_inline void trace_recursive_unlock(void) > > { > > - unsigned int val = this_cpu_read(current_context); > > + unsigned int val = __this_cpu_read(current_context); > > > > - val--; > > - val &= this_cpu_read(current_context); > > - this_cpu_write(current_context, val); > > + val &= val & (val - 1); > > + __this_cpu_write(current_context, val); > > } > > Ummm... This is does not look like an equivalent thing. Should this not > be: > > unsigned int val = __this_cpu_read(current_context); > unsigned int newval = val - 1; > > newval &= val; > __this_cpu_write(current_context, newval); Actually, it is equivalent, but I do see a issue with my patch. val &= val & (val - 1); is the same as the more reasonable: val &= val - 1; I think I meant to replace &= with = :-/ > > or more compact > > unsigned int val = __this_cpu_read(current_context); > > __this_cpu_write(current_context, val & (val - 1)); Maybe I'll just use your compact version. Thanks, -- Steve ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() 2015-03-27 20:11 ` Steven Rostedt @ 2015-03-30 12:44 ` Christoph Lameter 2015-03-30 13:37 ` Steven Rostedt 0 siblings, 1 reply; 7+ messages in thread From: Christoph Lameter @ 2015-03-30 12:44 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable, Uwe Kleine-Koenig On Fri, 27 Mar 2015, Steven Rostedt wrote: > Where in the patch do you see the comment? Or were you talking about > the change log? The original patch did have a comment, an it was > dropped, that's what I thought you were talking about. Sorry yes the changelog. > Actually, it is equivalent, but I do see a issue with my patch. > > val &= val & (val - 1); > > is the same as the more reasonable: > > val &= val - 1; > > I think I meant to replace &= with = :-/ > > > > > or more compact > > > > unsigned int val = __this_cpu_read(current_context); > > > > __this_cpu_write(current_context, val & (val - 1)); > > Maybe I'll just use your compact version. Hmmm... It could even be more compact __this_cpu_and(current_context, __this_cpu_read(current_context) - 1); ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() 2015-03-30 12:44 ` Christoph Lameter @ 2015-03-30 13:37 ` Steven Rostedt 2015-03-30 14:32 ` Christoph Lameter 0 siblings, 1 reply; 7+ messages in thread From: Steven Rostedt @ 2015-03-30 13:37 UTC (permalink / raw) To: Christoph Lameter Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable, Uwe Kleine-Koenig On Mon, 30 Mar 2015 07:44:30 -0500 (CDT) Christoph Lameter <cl@linux.com> wrote: > > > > > > or more compact > > > > > > unsigned int val = __this_cpu_read(current_context); > > > > > > __this_cpu_write(current_context, val & (val - 1)); > > > > Maybe I'll just use your compact version. > > Hmmm... It could even be more compact > > __this_cpu_and(current_context, __this_cpu_read(current_context) - 1); Hmm, I didn't realize there was an "and" version. I'm guessing this would bring down the instruction count even more? /me tries it. I just finished testing my previous version. If this does prove to be more compact, I'll have to replace that one with this one. -- Steve ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() 2015-03-30 13:37 ` Steven Rostedt @ 2015-03-30 14:32 ` Christoph Lameter 0 siblings, 0 replies; 7+ messages in thread From: Christoph Lameter @ 2015-03-30 14:32 UTC (permalink / raw) To: Steven Rostedt Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable, Uwe Kleine-Koenig On Mon, 30 Mar 2015, Steven Rostedt wrote: > Hmm, I didn't realize there was an "and" version. I'm guessing this > would bring down the instruction count even more? Yes two segment prefixed instructions and a decrement. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] ring-buffer: Remove duplicate use of '&' in recursive code 2015-03-27 19:41 ` Christoph Lameter 2015-03-27 20:11 ` Steven Rostedt @ 2015-03-27 21:50 ` Steven Rostedt 1 sibling, 0 replies; 7+ messages in thread From: Steven Rostedt @ 2015-03-27 21:50 UTC (permalink / raw) To: Christoph Lameter Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable, Uwe Kleine-Koenig Steven Rostedt (Red Hat) (1): ring-buffer: Remove duplicate use of '&' in recursive code ---- kernel/trace/ring_buffer.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --------------------------- commit 7eb867195b9f3990da60738b1f26d0a71f37f77f Author: Steven Rostedt (Red Hat) <rostedt@goodmis.org> Date: Fri Mar 27 17:39:49 2015 -0400 ring-buffer: Remove duplicate use of '&' in recursive code A clean up of the recursive protection code changed val = this_cpu_read(current_context); val--; val &= this_cpu_read(current_context); to val = this_cpu_read(current_context); val &= val & (val - 1); Which has a duplicate use of '&' as the above is the same as val = val & (val - 1); Actually, it would be best to remove that line altogether and just add it to where it is used. Link: http://lkml.kernel.org/alpine.DEB.2.11.1503271423580.23114@gentwo.org Suggested-by: Christoph Lameter <cl@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 922048a0f7ea..93caf56567cb 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2705,8 +2705,7 @@ static __always_inline void trace_recursive_unlock(void) { unsigned int val = __this_cpu_read(current_context); - val &= val & (val - 1); - __this_cpu_write(current_context, val); + __this_cpu_write(current_context, val & (val - 1)); } #else ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-03-30 14:32 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20150325130011.709478161@goodmis.org>
2015-03-25 13:00 ` [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() Steven Rostedt
2015-03-27 19:41 ` Christoph Lameter
2015-03-27 20:11 ` Steven Rostedt
2015-03-30 12:44 ` Christoph Lameter
2015-03-30 13:37 ` Steven Rostedt
2015-03-30 14:32 ` Christoph Lameter
2015-03-27 21:50 ` [PATCH] ring-buffer: Remove duplicate use of '&' in recursive code Steven Rostedt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox