* [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*()
[not found] <20150325130011.709478161@goodmis.org>
@ 2015-03-25 13:00 ` Steven Rostedt
2015-03-27 19:41 ` Christoph Lameter
0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2015-03-25 13:00 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, stable, Christoph Lameter,
Uwe Kleine-Koenig
[-- Attachment #1: 0001-ring-buffer-Replace-this_cpu_-with-__this_cpu_.patch --]
[-- Type: text/plain, Size: 3003 bytes --]
From: Steven Rostedt <rostedt@goodmis.org>
It has come to my attention that this_cpu_read/write are horrible on
architectures other than x86. Worse yet, they actually disable
preemption or interrupts! This caused some unexpected tracing results
on ARM.
101.356868: preempt_count_add <-ring_buffer_lock_reserve
101.356870: preempt_count_sub <-ring_buffer_lock_reserve
The ring_buffer_lock_reserve has recursion protection that requires
accessing a per cpu variable. But since preempt_disable() is traced, it
too got traced while accessing the variable that is suppose to prevent
recursion like this.
The generic version of this_cpu_read() and write() are:
#define this_cpu_generic_read(pcp) \
({ typeof(pcp) ret__; \
preempt_disable(); \
ret__ = *this_cpu_ptr(&(pcp)); \
preempt_enable(); \
ret__; \
})
#define this_cpu_generic_to_op(pcp, val, op) \
do { \
unsigned long flags; \
raw_local_irq_save(flags); \
*__this_cpu_ptr(&(pcp)) op val; \
raw_local_irq_restore(flags); \
} while (0)
Which is unacceptable for locations that know they are within preempt
disabled or interrupt disabled locations.
Paul McKenney stated that __this_cpu_() versions produce much better code on
other architectures than this_cpu_() does, if we know that the call is done in
a preempt disabled location.
I also changed the recursive_unlock() to use two local variables instead
of accessing the per_cpu variable twice.
Link: http://lkml.kernel.org/r/20150317114411.GE3589@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20150317104038.312e73d1@gandalf.local.home
Cc: stable@vger.kernel.org
Acked-by: Christoph Lameter <cl@linux.com>
Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 5040d44fe5a3..922048a0f7ea 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2679,7 +2679,7 @@ static DEFINE_PER_CPU(unsigned int, current_context);
static __always_inline int trace_recursive_lock(void)
{
- unsigned int val = this_cpu_read(current_context);
+ unsigned int val = __this_cpu_read(current_context);
int bit;
if (in_interrupt()) {
@@ -2696,18 +2696,17 @@ static __always_inline int trace_recursive_lock(void)
return 1;
val |= (1 << bit);
- this_cpu_write(current_context, val);
+ __this_cpu_write(current_context, val);
return 0;
}
static __always_inline void trace_recursive_unlock(void)
{
- unsigned int val = this_cpu_read(current_context);
+ unsigned int val = __this_cpu_read(current_context);
- val--;
- val &= this_cpu_read(current_context);
- this_cpu_write(current_context, val);
+ val &= val & (val - 1);
+ __this_cpu_write(current_context, val);
}
#else
--
2.1.4
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*()
2015-03-25 13:00 ` [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() Steven Rostedt
@ 2015-03-27 19:41 ` Christoph Lameter
2015-03-27 20:11 ` Steven Rostedt
2015-03-27 21:50 ` [PATCH] ring-buffer: Remove duplicate use of '&' in recursive code Steven Rostedt
0 siblings, 2 replies; 7+ messages in thread
From: Christoph Lameter @ 2015-03-27 19:41 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable,
Uwe Kleine-Koenig
On Wed, 25 Mar 2015, Steven Rostedt wrote:
> It has come to my attention that this_cpu_read/write are horrible on
> architectures other than x86. Worse yet, they actually disable
> preemption or interrupts! This caused some unexpected tracing results
> on ARM.
This isnt something new and I thought the comment was dropped from the
patch? This is a plain error in using this_cpu_* where __this_cpu_* would
have been sufficient. Code was uselessly disabling preemption twice.
> Which is unacceptable for locations that know they are within preempt
> disabled or interrupt disabled locations.
Well yes. Thats why the __this_cpu ops are there to avoid this
overhead.
> I also changed the recursive_unlock() to use two local variables instead
> of accessing the per_cpu variable twice.
Ok gotta look at that.
> static __always_inline void trace_recursive_unlock(void)
> {
> - unsigned int val = this_cpu_read(current_context);
> + unsigned int val = __this_cpu_read(current_context);
>
> - val--;
> - val &= this_cpu_read(current_context);
> - this_cpu_write(current_context, val);
> + val &= val & (val - 1);
> + __this_cpu_write(current_context, val);
> }
Ummm... This is does not look like an equivalent thing. Should this not
be:
unsigned int val = __this_cpu_read(current_context);
unsigned int newval = val - 1;
newval &= val;
__this_cpu_write(current_context, newval);
or more compact
unsigned int val = __this_cpu_read(current_context);
__this_cpu_write(current_context, val & (val - 1));
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*()
2015-03-27 19:41 ` Christoph Lameter
@ 2015-03-27 20:11 ` Steven Rostedt
2015-03-30 12:44 ` Christoph Lameter
2015-03-27 21:50 ` [PATCH] ring-buffer: Remove duplicate use of '&' in recursive code Steven Rostedt
1 sibling, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2015-03-27 20:11 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable,
Uwe Kleine-Koenig
On Fri, 27 Mar 2015 14:41:44 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:
> On Wed, 25 Mar 2015, Steven Rostedt wrote:
>
> > It has come to my attention that this_cpu_read/write are horrible on
> > architectures other than x86. Worse yet, they actually disable
> > preemption or interrupts! This caused some unexpected tracing results
> > on ARM.
>
> This isnt something new and I thought the comment was dropped from the
> patch? This is a plain error in using this_cpu_* where __this_cpu_* would
> have been sufficient. Code was uselessly disabling preemption twice.
>
Where in the patch do you see the comment? Or were you talking about
the change log? The original patch did have a comment, an it was
dropped, that's what I thought you were talking about.
> > Which is unacceptable for locations that know they are within preempt
> > disabled or interrupt disabled locations.
>
> Well yes. Thats why the __this_cpu ops are there to avoid this
> overhead.
>
> > I also changed the recursive_unlock() to use two local variables instead
> > of accessing the per_cpu variable twice.
>
> Ok gotta look at that.
>
> > static __always_inline void trace_recursive_unlock(void)
> > {
> > - unsigned int val = this_cpu_read(current_context);
> > + unsigned int val = __this_cpu_read(current_context);
> >
> > - val--;
> > - val &= this_cpu_read(current_context);
> > - this_cpu_write(current_context, val);
> > + val &= val & (val - 1);
> > + __this_cpu_write(current_context, val);
> > }
>
> Ummm... This is does not look like an equivalent thing. Should this not
> be:
>
> unsigned int val = __this_cpu_read(current_context);
> unsigned int newval = val - 1;
>
> newval &= val;
> __this_cpu_write(current_context, newval);
Actually, it is equivalent, but I do see a issue with my patch.
val &= val & (val - 1);
is the same as the more reasonable:
val &= val - 1;
I think I meant to replace &= with = :-/
>
> or more compact
>
> unsigned int val = __this_cpu_read(current_context);
>
> __this_cpu_write(current_context, val & (val - 1));
Maybe I'll just use your compact version.
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] ring-buffer: Remove duplicate use of '&' in recursive code
2015-03-27 19:41 ` Christoph Lameter
2015-03-27 20:11 ` Steven Rostedt
@ 2015-03-27 21:50 ` Steven Rostedt
1 sibling, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2015-03-27 21:50 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable,
Uwe Kleine-Koenig
Steven Rostedt (Red Hat) (1):
ring-buffer: Remove duplicate use of '&' in recursive code
----
kernel/trace/ring_buffer.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
---------------------------
commit 7eb867195b9f3990da60738b1f26d0a71f37f77f
Author: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Date: Fri Mar 27 17:39:49 2015 -0400
ring-buffer: Remove duplicate use of '&' in recursive code
A clean up of the recursive protection code changed
val = this_cpu_read(current_context);
val--;
val &= this_cpu_read(current_context);
to
val = this_cpu_read(current_context);
val &= val & (val - 1);
Which has a duplicate use of '&' as the above is the same as
val = val & (val - 1);
Actually, it would be best to remove that line altogether and
just add it to where it is used.
Link: http://lkml.kernel.org/alpine.DEB.2.11.1503271423580.23114@gentwo.org
Suggested-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 922048a0f7ea..93caf56567cb 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2705,8 +2705,7 @@ static __always_inline void trace_recursive_unlock(void)
{
unsigned int val = __this_cpu_read(current_context);
- val &= val & (val - 1);
- __this_cpu_write(current_context, val);
+ __this_cpu_write(current_context, val & (val - 1));
}
#else
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*()
2015-03-27 20:11 ` Steven Rostedt
@ 2015-03-30 12:44 ` Christoph Lameter
2015-03-30 13:37 ` Steven Rostedt
0 siblings, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2015-03-30 12:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable,
Uwe Kleine-Koenig
On Fri, 27 Mar 2015, Steven Rostedt wrote:
> Where in the patch do you see the comment? Or were you talking about
> the change log? The original patch did have a comment, an it was
> dropped, that's what I thought you were talking about.
Sorry yes the changelog.
> Actually, it is equivalent, but I do see a issue with my patch.
>
> val &= val & (val - 1);
>
> is the same as the more reasonable:
>
> val &= val - 1;
>
> I think I meant to replace &= with = :-/
>
> >
> > or more compact
> >
> > unsigned int val = __this_cpu_read(current_context);
> >
> > __this_cpu_write(current_context, val & (val - 1));
>
> Maybe I'll just use your compact version.
Hmmm... It could even be more compact
__this_cpu_and(current_context, __this_cpu_read(current_context) - 1);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*()
2015-03-30 12:44 ` Christoph Lameter
@ 2015-03-30 13:37 ` Steven Rostedt
2015-03-30 14:32 ` Christoph Lameter
0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2015-03-30 13:37 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable,
Uwe Kleine-Koenig
On Mon, 30 Mar 2015 07:44:30 -0500 (CDT)
Christoph Lameter <cl@linux.com> wrote:
> > >
> > > or more compact
> > >
> > > unsigned int val = __this_cpu_read(current_context);
> > >
> > > __this_cpu_write(current_context, val & (val - 1));
> >
> > Maybe I'll just use your compact version.
>
> Hmmm... It could even be more compact
>
> __this_cpu_and(current_context, __this_cpu_read(current_context) - 1);
Hmm, I didn't realize there was an "and" version. I'm guessing this
would bring down the instruction count even more?
/me tries it.
I just finished testing my previous version. If this does prove to be
more compact, I'll have to replace that one with this one.
-- Steve
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*()
2015-03-30 13:37 ` Steven Rostedt
@ 2015-03-30 14:32 ` Christoph Lameter
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2015-03-30 14:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, stable,
Uwe Kleine-Koenig
On Mon, 30 Mar 2015, Steven Rostedt wrote:
> Hmm, I didn't realize there was an "and" version. I'm guessing this
> would bring down the instruction count even more?
Yes two segment prefixed instructions and a decrement.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-03-30 14:32 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20150325130011.709478161@goodmis.org>
2015-03-25 13:00 ` [for-next][PATCH 1/4] ring-buffer: Replace this_cpu_*() with __this_cpu_*() Steven Rostedt
2015-03-27 19:41 ` Christoph Lameter
2015-03-27 20:11 ` Steven Rostedt
2015-03-30 12:44 ` Christoph Lameter
2015-03-30 13:37 ` Steven Rostedt
2015-03-30 14:32 ` Christoph Lameter
2015-03-27 21:50 ` [PATCH] ring-buffer: Remove duplicate use of '&' in recursive code Steven Rostedt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox