* [RFC][PATCH 1/5] x86: Remove const_udelay() caring about which cpu var it uses
2011-09-19 21:20 [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Steven Rostedt
@ 2011-09-19 21:20 ` Steven Rostedt
2011-09-19 21:51 ` Christoph Lameter
2011-09-19 21:20 ` [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read() Steven Rostedt
` (5 subsequent siblings)
6 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-19 21:20 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Christoph Lameter
[-- Attachment #1: 0001-x86-Remove-const_udelay-caring-about-which-cpu-var-i.patch --]
[-- Type: text/plain, Size: 998 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The __const_udelay() code originally used raw_smp_processor_id()
in its calculations for a delaying. Probably because if it were
to migrate, it would take much longer to do so than the requested
delay.
Switch from this_cpu_read() to __this_cpu_read() to document that
the read is racy and we do not care.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
arch/x86/lib/delay.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index fc45ba8..1b9bde5 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -121,7 +121,7 @@ inline void __const_udelay(unsigned long xloops)
asm("mull %%edx"
:"=d" (xloops), "=&a" (d0)
:"1" (xloops), "0"
- (this_cpu_read(cpu_info.loops_per_jiffy) * (HZ/4)));
+ (__this_cpu_read(cpu_info.loops_per_jiffy) * (HZ/4)));
__delay(++xloops);
}
--
1.7.5.4
^ permalink raw reply related [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 1/5] x86: Remove const_udelay() caring about which cpu var it uses
2011-09-19 21:20 ` [RFC][PATCH 1/5] x86: Remove const_udelay() caring about which cpu var it uses Steven Rostedt
@ 2011-09-19 21:51 ` Christoph Lameter
2011-09-19 23:31 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-19 21:51 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 19 Sep 2011, Steven Rostedt wrote:
> The __const_udelay() code originally used raw_smp_processor_id()
> in its calculations for a delaying. Probably because if it were
> to migrate, it would take much longer to do so than the requested
> delay.
>
> Switch from this_cpu_read() to __this_cpu_read() to document that
> the read is racy and we do not care.
Is preemption disabled by all callers to __const_udelay?
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 1/5] x86: Remove const_udelay() caring about which cpu var it uses
2011-09-19 21:51 ` Christoph Lameter
@ 2011-09-19 23:31 ` Steven Rostedt
0 siblings, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-19 23:31 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 2011-09-19 at 16:51 -0500, Christoph Lameter wrote:
> On Mon, 19 Sep 2011, Steven Rostedt wrote:
>
> > The __const_udelay() code originally used raw_smp_processor_id()
> > in its calculations for a delaying. Probably because if it were
> > to migrate, it would take much longer to do so than the requested
> > delay.
> >
> > Switch from this_cpu_read() to __this_cpu_read() to document that
> > the read is racy and we do not care.
>
> Is preemption disabled by all callers to __const_udelay?
Nope, because I triggered a splat with it :-) From lots of places.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-19 21:20 [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Steven Rostedt
2011-09-19 21:20 ` [RFC][PATCH 1/5] x86: Remove const_udelay() caring about which cpu var it uses Steven Rostedt
@ 2011-09-19 21:20 ` Steven Rostedt
2011-09-19 22:02 ` Christoph Lameter
2011-09-19 21:20 ` [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events() Steven Rostedt
` (4 subsequent siblings)
6 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-19 21:20 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Christoph Lameter
[-- Attachment #1: 0002-mm-Switch-mod_state-to-__this_cpu_read.patch --]
[-- Type: text/plain, Size: 868 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The code in mod_state() is already made to handle the raciness of
this_cpu_read(). Have the code use __this_cpu_read() instead so
the debug code does not trigger warnings about it.
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
mm/vmstat.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index d52b13d..41a843f 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -328,9 +328,9 @@ static inline void mod_state(struct zone *zone,
* Most of the time the thresholds are the same anyways
* for all cpus in a zone.
*/
- t = this_cpu_read(pcp->stat_threshold);
+ t = __this_cpu_read(pcp->stat_threshold);
- o = this_cpu_read(*p);
+ o = __this_cpu_read(*p);
n = delta + o;
if (n > t || n < -t) {
--
1.7.5.4
^ permalink raw reply related [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-19 21:20 ` [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read() Steven Rostedt
@ 2011-09-19 22:02 ` Christoph Lameter
2011-09-19 23:48 ` Steven Rostedt
2011-09-20 13:49 ` Thomas Gleixner
0 siblings, 2 replies; 73+ messages in thread
From: Christoph Lameter @ 2011-09-19 22:02 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 19 Sep 2011, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> The code in mod_state() is already made to handle the raciness of
> this_cpu_read(). Have the code use __this_cpu_read() instead so
> the debug code does not trigger warnings about it.
Why would there be a warning triggered? this_cpu_read should take care of
disabling preemption for the read if needed. In fact the fallback case
does do exactly that.
I think it would make more sence if __this_cpu_read() could be made to
trigger a warning if used in context where preemption could be off.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-19 22:02 ` Christoph Lameter
@ 2011-09-19 23:48 ` Steven Rostedt
2011-09-20 14:46 ` Christoph Lameter
2011-09-20 13:49 ` Thomas Gleixner
1 sibling, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-19 23:48 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 2011-09-19 at 17:02 -0500, Christoph Lameter wrote:
> On Mon, 19 Sep 2011, Steven Rostedt wrote:
>
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > The code in mod_state() is already made to handle the raciness of
> > this_cpu_read(). Have the code use __this_cpu_read() instead so
> > the debug code does not trigger warnings about it.
>
> Why would there be a warning triggered? this_cpu_read should take care of
> disabling preemption for the read if needed. In fact the fallback case
> does do exactly that.
What does disabling preemption for just reading the cpu variable help
anywhere? Once you read the variable, and enable preemption, you can
migrate. Then that info you have could be useless.
>
> I think it would make more sence if __this_cpu_read() could be made to
> trigger a warning if used in context where preemption could be off.
Hmm, maybe I should have called it raw_this_cpu_read() then it would
probably have better meaning. Just like smp_processor_id() and
raw_smp_processor_id().
The point is, most places use this_cpu_read(). Adding a "__" or "raw_"
prefix usually means that less checks are done. Think of the
copy_from_user(). You have __copy_from_user() that does less checks.
Most of the per_cpu() and get_cpu_var() code was blindly replaced with
the this_cpu_*() variants. The original code had preemption disabled
checks. Now they don't. Which means if you use the this_cpu_read() and
then make some decision on it, that read may be useless and buggy.
We have smp_processor_id() that does the check (and the old per_cpu()
and get_cpu_var()). Now we have this_cpu_read() which replaced most of
the per_cpu() code in the kernel, making them vulnerable to bugs when
preemption is enabled.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-19 23:48 ` Steven Rostedt
@ 2011-09-20 14:46 ` Christoph Lameter
2011-09-20 15:16 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 14:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 19 Sep 2011, Steven Rostedt wrote:
> On Mon, 2011-09-19 at 17:02 -0500, Christoph Lameter wrote:
> > On Mon, 19 Sep 2011, Steven Rostedt wrote:
> >
> > > From: Steven Rostedt <srostedt@redhat.com>
> > >
> > > The code in mod_state() is already made to handle the raciness of
> > > this_cpu_read(). Have the code use __this_cpu_read() instead so
> > > the debug code does not trigger warnings about it.
> >
> > Why would there be a warning triggered? this_cpu_read should take care of
> > disabling preemption for the read if needed. In fact the fallback case
> > does do exactly that.
>
> What does disabling preemption for just reading the cpu variable help
> anywhere? Once you read the variable, and enable preemption, you can
> migrate. Then that info you have could be useless.
The effects on just reading are not that severe. We may calculate the
address of the per cpu variable on one cpu and then be migrated to another
processor. Then there is an access to the per cpu area of another
processor which in many cases is harmless and will just cause the
cacheline to go from exclusive mode into shared mode on the other cpu. The
next write by the other cpu then brings it back to exclusive mode evicting
the cacheline from the cpu we migrated to.
The same conventions are also used for RMV instructions like this_cpu_inc.
In that case the situation is much more sever. Now we are locklessly
incrementing a counter in the per cpu area of another cpu. That is a race
condition that can corrupt the counter data.
> The point is, most places use this_cpu_read(). Adding a "__" or "raw_"
> prefix usually means that less checks are done. Think of the
> copy_from_user(). You have __copy_from_user() that does less checks.
Why do you need any checks there? this_cpu_xxx operations should take care
of preemption.
> Most of the per_cpu() and get_cpu_var() code was blindly replaced with
> the this_cpu_*() variants. The original code had preemption disabled
> checks. Now they don't. Which means if you use the this_cpu_read() and
> then make some decision on it, that read may be useless and buggy.
>
> We have smp_processor_id() that does the check (and the old per_cpu()
> and get_cpu_var()). Now we have this_cpu_read() which replaced most of
> the per_cpu() code in the kernel, making them vulnerable to bugs when
> preemption is enabled.
this_cpu_xx can only replace a get_cpu_var/per_cpu section where a single
word is read or updated. If multiple per cpu variables are accessed and
modified in one go then it is better to keep disabling preemption for
the whole section and use the __this_cpu_xx operations.
If there are cases where this was not done correctly then lets fix those.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 14:46 ` Christoph Lameter
@ 2011-09-20 15:16 ` Steven Rostedt
2011-09-20 15:54 ` Christoph Lameter
0 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 15:16 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Tue, 2011-09-20 at 09:46 -0500, Christoph Lameter wrote:
> The effects on just reading are not that severe. We may calculate the
> address of the per cpu variable on one cpu and then be migrated to another
> processor. Then there is an access to the per cpu area of another
> processor which in many cases is harmless and will just cause the
> cacheline to go from exclusive mode into shared mode on the other cpu. The
> next write by the other cpu then brings it back to exclusive mode evicting
> the cacheline from the cpu we migrated to.
Then what protects another process from touching that same CPU variable?
Looks very error prone to me.
>
> The same conventions are also used for RMV instructions like this_cpu_inc.
> In that case the situation is much more sever. Now we are locklessly
> incrementing a counter in the per cpu area of another cpu. That is a race
> condition that can corrupt the counter data.
If you read a per cpu variable with simple this_cpu_read() and then make
a decision based on that variable, it may be useless in the process if
you migrate.
>
> > The point is, most places use this_cpu_read(). Adding a "__" or "raw_"
> > prefix usually means that less checks are done. Think of the
> > copy_from_user(). You have __copy_from_user() that does less checks.
>
> Why do you need any checks there? this_cpu_xxx operations should take care
> of preemption.
NO IT DOES NOT!!!! Look at the damn memcg bug and figure it out!
>
> > Most of the per_cpu() and get_cpu_var() code was blindly replaced with
> > the this_cpu_*() variants. The original code had preemption disabled
> > checks. Now they don't. Which means if you use the this_cpu_read() and
> > then make some decision on it, that read may be useless and buggy.
> >
> > We have smp_processor_id() that does the check (and the old per_cpu()
> > and get_cpu_var()). Now we have this_cpu_read() which replaced most of
> > the per_cpu() code in the kernel, making them vulnerable to bugs when
> > preemption is enabled.
>
> this_cpu_xx can only replace a get_cpu_var/per_cpu section where a single
> word is read or updated. If multiple per cpu variables are accessed and
> modified in one go then it is better to keep disabling preemption for
> the whole section and use the __this_cpu_xx operations.
And that is what __get_cpu_var() is for.
>
> If there are cases where this was not done correctly then lets fix those.
Lets not let this happen in the first place! There's absolutely no
advantage of not disabling preemption for a period of time while per cpu
variables are being used. If we annotate this properly, then the -rt
kernel could handle this in other ways too, to not keep the latency of
preemption off too long.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 15:16 ` Steven Rostedt
@ 2011-09-20 15:54 ` Christoph Lameter
2011-09-20 16:07 ` Steven Rostedt
2011-09-20 22:19 ` Valdis.Kletnieks
0 siblings, 2 replies; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 15:54 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Tue, 20 Sep 2011, Steven Rostedt wrote:
> On Tue, 2011-09-20 at 09:46 -0500, Christoph Lameter wrote:
>
> > The effects on just reading are not that severe. We may calculate the
> > address of the per cpu variable on one cpu and then be migrated to another
> > processor. Then there is an access to the per cpu area of another
> > processor which in many cases is harmless and will just cause the
> > cacheline to go from exclusive mode into shared mode on the other cpu. The
> > next write by the other cpu then brings it back to exclusive mode evicting
> > the cacheline from the cpu we migrated to.
>
> Then what protects another process from touching that same CPU variable?
> Looks very error prone to me.
I am sorry but this is a quite puzzling statement. Any processor in kernel
mode can create arbitrary pointers to access any memory location it wants.
That actually occurs legitimately when the sum of all per cpu values is
calculated.
> > The same conventions are also used for RMV instructions like this_cpu_inc.
> > In that case the situation is much more sever. Now we are locklessly
> > incrementing a counter in the per cpu area of another cpu. That is a race
> > condition that can corrupt the counter data.
>
> If you read a per cpu variable with simple this_cpu_read() and then make
> a decision based on that variable, it may be useless in the process if
> you migrate.
Correct but in many context that does not matter.
> > > The point is, most places use this_cpu_read(). Adding a "__" or "raw_"
> > > prefix usually means that less checks are done. Think of the
> > > copy_from_user(). You have __copy_from_user() that does less checks.
> >
> > Why do you need any checks there? this_cpu_xxx operations should take care
> > of preemption.
>
> NO IT DOES NOT!!!! Look at the damn memcg bug and figure it out!
IT DOES!!!! The memcg bug is caused by someone using this_cpu_read and
expecting the cpu not to change. this_cpu_xx ops stand on their own and do
not guarantee access to the same per cpu area.
> > this_cpu_xx can only replace a get_cpu_var/per_cpu section where a single
> > word is read or updated. If multiple per cpu variables are accessed and
> > modified in one go then it is better to keep disabling preemption for
> > the whole section and use the __this_cpu_xx operations.
>
> And that is what __get_cpu_var() is for.
__get_cpu_var cannot generate a single instruction that does RMW ops on a
counter without disabling preemption.
> > If there are cases where this was not done correctly then lets fix those.
>
> Lets not let this happen in the first place! There's absolutely no
> advantage of not disabling preemption for a period of time while per cpu
> variables are being used. If we annotate this properly, then the -rt
> kernel could handle this in other ways too, to not keep the latency of
> preemption off too long.
There are significant advantages for counters and things designed to
operate in an environment where the OS can migrate a task at will.
Vvmstat and the slub fastpaths exploit these things now and it will be
possible with these operations to increase the performance of more
subsystems that way. The page allocator comes to mind.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 15:54 ` Christoph Lameter
@ 2011-09-20 16:07 ` Steven Rostedt
2011-09-20 22:19 ` Valdis.Kletnieks
1 sibling, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 16:07 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Tue, 2011-09-20 at 10:54 -0500, Christoph Lameter wrote:
> There are significant advantages for counters and things designed to
> operate in an environment where the OS can migrate a task at will.
> Vvmstat and the slub fastpaths exploit these things now and it will be
> possible with these operations to increase the performance of more
> subsystems that way. The page allocator comes to mind.
I'm sorry but correctness beats performance every time! What you have
done seems to be micro optimizations with the sacrifice to correctness.
If you kept the damn this_cpu_*() local to the slab and page allocators,
we may not have even noticed. But you went ahead and made this a very
intrusive invasion into the rest of the kernel, and caused bugs to be
missed everywhere.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 15:54 ` Christoph Lameter
2011-09-20 16:07 ` Steven Rostedt
@ 2011-09-20 22:19 ` Valdis.Kletnieks
1 sibling, 0 replies; 73+ messages in thread
From: Valdis.Kletnieks @ 2011-09-20 22:19 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Christoph Lameter
[-- Attachment #1: Type: text/plain, Size: 762 bytes --]
On Tue, 20 Sep 2011 10:54:10 CDT, Christoph Lameter said:
> IT DOES!!!! The memcg bug is caused by someone using this_cpu_read and
> expecting the cpu not to change. this_cpu_xx ops stand on their own and do
> not guarantee access to the same per cpu area.
As most 6 year olds will tell you, "Well, DUH!". Maybe if you had called it
"this_cpu_or_maybe_that_one_over_there_read()", people wouldn't use it wrong.
"If people place a nice chocky in their mouth, they don't want their cheeks
pierced. In any case this is an inadequate description of the sweetmeat. I
shall have to ask you to accompany me to the station." -- Monty Python.
"inadequate description" hardly begins to cover the design of this API, which
definitely needs to be taken to the station.
[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-19 22:02 ` Christoph Lameter
2011-09-19 23:48 ` Steven Rostedt
@ 2011-09-20 13:49 ` Thomas Gleixner
2011-09-20 14:01 ` Steven Rostedt
2011-09-20 14:51 ` Christoph Lameter
1 sibling, 2 replies; 73+ messages in thread
From: Thomas Gleixner @ 2011-09-20 13:49 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Mon, 19 Sep 2011, Christoph Lameter wrote:
> On Mon, 19 Sep 2011, Steven Rostedt wrote:
>
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > The code in mod_state() is already made to handle the raciness of
> > this_cpu_read(). Have the code use __this_cpu_read() instead so
> > the debug code does not trigger warnings about it.
>
> Why would there be a warning triggered? this_cpu_read should take care of
> disabling preemption for the read if needed. In fact the fallback case
> does do exactly that.
And what exactly is the purpose of having a preempt_disable()/enable()
pair in this_cpu_read()?
Nothing, AFAICT. And looking at the usage:
arch/x86/lib/delay.c:124: (this_cpu_read(cpu_info.loops_per_jiffy) * (HZ/4)));
Pointless, as we do not care about which loops_per_jiffy we access.
arch/x86/mm/tlb.c:179: sender = this_cpu_read(tlb_vector_offset);
Pointless, as all callers have preemption disabled.
drivers/acpi/processor_throttling.c:719: if ((this_cpu_read(cpu_info.x86_vendor) != X86_VENDOR_INTEL) ||
drivers/acpi/processor_throttling.c:740: if ((this_cpu_read(cpu_info.x86_vendor) != X86_VENDOR_INTEL) ||
The conversion was made in commit 9d42a53e with a lousy changelog and
this_cpu_read() with the given semantics is simply crap as this code
wants to run on a given cpu as we access msrs which we don't want to
do in a random fashion. In fact if you had used __this_cpu_read() and
that code would have contained a debug check then the callchain coming
up from thermal_cooling_device_cur_state_store() would have triggered
it AFAICT.
drivers/dma/dmaengine.c:331: return this_cpu_read(channel_table[tx_type]->chan);
Blindly converted w/o noticing the obvious problem with this code
(even befor the conversion). We just blindly return a channel from the
cpu on which this happens to be called. Can all the calling code deal
with possibly being migrated away after that ? If yes, then this wants
to have a comment. If no, then the calling convention wants to be
documented.
drivers/input/gameport/gameport.c:124: return (this_cpu_read(cpu_info.loops_per_jiffy) *
Harmless
include/linux/interrupt.h:462: return this_cpu_read(ksoftirqd);
Pointless. This is called from account_system_vtime() and we already
use __this_cpu_read() in that very function.
kernel/irq_work.c:125: if (this_cpu_read(irq_work_list) == NULL)
Pointless, called from interrupts disabled context
kernel/sched.c:2013: latest_ns = this_cpu_read(cpu_hardirq_time);
Pointless, has interrupts disabled
kernel/sched.c:2028: latest_ns = this_cpu_read(cpu_softirq_time);
Ditto
mm/memcontrol.c:686: val = this_cpu_read(mem->stat->events[MEM_CGROUP_EVENTS_COUNT]);
mm/memcontrol.c:687: next = this_cpu_read(mem->stat->targets[target]);
Completely broken as Steven pointed out already
mm/memcontrol.c:696: val = this_cpu_read(mem->stat->events[MEM_CGROUP_EVENTS_COUNT]);
Broken as well, as it writes back to the same per cpu var, unless
called from atomic contexts.
mm/memcontrol.c:1321: return this_cpu_read(mem->stat->count[MEM_CGROUP_ON_MOVE]) > 0;
Either broken by returning random data or pointless when called from
atomic contexts.
mm/vmstat.c:331: t = this_cpu_read(pcp->stat_threshold);
mm/vmstat.c:333: o = this_cpu_read(*p);
Race known and dealt with.
So most of the this_cpu_read() sites are either pointless or broken. I
suspect that other this_cpu_* stuff has the same problems.
So what's the point of that interface again? Random access to random
cpu data protected with a preempt_disable/enable() pair to paper over
the real bugs?
The whole idea of having this_cpu_* which can be called from any
context is completely stupid as it is just a guarantee for misuse and
failure.
The best thing is to remove the this_cpu_* hackery alltogether and
just keep the __this_cpu_* versions (along with proper debugging) and
git rid of the silly underscores.
> I think it would make more sence if __this_cpu_read() could be made to
> trigger a warning if used in context where preemption could be off.
It should have had such a warning in the first place and the warning
needs to yell about preemptible (i.e. unprotected) context and not the
other way round.
But instead of just slapping smp_processor_id() checks into those
functions we should add a more sensible debug interface like:
debug_check_percpu_access()
and the per cpu sections which require protection over a series (1
.. N) of this_cpu_* operations want to have
this_cpu_start()
this_cpu_end()
or similar annotations around them.
This allows us to do proper analysis of this_cpu usage and makes the
code understandable.
Thanks,
tglx
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 13:49 ` Thomas Gleixner
@ 2011-09-20 14:01 ` Steven Rostedt
2011-09-20 14:51 ` Christoph Lameter
1 sibling, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 14:01 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Christoph Lameter, LKML, Ingo Molnar, Andrew Morton,
Peter Zijlstra, Christoph Lameter, Tejun Heo
On Tue, 2011-09-20 at 15:49 +0200, Thomas Gleixner wrote:
> The best thing is to remove the this_cpu_* hackery alltogether and
> just keep the __this_cpu_* versions (along with proper debugging) and
> git rid of the silly underscores.
I think renaming all the __this_cpu_*() to this_cpu_*() and then
removing all the current this_cpu_*() usages is the best thing.
That is, make it mandatory that all this_cpu_*() users are with preempt
disabled.
>
> > I think it would make more sence if __this_cpu_read() could be made to
> > trigger a warning if used in context where preemption could be off.
>
> It should have had such a warning in the first place and the warning
> needs to yell about preemptible (i.e. unprotected) context and not the
> other way round.
>
> But instead of just slapping smp_processor_id() checks into those
> functions we should add a more sensible debug interface like:
>
> debug_check_percpu_access()
Sure, I'd do something like that in the this_cpu_*() code. But since
smp_processor_id() already had the bug triggering. I just did the easy
approach as more of a proof of concept (hence the RFC in the patch set).
>
> and the per cpu sections which require protection over a series (1
> .. N) of this_cpu_* operations want to have
>
> this_cpu_start()
> this_cpu_end()
>
> or similar annotations around them.
I agree about the concept but hate the naming.
On IRC, Peter suggested a local_lock_t type. Give a named protection
area. I like this approach as we can add debugging infrastructure to
this concept.
local_lock_t my_vars;
local_lock(my_vars);
var = this_cpu_read();
/* play with var */
local_unlock(my_vars);
Peter even suggested for those locations that disable interrupts for
this work we could document that with:
local_lock_irqsave(my_vars, flags);
[...]
local_unlock_irqrestore(my_vars, flags);
This documents nicely the area of the protected variables. The
local_lock() could simply turn into a preempt_disable(), but we could
add hooks into lockdep or something to perform other checks. Even have
sparse or coccinelle find usages of var outside of local_lock(), and
report them.
As the kernel gets more complex, especially with the focus on scaling to
huge number of CPUs, and moving towards per_cpu data handling, having
debugging capabilities of this sort is critical.
>
> This allows us to do proper analysis of this_cpu usage and makes the
> code understandable.
Totally agree!
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 13:49 ` Thomas Gleixner
2011-09-20 14:01 ` Steven Rostedt
@ 2011-09-20 14:51 ` Christoph Lameter
2011-09-20 15:11 ` Steven Rostedt
2011-09-20 15:27 ` Thomas Gleixner
1 sibling, 2 replies; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 14:51 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Tue, 20 Sep 2011, Thomas Gleixner wrote:
> > Why would there be a warning triggered? this_cpu_read should take care of
> > disabling preemption for the read if needed. In fact the fallback case
> > does do exactly that.
>
> And what exactly is the purpose of having a preempt_disable()/enable()
> pair in this_cpu_read()?
It avoids a read from one cpu of another processors per cpu area. In the
fallback case the processor has to calculate the per cpu address and then
retrieve data from the calculated address. The process can be migrated
between calculation of the address and the actual fetch.
The more severe cases are the RMV operations like this_cpu_inc.
> functions we should add a more sensible debug interface like:
>
> debug_check_percpu_access()
>
> and the per cpu sections which require protection over a series (1
> .. N) of this_cpu_* operations want to have
>
> this_cpu_start()
> this_cpu_end()
>
> or similar annotations around them.
>
> This allows us to do proper analysis of this_cpu usage and makes the
> code understandable.
Not sure what you want to check in this_cpu_xx operations. Why could
there be issues with this_cpu_xx operations?
I see that __this_cpu_xx operations may not work as intended in
preemptable contexts and there we could have more changes.
this_cpu ops are single instructions that do not guarantee any consistency
with other this_cpu_ops. If you want consistency (same per cpu area data)
then preemption needs to be disabled.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 14:51 ` Christoph Lameter
@ 2011-09-20 15:11 ` Steven Rostedt
2011-09-20 15:59 ` Christoph Lameter
2011-09-20 15:27 ` Thomas Gleixner
1 sibling, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 15:11 UTC (permalink / raw)
To: Christoph Lameter
Cc: Thomas Gleixner, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Tue, 2011-09-20 at 09:51 -0500, Christoph Lameter wrote:
> I see that __this_cpu_xx operations may not work as intended in
> preemptable contexts and there we could have more changes.
Then why do you use it in slub.c?
in mm/slab.c slab_alloc():
redo:
/*
* Must read kmem_cache cpu data via this cpu ptr. Preemption is
* enabled. We may switch back and forth between cpus while
* reading from one cpu area. That does not matter as long
* as we end up on the original cpu again when doing the cmpxchg.
*/
c = __this_cpu_ptr(s->cpu_slab);
The __this_cpu_*() is in preempt enabled location. In fact, the three
this_cpu_write()'s in acquire_slab() is done within a spinlock and thus
with preemption disabled.
>
> this_cpu ops are single instructions that do not guarantee any consistency
> with other this_cpu_ops. If you want consistency (same per cpu area data)
> then preemption needs to be disabled.
The point is, people get it wrong all the time. In fact, we should
really require that ALL USES of this_cpu_*() must be with preemption
disabled. Regardless. Because anytime you touch a per cpu variable,
there's usually a reason that it is on the cpu you are on. If you don't
have preemption disabled (and I don't count the silly:
#define _this_cpu_generic_read(pcp) \
({ typeof(pcp) ret__; \
preempt_disable(); \
ret__ = *this_cpu_ptr(&(pcp)); \
preempt_enable(); \
ret__; \
})
Which is totally useless), there's probably a bug somewhere in there.
Just like the code in memcg.
Monkeying around with per cpu data is tricky. If you start doing it in
preempt enabled code, you are most certainly about to get it wrong. Why
have this super optimization. A preempt_disable() is a single operation
that touches cache hot data.
I say nuke all users of preempt_enabled per_cpu touching, and be safe.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 15:11 ` Steven Rostedt
@ 2011-09-20 15:59 ` Christoph Lameter
2011-09-20 16:03 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 15:59 UTC (permalink / raw)
To: Steven Rostedt
Cc: Thomas Gleixner, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Tue, 20 Sep 2011, Steven Rostedt wrote:
> On Tue, 2011-09-20 at 09:51 -0500, Christoph Lameter wrote:
>
> > I see that __this_cpu_xx operations may not work as intended in
> > preemptable contexts and there we could have more changes.
>
> Then why do you use it in slub.c?
>
> in mm/slab.c slab_alloc():
>
> redo:
>
> /*
> * Must read kmem_cache cpu data via this cpu ptr. Preemption is
> * enabled. We may switch back and forth between cpus while
> * reading from one cpu area. That does not matter as long
> * as we end up on the original cpu again when doing the cmpxchg.
> */
> c = __this_cpu_ptr(s->cpu_slab);
>
> The __this_cpu_*() is in preempt enabled location. In fact, the three
> this_cpu_write()'s in acquire_slab() is done within a spinlock and thus
> with preemption disabled.
So the this_cpu_writes could become __this_cpu_writes
The __this_cpu_ptr operation in slab_alloc is special. We explicitly do
address calculation and do not care from which per cpu area we fetch
multiple per cpu data items because we can verify that we are on the same
per cpu area during the this_cpu_cmpxchg_double operation.
> The point is, people get it wrong all the time. In fact, we should
> really require that ALL USES of this_cpu_*() must be with preemption
> disabled. Regardless. Because anytime you touch a per cpu variable,
NO!! This defeats the whole purpose of this_cpu_ops and make the whole
scheme utterly useless.
> Monkeying around with per cpu data is tricky. If you start doing it in
> preempt enabled code, you are most certainly about to get it wrong. Why
> have this super optimization. A preempt_disable() is a single operation
> that touches cache hot data.
There are trivial cases like counter increments that are not a problem at
all. Most use cases are those. More complex ones can be developed to avoid
various overhead in performance critical sections of the kernel.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 15:59 ` Christoph Lameter
@ 2011-09-20 16:03 ` Steven Rostedt
2011-09-20 16:07 ` Christoph Lameter
0 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 16:03 UTC (permalink / raw)
To: Christoph Lameter
Cc: Thomas Gleixner, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Tue, 2011-09-20 at 10:59 -0500, Christoph Lameter wrote:
> On Tue, 20 Sep 2011, Steven Rostedt wrote:
>
> > On Tue, 2011-09-20 at 09:51 -0500, Christoph Lameter wrote:
> >
> > > I see that __this_cpu_xx operations may not work as intended in
> > > preemptable contexts and there we could have more changes.
> >
> > Then why do you use it in slub.c?
> >
> > in mm/slab.c slab_alloc():
> >
> > redo:
> >
> > /*
> > * Must read kmem_cache cpu data via this cpu ptr. Preemption is
> > * enabled. We may switch back and forth between cpus while
> > * reading from one cpu area. That does not matter as long
> > * as we end up on the original cpu again when doing the cmpxchg.
> > */
> > c = __this_cpu_ptr(s->cpu_slab);
> >
> > The __this_cpu_*() is in preempt enabled location. In fact, the three
> > this_cpu_write()'s in acquire_slab() is done within a spinlock and thus
> > with preemption disabled.
>
> So the this_cpu_writes could become __this_cpu_writes
>
> The __this_cpu_ptr operation in slab_alloc is special. We explicitly do
> address calculation and do not care from which per cpu area we fetch
> multiple per cpu data items because we can verify that we are on the same
> per cpu area during the this_cpu_cmpxchg_double operation.
ARGH! You now have special cases that break the sanity of it all? This
just proves that the design is wrong.
>
> > The point is, people get it wrong all the time. In fact, we should
> > really require that ALL USES of this_cpu_*() must be with preemption
> > disabled. Regardless. Because anytime you touch a per cpu variable,
>
> NO!! This defeats the whole purpose of this_cpu_ops and make the whole
> scheme utterly useless.
The thing is, the whole purpose was broken to begin with. Defeating a
broken design is a good thing!
>
> > Monkeying around with per cpu data is tricky. If you start doing it in
> > preempt enabled code, you are most certainly about to get it wrong. Why
> > have this super optimization. A preempt_disable() is a single operation
> > that touches cache hot data.
>
> There are trivial cases like counter increments that are not a problem at
> all. Most use cases are those. More complex ones can be developed to avoid
> various overhead in performance critical sections of the kernel.
>
And adding a preempt_disable; this_cpu_inc(); preempt_enable; is not a
bad thing either.
What benchmarks do you have that shows this helped in anything????
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 16:03 ` Steven Rostedt
@ 2011-09-20 16:07 ` Christoph Lameter
0 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 16:07 UTC (permalink / raw)
To: Steven Rostedt
Cc: Thomas Gleixner, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Tue, 20 Sep 2011, Steven Rostedt wrote:
> > NO!! This defeats the whole purpose of this_cpu_ops and make the whole
> > scheme utterly useless.
>
> The thing is, the whole purpose was broken to begin with. Defeating a
> broken design is a good thing!
So then we are not allowed to use segment prefixes in core code to avoid
preempt enable disable? And to avoid interrupt disabling enabling in
critical allocator sections?
> > There are trivial cases like counter increments that are not a problem at
> > all. Most use cases are those. More complex ones can be developed to avoid
> > various overhead in performance critical sections of the kernel.
> >
>
> And adding a preempt_disable; this_cpu_inc(); preempt_enable; is not a
> bad thing either.
It defeats the purpose of the whole thing.
> What benchmarks do you have that shows this helped in anything????
The various patchsets that went into the kernel had benchmarks results.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 14:51 ` Christoph Lameter
2011-09-20 15:11 ` Steven Rostedt
@ 2011-09-20 15:27 ` Thomas Gleixner
2011-09-20 16:02 ` Christoph Lameter
1 sibling, 1 reply; 73+ messages in thread
From: Thomas Gleixner @ 2011-09-20 15:27 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Tue, 20 Sep 2011, Christoph Lameter wrote:
> On Tue, 20 Sep 2011, Thomas Gleixner wrote:
> > This allows us to do proper analysis of this_cpu usage and makes the
> > code understandable.
>
> Not sure what you want to check in this_cpu_xx operations. Why could
> there be issues with this_cpu_xx operations?
I want to get rid of them alltogether. They are crap by design and
prone to be used wrong without a sensible way to detect that.
> I see that __this_cpu_xx operations may not work as intended in
> preemptable contexts and there we could have more changes.
>
> this_cpu ops are single instructions that do not guarantee any consistency
> with other this_cpu_ops. If you want consistency (same per cpu area data)
> then preemption needs to be disabled.
Right, and that's the main problem. We have no fricking way to debug
this, so they should have never been there in the first place. And you
simply CANNOT prevent people from getting this wrong w/o proper
debugging and annotation. Even YOU and Tejun made bogus conversion w/o
noticing, but you expect that others get it right?
So stop defending that trainwreck and help out with fixing the mess
you created in a proper and debugable way!
Thanks,
tglx
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 15:27 ` Thomas Gleixner
@ 2011-09-20 16:02 ` Christoph Lameter
2011-09-20 16:51 ` Thomas Gleixner
0 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 16:02 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Tue, 20 Sep 2011, Thomas Gleixner wrote:
> I want to get rid of them alltogether. They are crap by design and
> prone to be used wrong without a sensible way to detect that.
Counter increments have been done this way for a very long time. This is
a pretty well established way of doing things in the kernel that was
expanded and made better by avoid preempt disable/enable around these per
cpu pointer increments.
> Right, and that's the main problem. We have no fricking way to debug
> this, so they should have never been there in the first place. And you
> simply CANNOT prevent people from getting this wrong w/o proper
> debugging and annotation. Even YOU and Tejun made bogus conversion w/o
> noticing, but you expect that others get it right?
Bogus conversions?
> So stop defending that trainwreck and help out with fixing the mess
> you created in a proper and debugable way!
You want debugging to ensure that this_cpu ops always access the intended
per cpu area? But the point of these operations is to be used in kernel
locations where the use of any per cpu area is satisfactory.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 16:02 ` Christoph Lameter
@ 2011-09-20 16:51 ` Thomas Gleixner
2011-09-20 17:08 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Thomas Gleixner @ 2011-09-20 16:51 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo
On Tue, 20 Sep 2011, Christoph Lameter wrote:
> On Tue, 20 Sep 2011, Thomas Gleixner wrote:
> > I want to get rid of them alltogether. They are crap by design and
> > prone to be used wrong without a sensible way to detect that.
>
> Counter increments have been done this way for a very long time. This is
> a pretty well established way of doing things in the kernel that was
> expanded and made better by avoid preempt disable/enable around these per
> cpu pointer increments.
This still can be done, when the functions are actually having
sensible names and annotations.
> > Right, and that's the main problem. We have no fricking way to debug
> > this, so they should have never been there in the first place. And you
> > simply CANNOT prevent people from getting this wrong w/o proper
> > debugging and annotation. Even YOU and Tejun made bogus conversion w/o
> > noticing, but you expect that others get it right?
>
> Bogus conversions?
Is the list I posted not enough?
The usage of this_cpu_read() in functions which require to be called
with preemption or interrupts disabled is pointless and worse it
actually removed debug functionality in some places which used
smp_processor_id() before.
I do not care at all whether this_cpu_read() on x86 does not have the
preempt_enable/disable() pair, but I care a lot about the correctness
of the code and debugability.
> > So stop defending that trainwreck and help out with fixing the mess
> > you created in a proper and debugable way!
>
> You want debugging to ensure that this_cpu ops always access the intended
> per cpu area? But the point of these operations is to be used in kernel
> locations where the use of any per cpu area is satisfactory.
The point is that the whole design assumes that people get it correct
and you spent not a split second to provide any means of debugability
for this. Now you argue in circles about your performance numbers and
let others deal with the fallout and refuse to think about a sensible
solution to the problem.
It does not matter at all what you intended and what people should do,
but it does matter very much what unintentional and undebugable
wreckage can be caused by it.
We spent huge efforts to make locks debugable and we still have people
getting it wrong and you expect that something as complex as the per
cpu stuff is just correct by definition.
And the problem of this_cpu & co starts with the naming convention.
this_cpu_*() is patently wrong. It should be: random_cpu_*() or
any_cpu_*(). This way you could have avoided confusion in the first
place and made it entirely clear what the interfaces are about.
Thanks,
tglx
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read()
2011-09-20 16:51 ` Thomas Gleixner
@ 2011-09-20 17:08 ` Steven Rostedt
0 siblings, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 17:08 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Christoph Lameter, LKML, Ingo Molnar, Andrew Morton,
Peter Zijlstra, Christoph Lameter, Tejun Heo
On Tue, 2011-09-20 at 18:51 +0200, Thomas Gleixner wrote:
> this_cpu_*() is patently wrong. It should be: random_cpu_*() or
> any_cpu_*(). This way you could have avoided confusion in the first
> place and made it entirely clear what the interfaces are about.
Thinking about this, my vote is for "any_cpu_*()". I initially liked my
own snapshot_cpu_*(), and even random_cpu_*() is humorous. But truly,
the "any_cpu_*()" has the best meaning. As you really don't seem to care
about which CPU you access. Sure you can document something like:
/*
* Will modify the per cpu data on the current CPU, but there is no
* guarantee which CPU you may be on, as you can migrate just before
* or after calling this function. Only the scope of this function
* will be atomic to the CPU the task is currently on, but no guarantee
* before or after the function. If you care about that, disable
* preemption and use the this_cpu_*() variants.
*/
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events()
2011-09-19 21:20 [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Steven Rostedt
2011-09-19 21:20 ` [RFC][PATCH 1/5] x86: Remove const_udelay() caring about which cpu var it uses Steven Rostedt
2011-09-19 21:20 ` [RFC][PATCH 2/5] mm: Switch mod_state() to __this_cpu_read() Steven Rostedt
@ 2011-09-19 21:20 ` Steven Rostedt
2011-09-20 14:20 ` Johannes Weiner
2011-09-24 0:46 ` Steven Rostedt
2011-09-19 21:20 ` [RFC][PATCH 4/5] printk: Have wake_up_klogd() use __this_cpu_write() Steven Rostedt
` (3 subsequent siblings)
6 siblings, 2 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-19 21:20 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Christoph Lameter, Johannes Weiner, Greg Thelen,
KAMEZAWA Hiroyuki, Balbir Singh, Daisuke Nishimura
[-- Attachment #1: 0003-memcg-Disable-preemption-in-memcg_check_events.patch --]
[-- Type: text/plain, Size: 1431 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The code in memcg_check_events() calls this_cpu_read() on
different variables without disabling preemption, and can cause
the calculations to be done from two different CPU variables.
Disable preemption throughout the check to keep apples and oranges
from becoming a mixed drink.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
mm/memcontrol.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3508777..a164c93 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -718,6 +718,7 @@ static void __mem_cgroup_target_update(struct mem_cgroup *mem, int target)
*/
static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
{
+ preempt_disable();
/* threshold event is triggered in finer grain than soft limit */
if (unlikely(__memcg_event_check(mem, MEM_CGROUP_TARGET_THRESH))) {
mem_cgroup_threshold(mem);
@@ -737,6 +738,7 @@ static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
}
#endif
}
+ preempt_enable();
}
static struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
--
1.7.5.4
^ permalink raw reply related [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events()
2011-09-19 21:20 ` [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events() Steven Rostedt
@ 2011-09-20 14:20 ` Johannes Weiner
2011-09-20 14:24 ` Johannes Weiner
2011-09-24 0:46 ` Steven Rostedt
1 sibling, 1 reply; 73+ messages in thread
From: Johannes Weiner @ 2011-09-20 14:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter, Greg Thelen, KAMEZAWA Hiroyuki,
Balbir Singh, Daisuke Nishimura
On Mon, Sep 19, 2011 at 05:20:43PM -0400, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> The code in memcg_check_events() calls this_cpu_read() on
> different variables without disabling preemption, and can cause
> the calculations to be done from two different CPU variables.
>
> Disable preemption throughout the check to keep apples and oranges
> from becoming a mixed drink.
Makes sense, thanks!
Since the atomic versions are no longer required with preemption
disabled explicitely, could you also make the this_cpu ops in
__memcg_event_check and __mem_cgroup_target_update non-atomic in the
same go?
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events()
2011-09-20 14:20 ` Johannes Weiner
@ 2011-09-20 14:24 ` Johannes Weiner
2011-09-20 14:33 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Johannes Weiner @ 2011-09-20 14:24 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter, Greg Thelen, KAMEZAWA Hiroyuki,
Balbir Singh, Daisuke Nishimura
On Tue, Sep 20, 2011 at 04:20:31PM +0200, Johannes Weiner wrote:
> On Mon, Sep 19, 2011 at 05:20:43PM -0400, Steven Rostedt wrote:
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > The code in memcg_check_events() calls this_cpu_read() on
> > different variables without disabling preemption, and can cause
> > the calculations to be done from two different CPU variables.
> >
> > Disable preemption throughout the check to keep apples and oranges
> > from becoming a mixed drink.
>
> Makes sense, thanks!
>
> Since the atomic versions are no longer required with preemption
> disabled explicitely, could you also make the this_cpu ops in
> __memcg_event_check and __mem_cgroup_target_update non-atomic in the
> same go?
Sorry, shouldn't be on you, can you fold this in?
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b76011a..9d4ba65 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -683,8 +683,8 @@ static bool __memcg_event_check(struct mem_cgroup *memcg, int target)
{
unsigned long val, next;
- val = this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
- next = this_cpu_read(memcg->stat->targets[target]);
+ val = __this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
+ next = __this_cpu_read(memcg->stat->targets[target]);
/* from time_after() in jiffies.h */
return ((long)next - (long)val < 0);
}
@@ -693,7 +693,7 @@ static void __mem_cgroup_target_update(struct mem_cgroup *memcg, int target)
{
unsigned long val, next;
- val = this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
+ val = __this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
switch (target) {
case MEM_CGROUP_TARGET_THRESH:
@@ -709,7 +709,7 @@ static void __mem_cgroup_target_update(struct mem_cgroup *memcg, int target)
return;
}
- this_cpu_write(memcg->stat->targets[target], next);
+ __this_cpu_write(memcg->stat->targets[target], next);
}
/*
^ permalink raw reply related [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events()
2011-09-20 14:24 ` Johannes Weiner
@ 2011-09-20 14:33 ` Steven Rostedt
0 siblings, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 14:33 UTC (permalink / raw)
To: Johannes Weiner
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter, Greg Thelen, KAMEZAWA Hiroyuki,
Balbir Singh, Daisuke Nishimura
On Tue, 2011-09-20 at 16:24 +0200, Johannes Weiner wrote:
> On Tue, Sep 20, 2011 at 04:20:31PM +0200, Johannes Weiner wrote:
> > On Mon, Sep 19, 2011 at 05:20:43PM -0400, Steven Rostedt wrote:
> > > From: Steven Rostedt <srostedt@redhat.com>
> > >
> > > The code in memcg_check_events() calls this_cpu_read() on
> > > different variables without disabling preemption, and can cause
> > > the calculations to be done from two different CPU variables.
> > >
> > > Disable preemption throughout the check to keep apples and oranges
> > > from becoming a mixed drink.
> >
> > Makes sense, thanks!
> >
> > Since the atomic versions are no longer required with preemption
> > disabled explicitely, could you also make the this_cpu ops in
> > __memcg_event_check and __mem_cgroup_target_update non-atomic in the
> > same go?
Although this patch set is RFC, I probably should go ahead and push this
one forward, as it looks to be a real bug.
>
> Sorry, shouldn't be on you, can you fold this in?
Sure, thanks!
-- Steve
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b76011a..9d4ba65 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -683,8 +683,8 @@ static bool __memcg_event_check(struct mem_cgroup *memcg, int target)
> {
> unsigned long val, next;
>
> - val = this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
> - next = this_cpu_read(memcg->stat->targets[target]);
> + val = __this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
> + next = __this_cpu_read(memcg->stat->targets[target]);
> /* from time_after() in jiffies.h */
> return ((long)next - (long)val < 0);
> }
> @@ -693,7 +693,7 @@ static void __mem_cgroup_target_update(struct mem_cgroup *memcg, int target)
> {
> unsigned long val, next;
>
> - val = this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
> + val = __this_cpu_read(memcg->stat->events[MEM_CGROUP_EVENTS_COUNT]);
>
> switch (target) {
> case MEM_CGROUP_TARGET_THRESH:
> @@ -709,7 +709,7 @@ static void __mem_cgroup_target_update(struct mem_cgroup *memcg, int target)
> return;
> }
>
> - this_cpu_write(memcg->stat->targets[target], next);
> + __this_cpu_write(memcg->stat->targets[target], next);
> }
>
> /*
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events()
2011-09-19 21:20 ` [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events() Steven Rostedt
2011-09-20 14:20 ` Johannes Weiner
@ 2011-09-24 0:46 ` Steven Rostedt
1 sibling, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-24 0:46 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Christoph Lameter, Johannes Weiner, Greg Thelen,
KAMEZAWA Hiroyuki, Balbir Singh, Daisuke Nishimura
On Mon, 2011-09-19 at 17:20 -0400, Steven Rostedt wrote:
> plain text document attachment
> (0003-memcg-Disable-preemption-in-memcg_check_events.patch)
> From: Steven Rostedt <srostedt@redhat.com>
>
> The code in memcg_check_events() calls this_cpu_read() on
> different variables without disabling preemption, and can cause
> the calculations to be done from two different CPU variables.
>
> Disable preemption throughout the check to keep apples and oranges
> from becoming a mixed drink.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Greg Thelen <gthelen@google.com>
> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Oops, I just noticed that I cut and pasted from another commit and was
suppose to change these to Cc's as all the above (except Johannes) did
not give their ack or sob.
I'll merge in Johannes's changes and push out a patch for someone to
take.
Thanks,
-- Steve
> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
> mm/memcontrol.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3508777..a164c93 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -718,6 +718,7 @@ static void __mem_cgroup_target_update(struct mem_cgroup *mem, int target)
> */
> static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
> {
> + preempt_disable();
> /* threshold event is triggered in finer grain than soft limit */
> if (unlikely(__memcg_event_check(mem, MEM_CGROUP_TARGET_THRESH))) {
> mem_cgroup_threshold(mem);
> @@ -737,6 +738,7 @@ static void memcg_check_events(struct mem_cgroup *mem, struct page *page)
> }
> #endif
> }
> + preempt_enable();
> }
>
> static struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
^ permalink raw reply [flat|nested] 73+ messages in thread
* [RFC][PATCH 4/5] printk: Have wake_up_klogd() use __this_cpu_write()
2011-09-19 21:20 [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Steven Rostedt
` (2 preceding siblings ...)
2011-09-19 21:20 ` [RFC][PATCH 3/5] memcg: Disable preemption in memcg_check_events() Steven Rostedt
@ 2011-09-19 21:20 ` Steven Rostedt
2011-09-19 21:54 ` Christoph Lameter
2011-09-19 21:20 ` [RFC][PATCH 5/5] percpu: Add preempt checks back into this_cpu_read/write() Steven Rostedt
` (2 subsequent siblings)
6 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-19 21:20 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Christoph Lameter
[-- Attachment #1: 0004-printk-Have-wake_up_klogd-use-__this_cpu_write.patch --]
[-- Type: text/plain, Size: 822 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The wake up code that triggers klogd does not really matter which
CPU it enables the wake up on. Every CPU will be doing a printk_tick()
and check the current CPU. As long as one of the CPUs triggers the
wakeup we are fine. Use __this_cpu_write() instead of this_cpu_write()
to show that we do not care.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/printk.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/printk.c b/kernel/printk.c
index 28a40d8..e221fec 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -1225,7 +1225,7 @@ int printk_needs_cpu(int cpu)
void wake_up_klogd(void)
{
if (waitqueue_active(&log_wait))
- this_cpu_write(printk_pending, 1);
+ __this_cpu_write(printk_pending, 1);
}
/**
--
1.7.5.4
^ permalink raw reply related [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 4/5] printk: Have wake_up_klogd() use __this_cpu_write()
2011-09-19 21:20 ` [RFC][PATCH 4/5] printk: Have wake_up_klogd() use __this_cpu_write() Steven Rostedt
@ 2011-09-19 21:54 ` Christoph Lameter
2011-09-19 23:33 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-19 21:54 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 19 Sep 2011, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
>
> The wake up code that triggers klogd does not really matter which
> CPU it enables the wake up on. Every CPU will be doing a printk_tick()
> and check the current CPU. As long as one of the CPUs triggers the
> wakeup we are fine. Use __this_cpu_write() instead of this_cpu_write()
> to show that we do not care.
printk_needs_cpu() is always called from context where we have disabled
interrupts therefore this is safe to do.
Reviewed-by: Christoph Lameter <cl@gentwo.org>
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 4/5] printk: Have wake_up_klogd() use __this_cpu_write()
2011-09-19 21:54 ` Christoph Lameter
@ 2011-09-19 23:33 ` Steven Rostedt
2011-09-20 14:54 ` Christoph Lameter
0 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-19 23:33 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 2011-09-19 at 16:54 -0500, Christoph Lameter wrote:
> On Mon, 19 Sep 2011, Steven Rostedt wrote:
>
> > From: Steven Rostedt <srostedt@redhat.com>
> >
> > The wake up code that triggers klogd does not really matter which
> > CPU it enables the wake up on. Every CPU will be doing a printk_tick()
> > and check the current CPU. As long as one of the CPUs triggers the
> > wakeup we are fine. Use __this_cpu_write() instead of this_cpu_write()
> > to show that we do not care.
>
> printk_needs_cpu() is always called from context where we have disabled
> interrupts therefore this is safe to do.
hehe, the patch is deceiving ;)
@@ -1225,7 +1225,7 @@ int printk_needs_cpu(int cpu)
void wake_up_klogd(void)
{
if (waitqueue_active(&log_wait))
The printk_needs_cpu may be in the patch header there, but the real
function name happened to be in the call itself. "wake_up_klogd()".
Note, just because something is always in a location that preemption is
disabled, does not mean it should use the __this_cpu*() variants.
Because if things change, it may become a problem later on.
-- Steve
>
> Reviewed-by: Christoph Lameter <cl@gentwo.org>
^ permalink raw reply [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 4/5] printk: Have wake_up_klogd() use __this_cpu_write()
2011-09-19 23:33 ` Steven Rostedt
@ 2011-09-20 14:54 ` Christoph Lameter
2011-09-20 14:55 ` Peter Zijlstra
0 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 14:54 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 19 Sep 2011, Steven Rostedt wrote:
> Note, just because something is always in a location that preemption is
> disabled, does not mean it should use the __this_cpu*() variants.
Why not? If preemption is disabled then the process cannot be migrated to
another processor. And thus doing the address calculations and operations
on variables step by step is okay.
> Because if things change, it may become a problem later on.
What things may change? Someone calls the function with preemption
enabled?
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 4/5] printk: Have wake_up_klogd() use __this_cpu_write()
2011-09-20 14:54 ` Christoph Lameter
@ 2011-09-20 14:55 ` Peter Zijlstra
0 siblings, 0 replies; 73+ messages in thread
From: Peter Zijlstra @ 2011-09-20 14:55 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Christoph Lameter
On Tue, 2011-09-20 at 09:54 -0500, Christoph Lameter wrote:
> On Mon, 19 Sep 2011, Steven Rostedt wrote:
>
> > Note, just because something is always in a location that preemption is
> > disabled, does not mean it should use the __this_cpu*() variants.
>
> Why not? If preemption is disabled then the process cannot be migrated to
> another processor. And thus doing the address calculations and operations
> on variables step by step is okay.
>
> > Because if things change, it may become a problem later on.
>
> What things may change? Someone calls the function with preemption
> enabled?
Yes, also, for !x86 you get a redundant preempt_disable/enable pair.
^ permalink raw reply [flat|nested] 73+ messages in thread
* [RFC][PATCH 5/5] percpu: Add preempt checks back into this_cpu_read/write()
2011-09-19 21:20 [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Steven Rostedt
` (3 preceding siblings ...)
2011-09-19 21:20 ` [RFC][PATCH 4/5] printk: Have wake_up_klogd() use __this_cpu_write() Steven Rostedt
@ 2011-09-19 21:20 ` Steven Rostedt
2011-09-19 21:49 ` [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Christoph Lameter
2011-09-20 2:20 ` Andi Kleen
6 siblings, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-19 21:20 UTC (permalink / raw)
To: linux-kernel
Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
Christoph Lameter, Tejun Heo, Peter Zijlstra
[-- Attachment #1: 0005-percpu-Add-preempt-checks-back-into-this_cpu_read-wr.patch --]
[-- Type: text/plain, Size: 2106 bytes --]
From: Steven Rostedt <srostedt@redhat.com>
The conversion of per_cpu() and get_cpu_var() to this_cpu_read/write()
removed the debug check against using cpu variables in non atomic sections.
There are few cases where that is fine, but 99% of the time, if a per cpu
variable is going to be played with, it had better happen in an atomic
area, otherwise hard to find bugs may occur.
Right now only this_cpu_read/write() were updated. Maybe it would be a good
idea to handle the other this_cpu_*() functions as well.
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
include/linux/percpu.h | 17 +++++++++++++++--
1 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 9ca008f..a4048de 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -22,6 +22,13 @@
PERCPU_MODULE_RESERVE)
#endif
+#ifdef CONFIG_DEBUG_PREEMPT
+# define debug_check_cpu() smp_processor_id()
+#else
+# define debug_check_cpu()
+#endif
+
+
/*
* Must be an lvalue. Since @var must be a simple identifier,
* we force a syntax error here if it isn't.
@@ -342,7 +349,10 @@ do { \
# ifndef this_cpu_read_8
# define this_cpu_read_8(pcp) _this_cpu_generic_read(pcp)
# endif
-# define this_cpu_read(pcp) __pcpu_size_call_return(this_cpu_read_, (pcp))
+# define this_cpu_read(pcp) ({ \
+ debug_check_cpu(); \
+ __pcpu_size_call_return(this_cpu_read_, (pcp)); \
+ })
#endif
#define _this_cpu_generic_to_op(pcp, val, op) \
@@ -365,7 +375,10 @@ do { \
# ifndef this_cpu_write_8
# define this_cpu_write_8(pcp, val) _this_cpu_generic_to_op((pcp), (val), =)
# endif
-# define this_cpu_write(pcp, val) __pcpu_size_call(this_cpu_write_, (pcp), (val))
+# define this_cpu_write(pcp, val) ({ \
+ debug_check_cpu(); \
+ __pcpu_size_call(this_cpu_write_, (pcp), (val)); \
+ })
#endif
#ifndef this_cpu_add
--
1.7.5.4
^ permalink raw reply related [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-19 21:20 [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Steven Rostedt
` (4 preceding siblings ...)
2011-09-19 21:20 ` [RFC][PATCH 5/5] percpu: Add preempt checks back into this_cpu_read/write() Steven Rostedt
@ 2011-09-19 21:49 ` Christoph Lameter
2011-09-20 3:06 ` Steven Rostedt
2011-09-20 2:20 ` Andi Kleen
6 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-19 21:49 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra
On Mon, 19 Sep 2011, Steven Rostedt wrote:
> I just found out that the this_cpu_*() functions do not perform the
> test to see if the usage is in atomic or not. Thus, the blind
> conversion of the per_cpu(*, smp_processor_id()) and the get_cpu_var()
> code to this_cpu_*() introduce the regression to detect the hard
> to find case where a per cpu variable is used in preempt code that
> migrates and causes bugs.
this_cpu_* function can be used either way. There is no point in checking
for atomic or not since the this_cpu_* implementations are required to
take care of disabling preemption. Those operations are generally safe to
use regardless of the context.
It can be replaced by __this_cpu_* only when we know that the context
prevents races through disabling preemption.
^ permalink raw reply [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-19 21:49 ` [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Christoph Lameter
@ 2011-09-20 3:06 ` Steven Rostedt
2011-09-20 12:44 ` Valdis.Kletnieks
2011-09-20 15:46 ` Mathieu Desnoyers
0 siblings, 2 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 3:06 UTC (permalink / raw)
To: Christoph Lameter
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Mathieu Desnoyers
[ Added Mathieu as I've been discussing this with him on IRC ]
On Mon, 2011-09-19 at 16:49 -0500, Christoph Lameter wrote:
> On Mon, 19 Sep 2011, Steven Rostedt wrote:
>
> > I just found out that the this_cpu_*() functions do not perform the
> > test to see if the usage is in atomic or not. Thus, the blind
> > conversion of the per_cpu(*, smp_processor_id()) and the get_cpu_var()
> > code to this_cpu_*() introduce the regression to detect the hard
> > to find case where a per cpu variable is used in preempt code that
> > migrates and causes bugs.
>
> this_cpu_* function can be used either way. There is no point in checking
> for atomic or not since the this_cpu_* implementations are required to
> take care of disabling preemption. Those operations are generally safe to
> use regardless of the context.
>
> It can be replaced by __this_cpu_* only when we know that the context
> prevents races through disabling preemption.
Seems that this_cpu_*() and friends have been added as some new semantic
for slub and friends. The problem came when there was some stupid
campaign to convert all of the per_cpu(x, smp_processor_id()) and
__get_cpu_var() to this because it changed the semantic of those places.
If I knew of this change, I would have NACK'd most of them.
The smp_processor_id() and __get_cpu_var() checked to make sure that
preemption was disabled, and warned you if it was not. Because 99% of
the kernel that uses per_cpu variables EXPECTS TO BE ON THE SAME CPU
DURING THE USE OF THAT VARIABLE! The get_cpu_var() disabled preemption
for put_cpu_var() to enable them. But __get_cpu_var() checked to make
sure preemption is disabled.
Seems that this silly campaign ignored that fact. Now we have loads of
locations just waiting to become buggy if preemption is enabled. I think
I already found one area where this was the case in the memcg code. Two
per_cpu variables are read and compared, but those two variables may end
up being from two different CPUs. Do we care about that?
It is really confusing to know which version to use. I'm confused by the
this_cpu_*() compared with __this_cpu_*(). I'm guessing that most places
should use __this_cpu*(). But really this_cpu() should be the default,
and the places that can have it outside of preemption should have
another name. Maybe use the raw_this_cpu() or safe_this_cpu(), as there
is an irqsafe_this_cpu(). Maybe make a preemptsafe_cpu_*(). There should
only be a very few locations that are OK to have preemption enabled when
calling the this_cpu() code. Lets have those have the funny names and
not be the default "this_cpu_*()".
All this_cpu*() code, except the funny named ones, should make sure
preemption is disabled, otherwise give a nasty warning. As that is
usually a bug if you are using a per cpu variable and can migrate away.
The next reference to that value may be incorrect.
Note, just adding the checks to the __this_cpu_read/write() code,
produced a few bug reports:
max_sane_readahead()
alloc_zeroed_user_highpage()
tsk_fork_get_node()
all the above is from numa_node_id() using the __this_cpu_read() outside
of preempt disabled.
Note, most people would assume that using this_cpu_read/write() is the
correct thing to use when we expect that preemption is disabled. But it
is nice to have a check to know when it is not disabled and we are
incorrectly using these variables (just like smp_processor_id())
My suggestion is to nuke the __this_cpu*() functions and have them
called this_cpu*(). And change the locations that allow for preemption
enabled to be called preemptsafe_cpu_*() just like the irqsafe_cpu*()
are used.
Thoughts?
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 3:06 ` Steven Rostedt
@ 2011-09-20 12:44 ` Valdis.Kletnieks
2011-09-20 13:51 ` Thomas Gleixner
2011-09-20 14:57 ` Christoph Lameter
2011-09-20 15:46 ` Mathieu Desnoyers
1 sibling, 2 replies; 73+ messages in thread
From: Valdis.Kletnieks @ 2011-09-20 12:44 UTC (permalink / raw)
To: Steven Rostedt
Cc: Christoph Lameter, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
[-- Attachment #1: Type: text/plain, Size: 1379 bytes --]
On Mon, 19 Sep 2011 23:06:17 EDT, Steven Rostedt said:
> It is really confusing to know which version to use. I'm confused by the
> this_cpu_*() compared with __this_cpu_*(). I'm guessing that most places
> should use __this_cpu*(). But really this_cpu() should be the default,
> and the places that can have it outside of preemption should have
> another name. Maybe use the raw_this_cpu() or safe_this_cpu(), as there
> is an irqsafe_this_cpu(). Maybe make a preemptsafe_cpu_*(). There should
> only be a very few locations that are OK to have preemption enabled when
> calling the this_cpu() code. Lets have those have the funny names and
> not be the default "this_cpu_*()".
What's the latency hit on those very few locations if we simply put our
collective foot down and not support a preemptable version of this_cpu_*()?
"Yes, you *could* preempt here, but for our collective sanity that's not
supported"...
> All this_cpu*() code, except the funny named ones, should make sure
> preemption is disabled, otherwise give a nasty warning. As that is
> usually a bug if you are using a per cpu variable and can migrate away.
> The next reference to that value may be incorrect.
You get a much prettier diffstat if you just nuke the funny named ones. ;)
But of course it's early morning and I'm still caffeine-deficient and probably
overlooking some crucial use case. ;)
[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 12:44 ` Valdis.Kletnieks
@ 2011-09-20 13:51 ` Thomas Gleixner
2011-09-20 14:58 ` Christoph Lameter
2011-09-20 14:57 ` Christoph Lameter
1 sibling, 1 reply; 73+ messages in thread
From: Thomas Gleixner @ 2011-09-20 13:51 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: Steven Rostedt, Christoph Lameter, linux-kernel, Ingo Molnar,
Andrew Morton, Peter Zijlstra, Mathieu Desnoyers
On Tue, 20 Sep 2011, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 19 Sep 2011 23:06:17 EDT, Steven Rostedt said:
>
> > It is really confusing to know which version to use. I'm confused by the
> > this_cpu_*() compared with __this_cpu_*(). I'm guessing that most places
> > should use __this_cpu*(). But really this_cpu() should be the default,
> > and the places that can have it outside of preemption should have
> > another name. Maybe use the raw_this_cpu() or safe_this_cpu(), as there
> > is an irqsafe_this_cpu(). Maybe make a preemptsafe_cpu_*(). There should
> > only be a very few locations that are OK to have preemption enabled when
> > calling the this_cpu() code. Lets have those have the funny names and
> > not be the default "this_cpu_*()".
>
> What's the latency hit on those very few locations if we simply put our
> collective foot down and not support a preemptable version of this_cpu_*()?
> "Yes, you *could* preempt here, but for our collective sanity that's not
> supported"...
Full ack.
> > All this_cpu*() code, except the funny named ones, should make sure
> > preemption is disabled, otherwise give a nasty warning. As that is
> > usually a bug if you are using a per cpu variable and can migrate away.
> > The next reference to that value may be incorrect.
>
> You get a much prettier diffstat if you just nuke the funny named ones. ;)
Along with the maze of completely unused incarnations.
> But of course it's early morning and I'm still caffeine-deficient and probably
> overlooking some crucial use case. ;)
I doubt that.
Thanks,
tglx
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 13:51 ` Thomas Gleixner
@ 2011-09-20 14:58 ` Christoph Lameter
2011-09-20 15:17 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 14:58 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Valdis.Kletnieks, Steven Rostedt, linux-kernel, Ingo Molnar,
Andrew Morton, Peter Zijlstra, Mathieu Desnoyers
On Tue, 20 Sep 2011, Thomas Gleixner wrote:
> > What's the latency hit on those very few locations if we simply put our
> > collective foot down and not support a preemptable version of this_cpu_*()?
> > "Yes, you *could* preempt here, but for our collective sanity that's not
> > supported"...
>
> Full ack.
Latency hit could be very significant in various critical kernel paths.
Especially network subsystem, vm event counters etc.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 14:58 ` Christoph Lameter
@ 2011-09-20 15:17 ` Steven Rostedt
0 siblings, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 15:17 UTC (permalink / raw)
To: Christoph Lameter
Cc: Thomas Gleixner, Valdis.Kletnieks, linux-kernel, Ingo Molnar,
Andrew Morton, Peter Zijlstra, Mathieu Desnoyers
On Tue, 2011-09-20 at 09:58 -0500, Christoph Lameter wrote:
> On Tue, 20 Sep 2011, Thomas Gleixner wrote:
>
> > > What's the latency hit on those very few locations if we simply put our
> > > collective foot down and not support a preemptable version of this_cpu_*()?
> > > "Yes, you *could* preempt here, but for our collective sanity that's not
> > > supported"...
> >
> > Full ack.
>
> Latency hit could be very significant in various critical kernel paths.
> Especially network subsystem, vm event counters etc.
Latency hit is better than incorrect behavior. I tell people working on
-rt all the time. A bug/kernel crash is much worse than a hit in
latency, as a bug/kernel crash causes a much bigger latency hit than
anything else.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 12:44 ` Valdis.Kletnieks
2011-09-20 13:51 ` Thomas Gleixner
@ 2011-09-20 14:57 ` Christoph Lameter
2011-09-20 15:19 ` Steven Rostedt
1 sibling, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 14:57 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
On Tue, 20 Sep 2011, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 19 Sep 2011 23:06:17 EDT, Steven Rostedt said:
>
> > It is really confusing to know which version to use. I'm confused by the
> > this_cpu_*() compared with __this_cpu_*(). I'm guessing that most places
> > should use __this_cpu*(). But really this_cpu() should be the default,
> > and the places that can have it outside of preemption should have
> > another name. Maybe use the raw_this_cpu() or safe_this_cpu(), as there
> > is an irqsafe_this_cpu(). Maybe make a preemptsafe_cpu_*(). There should
> > only be a very few locations that are OK to have preemption enabled when
> > calling the this_cpu() code. Lets have those have the funny names and
> > not be the default "this_cpu_*()".
this_cpu_xx functions are made for those locations that have
preemption enabled. If you can use those function (classic case is a
per cpu counter increment in the network subsystem) then you can avoid
preempt disable/enable or get_cpu/put_cpu.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 14:57 ` Christoph Lameter
@ 2011-09-20 15:19 ` Steven Rostedt
2011-09-20 16:08 ` Christoph Lameter
2011-09-20 22:17 ` Valdis.Kletnieks
0 siblings, 2 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 15:19 UTC (permalink / raw)
To: Christoph Lameter
Cc: Valdis.Kletnieks, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
On Tue, 2011-09-20 at 09:57 -0500, Christoph Lameter wrote:
> this_cpu_xx functions are made for those locations that have
> preemption enabled. If you can use those function (classic case is a
> per cpu counter increment in the network subsystem) then you can avoid
> preempt disable/enable or get_cpu/put_cpu.
If the variables are used for a very short time, then the latencies
introduced by a simple:
var = get_cpu_var(my_var);
if (var)
do_something_quick();
put_cpu_var(my_var);
Otherwise if that do_something_quick(); migrates, it may be doing
something it shouldn't be doing!
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 15:19 ` Steven Rostedt
@ 2011-09-20 16:08 ` Christoph Lameter
2011-09-20 16:31 ` Steven Rostedt
2011-09-20 22:32 ` Valdis.Kletnieks
2011-09-20 22:17 ` Valdis.Kletnieks
1 sibling, 2 replies; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 16:08 UTC (permalink / raw)
To: Steven Rostedt
Cc: Valdis.Kletnieks, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
On Tue, 20 Sep 2011, Steven Rostedt wrote:
> On Tue, 2011-09-20 at 09:57 -0500, Christoph Lameter wrote:
>
> > this_cpu_xx functions are made for those locations that have
> > preemption enabled. If you can use those function (classic case is a
> > per cpu counter increment in the network subsystem) then you can avoid
> > preempt disable/enable or get_cpu/put_cpu.
>
> If the variables are used for a very short time, then the latencies
> introduced by a simple:
>
> var = get_cpu_var(my_var);
> if (var)
> do_something_quick();
> put_cpu_var(my_var);
>
> Otherwise if that do_something_quick(); migrates, it may be doing
> something it shouldn't be doing!
That is obviously a use case in which this_cpu_xx ops could not be used
since we must stay on the same cpu. I think there is still a lot of
confusion on your part. The example here shows it.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 16:08 ` Christoph Lameter
@ 2011-09-20 16:31 ` Steven Rostedt
2011-09-20 16:56 ` Steven Rostedt
2011-09-20 22:32 ` Valdis.Kletnieks
1 sibling, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 16:31 UTC (permalink / raw)
To: Christoph Lameter
Cc: Valdis.Kletnieks, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
On Tue, 2011-09-20 at 11:08 -0500, Christoph Lameter wrote:
> On Tue, 20 Sep 2011, Steven Rostedt wrote:
>
> > On Tue, 2011-09-20 at 09:57 -0500, Christoph Lameter wrote:
> >
> > > this_cpu_xx functions are made for those locations that have
> > > preemption enabled. If you can use those function (classic case is a
> > > per cpu counter increment in the network subsystem) then you can avoid
> > > preempt disable/enable or get_cpu/put_cpu.
> >
> > If the variables are used for a very short time, then the latencies
> > introduced by a simple:
> >
> > var = get_cpu_var(my_var);
> > if (var)
> > do_something_quick();
> > put_cpu_var(my_var);
> >
> > Otherwise if that do_something_quick(); migrates, it may be doing
> > something it shouldn't be doing!
>
> That is obviously a use case in which this_cpu_xx ops could not be used
> since we must stay on the same cpu. I think there is still a lot of
> confusion on your part. The example here shows it.
No, the problem is that you do not understand what I'm saying.
You are saying that you have other tricks to use per cpu data even
though that task that uses it can migrate to other cpus.
But the issue I have with this, is that this_cpu() is all about tricks.
The normal use case of per_cpu data is to disable preemption, use the
local variable and then enable preemption. That's the normal case and
should be what this_cpu_*() should be the default for.
But instead, you say this_cpu() is for data that doesn't care what CPU
it is at the time. It's really just as snapshot of the data. Which to me
seems to be a special use case that should not be the default. It is
confusing and hard to understand. Basically, you want __this_cpu_*() as
the normal case, which is ass backwards.
If your page allocator has special per_cpu tricks, then change the API
of it to something completely different to what the rest of the kernel
uses. Basic English shows to me that when I use "this_cpu_*()" I'm
getting a variable that is for the current CPU that the task is on, and
I should stay on that CPU. If you want something that is tricky, add
that to the name. Don't make everyone use this confusing "__this_cpu*()"
code which seems to be inconsistent in your own code too.
Have the "snapshot" version. This documents very nicely that the
variable that you read or modify is simply a one shot deal and not to
use it later. Don't use a common term like "this_cpu" and expect
everyone to understand that it should only be used for per cpu tricks.
I personally believe that we shouldn't even have a modification version
of the snapshot code. But perhaps it is fine for inc and dec and
cmpxchg. Put "snapshot" in the name to document that this is a hack to
make things work, and let all other users in the kernel use a name that
is not confusing.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 16:31 ` Steven Rostedt
@ 2011-09-20 16:56 ` Steven Rostedt
2011-09-20 17:09 ` Peter Zijlstra
0 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 16:56 UTC (permalink / raw)
To: Christoph Lameter
Cc: Valdis.Kletnieks, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
On Tue, 2011-09-20 at 12:31 -0400, Steven Rostedt wrote:
> I personally believe that we shouldn't even have a modification version
> of the snapshot code. But perhaps it is fine for inc and dec and
> cmpxchg. Put "snapshot" in the name to document that this is a hack to
> make things work, and let all other users in the kernel use a name that
> is not confusing.
Looking at some of the uses that this_cpu() is used outside of
preemption seems to be just statistic counters (or buggy). Talking with
Peter and Thomas on IRC, we come to realize that the entire problem with
your API is the "this_cpu". Because it has nothing to do with the
current CPU.
Here are some names that we should change it to:
snapshot_cpu_*() // my idea
random_cpu_*() // Thomas's idea
any_cpu_*() // Thomas's idea
Basically, the English term of "this_cpu" is patently wrong. It has
nothing to do with this cpu as you could be on any cpu at the time.
Lets rename the this_cpu() to something not so confusing and use them in
really the only locations that need them, and change all the
__this_cpu() users into this_cpu() which is exactly what they mean, and
also include the debug checking that we lost with the initial
conversions.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 16:56 ` Steven Rostedt
@ 2011-09-20 17:09 ` Peter Zijlstra
2011-09-20 17:15 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Peter Zijlstra @ 2011-09-20 17:09 UTC (permalink / raw)
To: Steven Rostedt
Cc: Christoph Lameter, Valdis.Kletnieks, linux-kernel, Ingo Molnar,
Andrew Morton, Thomas Gleixner, Mathieu Desnoyers
On Tue, 2011-09-20 at 12:56 -0400, Steven Rostedt wrote:
> random_cpu_*() // Thomas's idea
I like this one best..
But you forgot do deal with the irqsafe_cpu() crap, that's the same
brainfart as this_cpu() but more expensive because it frobs IRQ state.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 17:09 ` Peter Zijlstra
@ 2011-09-20 17:15 ` Steven Rostedt
2011-09-20 17:25 ` Mathieu Desnoyers
0 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 17:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Christoph Lameter, Valdis.Kletnieks, linux-kernel, Ingo Molnar,
Andrew Morton, Thomas Gleixner, Mathieu Desnoyers
On Tue, 2011-09-20 at 19:09 +0200, Peter Zijlstra wrote:
> On Tue, 2011-09-20 at 12:56 -0400, Steven Rostedt wrote:
> > random_cpu_*() // Thomas's idea
>
> I like this one best..
I like it too, but not really the most appropriate.
>
> But you forgot do deal with the irqsafe_cpu() crap, that's the same
> brainfart as this_cpu() but more expensive because it frobs IRQ state.
But irqsafe_cpu_*() doesn't really have any real meaning to me. That is
something when I see it, I go and read the comments about it. It doesn't
contain "this_cpu" which is something that seems to explain what it is,
even though the obvious is not what it is.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 17:15 ` Steven Rostedt
@ 2011-09-20 17:25 ` Mathieu Desnoyers
2011-09-20 18:03 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Mathieu Desnoyers @ 2011-09-20 17:25 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, Christoph Lameter, Valdis.Kletnieks, linux-kernel,
Ingo Molnar, Andrew Morton, Thomas Gleixner
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Tue, 2011-09-20 at 19:09 +0200, Peter Zijlstra wrote:
> > On Tue, 2011-09-20 at 12:56 -0400, Steven Rostedt wrote:
> > > random_cpu_*() // Thomas's idea
> >
> > I like this one best..
>
> I like it too, but not really the most appropriate.
>
> >
> > But you forgot do deal with the irqsafe_cpu() crap, that's the same
> > brainfart as this_cpu() but more expensive because it frobs IRQ state.
>
> But irqsafe_cpu_*() doesn't really have any real meaning to me. That is
> something when I see it, I go and read the comments about it. It doesn't
> contain "this_cpu" which is something that seems to explain what it is,
> even though the obvious is not what it is.
Throwing ideas from the IRC discussion into the mix (Paul McKenney and I
came up with it at the same time):
preempt_protected_percpu_*()
irq_protected_percpu_*()
Seems to be quite self-explanatory.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 17:25 ` Mathieu Desnoyers
@ 2011-09-20 18:03 ` Steven Rostedt
2011-09-20 18:12 ` Mathieu Desnoyers
0 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 18:03 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Christoph Lameter, Valdis.Kletnieks, linux-kernel,
Ingo Molnar, Andrew Morton, Thomas Gleixner
On Tue, 2011-09-20 at 13:25 -0400, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > On Tue, 2011-09-20 at 19:09 +0200, Peter Zijlstra wrote:
> > > On Tue, 2011-09-20 at 12:56 -0400, Steven Rostedt wrote:
> > > > random_cpu_*() // Thomas's idea
> > >
> > > I like this one best..
> >
> > I like it too, but not really the most appropriate.
> >
> > >
> > > But you forgot do deal with the irqsafe_cpu() crap, that's the same
> > > brainfart as this_cpu() but more expensive because it frobs IRQ state.
> >
> > But irqsafe_cpu_*() doesn't really have any real meaning to me. That is
> > something when I see it, I go and read the comments about it. It doesn't
> > contain "this_cpu" which is something that seems to explain what it is,
> > even though the obvious is not what it is.
>
> Throwing ideas from the IRC discussion into the mix (Paul McKenney and I
> came up with it at the same time):
>
> preempt_protected_percpu_*()
> irq_protected_percpu_*()
>
> Seems to be quite self-explanatory.
>
For use where the per_cpu data is protected with preemption disabled?
But isn't that the default case? Why make it hard to type for when you
should use it in the normal case.
It should be hard to type when it is a hack. As I recommended on IRC, we
probably should have it as:
use_this_if_you_really_do_not_care_what_cpu_you_are_on_but_are_anal_about_performance_cpu_*()
1) it is very self descriptive.
2) it would limit the usage as people wont like to have it in their
code ;)
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 18:03 ` Steven Rostedt
@ 2011-09-20 18:12 ` Mathieu Desnoyers
2011-09-20 18:27 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Mathieu Desnoyers @ 2011-09-20 18:12 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, Christoph Lameter, Valdis.Kletnieks, linux-kernel,
Ingo Molnar, Andrew Morton, Thomas Gleixner
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Tue, 2011-09-20 at 13:25 -0400, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@goodmis.org) wrote:
> > > On Tue, 2011-09-20 at 19:09 +0200, Peter Zijlstra wrote:
> > > > On Tue, 2011-09-20 at 12:56 -0400, Steven Rostedt wrote:
> > > > > random_cpu_*() // Thomas's idea
> > > >
> > > > I like this one best..
> > >
> > > I like it too, but not really the most appropriate.
> > >
> > > >
> > > > But you forgot do deal with the irqsafe_cpu() crap, that's the same
> > > > brainfart as this_cpu() but more expensive because it frobs IRQ state.
> > >
> > > But irqsafe_cpu_*() doesn't really have any real meaning to me. That is
> > > something when I see it, I go and read the comments about it. It doesn't
> > > contain "this_cpu" which is something that seems to explain what it is,
> > > even though the obvious is not what it is.
> >
> > Throwing ideas from the IRC discussion into the mix (Paul McKenney and I
> > came up with it at the same time):
> >
> > preempt_protected_percpu_*()
> > irq_protected_percpu_*()
> >
> > Seems to be quite self-explanatory.
> >
>
> For use where the per_cpu data is protected with preemption disabled?
> But isn't that the default case? Why make it hard to type for when you
> should use it in the normal case.
>
> It should be hard to type when it is a hack. As I recommended on IRC, we
> probably should have it as:
>
> use_this_if_you_really_do_not_care_what_cpu_you_are_on_but_are_anal_about_performance_cpu_*()
>
> 1) it is very self descriptive.
> 2) it would limit the usage as people wont like to have it in their
> code ;)
Not quite. What I was proposing more precisely:
- this_cpu_*() for the case where the caller needs to disable
preemption. This is the default case. This is exactly what you
proposed, with WARN_ON debug checks. This could even be "percpu_*()"
now that I think of it. There is no real point in the "this_cpu"
prefix.
- preempt_protected_percpu_*() and irq_protected_percpu_*() for
statistics/slub use. Those primitives disable preemption or irq
internally on non-x86 architectures. The caller of these primitives
is not required to disable preemption nor irqs.
Thoughts ?
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 18:12 ` Mathieu Desnoyers
@ 2011-09-20 18:27 ` Steven Rostedt
2011-09-20 18:34 ` Mathieu Desnoyers
0 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 18:27 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Peter Zijlstra, Christoph Lameter, Valdis.Kletnieks, linux-kernel,
Ingo Molnar, Andrew Morton, Thomas Gleixner
On Tue, 2011-09-20 at 14:12 -0400, Mathieu Desnoyers wrote:
> Not quite. What I was proposing more precisely:
>
> - this_cpu_*() for the case where the caller needs to disable
> preemption. This is the default case. This is exactly what you
> proposed, with WARN_ON debug checks. This could even be "percpu_*()"
> now that I think of it. There is no real point in the "this_cpu"
> prefix.
>
> - preempt_protected_percpu_*() and irq_protected_percpu_*() for
> statistics/slub use. Those primitives disable preemption or irq
> internally on non-x86 architectures. The caller of these primitives
> is not required to disable preemption nor irqs.
This is totally confusing. It suggests to me that the percpu requires
preemption protected. You are coupling the implementation of the
function too much with the name. The name should describe its use. What
does "preempt_protected" mean? To me, it sounds like I should use this
in preempt protected mode. Still way too confusing.
any_cpu_*() is still much more understanding. It means that we are
manipulating a CPU variable, and we do not care which one.
Looking at the real use cases of this_cpu(), that seems to be exactly
the use case for it. That is, we modify the cpu variable, maybe we get
migrated, but in the end, we just read all the cpu variables and report
the net sum. Thus the design POV is that we do not care what CPU
variable we read/write. From an implementation point of view, it just
happens to be an optimization that we try to read/write to the current
cpu pointer. But in reality it doesn't matter what CPU variable we
touch.
Do not confuse implementation and optimizations with design. The big
picture design is that we do not care what CPU variable is touched. The
name should reflect that.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 18:27 ` Steven Rostedt
@ 2011-09-20 18:34 ` Mathieu Desnoyers
0 siblings, 0 replies; 73+ messages in thread
From: Mathieu Desnoyers @ 2011-09-20 18:34 UTC (permalink / raw)
To: Steven Rostedt
Cc: Peter Zijlstra, Christoph Lameter, Valdis.Kletnieks, linux-kernel,
Ingo Molnar, Andrew Morton, Thomas Gleixner
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Tue, 2011-09-20 at 14:12 -0400, Mathieu Desnoyers wrote:
> > Not quite. What I was proposing more precisely:
> >
> > - this_cpu_*() for the case where the caller needs to disable
> > preemption. This is the default case. This is exactly what you
> > proposed, with WARN_ON debug checks. This could even be "percpu_*()"
> > now that I think of it. There is no real point in the "this_cpu"
> > prefix.
> >
> > - preempt_protected_percpu_*() and irq_protected_percpu_*() for
> > statistics/slub use. Those primitives disable preemption or irq
> > internally on non-x86 architectures. The caller of these primitives
> > is not required to disable preemption nor irqs.
>
> This is totally confusing. It suggests to me that the percpu requires
> preemption protected. You are coupling the implementation of the
> function too much with the name. The name should describe its use. What
> does "preempt_protected" mean? To me, it sounds like I should use this
> in preempt protected mode. Still way too confusing.
>
> any_cpu_*() is still much more understanding. It means that we are
> manipulating a CPU variable, and we do not care which one.
>
> Looking at the real use cases of this_cpu(), that seems to be exactly
> the use case for it. That is, we modify the cpu variable, maybe we get
> migrated, but in the end, we just read all the cpu variables and report
> the net sum. Thus the design POV is that we do not care what CPU
> variable we read/write. From an implementation point of view, it just
> happens to be an optimization that we try to read/write to the current
> cpu pointer. But in reality it doesn't matter what CPU variable we
> touch.
>
> Do not confuse implementation and optimizations with design. The big
> picture design is that we do not care what CPU variable is touched. The
> name should reflect that.
Yep, understood. We might want to consider percpu_*() for the case where
the caller must disable preemption, and any_percpu_*() for the case
where we don't care on which cpu we actually are. These are all touching
per-cpu variables after all. But still, it does not take into account
the "irqsafe" vs "preemptsafe" cases. So maybe irqsafe_any_percpu_*()
and preemptsafe_any_percpu_*() would do it ?
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 16:08 ` Christoph Lameter
2011-09-20 16:31 ` Steven Rostedt
@ 2011-09-20 22:32 ` Valdis.Kletnieks
1 sibling, 0 replies; 73+ messages in thread
From: Valdis.Kletnieks @ 2011-09-20 22:32 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
[-- Attachment #1: Type: text/plain, Size: 285 bytes --]
On Tue, 20 Sep 2011 11:08:56 CDT, Christoph Lameter said:
> That is obviously a use case in which this_cpu_xx ops could not be used
> since we must stay on the same cpu.
Wow. I don't think I've ever seen a single sentence scream quite so much
"totally broken API" in as few words.
[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 15:19 ` Steven Rostedt
2011-09-20 16:08 ` Christoph Lameter
@ 2011-09-20 22:17 ` Valdis.Kletnieks
2011-09-21 1:33 ` Steven Rostedt
1 sibling, 1 reply; 73+ messages in thread
From: Valdis.Kletnieks @ 2011-09-20 22:17 UTC (permalink / raw)
To: Steven Rostedt
Cc: Christoph Lameter, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
[-- Attachment #1: Type: text/plain, Size: 787 bytes --]
On Tue, 20 Sep 2011 11:19:47 EDT, Steven Rostedt said:
> On Tue, 2011-09-20 at 09:57 -0500, Christoph Lameter wrote:
>
> > this_cpu_xx functions are made for those locations that have
> > preemption enabled. If you can use those function (classic case is a
> > per cpu counter increment in the network subsystem) then you can avoid
> > preempt disable/enable or get_cpu/put_cpu.
>
> If the variables are used for a very short time, then the latencies
> introduced by a simple:
>
> var = get_cpu_var(my_var);
> if (var)
> do_something_quick();
> put_cpu_var(my_var);
>
> Otherwise if that do_something_quick(); migrates, it may be doing
> something it shouldn't be doing!
This has the added advantage of making the calling function take the blame
in latency traces, doesn't it?
[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 22:17 ` Valdis.Kletnieks
@ 2011-09-21 1:33 ` Steven Rostedt
0 siblings, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-21 1:33 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: Christoph Lameter, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra, Mathieu Desnoyers
On Tue, 2011-09-20 at 18:17 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Tue, 20 Sep 2011 11:19:47 EDT, Steven Rostedt said:
> > On Tue, 2011-09-20 at 09:57 -0500, Christoph Lameter wrote:
> >
> > > this_cpu_xx functions are made for those locations that have
> > > preemption enabled. If you can use those function (classic case is a
> > > per cpu counter increment in the network subsystem) then you can avoid
> > > preempt disable/enable or get_cpu/put_cpu.
> >
> > If the variables are used for a very short time, then the latencies
> > introduced by a simple:
> >
> > var = get_cpu_var(my_var);
> > if (var)
> > do_something_quick();
> > put_cpu_var(my_var);
> >
> > Otherwise if that do_something_quick(); migrates, it may be doing
> > something it shouldn't be doing!
>
> This has the added advantage of making the calling function take the blame
> in latency traces, doesn't it?
Yes, the preempt off latency tracer would detect the above, if it took
too long.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 3:06 ` Steven Rostedt
2011-09-20 12:44 ` Valdis.Kletnieks
@ 2011-09-20 15:46 ` Mathieu Desnoyers
2011-09-20 16:00 ` Steven Rostedt
1 sibling, 1 reply; 73+ messages in thread
From: Mathieu Desnoyers @ 2011-09-20 15:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: Christoph Lameter, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
* Steven Rostedt (rostedt@goodmis.org) wrote:
> [ Added Mathieu as I've been discussing this with him on IRC ]
[...]
> My suggestion is to nuke the __this_cpu*() functions and have them
> called this_cpu*(). And change the locations that allow for preemption
> enabled to be called preemptsafe_cpu_*() just like the irqsafe_cpu*()
> are used.
>
> Thoughts?
I fully agreed with this proposal.
this_cpu_*() should be a sane default, which is cases that require
preemption to be disabled. Warning about use without preemption disabled
is important, because most users will _assume_ that they stay on the
same CPU across multiple calls.
preemptsafe_cpu_*() is an optimisation made for those who know that
they don't care about having consistent view of the variables across
multiple operations, e.g. statistics, or deal with this explicitly, like
SLUB. Typical use require either add_return or cmpxchg for validation.
In the past, we used to have something like raw_smp_processor_id()
though, for sites where racy use does not matter. So probably having a
raw_this_cpu_*() that behaves like "this_cpu_*()" proposed here, but
does not check for preemption disabled would also be useful.
Best regards,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 15:46 ` Mathieu Desnoyers
@ 2011-09-20 16:00 ` Steven Rostedt
2011-09-20 16:10 ` Christoph Lameter
0 siblings, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 16:00 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Christoph Lameter, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
On Tue, 2011-09-20 at 11:46 -0400, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@goodmis.org) wrote:
> > [ Added Mathieu as I've been discussing this with him on IRC ]
> [...]
> > My suggestion is to nuke the __this_cpu*() functions and have them
> > called this_cpu*(). And change the locations that allow for preemption
> > enabled to be called preemptsafe_cpu_*() just like the irqsafe_cpu*()
> > are used.
> >
> > Thoughts?
>
> I fully agreed with this proposal.
>
> this_cpu_*() should be a sane default, which is cases that require
> preemption to be disabled. Warning about use without preemption disabled
> is important, because most users will _assume_ that they stay on the
> same CPU across multiple calls.
>
> preemptsafe_cpu_*() is an optimisation made for those who know that
> they don't care about having consistent view of the variables across
> multiple operations, e.g. statistics, or deal with this explicitly, like
> SLUB. Typical use require either add_return or cmpxchg for validation.
The funny thing is, the few uses where preemption is enabled in slub,
happens to use the __this_cpu_() version. Which shows me that this whole
concept is completely flawed.
>
> In the past, we used to have something like raw_smp_processor_id()
> though, for sites where racy use does not matter. So probably having a
> raw_this_cpu_*() that behaves like "this_cpu_*()" proposed here, but
> does not check for preemption disabled would also be useful.
I'm going with all users of this_cpu_*() must be used in a preempt
disabled section. Byte the bullet. I doubt any sane benchmark shows any
improvement by removing preempt_disable() from small code paths. If a
longer code path is needed, then the code was probably buggy to begin
with.
For those cases that really do not care about preemption enabled, maybe
we could add a single this_cpu() function that allows that.
this_cpu_snapshot();
This would be the only function that can be used with preemption
enabled. It is read only (no modification of the variable). It is used
in cases that you really don't care what CPU you are on. For instance
the const_udelay() could use this.
The term "snapshot" should document very nicely that you do not care
that you migrated in the next moment. As a snapshot is just a quick
instance of time.
I really mean all other users of this_cpu_*(), including the cmpxchg and
friends, still need to have preemption disabled.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 16:00 ` Steven Rostedt
@ 2011-09-20 16:10 ` Christoph Lameter
2011-09-20 16:50 ` Peter Zijlstra
2011-09-20 18:54 ` Steven Rostedt
0 siblings, 2 replies; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 16:10 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
On Tue, 20 Sep 2011, Steven Rostedt wrote:
> I really mean all other users of this_cpu_*(), including the cmpxchg and
> friends, still need to have preemption disabled.
This is argument against the basic design of this_cpu_ops. They were
designed to avoid having to disable preemption for single operations on
per cpu data. I think this shows a basic misunderstanding of what you are
dealing with.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 16:10 ` Christoph Lameter
@ 2011-09-20 16:50 ` Peter Zijlstra
2011-09-20 18:54 ` Steven Rostedt
1 sibling, 0 replies; 73+ messages in thread
From: Peter Zijlstra @ 2011-09-20 16:50 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel, Ingo Molnar,
Andrew Morton, Thomas Gleixner
On Tue, 2011-09-20 at 11:10 -0500, Christoph Lameter wrote:
> On Tue, 20 Sep 2011, Steven Rostedt wrote:
>
> > I really mean all other users of this_cpu_*(), including the cmpxchg and
> > friends, still need to have preemption disabled.
>
> This is argument against the basic design of this_cpu_ops. They were
> designed to avoid having to disable preemption for single operations on
> per cpu data. I think this shows a basic misunderstanding of what you are
> dealing with.
But part of that design is that its impossible to verify the
correctness. This is the part we object to and you keep avoiding.
There is a reason smp_processor_id() warns if its called in a
preemptible context, all the this_cpu wankery doesn't. It doesn't
provide a single useful debug feature and in places is designed so that
its impossible.
Seriously, how can you defend this shitpile with a straight face? Sure
it make slub go faster, but who gives a flying fuck if it brings the
rest of the kernel to its knees.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 16:10 ` Christoph Lameter
2011-09-20 16:50 ` Peter Zijlstra
@ 2011-09-20 18:54 ` Steven Rostedt
2011-09-21 15:16 ` Christoph Lameter
1 sibling, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 18:54 UTC (permalink / raw)
To: Christoph Lameter
Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
On Tue, 2011-09-20 at 11:10 -0500, Christoph Lameter wrote:
> On Tue, 20 Sep 2011, Steven Rostedt wrote:
>
> > I really mean all other users of this_cpu_*(), including the cmpxchg and
> > friends, still need to have preemption disabled.
>
> This is argument against the basic design of this_cpu_ops. They were
> designed to avoid having to disable preemption for single operations on
> per cpu data. I think this shows a basic misunderstanding of what you are
> dealing with.
>
BTW, Can you explain to me where the this_cpu_*() ops were designed to
be used? The only places where "this_cpu_*()" is used in slub.c and
page_alloc.c have irqs disabled on their use. I thought this was for
slub and page_alloc?
Is this_cpu() made just for statistics? I see it used in the inode code
for that, and some accounting in the namespace.c code.
Note and there's places all over the kernel that uses this_cpu_read()
and thinks preemption should be disabled. Just look at
arch/x86/mm/tlb.c:
/* Caller has disabled preemption */
sender = this_cpu_read(tlb_vector_offset);
Why the comment?
My argument is that this_cpu_* is just confusing. Rename your use case
and keep this_cpu_*() as what you want __this_cpu_*() to be.
Thanks!
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 18:54 ` Steven Rostedt
@ 2011-09-21 15:16 ` Christoph Lameter
2011-09-21 15:31 ` Steven Rostedt
2011-09-21 16:32 ` Thomas Gleixner
0 siblings, 2 replies; 73+ messages in thread
From: Christoph Lameter @ 2011-09-21 15:16 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
On Tue, 20 Sep 2011, Steven Rostedt wrote:
> BTW, Can you explain to me where the this_cpu_*() ops were designed to
> be used? The only places where "this_cpu_*()" is used in slub.c and
> page_alloc.c have irqs disabled on their use. I thought this was for
> slub and page_alloc?
Initially these were used for statistics that used per cpu counters. The
slub thing was an outgrow of this.
> Is this_cpu() made just for statistics? I see it used in the inode code
> for that, and some accounting in the namespace.c code.
That is the main use case yes.
> Note and there's places all over the kernel that uses this_cpu_read()
> and thinks preemption should be disabled. Just look at
> arch/x86/mm/tlb.c:
>
> /* Caller has disabled preemption */
> sender = this_cpu_read(tlb_vector_offset);
>
> Why the comment?
>
> My argument is that this_cpu_* is just confusing. Rename your use case
> and keep this_cpu_*() as what you want __this_cpu_*() to be.
Thought about this a bit last night. I think the main issue are these
this_cpu_read() and this_cpu_write() operations since people use those
irresponsibly. It usually does not make sense to read a value from a
random cpu nor does writing make sense. The situation is different for
per cpu counter increments where it does not matter which cpus counter is
incremented since we sum them up later anyways.
How about getting rid of this_cpu_read() and this_cpu_write() entirely?
Only allow __this_cpu_read and __this_cpu_write. There we check that the
caller has disabled preemption.
For the rare special cases (are there any?) that are legitimate use cases
for this_cpu_read/write we can use manual determination of per cpu
pointers and then just do a load via the pointer?
Or alternatively give this_cpu_read and write special names that make
their dangerousness clear.
In the case of slub there are only some this_cpu_write() things that can
be __this_cpu_write without a problem.
The __this_cpu_ptr() can become this_cpu_ptr as far as I can tell. This
should make it consistent so that we can check for disabled preemption for
all __this_cpu thingies.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-21 15:16 ` Christoph Lameter
@ 2011-09-21 15:31 ` Steven Rostedt
2011-09-21 15:59 ` Christoph Lameter
2011-09-21 16:32 ` Thomas Gleixner
1 sibling, 1 reply; 73+ messages in thread
From: Steven Rostedt @ 2011-09-21 15:31 UTC (permalink / raw)
To: Christoph Lameter
Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
On Wed, 2011-09-21 at 10:16 -0500, Christoph Lameter wrote:
> > My argument is that this_cpu_* is just confusing. Rename your use case
> > and keep this_cpu_*() as what you want __this_cpu_*() to be.
>
> Thought about this a bit last night. I think the main issue are these
> this_cpu_read() and this_cpu_write() operations since people use those
> irresponsibly. It usually does not make sense to read a value from a
> random cpu nor does writing make sense. The situation is different for
> per cpu counter increments where it does not matter which cpus counter is
> incremented since we sum them up later anyways.
>
> How about getting rid of this_cpu_read() and this_cpu_write() entirely?
>
> Only allow __this_cpu_read and __this_cpu_write. There we check that the
> caller has disabled preemption.
The problem I have with this, is that this does not help at all. We are
back to the word "this_cpu" when you really do mean "any_cpu". We
optimize the implementation to only write to the current cpu we are on,
but that is an optimization not the design, as the process could migrate
before and after the call to your this_cpu_*() operation, which makes it
no longer "this cpu". The bigger view of this design is that
incrementing a cpu variable and then summing it up, means that you do
not care which CPU variable you updated. Do not call it "this_cpu"!
Yes, for optimization sake, we happen to use the current CPU, but if we
read/wrote to any CPU variable, the algorithm still works. Hence, it is
not "this_cpu" but "any_cpu".
>
> For the rare special cases (are there any?) that are legitimate use cases
> for this_cpu_read/write we can use manual determination of per cpu
> pointers and then just do a load via the pointer?
>
> Or alternatively give this_cpu_read and write special names that make
> their dangerousness clear.
Right, we need to change all "this_cpu_*()" operations that are made to
be safe under preempt enabled areas to "any_cpu_*()". And use
this_cpu_*() for the current __this_cpu_*().
This would clear up the confusion about using "this_cpu" vs "__this_cpu"
because "__" is truly meaningless.
>
> In the case of slub there are only some this_cpu_write() things that can
> be __this_cpu_write without a problem.
>
> The __this_cpu_ptr() can become this_cpu_ptr as far as I can tell. This
> should make it consistent so that we can check for disabled preemption for
> all __this_cpu thingies.
Again, lets just bite the bullet and rename them to something that is
understandable for everyone. This would make all of us happy.
I'm not against your code, I'm against the naming convention you decided
to use. It makes it confusing to something that is complex and complex
code needs to try to be a simple as possible.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-21 15:31 ` Steven Rostedt
@ 2011-09-21 15:59 ` Christoph Lameter
2011-09-21 16:12 ` Steven Rostedt
0 siblings, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-21 15:59 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
On Wed, 21 Sep 2011, Steven Rostedt wrote:
> The problem I have with this, is that this does not help at all. We are
> back to the word "this_cpu" when you really do mean "any_cpu". We
We mean the current cpu while the instruction is executing.
> Again, lets just bite the bullet and rename them to something that is
> understandable for everyone. This would make all of us happy.
>
> I'm not against your code, I'm against the naming convention you decided
> to use. It makes it confusing to something that is complex and complex
> code needs to try to be a simple as possible.
I am certainly open to new ways of structuring the stuff and for any
improvement possible. This whole scheme was developed in long discussions
over many years. My initial proposal was to use cpu_inc/dec etc. I did
what was possible in the context of the various opinions that people had
on these matters when the implementations were discussed. The "you" does
not apply. It is either "us" or "them" (if you want to distance yourself
from this).
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-21 15:59 ` Christoph Lameter
@ 2011-09-21 16:12 ` Steven Rostedt
0 siblings, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-21 16:12 UTC (permalink / raw)
To: Christoph Lameter
Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Andrew Morton,
Thomas Gleixner, Peter Zijlstra
On Wed, 2011-09-21 at 10:59 -0500, Christoph Lameter wrote:
> On Wed, 21 Sep 2011, Steven Rostedt wrote:
>
> > The problem I have with this, is that this does not help at all. We are
> > back to the word "this_cpu" when you really do mean "any_cpu". We
>
> We mean the current cpu while the instruction is executing.
Yes, again that is a implementation detail not a design one. The
"current cpu" is nothing more than an optimization. The code is still
written in a way that it would work if you wrote it to another CPU as
long as the operation itself was atomic.
>
> > Again, lets just bite the bullet and rename them to something that is
> > understandable for everyone. This would make all of us happy.
> >
> > I'm not against your code, I'm against the naming convention you decided
> > to use. It makes it confusing to something that is complex and complex
> > code needs to try to be a simple as possible.
>
> I am certainly open to new ways of structuring the stuff and for any
> improvement possible. This whole scheme was developed in long discussions
> over many years. My initial proposal was to use cpu_inc/dec etc. I did
> what was possible in the context of the various opinions that people had
> on these matters when the implementations were discussed. The "you" does
> not apply. It is either "us" or "them" (if you want to distance yourself
> from this).
Who is this "them"? Currently the only ones that matter to me is the
mainline kernel developers. They are the ones that have to live with
this nonsense. If the "them" includes other mainline kernel developers
that are not on the Cc, please add them. Otherwise, I really don't give
a flying fart about the "them".
This is all about a naming convention, not the implementation. The
current naming convention is extremely confusing. As the mistakes that
have been shown through out the kernel. Worse yet, you removed the debug
features that have been there forever and don't even care. Just calling
the users that break the code irresponsible is not going to get you
anywhere.
We have already talked about bringing this up at Kernel Summit. Too bad
you wont be there. Make sure one of the "them" is there to defend this
silly naming convention, otherwise I will push to get all the main
kernel developers accepting a renaming of this internal API. And
immediately after the conference I will start sending patches that will
be accepted.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-21 15:16 ` Christoph Lameter
2011-09-21 15:31 ` Steven Rostedt
@ 2011-09-21 16:32 ` Thomas Gleixner
1 sibling, 0 replies; 73+ messages in thread
From: Thomas Gleixner @ 2011-09-21 16:32 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, Mathieu Desnoyers, linux-kernel, Ingo Molnar,
Andrew Morton, Peter Zijlstra
On Wed, 21 Sep 2011, Christoph Lameter wrote:
> On Tue, 20 Sep 2011, Steven Rostedt wrote:
>
> > BTW, Can you explain to me where the this_cpu_*() ops were designed to
> > be used? The only places where "this_cpu_*()" is used in slub.c and
> > page_alloc.c have irqs disabled on their use. I thought this was for
> > slub and page_alloc?
>
> Initially these were used for statistics that used per cpu counters. The
> slub thing was an outgrow of this.
>
> > Is this_cpu() made just for statistics? I see it used in the inode code
> > for that, and some accounting in the namespace.c code.
>
> That is the main use case yes.
>
> > Note and there's places all over the kernel that uses this_cpu_read()
> > and thinks preemption should be disabled. Just look at
> > arch/x86/mm/tlb.c:
> >
> > /* Caller has disabled preemption */
> > sender = this_cpu_read(tlb_vector_offset);
> >
> > Why the comment?
> >
> > My argument is that this_cpu_* is just confusing. Rename your use case
> > and keep this_cpu_*() as what you want __this_cpu_*() to be.
>
> Thought about this a bit last night. I think the main issue are these
> this_cpu_read() and this_cpu_write() operations since people use those
> irresponsibly. It usually does not make sense to read a value from a
> random cpu nor does writing make sense. The situation is different for
> per cpu counter increments where it does not matter which cpus counter is
> incremented since we sum them up later anyways.
>
> How about getting rid of this_cpu_read() and this_cpu_write() entirely?
>
> Only allow __this_cpu_read and __this_cpu_write. There we check that the
> caller has disabled preemption.
>
> For the rare special cases (are there any?) that are legitimate use cases
> for this_cpu_read/write we can use manual determination of per cpu
> pointers and then just do a load via the pointer?
>
> Or alternatively give this_cpu_read and write special names that make
> their dangerousness clear.
>
> In the case of slub there are only some this_cpu_write() things that can
> be __this_cpu_write without a problem.
>
> The __this_cpu_ptr() can become this_cpu_ptr as far as I can tell. This
> should make it consistent so that we can check for disabled preemption for
> all __this_cpu thingies.
The problem I have with that approach is that this_cpu_inc/dec still
look too close to __this_cpu_*.
It would really be nice to rename this_cpu_inc/dec to something which
makes it clear that this is statistics and does not care a whit about
the actual cpu on which this happens.
Thanks,
tglx
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-19 21:20 [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Steven Rostedt
` (5 preceding siblings ...)
2011-09-19 21:49 ` [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write() Christoph Lameter
@ 2011-09-20 2:20 ` Andi Kleen
2011-09-20 3:12 ` Steven Rostedt
6 siblings, 1 reply; 73+ messages in thread
From: Andi Kleen @ 2011-09-20 2:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
Steven Rostedt <rostedt@goodmis.org> writes:
> I just found out that the this_cpu_*() functions do not perform the
> test to see if the usage is in atomic or not. Thus, the blind
> conversion of the per_cpu(*, smp_processor_id()) and the get_cpu_var()
> code to this_cpu_*() introduce the regression to detect the hard
> to find case where a per cpu variable is used in preempt code that
> migrates and causes bugs.
Didn't preempt-rt recently get changed to not migrate in kernel-preempt
regions. How about just fixing the normal preemption to not do this
either.
Then all these complications wouldn't be necessary and a whole lot
of code related to this could be removed too, and you would still
have less bugs.
-Andi
--
ak@linux.intel.com -- Speaking for myself only
^ permalink raw reply [flat|nested] 73+ messages in thread* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 2:20 ` Andi Kleen
@ 2011-09-20 3:12 ` Steven Rostedt
2011-09-20 3:17 ` Steven Rostedt
2011-09-20 8:32 ` Thomas Gleixner
0 siblings, 2 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 3:12 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 2011-09-19 at 19:20 -0700, Andi Kleen wrote:
> Steven Rostedt <rostedt@goodmis.org> writes:
>
> > I just found out that the this_cpu_*() functions do not perform the
> > test to see if the usage is in atomic or not. Thus, the blind
> > conversion of the per_cpu(*, smp_processor_id()) and the get_cpu_var()
> > code to this_cpu_*() introduce the regression to detect the hard
> > to find case where a per cpu variable is used in preempt code that
> > migrates and causes bugs.
>
>
> Didn't preempt-rt recently get changed to not migrate in kernel-preempt
> regions. How about just fixing the normal preemption to not do this
> either.
Actually, that's part of the issue. RT has made spin_locks not migrate.
But this has also increased the overhead of those same spinlocks. I'm
hoping to do away with the big hammer approach (although Thomas is less
interested in this). I would like to have areas that require per-cpu
variables to be annotated, and not have every spinlock disable
preemption.
>
> Then all these complications wouldn't be necessary and a whole lot
> of code related to this could be removed too, and you would still
> have less bugs.
Note, that normal preemption doesn't migrate either. If you disable
preemption, you don't migrate.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 3:12 ` Steven Rostedt
@ 2011-09-20 3:17 ` Steven Rostedt
2011-09-20 8:32 ` Thomas Gleixner
1 sibling, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 3:17 UTC (permalink / raw)
To: Andi Kleen
Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
Peter Zijlstra, Christoph Lameter
On Mon, 2011-09-19 at 23:12 -0400, Steven Rostedt wrote:
> Actually, that's part of the issue. RT has made spin_locks not migrate.
> But this has also increased the overhead of those same spinlocks. I'm
> hoping to do away with the big hammer approach (although Thomas is less
> interested in this). I would like to have areas that require per-cpu
> variables to be annotated, and not have every spinlock disable
> preemption.
>
The point is, if I do make this change. It is even more essential that I
detect the preempt enable use of this_cpu() and friends. As the
spin_locks() will no longer be giving that guarantee.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 3:12 ` Steven Rostedt
2011-09-20 3:17 ` Steven Rostedt
@ 2011-09-20 8:32 ` Thomas Gleixner
2011-09-20 12:10 ` Steven Rostedt
2011-09-20 15:03 ` Christoph Lameter
1 sibling, 2 replies; 73+ messages in thread
From: Thomas Gleixner @ 2011-09-20 8:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andi Kleen, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo, Linus Torvalds
On Mon, 19 Sep 2011, Steven Rostedt wrote:
> On Mon, 2011-09-19 at 19:20 -0700, Andi Kleen wrote:
> > Steven Rostedt <rostedt@goodmis.org> writes:
> >
> > > I just found out that the this_cpu_*() functions do not perform the
> > > test to see if the usage is in atomic or not. Thus, the blind
> > > conversion of the per_cpu(*, smp_processor_id()) and the get_cpu_var()
> > > code to this_cpu_*() introduce the regression to detect the hard
> > > to find case where a per cpu variable is used in preempt code that
> > > migrates and causes bugs.
Just for the record. I added some this_cpu_* debug checks to my
filesystem eating 2.6.38-rt and guess what: They trigger right away in
the FS code and without digging deeper I'm 100% sure, that this is the
root cause of the problems I was hunting for weeks. Thanks for wasting
my time and racking my nerves.
People who remove debugability blindly have earned an one way ticket
to the Oort cloud. There is utter chaos already so they wont be
noticed at all.
> > Didn't preempt-rt recently get changed to not migrate in kernel-preempt
> > regions. How about just fixing the normal preemption to not do this
> > either.
>
> Actually, that's part of the issue. RT has made spin_locks not migrate.
> But this has also increased the overhead of those same spinlocks. I'm
> hoping to do away with the big hammer approach (although Thomas is less
> interested in this). I would like to have areas that require per-cpu
> variables to be annotated,
Yes, annotation is definitely something which is needed badly.
Right now preempt_disable()/local_irq_disable() are used explicit or
implicit (through spin_lock*) to protect per cpu sections, but we have
no clue, where such a section really starts and ends.
In fact preempt_disable/local_irq_disable() have become the new cpu
local BKL and the per cpu stuff just happily (ab)uses that without
documenting the scope of the code sections which rely on that. It's
just nesting inside spinlocked sections at random places without
giving a clue what needs to be kept on a cpu or not.
That's what makes it basically impossible to use anything else than
the big hammer approach in RT. Nobody has the bandwidth to audit all
this stuff and I seriously doubt that we can improve that situation
unless we get proper annotation of the per cpu sections in place.
Can we please put that on the KS agenda? This definitely needs to be
addressed urgently.
> and not have every spinlock disable preemption.
That doesn't work, you're prone to deadlocks then. I guess you meant
not disable migration on RT, right?
Thanks,
tglx
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 8:32 ` Thomas Gleixner
@ 2011-09-20 12:10 ` Steven Rostedt
2011-09-20 15:03 ` Christoph Lameter
1 sibling, 0 replies; 73+ messages in thread
From: Steven Rostedt @ 2011-09-20 12:10 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Andi Kleen, LKML, Ingo Molnar, Andrew Morton, Peter Zijlstra,
Christoph Lameter, Tejun Heo, Linus Torvalds
On Tue, 2011-09-20 at 10:32 +0200, Thomas Gleixner wrote:
> > and not have every spinlock disable preemption.
>
> That doesn't work, you're prone to deadlocks then. I guess you meant
> not disable migration on RT, right?
Oops, yeah I meant migrate disable.
-- Steve
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 8:32 ` Thomas Gleixner
2011-09-20 12:10 ` Steven Rostedt
@ 2011-09-20 15:03 ` Christoph Lameter
2011-09-20 15:07 ` Peter Zijlstra
1 sibling, 1 reply; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 15:03 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Steven Rostedt, Andi Kleen, LKML, Ingo Molnar, Andrew Morton,
Peter Zijlstra, Christoph Lameter, Tejun Heo, Linus Torvalds
On Tue, 20 Sep 2011, Thomas Gleixner wrote:
> On Mon, 19 Sep 2011, Steven Rostedt wrote:
> > On Mon, 2011-09-19 at 19:20 -0700, Andi Kleen wrote:
> > > Steven Rostedt <rostedt@goodmis.org> writes:
> > >
> > > > I just found out that the this_cpu_*() functions do not perform the
> > > > test to see if the usage is in atomic or not. Thus, the blind
> > > > conversion of the per_cpu(*, smp_processor_id()) and the get_cpu_var()
> > > > code to this_cpu_*() introduce the regression to detect the hard
> > > > to find case where a per cpu variable is used in preempt code that
> > > > migrates and causes bugs.
>
> Just for the record. I added some this_cpu_* debug checks to my
> filesystem eating 2.6.38-rt and guess what: They trigger right away in
> the FS code and without digging deeper I'm 100% sure, that this is the
> root cause of the problems I was hunting for weeks. Thanks for wasting
> my time and racking my nerves.
this_cpu_xx is safe to use in preemptable contexts. So what does this have
to do with your FS problems?
> Can we please put that on the KS agenda? This definitely needs to be
> addressed urgently.
Well yes the misunderstanding of per cpu operations was one reason why I
proposed the discussion on the subject of esoteric kernel synchronization.
I do not think that it was accepted.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 15:03 ` Christoph Lameter
@ 2011-09-20 15:07 ` Peter Zijlstra
2011-09-20 16:05 ` Christoph Lameter
0 siblings, 1 reply; 73+ messages in thread
From: Peter Zijlstra @ 2011-09-20 15:07 UTC (permalink / raw)
To: Christoph Lameter
Cc: Thomas Gleixner, Steven Rostedt, Andi Kleen, LKML, Ingo Molnar,
Andrew Morton, Christoph Lameter, Tejun Heo, Linus Torvalds
On Tue, 2011-09-20 at 10:03 -0500, Christoph Lameter wrote:
>
> Well yes the misunderstanding of per cpu operations was one reason why I
> proposed the discussion on the subject of esoteric kernel synchronization.
> I do not think that it was accepted.
I don't think anybody here misunderstands it, we're just all very angry
that its causing so much problems.
__this_cpu doesn't have preempt debug checks, and there's a lot of
this_cpu usage that really should have been __this_cpu.
The very fact that a quick scan still reveals actual bugs should be a
warning sign that this crap doesn't have enough sanity checks.
^ permalink raw reply [flat|nested] 73+ messages in thread
* Re: [RFC][PATCH 0/5] Introduce checks for preemptable code for this_cpu_read/write()
2011-09-20 15:07 ` Peter Zijlstra
@ 2011-09-20 16:05 ` Christoph Lameter
0 siblings, 0 replies; 73+ messages in thread
From: Christoph Lameter @ 2011-09-20 16:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, Steven Rostedt, Andi Kleen, LKML, Ingo Molnar,
Andrew Morton, Christoph Lameter, Tejun Heo, Linus Torvalds
On Tue, 20 Sep 2011, Peter Zijlstra wrote:
> On Tue, 2011-09-20 at 10:03 -0500, Christoph Lameter wrote:
> >
> > Well yes the misunderstanding of per cpu operations was one reason why I
> > proposed the discussion on the subject of esoteric kernel synchronization.
> > I do not think that it was accepted.
>
> I don't think anybody here misunderstands it, we're just all very angry
> that its causing so much problems.
What problems? RT does not work with it? The memcg confusion?
> __this_cpu doesn't have preempt debug checks, and there's a lot of
> this_cpu usage that really should have been __this_cpu.
That only works if preemption is disabled when the this_cpu op is being
invoked.
^ permalink raw reply [flat|nested] 73+ messages in thread