* [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
@ 2010-03-08 6:04 Li Zefan
2010-03-08 6:50 ` Lai Jiangshan
2010-03-08 14:54 ` [BUG] WARNING: at kernel/trace/ring_buffer.c:3420 Steven Rostedt
0 siblings, 2 replies; 11+ messages in thread
From: Li Zefan @ 2010-03-08 6:04 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Frederic Weisbecker, Lai Jiangshan, Ingo Molnar, LKML
While running a ftrace test, a kernel warning showed up. The bug is not so
easy to reproduce, normally it takes 30 mins to 2 hours.
After revering 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
for 15 hours, and the bug seems disapeared.
------------[ cut here ]------------
WARNING: at kernel/trace/ring_buffer.c:3420 ring_buffer_reset_cpu+0x5a/0x9c()
Hardware name: PRIMERGY
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat bridge stp llc deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput i5k_amb hwmon e1000e i5000_edac edac_core iTCO_wdt iTCO_vendor_support shpchp i2c_i801 i2c_core serio_raw pcspkr pata_acpi ata_generic mptsas mptscsih ata_piix mptbase scsi_transport_sas dm_mod [last unloaded: speedstep_lib]
Pid: 2108, comm: ftrace_current_ Tainted: G W 2.6.33-tip+ #2
Call Trace:
[<ffffffff8104ebd0>] warn_slowpath_common+0x7c/0x94
[<ffffffff8104ebfc>] warn_slowpath_null+0x14/0x16
[<ffffffff810a50f5>] ring_buffer_reset_cpu+0x5a/0x9c
[<ffffffff810aa3f0>] tracing_reset_online_cpus+0x4a/0x79
[<ffffffff810aa434>] tracer_init+0x15/0x20
[<ffffffff810aa601>] tracing_set_tracer+0x1c2/0x1e7
[<ffffffff810d9847>] ? might_fault+0x5c/0xac
[<ffffffff810aa6b2>] tracing_set_trace_write+0x8c/0xbc
[<ffffffff810a8e79>] ? trace_buffer_lock_reserve+0x21/0x56
[<ffffffff810a95ce>] ? trace_current_buffer_lock_reserve+0x16/0x18
[<ffffffff813ef135>] ? retint_swapgs+0x13/0x1b
[<ffffffff81104618>] vfs_write+0xae/0x10b
[<ffffffff81104735>] sys_write+0x4a/0x6e
[<ffffffff81009da6>] tracesys+0xd0/0xd5
---[ end trace f7002f0890d5b023 ]---
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-08 6:04 [BUG] WARNING: at kernel/trace/ring_buffer.c:3420 Li Zefan
@ 2010-03-08 6:50 ` Lai Jiangshan
2010-03-08 15:03 ` Steven Rostedt
2010-03-13 12:24 ` [tip:tracing/urgent] ring-buffer: Move disabled check into preempt disable section tip-bot for Lai Jiangshan
2010-03-08 14:54 ` [BUG] WARNING: at kernel/trace/ring_buffer.c:3420 Steven Rostedt
1 sibling, 2 replies; 11+ messages in thread
From: Lai Jiangshan @ 2010-03-08 6:50 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Li Zefan, Steven Rostedt, Frederic Weisbecker, LKML
Li Zefan wrote:
> While running a ftrace test, a kernel warning showed up. The bug is not so
> easy to reproduce, normally it takes 30 mins to 2 hours.
>
> After revering 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
> for 15 hours, and the bug seems disapeared.
>
This bug exhausted us, we found a issue: a tiny window which is
one of causes of this bug. This patch is to close this tiny window.
(bug is not fixed)
From: Lai Jiangshan <laijs@cn.fujitsu.com>
ringbuffer resizing and reseting will increase the ->record_disabled
and then wait until a rcu_shced grace period passes.
Contrarily, testing ->record_disabled should be at the same
preempt disabled critical region as writing into ringbuffer, otherwise
it will leave a window break ringbuffer resizing or reseting.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
---
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8c1b2d2..54191d6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2232,12 +2232,12 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
if (ring_buffer_flags != RB_BUFFERS_ON)
return NULL;
- if (atomic_read(&buffer->record_disabled))
- return NULL;
-
/* If we are tracing schedule, we don't want to recurse */
resched = ftrace_preempt_disable();
+ if (atomic_read(&buffer->record_disabled))
+ goto out_nocheck;
+
if (trace_recursive_lock())
goto out_nocheck;
@@ -2469,11 +2469,11 @@ int ring_buffer_write(struct ring_buffer *buffer,
if (ring_buffer_flags != RB_BUFFERS_ON)
return -EBUSY;
- if (atomic_read(&buffer->record_disabled))
- return -EBUSY;
-
resched = ftrace_preempt_disable();
+ if (atomic_read(&buffer->record_disabled))
+ goto out;
+
cpu = raw_smp_processor_id();
if (!cpumask_test_cpu(cpu, buffer->cpumask))
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-08 6:04 [BUG] WARNING: at kernel/trace/ring_buffer.c:3420 Li Zefan
2010-03-08 6:50 ` Lai Jiangshan
@ 2010-03-08 14:54 ` Steven Rostedt
2010-03-09 1:08 ` Li Zefan
1 sibling, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2010-03-08 14:54 UTC (permalink / raw)
To: Li Zefan; +Cc: Frederic Weisbecker, Lai Jiangshan, Ingo Molnar, LKML
On Mon, 2010-03-08 at 14:04 +0800, Li Zefan wrote:
> While running a ftrace test, a kernel warning showed up. The bug is not so
> easy to reproduce, normally it takes 30 mins to 2 hours.
What test are you running? Can you share it?
>
> After revering 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
> for 15 hours, and the bug seems disapeared.
So basically, you are reading both the trace and trace_pipe (or
trace_pipe_raw) at the same time?
-- Steve
>
> ------------[ cut here ]------------
> WARNING: at kernel/trace/ring_buffer.c:3420 ring_buffer_reset_cpu+0x5a/0x9c()
> Hardware name: PRIMERGY
> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat bridge stp llc deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput i5k_amb hwmon e1000e i5000_edac edac_core iTCO_wdt iTCO_vendor_support shpchp i2c_i801 i2c_core serio_raw pcspkr pata_acpi ata_generic mptsas mptscsih ata_piix mptbase scsi_transport_sas dm_mod [last unloaded: speedstep_lib]
> Pid: 2108, comm: ftrace_current_ Tainted: G W 2.6.33-tip+ #2
> Call Trace:
> [<ffffffff8104ebd0>] warn_slowpath_common+0x7c/0x94
> [<ffffffff8104ebfc>] warn_slowpath_null+0x14/0x16
> [<ffffffff810a50f5>] ring_buffer_reset_cpu+0x5a/0x9c
> [<ffffffff810aa3f0>] tracing_reset_online_cpus+0x4a/0x79
> [<ffffffff810aa434>] tracer_init+0x15/0x20
> [<ffffffff810aa601>] tracing_set_tracer+0x1c2/0x1e7
> [<ffffffff810d9847>] ? might_fault+0x5c/0xac
> [<ffffffff810aa6b2>] tracing_set_trace_write+0x8c/0xbc
> [<ffffffff810a8e79>] ? trace_buffer_lock_reserve+0x21/0x56
> [<ffffffff810a95ce>] ? trace_current_buffer_lock_reserve+0x16/0x18
> [<ffffffff813ef135>] ? retint_swapgs+0x13/0x1b
> [<ffffffff81104618>] vfs_write+0xae/0x10b
> [<ffffffff81104735>] sys_write+0x4a/0x6e
> [<ffffffff81009da6>] tracesys+0xd0/0xd5
> ---[ end trace f7002f0890d5b023 ]---
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-08 6:50 ` Lai Jiangshan
@ 2010-03-08 15:03 ` Steven Rostedt
2010-03-08 15:07 ` Steven Rostedt
2010-03-13 12:24 ` [tip:tracing/urgent] ring-buffer: Move disabled check into preempt disable section tip-bot for Lai Jiangshan
1 sibling, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2010-03-08 15:03 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: Ingo Molnar, Li Zefan, Frederic Weisbecker, LKML
On Mon, 2010-03-08 at 14:50 +0800, Lai Jiangshan wrote:
> Li Zefan wrote:
> > While running a ftrace test, a kernel warning showed up. The bug is not so
> > easy to reproduce, normally it takes 30 mins to 2 hours.
> >
> > After revering 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
> > for 15 hours, and the bug seems disapeared.
> >
>
> This bug exhausted us, we found a issue: a tiny window which is
> one of causes of this bug. This patch is to close this tiny window.
> (bug is not fixed)
>
> From: Lai Jiangshan <laijs@cn.fujitsu.com>
>
> ringbuffer resizing and reseting will increase the ->record_disabled
> and then wait until a rcu_shced grace period passes.
>
> Contrarily, testing ->record_disabled should be at the same
> preempt disabled critical region as writing into ringbuffer, otherwise
> it will leave a window break ringbuffer resizing or reseting.
So the resizing and the resetting need a synchronize_sched() after the
disabling of the buffers, right?
-- Steve
>
> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
> Reported-by: Li Zefan <lizf@cn.fujitsu.com>
> ---
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 8c1b2d2..54191d6 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -2232,12 +2232,12 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
> if (ring_buffer_flags != RB_BUFFERS_ON)
> return NULL;
>
> - if (atomic_read(&buffer->record_disabled))
> - return NULL;
> -
> /* If we are tracing schedule, we don't want to recurse */
> resched = ftrace_preempt_disable();
>
> + if (atomic_read(&buffer->record_disabled))
> + goto out_nocheck;
> +
> if (trace_recursive_lock())
> goto out_nocheck;
>
> @@ -2469,11 +2469,11 @@ int ring_buffer_write(struct ring_buffer *buffer,
> if (ring_buffer_flags != RB_BUFFERS_ON)
> return -EBUSY;
>
> - if (atomic_read(&buffer->record_disabled))
> - return -EBUSY;
> -
> resched = ftrace_preempt_disable();
>
> + if (atomic_read(&buffer->record_disabled))
> + goto out;
> +
> cpu = raw_smp_processor_id();
>
> if (!cpumask_test_cpu(cpu, buffer->cpumask))
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-08 15:03 ` Steven Rostedt
@ 2010-03-08 15:07 ` Steven Rostedt
2010-03-09 3:35 ` Lai Jiangshan
0 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2010-03-08 15:07 UTC (permalink / raw)
To: Lai Jiangshan; +Cc: Ingo Molnar, Li Zefan, Frederic Weisbecker, LKML
On Mon, 2010-03-08 at 10:03 -0500, Steven Rostedt wrote:
> > ringbuffer resizing and reseting will increase the ->record_disabled
> > and then wait until a rcu_shced grace period passes.
> >
> > Contrarily, testing ->record_disabled should be at the same
> > preempt disabled critical region as writing into ringbuffer, otherwise
> > it will leave a window break ringbuffer resizing or reseting.
>
> So the resizing and the resetting need a synchronize_sched() after the
> disabling of the buffers, right?
Looking at the code, the synchronize_sched() is already done in
ring_buffer_resize, and the caller (trace.c:tracing_reset() ) also
disables the ring buffer and calls synchronize_sched().
With that, what other window is still opened (after this fix)?
-- Steve
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-08 14:54 ` [BUG] WARNING: at kernel/trace/ring_buffer.c:3420 Steven Rostedt
@ 2010-03-09 1:08 ` Li Zefan
0 siblings, 0 replies; 11+ messages in thread
From: Li Zefan @ 2010-03-09 1:08 UTC (permalink / raw)
To: rostedt; +Cc: Frederic Weisbecker, Lai Jiangshan, Ingo Molnar, LKML
Steven Rostedt wrote:
> On Mon, 2010-03-08 at 14:04 +0800, Li Zefan wrote:
>> While running a ftrace test, a kernel warning showed up. The bug is not so
>> easy to reproduce, normally it takes 30 mins to 2 hours.
>
> What test are you running? Can you share it?
>
The test runs 10+ processes, reading/writing files in debugfs/tracing/
simultaneously.
I'll send it to you in a private mail.
>> After revering 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
>> for 15 hours, and the bug seems disapeared.
>
> So basically, you are reading both the trace and trace_pipe (or
> trace_pipe_raw) at the same time?
>
reading trace, yes. reading trace_pipe, no.
Actualy we don't see how this commit relates to the bug.
I'll run the test again to be sure on this.
> -- Steve
>
>> ------------[ cut here ]------------
>> WARNING: at kernel/trace/ring_buffer.c:3420 ring_buffer_reset_cpu+0x5a/0x9c()
>> Hardware name: PRIMERGY
>> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat bridge stp llc deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput i5k_amb hwmon e1000e i5000_edac edac_core iTCO_wdt iTCO_vendor_support shpchp i2c_i801 i2c_core serio_raw pcspkr pata_acpi ata_generic mptsas mptscsih ata_piix mptbase scsi_transport_sas dm_mod [last unloaded: speedstep_lib]
>> Pid: 2108, comm: ftrace_current_ Tainted: G W 2.6.33-tip+ #2
>> Call Trace:
>> [<ffffffff8104ebd0>] warn_slowpath_common+0x7c/0x94
>> [<ffffffff8104ebfc>] warn_slowpath_null+0x14/0x16
>> [<ffffffff810a50f5>] ring_buffer_reset_cpu+0x5a/0x9c
>> [<ffffffff810aa3f0>] tracing_reset_online_cpus+0x4a/0x79
>> [<ffffffff810aa434>] tracer_init+0x15/0x20
>> [<ffffffff810aa601>] tracing_set_tracer+0x1c2/0x1e7
>> [<ffffffff810d9847>] ? might_fault+0x5c/0xac
>> [<ffffffff810aa6b2>] tracing_set_trace_write+0x8c/0xbc
>> [<ffffffff810a8e79>] ? trace_buffer_lock_reserve+0x21/0x56
>> [<ffffffff810a95ce>] ? trace_current_buffer_lock_reserve+0x16/0x18
>> [<ffffffff813ef135>] ? retint_swapgs+0x13/0x1b
>> [<ffffffff81104618>] vfs_write+0xae/0x10b
>> [<ffffffff81104735>] sys_write+0x4a/0x6e
>> [<ffffffff81009da6>] tracesys+0xd0/0xd5
>> ---[ end trace f7002f0890d5b023 ]---
>
>
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-08 15:07 ` Steven Rostedt
@ 2010-03-09 3:35 ` Lai Jiangshan
2010-03-09 3:54 ` Steven Rostedt
2010-03-09 17:28 ` Steven Rostedt
0 siblings, 2 replies; 11+ messages in thread
From: Lai Jiangshan @ 2010-03-09 3:35 UTC (permalink / raw)
To: rostedt; +Cc: Ingo Molnar, Li Zefan, Frederic Weisbecker, LKML
Steven Rostedt wrote:
> On Mon, 2010-03-08 at 10:03 -0500, Steven Rostedt wrote:
>
>>> ringbuffer resizing and reseting will increase the ->record_disabled
>>> and then wait until a rcu_shced grace period passes.
>>>
>>> Contrarily, testing ->record_disabled should be at the same
>>> preempt disabled critical region as writing into ringbuffer, otherwise
>>> it will leave a window break ringbuffer resizing or reseting.
>> So the resizing and the resetting need a synchronize_sched() after the
>> disabling of the buffers, right?
>
> Looking at the code, the synchronize_sched() is already done in
> ring_buffer_resize, and the caller (trace.c:tracing_reset() ) also
> disables the ring buffer and calls synchronize_sched().
>
> With that, what other window is still opened (after this fix)?
>
This window is still opened: (RCU vs IDLE vs Tracing)
synchronize_sched() does not protect preempt_disable()/enable() for
idle process. But tracing(function_graph, function) introduce more
preempt_disable()/enable() for idle process. It brings windows.
I bet that this bug is not come from this window.
(I added some strict code to RCU and did stress test,
bug was still occurred.)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-09 3:35 ` Lai Jiangshan
@ 2010-03-09 3:54 ` Steven Rostedt
2010-03-09 17:28 ` Steven Rostedt
1 sibling, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2010-03-09 3:54 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Ingo Molnar, Li Zefan, Frederic Weisbecker, LKML,
Paul E. McKenney
On Tue, 2010-03-09 at 11:35 +0800, Lai Jiangshan wrote:
> Steven Rostedt wrote:
> > On Mon, 2010-03-08 at 10:03 -0500, Steven Rostedt wrote:
> >
> >>> ringbuffer resizing and reseting will increase the ->record_disabled
> >>> and then wait until a rcu_shced grace period passes.
> >>>
> >>> Contrarily, testing ->record_disabled should be at the same
> >>> preempt disabled critical region as writing into ringbuffer, otherwise
> >>> it will leave a window break ringbuffer resizing or reseting.
> >> So the resizing and the resetting need a synchronize_sched() after the
> >> disabling of the buffers, right?
> >
> > Looking at the code, the synchronize_sched() is already done in
> > ring_buffer_resize, and the caller (trace.c:tracing_reset() ) also
> > disables the ring buffer and calls synchronize_sched().
> >
> > With that, what other window is still opened (after this fix)?
> >
>
> This window is still opened: (RCU vs IDLE vs Tracing)
>
> synchronize_sched() does not protect preempt_disable()/enable() for
> idle process. But tracing(function_graph, function) introduce more
> preempt_disable()/enable() for idle process. It brings windows.
It most definitely should. If it does not, then we have major problems.
>From the comment above synchronize_sched(void):
/**
* synchronize_sched - wait until an rcu-sched grace period has elapsed.
*
* Control will return to the caller some time after a full rcu-sched
* grace period has elapsed, in other words after all currently executing
* rcu-sched read-side critical sections have completed. These read-side
* critical sections are delimited by rcu_read_lock_sched() and
* rcu_read_unlock_sched(), and may be nested. Note that preempt_disable(),
* local_irq_disable(), and so on may be used in place of
* rcu_read_lock_sched().
I do not see how idle can be ignored here.
>
> I bet that this bug is not come from this window.
> (I added some strict code to RCU and did stress test,
> bug was still occurred.)
>
What exactly were you doing when this happened? Was this from Li's test?
-- Steve
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-09 3:35 ` Lai Jiangshan
2010-03-09 3:54 ` Steven Rostedt
@ 2010-03-09 17:28 ` Steven Rostedt
2010-03-09 18:46 ` Steven Rostedt
1 sibling, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2010-03-09 17:28 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Ingo Molnar, Li Zefan, Frederic Weisbecker, LKML,
Paul E. McKenney
On Tue, 2010-03-09 at 11:35 +0800, Lai Jiangshan wrote:
> Steven Rostedt wrote:
> > On Mon, 2010-03-08 at 10:03 -0500, Steven Rostedt wrote:
> >
> >>> ringbuffer resizing and reseting will increase the ->record_disabled
> >>> and then wait until a rcu_shced grace period passes.
> >>>
> >>> Contrarily, testing ->record_disabled should be at the same
> >>> preempt disabled critical region as writing into ringbuffer, otherwise
> >>> it will leave a window break ringbuffer resizing or reseting.
> >> So the resizing and the resetting need a synchronize_sched() after the
> >> disabling of the buffers, right?
> >
> > Looking at the code, the synchronize_sched() is already done in
> > ring_buffer_resize, and the caller (trace.c:tracing_reset() ) also
> > disables the ring buffer and calls synchronize_sched().
> >
> > With that, what other window is still opened (after this fix)?
> >
>
> This window is still opened: (RCU vs IDLE vs Tracing)
>
> synchronize_sched() does not protect preempt_disable()/enable() for
> idle process. But tracing(function_graph, function) introduce more
> preempt_disable()/enable() for idle process. It brings windows.
>
> I bet that this bug is not come from this window.
> (I added some strict code to RCU and did stress test,
> bug was still occurred.)
With Li's test, I was able to trigger the ring_buffer warning about
resetting while committing. I was still able to trigger it with your
patch to move the ring buffer disable code. But I did not trigger it
with both the ring buffer disable code _and_ your RCU fix.
But I was able to trigger the segfault that Li is seeing. That is most
likely a separate issue. I'm currently writing special debug code to
find out why that is happening.
Thanks!
-- Steve
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [BUG] WARNING: at kernel/trace/ring_buffer.c:3420
2010-03-09 17:28 ` Steven Rostedt
@ 2010-03-09 18:46 ` Steven Rostedt
0 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2010-03-09 18:46 UTC (permalink / raw)
To: Lai Jiangshan
Cc: Ingo Molnar, Li Zefan, Frederic Weisbecker, LKML,
Paul E. McKenney
On Tue, 2010-03-09 at 12:28 -0500, Steven Rostedt wrote:
> On Tue, 2010-03-09 at 11:35 +0800, Lai Jiangshan wrote:
> With Li's test, I was able to trigger the ring_buffer warning about
> resetting while committing. I was still able to trigger it with your
> patch to move the ring buffer disable code. But I did not trigger it
> with both the ring buffer disable code _and_ your RCU fix.
Spoke too soon. I hit it again. But I think I know why.
-- Steve
^ permalink raw reply [flat|nested] 11+ messages in thread
* [tip:tracing/urgent] ring-buffer: Move disabled check into preempt disable section
2010-03-08 6:50 ` Lai Jiangshan
2010-03-08 15:03 ` Steven Rostedt
@ 2010-03-13 12:24 ` tip-bot for Lai Jiangshan
1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Lai Jiangshan @ 2010-03-13 12:24 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, rostedt, lizf, tglx, laijs
Commit-ID: 52fbe9cde7fdb5c6fac196d7ebd2d92d05ef3cd4
Gitweb: http://git.kernel.org/tip/52fbe9cde7fdb5c6fac196d7ebd2d92d05ef3cd4
Author: Lai Jiangshan <laijs@cn.fujitsu.com>
AuthorDate: Mon, 8 Mar 2010 14:50:43 +0800
Committer: Steven Rostedt <rostedt@goodmis.org>
CommitDate: Fri, 12 Mar 2010 20:26:56 -0500
ring-buffer: Move disabled check into preempt disable section
The ring buffer resizing and resetting relies on a schedule RCU
action. The buffers are disabled, a synchronize_sched() is called
and then the resize or reset takes place.
But this only works if the disabling of the buffers are within the
preempt disabled section, otherwise a window exists that the buffers
can be written to while a reset or resize takes place.
Cc: stable@kernel.org
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B949E43.2010906@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
kernel/trace/ring_buffer.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8c1b2d2..54191d6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2232,12 +2232,12 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
if (ring_buffer_flags != RB_BUFFERS_ON)
return NULL;
- if (atomic_read(&buffer->record_disabled))
- return NULL;
-
/* If we are tracing schedule, we don't want to recurse */
resched = ftrace_preempt_disable();
+ if (atomic_read(&buffer->record_disabled))
+ goto out_nocheck;
+
if (trace_recursive_lock())
goto out_nocheck;
@@ -2469,11 +2469,11 @@ int ring_buffer_write(struct ring_buffer *buffer,
if (ring_buffer_flags != RB_BUFFERS_ON)
return -EBUSY;
- if (atomic_read(&buffer->record_disabled))
- return -EBUSY;
-
resched = ftrace_preempt_disable();
+ if (atomic_read(&buffer->record_disabled))
+ goto out;
+
cpu = raw_smp_processor_id();
if (!cpumask_test_cpu(cpu, buffer->cpumask))
^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2010-03-13 12:25 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-08 6:04 [BUG] WARNING: at kernel/trace/ring_buffer.c:3420 Li Zefan
2010-03-08 6:50 ` Lai Jiangshan
2010-03-08 15:03 ` Steven Rostedt
2010-03-08 15:07 ` Steven Rostedt
2010-03-09 3:35 ` Lai Jiangshan
2010-03-09 3:54 ` Steven Rostedt
2010-03-09 17:28 ` Steven Rostedt
2010-03-09 18:46 ` Steven Rostedt
2010-03-13 12:24 ` [tip:tracing/urgent] ring-buffer: Move disabled check into preempt disable section tip-bot for Lai Jiangshan
2010-03-08 14:54 ` [BUG] WARNING: at kernel/trace/ring_buffer.c:3420 Steven Rostedt
2010-03-09 1:08 ` Li Zefan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox