linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] BUG: using smp_processor_id() in preemptible [00000000] code: caller is __qdisc_run
@ 2008-08-11 13:11 John Kacur
  2008-08-11 21:00 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: John Kacur @ 2008-08-11 13:11 UTC (permalink / raw)
  To: rt-users; +Cc: LKML, Ingo Molnar, Thomas Gleixner, Steven Rostedt,
	Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 4317 bytes --]

I'm running a linux-2.6.26.1 kernel with the real-time
patch-2.6.26-rt1 plus most of the patches discussed on the
linux-rt-users list since rt1. (except ppc patches, and not Gregory
Haskins experimental stuff)
The above info is mostly just full exclosure, I'm sure that this bug
exists in plain linux-2.6.26 + real-time patch-2.6.26-rt1 too.

BUG: using smp_processor_id() in preemptible [00000000] code: firefox-bin/4091
caller is __qdisc_run+0x160/0x1e9
Pid: 4091, comm: firefox-bin Tainted: G        W 2.6.26.1-rt1.jk #4

Call Trace:
 [<ffffffff803468ac>] debug_smp_processor_id+0xe4/0xf4
 [<ffffffff8040673f>] __qdisc_run+0x160/0x1e9
 [<ffffffff803f406c>] dev_queue_xmit+0x1b3/0x2ee
 [<ffffffff8041abec>] ip_finish_output+0x2a6/0x2ef
 [<ffffffff8041ad18>] ip_output+0xe3/0xec
 [<ffffffff80419c54>] ip_local_out+0x25/0x29
 [<ffffffff8041a4c6>] ip_queue_xmit+0x2ce/0x35e
 [<ffffffff8042e274>] ? __tcp_push_pending_frames+0x74a/0x860
 [<ffffffff8042e274>] ? __tcp_push_pending_frames+0x74a/0x860
 [<ffffffff80286265>] ? trace_preempt_on+0x1f/0x105
 [<ffffffff8042b6b2>] ? tcp_transmit_skb+0x72a/0x78f
 [<ffffffff8042e274>] ? __tcp_push_pending_frames+0x74a/0x860
 [<ffffffff8042b6d8>] tcp_transmit_skb+0x750/0x78f
 [<ffffffff8042e274>] __tcp_push_pending_frames+0x74a/0x860
 [<ffffffff802b53b3>] ? __kmalloc_node+0x48/0x4a
 [<ffffffff803ee44d>] ? __alloc_skb+0x70/0x136
 [<ffffffff8042e518>] tcp_send_fin+0x18e/0x19a
 [<ffffffff80420358>] tcp_close+0x1cc/0x413
 [<ffffffff8043d1a6>] inet_release+0x55/0x5c
 [<ffffffff803e78af>] sock_release+0x1f/0xb2
 [<ffffffff803e797b>] sock_close+0x39/0x3f
 [<ffffffff802bb11b>] __fput+0xca/0x18d
 [<ffffffff802bb1f7>] fput+0x19/0x1b
 [<ffffffff802b8120>] filp_close+0x6b/0x76
 [<ffffffff802b81d5>] sys_close+0xaa/0xe9
 [<ffffffff80226c07>] sysenter_do_call+0x8c/0x149
 [<ffffffff8046ca94>] ? trace_hardirqs_on_thunk+0x3a/0x3c

---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff80346859>] .... debug_smp_processor_id+0x91/0xf4
.....[<ffffffff8040673f>] ..   ( <= __qdisc_run+0x160/0x1e9)

BUG: firefox-bin:4091 task might have lost a preemption check!
Pid: 4091, comm: firefox-bin Tainted: G        W 2.6.26.1-rt1.jk #4

Call Trace:
 [<ffffffff804708c8>] ? sub_preempt_count+0xd1/0xe6
 [<ffffffff80235c41>] preempt_enable_no_resched+0x5c/0x5e
 [<ffffffff803468b1>] debug_smp_processor_id+0xe9/0xf4
 [<ffffffff8040673f>] __qdisc_run+0x160/0x1e9
 [<ffffffff803f406c>] dev_queue_xmit+0x1b3/0x2ee
 [<ffffffff8041abec>] ip_finish_output+0x2a6/0x2ef
 [<ffffffff8041ad18>] ip_output+0xe3/0xec
 [<ffffffff80419c54>] ip_local_out+0x25/0x29
 [<ffffffff8041a4c6>] ip_queue_xmit+0x2ce/0x35e
 [<ffffffff8042e274>] ? __tcp_push_pending_frames+0x74a/0x860
 [<ffffffff8042e274>] ? __tcp_push_pending_frames+0x74a/0x860
 [<ffffffff80286265>] ? trace_preempt_on+0x1f/0x105
 [<ffffffff8042b6b2>] ? tcp_transmit_skb+0x72a/0x78f
 [<ffffffff8042e274>] ? __tcp_push_pending_frames+0x74a/0x860
 [<ffffffff8042b6d8>] tcp_transmit_skb+0x750/0x78f
 [<ffffffff8042e274>] __tcp_push_pending_frames+0x74a/0x860
 [<ffffffff802b53b3>] ? __kmalloc_node+0x48/0x4a
 [<ffffffff803ee44d>] ? __alloc_skb+0x70/0x136
 [<ffffffff8042e518>] tcp_send_fin+0x18e/0x19a
 [<ffffffff80420358>] tcp_close+0x1cc/0x413
 [<ffffffff8043d1a6>] inet_release+0x55/0x5c
 [<ffffffff803e78af>] sock_release+0x1f/0xb2
 [<ffffffff803e797b>] sock_close+0x39/0x3f
 [<ffffffff802bb11b>] __fput+0xca/0x18d
 [<ffffffff802bb1f7>] fput+0x19/0x1b
 [<ffffffff802b8120>] filp_close+0x6b/0x76
 [<ffffffff802b81d5>] sys_close+0xaa/0xe9
 [<ffffffff80226c07>] sysenter_do_call+0x8c/0x149
 [<ffffffff8046ca94>] ? trace_hardirqs_on_thunk+0x3a/0x3c

---------------------------
| preempt count: 00000000 ]
| 0-level deep critical section nesting:

__qdisc_run() calls qdisc_restart() which calls
handle_dev_cpu_collision(skb, dev, q); and then the problem shows up
here:
__get_cpu_var(netdev_rx_stat).cpu_collision++;

The solution is to disable interrupts around the above increment. Here
is an attached patch to do so. (Thank's to Peter Zijlstra for help in
the analysis and dropping the answer in my lap, so if I got it right
it is due to his help, but if I messed it up, then I did that part all
by myself.)

Unless there are objections, please apply.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: qdisc_run.patch --]
[-- Type: text/x-patch; name=qdisc_run.patch, Size: 1067 bytes --]

Subject: fix for BUG: using smp_processor_id() in preemptible code

Fixes using smp_processor_id() in preemptible code as seen when __qdisc_run
calls qdisc_restart which calls handle_dev_cpu_collision

This is fixed by disabling irqs (and preemption) around cpu_collision++
in handle_dev_cpu_collision

Signed-off-by: John Kacur <jkacur at gmail dot com>

Index: linux-2.6.26.1-rt1.jk/net/sched/sch_generic.c
===================================================================
--- linux-2.6.26.1-rt1.jk.orig/net/sched/sch_generic.c
+++ linux-2.6.26.1-rt1.jk/net/sched/sch_generic.c
@@ -94,6 +94,7 @@ static inline int handle_dev_cpu_collisi
 					   struct Qdisc *q)
 {
 	int ret;
+	unsigned long flags;
 
 	if (unlikely(dev->xmit_lock_owner == (void *)current)) {
 		/*
@@ -112,7 +113,9 @@ static inline int handle_dev_cpu_collisi
 		 * Another cpu is holding lock, requeue & delay xmits for
 		 * some time.
 		 */
+		local_irq_save(flags);
 		__get_cpu_var(netdev_rx_stat).cpu_collision++;
+		local_irq_restore(flags);
 		ret = dev_requeue_skb(skb, dev, q);
 	}
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] BUG: using smp_processor_id() in preemptible [00000000] code: caller is __qdisc_run
  2008-08-11 13:11 [PATCH] BUG: using smp_processor_id() in preemptible [00000000] code: caller is __qdisc_run John Kacur
@ 2008-08-11 21:00 ` David Miller
  2008-08-11 21:09   ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2008-08-11 21:00 UTC (permalink / raw)
  To: jkacur; +Cc: linux-rt-users, linux-kernel, mingo, tglx, rostedt, a.p.zijlstra

From: "John Kacur" <jkacur@gmail.com>
Date: Mon, 11 Aug 2008 15:11:46 +0200

> __qdisc_run() calls qdisc_restart() which calls
> handle_dev_cpu_collision(skb, dev, q); and then the problem shows up
> here:
> __get_cpu_var(netdev_rx_stat).cpu_collision++;
> 
> The solution is to disable interrupts around the above increment. Here
> is an attached patch to do so. (Thank's to Peter Zijlstra for help in
> the analysis and dropping the answer in my lap, so if I got it right
> it is due to his help, but if I messed it up, then I did that part all
> by myself.)

__qdisc_run() always runs in software interrupt context,
so I guess this is some problem with the -rt stuff running
software interrupts in threads?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] BUG: using smp_processor_id() in preemptible [00000000] code: caller is __qdisc_run
  2008-08-11 21:00 ` David Miller
@ 2008-08-11 21:09   ` Peter Zijlstra
  2008-08-11 21:23     ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2008-08-11 21:09 UTC (permalink / raw)
  To: David Miller; +Cc: jkacur, linux-rt-users, linux-kernel, mingo, tglx, rostedt

On Mon, 2008-08-11 at 14:00 -0700, David Miller wrote:
> From: "John Kacur" <jkacur@gmail.com>
> Date: Mon, 11 Aug 2008 15:11:46 +0200
> 
> > __qdisc_run() calls qdisc_restart() which calls
> > handle_dev_cpu_collision(skb, dev, q); and then the problem shows up
> > here:
> > __get_cpu_var(netdev_rx_stat).cpu_collision++;
> > 
> > The solution is to disable interrupts around the above increment. Here
> > is an attached patch to do so. (Thank's to Peter Zijlstra for help in
> > the analysis and dropping the answer in my lap, so if I got it right
> > it is due to his help, but if I messed it up, then I did that part all
> > by myself.)
> 
> __qdisc_run() always runs in software interrupt context,
> so I guess this is some problem with the -rt stuff running
> software interrupts in threads?

Hmm, good point - and those threads should be cpu affine on -rt if I'm
not mistaken. Steven, do you happen to remember details?


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] BUG: using smp_processor_id() in preemptible [00000000] code: caller is __qdisc_run
  2008-08-11 21:09   ` Peter Zijlstra
@ 2008-08-11 21:23     ` David Miller
  2008-08-12  1:22       ` Steven Rostedt
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2008-08-11 21:23 UTC (permalink / raw)
  To: a.p.zijlstra; +Cc: jkacur, linux-rt-users, linux-kernel, mingo, tglx, rostedt

From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon, 11 Aug 2008 23:09:38 +0200

> On Mon, 2008-08-11 at 14:00 -0700, David Miller wrote:
> > From: "John Kacur" <jkacur@gmail.com>
> > Date: Mon, 11 Aug 2008 15:11:46 +0200
> > 
> > > __qdisc_run() calls qdisc_restart() which calls
> > > handle_dev_cpu_collision(skb, dev, q); and then the problem shows up
> > > here:
> > > __get_cpu_var(netdev_rx_stat).cpu_collision++;
> > > 
> > > The solution is to disable interrupts around the above increment. Here
> > > is an attached patch to do so. (Thank's to Peter Zijlstra for help in
> > > the analysis and dropping the answer in my lap, so if I got it right
> > > it is due to his help, but if I messed it up, then I did that part all
> > > by myself.)
> > 
> > __qdisc_run() always runs in software interrupt context,
> > so I guess this is some problem with the -rt stuff running
> > software interrupts in threads?
> 
> Hmm, good point - and those threads should be cpu affine on -rt if I'm
> not mistaken. Steven, do you happen to remember details?

The key issue is whether those threads run software interrupts
in a compatible environment.  And such a proper environment allows
plain smp_processor_id() without any special preparations.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] BUG: using smp_processor_id() in preemptible [00000000] code: caller is __qdisc_run
  2008-08-11 21:23     ` David Miller
@ 2008-08-12  1:22       ` Steven Rostedt
  2008-08-12  2:06         ` Gregory Haskins
  0 siblings, 1 reply; 6+ messages in thread
From: Steven Rostedt @ 2008-08-12  1:22 UTC (permalink / raw)
  To: David Miller
  Cc: a.p.zijlstra, jkacur, linux-rt-users, linux-kernel, mingo, tglx


On Mon, 11 Aug 2008, David Miller wrote:

> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Mon, 11 Aug 2008 23:09:38 +0200
> 
> > On Mon, 2008-08-11 at 14:00 -0700, David Miller wrote:
> > > From: "John Kacur" <jkacur@gmail.com>
> > > Date: Mon, 11 Aug 2008 15:11:46 +0200
> > > 
> > > > __qdisc_run() calls qdisc_restart() which calls
> > > > handle_dev_cpu_collision(skb, dev, q); and then the problem shows up
> > > > here:
> > > > __get_cpu_var(netdev_rx_stat).cpu_collision++;
> > > > 
> > > > The solution is to disable interrupts around the above increment. Here
> > > > is an attached patch to do so. (Thank's to Peter Zijlstra for help in
> > > > the analysis and dropping the answer in my lap, so if I got it right
> > > > it is due to his help, but if I messed it up, then I did that part all
> > > > by myself.)
> > > 
> > > __qdisc_run() always runs in software interrupt context,
> > > so I guess this is some problem with the -rt stuff running
> > > software interrupts in threads?
> > 
> > Hmm, good point - and those threads should be cpu affine on -rt if I'm
> > not mistaken. Steven, do you happen to remember details?
> 
> The key issue is whether those threads run software interrupts
> in a compatible environment.  And such a proper environment allows
> plain smp_processor_id() without any special preparations.
> 

Yes, we have a softirq thread per CPU. We should have a test in the 
smp_processor_id for rt to not bug if it is called by known "per_cpu" 
threads.

-- Steve


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] BUG: using smp_processor_id() in preemptible [00000000] code: caller is __qdisc_run
  2008-08-12  1:22       ` Steven Rostedt
@ 2008-08-12  2:06         ` Gregory Haskins
  0 siblings, 0 replies; 6+ messages in thread
From: Gregory Haskins @ 2008-08-12  2:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: David Miller, a.p.zijlstra, jkacur, linux-rt-users, linux-kernel,
	mingo, tglx

[-- Attachment #1: Type: text/plain, Size: 2206 bytes --]

Steven Rostedt wrote:
> On Mon, 11 Aug 2008, David Miller wrote:
>
>   
>> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Date: Mon, 11 Aug 2008 23:09:38 +0200
>>
>>     
>>> On Mon, 2008-08-11 at 14:00 -0700, David Miller wrote:
>>>       
>>>> From: "John Kacur" <jkacur@gmail.com>
>>>> Date: Mon, 11 Aug 2008 15:11:46 +0200
>>>>
>>>>         
>>>>> __qdisc_run() calls qdisc_restart() which calls
>>>>> handle_dev_cpu_collision(skb, dev, q); and then the problem shows up
>>>>> here:
>>>>> __get_cpu_var(netdev_rx_stat).cpu_collision++;
>>>>>
>>>>> The solution is to disable interrupts around the above increment. Here
>>>>> is an attached patch to do so. (Thank's to Peter Zijlstra for help in
>>>>> the analysis and dropping the answer in my lap, so if I got it right
>>>>> it is due to his help, but if I messed it up, then I did that part all
>>>>> by myself.)
>>>>>           
>>>> __qdisc_run() always runs in software interrupt context,
>>>> so I guess this is some problem with the -rt stuff running
>>>> software interrupts in threads?
>>>>         
>>> Hmm, good point - and those threads should be cpu affine on -rt if I'm
>>> not mistaken. Steven, do you happen to remember details?
>>>       
>> The key issue is whether those threads run software interrupts
>> in a compatible environment.  And such a proper environment allows
>> plain smp_processor_id() without any special preparations.
>>
>>     
>
> Yes, we have a softirq thread per CPU. We should have a test in the 
> smp_processor_id for rt to not bug if it is called by known "per_cpu" 
> threads.
>   
something like

#ifdef CONFIG_PREEMPT_RT
    WARN_ON(!in_atomic() && current->rt.nr_cpus_allowed > 1);
#else
    WARN_ON(!in_atomic());
#endif

would probably work and be fairly efficient.  Of course nr_cpus_allowed 
technically could be adjusted at any time, so perhaps not.  Its probably 
good enough for a warning check, however.

> -- Steve
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>   



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-08-12  2:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-11 13:11 [PATCH] BUG: using smp_processor_id() in preemptible [00000000] code: caller is __qdisc_run John Kacur
2008-08-11 21:00 ` David Miller
2008-08-11 21:09   ` Peter Zijlstra
2008-08-11 21:23     ` David Miller
2008-08-12  1:22       ` Steven Rostedt
2008-08-12  2:06         ` Gregory Haskins

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).