linux-wireless.vger.kernel.org archive mirror
* Kernel splat from 3.5.7+ (tainted)
@ 2012-11-26 17:10 Ben Greear
  2012-11-26 17:46 ` Ben Greear
  2012-11-26 17:49 ` Johannes Berg
  0 siblings, 2 replies; 7+ messages in thread
From: Ben Greear @ 2012-11-26 17:10 UTC
  To: linux-wireless@vger.kernel.org

This looks like some sort of locking bug...the warning comes from the code
in softirq.c (below). For what it's worth, the tainting module was not in active use.

I should be able to get source code printout for the various addresses
if there is anything of particular interest.


static inline void _local_bh_enable_ip(unsigned long ip)
{
	WARN_ON_ONCE(in_irq() || irqs_disabled());
#ifdef CONFIG_TRACE_IRQFLAGS
	local_irq_disable();
#endif


Nov 21 19:33:17 localhost kernel: WARNING: at /home/greearb/git/linux-3.5.dev.y/kernel/softirq.c:159 _local_bh_enable_ip+0x41/0x9f()
Nov 21 19:33:17 localhost kernel: Hardware name: To be filled by O.E.M.
Nov 21 19:33:17 localhost kernel: Modules linked in: bnep bluetooth fuse 8021q garp stp llc macvlan wanlink(PO) pktgen lockd sunrpc gpio_ich ppdev coretemp 
hwmon kvm snd_hda_codec_realtek microcode serio_raw snd_hda_intel pcspkr snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm i2c_i801 lpc_ich mfd_core ath9k 
ath9k_common ath9k_hw ath mac80211 cfg80211 e1000e snd_page_alloc snd_timer snd soundcore parport_pc parport uinput ipv6 i915 video i2c_algo_bit drm_kms_helper 
drm i2c_core [last unloaded: nf_nat]
Nov 21 19:33:17 localhost kernel: Pid: 5905, comm: kworker/u:0 Tainted: P           O 3.5.7+ #27
Nov 21 19:33:17 localhost kernel: Call Trace:
Nov 21 19:33:17 localhost kernel: <IRQ>  [<ffffffff8105c5cc>] warn_slowpath_common+0x80/0x98
Nov 21 19:33:17 localhost kernel: [<ffffffff8105c5f9>] warn_slowpath_null+0x15/0x17
Nov 21 19:33:17 localhost kernel: [<ffffffff81062dfa>] _local_bh_enable_ip+0x41/0x9f
Nov 21 19:33:17 localhost kernel: [<ffffffff81062e61>] local_bh_enable_ip+0x9/0xb
Nov 21 19:33:17 localhost kernel: [<ffffffff814e31a6>] _raw_spin_unlock_bh+0x1c/0x1e
Nov 21 19:33:17 localhost kernel: [<ffffffff8144fee4>] destroy_conntrack+0xbd/0xfc
Nov 21 19:33:17 localhost kernel: [<ffffffff8144da47>] nf_conntrack_destroy+0x27/0x2e
Nov 21 19:33:17 localhost kernel: [<ffffffff81422934>] skb_release_head_state+0x9a/0xdc
Nov 21 19:33:17 localhost kernel: [<ffffffff81422b77>] __kfree_skb+0x11/0x7d
Nov 21 19:33:17 localhost kernel: [<ffffffff81422c2c>] consume_skb+0x28/0x2a
Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa2f9>] __ieee80211_tx+0x1f9/0x31a [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffff810247b2>] ? smp_apic_timer_interrupt+0x85/0x93
Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa4e0>] ieee80211_tx+0xc6/0xed [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffff81422c00>] ? kfree_skb_partial+0x1d/0x21
Nov 21 19:33:17 localhost kernel: [<ffffffff81422d6b>] ? pskb_expand_head+0x13d/0x1eb
Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa95c>] ieee80211_xmit+0xbe/0xcc [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffffa01fb4ab>] ieee80211_subif_start_xmit+0xae2/0xb00 [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffff81084f17>] ? load_balance+0xc3/0x5ea
Nov 21 19:33:17 localhost kernel: [<ffffffff8142c92e>] dev_hard_start_xmit+0x3e2/0x4d6
Nov 21 19:33:17 localhost kernel: [<ffffffff814435ef>] sch_direct_xmit+0x6d/0x14d
Nov 21 19:33:17 localhost kernel: [<ffffffff814437de>] __qdisc_run+0x10f/0x12b
Nov 21 19:33:17 localhost kernel: [<ffffffff8142940b>] net_tx_action+0xe9/0x11e
Nov 21 19:33:17 localhost kernel: [<ffffffff81062f55>] __do_softirq+0x86/0x12f
Nov 21 19:33:17 localhost kernel: [<ffffffff814e955c>] call_softirq+0x1c/0x30
Nov 21 19:33:17 localhost kernel: <EOI>  [<ffffffff8100bbd9>] do_softirq+0x41/0x7e
Nov 21 19:33:17 localhost kernel: [<ffffffff81062e33>] _local_bh_enable_ip+0x7a/0x9f
Nov 21 19:33:17 localhost kernel: [<ffffffff81062e70>] local_bh_enable+0xd/0xf
Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa9c7>] ieee80211_tx_skb_tid+0x5d/0x5f [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffffa0200a26>] ieee80211_send_nullfunc+0x5f/0x64 [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffffa01e8cba>] ieee80211_offchannel_return+0x9c/0x1d8 [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffffa01e84d5>] ? ieee80211_request_scan+0x4f/0x4f [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffffa01e794f>] __ieee80211_scan_completed+0x13e/0x179 [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffffa01e84d5>] ? ieee80211_request_scan+0x4f/0x4f [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffffa01e88ed>] ieee80211_scan_work+0x418/0x42f [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffff814e2495>] ? __schedule+0x51f/0x561
Nov 21 19:33:17 localhost kernel: [<ffffffffa01e84d5>] ? ieee80211_request_scan+0x4f/0x4f [mac80211]
Nov 21 19:33:17 localhost kernel: [<ffffffff8106fcc7>] process_one_work+0x1a6/0x278
Nov 21 19:33:17 localhost kernel: [<ffffffff81071cd3>] worker_thread+0x136/0x255
Nov 21 19:33:17 localhost kernel: [<ffffffff81071b9d>] ? manage_workers+0x191/0x191
Nov 21 19:33:17 localhost kernel: [<ffffffff810755d7>] kthread+0x84/0x8c
Nov 21 19:33:17 localhost kernel: [<ffffffff814e9464>] kernel_thread_helper+0x4/0x10
Nov 21 19:33:17 localhost kernel: [<ffffffff81075553>] ? __init_kthread_worker+0x37/0x37
Nov 21 19:33:17 localhost kernel: [<ffffffff814e9460>] ? gs_change+0x13/0x13
Nov 21 19:33:17 localhost kernel: ---[ end trace f0563900e2e456dc ]---
Nov 21 19:33:17 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): sta197: link becomes ready
-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: Kernel splat from 3.5.7+ (tainted)
  2012-11-26 17:10 Kernel splat from 3.5.7+ (tainted) Ben Greear
@ 2012-11-26 17:46 ` Ben Greear
  2012-11-26 17:56   ` Johannes Berg
  2012-11-26 17:49 ` Johannes Berg
  1 sibling, 1 reply; 7+ messages in thread
From: Ben Greear @ 2012-11-26 17:46 UTC
  To: linux-wireless@vger.kernel.org

On 11/26/2012 09:10 AM, Ben Greear wrote:
> This looks like some sort of locking bug...the warning comes from the code
> in softirq.c (below). For what it's worth, the tainting module was not in active use.
>
> I should be able to get source code printout for the various addresses
> if there is anything of particular interest.

Here's some decoding below... it seems that the mac80211 code frees an skb
with dev_kfree_skb(skb) while holding a lock taken with spin_lock_irqsave(),
and then eventually we get the splat warning.

I'm not really sure what the problem is, however.
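The decoded chain can be modelled in userspace to show why the warning fires: the pending-queue lock is taken with spin_lock_irqsave(), so interrupts are off when dev_kfree_skb() runs, and the conntrack destructor's spin_unlock_bh() then re-enables bottom halves, which _local_bh_enable_ip() refuses to do with IRQs disabled. This is a hypothetical simulation, not kernel code: the `model_*` functions, the `irqs_off` flag and the `bh_warnings` counter are illustrative stand-ins for the real per-CPU state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-CPU state the kernel checks. */
static bool irqs_off;     /* stands in for irqs_disabled() */
static int  bh_warnings;  /* counts would-be WARN_ON_ONCE() hits */

/* Models spin_lock_irqsave() on queue_stop_reason_lock. */
static bool model_lock_irqsave(void)
{
	bool saved = irqs_off;
	irqs_off = true;
	return saved;
}

/* Models spin_unlock_irqrestore(): restore the saved IRQ state. */
static void model_unlock_irqrestore(bool saved)
{
	irqs_off = saved;
}

/* Models the check in _local_bh_enable_ip(). */
static void model_local_bh_enable(void)
{
	if (irqs_off)
		bh_warnings++;
}

/* Models consume_skb() -> ... -> destroy_conntrack(), whose
 * spin_unlock_bh(&nf_conntrack_lock) ends in local_bh_enable_ip(). */
static void model_free_skb_with_conntrack(void)
{
	model_local_bh_enable();
}

/* The patched drop path: the free happens while the irqsave lock is held. */
static void model_drop_under_lock(void)
{
	bool flags = model_lock_irqsave();
	model_free_skb_with_conntrack();  /* like dev_kfree_skb(skb) at tx.c:1256 */
	model_unlock_irqrestore(flags);
}
```

Calling `model_drop_under_lock()` bumps `bh_warnings`, while calling `model_free_skb_with_conntrack()` with the lock released does not, which matches the trace only firing on the drop path.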

> static inline void _local_bh_enable_ip(unsigned long ip)
> {
>      WARN_ON_ONCE(in_irq() || irqs_disabled());
> #ifdef CONFIG_TRACE_IRQFLAGS
>      local_irq_disable();
> #endif
>
>
> Nov 21 19:33:17 localhost kernel: WARNING: at /home/greearb/git/linux-3.5.dev.y/kernel/softirq.c:159 _local_bh_enable_ip+0x41/0x9f()
> Nov 21 19:33:17 localhost kernel: Hardware name: To be filled by O.E.M.
> Nov 21 19:33:17 localhost kernel: Modules linked in: bnep bluetooth fuse 8021q garp stp llc macvlan wanlink(PO) pktgen lockd sunrpc gpio_ich ppdev coretemp
> hwmon kvm snd_hda_codec_realtek microcode serio_raw snd_hda_intel pcspkr snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm i2c_i801 lpc_ich mfd_core ath9k
> ath9k_common ath9k_hw ath mac80211 cfg80211 e1000e snd_page_alloc snd_timer snd soundcore parport_pc parport uinput ipv6 i915 video i2c_algo_bit drm_kms_helper
> drm i2c_core [last unloaded: nf_nat]
> Nov 21 19:33:17 localhost kernel: Pid: 5905, comm: kworker/u:0 Tainted: P           O 3.5.7+ #27
> Nov 21 19:33:17 localhost kernel: Call Trace:
> Nov 21 19:33:17 localhost kernel: <IRQ>  [<ffffffff8105c5cc>] warn_slowpath_common+0x80/0x98
> Nov 21 19:33:17 localhost kernel: [<ffffffff8105c5f9>] warn_slowpath_null+0x15/0x17
> Nov 21 19:33:17 localhost kernel: [<ffffffff81062dfa>] _local_bh_enable_ip+0x41/0x9f

(gdb) l *( _local_bh_enable_ip+0x41)
0xffffffff81062dfa is in _local_bh_enable_ip (/home/greearb/git/linux-3.5.dev.y/kernel/softirq.c:159).
154	
155	EXPORT_SYMBOL(_local_bh_enable);
156	
157	static inline void _local_bh_enable_ip(unsigned long ip)
158	{
159		WARN_ON_ONCE(in_irq() || irqs_disabled());
160	#ifdef CONFIG_TRACE_IRQFLAGS
161		local_irq_disable();
162	#endif
163		/*
(gdb)


> Nov 21 19:33:17 localhost kernel: [<ffffffff81062e61>] local_bh_enable_ip+0x9/0xb

(gdb) l *(local_bh_enable_ip+0x9)
0xffffffff81062e61 is in local_bh_enable_ip (/home/greearb/git/linux-3.5.dev.y/kernel/softirq.c:193).
188	EXPORT_SYMBOL(local_bh_enable);
189	
190	void local_bh_enable_ip(unsigned long ip)
191	{
192		_local_bh_enable_ip(ip);
193	}
194	EXPORT_SYMBOL(local_bh_enable_ip);
195	
196	/*
197	 * We restart softirq processing MAX_SOFTIRQ_RESTART times,


> Nov 21 19:33:17 localhost kernel: [<ffffffff814e31a6>] _raw_spin_unlock_bh+0x1c/0x1e

(gdb) l *(_raw_spin_unlock_bh+0x1c)
0xffffffff814e31a6 is in _raw_spin_unlock_bh (/home/greearb/git/linux-3.5.dev.y/kernel/spinlock.c:194).
189	
190	#ifndef CONFIG_INLINE_SPIN_UNLOCK_BH
191	void __lockfunc _raw_spin_unlock_bh(raw_spinlock_t *lock)
192	{
193		__raw_spin_unlock_bh(lock);
194	}
195	EXPORT_SYMBOL(_raw_spin_unlock_bh);
196	#endif
197	
198	#ifndef CONFIG_INLINE_READ_TRYLOCK
(gdb)


> Nov 21 19:33:17 localhost kernel: [<ffffffff8144fee4>] destroy_conntrack+0xbd/0xfc

0xffffffff8144fee4 is in destroy_conntrack (/home/greearb/git/linux-3.5.dev.y/net/netfilter/nf_conntrack_core.c:227).
222		}
223	
224		NF_CT_STAT_INC(net, delete);
225		spin_unlock_bh(&nf_conntrack_lock);
226	
227		if (ct->master)
228			nf_ct_put(ct->master);
229	
230		pr_debug("destroy_conntrack: returning ct=%p to slab\n", ct);
231		nf_conntrack_free(ct);


> Nov 21 19:33:17 localhost kernel: [<ffffffff8144da47>] nf_conntrack_destroy+0x27/0x2e

0xffffffff8144da47 is in nf_conntrack_destroy (/home/greearb/git/linux-3.5.dev.y/include/linux/rcupdate.h:754).
749	{
750		rcu_lockdep_assert(!rcu_is_cpu_idle(),
751				   "rcu_read_unlock() used illegally while idle");
752		rcu_lock_release(&rcu_lock_map);
753		__release(RCU);
754		__rcu_read_unlock();
755	}
756	
757	/**
758	 * rcu_read_lock_bh() - mark the beginning of an RCU-bh critical section
(gdb)


> Nov 21 19:33:17 localhost kernel: [<ffffffff81422934>] skb_release_head_state+0x9a/0xdc

0xffffffff81422934 is in skb_release_head_state (/home/greearb/git/linux-3.5.dev.y/net/core/skbuff.c:497).
492		}
493	#if IS_ENABLED(CONFIG_NF_CONNTRACK)
494		nf_conntrack_put(skb->nfct);
495	#endif
496	#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
497		nf_conntrack_put_reasm(skb->nfct_reasm);
498	#endif
499	#ifdef CONFIG_BRIDGE_NETFILTER
500		nf_bridge_put(skb->nf_bridge);
501	#endif
(gdb)

> Nov 21 19:33:17 localhost kernel: [<ffffffff81422b77>] __kfree_skb+0x11/0x7d

0xffffffff81422b77 is in __kfree_skb (/home/greearb/git/linux-3.5.dev.y/net/core/skbuff.c:515).
510	
511	/* Free everything but the sk_buff shell. */
512	static void skb_release_all(struct sk_buff *skb)
513	{
514		skb_release_head_state(skb);
515		skb_release_data(skb);
516	}
517	
518	/**
519	 *	__kfree_skb - private function

> Nov 21 19:33:17 localhost kernel: [<ffffffff81422c2c>] consume_skb+0x28/0x2a

0xffffffff81422c2c is in consume_skb (/home/greearb/git/linux-3.5.dev.y/net/core/skbuff.c:572).
567			smp_rmb();
568		else if (likely(!atomic_dec_and_test(&skb->users)))
569			return;
570		trace_consume_skb(skb);
571		__kfree_skb(skb);
572	}
573	EXPORT_SYMBOL(consume_skb);
574	
575	/**
576	 * 	skb_recycle - clean up an skb for reuse
(gdb)

> Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa2f9>] __ieee80211_tx+0x1f9/0x31a [mac80211]

The line above is reached with this spinlock held:


	spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
...

0x182f9 is in __ieee80211_tx (/home/greearb/git/linux-3.5.dev.y/net/mac80211/tx.c:1256).
1251					skb_queue_splice_init(skbs, &local->pending[q]);
1252				} else {
1253					u32 len = skb_queue_len(&local->pending[q]);
1254					if (len >= max_pending_qsize) {
1255						__skb_unlink(skb, skbs);
1256						dev_kfree_skb(skb);
1257						/* TODO:  Add counter for this */
1258					} else {
1259						skb_queue_splice_tail_init(skbs,
1260									   &local->pending[q]);
(gdb)


> Nov 21 19:33:17 localhost kernel: [<ffffffff810247b2>] ? smp_apic_timer_interrupt+0x85/0x93
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa4e0>] ieee80211_tx+0xc6/0xed [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffff81422c00>] ? kfree_skb_partial+0x1d/0x21
> Nov 21 19:33:17 localhost kernel: [<ffffffff81422d6b>] ? pskb_expand_head+0x13d/0x1eb
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa95c>] ieee80211_xmit+0xbe/0xcc [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01fb4ab>] ieee80211_subif_start_xmit+0xae2/0xb00 [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffff81084f17>] ? load_balance+0xc3/0x5ea
> Nov 21 19:33:17 localhost kernel: [<ffffffff8142c92e>] dev_hard_start_xmit+0x3e2/0x4d6
> Nov 21 19:33:17 localhost kernel: [<ffffffff814435ef>] sch_direct_xmit+0x6d/0x14d
> Nov 21 19:33:17 localhost kernel: [<ffffffff814437de>] __qdisc_run+0x10f/0x12b
> Nov 21 19:33:17 localhost kernel: [<ffffffff8142940b>] net_tx_action+0xe9/0x11e
> Nov 21 19:33:17 localhost kernel: [<ffffffff81062f55>] __do_softirq+0x86/0x12f
> Nov 21 19:33:17 localhost kernel: [<ffffffff814e955c>] call_softirq+0x1c/0x30
> Nov 21 19:33:17 localhost kernel: <EOI>  [<ffffffff8100bbd9>] do_softirq+0x41/0x7e
> Nov 21 19:33:17 localhost kernel: [<ffffffff81062e33>] _local_bh_enable_ip+0x7a/0x9f
> Nov 21 19:33:17 localhost kernel: [<ffffffff81062e70>] local_bh_enable+0xd/0xf
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa9c7>] ieee80211_tx_skb_tid+0x5d/0x5f [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffffa0200a26>] ieee80211_send_nullfunc+0x5f/0x64 [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01e8cba>] ieee80211_offchannel_return+0x9c/0x1d8 [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01e84d5>] ? ieee80211_request_scan+0x4f/0x4f [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01e794f>] __ieee80211_scan_completed+0x13e/0x179 [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01e84d5>] ? ieee80211_request_scan+0x4f/0x4f [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01e88ed>] ieee80211_scan_work+0x418/0x42f [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffff814e2495>] ? __schedule+0x51f/0x561
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01e84d5>] ? ieee80211_request_scan+0x4f/0x4f [mac80211]
> Nov 21 19:33:17 localhost kernel: [<ffffffff8106fcc7>] process_one_work+0x1a6/0x278
> Nov 21 19:33:17 localhost kernel: [<ffffffff81071cd3>] worker_thread+0x136/0x255
> Nov 21 19:33:17 localhost kernel: [<ffffffff81071b9d>] ? manage_workers+0x191/0x191
> Nov 21 19:33:17 localhost kernel: [<ffffffff810755d7>] kthread+0x84/0x8c
> Nov 21 19:33:17 localhost kernel: [<ffffffff814e9464>] kernel_thread_helper+0x4/0x10
> Nov 21 19:33:17 localhost kernel: [<ffffffff81075553>] ? __init_kthread_worker+0x37/0x37
> Nov 21 19:33:17 localhost kernel: [<ffffffff814e9460>] ? gs_change+0x13/0x13
> Nov 21 19:33:17 localhost kernel: ---[ end trace f0563900e2e456dc ]---
> Nov 21 19:33:17 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): sta197: link becomes ready





* Re: Kernel splat from 3.5.7+ (tainted)
  2012-11-26 17:10 Kernel splat from 3.5.7+ (tainted) Ben Greear
  2012-11-26 17:46 ` Ben Greear
@ 2012-11-26 17:49 ` Johannes Berg
  1 sibling, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2012-11-26 17:49 UTC
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Mon, 2012-11-26 at 09:10 -0800, Ben Greear wrote:
> This looks like some sort of locking bug...the warning comes from the code
> in softirq.c (below).
...
> Nov 21 19:33:17 localhost kernel: WARNING: at /home/greearb/git/linux-3.5.dev.y/kernel/softirq.c:159 _local_bh_enable_ip+0x41/0x9f()
> Nov 21 19:33:17 localhost kernel: Hardware name: To be filled by O.E.M.
...
> Nov 21 19:33:17 localhost kernel: Pid: 5905, comm: kworker/u:0 Tainted: P           O 3.5.7+ #27
> Nov 21 19:33:17 localhost kernel: Call Trace:
> Nov 21 19:33:17 localhost kernel: <IRQ>  [<ffffffff8105c5cc>] warn_slowpath_common+0x80/0x98
> Nov 21 19:33:17 localhost kernel: [<ffffffff8105c5f9>] warn_slowpath_null+0x15/0x17
> Nov 21 19:33:17 localhost kernel: [<ffffffff81062dfa>] _local_bh_enable_ip+0x41/0x9f
> Nov 21 19:33:17 localhost kernel: [<ffffffff81062e61>] local_bh_enable_ip+0x9/0xb
> Nov 21 19:33:17 localhost kernel: [<ffffffff814e31a6>] _raw_spin_unlock_bh+0x1c/0x1e
> Nov 21 19:33:17 localhost kernel: [<ffffffff8144fee4>] destroy_conntrack+0xbd/0xfc
> Nov 21 19:33:17 localhost kernel: [<ffffffff8144da47>] nf_conntrack_destroy+0x27/0x2e
> Nov 21 19:33:17 localhost kernel: [<ffffffff81422934>] skb_release_head_state+0x9a/0xdc
> Nov 21 19:33:17 localhost kernel: [<ffffffff81422b77>] __kfree_skb+0x11/0x7d
> Nov 21 19:33:17 localhost kernel: [<ffffffff81422c2c>] consume_skb+0x28/0x2a
> Nov 21 19:33:17 localhost kernel: [<ffffffffa01fa2f9>] __ieee80211_tx+0x1f9/0x31a [mac80211]


I think the problem is the dev_kfree_skb(), it should be
dev_kfree_skb_any() in line 1299 in tx.c because ieee80211_tx() calls
__ieee80211_tx() with RCU lock held. It has pretty much done that
forever though, so it's surprising this shows up now? Was that some sort
of special frame?

Note that this is actually fixed in wireless-next.
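As background on the suggested fix: in kernels of this era, dev_kfree_skb_any() frees immediately when it is safe to do so and otherwise hands the skb to dev_kfree_skb_irq(), which queues it on a per-CPU completion queue that net_tx_action() drains later from softirq context. The sketch below models that decision in userspace; `struct model_skb`, `completion_q` and the `model_*` functions are hypothetical stand-ins, and the model only checks the IRQs-disabled half of the kernel's `in_irq() || irqs_disabled()` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative, not kernel APIs. */
struct model_skb {
	struct model_skb *next;
	bool freed;
};

static bool irqs_off;                  /* stands in for irqs_disabled() */
static int bh_warnings;                /* would-be WARN_ON_ONCE() hits */
static struct model_skb *completion_q; /* models the per-CPU completion queue */

/* Immediate free: the conntrack destructor ends in local_bh_enable. */
static void model_dev_kfree_skb(struct model_skb *skb)
{
	if (irqs_off)
		bh_warnings++;
	skb->freed = true;
}

/* Like dev_kfree_skb_irq(): only queue the skb for a later free. */
static void model_dev_kfree_skb_irq(struct model_skb *skb)
{
	skb->next = completion_q;
	completion_q = skb;
}

/* Like dev_kfree_skb_any(): choose the path from the current context. */
static void model_dev_kfree_skb_any(struct model_skb *skb)
{
	if (irqs_off)
		model_dev_kfree_skb_irq(skb);
	else
		model_dev_kfree_skb(skb);
}

/* Models net_tx_action() draining the queue once IRQs are enabled. */
static void model_drain_completion_queue(void)
{
	while (completion_q) {
		struct model_skb *skb = completion_q;
		completion_q = skb->next;
		model_dev_kfree_skb(skb);
	}
}
```

With `irqs_off` set, `model_dev_kfree_skb_any()` leaves the skb queued and raises no warning; the skb is actually freed on the next drain, after interrupts are back on.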

johannes



* Re: Kernel splat from 3.5.7+ (tainted)
  2012-11-26 17:46 ` Ben Greear
@ 2012-11-26 17:56   ` Johannes Berg
  2012-11-26 19:37     ` Ben Greear
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2012-11-26 17:56 UTC
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Mon, 2012-11-26 at 09:46 -0800, Ben Greear wrote:

> 0x182f9 is in __ieee80211_tx (/home/greearb/git/linux-3.5.dev.y/net/mac80211/tx.c:1256).
> 1251					skb_queue_splice_init(skbs, &local->pending[q]);
> 1252				} else {
> 1253					u32 len = skb_queue_len(&local->pending[q]);
> 1254					if (len >= max_pending_qsize) {
> 1255						__skb_unlink(skb, skbs);
> 1256						dev_kfree_skb(skb);
> 1257						/* TODO:  Add counter for this */
> 1258					} else {

Wait .. this appears to be a local patch you have, it doesn't exist.
That explains why, the bug doesn't exist upstream (all freeing there is
outside the queue lock)

johannes



* Re: Kernel splat from 3.5.7+ (tainted)
  2012-11-26 17:56   ` Johannes Berg
@ 2012-11-26 19:37     ` Ben Greear
  2012-11-26 22:00       ` Johannes Berg
  0 siblings, 1 reply; 7+ messages in thread
From: Ben Greear @ 2012-11-26 19:37 UTC
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 11/26/2012 09:56 AM, Johannes Berg wrote:
> On Mon, 2012-11-26 at 09:46 -0800, Ben Greear wrote:
>
>> 0x182f9 is in __ieee80211_tx (/home/greearb/git/linux-3.5.dev.y/net/mac80211/tx.c:1256).
>> 1251					skb_queue_splice_init(skbs, &local->pending[q]);
>> 1252				} else {
>> 1253					u32 len = skb_queue_len(&local->pending[q]);
>> 1254					if (len >= max_pending_qsize) {
>> 1255						__skb_unlink(skb, skbs);
>> 1256						dev_kfree_skb(skb);
>> 1257						/* TODO:  Add counter for this */
>> 1258					} else {
>
> Wait .. this appears to be a local patch you have, it doesn't exist.
> That explains why, the bug doesn't exist upstream (all freeing there is
> outside the queue lock)

Ahh, sorry about that... it seems it is entirely my bug.

I added a patch to keep from queuing too many skbs since it can
OOM my system (for instance, when using pktgen to generate traffic,
if I recall correctly).

Probably this bug isn't normally hit even in my code because
we rarely over-drive it like this, and upstream probably never
hits the OOM bug for similar reasons.

In case you are still feeling generous of your time, do you think just
changing the call to dev_kfree_skb_any() and moving it outside
the spin-lock would be a proper fix?

commit 4887df3f5409798f633881df7be6cf7168f78e93
Author: Ben Greear <greearb@candelatech.com>
Date:   Mon Jun 4 17:34:08 2012 -0700

     mac80211: Limit number of pending skbs.

     Current code will allow any number of pending skbs, and
     this can OOM the system when used with something like
     the pktgen tool (which may not back off properly if
     the queue is stopped).

     Possibly this is just a bug in our version of pktgen,
     but either way, it seems reasonable to add a limit
     so that it is not possible to go OOM in this manner.

     Signed-off-by: Ben Greear <greearb@candelatech.com>

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d84727a..c5c3b8e 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -33,6 +33,17 @@
  #include "wpa.h"
  #include "wme.h"
  #include "rate.h"
+#include <linux/moduleparam.h>
+
+/*
+ * Maximum number of skbs that may be queued in a pending
+ * queue.  After that, packets will just be dropped.
+ */
+static int max_pending_qsize = 1000;
+module_param(max_pending_qsize, int, 0644);
+MODULE_PARM_DESC(max_pending_qsize,
+                "Maximum number of skbs that may be queued in a pending queue.");
+

  /* misc utils */

@@ -1236,12 +1247,19 @@ static bool ieee80211_tx_frags(struct ieee80211_local *local,
                          * transmission from the tx-pending tasklet when the
                          * queue is woken again.
                          */
-                       if (txpending)
+                       if (txpending) {
                                 skb_queue_splice_init(skbs, &local->pending[q]);
-                       else
-                               skb_queue_splice_tail_init(skbs,
-                                                          &local->pending[q]);
-
+                       } else {
+                               u32 len = skb_queue_len(&local->pending[q]);
+                               if (len >= max_pending_qsize) {
+                                       __skb_unlink(skb, skbs);
+                                       dev_kfree_skb(skb);
+                                       /* TODO:  Add counter for this */
+                               } else {
+                                       skb_queue_splice_tail_init(skbs,
+                                                                  &local->pending[q]);
+                               }
+                       }
                         spin_unlock_irqrestore(&local->queue_stop_reason_lock,
                                                flags);
                         return false;
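The other option discussed, keeping the limit but moving the free outside the spinlock, can be sketched as: decide under the lock whether to queue or drop, remember the skb to drop, and free it only after the lock is released. This userspace model is hypothetical; `struct model_queue`, `dropped` (standing in for the patch's "TODO: Add counter for this") and the `model_*` functions are illustrative, and `max_pending_qsize` is set to 2 purely for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative only. */
struct model_skb {
	struct model_skb *next;
};

struct model_queue {
	struct model_skb *head, *tail;
	unsigned int len;
};

static bool irqs_off;                      /* stands in for irqs_disabled() */
static int bh_warnings;                    /* would-be WARN_ON_ONCE() hits */
static unsigned int dropped;               /* models the missing drop counter */
static unsigned int max_pending_qsize = 2; /* small value for the example */

/* Freeing runs the conntrack destructor, which ends in local_bh_enable. */
static void model_free(struct model_skb *skb)
{
	(void)skb;
	if (irqs_off)
		bh_warnings++;
}

static void model_enqueue(struct model_queue *q, struct model_skb *skb)
{
	skb->next = NULL;
	if (q->tail)
		q->tail->next = skb;
	else
		q->head = skb;
	q->tail = skb;
	q->len++;
}

/* Queue skb, or drop it if the pending queue is full. The drop is only
 * recorded under the lock; the free happens after unlocking. */
static void model_queue_or_drop(struct model_queue *pending,
				struct model_skb *skb)
{
	struct model_skb *to_free = NULL;

	irqs_off = true;                   /* spin_lock_irqsave() */
	if (pending->len >= max_pending_qsize) {
		to_free = skb;
		dropped++;
	} else {
		model_enqueue(pending, skb);
	}
	irqs_off = false;                  /* spin_unlock_irqrestore() */

	if (to_free)
		model_free(to_free);       /* IRQs enabled again: safe to free */
}
```

Queuing three skbs into an empty queue leaves two pending and one dropped, and no warning is recorded because the drop is freed after the unlock.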

>
> johannes
>





* Re: Kernel splat from 3.5.7+ (tainted)
  2012-11-26 19:37     ` Ben Greear
@ 2012-11-26 22:00       ` Johannes Berg
  2012-11-26 22:25         ` Ben Greear
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Berg @ 2012-11-26 22:00 UTC
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Mon, 2012-11-26 at 11:37 -0800, Ben Greear wrote:

> >> 0x182f9 is in __ieee80211_tx (/home/greearb/git/linux-3.5.dev.y/net/mac80211/tx.c:1256).
> >> 1251					skb_queue_splice_init(skbs, &local->pending[q]);
> >> 1252				} else {
> >> 1253					u32 len = skb_queue_len(&local->pending[q]);
> >> 1254					if (len >= max_pending_qsize) {
> >> 1255						__skb_unlink(skb, skbs);
> >> 1256						dev_kfree_skb(skb);
> >> 1257						/* TODO:  Add counter for this */
> >> 1258					} else {
> >
> > Wait .. this appears to be a local patch you have, it doesn't exist.
> > That explains why, the bug doesn't exist upstream (all freeing there is
> > outside the queue lock)
> 
> Ahh, sorry about that... it seems it is entirely my bug.
> 
> I added a patch to keep from queuing too many skbs since it can
> OOM my system (for instance, when using pktgen to generate traffic,
> if I recall correctly).
> 
> Probably this bug isn't normally hit even in my code because
> we rarely over-drive it like this, and upstream probably never
> hits the OOM bug for similar reasons.
> 
> In case you are still feeling generous of your time, do you think just
> changing the call to dev_kfree_skb_any() and moving it outside
> the spin-lock would be a proper fix?

Either one will fix it, I believe, no need to do both.

johannes



* Re: Kernel splat from 3.5.7+ (tainted)
  2012-11-26 22:00       ` Johannes Berg
@ 2012-11-26 22:25         ` Ben Greear
  0 siblings, 0 replies; 7+ messages in thread
From: Ben Greear @ 2012-11-26 22:25 UTC
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 11/26/2012 02:00 PM, Johannes Berg wrote:
> On Mon, 2012-11-26 at 11:37 -0800, Ben Greear wrote:
>
>>>> 0x182f9 is in __ieee80211_tx (/home/greearb/git/linux-3.5.dev.y/net/mac80211/tx.c:1256).
>>>> 1251					skb_queue_splice_init(skbs, &local->pending[q]);
>>>> 1252				} else {
>>>> 1253					u32 len = skb_queue_len(&local->pending[q]);
>>>> 1254					if (len >= max_pending_qsize) {
>>>> 1255						__skb_unlink(skb, skbs);
>>>> 1256						dev_kfree_skb(skb);
>>>> 1257						/* TODO:  Add counter for this */
>>>> 1258					} else {
>>>
>>> Wait .. this appears to be a local patch you have, it doesn't exist.
>>> That explains why, the bug doesn't exist upstream (all freeing there is
>>> outside the queue lock)
>>
>> Ahh, sorry about that... it seems it is entirely my bug.
>>
>> I added a patch to keep from queuing too many skbs since it can
>> OOM my system (for instance, when using pktgen to generate traffic,
>> if I recall correctly).
>>
>> Probably this bug isn't normally hit even in my code because
>> we rarely over-drive it like this, and upstream probably never
>> hits the OOM bug for similar reasons.
>>
>> In case you are still feeling generous of your time, do you think just
>> changing the call to dev_kfree_skb_any() and moving it outside
>> the spin-lock would be a proper fix?
>
> Either one will fix it, I believe, no need to do both.

Thanks.  I went ahead and did both...didn't seem like it
should hurt, at least.

Will beat on this for a while.

Thanks,
Ben



Thread overview: 7+ messages
2012-11-26 17:10 Kernel splat from 3.5.7+ (tainted) Ben Greear
2012-11-26 17:46 ` Ben Greear
2012-11-26 17:56   ` Johannes Berg
2012-11-26 19:37     ` Ben Greear
2012-11-26 22:00       ` Johannes Berg
2012-11-26 22:25         ` Ben Greear
2012-11-26 17:49 ` Johannes Berg
