From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: 3.9.5+: Crash in tcp_input.c:4810.
Date: Mon, 17 Jun 2013 11:17:39 -0700
Message-ID: <1371493059.3252.200.camel@edumazet-glaptop>
References: <51BF50B3.1080403@candelatech.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: netdev
To: Ben Greear
Return-path:
Received: from mail-ee0-f43.google.com ([74.125.83.43]:32840 "EHLO
 mail-ee0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752070Ab3FQSRo (ORCPT ); Mon, 17 Jun 2013 14:17:44 -0400
Received: by mail-ee0-f43.google.com with SMTP id l10so2011383eei.2
 for ; Mon, 17 Jun 2013 11:17:42 -0700 (PDT)
In-Reply-To: <51BF50B3.1080403@candelatech.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Mon, 2013-06-17 at 11:08 -0700, Ben Greear wrote:
> This is from a 3.9.5+ kernel with local patches. We saw this crash during
> a weekend run where we had TCP traffic trying to run on 128+ wifi station
> interfaces as the interfaces associated over and over again (the AP
> could handle no more than 127 stations and would dis-associate others
> when the 128th tried to associate).
>
> The code in question is this from the tcp_collapse() method:
>
>         skb_reserve(nskb, header);
>         memcpy(nskb->head, skb->head, header);
>         memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
>         TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
>         __skb_queue_before(list, skb, nskb);
>         skb_set_owner_r(nskb, sk);
>
>         /* Copy data, releasing collapsed skbs. */
>         while (copy > 0) {
>                 int offset = start - TCP_SKB_CB(skb)->seq;
>                 int size = TCP_SKB_CB(skb)->end_seq - start;
>
>                 BUG_ON(offset < 0);
>
>
>
> ------------[ cut here ]------------
> kernel BUG at /home/greearb/git/linux-3.9.dev.y/net/ipv4/tcp_input.c:4810!
> invalid opcode: 0000 [#1] PREEMPT SMP
> Modules linked in: nf_nat_ipv4 nf_nat 8021q garp stp mrp llc fuse macvlan wanlink(O) pktgen lockd sunrpc f71882fg e1000e ath9k ath9k_common ath9k_hw ath
> mac80211 snd_hda_codec_realtek coretemp snd_hda_intel hwmon snd_hda_codec snd_hwdep mperf intel_powerclamp snd_seq snd_seq_device snd_pcm cfg80211 ptp pps_core
> snd_page_alloc snd_timer kvm cdc_acm i2c_i801 gpio_ich iTCO_wdt iTCO_vendor_support snd soundcore ppdev microcode pcspkr serio_raw lpc_ich parport_pc parport
> uinput ipv6 i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: iptable_nat]
> CPU 1
> Pid: 0, comm: swapper/1 Tainted: G WC O 3.9.5+ #80 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
> RIP: 0010:[] [] tcp_collapse+0x267/0x37a
> RSP: 0018:ffff88022bc83608 EFLAGS: 00010297
> RAX: 0000000000001100 RBX: ffff8801b8f08730 RCX: 0000000000000000
> RDX: 00000000fffffa4d RSI: ffff8801b8f086c0 RDI: ffff880219adbe00
> RBP: ffff88022bc83668 R08: 000000009efbe0a8 R09: ffff8801d25eb328
> R10: ffffffff8109d762 R11: ffff88021791ff00 R12: 000000009efba1f9
> R13: ffff8801d25eb300 R14: ffff880219adbe00 R15: 0000000000000df0
> FS: 0000000000000000(0000) GS:ffff88022bc80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000286f350 CR3: 0000000001a0c000 CR4: 00000000000007e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper/1 (pid: 0, threadinfo ffff880222162000, task ffff88022215ddc0)
> Stack:
> ffff88022bc83618 ffff8801000000d0 9efc19bcfffffa4d 0000000000000000
> ffff8801b8f086c0 ffff880219adbe28 ffff88022bc83698 ffff8801b8f086c0
> ffff8801b8f08c88 ffff8801b8f08c88 0000000000000a80 ffff8801c7841d00
> Call Trace:
>
> [] tcp_try_rmem_schedule+0x1c7/0x26d
> [] tcp_data_queue+0x1a9/0xa7e
> [] tcp_rcv_established+0x63b/0x696
> [] tcp_v4_do_rcv+0x1bd/0x37d
> [] tcp_v4_rcv+0x4ed/0x7d7
> [] ? nf_hook_slow+0x102/0x113
> [] ? xfrm4_policy_check.clone.0+0x4f/0x4f
> [] ip_local_deliver_finish+0x11c/0x199
> [] ? xfrm4_policy_check.clone.0+0x4f/0x4f
> [] ? xfrm4_policy_check.clone.0+0x4f/0x4f
> [] NF_HOOK.clone.1+0x4c/0x53
> [] ip_local_deliver+0x4e/0x52
> [] ip_rcv_finish+0x2da/0x2f2
> [] ? inet_add_protocol+0x48/0x48
> [] NF_HOOK.clone.1+0x4c/0x53
> [] ip_rcv+0x23c/0x26a
> [] __netif_receive_skb_core+0x4e7/0x558
> [] __netif_receive_skb+0x4e/0x5e
> [] netif_receive_skb+0x5b/0x90
> [] ? ieee80211_data_to_8023+0x2eb/0x370 [cfg80211]
> [] ? _raw_read_unlock+0x24/0x2f
> [] ieee80211_deliver_skb+0xcd/0x108 [mac80211]
> [] ieee80211_rx_handlers+0x1305/0x18c9 [mac80211]
> [] ieee80211_prepare_and_rx_handle+0x8fe/0x96a [mac80211]
> [] ieee80211_rx+0x6e9/0x759 [mac80211]
> [] ? swiotlb_map_page+0x67/0xbb
> [] ath_rx_tasklet+0xfce/0x10a7 [ath9k]
> [] ath9k_tasklet+0xf9/0x150 [ath9k]
> [] tasklet_action+0x7d/0xcc
> [] __do_softirq+0x114/0x254
> [] ? _raw_spin_unlock+0x24/0x2f
> [] irq_exit+0x4b/0xa8
> [] do_IRQ+0x9d/0xb4
> [] common_interrupt+0x6d/0x6d
>
> [] ? set_next_entity+0x28/0x7e
> [] ? cpuidle_wrap_enter+0x43/0x78
> [] ? cpuidle_wrap_enter+0x3c/0x78
> [] cpuidle_enter_tk+0x10/0x12
> [] cpuidle_enter_state+0x17/0x3f
> [] cpuidle_idle_call+0xba/0xfa
> [] cpu_idle+0x65/0xb5
> [] start_secondary+0x211/0x213
> [] ? regulator_init_complete+0x62/0x157
> Code: 89 30 4d 89 75 08 ff 43 10 48 8b 75 c0 e8 30 d0 ff ff e9 ee 00 00 00 4d 8d 4d 28 44 89 e2 41 2b 51 18 45 8b 41 1c 89 55 b0 79 04 <0f> 0b eb fe 45 29 e0 45
> 85 c0 7e 4d 45 39 f8 4c 89 f7 4c 89 4d
> RIP [] tcp_collapse+0x267/0x37a
> RSP
> ---[ end trace f30d144e49d988df ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> drm_kms_helper: panic occurred, switching back to text console
>
> Thanks,
> Ben
>

Thanks Ben

Same problem was reported today and is under investigation