* skbuff panic @ 2014-07-03 23:03 Austin Schuh 2014-07-03 23:18 ` Austin Schuh 2014-07-05 10:40 ` Oliver Hartkopp 0 siblings, 2 replies; 17+ messages in thread From: Austin Schuh @ 2014-07-03 23:03 UTC (permalink / raw) To: linux-can I'm seeing the following panic. I've seen it on multiple kernel versions (3.10.24 patched, and 3.14.3). uname -a Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT 2014 x86_64 GNU/Linux Jul 3 12:18:28 vpc7 kernel: [ 16.691928] skbuff: skb_under_panic: text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080 data:ffff88030742507f tail:0x58 end:0x80 dev:can0 Jul 3 12:18:28 vpc7 kernel: [ 16.692207] ------------[ cut here ]------------ Jul 3 12:18:28 vpc7 kernel: [ 16.692209] kernel BUG at net/core/skbuff.c:100! Jul 3 12:18:28 vpc7 kernel: [ 16.692215] invalid opcode: 0000 [#1] PREEMPT SMP Jul 3 12:18:28 vpc7 kernel: [ 16.692268] Modules linked in: ext3 mbcache jbd vcan loop snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal coretemp crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper evdev psmouse pcspkr serio_raw parport_pc parport tpm_tis tpm snd_hda_intel snd_hda_codec peak_pci sja1000 i915 snd_hwdep can_dev snd_pcm video snd_timer mei_me e1000e lpc_ich ac i2c_i801 mfd_core snd mei ptp intel_gtt pps_core ata_generic drm_kms_helper button processor ahci libahci fan thermal Jul 3 12:18:28 vpc7 kernel: [ 16.692274] CPU: 1 PID: 2080 Comm: irq/18-can0 Not tainted 3.14.3-rt4abs+ #16 Jul 3 12:18:28 vpc7 kernel: [ 16.692276] Hardware name: CompuLab Intense-PC/Intense-PC, BIOS CR_2.2.0.400 X64 12/12/2013 Jul 3 12:18:28 vpc7 kernel: [ 16.692279] task: ffff88040c5ad680 ti: ffff880407cba000 task.ti: ffff880407cba000 Jul 3 12:18:28 vpc7 kernel: [ 16.692293] RIP: 0010:[<ffffffff81512339>] [<ffffffff81512339>] skb_panic+0x63/0x65 Jul 3 12:18:28 vpc7 kernel: [ 16.692295] RSP: 0000:ffff880407cbbba8 EFLAGS: 00010292 
Jul 3 12:18:28 vpc7 kernel: [ 16.692298] RAX: 000000000000008c RBX: ffff88040be50200 RCX: 0000000016f816f7 Jul 3 12:18:28 vpc7 kernel: [ 16.692300] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff Jul 3 12:18:28 vpc7 kernel: [ 16.692301] RBP: ffff880407cbbbc8 R08: 0000000000000000 R09: 0000000000000000 Jul 3 12:18:28 vpc7 kernel: [ 16.692304] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffff88040be3f000 Jul 3 12:18:28 vpc7 kernel: [ 16.692306] R13: ffff88040bf29000 R14: ffff88040be3f000 R15: 0000000000000000 Jul 3 12:18:28 vpc7 kernel: [ 16.692309] FS: 0000000000000000(0000) GS:ffff88042e080000(0000) knlGS:0000000000000000 Jul 3 12:18:28 vpc7 kernel: [ 16.692311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 3 12:18:28 vpc7 kernel: [ 16.692313] CR2: 00007fe8a7fd34cc CR3: 00000004070ec000 CR4: 00000000001407e0 Jul 3 12:18:28 vpc7 kernel: [ 16.692314] Stack: Jul 3 12:18:28 vpc7 kernel: [ 16.692320] ffff88030742507f 0000000000000058 0000000000000080 ffff88040be3f000 Jul 3 12:18:28 vpc7 kernel: [ 16.692324] ffff880407cbbbd8 ffffffff8143e142 ffff880407cbbc08 ffffffff814fb64d Jul 3 12:18:28 vpc7 kernel: [ 16.692328] ffff88040bf29880 ffffffff81ac40e8 ffffffff81ac4110 ffff88040be3f000 Jul 3 12:18:28 vpc7 kernel: [ 16.692330] Call Trace: Jul 3 12:18:28 vpc7 kernel: [ 16.692340] [<ffffffff8143e142>] skb_push+0x38/0x39 Jul 3 12:18:28 vpc7 kernel: [ 16.692348] [<ffffffff814fb64d>] packet_rcv_spkt+0x98/0xdf Jul 3 12:18:28 vpc7 kernel: [ 16.692357] [<ffffffff8144b8f8>] __netif_receive_skb_core+0x459/0x4dc Jul 3 12:18:28 vpc7 kernel: [ 16.692363] [<ffffffff8106b2db>] ? 
get_parent_ip+0xe/0x3e Jul 3 12:18:28 vpc7 kernel: [ 16.692369] [<ffffffff8144b9ce>] __netif_receive_skb+0x53/0x65 Jul 3 12:18:28 vpc7 kernel: [ 16.692376] [<ffffffff8144ba40>] process_backlog+0x60/0x13d Jul 3 12:18:28 vpc7 kernel: [ 16.692384] [<ffffffff8144be10>] net_rx_action+0x91/0x1bd Jul 3 12:18:28 vpc7 kernel: [ 16.692395] [<ffffffff81046793>] do_current_softirqs+0x1a5/0x35b Jul 3 12:18:28 vpc7 kernel: [ 16.692404] [<ffffffff81089c0e>] ? irq_thread_fn+0x3a/0x3a Jul 3 12:18:28 vpc7 kernel: [ 16.692409] [<ffffffff810469c7>] __local_bh_enable+0x41/0x68 Jul 3 12:18:28 vpc7 kernel: [ 16.692413] [<ffffffff810469fc>] local_bh_enable+0xe/0x10 Jul 3 12:18:28 vpc7 kernel: [ 16.692417] [<ffffffff81089c57>] irq_forced_thread_fn+0x49/0x55 Jul 3 12:18:28 vpc7 kernel: [ 16.692422] [<ffffffff8108a3cb>] irq_thread+0x8e/0x174 Jul 3 12:18:28 vpc7 kernel: [ 16.692426] [<ffffffff81089b32>] ? irq_finalize_oneshot+0x9c/0x9c Jul 3 12:18:28 vpc7 kernel: [ 16.692431] [<ffffffff8108a33d>] ? irq_affinity_notify+0x14/0x14 Jul 3 12:18:28 vpc7 kernel: [ 16.692437] [<ffffffff8105f7cf>] kthread+0xdc/0xe4 Jul 3 12:18:28 vpc7 kernel: [ 16.692443] [<ffffffff8105f6f3>] ? flush_kthread_worker+0xe1/0xe1 Jul 3 12:18:28 vpc7 kernel: [ 16.692449] [<ffffffff815158ac>] ret_from_fork+0x7c/0xb0 Jul 3 12:18:28 vpc7 kernel: [ 16.692454] [<ffffffff8105f6f3>] ? flush_kthread_worker+0xe1/0xe1 Jul 3 12:18:28 vpc7 kernel: [ 16.692498] Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 fb 46 7b 81 48 89 04 24 31 c0 e8 37 b2 ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 c7 c2 a0 66 d8 81 48 89 e5 53 48 Jul 3 12:18:28 vpc7 kernel: [ 16.692503] RIP [<ffffffff81512339>] skb_panic+0x63/0x65 Jul 3 12:18:28 vpc7 kernel: [ 16.692505] RSP <ffff880407cbbba8> Jul 3 12:18:28 vpc7 kernel: [ 16.849291] ---[ end trace 0000000000000002 ]--- Here are the skbuffs from a number of crashes. The call traces seem to be similar to the one above, but I haven't exhaustively checked. 
Jul 3 12:18:28 vpc7 kernel: [ 16.691928] skbuff: skb_under_panic: text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080 data:ffff88030742507f tail:0x58 end:0x80 dev:can0 The following are from kernel version 'Linux version 3.10.24-rt22abs (austin@aschuh-peloton) (gcc version 4.7.2 (Debian 4.7.2-5abs) ) #15 SMP PREEMPT RT Tue May 13 14:42:22 PDT' Jul 3 09:25:11 vpc6 kernel: [ 7.994591] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8802151eaa00 data:ffff8802151ea9ff tail:0x58 end:0x80 dev:can0 Jul 3 09:32:46 vpc6 kernel: [ 7.887542] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8802176a9a40 data:ffff8802176a9a3f tail:0x58 end:0x80 dev:can1 Jun 28 08:56:19 vpc7 kernel: [ 8.157864] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8803f4d49a40 data:ffff8803f4d49a3f tail:0x58 end:0x80 dev:can0 Jun 28 08:56:19 vpc7 kernel: [ 8.157868] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8803f41881c0 data:ffff8803f41881bf tail:0x58 end:0x80 dev:can1 Jun 30 15:01:59 vpc6 kernel: [ 11.481219] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8802150f8540 data:ffff8802150f853f tail:0x58 end:0x80 dev:can1 Jun 30 14:42:13 vpc6 kernel: [ 9.660556] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff88021733f1c0 data:ffff88021733f1bf tail:0x58 end:0x80 dev:can1 Jun 30 12:55:40 vpc6 kernel: [ 8.782069] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff880214f63d40 data:ffff880214f63d3f tail:0x58 end:0x80 dev:can1 Jun 30 12:17:30 vpc6 kernel: [ 10.016782] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff880217cb9180 data:ffff880217cb917f tail:0x58 end:0x80 dev:can1 Jun 30 12:48:04 vpc6 kernel: [ 8.720439] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff880214bb88c0 data:ffff880214bb88bf tail:0x58 end:0x80 dev:can1 Jun 27 18:46:38 vpc7 kernel: [ 7.953504] skbuff: skb_under_panic: 
text:ffffffff81492274 len:89 put:73 head:ffff8803f22d0700 data:ffff8803f22d06ff tail:0x58 end:0x80 dev:can0 Jun 27 18:54:15 vpc7 kernel: [ 8.861305] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8803f2cdbf00 data:ffff8803f2cdbeff tail:0x58 end:0x80 dev:can1 Jun 27 04:57:37 vpc6 kernel: [ 8.336231] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff88021a81d780 data:ffff88021a81d77f tail:0x58 end:0x80 dev:can1 Jun 27 04:57:37 vpc6 kernel: [ 8.336251] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff880218756740 data:ffff88021875673f tail:0x58 end:0x80 dev:can0 Jun 23 00:22:42 vpc7 kernel: [ 7.924303] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8803f21e7e00 data:ffff8803f21e7dff tail:0x58 end:0x80 dev:can0 Jun 17 04:04:30 vpc6 kernel: [ 7.776512] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff880215b19500 data:ffff880215b194ff tail:0x58 end:0x80 dev:can1 Jun 17 03:56:57 vpc6 kernel: [ 11.906063] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff88021830a140 data:ffff88021830a13f tail:0x58 end:0x80 dev:can0 Jun 17 04:12:06 vpc6 kernel: [ 7.656869] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8802148b4ec0 data:ffff8802148b4ebf tail:0x58 end:0x80 dev:can0 Any ideas what is causing it? The issue seems to be that the data pointer is less than the head pointer, from reading the code. It only happens right at startup. Thanks, Austin ^ permalink raw reply [flat|nested] 17+ messages in thread
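The numbers in these reports can be cross-checked with plain arithmetic. What follows is only a sketch, and it rests on assumptions the thread does not state: that the length handed to skb_push() is computed as data minus the MAC header position, that the MAC header offset was still at an all-ones "unset" value (32 bits wide on the 3.10 build, 16 bits wide on the 3.14 build), and that data sat 72 bytes past head (tail 0x58 minus one 16-byte CAN frame). The helper names are made up for illustration. Under those assumptions the arithmetic lands on the logged put, len, and data values for both kernels.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions inferred from the logs, not stated in the thread. */
#define HEADROOM  72u  /* data - head before the push: tail 0x58 minus 16 */
#define FRAME_LEN 16u  /* sizeof(struct can_frame), skb len before the push */

/* Length handed to skb_push() if it is computed as data - mac_header and
 * the MAC header offset (relative to head) is mac_off. The difference is
 * truncated to 32 bits, as an unsigned int argument would be. */
static uint32_t push_len(uint64_t mac_off)
{
    return (uint32_t)((int64_t)HEADROOM - (int64_t)mac_off);
}

/* Reinterpret a 32-bit value the way a %d in the log line prints it. */
static int64_t as_logged(uint32_t v)
{
    return v > INT32_MAX ? (int64_t)v - 4294967296LL : (int64_t)v;
}

/* Offset of data relative to head after the push (negative = underflow). */
static int64_t data_off_after_push(uint32_t len)
{
    return (int64_t)HEADROOM - (int64_t)len;
}

/* Compare the model against the values quoted in the reports. */
static void check_against_logs(void)
{
    /* 3.10.24 reports: len:89 put:73, data one byte below head. */
    uint32_t p32 = push_len(UINT32_MAX);
    assert(as_logged(p32) == 73);
    assert(as_logged(FRAME_LEN + p32) == 89);
    assert(data_off_after_push(p32) == -1);

    /* 3.14.3 report: len:-65447 put:-65463. */
    uint32_t p16 = push_len(UINT16_MAX);
    assert(as_logged(p16) == -65463);
    assert(as_logged(FRAME_LEN + p16) == -65447);

    /* head:ffff880407415080 -> data:ffff88030742507f */
    assert(0xffff880407415080ULL + (uint64_t)data_off_after_push(p16)
           == 0xffff88030742507fULL);
}
```

Calling check_against_logs() from a main() built without NDEBUG exercises every comparison. That the model matches the logs does not prove the cause; it only shows the reports are consistent with a header offset that was never initialized.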
* Re: skbuff panic 2014-07-03 23:03 skbuff panic Austin Schuh @ 2014-07-03 23:18 ` Austin Schuh 2014-07-05 10:40 ` Oliver Hartkopp 1 sibling, 0 replies; 17+ messages in thread From: Austin Schuh @ 2014-07-03 23:18 UTC (permalink / raw) To: linux-can

On Thu, Jul 3, 2014 at 4:03 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> I'm seeing the following panic. I've seen it on multiple kernel
> versions (3.10.24 patched, and 3.14.3).
> (..)
> Any ideas what is causing it? The issue seems to be that the data
> pointer is less than the head pointer, from reading the code. It only
> happens right at startup.
>
> Thanks,
> Austin

And, of course, I forgot to mention which hardware I'm using...
$ lspci -v -s 05:00.0 05:00.0 Network controller: PEAK-System Technik GmbH Device 0008 (rev 02) Subsystem: PEAK-System Technik GmbH Device 0005 Flags: bus master, fast devsel, latency 0, IRQ 18 Memory at e0610000 (32-bit, non-prefetchable) [size=64K] Memory at e0600000 (32-bit, non-prefetchable) [size=64K] Kernel driver in use: peak_pci Austin ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic 2014-07-03 23:03 skbuff panic Austin Schuh 2014-07-03 23:18 ` Austin Schuh @ 2014-07-05 10:40 ` Oliver Hartkopp 2014-07-05 18:38 ` Austin Schuh 1 sibling, 1 reply; 17+ messages in thread From: Oliver Hartkopp @ 2014-07-05 10:40 UTC (permalink / raw) To: Austin Schuh, linux-can On 04.07.2014 01:03, Austin Schuh wrote: > I'm seeing the following panic. I've seen it on multiple kernel > versions (3.10.24 patched, and 3.14.3). > > uname -a > Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT > 2014 x86_64 GNU/Linux > > Jul 3 12:18:28 vpc7 kernel: [ 16.691928] skbuff: skb_under_panic: > text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080 > data:ffff88030742507f tail:0x58 end:0x80 dev:can0 > Jul 3 12:18:28 vpc7 kernel: [ 16.692207] ------------[ cut here ]------------ > Jul 3 12:18:28 vpc7 kernel: [ 16.692209] kernel BUG at net/core/skbuff.c:100! (..) > Jul 3 12:18:28 vpc7 kernel: [ 16.692330] Call Trace: > Jul 3 12:18:28 vpc7 kernel: [ 16.692340] [<ffffffff8143e142>] > skb_push+0x38/0x39 > Jul 3 12:18:28 vpc7 kernel: [ 16.692348] [<ffffffff814fb64d>] > packet_rcv_spkt+0x98/0xdf > Jul 3 12:18:28 vpc7 kernel: [ 16.692357] [<ffffffff8144b8f8>] > __netif_receive_skb_core+0x459/0x4dc > > Any ideas what is causing it? The issue seems to be that the data > pointer is less than the head pointer, from reading the code. It only > happens right at startup. Hi Austin, as you are using the PF_PACKET socket here - where packet_rcv_spkt() is using skb_push() - the things are slightly different to the PF_CAN handling. Are these kernel panics related to the reception of CAN frames - or do they only show up when you send CAN frames (via PF_PACKET socket)?? Can you tell something more about how you send and receive CAN frames in your setup? Best regards, Oliver ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic 2014-07-05 10:40 ` Oliver Hartkopp @ 2014-07-05 18:38 ` Austin Schuh 2014-07-05 19:21 ` Oliver Hartkopp 0 siblings, 1 reply; 17+ messages in thread From: Austin Schuh @ 2014-07-05 18:38 UTC (permalink / raw) To: Oliver Hartkopp; +Cc: linux-can On Sat, Jul 5, 2014 at 3:40 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote: > On 04.07.2014 01:03, Austin Schuh wrote: >> I'm seeing the following panic. I've seen it on multiple kernel >> versions (3.10.24 patched, and 3.14.3). >> >> uname -a >> Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT >> 2014 x86_64 GNU/Linux >> >> Jul 3 12:18:28 vpc7 kernel: [ 16.691928] skbuff: skb_under_panic: >> text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080 >> data:ffff88030742507f tail:0x58 end:0x80 dev:can0 >> Jul 3 12:18:28 vpc7 kernel: [ 16.692207] ------------[ cut here ]------------ >> Jul 3 12:18:28 vpc7 kernel: [ 16.692209] kernel BUG at net/core/skbuff.c:100! > (..) >> Jul 3 12:18:28 vpc7 kernel: [ 16.692330] Call Trace: >> Jul 3 12:18:28 vpc7 kernel: [ 16.692340] [<ffffffff8143e142>] >> skb_push+0x38/0x39 >> Jul 3 12:18:28 vpc7 kernel: [ 16.692348] [<ffffffff814fb64d>] >> packet_rcv_spkt+0x98/0xdf >> Jul 3 12:18:28 vpc7 kernel: [ 16.692357] [<ffffffff8144b8f8>] >> __netif_receive_skb_core+0x459/0x4dc > >> >> Any ideas what is causing it? The issue seems to be that the data >> pointer is less than the head pointer, from reading the code. It only >> happens right at startup. > > Hi Austin, > > as you are using the PF_PACKET socket here - where packet_rcv_spkt() is using > skb_push() - the things are slightly different to the PF_CAN handling. > > Are these kernel panics related to the reception of CAN frames - or do they > only show up when you send CAN frames (via PF_PACKET socket)?? > > Can you tell something more about how you send and receive CAN frames in your > setup? 
>
> Best regards,
> Oliver

Hi Oliver,

I'm opening the socket with the following calls:

    int socket_ = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    struct ifreq ifr;
    ioctl(socket_, SIOCGIFINDEX, &ifr);
    struct sockaddr_can addr;
    addr.can_family = AF_CAN;
    addr.can_ifindex = ifr.ifr_ifindex;
    bind(socket_, (struct sockaddr *)&addr, sizeof(addr));

And sending with:

    struct can_frame frame;
    write(socket_, &frame, sizeof(struct can_frame));

These panics only show up at startup time. As you can see from the syslog entries at the various times, they all happen within the first 20 seconds of the machine coming up, and I only get a max of 1 problem frame per boot per interface. My logs show that the frame that triggers the problem comes in within 1 second of the CAN interface being initialized.

Jul 3 09:32:46 vpc6 kernel: [ 5.310067] loop: module loaded
Jul 3 09:32:46 vpc6 kernel: [ 5.347914] vcan: Virtual CAN interface driver
Jul 3 09:32:46 vpc6 kernel: [ 6.635362] XFS (sda6): Mounting Filesystem
Jul 3 09:32:46 vpc6 kernel: [ 6.659463] XFS (sda6): Starting recovery (logdev: internal)
Jul 3 09:32:46 vpc6 kernel: [ 6.670430] XFS (sda6): Ending recovery (logdev: internal)
Jul 3 09:32:46 vpc6 kernel: [ 6.680831] XFS (sda7): Mounting Filesystem
Jul 3 09:32:46 vpc6 kernel: [ 6.847411] XFS (sda7): Starting recovery (logdev: internal)
Jul 3 09:32:46 vpc6 kernel: [ 6.852927] XFS (sda7): Ending recovery (logdev: internal)
Jul 3 09:32:46 vpc6 kernel: [ 7.489861] peak_pci 0000:04:00.0 can0: setting BTR0=0x01 BTR1=0x9c
Jul 3 09:32:46 vpc6 kernel: [ 7.564411] peak_pci 0000:04:00.0 can1: setting BTR0=0x00 BTR1=0x9c
Jul 3 09:32:46 vpc6 kernel: [ 7.863569] r8169 0000:05:00.0 eth0: unable to load firmware patch rtl_nic/rtl8168e-3.fw (-2)
Jul 3 09:32:46 vpc6 kernel: [ 7.873102] r8169 0000:05:00.0 eth0: link down
Jul 3 09:32:46 vpc6 kernel: [ 7.873169] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Jul 3 09:32:46 vpc6 kernel: [ 7.873212] r8169 0000:05:00.0 eth0: link down
Jul 3 09:32:46 vpc6 kernel: [
7.887542] skbuff: skb_under_panic: text:ffffffff81492274 len:89 put:73 head:ffff8802176a9a40 data:ffff8802176a9a3f tail:0x58 end:0x80 dev:can1 Jul 3 09:32:46 vpc6 kernel: [ 7.887665] ------------[ cut here ]------------ Jul 3 09:32:46 vpc6 kernel: [ 7.887666] kernel BUG at net/core/skbuff.c:127! I think the problem is related to reception and startup. I don't have logs to conclusively show it, but I'm pretty certain that my sending or reading applications haven't been started up by the time the panic triggers. I'll try to grab better evidence of that next time I observe it. Thanks! Austin ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic 2014-07-05 18:38 ` Austin Schuh @ 2014-07-05 19:21 ` Oliver Hartkopp 2014-07-06 5:07 ` Austin Schuh 0 siblings, 1 reply; 17+ messages in thread From: Oliver Hartkopp @ 2014-07-05 19:21 UTC (permalink / raw) To: Austin Schuh; +Cc: linux-can

Hi Austin,

I assume someone opened the PF_PACKET socket for any kind of traffic (e.g. dhcpclient ??) on any interface. Looks strange - but it should never cause any panic ...

There's some skb header initialization code in the can_send() function in net/can/af_can.c . We could try to put some of these in alloc_can_skb().

Can you try the following patch, if it fixes your issue?

Thanks,
Oliver

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index e318e87..653db1bb 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -501,6 +501,10 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf)
 	skb->pkt_type = PACKET_BROADCAST;
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 
+	skb_reset_mac_header(skb);
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+
 	can_skb_reserve(skb);
 	can_skb_prv(skb)->ifindex = dev->ifindex;
 

On 05.07.2014 20:38, Austin Schuh wrote:
> On Sat, Jul 5, 2014 at 3:40 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>> On 04.07.2014 01:03, Austin Schuh wrote:
>>> I'm seeing the following panic. I've seen it on multiple kernel
>>> versions (3.10.24 patched, and 3.14.3).
>>>
>>> uname -a
>>> Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT
>>> 2014 x86_64 GNU/Linux
>>>
>>> Jul 3 12:18:28 vpc7 kernel: [ 16.691928] skbuff: skb_under_panic:
>>> text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080
>>> data:ffff88030742507f tail:0x58 end:0x80 dev:can0
>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692207] ------------[ cut here ]------------
>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692209] kernel BUG at net/core/skbuff.c:100!
>> (..)
>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692330] Call Trace: >>> Jul 3 12:18:28 vpc7 kernel: [ 16.692340] [<ffffffff8143e142>] >>> skb_push+0x38/0x39 >>> Jul 3 12:18:28 vpc7 kernel: [ 16.692348] [<ffffffff814fb64d>] >>> packet_rcv_spkt+0x98/0xdf >>> Jul 3 12:18:28 vpc7 kernel: [ 16.692357] [<ffffffff8144b8f8>] >>> __netif_receive_skb_core+0x459/0x4dc >> >>> >>> Any ideas what is causing it? The issue seems to be that the data >>> pointer is less than the head pointer, from reading the code. It only >>> happens right at startup. >> >> Hi Austin, >> >> as you are using the PF_PACKET socket here - where packet_rcv_spkt() is using >> skb_push() - the things are slightly different to the PF_CAN handling. >> >> Are these kernel panics related to the reception of CAN frames - or do they >> only show up when you send CAN frames (via PF_PACKET socket)?? >> >> Can you tell something more about how you send and receive CAN frames in your >> setup? >> >> Best regards, >> Oliver > > Hi Oliver, > > I'm opening the socket with the following calls: > > int socket_ = socket(PF_CAN, SOCK_RAW, CAN_RAW); > struct ifreq ifr; > ioctl(socket_, SIOCGIFINDEX, &ifr); > struct sockaddr_can addr; > addr.can_family = AF_CAN; > addr.can_ifindex = ifr.ifr_ifindex; > bind(socket_, (struct sockaddr *)&addr, sizeof(addr)); > > And sending with: > > struct can_frame frame > write(socket_, &frame, sizeof(struct can_frame)) > > These panics only show up at startup time. As you can see from the > syslog entries at the various times, they all happen within the first > 20 seconds of the machine coming up, and I only get a max of 1 problem > frame per boot per interface. My logs show that the frame that > triggers the problem comes in within 1 second of the CAN interface > being initialized. 
> > Jul 3 09:32:46 vpc6 kernel: [ 5.310067] loop: module loaded > Jul 3 09:32:46 vpc6 kernel: [ 5.347914] vcan: Virtual CAN interface driver > Jul 3 09:32:46 vpc6 kernel: [ 6.635362] XFS (sda6): Mounting Filesystem > Jul 3 09:32:46 vpc6 kernel: [ 6.659463] XFS (sda6): Starting > recovery (logdev: internal) > Jul 3 09:32:46 vpc6 kernel: [ 6.670430] XFS (sda6): Ending > recovery (logdev: internal) > Jul 3 09:32:46 vpc6 kernel: [ 6.680831] XFS (sda7): Mounting Filesystem > Jul 3 09:32:46 vpc6 kernel: [ 6.847411] XFS (sda7): Starting > recovery (logdev: internal) > Jul 3 09:32:46 vpc6 kernel: [ 6.852927] XFS (sda7): Ending > recovery (logdev: internal) > Jul 3 09:32:46 vpc6 kernel: [ 7.489861] peak_pci 0000:04:00.0 > can0: setting BTR0=0x01 BTR1=0x9c > Jul 3 09:32:46 vpc6 kernel: [ 7.564411] peak_pci 0000:04:00.0 > can1: setting BTR0=0x00 BTR1=0x9c > Jul 3 09:32:46 vpc6 kernel: [ 7.863569] r8169 0000:05:00.0 eth0: > unable to load firmware patch rtl_nic/rtl8168e-3.fw (-2) > Jul 3 09:32:46 vpc6 kernel: [ 7.873102] r8169 0000:05:00.0 eth0: link down > Jul 3 09:32:46 vpc6 kernel: [ 7.873169] IPv6: ADDRCONF(NETDEV_UP): > eth0: link is not ready > Jul 3 09:32:46 vpc6 kernel: [ 7.873212] r8169 0000:05:00.0 eth0: link down > Jul 3 09:32:46 vpc6 kernel: [ 7.887542] skbuff: skb_under_panic: > text:ffffffff81492274 len:89 put:73 head:ffff8802176a9a40 > data:ffff8802176a9a3f tail:0x58 end:0x80 dev:can1 > Jul 3 09:32:46 vpc6 kernel: [ 7.887665] ------------[ cut here ]------------ > Jul 3 09:32:46 vpc6 kernel: [ 7.887666] kernel BUG at net/core/skbuff.c:127! > > I think the problem is related to reception and startup. I don't have > logs to conclusively show it, but I'm pretty certain that my sending > or reading applications haven't been started up by the time the panic > triggers. I'll try to grab better evidence of that next time I > observe it. > > Thanks! > Austin > ^ permalink raw reply related [flat|nested] 17+ messages in thread
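To illustrate what the proposed patch would change, here is a userspace toy model, not kernel code. It assumes the receive path that panics computes its push length as data minus the MAC header position, that an unset MAC header offset reads as all ones (0xFFFF here, as on a 16-bit field), and that the allocator leaves 64 bytes of padding before an 8-byte CAN private area. The last two figures are guesses chosen because they add up to the 72-byte headroom implied by tail:0x58 in the logs. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAD       64u      /* assumed allocator padding before data */
#define TOY_CAN_PRIV   8u      /* assumed size of the CAN private area */
#define TOY_MAC_UNSET 0xFFFFu  /* assumed "never initialized" marker */

struct toy_skb {
    uint32_t data_off; /* data - head */
    uint32_t mac_off;  /* MAC header - head, or TOY_MAC_UNSET */
};

/* Move data forward by n bytes, as a reserve call would. */
static void toy_reserve(struct toy_skb *skb, uint32_t n)
{
    skb->data_off += n;
}

/* Record the current data position as the MAC header position. */
static void toy_reset_mac_header(struct toy_skb *skb)
{
    skb->mac_off = skb->data_off;
}

/* Mimic the allocation order in the patch: reset first, then reserve. */
static struct toy_skb toy_alloc_can_skb(int patched)
{
    struct toy_skb skb = { .data_off = TOY_PAD, .mac_off = TOY_MAC_UNSET };
    if (patched)
        toy_reset_mac_header(&skb);
    toy_reserve(&skb, TOY_CAN_PRIV);
    return skb;
}

/* Push length a receiver would compute as data - mac_header. */
static int64_t toy_push_len(const struct toy_skb *skb)
{
    return (int64_t)skb->data_off - (int64_t)skb->mac_off;
}

/* A push is safe only if it is non-negative and fits in the headroom. */
static int toy_push_fits(const struct toy_skb *skb)
{
    int64_t n = toy_push_len(skb);
    return n >= 0 && n <= (int64_t)skb->data_off;
}

/* Compare the unpatched and patched allocation paths in the model. */
static void toy_demo(void)
{
    struct toy_skb before = toy_alloc_can_skb(0);
    struct toy_skb after = toy_alloc_can_skb(1);

    assert(toy_push_len(&before) == -65463); /* same value as put: in the 3.14.3 log */
    assert(!toy_push_fits(&before));
    assert(toy_push_len(&after) == 8);       /* only the private area is pushed back */
    assert(toy_push_fits(&after));
}
```

In this model the unpatched path reproduces the put:-65463 value from the 3.14.3 report, while the patched path yields a small push that fits inside the headroom. Whether the real patch resolves the panic is not established at this point in the thread.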
* Re: skbuff panic 2014-07-05 19:21 ` Oliver Hartkopp @ 2014-07-06 5:07 ` Austin Schuh 2014-07-06 12:12 ` Oliver Hartkopp 0 siblings, 1 reply; 17+ messages in thread From: Austin Schuh @ 2014-07-06 5:07 UTC (permalink / raw) To: Oliver Hartkopp; +Cc: linux-can

Hi Oliver,

Thanks! What makes you think that someone opened a PF_PACKET socket? I'm curious, since that observation may help me produce a more reliable test case and debug it some more myself.

I'm going to work on reproducing the panic more reliably, and then I'll give your patch a whirl. Currently, only PCs that are on the other end of cell modems out in the field seem to be triggering the panic. I'm hesitant to do excessive experimentation on something that takes a plane trip to fix. That just means that this will take longer to debug than I'd like...

Austin

On Sat, Jul 5, 2014 at 12:21 PM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
> Hi Austin,
>
> I assume someone opened the PF_PACKET socket for any kind of traffic (e.g.
> dhcpclient ??) on any interface. Looks strange - but it should never cause
> any panic ...
>
> There's some skb header initialization code in the can_send() function in
> net/can/af_can.c . We could try to put some of these in alloc_can_skb().
>
> Can you try the following patch, if it fixes your issue?
> > Thanks, > Oliver > > diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c > index e318e87..653db1bb 100644 > --- a/drivers/net/can/dev.c > +++ b/drivers/net/can/dev.c > @@ -501,6 +501,10 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf) > skb->pkt_type = PACKET_BROADCAST; > skb->ip_summed = CHECKSUM_UNNECESSARY; > > + skb_reset_mac_header(skb); > + skb_reset_network_header(skb); > + skb_reset_transport_header(skb); > + > can_skb_reserve(skb); > can_skb_prv(skb)->ifindex = dev->ifindex; > > > > > On 05.07.2014 20:38, Austin Schuh wrote: >> On Sat, Jul 5, 2014 at 3:40 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote: >>> On 04.07.2014 01:03, Austin Schuh wrote: >>>> I'm seeing the following panic. I've seen it on multiple kernel >>>> versions (3.10.24 patched, and 3.14.3). >>>> >>>> uname -a >>>> Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT >>>> 2014 x86_64 GNU/Linux >>>> >>>> Jul 3 12:18:28 vpc7 kernel: [ 16.691928] skbuff: skb_under_panic: >>>> text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080 >>>> data:ffff88030742507f tail:0x58 end:0x80 dev:can0 >>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692207] ------------[ cut here ]------------ >>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692209] kernel BUG at net/core/skbuff.c:100! >>> (..) >>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692330] Call Trace: >>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692340] [<ffffffff8143e142>] >>>> skb_push+0x38/0x39 >>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692348] [<ffffffff814fb64d>] >>>> packet_rcv_spkt+0x98/0xdf >>>> Jul 3 12:18:28 vpc7 kernel: [ 16.692357] [<ffffffff8144b8f8>] >>>> __netif_receive_skb_core+0x459/0x4dc >>> >>>> >>>> Any ideas what is causing it? The issue seems to be that the data >>>> pointer is less than the head pointer, from reading the code. It only >>>> happens right at startup. 
>>> >>> Hi Austin, >>> >>> as you are using the PF_PACKET socket here - where packet_rcv_spkt() is using >>> skb_push() - the things are slightly different to the PF_CAN handling. >>> >>> Are these kernel panics related to the reception of CAN frames - or do they >>> only show up when you send CAN frames (via PF_PACKET socket)?? >>> >>> Can you tell something more about how you send and receive CAN frames in your >>> setup? >>> >>> Best regards, >>> Oliver >> >> Hi Oliver, >> >> I'm opening the socket with the following calls: >> >> int socket_ = socket(PF_CAN, SOCK_RAW, CAN_RAW); >> struct ifreq ifr; >> ioctl(socket_, SIOCGIFINDEX, &ifr); >> struct sockaddr_can addr; >> addr.can_family = AF_CAN; >> addr.can_ifindex = ifr.ifr_ifindex; >> bind(socket_, (struct sockaddr *)&addr, sizeof(addr)); >> >> And sending with: >> >> struct can_frame frame >> write(socket_, &frame, sizeof(struct can_frame)) >> >> These panics only show up at startup time. As you can see from the >> syslog entries at the various times, they all happen within the first >> 20 seconds of the machine coming up, and I only get a max of 1 problem >> frame per boot per interface. My logs show that the frame that >> triggers the problem comes in within 1 second of the CAN interface >> being initialized. 
>> >> Jul 3 09:32:46 vpc6 kernel: [ 5.310067] loop: module loaded >> Jul 3 09:32:46 vpc6 kernel: [ 5.347914] vcan: Virtual CAN interface driver >> Jul 3 09:32:46 vpc6 kernel: [ 6.635362] XFS (sda6): Mounting Filesystem >> Jul 3 09:32:46 vpc6 kernel: [ 6.659463] XFS (sda6): Starting >> recovery (logdev: internal) >> Jul 3 09:32:46 vpc6 kernel: [ 6.670430] XFS (sda6): Ending >> recovery (logdev: internal) >> Jul 3 09:32:46 vpc6 kernel: [ 6.680831] XFS (sda7): Mounting Filesystem >> Jul 3 09:32:46 vpc6 kernel: [ 6.847411] XFS (sda7): Starting >> recovery (logdev: internal) >> Jul 3 09:32:46 vpc6 kernel: [ 6.852927] XFS (sda7): Ending >> recovery (logdev: internal) >> Jul 3 09:32:46 vpc6 kernel: [ 7.489861] peak_pci 0000:04:00.0 >> can0: setting BTR0=0x01 BTR1=0x9c >> Jul 3 09:32:46 vpc6 kernel: [ 7.564411] peak_pci 0000:04:00.0 >> can1: setting BTR0=0x00 BTR1=0x9c >> Jul 3 09:32:46 vpc6 kernel: [ 7.863569] r8169 0000:05:00.0 eth0: >> unable to load firmware patch rtl_nic/rtl8168e-3.fw (-2) >> Jul 3 09:32:46 vpc6 kernel: [ 7.873102] r8169 0000:05:00.0 eth0: link down >> Jul 3 09:32:46 vpc6 kernel: [ 7.873169] IPv6: ADDRCONF(NETDEV_UP): >> eth0: link is not ready >> Jul 3 09:32:46 vpc6 kernel: [ 7.873212] r8169 0000:05:00.0 eth0: link down >> Jul 3 09:32:46 vpc6 kernel: [ 7.887542] skbuff: skb_under_panic: >> text:ffffffff81492274 len:89 put:73 head:ffff8802176a9a40 >> data:ffff8802176a9a3f tail:0x58 end:0x80 dev:can1 >> Jul 3 09:32:46 vpc6 kernel: [ 7.887665] ------------[ cut here ]------------ >> Jul 3 09:32:46 vpc6 kernel: [ 7.887666] kernel BUG at net/core/skbuff.c:127! >> >> I think the problem is related to reception and startup. I don't have >> logs to conclusively show it, but I'm pretty certain that my sending >> or reading applications haven't been started up by the time the panic >> triggers. I'll try to grab better evidence of that next time I >> observe it. >> >> Thanks! >> Austin >> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic
  2014-07-06  5:07 ` Austin Schuh
@ 2014-07-06 12:12 ` Oliver Hartkopp
  2014-07-06 16:13 ` Oliver Hartkopp
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Hartkopp @ 2014-07-06 12:12 UTC (permalink / raw)
To: Austin Schuh; +Cc: linux-can

On 06.07.2014 07:07, Austin Schuh wrote:

> What makes you think that someone opened a PF_PACKET socket? I'm
> curious, since that observation may help me produce a more reliable
> test case and debug it some more myself.

If you look at your call trace, it says:

skb_push+0x38/0x39 (-->> which panics)
packet_rcv_spkt+0x98/0xdf
__netif_receive_skb_core+0x459/0x4dc
get_parent_ip+0xe/0x3e
__netif_receive_skb+0x53/0x65

As packet_rcv_spkt() is located in net/packet/af_packet.c, there must be
some user at this early stage of system boot that makes PF_PACKET process
the CAN frame.

In https://gitorious.org/linux-can/can-tests there's a tst-packet.c program
which uses the PF_PACKET socket to send/receive CAN frames.

I was using tst-packet.c about four years ago for a test - without any
problems. But maybe something in the network layer changed, so that CAN frame
skbs need to be created with a different setup now.

I'll check ASAP whether tst-packet.c still works as expected on my machine.

>
> I'm going to work on reproducing the panic more reliable, and then
> I'll give your patch a whirl. Currently, only PCs that are on the
> other end of cell modems out in the field seem to be triggering the
> panic. I'm hesitant to do excessive experimentation on something that
> takes a plane trip to fix. That just means that this will take longer
> to debug that I'd like...

Indeed this should not be the plan ...
Let's save the world by saving carbon dioxide ;-)

Best regards,
Oliver

^ permalink raw reply [flat|nested] 17+ messages in thread
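The numbers in the two skb_under_panic lines quoted in this thread are consistent with packet_rcv_spkt() pushing the skb back to a MAC header offset that was never set. The sketch below is a userspace model of that arithmetic, not kernel code. It rests on three assumptions: that packet_rcv_spkt() pushes by (data - mac header), that an unset MAC header offset is stored as all-ones (32 bits wide on the older kernel, 16 bits on the newer one), and that the data offset of 72 can be inferred from tail:0x58 minus a 16-byte CAN frame. The function names are hypothetical.

```c
#include <stdint.h>

#define MAC_UNSET_32 UINT32_MAX	/* assumed "not set" marker, 32-bit offset */
#define MAC_UNSET_16 UINT16_MAX	/* assumed "not set" marker, 16-bit offset */

/* Length the receive hook would push: the distance from data back to the
 * MAC header, truncated to 32 bits like an unsigned int length argument. */
uint32_t spkt_push_len(uint32_t data_off, uint32_t mac_off)
{
	return data_off - mac_off;
}

/* Model of the skb_push() check: the new data offset relative to head.
 * A negative result corresponds to data < head, i.e. skb_under_panic. */
int64_t data_off_after_push(uint32_t data_off, uint32_t push_len)
{
	return (int64_t)data_off - (int64_t)push_len;
}
```

With data_off = 72, the 32-bit marker gives a push length of 73 and a data offset of -1, matching put:73 and a data pointer one byte below head in the can1 log. The 16-bit marker gives a push length that prints as -65463 when read as a signed int, matching put:-65463 in the can0 log. Once the MAC header offset equals the data offset, which is what skb_reset_mac_header() in the proposed patch would do, the push length is 0 and the data pointer stays where it was.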
* Re: skbuff panic 2014-07-06 12:12 ` Oliver Hartkopp @ 2014-07-06 16:13 ` Oliver Hartkopp 2014-07-06 19:38 ` Marc Kleine-Budde 0 siblings, 1 reply; 17+ messages in thread From: Oliver Hartkopp @ 2014-07-06 16:13 UTC (permalink / raw) To: Austin Schuh; +Cc: linux-can Answering myself: I tested the latest 3.16.0-rc3-00149-g034a0f6 with tst-packet.c and it works like charm. Receiving and sending with tst-packet is no problem. I was using the SJA1000 based EMS PCMCIA card in my laptop for this test. So I really wonder why (obviously) a CAN frame is processed by the PF_PACKET reception handler at the start of your system?!? Regards, Oliver On 06.07.2014 14:12, Oliver Hartkopp wrote: > On 06.07.2014 07:07, Austin Schuh wrote: > >> What makes you think that someone opened a PF_PACKET socket? I'm >> curious, since that observation may help me produce a more reliable >> test case and debug it some more myself. > > If you look in your call trace is says: > > skb_push+0x38/0x39 (-->> which panics) > packet_rcv_spkt+0x98/0xdf > __netif_receive_skb_core+0x459/0x4dc > get_parent_ip+0xe/0x3e > __netif_receive_skb+0x53/0x65 > > As packet_rcv_spkt() is located in net/packet/af_packet.c there must be some > user at this early stage of system boot to make PF_PACKET process the CAN frame. > > In https://gitorious.org/linux-can/can-tests there's a tst-packet.c program > which uses the PF_PACKET socket to send/receive CAN frames. > > I was using tst-packet.c about four years ago for a test - without any > problems. But maybe something in the network layer changed, so that CAN frame > skbs need to be created with a different setup now. > > I'll try ASAP if tst-packet.c still works as expected on my machine. > >> >> I'm going to work on reproducing the panic more reliable, and then >> I'll give your patch a whirl. Currently, only PCs that are on the >> other end of cell modems out in the field seem to be triggering the >> panic. 
I'm hesitant to do excessive experimentation on something that >> takes a plane trip to fix. That just means that this will take longer >> to debug that I'd like... > > Indeed this should not be the plan ... > Let's save the world by saving carbon dioxide ;-) > > Best regards, > Oliver > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic 2014-07-06 16:13 ` Oliver Hartkopp @ 2014-07-06 19:38 ` Marc Kleine-Budde 2014-07-07 4:11 ` Austin Schuh 0 siblings, 1 reply; 17+ messages in thread From: Marc Kleine-Budde @ 2014-07-06 19:38 UTC (permalink / raw) To: Oliver Hartkopp, Austin Schuh; +Cc: linux-can On 07/06/2014 06:13 PM, Oliver Hartkopp wrote: > Answering myself: > > I tested the latest 3.16.0-rc3-00149-g034a0f6 with tst-packet.c and it works > like charm. Receiving and sending with tst-packet is no problem. > I was using the SJA1000 based EMS PCMCIA card in my laptop for this test. > > So I really wonder why (obviously) a CAN frame is processed by the PF_PACKET > reception handler at the start of your system?!? What about a dhcp client or the kernel's autoip functionality? Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic 2014-07-06 19:38 ` Marc Kleine-Budde @ 2014-07-07 4:11 ` Austin Schuh 2014-07-10 0:07 ` Austin Schuh 0 siblings, 1 reply; 17+ messages in thread From: Austin Schuh @ 2014-07-07 4:11 UTC (permalink / raw) To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can On Sun, Jul 6, 2014 at 12:38 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote: > On 07/06/2014 06:13 PM, Oliver Hartkopp wrote: >> Answering myself: >> >> I tested the latest 3.16.0-rc3-00149-g034a0f6 with tst-packet.c and it works >> like charm. Receiving and sending with tst-packet is no problem. >> I was using the SJA1000 based EMS PCMCIA card in my laptop for this test. >> >> So I really wonder why (obviously) a CAN frame is processed by the PF_PACKET >> reception handler at the start of your system?!? > > What about a dhcp client or the kernel's autoip functionality? > > Marc That is a good hypothesis. DHCP is enabled on the machine with the problem, and the machine at my desk which isn't reproducing the panic has it disabled. I'll enable it tomorrow and try it. Austin ^ permalink raw reply [flat|nested] 17+ messages in thread
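Some context on the DHCP hypothesis: packet_rcv_spkt() is the receive hook that af_packet.c installs for the obsolete SOCK_PACKET socket type, and a packet socket opened with ETH_P_ALL is handed frames from every interface, CAN included. Whether the DHCP client on the affected machines opens exactly this kind of socket is an assumption that the thread does not confirm. The sketch below shows how such a socket would be opened, plus a deliberately simplified model of the protocol match. Both open_spkt_all() and listener_matches() are hypothetical names.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open an obsolete SOCK_PACKET socket that asks for every protocol.
 * Needs CAP_NET_RAW; returns the fd, or -1 with errno set. */
int open_spkt_all(void)
{
	return socket(AF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
}

/* Simplified model of protocol matching on receive (host byte order):
 * an ETH_P_ALL listener sees every frame, any other listener only
 * frames of its own protocol. */
int listener_matches(unsigned short listener_proto, unsigned short frame_proto)
{
	return listener_proto == ETH_P_ALL || listener_proto == frame_proto;
}
```

In this model an ETH_P_ALL listener matches a frame tagged ETH_P_CAN while an ETH_P_IP listener does not, which is why a process that only cares about Ethernet traffic can still end up in the receive path of a CAN frame.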
* Re: skbuff panic
  2014-07-07  4:11 ` Austin Schuh
@ 2014-07-10  0:07 ` Austin Schuh
  2014-07-10 17:37 ` Austin Schuh
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Schuh @ 2014-07-10 0:07 UTC (permalink / raw)
To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can

On Sun, Jul 6, 2014 at 9:11 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> On Sun, Jul 6, 2014 at 12:38 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>> On 07/06/2014 06:13 PM, Oliver Hartkopp wrote:
>>> Answering myself:
>>>
>>> I tested the latest 3.16.0-rc3-00149-g034a0f6 with tst-packet.c and it works
>>> like charm. Receiving and sending with tst-packet is no problem.
>>> I was using the SJA1000 based EMS PCMCIA card in my laptop for this test.
>>>
>>> So I really wonder why (obviously) a CAN frame is processed by the PF_PACKET
>>> reception handler at the start of your system?!?
>>
>> What about a dhcp client or the kernel's autoip functionality?
>>
>> Marc
>
> That is a good hypothesis. DHCP is enabled on the machine with the
> problem, and the machine at my desk which isn't reproducing the panic
> has it disabled. I'll enable it tomorrow and try it.
>
> Austin

Turns out reproducing this bug at my desk is a bit of a pain. It
takes about 10 reboots on average, with a standard deviation of about 9,
to reproduce the bug.

With Oliver's patch, I'm able to get to 60 reboots and counting, so it
looks like that was the problem. I'll leave it rebooting over night
to be sure.

Austin

For my future reference/anyone else who is interested, here is the
number of reboots until failure without the patch:

27 successful reboots
1 failed reboot
2 successful reboots
1 failed reboot
13 successful
1 failed
5 successful
2 failed
10 successful
2 failed
3 successful
1 failed
4 successful
1 failed
19 successful
1 failed

^ permalink raw reply [flat|nested] 17+ messages in thread
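The "10 reboots, standard deviation of 9" figure can be checked against the list above. The sketch below takes the eight runs of successful reboots as the samples (an assumption about how the figure was computed) and avoids math.h by reporting the variance rather than its square root. The names runs, mean() and sample_variance() are hypothetical.

```c
#include <stddef.h>

/* Successful reboots before each failure, copied from the list above. */
const int runs[] = { 27, 2, 13, 5, 10, 3, 4, 19 };
const size_t n_runs = sizeof(runs) / sizeof(runs[0]);

/* Arithmetic mean of n samples. */
double mean(const int *x, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += x[i];
	return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1); requires n >= 2. */
double sample_variance(const int *x, size_t n)
{
	double m = mean(x, n);
	double ss = 0.0;

	for (size_t i = 0; i < n; i++)
		ss += (x[i] - m) * (x[i] - m);
	return ss / (double)(n - 1);
}
```

The mean comes out at 10.375 and the sample variance at roughly 78.8, whose square root is about 8.9, so both quoted figures are rounded values of this data set.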
* Re: skbuff panic 2014-07-10 0:07 ` Austin Schuh @ 2014-07-10 17:37 ` Austin Schuh 2014-07-11 13:27 ` Oliver Hartkopp 0 siblings, 1 reply; 17+ messages in thread From: Austin Schuh @ 2014-07-10 17:37 UTC (permalink / raw) To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can On Wed, Jul 9, 2014 at 5:07 PM, Austin Schuh <austin@peloton-tech.com> wrote: > With Oliver's patch, I'm able to get to 60 reboots and counting, so it > looks like that was the problem. I'll leave it rebooting over night > to be sure. The machine survived 216 reboots with no panics over the night. Thanks Oliver! Tested-by: Austin Schuh <austin@peloton-tech.com> ^ permalink raw reply [flat|nested] 17+ messages in thread
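As a rough sanity check on the 216 clean reboots: the unpatched data earlier in the thread adds up to 10 failed and 83 successful reboots. Assuming each reboot fails independently at that rate (a simplification, since nothing in the thread establishes independence), the chance of 216 clean reboots in a row on an unfixed kernel can be estimated as below. The function name p_all_clean() is hypothetical.

```c
/* Probability that n consecutive reboots all succeed when each one fails
 * independently with probability fails / (fails + oks). */
double p_all_clean(unsigned fails, unsigned oks, unsigned n)
{
	double p_ok = (double)oks / (double)(fails + oks);
	double p = 1.0;

	for (unsigned i = 0; i < n; i++)
		p *= p_ok;
	return p;
}
```

p_all_clean(10, 83, 216) comes out on the order of 1e-11, so under these assumptions the overnight run is strong evidence that the patch removed the panic rather than a lucky streak.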
* Re: skbuff panic 2014-07-10 17:37 ` Austin Schuh @ 2014-07-11 13:27 ` Oliver Hartkopp 2014-07-11 14:58 ` Austin Schuh 0 siblings, 1 reply; 17+ messages in thread From: Oliver Hartkopp @ 2014-07-11 13:27 UTC (permalink / raw) To: Austin Schuh, Marc Kleine-Budde; +Cc: linux-can Thanks for testing, Austin! I'll cook a patch for stable to add these settings. Even if we do not know why someone requests obviously all netdev skbs and treats them to be Ethernet/IP packets we should fix it in our code. If we fix it somewhere else it will last one year until some of the is-there-anything-else-than-ethernet-networking guys will break it again. ;-) Best regards, Oliver On 10.07.2014 13:37, Austin Schuh wrote: > On Wed, Jul 9, 2014 at 5:07 PM, Austin Schuh <austin@peloton-tech.com> wrote: >> With Oliver's patch, I'm able to get to 60 reboots and counting, so it >> looks like that was the problem. I'll leave it rebooting over night >> to be sure. > > The machine survived 216 reboots with no panics over the night. Thanks Oliver! > > Tested-by: Austin Schuh <austin@peloton-tech.com> > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic 2014-07-11 13:27 ` Oliver Hartkopp @ 2014-07-11 14:58 ` Austin Schuh 2014-07-11 17:48 ` Marc Kleine-Budde 0 siblings, 1 reply; 17+ messages in thread From: Austin Schuh @ 2014-07-11 14:58 UTC (permalink / raw) To: Oliver Hartkopp; +Cc: Marc Kleine-Budde, linux-can On Fri, Jul 11, 2014 at 6:27 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote: > Thanks for testing, Austin! > > I'll cook a patch for stable to add these settings. > Even if we do not know why someone requests obviously all netdev skbs and > treats them to be Ethernet/IP packets we should fix it in our code. > > If we fix it somewhere else it will last one year until some of the > is-there-anything-else-than-ethernet-networking guys will break it again. > ;-) Is it worth debugging why as well? Defense in layers? Austin ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic 2014-07-11 14:58 ` Austin Schuh @ 2014-07-11 17:48 ` Marc Kleine-Budde 2015-02-19 11:48 ` Daniel Steer 0 siblings, 1 reply; 17+ messages in thread From: Marc Kleine-Budde @ 2014-07-11 17:48 UTC (permalink / raw) To: Austin Schuh, Oliver Hartkopp; +Cc: linux-can On 07/11/2014 04:58 PM, Austin Schuh wrote: > On Fri, Jul 11, 2014 at 6:27 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote: >> Thanks for testing, Austin! >> >> I'll cook a patch for stable to add these settings. >> Even if we do not know why someone requests obviously all netdev skbs and >> treats them to be Ethernet/IP packets we should fix it in our code. >> >> If we fix it somewhere else it will last one year until some of the >> is-there-anything-else-than-ethernet-networking guys will break it again. >> ;-) > > Is it worth debugging why as well? Defense in layers? Anything new about the kernel-dhcp? Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic
  2014-07-11 17:48 ` Marc Kleine-Budde
@ 2015-02-19 11:48 ` Daniel Steer
  2015-02-23 12:55 ` Oliver Hartkopp
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Steer @ 2015-02-19 11:48 UTC (permalink / raw)
To: linux-can

Marc Kleine-Budde <mkl <at> pengutronix.de> writes:

>
> On 07/11/2014 04:58 PM, Austin Schuh wrote:
> > On Fri, Jul 11, 2014 at 6:27 AM, Oliver Hartkopp <socketcan <at> hartkopp.net> wrote:
> >> Thanks for testing, Austin!
> >>
> >> I'll cook a patch for stable to add these settings.
> >> Even if we do not know why someone requests obviously all netdev skbs and
> >> treats them to be Ethernet/IP packets we should fix it in our code.
> >>
> >> If we fix it somewhere else it will last one year until some of the
> >> is-there-anything-else-than-ethernet-networking guys will break it again.
> >>
> >
> > Is it worth debugging why as well? Defense in layers?
>
> Anything new about the kernel-dhcp?
>
> Marc
>

Hi,

I have been investigating a similar issue on our system when running
dhclient on a wireless network interface whilst sending and receiving on
CAN via the BCM. I have applied your suggested patch, but also needed to
reset the mac header pointer in alloc_can_skb() as looped back transmit
packets will also cause an sk_buff under panic in af_packet.c. Ours is
an older kernel and not the latest CAN code, but I can't see any changes
in the latest code that would address this.

diff --git a/net/can/af_can.c b/net/can/af_can.c
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -318,6 +318,8 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,22)
 	skb_reset_network_header(skb);
 	skb_reset_transport_header(skb);
+	/* dhclient interacting badly with CAN. */
+	skb_reset_mac_header(skb);
 #else
 	skb->nh.raw = skb->data;
 	skb->h.raw = skb->data;

Thank you,

Daniel

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: skbuff panic
  2015-02-19 11:48 ` Daniel Steer
@ 2015-02-23 12:55 ` Oliver Hartkopp
  0 siblings, 0 replies; 17+ messages in thread
From: Oliver Hartkopp @ 2015-02-23 12:55 UTC (permalink / raw)
To: Daniel Steer; +Cc: linux-can, Marc Kleine-Budde, Austin Schuh

Hello Daniel,

thanks for pointing me to this issue (again)!

I was checking the mail thread from July 2014 and found this fix for
linux/drivers/net/can/dev.c

http://marc.info/?l=linux-can&m=140458812201304&w=2

which addresses this issue too.

So obviously we will need to reset the mac/network/transport header in

af_can.c: can_send()
dev.c: alloc_can_skb()
dev.c: alloc_canfd_skb()

to make sure it runs properly with af_packet (and kernel-dhcp) too.

Last time I announced that I would create a patch for it. This time I will DO SO!

Thanks for your patience :-)

Best regards,
Oliver

On 19.02.2015 12:48, Daniel Steer wrote:
> Marc Kleine-Budde <mkl <at> pengutronix.de> writes:
>
>>
>> On 07/11/2014 04:58 PM, Austin Schuh wrote:
>>> On Fri, Jul 11, 2014 at 6:27 AM, Oliver Hartkopp <socketcan <at> hartkopp.net> wrote:
>>>> Thanks for testing, Austin!
>>>>
>>>> I'll cook a patch for stable to add these settings.
>>>> Even if we do not know why someone requests obviously all netdev skbs and
>>>> treats them to be Ethernet/IP packets we should fix it in our code.
>>>>
>>>> If we fix it somewhere else it will last one year until some of the
>>>> is-there-anything-else-than-ethernet-networking guys will break it again.
>>>>
>>>
>>> Is it worth debugging why as well? Defense in layers?
>>
>> Anything new about the kernel-dhcp?
>>
>> Marc
>>
>
> Hi,
>
> I have been investigating a similar issue on our system when running
> dhclient on a wireless network interface whilst sending and receiving on
> CAN via the BCM. I have applied your suggested patch, but also needed to
> reset the mac header pointer in alloc_can_skb() as looped back transmit
> packets will also cause an sk_buff under panic in af_packet.c. Ours is
> an older kernel and not the latest CAN code, but I can't see any changes
> in the latest code that would address this.
>
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -318,6 +318,8 @@
>  #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,22)
>  	skb_reset_network_header(skb);
>  	skb_reset_transport_header(skb);
> +	/* dhclient interacting badly with CAN. */
> +	skb_reset_mac_header(skb);
>  #else
>  	skb->nh.raw = skb->data;
>  	skb->h.raw = skb->data;
>
> Thank you,
>
> Daniel
>

^ permalink raw reply [flat|nested] 17+ messages in thread