skbuff panic


* skbuff panic
@ 2014-07-03 23:03 Austin Schuh
  2014-07-03 23:18 ` Austin Schuh
  2014-07-05 10:40 ` Oliver Hartkopp
  0 siblings, 2 replies; 17+ messages in thread
From: Austin Schuh @ 2014-07-03 23:03 UTC
  To: linux-can

I'm seeing the following panic.  I've seen it on multiple kernel
versions (3.10.24 patched, and 3.14.3).

uname -a
Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT
2014 x86_64 GNU/Linux

Jul  3 12:18:28 vpc7 kernel: [   16.691928] skbuff: skb_under_panic:
text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080
data:ffff88030742507f tail:0x58 end:0x80 dev:can0
Jul  3 12:18:28 vpc7 kernel: [   16.692207] ------------[ cut here ]------------
Jul  3 12:18:28 vpc7 kernel: [   16.692209] kernel BUG at net/core/skbuff.c:100!
Jul  3 12:18:28 vpc7 kernel: [   16.692215] invalid opcode: 0000 [#1]
PREEMPT SMP
Jul  3 12:18:28 vpc7 kernel: [   16.692268] Modules linked in: ext3
mbcache jbd vcan loop snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic iTCO_wdt iTCO_vendor_support
x86_pkg_temp_thermal coretemp crc32c_intel ghash_clmulni_intel
aesni_intel aes_x86_64 ablk_helper cryptd lrw gf128mul glue_helper
evdev psmouse pcspkr serio_raw parport_pc parport tpm_tis tpm
snd_hda_intel snd_hda_codec peak_pci sja1000 i915 snd_hwdep can_dev
snd_pcm video snd_timer mei_me e1000e lpc_ich ac i2c_i801 mfd_core snd
mei ptp intel_gtt pps_core ata_generic drm_kms_helper button processor
ahci libahci fan thermal
Jul  3 12:18:28 vpc7 kernel: [   16.692274] CPU: 1 PID: 2080 Comm:
irq/18-can0 Not tainted 3.14.3-rt4abs+ #16
Jul  3 12:18:28 vpc7 kernel: [   16.692276] Hardware name: CompuLab
Intense-PC/Intense-PC, BIOS CR_2.2.0.400 X64 12/12/2013
Jul  3 12:18:28 vpc7 kernel: [   16.692279] task: ffff88040c5ad680 ti:
ffff880407cba000 task.ti: ffff880407cba000
Jul  3 12:18:28 vpc7 kernel: [   16.692293] RIP:
0010:[<ffffffff81512339>]  [<ffffffff81512339>] skb_panic+0x63/0x65
Jul  3 12:18:28 vpc7 kernel: [   16.692295] RSP: 0000:ffff880407cbbba8
 EFLAGS: 00010292
Jul  3 12:18:28 vpc7 kernel: [   16.692298] RAX: 000000000000008c RBX:
ffff88040be50200 RCX: 0000000016f816f7
Jul  3 12:18:28 vpc7 kernel: [   16.692300] RDX: 0000000000000001 RSI:
0000000000000000 RDI: 00000000ffffffff
Jul  3 12:18:28 vpc7 kernel: [   16.692301] RBP: ffff880407cbbbc8 R08:
0000000000000000 R09: 0000000000000000
Jul  3 12:18:28 vpc7 kernel: [   16.692304] R10: 00000000ffffffff R11:
00000000ffffffff R12: ffff88040be3f000
Jul  3 12:18:28 vpc7 kernel: [   16.692306] R13: ffff88040bf29000 R14:
ffff88040be3f000 R15: 0000000000000000
Jul  3 12:18:28 vpc7 kernel: [   16.692309] FS:
0000000000000000(0000) GS:ffff88042e080000(0000)
knlGS:0000000000000000
Jul  3 12:18:28 vpc7 kernel: [   16.692311] CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
Jul  3 12:18:28 vpc7 kernel: [   16.692313] CR2: 00007fe8a7fd34cc CR3:
00000004070ec000 CR4: 00000000001407e0
Jul  3 12:18:28 vpc7 kernel: [   16.692314] Stack:
Jul  3 12:18:28 vpc7 kernel: [   16.692320]  ffff88030742507f
0000000000000058 0000000000000080 ffff88040be3f000
Jul  3 12:18:28 vpc7 kernel: [   16.692324]  ffff880407cbbbd8
ffffffff8143e142 ffff880407cbbc08 ffffffff814fb64d
Jul  3 12:18:28 vpc7 kernel: [   16.692328]  ffff88040bf29880
ffffffff81ac40e8 ffffffff81ac4110 ffff88040be3f000
Jul  3 12:18:28 vpc7 kernel: [   16.692330] Call Trace:
Jul  3 12:18:28 vpc7 kernel: [   16.692340]  [<ffffffff8143e142>]
skb_push+0x38/0x39
Jul  3 12:18:28 vpc7 kernel: [   16.692348]  [<ffffffff814fb64d>]
packet_rcv_spkt+0x98/0xdf
Jul  3 12:18:28 vpc7 kernel: [   16.692357]  [<ffffffff8144b8f8>]
__netif_receive_skb_core+0x459/0x4dc
Jul  3 12:18:28 vpc7 kernel: [   16.692363]  [<ffffffff8106b2db>] ?
get_parent_ip+0xe/0x3e
Jul  3 12:18:28 vpc7 kernel: [   16.692369]  [<ffffffff8144b9ce>]
__netif_receive_skb+0x53/0x65
Jul  3 12:18:28 vpc7 kernel: [   16.692376]  [<ffffffff8144ba40>]
process_backlog+0x60/0x13d
Jul  3 12:18:28 vpc7 kernel: [   16.692384]  [<ffffffff8144be10>]
net_rx_action+0x91/0x1bd
Jul  3 12:18:28 vpc7 kernel: [   16.692395]  [<ffffffff81046793>]
do_current_softirqs+0x1a5/0x35b
Jul  3 12:18:28 vpc7 kernel: [   16.692404]  [<ffffffff81089c0e>] ?
irq_thread_fn+0x3a/0x3a
Jul  3 12:18:28 vpc7 kernel: [   16.692409]  [<ffffffff810469c7>]
__local_bh_enable+0x41/0x68
Jul  3 12:18:28 vpc7 kernel: [   16.692413]  [<ffffffff810469fc>]
local_bh_enable+0xe/0x10
Jul  3 12:18:28 vpc7 kernel: [   16.692417]  [<ffffffff81089c57>]
irq_forced_thread_fn+0x49/0x55
Jul  3 12:18:28 vpc7 kernel: [   16.692422]  [<ffffffff8108a3cb>]
irq_thread+0x8e/0x174
Jul  3 12:18:28 vpc7 kernel: [   16.692426]  [<ffffffff81089b32>] ?
irq_finalize_oneshot+0x9c/0x9c
Jul  3 12:18:28 vpc7 kernel: [   16.692431]  [<ffffffff8108a33d>] ?
irq_affinity_notify+0x14/0x14
Jul  3 12:18:28 vpc7 kernel: [   16.692437]  [<ffffffff8105f7cf>]
kthread+0xdc/0xe4
Jul  3 12:18:28 vpc7 kernel: [   16.692443]  [<ffffffff8105f6f3>] ?
flush_kthread_worker+0xe1/0xe1
Jul  3 12:18:28 vpc7 kernel: [   16.692449]  [<ffffffff815158ac>]
ret_from_fork+0x7c/0xb0
Jul  3 12:18:28 vpc7 kernel: [   16.692454]  [<ffffffff8105f6f3>] ?
flush_kthread_worker+0xe1/0xe1
Jul  3 12:18:28 vpc7 kernel: [   16.692498] Code: 00 00 48 89 44 24 10
8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 fb 46
7b 81 48 89 04 24 31 c0 e8 37 b2 ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48
c7 c2 a0 66 d8 81 48 89 e5 53 48
Jul  3 12:18:28 vpc7 kernel: [   16.692503] RIP  [<ffffffff81512339>]
skb_panic+0x63/0x65
Jul  3 12:18:28 vpc7 kernel: [   16.692505]  RSP <ffff880407cbbba8>
Jul  3 12:18:28 vpc7 kernel: [   16.849291] ---[ end trace 0000000000000002 ]---


Here are the skbuffs from a number of crashes.  The call traces seem
to be similar to the one above, but I haven't exhaustively checked.

Jul  3 12:18:28 vpc7 kernel: [   16.691928] skbuff: skb_under_panic:
text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080
data:ffff88030742507f tail:0x58 end:0x80 dev:can0

The following are from kernel version 'Linux version 3.10.24-rt22abs
(austin@aschuh-peloton) (gcc version 4.7.2 (Debian 4.7.2-5abs) ) #15
SMP PREEMPT RT Tue May 13 14:42:22 PDT'

Jul  3 09:25:11 vpc6 kernel: [    7.994591] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8802151eaa00
data:ffff8802151ea9ff tail:0x58 end:0x80 dev:can0
Jul  3 09:32:46 vpc6 kernel: [    7.887542] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8802176a9a40
data:ffff8802176a9a3f tail:0x58 end:0x80 dev:can1
Jun 28 08:56:19 vpc7 kernel: [    8.157864] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8803f4d49a40
data:ffff8803f4d49a3f tail:0x58 end:0x80 dev:can0
Jun 28 08:56:19 vpc7 kernel: [    8.157868] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8803f41881c0
data:ffff8803f41881bf tail:0x58 end:0x80 dev:can1
Jun 30 15:01:59 vpc6 kernel: [   11.481219] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8802150f8540
data:ffff8802150f853f tail:0x58 end:0x80 dev:can1
Jun 30 14:42:13 vpc6 kernel: [    9.660556] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff88021733f1c0
data:ffff88021733f1bf tail:0x58 end:0x80 dev:can1
Jun 30 12:55:40 vpc6 kernel: [    8.782069] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff880214f63d40
data:ffff880214f63d3f tail:0x58 end:0x80 dev:can1
Jun 30 12:17:30 vpc6 kernel: [   10.016782] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff880217cb9180
data:ffff880217cb917f tail:0x58 end:0x80 dev:can1
Jun 30 12:48:04 vpc6 kernel: [    8.720439] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff880214bb88c0
data:ffff880214bb88bf tail:0x58 end:0x80 dev:can1
Jun 27 18:46:38 vpc7 kernel: [    7.953504] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8803f22d0700
data:ffff8803f22d06ff tail:0x58 end:0x80 dev:can0
Jun 27 18:54:15 vpc7 kernel: [    8.861305] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8803f2cdbf00
data:ffff8803f2cdbeff tail:0x58 end:0x80 dev:can1
Jun 27 04:57:37 vpc6 kernel: [    8.336231] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff88021a81d780
data:ffff88021a81d77f tail:0x58 end:0x80 dev:can1
Jun 27 04:57:37 vpc6 kernel: [    8.336251] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff880218756740
data:ffff88021875673f tail:0x58 end:0x80 dev:can0
Jun 23 00:22:42 vpc7 kernel: [    7.924303] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8803f21e7e00
data:ffff8803f21e7dff tail:0x58 end:0x80 dev:can0
Jun 17 04:04:30 vpc6 kernel: [    7.776512] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff880215b19500
data:ffff880215b194ff tail:0x58 end:0x80 dev:can1
Jun 17 03:56:57 vpc6 kernel: [   11.906063] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff88021830a140
data:ffff88021830a13f tail:0x58 end:0x80 dev:can0
Jun 17 04:12:06 vpc6 kernel: [    7.656869] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8802148b4ec0
data:ffff8802148b4ebf tail:0x58 end:0x80 dev:can0

Any ideas what is causing it?  The issue seems to be that the data
pointer is less than the head pointer, from reading the code.  It only
happens right at startup.

Thanks,
  Austin
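
The check that fires here can be modelled in a few lines of C. This is a
minimal sketch, not the kernel source: toy_skb and toy_skb_push() are
made-up names, and the starting values used with it (72 bytes of headroom,
16 bytes of payload) are inferred from the 3.10.24 log lines above, on the
assumption that the panic message prints the fields after skb_push() has
already updated them.

```c
#include <stdint.h>

/* Simplified stand-in for the few sk_buff fields the check looks at. */
struct toy_skb {
    uint64_t head;  /* start of the allocated buffer */
    uint64_t data;  /* start of the payload */
    int len;        /* payload length in bytes */
};

/*
 * Mimics the bounds check in skb_push(): grow the payload towards the
 * front of the buffer by n bytes. Returns 0 on success and -1 where the
 * kernel would call skb_under_panic(). The fields are updated before the
 * check, which is why the log shows post-push values.
 */
static int toy_skb_push(struct toy_skb *skb, unsigned int n)
{
    skb->data -= n;
    skb->len += (int)n;
    return skb->data < skb->head ? -1 : 0;
}
```

Starting from head = 0xffff8802151eaa00, data = head + 72 and len = 16, a
push of 73 bytes leaves data one byte in front of head (0xffff8802151ea9ff)
with len = 89, which matches the first 3.10.24 entry.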


* Re: skbuff panic
  2014-07-03 23:03 skbuff panic Austin Schuh
@ 2014-07-03 23:18 ` Austin Schuh
  2014-07-05 10:40 ` Oliver Hartkopp
  1 sibling, 0 replies; 17+ messages in thread
From: Austin Schuh @ 2014-07-03 23:18 UTC
  To: linux-can


And, of course, I forgot to mention which hardware I'm using:

$ lspci -v -s 05:00.0
05:00.0 Network controller: PEAK-System Technik GmbH Device 0008 (rev 02)
        Subsystem: PEAK-System Technik GmbH Device 0005
        Flags: bus master, fast devsel, latency 0, IRQ 18
        Memory at e0610000 (32-bit, non-prefetchable) [size=64K]
        Memory at e0600000 (32-bit, non-prefetchable) [size=64K]
        Kernel driver in use: peak_pci

Austin


* Re: skbuff panic
  2014-07-03 23:03 skbuff panic Austin Schuh
  2014-07-03 23:18 ` Austin Schuh
@ 2014-07-05 10:40 ` Oliver Hartkopp
  2014-07-05 18:38   ` Austin Schuh
  1 sibling, 1 reply; 17+ messages in thread
From: Oliver Hartkopp @ 2014-07-05 10:40 UTC
  To: Austin Schuh, linux-can

On 04.07.2014 01:03, Austin Schuh wrote:
> I'm seeing the following panic.  I've seen it on multiple kernel
> versions (3.10.24 patched, and 3.14.3).
> 
> uname -a
> Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT
> 2014 x86_64 GNU/Linux
> 
> Jul  3 12:18:28 vpc7 kernel: [   16.691928] skbuff: skb_under_panic:
> text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080
> data:ffff88030742507f tail:0x58 end:0x80 dev:can0
> Jul  3 12:18:28 vpc7 kernel: [   16.692207] ------------[ cut here ]------------
> Jul  3 12:18:28 vpc7 kernel: [   16.692209] kernel BUG at net/core/skbuff.c:100!
(..)
> Jul  3 12:18:28 vpc7 kernel: [   16.692330] Call Trace:
> Jul  3 12:18:28 vpc7 kernel: [   16.692340]  [<ffffffff8143e142>]
> skb_push+0x38/0x39
> Jul  3 12:18:28 vpc7 kernel: [   16.692348]  [<ffffffff814fb64d>]
> packet_rcv_spkt+0x98/0xdf
> Jul  3 12:18:28 vpc7 kernel: [   16.692357]  [<ffffffff8144b8f8>]
> __netif_receive_skb_core+0x459/0x4dc

> 
> Any ideas what is causing it?  The issue seems to be that the data
> pointer is less than the head pointer, from reading the code.  It only
> happens right at startup.

Hi Austin,

Since a PF_PACKET socket is involved here (packet_rcv_spkt() is the caller of
skb_push()), things are slightly different from the PF_CAN handling.

Do these kernel panics occur on reception of CAN frames, or do they only show
up when you send CAN frames (via a PF_PACKET socket)?

Can you tell me a bit more about how you send and receive CAN frames in your
setup?

Best regards,
Oliver
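
One possible reading of the odd put: values in the logs: packet_rcv_spkt()
pushes the payload back to the recorded MAC header, so the push length is
(data - head) minus the MAC header offset. If that offset was never set and
still holds an all-ones marker, the subtraction wraps. The sketch assumes a
32-bit offset field on the 3.10 kernel, a 16-bit one on 3.14, and 72 bytes
of headroom; these are inferences that happen to reproduce the logged
numbers, not facts stated in this thread. rewind_len() is a hypothetical
helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Length a packet socket would hand to skb_push() to rewind the payload
 * to the MAC header. Offsets are relative to skb->head. The result is the
 * unsigned 32-bit length reinterpreted as signed, i.e. the value a "%d"
 * in the panic message would show.
 */
static int32_t rewind_len(uint32_t data_off, uint32_t mac_off)
{
    /* Wraps modulo 2^32, like an unsigned length parameter would. */
    uint32_t len = data_off - mac_off;

    /* Two's-complement reinterpretation without an implementation-defined cast. */
    return len <= INT32_MAX ? (int32_t)len
                            : -(int32_t)(UINT32_MAX - len) - 1;
}
```

With data_off = 72, an unset 32-bit offset (0xffffffff) gives 73 and an
unset 16-bit offset (0xffff) gives -65463. Adding the 16-byte CAN frame
yields len values of 89 and -65447, the same numbers as in the two sets of
log lines.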



* Re: skbuff panic
  2014-07-05 10:40 ` Oliver Hartkopp
@ 2014-07-05 18:38   ` Austin Schuh
  2014-07-05 19:21     ` Oliver Hartkopp
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Schuh @ 2014-07-05 18:38 UTC
  To: Oliver Hartkopp; +Cc: linux-can

On Sat, Jul 5, 2014 at 3:40 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
> On 04.07.2014 01:03, Austin Schuh wrote:
>> I'm seeing the following panic.  I've seen it on multiple kernel
>> versions (3.10.24 patched, and 3.14.3).
>>
>> uname -a
>> Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT
>> 2014 x86_64 GNU/Linux
>>
>> Jul  3 12:18:28 vpc7 kernel: [   16.691928] skbuff: skb_under_panic:
>> text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080
>> data:ffff88030742507f tail:0x58 end:0x80 dev:can0
>> Jul  3 12:18:28 vpc7 kernel: [   16.692207] ------------[ cut here ]------------
>> Jul  3 12:18:28 vpc7 kernel: [   16.692209] kernel BUG at net/core/skbuff.c:100!
> (..)
>> Jul  3 12:18:28 vpc7 kernel: [   16.692330] Call Trace:
>> Jul  3 12:18:28 vpc7 kernel: [   16.692340]  [<ffffffff8143e142>]
>> skb_push+0x38/0x39
>> Jul  3 12:18:28 vpc7 kernel: [   16.692348]  [<ffffffff814fb64d>]
>> packet_rcv_spkt+0x98/0xdf
>> Jul  3 12:18:28 vpc7 kernel: [   16.692357]  [<ffffffff8144b8f8>]
>> __netif_receive_skb_core+0x459/0x4dc
>
>>
>> Any ideas what is causing it?  The issue seems to be that the data
>> pointer is less than the head pointer, from reading the code.  It only
>> happens right at startup.
>
> Hi Austin,
>
> Since a PF_PACKET socket is involved here (packet_rcv_spkt() is the caller of
> skb_push()), things are slightly different from the PF_CAN handling.
>
> Do these kernel panics occur on reception of CAN frames, or do they only show
> up when you send CAN frames (via a PF_PACKET socket)?
>
> Can you tell me a bit more about how you send and receive CAN frames in your
> setup?
>
> Best regards,
> Oliver

Hi Oliver,

I'm opening the socket with the following calls:

int socket_ = socket(PF_CAN, SOCK_RAW, CAN_RAW);
struct ifreq ifr;
ioctl(socket_, SIOCGIFINDEX, &ifr);
struct sockaddr_can addr;
addr.can_family = AF_CAN;
addr.can_ifindex = ifr.ifr_ifindex;
bind(socket_, (struct sockaddr *)&addr, sizeof(addr));

And sending with:

struct can_frame frame;
write(socket_, &frame, sizeof(struct can_frame));

These panics only show up at startup time.  As you can see from the
syslog entries at the various times, they all happen within the first
20 seconds of the machine coming up, and I only get a max of 1 problem
frame per boot per interface.  My logs show that the frame that
triggers the problem comes in within 1 second of the CAN interface
being initialized.

Jul  3 09:32:46 vpc6 kernel: [    5.310067] loop: module loaded
Jul  3 09:32:46 vpc6 kernel: [    5.347914] vcan: Virtual CAN interface driver
Jul  3 09:32:46 vpc6 kernel: [    6.635362] XFS (sda6): Mounting Filesystem
Jul  3 09:32:46 vpc6 kernel: [    6.659463] XFS (sda6): Starting
recovery (logdev: internal)
Jul  3 09:32:46 vpc6 kernel: [    6.670430] XFS (sda6): Ending
recovery (logdev: internal)
Jul  3 09:32:46 vpc6 kernel: [    6.680831] XFS (sda7): Mounting Filesystem
Jul  3 09:32:46 vpc6 kernel: [    6.847411] XFS (sda7): Starting
recovery (logdev: internal)
Jul  3 09:32:46 vpc6 kernel: [    6.852927] XFS (sda7): Ending
recovery (logdev: internal)
Jul  3 09:32:46 vpc6 kernel: [    7.489861] peak_pci 0000:04:00.0
can0: setting BTR0=0x01 BTR1=0x9c
Jul  3 09:32:46 vpc6 kernel: [    7.564411] peak_pci 0000:04:00.0
can1: setting BTR0=0x00 BTR1=0x9c
Jul  3 09:32:46 vpc6 kernel: [    7.863569] r8169 0000:05:00.0 eth0:
unable to load firmware patch rtl_nic/rtl8168e-3.fw (-2)
Jul  3 09:32:46 vpc6 kernel: [    7.873102] r8169 0000:05:00.0 eth0: link down
Jul  3 09:32:46 vpc6 kernel: [    7.873169] IPv6: ADDRCONF(NETDEV_UP):
eth0: link is not ready
Jul  3 09:32:46 vpc6 kernel: [    7.873212] r8169 0000:05:00.0 eth0: link down
Jul  3 09:32:46 vpc6 kernel: [    7.887542] skbuff: skb_under_panic:
text:ffffffff81492274 len:89 put:73 head:ffff8802176a9a40
data:ffff8802176a9a3f tail:0x58 end:0x80 dev:can1
Jul  3 09:32:46 vpc6 kernel: [    7.887665] ------------[ cut here ]------------
Jul  3 09:32:46 vpc6 kernel: [    7.887666] kernel BUG at net/core/skbuff.c:127!

I think the problem is related to reception and startup.  I don't have
logs to conclusively show it, but I'm pretty certain that my sending
or reading applications haven't been started up by the time the panic
triggers.  I'll try to grab better evidence of that next time I
observe it.

Thanks!
  Austin


* Re: skbuff panic
  2014-07-05 18:38   ` Austin Schuh
@ 2014-07-05 19:21     ` Oliver Hartkopp
  2014-07-06  5:07       ` Austin Schuh
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Hartkopp @ 2014-07-05 19:21 UTC
  To: Austin Schuh; +Cc: linux-can

Hi Austin,

I assume some process opened a PF_PACKET socket for all kinds of traffic on
all interfaces (e.g. a DHCP client?). That looks strange, but it should never
cause a panic ...

There's some skb header initialization code in the can_send() function in
net/can/af_can.c. We could try to put some of it into alloc_can_skb().

Could you check whether the following patch fixes your issue?

Thanks,
Oliver
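
The idea behind the patch can be illustrated with a small model of the
header offsets. This is a sketch under assumptions: hdr_skb and the toy_*
helpers are invented names, and the split of the headroom into 64 bytes of
padding plus 8 reserved bytes is hypothetical (only a total of 72 can be
inferred from the logs). It shows that once the MAC header offset is
recorded before the reserve step, the rewind length is just the number of
reserved bytes and can no longer exceed the headroom.

```c
#include <stdint.h>

/* Offsets from the buffer head, standing in for sk_buff fields. */
struct hdr_skb {
    uint32_t data_off;  /* where the payload currently starts */
    uint32_t mac_off;   /* recorded MAC header position */
};

/* Same idea as skb_reset_mac_header(): remember the current data position. */
static void toy_reset_mac_header(struct hdr_skb *skb)
{
    skb->mac_off = skb->data_off;
}

/* Same idea as skb_reserve(): move the payload start forward by n bytes. */
static void toy_reserve(struct hdr_skb *skb, uint32_t n)
{
    skb->data_off += n;
}

/* Bytes a packet socket would push to get back to the MAC header. */
static int64_t toy_rewind_len(const struct hdr_skb *skb)
{
    return (int64_t)skb->data_off - (int64_t)skb->mac_off;
}
```

Starting with data_off = 64 and an unset mac_off, calling
toy_reset_mac_header() and then toy_reserve(skb, 8) gives a rewind length
of 8, well within the 72 bytes in front of the payload.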

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index e318e87..653db1bb 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -501,6 +501,10 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf)
 	skb->pkt_type = PACKET_BROADCAST;
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
 
+	skb_reset_mac_header(skb);
+	skb_reset_network_header(skb);
+	skb_reset_transport_header(skb);
+
 	can_skb_reserve(skb);
 	can_skb_prv(skb)->ifindex = dev->ifindex;
 

 


* Re: skbuff panic
  2014-07-05 19:21     ` Oliver Hartkopp
@ 2014-07-06  5:07       ` Austin Schuh
  2014-07-06 12:12         ` Oliver Hartkopp
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Schuh @ 2014-07-06  5:07 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: linux-can

Hi Oliver,

Thanks!

What makes you think that someone opened a PF_PACKET socket?  I'm
curious, since that observation may help me produce a more reliable
test case and debug it some more myself.

I'm going to work on reproducing the panic more reliably, and then
I'll give your patch a whirl.  Currently, only PCs that are on the
other end of cell modems out in the field seem to be triggering the
panic.  I'm hesitant to do excessive experimentation on something that
takes a plane trip to fix.  That just means that this will take longer
to debug than I'd like...

Austin

On Sat, Jul 5, 2014 at 12:21 PM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
> Hi Austin,
>
> I assume someone opened the PF_PACKET socket for any kind of traffic (e.g.
> dhcpclient ??) on any interface. Looks strange - but it should never cause
> any panic ...
>
> There's some skb header initialization code in the can_send() function in
> net/can/af_can.c . We could try to put some of these in alloc_can_skb().
>
> Can you try the following patch, if it fixes your issue?
>
> Thanks,
> Oliver
>
> diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
> index e318e87..653db1bb 100644
> --- a/drivers/net/can/dev.c
> +++ b/drivers/net/can/dev.c
> @@ -501,6 +501,10 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf)
>         skb->pkt_type = PACKET_BROADCAST;
>         skb->ip_summed = CHECKSUM_UNNECESSARY;
>
> +       skb_reset_mac_header(skb);
> +       skb_reset_network_header(skb);
> +       skb_reset_transport_header(skb);
> +
>         can_skb_reserve(skb);
>         can_skb_prv(skb)->ifindex = dev->ifindex;
>
>
>
>
> On 05.07.2014 20:38, Austin Schuh wrote:
>> On Sat, Jul 5, 2014 at 3:40 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>>> On 04.07.2014 01:03, Austin Schuh wrote:
>>>> I'm seeing the following panic.  I've seen it on multiple kernel
>>>> versions (3.10.24 patched, and 3.14.3).
>>>>
>>>> uname -a
>>>> Linux vpc5 3.14.3-rt4abs+ #16 SMP PREEMPT RT Tue Jul 1 16:28:26 PDT
>>>> 2014 x86_64 GNU/Linux
>>>>
>>>> Jul  3 12:18:28 vpc7 kernel: [   16.691928] skbuff: skb_under_panic:
>>>> text:ffffffff814fb64d len:-65447 put:-65463 head:ffff880407415080
>>>> data:ffff88030742507f tail:0x58 end:0x80 dev:can0
>>>> Jul  3 12:18:28 vpc7 kernel: [   16.692207] ------------[ cut here ]------------
>>>> Jul  3 12:18:28 vpc7 kernel: [   16.692209] kernel BUG at net/core/skbuff.c:100!
>>> (..)
>>>> Jul  3 12:18:28 vpc7 kernel: [   16.692330] Call Trace:
>>>> Jul  3 12:18:28 vpc7 kernel: [   16.692340]  [<ffffffff8143e142>]
>>>> skb_push+0x38/0x39
>>>> Jul  3 12:18:28 vpc7 kernel: [   16.692348]  [<ffffffff814fb64d>]
>>>> packet_rcv_spkt+0x98/0xdf
>>>> Jul  3 12:18:28 vpc7 kernel: [   16.692357]  [<ffffffff8144b8f8>]
>>>> __netif_receive_skb_core+0x459/0x4dc
>>>
>>>>
>>>> Any ideas what is causing it?  The issue seems to be that the data
>>>> pointer is less than the head pointer, from reading the code.  It only
>>>> happens right at startup.
>>>
>>> Hi Austin,
>>>
>>> as you are using the PF_PACKET socket here - where packet_rcv_spkt() is using
>>> skb_push() - the things are slightly different to the PF_CAN handling.
>>>
>>> Are these kernel panics related to the reception of CAN frames - or do they
>>> only show up when you send CAN frames (via PF_PACKET socket)??
>>>
>>> Can you tell something more about how you send and receive CAN frames in your
>>> setup?
>>>
>>> Best regards,
>>> Oliver
>>
>> Hi Oliver,
>>
>> I'm opening the socket with the following calls:
>>
>> int socket_ = socket(PF_CAN, SOCK_RAW, CAN_RAW);
>> struct ifreq ifr;
>> ioctl(socket_, SIOCGIFINDEX, &ifr);
>> struct sockaddr_can addr;
>> addr.can_family = AF_CAN;
>> addr.can_ifindex = ifr.ifr_ifindex;
>> bind(socket_, (struct sockaddr *)&addr, sizeof(addr));
>>
>> And sending with:
>>
>> struct can_frame frame
>> write(socket_, &frame, sizeof(struct can_frame))
>>
>> These panics only show up at startup time.  As you can see from the
>> syslog entries at the various times, they all happen within the first
>> 20 seconds of the machine coming up, and I only get a max of 1 problem
>> frame per boot per interface.  My logs show that the frame that
>> triggers the problem comes in within 1 second of the CAN interface
>> being initialized.
>>
>> Jul  3 09:32:46 vpc6 kernel: [    5.310067] loop: module loaded
>> Jul  3 09:32:46 vpc6 kernel: [    5.347914] vcan: Virtual CAN interface driver
>> Jul  3 09:32:46 vpc6 kernel: [    6.635362] XFS (sda6): Mounting Filesystem
>> Jul  3 09:32:46 vpc6 kernel: [    6.659463] XFS (sda6): Starting
>> recovery (logdev: internal)
>> Jul  3 09:32:46 vpc6 kernel: [    6.670430] XFS (sda6): Ending
>> recovery (logdev: internal)
>> Jul  3 09:32:46 vpc6 kernel: [    6.680831] XFS (sda7): Mounting Filesystem
>> Jul  3 09:32:46 vpc6 kernel: [    6.847411] XFS (sda7): Starting
>> recovery (logdev: internal)
>> Jul  3 09:32:46 vpc6 kernel: [    6.852927] XFS (sda7): Ending
>> recovery (logdev: internal)
>> Jul  3 09:32:46 vpc6 kernel: [    7.489861] peak_pci 0000:04:00.0
>> can0: setting BTR0=0x01 BTR1=0x9c
>> Jul  3 09:32:46 vpc6 kernel: [    7.564411] peak_pci 0000:04:00.0
>> can1: setting BTR0=0x00 BTR1=0x9c
>> Jul  3 09:32:46 vpc6 kernel: [    7.863569] r8169 0000:05:00.0 eth0:
>> unable to load firmware patch rtl_nic/rtl8168e-3.fw (-2)
>> Jul  3 09:32:46 vpc6 kernel: [    7.873102] r8169 0000:05:00.0 eth0: link down
>> Jul  3 09:32:46 vpc6 kernel: [    7.873169] IPv6: ADDRCONF(NETDEV_UP):
>> eth0: link is not ready
>> Jul  3 09:32:46 vpc6 kernel: [    7.873212] r8169 0000:05:00.0 eth0: link down
>> Jul  3 09:32:46 vpc6 kernel: [    7.887542] skbuff: skb_under_panic:
>> text:ffffffff81492274 len:89 put:73 head:ffff8802176a9a40
>> data:ffff8802176a9a3f tail:0x58 end:0x80 dev:can1
>> Jul  3 09:32:46 vpc6 kernel: [    7.887665] ------------[ cut here ]------------
>> Jul  3 09:32:46 vpc6 kernel: [    7.887666] kernel BUG at net/core/skbuff.c:127!
>>
>> I think the problem is related to reception and startup.  I don't have
>> logs to conclusively show it, but I'm pretty certain that my sending
>> or reading applications haven't been started up by the time the panic
>> triggers.  I'll try to grab better evidence of that next time I
>> observe it.
>>
>> Thanks!
>>    Austin
>>

* Re: skbuff panic
  2014-07-06  5:07       ` Austin Schuh
@ 2014-07-06 12:12         ` Oliver Hartkopp
  2014-07-06 16:13           ` Oliver Hartkopp
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Hartkopp @ 2014-07-06 12:12 UTC (permalink / raw)
  To: Austin Schuh; +Cc: linux-can

On 06.07.2014 07:07, Austin Schuh wrote:

> What makes you think that someone opened a PF_PACKET socket?  I'm
> curious, since that observation may help me produce a more reliable
> test case and debug it some more myself.

If you look at your call trace, it says:

skb_push+0x38/0x39          (-->> which panics)
packet_rcv_spkt+0x98/0xdf
__netif_receive_skb_core+0x459/0x4dc
get_parent_ip+0xe/0x3e
__netif_receive_skb+0x53/0x65

As packet_rcv_spkt() is located in net/packet/af_packet.c there must be some
user at this early stage of system boot to make PF_PACKET process the CAN frame.
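
For reference, packet_rcv_spkt() calls
skb_push(skb, skb->data - skb_mac_header(skb)), so the push length depends
on where the MAC header pointer sits. The userspace sketch below, in C,
models only that pointer arithmetic. It is not kernel code: the struct and
function names are made up for illustration, and the inputs (data at offset
72 from head, a 16-byte CAN frame, and a MAC header offset left at an
all-ones "unset" value of 0xffff or 0xffffffff) are assumptions inferred
from the numbers in the two logs, not something stated in this thread.

```c
#include <stdint.h>

/* Illustrative model, not kernel code. All names are hypothetical. */
struct push_model {
    uint32_t put;       /* length passed to skb_push(), an unsigned int */
    int32_t  len_after; /* skb->len after the push, as printed with %d */
    uint64_t data;      /* skb->data after the push */
    int      underflow; /* 1 if data ended up below head (skb_under_panic) */
};

/*
 * head:     address of the buffer start
 * data_off: offset of skb->data from head before the push (assumed)
 * mac_off:  offset of the MAC header from head (assumed)
 * len:      skb->len before the push (assumed)
 */
struct push_model model_spkt_push(uint64_t head, uint32_t data_off,
                                  uint32_t mac_off, uint32_t len)
{
    struct push_model m;
    uint64_t data = head + data_off;
    uint64_t mac = head + mac_off;

    /* The pointer difference is truncated to unsigned int. */
    m.put = (uint32_t)(data - mac);
    m.len_after = (int32_t)(len + m.put);
    /* skb_push() moves data down by the (unsigned) length. */
    m.data = data - m.put;
    m.underflow = m.data < head;
    return m;
}
```

With the assumed inputs, model_spkt_push(0xffff880407415080, 72, 0xffff, 16)
gives put -65463 (read as a signed value), len -65447 and data
0xffff88030742507f, and model_spkt_push(0xffff8802176a9a40, 72, 0xffffffff, 16)
gives put 73, len 89 and data 0xffff8802176a9a3f, which are the values in
the two skb_under_panic lines. When mac_off equals data_off, as it would
after skb_reset_mac_header(), the push length is 0 and nothing underflows.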

In https://gitorious.org/linux-can/can-tests there's a tst-packet.c program
which uses the PF_PACKET socket to send/receive CAN frames.

I was using tst-packet.c about four years ago for a test - without any
problems. But maybe something in the network layer changed, so that CAN frame
skbs need to be created with a different setup now.

I'll check ASAP whether tst-packet.c still works as expected on my machine.

> 
> I'm going to work on reproducing the panic more reliably, and then
> I'll give your patch a whirl.  Currently, only PCs that are on the
> other end of cell modems out in the field seem to be triggering the
> panic.  I'm hesitant to do excessive experimentation on something that
> takes a plane trip to fix.  That just means that this will take longer
> to debug than I'd like...

Indeed this should not be the plan ...
Let's save the world by saving carbon dioxide ;-)

Best regards,
Oliver

* Re: skbuff panic
  2014-07-06 12:12         ` Oliver Hartkopp
@ 2014-07-06 16:13           ` Oliver Hartkopp
  2014-07-06 19:38             ` Marc Kleine-Budde
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Hartkopp @ 2014-07-06 16:13 UTC (permalink / raw)
  To: Austin Schuh; +Cc: linux-can

Answering myself:

I tested the latest 3.16.0-rc3-00149-g034a0f6 with tst-packet.c and it works
like a charm. Receiving and sending with tst-packet is no problem.
I was using the SJA1000 based EMS PCMCIA card in my laptop for this test.

So I really wonder why (obviously) a CAN frame is processed by the PF_PACKET
reception handler at the start of your system?!?

Regards,
Oliver

On 06.07.2014 14:12, Oliver Hartkopp wrote:
> On 06.07.2014 07:07, Austin Schuh wrote:
> 
>> What makes you think that someone opened a PF_PACKET socket?  I'm
>> curious, since that observation may help me produce a more reliable
>> test case and debug it some more myself.
> 
> If you look at your call trace, it says:
> 
> skb_push+0x38/0x39          (-->> which panics)
> packet_rcv_spkt+0x98/0xdf
> __netif_receive_skb_core+0x459/0x4dc
> get_parent_ip+0xe/0x3e
> __netif_receive_skb+0x53/0x65
> 
> As packet_rcv_spkt() is located in net/packet/af_packet.c there must be some
> user at this early stage of system boot to make PF_PACKET process the CAN frame.
> 
> In https://gitorious.org/linux-can/can-tests there's a tst-packet.c program
> which uses the PF_PACKET socket to send/receive CAN frames.
> 
> I was using tst-packet.c about four years ago for a test - without any
> problems. But maybe something in the network layer changed, so that CAN frame
> skbs need to be created with a different setup now.
> 
> I'll try ASAP if tst-packet.c still works as expected on my machine.
> 
>>
>> I'm going to work on reproducing the panic more reliably, and then
>> I'll give your patch a whirl.  Currently, only PCs that are on the
>> other end of cell modems out in the field seem to be triggering the
>> panic.  I'm hesitant to do excessive experimentation on something that
>> takes a plane trip to fix.  That just means that this will take longer
>> to debug than I'd like...
> 
> Indeed this should not be the plan ...
> Let's save the world by saving carbon dioxide ;-)
> 
> Best regards,
> Oliver
> 

* Re: skbuff panic
  2014-07-06 16:13           ` Oliver Hartkopp
@ 2014-07-06 19:38             ` Marc Kleine-Budde
  2014-07-07  4:11               ` Austin Schuh
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Kleine-Budde @ 2014-07-06 19:38 UTC (permalink / raw)
  To: Oliver Hartkopp, Austin Schuh; +Cc: linux-can

On 07/06/2014 06:13 PM, Oliver Hartkopp wrote:
> Answering myself:
> 
> I tested the latest 3.16.0-rc3-00149-g034a0f6 with tst-packet.c and it works
> like a charm. Receiving and sending with tst-packet is no problem.
> I was using the SJA1000 based EMS PCMCIA card in my laptop for this test.
> 
> So I really wonder why (obviously) a CAN frame is processed by the PF_PACKET
> reception handler at the start of your system?!?

What about a dhcp client or the kernel's autoip functionality?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Re: skbuff panic
  2014-07-06 19:38             ` Marc Kleine-Budde
@ 2014-07-07  4:11               ` Austin Schuh
  2014-07-10  0:07                 ` Austin Schuh
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Schuh @ 2014-07-07  4:11 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can

On Sun, Jul 6, 2014 at 12:38 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 07/06/2014 06:13 PM, Oliver Hartkopp wrote:
>> Answering myself:
>>
>> I tested the latest 3.16.0-rc3-00149-g034a0f6 with tst-packet.c and it works
>> like a charm. Receiving and sending with tst-packet is no problem.
>> I was using the SJA1000 based EMS PCMCIA card in my laptop for this test.
>>
>> So I really wonder why (obviously) a CAN frame is processed by the PF_PACKET
>> reception handler at the start of your system?!?
>
> What about a dhcp client or the kernel's autoip functionality?
>
> Marc

That is a good hypothesis.  DHCP is enabled on the machine with the
problem, and the machine at my desk which isn't reproducing the panic
has it disabled.  I'll enable it tomorrow and try it.
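
As background for the DHCP hypothesis: packet_rcv_spkt() is the receive
hook that af_packet.c installs for sockets of the legacy SOCK_PACKET type,
and a packet socket opened with ETH_P_ALL that is not bound to a device is
handed frames from every interface, CAN included. Below is a minimal sketch
in C of such a listener, which might serve as a starting point for a
reproducer. The helper name is made up, and whether the DHCP client on the
affected machines opens its socket this way is an assumption, not something
confirmed in this thread. Opening the socket needs CAP_NET_RAW.

```c
#include <arpa/inet.h>       /* htons */
#include <linux/if_ether.h>  /* ETH_P_ALL */
#include <sys/socket.h>      /* socket, PF_PACKET, SOCK_PACKET */
#include <unistd.h>          /* close */

/*
 * Hypothetical helper: open an unbound SOCK_PACKET listener for all
 * protocols. Returns a descriptor, or -1 with errno set (typically
 * EPERM when the caller lacks CAP_NET_RAW).
 */
int open_spkt_listener(void)
{
    return socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
}
```

Reading from the returned descriptor with recvfrom() while frames arrive
on can0 should then exercise the PF_PACKET receive path seen in the trace.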

Austin

* Re: skbuff panic
  2014-07-07  4:11               ` Austin Schuh
@ 2014-07-10  0:07                 ` Austin Schuh
  2014-07-10 17:37                   ` Austin Schuh
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Schuh @ 2014-07-10  0:07 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can

On Sun, Jul 6, 2014 at 9:11 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> On Sun, Jul 6, 2014 at 12:38 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>> On 07/06/2014 06:13 PM, Oliver Hartkopp wrote:
>>> Answering myself:
>>>
>>> I tested the latest 3.16.0-rc3-00149-g034a0f6 with tst-packet.c and it works
>>> like a charm. Receiving and sending with tst-packet is no problem.
>>> I was using the SJA1000 based EMS PCMCIA card in my laptop for this test.
>>>
>>> So I really wonder why (obviously) a CAN frame is processed by the PF_PACKET
>>> reception handler at the start of your system?!?
>>
>> What about a dhcp client or the kernel's autoip functionality?
>>
>> Marc
>
> That is a good hypothesis.  DHCP is enabled on the machine with the
> problem, and the machine at my desk which isn't reproducing the panic
> has it disabled.  I'll enable it tomorrow and try it.
>
> Austin

Turns out reproducing this bug at my desk is a bit of a pain.  It
takes 10 reboots on average, with a standard deviation of 9, to
reproduce the bug.

With Oliver's patch, I'm able to get to 60 reboots and counting, so it
looks like that was the problem.  I'll leave it rebooting overnight
to be sure.

Austin


For my future reference/anyone else who is interested, here is the
number of reboots until failure without the patch:
27 successful reboots
1 failed reboot
2 successful reboots
1 failed reboot
13 successful
1 failed
5 successful
2 failed
10 successful
2 failed
3 successful
1 failed
4 successful
1 failed
19 successful
1 failed
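
The mean and standard deviation quoted earlier in this message can be
checked against the list above. A small C sketch, assuming the statistic is
taken over the eight runs of consecutive successful reboots (27, 2, 13, 5,
10, 3, 4, 19) and using the sample (n - 1) variance; both choices are
assumptions about how the numbers were derived, and the function names are
illustrative.

```c
#include <stddef.h>

/* Arithmetic mean of n run lengths. */
double run_mean(const int *runs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum / (double)n;
}

/* Sample variance (divides by n - 1); requires n >= 2. */
double run_sample_variance(const int *runs, size_t n)
{
    double mean = run_mean(runs, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = runs[i] - mean;
        ss += d * d;
    }
    return ss / (double)(n - 1);
}
```

For the eight runs this gives a mean of 10.375 and a sample variance of
about 78.8, i.e. a standard deviation of roughly 8.9, which is in line with
the rounded figures above.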

* Re: skbuff panic
  2014-07-10  0:07                 ` Austin Schuh
@ 2014-07-10 17:37                   ` Austin Schuh
  2014-07-11 13:27                     ` Oliver Hartkopp
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Schuh @ 2014-07-10 17:37 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Oliver Hartkopp, linux-can

On Wed, Jul 9, 2014 at 5:07 PM, Austin Schuh <austin@peloton-tech.com> wrote:
> With Oliver's patch, I'm able to get to 60 reboots and counting, so it
> looks like that was the problem.  I'll leave it rebooting over night
> to be sure.

The machine survived 216 reboots with no panics overnight.  Thanks Oliver!

Tested-by: Austin Schuh <austin@peloton-tech.com>
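
As a rough sanity check on that result: the unpatched log earlier in the
thread lists 10 failed boots out of 93. Assuming, as a simplification, that
each boot fails independently with that probability, the chance of a clean
streak this long without the fix can be estimated as below. The function
name is illustrative.

```c
/*
 * Probability of `streak` consecutive clean boots when each boot fails
 * independently with probability failures / total (a simplifying
 * assumption, not a measured property of the bug).
 */
double clean_streak_probability(int failures, int total, int streak)
{
    double p_ok = 1.0 - (double)failures / (double)total;
    double prob = 1.0;

    for (int i = 0; i < streak; i++)
        prob *= p_ok;
    return prob;
}
```

clean_streak_probability(10, 93, 216) comes out well below one in a
billion, so under this model 216 clean reboots are very unlikely to be luck.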

* Re: skbuff panic
  2014-07-10 17:37                   ` Austin Schuh
@ 2014-07-11 13:27                     ` Oliver Hartkopp
  2014-07-11 14:58                       ` Austin Schuh
  0 siblings, 1 reply; 17+ messages in thread
From: Oliver Hartkopp @ 2014-07-11 13:27 UTC (permalink / raw)
  To: Austin Schuh, Marc Kleine-Budde; +Cc: linux-can

Thanks for testing, Austin!

I'll cook a patch for stable to add these settings.
Even if we do not know why someone apparently requests all netdev skbs and
treats them as Ethernet/IP packets, we should fix it in our code.

If we fix it somewhere else, it will take a year until some of the
is-there-anything-else-than-ethernet-networking guys break it again. ;-)

Best regards,
Oliver

On 10.07.2014 13:37, Austin Schuh wrote:
> On Wed, Jul 9, 2014 at 5:07 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>> With Oliver's patch, I'm able to get to 60 reboots and counting, so it
>> looks like that was the problem.  I'll leave it rebooting over night
>> to be sure.
>
> The machine survived 216 reboots with no panics over the night.  Thanks Oliver!
>
> Tested-by: Austin Schuh <austin@peloton-tech.com>
>

* Re: skbuff panic
  2014-07-11 13:27                     ` Oliver Hartkopp
@ 2014-07-11 14:58                       ` Austin Schuh
  2014-07-11 17:48                         ` Marc Kleine-Budde
  0 siblings, 1 reply; 17+ messages in thread
From: Austin Schuh @ 2014-07-11 14:58 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: Marc Kleine-Budde, linux-can

On Fri, Jul 11, 2014 at 6:27 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
> Thanks for testing, Austin!
>
> I'll cook a patch for stable to add these settings.
> Even if we do not know why someone requests obviously all netdev skbs and
> treats them to be Ethernet/IP packets we should fix it in our code.
>
> If we fix it somewhere else it will last one year until some of the
> is-there-anything-else-than-ethernet-networking guys will break it again.
> ;-)

Is it worth debugging why as well?  Defense in layers?

Austin

* Re: skbuff panic
  2014-07-11 14:58                       ` Austin Schuh
@ 2014-07-11 17:48                         ` Marc Kleine-Budde
  2015-02-19 11:48                           ` Daniel Steer
  0 siblings, 1 reply; 17+ messages in thread
From: Marc Kleine-Budde @ 2014-07-11 17:48 UTC (permalink / raw)
  To: Austin Schuh, Oliver Hartkopp; +Cc: linux-can

On 07/11/2014 04:58 PM, Austin Schuh wrote:
> On Fri, Jul 11, 2014 at 6:27 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>> Thanks for testing, Austin!
>>
>> I'll cook a patch for stable to add these settings.
>> Even if we do not know why someone requests obviously all netdev skbs and
>> treats them to be Ethernet/IP packets we should fix it in our code.
>>
>> If we fix it somewhere else it will last one year until some of the
>> is-there-anything-else-than-ethernet-networking guys will break it again.
>> ;-)
> 
> Is it worth debugging why as well?  Defense in layers?

Anything new about the kernel-dhcp?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Re: skbuff panic
  2014-07-11 17:48                         ` Marc Kleine-Budde
@ 2015-02-19 11:48                           ` Daniel Steer
  2015-02-23 12:55                             ` Oliver Hartkopp
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Steer @ 2015-02-19 11:48 UTC (permalink / raw)
  To: linux-can

Marc Kleine-Budde <mkl@pengutronix.de> writes:

>
> On 07/11/2014 04:58 PM, Austin Schuh wrote:
> > On Fri, Jul 11, 2014 at 6:27 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
> >> Thanks for testing, Austin!
> >>
> >> I'll cook a patch for stable to add these settings.
> >> Even if we do not know why someone requests obviously all netdev skbs and
> >> treats them to be Ethernet/IP packets we should fix it in our code.
> >>
> >> If we fix it somewhere else it will last one year until some of the
> >> is-there-anything-else-than-ethernet-networking guys will break it again.
> >> ;-)
> >
> > Is it worth debugging why as well?  Defense in layers?
>
> Anything new about the kernel-dhcp?
>
> Marc
>

Hi,

I have been investigating a similar issue on our system when running
dhclient on a wireless network interface whilst sending and receiving on
CAN via the BCM. I have applied your suggested patch, but also needed to
reset the mac header pointer in can_send(), as looped-back transmit
packets will also cause an skb_under_panic in af_packet.c. Ours is
an older kernel and not the latest CAN code, but I can't see any changes
in the latest code that would address this.

diff --git a/net/can/af_can.c b/net/can/af_can.c
--- a/net/can/af_can.c
+++ b/net/can/af_can.c
@@ -318,6 +318,8 @@
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,22)
     skb_reset_network_header(skb);
     skb_reset_transport_header(skb);
+    /* dhclient interacting badly with CAN. */
+    skb_reset_mac_header(skb);
 #else
     skb->nh.raw = skb->data;
     skb->h.raw  = skb->data;

Thank you,

Daniel




* Re: skbuff panic
  2015-02-19 11:48                           ` Daniel Steer
@ 2015-02-23 12:55                             ` Oliver Hartkopp
  0 siblings, 0 replies; 17+ messages in thread
From: Oliver Hartkopp @ 2015-02-23 12:55 UTC (permalink / raw)
  To: Daniel Steer; +Cc: linux-can, Marc Kleine-Budde, Austin Schuh

Hello Daniel,

thanks for pointing me to this issue (again)!

I was checking the mail thread from July 2014 and found this fix for
drivers/net/can/dev.c

http://marc.info/?l=linux-can&m=140458812201304&w=2

which addresses this issue too.

So obviously we will need to reset the mac/network/transport header in

af_can.c: can_send()
dev.c: alloc_can_skb()
dev.c: alloc_canfd_skb()

to make sure it runs properly with af_packet (and kernel-dhcp) too.

Last time I announced that I would create a patch for it. This time I will DO SO!

Thanks for your patience :-)

Best regards,
Oliver


On 19.02.2015 12:48, Daniel Steer wrote:
> Marc Kleine-Budde <mkl@pengutronix.de> writes:
>
>>
>> On 07/11/2014 04:58 PM, Austin Schuh wrote:
>>> On Fri, Jul 11, 2014 at 6:27 AM, Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>>>> Thanks for testing, Austin!
>>>>
>>>> I'll cook a patch for stable to add these settings.
>>>> Even if we do not know why someone requests obviously all netdev skbs and
>>>> treats them to be Ethernet/IP packets we should fix it in our code.
>>>>
>>>> If we fix it somewhere else it will last one year until some of the
>>>> is-there-anything-else-than-ethernet-networking guys will break it again.
>>>>
>>>
>>> Is it worth debugging why as well?  Defense in layers?
>>
>> Anything new about the kernel-dhcp?
>>
>> Marc
>>
>
> Hi,
>
> I have been investigating a similar issue on our system when running
> dhclient on a wireless network interface whilst sending and receiving on
> CAN via the BCM. I have applied your suggested patch, but also needed to
> reset the mac header pointer in can_send(), as looped-back transmit
> packets will also cause an skb_under_panic in af_packet.c. Ours is
> an older kernel and not the latest CAN code, but I can't see any changes
> in the latest code that would address this.
>
> diff --git a/net/can/af_can.c b/net/can/af_can.c
> --- a/net/can/af_can.c
> +++ b/net/can/af_can.c
> @@ -318,6 +318,8 @@
>   #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,22)
>       skb_reset_network_header(skb);
>       skb_reset_transport_header(skb);
> +    /* dhclient interacting badly with CAN. */
> +    skb_reset_mac_header(skb);
>   #else
>       skb->nh.raw = skb->data;
>       skb->h.raw  = skb->data;
>
> Thank you,
>
> Daniel
>
>
>

Thread overview: 17+ messages
2014-07-03 23:03 skbuff panic Austin Schuh
2014-07-03 23:18 ` Austin Schuh
2014-07-05 10:40 ` Oliver Hartkopp
2014-07-05 18:38   ` Austin Schuh
2014-07-05 19:21     ` Oliver Hartkopp
2014-07-06  5:07       ` Austin Schuh
2014-07-06 12:12         ` Oliver Hartkopp
2014-07-06 16:13           ` Oliver Hartkopp
2014-07-06 19:38             ` Marc Kleine-Budde
2014-07-07  4:11               ` Austin Schuh
2014-07-10  0:07                 ` Austin Schuh
2014-07-10 17:37                   ` Austin Schuh
2014-07-11 13:27                     ` Oliver Hartkopp
2014-07-11 14:58                       ` Austin Schuh
2014-07-11 17:48                         ` Marc Kleine-Budde
2015-02-19 11:48                           ` Daniel Steer
2015-02-23 12:55                             ` Oliver Hartkopp
