* NULL pointer dereference panic in stable (2.6.33.2), amd64 @ 2010-04-11 20:38 Denys Fedorysychenko 2010-04-11 22:35 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: Denys Fedorysychenko @ 2010-04-11 20:38 UTC (permalink / raw) To: netdev [-- Attachment #1: Type: Text/Plain, Size: 386 bytes --] Hi Got recently NULL pointer dereference Note - it seems in 32-bit i didn't experience this issue. Router with NAT, HFSC shapers terminated with bfifo qdiscs (some attached to 802.1Q vlan's), userspace application using tun. Hardware - network cards e1000e, Core 2 based architecture (two Quad Xeon), full message received over netconsole attached More info can provide on request [-- Attachment #2: BUG --] [-- Type: text/plain, Size: 7186 bytes --] Apr 11 23:28:52 ROUTER kernel: [21095.453138] BUG: unable to handle kernel NULL pointer dereference at (null) Apr 11 23:28:52 ROUTER kernel: [21095.453334] IP: [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465 Apr 11 23:28:52 ROUTER kernel: [21095.453522] PGD 20b247067 PUD 20b24c067 PMD 0 Apr 11 23:28:52 ROUTER kernel: [21095.453704] Oops: 0000 [#1] SMP Apr 11 23:28:52 ROUTER kernel: [21095.453880] last sysfs file: /sys/devices/virtual/misc/tun/dev Apr 11 23:28:52 ROUTER kernel: [21095.454001] CPU 0 Apr 11 23:28:52 ROUTER kernel: [21095.454001] Pid: 2864, comm: globax Not tainted 2.6.33.2-build-0051-64 #4 / Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP: 0010:[<ffffffff811e5877>] [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465 Apr 11 23:28:52 ROUTER kernel: [21095.454001] RSP: 0000:ffff880028203db0 EFLAGS: 00010202 Apr 11 23:28:52 ROUTER kernel: [21095.454001] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff8802087d8d00 Apr 11 23:28:52 ROUTER kernel: [21095.454001] RDX: ffff88021d818000 RSI: 0000000000000000 RDI: ffff8802135850e8 Apr 11 23:28:52 ROUTER kernel: [21095.454001] RBP: ffff880028203de0 R08: ffff88021c01d89c R09: ffff88021c01dc00 Apr 11 23:28:52 ROUTER kernel: [21095.454001] R10: dead000000200200 R11: dead000000100100 R12: ffff88021e212b80 Apr 11 23:28:52 ROUTER kernel: [21095.454001] R13: ffff88021da51e80 R14: ffff8802135850e8 R15: ffff88021be62000 Apr 11 23:28:52 ROUTER kernel: [21095.454001] FS: 0000000000000000(0000) GS:ffff880028200000(0063) knlGS:00000000f7e51b10 Apr 11 23:28:52 ROUTER kernel: [21095.454001] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000 CR3: 000000020b248000 CR4: 00000000000006f0 Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 11 23:28:52 ROUTER kernel: [21095.454001] Process globax (pid: 2864, threadinfo ffff8802118c0000, task ffff8802182c2c20) Apr 11 23:28:52 ROUTER kernel: [21095.454001] Stack: Apr 11 23:28:52 ROUTER kernel: [21095.454001] ffff88021d818000 ffff88021da51e80 0000000000000042 ffff88021da51e80 Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> ffff88021be62000 ffff88021be62000 ffff880028203e00 ffffffffa01c12a9 Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> 0000000000000000 ffff8802135850e8 ffff880028203e50 ffffffff811e540e Apr 11 23:28:52 ROUTER kernel: [21095.454001] Call Trace: Apr 11 23:28:52 ROUTER kernel: [21095.454001] <IRQ> Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q] Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304 Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff811e2c89>] net_tx_action+0xc2/0x120 Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81039670>] __do_softirq+0x96/0x11a Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff810037cc>] call_softirq+0x1c/0x28 Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81005543>] do_softirq+0x33/0x68 Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81039407>] irq_exit+0x36/0x75 Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96 Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20 Apr 11 23:28:52 ROUTER kernel: [21095.454001] <EOI> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Code: e2 48 8b 55 d0 49 c1 e4 07 66 41 8b 86 a6 00 00 00 4c 03 a2 00 03 00 00 80 e4 cf 80 cc 20 49 8b 5c 24 08 66 41 89 86 a6 00 00 00 <48> 83 3b 00 0f 84 bb 00 00 00 4c 8d ab 9c 00 00 00 4c 89 ef e8 Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465 Apr 11 23:28:52 ROUTER kernel: [21095.454001] RSP <ffff880028203db0> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000 Apr 11 23:28:52 ROUTER kernel: [21095.462975] ---[ end trace 2421d995c1afd7c3 ]--- Apr 11 23:28:52 ROUTER kernel: [21095.463199] Kernel panic - not syncing: Fatal exception in interrupt Apr 11 23:28:52 ROUTER kernel: [21095.463428] Pid: 2864, comm: globax Tainted: G D 2.6.33.2-build-0051-64 #4 Apr 11 23:28:52 ROUTER kernel: [21095.463819] Call Trace: Apr 11 23:28:52 ROUTER kernel: [21095.464036] <IRQ> [<ffffffff81259743>] panic+0xa0/0x161 Apr 11 23:28:52 ROUTER kernel: [21095.464307] [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20 Apr 11 23:28:52 ROUTER kernel: [21095.464533] [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c Apr 11 23:28:52 ROUTER kernel: [21095.464757] [<ffffffff81006651>] oops_end+0xaa/0xba Apr 11 23:28:52 ROUTER kernel: [21095.464980] [<ffffffff8101e653>] no_context+0x1f3/0x202 Apr 11 23:28:52 ROUTER kernel: [21095.465211] [<ffffffff810523a4>] ? tick_program_event+0x25/0x27 Apr 11 23:28:52 ROUTER kernel: [21095.465437] [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0 Apr 11 23:28:52 ROUTER kernel: [21095.465668] [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5 Apr 11 23:28:52 ROUTER kernel: [21095.465907] [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e] Apr 11 23:28:52 ROUTER kernel: [21095.466139] [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25 Apr 11 23:28:52 ROUTER kernel: [21095.466372] [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e] Apr 11 23:28:52 ROUTER kernel: [21095.466605] [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e] Apr 11 23:28:52 ROUTER kernel: [21095.466833] [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10 Apr 11 23:28:52 ROUTER kernel: [21095.467064] [<ffffffff8101eb32>] do_page_fault+0x114/0x24a Apr 11 23:28:52 ROUTER kernel: [21095.467293] [<ffffffff8125bc9f>] page_fault+0x1f/0x30 Apr 11 23:28:52 ROUTER kernel: [21095.467516] [<ffffffff811e5877>] ? dev_queue_xmit+0x284/0x465 Apr 11 23:28:52 ROUTER kernel: [21095.467743] [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q] Apr 11 23:28:52 ROUTER kernel: [21095.468139] [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304 Apr 11 23:28:52 ROUTER kernel: [21095.468366] [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b Apr 11 23:28:52 ROUTER kernel: [21095.468591] [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc Apr 11 23:28:52 ROUTER kernel: [21095.468814] [<ffffffff811e2c89>] net_tx_action+0xc2/0x120 Apr 11 23:28:52 ROUTER kernel: [21095.469042] [<ffffffff81039670>] __do_softirq+0x96/0x11a Apr 11 23:28:52 ROUTER kernel: [21095.469267] [<ffffffff810037cc>] call_softirq+0x1c/0x28 Apr 11 23:28:52 ROUTER kernel: [21095.469491] [<ffffffff81005543>] do_softirq+0x33/0x68 Apr 11 23:28:52 ROUTER kernel: [21095.469714] [<ffffffff81039407>] irq_exit+0x36/0x75 Apr 11 23:28:52 ROUTER kernel: [21095.469936] [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96 Apr 11 23:28:52 ROUTER kernel: [21095.470167] [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20 ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-11 20:38 NULL pointer dereference panic in stable (2.6.33.2), amd64 Denys Fedorysychenko @ 2010-04-11 22:35 ` Eric Dumazet 2010-04-11 23:04 ` Denys Fedorysychenko 2010-04-12 3:38 ` Krishna Kumar2 0 siblings, 2 replies; 37+ messages in thread From: Eric Dumazet @ 2010-04-11 22:35 UTC (permalink / raw) To: Denys Fedorysychenko; +Cc: netdev, Krishna Kumar, David Miller Le dimanche 11 avril 2010 à 23:38 +0300, Denys Fedorysychenko a écrit : > Hi > > Got recently NULL pointer dereference > Note - it seems in 32-bit i didn't experience this issue. > > Router with NAT, HFSC shapers terminated with bfifo qdiscs (some attached to > 802.1Q vlan's), userspace application using tun. > Hardware - network cards e1000e, Core 2 based architecture (two Quad Xeon), > > full message received over netconsole attached > More info can provide on request > pièce jointe document texte brut (BUG) > Apr 11 23:28:52 ROUTER kernel: [21095.453138] BUG: unable to handle kernel NULL pointer dereference at (null) > Apr 11 23:28:52 ROUTER kernel: [21095.453334] IP: [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465 > Apr 11 23:28:52 ROUTER kernel: [21095.453522] PGD 20b247067 PUD 20b24c067 PMD 0 > Apr 11 23:28:52 ROUTER kernel: [21095.453704] Oops: 0000 [#1] SMP > Apr 11 23:28:52 ROUTER kernel: [21095.453880] last sysfs file: /sys/devices/virtual/misc/tun/dev > Apr 11 23:28:52 ROUTER kernel: [21095.454001] CPU 0 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] Pid: 2864, comm: globax Not tainted 2.6.33.2-build-0051-64 #4 / > Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP: 0010:[<ffffffff811e5877>] [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] RSP: 0000:ffff880028203db0 EFLAGS: 00010202 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff8802087d8d00 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] RDX: ffff88021d818000 RSI: 0000000000000000 RDI: ffff8802135850e8 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] RBP: ffff880028203de0 R08: ffff88021c01d89c R09: ffff88021c01dc00 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] R10: dead000000200200 R11: dead000000100100 R12: ffff88021e212b80 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] R13: ffff88021da51e80 R14: ffff8802135850e8 R15: ffff88021be62000 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] FS: 0000000000000000(0000) GS:ffff880028200000(0063) knlGS:00000000f7e51b10 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000 CR3: 000000020b248000 CR4: 00000000000006f0 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] Process globax (pid: 2864, threadinfo ffff8802118c0000, task ffff8802182c2c20) > Apr 11 23:28:52 ROUTER kernel: [21095.454001] Stack: > Apr 11 23:28:52 ROUTER kernel: [21095.454001] ffff88021d818000 ffff88021da51e80 0000000000000042 ffff88021da51e80 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> ffff88021be62000 ffff88021be62000 ffff880028203e00 ffffffffa01c12a9 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> 0000000000000000 ffff8802135850e8 ffff880028203e50 ffffffff811e540e > Apr 11 23:28:52 ROUTER kernel: [21095.454001] Call Trace: > Apr 11 23:28:52 ROUTER kernel: [21095.454001] <IRQ> > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q] > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff811e2c89>] net_tx_action+0xc2/0x120 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81039670>] __do_softirq+0x96/0x11a > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff810037cc>] call_softirq+0x1c/0x28 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81005543>] do_softirq+0x33/0x68 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81039407>] irq_exit+0x36/0x75 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] <EOI> > Apr 11 23:28:52 ROUTER kernel: [21095.454001] Code: e2 > 48 8b 55 d0 // mov -0x30(%rbp),%rdx > 49 c1 e4 07 // shl $0x7,%r12 > 66 41 8b 86 a6 00 00 00 // mov 0xa6(%r14),%ax > 4c 03 a2 00 03 00 00 // add 0x0300(%rdx),%r12 > 80 e4 cf // and $0xcf,%ah > 80 cc 20 // or $0x20,%ah > 49 8b 5c 24 08 // mov 0x8(%r12),%rbx rcu_dereference(txq->qdisc); > 66 41 89 86 a6 00 00 00 // mov %ax,0xa6(%r14) > <48> 83 3b 00 // cmpq $0x0,(%rbx) > 0f 84 bb 00 00 00 // je ... > 4c 8d ab 9c 00 00 00 4c 89 ef e8 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465 > Apr 11 23:28:52 ROUTER kernel: [21095.454001] RSP <ffff880028203db0> > Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000 > Apr 11 23:28:52 ROUTER kernel: [21095.462975] ---[ end trace 2421d995c1afd7c3 ]--- > Apr 11 23:28:52 ROUTER kernel: [21095.463199] Kernel panic - not syncing: Fatal exception in interrupt > Apr 11 23:28:52 ROUTER kernel: [21095.463428] Pid: 2864, comm: globax Tainted: G D 2.6.33.2-build-0051-64 #4 > Apr 11 23:28:52 ROUTER kernel: [21095.463819] Call Trace: > Apr 11 23:28:52 ROUTER kernel: [21095.464036] <IRQ> [<ffffffff81259743>] panic+0xa0/0x161 > Apr 11 23:28:52 ROUTER kernel: [21095.464307] [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20 > Apr 11 23:28:52 ROUTER kernel: [21095.464533] [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c > Apr 11 23:28:52 ROUTER kernel: [21095.464757] [<ffffffff81006651>] oops_end+0xaa/0xba > Apr 11 23:28:52 ROUTER kernel: [21095.464980] [<ffffffff8101e653>] no_context+0x1f3/0x202 > Apr 11 23:28:52 ROUTER kernel: [21095.465211] [<ffffffff810523a4>] ? tick_program_event+0x25/0x27 > Apr 11 23:28:52 ROUTER kernel: [21095.465437] [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0 > Apr 11 23:28:52 ROUTER kernel: [21095.465668] [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5 > Apr 11 23:28:52 ROUTER kernel: [21095.465907] [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e] > Apr 11 23:28:52 ROUTER kernel: [21095.466139] [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25 > Apr 11 23:28:52 ROUTER kernel: [21095.466372] [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e] > Apr 11 23:28:52 ROUTER kernel: [21095.466605] [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e] > Apr 11 23:28:52 ROUTER kernel: [21095.466833] [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10 > Apr 11 23:28:52 ROUTER kernel: [21095.467064] [<ffffffff8101eb32>] do_page_fault+0x114/0x24a > Apr 11 23:28:52 ROUTER kernel: [21095.467293] [<ffffffff8125bc9f>] page_fault+0x1f/0x30 > Apr 11 23:28:52 ROUTER kernel: [21095.467516] [<ffffffff811e5877>] ? dev_queue_xmit+0x284/0x465 > Apr 11 23:28:52 ROUTER kernel: [21095.467743] [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q] > Apr 11 23:28:52 ROUTER kernel: [21095.468139] [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304 > Apr 11 23:28:52 ROUTER kernel: [21095.468366] [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b > Apr 11 23:28:52 ROUTER kernel: [21095.468591] [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc > Apr 11 23:28:52 ROUTER kernel: [21095.468814] [<ffffffff811e2c89>] net_tx_action+0xc2/0x120 > Apr 11 23:28:52 ROUTER kernel: [21095.469042] [<ffffffff81039670>] __do_softirq+0x96/0x11a > Apr 11 23:28:52 ROUTER kernel: [21095.469267] [<ffffffff810037cc>] call_softirq+0x1c/0x28 > Apr 11 23:28:52 ROUTER kernel: [21095.469491] [<ffffffff81005543>] do_softirq+0x33/0x68 > Apr 11 23:28:52 ROUTER kernel: [21095.469714] [<ffffffff81039407>] irq_exit+0x36/0x75 > Apr 11 23:28:52 ROUTER kernel: [21095.469936] [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96 > Apr 11 23:28:52 ROUTER kernel: [21095.470167] [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20 Hi Denys ! txq->qdisc is NULL at this point, I dont think its even possible, so my random guess is dev_pick_tx() got an queue_index > dev->real_num_tx_queues txq = dev_pick_tx(dev, skb); q = rcu_dereference(txq->qdisc); // q = NULL #ifdef CONFIG_NET_CLS_ACT skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS); #endif if (q->enqueue) { // crash in dereference rc = __dev_xmit_skb(skb, q, dev, txq); goto out; } I believe the following lines from dev_pick_tx() are not the problem : if (sk && sk->sk_dst_cache) sk_tx_queue_set(sk, queue_index); It is IMHO not safe, because route for this socket might have just changed and we are transmitting an old packet (queued some milli seconds before, when route was different). We then memorize a queue_index that might be too big for the new device of new selected route. Next packet we want to transmit will take the cached value of queue_index, correct for old device, maybe not correct for new device. You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe commit a4ee3ce3293dc931fab19beb472a8bde1295aebe Author: Krishna Kumar <krkumar2@in.ibm.com> Date: Mon Oct 19 23:50:07 2009 +0000 net: Use sk_tx_queue_mapping for connected sockets For connected sockets, the first run of dev_pick_tx saves the calculated txq in sk_tx_queue_mapping. This is not saved if either the device has a queue select or the socket is not connected. Next iterations of dev_pick_tx uses the cached value of sk_tx_queue_mapping. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/net/core/dev.c b/net/core/dev.c index 28b0b9e..fa88dcd 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash); static struct netdev_queue *dev_pick_tx(struct net_device *dev, struct sk_buff *skb) { - const struct net_device_ops *ops = dev->netdev_ops; - u16 queue_index = 0; + u16 queue_index; + struct sock *sk = skb->sk; + + if (sk_tx_queue_recorded(sk)) { + queue_index = sk_tx_queue_get(sk); + } else { + const struct net_device_ops *ops = dev->netdev_ops; - if (ops->ndo_select_queue) - queue_index = ops->ndo_select_queue(dev, skb); - else if (dev->real_num_tx_queues > 1) - queue_index = skb_tx_hash(dev, skb); + if (ops->ndo_select_queue) { + queue_index = ops->ndo_select_queue(dev, skb); + } else { + queue_index = 0; + if (dev->real_num_tx_queues > 1) + queue_index = skb_tx_hash(dev, skb); + + if (sk && sk->sk_dst_cache) + sk_tx_queue_set(sk, queue_index); + } + } skb_set_queue_mapping(skb, queue_index); return netdev_get_tx_queue(dev, queue_index); ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-11 22:35 ` Eric Dumazet @ 2010-04-11 23:04 ` Denys Fedorysychenko 2010-04-11 23:11 ` Eric Dumazet 2010-04-12 3:38 ` Krishna Kumar2 1 sibling, 1 reply; 37+ messages in thread From: Denys Fedorysychenko @ 2010-04-11 23:04 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Krishna Kumar, David Miller > It is IMHO not safe, because route for this socket might have just > changed and we are transmitting an old packet (queued some milli seconds > before, when route was different). > > We then memorize a queue_index that might be too big for the new device > of new selected route. > > Next packet we want to transmit will take the cached value of > queue_index, correct for old device, maybe not correct for new device. Yes, it is possible, i have there RIP with 1k+ routes, changing non-stop. > > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe I will try to revert it now. But to trigger bug probably i need 1-2 days. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-11 23:04 ` Denys Fedorysychenko @ 2010-04-11 23:11 ` Eric Dumazet 2010-04-11 23:36 ` Denys Fedorysychenko 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-11 23:11 UTC (permalink / raw) To: Denys Fedorysychenko; +Cc: netdev, Krishna Kumar, David Miller Le lundi 12 avril 2010 à 02:04 +0300, Denys Fedorysychenko a écrit : > > It is IMHO not safe, because route for this socket might have just > > changed and we are transmitting an old packet (queued some milli seconds > > before, when route was different). > > > > We then memorize a queue_index that might be too big for the new device > > of new selected route. > > > > Next packet we want to transmit will take the cached value of > > queue_index, correct for old device, maybe not correct for new device. > Yes, it is possible, i have there RIP with 1k+ routes, changing non-stop. > > > > > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > > > > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > I will try to revert it now. But to trigger bug probably i need 1-2 days. I can try to reproduce bug here, with a multiqueue device, a non multiqueue device, and changing routes while transmitting bulk of packets, and of course some trafic shaping to slow down xmits. Could you send : ifconfig -a lspci Thanks ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-11 23:11 ` Eric Dumazet @ 2010-04-11 23:36 ` Denys Fedorysychenko 0 siblings, 0 replies; 37+ messages in thread From: Denys Fedorysychenko @ 2010-04-11 23:36 UTC (permalink / raw) To: Eric Dumazet; +Cc: netdev, Krishna Kumar, David Miller On Monday 12 April 2010 02:11:23 Eric Dumazet wrote: > Le lundi 12 avril 2010 à 02:04 +0300, Denys Fedorysychenko a écrit : > > > It is IMHO not safe, because route for this socket might have just > > > changed and we are transmitting an old packet (queued some milli > > > seconds before, when route was different). > > > > > > We then memorize a queue_index that might be too big for the new device > > > of new selected route. > > > > > > Next packet we want to transmit will take the cached value of > > > queue_index, correct for old device, maybe not correct for new device. > > > > Yes, it is possible, i have there RIP with 1k+ routes, changing non-stop. > > > > > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > > > > > > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > > > > I will try to revert it now. But to trigger bug probably i need 1-2 days. > > I can try to reproduce bug here, with a multiqueue device, a non > multiqueue device, and changing routes while transmitting bulk of > packets, and of course some trafic shaping to slow down xmits. > > Could you send : > > ifconfig -a > lspci A bit more even, hopefully it is useful. Additional note - i have few blackhole routes, so if specific route announce will disappear - it will be less specific blackhole route. I have also some source routing (ip rule). lspci 00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub (rev b1) 00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 2 (rev b1) 00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 3 (rev b1) 00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 4-5 (rev b1) 00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 5 (rev b1) 00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 6-7 (rev b1) 00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 7 (rev b1) 00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1) 00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1) 00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev b1) 00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1) 00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers (rev b1) 00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1) 00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev b1) 00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express Root Port 1 (rev 09) 00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #1 (rev 09) 00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #2 (rev 09) 00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #3 (rev 09) 00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI USB Controller #4 (rev 09) 00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI USB2 Controller (rev 09) 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9) 00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC Interface Controller (rev 09) 00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 09) 00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI Controller (rev 09) 00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller (rev 09) 01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream Port (rev 01) 01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X Bridge (rev 01) 02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E1 (rev 01) 02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream Port E3 (rev 01) 04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01) 04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01) 07:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09) 0b:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) 0b:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06) 0c:05.0 VGA compatible controller: ASPEED Technology, Inc. AST2000 eth0 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:194.146.153.17 Bcast:0.0.0.0 Mask:255.255.255.224 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1047262147 errors:0 dropped:0 overruns:0 frame:0 TX packets:1017399476 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:698944316059 (650.9 GiB) TX bytes:694648889407 (646.9 GiB) Memory:fc6e0000-fc700000 eth0.165 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:172.30.1.2 Bcast:0.0.0.0 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:5375126 errors:0 dropped:0 overruns:0 frame:0 TX packets:3446299 errors:0 dropped:2 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:1910233297 (1.7 GiB) TX bytes:814740452 (776.9 MiB) eth0.32 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:10.22.22.2 Bcast:0.0.0.0 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:96167917 errors:0 dropped:0 overruns:0 frame:0 TX packets:166845841 errors:0 dropped:2 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:14589314556 (13.5 GiB) TX bytes:126771970521 (118.0 GiB) eth0.33 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:10.152.152.2 Bcast:0.0.0.0 Mask:255.255.255.248 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:10171808 errors:0 dropped:0 overruns:0 frame:0 TX packets:130698572 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:2690522850 (2.5 GiB) TX bytes:25223310290 (23.4 GiB) eth0.34 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:10.25.25.1 Bcast:0.0.0.0 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:133415723 errors:0 dropped:0 overruns:0 frame:0 TX packets:1352 errors:0 dropped:1 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:140007983122 (130.3 GiB) TX bytes:117870 (115.1 KiB) eth0.35 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) eth0.36 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1474 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:125741 (122.7 KiB) TX bytes:0 (0.0 B) eth0.38 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:10.0.4.1 Bcast:0.0.0.0 Mask:255.255.255.248 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:674 errors:0 dropped:0 overruns:0 frame:0 TX packets:188760217 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:31004 (30.2 KiB) TX bytes:212060676669 (197.4 GiB) eth0.39 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:10.22.24.1 Bcast:0.0.0.0 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:270892875 errors:0 dropped:0 overruns:0 frame:0 TX packets:309713487 errors:0 dropped:3 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:77224626928 (71.9 GiB) TX bytes:215883462565 (201.0 GiB) eth0.40 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:62.84.94.13 Bcast:0.0.0.0 Mask:255.255.255.248 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) eth0.41 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CC inet addr:2.2.3.1 Bcast:0.0.0.0 Mask:255.255.255.128 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) eth1 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CD UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1 RX packets:949912 errors:0 dropped:0 overruns:0 frame:0 TX packets:192 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:429565276 (409.6 MiB) TX bytes:8064 (7.8 KiB) Memory:fc6c0000-fc6e0000 eth1.2 Link encap:Ethernet HWaddr 00:1C:C0:19:23:80 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:5462 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:273100 (266.6 KiB) TX bytes:0 (0.0 B) eth1.3 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CD UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:5462 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:273100 (266.6 KiB) TX bytes:0 (0.0 B) eth1.4 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CD inet addr:10.154.154.2 Bcast:0.0.0.0 Mask:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:5461 errors:0 dropped:0 overruns:0 frame:0 TX packets:192 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:273050 (266.6 KiB) TX bytes:8064 (7.8 KiB) eth2 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CE BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) Memory:fcde0000-fce00000 eth3 Link encap:Ethernet HWaddr 00:1E:68:57:4B:CF BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) Memory:fcd80000-fcda0000 ifb0 Link encap:Ethernet HWaddr 6A:73:52:ED:4A:D6 UP BROADCAST RUNNING NOARP MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:32 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) ifb1 Link encap:Ethernet HWaddr C2:97:7A:32:8B:59 UP BROADCAST RUNNING NOARP MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:32 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) ifb2 Link encap:Ethernet HWaddr 96:AC:DD:0D:68:B0 UP BROADCAST RUNNING NOARP MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:32 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) ifb3 Link encap:Ethernet HWaddr FA:D9:59:C6:16:E7 UP BROADCAST RUNNING NOARP MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:32 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:3192 errors:0 dropped:0 overruns:0 frame:0 TX packets:3192 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:222919 (217.6 KiB) TX bytes:222919 (217.6 KiB) tunx0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 inet addr:1.1.1.2 P-t-P:1.1.1.2 Mask:255.255.255.255 UP POINTOPOINT RUNNING MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:12531732 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1024 RX bytes:0 (0.0 B) TX bytes:1888404300 (1.7 GiB) Router-Dora ~ # tc -d qdisc qdisc hfsc 1: dev eth0 root refcnt 2 qdisc bfifo 101: dev eth0 parent 1:101 limit 12500000b qdisc bfifo 111: dev eth0 parent 1:111 limit 3125Kb qdisc bfifo 112: dev eth0 parent 1:112 limit 3125Kb qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc hfsc 1: dev eth0.33 root refcnt 2 linklayer ethernet overhead 4 mpu 64 mtu 1500 tsize 1436 qdisc bfifo 110: dev eth0.33 parent 1:110 limit 3000000b qdisc bfifo 120: dev eth0.33 parent 1:120 limit 100000b qdisc bfifo 130: dev eth0.33 parent 1:130 limit 100000b qdisc bfifo 140: dev eth0.33 parent 1:140 limit 100000b qdisc bfifo 150: dev eth0.33 parent 1:150 limit 1000000b qdisc bfifo 160: dev eth0.33 parent 1:160 limit 100000b qdisc bfifo 170: dev eth0.33 parent 1:170 limit 100000b qdisc bfifo 180: dev eth0.33 parent 1:180 limit 100000b qdisc bfifo 190: dev eth0.33 parent 1:190 limit 100000b qdisc hfsc 1: dev eth0.38 root refcnt 2 linklayer ethernet overhead 4 mpu 64 mtu 2047 tsize 512 qdisc bfifo 3: dev eth0.38 parent 1:3 limit 5000000b qdisc bfifo 4: dev eth0.38 parent 1:4 limit 5000000b qdisc hfsc 1: dev eth0.39 root refcnt 2 linklayer ethernet overhead 4 mpu 64 mtu 2047 tsize 512 qdisc bfifo 3: dev eth0.39 parent 1:3 limit 3000000b qdisc bfifo 20: dev eth0.39 parent 1:20 limit 20000000b qdisc bfifo 31: dev eth0.39 parent 1:31 limit 1000000b qdisc bfifo 30: dev eth0.39 parent 1:30 limit 5000000b qdisc ingress ffff: dev eth1.3 parent ffff:fff1 ---------------- qdisc pfifo_fast 0: dev ifb0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: dev ifb1 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: dev ifb2 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: dev ifb3 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: dev tunx0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Router-Dora ~ # ip link 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 1000 link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff 4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000 link/ether 00:1e:68:57:4b:ce brd ff:ff:ff:ff:ff:ff 5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000 link/ether 00:1e:68:57:4b:cf brd ff:ff:ff:ff:ff:ff 6: eth0.32@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 7: eth0.33@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 8: eth0.34@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 9: eth0.35@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 10: eth0.36@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 11: eth0.38@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 12: eth0.39@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 13: eth0.40@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 14: eth0.41@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 15: eth0.165@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff 16: eth1.2@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 00:1c:c0:19:23:80 brd ff:ff:ff:ff:ff:ff 17: eth1.3@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff 18: eth1.4@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff 19: ifb0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 32 link/ether 6a:73:52:ed:4a:d6 brd ff:ff:ff:ff:ff:ff 20: ifb1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 32 link/ether c2:97:7a:32:8b:59 brd ff:ff:ff:ff:ff:ff 21: ifb2: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 32 link/ether 96:ac:dd:0d:68:b0 brd ff:ff:ff:ff:ff:ff 22: ifb3: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 32 link/ether fa:d9:59:c6:16:e7 brd ff:ff:ff:ff:ff:ff 23: tunx0: <POINTOPOINT,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1024 link/[65534] > > Thanks > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-11 22:35 ` Eric Dumazet 2010-04-11 23:04 ` Denys Fedorysychenko @ 2010-04-12 3:38 ` Krishna Kumar2 2010-04-12 6:01 ` Eric Dumazet 1 sibling, 1 reply; 37+ messages in thread From: Krishna Kumar2 @ 2010-04-12 3:38 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, Denys Fedorysychenko Hi Eric, Eric Dumazet <eric.dumazet@gmail.com> wrote on 04/12/2010 04:05:53 AM: > I believe the following lines from dev_pick_tx() are not the problem : > > if (sk && sk->sk_dst_cache) > sk_tx_queue_set(sk, queue_index); > > It is IMHO not safe, because route for this socket might have just > changed and we are transmitting an old packet (queued some milli seconds > before, when route was different). > > We then memorize a queue_index that might be too big for the new device > of new selected route. > > Next packet we want to transmit will take the cached value of > queue_index, correct for old device, maybe not correct for new device. When route changes, I think my patch had reset sk->sk_tx_queue_mapping by calling sk_tx_queue_clear. I don't know if I missed any path where the route changes and sk_dst_reset() was not called. The following might be better to prove the panic is due to this, since your suggestion will hide a panic that happens somewhat rare (according to Denys): if (sk_tx_queue_recorded(sk)) { queue_index = sk_tx_queue_get(sk); + queue_index = dev_cap_txqueue(dev, queue_index); } else { Thanks, - KK > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > Author: Krishna Kumar <krkumar2@in.ibm.com> > Date: Mon Oct 19 23:50:07 2009 +0000 > > net: Use sk_tx_queue_mapping for connected sockets > > For connected sockets, the first run of dev_pick_tx saves the > calculated txq in sk_tx_queue_mapping. This is not saved if > either the device has a queue select or the socket is not > connected. Next iterations of dev_pick_tx uses the cached value > of sk_tx_queue_mapping. > > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > > diff --git a/net/core/dev.c b/net/core/dev.c > index 28b0b9e..fa88dcd 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash); > static struct netdev_queue *dev_pick_tx(struct net_device *dev, > struct sk_buff *skb) > { > - const struct net_device_ops *ops = dev->netdev_ops; > - u16 queue_index = 0; > + u16 queue_index; > + struct sock *sk = skb->sk; > + > + if (sk_tx_queue_recorded(sk)) { > + queue_index = sk_tx_queue_get(sk); > + } else { > + const struct net_device_ops *ops = dev->netdev_ops; > > - if (ops->ndo_select_queue) > - queue_index = ops->ndo_select_queue(dev, skb); > - else if (dev->real_num_tx_queues > 1) > - queue_index = skb_tx_hash(dev, skb); > + if (ops->ndo_select_queue) { > + queue_index = ops->ndo_select_queue(dev, skb); > + } else { > + queue_index = 0; > + if (dev->real_num_tx_queues > 1) > + queue_index = skb_tx_hash(dev, skb); > + > + if (sk && sk->sk_dst_cache) > + sk_tx_queue_set(sk, queue_index); > + } > + } > > skb_set_queue_mapping(skb, queue_index); > return netdev_get_tx_queue(dev, queue_index); > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-12 3:38 ` Krishna Kumar2 @ 2010-04-12 6:01 ` Eric Dumazet 2010-04-12 7:18 ` Eric Dumazet 2010-04-12 7:54 ` NULL pointer dereference panic in stable (2.6.33.2), amd64 Krishna Kumar2 0 siblings, 2 replies; 37+ messages in thread From: Eric Dumazet @ 2010-04-12 6:01 UTC (permalink / raw) To: Krishna Kumar2; +Cc: David Miller, netdev, Denys Fedorysychenko Le lundi 12 avril 2010 à 09:08 +0530, Krishna Kumar2 a écrit : > Hi Eric, > > Eric Dumazet <eric.dumazet@gmail.com> wrote on 04/12/2010 04:05:53 AM: > > > I believe the following lines from dev_pick_tx() are not the problem : > > > > if (sk && sk->sk_dst_cache) > > sk_tx_queue_set(sk, queue_index); > > > > It is IMHO not safe, because route for this socket might have just > > changed and we are transmitting an old packet (queued some milli seconds > > before, when route was different). > > > > We then memorize a queue_index that might be too big for the new device > > of new selected route. > > > > Next packet we want to transmit will take the cached value of > > queue_index, correct for old device, maybe not correct for new device. > > When route changes, I think my patch had reset sk->sk_tx_queue_mapping > by calling sk_tx_queue_clear. I don't know if I missed any path where > the route changes and sk_dst_reset() was not called. > Problem is when you reset sk->sk_tx_queue_mapping at the very moment route (or destination) changes, we might have old packets queued in tx queues, of the old ethernet device (eth0 : multi queue compatable) 2) Application does a sendmsg() or connect() call and sk->sk_dst_cache is rebuild, it points to a dst_entry referring a new device (eth1 : non multiqueue) 3) When one old packet finally is transmitted, we do : queue_index = 1; // any value > 0 if (sk && sk->sk_dst_cache) sk_tx_queue_set(sk, queue_index); // remember a >0 value 4) application does a sendmsg(), enqueues a new skb on eth1 5) We re-enter dev_pick_tx(), and consider cached value in 3) is valid. we pick a non existent txq for eth1 device. 6) We crash. > The following might be better to prove the panic is due to this, since > your suggestion will hide a panic that happens somewhat rare (according > to Denys): > > if (sk_tx_queue_recorded(sk)) { > queue_index = sk_tx_queue_get(sk); > + queue_index = dev_cap_txqueue(dev, queue_index); > } else { > Sure, but I thought I was clear enough to prove this commit was wrong, and we have to find a fix. > Thanks, > > - KK > > > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > > > > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe > > Author: Krishna Kumar <krkumar2@in.ibm.com> > > Date: Mon Oct 19 23:50:07 2009 +0000 > > > > net: Use sk_tx_queue_mapping for connected sockets > > > > For connected sockets, the first run of dev_pick_tx saves the > > calculated txq in sk_tx_queue_mapping. This is not saved if > > either the device has a queue select or the socket is not > > connected. Next iterations of dev_pick_tx uses the cached value > > of sk_tx_queue_mapping. > > > > Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> > > Signed-off-by: David S. Miller <davem@davemloft.net> > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 28b0b9e..fa88dcd 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash); > > static struct netdev_queue *dev_pick_tx(struct net_device *dev, > > struct sk_buff *skb) > > { > > - const struct net_device_ops *ops = dev->netdev_ops; > > - u16 queue_index = 0; > > + u16 queue_index; > > + struct sock *sk = skb->sk; > > + > > + if (sk_tx_queue_recorded(sk)) { > > + queue_index = sk_tx_queue_get(sk); > > + } else { > > + const struct net_device_ops *ops = dev->netdev_ops; > > > > - if (ops->ndo_select_queue) > > - queue_index = ops->ndo_select_queue(dev, skb); > > - else if (dev->real_num_tx_queues > 1) > > - queue_index = skb_tx_hash(dev, skb); > > + if (ops->ndo_select_queue) { > > + queue_index = ops->ndo_select_queue(dev, skb); > > + } else { > > + queue_index = 0; > > + if (dev->real_num_tx_queues > 1) > > + queue_index = skb_tx_hash(dev, skb); > > + > > + if (sk && sk->sk_dst_cache) > > + sk_tx_queue_set(sk, queue_index); > > + } > > + } > > > > skb_set_queue_mapping(skb, queue_index); > > return netdev_get_tx_queue(dev, queue_index); > > > > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-12 6:01 ` Eric Dumazet @ 2010-04-12 7:18 ` Eric Dumazet 2010-04-12 7:36 ` David Miller 2010-04-15 6:52 ` David Miller 2010-04-12 7:54 ` NULL pointer dereference panic in stable (2.6.33.2), amd64 Krishna Kumar2 1 sibling, 2 replies; 37+ messages in thread From: Eric Dumazet @ 2010-04-12 7:18 UTC (permalink / raw) To: Krishna Kumar2; +Cc: David Miller, netdev, Denys Fedorysychenko Le lundi 12 avril 2010 à 08:01 +0200, Eric Dumazet a écrit : > Problem is when you reset sk->sk_tx_queue_mapping at the very moment > route (or destination) changes, we might have old packets queued in tx > queues, of the old ethernet device (eth0 : multi queue compatable) > > 2) Application does a sendmsg() or connect() call and sk->sk_dst_cache > is rebuild, it points to a dst_entry referring a new device (eth1 : non > multiqueue) > > 3) When one old packet finally is transmitted, we do : > > queue_index = 1; // any value > 0 > > if (sk && sk->sk_dst_cache) > sk_tx_queue_set(sk, queue_index); // remember a >0 value > > 4) application does a sendmsg(), enqueues a new skb on eth1 > > 5) We re-enter dev_pick_tx(), and consider cached value in 3) is valid. > we pick a non existent txq for eth1 device. > > 6) We crash. > > > > The following might be better to prove the panic is due to this, since > > your suggestion will hide a panic that happens somewhat rare (according > > to Denys): > > > > if (sk_tx_queue_recorded(sk)) { > > queue_index = sk_tx_queue_get(sk); > > + queue_index = dev_cap_txqueue(dev, queue_index); > > } else { > > > > Sure, but I thought I was clear enough to prove this commit was wrong, > and we have to find a fix. Here the patch I cooked, this is a stable candidate, once tested and acknowledged. [PATCH] net: dev_pick_tx() fix When dev_pick_tx() caches tx queue_index on a socket, we must check socket dst_entry matches skb one, or risk a crash later, as reported by Denys Fedorysychenko, if old packets are in flight during a route change, involving devices with different number of queues. Bug introduced by commit a4ee3ce3 (net: Use sk_tx_queue_mapping for connected sockets) Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- diff --git a/net/core/dev.c b/net/core/dev.c index 1c8a0ce..92584bf 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1989,8 +1989,12 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev, if (dev->real_num_tx_queues > 1) queue_index = skb_tx_hash(dev, skb); - if (sk && sk->sk_dst_cache) - sk_tx_queue_set(sk, queue_index); + if (sk) { + struct dst_entry *dst = rcu_dereference(sk->sk_dst_cache); + + if (dst && skb_dst(skb) == dst) + sk_tx_queue_set(sk, queue_index); + } } } ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-12 7:18 ` Eric Dumazet @ 2010-04-12 7:36 ` David Miller 2010-04-15 6:52 ` David Miller 1 sibling, 0 replies; 37+ messages in thread From: David Miller @ 2010-04-12 7:36 UTC (permalink / raw) To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 12 Apr 2010 09:18:17 +0200 > Here the patch I cooked, this is a stable candidate, once tested and > acknowledged. > > [PATCH] net: dev_pick_tx() fix > > When dev_pick_tx() caches tx queue_index on a socket, we must check > socket dst_entry matches skb one, or risk a crash later, as reported by > Denys Fedorysychenko, if old packets are in flight during a route > change, involving devices with different number of queues. > > Bug introduced by commit a4ee3ce3 > (net: Use sk_tx_queue_mapping for connected sockets) > > Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Looks good Eric. I'll integrate this into net-2.6 and also queue it up for -stable in a day or two unless some problem is discovered. Thanks! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-12 7:18 ` Eric Dumazet 2010-04-12 7:36 ` David Miller @ 2010-04-15 6:52 ` David Miller 2010-04-15 8:02 ` Eric Dumazet 1 sibling, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-15 6:52 UTC (permalink / raw) To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 12 Apr 2010 09:18:17 +0200 > [PATCH] net: dev_pick_tx() fix > > When dev_pick_tx() caches tx queue_index on a socket, we must check > socket dst_entry matches skb one, or risk a crash later, as reported by > Denys Fedorysychenko, if old packets are in flight during a route > change, involving devices with different number of queues. > > Bug introduced by commit a4ee3ce3 > (net: Use sk_tx_queue_mapping for connected sockets) > > Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> It looks like Denys is still getting crashes even with this patch applied. And I also think there is some meric to some of Krishna's analysis. To me it seems to make more sense to validate the SKB's queue against the real actual choosen device's range. The socket queue index will catch up and eventually become valid because the dst reset will invalidate the queue setting, and we'll thus recompute it as needed, as Krishna stated. So I'm tossing this patch for now since it doesn't even aparently fix the bug. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 6:52 ` David Miller @ 2010-04-15 8:02 ` Eric Dumazet 2010-04-15 8:26 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-15 8:02 UTC (permalink / raw) To: David Miller; +Cc: krkumar2, netdev, nuclearcat Le mercredi 14 avril 2010 à 23:52 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon, 12 Apr 2010 09:18:17 +0200 > > > [PATCH] net: dev_pick_tx() fix > > > > When dev_pick_tx() caches tx queue_index on a socket, we must check > > socket dst_entry matches skb one, or risk a crash later, as reported by > > Denys Fedorysychenko, if old packets are in flight during a route > > change, involving devices with different number of queues. > > > > Bug introduced by commit a4ee3ce3 > > (net: Use sk_tx_queue_mapping for connected sockets) > > > > Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > It looks like Denys is still getting crashes even with this patch > applied. And I also think there is some meric to some of Krishna's > analysis. > > To me it seems to make more sense to validate the SKB's queue against > the real actual choosen device's range. > > The socket queue index will catch up and eventually become valid > because the dst reset will invalidate the queue setting, and we'll > thus recompute it as needed, as Krishna stated. > ??? > So I'm tossing this patch for now since it doesn't even aparently > fix the bug. I am a bit lost here David. Denys got a crash that we cannot explain yet. He said he has no multiqueue devices, so obviously my patch cant help him. But this patch was fixing a real issue, I believe I pointed it twice already... I'll try to setup an environment to trigger this bug for real, but this will take time, my dev machines are not multiqueue. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 8:02 ` Eric Dumazet @ 2010-04-15 8:26 ` David Miller 2010-04-15 8:51 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-15 8:26 UTC (permalink / raw) To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 15 Apr 2010 10:02:33 +0200 > Denys got a crash that we cannot explain yet. He said he has no > multiqueue devices, so obviously my patch cant help him. > > But this patch was fixing a real issue, I believe I pointed it twice > already... Ok, I thought your patch was specifically meant to fix Denys's bug. The confusion comes from the fact that you mention Denys's crash in your commit message: -------------------- When dev_pick_tx() caches tx queue_index on a socket, we must check socket dst_entry matches skb one, or risk a crash later, as reported by Denys Fedorysychenko, if old packets are in flight during a route change, involving devices with different number of queues. -------------------- Anyways, I studied your patch once more and read the thread discussion with Krishna again and your patch looks fine. I'll apply it to net-2.6, thanks! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 8:26 ` David Miller @ 2010-04-15 8:51 ` Eric Dumazet 2010-04-15 9:06 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-15 8:51 UTC (permalink / raw) To: David Miller; +Cc: krkumar2, netdev, nuclearcat Le jeudi 15 avril 2010 à 01:26 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 15 Apr 2010 10:02:33 +0200 > > > Denys got a crash that we cannot explain yet. He said he has no > > multiqueue devices, so obviously my patch cant help him. > > > > But this patch was fixing a real issue, I believe I pointed it twice > > already... > > Ok, I thought your patch was specifically meant to fix Denys's bug. > > The confusion comes from the fact that you mention Denys's crash in > your commit message: > > -------------------- > When dev_pick_tx() caches tx queue_index on a socket, we must check > socket dst_entry matches skb one, or risk a crash later, as reported by > Denys Fedorysychenko, if old packets are in flight during a route > change, involving devices with different number of queues. > -------------------- > > Anyways, I studied your patch once more and read the thread discussion > with Krishna again and your patch looks fine. I'll apply it to > net-2.6, thanks! > In any case, I think there is a fundamental problem with this sk caching. Because one packet can travel in many stacked devices before hitting the wire. (bonding, vlan, ethernet) for example. Socket cache is meaningfull for one level only... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 8:51 ` Eric Dumazet @ 2010-04-15 9:06 ` David Miller 2010-04-15 9:11 ` Denys Fedorysychenko 2010-04-15 20:30 ` Eric Dumazet 0 siblings, 2 replies; 37+ messages in thread From: David Miller @ 2010-04-15 9:06 UTC (permalink / raw) To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 15 Apr 2010 10:51:47 +0200 > In any case, I think there is a fundamental problem with this sk > caching. Because one packet can travel in many stacked devices before > hitting the wire. > > (bonding, vlan, ethernet) for example. > > Socket cache is meaningfull for one level only... We were talking the other day about that 'tun' change to orphan the SKB on TX, and I mentioned the possibility of just doing this in some generic location before we give the packet to the device ->xmit() method. Such a scheme could help with this problem too. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 9:06 ` David Miller @ 2010-04-15 9:11 ` Denys Fedorysychenko 2010-04-15 10:37 ` Eric Dumazet 2010-04-15 20:30 ` Eric Dumazet 1 sibling, 1 reply; 37+ messages in thread From: Denys Fedorysychenko @ 2010-04-15 9:11 UTC (permalink / raw) To: David Miller; +Cc: eric.dumazet, krkumar2, netdev Btw i have application using tun. On Thursday 15 April 2010 12:06:19 David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 15 Apr 2010 10:51:47 +0200 > > > In any case, I think there is a fundamental problem with this sk > > caching. Because one packet can travel in many stacked devices before > > hitting the wire. > > > > (bonding, vlan, ethernet) for example. > > > > Socket cache is meaningfull for one level only... > > We were talking the other day about that 'tun' change to orphan the > SKB on TX, and I mentioned the possibility of just doing this in some > generic location before we give the packet to the device ->xmit() > method. > > Such a scheme could help with this problem too. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 9:11 ` Denys Fedorysychenko @ 2010-04-15 10:37 ` Eric Dumazet 2010-04-29 10:50 ` Denys Fedorysychenko 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-15 10:37 UTC (permalink / raw) To: Denys Fedorysychenko; +Cc: David Miller, krkumar2, netdev Le jeudi 15 avril 2010 à 12:11 +0300, Denys Fedorysychenko a écrit : > Btw i have application using tun. Could you add following sanity test to catch the error ? diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index fa8b476..b67274a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -988,6 +988,7 @@ static inline struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev, unsigned int index) { + WARN_ON(index >= dev->num_tx_queues); return &dev->_tx[index]; } ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 10:37 ` Eric Dumazet @ 2010-04-29 10:50 ` Denys Fedorysychenko 0 siblings, 0 replies; 37+ messages in thread From: Denys Fedorysychenko @ 2010-04-29 10:50 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, krkumar2, netdev On Thursday 15 April 2010 13:37:10 Eric Dumazet wrote: > Le jeudi 15 avril 2010 à 12:11 +0300, Denys Fedorysychenko a écrit : > > Btw i have application using tun. > > Could you add following sanity test to catch the error ? > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index fa8b476..b67274a 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -988,6 +988,7 @@ static inline > struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev, > unsigned int index) > { > + WARN_ON(index >= dev->num_tx_queues); > return &dev->_tx[index]; > } > Very sorry for being late, just i found way to stabilize kernel for me and to solve my personal life issues. It took 2 weeks... I will try this patch, but i'm sure i dont have multiqueue card there. I had shaper on eth0.33 and eth0 so i disable all offloading except checksumming there. Recently i found that on some interfaces (where is no shaper) gso left on. And yes, probably some traffic was queued, then route changed, and maybe it went from gso off interface to gso on... It can be just pure luck, that i dont have crashes anymore, but maybe trick is in gso... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 9:06 ` David Miller 2010-04-15 9:11 ` Denys Fedorysychenko @ 2010-04-15 20:30 ` Eric Dumazet 2010-04-15 20:46 ` Eric Dumazet 1 sibling, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-15 20:30 UTC (permalink / raw) To: David Miller; +Cc: krkumar2, netdev, nuclearcat Le jeudi 15 avril 2010 à 02:06 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 15 Apr 2010 10:51:47 +0200 > > > In any case, I think there is a fundamental problem with this sk > > caching. Because one packet can travel in many stacked devices before > > hitting the wire. > > > > (bonding, vlan, ethernet) for example. > > > > Socket cache is meaningfull for one level only... > > We were talking the other day about that 'tun' change to orphan the > SKB on TX, and I mentioned the possibility of just doing this in some > generic location before we give the packet to the device ->xmit() > method. > > Such a scheme could help with this problem too. Same thing we did with if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(skb); in dev_hard_start_xmit() ? Problem is this skb_tstamp_tx() thing.... One possibility would be to change skb_orphan() to let everything done by destructor. One more argument would let destructor() know if this is the final destructor() called from skb_release_head_state() destructor() would be responsible to set skb->destructor and/or skb->sk to NULL when possible. All normal destructors would not care of this 2nd argument and just do what they actually do, plus setting skb->sk = NULL, skb->destructor = NULL Fast path would stay as today, no extra test. Only tstamp users would need to setup another destructor, a bit more complex (it would have to take a look at 2nd argument before really doing the job) Completely untested patch to get the idea : (to be completed for the tstamp thing) include/linux/skbuff.h | 8 +++----- include/net/sctp/sctp.h | 2 +- include/net/sock.h | 4 ++-- net/caif/caif_socket.c | 4 +++- net/core/dev.c | 26 +++++++++----------------- net/core/skbuff.c | 2 +- net/core/sock.c | 8 ++++++-- net/l2tp/l2tp_core.c | 4 +++- net/netfilter/nf_tproxy_core.c | 2 +- net/packet/af_packet.c | 4 ++-- net/unix/af_unix.c | 4 ++-- 11 files changed, 33 insertions(+), 35 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 38501d2..7dfd833 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -353,7 +353,7 @@ struct sk_buff { kmemcheck_bitfield_end(flags1); __be16 protocol; - void (*destructor)(struct sk_buff *skb); + void (*destructor)(struct sk_buff *skb, int final); #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) struct nf_conntrack *nfct; struct sk_buff *nfct_reasm; @@ -1407,15 +1407,13 @@ static inline void pskb_trim_unique(struct sk_buff *skb, unsigned int len) * @skb: buffer to orphan * * If a buffer currently has an owner then we call the owner's - * destructor function and make the @skb unowned. The buffer continues + * destructor function. The buffer continues * to exist but is no longer charged to its former owner. */ static inline void skb_orphan(struct sk_buff *skb) { if (skb->destructor) - skb->destructor(skb); - skb->destructor = NULL; - skb->sk = NULL; + skb->destructor(skb, 0); } /** diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h index 5915155..49e2162 100644 --- a/include/net/sctp/sctp.h +++ b/include/net/sctp/sctp.h @@ -130,7 +130,7 @@ int sctp_inet_listen(struct socket *sock, int backlog); void sctp_write_space(struct sock *sk); unsigned int sctp_poll(struct file *file, struct socket *sock, poll_table *wait); -void sctp_sock_rfree(struct sk_buff *skb); +void sctp_sock_rfree(struct sk_buff *skb, int final); void sctp_copy_sock(struct sock *newsk, struct sock *sk, struct sctp_association *asoc); extern struct percpu_counter sctp_sockets_allocated; diff --git a/include/net/sock.h b/include/net/sock.h index 56df440..a042c9d 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -988,8 +988,8 @@ extern struct sk_buff *sock_wmalloc(struct sock *sk, extern struct sk_buff *sock_rmalloc(struct sock *sk, unsigned long size, int force, gfp_t priority); -extern void sock_wfree(struct sk_buff *skb); -extern void sock_rfree(struct sk_buff *skb); +extern void sock_wfree(struct sk_buff *skb, int final); +extern void sock_rfree(struct sk_buff *skb, int final); extern int sock_setsockopt(struct socket *sock, int level, int op, char __user *optval, diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c index cdf62b9..4e7276a 100644 --- a/net/caif/caif_socket.c +++ b/net/caif/caif_socket.c @@ -256,10 +256,12 @@ static void caif_sktflowctrl_cb(struct cflayer *layr, } } -static void skb_destructor(struct sk_buff *skb) +static void skb_destructor(struct sk_buff *skb, int final) { dbfs_atomic_inc(&cnt.skb_free); dbfs_atomic_dec(&cnt.skb_in_use); + skb->sk = NULL; + skb->destructor = NULL; } diff --git a/net/core/dev.c b/net/core/dev.c index 876b111..9bffbe5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1828,12 +1828,12 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb) } struct dev_gso_cb { - void (*destructor)(struct sk_buff *skb); + void (*destructor)(struct sk_buff *skb, int final); }; #define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb) -static void dev_gso_skb_destructor(struct sk_buff *skb) +static void dev_gso_skb_destructor(struct sk_buff *skb, int final) { struct dev_gso_cb *cb; @@ -1847,7 +1847,9 @@ static void dev_gso_skb_destructor(struct sk_buff *skb) cb = DEV_GSO_CB(skb); if (cb->destructor) - cb->destructor(skb); + cb->destructor(skb, final); + skb->sk = NULL; + skb->destructor = NULL; } /** @@ -1904,23 +1906,11 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(skb); + skb_orphan(skb); + rc = ops->ndo_start_xmit(skb, dev); if (rc == NETDEV_TX_OK) txq_trans_update(txq); - /* - * TODO: if skb_orphan() was called by - * dev->hard_start_xmit() (for example, the unmodified - * igb driver does that; bnx2 doesn't), then - * skb_tx_software_timestamp() will be unable to send - * back the time stamp. - * - * How can this be prevented? Always create another - * reference to the socket before calling - * dev->hard_start_xmit()? Prevent that skb_orphan() - * does anything in dev->hard_start_xmit() by clearing - * the skb destructor before the call and restoring it - * afterwards, then doing the skb_orphan() ourselves? - */ return rc; } @@ -1938,6 +1928,8 @@ gso: if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(nskb); + skb_orphan(nskb); + rc = ops->ndo_start_xmit(nskb, dev); if (unlikely(rc != NETDEV_TX_OK)) { if (rc & ~NETDEV_TX_MASK) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index bdea0ef..90c171f 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -396,7 +396,7 @@ static void skb_release_head_state(struct sk_buff *skb) #endif if (skb->destructor) { WARN_ON(in_irq()); - skb->destructor(skb); + skb->destructor(skb, 1); } #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE) nf_conntrack_put(skb->nfct); diff --git a/net/core/sock.c b/net/core/sock.c index 7effa1e..fcf67ea 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1259,7 +1259,7 @@ void __init sk_init(void) /* * Write buffer destructor automatically called from kfree_skb. */ -void sock_wfree(struct sk_buff *skb) +void sock_wfree(struct sk_buff *skb, int final) { struct sock *sk = skb->sk; unsigned int len = skb->truesize; @@ -1279,18 +1279,22 @@ void sock_wfree(struct sk_buff *skb) */ if (atomic_sub_and_test(len, &sk->sk_wmem_alloc)) __sk_free(sk); + skb->sk = NULL; + skb->destructor = NULL; } EXPORT_SYMBOL(sock_wfree); /* * Read buffer destructor automatically called from kfree_skb. */ -void sock_rfree(struct sk_buff *skb) +void sock_rfree(struct sk_buff *skb, int final) { struct sock *sk = skb->sk; atomic_sub(skb->truesize, &sk->sk_rmem_alloc); sk_mem_uncharge(skb->sk, skb->truesize); + skb->sk = NULL; + skb->destructor = NULL; } EXPORT_SYMBOL(sock_rfree); diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 98dfcce..a3b0a95 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -973,9 +973,11 @@ EXPORT_SYMBOL_GPL(l2tp_xmit_core); /* Automatically called when the skb is freed. */ -static void l2tp_sock_wfree(struct sk_buff *skb) +static void l2tp_sock_wfree(struct sk_buff *skb, int final) { sock_put(skb->sk); + skb->sk = NULL; + skb->destructor = NULL; } /* For data skbs that we transmit, we associate with the tunnel socket diff --git a/net/netfilter/nf_tproxy_core.c b/net/netfilter/nf_tproxy_core.c index 5490fc3..17dd2d9 100644 --- a/net/netfilter/nf_tproxy_core.c +++ b/net/netfilter/nf_tproxy_core.c @@ -55,7 +55,7 @@ nf_tproxy_get_sock_v4(struct net *net, const u8 protocol, EXPORT_SYMBOL_GPL(nf_tproxy_get_sock_v4); static void -nf_tproxy_destructor(struct sk_buff *skb) +nf_tproxy_destructor(struct sk_buff *skb, int final) { struct sock *sk = skb->sk; diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index f162d59..dc8e843 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -808,7 +808,7 @@ ring_is_full: goto drop_n_restore; } -static void tpacket_destruct_skb(struct sk_buff *skb) +static void tpacket_destruct_skb(struct sk_buff *skb, int final) { struct packet_sock *po = pkt_sk(skb->sk); void *ph; @@ -823,7 +823,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb) __packet_set_status(po, ph, TP_STATUS_AVAILABLE); } - sock_wfree(skb); + sock_wfree(skb, final); } static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 3d9122e..17fca55 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1303,7 +1303,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) unix_notinflight(scm->fp->fp[i]); } -static void unix_destruct_fds(struct sk_buff *skb) +static void unix_destruct_fds(struct sk_buff *skb, int final) { struct scm_cookie scm; memset(&scm, 0, sizeof(scm)); @@ -1312,7 +1312,7 @@ static void unix_destruct_fds(struct sk_buff *skb) /* Alas, it calls VFS */ /* So fscking what? fput() had been SMP-safe since the last Summer */ scm_destroy(&scm); - sock_wfree(skb); + sock_wfree(skb, final); } static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 20:30 ` Eric Dumazet @ 2010-04-15 20:46 ` Eric Dumazet 2010-04-15 21:33 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-15 20:46 UTC (permalink / raw) To: David Miller; +Cc: krkumar2, netdev, nuclearcat Le jeudi 15 avril 2010 à 22:30 +0200, Eric Dumazet a écrit : > > @@ -1938,6 +1928,8 @@ gso: > if (dev->priv_flags & IFF_XMIT_DST_RELEASE) > skb_dst_drop(nskb); > > + skb_orphan(nskb); > + > rc = ops->ndo_start_xmit(nskb, dev); > if (unlikely(rc != NETDEV_TX_OK)) { > if (rc & ~NETDEV_TX_MASK) Well, might need to test if skb is not shared before orphaning it if (!skb_shared(skb)) skb_orphan(nskb); ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-15 20:46 ` Eric Dumazet @ 2010-04-15 21:33 ` David Miller 2010-04-16 22:18 ` [PATCH net-next-2.6] net: Introduce skb_orphan_try() Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-15 21:33 UTC (permalink / raw) To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 15 Apr 2010 22:46:15 +0200 > Le jeudi 15 avril 2010 à 22:30 +0200, Eric Dumazet a écrit : >> >> @@ -1938,6 +1928,8 @@ gso: >> if (dev->priv_flags & IFF_XMIT_DST_RELEASE) >> skb_dst_drop(nskb); >> >> + skb_orphan(nskb); >> + >> rc = ops->ndo_start_xmit(nskb, dev); >> if (unlikely(rc != NETDEV_TX_OK)) { >> if (rc & ~NETDEV_TX_MASK) > > Well, might need to test if skb is not shared before orphaning it > > if (!skb_shared(skb)) > skb_orphan(nskb); If it's not legal to skb_orphan() here then it would not be legal for the drivers to unconditionally skb_orphan(), which they do. So either your test is unnecessary, or we have a big existing problem :-) ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-15 21:33 ` David Miller @ 2010-04-16 22:18 ` Eric Dumazet 2010-04-18 9:46 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-16 22:18 UTC (permalink / raw) To: David Miller; +Cc: netdev Le jeudi 15 avril 2010 à 14:33 -0700, David Miller a écrit : > If it's not legal to skb_orphan() here then it would not be legal for > the drivers to unconditionally skb_orphan(), which they do. > > So either your test is unnecessary, or we have a big existing problem > :-) I cooked following patch, introducing skb_orphan_try() helper, to document all known exceptions. I have a possible followup for this patch : Orphaning skbs earlier could also make dev_kfree_skb_irq() faster. Instead of queing skb into completion_queue and triggering NET_TX_SOFTIRQ, we would directly free an orphaned skb ? [PATCH net-next-2.6] net: Introduce skb_orphan_try() Transmitted skb might be attached to a socket and a destructor, for memory accounting purposes. Traditionally, this destructor is called at tx completion time, when skb is freed. When tx completion is performed by another cpu than the sender, this forces some cache lines to change ownership. XPS was an attempt to give tx completion to initial cpu. David idea is to call destructor right before giving skb to device (call to ndo_start_xmit()). Because device queues are usually small, orphaning skb before tx completion is not a big deal. Some drivers already do this, we could do it in upper level. There is one known exception to this early orphaning, called tx timestamping. It needs to keep a reference to socket until device can give a hardware or software timestamp. This patch adds a skb_orphan_try() helper, to centralize all exceptions to early orphaning in one spot, and use it in dev_hard_start_xmit(). "tbench 16" results on a Nehalem machine (2 X5570 @ 2.93GHz) before: Throughput 4428.9 MB/sec 16 procs after: Throughput 4448.14 MB/sec 16 procs UDP should get even better results, its destructor being more complex, since SOCK_USE_WRITE_QUEUE is not set (four atomic ops instead of one) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- diff --git a/net/core/dev.c b/net/core/dev.c index e8041eb..acae5fe 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1880,6 +1880,17 @@ static int dev_gso_segment(struct sk_buff *skb) return 0; } +/* + * Try to orphan skb early, right before transmission by the device. + * We cannot orphan skb if tx timestamp is requested, since + * drivers need to call skb_tstamp_tx() to send the timestamp. + */ +static inline void skb_orphan_try(struct sk_buff *skb) +{ + if (!skb_tx(skb)->flags) + skb_orphan(skb); +} + int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq) { @@ -1904,23 +1915,10 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(skb); + skb_orphan_try(skb); rc = ops->ndo_start_xmit(skb, dev); if (rc == NETDEV_TX_OK) txq_trans_update(txq); - /* - * TODO: if skb_orphan() was called by - * dev->hard_start_xmit() (for example, the unmodified - * igb driver does that; bnx2 doesn't), then - * skb_tx_software_timestamp() will be unable to send - * back the time stamp. - * - * How can this be prevented? Always create another - * reference to the socket before calling - * dev->hard_start_xmit()? Prevent that skb_orphan() - * does anything in dev->hard_start_xmit() by clearing - * the skb destructor before the call and restoring it - * afterwards, then doing the skb_orphan() ourselves? - */ return rc; } @@ -1938,6 +1936,7 @@ gso: if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(nskb); + skb_orphan_try(nskb); rc = ops->ndo_start_xmit(nskb, dev); if (unlikely(rc != NETDEV_TX_OK)) { if (rc & ~NETDEV_TX_MASK) ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-16 22:18 ` [PATCH net-next-2.6] net: Introduce skb_orphan_try() Eric Dumazet @ 2010-04-18 9:46 ` David Miller 2010-04-21 6:08 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-18 9:46 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sat, 17 Apr 2010 00:18:22 +0200 > Le jeudi 15 avril 2010 à 14:33 -0700, David Miller a écrit : > >> If it's not legal to skb_orphan() here then it would not be legal for >> the drivers to unconditionally skb_orphan(), which they do. >> >> So either your test is unnecessary, or we have a big existing problem >> :-) > > I cooked following patch, introducing skb_orphan_try() helper, to > document all known exceptions. Looks good, applied, thanks Eric. That timestamping issue, I bet there is some simple solution hiding in the bushes for that one? Hmmm, something like... there is a less transient piece of state (perhaps in the socket) and the skb has a pointer to that piece of state. Then the driver writes into this pointed-to place rather into some member of the skb itself. Then all of these problems with referencing the skb metadata across being passed to ->ndo_start_xmit() could go away I think. > I have a possible followup for this patch : > > Orphaning skbs earlier could also make dev_kfree_skb_irq() faster. > Instead of queing skb into completion_queue and triggering > NET_TX_SOFTIRQ, we would directly free an orphaned skb ? Sounds great. But I don't see dev_kfree_skb_irq() as much of a performance priority, since all sane modern drivers free skbs from softirq context (via NAPI). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-18 9:46 ` David Miller @ 2010-04-21 6:08 ` Eric Dumazet 2010-04-22 5:56 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-21 6:08 UTC (permalink / raw) To: David Miller; +Cc: netdev Le dimanche 18 avril 2010 à 02:46 -0700, David Miller a écrit : > Looks good, applied, thanks Eric. Hmm, looking at the GSO stuff, I believe we should not call skb_orphan_try() on gso skbs ? Thanks ! [PATCH net-next-2.6] net: Dont call skb_orphan_try() for GSO At this point, skb->destructor is not the original one (stored in DEV_GSO_CB(skb)->destructor) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- diff --git a/net/core/dev.c b/net/core/dev.c index b31d5d6..200d1e8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1937,7 +1937,6 @@ gso: if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(nskb); - skb_orphan_try(nskb); rc = ops->ndo_start_xmit(nskb, dev); if (unlikely(rc != NETDEV_TX_OK)) { if (rc & ~NETDEV_TX_MASK) ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-21 6:08 ` Eric Dumazet @ 2010-04-22 5:56 ` David Miller 2010-04-22 7:10 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-22 5:56 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 21 Apr 2010 08:08:36 +0200 > Le dimanche 18 avril 2010 à 02:46 -0700, David Miller a écrit : > >> Looks good, applied, thanks Eric. > > Hmm, looking at the GSO stuff, I believe we should not call > skb_orphan_try() on gso skbs ? Right, I've applied this, thanks. What we should probably do instead is call and NULL out the DEV_GSO_CB() destructor. Right? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 5:56 ` David Miller @ 2010-04-22 7:10 ` Eric Dumazet 2010-04-22 7:16 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-22 7:10 UTC (permalink / raw) To: David Miller; +Cc: netdev Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit : > Right, I've applied this, thanks. > > What we should probably do instead is call and NULL out the > DEV_GSO_CB() destructor. Right? Yes, probably, I'll take a look at this if you want. Thanks ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 7:10 ` Eric Dumazet @ 2010-04-22 7:16 ` David Miller 2010-04-22 7:24 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-22 7:16 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 22 Apr 2010 09:10:33 +0200 > Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit : > >> Right, I've applied this, thanks. >> >> What we should probably do instead is call and NULL out the >> DEV_GSO_CB() destructor. Right? > > Yes, probably, I'll take a look at this if you want. It might look something like this: diff --git a/net/core/dev.c b/net/core/dev.c index 9bf1ccc..13241da 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1892,6 +1892,20 @@ static inline void skb_orphan_try(struct sk_buff *skb) skb_orphan(skb); } +/* + * GSO packets need to be handled specially because such packets + * hold the normal SKB destructor in a backup pointer. + */ +static inline void skb_orphan_try_gso(struct sk_buff *skb) +{ + if (!skb_tx(skb)->flags) { + if (DEV_GSO_CB(skb)->destructor) + DEV_GSO_CB(skb)->destructor(skb); + DEV_GSO_CB(skb)->destructor = NULL; + skb->sk = NULL; + } +} + int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq) { @@ -1937,6 +1951,7 @@ gso: if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(nskb); + skb_orphan_try_gso(skb); rc = ops->ndo_start_xmit(nskb, dev); if (unlikely(rc != NETDEV_TX_OK)) { if (rc & ~NETDEV_TX_MASK) ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 7:16 ` David Miller @ 2010-04-22 7:24 ` Eric Dumazet 2010-04-22 7:26 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-22 7:24 UTC (permalink / raw) To: David Miller; +Cc: netdev Le jeudi 22 avril 2010 à 00:16 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 22 Apr 2010 09:10:33 +0200 > > > Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit : > > > >> Right, I've applied this, thanks. > >> > >> What we should probably do instead is call and NULL out the > >> DEV_GSO_CB() destructor. Right? > > > > Yes, probably, I'll take a look at this if you want. > > It might look something like this: > > diff --git a/net/core/dev.c b/net/core/dev.c > index 9bf1ccc..13241da 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1892,6 +1892,20 @@ static inline void skb_orphan_try(struct sk_buff *skb) > skb_orphan(skb); > } > > +/* > + * GSO packets need to be handled specially because such packets > + * hold the normal SKB destructor in a backup pointer. > + */ > +static inline void skb_orphan_try_gso(struct sk_buff *skb) > +{ > + if (!skb_tx(skb)->flags) { > + if (DEV_GSO_CB(skb)->destructor) > + DEV_GSO_CB(skb)->destructor(skb); > + DEV_GSO_CB(skb)->destructor = NULL; > + skb->sk = NULL; > + } > +} > + > int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, > struct netdev_queue *txq) > { > @@ -1937,6 +1951,7 @@ gso: > if (dev->priv_flags & IFF_XMIT_DST_RELEASE) > skb_dst_drop(nskb); > > + skb_orphan_try_gso(skb); > rc = ops->ndo_start_xmit(nskb, dev); > if (unlikely(rc != NETDEV_TX_OK)) { > if (rc & ~NETDEV_TX_MASK) Hmm... are you sure we want to call destructor for each skb ? Should'nt we do it before initial skb is split ? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 7:24 ` Eric Dumazet @ 2010-04-22 7:26 ` David Miller 2010-04-22 7:33 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-22 7:26 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 22 Apr 2010 09:24:05 +0200 > Hmm... are you sure we want to call destructor for each skb ? > > Should'nt we do it before initial skb is split ? Good idea, therefore you mean something like this? diff --git a/net/core/dev.c b/net/core/dev.c index 3ba774b..f3c3885 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb) int features = dev->features & ~(illegal_highdma(dev, skb) ? NETIF_F_SG : 0); + skb_orphan_try(skb); segs = skb_gso_segment(skb, features); /* Verifying header integrity only. */ ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 7:26 ` David Miller @ 2010-04-22 7:33 ` Eric Dumazet 2010-04-22 7:41 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-22 7:33 UTC (permalink / raw) To: David Miller; +Cc: netdev Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 22 Apr 2010 09:24:05 +0200 > > > Hmm... are you sure we want to call destructor for each skb ? > > > > Should'nt we do it before initial skb is split ? > > Good idea, therefore you mean something like this? > > diff --git a/net/core/dev.c b/net/core/dev.c > index 3ba774b..f3c3885 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb) > int features = dev->features & ~(illegal_highdma(dev, skb) ? > NETIF_F_SG : 0); > > + skb_orphan_try(skb); > segs = skb_gso_segment(skb, features); > > /* Verifying header integrity only. */ Yes, it seems better. What about the if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(skb); This thing might also be moved before the split, since split probably clone all dst ? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 7:33 ` Eric Dumazet @ 2010-04-22 7:41 ` David Miller 2010-04-22 7:47 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-22 7:41 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 22 Apr 2010 09:33:57 +0200 > Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit : >> @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb) >> int features = dev->features & ~(illegal_highdma(dev, skb) ? >> NETIF_F_SG : 0); >> >> + skb_orphan_try(skb); >> segs = skb_gso_segment(skb, features); >> >> /* Verifying header integrity only. */ > > Yes, it seems better. > > What about the > > if (dev->priv_flags & IFF_XMIT_DST_RELEASE) > skb_dst_drop(skb); > > This thing might also be moved before the split, since split probably > clone all dst ? Good catch, agreed. diff --git a/net/core/dev.c b/net/core/dev.c index 3ba774b..4f897e2 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1851,6 +1851,17 @@ static void dev_gso_skb_destructor(struct sk_buff *skb) cb->destructor(skb); } +/* + * Try to orphan skb early, right before transmission by the device. + * We cannot orphan skb if tx timestamp is requested, since + * drivers need to call skb_tstamp_tx() to send the timestamp. + */ +static inline void skb_orphan_try(struct sk_buff *skb) +{ + if (!skb_tx(skb)->flags) + skb_orphan(skb); +} + /** * dev_gso_segment - Perform emulated hardware segmentation on skb. * @skb: buffer to segment @@ -1865,6 +1876,7 @@ static int dev_gso_segment(struct sk_buff *skb) int features = dev->features & ~(illegal_highdma(dev, skb) ? NETIF_F_SG : 0); + skb_orphan_try(skb); segs = skb_gso_segment(skb, features); /* Verifying header integrity only. */ @@ -1881,17 +1893,6 @@ static int dev_gso_segment(struct sk_buff *skb) return 0; } -/* - * Try to orphan skb early, right before transmission by the device. - * We cannot orphan skb if tx timestamp is requested, since - * drivers need to call skb_tstamp_tx() to send the timestamp. - */ -static inline void skb_orphan_try(struct sk_buff *skb) -{ - if (!skb_tx(skb)->flags) - skb_orphan(skb); -} - int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq) { @@ -1902,13 +1903,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, if (!list_empty(&ptype_all)) dev_queue_xmit_nit(skb, dev); - if (netif_needs_gso(dev, skb)) { - if (unlikely(dev_gso_segment(skb))) - goto out_kfree_skb; - if (skb->next) - goto gso; - } - /* * If device doesnt need skb->dst, release it right now while * its hot in this cpu cache @@ -1916,6 +1910,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, if (dev->priv_flags & IFF_XMIT_DST_RELEASE) skb_dst_drop(skb); + if (netif_needs_gso(dev, skb)) { + if (unlikely(dev_gso_segment(skb))) + goto out_kfree_skb; + if (skb->next) + goto gso; + } + skb_orphan_try(skb); rc = ops->ndo_start_xmit(skb, dev); if (rc == NETDEV_TX_OK) ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 7:41 ` David Miller @ 2010-04-22 7:47 ` Eric Dumazet 2010-04-22 7:54 ` David Miller 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-22 7:47 UTC (permalink / raw) To: David Miller; +Cc: netdev Le jeudi 22 avril 2010 à 00:41 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 22 Apr 2010 09:33:57 +0200 > > > Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit : > >> @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb) > >> int features = dev->features & ~(illegal_highdma(dev, skb) ? > >> NETIF_F_SG : 0); > >> > >> + skb_orphan_try(skb); > >> segs = skb_gso_segment(skb, features); > >> > >> /* Verifying header integrity only. */ > > > > Yes, it seems better. > > > > What about the > > > > if (dev->priv_flags & IFF_XMIT_DST_RELEASE) > > skb_dst_drop(skb); > > > > This thing might also be moved before the split, since split probably > > clone all dst ? > > Good catch, agreed. > > diff --git a/net/core/dev.c b/net/core/dev.c > index 3ba774b..4f897e2 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1851,6 +1851,17 @@ static void dev_gso_skb_destructor(struct sk_buff *skb) > cb->destructor(skb); > } > > +/* > + * Try to orphan skb early, right before transmission by the device. > + * We cannot orphan skb if tx timestamp is requested, since > + * drivers need to call skb_tstamp_tx() to send the timestamp. > + */ > +static inline void skb_orphan_try(struct sk_buff *skb) > +{ > + if (!skb_tx(skb)->flags) > + skb_orphan(skb); > +} > + > /** > * dev_gso_segment - Perform emulated hardware segmentation on skb. > * @skb: buffer to segment > @@ -1865,6 +1876,7 @@ static int dev_gso_segment(struct sk_buff *skb) > int features = dev->features & ~(illegal_highdma(dev, skb) ? > NETIF_F_SG : 0); > > + skb_orphan_try(skb); > segs = skb_gso_segment(skb, features); > > /* Verifying header integrity only. */ > @@ -1881,17 +1893,6 @@ static int dev_gso_segment(struct sk_buff *skb) > return 0; > } > > -/* > - * Try to orphan skb early, right before transmission by the device. > - * We cannot orphan skb if tx timestamp is requested, since > - * drivers need to call skb_tstamp_tx() to send the timestamp. > - */ > -static inline void skb_orphan_try(struct sk_buff *skb) > -{ > - if (!skb_tx(skb)->flags) > - skb_orphan(skb); > -} > - > int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, > struct netdev_queue *txq) > { > @@ -1902,13 +1903,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, > if (!list_empty(&ptype_all)) > dev_queue_xmit_nit(skb, dev); > > - if (netif_needs_gso(dev, skb)) { > - if (unlikely(dev_gso_segment(skb))) > - goto out_kfree_skb; > - if (skb->next) > - goto gso; > - } > - > /* > * If device doesnt need skb->dst, release it right now while > * its hot in this cpu cache > @@ -1916,6 +1910,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, > if (dev->priv_flags & IFF_XMIT_DST_RELEASE) > skb_dst_drop(skb); > > + if (netif_needs_gso(dev, skb)) { > + if (unlikely(dev_gso_segment(skb))) > + goto out_kfree_skb; > + if (skb->next) > + goto gso; > + } > + > skb_orphan_try(skb); > rc = ops->ndo_start_xmit(skb, dev); > if (rc == NETDEV_TX_OK) You could have one skb_orphan_try() call before the if (netif_needs_gso(dev, skb)) { and remove it from dev_gso_segment() ? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 7:47 ` Eric Dumazet @ 2010-04-22 7:54 ` David Miller 2010-04-22 7:59 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: David Miller @ 2010-04-22 7:54 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 22 Apr 2010 09:47:03 +0200 > You could have one skb_orphan_try() call before the > > if (netif_needs_gso(dev, skb)) { > > and remove it from dev_gso_segment() ? Yes, that's much more concise. This should be ready to go: net: Orphan and de-dst skbs earlier in xmit path. This way GSO packets don't get handled differently. With help from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net> diff --git a/net/core/dev.c b/net/core/dev.c index 3ba774b..a4a7c36 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1902,13 +1902,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, if (!list_empty(&ptype_all)) dev_queue_xmit_nit(skb, dev); - if (netif_needs_gso(dev, skb)) { - if (unlikely(dev_gso_segment(skb))) - goto out_kfree_skb; - if (skb->next) - goto gso; - } - /* * If device doesnt need skb->dst, release it right now while * its hot in this cpu cache @@ -1917,6 +1910,14 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, skb_dst_drop(skb); skb_orphan_try(skb); + + if (netif_needs_gso(dev, skb)) { + if (unlikely(dev_gso_segment(skb))) + goto out_kfree_skb; + if (skb->next) + goto gso; + } + rc = ops->ndo_start_xmit(skb, dev); if (rc == NETDEV_TX_OK) txq_trans_update(txq); ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try() 2010-04-22 7:54 ` David Miller @ 2010-04-22 7:59 ` Eric Dumazet 0 siblings, 0 replies; 37+ messages in thread From: Eric Dumazet @ 2010-04-22 7:59 UTC (permalink / raw) To: David Miller; +Cc: netdev Le jeudi 22 avril 2010 à 00:54 -0700, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 22 Apr 2010 09:47:03 +0200 > > > You could have one skb_orphan_try() call before the > > > > if (netif_needs_gso(dev, skb)) { > > > > and remove it from dev_gso_segment() ? > > Yes, that's much more concise. This should be ready to go: > > net: Orphan and de-dst skbs earlier in xmit path. > > This way GSO packets don't get handled differently. > > With help from Eric Dumazet. > > Signed-off-by: David S. Miller <davem@davemloft.net> > > diff --git a/net/core/dev.c b/net/core/dev.c > index 3ba774b..a4a7c36 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1902,13 +1902,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, > if (!list_empty(&ptype_all)) > dev_queue_xmit_nit(skb, dev); > > - if (netif_needs_gso(dev, skb)) { > - if (unlikely(dev_gso_segment(skb))) > - goto out_kfree_skb; > - if (skb->next) > - goto gso; > - } > - > /* > * If device doesnt need skb->dst, release it right now while > * its hot in this cpu cache > @@ -1917,6 +1910,14 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, > skb_dst_drop(skb); > > skb_orphan_try(skb); > + > + if (netif_needs_gso(dev, skb)) { > + if (unlikely(dev_gso_segment(skb))) > + goto out_kfree_skb; > + if (skb->next) > + goto gso; > + } > + > rc = ops->ndo_start_xmit(skb, dev); > if (rc == NETDEV_TX_OK) > txq_trans_update(txq); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Thanks David ! ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-12 6:01 ` Eric Dumazet 2010-04-12 7:18 ` Eric Dumazet @ 2010-04-12 7:54 ` Krishna Kumar2 2010-04-12 9:31 ` Eric Dumazet 1 sibling, 1 reply; 37+ messages in thread From: Krishna Kumar2 @ 2010-04-12 7:54 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, Denys Fedorysychenko Hi Eric, Thanks for your patch, just one question on it though: Eric Dumazet <eric.dumazet@gmail.com> wrote on 04/12/2010 11:31:51 AM: > > When route changes, I think my patch had reset sk->sk_tx_queue_mapping > > by calling sk_tx_queue_clear. I don't know if I missed any path where > > the route changes and sk_dst_reset() was not called. > > > > Problem is when you reset sk->sk_tx_queue_mapping at the very moment > route (or destination) changes, we might have old packets queued in tx > queues, of the old ethernet device (eth0 : multi queue compatable) > 2) Application does a sendmsg() or connect() call and sk->sk_dst_cache > is rebuild, it points to a dst_entry referring a new device (eth1 : non > multiqueue) > > 3) When one old packet finally is transmitted, we do : > > queue_index = 1; // any value > 0 > > if (sk && sk->sk_dst_cache) > sk_tx_queue_set(sk, queue_index); // remember a >0 value > > 4) application does a sendmsg(), enqueues a new skb on eth1 > > 5) We re-enter dev_pick_tx(), and consider cached value in 3) is valid. > we pick a non existent txq for eth1 device. > > 6) We crash. > > > The following might be better to prove the panic is due to this, since > > your suggestion will hide a panic that happens somewhat rare (according > > to Denys): > > > > if (sk_tx_queue_recorded(sk)) { > > queue_index = sk_tx_queue_get(sk); > > + queue_index = dev_cap_txqueue(dev, queue_index); > > } else { > > > > Sure, but I thought I was clear enough to prove this commit was wrong, > and we have to find a fix. If the dst got changed between call to vlan_dev_hwaccel_hard_start_xmit and it's call to dev_queue_xmit, that change to dst should have reset sk_tx_queue_mapping to -1 by calling sk_tx_queue_clear (assuming that I have changed in all paths, eg __sk_dst_reset), and thus result in a new mapping in dev_pick_tx. Would the patch hide the actual bug where we do not clear sk_tx_queue_mapping, eg __sk_dst_set does it? I agree the patch will fix the panic, but this check could be removed if the code which changes the dst is fixed to clear the mapping. I could check that if you think this assumption is correct. Thanks, - KK ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-12 7:54 ` NULL pointer dereference panic in stable (2.6.33.2), amd64 Krishna Kumar2 @ 2010-04-12 9:31 ` Eric Dumazet 2010-04-12 16:11 ` Denys Fedorysychenko 0 siblings, 1 reply; 37+ messages in thread From: Eric Dumazet @ 2010-04-12 9:31 UTC (permalink / raw) To: Krishna Kumar2; +Cc: David Miller, netdev, Denys Fedorysychenko Le lundi 12 avril 2010 à 13:24 +0530, Krishna Kumar2 a écrit : > If the dst got changed between call to vlan_dev_hwaccel_hard_start_xmit > and it's call to dev_queue_xmit, that change to dst should have reset > sk_tx_queue_mapping to -1 by calling sk_tx_queue_clear (assuming that I > have changed in all paths, eg __sk_dst_reset), and thus result in a new > mapping in dev_pick_tx. Would the patch hide the actual bug where we do > not clear sk_tx_queue_mapping, eg __sk_dst_set does it? I agree the > patch will fix the panic, but this check could be removed if the code > which changes the dst is fixed to clear the mapping. I could check that > if you think this assumption is correct. > I believe you focus on another problem. I am not saying we dont have another bug (forgetting to reset sk_dst_cache somewhere). I am only saying that when we want to cache the queue number on a given socket, we have to make sure current packet routing decision was taken on same dst_entries than current and future ones. Denys hit the problem because of long delays caused by traffic shaping. So the cache renew must be safe, which I tried to fix. You are saying that cache invalidation might be missing from some paths. I dont think so because I took an extensive look at these spots when working on yet another RCU conversion two days ago (sk_dst_lock becomes a spinlock). This was fresh in my mind, this is why I probably found Denys problem origin so quickly ;) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-12 9:31 ` Eric Dumazet @ 2010-04-12 16:11 ` Denys Fedorysychenko 2010-04-12 20:09 ` Eric Dumazet 0 siblings, 1 reply; 37+ messages in thread From: Denys Fedorysychenko @ 2010-04-12 16:11 UTC (permalink / raw) To: Eric Dumazet; +Cc: Krishna Kumar2, David Miller, netdev [-- Attachment #1: Type: Text/Plain, Size: 163 bytes --] On Monday 12 April 2010 12:31:43 Eric Dumazet wrote: . Seems problem still remain. Patched kernel, but paniced now. Btw, i dont have any multiqueue card, i think. [-- Attachment #2: x.txt --] [-- Type: text/plain, Size: 12174 bytes --] Apr 12 18:46:58 80.83.17.1 dropbear[4843]: exit before auth: Disconnect received Apr 12 18:46:59 80.83.17.1 dropbear[4845]: Child connection from 82.113.44.186:48692 Apr 12 18:46:59 80.83.17.1 dropbear[4844]: exit before auth: Disconnect received Apr 12 18:46:59 80.83.17.1 kernel: [12598.956375] BUG: unable to handle kernel NULL pointer dereference at (null) Apr 12 18:46:59 80.83.17.1 kernel: [12598.956571] IP: [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d Apr 12 18:46:59 80.83.17.1 kernel: [12598.956762] PGD 21debc067 PUD 21c881067 PMD 0 Apr 12 18:46:59 80.83.17.1 kernel: [12598.956947] Oops: 0000 [#1] SMP Apr 12 18:46:59 80.83.17.1 kernel: [12598.957126] last sysfs file: /sys/devices/virtual/vc/vcs3/dev Apr 12 18:46:59 80.83.17.1 kernel: [12598.957311] CPU 0 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Pid: 0, comm: swapper Not tainted 2.6.33.2-build-0052test-64 #2 / Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RIP: 0010:[<ffffffff811e587f>] [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RSP: 0000:ffff880028203a30 EFLAGS: 00010202 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff880209d8a900 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RDX: ffff88021d870000 RSI: 0000000000000000 RDI: ffff88020a7b48e8 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RBP: ffff880028203a60 R08: ffff88021c8be89c R09: ffff88021c8bec00 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] R10: dead000000200200 R11: dead000000100100 R12: ffff88021f98a880 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] R13: ffff88021d5c0900 R14: ffff88020a7b48e8 R15: ffff88021cbad000 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CR2: 0000000000000000 CR3: 000000021c9d8000 CR4: 00000000000006f0 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Process swapper (pid: 0, threadinfo ffffffff81392000, task ffffffff813a1020) Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Stack: Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] ffff88021d870000 ffff88021d5c0900 0000000000000042 ffff88021d5c0900 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <0> ffff88021cbad000 ffff88021cbad000 ffff880028203a80 ffffffffa01c12a9 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <0> 0000000000000000 ffff88020a7b48e8 ffff880028203ad0 ffffffff811e540e Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Call Trace: Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <IRQ> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q] Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811f648a>] sch_direct_xmit+0x5d/0x16b Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811f6654>] __qdisc_run+0xbc/0xdc Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e5939>] dev_queue_xmit+0x346/0x46d Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8120a384>] ip_finish_output2+0x1c2/0x206 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8120a430>] ip_finish_output+0x68/0x6a Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8120a4d2>] ip_output+0xa0/0xa5 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81206d2e>] ip_forward_finish+0x2e/0x32 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81206ff4>] ip_forward+0x2c2/0x322 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81205ae0>] ip_rcv_finish+0x2f0/0x30a Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81205d77>] ip_rcv+0x27d/0x2a4 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8124ad48>] ? vlan_hwaccel_do_receive+0x2b/0xda Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e47b6>] netif_receive_skb+0x450/0x475 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e4909>] napi_skb_finish+0x24/0x3b Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8124b01b>] vlan_gro_receive+0x7c/0x81 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa015d6c5>] e1000_receive_skb+0x4a/0x65 [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa015d8cb>] e1000_clean_rx_irq+0x1eb/0x29c [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa015ebfb>] e1000_clean+0x75/0x22e [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa0234d6c>] ? hfsc_dequeue+0x171/0x2a6 [sch_hfsc] Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e4e56>] net_rx_action+0xa7/0x17a Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81039670>] __do_softirq+0x96/0x11a Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff810037cc>] call_softirq+0x1c/0x28 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81005543>] do_softirq+0x33/0x68 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81039407>] irq_exit+0x36/0x75 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81004c3e>] do_IRQ+0xaa/0xc1 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8125ba93>] ret_from_intr+0x0/0xa Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <EOI> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8100a0c7>] ? mwait_idle+0x66/0x6b Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81001d24>] ? enter_idle+0x20/0x22 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81001d7b>] cpu_idle+0x55/0x8d Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8124bba5>] rest_init+0x79/0x7b Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff813fca70>] start_kernel+0x362/0x36d Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff813fc0a8>] x86_64_start_reservations+0xa5/0xa9 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff813fc189>] x86_64_start_kernel+0xdd/0xe4 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Code: e2 48 8b 55 d0 49 c1 e4 07 66 41 8b 86 a6 00 00 00 4c 03 a2 00 03 00 00 80 e4 cf 80 cc 20 49 8b 5c 24 08 66 41 89 86 a6 00 00 00 <48> 83 3b 00 0f 84 bb 00 00 00 4c 8d ab 9c 00 00 00 4c 89 ef e8 Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RIP [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RSP <ffff880028203a30> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CR2: 0000000000000000 Apr 12 18:46:59 80.83.17.1 kernel: [12598.974856] ---[ end trace 739e5480c8ab784f ]--- Apr 12 18:46:59 80.83.17.1 kernel: [12598.975082] Kernel panic - not syncing: Fatal exception in interrupt Apr 12 18:46:59 80.83.17.1 kernel: [12598.975311] Pid: 0, comm: swapper Tainted: G D 2.6.33.2-build-0052test-64 #2 Apr 12 18:46:59 80.83.17.1 kernel: [12598.975706] Call Trace: Apr 12 18:46:59 80.83.17.1 kernel: [12598.975920] <IRQ> [<ffffffff81259753>] panic+0xa0/0x161 Apr 12 18:46:59 80.83.17.1 kernel: [12598.976200] [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20 Apr 12 18:46:59 80.83.17.1 kernel: [12598.976431] [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c Apr 12 18:46:59 80.83.17.1 kernel: [12598.976657] [<ffffffff81006651>] oops_end+0xaa/0xba Apr 12 18:46:59 80.83.17.1 kernel: [12598.976882] [<ffffffff8101e653>] no_context+0x1f3/0x202 Apr 12 18:46:59 80.83.17.1 kernel: [12598.977113] [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0 Apr 12 18:46:59 80.83.17.1 kernel: [12598.977347] [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5 Apr 12 18:46:59 80.83.17.1 kernel: [12598.977577] [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.977806] [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25 Apr 12 18:46:59 80.83.17.1 kernel: [12598.978045] [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.978282] [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.978513] [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10 Apr 12 18:46:59 80.83.17.1 kernel: [12598.978741] [<ffffffff8101eb32>] do_page_fault+0x114/0x24a Apr 12 18:46:59 80.83.17.1 kernel: [12598.978967] [<ffffffff8125bc9f>] page_fault+0x1f/0x30 Apr 12 18:46:59 80.83.17.1 kernel: [12598.979196] [<ffffffff811e587f>] ? dev_queue_xmit+0x28c/0x46d Apr 12 18:46:59 80.83.17.1 kernel: [12598.979426] [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q] Apr 12 18:46:59 80.83.17.1 kernel: [12598.979821] [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304 Apr 12 18:46:59 80.83.17.1 kernel: [12598.980055] [<ffffffff811f648a>] sch_direct_xmit+0x5d/0x16b Apr 12 18:46:59 80.83.17.1 kernel: [12598.980284] [<ffffffff811f6654>] __qdisc_run+0xbc/0xdc Apr 12 18:46:59 80.83.17.1 kernel: [12598.980514] [<ffffffff811e5939>] dev_queue_xmit+0x346/0x46d Apr 12 18:46:59 80.83.17.1 kernel: [12598.980740] [<ffffffff8120a384>] ip_finish_output2+0x1c2/0x206 Apr 12 18:46:59 80.83.17.1 kernel: [12598.980966] [<ffffffff8120a430>] ip_finish_output+0x68/0x6a Apr 12 18:46:59 80.83.17.1 kernel: [12598.981197] [<ffffffff8120a4d2>] ip_output+0xa0/0xa5 Apr 12 18:46:59 80.83.17.1 kernel: [12598.981427] [<ffffffff81206d2e>] ip_forward_finish+0x2e/0x32 Apr 12 18:46:59 80.83.17.1 kernel: [12598.981654] [<ffffffff81206ff4>] ip_forward+0x2c2/0x322 Apr 12 18:46:59 80.83.17.1 kernel: [12598.981880] [<ffffffff81205ae0>] ip_rcv_finish+0x2f0/0x30a Apr 12 18:46:59 80.83.17.1 kernel: [12598.982111] [<ffffffff81205d77>] ip_rcv+0x27d/0x2a4 Apr 12 18:46:59 80.83.17.1 kernel: [12598.982337] [<ffffffff8124ad48>] ? vlan_hwaccel_do_receive+0x2b/0xda Apr 12 18:46:59 80.83.17.1 kernel: [12598.982566] [<ffffffff811e47b6>] netif_receive_skb+0x450/0x475 Apr 12 18:46:59 80.83.17.1 kernel: [12598.982793] [<ffffffff811e4909>] napi_skb_finish+0x24/0x3b Apr 12 18:46:59 80.83.17.1 kernel: [12598.983025] [<ffffffff8124b01b>] vlan_gro_receive+0x7c/0x81 Apr 12 18:46:59 80.83.17.1 kernel: [12598.983260] [<ffffffffa015d6c5>] e1000_receive_skb+0x4a/0x65 [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.983492] [<ffffffffa015d8cb>] e1000_clean_rx_irq+0x1eb/0x29c [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.983727] [<ffffffffa015ebfb>] e1000_clean+0x75/0x22e [e1000e] Apr 12 18:46:59 80.83.17.1 kernel: [12598.983955] [<ffffffffa0234d6c>] ? hfsc_dequeue+0x171/0x2a6 [sch_hfsc] Apr 12 18:46:59 80.83.17.1 kernel: [12598.984190] [<ffffffff811e4e56>] net_rx_action+0xa7/0x17a Apr 12 18:46:59 80.83.17.1 kernel: [12598.984416] [<ffffffff81039670>] __do_softirq+0x96/0x11a Apr 12 18:46:59 80.83.17.1 kernel: [12598.984642] [<ffffffff810037cc>] call_softirq+0x1c/0x28 Apr 12 18:46:59 80.83.17.1 kernel: [12598.984866] [<ffffffff81005543>] do_softirq+0x33/0x68 Apr 12 18:46:59 80.83.17.1 kernel: [12598.985097] [<ffffffff81039407>] irq_exit+0x36/0x75 Apr 12 18:46:59 80.83.17.1 kernel: [12598.985323] [<ffffffff81004c3e>] do_IRQ+0xaa/0xc1 Apr 12 18:46:59 80.83.17.1 kernel: [12598.985546] [<ffffffff8125ba93>] ret_from_intr+0x0/0xa Apr 12 18:46:59 80.83.17.1 kernel: [12598.985770] <EOI> [<ffffffff8100a0c7>] ? mwait_idle+0x66/0x6b Apr 12 18:46:59 80.83.17.1 kernel: [12598.986048] [<ffffffff81001d24>] ? enter_idle+0x20/0x22 Apr 12 18:46:59 80.83.17.1 kernel: [12598.986284] [<ffffffff81001d7b>] cpu_idle+0x55/0x8d Apr 12 18:46:59 80.83.17.1 kernel: [12598.986507] [<ffffffff8124bba5>] rest_init+0x79/0x7b Apr 12 18:46:59 80.83.17.1 kernel: [12598.986730] [<ffffffff813fca70>] start_kernel+0x362/0x36d Apr 12 18:46:59 80.83.17.1 kernel: [12598.986955] [<ffffffff813fc0a8>] x86_64_start_reservations+0xa5/0xa9 Apr 12 18:46:59 80.83.17.1 kernel: [12598.987189] [<ffffffff813fc189>] x86_64_start_kernel+0xdd/0xe4 ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64 2010-04-12 16:11 ` Denys Fedorysychenko @ 2010-04-12 20:09 ` Eric Dumazet 0 siblings, 0 replies; 37+ messages in thread From: Eric Dumazet @ 2010-04-12 20:09 UTC (permalink / raw) To: Denys Fedorysychenko; +Cc: Krishna Kumar2, David Miller, netdev Le lundi 12 avril 2010 à 19:11 +0300, Denys Fedorysychenko a écrit : > On Monday 12 April 2010 12:31:43 Eric Dumazet wrote: > . > Seems problem still remain. Patched kernel, but paniced now. > Btw, i dont have any multiqueue card, i think. > pièce jointe document texte brut (x.txt) > Apr 12 18:46:58 80.83.17.1 dropbear[4843]: exit before auth: Disconnect received > Apr 12 18:46:59 80.83.17.1 dropbear[4845]: Child connection from 82.113.44.186:48692 > Apr 12 18:46:59 80.83.17.1 dropbear[4844]: exit before auth: Disconnect received > Apr 12 18:46:59 80.83.17.1 kernel: [12598.956375] BUG: unable to handle kernel NULL pointer dereference at (null) > Apr 12 18:46:59 80.83.17.1 kernel: [12598.956571] IP: [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.956762] PGD 21debc067 PUD 21c881067 PMD 0 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.956947] Oops: 0000 [#1] SMP > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957126] last sysfs file: /sys/devices/virtual/vc/vcs3/dev > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957311] CPU 0 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Pid: 0, comm: swapper Not tainted 2.6.33.2-build-0052test-64 #2 / > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RIP: 0010:[<ffffffff811e587f>] [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RSP: 0000:ffff880028203a30 EFLAGS: 00010202 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff880209d8a900 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RDX: ffff88021d870000 RSI: 0000000000000000 RDI: ffff88020a7b48e8 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RBP: ffff880028203a60 R08: ffff88021c8be89c R09: ffff88021c8bec00 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] R10: dead000000200200 R11: dead000000100100 R12: ffff88021f98a880 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] R13: ffff88021d5c0900 R14: ffff88020a7b48e8 R15: ffff88021cbad000 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CR2: 0000000000000000 CR3: 000000021c9d8000 CR4: 00000000000006f0 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Process swapper (pid: 0, threadinfo ffffffff81392000, task ffffffff813a1020) > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Stack: > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] ffff88021d870000 ffff88021d5c0900 0000000000000042 ffff88021d5c0900 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <0> ffff88021cbad000 ffff88021cbad000 ffff880028203a80 ffffffffa01c12a9 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <0> 0000000000000000 ffff88020a7b48e8 ffff880028203ad0 ffffffff811e540e > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Call Trace: > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <IRQ> > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811f648a>] sch_direct_xmit+0x5d/0x16b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811f6654>] __qdisc_run+0xbc/0xdc > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e5939>] dev_queue_xmit+0x346/0x46d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8120a384>] ip_finish_output2+0x1c2/0x206 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8120a430>] ip_finish_output+0x68/0x6a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8120a4d2>] ip_output+0xa0/0xa5 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81206d2e>] ip_forward_finish+0x2e/0x32 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81206ff4>] ip_forward+0x2c2/0x322 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81205ae0>] ip_rcv_finish+0x2f0/0x30a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81205d77>] ip_rcv+0x27d/0x2a4 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8124ad48>] ? vlan_hwaccel_do_receive+0x2b/0xda > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e47b6>] netif_receive_skb+0x450/0x475 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e4909>] napi_skb_finish+0x24/0x3b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8124b01b>] vlan_gro_receive+0x7c/0x81 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa015d6c5>] e1000_receive_skb+0x4a/0x65 [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa015d8cb>] e1000_clean_rx_irq+0x1eb/0x29c [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa015ebfb>] e1000_clean+0x75/0x22e [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffffa0234d6c>] ? hfsc_dequeue+0x171/0x2a6 [sch_hfsc] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff811e4e56>] net_rx_action+0xa7/0x17a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81039670>] __do_softirq+0x96/0x11a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff810037cc>] call_softirq+0x1c/0x28 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81005543>] do_softirq+0x33/0x68 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81039407>] irq_exit+0x36/0x75 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81004c3e>] do_IRQ+0xaa/0xc1 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8125ba93>] ret_from_intr+0x0/0xa > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <EOI> > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8100a0c7>] ? mwait_idle+0x66/0x6b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81001d24>] ? enter_idle+0x20/0x22 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff81001d7b>] cpu_idle+0x55/0x8d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff8124bba5>] rest_init+0x79/0x7b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff813fca70>] start_kernel+0x362/0x36d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff813fc0a8>] x86_64_start_reservations+0xa5/0xa9 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] [<ffffffff813fc189>] x86_64_start_kernel+0xdd/0xe4 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Code: e2 48 8b 55 d0 49 c1 e4 07 66 41 8b 86 a6 00 00 00 4c 03 a2 00 03 00 00 80 e4 cf 80 cc 20 49 8b 5c 24 08 66 41 89 86 a6 00 00 00 <48> 83 3b 00 0f 84 bb 00 00 00 4c 8d ab 9c 00 00 00 4c 89 ef e8 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RIP [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RSP <ffff880028203a30> > Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CR2: 0000000000000000 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.974856] ---[ end trace 739e5480c8ab784f ]--- > Apr 12 18:46:59 80.83.17.1 kernel: [12598.975082] Kernel panic - not syncing: Fatal exception in interrupt > Apr 12 18:46:59 80.83.17.1 kernel: [12598.975311] Pid: 0, comm: swapper Tainted: G D 2.6.33.2-build-0052test-64 #2 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.975706] Call Trace: > Apr 12 18:46:59 80.83.17.1 kernel: [12598.975920] <IRQ> [<ffffffff81259753>] panic+0xa0/0x161 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.976200] [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.976431] [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c > Apr 12 18:46:59 80.83.17.1 kernel: [12598.976657] [<ffffffff81006651>] oops_end+0xaa/0xba > Apr 12 18:46:59 80.83.17.1 kernel: [12598.976882] [<ffffffff8101e653>] no_context+0x1f3/0x202 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.977113] [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.977347] [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.977577] [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.977806] [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.978045] [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.978282] [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.978513] [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.978741] [<ffffffff8101eb32>] do_page_fault+0x114/0x24a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.978967] [<ffffffff8125bc9f>] page_fault+0x1f/0x30 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.979196] [<ffffffff811e587f>] ? dev_queue_xmit+0x28c/0x46d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.979426] [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.979821] [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.980055] [<ffffffff811f648a>] sch_direct_xmit+0x5d/0x16b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.980284] [<ffffffff811f6654>] __qdisc_run+0xbc/0xdc > Apr 12 18:46:59 80.83.17.1 kernel: [12598.980514] [<ffffffff811e5939>] dev_queue_xmit+0x346/0x46d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.980740] [<ffffffff8120a384>] ip_finish_output2+0x1c2/0x206 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.980966] [<ffffffff8120a430>] ip_finish_output+0x68/0x6a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.981197] [<ffffffff8120a4d2>] ip_output+0xa0/0xa5 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.981427] [<ffffffff81206d2e>] ip_forward_finish+0x2e/0x32 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.981654] [<ffffffff81206ff4>] ip_forward+0x2c2/0x322 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.981880] [<ffffffff81205ae0>] ip_rcv_finish+0x2f0/0x30a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.982111] [<ffffffff81205d77>] ip_rcv+0x27d/0x2a4 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.982337] [<ffffffff8124ad48>] ? vlan_hwaccel_do_receive+0x2b/0xda > Apr 12 18:46:59 80.83.17.1 kernel: [12598.982566] [<ffffffff811e47b6>] netif_receive_skb+0x450/0x475 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.982793] [<ffffffff811e4909>] napi_skb_finish+0x24/0x3b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.983025] [<ffffffff8124b01b>] vlan_gro_receive+0x7c/0x81 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.983260] [<ffffffffa015d6c5>] e1000_receive_skb+0x4a/0x65 [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.983492] [<ffffffffa015d8cb>] e1000_clean_rx_irq+0x1eb/0x29c [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.983727] [<ffffffffa015ebfb>] e1000_clean+0x75/0x22e [e1000e] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.983955] [<ffffffffa0234d6c>] ? hfsc_dequeue+0x171/0x2a6 [sch_hfsc] > Apr 12 18:46:59 80.83.17.1 kernel: [12598.984190] [<ffffffff811e4e56>] net_rx_action+0xa7/0x17a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.984416] [<ffffffff81039670>] __do_softirq+0x96/0x11a > Apr 12 18:46:59 80.83.17.1 kernel: [12598.984642] [<ffffffff810037cc>] call_softirq+0x1c/0x28 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.984866] [<ffffffff81005543>] do_softirq+0x33/0x68 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.985097] [<ffffffff81039407>] irq_exit+0x36/0x75 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.985323] [<ffffffff81004c3e>] do_IRQ+0xaa/0xc1 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.985546] [<ffffffff8125ba93>] ret_from_intr+0x0/0xa > Apr 12 18:46:59 80.83.17.1 kernel: [12598.985770] <EOI> [<ffffffff8100a0c7>] ? mwait_idle+0x66/0x6b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.986048] [<ffffffff81001d24>] ? enter_idle+0x20/0x22 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.986284] [<ffffffff81001d7b>] cpu_idle+0x55/0x8d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.986507] [<ffffffff8124bba5>] rest_init+0x79/0x7b > Apr 12 18:46:59 80.83.17.1 kernel: [12598.986730] [<ffffffff813fca70>] start_kernel+0x362/0x36d > Apr 12 18:46:59 80.83.17.1 kernel: [12598.986955] [<ffffffff813fc0a8>] x86_64_start_reservations+0xa5/0xa9 > Apr 12 18:46:59 80.83.17.1 kernel: [12598.987189] [<ffffffff813fc189>] x86_64_start_kernel+0xdd/0xe4 This is becoming tricky :( This is forwarding case, no socket involved in this case. If no multiqueue is involved, I dont see how it can happen. We should take a look at requeues (qdisc congestion), there might be a problem with them. ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2010-04-30 16:47 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-04-11 20:38 NULL pointer dereference panic in stable (2.6.33.2), amd64 Denys Fedorysychenko 2010-04-11 22:35 ` Eric Dumazet 2010-04-11 23:04 ` Denys Fedorysychenko 2010-04-11 23:11 ` Eric Dumazet 2010-04-11 23:36 ` Denys Fedorysychenko 2010-04-12 3:38 ` Krishna Kumar2 2010-04-12 6:01 ` Eric Dumazet 2010-04-12 7:18 ` Eric Dumazet 2010-04-12 7:36 ` David Miller 2010-04-15 6:52 ` David Miller 2010-04-15 8:02 ` Eric Dumazet 2010-04-15 8:26 ` David Miller 2010-04-15 8:51 ` Eric Dumazet 2010-04-15 9:06 ` David Miller 2010-04-15 9:11 ` Denys Fedorysychenko 2010-04-15 10:37 ` Eric Dumazet 2010-04-29 10:50 ` Denys Fedorysychenko 2010-04-15 20:30 ` Eric Dumazet 2010-04-15 20:46 ` Eric Dumazet 2010-04-15 21:33 ` David Miller 2010-04-16 22:18 ` [PATCH net-next-2.6] net: Introduce skb_orphan_try() Eric Dumazet 2010-04-18 9:46 ` David Miller 2010-04-21 6:08 ` Eric Dumazet 2010-04-22 5:56 ` David Miller 2010-04-22 7:10 ` Eric Dumazet 2010-04-22 7:16 ` David Miller 2010-04-22 7:24 ` Eric Dumazet 2010-04-22 7:26 ` David Miller 2010-04-22 7:33 ` Eric Dumazet 2010-04-22 7:41 ` David Miller 2010-04-22 7:47 ` Eric Dumazet 2010-04-22 7:54 ` David Miller 2010-04-22 7:59 ` Eric Dumazet 2010-04-12 7:54 ` NULL pointer dereference panic in stable (2.6.33.2), amd64 Krishna Kumar2 2010-04-12 9:31 ` Eric Dumazet 2010-04-12 16:11 ` Denys Fedorysychenko 2010-04-12 20:09 ` Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).