* [BUG] 2.6.29-rc* QinQ vlan trunking regression
@ 2009-02-28 18:05 Bart Trojanowski
2009-03-04 7:43 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: Bart Trojanowski @ 2009-02-28 18:05 UTC (permalink / raw)
To: Ben Greear, David S. Miller, Stephen Hemminger, Eric Dumazet,
Frank Blaschka <frank.blasc
Cc: bart, netdev, linux-kernel
Hi all,
2.6.29-rc* introduces a bug that causes a crash when a vlan is put into
another vlan, so-called vlan trunking or QinQ.
I can reproduce it reliably with:
$ modprobe 8021q
$ vconfig add eth1 5
$ ifconfig eth1.5 up
$ vconfig add eth1.5 4
I have seen crashes on all 2.6.29-rc kernels, but it does work for me on
2.6.28 (and prior). All my testing was done on ia32 and amd64 systems.
I've reproduced it with various configs, but if you need my config let
me know.
I was tired of crashing my devel box, so this Oops came from a kvm VM
(with the default 8139cp driver) where I attempted a bisect to find the
source of the bug. Unfortunately I was unable to bisect it because of
other unrelated crashes in the history that made it too time-consuming.
I have discovered that by doing:
$ git reset --hard origin/master # to HEAD of torvalds/linux-2.6.git
$ git revert cc883d16c3b7434c7da2c45b54a49c2a99e83db7
$ git revert f7d1b9f5aafa371d7f51f644aa3c38bc914e9205
$ git revert 656299f706e52e0409733d704c2761f1b12d6954
... the crash goes away. I just validated the procedure with Linus'
778ef1e6cbb049c9bcbf405936ee6f2b6e451892. And other than seeing...
[ 154.094561] eth1.5 (): not using net_device_ops yet
[ 154.220840] eth1.5.4 (): not using net_device_ops yet
... there is no trace of this bug. I suspect that only 656299f needs to
be reverted/fixed, but the other two patches are prerequisites for it to
apply cleanly.
Hope that makes the source of the bug apparent to someone.
Cheers,
-Bart
PS: I see another problem with my KVM setup, but I think that has
something to do with my KVM host kernel, not the guest. More
specifically, when I do QinQ I only see the inner VLAN tags on the
underlying bridge device (under the KVM), but the outermost VLAN tag is
missing. But that's an exercise for another day.
--- 8< ---
[ 1201.822546] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
[ 1201.830944] All bugs added by David S. Miller <davem@redhat.com>
[ 1202.016124] PANIC: double fault, gdt at c364b000 [255 bytes]
[ 1202.016789] Slab corruption: size-32 start=f3da9a28, len=32
[ 1202.016793] Redzone: 0xf8163c85f3da9a2c/0xf3dda800f8163c85.
[ 1202.016795] Last user: [<f8163c85>](vlan_dev_neigh_setup+0x23/0x2a [8021q])
[ 1202.016812] 000: 00 a8 dd f3 38 9a da f3 85 3c 16 f8 00 a8 dd f3
[ 1202.016818] 010: 44 9a da f3 85 3c 16 f8 00 a8 dd f3 50 9a da f3
[ 1202.016825] Prev obj: start=f3da99c0, len=32
[ 1202.016826] Redzone: 0xf3dda800f8163c85/0xf3da99f0f3dda800.
[ 1202.016828] Last user: [<f3dda800>](0xf3dda800)
[ 1202.016831] 000: cc 99 da f3 85 3c 16 f8 00 a8 dd f3 d8 99 da f3
[ 1202.016836] 010: 85 3c 16 f8 00 a8 dd f3 e4 99 da f3 85 3c 16 f8
[ 1202.016842] Next obj: start=f3da9a30, len=32
[ 1202.016843] Redzone: 0xf3da9a38f3dda800/0xf8163c85f3da9a5c.
[ 1202.016845] Last user: [<f3da9a68>](0xf3da9a68)
[ 1202.016847] 000: 85 3c 16 f8 00 a8 dd f3 44 9a da f3 85 3c 16 f8
[ 1202.016852] 010: 00 a8 dd f3 50 9a da f3 85 3c 16 f8 00 a8 dd f3
[ 1202.016858] slab error in cache_alloc_debugcheck_after(): cache `size-32': double free, or memory outside object was overwritten
[ 1202.016863] Pid: 2229, comm: ps Tainted: G S 2.6.29-rc6-bisect-00121-g64e7130 #1
[ 1202.016865] Call Trace:
[ 1202.016874] [<c03cb5a8>] ? printk+0xf/0x17
[ 1202.016880] [<c018fea6>] __slab_error+0x17/0x1c
[ 1202.016884] [<c01902c8>] cache_alloc_debugcheck_after+0xd0/0x1b8
[ 1202.016886] [<c0191974>] kmem_cache_alloc+0xca/0x104
[ 1202.016889] [<c0191390>] ? cache_alloc_refill+0x397/0x62a
[ 1202.016891] [<c0191390>] ? cache_alloc_refill+0x397/0x62a
[ 1202.016893] [<c0191390>] cache_alloc_refill+0x397/0x62a
[ 1202.016897] [<c01a2598>] ? __d_lookup+0x0/0x11e
[ 1202.016899] [<c019195a>] kmem_cache_alloc+0xb0/0x104
[ 1202.016902] [<c01a28e4>] ? d_alloc+0x1e/0x16e
[ 1202.016904] [<c01a28e4>] d_alloc+0x1e/0x16e
[ 1202.016906] [<c019aaff>] do_lookup+0x9f/0x154
[ 1202.016909] [<c019c9e0>] __link_path_walk+0x86a/0xc5e
[ 1202.016911] [<c019cf67>] path_walk+0x50/0xa5
[ 1202.016913] [<c019d180>] do_path_lookup+0x140/0x188
[ 1202.016916] [<c019d20e>] path_lookup_open+0x46/0x77
[ 1202.016918] [<c019dbfc>] do_filp_open+0xa3/0x6cd
[ 1202.016923] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.016927] [<c03cdf0a>] ? _spin_unlock+0x1d/0x20
[ 1202.016929] [<c019296b>] do_sys_open+0x42/0xb7
[ 1202.016931] [<c0192a22>] sys_open+0x1e/0x26
[ 1202.016935] [<c0103507>] sysenter_do_call+0x12/0x3a
[ 1202.016938] f3da9a20: redzone 1:0xf8163c85f3da9a2c, redzone 2:0xf3dda800f8163c85
[ 1202.016953] ------------[ cut here ]------------
[ 1202.016955] WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
[ 1202.016958] Hardware name:
[ 1202.016960] list_add corruption. next->prev should be prev (f704f650), but was f8163c85. (next=f3da9a98).
[ 1202.016961] Modules linked in: 8021q virtio_balloon virtio_pci thermal_sys
[ 1202.016967] Pid: 2229, comm: ps Tainted: G S 2.6.29-rc6-bisect-00121-g64e7130 #1
[ 1202.016969] Call Trace:
[ 1202.016972] [<c0129769>] warn_slowpath+0x71/0xa8
[ 1202.016975] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.016978] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.016980] [<c03cb4ab>] ? dump_stack+0x57/0x61
[ 1202.016983] [<c02891e3>] __list_add+0x27/0x5c
[ 1202.016989] [<f8163c85>] ? vlan_dev_neigh_setup+0x23/0x2a [8021q]
[ 1202.016994] [<c0191220>] cache_alloc_refill+0x227/0x62a
[ 1202.016997] [<c019195a>] kmem_cache_alloc+0xb0/0x104
[ 1202.016999] [<c01a28e4>] ? d_alloc+0x1e/0x16e
[ 1202.017002] [<c01a28e4>] d_alloc+0x1e/0x16e
[ 1202.017004] [<c019aaff>] do_lookup+0x9f/0x154
[ 1202.017006] [<c019c9e0>] __link_path_walk+0x86a/0xc5e
[ 1202.017008] [<c019cf67>] path_walk+0x50/0xa5
[ 1202.017011] [<c019d180>] do_path_lookup+0x140/0x188
[ 1202.017013] [<c019d20e>] path_lookup_open+0x46/0x77
[ 1202.017015] [<c019dbfc>] do_filp_open+0xa3/0x6cd
[ 1202.017018] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.017020] [<c03cdf0a>] ? _spin_unlock+0x1d/0x20
[ 1202.017023] [<c019296b>] do_sys_open+0x42/0xb7
[ 1202.017025] [<c0192a22>] sys_open+0x1e/0x26
[ 1202.017027] [<c0103507>] sysenter_do_call+0x12/0x3a
[ 1202.017029] ---[ end trace 32f9f05d27403734 ]---
[ 1202.017057] Slab corruption: size-32 start=f3da99f0, len=32
[ 1202.017062] Redzone: 0xf3dda800f8163c85/0xf3da9a20f3dda800.
[ 1202.017064] Last user: [<f3dda800>](0xf3dda800)
[ 1202.017067] 000: fc 99 da f3 85 3c 16 f8 00 a8 dd f3 08 9a da f3
[ 1202.017072] 010: 85 3c 16 f8 00 a8 dd f3 14 9a da f3 85 3c 16 f8
[ 1202.017078] Prev obj: start=f3da9988, len=32
[ 1202.017079] Redzone: 0xf3da9990f3dda800/0xf8163c85f3da99b4.
[ 1202.017081] Last user: [<f3da99c0>](0xf3da99c0)
[ 1202.017083] 000: 85 3c 16 f8 00 a8 dd f3 9c 99 da f3 85 3c 16 f8
[ 1202.017088] 010: 00 a8 dd f3 a8 99 da f3 85 3c 16 f8 00 a8 dd f3
[ 1202.017093] Next obj: start=f3da99f8, len=32
[ 1202.017095] Redzone: 0xf8163c85f3da99fc/0xf3dda800f8163c85.
[ 1202.017098] Last user: [<d84156c5>](0xd84156c5)
[ 1202.017100] 000: 00 a8 dd f3 08 9a da f3 85 3c 16 f8 00 a8 dd f3
[ 1202.017105] 010: 14 9a da f3 85 3c 16 f8 00 a8 dd f3 20 9a da f3
[ 1202.017111] slab error in cache_alloc_debugcheck_after(): cache `size-32': double free, or memory outside object was overwritten
[ 1202.017114] Pid: 2229, comm: ps Tainted: G S W 2.6.29-rc6-bisect-00121-g64e7130 #1
[ 1202.017116] Call Trace:
[ 1202.017119] [<c03cb5a8>] ? printk+0xf/0x17
[ 1202.017122] [<c018fea6>] __slab_error+0x17/0x1c
[ 1202.017125] [<c01902c8>] cache_alloc_debugcheck_after+0xd0/0x1b8
[ 1202.017127] [<c0191974>] kmem_cache_alloc+0xca/0x104
[ 1202.017129] [<c0191390>] ? cache_alloc_refill+0x397/0x62a
[ 1202.017131] [<c0191390>] ? cache_alloc_refill+0x397/0x62a
[ 1202.017133] [<c0191390>] cache_alloc_refill+0x397/0x62a
[ 1202.017136] [<c02891e3>] ? __list_add+0x27/0x5c
[ 1202.017138] [<c019195a>] kmem_cache_alloc+0xb0/0x104
[ 1202.017144] [<c01c622c>] ? proc_alloc_inode+0x16/0x67
[ 1202.017146] [<c01c622c>] proc_alloc_inode+0x16/0x67
[ 1202.017149] [<c01a4584>] alloc_inode+0x13/0x3a
[ 1202.017152] [<c01a4825>] new_inode+0x17/0x7e
[ 1202.017154] [<c01c71bf>] proc_pid_make_inode+0xc/0xb3
[ 1202.017157] [<c01c9453>] proc_pident_instantiate+0x17/0x86
[ 1202.017159] [<c01c95c3>] proc_pident_lookup+0x6a/0x8b
[ 1202.017167] [<c01c9615>] proc_tgid_base_lookup+0xf/0x11
[ 1202.017169] [<c019ab15>] do_lookup+0xb5/0x154
[ 1202.017171] [<c019c9e0>] __link_path_walk+0x86a/0xc5e
[ 1202.017174] [<c019cf67>] path_walk+0x50/0xa5
[ 1202.017176] [<c019d180>] do_path_lookup+0x140/0x188
[ 1202.017178] [<c019d20e>] path_lookup_open+0x46/0x77
[ 1202.017180] [<c019dbfc>] do_filp_open+0xa3/0x6cd
[ 1202.017183] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.017186] [<c03cdf0a>] ? _spin_unlock+0x1d/0x20
[ 1202.017188] [<c019296b>] do_sys_open+0x42/0xb7
[ 1202.017190] [<c0192a22>] sys_open+0x1e/0x26
[ 1202.017193] [<c0103507>] sysenter_do_call+0x12/0x3a
[ 1202.017195] f3da99e8: redzone 1:0xf3dda800f8163c85, redzone 2:0xf3da9a20f3dda800
[ 1202.017214] ------------[ cut here ]------------
[ 1202.017216] WARNING: at lib/list_debug.c:26 __list_add+0x27/0x5c()
[ 1202.017218] Hardware name:
[ 1202.017219] list_add corruption. next->prev should be prev (f704f290), but was f3dda800. (next=f3da9a60).
[ 1202.017221] Modules linked in: 8021q virtio_balloon virtio_pci thermal_sys
[ 1202.017226] Pid: 2229, comm: ps Tainted: G S W 2.6.29-rc6-bisect-00121-g64e7130 #1
[ 1202.017227] Call Trace:
[ 1202.017230] [<c0129769>] warn_slowpath+0x71/0xa8
[ 1202.017232] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.017235] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.017238] [<c03cb4ab>] ? dump_stack+0x57/0x61
[ 1202.017240] [<c02891e3>] __list_add+0x27/0x5c
[ 1202.017242] [<c0191220>] cache_alloc_refill+0x227/0x62a
[ 1202.017245] [<c019195a>] kmem_cache_alloc+0xb0/0x104
[ 1202.017248] [<c01c622c>] ? proc_alloc_inode+0x16/0x67
[ 1202.017251] [<c01c622c>] proc_alloc_inode+0x16/0x67
[ 1202.017253] [<c01a4584>] alloc_inode+0x13/0x3a
[ 1202.017256] [<c01a4825>] new_inode+0x17/0x7e
[ 1202.017258] [<c01c71bf>] proc_pid_make_inode+0xc/0xb3
[ 1202.017261] [<c01c9453>] proc_pident_instantiate+0x17/0x86
[ 1202.017263] [<c01c95c3>] proc_pident_lookup+0x6a/0x8b
[ 1202.017265] [<c01c9615>] proc_tgid_base_lookup+0xf/0x11
[ 1202.017267] [<c019ab15>] do_lookup+0xb5/0x154
[ 1202.017269] [<c019c9e0>] __link_path_walk+0x86a/0xc5e
[ 1202.017272] [<c019cf67>] path_walk+0x50/0xa5
[ 1202.017274] [<c019d180>] do_path_lookup+0x140/0x188
[ 1202.017276] [<c019d20e>] path_lookup_open+0x46/0x77
[ 1202.017278] [<c019dbfc>] do_filp_open+0xa3/0x6cd
[ 1202.017281] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.017284] [<c03cdf0a>] ? _spin_unlock+0x1d/0x20
[ 1202.017286] [<c019296b>] do_sys_open+0x42/0xb7
[ 1202.017288] [<c0192a22>] sys_open+0x1e/0x26
[ 1202.017291] [<c0103507>] sysenter_do_call+0x12/0x3a
[ 1202.017293] ---[ end trace 32f9f05d27403735 ]---
[ 1202.017312] Slab corruption: size-32 start=f3da99b8, len=32
[ 1202.017314] Redzone: 0xf3da99c0f3dda800/0xf8163c85f3da99e4.
[ 1202.017316] Last user: [<f3da99f0>](0xf3da99f0)
[ 1202.017319] 000: 85 3c 16 f8 00 a8 dd f3 cc 99 da f3 85 3c 16 f8
[ 1202.017324] 010: 00 a8 dd f3 d8 99 da f3 85 3c 16 f8 00 a8 dd f3
[ 1202.017330] Prev obj: start=f3da9950, len=32
[ 1202.017331] Redzone: 0xf8163c85f3da9954/0xf3dda800f8163c85.
[ 1202.017333] Last user: [<f8163c85>](vlan_dev_neigh_setup+0x23/0x2a [8021q])
[ 1202.017338] 000: 00 a8 dd f3 60 99 da f3 85 3c 16 f8 00 a8 dd f3
[ 1202.017343] 010: 6c 99 da f3 85 3c 16 f8 00 a8 dd f3 78 99 da f3
[ 1202.017348] Next obj: start=f3da99c0, len=32
[ 1202.017350] Redzone: 0xf3dda800f8163c85/0xf3da99f0f3dda800.
[ 1202.017351] Last user: [<d84156c5>](0xd84156c5)
[ 1202.017353] 000: cc 99 da f3 85 3c 16 f8 00 a8 dd f3 d8 99 da f3
[ 1202.017359] 010: 85 3c 16 f8 00 a8 dd f3 e4 99 da f3 85 3c 16 f8
[ 1202.017365] slab error in cache_alloc_debugcheck_after(): cache `size-32': double free, or memory outside object was overwritten
[ 1202.017367] Pid: 2229, comm: ps Tainted: G S W 2.6.29-rc6-bisect-00121-g64e7130 #1
[ 1202.017369] Call Trace:
[ 1202.017372] [<c03cb5a8>] ? printk+0xf/0x17
[ 1202.017376] [<c018fea6>] __slab_error+0x17/0x1c
[ 1202.017379] [<c01902c8>] cache_alloc_debugcheck_after+0xd0/0x1b8
[ 1202.017381] [<c0191974>] kmem_cache_alloc+0xca/0x104
[ 1202.017384] [<c01a8d8d>] ? single_open+0x25/0x74
[ 1202.017387] [<c01a8d8d>] ? single_open+0x25/0x74
[ 1202.017390] [<c01c84dd>] ? proc_single_show+0x0/0x6b
[ 1202.017392] [<c01a8d8d>] single_open+0x25/0x74
[ 1202.017395] [<c01c7164>] proc_single_open+0x17/0x2c
[ 1202.017397] [<c0192b5b>] __dentry_open+0x11c/0x210
[ 1202.017399] [<c0192ce9>] nameidata_to_filp+0x2c/0x43
[ 1202.017402] [<c01c714d>] ? proc_single_open+0x0/0x2c
[ 1202.017404] [<c019dee1>] do_filp_open+0x388/0x6cd
[ 1202.017407] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.017410] [<c03cdf0a>] ? _spin_unlock+0x1d/0x20
[ 1202.017412] [<c019296b>] do_sys_open+0x42/0xb7
[ 1202.017414] [<c0192a22>] sys_open+0x1e/0x26
[ 1202.017417] [<c0103507>] sysenter_do_call+0x12/0x3a
[ 1202.017419] f3da99b0: redzone 1:0xf3da99c0f3dda800, redzone 2:0xf8163c85f3da99e4
[ 1202.017523] ------------[ cut here ]------------
[ 1202.017525] kernel BUG at mm/slab.c:2898!
[ 1202.017527] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1202.017530] last sysfs file: /sys/class/net/eth1.5/address
[ 1202.017532] Modules linked in: 8021q virtio_balloon virtio_pci thermal_sys
[ 1202.017536]
[ 1202.017538] Pid: 2229, comm: ps Tainted: G S W (2.6.29-rc6-bisect-00121-g64e7130 #1)
[ 1202.017541] EIP: 0060:[<c0190686>] EFLAGS: 00010006 CPU: 0
[ 1202.017544] EIP is at cache_free_debugcheck+0x1cd/0x2a4
[ 1202.017546] EAX: f3da9980 EBX: f3da9000 ECX: f3da9018 EDX: 0000002b
[ 1202.017548] ESI: d84156c5 EDI: f7000200 EBP: f5a99f34 ESP: f5a99f04
[ 1202.017550] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 1202.017553] Process ps (pid: 2229, ti=f5a98000 task=f47e1708 task.ti=f5a98000)
[ 1202.017554] Stack:
[ 1202.017555] c01a8388 00a88654 f3da9000 635688c0 d84156c5 c01a833a f3da99b0 f7020f30
[ 1202.017559] f305af70 f7000200 f7028f30 f3da99b8 f5a99f4c c01908f8 00000282 00000000
[ 1202.017563] f3da99b8 f4432f40 f5a99f5c c01a8388 00000010 f2b1ddb8 f5a99f7c c01950e1
[ 1202.017567] Call Trace:
[ 1202.017569] [<c01a8388>] ? single_release+0x1c/0x22
[ 1202.017571] [<c01a833a>] ? seq_release+0x18/0x1d
[ 1202.017574] [<c01908f8>] ? kfree+0x99/0xe3
[ 1202.017576] [<c01a8388>] ? single_release+0x1c/0x22
[ 1202.017579] [<c01950e1>] ? __fput+0xca/0x175
[ 1202.017582] [<c01951a5>] ? fput+0x19/0x1b
[ 1202.017584] [<c019287b>] ? filp_close+0x51/0x5b
[ 1202.017586] [<c01928ef>] ? sys_close+0x6a/0xa4
[ 1202.017588] [<c0103507>] ? sysenter_do_call+0x12/0x3a
[ 1202.017591] Code: 5d e8 89 54 03 fc 8b 5d d8 8b 45 e8 8b 4b 0c 29 c8 f7 67 30 3b 57 38 72 04 0f 0b eb fe 89 d0 0f af 47 2c 8d 04 01 39 45 e8 74 04 <0f> 0b eb fe 8b 45 d8 c7 44 90 1c fe ff ff ff 8b 47 34 f6 c4 08
[ 1202.017613] EIP: [<c0190686>] cache_free_debugcheck+0x1cd/0x2a4 SS:ESP 0068:f5a99f04
[ 1202.017619] ---[ end trace 32f9f05d27403736 ]---
[ 1202.017624] BUG: sleeping function called from invalid context at kernel/rwsem.c:21
[ 1202.017626] in_atomic(): 0, irqs_disabled(): 1, pid: 2229, name: ps
[ 1202.017628] INFO: lockdep is turned off.
[ 1202.017629] Pid: 2229, comm: ps Tainted: G S D W 2.6.29-rc6-bisect-00121-g64e7130 #1
[ 1202.017631] Call Trace:
[ 1202.017635] [<c01241b2>] __might_sleep+0xd6/0xdb
[ 1202.017638] [<c03cd167>] down_read+0x15/0x3e
[ 1202.017643] [<c0153fd0>] acct_collect+0x37/0x156
[ 1202.017646] [<c012c671>] do_exit+0x13f/0x6af
[ 1202.017648] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.017651] [<c03cb5a8>] ? printk+0xf/0x17
[ 1202.017653] [<c012989a>] ? oops_exit+0x23/0x28
[ 1202.017656] [<c0105ffb>] oops_end+0xa1/0xa9
[ 1202.017659] [<c0106187>] die+0x54/0x5a
[ 1202.017661] [<c01040ab>] do_trap+0x89/0xa2
[ 1202.017664] [<c01043b3>] ? do_invalid_op+0x0/0x84
[ 1202.017666] [<c010442d>] do_invalid_op+0x7a/0x84
[ 1202.017669] [<c0190686>] ? cache_free_debugcheck+0x1cd/0x2a4
[ 1202.017673] [<c02c08bb>] ? n_tty_read+0x394/0x620
[ 1202.017678] [<c011a5ac>] ? kernel_map_pages+0xde/0xfe
[ 1202.017681] [<c03ce38a>] error_code+0x72/0x78
[ 1202.017684] [<c0190686>] ? cache_free_debugcheck+0x1cd/0x2a4
[ 1202.017686] [<c01a8388>] ? single_release+0x1c/0x22
[ 1202.017689] [<c01a833a>] ? seq_release+0x18/0x1d
[ 1202.017691] [<c01908f8>] kfree+0x99/0xe3
[ 1202.017693] [<c01a8388>] single_release+0x1c/0x22
[ 1202.017696] [<c01950e1>] __fput+0xca/0x175
[ 1202.017698] [<c01951a5>] fput+0x19/0x1b
[ 1202.017700] [<c019287b>] filp_close+0x51/0x5b
[ 1202.017702] [<c01928ef>] sys_close+0x6a/0xa4
[ 1202.017704] [<c0103507>] sysenter_do_call+0x12/0x3a
[ 1202.017858] BUG: unable to handle kernel paging request at 01454126
[ 1202.017861] IP: [<c011ef4a>] update_curr+0xc/0x17e
[ 1202.017865] *pde = 00000000
[ 1202.017867] Oops: 0000 [#2] SMP DEBUG_PAGEALLOC
[ 1202.017870] last sysfs file: /sys/class/net/eth1.5/address
[ 1202.017871] Modules linked in: 8021q virtio_balloon virtio_pci thermal_sys
[ 1202.017874]
[ 1202.017877] Pid: 2229, comm: ps Tainted: G S D W (2.6.29-rc6-bisect-00121-g64e7130 #1)
[ 1202.017879] EIP: 0060:[<c011ef4a>] EFLAGS: 00010086 CPU: 0
[ 1202.017881] EIP is at update_curr+0xc/0x17e
[ 1202.017883] EAX: 014540f6 EBX: f8163c85 ECX: 00000000 EDX: f8163c85
[ 1202.017886] ESI: c985ffff EDI: 014540f6 EBP: f5a99be0 ESP: f5a99bb4
[ 1202.017888] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1202.017890] Process ps (pid: 2229, ti=f5a98000 task=f47e1708 task.ti=f5a98000)
[ 1202.017892] Stack:
[ 1202.017893] f61c7000 00000000 014540f6 ddd69bd9 00000117 00000046 c3647d90 00000004
[ 1202.017897] f8163c85 c985ffff 014540f6 f5a99c00 c011f642 014540f6 c3647d80 00000082
[ 1202.017901] f8163c85 c985ffff 014540f6 f5a99c18 c0121100 00000004 f448fd38 00000000
[ 1202.017905] Call Trace:
[ 1202.017907] [<f8163c85>] ? vlan_dev_neigh_setup+0x23/0x2a [8021q]
[ 1202.017912] [<c011f642>] ? dequeue_entity+0x13/0x134
[ 1202.017915] [<f8163c85>] ? vlan_dev_neigh_setup+0x23/0x2a [8021q]
[ 1202.017920] [<c0121100>] ? __set_se_shares+0x23/0x44
[ 1202.017923] [<c0121241>] ? tg_shares_up+0x120/0x153
[ 1202.017926] [<c011e349>] ? walk_tg_tree+0x6b/0x96
[ 1202.017928] [<c0121121>] ? tg_shares_up+0x0/0x153
[ 1202.017931] [<c011d3c2>] ? tg_nop+0x0/0x7
[ 1202.017933] [<c01242b4>] ? update_shares+0x53/0x59
[ 1202.017936] [<c0124313>] ? try_to_wake_up+0x59/0x179
[ 1202.017939] [<c012443e>] ? default_wake_function+0xb/0xd
[ 1202.017942] [<c013a826>] ? autoremove_wake_function+0xf/0x33
[ 1202.017946] [<c011dd04>] ? __wake_up_common+0x35/0x5b
[ 1202.017949] [<c011ea88>] ? __wake_up_sync+0x31/0x44
[ 1202.017951] [<c0199a55>] ? pipe_release+0x56/0x8b
[ 1202.017955] [<c0199ab0>] ? pipe_write_release+0xf/0x11
[ 1202.017958] [<c01950e1>] ? __fput+0xca/0x175
[ 1202.017960] [<c01951a5>] ? fput+0x19/0x1b
[ 1202.017963] [<c019287b>] ? filp_close+0x51/0x5b
[ 1202.017965] [<c012af64>] ? put_files_struct+0x68/0xaa
[ 1202.017968] [<c012afdd>] ? exit_files+0x37/0x3c
[ 1202.017970] [<c012c6db>] ? do_exit+0x1a9/0x6af
[ 1202.017973] [<c0288d64>] ? _raw_spin_unlock+0x74/0x78
[ 1202.017975] [<c03cb5a8>] ? printk+0xf/0x17
[ 1202.017978] [<c012989a>] ? oops_exit+0x23/0x28
[ 1202.017980] [<c0105ffb>] ? oops_end+0xa1/0xa9
[ 1202.017983] [<c0106187>] ? die+0x54/0x5a
[ 1202.017986] [<c01040ab>] ? do_trap+0x89/0xa2
[ 1202.017989] [<c01043b3>] ? do_invalid_op+0x0/0x84
[ 1202.017994] [<c010442d>] ? do_invalid_op+0x7a/0x84
[ 1202.017997] [<c0190686>] ? cache_free_debugcheck+0x1cd/0x2a4
[ 1202.018001] [<c02c08bb>] ? n_tty_read+0x394/0x620
[ 1202.018004] [<c011a5ac>] ? kernel_map_pages+0xde/0xfe
[ 1202.018007] [<c03ce38a>] ? error_code+0x72/0x78
[ 1202.018009] [<c0190686>] ? cache_free_debugcheck+0x1cd/0x2a4
[ 1202.018013] [<c01a8388>] ? single_release+0x1c/0x22
[ 1202.018015] [<c01a833a>] ? seq_release+0x18/0x1d
[ 1202.018018] [<c01908f8>] ? kfree+0x99/0xe3
[ 1202.018021] [<c01a8388>] ? single_release+0x1c/0x22
[ 1202.018023] [<c01950e1>] ? __fput+0xca/0x175
[ 1202.018026] [<c01951a5>] ? fput+0x19/0x1b
[ 1202.018028] [<c019287b>] ? filp_close+0x51/0x5b
[ 1202.018030] [<c01928ef>] ? sys_close+0x6a/0xa4
[ 1202.018033] [<c0103507>] ? sysenter_do_call+0x12/0x3a
[ 1202.018035] Code: 85 a0 0c 54 c0 8d 04 16 39 c3 75 08 5a 89 d8 5b 5e 5f 5d c3 8b 17 89 d8 e8 81 ef 2a 00 eb a5 55 89 e5 57 56 53 83 ec 20 89 45 dc <8b> 40 30 8b 55 dc 89 45 e8 8b 42 40 83 7d e8 00 8b 88 40 04 00
[ 1202.018062] EIP: [<c011ef4a>] update_curr+0xc/0x17e SS:ESP 0068:f5a99bb4
[ 1202.018066] ---[ end trace 32f9f05d27403737 ]---
[ 1202.018068] Fixing recursive fault but reboot is needed!
[ 1202.016619] double fault, tss at c364eae0
[ 1202.016619] eip = f8163c65, esp = f3d94000
[ 1202.016619] eax = f3dda800, ebx = f3dda800, ecx = f8163c62, edx = f6234e40
[ 1202.016619] esi = f6234e40, edi = c05218bc
--
WebSig: http://www.jukie.net/~bart/sig/
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-02-28 18:05 [BUG] 2.6.29-rc* QinQ vlan trunking regression Bart Trojanowski
@ 2009-03-04 7:43 ` David Miller
2009-03-04 9:57 ` Patrick McHardy
0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2009-03-04 7:43 UTC (permalink / raw)
To: bart
Cc: greearb, shemminger, dada1, frank.blaschka, kaber, netdev,
linux-kernel
From: Bart Trojanowski <bart@jukie.net>
Date: Sat, 28 Feb 2009 13:05:41 -0500
> 2.6.29-rc* introduces a bug that causes a crash when a vlan is put into
> another vlan, so called vlan trunking or QinQ.
>
> I can reproduce it reliably with:
>
> $ modprobe 8021q
> $ vconfig add eth1 5
> $ ifconfig eth1.5 up
> $ vconfig add eth1.5 4
Stephen please fix this.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-04 7:43 ` David Miller
@ 2009-03-04 9:57 ` Patrick McHardy
2009-03-04 10:59 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-04 9:57 UTC (permalink / raw)
To: David Miller
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
David Miller wrote:
> From: Bart Trojanowski <bart@jukie.net>
> Date: Sat, 28 Feb 2009 13:05:41 -0500
>
>> 2.6.29-rc* introduces a bug that causes a crash when a vlan is put into
>> another vlan, so called vlan trunking or QinQ.
>>
>> I can reproduce it reliably with:
>>
>> $ modprobe 8021q
>> $ vconfig add eth1 5
>> $ ifconfig eth1.5 up
>> $ vconfig add eth1.5 4
>
> Stephen please fix this.
I'm maintaining vlan :) I haven't been able to look into this yet,
but I should be later today.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-04 9:57 ` Patrick McHardy
@ 2009-03-04 10:59 ` David Miller
2009-03-04 11:45 ` Patrick McHardy
0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2009-03-04 10:59 UTC (permalink / raw)
To: kaber
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
From: Patrick McHardy <kaber@trash.net>
Date: Wed, 04 Mar 2009 10:57:47 +0100
> David Miller wrote:
> > From: Bart Trojanowski <bart@jukie.net>
> > Date: Sat, 28 Feb 2009 13:05:41 -0500
> >
> >> 2.6.29-rc* introduces a bug that causes a crash when a vlan is put into
> >> another vlan, so called vlan trunking or QinQ.
> >>
> >> I can reproduce it reliably with:
> >>
> >> $ modprobe 8021q
> >> $ vconfig add eth1 5
> >> $ ifconfig eth1.5 up
> >> $ vconfig add eth1.5 4
> > Stephen please fix this.
>
> I'm maintaining vlan :) I haven't been able to look into this yet,
> but I should be later today.
Ok, I don't actually care who fixes it :-)
I asked Stephen because this appears like it might be netdev_ops
fallout.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-04 10:59 ` David Miller
@ 2009-03-04 11:45 ` Patrick McHardy
2009-03-05 3:53 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-04 11:45 UTC (permalink / raw)
To: David Miller
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Wed, 04 Mar 2009 10:57:47 +0100
>
>>>> I can reproduce it reliably with:
>>>>
>>>> $ modprobe 8021q
>>>> $ vconfig add eth1 5
>>>> $ ifconfig eth1.5 up
>>>> $ vconfig add eth1.5 4
>>> Stephen please fix this.
>> I'm maintaining vlan :) I haven't been able to look into this yet,
>> but I should be later today.
>
> Ok, I don't actually care who fixes it :-)
>
> I asked Stephen because this appears like it might be netdev_ops
> fallout.
Good point. I think I know what's happening. The VLAN devices are
registered with vlan_netdev_ops; the ->init function then potentially
replaces them based on whether the underlying device supports HW
acceleration. At this point the register_netdev() function already
has used the first netdev_ops structure to initialize the compat
pointers, meaning the new assignment is largely without effect and
causes incorrect ops to be used with HW acceleration.
This probably causes the slab corruption since the HW accelerated
VLAN device doesn't increase the headroom, but the non-accelerated
functions try to add a hard header anyway.
This is a bit tricky to fix since we actually need some valid
ops before invoking ->init(). One way would be to move the compat
ops initialization to a separate function and have VLAN use it to
switch its ops.
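[Editorial sketch, not part of the original message.] The ordering problem
described above can be modelled in plain userspace C. Everything below is
a hypothetical toy: struct toy_dev, struct toy_ops, toy_register() and
toy_resync_ops() are illustrative stand-ins, not the kernel's definitions,
and the "open" method merely reports which ops table it came from. The
model assumes, as in the analysis, that the compat pointers are copied
before ->init() runs and that ->init() may then swap the ops table.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; NOT the real definitions. */
struct toy_dev;

struct toy_ops {
	int (*ndo_init)(struct toy_dev *dev);
	int (*ndo_open)(struct toy_dev *dev);
};

struct toy_dev {
	const struct toy_ops *netdev_ops;
	/* "compat" pointers, copied out of netdev_ops at registration */
	int (*init)(struct toy_dev *dev);
	int (*open)(struct toy_dev *dev);
	int parent_is_accel;	/* hypothetical: parent offers HW acceleration */
	int resync_in_init;	/* hypothetical: apply the proposed fix */
};

enum { PLAIN_OPEN = 1, ACCEL_OPEN = 2 };

static int plain_open(struct toy_dev *dev) { (void)dev; return PLAIN_OPEN; }
static int accel_open(struct toy_dev *dev) { (void)dev; return ACCEL_OPEN; }
static int toy_init(struct toy_dev *dev);

static const struct toy_ops plain_ops = { toy_init, plain_open };
static const struct toy_ops accel_ops = { toy_init, accel_open };

/* Copy the method pointers from the current ops table into the device. */
static void toy_resync_ops(struct toy_dev *dev)
{
	dev->init = dev->netdev_ops->ndo_init;
	dev->open = dev->netdev_ops->ndo_open;
}

/* ->init() picks the final ops table, optionally resyncing afterwards. */
static int toy_init(struct toy_dev *dev)
{
	dev->netdev_ops = dev->parent_is_accel ? &accel_ops : &plain_ops;
	if (dev->resync_in_init)
		toy_resync_ops(dev);
	return 0;
}

/* Registration copies the compat pointers BEFORE calling ->init(). */
static int toy_register(struct toy_dev *dev)
{
	toy_resync_ops(dev);
	return dev->init(dev);
}

/* Register a device, then call open through the compat pointer. */
int toy_open_after_register(int parent_is_accel, int resync_in_init)
{
	struct toy_dev dev = {
		.netdev_ops = &plain_ops,
		.parent_is_accel = parent_is_accel,
		.resync_in_init = resync_in_init,
	};

	toy_register(&dev);
	return dev.open(&dev);
}
```

In this model, toy_open_after_register(1, 0) still dispatches to
plain_open() even though netdev_ops now points at accel_ops, which is the
stale-pointer situation; toy_open_after_register(1, 1) dispatches to
accel_open() because the pointers are copied again after the swap.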
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-04 11:45 ` Patrick McHardy
@ 2009-03-05 3:53 ` David Miller
2009-03-05 4:54 ` Bart Trojanowski
` (2 more replies)
0 siblings, 3 replies; 29+ messages in thread
From: David Miller @ 2009-03-05 3:53 UTC (permalink / raw)
To: kaber
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
From: Patrick McHardy <kaber@trash.net>
Date: Wed, 04 Mar 2009 12:45:33 +0100
> This is a bit tricky to fix since we actually need some valid
> ops before invoking ->init(). One way would be to move the compat
> ops initialization to a separate function and have VLAN use it to
> switch its ops.
Mind if I push this into net-2.6?
vlan: Fix vlan-in-vlan crashes.
As analyzed by Patrick McHardy, vlan needs to reset its
netdev_ops pointer in its ->init() function but this
leaves the compat method pointers stale.
Add a netdev_resync_ops() and call it from the vlan code.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/linux/netdevice.h | 1 +
net/8021q/vlan_dev.c | 1 +
net/core/dev.c | 54 +++++++++++++++++++++++++++-----------------
3 files changed, 35 insertions(+), 21 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ec54785..6593667 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1079,6 +1079,7 @@ extern void synchronize_net(void);
extern int register_netdevice_notifier(struct notifier_block *nb);
extern int unregister_netdevice_notifier(struct notifier_block *nb);
extern int init_dummy_netdev(struct net_device *dev);
+extern void netdev_resync_ops(struct net_device *dev);
extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
extern struct net_device *dev_get_by_index(struct net *net, int ifindex);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4a19acd..b6e1f9e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -639,6 +639,7 @@ static int vlan_dev_init(struct net_device *dev)
dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
dev->netdev_ops = &vlan_netdev_ops;
}
+ netdev_resync_ops(dev);
if (is_vlan_dev(real_dev))
subclass = 1;
diff --git a/net/core/dev.c b/net/core/dev.c
index 9e4afe6..a06c6fa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4282,6 +4282,38 @@ unsigned long netdev_fix_features(unsigned long features, const char *name)
}
EXPORT_SYMBOL(netdev_fix_features);
+/* Some devices need to (re-)set their netdev_ops inside
+ * ->init() or similar. If that happens, we have to setup
+ * the compat pointers again.
+ */
+void netdev_resync_ops(struct net_device *dev)
+{
+#ifdef CONFIG_COMPAT_NET_DEV_OPS
+ const struct net_device_ops *ops = dev->netdev_ops;
+
+ dev->init = ops->ndo_init;
+ dev->uninit = ops->ndo_uninit;
+ dev->open = ops->ndo_open;
+ dev->change_rx_flags = ops->ndo_change_rx_flags;
+ dev->set_rx_mode = ops->ndo_set_rx_mode;
+ dev->set_multicast_list = ops->ndo_set_multicast_list;
+ dev->set_mac_address = ops->ndo_set_mac_address;
+ dev->validate_addr = ops->ndo_validate_addr;
+ dev->do_ioctl = ops->ndo_do_ioctl;
+ dev->set_config = ops->ndo_set_config;
+ dev->change_mtu = ops->ndo_change_mtu;
+ dev->tx_timeout = ops->ndo_tx_timeout;
+ dev->get_stats = ops->ndo_get_stats;
+ dev->vlan_rx_register = ops->ndo_vlan_rx_register;
+ dev->vlan_rx_add_vid = ops->ndo_vlan_rx_add_vid;
+ dev->vlan_rx_kill_vid = ops->ndo_vlan_rx_kill_vid;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ dev->poll_controller = ops->ndo_poll_controller;
+#endif
+#endif
+}
+EXPORT_SYMBOL(netdev_resync_ops);
+
/**
* register_netdevice - register a network device
* @dev: device to register
@@ -4326,27 +4358,7 @@ int register_netdevice(struct net_device *dev)
* This is temporary until all network devices are converted.
*/
if (dev->netdev_ops) {
- const struct net_device_ops *ops = dev->netdev_ops;
-
- dev->init = ops->ndo_init;
- dev->uninit = ops->ndo_uninit;
- dev->open = ops->ndo_open;
- dev->change_rx_flags = ops->ndo_change_rx_flags;
- dev->set_rx_mode = ops->ndo_set_rx_mode;
- dev->set_multicast_list = ops->ndo_set_multicast_list;
- dev->set_mac_address = ops->ndo_set_mac_address;
- dev->validate_addr = ops->ndo_validate_addr;
- dev->do_ioctl = ops->ndo_do_ioctl;
- dev->set_config = ops->ndo_set_config;
- dev->change_mtu = ops->ndo_change_mtu;
- dev->tx_timeout = ops->ndo_tx_timeout;
- dev->get_stats = ops->ndo_get_stats;
- dev->vlan_rx_register = ops->ndo_vlan_rx_register;
- dev->vlan_rx_add_vid = ops->ndo_vlan_rx_add_vid;
- dev->vlan_rx_kill_vid = ops->ndo_vlan_rx_kill_vid;
-#ifdef CONFIG_NET_POLL_CONTROLLER
- dev->poll_controller = ops->ndo_poll_controller;
-#endif
+ netdev_resync_ops(dev);
} else {
char drivername[64];
pr_info("%s (%s): not using net_device_ops yet\n",
--
1.6.2
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 3:53 ` David Miller
@ 2009-03-05 4:54 ` Bart Trojanowski
2009-03-05 4:59 ` Bart Trojanowski
2009-03-05 5:21 ` David Miller
2009-03-05 5:51 ` Patrick McHardy
2009-03-05 12:30 ` Maxime Bizon
2 siblings, 2 replies; 29+ messages in thread
From: Bart Trojanowski @ 2009-03-05 4:54 UTC (permalink / raw)
To: David Miller
Cc: kaber, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
David,
* David Miller <davem@davemloft.net> [090304 22:53]:
> vlan: Fix vlan-in-vlan crashes.
>
> As analyzed by Patrick McHardy, vlan needs to reset its
> netdev_ops pointer in its ->init() function but this
> leaves the compat method pointers stale.
>
> Add a netdev_resync_ops() and call it from the vlan code.
<snip>
> include/linux/netdevice.h | 1 +
> net/8021q/vlan_dev.c | 1 +
> net/core/dev.c | 54 +++++++++++++++++++++++++++-----------------
> 3 files changed, 35 insertions(+), 21 deletions(-)
I applied this patch on top of v2.6.29-rc7-3-g559595a, but I still get a crash.
I assume that this worked for you, so I am not putting much faith in my
results at this late hour. I'll confirm tomorrow morning that it's not
something else.
Cheers,
-Bart
--
WebSig: http://www.jukie.net/~bart/sig/
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 4:54 ` Bart Trojanowski
@ 2009-03-05 4:59 ` Bart Trojanowski
2009-03-05 5:51 ` Patrick McHardy
2009-03-05 5:21 ` David Miller
1 sibling, 1 reply; 29+ messages in thread
From: Bart Trojanowski @ 2009-03-05 4:59 UTC (permalink / raw)
To: David Miller
Cc: kaber, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
David,
* Bart Trojanowski <bart@jukie.net> [090304 23:54]:
> * David Miller <davem@davemloft.net> [090304 22:53]:
> > vlan: Fix vlan-in-vlan crashes.
> >
> > As analyzed by Patrick McHardy, vlan needs to reset its
> > netdev_ops pointer in its ->init() function but this
> > leaves the compat method pointers stale.
> >
> > Add a netdev_resync_ops() and call it from the vlan code.
> <snip>
> > include/linux/netdevice.h | 1 +
> > net/8021q/vlan_dev.c | 1 +
> > net/core/dev.c | 54 +++++++++++++++++++++++++++-----------------
> > 3 files changed, 35 insertions(+), 21 deletions(-)
>
> I tried this patch onto v2.6.29-rc7-3-g559595a, but I still get a crash.
> I assume that this worked for you, so I am not putting much faith in my
> results at this late hour. I'll confirm tomorrow morning that it's not
> something else.
... if you're interested, here is the Oops. And like I said, I'll
retest tomorrow.
-Bart
[ 231.748126] 802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
[ 231.751563] All bugs added by David S. Miller <davem@redhat.com>
[ 231.876164] PANIC: double fault, gdt at c364c000 [255 bytes]
[ 231.876271] double fault, tss at c364fae0
[ 231.876271] eip = f8163c65, esp = f3045000
[ 231.876271] eax = f3d78800, ebx = f3d78800, ecx = f8163c62, edx = f3e4ce40
[ 231.876271] esi = f3e4ce40, edi = c05228fc
[ 232.156110] BUG: unable to handle kernel paging request at a0ad0eb4
[ 232.159226] IP: [<c011fd49>] hrtick_start_fair+0x1f/0x17d
[ 232.160050] *pde = 00000000
[ 232.160050] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 232.160050] last sysfs file: /sys/class/net/lo/operstate
[ 232.160050] Modules linked in: 8021q virtio_balloon virtio_pci thermal_sys
[ 232.160050]
[ 232.160050] Pid: 0, comm: swapper Tainted: G S (2.6.29-rc7-bisect-00004-g39fc204 #1)
[ 232.160050] EIP: 0060:[<c011fd49>] EFLAGS: 00010092 CPU: 0
[ 232.160050] EIP is at hrtick_start_fair+0x1f/0x17d
[ 232.160050] EAX: c0599d80 EBX: c364e200 ECX: c3651dd4 EDX: f8163c85
[ 232.160050] ESI: f52f6198 EDI: f622b708 EBP: c0545ddc ESP: c0545dc4
[ 232.160050] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 232.160050] Process swapper (pid: 0, ti=c0544000 task=c04fd380 task.ti=c0544000)
[ 232.160050] Stack:
[ 232.160050] c3651d80 5f684afd 00000002 c3651dd4 f52f6198 c3651d80 c0545df0 c011ff82
[ 232.160050] f4796708 c03d6ad8 00000000 c0545e14 c011da63 0d94c928 00000036 00000000
[ 232.160050] c3651d80 c3651d80 c3648d80 00000522 c0545e20 c011daaa f4796708 c0545e34
[ 232.160050] Call Trace:
[ 232.160050] [<c011ff82>] ? dequeue_task_fair+0x51/0x56
[ 232.160050] [<c011da63>] ? dequeue_task+0xd5/0xe4
[ 232.160050] [<c011daaa>] ? deactivate_task+0x19/0x1f
[ 232.160050] [<c011e504>] ? pull_task+0x11/0x3d
[ 232.160050] [<c011e6fc>] ? load_balance_fair+0x134/0x1ce
[ 232.160050] [<c0124684>] ? rebalance_domains+0x219/0x421
[ 232.160050] [<c01248ba>] ? run_rebalance_domains+0x2e/0x9e
[ 232.160050] [<c012e0e7>] ? __do_softirq+0x8d/0x133
[ 232.160050] [<c012e1d5>] ? do_softirq+0x48/0x57
[ 232.160050] [<c012e2e6>] ? irq_exit+0x38/0x66
[ 232.160050] [<c011228b>] ? smp_apic_timer_interrupt+0x74/0x82
[ 232.160050] [<c0103cc0>] ? apic_timer_interrupt+0x28/0x30
[ 232.160050] [<c011824f>] ? native_safe_halt+0x5/0x7
[ 232.160050] [<c0108c92>] ? default_idle+0x30/0x58
[ 232.160050] [<c0102605>] ? cpu_idle+0x63/0x7e
[ 232.160050] [<c03bee3b>] ? rest_init+0x53/0x55
[ 232.160050] Code: 3e ff ff ff 83 c4 14 5b 5e 5f 5d c3 55 89 e5 57 89 d7 56 53 83 ec 0c 89 45 e8 8b 9a 80 00 00 00 b8 80 9d 59 c0 8b 52 04 8b 52 10 <03> 04 95 a0 1c 54 c0 39 45 e8 74 14 6a 00 68 7a 03 00 00 68 e1
[ 232.160050] EIP: [<c011fd49>] hrtick_start_fair+0x1f/0x17d SS:ESP 0068:c0545dc4
[ 232.160050] ---[ end trace b72565b053d710aa ]---
[ 232.160050] Kernel panic - not syncing: Fatal exception in interrupt
[ 232.160050] ------------[ cut here ]------------
[ 232.160050] WARNING: at kernel/smp.c:329 smp_call_function_many+0x37/0x1cf()
[ 232.160050] Hardware name:
[ 232.160050] Modules linked in: 8021q virtio_balloon virtio_pci thermal_sys
[ 232.160050] Pid: 0, comm: swapper Tainted: G S D 2.6.29-rc7-bisect-00004-g39fc204 #1
[ 232.160050] Call Trace:
[ 232.160050] [<c0129779>] warn_slowpath+0x71/0xa8
[ 232.160050] [<c0288f14>] ? _raw_spin_unlock+0x74/0x78
[ 232.160050] [<c03ce3aa>] ? _spin_unlock+0x1d/0x20
[ 232.160050] [<c0288f6b>] ? _raw_spin_lock+0x53/0xfa
[ 232.160050] [<c03ccadd>] ? mutex_unlock+0x8/0xa
[ 232.160050] [<c014b26d>] smp_call_function_many+0x37/0x1cf
[ 232.160050] [<c0108ec2>] ? stop_this_cpu+0x0/0x47
[ 232.160050] [<c014b421>] smp_call_function+0x1c/0x23
[ 232.160050] [<c0110e91>] native_smp_send_stop+0x1b/0x45
[ 232.160050] [<c03cb99e>] panic+0x48/0xe4
[ 232.160050] [<c0105ff4>] oops_end+0x9a/0xa9
[ 232.160050] [<c0106187>] die+0x54/0x5a
[ 232.160050] [<c01195d3>] do_page_fault+0x5f8/0x69d
[ 232.160050] [<c0119fe8>] ? __change_page_attr_set_clr+0x2a7/0x791
[ 232.160050] [<c0119ca2>] ? lookup_address+0x68/0x88
[ 232.160050] [<c0119fe8>] ? __change_page_attr_set_clr+0x2a7/0x791
[ 232.160050] [<c011a5b0>] ? kernel_map_pages+0xde/0xfe
[ 232.160050] [<c0119ca2>] ? lookup_address+0x68/0x88
[ 232.160050] [<c0119ca2>] ? lookup_address+0x68/0x88
[ 232.160050] [<c0119ca2>] ? lookup_address+0x68/0x88
[ 232.160050] [<c0118fdb>] ? do_page_fault+0x0/0x69d
[ 232.160050] [<c03ce82a>] error_code+0x72/0x78
[ 232.160050] [<f8163c85>] ? vlan_dev_neigh_setup+0x23/0x2a [8021q]
[ 232.160050] [<c011fd49>] ? hrtick_start_fair+0x1f/0x17d
[ 232.160050] [<c011ff82>] dequeue_task_fair+0x51/0x56
[ 232.160050] [<c011da63>] dequeue_task+0xd5/0xe4
[ 232.160050] [<c011daaa>] deactivate_task+0x19/0x1f
[ 232.160050] [<c011e504>] pull_task+0x11/0x3d
[ 232.160050] [<c011e6fc>] load_balance_fair+0x134/0x1ce
[ 232.160050] [<c0124684>] rebalance_domains+0x219/0x421
[ 232.160050] [<c01248ba>] run_rebalance_domains+0x2e/0x9e
[ 232.160050] [<c012e0e7>] __do_softirq+0x8d/0x133
[ 232.160050] [<c012e1d5>] do_softirq+0x48/0x57
[ 232.160050] [<c012e2e6>] irq_exit+0x38/0x66
[ 232.160050] [<c011228b>] smp_apic_timer_interrupt+0x74/0x82
[ 232.160050] [<c0103cc0>] apic_timer_interrupt+0x28/0x30
[ 232.160050] [<c011824f>] ? native_safe_halt+0x5/0x7
[ 232.160050] [<c0108c92>] default_idle+0x30/0x58
[ 232.160050] [<c0102605>] cpu_idle+0x63/0x7e
[ 232.160050] [<c03bee3b>] rest_init+0x53/0x55
[ 232.160050] ---[ end trace b72565b053d710ab ]---
[ 232.160050] ------------[ cut here ]------------
[ 232.160050] WARNING: at kernel/smp.c:226 smp_call_function_single+0x37/0xe8()
[ 232.160050] Hardware name:
[ 232.160050] Modules linked in: 8021q virtio_balloon virtio_pci thermal_sys
[ 232.160050] Pid: 0, comm: swapper Tainted: G S D W 2.6.29-rc7-bisect-00004-g39fc204 #1
[ 232.160050] Call Trace:
[ 232.160050] [<c0129779>] warn_slowpath+0x71/0xa8
[ 232.160050] [<c0288f14>] ? _raw_spin_unlock+0x74/0x78
[ 232.160050] [<c03ce300>] ? _read_unlock+0x15/0x20
[ 232.160050] [<c0288f6b>] ? _raw_spin_lock+0x53/0xfa
[ 232.160050] [<c014b185>] smp_call_function_single+0x37/0xe8
[ 232.160050] [<c0108ec2>] ? stop_this_cpu+0x0/0x47
[ 232.160050] [<c014b2ef>] smp_call_function_many+0xb9/0x1cf
[ 232.160050] [<c0108ec2>] ? stop_this_cpu+0x0/0x47
[ 232.160050] [<c014b421>] smp_call_function+0x1c/0x23
[ 232.160050] [<c0110e91>] native_smp_send_stop+0x1b/0x45
[ 232.160050] [<c03cb99e>] panic+0x48/0xe4
[ 232.160050] [<c0105ff4>] oops_end+0x9a/0xa9
[ 232.160050] [<c0106187>] die+0x54/0x5a
[ 232.160050] [<c01195d3>] do_page_fault+0x5f8/0x69d
[ 232.160050] [<c0119fe8>] ? __change_page_attr_set_clr+0x2a7/0x791
[ 232.160050] [<c0119ca2>] ? lookup_address+0x68/0x88
[ 232.160050] [<c0119fe8>] ? __change_page_attr_set_clr+0x2a7/0x791
[ 232.160050] [<c011a5b0>] ? kernel_map_pages+0xde/0xfe
[ 232.160050] [<c0119ca2>] ? lookup_address+0x68/0x88
[ 232.160050] [<c0119ca2>] ? lookup_address+0x68/0x88
[ 232.160050] [<c0119ca2>] ? lookup_address+0x68/0x88
[ 232.160050] [<c0118fdb>] ? do_page_fault+0x0/0x69d
[ 232.160050] [<c03ce82a>] error_code+0x72/0x78
[ 232.160050] [<f8163c85>] ? vlan_dev_neigh_setup+0x23/0x2a [8021q]
[ 232.160050] [<c011fd49>] ? hrtick_start_fair+0x1f/0x17d
[ 232.160050] [<c011ff82>] dequeue_task_fair+0x51/0x56
[ 232.160050] [<c011da63>] dequeue_task+0xd5/0xe4
[ 232.160050] [<c011daaa>] deactivate_task+0x19/0x1f
[ 232.160050] [<c011e504>] pull_task+0x11/0x3d
[ 232.160050] [<c011e6fc>] load_balance_fair+0x134/0x1ce
[ 232.160050] [<c0124684>] rebalance_domains+0x219/0x421
[ 232.160050] [<c01248ba>] run_rebalance_domains+0x2e/0x9e
[ 232.160050] [<c012e0e7>] __do_softirq+0x8d/0x133
[ 232.160050] [<c012e1d5>] do_softirq+0x48/0x57
[ 232.160050] [<c012e2e6>] irq_exit+0x38/0x66
[ 232.160050] [<c011228b>] smp_apic_timer_interrupt+0x74/0x82
[ 232.160050] [<c0103cc0>] apic_timer_interrupt+0x28/0x30
[ 232.160050] [<c011824f>] ? native_safe_halt+0x5/0x7
[ 232.160050] [<c0108c92>] default_idle+0x30/0x58
[ 232.160050] [<c0102605>] cpu_idle+0x63/0x7e
[ 232.160050] [<c03bee3b>] rest_init+0x53/0x55
[ 232.160050] ---[ end trace b72565b053d710ac ]---
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 4:54 ` Bart Trojanowski
2009-03-05 4:59 ` Bart Trojanowski
@ 2009-03-05 5:21 ` David Miller
1 sibling, 0 replies; 29+ messages in thread
From: David Miller @ 2009-03-05 5:21 UTC (permalink / raw)
To: bart
Cc: kaber, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
From: Bart Trojanowski <bart@jukie.net>
Date: Wed, 4 Mar 2009 23:54:42 -0500
> David,
>
> * David Miller <davem@davemloft.net> [090304 22:53]:
> > vlan: Fix vlan-in-vlan crashes.
> >
> > As analyzed by Patrick McHardy, vlan needs to reset its
> > netdev_ops pointer in its ->init() function but this
> > leaves the compat method pointers stale.
> >
> > Add a netdev_resync_ops() and call it from the vlan code.
> <snip>
> > include/linux/netdevice.h | 1 +
> > net/8021q/vlan_dev.c | 1 +
> > net/core/dev.c | 54 +++++++++++++++++++++++++++-----------------
> > 3 files changed, 35 insertions(+), 21 deletions(-)
>
> I tried this patch onto v2.6.29-rc7-3-g559595a, but I still get a crash.
> I assume that this worked for you, so I am not putting much faith in my
> results at this late hour. I'll confirm tomorrow morning that it's not
> something else.
I didn't test it, so either the bug is a different problem or
my patch is wrong :)
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 3:53 ` David Miller
2009-03-05 4:54 ` Bart Trojanowski
@ 2009-03-05 5:51 ` Patrick McHardy
2009-03-05 6:57 ` David Miller
2009-03-05 12:30 ` Maxime Bizon
2 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 5:51 UTC (permalink / raw)
To: David Miller
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Wed, 04 Mar 2009 12:45:33 +0100
>
>> This is a bit tricky to fix since we actually need some valid
>> ops before invoking ->init(). One way would be to move the compat
>> ops initialization to a separate function and have VLAN use it to
>> switch its ops.
>
> Mind if I push this into net-2.6?
>
> vlan: Fix vlan-in-vlan crashes.
>
> As analyzed by Patrick McHardy, vlan needs to reset its
> netdev_ops pointer in its ->init() function but this
> leaves the compat method pointers stale.
>
> Add a netdev_resync_ops() and call it from the vlan code.
This looks fine, thanks. Even if it doesn't fix this particular
report, I think it's appropriate for net-2.6.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 4:59 ` Bart Trojanowski
@ 2009-03-05 5:51 ` Patrick McHardy
0 siblings, 0 replies; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 5:51 UTC (permalink / raw)
To: Bart Trojanowski
Cc: David Miller, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
Bart Trojanowski wrote:
>>> As analyzed by Patrick McHardy, vlan needs to reset its
>>> netdev_ops pointer in its ->init() function but this
>>> leaves the compat method pointers stale.
>>>
>>> Add a netdev_resync_ops() and call it from the vlan code.
>> <snip>
>>> include/linux/netdevice.h | 1 +
>>> net/8021q/vlan_dev.c | 1 +
>>> net/core/dev.c | 54 +++++++++++++++++++++++++++-----------------
>>> 3 files changed, 35 insertions(+), 21 deletions(-)
>> I tried this patch onto v2.6.29-rc7-3-g559595a, but I still get a crash.
>> I assume that this worked for you, so I am not putting much faith in my
>> results at this late hour. I'll confirm tomorrow morning that it's not
>> something else.
>
> ... if you're interested, here is the Oops. And like I said, I'll
> retest tomorrow.
Thanks, I'll try to reproduce it here.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 5:51 ` Patrick McHardy
@ 2009-03-05 6:57 ` David Miller
2009-03-05 7:00 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2009-03-05 6:57 UTC (permalink / raw)
To: kaber
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
From: Patrick McHardy <kaber@trash.net>
Date: Thu, 05 Mar 2009 06:51:25 +0100
> David Miller wrote:
> > From: Patrick McHardy <kaber@trash.net>
> > Date: Wed, 04 Mar 2009 12:45:33 +0100
> >
> >> This is a bit tricky to fix since we actually need some valid
> >> ops before invoking ->init(). One way would be to move the compat
> >> ops initialization to a separate function and have VLAN use it to
> >> switch its ops.
> > Mind if I push this into net-2.6?
> > vlan: Fix vlan-in-vlan crashes.
> > As analyzed by Patrick McHardy, vlan needs to reset its
> > netdev_ops pointer in its ->init() function but this
> > leaves the compat method pointers stale.
> > Add a netdev_resync_ops() and call it from the vlan code.
>
> This looks fine, thanks. Even if it doesn't fix this particular
> report, I think it's appropriate for net-2.6.
Thanks for looking at it, but I don't want to push it out until we
fully resolve this bug.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 6:57 ` David Miller
@ 2009-03-05 7:00 ` David Miller
2009-03-05 7:05 ` Patrick McHardy
0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2009-03-05 7:00 UTC (permalink / raw)
To: kaber
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
I think this will fix it.
diff --git a/net/core/dev.c b/net/core/dev.c
index 9e4afe6..2dd484e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4339,6 +4339,7 @@ int register_netdevice(struct net_device *dev)
dev->do_ioctl = ops->ndo_do_ioctl;
dev->set_config = ops->ndo_set_config;
dev->change_mtu = ops->ndo_change_mtu;
+ dev->neigh_setup = ops->ndo_neigh_setup;
dev->tx_timeout = ops->ndo_tx_timeout;
dev->get_stats = ops->ndo_get_stats;
dev->vlan_rx_register = ops->ndo_vlan_rx_register;
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 7:00 ` David Miller
@ 2009-03-05 7:05 ` Patrick McHardy
2009-03-05 7:11 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 7:05 UTC (permalink / raw)
To: David Miller
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
David Miller wrote:
> I think this will fix it.
Good catch, that looks like a likely cause. Will test in a minute ...
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 7:05 ` Patrick McHardy
@ 2009-03-05 7:11 ` David Miller
2009-03-05 7:12 ` Patrick McHardy
0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2009-03-05 7:11 UTC (permalink / raw)
To: kaber
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
From: Patrick McHardy <kaber@trash.net>
Date: Thu, 05 Mar 2009 08:05:10 +0100
> David Miller wrote:
> > I think this will fix it.
>
> Good catch, that looks like a likely cause. Will test in a minute ...
We probably need both fixes to cover everything.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 7:11 ` David Miller
@ 2009-03-05 7:12 ` Patrick McHardy
2009-03-05 7:19 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 7:12 UTC (permalink / raw)
To: David Miller
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Thu, 05 Mar 2009 08:05:10 +0100
>
>> David Miller wrote:
>>> I think this will fix it.
>> Good catch, that looks like a likely cause. Will test in a minute ...
>
> We probably need both fixes to cover everything.
>
Yes, just the second one still crashes. I'm about to retry using both.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 7:12 ` Patrick McHardy
@ 2009-03-05 7:19 ` David Miller
2009-03-05 7:26 ` Patrick McHardy
0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2009-03-05 7:19 UTC (permalink / raw)
To: kaber
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
From: Patrick McHardy <kaber@trash.net>
Date: Thu, 05 Mar 2009 08:12:50 +0100
> David Miller wrote:
> > From: Patrick McHardy <kaber@trash.net>
> > Date: Thu, 05 Mar 2009 08:05:10 +0100
> >
> >> David Miller wrote:
> >>> I think this will fix it.
> >> Good catch, that looks like a likely cause. Will test in a minute ...
> > We probably need both fixes to cover everything.
> >
>
> Yes, just the second one still crashes. I'm about to retry using both.
Here is the updated version just for the record:
vlan: Fix vlan-in-vlan crashes.
As analyzed by Patrick McHardy, vlan needs to reset its
netdev_ops pointer in its ->init() function but this
leaves the compat method pointers stale.
Add a netdev_resync_ops() and call it from the vlan code.
Any other driver which changes ->netdev_ops after register_netdevice()
will need to call this new function after doing so too.
Signed-off-by: David S. Miller <davem@davemloft.net>
---
include/linux/netdevice.h | 1 +
net/8021q/vlan_dev.c | 1 +
net/core/dev.c | 56 +++++++++++++++++++++++++++-----------------
3 files changed, 36 insertions(+), 22 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ec54785..6593667 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1079,6 +1079,7 @@ extern void synchronize_net(void);
extern int register_netdevice_notifier(struct notifier_block *nb);
extern int unregister_netdevice_notifier(struct notifier_block *nb);
extern int init_dummy_netdev(struct net_device *dev);
+extern void netdev_resync_ops(struct net_device *dev);
extern int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
extern struct net_device *dev_get_by_index(struct net *net, int ifindex);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4a19acd..b6e1f9e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -639,6 +639,7 @@ static int vlan_dev_init(struct net_device *dev)
dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
dev->netdev_ops = &vlan_netdev_ops;
}
+ netdev_resync_ops(dev);
if (is_vlan_dev(real_dev))
subclass = 1;
diff --git a/net/core/dev.c b/net/core/dev.c
index 2dd484e..f112970 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4282,6 +4282,39 @@ unsigned long netdev_fix_features(unsigned long features, const char *name)
}
EXPORT_SYMBOL(netdev_fix_features);
+/* Some devices need to (re-)set their netdev_ops inside
+ * ->init() or similar. If that happens, we have to setup
+ * the compat pointers again.
+ */
+void netdev_resync_ops(struct net_device *dev)
+{
+#ifdef CONFIG_COMPAT_NET_DEV_OPS
+ const struct net_device_ops *ops = dev->netdev_ops;
+
+ dev->init = ops->ndo_init;
+ dev->uninit = ops->ndo_uninit;
+ dev->open = ops->ndo_open;
+ dev->change_rx_flags = ops->ndo_change_rx_flags;
+ dev->set_rx_mode = ops->ndo_set_rx_mode;
+ dev->set_multicast_list = ops->ndo_set_multicast_list;
+ dev->set_mac_address = ops->ndo_set_mac_address;
+ dev->validate_addr = ops->ndo_validate_addr;
+ dev->do_ioctl = ops->ndo_do_ioctl;
+ dev->set_config = ops->ndo_set_config;
+ dev->change_mtu = ops->ndo_change_mtu;
+ dev->neigh_setup = ops->ndo_neigh_setup;
+ dev->tx_timeout = ops->ndo_tx_timeout;
+ dev->get_stats = ops->ndo_get_stats;
+ dev->vlan_rx_register = ops->ndo_vlan_rx_register;
+ dev->vlan_rx_add_vid = ops->ndo_vlan_rx_add_vid;
+ dev->vlan_rx_kill_vid = ops->ndo_vlan_rx_kill_vid;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+ dev->poll_controller = ops->ndo_poll_controller;
+#endif
+#endif
+}
+EXPORT_SYMBOL(netdev_resync_ops);
+
/**
* register_netdevice - register a network device
* @dev: device to register
@@ -4326,28 +4359,7 @@ int register_netdevice(struct net_device *dev)
* This is temporary until all network devices are converted.
*/
if (dev->netdev_ops) {
- const struct net_device_ops *ops = dev->netdev_ops;
-
- dev->init = ops->ndo_init;
- dev->uninit = ops->ndo_uninit;
- dev->open = ops->ndo_open;
- dev->change_rx_flags = ops->ndo_change_rx_flags;
- dev->set_rx_mode = ops->ndo_set_rx_mode;
- dev->set_multicast_list = ops->ndo_set_multicast_list;
- dev->set_mac_address = ops->ndo_set_mac_address;
- dev->validate_addr = ops->ndo_validate_addr;
- dev->do_ioctl = ops->ndo_do_ioctl;
- dev->set_config = ops->ndo_set_config;
- dev->change_mtu = ops->ndo_change_mtu;
- dev->neigh_setup = ops->ndo_neigh_setup;
- dev->tx_timeout = ops->ndo_tx_timeout;
- dev->get_stats = ops->ndo_get_stats;
- dev->vlan_rx_register = ops->ndo_vlan_rx_register;
- dev->vlan_rx_add_vid = ops->ndo_vlan_rx_add_vid;
- dev->vlan_rx_kill_vid = ops->ndo_vlan_rx_kill_vid;
-#ifdef CONFIG_NET_POLL_CONTROLLER
- dev->poll_controller = ops->ndo_poll_controller;
-#endif
+ netdev_resync_ops(dev);
} else {
char drivername[64];
pr_info("%s (%s): not using net_device_ops yet\n",
--
1.6.2
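For readers following along, the staleness problem this patch addresses can be modeled in plain userspace C. This is only a sketch under stated assumptions: the structs are cut down to two callbacks, and model_register(), model_init(), resync_ops(), plain_ops, accel_ops and compat_is_stale() are hypothetical names made up for illustration, not kernel symbols. It mimics the ordering described in the commit message: the compat pointers are copied before ->init() runs, ->init() then switches netdev_ops, and the copies stay stale unless they are resynced.

```c
#include <assert.h>
#include <stddef.h>

struct net_device;

/* Cut-down stand-in for the real ops table (assumption: two hooks only). */
struct net_device_ops {
	int (*ndo_init)(struct net_device *dev);
	int (*ndo_open)(struct net_device *dev);
};

/* Cut-down device: the ops pointer plus the legacy "compat" copies. */
struct net_device {
	const struct net_device_ops *netdev_ops;
	int (*init)(struct net_device *dev);	/* compat copy of ndo_init */
	int (*open)(struct net_device *dev);	/* compat copy of ndo_open */
};

/* Hypothetical equivalent of the resync helper: recopy ops into compat. */
static void resync_ops(struct net_device *dev)
{
	const struct net_device_ops *ops = dev->netdev_ops;

	dev->init = ops->ndo_init;
	dev->open = ops->ndo_open;
}

static int open_plain(struct net_device *dev) { (void)dev; return 1; }
static int open_accel(struct net_device *dev) { (void)dev; return 2; }

static int do_resync_in_init;	/* 0 = unpatched ordering, 1 = patched */

static int model_init(struct net_device *dev);

static const struct net_device_ops plain_ops = { model_init, open_plain };
static const struct net_device_ops accel_ops = { model_init, open_accel };

/* Like the vlan ->init(): switch to a different ops table while running. */
static int model_init(struct net_device *dev)
{
	dev->netdev_ops = &accel_ops;
	if (do_resync_in_init)
		resync_ops(dev);
	return 0;
}

/* Like registration: copy compat pointers first, then call ->init(). */
static int model_register(struct net_device *dev)
{
	assert(dev->netdev_ops != NULL);
	resync_ops(dev);
	return dev->init ? dev->init(dev) : 0;
}

/* Returns 1 if the compat open pointer no longer matches netdev_ops. */
static int compat_is_stale(int resync)
{
	struct net_device dev = { &plain_ops, NULL, NULL };

	do_resync_in_init = resync;
	model_register(&dev);
	return dev.open != dev.netdev_ops->ndo_open;
}
```

In this model, compat_is_stale(0) corresponds to the unpatched ordering and reports a mismatch, while compat_is_stale(1) corresponds to resyncing from inside ->init() and reports none.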
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 7:19 ` David Miller
@ 2009-03-05 7:26 ` Patrick McHardy
2009-03-05 7:31 ` Patrick McHardy
0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 7:26 UTC (permalink / raw)
To: David Miller
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Thu, 05 Mar 2009 08:12:50 +0100
>
>>> We probably need both fixes to cover everything.
>>>
>> Yes, just the second one still crashes. I'm about to retry using both.
>
> Here is the updated version just for the record:
>
> vlan: Fix vlan-in-vlan crashes.
This still crashes. I'll have another look at the code.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 7:26 ` Patrick McHardy
@ 2009-03-05 7:31 ` Patrick McHardy
2009-03-05 7:45 ` David Miller
2009-03-05 8:05 ` Frank Blaschka
0 siblings, 2 replies; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 7:31 UTC (permalink / raw)
To: David Miller
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 639 bytes --]
Patrick McHardy wrote:
> David Miller wrote:
>> From: Patrick McHardy <kaber@trash.net>
>> Date: Thu, 05 Mar 2009 08:12:50 +0100
>>
>>>> We probably need both fixes to cover everything.
>>>>
>>> Yes, just the second one still crashes. I'm about to retry using both.
>>
>> Here is the updated version just for the record:
>>
>> vlan: Fix vlan-in-vlan crashes.
>
> This still crashes. I'll have another look at the code.
This one combined with your patch fixes the crash. The code was calling
vlan_dev_neigh_setup recursively.
Signed-off-by: Patrick McHardy <kaber@trash.net>
(or Tested-by: in case you want to roll it into your patch).
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 424 bytes --]
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 4a19acd..1b34135 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -553,7 +553,7 @@ static int vlan_dev_neigh_setup(struct net_device *dev, struct neigh_parms *pa)
int err = 0;
if (netif_device_present(real_dev) && ops->ndo_neigh_setup)
- err = ops->ndo_neigh_setup(dev, pa);
+ err = ops->ndo_neigh_setup(real_dev, pa);
return err;
}
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 7:31 ` Patrick McHardy
@ 2009-03-05 7:45 ` David Miller
2009-03-05 8:05 ` Frank Blaschka
1 sibling, 0 replies; 29+ messages in thread
From: David Miller @ 2009-03-05 7:45 UTC (permalink / raw)
To: kaber
Cc: bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
From: Patrick McHardy <kaber@trash.net>
Date: Thu, 05 Mar 2009 08:31:31 +0100
> Patrick McHardy wrote:
> > David Miller wrote:
> >> From: Patrick McHardy <kaber@trash.net>
> >> Date: Thu, 05 Mar 2009 08:12:50 +0100
> >>
> >>>> We probably need both fixes to cover everything.
> >>>>
> >>> Yes, just the second one still crashes. I'm about to retry using both.
> >>
> >> Here is the updated version just for the record:
> >>
> >> vlan: Fix vlan-in-vlan crashes.
> > This still crashes. I'll have another look at the code.
>
> This one combined with your patch fixes the crash. The code was calling
> vlan_dev_neigh_setup recursively.
>
> Signed-off-by: Patrick McHardy <kaber@trash.net>
>
> (or Tested-by: in case you want to roll it into your patch).
Great work, thanks Patrick!
I'll get this all integrated and push it out to Linus.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 7:31 ` Patrick McHardy
2009-03-05 7:45 ` David Miller
@ 2009-03-05 8:05 ` Frank Blaschka
2009-03-05 8:27 ` Patrick McHardy
1 sibling, 1 reply; 29+ messages in thread
From: Frank Blaschka @ 2009-03-05 8:05 UTC (permalink / raw)
To: Patrick McHardy
Cc: David Miller, bart, greearb, shemminger, dada1, frank.blaschka,
netdev, linux-kernel
Hi Dave, Patrick,
sorry I could not follow the complete discussion of the fixes done for this problem
but does
if (netif_device_present(real_dev) && ops->ndo_neigh_setup)
- err = ops->ndo_neigh_setup(dev, pa);
+ err = ops->ndo_neigh_setup(real_dev, pa);
not change the idea of the neigh_setup? Remember, we want the neigh_setup of the
real device as the neigh setup function for the vlan device.
Frank
Patrick McHardy wrote:
> Patrick McHardy wrote:
>> David Miller wrote:
>>> From: Patrick McHardy <kaber@trash.net>
>>> Date: Thu, 05 Mar 2009 08:12:50 +0100
>>>
>>>>> We probably need both fixes to cover everything.
>>>>>
>>>> Yes, just the second one still crashes. I'm about to retry using both.
>>>
>>> Here is the updated version just for the record:
>>>
>>> vlan: Fix vlan-in-vlan crashes.
>>
>> This still crashes. I'll have another look at the code.
>
> This one combined with your patch fixes the crash. The code was calling
> vlan_dev_neigh_setup recursively.
>
> Signed-off-by: Patrick McHardy <kaber@trash.net>
>
> (or Tested-by: in case you want to roll it into your patch).
>
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 8:05 ` Frank Blaschka
@ 2009-03-05 8:27 ` Patrick McHardy
2009-03-05 8:56 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 8:27 UTC (permalink / raw)
To: Frank Blaschka
Cc: David Miller, bart, greearb, shemminger, dada1, frank.blaschka,
netdev, linux-kernel
Frank Blaschka wrote:
> Hi Dave, Patrick,
>
> sorry I could not follow the complete discussion of the fixes done for this problem
> but does
>
> if (netif_device_present(real_dev) && ops->ndo_neigh_setup)
> - err = ops->ndo_neigh_setup(dev, pa);
> + err = ops->ndo_neigh_setup(real_dev, pa);
>
> not change the idea of the neigh_setup? Remember, we want the neigh_setup of the
> real device as the neigh setup function for the vlan device.
>
And we still use it. The only difference is that we pass it the
correct device reference, which not only fixes the recursion,
but is also expected by the callbacks. Look at bonding or simply
vlan itself.
The setup itself is still done using the neigh_parms passed to
VLAN, which appears to be what was originally intended.
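The recursion can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: the structs are reduced to what the example needs, run_setup(), pass_real_dev, MAX_DEPTH and the depth counter are hypothetical helpers added so that the runaway case stops instead of overflowing the stack, and the hardware device is assumed to provide no ndo_neigh_setup.

```c
#include <assert.h>
#include <stddef.h>

struct net_device;

struct net_device_ops {
	int (*ndo_neigh_setup)(struct net_device *dev);
};

struct net_device {
	const char *name;
	const struct net_device_ops *netdev_ops;
	struct net_device *real_dev;	/* lower device, NULL for hardware */
};

#define MAX_DEPTH 8		/* hypothetical guard, only for this model */

static int depth;		/* how many times the handler was entered */
static int pass_real_dev;	/* 0 = old behaviour, 1 = fixed behaviour */

/* Modeled on the vlan handler: forward the call to the lower device's op. */
static int vlan_neigh_setup(struct net_device *dev)
{
	struct net_device *real_dev = dev->real_dev;
	const struct net_device_ops *ops;
	int err = 0;

	assert(real_dev != NULL);	/* a vlan always sits on something */
	ops = real_dev->netdev_ops;

	if (++depth > MAX_DEPTH)
		return -1;		/* the kernel would overflow its stack here */

	if (ops->ndo_neigh_setup)
		err = ops->ndo_neigh_setup(pass_real_dev ? real_dev : dev);
	return err;
}

static const struct net_device_ops eth_ops = { NULL };
static const struct net_device_ops vlan_ops = { vlan_neigh_setup };

/* Build eth1 <- eth1.5 <- eth1.5.4 and run neigh setup on the top device. */
static int run_setup(int fixed)
{
	struct net_device eth1 = { "eth1", &eth_ops, NULL };
	struct net_device eth1_5 = { "eth1.5", &vlan_ops, &eth1 };
	struct net_device eth1_5_4 = { "eth1.5.4", &vlan_ops, &eth1_5 };

	depth = 0;
	pass_real_dev = fixed;
	return eth1_5_4.netdev_ops->ndo_neigh_setup(&eth1_5_4);
}
```

With eth1.5.4 stacked on eth1.5 on eth1, run_setup(0) keeps re-entering the same handler with eth1.5.4 until the guard trips and returns -1, while run_setup(1) steps down one level per call and returns 0 after two entries.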
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 8:27 ` Patrick McHardy
@ 2009-03-05 8:56 ` David Miller
2009-03-05 8:59 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2009-03-05 8:56 UTC (permalink / raw)
To: kaber
Cc: blaschka, bart, greearb, shemminger, dada1, frank.blaschka,
netdev, linux-kernel
From: Patrick McHardy <kaber@trash.net>
Date: Thu, 05 Mar 2009 09:27:12 +0100
> Frank Blaschka wrote:
> > Hi Dave, Patrick,
> >
> > sorry I could not follow the complete discussion of the fixes done for this problem
> > but does
> >
> > if (netif_device_present(real_dev) && ops->ndo_neigh_setup)
> > - err = ops->ndo_neigh_setup(dev, pa);
> > + err = ops->ndo_neigh_setup(real_dev, pa);
> >
> > not change the idea of the neigh_setup? Remember, we want the neigh_setup of the
> > real device as the neigh setup function for the vlan device.
> >
>
> And we still use it. The only difference is that we pass it the
> correct device reference, which not only fixes the recursion,
> but is also expected by the callbacks. Look at bonding or simply
> vlan itself.
>
> The setup itself is still done using the neigh_parms passed to
> VLAN, which appears to be what was originally intended.
Then bond_neigh_setup() has the same bug, doesn't it?
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 8:56 ` David Miller
@ 2009-03-05 8:59 ` David Miller
2009-03-05 9:08 ` Patrick McHardy
0 siblings, 1 reply; 29+ messages in thread
From: David Miller @ 2009-03-05 8:59 UTC (permalink / raw)
To: kaber
Cc: blaschka, bart, greearb, shemminger, dada1, frank.blaschka,
netdev, linux-kernel
From: David Miller <davem@davemloft.net>
Date: Thu, 05 Mar 2009 00:56:46 -0800 (PST)
> Then bond_neigh_setup() has the same bug, doesn't it?
Looking at the bond_main.c changes in:
commit 008298231abbeb91bc7be9e8b078607b816d1a4a
Author: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu Nov 20 20:14:53 2008 -0800
netdev: add more functions to netdevice ops
shows that it always behaved that way.
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 8:59 ` David Miller
@ 2009-03-05 9:08 ` Patrick McHardy
2009-03-05 9:09 ` Patrick McHardy
0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 9:08 UTC (permalink / raw)
To: David Miller
Cc: blaschka, bart, greearb, shemminger, dada1, frank.blaschka,
netdev, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 791 bytes --]
David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Thu, 05 Mar 2009 00:56:46 -0800 (PST)
>
>
>> Then bond_neigh_setup() has the same bug, doesn't it?
>>
Indeed. But this seems to be the last one.
>
> Looking at the bond_main.c changes in:
>
> commit 008298231abbeb91bc7be9e8b078607b816d1a4a
> Author: Stephen Hemminger <shemminger@vyatta.com>
> Date: Thu Nov 20 20:14:53 2008 -0800
>
> netdev: add more functions to netdevice ops
>
> shows that it always behaved that way.
>
Yes, but that patch introduced the requirement to pass the correct
device down since now the handlers need it to get to the ops of the
underlying device. Previously they all relied on the handlers not
using their private data.
Signed-off-by: Patrick McHardy <kaber@trash.net>
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 516 bytes --]
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9fb3883..383ce48 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4113,7 +4113,7 @@ static int bond_neigh_setup(struct net_device *dev, struct neigh_parms *parms)
 		const struct net_device_ops *slave_ops
 			= slave->dev->netdev_ops;
 		if (slave_ops->ndo_neigh_setup)
-			return slave_ops->ndo_neigh_setup(dev, parms);
+			return slave_ops->ndo_neigh_setup(slave, parms);
 	}
 	return 0;
 }
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 9:08 ` Patrick McHardy
@ 2009-03-05 9:09 ` Patrick McHardy
2009-03-05 9:58 ` David Miller
0 siblings, 1 reply; 29+ messages in thread
From: Patrick McHardy @ 2009-03-05 9:09 UTC (permalink / raw)
To: David Miller
Cc: blaschka, bart, greearb, shemminger, dada1, frank.blaschka,
netdev, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 352 bytes --]
Patrick McHardy wrote:
> Yes, but that patch introduced the requirement to pass the correct
> device down since now the handlers need it to get to the ops of the
> underlying device. Previously they all relied on the handlers not
> using their private data.
>
> Signed-off-by: Patrick McHardy <kaber@trash.net>
>
Oops, the last patch was broken.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 521 bytes --]
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9fb3883..e0578fe 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4113,7 +4113,7 @@ static int bond_neigh_setup(struct net_device *dev, struct neigh_parms *parms)
 		const struct net_device_ops *slave_ops
 			= slave->dev->netdev_ops;
 		if (slave_ops->ndo_neigh_setup)
-			return slave_ops->ndo_neigh_setup(dev, parms);
+			return slave_ops->ndo_neigh_setup(slave->dev, parms);
 	}
 	return 0;
 }
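The point about passing the correct device down can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the structures are stripped-down stand-ins, and stacked_delegate(), run_stack(), the *_ops tables, and MAX_DEPTH are hypothetical names invented for the illustration. A stacked device looks up the ops of its lower device and calls the handler there. If it hands down its own device instead of the lower one, a handler that needs its argument to find the next device down keeps resolving the same lower device.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8	/* guard so the buggy variant terminates in this model */

struct net_device;

struct neigh_parms {
	int depth;	/* current nesting of handler calls */
	int configured;	/* set by the bottom ("physical") handler */
};

struct net_device_ops {
	int (*ndo_neigh_setup)(struct net_device *dev, struct neigh_parms *parms);
};

struct net_device {
	const struct net_device_ops *netdev_ops;
	struct net_device *lower;	/* NULL for a physical device */
};

/* Handler of the physical device: it never looks at dev. */
static int phys_neigh_setup(struct net_device *dev, struct neigh_parms *parms)
{
	(void)dev;
	parms->configured = 1;
	return 0;
}

/* Shared body of the stacked handler; 'arg' is the device handed down. */
static int stacked_delegate(struct net_device *dev, struct net_device *arg,
			    struct neigh_parms *parms)
{
	const struct net_device_ops *ops = dev->lower->netdev_ops;
	int ret = 0;

	if (++parms->depth > MAX_DEPTH)
		return -1;	/* stands in for runaway recursion */
	if (ops->ndo_neigh_setup)
		ret = ops->ndo_neigh_setup(arg, parms);
	parms->depth--;
	return ret;
}

/* Fixed variant: the lower device's handler gets the lower device. */
static int stacked_neigh_setup_fixed(struct net_device *dev, struct neigh_parms *parms)
{
	return stacked_delegate(dev, dev->lower, parms);
}

/* Buggy variant: the lower device's handler gets our own device. */
static int stacked_neigh_setup_buggy(struct net_device *dev, struct neigh_parms *parms)
{
	return stacked_delegate(dev, dev, parms);
}

const struct net_device_ops phys_ops = { .ndo_neigh_setup = phys_neigh_setup };
const struct net_device_ops fixed_ops = { .ndo_neigh_setup = stacked_neigh_setup_fixed };
const struct net_device_ops buggy_ops = { .ndo_neigh_setup = stacked_neigh_setup_buggy };

/* Build phys <- top, or phys <- mid <- top when nested, and run setup on top. */
int run_stack(const struct net_device_ops *stacked_ops, int nested,
	      struct neigh_parms *parms)
{
	struct net_device phys = { &phys_ops, NULL };
	struct net_device mid = { stacked_ops, &phys };
	struct net_device top = { stacked_ops, nested ? &mid : &phys };

	parms->depth = 0;
	parms->configured = 0;
	return top.netdev_ops->ndo_neigh_setup(&top, parms);
}
```

In this model, run_stack(&buggy_ops, 0, &p) still succeeds, because the physical handler ignores its dev argument. Only the nested case, run_stack(&buggy_ops, 1, &p), hits the depth guard, while run_stack(&fixed_ops, 1, &p) reaches the physical handler.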
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 9:09 ` Patrick McHardy
@ 2009-03-05 9:58 ` David Miller
0 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2009-03-05 9:58 UTC (permalink / raw)
To: kaber
Cc: blaschka, bart, greearb, shemminger, dada1, frank.blaschka,
netdev, linux-kernel
From: Patrick McHardy <kaber@trash.net>
Date: Thu, 05 Mar 2009 10:09:18 +0100
> Patrick McHardy wrote:
> > Yes, but that patch introduced the requirement to pass the correct
> > device down since now the handlers need it to get to the ops of the
> > underlying device. Previously they all relied on the handlers not
> > using their private data.
Aha, that's right.
> > Signed-off-by: Patrick McHardy <kaber@trash.net>
> >
>
> Oops, the last patch was broken.
Applied, thanks!
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 3:53 ` David Miller
2009-03-05 4:54 ` Bart Trojanowski
2009-03-05 5:51 ` Patrick McHardy
@ 2009-03-05 12:30 ` Maxime Bizon
2009-03-05 12:55 ` David Miller
2 siblings, 1 reply; 29+ messages in thread
From: Maxime Bizon @ 2009-03-05 12:30 UTC (permalink / raw)
To: David Miller
Cc: kaber, bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
On Wed, 2009-03-04 at 19:53 -0800, David Miller wrote:
Hi David,
> +#ifdef CONFIG_COMPAT_NET_DEV_OPS
> +	const struct net_device_ops *ops = dev->netdev_ops;
> +
> +	dev->init = ops->ndo_init;
> +	dev->uninit = ops->ndo_uninit;
> +	dev->open = ops->ndo_open;
> +	dev->change_rx_flags = ops->ndo_change_rx_flags;
> +	dev->set_rx_mode = ops->ndo_set_rx_mode;
> +	dev->set_multicast_list = ops->ndo_set_multicast_list;
> +	dev->set_mac_address = ops->ndo_set_mac_address;
> +	dev->validate_addr = ops->ndo_validate_addr;
> +	dev->do_ioctl = ops->ndo_do_ioctl;
> +	dev->set_config = ops->ndo_set_config;
> +	dev->change_mtu = ops->ndo_change_mtu;
> +	dev->tx_timeout = ops->ndo_tx_timeout;
> +	dev->get_stats = ops->ndo_get_stats;
> +	dev->vlan_rx_register = ops->ndo_vlan_rx_register;
> +	dev->vlan_rx_add_vid = ops->ndo_vlan_rx_add_vid;
> +	dev->vlan_rx_kill_vid = ops->ndo_vlan_rx_kill_vid;
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +	dev->poll_controller = ops->ndo_poll_controller;
> +#endif
> +#endif
> +}
> +EXPORT_SYMBOL(netdev_resync_ops);
Any reason dev->stop is not in this list?
Regards,
--
Maxime
* Re: [BUG] 2.6.29-rc* QinQ vlan trunking regression
2009-03-05 12:30 ` Maxime Bizon
@ 2009-03-05 12:55 ` David Miller
0 siblings, 0 replies; 29+ messages in thread
From: David Miller @ 2009-03-05 12:55 UTC (permalink / raw)
To: mbizon
Cc: kaber, bart, greearb, shemminger, dada1, frank.blaschka, netdev,
linux-kernel
From: Maxime Bizon <mbizon@freebox.fr>
Date: Thu, 05 Mar 2009 13:30:26 +0100
> Any reason dev->stop is not in this list?
dev->stop is only called from net/core/dev.c, and that code is
already netdev_ops aware :-)
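A rough userspace sketch of that distinction, with simplified stand-in structures rather than the real kernel definitions: resync_ops(), core_stop(), legacy_open(), and the demo_* names are all hypothetical. A caller that goes through dev->netdev_ops needs no copied pointer, while a caller that still reads a per-device legacy field only works once the field has been resynced from the ops table.

```c
#include <assert.h>
#include <stddef.h>

struct net_device;

struct net_device_ops {
	int (*ndo_open)(struct net_device *dev);
	int (*ndo_stop)(struct net_device *dev);
};

struct net_device {
	const struct net_device_ops *netdev_ops;
	/* Legacy per-device pointer, a stand-in for the compat fields. */
	int (*open)(struct net_device *dev);
	int opened;
};

static int demo_open(struct net_device *dev)
{
	dev->opened = 1;
	return 0;
}

static int demo_stop(struct net_device *dev)
{
	dev->opened = 0;
	return 0;
}

const struct net_device_ops demo_ops = {
	.ndo_open = demo_open,
	.ndo_stop = demo_stop,
};

/* Copy ops into the legacy field; there is no legacy stop field to fill. */
void resync_ops(struct net_device *dev)
{
	dev->open = dev->netdev_ops->ndo_open;
}

/* ops-aware caller: reads the ops table directly, so no copy is needed. */
int core_stop(struct net_device *dev)
{
	const struct net_device_ops *ops = dev->netdev_ops;

	return ops->ndo_stop ? ops->ndo_stop(dev) : 0;
}

/* Legacy-style caller: reads the per-device pointer, so it needs the resync. */
int legacy_open(struct net_device *dev)
{
	return dev->open ? dev->open(dev) : -1;
}
```

Here legacy_open() fails until resync_ops() has run, whereas core_stop() works either way because it never touches a legacy field.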
end of thread
Thread overview: 29+ messages
2009-02-28 18:05 [BUG] 2.6.29-rc* QinQ vlan trunking regression Bart Trojanowski
2009-03-04 7:43 ` David Miller
2009-03-04 9:57 ` Patrick McHardy
2009-03-04 10:59 ` David Miller
2009-03-04 11:45 ` Patrick McHardy
2009-03-05 3:53 ` David Miller
2009-03-05 4:54 ` Bart Trojanowski
2009-03-05 4:59 ` Bart Trojanowski
2009-03-05 5:51 ` Patrick McHardy
2009-03-05 5:21 ` David Miller
2009-03-05 5:51 ` Patrick McHardy
2009-03-05 6:57 ` David Miller
2009-03-05 7:00 ` David Miller
2009-03-05 7:05 ` Patrick McHardy
2009-03-05 7:11 ` David Miller
2009-03-05 7:12 ` Patrick McHardy
2009-03-05 7:19 ` David Miller
2009-03-05 7:26 ` Patrick McHardy
2009-03-05 7:31 ` Patrick McHardy
2009-03-05 7:45 ` David Miller
2009-03-05 8:05 ` Frank Blaschka
2009-03-05 8:27 ` Patrick McHardy
2009-03-05 8:56 ` David Miller
2009-03-05 8:59 ` David Miller
2009-03-05 9:08 ` Patrick McHardy
2009-03-05 9:09 ` Patrick McHardy
2009-03-05 9:58 ` David Miller
2009-03-05 12:30 ` Maxime Bizon
2009-03-05 12:55 ` David Miller