* kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces @ 2009-10-18 21:02 Denys Fedoryschenko 2009-10-19 3:34 ` Michal Ostrowski 0 siblings, 1 reply; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-18 21:02 UTC (permalink / raw) To: netdev, linux-ppp, paulus, mostrows I have server running as pppoe NAS. Tried to rename customers without dropping pppd connections first, got panic after few seconds. Panic triggerable at 2.6.30.4 and 2.6.31.4 pppoe users running on eth2 pppoe flags: 1457 root /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R 172.16.1.2 -N 253 -C gpzone -S gpzone Commands sequence that i think triggered that: ip link set eth0 down ip link set eth1 down ip link set eth2 down nameif etherx 00:16:76:8D:83:BA nameif eth0 00:19:e0:72:4a:37 nameif eth1 00:19:e0:72:4a:4b ip addr flush dev eth0 ip addr flush dev eth1 ip addr add X.X.X.X/29 dev eth0 ip addr add 192.168.2.177/24 dev eth0 ip addr add 192.168.0.1/32 dev eth1 ip addr add 127.0.0.0/8 dev lo #ip link set eth0 up ip link set eth0 up ip link set eth1 up ip link set lo up ip route add 0.0.0.0/0 via X.X.X.X [ 103.428591] r8169: eth0: link up [ 103.430360] r8169: eth1: link up [ 113.361528] BUG: unable to handle kernel NULL pointer dereference at 0000018f [ 113.361717] IP: [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe] [ 113.361853] *pdpt = 000000003411a001 *pde = 0000000000000000 Oct 18 23:59:40 194.146.153.93 [ 113.362012] Oops: 0000 [#1] SMP Oct 18 23:59:40 194.146.153.93 [ 113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev [ 113.362246] Modules linked in: netconsole configfs act_skbedit sch_ingress sch_prio cls_flow cls_u32 em_meta cls_basic xt_dscp xt_DSCP ipt_REJECT ts_bm xt_string xt_hl ifb cls_fw sch_tbf sch_htb act_ipt act_mirred xt_MARK pppoe pppox ppp_generic slhc xt_TCPMSS xt_mark xt_tcpudp iptable_mangle iptable_nat nf_nat rtc_cmos nf_conntrack_ipv4 rtc_core nf_conntrack rtc_lib nf_defrag_ipv4 iptable_filter ip_tables x_tables 8021q garp stp llc loop sata_sil pata_atiixp pata_acpi ata_generic libata 8139cp usb_storage mtdblock mtd_blkdevs mtd sr_mod cdrom tulip r8169 sky2 via_velocity via_rhine sis900 ne2k_pci 8390 skge tg3 libphy 8139too e1000 e100 usbhid ohci_hcd uhci_hcd ehci_hcd usbcore nls_base Oct 18 23:59:40 194.146.153.93 [ 113.362344] [ 113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7) [ 113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0 [ 113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe] [ 113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680 [ 113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c [ 113.362344] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0 task.ti=f4145000) [ 113.362344] Stack: [ 113.362344] f4fbe220 f4fbe000 f6cea5a0 f886a430 fffffff5 00000000 f4145e54 c01422b3 Oct 18 23:59:40 194.146.153.93 [ 113.362344] <0> f4fbe000 00000009 f8a457d8 f4fbe000 f8850190 00001091 f4145e64 c0142361 Oct 18 23:59:40 194.146.153.93 [ 113.362344] <0> ffffffff 00000000 f4145e74 c029ffbf f4fbe000 000010d0 f4145e90 c029fa70 Oct 18 23:59:40 194.146.153.93 [ 113.362344] Call Trace: [ 113.362344] [<c01422b3>] ? notifier_call_chain+0x2b/0x4a [ 113.362344] [<c0142361>] ? raw_notifier_call_chain+0xc/0xe [ 113.362344] [<c029ffbf>] ? dev_close+0x4c/0x8c [ 113.362344] [<c029fa70>] ? dev_change_flags+0xa5/0x158 [ 113.362344] [<c02da633>] ? devinet_ioctl+0x21a/0x503 [ 113.362344] [<c02db693>] ? inet_ioctl+0x8d/0xa6 [ 113.362344] [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec [ 113.362344] [<c0292959>] ? sock_ioctl+0x0/0x1ec [ 113.362344] [<c019af2b>] ? vfs_ioctl+0x22/0x69 [ 113.362344] [<c019b435>] ? do_vfs_ioctl+0x41f/0x459 [ 113.362344] [<c02934eb>] ? sys_send+0x18/0x1a [ 113.362344] [<c011942f>] ? do_page_fault+0x242/0x26f [ 113.362344] [<c019b49b>] ? sys_ioctl+0x2c/0x45 [ 113.362344] [<c0102975>] ? syscall_call+0x7/0xb [ 113.362344] Code: c9 00 00 00 89 c7 31 f6 83 c7 40 89 f8 e8 cc 60 a9 c7 8b 45 ec 05 20 02 00 00 89 45 e8 8b 4d f0 8b 1c b1 e9 8c 00 00 00 8b 45 ec Oct 18 23:59:40 194.146.153.93 83 90 01 00 00 74 08 8b 9b 8c 01 00 00 eb 79 b8 c0 a6 86 f8 Oct 18 23:59:40 194.146.153.93 [ 113.362344] EIP: [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe] SS:ESP 0068:f4145e1c [ 113.362344] CR2: 000000000000018f [ 113.373124] ---[ end trace f6fe64a307e97f3b ]--- [ 113.373203] Kernel panic - not syncing: Fatal exception in interrupt [ 113.373286] Pid: 2858, comm: pppd Tainted: G D 2.6.31.4-build-0047 #7 [ 113.373379] Call Trace: [ 113.373479] [<c02fc496>] ? printk+0xf/0x11 [ 113.373561] [<c02fc3e7>] panic+0x39/0xd9 [ 113.373656] [<c01059b7>] oops_end+0x8b/0x9a [ 113.373727] [<c0118f6d>] no_context+0x13d/0x147 [ 113.373800] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b [ 113.373881] [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a [ 113.373959] [<c0121801>] ? __wake_up_sync_key+0x3b/0x45 [ 113.374030] [<c0131967>] ? irq_exit+0x39/0x5c [ 113.374107] [<c0104393>] ? do_IRQ+0x80/0x96 [ 113.374183] [<c0102f49>] ? common_interrupt+0x29/0x30 [ 113.374259] [<c011909f>] bad_area_nosemaphore+0xd/0x10 [ 113.374348] [<c0119301>] do_page_fault+0x114/0x26f [ 113.374526] [<c01191ed>] ? do_page_fault+0x0/0x26f [ 113.374605] [<c02fe506>] error_code+0x66/0x6c [ 113.374683] [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e [ 113.374764] [<c01191ed>] ? do_page_fault+0x0/0x26f [ 113.374850] [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe] [ 113.374928] [<c01422b3>] notifier_call_chain+0x2b/0x4a [ 113.375012] [<c0142361>] raw_notifier_call_chain+0xc/0xe [ 113.375097] [<c029ffbf>] dev_close+0x4c/0x8c [ 113.375169] [<c029fa70>] dev_change_flags+0xa5/0x158 [ 113.375239] [<c02da633>] devinet_ioctl+0x21a/0x503 [ 113.375318] [<c02db693>] inet_ioctl+0x8d/0xa6 [ 113.375411] [<c0292b21>] sock_ioctl+0x1c8/0x1ec [ 113.375491] [<c0292959>] ? sock_ioctl+0x0/0x1ec [ 113.375574] [<c019af2b>] vfs_ioctl+0x22/0x69 [ 113.375653] [<c019b435>] do_vfs_ioctl+0x41f/0x459 [ 113.375734] [<c02934eb>] ? sys_send+0x18/0x1a [ 113.375813] [<c011942f>] ? do_page_fault+0x242/0x26f [ 113.375884] [<c019b49b>] sys_ioctl+0x2c/0x45 [ 113.375960] [<c0102975>] syscall_call+0x7/0xb [ 113.376041] Rebooting in 5 seconds.. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-18 21:02 kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces Denys Fedoryschenko @ 2009-10-19 3:34 ` Michal Ostrowski 2009-10-19 11:36 ` Denys Fedoryschenko ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Michal Ostrowski @ 2009-10-19 3:34 UTC (permalink / raw) To: Denys Fedoryschenko; +Cc: netdev, linux-ppp, paulus, mostrows Here's my theory on this after an inital look... Looking at the oops report and disassembly of the actual module binary that caused the oops, one can deduce that: Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct pppox_sock *po", which is what we faulted on, excuting "cmp %eax, 0x190(%ebx)". %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f"). At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value of 0xffffffff in pn->hash_table[i]. From this I'd hypothesize that the combination of dev_put() and release_sock() may have allowed us to free "pn". At the bottom of the loop we alreayd recognize that since locks are dropped we're responsible for handling invalidation of objects, and perhaps that should be extended to "pn" as well. -- Michal Ostrowski mostrows@gmail.com --- drivers/net/pppoe.c | 86 ++++++++++++++++++++++++++---- -------------------- 1 files changed, 45 insertions(+), 41 deletions(-) diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 7cbf6f9..720c4ea 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev) BUG_ON(dev == NULL); +restart: pn = pppoe_pernet(dev_net(dev)); if (!pn) /* already freed */ return; @@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev) write_lock_bh(&pn->hash_lock); for (i = 0; i < PPPOE_HASH_SIZE; i++) { struct pppox_sock *po = pn->hash_table[i]; + struct sock *sk; - while (po != NULL) { - struct sock *sk; - if (po->pppoe_dev != dev) { - po = po->next; - continue; - } - sk = sk_pppox(po); - spin_lock(&flush_lock); - po->pppoe_dev = NULL; - spin_unlock(&flush_lock); - dev_put(dev); - - /* We always grab the socket lock, followed by the - * hash_lock, in that order. Since we should - * hold the sock lock while doing any unbinding, - * we need to release the lock we're holding. - * Hold a reference to the sock so it doesn't disappear - * as we're jumping between locks. - */ + while (po && po->pppoe_dev != dev) { + po = po->next; + } - sock_hold(sk); + if (po == NULL) { + continue; + } - write_unlock_bh(&pn->hash_lock); - lock_sock(sk); + sk = sk_pppox(po); - if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { - pppox_unbind_sock(sk); - sk->sk_state = PPPOX_ZOMBIE; - sk->sk_state_change(sk); - } + spin_lock(&flush_lock); + po->pppoe_dev = NULL; + spin_unlock(&flush_lock); - release_sock(sk); - sock_put(sk); + dev_put(dev); - /* Restart scan at the beginning of this hash chain. - * While the lock was dropped the chain contents may - * have changed. - */ - write_lock_bh(&pn->hash_lock); - po = pn->hash_table[i]; - } + /* We always grab the socket lock, followed by the + * hash_lock, in that order. Since we should + * hold the sock lock while doing any unbinding, + * we need to release the lock we're holding. + * Hold a reference to the sock so it doesn't disappear + * as we're jumping between locks. + */ + + sock_hold(sk); + + write_unlock_bh(&pn->hash_lock); + lock_sock(sk); + + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { + pppox_unbind_sock(sk); + sk->sk_state = PPPOX_ZOMBIE; + sk->sk_state_change(sk); + } + + release_sock(sk); + sock_put(sk); + + /* Restart the flush process from the beginning. While + * the lock was dropped the chain contents may have + * changed, and sock_put may have made things go away. + */ + goto restart; } write_unlock_bh(&pn->hash_lock); } -- 1.6.3.3 On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@visp.net.lb> wrote: > > I have server running as pppoe NAS. > Tried to rename customers without dropping pppd connections first, got panic > after few seconds. > Panic triggerable at 2.6.30.4 and 2.6.31.4 > pppoe users running on eth2 > pppoe flags: > 1457 root /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R > 172.16.1.2 -N 253 -C gpzone -S gpzone > > > Commands sequence that i think triggered that: > > ip link set eth0 down > ip link set eth1 down > ip link set eth2 down > nameif etherx 00:16:76:8D:83:BA > nameif eth0 00:19:e0:72:4a:37 > nameif eth1 00:19:e0:72:4a:4b > > ip addr flush dev eth0 > ip addr flush dev eth1 > ip addr add X.X.X.X/29 dev eth0 > ip addr add 192.168.2.177/24 dev eth0 > ip addr add 192.168.0.1/32 dev eth1 > ip addr add 127.0.0.0/8 dev lo > #ip link set eth0 up > ip link set eth0 up > ip link set eth1 up > ip link set lo up > ip route add 0.0.0.0/0 via X.X.X.X > > > [ 103.428591] r8169: eth0: link up > [ 103.430360] r8169: eth1: link up > [ 113.361528] BUG: unable to handle kernel > NULL pointer dereference > at 0000018f > [ 113.361717] IP: > [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe] > [ 113.361853] *pdpt = 000000003411a001 > *pde = 0000000000000000 > Oct 18 23:59:40 194.146.153.93 > [ 113.362012] Oops: 0000 [#1] > SMP > Oct 18 23:59:40 194.146.153.93 > [ 113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev > [ 113.362246] Modules linked in: > netconsole > configfs > act_skbedit > sch_ingress > sch_prio > cls_flow > cls_u32 > em_meta > cls_basic > xt_dscp > xt_DSCP > ipt_REJECT > ts_bm > xt_string > xt_hl > ifb > cls_fw > sch_tbf > sch_htb > act_ipt > act_mirred > xt_MARK > pppoe > pppox > ppp_generic > slhc > xt_TCPMSS > xt_mark > xt_tcpudp > iptable_mangle > iptable_nat > nf_nat > rtc_cmos > nf_conntrack_ipv4 > rtc_core > nf_conntrack > rtc_lib > nf_defrag_ipv4 > iptable_filter > ip_tables > x_tables > 8021q > garp > stp > llc > loop > sata_sil > pata_atiixp > pata_acpi > ata_generic > libata > 8139cp > usb_storage > mtdblock > mtd_blkdevs > mtd > sr_mod > cdrom > tulip > r8169 > sky2 > via_velocity > via_rhine > sis900 > ne2k_pci > 8390 > skge > tg3 > libphy > 8139too > e1000 > e100 > usbhid > ohci_hcd > uhci_hcd > ehci_hcd > usbcore > nls_base > Oct 18 23:59:40 194.146.153.93 > [ 113.362344] > [ 113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7) > [ 113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0 > [ 113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe] > [ 113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680 > [ 113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c > [ 113.362344] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > [ 113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0 > task.ti=f4145000) > [ 113.362344] Stack: > [ 113.362344] f4fbe220 > f4fbe000 > f6cea5a0 > f886a430 > fffffff5 > 00000000 > f4145e54 > c01422b3 > Oct 18 23:59:40 194.146.153.93 > [ 113.362344] <0> > f4fbe000 > 00000009 > f8a457d8 > f4fbe000 > f8850190 > 00001091 > f4145e64 > c0142361 > Oct 18 23:59:40 194.146.153.93 > [ 113.362344] <0> > ffffffff > 00000000 > f4145e74 > c029ffbf > f4fbe000 > 000010d0 > f4145e90 > c029fa70 > Oct 18 23:59:40 194.146.153.93 > [ 113.362344] Call Trace: > [ 113.362344] [<c01422b3>] ? notifier_call_chain+0x2b/0x4a > [ 113.362344] [<c0142361>] ? raw_notifier_call_chain+0xc/0xe > [ 113.362344] [<c029ffbf>] ? dev_close+0x4c/0x8c > [ 113.362344] [<c029fa70>] ? dev_change_flags+0xa5/0x158 > [ 113.362344] [<c02da633>] ? devinet_ioctl+0x21a/0x503 > [ 113.362344] [<c02db693>] ? inet_ioctl+0x8d/0xa6 > [ 113.362344] [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec > [ 113.362344] [<c0292959>] ? sock_ioctl+0x0/0x1ec > [ 113.362344] [<c019af2b>] ? vfs_ioctl+0x22/0x69 > [ 113.362344] [<c019b435>] ? do_vfs_ioctl+0x41f/0x459 > [ 113.362344] [<c02934eb>] ? sys_send+0x18/0x1a > [ 113.362344] [<c011942f>] ? do_page_fault+0x242/0x26f > [ 113.362344] [<c019b49b>] ? sys_ioctl+0x2c/0x45 > [ 113.362344] [<c0102975>] ? syscall_call+0x7/0xb > [ 113.362344] Code: > c9 > 00 > 00 > 00 > 89 > c7 > 31 > f6 > 83 > c7 > 40 > 89 > f8 > e8 > cc > 60 > a9 > c7 > 8b > 45 > ec > 05 > 20 > 02 > 00 > 00 > 89 > 45 > e8 > 8b > 4d > f0 > 8b > 1c > b1 > e9 > 8c > 00 > 00 > 00 > 8b > 45 > ec > Oct 18 23:59:40 194.146.153.93 > 83 > 90 > 01 > 00 > 00 > 74 > 08 > 8b > 9b > 8c > 01 > 00 > 00 > eb > 79 > b8 > c0 > a6 > 86 > f8 > Oct 18 23:59:40 194.146.153.93 > [ 113.362344] EIP: [<f8868269>] > pppoe_device_event+0x80/0x12c [pppoe] > SS:ESP 0068:f4145e1c > [ 113.362344] CR2: 000000000000018f > [ 113.373124] ---[ end trace f6fe64a307e97f3b ]--- > [ 113.373203] Kernel panic - not syncing: Fatal exception in interrupt > [ 113.373286] Pid: 2858, comm: pppd Tainted: G D 2.6.31.4-build-0047 > #7 > [ 113.373379] Call Trace: > [ 113.373479] [<c02fc496>] ? printk+0xf/0x11 > [ 113.373561] [<c02fc3e7>] panic+0x39/0xd9 > [ 113.373656] [<c01059b7>] oops_end+0x8b/0x9a > [ 113.373727] [<c0118f6d>] no_context+0x13d/0x147 > [ 113.373800] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b > [ 113.373881] [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a > [ 113.373959] [<c0121801>] ? __wake_up_sync_key+0x3b/0x45 > [ 113.374030] [<c0131967>] ? irq_exit+0x39/0x5c > [ 113.374107] [<c0104393>] ? do_IRQ+0x80/0x96 > [ 113.374183] [<c0102f49>] ? common_interrupt+0x29/0x30 > [ 113.374259] [<c011909f>] bad_area_nosemaphore+0xd/0x10 > [ 113.374348] [<c0119301>] do_page_fault+0x114/0x26f > [ 113.374526] [<c01191ed>] ? do_page_fault+0x0/0x26f > [ 113.374605] [<c02fe506>] error_code+0x66/0x6c > [ 113.374683] [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e > [ 113.374764] [<c01191ed>] ? do_page_fault+0x0/0x26f > [ 113.374850] [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe] > [ 113.374928] [<c01422b3>] notifier_call_chain+0x2b/0x4a > [ 113.375012] [<c0142361>] raw_notifier_call_chain+0xc/0xe > [ 113.375097] [<c029ffbf>] dev_close+0x4c/0x8c > [ 113.375169] [<c029fa70>] dev_change_flags+0xa5/0x158 > [ 113.375239] [<c02da633>] devinet_ioctl+0x21a/0x503 > [ 113.375318] [<c02db693>] inet_ioctl+0x8d/0xa6 > [ 113.375411] [<c0292b21>] sock_ioctl+0x1c8/0x1ec > [ 113.375491] [<c0292959>] ? sock_ioctl+0x0/0x1ec > [ 113.375574] [<c019af2b>] vfs_ioctl+0x22/0x69 > [ 113.375653] [<c019b435>] do_vfs_ioctl+0x41f/0x459 > [ 113.375734] [<c02934eb>] ? sys_send+0x18/0x1a > [ 113.375813] [<c011942f>] ? do_page_fault+0x242/0x26f > [ 113.375884] [<c019b49b>] sys_ioctl+0x2c/0x45 > [ 113.375960] [<c0102975>] syscall_call+0x7/0xb > [ 113.376041] Rebooting in 5 seconds.. ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 3:34 ` Michal Ostrowski @ 2009-10-19 11:36 ` Denys Fedoryschenko 2009-10-19 12:01 ` Denys Fedoryschenko 2009-10-19 12:36 ` Eric Dumazet 2 siblings, 0 replies; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-19 11:36 UTC (permalink / raw) To: Michal Ostrowski; +Cc: netdev, linux-ppp, paulus, mostrows Can you send me patch as attachment please? On Monday 19 October 2009 06:34:06 Michal Ostrowski wrote: > Here's my theory on this after an inital look... > > Looking at the oops report and disassembly of the actual module binary > that caused the oops, one can deduce that: > > Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct > pppox_sock *po", which is what we faulted on, excuting "cmp %eax, > 0x190(%ebx)". %ebx value was 0xffffffff (hence we got "NULL pointer > dereference at 0x18f"). > > At this point "i" (stored in %esi) is 15 (valid), meaning that we got a > value of 0xffffffff in pn->hash_table[i]. > > From this I'd hypothesize that the combination of dev_put() and > release_sock() may have allowed us to free "pn". At the bottom of the loop > we alreayd recognize that since locks are dropped we're responsible for > handling invalidation of objects, and perhaps that should be extended to > "pn" as well. -- > Michal Ostrowski > mostrows@gmail.com > > > --- > drivers/net/pppoe.c | 86 ++++++++++++++++++++++++++---- > -------------------- > 1 files changed, 45 insertions(+), 41 deletions(-) > > diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c > index 7cbf6f9..720c4ea 100644 > --- a/drivers/net/pppoe.c > +++ b/drivers/net/pppoe.c > @@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev) > > BUG_ON(dev == NULL); > > +restart: > pn = pppoe_pernet(dev_net(dev)); > if (!pn) /* already freed */ > return; > @@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev) > write_lock_bh(&pn->hash_lock); > for (i = 0; i < PPPOE_HASH_SIZE; i++) { > struct pppox_sock *po = pn->hash_table[i]; > + struct sock *sk; > > - while (po != NULL) { > - struct sock *sk; > - if (po->pppoe_dev != dev) { > - po = po->next; > - continue; > - } > - sk = sk_pppox(po); > - spin_lock(&flush_lock); > - po->pppoe_dev = NULL; > - spin_unlock(&flush_lock); > - dev_put(dev); > - > - /* We always grab the socket lock, followed by the > - * hash_lock, in that order. Since we should > - * hold the sock lock while doing any unbinding, > - * we need to release the lock we're holding. > - * Hold a reference to the sock so it doesn't > disappear - * as we're jumping between locks. > - */ > + while (po && po->pppoe_dev != dev) { > + po = po->next; > + } > > - sock_hold(sk); > + if (po == NULL) { > + continue; > + } > > - write_unlock_bh(&pn->hash_lock); > - lock_sock(sk); > + sk = sk_pppox(po); > > - if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) > { - pppox_unbind_sock(sk); > - sk->sk_state = PPPOX_ZOMBIE; > - sk->sk_state_change(sk); > - } > + spin_lock(&flush_lock); > + po->pppoe_dev = NULL; > + spin_unlock(&flush_lock); > > - release_sock(sk); > - sock_put(sk); > + dev_put(dev); > > - /* Restart scan at the beginning of this hash > chain. - * While the lock was dropped the chain > contents may - * have changed. > - */ > - write_lock_bh(&pn->hash_lock); > - po = pn->hash_table[i]; > - } > + /* We always grab the socket lock, followed by the > + * hash_lock, in that order. Since we should > + * hold the sock lock while doing any unbinding, > + * we need to release the lock we're holding. > + * Hold a reference to the sock so it doesn't disappear > + * as we're jumping between locks. > + */ > + > + sock_hold(sk); > + > + write_unlock_bh(&pn->hash_lock); > + lock_sock(sk); > + > + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { > + pppox_unbind_sock(sk); > + sk->sk_state = PPPOX_ZOMBIE; > + sk->sk_state_change(sk); > + } > + > + release_sock(sk); > + sock_put(sk); > + > + /* Restart the flush process from the beginning. While > + * the lock was dropped the chain contents may have > + * changed, and sock_put may have made things go away. > + */ > + goto restart; > } > write_unlock_bh(&pn->hash_lock); > } > -- > 1.6.3.3 > > On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@visp.net.lb> wrote: > > I have server running as pppoe NAS. > > Tried to rename customers without dropping pppd connections first, got > > panic after few seconds. > > Panic triggerable at 2.6.30.4 and 2.6.31.4 > > pppoe users running on eth2 > > pppoe flags: > > 1457 root /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R > > 172.16.1.2 -N 253 -C gpzone -S gpzone > > > > > > Commands sequence that i think triggered that: > > > > ip link set eth0 down > > ip link set eth1 down > > ip link set eth2 down > > nameif etherx 00:16:76:8D:83:BA > > nameif eth0 00:19:e0:72:4a:37 > > nameif eth1 00:19:e0:72:4a:4b > > > > ip addr flush dev eth0 > > ip addr flush dev eth1 > > ip addr add X.X.X.X/29 dev eth0 > > ip addr add 192.168.2.177/24 dev eth0 > > ip addr add 192.168.0.1/32 dev eth1 > > ip addr add 127.0.0.0/8 dev lo > > #ip link set eth0 up > > ip link set eth0 up > > ip link set eth1 up > > ip link set lo up > > ip route add 0.0.0.0/0 via X.X.X.X > > > > > > [ 103.428591] r8169: eth0: link up > > [ 103.430360] r8169: eth1: link up > > [ 113.361528] BUG: unable to handle kernel > > NULL pointer dereference > > at 0000018f > > [ 113.361717] IP: > > [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe] > > [ 113.361853] *pdpt = 000000003411a001 > > *pde = 0000000000000000 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362012] Oops: 0000 [#1] > > SMP > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev > > [ 113.362246] Modules linked in: > > netconsole > > configfs > > act_skbedit > > sch_ingress > > sch_prio > > cls_flow > > cls_u32 > > em_meta > > cls_basic > > xt_dscp > > xt_DSCP > > ipt_REJECT > > ts_bm > > xt_string > > xt_hl > > ifb > > cls_fw > > sch_tbf > > sch_htb > > act_ipt > > act_mirred > > xt_MARK > > pppoe > > pppox > > ppp_generic > > slhc > > xt_TCPMSS > > xt_mark > > xt_tcpudp > > iptable_mangle > > iptable_nat > > nf_nat > > rtc_cmos > > nf_conntrack_ipv4 > > rtc_core > > nf_conntrack > > rtc_lib > > nf_defrag_ipv4 > > iptable_filter > > ip_tables > > x_tables > > 8021q > > garp > > stp > > llc > > loop > > sata_sil > > pata_atiixp > > pata_acpi > > ata_generic > > libata > > 8139cp > > usb_storage > > mtdblock > > mtd_blkdevs > > mtd > > sr_mod > > cdrom > > tulip > > r8169 > > sky2 > > via_velocity > > via_rhine > > sis900 > > ne2k_pci > > 8390 > > skge > > tg3 > > libphy > > 8139too > > e1000 > > e100 > > usbhid > > ohci_hcd > > uhci_hcd > > ehci_hcd > > usbcore > > nls_base > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] > > [ 113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7) > > [ 113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0 > > [ 113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe] > > [ 113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680 > > [ 113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c > > [ 113.362344] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > [ 113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0 > > task.ti=f4145000) > > [ 113.362344] Stack: > > [ 113.362344] f4fbe220 > > f4fbe000 > > f6cea5a0 > > f886a430 > > fffffff5 > > 00000000 > > f4145e54 > > c01422b3 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] <0> > > f4fbe000 > > 00000009 > > f8a457d8 > > f4fbe000 > > f8850190 > > 00001091 > > f4145e64 > > c0142361 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] <0> > > ffffffff > > 00000000 > > f4145e74 > > c029ffbf > > f4fbe000 > > 000010d0 > > f4145e90 > > c029fa70 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] Call Trace: > > [ 113.362344] [<c01422b3>] ? notifier_call_chain+0x2b/0x4a > > [ 113.362344] [<c0142361>] ? raw_notifier_call_chain+0xc/0xe > > [ 113.362344] [<c029ffbf>] ? dev_close+0x4c/0x8c > > [ 113.362344] [<c029fa70>] ? dev_change_flags+0xa5/0x158 > > [ 113.362344] [<c02da633>] ? devinet_ioctl+0x21a/0x503 > > [ 113.362344] [<c02db693>] ? inet_ioctl+0x8d/0xa6 > > [ 113.362344] [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec > > [ 113.362344] [<c0292959>] ? sock_ioctl+0x0/0x1ec > > [ 113.362344] [<c019af2b>] ? vfs_ioctl+0x22/0x69 > > [ 113.362344] [<c019b435>] ? do_vfs_ioctl+0x41f/0x459 > > [ 113.362344] [<c02934eb>] ? sys_send+0x18/0x1a > > [ 113.362344] [<c011942f>] ? do_page_fault+0x242/0x26f > > [ 113.362344] [<c019b49b>] ? sys_ioctl+0x2c/0x45 > > [ 113.362344] [<c0102975>] ? syscall_call+0x7/0xb > > [ 113.362344] Code: > > c9 > > 00 > > 00 > > 00 > > 89 > > c7 > > 31 > > f6 > > 83 > > c7 > > 40 > > 89 > > f8 > > e8 > > cc > > 60 > > a9 > > c7 > > 8b > > 45 > > ec > > 05 > > 20 > > 02 > > 00 > > 00 > > 89 > > 45 > > e8 > > 8b > > 4d > > f0 > > 8b > > 1c > > b1 > > e9 > > 8c > > 00 > > 00 > > 00 > > 8b > > 45 > > ec > > Oct 18 23:59:40 194.146.153.93 > > 83 > > 90 > > 01 > > 00 > > 00 > > 74 > > 08 > > 8b > > 9b > > 8c > > 01 > > 00 > > 00 > > eb > > 79 > > b8 > > c0 > > a6 > > 86 > > f8 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] EIP: [<f8868269>] > > pppoe_device_event+0x80/0x12c [pppoe] > > SS:ESP 0068:f4145e1c > > [ 113.362344] CR2: 000000000000018f > > [ 113.373124] ---[ end trace f6fe64a307e97f3b ]--- > > [ 113.373203] Kernel panic - not syncing: Fatal exception in interrupt > > [ 113.373286] Pid: 2858, comm: pppd Tainted: G D > > 2.6.31.4-build-0047 #7 > > [ 113.373379] Call Trace: > > [ 113.373479] [<c02fc496>] ? printk+0xf/0x11 > > [ 113.373561] [<c02fc3e7>] panic+0x39/0xd9 > > [ 113.373656] [<c01059b7>] oops_end+0x8b/0x9a > > [ 113.373727] [<c0118f6d>] no_context+0x13d/0x147 > > [ 113.373800] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b > > [ 113.373881] [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a > > [ 113.373959] [<c0121801>] ? __wake_up_sync_key+0x3b/0x45 > > [ 113.374030] [<c0131967>] ? irq_exit+0x39/0x5c > > [ 113.374107] [<c0104393>] ? do_IRQ+0x80/0x96 > > [ 113.374183] [<c0102f49>] ? common_interrupt+0x29/0x30 > > [ 113.374259] [<c011909f>] bad_area_nosemaphore+0xd/0x10 > > [ 113.374348] [<c0119301>] do_page_fault+0x114/0x26f > > [ 113.374526] [<c01191ed>] ? do_page_fault+0x0/0x26f > > [ 113.374605] [<c02fe506>] error_code+0x66/0x6c > > [ 113.374683] [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e > > [ 113.374764] [<c01191ed>] ? do_page_fault+0x0/0x26f > > [ 113.374850] [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe] > > [ 113.374928] [<c01422b3>] notifier_call_chain+0x2b/0x4a > > [ 113.375012] [<c0142361>] raw_notifier_call_chain+0xc/0xe > > [ 113.375097] [<c029ffbf>] dev_close+0x4c/0x8c > > [ 113.375169] [<c029fa70>] dev_change_flags+0xa5/0x158 > > [ 113.375239] [<c02da633>] devinet_ioctl+0x21a/0x503 > > [ 113.375318] [<c02db693>] inet_ioctl+0x8d/0xa6 > > [ 113.375411] [<c0292b21>] sock_ioctl+0x1c8/0x1ec > > [ 113.375491] [<c0292959>] ? sock_ioctl+0x0/0x1ec > > [ 113.375574] [<c019af2b>] vfs_ioctl+0x22/0x69 > > [ 113.375653] [<c019b435>] do_vfs_ioctl+0x41f/0x459 > > [ 113.375734] [<c02934eb>] ? sys_send+0x18/0x1a > > [ 113.375813] [<c011942f>] ? do_page_fault+0x242/0x26f > > [ 113.375884] [<c019b49b>] sys_ioctl+0x2c/0x45 > > [ 113.375960] [<c0102975>] syscall_call+0x7/0xb > > [ 113.376041] Rebooting in 5 seconds.. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 3:34 ` Michal Ostrowski 2009-10-19 11:36 ` Denys Fedoryschenko @ 2009-10-19 12:01 ` Denys Fedoryschenko 2009-10-19 12:36 ` Eric Dumazet 2 siblings, 0 replies; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-19 12:01 UTC (permalink / raw) To: Michal Ostrowski; +Cc: netdev, linux-ppp, paulus, mostrows Applied patch manually, still panic (maybe different now): [ 42.596904] [ 42.596904] Pid: 0, comm: swapper Not tainted (2.6.31.4-build-0047 #12) [ 42.596904] EIP: 0060:[<f886865e>] EFLAGS: 00010286 CPU: 0 [ 42.596904] EIP is at pppoe_rcv+0x153/0x1be [pppoe] [ 42.596904] EAX: 00003100 EBX: ffffffff ECX: 00000002 EDX: f74007cb [ 42.596904] ESI: f79b4b00 EDI: f6332f00 EBP: c1806edc ESP: c1806eb8 [ 42.596904] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 42.596904] Process swapper (pid: 0, ti=c1806000 task=c03f6de0 task.ti=c03f1000) [ 42.596904] Stack: [ 42.596904] f7445000 00000004 f6fc0028 31000030 f6fc0030 f6332f40 f79b4b00 c0418140 Oct 19 15:00:46 194.146.153.99 [ 42.596904] <0> f886a444 c1806f14 c029e702 f7445000 f79b4bb0 f79b4bb0 c0418160 f7445000 Oct 19 15:00:46 194.146.153.99 [ 42.596904] <0> 00000000 00000001 64886f14 c02ac3ce f79b4b00 00000000 f753105c c1806f58 Oct 19 15:00:46 194.146.153.99 [ 42.596904] Call Trace: [ 42.596904] [<c029e702>] ? netif_receive_skb+0x43b/0x45a [ 42.596904] [<c02ac3ce>] ? eth_type_trans+0x25/0xa9 [ 42.596904] [<f840f736>] ? rtl8169_rx_interrupt+0x343/0x3f3 [r8169] [ 42.596904] [<f8412191>] ? rtl8169_poll+0x2f/0x1b2 [r8169] [ 42.596904] [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7 [ 42.596904] [<c029ec95>] ? net_rx_action+0x93/0x177 [ 42.596904] [<c0131bdd>] ? __do_softirq+0xa7/0x144 [ 42.596904] [<c0131b36>] ? __do_softirq+0x0/0x144 [ 42.596904] <IRQ> Oct 19 15:00:46 194.146.153.99 [ 42.596904] [<c0131957>] ? irq_exit+0x29/0x5c [ 42.596904] [<c0104393>] ? do_IRQ+0x80/0x96 [ 42.596904] [<c0102f49>] ? common_interrupt+0x29/0x30 [ 42.596904] [<c0108a3e>] ? mwait_idle+0x8a/0xb9 [ 42.596904] [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f [ 42.596904] [<c0101bf0>] ? cpu_idle+0x44/0x60 [ 42.596904] [<c02ef8c7>] ? rest_init+0x53/0x55 [ 42.596904] [<c04197df>] ? start_kernel+0x2b9/0x2be [ 42.596904] [<c041906a>] ? i386_start_kernel+0x6a/0x6f [ 42.596904] Code: 53 0a 32 53 0b 31 c2 c1 e8 08 31 c2 eb 08 0f b6 c2 c1 f8 04 31 c2 d1 e9 83 f9 04 74 f1 89 d0 83 e0 0f 8b 1c 87 eb 35 66 8b 45 ea Oct 19 15:00:46 194.146.153.99 39 83 98 01 00 00 75 22 8b 55 e4 8d 83 9a 01 00 00 b9 06 00 Oct 19 15:00:46 194.146.153.99 [ 42.596904] EIP: [<f886865e>] pppoe_rcv+0x153/0x1be [pppoe] SS:ESP 0068:c1806eb8 [ 42.596904] CR2: 0000000000000197 [ 42.606148] ---[ end trace b477ff4ee072d9b9 ]--- [ 42.606209] Kernel panic - not syncing: Fatal exception in interrupt [ 42.606273] Pid: 0, comm: swapper Tainted: G D 2.6.31.4-build-0047 #12 [ 42.606351] Call Trace: [ 42.606413] [<c02fc496>] ? printk+0xf/0x11 [ 42.606474] [<c02fc3e7>] panic+0x39/0xd9 [ 42.606535] [<c01059b7>] oops_end+0x8b/0x9a [ 42.606597] [<c0118f6d>] no_context+0x13d/0x147 [ 42.606658] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b [ 42.606722] [<c0121dc1>] ? check_preempt_wakeup+0x34/0x141 [ 42.606862] [<c01294bb>] ? try_to_wake_up+0x1aa/0x1b4 [ 42.606930] [<c0209541>] ? cpumask_next_and+0x26/0x37 [ 42.607003] [<c01256f5>] ? find_busiest_group+0x291/0x885 [ 42.607067] [<c019cf47>] ? pollwake+0x5a/0x63 [ 42.607127] [<c011909f>] bad_area_nosemaphore+0xd/0x10 [ 42.607189] [<c0119301>] do_page_fault+0x114/0x26f [ 42.607251] [<c01191ed>] ? do_page_fault+0x0/0x26f [ 42.607313] [<c02fe506>] error_code+0x66/0x6c [ 42.607375] [<c02900d8>] ? pcibios_set_master+0x89/0x8d [ 42.607436] [<c01191ed>] ? do_page_fault+0x0/0x26f [ 42.607501] [<f886865e>] ? pppoe_rcv+0x153/0x1be [pppoe] [ 42.607564] [<c029e702>] netif_receive_skb+0x43b/0x45a [ 42.607625] [<c02ac3ce>] ? eth_type_trans+0x25/0xa9 [ 42.607691] [<f840f736>] rtl8169_rx_interrupt+0x343/0x3f3 [r8169] [ 42.607759] [<f8412191>] rtl8169_poll+0x2f/0x1b2 [r8169] [ 42.607824] [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7 [ 42.607886] [<c029ec95>] net_rx_action+0x93/0x177 [ 42.607948] [<c0131bdd>] __do_softirq+0xa7/0x144 [ 42.608022] [<c0131b36>] ? __do_softirq+0x0/0x144 [ 42.608082] <IRQ> [<c0131957>] ? irq_exit+0x29/0x5c [ 42.608183] [<c0104393>] ? do_IRQ+0x80/0x96 [ 42.608183] [<c0104393>] ? do_IRQ+0x80/0x96 [ 42.608245] [<c0102f49>] ? common_interrupt+0x29/0x30 [ 42.608307] [<c0108a3e>] ? mwait_idle+0x8a/0xb9 [ 42.608369] [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f [ 42.608431] [<c0101bf0>] ? cpu_idle+0x44/0x60 [ 42.608493] [<c02ef8c7>] ? rest_init+0x53/0x55 [ 42.608553] [<c04197df>] ? start_kernel+0x2b9/0x2be [ 42.608616] [<c041906a>] ? i386_start_kernel+0x6a/0x6f [ 42.608682] Rebooting in 5 seconds.. On Monday 19 October 2009 06:34:06 Michal Ostrowski wrote: > Here's my theory on this after an inital look... > > Looking at the oops report and disassembly of the actual module binary > that caused the oops, one can deduce that: > > Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct > pppox_sock *po", which is what we faulted on, excuting "cmp %eax, > 0x190(%ebx)". %ebx value was 0xffffffff (hence we got "NULL pointer > dereference at 0x18f"). > > At this point "i" (stored in %esi) is 15 (valid), meaning that we got a > value of 0xffffffff in pn->hash_table[i]. > > From this I'd hypothesize that the combination of dev_put() and > release_sock() may have allowed us to free "pn". At the bottom of the loop > we alreayd recognize that since locks are dropped we're responsible for > handling invalidation of objects, and perhaps that should be extended to > "pn" as well. -- > Michal Ostrowski > mostrows@gmail.com > > > --- > drivers/net/pppoe.c | 86 ++++++++++++++++++++++++++---- > -------------------- > 1 files changed, 45 insertions(+), 41 deletions(-) > > diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c > index 7cbf6f9..720c4ea 100644 > --- a/drivers/net/pppoe.c > +++ b/drivers/net/pppoe.c > @@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev) > > BUG_ON(dev == NULL); > > +restart: > pn = pppoe_pernet(dev_net(dev)); > if (!pn) /* already freed */ > return; > @@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev) > write_lock_bh(&pn->hash_lock); > for (i = 0; i < PPPOE_HASH_SIZE; i++) { > struct pppox_sock *po = pn->hash_table[i]; > + struct sock *sk; > > - while (po != NULL) { > - struct sock *sk; > - if (po->pppoe_dev != dev) { > - po = po->next; > - continue; > - } > - sk = sk_pppox(po); > - spin_lock(&flush_lock); > - po->pppoe_dev = NULL; > - spin_unlock(&flush_lock); > - dev_put(dev); > - > - /* We always grab the socket lock, followed by the > - * hash_lock, in that order. Since we should > - * hold the sock lock while doing any unbinding, > - * we need to release the lock we're holding. > - * Hold a reference to the sock so it doesn't > disappear - * as we're jumping between locks. > - */ > + while (po && po->pppoe_dev != dev) { > + po = po->next; > + } > > - sock_hold(sk); > + if (po == NULL) { > + continue; > + } > > - write_unlock_bh(&pn->hash_lock); > - lock_sock(sk); > + sk = sk_pppox(po); > > - if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) > { - pppox_unbind_sock(sk); > - sk->sk_state = PPPOX_ZOMBIE; > - sk->sk_state_change(sk); > - } > + spin_lock(&flush_lock); > + po->pppoe_dev = NULL; > + spin_unlock(&flush_lock); > > - release_sock(sk); > - sock_put(sk); > + dev_put(dev); > > - /* Restart scan at the beginning of this hash > chain. - * While the lock was dropped the chain > contents may - * have changed. > - */ > - write_lock_bh(&pn->hash_lock); > - po = pn->hash_table[i]; > - } > + /* We always grab the socket lock, followed by the > + * hash_lock, in that order. Since we should > + * hold the sock lock while doing any unbinding, > + * we need to release the lock we're holding. > + * Hold a reference to the sock so it doesn't disappear > + * as we're jumping between locks. > + */ > + > + sock_hold(sk); > + > + write_unlock_bh(&pn->hash_lock); > + lock_sock(sk); > + > + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { > + pppox_unbind_sock(sk); > + sk->sk_state = PPPOX_ZOMBIE; > + sk->sk_state_change(sk); > + } > + > + release_sock(sk); > + sock_put(sk); > + > + /* Restart the flush process from the beginning. While > + * the lock was dropped the chain contents may have > + * changed, and sock_put may have made things go away. > + */ > + goto restart; > } > write_unlock_bh(&pn->hash_lock); > } > -- > 1.6.3.3 > > On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@visp.net.lb> wrote: > > I have server running as pppoe NAS. > > Tried to rename customers without dropping pppd connections first, got > > panic after few seconds. > > Panic triggerable at 2.6.30.4 and 2.6.31.4 > > pppoe users running on eth2 > > pppoe flags: > > 1457 root /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R > > 172.16.1.2 -N 253 -C gpzone -S gpzone > > > > > > Commands sequence that i think triggered that: > > > > ip link set eth0 down > > ip link set eth1 down > > ip link set eth2 down > > nameif etherx 00:16:76:8D:83:BA > > nameif eth0 00:19:e0:72:4a:37 > > nameif eth1 00:19:e0:72:4a:4b > > > > ip addr flush dev eth0 > > ip addr flush dev eth1 > > ip addr add X.X.X.X/29 dev eth0 > > ip addr add 192.168.2.177/24 dev eth0 > > ip addr add 192.168.0.1/32 dev eth1 > > ip addr add 127.0.0.0/8 dev lo > > #ip link set eth0 up > > ip link set eth0 up > > ip link set eth1 up > > ip link set lo up > > ip route add 0.0.0.0/0 via X.X.X.X > > > > > > [ 103.428591] r8169: eth0: link up > > [ 103.430360] r8169: eth1: link up > > [ 113.361528] BUG: unable to handle kernel > > NULL pointer dereference > > at 0000018f > > [ 113.361717] IP: > > [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe] > > [ 113.361853] *pdpt = 000000003411a001 > > *pde = 0000000000000000 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362012] Oops: 0000 [#1] > > SMP > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev > > [ 113.362246] Modules linked in: > > netconsole > > configfs > > act_skbedit > > sch_ingress > > sch_prio > > cls_flow > > cls_u32 > > em_meta > > cls_basic > > xt_dscp > > xt_DSCP > > ipt_REJECT > > ts_bm > > xt_string > > xt_hl > > ifb > > cls_fw > > sch_tbf > > sch_htb > > act_ipt > > act_mirred > > xt_MARK > > pppoe > > pppox > > ppp_generic > > slhc > > xt_TCPMSS > > xt_mark > > xt_tcpudp > > iptable_mangle > > iptable_nat > > nf_nat > > rtc_cmos > > nf_conntrack_ipv4 > > rtc_core > > nf_conntrack > > rtc_lib > > nf_defrag_ipv4 > > iptable_filter > > ip_tables > > x_tables > > 8021q > > garp > > stp > > llc > > loop > > sata_sil > > pata_atiixp > > pata_acpi > > ata_generic > > libata > > 8139cp > > usb_storage > > mtdblock > > mtd_blkdevs > > mtd > > sr_mod > > cdrom > > tulip > > r8169 > > sky2 > > via_velocity > > via_rhine > > sis900 > > ne2k_pci > > 8390 > > skge > > tg3 > > libphy > > 8139too > > e1000 > > e100 > > usbhid > > ohci_hcd > > uhci_hcd > > ehci_hcd > > usbcore > > nls_base > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] > > [ 113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7) > > [ 113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0 > > [ 113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe] > > [ 113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680 > > [ 113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c > > [ 113.362344] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 > > [ 113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0 > > task.ti=f4145000) > > [ 113.362344] Stack: > > [ 113.362344] f4fbe220 > > f4fbe000 > > f6cea5a0 > > f886a430 > > fffffff5 > > 00000000 > > f4145e54 > > c01422b3 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] <0> > > f4fbe000 > > 00000009 > > f8a457d8 > > f4fbe000 > > f8850190 > > 00001091 > > f4145e64 > > c0142361 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] <0> > > ffffffff > > 00000000 > > f4145e74 > > c029ffbf > > f4fbe000 > > 000010d0 > > f4145e90 > > c029fa70 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] Call Trace: > > [ 113.362344] [<c01422b3>] ? notifier_call_chain+0x2b/0x4a > > [ 113.362344] [<c0142361>] ? raw_notifier_call_chain+0xc/0xe > > [ 113.362344] [<c029ffbf>] ? dev_close+0x4c/0x8c > > [ 113.362344] [<c029fa70>] ? dev_change_flags+0xa5/0x158 > > [ 113.362344] [<c02da633>] ? devinet_ioctl+0x21a/0x503 > > [ 113.362344] [<c02db693>] ? inet_ioctl+0x8d/0xa6 > > [ 113.362344] [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec > > [ 113.362344] [<c0292959>] ? sock_ioctl+0x0/0x1ec > > [ 113.362344] [<c019af2b>] ? vfs_ioctl+0x22/0x69 > > [ 113.362344] [<c019b435>] ? do_vfs_ioctl+0x41f/0x459 > > [ 113.362344] [<c02934eb>] ? sys_send+0x18/0x1a > > [ 113.362344] [<c011942f>] ? do_page_fault+0x242/0x26f > > [ 113.362344] [<c019b49b>] ? sys_ioctl+0x2c/0x45 > > [ 113.362344] [<c0102975>] ? syscall_call+0x7/0xb > > [ 113.362344] Code: > > c9 > > 00 > > 00 > > 00 > > 89 > > c7 > > 31 > > f6 > > 83 > > c7 > > 40 > > 89 > > f8 > > e8 > > cc > > 60 > > a9 > > c7 > > 8b > > 45 > > ec > > 05 > > 20 > > 02 > > 00 > > 00 > > 89 > > 45 > > e8 > > 8b > > 4d > > f0 > > 8b > > 1c > > b1 > > e9 > > 8c > > 00 > > 00 > > 00 > > 8b > > 45 > > ec > > Oct 18 23:59:40 194.146.153.93 > > 83 > > 90 > > 01 > > 00 > > 00 > > 74 > > 08 > > 8b > > 9b > > 8c > > 01 > > 00 > > 00 > > eb > > 79 > > b8 > > c0 > > a6 > > 86 > > f8 > > Oct 18 23:59:40 194.146.153.93 > > [ 113.362344] EIP: [<f8868269>] > > pppoe_device_event+0x80/0x12c [pppoe] > > SS:ESP 0068:f4145e1c > > [ 113.362344] CR2: 000000000000018f > > [ 113.373124] ---[ end trace f6fe64a307e97f3b ]--- > > [ 113.373203] Kernel panic - not syncing: Fatal exception in interrupt > > [ 113.373286] Pid: 2858, comm: pppd Tainted: G D > > 2.6.31.4-build-0047 #7 > > [ 113.373379] Call Trace: > > [ 113.373479] [<c02fc496>] ? printk+0xf/0x11 > > [ 113.373561] [<c02fc3e7>] panic+0x39/0xd9 > > [ 113.373656] [<c01059b7>] oops_end+0x8b/0x9a > > [ 113.373727] [<c0118f6d>] no_context+0x13d/0x147 > > [ 113.373800] [<c011908a>] __bad_area_nosemaphore+0x113/0x11b > > [ 113.373881] [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a > > [ 113.373959] [<c0121801>] ? __wake_up_sync_key+0x3b/0x45 > > [ 113.374030] [<c0131967>] ? irq_exit+0x39/0x5c > > [ 113.374107] [<c0104393>] ? do_IRQ+0x80/0x96 > > [ 113.374183] [<c0102f49>] ? common_interrupt+0x29/0x30 > > [ 113.374259] [<c011909f>] bad_area_nosemaphore+0xd/0x10 > > [ 113.374348] [<c0119301>] do_page_fault+0x114/0x26f > > [ 113.374526] [<c01191ed>] ? do_page_fault+0x0/0x26f > > [ 113.374605] [<c02fe506>] error_code+0x66/0x6c > > [ 113.374683] [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e > > [ 113.374764] [<c01191ed>] ? do_page_fault+0x0/0x26f > > [ 113.374850] [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe] > > [ 113.374928] [<c01422b3>] notifier_call_chain+0x2b/0x4a > > [ 113.375012] [<c0142361>] raw_notifier_call_chain+0xc/0xe > > [ 113.375097] [<c029ffbf>] dev_close+0x4c/0x8c > > [ 113.375169] [<c029fa70>] dev_change_flags+0xa5/0x158 > > [ 113.375239] [<c02da633>] devinet_ioctl+0x21a/0x503 > > [ 113.375318] [<c02db693>] inet_ioctl+0x8d/0xa6 > > [ 113.375411] [<c0292b21>] sock_ioctl+0x1c8/0x1ec > > [ 113.375491] [<c0292959>] ? sock_ioctl+0x0/0x1ec > > [ 113.375574] [<c019af2b>] vfs_ioctl+0x22/0x69 > > [ 113.375653] [<c019b435>] do_vfs_ioctl+0x41f/0x459 > > [ 113.375734] [<c02934eb>] ? sys_send+0x18/0x1a > > [ 113.375813] [<c011942f>] ? do_page_fault+0x242/0x26f > > [ 113.375884] [<c019b49b>] sys_ioctl+0x2c/0x45 > > [ 113.375960] [<c0102975>] syscall_call+0x7/0xb > > [ 113.376041] Rebooting in 5 seconds.. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 3:34 ` Michal Ostrowski 2009-10-19 11:36 ` Denys Fedoryschenko 2009-10-19 12:01 ` Denys Fedoryschenko @ 2009-10-19 12:36 ` Eric Dumazet 2009-10-19 13:19 ` Michal Ostrowski 2 siblings, 1 reply; 34+ messages in thread From: Eric Dumazet @ 2009-10-19 12:36 UTC (permalink / raw) To: Michal Ostrowski; +Cc: Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows Michal Ostrowski a écrit : > Here's my theory on this after an inital look... > > Looking at the oops report and disassembly of the actual module binary > that caused the oops, one can deduce that: > > Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct > pppox_sock *po", which is what we faulted on, excuting "cmp %eax, 0x190(%ebx)". > %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f"). > > At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value > of 0xffffffff in pn->hash_table[i]. > >>From this I'd hypothesize that the combination of dev_put() and release_sock() > may have allowed us to free "pn". At the bottom of the loop we alreayd > recognize that since locks are dropped we're responsible for handling > invalidation of objects, and perhaps that should be extended to "pn" as well. > -- > Michal Ostrowski > mostrows@gmail.com > > Looking at this stuff, I do believe flush_lock protection is not properly done. At the end of pppoe_connect() for example we can find : err_put: if (po->pppoe_dev) { dev_put(po->pppoe_dev); po->pppoe_dev = NULL; } This is done without any protection, and can therefore clash with pppoe_flush_dev() : spin_lock(&flush_lock); po->pppoe_dev = NULL; /* ppoe_dev can already be NULL before this point */ spin_unlock(&flush_lock); dev_put(dev); /* oops */ ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 12:36 ` Eric Dumazet @ 2009-10-19 13:19 ` Michal Ostrowski 2009-10-19 15:50 ` Cyrill Gorcunov 0 siblings, 1 reply; 34+ messages in thread From: Michal Ostrowski @ 2009-10-19 13:19 UTC (permalink / raw) To: Eric Dumazet Cc: Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows, Cyrill Gorcunov The entire scheme for managing net namespaces seems unsafe. We depend on synchronization via pn->hash_lock, but have no guarantee of the existence of the "net" object -- hence no way to ensure the existence of the lock itself. This should be relatively easy to fix though as we should be able to get/put the net namespace as we add remove objects to/from the pppoe hash. Once you solve this existence issue, the flush_lock can be eliminated altogether since all of the relevant code paths already depend on a write_lock_bh(&pn->hash_lock), and that's the lock that should be use to protect the pppoe_dev field. Another patch to follow later... -- Michal Ostrowski mostrows@gmail.com On Mon, Oct 19, 2009 at 7:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Michal Ostrowski a écrit : >> Here's my theory on this after an inital look... >> >> Looking at the oops report and disassembly of the actual module binary >> that caused the oops, one can deduce that: >> >> Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct >> pppox_sock *po", which is what we faulted on, excuting "cmp %eax, 0x190(%ebx)". >> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f"). >> >> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value >> of 0xffffffff in pn->hash_table[i]. >> >>>From this I'd hypothesize that the combination of dev_put() and release_sock() >> may have allowed us to free "pn". At the bottom of the loop we alreayd >> recognize that since locks are dropped we're responsible for handling >> invalidation of objects, and perhaps that should be extended to "pn" as well. >> -- >> Michal Ostrowski >> mostrows@gmail.com >> >> > > Looking at this stuff, I do believe flush_lock protection is not > properly done. > > At the end of pppoe_connect() for example we can find : > > err_put: > if (po->pppoe_dev) { > dev_put(po->pppoe_dev); > po->pppoe_dev = NULL; > } > > This is done without any protection, and can therefore clash with > pppoe_flush_dev() : > > spin_lock(&flush_lock); > po->pppoe_dev = NULL; /* ppoe_dev can already be NULL before this point */ > spin_unlock(&flush_lock); > > dev_put(dev); /* oops */ > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 13:19 ` Michal Ostrowski @ 2009-10-19 15:50 ` Cyrill Gorcunov 2009-10-19 16:05 ` Michal Ostrowski 0 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-19 15:50 UTC (permalink / raw) To: Michal Ostrowski Cc: Eric Dumazet, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows [Michal Ostrowski - Mon, Oct 19, 2009 at 08:19:23AM -0500] | | The entire scheme for managing net namespaces seems unsafe. We depend | on synchronization via pn->hash_lock, but have no guarantee of the | existence of the "net" object -- hence no way to ensure the existence | of the lock itself. This should be relatively easy to fix though as | we should be able to get/put the net namespace as we add remove | objects to/from the pppoe hash. | Hmm... it seems not. The only possible scenario I see (for such nonexistence namespace is that when it was cached via RCU and returned before grace period elapsed, so perhaps we need to call synchronize_net somewhere). | | Once you solve this existence issue, the flush_lock can be eliminated | altogether since all of the relevant code paths already depend on a | write_lock_bh(&pn->hash_lock), and that's the lock that should be use | to protect the pppoe_dev field. | | Another patch to follow later... | | -- | Michal Ostrowski | mostrows@gmail.com | | | | On Mon, Oct 19, 2009 at 7:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: | > Michal Ostrowski a écrit : | >> Here's my theory on this after an inital look... | >> | >> Looking at the oops report and disassembly of the actual module binary | >> that caused the oops, one can deduce that: | >> | >> Execution was in pppoe_flush_dev(). %ebx contained the pointer "struct | >> pppox_sock *po", which is what we faulted on, excuting "cmp %eax, 0x190(%ebx)". | >> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f"). | >> | >> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value | >> of 0xffffffff in pn->hash_table[i]. | >> | >>>From this I'd hypothesize that the combination of dev_put() and release_sock() | >> may have allowed us to free "pn". At the bottom of the loop we alreayd | >> recognize that since locks are dropped we're responsible for handling | >> invalidation of objects, and perhaps that should be extended to "pn" as well. | >> -- | >> Michal Ostrowski | >> mostrows@gmail.com | >> | >> | > | > Looking at this stuff, I do believe flush_lock protection is not | > properly done. | > | > At the end of pppoe_connect() for example we can find : | > | > err_put: | > if (po->pppoe_dev) { | > dev_put(po->pppoe_dev); | > po->pppoe_dev = NULL; | > } Yep, this is unsafe, thanks! | > | > This is done without any protection, and can therefore clash with | > pppoe_flush_dev() : | > | > spin_lock(&flush_lock); | > po->pppoe_dev = NULL; /* ppoe_dev can already be NULL before this point */ | > spin_unlock(&flush_lock); | > | > dev_put(dev); /* oops */ | > | Denys, could you check if the patch below help? -- Cyrill --- drivers/net/pppoe.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6.git/drivers/net/pppoe.c ===================================================================== --- linux-2.6.git.orig/drivers/net/pppoe.c +++ linux-2.6.git/drivers/net/pppoe.c @@ -312,9 +312,9 @@ static void pppoe_flush_dev(struct net_d } sk = sk_pppox(po); spin_lock(&flush_lock); + dev_put(po->pppoe_dev); po->pppoe_dev = NULL; spin_unlock(&flush_lock); - dev_put(dev); /* We always grab the socket lock, followed by the * hash_lock, in that order. Since we should @@ -708,10 +708,12 @@ end: release_sock(sk); return error; err_put: + spin_lock(&flush_lock); if (po->pppoe_dev) { dev_put(po->pppoe_dev); po->pppoe_dev = NULL; } + spin_unlock(&flush_lock); goto end; } ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 15:50 ` Cyrill Gorcunov @ 2009-10-19 16:05 ` Michal Ostrowski 2009-10-19 17:12 ` Eric Dumazet 0 siblings, 1 reply; 34+ messages in thread From: Michal Ostrowski @ 2009-10-19 16:05 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Eric Dumazet, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows Here's a bigger patch that just gets rid of flush_lock altogether. We were seeing oopses due to net namespaces going away while we were using them, which turns out is simply due to the fact that pppoew wasn't claiming ref counts properly. Fixing this requires that adding and removing entries to the per-net hash-table requires incrementing and decrementing the ref count. This also allows us to get rid of the flush_lock since we can now depend on the existence of "pn->hash_lock". We also have to be careful when flushing devices that removal of a hash table entry may bring the net namespace refcount to 0. --- drivers/net/pppoe.c | 152 ++++++++++++++++++++++++++++----------------------- 1 files changed, 83 insertions(+), 69 deletions(-) diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 7cbf6f9..140a196 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -111,9 +111,6 @@ struct pppoe_net { rwlock_t hash_lock; }; -/* to eliminate a race btw pppoe_flush_dev and pppoe_release */ -static DEFINE_SPINLOCK(flush_lock); - /* * PPPoE could be in the following stages: * 1) Discovery stage (to obtain remote MAC and Session ID) @@ -292,61 +289,77 @@ static inline struct pppox_sock *delete_item(struct pppoe_net *pn, __be16 sid, static void pppoe_flush_dev(struct net_device *dev) { struct pppoe_net *pn; + struct net * net; int i; BUG_ON(dev == NULL); + /* We have to drop and re-acquire locks. So we'll grab a ref-count on + * the net namespace to ensure it is valid throughout this function. + */ + + net = maybe_get_net(dev_net(dev)); + if (!net) + return; + pn = pppoe_pernet(dev_net(dev)); - if (!pn) /* already freed */ + if (!pn) { /* already freed */ + put_net(net); return; + } write_lock_bh(&pn->hash_lock); for (i = 0; i < PPPOE_HASH_SIZE; i++) { struct pppox_sock *po = pn->hash_table[i]; - - while (po != NULL) { - struct sock *sk; - if (po->pppoe_dev != dev) { - po = po->next; - continue; - } - sk = sk_pppox(po); - spin_lock(&flush_lock); - po->pppoe_dev = NULL; - spin_unlock(&flush_lock); - dev_put(dev); - - /* We always grab the socket lock, followed by the - * hash_lock, in that order. Since we should - * hold the sock lock while doing any unbinding, - * we need to release the lock we're holding. - * Hold a reference to the sock so it doesn't disappear - * as we're jumping between locks. - */ - - sock_hold(sk); - - write_unlock_bh(&pn->hash_lock); - lock_sock(sk); - - if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { - pppox_unbind_sock(sk); - sk->sk_state = PPPOX_ZOMBIE; - sk->sk_state_change(sk); - } - - release_sock(sk); - sock_put(sk); - - /* Restart scan at the beginning of this hash chain. - * While the lock was dropped the chain contents may - * have changed. - */ - write_lock_bh(&pn->hash_lock); - po = pn->hash_table[i]; - } + struct sock *sk; + + while (po && po->pppoe_dev != dev) { + po = po->next; + } + + if (po == NULL) { + continue; + } + + sk = sk_pppox(po); + + if (po->pppoe_dev) { + dev_put(po->pppoe_dev); + po->pppoe_dev = NULL; + } + + /* We always grab the socket lock, followed by the + * hash_lock, in that order. Since we should + * hold the sock lock while doing any unbinding, + * we need to release the lock we're holding. + * Hold a reference to the sock so it doesn't disappear + * as we're jumping between locks. + */ + + sock_hold(sk); + + write_unlock_bh(&pn->hash_lock); + lock_sock(sk); + + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { + pppox_unbind_sock(sk); + sk->sk_state = PPPOX_ZOMBIE; + sk->sk_state_change(sk); + } + + release_sock(sk); + sock_put(sk); + + /* Restart the process from the start of the current hash + * chain. We dropped locks so the world may have change from + * underneath us, but we know "pn" is still good because we + * grabbed a ref count on "net". + */ + write_unlock_bh(&pn->hash_lock); + po = pn->hash_table[i]; } write_unlock_bh(&pn->hash_lock); + put_net(net); } static int pppoe_device_event(struct notifier_block *this, @@ -561,6 +574,7 @@ static int pppoe_release(struct socket *sock) struct sock *sk = sock->sk; struct pppox_sock *po; struct pppoe_net *pn; + struct net *net = NULL; if (!sk) return 0; @@ -576,19 +590,8 @@ static int pppoe_release(struct socket *sock) /* Signal the death of the socket. */ sk->sk_state = PPPOX_DEAD; - /* - * pppoe_flush_dev could lead to a race with - * this routine so we use flush_lock to eliminate - * such a case (we only need per-net specific data) - */ - spin_lock(&flush_lock); - po = pppox_sk(sk); - if (!po->pppoe_dev) { - spin_unlock(&flush_lock); - goto out; - } - pn = pppoe_pernet(dev_net(po->pppoe_dev)); - spin_unlock(&flush_lock); + net = sock_net(sk); + pn = pppoe_pernet(net); /* * protect "po" from concurrent updates @@ -601,14 +604,14 @@ static int pppoe_release(struct socket *sock) __delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote, po->pppoe_ifindex); - if (po->pppoe_dev) { - dev_put(po->pppoe_dev); - po->pppoe_dev = NULL; - } + if (po->pppoe_dev) { + dev_put(po->pppoe_dev); + po->pppoe_dev = NULL; + } write_unlock_bh(&pn->hash_lock); + put_net(net); -out: sock_orphan(sk); sock->sk = NULL; @@ -625,8 +628,9 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, struct sock *sk = sock->sk; struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr; struct pppox_sock *po = pppox_sk(sk); - struct net_device *dev; struct pppoe_net *pn; + struct net_device *dev = NULL; + struct net *net = NULL; int error; lock_sock(sk); @@ -653,10 +657,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, if (stage_session(po->pppoe_pa.sid)) { pppox_unbind_sock(sk); if (po->pppoe_dev) { - pn = pppoe_pernet(dev_net(po->pppoe_dev)); + struct net *old = dev_net(po->pppoe_dev); + pn = pppoe_pernet(old); delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote, po->pppoe_ifindex); dev_put(po->pppoe_dev); + put_net(old); } memset(sk_pppox(po) + 1, 0, sizeof(struct pppox_sock) - sizeof(struct sock)); @@ -666,13 +672,17 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, /* Re-bind in session stage only */ if (stage_session(sp->sa_addr.pppoe.sid)) { error = -ENODEV; - dev = dev_get_by_name(sock_net(sk), sp->sa_addr.pppoe.dev); + net = maybe_get_net(dev_net(dev)); + if (!net) + goto end; + + dev = dev_get_by_name(net, sp->sa_addr.pppoe.dev); if (!dev) - goto end; + goto err_put_net; po->pppoe_dev = dev; po->pppoe_ifindex = dev->ifindex; - pn = pppoe_pernet(dev_net(dev)); + pn = pppoe_pernet(net); write_lock_bh(&pn->hash_lock); if (!(dev->flags & IFF_UP)) { write_unlock_bh(&pn->hash_lock); @@ -707,6 +717,10 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, end: release_sock(sk); return error; +err_put_net: + if (net) { + put_net(net); + } err_put: if (po->pppoe_dev) { dev_put(po->pppoe_dev); -- 1.6.3.3 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 16:05 ` Michal Ostrowski @ 2009-10-19 17:12 ` Eric Dumazet 2009-10-19 18:07 ` Michal Ostrowski 0 siblings, 1 reply; 34+ messages in thread From: Eric Dumazet @ 2009-10-19 17:12 UTC (permalink / raw) To: Michal Ostrowski Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows Michal Ostrowski a écrit : > Here's a bigger patch that just gets rid of flush_lock altogether. > > We were seeing oopses due to net namespaces going away while we were using > them, which turns out is simply due to the fact that pppoew wasn't claiming ref > counts properly. > > Fixing this requires that adding and removing entries to the per-net hash-table > requires incrementing and decrementing the ref count. This also allows us to > get rid of the flush_lock since we can now depend on the existence of > "pn->hash_lock". > > We also have to be careful when flushing devices that removal of a hash table > entry may bring the net namespace refcount to 0. > Your patch is mangled (tabulation -> white spaces), and I dont believe namespace refcount can reach 0 inside pppoe_flush_dev(), it would be a bug from core network code. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 17:12 ` Eric Dumazet @ 2009-10-19 18:07 ` Michal Ostrowski 2009-10-19 18:44 ` Eric Dumazet 0 siblings, 1 reply; 34+ messages in thread From: Michal Ostrowski @ 2009-10-19 18:07 UTC (permalink / raw) To: Eric Dumazet Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows [-- Attachment #1: Type: text/plain, Size: 1653 bytes --] On Mon, Oct 19, 2009 at 12:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Michal Ostrowski a écrit : >> Here's a bigger patch that just gets rid of flush_lock altogether. >> >> We were seeing oopses due to net namespaces going away while we were using >> them, which turns out is simply due to the fact that pppoew wasn't claiming ref >> counts properly. >> >> Fixing this requires that adding and removing entries to the per-net hash-table >> requires incrementing and decrementing the ref count. This also allows us to >> get rid of the flush_lock since we can now depend on the existence of >> "pn->hash_lock". >> >> We also have to be careful when flushing devices that removal of a hash table >> entry may bring the net namespace refcount to 0. >> > > Your patch is mangled (tabulation -> white spaces), Patch mangling was due to mailer interactions, I'll attach a clean version here, no more inlining. > > and I dont believe namespace refcount can reach 0 inside pppoe_flush_dev(), > it would be a bug from core network code. > >From the original oops I was able to deduce that the namespace somehow managed to get destroyed during the interval where we dropped locks. If that's not due to the release_sock() call in pppoe_flush_dev() triggering a cleanup then I'd have to assume that that it's due to a secondary actor closing the socket in parallel, but that in turn would point to issues with the flush_lock. Having said that the thrust of this patch remains valid; it just means I don't need to inc the ref count in pppoe_flush_dev(). Do you agree? -- Michal Ostrowski mostrows@gmail.com [-- Attachment #2: 0001-PPPoE-Fix-ref-counts-on-net-namespaces.patch --] [-- Type: application/octet-stream, Size: 5483 bytes --] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 7cbf6f9..60bbeb2 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -111,9 +111,6 @@ struct pppoe_net { rwlock_t hash_lock; }; -/* to eliminate a race btw pppoe_flush_dev and pppoe_release */ -static DEFINE_SPINLOCK(flush_lock); - /* * PPPoE could be in the following stages: * 1) Discovery stage (to obtain remote MAC and Session ID) @@ -303,48 +300,49 @@ static void pppoe_flush_dev(struct net_device *dev) write_lock_bh(&pn->hash_lock); for (i = 0; i < PPPOE_HASH_SIZE; i++) { struct pppox_sock *po = pn->hash_table[i]; + struct sock *sk; - while (po != NULL) { - struct sock *sk; - if (po->pppoe_dev != dev) { - po = po->next; - continue; - } - sk = sk_pppox(po); - spin_lock(&flush_lock); - po->pppoe_dev = NULL; - spin_unlock(&flush_lock); - dev_put(dev); - - /* We always grab the socket lock, followed by the - * hash_lock, in that order. Since we should - * hold the sock lock while doing any unbinding, - * we need to release the lock we're holding. - * Hold a reference to the sock so it doesn't disappear - * as we're jumping between locks. - */ + while (po && po->pppoe_dev != dev) { + po = po->next; + } - sock_hold(sk); + if (po == NULL) { + continue; + } - write_unlock_bh(&pn->hash_lock); - lock_sock(sk); + sk = sk_pppox(po); - if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { - pppox_unbind_sock(sk); - sk->sk_state = PPPOX_ZOMBIE; - sk->sk_state_change(sk); - } + if (po->pppoe_dev) { + dev_put(po->pppoe_dev); + po->pppoe_dev = NULL; + } - release_sock(sk); - sock_put(sk); + /* We always grab the socket lock, followed by the hash_lock, + * in that order. Since we should hold the sock lock while + * doing any unbinding, we need to release the lock we're + * holding. Hold a reference to the sock so it doesn't + * disappear as we're jumping between locks. + */ - /* Restart scan at the beginning of this hash chain. - * While the lock was dropped the chain contents may - * have changed. - */ - write_lock_bh(&pn->hash_lock); - po = pn->hash_table[i]; + sock_hold(sk); + write_unlock_bh(&pn->hash_lock); + lock_sock(sk); + + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { + pppox_unbind_sock(sk); + sk->sk_state = PPPOX_ZOMBIE; + sk->sk_state_change(sk); } + + release_sock(sk); + sock_put(sk); + + /* Restart the process from the start of the current hash + * chain. We dropped locks so the world may have change from + * underneath us. + */ + write_unlock_bh(&pn->hash_lock); + po = pn->hash_table[i]; } write_unlock_bh(&pn->hash_lock); } @@ -561,6 +559,7 @@ static int pppoe_release(struct socket *sock) struct sock *sk = sock->sk; struct pppox_sock *po; struct pppoe_net *pn; + struct net *net = NULL; if (!sk) return 0; @@ -576,19 +575,8 @@ static int pppoe_release(struct socket *sock) /* Signal the death of the socket. */ sk->sk_state = PPPOX_DEAD; - /* - * pppoe_flush_dev could lead to a race with - * this routine so we use flush_lock to eliminate - * such a case (we only need per-net specific data) - */ - spin_lock(&flush_lock); - po = pppox_sk(sk); - if (!po->pppoe_dev) { - spin_unlock(&flush_lock); - goto out; - } - pn = pppoe_pernet(dev_net(po->pppoe_dev)); - spin_unlock(&flush_lock); + net = sock_net(sk); + pn = pppoe_pernet(net); /* * protect "po" from concurrent updates @@ -607,8 +595,8 @@ static int pppoe_release(struct socket *sock) } write_unlock_bh(&pn->hash_lock); + put_net(net); -out: sock_orphan(sk); sock->sk = NULL; @@ -625,8 +613,9 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, struct sock *sk = sock->sk; struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr; struct pppox_sock *po = pppox_sk(sk); - struct net_device *dev; struct pppoe_net *pn; + struct net_device *dev = NULL; + struct net *net = NULL; int error; lock_sock(sk); @@ -653,10 +642,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, if (stage_session(po->pppoe_pa.sid)) { pppox_unbind_sock(sk); if (po->pppoe_dev) { - pn = pppoe_pernet(dev_net(po->pppoe_dev)); + struct net *old = dev_net(po->pppoe_dev); + pn = pppoe_pernet(old); delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote, po->pppoe_ifindex); dev_put(po->pppoe_dev); + put_net(old); } memset(sk_pppox(po) + 1, 0, sizeof(struct pppox_sock) - sizeof(struct sock)); @@ -666,13 +657,17 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, /* Re-bind in session stage only */ if (stage_session(sp->sa_addr.pppoe.sid)) { error = -ENODEV; - dev = dev_get_by_name(sock_net(sk), sp->sa_addr.pppoe.dev); - if (!dev) + net = maybe_get_net(dev_net(dev)); + if (!net) goto end; + dev = dev_get_by_name(net, sp->sa_addr.pppoe.dev); + if (!dev) + goto err_put_net; + po->pppoe_dev = dev; po->pppoe_ifindex = dev->ifindex; - pn = pppoe_pernet(dev_net(dev)); + pn = pppoe_pernet(net); write_lock_bh(&pn->hash_lock); if (!(dev->flags & IFF_UP)) { write_unlock_bh(&pn->hash_lock); @@ -707,6 +702,10 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, end: release_sock(sk); return error; +err_put_net: + if (net) + put_net(net); + err_put: if (po->pppoe_dev) { dev_put(po->pppoe_dev); ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 18:07 ` Michal Ostrowski @ 2009-10-19 18:44 ` Eric Dumazet 2009-10-19 19:29 ` Cyrill Gorcunov ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Eric Dumazet @ 2009-10-19 18:44 UTC (permalink / raw) To: Michal Ostrowski Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows Michal Ostrowski a écrit : > On Mon, Oct 19, 2009 at 12:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> Michal Ostrowski a écrit : >>> Here's a bigger patch that just gets rid of flush_lock altogether. >>> >>> We were seeing oopses due to net namespaces going away while we were using >>> them, which turns out is simply due to the fact that pppoew wasn't claiming ref >>> counts properly. >>> >>> Fixing this requires that adding and removing entries to the per-net hash-table >>> requires incrementing and decrementing the ref count. This also allows us to >>> get rid of the flush_lock since we can now depend on the existence of >>> "pn->hash_lock". >>> >>> We also have to be careful when flushing devices that removal of a hash table >>> entry may bring the net namespace refcount to 0. >>> >> Your patch is mangled (tabulation -> white spaces), > > Patch mangling was due to mailer interactions, I'll attach a clean > version here, no more inlining. > >> and I dont believe namespace refcount can reach 0 inside pppoe_flush_dev(), >> it would be a bug from core network code. >> > > From the original oops I was able to deduce that the namespace somehow > managed to get destroyed during the interval where we dropped locks. > If that's not due to the release_sock() call in pppoe_flush_dev() > triggering a cleanup then I'd have to assume that that it's due to a > secondary actor closing the socket in parallel, but that in turn would > point to issues with the flush_lock. Having said that the thrust of > this patch remains valid; it just means I don't need to inc the ref > count in pppoe_flush_dev(). > > Do you agree? > Not really :) I dont believe you should care of namespace, and/or mess with its refcount at all. Please dont use maybe_get_net() : This function should not ever be used in drivers/net You can add a BUG_ON(dev_net(xxxx)->count <= 0) if you really want, but if this assertion is false, this is not because of pppoe. lock_sock(sk); @@ -653,10 +642,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, if (stage_session(po->pppoe_pa.sid)) { pppox_unbind_sock(sk); if (po->pppoe_dev) { - pn = pppoe_pernet(dev_net(po->pppoe_dev)); + struct net *old = dev_net(po->pppoe_dev); + pn = pppoe_pernet(old); delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote, po->pppoe_ifindex); dev_put(po->pppoe_dev); + put_net(old); } memset(sk_pppox(po) + 1, 0, sizeof(struct pppox_sock) - sizeof(struct sock)); There is still a race here, since you do a dev_put(po->ppoe_dev); without any lock held So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time. In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check all occurences of po->ppoe_dev use in the code and check if appropriate locking is done. pppoe_rcv_core() is not safe pppoe_ioctl() is not safe pppoe_sendmsg() is not safe __pppoe_xmit() is not safe ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 18:44 ` Eric Dumazet @ 2009-10-19 19:29 ` Cyrill Gorcunov 2009-10-19 20:54 ` Michal Ostrowski 2009-10-19 20:57 ` Cyrill Gorcunov 2 siblings, 0 replies; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-19 19:29 UTC (permalink / raw) To: Eric Dumazet Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows [Eric Dumazet - Mon, Oct 19, 2009 at 08:44:52PM +0200] ... | | Not really :) | | I dont believe you should care of namespace, and/or mess with its refcount at all. | | Please dont use maybe_get_net() : This function should not ever be used in drivers/net | | You can add a BUG_ON(dev_net(xxxx)->count <= 0) if you really want, but if this | assertion is false, this is not because of pppoe. | ... | So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time. | | In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check | all occurences of po->ppoe_dev use in the code and check if appropriate locking is done. | | pppoe_rcv_core() is not safe | pppoe_ioctl() is not safe | pppoe_sendmsg() is not safe | __pppoe_xmit() is not safe | Sigh... seem so (which is mostly my fault not Michal). Every time we touch pppoe_dev we should dev_hold on it and dev_put as only done all we need. Async nature of notifier seem to be a key here. -- Cyrill ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 18:44 ` Eric Dumazet 2009-10-19 19:29 ` Cyrill Gorcunov @ 2009-10-19 20:54 ` Michal Ostrowski 2009-10-20 3:42 ` Eric Dumazet 2009-10-19 20:57 ` Cyrill Gorcunov 2 siblings, 1 reply; 34+ messages in thread From: Michal Ostrowski @ 2009-10-19 20:54 UTC (permalink / raw) To: Eric Dumazet Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows [-- Attachment #1: Type: text/plain, Size: 3819 bytes --] Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED, and all use cases now rely on the socket lock. Because of this, the ref-count on the namespace held by the socket object suffices to hold the namespace in existence and so we don't need to ref-count the namespace in PPPoE. The flush_lock is gone. -- Michal Ostrowski mostrows@gmail.com On Mon, Oct 19, 2009 at 1:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Michal Ostrowski a écrit : >> On Mon, Oct 19, 2009 at 12:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> Michal Ostrowski a écrit : >>>> Here's a bigger patch that just gets rid of flush_lock altogether. >>>> >>>> We were seeing oopses due to net namespaces going away while we were using >>>> them, which turns out is simply due to the fact that pppoew wasn't claiming ref >>>> counts properly. >>>> >>>> Fixing this requires that adding and removing entries to the per-net hash-table >>>> requires incrementing and decrementing the ref count. This also allows us to >>>> get rid of the flush_lock since we can now depend on the existence of >>>> "pn->hash_lock". >>>> >>>> We also have to be careful when flushing devices that removal of a hash table >>>> entry may bring the net namespace refcount to 0. >>>> >>> Your patch is mangled (tabulation -> white spaces), >> >> Patch mangling was due to mailer interactions, I'll attach a clean >> version here, no more inlining. >> >>> and I dont believe namespace refcount can reach 0 inside pppoe_flush_dev(), >>> it would be a bug from core network code. >>> >> >> From the original oops I was able to deduce that the namespace somehow >> managed to get destroyed during the interval where we dropped locks. >> If that's not due to the release_sock() call in pppoe_flush_dev() >> triggering a cleanup then I'd have to assume that that it's due to a >> secondary actor closing the socket in parallel, but that in turn would >> point to issues with the flush_lock. Having said that the thrust of >> this patch remains valid; it just means I don't need to inc the ref >> count in pppoe_flush_dev(). >> >> Do you agree? >> > > Not really :) > > I dont believe you should care of namespace, and/or mess with its refcount at all. > > Please dont use maybe_get_net() : This function should not ever be used in drivers/net > > You can add a BUG_ON(dev_net(xxxx)->count <= 0) if you really want, but if this > assertion is false, this is not because of pppoe. > > > lock_sock(sk); > @@ -653,10 +642,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, > if (stage_session(po->pppoe_pa.sid)) { > pppox_unbind_sock(sk); > if (po->pppoe_dev) { > - pn = pppoe_pernet(dev_net(po->pppoe_dev)); > + struct net *old = dev_net(po->pppoe_dev); > + pn = pppoe_pernet(old); > delete_item(pn, po->pppoe_pa.sid, > po->pppoe_pa.remote, po->pppoe_ifindex); > dev_put(po->pppoe_dev); > + put_net(old); > } > memset(sk_pppox(po) + 1, 0, > sizeof(struct pppox_sock) - sizeof(struct sock)); > > > There is still a race here, since you do a dev_put(po->ppoe_dev); without any lock held > > So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time. > > In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check > all occurences of po->ppoe_dev use in the code and check if appropriate locking is done. > > pppoe_rcv_core() is not safe > pppoe_ioctl() is not safe > pppoe_sendmsg() is not safe > __pppoe_xmit() is not safe > > [-- Attachment #2: 0001-PPPoE-Fix-flush-close-races.patch --] [-- Type: application/octet-stream, Size: 6458 bytes --] diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c index 7cbf6f9..1d0b569 100644 --- a/drivers/net/pppoe.c +++ b/drivers/net/pppoe.c @@ -111,9 +111,6 @@ struct pppoe_net { rwlock_t hash_lock; }; -/* to eliminate a race btw pppoe_flush_dev and pppoe_release */ -static DEFINE_SPINLOCK(flush_lock); - /* * PPPoE could be in the following stages: * 1) Discovery stage (to obtain remote MAC and Session ID) @@ -303,48 +300,46 @@ static void pppoe_flush_dev(struct net_device *dev) write_lock_bh(&pn->hash_lock); for (i = 0; i < PPPOE_HASH_SIZE; i++) { struct pppox_sock *po = pn->hash_table[i]; + struct sock *sk; - while (po != NULL) { - struct sock *sk; - if (po->pppoe_dev != dev) { - po = po->next; - continue; - } - sk = sk_pppox(po); - spin_lock(&flush_lock); - po->pppoe_dev = NULL; - spin_unlock(&flush_lock); - dev_put(dev); - - /* We always grab the socket lock, followed by the - * hash_lock, in that order. Since we should - * hold the sock lock while doing any unbinding, - * we need to release the lock we're holding. - * Hold a reference to the sock so it doesn't disappear - * as we're jumping between locks. - */ + while (po && po->pppoe_dev != dev) { + po = po->next; + } - sock_hold(sk); + if (po == NULL) { + continue; + } - write_unlock_bh(&pn->hash_lock); - lock_sock(sk); + sk = sk_pppox(po); - if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { - pppox_unbind_sock(sk); - sk->sk_state = PPPOX_ZOMBIE; - sk->sk_state_change(sk); - } + /* We always grab the socket lock, followed by the hash_lock, + * in that order. Since we should hold the sock lock while + * doing any unbinding, we need to release the lock we're + * holding. Hold a reference to the sock so it doesn't + * disappear as we're jumping between locks. + */ - release_sock(sk); - sock_put(sk); + sock_hold(sk); + write_unlock_bh(&pn->hash_lock); + lock_sock(sk); - /* Restart scan at the beginning of this hash chain. - * While the lock was dropped the chain contents may - * have changed. - */ - write_lock_bh(&pn->hash_lock); - po = pn->hash_table[i]; + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { + pppox_unbind_sock(sk); + sk->sk_state = PPPOX_ZOMBIE; + sk->sk_state_change(sk); + po->pppoe_dev = NULL; + dev_put(dev); } + + release_sock(sk); + sock_put(sk); + + /* Restart the process from the start of the current hash + * chain. We dropped locks so the world may have change from + * underneath us. + */ + write_lock_bh(&pn->hash_lock); + po = pn->hash_table[i]; } write_unlock_bh(&pn->hash_lock); } @@ -391,8 +386,8 @@ static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb) if (sk->sk_state & PPPOX_BOUND) { ppp_input(&po->chan, skb); } else if (sk->sk_state & PPPOX_RELAY) { - relay_po = get_item_by_addr(dev_net(po->pppoe_dev), - &po->pppoe_relay); + relay_po = get_item_by_addr(sock_net(sk), + &po->pppoe_relay); if (relay_po == NULL) goto abort_kfree; @@ -561,6 +556,7 @@ static int pppoe_release(struct socket *sock) struct sock *sk = sock->sk; struct pppox_sock *po; struct pppoe_net *pn; + struct net *net = NULL; if (!sk) return 0; @@ -571,24 +567,19 @@ static int pppoe_release(struct socket *sock) return -EBADF; } + po = pppox_sk(sk); + + if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) { + dev_put(po->pppoe_dev); + } + pppox_unbind_sock(sk); /* Signal the death of the socket. */ sk->sk_state = PPPOX_DEAD; - /* - * pppoe_flush_dev could lead to a race with - * this routine so we use flush_lock to eliminate - * such a case (we only need per-net specific data) - */ - spin_lock(&flush_lock); - po = pppox_sk(sk); - if (!po->pppoe_dev) { - spin_unlock(&flush_lock); - goto out; - } - pn = pppoe_pernet(dev_net(po->pppoe_dev)); - spin_unlock(&flush_lock); + net = sock_net(sk); + pn = pppoe_pernet(net); /* * protect "po" from concurrent updates @@ -596,19 +587,12 @@ static int pppoe_release(struct socket *sock) */ write_lock_bh(&pn->hash_lock); - po = pppox_sk(sk); if (stage_session(po->pppoe_pa.sid)) __delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote, po->pppoe_ifindex); - if (po->pppoe_dev) { - dev_put(po->pppoe_dev); - po->pppoe_dev = NULL; - } - write_unlock_bh(&pn->hash_lock); -out: sock_orphan(sk); sock->sk = NULL; @@ -625,8 +609,9 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, struct sock *sk = sock->sk; struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr; struct pppox_sock *po = pppox_sk(sk); - struct net_device *dev; struct pppoe_net *pn; + struct net_device *dev = NULL; + struct net *net = NULL; int error; lock_sock(sk); @@ -653,10 +638,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, if (stage_session(po->pppoe_pa.sid)) { pppox_unbind_sock(sk); if (po->pppoe_dev) { - pn = pppoe_pernet(dev_net(po->pppoe_dev)); + struct net *old = dev_net(po->pppoe_dev); + pn = pppoe_pernet(old); delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote, po->pppoe_ifindex); dev_put(po->pppoe_dev); + po->pppoe_dev = NULL; } memset(sk_pppox(po) + 1, 0, sizeof(struct pppox_sock) - sizeof(struct sock)); @@ -666,13 +653,14 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr, /* Re-bind in session stage only */ if (stage_session(sp->sa_addr.pppoe.sid)) { error = -ENODEV; - dev = dev_get_by_name(sock_net(sk), sp->sa_addr.pppoe.dev); + net = sock_net(sk); + dev = dev_get_by_name(net, sp->sa_addr.pppoe.dev); if (!dev) - goto end; + goto err_put; po->pppoe_dev = dev; po->pppoe_ifindex = dev->ifindex; - pn = pppoe_pernet(dev_net(dev)); + pn = pppoe_pernet(net); write_lock_bh(&pn->hash_lock); if (!(dev->flags & IFF_UP)) { write_unlock_bh(&pn->hash_lock); @@ -915,6 +903,8 @@ static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb) struct pppoe_hdr *ph; int data_len = skb->len; + lock_sock(sk); + if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) goto abort; @@ -944,10 +934,11 @@ static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb) po->pppoe_pa.remote, NULL, data_len); dev_queue_xmit(skb); - + release_sock(sk); return 1; abort: + release_sock(sk); kfree_skb(skb); return 1; } ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 20:54 ` Michal Ostrowski @ 2009-10-20 3:42 ` Eric Dumazet 2009-10-20 5:02 ` Cyrill Gorcunov 0 siblings, 1 reply; 34+ messages in thread From: Eric Dumazet @ 2009-10-20 3:42 UTC (permalink / raw) To: Michal Ostrowski Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows Michal Ostrowski a écrit : > Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED, > and all use cases now rely on the socket lock. Because of this, the > ref-count on the namespace held by the socket object suffices to hold > the namespace in existence and so we don't need to ref-count the > namespace in PPPoE. The flush_lock is gone. > Seems good ! But can we use lock_sock() in __pppoe_xmit() context ? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 3:42 ` Eric Dumazet @ 2009-10-20 5:02 ` Cyrill Gorcunov 2009-10-20 5:05 ` Eric Dumazet 0 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 5:02 UTC (permalink / raw) To: Eric Dumazet Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Michal Ostrowski a écrit : >> Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED, >> and all use cases now rely on the socket lock. Because of this, the >> ref-count on the namespace held by the socket object suffices to hold >> the namespace in existence and so we don't need to ref-count the >> namespace in PPPoE. The flush_lock is gone. >> > > Seems good ! > > But can we use lock_sock() in __pppoe_xmit() context ? > Eric, most probably i miss something, but how lock sock protect us from mtu changed via sysfs. This action calls change mtu notifier which doesn't care about sockets at all... ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 5:02 ` Cyrill Gorcunov @ 2009-10-20 5:05 ` Eric Dumazet 2009-10-20 5:17 ` Cyrill Gorcunov 0 siblings, 1 reply; 34+ messages in thread From: Eric Dumazet @ 2009-10-20 5:05 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows Cyrill Gorcunov a écrit : > On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> Michal Ostrowski a écrit : >>> Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED, >>> and all use cases now rely on the socket lock. Because of this, the >>> ref-count on the namespace held by the socket object suffices to hold >>> the namespace in existence and so we don't need to ref-count the >>> namespace in PPPoE. The flush_lock is gone. >>> >> Seems good ! >> >> But can we use lock_sock() in __pppoe_xmit() context ? >> > > Eric, most probably i miss something, but how lock sock protect us > from mtu changed via sysfs. This action calls change mtu notifier > which doesn't care about sockets at all... This ultimately calls pppoe_flush_dev() and this function takes care of taking appropriate sock_locks() on each sockets ? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 5:05 ` Eric Dumazet @ 2009-10-20 5:17 ` Cyrill Gorcunov 2009-10-20 6:04 ` Cyrill Gorcunov 0 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 5:17 UTC (permalink / raw) To: Eric Dumazet Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Cyrill Gorcunov a écrit : >> On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote: >>> Michal Ostrowski a écrit : >>>> Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED, >>>> and all use cases now rely on the socket lock. Because of this, the >>>> ref-count on the namespace held by the socket object suffices to hold >>>> the namespace in existence and so we don't need to ref-count the >>>> namespace in PPPoE. The flush_lock is gone. >>>> >>> Seems good ! >>> >>> But can we use lock_sock() in __pppoe_xmit() context ? >>> >> >> Eric, most probably i miss something, but how lock sock protect us >> from mtu changed via sysfs. This action calls change mtu notifier >> which doesn't care about sockets at all... > > This ultimately calls pppoe_flush_dev() and this function > takes care of taking appropriate sock_locks() on each sockets ? > This hold and lock socks but set pppoe_dev to null as well. I'll back later. And i need to reread the code. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 5:17 ` Cyrill Gorcunov @ 2009-10-20 6:04 ` Cyrill Gorcunov 0 siblings, 0 replies; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 6:04 UTC (permalink / raw) To: Eric Dumazet Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows On 10/20/09, Cyrill Gorcunov <gorcunov@gmail.com> wrote: > On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> >> This ultimately calls pppoe_flush_dev() and this function >> takes care of taking appropriate sock_locks() on each sockets ? >> > This hold and lock socks but set pppoe_dev to null as well. I'll back > later. And i need to reread the code. > from a second glance we have a race with pppoe_connect (err_put label) and pppoe_flush_dev. Connect lock socket but while flushing we may dev_put with null device. So as Eric pointed early the error path is racy and should be protected with flush_lock. Note that while flushing pppoe we may do dev_put without socket locked. This should be the corner case. Michal could you add this lock in pppoe_connect (i cant do this at moment). ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 18:44 ` Eric Dumazet 2009-10-19 19:29 ` Cyrill Gorcunov 2009-10-19 20:54 ` Michal Ostrowski @ 2009-10-19 20:57 ` Cyrill Gorcunov 2009-10-19 21:22 ` Michal Ostrowski 2 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-19 20:57 UTC (permalink / raw) To: Eric Dumazet Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows [Eric Dumazet - Mon, Oct 19, 2009 at 08:44:52PM +0200] ... | | There is still a race here, since you do a dev_put(po->ppoe_dev); without any lock held | | So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time. | | In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check | all occurences of po->ppoe_dev use in the code and check if appropriate locking is done. | | pppoe_rcv_core() is not safe | pppoe_ioctl() is not safe | pppoe_sendmsg() is not safe | __pppoe_xmit() is not safe | Eric, Michal, please take a look (compile-test only). There is still unclear moment for NETDEV_CHANGEMTU since one could be doing this in say endless loop from userspace calling device to flush all the time which implies dev_put as a result :( Comments are welcome (and complains as well). I'll be able to continue handling (or reply to mail) tomorrow evening only. Anyway -- this patch is ugly enough but may happen to work. -- Cyrill --- drivers/net/pppoe.c | 79 ++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 62 insertions(+), 17 deletions(-) Index: linux-2.6.git/drivers/net/pppoe.c ===================================================================== --- linux-2.6.git.orig/drivers/net/pppoe.c +++ linux-2.6.git/drivers/net/pppoe.c @@ -386,14 +386,19 @@ static struct notifier_block pppoe_notif static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb) { struct pppox_sock *po = pppox_sk(sk); - struct pppox_sock *relay_po; + struct pppox_sock *relay_po = NULL; + struct net *net = NULL; if (sk->sk_state & PPPOX_BOUND) { ppp_input(&po->chan, skb); } else if (sk->sk_state & PPPOX_RELAY) { - relay_po = get_item_by_addr(dev_net(po->pppoe_dev), - &po->pppoe_relay); - if (relay_po == NULL) + spin_lock(&flush_lock); + if (po->pppoe_dev) + net = dev_net(po->pppoe_dev); + spin_unlock(&flush_lock); + if (net) + relay_po = get_item_by_addr(net, &po->pppoe_relay); + if (!net || !relay_po) goto abort_kfree; if ((sk_pppox(relay_po)->sk_state & PPPOX_CONNECTED) == 0) @@ -652,12 +657,15 @@ static int pppoe_connect(struct socket * /* Delete the old binding */ if (stage_session(po->pppoe_pa.sid)) { pppox_unbind_sock(sk); + spin_lock(&flush_lock); if (po->pppoe_dev) { pn = pppoe_pernet(dev_net(po->pppoe_dev)); delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote, po->pppoe_ifindex); dev_put(po->pppoe_dev); + po->pppoe_dev = NULL; } + spin_unlock(&flush_lock); memset(sk_pppox(po) + 1, 0, sizeof(struct pppox_sock) - sizeof(struct sock)); sk->sk_state = PPPOX_NONE; @@ -670,7 +678,9 @@ static int pppoe_connect(struct socket * if (!dev) goto end; + spin_lock(&flush_lock); po->pppoe_dev = dev; + spin_unlock(&flush_lock); po->pppoe_ifindex = dev->ifindex; pn = pppoe_pernet(dev_net(dev)); write_lock_bh(&pn->hash_lock); @@ -708,10 +718,12 @@ end: release_sock(sk); return error; err_put: + spin_lock(&flush_lock); if (po->pppoe_dev) { dev_put(po->pppoe_dev); po->pppoe_dev = NULL; } + spin_unlock(&flush_lock); goto end; } @@ -738,6 +750,7 @@ static int pppoe_ioctl(struct socket *so { struct sock *sk = sock->sk; struct pppox_sock *po = pppox_sk(sk); + unsigned int mtu = 0; int val; int err; @@ -746,11 +759,19 @@ static int pppoe_ioctl(struct socket *so err = -ENXIO; if (!(sk->sk_state & PPPOX_CONNECTED)) break; - + err = -EBUSY; + if (!spin_trylock(&flush_lock)) + break; + err = -ENODEV; + if (po->pppoe_dev) { + mtu = po->pppoe_dev->mtu; + err = 0; + } + spin_unlock(&flush_lock); + if (err) + break; err = -EFAULT; - if (put_user(po->pppoe_dev->mtu - - sizeof(struct pppoe_hdr) - - PPP_HDRLEN, + if (put_user(mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN, (int __user *)arg)) break; err = 0; @@ -760,14 +781,22 @@ static int pppoe_ioctl(struct socket *so err = -ENXIO; if (!(sk->sk_state & PPPOX_CONNECTED)) break; - + err = -EBUSY; + if (!spin_trylock(&flush_lock)) + break; + err = -ENODEV; + if (po->pppoe_dev) { + mtu = po->pppoe_dev->mtu; + err = 0; + } + spin_unlock(&flush_lock); + if (err) + break; err = -EFAULT; if (get_user(val, (int __user *)arg)) break; - if (val < (po->pppoe_dev->mtu - - sizeof(struct pppoe_hdr) - - PPP_HDRLEN)) + if (val < (mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN)) err = 0; else err = -EINVAL; @@ -842,7 +871,7 @@ static int pppoe_sendmsg(struct kiocb *i int error; struct pppoe_hdr hdr; struct pppoe_hdr *ph; - struct net_device *dev; + struct net_device *dev = NULL; char *start; lock_sock(sk); @@ -856,7 +885,15 @@ static int pppoe_sendmsg(struct kiocb *i hdr.code = 0; hdr.sid = po->num; + spin_lock(&flush_lock); dev = po->pppoe_dev; + if (!dev) { + spin_unlock(&flush_lock); + error = -ENODEV; + goto end; + } + dev_hold(dev); + spin_unlock(&flush_lock); error = -EMSGSIZE; if (total_len > (dev->mtu + dev->hard_header_len)) @@ -899,6 +936,8 @@ static int pppoe_sendmsg(struct kiocb *i dev_queue_xmit(skb); end: + if (dev) + dev_put(dev); release_sock(sk); return error; } @@ -911,15 +950,19 @@ end: static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb) { struct pppox_sock *po = pppox_sk(sk); - struct net_device *dev = po->pppoe_dev; + struct net_device *dev; struct pppoe_hdr *ph; int data_len = skb->len; if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) goto abort; - if (!dev) - goto abort; + spin_lock(&flush_lock); + if (!po->pppoe_dev) + goto abort_unlock; + dev = po->pppoe_dev; + dev_hold(dev); + spin_unlock(&flush_lock); /* Copy the data if there is no space for the header or if it's * read-only. @@ -942,11 +985,13 @@ static int __pppoe_xmit(struct sock *sk, dev_hard_header(skb, dev, ETH_P_PPP_SES, po->pppoe_pa.remote, NULL, data_len); - dev_queue_xmit(skb); + dev_put(dev); return 1; +abort_unlock: + spin_unlock(&flush_lock); abort: kfree_skb(skb); return 1; ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 20:57 ` Cyrill Gorcunov @ 2009-10-19 21:22 ` Michal Ostrowski 2009-10-20 0:08 ` Denys Fedoryschenko 2009-10-20 2:28 ` David Miller 0 siblings, 2 replies; 34+ messages in thread From: Michal Ostrowski @ 2009-10-19 21:22 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Eric Dumazet, Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows I'm assuming that there was a race in us sending patches at nearly the same time I'm convinced now that the flush_lock can die, and the patch I sent out kills it. If we were to have to have a global lock, you might as well just use the flush_lock for everything. Note that there's a bunch of checks for sk->sk_state vs PPPOX_CONNECTED. PPPOX_CONNECTED being set implies that po->pppoe_dev is valid. So, by ensuring that checks of sk->sk_state and po->pppoe_dev are done under the protection of lock_sock()/release_sock() it is not even necessary to check po->pppoe_dev==NULL. In fact, the NULL checks you're adding are equivalent to testing sk->sk_state vs PPPOX_CONNECTED, and thus you're replacing the synchronization provided by the socket lock with flush_lock. -- Michal Ostrowski mostrows@gmail.com On Mon, Oct 19, 2009 at 3:57 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote: > [Eric Dumazet - Mon, Oct 19, 2009 at 08:44:52PM +0200] > ... > | > | There is still a race here, since you do a dev_put(po->ppoe_dev); without any lock held > | > | So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time. > | > | In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check > | all occurences of po->ppoe_dev use in the code and check if appropriate locking is done. > | > | pppoe_rcv_core() is not safe > | pppoe_ioctl() is not safe > | pppoe_sendmsg() is not safe > | __pppoe_xmit() is not safe > | > > Eric, Michal, please take a look (compile-test only). > There is still unclear moment for NETDEV_CHANGEMTU > since one could be doing this in say endless loop from > userspace calling device to flush all the time which > implies dev_put as a result :( > > Comments are welcome (and complains as well). I'll be able to > continue handling (or reply to mail) tomorrow evening only. > Anyway -- this patch is ugly enough but may happen to work. > > -- Cyrill > --- > drivers/net/pppoe.c | 79 ++++++++++++++++++++++++++++++++++++++++------------ > 1 file changed, 62 insertions(+), 17 deletions(-) > > Index: linux-2.6.git/drivers/net/pppoe.c > ===================================================================== > --- linux-2.6.git.orig/drivers/net/pppoe.c > +++ linux-2.6.git/drivers/net/pppoe.c > @@ -386,14 +386,19 @@ static struct notifier_block pppoe_notif > static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb) > { > struct pppox_sock *po = pppox_sk(sk); > - struct pppox_sock *relay_po; > + struct pppox_sock *relay_po = NULL; > + struct net *net = NULL; > > if (sk->sk_state & PPPOX_BOUND) { > ppp_input(&po->chan, skb); > } else if (sk->sk_state & PPPOX_RELAY) { > - relay_po = get_item_by_addr(dev_net(po->pppoe_dev), > - &po->pppoe_relay); > - if (relay_po == NULL) > + spin_lock(&flush_lock); > + if (po->pppoe_dev) > + net = dev_net(po->pppoe_dev); > + spin_unlock(&flush_lock); > + if (net) > + relay_po = get_item_by_addr(net, &po->pppoe_relay); > + if (!net || !relay_po) > goto abort_kfree; > > if ((sk_pppox(relay_po)->sk_state & PPPOX_CONNECTED) == 0) > @@ -652,12 +657,15 @@ static int pppoe_connect(struct socket * > /* Delete the old binding */ > if (stage_session(po->pppoe_pa.sid)) { > pppox_unbind_sock(sk); > + spin_lock(&flush_lock); > if (po->pppoe_dev) { > pn = pppoe_pernet(dev_net(po->pppoe_dev)); > delete_item(pn, po->pppoe_pa.sid, > po->pppoe_pa.remote, po->pppoe_ifindex); > dev_put(po->pppoe_dev); > + po->pppoe_dev = NULL; > } > + spin_unlock(&flush_lock); > memset(sk_pppox(po) + 1, 0, > sizeof(struct pppox_sock) - sizeof(struct sock)); > sk->sk_state = PPPOX_NONE; > @@ -670,7 +678,9 @@ static int pppoe_connect(struct socket * > if (!dev) > goto end; > > + spin_lock(&flush_lock); > po->pppoe_dev = dev; > + spin_unlock(&flush_lock); > po->pppoe_ifindex = dev->ifindex; > pn = pppoe_pernet(dev_net(dev)); > write_lock_bh(&pn->hash_lock); > @@ -708,10 +718,12 @@ end: > release_sock(sk); > return error; > err_put: > + spin_lock(&flush_lock); > if (po->pppoe_dev) { > dev_put(po->pppoe_dev); > po->pppoe_dev = NULL; > } > + spin_unlock(&flush_lock); > goto end; > } > > @@ -738,6 +750,7 @@ static int pppoe_ioctl(struct socket *so > { > struct sock *sk = sock->sk; > struct pppox_sock *po = pppox_sk(sk); > + unsigned int mtu = 0; > int val; > int err; > > @@ -746,11 +759,19 @@ static int pppoe_ioctl(struct socket *so > err = -ENXIO; > if (!(sk->sk_state & PPPOX_CONNECTED)) > break; > - > + err = -EBUSY; > + if (!spin_trylock(&flush_lock)) > + break; > + err = -ENODEV; > + if (po->pppoe_dev) { > + mtu = po->pppoe_dev->mtu; > + err = 0; > + } > + spin_unlock(&flush_lock); > + if (err) > + break; > err = -EFAULT; > - if (put_user(po->pppoe_dev->mtu - > - sizeof(struct pppoe_hdr) - > - PPP_HDRLEN, > + if (put_user(mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN, > (int __user *)arg)) > break; > err = 0; > @@ -760,14 +781,22 @@ static int pppoe_ioctl(struct socket *so > err = -ENXIO; > if (!(sk->sk_state & PPPOX_CONNECTED)) > break; > - > + err = -EBUSY; > + if (!spin_trylock(&flush_lock)) > + break; > + err = -ENODEV; > + if (po->pppoe_dev) { > + mtu = po->pppoe_dev->mtu; > + err = 0; > + } > + spin_unlock(&flush_lock); > + if (err) > + break; > err = -EFAULT; > if (get_user(val, (int __user *)arg)) > break; > > - if (val < (po->pppoe_dev->mtu > - - sizeof(struct pppoe_hdr) > - - PPP_HDRLEN)) > + if (val < (mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN)) > err = 0; > else > err = -EINVAL; > @@ -842,7 +871,7 @@ static int pppoe_sendmsg(struct kiocb *i > int error; > struct pppoe_hdr hdr; > struct pppoe_hdr *ph; > - struct net_device *dev; > + struct net_device *dev = NULL; > char *start; > > lock_sock(sk); > @@ -856,7 +885,15 @@ static int pppoe_sendmsg(struct kiocb *i > hdr.code = 0; > hdr.sid = po->num; > > + spin_lock(&flush_lock); > dev = po->pppoe_dev; > + if (!dev) { > + spin_unlock(&flush_lock); > + error = -ENODEV; > + goto end; > + } > + dev_hold(dev); > + spin_unlock(&flush_lock); > > error = -EMSGSIZE; > if (total_len > (dev->mtu + dev->hard_header_len)) > @@ -899,6 +936,8 @@ static int pppoe_sendmsg(struct kiocb *i > dev_queue_xmit(skb); > > end: > + if (dev) > + dev_put(dev); > release_sock(sk); > return error; > } > @@ -911,15 +950,19 @@ end: > static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb) > { > struct pppox_sock *po = pppox_sk(sk); > - struct net_device *dev = po->pppoe_dev; > + struct net_device *dev; > struct pppoe_hdr *ph; > int data_len = skb->len; > > if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) > goto abort; > > - if (!dev) > - goto abort; > + spin_lock(&flush_lock); > + if (!po->pppoe_dev) > + goto abort_unlock; > + dev = po->pppoe_dev; > + dev_hold(dev); > + spin_unlock(&flush_lock); > > /* Copy the data if there is no space for the header or if it's > * read-only. > @@ -942,11 +985,13 @@ static int __pppoe_xmit(struct sock *sk, > > dev_hard_header(skb, dev, ETH_P_PPP_SES, > po->pppoe_pa.remote, NULL, data_len); > - > dev_queue_xmit(skb); > + dev_put(dev); > > return 1; > > +abort_unlock: > + spin_unlock(&flush_lock); > abort: > kfree_skb(skb); > return 1; > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 21:22 ` Michal Ostrowski @ 2009-10-20 0:08 ` Denys Fedoryschenko 2009-10-20 3:04 ` Cyrill Gorcunov 2009-10-20 2:28 ` David Miller 1 sibling, 1 reply; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-20 0:08 UTC (permalink / raw) To: Michal Ostrowski Cc: Cyrill Gorcunov, Eric Dumazet, netdev, linux-ppp, paulus, mostrows On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote: > I'm assuming that there was a race in us sending patches at nearly the same > time I'm convinced now that the flush_lock can die, and the patch I sent > out kills it. o_O I am drowning in patches. Just let me know which one to test :-) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 0:08 ` Denys Fedoryschenko @ 2009-10-20 3:04 ` Cyrill Gorcunov 2009-10-20 11:36 ` Denys Fedoryschenko 0 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 3:04 UTC (permalink / raw) To: Denys Fedoryschenko Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote: > On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote: >> I'm assuming that there was a race in us sending patches at nearly the >> same >> time I'm convinced now that the flush_lock can die, and the patch I sent >> out kills it. > o_O > > I am drowning in patches. Just let me know which one to test :-) > Oh ;) Try out latest Michal's patch (and then mine). I'll continue digg this issue at next spare time slot. Thanks! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 3:04 ` Cyrill Gorcunov @ 2009-10-20 11:36 ` Denys Fedoryschenko 2009-10-20 11:50 ` Cyrill Gorcunov 0 siblings, 1 reply; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-20 11:36 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows On Tuesday 20 October 2009 06:04:35 Cyrill Gorcunov wrote: > On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote: > > On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote: > >> I'm assuming that there was a race in us sending patches at nearly the > >> same > >> time I'm convinced now that the flush_lock can die, and the patch I sent > >> out kills it. > > > > o_O > > > > I am drowning in patches. Just let me know which one to test :-) > > Oh ;) Try out latest Michal's patch (and then mine). I'll continue > digg this issue at next spare time slot. Thanks! Tried your patch, panic almost immediately Here is a text of panic message over netconsole http://www.nuclearcat.com/files/panic_pppoe3.txt It is different now, before it was pppoe_device_event, now in pppoe_rcv ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 11:36 ` Denys Fedoryschenko @ 2009-10-20 11:50 ` Cyrill Gorcunov 2009-10-20 11:52 ` Denys Fedoryschenko 0 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 11:50 UTC (permalink / raw) To: Denys Fedoryschenko Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote: > On Tuesday 20 October 2009 06:04:35 Cyrill Gorcunov wrote: >> On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote: >> > On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote: >> >> I'm assuming that there was a race in us sending patches at nearly the >> >> same >> >> time I'm convinced now that the flush_lock can die, and the patch I >> >> sent >> >> out kills it. >> > >> > o_O >> > >> > I am drowning in patches. Just let me know which one to test :-) >> >> Oh ;) Try out latest Michal's patch (and then mine). I'll continue >> digg this issue at next spare time slot. Thanks! > Tried your patch, panic almost immediately > Here is a text of panic message over netconsole > http://www.nuclearcat.com/files/panic_pppoe3.txt > It is different now, before it was pppoe_device_event, now in pppoe_rcv > > Thanks a lot! I'll back with new one in a couple of hours. Meanwhile i suppose you may try Michal's patch as well. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 11:50 ` Cyrill Gorcunov @ 2009-10-20 11:52 ` Denys Fedoryschenko 2009-10-20 13:42 ` Cyrill Gorcunov 0 siblings, 1 reply; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-20 11:52 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows On Tuesday 20 October 2009 14:50:09 Cyrill Gorcunov wrote: > On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote: > > On Tuesday 20 October 2009 06:04:35 Cyrill Gorcunov wrote: > >> On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote: > >> > On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote: > >> >> I'm assuming that there was a race in us sending patches at nearly > >> >> the same > >> >> time I'm convinced now that the flush_lock can die, and the patch I > >> >> sent > >> >> out kills it. > >> > > >> > o_O > >> > > >> > I am drowning in patches. Just let me know which one to test :-) > >> > >> Oh ;) Try out latest Michal's patch (and then mine). I'll continue > >> digg this issue at next spare time slot. Thanks! > > > > Tried your patch, panic almost immediately > > Here is a text of panic message over netconsole > > http://www.nuclearcat.com/files/panic_pppoe3.txt > > It is different now, before it was pppoe_device_event, now in pppoe_rcv > > Thanks a lot! I'll back with new one in a couple of hours. Meanwhile i > suppose you may try Michal's patch as well. I did, it didn't help. Maybe i can run some debugging options in kernel? Also i can add debug(printk) lines in kernel if you want, to see where is bug appearing. Note, i told to Michal, so will tell here, this pc is hyperthreading P4, as i know it is very good to trigger various SMP race conditions. I can try also it with nosmp if u want. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 11:52 ` Denys Fedoryschenko @ 2009-10-20 13:42 ` Cyrill Gorcunov 2009-10-20 13:50 ` Denys Fedoryschenko 0 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 13:42 UTC (permalink / raw) To: Denys Fedoryschenko Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows [Denys Fedoryschenko - Tue, Oct 20, 2009 at 02:52:58PM +0300] ... | > Thanks a lot! I'll back with new one in a couple of hours. Meanwhile i | > suppose you may try Michal's patch as well. | I did, it didn't help. | Maybe i can run some debugging options in kernel? | Also i can add debug(printk) lines in kernel if you want, to see where is bug | appearing. | Note, i told to Michal, so will tell here, this pc is hyperthreading P4, as i | know it is very good to trigger various SMP race conditions. | I can try also it with nosmp if u want. | Thanks Denys, I'm preparing new patch (just back from office and had no inet connection that is why reply is delayed, sorry). -- Cyrill ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 13:42 ` Cyrill Gorcunov @ 2009-10-20 13:50 ` Denys Fedoryschenko 2009-10-20 13:59 ` Cyrill Gorcunov 0 siblings, 1 reply; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-20 13:50 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows On Tuesday 20 October 2009 16:42:17 Cyrill Gorcunov wrote: > [Denys Fedoryschenko - Tue, Oct 20, 2009 at 02:52:58PM +0300] > ... > > | > Thanks a lot! I'll back with new one in a couple of hours. Meanwhile i > | > suppose you may try Michal's patch as well. > | > | I did, it didn't help. > | Maybe i can run some debugging options in kernel? > | Also i can add debug(printk) lines in kernel if you want, to see where is > | bug appearing. > | Note, i told to Michal, so will tell here, this pc is hyperthreading P4, > | as i know it is very good to trigger various SMP race conditions. > | I can try also it with nosmp if u want. > > Thanks Denys, I'm preparing new patch (just back from office > and had no inet connection that is why reply is delayed, sorry). There is no problem at all. This rename operation is just future operation and host is redundant, so i can do tests on it anytime. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 13:50 ` Denys Fedoryschenko @ 2009-10-20 13:59 ` Cyrill Gorcunov 2009-10-20 14:20 ` Denys Fedoryschenko 0 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 13:59 UTC (permalink / raw) To: Denys Fedoryschenko Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows [Denys Fedoryschenko - Tue, Oct 20, 2009 at 04:50:09PM +0300] | ... | > | > Thanks Denys, I'm preparing new patch (just back from office | > and had no inet connection that is why reply is delayed, sorry). | There is no problem at all. | This rename operation is just future operation and host is redundant, so i can | do tests on it anytime. | ok, here is it, please try (it's still a draft version though) -- Cyrill --- drivers/net/pppoe.c | 106 +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 81 insertions(+), 25 deletions(-) Index: linux-2.6.git/drivers/net/pppoe.c ===================================================================== --- linux-2.6.git.orig/drivers/net/pppoe.c +++ linux-2.6.git/drivers/net/pppoe.c @@ -313,8 +313,8 @@ static void pppoe_flush_dev(struct net_d sk = sk_pppox(po); spin_lock(&flush_lock); po->pppoe_dev = NULL; - spin_unlock(&flush_lock); dev_put(dev); + spin_unlock(&flush_lock); /* We always grab the socket lock, followed by the * hash_lock, in that order. Since we should @@ -386,13 +386,21 @@ static struct notifier_block pppoe_notif static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb) { struct pppox_sock *po = pppox_sk(sk); - struct pppox_sock *relay_po; + struct pppox_sock *relay_po = NULL; + struct net_device *dev = NULL; if (sk->sk_state & PPPOX_BOUND) { ppp_input(&po->chan, skb); } else if (sk->sk_state & PPPOX_RELAY) { - relay_po = get_item_by_addr(dev_net(po->pppoe_dev), - &po->pppoe_relay); + struct pppoe_net *pn = pppoe_pernet(sock_net(sk)); + read_lock_bh(&pn->hash_lock); + dev = po->pppoe_dev; + if (dev) { + dev_hold(dev); + relay_po = get_item_by_addr(dev_net(dev), + &po->pppoe_relay); + } + read_unlock_bh(&pn->hash_lock); if (relay_po == NULL) goto abort_kfree; @@ -401,6 +409,7 @@ static int pppoe_rcv_core(struct sock *s if (!__pppoe_xmit(sk_pppox(relay_po), skb)) goto abort_put; + dev_put(dev); } else { if (sock_queue_rcv_skb(sk, skb)) goto abort_kfree; @@ -412,6 +421,8 @@ abort_put: sock_put(sk_pppox(relay_po)); abort_kfree: + if (dev) + dev_put(dev); kfree_skb(skb); return NET_RX_DROP; } @@ -625,8 +636,8 @@ static int pppoe_connect(struct socket * struct sock *sk = sock->sk; struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr; struct pppox_sock *po = pppox_sk(sk); - struct net_device *dev; - struct pppoe_net *pn; + struct net_device *dev = NULL; + struct pppoe_net *pn = NULL; int error; lock_sock(sk); @@ -652,12 +663,15 @@ static int pppoe_connect(struct socket * /* Delete the old binding */ if (stage_session(po->pppoe_pa.sid)) { pppox_unbind_sock(sk); + spin_lock(&flush_lock); if (po->pppoe_dev) { pn = pppoe_pernet(dev_net(po->pppoe_dev)); delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote, po->pppoe_ifindex); dev_put(po->pppoe_dev); + po->pppoe_dev = NULL; } + spin_unlock(&flush_lock); memset(sk_pppox(po) + 1, 0, sizeof(struct pppox_sock) - sizeof(struct sock)); sk->sk_state = PPPOX_NONE; @@ -670,10 +684,11 @@ static int pppoe_connect(struct socket * if (!dev) goto end; + write_lock_bh(&pn->hash_lock); + dev_hold(dev); po->pppoe_dev = dev; po->pppoe_ifindex = dev->ifindex; pn = pppoe_pernet(dev_net(dev)); - write_lock_bh(&pn->hash_lock); if (!(dev->flags & IFF_UP)) { write_unlock_bh(&pn->hash_lock); goto err_put; @@ -700,6 +715,7 @@ static int pppoe_connect(struct socket * goto err_put; sk->sk_state = PPPOX_CONNECTED; + dev_put(dev); } po->num = sp->sa_addr.pppoe.sid; @@ -708,10 +724,13 @@ end: release_sock(sk); return error; err_put: + dev_put(dev); + write_lock_bh(&pn->hash_lock); if (po->pppoe_dev) { dev_put(po->pppoe_dev); po->pppoe_dev = NULL; } + write_unlock_bh(&pn->hash_lock); goto end; } @@ -738,6 +757,8 @@ static int pppoe_ioctl(struct socket *so { struct sock *sk = sock->sk; struct pppox_sock *po = pppox_sk(sk); + struct pppoe_net *pn = pppoe_pernet(sock_net(sk)); + unsigned int mtu = 0; int val; int err; @@ -746,11 +767,17 @@ static int pppoe_ioctl(struct socket *so err = -ENXIO; if (!(sk->sk_state & PPPOX_CONNECTED)) break; - + read_lock_bh(&pn->hash_lock); + err = -ENODEV; + if (po->pppoe_dev) { + mtu = po->pppoe_dev->mtu; + err = 0; + } + read_unlock_bh(&pn->hash_lock); + if (err) + break; err = -EFAULT; - if (put_user(po->pppoe_dev->mtu - - sizeof(struct pppoe_hdr) - - PPP_HDRLEN, + if (put_user(mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN, (int __user *)arg)) break; err = 0; @@ -761,13 +788,21 @@ static int pppoe_ioctl(struct socket *so if (!(sk->sk_state & PPPOX_CONNECTED)) break; + read_lock_bh(&pn->hash_lock); + err = -ENODEV; + if (po->pppoe_dev) { + mtu = po->pppoe_dev->mtu; + err = 0; + } + read_unlock_bh(&pn->hash_lock); + if (err) + break; + err = -EFAULT; if (get_user(val, (int __user *)arg)) break; - if (val < (po->pppoe_dev->mtu - - sizeof(struct pppoe_hdr) - - PPP_HDRLEN)) + if (val < (mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN)) err = 0; else err = -EINVAL; @@ -839,10 +874,11 @@ static int pppoe_sendmsg(struct kiocb *i struct sk_buff *skb; struct sock *sk = sock->sk; struct pppox_sock *po = pppox_sk(sk); + struct pppoe_net *pn = pppoe_pernet(sock_net(sk)); int error; struct pppoe_hdr hdr; struct pppoe_hdr *ph; - struct net_device *dev; + struct net_device *dev = NULL; char *start; lock_sock(sk); @@ -856,18 +892,27 @@ static int pppoe_sendmsg(struct kiocb *i hdr.code = 0; hdr.sid = po->num; - dev = po->pppoe_dev; + read_lock_bh(&pn->hash_lock); + error = -ENODEV; + if (po->pppoe_dev) { + dev = po->pppoe_dev; + dev_hold(dev); + error = 0; + } + read_unlock_bh(&pn->hash_lock); + if (error) + goto end; error = -EMSGSIZE; if (total_len > (dev->mtu + dev->hard_header_len)) - goto end; + goto end_put; skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32, 0, GFP_KERNEL); if (!skb) { error = -ENOMEM; - goto end; + goto end_put; } /* Reserve space for headers. */ @@ -885,7 +930,7 @@ static int pppoe_sendmsg(struct kiocb *i error = memcpy_fromiovec(start, m->msg_iov, total_len); if (error < 0) { kfree_skb(skb); - goto end; + goto end_put; } error = total_len; @@ -898,6 +943,8 @@ static int pppoe_sendmsg(struct kiocb *i dev_queue_xmit(skb); +end_put: + dev_put(dev); end: release_sock(sk); return error; @@ -911,21 +958,28 @@ end: static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb) { struct pppox_sock *po = pppox_sk(sk); - struct net_device *dev = po->pppoe_dev; + struct pppoe_net *pn = pppoe_pernet(sock_net(sk)); + struct net_device *dev; struct pppoe_hdr *ph; int data_len = skb->len; - if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) + read_lock_bh(&pn->hash_lock); + if (!po->pppoe_dev) { + read_unlock_bh(&pn->hash_lock); goto abort; + } + dev = po->pppoe_dev; + dev_hold(dev); + read_unlock_bh(&pn->hash_lock); - if (!dev) - goto abort; + if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) + goto abort_put; /* Copy the data if there is no space for the header or if it's * read-only. */ if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len)) - goto abort; + goto abort_put; __skb_push(skb, sizeof(*ph)); skb_reset_network_header(skb); @@ -944,9 +998,11 @@ static int __pppoe_xmit(struct sock *sk, po->pppoe_pa.remote, NULL, data_len); dev_queue_xmit(skb); - + dev_put(dev); return 1; +abort_put: + dev_put(dev); abort: kfree_skb(skb); return 1; ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 13:59 ` Cyrill Gorcunov @ 2009-10-20 14:20 ` Denys Fedoryschenko 2009-10-20 14:23 ` Cyrill Gorcunov 2009-10-20 19:08 ` Cyrill Gorcunov 0 siblings, 2 replies; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-20 14:20 UTC (permalink / raw) To: Cyrill Gorcunov Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows It panics almost immediately on boot(even on old operations that was stable, seems on first pppoe customer login attempt), i will rebuild kernel and if interesting will try to get panic message. On Tuesday 20 October 2009 16:59:20 Cyrill Gorcunov wrote: > [Denys Fedoryschenko - Tue, Oct 20, 2009 at 04:50:09PM +0300] > > ... > > | > Thanks Denys, I'm preparing new patch (just back from office > | > and had no inet connection that is why reply is delayed, sorry). > | > | There is no problem at all. > | This rename operation is just future operation and host is redundant, so > | i can do tests on it anytime. > > ok, here is it, please try (it's still a draft version though) > > -- Cyrill > --- > drivers/net/pppoe.c | 106 > +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 81 > insertions(+), 25 deletions(-) > > Index: linux-2.6.git/drivers/net/pppoe.c > ===================================================================== > --- linux-2.6.git.orig/drivers/net/pppoe.c > +++ linux-2.6.git/drivers/net/pppoe.c > @@ -313,8 +313,8 @@ static void pppoe_flush_dev(struct net_d > sk = sk_pppox(po); > spin_lock(&flush_lock); > po->pppoe_dev = NULL; > - spin_unlock(&flush_lock); > dev_put(dev); > + spin_unlock(&flush_lock); > > /* We always grab the socket lock, followed by the > * hash_lock, in that order. Since we should > @@ -386,13 +386,21 @@ static struct notifier_block pppoe_notif > static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb) > { > struct pppox_sock *po = pppox_sk(sk); > - struct pppox_sock *relay_po; > + struct pppox_sock *relay_po = NULL; > + struct net_device *dev = NULL; > > if (sk->sk_state & PPPOX_BOUND) { > ppp_input(&po->chan, skb); > } else if (sk->sk_state & PPPOX_RELAY) { > - relay_po = get_item_by_addr(dev_net(po->pppoe_dev), > - &po->pppoe_relay); > + struct pppoe_net *pn = pppoe_pernet(sock_net(sk)); > + read_lock_bh(&pn->hash_lock); > + dev = po->pppoe_dev; > + if (dev) { > + dev_hold(dev); > + relay_po = get_item_by_addr(dev_net(dev), > + &po->pppoe_relay); > + } > + read_unlock_bh(&pn->hash_lock); > if (relay_po == NULL) > goto abort_kfree; > > @@ -401,6 +409,7 @@ static int pppoe_rcv_core(struct sock *s > > if (!__pppoe_xmit(sk_pppox(relay_po), skb)) > goto abort_put; > + dev_put(dev); > } else { > if (sock_queue_rcv_skb(sk, skb)) > goto abort_kfree; > @@ -412,6 +421,8 @@ abort_put: > sock_put(sk_pppox(relay_po)); > > abort_kfree: > + if (dev) > + dev_put(dev); > kfree_skb(skb); > return NET_RX_DROP; > } > @@ -625,8 +636,8 @@ static int pppoe_connect(struct socket * > struct sock *sk = sock->sk; > struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr; > struct pppox_sock *po = pppox_sk(sk); > - struct net_device *dev; > - struct pppoe_net *pn; > + struct net_device *dev = NULL; > + struct pppoe_net *pn = NULL; > int error; > > lock_sock(sk); > @@ -652,12 +663,15 @@ static int pppoe_connect(struct socket * > /* Delete the old binding */ > if (stage_session(po->pppoe_pa.sid)) { > pppox_unbind_sock(sk); > + spin_lock(&flush_lock); > if (po->pppoe_dev) { > pn = pppoe_pernet(dev_net(po->pppoe_dev)); > delete_item(pn, po->pppoe_pa.sid, > po->pppoe_pa.remote, po->pppoe_ifindex); > dev_put(po->pppoe_dev); > + po->pppoe_dev = NULL; > } > + spin_unlock(&flush_lock); > memset(sk_pppox(po) + 1, 0, > sizeof(struct pppox_sock) - sizeof(struct sock)); > sk->sk_state = PPPOX_NONE; > @@ -670,10 +684,11 @@ static int pppoe_connect(struct socket * > if (!dev) > goto end; > > + write_lock_bh(&pn->hash_lock); > + dev_hold(dev); > po->pppoe_dev = dev; > po->pppoe_ifindex = dev->ifindex; > pn = pppoe_pernet(dev_net(dev)); > - write_lock_bh(&pn->hash_lock); > if (!(dev->flags & IFF_UP)) { > write_unlock_bh(&pn->hash_lock); > goto err_put; > @@ -700,6 +715,7 @@ static int pppoe_connect(struct socket * > goto err_put; > > sk->sk_state = PPPOX_CONNECTED; > + dev_put(dev); > } > > po->num = sp->sa_addr.pppoe.sid; > @@ -708,10 +724,13 @@ end: > release_sock(sk); > return error; > err_put: > + dev_put(dev); > + write_lock_bh(&pn->hash_lock); > if (po->pppoe_dev) { > dev_put(po->pppoe_dev); > po->pppoe_dev = NULL; > } > + write_unlock_bh(&pn->hash_lock); > goto end; > } > > @@ -738,6 +757,8 @@ static int pppoe_ioctl(struct socket *so > { > struct sock *sk = sock->sk; > struct pppox_sock *po = pppox_sk(sk); > + struct pppoe_net *pn = pppoe_pernet(sock_net(sk)); > + unsigned int mtu = 0; > int val; > int err; > > @@ -746,11 +767,17 @@ static int pppoe_ioctl(struct socket *so > err = -ENXIO; > if (!(sk->sk_state & PPPOX_CONNECTED)) > break; > - > + read_lock_bh(&pn->hash_lock); > + err = -ENODEV; > + if (po->pppoe_dev) { > + mtu = po->pppoe_dev->mtu; > + err = 0; > + } > + read_unlock_bh(&pn->hash_lock); > + if (err) > + break; > err = -EFAULT; > - if (put_user(po->pppoe_dev->mtu - > - sizeof(struct pppoe_hdr) - > - PPP_HDRLEN, > + if (put_user(mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN, > (int __user *)arg)) > break; > err = 0; > @@ -761,13 +788,21 @@ static int pppoe_ioctl(struct socket *so > if (!(sk->sk_state & PPPOX_CONNECTED)) > break; > > + read_lock_bh(&pn->hash_lock); > + err = -ENODEV; > + if (po->pppoe_dev) { > + mtu = po->pppoe_dev->mtu; > + err = 0; > + } > + read_unlock_bh(&pn->hash_lock); > + if (err) > + break; > + > err = -EFAULT; > if (get_user(val, (int __user *)arg)) > break; > > - if (val < (po->pppoe_dev->mtu > - - sizeof(struct pppoe_hdr) > - - PPP_HDRLEN)) > + if (val < (mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN)) > err = 0; > else > err = -EINVAL; > @@ -839,10 +874,11 @@ static int pppoe_sendmsg(struct kiocb *i > struct sk_buff *skb; > struct sock *sk = sock->sk; > struct pppox_sock *po = pppox_sk(sk); > + struct pppoe_net *pn = pppoe_pernet(sock_net(sk)); > int error; > struct pppoe_hdr hdr; > struct pppoe_hdr *ph; > - struct net_device *dev; > + struct net_device *dev = NULL; > char *start; > > lock_sock(sk); > @@ -856,18 +892,27 @@ static int pppoe_sendmsg(struct kiocb *i > hdr.code = 0; > hdr.sid = po->num; > > - dev = po->pppoe_dev; > + read_lock_bh(&pn->hash_lock); > + error = -ENODEV; > + if (po->pppoe_dev) { > + dev = po->pppoe_dev; > + dev_hold(dev); > + error = 0; > + } > + read_unlock_bh(&pn->hash_lock); > + if (error) > + goto end; > > error = -EMSGSIZE; > if (total_len > (dev->mtu + dev->hard_header_len)) > - goto end; > + goto end_put; > > > skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32, > 0, GFP_KERNEL); > if (!skb) { > error = -ENOMEM; > - goto end; > + goto end_put; > } > > /* Reserve space for headers. */ > @@ -885,7 +930,7 @@ static int pppoe_sendmsg(struct kiocb *i > error = memcpy_fromiovec(start, m->msg_iov, total_len); > if (error < 0) { > kfree_skb(skb); > - goto end; > + goto end_put; > } > > error = total_len; > @@ -898,6 +943,8 @@ static int pppoe_sendmsg(struct kiocb *i > > dev_queue_xmit(skb); > > +end_put: > + dev_put(dev); > end: > release_sock(sk); > return error; > @@ -911,21 +958,28 @@ end: > static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb) > { > struct pppox_sock *po = pppox_sk(sk); > - struct net_device *dev = po->pppoe_dev; > + struct pppoe_net *pn = pppoe_pernet(sock_net(sk)); > + struct net_device *dev; > struct pppoe_hdr *ph; > int data_len = skb->len; > > - if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) > + read_lock_bh(&pn->hash_lock); > + if (!po->pppoe_dev) { > + read_unlock_bh(&pn->hash_lock); > goto abort; > + } > + dev = po->pppoe_dev; > + dev_hold(dev); > + read_unlock_bh(&pn->hash_lock); > > - if (!dev) > - goto abort; > + if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) > + goto abort_put; > > /* Copy the data if there is no space for the header or if it's > * read-only. > */ > if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len)) > - goto abort; > + goto abort_put; > > __skb_push(skb, sizeof(*ph)); > skb_reset_network_header(skb); > @@ -944,9 +998,11 @@ static int __pppoe_xmit(struct sock *sk, > po->pppoe_pa.remote, NULL, data_len); > > dev_queue_xmit(skb); > - > + dev_put(dev); > return 1; > > +abort_put: > + dev_put(dev); > abort: > kfree_skb(skb); > return 1; ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 14:20 ` Denys Fedoryschenko @ 2009-10-20 14:23 ` Cyrill Gorcunov 2009-10-20 19:08 ` Cyrill Gorcunov 1 sibling, 0 replies; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 14:23 UTC (permalink / raw) To: Denys Fedoryschenko Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows [Denys Fedoryschenko - Tue, Oct 20, 2009 at 05:20:00PM +0300] | | It panics almost immediately on boot(even on old operations that was stable, | seems on first pppoe customer login attempt), i will rebuild kernel and if | interesting will try to get panic message. | ... ok, thanks. I continue digging. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 14:20 ` Denys Fedoryschenko 2009-10-20 14:23 ` Cyrill Gorcunov @ 2009-10-20 19:08 ` Cyrill Gorcunov 2009-10-23 15:18 ` Cyrill Gorcunov 1 sibling, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-20 19:08 UTC (permalink / raw) To: Denys Fedoryschenko Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus, mostrows [Denys Fedoryschenko - Tue, Oct 20, 2009 at 05:20:00PM +0300] | It panics almost immediately on boot(even on old operations that was stable, | seems on first pppoe customer login attempt), i will rebuild kernel and if | interesting will try to get panic message. | ... Just to update status of the issue. The key moment is that pppoe_flush_dev may be called asynchronously (especially via sysfs on dev entry, for example we retrieve mtu of device and while we at it other process may update mtu via sysfs). So I'm returning pppoe_hash_lock back which should eliminate a number of lock complains and make locking scheme easier. All-in-one: I'm working on it. Just need some more time. -- Cyrill ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-20 19:08 ` Cyrill Gorcunov @ 2009-10-23 15:18 ` Cyrill Gorcunov 2009-10-25 18:10 ` Denys Fedoryschenko 0 siblings, 1 reply; 34+ messages in thread From: Cyrill Gorcunov @ 2009-10-23 15:18 UTC (permalink / raw) To: netdev Cc: Denys Fedoryschenko, Michal Ostrowski, Eric Dumazet, linux-ppp, paulus, mostrows [Cyrill Gorcunov - Tue, Oct 20, 2009 at 11:08:21PM +0400] ... | | Just to update status of the issue. The key moment is that pppoe_flush_dev | may be called asynchronously (especially via sysfs on dev entry, for example | we retrieve mtu of device and while we at it other process may update mtu | via sysfs). So I'm returning pppoe_hash_lock back which should eliminate a | number of lock complains and make locking scheme easier. All-in-one: I'm | working on it. Just need some more time. | | -- Cyrill Another status update -- the patch is under testing stage. Hope we will reveal proper patch soon (in a few days I guess). -- Cyrill ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-23 15:18 ` Cyrill Gorcunov @ 2009-10-25 18:10 ` Denys Fedoryschenko 0 siblings, 0 replies; 34+ messages in thread From: Denys Fedoryschenko @ 2009-10-25 18:10 UTC (permalink / raw) To: Cyrill Gorcunov Cc: netdev, Michal Ostrowski, Eric Dumazet, linux-ppp, paulus, mostrows On Friday 23 October 2009 18:18:37 Cyrill Gorcunov wrote: > [Cyrill Gorcunov - Tue, Oct 20, 2009 at 11:08:21PM +0400] > ... > > | Just to update status of the issue. The key moment is that > | pppoe_flush_dev may be called asynchronously (especially via sysfs on dev > | entry, for example we retrieve mtu of device and while we at it other > | process may update mtu via sysfs). So I'm returning pppoe_hash_lock back > | which should eliminate a number of lock complains and make locking scheme > | easier. All-in-one: I'm working on it. Just need some more time. > | > | -- Cyrill > > Another status update -- the patch is under testing stage. Hope we will > reveal proper patch soon (in a few days I guess). > > -- Cyrill During normal operation till now no issues (2 days uptime, 100-150 users online on this pppoe). Testing more some stress scenarios. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces 2009-10-19 21:22 ` Michal Ostrowski 2009-10-20 0:08 ` Denys Fedoryschenko @ 2009-10-20 2:28 ` David Miller 1 sibling, 0 replies; 34+ messages in thread From: David Miller @ 2009-10-20 2:28 UTC (permalink / raw) To: mostrows; +Cc: gorcunov, eric.dumazet, denys, netdev, linux-ppp, paulus, mostrows Please stop top posting! I want to follow this discussions efficiently and that's impossible if you reply BEFORE instead of AFTER the context of what you're replying to. Note that a day will come very soon that postings to these lists that do top posting will be flat out bounced back to you and never make it to the list at all. ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2009-10-25 18:10 UTC | newest] Thread overview: 34+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-10-18 21:02 kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces Denys Fedoryschenko 2009-10-19 3:34 ` Michal Ostrowski 2009-10-19 11:36 ` Denys Fedoryschenko 2009-10-19 12:01 ` Denys Fedoryschenko 2009-10-19 12:36 ` Eric Dumazet 2009-10-19 13:19 ` Michal Ostrowski 2009-10-19 15:50 ` Cyrill Gorcunov 2009-10-19 16:05 ` Michal Ostrowski 2009-10-19 17:12 ` Eric Dumazet 2009-10-19 18:07 ` Michal Ostrowski 2009-10-19 18:44 ` Eric Dumazet 2009-10-19 19:29 ` Cyrill Gorcunov 2009-10-19 20:54 ` Michal Ostrowski 2009-10-20 3:42 ` Eric Dumazet 2009-10-20 5:02 ` Cyrill Gorcunov 2009-10-20 5:05 ` Eric Dumazet 2009-10-20 5:17 ` Cyrill Gorcunov 2009-10-20 6:04 ` Cyrill Gorcunov 2009-10-19 20:57 ` Cyrill Gorcunov 2009-10-19 21:22 ` Michal Ostrowski 2009-10-20 0:08 ` Denys Fedoryschenko 2009-10-20 3:04 ` Cyrill Gorcunov 2009-10-20 11:36 ` Denys Fedoryschenko 2009-10-20 11:50 ` Cyrill Gorcunov 2009-10-20 11:52 ` Denys Fedoryschenko 2009-10-20 13:42 ` Cyrill Gorcunov 2009-10-20 13:50 ` Denys Fedoryschenko 2009-10-20 13:59 ` Cyrill Gorcunov 2009-10-20 14:20 ` Denys Fedoryschenko 2009-10-20 14:23 ` Cyrill Gorcunov 2009-10-20 19:08 ` Cyrill Gorcunov 2009-10-23 15:18 ` Cyrill Gorcunov 2009-10-25 18:10 ` Denys Fedoryschenko 2009-10-20 2:28 ` David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).