kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
@ 2009-10-18 21:02 Denys Fedoryschenko
  2009-10-19  3:34 ` Michal Ostrowski
  0 siblings, 1 reply; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-18 21:02 UTC (permalink / raw)
  To: netdev, linux-ppp, paulus, mostrows

I have server running as pppoe NAS.
Tried to rename customers without dropping pppd connections first, got panic 
after few seconds.
Panic triggerable at 2.6.30.4 and 2.6.31.4
pppoe users running on eth2
pppoe flags:
 1457 root     /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R 
172.16.1.2 -N 253 -C gpzone -S gpzone


Commands sequence that i think triggered that:

ip link set eth0 down
ip link set eth1 down
ip link set eth2 down
nameif etherx 00:16:76:8D:83:BA
nameif eth0 00:19:e0:72:4a:37
nameif eth1 00:19:e0:72:4a:4b

ip addr flush dev eth0
ip addr flush dev eth1
ip addr add X.X.X.X/29 dev eth0
ip addr add 192.168.2.177/24 dev eth0
ip addr add 192.168.0.1/32 dev eth1
ip addr add 127.0.0.0/8 dev lo
#ip link set eth0 up
ip link set eth0 up
ip link set eth1 up
ip link set lo up
ip route add 0.0.0.0/0 via X.X.X.X


[  103.428591] r8169: eth0: link up
[  103.430360] r8169: eth1: link up
[  113.361528] BUG: unable to handle kernel
NULL pointer dereference
at 0000018f
[  113.361717] IP:
[<f8868269>] pppoe_device_event+0x80/0x12c [pppoe]
[  113.361853] *pdpt = 000000003411a001
*pde = 0000000000000000
Oct 18 23:59:40 194.146.153.93
[  113.362012] Oops: 0000 [#1]
SMP
Oct 18 23:59:40 194.146.153.93
[  113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
[  113.362246] Modules linked in:
netconsole
configfs
act_skbedit
sch_ingress
sch_prio
cls_flow
cls_u32
em_meta
cls_basic
xt_dscp
xt_DSCP
ipt_REJECT
ts_bm
xt_string
xt_hl
ifb
cls_fw
sch_tbf
sch_htb
act_ipt
act_mirred
xt_MARK
pppoe
pppox
ppp_generic
slhc
xt_TCPMSS
xt_mark
xt_tcpudp
iptable_mangle
iptable_nat
nf_nat
rtc_cmos
nf_conntrack_ipv4
rtc_core
nf_conntrack
rtc_lib
nf_defrag_ipv4
iptable_filter
ip_tables
x_tables
8021q
garp
stp
llc
loop
sata_sil
pata_atiixp
pata_acpi
ata_generic
libata
8139cp
usb_storage
mtdblock
mtd_blkdevs
mtd
sr_mod
cdrom
tulip
r8169
sky2
via_velocity
via_rhine
sis900
ne2k_pci
8390
skge
tg3
libphy
8139too
e1000
e100
usbhid
ohci_hcd
uhci_hcd
ehci_hcd
usbcore
nls_base
Oct 18 23:59:40 194.146.153.93
[  113.362344]
[  113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7)
[  113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0
[  113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe]
[  113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680
[  113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c
[  113.362344]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0 
task.ti=f4145000)
[  113.362344] Stack:
[  113.362344]  f4fbe220
f4fbe000
f6cea5a0
f886a430
fffffff5
00000000
f4145e54
c01422b3
Oct 18 23:59:40 194.146.153.93
[  113.362344] <0>
f4fbe000
00000009
f8a457d8
f4fbe000
f8850190
00001091
f4145e64
c0142361
Oct 18 23:59:40 194.146.153.93
[  113.362344] <0>
ffffffff
00000000
f4145e74
c029ffbf
f4fbe000
000010d0
f4145e90
c029fa70
Oct 18 23:59:40 194.146.153.93
[  113.362344] Call Trace:
[  113.362344]  [<c01422b3>] ? notifier_call_chain+0x2b/0x4a
[  113.362344]  [<c0142361>] ? raw_notifier_call_chain+0xc/0xe
[  113.362344]  [<c029ffbf>] ? dev_close+0x4c/0x8c
[  113.362344]  [<c029fa70>] ? dev_change_flags+0xa5/0x158
[  113.362344]  [<c02da633>] ? devinet_ioctl+0x21a/0x503
[  113.362344]  [<c02db693>] ? inet_ioctl+0x8d/0xa6
[  113.362344]  [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec
[  113.362344]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
[  113.362344]  [<c019af2b>] ? vfs_ioctl+0x22/0x69
[  113.362344]  [<c019b435>] ? do_vfs_ioctl+0x41f/0x459
[  113.362344]  [<c02934eb>] ? sys_send+0x18/0x1a
[  113.362344]  [<c011942f>] ? do_page_fault+0x242/0x26f
[  113.362344]  [<c019b49b>] ? sys_ioctl+0x2c/0x45
[  113.362344]  [<c0102975>] ? syscall_call+0x7/0xb
[  113.362344] Code:
c9
00
00
00
89
c7
31
f6
83
c7
40
89
f8
e8
cc
60
a9
c7
8b
45
ec
05
20
02
00
00
89
45
e8
8b
4d
f0
8b
1c
b1
e9
8c
00
00
00
8b
45
ec
Oct 18 23:59:40 194.146.153.93
83
90
01
00
00
74
08
8b
9b
8c
01
00
00
eb
79
b8
c0
a6
86
f8
Oct 18 23:59:40 194.146.153.93
[  113.362344] EIP: [<f8868269>]
pppoe_device_event+0x80/0x12c [pppoe]
SS:ESP 0068:f4145e1c
[  113.362344] CR2: 000000000000018f
[  113.373124] ---[ end trace f6fe64a307e97f3b ]---
[  113.373203] Kernel panic - not syncing: Fatal exception in interrupt
[  113.373286] Pid: 2858, comm: pppd Tainted: G      D    2.6.31.4-build-0047 
#7
[  113.373379] Call Trace:
[  113.373479]  [<c02fc496>] ? printk+0xf/0x11
[  113.373561]  [<c02fc3e7>] panic+0x39/0xd9
[  113.373656]  [<c01059b7>] oops_end+0x8b/0x9a
[  113.373727]  [<c0118f6d>] no_context+0x13d/0x147
[  113.373800]  [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
[  113.373881]  [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a
[  113.373959]  [<c0121801>] ? __wake_up_sync_key+0x3b/0x45
[  113.374030]  [<c0131967>] ? irq_exit+0x39/0x5c
[  113.374107]  [<c0104393>] ? do_IRQ+0x80/0x96
[  113.374183]  [<c0102f49>] ? common_interrupt+0x29/0x30
[  113.374259]  [<c011909f>] bad_area_nosemaphore+0xd/0x10
[  113.374348]  [<c0119301>] do_page_fault+0x114/0x26f
[  113.374526]  [<c01191ed>] ? do_page_fault+0x0/0x26f
[  113.374605]  [<c02fe506>] error_code+0x66/0x6c
[  113.374683]  [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e
[  113.374764]  [<c01191ed>] ? do_page_fault+0x0/0x26f
[  113.374850]  [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe]
[  113.374928]  [<c01422b3>] notifier_call_chain+0x2b/0x4a
[  113.375012]  [<c0142361>] raw_notifier_call_chain+0xc/0xe
[  113.375097]  [<c029ffbf>] dev_close+0x4c/0x8c
[  113.375169]  [<c029fa70>] dev_change_flags+0xa5/0x158
[  113.375239]  [<c02da633>] devinet_ioctl+0x21a/0x503
[  113.375318]  [<c02db693>] inet_ioctl+0x8d/0xa6
[  113.375411]  [<c0292b21>] sock_ioctl+0x1c8/0x1ec
[  113.375491]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
[  113.375574]  [<c019af2b>] vfs_ioctl+0x22/0x69
[  113.375653]  [<c019b435>] do_vfs_ioctl+0x41f/0x459
[  113.375734]  [<c02934eb>] ? sys_send+0x18/0x1a
[  113.375813]  [<c011942f>] ? do_page_fault+0x242/0x26f
[  113.375884]  [<c019b49b>] sys_ioctl+0x2c/0x45
[  113.375960]  [<c0102975>] syscall_call+0x7/0xb
[  113.376041] Rebooting in 5 seconds..

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-18 21:02 kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces Denys Fedoryschenko
@ 2009-10-19  3:34 ` Michal Ostrowski
  2009-10-19 11:36   ` Denys Fedoryschenko
                     ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Michal Ostrowski @ 2009-10-19  3:34 UTC (permalink / raw)
  To: Denys Fedoryschenko; +Cc: netdev, linux-ppp, paulus, mostrows

Here's my theory on this after an inital look...

Looking at the oops report and disassembly of the actual module binary
that caused the oops, one can deduce that:

Execution was in pppoe_flush_dev().  %ebx contained the pointer "struct
pppox_sock *po", which is what we faulted on, excuting "cmp %eax, 0x190(%ebx)".
%ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f").

At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value
of 0xffffffff in pn->hash_table[i].

From this I'd hypothesize that the combination of dev_put() and release_sock()
may have allowed us to free "pn".  At the bottom of the loop we alreayd
recognize that since locks are dropped we're responsible for handling
invalidation of objects, and perhaps that should be extended to "pn" as well.
--
Michal Ostrowski
mostrows@gmail.com


---
 drivers/net/pppoe.c |   86 ++++++++++++++++++++++++++----
--------------------
 1 files changed, 45 insertions(+), 41 deletions(-)

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 7cbf6f9..720c4ea 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev)

       BUG_ON(dev == NULL);

+restart:
       pn = pppoe_pernet(dev_net(dev));
       if (!pn) /* already freed */
               return;
@@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev)
       write_lock_bh(&pn->hash_lock);
       for (i = 0; i < PPPOE_HASH_SIZE; i++) {
               struct pppox_sock *po = pn->hash_table[i];
+                struct sock *sk;

-               while (po != NULL) {
-                       struct sock *sk;
-                       if (po->pppoe_dev != dev) {
-                               po = po->next;
-                               continue;
-                       }
-                       sk = sk_pppox(po);
-                       spin_lock(&flush_lock);
-                       po->pppoe_dev = NULL;
-                       spin_unlock(&flush_lock);
-                       dev_put(dev);
-
-                       /* We always grab the socket lock, followed by the
-                        * hash_lock, in that order.  Since we should
-                        * hold the sock lock while doing any unbinding,
-                        * we need to release the lock we're holding.
-                        * Hold a reference to the sock so it doesn't disappear
-                        * as we're jumping between locks.
-                        */
+                while (po && po->pppoe_dev != dev) {
+                        po = po->next;
+                }

-                       sock_hold(sk);
+                if (po == NULL) {
+                        continue;
+                }

-                       write_unlock_bh(&pn->hash_lock);
-                       lock_sock(sk);
+                sk = sk_pppox(po);

-                       if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
-                               pppox_unbind_sock(sk);
-                               sk->sk_state = PPPOX_ZOMBIE;
-                               sk->sk_state_change(sk);
-                       }
+                spin_lock(&flush_lock);
+                po->pppoe_dev = NULL;
+                spin_unlock(&flush_lock);

-                       release_sock(sk);
-                       sock_put(sk);
+                dev_put(dev);

-                       /* Restart scan at the beginning of this hash chain.
-                        * While the lock was dropped the chain contents may
-                        * have changed.
-                        */
-                       write_lock_bh(&pn->hash_lock);
-                       po = pn->hash_table[i];
-               }
+                /* We always grab the socket lock, followed by the
+                 * hash_lock, in that order.  Since we should
+                 * hold the sock lock while doing any unbinding,
+                 * we need to release the lock we're holding.
+                 * Hold a reference to the sock so it doesn't disappear
+                 * as we're jumping between locks.
+                 */
+
+                sock_hold(sk);
+
+                write_unlock_bh(&pn->hash_lock);
+                lock_sock(sk);
+
+                if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+                        pppox_unbind_sock(sk);
+                        sk->sk_state = PPPOX_ZOMBIE;
+                        sk->sk_state_change(sk);
+                }
+
+                release_sock(sk);
+                sock_put(sk);
+
+                /* Restart the flush process from the beginning.  While
+                 * the lock was dropped the chain contents may have
+                 * changed, and sock_put may have made things go away.
+                 */
+                goto restart;
       }
       write_unlock_bh(&pn->hash_lock);
 }
--
1.6.3.3



On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@visp.net.lb> wrote:
>
> I have server running as pppoe NAS.
> Tried to rename customers without dropping pppd connections first, got panic
> after few seconds.
> Panic triggerable at 2.6.30.4 and 2.6.31.4
> pppoe users running on eth2
> pppoe flags:
>  1457 root     /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R
> 172.16.1.2 -N 253 -C gpzone -S gpzone
>
>
> Commands sequence that i think triggered that:
>
> ip link set eth0 down
> ip link set eth1 down
> ip link set eth2 down
> nameif etherx 00:16:76:8D:83:BA
> nameif eth0 00:19:e0:72:4a:37
> nameif eth1 00:19:e0:72:4a:4b
>
> ip addr flush dev eth0
> ip addr flush dev eth1
> ip addr add X.X.X.X/29 dev eth0
> ip addr add 192.168.2.177/24 dev eth0
> ip addr add 192.168.0.1/32 dev eth1
> ip addr add 127.0.0.0/8 dev lo
> #ip link set eth0 up
> ip link set eth0 up
> ip link set eth1 up
> ip link set lo up
> ip route add 0.0.0.0/0 via X.X.X.X
>
>
> [  103.428591] r8169: eth0: link up
> [  103.430360] r8169: eth1: link up
> [  113.361528] BUG: unable to handle kernel
> NULL pointer dereference
> at 0000018f
> [  113.361717] IP:
> [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe]
> [  113.361853] *pdpt = 000000003411a001
> *pde = 0000000000000000
> Oct 18 23:59:40 194.146.153.93
> [  113.362012] Oops: 0000 [#1]
> SMP
> Oct 18 23:59:40 194.146.153.93
> [  113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
> [  113.362246] Modules linked in:
> netconsole
> configfs
> act_skbedit
> sch_ingress
> sch_prio
> cls_flow
> cls_u32
> em_meta
> cls_basic
> xt_dscp
> xt_DSCP
> ipt_REJECT
> ts_bm
> xt_string
> xt_hl
> ifb
> cls_fw
> sch_tbf
> sch_htb
> act_ipt
> act_mirred
> xt_MARK
> pppoe
> pppox
> ppp_generic
> slhc
> xt_TCPMSS
> xt_mark
> xt_tcpudp
> iptable_mangle
> iptable_nat
> nf_nat
> rtc_cmos
> nf_conntrack_ipv4
> rtc_core
> nf_conntrack
> rtc_lib
> nf_defrag_ipv4
> iptable_filter
> ip_tables
> x_tables
> 8021q
> garp
> stp
> llc
> loop
> sata_sil
> pata_atiixp
> pata_acpi
> ata_generic
> libata
> 8139cp
> usb_storage
> mtdblock
> mtd_blkdevs
> mtd
> sr_mod
> cdrom
> tulip
> r8169
> sky2
> via_velocity
> via_rhine
> sis900
> ne2k_pci
> 8390
> skge
> tg3
> libphy
> 8139too
> e1000
> e100
> usbhid
> ohci_hcd
> uhci_hcd
> ehci_hcd
> usbcore
> nls_base
> Oct 18 23:59:40 194.146.153.93
> [  113.362344]
> [  113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7)
> [  113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0
> [  113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe]
> [  113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680
> [  113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c
> [  113.362344]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [  113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0
> task.ti=f4145000)
> [  113.362344] Stack:
> [  113.362344]  f4fbe220
> f4fbe000
> f6cea5a0
> f886a430
> fffffff5
> 00000000
> f4145e54
> c01422b3
> Oct 18 23:59:40 194.146.153.93
> [  113.362344] <0>
> f4fbe000
> 00000009
> f8a457d8
> f4fbe000
> f8850190
> 00001091
> f4145e64
> c0142361
> Oct 18 23:59:40 194.146.153.93
> [  113.362344] <0>
> ffffffff
> 00000000
> f4145e74
> c029ffbf
> f4fbe000
> 000010d0
> f4145e90
> c029fa70
> Oct 18 23:59:40 194.146.153.93
> [  113.362344] Call Trace:
> [  113.362344]  [<c01422b3>] ? notifier_call_chain+0x2b/0x4a
> [  113.362344]  [<c0142361>] ? raw_notifier_call_chain+0xc/0xe
> [  113.362344]  [<c029ffbf>] ? dev_close+0x4c/0x8c
> [  113.362344]  [<c029fa70>] ? dev_change_flags+0xa5/0x158
> [  113.362344]  [<c02da633>] ? devinet_ioctl+0x21a/0x503
> [  113.362344]  [<c02db693>] ? inet_ioctl+0x8d/0xa6
> [  113.362344]  [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec
> [  113.362344]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
> [  113.362344]  [<c019af2b>] ? vfs_ioctl+0x22/0x69
> [  113.362344]  [<c019b435>] ? do_vfs_ioctl+0x41f/0x459
> [  113.362344]  [<c02934eb>] ? sys_send+0x18/0x1a
> [  113.362344]  [<c011942f>] ? do_page_fault+0x242/0x26f
> [  113.362344]  [<c019b49b>] ? sys_ioctl+0x2c/0x45
> [  113.362344]  [<c0102975>] ? syscall_call+0x7/0xb
> [  113.362344] Code:
> c9
> 00
> 00
> 00
> 89
> c7
> 31
> f6
> 83
> c7
> 40
> 89
> f8
> e8
> cc
> 60
> a9
> c7
> 8b
> 45
> ec
> 05
> 20
> 02
> 00
> 00
> 89
> 45
> e8
> 8b
> 4d
> f0
> 8b
> 1c
> b1
> e9
> 8c
> 00
> 00
> 00
> 8b
> 45
> ec
> Oct 18 23:59:40 194.146.153.93
> 83
> 90
> 01
> 00
> 00
> 74
> 08
> 8b
> 9b
> 8c
> 01
> 00
> 00
> eb
> 79
> b8
> c0
> a6
> 86
> f8
> Oct 18 23:59:40 194.146.153.93
> [  113.362344] EIP: [<f8868269>]
> pppoe_device_event+0x80/0x12c [pppoe]
> SS:ESP 0068:f4145e1c
> [  113.362344] CR2: 000000000000018f
> [  113.373124] ---[ end trace f6fe64a307e97f3b ]---
> [  113.373203] Kernel panic - not syncing: Fatal exception in interrupt
> [  113.373286] Pid: 2858, comm: pppd Tainted: G      D    2.6.31.4-build-0047
> #7
> [  113.373379] Call Trace:
> [  113.373479]  [<c02fc496>] ? printk+0xf/0x11
> [  113.373561]  [<c02fc3e7>] panic+0x39/0xd9
> [  113.373656]  [<c01059b7>] oops_end+0x8b/0x9a
> [  113.373727]  [<c0118f6d>] no_context+0x13d/0x147
> [  113.373800]  [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
> [  113.373881]  [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a
> [  113.373959]  [<c0121801>] ? __wake_up_sync_key+0x3b/0x45
> [  113.374030]  [<c0131967>] ? irq_exit+0x39/0x5c
> [  113.374107]  [<c0104393>] ? do_IRQ+0x80/0x96
> [  113.374183]  [<c0102f49>] ? common_interrupt+0x29/0x30
> [  113.374259]  [<c011909f>] bad_area_nosemaphore+0xd/0x10
> [  113.374348]  [<c0119301>] do_page_fault+0x114/0x26f
> [  113.374526]  [<c01191ed>] ? do_page_fault+0x0/0x26f
> [  113.374605]  [<c02fe506>] error_code+0x66/0x6c
> [  113.374683]  [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e
> [  113.374764]  [<c01191ed>] ? do_page_fault+0x0/0x26f
> [  113.374850]  [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe]
> [  113.374928]  [<c01422b3>] notifier_call_chain+0x2b/0x4a
> [  113.375012]  [<c0142361>] raw_notifier_call_chain+0xc/0xe
> [  113.375097]  [<c029ffbf>] dev_close+0x4c/0x8c
> [  113.375169]  [<c029fa70>] dev_change_flags+0xa5/0x158
> [  113.375239]  [<c02da633>] devinet_ioctl+0x21a/0x503
> [  113.375318]  [<c02db693>] inet_ioctl+0x8d/0xa6
> [  113.375411]  [<c0292b21>] sock_ioctl+0x1c8/0x1ec
> [  113.375491]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
> [  113.375574]  [<c019af2b>] vfs_ioctl+0x22/0x69
> [  113.375653]  [<c019b435>] do_vfs_ioctl+0x41f/0x459
> [  113.375734]  [<c02934eb>] ? sys_send+0x18/0x1a
> [  113.375813]  [<c011942f>] ? do_page_fault+0x242/0x26f
> [  113.375884]  [<c019b49b>] sys_ioctl+0x2c/0x45
> [  113.375960]  [<c0102975>] syscall_call+0x7/0xb
> [  113.376041] Rebooting in 5 seconds..

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with  "alive" pppoe interfaces
  2009-10-19  3:34 ` Michal Ostrowski
@ 2009-10-19 11:36   ` Denys Fedoryschenko
  2009-10-19 12:01   ` Denys Fedoryschenko
  2009-10-19 12:36   ` Eric Dumazet
  2 siblings, 0 replies; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-19 11:36 UTC (permalink / raw)
  To: Michal Ostrowski; +Cc: netdev, linux-ppp, paulus, mostrows

Can you send me patch as attachment please?

On Monday 19 October 2009 06:34:06 Michal Ostrowski wrote:
> Here's my theory on this after an inital look...
>
> Looking at the oops report and disassembly of the actual module binary
> that caused the oops, one can deduce that:
>
> Execution was in pppoe_flush_dev().  %ebx contained the pointer "struct
> pppox_sock *po", which is what we faulted on, excuting "cmp %eax,
> 0x190(%ebx)". %ebx value was 0xffffffff (hence we got "NULL pointer
> dereference at 0x18f").
>
> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a
> value of 0xffffffff in pn->hash_table[i].
>
> From this I'd hypothesize that the combination of dev_put() and
> release_sock() may have allowed us to free "pn".  At the bottom of the loop
> we alreayd recognize that since locks are dropped we're responsible for
> handling invalidation of objects, and perhaps that should be extended to
> "pn" as well. --
> Michal Ostrowski
> mostrows@gmail.com
>
>
> ---
>  drivers/net/pppoe.c |   86 ++++++++++++++++++++++++++----
> --------------------
>  1 files changed, 45 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
> index 7cbf6f9..720c4ea 100644
> --- a/drivers/net/pppoe.c
> +++ b/drivers/net/pppoe.c
> @@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev)
>
>        BUG_ON(dev == NULL);
>
> +restart:
>        pn = pppoe_pernet(dev_net(dev));
>        if (!pn) /* already freed */
>                return;
> @@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev)
>        write_lock_bh(&pn->hash_lock);
>        for (i = 0; i < PPPOE_HASH_SIZE; i++) {
>                struct pppox_sock *po = pn->hash_table[i];
> +                struct sock *sk;
>
> -               while (po != NULL) {
> -                       struct sock *sk;
> -                       if (po->pppoe_dev != dev) {
> -                               po = po->next;
> -                               continue;
> -                       }
> -                       sk = sk_pppox(po);
> -                       spin_lock(&flush_lock);
> -                       po->pppoe_dev = NULL;
> -                       spin_unlock(&flush_lock);
> -                       dev_put(dev);
> -
> -                       /* We always grab the socket lock, followed by the
> -                        * hash_lock, in that order.  Since we should
> -                        * hold the sock lock while doing any unbinding,
> -                        * we need to release the lock we're holding.
> -                        * Hold a reference to the sock so it doesn't
> disappear -                        * as we're jumping between locks.
> -                        */
> +                while (po && po->pppoe_dev != dev) {
> +                        po = po->next;
> +                }
>
> -                       sock_hold(sk);
> +                if (po == NULL) {
> +                        continue;
> +                }
>
> -                       write_unlock_bh(&pn->hash_lock);
> -                       lock_sock(sk);
> +                sk = sk_pppox(po);
>
> -                       if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND))
> { -                               pppox_unbind_sock(sk);
> -                               sk->sk_state = PPPOX_ZOMBIE;
> -                               sk->sk_state_change(sk);
> -                       }
> +                spin_lock(&flush_lock);
> +                po->pppoe_dev = NULL;
> +                spin_unlock(&flush_lock);
>
> -                       release_sock(sk);
> -                       sock_put(sk);
> +                dev_put(dev);
>
> -                       /* Restart scan at the beginning of this hash
> chain. -                        * While the lock was dropped the chain
> contents may -                        * have changed.
> -                        */
> -                       write_lock_bh(&pn->hash_lock);
> -                       po = pn->hash_table[i];
> -               }
> +                /* We always grab the socket lock, followed by the
> +                 * hash_lock, in that order.  Since we should
> +                 * hold the sock lock while doing any unbinding,
> +                 * we need to release the lock we're holding.
> +                 * Hold a reference to the sock so it doesn't disappear
> +                 * as we're jumping between locks.
> +                 */
> +
> +                sock_hold(sk);
> +
> +                write_unlock_bh(&pn->hash_lock);
> +                lock_sock(sk);
> +
> +                if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
> +                        pppox_unbind_sock(sk);
> +                        sk->sk_state = PPPOX_ZOMBIE;
> +                        sk->sk_state_change(sk);
> +                }
> +
> +                release_sock(sk);
> +                sock_put(sk);
> +
> +                /* Restart the flush process from the beginning.  While
> +                 * the lock was dropped the chain contents may have
> +                 * changed, and sock_put may have made things go away.
> +                 */
> +                goto restart;
>        }
>        write_unlock_bh(&pn->hash_lock);
>  }
> --
> 1.6.3.3
>
> On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@visp.net.lb> 
wrote:
> > I have server running as pppoe NAS.
> > Tried to rename customers without dropping pppd connections first, got
> > panic after few seconds.
> > Panic triggerable at 2.6.30.4 and 2.6.31.4
> > pppoe users running on eth2
> > pppoe flags:
> >  1457 root     /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R
> > 172.16.1.2 -N 253 -C gpzone -S gpzone
> >
> >
> > Commands sequence that i think triggered that:
> >
> > ip link set eth0 down
> > ip link set eth1 down
> > ip link set eth2 down
> > nameif etherx 00:16:76:8D:83:BA
> > nameif eth0 00:19:e0:72:4a:37
> > nameif eth1 00:19:e0:72:4a:4b
> >
> > ip addr flush dev eth0
> > ip addr flush dev eth1
> > ip addr add X.X.X.X/29 dev eth0
> > ip addr add 192.168.2.177/24 dev eth0
> > ip addr add 192.168.0.1/32 dev eth1
> > ip addr add 127.0.0.0/8 dev lo
> > #ip link set eth0 up
> > ip link set eth0 up
> > ip link set eth1 up
> > ip link set lo up
> > ip route add 0.0.0.0/0 via X.X.X.X
> >
> >
> > [  103.428591] r8169: eth0: link up
> > [  103.430360] r8169: eth1: link up
> > [  113.361528] BUG: unable to handle kernel
> > NULL pointer dereference
> > at 0000018f
> > [  113.361717] IP:
> > [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.361853] *pdpt = 000000003411a001
> > *pde = 0000000000000000
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362012] Oops: 0000 [#1]
> > SMP
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
> > [  113.362246] Modules linked in:
> > netconsole
> > configfs
> > act_skbedit
> > sch_ingress
> > sch_prio
> > cls_flow
> > cls_u32
> > em_meta
> > cls_basic
> > xt_dscp
> > xt_DSCP
> > ipt_REJECT
> > ts_bm
> > xt_string
> > xt_hl
> > ifb
> > cls_fw
> > sch_tbf
> > sch_htb
> > act_ipt
> > act_mirred
> > xt_MARK
> > pppoe
> > pppox
> > ppp_generic
> > slhc
> > xt_TCPMSS
> > xt_mark
> > xt_tcpudp
> > iptable_mangle
> > iptable_nat
> > nf_nat
> > rtc_cmos
> > nf_conntrack_ipv4
> > rtc_core
> > nf_conntrack
> > rtc_lib
> > nf_defrag_ipv4
> > iptable_filter
> > ip_tables
> > x_tables
> > 8021q
> > garp
> > stp
> > llc
> > loop
> > sata_sil
> > pata_atiixp
> > pata_acpi
> > ata_generic
> > libata
> > 8139cp
> > usb_storage
> > mtdblock
> > mtd_blkdevs
> > mtd
> > sr_mod
> > cdrom
> > tulip
> > r8169
> > sky2
> > via_velocity
> > via_rhine
> > sis900
> > ne2k_pci
> > 8390
> > skge
> > tg3
> > libphy
> > 8139too
> > e1000
> > e100
> > usbhid
> > ohci_hcd
> > uhci_hcd
> > ehci_hcd
> > usbcore
> > nls_base
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344]
> > [  113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7)
> > [  113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0
> > [  113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680
> > [  113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c
> > [  113.362344]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > [  113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0
> > task.ti=f4145000)
> > [  113.362344] Stack:
> > [  113.362344]  f4fbe220
> > f4fbe000
> > f6cea5a0
> > f886a430
> > fffffff5
> > 00000000
> > f4145e54
> > c01422b3
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] <0>
> > f4fbe000
> > 00000009
> > f8a457d8
> > f4fbe000
> > f8850190
> > 00001091
> > f4145e64
> > c0142361
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] <0>
> > ffffffff
> > 00000000
> > f4145e74
> > c029ffbf
> > f4fbe000
> > 000010d0
> > f4145e90
> > c029fa70
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] Call Trace:
> > [  113.362344]  [<c01422b3>] ? notifier_call_chain+0x2b/0x4a
> > [  113.362344]  [<c0142361>] ? raw_notifier_call_chain+0xc/0xe
> > [  113.362344]  [<c029ffbf>] ? dev_close+0x4c/0x8c
> > [  113.362344]  [<c029fa70>] ? dev_change_flags+0xa5/0x158
> > [  113.362344]  [<c02da633>] ? devinet_ioctl+0x21a/0x503
> > [  113.362344]  [<c02db693>] ? inet_ioctl+0x8d/0xa6
> > [  113.362344]  [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec
> > [  113.362344]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [  113.362344]  [<c019af2b>] ? vfs_ioctl+0x22/0x69
> > [  113.362344]  [<c019b435>] ? do_vfs_ioctl+0x41f/0x459
> > [  113.362344]  [<c02934eb>] ? sys_send+0x18/0x1a
> > [  113.362344]  [<c011942f>] ? do_page_fault+0x242/0x26f
> > [  113.362344]  [<c019b49b>] ? sys_ioctl+0x2c/0x45
> > [  113.362344]  [<c0102975>] ? syscall_call+0x7/0xb
> > [  113.362344] Code:
> > c9
> > 00
> > 00
> > 00
> > 89
> > c7
> > 31
> > f6
> > 83
> > c7
> > 40
> > 89
> > f8
> > e8
> > cc
> > 60
> > a9
> > c7
> > 8b
> > 45
> > ec
> > 05
> > 20
> > 02
> > 00
> > 00
> > 89
> > 45
> > e8
> > 8b
> > 4d
> > f0
> > 8b
> > 1c
> > b1
> > e9
> > 8c
> > 00
> > 00
> > 00
> > 8b
> > 45
> > ec
> > Oct 18 23:59:40 194.146.153.93
> > 83
> > 90
> > 01
> > 00
> > 00
> > 74
> > 08
> > 8b
> > 9b
> > 8c
> > 01
> > 00
> > 00
> > eb
> > 79
> > b8
> > c0
> > a6
> > 86
> > f8
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] EIP: [<f8868269>]
> > pppoe_device_event+0x80/0x12c [pppoe]
> > SS:ESP 0068:f4145e1c
> > [  113.362344] CR2: 000000000000018f
> > [  113.373124] ---[ end trace f6fe64a307e97f3b ]---
> > [  113.373203] Kernel panic - not syncing: Fatal exception in interrupt
> > [  113.373286] Pid: 2858, comm: pppd Tainted: G      D  
> >  2.6.31.4-build-0047 #7
> > [  113.373379] Call Trace:
> > [  113.373479]  [<c02fc496>] ? printk+0xf/0x11
> > [  113.373561]  [<c02fc3e7>] panic+0x39/0xd9
> > [  113.373656]  [<c01059b7>] oops_end+0x8b/0x9a
> > [  113.373727]  [<c0118f6d>] no_context+0x13d/0x147
> > [  113.373800]  [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
> > [  113.373881]  [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a
> > [  113.373959]  [<c0121801>] ? __wake_up_sync_key+0x3b/0x45
> > [  113.374030]  [<c0131967>] ? irq_exit+0x39/0x5c
> > [  113.374107]  [<c0104393>] ? do_IRQ+0x80/0x96
> > [  113.374183]  [<c0102f49>] ? common_interrupt+0x29/0x30
> > [  113.374259]  [<c011909f>] bad_area_nosemaphore+0xd/0x10
> > [  113.374348]  [<c0119301>] do_page_fault+0x114/0x26f
> > [  113.374526]  [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [  113.374605]  [<c02fe506>] error_code+0x66/0x6c
> > [  113.374683]  [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e
> > [  113.374764]  [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [  113.374850]  [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.374928]  [<c01422b3>] notifier_call_chain+0x2b/0x4a
> > [  113.375012]  [<c0142361>] raw_notifier_call_chain+0xc/0xe
> > [  113.375097]  [<c029ffbf>] dev_close+0x4c/0x8c
> > [  113.375169]  [<c029fa70>] dev_change_flags+0xa5/0x158
> > [  113.375239]  [<c02da633>] devinet_ioctl+0x21a/0x503
> > [  113.375318]  [<c02db693>] inet_ioctl+0x8d/0xa6
> > [  113.375411]  [<c0292b21>] sock_ioctl+0x1c8/0x1ec
> > [  113.375491]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [  113.375574]  [<c019af2b>] vfs_ioctl+0x22/0x69
> > [  113.375653]  [<c019b435>] do_vfs_ioctl+0x41f/0x459
> > [  113.375734]  [<c02934eb>] ? sys_send+0x18/0x1a
> > [  113.375813]  [<c011942f>] ? do_page_fault+0x242/0x26f
> > [  113.375884]  [<c019b49b>] sys_ioctl+0x2c/0x45
> > [  113.375960]  [<c0102975>] syscall_call+0x7/0xb
> > [  113.376041] Rebooting in 5 seconds..



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with  "alive" pppoe interfaces
  2009-10-19  3:34 ` Michal Ostrowski
  2009-10-19 11:36   ` Denys Fedoryschenko
@ 2009-10-19 12:01   ` Denys Fedoryschenko
  2009-10-19 12:36   ` Eric Dumazet
  2 siblings, 0 replies; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-19 12:01 UTC (permalink / raw)
  To: Michal Ostrowski; +Cc: netdev, linux-ppp, paulus, mostrows

Applied patch manually, still panic (maybe different now):

[   42.596904]
[   42.596904] Pid: 0, comm: swapper Not tainted (2.6.31.4-build-0047 #12)
[   42.596904] EIP: 0060:[<f886865e>] EFLAGS: 00010286 CPU: 0
[   42.596904] EIP is at pppoe_rcv+0x153/0x1be [pppoe]
[   42.596904] EAX: 00003100 EBX: ffffffff ECX: 00000002 EDX: f74007cb
[   42.596904] ESI: f79b4b00 EDI: f6332f00 EBP: c1806edc ESP: c1806eb8
[   42.596904]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   42.596904] Process swapper (pid: 0, ti=c1806000 task=c03f6de0 
task.ti=c03f1000)
[   42.596904] Stack:
[   42.596904]  f7445000
00000004
f6fc0028
31000030
f6fc0030
f6332f40
f79b4b00
c0418140
Oct 19 15:00:46 194.146.153.99
[   42.596904] <0>
f886a444
c1806f14
c029e702
f7445000
f79b4bb0
f79b4bb0
c0418160
f7445000
Oct 19 15:00:46 194.146.153.99
[   42.596904] <0>
00000000
00000001
64886f14
c02ac3ce
f79b4b00
00000000
f753105c
c1806f58
Oct 19 15:00:46 194.146.153.99
[   42.596904] Call Trace:
[   42.596904]  [<c029e702>] ? netif_receive_skb+0x43b/0x45a
[   42.596904]  [<c02ac3ce>] ? eth_type_trans+0x25/0xa9
[   42.596904]  [<f840f736>] ? rtl8169_rx_interrupt+0x343/0x3f3 [r8169]
[   42.596904]  [<f8412191>] ? rtl8169_poll+0x2f/0x1b2 [r8169]
[   42.596904]  [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7
[   42.596904]  [<c029ec95>] ? net_rx_action+0x93/0x177
[   42.596904]  [<c0131bdd>] ? __do_softirq+0xa7/0x144
[   42.596904]  [<c0131b36>] ? __do_softirq+0x0/0x144
[   42.596904]  <IRQ>
Oct 19 15:00:46 194.146.153.99
[   42.596904]  [<c0131957>] ? irq_exit+0x29/0x5c
[   42.596904]  [<c0104393>] ? do_IRQ+0x80/0x96
[   42.596904]  [<c0102f49>] ? common_interrupt+0x29/0x30
[   42.596904]  [<c0108a3e>] ? mwait_idle+0x8a/0xb9
[   42.596904]  [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f
[   42.596904]  [<c0101bf0>] ? cpu_idle+0x44/0x60
[   42.596904]  [<c02ef8c7>] ? rest_init+0x53/0x55
[   42.596904]  [<c04197df>] ? start_kernel+0x2b9/0x2be
[   42.596904]  [<c041906a>] ? i386_start_kernel+0x6a/0x6f
[   42.596904] Code:
53
0a
32
53
0b
31
c2
c1
e8
08
31
c2
eb
08
0f
b6
c2
c1
f8
04
31
c2
d1
e9
83
f9
04
74
f1
89
d0
83
e0
0f
8b
1c
87
eb
35
66
8b
45
ea
Oct 19 15:00:46 194.146.153.99
39
83
98
01
00
00
75
22
8b
55
e4
8d
83
9a
01
00
00
b9
06
00
Oct 19 15:00:46 194.146.153.99
[   42.596904] EIP: [<f886865e>]
pppoe_rcv+0x153/0x1be [pppoe]
SS:ESP 0068:c1806eb8
[   42.596904] CR2: 0000000000000197
[   42.606148] ---[ end trace b477ff4ee072d9b9 ]---
[   42.606209] Kernel panic - not syncing: Fatal exception in interrupt
[   42.606273] Pid: 0, comm: swapper Tainted: G      D    2.6.31.4-build-0047 
#12
[   42.606351] Call Trace:
[   42.606413]  [<c02fc496>] ? printk+0xf/0x11
[   42.606474]  [<c02fc3e7>] panic+0x39/0xd9
[   42.606535]  [<c01059b7>] oops_end+0x8b/0x9a
[   42.606597]  [<c0118f6d>] no_context+0x13d/0x147
[   42.606658]  [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
[   42.606722]  [<c0121dc1>] ? check_preempt_wakeup+0x34/0x141
[   42.606862]  [<c01294bb>] ? try_to_wake_up+0x1aa/0x1b4
[   42.606930]  [<c0209541>] ? cpumask_next_and+0x26/0x37
[   42.607003]  [<c01256f5>] ? find_busiest_group+0x291/0x885
[   42.607067]  [<c019cf47>] ? pollwake+0x5a/0x63
[   42.607127]  [<c011909f>] bad_area_nosemaphore+0xd/0x10
[   42.607189]  [<c0119301>] do_page_fault+0x114/0x26f
[   42.607251]  [<c01191ed>] ? do_page_fault+0x0/0x26f
[   42.607313]  [<c02fe506>] error_code+0x66/0x6c
[   42.607375]  [<c02900d8>] ? pcibios_set_master+0x89/0x8d
[   42.607436]  [<c01191ed>] ? do_page_fault+0x0/0x26f
[   42.607501]  [<f886865e>] ? pppoe_rcv+0x153/0x1be [pppoe]
[   42.607564]  [<c029e702>] netif_receive_skb+0x43b/0x45a
[   42.607625]  [<c02ac3ce>] ? eth_type_trans+0x25/0xa9
[   42.607691]  [<f840f736>] rtl8169_rx_interrupt+0x343/0x3f3 [r8169]
[   42.607759]  [<f8412191>] rtl8169_poll+0x2f/0x1b2 [r8169]
[   42.607824]  [<c01413e5>] ? hrtimer_run_pending+0x2d/0xa7
[   42.607886]  [<c029ec95>] net_rx_action+0x93/0x177
[   42.607948]  [<c0131bdd>] __do_softirq+0xa7/0x144
[   42.608022]  [<c0131b36>] ? __do_softirq+0x0/0x144
[   42.608082]  <IRQ>
[<c0131957>] ? irq_exit+0x29/0x5c
[   42.608183]  [<c0104393>] ? do_IRQ+0x80/0x96
[   42.608183]  [<c0104393>] ? do_IRQ+0x80/0x96
[   42.608245]  [<c0102f49>] ? common_interrupt+0x29/0x30
[   42.608307]  [<c0108a3e>] ? mwait_idle+0x8a/0xb9
[   42.608369]  [<c0149613>] ? tick_nohz_restart_sched_tick+0x27/0x12f
[   42.608431]  [<c0101bf0>] ? cpu_idle+0x44/0x60
[   42.608493]  [<c02ef8c7>] ? rest_init+0x53/0x55
[   42.608553]  [<c04197df>] ? start_kernel+0x2b9/0x2be
[   42.608616]  [<c041906a>] ? i386_start_kernel+0x6a/0x6f
[   42.608682] Rebooting in 5 seconds..



On Monday 19 October 2009 06:34:06 Michal Ostrowski wrote:
> Here's my theory on this after an inital look...
>
> Looking at the oops report and disassembly of the actual module binary
> that caused the oops, one can deduce that:
>
> Execution was in pppoe_flush_dev().  %ebx contained the pointer "struct
> pppox_sock *po", which is what we faulted on, excuting "cmp %eax,
> 0x190(%ebx)". %ebx value was 0xffffffff (hence we got "NULL pointer
> dereference at 0x18f").
>
> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a
> value of 0xffffffff in pn->hash_table[i].
>
> From this I'd hypothesize that the combination of dev_put() and
> release_sock() may have allowed us to free "pn".  At the bottom of the loop
> we alreayd recognize that since locks are dropped we're responsible for
> handling invalidation of objects, and perhaps that should be extended to
> "pn" as well. --
> Michal Ostrowski
> mostrows@gmail.com
>
>
> ---
>  drivers/net/pppoe.c |   86 ++++++++++++++++++++++++++----
> --------------------
>  1 files changed, 45 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
> index 7cbf6f9..720c4ea 100644
> --- a/drivers/net/pppoe.c
> +++ b/drivers/net/pppoe.c
> @@ -296,6 +296,7 @@ static void pppoe_flush_dev(struct net_device *dev)
>
>        BUG_ON(dev == NULL);
>
> +restart:
>        pn = pppoe_pernet(dev_net(dev));
>        if (!pn) /* already freed */
>                return;
> @@ -303,48 +304,51 @@ static void pppoe_flush_dev(struct net_device *dev)
>        write_lock_bh(&pn->hash_lock);
>        for (i = 0; i < PPPOE_HASH_SIZE; i++) {
>                struct pppox_sock *po = pn->hash_table[i];
> +                struct sock *sk;
>
> -               while (po != NULL) {
> -                       struct sock *sk;
> -                       if (po->pppoe_dev != dev) {
> -                               po = po->next;
> -                               continue;
> -                       }
> -                       sk = sk_pppox(po);
> -                       spin_lock(&flush_lock);
> -                       po->pppoe_dev = NULL;
> -                       spin_unlock(&flush_lock);
> -                       dev_put(dev);
> -
> -                       /* We always grab the socket lock, followed by the
> -                        * hash_lock, in that order.  Since we should
> -                        * hold the sock lock while doing any unbinding,
> -                        * we need to release the lock we're holding.
> -                        * Hold a reference to the sock so it doesn't
> disappear -                        * as we're jumping between locks.
> -                        */
> +                while (po && po->pppoe_dev != dev) {
> +                        po = po->next;
> +                }
>
> -                       sock_hold(sk);
> +                if (po == NULL) {
> +                        continue;
> +                }
>
> -                       write_unlock_bh(&pn->hash_lock);
> -                       lock_sock(sk);
> +                sk = sk_pppox(po);
>
> -                       if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND))
> { -                               pppox_unbind_sock(sk);
> -                               sk->sk_state = PPPOX_ZOMBIE;
> -                               sk->sk_state_change(sk);
> -                       }
> +                spin_lock(&flush_lock);
> +                po->pppoe_dev = NULL;
> +                spin_unlock(&flush_lock);
>
> -                       release_sock(sk);
> -                       sock_put(sk);
> +                dev_put(dev);
>
> -                       /* Restart scan at the beginning of this hash
> chain. -                        * While the lock was dropped the chain
> contents may -                        * have changed.
> -                        */
> -                       write_lock_bh(&pn->hash_lock);
> -                       po = pn->hash_table[i];
> -               }
> +                /* We always grab the socket lock, followed by the
> +                 * hash_lock, in that order.  Since we should
> +                 * hold the sock lock while doing any unbinding,
> +                 * we need to release the lock we're holding.
> +                 * Hold a reference to the sock so it doesn't disappear
> +                 * as we're jumping between locks.
> +                 */
> +
> +                sock_hold(sk);
> +
> +                write_unlock_bh(&pn->hash_lock);
> +                lock_sock(sk);
> +
> +                if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
> +                        pppox_unbind_sock(sk);
> +                        sk->sk_state = PPPOX_ZOMBIE;
> +                        sk->sk_state_change(sk);
> +                }
> +
> +                release_sock(sk);
> +                sock_put(sk);
> +
> +                /* Restart the flush process from the beginning.  While
> +                 * the lock was dropped the chain contents may have
> +                 * changed, and sock_put may have made things go away.
> +                 */
> +                goto restart;
>        }
>        write_unlock_bh(&pn->hash_lock);
>  }
> --
> 1.6.3.3
>
> On Sun, Oct 18, 2009 at 4:02 PM, Denys Fedoryschenko <denys@visp.net.lb> 
wrote:
> > I have server running as pppoe NAS.
> > Tried to rename customers without dropping pppd connections first, got
> > panic after few seconds.
> > Panic triggerable at 2.6.30.4 and 2.6.31.4
> > pppoe users running on eth2
> > pppoe flags:
> >  1457 root     /usr/sbin/pppoe-server -I eth2 -k -L 172.16.1.1 -R
> > 172.16.1.2 -N 253 -C gpzone -S gpzone
> >
> >
> > Commands sequence that i think triggered that:
> >
> > ip link set eth0 down
> > ip link set eth1 down
> > ip link set eth2 down
> > nameif etherx 00:16:76:8D:83:BA
> > nameif eth0 00:19:e0:72:4a:37
> > nameif eth1 00:19:e0:72:4a:4b
> >
> > ip addr flush dev eth0
> > ip addr flush dev eth1
> > ip addr add X.X.X.X/29 dev eth0
> > ip addr add 192.168.2.177/24 dev eth0
> > ip addr add 192.168.0.1/32 dev eth1
> > ip addr add 127.0.0.0/8 dev lo
> > #ip link set eth0 up
> > ip link set eth0 up
> > ip link set eth1 up
> > ip link set lo up
> > ip route add 0.0.0.0/0 via X.X.X.X
> >
> >
> > [  103.428591] r8169: eth0: link up
> > [  103.430360] r8169: eth1: link up
> > [  113.361528] BUG: unable to handle kernel
> > NULL pointer dereference
> > at 0000018f
> > [  113.361717] IP:
> > [<f8868269>] pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.361853] *pdpt = 000000003411a001
> > *pde = 0000000000000000
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362012] Oops: 0000 [#1]
> > SMP
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362166] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
> > [  113.362246] Modules linked in:
> > netconsole
> > configfs
> > act_skbedit
> > sch_ingress
> > sch_prio
> > cls_flow
> > cls_u32
> > em_meta
> > cls_basic
> > xt_dscp
> > xt_DSCP
> > ipt_REJECT
> > ts_bm
> > xt_string
> > xt_hl
> > ifb
> > cls_fw
> > sch_tbf
> > sch_htb
> > act_ipt
> > act_mirred
> > xt_MARK
> > pppoe
> > pppox
> > ppp_generic
> > slhc
> > xt_TCPMSS
> > xt_mark
> > xt_tcpudp
> > iptable_mangle
> > iptable_nat
> > nf_nat
> > rtc_cmos
> > nf_conntrack_ipv4
> > rtc_core
> > nf_conntrack
> > rtc_lib
> > nf_defrag_ipv4
> > iptable_filter
> > ip_tables
> > x_tables
> > 8021q
> > garp
> > stp
> > llc
> > loop
> > sata_sil
> > pata_atiixp
> > pata_acpi
> > ata_generic
> > libata
> > 8139cp
> > usb_storage
> > mtdblock
> > mtd_blkdevs
> > mtd
> > sr_mod
> > cdrom
> > tulip
> > r8169
> > sky2
> > via_velocity
> > via_rhine
> > sis900
> > ne2k_pci
> > 8390
> > skge
> > tg3
> > libphy
> > 8139too
> > e1000
> > e100
> > usbhid
> > ohci_hcd
> > uhci_hcd
> > ehci_hcd
> > usbcore
> > nls_base
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344]
> > [  113.362344] Pid: 2858, comm: pppd Not tainted (2.6.31.4-build-0047 #7)
> > [  113.362344] EIP: 0060:[<f8868269>] EFLAGS: 00010286 CPU: 0
> > [  113.362344] EIP is at pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.362344] EAX: f4fbe000 EBX: ffffffff ECX: f6cea5a0 EDX: f7403680
> > [  113.362344] ESI: 0000000f EDI: f6cea5e0 EBP: f4145e34 ESP: f4145e1c
> > [  113.362344]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > [  113.362344] Process pppd (pid: 2858, ti=f4145000 task=f4112ff0
> > task.ti=f4145000)
> > [  113.362344] Stack:
> > [  113.362344]  f4fbe220
> > f4fbe000
> > f6cea5a0
> > f886a430
> > fffffff5
> > 00000000
> > f4145e54
> > c01422b3
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] <0>
> > f4fbe000
> > 00000009
> > f8a457d8
> > f4fbe000
> > f8850190
> > 00001091
> > f4145e64
> > c0142361
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] <0>
> > ffffffff
> > 00000000
> > f4145e74
> > c029ffbf
> > f4fbe000
> > 000010d0
> > f4145e90
> > c029fa70
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] Call Trace:
> > [  113.362344]  [<c01422b3>] ? notifier_call_chain+0x2b/0x4a
> > [  113.362344]  [<c0142361>] ? raw_notifier_call_chain+0xc/0xe
> > [  113.362344]  [<c029ffbf>] ? dev_close+0x4c/0x8c
> > [  113.362344]  [<c029fa70>] ? dev_change_flags+0xa5/0x158
> > [  113.362344]  [<c02da633>] ? devinet_ioctl+0x21a/0x503
> > [  113.362344]  [<c02db693>] ? inet_ioctl+0x8d/0xa6
> > [  113.362344]  [<c0292b21>] ? sock_ioctl+0x1c8/0x1ec
> > [  113.362344]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [  113.362344]  [<c019af2b>] ? vfs_ioctl+0x22/0x69
> > [  113.362344]  [<c019b435>] ? do_vfs_ioctl+0x41f/0x459
> > [  113.362344]  [<c02934eb>] ? sys_send+0x18/0x1a
> > [  113.362344]  [<c011942f>] ? do_page_fault+0x242/0x26f
> > [  113.362344]  [<c019b49b>] ? sys_ioctl+0x2c/0x45
> > [  113.362344]  [<c0102975>] ? syscall_call+0x7/0xb
> > [  113.362344] Code:
> > c9
> > 00
> > 00
> > 00
> > 89
> > c7
> > 31
> > f6
> > 83
> > c7
> > 40
> > 89
> > f8
> > e8
> > cc
> > 60
> > a9
> > c7
> > 8b
> > 45
> > ec
> > 05
> > 20
> > 02
> > 00
> > 00
> > 89
> > 45
> > e8
> > 8b
> > 4d
> > f0
> > 8b
> > 1c
> > b1
> > e9
> > 8c
> > 00
> > 00
> > 00
> > 8b
> > 45
> > ec
> > Oct 18 23:59:40 194.146.153.93
> > 83
> > 90
> > 01
> > 00
> > 00
> > 74
> > 08
> > 8b
> > 9b
> > 8c
> > 01
> > 00
> > 00
> > eb
> > 79
> > b8
> > c0
> > a6
> > 86
> > f8
> > Oct 18 23:59:40 194.146.153.93
> > [  113.362344] EIP: [<f8868269>]
> > pppoe_device_event+0x80/0x12c [pppoe]
> > SS:ESP 0068:f4145e1c
> > [  113.362344] CR2: 000000000000018f
> > [  113.373124] ---[ end trace f6fe64a307e97f3b ]---
> > [  113.373203] Kernel panic - not syncing: Fatal exception in interrupt
> > [  113.373286] Pid: 2858, comm: pppd Tainted: G      D  
> >  2.6.31.4-build-0047 #7
> > [  113.373379] Call Trace:
> > [  113.373479]  [<c02fc496>] ? printk+0xf/0x11
> > [  113.373561]  [<c02fc3e7>] panic+0x39/0xd9
> > [  113.373656]  [<c01059b7>] oops_end+0x8b/0x9a
> > [  113.373727]  [<c0118f6d>] no_context+0x13d/0x147
> > [  113.373800]  [<c011908a>] __bad_area_nosemaphore+0x113/0x11b
> > [  113.373881]  [<c02953b3>] ? sock_alloc_send_pskb+0x8b/0x24a
> > [  113.373959]  [<c0121801>] ? __wake_up_sync_key+0x3b/0x45
> > [  113.374030]  [<c0131967>] ? irq_exit+0x39/0x5c
> > [  113.374107]  [<c0104393>] ? do_IRQ+0x80/0x96
> > [  113.374183]  [<c0102f49>] ? common_interrupt+0x29/0x30
> > [  113.374259]  [<c011909f>] bad_area_nosemaphore+0xd/0x10
> > [  113.374348]  [<c0119301>] do_page_fault+0x114/0x26f
> > [  113.374526]  [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [  113.374605]  [<c02fe506>] error_code+0x66/0x6c
> > [  113.374683]  [<c02d007b>] ? tcp_v4_send_ack+0xa3/0x10e
> > [  113.374764]  [<c01191ed>] ? do_page_fault+0x0/0x26f
> > [  113.374850]  [<f8868269>] ? pppoe_device_event+0x80/0x12c [pppoe]
> > [  113.374928]  [<c01422b3>] notifier_call_chain+0x2b/0x4a
> > [  113.375012]  [<c0142361>] raw_notifier_call_chain+0xc/0xe
> > [  113.375097]  [<c029ffbf>] dev_close+0x4c/0x8c
> > [  113.375169]  [<c029fa70>] dev_change_flags+0xa5/0x158
> > [  113.375239]  [<c02da633>] devinet_ioctl+0x21a/0x503
> > [  113.375318]  [<c02db693>] inet_ioctl+0x8d/0xa6
> > [  113.375411]  [<c0292b21>] sock_ioctl+0x1c8/0x1ec
> > [  113.375491]  [<c0292959>] ? sock_ioctl+0x0/0x1ec
> > [  113.375574]  [<c019af2b>] vfs_ioctl+0x22/0x69
> > [  113.375653]  [<c019b435>] do_vfs_ioctl+0x41f/0x459
> > [  113.375734]  [<c02934eb>] ? sys_send+0x18/0x1a
> > [  113.375813]  [<c011942f>] ? do_page_fault+0x242/0x26f
> > [  113.375884]  [<c019b49b>] sys_ioctl+0x2c/0x45
> > [  113.375960]  [<c0102975>] syscall_call+0x7/0xb
> > [  113.376041] Rebooting in 5 seconds..



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19  3:34 ` Michal Ostrowski
  2009-10-19 11:36   ` Denys Fedoryschenko
  2009-10-19 12:01   ` Denys Fedoryschenko
@ 2009-10-19 12:36   ` Eric Dumazet
  2009-10-19 13:19     ` Michal Ostrowski
  2 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2009-10-19 12:36 UTC (permalink / raw)
  To: Michal Ostrowski; +Cc: Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows

Michal Ostrowski a écrit :
> Here's my theory on this after an inital look...
> 
> Looking at the oops report and disassembly of the actual module binary
> that caused the oops, one can deduce that:
> 
> Execution was in pppoe_flush_dev().  %ebx contained the pointer "struct
> pppox_sock *po", which is what we faulted on, excuting "cmp %eax, 0x190(%ebx)".
> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f").
> 
> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value
> of 0xffffffff in pn->hash_table[i].
> 
>>From this I'd hypothesize that the combination of dev_put() and release_sock()
> may have allowed us to free "pn".  At the bottom of the loop we alreayd
> recognize that since locks are dropped we're responsible for handling
> invalidation of objects, and perhaps that should be extended to "pn" as well.
> --
> Michal Ostrowski
> mostrows@gmail.com
> 
> 

Looking at this stuff, I do believe flush_lock protection is not
properly done.

At the end of pppoe_connect() for example we can find :

err_put:
        if (po->pppoe_dev) {
                dev_put(po->pppoe_dev);
                po->pppoe_dev = NULL;
        }

This is done without any protection, and can therefore clash with 
pppoe_flush_dev() :

	spin_lock(&flush_lock);
	po->pppoe_dev = NULL; /* ppoe_dev can already be NULL before this point */
	spin_unlock(&flush_lock);

	dev_put(dev);    /* oops */

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 12:36   ` Eric Dumazet
@ 2009-10-19 13:19     ` Michal Ostrowski
  2009-10-19 15:50       ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Michal Ostrowski @ 2009-10-19 13:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Denys Fedoryschenko, netdev, linux-ppp, paulus, mostrows,
	Cyrill Gorcunov

The entire scheme for managing net namespaces seems unsafe.  We depend
on synchronization via pn->hash_lock, but have no guarantee of the
existence of the "net" object -- hence no way to ensure the existence
of the lock itself.  This should be relatively easy to fix though as
we should be able to get/put the net namespace as we add remove
objects to/from the pppoe hash.

Once you solve this existence issue, the flush_lock can be eliminated
altogether since all of the relevant code paths already depend on a
write_lock_bh(&pn->hash_lock), and that's the lock that should be use
to protect the pppoe_dev field.

Another patch to follow later...

--
Michal Ostrowski
mostrows@gmail.com



On Mon, Oct 19, 2009 at 7:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Michal Ostrowski a écrit :
>> Here's my theory on this after an inital look...
>>
>> Looking at the oops report and disassembly of the actual module binary
>> that caused the oops, one can deduce that:
>>
>> Execution was in pppoe_flush_dev().  %ebx contained the pointer "struct
>> pppox_sock *po", which is what we faulted on, excuting "cmp %eax, 0x190(%ebx)".
>> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f").
>>
>> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value
>> of 0xffffffff in pn->hash_table[i].
>>
>>>From this I'd hypothesize that the combination of dev_put() and release_sock()
>> may have allowed us to free "pn".  At the bottom of the loop we alreayd
>> recognize that since locks are dropped we're responsible for handling
>> invalidation of objects, and perhaps that should be extended to "pn" as well.
>> --
>> Michal Ostrowski
>> mostrows@gmail.com
>>
>>
>
> Looking at this stuff, I do believe flush_lock protection is not
> properly done.
>
> At the end of pppoe_connect() for example we can find :
>
> err_put:
>        if (po->pppoe_dev) {
>                dev_put(po->pppoe_dev);
>                po->pppoe_dev = NULL;
>        }
>
> This is done without any protection, and can therefore clash with
> pppoe_flush_dev() :
>
>        spin_lock(&flush_lock);
>        po->pppoe_dev = NULL; /* ppoe_dev can already be NULL before this point */
>        spin_unlock(&flush_lock);
>
>        dev_put(dev);    /* oops */
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 13:19     ` Michal Ostrowski
@ 2009-10-19 15:50       ` Cyrill Gorcunov
  2009-10-19 16:05         ` Michal Ostrowski
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-19 15:50 UTC (permalink / raw)
  To: Michal Ostrowski
  Cc: Eric Dumazet, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

[Michal Ostrowski - Mon, Oct 19, 2009 at 08:19:23AM -0500]
|
| The entire scheme for managing net namespaces seems unsafe.  We depend
| on synchronization via pn->hash_lock, but have no guarantee of the
| existence of the "net" object -- hence no way to ensure the existence
| of the lock itself.  This should be relatively easy to fix though as
| we should be able to get/put the net namespace as we add remove
| objects to/from the pppoe hash.
|

Hmm... it seems not. The only possible scenario I see (for such nonexistence
namespace is that when it was cached via RCU and returned before grace period
elapsed, so perhaps we need to call synchronize_net somewhere).

| 
| Once you solve this existence issue, the flush_lock can be eliminated
| altogether since all of the relevant code paths already depend on a
| write_lock_bh(&pn->hash_lock), and that's the lock that should be use
| to protect the pppoe_dev field.
| 
| Another patch to follow later...
| 
| --
| Michal Ostrowski
| mostrows@gmail.com
| 
| 
| 
| On Mon, Oct 19, 2009 at 7:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
| > Michal Ostrowski a écrit :
| >> Here's my theory on this after an inital look...
| >>
| >> Looking at the oops report and disassembly of the actual module binary
| >> that caused the oops, one can deduce that:
| >>
| >> Execution was in pppoe_flush_dev().  %ebx contained the pointer "struct
| >> pppox_sock *po", which is what we faulted on, excuting "cmp %eax, 0x190(%ebx)".
| >> %ebx value was 0xffffffff (hence we got "NULL pointer dereference at 0x18f").
| >>
| >> At this point "i" (stored in %esi) is 15 (valid), meaning that we got a value
| >> of 0xffffffff in pn->hash_table[i].
| >>
| >>>From this I'd hypothesize that the combination of dev_put() and release_sock()
| >> may have allowed us to free "pn".  At the bottom of the loop we alreayd
| >> recognize that since locks are dropped we're responsible for handling
| >> invalidation of objects, and perhaps that should be extended to "pn" as well.
| >> --
| >> Michal Ostrowski
| >> mostrows@gmail.com
| >>
| >>
| >
| > Looking at this stuff, I do believe flush_lock protection is not
| > properly done.
| >
| > At the end of pppoe_connect() for example we can find :
| >
| > err_put:
| >        if (po->pppoe_dev) {
| >                dev_put(po->pppoe_dev);
| >                po->pppoe_dev = NULL;
| >        }

Yep, this is unsafe, thanks!

| >
| > This is done without any protection, and can therefore clash with
| > pppoe_flush_dev() :
| >
| >        spin_lock(&flush_lock);
| >        po->pppoe_dev = NULL; /* ppoe_dev can already be NULL before this point */
| >        spin_unlock(&flush_lock);
| >
| >        dev_put(dev);    /* oops */
| >
| 

Denys, could you check if the patch below help?

	-- Cyrill
---
 drivers/net/pppoe.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux-2.6.git/drivers/net/pppoe.c
=====================================================================
--- linux-2.6.git.orig/drivers/net/pppoe.c
+++ linux-2.6.git/drivers/net/pppoe.c
@@ -312,9 +312,9 @@ static void pppoe_flush_dev(struct net_d
 			}
 			sk = sk_pppox(po);
 			spin_lock(&flush_lock);
+			dev_put(po->pppoe_dev);
 			po->pppoe_dev = NULL;
 			spin_unlock(&flush_lock);
-			dev_put(dev);
 
 			/* We always grab the socket lock, followed by the
 			 * hash_lock, in that order.  Since we should
@@ -708,10 +708,12 @@ end:
 	release_sock(sk);
 	return error;
 err_put:
+	spin_lock(&flush_lock);
 	if (po->pppoe_dev) {
 		dev_put(po->pppoe_dev);
 		po->pppoe_dev = NULL;
 	}
+	spin_unlock(&flush_lock);
 	goto end;
 }
 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 15:50       ` Cyrill Gorcunov
@ 2009-10-19 16:05         ` Michal Ostrowski
  2009-10-19 17:12           ` Eric Dumazet
  0 siblings, 1 reply; 34+ messages in thread
From: Michal Ostrowski @ 2009-10-19 16:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Eric Dumazet, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

Here's a bigger patch that just gets rid of flush_lock altogether.

We were seeing oopses due to net namespaces going away while we were using
them, which turns out is simply due to the fact that pppoew wasn't claiming ref
counts properly.

Fixing this requires that adding and removing entries to the per-net hash-table
requires incrementing and decrementing the ref count.  This also allows us to
get rid of the flush_lock since we can now depend on the existence of
"pn->hash_lock".

We also have to be careful when flushing devices that removal of a hash table
entry may bring the net namespace refcount to 0.

---
 drivers/net/pppoe.c |  152 ++++++++++++++++++++++++++++-----------------------
 1 files changed, 83 insertions(+), 69 deletions(-)

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 7cbf6f9..140a196 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -111,9 +111,6 @@ struct pppoe_net {
       rwlock_t hash_lock;
 };

-/* to eliminate a race btw pppoe_flush_dev and pppoe_release */
-static DEFINE_SPINLOCK(flush_lock);
-
 /*
 * PPPoE could be in the following stages:
 * 1) Discovery stage (to obtain remote MAC and Session ID)
@@ -292,61 +289,77 @@ static inline struct pppox_sock
*delete_item(struct pppoe_net *pn, __be16 sid,
 static void pppoe_flush_dev(struct net_device *dev)
 {
       struct pppoe_net *pn;
+        struct net * net;
       int i;

       BUG_ON(dev == NULL);

+        /* We have to drop and re-acquire locks.  So we'll grab a ref-count on
+         * the net namespace to ensure it is valid throughout this function.
+         */
+
+        net = maybe_get_net(dev_net(dev));
+        if (!net)
+                return;
+
       pn = pppoe_pernet(dev_net(dev));
-       if (!pn) /* already freed */
+       if (!pn) { /* already freed */
+                put_net(net);
               return;
+        }

       write_lock_bh(&pn->hash_lock);
       for (i = 0; i < PPPOE_HASH_SIZE; i++) {
               struct pppox_sock *po = pn->hash_table[i];
-
-               while (po != NULL) {
-                       struct sock *sk;
-                       if (po->pppoe_dev != dev) {
-                               po = po->next;
-                               continue;
-                       }
-                       sk = sk_pppox(po);
-                       spin_lock(&flush_lock);
-                       po->pppoe_dev = NULL;
-                       spin_unlock(&flush_lock);
-                       dev_put(dev);
-
-                       /* We always grab the socket lock, followed by the
-                        * hash_lock, in that order.  Since we should
-                        * hold the sock lock while doing any unbinding,
-                        * we need to release the lock we're holding.
-                        * Hold a reference to the sock so it doesn't disappear
-                        * as we're jumping between locks.
-                        */
-
-                       sock_hold(sk);
-
-                       write_unlock_bh(&pn->hash_lock);
-                       lock_sock(sk);
-
-                       if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
-                               pppox_unbind_sock(sk);
-                               sk->sk_state = PPPOX_ZOMBIE;
-                               sk->sk_state_change(sk);
-                       }
-
-                       release_sock(sk);
-                       sock_put(sk);
-
-                       /* Restart scan at the beginning of this hash chain.
-                        * While the lock was dropped the chain contents may
-                        * have changed.
-                        */
-                       write_lock_bh(&pn->hash_lock);
-                       po = pn->hash_table[i];
-               }
+                struct sock *sk;
+
+                while (po && po->pppoe_dev != dev) {
+                        po = po->next;
+                }
+
+                if (po == NULL) {
+                        continue;
+                }
+
+                sk = sk_pppox(po);
+
+                if (po->pppoe_dev) {
+                        dev_put(po->pppoe_dev);
+                        po->pppoe_dev = NULL;
+                }
+
+                /* We always grab the socket lock, followed by the
+                 * hash_lock, in that order.  Since we should
+                 * hold the sock lock while doing any unbinding,
+                 * we need to release the lock we're holding.
+                 * Hold a reference to the sock so it doesn't disappear
+                 * as we're jumping between locks.
+                 */
+
+                sock_hold(sk);
+
+                write_unlock_bh(&pn->hash_lock);
+                lock_sock(sk);
+
+                if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+                        pppox_unbind_sock(sk);
+                        sk->sk_state = PPPOX_ZOMBIE;
+                        sk->sk_state_change(sk);
+                }
+
+                release_sock(sk);
+                sock_put(sk);
+
+                /* Restart the process from the start of the current hash
+                 * chain. We dropped locks so the world may have change from
+                 * underneath us, but we know "pn" is still good because we
+                 * grabbed a ref count on "net".
+                 */
+                write_unlock_bh(&pn->hash_lock);
+                po = pn->hash_table[i];
       }
       write_unlock_bh(&pn->hash_lock);
+        put_net(net);
 }

 static int pppoe_device_event(struct notifier_block *this,
@@ -561,6 +574,7 @@ static int pppoe_release(struct socket *sock)
       struct sock *sk = sock->sk;
       struct pppox_sock *po;
       struct pppoe_net *pn;
+        struct net *net = NULL;

       if (!sk)
               return 0;
@@ -576,19 +590,8 @@ static int pppoe_release(struct socket *sock)
       /* Signal the death of the socket. */
       sk->sk_state = PPPOX_DEAD;

-       /*
-        * pppoe_flush_dev could lead to a race with
-        * this routine so we use flush_lock to eliminate
-        * such a case (we only need per-net specific data)
-        */
-       spin_lock(&flush_lock);
-       po = pppox_sk(sk);
-       if (!po->pppoe_dev) {
-               spin_unlock(&flush_lock);
-               goto out;
-       }
-       pn = pppoe_pernet(dev_net(po->pppoe_dev));
-       spin_unlock(&flush_lock);
+        net = sock_net(sk);
+       pn = pppoe_pernet(net);

       /*
        * protect "po" from concurrent updates
@@ -601,14 +604,14 @@ static int pppoe_release(struct socket *sock)
               __delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote,
                               po->pppoe_ifindex);

-       if (po->pppoe_dev) {
-               dev_put(po->pppoe_dev);
-               po->pppoe_dev = NULL;
-       }
+        if (po->pppoe_dev) {
+                dev_put(po->pppoe_dev);
+                po->pppoe_dev = NULL;
+        }

       write_unlock_bh(&pn->hash_lock);
+        put_net(net);

-out:
       sock_orphan(sk);
       sock->sk = NULL;

@@ -625,8 +628,9 @@ static int pppoe_connect(struct socket *sock,
struct sockaddr *uservaddr,
       struct sock *sk = sock->sk;
       struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr;
       struct pppox_sock *po = pppox_sk(sk);
-       struct net_device *dev;
       struct pppoe_net *pn;
+       struct net_device *dev = NULL;
+        struct net *net = NULL;
       int error;

       lock_sock(sk);
@@ -653,10 +657,12 @@ static int pppoe_connect(struct socket *sock,
struct sockaddr *uservaddr,
       if (stage_session(po->pppoe_pa.sid)) {
               pppox_unbind_sock(sk);
               if (po->pppoe_dev) {
-                       pn = pppoe_pernet(dev_net(po->pppoe_dev));
+                        struct net *old = dev_net(po->pppoe_dev);
+                       pn = pppoe_pernet(old);
                       delete_item(pn, po->pppoe_pa.sid,
                               po->pppoe_pa.remote, po->pppoe_ifindex);
                       dev_put(po->pppoe_dev);
+                        put_net(old);
               }
               memset(sk_pppox(po) + 1, 0,
                      sizeof(struct pppox_sock) - sizeof(struct sock));
@@ -666,13 +672,17 @@ static int pppoe_connect(struct socket *sock,
struct sockaddr *uservaddr,
       /* Re-bind in session stage only */
       if (stage_session(sp->sa_addr.pppoe.sid)) {
               error = -ENODEV;
-               dev = dev_get_by_name(sock_net(sk), sp->sa_addr.pppoe.dev);
+                net = maybe_get_net(dev_net(dev));
+                if (!net)
+                        goto end;
+
+               dev = dev_get_by_name(net, sp->sa_addr.pppoe.dev);
               if (!dev)
-                       goto end;
+                       goto err_put_net;

               po->pppoe_dev = dev;
               po->pppoe_ifindex = dev->ifindex;
-               pn = pppoe_pernet(dev_net(dev));
+               pn = pppoe_pernet(net);
               write_lock_bh(&pn->hash_lock);
               if (!(dev->flags & IFF_UP)) {
                       write_unlock_bh(&pn->hash_lock);
@@ -707,6 +717,10 @@ static int pppoe_connect(struct socket *sock,
struct sockaddr *uservaddr,
 end:
       release_sock(sk);
       return error;
+err_put_net:
+        if (net) {
+                put_net(net);
+        }
 err_put:
       if (po->pppoe_dev) {
               dev_put(po->pppoe_dev);
--
1.6.3.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 16:05         ` Michal Ostrowski
@ 2009-10-19 17:12           ` Eric Dumazet
  2009-10-19 18:07             ` Michal Ostrowski
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2009-10-19 17:12 UTC (permalink / raw)
  To: Michal Ostrowski
  Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

Michal Ostrowski a écrit :
> Here's a bigger patch that just gets rid of flush_lock altogether.
> 
> We were seeing oopses due to net namespaces going away while we were using
> them, which turns out is simply due to the fact that pppoew wasn't claiming ref
> counts properly.
> 
> Fixing this requires that adding and removing entries to the per-net hash-table
> requires incrementing and decrementing the ref count.  This also allows us to
> get rid of the flush_lock since we can now depend on the existence of
> "pn->hash_lock".
> 
> We also have to be careful when flushing devices that removal of a hash table
> entry may bring the net namespace refcount to 0.
> 

Your patch is mangled (tabulation -> white spaces),

and I dont believe namespace refcount can reach 0 inside pppoe_flush_dev(),
it would be a bug from core network code.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 17:12           ` Eric Dumazet
@ 2009-10-19 18:07             ` Michal Ostrowski
  2009-10-19 18:44               ` Eric Dumazet
  0 siblings, 1 reply; 34+ messages in thread
From: Michal Ostrowski @ 2009-10-19 18:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

[-- Attachment #1: Type: text/plain, Size: 1653 bytes --]

On Mon, Oct 19, 2009 at 12:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Michal Ostrowski a écrit :
>> Here's a bigger patch that just gets rid of flush_lock altogether.
>>
>> We were seeing oopses due to net namespaces going away while we were using
>> them, which turns out is simply due to the fact that pppoew wasn't claiming ref
>> counts properly.
>>
>> Fixing this requires that adding and removing entries to the per-net hash-table
>> requires incrementing and decrementing the ref count.  This also allows us to
>> get rid of the flush_lock since we can now depend on the existence of
>> "pn->hash_lock".
>>
>> We also have to be careful when flushing devices that removal of a hash table
>> entry may bring the net namespace refcount to 0.
>>
>
> Your patch is mangled (tabulation -> white spaces),

Patch mangling was due to mailer interactions, I'll attach a clean
version here, no more inlining.

>
> and I dont believe namespace refcount can reach 0 inside pppoe_flush_dev(),
> it would be a bug from core network code.
>

>From the original oops I was able to deduce that the namespace somehow
managed to get destroyed during the interval where we dropped locks.
If that's not due to the release_sock() call in pppoe_flush_dev()
triggering a cleanup then I'd have to assume that that it's due to a
secondary actor closing the socket in parallel, but that in turn would
point to issues with the flush_lock.  Having said that the thrust of
this patch remains valid; it just means I don't need to inc the ref
count in pppoe_flush_dev().

Do you agree?

--
Michal Ostrowski
mostrows@gmail.com

[-- Attachment #2: 0001-PPPoE-Fix-ref-counts-on-net-namespaces.patch --]
[-- Type: application/octet-stream, Size: 5483 bytes --]

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 7cbf6f9..60bbeb2 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -111,9 +111,6 @@ struct pppoe_net {
 	rwlock_t hash_lock;
 };
 
-/* to eliminate a race btw pppoe_flush_dev and pppoe_release */
-static DEFINE_SPINLOCK(flush_lock);
-
 /*
  * PPPoE could be in the following stages:
  * 1) Discovery stage (to obtain remote MAC and Session ID)
@@ -303,48 +300,49 @@ static void pppoe_flush_dev(struct net_device *dev)
 	write_lock_bh(&pn->hash_lock);
 	for (i = 0; i < PPPOE_HASH_SIZE; i++) {
 		struct pppox_sock *po = pn->hash_table[i];
+		struct sock *sk;
 
-		while (po != NULL) {
-			struct sock *sk;
-			if (po->pppoe_dev != dev) {
-				po = po->next;
-				continue;
-			}
-			sk = sk_pppox(po);
-			spin_lock(&flush_lock);
-			po->pppoe_dev = NULL;
-			spin_unlock(&flush_lock);
-			dev_put(dev);
-
-			/* We always grab the socket lock, followed by the
-			 * hash_lock, in that order.  Since we should
-			 * hold the sock lock while doing any unbinding,
-			 * we need to release the lock we're holding.
-			 * Hold a reference to the sock so it doesn't disappear
-			 * as we're jumping between locks.
-			 */
+		while (po && po->pppoe_dev != dev) {
+			po = po->next;
+		}
 
-			sock_hold(sk);
+		if (po == NULL) {
+			continue;
+		}
 
-			write_unlock_bh(&pn->hash_lock);
-			lock_sock(sk);
+		sk = sk_pppox(po);
 
-			if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
-				pppox_unbind_sock(sk);
-				sk->sk_state = PPPOX_ZOMBIE;
-				sk->sk_state_change(sk);
-			}
+		if (po->pppoe_dev) {
+			dev_put(po->pppoe_dev);
+			po->pppoe_dev = NULL;
+		}
 
-			release_sock(sk);
-			sock_put(sk);
+		/* We always grab the socket lock, followed by the hash_lock,
+		 * in that order.  Since we should hold the sock lock while
+		 * doing any unbinding, we need to release the lock we're
+		 * holding.  Hold a reference to the sock so it doesn't
+		 * disappear as we're jumping between locks.
+		 */
 
-			/* Restart scan at the beginning of this hash chain.
-			 * While the lock was dropped the chain contents may
-			 * have changed.
-			 */
-			write_lock_bh(&pn->hash_lock);
-			po = pn->hash_table[i];
+		sock_hold(sk);
+		write_unlock_bh(&pn->hash_lock);
+		lock_sock(sk);
+
+		if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+			pppox_unbind_sock(sk);
+			sk->sk_state = PPPOX_ZOMBIE;
+			sk->sk_state_change(sk);
 		}
+
+		release_sock(sk);
+		sock_put(sk);
+
+		/* Restart the process from the start of the current hash
+		 * chain. We dropped locks so the world may have change from
+		 * underneath us.
+		 */
+		write_unlock_bh(&pn->hash_lock);
+		po = pn->hash_table[i];
 	}
 	write_unlock_bh(&pn->hash_lock);
 }
@@ -561,6 +559,7 @@ static int pppoe_release(struct socket *sock)
 	struct sock *sk = sock->sk;
 	struct pppox_sock *po;
 	struct pppoe_net *pn;
+	struct net *net = NULL;
 
 	if (!sk)
 		return 0;
@@ -576,19 +575,8 @@ static int pppoe_release(struct socket *sock)
 	/* Signal the death of the socket. */
 	sk->sk_state = PPPOX_DEAD;
 
-	/*
-	 * pppoe_flush_dev could lead to a race with
-	 * this routine so we use flush_lock to eliminate
-	 * such a case (we only need per-net specific data)
-	 */
-	spin_lock(&flush_lock);
-	po = pppox_sk(sk);
-	if (!po->pppoe_dev) {
-		spin_unlock(&flush_lock);
-		goto out;
-	}
-	pn = pppoe_pernet(dev_net(po->pppoe_dev));
-	spin_unlock(&flush_lock);
+	net = sock_net(sk);
+	pn = pppoe_pernet(net);
 
 	/*
 	 * protect "po" from concurrent updates
@@ -607,8 +595,8 @@ static int pppoe_release(struct socket *sock)
 	}
 
 	write_unlock_bh(&pn->hash_lock);
+	put_net(net);
 
-out:
 	sock_orphan(sk);
 	sock->sk = NULL;
 
@@ -625,8 +613,9 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 	struct sock *sk = sock->sk;
 	struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr;
 	struct pppox_sock *po = pppox_sk(sk);
-	struct net_device *dev;
 	struct pppoe_net *pn;
+	struct net_device *dev = NULL;
+	struct net *net = NULL;
 	int error;
 
 	lock_sock(sk);
@@ -653,10 +642,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 	if (stage_session(po->pppoe_pa.sid)) {
 		pppox_unbind_sock(sk);
 		if (po->pppoe_dev) {
-			pn = pppoe_pernet(dev_net(po->pppoe_dev));
+			struct net *old = dev_net(po->pppoe_dev);
+			pn = pppoe_pernet(old);
 			delete_item(pn, po->pppoe_pa.sid,
 				po->pppoe_pa.remote, po->pppoe_ifindex);
 			dev_put(po->pppoe_dev);
+			put_net(old);
 		}
 		memset(sk_pppox(po) + 1, 0,
 		       sizeof(struct pppox_sock) - sizeof(struct sock));
@@ -666,13 +657,17 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 	/* Re-bind in session stage only */
 	if (stage_session(sp->sa_addr.pppoe.sid)) {
 		error = -ENODEV;
-		dev = dev_get_by_name(sock_net(sk), sp->sa_addr.pppoe.dev);
-		if (!dev)
+		net = maybe_get_net(dev_net(dev));
+		if (!net)
 			goto end;
 
+		dev = dev_get_by_name(net, sp->sa_addr.pppoe.dev);
+		if (!dev)
+			goto err_put_net;
+
 		po->pppoe_dev = dev;
 		po->pppoe_ifindex = dev->ifindex;
-		pn = pppoe_pernet(dev_net(dev));
+		pn = pppoe_pernet(net);
 		write_lock_bh(&pn->hash_lock);
 		if (!(dev->flags & IFF_UP)) {
 			write_unlock_bh(&pn->hash_lock);
@@ -707,6 +702,10 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 end:
 	release_sock(sk);
 	return error;
+err_put_net:
+	if (net)
+		put_net(net);
+
 err_put:
 	if (po->pppoe_dev) {
 		dev_put(po->pppoe_dev);

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 18:07             ` Michal Ostrowski
@ 2009-10-19 18:44               ` Eric Dumazet
  2009-10-19 19:29                 ` Cyrill Gorcunov
                                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Eric Dumazet @ 2009-10-19 18:44 UTC (permalink / raw)
  To: Michal Ostrowski
  Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

Michal Ostrowski a écrit :
> On Mon, Oct 19, 2009 at 12:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Michal Ostrowski a écrit :
>>> Here's a bigger patch that just gets rid of flush_lock altogether.
>>>
>>> We were seeing oopses due to net namespaces going away while we were using
>>> them, which turns out is simply due to the fact that pppoew wasn't claiming ref
>>> counts properly.
>>>
>>> Fixing this requires that adding and removing entries to the per-net hash-table
>>> requires incrementing and decrementing the ref count.  This also allows us to
>>> get rid of the flush_lock since we can now depend on the existence of
>>> "pn->hash_lock".
>>>
>>> We also have to be careful when flushing devices that removal of a hash table
>>> entry may bring the net namespace refcount to 0.
>>>
>> Your patch is mangled (tabulation -> white spaces),
> 
> Patch mangling was due to mailer interactions, I'll attach a clean
> version here, no more inlining.
> 
>> and I dont believe namespace refcount can reach 0 inside pppoe_flush_dev(),
>> it would be a bug from core network code.
>>
> 
> From the original oops I was able to deduce that the namespace somehow
> managed to get destroyed during the interval where we dropped locks.
> If that's not due to the release_sock() call in pppoe_flush_dev()
> triggering a cleanup then I'd have to assume that that it's due to a
> secondary actor closing the socket in parallel, but that in turn would
> point to issues with the flush_lock.  Having said that the thrust of
> this patch remains valid; it just means I don't need to inc the ref
> count in pppoe_flush_dev().
> 
> Do you agree?
> 

Not really :)

I dont believe you should care of namespace, and/or mess with its refcount at all.

Please dont use maybe_get_net() : This function should not ever be used in drivers/net

You can add a BUG_ON(dev_net(xxxx)->count <= 0) if you really want, but if this
assertion is false, this is not because of pppoe.


 	lock_sock(sk);
@@ -653,10 +642,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 	if (stage_session(po->pppoe_pa.sid)) {
 		pppox_unbind_sock(sk);
 		if (po->pppoe_dev) {
-			pn = pppoe_pernet(dev_net(po->pppoe_dev));
+			struct net *old = dev_net(po->pppoe_dev);
+			pn = pppoe_pernet(old);
 			delete_item(pn, po->pppoe_pa.sid,
 				po->pppoe_pa.remote, po->pppoe_ifindex);
 			dev_put(po->pppoe_dev);
+			put_net(old);
 		}
 		memset(sk_pppox(po) + 1, 0,
 		       sizeof(struct pppox_sock) - sizeof(struct sock));


There is still a race here, since you do a dev_put(po->ppoe_dev); without any lock held

So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time.

In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check
all occurences of po->ppoe_dev use in the code and check if appropriate locking is done.

pppoe_rcv_core() is not safe
pppoe_ioctl() is not safe
pppoe_sendmsg() is not safe
__pppoe_xmit() is not safe


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 18:44               ` Eric Dumazet
@ 2009-10-19 19:29                 ` Cyrill Gorcunov
  2009-10-19 20:54                 ` Michal Ostrowski
  2009-10-19 20:57                 ` Cyrill Gorcunov
  2 siblings, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-19 19:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

[Eric Dumazet - Mon, Oct 19, 2009 at 08:44:52PM +0200]
...
| 
| Not really :)
| 
| I dont believe you should care of namespace, and/or mess with its refcount at all.
| 
| Please dont use maybe_get_net() : This function should not ever be used in drivers/net
| 
| You can add a BUG_ON(dev_net(xxxx)->count <= 0) if you really want, but if this
| assertion is false, this is not because of pppoe.
| 
...
| So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time.
| 
| In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check
| all occurences of po->ppoe_dev use in the code and check if appropriate locking is done.
| 
| pppoe_rcv_core() is not safe
| pppoe_ioctl() is not safe
| pppoe_sendmsg() is not safe
| __pppoe_xmit() is not safe
| 

Sigh... seem so (which is mostly my fault not Michal). Every time we touch pppoe_dev we
should dev_hold on it and dev_put as only done all we need. Async nature
of notifier seem to be a key here.

	-- Cyrill

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 18:44               ` Eric Dumazet
  2009-10-19 19:29                 ` Cyrill Gorcunov
@ 2009-10-19 20:54                 ` Michal Ostrowski
  2009-10-20  3:42                   ` Eric Dumazet
  2009-10-19 20:57                 ` Cyrill Gorcunov
  2 siblings, 1 reply; 34+ messages in thread
From: Michal Ostrowski @ 2009-10-19 20:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

[-- Attachment #1: Type: text/plain, Size: 3819 bytes --]

Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED,
and all use cases now rely on the socket lock.  Because of this, the
ref-count on the namespace held by the socket object suffices to hold
the namespace in existence and so we don't need to ref-count the
namespace in PPPoE. The flush_lock is gone.

--
Michal Ostrowski
mostrows@gmail.com



On Mon, Oct 19, 2009 at 1:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Michal Ostrowski a écrit :
>> On Mon, Oct 19, 2009 at 12:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Michal Ostrowski a écrit :
>>>> Here's a bigger patch that just gets rid of flush_lock altogether.
>>>>
>>>> We were seeing oopses due to net namespaces going away while we were using
>>>> them, which turns out is simply due to the fact that pppoew wasn't claiming ref
>>>> counts properly.
>>>>
>>>> Fixing this requires that adding and removing entries to the per-net hash-table
>>>> requires incrementing and decrementing the ref count.  This also allows us to
>>>> get rid of the flush_lock since we can now depend on the existence of
>>>> "pn->hash_lock".
>>>>
>>>> We also have to be careful when flushing devices that removal of a hash table
>>>> entry may bring the net namespace refcount to 0.
>>>>
>>> Your patch is mangled (tabulation -> white spaces),
>>
>> Patch mangling was due to mailer interactions, I'll attach a clean
>> version here, no more inlining.
>>
>>> and I dont believe namespace refcount can reach 0 inside pppoe_flush_dev(),
>>> it would be a bug from core network code.
>>>
>>
>> From the original oops I was able to deduce that the namespace somehow
>> managed to get destroyed during the interval where we dropped locks.
>> If that's not due to the release_sock() call in pppoe_flush_dev()
>> triggering a cleanup then I'd have to assume that that it's due to a
>> secondary actor closing the socket in parallel, but that in turn would
>> point to issues with the flush_lock.  Having said that the thrust of
>> this patch remains valid; it just means I don't need to inc the ref
>> count in pppoe_flush_dev().
>>
>> Do you agree?
>>
>
> Not really :)
>
> I dont believe you should care of namespace, and/or mess with its refcount at all.
>
> Please dont use maybe_get_net() : This function should not ever be used in drivers/net
>
> You can add a BUG_ON(dev_net(xxxx)->count <= 0) if you really want, but if this
> assertion is false, this is not because of pppoe.
>
>
>        lock_sock(sk);
> @@ -653,10 +642,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
>        if (stage_session(po->pppoe_pa.sid)) {
>                pppox_unbind_sock(sk);
>                if (po->pppoe_dev) {
> -                       pn = pppoe_pernet(dev_net(po->pppoe_dev));
> +                       struct net *old = dev_net(po->pppoe_dev);
> +                       pn = pppoe_pernet(old);
>                        delete_item(pn, po->pppoe_pa.sid,
>                                po->pppoe_pa.remote, po->pppoe_ifindex);
>                        dev_put(po->pppoe_dev);
> +                       put_net(old);
>                }
>                memset(sk_pppox(po) + 1, 0,
>                       sizeof(struct pppox_sock) - sizeof(struct sock));
>
>
> There is still a race here, since you do a dev_put(po->ppoe_dev); without any lock held
>
> So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time.
>
> In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check
> all occurences of po->ppoe_dev use in the code and check if appropriate locking is done.
>
> pppoe_rcv_core() is not safe
> pppoe_ioctl() is not safe
> pppoe_sendmsg() is not safe
> __pppoe_xmit() is not safe
>
>

[-- Attachment #2: 0001-PPPoE-Fix-flush-close-races.patch --]
[-- Type: application/octet-stream, Size: 6458 bytes --]

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 7cbf6f9..1d0b569 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -111,9 +111,6 @@ struct pppoe_net {
 	rwlock_t hash_lock;
 };
 
-/* to eliminate a race btw pppoe_flush_dev and pppoe_release */
-static DEFINE_SPINLOCK(flush_lock);
-
 /*
  * PPPoE could be in the following stages:
  * 1) Discovery stage (to obtain remote MAC and Session ID)
@@ -303,48 +300,46 @@ static void pppoe_flush_dev(struct net_device *dev)
 	write_lock_bh(&pn->hash_lock);
 	for (i = 0; i < PPPOE_HASH_SIZE; i++) {
 		struct pppox_sock *po = pn->hash_table[i];
+		struct sock *sk;
 
-		while (po != NULL) {
-			struct sock *sk;
-			if (po->pppoe_dev != dev) {
-				po = po->next;
-				continue;
-			}
-			sk = sk_pppox(po);
-			spin_lock(&flush_lock);
-			po->pppoe_dev = NULL;
-			spin_unlock(&flush_lock);
-			dev_put(dev);
-
-			/* We always grab the socket lock, followed by the
-			 * hash_lock, in that order.  Since we should
-			 * hold the sock lock while doing any unbinding,
-			 * we need to release the lock we're holding.
-			 * Hold a reference to the sock so it doesn't disappear
-			 * as we're jumping between locks.
-			 */
+		while (po && po->pppoe_dev != dev) {
+			po = po->next;
+		}
 
-			sock_hold(sk);
+		if (po == NULL) {
+			continue;
+		}
 
-			write_unlock_bh(&pn->hash_lock);
-			lock_sock(sk);
+		sk = sk_pppox(po);
 
-			if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
-				pppox_unbind_sock(sk);
-				sk->sk_state = PPPOX_ZOMBIE;
-				sk->sk_state_change(sk);
-			}
+		/* We always grab the socket lock, followed by the hash_lock,
+		 * in that order.  Since we should hold the sock lock while
+		 * doing any unbinding, we need to release the lock we're
+		 * holding.  Hold a reference to the sock so it doesn't
+		 * disappear as we're jumping between locks.
+		 */
 
-			release_sock(sk);
-			sock_put(sk);
+		sock_hold(sk);
+		write_unlock_bh(&pn->hash_lock);
+		lock_sock(sk);
 
-			/* Restart scan at the beginning of this hash chain.
-			 * While the lock was dropped the chain contents may
-			 * have changed.
-			 */
-			write_lock_bh(&pn->hash_lock);
-			po = pn->hash_table[i];
+		if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+			pppox_unbind_sock(sk);
+			sk->sk_state = PPPOX_ZOMBIE;
+			sk->sk_state_change(sk);
+			po->pppoe_dev = NULL;
+			dev_put(dev);
 		}
+
+		release_sock(sk);
+		sock_put(sk);
+
+		/* Restart the process from the start of the current hash
+		 * chain. We dropped locks so the world may have change from
+		 * underneath us.
+		 */
+		write_lock_bh(&pn->hash_lock);
+		po = pn->hash_table[i];
 	}
 	write_unlock_bh(&pn->hash_lock);
 }
@@ -391,8 +386,8 @@ static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
 	if (sk->sk_state & PPPOX_BOUND) {
 		ppp_input(&po->chan, skb);
 	} else if (sk->sk_state & PPPOX_RELAY) {
-		relay_po = get_item_by_addr(dev_net(po->pppoe_dev),
-						&po->pppoe_relay);
+		relay_po = get_item_by_addr(sock_net(sk),
+					    &po->pppoe_relay);
 		if (relay_po == NULL)
 			goto abort_kfree;
 
@@ -561,6 +556,7 @@ static int pppoe_release(struct socket *sock)
 	struct sock *sk = sock->sk;
 	struct pppox_sock *po;
 	struct pppoe_net *pn;
+	struct net *net = NULL;
 
 	if (!sk)
 		return 0;
@@ -571,24 +567,19 @@ static int pppoe_release(struct socket *sock)
 		return -EBADF;
 	}
 
+	po = pppox_sk(sk);
+
+	if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND)) {
+		dev_put(po->pppoe_dev);
+	}
+
 	pppox_unbind_sock(sk);
 
 	/* Signal the death of the socket. */
 	sk->sk_state = PPPOX_DEAD;
 
-	/*
-	 * pppoe_flush_dev could lead to a race with
-	 * this routine so we use flush_lock to eliminate
-	 * such a case (we only need per-net specific data)
-	 */
-	spin_lock(&flush_lock);
-	po = pppox_sk(sk);
-	if (!po->pppoe_dev) {
-		spin_unlock(&flush_lock);
-		goto out;
-	}
-	pn = pppoe_pernet(dev_net(po->pppoe_dev));
-	spin_unlock(&flush_lock);
+	net = sock_net(sk);
+	pn = pppoe_pernet(net);
 
 	/*
 	 * protect "po" from concurrent updates
@@ -596,19 +587,12 @@ static int pppoe_release(struct socket *sock)
 	 */
 	write_lock_bh(&pn->hash_lock);
 
-	po = pppox_sk(sk);
 	if (stage_session(po->pppoe_pa.sid))
 		__delete_item(pn, po->pppoe_pa.sid, po->pppoe_pa.remote,
 				po->pppoe_ifindex);
 
-	if (po->pppoe_dev) {
-		dev_put(po->pppoe_dev);
-		po->pppoe_dev = NULL;
-	}
-
 	write_unlock_bh(&pn->hash_lock);
 
-out:
 	sock_orphan(sk);
 	sock->sk = NULL;
 
@@ -625,8 +609,9 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 	struct sock *sk = sock->sk;
 	struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr;
 	struct pppox_sock *po = pppox_sk(sk);
-	struct net_device *dev;
 	struct pppoe_net *pn;
+	struct net_device *dev = NULL;
+	struct net *net = NULL;
 	int error;
 
 	lock_sock(sk);
@@ -653,10 +638,12 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 	if (stage_session(po->pppoe_pa.sid)) {
 		pppox_unbind_sock(sk);
 		if (po->pppoe_dev) {
-			pn = pppoe_pernet(dev_net(po->pppoe_dev));
+			struct net *old = dev_net(po->pppoe_dev);
+			pn = pppoe_pernet(old);
 			delete_item(pn, po->pppoe_pa.sid,
 				po->pppoe_pa.remote, po->pppoe_ifindex);
 			dev_put(po->pppoe_dev);
+			po->pppoe_dev = NULL;
 		}
 		memset(sk_pppox(po) + 1, 0,
 		       sizeof(struct pppox_sock) - sizeof(struct sock));
@@ -666,13 +653,14 @@ static int pppoe_connect(struct socket *sock, struct sockaddr *uservaddr,
 	/* Re-bind in session stage only */
 	if (stage_session(sp->sa_addr.pppoe.sid)) {
 		error = -ENODEV;
-		dev = dev_get_by_name(sock_net(sk), sp->sa_addr.pppoe.dev);
+		net = sock_net(sk);
+		dev = dev_get_by_name(net, sp->sa_addr.pppoe.dev);
 		if (!dev)
-			goto end;
+			goto err_put;
 
 		po->pppoe_dev = dev;
 		po->pppoe_ifindex = dev->ifindex;
-		pn = pppoe_pernet(dev_net(dev));
+		pn = pppoe_pernet(net);
 		write_lock_bh(&pn->hash_lock);
 		if (!(dev->flags & IFF_UP)) {
 			write_unlock_bh(&pn->hash_lock);
@@ -915,6 +903,8 @@ static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
 	struct pppoe_hdr *ph;
 	int data_len = skb->len;
 
+	lock_sock(sk);
+
 	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED))
 		goto abort;
 
@@ -944,10 +934,11 @@ static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
 			po->pppoe_pa.remote, NULL, data_len);
 
 	dev_queue_xmit(skb);
-
+	release_sock(sk);
 	return 1;
 
 abort:
+	release_sock(sk);
 	kfree_skb(skb);
 	return 1;
 }

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 20:54                 ` Michal Ostrowski
@ 2009-10-20  3:42                   ` Eric Dumazet
  2009-10-20  5:02                     ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2009-10-20  3:42 UTC (permalink / raw)
  To: Michal Ostrowski
  Cc: Cyrill Gorcunov, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

Michal Ostrowski a écrit :
> Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED,
> and all use cases now rely on the socket lock.  Because of this, the
> ref-count on the namespace held by the socket object suffices to hold
> the namespace in existence and so we don't need to ref-count the
> namespace in PPPoE. The flush_lock is gone.
> 

Seems good !

But can we use lock_sock() in __pppoe_xmit() context ?




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20  3:42                   ` Eric Dumazet
@ 2009-10-20  5:02                     ` Cyrill Gorcunov
  2009-10-20  5:05                       ` Eric Dumazet
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20  5:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Michal Ostrowski a écrit :
>> Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED,
>> and all use cases now rely on the socket lock.  Because of this, the
>> ref-count on the namespace held by the socket object suffices to hold
>> the namespace in existence and so we don't need to ref-count the
>> namespace in PPPoE. The flush_lock is gone.
>>
>
> Seems good !
>
> But can we use lock_sock() in __pppoe_xmit() context ?
>

Eric, most probably i miss something, but how lock sock protect us
from mtu changed via sysfs. This action calls change mtu notifier
which doesn't care about sockets at all...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20  5:02                     ` Cyrill Gorcunov
@ 2009-10-20  5:05                       ` Eric Dumazet
  2009-10-20  5:17                         ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Dumazet @ 2009-10-20  5:05 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

Cyrill Gorcunov a écrit :
> On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Michal Ostrowski a écrit :
>>> Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED,
>>> and all use cases now rely on the socket lock.  Because of this, the
>>> ref-count on the namespace held by the socket object suffices to hold
>>> the namespace in existence and so we don't need to ref-count the
>>> namespace in PPPoE. The flush_lock is gone.
>>>
>> Seems good !
>>
>> But can we use lock_sock() in __pppoe_xmit() context ?
>>
> 
> Eric, most probably i miss something, but how lock sock protect us
> from mtu changed via sysfs. This action calls change mtu notifier
> which doesn't care about sockets at all...

This ultimately calls pppoe_flush_dev() and this function
takes care of taking appropriate sock_locks() on each sockets ?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20  5:05                       ` Eric Dumazet
@ 2009-10-20  5:17                         ` Cyrill Gorcunov
  2009-10-20  6:04                           ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20  5:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Cyrill Gorcunov a écrit :
>> On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Michal Ostrowski a écrit :
>>>> Access of po->pppoe_dev is guarded by sk->sk_state & PPPOX_CONNECTED,
>>>> and all use cases now rely on the socket lock.  Because of this, the
>>>> ref-count on the namespace held by the socket object suffices to hold
>>>> the namespace in existence and so we don't need to ref-count the
>>>> namespace in PPPoE. The flush_lock is gone.
>>>>
>>> Seems good !
>>>
>>> But can we use lock_sock() in __pppoe_xmit() context ?
>>>
>>
>> Eric, most probably i miss something, but how lock sock protect us
>> from mtu changed via sysfs. This action calls change mtu notifier
>> which doesn't care about sockets at all...
>
> This ultimately calls pppoe_flush_dev() and this function
> takes care of taking appropriate sock_locks() on each sockets ?
>
This hold and lock socks but set pppoe_dev to null as well. I'll back
later. And i need to reread the code.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20  5:17                         ` Cyrill Gorcunov
@ 2009-10-20  6:04                           ` Cyrill Gorcunov
  0 siblings, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20  6:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

On 10/20/09, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> On 10/20/09, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> This ultimately calls pppoe_flush_dev() and this function
>> takes care of taking appropriate sock_locks() on each sockets ?
>>
> This hold and lock socks but set pppoe_dev to null as well. I'll back
> later. And i need to reread the code.
>
from a second glance we have a race with pppoe_connect (err_put label)
and pppoe_flush_dev. Connect lock socket but while flushing we may
dev_put with null device. So as Eric pointed early the error path is
racy and should be protected with flush_lock.  Note that while
flushing pppoe we may do dev_put without socket locked. This should be
the corner case. Michal could you add this lock in pppoe_connect (i
cant do this at moment).

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 18:44               ` Eric Dumazet
  2009-10-19 19:29                 ` Cyrill Gorcunov
  2009-10-19 20:54                 ` Michal Ostrowski
@ 2009-10-19 20:57                 ` Cyrill Gorcunov
  2009-10-19 21:22                   ` Michal Ostrowski
  2 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-19 20:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michal Ostrowski, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

[Eric Dumazet - Mon, Oct 19, 2009 at 08:44:52PM +0200]
...
| 
| There is still a race here, since you do a dev_put(po->ppoe_dev); without any lock held
| 
| So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time.
| 
| In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check
| all occurences of po->ppoe_dev use in the code and check if appropriate locking is done.
| 
| pppoe_rcv_core() is not safe
| pppoe_ioctl() is not safe
| pppoe_sendmsg() is not safe
| __pppoe_xmit() is not safe
| 

Eric, Michal, please take a look (compile-test only).
There is still unclear moment for NETDEV_CHANGEMTU
since one could be doing this in say endless loop from
userspace calling device to flush all the time which
implies dev_put as a result :(

Comments are welcome (and complains as well). I'll be able to
continue handling (or reply to mail) tomorrow evening only.
Anyway -- this patch is ugly enough but may happen to work.

	-- Cyrill
---
 drivers/net/pppoe.c |   79 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 62 insertions(+), 17 deletions(-)

Index: linux-2.6.git/drivers/net/pppoe.c
=====================================================================
--- linux-2.6.git.orig/drivers/net/pppoe.c
+++ linux-2.6.git/drivers/net/pppoe.c
@@ -386,14 +386,19 @@ static struct notifier_block pppoe_notif
 static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
 {
 	struct pppox_sock *po = pppox_sk(sk);
-	struct pppox_sock *relay_po;
+	struct pppox_sock *relay_po = NULL;
+	struct net *net = NULL;
 
 	if (sk->sk_state & PPPOX_BOUND) {
 		ppp_input(&po->chan, skb);
 	} else if (sk->sk_state & PPPOX_RELAY) {
-		relay_po = get_item_by_addr(dev_net(po->pppoe_dev),
-						&po->pppoe_relay);
-		if (relay_po == NULL)
+		spin_lock(&flush_lock);
+		if (po->pppoe_dev)
+			net = dev_net(po->pppoe_dev);
+		spin_unlock(&flush_lock);
+		if (net)
+			relay_po = get_item_by_addr(net, &po->pppoe_relay);
+		if (!net || !relay_po)
 			goto abort_kfree;
 
 		if ((sk_pppox(relay_po)->sk_state & PPPOX_CONNECTED) == 0)
@@ -652,12 +657,15 @@ static int pppoe_connect(struct socket *
 	/* Delete the old binding */
 	if (stage_session(po->pppoe_pa.sid)) {
 		pppox_unbind_sock(sk);
+		spin_lock(&flush_lock);
 		if (po->pppoe_dev) {
 			pn = pppoe_pernet(dev_net(po->pppoe_dev));
 			delete_item(pn, po->pppoe_pa.sid,
 				po->pppoe_pa.remote, po->pppoe_ifindex);
 			dev_put(po->pppoe_dev);
+			po->pppoe_dev = NULL;
 		}
+		spin_unlock(&flush_lock);
 		memset(sk_pppox(po) + 1, 0,
 		       sizeof(struct pppox_sock) - sizeof(struct sock));
 		sk->sk_state = PPPOX_NONE;
@@ -670,7 +678,9 @@ static int pppoe_connect(struct socket *
 		if (!dev)
 			goto end;
 
+		spin_lock(&flush_lock);
 		po->pppoe_dev = dev;
+		spin_unlock(&flush_lock);
 		po->pppoe_ifindex = dev->ifindex;
 		pn = pppoe_pernet(dev_net(dev));
 		write_lock_bh(&pn->hash_lock);
@@ -708,10 +718,12 @@ end:
 	release_sock(sk);
 	return error;
 err_put:
+	spin_lock(&flush_lock);
 	if (po->pppoe_dev) {
 		dev_put(po->pppoe_dev);
 		po->pppoe_dev = NULL;
 	}
+	spin_unlock(&flush_lock);
 	goto end;
 }
 
@@ -738,6 +750,7 @@ static int pppoe_ioctl(struct socket *so
 {
 	struct sock *sk = sock->sk;
 	struct pppox_sock *po = pppox_sk(sk);
+	unsigned int mtu = 0;
 	int val;
 	int err;
 
@@ -746,11 +759,19 @@ static int pppoe_ioctl(struct socket *so
 		err = -ENXIO;
 		if (!(sk->sk_state & PPPOX_CONNECTED))
 			break;
-
+		err = -EBUSY;
+		if (!spin_trylock(&flush_lock))
+			break;
+		err = -ENODEV;
+		if (po->pppoe_dev) {
+			mtu = po->pppoe_dev->mtu;
+			err = 0;
+		}
+		spin_unlock(&flush_lock);
+		if (err)
+			break;
 		err = -EFAULT;
-		if (put_user(po->pppoe_dev->mtu -
-			     sizeof(struct pppoe_hdr) -
-			     PPP_HDRLEN,
+		if (put_user(mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN,
 			     (int __user *)arg))
 			break;
 		err = 0;
@@ -760,14 +781,22 @@ static int pppoe_ioctl(struct socket *so
 		err = -ENXIO;
 		if (!(sk->sk_state & PPPOX_CONNECTED))
 			break;
-
+		err = -EBUSY;
+		if (!spin_trylock(&flush_lock))
+			break;
+		err = -ENODEV;
+		if (po->pppoe_dev) {
+			mtu = po->pppoe_dev->mtu;
+			err = 0;
+		}
+		spin_unlock(&flush_lock);
+		if (err)
+			break;
 		err = -EFAULT;
 		if (get_user(val, (int __user *)arg))
 			break;
 
-		if (val < (po->pppoe_dev->mtu
-			   - sizeof(struct pppoe_hdr)
-			   - PPP_HDRLEN))
+		if (val < (mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN))
 			err = 0;
 		else
 			err = -EINVAL;
@@ -842,7 +871,7 @@ static int pppoe_sendmsg(struct kiocb *i
 	int error;
 	struct pppoe_hdr hdr;
 	struct pppoe_hdr *ph;
-	struct net_device *dev;
+	struct net_device *dev = NULL;
 	char *start;
 
 	lock_sock(sk);
@@ -856,7 +885,15 @@ static int pppoe_sendmsg(struct kiocb *i
 	hdr.code = 0;
 	hdr.sid = po->num;
 
+	spin_lock(&flush_lock);
 	dev = po->pppoe_dev;
+	if (!dev) {
+		spin_unlock(&flush_lock);
+		error = -ENODEV;
+		goto end;
+	}
+	dev_hold(dev);
+	spin_unlock(&flush_lock);
 
 	error = -EMSGSIZE;
 	if (total_len > (dev->mtu + dev->hard_header_len))
@@ -899,6 +936,8 @@ static int pppoe_sendmsg(struct kiocb *i
 	dev_queue_xmit(skb);
 
 end:
+	if (dev)
+		dev_put(dev);
 	release_sock(sk);
 	return error;
 }
@@ -911,15 +950,19 @@ end:
 static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
 {
 	struct pppox_sock *po = pppox_sk(sk);
-	struct net_device *dev = po->pppoe_dev;
+	struct net_device *dev;
 	struct pppoe_hdr *ph;
 	int data_len = skb->len;
 
 	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED))
 		goto abort;
 
-	if (!dev)
-		goto abort;
+	spin_lock(&flush_lock);
+	if (!po->pppoe_dev)
+		goto abort_unlock;
+	dev = po->pppoe_dev;
+	dev_hold(dev);
+	spin_unlock(&flush_lock);
 
 	/* Copy the data if there is no space for the header or if it's
 	 * read-only.
@@ -942,11 +985,13 @@ static int __pppoe_xmit(struct sock *sk,
 
 	dev_hard_header(skb, dev, ETH_P_PPP_SES,
 			po->pppoe_pa.remote, NULL, data_len);
-
 	dev_queue_xmit(skb);
+	dev_put(dev);
 
 	return 1;
 
+abort_unlock:
+	spin_unlock(&flush_lock);
 abort:
 	kfree_skb(skb);
 	return 1;

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 20:57                 ` Cyrill Gorcunov
@ 2009-10-19 21:22                   ` Michal Ostrowski
  2009-10-20  0:08                     ` Denys Fedoryschenko
  2009-10-20  2:28                     ` David Miller
  0 siblings, 2 replies; 34+ messages in thread
From: Michal Ostrowski @ 2009-10-19 21:22 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Eric Dumazet, Denys Fedoryschenko, netdev, linux-ppp, paulus,
	mostrows

I'm assuming that there was a race in us sending patches at nearly the same time
I'm convinced now that the flush_lock can die, and the patch I sent
out kills it.
If we were to have to have a global lock, you might as well just use
the flush_lock for everything.

Note that there's a bunch of checks for sk->sk_state vs
PPPOX_CONNECTED.  PPPOX_CONNECTED being set implies that po->pppoe_dev
is valid.   So, by ensuring that checks of sk->sk_state and
po->pppoe_dev are done under the protection of
lock_sock()/release_sock() it is not even necessary to check
po->pppoe_dev==NULL.

In fact, the NULL checks you're adding are equivalent to testing
sk->sk_state vs PPPOX_CONNECTED, and thus you're replacing the
synchronization provided by the socket lock with flush_lock.

--
Michal Ostrowski
mostrows@gmail.com



On Mon, Oct 19, 2009 at 3:57 PM, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> [Eric Dumazet - Mon, Oct 19, 2009 at 08:44:52PM +0200]
> ...
> |
> | There is still a race here, since you do a dev_put(po->ppoe_dev); without any lock held
> |
> | So pppoe_flush_dev() can run concurently and dev_put(po->ppoe_dev) at same time.
> |
> | In fact pppoe_flush_dev() can change po->ppoe_dev anytime, so you should check
> | all occurences of po->ppoe_dev use in the code and check if appropriate locking is done.
> |
> | pppoe_rcv_core() is not safe
> | pppoe_ioctl() is not safe
> | pppoe_sendmsg() is not safe
> | __pppoe_xmit() is not safe
> |
>
> Eric, Michal, please take a look (compile-test only).
> There is still unclear moment for NETDEV_CHANGEMTU
> since one could be doing this in say endless loop from
> userspace calling device to flush all the time which
> implies dev_put as a result :(
>
> Comments are welcome (and complains as well). I'll be able to
> continue handling (or reply to mail) tomorrow evening only.
> Anyway -- this patch is ugly enough but may happen to work.
>
>        -- Cyrill
> ---
>  drivers/net/pppoe.c |   79 ++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 62 insertions(+), 17 deletions(-)
>
> Index: linux-2.6.git/drivers/net/pppoe.c
> =====================================================================
> --- linux-2.6.git.orig/drivers/net/pppoe.c
> +++ linux-2.6.git/drivers/net/pppoe.c
> @@ -386,14 +386,19 @@ static struct notifier_block pppoe_notif
>  static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
>  {
>        struct pppox_sock *po = pppox_sk(sk);
> -       struct pppox_sock *relay_po;
> +       struct pppox_sock *relay_po = NULL;
> +       struct net *net = NULL;
>
>        if (sk->sk_state & PPPOX_BOUND) {
>                ppp_input(&po->chan, skb);
>        } else if (sk->sk_state & PPPOX_RELAY) {
> -               relay_po = get_item_by_addr(dev_net(po->pppoe_dev),
> -                                               &po->pppoe_relay);
> -               if (relay_po == NULL)
> +               spin_lock(&flush_lock);
> +               if (po->pppoe_dev)
> +                       net = dev_net(po->pppoe_dev);
> +               spin_unlock(&flush_lock);
> +               if (net)
> +                       relay_po = get_item_by_addr(net, &po->pppoe_relay);
> +               if (!net || !relay_po)
>                        goto abort_kfree;
>
>                if ((sk_pppox(relay_po)->sk_state & PPPOX_CONNECTED) == 0)
> @@ -652,12 +657,15 @@ static int pppoe_connect(struct socket *
>        /* Delete the old binding */
>        if (stage_session(po->pppoe_pa.sid)) {
>                pppox_unbind_sock(sk);
> +               spin_lock(&flush_lock);
>                if (po->pppoe_dev) {
>                        pn = pppoe_pernet(dev_net(po->pppoe_dev));
>                        delete_item(pn, po->pppoe_pa.sid,
>                                po->pppoe_pa.remote, po->pppoe_ifindex);
>                        dev_put(po->pppoe_dev);
> +                       po->pppoe_dev = NULL;
>                }
> +               spin_unlock(&flush_lock);
>                memset(sk_pppox(po) + 1, 0,
>                       sizeof(struct pppox_sock) - sizeof(struct sock));
>                sk->sk_state = PPPOX_NONE;
> @@ -670,7 +678,9 @@ static int pppoe_connect(struct socket *
>                if (!dev)
>                        goto end;
>
> +               spin_lock(&flush_lock);
>                po->pppoe_dev = dev;
> +               spin_unlock(&flush_lock);
>                po->pppoe_ifindex = dev->ifindex;
>                pn = pppoe_pernet(dev_net(dev));
>                write_lock_bh(&pn->hash_lock);
> @@ -708,10 +718,12 @@ end:
>        release_sock(sk);
>        return error;
>  err_put:
> +       spin_lock(&flush_lock);
>        if (po->pppoe_dev) {
>                dev_put(po->pppoe_dev);
>                po->pppoe_dev = NULL;
>        }
> +       spin_unlock(&flush_lock);
>        goto end;
>  }
>
> @@ -738,6 +750,7 @@ static int pppoe_ioctl(struct socket *so
>  {
>        struct sock *sk = sock->sk;
>        struct pppox_sock *po = pppox_sk(sk);
> +       unsigned int mtu = 0;
>        int val;
>        int err;
>
> @@ -746,11 +759,19 @@ static int pppoe_ioctl(struct socket *so
>                err = -ENXIO;
>                if (!(sk->sk_state & PPPOX_CONNECTED))
>                        break;
> -
> +               err = -EBUSY;
> +               if (!spin_trylock(&flush_lock))
> +                       break;
> +               err = -ENODEV;
> +               if (po->pppoe_dev) {
> +                       mtu = po->pppoe_dev->mtu;
> +                       err = 0;
> +               }
> +               spin_unlock(&flush_lock);
> +               if (err)
> +                       break;
>                err = -EFAULT;
> -               if (put_user(po->pppoe_dev->mtu -
> -                            sizeof(struct pppoe_hdr) -
> -                            PPP_HDRLEN,
> +               if (put_user(mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN,
>                             (int __user *)arg))
>                        break;
>                err = 0;
> @@ -760,14 +781,22 @@ static int pppoe_ioctl(struct socket *so
>                err = -ENXIO;
>                if (!(sk->sk_state & PPPOX_CONNECTED))
>                        break;
> -
> +               err = -EBUSY;
> +               if (!spin_trylock(&flush_lock))
> +                       break;
> +               err = -ENODEV;
> +               if (po->pppoe_dev) {
> +                       mtu = po->pppoe_dev->mtu;
> +                       err = 0;
> +               }
> +               spin_unlock(&flush_lock);
> +               if (err)
> +                       break;
>                err = -EFAULT;
>                if (get_user(val, (int __user *)arg))
>                        break;
>
> -               if (val < (po->pppoe_dev->mtu
> -                          - sizeof(struct pppoe_hdr)
> -                          - PPP_HDRLEN))
> +               if (val < (mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN))
>                        err = 0;
>                else
>                        err = -EINVAL;
> @@ -842,7 +871,7 @@ static int pppoe_sendmsg(struct kiocb *i
>        int error;
>        struct pppoe_hdr hdr;
>        struct pppoe_hdr *ph;
> -       struct net_device *dev;
> +       struct net_device *dev = NULL;
>        char *start;
>
>        lock_sock(sk);
> @@ -856,7 +885,15 @@ static int pppoe_sendmsg(struct kiocb *i
>        hdr.code = 0;
>        hdr.sid = po->num;
>
> +       spin_lock(&flush_lock);
>        dev = po->pppoe_dev;
> +       if (!dev) {
> +               spin_unlock(&flush_lock);
> +               error = -ENODEV;
> +               goto end;
> +       }
> +       dev_hold(dev);
> +       spin_unlock(&flush_lock);
>
>        error = -EMSGSIZE;
>        if (total_len > (dev->mtu + dev->hard_header_len))
> @@ -899,6 +936,8 @@ static int pppoe_sendmsg(struct kiocb *i
>        dev_queue_xmit(skb);
>
>  end:
> +       if (dev)
> +               dev_put(dev);
>        release_sock(sk);
>        return error;
>  }
> @@ -911,15 +950,19 @@ end:
>  static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
>  {
>        struct pppox_sock *po = pppox_sk(sk);
> -       struct net_device *dev = po->pppoe_dev;
> +       struct net_device *dev;
>        struct pppoe_hdr *ph;
>        int data_len = skb->len;
>
>        if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED))
>                goto abort;
>
> -       if (!dev)
> -               goto abort;
> +       spin_lock(&flush_lock);
> +       if (!po->pppoe_dev)
> +               goto abort_unlock;
> +       dev = po->pppoe_dev;
> +       dev_hold(dev);
> +       spin_unlock(&flush_lock);
>
>        /* Copy the data if there is no space for the header or if it's
>         * read-only.
> @@ -942,11 +985,13 @@ static int __pppoe_xmit(struct sock *sk,
>
>        dev_hard_header(skb, dev, ETH_P_PPP_SES,
>                        po->pppoe_pa.remote, NULL, data_len);
> -
>        dev_queue_xmit(skb);
> +       dev_put(dev);
>
>        return 1;
>
> +abort_unlock:
> +       spin_unlock(&flush_lock);
>  abort:
>        kfree_skb(skb);
>        return 1;
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with  "alive" pppoe interfaces
  2009-10-19 21:22                   ` Michal Ostrowski
@ 2009-10-20  0:08                     ` Denys Fedoryschenko
  2009-10-20  3:04                       ` Cyrill Gorcunov
  2009-10-20  2:28                     ` David Miller
  1 sibling, 1 reply; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-20  0:08 UTC (permalink / raw)
  To: Michal Ostrowski
  Cc: Cyrill Gorcunov, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote:
> I'm assuming that there was a race in us sending patches at nearly the same
> time I'm convinced now that the flush_lock can die, and the patch I sent
> out kills it.
o_O 

I am drowning in patches. Just let me know which one to test :-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20  0:08                     ` Denys Fedoryschenko
@ 2009-10-20  3:04                       ` Cyrill Gorcunov
  2009-10-20 11:36                         ` Denys Fedoryschenko
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20  3:04 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote:
> On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote:
>> I'm assuming that there was a race in us sending patches at nearly the
>> same
>> time I'm convinced now that the flush_lock can die, and the patch I sent
>> out kills it.
> o_O
>
> I am drowning in patches. Just let me know which one to test :-)
>
Oh ;) Try out latest Michal's patch (and then mine). I'll continue
digg this issue at next spare time slot. Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with  "alive" pppoe interfaces
  2009-10-20  3:04                       ` Cyrill Gorcunov
@ 2009-10-20 11:36                         ` Denys Fedoryschenko
  2009-10-20 11:50                           ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-20 11:36 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

On Tuesday 20 October 2009 06:04:35 Cyrill Gorcunov wrote:
> On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote:
> > On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote:
> >> I'm assuming that there was a race in us sending patches at nearly the
> >> same
> >> time I'm convinced now that the flush_lock can die, and the patch I sent
> >> out kills it.
> >
> > o_O
> >
> > I am drowning in patches. Just let me know which one to test :-)
>
> Oh ;) Try out latest Michal's patch (and then mine). I'll continue
> digg this issue at next spare time slot. Thanks!
Tried your patch, panic almost immediately
Here is a text of panic message over netconsole
http://www.nuclearcat.com/files/panic_pppoe3.txt
It is  different now, before it was pppoe_device_event, now in pppoe_rcv


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20 11:36                         ` Denys Fedoryschenko
@ 2009-10-20 11:50                           ` Cyrill Gorcunov
  2009-10-20 11:52                             ` Denys Fedoryschenko
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20 11:50 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote:
> On Tuesday 20 October 2009 06:04:35 Cyrill Gorcunov wrote:
>> On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote:
>> > On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote:
>> >> I'm assuming that there was a race in us sending patches at nearly the
>> >> same
>> >> time I'm convinced now that the flush_lock can die, and the patch I
>> >> sent
>> >> out kills it.
>> >
>> > o_O
>> >
>> > I am drowning in patches. Just let me know which one to test :-)
>>
>> Oh ;) Try out latest Michal's patch (and then mine). I'll continue
>> digg this issue at next spare time slot. Thanks!
> Tried your patch, panic almost immediately
> Here is a text of panic message over netconsole
> http://www.nuclearcat.com/files/panic_pppoe3.txt
> It is  different now, before it was pppoe_device_event, now in pppoe_rcv
>
>
Thanks a lot! I'll back with new one in a couple of hours. Meanwhile i
suppose you may try Michal's patch as well.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with  "alive" pppoe interfaces
  2009-10-20 11:50                           ` Cyrill Gorcunov
@ 2009-10-20 11:52                             ` Denys Fedoryschenko
  2009-10-20 13:42                               ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-20 11:52 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

On Tuesday 20 October 2009 14:50:09 Cyrill Gorcunov wrote:
> On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote:
> > On Tuesday 20 October 2009 06:04:35 Cyrill Gorcunov wrote:
> >> On 10/20/09, Denys Fedoryschenko <denys@visp.net.lb> wrote:
> >> > On Tuesday 20 October 2009 00:22:39 Michal Ostrowski wrote:
> >> >> I'm assuming that there was a race in us sending patches at nearly
> >> >> the same
> >> >> time I'm convinced now that the flush_lock can die, and the patch I
> >> >> sent
> >> >> out kills it.
> >> >
> >> > o_O
> >> >
> >> > I am drowning in patches. Just let me know which one to test :-)
> >>
> >> Oh ;) Try out latest Michal's patch (and then mine). I'll continue
> >> digg this issue at next spare time slot. Thanks!
> >
> > Tried your patch, panic almost immediately
> > Here is a text of panic message over netconsole
> > http://www.nuclearcat.com/files/panic_pppoe3.txt
> > It is  different now, before it was pppoe_device_event, now in pppoe_rcv
>
> Thanks a lot! I'll back with new one in a couple of hours. Meanwhile i
> suppose you may try Michal's patch as well.
I did, it didn't help.
Maybe i can run some debugging options in kernel?
Also i can add debug(printk) lines in kernel if you want, to see where is bug 
appearing.
Note, i told to Michal, so will tell here, this pc is hyperthreading P4, as i 
know it is very good to trigger various SMP race conditions.
I can try also it with nosmp if u want.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20 11:52                             ` Denys Fedoryschenko
@ 2009-10-20 13:42                               ` Cyrill Gorcunov
  2009-10-20 13:50                                 ` Denys Fedoryschenko
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20 13:42 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

[Denys Fedoryschenko - Tue, Oct 20, 2009 at 02:52:58PM +0300]
...
| > Thanks a lot! I'll back with new one in a couple of hours. Meanwhile i
| > suppose you may try Michal's patch as well.
| I did, it didn't help.
| Maybe i can run some debugging options in kernel?
| Also i can add debug(printk) lines in kernel if you want, to see where is bug 
| appearing.
| Note, i told to Michal, so will tell here, this pc is hyperthreading P4, as i 
| know it is very good to trigger various SMP race conditions.
| I can try also it with nosmp if u want.
| 

Thanks Denys, I'm preparing new patch (just back from office
and had no inet connection that is why reply is delayed, sorry).

	-- Cyrill

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20 13:42                               ` Cyrill Gorcunov
@ 2009-10-20 13:50                                 ` Denys Fedoryschenko
  2009-10-20 13:59                                   ` Cyrill Gorcunov
  0 siblings, 1 reply; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-20 13:50 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

On Tuesday 20 October 2009 16:42:17 Cyrill Gorcunov wrote:
> [Denys Fedoryschenko - Tue, Oct 20, 2009 at 02:52:58PM +0300]
> ...
>
> | > Thanks a lot! I'll back with new one in a couple of hours. Meanwhile i
> | > suppose you may try Michal's patch as well.
> |
> | I did, it didn't help.
> | Maybe i can run some debugging options in kernel?
> | Also i can add debug(printk) lines in kernel if you want, to see where is
> | bug appearing.
> | Note, i told to Michal, so will tell here, this pc is hyperthreading P4,
> | as i know it is very good to trigger various SMP race conditions.
> | I can try also it with nosmp if u want.
>
> Thanks Denys, I'm preparing new patch (just back from office
> and had no inet connection that is why reply is delayed, sorry).
There is no problem at all.
This rename operation is just future operation and host is redundant, so i can 
do tests on it anytime.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20 13:50                                 ` Denys Fedoryschenko
@ 2009-10-20 13:59                                   ` Cyrill Gorcunov
  2009-10-20 14:20                                     ` Denys Fedoryschenko
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20 13:59 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

[Denys Fedoryschenko - Tue, Oct 20, 2009 at 04:50:09PM +0300]
|
...
| >
| > Thanks Denys, I'm preparing new patch (just back from office
| > and had no inet connection that is why reply is delayed, sorry).
| There is no problem at all.
| This rename operation is just future operation and host is redundant, so i can 
| do tests on it anytime.
| 

ok, here is it, please try (it's still a draft version though)

	-- Cyrill
---
 drivers/net/pppoe.c |  106 +++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 81 insertions(+), 25 deletions(-)

Index: linux-2.6.git/drivers/net/pppoe.c
=====================================================================
--- linux-2.6.git.orig/drivers/net/pppoe.c
+++ linux-2.6.git/drivers/net/pppoe.c
@@ -313,8 +313,8 @@ static void pppoe_flush_dev(struct net_d
 			sk = sk_pppox(po);
 			spin_lock(&flush_lock);
 			po->pppoe_dev = NULL;
-			spin_unlock(&flush_lock);
 			dev_put(dev);
+			spin_unlock(&flush_lock);
 
 			/* We always grab the socket lock, followed by the
 			 * hash_lock, in that order.  Since we should
@@ -386,13 +386,21 @@ static struct notifier_block pppoe_notif
 static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
 {
 	struct pppox_sock *po = pppox_sk(sk);
-	struct pppox_sock *relay_po;
+	struct pppox_sock *relay_po = NULL;
+	struct net_device *dev = NULL;
 
 	if (sk->sk_state & PPPOX_BOUND) {
 		ppp_input(&po->chan, skb);
 	} else if (sk->sk_state & PPPOX_RELAY) {
-		relay_po = get_item_by_addr(dev_net(po->pppoe_dev),
-						&po->pppoe_relay);
+		struct pppoe_net *pn = pppoe_pernet(sock_net(sk));
+		read_lock_bh(&pn->hash_lock);
+		dev = po->pppoe_dev;
+		if (dev) {
+			dev_hold(dev);
+			relay_po = get_item_by_addr(dev_net(dev),
+					&po->pppoe_relay);
+		}
+		read_unlock_bh(&pn->hash_lock);
 		if (relay_po == NULL)
 			goto abort_kfree;
 
@@ -401,6 +409,7 @@ static int pppoe_rcv_core(struct sock *s
 
 		if (!__pppoe_xmit(sk_pppox(relay_po), skb))
 			goto abort_put;
+		dev_put(dev);
 	} else {
 		if (sock_queue_rcv_skb(sk, skb))
 			goto abort_kfree;
@@ -412,6 +421,8 @@ abort_put:
 	sock_put(sk_pppox(relay_po));
 
 abort_kfree:
+	if (dev)
+		dev_put(dev);
 	kfree_skb(skb);
 	return NET_RX_DROP;
 }
@@ -625,8 +636,8 @@ static int pppoe_connect(struct socket *
 	struct sock *sk = sock->sk;
 	struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr;
 	struct pppox_sock *po = pppox_sk(sk);
-	struct net_device *dev;
-	struct pppoe_net *pn;
+	struct net_device *dev = NULL;
+	struct pppoe_net *pn = NULL;
 	int error;
 
 	lock_sock(sk);
@@ -652,12 +663,15 @@ static int pppoe_connect(struct socket *
 	/* Delete the old binding */
 	if (stage_session(po->pppoe_pa.sid)) {
 		pppox_unbind_sock(sk);
+		spin_lock(&flush_lock);
 		if (po->pppoe_dev) {
 			pn = pppoe_pernet(dev_net(po->pppoe_dev));
 			delete_item(pn, po->pppoe_pa.sid,
 				po->pppoe_pa.remote, po->pppoe_ifindex);
 			dev_put(po->pppoe_dev);
+			po->pppoe_dev = NULL;
 		}
+		spin_unlock(&flush_lock);
 		memset(sk_pppox(po) + 1, 0,
 		       sizeof(struct pppox_sock) - sizeof(struct sock));
 		sk->sk_state = PPPOX_NONE;
@@ -670,10 +684,11 @@ static int pppoe_connect(struct socket *
 		if (!dev)
 			goto end;
 
+		write_lock_bh(&pn->hash_lock);
+		dev_hold(dev);
 		po->pppoe_dev = dev;
 		po->pppoe_ifindex = dev->ifindex;
 		pn = pppoe_pernet(dev_net(dev));
-		write_lock_bh(&pn->hash_lock);
 		if (!(dev->flags & IFF_UP)) {
 			write_unlock_bh(&pn->hash_lock);
 			goto err_put;
@@ -700,6 +715,7 @@ static int pppoe_connect(struct socket *
 			goto err_put;
 
 		sk->sk_state = PPPOX_CONNECTED;
+		dev_put(dev);
 	}
 
 	po->num = sp->sa_addr.pppoe.sid;
@@ -708,10 +724,13 @@ end:
 	release_sock(sk);
 	return error;
 err_put:
+	dev_put(dev);
+	write_lock_bh(&pn->hash_lock);
 	if (po->pppoe_dev) {
 		dev_put(po->pppoe_dev);
 		po->pppoe_dev = NULL;
 	}
+	write_unlock_bh(&pn->hash_lock);
 	goto end;
 }
 
@@ -738,6 +757,8 @@ static int pppoe_ioctl(struct socket *so
 {
 	struct sock *sk = sock->sk;
 	struct pppox_sock *po = pppox_sk(sk);
+	struct pppoe_net *pn = pppoe_pernet(sock_net(sk));
+	unsigned int mtu = 0;
 	int val;
 	int err;
 
@@ -746,11 +767,17 @@ static int pppoe_ioctl(struct socket *so
 		err = -ENXIO;
 		if (!(sk->sk_state & PPPOX_CONNECTED))
 			break;
-
+		read_lock_bh(&pn->hash_lock);
+		err = -ENODEV;
+		if (po->pppoe_dev) {
+			mtu = po->pppoe_dev->mtu;
+			err = 0;
+		}
+		read_unlock_bh(&pn->hash_lock);
+		if (err)
+			break;
 		err = -EFAULT;
-		if (put_user(po->pppoe_dev->mtu -
-			     sizeof(struct pppoe_hdr) -
-			     PPP_HDRLEN,
+		if (put_user(mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN,
 			     (int __user *)arg))
 			break;
 		err = 0;
@@ -761,13 +788,21 @@ static int pppoe_ioctl(struct socket *so
 		if (!(sk->sk_state & PPPOX_CONNECTED))
 			break;
 
+		read_lock_bh(&pn->hash_lock);
+		err = -ENODEV;
+		if (po->pppoe_dev) {
+			mtu = po->pppoe_dev->mtu;
+			err = 0;
+		}
+		read_unlock_bh(&pn->hash_lock);
+		if (err)
+			break;
+
 		err = -EFAULT;
 		if (get_user(val, (int __user *)arg))
 			break;
 
-		if (val < (po->pppoe_dev->mtu
-			   - sizeof(struct pppoe_hdr)
-			   - PPP_HDRLEN))
+		if (val < (mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN))
 			err = 0;
 		else
 			err = -EINVAL;
@@ -839,10 +874,11 @@ static int pppoe_sendmsg(struct kiocb *i
 	struct sk_buff *skb;
 	struct sock *sk = sock->sk;
 	struct pppox_sock *po = pppox_sk(sk);
+	struct pppoe_net *pn = pppoe_pernet(sock_net(sk));
 	int error;
 	struct pppoe_hdr hdr;
 	struct pppoe_hdr *ph;
-	struct net_device *dev;
+	struct net_device *dev = NULL;
 	char *start;
 
 	lock_sock(sk);
@@ -856,18 +892,27 @@ static int pppoe_sendmsg(struct kiocb *i
 	hdr.code = 0;
 	hdr.sid = po->num;
 
-	dev = po->pppoe_dev;
+	read_lock_bh(&pn->hash_lock);
+	error = -ENODEV;
+	if (po->pppoe_dev) {
+		dev = po->pppoe_dev;
+		dev_hold(dev);
+		error = 0;
+	}
+	read_unlock_bh(&pn->hash_lock);
+	if (error)
+		goto end;
 
 	error = -EMSGSIZE;
 	if (total_len > (dev->mtu + dev->hard_header_len))
-		goto end;
+		goto end_put;
 
 
 	skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32,
 			   0, GFP_KERNEL);
 	if (!skb) {
 		error = -ENOMEM;
-		goto end;
+		goto end_put;
 	}
 
 	/* Reserve space for headers. */
@@ -885,7 +930,7 @@ static int pppoe_sendmsg(struct kiocb *i
 	error = memcpy_fromiovec(start, m->msg_iov, total_len);
 	if (error < 0) {
 		kfree_skb(skb);
-		goto end;
+		goto end_put;
 	}
 
 	error = total_len;
@@ -898,6 +943,8 @@ static int pppoe_sendmsg(struct kiocb *i
 
 	dev_queue_xmit(skb);
 
+end_put:
+	dev_put(dev);
 end:
 	release_sock(sk);
 	return error;
@@ -911,21 +958,28 @@ end:
 static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
 {
 	struct pppox_sock *po = pppox_sk(sk);
-	struct net_device *dev = po->pppoe_dev;
+	struct pppoe_net *pn = pppoe_pernet(sock_net(sk));
+	struct net_device *dev;
 	struct pppoe_hdr *ph;
 	int data_len = skb->len;
 
-	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED))
+	read_lock_bh(&pn->hash_lock);
+	if (!po->pppoe_dev) {
+		read_unlock_bh(&pn->hash_lock);
 		goto abort;
+	}
+	dev = po->pppoe_dev;
+	dev_hold(dev);
+	read_unlock_bh(&pn->hash_lock);
 
-	if (!dev)
-		goto abort;
+	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED))
+		goto abort_put;
 
 	/* Copy the data if there is no space for the header or if it's
 	 * read-only.
 	 */
 	if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len))
-		goto abort;
+		goto abort_put;
 
 	__skb_push(skb, sizeof(*ph));
 	skb_reset_network_header(skb);
@@ -944,9 +998,11 @@ static int __pppoe_xmit(struct sock *sk,
 			po->pppoe_pa.remote, NULL, data_len);
 
 	dev_queue_xmit(skb);
-
+	dev_put(dev);
 	return 1;
 
+abort_put:
+	dev_put(dev);
 abort:
 	kfree_skb(skb);
 	return 1;

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20 13:59                                   ` Cyrill Gorcunov
@ 2009-10-20 14:20                                     ` Denys Fedoryschenko
  2009-10-20 14:23                                       ` Cyrill Gorcunov
  2009-10-20 19:08                                       ` Cyrill Gorcunov
  0 siblings, 2 replies; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-20 14:20 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

It panics almost immediately on boot(even on old operations  that was stable, 
seems on first pppoe customer login attempt), i will rebuild kernel and if 
interesting will try to get panic message.

On Tuesday 20 October 2009 16:59:20 Cyrill Gorcunov wrote:
> [Denys Fedoryschenko - Tue, Oct 20, 2009 at 04:50:09PM +0300]
>
> ...
>
> | > Thanks Denys, I'm preparing new patch (just back from office
> | > and had no inet connection that is why reply is delayed, sorry).
> |
> | There is no problem at all.
> | This rename operation is just future operation and host is redundant, so
> | i can do tests on it anytime.
>
> ok, here is it, please try (it's still a draft version though)
>
> 	-- Cyrill
> ---
>  drivers/net/pppoe.c |  106
> +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 81
> insertions(+), 25 deletions(-)
>
> Index: linux-2.6.git/drivers/net/pppoe.c
> =====================================================================
> --- linux-2.6.git.orig/drivers/net/pppoe.c
> +++ linux-2.6.git/drivers/net/pppoe.c
> @@ -313,8 +313,8 @@ static void pppoe_flush_dev(struct net_d
>  			sk = sk_pppox(po);
>  			spin_lock(&flush_lock);
>  			po->pppoe_dev = NULL;
> -			spin_unlock(&flush_lock);
>  			dev_put(dev);
> +			spin_unlock(&flush_lock);
>
>  			/* We always grab the socket lock, followed by the
>  			 * hash_lock, in that order.  Since we should
> @@ -386,13 +386,21 @@ static struct notifier_block pppoe_notif
>  static int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb)
>  {
>  	struct pppox_sock *po = pppox_sk(sk);
> -	struct pppox_sock *relay_po;
> +	struct pppox_sock *relay_po = NULL;
> +	struct net_device *dev = NULL;
>
>  	if (sk->sk_state & PPPOX_BOUND) {
>  		ppp_input(&po->chan, skb);
>  	} else if (sk->sk_state & PPPOX_RELAY) {
> -		relay_po = get_item_by_addr(dev_net(po->pppoe_dev),
> -						&po->pppoe_relay);
> +		struct pppoe_net *pn = pppoe_pernet(sock_net(sk));
> +		read_lock_bh(&pn->hash_lock);
> +		dev = po->pppoe_dev;
> +		if (dev) {
> +			dev_hold(dev);
> +			relay_po = get_item_by_addr(dev_net(dev),
> +					&po->pppoe_relay);
> +		}
> +		read_unlock_bh(&pn->hash_lock);
>  		if (relay_po == NULL)
>  			goto abort_kfree;
>
> @@ -401,6 +409,7 @@ static int pppoe_rcv_core(struct sock *s
>
>  		if (!__pppoe_xmit(sk_pppox(relay_po), skb))
>  			goto abort_put;
> +		dev_put(dev);
>  	} else {
>  		if (sock_queue_rcv_skb(sk, skb))
>  			goto abort_kfree;
> @@ -412,6 +421,8 @@ abort_put:
>  	sock_put(sk_pppox(relay_po));
>
>  abort_kfree:
> +	if (dev)
> +		dev_put(dev);
>  	kfree_skb(skb);
>  	return NET_RX_DROP;
>  }
> @@ -625,8 +636,8 @@ static int pppoe_connect(struct socket *
>  	struct sock *sk = sock->sk;
>  	struct sockaddr_pppox *sp = (struct sockaddr_pppox *)uservaddr;
>  	struct pppox_sock *po = pppox_sk(sk);
> -	struct net_device *dev;
> -	struct pppoe_net *pn;
> +	struct net_device *dev = NULL;
> +	struct pppoe_net *pn = NULL;
>  	int error;
>
>  	lock_sock(sk);
> @@ -652,12 +663,15 @@ static int pppoe_connect(struct socket *
>  	/* Delete the old binding */
>  	if (stage_session(po->pppoe_pa.sid)) {
>  		pppox_unbind_sock(sk);
> +		spin_lock(&flush_lock);
>  		if (po->pppoe_dev) {
>  			pn = pppoe_pernet(dev_net(po->pppoe_dev));
>  			delete_item(pn, po->pppoe_pa.sid,
>  				po->pppoe_pa.remote, po->pppoe_ifindex);
>  			dev_put(po->pppoe_dev);
> +			po->pppoe_dev = NULL;
>  		}
> +		spin_unlock(&flush_lock);
>  		memset(sk_pppox(po) + 1, 0,
>  		       sizeof(struct pppox_sock) - sizeof(struct sock));
>  		sk->sk_state = PPPOX_NONE;
> @@ -670,10 +684,11 @@ static int pppoe_connect(struct socket *
>  		if (!dev)
>  			goto end;
>
> +		write_lock_bh(&pn->hash_lock);
> +		dev_hold(dev);
>  		po->pppoe_dev = dev;
>  		po->pppoe_ifindex = dev->ifindex;
>  		pn = pppoe_pernet(dev_net(dev));
> -		write_lock_bh(&pn->hash_lock);
>  		if (!(dev->flags & IFF_UP)) {
>  			write_unlock_bh(&pn->hash_lock);
>  			goto err_put;
> @@ -700,6 +715,7 @@ static int pppoe_connect(struct socket *
>  			goto err_put;
>
>  		sk->sk_state = PPPOX_CONNECTED;
> +		dev_put(dev);
>  	}
>
>  	po->num = sp->sa_addr.pppoe.sid;
> @@ -708,10 +724,13 @@ end:
>  	release_sock(sk);
>  	return error;
>  err_put:
> +	dev_put(dev);
> +	write_lock_bh(&pn->hash_lock);
>  	if (po->pppoe_dev) {
>  		dev_put(po->pppoe_dev);
>  		po->pppoe_dev = NULL;
>  	}
> +	write_unlock_bh(&pn->hash_lock);
>  	goto end;
>  }
>
> @@ -738,6 +757,8 @@ static int pppoe_ioctl(struct socket *so
>  {
>  	struct sock *sk = sock->sk;
>  	struct pppox_sock *po = pppox_sk(sk);
> +	struct pppoe_net *pn = pppoe_pernet(sock_net(sk));
> +	unsigned int mtu = 0;
>  	int val;
>  	int err;
>
> @@ -746,11 +767,17 @@ static int pppoe_ioctl(struct socket *so
>  		err = -ENXIO;
>  		if (!(sk->sk_state & PPPOX_CONNECTED))
>  			break;
> -
> +		read_lock_bh(&pn->hash_lock);
> +		err = -ENODEV;
> +		if (po->pppoe_dev) {
> +			mtu = po->pppoe_dev->mtu;
> +			err = 0;
> +		}
> +		read_unlock_bh(&pn->hash_lock);
> +		if (err)
> +			break;
>  		err = -EFAULT;
> -		if (put_user(po->pppoe_dev->mtu -
> -			     sizeof(struct pppoe_hdr) -
> -			     PPP_HDRLEN,
> +		if (put_user(mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN,
>  			     (int __user *)arg))
>  			break;
>  		err = 0;
> @@ -761,13 +788,21 @@ static int pppoe_ioctl(struct socket *so
>  		if (!(sk->sk_state & PPPOX_CONNECTED))
>  			break;
>
> +		read_lock_bh(&pn->hash_lock);
> +		err = -ENODEV;
> +		if (po->pppoe_dev) {
> +			mtu = po->pppoe_dev->mtu;
> +			err = 0;
> +		}
> +		read_unlock_bh(&pn->hash_lock);
> +		if (err)
> +			break;
> +
>  		err = -EFAULT;
>  		if (get_user(val, (int __user *)arg))
>  			break;
>
> -		if (val < (po->pppoe_dev->mtu
> -			   - sizeof(struct pppoe_hdr)
> -			   - PPP_HDRLEN))
> +		if (val < (mtu - sizeof(struct pppoe_hdr) - PPP_HDRLEN))
>  			err = 0;
>  		else
>  			err = -EINVAL;
> @@ -839,10 +874,11 @@ static int pppoe_sendmsg(struct kiocb *i
>  	struct sk_buff *skb;
>  	struct sock *sk = sock->sk;
>  	struct pppox_sock *po = pppox_sk(sk);
> +	struct pppoe_net *pn = pppoe_pernet(sock_net(sk));
>  	int error;
>  	struct pppoe_hdr hdr;
>  	struct pppoe_hdr *ph;
> -	struct net_device *dev;
> +	struct net_device *dev = NULL;
>  	char *start;
>
>  	lock_sock(sk);
> @@ -856,18 +892,27 @@ static int pppoe_sendmsg(struct kiocb *i
>  	hdr.code = 0;
>  	hdr.sid = po->num;
>
> -	dev = po->pppoe_dev;
> +	read_lock_bh(&pn->hash_lock);
> +	error = -ENODEV;
> +	if (po->pppoe_dev) {
> +		dev = po->pppoe_dev;
> +		dev_hold(dev);
> +		error = 0;
> +	}
> +	read_unlock_bh(&pn->hash_lock);
> +	if (error)
> +		goto end;
>
>  	error = -EMSGSIZE;
>  	if (total_len > (dev->mtu + dev->hard_header_len))
> -		goto end;
> +		goto end_put;
>
>
>  	skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32,
>  			   0, GFP_KERNEL);
>  	if (!skb) {
>  		error = -ENOMEM;
> -		goto end;
> +		goto end_put;
>  	}
>
>  	/* Reserve space for headers. */
> @@ -885,7 +930,7 @@ static int pppoe_sendmsg(struct kiocb *i
>  	error = memcpy_fromiovec(start, m->msg_iov, total_len);
>  	if (error < 0) {
>  		kfree_skb(skb);
> -		goto end;
> +		goto end_put;
>  	}
>
>  	error = total_len;
> @@ -898,6 +943,8 @@ static int pppoe_sendmsg(struct kiocb *i
>
>  	dev_queue_xmit(skb);
>
> +end_put:
> +	dev_put(dev);
>  end:
>  	release_sock(sk);
>  	return error;
> @@ -911,21 +958,28 @@ end:
>  static int __pppoe_xmit(struct sock *sk, struct sk_buff *skb)
>  {
>  	struct pppox_sock *po = pppox_sk(sk);
> -	struct net_device *dev = po->pppoe_dev;
> +	struct pppoe_net *pn = pppoe_pernet(sock_net(sk));
> +	struct net_device *dev;
>  	struct pppoe_hdr *ph;
>  	int data_len = skb->len;
>
> -	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED))
> +	read_lock_bh(&pn->hash_lock);
> +	if (!po->pppoe_dev) {
> +		read_unlock_bh(&pn->hash_lock);
>  		goto abort;
> +	}
> +	dev = po->pppoe_dev;
> +	dev_hold(dev);
> +	read_unlock_bh(&pn->hash_lock);
>
> -	if (!dev)
> -		goto abort;
> +	if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED))
> +		goto abort_put;
>
>  	/* Copy the data if there is no space for the header or if it's
>  	 * read-only.
>  	 */
>  	if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len))
> -		goto abort;
> +		goto abort_put;
>
>  	__skb_push(skb, sizeof(*ph));
>  	skb_reset_network_header(skb);
> @@ -944,9 +998,11 @@ static int __pppoe_xmit(struct sock *sk,
>  			po->pppoe_pa.remote, NULL, data_len);
>
>  	dev_queue_xmit(skb);
> -
> +	dev_put(dev);
>  	return 1;
>
> +abort_put:
> +	dev_put(dev);
>  abort:
>  	kfree_skb(skb);
>  	return 1;



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20 14:20                                     ` Denys Fedoryschenko
@ 2009-10-20 14:23                                       ` Cyrill Gorcunov
  2009-10-20 19:08                                       ` Cyrill Gorcunov
  1 sibling, 0 replies; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20 14:23 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

[Denys Fedoryschenko - Tue, Oct 20, 2009 at 05:20:00PM +0300]
|
| It panics almost immediately on boot(even on old operations  that was stable, 
| seems on first pppoe customer login attempt), i will rebuild kernel and if 
| interesting will try to get panic message.
| 
...

ok, thanks. I continue digging.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20 14:20                                     ` Denys Fedoryschenko
  2009-10-20 14:23                                       ` Cyrill Gorcunov
@ 2009-10-20 19:08                                       ` Cyrill Gorcunov
  2009-10-23 15:18                                         ` Cyrill Gorcunov
  1 sibling, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-20 19:08 UTC (permalink / raw)
  To: Denys Fedoryschenko
  Cc: Michal Ostrowski, Eric Dumazet, netdev, linux-ppp, paulus,
	mostrows

[Denys Fedoryschenko - Tue, Oct 20, 2009 at 05:20:00PM +0300]
| It panics almost immediately on boot(even on old operations  that was stable, 
| seems on first pppoe customer login attempt), i will rebuild kernel and if 
| interesting will try to get panic message.
| 
...

Just to update status of the issue. The key moment is that pppoe_flush_dev
may be called asynchronously (especially via sysfs on dev entry, for example
we retrieve mtu of device and while we at it other process may update mtu
via sysfs). So I'm returning pppoe_hash_lock back which should eliminate a
number of lock complains and make locking scheme easier. All-in-one: I'm
working on it. Just need some more time.

	-- Cyrill

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-20 19:08                                       ` Cyrill Gorcunov
@ 2009-10-23 15:18                                         ` Cyrill Gorcunov
  2009-10-25 18:10                                           ` Denys Fedoryschenko
  0 siblings, 1 reply; 34+ messages in thread
From: Cyrill Gorcunov @ 2009-10-23 15:18 UTC (permalink / raw)
  To: netdev
  Cc: Denys Fedoryschenko, Michal Ostrowski, Eric Dumazet, linux-ppp,
	paulus, mostrows

[Cyrill Gorcunov - Tue, Oct 20, 2009 at 11:08:21PM +0400]
...
| 
| Just to update status of the issue. The key moment is that pppoe_flush_dev
| may be called asynchronously (especially via sysfs on dev entry, for example
| we retrieve mtu of device and while we at it other process may update mtu
| via sysfs). So I'm returning pppoe_hash_lock back which should eliminate a
| number of lock complains and make locking scheme easier. All-in-one: I'm
| working on it. Just need some more time.
| 
| 	-- Cyrill

Another status update -- the patch is under testing stage. Hope we will
reveal proper patch soon (in a few days I guess).

	-- Cyrill

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-23 15:18                                         ` Cyrill Gorcunov
@ 2009-10-25 18:10                                           ` Denys Fedoryschenko
  0 siblings, 0 replies; 34+ messages in thread
From: Denys Fedoryschenko @ 2009-10-25 18:10 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: netdev, Michal Ostrowski, Eric Dumazet, linux-ppp, paulus,
	mostrows

On Friday 23 October 2009 18:18:37 Cyrill Gorcunov wrote:
> [Cyrill Gorcunov - Tue, Oct 20, 2009 at 11:08:21PM +0400]
> ...
>
> | Just to update status of the issue. The key moment is that
> | pppoe_flush_dev may be called asynchronously (especially via sysfs on dev
> | entry, for example we retrieve mtu of device and while we at it other
> | process may update mtu via sysfs). So I'm returning pppoe_hash_lock back
> | which should eliminate a number of lock complains and make locking scheme
> | easier. All-in-one: I'm working on it. Just need some more time.
> |
> | 	-- Cyrill
>
> Another status update -- the patch is under testing stage. Hope we will
> reveal proper patch soon (in a few days I guess).
>
> 	-- Cyrill
During normal operation till now no issues (2 days uptime, 100-150 users 
online on this pppoe). Testing more some stress scenarios.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces
  2009-10-19 21:22                   ` Michal Ostrowski
  2009-10-20  0:08                     ` Denys Fedoryschenko
@ 2009-10-20  2:28                     ` David Miller
  1 sibling, 0 replies; 34+ messages in thread
From: David Miller @ 2009-10-20  2:28 UTC (permalink / raw)
  To: mostrows; +Cc: gorcunov, eric.dumazet, denys, netdev, linux-ppp, paulus,
	mostrows

Please stop top posting!

I want to follow this discussions efficiently and that's impossible
if you reply BEFORE instead of AFTER the context of what you're
replying to.

Note that a day will come very soon that postings to these lists
that do top posting will be flat out bounced back to you and
never make it to the list at all.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2009-10-25 18:10 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-18 21:02 kernel panic in latest vanilla stable, while using nameif with "alive" pppoe interfaces Denys Fedoryschenko
2009-10-19  3:34 ` Michal Ostrowski
2009-10-19 11:36   ` Denys Fedoryschenko
2009-10-19 12:01   ` Denys Fedoryschenko
2009-10-19 12:36   ` Eric Dumazet
2009-10-19 13:19     ` Michal Ostrowski
2009-10-19 15:50       ` Cyrill Gorcunov
2009-10-19 16:05         ` Michal Ostrowski
2009-10-19 17:12           ` Eric Dumazet
2009-10-19 18:07             ` Michal Ostrowski
2009-10-19 18:44               ` Eric Dumazet
2009-10-19 19:29                 ` Cyrill Gorcunov
2009-10-19 20:54                 ` Michal Ostrowski
2009-10-20  3:42                   ` Eric Dumazet
2009-10-20  5:02                     ` Cyrill Gorcunov
2009-10-20  5:05                       ` Eric Dumazet
2009-10-20  5:17                         ` Cyrill Gorcunov
2009-10-20  6:04                           ` Cyrill Gorcunov
2009-10-19 20:57                 ` Cyrill Gorcunov
2009-10-19 21:22                   ` Michal Ostrowski
2009-10-20  0:08                     ` Denys Fedoryschenko
2009-10-20  3:04                       ` Cyrill Gorcunov
2009-10-20 11:36                         ` Denys Fedoryschenko
2009-10-20 11:50                           ` Cyrill Gorcunov
2009-10-20 11:52                             ` Denys Fedoryschenko
2009-10-20 13:42                               ` Cyrill Gorcunov
2009-10-20 13:50                                 ` Denys Fedoryschenko
2009-10-20 13:59                                   ` Cyrill Gorcunov
2009-10-20 14:20                                     ` Denys Fedoryschenko
2009-10-20 14:23                                       ` Cyrill Gorcunov
2009-10-20 19:08                                       ` Cyrill Gorcunov
2009-10-23 15:18                                         ` Cyrill Gorcunov
2009-10-25 18:10                                           ` Denys Fedoryschenko
2009-10-20  2:28                     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).