netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] panic during unregister_netdevice()
@ 2003-11-06  0:00 Krishna Kumar
  2003-11-06  0:30 ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: Krishna Kumar @ 2003-11-06  0:00 UTC (permalink / raw)
  To: davem; +Cc: netdev

Hi dave,

While doing a test comprising of :

insmod e100
ifup eth0
rmmod e100

on test9-bk9 bits, I got the following Oops :

Nov  5 14:54:58 linux kernel: Unable to handle kernel paging request at virtual address 0260025d
Nov  5 14:54:58 linux kernel:  printing eip:
Nov  5 14:54:58 linux kernel: c0318a5a
Nov  5 14:54:58 linux kernel: *pde = 00000000
Nov  5 14:54:58 linux kernel: Oops: 0000 [#1]
Nov  5 14:54:58 linux kernel: CPU:    0
Nov  5 14:54:58 linux kernel: EIP:    0060:[__rta_fill+90/160]    Not tainted
Nov  5 14:54:58 linux kernel: EIP:    0060:[<c0318a5a>]    Not tainted
Nov  5 14:54:58 linux kernel: EFLAGS: 00010282
Nov  5 14:54:58 linux kernel: EIP is at __rta_fill+0x5a/0xa0
Nov  5 14:54:58 linux kernel: eax: 00000008   ebx: 00000004   ecx: 00000001   edx: c94fe980
Nov  5 14:54:58 linux kernel: esi: 0260025d   edi: c7e8a054   ebp: c1553e64   esp: c1553e48
Nov  5 14:54:58 linux kernel: ds: 007b   es: 007b   ss: 0068
Nov  5 14:54:58 linux kernel: Process events/0 (pid: 4, threadinfo=c1552000 task=c152c670)
Nov  5 14:54:58 linux kernel: Stack: c1553e6c c011b36e c0433800 00000008 c7e8a050 ce77a000 00000000 c1553e98
Nov  5 14:54:58 linux kernel:        c0318fee c94fe980 0000000a 00000004 0260025d c030a268 00000000 c7e8a000
Nov  5 14:54:58 linux kernel:        011b000e c94fe980 ce77a000 00000011 c1553ec8 c031943c c94fe980 ce77a000
Nov  5 14:54:58 linux kernel: Call Trace:
Nov  5 14:54:58 linux kernel:  [recalc_task_prio+126/384] recalc_task_prio+0x7e/0x180
Nov  5 14:54:58 linux kernel:  [<c011b36e>] recalc_task_prio+0x7e/0x180
Nov  5 14:54:58 linux kernel:  [rtnetlink_fill_ifinfo+862/1264] rtnetlink_fill_ifinfo+0x35e/0x4f0
Nov  5 14:54:58 linux kernel:  [<c0318fee>] rtnetlink_fill_ifinfo+0x35e/0x4f0
Nov  5 14:54:58 linux kernel:  [alloc_skb+72/240] alloc_skb+0x48/0xf0
Nov  5 14:54:58 linux kernel:  [<c030a268>] alloc_skb+0x48/0xf0
Nov  5 14:54:58 linux kernel:  [rtmsg_ifinfo+92/208] rtmsg_ifinfo+0x5c/0xd0
Nov  5 14:54:58 linux kernel:  [<c031943c>] rtmsg_ifinfo+0x5c/0xd0
Nov  5 14:54:58 linux kernel:  [rtnetlink_event+48/117] rtnetlink_event+0x30/0x75
Nov  5 14:54:58 linux kernel:  [<c0319980>] rtnetlink_event+0x30/0x75
Nov  5 14:54:58 linux kernel:  [notifier_call_chain+45/80] notifier_call_chain+0x2d/0x50
Nov  5 14:54:58 linux kernel:  [<c013055d>] notifier_call_chain+0x2d/0x50
Nov  5 14:54:58 linux kernel:  [netdev_wait_allrefs+242/320] netdev_wait_allrefs+0xf2/0x140
Nov  5 14:54:58 linux kernel:  [<c0310d52>] netdev_wait_allrefs+0xf2/0x140
Nov  5 14:54:58 linux kernel:  [netdev_run_todo+347/608] netdev_run_todo+0x15b/0x260
Nov  5 14:54:58 linux kernel:  [<c0310efb>] netdev_run_todo+0x15b/0x260
Nov  5 14:54:58 linux kernel:  [worker_thread+531/800] worker_thread+0x213/0x320
Nov  5 14:54:58 linux kernel:  [<c01333a3>] worker_thread+0x213/0x320
Nov  5 14:54:58 linux kernel:  [linkwatch_event+0/48] linkwatch_event+0x0/0x30
Nov  5 14:54:58 linux kernel:  [<c0319c60>] linkwatch_event+0x0/0x30
Nov  5 14:54:58 linux kernel:  [default_wake_function+0/48] default_wake_function+0x0/0x30
Nov  5 14:54:58 linux kernel:  [<c011d030>] default_wake_function+0x0/0x30
Nov  5 14:54:58 linux kernel:  [ret_from_fork+6/20] ret_from_fork+0x6/0x14
Nov  5 14:54:58 linux kernel:  [<c01097e6>] ret_from_fork+0x6/0x14
Nov  5 14:54:58 linux kernel:  [default_wake_function+0/48] default_wake_function+0x0/0x30
Nov  5 14:54:58 linux kernel:  [<c011d030>] default_wake_function+0x0/0x30
Nov  5 14:54:58 linux kernel:  [worker_thread+0/800] worker_thread+0x0/0x320
Nov  5 14:54:58 linux kernel:  [<c0133190>] worker_thread+0x0/0x320
Nov  5 14:54:58 linux kernel:  [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18
Nov  5 14:54:58 linux kernel:  [<c010753d>] kernel_thread_helper+0x5/0x18
Nov  5 14:54:58 linux kernel:
Nov  5 14:54:58 linux kernel: Code: f3 a5 f6 c3 02 74 02 66 a5 f6 c3 01 74 01 a4 8b 5d f4 8b 75


I think the problem is as follows (changed between 2.4 and 2.6).
unregister_netdevice() drops the last reference to the device and waits
for the ref counter for the dev to drop to zero. While it is OK for
unregister_netdevice to call notifier_call_chain (since it does a dev_put
at the end of the routine), netdev_wait_allrefs() cannot do the same until
it gets it's own reference. The dev can disappear during this when the
last reference gets dropped by the process holding it.

Following patch should fix it. I will try to reproduce this with and
without the patch to be certain.

Thanks,

- KK

diff -ruN linux-2.6.0-test9-bk9/net/core/dev.c linux-2.6.0-test9-bk9.new/net/core/dev.c
--- linux-2.6.0-test9-bk9/net/core/dev.c	2003-11-05 15:43:21.000000000 -0800
+++ linux-2.6.0-test9-bk9.new/net/core/dev.c	2003-11-05 15:43:50.000000000 -0800
@@ -2749,8 +2749,10 @@
 			rtnl_exlock();

 			/* Rebroadcast unregister notification */
+			dev_hold(dev);
 			notifier_call_chain(&netdev_chain,
 					    NETDEV_UNREGISTER, dev);
+			dev_put(dev);

 			if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
 				     &dev->state)) {

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] panic during unregister_netdevice()
@ 2003-11-06  0:26 Krishna Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Krishna Kumar @ 2003-11-06  0:26 UTC (permalink / raw)
  To: davem; +Cc: netdev





About the patch below, I am still not clear about dev->reg_state
(NETREG_UNREGISTERING)
being used correctly, or how netdev_run_todo works correctly, so possibly
the race may not be
fixed. I am trying to reproduce this again. I will look some more into
this.

thanks,

- KK



|---------+------------------------------->
|         |           krkumar@us.ltcfwd.li|
|         |           nux.ibm.com         |
|         |           Sent by:            |
|         |           netdev-bounce@oss.sg|
|         |           i.com               |
|         |                               |
|         |                               |
|         |           11/05/2003 04:00 PM |
|         |                               |
|---------+------------------------------->
  >-----------------------------------------------------------------------------------------------------------------|
  |                                                                                                                 |
  |       To:       davem@redhat.com                                                                                |
  |       cc:       netdev@oss.sgi.com                                                                              |
  |       Subject:  [PATCH] panic during unregister_netdevice()                                                     |
  |                                                                                                                 |
  >-----------------------------------------------------------------------------------------------------------------|




Hi dave,

While doing a test comprising of :

insmod e100
ifup eth0
rmmod e100

on test9-bk9 bits, I got the following Oops :

Nov  5 14:54:58 linux kernel: Unable to handle kernel paging request at
virtual address 0260025d
Nov  5 14:54:58 linux kernel:  printing eip:
Nov  5 14:54:58 linux kernel: c0318a5a
Nov  5 14:54:58 linux kernel: *pde = 00000000
Nov  5 14:54:58 linux kernel: Oops: 0000 [#1]
Nov  5 14:54:58 linux kernel: CPU:    0
Nov  5 14:54:58 linux kernel: EIP:    0060:[__rta_fill+90/160]    Not
tainted
Nov  5 14:54:58 linux kernel: EIP:    0060:[<c0318a5a>]    Not tainted
Nov  5 14:54:58 linux kernel: EFLAGS: 00010282
Nov  5 14:54:58 linux kernel: EIP is at __rta_fill+0x5a/0xa0
Nov  5 14:54:58 linux kernel: eax: 00000008   ebx: 00000004   ecx: 00000001
edx: c94fe980
Nov  5 14:54:58 linux kernel: esi: 0260025d   edi: c7e8a054   ebp: c1553e64
esp: c1553e48
Nov  5 14:54:58 linux kernel: ds: 007b   es: 007b   ss: 0068
Nov  5 14:54:58 linux kernel: Process events/0 (pid: 4, threadinfo=c1552000
task=c152c670)
Nov  5 14:54:58 linux kernel: Stack: c1553e6c c011b36e c0433800 00000008
c7e8a050 ce77a000 00000000 c1553e98
Nov  5 14:54:58 linux kernel:        c0318fee c94fe980 0000000a 00000004
0260025d c030a268 00000000 c7e8a000
Nov  5 14:54:58 linux kernel:        011b000e c94fe980 ce77a000 00000011
c1553ec8 c031943c c94fe980 ce77a000
Nov  5 14:54:58 linux kernel: Call Trace:
Nov  5 14:54:58 linux kernel:  [recalc_task_prio+126/384]
recalc_task_prio+0x7e/0x180
Nov  5 14:54:58 linux kernel:  [<c011b36e>] recalc_task_prio+0x7e/0x180
Nov  5 14:54:58 linux kernel:  [rtnetlink_fill_ifinfo+862/1264]
rtnetlink_fill_ifinfo+0x35e/0x4f0
Nov  5 14:54:58 linux kernel:  [<c0318fee>]
rtnetlink_fill_ifinfo+0x35e/0x4f0
Nov  5 14:54:58 linux kernel:  [alloc_skb+72/240] alloc_skb+0x48/0xf0
Nov  5 14:54:58 linux kernel:  [<c030a268>] alloc_skb+0x48/0xf0
Nov  5 14:54:58 linux kernel:  [rtmsg_ifinfo+92/208] rtmsg_ifinfo+0x5c/0xd0
Nov  5 14:54:58 linux kernel:  [<c031943c>] rtmsg_ifinfo+0x5c/0xd0
Nov  5 14:54:58 linux kernel:  [rtnetlink_event+48/117]
rtnetlink_event+0x30/0x75
Nov  5 14:54:58 linux kernel:  [<c0319980>] rtnetlink_event+0x30/0x75
Nov  5 14:54:58 linux kernel:  [notifier_call_chain+45/80]
notifier_call_chain+0x2d/0x50
Nov  5 14:54:58 linux kernel:  [<c013055d>] notifier_call_chain+0x2d/0x50
Nov  5 14:54:58 linux kernel:  [netdev_wait_allrefs+242/320]
netdev_wait_allrefs+0xf2/0x140
Nov  5 14:54:58 linux kernel:  [<c0310d52>] netdev_wait_allrefs+0xf2/0x140
Nov  5 14:54:58 linux kernel:  [netdev_run_todo+347/608]
netdev_run_todo+0x15b/0x260
Nov  5 14:54:58 linux kernel:  [<c0310efb>] netdev_run_todo+0x15b/0x260
Nov  5 14:54:58 linux kernel:  [worker_thread+531/800]
worker_thread+0x213/0x320
Nov  5 14:54:58 linux kernel:  [<c01333a3>] worker_thread+0x213/0x320
Nov  5 14:54:58 linux kernel:  [linkwatch_event+0/48]
linkwatch_event+0x0/0x30
Nov  5 14:54:58 linux kernel:  [<c0319c60>] linkwatch_event+0x0/0x30
Nov  5 14:54:58 linux kernel:  [default_wake_function+0/48]
default_wake_function+0x0/0x30
Nov  5 14:54:58 linux kernel:  [<c011d030>] default_wake_function+0x0/0x30
Nov  5 14:54:58 linux kernel:  [ret_from_fork+6/20] ret_from_fork+0x6/0x14
Nov  5 14:54:58 linux kernel:  [<c01097e6>] ret_from_fork+0x6/0x14
Nov  5 14:54:58 linux kernel:  [default_wake_function+0/48]
default_wake_function+0x0/0x30
Nov  5 14:54:58 linux kernel:  [<c011d030>] default_wake_function+0x0/0x30
Nov  5 14:54:58 linux kernel:  [worker_thread+0/800]
worker_thread+0x0/0x320
Nov  5 14:54:58 linux kernel:  [<c0133190>] worker_thread+0x0/0x320
Nov  5 14:54:58 linux kernel:  [kernel_thread_helper+5/24]
kernel_thread_helper+0x5/0x18
Nov  5 14:54:58 linux kernel:  [<c010753d>] kernel_thread_helper+0x5/0x18
Nov  5 14:54:58 linux kernel:
Nov  5 14:54:58 linux kernel: Code: f3 a5 f6 c3 02 74 02 66 a5 f6 c3 01 74
01 a4 8b 5d f4 8b 75


I think the problem is as follows (changed between 2.4 and 2.6).
unregister_netdevice() drops the last reference to the device and waits
for the ref counter for the dev to drop to zero. While it is OK for
unregister_netdevice to call notifier_call_chain (since it does a dev_put
at the end of the routine), netdev_wait_allrefs() cannot do the same until
it gets it's own reference. The dev can disappear during this when the
last reference gets dropped by the process holding it.

Following patch should fix it. I will try to reproduce this with and
without the patch to be certain.

Thanks,

- KK

diff -ruN linux-2.6.0-test9-bk9/net/core/dev.c
linux-2.6.0-test9-bk9.new/net/core/dev.c
--- linux-2.6.0-test9-bk9/net/core/dev.c         2003-11-05
15:43:21.000000000 -0800
+++ linux-2.6.0-test9-bk9.new/net/core/dev.c           2003-11-05
15:43:50.000000000 -0800
@@ -2749,8 +2749,10 @@
                                     rtnl_exlock();

                                     /* Rebroadcast unregister notification
*/
+                                    dev_hold(dev);
                                     notifier_call_chain(&netdev_chain,

NETDEV_UNREGISTER, dev);
+                                    dev_put(dev);

                                     if
(test_bit(__LINK_STATE_LINKWATCH_PENDING,
                                                      &dev->state)) {

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] panic during unregister_netdevice()
@ 2003-11-06  0:42 Krishna Kumar
  2003-11-06  1:09 ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: Krishna Kumar @ 2003-11-06  0:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, krkumar, netdev





> Hey, what if the dev refcount goes to zero before your dev_hold?

Actually, that is what I had written in my second mail.

> I would argue even running the loop once means some protocol is busted.

I have seen a message regarding ref count of 1, some delay and then the
rmmod works fine. So it
doesn't seem busted.

- KK



|---------+---------------------------->
|         |           Stephen Hemminger|
|         |           <shemminger@osdl.|
|         |           org>             |
|         |           Sent by:         |
|         |           netdev-bounce@oss|
|         |           .sgi.com         |
|         |                            |
|         |                            |
|         |           11/05/2003 04:30 |
|         |           PM               |
|         |                            |
|---------+---------------------------->
  >-----------------------------------------------------------------------------------------------------------------|
  |                                                                                                                 |
  |       To:       krkumar@us.ltcfwd.linux.ibm.com                                                                 |
  |       cc:       davem@redhat.com, netdev@oss.sgi.com                                                            |
  |       Subject:  Re: [PATCH] panic during unregister_netdevice()                                                 |
  |                                                                                                                 |
  >-----------------------------------------------------------------------------------------------------------------|





>
> diff -ruN linux-2.6.0-test9-bk9/net/core/dev.c
linux-2.6.0-test9-bk9.new/net/core/dev.c
> --- linux-2.6.0-test9-bk9/net/core/dev.c             2003-11-05
15:43:21.000000000 -0800
> +++ linux-2.6.0-test9-bk9.new/net/core/dev.c         2003-11-05
15:43:50.000000000 -0800
> @@ -2749,8 +2749,10 @@
>                                    rtnl_exlock();
>
>                                    /* Rebroadcast unregister notification
*/
> +                                  dev_hold(dev);
>                                    notifier_call_chain(&netdev_chain,
>
NETDEV_UNREGISTER, dev);
> +                                  dev_put(dev);
>
>                                    if
(test_bit(__LINK_STATE_LINKWATCH_PENDING,
>                                                     &dev->state)) {
>

Hey, what if the dev refcount goes to zero before your dev_hold?
Actually this repeated notifier looks like it wouldn't work anyway.
Why would a protocol drop it's reference when notified a second time?

I would argue even running the loop once means some protocol is busted.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] panic during unregister_netdevice()
@ 2003-11-06  1:16 Krishna Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Krishna Kumar @ 2003-11-06  1:16 UTC (permalink / raw)
  To: David S. Miller; +Cc: krkumar, netdev, Stephen Hemminger





> If the loops runs once or twice that is not a bug, it is possible for
> processes to grab onto the device via rtnetlink queries and similar
> and we have to pause and potentially schedule to deal with that.

Yes. I guess rtnetlink_rcv() calls netdev_run_todo() to handle that case.

> The core problem is that we end up putting the device reference to
> zero a second time, and for that reason we should prevent it by
> holding onto a reference around the entire loop and wait for the
> refcount to drop to '1'.

That was my original intention, but won't a driver that calls
unregister_netdevice() followed by a free_netdev() still panic the
system ? Since the netdev_run_todo() can run after this is done and
derefernce the 'dev'. (Or is the refcount of the dev->kobject going
to be > 1 and hence not freed ? This class/object code is new to me).

Thanks,

- KK

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] panic during unregister_netdevice()
@ 2003-11-06  1:20 Krishna Kumar
  2003-11-06  1:33 ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: Krishna Kumar @ 2003-11-06  1:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, krkumar, netdev





So how will this guarantee that the dev is valid after the dev_put() long
enough to do the
BUG_ON() and dev->destructor code ? Won't the same panic happen ?

Any idea how the dev gets freed up ? I was still in the old 2.4 kernel mode
thinking that
dev_put does it, but it seems to be done by using the class/object stuff in
free_netdev().

Won't a driver doing a unregister followed by a free_netdev still panic the
system if we
reference the dev at a later stage (even with the put at this place) ?

thanks,

- KK



|---------+---------------------------->
|         |           Stephen Hemminger|
|         |           <shemminger@osdl.|
|         |           org>             |
|         |           Sent by:         |
|         |           netdev-bounce@oss|
|         |           .sgi.com         |
|         |                            |
|         |                            |
|         |           11/05/2003 05:09 |
|         |           PM               |
|         |                            |
|---------+---------------------------->
  >-----------------------------------------------------------------------------------------------------------------|
  |                                                                                                                 |
  |       To:       Krishna Kumar/Beaverton/IBM@IBMUS                                                               |
  |       cc:       davem@redhat.com, krkumar@us.ltcfwd.linux.ibm.com, netdev@oss.sgi.com                           |
  |       Subject:  Re: [PATCH] panic during unregister_netdevice()                                                 |
  |                                                                                                                 |
  >-----------------------------------------------------------------------------------------------------------------|




Try this. Instead of dropping the last reference in unregister, it does it
after all other references are gone (sort of like the old 2.4 code).

diff -Nru a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c           Wed Nov  5 17:04:57 2003
+++ b/net/core/dev.c           Wed Nov  5 17:04:57 2003
@@ -2743,7 +2743,7 @@
             unsigned long rebroadcast_time, warning_time;

             rebroadcast_time = warning_time = jiffies;
-            while (atomic_read(&dev->refcnt) != 0) {
+            while (atomic_read(&dev->refcnt) > 1) {
                         if (time_after(jiffies, rebroadcast_time + 1 *
HZ)) {
                                     rtnl_shlock();
                                     rtnl_exlock();
@@ -2838,6 +2838,7 @@
                                     dev->reg_state = NETREG_UNREGISTERED;

                                     netdev_wait_allrefs(dev);
+                                    dev_put(dev);

                                     /* paranoia */
                                     BUG_ON(atomic_read(&dev->refcnt));
@@ -2974,7 +2975,6 @@
             /* Finish processing unregister after unlock */
             net_set_todo(dev);

-            dev_put(dev);
             return 0;
 }

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] panic during unregister_netdevice()
@ 2003-11-06  1:42 Krishna Kumar
  0 siblings, 0 replies; 14+ messages in thread
From: Krishna Kumar @ 2003-11-06  1:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, krkumar, netdev





Nope, I still get the panic with the change you suggested. We need to
understand this better though we seem to
be on the right track. I will try to get the stack now (couldn't get this
time since I was on the X display, my mouse
still works but not the keyboard).

Thanks,

- KK



|---------+---------------------------->
|         |           Stephen Hemminger|
|         |           <shemminger@osdl.|
|         |           org>             |
|         |           Sent by:         |
|         |           netdev-bounce@oss|
|         |           .sgi.com         |
|         |                            |
|         |                            |
|         |           11/05/2003 05:33 |
|         |           PM               |
|         |                            |
|---------+---------------------------->
  >-----------------------------------------------------------------------------------------------------------------|
  |                                                                                                                 |
  |       To:       Krishna Kumar/Beaverton/IBM@IBMUS                                                               |
  |       cc:       davem@redhat.com, krkumar@us.ltcfwd.linux.ibm.com, netdev@oss.sgi.com                           |
  |       Subject:  Re: [PATCH] panic during unregister_netdevice()                                                 |
  |                                                                                                                 |
  >-----------------------------------------------------------------------------------------------------------------|




On Wed, 5 Nov 2003 17:20:06 -0800
Krishna Kumar <kumarkr@us.ibm.com> wrote:

>
>
>
>
> So how will this guarantee that the dev is valid after the dev_put() long
> enough to do the
> BUG_ON() and dev->destructor code ? Won't the same panic happen ?
>

Because the code there should be able to depend on having the last
reference.
No other code should be able to find the dev to get a new reference to it,
since it is no longer in the dev_list.  Code that does dev_hold's
without already having a reference is just not playing fair.


> Any idea how the dev gets freed up ? I was still in the old 2.4 kernel
mode
> thinking that
> dev_put does it, but it seems to be done by using the class/object stuff
in
> free_netdev().
>
> Won't a driver doing a unregister followed by a free_netdev still panic
the
> system if we
> reference the dev at a later stage (even with the put at this place) ?
>
> thanks,
>
> - KK
>

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH] panic during unregister_netdevice()
@ 2003-11-06 19:58 Krishna Kumar
  2003-11-06 19:59 ` David S. Miller
  0 siblings, 1 reply; 14+ messages in thread
From: Krishna Kumar @ 2003-11-06 19:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, krkumar, netdev





Actually, even the original code looks good, and if that panic'd, the
following won't change
that (verified it also).

When unregister_netdev() is called by the driver, it first calls
unregister_netdevice() which
drops it's last ref to the dev, making it zero. unregister_netdev() then
calls rtnl_unlock() which
calls netdev_run_todo(), which calls netdev_wait_allrefs() and only after
that succeeds,
does the driver do a free_netdev(). So the dev should not be freed while
the wait_ref() is
executing, and the original code looks correct.

I don't know if it is some corruption on my system, some hardware problem ?
I will look
some more, also try to get a different machine.

- KK




|---------+---------------------------->
|         |           Stephen Hemminger|
|         |           <shemminger@osdl.|
|         |           org>             |
|         |           Sent by:         |
|         |           netdev-bounce@oss|
|         |           .sgi.com         |
|         |                            |
|         |                            |
|         |           11/05/2003 05:09 |
|         |           PM               |
|         |                            |
|---------+---------------------------->
  >-----------------------------------------------------------------------------------------------------------------|
  |                                                                                                                 |
  |       To:       Krishna Kumar/Beaverton/IBM@IBMUS                                                               |
  |       cc:       davem@redhat.com, krkumar@us.ltcfwd.linux.ibm.com, netdev@oss.sgi.com                           |
  |       Subject:  Re: [PATCH] panic during unregister_netdevice()                                                 |
  |                                                                                                                 |
  >-----------------------------------------------------------------------------------------------------------------|




Try this. Instead of dropping the last reference in unregister, it does it
after all other references are gone (sort of like the old 2.4 code).

diff -Nru a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c           Wed Nov  5 17:04:57 2003
+++ b/net/core/dev.c           Wed Nov  5 17:04:57 2003
@@ -2743,7 +2743,7 @@
             unsigned long rebroadcast_time, warning_time;

             rebroadcast_time = warning_time = jiffies;
-            while (atomic_read(&dev->refcnt) != 0) {
+            while (atomic_read(&dev->refcnt) > 1) {
                         if (time_after(jiffies, rebroadcast_time + 1 *
HZ)) {
                                     rtnl_shlock();
                                     rtnl_exlock();
@@ -2838,6 +2838,7 @@
                                     dev->reg_state = NETREG_UNREGISTERED;

                                     netdev_wait_allrefs(dev);
+                                    dev_put(dev);

                                     /* paranoia */
                                     BUG_ON(atomic_read(&dev->refcnt));
@@ -2974,7 +2975,6 @@
             /* Finish processing unregister after unlock */
             net_set_todo(dev);

-            dev_put(dev);
             return 0;
 }

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2003-11-06 21:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-11-06  0:00 [PATCH] panic during unregister_netdevice() Krishna Kumar
2003-11-06  0:30 ` Stephen Hemminger
2003-11-06  0:33   ` David S. Miller
  -- strict thread matches above, loose matches on Subject: below --
2003-11-06  0:26 Krishna Kumar
2003-11-06  0:42 Krishna Kumar
2003-11-06  1:09 ` Stephen Hemminger
2003-11-06  1:16 Krishna Kumar
2003-11-06  1:20 Krishna Kumar
2003-11-06  1:33 ` Stephen Hemminger
2003-11-06  1:42 Krishna Kumar
2003-11-06 19:58 Krishna Kumar
2003-11-06 19:59 ` David S. Miller
2003-11-06 21:07   ` Krishna Kumar
2003-11-06 21:14     ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).