* Fw: oops during unregister_netdevice interface enslaved to bond - regression
@ 2011-05-10 8:17 Einar EL Lueck
2011-05-10 8:54 ` Eric Dumazet
2011-05-10 19:25 ` David Miller
0 siblings, 2 replies; 9+ messages in thread
From: Einar EL Lueck @ 2011-05-10 8:17 UTC (permalink / raw)
To: davem; +Cc: netdev, Frank Blaschka
Hi Dave,
Einar EL Lueck/Germany/IBM wrote on 04/29/2011 04:45:45 PM:
> From: Einar EL Lueck/Germany/IBM
> To: opurdila@ixiacom.com, netdev@vger.kernel.org,
>     linux-s390@vger.kernel.org, davem@davemloft.net
> Cc: Frank Blaschka/Germany/IBM@IBMDE
> Date: 04/29/2011 04:45 PM
> Subject: Re: oops during unregister_netdevice interface enslaved to bond - regression
>
> Hi Octavian,
>
> On 04/15/2011 10:53 AM, Frank Blaschka wrote:
> > Hi Octavian,
> >
> > your commit 443457242beb6716b43db4d62fe148eab5515505 introduced this
> > regression. I have reviewed the net device unregister code but did not
> > understand it very well. I have seen the problem only in combination
> > with bonding. Can you give me some help on how to proceed with this
> > problem? I can reproduce it very easily on a single-CPU machine.
> >
>
> In this case rollback_registered_many iterates over the list of devs
> that initially has just one device in it. In a loop it calls
> call_netdevice_notifiers(NETDEV_UNREGISTER, dev) which triggers the
> bonding driver to call dev_close_many for the same device. That call
> to dev_close_many leads to the addition of the same device to the
> list over which rollback_registered_many is iterating. Consequently,
> netdev_unregister_kobject(dev) is called twice for the same device.
> Frank captured the result in his mail.
>
Calls to the *_many functions introduced by Octavian must never interleave,
because the traversed lists modify each other. This was the root cause of
the symptom that Frank discovered. Octavian's address is no longer a valid
mail recipient, and he has not responded from any new address. I suggest
reverting the commit.
Regards,
Einar.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: oops during unregister_netdevice interface enslaved to bond - regression
2011-05-10 8:17 Fw: oops during unregister_netdevice interface enslaved to bond - regression Einar EL Lueck
@ 2011-05-10 8:54 ` Eric Dumazet
2011-05-10 8:59 ` Eric Dumazet
2011-05-10 13:14 ` Frank Blaschka
2011-05-10 19:25 ` David Miller
1 sibling, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2011-05-10 8:54 UTC (permalink / raw)
To: Einar EL Lueck; +Cc: davem, netdev, Frank Blaschka
On Tuesday, 10 May 2011 at 10:17 +0200, Einar EL Lueck wrote:
> Hi Dave,
>
> Einar EL Lueck/Germany/IBM wrote on 04/29/2011 04:45:45 PM:
>
> > From: Einar EL Lueck/Germany/IBM
> > To: opurdila@ixiacom.com, netdev@vger.kernel.org,
> >     linux-s390@vger.kernel.org, davem@davemloft.net
> > Cc: Frank Blaschka/Germany/IBM@IBMDE
> > Date: 04/29/2011 04:45 PM
> > Subject: Re: oops during unregister_netdevice interface enslaved to bond - regression
> >
> > Hi Octavian,
> >
> > On 04/15/2011 10:53 AM, Frank Blaschka wrote:
> > > Hi Octavian,
> > >
> > > your commit 443457242beb6716b43db4d62fe148eab5515505 introduced this
> > > regression. I have reviewed the net device unregister code but did not
> > > understand it very well. I have seen the problem only in combination
> > > with bonding. Can you give me some help on how to proceed with this
> > > problem? I can reproduce it very easily on a single-CPU machine.
> > >
> >
> > In this case rollback_registered_many iterates over the list of devs
> > that initially has just one device in it. In a loop it calls
> > call_netdevice_notifiers(NETDEV_UNREGISTER, dev) which triggers the
> > bonding driver to call dev_close_many for the same device. That call
> > to dev_close_many leads to the addition of the same device to the
> > list over which rollback_registered_many is iterating. Consequently,
> > netdev_unregister_kobject(dev) is called twice for the same device.
> > Frank captured the result in his mail.
> >
>
> Calls to the *_many functions introduced by Octavian must never interleave,
> because the traversed lists modify each other. This was the root cause of
> the symptom that Frank discovered. Octavian's address is no longer a valid
> mail recipient, and he has not responded from any new address. I suggest
> reverting the commit.
>
Hello Einar
I am currently working on this stuff [adding even more batching and
probably bugs as well], so instead of a revert I'll try to find a way to
fix this.
If you already have a script to reproduce the bug on virtual devices on
x86 (not on s390 machines, which I don't have ;) ), I'd appreciate having a
copy of it.
Thanks for the reminder.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: oops during unregister_netdevice interface enslaved to bond - regression
2011-05-10 8:54 ` Eric Dumazet
@ 2011-05-10 8:59 ` Eric Dumazet
2011-05-10 13:03 ` Eric Dumazet
2011-05-10 13:14 ` Frank Blaschka
1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2011-05-10 8:59 UTC (permalink / raw)
To: Einar EL Lueck; +Cc: davem, netdev, Frank Blaschka
On Tuesday, 10 May 2011 at 10:54 +0200, Eric Dumazet wrote:
> I am currently working on this stuff [adding even more batching and
> probably bugs as well ], so instead of revert I'll try to find a way to
> fix this.
>
> If you already have a script to reproduce the bug on virtual devices on
> x86 (not on s390 machines I dont have ;) ), I'll appreciate having a
> copy of it.
>
> Thanks for the reminder.
BTW, make sure the latest linux-2.6 still exhibits the problem; we fixed
some things after Octavian's original commit.
List of commits:
commit ceaaec98ad99859ac90ac6863ad0a6cd075d8e0e
net: deinit automatic LIST_HEAD
commit f87e6f47933e3ebeced9bb12615e830a72cedce4
net: dont leave active on stack LIST_HEAD
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: oops during unregister_netdevice interface enslaved to bond - regression
2011-05-10 8:59 ` Eric Dumazet
@ 2011-05-10 13:03 ` Eric Dumazet
0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2011-05-10 13:03 UTC (permalink / raw)
To: Einar EL Lueck; +Cc: davem, netdev, Frank Blaschka
On Tuesday, 10 May 2011 at 10:59 +0200, Eric Dumazet wrote:
> On Tuesday, 10 May 2011 at 10:54 +0200, Eric Dumazet wrote:
>
> > I am currently working on this stuff [adding even more batching and
> > probably bugs as well ], so instead of revert I'll try to find a way to
> > fix this.
> >
> > If you already have a script to reproduce the bug on virtual devices on
> > x86 (not on s390 machines I dont have ;) ), I'll appreciate having a
> > copy of it.
> >
> > Thanks for the reminder.
>
> BTW make sure latest linux-2.6 still exhibits the problem, we fixed some
> things after original Octavian commit
>
> List of commits :
>
> commit ceaaec98ad99859ac90ac6863ad0a6cd075d8e0e
> net: deinit automatic LIST_HEAD
>
> commit f87e6f47933e3ebeced9bb12615e830a72cedce4
> net: dont leave active on stack LIST_HEAD
>
>
OK, I can trigger the bug on linux-2.6 with:
modprobe bonding
ip link add testa type veth peer name testb
ifconfig bond0 up
ifenslave bond0 testa
ip link del testa
I'll cook a patch, stay tuned :)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: oops during unregister_netdevice interface enslaved to bond - regression
2011-05-10 8:54 ` Eric Dumazet
2011-05-10 8:59 ` Eric Dumazet
@ 2011-05-10 13:14 ` Frank Blaschka
2011-05-10 13:36 ` Eric Dumazet
1 sibling, 1 reply; 9+ messages in thread
From: Frank Blaschka @ 2011-05-10 13:14 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, netdev, ELELUECK
On Tue, May 10, 2011 at 10:54:32AM +0200, Eric Dumazet wrote:
> On Tuesday, 10 May 2011 at 10:17 +0200, Einar EL Lueck wrote:
> > Hi Dave,
> >
> > Einar EL Lueck/Germany/IBM wrote on 04/29/2011 04:45:45 PM:
> >
> > > From: Einar EL Lueck/Germany/IBM
> > > To: opurdila@ixiacom.com, netdev@vger.kernel.org,
> > >     linux-s390@vger.kernel.org, davem@davemloft.net
> > > Cc: Frank Blaschka/Germany/IBM@IBMDE
> > > Date: 04/29/2011 04:45 PM
> > > Subject: Re: oops during unregister_netdevice interface enslaved to bond - regression
> > >
> > > Hi Octavian,
> > >
> > > On 04/15/2011 10:53 AM, Frank Blaschka wrote:
> > > > Hi Octavian,
> > > >
> > > > your commit 443457242beb6716b43db4d62fe148eab5515505 introduced this
> > > > regression. I have reviewed the net device unregister code but did not
> > > > understand it very well. I have seen the problem only in combination
> > > > with bonding. Can you give me some help on how to proceed with this
> > > > problem? I can reproduce it very easily on a single-CPU machine.
> > > >
> > >
> > > In this case rollback_registered_many iterates over the list of devs
> > > that initially has just one device in it. In a loop it calls
> > > call_netdevice_notifiers(NETDEV_UNREGISTER, dev) which triggers the
> > > bonding driver to call dev_close_many for the same device. That call
> > > to dev_close_many leads to the addition of the same device to the
> > > list over which rollback_registered_many is iterating. Consequently,
> > > netdev_unregister_kobject(dev) is called twice for the same device.
> > > Frank captured the result in his mail.
> > >
> >
> > Calls to the *_many functions introduced by Octavian must never interleave,
> > because the traversed lists modify each other. This was the root cause of
> > the symptom that Frank discovered. Octavian's address is no longer a valid
> > mail recipient, and he has not responded from any new address. I suggest
> > reverting the commit.
> >
>
> Hello Einar
>
> I am currently working on this stuff [adding even more batching and
> probably bugs as well ], so instead of revert I'll try to find a way to
> fix this.
>
great Thx!
> If you already have a script to reproduce the bug on virtual devices on
> x86 (not on s390 machines I dont have ;) ), I'll appreciate having a
> copy of it.
I just checked today's net-next tree; the problem is still there.
I don't have an x86 box, but I was able to reproduce the problem
with the dummy device (on s/390):
# modprobe bonding
# modprobe dummy
# ifconfig bond0 up
# ifenslave bond0 dummy0
# rmmod dummy
The oops looks very much the same as when using a real device. Hope this helps ...
>
> Thanks for the reminder.
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Fw: oops during unregister_netdevice interface enslaved to bond - regression
2011-05-10 13:14 ` Frank Blaschka
@ 2011-05-10 13:36 ` Eric Dumazet
2011-05-10 14:20 ` Frank Blaschka
2011-05-10 19:26 ` David Miller
0 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2011-05-10 13:36 UTC (permalink / raw)
To: Frank Blaschka, David Miller; +Cc: netdev, ELELUECK, Octavian Purdila
On Tuesday, 10 May 2011 at 15:14 +0200, Frank Blaschka wrote:
> I just checked todays net-next tree, problem is still there.
> I don't have an x86 box, but I was able to reproduce the problem
> with the dummy device (on s/390)
>
> # modprobe bonding
> # modprobe dummy
> # ifconfig bond0 up
> # ifenslave bond0 dummy0
> # rmmod dummy
Here is the patch to fix this problem
Thanks again for your help.
[PATCH net-2.6] net: dev_close() should check IFF_UP
Commit 443457242beb (factorize sync-rcu call in
unregister_netdevice_many) mistakenly removed one test from dev_close()
Following actions trigger a BUG :
modprobe bonding
modprobe dummy
ifconfig bond0 up
ifenslave bond0 dummy0
rmmod dummy
dev_close() must not close a non IFF_UP device.
With help from Frank Blaschka and Einar EL Lueck
Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reported-by: Einar EL Lueck <ELELUECK@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Octavian Purdila <opurdila@ixiacom.com>
---
net/core/dev.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 856b6ee..9200944 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1284,11 +1284,13 @@ static int dev_close_many(struct list_head *head)
  */
 int dev_close(struct net_device *dev)
 {
-	LIST_HEAD(single);
+	if (dev->flags & IFF_UP) {
+		LIST_HEAD(single);
 
-	list_add(&dev->unreg_list, &single);
-	dev_close_many(&single);
-	list_del(&single);
+		list_add(&dev->unreg_list, &single);
+		dev_close_many(&single);
+		list_del(&single);
+	}
 	return 0;
 }
 EXPORT_SYMBOL(dev_close);
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: Fw: oops during unregister_netdevice interface enslaved to bond - regression
2011-05-10 13:36 ` Eric Dumazet
@ 2011-05-10 14:20 ` Frank Blaschka
2011-05-10 19:26 ` David Miller
1 sibling, 0 replies; 9+ messages in thread
From: Frank Blaschka @ 2011-05-10 14:20 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, netdev, linux-s390, ELELUECK
On Tue, May 10, 2011 at 03:36:59PM +0200, Eric Dumazet wrote:
> On Tuesday, 10 May 2011 at 15:14 +0200, Frank Blaschka wrote:
>
> > I just checked todays net-next tree, problem is still there.
> > I don't have an x86 box, but I was able to reproduce the problem
> > with the dummy device (on s/390)
> >
> > # modprobe bonding
> > # modprobe dummy
> > # ifconfig bond0 up
> > # ifenslave bond0 dummy0
> > # rmmod dummy
>
> Here is the patch to fix this problem
>
Hi Eric,
Your patch did the trick. With the patch applied I could no longer
reproduce the problem, whether I use a real or a dummy device.
Thanks for your help!
> Thanks again for your help.
>
> [PATCH net-2.6] net: dev_close() should check IFF_UP
>
> Commit 443457242beb (factorize sync-rcu call in
> unregister_netdevice_many) mistakenly removed one test from dev_close()
>
> Following actions trigger a BUG :
>
> modprobe bonding
> modprobe dummy
> ifconfig bond0 up
> ifenslave bond0 dummy0
> rmmod dummy
>
> dev_close() must not close a non IFF_UP device.
>
> With help from Frank Blaschka and Einar EL Lueck
>
> Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
> Reported-by: Einar EL Lueck <ELELUECK@de.ibm.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Octavian Purdila <opurdila@ixiacom.com>
> ---
> net/core/dev.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 856b6ee..9200944 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1284,11 +1284,13 @@ static int dev_close_many(struct list_head *head)
>   */
>  int dev_close(struct net_device *dev)
>  {
> -	LIST_HEAD(single);
> +	if (dev->flags & IFF_UP) {
> +		LIST_HEAD(single);
>
> -	list_add(&dev->unreg_list, &single);
> -	dev_close_many(&single);
> -	list_del(&single);
> +		list_add(&dev->unreg_list, &single);
> +		dev_close_many(&single);
> +		list_del(&single);
> +	}
>  	return 0;
>  }
>  EXPORT_SYMBOL(dev_close);
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: oops during unregister_netdevice interface enslaved to bond - regression
2011-05-10 8:17 Fw: oops during unregister_netdevice interface enslaved to bond - regression Einar EL Lueck
2011-05-10 8:54 ` Eric Dumazet
@ 2011-05-10 19:25 ` David Miller
1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2011-05-10 19:25 UTC (permalink / raw)
To: ELELUECK; +Cc: netdev, Frank.Blaschka
From: Einar EL Lueck <ELELUECK@de.ibm.com>
Date: Tue, 10 May 2011 10:17:09 +0200
> Calls to the *_many functions introduced by Octavian must never interleave,
> because the traversed lists modify each other. This was the root cause of
> the symptom that Frank discovered. Octavian's address is no longer a valid
> mail recipient, and he has not responded from any new address. I suggest
> reverting the commit.
I don't think a pure revert is appropriate in this case; the regression
that it would introduce is almost as serious as the OOPS here.
Someone just needs to work on a fix.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: oops during unregister_netdevice interface enslaved to bond - regression
2011-05-10 13:36 ` Eric Dumazet
2011-05-10 14:20 ` Frank Blaschka
@ 2011-05-10 19:26 ` David Miller
1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2011-05-10 19:26 UTC (permalink / raw)
To: eric.dumazet; +Cc: blaschka, netdev, ELELUECK, opurdila
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 10 May 2011 15:36:59 +0200
> [PATCH net-2.6] net: dev_close() should check IFF_UP
Applied, thanks Eric.
^ permalink raw reply [flat|nested] 9+ messages in thread