netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jay Vosburgh <fubar@us.ibm.com>
To: "Michael Chan" <mchan@broadcom.com>
Cc: "Eric Dumazet" <eric.dumazet@gmail.com>,
	"David Miller" <davem@davemloft.net>,
	"pedro.netdev@dondevamos.com" <pedro.netdev@dondevamos.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kaber@trash.net" <kaber@trash.net>,
	"bhutchings@solarflare.com" <bhutchings@solarflare.com>
Subject: Re: [BUG net-next-2.6] vlan, bonding, bnx2 problems
Date: Mon, 19 Jul 2010 13:19:26 -0700	[thread overview]
Message-ID: <25515.1279570766@death> (raw)
In-Reply-To: <1279563291.20559.10.camel@HP1>

Michael Chan <mchan@broadcom.com> wrote:

>Adding Jay to CC.
>
>On Mon, 2010-07-19 at 06:24 -0700, Eric Dumazet wrote:
>> [   32.046479] BUG: scheduling while atomic: ifenslave/4586/0x00000100
>> [   32.046540] Modules linked in: ipmi_si ipmi_msghandler hpilo
>> bonding ipv6
>> [   32.046784] Pid: 4586, comm: ifenslave Tainted: G        W
>> 2.6.35-rc1-01453-g3e12451-dirty #836
>> [   32.046860] Call Trace:
>> [   32.046910]  [<c13421c4>] ? printk+0x18/0x1c
>> [   32.046965]  [<c10315c9>] __schedule_bug+0x59/0x60
>> [   32.047019]  [<c1342a2c>] schedule+0x57c/0x850
>> [   32.047074]  [<c104a106>] ? lock_timer_base+0x26/0x50
>> [   32.047128]  [<c1342f78>] schedule_timeout+0x118/0x250
>> [   32.047183]  [<c104a2c0>] ? process_timeout+0x0/0x10
>> [   32.047238]  [<c13430c5>] schedule_timeout_uninterruptible
>> +0x15/0x20
>> [   32.047295]  [<c104a345>] msleep+0x15/0x20
>> [   32.047350]  [<c1227082>] bnx2_napi_disable+0x52/0x80
>> [   32.047405]  [<c122b56f>] bnx2_netif_stop+0x3f/0xa0
>> [   32.047460]  [<c122b62a>] bnx2_vlan_rx_register+0x5a/0x80
>> [   32.047516]  [<f8ced776>] bond_enslave+0x526/0xa90 [bonding]
>> [   32.047576]  [<f8b8f0d0>] ? fib6_clean_node+0x0/0xb0 [ipv6]
>> [   32.047634]  [<f8b8dda0>] ? fib6_age+0x0/0x90 [ipv6]
>> [   32.047689]  [<c129d2d3>] ? netdev_set_master+0x3/0xc0
>> [   32.047746]  [<f8cee4cb>] bond_do_ioctl+0x31b/0x430 [bonding]
>> [   32.047804]  [<c105b19a>] ? raw_notifier_call_chain+0x1a/0x20
>> [   32.047861]  [<c12abd5d>] ? __rtnl_unlock+0xd/0x10
>> [   32.047915]  [<c129f8cd>] ? __dev_get_by_name+0x7d/0xa0
>> [   32.047970]  [<c12a19b0>] dev_ifsioc+0xf0/0x290
>> [   32.048025]  [<f8cee1b0>] ? bond_do_ioctl+0x0/0x430 [bonding]
>> [   32.048081]  [<c12a1ce1>] dev_ioctl+0x191/0x610
>> [   32.048136]  [<c12eeb20>] ? udp_ioctl+0x0/0x70
>> [   32.048189]  [<c128f67c>] sock_ioctl+0x6c/0x240
>> [   32.048243]  [<c10d3a44>] vfs_ioctl+0x34/0xa0
>> [   32.048297]  [<c10c7cab>] ? alloc_file+0x1b/0xa0
>> [   32.048351]  [<c128f610>] ? sock_ioctl+0x0/0x240
>> [   32.048404]  [<c10d4186>] do_vfs_ioctl+0x66/0x550
>> [   32.048459]  [<c1022ca0>] ? do_page_fault+0x0/0x350
>> [   32.048513]  [<c1022e41>] ? do_page_fault+0x1a1/0x350
>> [   32.048568]  [<c129098c>] ? sys_socket+0x5c/0x70
>> [   32.048622]  [<c1291860>] ? sys_socketcall+0x60/0x270
>> [   32.048677]  [<c10d46a9>] sys_ioctl+0x39/0x60
>> [   32.048730]  [<c1002bd0>] sysenter_do_call+0x12/0x26
>> [   32.052025] bonding: bond0: enslaving eth1 as a backup interface
>> with a down link.
>> [   32.100207] tg3 0000:14:04.0: PME# enabled
>> [   32.100222]  pci0000:00: wake-up capability enabled by ACPI
>> [   32.224488]  pci0000:00: wake-up capability disabled by ACPI
>> [   32.224492] tg3 0000:14:04.0: PME# disabled
>> [   32.348516] tg3 0000:14:04.0: BAR 0: set to [mem
>> 0xfdff0000-0xfdffffff 64bit] (PCI address [0xfdff0000-0xfdffffff]
>> [   32.348524] tg3 0000:14:04.0: BAR 2: set to [mem
>> 0xfdfe0000-0xfdfeffff 64bit] (PCI address [0xfdfe0000-0xfdfeffff]
>> [   32.363711] bonding: bond0: enslaving eth2 as a backup interface
>> with a down link.
>> 
>> 
>> 
>> For bnx2, it seems commit 212f9934afccf9c9739921
>> was not sufficient to correct the "scheduling while atomic" bug...
>> enslaving a bnx2 on a bond device with one vlan already set :
>>  bond_enslave -> bnx2_vlan_rx_register -> bnx2_netif_stop ->
>> bnx2_napi_disable -> msleep()
>> 
>
>There are a number of drivers that call napi_disable() during
>->ndo_vlan_rx_regsiter().  bnx2 is lockless in the rx path and so we
>need to disable NAPI rx processing and wait for it to be done before
>modifying the vlgrp.
>
>Jay, is there an alternative to holding the bond->lock when calling the
>slave's ->ndo_vlan_rx_register()?

	I believe so.  The lock is held here nominally to mutex
bonding's vlan_list.  The bond_add_vlans_on_slave function actually does
the lock and call to ndo_vlan_rx_register (plus one add_vid call per
configured VLAN); I think the call frame in the above stack has been
optimized out.

	For the specific cases of bond_add_vlans_on_slave and
bond_del_vlans_from_slave, we should be able to get away without holding
the bond->lock because we also hold RTNL, and it looks like all changes
to the vlan_list are implicitly mutexed by RTNL because all VLAN add /
remove for device or vid end up being done under RTNL.

	The cases within bonding that change the vlan_list will still
have to hold bond->lock, because other call sites within bonding check
the vlan_list without RTNL (and it would be impractical to have them do
so).

	The patch is as follows; I'm compiling this now to test.  If it
pans out, I'll post a formal submission in a bit.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8228088..decddf5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -565,10 +565,8 @@ static void bond_add_vlans_on_slave(struct bonding *bond, struct net_device *sla
 	struct vlan_entry *vlan;
 	const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
 
-	write_lock_bh(&bond->lock);
-
 	if (list_empty(&bond->vlan_list))
-		goto out;
+		return;
 
 	if ((slave_dev->features & NETIF_F_HW_VLAN_RX) &&
 	    slave_ops->ndo_vlan_rx_register)
@@ -576,13 +574,10 @@ static void bond_add_vlans_on_slave(struct bonding *bond, struct net_device *sla
 
 	if (!(slave_dev->features & NETIF_F_HW_VLAN_FILTER) ||
 	    !(slave_ops->ndo_vlan_rx_add_vid))
-		goto out;
+		return;
 
 	list_for_each_entry(vlan, &bond->vlan_list, vlan_list)
 		slave_ops->ndo_vlan_rx_add_vid(slave_dev, vlan->vlan_id);
-
-out:
-	write_unlock_bh(&bond->lock);
 }
 
 static void bond_del_vlans_from_slave(struct bonding *bond,
@@ -592,10 +587,8 @@ static void bond_del_vlans_from_slave(struct bonding *bond,
 	struct vlan_entry *vlan;
 	struct net_device *vlan_dev;
 
-	write_lock_bh(&bond->lock);
-
 	if (list_empty(&bond->vlan_list))
-		goto out;
+		return;
 
 	if (!(slave_dev->features & NETIF_F_HW_VLAN_FILTER) ||
 	    !(slave_ops->ndo_vlan_rx_kill_vid))
@@ -614,9 +607,6 @@ unreg:
 	if ((slave_dev->features & NETIF_F_HW_VLAN_RX) &&
 	    slave_ops->ndo_vlan_rx_register)
 		slave_ops->ndo_vlan_rx_register(slave_dev, NULL);
-
-out:
-	write_unlock_bh(&bond->lock);
 }
 
 /*------------------------------- Link status -------------------------------*/

	
	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

  reply	other threads:[~2010-07-19 20:19 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-13 19:20 [PATCH] vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet) Pedro Garcia
2010-06-13 21:56 ` Ben Hutchings
2010-06-14 16:49   ` Pedro Garcia
2010-06-14 17:02     ` Ben Hutchings
2010-06-14 17:11       ` Patrick McHardy
2010-06-14 19:12         ` Eric Dumazet
2010-06-16  8:49           ` Pedro Garcia
2010-06-16  9:08             ` Eric Dumazet
2010-06-16 11:42               ` Patrick McHardy
2010-06-16 13:28                 ` Pedro Garcia
2010-06-16 14:24                   ` Arnd Bergmann
2010-06-16 15:28                     ` Patrick McHardy
2010-06-16 18:26                       ` Arnd Bergmann
2010-06-16 18:58                         ` Eric Dumazet
2010-06-17  8:56                           ` Vladislav Zolotarov
2010-06-17 10:28                             ` Eric Dumazet
2010-06-17 14:08                               ` Vladislav Zolotarov
2010-06-16 14:24                   ` Eric Dumazet
2010-06-27 23:21               ` Pedro Garcia
2010-06-30 20:16                 ` David Miller
2010-07-01 18:47                   ` Pedro Garcia
2010-07-01 20:19                     ` Eric Dumazet
2010-07-18 16:43                       ` Pedro Garcia
2010-07-18 22:39                         ` David Miller
2010-07-19 13:24                           ` [BUG net-next-2.6] vlan, bonding, bnx2 problems Eric Dumazet
2010-07-19 16:35                             ` David Miller
2010-07-19 18:14                             ` Michael Chan
2010-07-19 20:19                               ` Jay Vosburgh [this message]
2010-07-20 22:58                                 ` Jay Vosburgh
2010-06-24 18:28             ` [PATCH] vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet) Pedro Garcia Pelaez
2010-07-08 12:54             ` Vladislav Zolotarov
2010-07-08 12:58               ` Vladislav Zolotarov
2010-07-08 13:51             ` Vladislav Zolotarov
2010-06-14 19:42         ` Joe Perches
2010-06-14 20:03           ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=25515.1279570766@death \
    --to=fubar@us.ibm.com \
    --cc=bhutchings@solarflare.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=kaber@trash.net \
    --cc=mchan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=pedro.netdev@dondevamos.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).