From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jay Vosburgh Subject: Re: [BUG net-next-2.6] vlan, bonding, bnx2 problems Date: Tue, 20 Jul 2010 15:58:18 -0700 Message-ID: <1264.1279666698@death> References: <1278015554.2782.11.camel@edumazet-laptop> <957a5becb6e742b6dc3255b68bef3ba8@dondevamos.com> <20100718.153910.67919508.davem@davemloft.net> <1279545854.2553.37.camel@edumazet-laptop> <1279563291.20559.10.camel@HP1> <25515.1279570766@death> Cc: "Michael Chan" , "Eric Dumazet" , "David Miller" , "pedro.netdev@dondevamos.com" , "netdev@vger.kernel.org" , "kaber@trash.net" , "bhutchings@solarflare.com" To: unlisted-recipients:; (no To-header on input) Return-path: Received: from e6.ny.us.ibm.com ([32.97.182.146]:46880 "EHLO e6.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932991Ab0GTW6X (ORCPT ); Tue, 20 Jul 2010 18:58:23 -0400 Received: from d01relay05.pok.ibm.com (d01relay05.pok.ibm.com [9.56.227.237]) by e6.ny.us.ibm.com (8.14.4/8.13.1) with ESMTP id o6KMvShT017072 for ; Tue, 20 Jul 2010 18:57:28 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay05.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o6KMwLP7124050 for ; Tue, 20 Jul 2010 18:58:21 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id o6KMwK2D002469 for ; Tue, 20 Jul 2010 19:58:21 -0300 In-reply-to: <25515.1279570766@death> Sender: netdev-owner@vger.kernel.org List-ID: Jay Vosburgh wrote: >Michael Chan wrote: > >>Adding Jay to CC. >> >>On Mon, 2010-07-19 at 06:24 -0700, Eric Dumazet wrote: >>> [ 32.046479] BUG: scheduling while atomic: ifenslave/4586/0x00000100 >>> [ 32.046540] Modules linked in: ipmi_si ipmi_msghandler hpilo >>> bonding ipv6 >>> [ 32.046784] Pid: 4586, comm: ifenslave Tainted: G W >>> 2.6.35-rc1-01453-g3e12451-dirty #836 >>> [ 32.046860] Call Trace: >>> [ 32.046910] [] ? printk+0x18/0x1c >>> [ 32.046965] [] __schedule_bug+0x59/0x60 >>> [ 32.047019] [] schedule+0x57c/0x850 >>> [ 32.047074] [] ? lock_timer_base+0x26/0x50 >>> [ 32.047128] [] schedule_timeout+0x118/0x250 >>> [ 32.047183] [] ? process_timeout+0x0/0x10 >>> [ 32.047238] [] schedule_timeout_uninterruptible >>> +0x15/0x20 >>> [ 32.047295] [] msleep+0x15/0x20 >>> [ 32.047350] [] bnx2_napi_disable+0x52/0x80 >>> [ 32.047405] [] bnx2_netif_stop+0x3f/0xa0 >>> [ 32.047460] [] bnx2_vlan_rx_register+0x5a/0x80 >>> [ 32.047516] [] bond_enslave+0x526/0xa90 [bonding] >>> [ 32.047576] [] ? fib6_clean_node+0x0/0xb0 [ipv6] >>> [ 32.047634] [] ? fib6_age+0x0/0x90 [ipv6] >>> [ 32.047689] [] ? netdev_set_master+0x3/0xc0 >>> [ 32.047746] [] bond_do_ioctl+0x31b/0x430 [bonding] >>> [ 32.047804] [] ? raw_notifier_call_chain+0x1a/0x20 >>> [ 32.047861] [] ? __rtnl_unlock+0xd/0x10 >>> [ 32.047915] [] ? __dev_get_by_name+0x7d/0xa0 >>> [ 32.047970] [] dev_ifsioc+0xf0/0x290 >>> [ 32.048025] [] ? bond_do_ioctl+0x0/0x430 [bonding] >>> [ 32.048081] [] dev_ioctl+0x191/0x610 >>> [ 32.048136] [] ? udp_ioctl+0x0/0x70 >>> [ 32.048189] [] sock_ioctl+0x6c/0x240 >>> [ 32.048243] [] vfs_ioctl+0x34/0xa0 >>> [ 32.048297] [] ? alloc_file+0x1b/0xa0 >>> [ 32.048351] [] ? sock_ioctl+0x0/0x240 >>> [ 32.048404] [] do_vfs_ioctl+0x66/0x550 >>> [ 32.048459] [] ? do_page_fault+0x0/0x350 >>> [ 32.048513] [] ? do_page_fault+0x1a1/0x350 >>> [ 32.048568] [] ? sys_socket+0x5c/0x70 >>> [ 32.048622] [] ? sys_socketcall+0x60/0x270 >>> [ 32.048677] [] sys_ioctl+0x39/0x60 >>> [ 32.048730] [] sysenter_do_call+0x12/0x26 >>> [ 32.052025] bonding: bond0: enslaving eth1 as a backup interface >>> with a down link. >>> [ 32.100207] tg3 0000:14:04.0: PME# enabled >>> [ 32.100222] pci0000:00: wake-up capability enabled by ACPI >>> [ 32.224488] pci0000:00: wake-up capability disabled by ACPI >>> [ 32.224492] tg3 0000:14:04.0: PME# disabled >>> [ 32.348516] tg3 0000:14:04.0: BAR 0: set to [mem >>> 0xfdff0000-0xfdffffff 64bit] (PCI address [0xfdff0000-0xfdffffff] >>> [ 32.348524] tg3 0000:14:04.0: BAR 2: set to [mem >>> 0xfdfe0000-0xfdfeffff 64bit] (PCI address [0xfdfe0000-0xfdfeffff] >>> [ 32.363711] bonding: bond0: enslaving eth2 as a backup interface >>> with a down link. >>> >>> >>> >>> For bnx2, it seems commit 212f9934afccf9c9739921 >>> was not sufficient to correct the "scheduling while atomic" bug... >>> enslaving a bnx2 on a bond device with one vlan already set : >>> bond_enslave -> bnx2_vlan_rx_register -> bnx2_netif_stop -> >>> bnx2_napi_disable -> msleep() >>> >> >>There are a number of drivers that call napi_disable() during >>->ndo_vlan_rx_regsiter(). bnx2 is lockless in the rx path and so we >>need to disable NAPI rx processing and wait for it to be done before >>modifying the vlgrp. >> >>Jay, is there an alternative to holding the bond->lock when calling the >>slave's ->ndo_vlan_rx_register()? > > I believe so. The lock is held here nominally to mutex >bonding's vlan_list. The bond_add_vlans_on_slave function actually does >the lock and call to ndo_vlan_rx_register (plus one add_vid call per >configured VLAN); I think the call frame in the above stack has been >optimized out. > > For the specific cases of bond_add_vlans_on_slave and >bond_del_vlans_from_slave, we should be able to get away without holding >the bond->lock because we also hold RTNL, and it looks like all changes >to the vlan_list are implicitly mutexed by RTNL because all VLAN add / >remove for device or vid end up being done under RTNL. > > The cases within bonding that change the vlan_list will still >have to hold bond->lock, because other call sites within bonding check >the vlan_list without RTNL (and it would be impractical to have them do >so). > > The patch is as follows; I'm compiling this now to test. If it >pans out, I'll post a formal submission in a bit. Just an update; the "VLAN 0" patch: commit ad1afb00393915a51c21b1ae8704562bf036855f Author: Pedro Garcia Date: Sun Jul 18 15:38:44 2010 -0700 vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet) has broken a bunch of VLAN-related things in bonding (more than just the ipv6 event thing that was already fixed). Now, 8021q will do an "add_vid" for VLAN 0 without doing a vlan_rx_register and supplying a struct vlan_group; this confuses the existing bonding code, which assumes that register comes first. I'm working out the best way to fix the VLAN breakage before I can test the below patch (which may have to change). -J >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 8228088..decddf5 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -565,10 +565,8 @@ static void bond_add_vlans_on_slave(struct bonding *bond, struct net_device *sla > struct vlan_entry *vlan; > const struct net_device_ops *slave_ops = slave_dev->netdev_ops; > >- write_lock_bh(&bond->lock); >- > if (list_empty(&bond->vlan_list)) >- goto out; >+ return; > > if ((slave_dev->features & NETIF_F_HW_VLAN_RX) && > slave_ops->ndo_vlan_rx_register) >@@ -576,13 +574,10 @@ static void bond_add_vlans_on_slave(struct bonding *bond, struct net_device *sla > > if (!(slave_dev->features & NETIF_F_HW_VLAN_FILTER) || > !(slave_ops->ndo_vlan_rx_add_vid)) >- goto out; >+ return; > > list_for_each_entry(vlan, &bond->vlan_list, vlan_list) > slave_ops->ndo_vlan_rx_add_vid(slave_dev, vlan->vlan_id); >- >-out: >- write_unlock_bh(&bond->lock); > } > > static void bond_del_vlans_from_slave(struct bonding *bond, >@@ -592,10 +587,8 @@ static void bond_del_vlans_from_slave(struct bonding *bond, > struct vlan_entry *vlan; > struct net_device *vlan_dev; > >- write_lock_bh(&bond->lock); >- > if (list_empty(&bond->vlan_list)) >- goto out; >+ return; > > if (!(slave_dev->features & NETIF_F_HW_VLAN_FILTER) || > !(slave_ops->ndo_vlan_rx_kill_vid)) >@@ -614,9 +607,6 @@ unreg: > if ((slave_dev->features & NETIF_F_HW_VLAN_RX) && > slave_ops->ndo_vlan_rx_register) > slave_ops->ndo_vlan_rx_register(slave_dev, NULL); >- >-out: >- write_unlock_bh(&bond->lock); > } > > /*------------------------------- Link status -------------------------------*/ --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com