Netdev List
* Re: [RFC v2] tcp: Export TCP Delayed ACK parameters to user
From: Daniel Baluta @ 2011-10-28 21:35 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, kuznet, jmorris, yoshfuji, kaber, netdev, luto,
	rick.jones2
In-Reply-To: <20111028.171904.1635229691857703124.davem@davemloft.net>

On Sat, Oct 29, 2011 at 12:19 AM, David Miller <davem@davemloft.net> wrote:
> From: Daniel Baluta <dbaluta@ixiacom.com>
> Date: Sat, 29 Oct 2011 00:14:03 +0300
>
>> +static inline int tcp_delack_thresh(const struct sock *sk)
>> +{
>> +     return inet_csk(sk)->icsk_ack.rcv_mss * sysctl_tcp_delack_segs;
>> +}
>> +
>
> Please turn this into a shift or something, you're adding a multiply
> into a core code path.

Is there any generic API to do this? Default case is not
affected since tcp_delack_segs is 1.

Daniel.

^ permalink raw reply

* Re: [PATCH v4 0/9] Cleanup and extension of netdev features
From: Michał Mirosław @ 2011-10-28 21:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, David S. Miller, Ben Hutchings
In-Reply-To: <1319513754.3834.4.camel@edumazet-laptop>

On Tue, Oct 25, 2011 at 05:35:54AM +0200, Eric Dumazet wrote:
> Le mardi 25 octobre 2011 à 02:36 +0200, Michał Mirosław a écrit :
> > Commit fd38f734 (igbvf: convert to ndo_fix_features) removed last use
> > of old ethtool ops for controlling netdevice's features. This series
> > finishes the cleanup and extends feature pool to 64 bits.
> > 
> > Also, there's additional patch that removes NETIF_F_NO_CSUM as it is
> > now, and has been for some time, equivalent to NETIF_F_HW_CSUM.
> > 
> > To see the new features in action, you need ethtool patched with:
> > 
> > http://patchwork.ozlabs.org/patch/96374/
> > 
> > Not much has changed in those patches compared to last version I posted
> > in June.
> This reminds me that current net-next is busted for bond/vlan; I don't know
> why.
[cut ethtool output]

Sorry for the late reply. Is it also broken for bridge? Those two drivers share
almost the same code that propagates features.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC v2] tcp: Export TCP Delayed ACK parameters to user
From: Andy Lutomirski @ 2011-10-28 21:53 UTC (permalink / raw)
  To: Daniel Baluta
  Cc: davem, eric.dumazet, kuznet, jmorris, yoshfuji, kaber, netdev,
	rick.jones2
In-Reply-To: <1319836443-4419-1-git-send-email-dbaluta@ixiacom.com>

On Fri, Oct 28, 2011 at 2:14 PM, Daniel Baluta <dbaluta@ixiacom.com> wrote:
> RFC 2581 (section 4.2) specifies when an ACK should be generated as follows:
>
> " .. an ACK SHOULD be generated for at least every second
>  full-sized segment, and MUST be generated within 500 ms
>  of the arrival of the first unacknowledged packet.
> "
>
> We export the number of segments and the timeout limits
> specified above, so that users can tune them according
> to their needs.
>
> Specifically:
>        * /proc/sys/net/ipv4/tcp_delack_segs, represents
>        the threshold for the number of segments.
>        * /proc/sys/net/ipv4/tcp_delack_min, specifies
>        the minimum timeout value
>        * /proc/sys/net/ipv4/tcp_delack_max, specifies
>        the maximum timeout value.

This is neat, but IMO it should be per socket -- I (and possibly most
other people who would use it) want to do this kind of tuning per
flow, not per-route or per-interface.
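
For context, Linux already has one per-socket delayed-ACK control, the
TCP_QUICKACK socket option. It only switches quick ACKs on or off for a
single socket (and, per tcp(7), the flag is not permanent); it is not the
per-socket segment count or timeout asked for here. A minimal sketch,
where tcp_set_quickack() is a hypothetical helper name:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (Linux-specific): ask the stack to ACK immediately
 * on this one socket (on != 0) or to return to normal delayed ACKs
 * (on == 0). Returns 0 on success, -1 with errno set on failure.
 */
static int tcp_set_quickack(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}
```

Because the flag is not sticky, an application that relies on it would
typically set it again after reads.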

--Andy

^ permalink raw reply

* Cannot remove GRE interface in wireless-next (3.0.0 based)
From: Ben Greear @ 2011-10-28 21:59 UTC (permalink / raw)
  To: netdev

I'm not sure whether wireless-next is up to date with the rest of the
kernel (judging by the kernel version, it seems not to be), but I
thought I'd report this GRE issue:

[root@lec2010-ath9k-1 lanforge]# ip tunnel del gre0
ioctl: Operation not permitted
[root@lec2010-ath9k-1 lanforge]# ifconfig gre0
gre0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
           NOARP MULTICAST  MTU:1476  Metric:1
           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
           TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:0
           RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

I'll dig into it further if I get a chance, but stuck in the
wireless stack at the moment...

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [RFC v2] tcp: Export TCP Delayed ACK parameters to user
From: David Miller @ 2011-10-28 22:31 UTC (permalink / raw)
  To: dbaluta
  Cc: eric.dumazet, kuznet, jmorris, yoshfuji, kaber, netdev, luto,
	rick.jones2
In-Reply-To: <CAEnQRZCxSPb8NRceB+tfdZnkN=K-jGbdmXDgNwEhZFYAWGmvAg@mail.gmail.com>

From: Daniel Baluta <dbaluta@ixiacom.com>
Date: Sat, 29 Oct 2011 00:35:24 +0300

> On Sat, Oct 29, 2011 at 12:19 AM, David Miller <davem@davemloft.net> wrote:
>> From: Daniel Baluta <dbaluta@ixiacom.com>
>> Date: Sat, 29 Oct 2011 00:14:03 +0300
>>
>>> +static inline int tcp_delack_thresh(const struct sock *sk)
>>> +{
>>> +     return inet_csk(sk)->icsk_ack.rcv_mss * sysctl_tcp_delack_segs;
>>> +}
>>> +
>>
>> Please turn this into a shift or something, you're adding a multiply
>> into a core code path.
> 
> Is there any generic API to do this? Default case is not
> affected since tcp_delack_segs is 1.

I'm saying make the tunable a shift count instead of something to
multiply against.
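
The suggestion can be sketched outside the kernel as follows. This is an
illustration only: delack_thresh_shift() and its delack_shift parameter
are hypothetical names, not part of the posted patch.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for tcp_delack_thresh(): the tunable is a shift
 * count rather than a segment count, so the hot path does a shift and
 * no multiply. A shift of 0 yields one MSS, which corresponds to the
 * default of one segment mentioned in the thread.
 */
static inline uint32_t delack_thresh_shift(uint32_t rcv_mss,
                                           unsigned int delack_shift)
{
    return rcv_mss << delack_shift;
}
```

With this scheme the threshold can only be the MSS times a power of two
(1, 2, 4, 8, ... segments).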

^ permalink raw reply

* Re: [RFC v2] tcp: Export TCP Delayed ACK parameters to user
From: Rick Jones @ 2011-10-28 22:40 UTC (permalink / raw)
  To: David Miller
  Cc: dbaluta, eric.dumazet, kuznet, jmorris, yoshfuji, kaber, netdev,
	luto
In-Reply-To: <20111028.183107.2091450715774357523.davem@davemloft.net>

On 10/28/2011 03:31 PM, David Miller wrote:
> From: Daniel Baluta<dbaluta@ixiacom.com>
> Date: Sat, 29 Oct 2011 00:35:24 +0300
>
>> On Sat, Oct 29, 2011 at 12:19 AM, David Miller<davem@davemloft.net>  wrote:
>>> From: Daniel Baluta<dbaluta@ixiacom.com>
>>> Date: Sat, 29 Oct 2011 00:14:03 +0300
>>>
>>>> +static inline int tcp_delack_thresh(const struct sock *sk)
>>>> +{
>>>> +     return inet_csk(sk)->icsk_ack.rcv_mss * sysctl_tcp_delack_segs;
>>>> +}
>>>> +
>>>
>>> Please turn this into a shift or something, you're adding a multiply
>>> into a core code path.
>>
>> Is there any generic API to do this? Default case is not
>> affected since tcp_delack_segs is 1.
>
> I'm saying make the tunable a shift count instead of something to
> multiply against.

That would be loads faster, but won't that have issues with granularity?
It will allow 1, 2, 4, 8, 16, 32, etc. segments but none of the umpteen
values in between.  FWIW, HP-UX defaults to 22 segments, which IIRC has
its basis in how many "typical" segments could fit in a 32KB window.

If the mss and the delack segs are being converted into an octet count, 
and multiplication or successive addition etc are too expensive, how 
about using an octet count directly?
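
A small sketch of the trade-off, under stated assumptions: the function
names are hypothetical, and the 1460-octet MSS used in the note below is
an assumed example value, not something from the thread.

```c
#include <stdint.h>

/*
 * Largest shift whose power of two does not exceed `segs` (segs >= 1).
 * Capped at 30 so the comparison below never shifts out of range.
 */
static unsigned int shift_for_segs(unsigned int segs)
{
    unsigned int shift = 0;

    while (shift < 30 && (2u << shift) <= segs)
        shift++;
    return shift;
}

/*
 * Octet threshold for an exact segment count. The multiply happens
 * once, when the tunable is written, not on every received packet.
 */
static uint32_t octets_for_segs(uint32_t segs, uint32_t mss)
{
    return segs * mss;
}
```

For a target of 22 segments, a shift count can only reach 16 segments
(shift 4) or 32 segments (shift 5), whereas an octet count with a
1460-octet MSS can be set to exactly 22 * 1460 = 32120 octets.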

rick jones

^ permalink raw reply

* [PATCH 3/5] qlcnic: updated reset sequence
From: Anirban Chakraborty @ 2011-10-28 22:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Sony Chacko
In-Reply-To: <1319842636-14936-1-git-send-email-anirban.chakraborty@qlogic.com>

From: Sony Chacko <sony.chacko@qlogic.com>

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hdr.h  |    2 +
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c |   50 +++++++++++++++++++++-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hdr.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hdr.h
index 92bc8ce..a528193 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hdr.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hdr.h
@@ -407,7 +407,9 @@ enum {
 #define QLCNIC_CRB_SRE		QLCNIC_PCI_CRB_WINDOW(QLCNIC_HW_PX_MAP_CRB_SRE)
 #define QLCNIC_CRB_ROMUSB	\
 	QLCNIC_PCI_CRB_WINDOW(QLCNIC_HW_PX_MAP_CRB_ROMUSB)
+#define QLCNIC_CRB_EPG		QLCNIC_PCI_CRB_WINDOW(QLCNIC_HW_PX_MAP_CRB_EG)
 #define QLCNIC_CRB_I2Q		QLCNIC_PCI_CRB_WINDOW(QLCNIC_HW_PX_MAP_CRB_I2Q)
+#define QLCNIC_CRB_TIMER	QLCNIC_PCI_CRB_WINDOW(QLCNIC_HW_PX_MAP_CRB_TIMR)
 #define QLCNIC_CRB_I2C0 	QLCNIC_PCI_CRB_WINDOW(QLCNIC_HW_PX_MAP_CRB_I2C0)
 #define QLCNIC_CRB_SMB		QLCNIC_PCI_CRB_WINDOW(QLCNIC_HW_PX_MAP_CRB_SMB)
 #define QLCNIC_CRB_MAX		QLCNIC_PCI_CRB_WINDOW(64)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c
index 312c1c3..3866958 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c
@@ -422,9 +422,53 @@ int qlcnic_pinit_from_rom(struct qlcnic_adapter *adapter)
 	QLCWR32(adapter, CRB_CMDPEG_STATE, 0);
 	QLCWR32(adapter, CRB_RCVPEG_STATE, 0);
 
-	qlcnic_rom_lock(adapter);
-	QLCWR32(adapter, QLCNIC_ROMUSB_GLB_SW_RESET, 0xfeffffff);
+	/* Halt all the indiviual PEGs and other blocks */
+	/* disable all I2Q */
+	QLCWR32(adapter, QLCNIC_CRB_I2Q + 0x10, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_I2Q + 0x14, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_I2Q + 0x18, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_I2Q + 0x1c, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_I2Q + 0x20, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_I2Q + 0x24, 0x0);
+
+	/* disable all niu interrupts */
+	QLCWR32(adapter, QLCNIC_CRB_NIU + 0x40, 0xff);
+	/* disable xge rx/tx */
+	QLCWR32(adapter, QLCNIC_CRB_NIU + 0x70000, 0x00);
+	/* disable xg1 rx/tx */
+	QLCWR32(adapter, QLCNIC_CRB_NIU + 0x80000, 0x00);
+	/* disable sideband mac */
+	QLCWR32(adapter, QLCNIC_CRB_NIU + 0x90000, 0x00);
+	/* disable ap0 mac */
+	QLCWR32(adapter, QLCNIC_CRB_NIU + 0xa0000, 0x00);
+	/* disable ap1 mac */
+	QLCWR32(adapter, QLCNIC_CRB_NIU + 0xb0000, 0x00);
+
+	/* halt sre */
+	val = QLCRD32(adapter, QLCNIC_CRB_SRE + 0x1000);
+	QLCWR32(adapter, QLCNIC_CRB_SRE + 0x1000, val & (~(0x1)));
+
+	/* halt epg */
+	QLCWR32(adapter, QLCNIC_CRB_EPG + 0x1300, 0x1);
+
+	/* halt timers */
+	QLCWR32(adapter, QLCNIC_CRB_TIMER + 0x0, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_TIMER + 0x8, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_TIMER + 0x10, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_TIMER + 0x18, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_TIMER + 0x100, 0x0);
+	QLCWR32(adapter, QLCNIC_CRB_TIMER + 0x200, 0x0);
+	/* halt pegs */
+	QLCWR32(adapter, QLCNIC_CRB_PEG_NET_0 + 0x3c, 1);
+	QLCWR32(adapter, QLCNIC_CRB_PEG_NET_1 + 0x3c, 1);
+	QLCWR32(adapter, QLCNIC_CRB_PEG_NET_2 + 0x3c, 1);
+	QLCWR32(adapter, QLCNIC_CRB_PEG_NET_3 + 0x3c, 1);
+	QLCWR32(adapter, QLCNIC_CRB_PEG_NET_4 + 0x3c, 1);
+	msleep(20);
+
 	qlcnic_rom_unlock(adapter);
+	/* big hammer don't reset CAM block on reset */
+	QLCWR32(adapter, QLCNIC_ROMUSB_GLB_SW_RESET, 0xfeffffff);
 
 	/* Init HW CRB block */
 	if (qlcnic_rom_fast_read(adapter, 0, &n) != 0 || (n != 0xcafecafe) ||
@@ -522,8 +566,10 @@ int qlcnic_pinit_from_rom(struct qlcnic_adapter *adapter)
 	QLCWR32(adapter, QLCNIC_CRB_PEG_NET_4 + 0x8, 0);
 	QLCWR32(adapter, QLCNIC_CRB_PEG_NET_4 + 0xc, 0);
 	msleep(1);
+
 	QLCWR32(adapter, QLCNIC_PEG_HALT_STATUS1, 0);
 	QLCWR32(adapter, QLCNIC_PEG_HALT_STATUS2, 0);
+
 	return 0;
 }
 
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 1/5] qlcnic: skip IDC ack check in fw reset path.
From: Anirban Chakraborty @ 2011-10-28 22:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Sritej Velaga

From: Sritej Velaga <sritej.velaga@qlogic.com>

In fw reset path, we should consider any change in device state as an
ack from the other driver. When that happens, we don't have to wait for
an explicit ack.

Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 106503f..2edffce 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -2840,8 +2840,15 @@ qlcnic_fwinit_work(struct work_struct *work)
 		goto wait_npar;
 	}
 
+	if (dev_state == QLCNIC_DEV_INITIALIZING ||
+	    dev_state == QLCNIC_DEV_READY) {
+		dev_info(&adapter->pdev->dev, "Detected state change from "
+				"DEV_NEED_RESET, skipping ack check\n");
+		goto skip_ack_check;
+	}
+
 	if (adapter->fw_wait_cnt++ > adapter->reset_ack_timeo) {
-		dev_err(&adapter->pdev->dev, "Reset:Failed to get ack %d sec\n",
+		dev_info(&adapter->pdev->dev, "Reset:Failed to get ack %d sec\n",
 					adapter->reset_ack_timeo);
 		goto skip_ack_check;
 	}
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 4/5] qlcnic: Updated License file
From: Anirban Chakraborty @ 2011-10-28 22:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Sritej Velaga
In-Reply-To: <1319842636-14936-1-git-send-email-anirban.chakraborty@qlogic.com>

From: Sritej Velaga <sritej.velaga@qlogic.com>

Updated qlcnic's license file.

Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 Documentation/networking/LICENSE.qlcnic |   51 ++++---------------------------
 1 files changed, 6 insertions(+), 45 deletions(-)

diff --git a/Documentation/networking/LICENSE.qlcnic b/Documentation/networking/LICENSE.qlcnic
index 29ad4b1..e7fb2c6 100644
--- a/Documentation/networking/LICENSE.qlcnic
+++ b/Documentation/networking/LICENSE.qlcnic
@@ -1,61 +1,22 @@
-Copyright (c) 2009-2010 QLogic Corporation
+Copyright (c) 2009-2011 QLogic Corporation
 QLogic Linux qlcnic NIC Driver
 
-This program includes a device driver for Linux 2.6 that may be
-distributed with QLogic hardware specific firmware binary file.
 You may modify and redistribute the device driver code under the
 GNU General Public License (a copy of which is attached hereto as
 Exhibit A) published by the Free Software Foundation (version 2).
 
-You may redistribute the hardware specific firmware binary file
-under the following terms:
-
-       1. Redistribution of source code (only if applicable),
-          must retain the above copyright notice, this list of
-          conditions and the following disclaimer.
-
-       2. Redistribution in binary form must reproduce the above
-          copyright notice, this list of conditions and the
-          following disclaimer in the documentation and/or other
-          materials provided with the distribution.
-
-       3. The name of QLogic Corporation may not be used to
-          endorse or promote products derived from this software
-          without specific prior written permission
-
-REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE,
-THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY
-EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
-PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR
-BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
-TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
-ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
-OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-POSSIBILITY OF SUCH DAMAGE.
-
-USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT
-CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR
-OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT,
-TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN
-ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN
-COMBINATION WITH THIS PROGRAM.
-
 
 EXHIBIT A
 
-                   GNU GENERAL PUBLIC LICENSE
-                      Version 2, June 1991
+		    GNU GENERAL PUBLIC LICENSE
+		       Version 2, June 1991
 
  Copyright (C) 1989, 1991 Free Software Foundation, Inc.
  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
  Everyone is permitted to copy and distribute verbatim copies
  of this license document, but changing it is not allowed.
 
-                           Preamble
+			    Preamble
 
   The licenses for most software are designed to take away your
 freedom to share and change it.  By contrast, the GNU General Public
@@ -105,7 +66,7 @@ patent must be licensed for everyone's free use or not licensed at all.
   The precise terms and conditions for copying, distribution and
 modification follow.
 
-                   GNU GENERAL PUBLIC LICENSE
+		    GNU GENERAL PUBLIC LICENSE
    TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 
   0. This License applies to any program or other work which contains
@@ -304,7 +265,7 @@ make exceptions for this.  Our decision will be guided by the two goals
 of preserving the free status of all derivatives of our free software and
 of promoting the sharing and reuse of software generally.
 
-                           NO WARRANTY
+			    NO WARRANTY
 
   11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
 FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH 5/5] qlcnic: fix beacon and LED test.
From: Anirban Chakraborty @ 2011-10-28 22:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Sucheta Chakraborty
In-Reply-To: <1319842636-14936-1-git-send-email-anirban.chakraborty@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

o Updated version number to 5.0.25

o Do not hold RESETTING_BIT for the entire duration of the LED/beacon test.
  Instead, just check that RESETTING_BIT is not set before sending the
  config_led command down to the card.

o Take rtnl_lock instead of RESETTING_BIT for the beacon test while sending
  the config_led command down, to make sure the interface cannot be brought
  up or down.

o Allocate and free resources if the interface is down before
  sending the config_led command. This is to make sure that sending the
  config_led command doesn't fail.

o Clear the QLCNIC_LED_ENABLE bit if the beacon/LED test fails to start.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h        |    4 +-
 .../net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c    |   45 +++++++++++++------
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c   |   39 ++++++++++-------
 3 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
index 2fd1ba8..7ed53db 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
@@ -36,8 +36,8 @@
 
 #define _QLCNIC_LINUX_MAJOR 5
 #define _QLCNIC_LINUX_MINOR 0
-#define _QLCNIC_LINUX_SUBVERSION 24
-#define QLCNIC_LINUX_VERSIONID  "5.0.24"
+#define _QLCNIC_LINUX_SUBVERSION 25
+#define QLCNIC_LINUX_VERSIONID  "5.0.25"
 #define QLCNIC_DRV_IDC_VER  0x01
 #define QLCNIC_DRIVER_VERSION  ((_QLCNIC_LINUX_MAJOR << 16) |\
 		 (_QLCNIC_LINUX_MINOR << 8) | (_QLCNIC_LINUX_SUBVERSION))
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c
index 5d8bec2..8aa1c6e 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c
@@ -935,31 +935,49 @@ static int qlcnic_set_led(struct net_device *dev,
 {
 	struct qlcnic_adapter *adapter = netdev_priv(dev);
 	int max_sds_rings = adapter->max_sds_rings;
+	int err = -EIO, active = 1;
+
+	if (adapter->op_mode == QLCNIC_NON_PRIV_FUNC) {
+		netdev_warn(dev, "LED test not supported for non "
+				"privilege function\n");
+		return -EOPNOTSUPP;
+	}
 
 	switch (state) {
 	case ETHTOOL_ID_ACTIVE:
 		if (test_and_set_bit(__QLCNIC_LED_ENABLE, &adapter->state))
 			return -EBUSY;
 
-		if (!test_bit(__QLCNIC_DEV_UP, &adapter->state)) {
-			if (test_and_set_bit(__QLCNIC_RESETTING, &adapter->state))
-				return -EIO;
+		if (test_bit(__QLCNIC_RESETTING, &adapter->state))
+			break;
 
-			if (qlcnic_diag_alloc_res(dev, QLCNIC_LED_TEST)) {
-				clear_bit(__QLCNIC_RESETTING, &adapter->state);
-				return -EIO;
-			}
+		if (!test_bit(__QLCNIC_DEV_UP, &adapter->state)) {
+			if (qlcnic_diag_alloc_res(dev, QLCNIC_LED_TEST))
+				break;
 			set_bit(__QLCNIC_DIAG_RES_ALLOC, &adapter->state);
 		}
 
-		if (adapter->nic_ops->config_led(adapter, 1, 0xf) == 0)
-			return 0;
+		if (adapter->nic_ops->config_led(adapter, 1, 0xf) == 0) {
+			err = 0;
+			break;
+		}
 
 		dev_err(&adapter->pdev->dev,
 			"Failed to set LED blink state.\n");
 		break;
 
 	case ETHTOOL_ID_INACTIVE:
+		active = 0;
+
+		if (test_bit(__QLCNIC_RESETTING, &adapter->state))
+			break;
+
+		if (!test_bit(__QLCNIC_DEV_UP, &adapter->state)) {
+			if (qlcnic_diag_alloc_res(dev, QLCNIC_LED_TEST))
+				break;
+			set_bit(__QLCNIC_DIAG_RES_ALLOC, &adapter->state);
+		}
+
 		if (adapter->nic_ops->config_led(adapter, 0, 0xf))
 			dev_err(&adapter->pdev->dev,
 				"Failed to reset LED blink state.\n");
@@ -970,14 +988,13 @@ static int qlcnic_set_led(struct net_device *dev,
 		return -EINVAL;
 	}
 
-	if (test_and_clear_bit(__QLCNIC_DIAG_RES_ALLOC, &adapter->state)) {
+	if (test_and_clear_bit(__QLCNIC_DIAG_RES_ALLOC, &adapter->state))
 		qlcnic_diag_free_res(dev, max_sds_rings);
-		clear_bit(__QLCNIC_RESETTING, &adapter->state);
-	}
 
-	clear_bit(__QLCNIC_LED_ENABLE, &adapter->state);
+	if (!active || err)
+		clear_bit(__QLCNIC_LED_ENABLE, &adapter->state);
 
-	return -EIO;
+	return err;
 }
 
 static void
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
index 2edffce..0bd1638 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
@@ -3504,11 +3504,16 @@ qlcnic_store_beacon(struct device *dev,
 {
 	struct qlcnic_adapter *adapter = dev_get_drvdata(dev);
 	int max_sds_rings = adapter->max_sds_rings;
-	int dev_down = 0;
 	u16 beacon;
 	u8 b_state, b_rate;
 	int err;
 
+	if (adapter->op_mode == QLCNIC_NON_PRIV_FUNC) {
+		dev_warn(dev, "LED test not supported for non "
+				"privilege function\n");
+		return -EOPNOTSUPP;
+	}
+
 	if (len != sizeof(u16))
 		return QL_STATUS_INVALID_PARAM;
 
@@ -3520,36 +3525,40 @@ qlcnic_store_beacon(struct device *dev,
 	if (adapter->ahw->beacon_state == b_state)
 		return len;
 
+	rtnl_lock();
+
 	if (!adapter->ahw->beacon_state)
-		if (test_and_set_bit(__QLCNIC_LED_ENABLE, &adapter->state))
+		if (test_and_set_bit(__QLCNIC_LED_ENABLE, &adapter->state)) {
+			rtnl_unlock();
 			return -EBUSY;
+		}
+
+	if (test_bit(__QLCNIC_RESETTING, &adapter->state)) {
+		err = -EIO;
+		goto out;
+	}
 
 	if (!test_bit(__QLCNIC_DEV_UP, &adapter->state)) {
-		if (test_and_set_bit(__QLCNIC_RESETTING, &adapter->state))
-			return -EIO;
 		err = qlcnic_diag_alloc_res(adapter->netdev, QLCNIC_LED_TEST);
-		if (err) {
-			clear_bit(__QLCNIC_RESETTING, &adapter->state);
-			clear_bit(__QLCNIC_LED_ENABLE, &adapter->state);
-			return err;
-		}
-		dev_down = 1;
+		if (err)
+			goto out;
+		set_bit(__QLCNIC_DIAG_RES_ALLOC, &adapter->state);
 	}
 
 	err = qlcnic_config_led(adapter, b_state, b_rate);
 
 	if (!err) {
-		adapter->ahw->beacon_state = b_state;
 		err = len;
+		adapter->ahw->beacon_state = b_state;
 	}
 
-	if (dev_down) {
+	if (test_and_clear_bit(__QLCNIC_DIAG_RES_ALLOC, &adapter->state))
 		qlcnic_diag_free_res(adapter->netdev, max_sds_rings);
-		clear_bit(__QLCNIC_RESETTING, &adapter->state);
-	}
 
-	if (!b_state)
+ out:
+	if (!adapter->ahw->beacon_state)
 		clear_bit(__QLCNIC_LED_ENABLE, &adapter->state);
+	rtnl_unlock();
 
 	return err;
 }
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH net-next 0/5] qlcnic: Fixes
From: Anirban Chakraborty @ 2011-10-28 22:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Anirban Chakraborty
In-Reply-To: <1319842636-14936-1-git-send-email-anirban.chakraborty@qlogic.com>

Please apply the series to net-next. Thanks.

-Anirban

^ permalink raw reply

* [PATCH 2/5] qlcnic: reset loopback mode if promiscous mode setting fails.
From: Anirban Chakraborty @ 2011-10-28 22:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Dept_NX_Linux_NIC_Driver, Sucheta Chakraborty
In-Reply-To: <1319842636-14936-1-git-send-email-anirban.chakraborty@qlogic.com>

From: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>

If promiscuous mode setting fails, reset loopback mode setting in firmware.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
index 74e9d7b..bcb81e4 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.c
@@ -566,7 +566,7 @@ int qlcnic_set_lb_mode(struct qlcnic_adapter *adapter, u8 mode)
 		return -EIO;
 
 	if (qlcnic_nic_set_promisc(adapter, VPORT_MISS_MODE_ACCEPT_ALL)) {
-		qlcnic_set_fw_loopback(adapter, mode);
+		qlcnic_set_fw_loopback(adapter, 0);
 		return -EIO;
 	}
 
-- 
1.7.4.1

^ permalink raw reply related

* Re: [PATCH 2/2 v3] net/smsc911x: Add regulator support
From: Mike Frysinger @ 2011-10-28 23:33 UTC (permalink / raw)
  To: Sascha Hauer
  Cc: Linus Walleij, netdev, Steve Glendinning, Mathieu Poirer,
	Robert Marklund, Paul Mundt, linux-sh, Tony Lindgren, linux-omap,
	uclinux-dist-devel, Linus Walleij
In-Reply-To: <20111028203353.GE23421@pengutronix.de>

On Fri, Oct 28, 2011 at 22:33, Sascha Hauer wrote:
> On Thu, Oct 27, 2011 at 02:48:11PM +0200, Linus Walleij wrote:
>> +/*
>> + * Request or free resources, currently just regulators.
>> + *
>> + * The SMSC911x has two power pins: vddvario and vdd33a, in designs where
>> + * these are not always-on we need to request regulators to be turned on
>> + * before we can try to access the device registers.
>> + */
>> +static int smsc911x_request_free_resources(struct platform_device *pdev,
>> +             bool request)
>
> I had to look twice at this function name. First I thought "request the
> free resources?", which other resources would you request if not the
> free ones? I think it would be nicer to have two functions instead.
> Just my 2 cents.

i'll add my 2 cents and we'll almost have a nickel.  maybe i'm dense,
but i had to look (more than) twice at both funcs before i could get
my head around what was happening.  no, it's not complicated, but it
is unusual in the kernel world.  either that or i haven't read enough
kernel code to consider this a common paradigm.  hopefully it's the
former ;).
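
The two-function shape suggested in the quoted review could look roughly
like this. It is a user-space sketch with made-up names
(example_request_resources, example_free_resources, struct
example_supplies), not the driver's code; the two fields only mirror the
supply names from the quoted comment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two supplies named in the quoted comment. */
struct example_supplies {
    bool vddvario;
    bool vdd33a;
};

/* Acquire both supplies; returns 0 on success. */
static int example_request_resources(struct example_supplies *s)
{
    s->vddvario = true;
    s->vdd33a = true;
    return 0;
}

/* Release both supplies. */
static void example_free_resources(struct example_supplies *s)
{
    s->vddvario = false;
    s->vdd33a = false;
}
```

Each call site then reads as either a request or a free, with no boolean
argument to decode.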
-mike

^ permalink raw reply

* [PATCH 2/5] drivers/net/wireless/brcm80211/brcmsmac/dma.c: eliminate a null pointer dereference
From: Julia Lawall @ 2011-10-28 23:58 UTC (permalink / raw)
  To: John W. Linville; +Cc: kernel-janitors, linux-wireless, netdev, linux-kernel

From: Julia Lawall <julia@diku.dk>

Delete di->name from the error reporting code, as it is meaningless if di
is NULL.
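
The general pattern, shown as a stand-alone sketch with hypothetical
names (struct example_dma_info and example_ctrlflags are not the
driver's types): once a pointer has been found to be NULL, nothing on
that path may read through it, not even for an error message.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the driver's DMA state. */
struct example_dma_info {
    const char *name;
    unsigned int ctrlflags;
};

/* Returns the control flags, or 0 when no handle was supplied. */
static unsigned int example_ctrlflags(const struct example_dma_info *di)
{
    if (di == NULL)
        return 0;          /* no member of di may be read on this path */

    return di->ctrlflags;  /* safe: di is known to be non-NULL here */
}
```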

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
expression E, E1;
identifier f;
statement S1,S2,S3;
@@

if (E == NULL)
{
  ... when != if (E == NULL || ...) S1 else S2
      when != E = E1
*E->f
  ... when any
  return ...;
}
else S3
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/wireless/brcm80211/brcmsmac/dma.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/brcm80211/brcmsmac/dma.c b/drivers/net/wireless/brcm80211/brcmsmac/dma.c
index b56a302..1d66f53 100644
--- a/drivers/net/wireless/brcm80211/brcmsmac/dma.c
+++ b/drivers/net/wireless/brcm80211/brcmsmac/dma.c
@@ -361,7 +361,7 @@ static uint _dma_ctrlflags(struct dma_info *di, uint mask, uint flags)
 	uint dmactrlflags = di->dma.dmactrlflags;
 
 	if (di == NULL) {
-		DMA_ERROR(("%s: _dma_ctrlflags: NULL dma handle\n", di->name));
+		DMA_ERROR(("_dma_ctrlflags: NULL dma handle\n"));
 		return 0;
 	}
 

^ permalink raw reply related

* [PATCH 0/2] bonding: Doesn't support IPv6
From: John @ 2011-10-29  1:08 UTC (permalink / raw)
  To: netdev; +Cc: andy

Currently the "bonding" driver does not support load balancing outgoing 
traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) 
are currently supported; this patch adds transmit hashing for IPv6 (and 
TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the 
bonding driver.

The algorithm (xor'ing the bottom three quads and then xor'ing
that down into the bottom byte) was chosen after testing almost 400,000
unique IPv6 addresses harvested from server logs. This algorithm had the 
most even distribution for both big- and little-endian architectures 
while still using few instructions.
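
The fold described above can be sketched as stand-alone C. The function
names are hypothetical and the addresses are plain arrays of four 32-bit
words here, whereas the patch reads them from the IPv6 header. As in the
patch, the upper bytes are not masked off after the fold; the final
modulo by the slave count does the reduction.

```c
#include <stdint.h>

/*
 * XOR words 1..3 of source and destination (word 0 is skipped), then
 * fold the upper bytes down onto the low byte.
 */
static uint32_t ipv6_fold_hash(const uint32_t saddr[4],
                               const uint32_t daddr[4])
{
    uint32_t h = (saddr[1] ^ daddr[1]) ^
                 (saddr[2] ^ daddr[2]) ^
                 (saddr[3] ^ daddr[3]);

    return (h >> 16) ^ (h >> 8) ^ h;
}

/* Mix in the last byte of each MAC address and pick one of `count` slaves. */
static unsigned int pick_slave(uint32_t hash, uint8_t dst_mac_last,
                               uint8_t src_mac_last, unsigned int count)
{
    return (hash ^ dst_mac_last ^ src_mac_last) % count;
}
```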

Fragmented IPv6 packets are handled the same way as fragmented IPv4
packets, i.e., they are not balanced based on layer 4 information.
Additionally, IPv6 packets with intermediate headers are not balanced
based on layer 4 information. In practice these intermediate headers are
rare, so this should not cause any problems; the alternative (a
packet-parsing loop and look-up table) seemed slow and complicated for
little gain.
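
A sketch of that rule, assuming a hypothetical helper l4_port_xor() and
ports passed in host byte order for readability: layer 4 ports
contribute only when the fixed IPv6 header is followed directly by TCP
or UDP. Any other next-header value, including the fragment header,
contributes nothing.

```c
#include <stdint.h>
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

/*
 * Returns the XOR of the two ports when nexthdr is TCP or UDP, and 0
 * for every other next-header value (extension headers are not walked).
 */
static uint32_t l4_port_xor(uint8_t nexthdr, uint16_t sport, uint16_t dport)
{
    if (nexthdr == IPPROTO_TCP || nexthdr == IPPROTO_UDP)
        return (uint32_t)(sport ^ dport);

    return 0;
}
```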

This is an update to a prior patch I submitted. This version includes 
bounds checking not present in the original driver or my prior patch. 
Included with the patch is an update to the bonding documentation.

Patch has been tested and performs as expected.

John

^ permalink raw reply

* [PATCH 1/2] bonding: Doesn't support IPv6
From: John @ 2011-10-29  1:18 UTC (permalink / raw)
  To: netdev; +Cc: andy

--- a/drivers/net/bonding/bond_main.c	2011-04-19 11:18:48.000000000 -0700
+++ b/drivers/net/bonding/bond_main.c	2011-10-27 11:26:20.000000000 -0700
@@ -3533,14 +3533,26 @@
  static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
  {
  	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);

-	if (skb->protocol == htons(ETH_P_IP)) {
+	if (skb->protocol == htons(ETH_P_IP) &&
+		skb_network_header_len(skb) >= sizeof(struct iphdr)) {
+		struct iphdr *iph = ip_hdr(skb);
  		return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
  			(data->h_dest[5] ^ data->h_source[5])) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6) &&
+		skb_network_header_len(skb) >= sizeof(struct ipv6hdr)) {
+		struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+		u32 v6hash =
+			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
+			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
+			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);
+		v6hash = (v6hash >> 16) ^ (v6hash >> 8) ^ v6hash;
+		return (v6hash ^ data->h_dest[5] ^ data->h_source[5]) % count;
  	}

-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	if (skb_headlen(skb) >= 6)
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return 0;
  }

  /*
@@ -3551,22 +3563,39 @@
  static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
  {
  	struct ethhdr *data = (struct ethhdr *)skb->data;
-	struct iphdr *iph = ip_hdr(skb);
-	__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
-	int layer4_xor = 0;
+	u32 layer4_xor = 0;

  	if (skb->protocol == htons(ETH_P_IP)) {
+		struct iphdr *iph = ip_hdr(skb);
+		__be16 *layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
+		if (iph->ihl * sizeof(u32) + sizeof(__be16) * 2 >
+			skb_headlen(skb) - skb_network_offset(skb)) goto SHORT_HEADER;
  		if (!(iph->frag_off & htons(IP_MF|IP_OFFSET)) &&
-		    (iph->protocol == IPPROTO_TCP ||
-		     iph->protocol == IPPROTO_UDP)) {
+			(iph->protocol == IPPROTO_TCP ||
+			iph->protocol == IPPROTO_UDP)) {
  			layer4_xor = ntohs((*layer4hdr ^ *(layer4hdr + 1)));
  		}
  		return (layer4_xor ^
  			((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
-
-	}
-
-	return (data->h_dest[5] ^ data->h_source[5]) % count;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
+		struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+		__be16 *layer4hdrv6 = (__be16 *)((u8 *)ipv6h + sizeof(*ipv6h));
+		if (sizeof(struct ipv6hdr) + sizeof(__be16) * 2 >
+			skb_headlen(skb) - skb_network_offset(skb)) goto SHORT_HEADER;
+		if (ipv6h->nexthdr == IPPROTO_TCP || ipv6h->nexthdr == IPPROTO_UDP) {
+			layer4_xor = (*layer4hdrv6 ^ *(layer4hdrv6 + 1));
+		}
+		layer4_xor ^=
+			(ipv6h->saddr.s6_addr32[1] ^ ipv6h->daddr.s6_addr32[1]) ^
+			(ipv6h->saddr.s6_addr32[2] ^ ipv6h->daddr.s6_addr32[2]) ^
+			(ipv6h->saddr.s6_addr32[3] ^ ipv6h->daddr.s6_addr32[3]);
+		return ((layer4_xor >> 16) ^ (layer4_xor >> 8) ^ layer4_xor) % count;
+	}
+
+	SHORT_HEADER:
+	if (skb_headlen(skb) >= 6)
+		return (data->h_dest[5] ^ data->h_source[5]) % count;
+	return 0;
  }

  /*

^ permalink raw reply

* [PATCH 2/2] bonding: Doesn't support IPv6
From: John @ 2011-10-29  1:18 UTC (permalink / raw)
  To: netdev; +Cc: andy

--- a/Documentation/networking/bonding.txt	2011-09-01 12:15:38.000000000 -0700
+++ b/Documentation/networking/bonding.txt	2011-09-26 10:23:43.000000000 -0700
@@ -709,12 +709,22 @@
  		protocol information to generate the hash.

  		Uses XOR of hardware MAC addresses and IP addresses to
-		generate the hash.  The formula is
+		generate the hash.  The IPv4 formula is

  		(((source IP XOR dest IP) AND 0xffff) XOR
  			( source MAC XOR destination MAC ))
  				modulo slave count

+        The IPv6 formula is
+
+        iphash =
+             (source ip quad 2 XOR dest IP quad 2) XOR
+             (source ip quad 3 XOR dest IP quad 3) XOR
+             (source ip quad 4 XOR dest IP quad 4)
+
+        (iphash >> 16) XOR (iphash >> 8) XOR iphash
+            modulo slave count
+
  		This algorithm will place all traffic to a particular
  		network peer on the same slave.  For non-IP traffic,
  		the formula is the same as for the layer2 transmit
@@ -735,19 +745,30 @@
  		slaves, although a single connection will not span
  		multiple slaves.

-		The formula for unfragmented TCP and UDP packets is
+		The formula for unfragmented IPv4 TCP and UDP packets is

  		((source port XOR dest port) XOR
  			 ((source IP XOR dest IP) AND 0xffff)
  				modulo slave count

-		For fragmented TCP or UDP packets and all other IP
-		protocol traffic, the source and destination port
+		The formula for unfragmented IPv6 TCP and UDP packets is
+
+		iphash =
+			 (source ip quad 2 XOR dest IP quad 2) XOR
+			 (source ip quad 3 XOR dest IP quad 3) XOR
+			 (source ip quad 4 XOR dest IP quad 4)
+
+		((source port XOR dest port) XOR
+			 (iphash >> 16) XOR (iphash >> 8) XOR iphash
+				modulo slave count
+
+		For fragmented TCP or UDP packets and all other IPv4 and
+		IPv6 protocol traffic, the source and destination port
  		information is omitted.  For non-IP traffic, the
  		formula is the same as for the layer2 transmit hash
  		policy.

-		This policy is intended to mimic the behavior of
+		The IPv4 policy is intended to mimic the behavior of
  		certain switches, notably Cisco switches with PFC2 as
  		well as some Foundry and IBM products.

^ permalink raw reply

* [PATCH net-next] bonding: eliminate bond_close race conditions
From: Jay Vosburgh @ 2011-10-29  1:42 UTC (permalink / raw)
  To: David Miller
  Cc: mitsuo.hayasaka.hu, netdev, xiyou.wangcong, shemminger, andy,
	linux-kernel, yrl.pp-manager.tt
In-Reply-To: <20111027.231520.698587739433841485.davem@davemloft.net>


	This patch resolves two sets of race conditions.

	Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the
first, as follows:

The bond_close() calls cancel_delayed_work() to cancel delayed works.
It, however, cannot cancel works that were already queued in workqueue.
The bond_open() initializes work->data, and process_one_work() refers
get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when
work->data has been initialized. Thus, a panic occurs.

	He included a patch that converted the cancel_delayed_work calls
in bond_close to flush_delayed_work_sync, which eliminated the above
problem.

	His patch is incorporated, at least in principle, into this
patch.  In this patch, we use cancel_delayed_work_sync in place of
flush_delayed_work_sync, and also convert bond_uninit in addition to
bond_close.

	This conversion to _sync, however, opens new races between
bond_close and three periodically executing workqueue functions:
bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon.

	The race occurs because bond_close and bond_uninit are always
called with RTNL held, and these workqueue functions may acquire RTNL to
perform failover-related activities.  If bond_close or bond_uninit is
waiting in cancel_delayed_work_sync, deadlock occurs.

	These deadlocks are resolved by having the workqueue functions
acquire RTNL conditionally.  If the rtnl_trylock() fails, the functions
reschedule and return immediately.  For the cases that are attempting to
perform link failover, a delay of 1 is used; for the other cases, the
normal interval is used (as those activities are not as time critical).
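
	The try-lock-and-reschedule pattern described above can be sketched in user space roughly as follows. This is an illustration only: a C11 atomic_flag stands in for RTNL, and the names rtnl_sketch, sketch_trylock, sketch_unlock and monitor_step_sketch are hypothetical, not taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for RTNL; the kernel code uses rtnl_trylock()/rtnl_unlock(). */
static atomic_flag rtnl_sketch = ATOMIC_FLAG_INIT;

static bool sketch_trylock(void)
{
	/* test_and_set returns the previous value: false means we now own it. */
	return !atomic_flag_test_and_set(&rtnl_sketch);
}

static void sketch_unlock(void)
{
	atomic_flag_clear(&rtnl_sketch);
}

/*
 * Hypothetical monitor step. Returns the delay (in ticks) to re-arm with:
 * 1 if failover work was needed but the lock was contended, otherwise the
 * normal interval.
 */
static unsigned long monitor_step_sketch(bool need_commit,
					 unsigned long interval)
{
	if (!need_commit)
		return interval;

	/* Never block here: the closer may hold the lock while waiting on us. */
	if (!sketch_trylock())
		return 1;

	/* ... failover work that requires the lock would go here ... */

	sketch_unlock();
	return interval;
}
```

	The point of returning a delay of 1 is that the worker gets out of the way immediately, so a caller sitting in cancel_delayed_work_sync can make progress, while a pending failover is still retried promptly if the device stays up.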

	Additionally, the bond_mii_monitor function now stores the delay
in a variable (mimicking the structure of activebackup_arp_mon).

	Lastly, all of the above renders the kill_timers sentinel moot,
and therefore it has been removed.

Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>

---
 drivers/net/bonding/bond_3ad.c  |    8 +---
 drivers/net/bonding/bond_alb.c  |   16 +++----
 drivers/net/bonding/bond_main.c |   96 +++++++++++++++++++++------------------
 drivers/net/bonding/bonding.h   |    1 -
 4 files changed, 61 insertions(+), 60 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index b33c099..0ae0d7c 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2110,9 +2110,6 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
 
 	read_lock(&bond->lock);
 
-	if (bond->kill_timers)
-		goto out;
-
 	//check if there are any slaves
 	if (bond->slave_cnt == 0)
 		goto re_arm;
@@ -2161,9 +2158,8 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
 	}
 
 re_arm:
-	if (!bond->kill_timers)
-		queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
-out:
+	queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
+
 	read_unlock(&bond->lock);
 }
 
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index d4fbd2e..106b88a 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1343,10 +1343,6 @@ void bond_alb_monitor(struct work_struct *work)
 
 	read_lock(&bond->lock);
 
-	if (bond->kill_timers) {
-		goto out;
-	}
-
 	if (bond->slave_cnt == 0) {
 		bond_info->tx_rebalance_counter = 0;
 		bond_info->lp_counter = 0;
@@ -1401,10 +1397,13 @@ void bond_alb_monitor(struct work_struct *work)
 
 			/*
 			 * dev_set_promiscuity requires rtnl and
-			 * nothing else.
+			 * nothing else.  Avoid race with bond_close.
 			 */
 			read_unlock(&bond->lock);
-			rtnl_lock();
+			if (!rtnl_trylock()) {
+				read_lock(&bond->lock);
+				goto re_arm;
+			}
 
 			bond_info->rlb_promisc_timeout_counter = 0;
 
@@ -1440,9 +1439,8 @@ void bond_alb_monitor(struct work_struct *work)
 	}
 
 re_arm:
-	if (!bond->kill_timers)
-		queue_delayed_work(bond->wq, &bond->alb_work, alb_delta_in_ticks);
-out:
+	queue_delayed_work(bond->wq, &bond->alb_work, alb_delta_in_ticks);
+
 	read_unlock(&bond->lock);
 }
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 71efff3..9931a16 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -774,9 +774,6 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
 
 	read_lock(&bond->lock);
 
-	if (bond->kill_timers)
-		goto out;
-
 	/* rejoin all groups on bond device */
 	__bond_resend_igmp_join_requests(bond->dev);
 
@@ -790,9 +787,9 @@ static void bond_resend_igmp_join_requests(struct bonding *bond)
 			__bond_resend_igmp_join_requests(vlan_dev);
 	}
 
-	if ((--bond->igmp_retrans > 0) && !bond->kill_timers)
+	if (--bond->igmp_retrans > 0)
 		queue_delayed_work(bond->wq, &bond->mcast_work, HZ/5);
-out:
+
 	read_unlock(&bond->lock);
 }
 
@@ -2518,10 +2515,11 @@ void bond_mii_monitor(struct work_struct *work)
 	struct bonding *bond = container_of(work, struct bonding,
 					    mii_work.work);
 	bool should_notify_peers = false;
+	unsigned long delay;
 
 	read_lock(&bond->lock);
-	if (bond->kill_timers)
-		goto out;
+
+	delay = msecs_to_jiffies(bond->params.miimon);
 
 	if (bond->slave_cnt == 0)
 		goto re_arm;
@@ -2530,7 +2528,15 @@ void bond_mii_monitor(struct work_struct *work)
 
 	if (bond_miimon_inspect(bond)) {
 		read_unlock(&bond->lock);
-		rtnl_lock();
+
+		/* Race avoidance with bond_close cancel of workqueue */
+		if (!rtnl_trylock()) {
+			read_lock(&bond->lock);
+			delay = 1;
+			should_notify_peers = false;
+			goto re_arm;
+		}
+
 		read_lock(&bond->lock);
 
 		bond_miimon_commit(bond);
@@ -2541,14 +2547,18 @@ void bond_mii_monitor(struct work_struct *work)
 	}
 
 re_arm:
-	if (bond->params.miimon && !bond->kill_timers)
-		queue_delayed_work(bond->wq, &bond->mii_work,
-				   msecs_to_jiffies(bond->params.miimon));
-out:
+	if (bond->params.miimon)
+		queue_delayed_work(bond->wq, &bond->mii_work, delay);
+
 	read_unlock(&bond->lock);
 
 	if (should_notify_peers) {
-		rtnl_lock();
+		if (!rtnl_trylock()) {
+			read_lock(&bond->lock);
+			bond->send_peer_notif++;
+			read_unlock(&bond->lock);
+			return;
+		}
 		netdev_bonding_change(bond->dev, NETDEV_NOTIFY_PEERS);
 		rtnl_unlock();
 	}
@@ -2790,9 +2800,6 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
 
 	delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);
 
-	if (bond->kill_timers)
-		goto out;
-
 	if (bond->slave_cnt == 0)
 		goto re_arm;
 
@@ -2889,9 +2896,9 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
 	}
 
 re_arm:
-	if (bond->params.arp_interval && !bond->kill_timers)
+	if (bond->params.arp_interval)
 		queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
-out:
+
 	read_unlock(&bond->lock);
 }
 
@@ -3132,9 +3139,6 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 
 	read_lock(&bond->lock);
 
-	if (bond->kill_timers)
-		goto out;
-
 	delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);
 
 	if (bond->slave_cnt == 0)
@@ -3144,7 +3148,15 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 
 	if (bond_ab_arp_inspect(bond, delta_in_ticks)) {
 		read_unlock(&bond->lock);
-		rtnl_lock();
+
+		/* Race avoidance with bond_close flush of workqueue */
+		if (!rtnl_trylock()) {
+			read_lock(&bond->lock);
+			delta_in_ticks = 1;
+			should_notify_peers = false;
+			goto re_arm;
+		}
+
 		read_lock(&bond->lock);
 
 		bond_ab_arp_commit(bond, delta_in_ticks);
@@ -3157,13 +3169,18 @@ void bond_activebackup_arp_mon(struct work_struct *work)
 	bond_ab_arp_probe(bond);
 
 re_arm:
-	if (bond->params.arp_interval && !bond->kill_timers)
+	if (bond->params.arp_interval)
 		queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
-out:
+
 	read_unlock(&bond->lock);
 
 	if (should_notify_peers) {
-		rtnl_lock();
+		if (!rtnl_trylock()) {
+			read_lock(&bond->lock);
+			bond->send_peer_notif++;
+			read_unlock(&bond->lock);
+			return;
+		}
 		netdev_bonding_change(bond->dev, NETDEV_NOTIFY_PEERS);
 		rtnl_unlock();
 	}
@@ -3425,8 +3442,6 @@ static int bond_open(struct net_device *bond_dev)
 	struct slave *slave;
 	int i;
 
-	bond->kill_timers = 0;
-
 	/* reset slave->backup and slave->inactive */
 	read_lock(&bond->lock);
 	if (bond->slave_cnt > 0) {
@@ -3495,33 +3510,30 @@ static int bond_close(struct net_device *bond_dev)
 
 	bond->send_peer_notif = 0;
 
-	/* signal timers not to re-arm */
-	bond->kill_timers = 1;
-
 	write_unlock_bh(&bond->lock);
 
 	if (bond->params.miimon) {  /* link check interval, in milliseconds. */
-		cancel_delayed_work(&bond->mii_work);
+		cancel_delayed_work_sync(&bond->mii_work);
 	}
 
 	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
-		cancel_delayed_work(&bond->arp_work);
+		cancel_delayed_work_sync(&bond->arp_work);
 	}
 
 	switch (bond->params.mode) {
 	case BOND_MODE_8023AD:
-		cancel_delayed_work(&bond->ad_work);
+		cancel_delayed_work_sync(&bond->ad_work);
 		break;
 	case BOND_MODE_TLB:
 	case BOND_MODE_ALB:
-		cancel_delayed_work(&bond->alb_work);
+		cancel_delayed_work_sync(&bond->alb_work);
 		break;
 	default:
 		break;
 	}
 
 	if (delayed_work_pending(&bond->mcast_work))
-		cancel_delayed_work(&bond->mcast_work);
+		cancel_delayed_work_sync(&bond->mcast_work);
 
 	if (bond_is_lb(bond)) {
 		/* Must be called only after all
@@ -4368,26 +4380,22 @@ static void bond_setup(struct net_device *bond_dev)
 
 static void bond_work_cancel_all(struct bonding *bond)
 {
-	write_lock_bh(&bond->lock);
-	bond->kill_timers = 1;
-	write_unlock_bh(&bond->lock);
-
 	if (bond->params.miimon && delayed_work_pending(&bond->mii_work))
-		cancel_delayed_work(&bond->mii_work);
+		cancel_delayed_work_sync(&bond->mii_work);
 
 	if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work))
-		cancel_delayed_work(&bond->arp_work);
+		cancel_delayed_work_sync(&bond->arp_work);
 
 	if (bond->params.mode == BOND_MODE_ALB &&
 	    delayed_work_pending(&bond->alb_work))
-		cancel_delayed_work(&bond->alb_work);
+		cancel_delayed_work_sync(&bond->alb_work);
 
 	if (bond->params.mode == BOND_MODE_8023AD &&
 	    delayed_work_pending(&bond->ad_work))
-		cancel_delayed_work(&bond->ad_work);
+		cancel_delayed_work_sync(&bond->ad_work);
 
 	if (delayed_work_pending(&bond->mcast_work))
-		cancel_delayed_work(&bond->mcast_work);
+		cancel_delayed_work_sync(&bond->mcast_work);
 }
 
 /*
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 82fec5f..1aecc37 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -222,7 +222,6 @@ struct bonding {
 			       struct slave *);
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;
-	s8       kill_timers;
 	u8	 send_peer_notif;
 	s8	 setup_by_slave;
 	s8       igmp_retrans;
-- 
1.7.1

^ permalink raw reply related

* Re: [net-next PATCH] net: allow vlan traffic to be received under bond
From: John Fastabend @ 2011-10-29  2:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, jesse@nicira.com, hans.schillstrom@ericsson.com,
	jpirko@redhat.com, mbizon@freebox.fr, netdev@vger.kernel.org,
	fubar@us.ibm.com
In-Reply-To: <1319799986.23112.101.camel@edumazet-laptop>

On 10/28/2011 4:06 AM, Eric Dumazet wrote:
> Le vendredi 28 octobre 2011 à 12:00 +0200, Eric Dumazet a écrit :
> 
>> Oh well, this broke my setup, a very basic one.
>>
>> eth1 and eth2 on a bonding device, bond0, active-backup
>>
>> some vlans on top of bond0, say vlan.103
>>
>> $ ip link show dev vlan.103
>> 8: vlan.103@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
>> pfifo_fast state UP qlen 100
>>     link/ether 00:1e:0b:ec:d3:d2 brd ff:ff:ff:ff:ff:ff
>>
>>
>> arp_rcv() now gets packets with skb->type PACKET_OTHERHOST and drops
>> such packets.
>>
>>      [000] 52870.115435: skb_gro_reset_offset <-napi_gro_receive
>>      [000] 52870.115435: dev_gro_receive <-napi_gro_receive
>>      [000] 52870.115435: napi_skb_finish <-napi_gro_receive
>>      [000] 52870.115435: netif_receive_skb <-napi_skb_finish
>>      [000] 52870.115435: get_rps_cpu <-netif_receive_skb
>>      [000] 52870.115435: __netif_receive_skb <-netif_receive_skb
>>      [000] 52870.115436: vlan_do_receive <-__netif_receive_skb
>>      [000] 52870.115436: bond_handle_frame <-__netif_receive_skb
>>      [000] 52870.115436: vlan_do_receive <-__netif_receive_skb
>>      [000] 52870.115436: arp_rcv <-__netif_receive_skb
>>      [000] 52870.115436: kfree_skb <-arp_rcv
>>      [000] 52870.115437: __kfree_skb <-kfree_skb
>>      [000] 52870.115437: skb_release_head_state <-__kfree_skb
>>      [000] 52870.115437: skb_release_data <-__kfree_skb
>>      [000] 52870.115437: kfree <-skb_release_data
>>      [000] 52870.115437: kmem_cache_free <-__kfree_skb
>>
>>
>> By the way, we have no SNMP counter here so I spent some time to track
>> this. I'll send a patch for this.
>>
>> If this host initiates the trafic, all is well.
>>
>> Please guys, can we get back ARP or revert this patch ?
> 
> Following patch cures the problem, I am not sure its the right fix.
> 
> Problem is we dont know how many times vlan_do_receive() can be called
> for a packet.
> 
> Only last call should set/mess pkt_type to PACKET_OTHERHOST.
> 
> So the caller should be responsible for this, not vlan_do_receive()
> 
> 
> Alternative would be to check skb->dev->rx_handler being NULL,
> but its not clean.
> 
> Following patch is a hack because it handles multicast/broadcast trafic
> only. Unicast is already handled in lines 26-33, this is why we didnt
> catch the problem.
> 
> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> index f1f2f7b..6861899 100644
> --- a/net/8021q/vlan_core.c
> +++ b/net/8021q/vlan_core.c
> @@ -13,7 +13,7 @@ bool vlan_do_receive(struct sk_buff **skbp)
>  
>  	vlan_dev = vlan_find_dev(skb->dev, vlan_id);
>  	if (!vlan_dev) {
> -		if (vlan_id)
> +		if (vlan_id && skb->pkt_type == PACKET_HOST)
>  			skb->pkt_type = PACKET_OTHERHOST;
>  		return false;
>  	}
> 

Thanks Eric! Thought about this some and I haven't come up
with anything better yet. Even though this might be a slight
hack I would prefer this to reverting the patch.

I'll think about this more tomorrow. Would you be against
submitting this patch?

.John

^ permalink raw reply

* Re: [PATCH 2/5] drivers/net/wireless/brcm80211/brcmsmac/dma.c: eliminate a null pointer dereference
From: Julian Calaby @ 2011-10-29  2:27 UTC (permalink / raw)
  To: Julia Lawall
  Cc: John W. Linville, kernel-janitors, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <1319846297-2985-2-git-send-email-julia@diku.dk>

On 29/10/11 10:58, Julia Lawall wrote:
> From: Julia Lawall <julia@diku.dk>
> 
> Delete di->name from the error reporting code, as it is meaningless if di
> is NULL.
> 
> The semantic match that finds this problem is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @r@
> expression E, E1;
> identifier f;
> statement S1,S2,S3;
> @@
> 
> if (E == NULL)
> {
>   ... when != if (E == NULL || ...) S1 else S2
>       when != E = E1
> *E->f
>   ... when any
>   return ...;
> }
> else S3
> // </smpl>
> 
> Signed-off-by: Julia Lawall <julia@diku.dk>
> 
> ---
>  drivers/net/wireless/brcm80211/brcmsmac/dma.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/brcm80211/brcmsmac/dma.c b/drivers/net/wireless/brcm80211/brcmsmac/dma.c
> index b56a302..1d66f53 100644
> --- a/drivers/net/wireless/brcm80211/brcmsmac/dma.c
> +++ b/drivers/net/wireless/brcm80211/brcmsmac/dma.c
> @@ -361,7 +361,7 @@ static uint _dma_ctrlflags(struct dma_info *di, uint mask, uint flags)
>  	uint dmactrlflags = di->dma.dmactrlflags;

If di is null, we've already failed as it's dereferenced here.

>  	if (di == NULL) {
> -		DMA_ERROR(("%s: _dma_ctrlflags: NULL dma handle\n", di->name));
> +		DMA_ERROR(("_dma_ctrlflags: NULL dma handle\n"));
>  		return 0;
>  	}

So, a better patch would be something like this:

(apologies if this doesn't apply - I've pretty much built it manually)

---

Though it's unlikely, di may be null, so we can't dereference
di->dma.dmactrlflags until we've checked it.

Move this de-reference after the check, and adjust the error
message to not require de-referencing di.

This is based upon Julia's original patch:
<1319846297-2985-2-git-send-email-julia@diku.dk>

Reported-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Julian Calaby <julian.calaby@gmail.com>
CC: Julia Lawall <julia@diku.dk>

diff --git a/drivers/net/wireless/brcm80211/brcmsmac/dma.c b/drivers/net/wireless/brcm80211/brcmsmac/dma.c
index b56a302..6ebec8f 100644
--- a/drivers/net/wireless/brcm80211/brcmsmac/dma.c
+++ b/drivers/net/wireless/brcm80211/brcmsmac/dma.c
@@ -358,13 +358,14 @@ static uint nrxdactive(struct dma_info *di, uint h, uint t
 
 static uint _dma_ctrlflags(struct dma_info *di, uint mask, uint flags)
 {
-       uint dmactrlflags = di->dma.dmactrlflags;
+       uint dmactrlflags;
 
        if (di == NULL) {
-               DMA_ERROR(("%s: _dma_ctrlflags: NULL dma handle\n", di->name));
+               DMA_ERROR(("_dma_ctrlflags: NULL dma handle\n"));
                return 0;
        }
 
+       dmactrlflags = di->dma.dmactrlflags;
        dmactrlflags &= ~mask;
        dmactrlflags |= flags;
 




-- 
Julian Calaby

Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
.Plan: http://sites.google.com/site/juliancalaby

^ permalink raw reply related

* Re: [RFC v2] tcp: Export TCP Delayed ACK parameters to user
From: David Miller @ 2011-10-29  2:24 UTC (permalink / raw)
  To: rick.jones2
  Cc: dbaluta, eric.dumazet, kuznet, jmorris, yoshfuji, kaber, netdev,
	luto
In-Reply-To: <4EAB2F58.7080509@hp.com>

From: Rick Jones <rick.jones2@hp.com>
Date: Fri, 28 Oct 2011 15:40:24 -0700

> That would be loads faster, but won't that have issues with
> granularity?

Frankly, I don't care.

For an obscure feature I don't even like to begin with, I refuse
to allow a multiply into a core code path.
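
For reference, the trade-off under discussion can be sketched in user space as below. This is a hedged illustration, not proposed kernel code: both function names and the idea of passing the shift as a parameter are assumptions made for the example.

```c
#include <stdint.h>

/* Multiply form, modelled on the helper in the RFC patch. */
static uint32_t delack_thresh_mul_sketch(uint32_t rcv_mss, uint32_t segs)
{
	return rcv_mss * segs;
}

/*
 * Shift form: avoids the multiply, but only power-of-two segment counts
 * (1, 2, 4, ...) can be expressed, which is the granularity concern.
 */
static uint32_t delack_thresh_shift_sketch(uint32_t rcv_mss, unsigned int shift)
{
	return rcv_mss << shift;
}
```

A shift of 0 reproduces the default of one segment, and a shift of 2 matches a multiplier of 4; a value such as 3 segments has no shift equivalent.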

^ permalink raw reply

* [net-next-2.6 PATCH 0/6 RFC v3] macvlan: MAC Address filtering support for passthru mode
From: Roopa Prabhu @ 2011-10-29  2:33 UTC (permalink / raw)
  To: netdev
  Cc: sri, dragos.tatulea, kvm, arnd, mst, davem, gregory.v.rose, mchan,
	dwang2, shemminger, eric.dumazet, kaber, benve

v2 -> v3
- Moved set and get filter ops from rtnl_link_ops to netdev_ops
- Support for SRIOV VFs.
	[Note: The get filters msg might get too big for SRIOV VFs.
	But this patch follows the existing SRIOV VF get code and
	accommodates filters for all VFs in a PF.
	For the SRIOV case I have only tested that the VF arguments
	are delivered to rtnetlink correctly. The rest of the code
	follows the existing SRIOV VF handling code, so it should work
	just fine.]
- Fixed all op and netlink attribute names to start with IFLA_RX_FILTER
- Changed macvlan filter ops to call corresponding lowerdev op if lowerdev 
  supports it for passthru mode. Else it falls back on macvlan handling the 
  filters locally as in v1 and v2

v1 -> v2
- Instead of TUNSETTXFILTER introduced rtnetlink interface for the same


Background and details:
=======================
Today macvtap used in a virtualized environment does not have support to
propagate MAC, VLAN and interface flags from guest to lowerdev.
This means that, to be able to register additional VLANs, unicast and
multicast addresses or change pkt filter flags in the guest, the lowerdev
has to be put in promiscuous mode. Today the only macvlan mode that
supports this is the PASSTHRU mode, and it puts the lowerdev in
promiscuous mode.

PASSTHRU mode was added primarily for the SRIOV usecase. In PASSTHRU mode 
there is a 1-1 mapping between macvtap and physical NIC or VF.

There are two problems with putting the lowerdev in promiscuous mode (i.e.
SRIOV VFs):
	- Some SRIOV cards don't support promiscuous mode today (a thread on
	the Intel driver indicates that:
	http://lists.openwall.net/netdev/2011/09/27/6)
	- For the SRIOV NICs that support it, putting the lowerdev in
	promiscuous mode leads to additional traffic being sent up to the
	guest virtio-net for filtering, resulting in extra overhead.

Both of the above problems can be solved by offloading filtering to the
lowerdev hw, i.e. the lowerdev does not need to be in promiscuous mode as
long as the guest filters are passed down to the lowerdev.

This patch basically adds the infrastructure to set and get MAC and VLAN 
filters on an interface via rtnetlink. It adds new netlink msg and netdev
ops for the same. And implements these ops in macvlan for passthru mode.

- Netlink interface:
    This patch provides the following netlink interface to set mac and vlan
    filters :

    Interface to set RX filter on a SRIOV VF:
    [IFLA_VF_RX_FILTERS] = {
    	[IFLA_VF_RX_FILTER] = {
    		[IFLA_RX_FILTER_VF]
    		[IFLA_RX_FILTER_ADDR] = {
    			[IFLA_RX_FILTER_ADDR_FLAGS]
    			[IFLA_RX_FILTER_ADDR_UC_LIST] = {
    				[IFLA_ADDR_LIST_ENTRY]
    			}
    			[IFLA_RX_FILTER_ADDR_MC_LIST] = {
    				[IFLA_ADDR_LIST_ENTRY]
    			}
    		}
    		[IFLA_RX_FILTER_VLAN] = {
    			[IFLA_RX_FILTER_VLAN_BITMAP]
    		}
    	}
    	...
    }
    
    Interface to set RX filter on any network interface:
    [IFLA_RX_FILTER] = {
    	[IFLA_RX_FILTER_VF]
    	[IFLA_RX_FILTER_ADDR] = {
    		[IFLA_RX_FILTER_ADDR_FLAGS]
    		[IFLA_RX_FILTER_ADDR_UC_LIST] = {
    			[IFLA_ADDR_LIST_ENTRY]
    		}
    		[IFLA_RX_FILTER_ADDR_MC_LIST] = {
    			[IFLA_ADDR_LIST_ENTRY]
    		}
    	}
    	[IFLA_RX_FILTER_VLAN] = {
    		[IFLA_RX_FILTER_VLAN_BITMAP]
    	}
    } 

    Note1: The IFLA_RX_FILTER_VLAN is a nested attribute, but contains only 
    IFLA_RX_FILTER_VLAN_BITMAP today. The idea is that the IFLA_RX_FILTER_VLAN 
    can be extended tomorrow to have a vlan list if some implementations 
    prefer a list instead. 
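
    As an illustration of what a VLAN bitmap payload amounts to, here is a
    minimal user-space sketch with one bit per VLAN ID. The struct and helper
    names are hypothetical and the layout is an assumption for the example;
    it is not the wire format defined by this series.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_VLAN_N_VID	4096
#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_VLAN_WORDS \
	((SKETCH_VLAN_N_VID + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* One bit per possible VLAN ID (0..4095). */
struct vlan_bitmap_sketch {
	unsigned long bits[SKETCH_VLAN_WORDS];
};

/* Mark a VLAN ID as wanted; returns false for out-of-range IDs. */
static bool vlan_bitmap_set(struct vlan_bitmap_sketch *bm, unsigned int vid)
{
	if (vid >= SKETCH_VLAN_N_VID)
		return false;
	bm->bits[vid / SKETCH_BITS_PER_LONG] |=
		1UL << (vid % SKETCH_BITS_PER_LONG);
	return true;
}

/* Check whether a VLAN ID is present in the filter. */
static bool vlan_bitmap_test(const struct vlan_bitmap_sketch *bm,
			     unsigned int vid)
{
	if (vid >= SKETCH_VLAN_N_VID)
		return false;
	return (bm->bits[vid / SKETCH_BITS_PER_LONG] >>
		(vid % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

    A bitmap has a fixed size (512 bytes for 4096 IDs) regardless of how many
    VLANs are registered, which is the main difference from a list.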

    And it provides the following netdev_ops to set/get MAC/VLAN filters:

    int                     (*ndo_set_rx_filter_addr)(
	                                        struct net_device *dev, int vf,
                                                struct nlattr *tb[]);
    int                     (*ndo_set_rx_filter_vlan)(
                                                struct net_device *dev, int vf,
                                                struct nlattr *tb[]);
    size_t                  (*ndo_get_rx_filter_addr_size)(
                                                const struct net_device *dev,
                                                int vf);
    size_t                  (*ndo_get_rx_filter_vlan_size)(
                                                const struct net_device *dev,
                                                int vf);
    int                     (*ndo_get_rx_filter_addr)(
                                                const struct net_device *dev,
                                                int vf, struct sk_buff *skb);
    int                     (*ndo_get_rx_filter_vlan)(
                                                const struct net_device *dev,
                                                int vf, struct sk_buff *skb);

Some answers to questions that were raised during the review:
- Protection against address spoofing:
	- This patch adds filtering support only for macvtap PASSTHRU 
	Mode. PASSTHRU mode is used mainly with SRIOV VF's. And SRIOV VF's 
	come with anti mac/vlan spoofing support in the lowerdev driver. 
	(netdev infrastructure to support this was added recently 
	with IFLA_VF_SPOOFCHK). For 802.1Qbh devices, the port profile has a 
	knob to enable/disable anti spoof check. Lowerdevice drivers also 
	enforce limits on the number of address registrations allowed. 
	For non-SRIOV VFs it's the responsibility of the lowerdev driver
	to implement any such protection. The current netdev hooks for
	SRIOV VF spoof check could be extended to accommodate any network
	interface in the future.

- Support for multiqueue devices: Enable filtering on individual queues (?):
	As I understand from the thread between Michael and Greg, the
	Linux VMDq implementation is not in yet, and I don't know how it is
	going to take shape. But Intel VMDq devices do accept filters on a
	per-queue basis. Since the netdev infrastructure for VMDq is not in
	yet, it's hard to say how this patch can support it.

	This patch makes use of the current netdev infrastructure for setting
	address and vlan filters. If that changes for VMDq tomorrow,
	then the work that this patch represents can be modified to
	accommodate VMDq devices at that time.

	So I don't see a huge problem with this patch getting in the way of
	VMDq devices.

- Support for non-PASSTHRU mode:
	I started implementing this. But there are a couple of problems.	
	- Today, in non-PASSTHRU cases macvlan_handle_frame assumes that 
	every macvlan device has a single unique mac.
	And the macvlans are hashed on that single mac address. 
	To support filtering for non-PASSTHRU mode in addition to this 
	patch the following needs to be done:
		- non-passthru mode with a single macvlan over a lower dev
		can be treated as PASSTHRU case
		- For non-PASSTHRU mode with multiple macvlans over a single 
		lower dev:  
			- Multiple unicast mac's now need to be hashed to the 
			same macvlan device. The macvlan hash needs to change 
			for lookup based on any one of the multiple unicast 
			addresses a macvlan is interested in
			- We need to consider vlans during the lookup too
			- So the macvlan device hash needs to hash on both mac 
			and vlan
		- But the support for filtering in non-PASSTHRU mode can be 
		built on this patch

This patch series implements the following:
01/6 rtnetlink: Netlink interface for setting MAC and VLAN filters
02/6 netdev: Add netdev_ops to set and get MAC/VLAN rx filters
03/6 rtnetlink: Add support to set MAC/VLAN filters
04/6 rtnetlink: Add support to get MAC/VLAN filters
05/6 macvlan: Add support to set MAC/VLAN filter netdev ops
06/6 macvlan: Add support to get MAC/VLAN filter netdev ops

Please comment. Thanks.

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>

^ permalink raw reply

* [net-next-2.6 PATCH 2/6 RFC v3] net: Add netdev_ops to set and get MAC/VLAN rx filters
From: Roopa Prabhu @ 2011-10-29  2:34 UTC (permalink / raw)
  To: netdev
  Cc: sri, dragos.tatulea, kvm, arnd, mst, davem, gregory.v.rose, mchan,
	dwang2, shemminger, eric.dumazet, kaber, benve
In-Reply-To: <20111029023159.5198.60245.stgit@rhel6.1>

From: Roopa Prabhu <roprabhu@cisco.com>

This patch adds the following netdev_ops to set and get MAC/VLAN
filters on a SRIOV VF or any netdev interface. Each op takes a vf argument.
vf value of SELF_VF or -1 is for applying the operation directly on the
interface.

ndo_set_rx_filter_addr - to set address filter
ndo_get_rx_filter_addr_size - to get address filter size
ndo_get_rx_filter_addr - to get address filter

ndo_set_rx_filter_vlan - to set vlan filter
ndo_get_rx_filter_vlan_size - to get vlan filter size
ndo_get_rx_filter_vlan - to get vlan filter

Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
---
 include/linux/netdevice.h |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)


diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0db1f5f..94f2bc1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -855,6 +855,20 @@ struct netdev_tc_txq {
  *	feature set might be less than what was returned by ndo_fix_features()).
  *	Must return >0 or -errno if it changed dev->features itself.
  *
+ * Address Filter management functions:
+ * int (*ndo_set_rx_filter_addr)(struct net_device *dev, int vf,
+ *				 struct nlattr *tb[]);
+ * size_t (*ndo_get_rx_filter_addr_size)(const struct net_device *dev, int vf);
+ * int (*ndo_get_rx_filter_addr)(const struct net_device *dev, int vf,
+ *				 struct sk_buff *skb);
+ *
+ * Vlan Filter management functions:
+ * int (*ndo_set_rx_filter_vlan)(struct net_device *dev, int vf,
+ *				 struct nlattr *tb[]);
+ * size_t (*ndo_get_rx_filter_vlan_size)(const struct net_device *dev, int vf);
+ * int (*ndo_get_rx_filter_vlan)(const struct net_device *dev, int vf,
+ *				 struct sk_buff *skb);
+ *
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -948,6 +962,24 @@ struct net_device_ops {
 						    u32 features);
 	int			(*ndo_set_features)(struct net_device *dev,
 						    u32 features);
+	int			(*ndo_set_rx_filter_addr)(
+						struct net_device *dev, int vf,
+						struct nlattr *tb[]);
+	size_t			(*ndo_get_rx_filter_addr_size)(
+						const struct net_device *dev,
+						int vf);
+	int			(*ndo_get_rx_filter_addr)(
+						const struct net_device *dev,
+						int vf, struct sk_buff *skb);
+	int			(*ndo_set_rx_filter_vlan)(
+						struct net_device *dev, int vf,
+						struct nlattr *tb[]);
+	size_t			(*ndo_get_rx_filter_vlan_size)(
+						const struct net_device *dev,
+						int vf);
+	int			(*ndo_get_rx_filter_vlan)(
+						const struct net_device *dev,
+						int vf, struct sk_buff *skb);
 };
 
 /*


^ permalink raw reply related

