Netdev List

* Re: [stable] [PATCH net-2.6/stable] tg3: Restrict phy ioctl access
From: Matt Carlson @ 2011-02-17  0:11 UTC (permalink / raw)
  To: Greg KH
  Cc: Matthew Carlson, David Miller, stable@kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20110217000035.GB6296@kroah.com>

On Wed, Feb 16, 2011 at 04:00:35PM -0800, Greg KH wrote:
> On Wed, Feb 16, 2011 at 03:52:48PM -0800, Matt Carlson wrote:
> > On Wed, Feb 16, 2011 at 03:11:03PM -0800, David Miller wrote:
> > > From: "Matt Carlson" <mcarlson@broadcom.com>
> > > Date: Wed, 16 Feb 2011 15:06:13 -0800
> > > 
> > > > On Wed, Feb 16, 2011 at 02:39:35PM -0800, Greg KH wrote:
> > > >> On Tue, Feb 15, 2011 at 02:51:10PM -0800, Matt Carlson wrote:
> > > >> > If management firmware is present and the device is down, the firmware
> > > >> > will assume control of the phy.  If a phy access were allowed from the
> > > >> > host, it will collide with firmware phy accesses, resulting in
> > > >> > unpredictable behavior.  This patch fixes the problem by disallowing phy
> > > >> > accesses during the problematic condition.
> > > >> > 
> > > >> > Upstream commit ID f746a3136a61ae535c5d0b49a9418fa21edc61b5
> > > >> 
> > > >> There is no such upstream git commit id in Linus's tree.  What am I
> > > >> doing wrong here?
> > > > 
> > > > The commit is in Dave Miller's net-next-2.6 tree.
> > > > 
> > > 
> > > If it wasn't appropriate for net-2.6, it absolutely is not appropriate
> > > for -stable.
> > 
> > net-2.6 was the target tree for the patch.  The stable_kernel_rules.txt
> > seemed to suggest that I could just CC stable@kernel.org with the
> > commit ID, and Greg would pull it in as the process dictates.  If that
> > isn't correct, what is the preferred way to expedite the integration of
> > a patch?
> 
> Keep reading that file, it says to put the Cc: in the signed-off-by area
> of the original patch.

Ah.  Yes.  I see that now.

> Also, that file says the patch has to be in Linus's tree, otherwise
> sending me a git commit id of some other tree isn't going to help at
> all.

I see.  Thanks for the tips.


^ permalink raw reply

* Re: [stable] [PATCH net-2.6/stable] tg3: Restrict phy ioctl access
From: David Miller @ 2011-02-17  0:10 UTC (permalink / raw)
  To: mcarlson; +Cc: greg, netdev, stable
In-Reply-To: <20110216235248.GA11108@mcarlson.broadcom.com>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Wed, 16 Feb 2011 15:52:48 -0800

> On Wed, Feb 16, 2011 at 03:11:03PM -0800, David Miller wrote:
>> From: "Matt Carlson" <mcarlson@broadcom.com>
>> Date: Wed, 16 Feb 2011 15:06:13 -0800
>> 
>> > On Wed, Feb 16, 2011 at 02:39:35PM -0800, Greg KH wrote:
>> >> On Tue, Feb 15, 2011 at 02:51:10PM -0800, Matt Carlson wrote:
>> >> > If management firmware is present and the device is down, the firmware
>> >> > will assume control of the phy.  If a phy access were allowed from the
>> >> > host, it will collide with firmware phy accesses, resulting in
>> >> > unpredictable behavior.  This patch fixes the problem by disallowing phy
>> >> > accesses during the problematic condition.
>> >> > 
>> >> > Upstream commit ID f746a3136a61ae535c5d0b49a9418fa21edc61b5
>> >> 
>> >> There is no such upstream git commit id in Linus's tree.  What am I
>> >> doing wrong here?
>> > 
>> > The commit is in Dave Miller's net-next-2.6 tree.
>> > 
>> 
>> If it wasn't appropriate for net-2.6, it absolutely is not appropriate
>> for -stable.
> 
> net-2.6 was the target tree for the patch.  The stable_kernel_rules.txt
> seemed to suggest that I could just CC stable@kernel.org with the
> commit ID, and Greg would pull it in as the process dictates.  If that
> isn't correct, what is the preferred way to expedite the integration of
> a patch?

You are posting a commit ID for the net-next-2.6 tree, that's what triggered
my response.

Unless it also went into the net-2.6 tree (in which case you should
give Greg the net-2.6 commit ID, which is also what the commit ID must
be in Linus's tree right now), the change is not appropriate for
-stable submission.

^ permalink raw reply

* state of rtcache removal...
From: David Miller @ 2011-02-17  0:08 UTC (permalink / raw)
  To: netdev

So I've been testing out the routing cache removal patch to see
what the impact is on performance.

I'm using a UDP flood to a single IP address over a dummy interface
with hard coded ARP entries, so that pretty much just the main IP
output and routing paths are being exercised.

I cooked up the UDP flood tool based upon a description sent to me by
Eric Dumazet of a similar utility he uses for testing.  I've included
the code for this tool at the end of this email, as well as the dummy
interface setup script.  Basically, you go:

bash# ./udpflood_setup.sh
bash# time ./udpflood -l 10000 10.2.2.11

The IP output path is about twice as slow with the routing cache
removed entirely.  Here are the numbers I have:

net-next-2.6, rt_cache on:

davem@maramba:~$ time udpflood -l 10000000 10.2.2.11
real		 1m47.012s
user		 0m8.670s
sys		 1m38.370s

net-next-2.6, rt_cache turned off via sysctl:

davem@maramba:~$ time udpflood -l 10000000 10.2.2.11
real		 3m12.662s
user		 0m9.490s
sys		 3m3.220s

net-next-2.6 + "BONUS" rt_cache deletion patch:

maramba:/home/davem# time ./bin/udpflood -l 10000000 10.2.2.11
real		     3m9.921s
user		     0m9.520s
sys		     3m0.440s
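
For reference, the wall-clock ratios implied by the three runs above can
be worked out directly; they come to roughly 1.80 (sysctl off) and 1.77
(deletion patch) relative to the rt_cache-on baseline.  The helper names
below are made up for this illustration and are not part of udpflood.

```c
#include <assert.h>

/* Convert a `time`-style "XmY.Zs" reading into seconds. */
static double to_seconds(int minutes, double seconds)
{
	assert(minutes >= 0 && seconds >= 0.0);
	return minutes * 60.0 + seconds;
}

/* Ratio of a measured run time to the baseline run time. */
static double slowdown(double baseline_s, double measured_s)
{
	assert(baseline_s > 0.0);
	return measured_s / baseline_s;
}
```

For example, slowdown(to_seconds(1, 47.012), to_seconds(3, 12.662))
gives the ratio for the sysctl-off run.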

I then worked on some simplifications of the code in net/ipv4/route.c
that remains after the cache removal.  I'll post those patches after
I've chewed on them some more, but they knock a couple seconds back off
of the benchmark:

The profile output is what you'd expect, with fib_table_lookup() topping
the charts taking ~10% of the time.

What might not be initially apparent is that each output route lookup
results in two calls to fib_table_lookup() and thus two trie lookups.
Why?  Because we have two routing tables (3 with IP_MULTIPLE_TABLES
enabled) that get searched, first the LOCAL then the MAIN table (then
with multiple-tables enabled, the DEFAULT).  And most external
outgoing routes sit in the MAIN table.

We do this so we can store all the interface address network,
broadcast, loopback network, et al. routes in the LOCAL table, then all
globally visible routes in the MAIN table.

Anyways, the long and short of this is that route lookups take two
trie lookups instead of just one.  On input there are even more, for
source address validation done by fib_validate_source().  That can be
up to 4 more fib_table_lookup() invocations.

Add in another level of complexity if you have a series of FIB rules
installed.

So, to me, this means that spending time micro-optimizing fib_trie is
not going to help much.  Getting rid of that multiplier somehow, on
the other hand, might.
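
The multiplier can be illustrated with a toy model, assuming each table
holds a single prefix and tables are searched in order until one
matches.  All names here (toy_table, toy_fib_lookup) are hypothetical;
the kernel's tables are tries and its lookup code is not shown in this
mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one routing table holding one prefix. */
struct toy_table {
	const char *name;
	unsigned int prefix;	/* network prefix, host byte order */
	unsigned int mask;	/* netmask, host byte order */
};

/*
 * Search the tables in order (e.g. LOCAL, then MAIN) and count how many
 * per-table lookups were needed.  Returns the index of the table that
 * matched, or -1 if none did.
 */
static int toy_fib_lookup(const struct toy_table *tables, size_t n,
			  unsigned int daddr, unsigned int *lookups)
{
	size_t i;

	assert(tables && lookups);
	*lookups = 0;
	for (i = 0; i < n; i++) {
		(*lookups)++;
		if ((daddr & tables[i].mask) == tables[i].prefix)
			return (int) i;
	}
	return -1;
}
```

With LOCAL holding 10.2.2.254/32 and MAIN holding 10.2.2.0/24, a lookup
for 10.2.2.11 misses LOCAL and hits MAIN, so it costs two table lookups
in this model, while a lookup for 10.2.2.254 costs one.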

I plan to play with some ideas, such as sticking fib_alias entries into
the flow cache and consulting/populating the flow cache on fib_lookup()
calls.

-------------------- udpflood.c --------------------
/* An adaptation of Eric Dumazet's udpflood tool.  */

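/* _GNU_SOURCE must be defined before any header is included. */
#define _GNU_SOURCE
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>	/* malloc(), free() */
#include <string.h>
#include <errno.h>
#include <unistd.h>	/* close() */

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#include <getopt.h>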

static int usage(void)
{
	printf("usage: udpflood [ -l count ] [ -s message_size ] [ -p port ] IP_ADDRESS\n");
	return -1;
}

static int send_packets(in_addr_t addr, int port, int count, int msg_sz)
{
	char *msg = malloc(msg_sz);
	struct sockaddr_in saddr;
	int fd, i, err;

	if (!msg)
		return -ENOMEM;

	memset(msg, 0, msg_sz);

	memset(&saddr, 0, sizeof(saddr));
	saddr.sin_family = AF_INET;
	saddr.sin_port = htons(port);	/* sin_port is in network byte order */
	saddr.sin_addr.s_addr = addr;

	fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_IP);
	if (fd < 0) {
		perror("socket");
		err = fd;
		goto out_nofd;
	}
	err = connect(fd, (struct sockaddr *) &saddr, sizeof(saddr));
	if (err < 0) {
		perror("connect");
		/* fd is closed exactly once, at the "out" label */
		goto out;
	}
	for (i = 0; i < count; i++) {
		err = sendto(fd, msg, msg_sz, 0,
			     (struct sockaddr *) &saddr, sizeof(saddr));
		if (err < 0) {
			perror("sendto");
			goto out;
		}
	}

	err = 0;
out:
	close(fd);
out_nofd:
	free(msg);
	return err;
}

int main(int argc, char **argv, char **envp)
{
	int port, msg_sz, count, ret;
	in_addr_t addr;

	port = 6000;
	msg_sz = 32;
	count = 10000000;

	while ((ret = getopt(argc, argv, "l:s:p:")) >= 0) {
		switch (ret) {
		case 'l':
			sscanf(optarg, "%d", &count);
			break;
		case 's':
			sscanf(optarg, "%d", &msg_sz);
			break;
		case 'p':
			sscanf(optarg, "%d", &port);
			break;
		case '?':
			return usage();
		}
	}

	if (!argv[optind])
		return usage();

	addr = inet_addr(argv[optind]);
	if (addr == INADDR_NONE)
		return usage();

	return send_packets(addr, port, count, msg_sz);
}

-------------------- udpflood_setup.sh --------------------
#!/bin/sh
modprobe dummy
ifconfig dummy0 10.2.2.254 netmask 255.255.255.0 up

for f in $(seq 11 26)
do
 arp -H ether -i dummy0 -s 10.2.2.$f 00:00:0c:07:ac:$f
done

^ permalink raw reply

* Re: [stable] [PATCH net-2.6/stable] tg3: Restrict phy ioctl access
From: Greg KH @ 2011-02-17  0:00 UTC (permalink / raw)
  To: Matt Carlson; +Cc: David Miller, stable@kernel.org, netdev@vger.kernel.org
In-Reply-To: <20110216235248.GA11108@mcarlson.broadcom.com>

On Wed, Feb 16, 2011 at 03:52:48PM -0800, Matt Carlson wrote:
> On Wed, Feb 16, 2011 at 03:11:03PM -0800, David Miller wrote:
> > From: "Matt Carlson" <mcarlson@broadcom.com>
> > Date: Wed, 16 Feb 2011 15:06:13 -0800
> > 
> > > On Wed, Feb 16, 2011 at 02:39:35PM -0800, Greg KH wrote:
> > >> On Tue, Feb 15, 2011 at 02:51:10PM -0800, Matt Carlson wrote:
> > >> > If management firmware is present and the device is down, the firmware
> > >> > will assume control of the phy.  If a phy access were allowed from the
> > >> > host, it will collide with firmware phy accesses, resulting in
> > >> > unpredictable behavior.  This patch fixes the problem by disallowing phy
> > >> > accesses during the problematic condition.
> > >> > 
> > >> > Upstream commit ID f746a3136a61ae535c5d0b49a9418fa21edc61b5
> > >> 
> > >> There is no such upstream git commit id in Linus's tree.  What am I
> > >> doing wrong here?
> > > 
> > > The commit is in Dave Miller's net-next-2.6 tree.
> > > 
> > 
> > If it wasn't appropriate for net-2.6, it absolutely is not appropriate
> > for -stable.
> 
> net-2.6 was the target tree for the patch.  The stable_kernel_rules.txt
> seemed to suggest that I could just CC stable@kernel.org with the
> commit ID, and Greg would pull it in as the process dictates.  If that
> isn't correct, what is the preferred way to expedite the integration of
> a patch?

Keep reading that file, it says to put the Cc: in the signed-off-by area
of the original patch.

Also, that file says the patch has to be in Linus's tree, otherwise
sending me a git commit id of some other tree isn't going to help at
all.

thanks,

greg k-h

^ permalink raw reply

* Re: [stable] [PATCH net-2.6/stable] tg3: Restrict phy ioctl access
From: Matt Carlson @ 2011-02-16 23:52 UTC (permalink / raw)
  To: David Miller
  Cc: Matthew Carlson, greg@kroah.com, netdev@vger.kernel.org,
	stable@kernel.org
In-Reply-To: <20110216.151103.189711682.davem@davemloft.net>

On Wed, Feb 16, 2011 at 03:11:03PM -0800, David Miller wrote:
> From: "Matt Carlson" <mcarlson@broadcom.com>
> Date: Wed, 16 Feb 2011 15:06:13 -0800
> 
> > On Wed, Feb 16, 2011 at 02:39:35PM -0800, Greg KH wrote:
> >> On Tue, Feb 15, 2011 at 02:51:10PM -0800, Matt Carlson wrote:
> >> > If management firmware is present and the device is down, the firmware
> >> > will assume control of the phy.  If a phy access were allowed from the
> >> > host, it will collide with firmware phy accesses, resulting in
> >> > unpredictable behavior.  This patch fixes the problem by disallowing phy
> >> > accesses during the problematic condition.
> >> > 
> >> > Upstream commit ID f746a3136a61ae535c5d0b49a9418fa21edc61b5
> >> 
> >> There is no such upstream git commit id in Linus's tree.  What am I
> >> doing wrong here?
> > 
> > The commit is in Dave Miller's net-next-2.6 tree.
> > 
> 
> If it wasn't appropriate for net-2.6, it absolutely is not appropriate
> for -stable.

net-2.6 was the target tree for the patch.  The stable_kernel_rules.txt
seemed to suggest that I could just CC stable@kernel.org with the
commit ID, and Greg would pull it in as the process dictates.  If that
isn't correct, what is the preferred way to expedite the integration of
a patch?


^ permalink raw reply

* Re: [stable] [PATCH net-2.6/stable] tg3: Restrict phy ioctl access
From: David Miller @ 2011-02-16 23:11 UTC (permalink / raw)
  To: mcarlson; +Cc: greg, netdev, stable
In-Reply-To: <20110216230613.GA11053@mcarlson.broadcom.com>

From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Wed, 16 Feb 2011 15:06:13 -0800

> On Wed, Feb 16, 2011 at 02:39:35PM -0800, Greg KH wrote:
>> On Tue, Feb 15, 2011 at 02:51:10PM -0800, Matt Carlson wrote:
>> > If management firmware is present and the device is down, the firmware
>> > will assume control of the phy.  If a phy access were allowed from the
>> > host, it will collide with firmware phy accesses, resulting in
>> > unpredictable behavior.  This patch fixes the problem by disallowing phy
>> > accesses during the problematic condition.
>> > 
>> > Upstream commit ID f746a3136a61ae535c5d0b49a9418fa21edc61b5
>> 
>> There is no such upstream git commit id in Linus's tree.  What am I
>> doing wrong here?
> 
> The commit is in Dave Miller's net-next-2.6 tree.
> 

If it wasn't appropriate for net-2.6, it absolutely is not appropriate
for -stable.

^ permalink raw reply

* Re: [stable] [PATCH net-2.6/stable] tg3: Restrict phy ioctl access
From: Matt Carlson @ 2011-02-16 23:06 UTC (permalink / raw)
  To: Greg KH
  Cc: Matthew Carlson, davem@davemloft.net, netdev@vger.kernel.org,
	stable@kernel.org
In-Reply-To: <20110216223935.GF22056@kroah.com>

On Wed, Feb 16, 2011 at 02:39:35PM -0800, Greg KH wrote:
> On Tue, Feb 15, 2011 at 02:51:10PM -0800, Matt Carlson wrote:
> > If management firmware is present and the device is down, the firmware
> > will assume control of the phy.  If a phy access were allowed from the
> > host, it will collide with firmware phy accesses, resulting in
> > unpredictable behavior.  This patch fixes the problem by disallowing phy
> > accesses during the problematic condition.
> > 
> > Upstream commit ID f746a3136a61ae535c5d0b49a9418fa21edc61b5
> 
> There is no such upstream git commit id in Linus's tree.  What am I
> doing wrong here?

The commit is in Dave Miller's net-next-2.6 tree.


^ permalink raw reply

* Re: [stable] [PATCH net-2.6/stable] tg3: Restrict phy ioctl access
From: Greg KH @ 2011-02-16 22:39 UTC (permalink / raw)
  To: Matt Carlson; +Cc: davem, netdev, stable
In-Reply-To: <1297810270-4690-1-git-send-email-mcarlson@broadcom.com>

On Tue, Feb 15, 2011 at 02:51:10PM -0800, Matt Carlson wrote:
> If management firmware is present and the device is down, the firmware
> will assume control of the phy.  If a phy access were allowed from the
> host, it will collide with firmware phy accesses, resulting in
> unpredictable behavior.  This patch fixes the problem by disallowing phy
> accesses during the problematic condition.
> 
> Upstream commit ID f746a3136a61ae535c5d0b49a9418fa21edc61b5

There is no such upstream git commit id in Linus's tree.  What am I
doing wrong here?

confused,

greg k-h

^ permalink raw reply

* macb:
From: Marc Kleine-Budde @ 2011-02-16 22:31 UTC (permalink / raw)
  To: Netdev; +Cc: Nicolas Ferre

Hello,

I added type checking[1] to platform_set_drvdata and got the following
warning:

drivers/net/macb.c: In function 'macb_mii_init':
drivers/net/macb.c:263: warning: passing argument 1 of 'platform_set_drvdata' from incompatible pointer type

I'm new to the mii_bus stuff and don't see how to fix it.

regards, Marc

[1] https://lkml.org/lkml/2011/2/16/380
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


^ permalink raw reply

* Re: [PATCH] bonding/vlan: Avoid mangled NAs on slaves without VLAN tag insertion
From: Greg KH @ 2011-02-16 21:58 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev, fubar, stable, bonding-devel
In-Reply-To: <20110207.131754.59684095.davem@davemloft.net>

On Mon, Feb 07, 2011 at 01:17:54PM -0800, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Mon, 07 Feb 2011 19:20:55 +0000
> 
> > This is related to commit f88a4a9b65a6f3422b81be995535d0e69df11bb8
> > upstream, but the bug cannot be properly fixed without the other
> > changes to VLAN tagging in 2.6.37.
> > 
> > bond_na_send() attempts to insert a VLAN tag in between building and
> > sending packets of the respective formats.  If the slave does not
> > implement hardware VLAN tag insertion then vlan_put_tag() will mangle
> > the network-layer header because the Ethernet header is not present at
> > this point (unlike in bond_arp_send()).
> > 
> > Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
> 
> Acked-by: David S. Miller <davem@davemloft.net>

Great, thanks for the patch, now queued up for the next .32-stable
release.

greg k-h

^ permalink raw reply

* Re: [BUG]  behaviour mismatch between ipv4 and ipv6 in UDP rx path
From: Eric Dumazet @ 2011-02-16 20:59 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev, Herbert Xu
In-Reply-To: <4D5C3128.4080101@genband.com>

On Wednesday, 16 February 2011 at 14:18 -0600, Chris Friesen wrote:
> Hi,
> 
> I sent this out a week ago but didn't see a reply, so I'm sending it out
> again.
> 
> One of our guys is seeing occasional dropped ipv4 packets coming in on
> an ipv6 udp socket obtained via socket(AF_INET6,  SOCK_DGRAM, IPPROTO_UDP).
> 
> Here's what he says:
> 
> 
> "The problem happens when release_sock() goes down an interesting code
> path.  If (sk->sk_backlog.tail) is non-NULL then release_sock() invokes
> __release_sock() which loops over all queued packets and invokes the
> socket's backlog receive function for each previously queued packet.
> 
> Now for the interesting part.  The UDPv6 backlog receive function (in
> net/ipv6/udp.c, udpv6_queue_rcv_skb()) invokes xfrm6_policy_check() to
> confirm that the packet is allowed, but the problem is that it calls
> this function regardless of whether the packet is IPv4 or IPv6.  The
> xfrm6_policy_check() function then assumes that it is an IPv6 packet and
> tries to match a policy based on its packet header... but that clearly
> won't work because the addresses that it finds when it decodes the skb
> are completely bogus."
> 
> 
> Looking at the ipv4 code, git commit 9382177 split __udp_queue_rcv_skb()
> out of udp_queue_rcv_skb().  It was done for locking purposes, but it
> also means that backlog_rcv is bound to __udp_queue_rcv_skb(), which
> doesn't call xfrm4_policy_check().
> 
> 
> Should a new function __udpv6_queue_rcv_skb() be split out from
> udpv6_queue_rcv_skb() and bound to backlog_rcv to resolve the xfrm
> issue?  What about the locking that was the reason for the split in the
> ipv4 case--is there a similar problem with ipv6?
> 


Yes, please submit a patch.

Ideally, __udp_queue_rcv_skb() should be the common .backlog_rcv handler.

In practice, because of sock_rps_save_rxhash() and MIB counters, I
suspect a __udp6_queue_rcv_skb() is OK.
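
The split being discussed can be sketched as a toy model, assuming the
backlog handler is a function pointer that may be bound either to the
full receive function or to an inner one that skips the policy check.
Every name below is hypothetical and merely stands in for the kernel
functions mentioned in this thread; the real policy check is far more
involved.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_proto { TOY_IPV4, TOY_IPV6 };

/* Hypothetical stand-in for an skb: only the L3 protocol matters here. */
struct toy_skb {
	enum toy_proto proto;
};

/* Stand-in for an IPv6-only policy check: it cannot parse IPv4. */
static bool toy_policy_check_v6(const struct toy_skb *skb)
{
	return skb->proto == TOY_IPV6;
}

/* Inner step: queue the packet to the socket, no policy check. */
static int toy_queue_rcv_inner(const struct toy_skb *skb, int *queued)
{
	assert(skb && queued);
	(*queued)++;
	return 0;
}

/* Outer step: policy check first, then the inner queueing step. */
static int toy_queue_rcv(const struct toy_skb *skb, int *queued)
{
	assert(skb && queued);
	if (!toy_policy_check_v6(skb))
		return -1;	/* packet dropped */
	return toy_queue_rcv_inner(skb, queued);
}
```

In this model an IPv4 packet handed to toy_queue_rcv() is dropped,
while the same packet handed to toy_queue_rcv_inner() is queued, which
is the behaviour difference a split-out backlog handler would give.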




^ permalink raw reply

* [PATCH v2] bnx2x: Support for managing RX indirection table
From: Tom Herbert @ 2011-02-16 20:27 UTC (permalink / raw)
  To: davem, eilong, netdev

Support getting and setting the RX indirection table via ethtool.

Signed-off-by: Tom Herbert <therbert@google.com>
---
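
A user-space sketch of the table handling in this patch: the default
fill (entry i maps to queue i % num_queues) and the size/index
validation done before accepting a new table.  TOY_INDIR_SIZE and the
toy_* names are assumptions for illustration, and the hash-to-entry
step (hash modulo table size) is how RSS is commonly described rather
than something this patch spells out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INDIR_SIZE 128	/* assumed size, for illustration only */

/* Default table: spread entries round-robin over the RX queues. */
static void toy_init_indir(uint32_t *table, uint32_t num_queues)
{
	size_t i;

	assert(table && num_queues > 0);
	for (i = 0; i < TOY_INDIR_SIZE; i++)
		table[i] = (uint32_t) (i % num_queues);
}

/* Reject tables of the wrong size or with out-of-range queue indices. */
static int toy_validate_indir(const uint32_t *table, size_t size,
			      uint32_t num_queues)
{
	size_t i;

	if (size != TOY_INDIR_SIZE)
		return -1;
	for (i = 0; i < size; i++)
		if (table[i] >= num_queues)
			return -1;
	return 0;
}

/* Map a flow hash to an RX queue through the table. */
static uint32_t toy_pick_queue(const uint32_t *table, uint32_t hash)
{
	return table[hash % TOY_INDIR_SIZE];
}
```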
 drivers/net/bnx2x/bnx2x.h         |    2 +
 drivers/net/bnx2x/bnx2x_ethtool.c |   56 +++++++++++++++++++++++++++++++++++++
 drivers/net/bnx2x/bnx2x_main.c    |   22 +++++++++++---
 3 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x.h b/drivers/net/bnx2x/bnx2x.h
index 236d79a..c0dd30d 100644
--- a/drivers/net/bnx2x/bnx2x.h
+++ b/drivers/net/bnx2x/bnx2x.h
@@ -1076,6 +1076,7 @@ struct bnx2x {
 	int			num_queues;
 	int			disable_tpa;
 	int			int_mode;
+	u32			*rx_indir_table;
 
 	struct tstorm_eth_mac_filter_config	mac_filters;
 #define BNX2X_ACCEPT_NONE		0x0000
@@ -1799,5 +1800,6 @@ static inline u32 reg_poll(struct bnx2x *bp, u32 reg, u32 expected, int ms,
 BNX2X_EXTERN int load_count[2][3]; /* per path: 0-common, 1-port0, 2-port1 */
 
 extern void bnx2x_set_ethtool_ops(struct net_device *netdev);
+void bnx2x_push_indir_table(struct bnx2x *bp);
 
 #endif /* bnx2x.h */
diff --git a/drivers/net/bnx2x/bnx2x_ethtool.c b/drivers/net/bnx2x/bnx2x_ethtool.c
index 816fef6..8d19d12 100644
--- a/drivers/net/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/bnx2x/bnx2x_ethtool.c
@@ -2134,6 +2134,59 @@ static int bnx2x_phys_id(struct net_device *dev, u32 data)
 	return 0;
 }
 
+static int bnx2x_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info,
+			   void *rules __always_unused)
+{
+	struct bnx2x *bp = netdev_priv(dev);
+
+	switch (info->cmd) {
+	case ETHTOOL_GRXRINGS:
+		info->data = BNX2X_NUM_ETH_QUEUES(bp);
+		return 0;
+
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static int bnx2x_get_rxfh_indir(struct net_device *dev,
+				struct ethtool_rxfh_indir *indir)
+{
+	struct bnx2x *bp = netdev_priv(dev);
+	size_t copy_size =
+		min_t(size_t, indir->size, TSTORM_INDIRECTION_TABLE_SIZE);
+
+	if (bp->multi_mode == ETH_RSS_MODE_DISABLED)
+		return -EOPNOTSUPP;
+
+	indir->size = TSTORM_INDIRECTION_TABLE_SIZE;
+	memcpy(indir->ring_index, bp->rx_indir_table,
+	       copy_size * sizeof(bp->rx_indir_table[0]));
+	return 0;
+}
+
+static int bnx2x_set_rxfh_indir(struct net_device *dev,
+				const struct ethtool_rxfh_indir *indir)
+{
+	struct bnx2x *bp = netdev_priv(dev);
+	size_t i;
+
+	if (bp->multi_mode == ETH_RSS_MODE_DISABLED)
+		return -EOPNOTSUPP;
+
+	/* Validate size and indices */
+	if (indir->size != TSTORM_INDIRECTION_TABLE_SIZE)
+		return -EINVAL;
+	for (i = 0; i < TSTORM_INDIRECTION_TABLE_SIZE; i++)
+		if (indir->ring_index[i] >= BNX2X_NUM_ETH_QUEUES(bp))
+			return -EINVAL;
+
+	memcpy(bp->rx_indir_table, indir->ring_index,
+	       indir->size * sizeof(bp->rx_indir_table[0]));
+	bnx2x_push_indir_table(bp);
+	return 0;
+}
+
 static const struct ethtool_ops bnx2x_ethtool_ops = {
 	.get_settings		= bnx2x_get_settings,
 	.set_settings		= bnx2x_set_settings,
@@ -2170,6 +2223,9 @@ static const struct ethtool_ops bnx2x_ethtool_ops = {
 	.get_strings		= bnx2x_get_strings,
 	.phys_id		= bnx2x_phys_id,
 	.get_ethtool_stats	= bnx2x_get_ethtool_stats,
+	.get_rxnfc		= bnx2x_get_rxnfc,
+	.get_rxfh_indir		= bnx2x_get_rxfh_indir,
+	.set_rxfh_indir		= bnx2x_set_rxfh_indir,
 };
 
 void bnx2x_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/bnx2x/bnx2x_main.c b/drivers/net/bnx2x/bnx2x_main.c
index c238c4d..6c7745e 100644
--- a/drivers/net/bnx2x/bnx2x_main.c
+++ b/drivers/net/bnx2x/bnx2x_main.c
@@ -4254,7 +4254,7 @@ static void bnx2x_init_eq_ring(struct bnx2x *bp)
 		min_t(int, MAX_SP_DESC_CNT - MAX_SPQ_PENDING, NUM_EQ_DESC) - 1);
 }
 
-static void bnx2x_init_ind_table(struct bnx2x *bp)
+void bnx2x_push_indir_table(struct bnx2x *bp)
 {
 	int func = BP_FUNC(bp);
 	int i;
@@ -4262,13 +4262,20 @@ static void bnx2x_init_ind_table(struct bnx2x *bp)
 	if (bp->multi_mode == ETH_RSS_MODE_DISABLED)
 		return;
 
-	DP(NETIF_MSG_IFUP,
-	   "Initializing indirection table  multi_mode %d\n", bp->multi_mode);
 	for (i = 0; i < TSTORM_INDIRECTION_TABLE_SIZE; i++)
 		REG_WR8(bp, BAR_TSTRORM_INTMEM +
 			TSTORM_INDIRECTION_TABLE_OFFSET(func) + i,
-			bp->fp->cl_id + (i % (bp->num_queues -
-				NONE_ETH_CONTEXT_USE)));
+			bp->fp->cl_id + bp->rx_indir_table[i]);
+}
+
+static void bnx2x_init_ind_table(struct bnx2x *bp)
+{
+	int i;
+
+	for (i = 0; i < TSTORM_INDIRECTION_TABLE_SIZE; i++)
+		bp->rx_indir_table[i] = i % BNX2X_NUM_ETH_QUEUES(bp);
+
+	bnx2x_push_indir_table(bp);
 }
 
 void bnx2x_set_storm_rx_mode(struct bnx2x *bp)
@@ -6016,6 +6023,8 @@ void bnx2x_free_mem(struct bnx2x *bp)
 	BNX2X_PCI_FREE(bp->eq_ring, bp->eq_mapping,
 		       BCM_PAGE_SIZE * NUM_EQ_PAGES);
 
+	BNX2X_FREE(bp->rx_indir_table);
+
 #undef BNX2X_PCI_FREE
 #undef BNX2X_KFREE
 }
@@ -6146,6 +6155,9 @@ int bnx2x_alloc_mem(struct bnx2x *bp)
 	/* EQ */
 	BNX2X_PCI_ALLOC(bp->eq_ring, &bp->eq_mapping,
 			BCM_PAGE_SIZE * NUM_EQ_PAGES);
+
+	BNX2X_ALLOC(bp->rx_indir_table, sizeof(bp->rx_indir_table[0]) *
+		    TSTORM_INDIRECTION_TABLE_SIZE);
 	return 0;
 
 alloc_mem_err:
-- 
1.7.3.1


^ permalink raw reply related

* [BUG]  behaviour mismatch between ipv4 and ipv6 in UDP rx path
From: Chris Friesen @ 2011-02-16 20:18 UTC (permalink / raw)
  To: netdev

Hi,

I sent this out a week ago but didn't see a reply, so I'm sending it out
again.

One of our guys is seeing occasional dropped ipv4 packets coming in on
an ipv6 udp socket obtained via socket(AF_INET6,  SOCK_DGRAM, IPPROTO_UDP).

Here's what he says:

"The problem happens when release_sock() goes down an interesting code
path.  If (sk->sk_backlog.tail) is non-NULL then release_sock() invokes
__release_sock() which loops over all queued packets and invokes the
socket's backlog receive function for each previously queued packet.

Now for the interesting part.  The UDPv6 backlog receive function (in
net/ipv6/udp.c, udpv6_queue_rcv_skb()) invokes xfrm6_policy_check() to
confirm that the packet is allowed, but the problem is that it calls
this function regardless of whether the packet is IPv4 or IPv6.  The
xfrm6_policy_check() function then assumes that it is an IPv6 packet and
tries to match a policy based on its packet header... but that clearly
won't work because the addresses that it finds when it decodes the skb
are completely bogus."

Looking at the ipv4 code, git commit 9382177 split __udp_queue_rcv_skb()
out of udp_queue_rcv_skb().  It was done for locking purposes, but it
also means that backlog_rcv is bound to __udp_queue_rcv_skb(), which
doesn't call xfrm4_policy_check().

Should a new function __udpv6_queue_rcv_skb() be split out from
udpv6_queue_rcv_skb() and bound to backlog_rcv to resolve the xfrm
issue?  What about the locking that was the reason for the split in the
ipv4 case--is there a similar problem with ipv6?

Thanks,
Chris

-- 
Chris Friesen
Software Developer
GENBAND
chris.friesen@genband.com
www.genband.com

^ permalink raw reply

* AVB QoS  support (IEEE802.1 Qav and Qat)
From: Eliot Blennerhassett @ 2011-02-16 19:51 UTC (permalink / raw)
  To: netdev
In-Reply-To: <4D488D9B.5080409@audioscience.com>

Greetings

Can the Linux network stack support the various protocols required by AVB?
It is not clear to me whether it already does, and if so, how the setup
would be accessed.

In particular, these two standards and the interaction between them:

802.1Qat - Stream Reservation Protocol
802.1Qav - Forwarding and Queuing Enhancements for Time-Sensitive
Streams (approved on Dec 10th, 2009)

Qav requires traffic shaping to smooth the traffic flow in the guaranteed
stream class. This is, afaik, based on 1 packet per stream per 125us.
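
Taking the 125us figure above at face value (it is stated as "afaik",
so treat it as an assumption), the per-stream packet rate and bandwidth
follow from simple arithmetic.  The function names are made up for this
sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second for one stream sending one packet per interval. */
static uint32_t packets_per_second(uint32_t interval_us)
{
	assert(interval_us > 0);
	return 1000000u / interval_us;
}

/* Bits per second for one stream with a fixed frame size in bytes. */
static uint64_t stream_bits_per_second(uint32_t frame_bytes,
				       uint32_t interval_us)
{
	return (uint64_t) frame_bytes * 8u * packets_per_second(interval_us);
}
```

With a 125us interval this is 8000 packets per second per stream, so a
hypothetical 100-byte frame would need 6.4 Mbit/s.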

Note that in future (maybe already) there will be hardware assist for
this feature in NIC chips.

-- 
Eliot Blennerhassett
AudioScience Inc.

[1] http://www.ieee802.org/1/pages/avbridges.html
[2] http://en.wikipedia.org/wiki/Audio_Video_Bridging#Traffic_shaping_for_AV_streams

^ permalink raw reply

* Re: [PATCH -next] PCI: fix tlan build when CONFIG_PCI is not enabled
From: Jesse Barnes @ 2011-02-16 19:46 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Stephen Rothwell, netdev, linux-pci, linux-next, LKML, davem,
	Sakari Ailus
In-Reply-To: <20110214122750.e1e03bc8.randy.dunlap@oracle.com>

On Mon, 14 Feb 2011 12:27:50 -0800
Randy Dunlap <randy.dunlap@oracle.com> wrote:

> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> When CONFIG_PCI is not enabled, tlan.c has a build error:
> drivers/net/tlan.c:503: error: implicit declaration of function 'pci_wake_from_d3'
> 
> so add an inline function stub for this function to pci.h when
> PCI is not enabled, similar to other stubbed PCI functions.
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
> Cc: Sakari Ailus <sakari.ailus@iki.fi>
> ---
>  include/linux/pci.h |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> --- linux-next-20110214.orig/include/linux/pci.h
> +++ linux-next-20110214/include/linux/pci.h
> @@ -1191,6 +1191,11 @@ static inline int pci_set_power_state(st
>  	return 0;
>  }
>  
> +static inline int pci_wake_from_d3(struct pci_dev *dev, bool enable)
> +{
> +	return 0;
> +}
> +
>  static inline pci_power_t pci_choose_state(struct pci_dev *dev,
>  					   pm_message_t state)
>  {
> 

Applied to linux-next, thanks guys.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply

* [PATCH] bonding: added 802.3ad round-robin hashing policy and source mac selection mode
From: Oleg V. Ukhno @ 2011-02-16 19:13 UTC (permalink / raw)
  To: netdev; +Cc: Jay Vosburgh, David S. Miller

This patch introduces two new (related) features to the bonding module.
The first feature is a round-robin hashing policy, which is primarily
intended for use with 802.3ad mode, and puts each successive IPv4 and
IPv6 packet onto the next available slave without taking into account
which layer3 and above protocol is used.
The second feature makes it possible to choose which MAC address will be
set in the transmitted packet: when set to src-mac, it forces the
slave interface's real MAC address to be used as the source MAC address
in every packet sent via this slave interface.
The main goal of this patch is to make it possible for a single TCP
stream to be striped equally over all available slaves, for both
transmitted and received packets.
This operating mode is not fully 802.3ad compliant, and will cause
some packet reordering in the TCP stream, so some kernel tuning may be
required.
To work correctly, enabling the round-robin hashing policy together with
using the slaves' real MAC addresses as source MAC addresses in
transmitted packets requires a specific switch setting: the hashing mode
for the port-channel ("etherchannel") should be set to src-mac or
src-dst-mac to get correct load striping on the receiving host's
etherchannel.
General requirements for using bonding in this operating mode are:
- even and preferably equal number of slaves on sending and receiving
hosts;
- equal RTT between sending and receiving hosts on all slaves;
- switch capable of doing etherchannels and using src-mac or src-dst-mac
hashing policy for egress load striping

Signed-off-by: Oleg V. Ukhno <olegu@yandex-team.ru>
---
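
A minimal sketch of the round-robin selection, following the formula
given in the documentation hunk of this patch (transmitted packet
number % slave count).  struct toy_bond and toy_rr_select() are
hypothetical names, not the bonding driver's own data structures.

```c
#include <assert.h>

/* Hypothetical bond state: slave count and a running TX packet counter. */
struct toy_bond {
	unsigned int slave_cnt;
	unsigned int tx_counter;
};

/* Pick the slave index for the next packet and advance the counter. */
static unsigned int toy_rr_select(struct toy_bond *bond)
{
	assert(bond && bond->slave_cnt > 0);
	return bond->tx_counter++ % bond->slave_cnt;
}
```

With three slaves, successive packets go to slaves 0, 1, 2, 0, and so
on, regardless of which flow they belong to; that is what stripes a
single TCP stream and also what can reorder it.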

 Documentation/networking/bonding.txt |   65 +++++++++++++++++++++++++++++++++++
 drivers/net/bonding/bond_3ad.c       |    2 -
 drivers/net/bonding/bond_main.c      |   60 +++++++++++++++++++++++++++++---
 drivers/net/bonding/bond_sysfs.c     |   50 ++++++++++++++++++++++++++
 drivers/net/bonding/bonding.h        |    7 +++
 include/linux/if_bonding.h           |    1 
 6 files changed, 178 insertions(+), 7 deletions(-)

diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/Documentation/networking/bonding.txt linux-2.6.p/Documentation/networking/bonding.txt
--- linux-2.6/Documentation/networking/bonding.txt	2011-02-08 16:03:01.290281998 +0300
+++ linux-2.6.p/Documentation/networking/bonding.txt	2011-02-16 22:03:09.650281997 +0300
@@ -83,6 +83,7 @@ Table of Contents
 12. Configuring Bonding for Maximum Throughput
 12.1	Maximum Throughput in a Single Switch Topology
 12.1.1		MT Bonding Mode Selection for Single Switch Topology
+12.1.1.1 		Maximizing TCP Throughput for RX/TX for Single Switch Topology using layer2 mechanisms
 12.1.2		MT Link Monitoring for Single Switch Topology
 12.2	Maximum Throughput in a Multiple Switch Topology
 12.2.1		MT Bonding Mode Selection for Multiple Switch Topology
@@ -761,6 +762,34 @@ xmit_hash_policy
 		conversations.  Other implementations of 802.3ad may
 		or may not tolerate this noncompliance.
 
+	round-robin
+
+		This policy places each successive packet on the next
+		slave interface, providing round-robin load striping
+		for transmitted data. This policy can be enabled with
+		any mode that supports choosing an alternate hash policy,
+		but was initially written for 802.3ad mode.
+
+		The main goal of this policy is to stripe TX load without
+		taking into account which layer3 protocol is used, so it
+		can be used to stripe the load of a single TCP connection.
+		When enabled, it round-robins IPv4 and IPv6 packets only.
+		Other traffic is hashed as with the layer2 policy.
+
+		There is also the src_mac_select option, which can be used
+		to configure RX load striping using switch hashing
+		algorithms on the receiving side. See the detailed
+		description below.
+
+		It is important to understand that this hashing policy
+		may cause TCP out-of-order packets when enabled
+		and must not be used when slaves have different bandwidth
+		and/or RTT in the receiver's direction. This algorithm is not
+		fully 802.3ad compliant. Other implementations of 802.3ad
+		may or may not tolerate this noncompliance.
+
+		The hashing formula is: transmitted packet number % slave count.
+
 	The default value is layer2.  This option was added in bonding
 	version 2.6.3.  In earlier versions of bonding, this parameter
 	does not exist, and the layer2 policy is the only policy.  The
@@ -2190,6 +2219,42 @@ balance-alb: This mode is everything tha
 	device driver must support changing the hardware address while
 	the device is open.
 
+12.1.1.1 Maximizing TCP Throughput for RX/TX for Single Switch Topology
+         using layer2 mechanisms
+----------------------------------------------------------------------
+	Besides the load striping and HA configuration methods mentioned
+above, you can use the round-robin hashing policy and the src_mac_select
+"slave-src" setting to stripe TCP load near-equally over an even number
+of slaves. Please note that enabling the round-robin policy for balance-xor
+mode should make it behave much like balance-rr mode.
+	A specific switch configuration is also required to get the full
+benefit of both the round-robin hashing policy and the src_mac_select
+"slave-src" setting.
+        When you enable the round-robin xmit hashing policy and set
+src_mac_select to slave-src mode, each successive packet is
+transmitted over the next slave, with the packet's source MAC address set
+to the real MAC address of that slave interface, not that of the
+aggregate interface.
+        Imagine that you have two hosts (say, A and B), each connected
+by 2 slave interfaces to a switch with appropriate port-channels configured
+("etherchannels"). If you start transmitting TCP data from A to B with only
+the round-robin hashing policy enabled, you will see that TX load is equally
+striped over host A's slaves, but all this traffic is received on only one
+of host B's slaves.
+        Now set the src_mac_select parameter to "slave-src" and
+configure the switch to use src-mac hashing for outgoing etherchannel load
+striping. Every packet sent from host A now has its slave's MAC as source MAC
+address, and the switch chooses a port in host B's receiving port-channel
+for each packet based on that packet's source MAC address,
+so you will get near-equal RX load striping, which does not depend
+on the layer3 and higher protocols used for data transmission.
+        It is important to understand that this load striping mode
+will only work correctly if the number of slaves on each side is at least
+even, and preferably equal and even.
+        This load striping mode can also cause TCP out-of-order packets,
+so you may need to tune your kernel to handle an increased number of
+reordered packets.
+
 12.1.2 MT Link Monitoring for Single Switch Topology
 ----------------------------------------------------
 
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_3ad.c linux-2.6.p/drivers/net/bonding/bond_3ad.c
--- linux-2.6/drivers/net/bonding/bond_3ad.c	2011-02-16 00:59:18.710282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_3ad.c	2011-02-16 01:30:47.770281998 +0300
@@ -2419,7 +2419,7 @@ int bond_3ad_xmit_xor(struct sk_buff *sk
 		goto out;
 	}
 
-	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg);
+	slave_agg_no = bond->xmit_hash_policy(skb, slaves_in_agg, bond->rr_tx_counter++);
 
 	bond_for_each_slave(bond, slave, i) {
 		struct aggregator *agg = SLAVE_AD_INFO(slave).port.aggregator;
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bonding.h linux-2.6.p/drivers/net/bonding/bonding.h
--- linux-2.6/drivers/net/bonding/bonding.h	2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bonding.h	2011-02-16 01:33:11.610282004 +0300
@@ -162,6 +162,7 @@ struct bond_params {
 	int tx_queues;
 	int all_slaves_active;
 	int resend_igmp;
+	int src_mac_select;
 };
 
 struct bond_parm_tbl {
@@ -235,7 +236,7 @@ struct bonding {
 #endif /* CONFIG_PROC_FS */
 	struct   list_head bond_list;
 	struct   netdev_hw_addr_list mc_list;
-	int      (*xmit_hash_policy)(struct sk_buff *, int);
+	int      (*xmit_hash_policy)(struct sk_buff *, int, int);
 	__be32   master_ip;
 	u16      flags;
 	u16      rr_tx_counter;
@@ -308,6 +309,9 @@ static inline bool bond_is_lb(const stru
 #define BOND_ARP_VALIDATE_ALL		(BOND_ARP_VALIDATE_ACTIVE | \
 					 BOND_ARP_VALIDATE_BACKUP)
 
+#define BOND_MAC_SRC_DEFAULT		0
+#define BOND_MAC_SRC_SLAVE		1
+
 static inline int slave_do_arp_validate(struct bonding *bond,
 					struct slave *slave)
 {
@@ -402,6 +406,7 @@ extern const struct bond_parm_tbl arp_va
 extern const struct bond_parm_tbl fail_over_mac_tbl[];
 extern const struct bond_parm_tbl pri_reselect_tbl[];
 extern struct bond_parm_tbl ad_select_tbl[];
+extern const struct bond_parm_tbl src_mac_select_tbl[];
 
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 void bond_send_unsolicited_na(struct bonding *bond);
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_main.c linux-2.6.p/drivers/net/bonding/bond_main.c
--- linux-2.6/drivers/net/bonding/bond_main.c	2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_main.c	2011-02-16 22:08:22.650281997 +0300
@@ -111,6 +111,7 @@ static char *fail_over_mac;
 static int all_slaves_active = 0;
 static struct bond_params bonding_defaults;
 static int resend_igmp = BOND_DEFAULT_RESEND_IGMP;
+static char *src_mac_select;
 
 module_param(max_bonds, int, 0);
 MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
@@ -152,7 +153,7 @@ module_param(ad_select, charp, 0);
 MODULE_PARM_DESC(ad_select, "803.ad aggregation selection logic: stable (0, default), bandwidth (1), count (2)");
 module_param(xmit_hash_policy, charp, 0);
 MODULE_PARM_DESC(xmit_hash_policy, "XOR hashing method: 0 for layer 2 (default)"
-				   ", 1 for layer 3+4");
+				   ", 1 for layer 3+4, 3 for round-robin");
 module_param(arp_interval, int, 0);
 MODULE_PARM_DESC(arp_interval, "arp interval in milliseconds");
 module_param_array(arp_ip_target, charp, NULL, 0);
@@ -167,6 +168,9 @@ MODULE_PARM_DESC(all_slaves_active, "Kee
 				     "0 for never (default), 1 for always.");
 module_param(resend_igmp, int, 0);
 MODULE_PARM_DESC(resend_igmp, "Number of IGMP membership reports to send on link failure");
+module_param(src_mac_select, charp, 0);
+MODULE_PARM_DESC(src_mac_select, "Source MAC selection mode: 0 or default (default), "
+				 "1 or slave-src to use slave's MAC as packet's src MAC");
 
 /*----------------------------- Global variables ----------------------------*/
 
@@ -206,6 +210,7 @@ const struct bond_parm_tbl xmit_hashtype
 {	"layer2",		BOND_XMIT_POLICY_LAYER2},
 {	"layer3+4",		BOND_XMIT_POLICY_LAYER34},
 {	"layer2+3",		BOND_XMIT_POLICY_LAYER23},
+{	"round-robin",		BOND_XMIT_POLICY_LAYERRR},
 {	NULL,			-1},
 };
 
@@ -238,6 +243,12 @@ struct bond_parm_tbl ad_select_tbl[] = {
 {	NULL,		-1},
 };
 
+const struct bond_parm_tbl src_mac_select_tbl[] = {
+{	"default",	BOND_MAC_SRC_DEFAULT},
+{	"slave-src",	BOND_MAC_SRC_SLAVE},
+{	NULL,		-1},
+};
+
 /*-------------------------- Forward declarations ---------------------------*/
 
 static void bond_send_gratuitous_arp(struct bonding *bond);
@@ -422,6 +433,7 @@ struct vlan_entry *bond_next_vlan(struct
 int bond_dev_queue_xmit(struct bonding *bond, struct sk_buff *skb,
 			struct net_device *slave_dev)
 {
+	struct ethhdr *eth_data;
 	skb->dev = slave_dev;
 	skb->priority = 1;
 #ifdef CONFIG_NET_POLL_CONTROLLER
@@ -433,6 +445,15 @@ int bond_dev_queue_xmit(struct bonding *
 		slave_dev->priv_flags &= ~IFF_IN_NETPOLL;
 	} else
 #endif
+		if (bond->params.src_mac_select == BOND_MAC_SRC_SLAVE &&
+		   (skb->protocol == htons(ETH_P_IP) ||
+		   skb->protocol == htons(ETH_P_IPV6))) {
+			skb_reset_mac_header(skb);
+			eth_data = eth_hdr(skb);
+			memcpy(eth_data->h_source, slave_dev->perm_addr,
+				ETH_ALEN);
+		}
+
 		dev_queue_xmit(skb);
 
 	return 0;
@@ -3261,6 +3282,13 @@ static void bond_info_show_master(struct
 			bond->params.xmit_policy);
 	}
 
+	if (bond->params.src_mac_select == BOND_MAC_SRC_DEFAULT ||
+		bond->params.src_mac_select == BOND_MAC_SRC_SLAVE) {
+		seq_printf(seq, "Source MAC select is: %s (%d)\n",
+			src_mac_select_tbl[bond->params.src_mac_select].modename,
+			bond->params.src_mac_select);
+	}
+
 	if (USES_PRIMARY(bond->params.mode)) {
 		seq_printf(seq, "Primary Slave: %s",
 			   (bond->primary_slave) ?
@@ -3717,7 +3745,8 @@ void bond_unregister_arp(struct bonding
  * Hash for the output device based upon layer 2 and layer 3 data. If
  * the packet is not IP mimic bond_xmit_hash_policy_l2()
  */
-static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count,
+					int pktcount)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 	struct iphdr *iph = ip_hdr(skb);
@@ -3735,7 +3764,8 @@ static int bond_xmit_hash_policy_l23(str
  * the packet is a frag or not TCP or UDP, just use layer 3 data.  If it is
  * altogether not IP, mimic bond_xmit_hash_policy_l2()
  */
-static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count,
+					int pktcount)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 	struct iphdr *iph = ip_hdr(skb);
@@ -3759,13 +3789,29 @@ static int bond_xmit_hash_policy_l34(str
 /*
  * Hash for the output device based upon layer 2 data
  */
-static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count)
+static int bond_xmit_hash_policy_l2(struct sk_buff *skb, int count,
+					int pktcount)
 {
 	struct ethhdr *data = (struct ethhdr *)skb->data;
 
 	return (data->h_dest[5] ^ data->h_source[5]) % count;
 }
 
+/*
+ * Round-robin over all active slaves (one packet per slave) for IPv4 and
+ * IPv6, otherwise mimic bond_xmit_hash_policy_l2()
+ */
+static int bond_xmit_hash_policy_rr(struct sk_buff *skb, int count,
+					int pktcount)
+{
+	struct ethhdr *data = (struct ethhdr *)skb->data;
+	if (skb->protocol == htons(ETH_P_IP) ||
+	    skb->protocol == htons(ETH_P_IPV6)) {
+		return pktcount % count;
+	}
+	return (data->h_dest[5] ^ data->h_source[5]) % count;
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 static int bond_open(struct net_device *bond_dev)
@@ -4395,7 +4441,8 @@ static int bond_xmit_xor(struct sk_buff
 	if (!BOND_IS_OK(bond))
 		goto out;
 
-	slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt);
+	slave_no = bond->xmit_hash_policy(skb, bond->slave_cnt,
+					bond->rr_tx_counter++);
 
 	bond_for_each_slave(bond, slave, i) {
 		slave_no--;
@@ -4492,6 +4539,9 @@ static void bond_set_xmit_hash_policy(st
 	case BOND_XMIT_POLICY_LAYER34:
 		bond->xmit_hash_policy = bond_xmit_hash_policy_l34;
 		break;
+	case BOND_XMIT_POLICY_LAYERRR:
+		bond->xmit_hash_policy = bond_xmit_hash_policy_rr;
+		break;
 	case BOND_XMIT_POLICY_LAYER2:
 	default:
 		bond->xmit_hash_policy = bond_xmit_hash_policy_l2;
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/drivers/net/bonding/bond_sysfs.c linux-2.6.p/drivers/net/bonding/bond_sysfs.c
--- linux-2.6/drivers/net/bonding/bond_sysfs.c	2011-02-08 16:03:02.950282003 +0300
+++ linux-2.6.p/drivers/net/bonding/bond_sysfs.c	2011-02-16 02:05:58.650281999 +0300
@@ -1643,6 +1643,55 @@ out:
 static DEVICE_ATTR(resend_igmp, S_IRUGO | S_IWUSR,
 		   bonding_show_resend_igmp, bonding_store_resend_igmp);
 
+/*
+ * Show and set the bonding src_mac_select param.
+ */
+
+static ssize_t bonding_show_src_mac_select(struct device *d,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct bonding *bond = to_bond(d);
+
+	return sprintf(buf, "%s %d\n",
+			src_mac_select_tbl[bond->params.src_mac_select].modename,
+			bond->params.src_mac_select);
+}
+
+static ssize_t bonding_store_src_mac_select(struct device *d,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	int new_value, ret = count;
+	struct bonding *bond = to_bond(d);
+
+	if (bond->dev->flags & IFF_UP) {
+		pr_err("%s: Interface is up. Unable to update src mac select policy.\n",
+			bond->dev->name);
+		ret = -EPERM;
+		goto out;
+	}
+
+	new_value = bond_parse_parm(buf, src_mac_select_tbl);
+	if (new_value < 0)  {
+		pr_err("%s: Ignoring invalid src mac select policy value %.*s.\n",
+			bond->dev->name,
+			(int)strlen(buf) - 1, buf);
+		ret = -EINVAL;
+		goto out;
+	} else {
+		bond->params.src_mac_select = new_value;
+		pr_info("%s: setting src mac select policy to %s (%d).\n",
+			bond->dev->name,
+			src_mac_select_tbl[new_value].modename, new_value);
+	}
+out:
+	return ret;
+}
+
+static DEVICE_ATTR(src_mac_select, S_IRUGO | S_IWUSR,
+		bonding_show_src_mac_select, bonding_store_src_mac_select);
+
 static struct attribute *per_bond_attrs[] = {
 	&dev_attr_slaves.attr,
 	&dev_attr_mode.attr,
@@ -1671,6 +1720,7 @@ static struct attribute *per_bond_attrs[
 	&dev_attr_queue_id.attr,
 	&dev_attr_all_slaves_active.attr,
 	&dev_attr_resend_igmp.attr,
+	&dev_attr_src_mac_select.attr,
 	NULL,
 };
 
diff -uprN -X linux-2.6/Documentation/dontdiff linux-2.6/include/linux/if_bonding.h linux-2.6.p/include/linux/if_bonding.h
--- linux-2.6/include/linux/if_bonding.h	2011-02-16 00:59:18.720282002 +0300
+++ linux-2.6.p/include/linux/if_bonding.h	2011-02-16 01:23:38.660282000 +0300
@@ -91,6 +91,7 @@
 #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
 #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */
 #define BOND_XMIT_POLICY_LAYER23	2 /* layer 2+3 (IP ^ MAC) */
+#define BOND_XMIT_POLICY_LAYERRR	3 /* round-robin mode */
 
 typedef struct ifbond {
 	__s32 bond_mode;
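
For readers following the patch above, the interaction between round-robin transmit and the src_mac_select "slave-src" setting can be illustrated with a small user-space sketch. It is not kernel code and not part of the patch. The MAC addresses, the two-port setup, and in particular toy_switch_src_mac_hash() are assumptions made for illustration only; real switches use vendor-specific src-mac hashes.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Round-robin pick: packet number modulo slave count, as in the patch text. */
static int rr_pick(unsigned int pktcount, int slave_count)
{
	assert(slave_count > 0);
	return (int)(pktcount % (unsigned int)slave_count);
}

/*
 * ASSUMPTION: a toy "src-mac" switch hash that takes the last byte of the
 * source MAC modulo the number of egress ports. Hypothetical, for
 * illustration only.
 */
static int toy_switch_src_mac_hash(const uint8_t src[ETH_ALEN], int port_count)
{
	assert(port_count > 0);
	return src[ETH_ALEN - 1] % port_count;
}

/*
 * Count how many of npkts frames the toy switch delivers to each of the
 * receiver's two ports. A non-zero use_slave_mac models
 * src_mac_select=slave-src; zero models the default (bond MAC as source).
 */
static void simulate(int use_slave_mac, unsigned int npkts, unsigned int rx[2])
{
	/* Hypothetical addresses; the slave MACs differ in the last byte. */
	static const uint8_t bond_mac[ETH_ALEN] = {0x02, 0, 0, 0, 0, 0x10};
	static const uint8_t slave_mac[2][ETH_ALEN] = {
		{0x02, 0, 0, 0, 0, 0x10},
		{0x02, 0, 0, 0, 0, 0x11},
	};
	unsigned int i;

	rx[0] = rx[1] = 0;
	for (i = 0; i < npkts; i++) {
		int tx_slave = rr_pick(i, 2);
		const uint8_t *src = use_slave_mac ? slave_mac[tx_slave] : bond_mac;

		rx[toy_switch_src_mac_hash(src, 2)]++;
	}
}
```

Under these assumptions, simulate(0, ...) puts every frame on one receiving port because the source MAC never changes, while simulate(1, ...) spreads frames across both ports. Whether a real switch spreads traffic this evenly depends on its hash and on the actual slave MAC addresses.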

^ permalink raw reply

* Re: Off-by-one error in net/8021q/vlan.c
From: Michał Mirosław @ 2011-02-16 18:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Phil Karn, richard -rw- weinberger, kaber, netdev
In-Reply-To: <1297874372.30541.29.camel@edumazet-laptop>

2011/2/16 Eric Dumazet <eric.dumazet@gmail.com>:
> Le mercredi 16 février 2011 à 08:28 -0800, Phil Karn a écrit :
>> On 2/16/11 8:10 AM, richard -rw- weinberger wrote:
>> > On Wed, Feb 16, 2011 at 4:58 PM, Phil Karn <karn@ka9q.net> wrote:
>> >> On 2/16/11 4:51 AM, richard -rw- weinberger wrote:
>> >>> On Wed, Feb 16, 2011 at 11:58 AM, Phil Karn <karn@ka9q.net> wrote:
>> >>>> The range check on vlan_id in register_vlan_device is off by one, and it
>> >>>> prevents the creation of a vlan interface for vlan ID 4095. (OSX allows
>> >>>> this, I checked.)
>> >>>
>> >>> Then OSX should fix their code. 4095 is reserved.
>> >> If it's reserved, then it's up to the user to reserve it.
>> > No.
>> > See:
>> > http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf
>> Well, then I guess we all know better than the user. That's the Windows
>> Way...no, wait, I thought this is Linux.
>>
>> The fact is that I did encounter a misconfigured switch using vlan 4095,
>> and because of this off-by-one error I was unable to talk to it and fix it.
>>
>> I was hoping I wouldn't have to patch every new kernel I install.
> You can use an OSX gateway ;)
>
> If we allow ID 4095, then some users will complain we violate rules.
>
> Really you cannot push this patch in official kernel only to ease your
> life ;)

The idea is that you don't have to use ID 4095 and if you don't -
nothing's broken by just allowing it. The same goes with ID 0 - it's
defined to be 802.1p packet, but people do use it as normal VLAN
(especially with hardware that can cope with only small number of
VLANs at once).

Allowing it but with a big fat warning in logs is even better: "You
want your network broken? Sure, can do, but you have been warned."

Best Regards,
Michał Mirosław
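
The two positions in this thread can be contrasted in a short user-space sketch. This is not the code in net/8021q/vlan.c; the function and enum names are hypothetical, and the only facts relied on are those stated in the thread (the VLAN ID field tops out at 4095, 4095 is reserved, and 0 marks an 802.1p priority-tagged frame).

```c
#include <stdint.h>

#define VID_MAX_12BIT 0x0fffu	/* 4095: the value the thread calls reserved */

enum vid_verdict { VID_OK, VID_WARN, VID_REJECT };

/*
 * Strict policy, as the thread describes the current behaviour:
 * 4095 and anything larger is refused.
 */
static enum vid_verdict classify_vid_strict(unsigned int vid)
{
	return vid >= VID_MAX_12BIT ? VID_REJECT : VID_OK;
}

/*
 * Permissive policy, as suggested in the message above: refuse only
 * values above 4095, and flag the special values 0 and 4095 so the
 * caller can log a loud warning.
 */
static enum vid_verdict classify_vid_permissive(unsigned int vid)
{
	if (vid > VID_MAX_12BIT)
		return VID_REJECT;
	if (vid == 0 || vid == VID_MAX_12BIT)
		return VID_WARN;
	return VID_OK;
}
```

The only inputs on which the two policies differ are 0 and 4095, which is the whole of the disagreement in the thread.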

^ permalink raw reply

* Re: [RFC !!BONUS!! PATCH 6/5] ipv4: Delete routing cache.
From: David Miller @ 2011-02-16 18:09 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1297842977.3201.7.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Feb 2011 08:56:17 +0100

> Le mardi 15 février 2011 à 18:55 -0800, David Miller a écrit :
>> From: David Miller <davem@davemloft.net>
>> Date: Wed, 09 Feb 2011 22:39:39 -0800 (PST)
>> 
>> > 
>> > Signed-off-by: David S. Miller <davem@davemloft.net>
>> 
>> Ok, this patch had one nasty bug:
>> 
>> > +	if (!err == 0)
>> 
>> Yeah... right.
>> 
>> I'm actively testing this version at the moment, against net-next-2.6,
>> works fine thus far.
>> 
>> --------------------
>> ipv4: Delete routing cache.
>> 
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>> ---
> 
> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> I suspect we can zap DST_NOCACHE later ?

Yes, the number of cleanups we can do after this patch is actually
quite large.
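
For anyone puzzled by the quoted `if (!err == 0)` line: in C the unary `!` binds tighter than `==`, so the expression is evaluated as `(!err) == 0`. A minimal sketch follows; the function names are hypothetical, and the "intended" form is an assumption, since the thread does not show the corrected line.

```c
#include <stdbool.h>

/* How the compiler reads "!err == 0": logical NOT is applied first. */
static bool buggy_cond(int err)
{
	return (!err) == 0;	/* same truth value as err != 0 */
}

/* ASSUMPTION: one plausible intended test, true when there is no error. */
static bool success_cond(int err)
{
	return err == 0;
}
```

So the buggy condition is true exactly when an error occurred, the opposite of a success check.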

^ permalink raw reply

* RE: Process for subsystem maintainers to get Hyper-V code out of staging. - CORRECTED RECIPIENTS
From: Hank Janssen @ 2011-02-16 17:43 UTC (permalink / raw)
  To: Robert Hancock
  Cc: shemminger@linux-foundation.org, netdev@vger.kernel.org,
	davem@davemloft.net, ide, KY Srinivasan, Hashir Abdi,
	Mike Sterling, Haiyang Zhang, gregkh@suse.de
In-Reply-To: <4D59CCAD.90503@gmail.com>


> From: Robert Hancock [mailto:hancockrwd@gmail.com]
> Sent: Monday, February 14, 2011 4:46 PM
> On 02/14/2011 05:42 PM, Hank Janssen wrote:
> >
> > MY APOLOGIES-I made a typo on James email address. I corrected it and
> resend.
> > Sorry for the double email.
> >
> >
> > Stephen/James/David,
> >
> > Greetings to you all. As you might be aware, we submitted Hyper-V
> drivers to the kernel 2009.
> > We have been extending these drivers with additional functionality
> and our primary focus now
> > is doing the work needed to exit the staging area.
> >
> > To give you some background, the following are Hyper-V specific Linux
> drivers:
> >
> >                  hv_vmbus           The vmbus driver that is the
> bridge between guest and the
> > 			host
> >                  hv_storvsc          The SCSI device driver
> >                  hv_blkvsc            The IDE driver
> 
> Given that the IDE subsystem (drivers/ide) is currently in
> maintenance-only mode, and isn't used by modern distributions, you
> likely want to make this a libata driver instead.
> 
> Though, from what's in current git, it's not clear to me what the HV
> IDE
> (and SCSI) drivers are attempting to do. Is it really something that
> looks like an IDE controller from the guest OS point of view? If not,
> then having it as an IDE driver would be the wrong thing to do, it
> should be more of a generic block driver. In that case, then, why are
> there both SCSI and IDE drivers in the first place?
> 

Robert,

Thank you very much for your responses. Today the Hyper-V host only
supports IDE and SCSI, and the code was initially written against the
2.6.9 kernel.

Hyper-V still treats them as separate interfaces and is designed to
emulate a pretty old BIOS.

My approach will be to dig into libata (something I do not have much
knowledge of) and see if we can use it and find a way to more sanely
merge the behavior of Hyper-V's IDE and SCSI.

Hank.


^ permalink raw reply

* Re: Off-by-one error in net/8021q/vlan.c
From: Eric Dumazet @ 2011-02-16 16:39 UTC (permalink / raw)
  To: Phil Karn; +Cc: richard -rw- weinberger, kaber, netdev
In-Reply-To: <4D5BFB39.8070805@ka9q.net>

Le mercredi 16 février 2011 à 08:28 -0800, Phil Karn a écrit :
> On 2/16/11 8:10 AM, richard -rw- weinberger wrote:
> > On Wed, Feb 16, 2011 at 4:58 PM, Phil Karn <karn@ka9q.net> wrote:
> >> On 2/16/11 4:51 AM, richard -rw- weinberger wrote:
> >>> On Wed, Feb 16, 2011 at 11:58 AM, Phil Karn <karn@ka9q.net> wrote:
> >>>> The range check on vlan_id in register_vlan_device is off by one, and it
> >>>> prevents the creation of a vlan interface for vlan ID 4095. (OSX allows
> >>>> this, I checked.)
> >>>
> >>> Then OSX should fix their code. 4095 is reserved.
> >>>
> >>
> >> If it's reserved, then it's up to the user to reserve it.
> > 
> > No.
> > 
> > See:
> > http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf
> > 
> 
> Well, then I guess we all know better than the user. That's the Windows
> Way...no, wait, I thought this is Linux.
> 
> The fact is that I did encounter a misconfigured switch using vlan 4095,
> and because of this off-by-one error I was unable to talk to it and fix it.
> 
> I was hoping I wouldn't have to patch every new kernel I install.
> 

You can use an OSX gateway ;)

If we allow ID 4095, then some users will complain we violate rules.

Really you cannot push this patch in official kernel only to ease your
life ;)




^ permalink raw reply

* Re: Off-by-one error in net/8021q/vlan.c
From: richard -rw- weinberger @ 2011-02-16 16:35 UTC (permalink / raw)
  To: Phil Karn; +Cc: kaber, netdev
In-Reply-To: <4D5BFB39.8070805@ka9q.net>

On Wed, Feb 16, 2011 at 5:28 PM, Phil Karn <karn@ka9q.net> wrote:
> On 2/16/11 8:10 AM, richard -rw- weinberger wrote:
>> On Wed, Feb 16, 2011 at 4:58 PM, Phil Karn <karn@ka9q.net> wrote:
>>> On 2/16/11 4:51 AM, richard -rw- weinberger wrote:
>>>> On Wed, Feb 16, 2011 at 11:58 AM, Phil Karn <karn@ka9q.net> wrote:
>>>>> The range check on vlan_id in register_vlan_device is off by one, and it
>>>>> prevents the creation of a vlan interface for vlan ID 4095. (OSX allows
>>>>> this, I checked.)
>>>>
>>>> Then OSX should fix their code. 4095 is reserved.
>>>>
>>>
>>> If it's reserved, then it's up to the user to reserve it.
>>
>> No.
>>
>> See:
>> http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf
>>
>
> Well, then I guess we all know better than the user. That's the Windows
> Way...no, wait, I thought this is Linux.
>
> The fact is that I did encounter a misconfigured switch using vlan 4095,
> and because of this off-by-one error I was unable to talk to it and fix it.
>
> I was hoping I wouldn't have to patch every new kernel I install.
>

The switch violates the standard. Why should Linux also do so?
This would only produce more broken VLANs...

-- 
Thanks,
//richard

^ permalink raw reply

* Re: Off-by-one error in net/8021q/vlan.c
From: Phil Karn @ 2011-02-16 16:28 UTC (permalink / raw)
  To: richard -rw- weinberger; +Cc: kaber, netdev
In-Reply-To: <AANLkTikNrwd31RBj1gc6kSaT=qodS=A=YntM=72PMbDf@mail.gmail.com>

On 2/16/11 8:10 AM, richard -rw- weinberger wrote:
> On Wed, Feb 16, 2011 at 4:58 PM, Phil Karn <karn@ka9q.net> wrote:
>> On 2/16/11 4:51 AM, richard -rw- weinberger wrote:
>>> On Wed, Feb 16, 2011 at 11:58 AM, Phil Karn <karn@ka9q.net> wrote:
>>>> The range check on vlan_id in register_vlan_device is off by one, and it
>>>> prevents the creation of a vlan interface for vlan ID 4095. (OSX allows
>>>> this, I checked.)
>>>
>>> Then OSX should fix their code. 4095 is reserved.
>>>
>>
>> If it's reserved, then it's up to the user to reserve it.
> 
> No.
> 
> See:
> http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf
> 

Well, then I guess we all know better than the user. That's the Windows
Way...no, wait, I thought this is Linux.

The fact is that I did encounter a misconfigured switch using vlan 4095,
and because of this off-by-one error I was unable to talk to it and fix it.

I was hoping I wouldn't have to patch every new kernel I install.


^ permalink raw reply

* Re: Off-by-one error in net/8021q/vlan.c
From: richard -rw- weinberger @ 2011-02-16 16:10 UTC (permalink / raw)
  To: Phil Karn; +Cc: kaber, netdev
In-Reply-To: <4D5BF411.4020204@ka9q.net>

On Wed, Feb 16, 2011 at 4:58 PM, Phil Karn <karn@ka9q.net> wrote:
> On 2/16/11 4:51 AM, richard -rw- weinberger wrote:
>> On Wed, Feb 16, 2011 at 11:58 AM, Phil Karn <karn@ka9q.net> wrote:
>>> The range check on vlan_id in register_vlan_device is off by one, and it
>>> prevents the creation of a vlan interface for vlan ID 4095. (OSX allows
>>> this, I checked.)
>>
>> Then OSX should fix their code. 4095 is reserved.
>>
>
> If it's reserved, then it's up to the user to reserve it.

No.

See:
http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf

-- 
Thanks,
//richard

^ permalink raw reply

* Re: Off-by-one error in net/8021q/vlan.c
From: Phil Karn @ 2011-02-16 15:58 UTC (permalink / raw)
  To: richard -rw- weinberger; +Cc: kaber, netdev
In-Reply-To: <AANLkTinBOk8ZNQvRpMqZQE_vOu63QVzDZ4ceRRUDvJD_@mail.gmail.com>

On 2/16/11 4:51 AM, richard -rw- weinberger wrote:
> On Wed, Feb 16, 2011 at 11:58 AM, Phil Karn <karn@ka9q.net> wrote:
>> The range check on vlan_id in register_vlan_device is off by one, and it
>> prevents the creation of a vlan interface for vlan ID 4095. (OSX allows
>> this, I checked.)
> 
> Then OSX should fix their code. 4095 is reserved.
> 

If it's reserved, then it's up to the user to reserve it.

I actually had reason to use this to fix a misconfigured host that was
using vlan 4095. This got in my way.

^ permalink raw reply

* Re: [patch net-next-2.6 1/4] rtnetlink: implement setting of master device
From: Patrick McHardy @ 2011-02-16 15:25 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Stephen Hemminger, netdev, davem, shemminger, fubar, eric.dumazet,
	nicolas.2p.debian
In-Reply-To: <20110216143923.GB5727@psychotron.brq.redhat.com>

On 16.02.2011 15:39, Jiri Pirko wrote:
> Wed, Feb 16, 2011 at 02:18:44PM CET, shemminger@vyatta.com wrote:
>> On Sun, 13 Feb 2011 20:31:06 +0100
>> Jiri Pirko <jpirko@redhat.com> wrote:
>>
>>> This patch allows userspace to enslave/release slave devices via netlink
>>> interface using IFLA_MASTER. This introduces generic way to add/remove
>>> underling devices.
>>>
>>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>
>> But, setting master means something different for each type of device?
> 
> Why isn't correct to use master also for bridge? It had no meaning there.

In fact the bridge netlink family uses IFLA_MASTER for exactly
the same purpose.

^ permalink raw reply
