Netdev List


* [RFC][net-next-2.6 PATCH 2/4] net: 8021Q consolidate header_ops routines
From: John Fastabend @ 2010-10-21 22:10 UTC (permalink / raw)
  To: netdev; +Cc: jesse
In-Reply-To: <20101021221004.22906.58438.stgit@jf-dev1-dcblab>

The only thing the 8021Q header ops routines are required
for is VLAN_FLAG_REORDER_HDR; otherwise, by the time
the VLAN tag has been added the packet is already on
its way down the stack. In this case using the Ethernet
ops works OK.

At present the VLAN_FLAG_REORDER_HDR flag does not work
with vlan offloads. As I understand the flag, the intent
is to allow taps on the vlan device and possibly the
QOS layer to see the vlan tag info.

By inserting the tag in vlan_tci, any taps or QOS policies
should be able to retrieve the vlan info. This allows
the flag to work the same in both the offloaded and
non-offloaded cases, and allows us to use the underlying
Ethernet ops.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/8021q/vlan_dev.c |   83 +++++++++++++-------------------------------------
 1 files changed, 21 insertions(+), 62 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 78b1618..1645c3c 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -32,39 +32,6 @@
 #include "vlanproc.h"
 #include <linux/if_vlan.h>
 
-/*
- *	Rebuild the Ethernet MAC header. This is called after an ARP
- *	(or in future other address resolution) has completed on this
- *	sk_buff. We now let ARP fill in the other fields.
- *
- *	This routine CANNOT use cached dst->neigh!
- *	Really, it is used only when dst->neigh is wrong.
- *
- * TODO:  This needs a checkup, I'm ignorant here. --BLG
- */
-static int vlan_dev_rebuild_header(struct sk_buff *skb)
-{
-	struct net_device *dev = skb->dev;
-	struct vlan_ethhdr *veth = (struct vlan_ethhdr *)(skb->data);
-
-	switch (veth->h_vlan_encapsulated_proto) {
-#ifdef CONFIG_INET
-	case htons(ETH_P_IP):
-
-		/* TODO:  Confirm this will work with VLAN headers... */
-		return arp_find(veth->h_dest, skb);
-#endif
-	default:
-		pr_debug("%s: unable to resolve type %X addresses.\n",
-			 dev->name, ntohs(veth->h_vlan_encapsulated_proto));
-
-		memcpy(veth->h_source, dev->dev_addr, ETH_ALEN);
-		break;
-	}
-
-	return 0;
-}
-
 static inline struct sk_buff *vlan_check_reorder_header(struct sk_buff *skb)
 {
 	if (vlan_dev_info(skb->dev)->flags & VLAN_FLAG_REORDER_HDR) {
@@ -269,33 +236,26 @@ static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
 				const void *daddr, const void *saddr,
 				unsigned int len)
 {
-	struct vlan_hdr *vhdr;
-	unsigned int vhdrlen = 0;
-	u16 vlan_tci = 0;
 	int rc;
 
 	if (WARN_ON(skb_headroom(skb) < dev->hard_header_len))
 		return -ENOSPC;
 
+	/* When this flag is not set we make the vlan_tci visible
+	 * by setting the skb field.
+	 *
+	 * We do not immediately insert the tag here; the intent
+	 * of setting VLAN_FLAG_REORDER_HDR is to make the vlan
+	 * info available to tap devices and the QOS layer. Adding
+	 * the tag present bit should be enough for other layers
+	 * to learn the vlan tag.
+	 */
 	if (!(vlan_dev_info(dev)->flags & VLAN_FLAG_REORDER_HDR)) {
-		vhdr = (struct vlan_hdr *) skb_push(skb, VLAN_HLEN);
+		u16 vlan_tci = 0;
 
 		vlan_tci = vlan_dev_info(dev)->vlan_id;
 		vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
-		vhdr->h_vlan_TCI = htons(vlan_tci);
-
-		/*
-		 *  Set the protocol type. For a packet of type ETH_P_802_3/2 we
-		 *  put the length in here instead.
-		 */
-		if (type != ETH_P_802_3 && type != ETH_P_802_2)
-			vhdr->h_vlan_encapsulated_proto = htons(type);
-		else
-			vhdr->h_vlan_encapsulated_proto = htons(len);
-
-		skb->protocol = htons(ETH_P_8021Q);
-		type = ETH_P_8021Q;
-		vhdrlen = VLAN_HLEN;
+		skb = __vlan_hwaccel_put_tag(skb, vlan_tci);
 	}
 
 	/* Before delegating work to the lower layer, enter our MAC-address */
@@ -304,9 +264,7 @@ static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
 
 	/* Now make the underlying real hard header */
 	dev = vlan_dev_info(dev)->real_dev;
-	rc = dev_hard_header(skb, dev, type, daddr, saddr, len + vhdrlen);
-	if (rc > 0)
-		rc += vhdrlen;
+	rc = dev_hard_header(skb, dev, type, daddr, saddr, len);
 	return rc;
 }
 
@@ -676,9 +634,11 @@ static void vlan_dev_set_lockdep_class(struct net_device *dev, int subclass)
 }
 
 static const struct header_ops vlan_header_ops = {
-	.create	 = vlan_dev_hard_header,
-	.rebuild = vlan_dev_rebuild_header,
-	.parse	 = eth_header_parse,
+	.create		= vlan_dev_hard_header,
+	.rebuild	= eth_rebuild_header,
+	.parse		= eth_header_parse,
+	.cache		= eth_header_cache,
+	.cache_update	= eth_header_cache_update,
 };
 
 static const struct net_device_ops vlan_netdev_ops, vlan_netdev_ops_sq;
@@ -713,13 +673,12 @@ static int vlan_dev_init(struct net_device *dev)
 	dev->fcoe_ddp_xid = real_dev->fcoe_ddp_xid;
 #endif
 
-	if (real_dev->features & NETIF_F_HW_VLAN_TX) {
-		dev->header_ops      = real_dev->header_ops;
+	dev->header_ops = &vlan_header_ops;
+
+	if (real_dev->features & NETIF_F_HW_VLAN_TX)
 		dev->hard_header_len = real_dev->hard_header_len;
-	} else {
-		dev->header_ops      = &vlan_header_ops;
+	else
 		dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
-	}
 
 	if (real_dev->netdev_ops->ndo_select_queue)
 		dev->netdev_ops = &vlan_netdev_ops_sq;


^ permalink raw reply related

* [RFC][net-next-2.6 PATCH 3/4] ethtool: set hard_header_len using ETH_FLAG_{TX|RX}VLAN
From: John Fastabend @ 2010-10-21 22:10 UTC (permalink / raw)
  To: netdev; +Cc: jesse
In-Reply-To: <20101021221004.22906.58438.stgit@jf-dev1-dcblab>

Toggling the vlan tx|rx hw offloads needs to set the hard_header_len
as well; otherwise we end up using LL_RESERVED_SPACE incorrectly.
This results in pskb_expand_head() being used unnecessarily.

This adds a check in ethtool_op_set_flags to catch the ETH_FLAG_TXVLAN
flag and set the header length.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/core/ethtool.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 956a9f4..4f7fe26 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -21,6 +21,7 @@
 #include <linux/uaccess.h>
 #include <linux/vmalloc.h>
 #include <linux/slab.h>
+#include <linux/if_vlan.h>
 
 /*
  * Some useful ethtool_ops methods that're device independent.
@@ -151,6 +152,14 @@ int ethtool_op_set_flags(struct net_device *dev, u32 data, u32 supported)
 	if (data & ~supported)
 		return -EINVAL;
 
+	/* is ETH_FLAG_TXVLAN being toggled? */
+	if ((dev->features & ETH_FLAG_TXVLAN) ^ (data & ETH_FLAG_TXVLAN)) {
+		if (data & ETH_FLAG_TXVLAN)
+			dev->hard_header_len -= VLAN_HLEN;
+		else
+			dev->hard_header_len += VLAN_HLEN;
+	}
+
 	dev->features = ((dev->features & ~flags_dup_features) |
 			 (data & flags_dup_features));
 	return 0;


^ permalink raw reply related

* [RFC][net-next-2.6 PATCH 4/4] net: remove check for headroom in vlan_dev_create
From: John Fastabend @ 2010-10-21 22:10 UTC (permalink / raw)
  To: netdev; +Cc: jesse
In-Reply-To: <20101021221004.22906.58438.stgit@jf-dev1-dcblab>

It is possible for the headroom to be smaller than the
hard_header_len for a short period of time after toggling
the vlan offload setting.

This is not a hard error and skb_cow_head is called in
__vlan_put_tag() to resolve this.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/8021q/vlan_dev.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 1645c3c..e043389 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -238,9 +238,6 @@ static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
 {
 	int rc;
 
-	if (WARN_ON(skb_headroom(skb) < dev->hard_header_len))
-		return -ENOSPC;
-
 	/* When this flag is not set we make the vlan_tci visible
 	 * by setting the skb field.
 	 *


^ permalink raw reply related

* [RFC][net-next-2.6 PATCH 1/4] net: consolidate 8021q tagging
From: John Fastabend @ 2010-10-21 22:10 UTC (permalink / raw)
  To: netdev; +Cc: jesse

Now that VLAN packets are tagged in dev_hard_start_xmit()
at the bottom of the stack, we no longer need to tag them
in the 8021Q module (except in the !VLAN_FLAG_REORDER_HDR
case).

This allows the accel and non-accel paths to be consolidated.
Here the vlan_tci in the skb is always set and we allow the
stack to add the actual tag in dev_hard_start_xmit().

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/8021q/vlan_dev.c |  105 +++-----------------------------------------------
 1 files changed, 7 insertions(+), 98 deletions(-)

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 14e3d1f..78b1618 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -326,24 +326,12 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
 	 */
 	if (veth->h_vlan_proto != htons(ETH_P_8021Q) ||
 	    vlan_dev_info(dev)->flags & VLAN_FLAG_REORDER_HDR) {
-		unsigned int orig_headroom = skb_headroom(skb);
 		u16 vlan_tci;
-
-		vlan_dev_info(dev)->cnt_encap_on_xmit++;
-
 		vlan_tci = vlan_dev_info(dev)->vlan_id;
 		vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
-		skb = __vlan_put_tag(skb, vlan_tci);
-		if (!skb) {
-			txq->tx_dropped++;
-			return NETDEV_TX_OK;
-		}
-
-		if (orig_headroom < VLAN_HLEN)
-			vlan_dev_info(dev)->cnt_inc_headroom_on_tx++;
+		skb = __vlan_hwaccel_put_tag(skb, vlan_tci);
 	}
 
-
 	skb_set_dev(skb, vlan_dev_info(dev)->real_dev);
 	len = skb->len;
 	ret = dev_queue_xmit(skb);
@@ -357,32 +345,6 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
 	return ret;
 }
 
-static netdev_tx_t vlan_dev_hwaccel_hard_start_xmit(struct sk_buff *skb,
-						    struct net_device *dev)
-{
-	int i = skb_get_queue_mapping(skb);
-	struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-	u16 vlan_tci;
-	unsigned int len;
-	int ret;
-
-	vlan_tci = vlan_dev_info(dev)->vlan_id;
-	vlan_tci |= vlan_dev_get_egress_qos_mask(dev, skb);
-	skb = __vlan_hwaccel_put_tag(skb, vlan_tci);
-
-	skb->dev = vlan_dev_info(dev)->real_dev;
-	len = skb->len;
-	ret = dev_queue_xmit(skb);
-
-	if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
-		txq->tx_packets++;
-		txq->tx_bytes += len;
-	} else
-		txq->tx_dropped++;
-
-	return ret;
-}
-
 static u16 vlan_dev_select_queue(struct net_device *dev, struct sk_buff *skb)
 {
 	struct net_device *rdev = vlan_dev_info(dev)->real_dev;
@@ -719,8 +681,7 @@ static const struct header_ops vlan_header_ops = {
 	.parse	 = eth_header_parse,
 };
 
-static const struct net_device_ops vlan_netdev_ops, vlan_netdev_accel_ops,
-		    vlan_netdev_ops_sq, vlan_netdev_accel_ops_sq;
+static const struct net_device_ops vlan_netdev_ops, vlan_netdev_ops_sq;
 
 static int vlan_dev_init(struct net_device *dev)
 {
@@ -755,19 +716,16 @@ static int vlan_dev_init(struct net_device *dev)
 	if (real_dev->features & NETIF_F_HW_VLAN_TX) {
 		dev->header_ops      = real_dev->header_ops;
 		dev->hard_header_len = real_dev->hard_header_len;
-		if (real_dev->netdev_ops->ndo_select_queue)
-			dev->netdev_ops = &vlan_netdev_accel_ops_sq;
-		else
-			dev->netdev_ops = &vlan_netdev_accel_ops;
 	} else {
 		dev->header_ops      = &vlan_header_ops;
 		dev->hard_header_len = real_dev->hard_header_len + VLAN_HLEN;
-		if (real_dev->netdev_ops->ndo_select_queue)
-			dev->netdev_ops = &vlan_netdev_ops_sq;
-		else
-			dev->netdev_ops = &vlan_netdev_ops;
 	}
 
+	if (real_dev->netdev_ops->ndo_select_queue)
+		dev->netdev_ops = &vlan_netdev_ops_sq;
+	else
+		dev->netdev_ops = &vlan_netdev_ops;
+
 	if (is_vlan_dev(real_dev))
 		subclass = 1;
 
@@ -908,30 +866,6 @@ static const struct net_device_ops vlan_netdev_ops = {
 #endif
 };
 
-static const struct net_device_ops vlan_netdev_accel_ops = {
-	.ndo_change_mtu		= vlan_dev_change_mtu,
-	.ndo_init		= vlan_dev_init,
-	.ndo_uninit		= vlan_dev_uninit,
-	.ndo_open		= vlan_dev_open,
-	.ndo_stop		= vlan_dev_stop,
-	.ndo_start_xmit =  vlan_dev_hwaccel_hard_start_xmit,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address	= vlan_dev_set_mac_address,
-	.ndo_set_rx_mode	= vlan_dev_set_rx_mode,
-	.ndo_set_multicast_list	= vlan_dev_set_rx_mode,
-	.ndo_change_rx_flags	= vlan_dev_change_rx_flags,
-	.ndo_do_ioctl		= vlan_dev_ioctl,
-	.ndo_neigh_setup	= vlan_dev_neigh_setup,
-	.ndo_get_stats64	= vlan_dev_get_stats64,
-#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
-	.ndo_fcoe_ddp_setup	= vlan_dev_fcoe_ddp_setup,
-	.ndo_fcoe_ddp_done	= vlan_dev_fcoe_ddp_done,
-	.ndo_fcoe_enable	= vlan_dev_fcoe_enable,
-	.ndo_fcoe_disable	= vlan_dev_fcoe_disable,
-	.ndo_fcoe_get_wwn	= vlan_dev_fcoe_get_wwn,
-#endif
-};
-
 static const struct net_device_ops vlan_netdev_ops_sq = {
 	.ndo_select_queue	= vlan_dev_select_queue,
 	.ndo_change_mtu		= vlan_dev_change_mtu,
@@ -957,31 +891,6 @@ static const struct net_device_ops vlan_netdev_ops_sq = {
 #endif
 };
 
-static const struct net_device_ops vlan_netdev_accel_ops_sq = {
-	.ndo_select_queue	= vlan_dev_select_queue,
-	.ndo_change_mtu		= vlan_dev_change_mtu,
-	.ndo_init		= vlan_dev_init,
-	.ndo_uninit		= vlan_dev_uninit,
-	.ndo_open		= vlan_dev_open,
-	.ndo_stop		= vlan_dev_stop,
-	.ndo_start_xmit =  vlan_dev_hwaccel_hard_start_xmit,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address	= vlan_dev_set_mac_address,
-	.ndo_set_rx_mode	= vlan_dev_set_rx_mode,
-	.ndo_set_multicast_list	= vlan_dev_set_rx_mode,
-	.ndo_change_rx_flags	= vlan_dev_change_rx_flags,
-	.ndo_do_ioctl		= vlan_dev_ioctl,
-	.ndo_neigh_setup	= vlan_dev_neigh_setup,
-	.ndo_get_stats64	= vlan_dev_get_stats64,
-#if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
-	.ndo_fcoe_ddp_setup	= vlan_dev_fcoe_ddp_setup,
-	.ndo_fcoe_ddp_done	= vlan_dev_fcoe_ddp_done,
-	.ndo_fcoe_enable	= vlan_dev_fcoe_enable,
-	.ndo_fcoe_disable	= vlan_dev_fcoe_disable,
-	.ndo_fcoe_get_wwn	= vlan_dev_fcoe_get_wwn,
-#endif
-};
-
 void vlan_setup(struct net_device *dev)
 {
 	ether_setup(dev);


^ permalink raw reply related

* Re: [PATCH v2 04/14] vlan: Enable software emulation for vlan accleration.
From: Jesse Gross @ 2010-10-21 21:44 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev
In-Reply-To: <1287675008.2235.8.camel@achroite.uk.solarflarecom.com>

On Thu, Oct 21, 2010 at 8:30 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2010-10-20 at 16:56 -0700, Jesse Gross wrote:
>> Currently users of hardware vlan accleration need to know whether
>> the device supports it before generating packets.  However, vlan
>> acceleration will soon be available in a more flexible manner so
>> knowing ahead of time becomes much more difficult.  This adds
>> a software fallback path for vlan packets on devices without the
>> necessary offloading support, similar to other types of hardware
>> accleration.
> [...]
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 4c3ac53..1bfd96b 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1694,7 +1694,12 @@ static bool can_checksum_protocol(unsigned long features, __be16 protocol)
>>
>>  static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
>>  {
>> -     if (can_checksum_protocol(dev->features, skb->protocol))
>> +     int features = dev->features;
>> +
>> +     if (vlan_tx_tag_present(skb))
>> +             features &= dev->vlan_features;
>> +
>> +     if (can_checksum_protocol(features, skb->protocol))
>>               return true;
>>
>>       if (skb->protocol == htons(ETH_P_8021Q)) {
> [...]
>
> Additional context:
>
>                struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
>                if (can_checksum_protocol(dev->features & dev->vlan_features,
>                                          veh->h_vlan_encapsulated_proto))
>                        return true;
>        }
>
>        return false;
> }
>
> I don't think this will do the right thing if the NIC does VLAN tag
> insertion and checksum offload with only one layer of VLAN
> encapsulation, but the skb has two layers of VLAN encapsulation.
>
> I think we actually want something like:
>
> static bool dev_can_checksum(struct net_device *dev, struct sk_buff *skb)
> {
>        __be16 protocol = skb->protocol;
>        int features = dev->features;
>
>        if (vlan_tx_tag_present(skb)) {
>                features &= dev->vlan_features;
>        } else if (skb->protocol == htons(ETH_P_8021Q)) {
>                struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
>                protocol = veh->h_vlan_encapsulated_proto;
>                features &= dev->vlan_features;
>        }
>
>        return can_checksum_protocol(features, protocol);
> }
>
> Does that look right?

Thanks, good catch.  Yes, that looks right.  Will you submit a patch
to fix this?

^ permalink raw reply

* Re: [PATCH v2 09/14] bnx2: Update bnx2 to use new vlan accleration.
From: Jesse Gross @ 2010-10-21 21:38 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: David Miller, netdev, Michael Chan
In-Reply-To: <1287675106.2235.10.camel@achroite.uk.solarflarecom.com>

On Thu, Oct 21, 2010 at 8:31 AM, Ben Hutchings
<bhutchings@solarflare.com> wrote:
> On Wed, 2010-10-20 at 16:56 -0700, Jesse Gross wrote:
>> Make the bnx2 driver use the new vlan accleration model.
>>
>> Signed-off-by: Jesse Gross <jesse@nicira.com>
>> CC: Michael Chan <mchan@broadcom.com>
>> ---
>>  drivers/net/bnx2.c |   97 +++++++++++++++-------------------------------------
>>  drivers/net/bnx2.h |    4 --
>>  2 files changed, 28 insertions(+), 73 deletions(-)
>>
>> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
>> index 363ca8b..bf3c830 100644
>> --- a/drivers/net/bnx2.c
>> +++ b/drivers/net/bnx2.c
> [...]
>> @@ -7578,7 +7523,28 @@ bnx2_set_tx_csum(struct net_device *dev, u32 data)
>>  static int
>>  bnx2_set_flags(struct net_device *dev, u32 data)
>>  {
>> -     return ethtool_op_set_flags(dev, data, ETH_FLAG_RXHASH);
>> +     struct bnx2 *bp = netdev_priv(dev);
>> +     int rc;
>> +
>> +     if (!(bp->flags & BNX2_FLAG_CAN_KEEP_VLAN) &&
>> +         !(data & ETH_FLAG_RXVLAN))
>> +             return -EOPNOTSUPP;
> [...]
>
> Should be -EINVAL.

OK, thanks.  I've sent out a patch to fix this (and a similar one in
the bnx2x driver).

^ permalink raw reply

* Re: [PATCH v2 11/14] bnx2x: Update bnx2x to use new vlan accleration.
From: Jesse Gross @ 2010-10-21 21:36 UTC (permalink / raw)
  To: Vladislav Zolotarov
  Cc: David Miller, netdev@vger.kernel.org, Hao Zheng, Eilon Greenstein
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADDDEE42913C@SJEXCHCCR02.corp.ad.broadcom.com>

On Thu, Oct 21, 2010 at 7:50 AM, Vladislav Zolotarov <vladz@broadcom.com> wrote:
>> > Guys, when I compiled the kernel with these patches without VLAN
>> > support (CONFIG_VLAN_8021Q is not set) and tried to send VLAN tagged
>> > frames from the remote side to the bnx2x interface the kernel
>> panicked.
>> >
>> > The stack trace got cut with the __netif_receive_skb() on top by the
>> > IPKVM and I'll have to connect a serial to get it all. But until I
>> > did that maybe somebody will have any ideas anyway...
>> >
>> > It happens regardless there is HW RX VLAN stripping enabled or not.
>>
>> When RX VLAN stripping is enabled we hit the BUG() in the
>> vlan_hwaccel_do_receive().:
>
> We hit the same BUG() both when VLAN stripping is disabled.

This one surprises me because that function shouldn't get called at
all when VLAN stripping is disabled.  Are you sure that it is
disabled?  From my reading of the bnx2x driver, it seems like it is
always enabled.

^ permalink raw reply

* Re: [PATCH v2 11/14] bnx2x: Update bnx2x to use new vlan accleration.
From: Jesse Gross @ 2010-10-21 21:34 UTC (permalink / raw)
  To: Vladislav Zolotarov
  Cc: David Miller, netdev@vger.kernel.org, Hao Zheng, Eilon Greenstein
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADDDEE429137@SJEXCHCCR02.corp.ad.broadcom.com>

On Thu, Oct 21, 2010 at 7:02 AM, Vladislav Zolotarov <vladz@broadcom.com> wrote:
>> Guys, when I compiled the kernel with these patches without VLAN
>> support (CONFIG_VLAN_8021Q is not set) and tried to send VLAN tagged
>> frames from the remote side to the bnx2x interface the kernel panicked.
>>
>> The stack trace got cut with the __netif_receive_skb() on top by the
>> IPKVM and I'll have to connect a serial to get it all. But until I
>> did that maybe somebody will have any ideas anyway...
>>
>> It happens regardless there is HW RX VLAN stripping enabled or not.
>
> When RX VLAN stripping is enabled we hit the BUG() in the
> vlan_hwaccel_do_receive().

Thanks, this is just a stupid mistake - I didn't update the
non-CONFIG_VLAN_8021Q version when I changed the semantics of that
function.  I've sent out a patch to fix it.

^ permalink raw reply

* [PATCH 2/2] bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
From: Jesse Gross @ 2010-10-21 21:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1287696643-9695-1-git-send-email-jesse@nicira.com>

Some cards don't support changing vlan offloading settings.  Make
Ethtool set_flags return -EINVAL in those cases.

Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 drivers/net/bnx2.c                |    2 +-
 drivers/net/bnx2x/bnx2x_ethtool.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index bf3c830..062600b 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -7528,7 +7528,7 @@ bnx2_set_flags(struct net_device *dev, u32 data)
 
 	if (!(bp->flags & BNX2_FLAG_CAN_KEEP_VLAN) &&
 	    !(data & ETH_FLAG_RXVLAN))
-		return -EOPNOTSUPP;
+		return -EINVAL;
 
 	rc = ethtool_op_set_flags(dev, data, ETH_FLAG_RXHASH | ETH_FLAG_RXVLAN |
 				  ETH_FLAG_TXVLAN);
diff --git a/drivers/net/bnx2x/bnx2x_ethtool.c b/drivers/net/bnx2x/bnx2x_ethtool.c
index daefef6..d02ffbd 100644
--- a/drivers/net/bnx2x/bnx2x_ethtool.c
+++ b/drivers/net/bnx2x/bnx2x_ethtool.c
@@ -1123,7 +1123,7 @@ static int bnx2x_set_flags(struct net_device *dev, u32 data)
 	}
 
 	if (!(data & ETH_FLAG_RXVLAN))
-		return -EOPNOTSUPP;
+		return -EINVAL;
 
 	if ((data & ETH_FLAG_LRO) && bp->rx_csum && bp->disable_tpa)
 		return -EINVAL;
-- 
1.7.1


^ permalink raw reply related

* [PATCH 1/2] vlan: Calling vlan_hwaccel_do_receive() is always valid.
From: Jesse Gross @ 2010-10-21 21:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

It is now acceptable to receive vlan tagged packets at any time,
even if CONFIG_VLAN_8021Q is not set.  This means that calling
vlan_hwaccel_do_receive() should not result in BUG() but rather just
behave as if there were no vlan devices configured.

Reported-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
---
 include/linux/if_vlan.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index e607256..cbd3dcd 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -153,7 +153,8 @@ static inline u16 vlan_dev_vlan_id(const struct net_device *dev)

 static inline bool vlan_hwaccel_do_receive(struct sk_buff **skb)
 {
-	BUG();
+	if ((*skb)->vlan_tci & VLAN_VID_MASK)
+		(*skb)->pkt_type = PACKET_OTHERHOST;
 	return false;
 }
 #endif
-- 
1.7.1

^ permalink raw reply related

* [PATCH 0/2] cxgb4 updates
From: Dimitris Michailidis @ 2010-10-21 21:29 UTC (permalink / raw)
  To: netdev

Here are two patches for cxgb4.  The first fixes a crash triggered by
e6484930d7c73d324bccda7d43d131088da697b9.  The second updates the driver
to utilize the newer VLAN infrastructure.  If it's too late for the latter
let me know and I'll resend it when net-next reopens.

^ permalink raw reply

* [PATCH 2/2] cxgb4: update to utilize the newer VLAN infrastructure
From: Dimitris Michailidis @ 2010-10-21 21:29 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1287696596-15175-2-git-send-email-dm@chelsio.com>

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
---
 drivers/net/cxgb4/cxgb4.h      |    1 -
 drivers/net/cxgb4/cxgb4_main.c |   31 +++++++++++++++++++------------
 drivers/net/cxgb4/sge.c        |   23 +++++------------------
 3 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/drivers/net/cxgb4/cxgb4.h b/drivers/net/cxgb4/cxgb4.h
index eaa49e4..3d4253d 100644
--- a/drivers/net/cxgb4/cxgb4.h
+++ b/drivers/net/cxgb4/cxgb4.h
@@ -281,7 +281,6 @@ struct sge_rspq;
 
 struct port_info {
 	struct adapter *adapter;
-	struct vlan_group *vlan_grp;
 	u16    viid;
 	s16    xact_addr_filt;        /* index of exact MAC address filter */
 	u16    rss_size;              /* size of VI's RSS table slice */
diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index bc354ee..26a88a0 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -403,7 +403,7 @@ static int link_start(struct net_device *dev)
 	 * that step explicitly.
 	 */
 	ret = t4_set_rxmode(pi->adapter, mb, pi->viid, dev->mtu, -1, -1, -1,
-			    pi->vlan_grp != NULL, true);
+			    !!(dev->features & NETIF_F_HW_VLAN_RX), true);
 	if (ret == 0) {
 		ret = t4_change_mac(pi->adapter, mb, pi->viid,
 				    pi->xact_addr_filt, dev->dev_addr, true,
@@ -1881,7 +1881,24 @@ static int set_tso(struct net_device *dev, u32 value)
 
 static int set_flags(struct net_device *dev, u32 flags)
 {
-	return ethtool_op_set_flags(dev, flags, ETH_FLAG_RXHASH);
+	int err;
+	unsigned long old_feat = dev->features;
+
+	err = ethtool_op_set_flags(dev, flags, ETH_FLAG_RXHASH |
+				   ETH_FLAG_RXVLAN | ETH_FLAG_TXVLAN);
+	if (err)
+		return err;
+
+	if ((old_feat ^ dev->features) & NETIF_F_HW_VLAN_RX) {
+		const struct port_info *pi = netdev_priv(dev);
+
+		err = t4_set_rxmode(pi->adapter, pi->adapter->fn, pi->viid, -1,
+				    -1, -1, -1, !!(flags & ETH_FLAG_RXVLAN),
+				    true);
+		if (err)
+			dev->features = old_feat;
+	}
+	return err;
 }
 
 static int get_rss_table(struct net_device *dev, struct ethtool_rxfh_indir *p)
@@ -2841,15 +2858,6 @@ static int cxgb_set_mac_addr(struct net_device *dev, void *p)
 	return 0;
 }
 
-static void vlan_rx_register(struct net_device *dev, struct vlan_group *grp)
-{
-	struct port_info *pi = netdev_priv(dev);
-
-	pi->vlan_grp = grp;
-	t4_set_rxmode(pi->adapter, pi->adapter->fn, pi->viid, -1, -1, -1, -1,
-		      grp != NULL, true);
-}
-
 #ifdef CONFIG_NET_POLL_CONTROLLER
 static void cxgb_netpoll(struct net_device *dev)
 {
@@ -2877,7 +2885,6 @@ static const struct net_device_ops cxgb4_netdev_ops = {
 	.ndo_validate_addr    = eth_validate_addr,
 	.ndo_do_ioctl         = cxgb_ioctl,
 	.ndo_change_mtu       = cxgb_change_mtu,
-	.ndo_vlan_rx_register = vlan_rx_register,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller  = cxgb_netpoll,
 #endif
diff --git a/drivers/net/cxgb4/sge.c b/drivers/net/cxgb4/sge.c
index 9967f3d..1702225 100644
--- a/drivers/net/cxgb4/sge.c
+++ b/drivers/net/cxgb4/sge.c
@@ -1530,18 +1530,11 @@ static void do_gro(struct sge_eth_rxq *rxq, const struct pkt_gl *gl,
 		skb->rxhash = (__force u32)pkt->rsshdr.hash_val;
 
 	if (unlikely(pkt->vlan_ex)) {
-		struct port_info *pi = netdev_priv(rxq->rspq.netdev);
-		struct vlan_group *grp = pi->vlan_grp;
-
+		__vlan_hwaccel_put_tag(skb, ntohs(pkt->vlan));
 		rxq->stats.vlan_ex++;
-		if (likely(grp)) {
-			ret = vlan_gro_frags(&rxq->rspq.napi, grp,
-					     ntohs(pkt->vlan));
-			goto stats;
-		}
 	}
 	ret = napi_gro_frags(&rxq->rspq.napi);
-stats:	if (ret == GRO_HELD)
+	if (ret == GRO_HELD)
 		rxq->stats.lro_pkts++;
 	else if (ret == GRO_MERGED || ret == GRO_MERGED_FREE)
 		rxq->stats.lro_merged++;
@@ -1608,16 +1601,10 @@ int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
 		skb_checksum_none_assert(skb);
 
 	if (unlikely(pkt->vlan_ex)) {
-		struct vlan_group *grp = pi->vlan_grp;
-
+		__vlan_hwaccel_put_tag(skb, ntohs(pkt->vlan));
 		rxq->stats.vlan_ex++;
-		if (likely(grp))
-			vlan_hwaccel_receive_skb(skb, grp, ntohs(pkt->vlan));
-		else
-			dev_kfree_skb_any(skb);
-	} else
-		netif_receive_skb(skb);
-
+	}
+	netif_receive_skb(skb);
 	return 0;
 }
 
-- 
1.5.4


^ permalink raw reply related

* [PATCH 1/2] cxgb4: fix crash due to manipulating queues before registration
From: Dimitris Michailidis @ 2010-10-21 21:29 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1287696596-15175-1-git-send-email-dm@chelsio.com>

Before commit "net: allocate tx queues in register_netdevice",
netif_tx_stop_all_queues and related functions could be used between
device allocation and registration, but now they can be used only after
registration.  cxgb4 has such a call before registration and now
crashes.  Move it after register_netdev.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
---
 drivers/net/cxgb4/cxgb4_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index 930bd07..bc354ee 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -3657,7 +3657,6 @@ static int __devinit init_one(struct pci_dev *pdev,
 		pi->rx_offload = RX_CSO;
 		pi->port_id = i;
 		netif_carrier_off(netdev);
-		netif_tx_stop_all_queues(netdev);
 		netdev->irq = pdev->irq;
 
 		netdev->features |= NETIF_F_SG | TSO_FLAGS;
@@ -3729,6 +3728,7 @@ static int __devinit init_one(struct pci_dev *pdev,
 
 			__set_bit(i, &adapter->registered_device_map);
 			adapter->chan_map[adap2pinfo(adapter, i)->tx_chan] = i;
+			netif_tx_stop_all_queues(adapter->port[i]);
 		}
 	}
 	if (!adapter->registered_device_map) {
-- 
1.5.4


^ permalink raw reply related

* Re: [PATCH 5/9] tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
From: YOSHIFUJI Hideaki @ 2010-10-21 21:24 UTC (permalink / raw)
  To: Balazs Scheidler
  Cc: KOVACS Krisztian, netdev, netfilter-devel, Patrick McHardy,
	David Miller, yoshfuji
In-Reply-To: <1287583653.29676.9.camel@bzorp.lan>

Hello.

2010-10-20, Balazs Scheidler wrote:
> On Wed, 2010-10-20 at 21:45 +0900, YOSHIFUJI Hideaki wrote:
> > (2010/10/20 20:21), KOVACS Krisztian wrote:
> > > From: Balazs Scheidler <bazsi@balabit.hu>
> > > 
> > > Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
> > > Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
> > > ---
> > >   net/ipv6/af_inet6.c |    2 +-
> > >   1 files changed, 1 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> > > index 6022098..9480572 100644
> > > --- a/net/ipv6/af_inet6.c
> > > +++ b/net/ipv6/af_inet6.c
> > > @@ -343,7 +343,7 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> > >   			 */
> > >   			v4addr = LOOPBACK4_IPV6;
> > >   			if (!(addr_type & IPV6_ADDR_MULTICAST))	{
> > > -				if (!ipv6_chk_addr(net, &addr->sin6_addr,
> > > +				if (!inet->transparent && !ipv6_chk_addr(net, &addr->sin6_addr,
> > >   						   dev, 0)) {
> > >   					err = -EADDRNOTAVAIL;
> > >   					goto out_unlock;
> > > 
> > > 
> > 
> > As I wrote before in another thread, this does not seem sufficient --
> > well, it is sufficient to allow non-local bind, but before we
> > allow this, we need to add checks of the source address on the sending side.
> 
> Can you please elaborate or point us to the other thread? Is it some
> kind of address-type check that we miss?

Please see my comment at:
<http://kerneltrap.org/mailarchive/linux-netdev/2010/7/5/6280572>

This will result in allowing non-privileged users to easily send from
non-local / unauthorized addresses, which is not good and should
not be allowed from a security standpoint.

Regards,

--yoshfuji


^ permalink raw reply

* Re: [PATCH 4/9] tproxy: added tproxy sockopt interface in the IPV6 layer
From: YOSHIFUJI Hideaki @ 2010-10-21 21:09 UTC (permalink / raw)
  To: KOVACS Krisztian
  Cc: Jan Engelhardt, netdev, netfilter-devel, Patrick McHardy,
	David Miller, yoshfuji
In-Reply-To: <1287650781.13326.1.camel@este.odu>

On 2010-10-21, KOVACS Krisztian wrote:
> Hi,
> 
> On Thu, 2010-10-21 at 10:39 +0200, Jan Engelhardt wrote:
> > On Wednesday 2010-10-20 13:21, KOVACS Krisztian wrote:
> > 
> > >@@ -268,6 +268,10 @@ struct in6_flowlabel_req {
> > > /* RFC5082: Generalized Ttl Security Mechanism */
> > > #define IPV6_MINHOPCOUNT		73
> > > 
> > >+#define IPV6_ORIGDSTADDR        74
> > >+#define IPV6_RECVORIGDSTADDR    IPV6_ORIGDSTADDR
> > >+#define IPV6_TRANSPARENT        75
> > >+
> > 
> > Why do we actually need two names for the same thing?
> 
> IPV6_RECVORIGDSTADDR is the name of the socket option you're supposed to
> set if you require the original destination address. IPV6_ORIGDSTADDR is
> the name of the ancillary message you get with the actual address in it.
> Just like we have it for IP_TOS/IP_RECVTOS, for example.

I agree.
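
The IP_TOS/IP_RECVTOS analogy quoted above can be sketched in userspace C. This is only an illustration of the two-name convention, using the long-standing IPv4 pair rather than the new IPv6 constants; the helper names enable_recv_tos and tos_from_msg are hypothetical:

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Socket-option name: ask the kernel to attach the TOS byte to each
 * received datagram as ancillary data. */
static inline int enable_recv_tos(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IP, IP_RECVTOS, &on, sizeof(on));
}

/* Ancillary-message name: the value arrives as a cmsg of type IP_TOS.
 * Returns the TOS byte, or -1 if no such cmsg is present. */
static inline int tos_from_msg(struct msghdr *msg)
{
	struct cmsghdr *c;

	assert(msg != NULL);
	for (c = CMSG_FIRSTHDR(msg); c; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TOS)
			return *(unsigned char *)CMSG_DATA(c);
	}
	return -1;
}
```

The option is set under one name (IP_RECVTOS) and the data is matched under the other (IP_TOS), which is the same split proposed for IPV6_RECVORIGDSTADDR and IPV6_ORIGDSTADDR.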

--yoshfuji


^ permalink raw reply

* Re: [PATCH 63/72] tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
From: YOSHIFUJI Hideaki @ 2010-10-21 21:07 UTC (permalink / raw)
  To: kaber; +Cc: bazsi, hidden, davem, netfilter-devel, netdev, yoshfuji
In-Reply-To: <1287674399-31455-64-git-send-email-kaber@trash.net>

Hello.

On 2010-10-21, kaber@trash.net wrote:
> From: Balazs Scheidler <bazsi@balabit.hu>
> 
> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
> Signed-off-by: Patrick McHardy <kaber@trash.net>
> ---
>  net/ipv6/af_inet6.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 56b9bf2..4869797 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -343,7 +343,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
>  			 */
>  			v4addr = LOOPBACK4_IPV6;
>  			if (!(addr_type & IPV6_ADDR_MULTICAST))	{
> -				if (!ipv6_chk_addr(net, &addr->sin6_addr,
> +				if (!inet->transparent &&
> +				    !ipv6_chk_addr(net, &addr->sin6_addr,
>  						   dev, 0)) {
>  					err = -EADDRNOTAVAIL;
>  					goto out_unlock;

Sorry, NAK.

http://kerneltrap.org/mailarchive/linux-netdev/2010/7/5/6280572

--yoshfuji


^ permalink raw reply

* Re: [PATCH 01/12] l2tp: make local function static
From: James Chapman @ 2010-10-21 20:30 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20101021175148.018298774@vyatta.com>

On 21/10/2010 18:50, Stephen Hemminger wrote:
> Also moved the refcount inlines from l2tp_core.h to l2tp_core.c
> since they are only used in that one file.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 

Acked-by: James Chapman <jchapman@katalix.com>


-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development


^ permalink raw reply

* RE: Question w.r.t debugfs / netdevice pass-through IOCTL
From: Debashis Dutt @ 2010-10-21 20:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org



-----Original Message-----
From: Stephen Hemminger [mailto:shemminger@vyatta.com] 
Sent: Wednesday, October 20, 2010 9:19 PM
To: Debashis Dutt
Cc: netdev@vger.kernel.org
Subject: Re: Question w.r.t debugfs / netdevice pass-through IOCTL

On Wed, 20 Oct 2010 20:26:50 -0700
Debashis Dutt <ddutt@Brocade.COM> wrote:

> Hi, 
> 
> For the Brocade 10G Ethernet driver (bna) we want to implement a set of operations which is not supported by current tools like ethtool. 
> 
> Examples of such operations would be 
>        a) Queries related to CEE, if the link is CEE.
>        b) Get traces from firmware.

> 
> I was wondering what would be right approach to take here:
>                 a) use debugfs (like the Chelsio cxgb4 driver)
Works as long as they are really debug operations. The debugfs isn't always
available, and support should be a config option for your driver.

>                 b) use SIOCDEVPRIVATE for the pass through IOCTL defined in
>                     struct net_device_ops{}

The problem with ioctl is it doesn't work for 32 bit user space
compatibility. The ioctl compat layer does not have enough context
to translate SIOCDEVPRIVATE.

>                     As per comments in the header file, b) should not be used
>                     since this IOCTL is supposed to be deprecated.
>                 c) use procfs / sysfs (these may not scale, in our opinion)

Although less common, there were drivers putting things in /proc/net/xxx/ethX


Thanks Stephen for the suggestions.

--Debashis
-- 

^ permalink raw reply

* Re: Question w.r.t debugfs / netdevice pass-through IOCTL
From: John Fastabend @ 2010-10-21 20:22 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Debashis Dutt, netdev@vger.kernel.org
In-Reply-To: <20101020211900.79b8336f@nehalam>

On 10/20/2010 9:19 PM, Stephen Hemminger wrote:
> On Wed, 20 Oct 2010 20:26:50 -0700
> Debashis Dutt <ddutt@Brocade.COM> wrote:
> 
>> Hi, 
>>
>> For the Brocade 10G Ethernet driver (bna) we want to implement a set of operations which is not supported by current tools like ethtool. 
>>
>> Examples of such operations would be 
>>        a) Queries related to CEE, if the link is CEE.

Assuming CEE is Converged Enhanced Ethernet here.

For CEE queries please consider using the dcbnl interface in net/dcb/dcbnl.c. If
it is missing an interface that would be useful to all DCB devices we could
entertain adding it. Also, this way DCB queries will work with the existing tools
that query these things (lldpad/dcbtool).

The things you would want to know about a CEE device should be about the same
regardless of the hardware in use, so let's try to use a single interface and
avoid private interfaces.

Thanks,
John.

>>        b) Get traces from firmware.
> 
>>
>> I was wondering what would be right approach to take here:
>>                 a) use debugfs (like the Chelsio cxgb4 driver)
> Works as long as they are really debug operations. The debugfs isn't always
> available, and support should be a config option for your driver.
> 
>>                 b) use SIOCDEVPRIVATE for the pass through IOCTL defined in
>>                     struct net_device_ops{}
> 
> The problem with ioctl is it doesn't work for 32 bit user space
> compatibility. The ioctl compat layer does not have enough context
> to translate SIOCDEVPRIVATE.
> 
>>                     As per comments in the header file, b) should not be used
>>                     since this IOCTL is supposed to be deprecated.
>>                 c) use procfs / sysfs (these may not scale, in our opinion)
> 
> Although less common, there were drivers putting things in /proc/net/xxx/ethX
> 
> 
> 


^ permalink raw reply

* [PATCH 2/2 v3] xps: Transmit Packet Steering
From: Tom Herbert @ 2010-10-21 20:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: eric.dumazet

This patch implements transmit packet steering (XPS) for multiqueue
devices.  XPS selects a transmit queue during packet transmission based
on configuration.  This is done by mapping the CPU transmitting the
packet to a queue.  This is the transmit side analogue to RPS-- where
RPS is selecting a CPU based on receive queue, XPS selects a queue
based on the CPU (previously there was an XPS patch from Eric
Dumazet, but that might more appropriately be called transmit completion
steering).

Each transmit queue can be associated with a number of CPUs which will
use the queue to send packets.  This is configured as a CPU mask on a
per queue basis in:

/sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

The mappings are stored per device in an inverted data structure that
maps CPUs to queues.  In the netdevice structure this is an array of
num_possible_cpus() structures where each structure holds an array of
queue indexes for the queues which that CPU can use.
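
A minimal sketch of the per-CPU lookup described above, written as plain userspace C rather than kernel code. The names xps_map_sketch and pick_queue are hypothetical; the jhash mixing and RCU protection used in the patch are left out, and only the scaling of a 32-bit hash into the map is shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's entry in the inverted map:
 * the list of TX queues this CPU is allowed to use. */
struct xps_map_sketch {
	unsigned int len;
	const uint16_t *queues;
};

/* Return a queue index for this CPU's map, or -1 if the CPU has no
 * map, in which case the caller would fall back to the normal hash. */
static inline int pick_queue(const struct xps_map_sketch *map, uint32_t hash)
{
	if (map == NULL || map->len == 0)
		return -1;
	if (map->len == 1)
		return map->queues[0];
	/* Scale the 32-bit hash into [0, len) without a modulo. */
	return map->queues[((uint64_t)hash * map->len) >> 32];
}
```

A CPU with a single queue in its map skips the hash entirely, which is the common case when each CPU has its own queue.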

The benefits of XPS are improved locality in the per queue data
structures.  Also, transmit completions are more likely to be done
nearer to the sending thread, so this should promote locality back
to the socket on free (e.g. UDP).  The benefits of XPS are dependent on
cache hierarchy, application load, and other factors.  XPS would
nominally be configured so that a queue would only be shared by CPUs
which are sharing a cache; the degenerate configuration would be that
each CPU has its own queue.

Below are some benchmark results which show the potential benefit of
this patch.  The netperf test has 500 instances of netperf TCP_RR test
with 1 byte req. and resp.

bnx2x on 16 core AMD
   XPS (16 queues, 1 TX queue per CPU)	1234K at 100% CPU
   No XPS (16 queues)			996K at 100% CPU

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h |   27 ++++
 net/core/dev.c            |   58 +++++++-
 net/core/net-sysfs.c      |  367 ++++++++++++++++++++++++++++++++++++++++++++-
 net/core/net-sysfs.h      |    3 +
 4 files changed, 448 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fcd3dda..f19b78b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -503,6 +503,13 @@ struct netdev_queue {
 	struct Qdisc		*qdisc;
 	unsigned long		state;
 	struct Qdisc		*qdisc_sleeping;
+#ifdef CONFIG_RPS
+	struct netdev_queue	*first;
+	atomic_t		count;
+	struct xps_dev_maps	*xps_maps;
+	struct kobject		kobj;
+#endif
+
 /*
  * write mostly part
  */
@@ -530,6 +537,26 @@ struct rps_map {
 #define RPS_MAP_SIZE(_num) (sizeof(struct rps_map) + (_num * sizeof(u16)))
 
 /*
+ * This structure holds an XPS map which can be of variable length.  The
+ * map is an array of queues.
+ */
+struct xps_map {
+	unsigned int len;
+	unsigned int alloc_len;
+	struct rcu_head rcu;
+	u16 queues[0];
+};
+
+/*
+ * This structure holds all XPS maps for device.  Maps are indexed by CPU.
+ */
+struct xps_dev_maps {
+	struct rcu_head rcu;
+	struct xps_map *cpu_map[0];
+};
+#define netdev_get_xps_maps(dev) ((dev)->_tx[0].xps_maps)
+
+/*
  * The rps_dev_flow structure contains the mapping of a flow to a CPU and the
  * tail pointer for that CPU's input queue at the time of last enqueue.
  */
diff --git a/net/core/dev.c b/net/core/dev.c
index a538ed5..334c85a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2119,6 +2119,44 @@ static inline u16 dev_cap_txqueue(struct net_device *dev, u16 queue_index)
 	return queue_index;
 }
 
+static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
+{
+#ifdef CONFIG_RPS
+	struct xps_dev_maps *dev_maps;
+	struct xps_map *map;
+	int queue_index = -1;
+
+	preempt_disable();
+	rcu_read_lock();
+	dev_maps = rcu_dereference(netdev_get_xps_maps(dev));
+	if (dev_maps) {
+		map = rcu_dereference(dev_maps->cpu_map[smp_processor_id()]);
+		if (map) {
+			if (map->len == 1)
+				queue_index = map->queues[0];
+			else {
+				u32 hash;
+				if (skb->sk && skb->sk->sk_hash)
+					hash = skb->sk->sk_hash;
+				else
+					hash = (__force u16) skb->protocol ^
+					    skb->rxhash;
+				hash = jhash_1word(hash, hashrnd);
+				queue_index = map->queues[
+				    ((u64)hash * map->len) >> 32];
+			}
+			if (unlikely(queue_index >= dev->real_num_tx_queues))
+				queue_index = -1;
+		}
+	}
+	rcu_read_unlock();
+	preempt_enable();
+
+	return queue_index;
+#endif
+	return -1;
+}
+
 static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 					struct sk_buff *skb)
 {
@@ -2137,8 +2175,11 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 			if (ops->ndo_select_queue) {
 				queue_index = ops->ndo_select_queue(dev, skb);
 				queue_index = dev_cap_txqueue(dev, queue_index);
-			} else
-				queue_index = skb_tx_hash(dev, skb);
+			} else {
+				queue_index = get_xps_queue(dev, skb);
+				if (queue_index < 0)
+					queue_index = skb_tx_hash(dev, skb);
+			}
 
 			if (queue_index != old_index && sk) {
 				struct dst_entry *dst = rcu_dereference_check(sk->sk_dst_cache, 1);
@@ -5052,6 +5093,17 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
 		return -ENOMEM;
 	}
 	dev->_tx = tx;
+#ifdef CONFIG_RPS
+	/*
+	 * Set a pointer to first element in the array which holds the
+	 * reference count.
+	 */
+	{
+		int i;
+		for (i = 0; i < count; i++)
+			tx[i].first = tx;
+	}
+#endif
 	return 0;
 }
 
@@ -5616,7 +5668,9 @@ void free_netdev(struct net_device *dev)
 
 	release_net(dev_net(dev));
 
+#ifndef CONFIG_RPS
 	kfree(dev->_tx);
+#endif
 
 	kfree(rcu_dereference_raw(dev->ingress_queue));
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b143173..e193cf2 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -764,18 +764,375 @@ net_rx_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
 	return error;
 }
 
-static int rx_queue_register_kobjects(struct net_device *net)
+/*
+ * netdev_queue sysfs structures and functions.
+ */
+struct netdev_queue_attribute {
+	struct attribute attr;
+	ssize_t (*show)(struct netdev_queue *queue,
+	    struct netdev_queue_attribute *attr, char *buf);
+	ssize_t (*store)(struct netdev_queue *queue,
+	    struct netdev_queue_attribute *attr, const char *buf, size_t len);
+};
+#define to_netdev_queue_attr(_attr) container_of(_attr,		\
+    struct netdev_queue_attribute, attr)
+
+#define to_netdev_queue(obj) container_of(obj, struct netdev_queue, kobj)
+
+static ssize_t netdev_queue_attr_show(struct kobject *kobj,
+				      struct attribute *attr, char *buf)
+{
+	struct netdev_queue_attribute *attribute = to_netdev_queue_attr(attr);
+	struct netdev_queue *queue = to_netdev_queue(kobj);
+
+	if (!attribute->show)
+		return -EIO;
+
+	return attribute->show(queue, attribute, buf);
+}
+
+static ssize_t netdev_queue_attr_store(struct kobject *kobj,
+				       struct attribute *attr,
+				       const char *buf, size_t count)
+{
+	struct netdev_queue_attribute *attribute = to_netdev_queue_attr(attr);
+	struct netdev_queue *queue = to_netdev_queue(kobj);
+
+	if (!attribute->store)
+		return -EIO;
+
+	return attribute->store(queue, attribute, buf, count);
+}
+
+static const struct sysfs_ops netdev_queue_sysfs_ops = {
+	.show = netdev_queue_attr_show,
+	.store = netdev_queue_attr_store,
+};
+
+static inline unsigned int get_netdev_queue_index(struct netdev_queue *queue)
+{
+	struct net_device *dev = queue->dev;
+	int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++)
+		if (queue == &dev->_tx[i])
+			break;
+
+	BUG_ON(i >= dev->num_tx_queues);
+
+	return i;
+}
+
+
+static ssize_t show_xps_map(struct netdev_queue *queue,
+			    struct netdev_queue_attribute *attribute, char *buf)
+{
+	struct netdev_queue *first = queue->first;
+	struct xps_dev_maps *dev_maps;
+	cpumask_var_t mask;
+	unsigned long index;
+	size_t len = 0;
+	int i;
+
+	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	index = get_netdev_queue_index(queue);
+
+	rcu_read_lock();
+	dev_maps = rcu_dereference(first->xps_maps);
+	if (dev_maps) {
+		for (i = 0; i < num_possible_cpus(); i++) {
+			struct xps_map *map =
+			    rcu_dereference(dev_maps->cpu_map[i]);
+			if (map) {
+				int j;
+				for (j = 0; j < map->len; j++) {
+					if (map->queues[j] == index) {
+						cpumask_set_cpu(i, mask);
+						break;
+					}
+				}
+			}
+		}
+	}
+	len += cpumask_scnprintf(buf + len, PAGE_SIZE, mask);
+	if (PAGE_SIZE - len < 3) {
+		rcu_read_unlock();
+		free_cpumask_var(mask);
+		return -EINVAL;
+	}
+	rcu_read_unlock();
+
+	free_cpumask_var(mask);
+	len += sprintf(buf + len, "\n");
+	return len;
+}
+
+static void xps_map_release(struct rcu_head *rcu)
+{
+	struct xps_map *map = container_of(rcu, struct xps_map, rcu);
+
+	kfree(map);
+}
+
+static void xps_dev_maps_release(struct rcu_head *rcu)
 {
+	struct xps_dev_maps *dev_maps =
+	    container_of(rcu, struct xps_dev_maps, rcu);
+
+	kfree(dev_maps);
+}
+
+static DEFINE_MUTEX(xps_map_mutex);
+
+static ssize_t store_xps_map(struct netdev_queue *queue,
+		      struct netdev_queue_attribute *attribute,
+		      const char *buf, size_t len)
+{
+	struct netdev_queue *first = queue->first;
+	cpumask_var_t mask;
+	int err, i, cpu, pos, map_len, alloc_len, need_set;
+	unsigned long index;
+	struct xps_map *map, *new_map;
+	struct xps_dev_maps *dev_maps, *new_dev_maps;
+	int nonempty = 0;
+
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
+	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+		return -ENOMEM;
+
+	index = get_netdev_queue_index(queue);
+
+	err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits);
+	if (err) {
+		free_cpumask_var(mask);
+		return err;
+	}
+
+	new_dev_maps = kzalloc(sizeof(struct xps_dev_maps) +
+	    (num_possible_cpus() * sizeof(struct xps_map *)), GFP_KERNEL);
+	if (!new_dev_maps) {
+		free_cpumask_var(mask);
+		return -ENOMEM;
+	}
+
+	mutex_lock(&xps_map_mutex);
+
+	dev_maps = first->xps_maps;
+
+	for (cpu = 0; cpu < num_possible_cpus(); cpu++) {
+		new_map = map = dev_maps ? dev_maps->cpu_map[cpu] : NULL;
+
+		if (map) {
+			for (pos = 0; pos < map->len; pos++)
+				if (map->queues[pos] == index)
+					break;
+			map_len = map->len;
+			alloc_len = map->alloc_len;
+		} else
+			pos = map_len = alloc_len = 0;
+
+		need_set = cpu_isset(cpu, *mask) && cpu_online(cpu);
+
+		if (need_set && pos >= map_len) {
+			/* Need to add queue to this CPU's map */
+			if (map_len >= alloc_len) {
+				alloc_len = alloc_len ?  2 * alloc_len : 1;
+				new_map = kzalloc(sizeof(struct xps_map) +
+				    (alloc_len * sizeof(u16)), GFP_KERNEL);
+				if (!new_map)
+					goto error;
+				new_map->alloc_len = alloc_len;
+				for (i = 0; i < map_len; i++)
+					new_map->queues[i] = map->queues[i];
+				new_map->len = map_len;
+			}
+			new_map->queues[new_map->len++] = index;
+		} else if (!need_set && pos < map_len) {
+			/* Need to remove queue from this CPU's map */
+			if (map_len > 1)
+				new_map->queues[pos] =
+				    new_map->queues[--new_map->len];
+			else
+				new_map = NULL;
+		}
+		new_dev_maps->cpu_map[cpu] = new_map;
+	}
+
+	/* Cleanup old maps */
+	for (cpu = 0; cpu < num_possible_cpus(); cpu++) {
+		map = dev_maps ? dev_maps->cpu_map[cpu] : NULL;
+		if (map && new_dev_maps->cpu_map[cpu] != map)
+			call_rcu(&map->rcu, xps_map_release);
+		if (new_dev_maps->cpu_map[cpu])
+			nonempty = 1;
+	}
+
+	if (nonempty)
+		rcu_assign_pointer(first->xps_maps, new_dev_maps);
+	else {
+		kfree(new_dev_maps);
+		rcu_assign_pointer(first->xps_maps, NULL);
+	}
+
+	if (dev_maps)
+		call_rcu(&dev_maps->rcu, xps_dev_maps_release);
+
+	mutex_unlock(&xps_map_mutex);
+
+	free_cpumask_var(mask);
+	return len;
+
+error:
+	mutex_unlock(&xps_map_mutex);
+
+	if (new_dev_maps)
+		for (i = 0; i < num_possible_cpus(); i++)
+			kfree(new_dev_maps->cpu_map[i]);
+
+	kfree(new_dev_maps);
+	free_cpumask_var(mask);
+	return -ENOMEM;
+}
+
+static struct netdev_queue_attribute xps_cpus_attribute =
+    __ATTR(xps_cpus, S_IRUGO | S_IWUSR, show_xps_map, store_xps_map);
+
+static struct attribute *netdev_queue_default_attrs[] = {
+	&xps_cpus_attribute.attr,
+	NULL
+};
+
+static void netdev_queue_release(struct kobject *kobj)
+{
+	struct netdev_queue *queue = to_netdev_queue(kobj);
+	struct netdev_queue *first = queue->first;
+	struct xps_dev_maps *dev_maps;
+	struct xps_map *map;
+	unsigned long index;
+	int i, pos, nonempty = 0;
+
+	index = get_netdev_queue_index(queue);
+
+	mutex_lock(&xps_map_mutex);
+	dev_maps = first->xps_maps;
+
+	for (i = 0; i < num_possible_cpus(); i++) {
+		map  = dev_maps ? dev_maps->cpu_map[i] : NULL;
+		if (!map)
+			continue;
+
+		for (pos = 0; pos < map->len; pos++)
+			if (map->queues[pos] == index)
+				break;
+
+		if (pos < map->len) {
+			if (map->len > 1)
+				map->queues[pos] = map->queues[--map->len];
+			else {
+				rcu_assign_pointer(dev_maps->cpu_map[i],
+				    NULL);
+				call_rcu(&map->rcu, xps_map_release);
+				map = NULL;
+			}
+		}
+
+		if (map)
+			nonempty = 1;
+	}
+
+	if (dev_maps && !nonempty) {
+		rcu_assign_pointer(first->xps_maps, NULL);
+		call_rcu(&dev_maps->rcu, xps_dev_maps_release);
+	}
+	mutex_unlock(&xps_map_mutex);
+
+	if (atomic_dec_and_test(&first->count))
+		kfree(first);
+}
+
+static struct kobj_type netdev_queue_ktype = {
+	.sysfs_ops = &netdev_queue_sysfs_ops,
+	.release = netdev_queue_release,
+	.default_attrs = netdev_queue_default_attrs,
+};
+
+static int netdev_queue_add_kobject(struct net_device *net, int index)
+{
+	struct netdev_queue *queue = net->_tx + index;
+	struct netdev_queue *first = queue->first;
+	struct kobject *kobj = &queue->kobj;
+	int error = 0;
+
+	kobj->kset = net->queues_kset;
+	error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL,
+	    "tx-%u", index);
+	if (error) {
+		kobject_put(kobj);
+		return error;
+	}
+
+	kobject_uevent(kobj, KOBJ_ADD);
+	atomic_inc(&first->count);
+
+	return error;
+}
+
+int
+netdev_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
+{
+	int i;
+	int error = 0;
+
+	for (i = old_num; i < new_num; i++) {
+		error = netdev_queue_add_kobject(net, i);
+		if (error) {
+			new_num = old_num;
+			break;
+		}
+	}
+
+	while (--i >= new_num)
+		kobject_put(&net->_tx[i].kobj);
+
+	return error;
+}
+
+static int register_queue_kobjects(struct net_device *net)
+{
+	int error = 0, txq = 0, rxq = 0;
+
 	net->queues_kset = kset_create_and_add("queues",
 	    NULL, &net->dev.kobj);
 	if (!net->queues_kset)
 		return -ENOMEM;
-	return net_rx_queue_update_kobjects(net, 0, net->real_num_rx_queues);
+
+	error = net_rx_queue_update_kobjects(net, 0, net->real_num_rx_queues);
+	if (error)
+		goto error;
+	rxq = net->real_num_rx_queues;
+
+	error = netdev_queue_update_kobjects(net, 0,
+					     net->real_num_tx_queues);
+	if (error)
+		goto error;
+	txq = net->real_num_tx_queues;
+
+	return 0;
+
+error:
+	netdev_queue_update_kobjects(net, txq, 0);
+	net_rx_queue_update_kobjects(net, rxq, 0);
+	return error;
 }
 
-static void rx_queue_remove_kobjects(struct net_device *net)
+static void remove_queue_kobjects(struct net_device *net)
 {
 	net_rx_queue_update_kobjects(net, net->real_num_rx_queues, 0);
+	netdev_queue_update_kobjects(net, net->real_num_tx_queues, 0);
 	kset_unregister(net->queues_kset);
 }
 #endif /* CONFIG_RPS */
@@ -878,7 +1235,7 @@ void netdev_unregister_kobject(struct net_device * net)
 	kobject_get(&dev->kobj);
 
 #ifdef CONFIG_RPS
-	rx_queue_remove_kobjects(net);
+	remove_queue_kobjects(net);
 #endif
 
 	device_del(dev);
@@ -919,7 +1276,7 @@ int netdev_register_kobject(struct net_device *net)
 		return error;
 
 #ifdef CONFIG_RPS
-	error = rx_queue_register_kobjects(net);
+	error = register_queue_kobjects(net);
 	if (error) {
 		device_del(dev);
 		return error;
diff --git a/net/core/net-sysfs.h b/net/core/net-sysfs.h
index 778e157..25ec2ee 100644
--- a/net/core/net-sysfs.h
+++ b/net/core/net-sysfs.h
@@ -6,6 +6,9 @@ int netdev_register_kobject(struct net_device *);
 void netdev_unregister_kobject(struct net_device *);
 #ifdef CONFIG_RPS
 int net_rx_queue_update_kobjects(struct net_device *, int old_num, int new_num);
+int netdev_queue_update_kobjects(struct net_device *net,
+				 int old_num, int new_num);
+
 #endif
 
 #endif
-- 
1.7.1


^ permalink raw reply related

* [PATCH 0/2 v3] xps: Transmit Packet Steering
From: Tom Herbert @ 2010-10-21 20:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: eric.dumazet

New version of transmit packet steering.  Separated out the generic
changes in dev_pick_tx into their own patch.

^ permalink raw reply

* [PATCH 1/2 v3] xps: Improvements in TX queue selection
From: Tom Herbert @ 2010-10-21 20:17 UTC (permalink / raw)
  To: davem, netdev; +Cc: eric.dumazet

In dev_pick_tx, don't do work in calculating queue index or setting
the index in the sock unless the device has more than one queue.  This
allows the sock to be set only with a queue index of a multi-queue
device, which is desirable if devices are stacked, as in a tunnel.

We also allow the mapping of a socket to a queue to be changed.  To
maintain in-order packet transmission, a flag (ooo_okay) has been
added to the sk_buff structure.  If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possibility
of creating OOO packets (for instance, there are no packets in flight
for the socket).  This patch includes the modification in TCP output
for setting this flag.
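
The decision described above can be sketched as standalone C. This mirrors the condition in the dev_pick_tx hunk below but is only an illustration: must_repick_queue and pick_tx_index are hypothetical names, and fresh_index stands in for whatever ndo_select_queue or skb_tx_hash would return:

```c
#include <assert.h>
#include <stdbool.h>

/* A cached socket-to-queue mapping is reconsidered when there is none,
 * when the transport says reordering cannot happen (ooo_okay), or when
 * the cached index no longer fits the device. */
static inline bool must_repick_queue(int cached_index, bool ooo_okay,
				     unsigned int real_num_tx_queues)
{
	return cached_index < 0 || ooo_okay ||
	       (unsigned int)cached_index >= real_num_tx_queues;
}

/* Single-queue devices always use queue 0 and do no further work. */
static inline int pick_tx_index(int cached_index, bool ooo_okay,
				unsigned int real_num_tx_queues,
				int fresh_index)
{
	if (real_num_tx_queues <= 1)
		return 0;
	if (must_repick_queue(cached_index, ooo_okay, real_num_tx_queues))
		return fresh_index;
	return cached_index;
}
```

With packets still in flight (ooo_okay clear) and a valid cached index, the socket stays on its queue, which is what preserves in-order transmission.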

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/skbuff.h |    3 ++-
 net/core/dev.c         |   24 ++++++++++++++----------
 net/ipv4/tcp_output.c  |    4 +++-
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e6ba898..19f37a6 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -386,9 +386,10 @@ struct sk_buff {
 #else
 	__u8			deliver_no_wcard:1;
 #endif
+	__u8			ooo_okay:1;
 	kmemcheck_bitfield_end(flags2);
 
-	/* 0/14 bit hole */
+	/* 0/13 bit hole */
 
 #ifdef CONFIG_NET_DMA
 	dma_cookie_t		dma_cookie;
diff --git a/net/core/dev.c b/net/core/dev.c
index b2269ac..a538ed5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2123,28 +2123,32 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 					struct sk_buff *skb)
 {
 	int queue_index;
-	const struct net_device_ops *ops = dev->netdev_ops;
 
-	if (ops->ndo_select_queue) {
-		queue_index = ops->ndo_select_queue(dev, skb);
-		queue_index = dev_cap_txqueue(dev, queue_index);
-	} else {
+	if (dev->real_num_tx_queues > 1) {
 		struct sock *sk = skb->sk;
+
 		queue_index = sk_tx_queue_get(sk);
-		if (queue_index < 0) {
 
-			queue_index = 0;
-			if (dev->real_num_tx_queues > 1)
+		if (queue_index < 0 || skb->ooo_okay ||
+		    queue_index >= dev->real_num_tx_queues) {
+			const struct net_device_ops *ops = dev->netdev_ops;
+			int old_index = queue_index;
+
+			if (ops->ndo_select_queue) {
+				queue_index = ops->ndo_select_queue(dev, skb);
+				queue_index = dev_cap_txqueue(dev, queue_index);
+			} else
 				queue_index = skb_tx_hash(dev, skb);
 
-			if (sk) {
+			if (queue_index != old_index && sk) {
 				struct dst_entry *dst = rcu_dereference_check(sk->sk_dst_cache, 1);
 
 				if (dst && skb_dst(skb) == dst)
 					sk_tx_queue_set(sk, queue_index);
 			}
 		}
-	}
+	} else
+		queue_index = 0;
 
 	skb_set_queue_mapping(skb, queue_index);
 	return netdev_get_tx_queue(dev, queue_index);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 05b1ecf..67b9c9e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -822,8 +822,10 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 							   &md5);
 	tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
 
-	if (tcp_packets_in_flight(tp) == 0)
+	if (tcp_packets_in_flight(tp) == 0) {
 		tcp_ca_event(sk, CA_EVENT_TX_START);
+		skb->ooo_okay = 1;
+	}
 
 	skb_push(skb, tcp_header_size);
 	skb_reset_transport_header(skb);
-- 
1.7.1


^ permalink raw reply related

* RE:
From: Debashis Dutt @ 2010-10-21 19:48 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Rasesh Mody, Jing Huang, Akshay Mathur
In-Reply-To: <20101021.005657.226776344.davem@davemloft.net>

-----Original Message-----
From: David Miller [mailto:davem@davemloft.net] 
Sent: Thursday, October 21, 2010 12:57 AM
To: Debashis Dutt
Cc: netdev@vger.kernel.org; Rasesh Mody; Jing Huang; Akshay Mathur
Subject: 


People are very unlikely to read your posting because you
did not provide a subject line.

>>>
Thanks David, for looking at this. I have already reposted in netdev with the correct
subject line. 

Please provide your suggestions/feedback as required.

Thanks
--Debashis


^ permalink raw reply

* Re: [PATCH v2 07/14] ethtool: Add support for vlan accleration.
From: Jesse Gross @ 2010-10-21 19:43 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <4CBFB320.2060703@intel.com>

On Wed, Oct 20, 2010 at 8:27 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> On 10/20/2010 4:56 PM, Jesse Gross wrote:
>> Now that vlan acceleration is handled consistently regardless of usage,
>> it is possible to enable and disable it at will.  This adds support for
>> Ethtool operations that change the offloading status for debugging
>> purposes, similar to other forms of hardware acceleration.
>>
>
> Jesse,
>
> Not sure if this is enough to get dynamic toggling like this:
> dev->hard_header_len is set depending on offloads at init time in
> vlan_dev_init(). By changing this, LL_RESERVED_SPACE won't work
> correctly and we end up having to call pskb_expand_head(). I think
> this might end up hurting performance.

That's a good point.
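
To make the headroom concern concrete, here is a small userspace sketch. It assumes the LL_RESERVED_SPACE definition as I recall it from include/linux/netdevice.h in kernels of this period (rounding with HH_DATA_MOD of 16); the function name ll_reserved_space is hypothetical:

```c
#include <assert.h>

#define HH_DATA_MOD 16u	/* assumed value, as in kernels of this era */
#define ETH_HLEN    14u	/* Ethernet header length */
#define VLAN_HLEN    4u	/* extra bytes for an 802.1Q tag */

/* Userspace rendition of the assumed LL_RESERVED_SPACE() arithmetic. */
static inline unsigned int ll_reserved_space(unsigned int hard_header_len,
					     unsigned int needed_headroom)
{
	return ((hard_header_len + needed_headroom) & ~(HH_DATA_MOD - 1u))
		+ HH_DATA_MOD;
}
```

Under these assumptions a device initialized with offload on (hard_header_len of ETH_HLEN) reserves 16 bytes, while one that must insert the tag in software (ETH_HLEN + VLAN_HLEN) needs 32, so packets sized for the former lack headroom once offload is switched off.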

>
> That said I think I can probably get this working by fixing up the
> header_ops in vlan_dev.c.  And while I'm at it, add vlan_header_cache
> and vlan_header_cache_update routines. I'll try to get something out
> tomorrow; in the meantime nothing too bad is happening.

That sounds great, thanks.

^ permalink raw reply

* Re: [PATCH v2 00/14] Move vlan acceleration into networking core.
From: Jesse Gross @ 2010-10-21 19:32 UTC (permalink / raw)
  To: David Dillow; +Cc: David Miller, netdev
In-Reply-To: <1287626579.11431.9.camel@obelisk.thedillows.org>

On Wed, Oct 20, 2010 at 7:02 PM, David Dillow <dave@thedillows.org> wrote:
> On Wed, 2010-10-20 at 16:56 -0700, Jesse Gross wrote:
>> The first eleven patches can be applied immediately, while the last three need
>> to wait until all drivers that support vlan acceleration are updated.  If
>> people agree that this patch set makes sense I will go ahead and switch over
>> the dozen or so drivers that would need to change.
>
> Here's a first pass at converting typhoon to the new methods. It is
> compile tested, but I have to put the hardware back in a machine to do
> some testing, which I may not be able to do before the weekend.
>
> Of course, we could just change it to not offload by default if we need
> to push this sooner.
>
> Applies to net-next-2.6 with the vlan changes.

This is great, thanks.  Much better to have someone who actually knows
what's going on, rather than me trying to randomly guess...

^ permalink raw reply
