Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-next] inet_diag: use sock_gen_put()
From: Eric Dumazet @ 2013-10-11 15:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

TCP listener refactoring, part 6 :

Use sock_gen_put() from inet_diag_dump_one_icsk() for future
SYN_RECV support.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_diag.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 41e1c3e..56a964a 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -336,12 +336,9 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *in_s
 		err = 0;
 
 out:
-	if (sk) {
-		if (sk->sk_state == TCP_TIME_WAIT)
-			inet_twsk_put((struct inet_timewait_sock *)sk);
-		else
-			sock_put(sk);
-	}
+	if (sk)
+		sock_gen_put(sk);
+
 out_nosk:
 	return err;
 }

^ permalink raw reply related

* Re: [PATCH net] bridge: allow receiption on disabled port
From: Stephen Hemminger @ 2013-10-11 15:10 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <5257D067.2050607@openwrt.org>

On Fri, 11 Oct 2013 12:18:15 +0200
Felix Fietkau <nbd@openwrt.org> wrote:

> On 2013-10-11 4:35 AM, Stephen Hemminger wrote:
> > This is what I was thinking would be better.
> > 
> > Don't want these packets leaking into PRE_ROUTEING chain or have
> > any chance to get flooded out other ports.
> > 
> > Compile tested only...
> > 
> > I could use another goto instead but that becomes spaghetti and
> > never like to jump back into a block.
> [...]
> >  forward:
> >  	switch (p->state) {
> > +	case BR_STATE_DISABLED:
> > +		if (!ether_addr_equal(p->br->dev->dev_addr, dest))
> > +			goto drop;
> Checking against the bridge device address isn't enough, WPA EAPOL
> packets are addressed to the wifi device MAC address.

Correct, this should be skb->dev->dev_addr which matchs against
the MAC address that frame arrived on.

^ permalink raw reply

* [PATCH net-next v4 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
extra or prefix segments to pass the large packet to the frontend. New
xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
to determine if the frontend is capable of handling such packets.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    9 +++++--
 drivers/net/xen-netback/interface.c |    6 +++--
 drivers/net/xen-netback/netback.c   |   48 +++++++++++++++++++++++++++--------
 drivers/net/xen-netback/xenbus.c    |   29 +++++++++++++++++++--
 include/xen/interface/io/netif.h    |    1 +
 5 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index b4a9a3c..55b8dec 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -87,9 +87,13 @@ struct pending_tx_info {
 struct xenvif_rx_meta {
 	int id;
 	int size;
+	int gso_type;
 	int gso_size;
 };
 
+#define GSO_BIT(type) \
+	(1 << XEN_NETIF_GSO_TYPE_ ## type)
+
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0xFFFF
 
@@ -150,9 +154,10 @@ struct xenvif {
 	u8               fe_dev_addr[6];
 
 	/* Frontend feature information. */
+	int gso_mask;
+	int gso_prefix_mask;
+
 	u8 can_sg:1;
-	u8 gso:1;
-	u8 gso_prefix:1;
 	u8 ip_csum:1;
 	u8 ipv6_csum:1;
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index cb0d8ea..e4aa267 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -214,8 +214,10 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
 
 	if (!vif->can_sg)
 		features &= ~NETIF_F_SG;
-	if (!vif->gso && !vif->gso_prefix)
+	if (~(vif->gso_mask | vif->gso_prefix_mask) & GSO_BIT(TCPV4))
 		features &= ~NETIF_F_TSO;
+	if (~(vif->gso_mask | vif->gso_prefix_mask) & GSO_BIT(TCPV6))
+		features &= ~NETIF_F_TSO6;
 	if (!vif->ip_csum)
 		features &= ~NETIF_F_IP_CSUM;
 	if (!vif->ipv6_csum)
@@ -320,7 +322,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-		NETIF_F_TSO;
+		NETIF_F_TSO | NETIF_F_TSO6;
 	dev->features = dev->hw_features | NETIF_F_RXCSUM;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 65f04e7..b19d407 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -140,7 +140,7 @@ static int max_required_rx_slots(struct xenvif *vif)
 	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
 
 	/* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
-	if (vif->can_sg || vif->gso || vif->gso_prefix)
+	if (vif->can_sg || vif->gso_mask || vif->gso_prefix_mask)
 		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
 
 	return max;
@@ -312,6 +312,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
 	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 
 	meta = npo->meta + npo->meta_prod++;
+	meta->gso_type = XEN_NETIF_GSO_TYPE_NONE;
 	meta->gso_size = 0;
 	meta->size = 0;
 	meta->id = req->id;
@@ -334,6 +335,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 	struct gnttab_copy *copy_gop;
 	struct xenvif_rx_meta *meta;
 	unsigned long bytes;
+	int gso_type;
 
 	/* Data must not cross a page boundary. */
 	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
@@ -392,7 +394,14 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		}
 
 		/* Leave a gap for the GSO descriptor. */
-		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+		else
+			gso_type = XEN_NETIF_GSO_TYPE_NONE;
+
+		if (*head && ((1 << gso_type) & vif->gso_mask))
 			vif->rx.req_cons++;
 
 		*head = 0; /* There must be something in this buffer now. */
@@ -423,14 +432,28 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	unsigned char *data;
 	int head = 1;
 	int old_meta_prod;
+	int gso_type;
+	int gso_size;
 
 	old_meta_prod = npo->meta_prod;
 
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
+		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		gso_size = skb_shinfo(skb)->gso_size;
+	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
+		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+		gso_size = skb_shinfo(skb)->gso_size;
+	} else {
+		gso_type = XEN_NETIF_GSO_TYPE_NONE;
+		gso_size = 0;
+	}
+
 	/* Set up a GSO prefix descriptor, if necessary */
-	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
+	if ((1 << skb_shinfo(skb)->gso_type) & vif->gso_prefix_mask) {
 		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 		meta = npo->meta + npo->meta_prod++;
-		meta->gso_size = skb_shinfo(skb)->gso_size;
+		meta->gso_type = gso_type;
+		meta->gso_size = gso_size;
 		meta->size = 0;
 		meta->id = req->id;
 	}
@@ -438,10 +461,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 	meta = npo->meta + npo->meta_prod++;
 
-	if (!vif->gso_prefix)
-		meta->gso_size = skb_shinfo(skb)->gso_size;
-	else
+	if ((1 << gso_type) & vif->gso_mask) {
+		meta->gso_type = gso_type;
+		meta->gso_size = gso_size;
+	} else {
+		meta->gso_type = XEN_NETIF_GSO_TYPE_NONE;
 		meta->gso_size = 0;
+	}
 
 	meta->size = 0;
 	meta->id = req->id;
@@ -587,7 +613,8 @@ void xenvif_rx_action(struct xenvif *vif)
 
 		vif = netdev_priv(skb->dev);
 
-		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
+		if ((1 << vif->meta[npo.meta_cons].gso_type) &
+		    vif->gso_prefix_mask) {
 			resp = RING_GET_RESPONSE(&vif->rx,
 						 vif->rx.rsp_prod_pvt++);
 
@@ -624,7 +651,8 @@ void xenvif_rx_action(struct xenvif *vif)
 					vif->meta[npo.meta_cons].size,
 					flags);
 
-		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix) {
+		if ((1 << vif->meta[npo.meta_cons].gso_type) &
+		    vif->gso_mask) {
 			struct xen_netif_extra_info *gso =
 				(struct xen_netif_extra_info *)
 				RING_GET_RESPONSE(&vif->rx,
@@ -632,8 +660,8 @@ void xenvif_rx_action(struct xenvif *vif)
 
 			resp->flags |= XEN_NETRXF_extra_info;
 
+			gso->u.gso.type = vif->meta[npo.meta_cons].gso_type;
 			gso->u.gso.size = vif->meta[npo.meta_cons].gso_size;
-			gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
 			gso->u.gso.pad = 0;
 			gso->u.gso.features = 0;
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 7e4dcc9..6a005f1 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -577,15 +577,40 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->can_sg = !!val;
 
+	vif->gso_mask = 0;
+	vif->gso_prefix_mask = 0;
+
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->gso = !!val;
+	if (val)
+		vif->gso_mask |= GSO_BIT(TCPV4);
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-prefix",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->gso_prefix = !!val;
+	if (val)
+		vif->gso_prefix_mask |= GSO_BIT(TCPV4);
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
+			 "%d", &val) < 0)
+		val = 0;
+	if (val)
+		vif->gso_mask |= GSO_BIT(TCPV6);
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6-prefix",
+			 "%d", &val) < 0)
+		val = 0;
+	if (val)
+		vif->gso_prefix_mask |= GSO_BIT(TCPV6);
+
+	if (vif->gso_mask & vif->gso_prefix_mask) {
+		xenbus_dev_fatal(dev, err,
+				 "%s: gso and gso prefix flags are not "
+				 "mutually exclusive",
+				 dev->otherend);
+		return -EOPNOTSUPP;
+	}
 
 	/* Before feature-ipv6-csum-offload was introduced, checksum offload
 	 * was turned on by a zero value in (or lack of)
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index d7dd8d7..be341f9 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -113,6 +113,7 @@ struct xen_netif_tx_request {
 #define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
 
 /* GSO types */
+#define XEN_NETIF_GSO_TYPE_NONE		(0)
 #define XEN_NETIF_GSO_TYPE_TCPV4	(1)
 #define XEN_NETIF_GSO_TYPE_TCPV6	(2)
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v4 0/5] xen-netback: IPv6 offload support
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev

This patch series adds support for checksum and large packet offloads into
xen-netback.
Testing has mainly been done using the Microsoft network hardware
certification suite running in Server 2008R2 VMs with Citrix PV frontends.

v2:
- Fixed Wei's email address in Cc lines

v3:
- Responded to Wei's comments:
 - netif.h now updated with comments and a definition of
   XEN_NETIF_GSO_TYPE_NONE.
 - limited number of pullups
- Responded to Annie's comments:
 - New GSO_BIT macro

v4:
- Responded to more of Wei's comments
- Remove parsing of IPv6 fragment header and added warning

^ permalink raw reply

* [PATCH net-next v4 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

This patch adds a xenstore feature flag, festure-gso-tcpv6, to advertise
that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs
if the frontend passes an extra segment with the new type
XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |   11 ++++++++---
 drivers/net/xen-netback/xenbus.c  |    7 +++++++
 include/xen/interface/io/netif.h  |   10 +++++++++-
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 74c13b9..65f04e7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1096,15 +1096,20 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 		return -EINVAL;
 	}
 
-	/* Currently only TCPv4 S.O. is supported. */
-	if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
+	switch (gso->u.gso.type) {
+	case XEN_NETIF_GSO_TYPE_TCPV4:
+		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		break;
+	case XEN_NETIF_GSO_TYPE_TCPV6:
+		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		break;
+	default:
 		netdev_err(vif->dev, "Bad GSO type %d.\n", gso->u.gso.type);
 		xenvif_fatal_tx_err(vif);
 		return -EINVAL;
 	}
 
 	skb_shinfo(skb)->gso_size = gso->u.gso.size;
-	skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
 
 	/* Header must be checked, and gso_segs computed. */
 	skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 9c9b37d..7e4dcc9 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -105,6 +105,13 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv6",
+				    "%d", sg);
+		if (err) {
+			message = "writing feature-gso-tcpv6";
+			goto abort_transaction;
+		}
+
 		/* We support partial checksum setup for IPv6 packets */
 		err = xenbus_printf(xbt, dev->nodename,
 				    "feature-ipv6-csum-offload",
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index d9fb44739..d7dd8d7 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -61,6 +61,13 @@
  */
 
 /*
+ * "feature-gso-tcpv4" and "feature-gso-tcpv6" advertise the capability to
+ * handle large TCP packets (in IPv4 or IPv6 form respectively). Neither
+ * frontends nor backends are assumed to be capable unless the flags are
+ * present.
+ */
+
+/*
  * This is the 'wire' format for packets:
  *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
  * [Request 2: xen_netif_extra_info]    (only if request 1 has XEN_NETTXF_extra_info)
@@ -105,8 +112,9 @@ struct xen_netif_tx_request {
 #define _XEN_NETIF_EXTRA_FLAG_MORE	(0)
 #define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
 
-/* GSO types - only TCPv4 currently supported. */
+/* GSO types */
 #define XEN_NETIF_GSO_TYPE_TCPV4	(1)
+#define XEN_NETIF_GSO_TYPE_TCPV6	(2)
 
 /*
  * This structure needs to fit within both netif_tx_request and
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v4 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

There is no mechanism to insist that a guest always generates a packet
with good checksum (at least for IPv4) so we must handle checksum
offloading from the guest and hence should set NETIF_F_RXCSUM.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/interface.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 8e92783..cb0d8ea 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -321,7 +321,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		NETIF_F_TSO;
-	dev->features = dev->hw_features;
+	dev->features = dev->hw_features | NETIF_F_RXCSUM;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
 	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v4 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

For performance of VM to VM traffic on a single host it is better to avoid
calculation of TCP/UDP checksum in the sending frontend. To allow this this
patch adds the code necessary to set up partial checksum for IPv6 packets
and xenstore flag feature-ipv6-csum-offload to advertise that fact to
frontends.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |  233 ++++++++++++++++++++++++++++++-------
 drivers/net/xen-netback/xenbus.c  |    9 ++
 2 files changed, 203 insertions(+), 39 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..74c13b9 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -109,15 +109,10 @@ static inline unsigned long idx_to_kaddr(struct xenvif *vif,
 	return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx));
 }
 
-/*
- * This is the amount of packet we copy rather than map, so that the
- * guest can't fiddle with the contents of the headers while we do
- * packet processing on them (netfilter, routing, etc).
+/* This is a miniumum size for the linear area to avoid lots of
+ * calls to __pskb_pull_tail() as we set up checksum offsets.
  */
-#define PKT_PROT_LEN    (ETH_HLEN + \
-			 VLAN_HLEN + \
-			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
-			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
+#define PKT_PROT_LEN 128
 
 static u16 frag_get_pending_idx(skb_frag_t *frag)
 {
@@ -1118,61 +1113,74 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 	return 0;
 }
 
-static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
+{
+	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
+		/* If we need to pullup then pullup to the max, so we
+		 * won't need to do it again.
+		 */
+		int target = min_t(int, skb->len, MAX_TCP_HEADER);
+		__pskb_pull_tail(skb, target - skb_headlen(skb));
+	}
+}
+
+static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
+			     int recalculate_partial_csum)
 {
-	struct iphdr *iph;
+	struct iphdr *iph = (void *)skb->data;
+	unsigned int header_size;
+	unsigned int off;
 	int err = -EPROTO;
-	int recalculate_partial_csum = 0;
 
-	/*
-	 * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
-	 * peers can fail to set NETRXF_csum_blank when sending a GSO
-	 * frame. In this case force the SKB to CHECKSUM_PARTIAL and
-	 * recalculate the partial checksum.
-	 */
-	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
-		vif->rx_gso_checksum_fixup++;
-		skb->ip_summed = CHECKSUM_PARTIAL;
-		recalculate_partial_csum = 1;
-	}
+	off = sizeof(struct iphdr);
 
-	/* A non-CHECKSUM_PARTIAL SKB does not require setup. */
-	if (skb->ip_summed != CHECKSUM_PARTIAL)
-		return 0;
+	header_size = skb->network_header + off + MAX_IPOPTLEN;
+	maybe_pull_tail(skb, header_size);
 
-	if (skb->protocol != htons(ETH_P_IP))
-		goto out;
+	off = iph->ihl * 4;
 
-	iph = (void *)skb->data;
 	switch (iph->protocol) {
 	case IPPROTO_TCP:
-		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct tcphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
 			struct tcphdr *tcph = tcp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct tcphdr);
+			maybe_pull_tail(skb, header_size);
+
 			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - iph->ihl*4,
+							 skb->len - off,
 							 IPPROTO_TCP, 0);
 		}
 		break;
 	case IPPROTO_UDP:
-		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct udphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
 			struct udphdr *udph = udp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct udphdr);
+			maybe_pull_tail(skb, header_size);
+
 			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - iph->ihl*4,
+							 skb->len - off,
 							 IPPROTO_UDP, 0);
 		}
 		break;
 	default:
 		if (net_ratelimit())
 			netdev_err(vif->dev,
-				   "Attempting to checksum a non-TCP/UDP packet, dropping a protocol %d packet\n",
+				   "Attempting to checksum a non-TCP/UDP packet, "
+				   "dropping a protocol %d packet\n",
 				   iph->protocol);
 		goto out;
 	}
@@ -1183,6 +1191,158 @@ out:
 	return err;
 }
 
+static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
+			       int recalculate_partial_csum)
+{
+	int err = -EPROTO;
+	struct ipv6hdr *ipv6h = (void *)skb->data;
+	u8 nexthdr;
+	unsigned int header_size;
+	unsigned int off;
+	bool fragment;
+	bool done;
+
+	done = false;
+
+	off = sizeof(struct ipv6hdr);
+
+	header_size = skb->network_header + off;
+	maybe_pull_tail(skb, header_size);
+
+	nexthdr = ipv6h->nexthdr;
+
+	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
+	       !done) {
+		switch (nexthdr) {
+		case IPPROTO_DSTOPTS:
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING: {
+			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct ipv6_opt_hdr);
+			maybe_pull_tail(skb, header_size);
+
+			nexthdr = hp->nexthdr;
+			off += ipv6_optlen(hp);
+			break;
+		}
+		case IPPROTO_AH: {
+			struct ip_auth_hdr *hp = (void *)(skb->data + off);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct ip_auth_hdr);
+			maybe_pull_tail(skb, header_size);
+
+			nexthdr = hp->nexthdr;
+			off += (hp->hdrlen+2)<<2;
+			break;
+		}
+		case IPPROTO_FRAGMENT:
+			fragment = true;
+			/* fall through */
+		default:
+			done = true;
+			break;
+		}
+	}
+
+	if (!done) {
+		if (net_ratelimit())
+			netdev_err(vif->dev, "Failed to parse packet header\n");
+		goto out;
+	}
+
+	if (fragment) {
+		if (net_ratelimit())
+			netdev_err(vif->dev, "Packet is a fragment!\n");
+		goto out;
+	}
+
+	switch (nexthdr) {
+	case IPPROTO_TCP:
+		if (!skb_partial_csum_set(skb, off,
+					  offsetof(struct tcphdr, check)))
+			goto out;
+
+		if (recalculate_partial_csum) {
+			struct tcphdr *tcph = tcp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct tcphdr);
+			maybe_pull_tail(skb, header_size);
+
+			tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
+						       &ipv6h->daddr,
+						       skb->len - off,
+						       IPPROTO_TCP, 0);
+		}
+		break;
+	case IPPROTO_UDP:
+		if (!skb_partial_csum_set(skb, off,
+					  offsetof(struct udphdr, check)))
+			goto out;
+
+		if (recalculate_partial_csum) {
+			struct udphdr *udph = udp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct udphdr);
+			maybe_pull_tail(skb, header_size);
+
+			udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
+						       &ipv6h->daddr,
+						       skb->len - off,
+						       IPPROTO_UDP, 0);
+		}
+		break;
+	default:
+		if (net_ratelimit())
+			netdev_err(vif->dev,
+				   "Attempting to checksum a non-TCP/UDP packet, "
+				   "dropping a protocol %d packet\n",
+				   nexthdr);
+		goto out;
+	}
+
+	err = 0;
+
+out:
+	return err;
+}
+
+static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+{
+	int err = -EPROTO;
+	int recalculate_partial_csum = 0;
+
+	/* A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
+	 * peers can fail to set NETRXF_csum_blank when sending a GSO
+	 * frame. In this case force the SKB to CHECKSUM_PARTIAL and
+	 * recalculate the partial checksum.
+	 */
+	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
+		vif->rx_gso_checksum_fixup++;
+		skb->ip_summed = CHECKSUM_PARTIAL;
+		recalculate_partial_csum = 1;
+	}
+
+	/* A non-CHECKSUM_PARTIAL SKB does not require setup. */
+	if (skb->ip_summed != CHECKSUM_PARTIAL)
+		return 0;
+
+	if (skb->protocol == htons(ETH_P_IP))
+		err = checksum_setup_ip(vif, skb, recalculate_partial_csum);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		err = checksum_setup_ipv6(vif, skb, recalculate_partial_csum);
+
+	return err;
+}
+
 static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 {
 	unsigned long now = jiffies;
@@ -1428,12 +1588,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 
 		xenvif_fill_frags(vif, skb);
 
-		/*
-		 * If the initial fragment was < PKT_PROT_LEN then
-		 * pull through some bytes from the other fragments to
-		 * increase the linear region to PKT_PROT_LEN bytes.
-		 */
-		if (skb_headlen(skb) < PKT_PROT_LEN && skb_is_nonlinear(skb)) {
+		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
 			__pskb_pull_tail(skb, target - skb_headlen(skb));
 		}
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 99343ca..9c9b37d 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -105,6 +105,15 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		/* We support partial checksum setup for IPv6 packets */
+		err = xenbus_printf(xbt, dev->nodename,
+				    "feature-ipv6-csum-offload",
+				    "%d", 1);
+		if (err) {
+			message = "writing feature-ipv6-csum-offload";
+			goto abort_transaction;
+		}
+
 		/* We support rx-copy path. */
 		err = xenbus_printf(xbt, dev->nodename,
 				    "feature-rx-copy", "%d", 1);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v4 1/5] xen-netback: add support for IPv6 checksum offload to guest
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

Check xenstore flag feature-ipv6-csum-offload to determine if a
guest is happy to accept IPv6 packets with only partial checksum.
Also check analogous feature-ip-csum-offload to determine if a
guest is happy to accept IPv4 packets with only partial checksum
as a replacement for a negated feature-no-csum-offload value and
add a comment to deprecate use of feature-no-csum-offload.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    3 ++-
 drivers/net/xen-netback/interface.c |   10 +++++++---
 drivers/net/xen-netback/xenbus.c    |   24 +++++++++++++++++++++++-
 include/xen/interface/io/netif.h    |   10 ++++++++++
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5715318..b4a9a3c 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -153,7 +153,8 @@ struct xenvif {
 	u8 can_sg:1;
 	u8 gso:1;
 	u8 gso_prefix:1;
-	u8 csum:1;
+	u8 ip_csum:1;
+	u8 ipv6_csum:1;
 
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 01bb854..8e92783 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -216,8 +216,10 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_SG;
 	if (!vif->gso && !vif->gso_prefix)
 		features &= ~NETIF_F_TSO;
-	if (!vif->csum)
+	if (!vif->ip_csum)
 		features &= ~NETIF_F_IP_CSUM;
+	if (!vif->ipv6_csum)
+		features &= ~NETIF_F_IPV6_CSUM;
 
 	return features;
 }
@@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->domid  = domid;
 	vif->handle = handle;
 	vif->can_sg = 1;
-	vif->csum = 1;
+	vif->ip_csum = 1;
 	vif->dev = dev;
 
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
@@ -316,7 +318,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->credit_timeout.expires = jiffies;
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
-	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
+	dev->hw_features = NETIF_F_SG |
+		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+		NETIF_F_TSO;
 	dev->features = dev->hw_features;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 1b08d87..99343ca 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -571,10 +571,32 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->gso_prefix = !!val;
 
+	/* Before feature-ipv6-csum-offload was introduced, checksum offload
+	 * was turned on by a zero value in (or lack of)
+	 * feature-no-csum-offload. Therefore, for compatibility, assume
+	 * that if feature-ip-csum-offload is missing that IP (v4) offload
+	 * is turned on. If this is not the case then the flag will be
+	 * cleared when we subsequently sample feature-no-csum-offload.
+	 */
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-offload",
+			 "%d", &val) < 0)
+		val = 1;
+	vif->ip_csum = !!val;
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-offload",
+			 "%d", &val) < 0)
+		val = 0;
+	vif->ipv6_csum = !!val;
+
+	/* This is deprecated. Frontends should set IP v4 and v6 checksum
+	 * offload using feature-ip-csum-offload and
+	 * feature-ipv6-csum-offload respectively.
+	 */
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-offload",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->csum = !val;
+	if (val)
+		vif->ip_csum = 0;
 
 	/* Map the shared frame, irq etc. */
 	err = xenvif_connect(vif, tx_ring_ref, rx_ring_ref,
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index eb262e3..d9fb44739 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -51,6 +51,16 @@
  */
 
 /*
+ * "feature-no-csum-offload" was used to turn off IPv4 TCP/UDP checksum
+ * offload but is now deprecated. Two new feature flags should now be used
+ * to control checksum offload:
+ * "feature-ip-csum-offload" should be used to turn IPv4 TCP/UDP checksum
+ * offload on or off. If it is missing then the feature is assumed to be on.
+ * "feature-ipv6-csum-offload" should be used to turn IPv6 TCP/UDP checksum
+ * offload on or off. If it is missing then the feature is assumed to be off.
+ */
+
+/*
  * This is the 'wire' format for packets:
  *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
  * [Request 2: xen_netif_extra_info]    (only if request 1 has XEN_NETTXF_extra_info)
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] ipv6: Initialize ip6_tnl.hlen in gre tunnel even if no route is found
From: Hannes Frederic Sowa @ 2013-10-11 15:02 UTC (permalink / raw)
  To: Oussama Ghorbel
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <1381427427-14880-1-git-send-email-ou.ghorbel@gmail.com>

On Thu, Oct 10, 2013 at 06:50:27PM +0100, Oussama Ghorbel wrote:
> The ip6_tnl.hlen (gre and ipv6 headers length) is independent from the
> outgoing interface, so it would be better to initialize it even when no
> route is found, otherwise its value will be zero.
> While I'm not sure if this could happen in real life, but doing that
> will avoid to call the skb_push function with a zero in ip6gre_header
> function.
> 
> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks,

  Hannes

^ permalink raw reply

* Re: [PATCHv2 2/2] Don't compute checksum value for SCTP skb with, CHECKSUM_PARTIAL set
From: Vlad Yasevich @ 2013-10-11 14:25 UTC (permalink / raw)
  To: Fan Du, Neil Horman, steffen.klassert, davem; +Cc: netdev
In-Reply-To: <5257A3D6.3010002@windriver.com>

On 10/11/2013 03:08 AM, Fan Du wrote:
>
>
> On 2013年10月10日 21:11, Neil Horman wrote:
>> On Thu, Oct 10, 2013 at 01:51:36PM +0800, Fan Du wrote:
>>> igb/ixgbe have hardware sctp checksum support, when this feature is
>>> enabled
>>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>>> every thing
>>> up and pack the 16bits result in the checksum field). The result is fail
>>> establishment of sctp communication.
>>>
>> Shouldn't this be fixed in the xfrm code then?  E.g. check the device
>> features
>> for SCTP checksum offloading and and skip the checksum during xfrm
>> output if its
>> available?
>>
>> Or am I missing something?
>> Neil
>>
>>
>
>
>  From cff25810910603ff991f0c56441d6de8dc33a822 Mon Sep 17 00:00:00 2001
> From: Fan Du <fan.du@windriver.com>
> Date: Fri, 11 Oct 2013 14:31:57 +0800
> Subject: [PATCH 2/2] Don't compute checksum value for SCTP skb with
>   CHECKSUM_PARTIAL set
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> IPsec is not enabled in this scenario:
>
> SCTP skb set CHECKSUM_PARTIAL to indicate hardware is capable of doing
> SCTP checksum(crc32-c) scoping the whole SCTP packet range. However when
> such kind of skb is delivered through IPv4/v6 output handler, IPv4/v6 stack
> interpret CHECKSUM_PARTIAL by calling skb_checksum_help to compute 16bits
> checksum value by summing everything up, the whole SCTP packet in software
> manner！ After this skb reach NIC, after hardware doing its SCTP checking
> business, a crc32-c value will overwrite the value IPv4/v6 stack computed
> before.
>
> This patch solves this by introducing skb_is_sctpv4/6 to optimize such
> case.
>
> Signed-off-by: Fan Du <fan.du@windriver.com>
> ---
> v2:
>     Rework this problem by introducing skb_is_scktv4/6
>
> ---
>   include/linux/ip.h     |    5 +++++
>   include/linux/ipv6.h   |    6 ++++++
>   include/linux/skbuff.h |    1 -
>   net/core/skbuff.c      |    1 +
>   net/ipv4/ip_output.c   |    4 +++-
>   net/ipv6/ip6_output.c  |    1 +
>   net/xfrm/xfrm_output.c |   20 +++++++++++++++-----
>   7 files changed, 31 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/ip.h b/include/linux/ip.h
> index 492bc65..f556292 100644
> --- a/include/linux/ip.h
> +++ b/include/linux/ip.h
> @@ -19,6 +19,7 @@
>
>   #include <linux/skbuff.h>
>   #include <uapi/linux/ip.h>
> +#include <uapi/linux/in.h>
>
>   static inline struct iphdr *ip_hdr(const struct sk_buff *skb)
>   {
> @@ -34,4 +35,8 @@ static inline struct iphdr *ipip_hdr(const struct
> sk_buff *skb)
>   {
>       return (struct iphdr *)skb_transport_header(skb);
>   }
> +static inline int skb_is_sctpv4(const struct sk_buff *skb)
> +{
> +    return ip_hdr(skb)->protocol == IPPROTO_SCTP;
> +}
>   #endif    /* _LINUX_IP_H */
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 28ea384..6e17c04 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -2,6 +2,7 @@
>   #define _IPV6_H
>
>   #include <uapi/linux/ipv6.h>
> +#include <uapi/linux/in.h>
>
>   #define ipv6_optlen(p)  (((p)->hdrlen+1) << 3)
>   /*
> @@ -387,4 +388,9 @@ static inline struct raw6_sock *raw6_sk(const struct
> sock *sk)
>         ((__sk)->sk_bound_dev_if == (__dif)))                && \
>        net_eq(sock_net(__sk), (__net)))
>
> +static inline int skb_is_sctpv6(const struct sk_buff *skb)
> +{
> +    return ipv6_hdr(skb)->nexthdr == IPPROTO_SCTP;
> +}
> +
>   #endif /* _IPV6_H */
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2ddb48d..b36d0cc 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2393,7 +2393,6 @@ extern void           skb_split(struct sk_buff *skb,
>   extern int           skb_shift(struct sk_buff *tgt, struct sk_buff *skb,
>                    int shiftlen);
>   extern void           skb_scrub_packet(struct sk_buff *skb, bool xnet);
> -
>   extern struct sk_buff *skb_segment(struct sk_buff *skb,
>                      netdev_features_t features);
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d81cff1..54d6172 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3526,3 +3526,4 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>       nf_reset_trace(skb);
>   }
>   EXPORT_SYMBOL_GPL(skb_scrub_packet);
> +
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index a04d872..8676b2c 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -587,7 +587,9 @@ slow_path_clean:
>
>   slow_path:
>       /* for offloaded checksums cleanup checksum before fragmentation */
> -    if ((skb->ip_summed == CHECKSUM_PARTIAL) && skb_checksum_help(skb))
> +    if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
> +        !skb_is_sctpv4(skb) &&
> +        skb_checksum_help(skb))
>           goto fail;
>       iph = ip_hdr(skb);
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 3a692d5..9b27d95 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -671,6 +671,7 @@ slow_path_clean:
>
>   slow_path:
>       if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
> +        !skb_is_sctpv6(skb) &&
>           skb_checksum_help(skb))
>           goto fail;
>

No, this isn't right.  This is a case where IP fragmentation is
required, and the above code will cause SCTP checksum to not be
computed.

Looks like SCTP needs to compute the checksum in the case where
skb will be fragmented.

An alternative, that will also allow us to get rid of patch 1
in the serices is to have a checksum handler offload function
that can be used to compute checksum in this case.

-vlad

> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index 3bb2cdc..ddef94a 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -180,6 +180,14 @@ static int xfrm_output_gso(struct sk_buff *skb)
>       return 0;
>   }
>
> +static int skb_is_sctp(const struct sk_buff *skb)
> +{
> +    if (skb->protocol == __constant_htons(ETH_P_IP))
> +        return skb_is_sctpv4(skb);
> +    else
> +        return skb_is_sctpv6(skb);
> +}
> +
>   int xfrm_output(struct sk_buff *skb)
>   {
>       struct net *net = dev_net(skb_dst(skb)->dev);
> @@ -189,11 +197,13 @@ int xfrm_output(struct sk_buff *skb)
>           return xfrm_output_gso(skb);
>
>       if (skb->ip_summed == CHECKSUM_PARTIAL) {
> -        err = skb_checksum_help(skb);
> -        if (err) {
> -            XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> -            kfree_skb(skb);
> -            return err;
> +        if (!skb_is_sctp(skb)) {
> +            err = skb_checksum_help(skb);
> +            if (err) {
> +                XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> +                kfree_skb(skb);
> +                return err;
> +            }
>           }
>       }
>

^ permalink raw reply

* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Vlad Yasevich @ 2013-10-11 14:14 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
	Patrick McHardy
In-Reply-To: <1381476853.3172.40.camel@ubuntu-vm-makita>

On 10/11/2013 03:34 AM, Toshiaki Makita wrote:
> On Wed, 2013-10-09 at 11:01 -0400, Vlad Yasevich wrote:
>> On 10/01/2013 07:56 AM, Toshiaki Makita wrote:
>>> On Mon, 2013-09-30 at 12:01 -0400, Vlad Yasevich wrote:
>>>> On 09/30/2013 07:46 AM, Toshiaki Makita wrote:
>>>>> On Fri, 2013-09-27 at 14:10 -0400, Vlad Yasevich wrote:
>>>>>> On 09/27/2013 01:11 PM, Toshiaki Makita wrote:
>>>>>>> On Thu, 2013-09-26 at 10:22 -0400, Vlad Yasevich wrote:
>>>>>>>> On 09/26/2013 06:38 AM, Toshiaki Makita wrote:
>>>>>>>>> On Tue, 2013-09-24 at 13:55 -0400, Vlad Yasevich wrote:
>>>>>>>>>> On 09/24/2013 01:30 PM, Toshiaki Makita wrote:
>>>>>>>>>>> On Tue, 2013-09-24 at 09:35 -0400, Vlad Yasevich wrote:
>>>>>>>>>>>> On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
>>>>>>>>>>>>> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
>>>>>>>>>>>>>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
>>>>>>>>>>>>>>>>> On Thu, 2013-09-12 at 16:00 -0400, David
>>>>>>>>>>>>>>>>> Miller wrote:
>>>>>>>>>>>>>>>>>> From: Toshiaki Makita
>>>>>>>>>>>>>>>>>> <makita.toshiaki@lab.ntt.co.jp> Date: Tue,
>>>>>>>>>>>>>>>>>> 10 Sep 2013 19:27:54 +0900
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> There seem to be some undesirable
>>>>>>>>>>>>>>>>>>> behaviors related with PVID. 1. It has no
>>>>>>>>>>>>>>>>>>> effect assigning PVID to a port. PVID
>>>>>>>>>>>>>>>>>>> cannot be applied to any frame regardless
>>>>>>>>>>>>>>>>>>> of whether we set it or not. 2. FDB
>>>>>>>>>>>>>>>>>>> entries learned via frames applied PVID
>>>>>>>>>>>>>>>>>>> are registered with VID 0 rather than VID
>>>>>>>>>>>>>>>>>>> value of PVID. 3. We can set 0 or 4095 as
>>>>>>>>>>>>>>>>>>> a PVID that are not allowed in IEEE
>>>>>>>>>>>>>>>>>>> 802.1Q. This leads interoperational
>>>>>>>>>>>>>>>>>>> problems such as sending frames with VID
>>>>>>>>>>>>>>>>>>> 4095, which is not allowed in IEEE
>>>>>>>>>>>>>>>>>>> 802.1Q, and treating frames with VID 0 as
>>>>>>>>>>>>>>>>>>> they belong to VLAN 0, which is expected
>>>>>>>>>>>>>>>>>>> to be handled as they have no VID
>>>>>>>>>>>>>>>>>>> according to IEEE 802.1Q.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Note: 2nd and 3rd problems are potential
>>>>>>>>>>>>>>>>>>> and not exposed unless 1st problem is
>>>>>>>>>>>>>>>>>>> fixed, because we cannot activate PVID
>>>>>>>>>>>>>>>>>>> due to it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please work out the issues in patch #2 with
>>>>>>>>>>>>>>>>>> Vlad and resubmit this series.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm hovering between whether we should fix
>>>>>>>>>>>>>>>>> the issue by changing vlan 0 interface
>>>>>>>>>>>>>>>>> behavior in 8021q module or enabling a bridge
>>>>>>>>>>>>>>>>> port to sending priority-tagged frames, or
>>>>>>>>>>>>>>>>> another better way.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you could comment it, I'd appreciate it
>>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> BTW, I think what is discussed in patch #2 is
>>>>>>>>>>>>>>>>> another problem about handling priority-tags,
>>>>>>>>>>>>>>>>> and it exists without this patch set
>>>>>>>>>>>>>>>>> applied. It looks like that we should prepare
>>>>>>>>>>>>>>>>> another patch set than this to fix that
>>>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Should I include patches that fix the
>>>>>>>>>>>>>>>>> priority-tags problem in this patch set and
>>>>>>>>>>>>>>>>> resubmit them all together?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am thinking that we might need to do it in
>>>>>>>>>>>>>>>> bridge and it looks like the simplest way to do
>>>>>>>>>>>>>>>> it is to have default priority regeneration
>>>>>>>>>>>>>>>> table (table 6-5 from 802.1Q doc).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That way I think we would conform to the spec.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately I don't think the default priority
>>>>>>>>>>>>>>> regeneration table resolves the problem because
>>>>>>>>>>>>>>> IEEE 802.1Q says that a VLAN-aware bridge can
>>>>>>>>>>>>>>> transmit untagged or VLAN-tagged frames only (the
>>>>>>>>>>>>>>> end of section 7.5 and 8.1.7).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No mechanism to send priority-tagged frames is
>>>>>>>>>>>>>>> found as far as I can see the standard. I think
>>>>>>>>>>>>>>> the regenerated priority is used for outgoing
>>>>>>>>>>>>>>> PCP field only if egress policy is not untagged
>>>>>>>>>>>>>>> (i.e. transmitting as VLAN-tagged), and unused if
>>>>>>>>>>>>>>> untagged (Section 6.9.2 3rd/4th Paragraph).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we want to transmit priority-tagged frames
>>>>>>>>>>>>>>> from a bridge port, I think we need to implement
>>>>>>>>>>>>>>> a new (optional) feature that is above the
>>>>>>>>>>>>>>> standard, as I stated previously.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How do you feel about adding a per-port policy
>>>>>>>>>>>>>>> that enables a bridge to send priority-tagged
>>>>>>>>>>>>>>> frames instead of untagged frames when egress
>>>>>>>>>>>>>>> policy for the port is untagged? With this
>>>>>>>>>>>>>>> change, we can transmit frames for a given vlan
>>>>>>>>>>>>>>> as either all untagged, all priority-tagged or
>>>>>>>>>>>>>>> all VLAN-tagged.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That would work.  What I am thinking is that we do
>>>>>>>>>>>>>> it by special casing the vid 0 egress policy
>>>>>>>>>>>>>> specification.  Let it be untagged by default and
>>>>>>>>>>>>>> if it is tagged, then we preserve the priority
>>>>>>>>>>>>>> field and forward it on.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This keeps the API stable and doesn't require
>>>>>>>>>>>>>> user/admin from knowing exactly what happens.
>>>>>>>>>>>>>> Default operation conforms to the spec and allows
>>>>>>>>>>>>>> simple change to make it backward-compatible.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think.  I've done a simple prototype of
>>>>>>>>>>>>>> this an it seems to work with the VMs I am testing
>>>>>>>>>>>>>> with.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you saying that - by default, set the 0th bit of
>>>>>>>>>>>>> untagged_bitmap; and - if we unset the 0th bit and
>>>>>>>>>>>>> set the "vid"th bit, we transmit frames classified as
>>>>>>>>>>>>> belonging to VLAN "vid" as priority-tagged?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If so, though it's attractive to keep current API,
>>>>>>>>>>>>> I'm worried about if it could be a bit confusing and
>>>>>>>>>>>>> not intuitive for kernel/iproute2 developers that VID
>>>>>>>>>>>>> 0 has a special meaning only in the egress policy.
>>>>>>>>>>>>> Wouldn't it be better to adding a new member to
>>>>>>>>>>>>> struct net_port_vlans instead of using VID 0 of
>>>>>>>>>>>>> untagged_bitmap?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or are you saying that we use a new flag in struct
>>>>>>>>>>>>> net_port_vlans but use the BRIDGE_VLAN_INFO_UNTAGGED
>>>>>>>>>>>>> bit with VID 0 in netlink to set the flag?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Even in that case, I'm afraid that it might be
>>>>>>>>>>>>> confusing for developers for the same reason. We are
>>>>>>>>>>>>> going to prohibit to specify VID with 0 (and 4095) in
>>>>>>>>>>>>> adding/deleting a FDB entry or a vlan filtering
>>>>>>>>>>>>> entry, but it would allow us to use VID 0 only when a
>>>>>>>>>>>>> vlan filtering entry is configured. I am thinking a
>>>>>>>>>>>>> new nlattr is a straightforward approach to
>>>>>>>>>>>>> configure it.
>>>>>>>>>>>>
>>>>>>>>>>>> By making this an explicit attribute it makes vid 0 a
>>>>>>>>>>>> special case for any automatic tool that would
>>>>>>>>>>>> provision such filtering.  Seeing vid 0 would mean that
>>>>>>>>>>>> these tools would have to know that this would have to
>>>>>>>>>>>> be translated to a different attribute instead of
>>>>>>>>>>>> setting the policy values.
>>>>>>>>>>>
>>>>>>>>>>> Yes, I agree with you that we can do it by the way you
>>>>>>>>>>> explained. What I don't understand is the advantage of
>>>>>>>>>>> using vid 0 over another way such as adding a new
>>>>>>>>>>> nlattr. I think we can indicate transmitting
>>>>>>>>>>> priority-tags explicitly by such a nlattr. Using vid 0
>>>>>>>>>>> seems to be easier to implement than a new nlattr, but,
>>>>>>>>>>> for me, it looks less intuitive and more difficult to
>>>>>>>>>>> maintain because we have to care about vid 0 instead of
>>>>>>>>>>> simply ignoring it.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The point I am trying to make is that regardless of the
>>>>>>>>>> approach someone has to know what to do when enabling
>>>>>>>>>> priority tagged frames.  You proposal would require the
>>>>>>>>>> administrator or config tool to have that knowledge.
>>>>>>>>>> Example is: Admin does: bridge vlan set priority on dev
>>>>>>>>>> eth0 Automated app: if (vid == 0) /* Turn on priority
>>>>>>>>>> tagged frame support */
>>>>>>>>>>
>>>>>>>>>> My proposal would require the bridge filtering
>>>>>>>>>> implementation to have it. user tool: bridge vlan add vid 0
>>>>>>>>>> tagged Automated app:  No special case.
>>>>>>>>>>
>>>>>>>>>> IMO its better to have 1 piece code handling the special
>>>>>>>>>> case then putting it multiple places.
>>>>>>>>>
>>>>>>>>> Thank you for the detailed explanation. Now I understand your
>>>>>>>>> intention.
>>>>>>>>>
>>>>>>>>> I have one question about your proposal. I guess the way to
>>>>>>>>> enable priority-tagged is something like bridge vlan add vid
>>>>>>>>> 10 dev eth0 bridge vlan add vid 10 dev vnet0 pvid untagged
>>>>>>>>> bridge vlan add vid 0 dev vnet0 tagged where vnet0 has sub
>>>>>>>>> interface vnet0.0.
>>>>>>>>>
>>>>>>>>> Here the admin have to know the egress policy is applied to a
>>>>>>>>> frame twice in a certain order when it is transmitted from
>>>>>>>>> the port vnet0 attached, that is, first, a frame with vid 10
>>>>>>>>> get untagged, and then, an untagged frame get
>>>>>>>>> priority-tagged.
>>>>>>>>>
>>>>>>>>> This behavior looks difficult to know without previous
>>>>>>>>> knowledge. Any good idea to avoid such a need for the admin's
>>>>>>>>> additional knowledge?
>>>>>>>>
>>>>>>>> To me, the fact that there is vnet0.0 (or typically, there is
>>>>>>>> eth0.0 in the guest or on the remote host) already tells the
>>>>>>>> admin vlan 0 has to be tagged.  The fact that we codify this in
>>>>>>>> the policy makes it explicit.
>>>>>>>
>>>>>>> My worry is that the admin might not be able to guess how to use
>>>>>>> bridge commands to enable priority-tag without any additional
>>>>>>> hint in "man bridge", "bridge vlan help", etc. I actually
>>>>>>> couldn't hit upon such a usage before seeing example commands you
>>>>>>> gave, because I had never think the egress policy could be
>>>>>>> applied twice.
>>>>>>>
>>>>>>>>
>>>>>>>> However, I can see strong argument to be made for an addition
>>>>>>>> egress policy attribute that could be for instance:
>>>>>>>>
>>>>>>>> bridge vlan add vid 10 dev eth0 pvid bridge vlan add vid 10 dev
>>>>>>>> vnet0 pvid untagged prio_tag
>>>>>>>>
>>>>>>>> But this has the same connotations as wrt to egress policy.
>>>>>>>> The 2 policies are applied: (1) untag the frame. (2) add
>>>>>>>> priority_tag.
>>>>>>>>
>>>>>>>> (2) only happens if initial fame received on eth0 was priority
>>>>>>>> tagged.
>>>>>>>
>>>>>>> If we do so, we will not be able to communicate using vlan 0
>>>>>>> interface under a certain circumstance. Eth0 can be receive mixed
>>>>>>> untagged and priority-tagged frames according to the network
>>>>>>> element it is connected to: for example, Open vSwitch can send
>>>>>>> such two kinds of frames from the same port even if original
>>>>>>> incoming frames belong to the same vlan.
>>>>>>
>>>>>> Which priority would you assign to the frame that was received
>>>>>> untagged?
>>>>>
>>>>> Untagged frame's priority is by default 0, so I think 0 is likely.
>>>>>
>>>>> 802.1Q 6.9.1 i) The received priority value and the drop_eligible
>>>>> parameter value are the values in the M_UNITDATA.indication.
>>>>>
>>>>> M_UNITDATA is passed from ISS.
>>>>>
>>>>> 802.1Q 6.7.1 The priority parameter provided in a data indication
>>>>> primitive shall take the value of the Default User Priority parameter
>>>>> for the Port through which the MAC frame was received. The default
>>>>> value of this parameter is 0, it may be set by management in which
>>>>> case the capability to set it to any of the values 0 through 7 shall
>>>>> be provided.
>>>>>
>>>>>>
>>>>>>> In this situation, we can only receive frames that is
>>>>>>> priority-tagged when received on eth0.
>>>>>>
>>>>>> Not sure I understand.  Let's look at this config: bridge vlan add
>>>>>> vid 10 dev eth0 pvid bridge vlan add vid 10 dev vnet0 pvid untagged
>>>>>> prio_tag
>>>>>>
>>>>>> Here, eth0 is allowed to receive vid 10 tagged, untagged, and
>>>>>> prio_tagged (if we look at the patch 2 from this set). Now, frame
>>>>>> is forwarded to vnet0. 1) if the frame had vid 10 in the tag or was
>>>>>> untagged, it should probably be sent untagged. 2) if the frame had
>>>>>> a priority tag, it should probably be sent as such.
>>>>>>
>>>>>> Now, I think a case could be made that if the frame had any
>>>>>> priority markings in the vlan header, we should try to preserve
>>>>>> those markings if prio_tag is turned on.  We can assume value of 0
>>>>>> means not set.
>>>>>
>>>>> If we don't insert prio_tag when PCP is 0, we might receive mixed
>>>>> priority-tagged and untagged frames on eth0.
>>>>
>>>> Right, and that's what you were trying to handle in your patch:
>>>>
>>>>> +		/* PVID is set on this port.  Any untagged or priority-tagged +
>>>>> * ingress frame is considered to belong to this vlan. */
>>>>
>>>> So, in this case we are prepared to handle the "mixed" scenario on ingress.
>>>>
>>>>> Even if we are sending frames from eth0.0 with some priority other
>>>>> than 0, we could receive frames with priority 0 or untagged on the
>>>>> other side of the bridge.
>>>>> For example, if we receive untagged arp reply on the bridge port, we
>>>>> migit not be able to communicate with such an end station, because
>>>>> untagged reply will not be passed to eth0.0.
>>>>
>>>> So the ARP request was sent tagged, but the reply came back untagged?
>>>
>>> Yes, it can happen.
>>> These are problematic cases.
>>>
>>> Example 1:
>>>               prio_tagged         prio_tagged
>>> +------------+ ---> +------------+ ---> +----------+
>>> |guest eth0.0|------|host1 Bridge|------|host2 eth0|
>>> +------------+ <--- +------------+ <--- +----------+
>>>                untagged            untagged
>>>
>>> Note: Host2 eth0, which is an interface on Linux, can receive
>>> priority-tagged frames if it doesn't have vlan 0 interface (eth0.0).
>>
>> Hmm..  Just to see if this works, I ran the this scenario with
>> a dumb switch in the middle, and it did not work as you noted.
>> I then realized that one of the kernels was rather old and after
>> updating it, behaved differently.  The communication still didn't
>> work, but host2 behaved properly.
>>
>>>>
>>>> How does that work when the end station is attached directly to the
>>>> HW switch instead of a linux bridge?
>>>>
>>>> The station configures eth0.0 and sends priority-tagged traffic to
>>>> the HW switch.  If the HW switch sends back untagged traffic, then
>>>> the untagged traffic will never reach eth0.0.
>>>
>>> Currently we cannot communicate using eth0.0 via directly connected
>>> 802.1Q conformed switch, because we never receive priority-tagged frames
>>> from the switch.
>>> It is not a problem of Linux bridge and is why I wondered whether it
>>> should be fixed by bridge or vlan 0 interface.
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> IMO, if prio_tag is configured, the bridge should send any
>>>>>>> untagged frame as priority-tagged regardless of whatever it is on
>>>>>>> eth0.
>>>>>>
>>>>>> Which priority would you use, 0?  You are not guaranteed to
>>>>>> properly deliver the traffic then for a configuration such as: VM:
>>>>>> eth0: 10.0.0.1/24 eth0.0: 10.0.1.1/24
>>>>>
>>>>> I'd like to use priority 0 for untagged frames.
>>>>>
>>>>> I am assuming that one of our goals is at least that eth0.0 comes to
>>>>> be able to communicate with another end station. It seems to be hard
>>>>> to use both eth0 and eth0.0 simultaneously.
>>>>
>>>> I understand, but I don't agree that we should always tag.
>>>>
>>>> Consider config:
>>>>
>>>>       hw switch <---> (eth0: Linux Bridge: eth1) <--- (em1.0:end station)
>>>>
>>>> If the end station sends priority-tagged traffic it should receive
>>>> priority tagged traffic back.  Otherwise, untagged traffic may be
>>>> dropped by the end station.  This is true whether it is connected to
>>>> the hw switch or Linux bridge.
>>>
>>> Though such a behavior is generally not necessary as far as I can read
>>> 802.1Q spec, it is essential for vlan 0 interface on Linux, I think.
>>> My proposal aims to resolve it at least when we use Linux bridge.
>>>
>>> Example configuration:
>>> 	bridge vlan add vid 10 dev eth1 pvid untagged
>>> 	bridge vlan add vid 10 dev eth0
>>> 	bridge vlan set prio_tag on dev eth1
>>>
>>> Intended behavior:
>>>
>>>           VID10-tagged                     prio_tagged
>>> +---------+ <--- +------------------------+ <--- +-----------------+
>>> |hw switch|------|eth0: Linux Bridge: eth1|------|em1.0:end station|
>>> +---------+ ---> +------------------------+ ---> +-----------------+
>>>           VID10-tagged                     prio_tagged
>>>                                 (always if egress policy untagged)
>>
>> Ok, I think you've convinced me that this is the right approach. The
>> only thing that I am not crazy about is the API.  I'd almost want to
>> introduce a new flag that can be set in a 'vlan add' or 'vlan set'
>> command that communicates a new policy.
>
> I'm glad that we reached a consensus on the approach :)
>
> I agree with you that the API is flag based.
> I'm guessing your intention is that 'vlan add' means a per vlan per port
> policy and 'vlan set' means a per port one, that is,
> 	'vlan add': bridge vlan add vid 10 dev eth1 prio_tag
> 	'vlan set': bridge vlan set prio_tag on dev eth1
>
> I think they can behave differently only when we set untagged to
> multiple vlans on the same port.
>
> 'vlan add' example with vid 10 and 20:
> 	bridge vlan add vid 10 dev eth1 pvid untagged prio_tag
> 	bridge vlan add vid 10 dev eth0
> 	bridge vlan add vid 20 dev eth1 untagged
> 	bridge vlan add vid 20 dev eth2
>
>           VID10-tagged                  prio_tagged (from eth0)
> +---------+ ---> +------------------------+ ---> +-----------------+
> |hw switch|------|eth0                eth1|------|em1.0:end station|
> +---------+      |      Linux Bridge      | ---> +-----------------+
> +---------+      |                        | *untagged*
> |hw switch|------|eth2                    | (from eth2)
> +---------+ ---> +------------------------+
>           VID20-tagged
>

This is what I was thinking of, but I was actually considering that
untagged and prio_tag can not co-exist for the same vlan as they don't
really make sence together anymore.

So one can do:
	bridge vlan add vid 10 dev eth1 pvid prio_tag
	bridge vlan add vid 20 dev eth1 untagged

and recieve VLAN 10 as priority tagged and vlan 20 as untagged.

-vlad

>
> 'vlan set' example with vid 10 and 20:
> 	bridge vlan add vid 10 dev eth1 pvid untagged
> 	bridge vlan add vid 10 dev eth0
> 	bridge vlan add vid 20 dev eth1 untagged
> 	bridge vlan add vid 20 dev eth2
> 	bridge vlan set prio_tag on dev eth1
>
>           VID10-tagged                  prio_tagged (from eth0)
> +---------+ ---> +------------------------+ ---> +-----------------+
> |hw switch|------|eth0                eth1|------|em1.0:end station|
> +---------+      |      Linux Bridge      | ---> +-----------------+
> +---------+      |                        | prio_tagged
> |hw switch|------|eth2                    | (from eth2)
> +---------+ ---> +------------------------+
>           VID20-tagged
>
> Em1.0 can always receive traffic from eth1 if we adopt 'vlan set'.
> However, I cannot imagine when multiple untagged vlans is required, so
> cannot figure out whether 'vlan add' is useful or harmful.
> Anyway, both of approaches are OK with me.
>
> Thanks,
> Toshiaki Makita
>
>>
>> Thanks
>> -vlad
>>
>>>
>>> Thanks,
>>>
>>> Toshiaki Makita
>>>
>>>>
>>>> -vlad
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Toshiaki Makita
>>>>>
>>>>>>
>>>>>> -vlad
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I think I am ok with either approach.  Explicit vid 0 policy is
>>>>>>>> easier for automatic provisioning.   The flag based one is
>>>>>>>> easier for admin/ manual provisioning.
>>>>>>>
>>>>>>> Supposing we have to add something to help or man in any case, I
>>>>>>> think flag based is better. The format below seems to suitable
>>>>>>> for a per-port policy. bridge vlan set prio_tag on dev vnet0
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Toshiaki Makita
>>>>>>>
>>>>>>>>
>>>>>>>> -vlad.
>>>>>>>>
>>>>>>>> -vlad
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks -vlad
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Toshiaki Makita
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> How it is implemented internally in the kernel isn't as
>>>>>>>>>>>> big of an issue. We can do it as a separate flag or as
>>>>>>>>>>>> part of existing policy.
>>>>>>>>>>>>
>>>>>>>>>>>> -vlad
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Toshiaki Makita
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Toshiaki Makita
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Toshiaki Makita
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -- To unsubscribe from this list: send the
>>>>>>>>>>>>>>>>>> line "unsubscribe netdev" in the body of a
>>>>>>>>>>>>>>>>>> message to majordomo@vger.kernel.org More
>>>>>>>>>>>>>>>>>> majordomo info at
>>>>>>>>>>>>>>>>>> http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- To unsubscribe from this list: send the line
>>>>>>>>>>>>>>> "unsubscribe netdev" in the body of a message to
>>>>>>>>>>>>>>> majordomo@vger.kernel.org More majordomo info at
>>>>>>>>>>>>>>> http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply

* Re: [PATCHv2 1/2 ] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Vlad Yasevich @ 2013-10-11 14:04 UTC (permalink / raw)
  To: Fan Du, Neil Horman, steffen.klassert, davem; +Cc: netdev
In-Reply-To: <5257A349.3010605@windriver.com>

On 10/11/2013 03:05 AM, Fan Du wrote:
>
>
> On 2013年10月10日 21:11, Neil Horman wrote:
>> On Thu, Oct 10, 2013 at 01:51:36PM +0800, Fan Du wrote:
>>> igb/ixgbe have hardware sctp checksum support, when this feature is
>>> enabled
>>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>>> every thing
>>> up and pack the 16bits result in the checksum field). The result is fail
>>> establishment of sctp communication.
>>>
>> Shouldn't this be fixed in the xfrm code then?  E.g. check the device
>> features
>> for SCTP checksum offloading and and skip the checksum during xfrm
>> output if its
>> available?
>>
>> Or am I missing something?
>> Neil
>>
>>
>
>
>  From 014276de0877f11d46e1704114a7d91f19221a63 Mon Sep 17 00:00:00 2001
> From: Fan Du <fan.du@windriver.com>
> Date: Fri, 11 Oct 2013 14:24:33 +0800
> Subject: [PATCH 1/2] {xfrm, sctp} Stick to software crc32 even if
> hardware is
>   capable of that
>
> igb/ixgbe have hardware sctp checksum support, when this feature is enabled
> and also IPsec is armed to protect sctp traffic, ugly things happened as
> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every
> thing
> up and pack the 16bits result in the checksum field). The result is fail
> establishment of sctp communication.
>
> Signed-off-by: Fan Du <fan.du@windriver.com>
> ---
> v2:
>    Leave ip_summed as CHECKSUM_PARTIAL as before, the second patch will
> fix this.
>
> ---
>   net/sctp/output.c |   14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 0ac3a65..6de6402 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -372,6 +372,16 @@ static void sctp_packet_set_owner_w(struct sk_buff
> *skb, struct sock *sk)
>       atomic_inc(&sk->sk_wmem_alloc);
>   }
>
> +static int is_xfrm_armed(struct dst_entry *dst)
> +{
> +#ifdef CONFIG_XFRM
> +    /* If dst->xfrm is valid, this skb needs to be transformed */
> +    return dst->xfrm != NULL;
> +#else
> +    return 0;
> +#endif
> +}
> +

I would really prefer to have an accessor function to dst->xfrm, but
I see that everyone codes it inside the #ifdef.  Gack.

>   /* All packets are sent to the network through this function from
>    * sctp_outq_tail().
>    *
> @@ -536,7 +546,9 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>        * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>        */
>       if (!sctp_checksum_disable) {
> -        if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
> +        if ((!(dst->dev->features & NETIF_F_SCTP_CSUM)) ||
> +            is_xfrm_armed(dst)) {
> +
>               __u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>
>               /* 3) Put the resultant value into the checksum field in the

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

-vlad

^ permalink raw reply

* RE: [PATCH net-next v3 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Paul Durrant @ 2013-10-11 14:03 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131011100950.GF15556@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 11 October 2013 11:10
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v3 5/5] xen-netback: enable IPv6 TCP GSO to
> the guest
> 
> On Thu, Oct 10, 2013 at 04:25:33PM +0100, Paul Durrant wrote:
> [...]
> >  	return max;
> > @@ -312,6 +312,7 @@ static struct xenvif_rx_meta
> *get_next_rx_buffer(struct xenvif *vif,
> >  	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
> >
> >  	meta = npo->meta + npo->meta_prod++;
> > +	meta->gso_type = 0;
> 
> Should use XEN_NETIF_GSO_TYPE_NONE here.
> 
> >  	meta->gso_size = 0;
> >  	meta->size = 0;
> [...]
> > +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else {
> > +		gso_type = 0;
> 
> Ditto.
> 
> > +		gso_size = 0;
> > +	}
> > +
> [...]
> > @@ -438,10 +461,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
> >  	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
> >  	meta = npo->meta + npo->meta_prod++;
> >
> > -	if (!vif->gso_prefix)
> > -		meta->gso_size = skb_shinfo(skb)->gso_size;
> > -	else
> > +	if ((1 << gso_type) & vif->gso_mask) {
> > +		meta->gso_type = gso_type;
> > +		meta->gso_size = gso_size;
> > +	} else {
> > +		meta->gso_type = 0;
> 
> Ditto.
> 
> >  		meta->gso_size = 0;
> > +	}
> >
> >  	meta->size = 0;
> >  	meta->id = req->id;
> > @@ -587,7 +613,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >
> >  		vif = netdev_priv(skb->dev);
> >
> > -		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
> > +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> > +		    vif->gso_prefix_mask) {
> 
> The logic is changed here. Why do you remove the test on gso_size?
> 

Sorry, I missed this one before so never responded...

The test on gso_size is no longer necessary, because if gso_size is 0 then gso_type will be XEN_NETIF_GSO_TYPE_NONE and that will never appear in the masks.

  Paul

> >  			resp = RING_GET_RESPONSE(&vif->rx,
> >  						 vif->rx.rsp_prod_pvt++);
> >
> > @@ -624,7 +651,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >  					vif->meta[npo.meta_cons].size,
> >  					flags);
> >
> > -		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix)
> {
> > +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> > +		    vif->gso_mask) {
> 
> Ditto.
> 
> >  			struct xen_netif_extra_info *gso =
> >  				(struct xen_netif_extra_info *)
> >  				RING_GET_RESPONSE(&vif->rx,
> > @@ -632,8 +660,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >
> [...]
> >  	 * was turned on by a zero value in (or lack of)
> > diff --git a/include/xen/interface/io/netif.h
> b/include/xen/interface/io/netif.h
> > index d7dd8d7..41674bc 100644
> > --- a/include/xen/interface/io/netif.h
> > +++ b/include/xen/interface/io/netif.h
> > @@ -113,6 +113,7 @@ struct xen_netif_tx_request {
> >  #define  XEN_NETIF_EXTRA_FLAG_MORE
> 	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
> >
> >  /* GSO types */
> > +#define XEN_NETIF_GSO_TYPE_NONE         (0)
> 
> Indentation.
> 
> Wei.

^ permalink raw reply

* [PATCH 14/14] net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data
From: Matthew Leach @ 2013-10-11 13:52 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: ankit.jindal, steve.mcintyre, tushar.jagad, will.deacon,
	catalin.marinas, netdev, Nicolas Pitre, David S. Miller
In-Reply-To: <1381499540-28794-1-git-send-email-matthew.leach@arm.com>

From: Will Deacon <will.deacon@arm.com>

SMC_outw invokes an endian-aware I/O accessor, which may change the data
endianness before writing to the device. This is not suitable for data
transfers where the memory buffer is simply a string of bytes that does
not require any byte-swapping.

This patches fixes the smc91x SMC_PUSH_DATA macro so that it uses the
string I/O accessor for outputting the leading or trailing halfwords on
halfword-aligned buffers.

Cc: <netdev@vger.kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/smsc/smc91x.h |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
index 5730fe2..98eedb9 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -1124,8 +1124,7 @@ static const char * chip_ids[ 16 ] =  {
 			void __iomem *__ioaddr = ioaddr;		\
 			if (__len >= 2 && (unsigned long)__ptr & 2) {	\
 				__len -= 2;				\
-				SMC_outw(*(u16 *)__ptr, ioaddr,		\
-					DATA_REG(lp));		\
+				SMC_outsw(ioaddr, DATA_REG(lp), __ptr, 1); \
 				__ptr += 2;				\
 			}						\
 			if (SMC_CAN_USE_DATACS && lp->datacs)		\
@@ -1133,8 +1132,7 @@ static const char * chip_ids[ 16 ] =  {
 			SMC_outsl(__ioaddr, DATA_REG(lp), __ptr, __len>>2); \
 			if (__len & 2) {				\
 				__ptr += (__len & ~3);			\
-				SMC_outw(*((u16 *)__ptr), ioaddr,	\
-					 DATA_REG(lp));		\
+				SMC_outsw(ioaddr, DATA_REG(lp), __ptr, 1); \
 			}						\
 		} else if (SMC_16BIT(lp))				\
 			SMC_outsw(ioaddr, DATA_REG(lp), p, (l) >> 1);	\
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH v3] mac80211: port CCMP to cryptoapi's CCM driver
From: Johannes Berg @ 2013-10-11 13:44 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, patches-QSEj5FYQhm4dnm+yROfE0A
In-Reply-To: <1381391720-28771-1-git-send-email-ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

On Thu, 2013-10-10 at 09:55 +0200, Ard Biesheuvel wrote:
> Use the generic CCM aead chaining mode driver rather than a local
> implementation that sits right on top of the core AES cipher.
> 
> This allows the use of accelerated implementations of either
> CCM as a whole or the CTR mode which it encapsulates.

Applied.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: XGMII patch
From: Rayagond K @ 2013-10-11 13:34 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, afleming
In-Reply-To: <CAGVrzcZpZUVNg9zt3x0UsPuV1fSaEy89cg+1=t038_7FzPWjEw@mail.gmail.com>

On Fri, Oct 11, 2013 at 1:52 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> Adding Andy in CC who is the original author
>
> Le 8 oct. 2013 07:19, "Rayagond K" <rayagond@vayavyalabs.com> a écrit :
>
>
>>
>> Hi All,
>>
>> I came across some old patch series to support XGMII in PHYLIB -
>> http://lwn.net/Articles/463344/
>>
>> These patch are not available in mainline kernel still and also PHYLIB
>> doesn't support XGMII also.
>
> This patch set does much more than adding XGMII support, it also adds a
> generic 10G driver, is that what you need or can you live with just
> extending phy_interface_t to know about XGMII and write your own PHY driver?

I need both ie generic 10G driver and extending phy_interface_t to
know about XGMII.

Currently PAL takes care of PHY driver and handles everything related
to PHY device, MAC driver has just registers to PAL and give some
callback function like read_phy, write_phy and adjust_link etc. But
PAL doesn't support XGMII yet, so this patch does that right ?

>
>>
>> Any plan for applying these patch to mainline kernel.
>
> There were some comments on the patchset and there does not seem to have
> been a follow up.

Since this patch adds generic 10G driver and extends PAL to support
XGMII, its worth to follow up comments and add this patch to mainline
kernel asap.

^ permalink raw reply

* Re: [PATCH v2 tip/core/rcu 07/13] ipv6/ip6_tunnel: Apply rcu_access_pointer() to avoid sparse false positive
From: Paul E. McKenney @ 2013-10-11 13:25 UTC (permalink / raw)
  To: Josh Triplett
  Cc: Eric Dumazet, linux-kernel, mingo, laijs, dipankar, akpm,
	mathieu.desnoyers, niv, tglx, peterz, rostedt, dhowells, edumazet,
	darren, fweisbec, sbw, David S. Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev
In-Reply-To: <20131011002044.GA32546@jtriplet-mobl1>

On Thu, Oct 10, 2013 at 05:20:44PM -0700, Josh Triplett wrote:
> On Wed, Oct 09, 2013 at 05:28:33PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 09, 2013 at 05:12:40PM -0700, Eric Dumazet wrote:
> > > On Wed, 2013-10-09 at 16:40 -0700, Josh Triplett wrote:
> > > 
> > > > that.  Constructs like list_del_rcu are much clearer, and not
> > > > open-coded.  Open-coding synchronization code is almost always a Bad
> > > > Idea.
> > > 
> > > OK, so you think there is synchronization code.
> > > 
> > > I will shut up then, no need to waste time.
> > 
> > As you said earlier, we should at least get rid of the memory barrier
> > as long as we are changing the code.
> > 
> > Josh, what would you suggest as the best way to avoid the memory barrier,
> > keep sparse happy, and not be too ugly?
> 
> The more I think about it, the more I realize that assigning an __rcu
> pointer to an __rcu pointer *without* a memory barrier is a sufficiently
> uncommon case that you probably *should* just write an open-coded
> assignment.  Just please put a very clear comment right before it.

Fair enough, will do!  Given earlier email, I believe that Eric is
fine with this, and if he isn't I am sure he will let us know.  ;-)

> I'd originally thought it might make sense to have a macro similar to
> rcu_assign_pointer, but I just don't think this is a common enough case,
> and we don't want people thinking they can use this in general for __rcu
> to __rcu assignments (most of which still need a memory barrier).

Yep, it is a rather small fraction of rcu_assign_pointer() instances.

							Thanx, Paul

^ permalink raw reply

* [PATCH] issues with printk "mppe_compress[%d]: osize too small! ..."
From: Helmut Grohne @ 2013-10-11 12:57 UTC (permalink / raw)
  To: Paul Mackerras, linux-ppp, netdev@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1140 bytes --]

Dear kernel maintainers,

I ran into an issue that causes "mppe_compress[%d]: osize too small! …"
message to be printed. While debugging the issue, I stumbled across two
additional issues with the code in question.

1) The printk gives a "have" and a "need" value. However the "need"
   value does not represent the actual need, because the printed need
   is based on the osize variable instead of the isize variable used in
   the test. This appears to by a typo an is as old as the ppp_mppe.c
   file. The first patch attached changes the parameter to printk to
   print the actual need.

2) The message in question can be triggered by network packets. If that
   happens, there are a lot of printks and the whole system slows down
   to a crawl. Whenever network packets are processed, the printks must
   be rate limited in some way, so my second patch adds a call to
   net_ratelimit(). Note that even though checkpatch suggested to
   convert the printks to netdev_dbg, I felt that a bug fix should be
   minimal and thus not change this aspect.

Please consider these patches and CC me in replies.

Helmut

[-- Attachment #2: 0001-mppe-print-correct-osize-required-for-compression.patch --]
[-- Type: application/octet-stream, Size: 1270 bytes --]

From 77ef588c202c1e8b23f5292e6b64d7cd69a9daba Mon Sep 17 00:00:00 2001
From: Helmut Grohne <h.grohne@cygnusnetworks.de>
Date: Fri, 11 Oct 2013 14:18:50 +0200
Subject: [PATCH] mppe: print correct osize required for compression

An example message looks like this:

mppe_compress[76]: osize too small! (have: 1404 need: 1408)

The value printed as "need" is not the one that is actually needed,
because it is based on osize instead of isize as used in the test. In
the old implementation the "need" value would always 4 greater than the
"have" value.

Signed-off-by: Helmut Grohne <h.grohne@cygnusnetworks.de>
---
 drivers/net/ppp/ppp_mppe.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
index 9a1849a..3d595a8 100644
--- a/drivers/net/ppp/ppp_mppe.c
+++ b/drivers/net/ppp/ppp_mppe.c
@@ -385,7 +385,7 @@ mppe_compress(void *arg, unsigned char *ibuf, unsigned char *obuf,
 		/* Drop the packet if we should encrypt it, but can't. */
 		printk(KERN_DEBUG "mppe_compress[%d]: osize too small! "
 		       "(have: %d need: %d)\n", state->unit,
-		       osize, osize + MPPE_OVHD + 2);
+		       osize, isize + MPPE_OVHD + 2);
 		return -1;
 	}

-- 
1.7.10.4

[-- Attachment #3: Type: text/plain, Size: 1 bytes --]

[-- Attachment #4: 0001-mppe-rate-limit-osize-too-small.patch --]
[-- Type: application/octet-stream, Size: 2557 bytes --]

From 3332d572e21f52d65df841ba85537b65d336007e Mon Sep 17 00:00:00 2001
From: Helmut Grohne <h.grohne@cygnusnetworks.de>
Date: Fri, 11 Oct 2013 14:34:06 +0200
Subject: [PATCH] mppe: rate limit "osize too small"

These messages occur per packet and can be triggered by remote sites. As
such they must be rate limited in order to not clog the kernel log
buffers. The corresponding messages from ppp_generic.c are already rate
limited. Example:

mppe_compress[76]: osize too small! (have: 1404 need: 1408)
ppp: compressor dropped pkt
mppe_compress[76]: osize too small! (have: 1404 need: 1408)
ppp: compressor dropped pkt
mppe_compress[76]: osize too small! (have: 1404 need: 1408)
mppe_compress[76]: osize too small! (have: 1404 need: 1408)

Signed-off-by: Helmut Grohne <h.grohne@cygnusnetworks.de>
---
 drivers/net/ppp/ppp_mppe.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ppp/ppp_mppe.c b/drivers/net/ppp/ppp_mppe.c
index 9a1849a..7e71a7a 100644
--- a/drivers/net/ppp/ppp_mppe.c
+++ b/drivers/net/ppp/ppp_mppe.c
@@ -52,6 +52,7 @@
 #include <linux/string.h>
 #include <linux/crypto.h>
 #include <linux/mm.h>
+#include <linux/net.h>
 #include <linux/ppp_defs.h>
 #include <linux/ppp-comp.h>
 #include <linux/scatterlist.h>
@@ -383,9 +384,10 @@ mppe_compress(void *arg, unsigned char *ibuf, unsigned char *obuf,
 	/* Make sure we have enough room to generate an encrypted packet. */
 	if (osize < isize + MPPE_OVHD + 2) {
 		/* Drop the packet if we should encrypt it, but can't. */
-		printk(KERN_DEBUG "mppe_compress[%d]: osize too small! "
-		       "(have: %d need: %d)\n", state->unit,
-		       osize, osize + MPPE_OVHD + 2);
+		if (net_ratelimit())
+			printk(KERN_DEBUG "mppe_compress[%d]: osize too small! "
+			       "(have: %d need: %d)\n", state->unit,
+			       osize, osize + MPPE_OVHD + 2);
 		return -1;
 	}

@@ -497,9 +499,10 @@ mppe_decompress(void *arg, unsigned char *ibuf, int isize, unsigned char *obuf,
 	 * this is to account for possible PFC.
 	 */
 	if (osize < isize - MPPE_OVHD - 1) {
-		printk(KERN_DEBUG "mppe_decompress[%d]: osize too small! "
-		       "(have: %d need: %d)\n", state->unit,
-		       osize, isize - MPPE_OVHD - 1);
+		if (net_ratelimit())
+			printk(KERN_DEBUG "mppe_decompress[%d]: osize too small! "
+			       "(have: %d need: %d)\n", state->unit,
+			       osize, isize - MPPE_OVHD - 1);
 		return DECOMP_ERROR;
 	}
 	osize = isize - MPPE_OVHD - 2;	/* assume no PFC */
-- 
1.7.10.4

^ permalink raw reply related

* RE: [Xen-devel] [PATCH net-next v3 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-11 12:08 UTC (permalink / raw)
  To: Paul Durrant, Wei Liu
  Cc: netdev@vger.kernel.org, Ian Campbell, Wei Liu, David Vrabel,
	xen-devel@lists.xen.org
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD013262E@AMSPEX01CL01.citrite.net>

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Paul Durrant
> Sent: 11 October 2013 12:10
> To: Wei Liu
> Cc: netdev@vger.kernel.org; Ian Campbell; Wei Liu; David Vrabel; xen-
> devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH net-next v3 2/5] xen-netback: add support
> for IPv6 checksum offload from guest
> 
> > -----Original Message-----
> > From: Wei Liu [mailto:wei.liu2@citrix.com]
> > Sent: 11 October 2013 11:10
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David
> Vrabel;
> > Ian Campbell
> > Subject: Re: [PATCH net-next v3 2/5] xen-netback: add support for IPv6
> > checksum offload from guest
> >
> > On Thu, Oct 10, 2013 at 04:25:30PM +0100, Paul Durrant wrote:
> > [...]
> > > -#define PKT_PROT_LEN    (ETH_HLEN + \
> > > -			 VLAN_HLEN + \
> > > -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> > > -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> > > +#define PKT_PROT_LEN 128
> >
> > I'm not against changing this, but could you please add comment on why
> > 128 is chosen.
> >
> 
> Hmm. I did mod the comment, but that change seems to have got lost
> somewhere.
> 
> > >
> > [...]
> > > +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> > &&
> > > +	       !done) {
> > > +		switch (nexthdr) {
> > > +		case IPPROTO_FRAGMENT: {
> > > +			struct frag_hdr *hp = (void *)(skb->data + off);
> > > +
> > > +			header_size = skb->network_header +
> > > +				off +
> > > +				sizeof(struct frag_hdr);
> > > +			maybe_pull_tail(skb, header_size);
> > > +
> > > +			nexthdr = hp->nexthdr;
> > > +			off += 8;
> >
> > Can you use sizeof(frag_hdr) instead of 8?
> >
> > > +			break;
> > > +		}
> > > +		case IPPROTO_DSTOPTS:
> > > +		case IPPROTO_HOPOPTS:
> > > +		case IPPROTO_ROUTING: {
> > > +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> > > +
> > > +			header_size = skb->network_header +
> > > +				off +
> > > +				sizeof(struct ipv6_opt_hdr);
> > > +			maybe_pull_tail(skb, header_size);
> > > +
> > > +			nexthdr = hp->nexthdr;
> > > +			off += ipv6_optlen(hp);
> > > +			break;
> > > +		}
> > > +		case IPPROTO_AH: {
> > > +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> > > +
> > > +			header_size = skb->network_header +
> > > +				off +
> > > +				sizeof(struct ip_auth_hdr);
> > > +			maybe_pull_tail(skb, header_size);
> > > +
> > > +			nexthdr = hp->nexthdr;
> > > +			off += (hp->hdrlen+2)<<2;
> > > +			break;
> > > +		}
> > > +		case IPPROTO_TCP:
> > > +		case IPPROTO_UDP:
> > > +			found = true;
> > > +			/* Fall through */
> > > +		default:
> > > +			done = true;
> > > +			break;
> >
> > The above 'switch' doesn't seem to cover all IPPROTO_*. Will it cause
> > the loop to exit too early? In other words, does any IPPROTO_* not
> > listed above marks the end of parsing?
> >
> 
> AFAIK, hitting anything not in that switch would mean we don't have a TCP or
> UDP packet. I think the original code structure made that clearer so I'm going
> to go back to that.
> 

Looking at this again I realize I should not parse the fragment header; there's no way anyone should be sending a fragment with partial checksum.

  Paul

> > [...]
> > >  	unsigned long now = jiffies;
> > > @@ -1428,12 +1598,7 @@ static int xenvif_tx_submit(struct xenvif *vif,
> int
> > budget)
> > >
> > >  		xenvif_fill_frags(vif, skb);
> > >
> > > -		/*
> > > -		 * If the initial fragment was < PKT_PROT_LEN then
> > > -		 * pull through some bytes from the other fragments to
> > > -		 * increase the linear region to PKT_PROT_LEN bytes.
> > > -		 */
> > > -		if (skb_headlen(skb) < PKT_PROT_LEN &&
> > skb_is_nonlinear(skb)) {
> > > +		if (skb_is_nonlinear(skb) && skb_headlen(skb) <
> > PKT_PROT_LEN) {
> >
> > This change is not necessary: the comment is useful, and swapping two
> > operands of && doesn't have any effect.
> 
> Sorry, that was too much cutting and pasting around. I'll just ditch that hunk.
> 
>   Paul
> 
> >
> > Wei.
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply

* RE: [PATCH net-next v3 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Paul Durrant @ 2013-10-11 11:11 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131011100950.GF15556@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 11 October 2013 11:10
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v3 5/5] xen-netback: enable IPv6 TCP GSO to
> the guest
> 
> On Thu, Oct 10, 2013 at 04:25:33PM +0100, Paul Durrant wrote:
> [...]
> >  	return max;
> > @@ -312,6 +312,7 @@ static struct xenvif_rx_meta
> *get_next_rx_buffer(struct xenvif *vif,
> >  	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
> >
> >  	meta = npo->meta + npo->meta_prod++;
> > +	meta->gso_type = 0;
> 
> Should use XEN_NETIF_GSO_TYPE_NONE here.
> 
> >  	meta->gso_size = 0;
> >  	meta->size = 0;
> [...]
> > +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else {
> > +		gso_type = 0;
> 
> Ditto.
> 
> > +		gso_size = 0;
> > +	}
> > +
> [...]
> > @@ -438,10 +461,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
> >  	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
> >  	meta = npo->meta + npo->meta_prod++;
> >
> > -	if (!vif->gso_prefix)
> > -		meta->gso_size = skb_shinfo(skb)->gso_size;
> > -	else
> > +	if ((1 << gso_type) & vif->gso_mask) {
> > +		meta->gso_type = gso_type;
> > +		meta->gso_size = gso_size;
> > +	} else {
> > +		meta->gso_type = 0;
> 
> Ditto.
> 
> >  		meta->gso_size = 0;
> > +	}
> >
> >  	meta->size = 0;
> >  	meta->id = req->id;
> > @@ -587,7 +613,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >
> >  		vif = netdev_priv(skb->dev);
> >
> > -		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
> > +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> > +		    vif->gso_prefix_mask) {
> 
> The logic is changed here. Why do you remove the test on gso_size?
> 
> >  			resp = RING_GET_RESPONSE(&vif->rx,
> >  						 vif->rx.rsp_prod_pvt++);
> >
> > @@ -624,7 +651,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >  					vif->meta[npo.meta_cons].size,
> >  					flags);
> >
> > -		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix)
> {
> > +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> > +		    vif->gso_mask) {
> 
> Ditto.
> 
> >  			struct xen_netif_extra_info *gso =
> >  				(struct xen_netif_extra_info *)
> >  				RING_GET_RESPONSE(&vif->rx,
> > @@ -632,8 +660,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >
> [...]
> >  	 * was turned on by a zero value in (or lack of)
> > diff --git a/include/xen/interface/io/netif.h
> b/include/xen/interface/io/netif.h
> > index d7dd8d7..41674bc 100644
> > --- a/include/xen/interface/io/netif.h
> > +++ b/include/xen/interface/io/netif.h
> > @@ -113,6 +113,7 @@ struct xen_netif_tx_request {
> >  #define  XEN_NETIF_EXTRA_FLAG_MORE
> 	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
> >
> >  /* GSO types */
> > +#define XEN_NETIF_GSO_TYPE_NONE         (0)
> 
> Indentation.
> 

Agreed. I managed to lose a load of changes and this patch was one I had to re-construct; I'll fix.

  Paul

^ permalink raw reply

* RE: [PATCH net-next v3 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-11 11:09 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131011100943.GE15556@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 11 October 2013 11:10
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v3 2/5] xen-netback: add support for IPv6
> checksum offload from guest
> 
> On Thu, Oct 10, 2013 at 04:25:30PM +0100, Paul Durrant wrote:
> [...]
> > -#define PKT_PROT_LEN    (ETH_HLEN + \
> > -			 VLAN_HLEN + \
> > -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> > -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> > +#define PKT_PROT_LEN 128
> 
> I'm not against changing this, but could you please add comment on why
> 128 is chosen.
> 

Hmm. I did mod the comment, but that change seems to have got lost somewhere.

> >
> [...]
> > +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> &&
> > +	       !done) {
> > +		switch (nexthdr) {
> > +		case IPPROTO_FRAGMENT: {
> > +			struct frag_hdr *hp = (void *)(skb->data + off);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct frag_hdr);
> > +			maybe_pull_tail(skb, header_size);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += 8;
> 
> Can you use sizeof(frag_hdr) instead of 8?
> 
> > +			break;
> > +		}
> > +		case IPPROTO_DSTOPTS:
> > +		case IPPROTO_HOPOPTS:
> > +		case IPPROTO_ROUTING: {
> > +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct ipv6_opt_hdr);
> > +			maybe_pull_tail(skb, header_size);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += ipv6_optlen(hp);
> > +			break;
> > +		}
> > +		case IPPROTO_AH: {
> > +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct ip_auth_hdr);
> > +			maybe_pull_tail(skb, header_size);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += (hp->hdrlen+2)<<2;
> > +			break;
> > +		}
> > +		case IPPROTO_TCP:
> > +		case IPPROTO_UDP:
> > +			found = true;
> > +			/* Fall through */
> > +		default:
> > +			done = true;
> > +			break;
> 
> The above 'switch' doesn't seem to cover all IPPROTO_*. Will it cause
> the loop to exit too early? In other words, does any IPPROTO_* not
> listed above marks the end of parsing?
> 

AFAIK, hitting anything not in that switch would mean we don't have a TCP or UDP packet. I think the original code structure made that clearer so I'm going to go back to that.

> [...]
> >  	unsigned long now = jiffies;
> > @@ -1428,12 +1598,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int
> budget)
> >
> >  		xenvif_fill_frags(vif, skb);
> >
> > -		/*
> > -		 * If the initial fragment was < PKT_PROT_LEN then
> > -		 * pull through some bytes from the other fragments to
> > -		 * increase the linear region to PKT_PROT_LEN bytes.
> > -		 */
> > -		if (skb_headlen(skb) < PKT_PROT_LEN &&
> skb_is_nonlinear(skb)) {
> > +		if (skb_is_nonlinear(skb) && skb_headlen(skb) <
> PKT_PROT_LEN) {
> 
> This change is not necessary: the comment is useful, and swapping two
> operands of && doesn't have any effect.

Sorry, that was too much cutting and pasting around. I'll just ditch that hunk.

  Paul

> 
> Wei.

^ permalink raw reply

* Re: [PATCH net] bridge: allow receiption on disabled port
From: Felix Fietkau @ 2013-10-11 10:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20131010193551.462fc1f8@nehalam.linuxnetplumber.net>

On 2013-10-11 4:35 AM, Stephen Hemminger wrote:
> This is what I was thinking would be better.
> 
> Don't want these packets leaking into PRE_ROUTEING chain or have
> any chance to get flooded out other ports.
> 
> Compile tested only...
> 
> I could use another goto instead but that becomes spaghetti and
> never like to jump back into a block.
[...]
>  forward:
>  	switch (p->state) {
> +	case BR_STATE_DISABLED:
> +		if (!ether_addr_equal(p->br->dev->dev_addr, dest))
> +			goto drop;
Checking against the bridge device address isn't enough, WPA EAPOL
packets are addressed to the wifi device MAC address.

- Felix

^ permalink raw reply

* Re: [PATCH net-next v3 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Wei Liu @ 2013-10-11 10:09 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381418733-20470-6-git-send-email-paul.durrant@citrix.com>

On Thu, Oct 10, 2013 at 04:25:33PM +0100, Paul Durrant wrote:
[...]
>  	return max;
> @@ -312,6 +312,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
>  	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
>  
>  	meta = npo->meta + npo->meta_prod++;
> +	meta->gso_type = 0;

Should use XEN_NETIF_GSO_TYPE_NONE here.

>  	meta->gso_size = 0;
>  	meta->size = 0;
[...]
> +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else {
> +		gso_type = 0;

Ditto.

> +		gso_size = 0;
> +	}
> +
[...]
> @@ -438,10 +461,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
>  	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
>  	meta = npo->meta + npo->meta_prod++;
>  
> -	if (!vif->gso_prefix)
> -		meta->gso_size = skb_shinfo(skb)->gso_size;
> -	else
> +	if ((1 << gso_type) & vif->gso_mask) {
> +		meta->gso_type = gso_type;
> +		meta->gso_size = gso_size;
> +	} else {
> +		meta->gso_type = 0;

Ditto.

>  		meta->gso_size = 0;
> +	}
>  
>  	meta->size = 0;
>  	meta->id = req->id;
> @@ -587,7 +613,8 @@ void xenvif_rx_action(struct xenvif *vif)
>  
>  		vif = netdev_priv(skb->dev);
>  
> -		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
> +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> +		    vif->gso_prefix_mask) {

The logic is changed here. Why do you remove the test on gso_size?

>  			resp = RING_GET_RESPONSE(&vif->rx,
>  						 vif->rx.rsp_prod_pvt++);
>  
> @@ -624,7 +651,8 @@ void xenvif_rx_action(struct xenvif *vif)
>  					vif->meta[npo.meta_cons].size,
>  					flags);
>  
> -		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix) {
> +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> +		    vif->gso_mask) {

Ditto.

>  			struct xen_netif_extra_info *gso =
>  				(struct xen_netif_extra_info *)
>  				RING_GET_RESPONSE(&vif->rx,
> @@ -632,8 +660,8 @@ void xenvif_rx_action(struct xenvif *vif)
>  
[...]
>  	 * was turned on by a zero value in (or lack of)
> diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
> index d7dd8d7..41674bc 100644
> --- a/include/xen/interface/io/netif.h
> +++ b/include/xen/interface/io/netif.h
> @@ -113,6 +113,7 @@ struct xen_netif_tx_request {
>  #define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
>  
>  /* GSO types */
> +#define XEN_NETIF_GSO_TYPE_NONE         (0)

Indentation.

Wei.

^ permalink raw reply

* Re: [PATCH net-next v3 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Wei Liu @ 2013-10-11 10:09 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381418733-20470-3-git-send-email-paul.durrant@citrix.com>

On Thu, Oct 10, 2013 at 04:25:30PM +0100, Paul Durrant wrote:
[...]
> -#define PKT_PROT_LEN    (ETH_HLEN + \
> -			 VLAN_HLEN + \
> -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> +#define PKT_PROT_LEN 128

I'm not against changing this, but could you please add comment on why
128 is chosen.

>  
[...]
> +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
> +	       !done) {
> +		switch (nexthdr) {
> +		case IPPROTO_FRAGMENT: {
> +			struct frag_hdr *hp = (void *)(skb->data + off);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct frag_hdr);
> +			maybe_pull_tail(skb, header_size);
> +
> +			nexthdr = hp->nexthdr;
> +			off += 8;

Can you use sizeof(frag_hdr) instead of 8?

> +			break;
> +		}
> +		case IPPROTO_DSTOPTS:
> +		case IPPROTO_HOPOPTS:
> +		case IPPROTO_ROUTING: {
> +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct ipv6_opt_hdr);
> +			maybe_pull_tail(skb, header_size);
> +
> +			nexthdr = hp->nexthdr;
> +			off += ipv6_optlen(hp);
> +			break;
> +		}
> +		case IPPROTO_AH: {
> +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct ip_auth_hdr);
> +			maybe_pull_tail(skb, header_size);
> +
> +			nexthdr = hp->nexthdr;
> +			off += (hp->hdrlen+2)<<2;
> +			break;
> +		}
> +		case IPPROTO_TCP:
> +		case IPPROTO_UDP:
> +			found = true;
> +			/* Fall through */
> +		default:
> +			done = true;
> +			break;

The above 'switch' doesn't seem to cover all IPPROTO_*. Will it cause
the loop to exit too early? In other words, does any IPPROTO_* not
listed above marks the end of parsing?

[...]
>  	unsigned long now = jiffies;
> @@ -1428,12 +1598,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
>  
>  		xenvif_fill_frags(vif, skb);
>  
> -		/*
> -		 * If the initial fragment was < PKT_PROT_LEN then
> -		 * pull through some bytes from the other fragments to
> -		 * increase the linear region to PKT_PROT_LEN bytes.
> -		 */
> -		if (skb_headlen(skb) < PKT_PROT_LEN && skb_is_nonlinear(skb)) {
> +		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {

This change is not necessary: the comment is useful, and swapping two
operands of && doesn't have any effect.

Wei.

^ permalink raw reply

* [patch] farsync: fix info leak in ioctl
From: Dan Carpenter @ 2013-10-11  9:50 UTC (permalink / raw)
  To: Kevin Curtis; +Cc: Salva Peiró, security, netdev
In-Reply-To: <5257BFBA.7030405@ai2.upv.es>

From: Salva Peiró <speiro@ai2.upv.es>

The fst_get_iface() code fails to initialize the two padding bytes of
struct sync_serial_settings after the ->loopback member. Add an explicit
memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

---
 linux-3.4-xm/drivers/net/wan/farsync.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-3.4-xm/drivers/net/wan/farsync.c b/linux-3.4-xm/drivers/net/wan/farsync.c
index 1a62318..3710427 100644
--- a/drivers/net/wan/farsync.c
+++ b/drivers/net/wan/farsync.c
@@ -1972,6 +1972,7 @@ fst_get_iface(struct fst_card_info *card, struct fst_port_info *port,
 	}
 
 	i = port->index;
+	memset(&sync, 0, sizeof(sync));
 	sync.clock_rate = FST_RDL(card, portConfig[i].lineSpeed);
 	/* Lucky card and linux use same encoding here */
 	sync.clock_type = FST_RDB(card, portConfig[i].internalClock) ==
-- 
1.7.10.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox