Netdev List
* [PATCHv2 net-next 0/2] sunvnet: Use multiple Tx queues.
From: Sowmini Varadhan @ 2014-10-30 16:45 UTC (permalink / raw)
  To: davem, sowmini.varadhan; +Cc: netdev


v2: moved tcp fix out of this series per David Miller's feedback

The primary objective of this patch-set is to address the suggestion from
  http://marc.info/?l=linux-netdev&m=140790778931563&w=2
With the changes in Patch 2, every vnet_port will get packets from
a single tx-queue, and flow-control/head-of-line-blocking is
confined to the vnet_ports that share that tx queue (as opposed to
flow-controlling *all* peers).

Patch 1 is an optimization that resets the DATA_READY bit when
we re-enable Rx interrupts.  This optimization lets us exit quickly 
from vnet_event_napi() when new data has not triggered an interrupt.

Sowmini Varadhan (3):
  Correction to RFC number in comment
  Reset LDC_EVENT_DATA_READY when napi completes.
  Use one Tx queue per vnet_port

 drivers/net/ethernet/sun/sunvnet.c | 95 +++++++++++++++++++++++++-------------
 drivers/net/ethernet/sun/sunvnet.h |  2 +
 net/ipv4/tcp_input.c               |  2 +-
 3 files changed, 67 insertions(+), 32 deletions(-)

-- 
1.8.4.2


* RE: [PATCH net-next 7/8] net: Add calaulation of non folded IPV6 pseudo header checksum
From: David Laight @ 2014-10-30 16:39 UTC (permalink / raw)
  To: 'Or Gerlitz'
  Cc: David S. Miller, netdev@vger.kernel.org, Matan Barak, Amir Vadai,
	Saeed Mahameed, Shani Michaeli
In-Reply-To: <5452682B.2030802@mellanox.com>

From: Or Gerlitz [mailto:ogerlitz@mellanox.com]
> On 10/30/2014 6:25 PM, David Laight wrote:
> >> >+static inline __wsum csum_ipv6_magic_nofold(const struct in6_addr *saddr,
> >> >+					    const struct in6_addr *daddr,
> >> >+					    __u32 len, unsigned short proto,
> >> >+					    __wsum sum)
> >> >+{
> >> >+	__wsum res = sum;
> >> >+
> >> >+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[0]);
> >> >+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[1]);
> >> >+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[2]);
> >> >+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[3]);
> >> >+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[0]);
> >> >+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[1]);
> >> >+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[2]);
> >> >+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[3]);
>
> > That probably generates a very long dependency chain.
> >
> 
> Could you clarify this comment a bit?

csum_add() probably generates a 32bit 'add with carry' instruction.
So the above generates 8 instructions that have to be executed in series
(dependencies on the register and the carry flag).

On a 64bit cpu there are other options, e.g. adding 32bit values into
several 64bit registers, then adding those together and finally
collapsing the value to 32 then 16 bits.

Maybe __wsum does end up being 64bit (I have not looked), but gcc won't
generate a 'tree' of additions; it will still generate a dependency chain.

Hopefully the software 'checksum a buffer' function is written to
avoid these problems.

	David
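
A minimal user-space sketch of the point made above, under stated assumptions: the helper names (csum_add32, sum_words_chain, sum_words_tree, fold64_to_32, fold32_to_16) are hypothetical and are not the kernel's csum_add() or the code in the patch under review. It contrasts a serial end-around-carry chain with independent 64bit partial sums that are combined and then collapsed to 32 and 16 bits. Whether a given compiler and CPU actually run the second form faster is not claimed here.

```c
#include <stdint.h>

/* 32-bit one's-complement add: the carry out is added back in. */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
	uint32_t r = a + b;

	return r + (r < b);	/* (r < b) is 1 exactly when the add wrapped */
}

/* Serial form: every step needs the previous result and its carry. */
static uint32_t sum_words_chain(const uint32_t w[8], uint32_t init)
{
	uint32_t res = init;
	int i;

	for (i = 0; i < 8; i++)
		res = csum_add32(res, w[i]);
	return res;
}

/* Collapse a 64-bit accumulator to a 32-bit one's-complement sum. */
static uint32_t fold64_to_32(uint64_t acc)
{
	acc = (acc & 0xffffffffu) + (acc >> 32);	/* may carry into bit 32 */
	acc = (acc & 0xffffffffu) + (acc >> 32);	/* absorb that carry */
	return (uint32_t)acc;
}

/* Collapse a 32-bit sum to the final 16-bit one's-complement sum. */
static uint16_t fold32_to_16(uint32_t sum)
{
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Tree form: four independent 64-bit partial sums, so no add waits
 * on a carry flag; the carries pile up in the upper 32 bits and are
 * folded back in once at the end.
 */
static uint32_t sum_words_tree(const uint32_t w[8], uint32_t init)
{
	uint64_t a = (uint64_t)w[0] + w[1];
	uint64_t b = (uint64_t)w[2] + w[3];
	uint64_t c = (uint64_t)w[4] + w[5];
	uint64_t d = (uint64_t)w[6] + w[7];

	return fold64_to_32((a + b) + (c + d) + init);
}
```

Both forms yield the same one's-complement sum; the difference is only in how many additions can be in flight at once.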


* Re: [PATCH net-next 7/8] net: Add calaulation of non folded IPV6 pseudo header checksum
From: Or Gerlitz @ 2014-10-30 16:32 UTC (permalink / raw)
  To: David Laight
  Cc: David S. Miller, netdev@vger.kernel.org, Matan Barak, Amir Vadai,
	Saeed Mahameed, Shani Michaeli
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D1C9E2415@AcuExch.aculab.com>

On 10/30/2014 6:25 PM, David Laight wrote:
>> >+static inline __wsum csum_ipv6_magic_nofold(const struct in6_addr *saddr,
>> >+					    const struct in6_addr *daddr,
>> >+					    __u32 len, unsigned short proto,
>> >+					    __wsum sum)
>> >+{
>> >+	__wsum res = sum;
>> >+
>> >+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[0]);
>> >+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[1]);
>> >+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[2]);
>> >+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[3]);
>> >+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[0]);
>> >+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[1]);
>> >+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[2]);
>> >+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[3]);
> That probably generates a very long dependency chain.
>

Could you clarify this comment a bit?

Or.


* Re: [PATCH net-next 1/3] tcp: Correction to RFC number in comment
From: David Miller @ 2014-10-30 16:32 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: netdev
In-Reply-To: <20141029192729.GF6582@oracle.com>

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Wed, 29 Oct 2014 15:27:29 -0400

> Challenge ACK is described in RFC 5961, fix typo.
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

This TCP change has nothing to do with your sunvnet driver changes.

Please do not ever mix unrelated changes like this within a series.

Submit the TCP change on its own, and then respin the sunvnet
driver specific patches separately.

Thanks.


* Re: [GIT PULL nf-next] IPVS Updates for v3.19
From: Pablo Neira Ayuso @ 2014-10-30 16:30 UTC (permalink / raw)
  To: Simon Horman
  Cc: lvs-devel, netdev, netfilter-devel, Wensong Zhang,
	Julian Anastasov
In-Reply-To: <1414457960-20864-1-git-send-email-horms@verge.net.au>

On Tue, Oct 28, 2014 at 09:59:19AM +0900, Simon Horman wrote:
> Hi Pablo,
> 
> please consider these IPVS updates for v3.19.
> 
> The single patch in this series fixes some minor fallout from adding
> support for IPv6 real servers in IPv4 virtual-services and vice versa.
> 
> It should not have any run-time effect other than perhaps saving a few cycles.
> 
> 
> The following changes since commit 61ed53deb1c6a4386d8710dbbfcee8779c381931:
> 
>   Merge tag 'ntb-3.18' of git://github.com/jonmason/ntb (2014-10-19 12:58:22 -0700)
> 
> are available in the git repository at:
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next.git tags/ipvs-for-v3.19

Pulled, thanks Simon.


* Re: [PATCH] net: gianfar: fix dma check map error when DMA_API_DEBUG is enabled
From: Claudiu Manoil @ 2014-10-30 16:28 UTC (permalink / raw)
  To: Kevin Hao, netdev, David Miller
In-Reply-To: <1414664727-21988-1-git-send-email-haokexin@gmail.com>

On 10/30/2014 12:25 PM, Kevin Hao wrote:

[...]

> @@ -2406,6 +2416,25 @@ static int gfar_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	spin_unlock_irqrestore(&tx_queue->txlock, flags);
>
>   	return NETDEV_TX_OK;
> +
> +dma_map_err:
> +	txbdp = next_txbd(txbdp_start, base, tx_queue->tx_ring_size);
> +	if (do_tstamp)
> +		txbdp = next_txbd(txbdp, base, tx_queue->tx_ring_size);
> +	for (i = 0; i < nr_frags; i++) {
> +		lstatus = txbdp->lstatus;
> +		if (!(lstatus & BD_LFLAG(TXBD_READY)))
> +			break;
> +
> +		txbdp->lstatus = lstatus & ~BD_LFLAG(TXBD_READY);
> +		bufaddr = txbdp->bufPtr;
> +		dma_unmap_page(priv->dev, bufaddr, txbdp->length,
> +			       DMA_TO_DEVICE);
> +		txbdp = next_txbd(txbdp, base, tx_queue->tx_ring_size);
> +	}
> +	gfar_wmb();

Why use the wmb() memory barrier here?

> +	dev_kfree_skb_any(skb);
> +	return NETDEV_TX_OK;
>   }
>

[...]

Hi Dave,

The patch seems ok at first glance (except a minor comment) but I'd like 
to have it tested first because it modifies sensitive code.
I can re-send it to netdev later, after we're done testing it.
Maybe it would be better to stack up a few more gianfar fixes in the 
meantime and send them all to netdev as a pull request, later on.

Thanks,
Claudiu


* RE: [PATCH net-next 7/8] net: Add calaulation of non folded IPV6 pseudo header checksum
From: David Laight @ 2014-10-30 16:25 UTC (permalink / raw)
  To: 'Or Gerlitz', David S. Miller
  Cc: netdev@vger.kernel.org, Matan Barak, Amir Vadai, Saeed Mahameed,
	Shani Michaeli
In-Reply-To: <1414685216-28907-8-git-send-email-ogerlitz@mellanox.com>

From: Or Gerlitz
> From: Shani Michaeli <shanim@mellanox.com>
> 
> Compute IPV6 pseudo header checksum without folding it to a 16 bit
> return value.
> 
> Signed-off-by: Shani Michaeli <shanim@mellanox.com>
> Signed-off-by: Matan Barak <matanb@mellanox.com>
> ---
>  include/net/ip6_checksum.h |   21 +++++++++++++++++++++
>  1 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/include/net/ip6_checksum.h b/include/net/ip6_checksum.h
> index 1a49b73..c45d690 100644
> --- a/include/net/ip6_checksum.h
> +++ b/include/net/ip6_checksum.h
> @@ -41,6 +41,27 @@ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
>  			__wsum csum);
>  #endif
> 
> +static inline __wsum csum_ipv6_magic_nofold(const struct in6_addr *saddr,
> +					    const struct in6_addr *daddr,
> +					    __u32 len, unsigned short proto,
> +					    __wsum sum)
> +{
> +	__wsum res = sum;
> +
> +	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[0]);
> +	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[1]);
> +	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[2]);
> +	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[3]);
> +	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[0]);
> +	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[1]);
> +	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[2]);
> +	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[3]);

That probably generates a very long dependency chain.

> +	res = csum_add(res, (__force __wsum)htonl(len));
> +	res = csum_add(res, (__force __wsum)htonl(proto));

htonl() doesn't look right for a 16bit value.
It might not matter (because the final checksum is 16bits).

	David

> +
> +	return res;
> +}
> +
>  static inline __wsum ip6_compute_pseudo(struct sk_buff *skb, int proto)
>  {
>  	return ~csum_unfold(csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
> --
> 1.7.1


* Re: [patch] Bluetooth: 6lowpan: use after free in disconnect_devices()
From: Marcel Holtmann @ 2014-10-30 16:24 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Gustavo F. Padovan, Johan Hedberg, David S. Miller,
	BlueZ development, Network Development, linux-kernel,
	kernel-janitors
In-Reply-To: <20141029161057.GF5290@mwanda>

Hi Dan,

> This was accidentally changed from list_for_each_entry_safe() to
> list_for_each_entry() so now it has a use after free bug.  I've changed
> it back.
> 
> Fixes: 90305829635d ('Bluetooth: 6lowpan: Converting rwlocks to use RCU')
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

patch has been applied to bluetooth-next tree.

Regards

Marcel
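
A minimal user-space sketch of the bug class described in the quoted changelog, assuming a plain singly linked list rather than the kernel's struct list_head; the names node, push and free_all_safe are hypothetical. The point it illustrates is the one behind list_for_each_entry_safe(): when the loop body frees the current entry, the pointer to the next entry has to be saved before the free.

```c
#include <stdlib.h>

struct node {
	int id;
	struct node *next;
};

/* Prepend a node; returns the new head (or the old head if malloc fails). */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->id = id;
	n->next = head;
	return n;
}

/*
 * Free every node. The unsafe shape would be
 *
 *	for (cur = *head; cur; cur = cur->next)
 *		free(cur);
 *
 * which reads cur->next after cur has been freed. Saving the next
 * pointer first avoids touching freed memory.
 */
static int free_all_safe(struct node **head)
{
	struct node *cur, *next;
	int freed = 0;

	for (cur = *head; cur; cur = next) {
		next = cur->next;	/* fetch before cur is released */
		free(cur);
		freed++;
	}
	*head = NULL;
	return freed;
}
```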


* Re: net: fec: fix regression on i.MX28 introduced by rx_copybreak support
From: David Miller @ 2014-10-30 16:17 UTC (permalink / raw)
  To: LW
  Cc: netdev, rmk+kernel, Frank.Li, fabio.estevam, linux-kernel,
	linux-arm-kernel
In-Reply-To: <20141030075104.05e44b43@ipc1.ka-ro>

From: Lothar Waßmann <LW@KARO-electronics.de>
Date: Thu, 30 Oct 2014 07:51:04 +0100

>> Also, I don't think your elimination of DIV_ROUND_UP() for the loop
>> in swap_buffer() is valid.  The whole point is that the current
>> code properly handles buffers whose length is not a multiple
>> of 4; after your change it will no longer do so.
>>
> Do you really think so?

Yes, because you're rounding down so you'll miss the final
partial word (if any).
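
A minimal user-space sketch of the rounding issue, assuming a buffer padded to a multiple of 4 bytes; the helpers bswap32, swap_round_up and swap_round_down are hypothetical and are not the fec driver's swap_buffer(). With a length of 6 bytes, rounding up visits two 32-bit words, while rounding down visits only one and leaves the final partial word unswapped.

```c
#include <stddef.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Portable 32-bit byte swap. */
static uint32_t bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Swap every word that holds at least one of the len payload bytes. */
static size_t swap_round_up(uint32_t *buf, size_t len)
{
	size_t i, n = DIV_ROUND_UP(len, 4);

	for (i = 0; i < n; i++)
		buf[i] = bswap32(buf[i]);
	return n;	/* number of words touched */
}

/* Same loop with len / 4: a trailing partial word is skipped. */
static size_t swap_round_down(uint32_t *buf, size_t len)
{
	size_t i, n = len / 4;

	for (i = 0; i < n; i++)
		buf[i] = bswap32(buf[i]);
	return n;	/* number of words touched */
}
```

For lengths that are already a multiple of 4 the two loops behave identically, which is why the difference only shows up with odd-sized frames.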


* [PATCH net-next 8/8] net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
	Jerry Chu, Or Gerlitz
In-Reply-To: <1414685216-28907-1-git-send-email-ogerlitz@mellanox.com>

From: Shani Michaeli <shanim@mellanox.com>

When processing received traffic, pass CHECKSUM_COMPLETE status to the
stack, with the calculated checksum for non-TCP/UDP packets (such
as GRE or ICMP).

Although the stack expects a checksum that doesn't include the pseudo
header, the HW adds it. To address that, we subtract the pseudo
header checksum from the checksum value provided by the HW.

In the IPv6 case, we also compute/add the IP header checksum, which
is not added by the HW for such packets.

Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |    5 +
 drivers/net/ethernet/mellanox/mlx4/en_port.c    |    2 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c      |  116 +++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/main.c       |    9 ++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |    5 +-
 include/linux/mlx4/device.h                     |    1 +
 7 files changed, 132 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 8ea4d5b..6c64323 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -115,7 +115,7 @@ static const char main_strings[][ETH_GSTRING_LEN] = {
 	"tso_packets",
 	"xmit_more",
 	"queue_stopped", "wake_queue", "tx_timeout", "rx_alloc_failed",
-	"rx_csum_good", "rx_csum_none", "tx_chksum_offload",
+	"rx_csum_good", "rx_csum_none", "rx_csum_complete", "tx_chksum_offload",
 
 	/* packet statistics */
 	"broadcast", "rx_prio_0", "rx_prio_1", "rx_prio_2", "rx_prio_3",
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0efbae9..d1eb25d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1893,6 +1893,7 @@ static void mlx4_en_clear_stats(struct net_device *dev)
 		priv->rx_ring[i]->packets = 0;
 		priv->rx_ring[i]->csum_ok = 0;
 		priv->rx_ring[i]->csum_none = 0;
+		priv->rx_ring[i]->csum_complete = 0;
 	}
 }
 
@@ -2503,6 +2504,10 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	/* Query for default mac and max mtu */
 	priv->max_mtu = mdev->dev->caps.eth_mtu_cap[priv->port];
 
+	if (mdev->dev->caps.rx_checksum_flags_port[priv->port] &
+	    MLX4_RX_CSUM_MODE_VAL_NON_TCP_UDP)
+		priv->flags |= MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP;
+
 	/* Set default MAC */
 	dev->addr_len = ETH_ALEN;
 	mlx4_en_u64_to_mac(dev->dev_addr, mdev->dev->caps.def_mac[priv->port]);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 134b12e..6cb8007 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -155,11 +155,13 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
 	stats->rx_bytes = 0;
 	priv->port_stats.rx_chksum_good = 0;
 	priv->port_stats.rx_chksum_none = 0;
+	priv->port_stats.rx_chksum_complete = 0;
 	for (i = 0; i < priv->rx_ring_num; i++) {
 		stats->rx_packets += priv->rx_ring[i]->packets;
 		stats->rx_bytes += priv->rx_ring[i]->bytes;
 		priv->port_stats.rx_chksum_good += priv->rx_ring[i]->csum_ok;
 		priv->port_stats.rx_chksum_none += priv->rx_ring[i]->csum_none;
+		priv->port_stats.rx_chksum_complete += priv->rx_ring[i]->csum_complete;
 	}
 	stats->tx_packets = 0;
 	stats->tx_bytes = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2a29a1a..f8a0449 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -42,6 +42,10 @@
 #include <linux/vmalloc.h>
 #include <linux/irq.h>
 
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/ip6_checksum.h>
+#endif
+
 #include "mlx4_en.h"
 
 static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
@@ -642,6 +646,86 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
 	}
 }
 
+/* When hardware doesn't strip the vlan, we need to calculate the checksum
+ * over it and add it to the hardware's checksum calculation
+ */
+static inline __wsum get_fixed_vlan_csum(__wsum hw_checksum,
+					 struct vlan_hdr *vlanh)
+{
+	return csum_add(hw_checksum, *(__wsum *)vlanh);
+}
+
+/* Although the stack expects checksum which doesn't include the pseudo
+ * header, the HW adds it. To address that, we are subtracting the pseudo
+ * header checksum from the checksum value provided by the HW.
+ */
+static void get_fixed_ipv4_csum(__wsum hw_checksum, struct sk_buff *skb,
+				struct iphdr *iph)
+{
+	__u16 length_for_csum = 0;
+	__wsum csum_pseudo_header = 0;
+
+	length_for_csum = (be16_to_cpu(iph->tot_len) - (iph->ihl << 2));
+	csum_pseudo_header = csum_tcpudp_nofold(iph->saddr, iph->daddr,
+						length_for_csum, iph->protocol, 0);
+	skb->csum = csum_sub(hw_checksum, csum_pseudo_header);
+}
+
+#if IS_ENABLED(CONFIG_IPV6)
+/* In IPv6 packets, besides subtracting the pseudo header checksum,
+ * we also compute/add the IP header checksum which
+ * is not added by the HW.
+ */
+static int get_fixed_ipv6_csum(__wsum hw_checksum, struct sk_buff *skb,
+			       struct ipv6hdr *ipv6h)
+{
+	__wsum csum_pseudo_header = 0;
+
+	if (ipv6h->nexthdr == IPPROTO_FRAGMENT || ipv6h->nexthdr == IPPROTO_HOPOPTS)
+		return -1;
+	hw_checksum = csum_add(hw_checksum, (__force __wsum)(ipv6h->nexthdr << 8));
+
+	csum_pseudo_header = csum_ipv6_magic_nofold(&ipv6h->saddr,
+						    &ipv6h->daddr,
+						    ntohs(ipv6h->payload_len),
+						    ipv6h->nexthdr,
+						    0);
+	skb->csum = csum_sub(hw_checksum, csum_pseudo_header);
+	skb->csum = csum_add(skb->csum, csum_partial(ipv6h, sizeof(struct ipv6hdr), 0));
+	return 0;
+}
+#endif
+
+static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, int hwtstamp_rx_filter)
+{
+	__wsum hw_checksum = 0;
+
+	void *hdr = (u8 *)skb->data + sizeof(struct ethhdr);
+
+	hw_checksum = csum_unfold((__force __sum16)cqe->checksum);
+
+	if (((struct ethhdr *)skb->data)->h_proto == htons(ETH_P_8021Q) &&
+	    hwtstamp_rx_filter != HWTSTAMP_FILTER_NONE) {
+		/* next protocol non IPv4 or IPv6 */
+		if (((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
+		    != htons(ETH_P_IP) &&
+		    ((struct vlan_hdr *)hdr)->h_vlan_encapsulated_proto
+		    != htons(ETH_P_IPV6))
+			return -1;
+		hw_checksum = get_fixed_vlan_csum(hw_checksum, hdr);
+		hdr += sizeof(struct vlan_hdr);
+	}
+
+	if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV4))
+		get_fixed_ipv4_csum(hw_checksum, skb, hdr);
+#if IS_ENABLED(CONFIG_IPV6)
+	else if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV6))
+		if (get_fixed_ipv6_csum(hw_checksum, skb, hdr))
+			return -1;
+#endif
+	return 0;
+}
+
 int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int budget)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -743,13 +827,26 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			(cqe->vlan_my_qpn & cpu_to_be32(MLX4_CQE_L2_TUNNEL));
 
 		if (likely(dev->features & NETIF_F_RXCSUM)) {
-			if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
-			    (cqe->checksum == cpu_to_be16(0xffff))) {
-				ring->csum_ok++;
-				ip_summed = CHECKSUM_UNNECESSARY;
+			if (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_TCP |
+						      MLX4_CQE_STATUS_UDP)) {
+				if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
+				    cqe->checksum == cpu_to_be16(0xffff)) {
+					ip_summed = CHECKSUM_UNNECESSARY;
+					ring->csum_ok++;
+				} else {
+					ip_summed = CHECKSUM_NONE;
+					ring->csum_none++;
+				}
 			} else {
-				ip_summed = CHECKSUM_NONE;
-				ring->csum_none++;
+				if (priv->flags & MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP &&
+				    (cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPV4 |
+							       MLX4_CQE_STATUS_IPV6))) {
+					ip_summed = CHECKSUM_COMPLETE;
+					ring->csum_complete++;
+				} else {
+					ip_summed = CHECKSUM_NONE;
+					ring->csum_none++;
+				}
 			}
 		} else {
 			ip_summed = CHECKSUM_NONE;
@@ -767,6 +864,13 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			goto next;
 		}
 
+		if (ip_summed == CHECKSUM_COMPLETE) {
+			if (check_csum(cqe, skb, ring->hwtstamp_rx_filter)) {
+				ip_summed = CHECKSUM_NONE;
+				ring->csum_none++;
+			}
+		}
+
 		skb->ip_summed = ip_summed;
 		skb->protocol = eth_type_trans(skb, dev);
 		skb_record_rx_queue(skb, cq->ring);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 9f82196..2f6ba42 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1629,6 +1629,7 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 	struct mlx4_init_hca_param init_hca;
 	u64 icm_size;
 	int err;
+	struct mlx4_config_dev_params params;
 
 	if (!mlx4_is_slave(dev)) {
 		err = mlx4_QUERY_FW(dev);
@@ -1762,6 +1763,14 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 		goto unmap_bf;
 	}
 
+	/* Query CONFIG_DEV parameters */
+	err = mlx4_config_dev_retrieval(dev, &params);
+	if (err && err != -ENOTSUPP) {
+		mlx4_err(dev, "Failed to query CONFIG_DEV parameters\n");
+	} else if (!err) {
+		dev->caps.rx_checksum_flags_port[1] = params.rx_csum_flags_port_1;
+		dev->caps.rx_checksum_flags_port[2] = params.rx_csum_flags_port_2;
+	}
 	priv->eq_table.inta_pin = adapter.inta_pin;
 	memcpy(dev->board_id, adapter.board_id, sizeof dev->board_id);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index ef83d12..de45674 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -326,6 +326,7 @@ struct mlx4_en_rx_ring {
 #endif
 	unsigned long csum_ok;
 	unsigned long csum_none;
+	unsigned long csum_complete;
 	int hwtstamp_rx_filter;
 	cpumask_var_t affinity_mask;
 };
@@ -449,6 +450,7 @@ struct mlx4_en_port_stats {
 	unsigned long rx_alloc_failed;
 	unsigned long rx_chksum_good;
 	unsigned long rx_chksum_none;
+	unsigned long rx_chksum_complete;
 	unsigned long tx_chksum_offload;
 #define NUM_PORT_STATS		9
 };
@@ -507,7 +509,8 @@ enum {
 	MLX4_EN_FLAG_ENABLE_HW_LOOPBACK	= (1 << 2),
 	/* whether we need to drop packets that hardware loopback-ed */
 	MLX4_EN_FLAG_RX_FILTER_NEEDED	= (1 << 3),
-	MLX4_EN_FLAG_FORCE_PROMISC	= (1 << 4)
+	MLX4_EN_FLAG_FORCE_PROMISC	= (1 << 4),
+	MLX4_EN_FLAG_RX_CSUM_NON_TCP_UDP	= (1 << 5),
 };
 
 #define MLX4_EN_MAC_HASH_SIZE (1 << BITS_PER_BYTE)
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 5cc5eac..3d9bff0 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -497,6 +497,7 @@ struct mlx4_caps {
 	u16			hca_core_clock;
 	u64			phys_port_id[MLX4_MAX_PORTS + 1];
 	int			tunnel_offload_mode;
+	u8			rx_checksum_flags_port[MLX4_MAX_PORTS + 1];
 };
 
 struct mlx4_buf_list {
-- 
1.7.1


* [PATCH net-next 6/8] net/mlx4_core: Add retrieval of CONFIG_DEV parameters
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
	Or Gerlitz
In-Reply-To: <1414685216-28907-1-git-send-email-ogerlitz@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

Add code to issue CONFIG_DEV "get" firmware command.

This command is used in order to obtain certain parameters used for
supporting various RX checksumming options and vxlan UDP port.

The GET operation is allowed for VFs too.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c           |    4 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |   88 +++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    5 +
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   17 ++++
 include/linux/mlx4/cmd.h                           |   29 +++++++
 include/linux/mlx4/device.h                        |    3 +-
 6 files changed, 139 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 1312ccf..3c05e58 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -990,11 +990,11 @@ static struct mlx4_cmd_info cmd_info[] = {
 	{
 		.opcode = MLX4_CMD_CONFIG_DEV,
 		.has_inbox = false,
-		.has_outbox = false,
+		.has_outbox = true,
 		.out_is_imm = false,
 		.encode_slave_id = false,
 		.verify = NULL,
-		.wrapper = mlx4_CMD_EPERM_wrapper
+		.wrapper = mlx4_CONFIG_DEV_wrapper
 	},
 	{
 		.opcode = MLX4_CMD_ALLOC_RES,
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index e7639e3..d6dba77 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -141,7 +141,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
 		[12] = "Large cache line (>64B) CQE stride support",
 		[13] = "Large cache line (>64B) EQE stride support",
 		[14] = "Ethernet protocol control support",
-		[15] = "Ethernet Backplane autoneg support"
+		[15] = "Ethernet Backplane autoneg support",
+		[16] = "CONFIG DEV support"
 	};
 	int i;
 
@@ -574,6 +575,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_MTT_ENTRY_SZ_OFFSET	0x90
 #define QUERY_DEV_CAP_D_MPT_ENTRY_SZ_OFFSET	0x92
 #define QUERY_DEV_CAP_BMME_FLAGS_OFFSET		0x94
+#define QUERY_DEV_CAP_CONFIG_DEV_OFFSET		0x94
 #define QUERY_DEV_CAP_RSVD_LKEY_OFFSET		0x98
 #define QUERY_DEV_CAP_MAX_ICM_SZ_OFFSET		0xa0
 #define QUERY_DEV_CAP_ETH_BACKPL_OFFSET		0x9c
@@ -749,6 +751,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_EQE_STRIDE;
 	MLX4_GET(dev_cap->bmme_flags, outbox,
 		 QUERY_DEV_CAP_BMME_FLAGS_OFFSET);
+	MLX4_GET(field, outbox, QUERY_DEV_CAP_CONFIG_DEV_OFFSET);
+	if (field & 0x20)
+		dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_CONFIG_DEV;
 	MLX4_GET(dev_cap->reserved_lkey, outbox,
 		 QUERY_DEV_CAP_RSVD_LKEY_OFFSET);
 	MLX4_GET(field32, outbox, QUERY_DEV_CAP_ETH_BACKPL_OFFSET);
@@ -1849,14 +1854,18 @@ int mlx4_CLOSE_HCA(struct mlx4_dev *dev, int panic)
 
 struct mlx4_config_dev {
 	__be32	update_flags;
-	__be32	rsdv1[3];
+	__be32	rsvd1[3];
 	__be16	vxlan_udp_dport;
 	__be16	rsvd2;
+	__be32	rsvd3[27];
+	__be16	rsvd4;
+	u8	rsvd5;
+	u8	rx_checksum_val;
 };
 
 #define MLX4_VXLAN_UDP_DPORT (1 << 0)
 
-static int mlx4_CONFIG_DEV(struct mlx4_dev *dev, struct mlx4_config_dev *config_dev)
+static int mlx4_CONFIG_DEV_set(struct mlx4_dev *dev, struct mlx4_config_dev *config_dev)
 {
 	int err;
 	struct mlx4_cmd_mailbox *mailbox;
@@ -1874,6 +1883,77 @@ static int mlx4_CONFIG_DEV(struct mlx4_dev *dev, struct mlx4_config_dev *config_
 	return err;
 }
 
+static int mlx4_CONFIG_DEV_get(struct mlx4_dev *dev, struct mlx4_config_dev *config_dev)
+{
+	int err;
+	struct mlx4_cmd_mailbox *mailbox;
+
+	mailbox = mlx4_alloc_cmd_mailbox(dev);
+	if (IS_ERR(mailbox))
+		return PTR_ERR(mailbox);
+
+	err = mlx4_cmd_box(dev, 0, mailbox->dma, 0, 1, MLX4_CMD_CONFIG_DEV,
+			   MLX4_CMD_TIME_CLASS_A, MLX4_CMD_NATIVE);
+
+	if (!err)
+		memcpy(config_dev, mailbox->buf, sizeof(*config_dev));
+
+	mlx4_free_cmd_mailbox(dev, mailbox);
+	return err;
+}
+
+/* Conversion between the HW values and the actual functionality.
+ * The value represented by the array index,
+ * and the functionality determined by the flags.
+ */
+static const u8 config_dev_csum_flags[] = {
+	[0] =	0,
+	[1] =	MLX4_RX_CSUM_MODE_VAL_NON_TCP_UDP,
+	[2] =	MLX4_RX_CSUM_MODE_VAL_NON_TCP_UDP	|
+		MLX4_RX_CSUM_MODE_L4,
+	[3] =	MLX4_RX_CSUM_MODE_L4			|
+		MLX4_RX_CSUM_MODE_IP_OK_IP_NON_TCP_UDP	|
+		MLX4_RX_CSUM_MODE_MULTI_VLAN
+};
+
+int mlx4_config_dev_retrieval(struct mlx4_dev *dev,
+			      struct mlx4_config_dev_params *params)
+{
+	struct mlx4_config_dev config_dev;
+	int err;
+	u8 csum_mask;
+
+#define CONFIG_DEV_RX_CSUM_MODE_MASK			0x7
+#define CONFIG_DEV_RX_CSUM_MODE_PORT1_BIT_OFFSET	0
+#define CONFIG_DEV_RX_CSUM_MODE_PORT2_BIT_OFFSET	4
+
+	if (!(dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_CONFIG_DEV))
+		return -ENOTSUPP;
+
+	err = mlx4_CONFIG_DEV_get(dev, &config_dev);
+	if (err)
+		return err;
+
+	csum_mask = (config_dev.rx_checksum_val >> CONFIG_DEV_RX_CSUM_MODE_PORT1_BIT_OFFSET) &
+			CONFIG_DEV_RX_CSUM_MODE_MASK;
+
+	if (csum_mask >= sizeof(config_dev_csum_flags)/sizeof(config_dev_csum_flags[0]))
+		return -EINVAL;
+	params->rx_csum_flags_port_1 = config_dev_csum_flags[csum_mask];
+
+	csum_mask = (config_dev.rx_checksum_val >> CONFIG_DEV_RX_CSUM_MODE_PORT2_BIT_OFFSET) &
+			CONFIG_DEV_RX_CSUM_MODE_MASK;
+
+	if (csum_mask >= sizeof(config_dev_csum_flags)/sizeof(config_dev_csum_flags[0]))
+		return -EINVAL;
+	params->rx_csum_flags_port_2 = config_dev_csum_flags[csum_mask];
+
+	params->vxlan_udp_dport = be16_to_cpu(config_dev.vxlan_udp_dport);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(mlx4_config_dev_retrieval);
+
 int mlx4_config_vxlan_port(struct mlx4_dev *dev, __be16 udp_port)
 {
 	struct mlx4_config_dev config_dev;
@@ -1882,7 +1962,7 @@ int mlx4_config_vxlan_port(struct mlx4_dev *dev, __be16 udp_port)
 	config_dev.update_flags    = cpu_to_be32(MLX4_VXLAN_UDP_DPORT);
 	config_dev.vxlan_udp_dport = udp_port;
 
-	return mlx4_CONFIG_DEV(dev, &config_dev);
+	return mlx4_CONFIG_DEV_set(dev, &config_dev);
 }
 EXPORT_SYMBOL_GPL(mlx4_config_vxlan_port);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index 254ec7b..f8fc7bd 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -947,6 +947,11 @@ int mlx4_SW2HW_EQ_wrapper(struct mlx4_dev *dev, int slave,
 			  struct mlx4_cmd_mailbox *inbox,
 			  struct mlx4_cmd_mailbox *outbox,
 			  struct mlx4_cmd_info *cmd);
+int mlx4_CONFIG_DEV_wrapper(struct mlx4_dev *dev, int slave,
+			    struct mlx4_vhcr *vhcr,
+			    struct mlx4_cmd_mailbox *inbox,
+			    struct mlx4_cmd_mailbox *outbox,
+			    struct mlx4_cmd_info *cmd);
 int mlx4_DMA_wrapper(struct mlx4_dev *dev, int slave,
 		     struct mlx4_vhcr *vhcr,
 		     struct mlx4_cmd_mailbox *inbox,
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 5d2498d..d718ca0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -2872,6 +2872,23 @@ out_add:
 	return err;
 }
 
+int mlx4_CONFIG_DEV_wrapper(struct mlx4_dev *dev, int slave,
+			    struct mlx4_vhcr *vhcr,
+			    struct mlx4_cmd_mailbox *inbox,
+			    struct mlx4_cmd_mailbox *outbox,
+			    struct mlx4_cmd_info *cmd)
+{
+	int err;
+	u8 get = vhcr->op_modifier;
+
+	if (get != 1)
+		return -EPERM;
+
+	err = mlx4_DMA_wrapper(dev, slave, vhcr, inbox, outbox, cmd);
+
+	return err;
+}
+
 static int get_containing_mtt(struct mlx4_dev *dev, int slave, int start,
 			      int len, struct res_mtt **res)
 {
diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
index ff5f5de..64d2594 100644
--- a/include/linux/mlx4/cmd.h
+++ b/include/linux/mlx4/cmd.h
@@ -199,6 +199,33 @@ enum {
 	MLX4_CMD_NATIVE
 };
 
+/*
+ * MLX4_RX_CSUM_MODE_VAL_NON_TCP_UDP -
+ * Receive checksum value is reported in CQE also for non TCP/UDP packets.
+ *
+ * MLX4_RX_CSUM_MODE_L4 -
+ * L4_CSUM bit in CQE, which indicates whether or not L4 checksum
+ * was validated correctly, is supported.
+ *
+ * MLX4_RX_CSUM_MODE_IP_OK_IP_NON_TCP_UDP -
+ * IP_OK CQE's field is supported also for non TCP/UDP IP packets.
+ *
+ * MLX4_RX_CSUM_MODE_MULTI_VLAN -
+ * Receive Checksum offload is supported for packets with more than 2 vlan headers.
+ */
+enum mlx4_rx_csum_mode {
+	MLX4_RX_CSUM_MODE_VAL_NON_TCP_UDP		= 1UL << 0,
+	MLX4_RX_CSUM_MODE_L4				= 1UL << 1,
+	MLX4_RX_CSUM_MODE_IP_OK_IP_NON_TCP_UDP		= 1UL << 2,
+	MLX4_RX_CSUM_MODE_MULTI_VLAN			= 1UL << 3
+};
+
+struct mlx4_config_dev_params {
+	u16	vxlan_udp_dport;
+	u8	rx_csum_flags_port_1;
+	u8	rx_csum_flags_port_2;
+};
+
 struct mlx4_dev;
 
 struct mlx4_cmd_mailbox {
@@ -250,6 +277,8 @@ int mlx4_set_vf_vlan(struct mlx4_dev *dev, int port, int vf, u16 vlan, u8 qos);
 int mlx4_set_vf_spoofchk(struct mlx4_dev *dev, int port, int vf, bool setting);
 int mlx4_get_vf_config(struct mlx4_dev *dev, int port, int vf, struct ifla_vf_info *ivf);
 int mlx4_set_vf_link_state(struct mlx4_dev *dev, int port, int vf, int link_state);
+int mlx4_config_dev_retrieval(struct mlx4_dev *dev,
+			      struct mlx4_config_dev_params *params);
 /*
  * mlx4_get_slave_default_vlan -
  * return true if VST ( default vlan)
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index e4c136e..5cc5eac 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -188,7 +188,8 @@ enum {
 	MLX4_DEV_CAP_FLAG2_CQE_STRIDE		= 1LL <<  12,
 	MLX4_DEV_CAP_FLAG2_EQE_STRIDE		= 1LL <<  13,
 	MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL        = 1LL <<  14,
-	MLX4_DEV_CAP_FLAG2_ETH_BACKPL_AN_REP	= 1LL <<  15
+	MLX4_DEV_CAP_FLAG2_ETH_BACKPL_AN_REP	= 1LL <<  15,
+	MLX4_DEV_CAP_FLAG2_CONFIG_DEV		= 1LL <<  16
 };
 
 enum {
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 7/8] net: Add calculation of non folded IPV6 pseudo header checksum
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli
In-Reply-To: <1414685216-28907-1-git-send-email-ogerlitz@mellanox.com>

From: Shani Michaeli <shanim@mellanox.com>

Compute the IPv6 pseudo-header checksum without folding it into a 16-bit
return value.
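
As a rough userspace illustration of what "not folded" means here, the
sketch below models a ones'-complement add with end-around carry and a
separate fold step. The names csum_add32(), csum_words() and csum_fold32()
are made up for this example and only mimic the kernel helpers; this is
not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement 32-bit add: a carry out of bit 31 wraps into bit 0. */
uint32_t csum_add32(uint32_t sum, uint32_t addend)
{
	uint32_t res = sum + addend;

	return res + (res < addend);
}

/* Accumulate n 32-bit words into an unfolded 32-bit partial sum. */
uint32_t csum_words(const uint32_t *words, size_t n, uint32_t sum)
{
	assert(words != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sum = csum_add32(sum, words[i]);
	return sum;
}

/* Fold a 32-bit partial sum to 16 bits and complement the result. */
uint16_t csum_fold32(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Keeping the 32-bit partial sum lets a caller continue adding to it and
fold only once at the end.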

Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
---
 include/net/ip6_checksum.h |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/net/ip6_checksum.h b/include/net/ip6_checksum.h
index 1a49b73..c45d690 100644
--- a/include/net/ip6_checksum.h
+++ b/include/net/ip6_checksum.h
@@ -41,6 +41,27 @@ __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
 			__wsum csum);
 #endif
 
+static inline __wsum csum_ipv6_magic_nofold(const struct in6_addr *saddr,
+					    const struct in6_addr *daddr,
+					    __u32 len, unsigned short proto,
+					    __wsum sum)
+{
+	__wsum res = sum;
+
+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[0]);
+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[1]);
+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[2]);
+	res = csum_add(res, (__force __wsum)saddr->in6_u.u6_addr32[3]);
+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[0]);
+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[1]);
+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[2]);
+	res = csum_add(res, (__force __wsum)daddr->in6_u.u6_addr32[3]);
+	res = csum_add(res, (__force __wsum)htonl(len));
+	res = csum_add(res, (__force __wsum)htonl(proto));
+
+	return res;
+}
+
 static inline __wsum ip6_compute_pseudo(struct sk_buff *skb, int proto)
 {
 	return ~csum_unfold(csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 5/8] net/mlx4_en: Remove redundant code from RX/GRO path
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
	Or Gerlitz
In-Reply-To: <1414685216-28907-1-git-send-email-ogerlitz@mellanox.com>

Remove the code which goes through napi_gro_frags() on the RX path,
use only napi_gro_receive().

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   54 ----------------------------
 1 files changed, 0 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9d616a8..2a29a1a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -746,60 +746,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			if ((cqe->status & cpu_to_be16(MLX4_CQE_STATUS_IPOK)) &&
 			    (cqe->checksum == cpu_to_be16(0xffff))) {
 				ring->csum_ok++;
-				/* This packet is eligible for GRO if it is:
-				 * - DIX Ethernet (type interpretation)
-				 * - TCP/IP (v4)
-				 * - without IP options
-				 * - not an IP fragment
-				 * - no LLS polling in progress
-				 */
-				if (!mlx4_en_cq_busy_polling(cq) &&
-				    (dev->features & NETIF_F_GRO)) {
-					struct sk_buff *gro_skb = napi_get_frags(&cq->napi);
-					if (!gro_skb)
-						goto next;
-
-					nr = mlx4_en_complete_rx_desc(priv,
-						rx_desc, frags, gro_skb,
-						length);
-					if (!nr)
-						goto next;
-
-					skb_shinfo(gro_skb)->nr_frags = nr;
-					gro_skb->len = length;
-					gro_skb->data_len = length;
-					gro_skb->ip_summed = CHECKSUM_UNNECESSARY;
-
-					if (l2_tunnel)
-						gro_skb->csum_level = 1;
-					if ((cqe->vlan_my_qpn &
-					    cpu_to_be32(MLX4_CQE_VLAN_PRESENT_MASK)) &&
-					    (dev->features & NETIF_F_HW_VLAN_CTAG_RX)) {
-						u16 vid = be16_to_cpu(cqe->sl_vid);
-
-						__vlan_hwaccel_put_tag(gro_skb, htons(ETH_P_8021Q), vid);
-					}
-
-					if (dev->features & NETIF_F_RXHASH)
-						skb_set_hash(gro_skb,
-							     be32_to_cpu(cqe->immed_rss_invalid),
-							     PKT_HASH_TYPE_L3);
-
-					skb_record_rx_queue(gro_skb, cq->ring);
-					skb_mark_napi_id(gro_skb, &cq->napi);
-
-					if (ring->hwtstamp_rx_filter == HWTSTAMP_FILTER_ALL) {
-						timestamp = mlx4_en_get_cqe_ts(cqe);
-						mlx4_en_fill_hwtstamps(mdev,
-								       skb_hwtstamps(gro_skb),
-								       timestamp);
-					}
-
-					napi_gro_frags(&cq->napi);
-					goto next;
-				}
-
-				/* GRO not possible, complete processing here */
 				ip_summed = CHECKSUM_UNNECESSARY;
 			} else {
 				ip_summed = CHECKSUM_NONE;
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 3/8] net/mlx4_en: Remove RX buffers alignment to IP_ALIGN
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
	Ido Shamay
In-Reply-To: <1414685216-28907-1-git-send-email-ogerlitz@mellanox.com>

From: Ido Shamay <idos@mellanox.com>

When NET_IP_ALIGN has a non-zero value, the hardware writes to an
unaligned address. The only reader of that address is the copy of the
headers from the first frag into the linear buffer (further accesses to
the IP addresses go through the linear buffer, in which the headers are
aligned). Since the penalty of an unaligned access by the hardware is
greater than that of the software memcpy, drop frag_align and always
start the first frag at offset 0.
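
To make the stride arithmetic concrete, here is a small userspace sketch
of rounding a fragment size up to a cache-line multiple with and without
a leading pad. The helper names and the sizes used below (1536-byte frag,
64-byte cache line, 2-byte pad) are assumptions chosen for illustration,
not values taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of align; align must be a power of two. */
size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Stride of a fragment that starts with pad bytes of alignment padding. */
size_t frag_stride(size_t frag_size, size_t pad, size_t cache_line)
{
	return align_up(frag_size + pad, cache_line);
}
```

With these example numbers, a 2-byte pad pushes the stride from 1536 to
1600 bytes, i.e. one extra cache line per fragment.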

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   |   16 ++++------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    1 -
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c8e75da..4cb716f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -74,7 +74,7 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
 	page_alloc->page_size = PAGE_SIZE << order;
 	page_alloc->page = page;
 	page_alloc->dma = dma;
-	page_alloc->page_offset = frag_info->frag_align;
+	page_alloc->page_offset = 0;
 	/* Not doing get_page() for each frag is a big win
 	 * on asymetric workloads. Note we can not use atomic_set().
 	 */
@@ -945,15 +945,8 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 			(eff_mtu > buf_size + frag_sizes[i]) ?
 				frag_sizes[i] : eff_mtu - buf_size;
 		priv->frag_info[i].frag_prefix_size = buf_size;
-		if (!i)	{
-			priv->frag_info[i].frag_align = NET_IP_ALIGN;
-			priv->frag_info[i].frag_stride =
-				ALIGN(frag_sizes[i] + NET_IP_ALIGN, SMP_CACHE_BYTES);
-		} else {
-			priv->frag_info[i].frag_align = 0;
-			priv->frag_info[i].frag_stride =
-				ALIGN(frag_sizes[i], SMP_CACHE_BYTES);
-		}
+		priv->frag_info[i].frag_stride = ALIGN(frag_sizes[i],
+						       SMP_CACHE_BYTES);
 		buf_size += priv->frag_info[i].frag_size;
 		i++;
 	}
@@ -966,11 +959,10 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 	       eff_mtu, priv->num_frags);
 	for (i = 0; i < priv->num_frags; i++) {
 		en_err(priv,
-		       "  frag:%d - size:%d prefix:%d align:%d stride:%d\n",
+		       "  frag:%d - size:%d prefix:%d stride:%d\n",
 		       i,
 		       priv->frag_info[i].frag_size,
 		       priv->frag_info[i].frag_prefix_size,
-		       priv->frag_info[i].frag_align,
 		       priv->frag_info[i].frag_stride);
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 6beb4d3..ef83d12 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -481,7 +481,6 @@ struct mlx4_en_frag_info {
 	u16 frag_size;
 	u16 frag_prefix_size;
 	u16 frag_stride;
-	u16 frag_align;
 };
 
 #ifdef CONFIG_MLX4_EN_DCB
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 4/8] net/mlx4_en: Add __GFP_COLD gfp flags in alloc_pages
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
	Ido Shamay
In-Reply-To: <1414685216-28907-1-git-send-email-ogerlitz@mellanox.com>

From: Ido Shamay <idos@mellanox.com>

This is needed in order to get cache-cold pages (flushed from L3) for HW
scatter.

Otherwise those cache entries may have to be flushed when the packet
arrives from PCI, causing back pressure that results in a BW decrease.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 4cb716f..9d616a8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -54,7 +54,7 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
 	dma_addr_t dma;
 
 	for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
-		gfp_t gfp = _gfp;
+		gfp_t gfp = _gfp | __GFP_COLD;
 
 		if (order)
 			gfp |= __GFP_COMP | __GFP_NOWARN;
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 2/8] net/mlx4_core: Protect port type setting by mutex
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli
In-Reply-To: <1414685216-28907-1-git-send-email-ogerlitz@mellanox.com>

From: Amir Vadai <amirv@mellanox.com>

We need to protect set_port_type() against concurrent calls, as the sysfs
code could call it from multiple contexts in parallel.

The port_mutex is not enough because we need to protect against concurrent
modification of 'info' and stopping of the port sensing work.
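
The locking shape can be sketched in userspace with a pthread mutex. This
is only an analogy under stated assumptions: set_type(), type_lock and
port_type are hypothetical names, and the point is merely that every exit
path, including the error path, goes through the unlock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>

static pthread_mutex_t type_lock = PTHREAD_MUTEX_INITIALIZER;
static int port_type;	/* hypothetical shared state: 0 auto, 1 ib, 2 eth */

/* Parse buf and update port_type; returns count or a negative errno. */
long set_type(const char *buf, long count)
{
	long err = 0;

	assert(buf != NULL);
	pthread_mutex_lock(&type_lock);

	if (!strcmp(buf, "ib\n")) {
		port_type = 1;
	} else if (!strcmp(buf, "eth\n")) {
		port_type = 2;
	} else if (!strcmp(buf, "auto\n")) {
		port_type = 0;
	} else {
		err = -EINVAL;
		goto out;	/* a bare return here would leak the lock */
	}

out:
	pthread_mutex_unlock(&type_lock);
	return err ? err : count;
}
```

Calling it with a bad string and then a good one works only because the
error path released the lock.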

Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/main.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 90de6e1..9f82196 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -901,9 +901,12 @@ static ssize_t set_port_type(struct device *dev,
 	struct mlx4_priv *priv = mlx4_priv(mdev);
 	enum mlx4_port_type types[MLX4_MAX_PORTS];
 	enum mlx4_port_type new_types[MLX4_MAX_PORTS];
+	static DEFINE_MUTEX(set_port_type_mutex);
 	int i;
 	int err = 0;
 
+	mutex_lock(&set_port_type_mutex);
+
 	if (!strcmp(buf, "ib\n"))
 		info->tmp_type = MLX4_PORT_TYPE_IB;
 	else if (!strcmp(buf, "eth\n"))
@@ -912,7 +915,8 @@ static ssize_t set_port_type(struct device *dev,
 		info->tmp_type = MLX4_PORT_TYPE_AUTO;
 	else {
 		mlx4_err(mdev, "%s is not supported port type\n", buf);
-		return -EINVAL;
+		err = -EINVAL;
+		goto err_out;
 	}
 
 	mlx4_stop_sense(mdev);
@@ -958,6 +962,9 @@ static ssize_t set_port_type(struct device *dev,
 out:
 	mlx4_start_sense(mdev);
 	mutex_unlock(&priv->port_mutex);
+err_out:
+	mutex_unlock(&set_port_type_mutex);
+
 	return err ? err : count;
 }
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 1/8] net/mlx4_core: Prevent VF from changing port configuration
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli
In-Reply-To: <1414685216-28907-1-git-send-email-ogerlitz@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>

Add a wrapper for the ACCESS_REG command that handles guest access to HW
registers: write operations are rejected, while reads are allowed.

This prevents SRIOV guests from changing the port PTYS configuration,
such as speed/advertised link modes.
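
The policy itself is small enough to model in isolation. The sketch below
is a userspace illustration, assuming made-up names (reg_access_allowed()
and the REG_METHOD_* values); it only mirrors the rule that a function
other than the master may read but not write.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical method codes, for illustration only. */
enum reg_method { REG_METHOD_QUERY = 1, REG_METHOD_WRITE = 2 };

/* Return 0 if the access may proceed, -EPERM if it must be rejected. */
int reg_access_allowed(int slave, int master, enum reg_method method)
{
	assert(method == REG_METHOD_QUERY || method == REG_METHOD_WRITE);

	if (slave != master && method == REG_METHOD_WRITE)
		return -EPERM;	/* guests may read but never write */

	return 0;
}
```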

Fixes: adbc7ac5c15e ('net/mlx4_core: Introduce ACCESS_REG CMD [...]')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c  |    2 +-
 drivers/net/ethernet/mellanox/mlx4/fw.c   |   30 ++++++++++++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h |    5 ++++
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index 916459e..1312ccf 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -1345,7 +1345,7 @@ static struct mlx4_cmd_info cmd_info[] = {
 		.out_is_imm = false,
 		.encode_slave_id = false,
 		.verify = NULL,
-		.wrapper = NULL,
+		.wrapper = mlx4_ACCESS_REG_wrapper,
 	},
 	/* Native multicast commands are not available for guests */
 	{
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 72289ef..e7639e3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -2220,7 +2220,7 @@ static int mlx4_ACCESS_REG(struct mlx4_dev *dev, u16 reg_id,
 	memcpy(inbuf->reg_data, reg_data, reg_len);
 	err = mlx4_cmd_box(dev, inbox->dma, outbox->dma, 0, 0,
 			   MLX4_CMD_ACCESS_REG, MLX4_CMD_TIME_CLASS_C,
-			   MLX4_CMD_NATIVE);
+			   MLX4_CMD_WRAPPED);
 	if (err)
 		goto out;
 
@@ -2263,3 +2263,31 @@ int mlx4_ACCESS_PTYS_REG(struct mlx4_dev *dev,
 			       method, sizeof(*ptys_reg), ptys_reg);
 }
 EXPORT_SYMBOL_GPL(mlx4_ACCESS_PTYS_REG);
+
+int mlx4_ACCESS_REG_wrapper(struct mlx4_dev *dev, int slave,
+			    struct mlx4_vhcr *vhcr,
+			    struct mlx4_cmd_mailbox *inbox,
+			    struct mlx4_cmd_mailbox *outbox,
+			    struct mlx4_cmd_info *cmd)
+{
+	struct mlx4_access_reg *inbuf = inbox->buf;
+	u8 method = inbuf->method & MLX4_ACCESS_REG_METHOD_MASK;
+	u16 reg_id = be16_to_cpu(inbuf->reg_id);
+
+	if (slave != mlx4_master_func_num(dev) &&
+	    method == MLX4_ACCESS_REG_WRITE)
+		return -EPERM;
+
+	if (reg_id == MLX4_REG_ID_PTYS) {
+		struct mlx4_ptys_reg *ptys_reg =
+			(struct mlx4_ptys_reg *)inbuf->reg_data;
+
+		ptys_reg->local_port =
+			mlx4_slave_convert_port(dev, slave,
+						ptys_reg->local_port);
+	}
+
+	return mlx4_cmd_box(dev, inbox->dma, outbox->dma, vhcr->in_modifier,
+			    0, MLX4_CMD_ACCESS_REG, MLX4_CMD_TIME_CLASS_C,
+			    MLX4_CMD_NATIVE);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index de10dbb..254ec7b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -1273,6 +1273,11 @@ int mlx4_QP_FLOW_STEERING_DETACH_wrapper(struct mlx4_dev *dev, int slave,
 					 struct mlx4_cmd_mailbox *inbox,
 					 struct mlx4_cmd_mailbox *outbox,
 					 struct mlx4_cmd_info *cmd);
+int mlx4_ACCESS_REG_wrapper(struct mlx4_dev *dev, int slave,
+			    struct mlx4_vhcr *vhcr,
+			    struct mlx4_cmd_mailbox *inbox,
+			    struct mlx4_cmd_mailbox *outbox,
+			    struct mlx4_cmd_info *cmd);
 
 int mlx4_get_mgm_entry_size(struct mlx4_dev *dev);
 int mlx4_get_qp_per_mgm(struct mlx4_dev *dev);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 0/8] Mellanox ethernet driver update Oct-30-2014
From: Or Gerlitz @ 2014-10-30 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
	Or Gerlitz

Hi Dave,

The 1st patch from Saeed fixes a bug in the last net-next batch where
a VF could get access to set port configuration, the next patch from Amir
fixes a race in the port VPI logic. Next are two performance patches from Ido.

The last four patches from Shani, Matan and myself add support for CHECKSUM_COMPLETE 
reporting on non TCP/UDP packets such as GRE and ICMP. I'd like to deeply thank 
Jerry Chu for his innovation and support in that effort.

Or.

Amir Vadai (1):
  net/mlx4_core: Protect port type setting by mutex

Ido Shamay (2):
  net/mlx4_en: Remove RX buffers alignment to IP_ALIGN
  net/mlx4_en: Add __GFP_COLD gfp flags in alloc_pages

Matan Barak (1):
  net/mlx4_core: Add retrieval of CONFIG_DEV parameters

Or Gerlitz (1):
  net/mlx4_en: Remove redundant code from RX/GRO path

Saeed Mahameed (1):
  net/mlx4_core: Prevent VF from changing port configuration

Shani Michaeli (2):
  net: Add calculation of non folded IPV6 pseudo header checksum
  net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE

 drivers/net/ethernet/mellanox/mlx4/cmd.c           |    6 +-
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c    |    2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |    5 +
 drivers/net/ethernet/mellanox/mlx4/en_port.c       |    2 +
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |  186 ++++++++++++--------
 drivers/net/ethernet/mellanox/mlx4/fw.c            |  118 ++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/main.c          |   18 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   10 +
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |    6 +-
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   17 ++
 include/linux/mlx4/cmd.h                           |   29 +++
 include/linux/mlx4/device.h                        |    4 +-
 include/net/ip6_checksum.h                         |   21 +++
 13 files changed, 339 insertions(+), 85 deletions(-)

^ permalink raw reply

* Re: [PATCH net] gre: Use inner mac length when computing tunnel length
From: Alexander Duyck @ 2014-10-30 15:52 UTC (permalink / raw)
  To: Tom Herbert, davem, alexander.duyck, netdev
In-Reply-To: <1414683656-26493-1-git-send-email-therbert@google.com>

On 10/30/2014 08:40 AM, Tom Herbert wrote:
> Currently, skb_inner_network_header is used but this does not account
> for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
> handles TEB and also should work with IP encapsulation in which case
> inner mac and inner network headers are the same.
>
> Tested: Ran TCP_STREAM over GRE, worked as expected.
>
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>   net/ipv4/gre_offload.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
> index f6e345c..bb5947b 100644
> --- a/net/ipv4/gre_offload.c
> +++ b/net/ipv4/gre_offload.c
> @@ -47,7 +47,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>
>   	greh = (struct gre_base_hdr *)skb_transport_header(skb);
>
> -	ghl = skb_inner_network_header(skb) - skb_transport_header(skb);
> +	ghl = skb_inner_mac_header(skb) - skb_transport_header(skb);
>   	if (unlikely(ghl < sizeof(*greh)))
>   		goto out;
>
>

This works for me.  We probably need to queue this up for stable as well 
since this bug goes back as far as 3.14.

Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>

^ permalink raw reply

* [PATCH iproute2] ss: Identify more netlink protocol names
From: Vadim Kochan @ 2014-10-30 15:33 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

Only a few Netlink protocol names were printed on the screen:

    rtnl, fw, tcpdiag

Add the ability to resolve the Netlink protocol name from
/etc/iproute2/nl_protos or from a static table.
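
The lookup idea, stripped of the config-file loading, can be sketched as
a sparse table with a numeric fallback. The names proto_tab, proto_n2a()
and proto_a2n() below are simplified stand-ins for this example and the
table holds only three entries; loading names from a file is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PROTO_TAB_SIZE 256

/* Small illustrative table; slots without a name stay NULL. */
static const char *proto_tab[PROTO_TAB_SIZE] = {
	[0] = "rtnl",
	[3] = "fw",
	[4] = "tcpdiag",
};

/* Return the name for id, or its decimal form written into buf. */
const char *proto_n2a(int id, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (id >= 0 && id < PROTO_TAB_SIZE && proto_tab[id])
		return proto_tab[id];
	snprintf(buf, len, "%d", id);
	return buf;
}

/* Return the id for a known name, or -1 if it is not in the table. */
int proto_a2n(const char *name)
{
	for (int i = 0; i < PROTO_TAB_SIZE; i++)
		if (proto_tab[i] && strcmp(proto_tab[i], name) == 0)
			return i;
	return -1;
}
```

An unknown id such as 17 simply comes back as the string "17", so the
caller never has to special-case missing names.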

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
 etc/iproute2/nl_protos | 23 ++++++++++++++
 include/rt_names.h     |  2 ++
 lib/rt_names.c         | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++
 misc/ss.c              | 17 ++++++-----
 4 files changed, 116 insertions(+), 8 deletions(-)
 create mode 100644 etc/iproute2/nl_protos

diff --git a/etc/iproute2/nl_protos b/etc/iproute2/nl_protos
new file mode 100644
index 0000000..43418f3
--- /dev/null
+++ b/etc/iproute2/nl_protos
@@ -0,0 +1,23 @@
+# Netlink protocol names mapping
+
+0   rtnl
+1   unused
+2   usersock
+3   fw
+4   tcpdiag
+5   nflog
+6   xfrm
+7   selinux
+8   iscsi
+9   audit
+10  fiblookup
+11  connector
+12  nft 
+13  ip6fw
+14  dec-rt
+15  uevent
+16  genl
+18  scsi-trans
+19  ecryptfs
+20  rdma
+21  crypto 
diff --git a/include/rt_names.h b/include/rt_names.h
index 56b649a..c0ea4f9 100644
--- a/include/rt_names.h
+++ b/include/rt_names.h
@@ -29,5 +29,7 @@ int ll_addr_a2n(char *lladdr, int len, const char *arg);
 const char * ll_proto_n2a(unsigned short id, char *buf, int len);
 int ll_proto_a2n(unsigned short *id, const char *buf);
 
+const char *nl_proto_n2a(int id, char *buf, int len);
+int nl_proto_a2n(__u32 *id, const char *arg);
 
 #endif
diff --git a/lib/rt_names.c b/lib/rt_names.c
index 911e4d2..184f590 100644
--- a/lib/rt_names.c
+++ b/lib/rt_names.c
@@ -525,3 +525,85 @@ const char *rtnl_group_n2a(int id, char *buf, int len)
 	snprintf(buf, len, "%d", id);
 	return buf;
 }
+
+static char *nl_proto_tab[256] = {
+	[NETLINK_ROUTE]          = "rtnl",
+	[NETLINK_UNUSED]         = "unused",
+	[NETLINK_USERSOCK]       = "usersock",
+	[NETLINK_FIREWALL]       = "fw",
+	[NETLINK_SOCK_DIAG]      = "tcpdiag",
+	[NETLINK_NFLOG]          = "nflog",
+	[NETLINK_XFRM]           = "xfrm",
+	[NETLINK_SELINUX]        = "selinux",
+	[NETLINK_ISCSI]          = "iscsi",
+	[NETLINK_AUDIT]          = "audit",
+	[NETLINK_FIB_LOOKUP]     = "fiblookup",
+	[NETLINK_CONNECTOR]      = "connector",
+	[NETLINK_NETFILTER]      = "nft",
+	[NETLINK_IP6_FW]         = "ip6fw",
+	[NETLINK_DNRTMSG]        = "dec-rt",
+	[NETLINK_KOBJECT_UEVENT] = "uevent",
+	[NETLINK_GENERIC]        = "genl",
+	[NETLINK_SCSITRANSPORT]  = "scsi-trans",
+	[NETLINK_ECRYPTFS]       = "ecryptfs",
+	[NETLINK_RDMA]           = "rdma",
+	[NETLINK_CRYPTO]         = "crypto",
+};
+
+static int nl_proto_init;
+
+static void nl_proto_initialize(void)
+{
+	nl_proto_init = 1;
+	rtnl_tab_initialize(CONFDIR "/nl_protos",
+			    nl_proto_tab, 256);
+}
+
+const char *nl_proto_n2a(int id, char *buf, int len)
+{
+	if (id < 0 || id >= 256) {
+		snprintf(buf, len, "%u", id);
+		return buf;
+	}
+
+	if (!nl_proto_init)
+		nl_proto_initialize();
+
+	if (nl_proto_tab[id])
+		return nl_proto_tab[id];
+
+	snprintf(buf, len, "%u", id);
+	return buf;
+}
+
+int nl_proto_a2n(__u32 *id, const char *arg)
+{
+	static char *cache = NULL;
+	static unsigned long res;
+	char *end;
+	int i;
+
+	if (cache && strcmp(cache, arg) == 0) {
+		*id = res;
+		return 0;
+	}
+
+	if (!nl_proto_init)
+		nl_proto_initialize();
+
+	for (i = 0; i < 256; i++) {
+		if (nl_proto_tab[i] &&
+		    strcmp(nl_proto_tab[i], arg) == 0) {
+			cache = nl_proto_tab[i];
+			res = i;
+			*id = res;
+			return 0;
+		}
+	}
+
+	res = strtoul(arg, &end, 0);
+	if (!end || end == arg || *end || res > 255)
+		return -1;
+	*id = res;
+	return 0;
+}
diff --git a/misc/ss.c b/misc/ss.c
index b7e0ef0..291d85f 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2979,6 +2979,8 @@ static void netlink_show_one(struct filter *f,
 				int rq, int wq,
 				unsigned long long sk, unsigned long long cb)
 {
+	SPRINT_BUF(prot_name);
+
 	if (f->f) {
 		struct tcpstat tst;
 		tst.local.family = AF_NETLINK;
@@ -2996,14 +2998,13 @@ static void netlink_show_one(struct filter *f,
 	if (state_width)
 		printf("%-*s ", state_width, "UNCONN");
 	printf("%-6d %-6d ", rq, wq);
-	if (resolve_services && prot == 0)
-		printf("%*s:", addr_width, "rtnl");
-	else if (resolve_services && prot == 3)
-		printf("%*s:", addr_width, "fw");
-	else if (resolve_services && prot == 4)
-		printf("%*s:", addr_width, "tcpdiag");
-	else
-		printf("%*d:", addr_width, prot);
+
+	if (resolve_services)
+	{
+		printf("%*s:", addr_width, nl_proto_n2a(prot, prot_name,
+					sizeof(prot_name)));
+	}
+
 	if (pid == -1) {
 		printf("%-*s ", serv_width, "*");
 	} else if (resolve_services) {
-- 
2.1.0

^ permalink raw reply related

* [PATCH net] gre: Use inner mac length when computing tunnel length
From: Tom Herbert @ 2014-10-30 15:40 UTC (permalink / raw)
  To: davem, alexander.duyck, netdev

Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.
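
A toy model of the two computations, using plain byte offsets instead of
an skb. The struct and function names are hypothetical, and the offsets
in the usage note assume a 4-byte base GRE header and a 14-byte inner
Ethernet header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the header offsets tracked in an skb. */
struct pkt_offsets {
	size_t transport;	/* start of the GRE header */
	size_t inner_mac;	/* start of the inner Ethernet header, if any */
	size_t inner_network;	/* start of the inner IP header */
};

/* Tunnel header length measured up to the inner MAC header. */
size_t tunnel_hlen_from_mac(const struct pkt_offsets *p)
{
	assert(p->inner_mac >= p->transport);
	return p->inner_mac - p->transport;
}

/* Tunnel header length measured up to the inner network header. */
size_t tunnel_hlen_from_network(const struct pkt_offsets *p)
{
	assert(p->inner_network >= p->transport);
	return p->inner_network - p->transport;
}
```

For a TEB packet with offsets {34, 38, 52} the first function gives 4
while the second gives 18, because it also counts the inner Ethernet
header. For IP encapsulation with offsets {34, 38, 38} both give 4.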

Tested: Ran TCP_STREAM over GRE, worked as expected.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/gre_offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index f6e345c..bb5947b 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -47,7 +47,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
 
 	greh = (struct gre_base_hdr *)skb_transport_header(skb);
 
-	ghl = skb_inner_network_header(skb) - skb_transport_header(skb);
+	ghl = skb_inner_mac_header(skb) - skb_transport_header(skb);
 	if (unlikely(ghl < sizeof(*greh)))
 		goto out;
 
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* Re: [PATCH iproute2] ss: Identify a lot of netlink protocol names
From: vadim4j @ 2014-10-30 15:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20141029224932.0df2edba@urahara>

On Wed, Oct 29, 2014 at 10:49:32PM -0700, Stephen Hemminger wrote:
> On Thu, 16 Oct 2014 19:46:58 +0300
> Vadim Kochan <vadim4j@gmail.com> wrote:
> 
> > There were only few Netlink protocol names:
> > 
> >     rtnl, fw, tcpdiag
> > 
> > which were printed on output.
> > So added the other ones.
> > 
> > Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
> 
> Please make this driven off of a file in /etc/iproute2/ rather than
> hard coding a big switch in the code.
> 
Yes, good idea, will do.

Regards,

^ permalink raw reply

* Re: [PATCH net] gre: Fix regression in gretap TSO support
From: Alexander Duyck @ 2014-10-30 15:32 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Neal Cardwell, Pravin Shelar, Alexander Duyck, netdev,
	David Miller, H.K. Jerry Chu, Eric Dumazet
In-Reply-To: <CA+mtBx_AAexNNktyZDFFZwqzgEh_FJbdekvtYVLn7cc2AGLFqA@mail.gmail.com>


On 10/30/2014 08:05 AM, Tom Herbert wrote:
> On Thu, Oct 30, 2014 at 7:30 AM, Alexander Duyck
> <alexander.h.duyck@redhat.com> wrote:
>> On 10/30/2014 06:51 AM, Neal Cardwell wrote:
>>> On Thu, Oct 30, 2014 at 1:14 AM, Pravin Shelar <pshelar@nicira.com> wrote:
>>>> On Wed, Oct 29, 2014 at 8:26 PM,  <alexander.duyck@gmail.com> wrote:
>>>>> From: Alexander Duyck <alexander.h.duyck@redhat.com>
>>>>>
>>>>> On recent kernels I found that TSO on gretap interfaces didn't work.
>>>>> After
>>>>> bisecting it I found that commit b884b1a4 had introduced a regression in
>>>>> which the Ethernet header was being included in the GRE header length.
>>>>>
>>>>> This change corrects that by basing the GRE header length on the inner
>>>>> mac
>>>>> header in the case of GRE tunnels using transparent Ethernet bridging,
>>>>> and
>>>>> uses the network header for all other GRE tunnel types.
>>>>>
>>>>> Fixes: b884b1a4 ("gre_offload: simplify GRE header length calculation in
>>>>> gre_gso_segment()")
>>> Hmm. There may be other protocols, either now or in the future, where
>>> we want to be able to have a mac header inside the GRE header, rather
>>> than a network header. AFAICT it would be safer to revert b884b1a4,
>>> and go back to the previous code (from c50cd357), where we parse the
>>> GRE header to figure out its length.
>>>
>>> neal
>>
>> The change is consistent with how we handle this in other spots throughout
>> the kernel.  If nothing else you can just search for ETH_P_TEB and you will
>> find multiple spots in the kernel where IP tunnels differentiate between
>> transparent Ethernet bridging and regular IP in IP tunnels by checking for
>> the protocol ETH_P_TEB.
>>
> I'm not sure I understand this. We always use inner mac header in
> __skb_udp_tunnel_segment for computing tunnel length and don't
> distinguish between Ethernet or IP encapsulation. Presumably, in the
> case of IP encapsulation inner mac header is equal to inner network
> header. Why is this different for GRE?
>
> Thanks,
> Tom

I'll dig into that a bit more and see if I can simplify this.  I just 
wasn't sure if the inner mac header was being initialized or not in the 
case of IP in IP tunnels.

Thanks,

Alex

^ permalink raw reply

* Re: [PATCH net] gre: Fix regression in gretap TSO support
From: Tom Herbert @ 2014-10-30 15:32 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Neal Cardwell, Pravin Shelar, Alexander Duyck, netdev,
	David Miller, H.K. Jerry Chu, Eric Dumazet
In-Reply-To: <CA+mtBx_AAexNNktyZDFFZwqzgEh_FJbdekvtYVLn7cc2AGLFqA@mail.gmail.com>

> I'm not sure I understand this. We always use inner mac header in
> __skb_udp_tunnel_segment for computing tunnel length and don't
> distinguish between Ethernet or IP encapsulation. Presumably, in the
> case of IP encapsulation inner mac header is equal to inner network
> header. Why is this different for GRE?
>

Using skb_inner_mac_header seems to work okay for IP encapsulation.
I'll post the patch momentarily.

Tom


> Thanks,
> Tom
>
>> Thanks,
>>
>> Alex
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next v4 1/4] netns: add genl cmd to add and get peer netns ids
From: Nicolas Dichtel @ 2014-10-30 15:25 UTC (permalink / raw)
  To: netdev, containers, linux-kernel, linux-api
  Cc: davem, ebiederm, stephen, akpm, luto, cwang, Nicolas Dichtel
In-Reply-To: <1414682728-4532-1-git-send-email-nicolas.dichtel@6wind.com>

With this patch, a user can define an id for a peer netns by providing an FD
or a PID. These ids are local to a netns (i.e. valid only within that netns).

This will be useful for netlink messages when an x-netns interface is dumped.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 MAINTAINERS                 |   1 +
 include/net/net_namespace.h |   5 ++
 include/uapi/linux/Kbuild   |   1 +
 include/uapi/linux/netns.h  |  38 +++++++++
 net/core/net_namespace.c    | 195 ++++++++++++++++++++++++++++++++++++++++++++
 net/netlink/genetlink.c     |   4 +
 6 files changed, 244 insertions(+)
 create mode 100644 include/uapi/linux/netns.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 43898b1a8a2d..de7e6fcbd5c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6382,6 +6382,7 @@ F:	include/linux/netdevice.h
 F:	include/uapi/linux/in.h
 F:	include/uapi/linux/net.h
 F:	include/uapi/linux/netdevice.h
+F:	include/uapi/linux/netns.h
 F:	tools/net/
 F:	tools/testing/selftests/net/
 F:	lib/random32.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index e0d64667a4b3..0f1367a71b81 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -59,6 +59,7 @@ struct net {
 	struct list_head	exit_list;	/* Use only net_mutex */
 
 	struct user_namespace   *user_ns;	/* Owning user namespace */
+	struct idr		netns_ids;
 
 	unsigned int		proc_inum;
 
@@ -289,6 +290,10 @@ static inline struct net *read_pnet(struct net * const *pnet)
 #define __net_initconst	__initconst
 #endif
 
+int peernet2id(struct net *net, struct net *peer);
+struct net *get_net_ns_by_id(struct net *net, int id);
+int netns_genl_register(void);
+
 struct pernet_operations {
 	struct list_head list;
 	int (*init)(struct net *net);
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 6cad97485bad..d7f49c69585a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -277,6 +277,7 @@ header-y += netfilter_decnet.h
 header-y += netfilter_ipv4.h
 header-y += netfilter_ipv6.h
 header-y += netlink.h
+header-y += netns.h
 header-y += netrom.h
 header-y += nfc.h
 header-y += nfs.h
diff --git a/include/uapi/linux/netns.h b/include/uapi/linux/netns.h
new file mode 100644
index 000000000000..2edf129377de
--- /dev/null
+++ b/include/uapi/linux/netns.h
@@ -0,0 +1,38 @@
+/* Copyright (c) 2014 6WIND S.A.
+ * Author: Nicolas Dichtel <nicolas.dichtel@6wind.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ */
+#ifndef _UAPI_LINUX_NETNS_H_
+#define _UAPI_LINUX_NETNS_H_
+
+/* Generic netlink messages */
+
+#define NETNS_GENL_NAME			"netns"
+#define NETNS_GENL_VERSION		0x1
+
+/* Commands */
+enum {
+	NETNS_CMD_UNSPEC,
+	NETNS_CMD_NEWID,
+	NETNS_CMD_GETID,
+	__NETNS_CMD_MAX,
+};
+
+#define NETNS_CMD_MAX		(__NETNS_CMD_MAX - 1)
+
+/* Attributes */
+enum {
+	NETNSA_NONE,
+#define NETNSA_NSINDEX_UNKNOWN	-1
+	NETNSA_NSID,
+	NETNSA_PID,
+	NETNSA_FD,
+	__NETNSA_MAX,
+};
+
+#define NETNSA_MAX		(__NETNSA_MAX - 1)
+
+#endif /* _UAPI_LINUX_NETNS_H_ */
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 7f155175bba8..4a5680ed42fb 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -15,6 +15,8 @@
 #include <linux/file.h>
 #include <linux/export.h>
 #include <linux/user_namespace.h>
+#include <linux/netns.h>
+#include <net/genetlink.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -144,6 +146,50 @@ static void ops_free_list(const struct pernet_operations *ops,
 	}
 }
 
+/* This function is used by idr_for_each(). If net is equal to peer, the
+ * function returns the id so that idr_for_each() stops. Because we cannot
+ * return the id 0 (idr_for_each() would not stop), we return the magic value
+ * -1 for it.
+ */
+static int net_eq_idr(int id, void *net, void *peer)
+{
+	if (net_eq(net, peer))
+		return id ? : -1;
+	return 0;
+}
+
+/* returns NETNSA_NSINDEX_UNKNOWN if not found */
+int peernet2id(struct net *net, struct net *peer)
+{
+	int id = idr_for_each(&net->netns_ids, net_eq_idr, peer);
+
+	ASSERT_RTNL();
+
+	/* Magic value for id 0. */
+	if (id == -1)
+		return 0;
+	if (id == 0)
+		return NETNSA_NSINDEX_UNKNOWN;
+
+	return id;
+}
+
+struct net *get_net_ns_by_id(struct net *net, int id)
+{
+	struct net *peer;
+
+	if (id < 0)
+		return NULL;
+
+	rcu_read_lock();
+	peer = idr_find(&net->netns_ids, id);
+	if (peer)
+		get_net(peer);
+	rcu_read_unlock();
+
+	return peer;
+}
+
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -158,6 +204,7 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	atomic_set(&net->passive, 1);
 	net->dev_base_seq = 1;
 	net->user_ns = user_ns;
+	idr_init(&net->netns_ids);
 
 #ifdef NETNS_REFCNT_DEBUG
 	atomic_set(&net->use_count, 0);
@@ -288,6 +335,14 @@ static void cleanup_net(struct work_struct *work)
 	list_for_each_entry(net, &net_kill_list, cleanup_list) {
 		list_del_rcu(&net->list);
 		list_add_tail(&net->exit_list, &net_exit_list);
+		for_each_net(tmp) {
+			int id = peernet2id(tmp, net);
+
+			if (id >= 0)
+				idr_remove(&tmp->netns_ids, id);
+		}
+		idr_destroy(&net->netns_ids);
+
 	}
 	rtnl_unlock();
 
@@ -399,6 +454,146 @@ static struct pernet_operations __net_initdata net_ns_ops = {
 	.exit = net_ns_net_exit,
 };
 
+static struct genl_family netns_genl_family = {
+	.id		= GENL_ID_GENERATE,
+	.name		= NETNS_GENL_NAME,
+	.version	= NETNS_GENL_VERSION,
+	.hdrsize	= 0,
+	.maxattr	= NETNSA_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy netns_nl_policy[NETNSA_MAX + 1] = {
+	[NETNSA_NONE]		= { .type = NLA_UNSPEC },
+	[NETNSA_NSID]		= { .type = NLA_S32 },
+	[NETNSA_PID]		= { .type = NLA_U32 },
+	[NETNSA_FD]		= { .type = NLA_U32 },
+};
+
+static int netns_nl_cmd_newid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct net *peer;
+	int nsid, err;
+
+	if (!info->attrs[NETNSA_NSID])
+		return -EINVAL;
+	nsid = nla_get_s32(info->attrs[NETNSA_NSID]);
+	if (nsid < 0)
+		return -EINVAL;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	rtnl_lock();
+	if (peernet2id(net, peer) >= 0) {
+		err = -EEXIST;
+		goto out;
+	}
+
+	err = idr_alloc(&net->netns_ids, peer, nsid, nsid + 1, GFP_KERNEL);
+	if (err >= 0)
+		err = 0;
+out:
+	rtnl_unlock();
+	put_net(peer);
+	return err;
+}
+
+static int netns_nl_get_size(void)
+{
+	return nla_total_size(sizeof(s32)) /* NETNSA_NSID */
+	       ;
+}
+
+static int netns_nl_fill(struct sk_buff *skb, u32 portid, u32 seq, int flags,
+			 int cmd, struct net *net, struct net *peer)
+{
+	void *hdr;
+	int id;
+
+	hdr = genlmsg_put(skb, portid, seq, &netns_genl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	rtnl_lock();
+	id = peernet2id(net, peer);
+	rtnl_unlock();
+	if (nla_put_s32(skb, NETNSA_NSID, id))
+		goto nla_put_failure;
+
+	return genlmsg_end(skb, hdr);
+
+nla_put_failure:
+	genlmsg_cancel(skb, hdr);
+	return -EMSGSIZE;
+}
+
+static int netns_nl_cmd_getid(struct sk_buff *skb, struct genl_info *info)
+{
+	struct net *net = genl_info_net(info);
+	struct sk_buff *msg;
+	int err = -ENOBUFS;
+	struct net *peer;
+
+	if (info->attrs[NETNSA_PID])
+		peer = get_net_ns_by_pid(nla_get_u32(info->attrs[NETNSA_PID]));
+	else if (info->attrs[NETNSA_FD])
+		peer = get_net_ns_by_fd(nla_get_u32(info->attrs[NETNSA_FD]));
+	else
+		return -EINVAL;
+
+	if (IS_ERR(peer))
+		return PTR_ERR(peer);
+
+	msg = genlmsg_new(netns_nl_get_size(), GFP_KERNEL);
+	if (!msg) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	err = netns_nl_fill(msg, info->snd_portid, info->snd_seq,
+			    NLM_F_ACK, NETNS_CMD_GETID, net, peer);
+	if (err < 0)
+		goto err_out;
+
+	err = genlmsg_unicast(net, msg, info->snd_portid);
+	goto out;
+
+err_out:
+	nlmsg_free(msg);
+out:
+	put_net(peer);
+	return err;
+}
+
+static struct genl_ops netns_genl_ops[] = {
+	{
+		.cmd = NETNS_CMD_NEWID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_newid,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = NETNS_CMD_GETID,
+		.policy = netns_nl_policy,
+		.doit = netns_nl_cmd_getid,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+int netns_genl_register(void)
+{
+	return genl_register_family_with_ops(&netns_genl_family,
+					     netns_genl_ops);
+}
+
 static int __init net_ns_init(void)
 {
 	struct net_generic *ng;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 76393f2f4b22..c6f39e40c9f3 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -1029,6 +1029,10 @@ static int __init genl_init(void)
 	if (err)
 		goto problem;
 
+	err = netns_genl_register();
+	if (err < 0)
+		goto problem;
+
 	return 0;
 
 problem:
-- 
2.1.0
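
Note on the id allocation: netns_nl_cmd_newid() passes the half-open range
[nsid, nsid + 1) to idr_alloc(), so either exactly the requested id is assigned
or the call fails. A minimal userspace sketch of that pattern, under stated
assumptions: struct id_table, table_alloc() and table_alloc_exact() are
hypothetical stand-ins and only imitate the range semantics; the error codes
are chosen for the sketch and are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_IDS 8	/* arbitrary size for the sketch */

/* Hypothetical stand-in for struct idr: the slot index is the id. */
struct id_table {
	void *slot[MAX_IDS];
};

/* Take the lowest free id in [start, end) and return it, or fail with
 * -ENOSPC if every id in the range is in use or out of bounds.
 */
static int table_alloc(struct id_table *t, void *ptr, int start, int end)
{
	int id;

	assert(ptr != NULL);
	if (start < 0)
		return -EINVAL;
	if (end <= 0 || end > MAX_IDS)
		end = MAX_IDS;

	for (id = start; id < end; id++) {
		if (!t->slot[id]) {
			t->slot[id] = ptr;
			return id;
		}
	}
	return -ENOSPC;
}

/* A one-element range requests exactly nsid; success collapses to 0,
 * as the "if (err >= 0) err = 0;" step does in the patch.
 */
static int table_alloc_exact(struct id_table *t, void *ptr, int nsid)
{
	int ret = table_alloc(t, ptr, nsid, nsid + 1);

	return ret >= 0 ? 0 : ret;
}
```

In this sketch a second request for an id that is already taken fails with
-ENOSPC instead of silently picking another id. This is separate from the
-EEXIST check in the patch, which rejects a peer that already has an id.
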
