Netdev List

* Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
From: Daniel Borkmann @ 2014-11-01 18:40 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: Alexei Starovoitov, Denis Kirjanov, netdev@vger.kernel.org,
	linuxppc-dev, Michael Ellerman, Matt Evans
In-Reply-To: <CAHj3AVkpouHa1y0jt9c5dyYvnd6dctociTMW1ORdxaKdsFEbhQ@mail.gmail.com>

On 10/31/2014 07:09 AM, Denis Kirjanov wrote:
> On 10/30/14, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>> On Wed, Oct 29, 2014 at 11:12 PM, Denis Kirjanov <kda@linux-powerpc.org>
>> wrote:
>>> Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load
>>> skb->pkt_type field.
>>>
>>> Before:
>>> [   88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS
>>> [   88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS
>>>
>>> After:
>>> [   80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS
>>> [   80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS
>>
>> if you'd only quoted #12, it would all make sense ;)
>> but #11 test is not using PKTTYPE. So your patch shouldn't
>> make a difference. Are these numbers with JIT on and off?
>
> Right.

Ok.

Please mention this in future log messages, as it was not quite
clear that "before" was actually with JIT off, and "after" was
with JIT on.

One could have read it that actually both cases were with JIT on,
and thus the inconsistent result for LD_IND_NET is a bit confusing
since you've quoted it here as well.

* Re: [PATCH 0/1] mv643xx_eth: Disable TSO by default
From: Ezequiel Garcia @ 2014-11-01 19:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David Miller, Thomas Petazzoni, Gregory Clement,
	Tawfik Bayouk, Lior Amsalem, Nadav Haklai
In-Reply-To: <1414863453.31792.8.camel@edumazet-glaptop2.roam.corp.google.com>

On 11/01/2014 02:37 PM, Eric Dumazet wrote:
> On Sat, 2014-11-01 at 10:26 -0700, Eric Dumazet wrote:
>> On Sat, 2014-11-01 at 12:30 -0300, Ezequiel Garcia wrote:
>>> Several users ([1], [2]) have been reporting data corruption with TSO on
>>> Kirkwood platforms (i.e. using the mv643xx_eth driver).
>>>
>>> Until we manage to find what's causing this, this simple patch will make
>>> the TSO path disabled by default. This patch should be queued for stable,
>>> fixing the TSO feature introduced in v3.16.
>>>
>>> The corruption itself is very easy to reproduce: checking md5sum on a mounted
>>> NFS directory gives a different result each time. Same tests using the mvneta
>>> driver (Armada 370/38x/XP SoC) pass with no issues.
>>>
>>> Frankly, I'm a bit puzzled about this, and so any ideas or debugging hints
>>> are well received.
>>
>> lack of barriers maybe ?
>>

Yup, that was my initial thought as well...

>> It seems you might need to populate all TX descriptors but delay the
>> first, like doing the populate in descending order.
>>
>> If you take a look at txq_submit_skb(), you'll see the final
>> desc->cmd_sts = cmd_sts (line 959) is done _after_ frags were cooked by
>> txq_submit_frag_skb()
>>
>> You should kick the NIC only when all TX descriptors are ready and
>> committed to memory.
>>
> 
> Untested patch would be :
> 

Yeah, it makes sense. I'm still seeing the corruption after applying
your patch.

However, maybe we are onto something. I'll see about taking a closer
look and give this some more thought.

Thanks for the hint!
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com
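
A minimal userspace sketch of the ordering Eric suggests above: fill the
descriptors in descending order and hand the first one to the hardware last,
so the NIC never sees a chain whose tail is still being written. All names
(sketch_tx_desc, SKETCH_DESC_OWNED_BY_HW, sketch_submit) are hypothetical and
are not taken from mv643xx_eth; the C11 release store only stands in for the
write barrier a real driver would need before a DMA-visible update. As noted
in the reply, this ordering alone did not make the corruption go away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DESC_OWNED_BY_HW 0x80000000u	/* hypothetical ownership bit */

/* Hypothetical TX descriptor; the layout is illustrative only. */
struct sketch_tx_desc {
	uint32_t buf_addr;
	uint32_t byte_cnt;
	_Atomic uint32_t cmd_sts;
};

/* Populate n descriptors so that descriptor 0 becomes visible last. */
static void sketch_submit(struct sketch_tx_desc *ring, const uint32_t *addrs,
			  const uint32_t *lens, size_t n)
{
	size_t i;

	assert(ring != NULL);
	if (n == 0)
		return;

	/* Descending order: descriptors n-1 .. 1 are written first. */
	for (i = n; i-- > 1; ) {
		ring[i].buf_addr = addrs[i];
		ring[i].byte_cnt = lens[i];
		atomic_store_explicit(&ring[i].cmd_sts,
				      SKETCH_DESC_OWNED_BY_HW,
				      memory_order_relaxed);
	}

	ring[0].buf_addr = addrs[0];
	ring[0].byte_cnt = lens[0];
	/* Release store: everything written above is ordered before this. */
	atomic_store_explicit(&ring[0].cmd_sts, SKETCH_DESC_OWNED_BY_HW,
			      memory_order_release);
}
```

Only after the final store would the driver kick the transmit queue.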


* Re: [PATCH] net: mvpp2: fix possible memory leak
From: David Miller @ 2014-11-01 19:12 UTC (permalink / raw)
  To: sudipm.mukherjee; +Cc: netdev, linux-kernel
In-Reply-To: <1414841374-30537-1-git-send-email-sudipm.mukherjee@gmail.com>

From: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Date: Sat,  1 Nov 2014 16:59:34 +0530

> we are allocating memory using kzalloc for struct mvpp2_prs_entry,
> but later when we are getting error we were just returning the error
> value without releasing the memory.
> 
> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>

Applied, thanks.
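
For readers following along, the kind of fix described in the commit message
can be sketched in userspace as below. This is not the mvpp2 code: the
struct, the failing step and the allocation counter are hypothetical, and
calloc()/free() stand in for kzalloc()/kfree(). The point is only that every
exit after the allocation goes through one label that releases the entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mvpp2_prs_entry. */
struct sketch_prs_entry {
	unsigned int index;
};

static int sketch_live_allocs;	/* entries allocated and not yet freed */

/* Hypothetical step that can fail after the entry was allocated. */
static int sketch_find_free_slot(int simulate_failure)
{
	return simulate_failure ? -ENOSPC : 7;
}

static int sketch_add_entry(int simulate_failure)
{
	struct sketch_prs_entry *pe;
	int ret = 0;
	int slot;

	pe = calloc(1, sizeof(*pe));	/* userspace analogue of kzalloc() */
	if (!pe)
		return -ENOMEM;
	sketch_live_allocs++;

	slot = sketch_find_free_slot(simulate_failure);
	if (slot < 0) {
		ret = slot;
		goto out;	/* a bare "return slot" here would leak pe */
	}
	pe->index = (unsigned int)slot;
	/* ... the entry would be written to hardware here ... */

out:
	free(pe);
	sketch_live_allocs--;
	assert(sketch_live_allocs >= 0);
	return ret;
}
```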

* Re: [PATCH net 0/2] net: systemport: TX dma fixes
From: David Miller @ 2014-11-01 19:14 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev
In-Reply-To: <1414795895-31612-1-git-send-email-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Fri, 31 Oct 2014 15:51:33 -0700

> This patch series contains two fixes for our transmit path, first one
> is a pretty nasty one since we were not allocating a large enough
> dma coherent pool for our transmit descriptors, which would work most of the
> time, since allocations are contiguous and we could have.
> 
> Second patch fixes a less frequent, though highly likely crash when using
> CMA allocations.

Series applied, thanks.

> I just missed your pull request to Linus, though I assume there will be
> another one?

Yes, there should be another one, or two, or three, or...

* Re: [PATCH] drivers: net: ethernet: xilinx: xilinx_emaclite: Compatible with 'xlnx,xps-ethernetlite-2.00.b' for QEMU using
From: David Miller @ 2014-11-01 21:03 UTC (permalink / raw)
  To: gang.chen.5i5j
  Cc: michal.simek, soren.brinkmann, sthokal, manuel.schoelling,
	paul.gortmaker, f.fainelli, ebiederm, netdev, linux-arm-kernel,
	linux-kernel
In-Reply-To: <54545AB9.70206@gmail.com>

From: Chen Gang <gang.chen.5i5j@gmail.com>
Date: Sat, 01 Nov 2014 11:59:53 +0800

> On 11/1/14 11:08, Chen Gang wrote:
>> When using the latest upstream QEMU (currently 2.1.2), the driver must be
>> compatible with 'xlnx,xps-ethernetlite-2.00.b'; otherwise the net device
>> cannot be found in microblaze QEMU. Related QEMU commands under Fedora 20:
>> 
>>   yum install libvirt
>>   yum install tunctl
>>   tunctl -b
>>   ip link set tap0 up
>>   brctl addif virbr0 tap0
>>   ./microblaze-softmmu/qemu-system-microblaze -M petalogix-s3adsp1800 \
>>     -kernel ../linux-stable.microblaze/arch/microblaze/boot/linux.bin \
>>     -no-reboot -append "console=ttyUL0,115200 doreboot" -nographic \
>>     -net nic,vlan=0,model=xlnx.xps-ethernetlite,macaddr=00:16:35:AF:94:00 \
>>     -net tap,vlan=0,ifname=tap0,script=no,downscript=no
>> 
>>   in microblaze qemu bash (guest machine):
>> 
>>     ifconfig eth0 add 192.168.122.1 netmask 255.255.255.0
> 
> Oh, sorry, it is 192.168.122.2 (192.168.122.1 is the router address).
> 
> Thanks.
> 
>>     ifconfig eth0 up
>> 
>> After applying this patch, the device is found and can be used by 'telnetd'
>> (busybox must be cross-built with glibc for this), so a machine outside the
>> guest can telnet to it without a password.
>> 
>> Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>

Applied with correct commit message, thanks.

* Re: [PATCH] smc91x: retrieve IRQ and trigger flags in a modern way
From: David Miller @ 2014-11-01 21:04 UTC (permalink / raw)
  To: linus.walleij; +Cc: netdev, nico
In-Reply-To: <1414787526-11197-1-git-send-email-linus.walleij@linaro.org>

From: Linus Walleij <linus.walleij@linaro.org>
Date: Fri, 31 Oct 2014 21:32:06 +0100

> The SMC91x is written to explicitly look up the IRQ resource
> from the platform device and extract the IRQ and flags, however
> the platform_get_irq() does additional things, like call
> of_irq_get() in the device tree case, which will translate
> the IRQ using the irqdomain and defer the probe if the
> IRQ host cannot be found.
> 
> As we're not looking up the resource, this will not retrieve
> the IRQ flags, but that is better done using
> irqd_get_trigger_type(), as the trigger is what the driver
> wants to modify. We take care to preserve the semantics that
> will make the trigger type provided from the resource
> override any local specifier.
> 
> Tested on the Nomadik NHK15 which has its SMC91x IRQ line
> connected to a STMPE2401 GPIO expander on I2C.
> 
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Applied, thanks.

* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
From: David Miller @ 2014-11-01 21:08 UTC (permalink / raw)
  To: ebiederm-aS9lmoZGLiVWk0Htik3J/w
  Cc: nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	luto-kltTT9wpgjJwATOyAt5JVQ, cwang-xCSkyg8dI+0RB7SZvlqPiA
In-Reply-To: <871tpph03k.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>

From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
Date: Thu, 30 Oct 2014 11:41:03 -0700

> Nicolas Dichtel <nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w@public.gmane.org> writes:
> 
>> The goal of this series is to be able to multicast netlink messages with an
>> attribute that identifies a peer netns.
>> This is needed by userland to interpret some information contained in
>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>> of x-netns netdevice (see also
>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>
>> Ids of peer netns are set by userland via new genl messages. These ids are
>> stored per netns and are local (ie only valid in the netns where they are set).
>> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
>> the id of a peer netns. Note that it will be possible to add a table (struct net
>> -> id) later to optimize this lookup if needed.
>>
>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>> Patches 2/4 and 3/4 implement an example of how to use these ids in rtnetlink
>> messages. And patch 4/4 shows that the netlink messages can be symmetric between
>> a GET and a SET.
>>
>> iproute2 patches are available, I can send them on demand.
> 
> A quick reply.  I think this patchset is in the right general direction.
> There are some oddball details that seem odd/awkward to me such as using
> genetlink instead of rtnetlink to get and set the ids, and not having
> ids if they are not set (that feels like a maintenance/usability challenge).
> 
> I would like to give your patches a deep review, but I won't be able to
> do that for a couple of weeks.  I am deep in the process of moving,
> and will be mostly offline until about the Nov 11th.

I'm going to mark this patch set 'deferred' in patchwork until things
move forward.

Thanks.

* Re: [PATCH]  net: mvpp2: fix possible memory leak
From: Thomas Petazzoni @ 2014-11-01 22:24 UTC (permalink / raw)
  To: Sudip Mukherjee; +Cc: David S. Miller, netdev, linux-kernel
In-Reply-To: <1414841374-30537-1-git-send-email-sudipm.mukherjee@gmail.com>

Dear Sudip Mukherjee,

On Sat,  1 Nov 2014 16:59:34 +0530, Sudip Mukherjee wrote:
> we are allocating memory using kzalloc for struct mvpp2_prs_entry,
> but later when we are getting error we were just returning the error
> value without releasing the memory.
> 
> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
> ---
> 
> hi,
> I could not build-test this after modifying it. I tried to compile using
> multi_v7_defconfig, but the cross compiler I have is not able to
> compile it and gives several warnings from the assembler.

That seems weird. Which compiler are you using, and which errors were
you getting?

In any case, it would have been good to Cc the authors of the driver.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

* [PATCH net-next 0/7] gue: Remote checksum offload
From: Tom Herbert @ 2014-11-01 22:57 UTC (permalink / raw)
  To: davem, netdev

This patch set implements remote checksum offload for
GUE, which is a mechanism that provides checksum offload of
encapsulated packets using rudimentary offload capabilities found in
most Network Interface Card (NIC) devices. The outer header checksum
for UDP is enabled in packets and, with some additional meta
information in the GUE header, a receiver is able to deduce the
checksum to be set for an inner encapsulated packet. Effectively this
offloads the computation of the inner checksum. Enabling the outer
checksum in encapsulation has the additional advantage that it covers
more of the packet than the inner checksum including the encapsulation
headers.

Remote checksum offload is described in:
http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00

The GUE transmit and receive paths are modified to support the
remote checksum offload option. The option contains a checksum
offset and checksum start which are directly derived from values
set in stack when doing CHECKSUM_PARTIAL. On receipt of the option, the
operation is to calculate the packet checksum from "start" to end of
the packet (normally derived for checksum complete), and then set 
the resultant value at checksum "offset" (the checksum field has
already been primed with the pseudo header). This emulates a NIC
that implements NETIF_F_HW_CSUM.
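
The receive-side operation described above can be illustrated with a small
userspace sketch. It is not the code from this series: the function names are
hypothetical, "offset" is assumed to be relative to "start" (as csum_offset is
relative to csum_start for CHECKSUM_PARTIAL), and the packet is assumed to be
a flat byte buffer. It sums from "start" to the end of the packet and stores
the complemented result in the checksum field, which is expected to already
hold the pseudo header sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words, folded to 16 bits. */
static uint16_t sketch_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical helper: resolve a remote checksum offload option in place.
 * Returns 0 on success, -1 if the option does not fit the packet.
 */
static int sketch_rco_resolve(uint8_t *pkt, size_t len,
			      size_t start, size_t offset)
{
	uint16_t csum;

	assert(pkt != NULL);
	if (start > len || (offset & 1) || offset + 2 > len - start)
		return -1;

	/* The field at start + offset is part of the summed range. */
	csum = (uint16_t)~sketch_csum(pkt + start, len - start);
	pkt[start + offset] = (uint8_t)(csum >> 8);
	pkt[start + offset + 1] = (uint8_t)(csum & 0xff);
	return 0;
}
```

After this step a receiver that adds the pseudo header sum to the sum of the
covered range should obtain 0xffff, which is the usual validity check.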

The primary purpose of this feature is to eliminate cost of performing
checksum calculation over a packet when encapsulating.

In this patch set:
  - Move fou_build_header into fou.c and split it into a couple of
    functions
  - Enable offloading of outer UDP checksum in encapsulation
  - Change udp_offload to support remote checksum offload, includes
    new GSO type and ensuring encapsulated layers (TCP) don't try to
    set a checksum covered by RCO
  - TX support for RCO with GUE. This is configured through ip_tunnel
    and sets the option on transmit when the packet being encapsulated is
    CHECKSUM_PARTIAL
  - RX support for RCO with GUE for normal and GRO paths. Includes
    resolving the offloaded checksum

Testing:

I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200
streams, comparing GUE with and without remote checksum offload (doing
checksum-unnecessary to complete conversion in both cases). These
were run on mlnx4 and bnx2x. Some mlnx4 results are below.

GRE/GUE
    TCP_STREAM 
      IPv4, with remote checksum offload
        9.71% TX CPU utilization
        7.42% RX CPU utilization
        36380 Mbps
      IPv4, without remote checksum offload
        12.40% TX CPU utilization
        7.36% RX CPU utilization
        36591 Mbps
    TCP_RR
      IPv4, with remote checksum offload
        77.79% CPU utilization
        91/144/216 90/95/99% latencies
        1.95127e+06 tps
      IPv4, without remote checksum offload
        78.70% CPU utilization
        89/152/297 90/95/99% latencies
        1.95458e+06 tps

IPIP/GUE
    TCP_STREAM 
      With remote checksum offload
        10.30% TX CPU utilization
        7.43% RX CPU utilization
        36486 Mbps
      Without remote checksum offload
        12.47% TX CPU utilization
        7.49% RX CPU utilization
        36694 Mbps
    TCP_RR
      With remote checksum offload
        77.80% CPU utilization
        87/153/270 90/95/99% latencies
        1.98735e+06 tps
      Without remote checksum offload
        77.98% CPU utilization
        87/150/287 90/95/99% latencies
        1.98737e+06 tps

SIT/GUE
    TCP_STREAM 
      With remote checksum offload
        9.68% TX CPU utilization
        7.36% RX CPU utilization
        35971 Mbps
      Without remote checksum offload
        12.95% TX CPU utilization
        8.04% RX CPU utilization
        36177 Mbps
    TCP_RR
      With remote checksum offload
        79.32% CPU utilization
        94/158/295 90/95/99% latencies
        1.88842e+06 tps
      Without remote checksum offload
        80.23% CPU utilization
        94/149/226 90/95/99% latencies
        1.90338e+06 tps

VXLAN
    TCP_STREAM 
        35.03% TX CPU utilization
        20.85% RX CPU utilization
        36230 Mbps
    TCP_RR
        77.36% CPU utilization
        84/146/270 90/95/99% latencies
        2.08063e+06 tps

We can also look at CPU time in csum_partial using perf (with bnx2x
setup). For GRE with TCP_STREAM I see:

    With remote checksum offload
        0.33% TX
        1.81% RX
    Without remote checksum offload
        6.00% TX
        0.51% RX

I suspect the fact that time in csum_partial noticeably increases
with remote checksum offload for RX is due to taking the cache miss on
the encapsulated header in that function. By similar reasoning, if on
the TX side the packet were not in cache (say we did a splice from a
file whose data was never touched by the CPU) the CPU savings for TX
would probably be more pronounced.

Tom Herbert (7):
  net: Move fou_build_header into fou.c and refactor
  udp: Offload outer UDP tunnel csum if available
  gue: Add infrastructure for flags and options
  udp: Changes to udp_offload to support remote checksum offload
  gue: Protocol constants for remote checksum offload
  gue: TX support for using remote checksum offload option
  gue: Receive side of remote checksum offload

 include/linux/netdev_features.h |   4 +-
 include/linux/netdevice.h       |   1 +
 include/linux/skbuff.h          |   4 +-
 include/net/fou.h               |  38 ++++
 include/net/gue.h               | 103 ++++++++++-
 include/uapi/linux/if_tunnel.h  |   1 +
 net/core/skbuff.c               |   4 +-
 net/ipv4/Kconfig                |   9 +
 net/ipv4/af_inet.c              |   1 +
 net/ipv4/fou.c                  | 388 +++++++++++++++++++++++++++++++++++-----
 net/ipv4/ip_tunnel.c            |  61 ++-----
 net/ipv4/tcp_offload.c          |   1 +
 net/ipv4/udp_offload.c          |  66 +++++--
 net/ipv6/ip6_offload.c          |   1 +
 net/ipv6/udp_offload.c          |   1 +
 15 files changed, 565 insertions(+), 118 deletions(-)
 create mode 100644 include/net/fou.h

-- 
2.1.0.rc2.206.gedb03e5

* [PATCH net-next 1/7] net: Move fou_build_header into fou.c and refactor
From: Tom Herbert @ 2014-11-01 22:57 UTC (permalink / raw)
  To: davem, netdev
In-Reply-To: <1414882683-25484-1-git-send-email-therbert@google.com>

Move fou_build_header out of ip_tunnel.c and into fou.c splitting
it up into fou_build_header, gue_build_header, and fou_build_udp.
This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap
to call fou_build_header or gue_build_header based on the tunnel
encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen
functions which are called by ip_encap_hlen. New net/fou.h has
prototypes and defines for this.

Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels
can use FOU/GUE and fou module is also selected.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/fou.h    | 26 +++++++++++++++++++
 net/ipv4/Kconfig     |  9 +++++++
 net/ipv4/fou.c       | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/ip_tunnel.c | 61 +++++++++----------------------------------
 4 files changed, 120 insertions(+), 49 deletions(-)
 create mode 100644 include/net/fou.h
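
As a rough illustration of the header length dispatch described in the commit
message, here is a userspace sketch. The enum and struct names are
hypothetical stand-ins (the kernel uses TUNNEL_ENCAP_*, struct udphdr and
struct guehdr); the sizes assume the 8 byte UDP header and the 4 byte base
GUE header that this patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TUNNEL_ENCAP_* types. */
enum sketch_encap_type {
	SKETCH_ENCAP_NONE,
	SKETCH_ENCAP_FOU,
	SKETCH_ENCAP_GUE,
};

/* Hypothetical stand-ins for struct udphdr and struct guehdr. */
struct sketch_udphdr {
	uint16_t source, dest, len, check;
};

struct sketch_guehdr {
	uint32_t word;
};

static_assert(sizeof(struct sketch_udphdr) == 8, "UDP header is 8 bytes");
static_assert(sizeof(struct sketch_guehdr) == 4, "base GUE header is 4 bytes");

/* Encapsulation header length for a tunnel type, or -EINVAL if unknown. */
static int sketch_encap_hlen(int type)
{
	switch (type) {
	case SKETCH_ENCAP_NONE:
		return 0;
	case SKETCH_ENCAP_FOU:
		/* FOU: a bare UDP header in front of the payload. */
		return (int)sizeof(struct sketch_udphdr);
	case SKETCH_ENCAP_GUE:
		/* GUE: UDP header followed by the GUE header. */
		return (int)(sizeof(struct sketch_udphdr) +
			     sizeof(struct sketch_guehdr));
	default:
		return -EINVAL;
	}
}
```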

diff --git a/include/net/fou.h b/include/net/fou.h
new file mode 100644
index 0000000..cf4ce88
--- /dev/null
+++ b/include/net/fou.h
@@ -0,0 +1,26 @@
+#ifndef __NET_FOU_H
+#define __NET_FOU_H
+
+#include <linux/skbuff.h>
+
+#include <net/flow.h>
+#include <net/gue.h>
+#include <net/ip_tunnels.h>
+#include <net/udp.h>
+
+int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+		     u8 *protocol, struct flowi4 *fl4);
+int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+		     u8 *protocol, struct flowi4 *fl4);
+
+static size_t fou_encap_hlen(struct ip_tunnel_encap *e)
+{
+	return sizeof(struct udphdr);
+}
+
+static size_t gue_encap_hlen(struct ip_tunnel_encap *e)
+{
+	return sizeof(struct udphdr) + sizeof(struct guehdr);
+}
+
+#endif
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index e682b48..bd29016 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -322,6 +322,15 @@ config NET_FOU
 	  network mechanisms and optimizations for UDP (such as ECMP
 	  and RSS) can be leveraged to provide better service.
 
+config NET_FOU_IP_TUNNELS
+	bool "IP: FOU encapsulation of IP tunnels"
+	depends on NET_IPIP || NET_IPGRE || IPV6_SIT
+	select NET_FOU
+	---help---
+	  Allow configuration of FOU or GUE encapsulation for IP tunnels.
+	  When this option is enabled IP tunnels can be configured to use
+	  FOU or GUE encapsulation.
+
 config GENEVE
 	tristate "Generic Network Virtualization Encapsulation (Geneve)"
 	depends on INET
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 32e7892..5446c1c 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -487,6 +487,79 @@ static const struct genl_ops fou_nl_ops[] = {
 	},
 };
 
+static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
+			  struct flowi4 *fl4, u8 *protocol, __be16 sport)
+{
+	struct udphdr *uh;
+
+	skb_push(skb, sizeof(struct udphdr));
+	skb_reset_transport_header(skb);
+
+	uh = udp_hdr(skb);
+
+	uh->dest = e->dport;
+	uh->source = sport;
+	uh->len = htons(skb->len);
+	uh->check = 0;
+	udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
+		     fl4->saddr, fl4->daddr, skb->len);
+
+	*protocol = IPPROTO_UDP;
+}
+
+int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+		     u8 *protocol, struct flowi4 *fl4)
+{
+	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
+	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+	__be16 sport;
+
+	skb = iptunnel_handle_offloads(skb, csum, type);
+
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
+	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
+					       skb, 0, 0, false);
+	fou_build_udp(skb, e, fl4, protocol, sport);
+
+	return 0;
+}
+EXPORT_SYMBOL(fou_build_header);
+
+int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+		     u8 *protocol, struct flowi4 *fl4)
+{
+	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
+	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+	struct guehdr *guehdr;
+	size_t hdr_len = sizeof(struct guehdr);
+	__be16 sport;
+
+	skb = iptunnel_handle_offloads(skb, csum, type);
+
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
+	/* Get source port (based on flow hash) before skb_push */
+	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
+					       skb, 0, 0, false);
+
+	skb_push(skb, hdr_len);
+
+	guehdr = (struct guehdr *)skb->data;
+
+	guehdr->version = 0;
+	guehdr->hlen = 0;
+	guehdr->flags = 0;
+	guehdr->next_hdr = *protocol;
+
+	fou_build_udp(skb, e, fl4, protocol, sport);
+
+	return 0;
+}
+EXPORT_SYMBOL(gue_build_header);
+
 static int __init fou_init(void)
 {
 	int ret;
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 0bb8e14..c3587e1 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -56,7 +56,10 @@
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/udp.h>
-#include <net/gue.h>
+
+#if IS_ENABLED(CONFIG_NET_FOU)
+#include <net/fou.h>
+#endif
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -494,10 +497,12 @@ static int ip_encap_hlen(struct ip_tunnel_encap *e)
 	switch (e->type) {
 	case TUNNEL_ENCAP_NONE:
 		return 0;
+#if IS_ENABLED(CONFIG_NET_FOU)
 	case TUNNEL_ENCAP_FOU:
-		return sizeof(struct udphdr);
+		return fou_encap_hlen(e);
 	case TUNNEL_ENCAP_GUE:
-		return sizeof(struct udphdr) + sizeof(struct guehdr);
+		return gue_encap_hlen(e);
+#endif
 	default:
 		return -EINVAL;
 	}
@@ -526,60 +531,18 @@ int ip_tunnel_encap_setup(struct ip_tunnel *t,
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_encap_setup);
 
-static int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
-			    size_t hdr_len, u8 *protocol, struct flowi4 *fl4)
-{
-	struct udphdr *uh;
-	__be16 sport;
-	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
-	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
-
-	skb = iptunnel_handle_offloads(skb, csum, type);
-
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
-
-	/* Get length and hash before making space in skb */
-
-	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
-					       skb, 0, 0, false);
-
-	skb_push(skb, hdr_len);
-
-	skb_reset_transport_header(skb);
-	uh = udp_hdr(skb);
-
-	if (e->type == TUNNEL_ENCAP_GUE) {
-		struct guehdr *guehdr = (struct guehdr *)&uh[1];
-
-		guehdr->version = 0;
-		guehdr->hlen = 0;
-		guehdr->flags = 0;
-		guehdr->next_hdr = *protocol;
-	}
-
-	uh->dest = e->dport;
-	uh->source = sport;
-	uh->len = htons(skb->len);
-	uh->check = 0;
-	udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
-		     fl4->saddr, fl4->daddr, skb->len);
-
-	*protocol = IPPROTO_UDP;
-
-	return 0;
-}
-
 int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
 		    u8 *protocol, struct flowi4 *fl4)
 {
 	switch (t->encap.type) {
 	case TUNNEL_ENCAP_NONE:
 		return 0;
+#if IS_ENABLED(CONFIG_NET_FOU)
 	case TUNNEL_ENCAP_FOU:
+		return fou_build_header(skb, &t->encap, protocol, fl4);
 	case TUNNEL_ENCAP_GUE:
-		return fou_build_header(skb, &t->encap, t->encap_hlen,
-					protocol, fl4);
+		return gue_build_header(skb, &t->encap, protocol, fl4);
+#endif
 	default:
 		return -EINVAL;
 	}
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH net-next 2/7] udp: Offload outer UDP tunnel csum if available
From: Tom Herbert @ 2014-11-01 22:57 UTC (permalink / raw)
  To: davem, netdev
In-Reply-To: <1414882683-25484-1-git-send-email-therbert@google.com>

In __skb_udp_tunnel_segment if outer UDP checksums are enabled and
ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload
if device features allow it.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/udp_offload.c | 52 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 16 deletions(-)
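
The per-segment fix-up of uh->check in the loop below adjusts an existing
checksum for a changed length instead of recomputing it from scratch. That
appears to follow the same idea as the incremental update in RFC 1624,
HC' = ~(~HC + ~m + m'). The sketch below is a generic userspace version of
that formula with hypothetical function names; it is not the kernel's
csum_fold() path and ignores byte order handling.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t sketch_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	assert((sum >> 16) == 0);
	return (uint16_t)sum;
}

/* Update checksum "check" after one covered 16-bit word changed from
 * old_val to new_val (RFC 1624, equation 3).
 */
static uint16_t sketch_csum_update16(uint16_t check, uint16_t old_val,
				     uint16_t new_val)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~old_val;
	sum += new_val;
	return (uint16_t)~sketch_fold(sum);
}
```

For example, a checksum of 0xA89B over the words 0x4500, 0x0030, 0x1234
becomes 0xA87B when the length word changes from 0x0030 to 0x0050.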

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 6480cea..a774711 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -29,7 +29,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	netdev_features_t features,
 	struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
 					     netdev_features_t features),
-	__be16 new_protocol)
+	__be16 new_protocol, bool is_ipv6)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	u16 mac_offset = skb->mac_header;
@@ -39,7 +39,9 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	netdev_features_t enc_features;
 	int udp_offset, outer_hlen;
 	unsigned int oldlen;
-	bool need_csum;
+	bool need_csum = !!(skb_shinfo(skb)->gso_type &
+			    SKB_GSO_UDP_TUNNEL_CSUM);
+	bool offload_csum = false, dont_encap = need_csum;
 
 	oldlen = (u16)~skb->len;
 
@@ -52,10 +54,12 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	skb_set_network_header(skb, skb_inner_network_offset(skb));
 	skb->mac_len = skb_inner_network_offset(skb);
 	skb->protocol = new_protocol;
+	skb->encap_hdr_csum = need_csum;
 
-	need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM);
-	if (need_csum)
-		skb->encap_hdr_csum = 1;
+	/* Try to offload checksum if possible */
+	offload_csum = !!(need_csum &&
+			  (skb->dev->features &
+			   (is_ipv6 ? NETIF_F_V6_CSUM : NETIF_F_V4_CSUM)));
 
 	/* segment inner packet. */
 	enc_features = skb->dev->hw_enc_features & features;
@@ -72,11 +76,21 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	do {
 		struct udphdr *uh;
 		int len;
-
-		skb_reset_inner_headers(skb);
-		skb->encapsulation = 1;
+		__be32 delta;
+
+		if (dont_encap) {
+			skb->encapsulation = 0;
+			skb->ip_summed = CHECKSUM_NONE;
+		} else {
+			/* Only set up inner headers if we might be offloading
+			 * inner checksum.
+			 */
+			skb_reset_inner_headers(skb);
+			skb->encapsulation = 1;
+		}
 
 		skb->mac_len = mac_len;
+		skb->protocol = protocol;
 
 		skb_push(skb, outer_hlen);
 		skb_reset_mac_header(skb);
@@ -86,19 +100,25 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 		uh = udp_hdr(skb);
 		uh->len = htons(len);
 
-		if (need_csum) {
-			__be32 delta = htonl(oldlen + len);
+		if (!need_csum)
+			continue;
+
+		delta = htonl(oldlen + len);
+
+		uh->check = ~csum_fold((__force __wsum)
+				       ((__force u32)uh->check +
+					(__force u32)delta));
 
-			uh->check = ~csum_fold((__force __wsum)
-					       ((__force u32)uh->check +
-						(__force u32)delta));
+		if (offload_csum) {
+			skb->ip_summed = CHECKSUM_PARTIAL;
+			skb->csum_start = skb_transport_header(skb) - skb->head;
+			skb->csum_offset = offsetof(struct udphdr, check);
+		} else {
 			uh->check = gso_make_checksum(skb, ~uh->check);
 
 			if (uh->check == 0)
 				uh->check = CSUM_MANGLED_0;
 		}
-
-		skb->protocol = protocol;
 	} while ((skb = skb->next));
 out:
 	return segs;
@@ -134,7 +154,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 	}
 
 	segs = __skb_udp_tunnel_segment(skb, features, gso_inner_segment,
-					protocol);
+					protocol, is_ipv6);
 
 out_unlock:
 	rcu_read_unlock();
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH net-next 3/7] gue: Add infrastructure for flags and options
From: Tom Herbert @ 2014-11-01 22:57 UTC (permalink / raw)
  To: davem, netdev
In-Reply-To: <1414882683-25484-1-git-send-email-therbert@google.com>

Add functions and basic definitions for processing standard flags,
private flags, and control messages. This includes definitions
to compute length of optional fields corresponding to a set of flags.
Flag validation is in validate_gue_flags function. This checks for
unknown flags, and that length of optional fields is <= length
in guehdr hlen.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/fou.h |  11 ++++-
 include/net/gue.h | 100 ++++++++++++++++++++++++++++++++++++--
 net/ipv4/fou.c    | 142 ++++++++++++++++++++++++++++++++++++------------------
 3 files changed, 199 insertions(+), 54 deletions(-)
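
To make the header layout and the flag check easier to follow, here is a
userspace sketch that decodes the first 32-bit word of a GUE header from raw
bytes and applies the two checks named in the commit message. The names are
hypothetical, the function returns 1 for "valid" (the patch's
validate_gue_flags returns non-zero for "invalid"), and it assumes hlen counts
32-bit words of optional fields, which is not spelled out in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GUE_FLAG_PRIV	0x0001u	/* P bit, last bit of the word */
#define SKETCH_GUE_LEN_PRIV	4	/* bytes used by the private flags */
#define SKETCH_GUE_FLAGS_ALL	(SKETCH_GUE_FLAG_PRIV)

struct sketch_gue_fields {
	unsigned int version;		/* 2 bits */
	unsigned int control;		/* 1 bit  */
	unsigned int hlen;		/* 5 bits */
	unsigned int proto_ctype;	/* 8 bits */
	unsigned int flags;		/* 16 bits, host order */
};

/* Decode the first four bytes of a GUE header (network byte order). */
static void sketch_gue_parse(const uint8_t w[4], struct sketch_gue_fields *f)
{
	assert(f != NULL);
	f->version = w[0] >> 6;
	f->control = (w[0] >> 5) & 0x1;
	f->hlen = w[0] & 0x1f;
	f->proto_ctype = w[1];
	f->flags = ((unsigned int)w[2] << 8) | w[3];
}

/* Returns 1 if all flags are known and their fields fit within hlen. */
static int sketch_gue_flags_ok(const struct sketch_gue_fields *f)
{
	size_t optlen = (size_t)f->hlen * 4;	/* assumption: 4-byte units */
	size_t need = (f->flags & SKETCH_GUE_FLAG_PRIV) ?
		      SKETCH_GUE_LEN_PRIV : 0;

	if (f->flags & ~SKETCH_GUE_FLAGS_ALL)
		return 0;	/* unknown standard flag */
	return need <= optlen;
}
```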

diff --git a/include/net/fou.h b/include/net/fou.h
index cf4ce88..d2d8055 100644
--- a/include/net/fou.h
+++ b/include/net/fou.h
@@ -20,7 +20,16 @@ static size_t fou_encap_hlen(struct ip_tunnel_encap *e)
 
 static size_t gue_encap_hlen(struct ip_tunnel_encap *e)
 {
-	return sizeof(struct udphdr) + sizeof(struct guehdr);
+	size_t len;
+	bool need_priv = false;
+
+	len = sizeof(struct udphdr) + sizeof(struct guehdr);
+
+	/* Add in lengths flags */
+
+	len += need_priv ? GUE_LEN_PRIV : 0;
+
+	return len;
 }
 
 #endif
diff --git a/include/net/gue.h b/include/net/gue.h
index b6c3327..cb68ae8 100644
--- a/include/net/gue.h
+++ b/include/net/gue.h
@@ -1,23 +1,113 @@
 #ifndef __NET_GUE_H
 #define __NET_GUE_H
 
+/* Definitions for the GUE header, standard and private flags, lengths
+ * of optional fields are below.
+ *
+ * Diagram of GUE header:
+ *
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |Ver|C|  Hlen   | Proto/ctype   |        Standard flags       |P|
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                                                               |
+ * ~                      Fields (optional)                        ~
+ * |                                                               |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |            Private flags (optional, P bit is set)             |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                                                               |
+ * ~                   Private fields (optional)                   ~
+ * |                                                               |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * C bit indicates control message when set, data message when unset.
+ * For a control message, proto/ctype is interpreted as a type of
+ * control message. For data messages, proto/ctype is the IP protocol
+ * of the next header.
+ *
+ * P bit indicates private flags field is present. The private flags
+ * may refer to options placed after this field.
+ */
+
 struct guehdr {
 	union {
 		struct {
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-			__u8	hlen:4,
-			version:4;
+			__u8	hlen:5,
+				control:1,
+				version:2;
 #elif defined (__BIG_ENDIAN_BITFIELD)
-			__u8	version:4,
-				hlen:4;
+			__u8	version:2,
+				control:1,
+				hlen:5;
 #else
 #error  "Please fix <asm/byteorder.h>"
 #endif
-			__u8    next_hdr;
+			__u8    proto_ctype;
 			__u16   flags;
 		};
 		__u32 word;
 	};
 };
 
+/* Standard flags in GUE header */
+
+#define GUE_FLAG_PRIV	htons(1<<0)	/* Private flags are in options */
+#define GUE_LEN_PRIV	4
+
+#define GUE_FLAGS_ALL	(GUE_FLAG_PRIV)
+
+/* Private flags in the private option extension */
+
+#define GUE_PFLAGS_ALL	(0)
+
+/* Functions to compute options length corresponding to flags.
+ * If we ever have a lot of flags this can be potentially be
+ * converted to a more optimized algorithm (table lookup
+ * for instance).
+ */
+static inline size_t guehdr_flags_len(__be16 flags)
+{
+	return ((flags & GUE_FLAG_PRIV) ? GUE_LEN_PRIV : 0);
+}
+
+static inline size_t guehdr_priv_flags_len(__be32 flags)
+{
+	return 0;
+}
+
+/* Validate standard and private flags. Returns non-zero (meaning invalid)
+ * if there is an unknown standard or private flag, or the options length for
+ * the flags exceeds the options length specified in hlen of the GUE header.
+ */
+static inline int validate_gue_flags(struct guehdr *guehdr,
+				     size_t optlen)
+{
+	size_t len;
+	__be32 flags = guehdr->flags;
+
+	if (flags & ~GUE_FLAGS_ALL)
+		return 1;
+
+	len = guehdr_flags_len(flags);
+	if (len > optlen)
+		return 1;
+
+	if (flags & GUE_FLAG_PRIV) {
+		/* Private flags are last four bytes accounted in
+		 * guehdr_flags_len
+		 */
+		flags = *(__be32 *)((void *)&guehdr[1] + len - GUE_LEN_PRIV);
+
+		if (flags & ~GUE_PFLAGS_ALL)
+			return 1;
+
+		len += guehdr_priv_flags_len(flags);
+		if (len > optlen)
+			return 1;
+	}
+
+	return 0;
+}
+
 #endif
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 5446c1c..a3b8c5b 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -38,21 +38,17 @@ static inline struct fou *fou_from_sock(struct sock *sk)
 	return sk->sk_user_data;
 }
 
-static int fou_udp_encap_recv_deliver(struct sk_buff *skb,
-				      u8 protocol, size_t len)
+static void fou_recv_pull(struct sk_buff *skb, size_t len)
 {
 	struct iphdr *iph = ip_hdr(skb);
 
 	/* Remove 'len' bytes from the packet (UDP header and
-	 * FOU header if present), modify the protocol to the one
-	 * we found, and then call rcv_encap.
+	 * FOU header if present).
 	 */
 	iph->tot_len = htons(ntohs(iph->tot_len) - len);
 	__skb_pull(skb, len);
 	skb_postpull_rcsum(skb, udp_hdr(skb), len);
 	skb_reset_transport_header(skb);
-
-	return -protocol;
 }
 
 static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
@@ -62,16 +58,24 @@ static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
 	if (!fou)
 		return 1;
 
-	return fou_udp_encap_recv_deliver(skb, fou->protocol,
-					  sizeof(struct udphdr));
+	fou_recv_pull(skb, sizeof(struct udphdr));
+
+	return -fou->protocol;
+}
+
+static int gue_control_message(struct sk_buff *skb, struct guehdr *guehdr)
+{
+	/* No support yet */
+	kfree_skb(skb);
+	return 0;
 }
 
 static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct fou *fou = fou_from_sock(sk);
-	size_t len;
+	size_t len, optlen, hdrlen;
 	struct guehdr *guehdr;
-	struct udphdr *uh;
+	void *data;
 
 	if (!fou)
 		return 1;
@@ -80,25 +84,38 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 	if (!pskb_may_pull(skb, len))
 		goto drop;
 
-	uh = udp_hdr(skb);
-	guehdr = (struct guehdr *)&uh[1];
+	guehdr = (struct guehdr *)&udp_hdr(skb)[1];
+
+	optlen = guehdr->hlen << 2;
+	len += optlen;
 
-	len += guehdr->hlen << 2;
 	if (!pskb_may_pull(skb, len))
 		goto drop;
 
-	uh = udp_hdr(skb);
-	guehdr = (struct guehdr *)&uh[1];
+	/* guehdr may change after pull */
+	guehdr = (struct guehdr *)&udp_hdr(skb)[1];
 
-	if (guehdr->version != 0)
-		goto drop;
+	hdrlen = sizeof(struct guehdr) + optlen;
 
-	if (guehdr->flags) {
-		/* No support yet */
+	if (guehdr->version != 0 || validate_gue_flags(guehdr, optlen))
 		goto drop;
+
+	/* Pull UDP and GUE headers */
+	fou_recv_pull(skb, len);
+
+	data = &guehdr[1];
+
+	if (guehdr->flags & GUE_FLAG_PRIV) {
+		data += GUE_LEN_PRIV;
+
+		/* Process private flags */
 	}
 
-	return fou_udp_encap_recv_deliver(skb, guehdr->next_hdr, len);
+	if (unlikely(guehdr->control))
+		return gue_control_message(skb, guehdr);
+
+	return -guehdr->proto_ctype;
+
 drop:
 	kfree_skb(skb);
 	return 0;
@@ -154,36 +171,47 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 	const struct net_offload *ops;
 	struct sk_buff **pp = NULL;
 	struct sk_buff *p;
-	u8 proto;
 	struct guehdr *guehdr;
-	unsigned int hlen, guehlen;
-	unsigned int off;
+	size_t len, optlen, hdrlen, off;
+	void *data;
 	int flush = 1;
 
 	off = skb_gro_offset(skb);
-	hlen = off + sizeof(*guehdr);
+	len = off + sizeof(*guehdr);
+
 	guehdr = skb_gro_header_fast(skb, off);
-	if (skb_gro_header_hard(skb, hlen)) {
-		guehdr = skb_gro_header_slow(skb, hlen, off);
+	if (skb_gro_header_hard(skb, len)) {
+		guehdr = skb_gro_header_slow(skb, len, off);
 		if (unlikely(!guehdr))
 			goto out;
 	}
 
-	proto = guehdr->next_hdr;
+	optlen = guehdr->hlen << 2;
+	len += optlen;
 
-	rcu_read_lock();
-	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
-	if (WARN_ON(!ops || !ops->callbacks.gro_receive))
-		goto out_unlock;
+	if (skb_gro_header_hard(skb, len)) {
+		guehdr = skb_gro_header_slow(skb, len, off);
+		if (unlikely(!guehdr))
+			goto out;
+	}
 
-	guehlen = sizeof(*guehdr) + (guehdr->hlen << 2);
+	if (unlikely(guehdr->control) || guehdr->version != 0 ||
+	    validate_gue_flags(guehdr, optlen))
+		goto out;
 
-	hlen = off + guehlen;
-	if (skb_gro_header_hard(skb, hlen)) {
-		guehdr = skb_gro_header_slow(skb, hlen, off);
-		if (unlikely(!guehdr))
-			goto out_unlock;
+	hdrlen = sizeof(*guehdr) + optlen;
+
+	skb_gro_pull(skb, hdrlen);
+
+	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
+	skb_gro_postpull_rcsum(skb, guehdr, hdrlen);
+
+	data = &guehdr[1];
+
+	if (guehdr->flags & GUE_FLAG_PRIV) {
+		data += GUE_LEN_PRIV;
+
+		/* Process private flags */
 	}
 
 	flush = 0;
@@ -197,7 +225,7 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 		guehdr2 = (struct guehdr *)(p->data + off);
 
 		/* Compare base GUE header to be equal (covers
-		 * hlen, version, next_hdr, and flags.
+		 * hlen, version, proto_ctype, and flags.
 		 */
 		if (guehdr->word != guehdr2->word) {
 			NAPI_GRO_CB(p)->same_flow = 0;
@@ -212,10 +240,11 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 		}
 	}
 
-	skb_gro_pull(skb, guehlen);
-
-	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
-	skb_gro_postpull_rcsum(skb, guehdr, guehlen);
+	rcu_read_lock();
+	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
+	ops = rcu_dereference(offloads[guehdr->proto_ctype]);
+	if (WARN_ON(!ops || !ops->callbacks.gro_receive))
+		goto out_unlock;
 
 	pp = ops->callbacks.gro_receive(head, skb);
 
@@ -236,7 +265,7 @@ static int gue_gro_complete(struct sk_buff *skb, int nhoff)
 	u8 proto;
 	int err = -ENOENT;
 
-	proto = guehdr->next_hdr;
+	proto = guehdr->proto_ctype;
 
 	guehlen = sizeof(*guehdr) + (guehdr->hlen << 2);
 
@@ -533,8 +562,12 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
 	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
 	struct guehdr *guehdr;
-	size_t hdr_len = sizeof(struct guehdr);
+	size_t optlen = 0;
 	__be16 sport;
+	void *data;
+	bool need_priv = false;
+
+	optlen += need_priv ? GUE_LEN_PRIV : 0;
 
 	skb = iptunnel_handle_offloads(skb, csum, type);
 
@@ -545,14 +578,27 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
 					       skb, 0, 0, false);
 
-	skb_push(skb, hdr_len);
+	skb_push(skb, sizeof(struct guehdr) + optlen);
 
 	guehdr = (struct guehdr *)skb->data;
 
+	guehdr->control = 0;
 	guehdr->version = 0;
-	guehdr->hlen = 0;
+	guehdr->hlen = optlen >> 2;
 	guehdr->flags = 0;
-	guehdr->next_hdr = *protocol;
+	guehdr->proto_ctype = *protocol;
+
+	data = &guehdr[1];
+
+	if (need_priv) {
+		__be32 *flags = data;
+
+		guehdr->flags |= GUE_FLAG_PRIV;
+		*flags = 0;
+		data += GUE_LEN_PRIV;
+
+		/* Add private flags */
+	}
 
 	fou_build_udp(skb, e, fl4, protocol, sport);
 
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH net-next 4/7] udp: Changes to udp_offload to support remote checksum offload
From: Tom Herbert @ 2014-11-01 22:58 UTC (permalink / raw)
  To: davem, netdev
In-Reply-To: <1414882683-25484-1-git-send-email-therbert@google.com>

Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates that remote
checksum offload is being done (in this case the inner checksum must not
be offloaded to the NIC).

Added logic in __skb_udp_tunnel_segment to handle remote checksum
offload case.
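
For readers unfamiliar with the "from scratch" case, here is a small
userspace sketch of an Internet checksum with the UDP rule that a
computed value of zero is transmitted as 0xffff. The sk_ helpers are
hypothetical; the sketch assumes the caller has already arranged for the
pseudo-header to be covered by the buffer, and that the buffer is packet
sized (at most 64 KiB) so the 32-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum over buf as big-endian 16-bit words, folded to
 * 16 bits and inverted (RFC 1071 style). An odd trailing byte is
 * padded with zero.
 */
static uint16_t sk_inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}

/* UDP reserves 0 for "no checksum", so a computed 0 is sent as 0xffff. */
static uint16_t sk_udp_csum(const uint8_t *buf, size_t len)
{
	uint16_t c = sk_inet_csum(buf, len);

	return c ? c : 0xffff;
}
```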

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdev_features.h |  4 +++-
 include/linux/netdevice.h       |  1 +
 include/linux/skbuff.h          |  4 +++-
 net/core/skbuff.c               |  4 ++--
 net/ipv4/af_inet.c              |  1 +
 net/ipv4/tcp_offload.c          |  1 +
 net/ipv4/udp_offload.c          | 18 ++++++++++++++++--
 net/ipv6/ip6_offload.c          |  1 +
 net/ipv6/udp_offload.c          |  1 +
 9 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index dcfdecb..8c94b07 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -48,8 +48,9 @@ enum {
 	NETIF_F_GSO_UDP_TUNNEL_BIT,	/* ... UDP TUNNEL with TSO */
 	NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
 	NETIF_F_GSO_MPLS_BIT,		/* ... MPLS segmentation */
+	NETIF_F_GSO_TUNNEL_REMCSUM_BIT, /* ... TUNNEL with TSO & REMCSUM */
 	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
-		NETIF_F_GSO_MPLS_BIT,
+		NETIF_F_GSO_TUNNEL_REMCSUM_BIT,
 
 	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
 	NETIF_F_SCTP_CSUM_BIT,		/* SCTP checksum offload */
@@ -119,6 +120,7 @@ enum {
 #define NETIF_F_GSO_UDP_TUNNEL	__NETIF_F(GSO_UDP_TUNNEL)
 #define NETIF_F_GSO_UDP_TUNNEL_CSUM __NETIF_F(GSO_UDP_TUNNEL_CSUM)
 #define NETIF_F_GSO_MPLS	__NETIF_F(GSO_MPLS)
+#define NETIF_F_GSO_TUNNEL_REMCSUM __NETIF_F(GSO_TUNNEL_REMCSUM)
 #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
 #define NETIF_F_HW_VLAN_STAG_RX	__NETIF_F(HW_VLAN_STAG_RX)
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c85e065..b2364f0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3583,6 +3583,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != (NETIF_F_GSO_UDP_TUNNEL >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_MPLS    != (NETIF_F_GSO_MPLS >> NETIF_F_GSO_SHIFT));
+	BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != (NETIF_F_GSO_TUNNEL_REMCSUM >> NETIF_F_GSO_SHIFT));
 
 	return (features & feature) == feature;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a59d934..a41e101 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -372,6 +372,7 @@ enum {
 
 	SKB_GSO_MPLS = 1 << 12,
 
+	SKB_GSO_TUNNEL_REMCSUM = 1 << 13,
 };
 
 #if BITS_PER_LONG > 32
@@ -595,7 +596,8 @@ struct sk_buff {
 #endif
 	__u8			ipvs_property:1;
 	__u8			inner_protocol_type:1;
-	/* 4 or 6 bit hole */
+	__u8			remcsum_offload:1;
+	/* 3 or 5 bit hole */
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e48e5c0..7001896 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3013,7 +3013,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		if (nskb->len == len + doffset)
 			goto perform_csum_check;
 
-		if (!sg) {
+		if (!sg && !nskb->remcsum_offload) {
 			nskb->ip_summed = CHECKSUM_NONE;
 			nskb->csum = skb_copy_and_csum_bits(head_skb, offset,
 							    skb_put(nskb, len),
@@ -3085,7 +3085,7 @@ skip_fraglist:
 		nskb->truesize += nskb->data_len;
 
 perform_csum_check:
-		if (!csum) {
+		if (!csum && !nskb->remcsum_offload) {
 			nskb->csum = skb_checksum(nskb, doffset,
 						  nskb->len - doffset, 0);
 			nskb->ip_summed = CHECKSUM_NONE;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8b7fe5b..ed2c672 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1222,6 +1222,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 		       SKB_GSO_TCPV6 |
 		       SKB_GSO_UDP_TUNNEL |
 		       SKB_GSO_UDP_TUNNEL_CSUM |
+		       SKB_GSO_TUNNEL_REMCSUM |
 		       SKB_GSO_MPLS |
 		       0)))
 		goto out;
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 5b90f2f..a1b2a56 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -97,6 +97,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 			       SKB_GSO_MPLS |
 			       SKB_GSO_UDP_TUNNEL |
 			       SKB_GSO_UDP_TUNNEL_CSUM |
+			       SKB_GSO_TUNNEL_REMCSUM |
 			       0) ||
 			     !(type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))))
 			goto out;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index a774711..0a5a70d 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -41,7 +41,8 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	unsigned int oldlen;
 	bool need_csum = !!(skb_shinfo(skb)->gso_type &
 			    SKB_GSO_UDP_TUNNEL_CSUM);
-	bool offload_csum = false, dont_encap = need_csum;
+	bool remcsum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TUNNEL_REMCSUM);
+	bool offload_csum = false, dont_encap = (need_csum || remcsum);
 
 	oldlen = (u16)~skb->len;
 
@@ -55,6 +56,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	skb->mac_len = skb_inner_network_offset(skb);
 	skb->protocol = new_protocol;
 	skb->encap_hdr_csum = need_csum;
+	skb->remcsum_offload = remcsum;
 
 	/* Try to offload checksum if possible */
 	offload_csum = !!(need_csum &&
@@ -108,11 +110,22 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 		uh->check = ~csum_fold((__force __wsum)
 				       ((__force u32)uh->check +
 					(__force u32)delta));
-
 		if (offload_csum) {
 			skb->ip_summed = CHECKSUM_PARTIAL;
 			skb->csum_start = skb_transport_header(skb) - skb->head;
 			skb->csum_offset = offsetof(struct udphdr, check);
+		} else if (remcsum) {
+			/* Need to calculate checksum from scratch,
+			 * inner checksums are never offloaded when doing
+			 * remote_checksum_offload.
+			 */
+
+			skb->csum = skb_checksum(skb, udp_offset,
+						 skb->len - udp_offset,
+						 0);
+			uh->check = csum_fold(skb->csum);
+			if (uh->check == 0)
+				uh->check = CSUM_MANGLED_0;
 		} else {
 			uh->check = gso_make_checksum(skb, ~uh->check);
 
@@ -192,6 +205,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 		if (unlikely(type & ~(SKB_GSO_UDP | SKB_GSO_DODGY |
 				      SKB_GSO_UDP_TUNNEL |
 				      SKB_GSO_UDP_TUNNEL_CSUM |
+				      SKB_GSO_TUNNEL_REMCSUM |
 				      SKB_GSO_IPIP |
 				      SKB_GSO_GRE | SKB_GSO_GRE_CSUM |
 				      SKB_GSO_MPLS) ||
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a071563..e976707 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -78,6 +78,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 		       SKB_GSO_SIT |
 		       SKB_GSO_UDP_TUNNEL |
 		       SKB_GSO_UDP_TUNNEL_CSUM |
+		       SKB_GSO_TUNNEL_REMCSUM |
 		       SKB_GSO_MPLS |
 		       SKB_GSO_TCPV6 |
 		       0)))
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 6b8f543..637ba2e 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -42,6 +42,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 				      SKB_GSO_DODGY |
 				      SKB_GSO_UDP_TUNNEL |
 				      SKB_GSO_UDP_TUNNEL_CSUM |
+				      SKB_GSO_TUNNEL_REMCSUM |
 				      SKB_GSO_GRE |
 				      SKB_GSO_GRE_CSUM |
 				      SKB_GSO_IPIP |
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH net-next 5/7] gue: Protocol constants for remote checksum offload
From: Tom Herbert @ 2014-11-01 22:58 UTC (permalink / raw)
  To: davem, netdev
In-Reply-To: <1414882683-25484-1-git-send-email-therbert@google.com>

Define a private flag for remote checksum offload as well as a length
for the option.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/gue.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/gue.h b/include/net/gue.h
index cb68ae8..3f28ec7 100644
--- a/include/net/gue.h
+++ b/include/net/gue.h
@@ -59,7 +59,10 @@ struct guehdr {
 
 /* Private flags in the private option extension */
 
-#define GUE_PFLAGS_ALL	(0)
+#define GUE_PFLAG_REMCSUM	htonl(1 << 31)
+#define GUE_PLEN_REMCSUM	4
+
+#define GUE_PFLAGS_ALL	(GUE_PFLAG_REMCSUM)
 
 /* Functions to compute options length corresponding to flags.
  * If we ever have a lot of flags this can be potentially be
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH net-next 6/7] gue: TX support for using remote checksum offload option
From: Tom Herbert @ 2014-11-01 22:58 UTC (permalink / raw)
  To: davem, netdev
In-Reply-To: <1414882683-25484-1-git-send-email-therbert@google.com>

Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure
remote checksum offload on an IP tunnel. Add logic in gue_build_header
to insert remote checksum offload option.
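
A minimal sketch of how the 4-byte option could be filled in, based on
my reading of the diff below: two 16-bit network-order fields, the
checksum start and the checksum field offset, both relative to the end
of the GUE header. sk_put_remcsum() and its parameters are hypothetical
names used only for illustration.

```c
#include <stdint.h>

/* Write the remote checksum offload option into opt[4].
 * csum_start:  where checksum coverage begins, measured from the start
 *              of the GUE header
 * csum_offset: position of the checksum field relative to csum_start
 * hdrlen:      GUE base header plus all options
 * Returns 0 on success, -1 if the start would fall inside the GUE header.
 */
static int sk_put_remcsum(uint8_t opt[4], uint16_t csum_start,
			  uint16_t csum_offset, uint16_t hdrlen)
{
	uint16_t start, offset;

	if (csum_start < hdrlen)
		return -1;

	/* Make both values relative to the end of the GUE header. */
	start = (uint16_t)(csum_start - hdrlen);
	offset = (uint16_t)(start + csum_offset);

	/* Store both fields in network (big-endian) byte order. */
	opt[0] = (uint8_t)(start >> 8);
	opt[1] = (uint8_t)(start & 0xff);
	opt[2] = (uint8_t)(offset >> 8);
	opt[3] = (uint8_t)(offset & 0xff);

	return 0;
}
```

For example, with a 12-byte GUE header (base, private flags, option)
followed by a 20-byte inner IPv4 header and TCP, the start is 20 and the
offset is 36.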

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/fou.h              |  5 ++++-
 include/uapi/linux/if_tunnel.h |  1 +
 net/ipv4/fou.c                 | 35 ++++++++++++++++++++++++++++++++---
 3 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/include/net/fou.h b/include/net/fou.h
index d2d8055..25b26ff 100644
--- a/include/net/fou.h
+++ b/include/net/fou.h
@@ -25,7 +25,10 @@ static size_t gue_encap_hlen(struct ip_tunnel_encap *e)
 
 	len = sizeof(struct udphdr) + sizeof(struct guehdr);
 
-	/* Add in lengths flags */
+	if (e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) {
+		len += GUE_PLEN_REMCSUM;
+		need_priv = true;
+	}
 
 	len += need_priv ? GUE_LEN_PRIV : 0;
 
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 280d9e0..bd3cc11 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -69,6 +69,7 @@ enum tunnel_encap_types {
 
 #define TUNNEL_ENCAP_FLAG_CSUM		(1<<0)
 #define TUNNEL_ENCAP_FLAG_CSUM6		(1<<1)
+#define TUNNEL_ENCAP_FLAG_REMCSUM	(1<<2)
 
 /* SIT-mode i_flags */
 #define	SIT_ISATAP	0x0001
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index a3b8c5b..fb0db99 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -562,11 +562,19 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
 	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
 	struct guehdr *guehdr;
-	size_t optlen = 0;
+	size_t hdrlen, optlen = 0;
 	__be16 sport;
 	void *data;
 	bool need_priv = false;
 
+	if ((e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) &&
+	    skb->ip_summed == CHECKSUM_PARTIAL) {
+		csum = false;
+		optlen += GUE_PLEN_REMCSUM;
+		type |= SKB_GSO_TUNNEL_REMCSUM;
+		need_priv = true;
+	}
+
 	optlen += need_priv ? GUE_LEN_PRIV : 0;
 
 	skb = iptunnel_handle_offloads(skb, csum, type);
@@ -578,7 +586,9 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
 					       skb, 0, 0, false);
 
-	skb_push(skb, sizeof(struct guehdr) + optlen);
+	hdrlen = sizeof(struct guehdr) + optlen;
+
+	skb_push(skb, hdrlen);
 
 	guehdr = (struct guehdr *)skb->data;
 
@@ -597,7 +607,26 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 		*flags = 0;
 		data += GUE_LEN_PRIV;
 
-		/* Add private flags */
+		if (type & SKB_GSO_TUNNEL_REMCSUM) {
+			u16 csum_start = skb_checksum_start_offset(skb);
+			__be16 *pd = data;
+
+			if (csum_start < hdrlen)
+				return -EINVAL;
+
+			csum_start -= hdrlen;
+			pd[0] = htons(csum_start);
+			pd[1] = htons(csum_start + skb->csum_offset);
+
+			if (!skb_is_gso(skb)) {
+				skb->ip_summed = CHECKSUM_NONE;
+				skb->encapsulation = 0;
+			}
+
+			*flags |= GUE_PFLAG_REMCSUM;
+			data += GUE_PLEN_REMCSUM;
+		}
+
 	}
 
 	fou_build_udp(skb, e, fl4, protocol, sport);
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH net-next 7/7] gue: Receive side of remote checksum offload
From: Tom Herbert @ 2014-11-01 22:58 UTC (permalink / raw)
  To: davem, netdev
In-Reply-To: <1414882683-25484-1-git-send-email-therbert@google.com>

Add processing of the remote checksum offload option in both the normal
path as well as the GRO path. This implements patching the affected
checksum to derive the offloaded checksum.
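
The arithmetic behind the patching step can be sketched in userspace as
follows. This is only an illustration under stated assumptions, not the
kernel implementation: the sk_ helpers are hypothetical, the total sum
of the buffer is assumed to be known already (as with a complete
checksum from the NIC), and start is assumed to be even.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t sk_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Uninverted one's-complement sum of len bytes (big-endian words). */
static uint16_t sk_sum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sk_fold(sum);
}

/* a - b in one's-complement arithmetic: add the complement of b. */
static uint16_t sk_sub(uint16_t a, uint16_t b)
{
	return sk_fold((uint32_t)a + (uint16_t)~b);
}

/* Store the inverted sum of pkt[start..len) in the 16-bit field at
 * offset. total is the known sum of pkt[0..len). Instead of summing
 * the covered region again, subtract the sum of the bytes before start.
 * Returns 0 on success, -1 if start or offset is out of range.
 */
static int sk_remcsum_patch(uint8_t *pkt, size_t len, uint16_t total,
			    size_t start, size_t offset)
{
	uint16_t csum;

	if ((start & 1) || start > len || offset + 2 > len)
		return -1;

	csum = (uint16_t)~sk_sub(total, sk_sum(pkt, start));
	pkt[offset] = (uint8_t)(csum >> 8);
	pkt[offset + 1] = (uint8_t)(csum & 0xff);
	return 0;
}
```

After patching, the one's-complement sum over the covered region comes
out as 0xffff, which is what a receiver verifying the inner checksum
expects. In a real packet the field would initially hold the
pseudo-header sum rather than zero.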

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/fou.c | 170 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 161 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index fb0db99..740ae09 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -63,6 +63,59 @@ static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
 	return -fou->protocol;
 }
 
+static struct guehdr *gue_remcsum(struct sk_buff *skb, struct guehdr *guehdr,
+				  void *data, int hdrlen, u8 ipproto)
+{
+	__be16 *pd = data;
+	u16 start = ntohs(pd[0]);
+	u16 offset = ntohs(pd[1]);
+	u16 poffset = 0;
+	u16 plen;
+	__wsum csum, delta;
+	__sum16 *psum;
+
+	if (skb->remcsum_offload) {
+		/* Already processed in GRO path */
+		skb->remcsum_offload = 0;
+		return guehdr;
+	}
+
+	if (start > skb->len - hdrlen ||
+	    offset > skb->len - hdrlen - sizeof(u16))
+		return NULL;
+
+	if (unlikely(skb->ip_summed != CHECKSUM_COMPLETE))
+		__skb_checksum_complete(skb);
+
+	plen = hdrlen + offset + sizeof(u16);
+	if (!pskb_may_pull(skb, plen))
+		return NULL;
+	guehdr = (struct guehdr *)&udp_hdr(skb)[1];
+
+	if (ipproto == IPPROTO_IP && sizeof(struct iphdr) < plen) {
+		struct iphdr *ip = (struct iphdr *)(skb->data + hdrlen);
+
+		/* If next header happens to be IP we can skip that for the
+		 * checksum calculation since the IP header checksum is zero
+		 * if correct.
+		 */
+		poffset = ip->ihl * 4;
+	}
+
+	csum = csum_sub(skb->csum, skb_checksum(skb, poffset + hdrlen,
+						start - poffset - hdrlen, 0));
+
+	/* Set derived checksum in packet */
+	psum = (__sum16 *)(skb->data + hdrlen + offset);
+	delta = csum_sub(csum_fold(csum), *psum);
+	*psum = csum_fold(csum);
+
+	/* Adjust skb->csum since we changed the packet */
+	skb->csum = csum_add(skb->csum, delta);
+
+	return guehdr;
+}
+
 static int gue_control_message(struct sk_buff *skb, struct guehdr *guehdr)
 {
 	/* No support yet */
@@ -76,6 +129,7 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 	size_t len, optlen, hdrlen;
 	struct guehdr *guehdr;
 	void *data;
+	u16 doffset = 0;
 
 	if (!fou)
 		return 1;
@@ -100,20 +154,43 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 	if (guehdr->version != 0 || validate_gue_flags(guehdr, optlen))
 		goto drop;
 
-	/* Pull UDP and GUE headers */
-	fou_recv_pull(skb, len);
+	hdrlen = sizeof(struct guehdr) + optlen;
+
+	ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len);
+
+	/* Pull UDP header now, skb->data points to guehdr */
+	__skb_pull(skb, sizeof(struct udphdr));
+
+	/* Pull csum through the guehdr now. This can be used if
+	 * there is a remote checksum offload.
+	 */
+	skb_postpull_rcsum(skb, udp_hdr(skb), len);
 
 	data = &guehdr[1];
 
 	if (guehdr->flags & GUE_FLAG_PRIV) {
-		data += GUE_LEN_PRIV;
+		__be32 flags = *(__be32 *)(data + doffset);
+
+		doffset += GUE_LEN_PRIV;
 
-		/* Process private flags */
+		if (flags & GUE_PFLAG_REMCSUM) {
+			guehdr = gue_remcsum(skb, guehdr, data + doffset,
+					     hdrlen, guehdr->proto_ctype);
+			if (!guehdr)
+				goto drop;
+
+			data = &guehdr[1];
+
+			doffset += GUE_PLEN_REMCSUM;
+		}
 	}
 
 	if (unlikely(guehdr->control))
 		return gue_control_message(skb, guehdr);
 
+	__skb_pull(skb, hdrlen);
+	skb_reset_transport_header(skb);
+
 	return -guehdr->proto_ctype;
 
 drop:
@@ -164,6 +241,66 @@ out_unlock:
 	return err;
 }
 
+static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
+				      struct guehdr *guehdr, void *data,
+				      size_t hdrlen, u8 ipproto)
+{
+	__be16 *pd = data;
+	u16 start = ntohs(pd[0]);
+	u16 offset = ntohs(pd[1]);
+	u16 poffset = 0;
+	u16 plen;
+	void *ptr;
+	__wsum csum, delta;
+	__sum16 *psum;
+
+	if (skb->remcsum_offload)
+		return guehdr;
+
+	if (start > skb_gro_len(skb) - hdrlen ||
+	    offset > skb_gro_len(skb) - hdrlen - sizeof(u16) ||
+	    !NAPI_GRO_CB(skb)->csum_valid || skb->remcsum_offload)
+		return NULL;
+
+	plen = hdrlen + offset + sizeof(u16);
+
+	/* Pull checksum that will be written */
+	if (skb_gro_header_hard(skb, off + plen)) {
+		guehdr = skb_gro_header_slow(skb, off + plen, off);
+		if (!guehdr)
+			return NULL;
+	}
+
+	ptr = (void *)guehdr + hdrlen;
+
+	if (ipproto == IPPROTO_IP &&
+	    (hdrlen + sizeof(struct iphdr) < plen)) {
+		struct iphdr *ip = (struct iphdr *)(ptr + hdrlen);
+
+		/* If next header happens to be IP we can skip
+		 * that for the checksum calculation since the
+		 * IP header checksum is zero if correct.
+		 */
+		poffset = ip->ihl * 4;
+	}
+
+	csum = csum_sub(NAPI_GRO_CB(skb)->csum,
+			csum_partial(ptr + poffset, start - poffset, 0));
+
+	/* Set derived checksum in packet */
+	psum = (__sum16 *)(ptr + offset);
+	delta = csum_sub(csum_fold(csum), *psum);
+	*psum = csum_fold(csum);
+
+	/* Adjust skb->csum since we changed the packet */
+	skb->csum = csum_add(skb->csum, delta);
+	NAPI_GRO_CB(skb)->csum = csum_add(NAPI_GRO_CB(skb)->csum, delta);
+
+	skb->remcsum_offload = 1;
+
+	return guehdr;
+}
+
 static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 					struct sk_buff *skb)
 {
@@ -174,6 +311,7 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 	struct guehdr *guehdr;
 	size_t len, optlen, hdrlen, off;
 	void *data;
+	u16 doffset = 0;
 	int flush = 1;
 
 	off = skb_gro_offset(skb);
@@ -201,19 +339,33 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 
 	hdrlen = sizeof(*guehdr) + optlen;
 
-	skb_gro_pull(skb, hdrlen);
-
-	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
+	/* Adjust NAPI_GRO_CB(skb)->csum to account for guehdr,
+	 * this is needed if there is a remote checksum offload.
+	 */
 	skb_gro_postpull_rcsum(skb, guehdr, hdrlen);
 
 	data = &guehdr[1];
 
 	if (guehdr->flags & GUE_FLAG_PRIV) {
-		data += GUE_LEN_PRIV;
+		__be32 flags = *(__be32 *)(data + doffset);
 
-		/* Process private flags */
+		doffset += GUE_LEN_PRIV;
+
+		if (flags & GUE_PFLAG_REMCSUM) {
+			guehdr = gue_gro_remcsum(skb, off, guehdr,
+						 data + doffset, hdrlen,
+						 guehdr->proto_ctype);
+			if (!guehdr)
+				goto out;
+
+			data = &guehdr[1];
+
+			doffset += GUE_PLEN_REMCSUM;
+		}
 	}
 
+	skb_gro_pull(skb, hdrlen);
+
 	flush = 0;
 
 	for (p = *head; p; p = p->next) {
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH bluetooth-next] netdevice: add ieee802154_ptr to net_device
From: Alexander Aring @ 2014-11-02  5:44 UTC (permalink / raw)
  To: linux-wpan
  Cc: kernel, netdev, linux-wireless, Alexander Aring, David S. Miller

This patch adds an ieee802154_ptr to the net_device structure.
The 802.15.4 subsystem will also introduce an nl802154 framework, similar
to the nl80211 framework, and a wpan_dev structure. The wpan_dev structure
will hold additional net_device attributes that are 802.15.4 specific, such
as address options. The upcoming nl802154 implementation will introduce
NL802154_FLAG_NEED_WPAN_DEV, analogous to NL80211_FLAG_NEED_WDEV, and this
flag needs an ieee802154_ptr in net_device. In addition, upper layers such
as IEEE 802.15.4 6LoWPAN can then access the wpan_dev attributes easily.
The current solution is a complicated callback interface that fetches these
values through the subif data structure in mac802154.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
---
Another option would be to combine ieee80211_ptr and ieee802154_ptr in
a union. The two pointers can never be in use at the same time, and the
union would not make struct net_device any bigger.
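
To make the size argument concrete, a standalone sketch (not the real
struct net_device; the sk_ types are stand-ins, and the anonymous union
assumes a C11 or GNU C compiler):

```c
#include <stddef.h>

struct sk_wireless_dev;	/* opaque stand-in for struct wireless_dev */
struct sk_wpan_dev;	/* opaque stand-in for struct wpan_dev */

/* Layout as in this patch: one extra pointer per device. */
struct sk_netdev_separate {
	void			*ax25_ptr;
	struct sk_wireless_dev	*ieee80211_ptr;
	struct sk_wpan_dev	*ieee802154_ptr;
};

/* Alternative: both pointers share one slot, since a device is
 * either 802.11 or 802.15.4 but never both.
 */
struct sk_netdev_union {
	void			*ax25_ptr;
	union {
		struct sk_wireless_dev	*ieee80211_ptr;
		struct sk_wpan_dev	*ieee802154_ptr;
	};
};
```

The trade-off is that code must know which member is valid before
dereferencing it, for example by checking the device type first.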

My working repository is bluetooth-next, and Marcel will apply all 802.15.4
changes, so this patch should go into bluetooth-next. After that I can
send new patches that depend on this one to introduce wpan_dev and
nl802154.

 include/linux/netdevice.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74fd5d3..c9bcf33 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -57,6 +57,8 @@ struct device;
 struct phy_device;
 /* 802.11 specific */
 struct wireless_dev;
+/* 802.15.4 specific */
+struct wpan_dev;

 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -1572,6 +1574,7 @@ struct net_device {
 	struct inet6_dev __rcu	*ip6_ptr;
 	void			*ax25_ptr;
 	struct wireless_dev	*ieee80211_ptr;
+	struct wpan_dev		*ieee802154_ptr;

 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
-- 
2.1.3

^ permalink raw reply related

* Re: [PATCH]  net: mvpp2: fix possible memory leak
From: Sudip Mukherjee @ 2014-11-02  6:19 UTC (permalink / raw)
  To: Thomas Petazzoni; +Cc: David S. Miller, netdev, linux-kernel, Marcin Wojtas
In-Reply-To: <20141101232445.1a3fe27e@free-electrons.com>

On Sat, Nov 01, 2014 at 11:24:45PM +0100, Thomas Petazzoni wrote:
> Dear Sudip Mukherjee,
> 
> On Sat,  1 Nov 2014 16:59:34 +0530, Sudip Mukherjee wrote:
> > We allocate memory using kzalloc for struct mvpp2_prs_entry,
> > but later, on error, we just returned the error
> > value without releasing the memory.
> > 
> > Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
> > ---
> > 
> > hi,
> > I could not build-test this after modifying it. I tried to compile using
> > multi_v7_defconfig, but the cross compiler I have is not able to
> > compile it and gives several warnings from the assembler.
> 
> That seems weird. Which compiler are you using, and which errors were
> you getting?
> 
> In any case, it would have been good to Cc the authors of the driver.
Yes, I should have. Cc'ing now, better late than never.
I am using gcc version 4.3.2 (Sourcery G++ Lite 2008q3-72).

thanks
sudip

> 
> Thanks!
> 
> Thomas
> -- 
> Thomas Petazzoni, CTO, Free Electrons
> Embedded Linux, Kernel and Android engineering
> http://free-electrons.com

^ permalink raw reply

* IEEE 802.15.4 6LoWPAN need to change netdev type UAPI - How we can do it right now?
From: Alexander Aring @ 2014-11-02 12:41 UTC (permalink / raw)
  To: netdev; +Cc: linux-wpan, linux-bluetooth

Hi,

IEEE 802.15.4 with 6LoWPAN has a big problem. We have two interfaces:
a "wpan" interface, which belongs to the IEEE 802.15.4 subsystem, and a
"lowpan" interface for the IEEE 802.15.4 6LoWPAN layer.

The problem is that "wpan" and "lowpan" interfaces use the same
ARPHRD type, "ARPHRD_IEEE802154".

In kernelspace we can't tell whether we are handling a "wpan" interface
or a "lowpan" interface, and I know of two problems that result.

These are:

1. Freeing resources

If we create a "wpan" interface we allocate some private data, etc.,
which the "lowpan" interface doesn't allocate. If we free/unregister a
"wpan" interface over netlink we free these private resources. Now if
we make the same netlink call with a "lowpan" interface, we try to free
the same resources. Of course this will fail, because the "wpan"
resources were never allocated for it. We can't tell at the netlink
interface whether it's a "wpan" or a "lowpan" interface. For a "lowpan"
interface we could return -EINVAL, but we can't make that distinction.

A possible hack would be to remember the ifindex of all registered
"wpan" interfaces and check whether it matches. I don't know if this
solution would be 100% safe.

2. Confusing userspace applications

Userspace applications also can't distinguish between "wpan" and
"lowpan" interfaces. Currently applications like "wireshark" decode all
"packets" on "lowpan" as IEEE 802.15.4 frames by default. The correct
default would be to decode them as IPv6 packets. Changing wireshark to
decode ARPHRD_IEEE802154 as IPv6 by default would give "wpan"
interfaces the wrong default decoding.

A possible hack here would be to try to create an IPv6 socket: if it
fails it's a wpan interface, if it succeeds we have a lowpan interface.

In my opinion we need to change this behaviour, but it's a UAPI change
and I want to get it right the first time. One possible type we could
change to is "ARPHRD_6LOWPAN", which is also used by bluetooth.

Two solutions:

- Changing type to "ARPHRD_6LOWPAN":

Furthermore we need to make "small" runtime decisions in IPv6.
ARPHRD_6LOWPAN is used by bluetooth and possibly by IEEE 802.15.4.
These "small" runtime decisions need L2 information from bluetooth or
IEEE 802.15.4. If we change to "ARPHRD_6LOWPAN" now, we have a very
similar issue: we can't distinguish between a 6LoWPAN bluetooth
interface and a 6LoWPAN 802.15.4 interface. In userspace this should
make no difference. It only matters for distinguishing interface types
in the upper layers inside kernelspace.

A possible solution could be to introduce an ARPHRD_SUBTYPE and place it
at the beginning of the netdev_priv(dev) structure. This structure could
look like:

struct lowpan_netdev_priv {
	/* subtype of ARPHRD_6LOWPAN */
	enum lowpan_subtype subtype;
	/* private data of L2 subtype */
	void *priv;
};

In upper layers like IPv6 we could first check whether it's an
ARPHRD_6LOWPAN type. After that, if we really need different L2
handling, we can check the lowpan_subtype subtype placed in netdev_priv.

This solution requires that all ARPHRD_6LOWPAN interfaces use the
lowpan_netdev_priv structure in netdev_priv. I am also not 100% sure
whether we need such information in userspace; maybe we can also
introduce a netlink type for getting the ARPHRD_6LOWPAN subtype.
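
A minimal userspace sketch of the two-step check described above, not
kernel code. The struct mock_netdev type, the MOCK_ARPHRD_* values, the
LOWPAN_SUBTYPE_* names and the lowpan_is_subtype() helper are all
assumptions made up for illustration; only struct lowpan_netdev_priv
follows the layout proposed in this mail.

```c
#include <stddef.h>

/* Mock device types for illustration; the kernel defines the real ARPHRD_* values. */
enum { MOCK_ARPHRD_IEEE802154 = 1, MOCK_ARPHRD_6LOWPAN = 2 };

/* Hypothetical subtype names, not an existing kernel enum. */
enum lowpan_subtype {
	LOWPAN_SUBTYPE_IEEE802154,
	LOWPAN_SUBTYPE_BTLE,
};

/* Layout as proposed above: the subtype comes first. */
struct lowpan_netdev_priv {
	/* subtype of ARPHRD_6LOWPAN */
	enum lowpan_subtype subtype;
	/* private data of L2 subtype */
	void *priv;
};

/* Stand-in for struct net_device; priv plays the role of netdev_priv(dev). */
struct mock_netdev {
	unsigned short type;
	void *priv;
};

/* Returns 1 only if dev is a 6LoWPAN device of the given L2 subtype. */
static int lowpan_is_subtype(const struct mock_netdev *dev,
			     enum lowpan_subtype subtype)
{
	const struct lowpan_netdev_priv *lp;

	/* Step 1: only 6LoWPAN devices carry a lowpan_netdev_priv. */
	if (dev->type != MOCK_ARPHRD_6LOWPAN)
		return 0;

	/* Step 2: look at the subtype stored at the start of the priv area. */
	lp = dev->priv;
	return lp->subtype == subtype;
}
```

The type check has to come first: reading the subtype from a device
that is not ARPHRD_6LOWPAN would interpret unrelated private data.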

- Introducing a new ARPHRD type "ARPHRD_6LOWPAN_IEEE802154":

Just add another type without a complicated subtype mechanism.

I hope it's clear that I run into several issues because "wpan" and
"lowpan" use the same ARPHRD. This code is already 4 years old and
there is already userspace software that checks for
ARPHRD_IEEE802154 on lowpan interfaces. I need some help with how we can
deal with this now: do we "just change the dev->type"? If yes, to which type?

- Alex

^ permalink raw reply

* [PATCH] net: shrink struct softnet_data
From: Eric Dumazet @ 2014-11-02 14:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

flow_limit in struct softnet_data is only read from the local cpu
and can be moved to fill a hole, reducing softnet_data size by
64 bytes on x86_64.

While we are at it, move output_queue, output_queue_tailp and
completion_queue, so that rx / tx paths touch a single cache line.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
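
A simplified sketch of why moving one pointer can save a whole cache
line. The structs below are stand-ins, not the real softnet_data
layout; the 64-byte CACHELINE value and the member names stats and
shared are assumptions for illustration. A member placed after a
cacheline-aligned section forces the struct to grow by a full cache
line, while the same member fits in the padding before that section.

```c
#include <stddef.h>

#define CACHELINE 64	/* assumed cache line size */

/* Pointer placed after the cacheline-aligned section. */
struct before {
	unsigned int stats[3];
	/* padding up to the next cache line boundary */
	_Alignas(CACHELINE) char shared[CACHELINE];
	void *flow_limit;	/* starts a new cache line on its own */
};

/* Same members, pointer moved into the padding hole. */
struct after {
	unsigned int stats[3];
	void *flow_limit;	/* fits in the hole before 'shared' */
	_Alignas(CACHELINE) char shared[CACHELINE];
};
```

Because the struct size must be a multiple of its alignment, the
trailing pointer in struct before costs a full CACHELINE bytes rather
than just sizeof(void *).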
 include/linux/netdevice.h |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c85e065122460e9f077bcb6788018be38e1d7ddf..5ed05bd764dcf3699afdd9a7b17600246de22d1d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2329,10 +2329,7 @@ extern int netdev_flow_limit_table_len;
  * Incoming packets are placed on per-cpu queues
  */
 struct softnet_data {
-	struct Qdisc		*output_queue;
-	struct Qdisc		**output_queue_tailp;
 	struct list_head	poll_list;
-	struct sk_buff		*completion_queue;
 	struct sk_buff_head	process_queue;
 
 	/* stats */
@@ -2340,10 +2337,17 @@ struct softnet_data {
 	unsigned int		time_squeeze;
 	unsigned int		cpu_collision;
 	unsigned int		received_rps;
-
 #ifdef CONFIG_RPS
 	struct softnet_data	*rps_ipi_list;
+#endif
+#ifdef CONFIG_NET_FLOW_LIMIT
+	struct sd_flow_limit __rcu *flow_limit;
+#endif
+	struct Qdisc		*output_queue;
+	struct Qdisc		**output_queue_tailp;
+	struct sk_buff		*completion_queue;
 
+#ifdef CONFIG_RPS
 	/* Elements below can be accessed between CPUs for RPS */
 	struct call_single_data	csd ____cacheline_aligned_in_smp;
 	struct softnet_data	*rps_ipi_next;
@@ -2355,9 +2359,6 @@ struct softnet_data {
 	struct sk_buff_head	input_pkt_queue;
 	struct napi_struct	backlog;
 
-#ifdef CONFIG_NET_FLOW_LIMIT
-	struct sd_flow_limit __rcu *flow_limit;
-#endif
 };
 
 static inline void input_queue_head_incr(struct softnet_data *sd)

^ permalink raw reply related

* Re: [PATCH net-next 5/8] net/mlx4_en: Remove redundant code from RX/GRO path
From: Or Gerlitz @ 2014-11-02 14:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Linux Netdev List, Matan Barak, Amir Vadai,
	Saeed Mahameed, Shani Michaeli, Ido Shamay
In-Reply-To: <1414770362.27538.7.camel@edumazet-glaptop2.roam.corp.google.com>

On 10/31/2014 5:46 PM, Eric Dumazet wrote:
> On Fri, 2014-10-31 at 16:00 +0200, Or Gerlitz wrote:
>> On Fri, Oct 31, 2014 at 5:19 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Fri, 2014-10-31 at 01:25 +0200, Or Gerlitz wrote:
>>>> On Thu, Oct 30, 2014 at 9:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>> On Thu, 2014-10-30 at 18:06 +0200, Or Gerlitz wrote:
>>>>>> Remove the code which goes through napi_gro_frags() on the RX path,
>>>>>> use only napi_gro_receive().
>>>>> Hmpff... napi_gro_frags() should be faster. Have you benchmarked this ?
>>>>
>>>> yep we did, napi_gro_frags() was somehow better for single stream. Do
>>>> you think we need to do it the other way around, e.g converge to use napi_gro_frags()?
>>> napi_gro_frags() is faster because the napi->skb is reused fast (not
>>> going through kfree_skb()/alloc_skb() for every fragment)
>> I see. Is this a strong vote to convert the code to use napi_gro_frags
>> on its usual track?
> I don't know yet. In some cases, actually slowing down the rx path can
> help by building bigger GRO packets. But instead of inserting delays,
> we can simply force napi to be run another time, with a nanosec based
> timer.
>
> I've tested this kind of heuristic :
>
>         /* If some packets are waiting in GRO engine and timeout is not expired,
>          * reschedule a NAPI poll. We allow servicing other softirqs
>          * before repoll, we do not rearm CQ.
>          */
>         if (rx_nsecs && napi->gro_list && !need_resched()) {
>                 u64 now = local_clock();
>                 unsigned long flags;
>
>                 /* If we got packets in this round, restart timeout */
>                 if (done)
>                         cq->tstart = now;
>                 else if (now - cq->tstart >= (u64)rx_nsecs)
>                         goto complete;
>
>                 /* Since we might need one skb very soon, build it now */
>                 napi_get_frags(napi);
>
>                 local_irq_save(flags);
>                 list_del(&napi->poll_list);
>                 __napi_schedule_irqoff(napi);
>                 local_irq_restore(flags);
>
>          } else {
> complete:
>                  napi_complete(napi);
>                  mlx4_en_arm_cq(priv, cq);
>          }
> 	return done;

Hi Eric,

For the time being, I'll drop from this series this change and the
following ones which depend on it. So you can pick up the earlier patches
of the series, and we will investigate in parallel the various options
w.r.t. GRO here.

Or.

^ permalink raw reply

* [PATCH] net: less interrupt masking in NAPI
From: Eric Dumazet @ 2014-11-02 14:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Willem de Bruijn

From: Eric Dumazet <edumazet@google.com>

net_rx_action() can mask irqs a single time to transfer sd->poll_list
into a private list, for a very short duration.

Then, napi_complete() can avoid masking irqs again,
and net_rx_action() only needs to mask irq again in slow path.

This patch removes 2 pairs of irq mask/unmask per typical NAPI run,
more if multiple napi were triggered.

Note this also allows giving control back to the caller (do_softirq())
more often, so that other softirq handlers can be called a bit earlier,
or ksoftirqd can be woken up earlier under pressure.

This was developed while testing an alternative to RX interrupt
mitigation to reduce latencies while keeping or improving GRO
aggregation on fast NIC.

The idea is to test napi->gro_list at the end of a napi->poll() and
reschedule one NAPI poll, but after servicing a full round of
softirqs (timers, TX, rcu, ...). This will be allowed only if the softirq
is currently serviced by the idle task or ksoftirqd, and resched is not
needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
---
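
A userspace sketch of the splice-to-private-list pattern described in
the changelog, not the kernel implementation. The mini_list_* helpers
are a hand-rolled stand-in for the kernel list API, and
mock_irq_disable()/mock_irq_enable(), struct work and drain() are
hypothetical names used only to count how often the shared list has to
be protected.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void mini_list_init(struct list_head *h) { h->next = h->prev = h; }
static int mini_list_empty(const struct list_head *h) { return h->next == h; }

static void mini_list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void mini_list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	mini_list_init(n);
}

/* Move every entry of src to the front of dst and leave src empty. */
static void mini_list_splice_init(struct list_head *src, struct list_head *dst)
{
	if (mini_list_empty(src))
		return;
	src->next->prev = dst;
	src->prev->next = dst->next;
	dst->next->prev = src->prev;
	dst->next = src->next;
	mini_list_init(src);
}

static int mask_count;	/* number of mock irq mask/unmask pairs */
static void mock_irq_disable(void) { mask_count++; }
static void mock_irq_enable(void) { }

struct work {
	struct list_head node;	/* must stay the first member, see cast below */
	int done;
};

/* Drain 'shared' by moving it to a private list under a single mask. */
static int drain(struct list_head *shared)
{
	struct list_head private;
	int processed = 0;

	mini_list_init(&private);

	mock_irq_disable();
	mini_list_splice_init(shared, &private);
	mock_irq_enable();

	/* The private list is only visible here, so no masking is needed. */
	while (!mini_list_empty(&private)) {
		struct list_head *n = private.next;

		mini_list_del_init(n);
		((struct work *)n)->done = 1;
		processed++;
	}
	return processed;
}
```

Anything queued on the shared list while the private list is being
processed is simply picked up by the next drain() call.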
 net/core/dev.c |   68 +++++++++++++++++++++++++++++------------------
 1 file changed, 43 insertions(+), 25 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ebf778df58cd..40be481268de 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4316,20 +4316,28 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
 		local_irq_enable();
 }
 
+static bool sd_has_rps_ipi_waiting(struct softnet_data *sd)
+{
+#ifdef CONFIG_RPS
+	return sd->rps_ipi_list != NULL;
+#else
+	return false;
+#endif
+}
+
 static int process_backlog(struct napi_struct *napi, int quota)
 {
 	int work = 0;
 	struct softnet_data *sd = container_of(napi, struct softnet_data, backlog);
 
-#ifdef CONFIG_RPS
 	/* Check if we have pending ipi, its better to send them now,
 	 * not waiting net_rx_action() end.
 	 */
-	if (sd->rps_ipi_list) {
+	if (sd_has_rps_ipi_waiting(sd)) {
 		local_irq_disable();
 		net_rps_action_and_irq_enable(sd);
 	}
-#endif
+
 	napi->weight = weight_p;
 	local_irq_disable();
 	while (1) {
@@ -4356,7 +4364,6 @@ static int process_backlog(struct napi_struct *napi, int quota)
 			 * We can use a plain write instead of clear_bit(),
 			 * and we dont need an smp_mb() memory barrier.
 			 */
-			list_del(&napi->poll_list);
 			napi->state = 0;
 			rps_unlock(sd);
 
@@ -4406,7 +4413,7 @@ void __napi_complete(struct napi_struct *n)
 	BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
 	BUG_ON(n->gro_list);
 
-	list_del(&n->poll_list);
+	list_del_init(&n->poll_list);
 	smp_mb__before_atomic();
 	clear_bit(NAPI_STATE_SCHED, &n->state);
 }
@@ -4424,9 +4431,15 @@ void napi_complete(struct napi_struct *n)
 		return;
 
 	napi_gro_flush(n, false);
-	local_irq_save(flags);
-	__napi_complete(n);
-	local_irq_restore(flags);
+
+	if (likely(list_empty(&n->poll_list))) {
+		WARN_ON_ONCE(!test_and_clear_bit(NAPI_STATE_SCHED, &n->state));
+	} else {
+		/* If n->poll_list is not empty, we need to mask irqs */
+		local_irq_save(flags);
+		__napi_complete(n);
+		local_irq_restore(flags);
+	}
 }
 EXPORT_SYMBOL(napi_complete);
 
@@ -4520,29 +4533,28 @@ static void net_rx_action(struct softirq_action *h)
 	struct softnet_data *sd = this_cpu_ptr(&softnet_data);
 	unsigned long time_limit = jiffies + 2;
 	int budget = netdev_budget;
+	LIST_HEAD(list);
+	LIST_HEAD(repoll);
 	void *have;
 
 	local_irq_disable();
+	list_splice_init(&sd->poll_list, &list);
+	local_irq_enable();
 
-	while (!list_empty(&sd->poll_list)) {
+	while (!list_empty(&list)) {
 		struct napi_struct *n;
 		int work, weight;
 
-		/* If softirq window is exhuasted then punt.
+		/* If softirq window is exhausted then punt.
 		 * Allow this to run for 2 jiffies since which will allow
 		 * an average latency of 1.5/HZ.
 		 */
 		if (unlikely(budget <= 0 || time_after_eq(jiffies, time_limit)))
 			goto softnet_break;
 
-		local_irq_enable();
 
-		/* Even though interrupts have been re-enabled, this
-		 * access is safe because interrupts can only add new
-		 * entries to the tail of this list, and only ->poll()
-		 * calls can remove this head entry from the list.
-		 */
-		n = list_first_entry(&sd->poll_list, struct napi_struct, poll_list);
+		n = list_first_entry(&list, struct napi_struct, poll_list);
+		list_del_init(&n->poll_list);
 
 		have = netpoll_poll_lock(n);
 
@@ -4564,8 +4576,6 @@ static void net_rx_action(struct softirq_action *h)
 
 		budget -= work;
 
-		local_irq_disable();
-
 		/* Drivers must not modify the NAPI state if they
 		 * consume the entire weight.  In such cases this code
 		 * still "owns" the NAPI instance and therefore can
@@ -4573,32 +4583,40 @@ static void net_rx_action(struct softirq_action *h)
 		 */
 		if (unlikely(work == weight)) {
 			if (unlikely(napi_disable_pending(n))) {
-				local_irq_enable();
 				napi_complete(n);
-				local_irq_disable();
 			} else {
 				if (n->gro_list) {
 					/* flush too old packets
 					 * If HZ < 1000, flush all packets.
 					 */
-					local_irq_enable();
 					napi_gro_flush(n, HZ >= 1000);
-					local_irq_disable();
 				}
-				list_move_tail(&n->poll_list, &sd->poll_list);
+				list_add_tail(&n->poll_list, &repoll);
 			}
 		}
 
 		netpoll_poll_unlock(have);
 	}
+
+	if (!sd_has_rps_ipi_waiting(sd) &&
+	    list_empty(&list) &&
+	    list_empty(&repoll))
+		return;
 out:
+	local_irq_disable();
+
+	list_splice_tail_init(&sd->poll_list, &list);
+	list_splice_tail(&repoll, &list);
+	list_splice(&list, &sd->poll_list);
+	if (!list_empty(&sd->poll_list))
+		__raise_softirq_irqoff(NET_RX_SOFTIRQ);
+
 	net_rps_action_and_irq_enable(sd);
 
 	return;
 
 softnet_break:
 	sd->time_squeeze++;
-	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 	goto out;
 }
 

^ permalink raw reply related

* [PATCH V1 net-next 0/5] Mellanox ethernet driver update Oct-30-2014
From: Or Gerlitz @ 2014-11-02 14:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
	Ido Shamay, Or Gerlitz

Hi Dave,

The 1st patch from Saeed fixes a bug in the last net-next batch where
a VF could get access to set port configuration; the next patch from Amir
fixes a race in the port VPI logic. Next are two performance patches from Ido.

The patch to add checksum complete status on GRE and such packets was
preceded by a patch that converted the driver to only use napi_gro_receive
vs. the current code which goes through napi_gro_frags on its usual track.
Eric D. has some thoughts and suggestions on that change which we
want to take the time to consider, so for the time being we dropped that
patch and the ones that depend on it.

Or.

Changes from V0:
  - have the caller provide the __GFP_COLD hint to the service function
  - dropped the patch that changes the GRO logic and the subsequent dependent
    patches.

Amir Vadai (1):
  net/mlx4_core: Protect port type setting by mutex

Ido Shamay (2):
  net/mlx4_en: Remove RX buffers alignment to IP_ALIGN
  net/mlx4_en: Add __GFP_COLD gfp flags in alloc_pages

Matan Barak (1):
  net/mlx4_core: Add retrieval of CONFIG_DEV parameters

Saeed Mahameed (1):
  net/mlx4_core: Prevent VF from changing port configuration

 drivers/net/ethernet/mellanox/mlx4/cmd.c           |    6 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |   23 ++---
 drivers/net/ethernet/mellanox/mlx4/fw.c            |  118 +++++++++++++++++++-
 drivers/net/ethernet/mellanox/mlx4/main.c          |    9 ++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |   10 ++
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |    1 -
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |   17 +++
 include/linux/mlx4/cmd.h                           |   29 +++++
 include/linux/mlx4/device.h                        |    3 +-
 9 files changed, 190 insertions(+), 26 deletions(-)

^ permalink raw reply

* [PATCH V1 net-next 2/5] net/mlx4_core: Protect port type setting by mutex
From: Or Gerlitz @ 2014-11-02 14:26 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Saeed Mahameed, Shani Michaeli,
	Ido Shamay
In-Reply-To: <1414938377-421-1-git-send-email-ogerlitz@mellanox.com>

From: Amir Vadai <amirv@mellanox.com>

We need to protect set_port_type() against concurrent calls, as the sysfs
code could call it from multiple contexts in parallel.

The port_mutex is not enough because we need to protect against concurrent
modification of 'info' and stopping of the port sensing work.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
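
A userspace sketch of the single-exit locking pattern this patch uses,
not the driver code. mock_lock()/mock_unlock() stand in for the mutex,
and enum port_type, set_type() and the "auto\n" input string are
assumptions made up for illustration. The point is that the error path
jumps to a label that releases the lock instead of returning directly.

```c
#include <string.h>
#include <errno.h>

enum port_type { PORT_IB, PORT_ETH, PORT_AUTO };

static int lock_depth;	/* mock mutex: greater than 0 means held */
static void mock_lock(void) { lock_depth++; }
static void mock_unlock(void) { lock_depth--; }

/* Parse buf into *out; the lock is released on every exit path. */
static int set_type(const char *buf, enum port_type *out)
{
	int err = 0;

	mock_lock();

	if (!strcmp(buf, "ib\n"))
		*out = PORT_IB;
	else if (!strcmp(buf, "eth\n"))
		*out = PORT_ETH;
	else if (!strcmp(buf, "auto\n"))
		*out = PORT_AUTO;
	else {
		/* A plain return here would leave the lock held. */
		err = -EINVAL;
		goto err_out;
	}

	/* ... work that must run under the lock would go here ... */

err_out:
	mock_unlock();
	return err;
}
```

After any call, successful or not, lock_depth is back to 0, which is
the property the added err_out label preserves.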
 drivers/net/ethernet/mellanox/mlx4/main.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 90de6e1..9f82196 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -901,9 +901,12 @@ static ssize_t set_port_type(struct device *dev,
 	struct mlx4_priv *priv = mlx4_priv(mdev);
 	enum mlx4_port_type types[MLX4_MAX_PORTS];
 	enum mlx4_port_type new_types[MLX4_MAX_PORTS];
+	static DEFINE_MUTEX(set_port_type_mutex);
 	int i;
 	int err = 0;
 
+	mutex_lock(&set_port_type_mutex);
+
 	if (!strcmp(buf, "ib\n"))
 		info->tmp_type = MLX4_PORT_TYPE_IB;
 	else if (!strcmp(buf, "eth\n"))
@@ -912,7 +915,8 @@ static ssize_t set_port_type(struct device *dev,
 		info->tmp_type = MLX4_PORT_TYPE_AUTO;
 	else {
 		mlx4_err(mdev, "%s is not supported port type\n", buf);
-		return -EINVAL;
+		err = -EINVAL;
+		goto err_out;
 	}
 
 	mlx4_stop_sense(mdev);
@@ -958,6 +962,9 @@ static ssize_t set_port_type(struct device *dev,
 out:
 	mlx4_start_sense(mdev);
 	mutex_unlock(&priv->port_mutex);
+err_out:
+	mutex_unlock(&set_port_type_mutex);
+
 	return err ? err : count;
 }
 
-- 
1.7.1

^ permalink raw reply related
