Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH V3 1/3] can: add can_is_canfd_skb() API
From: Eric Dumazet @ 2014-11-05 16:22 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: linux-can, mkl, wg, varkabhadram, netdev, socketcan,
	linux-arm-kernel
In-Reply-To: <1415193393-30023-1-git-send-email-b29396@freescale.com>

On Wed, 2014-11-05 at 21:16 +0800, Dong Aisheng wrote:
> The CAN device drivers can use it to check if the frame to send is on
> CAN FD mode or normal CAN mode.
> 
> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
> Signed-off-by: Dong Aisheng <b29396@freescale.com>
> ---
> ChangesLog:
>  * v1->v2: change to skb->len == CANFD_MTU;
> ---
>  include/linux/can/dev.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
> index 6992afc..4c3919c 100644
> --- a/include/linux/can/dev.h
> +++ b/include/linux/can/dev.h
> @@ -99,6 +99,11 @@ inval_skb:
>  	return 1;
>  }
>  

This looks a bit strange to assume that skb->len == magical_value is CAN
FD. A comment would be nice.

> +static inline int can_is_canfd_skb(struct sk_buff *skb)

static inline bool can_is_canfd_skb(const struct sk_buff *skb)

> +{
> +	return skb->len == CANFD_MTU;
> +}
> +
>  /* get data length from can_dlc with sanitized can_dlc */
>  u8 can_dlc2len(u8 can_dlc);
>  



^ permalink raw reply

* Fwd: Kernel Oops in __inet_twsk_kill()
From: Daniel Borkmann @ 2014-11-05 16:00 UTC (permalink / raw)
  To: charley.chu; +Cc: netdev
In-Reply-To: <FB8A4655DFD2B34DB16AE06DDDD6C0E231A6F030@SJEXCHMB12.corp.ad.broadcom.com>

[ moving to netdev ]

-------- Original Message --------
Subject: Kernel Oops in __inet_twsk_kill()
Date: Tue, 4 Nov 2014 23:47:18 +0000
From: Charley (Hao Chuan) Chu <charley.chu@broadcom.com>
To: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>

We have situation on our system. It brings the network interface up and down every
a few seconds. Eventually, it brings down the system - the kernel crashed due to BUG
on in __inet_twsk_kill(). The debug message show following call flow.

1) time-wait socket is created by tcp_time_wait() when the socket gets into "TIME_WAIT" state.
     inet_twsk_alloc()               - refcnt= 0
     inet_twsk_hashdance()  - refcnt = 3
     inet_twsk_schedule()      - refcnt = 4
     inet_twsk_put()                 - refcnt = 3
2) tcp_v4_timewait_ack() is called when sync is received
     inet_twsk_put()                  - refcnt= 2      <== where we thing the problem is
     occasionally, second sync is received, so the inet_twsk_put is called twice - refcnt = 1
3) twdr_do_twkill_work() is called when timed out
     call __inet_twsk_kill - BUG_ON!!! as refcnt=2 (supposed to be 3).
     call inet_twsk_put()

In a normal case, the callflow only has step 1 and step 3.  Our understanding is
the time-wait socket has three references - ehash, bhash and timer death row. In
step 2, none of them are touched. Can anyone here explain to us why the inet_twsk_put()
is called in tcp_v4_timewait_ack()?

our system has 3.14 kernel.

Any help would be highly appreciated.

Charley Chu


^ permalink raw reply

* Re: [PATCH net v3] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Daniel Borkmann @ 2014-11-05 16:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, lw1a2.jing, fw, hannes, netdev, Eric Dumazet,
	David L Stevens
In-Reply-To: <1415204413.13896.5.camel@edumazet-glaptop2.roam.corp.google.com>

On 11/05/2014 05:20 PM, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 15:42 +0100, Daniel Borkmann wrote:
>> It has been reported that generating an MLD listener report on
>> devices with large MTUs (e.g. 9000) and a high number of IPv6
>> addresses can trigger a skb_over_panic():
> ...
>>   v2->v3:
>>    - Still had a discussion w/ Hannes and improved the code a bit to
>>      make it more clear to read
>
> I am very sorry Daniel, but I found v2 much easier to understand :(
>
> Could you refrain from doing cleanups in this patch,
> only provide the very minimal fix ?
>
> No empty lines additions or deletions and stuff like that...
>
> Then, we can cleanup for net-next later if you really want ;)
>
> I know its _very_ tempting to do cleanups, but its very time consuming
> to review patches having real stuff done (like bug fixes) and cleanups.

I can understand, sorry, I'm fine with either version actually.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net v3] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Hannes Frederic Sowa @ 2014-11-05 16:38 UTC (permalink / raw)
  To: Eric Dumazet, Daniel Borkmann
  Cc: davem, lw1a2.jing, fw, netdev, Eric Dumazet, David L Stevens
In-Reply-To: <1415204413.13896.5.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, Nov 5, 2014, at 17:20, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 15:42 +0100, Daniel Borkmann wrote:
> > It has been reported that generating an MLD listener report on
> > devices with large MTUs (e.g. 9000) and a high number of IPv6
> > addresses can trigger a skb_over_panic():
> 
> ...
> 
> >  v2->v3:
> >   - Still had a discussion w/ Hannes and improved the code a bit to
> >     make it more clear to read
> 
> I am very sorry Daniel, but I found v2 much easier to understand :(
> 
> Could you refrain from doing cleanups in this patch,
> only provide the very minimal fix ?
> 
> No empty lines additions or deletions and stuff like that...
> 
> Then, we can cleanup for net-next later if you really want ;)
> 
> I know its _very_ tempting to do cleanups, but its very time consuming
> to review patches having real stuff done (like bug fixes) and cleanups.

My point was that the max_t(int, ..., ...) assignment to
reserved_tailroom was too implicit in case we allocated an skb smaller
than the mtu and reserved_tailroom should become '0'.

I would still vote for this version, but see the problem with the noise
caused by newline updates. Eric, would you mind a new version with only
the essential parts changed and keeping this calculation so we don't
need to change it twice for net and for net-next?

Bye,
Hannes

^ permalink raw reply

* Re: [PATCH net v3] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Eric Dumazet @ 2014-11-05 16:48 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Daniel Borkmann, davem, lw1a2.jing, fw, netdev, Eric Dumazet,
	David L Stevens
In-Reply-To: <1415205483.3264462.187432293.00A1F284@webmail.messagingengine.com>

On Wed, 2014-11-05 at 17:38 +0100, Hannes Frederic Sowa wrote:

> I would still vote for this version, but see the problem with the noise
> caused by newline updates. Eric, would you mind a new version with only
> the essential parts changed and keeping this calculation so we don't
> need to change it twice for net and for net-next?

I will be happy to review a v4 ;)

Thanks !

^ permalink raw reply

* Re: mlx4+vxlan offload breaks gre tunnels
From: Florian Westphal @ 2014-11-05 16:53 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Florian Westphal, netdev, Tom Herbert, Jesse Gross, amirv
In-Reply-To: <545A4DB7.5010603@mellanox.com>

Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 11/5/2014 5:04 PM, Florian Westphal wrote:
> >tl,dr: all tcp packets sent via gre tunnel have broken tcp csum if vxlan offload
> >is enabled with mlx4 driver.
> >
> Yep, I can see now the problem. It comes into play with
> ConnectX3-pro NICs that support VXLAN offloads (but not with
> ConnectX3 NIC which don't) when you enable the offloads support on
> the CX3-pro.

[..]
> I think the best effort we can do now is
> 
> 1. come up with something such as the below patch for 3.18 which is
> back-ward portable for -stable kernels, it will only arm the hw
> offloads if the OS tells us there's VXLAN in action
> 
[..]
> 
> tested to work with the  following which is a bit different, tell me
> if it works for you

Right, the patch below works in my setup as well (until link-add-vxlan,
that is ;) )

Thanks,
Florian

^ permalink raw reply

* Re: [Xen-devel] [PATCHv1 net-next] xen-netback: remove unconditional pull_skb_tail in guest Tx path
From: David Miller @ 2014-11-05 17:15 UTC (permalink / raw)
  To: Ian.Campbell
  Cc: zoltan.kiss, david.vrabel, netdev, malcolm.crossley, wei.liu2,
	xen-devel
In-Reply-To: <1415181080.11486.63.camel@citrix.com>

From: Ian Campbell <Ian.Campbell@citrix.com>
Date: Wed, 5 Nov 2014 09:51:20 +0000

> Is this also true for things which hit the iptables paths? I suppose
> they must necessarily have already been through the protocol demux stage
> before iptables would even be able to interpret them as e.g. an IP
> packet.

Netfilter often takes a different approach, by using
skb_header_pointer() which returns a direct pointer if the linear area
contains the requested range already, or alternatively copies from the
frags into a user supplied on-stack header buffer if not.

^ permalink raw reply

* Re: [PATCHv1 net-next] xen-netback: remove unconditional pull_skb_tail in guest Tx path
From: David Miller @ 2014-11-05 17:16 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: david.vrabel, netdev, xen-devel, wei.liu2, malcolm.crossley
In-Reply-To: <1415181185.11486.65.camel@citrix.com>

From: Ian Campbell <Ian.Campbell@citrix.com>
Date: Wed, 5 Nov 2014 09:53:05 +0000

> I'd like to see the commit message expanded to explain why this isn't
> introducing a (security) bug by not pulling enough stuff into the header
> (IOW the conclusion of the discussion).

Just because a fundamental aspect of how we handle packets isn't clear
to some people, doesn't mean that every commit that depends upon that
invariant has to explain it in the commit message. :-)

^ permalink raw reply

* Re: [PATCH V3 1/3] can: add can_is_canfd_skb() API
From: Oliver Hartkopp @ 2014-11-05 17:33 UTC (permalink / raw)
  To: Eric Dumazet, Dong Aisheng
  Cc: linux-can, mkl, wg, varkabhadram, netdev, linux-arm-kernel
In-Reply-To: <1415204533.13896.7.camel@edumazet-glaptop2.roam.corp.google.com>

On 05.11.2014 17:22, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 21:16 +0800, Dong Aisheng wrote:

>
> This looks a bit strange to assume that skb->len == magical_value is CAN
> FD. A comment would be nice.
>

Yes. Due to exactly two types of struct can(fd)_frame which can be contained 
in a skb the skbs are distinguished by the length which can be either CAN_MTU 
or CANFD_MTU.

>> +static inline int can_is_canfd_skb(struct sk_buff *skb)
>
> static inline bool can_is_canfd_skb(const struct sk_buff *skb)
>

ok.

>> +{

What about:

	/* the CAN specific type of skb is identified by its data length */

>> +	return skb->len == CANFD_MTU;
>> +}
>> +
>>   /* get data length from can_dlc with sanitized can_dlc */
>>   u8 can_dlc2len(u8 can_dlc);

Regards,
Oliver


^ permalink raw reply

* Re: [PATCH net v3] ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
From: Daniel Borkmann @ 2014-11-05 17:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hannes Frederic Sowa, davem, lw1a2.jing, fw, netdev, Eric Dumazet,
	David L Stevens
In-Reply-To: <1415206106.13896.8.camel@edumazet-glaptop2.roam.corp.google.com>

On 11/05/2014 05:48 PM, Eric Dumazet wrote:
> On Wed, 2014-11-05 at 17:38 +0100, Hannes Frederic Sowa wrote:
...
>> I would still vote for this version, but see the problem with the noise
>> caused by newline updates. Eric, would you mind a new version with only
>> the essential parts changed and keeping this calculation so we don't
>> need to change it twice for net and for net-next?
>
> I will be happy to review a v4 ;)

No problem, I'll respin. ;)

Thanks,
Daniel

^ permalink raw reply

* Re: Kernel Oops in __inet_twsk_kill()
From: Cong Wang @ 2014-11-05 18:00 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: charley.chu, netdev
In-Reply-To: <545A49B4.7090107@iogearbox.net>

On Wed, Nov 5, 2014 at 8:00 AM, Daniel Borkmann <borkmann@iogearbox.net> wrote:
> [ moving to netdev ]
>
> -------- Original Message --------
> Subject: Kernel Oops in __inet_twsk_kill()
> Date: Tue, 4 Nov 2014 23:47:18 +0000
> From: Charley (Hao Chuan) Chu <charley.chu@broadcom.com>
> To: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
>
> We have situation on our system. It brings the network interface up and down
> every
> a few seconds. Eventually, it brings down the system - the kernel crashed
> due to BUG
> on in __inet_twsk_kill(). The debug message show following call flow.
>
> 1) time-wait socket is created by tcp_time_wait() when the socket gets into
> "TIME_WAIT" state.
>     inet_twsk_alloc()               - refcnt= 0
>     inet_twsk_hashdance()  - refcnt = 3
>     inet_twsk_schedule()      - refcnt = 4
>     inet_twsk_put()                 - refcnt = 3
> 2) tcp_v4_timewait_ack() is called when sync is received
>     inet_twsk_put()                  - refcnt= 2      <== where we thing the
> problem is
>     occasionally, second sync is received, so the inet_twsk_put is called
> twice - refcnt = 1
> 3) twdr_do_twkill_work() is called when timed out
>     call __inet_twsk_kill - BUG_ON!!! as refcnt=2 (supposed to be 3).
>     call inet_twsk_put()
>
> In a normal case, the callflow only has step 1 and step 3.  Our
> understanding is
> the time-wait socket has three references - ehash, bhash and timer death
> row. In
> step 2, none of them are touched. Can anyone here explain to us why the
> inet_twsk_put()
> is called in tcp_v4_timewait_ack()?
>

It has been there for a rather long time, but this doesn't mean it is
correct. Its caller calls inet_twsk_put() on error path, so smells wrong
to call it on non-error path. But I don't look into this.

^ permalink raw reply

* Re: [PATCH net 0/5] Implement ndo_gso_check() for vxlan nics
From: Tom Herbert @ 2014-11-05 18:00 UTC (permalink / raw)
  To: Joe Stringer
  Cc: Or Gerlitz, Linux Netdev List, sathya.perla, Jeff Kirsher,
	linux.nics, Amir Vadai, shahed.shaikh, dept-gelinuxnicdev,
	Linux Kernel
In-Reply-To: <CANr6G5xtNYenhd8KDWx+kRcnSZ0fahUdAL+6Wcz=5_dNvrQR6Q@mail.gmail.com>

On Wed, Nov 5, 2014 at 9:50 AM, Joe Stringer <joestringer@nicira.com> wrote:
>
> On 5 November 2014 04:38, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>
>> On Tue, Nov 4, 2014 at 11:56 PM, Joe Stringer <joestringer@nicira.com> wrote:
>> > Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not other
>> > UDP-based encapsulation protocols where the format and size of the header may
>> > differ. This patch series implements ndo_gso_check() for these NICs,
>> > restricting the GSO handling to something that looks and smells like VXLAN.
>> >
>> > Implementation shamelessly stolen from Tom Herbert (with minor fixups):
>> > http://thread.gmane.org/gmane.linux.network/332428/focus=333111
>>
>>
>> Hi Joe,
>>
>> 1st, thanks for picking this task...2nd, for drivers that currently
>> support only pure VXLAN, I don't see the point
>> to replicate the helper suggested by Tom (good catch on the size check
>> to be 16 and not 12) four times and who know how more in the future.
>> Let's just have one generic helper and make the mlx4/be/fm10k/benet
>> drivers to have it as their ndo, OK?
>
>
> Thanks for taking a look.
>
> I had debated whether to do this or not as the actual support on each NIC may differ, and each implementation may morph over time to match these capabilities better. Obviously the vendors will know better than me on this, so I'm posing this series to prod them for more information. At this point I've had just one maintainer come back and confirm that this helper is a good fit for their hardware, so I'd like to confirm that multiple drivers will use a ndo_gso_check_vxlan_helper() function before I go and create it.


Thanks for implementing this fix!

Personally, I would rather not have the helper. This is already a
small number of drivers, and each driver owner should consider what
limitations are of their device and try to enable to allow the maximum
number of use cases possible. I'm also hoping that new devices will
implement the more generic mechanism so that VXLAN is just one
supported protocol.

^ permalink raw reply

* Can ndo_select_queue save data in skb->cb?
From: James Yonan @ 2014-11-05 17:57 UTC (permalink / raw)
  To: netdev

Is it permissible for a net driver's ndo_select_queue method to save 
data in skb->cb for later use in ndo_start_xmit?

Also, is it necessary for users of skb->cb to zero out their private 
data after use to prevent it from being misinterpreted by other layers? 
  I noticed some commits in the log (such as 462fb2) are zeroing out the 
skb->cb area for this reason.

Thanks,
James

^ permalink raw reply

* [PATCH v2 iproute2 0/5] iproute2: Add FOU and GUE configuration in ip
From: Tom Herbert @ 2014-11-05 18:06 UTC (permalink / raw)
  To: stephen, davem, netdev

This patch set adds support in iproute2 to configure FOU and GUE ports
for receive, and using FOU or GUE with ip tunnels (IPIP, GRE, sit) on
transmit.

A new ip subcommand "fou" has been added to configure FOU/GUE ports.
For example:

  ip fou add port 5555 gue 
  ip fou add port 9999 ipproto 4

The first command creates a GUE port, the second creates a direct FOU
port for IPIP (receive payload is a assumed to be an IP packet).

fou.8 and gue.8 man pages were added to describe this command.

To configure an IP tunnel to use FOU or GUE encap parameters have
been added. For example:

  ip link add name tun1 type ipip remote 192.168.1.1 local 192.168.1.2 \
     ttl 225 encap gue encap-sport auto encap-dport 7777 encap-csum
  ip link add name tun2 type gre remote 192.168.1.1 local 192.168.1.2 \
     ttl 225 encap fou encap-sport auto encap-dport 8888 encap-csum

The first command configures an IPIP tunnel to use GUE on transmit. The
peer might be configured to receive GUE packets with the
"ip fou add port 7777 gue" command.

The second configures a GRE tunnel to use FOU encapsulation. The
peer might be configured to receive these packets with the
"ip fou add port 8888 ipproto 47" command.

v2:
  - Add man pages fou.8 and gue.8
  - Add ntohs for print ports in configuration
  - Add support for remote checksum offload

Tom Herbert (5):
  ip fou: Support to configure foo-over-udp RX
  ip link ipip: Add support to configure FOU and GUE
  ip link gre: Add support to configure FOU and GUE
  ip link: Add support for remote checksum offload
  iproute2: Man pages for fou and gue

 include/linux/fou.h       |  41 ++++++++++++
 include/linux/if_tunnel.h |   1 +
 ip/Makefile               |   2 +-
 ip/ip.c                   |   3 +-
 ip/ip_common.h            |   1 +
 ip/ipfou.c                | 159 ++++++++++++++++++++++++++++++++++++++++++++++
 ip/link_gre.c             |  98 ++++++++++++++++++++++++++++
 ip/link_iptnl.c           |  98 ++++++++++++++++++++++++++++
 man/man8/ip-fou.8         |  76 ++++++++++++++++++++++
 man/man8/ip-gue.8         |   1 +
 10 files changed, 478 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/fou.h
 create mode 100644 ip/ipfou.c
 create mode 100644 man/man8/ip-fou.8
 create mode 100644 man/man8/ip-gue.8

-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply

* [PATCH v2 iproute2 1/5] ip fou: Support to configure foo-over-udp RX
From: Tom Herbert @ 2014-11-05 18:06 UTC (permalink / raw)
  To: stephen, davem, netdev
In-Reply-To: <1415210788-8058-1-git-send-email-therbert@google.com>

Added 'ip fou...' commands to enable/disable UDP ports for doing
foo-over-udp and Generic UDP Encapsulation variant. Arguments are port
number to bind to and IP protocol to map to port (for direct FOU).

Examples:

ip fou add port 7777 gue
ip fou add port 8888 ipproto 4

The first command creates a GUE port, the second creates a direct FOU
port for IPIP (receive payload is a assumed to be an IPv4 packet).

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/fou.h |  41 ++++++++++++++
 ip/Makefile         |   2 +-
 ip/ip.c             |   3 +-
 ip/ip_common.h      |   1 +
 ip/ipfou.c          | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 204 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/fou.h
 create mode 100644 ip/ipfou.c

diff --git a/include/linux/fou.h b/include/linux/fou.h
new file mode 100644
index 0000000..e1724ff
--- /dev/null
+++ b/include/linux/fou.h
@@ -0,0 +1,41 @@
+/* fou.h - FOU Interface */
+
+#ifndef _LINUX_FOU_H
+#define _LINUX_FOU_H
+
+#include <linux/types.h>
+
+/* NETLINK_GENERIC related info
+ */
+#define FOU_GENL_NAME		"fou"
+#define FOU_GENL_VERSION	0x1
+
+enum {
+	FOU_ATTR_UNSPEC,
+	FOU_ATTR_PORT,				/* u16 */
+	FOU_ATTR_AF,				/* u8 */
+	FOU_ATTR_IPPROTO,			/* u8 */
+	FOU_ATTR_TYPE,				/* u8 */
+
+	__FOU_ATTR_MAX,
+};
+
+#define FOU_ATTR_MAX		(__FOU_ATTR_MAX - 1)
+
+enum {
+	FOU_CMD_UNSPEC,
+	FOU_CMD_ADD,
+	FOU_CMD_DEL,
+
+	__FOU_CMD_MAX,
+};
+
+enum {
+	FOU_ENCAP_UNSPEC,
+	FOU_ENCAP_DIRECT,
+	FOU_ENCAP_GUE,
+};
+
+#define FOU_CMD_MAX	(__FOU_CMD_MAX - 1)
+
+#endif /* _LINUX_FOU_H */
diff --git a/ip/Makefile b/ip/Makefile
index 5405ee7..1f50848 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -6,7 +6,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \
     iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o link_vti6.o \
     iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
     link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
-    iplink_bridge.o iplink_bridge_slave.o
+    iplink_bridge.o iplink_bridge_slave.o ipfou.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/ip.c b/ip/ip.c
index e4b201f..5f759d5 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -47,7 +47,7 @@ static void usage(void)
 "       ip [ -force ] -batch filename\n"
 "where  OBJECT := { link | addr | addrlabel | route | rule | neigh | ntable |\n"
 "                   tunnel | tuntap | maddr | mroute | mrule | monitor | xfrm |\n"
-"                   netns | l2tp | tcp_metrics | token | netconf }\n"
+"                   netns | l2tp | fou | tcp_metrics | token | netconf }\n"
 "       OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
 "                    -h[uman-readable] | -iec |\n"
 "                    -f[amily] { inet | inet6 | ipx | dnet | bridge | link } |\n"
@@ -79,6 +79,7 @@ static const struct cmd {
 	{ "ntbl",	do_ipntable },
 	{ "link",	do_iplink },
 	{ "l2tp",	do_ipl2tp },
+	{ "fou",	do_ipfou },
 	{ "tunnel",	do_iptunnel },
 	{ "tunl",	do_iptunnel },
 	{ "tuntap",	do_iptuntap },
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 8351463..095c92d 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -48,6 +48,7 @@ extern int do_multirule(int argc, char **argv);
 extern int do_netns(int argc, char **argv);
 extern int do_xfrm(int argc, char **argv);
 extern int do_ipl2tp(int argc, char **argv);
+extern int do_ipfou(int argc, char **argv);
 extern int do_tcp_metrics(int argc, char **argv);
 extern int do_ipnetconf(int argc, char **argv);
 extern int do_iptoken(int argc, char **argv);
diff --git a/ip/ipfou.c b/ip/ipfou.c
new file mode 100644
index 0000000..2676045
--- /dev/null
+++ b/ip/ipfou.c
@@ -0,0 +1,159 @@
+/*
+ * ipfou.c	FOU (foo over UDP) support
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Tom Herbert <therbert@google.com>
+ */
+
+#include <netdb.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <net/if.h>
+#include <linux/fou.h>
+#include <linux/genetlink.h>
+#include <linux/ip.h>
+#include <arpa/inet.h>
+
+#include "libgenl.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void usage(void)
+{
+	fprintf(stderr, "Usage: ip fou add port PORT { ipproto PROTO  | gue }\n");
+	fprintf(stderr, "       ip fou del port PORT\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, "Where: PROTO { ipproto-name | 1..255 }\n");
+	fprintf(stderr, "       PORT { 1..65535 }\n");
+
+	exit(-1);
+}
+
+/* netlink socket */
+static struct rtnl_handle genl_rth = { .fd = -1 };
+static int genl_family = -1;
+
+#define FOU_REQUEST(_req, _bufsiz, _cmd, _flags)	\
+	GENL_REQUEST(_req, _bufsiz, genl_family, 0,	\
+		     FOU_GENL_VERSION, _cmd, _flags)
+
+static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n,
+			 bool adding)
+{
+	__u16 port;
+	int port_set = 0;
+	__u8 ipproto, type;
+	bool gue_set = false;
+	int ipproto_set = 0;
+
+	while (argc > 0) {
+		if (!matches(*argv, "port")) {
+			NEXT_ARG();
+
+			if (get_u16(&port, *argv, 0) || port == 0)
+				invarg("invalid port", *argv);
+			port = htons(port);
+			port_set = 1;
+		} else if (!matches(*argv, "ipproto")) {
+			struct protoent *servptr;
+
+			NEXT_ARG();
+
+			servptr = getprotobyname(*argv);
+			if (servptr)
+				ipproto = servptr->p_proto;
+			else if (get_u8(&ipproto, *argv, 0) || ipproto == 0)
+				invarg("invalid ipproto", *argv);
+			ipproto_set = 1;
+		} else if (!matches(*argv, "gue")) {
+			gue_set = true;
+		} else {
+			fprintf(stderr, "fou: unknown command \"%s\"?\n", *argv);
+			usage();
+			return -1;
+		}
+		argc--, argv++;
+	}
+
+	if (!port_set) {
+		fprintf(stderr, "fou: missing port\n");
+		return -1;
+	}
+
+	if (!ipproto_set && !gue_set && adding) {
+		fprintf(stderr, "fou: must set ipproto or gue\n");
+		return -1;
+	}
+
+	if (ipproto_set && gue_set) {
+		fprintf(stderr, "fou: cannot set ipproto and gue\n");
+		return -1;
+	}
+
+	type = gue_set ? FOU_ENCAP_GUE : FOU_ENCAP_DIRECT;
+
+	addattr16(n, 1024, FOU_ATTR_PORT, port);
+	addattr8(n, 1024, FOU_ATTR_TYPE, type);
+
+	if (ipproto_set)
+		addattr8(n, 1024, FOU_ATTR_IPPROTO, ipproto);
+
+	return 0;
+}
+
+static int do_add(int argc, char **argv)
+{
+	FOU_REQUEST(req, 1024, FOU_CMD_ADD, NLM_F_REQUEST);
+
+	fou_parse_opt(argc, argv, &req.n, true);
+
+	if (rtnl_talk(&genl_rth, &req.n, 0, 0, NULL) < 0)
+		return -2;
+
+	return 0;
+}
+
+static int do_del(int argc, char **argv)
+{
+	FOU_REQUEST(req, 1024, FOU_CMD_DEL, NLM_F_REQUEST);
+
+	fou_parse_opt(argc, argv, &req.n, false);
+
+	if (rtnl_talk(&genl_rth, &req.n, 0, 0, NULL) < 0)
+		return -2;
+
+	return 0;
+}
+
+int do_ipfou(int argc, char **argv)
+{
+	if (genl_family < 0) {
+		if (rtnl_open_byproto(&genl_rth, 0, NETLINK_GENERIC) < 0) {
+			fprintf(stderr, "Cannot open generic netlink socket\n");
+			exit(1);
+		}
+
+		genl_family = genl_resolve_family(&genl_rth, FOU_GENL_NAME);
+		if (genl_family < 0)
+			exit(1);
+	}
+
+	if (argc < 1)
+		usage();
+
+	if (matches(*argv, "add") == 0)
+		return do_add(argc-1, argv+1);
+	if (matches(*argv, "delete") == 0)
+		return do_del(argc-1, argv+1);
+	if (matches(*argv, "help") == 0)
+		usage();
+
+	fprintf(stderr, "Command \"%s\" is unknown, try \"ip fou help\".\n", *argv);
+	exit(-1);
+}
+
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH v2 iproute2 2/5] ip link ipip: Add support to configure FOU and GUE
From: Tom Herbert @ 2014-11-05 18:06 UTC (permalink / raw)
  To: stephen, davem, netdev
In-Reply-To: <1415210788-8058-1-git-send-email-therbert@google.com>

This patch adds support to configure foo-over-udp (FOU) and Generic
UDP Encapsulation for IPIP and sit tunnels. This configuration allows
selection of FOU or GUE for the tunnel, specification of the source and
destination ports for UDP tunnel, and enabling TX checksum. This
configuration only affects the transmit side of a tunnel.

Example:

ip link add name tun1 type ipip remote 192.168.1.1 local 192.168.1.2 \
   ttl 225 encap gue encap-sport auto encap-dport 9999 encap-csum

This would create an IPIP tunnel in GUE encapsulation where the source
port is automatically selected (based on hash of inner packet) and
checksums in the encapsulating UDP header are enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 ip/link_iptnl.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index ea13ce9..9487117 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -29,6 +29,9 @@ static void print_usage(FILE *f, int sit)
 	fprintf(f, "          type { ipip | sit } [ remote ADDR ] [ local ADDR ]\n");
 	fprintf(f, "          [ ttl TTL ] [ tos TOS ] [ [no]pmtudisc ] [ dev PHYS_DEV ]\n");
 	fprintf(f, "          [ 6rd-prefix ADDR ] [ 6rd-relay_prefix ADDR ] [ 6rd-reset ]\n");
+	fprintf(f, "          [ noencap ] [ encap { fou | gue | none } ]\n");
+	fprintf(f, "          [ encap-sport PORT ] [ encap-dport PORT ]\n");
+	fprintf(f, "          [ [no]encap-csum ] [ [no]encap-csum6 ]\n");
 	if (sit) {
 		fprintf(f, "          [ mode { ip6ip | ipip | any } ]\n");
 		fprintf(f, "          [ isatap ]\n");
@@ -72,6 +75,10 @@ static int iptunnel_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u16 ip6rdprefixlen = 0;
 	__u32 ip6rdrelayprefix = 0;
 	__u16 ip6rdrelayprefixlen = 0;
+	__u16 encaptype = 0;
+	__u16 encapflags = 0;
+	__u16 encapsport = 0;
+	__u16 encapdport = 0;
 
 	memset(&ip6rdprefix, 0, sizeof(ip6rdprefix));
 
@@ -134,6 +141,14 @@ get_failed:
 		if (iptuninfo[IFLA_IPTUN_PROTO])
 			proto = rta_getattr_u8(iptuninfo[IFLA_IPTUN_PROTO]);
 
+		if (iptuninfo[IFLA_IPTUN_ENCAP_TYPE])
+			encaptype = rta_getattr_u16(iptuninfo[IFLA_IPTUN_ENCAP_TYPE]);
+		if (iptuninfo[IFLA_IPTUN_ENCAP_FLAGS])
+			encapflags = rta_getattr_u16(iptuninfo[IFLA_IPTUN_ENCAP_FLAGS]);
+		if (iptuninfo[IFLA_IPTUN_ENCAP_SPORT])
+			encapsport = rta_getattr_u16(iptuninfo[IFLA_IPTUN_ENCAP_SPORT]);
+		if (iptuninfo[IFLA_IPTUN_ENCAP_DPORT])
+			encapdport = rta_getattr_u16(iptuninfo[IFLA_IPTUN_ENCAP_DPORT]);
 		if (iptuninfo[IFLA_IPTUN_6RD_PREFIX])
 			memcpy(&ip6rdprefix,
 			       RTA_DATA(iptuninfo[IFLA_IPTUN_6RD_PREFIX]),
@@ -211,6 +226,36 @@ get_failed:
 				proto = 0;
 			else
 				invarg("Cannot guess tunnel mode.", *argv);
+		} else if (strcmp(*argv, "noencap") == 0) {
+			encaptype = TUNNEL_ENCAP_NONE;
+		} else if (strcmp(*argv, "encap") == 0) {
+			NEXT_ARG();
+			if (strcmp(*argv, "fou") == 0)
+				encaptype = TUNNEL_ENCAP_FOU;
+			else if (strcmp(*argv, "gue") == 0)
+				encaptype = TUNNEL_ENCAP_GUE;
+			else if (strcmp(*argv, "none") == 0)
+				encaptype = TUNNEL_ENCAP_NONE;
+			else
+				invarg("Invalid encap type.", *argv);
+		} else if (strcmp(*argv, "encap-sport") == 0) {
+			NEXT_ARG();
+			if (strcmp(*argv, "auto") == 0)
+				encapsport = 0;
+			else if (get_u16(&encapsport, *argv, 0))
+				invarg("Invalid source port.", *argv);
+		} else if (strcmp(*argv, "encap-dport") == 0) {
+			NEXT_ARG();
+			if (get_u16(&encapdport, *argv, 0))
+				invarg("Invalid destination port.", *argv);
+		} else if (strcmp(*argv, "encap-csum") == 0) {
+			encapflags |= TUNNEL_ENCAP_FLAG_CSUM;
+		} else if (strcmp(*argv, "noencap-csum") == 0) {
+			encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM;
+		} else if (strcmp(*argv, "encap-udp6-csum") == 0) {
+			encapflags |= TUNNEL_ENCAP_FLAG_CSUM6;
+		} else if (strcmp(*argv, "noencap-udp6-csum") == 0) {
+			encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM6;
 		} else if (strcmp(*argv, "6rd-prefix") == 0) {
 			inet_prefix prefix;
 			NEXT_ARG();
@@ -248,6 +293,12 @@ get_failed:
 	addattr8(n, 1024, IFLA_IPTUN_TTL, ttl);
 	addattr8(n, 1024, IFLA_IPTUN_TOS, tos);
 	addattr8(n, 1024, IFLA_IPTUN_PMTUDISC, pmtudisc);
+
+	addattr16(n, 1024, IFLA_IPTUN_ENCAP_TYPE, encaptype);
+	addattr16(n, 1024, IFLA_IPTUN_ENCAP_FLAGS, encapflags);
+	addattr16(n, 1024, IFLA_IPTUN_ENCAP_SPORT, htons(encapsport));
+	addattr16(n, 1024, IFLA_IPTUN_ENCAP_DPORT, htons(encapdport));
+
 	if (strcmp(lu->id, "sit") == 0) {
 		addattr16(n, 1024, IFLA_IPTUN_FLAGS, iflags);
 		addattr8(n, 1024, IFLA_IPTUN_PROTO, proto);
@@ -350,6 +401,44 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
 			       relayprefixlen);
 		}
 	}
+
+	if (tb[IFLA_IPTUN_ENCAP_TYPE] &&
+	    *(__u16 *)RTA_DATA(tb[IFLA_IPTUN_ENCAP_TYPE]) != TUNNEL_ENCAP_NONE) {
+		__u16 type = rta_getattr_u16(tb[IFLA_IPTUN_ENCAP_TYPE]);
+		__u16 flags = rta_getattr_u16(tb[IFLA_IPTUN_ENCAP_FLAGS]);
+		__u16 sport = rta_getattr_u16(tb[IFLA_IPTUN_ENCAP_SPORT]);
+		__u16 dport = rta_getattr_u16(tb[IFLA_IPTUN_ENCAP_DPORT]);
+
+		fputs("encap ", f);
+		switch (type) {
+		case TUNNEL_ENCAP_FOU:
+			fputs("fou ", f);
+			break;
+		case TUNNEL_ENCAP_GUE:
+			fputs("gue ", f);
+			break;
+		default:
+			fputs("unknown ", f);
+			break;
+		}
+
+		if (sport == 0)
+			fputs("encap-sport auto ", f);
+		else
+			fprintf(f, "encap-sport %u", ntohs(sport));
+
+		fprintf(f, "encap-dport %u ", ntohs(dport));
+
+		if (flags & TUNNEL_ENCAP_FLAG_CSUM)
+			fputs("encap-csum ", f);
+		else
+			fputs("noencap-csum ", f);
+
+		if (flags & TUNNEL_ENCAP_FLAG_CSUM6)
+			fputs("encap-csum6 ", f);
+		else
+			fputs("noencap-csum6 ", f);
+	}
 }
 
 static void iptunnel_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH v2 iproute2 3/5] ip link gre: Add support to configure FOU and GUE
From: Tom Herbert @ 2014-11-05 18:06 UTC (permalink / raw)
  To: stephen, davem, netdev
In-Reply-To: <1415210788-8058-1-git-send-email-therbert@google.com>

This patch adds support to configure foo-over-udp (FOU) and Generic
UDP Encapsulation for GRE tunnels. This configuration allows selection
of FOU or GUE for the tunnel, specification of the source and
destination ports for UDP tunnel, and enabling TX checksum. This
configuration only affects the transmit side of a tunnel.

Example:

ip link add name tun1 type gre remote 192.168.1.1 local 192.168.1.2 \
   ttl 225 encap fou encap-sport auto encap-dport 7777 encap-csum

This would create an GRE tunnel in GUE encapsulation where the source
port is automatically selected (based on hash of inner packet) and
checksums in the encapsulating UDP header are enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 ip/link_gre.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/ip/link_gre.c b/ip/link_gre.c
index 83653d0..47b64cb 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -29,6 +29,9 @@ static void print_usage(FILE *f)
 	fprintf(f, "          type { gre | gretap } [ remote ADDR ] [ local ADDR ]\n");
 	fprintf(f, "          [ [i|o]seq ] [ [i|o]key KEY ] [ [i|o]csum ]\n");
 	fprintf(f, "          [ ttl TTL ] [ tos TOS ] [ [no]pmtudisc ] [ dev PHYS_DEV ]\n");
+	fprintf(f, "          [ noencap ] [ encap { fou | gue | none } ]\n");
+	fprintf(f, "          [ encap-sport PORT ] [ encap-dport PORT ]\n");
+	fprintf(f, "          [ [no]encap-csum ] [ [no]encap-csum6 ]\n");
 	fprintf(f, "\n");
 	fprintf(f, "Where: NAME := STRING\n");
 	fprintf(f, "       ADDR := { IP_ADDRESS | any }\n");
@@ -67,6 +70,10 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u8 ttl = 0;
 	__u8 tos = 0;
 	int len;
+	__u16 encaptype = 0;
+	__u16 encapflags = 0;
+	__u16 encapsport = 0;
+	__u16 encapdport = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		memset(&req, 0, sizeof(req));
@@ -132,6 +139,15 @@ get_failed:
 
 		if (greinfo[IFLA_GRE_LINK])
 			link = rta_getattr_u8(greinfo[IFLA_GRE_LINK]);
+
+		if (greinfo[IFLA_GRE_ENCAP_TYPE])
+			encaptype = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_TYPE]);
+		if (greinfo[IFLA_GRE_ENCAP_FLAGS])
+			encapflags = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_FLAGS]);
+		if (greinfo[IFLA_GRE_ENCAP_SPORT])
+			encapsport = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_SPORT]);
+		if (greinfo[IFLA_GRE_ENCAP_DPORT])
+			encapdport = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_DPORT]);
 	}
 
 	while (argc > 0) {
@@ -241,6 +257,36 @@ get_failed:
 				tos = uval;
 			} else
 				tos = 1;
+		} else if (strcmp(*argv, "noencap") == 0) {
+			encaptype = TUNNEL_ENCAP_NONE;
+		} else if (strcmp(*argv, "encap") == 0) {
+			NEXT_ARG();
+			if (strcmp(*argv, "fou") == 0)
+				encaptype = TUNNEL_ENCAP_FOU;
+			else if (strcmp(*argv, "gue") == 0)
+				encaptype = TUNNEL_ENCAP_GUE;
+			else if (strcmp(*argv, "none") == 0)
+				encaptype = TUNNEL_ENCAP_NONE;
+			else
+				invarg("Invalid encap type.", *argv);
+		} else if (strcmp(*argv, "encap-sport") == 0) {
+			NEXT_ARG();
+			if (strcmp(*argv, "auto") == 0)
+				encapsport = 0;
+			else if (get_u16(&encapsport, *argv, 0))
+				invarg("Invalid source port.", *argv);
+		} else if (strcmp(*argv, "encap-dport") == 0) {
+			NEXT_ARG();
+			if (get_u16(&encapdport, *argv, 0))
+				invarg("Invalid destination port.", *argv);
+		} else if (strcmp(*argv, "encap-csum") == 0) {
+			encapflags |= TUNNEL_ENCAP_FLAG_CSUM;
+		} else if (strcmp(*argv, "noencap-csum") == 0) {
+			encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM;
+		} else if (strcmp(*argv, "encap-udp6-csum") == 0) {
+			encapflags |= TUNNEL_ENCAP_FLAG_CSUM6;
+		} else if (strcmp(*argv, "noencap-udp6-csum") == 0) {
+			encapflags |= ~TUNNEL_ENCAP_FLAG_CSUM6;
 		} else
 			usage();
 		argc--; argv++;
@@ -271,6 +317,11 @@ get_failed:
 	addattr_l(n, 1024, IFLA_GRE_TTL, &ttl, 1);
 	addattr_l(n, 1024, IFLA_GRE_TOS, &tos, 1);
 
+	addattr16(n, 1024, IFLA_GRE_ENCAP_TYPE, encaptype);
+	addattr16(n, 1024, IFLA_GRE_ENCAP_FLAGS, encapflags);
+	addattr16(n, 1024, IFLA_GRE_ENCAP_SPORT, htons(encapsport));
+	addattr16(n, 1024, IFLA_GRE_ENCAP_DPORT, htons(encapdport));
+
 	return 0;
 }
 
@@ -357,6 +408,44 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 		fputs("icsum ", f);
 	if (oflags & GRE_CSUM)
 		fputs("ocsum ", f);
+
+	if (tb[IFLA_GRE_ENCAP_TYPE] &&
+	    *(__u16 *)RTA_DATA(tb[IFLA_GRE_ENCAP_TYPE]) != TUNNEL_ENCAP_NONE) {
+		__u16 type = rta_getattr_u16(tb[IFLA_GRE_ENCAP_TYPE]);
+		__u16 flags = rta_getattr_u16(tb[IFLA_GRE_ENCAP_FLAGS]);
+		__u16 sport = rta_getattr_u16(tb[IFLA_GRE_ENCAP_SPORT]);
+		__u16 dport = rta_getattr_u16(tb[IFLA_GRE_ENCAP_DPORT]);
+
+		fputs("encap ", f);
+		switch (type) {
+		case TUNNEL_ENCAP_FOU:
+			fputs("fou ", f);
+			break;
+		case TUNNEL_ENCAP_GUE:
+			fputs("gue ", f);
+			break;
+		default:
+			fputs("unknown ", f);
+			break;
+		}
+
+		if (sport == 0)
+			fputs("encap-sport auto ", f);
+		else
+			fprintf(f, "encap-sport %u", ntohs(sport));
+
+		fprintf(f, "encap-dport %u ", ntohs(dport));
+
+		if (flags & TUNNEL_ENCAP_FLAG_CSUM)
+			fputs("encap-csum ", f);
+		else
+			fputs("noencap-csum ", f);
+
+		if (flags & TUNNEL_ENCAP_FLAG_CSUM6)
+			fputs("encap-csum6 ", f);
+		else
+			fputs("noencap-csum6 ", f);
+	}
 }
 
 static void gre_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH v2 iproute2 4/5] ip link: Add support for remote checksum offload
From: Tom Herbert @ 2014-11-05 18:06 UTC (permalink / raw)
  To: stephen, davem, netdev
In-Reply-To: <1415210788-8058-1-git-send-email-therbert@google.com>

This patch adds support to remote checksum checksum offload
confinguration for IPIP, SIT, and GRE tunnels. This patch
adds a [no]encap-remcsum to ip link command which applicable
when configured tunnels that use GUE.

http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00

Example:

ip link add name tun1 type gre remote 192.168.1.1 local 192.168.1.2 \
   ttl 225 encap fou encap-sport auto encap-dport 7777 encap-csum \
   encap-remcsum

This would create an GRE tunnel in GUE encapsulation where the source
port is automatically selected (based on hash of inner packet),
checksums in the encapsulating UDP header are enabled (needed.for
remote checksum offload), and remote checksum is configured to
be used on the tunnel (affects TX side).

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/if_tunnel.h |  1 +
 ip/link_gre.c             | 11 ++++++++++-
 ip/link_iptnl.c           | 11 ++++++++++-
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/if_tunnel.h b/include/linux/if_tunnel.h
index 8b04f32..102ce7a 100644
--- a/include/linux/if_tunnel.h
+++ b/include/linux/if_tunnel.h
@@ -69,6 +69,7 @@ enum tunnel_encap_types {
 
 #define TUNNEL_ENCAP_FLAG_CSUM		(1<<0)
 #define TUNNEL_ENCAP_FLAG_CSUM6		(1<<1)
+#define TUNNEL_ENCAP_FLAG_REMCSUM	(1<<2)
 
 /* SIT-mode i_flags */
 #define	SIT_ISATAP	0x0001
diff --git a/ip/link_gre.c b/ip/link_gre.c
index 47b64cb..1d78387 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -31,7 +31,7 @@ static void print_usage(FILE *f)
 	fprintf(f, "          [ ttl TTL ] [ tos TOS ] [ [no]pmtudisc ] [ dev PHYS_DEV ]\n");
 	fprintf(f, "          [ noencap ] [ encap { fou | gue | none } ]\n");
 	fprintf(f, "          [ encap-sport PORT ] [ encap-dport PORT ]\n");
-	fprintf(f, "          [ [no]encap-csum ] [ [no]encap-csum6 ]\n");
+	fprintf(f, "          [ [no]encap-csum ] [ [no]encap-csum6 ] [ [no]encap-remcsum ]\n");
 	fprintf(f, "\n");
 	fprintf(f, "Where: NAME := STRING\n");
 	fprintf(f, "       ADDR := { IP_ADDRESS | any }\n");
@@ -287,6 +287,10 @@ get_failed:
 			encapflags |= TUNNEL_ENCAP_FLAG_CSUM6;
 		} else if (strcmp(*argv, "noencap-udp6-csum") == 0) {
 			encapflags |= ~TUNNEL_ENCAP_FLAG_CSUM6;
+		} else if (strcmp(*argv, "encap-remcsum") == 0) {
+			encapflags |= TUNNEL_ENCAP_FLAG_REMCSUM;
+		} else if (strcmp(*argv, "noencap-remcsum") == 0) {
+			encapflags |= ~TUNNEL_ENCAP_FLAG_REMCSUM;
 		} else
 			usage();
 		argc--; argv++;
@@ -445,6 +449,11 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 			fputs("encap-csum6 ", f);
 		else
 			fputs("noencap-csum6 ", f);
+
+		if (flags & TUNNEL_ENCAP_FLAG_REMCSUM)
+			fputs("encap-remcsum ", f);
+		else
+			fputs("noencap-remcsum ", f);
 	}
 }
 
diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index 9487117..cab174f 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -31,7 +31,7 @@ static void print_usage(FILE *f, int sit)
 	fprintf(f, "          [ 6rd-prefix ADDR ] [ 6rd-relay_prefix ADDR ] [ 6rd-reset ]\n");
 	fprintf(f, "          [ noencap ] [ encap { fou | gue | none } ]\n");
 	fprintf(f, "          [ encap-sport PORT ] [ encap-dport PORT ]\n");
-	fprintf(f, "          [ [no]encap-csum ] [ [no]encap-csum6 ]\n");
+	fprintf(f, "          [ [no]encap-csum ] [ [no]encap-csum6 ] [ [no]encap-remcsum ]\n");
 	if (sit) {
 		fprintf(f, "          [ mode { ip6ip | ipip | any } ]\n");
 		fprintf(f, "          [ isatap ]\n");
@@ -256,6 +256,10 @@ get_failed:
 			encapflags |= TUNNEL_ENCAP_FLAG_CSUM6;
 		} else if (strcmp(*argv, "noencap-udp6-csum") == 0) {
 			encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM6;
+		} else if (strcmp(*argv, "encap-remcsum") == 0) {
+			encapflags |= TUNNEL_ENCAP_FLAG_REMCSUM;
+		} else if (strcmp(*argv, "noencap-remcsum") == 0) {
+			encapflags &= ~TUNNEL_ENCAP_FLAG_REMCSUM;
 		} else if (strcmp(*argv, "6rd-prefix") == 0) {
 			inet_prefix prefix;
 			NEXT_ARG();
@@ -438,6 +442,11 @@ static void iptunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[
 			fputs("encap-csum6 ", f);
 		else
 			fputs("noencap-csum6 ", f);
+
+		if (flags & TUNNEL_ENCAP_FLAG_REMCSUM)
+			fputs("encap-remcsum ", f);
+		else
+			fputs("noencap-remcsum ", f);
 	}
 }
 
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* [PATCH v2 iproute2 5/5] iproute2: Man pages for fou and gue
From: Tom Herbert @ 2014-11-05 18:06 UTC (permalink / raw)
  To: stephen, davem, netdev
In-Reply-To: <1415210788-8058-1-git-send-email-therbert@google.com>

Man pages for Foo-over-UDP and Generic UDP Encapsulation receive
port configuration. gue man page links to fou one.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 man/man8/ip-fou.8 | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/ip-gue.8 |  1 +
 2 files changed, 77 insertions(+)
 create mode 100644 man/man8/ip-fou.8
 create mode 100644 man/man8/ip-gue.8

diff --git a/man/man8/ip-fou.8 b/man/man8/ip-fou.8
new file mode 100644
index 0000000..0fa22ee
--- /dev/null
+++ b/man/man8/ip-fou.8
@@ -0,0 +1,76 @@
+.TH IP\-FOU 8 "2 Nov 2014" "iproute2" "Linux"
+.SH "NAME"
+ip-fou \- Foo-over-UDP receive port configuration
+.P
+ip-gue \- Generic UDP Encapsulation receive port configuration
+.SH "SYNOPSIS"
+.sp
+.ad l
+.in +8
+.ti -8
+.B ip
+.RI "[ " OPTIONS " ]"
+.B fou
+.RI " { " COMMAND " | "
+.BR help " }"
+.sp
+.ti -8
+.BR "ip fou add"
+.B port
+.IR PORT
+.RB "{ "
+.B gue
+.RI "|"
+.B ipproto
+.IR PROTO
+.RB " }"
+.br
+.ti -8
+.BR "ip fou del"
+.B port
+.IR PORT
+.SH DESCRIPTION
+The
+.B ip fou
+commands are used to create and delete receive ports for Foo-over-UDP
+(FOU) as well as Generic UDP Encapsulation (GUE).
+.PP
+Foo-over-UDP allows encapsulating packets of an IP protocol directly
+over UDP. The receiver infers the protocol of a packet received on
+a FOU UDP port to be the protocol configured for the port.
+.PP
+Generic UDP Encapsulation (GUE) encapsulates packets of an IP protocol
+within UDP and an encapsulation header. The encapsulation header contains the
+IP protocol number for the encapsulated packet.
+.PP
+When creating a FOU or GUE receive port, the port number is specified in
+.I PORT
+argument. If FOU is used, the IP protocol number associated with the port is specified in
+.I PROTO
+argument.
+.PP
+A FOU or GUE receive port is deleted by specifying
+.I PORT
+in the delete command.
+.SH EXAMPLES
+.PP
+.SS Configure a FOU receive port for GRE bound to 7777
+.nf
+# ip fou add port 8888 ipproto 47
+.PP
+.SS Configure a FOU receive port for IPIP bound to 8888
+.nf
+# ip fou add port 8888 ipproto 4
+.PP
+.SS Configure a GUE receive port bound to 9999
+.nf
+# ip fou add port 9999 gue
+.PP
+.SS Delete the GUE receive port bound to 9999
+.nf
+# ip fou del port 9999
+.SH SEE ALSO
+.br
+.BR ip (8)
+.SH AUTHOR
+Tom Herbert <therbert@google.com>
diff --git a/man/man8/ip-gue.8 b/man/man8/ip-gue.8
new file mode 100644
index 0000000..4d2914c
--- /dev/null
+++ b/man/man8/ip-gue.8
@@ -0,0 +1 @@
+.so man8/ip-fou.8
-- 
2.1.0.rc2.206.gedb03e5

^ permalink raw reply related

* Re: [PATCH] rtlwifi: Add more checks for get_btc_status callback
From: Larry Finger @ 2014-11-05 18:12 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Murilo Opsfelder Araujo, linux-kernel, linux-wireless, netdev,
	Chaoming Li, John W. Linville, Thadeu Cascardo, troy_tan
In-Reply-To: <1415178975.5402.66.camel@marge.simpson.net>

On 11/05/2014 03:16 AM, Mike Galbraith wrote:
> On Wed, 2014-10-29 at 23:30 -0500, Larry Finger wrote:
>> On 10/29/2014 06:28 PM, Murilo Opsfelder Araujo wrote:
>>> This is a complement of commit 08054200117a95afc14c3d2ed3a38bf4e345bf78
>>> "rtlwifi: Add check for get_btc_status callback".
>>>
>>> With this patch, next-20141029 at least does not panic with rtl8192se
>>> device.
>>>
>>
>> This patch is OK, but as noted it is not complete.
>>
>> I have patches to fix all the kernel panics for rtl8192se AND rtl8192ce. There
>> are missing parts, but I would prefer submitting mine, which would conflict with
>> this one. For that reason, NACK for this one, and please apply the set I am
>> submitting now.
>
> It's all in there now, but my RTL8191SEvB is still dead.  Squabbling
> with it isn't going all that well either.
>
> As soon as 38506ece rtlwifi: rtl_pci: Start modification for new drivers
> is applied, explosions appear.  Subsequently applying...
>
> 08054200 rtlwifi: Add check for get_btc_status callback
> c0386f15 rtlwifi: rtl8192ce: rtl8192de: rtl8192se: Fix handling for missing get_btc_status
> 50147969 rtlwifi: rtl8192se: Fix duplicate calls to ieee80211_register_hw()
> 30c5ccc6 rtlwifi: rtl8192se: Add missing section to read descriptor setting
> 75a916e1 rtlwifi: rtl8192se: Fix firmware loading
>
> ...fixes that mess up, but leaves the interface dead in the same manner
> as if nothing has been reverted.  So it _seems_ the bustage lurks in
> 38506ece somewhere.  Too bad it's non-dinky, and written in wifi-ese :)

Yes, I am aware that rtl8192se is failing, and now that I am back from vacation, 
I am working on the problem. If you want to use the driver with kernel 3.18, 
clone the repo at http://github.com/lwfinger/rtlwifi_new.git and build and 
install either the master or kernel_version branches. Both work.

I am in the process of trying to find what the crucial difference is between 
that repo and the kernel version.

Larry

^ permalink raw reply

* M_CAN message RAM initialization AppNote  - was: Re: [PATCH V3 3/3] can: m_can: workaround for transmit data less than 4 bytes
From: Oliver Hartkopp @ 2014-11-05 18:15 UTC (permalink / raw)
  To: Marc Kleine-Budde, Dong Aisheng, linux-can
  Cc: wg, varkabhadram, netdev, linux-arm-kernel
In-Reply-To: <545A3451.2090302@pengutronix.de>

Hi all,

just to close this application note relevant point ...

I got an answer from Florian Hartwich (Mr. CAN) from Bosch regarding the bit 
error detection found by Dong Aisheng.

The relevant interrupts IR.BEU or IR.BEC monitor the message RAM:

Bit 21 BEU: Bit Error Uncorrected
Message RAM bit error detected, uncorrected. Controlled by input signal 
m_can_aeim_berr[1] generated by an optional external parity / ECC logic 
attached to the Message RAM. An uncorrected Message RAM bit error sets 
CCCR.INIT to ‘1’. This is done to avoid transmission of corrupted data.

0= No bit error detected when reading from Message RAM
1= Bit error detected, uncorrected (e.g. parity logic)

Bit 20 BEC: Bit Error Corrected
Message RAM bit error detected and corrected. Controlled by input signal 
m_can_aeim_berr[0] generated by an optional external parity / ECC logic 
attached to the Message RAM.

0= No bit error detected when reading from Message RAM
1= Bit error detected and corrected (e.g. ECC)

---

The Message RAM is usually equipped with a parity or ECC functionality.
But RAM cells suffer a hardware reset and can therefore hold arbitrary content 
at startup - including parity and/or ECC bits.

So when you write only the CAN ID and the first four bytes the last four bytes 
remain untouched. Then the M_CAN starts to read in 32bit words from the start 
of the Tx Message element. So it is very likely to trigger the message RAM 
error when reading the uninitialized 32bit word from the last four bytes.

Finally it turns out that an initial writing (with any kind of data) to the 
entire message RAM is mandatory to create valid parity/ECC checksums.

That's it.

Regards,
Oliver


^ permalink raw reply

* Re: Can ndo_select_queue save data in skb->cb?
From: Eric Dumazet @ 2014-11-05 18:16 UTC (permalink / raw)
  To: James Yonan; +Cc: netdev
In-Reply-To: <545A64EC.309@openvpn.net>

On Wed, 2014-11-05 at 10:57 -0700, James Yonan wrote:
> Is it permissible for a net driver's ndo_select_queue method to save 
> data in skb->cb for later use in ndo_start_xmit?
> 
> Also, is it necessary for users of skb->cb to zero out their private 
> data after use to prevent it from being misinterpreted by other layers? 
>   I noticed some commits in the log (such as 462fb2) are zeroing out the 
> skb->cb area for this reason.

Its ok to use skb->cb[] from ndo_select_queue()

Look at bond_select_queue() for such a case.

You do not need to cleanup skb->cb[] to zero before giving skb to the
driver.

^ permalink raw reply

* Re: [PATCH] net: mv643xx_eth: reclaim TX skbs only when released by the HW
From: Karl Beldan @ 2014-11-05 18:31 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: David Miller, Ian Campbell, Karl Beldan, netdev, Eric Dumazet,
	Sebastian Hesselbarth
In-Reply-To: <20141105150521.GA25354@magnum.frso.rivierawaves.com>

On Wed, Nov 05, 2014 at 04:05:21PM +0100, Karl Beldan wrote:
> On Wed, Nov 05, 2014 at 11:46:16AM -0300, Ezequiel Garcia wrote:
> > Hi Karl,
> > 
> > On 11/05/2014 11:32 AM, Karl Beldan wrote:> From: Karl Beldan <karl.beldan@rivierawaves.com>
> > > 
> > > ATM, txq_reclaim will dequeue and free an skb for each tx desc released
> > > by the hw that has TX_LAST_DESC set. However, in case of TSO, each
> > > hw desc embedding the last part of a segment has TX_LAST_DESC set,
> > > losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
> > > which causes data corruption.
> > > 
> > > Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
> > > warn when trying to dequeue from an empty txq (which can be symptomatic
> > > of releasing skbs prematurely).
> > > 
> > > Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO')
> > 
> > Although your change makes sense, this isn't fixing the issue for me,
> > neither did the previous one.
> > 
> This change fixes a serious issue.
> On my side I can now trigger misc NFS and md5sums errors very easily,
> which I haven't detected so far with it applied.
> Are you running little endian ? Do you have the tso alignment fix
> a63ba13e (I don't expect it to be required but I don't know what SoC you
> are using) ? I suppose you are running with all 3 fixes applied.
>  
Also, I haven't checked SMP issues and I only have one core, if you are
using SMP it might be worth looking into that, maybe try running on one
core only (I only have an MV78200).
 
Karl

^ permalink raw reply

* Re: Can ndo_select_queue save data in skb->cb?
From: Cong Wang @ 2014-11-05 18:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: James Yonan, netdev
In-Reply-To: <1415211391.13896.10.camel@edumazet-glaptop2.roam.corp.google.com>

On Wed, Nov 5, 2014 at 10:16 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-11-05 at 10:57 -0700, James Yonan wrote:
>> Is it permissible for a net driver's ndo_select_queue method to save
>> data in skb->cb for later use in ndo_start_xmit?
>>
>> Also, is it necessary for users of skb->cb to zero out their private
>> data after use to prevent it from being misinterpreted by other layers?
>>   I noticed some commits in the log (such as 462fb2) are zeroing out the
>> skb->cb area for this reason.
>
> Its ok to use skb->cb[] from ndo_select_queue()
>
> Look at bond_select_queue() for such a case.
>
> You do not need to cleanup skb->cb[] to zero before giving skb to the
> driver.
>

That is only because qdisc layer saves data for bond,
which means if you have more data to save, you have to put
more into qdisc CB, it is just ugly.

^ permalink raw reply

* guten Tag,
From: Lubben Hendrik @ 2014-11-05 17:39 UTC (permalink / raw)
  To: netdev

guten Tag,

Mein Name ist Lubben Hendrik und ich arbeite mit der Finanz- Haus hier in der Niederlande. 
Ich fand Ihre Adresse durch meinen Ländern internationale Web Verzeichnis. Bei unserem letzten Treffen und Prüfung der Bankkonten
hier in den Niederlanden gefunden meine Abteilung ein ruhendes Konto mit einem enorme Summe von US $6.500.000,00
( sechs Millionen fünfhunderttausend USDollar ), die von verstorbenen Herrn Williams aus England abgelagert wurde .

Vor seinem Tod übernahm er die Summe von US $6.500.000,00 ( sechs Millionen fünfhunderttausend USDollar)an eine Bank hier in den Niederlanden. von unserem
Untersuchung hatte er keinen Empfänger oder nächsten Angehörigen , diese Mittel zu erreichen.
Aufgrund unserer finanziellen Hausordnung nur ein Ausländer kann als Stand-nächsten Verwandten oder nächsten Angehörigen . Der Antrag eines
Ausländers als nächsten Angehörigen ist die Basis auf der Tatsache, dass der Einleger ein Ausländer und jemand
in der war Niederlande kann nicht als die nächsten Angehörigen zu stehen.


Ich brauche Ihre Erlaubnis als nächster Verwandter oder nächsten Angehörigen unserer Verstorbenen
Kunden , so dass die Mittel freigegeben werden kann und Transfer zu Ihrem Konto , am Ende der Transaktion 40% wird für Sie sein und 60 % werden
für mich und meine Kollegen.

Wir brauchen ein ausländisches Konto . Ich immer noch auf die finanzielle Haus zu arbeiten , und das ist
der eigentliche Grund , dass ich eine zweite Partei oder Person brauchen ,zu stehen und zu arbeiten
mit mir und gelten für die Bank hier in den Niederlanden , wie die nächsten Angehörigen.
Ich habe in meinem Besitz alle notwendigen Unterlagen zu haben, diese Transaktion erfolgreich durchgeführt .


Weitere Informationen werden nach dem Eingang Ihrer Aufforderung zur Verfügung gestellt werden
Reaktion und ich möchte , dass Sie wissen , dass es kein Risiko. ich werde müssen uns zusammen zu arbeiten , 
wenn Sie interessiert sind , und ich versichere Ihnen, dass ich werden alle nützlichen Informationen und Dokumentation wie diese
Unternehmen zu schaffen braucht dringend Aufmerksamkeit,da es keine viel Zeit zu verlieren.
Bitte schreiben Sie mir direkt mit Ihrem Namen , Adresse , Telefon-und Faxnummer
auf diese E-Mail(lubbenhendrik1@aol.com)so kann ich erklären die Verfahren .

Grüße

Lubben Hendrik

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox