Netdev List
 help / color / mirror / Atom feed
* Re: net: GPF in eth_header
From: Eric Dumazet @ 2016-11-29 16:15 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: syzkaller, Dmitry Vyukov, David Miller, Tom Herbert,
	Alexander Duyck, Hannes Frederic Sowa, Jiri Benc, Sabrina Dubroca,
	netdev, LKML
In-Reply-To: <CAAeHK+zLO=Gv33TqS6tGTTC=QUYO-HOPgSB2oS1SBOAyJzPDTQ@mail.gmail.com>

On Tue, 2016-11-29 at 16:31 +0100, Andrey Konovalov wrote:
=
> The issue is not with skb_network_offset(), but with __skb_pull()
> using skb_network_offset() as an argument.
> 

No. The issue can happen with _any_ __skb_pull() with a 'negative'
argument, on 64bit arches.

skb_network_offset() is only one of the many cases this could happen if
a bug is added at some random place, including memory corruption from
a different kernel layer, or buggy hardware.

> I'm not sure what would be the beast way to fix this, to add a check
> before every __skb_pull(skb_network_offset()), to fix __skb_pull() to
> work with signed ints, to add BUG_ON()'s in __skb_pull, or something
> else.
> 
> What I meant is that you fixed this very instance of the bug, and I'm
> pointing out that a similar one might hit us again.

As I said, adding a check in skb_network_offset() would not be generic
enough.

Sure, we can be proactive and add tests everywhere in the kernel, but we
also want to keep it reasonably fast.

^ permalink raw reply

* [PATCH net 2/2] esp6: Fix integrity verification when ESN are used
From: Tobias Brunner @ 2016-11-29 16:05 UTC (permalink / raw)
  To: David S. Miller, Herbert Xu; +Cc: Steffen Klassert, netdev

When handling inbound packets, the two halves of the sequence number
stored on the skb are already in network order.

Fixes: 000ae7b2690e ("esp6: Switch to new AEAD interface")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
---
 net/ipv6/esp6.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 060a60b2f8a6..111ba55fd512 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -418,7 +418,7 @@ static int esp6_input(struct xfrm_state *x, struct sk_buff *skb)
 		esph = (void *)skb_push(skb, 4);
 		*seqhi = esph->spi;
 		esph->spi = esph->seq_no;
-		esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.input.hi);
+		esph->seq_no = XFRM_SKB_CB(skb)->seq.input.hi;
 		aead_request_set_callback(req, 0, esp_input_done_esn, skb);
 	}
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH net 1/2] esp4: Fix integrity verification when ESN are used
From: Tobias Brunner @ 2016-11-29 16:05 UTC (permalink / raw)
  To: David S. Miller, Herbert Xu; +Cc: Steffen Klassert, netdev

When handling inbound packets, the two halves of the sequence number
stored on the skb are already in network order.

Fixes: 7021b2e1cddd ("esp4: Switch to new AEAD interface")
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
---
 net/ipv4/esp4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index d95631d09248..20fb25e3027b 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -476,7 +476,7 @@ static int esp_input(struct xfrm_state *x, struct sk_buff *skb)
 		esph = (void *)skb_push(skb, 4);
 		*seqhi = esph->spi;
 		esph->spi = esph->seq_no;
-		esph->seq_no = htonl(XFRM_SKB_CB(skb)->seq.input.hi);
+		esph->seq_no = XFRM_SKB_CB(skb)->seq.input.hi;
 		aead_request_set_callback(req, 0, esp_input_done_esn, skb);
 	}
 
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH net-next] samples/bpf: fix include path
From: David Ahern @ 2016-11-29 15:55 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, David S . Miller; +Cc: netdev
In-Reply-To: <583D2187.7060405@iogearbox.net>

On 11/28/16 11:34 PM, Daniel Borkmann wrote:
> On 11/29/2016 07:07 AM, Alexei Starovoitov wrote:
>> Fix the following build error:
>> HOSTCC  samples/bpf/test_lru_dist.o
>> ../samples/bpf/test_lru_dist.c:25:22: fatal error: bpf_util.h: No such file or directory
>>
>> This is due to objtree != srctree.
>> Use srctree, since that's where bpf_util.h is located.
>>
>> Fixes: e00c7b216f34 ("bpf: fix multiple issues in selftest suite and samples")
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> 
> Didn't run into this so far, thanks for fixing!
> 
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>

I have and this fixed it for me.

Tested-by: David Ahern <dsa@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH v2 08/13] net: ethernet: ti: cpts: move dt props parsing to cpts driver
From: Grygorii Strashko @ 2016-11-29 15:54 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, Rob Herring, devicetree, Murali Karicheri,
	Wingman Kwok
In-Reply-To: <20161129101112.GG3110@localhost.localdomain>



On 11/29/2016 04:11 AM, Richard Cochran wrote:
> On Mon, Nov 28, 2016 at 05:03:32PM -0600, Grygorii Strashko wrote:
>> +static int cpts_of_parse(struct cpts *cpts, struct device_node *node)
>> +{
>> +	int ret = -EINVAL;
>> +	u32 prop;
>> +
>> +	if (of_property_read_u32(node, "cpts_clock_mult", &prop))
>> +		goto  of_error;
>> +	cpts->cc_mult = prop;
> 
> Why not set cc.mult here at the same time?

The same reason as in prev patch - cpts->cc_mult is original/initial mult
value loaded from DT (or calculated), while cc.mult is dynamic value
which can be changed as part of freq adjustment.

> 
>> +
>> +	if (of_property_read_u32(node, "cpts_clock_shift", &prop))
>> +		goto  of_error;
>> +	cpts->cc.shift = prop;
>> +
>> +	return 0;
>> +
>> +of_error:
>> +	dev_err(cpts->dev, "CPTS: Missing property in the DT.\n");
>> +	return ret;
>> +}
> 


-- 
regards,
-grygorii

^ permalink raw reply

* [PATCH net-next v5 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
From: David Ahern @ 2016-11-29 15:53 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern
In-Reply-To: <1480434813-3141-1-git-send-email-dsa@cumulusnetworks.com>

Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v5
- changed BPF_CGROUP_INET_SOCK to BPF_CGROUP_INET_SOCK_CREATE

v4
- added test_cgrp2_sock.sh for an automated test

v3
- revert to BPF_PROG_TYPE_CGROUP_SOCK prog type

v2
- removed bpf_sock_store_u32 references
- changed BPF_CGROUP_INET_SOCK_CREATE to BPF_CGROUP_INET_SOCK
- remove BPF_PROG_TYPE_CGROUP_SOCK prog type and add prog_subtype

 samples/bpf/Makefile           |  2 +
 samples/bpf/test_cgrp2_sock.c  | 83 ++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/test_cgrp2_sock.sh | 48 ++++++++++++++++++++++++
 3 files changed, 133 insertions(+)
 create mode 100644 samples/bpf/test_cgrp2_sock.c
 create mode 100755 samples/bpf/test_cgrp2_sock.sh

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 22b6407efa4f..3a404dd4bb46 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -23,6 +23,7 @@ hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
 hostprogs-y += test_cgrp2_attach
+hostprogs-y += test_cgrp2_sock
 hostprogs-y += xdp1
 hostprogs-y += xdp2
 hostprogs-y += test_current_task_under_cgroup
@@ -51,6 +52,7 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
 test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
+test_cgrp2_sock-objs := libbpf.o test_cgrp2_sock.o
 xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
diff --git a/samples/bpf/test_cgrp2_sock.c b/samples/bpf/test_cgrp2_sock.c
new file mode 100644
index 000000000000..2831e5f41f86
--- /dev/null
+++ b/samples/bpf/test_cgrp2_sock.c
@@ -0,0 +1,83 @@
+/* eBPF example program:
+ *
+ * - Loads eBPF program
+ *
+ *   The eBPF program sets the sk_bound_dev_if index in new AF_INET{6}
+ *   sockets opened by processes in the cgroup.
+ *
+ * - Attaches the new program to a cgroup using BPF_PROG_ATTACH
+ */
+
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stddef.h>
+#include <string.h>
+#include <unistd.h>
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/bpf.h>
+
+#include "libbpf.h"
+
+static int prog_load(int idx)
+{
+	struct bpf_insn prog[] = {
+		BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+		BPF_MOV64_IMM(BPF_REG_3, idx),
+		BPF_MOV64_IMM(BPF_REG_2, offsetof(struct bpf_sock, bound_dev_if)),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, offsetof(struct bpf_sock, bound_dev_if)),
+		BPF_MOV64_IMM(BPF_REG_0, 1), /* r0 = verdict */
+		BPF_EXIT_INSN(),
+	};
+
+	return bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, prog, sizeof(prog),
+			     "GPL", 0);
+}
+
+static int usage(const char *argv0)
+{
+	printf("Usage: %s cg-path device-index\n", argv0);
+	return EXIT_FAILURE;
+}
+
+int main(int argc, char **argv)
+{
+	int cg_fd, prog_fd, ret;
+	int idx = 0;
+
+	if (argc < 2)
+		return usage(argv[0]);
+
+	idx = atoi(argv[2]);
+	if (!idx) {
+		printf("Invalid device index\n");
+		return EXIT_FAILURE;
+	}
+
+	cg_fd = open(argv[1], O_DIRECTORY | O_RDONLY);
+	if (cg_fd < 0) {
+		printf("Failed to open cgroup path: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	prog_fd = prog_load(idx);
+	printf("Output from kernel verifier:\n%s\n-------\n", bpf_log_buf);
+
+	if (prog_fd < 0) {
+		printf("Failed to load prog: '%s'\n", strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
+	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK_CREATE);
+	if (ret < 0) {
+		printf("Failed to attach prog to cgroup: '%s'\n",
+		       strerror(errno));
+		return EXIT_FAILURE;
+	}
+
+	return EXIT_SUCCESS;
+}
diff --git a/samples/bpf/test_cgrp2_sock.sh b/samples/bpf/test_cgrp2_sock.sh
new file mode 100755
index 000000000000..35ce9e1566d8
--- /dev/null
+++ b/samples/bpf/test_cgrp2_sock.sh
@@ -0,0 +1,48 @@
+#!/bin/bash
+
+function config_device {
+	ip netns add at_ns0
+	ip link add veth0 type veth peer name veth0b
+	ip link set veth0b up
+	ip link set veth0 netns at_ns0
+	ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
+	ip netns exec at_ns0 ip addr add 2401:db00::1/64 dev veth0 nodad
+	ip netns exec at_ns0 ip link set dev veth0 up
+	ip link add foo type vrf table 1234
+	ip link set foo up
+	ip addr add 172.16.1.101/24 dev veth0b
+	ip addr add 2401:db00::2/64 dev veth0b nodad
+	ip link set veth0b master foo
+}
+
+function attach_bpf {
+	idx=$(ip -o li sh dev foo | awk -F':' '{print $1}')
+	rm -rf /tmp/cgroupv2
+	mkdir -p /tmp/cgroupv2
+	mount -t cgroup2 none /tmp/cgroupv2
+	mkdir -p /tmp/cgroupv2/foo
+	test_cgrp2_sock /tmp/cgroupv2/foo $idx
+	echo $$ >> /tmp/cgroupv2/foo/cgroup.procs
+}
+
+function cleanup {
+	set +ex
+	ip netns delete at_ns0
+	ip link del veth0
+	ip link del foo
+	umount /tmp/cgroupv2
+	rm -rf /tmp/cgroupv2
+	set -ex
+}
+
+function do_test {
+	ping -c1 -w1 172.16.1.100
+	ping6 -c1 -w1 2401:db00::1
+}
+
+cleanup 2>/dev/null
+config_device
+attach_bpf
+do_test
+cleanup
+echo "*** PASS ***"
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next v5 2/3] bpf: Add new cgroup attach type to enable sock modifications
From: David Ahern @ 2016-11-29 15:53 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern
In-Reply-To: <1480434813-3141-1-git-send-email-dsa@cumulusnetworks.com>

Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.

This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v5
- no change

v4
- dropped tweak to bpf_func signature
- dropped cg_sock_func_proto in favor of sk_filter_func_proto
- new __cgroup_bpf_run_filter_sk versus overloading __cgroup_bpf_run_filter
- reverted BPF_CGROUP_INET_SOCK to BPF_CGROUP_INET_SOCK_CREATE

v3
- reverted to new prog type BPF_PROG_TYPE_CGROUP_SOCK
- dropped the subtype

v2
- dropped the bpf_sock_store_u32 helper
- dropped the new prog type BPF_PROG_TYPE_CGROUP_SOCK
- moved valid access and context conversion to use subtype
- dropped CREATE from BPF_CGROUP_INET_SOCK and related function names
- moved running of filter from sk_alloc to inet{6}_create

 include/linux/bpf-cgroup.h | 14 +++++++++++
 include/uapi/linux/bpf.h   |  6 +++++
 kernel/bpf/cgroup.c        | 33 ++++++++++++++++++++++++++
 kernel/bpf/syscall.c       |  5 +++-
 net/core/filter.c          | 59 ++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/af_inet.c         | 12 +++++++++-
 net/ipv6/af_inet6.c        |  8 +++++++
 7 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 7f0fc635b13e..7de376e37c5c 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -41,6 +41,9 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 				struct sk_buff *skb,
 				enum bpf_attach_type type);
 
+int __cgroup_bpf_run_filter_sk(struct sock *sk,
+			       enum bpf_attach_type type);
+
 /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)			      \
 ({									      \
@@ -64,6 +67,16 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	__ret;								       \
 })
 
+#define BPF_CGROUP_RUN_PROG_INET_SOCK(sk)				       \
+({									       \
+	int __ret = 0;							       \
+	if (cgroup_bpf_enabled && sk) {					       \
+		__ret = __cgroup_bpf_run_filter_sk(sk,			       \
+						 BPF_CGROUP_INET_SOCK_CREATE); \
+	}								       \
+	__ret;								       \
+})
+
 #else
 
 struct cgroup_bpf {};
@@ -73,6 +86,7 @@ static inline void cgroup_bpf_inherit(struct cgroup *cgrp,
 
 #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; })
 #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; })
+#define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; })
 
 #endif /* CONFIG_CGROUP_BPF */
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1370a9d1456f..75964e00d947 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -101,11 +101,13 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_XDP,
 	BPF_PROG_TYPE_PERF_EVENT,
 	BPF_PROG_TYPE_CGROUP_SKB,
+	BPF_PROG_TYPE_CGROUP_SOCK,
 };
 
 enum bpf_attach_type {
 	BPF_CGROUP_INET_INGRESS,
 	BPF_CGROUP_INET_EGRESS,
+	BPF_CGROUP_INET_SOCK_CREATE,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -537,6 +539,10 @@ struct bpf_tunnel_key {
 	__u32 tunnel_label;
 };
 
+struct bpf_sock {
+	__u32 bound_dev_if;
+};
+
 /* User return codes for XDP prog type.
  * A valid XDP program must return one of these defined values. All other
  * return codes are reserved for future use. Unknown return codes will result
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 19892973a78a..fe1c9ad03a36 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -165,3 +165,36 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
 	return ret;
 }
 EXPORT_SYMBOL(__cgroup_bpf_run_filter_skb);
+
+/**
+ * __cgroup_bpf_run_filter_sk() - Run a program on a sock
+ * @sk: sock structure to manipulate
+ * @type: The type of program to be exectuted
+ *
+ * socket is passed is expected to be of type INET or INET6.
+ *
+ * The program type passed in via @type must be suitable for sock
+ * filtering. No further check is performed to assert that.
+ *
+ * This function will return %-EPERM if any if an attached program was found
+ * and if it returned != 1 during execution. In all other cases, 0 is returned.
+ */
+int __cgroup_bpf_run_filter_sk(struct sock *sk,
+			       enum bpf_attach_type type)
+{
+	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
+	struct bpf_prog *prog;
+	int ret = 0;
+
+
+	rcu_read_lock();
+
+	prog = rcu_dereference(cgrp->bpf.effective[type]);
+	if (prog)
+		ret = BPF_PROG_RUN(prog, sk) == 1 ? 0 : -EPERM;
+
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5518a6839ab1..85af86c496cd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -869,7 +869,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_INET_EGRESS:
 		ptype = BPF_PROG_TYPE_CGROUP_SKB;
 		break;
-
+	case BPF_CGROUP_INET_SOCK_CREATE:
+		ptype = BPF_PROG_TYPE_CGROUP_SOCK;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -905,6 +907,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
+	case BPF_CGROUP_INET_SOCK_CREATE:
 		cgrp = cgroup_get_from_fd(attr->target_fd);
 		if (IS_ERR(cgrp))
 			return PTR_ERR(cgrp);
diff --git a/net/core/filter.c b/net/core/filter.c
index 698a262b8ebb..404aaa1bfa1f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2676,6 +2676,29 @@ static bool sk_filter_is_valid_access(int off, int size,
 	return __is_valid_access(off, size, type);
 }
 
+static bool sock_filter_is_valid_access(int off, int size,
+					enum bpf_access_type type,
+					enum bpf_reg_type *reg_type)
+{
+	if (type == BPF_WRITE) {
+		switch (off) {
+		case offsetof(struct bpf_sock, bound_dev_if):
+			break;
+		default:
+			return false;
+		}
+	}
+
+	if (off < 0 || off + size > sizeof(struct bpf_sock))
+		return false;
+
+	/* The verifier guarantees that size > 0. */
+	if (off % size != 0)
+		return false;
+
+	return true;
+}
+
 static int tc_cls_act_prologue(struct bpf_insn *insn_buf, bool direct_write,
 			       const struct bpf_prog *prog)
 {
@@ -2934,6 +2957,30 @@ static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 	return insn - insn_buf;
 }
 
+static u32 sock_filter_convert_ctx_access(enum bpf_access_type type,
+					  int dst_reg, int src_reg,
+					  int ctx_off,
+					  struct bpf_insn *insn_buf,
+					  struct bpf_prog *prog)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (ctx_off) {
+	case offsetof(struct bpf_sock, bound_dev_if):
+		BUILD_BUG_ON(FIELD_SIZEOF(struct sock, sk_bound_dev_if) != 4);
+
+		if (type == BPF_WRITE)
+			*insn++ = BPF_STX_MEM(BPF_W, dst_reg, src_reg,
+					offsetof(struct sock, sk_bound_dev_if));
+		else
+			*insn++ = BPF_LDX_MEM(BPF_W, dst_reg, src_reg,
+				      offsetof(struct sock, sk_bound_dev_if));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
 static u32 tc_cls_act_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 					 int src_reg, int ctx_off,
 					 struct bpf_insn *insn_buf,
@@ -3007,6 +3054,12 @@ static const struct bpf_verifier_ops cg_skb_ops = {
 	.convert_ctx_access	= sk_filter_convert_ctx_access,
 };
 
+static const struct bpf_verifier_ops cg_sock_ops = {
+	.get_func_proto		= sk_filter_func_proto,
+	.is_valid_access	= sock_filter_is_valid_access,
+	.convert_ctx_access	= sock_filter_convert_ctx_access,
+};
+
 static struct bpf_prog_type_list sk_filter_type __read_mostly = {
 	.ops	= &sk_filter_ops,
 	.type	= BPF_PROG_TYPE_SOCKET_FILTER,
@@ -3032,6 +3085,11 @@ static struct bpf_prog_type_list cg_skb_type __read_mostly = {
 	.type	= BPF_PROG_TYPE_CGROUP_SKB,
 };
 
+static struct bpf_prog_type_list cg_sock_type __read_mostly = {
+	.ops	= &cg_sock_ops,
+	.type	= BPF_PROG_TYPE_CGROUP_SOCK
+};
+
 static int __init register_sk_filter_ops(void)
 {
 	bpf_register_prog_type(&sk_filter_type);
@@ -3039,6 +3097,7 @@ static int __init register_sk_filter_ops(void)
 	bpf_register_prog_type(&sched_act_type);
 	bpf_register_prog_type(&xdp_type);
 	bpf_register_prog_type(&cg_skb_type);
+	bpf_register_prog_type(&cg_sock_type);
 
 	return 0;
 }
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5ddf5cda07f4..24d2550492ee 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -374,8 +374,18 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
 
 	if (sk->sk_prot->init) {
 		err = sk->sk_prot->init(sk);
-		if (err)
+		if (err) {
+			sk_common_release(sk);
+			goto out;
+		}
+	}
+
+	if (!kern) {
+		err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
+		if (err) {
 			sk_common_release(sk);
+			goto out;
+		}
 	}
 out:
 	return err;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index d424f3a3737a..237e654ba717 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -258,6 +258,14 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
 			goto out;
 		}
 	}
+
+	if (!kern) {
+		err = BPF_CGROUP_RUN_PROG_INET_SOCK(sk);
+		if (err) {
+			sk_common_release(sk);
+			goto out;
+		}
+	}
 out:
 	return err;
 out_rcu_unlock:
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next v5 1/3] bpf: Refactor cgroups code in prep for new type
From: David Ahern @ 2016-11-29 15:53 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern
In-Reply-To: <1480434813-3141-1-git-send-email-dsa@cumulusnetworks.com>

Code move and rename only; no functional change intended.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
v5
- no change

v4
- dropped refactor of __cgroup_bpf_run_filter and renamed it
  to __cgroup_bpf_run_filter_skb

v3
- dropped the rename

v2
- fix bpf_prog_run_clear_cb to bpf_prog_run_save_cb as caught by Daniel

- rename BPF_PROG_TYPE_CGROUP_SKB and its cg_skb functions to
  BPF_PROG_TYPE_CGROUP and cgroup

 include/linux/bpf-cgroup.h | 46 +++++++++++++++++++++++-----------------------
 kernel/bpf/cgroup.c        | 10 +++++-----
 kernel/bpf/syscall.c       | 28 +++++++++++++++-------------
 3 files changed, 43 insertions(+), 41 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index ec80d0c0953e..7f0fc635b13e 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -37,31 +37,31 @@ void cgroup_bpf_update(struct cgroup *cgrp,
 		       struct bpf_prog *prog,
 		       enum bpf_attach_type type);
 
-int __cgroup_bpf_run_filter(struct sock *sk,
-			    struct sk_buff *skb,
-			    enum bpf_attach_type type);
-
-/* Wrappers for __cgroup_bpf_run_filter() guarded by cgroup_bpf_enabled. */
-#define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb)			\
-({									\
-	int __ret = 0;							\
-	if (cgroup_bpf_enabled)						\
-		__ret = __cgroup_bpf_run_filter(sk, skb,		\
-						BPF_CGROUP_INET_INGRESS); \
-									\
-	__ret;								\
+int __cgroup_bpf_run_filter_skb(struct sock *sk,
+				struct sk_buff *skb,
+				enum bpf_attach_type type);
+
+/* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
+#define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)			      \
+({									      \
+	int __ret = 0;							      \
+	if (cgroup_bpf_enabled)						      \
+		__ret = __cgroup_bpf_run_filter_skb(sk, skb,		      \
+						    BPF_CGROUP_INET_INGRESS); \
+									      \
+	__ret;								      \
 })
 
-#define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb)				\
-({									\
-	int __ret = 0;							\
-	if (cgroup_bpf_enabled && sk && sk == skb->sk) {		\
-		typeof(sk) __sk = sk_to_full_sk(sk);			\
-		if (sk_fullsock(__sk))					\
-			__ret = __cgroup_bpf_run_filter(__sk, skb,	\
-						BPF_CGROUP_INET_EGRESS); \
-	}								\
-	__ret;								\
+#define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb)			       \
+({									       \
+	int __ret = 0;							       \
+	if (cgroup_bpf_enabled && sk && sk == skb->sk) {		       \
+		typeof(sk) __sk = sk_to_full_sk(sk);			       \
+		if (sk_fullsock(__sk))					       \
+			__ret = __cgroup_bpf_run_filter_skb(__sk, skb,	       \
+						      BPF_CGROUP_INET_EGRESS); \
+	}								       \
+	__ret;								       \
 })
 
 #else
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a0ab43f264b0..19892973a78a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -118,7 +118,7 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
 }
 
 /**
- * __cgroup_bpf_run_filter() - Run a program for packet filtering
+ * __cgroup_bpf_run_filter_skb() - Run a program for packet filtering
  * @sk: The socken sending or receiving traffic
  * @skb: The skb that is being sent or received
  * @type: The type of program to be exectuted
@@ -132,9 +132,9 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
  * This function will return %-EPERM if any if an attached program was found
  * and if it returned != 1 during execution. In all other cases, 0 is returned.
  */
-int __cgroup_bpf_run_filter(struct sock *sk,
-			    struct sk_buff *skb,
-			    enum bpf_attach_type type)
+int __cgroup_bpf_run_filter_skb(struct sock *sk,
+				struct sk_buff *skb,
+				enum bpf_attach_type type)
 {
 	struct bpf_prog *prog;
 	struct cgroup *cgrp;
@@ -164,4 +164,4 @@ int __cgroup_bpf_run_filter(struct sock *sk,
 
 	return ret;
 }
-EXPORT_SYMBOL(__cgroup_bpf_run_filter);
+EXPORT_SYMBOL(__cgroup_bpf_run_filter_skb);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4caa18e6860a..5518a6839ab1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -856,6 +856,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 {
 	struct bpf_prog *prog;
 	struct cgroup *cgrp;
+	enum bpf_prog_type ptype;
 
 	if (!capable(CAP_NET_ADMIN))
 		return -EPERM;
@@ -866,25 +867,26 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	switch (attr->attach_type) {
 	case BPF_CGROUP_INET_INGRESS:
 	case BPF_CGROUP_INET_EGRESS:
-		prog = bpf_prog_get_type(attr->attach_bpf_fd,
-					 BPF_PROG_TYPE_CGROUP_SKB);
-		if (IS_ERR(prog))
-			return PTR_ERR(prog);
-
-		cgrp = cgroup_get_from_fd(attr->target_fd);
-		if (IS_ERR(cgrp)) {
-			bpf_prog_put(prog);
-			return PTR_ERR(cgrp);
-		}
-
-		cgroup_bpf_update(cgrp, prog, attr->attach_type);
-		cgroup_put(cgrp);
+		ptype = BPF_PROG_TYPE_CGROUP_SKB;
 		break;
 
 	default:
 		return -EINVAL;
 	}
 
+	prog = bpf_prog_get_type(attr->attach_bpf_fd, ptype);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	cgrp = cgroup_get_from_fd(attr->target_fd);
+	if (IS_ERR(cgrp)) {
+		bpf_prog_put(prog);
+		return PTR_ERR(cgrp);
+	}
+
+	cgroup_bpf_update(cgrp, prog, attr->attach_type);
+	cgroup_put(cgrp);
+
 	return 0;
 }
 
-- 
2.1.4

^ permalink raw reply related

* [PATCH net-next v5 0/3] net: Add bpf support to set sk_bound_dev_if
From: David Ahern @ 2016-11-29 15:53 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf, David Ahern

The recently added VRF support in Linux leverages the bind-to-device
API for programs to specify an L3 domain for a socket. While
SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable
program has support for it. Even for those programs that do support it,
the API requires processes to be started as root (CAP_NET_RAW) which
is not desirable from a general security perspective.

This patch set leverages Daniel Mack's work to attach bpf programs to
a cgroup to provide a capability to set sk_bound_dev_if for all
AF_INET{6} sockets opened by a process in a cgroup when the sockets
are allocated.

For example:
 1. configure vrf (e.g., using ifupdown2)
        auto eth0
        iface eth0 inet dhcp
            vrf mgmt

        auto mgmt
        iface mgmt
            vrf-table auto

 2. configure cgroup
        mount -t cgroup2 none /tmp/cgroupv2
        mkdir /tmp/cgroupv2/mgmt
        test_cgrp2_sock /tmp/cgroupv2/mgmt 15

 3. set shell into cgroup (e.g., can be done at login using pam)
        echo $$ >> /tmp/cgroupv2/mgmt/cgroup.procs

At this point all commands run in the shell (e.g, apt) have sockets
automatically bound to the VRF (see output of ss -ap 'dev == <vrf>'),
including processes not running as root.

This capability enables running any program in a VRF context and is key
to deploying Management VRF, a fundamental configuration for networking
gear, with any Linux OS installation.

David Ahern (3):
  bpf: Refactor cgroups code in prep for new type
  bpf: Add new cgroup attach type to enable sock modifications
  samples: bpf: add userspace example for modifying sk_bound_dev_if

 include/linux/bpf-cgroup.h     | 60 ++++++++++++++++++------------
 include/uapi/linux/bpf.h       |  6 +++
 kernel/bpf/cgroup.c            | 43 +++++++++++++++++++---
 kernel/bpf/syscall.c           | 33 ++++++++++-------
 net/core/filter.c              | 59 ++++++++++++++++++++++++++++++
 net/ipv4/af_inet.c             | 12 +++++-
 net/ipv6/af_inet6.c            |  8 ++++
 samples/bpf/Makefile           |  2 +
 samples/bpf/test_cgrp2_sock.c  | 83 ++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/test_cgrp2_sock.sh | 48 ++++++++++++++++++++++++
 10 files changed, 311 insertions(+), 43 deletions(-)
 create mode 100644 samples/bpf/test_cgrp2_sock.c
 create mode 100755 samples/bpf/test_cgrp2_sock.sh

-- 
2.1.4

^ permalink raw reply

* Re: [PATCH net-next v4 3/3] samples: bpf: add userspace example for modifying sk_bound_dev_if
From: David Ahern @ 2016-11-29 15:52 UTC (permalink / raw)
  To: netdev; +Cc: daniel, ast, daniel, maheshb, tgraf
In-Reply-To: <1480394329-24847-4-git-send-email-dsa@cumulusnetworks.com>

On 11/28/16 9:38 PM, David Ahern wrote:
> +	ret = bpf_prog_detach(cg_fd, BPF_CGROUP_INET_SOCK);
> +	ret = bpf_prog_attach(prog_fd, cg_fd, BPF_CGROUP_INET_SOCK);

forgot to update this to BPF_CGROUP_INET_SOCK_CREATE.

will send v5

^ permalink raw reply

* [PATCH iproute2 net-next 4/4] tc: flower: make use of flower_port_attr_type() safe and silent
From: Simon Horman @ 2016-11-29 15:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Simon Horman
In-Reply-To: <1480434693-29397-1-git-send-email-simon.horman@netronome.com>

Make use of flower_port_attr_type() safe:
* flower_port_attr_type() may return a valid index into tb[] or -1.
  Only access tb[] in the case of the former.
* Do not access null entries in tb[]

Also make usage silent - it is valid for ip_proto to be invalid,
for example if it is not specified as part of the filter.

Fixes: ("tc: flower: Support matching on SCTP ports")
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
 tc/f_flower.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index bca910d95729..21655f950386 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -159,19 +159,17 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
 
 static int flower_port_attr_type(__u8 ip_proto, bool is_src)
 {
-	if (ip_proto == IPPROTO_TCP) {
+	if (ip_proto == IPPROTO_TCP)
 		return is_src ? TCA_FLOWER_KEY_TCP_SRC :
 			TCA_FLOWER_KEY_TCP_DST;
-	} else if (ip_proto == IPPROTO_UDP) {
+	else if (ip_proto == IPPROTO_UDP)
 		return is_src ? TCA_FLOWER_KEY_UDP_SRC :
 			TCA_FLOWER_KEY_UDP_DST;
-	} else if (ip_proto == IPPROTO_SCTP) {
+	else if (ip_proto == IPPROTO_SCTP)
 		return is_src ? TCA_FLOWER_KEY_SCTP_SRC :
 			TCA_FLOWER_KEY_SCTP_DST;
-	} else {
-		fprintf(stderr, "Illegal \"ip_proto\" for port\n");
+	else
 		return -1;
-	}
 }
 
 static int flower_parse_port(char *str, __u8 ip_proto, bool is_src,
@@ -505,7 +503,8 @@ static void flower_print_ip_addr(FILE *f, char *name, __be16 eth_type,
 
 static void flower_print_port(FILE *f, char *name, struct rtattr *attr)
 {
-	fprintf(f, "\n  %s %d", name, ntohs(rta_getattr_u16(attr)));
+	if (attr)
+		fprintf(f, "\n  %s %d", name, ntohs(rta_getattr_u16(attr)));
 }
 
 static int flower_print_opt(struct filter_util *qu, FILE *f,
@@ -514,6 +513,7 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
 	struct rtattr *tb[TCA_FLOWER_MAX + 1];
 	__be16 eth_type = 0;
 	__u8 ip_proto = 0xff;
+	int nl_type;
 
 	if (!opt)
 		return 0;
@@ -568,10 +568,12 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
 			     tb[TCA_FLOWER_KEY_IPV6_SRC],
 			     tb[TCA_FLOWER_KEY_IPV6_SRC_MASK]);
 
-	flower_print_port(f, "dst_port",
-			  tb[flower_port_attr_type(ip_proto, false)]);
-	flower_print_port(f, "src_port",
-			  tb[flower_port_attr_type(ip_proto, true)]);
+	nl_type = flower_port_attr_type(ip_proto, false);
+	if (nl_type >= 0)
+		flower_print_port(f, "dst_port", tb[nl_type]);
+	nl_type = flower_port_attr_type(ip_proto, true);
+	if (nl_type >= 0)
+		flower_print_port(f, "src_port", tb[nl_type]);
 
 	if (tb[TCA_FLOWER_FLAGS])  {
 		__u32 flags = rta_getattr_u32(tb[TCA_FLOWER_FLAGS]);
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related

* [PATCH iproute2 net-next 2/4] tc: flower: document SCTP ip_proto
From: Simon Horman @ 2016-11-29 15:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Simon Horman
In-Reply-To: <1480434693-29397-1-git-send-email-simon.horman@netronome.com>

Add SCTP ip_proto to help text and man page.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
This a follow-up to ("tc: flower: Support matching on SCTP ports")
---
 man/man8/tc-flower.8 | 14 +++++++-------
 tc/f_flower.c        |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 56db42f983c1..a401293fed50 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -29,7 +29,7 @@ flower \- flow based traffic control filter
 .IR PRIORITY " | "
 .BR vlan_eth_type " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
-.BR ip_proto " { " tcp " | " udp " | "
+.BR ip_proto " { " tcp " | " udp " | " sctp " | "
 .IR IP_PROTO " } | { "
 .BR dst_ip " | " src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | { "
@@ -93,8 +93,8 @@ or an unsigned 16bit value in hexadecimal format.
 .BI ip_proto " IP_PROTO"
 Match on layer four protocol.
 .I IP_PROTO
-may be either
-.BR tcp , udp
+may be
+.BR tcp ", " udp ", " sctp
 or an unsigned 8bit value in hexadecimal format.
 .TP
 .BI dst_ip " ADDRESS"
@@ -110,8 +110,8 @@ option of tc filter.
 .TQ
 .BI src_port " NUMBER"
 Match on layer 4 protocol source or destination port number. Only available for
-.BR ip_proto " values " udp " and " tcp ,
-which has to be specified in beforehand.
+.BR ip_proto " values " udp ", " tcp  " and " sctp
+which have to be specified in beforehand.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches
@@ -125,8 +125,8 @@ and finally layer four matches
 (\fBdst_port\fR and \fBsrc_port\fR)
 depend on
 .B ip_proto
-being set to either
-.BR tcp " or " udp .
+being set to
+.BR tcp ", " udp " or " sctp.
 .P
 There can be only used one mask per one prio. If user needs to specify different
 mask, he has to use different prio.
diff --git a/tc/f_flower.c b/tc/f_flower.c
index d1694623863d..c74a0244e244 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -36,7 +36,7 @@ static void explain(void)
 	fprintf(stderr, "                       vlan_ethtype [ ipv4 | ipv6 | ETH-TYPE ] |\n");
 	fprintf(stderr, "                       dst_mac MAC-ADDR |\n");
 	fprintf(stderr, "                       src_mac MAC-ADDR |\n");
-	fprintf(stderr, "                       ip_proto [tcp | udp | IP-PROTO ] |\n");
+	fprintf(stderr, "                       ip_proto [ tcp | udp | sctp | IP-PROTO ] |\n");
 	fprintf(stderr, "                       dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
 	fprintf(stderr, "                       src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
 	fprintf(stderr, "                       dst_port PORT-NUMBER |\n");
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related

* [PATCH iproute2 net-next 3/4] tc: flower: correct name of ip_proto parameter to flower_parse_port()
From: Simon Horman @ 2016-11-29 15:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Simon Horman
In-Reply-To: <1480434693-29397-1-git-send-email-simon.horman@netronome.com>

This corrects a typo.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
This a follow-up to ("tc: flower: Support matching on SCTP ports")
---
 tc/f_flower.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index c74a0244e244..bca910d95729 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -157,15 +157,15 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
 	return 0;
 }
 
-static int flower_port_attr_type(__u8 ip_port, bool is_src)
+static int flower_port_attr_type(__u8 ip_proto, bool is_src)
 {
-	if (ip_port == IPPROTO_TCP) {
+	if (ip_proto == IPPROTO_TCP) {
 		return is_src ? TCA_FLOWER_KEY_TCP_SRC :
 			TCA_FLOWER_KEY_TCP_DST;
-	} else if (ip_port == IPPROTO_UDP) {
+	} else if (ip_proto == IPPROTO_UDP) {
 		return is_src ? TCA_FLOWER_KEY_UDP_SRC :
 			TCA_FLOWER_KEY_UDP_DST;
-	} else if (ip_port == IPPROTO_SCTP) {
+	} else if (ip_proto == IPPROTO_SCTP) {
 		return is_src ? TCA_FLOWER_KEY_SCTP_SRC :
 			TCA_FLOWER_KEY_SCTP_DST;
 	} else {
@@ -174,14 +174,14 @@ static int flower_port_attr_type(__u8 ip_port, bool is_src)
 	}
 }
 
-static int flower_parse_port(char *str, __u8 ip_port, bool is_src,
+static int flower_parse_port(char *str, __u8 ip_proto, bool is_src,
 			     struct nlmsghdr *n)
 {
 	int ret;
 	int type;
 	__be16 port;
 
-	type = flower_port_attr_type(ip_port, is_src);
+	type = flower_port_attr_type(ip_proto, is_src);
 	if (type < 0)
 		return -1;
 
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related

* [PATCH iproute2 net-next 1/4] tc: flower: remove references to eth_type in manpage
From: Simon Horman @ 2016-11-29 15:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Simon Horman, Paul Blakey
In-Reply-To: <1480434693-29397-1-git-send-email-simon.horman@netronome.com>

Remove references to eth_type and ether_type (spelling error) in
the tc flower manpage.

Also correct formatting of boldface text with whitespace.

Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
This is a follow-up to "tc: flower: Fix usage message"
---
 man/man8/tc-flower.8 | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 16ef261797ab..56db42f983c1 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -103,8 +103,8 @@ or an unsigned 8bit value in hexadecimal format.
 Match on source or destination IP address.
 .I ADDRESS
 must be a valid IPv4 or IPv6 address, depending on
-.BR ether_type ,
-which has to be specified in beforehand.
+.BR protocol
+option of tc filter.
 .TP
 .BI dst_port " NUMBER"
 .TQ
@@ -114,16 +114,15 @@ Match on layer 4 protocol source or destination port number. Only available for
 which has to be specified in beforehand.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
-on the matches of the next lower layer. Precisely, layer one and two matches (
-.BR indev , dst_mac , src_mac " and " eth_type )
-have no dependency, layer three matches (
-.BR ip_proto , dst_ip " and " src_ip )
-require
-.B eth_type
-being set to either
-.BR ipv4 " or " ipv6 ,
-and finally layer four matches (
-.BR dst_port " and " src_port )
+on the matches of the next lower layer. Precisely, layer one and two matches
+(\fBindev\fR,  \fBdst_mac\fR and \fBsrc_mac\fR)
+have no dependency, layer three matches
+(\fBip_proto\fR, \fBdst_ip\fR and \fBsrc_ip\fR)
+depend on the
+.B protocol
+option of tc filter
+and finally layer four matches
+(\fBdst_port\fR and \fBsrc_port\fR)
 depend on
 .B ip_proto
 being set to either
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related

* [PATCH iproute2 net-next 0/4] tc: flower: SCTP and other port fixes
From: Simon Horman @ 2016-11-29 15:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Simon Horman

Hi,

this short series:

* Makes some improvements to the documentation of flower;
  A follow-up to recent work by Paul Blakey and myself.
* Corrects some errors introduced when SCTP port matching support was
  recently added.

It applies on top of the following:
* [PATCH iproute2] tc: flower: Fix usage message
* [PATCH iproute2 0/2] tc: flower: Support matching on SCTP port

Simon Horman (4):
  tc: flower: remove references to eth_type in manpage
  tc: flower: document SCTP ip_proto
  tc: flower: correct name of ip_proto parameter to flower_parse_port()
  tc: flower: make use of flower_port_attr_type() safe and silent

 man/man8/tc-flower.8 | 37 ++++++++++++++++++-------------------
 tc/f_flower.c        | 32 +++++++++++++++++---------------
 2 files changed, 35 insertions(+), 34 deletions(-)

-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply

* Re: [PATCH v2 07/13] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Grygorii Strashko @ 2016-11-29 15:50 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA, Mugunthan V N,
	Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA, Rob Herring,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Murali Karicheri, Wingman Kwok
In-Reply-To: <20161129100737.GF3110-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>

Hi Richard,

On 11/29/2016 04:07 AM, Richard Cochran wrote:
> On Mon, Nov 28, 2016 at 05:03:31PM -0600, Grygorii Strashko wrote:
>> +int cpts_register(struct cpts *cpts)
>>  {
>>  	int err, i;
>>  
>> -	cpts->info = cpts_info;
>> -	spin_lock_init(&cpts->lock);
>> -
>> -	cpts->cc.read = cpts_systim_read;
>> -	cpts->cc.mask = CLOCKSOURCE_MASK(32);
>> -	cpts->cc_mult = mult;
>> -	cpts->cc.mult = mult;
>> -	cpts->cc.shift = shift;
>> -
>>  	INIT_LIST_HEAD(&cpts->events);
>>  	INIT_LIST_HEAD(&cpts->pool);
>>  	for (i = 0; i < CPTS_MAX_EVENTS; i++)
>>  		list_add(&cpts->pool_data[i].list, &cpts->pool);
>>  
>> -	cpts_clk_init(dev, cpts);
>> +	clk_enable(cpts->refclk);
>> +
>>  	cpts_write32(cpts, CPTS_EN, control);
>>  	cpts_write32(cpts, TS_PEND_EN, int_enable);
>>  
>> +	cpts->cc.mult = cpts->cc_mult;
> 
> It is not clear why you set cc.mult in a different place than
> cc.shift.  That isn't logical, but maybe later patches make it
> clear...

cc.mult has to be reloaded to original value each time CPTS is registered(restarted)
as it can be modified by cpts_ptp_adjfreq().

While cc.shift is static.



> 
>>  	timecounter_init(&cpts->tc, &cpts->cc, ktime_to_ns(ktime_get_real()));
>>  

[...]

>>  }
>>  EXPORT_SYMBOL_GPL(cpts_unregister);
>>  
>> +struct cpts *cpts_create(struct device *dev, void __iomem *regs,
>> +			 u32 mult, u32 shift)
>> +{
>> +	struct cpts *cpts;
>> +
>> +	if (!regs || !dev)
>> +		return ERR_PTR(-EINVAL);
> 
> There is no need for this test, as the caller will always pass valid
> pointers.  (This isn't a user space library!)
> 
ok

-- 
regards,
-grygorii
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v2 net-next 10/11] qede: Add basic XDP support
From: Daniel Borkmann @ 2016-11-29 15:48 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: davem, netdev, alexei.starovoitov
In-Reply-To: <1480430830-17671-11-git-send-email-Yuval.Mintz@cavium.com>

On 11/29/2016 03:47 PM, Yuval Mintz wrote:
> Add support for the ndo_xdp callback. This patch would support XDP_PASS,
> XDP_DROP and XDP_ABORTED commands.
>
> This also adds a per Rx queue statistic which counts number of packets
> which didn't reach the stack [due to XDP].
>
> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
[...]
> @@ -1560,6 +1593,7 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
>   			       struct qede_fastpath *fp,
>   			       struct qede_rx_queue *rxq)
>   {
> +	struct bpf_prog *xdp_prog = READ_ONCE(rxq->xdp_prog);
>   	struct eth_fast_path_rx_reg_cqe *fp_cqe;
>   	u16 len, pad, bd_cons_idx, parse_flag;
>   	enum eth_rx_cqe_type cqe_type;
> @@ -1596,6 +1630,11 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
>   	len = le16_to_cpu(fp_cqe->len_on_first_bd);
>   	pad = fp_cqe->placement_offset;
>
> +	/* Run eBPF program if one is attached */
> +	if (xdp_prog)
> +		if (!qede_rx_xdp(edev, fp, rxq, xdp_prog, bd, fp_cqe))
> +			return 1;
> +

You also need to wrap this under rcu_read_lock() (at least I haven't seen
it in your patches) for same reasons as stated in 326fe02d1ed6 ("net/mlx4_en:
protect ring->xdp_prog with rcu_read_lock"), as otherwise xdp_prog could
disappear underneath you. mlx4 and nfp does it correctly, looks like mlx5
doesn't.

^ permalink raw reply

* [PATCH net-next 1/2] tcp: randomize tcp timestamp offsets for each connection
From: Florian Westphal @ 2016-11-29 15:45 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal, Mirja Kühlewind

jiffies based timestamps allow for easy inference of number of devices
behind NAT translators and also makes tracking of hosts simpler.

commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset")
added the main infrastructure that is needed for per-connection ts
randomization, in particular writing/reading the on-wire tcp header
format takes the offset into account so rest of stack can use normal
tcp_time_stamp (jiffies).

So only two items are left:
 - add a tsoffset for request sockets
 - extend the tcp isn generator to also return another 32bit number
   in addition to the ISN.

Re-use of ISN generator also means timestamps are still monotonically
increasing for same connection quadruple, i.e. PAWS will still work.

Includes fixes from Eric Dumazet.

Cc: Mirja Kühlewind <mirja.kuehlewind@tik.ee.ethz.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/tcp.h      |  1 +
 include/net/secure_seq.h |  8 ++++----
 include/net/tcp.h        |  2 +-
 net/core/secure_seq.c    | 10 ++++++----
 net/ipv4/syncookies.c    |  1 +
 net/ipv4/tcp_input.c     |  7 ++++++-
 net/ipv4/tcp_ipv4.c      |  9 +++++----
 net/ipv4/tcp_minisocks.c |  4 +++-
 net/ipv4/tcp_output.c    |  2 +-
 net/ipv6/syncookies.c    |  1 +
 net/ipv6/tcp_ipv6.c      | 10 ++++++----
 11 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 32a7c7e35b71..2408bcc579f1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -123,6 +123,7 @@ struct tcp_request_sock {
 	u32				txhash;
 	u32				rcv_isn;
 	u32				snt_isn;
+	u32				ts_off;
 	u32				last_oow_ack_time; /* last SYNACK */
 	u32				rcv_nxt; /* the ack # by SYNACK. For
 						  * FastOpen it's the seq#
diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index 3f36d45b714a..0caee631a836 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -6,10 +6,10 @@
 u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
 			       __be16 dport);
-__u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
-				 __be16 sport, __be16 dport);
-__u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
-				   __be16 sport, __be16 dport);
+u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
+			       __be16 sport, __be16 dport, u32 *tsoff);
+u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
+				 __be16 sport, __be16 dport, u32 *tsoff);
 u64 secure_dccp_sequence_number(__be32 saddr, __be32 daddr,
 				__be16 sport, __be16 dport);
 u64 secure_dccpv6_sequence_number(__be32 *saddr, __be32 *daddr,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de80739adab..1c09d909bd43 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1809,7 +1809,7 @@ struct tcp_request_sock_ops {
 	struct dst_entry *(*route_req)(const struct sock *sk, struct flowi *fl,
 				       const struct request_sock *req,
 				       bool *strict);
-	__u32 (*init_seq)(const struct sk_buff *skb);
+	__u32 (*init_seq)(const struct sk_buff *skb, u32 *tsoff);
 	int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
 			   struct flowi *fl, struct request_sock *req,
 			   struct tcp_fastopen_cookie *foc,
diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index fd3ce461fbe6..a8d6062cbb4a 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -40,8 +40,8 @@ static u32 seq_scale(u32 seq)
 #endif
 
 #if IS_ENABLED(CONFIG_IPV6)
-__u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
-				   __be16 sport, __be16 dport)
+u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
+				 __be16 sport, __be16 dport, u32 *tsoff)
 {
 	u32 secret[MD5_MESSAGE_BYTES / 4];
 	u32 hash[MD5_DIGEST_WORDS];
@@ -58,6 +58,7 @@ __u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 
 	md5_transform(hash, secret);
 
+	*tsoff = hash[1];
 	return seq_scale(hash[0]);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
@@ -86,8 +87,8 @@ EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 
 #ifdef CONFIG_INET
 
-__u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
-				 __be16 sport, __be16 dport)
+u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
+			       __be16 sport, __be16 dport, u32 *tsoff)
 {
 	u32 hash[MD5_DIGEST_WORDS];
 
@@ -99,6 +100,7 @@ __u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
 
 	md5_transform(hash, net_secret);
 
+	*tsoff = hash[1];
 	return seq_scale(hash[0]);
 }
 
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 0dc6286272aa..3e88467d70ee 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -334,6 +334,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
 	treq = tcp_rsk(req);
 	treq->rcv_isn		= ntohl(th->seq) - 1;
 	treq->snt_isn		= cookie;
+	treq->ts_off		= 0;
 	req->mss		= mss;
 	ireq->ir_num		= ntohs(th->dest);
 	ireq->ir_rmt_port	= th->source;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 22e6a2097ff6..1b1921c71f7c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6301,6 +6301,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		goto drop;
 
 	tcp_rsk(req)->af_specific = af_ops;
+	tcp_rsk(req)->ts_off = 0;
 
 	tcp_clear_options(&tmp_opt);
 	tmp_opt.mss_clamp = af_ops->mss_clamp;
@@ -6322,6 +6323,9 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	if (security_inet_conn_request(sk, skb, req))
 		goto drop_and_free;
 
+	if (isn && tmp_opt.tstamp_ok)
+		af_ops->init_seq(skb, &tcp_rsk(req)->ts_off);
+
 	if (!want_cookie && !isn) {
 		/* VJ's idea. We save last timestamp seen
 		 * from the destination in peer table, when entering
@@ -6362,7 +6366,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 			goto drop_and_release;
 		}
 
-		isn = af_ops->init_seq(skb);
+		isn = af_ops->init_seq(skb, &tcp_rsk(req)->ts_off);
 	}
 	if (!dst) {
 		dst = af_ops->route_req(sk, &fl, req, NULL);
@@ -6374,6 +6378,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 
 	if (want_cookie) {
 		isn = cookie_init_sequence(af_ops, sk, skb, &req->mss);
+		tcp_rsk(req)->ts_off = 0;
 		req->cookie_ts = tmp_opt.tstamp_ok;
 		if (!tmp_opt.tstamp_ok)
 			inet_rsk(req)->ecn_ok = 0;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5555eb86e549..b50f05905ced 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -95,12 +95,12 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 struct inet_hashinfo tcp_hashinfo;
 EXPORT_SYMBOL(tcp_hashinfo);
 
-static  __u32 tcp_v4_init_sequence(const struct sk_buff *skb)
+static u32 tcp_v4_init_sequence(const struct sk_buff *skb, u32 *tsoff)
 {
 	return secure_tcp_sequence_number(ip_hdr(skb)->daddr,
 					  ip_hdr(skb)->saddr,
 					  tcp_hdr(skb)->dest,
-					  tcp_hdr(skb)->source);
+					  tcp_hdr(skb)->source, tsoff);
 }
 
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp)
@@ -237,7 +237,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 		tp->write_seq = secure_tcp_sequence_number(inet->inet_saddr,
 							   inet->inet_daddr,
 							   inet->inet_sport,
-							   usin->sin_port);
+							   usin->sin_port,
+							   &tp->tsoffset);
 
 	inet->inet_id = tp->write_seq ^ jiffies;
 
@@ -824,7 +825,7 @@ static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 	tcp_v4_send_ack(sk, skb, seq,
 			tcp_rsk(req)->rcv_nxt,
 			req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
-			tcp_time_stamp,
+			tcp_time_stamp + tcp_rsk(req)->ts_off,
 			req->ts_recent,
 			0,
 			tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->daddr,
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 6234ebaa7db1..28ce5ee831f5 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -532,7 +532,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 			newtp->rx_opt.ts_recent_stamp = 0;
 			newtp->tcp_header_len = sizeof(struct tcphdr);
 		}
-		newtp->tsoffset = 0;
+		newtp->tsoffset = treq->ts_off;
 #ifdef CONFIG_TCP_MD5SIG
 		newtp->md5sig_info = NULL;	/*XXX*/
 		if (newtp->af_specific->md5_lookup(sk, newsk))
@@ -581,6 +581,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent = req->ts_recent;
+			if (tmp_opt.rcv_tsecr)
+				tmp_opt.rcv_tsecr -= tcp_rsk(req)->ts_off;
 			/* We do not store true stamp, but it is not required,
 			 * it can be estimated (approximately)
 			 * from another data.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 19105b46a304..1b6d5f34bf45 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -640,7 +640,7 @@ static unsigned int tcp_synack_options(struct request_sock *req,
 	}
 	if (likely(ireq->tstamp_ok)) {
 		opts->options |= OPTION_TS;
-		opts->tsval = tcp_skb_timestamp(skb);
+		opts->tsval = tcp_skb_timestamp(skb) + tcp_rsk(req)->ts_off;
 		opts->tsecr = req->ts_recent;
 		remaining -= TCPOLEN_TSTAMP_ALIGNED;
 	}
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 97830a6a9cbb..a4d49760bf43 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -209,6 +209,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	treq->snt_synack.v64	= 0;
 	treq->rcv_isn = ntohl(th->seq) - 1;
 	treq->snt_isn = cookie;
+	treq->ts_off = 0;
 
 	/*
 	 * We need to lookup the dst_entry to get the correct window size.
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 28ec0a2e7b72..a2185a214abc 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -101,12 +101,12 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 	}
 }
 
-static __u32 tcp_v6_init_sequence(const struct sk_buff *skb)
+static u32 tcp_v6_init_sequence(const struct sk_buff *skb, u32 *tsoff)
 {
 	return secure_tcpv6_sequence_number(ipv6_hdr(skb)->daddr.s6_addr32,
 					    ipv6_hdr(skb)->saddr.s6_addr32,
 					    tcp_hdr(skb)->dest,
-					    tcp_hdr(skb)->source);
+					    tcp_hdr(skb)->source, tsoff);
 }
 
 static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
@@ -283,7 +283,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 		tp->write_seq = secure_tcpv6_sequence_number(np->saddr.s6_addr32,
 							     sk->sk_v6_daddr.s6_addr32,
 							     inet->inet_sport,
-							     inet->inet_dport);
+							     inet->inet_dport,
+							     &tp->tsoffset);
 
 	err = tcp_connect(sk);
 	if (err)
@@ -956,7 +957,8 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb,
 			tcp_rsk(req)->snt_isn + 1 : tcp_sk(sk)->snd_nxt,
 			tcp_rsk(req)->rcv_nxt,
 			req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
-			tcp_time_stamp, req->ts_recent, sk->sk_bound_dev_if,
+			tcp_time_stamp + tcp_rsk(req)->ts_off,
+			req->ts_recent, sk->sk_bound_dev_if,
 			tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
 			0, 0);
 }
-- 
2.7.3

^ permalink raw reply related

* [PATCH net-next 2/2] tcp: allow to turn tcp timestamp randomization off
From: Florian Westphal @ 2016-11-29 15:45 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal
In-Reply-To: <1480434342-12367-1-git-send-email-fw@strlen.de>

Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple
flows, we can deduct the relative qdisc delays (think of fq pacing).
This should work even if we have one flow per remote peer."

Having random per flow (or host) offsets doesn't allow that anymore so add
a way to turn this off.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Documentation/networking/ip-sysctl.txt |  9 +++++++--
 net/ipv4/tcp_input.c                   | 10 +++++++---
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5af48dd7c5fc..de2448313799 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -610,8 +610,13 @@ tcp_syn_retries - INTEGER
 	with the current initial RTO of 1second. With this the final timeout
 	for an active TCP connection attempt will happen after 127seconds.
 
-tcp_timestamps - BOOLEAN
-	Enable timestamps as defined in RFC1323.
+tcp_timestamps - INTEGER
+Enable timestamps as defined in RFC1323.
+	0: Disabled.
+	1: Enable timestamps as defined in RFC1323.
+	2: Like 1, but also use a random offset for each connection
+	rather than only using the current time.
+	Default: 2
 
 tcp_min_tso_segs - INTEGER
 	Minimal number of segments per TSO frame.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1b1921c71f7c..ebed73703198 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -76,7 +76,7 @@
 #include <asm/unaligned.h>
 #include <linux/errqueue.h>
 
-int sysctl_tcp_timestamps __read_mostly = 1;
+int sysctl_tcp_timestamps __read_mostly = 2;
 int sysctl_tcp_window_scaling __read_mostly = 1;
 int sysctl_tcp_sack __read_mostly = 1;
 int sysctl_tcp_fack __read_mostly = 1;
@@ -6323,10 +6323,12 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	if (security_inet_conn_request(sk, skb, req))
 		goto drop_and_free;
 
-	if (isn && tmp_opt.tstamp_ok)
+	if (isn && tmp_opt.tstamp_ok && sysctl_tcp_timestamps == 2)
 		af_ops->init_seq(skb, &tcp_rsk(req)->ts_off);
 
 	if (!want_cookie && !isn) {
+		u32 ts_off;
+
 		/* VJ's idea. We save last timestamp seen
 		 * from the destination in peer table, when entering
 		 * state TIME-WAIT, and check against it before
@@ -6366,7 +6368,9 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 			goto drop_and_release;
 		}
 
-		isn = af_ops->init_seq(skb, &tcp_rsk(req)->ts_off);
+		isn = af_ops->init_seq(skb, &ts_off);
+		if (sysctl_tcp_timestamps == 2)
+			tcp_rsk(req)->ts_off = ts_off;
 	}
 	if (!dst) {
 		dst = af_ops->route_req(sk, &fl, req, NULL);
-- 
2.7.3

^ permalink raw reply related

* Re: bpf debug info
From: Jakub Kicinski @ 2016-11-29 15:38 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, netdev, Brenden Blanco, Thomas Graf, Wangnan,
	He Kuang, kernel-team
In-Reply-To: <583D9AA4.8050601@iogearbox.net>

On Tue, 29 Nov 2016 16:11:32 +0100, Daniel Borkmann wrote:
> On 11/29/2016 07:42 AM, Alexei Starovoitov wrote:
> [...]
> > The support for debug information in BPF was recently added to llvm.
> >
> > In order to use it recompile bpf programs with the following patch
> > in samples/bpf/Makefile
> > @@ -155,4 +155,4 @@ $(obj)/%.o: $(src)/%.c
> >          $(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
> >                  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
> >                  -Wno-compare-distinct-pointer-types \
> > -               -O2 -emit-llvm -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> > +               -O2 -emit-llvm -g -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
> >
> > and compiled .o files can be consumed by standard llvm-objdump utility.
> >
> > $ llvm-objdump -S -no-show-raw-insn samples/bpf/xdp1_kern.o
> > xdp1_kern.o:    file format ELF64-BPF
> >
> > Disassembly of section xdp1:
> > xdp_prog1:
> > ; {
> >         0:       r2 = *(u32 *)(r1 + 4)
> > ; void *data = (void *)(long)ctx->data;
> >         8:       r1 = *(u32 *)(r1 + 0)
> > ; if (data + nh_off > data_end)
> >        10:       r3 = r1
> >        18:       r3 += 14
> >        20:       if r3 > r2 goto 55
> > ; h_proto = eth->h_proto;
> >        28:       r3 = *(u8 *)(r1 + 12)
> >        30:       r4 = *(u8 *)(r1 + 13)
> >        38:       r4 <<= 8
> >        40:       r4 |= r3
> > ; if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
> >        48:       if r4 == 43144 goto 2
> >        50:       r3 = 14
> >        58:       if r4 != 129 goto 5
> >
> > LBB0_3:
> > ; if (data + nh_off > data_end)
> >        60:       r3 = r1
> >        68:       r3 += 18
> >        70:       if r3 > r2 goto 45
> >        78:       r3 = 18
> > ; h_proto = vhdr->h_vlan_encapsulated_proto;
> >        80:       r4 = *(u16 *)(r1 + 16)
> >
> > LBB0_5:
> >        88:       r5 = r4
> >        90:       r5 &= 65535
> > ; if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
> >        98:       if r5 == 43144 goto 1
> >        a0:       if r5 != 129 goto 9
> >
> > Notice that 'clang -S -o a.s' output and llvm-objdump disassembler
> > were changed to use kernel verifier style, so now it should be easier
> > to see what's going on.  

This sounds super useful!  Thanks a lot!

>>> [...]
> > So next step is to improve verifier messages to be more human friendly.
> > The step after is to introduce BPF_COMMENT pseudo instruction
> > that will be ignored by the interpreter yet it will contain the text
> > of original source code. Then llvm-objdump step won't be necessary.
> > The bpf loader will load both instructions and pieces of C sources.
> > Then verifier errors should be even easier to read and humans
> > can easily understand the purpose of the program.  
> 
> So the BPF_COMMENT pseudo insn will get stripped away from the insn array
> after verification step, so we don't need to hold/account for this mem? I
> assume in it's ->imm member it will just hold offset into text blob?

Associating any form of opaque data with programs always makes me
worried about opening a side channel of communication with a specialized
user space implementations/compilers.  But I guess if the BPF_COMMENTs
are stripped in the verifier as Daniel assumes drivers and JITs will
never see it.

Just to clarify, however - is there any reason why pushing the source
code into the kernel is necessary?  Or is it just for convenience?
Provided the user space loader has access to the debug info it should
have no problems matching the verifier output to code lines?

^ permalink raw reply

* Re: [PATCH] tun: Use netif_receive_skb instead of netif_rx
From: Eric Dumazet @ 2016-11-29 15:35 UTC (permalink / raw)
  To: Andrey Konovalov, Peter Klausler
  Cc: Herbert Xu, David S . Miller, Jason Wang, Eric Dumazet,
	Paolo Abeni, Michael S . Tsirkin, Soheil Hassas Yeganeh,
	Markus Elfring, Mike Rapoport, netdev, linux-kernel,
	Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <1480433136-7922-1-git-send-email-andreyknvl@google.com>

On Tue, 2016-11-29 at 16:25 +0100, Andrey Konovalov wrote:
> This patch changes tun.c to call netif_receive_skb instead of netif_rx
> when a packet is received. The difference between the two is that netif_rx
> queues the packet into the backlog, and netif_receive_skb proccesses the
> packet in the current context.
> 
> This patch is required for syzkaller [1] to collect coverage from packet
> receive paths, when a packet being received through tun (syzkaller collects
> coverage per process in the process context).
> 
> A similar patch was introduced back in 2010 [2, 3], but the author found
> out that the patch doesn't help with the task he had in mind (for cgroups
> to shape network traffic based on the original process) and decided not to
> go further with it. The main concern back then was about possible stack
> exhaustion with 4K stacks, but CONFIG_4KSTACKS was removed and stacks are
> 8K now.

Acked-by: Eric Dumazet <edumazet@google.com>

We're using a similar patch written by Peter , let me copy here part of
his changelog, since main motivation was speed improvement at that
time :

commit 29aa09f47d43e93327a706cd835a37012ccc5b9e
Author: Peter Klausler <pmk@google.com>
Date:   Fri Mar 29 17:08:02 2013 -0400

    net-tun: Add netif_rx_ni_immediate() variant to speed up tun/tap.
    
    Speed up packet reception from (i.e., writes to) a tun/tap
    device by adding an alternative netif_rx_ni_immediate()
    interface that invokes netif_receive_skb() immediately rather
    than enqueueing the packet in the backlog queue and then driving
    its processing with do_softirq().  Forced queueing as a consequence
    of an RPS CPU mask will still work as expected.
    
    This change speeds up my closed-loop single-stream tap/OVS benchmark
    by about 23%, from 700k packets/second to 867k packets/second.
    

^ permalink raw reply

* Re: [PATCH net-next 1/3] ethtool: (uapi) Add ETHTOOL_PHY_LOOPBACK to PHY tunables
From: Allan W. Nielsen @ 2016-11-29 15:32 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, netdev, raju.lakkaraju
In-Reply-To: <20161129004653.GA24782@lunn.ch>


On 28/11/16 21:21, Andrew Lunn wrote:
> On Mon, Nov 28, 2016 at 08:23:06PM +0100, Allan W. Nielsen wrote:
> > On 28/11/16 15:14, Andrew Lunn wrote:
> > > On Mon, Nov 28, 2016 at 02:24:30PM +0100, Allan W. Nielsen wrote:
> > > > From: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
> > > > 3 types of PHY loopback are supported.
> > > > i.e. Near-End Loopback, Far-End Loopback and External Loopback.
> > > Is this integrated with ethtool --test? You only want the PHY to go
> > > into loopback mode when running ethtool --test external_lb or maybe
> > > ethtool --test offline.
> > There are other use-cases for enabling PHY loopback:
> > 1) If the PHY is connected to a switch then a loop-port is sometime
> >    used to "force/enable" one or more extra pass through the switch
> >    core. This "hack" can sometime be used to achieve new functionality
> >    with existing silicon.
> With Linux, switches are managed via switchdev, or DSA. You will have
> to teach this infrastructure that something really odd is going on
> with one of its ports before you do anything like this in the PHY
> layer. I suggest you leave this use case alone until somebody
> really-really wants it. From my knowledge of the Marvell DSA driver,
> this is not easy.
> > 2) Existing user-space application may expect to be able to do the
> >    testing on its own (generate/validate the test traffic).
> Please can you reference some existing user-space application and the
> kernel API it uses to put the PHY into loopback mode?
The application I had in mind, is the switch application that I'm normally
working on (a product of MSCC). This application was originally written for
eCos, but is now moved to Linux. The application is currently doing almost
all drivers in user-space (reaching the HW through a UIO driver). This
means that we have an existing set of APIs and associated applications which
among many other things tests the PHYs using loopback facilities. There are
also cases where the loopports are being used as I described earlier.

We are looking at moving some of the drivers into the kernel, and that
require us to find a solution to such issues (nothing have been decided,
and nothing will be decided anytime soon).

I also understand your point of view, you have presented pretty good
arguments, and I do not expect this to change your view on this topic.

On 29/11/16 01:46, Andrew Lunn wrote:
> > > If you really do what to do this, you should look at NETIF_F_LOOPBACK
> > > and consider how this concept can be applied at the PHY, not the MAC.
> > > But you need the full concept, so you see things like:
> > >
> > > 2: eth0: <LOOPBACK,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
> > >     link/ether 80:ee:73:83:60:27 brd ff:ff:ff:ff:ff:ff
> > >
> > > Humm, i've no idea how you actually enable the MAC loopback
> > > NETIF_F_LOOPBACK represents. I don't see anything with ip link set.
> >
> > I am afraid you lost me on this, NETIF_F_LOOPBACK is a netdev_features_t
> > bit, so it tells ethtool that this is a potential feature to be turned
> > on with ethtool -K <iface>.
> Yep, i'm talking rubbish after jumping to a wrong conclusion.
> > The semantics of this loopack feature are
> > not defined AFAICT, but a reasonable behavior from the driver is to put
> > itself in a mode where packets send by a socket-level application are
> > looped through the Ethernet adapter itself. Whether this happens at the
> > DMA level, the MII signals, or somewhere in the PHY, is driver specific
> > unfortunately.
> 
> Yes, the interesting one might be forcedeth which writes to the PHY
> BMCR_LOOPBACK | BMCR_FULLDPLX | BMCR_SPEED1000;
> when the NETIF_F_LOOPBACK feature is set.
> 
> Maybe this could be generalised and made available for all MACs which
> don't support MAC loopback?
> 
> What needs considering is the correct duplex and speed. We need to
> ensure the MAC and PHY agree on this.
> 
> > Now, there is another way to toggle a loopback for a given Ethernet
> > adapter which is to actually set IFF_LOOPBACK in dev->flags for this
> > interface. Some drivers seem to be able to properly react to that as
> > well, although I see no way this can be done looking at the iproute2 or
> > ifconfig man pages..
> 
> I prefer this one, since i expect this will set LOOPBACK in
> 
> 2: eth0: <LOOPBACK,BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
> 
> making it lot more obvious something funny is going on.
Raju and I will need to dive deeper into this to understand the details of
what you are suggesting. But I think it points in a different direction...

The approach you are describing is to use existing flags (which will make
the loop-back state visible to the user, which is a good thing) to set the
loopback state. Where/how the frames are looped depends on the drivers. The
suggested tunable could be kept, but it will only allow the user to say,
"if the frames are looped in the PHY, then this is how I want them to be
looped".

Also, it does not make a lot of sense for the "FAR" loopback patch (which I
admit is a bit strange...).

The facility we are seeking is much more specific: "Allow the user to bring
the PHY in and out of specific loopback modes" assuming that the user have
reasons to do so.

I'm not sure if your main dislike with this feature is the lack of
transparency/visibility to the end-user, or if is the concept of allowing
the user to control where and how frames are looped (or both).

Best regards
Allan W. Nielsen

^ permalink raw reply

* Re: adm80211: Removed unused 'io_addr' 'mem_addr' variables
From: Kalle Valo @ 2016-11-29 15:32 UTC (permalink / raw)
  To: Kirtika Ruchandani
  Cc: Arnd Bergmann, Johannes Berg,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161124064045.GA8836-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

Kirtika Ruchandani <kirtika.ruchandani-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Initial commit cc0b88cf5ecf ([PATCH] Add adm8211 802.11b wireless driver)
> introduced variables mem_addr and io_addr in adm80211_probe() that are
> set but not used. Compiling with W=1 gives the following warnings,
> fix them.
> 
> drivers/net/wireless/admtek/adm8211.c: In function ‘adm8211_probe’:
> drivers/net/wireless/admtek/adm8211.c:1769:15: warning: variable ‘io_addr’ set but not used [-Wunused-but-set-variable]
>   unsigned int io_addr, io_len;
>                ^
> drivers/net/wireless/admtek/adm8211.c:1768:16: warning: variable ‘mem_addr’ set but not used [-Wunused-but-set-variable]
>   unsigned long mem_addr, mem_len;
>                 ^
> 
> These are harmless warnings and are only being fixed to reduce the
> noise with W=1 in the kernel. The calls to pci_resource_start do not
> have any side-effects and are safe to remove.
> 
> Fixes: cc0b88cf5ecf ("[PATCH] Add adm8211 802.11b wireless driver")
> Cc: Michael Wu <flamingice-R9e9/4HEdknk1uMJSBkQmQ@public.gmane.org>
> Cc: John W. Linville <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
> Signed-off-by: Kirtika Ruchandani <kirtika-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>

The patch failed to apply:

fatal: corrupt patch at line 14
Applying: adm80211: Removed unused 'io_addr' 'mem_addr' variables
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0001 adm80211: Removed unused 'io_addr' 'mem_addr' variables

Patch set to Changes Requested.

-- 
https://patchwork.kernel.org/patch/9444901/

Documentation about submitting wireless patches and checking status
from patchwork:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: net: GPF in eth_header
From: Andrey Konovalov @ 2016-11-29 15:31 UTC (permalink / raw)
  To: syzkaller
  Cc: Dmitry Vyukov, David Miller, Tom Herbert, Alexander Duyck,
	Hannes Frederic Sowa, Jiri Benc, Sabrina Dubroca, netdev, LKML
In-Reply-To: <1480431501.18162.131.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, Nov 29, 2016 at 3:58 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2016-11-29 at 11:26 +0100, Andrey Konovalov wrote:
>> On Sat, Nov 26, 2016 at 9:05 PM, Eric Dumazet <erdlkml@gmail.com> wrote:
>> >> I actually see multiple places where skb_network_offset() is used as
>> >> an argument to skb_pull().
>> >> So I guess every place can potentially be buggy.
>> >
>> > Well, I think the intent is to accept a negative number.
>>
>> I'm not sure that was the intent since it results in a signedness
>> issue which leads to an out-of-bounds.
>>
>
> Hey, I already mentioned where was the bug.
>
> You missed the investigation where I pointed it to FLorian ?
>
>> A quick grep shows that the same issue can potentially happen in
>> multiple places across the kernel:
>>
>> net/ipv6/ip6_output.c:1655: __skb_pull(skb, skb_network_offset(skb));
>> net/packet/af_packet.c:2043: skb_pull(skb, skb_network_offset(skb));
>> net/packet/af_packet.c:2165: skb_pull(skb, skb_network_offset(skb));
>> net/core/neighbour.c:1301: __skb_pull(skb, skb_network_offset(skb));
>> net/core/neighbour.c:1331: __skb_pull(skb, skb_network_offset(skb));
>> net/core/dev.c:3157: __skb_pull(skb, skb_network_offset(skb));
>> net/sched/sch_teql.c:337: __skb_pull(skb, skb_network_offset(skb));
>> net/sched/sch_atm.c:479: skb_pull(skb, skb_network_offset(skb));
>> net/ipv4/ip_output.c:1385: __skb_pull(skb, skb_network_offset(skb));
>> net/ipv4/ip_fragment.c:391: if (!pskb_pull(skb, skb_network_offset(skb) + ihl))
>> drivers/net/vxlan.c:1440: __skb_pull(reply, skb_network_offset(reply));
>> drivers/net/vxlan.c:1902: __skb_pull(skb, skb_network_offset(skb));
>> drivers/net/vrf.c:220: __skb_pull(skb, skb_network_offset(skb));
>> drivers/net/vrf.c:314: __skb_pull(skb, skb_network_offset(skb));
>>
>> A similar thing also happened to somebody else (on a receive path!):
>> https://forums.grsecurity.net/viewtopic.php?f=3&t=4550
>>
>> Does it make sense to check skb_network_offset() before passing it to
>> skb_pull() everywhere?
>
> Well, sure, we could add safety checks everywhere and slow the kernel
> when debugging is requested.
>
> But skb_network_offset() is not the problem here. Why are you focusing
> on it ?
>
> The real problem is in __skb_pull() or __skb_push() and all similar
> helpers. Lots of added checks and slowdowns.

The issue is not with skb_network_offset(), but with __skb_pull()
using skb_network_offset() as an argument.

I'm not sure what would be the beast way to fix this, to add a check
before every __skb_pull(skb_network_offset()), to fix __skb_pull() to
work with signed ints, to add BUG_ON()'s in __skb_pull, or something
else.

What I meant is that you fixed this very instance of the bug, and I'm
pointing out that a similar one might hit us again.

>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 9c535fbccf2c7dbfae04cee393460e86d588c26b..d6116e37d054fc1536114347ed3c41fc7dc7a882 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -1923,6 +1923,7 @@ static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
>  unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
>  static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
>  {
> +       BUG_ON((int)len < 0);
>         skb->data -= len;
>         skb->len  += len;
>         return skb->data;
> @@ -1931,6 +1932,7 @@ static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
>  unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
>  static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
>  {
> +       BUG_ON((int)len < 0);
>         skb->len -= len;
>         BUG_ON(skb->len < skb->data_len);
>         return skb->data += len;
> @@ -1938,6 +1940,7 @@ static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
>
>  static inline unsigned char *skb_pull_inline(struct sk_buff *skb, unsigned int len)
>  {
> +       BUG_ON((int)len < 0);
>         return unlikely(len > skb->len) ? NULL : __skb_pull(skb, len);
>  }
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d1d1a5a5ad24ded523fc12ffba8c602b03bd0830..7bf098c848fd857ba5d287fc91d43f62f381bd55 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1450,6 +1450,7 @@ EXPORT_SYMBOL(skb_put);
>   */
>  unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
>  {
> +       BUG_ON((int)len < 0);
>         skb->data -= len;
>         skb->len  += len;
>         if (unlikely(skb->data<skb->head))
>
>
>
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [PATCH] ipv6:ip6_xmit remove unnecessary np NULL check
From: Eric Dumazet @ 2016-11-29 15:26 UTC (permalink / raw)
  To: Manjeet Pawar
  Cc: davem, kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel,
	pankaj.m, ajeet.y, Rohit Thapliyal
In-Reply-To: <1480401179-9220-1-git-send-email-manjeet.p@samsung.com>

On Tue, 2016-11-29 at 12:02 +0530, Manjeet Pawar wrote:
> From: Rohit Thapliyal <r.thapliyal@samsung.com>
> 
> np NULL check doesn't seem required here as it shall never
> be NULL anyways in inet6_sk(sk).
> 
> Signed-off-by: Rohit Thapliyal <r.thapliyal@samsung.com>
> Signed-off-by: Manjeet Pawar <manjeet.p@samsung.com>
> Signed-off-by: David Miller <davem@davemloft.net>
> Reviewed-by: Akhilesh Kumar <akhilesh.k@samsung.com>
> 
> ---
> v2->v3: Modified as per the suggestion from David Miller
>         ip6_xmit calls are made without checking NULL np
>         pointer, so no need to explicitly check NULL np in
>         ip6_xmit.
> 
>  include/linux/ipv6.h  | 2 +-
>  net/ipv6/ip6_output.c | 3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index a064997..6c9c604 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -299,7 +299,7 @@ struct tcp6_timewait_sock {
>  
>  static inline struct ipv6_pinfo *inet6_sk(const struct sock *__sk)
>  {
> -	return sk_fullsock(__sk) ? inet_sk(__sk)->pinet6 : NULL;
> +	return inet_sk(__sk)->pinet6;


David suggestion was about np being NULL or not in ip6_xmit()

But have you checked inet6_sk() was never called for a TCPv6 TIMEWAIT or
SYN_RECV request ?

>  }
>  
>  static inline struct raw6_sock *raw6_sk(const struct sock *sk)
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 59eb4ed..f8c63ec 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -213,8 +213,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
>  	/*
>  	 *	Fill in the IPv6 header
>  	 */
> -	if (np)
> -		hlimit = np->hop_limit;
> +	hlimit = np->hop_limit;
>  	if (hlimit < 0)
>  		hlimit = ip6_dst_hoplimit(dst);
>  

This part is fine.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox