BPF List
 help / color / mirror / Atom feed
* [PATCH bpf-next] libbpf: provide more helpful message on uninitialized global var
From: Andrii Nakryiko @ 2019-07-23  4:23 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

When BPF program defines uninitialized global variable, it's put into
a special COMMON section. Libbpf will reject such programs, but will
provide very unhelpful message with garbage-looking section index.

This patch detects special section cases and gives more explicit error
message.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/libbpf.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 794dd5064ae8..5f9e7eedb134 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1760,15 +1760,23 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			 (long long) sym.st_value, sym.st_name, name);
 
 		shdr_idx = sym.st_shndx;
+		insn_idx = rel.r_offset / sizeof(struct bpf_insn);
+		pr_debug("relocation: insn_idx=%u, shdr_idx=%u\n",
+			 insn_idx, shdr_idx);
+
+		if (shdr_idx >= SHN_LORESERVE) {
+			pr_warning("relocation: not yet supported relo for non-static global \'%s\' variable "
+				   "in special section (0x%x) found in insns[%d].code 0x%x\n",
+				   name, shdr_idx, insn_idx,
+				   insns[insn_idx].code);
+			return -LIBBPF_ERRNO__RELOC;
+		}
 		if (!bpf_object__relo_in_known_section(obj, shdr_idx)) {
 			pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
 				   prog->section_name, shdr_idx);
 			return -LIBBPF_ERRNO__RELOC;
 		}
 
-		insn_idx = rel.r_offset / sizeof(struct bpf_insn);
-		pr_debug("relocation: insn_idx=%u\n", insn_idx);
-
 		if (insns[insn_idx].code == (BPF_JMP | BPF_CALL)) {
 			if (insns[insn_idx].src_reg != BPF_PSEUDO_CALL) {
 				pr_warning("incorrect bpf_call opcode\n");
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH 3/3] net/xdp: convert put_page() to put_user_page*()
From: Ira Weiny @ 2019-07-23  0:25 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, Alexander Viro, Björn Töpel,
	Boaz Harrosh, Christoph Hellwig, Daniel Vetter, Dan Williams,
	Dave Chinner, David Airlie, David S . Miller, Ilya Dryomov,
	Jan Kara, Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML, John Hubbard
In-Reply-To: <20190722223415.13269-4-jhubbard@nvidia.com>

On Mon, Jul 22, 2019 at 03:34:15PM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Cc: Björn Töpel <bjorn.topel@intel.com>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: netdev@vger.kernel.org
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  net/xdp/xdp_umem.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..0325a17915de 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -166,14 +166,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
>  
>  static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>  {
> -	unsigned int i;
> -
> -	for (i = 0; i < umem->npgs; i++) {
> -		struct page *page = umem->pgs[i];
> -
> -		set_page_dirty_lock(page);
> -		put_page(page);
> -	}
> +	put_user_pages_dirty_lock(umem->pgs, umem->npgs);

What is the difference between this and

__put_user_pages(umem->pgs, umem->npgs, PUP_FLAGS_DIRTY_LOCK);

?

I'm a bit concerned with adding another form of the same interface.  We should
either have 1 call with flags (enum in this case) or multiple calls.  Given the
previous discussion lets move in the direction of having the enum but don't
introduce another caller of the "old" interface.

So I think on this patch NAK from me.

I also don't like having a __* call in the exported interface but there is a
__get_user_pages_fast() call so I guess there is precedent.  :-/

Ira

>  
>  	kfree(umem->pgs);
>  	umem->pgs = NULL;
> -- 
> 2.22.0
> 

^ permalink raw reply

* [bpf-next 4/6] bpf: sync bpf.h to tools/
From: Petar Penkov @ 2019-07-23  0:20 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190723002042.105927-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Sync updated documentation for bpf_redirect_map.

Sync the bpf_tcp_gen_syncookie helper function definition with the one
in tools/uapi.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 tools/include/uapi/linux/bpf.h | 37 +++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f506c68b2612..20baee7b2219 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1571,8 +1571,11 @@ union bpf_attr {
  * 		but this is only implemented for native XDP (with driver
  * 		support) as of this writing).
  *
- * 		All values for *flags* are reserved for future usage, and must
- * 		be left at zero.
+ * 		The lower two bits of *flags* are used as the return code if
+ * 		the map lookup fails. This is so that the return value can be
+ * 		one of the XDP program return codes up to XDP_TX, as chosen by
+ * 		the caller. Any higher bits in the *flags* argument must be
+ * 		unset.
  *
  * 		When used to redirect packets to net devices, this helper
  * 		provides a high performance increase over **bpf_redirect**\ ().
@@ -2710,6 +2713,33 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *	Description
+ *		Try to issue a SYN cookie for the packet with corresponding
+ *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *		*iph* points to the start of the IPv4 or IPv6 header, while
+ *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *		**sizeof**\ (**struct ip6hdr**).
+ *
+ *		*th* points to the start of the TCP header, while *th_len*
+ *		contains the length of the TCP header.
+ *
+ *	Return
+ *		On success, lower 32 bits hold the generated SYN cookie in
+ *		followed by 16 bits which hold the MSS value for that cookie,
+ *		and the top 16 bits are unused.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *		**-ENOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2821,7 +2851,8 @@ union bpf_attr {
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
-	FN(send_signal),
+	FN(send_signal),		\
+	FN(tcp_gen_syncookie),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related

* [bpf-next 1/6] tcp: tcp_syn_flood_action read port from socket
From: Petar Penkov @ 2019-07-23  0:20 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190723002042.105927-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This allows us to call this function before an SKB has been
allocated.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 net/ipv4/tcp_input.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c21e8a22fb3b..8892df6de1d4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6422,9 +6422,7 @@ EXPORT_SYMBOL(inet_reqsk_alloc);
 /*
  * Return true if a syncookie should be sent
  */
-static bool tcp_syn_flood_action(const struct sock *sk,
-				 const struct sk_buff *skb,
-				 const char *proto)
+static bool tcp_syn_flood_action(const struct sock *sk, const char *proto)
 {
 	struct request_sock_queue *queue = &inet_csk(sk)->icsk_accept_queue;
 	const char *msg = "Dropping request";
@@ -6444,7 +6442,7 @@ static bool tcp_syn_flood_action(const struct sock *sk,
 	    net->ipv4.sysctl_tcp_syncookies != 2 &&
 	    xchg(&queue->synflood_warned, 1) == 0)
 		net_info_ratelimited("%s: Possible SYN flooding on port %d. %s.  Check SNMP counters.\n",
-				     proto, ntohs(tcp_hdr(skb)->dest), msg);
+				     proto, sk->sk_num, msg);
 
 	return want_cookie;
 }
@@ -6487,7 +6485,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	 */
 	if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
 	     inet_csk_reqsk_queue_is_full(sk)) && !isn) {
-		want_cookie = tcp_syn_flood_action(sk, skb, rsk_ops->slab_name);
+		want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
 		if (!want_cookie)
 			goto drop;
 	}
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related

* [bpf-next 2/6] tcp: add skb-less helpers to retrieve SYN cookie
From: Petar Penkov @ 2019-07-23  0:20 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190723002042.105927-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This patch allows generation of a SYN cookie before an SKB has been
allocated, as is the case at XDP.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 include/net/tcp.h    | 11 +++++++
 net/ipv4/tcp_input.c | 76 ++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_ipv4.c  |  8 +++++
 net/ipv6/tcp_ipv6.c  |  8 +++++
 4 files changed, 103 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cca3c59b98bf..a128e22c0d5d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -414,6 +414,17 @@ void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
 		       int estab, struct tcp_fastopen_cookie *foc);
 const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);
 
+/*
+ *	BPF SKB-less helpers
+ */
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+			 struct tcphdr *tch, u32 *cookie);
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+			 struct tcphdr *tch, u32 *cookie);
+u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
+		      const struct tcp_request_sock_ops *af_ops,
+		      struct sock *sk, void *iph, struct tcphdr *tch,
+		      u32 *cookie);
 /*
  *	TCP v4 functions exported for the inet6 API
  */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8892df6de1d4..893b275a6d49 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3782,6 +3782,49 @@ static void smc_parse_options(const struct tcphdr *th,
 #endif
 }
 
+/* Try to parse the MSS option from the TCP header. Return 0 on failure, clamped
+ * value on success.
+ */
+static u16 tcp_parse_mss_option(const struct tcphdr *th, u16 user_mss)
+{
+	const unsigned char *ptr = (const unsigned char *)(th + 1);
+	int length = (th->doff * 4) - sizeof(struct tcphdr);
+	u16 mss = 0;
+
+	while (length > 0) {
+		int opcode = *ptr++;
+		int opsize;
+
+		switch (opcode) {
+		case TCPOPT_EOL:
+			return mss;
+		case TCPOPT_NOP:	/* Ref: RFC 793 section 3.1 */
+			length--;
+			continue;
+		default:
+			if (length < 2)
+				return mss;
+			opsize = *ptr++;
+			if (opsize < 2) /* "silly options" */
+				return mss;
+			if (opsize > length)
+				return mss;	/* fail on partial options */
+			if (opcode == TCPOPT_MSS && opsize == TCPOLEN_MSS) {
+				u16 in_mss = get_unaligned_be16(ptr);
+
+				if (in_mss) {
+					if (user_mss && user_mss < in_mss)
+						in_mss = user_mss;
+					mss = in_mss;
+				}
+			}
+			ptr += opsize - 2;
+			length -= opsize;
+		}
+	}
+	return mss;
+}
+
 /* Look for tcp options. Normally only called on SYN and SYNACK packets.
  * But, this can also be called on packets in the established flow when
  * the fast version below fails.
@@ -6464,6 +6507,39 @@ static void tcp_reqsk_record_syn(const struct sock *sk,
 	}
 }
 
+u16 tcp_get_syncookie(struct request_sock_ops *rsk_ops,
+		      const struct tcp_request_sock_ops *af_ops,
+		      struct sock *sk, void *iph, struct tcphdr *th,
+		      u32 *cookie)
+{
+	u16 mss = 0;
+#ifdef CONFIG_SYN_COOKIES
+	bool is_v4 = rsk_ops->family == AF_INET;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (sock_net(sk)->ipv4.sysctl_tcp_syncookies != 2 &&
+	    !inet_csk_reqsk_queue_is_full(sk))
+		return 0;
+
+	if (!tcp_syn_flood_action(sk, rsk_ops->slab_name))
+		return 0;
+
+	if (sk_acceptq_is_full(sk)) {
+		NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
+		return 0;
+	}
+
+	mss = tcp_parse_mss_option(th, tp->rx_opt.user_mss);
+	if (!mss)
+		mss = af_ops->mss_clamp;
+
+	tcp_synq_overflow(sk);
+	*cookie = is_v4 ? __cookie_v4_init_sequence(iph, th, &mss)
+			: __cookie_v6_init_sequence(iph, th, &mss);
+#endif
+	return mss;
+}
+
 int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		     const struct tcp_request_sock_ops *af_ops,
 		     struct sock *sk, struct sk_buff *skb)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d57641cb3477..0e06e59784bd 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1515,6 +1515,14 @@ static struct sock *tcp_v4_cookie_check(struct sock *sk, struct sk_buff *skb)
 	return sk;
 }
 
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+			 struct tcphdr *tch, u32 *cookie)
+{
+	return tcp_get_syncookie(&tcp_request_sock_ops,
+				 &tcp_request_sock_ipv4_ops, sk, iph, tch,
+				 cookie);
+}
+
 /* The socket must have it's spinlock held when we get
  * here, unless it is a TCP_LISTEN socket.
  *
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5da069e91cac..102f68c3152d 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1063,6 +1063,14 @@ static struct sock *tcp_v6_cookie_check(struct sock *sk, struct sk_buff *skb)
 	return sk;
 }
 
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+			 struct tcphdr *tch, u32 *cookie)
+{
+	return tcp_get_syncookie(&tcp6_request_sock_ops,
+				 &tcp_request_sock_ipv6_ops, sk, iph, tch,
+				 cookie);
+}
+
 static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 {
 	if (skb->protocol == htons(ETH_P_IP))
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related

* [bpf-next 6/6] selftests/bpf: add test for bpf_tcp_gen_syncookie
From: Petar Penkov @ 2019-07-23  0:20 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190723002042.105927-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Modify the existing bpf_tcp_check_syncookie test to also generate a
SYN cookie, pass the packet to the kernel, and verify that the two
cookies are the same (and both valid). Since cloned SKBs are skipped
during generic XDP, this test does not issue a SYN cookie when run in
XDP mode. We therefore only check that a valid SYN cookie was issued at
the TC hook.

Additionally, verify that the MSS for that SYN cookie is within
expected range.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 .../bpf/progs/test_tcp_check_syncookie_kern.c | 48 +++++++++++++--
 .../selftests/bpf/test_tcp_check_syncookie.sh |  3 +
 .../bpf/test_tcp_check_syncookie_user.c       | 61 ++++++++++++++++---
 3 files changed, 99 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
index 1ab095bcacd8..d8803dfa8d32 100644
--- a/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcp_check_syncookie_kern.c
@@ -19,10 +19,29 @@
 struct bpf_map_def SEC("maps") results = {
 	.type = BPF_MAP_TYPE_ARRAY,
 	.key_size = sizeof(__u32),
-	.value_size = sizeof(__u64),
-	.max_entries = 1,
+	.value_size = sizeof(__u32),
+	.max_entries = 3,
 };
 
+static __always_inline __s64 gen_syncookie(void *data_end, struct bpf_sock *sk,
+					   void *iph, __u32 ip_size,
+					   struct tcphdr *tcph)
+{
+	__u32 thlen = tcph->doff * 4;
+
+	if (tcph->syn && !tcph->ack) {
+		// packet should only have an MSS option
+		if (thlen != 24)
+			return 0;
+
+		if ((void *)tcph + thlen > data_end)
+			return 0;
+
+		return bpf_tcp_gen_syncookie(sk, iph, ip_size, tcph, thlen);
+	}
+	return 0;
+}
+
 static __always_inline void check_syncookie(void *ctx, void *data,
 					    void *data_end)
 {
@@ -33,8 +52,10 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 	struct ipv6hdr *ipv6h;
 	struct tcphdr *tcph;
 	int ret;
+	__u32 key_mss = 2;
+	__u32 key_gen = 1;
 	__u32 key = 0;
-	__u64 value = 1;
+	__s64 seq_mss;
 
 	ethh = data;
 	if (ethh + 1 > data_end)
@@ -66,6 +87,9 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		if (sk->state != BPF_TCP_LISTEN)
 			goto release;
 
+		seq_mss = gen_syncookie(data_end, sk, ipv4h, sizeof(*ipv4h),
+					tcph);
+
 		ret = bpf_tcp_check_syncookie(sk, ipv4h, sizeof(*ipv4h),
 					      tcph, sizeof(*tcph));
 		break;
@@ -95,6 +119,9 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		if (sk->state != BPF_TCP_LISTEN)
 			goto release;
 
+		seq_mss = gen_syncookie(data_end, sk, ipv6h, sizeof(*ipv6h),
+					tcph);
+
 		ret = bpf_tcp_check_syncookie(sk, ipv6h, sizeof(*ipv6h),
 					      tcph, sizeof(*tcph));
 		break;
@@ -103,8 +130,19 @@ static __always_inline void check_syncookie(void *ctx, void *data,
 		return;
 	}
 
-	if (ret == 0)
-		bpf_map_update_elem(&results, &key, &value, 0);
+	if (seq_mss > 0) {
+		__u32 cookie = (__u32)seq_mss;
+		__u32 mss = seq_mss >> 32;
+
+		bpf_map_update_elem(&results, &key_gen, &cookie, 0);
+		bpf_map_update_elem(&results, &key_mss, &mss, 0);
+	}
+
+	if (ret == 0) {
+		__u32 cookie = bpf_ntohl(tcph->ack_seq) - 1;
+
+		bpf_map_update_elem(&results, &key, &cookie, 0);
+	}
 
 release:
 	bpf_sk_release(sk);
diff --git a/tools/testing/selftests/bpf/test_tcp_check_syncookie.sh b/tools/testing/selftests/bpf/test_tcp_check_syncookie.sh
index d48e51716d19..9b3617d770a5 100755
--- a/tools/testing/selftests/bpf/test_tcp_check_syncookie.sh
+++ b/tools/testing/selftests/bpf/test_tcp_check_syncookie.sh
@@ -37,6 +37,9 @@ setup()
 	ns1_exec ip link set lo up
 
 	ns1_exec sysctl -w net.ipv4.tcp_syncookies=2
+	ns1_exec sysctl -w net.ipv4.tcp_window_scaling=0
+	ns1_exec sysctl -w net.ipv4.tcp_timestamps=0
+	ns1_exec sysctl -w net.ipv4.tcp_sack=0
 
 	wait_for_ip 127.0.0.1
 	wait_for_ip ::1
diff --git a/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c b/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
index 87829c86c746..b9e991d43155 100644
--- a/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
+++ b/tools/testing/selftests/bpf/test_tcp_check_syncookie_user.c
@@ -2,6 +2,7 @@
 // Copyright (c) 2018 Facebook
 // Copyright (c) 2019 Cloudflare
 
+#include <limits.h>
 #include <string.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -77,7 +78,7 @@ static int connect_to_server(int server_fd)
 	return fd;
 }
 
-static int get_map_fd_by_prog_id(int prog_id)
+static int get_map_fd_by_prog_id(int prog_id, bool *xdp)
 {
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
@@ -104,6 +105,8 @@ static int get_map_fd_by_prog_id(int prog_id)
 		goto err;
 	}
 
+	*xdp = info.type == BPF_PROG_TYPE_XDP;
+
 	map_fd = bpf_map_get_fd_by_id(map_ids[0]);
 	if (map_fd < 0)
 		log_err("Failed to get fd by map id %d", map_ids[0]);
@@ -113,18 +116,32 @@ static int get_map_fd_by_prog_id(int prog_id)
 	return map_fd;
 }
 
-static int run_test(int server_fd, int results_fd)
+static int run_test(int server_fd, int results_fd, bool xdp)
 {
 	int client = -1, srv_client = -1;
 	int ret = 0;
 	__u32 key = 0;
-	__u64 value = 0;
+	__u32 key_gen = 1;
+	__u32 key_mss = 2;
+	__u32 value = 0;
+	__u32 value_gen = 0;
+	__u32 value_mss = 0;
 
 	if (bpf_map_update_elem(results_fd, &key, &value, 0) < 0) {
 		log_err("Can't clear results");
 		goto err;
 	}
 
+	if (bpf_map_update_elem(results_fd, &key_gen, &value_gen, 0) < 0) {
+		log_err("Can't clear results");
+		goto err;
+	}
+
+	if (bpf_map_update_elem(results_fd, &key_mss, &value_mss, 0) < 0) {
+		log_err("Can't clear results");
+		goto err;
+	}
+
 	client = connect_to_server(server_fd);
 	if (client == -1)
 		goto err;
@@ -140,8 +157,35 @@ static int run_test(int server_fd, int results_fd)
 		goto err;
 	}
 
-	if (value != 1) {
-		log_err("Didn't match syncookie: %llu", value);
+	if (value == 0) {
+		log_err("Didn't match syncookie: %u", value);
+		goto err;
+	}
+
+	if (bpf_map_lookup_elem(results_fd, &key_gen, &value_gen) < 0) {
+		log_err("Can't lookup result");
+		goto err;
+	}
+
+	if (xdp && value_gen == 0) {
+		// SYN packets do not get passed through generic XDP, skip the
+		// rest of the test.
+		printf("Skipping XDP cookie check\n");
+		goto out;
+	}
+
+	if (bpf_map_lookup_elem(results_fd, &key_mss, &value_mss) < 0) {
+		log_err("Can't lookup result");
+		goto err;
+	}
+
+	if (value != value_gen) {
+		log_err("BPF generated cookie does not match kernel one");
+		goto err;
+	}
+
+	if (value_mss < 536 || value_mss > USHRT_MAX) {
+		log_err("Unexpected MSS retrieved");
 		goto err;
 	}
 
@@ -163,13 +207,14 @@ int main(int argc, char **argv)
 	int server_v6 = -1;
 	int results = -1;
 	int err = 0;
+	bool xdp;
 
 	if (argc < 2) {
 		fprintf(stderr, "Usage: %s prog_id\n", argv[0]);
 		exit(1);
 	}
 
-	results = get_map_fd_by_prog_id(atoi(argv[1]));
+	results = get_map_fd_by_prog_id(atoi(argv[1]), &xdp);
 	if (results < 0) {
 		log_err("Can't get map");
 		goto err;
@@ -194,10 +239,10 @@ int main(int argc, char **argv)
 	if (server_v6 == -1)
 		goto err;
 
-	if (run_test(server, results))
+	if (run_test(server, results, xdp))
 		goto err;
 
-	if (run_test(server_v6, results))
+	if (run_test(server_v6, results, xdp))
 		goto err;
 
 	printf("ok\n");
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related

* [bpf-next 5/6] selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
From: Petar Penkov @ 2019-07-23  0:20 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190723002042.105927-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

Expose bpf_tcp_gen_syncookie to selftests.

Signed-off-by: Petar Penkov <ppenkov@google.com>
---
 tools/testing/selftests/bpf/bpf_helpers.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 5a3d92c8bec8..19f01e967402 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -228,6 +228,9 @@ static void *(*bpf_sk_storage_get)(void *map, struct bpf_sock *sk,
 static int (*bpf_sk_storage_delete)(void *map, struct bpf_sock *sk) =
 	(void *)BPF_FUNC_sk_storage_delete;
 static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
+static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
+					  int ip_len, void *tcp, int tcp_len) =
+	(void *) BPF_FUNC_tcp_gen_syncookie;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related

* [bpf-next 3/6] bpf: add bpf_tcp_gen_syncookie helper
From: Petar Penkov @ 2019-07-23  0:20 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov
In-Reply-To: <20190723002042.105927-1-ppenkov.kernel@gmail.com>

From: Petar Penkov <ppenkov@google.com>

This helper function allows BPF programs to try to generate SYN
cookies, given a reference to a listener socket. The function works
from XDP and with an skb context since bpf_skc_lookup_tcp can lookup a
socket in both cases.

Signed-off-by: Petar Penkov <ppenkov@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
---
 include/uapi/linux/bpf.h | 30 ++++++++++++++++-
 net/core/filter.c        | 73 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6f68438aa4ed..20baee7b2219 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2713,6 +2713,33 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *	Description
+ *		Try to issue a SYN cookie for the packet with corresponding
+ *		IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *		*iph* points to the start of the IPv4 or IPv6 header, while
+ *		*iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *		**sizeof**\ (**struct ip6hdr**).
+ *
+ *		*th* points to the start of the TCP header, while *th_len*
+ *		contains the length of the TCP header.
+ *
+ *	Return
+ *		On success, lower 32 bits hold the generated SYN cookie in
+ *		followed by 16 bits which hold the MSS value for that cookie,
+ *		and the top 16 bits are unused.
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *		**-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *		**-ENOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2824,7 +2851,8 @@ union bpf_attr {
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
-	FN(send_signal),
+	FN(send_signal),		\
+	FN(tcp_gen_syncookie),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 47f6386fb17a..92114271eff6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5850,6 +5850,75 @@ static const struct bpf_func_proto bpf_tcp_check_syncookie_proto = {
 	.arg5_type	= ARG_CONST_SIZE,
 };
 
+BPF_CALL_5(bpf_tcp_gen_syncookie, struct sock *, sk, void *, iph, u32, iph_len,
+	   struct tcphdr *, th, u32, th_len)
+{
+#ifdef CONFIG_SYN_COOKIES
+	u32 cookie;
+	u16 mss;
+
+	if (unlikely(th_len < sizeof(*th) || th_len != th->doff * 4))
+		return -EINVAL;
+
+	if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)
+		return -EINVAL;
+
+	if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)
+		return -ENOENT;
+
+	if (!th->syn || th->ack || th->fin || th->rst)
+		return -EINVAL;
+
+	if (unlikely(iph_len < sizeof(struct iphdr)))
+		return -EINVAL;
+
+	/* Both struct iphdr and struct ipv6hdr have the version field at the
+	 * same offset so we can cast to the shorter header (struct iphdr).
+	 */
+	switch (((struct iphdr *)iph)->version) {
+	case 4:
+		if (sk->sk_family == AF_INET6 && sk->sk_ipv6only)
+			return -EINVAL;
+
+		mss = tcp_v4_get_syncookie(sk, iph, th, &cookie);
+		break;
+
+#if IS_BUILTIN(CONFIG_IPV6)
+	case 6:
+		if (unlikely(iph_len < sizeof(struct ipv6hdr)))
+			return -EINVAL;
+
+		if (sk->sk_family != AF_INET6)
+			return -EINVAL;
+
+		mss = tcp_v6_get_syncookie(sk, iph, th, &cookie);
+		break;
+#endif /* CONFIG_IPV6 */
+
+	default:
+		return -EPROTONOSUPPORT;
+	}
+	if (mss <= 0)
+		return -ENOENT;
+
+	return cookie | ((u64)mss << 32);
+#else
+	return -ENOTSUPP;
+#endif /* CONFIG_SYN_COOKIES */
+}
+
+static const struct bpf_func_proto bpf_tcp_gen_syncookie_proto = {
+	.func		= bpf_tcp_gen_syncookie,
+	.gpl_only	= true, /* __cookie_v*_init_sequence() is GPL */
+	.pkt_access	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_SOCK_COMMON,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE,
+};
+
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -6135,6 +6204,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_check_syncookie_proto;
 	case BPF_FUNC_skb_ecn_set_ce:
 		return &bpf_skb_ecn_set_ce_proto;
+	case BPF_FUNC_tcp_gen_syncookie:
+		return &bpf_tcp_gen_syncookie_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
@@ -6174,6 +6245,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_skc_lookup_tcp_proto;
 	case BPF_FUNC_tcp_check_syncookie:
 		return &bpf_tcp_check_syncookie_proto;
+	case BPF_FUNC_tcp_gen_syncookie:
+		return &bpf_tcp_gen_syncookie_proto;
 #endif
 	default:
 		return bpf_base_func_proto(func_id);
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related

* [bpf-next 0/6] Introduce a BPF helper to generate SYN cookies
From: Petar Penkov @ 2019-07-23  0:20 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, edumazet, lmb, sdf, Petar Penkov

From: Petar Penkov <ppenkov@google.com>

This patch series introduces a BPF helper function that allows generating SYN
cookies from BPF. Currently, this helper is enabled at both the TC hook and the
XDP hook.

The first two patches in the series add/modify several TCP helper functions to
allow for SKB-less operation, as is the case at the XDP hook.

The third patch introduces the bpf_tcp_gen_syncookie helper function which
generates a SYN cookie for either XDP or TC programs. The return value of
this function contains both the MSS value, encoded in the cookie, and the
cookie itself.

The last three patches sync tools/ and add a test. 

Changes since RFC:
1/ Cookie is returned in host order at Alexei's suggestion
2/ If cookies are not enabled via a sysctl, the helper function returns
   -ENOENT instead of -EINVAL at Lorenz's suggestion
3/ Fixed documentation to properly reflect that MSS is 16 bits at
   Lorenz's suggestion
4/ BPF helper requires TCP length to match ->doff field, rather than to simply
   be no more than 20 bytes at Eric and Alexei's suggestion
5/ Packet type is looked up from the packet version field, rather than from the
   socket. v4 packets are rejected on v6-only sockets but should work with
   dual stack listeners at Eric's suggestion
6/ Removed unnecessary `net` argument from helper function in patch 2 at
   Lorenz's suggestion 
7/ Changed test to only pass MSS option so we can convince the verifier that the
   memory access is not out of bounds

Note that 7/ below illustrates the verifier might need to be extended to allow
passing a variable tcph->doff to the helper function like below:

__u32 thlen = tcph->doff * 4;
if (thlen < sizeof(*tcph))
	return;
__s64 cookie = bpf_tcp_gen_syncookie(sk, ipv4h, 20, tcph, thlen);

Petar Penkov (6):
  tcp: tcp_syn_flood_action read port from socket
  tcp: add skb-less helpers to retrieve SYN cookie
  bpf: add bpf_tcp_gen_syncookie helper
  bpf: sync bpf.h to tools/
  selftests/bpf: bpf_tcp_gen_syncookie->bpf_helpers
  selftests/bpf: add test for bpf_tcp_gen_syncookie

 include/net/tcp.h                             | 11 +++
 include/uapi/linux/bpf.h                      | 30 ++++++-
 net/core/filter.c                             | 73 ++++++++++++++++
 net/ipv4/tcp_input.c                          | 84 +++++++++++++++++--
 net/ipv4/tcp_ipv4.c                           |  8 ++
 net/ipv6/tcp_ipv6.c                           |  8 ++
 tools/include/uapi/linux/bpf.h                | 37 +++++++-
 tools/testing/selftests/bpf/bpf_helpers.h     |  3 +
 .../bpf/progs/test_tcp_check_syncookie_kern.c | 48 +++++++++--
 .../selftests/bpf/test_tcp_check_syncookie.sh |  3 +
 .../bpf/test_tcp_check_syncookie_user.c       | 61 ++++++++++++--
 11 files changed, 344 insertions(+), 22 deletions(-)

-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply

* bpf-next is OPEN
From: Daniel Borkmann @ 2019-07-22 23:57 UTC (permalink / raw)
  To: bpf; +Cc: netdev, alexei.starovoitov

Merge window is closed, therefore new round begins.

Thanks everyone,
Daniel

^ permalink raw reply

* [PATCH 0/3] introduce __put_user_pages(), convert a few call sites
From: john.hubbard @ 2019-07-22 22:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Björn Töpel, Boaz Harrosh,
	Christoph Hellwig, Daniel Vetter, Dan Williams, Dave Chinner,
	David Airlie, David S . Miller, Ilya Dryomov, Jan Kara,
	Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML, John Hubbard

From: John Hubbard <jhubbard@nvidia.com>

As discussed in [1] just now, this adds a more capable variation of
put_user_pages() to the API set, and uses it to simplify both the
main implementation, and (especially) the call sites.

Thanks to Christoph for the simplifying ideas, and Matthew for (again)
recommending an enum in the API. Matthew, I seem to recall you asked
for enums before this, so I'm sorry it took until now for me to add
them. :)

The new __put_user_pages() takes an enum that handles the various
combinations of needing to call set_page_dirty() or
set_page_dirty_lock(), before calling put_user_page().

I'm using the same CC list as in [1], even though IB is no longer
included in the series. That's everyone can see what the end result
turns out to be.

Notes about the remaining patches to come:

There are about 50+ patches in my tree [2], and I'll be sending out the
remaining ones in a few more groups:

    * The block/bio related changes (Jerome mostly wrote those, but I've
      had to move stuff around extensively, and add a little code)

    * mm/ changes

    * other subsystem patches

    * an RFC that shows the current state of the tracking patch set. That
      can only be applied after all call sites are converted, but it's
      good to get an early look at it.

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

[1] https://lore.kernel.org/r/20190722093355.GB29538@lst.de
[2] https://github.com/johnhubbard/linux/tree/gup_dma_core

John Hubbard (3):
  mm/gup: introduce __put_user_pages()
  drivers/gpu/drm/via: convert put_page() to put_user_page*()
  net/xdp: convert put_page() to put_user_page*()

 drivers/gpu/drm/via/via_dmablit.c |  11 +--
 include/linux/mm.h                |  58 +++++++++++-
 mm/gup.c                          | 149 +++++++++++++++---------------
 net/xdp/xdp_umem.c                |   9 +-
 4 files changed, 135 insertions(+), 92 deletions(-)

-- 
2.22.0


^ permalink raw reply

* [PATCH 2/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: john.hubbard @ 2019-07-22 22:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Björn Töpel, Boaz Harrosh,
	Christoph Hellwig, Daniel Vetter, Dan Williams, Dave Chinner,
	David Airlie, David S . Miller, Ilya Dryomov, Jan Kara,
	Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML, John Hubbard
In-Reply-To: <20190722223415.13269-1-jhubbard@nvidia.com>

From: John Hubbard <jhubbard@nvidia.com>

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Also reverse the order of a comparison, in order to placate
checkpatch.pl.

Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/drm/via/via_dmablit.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
index 062067438f1d..754f2bb97d61 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -171,7 +171,6 @@ via_map_blit_for_device(struct pci_dev *pdev,
 static void
 via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 {
-	struct page *page;
 	int i;
 
 	switch (vsg->state) {
@@ -186,13 +185,9 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
 		kfree(vsg->desc_pages);
 		/* fall through */
 	case dr_via_pages_locked:
-		for (i = 0; i < vsg->num_pages; ++i) {
-			if (NULL != (page = vsg->pages[i])) {
-				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
-					SetPageDirty(page);
-				put_page(page);
-			}
-		}
+		__put_user_pages(vsg->pages, vsg->num_pages,
+				 (vsg->direction == DMA_FROM_DEVICE) ?
+				 PUP_FLAGS_DIRTY : PUP_FLAGS_CLEAN);
 		/* fall through */
 	case dr_via_pages_alloc:
 		vfree(vsg->pages);
-- 
2.22.0


^ permalink raw reply related

* [PATCH 3/3] net/xdp: convert put_page() to put_user_page*()
From: john.hubbard @ 2019-07-22 22:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Björn Töpel, Boaz Harrosh,
	Christoph Hellwig, Daniel Vetter, Dan Williams, Dave Chinner,
	David Airlie, David S . Miller, Ilya Dryomov, Jan Kara,
	Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML, John Hubbard
In-Reply-To: <20190722223415.13269-1-jhubbard@nvidia.com>

From: John Hubbard <jhubbard@nvidia.com>

For pages that were retained via get_user_pages*(), release those pages
via the new put_user_page*() routines, instead of via put_page() or
release_pages().

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions").

Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 net/xdp/xdp_umem.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 83de74ca729a..0325a17915de 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -166,14 +166,7 @@ void xdp_umem_clear_dev(struct xdp_umem *umem)
 
 static void xdp_umem_unpin_pages(struct xdp_umem *umem)
 {
-	unsigned int i;
-
-	for (i = 0; i < umem->npgs; i++) {
-		struct page *page = umem->pgs[i];
-
-		set_page_dirty_lock(page);
-		put_page(page);
-	}
+	put_user_pages_dirty_lock(umem->pgs, umem->npgs);
 
 	kfree(umem->pgs);
 	umem->pgs = NULL;
-- 
2.22.0


^ permalink raw reply related

* [PATCH 1/3] mm/gup: introduce __put_user_pages()
From: john.hubbard @ 2019-07-22 22:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexander Viro, Björn Töpel, Boaz Harrosh,
	Christoph Hellwig, Daniel Vetter, Dan Williams, Dave Chinner,
	David Airlie, David S . Miller, Ilya Dryomov, Jan Kara,
	Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML, John Hubbard
In-Reply-To: <20190722223415.13269-1-jhubbard@nvidia.com>

From: John Hubbard <jhubbard@nvidia.com>

Add a more capable variation of put_user_pages() to the
API set, and call it from the simple ones.

The new __put_user_pages() takes an enum that handles the various
combinations of needing to call set_page_dirty() or
set_page_dirty_lock(), before calling put_user_page().

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/mm.h |  58 ++++++++++++++++++-
 mm/gup.c           | 137 ++++++++++++++++++++++-----------------------
 2 files changed, 124 insertions(+), 71 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0334ca97c584..7218585681b2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1057,8 +1057,62 @@ static inline void put_user_page(struct page *page)
 	put_page(page);
 }
 
-void put_user_pages_dirty(struct page **pages, unsigned long npages);
-void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
+enum pup_flags_t {
+	PUP_FLAGS_CLEAN		= 0,
+	PUP_FLAGS_DIRTY		= 1,
+	PUP_FLAGS_LOCK		= 2,
+	PUP_FLAGS_DIRTY_LOCK	= 3,
+};
+
+void __put_user_pages(struct page **pages, unsigned long npages,
+		      enum pup_flags_t flags);
+
+/**
+ * put_user_pages_dirty() - release and dirty an array of gup-pinned pages
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ * "gup-pinned page" refers to a page that has had one of the get_user_pages()
+ * variants called on that page.
+ *
+ * For each page in the @pages array, make that page (or its head page, if a
+ * compound page) dirty, if it was previously listed as clean. Then, release
+ * the page using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * set_page_dirty(), which does not lock the page, is used here.
+ * Therefore, it is the caller's responsibility to ensure that this is
+ * safe. If not, then put_user_pages_dirty_lock() should be called instead.
+ *
+ */
+static inline void put_user_pages_dirty(struct page **pages,
+					unsigned long npages)
+{
+	__put_user_pages(pages, npages, PUP_FLAGS_DIRTY);
+}
+
+/**
+ * put_user_pages_dirty_lock() - release and dirty an array of gup-pinned pages
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ * For each page in the @pages array, make that page (or its head page, if a
+ * compound page) dirty, if it was previously listed as clean. Then, release
+ * the page using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().
+ *
+ */
+static inline void put_user_pages_dirty_lock(struct page **pages,
+					     unsigned long npages)
+{
+	__put_user_pages(pages, npages, PUP_FLAGS_DIRTY_LOCK);
+}
+
 void put_user_pages(struct page **pages, unsigned long npages);
 
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab37bac..6831ef064d76 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,87 +29,86 @@ struct follow_page_context {
 	unsigned int page_mask;
 };
 
-typedef int (*set_dirty_func_t)(struct page *page);
-
-static void __put_user_pages_dirty(struct page **pages,
-				   unsigned long npages,
-				   set_dirty_func_t sdf)
-{
-	unsigned long index;
-
-	for (index = 0; index < npages; index++) {
-		struct page *page = compound_head(pages[index]);
-
-		/*
-		 * Checking PageDirty at this point may race with
-		 * clear_page_dirty_for_io(), but that's OK. Two key cases:
-		 *
-		 * 1) This code sees the page as already dirty, so it skips
-		 * the call to sdf(). That could happen because
-		 * clear_page_dirty_for_io() called page_mkclean(),
-		 * followed by set_page_dirty(). However, now the page is
-		 * going to get written back, which meets the original
-		 * intention of setting it dirty, so all is well:
-		 * clear_page_dirty_for_io() goes on to call
-		 * TestClearPageDirty(), and write the page back.
-		 *
-		 * 2) This code sees the page as clean, so it calls sdf().
-		 * The page stays dirty, despite being written back, so it
-		 * gets written back again in the next writeback cycle.
-		 * This is harmless.
-		 */
-		if (!PageDirty(page))
-			sdf(page);
-
-		put_user_page(page);
-	}
-}
-
 /**
- * put_user_pages_dirty() - release and dirty an array of gup-pinned pages
+ * __put_user_pages() - release an array of gup-pinned pages.
  * @pages:  array of pages to be marked dirty and released.
  * @npages: number of pages in the @pages array.
+ * @flags: additional hints, to be applied to each page:
  *
- * "gup-pinned page" refers to a page that has had one of the get_user_pages()
- * variants called on that page.
+ *	PUP_FLAGS_CLEAN: no additional steps required. (Consider calling
+ *			 put_user_pages() directly, instead.)
  *
- * For each page in the @pages array, make that page (or its head page, if a
- * compound page) dirty, if it was previously listed as clean. Then, release
- * the page using put_user_page().
+ *	PUP_FLAGS_DIRTY: Call set_page_dirty() on the page (if not already
+ *			 dirty).
  *
- * Please see the put_user_page() documentation for details.
+ *      PUP_FLAGS_LOCK: meaningless by itself, but included in order to show
+ *			the numeric relationship between the flags.
  *
- * set_page_dirty(), which does not lock the page, is used here.
- * Therefore, it is the caller's responsibility to ensure that this is
- * safe. If not, then put_user_pages_dirty_lock() should be called instead.
+ *      PUP_FLAGS_DIRTY_LOCK: Call set_page_dirty_lock() on the page (if not
+ *			      already dirty).
  *
+ * For each page in the @pages array, release the page using put_user_page().
  */
-void put_user_pages_dirty(struct page **pages, unsigned long npages)
+void __put_user_pages(struct page **pages, unsigned long npages,
+		      enum pup_flags_t flags)
 {
-	__put_user_pages_dirty(pages, npages, set_page_dirty);
-}
-EXPORT_SYMBOL(put_user_pages_dirty);
+	unsigned long index;
 
-/**
- * put_user_pages_dirty_lock() - release and dirty an array of gup-pinned pages
- * @pages:  array of pages to be marked dirty and released.
- * @npages: number of pages in the @pages array.
- *
- * For each page in the @pages array, make that page (or its head page, if a
- * compound page) dirty, if it was previously listed as clean. Then, release
- * the page using put_user_page().
- *
- * Please see the put_user_page() documentation for details.
- *
- * This is just like put_user_pages_dirty(), except that it invokes
- * set_page_dirty_lock(), instead of set_page_dirty().
- *
- */
-void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
-{
-	__put_user_pages_dirty(pages, npages, set_page_dirty_lock);
+	/*
+	 * TODO: this can be optimized for huge pages: if a series of pages is
+	 * physically contiguous and part of the same compound page, then a
+	 * single operation to the head page should suffice.
+	 */
+
+	for (index = 0; index < npages; index++) {
+		struct page *page = compound_head(pages[index]);
+
+		switch (flags) {
+		case PUP_FLAGS_CLEAN:
+			break;
+
+		case PUP_FLAGS_DIRTY:
+			/*
+			 * Checking PageDirty at this point may race with
+			 * clear_page_dirty_for_io(), but that's OK. Two key
+			 * cases:
+			 *
+			 * 1) This code sees the page as already dirty, so it
+			 * skips the call to set_page_dirty(). That could happen
+			 * because clear_page_dirty_for_io() called
+			 * page_mkclean(), followed by set_page_dirty().
+			 * However, now the page is going to get written back,
+			 * which meets the original intention of setting it
+			 * dirty, so all is well: clear_page_dirty_for_io() goes
+			 * on to call TestClearPageDirty(), and write the page
+			 * back.
+			 *
+			 * 2) This code sees the page as clean, so it calls
+			 * set_page_dirty(). The page stays dirty, despite being
+			 * written back, so it gets written back again in the
+			 * next writeback cycle. This is harmless.
+			 */
+			if (!PageDirty(page))
+				set_page_dirty(page);
+			break;
+
+		case PUP_FLAGS_LOCK:
+			VM_WARN_ON_ONCE(flags == PUP_FLAGS_LOCK);
+			/*
+			 * Shouldn't happen, but treat it as _DIRTY_LOCK if
+			 * it does: fall through.
+			 */
+
+		case PUP_FLAGS_DIRTY_LOCK:
+			/* Same comments as for PUP_FLAGS_DIRTY apply here. */
+			if (!PageDirty(page))
+				set_page_dirty_lock(page);
+			break;
+		};
+		put_user_page(page);
+	}
 }
-EXPORT_SYMBOL(put_user_pages_dirty_lock);
+EXPORT_SYMBOL(__put_user_pages);
 
 /**
  * put_user_pages() - release an array of gup-pinned pages.
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Song Liu @ 2019-07-22 20:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kees Cook, linux-security@vger.kernel.org, Networking, bpf,
	Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
	Jann Horn, Greg KH, Linux API
In-Reply-To: <CALCETrXTta26CTtEDnzvtd03-WOGdXcnsAogP8JjLkcj4-mHvg@mail.gmail.com>

Hi Andy, Lorenz, and all, 

> On Jul 2, 2019, at 2:32 PM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Tue, Jul 2, 2019 at 2:04 PM Kees Cook <keescook@chromium.org> wrote:
>> 
>> On Mon, Jul 01, 2019 at 06:59:13PM -0700, Andy Lutomirski wrote:
>>> I think I'm understanding your motivation.  You're not trying to make
>>> bpf() generically usable without privilege -- you're trying to create
>>> a way to allow certain users to access dangerous bpf functionality
>>> within some limits.
>>> 
>>> That's a perfectly fine goal, but I think you're reinventing the
>>> wheel, and the wheel you're reinventing is quite complicated and
>>> already exists.  I think you should teach bpftool to be secure when
>>> installed setuid root or with fscaps enabled and put your policy in
>>> bpftool.  If you want to harden this a little bit, it would seem
>>> entirely reasonable to add a new CAP_BPF_ADMIN and change some, but
>>> not all, of the capable() checks to check CAP_BPF_ADMIN instead of the
>>> capabilities that they currently check.
>> 
>> If finer grained controls are wanted, it does seem like the /dev/bpf
>> path makes the most sense. open, request abilities, use fd. The open can
>> be mediated by DAC and LSM. The request can be mediated by LSM. This
>> provides a way to add policy at the LSM level and at the tool level.
>> (i.e. For tool-level controls: leave LSM wide open, make /dev/bpf owned
>> by "bpfadmin" and bpftool becomes setuid "bpfadmin". For fine-grained
>> controls, leave /dev/bpf wide open and add policy to SELinux, etc.)
>> 
>> With only a new CAP, you don't get the fine-grained controls. (The
>> "request abilities" part is the key there.)
> 
> Sure you do: the effective set.  It has somewhat bizarre defaults, but
> I don't think that's a real problem.  Also, this wouldn't be like
> CAP_DAC_READ_SEARCH -- you can't accidentally use your BPF caps.
> 
> I think that a /dev capability-like object isn't totally nuts, but I
> think we should do it well, and this patch doesn't really achieve
> that.  But I don't think bpf wants fine-grained controls like this at
> all -- as I pointed upthread, a fine-grained solution really wants
> different treatment for the different capable() checks, and a bunch of
> them won't resemble capabilities or /dev/bpf at all.

With 5.3-rc1 out, I am back on this. :)

How about we modify the set as:
  1. Introduce sys_bpf_with_cap() that takes fd of /dev/bpf. 
  2. Better handling of capable() calls through bpf code. I guess the
     biggest problem here is is_priv in verifier.c:bpf_check(). 

With this approach, we will be able to pass the fd around, so it should 
also solve problem for Go. 

Please let me know your comments/suggestions on this direction. 

Thanks,
Song


^ permalink raw reply

* Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-07-22 19:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, john.hubbard, Andrew Morton, Alexander Viro,
	Björn Töpel, Boaz Harrosh, Daniel Vetter, Dan Williams,
	Dave Chinner, David Airlie, David S . Miller, Ilya Dryomov,
	Jan Kara, Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Miklos Szeredi, Ming Lei,
	Sage Weil, Santosh Shilimkar, Yan Zheng, netdev, dri-devel,
	linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <20190722190722.GF363@bombadil.infradead.org>

On 7/22/19 12:07 PM, Matthew Wilcox wrote:
> On Mon, Jul 22, 2019 at 11:53:54AM -0700, John Hubbard wrote:
>> On 7/22/19 2:33 AM, Christoph Hellwig wrote:
>>> On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubbard@gmail.com wrote:
>>>>  		for (i = 0; i < vsg->num_pages; ++i) {
>>>>  			if (NULL != (page = vsg->pages[i])) {
>>>>  				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
>>>> -					SetPageDirty(page);
>>>> -				put_page(page);
>>>> +					put_user_pages_dirty(&page, 1);
>>>> +				else
>>>> +					put_user_page(page);
>>>>  			}
>>>
>>> Can't just pass a dirty argument to put_user_pages?  Also do we really
>>
>> Yes, and in fact that would help a lot more than the single page case,
>> which is really just cosmetic after all.
>>
>>> need a separate put_user_page for the single page case?
>>> put_user_pages_dirty?
>>
>> Not really. I'm still zeroing in on the ideal API for all these call sites,
>> and I agree that the approach below is cleaner.
> 
> so enum { CLEAN = 0, DIRTY = 1, LOCK = 2, DIRTY_LOCK = 3 };
> ?
> 

Sure. In fact, I just applied something similar to bio_release_pages()
locally, in order to reconcile Christoph's and Jerome's approaches 
(they each needed to add a bool arg), so I'm all about the enums in the
arg lists. :)

I'm going to post that one shortly, let's see how it goes over. heh.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: Matthew Wilcox @ 2019-07-22 19:07 UTC (permalink / raw)
  To: John Hubbard
  Cc: Christoph Hellwig, john.hubbard, Andrew Morton, Alexander Viro,
	Björn Töpel, Boaz Harrosh, Daniel Vetter, Dan Williams,
	Dave Chinner, David Airlie, David S . Miller, Ilya Dryomov,
	Jan Kara, Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Miklos Szeredi, Ming Lei,
	Sage Weil, Santosh Shilimkar, Yan Zheng, netdev, dri-devel,
	linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <397ff3e4-e857-037a-1aee-ff6242e024b2@nvidia.com>

On Mon, Jul 22, 2019 at 11:53:54AM -0700, John Hubbard wrote:
> On 7/22/19 2:33 AM, Christoph Hellwig wrote:
> > On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubbard@gmail.com wrote:
> >>  		for (i = 0; i < vsg->num_pages; ++i) {
> >>  			if (NULL != (page = vsg->pages[i])) {
> >>  				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
> >> -					SetPageDirty(page);
> >> -				put_page(page);
> >> +					put_user_pages_dirty(&page, 1);
> >> +				else
> >> +					put_user_page(page);
> >>  			}
> > 
> > Can't just pass a dirty argument to put_user_pages?  Also do we really
> 
> Yes, and in fact that would help a lot more than the single page case,
> which is really just cosmetic after all.
> 
> > need a separate put_user_page for the single page case?
> > put_user_pages_dirty?
> 
> Not really. I'm still zeroing in on the ideal API for all these call sites,
> and I agree that the approach below is cleaner.

so enum { CLEAN = 0, DIRTY = 1, LOCK = 2, DIRTY_LOCK = 3 };
?

^ permalink raw reply

* Re: [PATCH 3/3] gup: new put_user_page_dirty*() helpers
From: John Hubbard @ 2019-07-22 19:05 UTC (permalink / raw)
  To: john.hubbard, Andrew Morton
  Cc: Alexander Viro, Björn Töpel, Boaz Harrosh,
	Christoph Hellwig, Daniel Vetter, Dan Williams, Dave Chinner,
	David Airlie, David S . Miller, Ilya Dryomov, Jan Kara,
	Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <20190722043012.22945-4-jhubbard@nvidia.com>

On 7/21/19 9:30 PM, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> While converting call sites to use put_user_page*() [1], quite a few
> places ended up needing a single-page routine to put and dirty a
> page.
> 
> Provide put_user_page_dirty() and put_user_page_dirty_lock(),
> and use them in a few places: net/xdp, drm/via/, drivers/infiniband.
> 

Please disregard this one, I'm going to drop it, as per the
discussion in patch 1.

thanks,
-- 
John Hubbard
NVIDIA

> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/drm/via/via_dmablit.c        |  2 +-
>  drivers/infiniband/core/umem.c           |  2 +-
>  drivers/infiniband/hw/usnic/usnic_uiom.c |  2 +-
>  include/linux/mm.h                       | 10 ++++++++++
>  net/xdp/xdp_umem.c                       |  2 +-
>  5 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
> index 219827ae114f..d30b2d75599f 100644
> --- a/drivers/gpu/drm/via/via_dmablit.c
> +++ b/drivers/gpu/drm/via/via_dmablit.c
> @@ -189,7 +189,7 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t *vsg)
>  		for (i = 0; i < vsg->num_pages; ++i) {
>  			if (NULL != (page = vsg->pages[i])) {
>  				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
> -					put_user_pages_dirty(&page, 1);
> +					put_user_page_dirty(page);
>  				else
>  					put_user_page(page);
>  			}
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 08da840ed7ee..a7337cc3ca20 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -55,7 +55,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>  	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
>  		page = sg_page_iter_page(&sg_iter);
>  		if (umem->writable && dirty)
> -			put_user_pages_dirty_lock(&page, 1);
> +			put_user_page_dirty_lock(page);
>  		else
>  			put_user_page(page);
>  	}
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index 0b0237d41613..d2ded624fb2a 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -76,7 +76,7 @@ static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty)
>  			page = sg_page(sg);
>  			pa = sg_phys(sg);
>  			if (dirty)
> -				put_user_pages_dirty_lock(&page, 1);
> +				put_user_page_dirty_lock(page);
>  			else
>  				put_user_page(page);
>  			usnic_dbg("pa: %pa\n", &pa);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0334ca97c584..c0584c6d9d78 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1061,6 +1061,16 @@ void put_user_pages_dirty(struct page **pages, unsigned long npages);
>  void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
>  void put_user_pages(struct page **pages, unsigned long npages);
>  
> +static inline void put_user_page_dirty(struct page *page)
> +{
> +	put_user_pages_dirty(&page, 1);
> +}
> +
> +static inline void put_user_page_dirty_lock(struct page *page)
> +{
> +	put_user_pages_dirty_lock(&page, 1);
> +}
> +
>  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define SECTION_IN_PAGE_FLAGS
>  #endif
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 9cbbb96c2a32..1d122e52c6de 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -171,7 +171,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>  	for (i = 0; i < umem->npgs; i++) {
>  		struct page *page = umem->pgs[i];
>  
> -		put_user_pages_dirty_lock(&page, 1);
> +		put_user_page_dirty_lock(page);
>  	}
>  
>  	kfree(umem->pgs);
> 

^ permalink raw reply

* Re: [PATCH 1/3] drivers/gpu/drm/via: convert put_page() to put_user_page*()
From: John Hubbard @ 2019-07-22 18:53 UTC (permalink / raw)
  To: Christoph Hellwig, john.hubbard
  Cc: Andrew Morton, Alexander Viro, Björn Töpel,
	Boaz Harrosh, Daniel Vetter, Dan Williams, Dave Chinner,
	David Airlie, David S . Miller, Ilya Dryomov, Jan Kara,
	Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML
In-Reply-To: <20190722093355.GB29538@lst.de>

On 7/22/19 2:33 AM, Christoph Hellwig wrote:
> On Sun, Jul 21, 2019 at 09:30:10PM -0700, john.hubbard@gmail.com wrote:
>>  		for (i = 0; i < vsg->num_pages; ++i) {
>>  			if (NULL != (page = vsg->pages[i])) {
>>  				if (!PageReserved(page) && (DMA_FROM_DEVICE == vsg->direction))
>> -					SetPageDirty(page);
>> -				put_page(page);
>> +					put_user_pages_dirty(&page, 1);
>> +				else
>> +					put_user_page(page);
>>  			}
> 
> Can't just pass a dirty argument to put_user_pages?  Also do we really

Yes, and in fact that would help a lot more than the single page case,
which is really just cosmetic after all.

> need a separate put_user_page for the single page case?
> put_user_pages_dirty?

Not really. I'm still zeroing in on the ideal API for all these call sites,
and I agree that the approach below is cleaner.

> 
> Also the PageReserved check looks bogus, as I can't see how a reserved
> page can end up here.  So IMHO the above snippled should really look
> something like this:
> 
> 	put_user_pages(vsg->pages[i], vsg->num_pages,
> 			vsg->direction == DMA_FROM_DEVICE);
> 
> in the end.
> 

Agreed.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH bpf v4 00/14] sockmap/tls fixes
From: John Fastabend @ 2019-07-22 15:48 UTC (permalink / raw)
  To: Daniel Borkmann, Jakub Kicinski, john.fastabend,
	alexei.starovoitov
  Cc: edumazet, netdev, bpf, oss-drivers
In-Reply-To: <3c97d252-37ad-302f-b917-e7ea6e819318@iogearbox.net>

Daniel Borkmann wrote:
> On 7/19/19 7:37 PM, Jakub Kicinski wrote:
> > On Fri, 19 Jul 2019 10:29:13 -0700, Jakub Kicinski wrote:
> >> John says:
> >>
> >> Resolve a series of splats discovered by syzbot and an unhash
> >> TLS issue noted by Eric Dumazet.
> > 
> > Sorry for the delay, this code is quite tricky. According to my testing
> > TLS SW and HW should now work, I hope I didn't regress things on the
> > sockmap side.
> 
> Applied, thanks everyone!

Thanks Jakub, for the patches without my signed-off already

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* Re: [PATCH bpf v4 00/14] sockmap/tls fixes
From: John Fastabend @ 2019-07-22 15:46 UTC (permalink / raw)
  To: Jakub Kicinski, john.fastabend, alexei.starovoitov, daniel
  Cc: edumazet, netdev, bpf, oss-drivers
In-Reply-To: <20190719103721.558d9e7d@cakuba.netronome.com>

Jakub Kicinski wrote:
> On Fri, 19 Jul 2019 10:29:13 -0700, Jakub Kicinski wrote:
> > John says:
> > 
> > Resolve a series of splats discovered by syzbot and an unhash
> > TLS issue noted by Eric Dumazet.
> 
> Sorry for the delay, this code is quite tricky. According to my testing
> TLS SW and HW should now work, I hope I didn't regress things on the
> sockmap side.

I'll run it through our CI as well but looks good to me. Thanks a lot
for getting this finished up.

> 
> This is not solving all the issues (ugh), apart from HW needing the
> unhash/shutdown treatment, as discussed we may have a sender stuck in
> wmem wait while we free context underneath. That's a "minor" UAF for
> another day..

Agreed. But this should solve most of the syzbot issues and a few
crashes we saw in testing real workloads.

Thanks,
John

^ permalink raw reply

* Re: [PATCH bpf] selftests/bpf: fix sendmsg6_prog on s390
From: Daniel Borkmann @ 2019-07-22 14:23 UTC (permalink / raw)
  To: Ilya Leoshkevich, bpf, netdev; +Cc: gor, heiko.carstens, rdna
In-Reply-To: <20190719090611.91743-1-iii@linux.ibm.com>

On 7/19/19 11:06 AM, Ilya Leoshkevich wrote:
> "sendmsg6: rewrite IP & port (C)" fails on s390, because the code in
> sendmsg_v6_prog() assumes that (ctx->user_ip6[0] & 0xFFFF) refers to
> leading IPv6 address digits, which is not the case on big-endian
> machines.
> 
> Since checking bitwise operations doesn't seem to be the point of the
> test, replace two short comparisons with a single int comparison.
> 
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>

Applied, thanks!

^ permalink raw reply

* Re: [GIT PULL 0/2] libbpf build fixes
From: Daniel Borkmann @ 2019-07-22 14:23 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Alexei Starovoitov
  Cc: Clark Williams, bpf, netdev, Adrian Hunter, Andrii Nakryiko,
	Ingo Molnar, Jiri Olsa, Namhyung Kim
In-Reply-To: <20190719143407.20847-1-acme@kernel.org>

On 7/19/19 4:34 PM, Arnaldo Carvalho de Melo wrote:
> Hi Daniel,
> 
> 	Please consider pulling or applying from the patches, if someone
> has any issues, please holler,
> 
> - Arnaldo
> 
> Arnaldo Carvalho de Melo (2):
>   libbpf: Fix endianness macro usage for some compilers
>   libbpf: Avoid designated initializers for unnamed union members
> 
>  tools/lib/bpf/btf.c    |  5 +++--
>  tools/lib/bpf/libbpf.c | 19 ++++++++++---------
>  2 files changed, 13 insertions(+), 11 deletions(-)
> 

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf v4 00/14] sockmap/tls fixes
From: Daniel Borkmann @ 2019-07-22 14:22 UTC (permalink / raw)
  To: Jakub Kicinski, john.fastabend, alexei.starovoitov
  Cc: edumazet, netdev, bpf, oss-drivers
In-Reply-To: <20190719103721.558d9e7d@cakuba.netronome.com>

On 7/19/19 7:37 PM, Jakub Kicinski wrote:
> On Fri, 19 Jul 2019 10:29:13 -0700, Jakub Kicinski wrote:
>> John says:
>>
>> Resolve a series of splats discovered by syzbot and an unhash
>> TLS issue noted by Eric Dumazet.
> 
> Sorry for the delay, this code is quite tricky. According to my testing
> TLS SW and HW should now work, I hope I didn't regress things on the
> sockmap side.

Applied, thanks everyone!

^ permalink raw reply

* Re: [PATCH 2/3] net/xdp: convert put_page() to put_user_page*()
From: Christoph Hellwig @ 2019-07-22  9:34 UTC (permalink / raw)
  To: john.hubbard
  Cc: Andrew Morton, Alexander Viro, Björn Töpel,
	Boaz Harrosh, Christoph Hellwig, Daniel Vetter, Dan Williams,
	Dave Chinner, David Airlie, David S . Miller, Ilya Dryomov,
	Jan Kara, Jason Gunthorpe, Jens Axboe, Jérôme Glisse,
	Johannes Thumshirn, Magnus Karlsson, Matthew Wilcox,
	Miklos Szeredi, Ming Lei, Sage Weil, Santosh Shilimkar, Yan Zheng,
	netdev, dri-devel, linux-mm, linux-rdma, bpf, LKML, John Hubbard
In-Reply-To: <20190722043012.22945-3-jhubbard@nvidia.com>

> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 83de74ca729a..9cbbb96c2a32 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -171,8 +171,7 @@ static void xdp_umem_unpin_pages(struct xdp_umem *umem)
>  	for (i = 0; i < umem->npgs; i++) {
>  		struct page *page = umem->pgs[i];
>  
> -		set_page_dirty_lock(page);
> -		put_page(page);
> +		put_user_pages_dirty_lock(&page, 1);

Same here, we really should avoid the need for the loop here and
do the looping inside the helper.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox