[bpf PATCH 0/3] sockmap error path fixes

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

* [bpf PATCH 0/3] sockmap error path fixes
@ 2018-05-02 17:47 John Fastabend
  2018-05-02 17:47 ` [bpf PATCH 1/3] bpf: sockmap, fix scatterlist update on error path in send with apply John Fastabend
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: John Fastabend @ 2018-05-02 17:47 UTC (permalink / raw)
  To: borkmann, ast; +Cc: netdev

When I added the test_sockmap to selftests I mistakenly changed the
test logic a bit. The result of this was on redirect cases we ended up
choosing the wrong sock from the BPF program and ended up sending to a
socket that had no receive handler. The result was the actual receive
handler, running on a different socket, is timing out and closing the
socket. This results in errors (-EPIPE to be specific) on the sending
side. Typically happening if the sender does not complete the send
before the receive side times out. So depending on timing and the size
of the send we may get errors. This exposed some bugs in the sockmap
error path handling.

This series fixes the errors. The primary issue is we did not do proper
memory accounting in these cases which resulted in missing a
sk_mem_uncharge(). This happened in the redirect path and in one case
on the normal send path. See the three patches for the details.

The other take-away from this is we need to fix the test_sockmap and
also add more negative test cases. That will happen in bpf-next.

Finally, I tested this using the existing test_sockmap program, the
older sockmap sample test script, and a few real use cases with
Cilium. All of these seem to be in working correctly.

---

John Fastabend (3):
      bpf: sockmap, fix scatterlist update on error path in send with apply
      bpf: sockmap, zero sg_size on error when buffer is released
      bpf: sockmap, fix error handling in redirect failures

 kernel/bpf/sockmap.c |   46 +++++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 21 deletions(-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [bpf PATCH 1/3] bpf: sockmap, fix scatterlist update on error path in send with apply
  2018-05-02 17:47 [bpf PATCH 0/3] sockmap error path fixes John Fastabend
@ 2018-05-02 17:47 ` John Fastabend
  2018-05-02 18:51   ` David Miller
  2018-05-02 17:47 ` [bpf PATCH 2/3] bpf: sockmap, zero sg_size on error when buffer is released John Fastabend
  2018-05-02 17:47 ` [bpf PATCH 3/3] bpf: sockmap, fix error handling in redirect failures John Fastabend
  2 siblings, 1 reply; 8+ messages in thread
From: John Fastabend @ 2018-05-02 17:47 UTC (permalink / raw)
  To: borkmann, ast; +Cc: netdev

When the call to do_tcp_sendpage() fails to send the complete block
requested we either retry if only a partial send was completed or
abort if we receive a error less than or equal to zero. Before
returning though we must update the scatterlist length/offset to
account for any partial send completed.

Before this patch we did this at the end of the retry loop, but
this was buggy when used while applying a verdict to fewer bytes
than in the scatterlist. When the scatterlist length was being set
we forgot to account for the apply logic reducing the size variable.
So the result was we chopped off some bytes in the scatterlist without
doing proper cleanup on them. This results in a WARNING when the
sock is tore down because the bytes have previously been charged to
the socket but are never uncharged.

The simple fix is to simply do the accounting inside the retry loop
subtracting from the absolute scatterlist values rather than trying
to accumulate the totals and subtract at the end.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 634415c..943929a 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -326,6 +326,9 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
 			if (ret > 0) {
 				if (apply)
 					apply_bytes -= ret;
+
+				sg->offset += ret;
+				sg->length -= ret;
 				size -= ret;
 				offset += ret;
 				if (uncharge)
@@ -333,8 +336,6 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
 				goto retry;
 			}

-			sg->length = size;
-			sg->offset = offset;
 			return ret;
 		}

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [bpf PATCH 1/3] bpf: sockmap, fix scatterlist update on error path in send with apply
  2018-05-02 17:47 ` [bpf PATCH 1/3] bpf: sockmap, fix scatterlist update on error path in send with apply John Fastabend
@ 2018-05-02 18:51   ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2018-05-02 18:51 UTC (permalink / raw)
  To: john.fastabend; +Cc: borkmann, ast, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 02 May 2018 10:47:27 -0700

> When the call to do_tcp_sendpage() fails to send the complete block
> requested we either retry if only a partial send was completed or
> abort if we receive a error less than or equal to zero. Before
> returning though we must update the scatterlist length/offset to
> account for any partial send completed.
> 
> Before this patch we did this at the end of the retry loop, but
> this was buggy when used while applying a verdict to fewer bytes
> than in the scatterlist. When the scatterlist length was being set
> we forgot to account for the apply logic reducing the size variable.
> So the result was we chopped off some bytes in the scatterlist without
> doing proper cleanup on them. This results in a WARNING when the
> sock is tore down because the bytes have previously been charged to
> the socket but are never uncharged.
> 
> The simple fix is to simply do the accounting inside the retry loop
> subtracting from the absolute scatterlist values rather than trying
> to accumulate the totals and subtract at the end.
> 
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [bpf PATCH 2/3] bpf: sockmap, zero sg_size on error when buffer is released
  2018-05-02 17:47 [bpf PATCH 0/3] sockmap error path fixes John Fastabend
  2018-05-02 17:47 ` [bpf PATCH 1/3] bpf: sockmap, fix scatterlist update on error path in send with apply John Fastabend
@ 2018-05-02 17:47 ` John Fastabend
  2018-05-02 18:51   ` David Miller
  2018-05-02 17:47 ` [bpf PATCH 3/3] bpf: sockmap, fix error handling in redirect failures John Fastabend
  2 siblings, 1 reply; 8+ messages in thread
From: John Fastabend @ 2018-05-02 17:47 UTC (permalink / raw)
  To: borkmann, ast; +Cc: netdev

When an error occurs during a redirect we have two cases that need
to be handled (i) we have a cork'ed buffer (ii) we have a normal
sendmsg buffer.

In the cork'ed buffer case we don't currently support recovering from
errors in a redirect action. So the buffer is released and the error
should _not_ be pushed back to the caller of sendmsg/sendpage. The
rationale here is the user will get an error that relates to old
data that may have been sent by some arbitrary thread on that sock.
Instead we simple consume the data and tell the user that the data
has been consumed. We may add proper error recovery in the future.
However, this patch fixes a bug where the bytes outstanding counter
sg_size was not zeroed. This could result in a case where if the user
has both a cork'ed action and apply action in progress we may
incorrectly call into the BPF program when the user expected an
old verdict to be applied via the apply action. I don't have a use
case where using apply and cork at the same time is valid but we
never explicitly reject it because it should work fine. This patch
ensures the sg_size is zeroed so we don't have this case.

In the normal sendmsg buffer case (no cork data) we also do not
zero sg_size. Again this can confuse the apply logic when the logic
calls into the BPF program when the BPF programmer expected the old
verdict to remain. So ensure we set sg_size to zero here as well. And
additionally to keep the psock state in-sync with the sk_msg_buff
release all the memory as well. Previously we did this before
returning to the user but this left a gap where psock and sk_msg_buff
states were out of sync which seems fragile. No additional overhead
is taken here except for a call to check the length and realize its
already been freed. This is in the error path as well so in my
opinion lets have robust code over optimized error paths.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 943929a..052c313 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -701,15 +701,22 @@ static int bpf_exec_tx_verdict(struct smap_psock *psock,
 		err = bpf_tcp_sendmsg_do_redirect(redir, send, m, flags);
 		lock_sock(sk);

+		if (unlikely(err < 0)) {
+			free_start_sg(sk, m);
+			psock->sg_size = 0;
+			if (!cork)
+				*copied -= send;
+		} else {
+			psock->sg_size -= send;
+		}
+
 		if (cork) {
 			free_start_sg(sk, m);
+			psock->sg_size = 0;
 			kfree(m);
 			m = NULL;
+			err = 0;
 		}
-		if (unlikely(err))
-			*copied -= err;
-		else
-			psock->sg_size -= send;
 		break;
 	case __SK_DROP:
 	default:

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [bpf PATCH 2/3] bpf: sockmap, zero sg_size on error when buffer is released
  2018-05-02 17:47 ` [bpf PATCH 2/3] bpf: sockmap, zero sg_size on error when buffer is released John Fastabend
@ 2018-05-02 18:51   ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2018-05-02 18:51 UTC (permalink / raw)
  To: john.fastabend; +Cc: borkmann, ast, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 02 May 2018 10:47:32 -0700

> When an error occurs during a redirect we have two cases that need
> to be handled (i) we have a cork'ed buffer (ii) we have a normal
> sendmsg buffer.
> 
> In the cork'ed buffer case we don't currently support recovering from
> errors in a redirect action. So the buffer is released and the error
> should _not_ be pushed back to the caller of sendmsg/sendpage. The
> rationale here is the user will get an error that relates to old
> data that may have been sent by some arbitrary thread on that sock.
> Instead we simple consume the data and tell the user that the data
> has been consumed. We may add proper error recovery in the future.
> However, this patch fixes a bug where the bytes outstanding counter
> sg_size was not zeroed. This could result in a case where if the user
> has both a cork'ed action and apply action in progress we may
> incorrectly call into the BPF program when the user expected an
> old verdict to be applied via the apply action. I don't have a use
> case where using apply and cork at the same time is valid but we
> never explicitly reject it because it should work fine. This patch
> ensures the sg_size is zeroed so we don't have this case.
> 
> In the normal sendmsg buffer case (no cork data) we also do not
> zero sg_size. Again this can confuse the apply logic when the logic
> calls into the BPF program when the BPF programmer expected the old
> verdict to remain. So ensure we set sg_size to zero here as well. And
> additionally to keep the psock state in-sync with the sk_msg_buff
> release all the memory as well. Previously we did this before
> returning to the user but this left a gap where psock and sk_msg_buff
> states were out of sync which seems fragile. No additional overhead
> is taken here except for a call to check the length and realize its
> already been freed. This is in the error path as well so in my
> opinion lets have robust code over optimized error paths.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [bpf PATCH 3/3] bpf: sockmap, fix error handling in redirect failures
  2018-05-02 17:47 [bpf PATCH 0/3] sockmap error path fixes John Fastabend
  2018-05-02 17:47 ` [bpf PATCH 1/3] bpf: sockmap, fix scatterlist update on error path in send with apply John Fastabend
  2018-05-02 17:47 ` [bpf PATCH 2/3] bpf: sockmap, zero sg_size on error when buffer is released John Fastabend
@ 2018-05-02 17:47 ` John Fastabend
  2018-05-02 18:51   ` David Miller
  2018-05-02 19:34   ` Alexei Starovoitov
  2 siblings, 2 replies; 8+ messages in thread
From: John Fastabend @ 2018-05-02 17:47 UTC (permalink / raw)
  To: borkmann, ast; +Cc: netdev

When a redirect failure happens we release the buffers in-flight
without calling a sk_mem_uncharge(), the uncharge is called before
dropping the sock lock for the redirecte, however we missed updating
the ring start index. When no apply actions are in progress this
is OK because we uncharge the entire buffer before the redirect.
But, when we have apply logic running its possible that only a
portion of the buffer is being redirected. In this case we only
do memory accounting for the buffer slice being redirected and
expect to be able to loop over the BPF program again and/or if
a sock is closed uncharge the memory at sock destruct time.

With an invalid start index however the program logic looks at
the start pointer index, checks the length, and when seeing the
length is zero (from the initial release and failure to update
the pointer) aborts without uncharging/releasing the remaining
memory.

The fix for this is simply to update the start index. To avoid
fixing this error in two locations we do a small refactor and
remove one case where it is open-coded. Then fix it in the
single function.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |   26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 052c313..7e3c4cd 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -393,7 +393,8 @@ static void return_mem_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
 	} while (i != md->sg_end);
 }
 
-static void free_bytes_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
+static void free_bytes_sg(struct sock *sk, int bytes,
+			  struct sk_msg_buff *md, bool charge)
 {
 	struct scatterlist *sg = md->sg_data;
 	int i = md->sg_start, free;
@@ -403,11 +404,13 @@ static void free_bytes_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
 		if (bytes < free) {
 			sg[i].length -= bytes;
 			sg[i].offset += bytes;
-			sk_mem_uncharge(sk, bytes);
+			if (charge)
+				sk_mem_uncharge(sk, bytes);
 			break;
 		}
 
-		sk_mem_uncharge(sk, sg[i].length);
+		if (charge)
+			sk_mem_uncharge(sk, sg[i].length);
 		put_page(sg_page(&sg[i]));
 		bytes -= sg[i].length;
 		sg[i].length = 0;
@@ -418,6 +421,7 @@ static void free_bytes_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
 		if (i == MAX_SKB_FRAGS)
 			i = 0;
 	}
+	md->sg_start = i;
 }
 
 static int free_sg(struct sock *sk, int start, struct sk_msg_buff *md)
@@ -578,7 +582,7 @@ static int bpf_tcp_sendmsg_do_redirect(struct sock *sk, int send,
 {
 	struct smap_psock *psock;
 	struct scatterlist *sg;
-	int i, err, free = 0;
+	int i, err = 0;
 	bool ingress = !!(md->flags & BPF_F_INGRESS);
 
 	sg = md->sg_data;
@@ -607,16 +611,8 @@ static int bpf_tcp_sendmsg_do_redirect(struct sock *sk, int send,
 out_rcu:
 	rcu_read_unlock();
 out:
-	i = md->sg_start;
-	while (sg[i].length) {
-		free += sg[i].length;
-		put_page(sg_page(&sg[i]));
-		sg[i].length = 0;
-		i++;
-		if (i == MAX_SKB_FRAGS)
-			i = 0;
-	}
-	return free;
+	free_bytes_sg(NULL, send, md, false);
+	return err;
 }
 
 static inline void bpf_md_init(struct smap_psock *psock)
@@ -720,7 +716,7 @@ static int bpf_exec_tx_verdict(struct smap_psock *psock,
 		break;
 	case __SK_DROP:
 	default:
-		free_bytes_sg(sk, send, m);
+		free_bytes_sg(sk, send, m, true);
 		apply_bytes_dec(psock, send);
 		*copied -= send;
 		psock->sg_size -= send;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [bpf PATCH 3/3] bpf: sockmap, fix error handling in redirect failures
  2018-05-02 17:47 ` [bpf PATCH 3/3] bpf: sockmap, fix error handling in redirect failures John Fastabend
@ 2018-05-02 18:51   ` David Miller
  2018-05-02 19:34   ` Alexei Starovoitov
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2018-05-02 18:51 UTC (permalink / raw)
  To: john.fastabend; +Cc: borkmann, ast, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 02 May 2018 10:47:37 -0700

> When a redirect failure happens we release the buffers in-flight
> without calling a sk_mem_uncharge(), the uncharge is called before
> dropping the sock lock for the redirecte, however we missed updating
> the ring start index. When no apply actions are in progress this
> is OK because we uncharge the entire buffer before the redirect.
> But, when we have apply logic running its possible that only a
> portion of the buffer is being redirected. In this case we only
> do memory accounting for the buffer slice being redirected and
> expect to be able to loop over the BPF program again and/or if
> a sock is closed uncharge the memory at sock destruct time.
> 
> With an invalid start index however the program logic looks at
> the start pointer index, checks the length, and when seeing the
> length is zero (from the initial release and failure to update
> the pointer) aborts without uncharging/releasing the remaining
> memory.
> 
> The fix for this is simply to update the start index. To avoid
> fixing this error in two locations we do a small refactor and
> remove one case where it is open-coded. Then fix it in the
> single function.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bpf PATCH 3/3] bpf: sockmap, fix error handling in redirect failures
  2018-05-02 17:47 ` [bpf PATCH 3/3] bpf: sockmap, fix error handling in redirect failures John Fastabend
  2018-05-02 18:51   ` David Miller
@ 2018-05-02 19:34   ` Alexei Starovoitov
  1 sibling, 0 replies; 8+ messages in thread
From: Alexei Starovoitov @ 2018-05-02 19:34 UTC (permalink / raw)
  To: John Fastabend; +Cc: borkmann, ast, netdev

On Wed, May 02, 2018 at 10:47:37AM -0700, John Fastabend wrote:
> When a redirect failure happens we release the buffers in-flight
> without calling a sk_mem_uncharge(), the uncharge is called before
> dropping the sock lock for the redirecte, however we missed updating
> the ring start index. When no apply actions are in progress this
> is OK because we uncharge the entire buffer before the redirect.
> But, when we have apply logic running its possible that only a
> portion of the buffer is being redirected. In this case we only
> do memory accounting for the buffer slice being redirected and
> expect to be able to loop over the BPF program again and/or if
> a sock is closed uncharge the memory at sock destruct time.
> 
> With an invalid start index however the program logic looks at
> the start pointer index, checks the length, and when seeing the
> length is zero (from the initial release and failure to update
> the pointer) aborts without uncharging/releasing the remaining
> memory.
> 
> The fix for this is simply to update the start index. To avoid
> fixing this error in two locations we do a small refactor and
> remove one case where it is open-coded. Then fix it in the
> single function.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>  kernel/bpf/sockmap.c |   26 +++++++++++---------------
>  1 file changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
> index 052c313..7e3c4cd 100644
> --- a/kernel/bpf/sockmap.c
> +++ b/kernel/bpf/sockmap.c
> @@ -393,7 +393,8 @@ static void return_mem_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
>  	} while (i != md->sg_end);
>  }
>  
> -static void free_bytes_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
> +static void free_bytes_sg(struct sock *sk, int bytes,
> +			  struct sk_msg_buff *md, bool charge)
>  {
>  	struct scatterlist *sg = md->sg_data;
>  	int i = md->sg_start, free;
> @@ -403,11 +404,13 @@ static void free_bytes_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
>  		if (bytes < free) {
>  			sg[i].length -= bytes;
>  			sg[i].offset += bytes;
> -			sk_mem_uncharge(sk, bytes);
> +			if (charge)
> +				sk_mem_uncharge(sk, bytes);
>  			break;
>  		}
>  
> -		sk_mem_uncharge(sk, sg[i].length);
> +		if (charge)
> +			sk_mem_uncharge(sk, sg[i].length);
>  		put_page(sg_page(&sg[i]));
>  		bytes -= sg[i].length;
>  		sg[i].length = 0;
> @@ -418,6 +421,7 @@ static void free_bytes_sg(struct sock *sk, int bytes, struct sk_msg_buff *md)
>  		if (i == MAX_SKB_FRAGS)
>  			i = 0;
>  	}
> +	md->sg_start = i;
>  }
>  
>  static int free_sg(struct sock *sk, int start, struct sk_msg_buff *md)
> @@ -578,7 +582,7 @@ static int bpf_tcp_sendmsg_do_redirect(struct sock *sk, int send,
>  {
>  	struct smap_psock *psock;
>  	struct scatterlist *sg;
> -	int i, err, free = 0;
> +	int i, err = 0;
>  	bool ingress = !!(md->flags & BPF_F_INGRESS);
>  
>  	sg = md->sg_data;
> @@ -607,16 +611,8 @@ static int bpf_tcp_sendmsg_do_redirect(struct sock *sk, int send,
>  out_rcu:
>  	rcu_read_unlock();
>  out:
> -	i = md->sg_start;
> -	while (sg[i].length) {
> -		free += sg[i].length;
> -		put_page(sg_page(&sg[i]));
> -		sg[i].length = 0;
> -		i++;
> -		if (i == MAX_SKB_FRAGS)
> -			i = 0;
> -	}

this hunk is causing:
../kernel/bpf/sockmap.c:585:6: warning: unused variable ‘i’ [-Wunused-variable]
  int i, err = 0;

please respin.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-05-02 19:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-05-02 17:47 [bpf PATCH 0/3] sockmap error path fixes John Fastabend
2018-05-02 17:47 ` [bpf PATCH 1/3] bpf: sockmap, fix scatterlist update on error path in send with apply John Fastabend
2018-05-02 18:51   ` David Miller
2018-05-02 17:47 ` [bpf PATCH 2/3] bpf: sockmap, zero sg_size on error when buffer is released John Fastabend
2018-05-02 18:51   ` David Miller
2018-05-02 17:47 ` [bpf PATCH 3/3] bpf: sockmap, fix error handling in redirect failures John Fastabend
2018-05-02 18:51   ` David Miller
2018-05-02 19:34   ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox