netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/2] BPF update
@ 2016-01-07 14:50 Daniel Borkmann
  2016-01-07 14:50 ` [PATCH net-next v2 1/2] net, sched: add skb_at_tc_ingress helper Daniel Borkmann
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Daniel Borkmann @ 2016-01-07 14:50 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, hannes, netdev, Daniel Borkmann

Fixes a csum issue on ingress. As mentioned previously, net-next
seems just fine imho. Later on, will follow up with couple of
replacements like ovs_skb_postpush_rcsum() etc.

Thanks!

v1 -> v2:
  - Added patch 1 with helper
  - Implemented Hannes' idea to just use csum_partial, thanks!

Daniel Borkmann (2):
  net, sched: add skb_at_tc_ingress helper
  bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions

 include/linux/skbuff.h    | 17 +++++++++++++++++
 include/net/sch_generic.h |  9 +++++++++
 net/core/filter.c         | 17 +++++++++++++----
 net/sched/cls_bpf.c       |  6 +-----
 4 files changed, 40 insertions(+), 9 deletions(-)

-- 
1.9.3

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next v2 1/2] net, sched: add skb_at_tc_ingress helper
  2016-01-07 14:50 [PATCH net-next v2 0/2] BPF update Daniel Borkmann
@ 2016-01-07 14:50 ` Daniel Borkmann
  2016-01-07 18:12   ` Alexei Starovoitov
  2016-01-07 14:50 ` [PATCH net-next v2 2/2] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions Daniel Borkmann
  2016-01-10 22:55 ` [PATCH net-next v2 0/2] BPF update David Miller
  2 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2016-01-07 14:50 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, hannes, netdev, Daniel Borkmann

Add a skb_at_tc_ingress() as this will be needed elsewhere as well and
can hide the ugly ifdef.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/net/sch_generic.h | 9 +++++++++
 net/sched/cls_bpf.c       | 6 +-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b2a8e63..636a362 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -407,6 +407,15 @@ bool tcf_destroy(struct tcf_proto *tp, bool force);
 void tcf_destroy_chain(struct tcf_proto __rcu **fl);
 int skb_do_redirect(struct sk_buff *);
 
+static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	return G_TC_AT(skb->tc_verd) & AT_INGRESS;
+#else
+	return false;
+#endif
+}
+
 /* Reset all TX qdiscs greater then index of a device.  */
 static inline void qdisc_reset_all_tx_gt(struct net_device *dev, unsigned int i)
 {
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 99259f5..8dc8430 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -79,12 +79,8 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			    struct tcf_result *res)
 {
 	struct cls_bpf_head *head = rcu_dereference_bh(tp->root);
+	bool at_ingress = skb_at_tc_ingress(skb);
 	struct cls_bpf_prog *prog;
-#ifdef CONFIG_NET_CLS_ACT
-	bool at_ingress = G_TC_AT(skb->tc_verd) & AT_INGRESS;
-#else
-	bool at_ingress = false;
-#endif
 	int ret = -1;
 
 	if (unlikely(!skb_mac_header_was_set(skb)))
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next v2 2/2] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions
  2016-01-07 14:50 [PATCH net-next v2 0/2] BPF update Daniel Borkmann
  2016-01-07 14:50 ` [PATCH net-next v2 1/2] net, sched: add skb_at_tc_ingress helper Daniel Borkmann
@ 2016-01-07 14:50 ` Daniel Borkmann
  2016-01-07 18:52   ` Alexei Starovoitov
  2016-01-10 22:55 ` [PATCH net-next v2 0/2] BPF update David Miller
  2 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2016-01-07 14:50 UTC (permalink / raw)
  To: davem; +Cc: alexei.starovoitov, hannes, netdev, Daniel Borkmann

Add a small helper skb_postpush_rcsum() and fix up redirect locations
that need CHECKSUM_COMPLETE fixups on ingress. dev_forward_skb() expects
a proper csum that covers also Ethernet header, f.e. since 2c26d34bbcc0
("net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding"), we
also do skb_postpull_rcsum() after pulling Ethernet header off via
eth_type_trans().

When using eBPF in a netns setup f.e. with vxlan in collect metadata mode,
I can trigger the following csum issue with an IPv6 setup:

  [  505.144065] dummy1: hw csum failure
  [...]
  [  505.144108] Call Trace:
  [  505.144112]  <IRQ>  [<ffffffff81372f08>] dump_stack+0x44/0x5c
  [  505.144134]  [<ffffffff81607cea>] netdev_rx_csum_fault+0x3a/0x40
  [  505.144142]  [<ffffffff815fee3f>] __skb_checksum_complete+0xcf/0xe0
  [  505.144149]  [<ffffffff816f0902>] nf_ip6_checksum+0xb2/0x120
  [  505.144161]  [<ffffffffa08c0e0e>] icmpv6_error+0x17e/0x328 [nf_conntrack_ipv6]
  [  505.144170]  [<ffffffffa0898eca>] ? ip6t_do_table+0x2fa/0x645 [ip6_tables]
  [  505.144177]  [<ffffffffa08c0725>] ? ipv6_get_l4proto+0x65/0xd0 [nf_conntrack_ipv6]
  [  505.144189]  [<ffffffffa06c9a12>] nf_conntrack_in+0xc2/0x5a0 [nf_conntrack]
  [  505.144196]  [<ffffffffa08c039c>] ipv6_conntrack_in+0x1c/0x20 [nf_conntrack_ipv6]
  [  505.144204]  [<ffffffff8164385d>] nf_iterate+0x5d/0x70
  [  505.144210]  [<ffffffff816438d6>] nf_hook_slow+0x66/0xc0
  [  505.144218]  [<ffffffff816bd302>] ipv6_rcv+0x3f2/0x4f0
  [  505.144225]  [<ffffffff816bca40>] ? ip6_make_skb+0x1b0/0x1b0
  [  505.144232]  [<ffffffff8160b77b>] __netif_receive_skb_core+0x36b/0x9a0
  [  505.144239]  [<ffffffff8160bdc8>] ? __netif_receive_skb+0x18/0x60
  [  505.144245]  [<ffffffff8160bdc8>] __netif_receive_skb+0x18/0x60
  [  505.144252]  [<ffffffff8160ccff>] process_backlog+0x9f/0x140
  [  505.144259]  [<ffffffff8160c4a5>] net_rx_action+0x145/0x320
  [...]

What happens is that on ingress, we push Ethernet header back in, either
from cls_bpf or right before skb_do_redirect(), but without updating csum.
The "hw csum failure" can be fixed by using the new skb_postpush_rcsum()
helper for the dev_forward_skb() case to correct the csum diff again.

Thanks to Hannes Frederic Sowa for the csum_partial() idea!

Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/skbuff.h | 17 +++++++++++++++++
 net/core/filter.c      | 17 +++++++++++++----
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6b6bd42..07f9ccd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2805,6 +2805,23 @@ static inline void skb_postpull_rcsum(struct sk_buff *skb,
 
 unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len);
 
+static inline void skb_postpush_rcsum(struct sk_buff *skb,
+				      const void *start, unsigned int len)
+{
+	/* For performing the reverse operation to skb_postpull_rcsum(),
+	 * we can instead of ...
+	 *
+	 *   skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
+	 *
+	 * ... just use this equivalent version here to save a few
+	 * instructions. Feeding csum of 0 in csum_partial() and later
+	 * on adding skb->csum is equivalent to feed skb->csum in the
+	 * first place.
+	 */
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->csum = csum_partial(start, len, skb->csum);
+}
+
 /**
  *	pskb_trim_rcsum - trim received skb and update checksum
  *	@skb: buffer to trim
diff --git a/net/core/filter.c b/net/core/filter.c
index 35e6fed..0db92b5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1368,8 +1368,9 @@ static u64 bpf_skb_store_bytes(u64 r1, u64 r2, u64 r3, u64 r4, u64 flags)
 		/* skb_store_bits cannot return -EFAULT here */
 		skb_store_bits(skb, offset, ptr, len);
 
-	if (BPF_RECOMPUTE_CSUM(flags) && skb->ip_summed == CHECKSUM_COMPLETE)
-		skb->csum = csum_add(skb->csum, csum_partial(ptr, len, 0));
+	if (BPF_RECOMPUTE_CSUM(flags))
+		skb_postpush_rcsum(skb, ptr, len);
+
 	return 0;
 }
 
@@ -1525,8 +1526,12 @@ static u64 bpf_clone_redirect(u64 r1, u64 ifindex, u64 flags, u64 r4, u64 r5)
 	if (unlikely(!skb2))
 		return -ENOMEM;
 
-	if (BPF_IS_REDIRECT_INGRESS(flags))
+	if (BPF_IS_REDIRECT_INGRESS(flags)) {
+		if (skb_at_tc_ingress(skb2))
+			skb_postpush_rcsum(skb2, skb_mac_header(skb2),
+					   skb2->mac_len);
 		return dev_forward_skb(dev, skb2);
+	}
 
 	skb2->dev = dev;
 	skb_sender_cpu_clear(skb2);
@@ -1569,8 +1574,12 @@ int skb_do_redirect(struct sk_buff *skb)
 		return -EINVAL;
 	}
 
-	if (BPF_IS_REDIRECT_INGRESS(ri->flags))
+	if (BPF_IS_REDIRECT_INGRESS(ri->flags)) {
+		if (skb_at_tc_ingress(skb))
+			skb_postpush_rcsum(skb, skb_mac_header(skb),
+					   skb->mac_len);
 		return dev_forward_skb(dev, skb);
+	}
 
 	skb->dev = dev;
 	skb_sender_cpu_clear(skb);
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v2 1/2] net, sched: add skb_at_tc_ingress helper
  2016-01-07 14:50 ` [PATCH net-next v2 1/2] net, sched: add skb_at_tc_ingress helper Daniel Borkmann
@ 2016-01-07 18:12   ` Alexei Starovoitov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2016-01-07 18:12 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, hannes, netdev

On Thu, Jan 07, 2016 at 03:50:22PM +0100, Daniel Borkmann wrote:
> Add a skb_at_tc_ingress() as this will be needed elsewhere as well and
> can hide the ugly ifdef.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v2 2/2] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions
  2016-01-07 14:50 ` [PATCH net-next v2 2/2] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions Daniel Borkmann
@ 2016-01-07 18:52   ` Alexei Starovoitov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2016-01-07 18:52 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, hannes, netdev

On Thu, Jan 07, 2016 at 03:50:23PM +0100, Daniel Borkmann wrote:
> Add a small helper skb_postpush_rcsum() and fix up redirect locations
> that need CHECKSUM_COMPLETE fixups on ingress. dev_forward_skb() expects
> a proper csum that covers also Ethernet header, f.e. since 2c26d34bbcc0
> ("net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding"), we
> also do skb_postpull_rcsum() after pulling Ethernet header off via
> eth_type_trans().
> 
> When using eBPF in a netns setup f.e. with vxlan in collect metadata mode,
> I can trigger the following csum issue with an IPv6 setup:
> 
>   [  505.144065] dummy1: hw csum failure
>   [...]
>   [  505.144108] Call Trace:
>   [  505.144112]  <IRQ>  [<ffffffff81372f08>] dump_stack+0x44/0x5c
>   [  505.144134]  [<ffffffff81607cea>] netdev_rx_csum_fault+0x3a/0x40
>   [  505.144142]  [<ffffffff815fee3f>] __skb_checksum_complete+0xcf/0xe0
>   [  505.144149]  [<ffffffff816f0902>] nf_ip6_checksum+0xb2/0x120
>   [  505.144161]  [<ffffffffa08c0e0e>] icmpv6_error+0x17e/0x328 [nf_conntrack_ipv6]
>   [  505.144170]  [<ffffffffa0898eca>] ? ip6t_do_table+0x2fa/0x645 [ip6_tables]
>   [  505.144177]  [<ffffffffa08c0725>] ? ipv6_get_l4proto+0x65/0xd0 [nf_conntrack_ipv6]
>   [  505.144189]  [<ffffffffa06c9a12>] nf_conntrack_in+0xc2/0x5a0 [nf_conntrack]
>   [  505.144196]  [<ffffffffa08c039c>] ipv6_conntrack_in+0x1c/0x20 [nf_conntrack_ipv6]
>   [  505.144204]  [<ffffffff8164385d>] nf_iterate+0x5d/0x70
>   [  505.144210]  [<ffffffff816438d6>] nf_hook_slow+0x66/0xc0
>   [  505.144218]  [<ffffffff816bd302>] ipv6_rcv+0x3f2/0x4f0
>   [  505.144225]  [<ffffffff816bca40>] ? ip6_make_skb+0x1b0/0x1b0
>   [  505.144232]  [<ffffffff8160b77b>] __netif_receive_skb_core+0x36b/0x9a0
>   [  505.144239]  [<ffffffff8160bdc8>] ? __netif_receive_skb+0x18/0x60
>   [  505.144245]  [<ffffffff8160bdc8>] __netif_receive_skb+0x18/0x60
>   [  505.144252]  [<ffffffff8160ccff>] process_backlog+0x9f/0x140
>   [  505.144259]  [<ffffffff8160c4a5>] net_rx_action+0x145/0x320
>   [...]
> 
> What happens is that on ingress, we push Ethernet header back in, either
> from cls_bpf or right before skb_do_redirect(), but without updating csum.
> The "hw csum failure" can be fixed by using the new skb_postpush_rcsum()
> helper for the dev_forward_skb() case to correct the csum diff again.
> 
> Thanks to Hannes Frederic Sowa for the csum_partial() idea!
> 
> Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper")
> Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next v2 0/2] BPF update
  2016-01-07 14:50 [PATCH net-next v2 0/2] BPF update Daniel Borkmann
  2016-01-07 14:50 ` [PATCH net-next v2 1/2] net, sched: add skb_at_tc_ingress helper Daniel Borkmann
  2016-01-07 14:50 ` [PATCH net-next v2 2/2] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions Daniel Borkmann
@ 2016-01-10 22:55 ` David Miller
  2 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2016-01-10 22:55 UTC (permalink / raw)
  To: daniel; +Cc: alexei.starovoitov, hannes, netdev

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu,  7 Jan 2016 15:50:21 +0100

> Fixes a csum issue on ingress. As mentioned previously, net-next
> seems just fine imho. Later on, will follow up with couple of
> replacements like ovs_skb_postpush_rcsum() etc.
> 
> Thanks!
> 
> v1 -> v2:
>   - Added patch 1 with helper
>   - Implemented Hannes' idea to just use csum_partial, thanks!

Series applied.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-01-10 22:55 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-07 14:50 [PATCH net-next v2 0/2] BPF update Daniel Borkmann
2016-01-07 14:50 ` [PATCH net-next v2 1/2] net, sched: add skb_at_tc_ingress helper Daniel Borkmann
2016-01-07 18:12   ` Alexei Starovoitov
2016-01-07 14:50 ` [PATCH net-next v2 2/2] bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions Daniel Borkmann
2016-01-07 18:52   ` Alexei Starovoitov
2016-01-10 22:55 ` [PATCH net-next v2 0/2] BPF update David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).