Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH] net/smc: mark as BROKEN due to remote memory exposure
From: Parav Pandit @ 2017-05-17 22:37 UTC (permalink / raw)
  To: Doug Ledford, David Miller
  Cc: Bart.VanAssche@sandisk.com, torvalds@linux-foundation.org,
	hch@lst.de, netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	stable@vger.kernel.org, ubraun@linux.vnet.ibm.com
In-Reply-To: <1495053467.2240.29.camel@redhat.com>

Hi Doug,

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Doug Ledford
> Sent: Wednesday, May 17, 2017 3:38 PM
> To: David Miller <davem@davemloft.net>
> Cc: Bart.VanAssche@sandisk.com; torvalds@linux-foundation.org;
> hch@lst.de; netdev@vger.kernel.org; linux-rdma@vger.kernel.org;
> stable@vger.kernel.org; ubraun@linux.vnet.ibm.com
> Subject: Re: [PATCH] net/smc: mark as BROKEN due to remote memory
> exposure
> 
> On Tue, 2017-05-16 at 15:28 -0400, Doug Ledford wrote:
> > I hadn't realized EXPERIMENTAL was gone.  Which is too bad, because
> > that's entirely appropriate in this case, and would have had the
> > desired side effect of keeping it out of any non-cutting edge distros
> > and warning people of possible API changes.  With EXPERIMENTAL gone,
> > the closest thing we have is drivers/staging, since that tends to
> > imply some of the same consequences.  I know you think BROKEN is
> > overly harsh, but I'm not sure we should just do nothing.  How about
> > we take a few days to let some of the RDMA people closely review the
> > 143 page
> > (egads!) rfc (http://www.rfc-editor.org/info/rfc7609) to see if we
> > think it can be fixed to use multiple link layers with the existing
> > API in place or if it will require something other than AF_SMC.  If we
> > need to break API, then I think we should either fix it ASAP and send
> > that fix to the 4.11 stable series (which probably violates the
> > normative stable patch size/scope) or if the fix will take longer than
> > this kernel cycle, then move it to staging both here and in 4.11
> > stable, and fix it there and then move it back.  Something like that
> > would prevent the kind of API flappage we ought not do....
> 
> So, I've skimmed the entire RFC, focusing on things were I needed to.
>  Here's my take on it:
> 
> It would have been better with AF_INET/AF_INET6 and an option to enable
> SMC than AF_SMC.  The first implementation simply assumes AF_INET in
> the presence of AF_SMC.  When IPv6 support is added, some sort of
> guessing logic will have to be put in place to try and determine if an
> AF_SMC address is actually AF_INET or AF_INET6 since we won't have a
> guaranteed way of telling.  Apps can use struct sockaddr_storage as their
> normal element to stick the address into, and could rely on the kernel to
> interpret it properly based on the AF_INET/AF_INET6 differentiation, and
> this breaks that.  The RFC gives *some* thought to adding IPv6 in the
> future, but not a lot.  It may be that the answer is that in the future, IPv6
> support is enabled by making the IPv6 API be
> AF_INET6 + setsockopt(SMC) or the equivalent.  If that's the case, then I
> would suggest making the later API specifically call out AF_INET +
> setsockopt(SMC) be identical to AF_SMC.
> 

What are the shortcomings in my proposal [1] which I am reiterating below.
Bart also suggested to define new stream protocol for SMC similar to SCTP.

(a) address family should be either AF_INET or AF_INET6
(b) socket() API to introduce new protocol as PROTO_SMC for in socket() system call.

With this there is no additional setsockop() needed.

With this - user space applications, do getaddrinfo() with hint as
hints.ai_family  = AF_INET;
hints.ai_socktype = SOCK_STREAM;

getaddrinfo() returns back both the protocols TCP and SMC.
Famous database application such as Redis client iterates over all entries of getaddrinfo() and establishes connection to servers.

There are few advantages of this interface.
1. No change in any makefile of applications needed - unless user wants to specify explicitly that it wants to force SMC protocol.
2. No need to do LD_LIBRARY_PRELOAD, (which won't work anyway because bind() connect() of SMC checks for AF_INET).
3. No major changes to glibc to process AF_SMC differently
glibc references IPPROTO_TCP at 22 places. (compare to AF_INET at 140+ places).
A lot simpler test matrix for glibc for new protocol
5. No need to recompile applications, as long as getaddrinfo returns all streaming protocols (TCP, SMC)
6. for applications to make use of setsockopt() it needs another knob and hint from other places, which can be avoided because SMC TCP option negotiates with remote end

And representing new protocol as new protocol for a given address family appears correct, compare to setting socket options.

Tools like CRIU or similar tool might find a race conditions - when it queries socket option, SMC was not set, but later on SMC was set, and does incorrect handling.
Setting socket() with SMC protocol makes it easier to understand in this area as well.

I have additional proposal for link groups, resource creation area. I will take that up after this discussion.

[1] https://patchwork.kernel.org/patch/9719375/

> The protocol included a version header in the negotation messages.
>  This is good as it allows us to almost immediately start work on version 2
> that fixes the shortcomings of version 1 while maintaining back
> compatibility.
> 
> After reading the RFC, I can see why they only support one link layer:
> RoCE.  The issue here is that they punted on the hard issue of doing any sort
> of work to determine if security restrictions on the TCP connection should
> also be applied to the RDMA connection.  The RFC basically says "the RoCE
> link needs to have the same physical restrictions (vlan) as the TCP/IP link so
> that the security limitations are the same" because they don't do anything
> to check them essentially.
>  For v2 of the protocol, and for different link layer support, this is no longer
> sufficient, so there will be significant work to determine the security context
> of specific TCP connections and then make sure that they meet the security
> context of the RDMA links allowed.
> 
> Additionally, the same is likely to be true in terms of SELinux options.  The
> addition of the IB/OPA link layers will throw a bit of a monkey wrench in
> things because the SELinux control over those links is slightly different than
> a normal TCP/IP SELinux control.  For instance, you might create rules about
> processes and ports to make sure that the httpd daemon can only access
> specific ports on TCP/IP, but on IB/OPA you would need to create process
> and P_Key rules as IB/OPA don't have the same port level semantics, and
> it's the P_Key on communications that is enforced network wide, including
> in the switches, so controlling what P_Key a process can send/receive on is
> your best way to limit what a process can do.  Translation from one to the
> other might be rather difficult to do in any sort of automated fashion.
> 
> There might have to be some additional work to get this to properly
> account items for the RDMA control group elements that were recently
> taken into the kernel.
> 
> Finally, the RDMA subsystem is in the process of switching to structured
> option processing similar to how netlink does it.  For version 2 of this
> protocol, since it will be interacting with the RDMA core in many ways, will
> be simpler if it switches the on-wire negotiation packets to follow the same
> methods as that will allow reuse of code that it will have to have for
> properly dealing with the RDMA subsystem in the future.
> 
> So, I'm fine with it being left as is since you accepted the patch that makes
> note of the memory registration insecurity in the Kconfig text.
>  The fact that this is a versioned protocol means that we can fix the things
> we see as being wrong without having to have it all right from the very
> start, it can be done incrementally.
> 
> --
> Doug Ledford <dledford@redhat.com>
>     GPG KeyID: B826A3330E572FDD
> 
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@vger.kernel.org More majordomo info
> at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net v2] selftests/bpf: fix broken build due to types.h
From: David Miller @ 2017-05-17 22:45 UTC (permalink / raw)
  To: yhs; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20170517221805.1593973-1-yhs@fb.com>

From: Yonghong Song <yhs@fb.com>
Date: Wed, 17 May 2017 15:18:05 -0700

> Commit 0a5539f66133 ("bpf: Provide a linux/types.h override
> for bpf selftests.") caused a build failure for tools/testing/selftest/bpf
> because of some missing types:
>     $ make -C tools/testing/selftests/bpf/
>     ...
>     In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
>     ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64'
>                     __aligned_u64   key;
>     ...
>     /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline'
>     static __always_inline __u16 __swab16p(const __u16 *p)
>     ...
> The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
> 
> The fix is to copy missing type definition into
> tools/testing/selftests/bpf/include/uapi/linux/types.h.
> Adding additional include "string.h" resolves __always_inline issue.
> 
> Fixes: 0a5539f66133 ("bpf: Provide a linux/types.h override for bpf selftests.")
> Signed-off-by: Yonghong Song <yhs@fb.com>

Applied, thank you.

^ permalink raw reply

* [PATCH] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated
From: I-Dawson, Peter A @ 2017-05-17 22:51 UTC (permalink / raw)
  To: netdev@vger.kernel.org

This fix addresses two problems in the way the DSCP field is formulated on the
 encapsulating header of IPv6 tunnels.
This fix addresses Bug 195661. https://bugzilla.kernel.org/show_bug.cgi?id=195661 

1) The IPv6 tunneling code was manipulating the DSCP field of the encapsulating
 packet using the 32b flowlabel. Since the flowlabel is only the lower 20b it
 was incorrect to assume that the upper 12b containing the DSCP and ECN fields
 would remain intact when formulating the encapsulating header. This fix
 handles the 'inherit' and 'fixed-value' DSCP cases explicitly using the extant
 dsfield u8 variable.

2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was incorrect
 and resulted in the DSCP value always being set to 0.
---
 net/ipv6/ip6_gre.c    | 18 ++++++++++--------
 net/ipv6/ip6_tunnel.c | 28 +++++++++++++++++-----------
 2 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 8d128ba..42f51fe 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -537,11 +537,11 @@ static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev)
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 
-	dsfield = ipv4_get_dsfield(iph);
-
-	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		fl6.flowlabel |= htonl((__u32)iph->tos << IPV6_TCLASS_SHIFT)
-					  & IPV6_TCLASS_MASK;
+	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+		dsfield = ipv4_get_dsfield(iph);
+	} else {
+		dsfield = ip6_tclass(t->parms.flowinfo);
+	}
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
 		fl6.flowi6_mark = skb->mark;
 	else
@@ -598,9 +598,11 @@ static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev)
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 
-	dsfield = ipv6_get_dsfield(ipv6h);
-	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		fl6.flowlabel |= (*(__be32 *) ipv6h & IPV6_TCLASS_MASK);
+	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+		dsfield = ipv6_get_dsfield(ipv6h);
+	} else {
+		dsfield = ip6_tclass(t->parms.flowinfo);
+        }
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
 		fl6.flowlabel |= ip6_flowlabel(ipv6h);
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 6eb2ae5..4d45195 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1196,8 +1196,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
 	ipv6h = ipv6_hdr(skb);
-	ip6_flow_hdr(ipv6h, INET_ECN_encapsulate(0, dsfield),
-		     ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));
+	ip6_flow_hdr(ipv6h, dsfield, ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));
 	ipv6h->hop_limit = hop_limit;
 	ipv6h->nexthdr = proto;
 	ipv6h->saddr = fl6->saddr;
@@ -1231,8 +1230,6 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (tproto != IPPROTO_IPIP && tproto != 0)
 		return -1;
 
-	dsfield = ipv4_get_dsfield(iph);
-
 	if (t->parms.collect_md) {
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
@@ -1246,6 +1243,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		fl6.flowi6_proto = IPPROTO_IPIP;
 		fl6.daddr = key->u.ipv6.dst;
 		fl6.flowlabel = key->label;
+		dsfield = ip6_tclass(key->label);
 	} else {
 		if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 			encap_limit = t->parms.encap_limit;
@@ -1253,9 +1251,11 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 		fl6.flowi6_proto = IPPROTO_IPIP;
 
-		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-			fl6.flowlabel |= htonl((__u32)iph->tos << IPV6_TCLASS_SHIFT)
-					 & IPV6_TCLASS_MASK;
+		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+			dsfield = ipv4_get_dsfield(iph);
+		} else {
+			dsfield = ip6_tclass(t->parms.flowinfo);
+		}
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
 			fl6.flowi6_mark = skb->mark;
 		else
@@ -1267,6 +1267,8 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
+	dsfield = INET_ECN_encapsulate(dsfield, ipv4_get_dsfield(iph));
+
 	skb_set_inner_ipproto(skb, IPPROTO_IPIP);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
@@ -1300,8 +1302,6 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	    ip6_tnl_addr_conflict(t, ipv6h))
 		return -1;
 
-	dsfield = ipv6_get_dsfield(ipv6h);
-
 	if (t->parms.collect_md) {
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
@@ -1315,6 +1315,7 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		fl6.flowi6_proto = IPPROTO_IPV6;
 		fl6.daddr = key->u.ipv6.dst;
 		fl6.flowlabel = key->label;
+		dsfield = ip6_tclass(key->label);
 	} else {
 		offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb));
 		/* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */
@@ -1336,8 +1337,11 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 		fl6.flowi6_proto = IPPROTO_IPV6;
 
-		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-			fl6.flowlabel |= (*(__be32 *)ipv6h & IPV6_TCLASS_MASK);
+		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+			dsfield = ipv6_get_dsfield(ipv6h);
+		} else {
+			dsfield = ip6_tclass(t->parms.flowinfo);
+                }
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
 			fl6.flowlabel |= ip6_flowlabel(ipv6h);
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
@@ -1351,6 +1355,8 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
+        dsfield = INET_ECN_encapsulate(dsfield, ipv6_get_dsfield(ipv6h));
+
 	skb_set_inner_ipproto(skb, IPPROTO_IPV6);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,

^ permalink raw reply related

* Re: [PATCH net v2] selftests/bpf: fix broken build due to types.h
From: Daniel Borkmann @ 2017-05-17 22:57 UTC (permalink / raw)
  To: Yonghong Song, ast, davem, netdev; +Cc: kernel-team
In-Reply-To: <20170517221805.1593973-1-yhs@fb.com>

On 05/18/2017 12:18 AM, Yonghong Song wrote:
> Commit 0a5539f66133 ("bpf: Provide a linux/types.h override
> for bpf selftests.") caused a build failure for tools/testing/selftest/bpf
> because of some missing types:
>      $ make -C tools/testing/selftests/bpf/
>      ...
>      In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
>      ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64'
>                      __aligned_u64   key;
>      ...
>      /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline'
>      static __always_inline __u16 __swab16p(const __u16 *p)
>      ...
> The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
>
> The fix is to copy missing type definition into
> tools/testing/selftests/bpf/include/uapi/linux/types.h.
> Adding additional include "string.h" resolves __always_inline issue.
>
> Fixes: 0a5539f66133 ("bpf: Provide a linux/types.h override for bpf selftests.")
> Signed-off-by: Yonghong Song <yhs@fb.com>

Can you elaborate why string.h specifically? Can't we define the
__always_inline ourselves?

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net v2] selftests/bpf: fix broken build due to types.h
From: David Miller @ 2017-05-17 23:01 UTC (permalink / raw)
  To: daniel; +Cc: yhs, ast, netdev, kernel-team
In-Reply-To: <591CD540.1040305@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 18 May 2017 00:57:04 +0200

> On 05/18/2017 12:18 AM, Yonghong Song wrote:
>> Commit 0a5539f66133 ("bpf: Provide a linux/types.h override
>> for bpf selftests.") caused a build failure for
>> tools/testing/selftest/bpf
>> because of some missing types:
>>      $ make -C tools/testing/selftests/bpf/
>>      ...
>>      In file included from
>>      /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
>>      ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name
>>      '__aligned_u64'
>>                      __aligned_u64   key;
>>      ...
>>      /usr/include/linux/swab.h:160:8: error: unknown type name
>>      '__always_inline'
>>      static __always_inline __u16 __swab16p(const __u16 *p)
>>      ...
>> The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
>>
>> The fix is to copy missing type definition into
>> tools/testing/selftests/bpf/include/uapi/linux/types.h.
>> Adding additional include "string.h" resolves __always_inline issue.
>>
>> Fixes: 0a5539f66133 ("bpf: Provide a linux/types.h override for bpf
>> selftests.")
>> Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> Can you elaborate why string.h specifically? Can't we define the
> __always_inline ourselves?

That way it comes from compiler.h

Probably it would have been better to have the BPF linux/types.h bring
it in.

Sorry I applied this so quickly, I wanted this regression fixed as fast
as possible.

^ permalink raw reply

* Re: [PATCH net v2] selftests/bpf: fix broken build due to types.h
From: Daniel Borkmann @ 2017-05-17 23:15 UTC (permalink / raw)
  To: David Miller; +Cc: yhs, ast, netdev, kernel-team
In-Reply-To: <20170517.190149.1446382466519323776.davem@davemloft.net>

On 05/18/2017 01:01 AM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Thu, 18 May 2017 00:57:04 +0200
>
>> On 05/18/2017 12:18 AM, Yonghong Song wrote:
>>> Commit 0a5539f66133 ("bpf: Provide a linux/types.h override
>>> for bpf selftests.") caused a build failure for
>>> tools/testing/selftest/bpf
>>> because of some missing types:
>>>       $ make -C tools/testing/selftests/bpf/
>>>       ...
>>>       In file included from
>>>       /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
>>>       ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name
>>>       '__aligned_u64'
>>>                       __aligned_u64   key;
>>>       ...
>>>       /usr/include/linux/swab.h:160:8: error: unknown type name
>>>       '__always_inline'
>>>       static __always_inline __u16 __swab16p(const __u16 *p)
>>>       ...
>>> The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
>>>
>>> The fix is to copy missing type definition into
>>> tools/testing/selftests/bpf/include/uapi/linux/types.h.
>>> Adding additional include "string.h" resolves __always_inline issue.
>>>
>>> Fixes: 0a5539f66133 ("bpf: Provide a linux/types.h override for bpf
>>> selftests.")
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>
>> Can you elaborate why string.h specifically? Can't we define the
>> __always_inline ourselves?
>
> That way it comes from compiler.h
>
> Probably it would have been better to have the BPF linux/types.h bring
> it in.

Would have made it a bit more clear at least.

> Sorry I applied this so quickly, I wanted this regression fixed as fast
> as possible.

Ok, no problem.

Btw, 0day kernel testing bot is from now on running BPF selftests
as well as part of their regression tests. Fengguang confirmed this
today after integrating this.

^ permalink raw reply

* Re: [patch net-next v4 06/10] net: sched: introduce helpers to work with filter chains
From: Joe Perches @ 2017-05-17 23:17 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim
  Cc: netdev, davem, xiyou.wangcong, dsa, edumazet, stephen, daniel,
	alexander.h.duyck, simon.horman, mlxsw
In-Reply-To: <20170517124446.GB9557@nanopsycho>

On Wed, 2017-05-17 at 14:44 +0200, Jiri Pirko wrote:
> Wed, May 17, 2017 at 02:39:05PM CEST, jhs@mojatatu.com wrote:
> > On 17-05-17 08:25 AM, Jiri Pirko wrote:
> > > Wed, May 17, 2017 at 02:18:00PM CEST, jhs@mojatatu.com wrote:
> > > > On 17-05-17 05:07 AM, Jiri Pirko wrote:
> > > > > From: Jiri Pirko <jiri@mellanox.com>
> > > > > 
> > > > > Introduce struct tcf_chain object and set of helpers around it. Wraps up
> > > > > insertion, deletion and search in the filter chain.
> > > > > 
> > > > > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> > > > > ---
> > > > 
> > > > [..]
> > > > > +
> > > > > +static void
> > > > > +tcf_chain_filter_chain_ptr_set(struct tcf_chain *chain,
> > > > > +			       struct tcf_proto __rcu **p_filter_chain)
> > > > > +
> > > > 
> > > > What are the rules for this? Common coding style is:
> > > > static void tcf_chain_filter_chain_ptr_set(struct tcf_chain *chain,
> > > >                                           struct tcf_proto ..
> > > 
> > > When this would not fit 80 cols (this case), you need to wrap the
> > > text in front of the function name. That is exacly what I did.
> > > 
> > 
> > That i understand.
> > The question is: what does scripture dictate on conflict?
> > Should a function signature always follow coding style and
> > allow for exceeding 80 chars or the 80 chars rules trumps?
> 
> Definitelly 80 chars rules trumps here.

Disagree.
80 columns is just a "strongly preferred" limit.

Clarity for a human reader trumps everything else.

^ permalink raw reply

* Re: [PATCH net v2] selftests/bpf: fix broken build due to types.h
From: David Miller @ 2017-05-17 23:18 UTC (permalink / raw)
  To: daniel; +Cc: yhs, ast, netdev, kernel-team
In-Reply-To: <591CD99F.1060807@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 18 May 2017 01:15:43 +0200

> Btw, 0day kernel testing bot is from now on running BPF selftests
> as well as part of their regression tests. Fengguang confirmed this
> today after integrating this.

That's great news.

^ permalink raw reply

* Re: [PATCH 2/2] arp: always override existing neigh entries with gratuitous ARP
From: Julian Anastasov @ 2017-05-17 23:25 UTC (permalink / raw)
  To: Ihar Hrachyshka; +Cc: davem, netdev
In-Reply-To: <8d14e8bd5c4d988f073eb14307bbb8491896414c.1495056542.git.ihrachys@redhat.com>


	Hello,

On Wed, 17 May 2017, Ihar Hrachyshka wrote:

> Currently, when arp_accept is 1, we always override existing neigh
> entries with incoming gratuitous ARP replies. Otherwise, we override
> them only if new replies satisfy _locktime_ conditional (packets arrive
> not earlier than _locktime_ seconds since the last update to the neigh
> entry).
> 
> The idea behind locktime is to pick the very first (=> close) reply
> received in a unicast burst when ARP proxies are used. This helps to
> avoid ARP thrashing where Linux would switch back and forth from one
> proxy to another.
> 
> This logic has nothing to do with gratuitous ARP replies that are
> generally not aligned in time when multiple IP address carriers send
> them into network.
> 
> This patch enforces overriding of existing neigh entries by all incoming
> gratuitous ARP packets, irrespective of their time of arrival. This will
> make the kernel honour all incoming gratuitous ARP packets.
> 
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> ---
>  net/ipv4/arp.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> index 3f06201..97ea9d8 100644
> --- a/net/ipv4/arp.c
> +++ b/net/ipv4/arp.c
> @@ -858,16 +858,16 @@ static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
>  
>  	n = __neigh_lookup(&arp_tbl, &sip, dev, 0);
>  
> -	if (IN_DEV_ARP_ACCEPT(in_dev)) {
> -		unsigned int addr_type = inet_addr_type_dev_table(net, dev, sip);
> -
> -		/* Unsolicited ARP is not accepted by default.

	Above line is not related to GARP. I think, it should
stay at its old place, see below. Also, in first patch, line
should be:

	/* GARP _replies_ also require target hwaddr to be
	 * the same as source.
	 */

> -		   It is possible, that this option should be enabled for some
> -		   devices (strip is candidate)
> -		 */
> +	if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
> +		addr_type = inet_addr_type_dev_table(net, dev, sip);
>  		is_garp = arp_is_garp(dev, addr_type, arp->ar_op,
>  				      sip, tip, sha, tha);

	There was something I didn't liked in the old code:
inet_addr_type_dev_table() can be slow and we should call it
only when needed. With some rearranging we can fix it, for
example:

1. arp_is_garp(net,..., int *addr_type) should use:

	bool is_garp = tip == sip;

	...
	if (is_garp) {
		*addr_type = inet_addr_type_dev_table(net, dev, sip);
		if (*addr_type != RTN_UNICAST)
			is_garp = false;
	}
	return is_garp;

> +	}
>  
> +	if (IN_DEV_ARP_ACCEPT(in_dev)) {
> +		/* It is possible, that this option should be enabled for some
> +		 * devices (strip is candidate)
> +		 */
>  		if (!n &&
>  		    ((arp->ar_op == htons(ARPOP_REPLY)  &&
>  				addr_type == RTN_UNICAST) || is_garp))
> 

2. fill addr_type and do is_garp check first:

	if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
		addr_type = -1;
                is_garp = arp_is_garp(net, dev, addr_type, arp->ar_op,
                                      sip, tip, sha, tha, &addr_type);
	}


	/* Unsolicited ARP is not accepted by default.
	 * It is possible, that this option should be enabled for some
	 * devices (strip is candidate)
	 */
	if (!n && IN_DEV_ARP_ACCEPT(in_dev) &&
	    (is_garp ||
	     (arp->ar_op == htons(ARPOP_REPLY) &&
	      (addr_type == RTN_UNICAST ||
	       (addr_type < 0 &&
	        inet_addr_type_dev_table(net, dev, sip) == RTN_UNICAST)))))
		n = __neigh_lookup(&arp_tbl, &sip, dev, 1);

	As result, we will avoid the unneeded
inet_addr_type_dev_table() call for ARP requests (non-GARP)
which are too common to see. May be there is another way
to make this code more nice...

Regards

^ permalink raw reply

* Re: [PATCH net v2] selftests/bpf: fix broken build due to types.h
From: Yonghong Song @ 2017-05-17 23:48 UTC (permalink / raw)
  To: Daniel Borkmann, ast, davem, netdev; +Cc: kernel-team
In-Reply-To: <591CD540.1040305@iogearbox.net>



On 5/17/17 3:57 PM, Daniel Borkmann wrote:
> On 05/18/2017 12:18 AM, Yonghong Song wrote:
>> Commit 0a5539f66133 ("bpf: Provide a linux/types.h override
>> for bpf selftests.") caused a build failure for 
>> tools/testing/selftest/bpf
>> because of some missing types:
>>      $ make -C tools/testing/selftests/bpf/
>>      ...
>>      In file included from 
>> /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
>>      ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name 
>> '__aligned_u64'
>>                      __aligned_u64   key;
>>      ...
>>      /usr/include/linux/swab.h:160:8: error: unknown type name 
>> '__always_inline'
>>      static __always_inline __u16 __swab16p(const __u16 *p)
>>      ...
>> The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
>>
>> The fix is to copy missing type definition into
>> tools/testing/selftests/bpf/include/uapi/linux/types.h.
>> Adding additional include "string.h" resolves __always_inline issue.
>>
>> Fixes: 0a5539f66133 ("bpf: Provide a linux/types.h override for bpf 
>> selftests.")
>> Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> Can you elaborate why string.h specifically? Can't we define the
> __always_inline ourselves?

Actually, yes, we can define __always_inline ourselves in each bpf 
program or in some common header files used by all selftests bpf
programs.

I use string.h because several other bpf programs
(test_xdp.c test_l4lb.c test_tcp_estats.c) they all have
__always_inline and string.h, so they do not have issue.
I discovered this pattern so I just add string.h into test_pkt_access.c
as well.

string.h provides memset/memcpy prototypes which is used in bpf program 
(test_xdp.c and test_tcp_estats.c, but not test_l4lb.c and 
test_pkt_access.c).

Thanks,

Yonghong

> 
> Thanks,
> Daniel

^ permalink raw reply

* Re: [PATCH net v2] selftests/bpf: fix broken build due to types.h
From: Daniel Borkmann @ 2017-05-17 23:52 UTC (permalink / raw)
  To: Yonghong Song, ast, davem, netdev; +Cc: kernel-team
In-Reply-To: <5927830d-fac2-a204-47e7-78f0e4149afb@fb.com>

On 05/18/2017 01:48 AM, Yonghong Song wrote:
> On 5/17/17 3:57 PM, Daniel Borkmann wrote:
>> On 05/18/2017 12:18 AM, Yonghong Song wrote:
>>> Commit 0a5539f66133 ("bpf: Provide a linux/types.h override
>>> for bpf selftests.") caused a build failure for tools/testing/selftest/bpf
>>> because of some missing types:
>>>      $ make -C tools/testing/selftests/bpf/
>>>      ...
>>>      In file included from /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
>>>      ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name '__aligned_u64'
>>>                      __aligned_u64   key;
>>>      ...
>>>      /usr/include/linux/swab.h:160:8: error: unknown type name '__always_inline'
>>>      static __always_inline __u16 __swab16p(const __u16 *p)
>>>      ...
>>> The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
>>>
>>> The fix is to copy missing type definition into
>>> tools/testing/selftests/bpf/include/uapi/linux/types.h.
>>> Adding additional include "string.h" resolves __always_inline issue.
>>>
>>> Fixes: 0a5539f66133 ("bpf: Provide a linux/types.h override for bpf selftests.")
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>
>> Can you elaborate why string.h specifically? Can't we define the
>> __always_inline ourselves?
>
> Actually, yes, we can define __always_inline ourselves in each bpf program or in some common header files used by all selftests bpf
> programs.
>
> I use string.h because several other bpf programs
> (test_xdp.c test_l4lb.c test_tcp_estats.c) they all have
> __always_inline and string.h, so they do not have issue.
> I discovered this pattern so I just add string.h into test_pkt_access.c
> as well.

Ok, thanks for clarifying.

^ permalink raw reply

* Re: [PATCH] net/smc: mark as BROKEN due to remote memory exposure
From: Doug Ledford @ 2017-05-18  0:07 UTC (permalink / raw)
  To: Parav Pandit, David Miller
  Cc: Bart.VanAssche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org,
	torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org
In-Reply-To: <VI1PR0502MB3008604A216B388A440A8B53D1E70-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>


[-- Attachment #1.1: Type: text/plain, Size: 4247 bytes --]

On 5/17/2017 6:37 PM, Parav Pandit wrote:
> Hi Doug,
> 

>> It would have been better with AF_INET/AF_INET6 and an option to enable
>> SMC than AF_SMC.  The first implementation simply assumes AF_INET in
>> the presence of AF_SMC.  When IPv6 support is added, some sort of
>> guessing logic will have to be put in place to try and determine if an
>> AF_SMC address is actually AF_INET or AF_INET6 since we won't have a
>> guaranteed way of telling.  Apps can use struct sockaddr_storage as their
>> normal element to stick the address into, and could rely on the kernel to
>> interpret it properly based on the AF_INET/AF_INET6 differentiation, and
>> this breaks that.  The RFC gives *some* thought to adding IPv6 in the
>> future, but not a lot.  It may be that the answer is that in the future, IPv6
>> support is enabled by making the IPv6 API be
>> AF_INET6 + setsockopt(SMC) or the equivalent.  If that's the case, then I
>> would suggest making the later API specifically call out AF_INET +
>> setsockopt(SMC) be identical to AF_SMC.
>>
> 
> What are the shortcomings in my proposal [1] which I am reiterating below.
> Bart also suggested to define new stream protocol for SMC similar to SCTP.
> 
> (a) address family should be either AF_INET or AF_INET6
> (b) socket() API to introduce new protocol as PROTO_SMC for in socket() system call.
> 
> With this there is no additional setsockop() needed.
> 
> With this - user space applications, do getaddrinfo() with hint as
> hints.ai_family  = AF_INET;
> hints.ai_socktype = SOCK_STREAM;
> 
> getaddrinfo() returns back both the protocols TCP and SMC.
> Famous database application such as Redis client iterates over all entries of getaddrinfo() and establishes connection to servers.
> 
> There are few advantages of this interface.
> 1. No change in any makefile of applications needed - unless user wants to specify explicitly that it wants to force SMC protocol.
> 2. No need to do LD_LIBRARY_PRELOAD, (which won't work anyway because bind() connect() of SMC checks for AF_INET).
> 3. No major changes to glibc to process AF_SMC differently
> glibc references IPPROTO_TCP at 22 places. (compare to AF_INET at 140+ places).
> A lot simpler test matrix for glibc for new protocol
> 5. No need to recompile applications, as long as getaddrinfo returns all streaming protocols (TCP, SMC)
> 6. for applications to make use of setsockopt() it needs another knob and hint from other places, which can be avoided because SMC TCP option negotiates with remote end
> 
> And representing new protocol as new protocol for a given address family appears correct, compare to setting socket options.
> 
> Tools like CRIU or similar tool might find a race conditions - when it queries socket option, SMC was not set, but later on SMC was set, and does incorrect handling.
> Setting socket() with SMC protocol makes it easier to understand in this area as well.

I have no problem with the proposal in itself, but as IBM released this
publication and did their own implementation prior to submitting things
upstream, and as there might exist in the field implementations, it
depends on whether we wish to call those in the field implementations
experimental and break them as we go to a final implementation of
version 1, or if we consider version 1 baked.  I'm fine breaking it.
After all, that's what happened with XRC back in the day and Mellanox
learned a valuable lesson about upstream first.  I have no problem with
IBM learning that same lesson IMO.  So, I find your proposal, including
breaking the API of the version 1 implementation just taken into the
kernel before it has had time to fully sit and gel, acceptable.

But this is where we kind of need a judgment from on high, and why I
Cc:ed Linus on this thread.  Any input on this issue Linus?

> I have additional proposal for link groups, resource creation area. I will take that up after this discussion.

Look forward to hearing it.

> [1] https://patchwork.kernel.org/patch/9719375/


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply

* Re: [PATCH 2/6] wl1251: Use request_firmware_prefer_user() for loading NVS calibration data
From: Bjorn Andersson @ 2017-05-18  0:08 UTC (permalink / raw)
  To: Pali Roh?r
  Cc: Johannes Berg, Luis R. Rodriguez, Arend Van Spriel, Pavel Machek,
	Daniel Wagner, Tom Gundersen, Greg Kroah-Hartman, Kalle Valo,
	David Gnedt, Tony Lindgren, Sebastian Reichel, Ivaylo Dimitrov,
	Aaro Koskinen, Takashi Iwai, AKASHI Takahiro, David Woodhouse,
	Grazvydas Ignotas, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, Micha?? Kazior <kaz
In-Reply-To: <20170517125335.GG10015@pali>

On Wed 17 May 05:53 PDT 2017, Pali Roh?r wrote:

> On Wednesday 17 May 2017 14:06:06 Johannes Berg wrote:
> > On Tue, 2017-05-16 at 01:13 +0200, Luis R. Rodriguez wrote:
> > > > > Now for N900 case there is a similar scenario
> > > > > alhtough it has additional requirement to go to user-space due to
> > > > > need to use a proprietary library to obtain the NVS calibration
> > > > > data. My thought: Why should firmware_class care?
> > > 
> > > Agreed.
> > 
> > In fact, why should the *driver* care either? IOW - why should
> > "request_firmware_prefer_user()" even exist?
> 
> There are default/example NVS data, which are stored in /lib/firmware
> and installed by linux-firmware package. Those example calibration data
> should not be used for real usage, but Pavel told us that on N900 they
> are enough for working WIFI connection. They does not contain valid MAC
> address, so kernel should generate some (random?).
> 
> So kernel driver should get NVS calibration data from userspace (which
> know how where to get or how to prepare them) and in case userspace do
> not have it, then we can try fallback to those example data (as people
> reported us they can be useful instead of non-working WIFI).
> 

We're going to see a similar case with the Qualcomm DB410c WiFi soon,
where there is default calibration for the chip (wcn3620) but specific
calibration data for the particular board or product using this chip.

As with your case we expect to have a "generic" calibration file
integrated in linux-firmware, but providing means to supporting
device-specific calibration is probably going to be requested shortly.


We have however altered the reference design of picking the MAC address
from the calibration data and have the bootloader pass it via DT - so
our calibration data doesn't need to be specific to each unit.

Regards,
Bjorn

^ permalink raw reply

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
From: David Miller @ 2017-05-18  0:16 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev
In-Reply-To: <e873f261-e1c1-c344-3d72-254623e0c91a@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Wed, 17 May 2017 16:33:27 +0100

> So I did some experiments (in Python, script follows) and found that
>  indeed this does appear to work, at least for addition and shifts.
> The idea is that we have a 0s mask and a 1s mask; for bits that are
>  unknown, the 0s mask is set and the 1s mask is cleared.  So a
>  completely unknown variable has masks (~0, 0), then if you shift it
>  left 2 you get (~3, 0) - just shift both masks.  A constant x has
>  masks (x, x) - all the 0s are known 0s and all the 1s are known 1s.
> Addition is a bit more complicated: we compute the 'unknown bits'
>  mask, by XORing the 0s and 1s masks together, of each addend.  Then
>  we add the corresponding masks from each addend together, and force
>  the 'unknown' bits to the appropriate values in each mask.
> So given (a1, b1) and (a2, b2), we compute m1 = a1 ^ b1,
>  m2 = a2 ^ b2, and m = m1 | m2.  Then a = (a1 + a2) | m, and
>  b = (b1 + b2) & ~m.
> As a worked example, 2 + (x << 2) + 14:
>  2 => (2, 2) constant
>  x => (~0, 0) unknown
>  x << 2 => (~3, 0)
>  2 + (x << 2): add (2, 2) with (~3, 0)
>     m1 = 0, m2 = ~3, m = ~3
>     a = (2 + ~3) | ~3 = ~1 | ~3 = ~1
>     b = (2 + 0) & ~~3 = 2 & 3 = 2
>     so (~1, 2), which means "...xx10"
> now add 14: add (~1, 2) with (14, 14)
>     m1 = ~3, m2 = 0, m = ~3
>     a = (~1 + 14) | ~3 = 12 | ~3 = ~3
>     b = (2 + 14) & ~~3 = 16 & 3 = 0
>     so (~3, 0), which means "...xx00"
> and the result is 4-byte aligned.

Ok, I'm starting to digest this.  If it works it's a nice universal
way to handle alignment tracking, that's for sure.

So, in C, addition (a += b) is something like:

struct bpf_reg_bits {
        u64 zero_bits;
        u64 one_bits;
};

static void add_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
{
        u64 m_zeros, m_ones, m_all;

        m_zeros = a->zero_bits ^ b->zero_bits;
        m_ones = a->one_bits ^ b->one_bits;
        m_all = m_zeros | m_ones;

        a->zero_bits = (a->zero_bits + b->zero_bits) | m_all;
        a->one_bits = (a->one_bits + b->zero_bits) & ~m_all;
}

Then, is subtraction merely:

static void sub_update_bits(struct bpf_reg_bits *a, struct bpf_reg_bits *b)
{
        u64 m_zeros, m_ones, m_all;

        m_zeros = a->zero_bits ^ b->zero_bits;
        m_ones = a->one_bits ^ b->one_bits;
        m_all = m_zeros | m_ones;

        a->zero_bits = (a->zero_bits - b->zero_bits) | m_all;
        a->one_bits = (a->one_bits - b->zero_bits) & ~m_all;
}

Or is something different needed?

What about multiplication?  AND should be easy too.

BTW, we track something similar already in reg->imm which tracks how
many high order bits are known to be cleared in the register.  It is
used to avoid potential overflow for packet pointer accesses.  I bet
we can subsume that into this facility as well.

Thanks for all of your suggestions so far.

^ permalink raw reply

* [PATCH v3] net: dsa: b53: Add compatible strings for the Cygnus-family BCM11360.
From: Eric Anholt @ 2017-05-18  0:32 UTC (permalink / raw)
  To: Florian Fainelli, Vivien Didelot, Andrew Lunn,
	netdev-u79uwXL29TY76Z2rM5mHXA, Rob Herring, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w, Ray Jui,
	Scott Branden, Jon Mason, Eric Anholt

Cygnus is a small family of SoCs, of which we currently have
devicetree for BCM11360 and BCM58300.  The 11360's B53 is mostly the
same as 58xx, just requiring a tiny bit of setup that was previously
missing.

Signed-off-by: Eric Anholt <eric-WhKQ6XTQaPysTnJN9+BGXg@public.gmane.org>
Reviewed-by: Florian Fainelli <f.fainelli-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---

v2: Reorder the entry in the docs (suggestion by Scott Branden), add
    missing '"'
v3: Resend now that 4.12-rc1 is out.

 Documentation/devicetree/bindings/net/dsa/b53.txt | 3 +++
 drivers/net/dsa/b53/b53_srab.c                    | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/dsa/b53.txt b/Documentation/devicetree/bindings/net/dsa/b53.txt
index d6c6e41648d4..eb679e92d525 100644
--- a/Documentation/devicetree/bindings/net/dsa/b53.txt
+++ b/Documentation/devicetree/bindings/net/dsa/b53.txt
@@ -13,6 +13,9 @@ Required properties:
       "brcm,bcm5397"
       "brcm,bcm5398"
 
+  For the BCM11360 SoC, must be:
+      "brcm,bcm11360-srab" and the mandatory "brcm,cygnus-srab" string
+
   For the BCM5310x SoCs with an integrated switch, must be one of:
       "brcm,bcm53010-srab"
       "brcm,bcm53011-srab"
diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
index 8a62b6a69703..c37ffd1b6833 100644
--- a/drivers/net/dsa/b53/b53_srab.c
+++ b/drivers/net/dsa/b53/b53_srab.c
@@ -364,6 +364,7 @@ static const struct of_device_id b53_srab_of_match[] = {
 	{ .compatible = "brcm,bcm53018-srab" },
 	{ .compatible = "brcm,bcm53019-srab" },
 	{ .compatible = "brcm,bcm5301x-srab" },
+	{ .compatible = "brcm,bcm11360-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm58522-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm58525-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm58535-srab", .data = (void *)BCM58XX_DEVICE_ID },
@@ -371,6 +372,7 @@ static const struct of_device_id b53_srab_of_match[] = {
 	{ .compatible = "brcm,bcm58623-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm58625-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,bcm88312-srab", .data = (void *)BCM58XX_DEVICE_ID },
+	{ .compatible = "brcm,cygnus-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ .compatible = "brcm,nsp-srab", .data = (void *)BCM58XX_DEVICE_ID },
 	{ /* sentinel */ },
 };
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH net v2] selftests/bpf: fix broken build due to types.h
From: Yonghong Song @ 2017-05-18  0:46 UTC (permalink / raw)
  To: David Miller, daniel; +Cc: ast, netdev, kernel-team
In-Reply-To: <20170517.190149.1446382466519323776.davem@davemloft.net>



On 5/17/17 4:01 PM, David Miller wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Thu, 18 May 2017 00:57:04 +0200
> 
>> On 05/18/2017 12:18 AM, Yonghong Song wrote:
>>> Commit 0a5539f66133 ("bpf: Provide a linux/types.h override
>>> for bpf selftests.") caused a build failure for
>>> tools/testing/selftest/bpf
>>> because of some missing types:
>>>       $ make -C tools/testing/selftests/bpf/
>>>       ...
>>>       In file included from
>>>       /home/yhs/work/net-next/tools/testing/selftests/bpf/test_pkt_access.c:8:
>>>       ../../../include/uapi/linux/bpf.h:170:3: error: unknown type name
>>>       '__aligned_u64'
>>>                       __aligned_u64   key;
>>>       ...
>>>       /usr/include/linux/swab.h:160:8: error: unknown type name
>>>       '__always_inline'
>>>       static __always_inline __u16 __swab16p(const __u16 *p)
>>>       ...
>>> The type __aligned_u64 is defined in linux:include/uapi/linux/types.h.
>>>
>>> The fix is to copy missing type definition into
>>> tools/testing/selftests/bpf/include/uapi/linux/types.h.
>>> Adding additional include "string.h" resolves __always_inline issue.
>>>
>>> Fixes: 0a5539f66133 ("bpf: Provide a linux/types.h override for bpf
>>> selftests.")
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>
>> Can you elaborate why string.h specifically? Can't we define the
>> __always_inline ourselves?
> 
> That way it comes from compiler.h

Just a little bit correction. The __always_inline is not from 
compiler.h. The compiler.h is inside kernel source tree. Currently,
programs in selftests do not directly referencing kernel header
files (except test_verifier trying to do with alignment)

======
#ifdef HAVE_GENHDR
# include "autoconf.h"
#else
# if defined(__i386) || defined(__x86_64) || defined(__s390x__) || 
defined(__aarch64__)
#  define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS 1
# endif
#endif
=====
(The above part may be gone soon with recent alignment tracking patch)

[yhs@localhost include]$ pwd
/usr/include
[yhs@localhost include]$ find . -name "compiler.h"
[yhs@localhost include]$

The __always_inline comes from sys/cdefs.h
sys/cdefs.h:# define __always_inline __inline __attribute__ 
((__always_inline))

The following is the include chain leading to __always_inline:
string.h
    features.h
       sys/cdefs.h

Yes, it is deeply embedded in chain of header files and hard
to figure out intuitively....

Yonghong

> 
> Probably it would have been better to have the BPF linux/types.h bring
> it in.
> 
> Sorry I applied this so quickly, I wanted this regression fixed as fast
> as possible.
> 

^ permalink raw reply

* [PATCH net] bpf: adjust verifier heuristics
From: Daniel Borkmann @ 2017-05-18  1:00 UTC (permalink / raw)
  To: davem; +Cc: ast, netdev, Daniel Borkmann

Current limits with regards to processing program paths do not
really reflect today's needs anymore due to programs becoming
more complex and verifier smarter, keeping track of more data
such as const ALU operations, alignment tracking, spilling of
PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
smarter matching of what LLVM generates.

This also comes with the side-effect that we result in fewer
opportunities to prune search states and thus often need to do
more work to prove safety than in the past due to different
register states and stack layout where we mismatch. Generally,
it's quite hard to determine what caused a sudden increase in
complexity, it could be caused by something as trivial as a
single branch somewhere at the beginning of the program where
LLVM assigned a stack slot that is marked differently throughout
other branches and thus causing a mismatch, where verifier
then needs to prove safety for the whole rest of the program.
Subsequently, programs with even less than half the insn size
limit can get rejected. We noticed that while some programs
load fine under pre 4.11, they get rejected due to hitting
limits on more recent kernels. We saw that in the vast majority
of cases (90+%) pruning failed due to register mismatches. In
case of stack mismatches, majority of cases failed due to
different stack slot types (invalid, spill, misc) rather than
differences in spilled registers.

This patch makes pruning more aggressive by also adding markers
that sit at conditional jumps as well. Currently, we only mark
jump targets for pruning. For example in direct packet access,
these are usually error paths where we bail out. We found that
adding these markers, it can reduce number of processed insns
by up to 30%. Another option is to ignore reg->id in probing
PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
slightly as well by up to 7% observed complexity reduction as
stand-alone. Meaning, if a previous path with register type
PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
the same map X must be safe as well. Last but not least the
patch also adds a scheduling point and bumps the current limit
for instructions to be processed to a more adequate value.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/verifier.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 39f2dcb..1eddb71 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -140,7 +140,7 @@ struct bpf_verifier_stack_elem {
 	struct bpf_verifier_stack_elem *next;
 };
 
-#define BPF_COMPLEXITY_LIMIT_INSNS	65536
+#define BPF_COMPLEXITY_LIMIT_INSNS	98304
 #define BPF_COMPLEXITY_LIMIT_STACK	1024
 
 #define BPF_MAP_PTR_POISON ((void *)0xeB9F + POISON_POINTER_DELTA)
@@ -2640,6 +2640,7 @@ static int check_cfg(struct bpf_verifier_env *env)
 				env->explored_states[t + 1] = STATE_LIST_MARK;
 		} else {
 			/* conditional jump with two edges */
+			env->explored_states[t] = STATE_LIST_MARK;
 			ret = push_insn(t, t + 1, FALLTHROUGH, env);
 			if (ret == 1)
 				goto peek_stack;
@@ -2798,6 +2799,12 @@ static bool states_equal(struct bpf_verifier_env *env,
 		     rcur->type != NOT_INIT))
 			continue;
 
+		/* Don't care about the reg->id in this case. */
+		if (rold->type == PTR_TO_MAP_VALUE_OR_NULL &&
+		    rcur->type == PTR_TO_MAP_VALUE_OR_NULL &&
+		    rold->map_ptr == rcur->map_ptr)
+			continue;
+
 		if (rold->type == PTR_TO_PACKET && rcur->type == PTR_TO_PACKET &&
 		    compare_ptrs_to_packet(rold, rcur))
 			continue;
@@ -2932,6 +2939,9 @@ static int do_check(struct bpf_verifier_env *env)
 			goto process_bpf_exit;
 		}
 
+		if (need_resched())
+			cond_resched();
+
 		if (log_level > 1 || (log_level && do_print_state)) {
 			if (log_level > 1)
 				verbose("%d:", insn_idx);
-- 
1.9.3

^ permalink raw reply related

* [PATCH] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
From: I-Dawson, Peter A @ 2017-05-18  1:11 UTC (permalink / raw)
  To: netdev@vger.kernel.org

This fix addresses two problems in the way the DSCP field is formulated on the
 encapsulating header of IPv6 tunnels.
This fix addresses Bug 195661. https://bugzilla.kernel.org/show_bug.cgi?id=195661 

1) The IPv6 tunneling code was manipulating the DSCP field of the encapsulating
 packet using the 32b flowlabel. Since the flowlabel is only the lower 20b it
 was incorrect to assume that the upper 12b containing the DSCP and ECN fields
 would remain intact when formulating the encapsulating header. This fix
 handles the 'inherit' and 'fixed-value' DSCP cases explicitly using the extant
 dsfield u8 variable.

2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was incorrect
 and resulted in the DSCP value always being set to 0.
---
 net/ipv6/ip6_gre.c    | 18 ++++++++++--------
 net/ipv6/ip6_tunnel.c | 28 +++++++++++++++++-----------
 2 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 8d128ba..42f51fe 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -537,11 +537,11 @@ static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev)
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 
-	dsfield = ipv4_get_dsfield(iph);
-
-	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		fl6.flowlabel |= htonl((__u32)iph->tos << IPV6_TCLASS_SHIFT)
-					  & IPV6_TCLASS_MASK;
+	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+		dsfield = ipv4_get_dsfield(iph);
+	} else {
+		dsfield = ip6_tclass(t->parms.flowinfo);
+	}
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
 		fl6.flowi6_mark = skb->mark;
 	else
@@ -598,9 +598,11 @@ static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev)
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 
-	dsfield = ipv6_get_dsfield(ipv6h);
-	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		fl6.flowlabel |= (*(__be32 *) ipv6h & IPV6_TCLASS_MASK);
+	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+		dsfield = ipv6_get_dsfield(ipv6h);
+	} else {
+		dsfield = ip6_tclass(t->parms.flowinfo);
+        }
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
 		fl6.flowlabel |= ip6_flowlabel(ipv6h);
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 6eb2ae5..4d45195 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1196,8 +1196,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
 	ipv6h = ipv6_hdr(skb);
-	ip6_flow_hdr(ipv6h, INET_ECN_encapsulate(0, dsfield),
-		     ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));
+	ip6_flow_hdr(ipv6h, dsfield, ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));
 	ipv6h->hop_limit = hop_limit;
 	ipv6h->nexthdr = proto;
 	ipv6h->saddr = fl6->saddr;
@@ -1231,8 +1230,6 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (tproto != IPPROTO_IPIP && tproto != 0)
 		return -1;
 
-	dsfield = ipv4_get_dsfield(iph);
-
 	if (t->parms.collect_md) {
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
@@ -1246,6 +1243,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		fl6.flowi6_proto = IPPROTO_IPIP;
 		fl6.daddr = key->u.ipv6.dst;
 		fl6.flowlabel = key->label;
+		dsfield = ip6_tclass(key->label);
 	} else {
 		if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 			encap_limit = t->parms.encap_limit;
@@ -1253,9 +1251,11 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 		fl6.flowi6_proto = IPPROTO_IPIP;
 
-		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-			fl6.flowlabel |= htonl((__u32)iph->tos << IPV6_TCLASS_SHIFT)
-					 & IPV6_TCLASS_MASK;
+		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+			dsfield = ipv4_get_dsfield(iph);
+		} else {
+			dsfield = ip6_tclass(t->parms.flowinfo);
+		}
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
 			fl6.flowi6_mark = skb->mark;
 		else
@@ -1267,6 +1267,8 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
+	dsfield = INET_ECN_encapsulate(dsfield, ipv4_get_dsfield(iph));
+
 	skb_set_inner_ipproto(skb, IPPROTO_IPIP);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
@@ -1300,8 +1302,6 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	    ip6_tnl_addr_conflict(t, ipv6h))
 		return -1;
 
-	dsfield = ipv6_get_dsfield(ipv6h);
-
 	if (t->parms.collect_md) {
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
@@ -1315,6 +1315,7 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		fl6.flowi6_proto = IPPROTO_IPV6;
 		fl6.daddr = key->u.ipv6.dst;
 		fl6.flowlabel = key->label;
+		dsfield = ip6_tclass(key->label);
 	} else {
 		offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb));
 		/* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */
@@ -1336,8 +1337,11 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 		fl6.flowi6_proto = IPPROTO_IPV6;
 
-		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-			fl6.flowlabel |= (*(__be32 *)ipv6h & IPV6_TCLASS_MASK);
+		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS) {
+			dsfield = ipv6_get_dsfield(ipv6h);
+		} else {
+			dsfield = ip6_tclass(t->parms.flowinfo);
+                }
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
 			fl6.flowlabel |= ip6_flowlabel(ipv6h);
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
@@ -1351,6 +1355,8 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
+        dsfield = INET_ECN_encapsulate(dsfield, ipv6_get_dsfield(ipv6h));
+
 	skb_set_inner_ipproto(skb, IPPROTO_IPV6);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,

^ permalink raw reply related

* Re: [PATCH] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
From: Stephen Hemminger @ 2017-05-18  1:18 UTC (permalink / raw)
  To: I-Dawson, Peter A; +Cc: netdev@vger.kernel.org
In-Reply-To: <5fba416f3f644db2b1f51e54724ef6ba@XCH15-07-02.nw.nos.boeing.com>

On Thu, 18 May 2017 01:11:01 +0000
"I-Dawson, Peter A" <Peter.A.Dawson@boeing.com> wrote:

> This fix addresses two problems in the way the DSCP field is formulated on the
>  encapsulating header of IPv6 tunnels.
> This fix addresses Bug 195661. https://bugzilla.kernel.org/show_bug.cgi?id=195661 
> 
> 1) The IPv6 tunneling code was manipulating the DSCP field of the encapsulating
>  packet using the 32b flowlabel. Since the flowlabel is only the lower 20b it
>  was incorrect to assume that the upper 12b containing the DSCP and ECN fields
>  would remain intact when formulating the encapsulating header. This fix
>  handles the 'inherit' and 'fixed-value' DSCP cases explicitly using the extant
>  dsfield u8 variable.
> 
> 2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was incorrect
>  and resulted in the DSCP value always being set to 0.
> ---
>  net/ipv6/ip6_gre.c    | 18 ++++++++++--------
>  net/ipv6/ip6_tunnel.c | 28 +++++++++++++++++-----------
>  2 files changed, 27 insertions(+), 19 deletions(-)

This patch looks correct, but has trivial style issues like using spaces
instead of tabs and other junk.  Please run checkpatch.pl, look at the
results and resubmit.

^ permalink raw reply

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
From: Alexei Starovoitov @ 2017-05-18  1:18 UTC (permalink / raw)
  To: David Miller, ecree; +Cc: daniel, alexei.starovoitov, netdev
In-Reply-To: <20170517.201633.1413407173876329751.davem@davemloft.net>

On 5/17/17 5:16 PM, David Miller wrote:
> BTW, we track something similar already in reg->imm which tracks how
> many high order bits are known to be cleared in the register.  It is
> used to avoid potential overflow for packet pointer accesses.  I bet
> we can subsume that into this facility as well.

yeah, reg->imm tracks zero upper bits in very simplistic way.
For alignment checking it seems we need to track lower zeros
and ones. If Edward's algorithm can be adopted to track both
that would be double win indeed.

^ permalink raw reply

* Re: [net-realtek-btcoexist] question about identical code for different branches
From: Larry Finger @ 2017-05-18  1:38 UTC (permalink / raw)
  To: Gustavo A. R. Silva, Chaoming Li, Kalle Valo
  Cc: linux-wireless, netdev, linux-kernel, Pkshih
In-Reply-To: <20170517165232.Horde.v4lHW1-r73ZYdz2POOdERBa@gator4166.hostgator.com>

On 05/17/2017 04:52 PM, Gustavo A. R. Silva wrote:
> 
> Hello everybody,
> 
> While looking into Coverity ID 1362263 I ran into the following piece of code at 
> drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c:1000:
> 
> 1000void exhalbtc_set_ant_num(struct rtl_priv *rtlpriv, u8 type, u8 ant_num)
> 1001{
> 1002        if (BT_COEX_ANT_TYPE_PG == type) {
> 1003                gl_bt_coexist.board_info.pg_ant_num = ant_num;
> 1004                gl_bt_coexist.board_info.btdm_ant_num = ant_num;
> 1005                /* The antenna position:
> 1006                 * Main (default) or Aux for pgAntNum=2 && btdmAntNum =1.
> 1007                 * The antenna position should be determined by
> 1008                 * auto-detect mechanism.
> 1009                 * The following is assumed to main,
> 1010                 * and those must be modified
> 1011                 * if y auto-detect mechanism is ready
> 1012                 */
> 1013                if ((gl_bt_coexist.board_info.pg_ant_num == 2) &&
> 1014                    (gl_bt_coexist.board_info.btdm_ant_num == 1))
> 1015                        gl_bt_coexist.board_info.btdm_ant_pos =
> 1016                                                       
> BTC_ANTENNA_AT_MAIN_PORT;
> 1017                else
> 1018                        gl_bt_coexist.board_info.btdm_ant_pos =
> 1019                                                       
> BTC_ANTENNA_AT_MAIN_PORT;
> 1020        } else if (BT_COEX_ANT_TYPE_ANTDIV == type) {
> 1021                gl_bt_coexist.board_info.btdm_ant_num = ant_num;
> 1022                gl_bt_coexist.board_info.btdm_ant_pos =
> 1023                                                       
> BTC_ANTENNA_AT_MAIN_PORT;
> 1024        } else if (type == BT_COEX_ANT_TYPE_DETECTED) {
> 1025                gl_bt_coexist.board_info.btdm_ant_num = ant_num;
> 1026                if (rtlpriv->cfg->mod_params->ant_sel == 1)
> 1027                        gl_bt_coexist.board_info.btdm_ant_pos =
> 1028                                BTC_ANTENNA_AT_AUX_PORT;
> 1029                else
> 1030                        gl_bt_coexist.board_info.btdm_ant_pos =
> 1031                                BTC_ANTENNA_AT_MAIN_PORT;
> 1032        }
> 1033}
> 
> The issue is that lines of code 1015-1016 and 1018-1019 are identical for 
> different branches.
> 
> My question here is if one of those assignments should be modified, or the 
> entire _if_ statement replaced and the function refactored?
> 
> I'd really appreciate any comment on this.
> 
> Thank you!
> -- 
> Gustavo A. R. Silva

Gustavo,

Thanks for the notification. I will discuss with Realtek as to the proper fix.

Larry

^ permalink raw reply

* [net:master 9/12] net/ipv6/ip6_offload.c:120:7-21: WARNING: Unsigned expression compared with zero: unfrag_ip6hlen < 0 (fwd)
From: Julia Lawall @ 2017-05-18  2:01 UTC (permalink / raw)
  To: Craig Gallek; +Cc: netdev, kbuild-all

Hello,

It may be worth checking on these.  The code context is shown in the first
case (line 120).  For the others, at least it gives the line numbers.

julia

---------- Forwarded message ----------
Date: Thu, 18 May 2017 09:07:31 +0800
From: kbuild test robot <fengguang.wu@intel.com>
To: kbuild@01.org
Cc: Julia Lawall <julia.lawall@lip6.fr>
Subject: [net:master 9/12] net/ipv6/ip6_offload.c:120:7-21: WARNING: Unsigned
    expression compared with zero: unfrag_ip6hlen < 0

CC: kbuild-all@01.org
CC: netdev@vger.kernel.org
TO: Craig Gallek <kraig@google.com>

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
head:   579f1d926c66cc0bd3bd87b1fe2e091084b07430
commit: 2423496af35d94a87156b063ea5cedffc10a70a1 [9/12] ipv6: Prevent overrun when parsing v6 header options
:::::: branch date: 2 hours ago
:::::: commit date: 6 hours ago

>> net/ipv6/ip6_offload.c:120:7-21: WARNING: Unsigned expression compared with zero: unfrag_ip6hlen < 0
--
>> net/ipv6/ip6_output.c:601:5-9: WARNING: Unsigned expression compared with zero: hlen < 0
--
>> net/ipv6/udp_offload.c:94:6-20: WARNING: Unsigned expression compared with zero: unfrag_ip6hlen < 0

git remote add net https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
git remote update net
git checkout 2423496af35d94a87156b063ea5cedffc10a70a1
vim +120 net/ipv6/ip6_offload.c

d1da932e Vlad Yasevich        2012-11-15  104
07b26c94 Steffen Klassert     2016-09-19  105  	gso_partial = !!(skb_shinfo(segs)->gso_type & SKB_GSO_PARTIAL);
07b26c94 Steffen Klassert     2016-09-19  106
d1da932e Vlad Yasevich        2012-11-15  107  	for (skb = segs; skb; skb = skb->next) {
d3e5e006 Eric Dumazet         2013-10-20  108  		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
07b26c94 Steffen Klassert     2016-09-19  109  		if (gso_partial)
802ab55a Alexander Duyck      2016-04-10  110  			payload_len = skb_shinfo(skb)->gso_size +
802ab55a Alexander Duyck      2016-04-10  111  				      SKB_GSO_CB(skb)->data_offset +
802ab55a Alexander Duyck      2016-04-10  112  				      skb->head - (unsigned char *)(ipv6h + 1);
802ab55a Alexander Duyck      2016-04-10  113  		else
802ab55a Alexander Duyck      2016-04-10  114  			payload_len = skb->len - nhoff - sizeof(*ipv6h);
802ab55a Alexander Duyck      2016-04-10  115  		ipv6h->payload_len = htons(payload_len);
d3e5e006 Eric Dumazet         2013-10-20  116  		skb->network_header = (u8 *)ipv6h - skb->head;
d3e5e006 Eric Dumazet         2013-10-20  117
91a48a2e Hannes Frederic Sowa 2014-02-24  118  		if (udpfrag) {
d1da932e Vlad Yasevich        2012-11-15  119  			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
2423496a Craig Gallek         2017-05-16 @120  			if (unfrag_ip6hlen < 0)
2423496a Craig Gallek         2017-05-16  121  				return ERR_PTR(unfrag_ip6hlen);
d3e5e006 Eric Dumazet         2013-10-20  122  			fptr = (struct frag_hdr *)((u8 *)ipv6h + unfrag_ip6hlen);
d1da932e Vlad Yasevich        2012-11-15  123  			fptr->frag_off = htons(offset);
53b24b8f Ian Morris           2015-03-29  124  			if (skb->next)
d1da932e Vlad Yasevich        2012-11-15  125  				fptr->frag_off |= htons(IP6_MF);
d1da932e Vlad Yasevich        2012-11-15  126  			offset += (ntohs(ipv6h->payload_len) -
d1da932e Vlad Yasevich        2012-11-15  127  				   sizeof(struct frag_hdr));
d1da932e Vlad Yasevich        2012-11-15  128  		}

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
From: Peter Dawson @ 2017-05-18  2:49 UTC (permalink / raw)
  To: stephen; +Cc: netdev
In-Reply-To: <579256095bc540bab2668f55a6a2c25b@XCH15-07-02.nw.nos.boeing.com>

[-- Attachment #1: Type: text/plain, Size: 1214 bytes --]

>> This fix addresses two problems in the way the DSCP field is
>> formulated on the  encapsulating header of IPv6 tunnels.
>> This fix addresses Bug 195661.
>> https://bugzilla.kernel.org/show_bug.cgi?id=195661
>>
>> 1) The IPv6 tunneling code was manipulating the DSCP field of the
>> encapsulating  packet using the 32b flowlabel. Since the flowlabel is
>> only the lower 20b it  was incorrect to assume that the upper 12b
>> containing the DSCP and ECN fields  would remain intact when
>> formulating the encapsulating header. This fix  handles the 'inherit'
>> and 'fixed-value' DSCP cases explicitly using the extant  dsfield u8 variable.
>>
>> 2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
>> incorrect  and resulted in the DSCP value always being set to 0.
>> ---
>>  net/ipv6/ip6_gre.c    | 18 ++++++++++--------
>>  net/ipv6/ip6_tunnel.c | 28 +++++++++++++++++-----------
>>  2 files changed, 27 insertions(+), 19 deletions(-)
>
> This patch looks correct, but has trivial style issues like using spaces instead of tabs and other junk.  Please run checkpatch.pl, look at the results and resubmit.

Thanks.  I've run checkpatch.pl over this updated file with no errors.
Regards
Peter

[-- Attachment #2: tunnel_inherit.patch --]
[-- Type: text/x-diff, Size: 5709 bytes --]

From f12adb265f683d53ae071725e657dab17d26ed8d Mon Sep 17 00:00:00 2001
From: Peter Dawson <peter.a.dawson@boeing.com>
Date: Thu, 18 May 2017 11:52:22 +1000
Subject: [PATCH] ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated
 packets

This fix addresses two problems in the way the DSCP field is formulated
 on the encapsulating header of IPv6 tunnels.
This fix addresses Bug 195661.

1) The IPv6 tunneling code was manipulating the DSCP field of the
 encapsulating packet using the 32b flowlabel. Since the flowlabel is
 only the lower 20b it was incorrect to assume that the upper 12b
 containing the DSCP and ECN fields would remain intact when formulating
 the encapsulating header. This fix handles the 'inherit' and
 'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.

2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
 incorrect and resulted in the DSCP value always being set to 0.

Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
---
 net/ipv6/ip6_gre.c    | 13 +++++++------
 net/ipv6/ip6_tunnel.c | 21 +++++++++++++--------
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 8d128ba..0c5b4caa 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -537,11 +537,10 @@ static inline int ip6gre_xmit_ipv4(struct sk_buff *skb, struct net_device *dev)
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 
-	dsfield = ipv4_get_dsfield(iph);
-
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		fl6.flowlabel |= htonl((__u32)iph->tos << IPV6_TCLASS_SHIFT)
-					  & IPV6_TCLASS_MASK;
+		dsfield = ipv4_get_dsfield(iph);
+	else
+		dsfield = ip6_tclass(t->parms.flowinfo);
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
 		fl6.flowi6_mark = skb->mark;
 	else
@@ -598,9 +597,11 @@ static inline int ip6gre_xmit_ipv6(struct sk_buff *skb, struct net_device *dev)
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
 
-	dsfield = ipv6_get_dsfield(ipv6h);
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-		fl6.flowlabel |= (*(__be32 *) ipv6h & IPV6_TCLASS_MASK);
+		dsfield = ipv6_get_dsfield(ipv6h);
+	else
+		dsfield = ip6_tclass(t->parms.flowinfo);
+
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
 		fl6.flowlabel |= ip6_flowlabel(ipv6h);
 	if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 6eb2ae5..7ae6c50 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1196,7 +1196,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
 	ipv6h = ipv6_hdr(skb);
-	ip6_flow_hdr(ipv6h, INET_ECN_encapsulate(0, dsfield),
+	ip6_flow_hdr(ipv6h, dsfield,
 		     ip6_make_flowlabel(net, skb, fl6->flowlabel, true, fl6));
 	ipv6h->hop_limit = hop_limit;
 	ipv6h->nexthdr = proto;
@@ -1231,8 +1231,6 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (tproto != IPPROTO_IPIP && tproto != 0)
 		return -1;
 
-	dsfield = ipv4_get_dsfield(iph);
-
 	if (t->parms.collect_md) {
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
@@ -1246,6 +1244,7 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		fl6.flowi6_proto = IPPROTO_IPIP;
 		fl6.daddr = key->u.ipv6.dst;
 		fl6.flowlabel = key->label;
+		dsfield = ip6_tclass(key->label);
 	} else {
 		if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 			encap_limit = t->parms.encap_limit;
@@ -1254,8 +1253,9 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		fl6.flowi6_proto = IPPROTO_IPIP;
 
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-			fl6.flowlabel |= htonl((__u32)iph->tos << IPV6_TCLASS_SHIFT)
-					 & IPV6_TCLASS_MASK;
+			dsfield = ipv4_get_dsfield(iph);
+		else
+			dsfield = ip6_tclass(t->parms.flowinfo);
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
 			fl6.flowi6_mark = skb->mark;
 		else
@@ -1267,6 +1267,8 @@ ip4ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
+	dsfield = INET_ECN_encapsulate(dsfield, ipv4_get_dsfield(iph));
+
 	skb_set_inner_ipproto(skb, IPPROTO_IPIP);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
@@ -1300,8 +1302,6 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	    ip6_tnl_addr_conflict(t, ipv6h))
 		return -1;
 
-	dsfield = ipv6_get_dsfield(ipv6h);
-
 	if (t->parms.collect_md) {
 		struct ip_tunnel_info *tun_info;
 		const struct ip_tunnel_key *key;
@@ -1315,6 +1315,7 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		fl6.flowi6_proto = IPPROTO_IPV6;
 		fl6.daddr = key->u.ipv6.dst;
 		fl6.flowlabel = key->label;
+		dsfield = ip6_tclass(key->label);
 	} else {
 		offset = ip6_tnl_parse_tlv_enc_lim(skb, skb_network_header(skb));
 		/* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */
@@ -1337,7 +1338,9 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 		fl6.flowi6_proto = IPPROTO_IPV6;
 
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_TCLASS)
-			fl6.flowlabel |= (*(__be32 *)ipv6h & IPV6_TCLASS_MASK);
+			dsfield = ipv6_get_dsfield(ipv6h);
+		else
+			dsfield = ip6_tclass(t->parms.flowinfo);
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FLOWLABEL)
 			fl6.flowlabel |= ip6_flowlabel(ipv6h);
 		if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
@@ -1351,6 +1354,8 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
 		return -1;
 
+	dsfield = INET_ECN_encapsulate(dsfield, ipv6_get_dsfield(ipv6h));
+
 	skb_set_inner_ipproto(skb, IPPROTO_IPV6);
 
 	err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
From: Alexei Starovoitov @ 2017-05-18  2:48 UTC (permalink / raw)
  To: Edward Cree, David Miller, Daniel Borkmann; +Cc: alexei.starovoitov, netdev
In-Reply-To: <e873f261-e1c1-c344-3d72-254623e0c91a@solarflare.com>

On 5/17/17 8:33 AM, Edward Cree wrote:
> On 17/05/17 15:00, Edward Cree wrote:
>> OTOH the 'track known 1s as well' might work in a nice generic way
>>  and cover all bases, I'll have to experiment a bit with that.
>>
>> -Ed
>
> So I did some experiments (in Python, script follows) and found that
>  indeed this does appear to work, at least for addition and shifts.
> The idea is that we have a 0s mask and a 1s mask; for bits that are
>  unknown, the 0s mask is set and the 1s mask is cleared.  So a
>  completely unknown variable has masks (~0, 0), then if you shift it
>  left 2 you get (~3, 0) - just shift both masks.  A constant x has
>  masks (x, x) - all the 0s are known 0s and all the 1s are known 1s.
> Addition is a bit more complicated: we compute the 'unknown bits'
>  mask, by XORing the 0s and 1s masks together, of each addend.  Then
>  we add the corresponding masks from each addend together, and force
>  the 'unknown' bits to the appropriate values in each mask.
> So given (a1, b1) and (a2, b2), we compute m1 = a1 ^ b1,
>  m2 = a2 ^ b2, and m = m1 | m2.  Then a = (a1 + a2) | m, and
>  b = (b1 + b2) & ~m.
> As a worked example, 2 + (x << 2) + 14:
>  2 => (2, 2) constant
>  x => (~0, 0) unknown
>  x << 2 => (~3, 0)
>  2 + (x << 2): add (2, 2) with (~3, 0)
>     m1 = 0, m2 = ~3, m = ~3
>     a = (2 + ~3) | ~3 = ~1 | ~3 = ~1
>     b = (2 + 0) & ~~3 = 2 & 3 = 2
>     so (~1, 2), which means "...xx10"
> now add 14: add (~1, 2) with (14, 14)
>     m1 = ~3, m2 = 0, m = ~3
>     a = (~1 + 14) | ~3 = 12 | ~3 = ~3
>     b = (2 + 14) & ~~3 = 16 & 3 = 0
>     so (~3, 0), which means "...xx00"
> and the result is 4-byte aligned.
>
> -Ed
>
> PS. Beware of bugs in the following code; I have only tested it, not
>  proved it correct.
>
> --8<--
>
> #!/usr/bin/python2
>
> def cpl(x):
>     return (~x)&0xff
>
> class AlignedNumber(object):
>     def __init__(self, mask0=0xff, mask1=0):
>         """mask0 has 0s for bits known to be 0, 1s otherwise.
>         mask1 has 1s for bits known to be 1, 0s otherwise.
>         Thus a bit which is set in mask0 and cleared in mask1 is an 'unknown'
>         bit, while a bit which is cleared in mask0 and set in mask1 is a bug.
>         """
>         self.masks = (mask0 & 0xff, mask1 & 0xff)
>         self.validate()
>     @classmethod
>     def const(cls, value):
>         """All bits are known, so both masks equal value."""
>         return cls(value, value)
>     def validate(self):
>         # Check for bits 'known' to be both 0 and 1
>         assert not (cpl(self.masks[0]) & self.masks[1]), self.masks
>         # Check unknown bits don't follow known bits
>         assert self.mx | ((self.mx - 1) & 0xff) == 0xff, self.masks
>     def __str__(self):
>         return ':'.join(map(bin, self.masks))
>     def __lshift__(self, sh):
>         """Shift both masks left, low bits become 'known 0'"""
>         return self.__class__(self.masks[0] << sh, self.masks[1] << sh)
>     def __rshift__(self, sh):
>         """Shift 1s into mask0; high bits become 'unknown'.
>         While strictly speaking they may be known 0 if we're tracking the full
>         word and doing unsigned shifts, having known bits before unknown bits
>         breaks the addition code."""
>         return self.__class__(cpl(cpl(self.masks[0]) >> sh), self.masks[1] >> sh)
>     @property
>     def mx(self):
>         """Mask of unknown bits"""
>         return self.masks[0] ^ self.masks[1]
>     def __add__(self, other):
>         """OR the mx values together.  Unknown bits could cause carries, so we
>         just assume that they can carry all the way to the left (thus we keep
>         our mx masks in the form 1...10...0.
>         Then, add our 0- and 1-masks, and force the bits of the combined mx
>         mask to the unknown state."""
>         if isinstance(other, int):
>             return self + AlignedNumber.const(other)
>         assert isinstance(other, AlignedNumber), other
>         m = self.mx | other.mx
>         return self.__class__((self.masks[0] + other.masks[0]) | m,
>                               (self.masks[1] + other.masks[1]) & cpl(m))
>     def is_aligned(self, bits):
>         """We are 2^n-aligned iff the bottom n bits are known-0."""
>         mask = (1 << bits) - 1
>         return not (self.masks[0] & mask)
>
> if __name__ == '__main__':
>     a = AlignedNumber.const(2)
>     b = AlignedNumber() << 2
>     c = AlignedNumber.const(14)
>     print a, b, c
>     print a + b, a + b + c
>     assert (a + b + c).is_aligned(2)

that bit was confusing to me.
is_aligned(2) checks that number is 4 byte aligned :)

Would it be easier to represent this logic via (mask_of_unknown, value)
instead of (mask0, mask1) ?
Where mask_of_unknown has 1s for unknown bits and 0 for known bits
in the value. Then arithmetic operations will be easier to visualize.
Like:
(mask1, val1) + (mask2, val2) = (mask1 | mask2, val1 + val2)

Here is your modified script:
#!/usr/bin/python2

class AlignedNumber(object):
     def __init__(self, mask=0xff, value=0):
         # mask has 1s for unknown bits, 0s for known bits in value
         self.masks = (mask & 0xff, value & 0xff)
     @classmethod
     def const(cls, value):
         # All bits are known, hence mask = 0
         return cls(0, value)
     def __str__(self):
         return ':'.join(map(bin, self.masks))
     def __lshift__(self, sh):
         return self.__class__(self.masks[0] << sh, self.masks[1] << sh)
     def __rshift__(self, sh):
         return self.__class__(self.masks[0] >> sh, self.masks[1] >> sh)
     def __add__(self, other):
         if isinstance(other, int):
             return self + AlignedNumber.const(other)
         assert isinstance(other, AlignedNumber), other
         return self.__class__(self.masks[0] | other.masks[0],
                               self.masks[1] + other.masks[1])
     def is_aligned(self, bits):
         """We are 2^n-aligned iff the bottom n bits are known-0."""
         mask = (1 << bits) - 1
         # assume all unknown bits are 1 for alignment checking purpose
         m = self.masks[0] | self.masks[1]
         return not (self.masks[0] & mask)

if __name__ == '__main__':
     a = AlignedNumber.const(2)
     b = AlignedNumber() << 2
     c = AlignedNumber.const(14)
     print a, b, c
     print a + b, a + b + c
     assert (a + b + c).is_aligned(2)
     d = (AlignedNumber() << 4) >> 2
     print d
     assert d.is_aligned(2)


As far as upper bits we can tweak the algorithm to eat into
one or more bits of known bits due to carry.
Like
00xx11 + 00xx11 = 0xxx10
we will eat only one bit (second from left) and the highest bit
is known to stay zero, since carry can only compromise 2nd from left.
Such logic should work for sparse representation of unknown bits too
Like:
10xx01xx10 +
01xx01xx00 =
xxxx1xxx10
both upper two bits would be unknown, but only one middle bit becomes
unknown.
In typical case (that verifier tracks today) a bunch of upper bits will
be zero, so worst case we will be eating one bit at a time.

^ permalink raw reply

* Re: [PATCH net] bpf: adjust verifier heuristics
From: David Miller @ 2017-05-18  2:56 UTC (permalink / raw)
  To: daniel; +Cc: ast, netdev
In-Reply-To: <57a6fc1f937d985a5851016221c98a29fb1cc46a.1495068621.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 18 May 2017 03:00:06 +0200

> Current limits with regards to processing program paths do not
> really reflect today's needs anymore due to programs becoming
> more complex and verifier smarter, keeping track of more data
> such as const ALU operations, alignment tracking, spilling of
> PTR_TO_MAP_VALUE_ADJ registers, and other features allowing for
> smarter matching of what LLVM generates.
> 
> This also comes with the side-effect that we result in fewer
> opportunities to prune search states and thus often need to do
> more work to prove safety than in the past due to different
> register states and stack layout where we mismatch. Generally,
> it's quite hard to determine what caused a sudden increase in
> complexity, it could be caused by something as trivial as a
> single branch somewhere at the beginning of the program where
> LLVM assigned a stack slot that is marked differently throughout
> other branches and thus causing a mismatch, where verifier
> then needs to prove safety for the whole rest of the program.
> Subsequently, programs with even less than half the insn size
> limit can get rejected. We noticed that while some programs
> load fine under pre 4.11, they get rejected due to hitting
> limits on more recent kernels. We saw that in the vast majority
> of cases (90+%) pruning failed due to register mismatches. In
> case of stack mismatches, majority of cases failed due to
> different stack slot types (invalid, spill, misc) rather than
> differences in spilled registers.
> 
> This patch makes pruning more aggressive by also adding markers
> that sit at conditional jumps as well. Currently, we only mark
> jump targets for pruning. For example in direct packet access,
> these are usually error paths where we bail out. We found that
> adding these markers, it can reduce number of processed insns
> by up to 30%. Another option is to ignore reg->id in probing
> PTR_TO_MAP_VALUE_OR_NULL registers, which can help pruning
> slightly as well by up to 7% observed complexity reduction as
> stand-alone. Meaning, if a previous path with register type
> PTR_TO_MAP_VALUE_OR_NULL for map X was found to be safe, then
> in the current state a PTR_TO_MAP_VALUE_OR_NULL register for
> the same map X must be safe as well. Last but not least the
> patch also adds a scheduling point and bumps the current limit
> for instructions to be processed to a more adequate value.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Ok, applied, but we should continue trying to make the pruner
more effective.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox