Netdev List
 help / color / mirror / Atom feed
* [bpf PATCH v2 2/3] bpf: sockmap, zero sg_size on error when buffer is released
From: John Fastabend @ 2018-05-02 20:50 UTC (permalink / raw)
  To: borkmann, ast; +Cc: netdev
In-Reply-To: <20180502204748.12776.80509.stgit@john-Precision-Tower-5810>

When an error occurs during a redirect we have two cases that need
to be handled (i) we have a cork'ed buffer (ii) we have a normal
sendmsg buffer.

In the cork'ed buffer case we don't currently support recovering from
errors in a redirect action. So the buffer is released and the error
should _not_ be pushed back to the caller of sendmsg/sendpage. The
rationale here is the user will get an error that relates to old
data that may have been sent by some arbitrary thread on that sock.
Instead we simple consume the data and tell the user that the data
has been consumed. We may add proper error recovery in the future.
However, this patch fixes a bug where the bytes outstanding counter
sg_size was not zeroed. This could result in a case where if the user
has both a cork'ed action and apply action in progress we may
incorrectly call into the BPF program when the user expected an
old verdict to be applied via the apply action. I don't have a use
case where using apply and cork at the same time is valid but we
never explicitly reject it because it should work fine. This patch
ensures the sg_size is zeroed so we don't have this case.

In the normal sendmsg buffer case (no cork data) we also do not
zero sg_size. Again this can confuse the apply logic when the logic
calls into the BPF program when the BPF programmer expected the old
verdict to remain. So ensure we set sg_size to zero here as well. And
additionally to keep the psock state in-sync with the sk_msg_buff
release all the memory as well. Previously we did this before
returning to the user but this left a gap where psock and sk_msg_buff
states were out of sync which seems fragile. No additional overhead
is taken here except for a call to check the length and realize its
already been freed. This is in the error path as well so in my
opinion lets have robust code over optimized error paths.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 943929a..052c313 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -701,15 +701,22 @@ static int bpf_exec_tx_verdict(struct smap_psock *psock,
 		err = bpf_tcp_sendmsg_do_redirect(redir, send, m, flags);
 		lock_sock(sk);
 
+		if (unlikely(err < 0)) {
+			free_start_sg(sk, m);
+			psock->sg_size = 0;
+			if (!cork)
+				*copied -= send;
+		} else {
+			psock->sg_size -= send;
+		}
+
 		if (cork) {
 			free_start_sg(sk, m);
+			psock->sg_size = 0;
 			kfree(m);
 			m = NULL;
+			err = 0;
 		}
-		if (unlikely(err))
-			*copied -= err;
-		else
-			psock->sg_size -= send;
 		break;
 	case __SK_DROP:
 	default:

^ permalink raw reply related

* [bpf PATCH v2 1/3] bpf: sockmap, fix scatterlist update on error path in send with apply
From: John Fastabend @ 2018-05-02 20:50 UTC (permalink / raw)
  To: borkmann, ast; +Cc: netdev
In-Reply-To: <20180502204748.12776.80509.stgit@john-Precision-Tower-5810>

When the call to do_tcp_sendpage() fails to send the complete block
requested we either retry if only a partial send was completed or
abort if we receive a error less than or equal to zero. Before
returning though we must update the scatterlist length/offset to
account for any partial send completed.

Before this patch we did this at the end of the retry loop, but
this was buggy when used while applying a verdict to fewer bytes
than in the scatterlist. When the scatterlist length was being set
we forgot to account for the apply logic reducing the size variable.
So the result was we chopped off some bytes in the scatterlist without
doing proper cleanup on them. This results in a WARNING when the
sock is tore down because the bytes have previously been charged to
the socket but are never uncharged.

The simple fix is to simply do the accounting inside the retry loop
subtracting from the absolute scatterlist values rather than trying
to accumulate the totals and subtract at the end.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 634415c..943929a 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -326,6 +326,9 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
 			if (ret > 0) {
 				if (apply)
 					apply_bytes -= ret;
+
+				sg->offset += ret;
+				sg->length -= ret;
 				size -= ret;
 				offset += ret;
 				if (uncharge)
@@ -333,8 +336,6 @@ static int bpf_tcp_push(struct sock *sk, int apply_bytes,
 				goto retry;
 			}
 
-			sg->length = size;
-			sg->offset = offset;
 			return ret;
 		}
 

^ permalink raw reply related

* [bpf PATCH v2 0/3] sockmap error path fixes
From: John Fastabend @ 2018-05-02 20:50 UTC (permalink / raw)
  To: borkmann, ast; +Cc: netdev

When I added the test_sockmap to selftests I mistakenly changed the
test logic a bit. The result of this was on redirect cases we ended up
choosing the wrong sock from the BPF program and ended up sending to a
socket that had no receive handler. The result was the actual receive
handler, running on a different socket, is timing out and closing the
socket. This results in errors (-EPIPE to be specific) on the sending
side. Typically happening if the sender does not complete the send
before the receive side times out. So depending on timing and the size
of the send we may get errors. This exposed some bugs in the sockmap
error path handling.

This series fixes the errors. The primary issue is we did not do proper
memory accounting in these cases which resulted in missing a
sk_mem_uncharge(). This happened in the redirect path and in one case
on the normal send path. See the three patches for the details.

The other take-away from this is we need to fix the test_sockmap and
also add more negative test cases. That will happen in bpf-next.

Finally, I tested this using the existing test_sockmap program, the
older sockmap sample test script, and a few real use cases with
Cilium. All of these seem to be in working correctly.

v2: fix compiler warning, drop iterator variable 'i' that is no longer
    used in patch 3.

---

John Fastabend (3):
      bpf: sockmap, fix scatterlist update on error path in send with apply
      bpf: sockmap, zero sg_size on error when buffer is released
      bpf: sockmap, fix error handling in redirect failures


 kernel/bpf/sockmap.c |   48 ++++++++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 22 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next 1/4] ipv6: Calculate hash thresholds for IPv6 nexthops
From: Thomas Winter @ 2018-05-02 20:48 UTC (permalink / raw)
  To: Ido Schimmel, David Ahern
  Cc: Eric Dumazet, davem@davemloft.net, Ido Schimmel,
	netdev@vger.kernel.org, roopa@cumulusnetworks.com,
	nikolay@cumulusnetworks.com, pch@ordbogen.com, jkbs@redhat.com,
	yoshfuji@linux-ipv6.org, mlxsw@mellanox.com
In-Reply-To: <20180502190401.GA470@splinter>

> On Wed, May 02, 2018 at 12:58:56PM -0600, David Ahern wrote:
> > On 5/2/18 12:53 PM, Ido Schimmel wrote:
> > > 
> > > So this fixes the issue for me. To reproduce:
> > > 
> > > # ip -6 address add 2001:db8::1/64 dev dummy0
> > > # ip -6 address add 2001:db8::1/64 dev dummy1
> > > 
> > > This reproduces the issue because due to above commit both local routes
> > > are considered siblings... :/
> > > 
> > > local 2001:db8::1 proto kernel metric 0 
> > >         nexthop dev dummy0 weight 1 
> > >         nexthop dev dummy1 weight 1 pref medium
> > > 
> > > I think it's best to revert the patch and have Thomas submit a fixed
> > > version to net-next. I was actually surprised to see it applied to net.
> > 
> > ugly side effect of the way ecmp routes are managed in IPv6. I think
> > revert is the best option for now.
> 
> OK. I'll send a patch.

fe80::/64  proto kernel  metric 256 
        nexthop dev vlan1 weight 1
        nexthop dev vlan10 weight 1
        nexthop dev vlan30 weight 1
        nexthop dev tunnel11 weight 1
        nexthop dev tunnel12 weight 1

Sorry I completely missed that, I was always looking at other route tables. 
Should I look at reworking this? It would be great to have these ECMP routes for other purposes.

ip -6 ro show table 601
default  metric 1024 
        nexthop dev tunnel11 weight 1
        nexthop dev tunnel12 weight 1

^ permalink raw reply

* Re: [PATCH 0/2] sh_eth: complain on access to unimplemented TSU registers
From: David Miller @ 2018-05-02 20:41 UTC (permalink / raw)
  To: sergei.shtylyov; +Cc: netdev, linux-renesas-soc, linux-sh
In-Reply-To: <4a17ab65-4d9c-897e-c340-7d86e0296f2f@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Wed, 2 May 2018 22:53:23 +0300

> Here's a set of 2 patches against DaveM's 'net-next.git' repo. The 1st patch
> routes TSU_POST<n> register accesses thru sh_eth_tsu_{read|write}() and the 2nd
> added WARN_ON() unimplemented register to those functions. I'm going to deal with
> TSU_ADR{H|L}<n> registers in a later series...
> 
> [1/2] sh_eth: use TSU register accessors for TSU_POST<n>
> [2/2] sh_eth: WARN_ON() access to unimplemented TSU register

Series applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net] net_sched: fq: take care of throttled flows before reuse
From: David Miller @ 2018-05-02 20:38 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet
In-Reply-To: <20180502170330.55458-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed,  2 May 2018 10:03:30 -0700

> Normally, a socket can not be freed/reused unless all its TX packets
> left qdisc and were TX-completed. However connect(AF_UNSPEC) allows
> this to happen.
> 
> With commit fc59d5bdf1e3 ("pkt_sched: fq: clear time_next_packet for
> reused flows") we cleared f->time_next_packet but took no special
> action if the flow was still in the throttled rb-tree.
> 
> Since f->time_next_packet is the key used in the rb-tree searches,
> blindly clearing it might break rb-tree integrity. We need to make
> sure the flow is no longer in the rb-tree to avoid this problem.
> 
> Fixes: fc59d5bdf1e3 ("pkt_sched: fq: clear time_next_packet for reused flows")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.

^ permalink raw reply

* Re: [PATCH net] ipv6: Revert "ipv6: Allow non-gateway ECMP for IPv6"
From: David Miller @ 2018-05-02 20:33 UTC (permalink / raw)
  To: idosch; +Cc: netdev, dsahern, eric.dumazet, Thomas.Winter, mlxsw
In-Reply-To: <20180502194156.9275-1-idosch@mellanox.com>

From: Ido Schimmel <idosch@mellanox.com>
Date: Wed,  2 May 2018 22:41:56 +0300

> This reverts commit edd7ceb78296 ("ipv6: Allow non-gateway ECMP for
> IPv6").
> 
> Eric reported a division by zero in rt6_multipath_rebalance() which is
> caused by above commit that considers identical local routes to be
> siblings. The division by zero happens because a nexthop weight is not
> set for local routes.
> 
> Revert the commit as it does not fix a bug and has side effects.
> 
> To reproduce:
> 
> # ip -6 address add 2001:db8::1/64 dev dummy0
> # ip -6 address add 2001:db8::1/64 dev dummy1
> 
> Fixes: edd7ceb78296 ("ipv6: Allow non-gateway ECMP for IPv6")
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next v9 2/4] net: Introduce generic failover module
From: Michael S. Tsirkin @ 2018-05-02 20:30 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Jiri Pirko, stephen, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, aaron.f.brown
In-Reply-To: <052e2c60-dc4c-d6a0-0159-fc1821cb4365@intel.com>

On Wed, May 02, 2018 at 10:51:12AM -0700, Samudrala, Sridhar wrote:
> 
> 
> On 5/2/2018 9:15 AM, Jiri Pirko wrote:
> > Sat, Apr 28, 2018 at 11:06:01AM CEST, jiri@resnulli.us wrote:
> > > Fri, Apr 27, 2018 at 07:06:58PM CEST, sridhar.samudrala@intel.com wrote:
> > [...]
> > 
> > 
> > > > +
> > > > +	err = netdev_rx_handler_register(slave_dev, net_failover_handle_frame,
> > > > +					 failover_dev);
> > > > +	if (err) {
> > > > +		netdev_err(slave_dev, "can not register failover rx handler (err = %d)\n",
> > > > +			   err);
> > > > +		goto err_handler_register;
> > > > +	}
> > > > +
> > > > +	err = netdev_upper_dev_link(slave_dev, failover_dev, NULL);
> > > Please use netdev_master_upper_dev_link().
> > Don't forget to fillup struct netdev_lag_upper_info - NETDEV_LAG_TX_TYPE_ACTIVEBACKUP
> > 
> > 
> > Also, please call netdev_lower_state_changed() when the active slave
> > device changes from primary->backup of backup->primary and whenever link
> > state of a slave changes
> > 
> Sure. will look into it.  Do you think this will help with the issue
> you saw with having to change mac on standy twice to get the init scripts
> working? We are now going to block changing the mac on both standby and
> failover.
> 
> Also, i was wondering if we should set dev->flags to IFF_MASTER on failover
> and IFF_SLAVE on primary and standby.

We do need a way to find things out, that's for sure.
How does userspace know it's a failover
config and find the failover device right now?

> netvsc does this.
> Does this help with the init scripts and network manager to skip slave
> devices for dhcp requests?

Try it?

^ permalink raw reply

* [RFC iproute2-next 5/5] ss: use correct slab statistics
From: Stephen Hemminger @ 2018-05-02 20:28 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Stephen Hemminger
In-Reply-To: <20180502202801.5255-1-stephen@networkplumber.org>

From: Stephen Hemminger <sthemmin@microsoft.com>

The slabinfo names changed years ago, and ss statistics were broken.
This changes to use current slab names and handle TCP IPv6.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 misc/ss.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 97304cd8abfc..66c767cc415b 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -742,12 +742,12 @@ static int get_slabstat(struct slabstat *s)
 {
 	char buf[256];
 	FILE *fp;
-	int cnt;
+	int *stats = (int *) s;
 	static int slabstat_valid;
 	static const char * const slabstat_ids[] = {
-		"sock",
-		"tcp_tw_bucket",
-		"tcp_open_request",
+		"sock_inode_cache",
+		"tw_sock_TCP",
+		"request_sock_TCP",
 	};
 
 	if (slabstat_valid)
@@ -759,24 +759,23 @@ static int get_slabstat(struct slabstat *s)
 	if (!fp)
 		return -1;
 
-	cnt = sizeof(*s)/sizeof(int);
-
 	if (!fgets(buf, sizeof(buf), fp)) {
 		fclose(fp);
 		return -1;
 	}
+
 	while (fgets(buf, sizeof(buf), fp) != NULL) {
-		int i;
+		int i, v;
 
 		for (i = 0; i < ARRAY_SIZE(slabstat_ids); i++) {
-			if (memcmp(buf, slabstat_ids[i], strlen(slabstat_ids[i])) == 0) {
-				sscanf(buf, "%*s%d", ((int *)s) + i);
-				cnt--;
+			if (memcmp(buf, slabstat_ids[i], strlen(slabstat_ids[i])) != 0)
+				continue;
+
+			if (sscanf(buf, "%*s%d", &v) == 1) {
+				stats[i] += v;
 				break;
 			}
 		}
-		if (cnt <= 0)
-			break;
 	}
 
 	slabstat_valid = 1;
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute2-next 4/5] ss: don't look for skbuff_head_cache
From: Stephen Hemminger @ 2018-05-02 20:28 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Stephen Hemminger
In-Reply-To: <20180502202801.5255-1-stephen@networkplumber.org>

From: Stephen Hemminger <sthemmin@microsoft.com>

Not used in current code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 misc/ss.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 4f76999c0fee..97304cd8abfc 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -734,7 +734,6 @@ struct slabstat {
 	int socks;
 	int tcp_tws;
 	int tcp_syns;
-	int skbs;
 };
 
 static struct slabstat slabstat;
@@ -749,7 +748,6 @@ static int get_slabstat(struct slabstat *s)
 		"sock",
 		"tcp_tw_bucket",
 		"tcp_open_request",
-		"skbuff_head_cache",
 	};
 
 	if (slabstat_valid)
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute2-next 3/5] ss: use sockstat to get TCP bind ports
From: Stephen Hemminger @ 2018-05-02 20:27 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Stephen Hemminger
In-Reply-To: <20180502202801.5255-1-stephen@networkplumber.org>

From: Stephen Hemminger <sthemmin@microsoft.com>

Using slabinfo to try and get the number of bind_buckets no longer
works because of slab cache merging. Instead use proposed enhancment
of /proc/net/sockstat to get the same data.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 misc/ss.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index c88a25581755..4f76999c0fee 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -732,7 +732,6 @@ next:
 
 struct slabstat {
 	int socks;
-	int tcp_ports;
 	int tcp_tws;
 	int tcp_syns;
 	int skbs;
@@ -748,7 +747,6 @@ static int get_slabstat(struct slabstat *s)
 	static int slabstat_valid;
 	static const char * const slabstat_ids[] = {
 		"sock",
-		"tcp_bind_bucket",
 		"tcp_tw_bucket",
 		"tcp_open_request",
 		"skbuff_head_cache",
@@ -4594,6 +4592,7 @@ struct ssummary {
 	int tcp_orphans;
 	int tcp_tws;
 	int tcp4_hashed;
+	int tcp_ports;
 	int udp4;
 	int raw4;
 	int frag4;
@@ -4629,9 +4628,9 @@ static void get_sockstat_line(char *line, struct ssummary *s)
 	else if (strcmp(id, "FRAG6:") == 0)
 		sscanf(rem, "%*s%d%*s%d", &s->frag6, &s->frag6_mem);
 	else if (strcmp(id, "TCP:") == 0)
-		sscanf(rem, "%*s%d%*s%d%*s%d%*s%d%*s%ld",
+		sscanf(rem, "%*s%d%*s%d%*s%d%*s%d%*s%ld%*s%d",
 		       &s->tcp4_hashed,
-		       &s->tcp_orphans, &s->tcp_tws, &s->tcp_total, &s->tcp_mem);
+		       &s->tcp_orphans, &s->tcp_tws, &s->tcp_total, &s->tcp_mem, &s->tcp_ports);
 }
 
 static int get_sockstat(struct ssummary *s)
@@ -4676,8 +4675,7 @@ static int print_summary(void)
 	       s.tcp_total - (s.tcp4_hashed+s.tcp6_hashed-s.tcp_tws),
 	       s.tcp_orphans,
 	       slabstat.tcp_syns,
-	       s.tcp_tws, slabstat.tcp_tws,
-	       slabstat.tcp_ports
+	       s.tcp_tws, slabstat.tcp_tws, s.tcp_ports
 	       );
 
 	printf("\n");
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute2-next 2/5] ss: make tcp_mem long
From: Stephen Hemminger @ 2018-05-02 20:27 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180502202801.5255-1-stephen@networkplumber.org>

The tcp_memory field in /proc/net/sockstat is formatted as
a long value by kernel. Change ss to keep this as full value.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 misc/ss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 22c76e34f83b..c88a25581755 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -4589,7 +4589,7 @@ static int get_snmp_int(const char *proto, const char *key, int *result)
 
 struct ssummary {
 	int socks;
-	int tcp_mem;
+	long tcp_mem;
 	int tcp_total;
 	int tcp_orphans;
 	int tcp_tws;
@@ -4629,7 +4629,7 @@ static void get_sockstat_line(char *line, struct ssummary *s)
 	else if (strcmp(id, "FRAG6:") == 0)
 		sscanf(rem, "%*s%d%*s%d", &s->frag6, &s->frag6_mem);
 	else if (strcmp(id, "TCP:") == 0)
-		sscanf(rem, "%*s%d%*s%d%*s%d%*s%d%*s%d",
+		sscanf(rem, "%*s%d%*s%d%*s%d%*s%d%*s%ld",
 		       &s->tcp4_hashed,
 		       &s->tcp_orphans, &s->tcp_tws, &s->tcp_total, &s->tcp_mem);
 }
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute2-next 1/5] ss: make args to get_snmp_int const
From: Stephen Hemminger @ 2018-05-02 20:27 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger
In-Reply-To: <20180502202801.5255-1-stephen@networkplumber.org>

These are keys for lookup and should be const.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 misc/ss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 3ed7e66962f3..22c76e34f83b 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -4539,7 +4539,7 @@ static int handle_follow_request(struct filter *f)
 	return ret;
 }
 
-static int get_snmp_int(char *proto, char *key, int *result)
+static int get_snmp_int(const char *proto, const char *key, int *result)
 {
 	char buf[1024];
 	FILE *fp;
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute2-next 0/5] ss statistics fixes
From: Stephen Hemminger @ 2018-05-02 20:27 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger

From: Stephen Hemminger <sthemmin@microsoft.com>

The output of the ss -s command has been broken for a long time
because of kernel changes (ie since 2.6).

This is an attempt to resolve most of the issues. Still don't like
the way it is using slabinfo to get the data but some of this information
would be expensive for kernel to account for otherwise.

Stephen Hemminger (5):
  ss: make args to get_snmp_int const
  ss: make tcp_mem long
  ss: use sockstat to get TCP bind ports
  ss: don't look for skbuff_head_cache
  ss: use correct slab statistics

 misc/ss.c | 39 +++++++++++++++++----------------------
 1 file changed, 17 insertions(+), 22 deletions(-)

-- 
2.17.0

^ permalink raw reply

* Re: [PATCH net-next 00/10] r8169: series with further improvements
From: David Miller @ 2018-05-02 20:24 UTC (permalink / raw)
  To: hkallweit1; +Cc: nic_swsd, netdev
In-Reply-To: <2c7cae9e-d989-8529-7209-5a271242ce39@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Wed, 2 May 2018 21:28:10 +0200

> I thought I'm more or less done with the basic refactoring. But again
> I stumbled across things that can be improved / simplified.

Looks good, series applied, thanks Heiner.

^ permalink raw reply

* [PATCH net,stable] qmi_wwan: do not steal interfaces from class drivers
From: Bjørn Mork @ 2018-05-02 20:22 UTC (permalink / raw)
  To: netdev; +Cc: linux-usb, Bjørn Mork

The USB_DEVICE_INTERFACE_NUMBER matching macro assumes that
the { vendorid, productid, interfacenumber } set uniquely
identifies one specific function.  This has proven to fail
for some configurable devices. One example is the Quectel
EM06/EP06 where the same interface number can be either
QMI or MBIM, without the device ID changing either.

Fix by requiring the vendor-specific class for interface number
based matching.  Functions of other classes can and should use
class based matching instead.

Fixes: 03304bcb5ec4 ("net: qmi_wwan: use fixed interface number matching")
Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
It's quite possible that the fix should be integrated in the
USB_DEVICE_INTERFACE_NUMBER macro instead.  But that has grown a few
other users since it was added, so changing it now seems risky. 
Another option is of course adding a new match macro with the
USB_CLASS_VENDOR_SPEC match integrated. Maybe best?

But I'm proposing this as-is for now, since this quickfix seems most
suitable for stable backporting.

 drivers/net/usb/qmi_wwan.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 51c68fc416fa..42565dd33aa6 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1344,6 +1344,18 @@ static int qmi_wwan_probe(struct usb_interface *intf,
 		id->driver_info = (unsigned long)&qmi_wwan_info;
 	}
 
+	/* There are devices where the same interface number can be
+	 * configured as different functions. We should only bind to
+	 * vendor specific functions when matching on interface number
+	 */
+	if (id->match_flags & USB_DEVICE_ID_MATCH_INT_NUMBER &&
+	    desc->bInterfaceClass != USB_CLASS_VENDOR_SPEC) {
+		dev_dbg(&intf->dev,
+			"Rejecting interface number match for class %02x\n",
+			desc->bInterfaceClass);
+		return -ENODEV;
+	}
+
 	/* Quectel EC20 quirk where we've QMI on interface 4 instead of 0 */
 	if (quectel_ec20_detected(intf) && desc->bInterfaceNumber == 0) {
 		dev_dbg(&intf->dev, "Quectel EC20 quirk, skipping interface 0\n");
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH 2/3] mlx4: Don't bother using skb_tx_hash in mlx4_en_select_queue
From: Alexander Duyck @ 2018-05-02 20:20 UTC (permalink / raw)
  To: Ruhl, Michael J
  Cc: Duyck, Alexander H, netdev@vger.kernel.org, davem@davemloft.net,
	linux-rdma@vger.kernel.org, Dalessandro, Dennis,
	Vishwanathapura, Niranjana, tariqt@mellanox.com
In-Reply-To: <14063C7AD467DE4B82DEDB5C278E8663B2E40B6D@ORSMSX154.amr.corp.intel.com>

On Wed, May 2, 2018 at 11:09 AM, Ruhl, Michael J
<michael.j.ruhl@intel.com> wrote:
>>-----Original Message-----
>>From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>>owner@vger.kernel.org] On Behalf Of Alexander Duyck
>>Sent: Friday, April 27, 2018 2:07 PM
>>To: netdev@vger.kernel.org; davem@davemloft.net
>>Cc: linux-rdma@vger.kernel.org; Dalessandro, Dennis
>><dennis.dalessandro@intel.com>; Vishwanathapura, Niranjana
>><niranjana.vishwanathapura@intel.com>; tariqt@mellanox.com
>>Subject: [PATCH 2/3] mlx4: Don't bother using skb_tx_hash in
>>mlx4_en_select_queue
>>
>>The code in the fallback path has supported XDP in conjunction with the Tx
>>traffic classification for TCs for over a year now. So instead of just
>>calling skb_tx_hash for every packet we are better off using the fallback
>>since that will record the Tx queue to the socket and then that can be used
>>instead of having to recompute the hash every time.
>>
>>Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>---
>> drivers/net/ethernet/mellanox/mlx4/en_tx.c |    2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>>b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>>index 6b68537..0227786 100644
>>--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>>+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>>@@ -694,7 +694,7 @@ u16 mlx4_en_select_queue(struct net_device *dev,
>>struct sk_buff *skb,
>>       u16 rings_p_up = priv->num_tx_rings_p_up;
>>
>>       if (netdev_get_num_tc(dev))
>>-              return skb_tx_hash(dev, skb);
>>+              return fallback(dev, skb);
>>
>>       return fallback(dev, skb) % rings_p_up;
>
> Hi Alexander,
>
> The final return fallback() call is doing a % rings_p_up.
>
> Do you need to do that for the new fallback() call?
>
> Maybe you can get rid of the netdev_get_num_tc() call altogether?
>
> Thanks,
>
> Mike

I contemplated that. The problem I suspect that piece of code is
solving is that there are cases where the number of queues the device
advertises and the number in use may not match. We have similar cases
in the ixgbe and i40e drivers for things like MACVLAN or FCoE offload
where certain queues are reserved for the offloaded traffic. For those
we are just setting up 1 TC in order to limit traffic to a given queue
set.

As time permits one thing I may look at doing is making it so that
this driver always has the number of TCs set to 1 or more. If we do
that then we could drop the modulo fallback case, likely improve the
performance as a result, and get rid of this ndo_select_queue call
entirely. I just don't currently have the time or resources to take on
the research and validation of such a change.

Thanks.

- Alex

^ permalink raw reply

* DSA switch
From: Ran Shalit @ 2018-05-02 20:20 UTC (permalink / raw)
  To: netdev

Hello,

Is it possible to use switch just like external real switch,
connecting all ports to the same subnet ?

In our architecture, we prefer that all IPs connected to board shall
be in the same subnet.

Yet, I am not sure if it is possible with dsa switch, because in dsa
the ports are seen in linux as separated interfaces. If we set all
interfaces to the same subnet, it will be problematic, because cpu
won't know in which interface to send output packets, Right ?

for example, can I use the following configuration ? Does it only
require to config ip address of lan0-lan3 ?

net mask 255.255.0.0
lan0 - 10.1.0.1  ------ connected to PC 10.1.0.2
lan1 - 10.1.0.3
lan2 - 10.1.0.5
lan3 - 10.1.0.7 ------ connected to PC 10.1.0.8

Thank you,
ranran

^ permalink raw reply

* [PATCH v2 bpf-next 3/3] bpf: add faked "ending" subprog
From: Jiong Wang @ 2018-05-02 20:17 UTC (permalink / raw)
  To: alexei.starovoitov, borkmann
  Cc: john.fastabend, ecree, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525292239-1309-1-git-send-email-jiong.wang@netronome.com>

There are quite a few code snippet like the following in verifier:

       subprog_start = 0;
       if (env->subprog_cnt == cur_subprog + 1)
               subprog_end = insn_cnt;
       else
               subprog_end = env->subprog_info[cur_subprog + 1].start;

The reason is there is no marker in subprog_info array to tell the end of
it.

We could resolve this issue by introducing a faked "ending" subprog.
The special "ending" subprog is with "insn_cnt" as start offset, so it is
serving as the end mark whenever we iterate over all subprogs.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 kernel/bpf/verifier.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9764b9b..65a6e2e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -766,7 +766,7 @@ static int add_subprog(struct bpf_verifier_env *env, int off)
 	ret = find_subprog(env, off);
 	if (ret >= 0)
 		return 0;
-	if (env->subprog_cnt > BPF_MAX_SUBPROGS) {
+	if (env->subprog_cnt >= BPF_MAX_SUBPROGS) {
 		verbose(env, "too many subprograms\n");
 		return -E2BIG;
 	}
@@ -807,16 +807,18 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			return ret;
 	}
 
+	/* Add a fake 'exit' subprog which could simplify subprog iteration
+	 * logic. 'subprog_cnt' should not be increased.
+	 */
+	subprog[env->subprog_cnt].start = insn_cnt;
+
 	if (env->log.level > 1)
 		for (i = 0; i < env->subprog_cnt; i++)
 			verbose(env, "func#%d @%d\n", i, subprog[i].start);
 
 	/* now check that all jumps are within the same subprog */
-	subprog_start = 0;
-	if (env->subprog_cnt == cur_subprog + 1)
-		subprog_end = insn_cnt;
-	else
-		subprog_end = subprog[cur_subprog + 1].start;
+	subprog_start = subprog[cur_subprog].start;
+	subprog_end = subprog[cur_subprog + 1].start;
 	for (i = 0; i < insn_cnt; i++) {
 		u8 code = insn[i].code;
 
@@ -840,11 +842,9 @@ static int check_subprogs(struct bpf_verifier_env *env)
 				verbose(env, "last insn is not an exit or jmp\n");
 				return -EINVAL;
 			}
-			cur_subprog++;
 			subprog_start = subprog_end;
-			if (env->subprog_cnt == cur_subprog + 1)
-				subprog_end = insn_cnt;
-			else
+			cur_subprog++;
+			if (cur_subprog < env->subprog_cnt)
 				subprog_end = subprog[cur_subprog + 1].start;
 		}
 	}
@@ -1499,7 +1499,6 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 	int depth = 0, frame = 0, idx = 0, i = 0, subprog_end;
 	struct bpf_subprog_info *subprog = env->subprog_info;
 	struct bpf_insn *insn = env->prog->insnsi;
-	int insn_cnt = env->prog->len;
 	int ret_insn[MAX_CALL_FRAMES];
 	int ret_prog[MAX_CALL_FRAMES];
 
@@ -1514,10 +1513,7 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 		return -EACCES;
 	}
 continue_func:
-	if (env->subprog_cnt == idx + 1)
-		subprog_end = insn_cnt;
-	else
-		subprog_end = subprog[idx + 1].start;
+	subprog_end = subprog[idx + 1].start;
 	for (; i < subprog_end; i++) {
 		if (insn[i].code != (BPF_JMP | BPF_CALL))
 			continue;
@@ -5070,7 +5066,8 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len
 
 	if (len == 1)
 		return;
-	for (i = 0; i < env->subprog_cnt; i++) {
+	/* NOTE: fake 'exit' subprog should be updated as well. */
+	for (i = 0; i <= env->subprog_cnt; i++) {
 		if (env->subprog_info[i].start < off)
 			continue;
 		env->subprog_info[i].start += len - 1;
@@ -5268,10 +5265,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 
 	for (i = 0; i < env->subprog_cnt; i++) {
 		subprog_start = subprog_end;
-		if (env->subprog_cnt == i + 1)
-			subprog_end = prog->len;
-		else
-			subprog_end = env->subprog_info[i + 1].start;
+		subprog_end = env->subprog_info[i + 1].start;
 
 		len = subprog_end - subprog_start;
 		func[i] = bpf_prog_alloc(bpf_prog_size(len), GFP_USER);
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 bpf-next 2/3] bpf: centre subprog information fields
From: Jiong Wang @ 2018-05-02 20:17 UTC (permalink / raw)
  To: alexei.starovoitov, borkmann
  Cc: john.fastabend, ecree, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525292239-1309-1-git-send-email-jiong.wang@netronome.com>

It is better to centre all subprog information fields into one structure.
This structure could later serve as function node in call graph.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |  9 ++++---
 kernel/bpf/verifier.c        | 62 +++++++++++++++++++++++---------------------
 2 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index f655b92..8f70dc1 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -173,6 +173,11 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 
 #define BPF_MAX_SUBPROGS 256
 
+struct bpf_subprog_info {
+	u32 start; /* insn idx of function entry point */
+	u16 stack_depth; /* max. stack depth used by this function */
+};
+
 /* single container for all structs
  * one verifier_env per bpf_check() call
  */
@@ -191,9 +196,7 @@ struct bpf_verifier_env {
 	bool seen_direct_write;
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	struct bpf_verifier_log log;
-	u32 subprog_starts[BPF_MAX_SUBPROGS + 1];
-	/* computes the stack depth of each bpf function */
-	u16 subprog_stack_depth[BPF_MAX_SUBPROGS + 1];
+	struct bpf_subprog_info subprog_info[BPF_MAX_SUBPROGS + 1];
 	u32 subprog_cnt;
 };
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 16ec977..9764b9b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -738,18 +738,19 @@ enum reg_arg_type {
 
 static int cmp_subprogs(const void *a, const void *b)
 {
-	return *(int *)a - *(int *)b;
+	return ((struct bpf_subprog_info *)a)->start -
+	       ((struct bpf_subprog_info *)b)->start;
 }
 
 static int find_subprog(struct bpf_verifier_env *env, int off)
 {
-	u32 *p;
+	struct bpf_subprog_info *p;
 
-	p = bsearch(&off, env->subprog_starts, env->subprog_cnt,
-		    sizeof(env->subprog_starts[0]), cmp_subprogs);
+	p = bsearch(&off, env->subprog_info, env->subprog_cnt,
+		    sizeof(env->subprog_info[0]), cmp_subprogs);
 	if (!p)
 		return -ENOENT;
-	return p - env->subprog_starts;
+	return p - env->subprog_info;
 
 }
 
@@ -769,15 +770,16 @@ static int add_subprog(struct bpf_verifier_env *env, int off)
 		verbose(env, "too many subprograms\n");
 		return -E2BIG;
 	}
-	env->subprog_starts[env->subprog_cnt++] = off;
-	sort(env->subprog_starts, env->subprog_cnt,
-	     sizeof(env->subprog_starts[0]), cmp_subprogs, NULL);
+	env->subprog_info[env->subprog_cnt++].start = off;
+	sort(env->subprog_info, env->subprog_cnt,
+	     sizeof(env->subprog_info[0]), cmp_subprogs, NULL);
 	return 0;
 }
 
 static int check_subprogs(struct bpf_verifier_env *env)
 {
 	int i, ret, subprog_start, subprog_end, off, cur_subprog = 0;
+	struct bpf_subprog_info *subprog = env->subprog_info;
 	struct bpf_insn *insn = env->prog->insnsi;
 	int insn_cnt = env->prog->len;
 
@@ -807,14 +809,14 @@ static int check_subprogs(struct bpf_verifier_env *env)
 
 	if (env->log.level > 1)
 		for (i = 0; i < env->subprog_cnt; i++)
-			verbose(env, "func#%d @%d\n", i, env->subprog_starts[i]);
+			verbose(env, "func#%d @%d\n", i, subprog[i].start);
 
 	/* now check that all jumps are within the same subprog */
 	subprog_start = 0;
 	if (env->subprog_cnt == cur_subprog + 1)
 		subprog_end = insn_cnt;
 	else
-		subprog_end = env->subprog_starts[cur_subprog + 1];
+		subprog_end = subprog[cur_subprog + 1].start;
 	for (i = 0; i < insn_cnt; i++) {
 		u8 code = insn[i].code;
 
@@ -843,8 +845,7 @@ static int check_subprogs(struct bpf_verifier_env *env)
 			if (env->subprog_cnt == cur_subprog + 1)
 				subprog_end = insn_cnt;
 			else
-				subprog_end =
-					env->subprog_starts[cur_subprog + 1];
+				subprog_end = subprog[cur_subprog + 1].start;
 		}
 	}
 	return 0;
@@ -1477,13 +1478,13 @@ static int update_stack_depth(struct bpf_verifier_env *env,
 			      const struct bpf_func_state *func,
 			      int off)
 {
-	u16 stack = env->subprog_stack_depth[func->subprogno];
+	u16 stack = env->subprog_info[func->subprogno].stack_depth;
 
 	if (stack >= -off)
 		return 0;
 
 	/* update known max for given subprogram */
-	env->subprog_stack_depth[func->subprogno] = -off;
+	env->subprog_info[func->subprogno].stack_depth = -off;
 	return 0;
 }
 
@@ -1495,7 +1496,8 @@ static int update_stack_depth(struct bpf_verifier_env *env,
  */
 static int check_max_stack_depth(struct bpf_verifier_env *env)
 {
-	int depth = 0, frame = 0, subprog = 0, i = 0, subprog_end;
+	int depth = 0, frame = 0, idx = 0, i = 0, subprog_end;
+	struct bpf_subprog_info *subprog = env->subprog_info;
 	struct bpf_insn *insn = env->prog->insnsi;
 	int insn_cnt = env->prog->len;
 	int ret_insn[MAX_CALL_FRAMES];
@@ -1505,17 +1507,17 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 	/* round up to 32-bytes, since this is granularity
 	 * of interpreter stack size
 	 */
-	depth += round_up(max_t(u32, env->subprog_stack_depth[subprog], 1), 32);
+	depth += round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
 	if (depth > MAX_BPF_STACK) {
 		verbose(env, "combined stack size of %d calls is %d. Too large\n",
 			frame + 1, depth);
 		return -EACCES;
 	}
 continue_func:
-	if (env->subprog_cnt == subprog + 1)
+	if (env->subprog_cnt == idx + 1)
 		subprog_end = insn_cnt;
 	else
-		subprog_end = env->subprog_starts[subprog + 1];
+		subprog_end = subprog[idx + 1].start;
 	for (; i < subprog_end; i++) {
 		if (insn[i].code != (BPF_JMP | BPF_CALL))
 			continue;
@@ -1523,12 +1525,12 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 			continue;
 		/* remember insn and function to return to */
 		ret_insn[frame] = i + 1;
-		ret_prog[frame] = subprog;
+		ret_prog[frame] = idx;
 
 		/* find the callee */
 		i = i + insn[i].imm + 1;
-		subprog = find_subprog(env, i);
-		if (subprog < 0) {
+		idx = find_subprog(env, i);
+		if (idx < 0) {
 			WARN_ONCE(1, "verifier bug. No program starts at insn %d\n",
 				  i);
 			return -EFAULT;
@@ -1545,10 +1547,10 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 	 */
 	if (frame == 0)
 		return 0;
-	depth -= round_up(max_t(u32, env->subprog_stack_depth[subprog], 1), 32);
+	depth -= round_up(max_t(u32, subprog[idx].stack_depth, 1), 32);
 	frame--;
 	i = ret_insn[frame];
-	subprog = ret_prog[frame];
+	idx = ret_prog[frame];
 	goto continue_func;
 }
 
@@ -1564,7 +1566,7 @@ static int get_callee_stack_depth(struct bpf_verifier_env *env,
 			  start);
 		return -EFAULT;
 	}
-	return env->subprog_stack_depth[subprog];
+	return env->subprog_info[subprog].stack_depth;
 }
 #endif
 
@@ -4855,14 +4857,14 @@ static int do_check(struct bpf_verifier_env *env)
 	verbose(env, "processed %d insns (limit %d), stack depth ",
 		insn_processed, BPF_COMPLEXITY_LIMIT_INSNS);
 	for (i = 0; i < env->subprog_cnt; i++) {
-		u32 depth = env->subprog_stack_depth[i];
+		u32 depth = env->subprog_info[i].stack_depth;
 
 		verbose(env, "%d", depth);
 		if (i + 1 < env->subprog_cnt)
 			verbose(env, "+");
 	}
 	verbose(env, "\n");
-	env->prog->aux->stack_depth = env->subprog_stack_depth[0];
+	env->prog->aux->stack_depth = env->subprog_info[0].stack_depth;
 	return 0;
 }
 
@@ -5069,9 +5071,9 @@ static void adjust_subprog_starts(struct bpf_verifier_env *env, u32 off, u32 len
 	if (len == 1)
 		return;
 	for (i = 0; i < env->subprog_cnt; i++) {
-		if (env->subprog_starts[i] < off)
+		if (env->subprog_info[i].start < off)
 			continue;
-		env->subprog_starts[i] += len - 1;
+		env->subprog_info[i].start += len - 1;
 	}
 }
 
@@ -5269,7 +5271,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		if (env->subprog_cnt == i + 1)
 			subprog_end = prog->len;
 		else
-			subprog_end = env->subprog_starts[i + 1];
+			subprog_end = env->subprog_info[i + 1].start;
 
 		len = subprog_end - subprog_start;
 		func[i] = bpf_prog_alloc(bpf_prog_size(len), GFP_USER);
@@ -5286,7 +5288,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		 * Long term would need debug info to populate names
 		 */
 		func[i]->aux->name[0] = 'F';
-		func[i]->aux->stack_depth = env->subprog_stack_depth[i];
+		func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
 		func[i]->jit_requested = 1;
 		func[i] = bpf_int_jit_compile(func[i]);
 		if (!func[i]->jited) {
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 bpf-next 1/3] bpf: unify main prog and subprog
From: Jiong Wang @ 2018-05-02 20:17 UTC (permalink / raw)
  To: alexei.starovoitov, borkmann
  Cc: john.fastabend, ecree, netdev, oss-drivers, Jiong Wang
In-Reply-To: <1525292239-1309-1-git-send-email-jiong.wang@netronome.com>

Currently, verifier treat main prog and subprog differently. All subprogs
detected are kept in env->subprog_starts while main prog is not kept there.
Instead, main prog is implicitly defined as the prog start at 0.

There is actually no difference between main prog and subprog, it is better
to unify them, and register all progs detected into env->subprog_starts.

This could also help simplifying some code logic.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
---
 include/linux/bpf_verifier.h |  2 +-
 kernel/bpf/verifier.c        | 57 ++++++++++++++++++++++++--------------------
 2 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 7e61c39..f655b92 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -191,7 +191,7 @@ struct bpf_verifier_env {
 	bool seen_direct_write;
 	struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 	struct bpf_verifier_log log;
-	u32 subprog_starts[BPF_MAX_SUBPROGS];
+	u32 subprog_starts[BPF_MAX_SUBPROGS + 1];
 	/* computes the stack depth of each bpf function */
 	u16 subprog_stack_depth[BPF_MAX_SUBPROGS + 1];
 	u32 subprog_cnt;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index eb1a596..16ec977 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -765,7 +765,7 @@ static int add_subprog(struct bpf_verifier_env *env, int off)
 	ret = find_subprog(env, off);
 	if (ret >= 0)
 		return 0;
-	if (env->subprog_cnt >= BPF_MAX_SUBPROGS) {
+	if (env->subprog_cnt > BPF_MAX_SUBPROGS) {
 		verbose(env, "too many subprograms\n");
 		return -E2BIG;
 	}
@@ -781,6 +781,11 @@ static int check_subprogs(struct bpf_verifier_env *env)
 	struct bpf_insn *insn = env->prog->insnsi;
 	int insn_cnt = env->prog->len;
 
+	/* Add entry function. */
+	ret = add_subprog(env, 0);
+	if (ret < 0)
+		return ret;
+
 	/* determine subprog starts. The end is one before the next starts */
 	for (i = 0; i < insn_cnt; i++) {
 		if (insn[i].code != (BPF_JMP | BPF_CALL))
@@ -806,10 +811,10 @@ static int check_subprogs(struct bpf_verifier_env *env)
 
 	/* now check that all jumps are within the same subprog */
 	subprog_start = 0;
-	if (env->subprog_cnt == cur_subprog)
+	if (env->subprog_cnt == cur_subprog + 1)
 		subprog_end = insn_cnt;
 	else
-		subprog_end = env->subprog_starts[cur_subprog++];
+		subprog_end = env->subprog_starts[cur_subprog + 1];
 	for (i = 0; i < insn_cnt; i++) {
 		u8 code = insn[i].code;
 
@@ -833,11 +838,13 @@ static int check_subprogs(struct bpf_verifier_env *env)
 				verbose(env, "last insn is not an exit or jmp\n");
 				return -EINVAL;
 			}
+			cur_subprog++;
 			subprog_start = subprog_end;
-			if (env->subprog_cnt == cur_subprog)
+			if (env->subprog_cnt == cur_subprog + 1)
 				subprog_end = insn_cnt;
 			else
-				subprog_end = env->subprog_starts[cur_subprog++];
+				subprog_end =
+					env->subprog_starts[cur_subprog + 1];
 		}
 	}
 	return 0;
@@ -1505,10 +1512,10 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 		return -EACCES;
 	}
 continue_func:
-	if (env->subprog_cnt == subprog)
+	if (env->subprog_cnt == subprog + 1)
 		subprog_end = insn_cnt;
 	else
-		subprog_end = env->subprog_starts[subprog];
+		subprog_end = env->subprog_starts[subprog + 1];
 	for (; i < subprog_end; i++) {
 		if (insn[i].code != (BPF_JMP | BPF_CALL))
 			continue;
@@ -1526,7 +1533,6 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 				  i);
 			return -EFAULT;
 		}
-		subprog++;
 		frame++;
 		if (frame >= MAX_CALL_FRAMES) {
 			WARN_ONCE(1, "verifier bug. Call stack is too deep\n");
@@ -1558,7 +1564,6 @@ static int get_callee_stack_depth(struct bpf_verifier_env *env,
 			  start);
 		return -EFAULT;
 	}
-	subprog++;
 	return env->subprog_stack_depth[subprog];
 }
 #endif
@@ -2087,7 +2092,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_FUNC_tail_call:
 		if (map->map_type != BPF_MAP_TYPE_PROG_ARRAY)
 			goto error;
-		if (env->subprog_cnt) {
+		if (env->subprog_cnt > 1) {
 			verbose(env, "tail_calls are not allowed in programs with bpf-to-bpf calls\n");
 			return -EINVAL;
 		}
@@ -2259,7 +2264,7 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			/* remember the callsite, it will be used by bpf_exit */
 			*insn_idx /* callsite */,
 			state->curframe + 1 /* frameno within this callchain */,
-			subprog + 1 /* subprog number within this prog */);
+			subprog /* subprog number within this prog */);
 
 	/* copy r1 - r5 args that callee can access */
 	for (i = BPF_REG_1; i <= BPF_REG_5; i++)
@@ -3818,7 +3823,7 @@ static int check_ld_abs(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return -EINVAL;
 	}
 
-	if (env->subprog_cnt) {
+	if (env->subprog_cnt > 1) {
 		/* when program has LD_ABS insn JITs and interpreter assume
 		 * that r1 == ctx == skb which is not the case for callees
 		 * that can have arbitrary arguments. It's problematic
@@ -4849,11 +4854,11 @@ static int do_check(struct bpf_verifier_env *env)
 
 	verbose(env, "processed %d insns (limit %d), stack depth ",
 		insn_processed, BPF_COMPLEXITY_LIMIT_INSNS);
-	for (i = 0; i < env->subprog_cnt + 1; i++) {
+	for (i = 0; i < env->subprog_cnt; i++) {
 		u32 depth = env->subprog_stack_depth[i];
 
 		verbose(env, "%d", depth);
-		if (i + 1 < env->subprog_cnt + 1)
+		if (i + 1 < env->subprog_cnt)
 			verbose(env, "+");
 	}
 	verbose(env, "\n");
@@ -5230,7 +5235,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	void *old_bpf_func;
 	int err = -ENOMEM;
 
-	if (env->subprog_cnt == 0)
+	if (env->subprog_cnt <= 1)
 		return 0;
 
 	for (i = 0, insn = prog->insnsi; i < prog->len; i++, insn++) {
@@ -5246,7 +5251,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		/* temporarily remember subprog id inside insn instead of
 		 * aux_data, since next loop will split up all insns into funcs
 		 */
-		insn->off = subprog + 1;
+		insn->off = subprog;
 		/* remember original imm in case JIT fails and fallback
 		 * to interpreter will be needed
 		 */
@@ -5255,16 +5260,16 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 		insn->imm = 1;
 	}
 
-	func = kzalloc(sizeof(prog) * (env->subprog_cnt + 1), GFP_KERNEL);
+	func = kzalloc(sizeof(prog) * env->subprog_cnt, GFP_KERNEL);
 	if (!func)
 		return -ENOMEM;
 
-	for (i = 0; i <= env->subprog_cnt; i++) {
+	for (i = 0; i < env->subprog_cnt; i++) {
 		subprog_start = subprog_end;
-		if (env->subprog_cnt == i)
+		if (env->subprog_cnt == i + 1)
 			subprog_end = prog->len;
 		else
-			subprog_end = env->subprog_starts[i];
+			subprog_end = env->subprog_starts[i + 1];
 
 		len = subprog_end - subprog_start;
 		func[i] = bpf_prog_alloc(bpf_prog_size(len), GFP_USER);
@@ -5294,7 +5299,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	 * now populate all bpf_calls with correct addresses and
 	 * run last pass of JIT
 	 */
-	for (i = 0; i <= env->subprog_cnt; i++) {
+	for (i = 0; i < env->subprog_cnt; i++) {
 		insn = func[i]->insnsi;
 		for (j = 0; j < func[i]->len; j++, insn++) {
 			if (insn->code != (BPF_JMP | BPF_CALL) ||
@@ -5307,7 +5312,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 				__bpf_call_base;
 		}
 	}
-	for (i = 0; i <= env->subprog_cnt; i++) {
+	for (i = 0; i < env->subprog_cnt; i++) {
 		old_bpf_func = func[i]->bpf_func;
 		tmp = bpf_int_jit_compile(func[i]);
 		if (tmp != func[i] || func[i]->bpf_func != old_bpf_func) {
@@ -5321,7 +5326,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	/* finally lock prog and jit images for all functions and
 	 * populate kallsysm
 	 */
-	for (i = 0; i <= env->subprog_cnt; i++) {
+	for (i = 0; i < env->subprog_cnt; i++) {
 		bpf_prog_lock_ro(func[i]);
 		bpf_prog_kallsyms_add(func[i]);
 	}
@@ -5338,7 +5343,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 			continue;
 		insn->off = env->insn_aux_data[i].call_imm;
 		subprog = find_subprog(env, i + insn->off + 1);
-		addr  = (unsigned long)func[subprog + 1]->bpf_func;
+		addr  = (unsigned long)func[subprog]->bpf_func;
 		addr &= PAGE_MASK;
 		insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
 			    addr - __bpf_call_base;
@@ -5347,10 +5352,10 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 	prog->jited = 1;
 	prog->bpf_func = func[0]->bpf_func;
 	prog->aux->func = func;
-	prog->aux->func_cnt = env->subprog_cnt + 1;
+	prog->aux->func_cnt = env->subprog_cnt;
 	return 0;
 out_free:
-	for (i = 0; i <= env->subprog_cnt; i++)
+	for (i = 0; i < env->subprog_cnt; i++)
 		if (func[i])
 			bpf_jit_free(func[i]);
 	kfree(func);
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 bpf-next 0/3] bpf: cleanups on managing subprog information
From: Jiong Wang @ 2018-05-02 20:17 UTC (permalink / raw)
  To: alexei.starovoitov, borkmann
  Cc: john.fastabend, ecree, netdev, oss-drivers, Jiong Wang

This patch set clean up some code logic related with managing subprog
information.

Part of the set are inspried by Edwin's code in his RFC:

  "bpf/verifier: subprog/func_call simplifications"

but with clearer separation so it could be easier to review.

  - Path 1 unifies main prog and subprogs. All of them are registered in
    env->subprog_starts.

  - After patch 1, it is clear that subprog_starts and subprog_stack_depth
    could be merged as both of them now have main and subprog unified.
    Patch 2 therefore does the merge, all subprog information are centred
    at bpf_subprog_info.

  - Patch 3 goes further to introduce a new fake "exit" subprog which
    serves as an ending marker to the subprog list. We could then turn the
    following code snippets across verifier:

       if (env->subprog_cnt == cur_subprog + 1)
               subprog_end = insn_cnt;
       else
               subprog_end = env->subprog_info[cur_subprog + 1].start;

    into:
       subprog_end = env->subprog_info[cur_subprog + 1].start;

There is no functional change by this patch set.
No bpf selftest (both non-jit and jit) regression found after this set.

v2:
  - fixed adjust_subprog_starts to also update fake "exit" subprog start.
  - for John's suggestion on renaming subprog to prog, I could work on
    a follow-up patch if it is recognized as worth the change.

Jiong Wang (3):
  bpf: unify main prog and subprog
  bpf: centre subprog information fields
  bpf: add faked "ending" subprog

 include/linux/bpf_verifier.h |   9 ++--
 kernel/bpf/verifier.c        | 121 ++++++++++++++++++++++---------------------
 2 files changed, 67 insertions(+), 63 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH 2/2] sh_eth: WARN_ON() access to unimplemented TSU register
From: Sergei Shtylyov @ 2018-05-02 19:55 UTC (permalink / raw)
  To: netdev; +Cc: linux-renesas-soc, linux-sh
In-Reply-To: <4a17ab65-4d9c-897e-c340-7d86e0296f2f@cogentembedded.com>

Commit 3365711df024 ("sh_eth: WARN on access to a register not implemented
in a particular chip") added  WARN_ON() to sh_eth_{read|write}() but not
to sh_eth_tsu_{read|write}(). Now that we've routed almost all TSU register
accesses  (except TSU_ADR{H|L}<n> -- which are special) thru the latter
pair of accessors, it makes sense to check for the unimplemented TSU
registers as well...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
 drivers/net/ethernet/renesas/sh_eth.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -442,12 +442,22 @@ static void sh_eth_modify(struct net_dev
 static void sh_eth_tsu_write(struct sh_eth_private *mdp, u32 data,
 			     int enum_index)
 {
-	iowrite32(data, mdp->tsu_addr + mdp->reg_offset[enum_index]);
+	u16 offset = mdp->reg_offset[enum_index];
+
+	if (WARN_ON(offset == SH_ETH_OFFSET_INVALID))
+		return;
+
+	iowrite32(data, mdp->tsu_addr + offset);
 }
 
 static u32 sh_eth_tsu_read(struct sh_eth_private *mdp, int enum_index)
 {
-	return ioread32(mdp->tsu_addr + mdp->reg_offset[enum_index]);
+	u16 offset = mdp->reg_offset[enum_index];
+
+	if (WARN_ON(offset == SH_ETH_OFFSET_INVALID))
+		return ~0U;
+
+	return ioread32(mdp->tsu_addr + offset);
 }
 
 static void sh_eth_select_mii(struct net_device *ndev)

^ permalink raw reply

* [PATCH 1/2] sh_eth: use TSU register accessors for TSU_POST<n>
From: Sergei Shtylyov @ 2018-05-02 19:54 UTC (permalink / raw)
  To: netdev; +Cc: linux-renesas-soc, linux-sh
In-Reply-To: <4a17ab65-4d9c-897e-c340-7d86e0296f2f@cogentembedded.com>

There's no particularly good reason TSU_POST<n> registers get accessed
circumventing sh_eth_tsu_{read|write}() -- start using those, removing
(badly named) sh_eth_tsu_get_post_reg_offset(),  while at it...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

---
 drivers/net/ethernet/renesas/sh_eth.c |   20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -2610,12 +2610,6 @@ static int sh_eth_change_mtu(struct net_
 }
 
 /* For TSU_POSTn. Please refer to the manual about this (strange) bitfields */
-static void *sh_eth_tsu_get_post_reg_offset(struct sh_eth_private *mdp,
-					    int entry)
-{
-	return sh_eth_tsu_get_offset(mdp, TSU_POST1) + (entry / 8 * 4);
-}
-
 static u32 sh_eth_tsu_get_post_mask(int entry)
 {
 	return 0x0f << (28 - ((entry % 8) * 4));
@@ -2630,27 +2624,25 @@ static void sh_eth_tsu_enable_cam_entry_
 					     int entry)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
+	int reg = TSU_POST1 + entry / 8;
 	u32 tmp;
-	void *reg_offset;
 
-	reg_offset = sh_eth_tsu_get_post_reg_offset(mdp, entry);
-	tmp = ioread32(reg_offset);
-	iowrite32(tmp | sh_eth_tsu_get_post_bit(mdp, entry), reg_offset);
+	tmp = sh_eth_tsu_read(mdp, reg);
+	sh_eth_tsu_write(mdp, tmp | sh_eth_tsu_get_post_bit(mdp, entry), reg);
 }
 
 static bool sh_eth_tsu_disable_cam_entry_post(struct net_device *ndev,
 					      int entry)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
+	int reg = TSU_POST1 + entry / 8;
 	u32 post_mask, ref_mask, tmp;
-	void *reg_offset;
 
-	reg_offset = sh_eth_tsu_get_post_reg_offset(mdp, entry);
 	post_mask = sh_eth_tsu_get_post_mask(entry);
 	ref_mask = sh_eth_tsu_get_post_bit(mdp, entry) & ~post_mask;
 
-	tmp = ioread32(reg_offset);
-	iowrite32(tmp & ~post_mask, reg_offset);
+	tmp = sh_eth_tsu_read(mdp, reg);
+	sh_eth_tsu_write(mdp, tmp & ~post_mask, reg);
 
 	/* If other port enables, the function returns "true" */
 	return tmp & ref_mask;

^ permalink raw reply

* [PATCH 0/2] sh_eth: complain on access to unimplemented TSU registers
From: Sergei Shtylyov @ 2018-05-02 19:53 UTC (permalink / raw)
  To: netdev; +Cc: linux-renesas-soc, linux-sh

Hello!

Here's a set of 2 patches against DaveM's 'net-next.git' repo. The 1st patch
routes TSU_POST<n> register accesses thru sh_eth_tsu_{read|write}() and the 2nd
added WARN_ON() unimplemented register to those functions. I'm going to deal with
TSU_ADR{H|L}<n> registers in a later series...

[1/2] sh_eth: use TSU register accessors for TSU_POST<n>
[2/2] sh_eth: WARN_ON() access to unimplemented TSU register

MBR, Sergei

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox