Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 00/13] mlxsw: Add resource scale tests
From: David Miller @ 2018-06-30 13:06 UTC (permalink / raw)
  To: petrm; +Cc: netdev, linux-kselftest, jiri, idosch, shuah
In-Reply-To: <cover.1530319109.git.petrm@mellanox.com>

From: Petr Machata <petrm@mellanox.com>
Date: Sat, 30 Jun 2018 02:44:23 +0200

> There are a number of tests that check features of the Linux networking
> stack. By running them on suitable interfaces, one can exercise the
> mlxsw offloading code. However none of these tests attempts to push
> mlxsw to the limits supported by the ASIC.
> 
> As an additional wrinkle, the "limits supported by the ASIC" themselves
> may not be a set of fixed numbers, but rather depend on a profile that
> determines how the ASIC resources are allocated for different purposes.
> 
> This patchset introduces several tests that verify capability of mlxsw
> to offload amounts of routes, flower rules, and mirroring sessions that
> match predicted ASIC capacity, at different configuration profiles.
> Additionally they verify that amounts exceeding the predicted capacity
> can *not* be offloaded.
 ...

Good stuff, series applied, thanks!

^ permalink raw reply

* Re: [PATCH net] net: fib_rules: bring back rule_exists to match rule during add
From: David Miller @ 2018-06-30 13:11 UTC (permalink / raw)
  To: roopa; +Cc: netdev, dsa
In-Reply-To: <1530307935-6652-1-git-send-email-roopa@cumulusnetworks.com>

From: Roopa Prabhu <roopa@cumulusnetworks.com>
Date: Fri, 29 Jun 2018 14:32:15 -0700

> From: Roopa Prabhu <roopa@cumulusnetworks.com>
> 
> After commit f9d4b0c1e969 ("fib_rules: move common handling of newrule
> delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find
> for existing rule lookup in both the add and del paths. While this
> is good for the delete path, it solves a few problems but opens up
> a few invalid key matches in the add path.
> 
> $ip -4 rule add table main tos 10 fwmark 1
> $ip -4 rule add table main tos 10
> RTNETLINK answers: File exists
> 
> The problem here is rule_find does not check if the key masks in
> the new and old rule are the same and hence ends up matching a more
> secific rule. Rule key masks cannot be easily compared today without
> an elaborate if-else block. Its best to introduce key masks for easier
> and accurate rule comparison in the future. Until then, due to fear of
> regressions this patch re-introduces older loose rule_exists during add.
> Also fixes both rule_exists and rule_find to cover missing attributes.
> 
> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>

Applied, thanks for resolving all of these issues.

^ permalink raw reply

* [bpf PATCH v5 0/4] BPF fixes for sockhash
From: John Fastabend @ 2018-06-30 13:17 UTC (permalink / raw)
  To: ast, daniel, kafai; +Cc: netdev

This addresses two syzbot issues that lead to identifying (by Eric and
Wei) a class of bugs where we don't correctly check for IPv4/v6
sockets and their associated state. The second issue was a locking
omission in sockhash.

The first patch addresses IPv6 socks and fixing an error where
sockhash would overwrite the prot pointer with IPv4 prot. To fix
this build similar solution to TLS ULP. Although we continue to
allow socks in all states not just ESTABLISH in this patch set
because as Martin points out there should be no issue with this
on the sockmap ULP because we don't use the ctx in this code. Once
multiple ULPs coexist we may need to revisit this. However we
can do this in *next trees.

The other issue syzbot found that the tcp_close() handler missed
locking the hash bucket lock which could result in corrupting the
sockhash bucket list if delete and close ran at the same time. 
And also the smap_list_remove() routine was not working correctly
at all. This was not caught in my testing because in general my
tests (to date at least lets add some more robust selftest in
bpf-next) do things in the "expected" order, create map, add socks,
delete socks, then tear down maps. The tests we have that do the
ops out of this order where only working on single maps not multi-
maps so we never saw the issue. Thanks syzbot. The fix is to
restructure the tcp_close() lock handling. And fix the obvious
bug in smap_list_remove().

Finally, during review I noticed the release handler was omitted
from the upstream code (patch 4) due to an incorrect merge conflict
fix when I ported the code to latest bpf-next before submitting.
This would leave references to the map around if the user never
closes the map.

v3: rework patches, dropping ESTABLISH check and adding rcu
    annotation along with the smap_list_remove fix

v4: missed one more case where maps was being accessed without
    the sk_callback_lock, spoted by Martin as well.

v5: changed to use a specific lock for maps and reduced callback
    lock so that it is only used to gaurd sk callbacks. I think
    this makes the logic a bit cleaner and avoids confusion
    ovoer what each lock is doing.

Also big thanks to Martin for thorough review he caught at least
one case where I missed a rcu_call().

---

John Fastabend (4):
      bpf: sockmap, fix crash when ipv6 sock is added
      bpf: sockmap, fix smap_list_map_remove when psock is in many maps
      bpf: sockhash fix omitted bucket lock in sock_close
      bpf: sockhash, add release routine

 kernel/bpf/sockmap.c |  236 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 166 insertions(+), 70 deletions(-)

^ permalink raw reply

* [bpf PATCH v5 1/4] bpf: sockmap, fix crash when ipv6 sock is added
From: John Fastabend @ 2018-06-30 13:17 UTC (permalink / raw)
  To: ast, daniel, kafai; +Cc: netdev
In-Reply-To: <20180630131506.17344.7833.stgit@john-Precision-Tower-5810>

This fixes a crash where we assign tcp_prot to IPv6 sockets instead
of tcpv6_prot.

Previously we overwrote the sk->prot field with tcp_prot even in the
AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
are used.

Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
crashing case here.

Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
---
 kernel/bpf/sockmap.c |   58 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 52a91d8..d7fd17a 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -140,6 +140,7 @@ static int bpf_tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 static int bpf_tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 static int bpf_tcp_sendpage(struct sock *sk, struct page *page,
 			    int offset, size_t size, int flags);
+static void bpf_tcp_close(struct sock *sk, long timeout);
 
 static inline struct smap_psock *smap_psock_sk(const struct sock *sk)
 {
@@ -161,7 +162,42 @@ static bool bpf_tcp_stream_read(const struct sock *sk)
 	return !empty;
 }
 
-static struct proto tcp_bpf_proto;
+enum {
+	SOCKMAP_IPV4,
+	SOCKMAP_IPV6,
+	SOCKMAP_NUM_PROTS,
+};
+
+enum {
+	SOCKMAP_BASE,
+	SOCKMAP_TX,
+	SOCKMAP_NUM_CONFIGS,
+};
+
+static struct proto *saved_tcpv6_prot __read_mostly;
+static DEFINE_SPINLOCK(tcpv6_prot_lock);
+static struct proto bpf_tcp_prots[SOCKMAP_NUM_PROTS][SOCKMAP_NUM_CONFIGS];
+static void build_protos(struct proto prot[SOCKMAP_NUM_CONFIGS],
+			 struct proto *base)
+{
+	prot[SOCKMAP_BASE]			= *base;
+	prot[SOCKMAP_BASE].close		= bpf_tcp_close;
+	prot[SOCKMAP_BASE].recvmsg		= bpf_tcp_recvmsg;
+	prot[SOCKMAP_BASE].stream_memory_read	= bpf_tcp_stream_read;
+
+	prot[SOCKMAP_TX]			= prot[SOCKMAP_BASE];
+	prot[SOCKMAP_TX].sendmsg		= bpf_tcp_sendmsg;
+	prot[SOCKMAP_TX].sendpage		= bpf_tcp_sendpage;
+}
+
+static void update_sk_prot(struct sock *sk, struct smap_psock *psock)
+{
+	int family = sk->sk_family == AF_INET6 ? SOCKMAP_IPV6 : SOCKMAP_IPV4;
+	int conf = psock->bpf_tx_msg ? SOCKMAP_TX : SOCKMAP_BASE;
+
+	sk->sk_prot = &bpf_tcp_prots[family][conf];
+}
+
 static int bpf_tcp_init(struct sock *sk)
 {
 	struct smap_psock *psock;
@@ -181,14 +217,17 @@ static int bpf_tcp_init(struct sock *sk)
 	psock->save_close = sk->sk_prot->close;
 	psock->sk_proto = sk->sk_prot;
 
-	if (psock->bpf_tx_msg) {
-		tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
-		tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
-		tcp_bpf_proto.recvmsg = bpf_tcp_recvmsg;
-		tcp_bpf_proto.stream_memory_read = bpf_tcp_stream_read;
+	/* Build IPv6 sockmap whenever the address of tcpv6_prot changes */
+	if (sk->sk_family == AF_INET6 &&
+	    unlikely(sk->sk_prot != smp_load_acquire(&saved_tcpv6_prot))) {
+		spin_lock_bh(&tcpv6_prot_lock);
+		if (likely(sk->sk_prot != saved_tcpv6_prot)) {
+			build_protos(bpf_tcp_prots[SOCKMAP_IPV6], sk->sk_prot);
+			smp_store_release(&saved_tcpv6_prot, sk->sk_prot);
+		}
+		spin_unlock_bh(&tcpv6_prot_lock);
 	}
-
-	sk->sk_prot = &tcp_bpf_proto;
+	update_sk_prot(sk, psock);
 	rcu_read_unlock();
 	return 0;
 }
@@ -1111,8 +1150,7 @@ static void bpf_tcp_msg_add(struct smap_psock *psock,
 
 static int bpf_tcp_ulp_register(void)
 {
-	tcp_bpf_proto = tcp_prot;
-	tcp_bpf_proto.close = bpf_tcp_close;
+	build_protos(bpf_tcp_prots[SOCKMAP_IPV4], &tcp_prot);
 	/* Once BPF TX ULP is registered it is never unregistered. It
 	 * will be in the ULP list for the lifetime of the system. Doing
 	 * duplicate registers is not a problem.

^ permalink raw reply related

* [bpf PATCH v5 2/4] bpf: sockmap, fix smap_list_map_remove when psock is in many maps
From: John Fastabend @ 2018-06-30 13:17 UTC (permalink / raw)
  To: ast, daniel, kafai; +Cc: netdev
In-Reply-To: <20180630131506.17344.7833.stgit@john-Precision-Tower-5810>

If a hashmap is free'd with open socks it removes the reference to
the hash entry from the psock. If that is the last reference to the
psock then it will also be free'd by the reference counting logic.
However the current logic that removes the hash reference from the
list of references is broken. In smap_list_remove() we first check
if the sockmap entry matches and then check if the hashmap entry
matches. But, the sockmap entry sill always match because its NULL in
this case which causes the first entry to be removed from the list.
If this is always the "right" entry (because the user adds/removes
entries in order) then everything is OK but otherwise a subsequent
bpf_tcp_close() may reference a free'd object.

To fix this create two list handlers one for sockmap and one for
sockhash.

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |   34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index d7fd17a..5adff4b 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1602,17 +1602,27 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
-static void smap_list_remove(struct smap_psock *psock,
-			     struct sock **entry,
-			     struct htab_elem *hash_link)
+static void smap_list_map_remove(struct smap_psock *psock,
+				 struct sock **entry)
 {
 	struct smap_psock_map_entry *e, *tmp;
 
 	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
-		if (e->entry == entry || e->hash_link == hash_link) {
+		if (e->entry == entry)
+			list_del(&e->list);
+	}
+}
+
+static void smap_list_hash_remove(struct smap_psock *psock,
+				  struct htab_elem *hash_link)
+{
+	struct smap_psock_map_entry *e, *tmp;
+
+	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
+		struct htab_elem *c = e->hash_link;
+
+		if (c == hash_link)
 			list_del(&e->list);
-			break;
-		}
 	}
 }
 
@@ -1647,7 +1657,7 @@ static void sock_map_free(struct bpf_map *map)
 		 * to be null and queued for garbage collection.
 		 */
 		if (likely(psock)) {
-			smap_list_remove(psock, &stab->sock_map[i], NULL);
+			smap_list_map_remove(psock, &stab->sock_map[i]);
 			smap_release_sock(psock, sock);
 		}
 		write_unlock_bh(&sock->sk_callback_lock);
@@ -1706,7 +1716,7 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
 
 	if (psock->bpf_parse)
 		smap_stop_sock(psock, sock);
-	smap_list_remove(psock, &stab->sock_map[k], NULL);
+	smap_list_map_remove(psock, &stab->sock_map[k]);
 	smap_release_sock(psock, sock);
 out:
 	write_unlock_bh(&sock->sk_callback_lock);
@@ -1908,7 +1918,7 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		struct smap_psock *opsock = smap_psock_sk(osock);
 
 		write_lock_bh(&osock->sk_callback_lock);
-		smap_list_remove(opsock, &stab->sock_map[i], NULL);
+		smap_list_map_remove(opsock, &stab->sock_map[i]);
 		smap_release_sock(opsock, osock);
 		write_unlock_bh(&osock->sk_callback_lock);
 	}
@@ -2124,7 +2134,7 @@ static void sock_hash_free(struct bpf_map *map)
 			 * (psock) to be null and queued for garbage collection.
 			 */
 			if (likely(psock)) {
-				smap_list_remove(psock, NULL, l);
+				smap_list_hash_remove(psock, l);
 				smap_release_sock(psock, sock);
 			}
 			write_unlock_bh(&sock->sk_callback_lock);
@@ -2304,7 +2314,7 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		psock = smap_psock_sk(l_old->sk);
 
 		hlist_del_rcu(&l_old->hash_node);
-		smap_list_remove(psock, NULL, l_old);
+		smap_list_hash_remove(psock, l_old);
 		smap_release_sock(psock, l_old->sk);
 		free_htab_elem(htab, l_old);
 	}
@@ -2372,7 +2382,7 @@ static int sock_hash_delete_elem(struct bpf_map *map, void *key)
 		 * to be null and queued for garbage collection.
 		 */
 		if (likely(psock)) {
-			smap_list_remove(psock, NULL, l);
+			smap_list_hash_remove(psock, l);
 			smap_release_sock(psock, sock);
 		}
 		write_unlock_bh(&sock->sk_callback_lock);

^ permalink raw reply related

* [bpf PATCH v5 3/4] bpf: sockhash fix omitted bucket lock in sock_close
From: John Fastabend @ 2018-06-30 13:17 UTC (permalink / raw)
  To: ast, daniel, kafai; +Cc: netdev
In-Reply-To: <20180630131506.17344.7833.stgit@john-Precision-Tower-5810>

First the sk_callback_lock() was being used to protect both the
sock callback hooks and the psock->maps list. This got overly
convoluted after the addition of sockhash (in sockmap it made
some sense because masp and callbacks were tightly coupled) so
lets split out a specific lock for maps and only use the callback
lock for its intended purpose. This fixes a couple cases where
we missed using maps lock when it was in fact needed. Also this
makes it easier to follow the code because now we can put the
locking closer to the actual code its serializing.

Next, in sock_hash_delete_elem() the pattern was as follows,

  sock_hash_delete_elem()
     [...]
     spin_lock(bucket_lock)
     l = lookup_elem_raw()
     if (l)
        hlist_del_rcu()
        write_lock(sk_callback_lock)
         .... destroy psock ...
        write_unlock(sk_callback_lock)
     spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after
dereferencing the hash table which we can't do unless we have the
bucket lock held. Once we have the bucket lock and the psock element
it is deleted from the hashmap to ensure any other path doing a lookup
will fail. Finally, the refcnt is decremented and if zero the psock
is destroyed.

In parallel with the above (or free'ing the map) a tcp close event
may trigger tcp_close(). Which at the moment omits the bucket lock
altogether (oops!) where the flow looks like this,

  bpf_tcp_close()
     [...]
     write_lock(sk_callback_lock)
     for each psock->maps // list of maps this sock is part of
         hlist_del_rcu(ref_hash_node);
         .... destroy psock ...
     write_unlock(sk_callback_lock)

Obviously, and demonstrated by syzbot, this is broken because
we can have multiple threads deleting entries via hlist_del_rcu().

To fix this we might be tempted to wrap the hlist operation in a
bucket lock but that would create a lock inversion problem. In
summary to follow locking rules the psocks maps list needs the
sk_callback_lock (after this patch maps_lock) but we need the bucket
lock to do the hlist_del_rcu.

To resolve the lock inversion problem pop the head of the maps list
repeatedly and remove the reference until no more are left. If a
delete happens in parallel from the BPF API that is OK as well because
it will do a similar action, lookup the lock in the map/hash, delete
it from the map/hash, and dec the refcnt. We check for this case
before doing a destroy on the psock to ensure we don't have two
threads tearing down a psock. The new logic is as follows,

  bpf_tcp_close()
  e = psock_map_pop(psock->maps) // done with map lock
  bucket_lock() // lock hash list bucket
  l = lookup_elem_raw(head, hash, key, key_size);
  if (l) {
     //only get here if elmnt was not already removed
     hlist_del_rcu()
     ... destroy psock...
  }
  bucket_unlock()

And finally for all the above to work add missing locking around  map
operations per above. Then add RCU annotations and use
rcu_dereference/rcu_assign_pointer to manage values relying on RCU so
that the object is not free'd from sock_hash_free() while it is being
referenced in bpf_tcp_close().

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |  145 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 96 insertions(+), 49 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 5adff4b..60b6e8b 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -72,6 +72,7 @@ struct bpf_htab {
 	u32 n_buckets;
 	u32 elem_size;
 	struct bpf_sock_progs progs;
+	struct rcu_head rcu;
 };
 
 struct htab_elem {
@@ -89,8 +90,8 @@ enum smap_psock_state {
 struct smap_psock_map_entry {
 	struct list_head list;
 	struct sock **entry;
-	struct htab_elem *hash_link;
-	struct bpf_htab *htab;
+	struct htab_elem __rcu *hash_link;
+	struct bpf_htab __rcu *htab;
 };
 
 struct smap_psock {
@@ -120,6 +121,7 @@ struct smap_psock {
 	struct bpf_prog *bpf_parse;
 	struct bpf_prog *bpf_verdict;
 	struct list_head maps;
+	spinlock_t maps_lock;
 
 	/* Back reference used when sock callback trigger sockmap operations */
 	struct sock *sock;
@@ -258,16 +260,54 @@ static void bpf_tcp_release(struct sock *sk)
 	rcu_read_unlock();
 }
 
+static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
+					 u32 hash, void *key, u32 key_size)
+{
+	struct htab_elem *l;
+
+	hlist_for_each_entry_rcu(l, head, hash_node) {
+		if (l->hash == hash && !memcmp(&l->key, key, key_size))
+			return l;
+	}
+
+	return NULL;
+}
+
+static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
+{
+	return &htab->buckets[hash & (htab->n_buckets - 1)];
+}
+
+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+	return &__select_bucket(htab, hash)->head;
+}
+
 static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 {
 	atomic_dec(&htab->count);
 	kfree_rcu(l, rcu);
 }
 
+static struct smap_psock_map_entry *psock_map_pop(struct sock *sk,
+						  struct smap_psock *psock)
+{
+	struct smap_psock_map_entry *e;
+
+	spin_lock_bh(&psock->maps_lock);
+	e = list_first_entry_or_null(&psock->maps,
+				     struct smap_psock_map_entry,
+				     list);
+	if (e)
+		list_del(&e->list);
+	spin_unlock_bh(&psock->maps_lock);
+	return e;
+}
+
 static void bpf_tcp_close(struct sock *sk, long timeout)
 {
 	void (*close_fun)(struct sock *sk, long timeout);
-	struct smap_psock_map_entry *e, *tmp;
+	struct smap_psock_map_entry *e;
 	struct sk_msg_buff *md, *mtmp;
 	struct smap_psock *psock;
 	struct sock *osk;
@@ -286,7 +326,6 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
 	 */
 	close_fun = psock->save_close;
 
-	write_lock_bh(&sk->sk_callback_lock);
 	if (psock->cork) {
 		free_start_sg(psock->sock, psock->cork);
 		kfree(psock->cork);
@@ -299,20 +338,38 @@ static void bpf_tcp_close(struct sock *sk, long timeout)
 		kfree(md);
 	}
 
-	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
+	e = psock_map_pop(sk, psock);
+	while (e) {
 		if (e->entry) {
 			osk = cmpxchg(e->entry, sk, NULL);
 			if (osk == sk) {
-				list_del(&e->list);
 				smap_release_sock(psock, sk);
 			}
 		} else {
-			hlist_del_rcu(&e->hash_link->hash_node);
-			smap_release_sock(psock, e->hash_link->sk);
-			free_htab_elem(e->htab, e->hash_link);
+			struct htab_elem *link = rcu_dereference(e->hash_link);
+			struct bpf_htab *htab = rcu_dereference(e->htab);
+			struct hlist_head *head;
+			struct htab_elem *l;
+			struct bucket *b;
+
+			b = __select_bucket(htab, link->hash);
+			head = &b->head;
+			raw_spin_lock_bh(&b->lock);
+			l = lookup_elem_raw(head,
+					    link->hash, link->key,
+					    htab->map.key_size);
+			/* If another thread deleted this object skip deletion.
+			 * The refcnt on psock may or may not be zero.
+			 */
+			if (l) {
+				hlist_del_rcu(&link->hash_node);
+				smap_release_sock(psock, link->sk);
+				free_htab_elem(htab, link);
+			}
+			raw_spin_unlock_bh(&b->lock);
 		}
+		e = psock_map_pop(sk, psock);
 	}
-	write_unlock_bh(&sk->sk_callback_lock);
 	rcu_read_unlock();
 	close_fun(sk, timeout);
 }
@@ -1395,7 +1452,9 @@ static void smap_release_sock(struct smap_psock *psock, struct sock *sock)
 {
 	if (refcount_dec_and_test(&psock->refcnt)) {
 		tcp_cleanup_ulp(sock);
+		write_lock_bh(&sock->sk_callback_lock);
 		smap_stop_sock(psock, sock);
+		write_unlock_bh(&sock->sk_callback_lock);
 		clear_bit(SMAP_TX_RUNNING, &psock->state);
 		rcu_assign_sk_user_data(sock, NULL);
 		call_rcu_sched(&psock->rcu, smap_destroy_psock);
@@ -1546,6 +1605,7 @@ static struct smap_psock *smap_init_psock(struct sock *sock, int node)
 	INIT_LIST_HEAD(&psock->maps);
 	INIT_LIST_HEAD(&psock->ingress);
 	refcount_set(&psock->refcnt, 1);
+	spin_lock_init(&psock->maps_lock);
 
 	rcu_assign_sk_user_data(sock, psock);
 	sock_hold(sock);
@@ -1607,10 +1667,12 @@ static void smap_list_map_remove(struct smap_psock *psock,
 {
 	struct smap_psock_map_entry *e, *tmp;
 
+	spin_lock_bh(&psock->maps_lock);
 	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
 		if (e->entry == entry)
 			list_del(&e->list);
 	}
+	spin_unlock_bh(&psock->maps_lock);
 }
 
 static void smap_list_hash_remove(struct smap_psock *psock,
@@ -1618,12 +1680,14 @@ static void smap_list_hash_remove(struct smap_psock *psock,
 {
 	struct smap_psock_map_entry *e, *tmp;
 
+	spin_lock_bh(&psock->maps_lock);
 	list_for_each_entry_safe(e, tmp, &psock->maps, list) {
-		struct htab_elem *c = e->hash_link;
+		struct htab_elem *c = rcu_dereference(e->hash_link);
 
 		if (c == hash_link)
 			list_del(&e->list);
 	}
+	spin_unlock_bh(&psock->maps_lock);
 }
 
 static void sock_map_free(struct bpf_map *map)
@@ -1649,7 +1713,6 @@ static void sock_map_free(struct bpf_map *map)
 		if (!sock)
 			continue;
 
-		write_lock_bh(&sock->sk_callback_lock);
 		psock = smap_psock_sk(sock);
 		/* This check handles a racing sock event that can get the
 		 * sk_callback_lock before this case but after xchg happens
@@ -1660,7 +1723,6 @@ static void sock_map_free(struct bpf_map *map)
 			smap_list_map_remove(psock, &stab->sock_map[i]);
 			smap_release_sock(psock, sock);
 		}
-		write_unlock_bh(&sock->sk_callback_lock);
 	}
 	rcu_read_unlock();
 
@@ -1709,7 +1771,6 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
 	if (!sock)
 		return -EINVAL;
 
-	write_lock_bh(&sock->sk_callback_lock);
 	psock = smap_psock_sk(sock);
 	if (!psock)
 		goto out;
@@ -1719,7 +1780,6 @@ static int sock_map_delete_elem(struct bpf_map *map, void *key)
 	smap_list_map_remove(psock, &stab->sock_map[k]);
 	smap_release_sock(psock, sock);
 out:
-	write_unlock_bh(&sock->sk_callback_lock);
 	return 0;
 }
 
@@ -1800,7 +1860,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
 		}
 	}
 
-	write_lock_bh(&sock->sk_callback_lock);
 	psock = smap_psock_sk(sock);
 
 	/* 2. Do not allow inheriting programs if psock exists and has
@@ -1857,7 +1916,9 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
 		if (err)
 			goto out_free;
 		smap_init_progs(psock, verdict, parse);
+		write_lock_bh(&sock->sk_callback_lock);
 		smap_start_sock(psock, sock);
+		write_unlock_bh(&sock->sk_callback_lock);
 	}
 
 	/* 4. Place psock in sockmap for use and stop any programs on
@@ -1867,9 +1928,10 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
 	 */
 	if (map_link) {
 		e->entry = map_link;
+		spin_lock_bh(&psock->maps_lock);
 		list_add_tail(&e->list, &psock->maps);
+		spin_unlock_bh(&psock->maps_lock);
 	}
-	write_unlock_bh(&sock->sk_callback_lock);
 	return err;
 out_free:
 	smap_release_sock(psock, sock);
@@ -1880,7 +1942,6 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
 	}
 	if (tx_msg)
 		bpf_prog_put(tx_msg);
-	write_unlock_bh(&sock->sk_callback_lock);
 	kfree(e);
 	return err;
 }
@@ -1917,10 +1978,8 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	if (osock) {
 		struct smap_psock *opsock = smap_psock_sk(osock);
 
-		write_lock_bh(&osock->sk_callback_lock);
 		smap_list_map_remove(opsock, &stab->sock_map[i]);
 		smap_release_sock(opsock, osock);
-		write_unlock_bh(&osock->sk_callback_lock);
 	}
 out:
 	return err;
@@ -2091,14 +2150,13 @@ static struct bpf_map *sock_hash_alloc(union bpf_attr *attr)
 	return ERR_PTR(err);
 }
 
-static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
+static void __bpf_htab_free(struct rcu_head *rcu)
 {
-	return &htab->buckets[hash & (htab->n_buckets - 1)];
-}
+	struct bpf_htab *htab;
 
-static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
-{
-	return &__select_bucket(htab, hash)->head;
+	htab = container_of(rcu, struct bpf_htab, rcu);
+	bpf_map_area_free(htab->buckets);
+	kfree(htab);
 }
 
 static void sock_hash_free(struct bpf_map *map)
@@ -2117,16 +2175,18 @@ static void sock_hash_free(struct bpf_map *map)
 	 */
 	rcu_read_lock();
 	for (i = 0; i < htab->n_buckets; i++) {
-		struct hlist_head *head = select_bucket(htab, i);
+		struct bucket *b = __select_bucket(htab, i);
+		struct hlist_head *head;
 		struct hlist_node *n;
 		struct htab_elem *l;
 
+		raw_spin_lock_bh(&b->lock);
+		head = &b->head;
 		hlist_for_each_entry_safe(l, n, head, hash_node) {
 			struct sock *sock = l->sk;
 			struct smap_psock *psock;
 
 			hlist_del_rcu(&l->hash_node);
-			write_lock_bh(&sock->sk_callback_lock);
 			psock = smap_psock_sk(sock);
 			/* This check handles a racing sock event that can get
 			 * the sk_callback_lock before this case but after xchg
@@ -2137,13 +2197,12 @@ static void sock_hash_free(struct bpf_map *map)
 				smap_list_hash_remove(psock, l);
 				smap_release_sock(psock, sock);
 			}
-			write_unlock_bh(&sock->sk_callback_lock);
-			kfree(l);
+			free_htab_elem(htab, l);
 		}
+		raw_spin_unlock_bh(&b->lock);
 	}
 	rcu_read_unlock();
-	bpf_map_area_free(htab->buckets);
-	kfree(htab);
+	call_rcu(&htab->rcu, __bpf_htab_free);
 }
 
 static struct htab_elem *alloc_sock_hash_elem(struct bpf_htab *htab,
@@ -2170,19 +2229,6 @@ static struct htab_elem *alloc_sock_hash_elem(struct bpf_htab *htab,
 	return l_new;
 }
 
-static struct htab_elem *lookup_elem_raw(struct hlist_head *head,
-					 u32 hash, void *key, u32 key_size)
-{
-	struct htab_elem *l;
-
-	hlist_for_each_entry_rcu(l, head, hash_node) {
-		if (l->hash == hash && !memcmp(&l->key, key, key_size))
-			return l;
-	}
-
-	return NULL;
-}
-
 static inline u32 htab_map_hash(const void *key, u32 key_len)
 {
 	return jhash(key, key_len, 0);
@@ -2302,9 +2348,12 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		goto bucket_err;
 	}
 
-	e->hash_link = l_new;
-	e->htab = container_of(map, struct bpf_htab, map);
+	rcu_assign_pointer(e->hash_link, l_new);
+	rcu_assign_pointer(e->htab,
+			   container_of(map, struct bpf_htab, map));
+	spin_lock_bh(&psock->maps_lock);
 	list_add_tail(&e->list, &psock->maps);
+	spin_unlock_bh(&psock->maps_lock);
 
 	/* add new element to the head of the list, so that
 	 * concurrent search will find it before old elem
@@ -2374,7 +2423,6 @@ static int sock_hash_delete_elem(struct bpf_map *map, void *key)
 		struct smap_psock *psock;
 
 		hlist_del_rcu(&l->hash_node);
-		write_lock_bh(&sock->sk_callback_lock);
 		psock = smap_psock_sk(sock);
 		/* This check handles a racing sock event that can get the
 		 * sk_callback_lock before this case but after xchg happens
@@ -2385,7 +2433,6 @@ static int sock_hash_delete_elem(struct bpf_map *map, void *key)
 			smap_list_hash_remove(psock, l);
 			smap_release_sock(psock, sock);
 		}
-		write_unlock_bh(&sock->sk_callback_lock);
 		free_htab_elem(htab, l);
 		ret = 0;
 	}

^ permalink raw reply related

* [bpf PATCH v5 4/4] bpf: sockhash, add release routine
From: John Fastabend @ 2018-06-30 13:17 UTC (permalink / raw)
  To: ast, daniel, kafai; +Cc: netdev
In-Reply-To: <20180630131506.17344.7833.stgit@john-Precision-Tower-5810>

Add map_release_uref pointer to hashmap ops. This was dropped when
original sockhash code was ported into bpf-next before initial
commit.

Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 60b6e8b..4fc2cb1 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2478,6 +2478,7 @@ struct sock  *__sock_hash_lookup_elem(struct bpf_map *map, void *key)
 	.map_get_next_key = sock_hash_get_next_key,
 	.map_update_elem = sock_hash_update_elem,
 	.map_delete_elem = sock_hash_delete_elem,
+	.map_release_uref = sock_map_release,
 };
 
 BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, bpf_sock,

^ permalink raw reply related

* Re: [PATCH 0/2 ] cdc_ncm: Handle multicast Ethernet traffic
From: Miguel Rodríguez Pérez @ 2018-06-30 13:43 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: linux-usb, netdev
In-Reply-To: <87muvczcot.fsf@miraculix.mork.no>

On 30/06/18 14:22, Bjørn Mork wrote:
> Miguel Rodríguez Pérez <miguel@det.uvigo.gal> writes:
> 
>> Sending again, as the previous try had the wrong subjects. Sorry about that.
> 
> No problem really, but please use version numbers so it's easy to see
> which set is most current.
Fine, I'll do so in next revision.
> 
> And you didn't address Olivers comment.  Why not?
Well, I did, but I forgot to format it as plain text and it didn't reach
the least. I have just resent it.
> 
> 
> Bjørn
> 

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo

^ permalink raw reply

* [bpf PATCH 2/2] bpf: sockmap, hash table is RCU so readers do not need locks
From: John Fastabend @ 2018-06-30 13:51 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, kafai
In-Reply-To: <20180630134148.6395.64795.stgit@john-Precision-Tower-5810>

This removes locking from readers of RCU hash table. Its not
necessary.

Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 0 files changed

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 63fb047..12ac10a 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2451,10 +2451,8 @@ struct sock  *__sock_hash_lookup_elem(struct bpf_map *map, void *key)
 	b = __select_bucket(htab, hash);
 	head = &b->head;
 
-	raw_spin_lock_bh(&b->lock);
 	l = lookup_elem_raw(head, hash, key, key_size);
 	sk = l ? l->sk : NULL;
-	raw_spin_unlock_bh(&b->lock);
 	return sk;
 }
 

^ permalink raw reply related

* [bpf PATCH 1/2] bpf: sockmap, error path can not release psock in multi-map case
From: John Fastabend @ 2018-06-30 13:51 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, kafai
In-Reply-To: <20180630134148.6395.64795.stgit@john-Precision-Tower-5810>

The current code, in the error path of sock_hash_ctx_update_elem,
checks if the sock has a psock in the user data and if so decrements
the reference count of the psock. However, if the error happens early
in the error path we may have never incremented the psock reference
count and if the psock exists because the sock is in another map then
we may inadvertently decrement the reference count.

Fix this by making the error path only call smap_release_sock if the
error happens after the increment.

Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com
Fixes: 81110384441a ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 0 files changed

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 4fc2cb1..63fb047 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1896,7 +1896,7 @@ static int __sock_map_ctx_update_elem(struct bpf_map *map,
 		e = kzalloc(sizeof(*e), GFP_ATOMIC | __GFP_NOWARN);
 		if (!e) {
 			err = -ENOMEM;
-			goto out_progs;
+			goto out_free;
 		}
 	}
 
@@ -2324,7 +2324,12 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	if (err)
 		goto err;
 
-	/* bpf_map_update_elem() can be called in_irq() */
+	psock = smap_psock_sk(sock);
+	if (unlikely(!psock)) {
+		err = -EINVAL;
+		goto err;
+	}
+
 	raw_spin_lock_bh(&b->lock);
 	l_old = lookup_elem_raw(head, hash, key, key_size);
 	if (l_old && map_flags == BPF_NOEXIST) {
@@ -2342,12 +2347,6 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		goto bucket_err;
 	}
 
-	psock = smap_psock_sk(sock);
-	if (unlikely(!psock)) {
-		err = -EINVAL;
-		goto bucket_err;
-	}
-
 	rcu_assign_pointer(e->hash_link, l_new);
 	rcu_assign_pointer(e->htab,
 			   container_of(map, struct bpf_htab, map));
@@ -2370,12 +2369,10 @@ static int sock_hash_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	raw_spin_unlock_bh(&b->lock);
 	return 0;
 bucket_err:
+	smap_release_sock(psock, sock);
 	raw_spin_unlock_bh(&b->lock);
 err:
 	kfree(e);
-	psock = smap_psock_sk(sock);
-	if (psock)
-		smap_release_sock(psock, sock);
 	return err;
 }
 

^ permalink raw reply related

* [bpf PATCH 0/2] sockmap, syzbot fix error path and RCU fix
From: John Fastabend @ 2018-06-30 13:51 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, kafai

  This applies on top of "BPF fixes for sockhash" I just didn't
  want to confuse that series yet again by re-ordering/adding
  these patches in it

I missed fixing the error path in the sockhash code to align with
supporting socks in multiple maps. Simply checking if the psock is
present does not mean we can decrement the reference count because
it could be part of another map. Fix this by cleaning up the error
path so this situation does not happen.

As far as I know this should be the last fix to the fallout from
relaxing the single map restriction. Sorry about the multiple fixes
but these patches were all written before the initial submission then
converted and I missed this detail. But at least we caught these early
in the net cycle. Will continue reviewing/testing however to see if
we catch anything else.

Also we need one more series to check ESTABLISH state as Eric noted.
That will be sent out shortly just going over the patches once more.
The ESTABLISH/unhash fix is also needed in kTLS.

---

John Fastabend (2):
      bpf: sockmap, error path can not release psock in multi-map case
      bpf: sockmap, hash table is RCU so readers do not need locks

 0 files changed

^ permalink raw reply

* Re: [PATCH 1/2] cdc_ncm: Hook into usbnet_change_mtu respecting usbnet,driver_info
From: Miguel Rodríguez Pérez @ 2018-06-30 14:22 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: linux-usb, netdev
In-Reply-To: <87vaa0zdnx.fsf@miraculix.mork.no>

On 30/06/18 14:01, Bjørn Mork wrote:
> Miguel Rodríguez Pérez <miguel@det.uvigo.gal> writes:
> 
>> Change the way cdc_ncm_change_mtu hooks into the netdev_ops
>> structure so that changes into usbnet driver_info operations
>> can be respected. Without this, is was not possible to hook
>> into usbnet_set_rx_mode.
> 
> Please export usbnet_set_rx_mode instead, so that cdc_ncm_netdev_ops
> can be kept const.
> 
I don't see how this would help. From my point of view, the problem is
that cdc_ncm driver should not be setting net_device_ops in the first
place, as that is usbnet.c job. Sadly there seems to be no way currently
for cdc_ncm to be notified of mtu changes. This was the least intrusive
way I could think of to respect usbnet's net_device_ops structure while
keeping cdc_ncm notified of mtu changes. Without this change,
set_rx_mode was not getting called for cdc_ncm even after modifying
struct driver_info.
> 
> Bjørn
> 

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo

^ permalink raw reply

* Re: [PATCH 2/2] cdc_ncm: Admit multicast traffic
From: Miguel Rodríguez Pérez @ 2018-06-30 14:22 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: linux-usb, netdev
In-Reply-To: <87r2kozcx0.fsf@miraculix.mork.no>

On 30/06/18 14:17, Bjørn Mork wrote:
> Miguel Rodríguez Pérez <miguel@det.uvigo.gal> writes:
> 
>> +static void cdc_ncm_update_filter(struct usbnet *dev)
>> +{
>> +       struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)dev->data[0];
>> +	u8 iface_no = ctx->control->cur_altsetting->desc.bInterfaceNumber;
>> +	struct net_device *net = dev->net;
> 
> Why not change usbnet_cdc_update_filter to get the interface number from
> dev->intf instead? Then you can export it and reuse it here instead of
> creating a copy.  And it will also work for any other usbnet minidriver.

Well, I just wanted to limit my changes to the minimum, as I lack
experience with kernel development. Where should I declare
usbnet_cdc_update then? Would linux/usb/cdc.h the right place? Is it
enough to do EXPORT_GPL_SYMBOL(usbnet_cdc_update_filter) in cdc_ether.c
for it to be loaded automatically from cdc_ncm.c?
> 
> I.e. change this:
> 
> static void usbnet_cdc_update_filter(struct usbnet *dev)
> {
> 	struct cdc_state	*info = (void *) &dev->data;
> 	struct usb_interface	*intf = info->control;
> 
> into:
> 
> static void usbnet_cdc_update_filter(struct usbnet *dev)
> {
> 	struct usb_interface	*intf = dev->intf;
> 
> 
> or simply use dev->intf to deref directly into an interface number, like
> you do in your version.  The local 'intf' variable doesn't do much
> good here.
> 
Ok, done.
> 
> 
>> +
>> +	u16 cdc_filter = USB_CDC_PACKET_TYPE_DIRECTED
>> +			| USB_CDC_PACKET_TYPE_BROADCAST;
>> +
>> +	/* filtering on the device is an optional feature and not worth
>> +	 * the hassle so we just roughly care about snooping and if any
>> +	 * multicast is requested, we take every multicast
>> +	 */
>> +	if (net->flags & IFF_PROMISC)
>> +		cdc_filter |= USB_CDC_PACKET_TYPE_PROMISCUOUS;
>> +	if (!netdev_mc_empty(net) || (net->flags & IFF_ALLMULTI))
>> +		cdc_filter |= USB_CDC_PACKET_TYPE_ALL_MULTICAST;
>> +
>> +	usbnet_write_cmd(dev,
> 
> This is a nice change. It should be probably done in
> usbnet_cdc_update_filter in any case.  Unless there is some reason it
> doesn't alreay use usbnet_write_cmd?
I don't know of any, so I'll keep it changed.
> 
>> +			USB_CDC_SET_ETHERNET_PACKET_FILTER,
>> +			USB_TYPE_CLASS | USB_DIR_OUT | USB_RECIP_INTERFACE,
>> +			cdc_filter,
>> +			iface_no,
>> +			NULL,
>> +			0);
>> +}
>> +
>>  static const struct ethtool_ops cdc_ncm_ethtool_ops = {
>>  	.get_link          = usbnet_get_link,
>>  	.nway_reset        = usbnet_nway_reset,
>> @@ -1652,6 +1679,7 @@ static const struct driver_info cdc_ncm_info = {
>>  	.status = cdc_ncm_status,
>>  	.rx_fixup = cdc_ncm_rx_fixup,
>>  	.tx_fixup = cdc_ncm_tx_fixup,
>> +	.set_rx_mode = cdc_ncm_update_filter,
>>  };
>>
>>  /* Same as cdc_ncm_info, but with FLAG_WWAN */
> 
> 
> As the comment indicates:  There are more 'struct driver_info' variants
> here.  Please add .set_rx_mode to all of them, unless you have a reason
> not to?
I simply lack access to the hardware to test and I didn't want to break
anybody else's hardware. In any case, the change seems quite safe, so I
modified them all too.

I will for your comments before sending the updated version of the patches.
> 
> 
> Bjørn
> 

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo

^ permalink raw reply

* [PATCH net] net: fix use-after-free in GRO with ESP
From: Sabrina Dubroca @ 2018-06-30 15:38 UTC (permalink / raw)
  To: netdev; +Cc: Sabrina Dubroca, Stefano Brivio, Steffen Klassert

Since the addition of GRO for ESP, gro_receive can consume the skb and
return -EINPROGRESS. In that case, the lower layer GRO handler cannot
touch the skb anymore.

Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted
some of the gro_receive handlers that can lead to ESP's gro_receive so
that they wouldn't access the skb when -EINPROGRESS is returned, but
missed other spots, mainly in tunneling protocols.

This patch finishes the conversion to using skb_gro_flush_final(), and
adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and
GUE.

Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 drivers/net/geneve.c      |  2 +-
 drivers/net/vxlan.c       |  4 +---
 include/linux/netdevice.h | 20 ++++++++++++++++++++
 net/8021q/vlan.c          |  2 +-
 net/ipv4/fou.c            |  4 +---
 net/ipv4/gre_offload.c    |  2 +-
 net/ipv4/udp_offload.c    |  2 +-
 7 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 750eaa53bf0c..ada33c2d9ac2 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -476,7 +476,7 @@ static struct sk_buff **geneve_gro_receive(struct sock *sk,
 out_unlock:
 	rcu_read_unlock();
 out:
-	NAPI_GRO_CB(skb)->flush |= flush;
+	skb_gro_flush_final(skb, pp, flush);
 
 	return pp;
 }
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index aee0e60471f1..f6bb1d54d4bd 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -623,9 +623,7 @@ static struct sk_buff **vxlan_gro_receive(struct sock *sk,
 	flush = 0;
 
 out:
-	skb_gro_remcsum_cleanup(skb, &grc);
-	skb->remcsum_offload = 0;
-	NAPI_GRO_CB(skb)->flush |= flush;
+	skb_gro_flush_final_remcsum(skb, pp, flush, &grc);
 
 	return pp;
 }
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3ec9850c7936..3d0cc0b5cec2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2789,11 +2789,31 @@ static inline void skb_gro_flush_final(struct sk_buff *skb, struct sk_buff **pp,
 	if (PTR_ERR(pp) != -EINPROGRESS)
 		NAPI_GRO_CB(skb)->flush |= flush;
 }
+static inline void skb_gro_flush_final_remcsum(struct sk_buff *skb,
+					       struct sk_buff **pp,
+					       int flush,
+					       struct gro_remcsum *grc)
+{
+	if (PTR_ERR(pp) != -EINPROGRESS) {
+		NAPI_GRO_CB(skb)->flush |= flush;
+		skb_gro_remcsum_cleanup(skb, grc);
+		skb->remcsum_offload = 0;
+	}
+}
 #else
 static inline void skb_gro_flush_final(struct sk_buff *skb, struct sk_buff **pp, int flush)
 {
 	NAPI_GRO_CB(skb)->flush |= flush;
 }
+static inline void skb_gro_flush_final_remcsum(struct sk_buff *skb,
+					       struct sk_buff **pp,
+					       int flush,
+					       struct gro_remcsum *grc)
+{
+	NAPI_GRO_CB(skb)->flush |= flush;
+	skb_gro_remcsum_cleanup(skb, grc);
+	skb->remcsum_offload = 0;
+}
 #endif
 
 static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 73a65789271b..8ccee3d01822 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -693,7 +693,7 @@ static struct sk_buff **vlan_gro_receive(struct sk_buff **head,
 out_unlock:
 	rcu_read_unlock();
 out:
-	NAPI_GRO_CB(skb)->flush |= flush;
+	skb_gro_flush_final(skb, pp, flush);
 
 	return pp;
 }
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 1540db65241a..c9ec1603666b 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -448,9 +448,7 @@ static struct sk_buff **gue_gro_receive(struct sock *sk,
 out_unlock:
 	rcu_read_unlock();
 out:
-	NAPI_GRO_CB(skb)->flush |= flush;
-	skb_gro_remcsum_cleanup(skb, &grc);
-	skb->remcsum_offload = 0;
+	skb_gro_flush_final_remcsum(skb, pp, flush, &grc);
 
 	return pp;
 }
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 1859c473b21a..6a7d980105f6 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -223,7 +223,7 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head,
 out_unlock:
 	rcu_read_unlock();
 out:
-	NAPI_GRO_CB(skb)->flush |= flush;
+	skb_gro_flush_final(skb, pp, flush);
 
 	return pp;
 }
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 92dc9e5a7ff3..69c54540d5b4 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -394,7 +394,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 out_unlock:
 	rcu_read_unlock();
 out:
-	NAPI_GRO_CB(skb)->flush |= flush;
+	skb_gro_flush_final(skb, pp, flush);
 	return pp;
 }
 EXPORT_SYMBOL(udp_gro_receive);
-- 
2.18.0

^ permalink raw reply related

* Re: [net-next 01/12] net/mlx5e: Add UDP GSO support
From: Boris Pismenny @ 2018-06-30 16:06 UTC (permalink / raw)
  To: Willem de Bruijn, Saeed Mahameed
  Cc: David Miller, Network Development, ogerlitz, yossiku,
	Alexander Duyck
In-Reply-To: <CAF=yD-LHN6GS7aGBm3enQDJNrvrUp6aGGMcr7bHOR0PNrLqKJA@mail.gmail.com>



On 06/30/18 01:19, Willem de Bruijn wrote:
> On Fri, Jun 29, 2018 at 2:24 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>> From: Boris Pismenny <borisp@mellanox.com>
>>
>> This patch enables UDP GSO support. We enable this by using two WQEs
>> the first is a UDP LSO WQE for all segments with equal length, and the
>> second is for the last segment in case it has different length.
>> Due to HW limitation, before sending, we must adjust the packet length fields.
>>
>> We measure performance between two Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
>> machines connected back-to-back with Connectx4-Lx (40Gbps) NICs.
>> We compare single stream UDP, UDP GSO and UDP GSO with offload.
>> Performance:
>>                  | MSS (bytes)   | Throughput (Gbps)     | CPU utilization (%)
>> UDP GSO offload | 1472          | 35.6                  | 8%
>> UDP GSO         | 1472          | 25.5                  | 17%
>> UDP             | 1472          | 10.2                  | 17%
>> UDP GSO offload | 1024          | 35.6                  | 8%
>> UDP GSO         | 1024          | 19.2                  | 17%
>> UDP             | 1024          | 5.7                   | 17%
>> UDP GSO offload | 512           | 33.8                  | 16%
>> UDP GSO         | 512           | 10.4                  | 17%
>> UDP             | 512           | 3.5                   | 17%
> Very nice results :)
Thanks. We expect to have 100Gbps results by netdev0x12.

>> +static void mlx5e_udp_gso_prepare_last_skb(struct sk_buff *skb,
>> +                                          struct sk_buff *nskb,
>> +                                          int remaining)
>> +{
>> +       int bytes_needed = remaining, remaining_headlen, remaining_page_offset;
>> +       int headlen = skb_transport_offset(skb) + sizeof(struct udphdr);
>> +       int payload_len = remaining + sizeof(struct udphdr);
>> +       int k = 0, i, j;
>> +
>> +       skb_copy_bits(skb, 0, nskb->data, headlen);
>> +       nskb->dev = skb->dev;
>> +       skb_reset_mac_header(nskb);
>> +       skb_set_network_header(nskb, skb_network_offset(skb));
>> +       skb_set_transport_header(nskb, skb_transport_offset(skb));
>> +       skb_set_tail_pointer(nskb, headlen);
>> +
>> +       /* How many frags do we need? */
>> +       for (i = skb_shinfo(skb)->nr_frags - 1; i >= 0; i--) {
>> +               bytes_needed -= skb_frag_size(&skb_shinfo(skb)->frags[i]);
>> +               k++;
>> +               if (bytes_needed <= 0)
>> +                       break;
>> +       }
>> +
>> +       /* Fill the first frag and split it if necessary */
>> +       j = skb_shinfo(skb)->nr_frags - k;
>> +       remaining_page_offset = -bytes_needed;
>> +       skb_fill_page_desc(nskb, 0,
>> +                          skb_shinfo(skb)->frags[j].page.p,
>> +                          skb_shinfo(skb)->frags[j].page_offset + remaining_page_offset,
>> +                          skb_shinfo(skb)->frags[j].size - remaining_page_offset);
>> +
>> +       skb_frag_ref(skb, j);
>> +
>> +       /* Fill the rest of the frags */
>> +       for (i = 1; i < k; i++) {
>> +               j = skb_shinfo(skb)->nr_frags - k + i;
>> +
>> +               skb_fill_page_desc(nskb, i,
>> +                                  skb_shinfo(skb)->frags[j].page.p,
>> +                                  skb_shinfo(skb)->frags[j].page_offset,
>> +                                  skb_shinfo(skb)->frags[j].size);
>> +               skb_frag_ref(skb, j);
>> +       }
>> +       skb_shinfo(nskb)->nr_frags = k;
>> +
>> +       remaining_headlen = remaining - skb->data_len;
>> +
>> +       /* headlen contains remaining data? */
>> +       if (remaining_headlen > 0)
>> +               skb_copy_bits(skb, skb->len - remaining, nskb->data + headlen,
>> +                             remaining_headlen);
>> +       nskb->len = remaining + headlen;
>> +       nskb->data_len =  payload_len - sizeof(struct udphdr) +
>> +               max_t(int, 0, remaining_headlen);
>> +       nskb->protocol = skb->protocol;
>> +       if (nskb->protocol == htons(ETH_P_IP)) {
>> +               ip_hdr(nskb)->id = htons(ntohs(ip_hdr(nskb)->id) +
>> +                                        skb_shinfo(skb)->gso_segs);
>> +               ip_hdr(nskb)->tot_len =
>> +                       htons(payload_len + sizeof(struct iphdr));
>> +       } else {
>> +               ipv6_hdr(nskb)->payload_len = htons(payload_len);
>> +       }
>> +       udp_hdr(nskb)->len = htons(payload_len);
>> +       skb_shinfo(nskb)->gso_size = 0;
>> +       nskb->ip_summed = skb->ip_summed;
>> +       nskb->csum_start = skb->csum_start;
>> +       nskb->csum_offset = skb->csum_offset;
>> +       nskb->queue_mapping = skb->queue_mapping;
>> +}
>> +
>> +/* might send skbs and update wqe and pi */
>> +struct sk_buff *mlx5e_udp_gso_handle_tx_skb(struct net_device *netdev,
>> +                                           struct mlx5e_txqsq *sq,
>> +                                           struct sk_buff *skb,
>> +                                           struct mlx5e_tx_wqe **wqe,
>> +                                           u16 *pi)
>> +{
>> +       int payload_len = skb_shinfo(skb)->gso_size + sizeof(struct udphdr);
>> +       int headlen = skb_transport_offset(skb) + sizeof(struct udphdr);
>> +       int remaining = (skb->len - headlen) % skb_shinfo(skb)->gso_size;
>> +       struct sk_buff *nskb;
>> +
>> +       if (skb->protocol == htons(ETH_P_IP))
>> +               ip_hdr(skb)->tot_len = htons(payload_len + sizeof(struct iphdr));
>> +       else
>> +               ipv6_hdr(skb)->payload_len = htons(payload_len);
>> +       udp_hdr(skb)->len = htons(payload_len);
>> +       if (!remaining)
>> +               return skb;
>> +
>> +       nskb = alloc_skb(max_t(int, headlen, headlen + remaining - skb->data_len), GFP_ATOMIC);
>> +       if (unlikely(!nskb)) {
>> +               sq->stats->dropped++;
>> +               return NULL;
>> +       }
>> +
>> +       mlx5e_udp_gso_prepare_last_skb(skb, nskb, remaining);
>> +
>> +       skb_shinfo(skb)->gso_segs--;
>> +       pskb_trim(skb, skb->len - remaining);
>> +       mlx5e_sq_xmit(sq, skb, *wqe, *pi);
>> +       mlx5e_sq_fetch_wqe(sq, wqe, pi);
>> +       return nskb;
>> +}
> The device driver seems to be implementing the packet split here
> similar to NETIF_F_GSO_PARTIAL. When advertising the right flag, the
> stack should be able to do that for you and pass two packets to the
> driver.

We've totally missed that!
Thank you. I'll fix and resubmit.

^ permalink raw reply

* [PATCH net-next 1/5] net: gemini: Look up L3 maxlen from table
From: Linus Walleij @ 2018-06-30 16:18 UTC (permalink / raw)
  To: netdev, David S . Miller, Michał Mirosław
  Cc: Janos Laube, Paulius Zaleckas, linux-arm-kernel, Hans Ulli Kroll,
	Florian Fainelli, Linus Walleij

The code to calculate the hardware register enumerator
for the maximum L3 length isn't entirely simple to read.
Use the existing defines and rewrite the function into a
table look-up.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 61 ++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 6d7404f66f84..8fc31723f700 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -401,26 +401,57 @@ static int gmac_setup_phy(struct net_device *netdev)
 	return 0;
 }
 
-static int gmac_pick_rx_max_len(int max_l3_len)
-{
-	/* index = CONFIG_MAXLEN_XXX values */
-	static const int max_len[8] = {
-		1536, 1518, 1522, 1542,
-		9212, 10236, 1518, 1518
-	};
-	int i, n = 5;
+/* The maximum frame length is not logically enumerated in the
+ * hardware, so we do a table lookup to find the applicable max
+ * frame length.
+ */
+struct gmac_max_framelen {
+	unsigned int max_l3_len;
+	u8 val;
+};
 
-	max_l3_len += ETH_HLEN + VLAN_HLEN;
+static const struct gmac_max_framelen gmac_maxlens[] = {
+	{
+		.max_l3_len = 1518,
+		.val = CONFIG0_MAXLEN_1518,
+	},
+	{
+		.max_l3_len = 1522,
+		.val = CONFIG0_MAXLEN_1522,
+	},
+	{
+		.max_l3_len = 1536,
+		.val = CONFIG0_MAXLEN_1536,
+	},
+	{
+		.max_l3_len = 1542,
+		.val = CONFIG0_MAXLEN_1542,
+	},
+	{
+		.max_l3_len = 9212,
+		.val = CONFIG0_MAXLEN_9k,
+	},
+	{
+		.max_l3_len = 10236,
+		.val = CONFIG0_MAXLEN_10k,
+	},
+};
+
+static int gmac_pick_rx_max_len(unsigned int max_l3_len)
+{
+	const struct gmac_max_framelen *maxlen;
+	int maxtot;
+	int i;
 
-	if (max_l3_len > max_len[n])
-		return -1;
+	maxtot = max_l3_len + ETH_HLEN + VLAN_HLEN;
 
-	for (i = 0; i < 5; i++) {
-		if (max_len[i] >= max_l3_len && max_len[i] < max_len[n])
-			n = i;
+	for (i = 0; i < ARRAY_SIZE(gmac_maxlens); i++) {
+		maxlen = &gmac_maxlens[i];
+		if (maxtot <= maxlen->max_l3_len)
+			return maxlen->val;
 	}
 
-	return n;
+	return -1;
 }
 
 static int gmac_init(struct net_device *netdev)
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 2/5] net: gemini: Improve connection prints
From: Linus Walleij @ 2018-06-30 16:18 UTC (permalink / raw)
  To: netdev, David S . Miller, Michał Mirosław
  Cc: Janos Laube, Paulius Zaleckas, linux-arm-kernel, Hans Ulli Kroll,
	Florian Fainelli, Linus Walleij
In-Reply-To: <20180630161806.24312-1-linus.walleij@linaro.org>

Instead of just specify that a PHY is connected at some
speed, also specify which one. This is helpful with several
PHYs on the system.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 8fc31723f700..b49ed8964026 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -300,23 +300,26 @@ static void gmac_speed_set(struct net_device *netdev)
 		status.bits.speed = GMAC_SPEED_1000;
 		if (phydev->interface == PHY_INTERFACE_MODE_RGMII)
 			status.bits.mii_rmii = GMAC_PHY_RGMII_1000;
-		netdev_info(netdev, "connect to RGMII @ 1Gbit\n");
+		netdev_info(netdev, "connect %s to RGMII @ 1Gbit\n",
+			    phydev_name(phydev));
 		break;
 	case 100:
 		status.bits.speed = GMAC_SPEED_100;
 		if (phydev->interface == PHY_INTERFACE_MODE_RGMII)
 			status.bits.mii_rmii = GMAC_PHY_RGMII_100_10;
-		netdev_info(netdev, "connect to RGMII @ 100 Mbit\n");
+		netdev_info(netdev, "connect %s to RGMII @ 100 Mbit\n",
+			    phydev_name(phydev));
 		break;
 	case 10:
 		status.bits.speed = GMAC_SPEED_10;
 		if (phydev->interface == PHY_INTERFACE_MODE_RGMII)
 			status.bits.mii_rmii = GMAC_PHY_RGMII_100_10;
-		netdev_info(netdev, "connect to RGMII @ 10 Mbit\n");
+		netdev_info(netdev, "connect %s to RGMII @ 10 Mbit\n",
+			    phydev_name(phydev));
 		break;
 	default:
-		netdev_warn(netdev, "Not supported PHY speed (%d)\n",
-			    phydev->speed);
+		netdev_warn(netdev, "Unsupported PHY speed (%d) on %s\n",
+			    phydev->speed, phydev_name(phydev));
 	}
 
 	if (phydev->duplex == DUPLEX_FULL) {
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 3/5] net: gemini: Allow multiple ports to instantiate
From: Linus Walleij @ 2018-06-30 16:18 UTC (permalink / raw)
  To: netdev, David S . Miller, Michał Mirosław
  Cc: Janos Laube, Paulius Zaleckas, linux-arm-kernel, Hans Ulli Kroll,
	Florian Fainelli, Linus Walleij
In-Reply-To: <20180630161806.24312-1-linus.walleij@linaro.org>

The code was not tested with two ports actually in use at
the same time. (I blame this on lack of actual hardware using
that feature.) Now after locating a system using both ports,
add necessary fix to make both ports come up.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index b49ed8964026..8d192fcd51c8 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -1787,7 +1787,10 @@ static int gmac_open(struct net_device *netdev)
 	phy_start(netdev->phydev);
 
 	err = geth_resize_freeq(port);
-	if (err) {
+	/* It's fine if it's just busy, the other port has set up
+	 * the freeq in that case.
+	 */
+	if (err && (err != -EBUSY)) {
 		netdev_err(netdev, "could not resize freeq\n");
 		goto err_stop_phy;
 	}
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 4/5] net: gemini: Move main init to port
From: Linus Walleij @ 2018-06-30 16:18 UTC (permalink / raw)
  To: netdev, David S . Miller, Michał Mirosław
  Cc: Janos Laube, Paulius Zaleckas, linux-arm-kernel, Hans Ulli Kroll,
	Florian Fainelli, Linus Walleij
In-Reply-To: <20180630161806.24312-1-linus.walleij@linaro.org>

The initialization sequence for the ethernet, setting up
interrupt routing and such things, need to be done after
both the ports are clocked and reset. Before this the
config will not "take". Move the initialization to the
port probe function and keep track of init status in
the state.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 8d192fcd51c8..79324bbfd768 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -146,6 +146,7 @@ struct gemini_ethernet {
 	void __iomem *base;
 	struct gemini_ethernet_port *port0;
 	struct gemini_ethernet_port *port1;
+	bool initialized;
 
 	spinlock_t	irq_lock; /* Locks IRQ-related registers */
 	unsigned int	freeq_order;
@@ -2301,6 +2302,14 @@ static void gemini_port_remove(struct gemini_ethernet_port *port)
 
 static void gemini_ethernet_init(struct gemini_ethernet *geth)
 {
+	/* Only do this once both ports are online */
+	if (geth->initialized)
+		return;
+	if (geth->port0 && geth->port1)
+		geth->initialized = true;
+	else
+		return;
+
 	writel(0, geth->base + GLOBAL_INTERRUPT_ENABLE_0_REG);
 	writel(0, geth->base + GLOBAL_INTERRUPT_ENABLE_1_REG);
 	writel(0, geth->base + GLOBAL_INTERRUPT_ENABLE_2_REG);
@@ -2447,6 +2456,10 @@ static int gemini_ethernet_port_probe(struct platform_device *pdev)
 		geth->port0 = port;
 	else
 		geth->port1 = port;
+
+	/* This will just be done once both ports are up and reset */
+	gemini_ethernet_init(geth);
+
 	platform_set_drvdata(pdev, port);
 
 	/* Set up and register the netdev */
@@ -2564,7 +2577,6 @@ static int gemini_ethernet_probe(struct platform_device *pdev)
 
 	spin_lock_init(&geth->irq_lock);
 	spin_lock_init(&geth->freeq_lock);
-	gemini_ethernet_init(geth);
 
 	/* The children will use this */
 	platform_set_drvdata(pdev, geth);
@@ -2577,8 +2589,8 @@ static int gemini_ethernet_remove(struct platform_device *pdev)
 {
 	struct gemini_ethernet *geth = platform_get_drvdata(pdev);
 
-	gemini_ethernet_init(geth);
 	geth_cleanup_freeq(geth);
+	geth->initialized = false;
 
 	return 0;
 }
-- 
2.17.1

^ permalink raw reply related

* [PATCH net-next 5/5] net: gemini: Indicate that we can handle jumboframes
From: Linus Walleij @ 2018-06-30 16:18 UTC (permalink / raw)
  To: netdev, David S . Miller, Michał Mirosław
  Cc: Janos Laube, Paulius Zaleckas, linux-arm-kernel, Hans Ulli Kroll,
	Florian Fainelli, Linus Walleij
In-Reply-To: <20180630161806.24312-1-linus.walleij@linaro.org>

The hardware supposedly handles frames up to 10236 bytes and
implements .ndo_change_mtu() so accept 10236 minus the ethernet
header for a VLAN tagged frame on the netdevices.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 79324bbfd768..ae475393e4ac 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -2473,6 +2473,11 @@ static int gemini_ethernet_port_probe(struct platform_device *pdev)
 
 	netdev->hw_features = GMAC_OFFLOAD_FEATURES;
 	netdev->features |= GMAC_OFFLOAD_FEATURES | NETIF_F_GRO;
+	/* We can handle jumbo frames up to 10236 bytes so, let's accept
+	 * payloads of 10236 bytes minus VLAN and ethernet header
+	 */
+	netdev->min_mtu = 256;
+	netdev->max_mtu = 10236 - VLAN_ETH_HLEN;
 
 	port->freeq_refill = 0;
 	netif_napi_add(netdev, &port->napi, gmac_napi_poll,
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 0/3] usbnet: Admit multicast traffic in cdc ncm devices
From: Miguel Rodríguez Pérez @ 2018-06-30 17:27 UTC (permalink / raw)
  To: linux-usb, netdev

I have reworked the patches according to the previous round of revisions.

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo

^ permalink raw reply

* [PATCH v2 2/4] usbnet: Export usbnet_cdc_update_filter
From: Miguel Rodríguez Pérez @ 2018-06-30 17:31 UTC (permalink / raw)
  To: linux-usb, netdev

This one exports usbnet_cdc_update_filter so that it can be reused by
cdc_ncm or any other usbnet driver.

---
 drivers/net/usb/cdc_ether.c | 3 ++-
 include/linux/usb/usbnet.h  | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 815ed0dc18fe..54472ab77b90 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -75,7 +75,7 @@ static const u8 mbm_guid[16] = {
 	0xa6, 0x07, 0xc0, 0xff, 0xcb, 0x7e, 0x39, 0x2a,
 };

-static void usbnet_cdc_update_filter(struct usbnet *dev)
+void usbnet_cdc_update_filter(struct usbnet *dev)
 {
 	struct net_device *net = dev->net;

@@ -99,6 +99,7 @@ static void usbnet_cdc_update_filter(struct usbnet *dev)
 			NULL,
 			0);
 }
+EXPORT_SYMBOL_GPL(usbnet_cdc_update_filter);

 /* probes control interface, claims data interface, collects the bulk
  * endpoints, activates data interface (if needed), maybe sets MTU.
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index e2ec3582e549..7821cf1dcd60 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -286,4 +286,5 @@ extern void usbnet_update_max_qlen(struct usbnet *dev);
 extern void usbnet_get_stats64(struct net_device *dev,
 			       struct rtnl_link_stats64 *stats);

+extern void usbnet_cdc_update_filter(struct usbnet *);
 #endif /* __LINUX_USB_USBNET_H */
-- 
2.17.1

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo

^ permalink raw reply related

* [PATCH v2 1/4] Simplify usbnet_cdc_update_filter
From: Miguel Rodríguez Pérez @ 2018-06-30 17:32 UTC (permalink / raw)
  To: linux-usb, netdev

Remove some unneeded varibles to make the code easier to read
and, replace the generic usb_control_msg function for the
more specific usbnet_write_cmd.
---
 drivers/net/usb/cdc_ether.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 178b956501a7..815ed0dc18fe 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -77,9 +77,7 @@ static const u8 mbm_guid[16] = {

 static void usbnet_cdc_update_filter(struct usbnet *dev)
 {
-	struct cdc_state	*info = (void *) &dev->data;
-	struct usb_interface	*intf = info->control;
-	struct net_device	*net = dev->net;
+	struct net_device *net = dev->net;

 	u16 cdc_filter = USB_CDC_PACKET_TYPE_DIRECTED
 			| USB_CDC_PACKET_TYPE_BROADCAST;
@@ -93,16 +91,13 @@ static void usbnet_cdc_update_filter(struct usbnet *dev)
 	if (!netdev_mc_empty(net) || (net->flags & IFF_ALLMULTI))
 		cdc_filter |= USB_CDC_PACKET_TYPE_ALL_MULTICAST;

-	usb_control_msg(dev->udev,
-			usb_sndctrlpipe(dev->udev, 0),
+	usbnet_write_cmd(dev,
 			USB_CDC_SET_ETHERNET_PACKET_FILTER,
-			USB_TYPE_CLASS | USB_RECIP_INTERFACE,
+			USB_TYPE_CLASS | USB_DIR_OUT | USB_RECIP_INTERFACE,
 			cdc_filter,
-			intf->cur_altsetting->desc.bInterfaceNumber,
+			dev->intf->cur_altsetting->desc.bInterfaceNumber,
 			NULL,
-			0,
-			USB_CTRL_SET_TIMEOUT
-		);
+			0);
 }

 /* probes control interface, claims data interface, collects the bulk
-- 
2.17.1

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo

^ permalink raw reply related

* [PATCH v2 3/4] Replace the way cdc_ncm hooks into usbnet_change_mtu
From: Miguel Rodríguez Pérez @ 2018-06-30 17:32 UTC (permalink / raw)
  To: linux-usb, netdev

Previously cdc_ncm overwrited netdev_ops used by usbnet
thus preventing hooking into set_rx_mode. This patch
preserves usbnet hooks into netdev_ops, and add an
additional one for change_mtu needed by cdc_ncm.
---
 drivers/net/usb/cdc_ncm.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 9e1b74590682..d6b51e2b9495 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -750,16 +750,7 @@ int cdc_ncm_change_mtu(struct net_device *net, int
new_mtu)
 }
 EXPORT_SYMBOL_GPL(cdc_ncm_change_mtu);

-static const struct net_device_ops cdc_ncm_netdev_ops = {
-	.ndo_open	     = usbnet_open,
-	.ndo_stop	     = usbnet_stop,
-	.ndo_start_xmit	     = usbnet_start_xmit,
-	.ndo_tx_timeout	     = usbnet_tx_timeout,
-	.ndo_get_stats64     = usbnet_get_stats64,
-	.ndo_change_mtu	     = cdc_ncm_change_mtu,
-	.ndo_set_mac_address = eth_mac_addr,
-	.ndo_validate_addr   = eth_validate_addr,
-};
+static struct net_device_ops cdc_ncm_netdev_ops;

 int cdc_ncm_bind_common(struct usbnet *dev, struct usb_interface *intf,
u8 data_altsetting, int drvflags)
 {
@@ -939,6 +930,8 @@ int cdc_ncm_bind_common(struct usbnet *dev, struct
usb_interface *intf, u8 data_
 	dev->net->sysfs_groups[0] = &cdc_ncm_sysfs_attr_group;

 	/* must handle MTU changes */
+	cdc_ncm_netdev_ops = *dev->net->netdev_ops;
+	cdc_ncm_netdev_ops.ndo_change_mtu = cdc_ncm_change_mtu;
 	dev->net->netdev_ops = &cdc_ncm_netdev_ops;
 	dev->net->max_mtu = cdc_ncm_max_dgram_size(dev) - cdc_ncm_eth_hlen(dev);

-- 
2.17.1

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo

^ permalink raw reply related

* [PATCH v2 4/4] Hook into set_rx_mode to admit multicast traffic
From: Miguel Rodríguez Pérez @ 2018-06-30 17:32 UTC (permalink / raw)
  To: linux-usb, netdev

We set set_rx_mode to usbnet_cdc_update_filter provided
by cdc_ether that simply admits all multicast traffic
if there is more than one multicast filter configured.
---
 drivers/net/usb/cdc_ncm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index d6b51e2b9495..fda0af0b5d3c 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -1652,6 +1652,7 @@ static const struct driver_info cdc_ncm_info = {
 	.status = cdc_ncm_status,
 	.rx_fixup = cdc_ncm_rx_fixup,
 	.tx_fixup = cdc_ncm_tx_fixup,
+	.set_rx_mode = usbnet_cdc_update_filter,
 };

 /* Same as cdc_ncm_info, but with FLAG_WWAN */
@@ -1665,6 +1666,7 @@ static const struct driver_info wwan_info = {
 	.status = cdc_ncm_status,
 	.rx_fixup = cdc_ncm_rx_fixup,
 	.tx_fixup = cdc_ncm_tx_fixup,
+	.set_rx_mode = usbnet_cdc_update_filter,
 };

 /* Same as wwan_info, but with FLAG_NOARP  */
@@ -1678,6 +1680,7 @@ static const struct driver_info wwan_noarp_info = {
 	.status = cdc_ncm_status,
 	.rx_fixup = cdc_ncm_rx_fixup,
 	.tx_fixup = cdc_ncm_tx_fixup,
+	.set_rx_mode = usbnet_cdc_update_filter,
 };

 static const struct usb_device_id cdc_devs[] = {
-- 
2.17.1

-- 
Miguel Rodríguez Pérez
Laboratorio de Redes
EE Telecomunicación – Universidade de Vigo

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox