Netdev List

* Re: [PATCH iproute2] tc/htb: remove unused variable
From: Stephen Hemminger @ 2018-08-30 15:01 UTC
  To: Florent Fourcot; +Cc: netdev
In-Reply-To: <20180830143854.24928-1-florent.fourcot@wifirst.fr>

On Thu, 30 Aug 2018 16:38:54 +0200
Florent Fourcot <florent.fourcot@wifirst.fr> wrote:

> Since the introduction of the htb module, this variable has never been used.
> 
> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>

Looks good. Applied.

* Re: [PATCH net] tcp: do not restart timewait timer on rst reception
From: Eric Dumazet @ 2018-08-30 15:07 UTC
  To: Florian Westphal, netdev; +Cc: edumazet
In-Reply-To: <20180830122429.3546-1-fw@strlen.de>

On 08/30/2018 05:24 AM, Florian Westphal wrote:
> RFC 1337 says:
>  ''Ignore RST segments in TIME-WAIT state.
>    If the 2 minute MSL is enforced, this fix avoids all three hazards.''
> 
> So with net.ipv4.tcp_rfc1337=1, the expected behaviour is for the TIME-WAIT sk
> to expire rather than being removed instantly when a reset is received.
> 
> However, Linux also restarts the TIME-WAIT timer.
> 
> This causes connect() to fail when trying to re-use ports, or to suffer very
> long delays (until the SYN retry interval exceeds the MSL).
>


> Reported-by: Michal Tesar <mtesar@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---

SGTM, thanks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
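
A toy model of the behaviour discussed in this patch, offered as a sketch under stated assumptions rather than the kernel's tcp_timewait_state_process(). The struct, the tick-based timer and the buggy_restart switch are hypothetical: with tcp_rfc1337 disabled an RST kills the TIME-WAIT socket; with it enabled the RST is ignored, and only the pre-patch behaviour pushes the expiry out again.

```c
#include <stdbool.h>

enum tw_action { TW_KILL, TW_KEEP };

/* Hypothetical TIME-WAIT socket: only the expiry time matters here. */
struct tw_sock {
	unsigned int expires;	/* absolute tick at which TIME-WAIT ends */
};

/* Handle an RST received in TIME-WAIT.
 * rfc1337 == false: kill the socket immediately (TIME-WAIT assassination).
 * rfc1337 == true:  ignore the RST; the timer is left untouched unless
 *                   buggy_restart models the pre-patch restart. */
static enum tw_action tw_handle_rst(struct tw_sock *tw, bool rfc1337,
				    unsigned int now, unsigned int tw_len,
				    bool buggy_restart)
{
	if (!rfc1337)
		return TW_KILL;
	if (buggy_restart)
		tw->expires = now + tw_len;	/* every RST pushes expiry out */
	return TW_KEEP;
}
```

With the fix, repeated RSTs leave expires unchanged, so the socket leaves TIME-WAIT on its original schedule.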

* [PATCH v3 bpf-next 2/2] new sample bpf prog
From: Nikita V. Shirokov @ 2018-08-30 14:51 UTC
  To: ast, brakmo, daniel; +Cc: netdev, Nikita V. Shirokov
In-Reply-To: <20180830145154.1128593-1-tehnerd@fb.com>

Sample program showing how to use TCP_SAVE_SYN/TCP_SAVED_SYN: a BPF
program that performs TOS/TCLASS reflection (the server replies with the
same TOS/TCLASS as the client).

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
---
 samples/bpf/Makefile               |  1 +
 samples/bpf/tcp_tos_reflect_kern.c | 87 ++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)
 create mode 100644 samples/bpf/tcp_tos_reflect_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 36f9f41d094b..be0a961450bc 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -153,6 +153,7 @@ always += tcp_cong_kern.o
 always += tcp_iw_kern.o
 always += tcp_clamp_kern.o
 always += tcp_basertt_kern.o
+always += tcp_tos_reflect_kern.o
 always += xdp_redirect_kern.o
 always += xdp_redirect_map_kern.o
 always += xdp_redirect_cpu_kern.o
diff --git a/samples/bpf/tcp_tos_reflect_kern.c b/samples/bpf/tcp_tos_reflect_kern.c
new file mode 100644
index 000000000000..d51dab19eca6
--- /dev/null
+++ b/samples/bpf/tcp_tos_reflect_kern.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2018 Facebook
+ *
+ * BPF program to automatically reflect TOS option from received syn packet
+ *
+ * Use load_sock_ops to load this BPF program.
+ */
+
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/tcp.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/ip.h>
+#include <uapi/linux/ipv6.h>
+#include <uapi/linux/in.h>
+#include <linux/socket.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+#define DEBUG 1
+
+#define bpf_printk(fmt, ...)					\
+({								\
+	       char ____fmt[] = fmt;				\
+	       bpf_trace_printk(____fmt, sizeof(____fmt),	\
+				##__VA_ARGS__);			\
+})
+
+SEC("sockops")
+int bpf_basertt(struct bpf_sock_ops *skops)
+{
+	char header[sizeof(struct ipv6hdr)];
+	struct ipv6hdr *hdr6;
+	struct iphdr *hdr;
+	int hdr_size = 0;
+	int save_syn = 1;
+	int tos = 0;
+	int rv = 0;
+	int op;
+
+	op = (int) skops->op;
+
+#ifdef DEBUG
+	bpf_printk("BPF command: %d\n", op);
+#endif
+	switch (op) {
+	case BPF_SOCK_OPS_TCP_LISTEN_CB:
+		rv = bpf_setsockopt(skops, SOL_TCP, TCP_SAVE_SYN,
+				   &save_syn, sizeof(save_syn));
+		break;
+	case BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB:
+		if (skops->family == AF_INET)
+			hdr_size = sizeof(struct iphdr);
+		else
+			hdr_size = sizeof(struct ipv6hdr);
+		rv = bpf_getsockopt(skops, SOL_TCP, TCP_SAVED_SYN,
+				    header, hdr_size);
+		if (!rv) {
+			if (skops->family == AF_INET) {
+				hdr = (struct iphdr *) header;
+				tos = hdr->tos;
+				if (tos != 0)
+					bpf_setsockopt(skops, SOL_IP, IP_TOS,
+						       &tos, sizeof(tos));
+			} else {
+				hdr6 = (struct ipv6hdr *) header;
+				tos = ((hdr6->priority) << 4 |
+				       (hdr6->flow_lbl[0]) >>  4);
+				if (tos)
+					bpf_setsockopt(skops, SOL_IPV6,
+						       IPV6_TCLASS,
+						       &tos, sizeof(tos));
+			}
+			rv = 0;
+		}
+		break;
+	default:
+		rv = -1;
+	}
+#ifdef DEBUG
+	bpf_printk("Returning %d\n", rv);
+#endif
+	skops->reply = rv;
+	return 1;
+}
+char _license[] SEC("license") = "GPL";
-- 
2.17.1
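
The IPv6 branch of the sample rebuilds the 8-bit Traffic Class from two nibbles: priority (low nibble of header byte 0) and the top nibble of flow_lbl[0] (header byte 1). Below is a userspace sketch of the same extraction over raw header bytes. The function name is hypothetical, and unlike the BPF program, which picks the header type from skops->family, it dispatches on the IP version nibble.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the TOS (IPv4) or Traffic Class (IPv6) byte of a raw IP header,
 * or -1 if the buffer is too short or the version is unknown. */
static int ip_header_tos(const uint8_t *hdr, size_t len)
{
	if (len < 2)
		return -1;
	switch (hdr[0] >> 4) {
	case 4:
		return hdr[1];	/* IPv4: byte 1 is the TOS field */
	case 6:
		/* IPv6: high nibble of TC is the low nibble of byte 0,
		 * low nibble of TC is the high nibble of byte 1. */
		return ((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4);
	default:
		return -1;
	}
}
```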

* [PATCH v3 bpf-next 1/2] new options for bpf_(set|get)sockopt
From: Nikita V. Shirokov @ 2018-08-30 14:51 UTC
  To: ast, brakmo, daniel; +Cc: netdev, Nikita V. Shirokov
In-Reply-To: <20180830145154.1128593-1-tehnerd@fb.com>

Add support for two new sockopts in bpf_setsockopt()/bpf_getsockopt():
TCP_SAVE_SYN (set) and TCP_SAVED_SYN (get). This allows a BPF program to
build logic based on data from the ingress SYN packet.

Signed-off-by: Nikita V. Shirokov <tehnerd@fb.com>
---
 net/core/filter.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index c25eb36f1320..feb578506009 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4007,6 +4007,12 @@ BPF_CALL_5(bpf_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 					tp->snd_ssthresh = val;
 				}
 				break;
+			case TCP_SAVE_SYN:
+				if (val < 0 || val > 1)
+					ret = -EINVAL;
+				else
+					tp->save_syn = val;
+				break;
 			default:
 				ret = -EINVAL;
 			}
@@ -4032,21 +4038,32 @@ static const struct bpf_func_proto bpf_setsockopt_proto = {
 BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	   int, level, int, optname, char *, optval, int, optlen)
 {
+	struct inet_connection_sock *icsk;
 	struct sock *sk = bpf_sock->sk;
+	struct tcp_sock *tp;
 
 	if (!sk_fullsock(sk))
 		goto err_clear;
-
 #ifdef CONFIG_INET
 	if (level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
-		if (optname == TCP_CONGESTION) {
-			struct inet_connection_sock *icsk = inet_csk(sk);
+		switch (optname) {
+		case TCP_CONGESTION:
+			icsk = inet_csk(sk);
 
 			if (!icsk->icsk_ca_ops || optlen <= 1)
 				goto err_clear;
 			strncpy(optval, icsk->icsk_ca_ops->name, optlen);
 			optval[optlen - 1] = 0;
-		} else {
+			break;
+		case TCP_SAVED_SYN:
+			tp = tcp_sk(sk);
+
+			if (optlen <= 0 || !tp->saved_syn ||
+			    optlen > tp->saved_syn[0])
+				goto err_clear;
+			memcpy(optval, tp->saved_syn + 1, optlen);
+			break;
+		default:
 			goto err_clear;
 		}
 	} else if (level == SOL_IP) {
-- 
2.17.1
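
A userspace model of the TCP_SAVED_SYN bounds check in bpf_getsockopt(), assuming the layout the diff relies on: word 0 of saved_syn holds the saved header length in bytes, and the header bytes start at saved_syn + 1. copy_saved_syn() is a hypothetical name; zeroing the output on failure mirrors what we assume the err_clear path does.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Copy the first optlen bytes of a saved SYN header into optval.
 * Fails if nothing was saved or if more bytes are requested than saved. */
static int copy_saved_syn(const uint32_t *saved_syn, void *optval, int optlen)
{
	if (optlen <= 0 || !saved_syn || (uint32_t)optlen > saved_syn[0]) {
		if (optlen > 0)
			memset(optval, 0, (size_t)optlen);
		return -EINVAL;
	}
	/* saved_syn + 1 skips the 4-byte length word */
	memcpy(optval, saved_syn + 1, (size_t)optlen);
	return 0;
}
```

Note that a request smaller than the saved length succeeds, which is what lets the sample read only sizeof(struct iphdr) bytes for IPv4.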

* [PATCH v3 bpf-next 0/2] bpf tcp save syn set/get sockoptions
From: Nikita V. Shirokov @ 2018-08-30 14:51 UTC
  To: ast, brakmo, daniel; +Cc: netdev, Nikita V. Shirokov

This series adds support for two new TCP sockopts for BPF programs:
TCP_SAVE_SYN (set) and TCP_SAVED_SYN (get). This allows a tcp-bpf program
to build logic based on fields from the ingress SYN packet (e.g. TCP
TOS/TCLASS reflection, see the sample program), transparently from the
userspace program's point of view.

v2->v3:
 - make patch series public
v1->v2:
 - adding proper SPDX license

Nikita V. Shirokov (2):
  new options for bpf_(set|get)sockopt
  new sample bpf prog

 net/core/filter.c                  | 25 +++++++--
 samples/bpf/Makefile               |  1 +
 samples/bpf/tcp_tos_reflect_kern.c | 87 ++++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+), 4 deletions(-)
 create mode 100644 samples/bpf/tcp_tos_reflect_kern.c

-- 
2.17.1

* Re: KMSAN: uninit-value in rds_connect
From: Santosh Shilimkar @ 2018-08-30 19:30 UTC
  To: syzbot, davem, linux-kernel, linux-rdma, netdev, rds-devel,
	syzkaller-bugs
In-Reply-To: <00000000000005859b0574ab47e5@google.com>

On 8/30/2018 11:31 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    2dca2cbde67a kmsan: fix build warnings with CONFIG_KMSAN=n
> git tree:       https://github.com/google/kmsan.git/master
> console output: https://syzkaller.appspot.com/x/log.txt?x=1519734a400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=820d6393634b55e3
> dashboard link: https://syzkaller.appspot.com/bug?extid=0049bebbf3042dbd2e8f
> compiler:       clang version 8.0.0 (trunk 339414)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0049bebbf3042dbd2e8f@syzkaller.appspotmail.com
> 
OK. Will send the fix to address this.

Regards,
Santosh

* Re: KMSAN: uninit-value in rds_bind
From: Santosh Shilimkar @ 2018-08-30 19:30 UTC
  To: syzbot, davem, linux-kernel, linux-rdma, netdev, rds-devel,
	syzkaller-bugs
In-Reply-To: <000000000000010a4d0574ab4743@google.com>

On 8/30/2018 11:31 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    2dca2cbde67a kmsan: fix build warnings with CONFIG_KMSAN=n
> git tree:       https://github.com/google/kmsan.git/master
> console output: https://syzkaller.appspot.com/x/log.txt?x=16db895a400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=820d6393634b55e3
> dashboard link: https://syzkaller.appspot.com/bug?extid=915c9f99f3dbc4bd6cd1
> compiler:       clang version 8.0.0 (trunk 339414)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1137bffe400000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1521a7fe400000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+915c9f99f3dbc4bd6cd1@syzkaller.appspotmail.com
> 
OK. Will send the fix to address this.

Regards,
Santosh

* [PATCH] Optimize lookup of /0 xfrm policies
From: Yannick Brosseau @ 2018-08-30 19:34 UTC
  To: steffen.klassert, herbert, davem, netdev
  Cc: linux-kernel, kernel-team, Yannick Brosseau

Currently, all xfrm policies that are not /32 end up in the inexact
policy linked list, which takes a long time to look up.

We can optimize the case where the policy has a /0 prefix, which means
any address matches that part of the selector.
We do this by putting those policies in the direct hash table after
zeroing the address part.
At lookup time, we do additional lookups with the packet addresses and
either the destination address, the source address, or both zeroed out.
We still call xfrm_policy_match to validate that the packet matches the
selector.

In our tests, this optimization reduces softirq CPU utilisation from
about 40% to 7% with 3k policies.

Signed-off-by: Yannick Brosseau <scientist@fb.com>
---
 net/xfrm/xfrm_hash.h   | 10 +++++
 net/xfrm/xfrm_policy.c | 88 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_hash.h b/net/xfrm/xfrm_hash.h
index 61be810389d8..40997fb5336d 100644
--- a/net/xfrm/xfrm_hash.h
+++ b/net/xfrm/xfrm_hash.h
@@ -145,6 +145,16 @@ static inline unsigned int __sel_hash(const struct xfrm_selector *sel,
 	const xfrm_address_t *saddr = &sel->saddr;
 	unsigned int h = 0;
 
+	/* A selector with a prefixlen of zero can basically be ignored in
+	 * the matching. To speed up the lookup, let's hash it without those
+	 * components. In the lookup, we'll do an additional check for a zero
+	 * daddr and a zero saddr.
+	 */
+	if (sel->prefixlen_d == 0)
+		dbits = 0;
+	if (sel->prefixlen_s == 0)
+		sbits = 0;
+
 	switch (family) {
 	case AF_INET:
 		if (sel->prefixlen_d < dbits ||
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 3110c3fbee20..7c2259f140d5 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1096,8 +1096,10 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
 	int err;
 	struct xfrm_policy *pol, *ret;
 	const xfrm_address_t *daddr, *saddr;
+	static const xfrm_address_t zero_addr = {0};
+
 	struct hlist_head *chain;
-	unsigned int sequence;
+	unsigned int sequence, first_sequence;
 	u32 priority;
 
 	daddr = xfrm_flowi_daddr(fl, family);
@@ -1112,6 +1114,7 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
 		chain = policy_hash_direct(net, daddr, saddr, family, dir);
 	} while (read_seqcount_retry(&xfrm_policy_hash_generation, sequence));
 
+	first_sequence = sequence;
 	priority = ~0U;
 	ret = NULL;
 	hlist_for_each_entry_rcu(pol, chain, bydst) {
@@ -1129,6 +1132,87 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
 			break;
 		}
 	}
+
+	/* XXX FB NOT UPSTREAM YET T12762593 */
+	/* Do an additional lookup for saddr == 0, since we stored source
+	 * selector with a prefix len of 0 that way in the bydst hash
+	 */
+	do {
+		sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
+		chain = policy_hash_direct(net, daddr, &zero_addr, family, dir);
+	} while (read_seqcount_retry(&xfrm_policy_hash_generation, sequence));
+
+	hlist_for_each_entry_rcu(pol, chain, bydst) {
+		if ((pol->priority >= priority) && ret)
+			break;
+
+		err = xfrm_policy_match(pol, fl, type, family, dir);
+		if (err) {
+			if (err == -ESRCH)
+				continue;
+			else {
+				ret = ERR_PTR(err);
+				goto fail;
+			}
+		} else {
+			ret = pol;
+			priority = ret->priority;
+			break;
+		}
+	}
+
+	/* Do an additional lookup for daddr == 0, since we stored dest
+	 * selector with a prefix len of 0 that way in the bydst hash
+	 */
+	do {
+		sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
+		chain = policy_hash_direct(net, &zero_addr, saddr, family, dir);
+	} while (read_seqcount_retry(&xfrm_policy_hash_generation, sequence));
+
+	hlist_for_each_entry_rcu(pol, chain, bydst) {
+		if ((pol->priority >= priority) && ret)
+			break;
+
+		err = xfrm_policy_match(pol, fl, type, family, dir);
+		if (err) {
+			if (err == -ESRCH)
+				continue;
+			else {
+				ret = ERR_PTR(err);
+				goto fail;
+			}
+		} else {
+			ret = pol;
+			priority = ret->priority;
+			break;
+		}
+	}
+
+	/* Do an additional lookup for both saddr and daddr == 0 */
+	do {
+		sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
+		chain = policy_hash_direct(net, &zero_addr, &zero_addr, family, dir);
+	} while (read_seqcount_retry(&xfrm_policy_hash_generation, sequence));
+
+	hlist_for_each_entry_rcu(pol, chain, bydst) {
+		if ((pol->priority >= priority) && ret)
+			break;
+
+		err = xfrm_policy_match(pol, fl, type, family, dir);
+		if (err) {
+			if (err == -ESRCH)
+				continue;
+			else {
+				ret = ERR_PTR(err);
+				goto fail;
+			}
+		} else {
+			ret = pol;
+			priority = ret->priority;
+			break;
+		}
+	}
+
 	chain = &net->xfrm.policy_inexact[dir];
 	hlist_for_each_entry_rcu(pol, chain, bydst) {
 		if ((pol->priority >= priority) && ret)
@@ -1148,7 +1232,7 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
 		}
 	}
 
-	if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence))
+	if (read_seqcount_retry(&xfrm_policy_hash_generation, first_sequence))
 		goto retry;
 
 	if (ret && !xfrm_pol_hold_rcu(ret))
-- 
2.18.0
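
A simplified model of the lookup order this patch adds, as a sketch under assumptions: IPv4 only, each selector component is either an exact /32 or a /0 wildcard, a flat array stands in for the bydst hash table, and the inexact list is not modelled. Policies are stored with their /0 components zeroed, then four keys are probed: (daddr, saddr), (daddr, 0), (0, saddr) and (0, 0). As in xfrm, the lower priority value wins. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct policy {
	uint32_t daddr, saddr;	/* stored with /0 components zeroed */
	int d_any, s_any;	/* 1 if that component has prefixlen 0 */
	uint32_t priority;	/* lower value wins */
};

/* Stand-in for policy_hash_direct() + xfrm_policy_match(): scan entries
 * stored under key (d, s) and keep the best one matching the packet. */
static const struct policy *bucket_best(const struct policy *tbl, size_t n,
					uint32_t d, uint32_t s,
					uint32_t pkt_d, uint32_t pkt_s,
					const struct policy *best)
{
	for (size_t i = 0; i < n; i++) {
		const struct policy *p = &tbl[i];

		if (p->daddr != d || p->saddr != s)
			continue;	/* stored under a different key */
		if ((!p->d_any && p->daddr != pkt_d) ||
		    (!p->s_any && p->saddr != pkt_s))
			continue;	/* selector does not match */
		if (!best || p->priority < best->priority)
			best = p;
	}
	return best;
}

static const struct policy *lookup(const struct policy *tbl, size_t n,
				   uint32_t pkt_d, uint32_t pkt_s)
{
	const struct policy *best = NULL;

	best = bucket_best(tbl, n, pkt_d, pkt_s, pkt_d, pkt_s, best); /* exact */
	best = bucket_best(tbl, n, pkt_d, 0, pkt_d, pkt_s, best);     /* saddr /0 */
	best = bucket_best(tbl, n, 0, pkt_s, pkt_d, pkt_s, best);     /* daddr /0 */
	best = bucket_best(tbl, n, 0, 0, pkt_d, pkt_s, best);         /* both /0 */
	return best;
}
```

Each probe touches one bucket instead of walking every non-/32 policy, which is where the softirq savings reported above would come from.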

* Re: [pull request][net-next 00/10] Mellanox, mlx5 and devlink updates 2018-07-31
From: Alexander Duyck @ 2018-08-30 15:39 UTC
  To: valex
  Cc: Erez Shitrit, Saeed Mahameed, Saeed Mahameed, David Miller,
	Netdev, Jiri Pirko, Jakub Kicinski, Bjorn Helgaas, linux-pci
In-Reply-To: <5206dd74-432d-3342-2a48-3cdd1be8b5cb@mellanox.com>

I'm dropping all the old comments since the conversation was flattened
and only has one level of marks for everything.

On Thu, Aug 30, 2018 at 7:43 AM Alex Vesker <valex@mellanox.com> wrote:

<snip>

> To which devlink interfaces are you referring?

All of them. Not just the ones in this patch. If you are exposing an
interface to the user you should have documentation for it somewhere.
You should probably look at adding a patch to make certain you have
all the existing devlink interfaces in the driver documented.

I would like to see something added to the documentation folder that
explains what all the DEVLINK_PARAM_GENERIC interfaces are expected to
do, and maybe why I would use them. Then in addition I would like to
see per-driver documentation added for the DEVLINK_PARAM_DRIVER calls.
So for example I can't find any documentation in the kernel on what
enable_64b_cqe_eqe or enable_4k_uar do in mlx4 or why I would need
them, but you have them exposed as interfaces to userspace.

> There are 3 patches here that provide the crdump capability,
> these are the patches I would like to resubmit.
>
> net/mlx5: Add Vendor Specific Capability access gateway:
>     This is needed to read from the VSC by only the driver to collect a dump

You should probably work with the linux-pci mailing list on this bit
since you are exposing a new capability and they can probably point
you in the direction of how they want to deal with any potential races
in terms of access to the device versus your capability which you are
adding support for dumping via devlink.

> net/mlx5: Add Crdump FW snapshot support
>     This is code that collects the dump and registers a region called crdump
> net/mlx5: Use devlink region_snapshot parameter
>     Here I use an already implemented global param that specifies whether
>     snapshots are supported.
>
> The devlink region feature is well documented.

Where?

> Could it be that you are referring to the devlink region called "crdump" which mlx5 exposes?

I don't care about the internals. I care about user available
documentation for the interface that is exposed. How do you expect the
user to use this functionality? That is what I want documented.

<snip>

> Would it be sufficient to prevent setpci access using pci_cfg_access_lock
> ("any userspace reads or writes to config space and concurrent lock requests will sleep")?
> Otherwise, do you have a different solution?

That sounds like a step in the right direction, but that is something
you should work with the linux-pci list on. My main concern is that I
don't want us being able to come at this interface from multiple
directions and screw things up.

* Re: [PATCH net 1/2] selftests: pmtu: maximum MTU for vti4 is 2^16-1-20
From: Nicolas Dichtel @ 2018-08-30 15:41 UTC
  To: Sabrina Dubroca, netdev; +Cc: Stefano Brivio
In-Reply-To: <1e62875c4c72b38b17f6c73f9654696b14fb3166.1535636302.git.sd@queasysnail.net>

On 30/08/2018 at 16:01, Sabrina Dubroca wrote:
> Since commit 82612de1c98e ("ip_tunnel: restore binding to ifaces with a
> large mtu"), the maximum MTU for vti4 is based on IP_MAX_MTU instead of
> the mysterious constant 0xFFF8.  This makes this selftest fail.
> 
> Fixes: 82612de1c98e ("ip_tunnel: restore binding to ifaces with a large mtu")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> Acked-by: Stefano Brivio <sbrivio@redhat.com>

Thanks for fixing this.

Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
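
The arithmetic behind the subject line, assuming IP_MAX_MTU is 0xFFFF and that the 20 bytes are the outer IPv4 header vti4 adds (no other encapsulation overhead). For comparison, the old constant 0xFFF8 is 65528.

```latex
\mathrm{MTU}^{\max}_{\mathrm{vti4}}
  = \underbrace{(2^{16} - 1)}_{\texttt{IP\_MAX\_MTU} = 65535}
  - \underbrace{20}_{\text{outer IPv4 header}}
  = 65515
```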

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Nicolas Dichtel @ 2018-08-30 15:49 UTC
  To: Christian Brauner, Kirill Tkhai
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc
In-Reply-To: <20180830144544.tpross4jd6awou4u@gmail.com>

On 30/08/2018 at 16:45, Christian Brauner wrote:
[snip]
> Introducing the IFA_IF_NETNSID property will not make the netlink
> interface less modular. It is a clean, RTM_*ADDR-request specific
> property using network namespace identifiers which we discussed in prior
> patches are the way to go forward.
> 
> You can already get interfaces via GETLINK from network namespaces
> other than the one you reside in (which we enabled just a few months
> back), but you can't do the same for GETADDR. Those two are almost
> always used together: when you want to get the links, you usually also
> want to get the addresses associated with them right after.
> In a prior discussion we agreed that network namespace identifiers are
> the way forward, but that any other property, i.e. PIDs and fds,
> should never be ported into other parts of the codebase, and that is
> indeed something I agree with.
Yes, I agree with this, and I think this series goes in the right direction.

I would perhaps choose a more generic name for the attribute, something that
could also be used in other netlink families (xfrm, netfilter, ...).
What about IFA_TARGET_NSID?
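
Whatever the attribute ends up being called, on the wire it is an ordinary netlink TLV: a 4-byte header (16-bit length covering header plus payload, 16-bit type) followed by the payload, padded to a 4-byte boundary. A sketch of encoding a signed 32-bit netns id that way; the attribute number HYP_IFA_TARGET_NSID is a hypothetical placeholder, not the value from any uapi header, and nla_hdr is a local mirror of struct nlattr.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct nlattr layout from <linux/netlink.h>. */
struct nla_hdr {
	uint16_t nla_len;	/* header + payload, excluding padding */
	uint16_t nla_type;
};

#define NLA_ALIGN4(len) (((len) + 3) & ~(size_t)3)

/* Hypothetical attribute number, for illustration only. */
#define HYP_IFA_TARGET_NSID 10

/* Append an s32 attribute at *off; returns 0, or -1 if it does not fit. */
static int put_s32_attr(uint8_t *buf, size_t cap, size_t *off,
			uint16_t type, int32_t value)
{
	struct nla_hdr hdr;
	size_t len = sizeof(hdr) + sizeof(value);
	size_t total = NLA_ALIGN4(len);

	if (*off + total > cap)
		return -1;
	hdr.nla_len = (uint16_t)len;
	hdr.nla_type = type;
	memcpy(buf + *off, &hdr, sizeof(hdr));
	memcpy(buf + *off + sizeof(hdr), &value, sizeof(value));
	memset(buf + *off + len, 0, total - len);	/* zero the padding */
	*off += total;
	return 0;
}
```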

* [PATCH net] ipv6: don't get lwtstate twice in ip6_rt_copy_init()
From: Alexey Kodanev @ 2018-08-30 16:11 UTC
  To: netdev; +Cc: David Ahern, David Miller, Alexey Kodanev

Commit 80f1a0f4e0cd ("net/ipv6: Put lwtstate when destroying fib6_info")
only partially fixed the kmemleak [1]: lwtstate can be copied from
fib6_info with ip6_rt_copy_init(), and that should be done only once there.

rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
ip6_rt_copy_init(), so there is no need to get it again at the end.

With this patch, lwtstate also isn't copied from RTF_REJECT routes.

[1]:
unreferenced object 0xffff880b6aaa14e0 (size 64):
  comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
  hex dump (first 32 bytes):
    01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
    [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
    [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
    [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
    [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
    [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
    [<000000004c893c76>] netlink_unicast+0x417/0x5a0
    [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
    [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
    [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
    [<000000001e7426c8>] __sys_sendmsg+0xde/0x170
    [<00000000fe411443>] do_syscall_64+0x9f/0x4a0
    [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000006d21f353>] 0xffffffffffffffff

Fixes: 6edb3c96a5f0 ("net/ipv6: Defer initialization of dst to data path")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/ipv6/route.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8e08a91..9f27ada 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -996,7 +996,6 @@ static void ip6_rt_copy_init(struct rt6_info *rt, struct fib6_info *ort)
 	rt->rt6i_src = ort->fib6_src;
 #endif
 	rt->rt6i_prefsrc = ort->fib6_prefsrc;
-	rt->dst.lwtstate = lwtstate_get(ort->fib6_nh.nh_lwtstate);
 }
 
 static struct fib6_node* fib6_backtrack(struct fib6_node *fn,
-- 
1.8.3.1
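
Why a second lwtstate_get() leaks, reduced to a toy refcount. All names here are hypothetical stand-ins, not kernel APIs: the buggy path takes two references but stores a single pointer, so the one put done at dst teardown leaves an extra reference behind, and the object is never freed.

```c
#include <stdlib.h>

/* Toy stand-in for struct lwtunnel_state; only the refcount matters. */
struct toy_lwt {
	int refcnt;
};

static struct toy_lwt *toy_lwt_get(struct toy_lwt *s)
{
	if (s)
		s->refcnt++;
	return s;
}

static void toy_lwt_put(struct toy_lwt *s)
{
	if (s && --s->refcnt == 0)
		free(s);
}

struct toy_dst {
	struct toy_lwt *lwtstate;
};

/* Pre-patch shape: a reference is taken at init (like ip6_rt_init_dst())
 * and again at the end, overwriting the same pointer. */
static void copy_init_buggy(struct toy_dst *dst, struct toy_lwt *src)
{
	dst->lwtstate = toy_lwt_get(src);
	dst->lwtstate = toy_lwt_get(src);	/* the line the patch removes */
}

static void copy_init_fixed(struct toy_dst *dst, struct toy_lwt *src)
{
	dst->lwtstate = toy_lwt_get(src);
}
```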

* Re: [PATCH net-next 1/3] net: rework SIOCGSTAMP ioctl handling
From: Willem de Bruijn @ 2018-08-30 20:09 UTC
  To: Arnd Bergmann
  Cc: Network Development, David Miller, linux-arch, y2038 Mailman List,
	Eric Dumazet, Willem de Bruijn, LKML, linux-hams, linux-bluetooth,
	linux-can, dccp, linux-wpan, linux-sctp, linux-x25
In-Reply-To: <20180829130308.3504560-1-arnd@arndb.de>

On Wed, Aug 29, 2018 at 9:05 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many
> socket protocol handlers, and all of those end up calling the same
> sock_get_timestamp()/sock_get_timestampns() helper functions, which
> results in a lot of duplicate code.
>
> With the introduction of 64-bit time_t on 32-bit architectures, this
> gets worse, as we then need four different ioctl commands in each
> socket protocol implementation.
>
> To simplify that, let's add a new .gettstamp() operation in
> struct proto_ops, and move ioctl implementation into the common
> sock_ioctl()/compat_sock_ioctl_trans() functions that these all go
> through.
>
> We can reuse the sock_get_timestamp() implementation, but generalize
> it so it can deal with both native and compat mode, as well as
> timeval and timespec structures.
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

This will also simplify fixing a recently reported race condition with
sock_get_timestamp [1]. That calls sock_enable_timestamp, which
modifies sk->sk_flags without taking the socket lock. Currently some
callers of sock_get_timestamp hold the lock (ax25, netrom, qrtr); many
don't. See also how this patch removes the lock_sock in the netrom
case. Moving the call to sock_gettstamp outside the protocol handlers
will allow taking the lock inside the function.

If this is the only valid implementation of .gettstamp, the indirect
call could be avoided in favor of a simple branch.
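
A sketch of both ideas with hypothetical toy types, not the series' actual code: one common helper serves the timeval (microsecond) and timespec (nanosecond) flavours, and the generic ioctl path compares the hook against that helper so the usual case is a direct call behind a predictable branch. -ENOTTY stands in for "no handler".

```c
#include <errno.h>
#include <stddef.h>

struct toy_sock;

struct toy_ts {
	long long sec;
	long long frac;		/* microseconds (timeval) or nanoseconds (timespec) */
};

/* Hypothetical cut-down ops table with a .gettstamp hook. */
struct toy_proto_ops {
	int (*gettstamp)(struct toy_sock *sk, struct toy_ts *out, int timeval);
};

struct toy_sock {
	const struct toy_proto_ops *ops;
	long long stamp_ns;	/* last packet timestamp in nanoseconds */
};

/* Common helper: one body for both SIOCGSTAMP and SIOCGSTAMPNS flavours. */
static int toy_sock_gettstamp(struct toy_sock *sk, struct toy_ts *out, int timeval)
{
	out->sec = sk->stamp_ns / 1000000000LL;
	out->frac = sk->stamp_ns % 1000000000LL;
	if (timeval)
		out->frac /= 1000;
	return 0;
}

/* Generic path: call the common helper directly when it is the hook,
 * and fall back to the indirect call only for other implementations. */
static int toy_ioctl_gettstamp(struct toy_sock *sk, struct toy_ts *out, int timeval)
{
	if (!sk->ops->gettstamp)
		return -ENOTTY;
	if (sk->ops->gettstamp == toy_sock_gettstamp)
		return toy_sock_gettstamp(sk, out, timeval);
	return sk->ops->gettstamp(sk, out, timeval);
}
```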

Thanks,

Acked-by: Willem de Bruijn <willemb@google.com>

[1] http://lkml.kernel.org/r/20180518080308.GA28587@dragonet.kaist.ac.kr

* Re: [PATCH net] ipv6: don't get lwtstate twice in ip6_rt_copy_init()
From: David Ahern @ 2018-08-30 16:10 UTC
  To: Alexey Kodanev, netdev; +Cc: David Miller
In-Reply-To: <1535645484-30629-1-git-send-email-alexey.kodanev@oracle.com>

On 8/30/18 10:11 AM, Alexey Kodanev wrote:
> Commit 80f1a0f4e0cd ("net/ipv6: Put lwtstate when destroying fib6_info")
> partially fixed the kmemleak [1], lwtstate can be copied from fib6_info,
> with ip6_rt_copy_init(), and it should be done only once there.
> 
> rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function
> ip6_rt_copy_init(), so there is no need to get it again at the end.
> 
> With this patch, lwtstate also isn't copied from RTF_REJECT routes.

Those should not have lwtstate set.

> 
> [1]:
> unreferenced object 0xffff880b6aaa14e0 (size 64):
>   comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s)
>   hex dump (first 32 bytes):
>     01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420
>     [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0
>     [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70
>     [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0
>     [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0
>     [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0
>     [<000000004c893c76>] netlink_unicast+0x417/0x5a0
>     [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30
>     [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0
>     [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950
>     [<000000001e7426c8>] __sys_sendmsg+0xde/0x170
>     [<00000000fe411443>] do_syscall_64+0x9f/0x4a0
>     [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>     [<000000006d21f353>] 0xffffffffffffffff

What test did you run to uncover this? Curious as to why my testing that
found the need for 80f1a0f4e0cd did not hit this.

> 
> Fixes: 6edb3c96a5f0 ("net/ipv6: Defer initialization of dst to data path")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
>  net/ipv6/route.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 8e08a91..9f27ada 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -996,7 +996,6 @@ static void ip6_rt_copy_init(struct rt6_info *rt, struct fib6_info *ort)
>  	rt->rt6i_src = ort->fib6_src;
>  #endif
>  	rt->rt6i_prefsrc = ort->fib6_prefsrc;
> -	rt->dst.lwtstate = lwtstate_get(ort->fib6_nh.nh_lwtstate);
>  }
>  
>  static struct fib6_node* fib6_backtrack(struct fib6_node *fn,
> 

Thanks for the patch.

Reviewed-by: David Ahern <dsahern@gmail.com>

* Re: [PATCH net-next 2/3] net: nixge: Add support for having nixge as subdevice
From: Moritz Fischer @ 2018-08-30 16:39 UTC
  To: Andrew Lunn
  Cc: David S. Miller, Kees Cook, Florian Fainelli,
	Linux Kernel Mailing List, netdev, Alex Williams
In-Reply-To: <20180830031110.GC16896@lunn.ch>

Hi Andrew,

On Wed, Aug 29, 2018 at 8:11 PM, Andrew Lunn <andrew@lunn.ch> wrote:

> Could you tell us more about the parent device. I'm guessing PCIe.  Is
> it x86 so no device tree? Are there cases where it does not have a PHY
> connected? What is connected instead? SFP? A switch? Can there be
> multiple PHYs on the MDIO bus?

The device is part of a larger FPGA design. One use case that I was trying
to support with this patch is PCIe with x86 (hopefully on its own PF...)
Since the whole design isn't completely done, these are the use cases I
see upcoming and current:

ARM(64):
a) DT: PHY over MDIO (current use case), fixed-link with GPIO (coming)
b) DT: SFP (potentially coming)

x86:
a) no PHY (coming)-> fixed-link with GPIO
b) SFP (potentially), PHY over MDIO (potentially)

Thanks for your help,

Moritz

* Re: [PATCH net-next 1/2] netlink: ipv4 IGMP join notifications
From: Patrick Ruddy @ 2018-08-30 16:44 UTC
  To: netdev; +Cc: roopa, jiri, stephen
In-Reply-To: <20180830093545.29465-2-pruddy@vyatta.att-mail.com>

I don't know what happened to the 0/2 cover letter for this series, so
here it is:

This patch is an update to https://patchwork.ozlabs.org/patch/571127/.
The previous patch was based on sending multicast MAC addresses in the
netlink messages to allow the programming of hardware. It was agreed to
rework this to use RTM_NEW/DELLINK messages, which are more appropriate
for layer 2 addresses.

In the interim it has become apparent that the applications actually
need to see the L3 multicast addresses that are joined for FORUS
processing, so this patch has been reworked to send the L3 multicast
addresses using RTM_NEW/DELADDR.

These new multicast L3 netlink notifications should use the
IFA_MULTICAST address type, but this has been dropped in favour of
IFA_ADDRESS because testing showed that some applications (notably
getaddrinfo in libc6) assume that there is an IFA_ADDRESS in an
RTM_NEW/DELADDR message and blindly dereference it.

Finally, RTM_GETADDR for both address families has been modified to
include the multicast L3 addresses.
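
Why the L3 group cannot be recovered from the multicast MAC: the IPv4 mapping (RFC 1112) keeps only the low 23 bits of the group address behind the 01:00:5e prefix, so 32 groups share each MAC. The kernel's ip_eth_mc_map() helper performs this mapping; the function below is a hypothetical userspace sketch taking the group in host byte order.

```c
#include <stdint.h>

/* Map an IPv4 multicast group (host byte order) to its Ethernet MAC:
 * 01:00:5e followed by the low 23 bits of the group address. */
static void ipv4_mc_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f;	/* bit 23 of the group is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}
```

For example, 239.1.2.3 and 224.129.2.3 both map to 01:00:5e:01:02:03, so a MAC-only notification cannot tell them apart.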

Patrick Ruddy (2):
  netlink: ipv4 IGMP join notifications
  netlink: ipv6 MLD join notifications

 include/linux/igmp.h |  2 +
 net/ipv4/devinet.c   | 39 +++++++++++++------
 net/ipv4/igmp.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/addrconf.c  | 44 ++++++++++++++++------
 net/ipv6/mcast.c     | 66 ++++++++++++++++++++++++++++++++
 5 files changed, 218 insertions(+), 23 deletions(-)

-- 
2.17.1

On Thu, 2018-08-30 at 10:35 +0100, Patrick Ruddy wrote:
> Some userspace applications need to know about IGMP joins from the kernel
> for 2 reasons
> 1. To allow the programming of multicast MAC filters in hardware
> 2. To form a multicast FORUS list for non link-local multicast
>    groups to be sent to the kernel and from there to the interested
>    party.
> (1) can be fulfilled by simply sending the hardware multicast MAC
> address to be programmed, but (2) requires the L3 address to be sent,
> since it cannot be constructed from the MAC address, whereas the
> reverse translation is a standard library function.
> 
> This commit provides addition and deletion of multicast addresses
> using the RTM_NEWADDR and RTM_DELADDR messages. It also provides
> the RTM_GETADDR extension to allow multicast join state to be read
> from the kernel.
> 
> Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com>
> ---
>  include/linux/igmp.h |  2 +
>  net/ipv4/devinet.c   | 39 +++++++++++++------
>  net/ipv4/igmp.c      | 90 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 120 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/igmp.h b/include/linux/igmp.h
> index 119f53941c12..1fb417865e7d 100644
> --- a/include/linux/igmp.h
> +++ b/include/linux/igmp.h
> @@ -130,6 +130,8 @@ extern void ip_mc_unmap(struct in_device *);
>  extern void ip_mc_remap(struct in_device *);
>  extern void ip_mc_dec_group(struct in_device *in_dev, __be32 addr);
>  extern void ip_mc_inc_group(struct in_device *in_dev, __be32 addr);
> +extern int ip_mc_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb,
> +			     struct net_device *dev);
>  int ip_mc_check_igmp(struct sk_buff *skb, struct sk_buff **skb_trimmed);
>  
>  #endif
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index ea4bd8a52422..42f7dcc4fb5e 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -57,6 +57,7 @@
>  #endif
>  #include <linux/kmod.h>
>  #include <linux/netconf.h>
> +#include <linux/igmp.h>
>  
>  #include <net/arp.h>
>  #include <net/ip.h>
> @@ -1651,6 +1652,7 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>  	int h, s_h;
>  	int idx, s_idx;
>  	int ip_idx, s_ip_idx;
> +	int multicast, mcast_idx;
>  	struct net_device *dev;
>  	struct in_device *in_dev;
>  	struct in_ifaddr *ifa;
> @@ -1659,6 +1661,8 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>  	s_h = cb->args[0];
>  	s_idx = idx = cb->args[1];
>  	s_ip_idx = ip_idx = cb->args[2];
> +	multicast = cb->args[3];
> +	mcast_idx = cb->args[4];
>  
>  	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
>  		idx = 0;
> @@ -1675,18 +1679,29 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>  			if (!in_dev)
>  				goto cont;
>  
> -			for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
> -			     ifa = ifa->ifa_next, ip_idx++) {
> -				if (ip_idx < s_ip_idx)
> -					continue;
> -				if (inet_fill_ifaddr(skb, ifa,
> -					     NETLINK_CB(cb->skb).portid,
> -					     cb->nlh->nlmsg_seq,
> -					     RTM_NEWADDR, NLM_F_MULTI) < 0) {
> -					rcu_read_unlock();
> -					goto done;
> +			if (!multicast) {
> +				for (ifa = in_dev->ifa_list, ip_idx = 0; ifa;
> +				     ifa = ifa->ifa_next, ip_idx++) {
> +					if (ip_idx < s_ip_idx)
> +						continue;
> +					if (inet_fill_ifaddr(skb, ifa,
> +							     NETLINK_CB(cb->skb).portid,
> +							     cb->nlh->nlmsg_seq,
> +							     RTM_NEWADDR,
> +							     NLM_F_MULTI) < 0) {
> +						rcu_read_unlock();
> +						goto done;
> +					}
> +					nl_dump_check_consistent(cb,
> +								 nlmsg_hdr(skb));
>  				}
> -				nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> +				/* set for multicast loop */
> +				multicast++;
> +			}
> +			/* loop over multicast addresses */
> +			if (ip_mc_dump_ifaddr(skb, cb, dev) < 0) {
> +				rcu_read_unlock();
> +				goto done;
>  			}
>  cont:
>  			idx++;
> @@ -1698,6 +1713,8 @@ static int inet_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
>  	cb->args[0] = h;
>  	cb->args[1] = idx;
>  	cb->args[2] = ip_idx;
> +	cb->args[3] = multicast;
> +	cb->args[4] = mcast_idx;
>  
>  	return skb->len;
>  }
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index cf75f8944b05..c9bbd1d27124 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -86,6 +86,7 @@
>  #include <linux/inetdevice.h>
>  #include <linux/igmp.h>
>  #include <linux/if_arp.h>
> +#include <net/netlink.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/times.h>
>  #include <linux/pkt_sched.h>
> @@ -1384,6 +1385,91 @@ static void ip_mc_hash_remove(struct in_device *in_dev,
>  }
>  
>  
> +static int fill_addr(struct sk_buff *skb, struct net_device *dev, __be32 addr,
> +		     int type, unsigned int flags)
> +{
> +	struct nlmsghdr *nlh;
> +	struct ifaddrmsg *ifm;
> +
> +	nlh = nlmsg_put(skb, 0, 0, type, sizeof(*ifm), flags);
> +	if (!nlh)
> +		return -EMSGSIZE;
> +
> +	ifm = nlmsg_data(nlh);
> +	ifm->ifa_family = AF_INET;
> +	ifm->ifa_prefixlen = 32;
> +	ifm->ifa_flags = IFA_F_PERMANENT;
> +	ifm->ifa_scope = RT_SCOPE_LINK;
> +	ifm->ifa_index = dev->ifindex;
> +
> +	if (nla_put_in_addr(skb, IFA_ADDRESS, addr))
> +		goto nla_put_failure;
> +	nlmsg_end(skb, nlh);
> +	return 0;
> +
> +nla_put_failure:
> +	nlmsg_cancel(skb, nlh);
> +	return -EMSGSIZE;
> +}
> +
> +static inline size_t addr_nlmsg_size(void)
> +{
> +	return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
> +		+ nla_total_size(sizeof(__be32));
> +}
> +
> +static void ip_mc_addr_notify(struct net_device *dev, __be32 addr, int type)
> +{
> +	struct net *net = dev_net(dev);
> +	struct sk_buff *skb;
> +	int err = -ENOBUFS;
> +
> +	skb = nlmsg_new(addr_nlmsg_size(), GFP_ATOMIC);
> +	if (!skb)
> +		goto errout;
> +
> +	err = fill_addr(skb, dev, addr, type, 0);
> +	if (err < 0) {
> +		WARN_ON(err == -EMSGSIZE);
> +		kfree_skb(skb);
> +		goto errout;
> +	}
> +	rtnl_notify(skb, net, 0, RTNLGRP_IPV4_IFADDR, NULL, GFP_ATOMIC);
> +	return;
> +errout:
> +	if (err < 0)
> +		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
> +}
> +
> +int ip_mc_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb,
> +		      struct net_device *dev)
> +{
> +	int s_idx;
> +	int idx = 0;
> +	struct ip_mc_list *im;
> +	struct in_device *in_dev;
> +
> +	ASSERT_RTNL();
> +
> +	s_idx = cb->args[4];
> +	in_dev = __in_dev_get_rtnl(dev);
> +
> +	for_each_pmc_rtnl(in_dev, im) {
> +		if (idx < s_idx)
> +			continue;
> +		if (fill_addr(skb, dev, im->multiaddr, RTM_NEWADDR,
> +			      NLM_F_MULTI) < 0)
> +			goto done;
> +		nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> +		idx++;
> +	}
> +
> + done:
> +	cb->args[4] = idx;
> +
> +	return skb->len;
> +}
> +
>  /*
>   *	A socket has joined a multicast group on device dev.
>   */
> @@ -1433,6 +1519,8 @@ static void __ip_mc_inc_group(struct in_device *in_dev, __be32 addr,
>  	igmpv3_del_delrec(in_dev, im);
>  #endif
>  	igmp_group_added(im);
> +
> +	ip_mc_addr_notify(in_dev->dev, addr, RTM_NEWADDR);
>  	if (!in_dev->dead)
>  		ip_rt_multicast_event(in_dev);
>  out:
> @@ -1664,6 +1752,8 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
>  				in_dev->mc_count--;
>  				igmp_group_dropped(i);
>  				ip_mc_clear_src(i);
> +				ip_mc_addr_notify(in_dev->dev, addr,
> +						  RTM_DELADDR);
>  
>  				if (!in_dev->dead)
>  					ip_rt_multicast_event(in_dev);
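The dump changes above store the resume position for the multicast loop in cb->args[4]. A resumable netlink dump generally needs its position counter to advance for every entry, including the ones skipped because an earlier call already sent them. In ip_mc_dump_ifaddr() as quoted, the `continue` path appears to bypass `idx++`, so on a resumed dump (s_idx > 0) every entry would still look like it precedes s_idx. The following userspace simulation of the skip-and-resume pattern is only a sketch: dump_entries() and its cursor are hypothetical stand-ins for the dump callback and cb->args[].

```c
#include <stddef.h>

/* Hypothetical stand-in for a netlink dump callback: copy entries into an
 * output buffer of limited capacity, recording in *cursor where the next
 * call should resume (as cb->args[] does for a kernel dump). */
static size_t dump_entries(const int *entries, size_t n, size_t *cursor,
			   int *out, size_t cap)
{
	size_t idx, written = 0;

	/* idx advances in the loop header, so skipped entries are counted. */
	for (idx = 0; idx < n; idx++) {
		if (idx < *cursor)
			continue;	/* already sent in an earlier call */
		if (written == cap)
			break;		/* buffer full: resume here next time */
		out[written++] = entries[idx];
	}
	*cursor = idx;
	return written;
}
```

Called repeatedly with a two-entry buffer over five entries, this returns 2, 2, 1 and then 0, which is the behaviour userspace expects from NLM_F_MULTI dumps.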

^ permalink raw reply

* Re: [PATCH 2/4] r8169: Get and enable optional ether_clk clock
From: Stephen Boyd @ 2018-08-30 16:48 UTC (permalink / raw)
  To: David S . Miller, Andy Shevchenko, Hans de Goede, Heiner Kallweit,
	Irina Tirdea, Michael Turquette
  Cc: netdev, Johannes Stezenbach, Carlo Caione, linux-clk
In-Reply-To: <8a424470-9c57-9b95-9d41-3ea51d3f2629@redhat.com>

Quoting Hans de Goede (2018-08-29 10:09:57)
> Hi,
> 
> On 27-08-18 21:14, Stephen Boyd wrote:
> > Quoting Hans de Goede (2018-08-27 11:53:19)
> >> On 27-08-18 20:47, Stephen Boyd wrote:
> >>> How would you know that a clk device driver hasn't probed yet and isn't
> >>> the driver that's actually providing the clk to this device on x86
> >>> systems? With DT systems we can figure that out by looking at the DT and
> >>> seeing if the device driver requesting the clk has the clocks property.
> >>> On x86 systems it's all clkdev which doesn't really lend itself to
> >>> solving this problem.
> >>
> >> Right on x86 the assumption is that the clk driver will be builtin and
> >> will probe before the consumer. In this case that is true as the
> >> pmc-atom-clk driver can only be builtin and its platform device is
> >> instantiated from the acpi_lpss code and acpi init happens before
> >> the PCI bus is scanned.
> > 
> > If we can go with this assumption then we can make the optional clk API
> > work even on clkdev based systems. Maybe if x86 had some way of
> > indicating that all builtin clks are registered?
> 
> Unfortunately there is no such thing I'm afraid.

Ugh!

> 
> > That might work but
> > it's not very clean. Or if we could check to see if we're running on an
> > ACPI based system in clkdev we could use that to assume that clk_get()
> > will only be called after all providers have registered their lookups.
> 
> Yes, some check for x86 + ACPI (ARM also uses ACPI, but there we
> should not do this AFAICT) is probably best. That or not use the
> new optional clk API on x86, but that means that any cross platform
> driver cannot use it, which would be a pain.

Right. The optional clk API will not be so great until we can get ACPI
to move away from clkdev.

> 
> BTW does your Acked-by indicate you are ok with merging this series
> through the netdev tree as I suggested in the cover-letter? If so
> can I also add your Acked-by to the 3rd patch?
> 

Yep, I thought I did that but now I've really done it.

^ permalink raw reply

* Re: KMSAN: uninit-value in rds_bind
From: Santosh Shilimkar @ 2018-08-30 20:56 UTC (permalink / raw)
  To: syzbot, linux-rdma, netdev, syzkaller-bugs; +Cc: davem, linux-kernel
In-Reply-To: <000000000000010a4d0574ab4743@google.com>

On 8/30/2018 11:31 AM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    2dca2cbde67a kmsan: fix build warnings with CONFIG_KMSAN=n
> git tree:       https://github.com/google/kmsan.git/master

BTW, can you please fix your git URL, since this one doesn't work?
This tree is not the vanilla kernel.org tree (4.19.0-rc1+ #36), so it
would be good to get the line numbers correct for the sources.

Regards,
Santosh

^ permalink raw reply

* Re: [PATCH 1/2] dt-bindings: net: cpsw: Document cpsw-phy-sel usage but prefer phandle
From: Grygorii Strashko @ 2018-08-30 17:04 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: David Miller, netdev, linux-omap, devicetree, Andrew Lunn,
	Ivan Khoronzhuk, Mark Rutland, Murali Karicheri, Rob Herring
In-Reply-To: <20180830004745.GU7523@atomide.com>



On 08/29/2018 07:47 PM, Tony Lindgren wrote:
> * Grygorii Strashko <grygorii.strashko@ti.com> [180830 00:12]:
>> Hi Tony,
>>
>> On 08/29/2018 10:00 AM, Tony Lindgren wrote:
>>> The current cpsw usage for cpsw-phy-sel is undocumented but is used for
>>> all the boards using cpsw. And cpsw-phy-sel is not really a child of
>>> the cpsw device, it lives in the system control module instead.
>>>
>>> Let's document the existing usage, and improve it a bit where we prefer
>>> to use a phandle instead of a child device for it. That way we can
>>> properly describe the hardware in dts files for things like genpd.
>>
>> I'm ok with this series, but I really don't like cpsw-phy-sel in general.
> 
> Yeah this binding predates any standards. This series
> only fixes the nasty issue of cpsw claiming a module as a
> child that's outside its IO range.
> 
>> It was introduced long time back and now I'm thinking about possibility to replace it with
>> one of current generic interfaces - for example mux-controller.
>> Each port will control up to 3 muxes (port mode, idmode and rmii_ext_clk) and
>> transform phy-mode => mux states.
>> What do you think?
> 
> Sure a mux-controller here makes sense.
> 
>> Another option is to use phy, but it'd be complicated.
> 
> For the port muxes, how about a phy driver just using
> a pinctrl driver?
> 
> In general, it seems cpsw is just an interconnect instance
> (L4_FAST) with a control module (CPSW_WR) and a pile of
> independent other modules. That's described nicely in
> am437x TRM chapter "2.1.4 L4 Fast Peripheral Memory Map".
> So from that point of view the binding reg entries right
> now are all wrong :)

The TRM is not consistent: for am5 it's one MMIO region.

> 
> In the long run cpsw should be really treated as an
> interconnect instance with its control module providing
> standard Linux framework services such as clock /
> regulator / phy / pinctrl / iio whatever for the other
> modules.
> 
> Just my 2c based on looking at the interconnect, I'm
> not too familiar with cpsw otherwise.

They are not separate modules. This is a composite module which has only
one fck/ick, and most of the blocks can't even function without each other.
The above might be the case for Keystone 2, but not for the OMAP CPSW.
Keystone 2 has a packet processor, security accelerator and queue manager
in addition to its basic switch block.

-- 
regards,
-grygorii

^ permalink raw reply

* RE: [PATCH] i40e: mark expected switch fall-through
From: Kirsher, Jeffrey T @ 2018-08-30 21:09 UTC (permalink / raw)
  To: Gustavo A. R. Silva, David S. Miller
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20180830185019.GA30328@embeddedor.com>

> -----Original Message-----
> From: Gustavo A. R. Silva [mailto:gustavo@embeddedor.com]
> Sent: Thursday, August 30, 2018 11:50
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; David S. Miller
> <davem@davemloft.net>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Gustavo A. R. Silva <gustavo@embeddedor.com>
> Subject: [PATCH] i40e: mark expected switch fall-through
> 
> In preparation for enabling -Wimplicit-fallthrough, mark switch cases where
> we are expecting to fall through.
> 
> Addresses-Coverity-ID: 1473099 ("Missing break in switch")
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

I have picked this up, Dave.
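The i40e hunk itself is not shown above, so the following is only a generic sketch of what "marking an expected fall-through" means. GCC's -Wimplicit-fallthrough (at its default level) accepts a comment such as `/* fall through */` placed immediately before the next case label as a statement that the fall-through is intentional. The function count_setup_steps() is hypothetical.

```c
/* Hypothetical example: each level performs its own step plus all the
 * steps of the lower levels, so falling through is intended. Builds
 * without warnings under gcc -Wimplicit-fallthrough. */
static int count_setup_steps(int level)
{
	int steps = 0;

	switch (level) {
	case 2:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 0:
		steps++;
		break;
	default:
		steps = -1;
		break;
	}
	return steps;
}
```

Without the comments the code behaves the same, but the compiler would warn at each case label that the preceding case falls through.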

^ permalink raw reply

* Re: [PATCH net-next 1/3] net: nixge: Add support for fixed-link subnodes
From: Moritz Fischer @ 2018-08-30 17:21 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S. Miller, Kees Cook, Florian Fainelli,
	Linux Kernel Mailing List, netdev, Alex Williams
In-Reply-To: <20180830030420.GB16896@lunn.ch>

Hi Andrew,

On Wed, Aug 29, 2018 at 8:04 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> On Wed, Aug 29, 2018 at 05:40:44PM -0700, Moritz Fischer wrote:
>> Add support for fixed-link cases where no MDIO is
>> actually required to run the device.
>> In that case no MDIO bus is instantiated since the
>> actual registers are not available in hardware.
>
> Hi Moritz
>
> There are a few different use cases here:
>
> The hardware is missing MDIO - You need fixed-link.

Agreed.
>
> The hardware has MDIO, but you don't have a PHY connected on it, and
> use fixed link.

Since it's an FPGA design in that case we'd probably build the hardware without
MDIO to save resources.

> The hardware has MDIO, and it is used e.g. for an Ethernet switch, or
> a PHY for another Ethernet interface. Plus you need fixed link.
We haven't had that yet but I can see that happen.
>
> The binding typically looks like:
>
> &fec1 {
>         phy-mode = "rmii";
>         pinctrl-names = "default";
>         pinctrl-0 = <&pinctrl_fec1>;
>         status = "okay";
>
>         fixed-link {
>                 speed = <100>;
>                 full-duplex;
>         };
>
>         mdio1: mdio {
>                 #address-cells = <1>;
>                 #size-cells = <0>;
>                 status = "okay";
>
>                 switch0: switch0@0 {
>                         compatible = "marvell,mv88e6085";
>                         pinctrl-names = "default";
>                         pinctrl-0 = <&pinctrl_switch>;
>                         reg = <0>;
>                         eeprom-length = <512>;
>                         interrupt-parent = <&gpio3>;
>
> It is important you have the mdio subnode, with PHYs and switches as
> children. The driver currently gets this wrong, it uses
> pdev->dev.of_node.

Oh, whoops. Yeah I should look into that. Any good examples of drivers doing
it right? Is the one going with the DT snippet above a good example?
>
> So the first patch should be to extend this behaviour. Look for a
> child node called mdio. If it exists, call nixge_mdio_setup() passing
> that child. Otherwise continue using pdev->dev.of_node, so you don't
> break backwards compatibility.

Ok will do.
>
> Then a patch adding support for fixed-link. If the mdio child node
> exists, you still need to register the MDIO bus. If there is no child
> node, but there is a fixed-link, skip registering the mdio bus with
> pdev->dev.of_node.
>
>         Andrew

Thanks for your feedback, much appreciated!

Moritz
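Andrew's suggested ordering for MDIO bus registration can be summarised as a three-way decision. The sketch below models it in plain C, assuming the two device-tree facts have already been looked up; in a real driver they would likely come from helpers such as of_get_child_by_name(np, "mdio") and of_phy_is_fixed_link(np). The struct dt_info, enum mdio_plan and choose_mdio_plan() names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of the relevant DT facts for the MAC node. */
struct dt_info {
	bool has_mdio_child;	/* an "mdio" child node exists */
	bool has_fixed_link;	/* a fixed-link subnode exists */
};

enum mdio_plan {
	MDIO_ON_CHILD_NODE,	/* register the MDIO bus using the "mdio" child */
	MDIO_NONE,		/* fixed-link only: no MDIO bus at all */
	MDIO_ON_DEVICE_NODE,	/* legacy binding: keep using pdev->dev.of_node */
};

static enum mdio_plan choose_mdio_plan(const struct dt_info *dt)
{
	/* An mdio child always gets a bus, even alongside fixed-link,
	 * e.g. when it carries a switch or another interface's PHY. */
	if (dt->has_mdio_child)
		return MDIO_ON_CHILD_NODE;
	/* No mdio child but fixed-link: skip bus registration. */
	if (dt->has_fixed_link)
		return MDIO_NONE;
	/* Neither: preserve backwards compatibility. */
	return MDIO_ON_DEVICE_NODE;
}
```

Keeping the legacy branch last means existing device trees without an mdio child continue to work unchanged.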

^ permalink raw reply

* Re: [PATCH v2 iproute2-next 0/3] support delivering packets in
From: David Ahern @ 2018-08-30 18:10 UTC (permalink / raw)
  To: Yousuk Seung, netdev; +Cc: Stephen Hemminger, Michael McLennan, Priyaranjan Jha
In-Reply-To: <20180827024230.246445-1-ysseung@google.com>

On 8/26/18 8:42 PM, Yousuk Seung wrote:
> This series adds support for the new "slot" netem parameter for
> slotting. Slotting is an approximation of shared media that gather up
> packets within a varying delay window before delivering them nearly at
> once.
> 
> Dave Taht (2):
>   tc: support conversions to or from 64 bit nanosecond-based time
>   q_netem: support delivering packets in delayed time slots
> 
> Yousuk Seung (1):
>   q_netem: slotting with non-uniform distribution
> 
>  include/utils.h     |  12 +++++
>  lib/utils.c         | 104 +++++++++++++++++++++++++++++++++++++++
>  man/man8/tc-netem.8 |  40 ++++++++++++++-
>  tc/q_netem.c        | 115 +++++++++++++++++++++++++++++++++++++++++++-
>  tc/tc_cbq.c         |   1 +
>  tc/tc_core.c        |   1 +
>  tc/tc_core.h        |   2 -
>  tc/tc_estimator.c   |   1 +
>  tc/tc_util.c        |  46 ------------------
>  tc/tc_util.h        |   3 --
>  10 files changed, 272 insertions(+), 53 deletions(-)
> 

applied to iproute2-next after fixing up a whitespace issue and 2
checkpatch errors in patch 2.
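To illustrate what the cover letter means by gathering packets and delivering them nearly at once, here is a toy model using 64-bit nanosecond times. It assumes fixed-length slots starting at t = 0 and ignores netem's randomised slot length and any per-slot packet or byte limits, so it is a simplification, not the q_netem algorithm. The function slot_release_times() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy slotting model: each packet is held until the end of the fixed-length
 * slot in which it arrived, so packets arriving within one slot are released
 * together. All times are in nanoseconds. */
static void slot_release_times(const uint64_t *arrival, size_t n,
			       uint64_t slot_ns, uint64_t *release)
{
	for (size_t i = 0; i < n; i++)
		release[i] = (arrival[i] / slot_ns + 1) * slot_ns;
}
```

With 500 us slots, packets arriving at 0, 100 us and 250 us are all released at 500 us, while one arriving at 900 us waits until 1 ms, which produces the bursty delivery that slotting is meant to emulate.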

^ permalink raw reply

* Re: [PATCH v4] 9p: Add refcount to p9_req_t
From: Tomas Bortoli @ 2018-08-30 22:20 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov
  Cc: v9fs-developer, netdev, linux-kernel, syzkaller,
	Dominique Martinet
In-Reply-To: <1535626341-20693-1-git-send-email-asmadeus@codewreck.org>

On 08/30/2018 12:52 PM, Dominique Martinet wrote:
> From: Tomas Bortoli <tomasbortoli@gmail.com>
> 
> To avoid use-after-free(s), use a refcount to keep track of the
> usable references to any instantiated struct p9_req_t.
> 
> This commit adds p9_req_put(), p9_req_get() and p9_req_try_get() as
> wrappers to kref_put(), kref_get() and kref_get_unless_zero().
> These are used by the client and the transports to keep track of
> valid requests' references.
> 
> p9_free_req() is added back and used as callback by kref_put().
> 
> Add SLAB_TYPESAFE_BY_RCU as it ensures that the memory freed by
> kmem_cache_free() will not be reused for another type until the rcu
> synchronisation period is over, so an address gotten under rcu read
> lock is safe to inc_ref() without corrupting random memory while
> the lock is held.
> 
> Co-developed-by: Dominique Martinet <dominique.martinet@cea.fr>
> Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
> Reported-by: syzbot+467050c1ce275af2a5b8@syzkaller.appspotmail.com
> Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
> ---
> v3:
>  - add req put if virtio zc request fails
>  - add req put if cancelled callback is not defined for virtio
>  - (incorrectly) add req put in rdma cancelled callback
> 
> v4:
>  - removed rdma's cancelled callback put again
>  - changed the else if no cancelled callback into actually giving virtio
> a callback, xen does not need to call put in that case either because
> both functions rely on tag_lookup to find the request. trans_fd only
> needs to put in cancelled because it also keeps the req in a list around
> for cancel.
>  - add req put for trans xen's request(), I'm not sure why that one was
> missing either..
> 
> And with that I believe I am done testing all four transports.
> I'll do a second round of tests next week just to make sure, but it
> should be good enough™
> Sorry for the multiple iterations.
> 
>  include/net/9p/client.h | 14 ++++++++++
>  net/9p/client.c         | 57 ++++++++++++++++++++++++++++++++++++-----
>  net/9p/trans_fd.c       | 11 +++++++-
>  net/9p/trans_rdma.c     |  1 +
>  net/9p/trans_virtio.c   | 26 ++++++++++++++++---
>  net/9p/trans_xen.c      |  1 +
>  6 files changed, 98 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 735f3979d559..947a570307a6 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -94,6 +94,7 @@ enum p9_req_status_t {
>  struct p9_req_t {
>  	int status;
>  	int t_err;
> +	struct kref refcount;
>  	wait_queue_head_t wq;
>  	struct p9_fcall tc;
>  	struct p9_fcall rc;
> @@ -233,6 +234,19 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status);
>  int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *fl);
>  void p9_fcall_fini(struct p9_fcall *fc);
>  struct p9_req_t *p9_tag_lookup(struct p9_client *, u16);
> +
> +static inline void p9_req_get(struct p9_req_t *r)
> +{
> +	kref_get(&r->refcount);
> +}
> +
> +static inline int p9_req_try_get(struct p9_req_t *r)
> +{
> +	return kref_get_unless_zero(&r->refcount);
> +}
> +
> +int p9_req_put(struct p9_req_t *r);
> +
>  void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);
>  
>  int p9_parse_header(struct p9_fcall *, int32_t *, int8_t *, int16_t *, int);
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 7942c0bfcc5b..aeeb6d8515d4 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -310,6 +310,18 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
>  	if (tag < 0)
>  		goto free;
>  
> +	/* Init ref to two because in the general case there is one ref
> +	 * that is put asynchronously by a writer thread, one ref
> +	 * temporarily given by p9_tag_lookup and put by p9_client_cb
> +	 * in the recv thread, and one ref put by p9_tag_remove in the
> +	 * main thread. The only exception is virtio that does not use
> +	 * p9_tag_lookup but does not have a writer thread either
> +	 * (the write happens synchronously in the request/zc_request
> +	 * callback), so p9_client_cb eats the second ref there
> +	 * as the pointer is duplicated directly by virtqueue_add_sgs()
> +	 */
> +	refcount_set(&req->refcount.refcount, 2);
> +
>  	return req;
>  
>  free:
> @@ -333,10 +345,21 @@ struct p9_req_t *p9_tag_lookup(struct p9_client *c, u16 tag)
>  	struct p9_req_t *req;
>  
>  	rcu_read_lock();
> +again:
>  	req = idr_find(&c->reqs, tag);
> -	/* There's no refcount on the req; a malicious server could cause
> -	 * us to dereference a NULL pointer
> -	 */
> +	if (req) {
> +		/* We have to be careful with the req found under rcu_read_lock
> +		 * Thanks to SLAB_TYPESAFE_BY_RCU we can safely try to get the
> +		 * ref again without corrupting other data, then check again
> +		 * that the tag matches once we have the ref
> +		 */
> +		if (!p9_req_try_get(req))
> +			goto again;
> +		if (req->tc.tag != tag) {
> +			p9_req_put(req);
> +			goto again;
> +		}
> +	}
>  	rcu_read_unlock();
>  
>  	return req;
> @@ -350,7 +373,7 @@ EXPORT_SYMBOL(p9_tag_lookup);
>   *
>   * Context: Any context.
>   */
> -static void p9_tag_remove(struct p9_client *c, struct p9_req_t *r)
> +static int p9_tag_remove(struct p9_client *c, struct p9_req_t *r)
>  {
>  	unsigned long flags;
>  	u16 tag = r->tc.tag;
> @@ -359,11 +382,23 @@ static void p9_tag_remove(struct p9_client *c, struct p9_req_t *r)
>  	spin_lock_irqsave(&c->lock, flags);
>  	idr_remove(&c->reqs, tag);
>  	spin_unlock_irqrestore(&c->lock, flags);
> +	return p9_req_put(r);
> +}
> +
> +static void p9_req_free(struct kref *ref)
> +{
> +	struct p9_req_t *r = container_of(ref, struct p9_req_t, refcount);
>  	p9_fcall_fini(&r->tc);
>  	p9_fcall_fini(&r->rc);
>  	kmem_cache_free(p9_req_cache, r);
>  }
>  
> +int p9_req_put(struct p9_req_t *r)
> +{
> +	return kref_put(&r->refcount, p9_req_free);
> +}
> +EXPORT_SYMBOL(p9_req_put);
> +
>  /**
>   * p9_tag_cleanup - cleans up tags structure and reclaims resources
>   * @c:  v9fs client struct
> @@ -379,7 +414,9 @@ static void p9_tag_cleanup(struct p9_client *c)
>  	rcu_read_lock();
>  	idr_for_each_entry(&c->reqs, req, id) {
>  		pr_info("Tag %d still in use\n", id);
> -		p9_tag_remove(c, req);
> +		if (p9_tag_remove(c, req) == 0)
> +			pr_warn("Packet with tag %d has still references",
> +				req->tc.tag);
>  	}
>  	rcu_read_unlock();
>  }
> @@ -403,6 +440,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
>  
>  	wake_up(&req->wq);
>  	p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
> +	p9_req_put(req);
>  }
>  EXPORT_SYMBOL(p9_client_cb);
>  
> @@ -643,9 +681,10 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
>  	 * if we haven't received a response for oldreq,
>  	 * remove it from the list
>  	 */
> -	if (oldreq->status == REQ_STATUS_SENT)
> +	if (oldreq->status == REQ_STATUS_SENT) {
>  		if (c->trans_mod->cancelled)
>  			c->trans_mod->cancelled(c, oldreq);
> +	}
>  
>  	p9_tag_remove(c, req);
>  	return 0;
> @@ -682,6 +721,8 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
>  	return req;
>  reterr:
>  	p9_tag_remove(c, req);
> +	/* We have to put also the 2nd reference as it won't be used */
> +	p9_req_put(req);
>  	return ERR_PTR(err);
>  }
>  
> @@ -716,6 +757,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>  
>  	err = c->trans_mod->request(c, req);
>  	if (err < 0) {
> +		/* write won't happen */
> +		p9_req_put(req);
>  		if (err != -ERESTARTSYS && err != -EFAULT)
>  			c->status = Disconnected;
>  		goto recalc_sigpending;
> @@ -2241,7 +2284,7 @@ EXPORT_SYMBOL(p9_client_readlink);
>  
>  int __init p9_client_init(void)
>  {
> -	p9_req_cache = KMEM_CACHE(p9_req_t, 0);
> +	p9_req_cache = KMEM_CACHE(p9_req_t, SLAB_TYPESAFE_BY_RCU);
>  	return p9_req_cache ? 0 : -ENOMEM;
>  }
>  
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 20f46f13fe83..686e24e355d0 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -132,6 +132,7 @@ struct p9_conn {
>  	struct list_head req_list;
>  	struct list_head unsent_req_list;
>  	struct p9_req_t *req;
> +	struct p9_req_t *wreq;
>  	char tmp_buf[7];
>  	struct p9_fcall rc;
>  	int wpos;
> @@ -383,6 +384,7 @@ static void p9_read_work(struct work_struct *work)
>  		m->rc.sdata = NULL;
>  		m->rc.offset = 0;
>  		m->rc.capacity = 0;
> +		p9_req_put(m->req);
>  		m->req = NULL;
>  	}
>  
> @@ -472,6 +474,8 @@ static void p9_write_work(struct work_struct *work)
>  		m->wbuf = req->tc.sdata;
>  		m->wsize = req->tc.size;
>  		m->wpos = 0;
> +		p9_req_get(req);
> +		m->wreq = req;
>  		spin_unlock(&m->client->lock);
>  	}
>  
> @@ -492,8 +496,11 @@ static void p9_write_work(struct work_struct *work)
>  	}
>  
>  	m->wpos += err;
> -	if (m->wpos == m->wsize)
> +	if (m->wpos == m->wsize) {
>  		m->wpos = m->wsize = 0;
> +		p9_req_put(m->wreq);
> +		m->wreq = NULL;
> +	}
>  
>  end_clear:
>  	clear_bit(Wworksched, &m->wsched);
> @@ -694,6 +701,7 @@ static int p9_fd_cancel(struct p9_client *client, struct p9_req_t *req)
>  	if (req->status == REQ_STATUS_UNSENT) {
>  		list_del(&req->req_list);
>  		req->status = REQ_STATUS_FLSHD;
> +		p9_req_put(req);
>  		ret = 0;
>  	}
>  	spin_unlock(&client->lock);
> @@ -711,6 +719,7 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
>  	spin_lock(&client->lock);
>  	list_del(&req->req_list);
>  	spin_unlock(&client->lock);
> +	p9_req_put(req);
>  
>  	return 0;
>  }
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 5b0cda1aaa7a..9cc9b3a19ee7 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -365,6 +365,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
>  			    c->busa, c->req->tc.size,
>  			    DMA_TO_DEVICE);
>  	up(&rdma->sq_sem);
> +	p9_req_put(c->req);
>  	kfree(c);
>  }
>  
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 3dd6ce1c0f2d..eb596c2ed546 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -207,6 +207,13 @@ static int p9_virtio_cancel(struct p9_client *client, struct p9_req_t *req)
>  	return 1;
>  }
>  
> +/* Reply won't come, so drop req ref */
> +static int p9_virtio_cancelled(struct p9_client *client, struct p9_req_t *req)
> +{
> +	p9_req_put(req);
> +	return 0;
> +}
> +
>  /**
>   * pack_sg_list_p - Just like pack_sg_list. Instead of taking a buffer,
>   * this takes a list of pages.
> @@ -404,6 +411,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
>  	struct scatterlist *sgs[4];
>  	size_t offs;
>  	int need_drop = 0;
> +	int kicked = 0;
>  
>  	p9_debug(P9_DEBUG_TRANS, "virtio request\n");
>  
> @@ -411,8 +419,10 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
>  		__le32 sz;
>  		int n = p9_get_mapped_pages(chan, &out_pages, uodata,
>  					    outlen, &offs, &need_drop);
> -		if (n < 0)
> -			return n;
> +		if (n < 0) {
> +			err = n;
> +			goto err_out;
> +		}
>  		out_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
>  		if (n != outlen) {
>  			__le32 v = cpu_to_le32(n);
> @@ -428,8 +438,10 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
>  	} else if (uidata) {
>  		int n = p9_get_mapped_pages(chan, &in_pages, uidata,
>  					    inlen, &offs, &need_drop);
> -		if (n < 0)
> -			return n;
> +		if (n < 0) {
> +			err = n;
> +			goto err_out;
> +		}
>  		in_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
>  		if (n != inlen) {
>  			__le32 v = cpu_to_le32(n);
> @@ -498,6 +510,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
>  	}
>  	virtqueue_kick(chan->vq);
>  	spin_unlock_irqrestore(&chan->lock, flags);
> +	kicked = 1;
>  	p9_debug(P9_DEBUG_TRANS, "virtio request kicked\n");
>  	err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
>  	/*
> @@ -518,6 +531,10 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
>  	}
>  	kvfree(in_pages);
>  	kvfree(out_pages);
> +	if (!kicked) {
> +		/* reply won't come */
> +		p9_req_put(req);
> +	}
>  	return err;
>  }
>  
> @@ -750,6 +767,7 @@ static struct p9_trans_module p9_virtio_trans = {
>  	.request = p9_virtio_request,
>  	.zc_request = p9_virtio_zc_request,
>  	.cancel = p9_virtio_cancel,
> +	.cancelled = p9_virtio_cancelled,
>  	/*
>  	 * We leave one entry for input and one entry for response
>  	 * headers. We also skip one more entry to accomodate, address
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index 782a07f2ad0c..e2fbf3677b9b 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -185,6 +185,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
>  	ring->intf->out_prod = prod;
>  	spin_unlock_irqrestore(&ring->lock, flags);
>  	notify_remote_via_irq(ring->irq);
> +	p9_req_put(p9_req);
>  
>  	return 0;
>  }
> 

LGTM, thanks Dominique!

Tomas
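The reference-counting scheme in this patch can be modelled in userspace with C11 atomics. The sketch mirrors the semantics of kref_get(), kref_get_unless_zero() and kref_put() that p9_req_get(), p9_req_try_get() and p9_req_put() wrap, starts the count at 2 as p9_tag_alloc() does, and reproduces the lookup pattern of taking a reference only if the object is still live and then re-checking the tag. It is not kernel code: struct req, the freed flag (standing in for kmem_cache_free()) and lookup() are hypothetical, and lookup() returns NULL on a tag mismatch where the patch retries the idr lookup.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct req {
	atomic_int refcount;
	int tag;
	bool freed;		/* stands in for kmem_cache_free() */
};

static void req_get(struct req *r)
{
	atomic_fetch_add(&r->refcount, 1);
}

/* Like kref_get_unless_zero(): fail if the object is already being freed. */
static bool req_try_get(struct req *r)
{
	int old = atomic_load(&r->refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
			return true;	/* old is refreshed on failure */
	}
	return false;
}

/* Like kref_put(): returns 1 if this call dropped the last reference. */
static int req_put(struct req *r)
{
	if (atomic_fetch_sub(&r->refcount, 1) == 1) {
		r->freed = true;
		return 1;
	}
	return 0;
}

/* Lookup pattern: take a reference only if still live, then re-check the
 * tag because type-stable memory may have been reused for another request. */
static struct req *lookup(struct req *candidate, int tag)
{
	if (!candidate || !req_try_get(candidate))
		return NULL;
	if (candidate->tag != tag) {
		req_put(candidate);
		return NULL;
	}
	return candidate;
}
```

In the fd transport case described in the patch comment, the three puts come from the writer thread, from the receive path after p9_client_cb(), and from p9_tag_remove(); only the last one frees the request.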

^ permalink raw reply

* [PATCH net] ibmvnic: Include missing return code checks in reset function
From: Thomas Falcon @ 2018-08-30 18:19 UTC (permalink / raw)
  To: netdev; +Cc: Thomas Falcon

Check the return codes of these functions and halt reset
in case of failure. The driver will remain in a dormant state
until the next reset event, when device initialization will be
re-attempted.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index ffe7acb..d834308 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1841,11 +1841,17 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 			adapter->map_id = 1;
 			release_rx_pools(adapter);
 			release_tx_pools(adapter);
-			init_rx_pools(netdev);
-			init_tx_pools(netdev);
+			rc = init_rx_pools(netdev);
+			if (rc)
+				return rc;
+			rc = init_tx_pools(netdev);
+			if (rc)
+				return rc;
 
 			release_napi(adapter);
-			init_napi(adapter);
+			rc = init_napi(adapter);
+			if (rc)
+				return rc;
 		} else {
 			rc = reset_tx_pools(adapter);
 			if (rc)
-- 
1.8.3.1
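The fix above follows the usual "stop at the first failing step and propagate its error code" pattern. A generic sketch of that pattern is shown below; the step functions and run_reset_steps() are hypothetical and are not part of ibmvnic.

```c
#include <errno.h>

/* Hypothetical reset step: returns 0 on success or a negative errno. */
typedef int (*reset_step_fn)(void *ctx);

/* Run the steps in order and stop at the first failure, returning its error
 * code, so later steps are not attempted on a half-initialized device. */
static int run_reset_steps(reset_step_fn *steps, int nsteps, void *ctx)
{
	for (int i = 0; i < nsteps; i++) {
		int rc = steps[i](ctx);

		if (rc)
			return rc;
	}
	return 0;
}

/* Example steps that count how many times they were called. */
static int step_ok(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}

static int step_fail(void *ctx)
{
	(*(int *)ctx)++;
	return -ENOMEM;
}
```

Running { step_ok, step_fail, step_ok } returns -ENOMEM after two calls; the third step never runs, matching the commit's intent of leaving the device dormant until the next reset.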

^ permalink raw reply related

* [PATCH v2 1/2] IB/ipoib: Use dev_port to expose network interface port numbers
From: Arseny Maslennikov @ 2018-08-30 18:22 UTC (permalink / raw)
  To: linux-rdma; +Cc: Arseny Maslennikov, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180830182238.16361-1-ar@cs.msu.ru>

Some InfiniBand network devices have multiple ports on the same PCI
function. This initializes the `dev_port' sysfs field of those
network interfaces with their port number.

Prior to this the kernel erroneously used the `dev_id' sysfs
field of those network interfaces to convey the port number to userspace.

The use of `dev_id' was considered correct until Linux 3.15,
when another field, `dev_port', was defined for this particular
purpose and `dev_id' was reserved for distinguishing stacked ifaces
(e.g: VLANs) with the same hardware address as their parent device.

Similar fixes to net/mlx4_en and many other drivers, which started
exporting this information through `dev_id' before 3.15, were accepted
into the kernel 4 years ago.
See 76a066f2a2a0 (`net/mlx4_en: Expose port number through sysfs').

Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index e3d28f9ad9c0..ba16a63ee303 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1880,7 +1880,7 @@ static int ipoib_parent_init(struct net_device *ndev)
 	       sizeof(union ib_gid));

 	SET_NETDEV_DEV(priv->dev, priv->ca->dev.parent);
-	priv->dev->dev_id = priv->port - 1;
+	priv->dev->dev_port = priv->port - 1;

 	return 0;
 }
-- 
2.18.0
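With this change, userspace can tell the ports of a multi-port IPoIB device apart by reading `dev_port` from sysfs. Per the diff, the value is the IB port number minus one, so port 1 reads as 0 and port 2 as 1. The minimal reader below assumes the usual /sys/class/net/<ifname>/dev_port path; read_sysfs_long() is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: read a small integer attribute such as
 * /sys/class/net/<ifname>/dev_port. Returns 0 on success, -1 if the file
 * cannot be opened or parsed. */
static int read_sysfs_long(const char *path, long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%ld", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Tools that name interfaces by port generally prefer `dev_port` and only fall back to `dev_id` on older kernels, which is why exposing the port number in the correct attribute matters.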

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox