Netdev List

* Re: [PATCH v5 net-next] net:sched: add action inheritdsfield to skbedit
From: Davide Caratti @ 2018-06-21 16:13 UTC (permalink / raw)
  To: Fu, Qiaobin, davem@davemloft.net
  Cc: Marcelo Ricardo Leitner, Michel Machado, netdev@vger.kernel.org,
	jhs@mojatatu.com, xiyou.wangcong@gmail.com
In-Reply-To: <B84B92F9-B872-4430-B7E2-FBF23E543632@bu.edu>

On Thu, 2018-06-21 at 15:50 +0000, Fu, Qiaobin wrote:
> The new action inheritdsfield copies the field DS of
> IPv4 and IPv6 packets into skb->priority. This enables
> later classification of packets based on the DS field.
> 
> v5:
> *Update the drop counter for TC_ACT_SHOT


Acked-by: Davide Caratti <dcaratti@redhat.com>
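
To illustrate the mapping described in the quoted commit message, here
is a minimal user-space sketch of pulling the DS field out of a raw IPv4
or IPv6 header. The helper name is hypothetical, and the right shift by
two (keeping the 6-bit DSCP and dropping the ECN bits) is an assumption
about what ends up in skb->priority, not something stated in the
message above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): return the 6-bit DSCP value
 * of a raw IPv4 or IPv6 header, or -1 if the buffer is missing, too
 * short, or carries an unknown IP version.
 */
static int dscp_from_ip_header(const uint8_t *hdr, size_t len)
{
	uint8_t dsfield;

	if (hdr == NULL || len < 2)
		return -1;

	switch (hdr[0] >> 4) {
	case 4:
		/* IPv4: the DS field is the second header byte (former TOS). */
		dsfield = hdr[1];
		break;
	case 6:
		/*
		 * IPv6: the traffic class straddles the low nibble of
		 * byte 0 and the high nibble of byte 1.
		 */
		dsfield = (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
		break;
	default:
		return -1;
	}

	/* Assumption: drop the two ECN bits and keep only the DSCP. */
	return dsfield >> 2;
}
```

A classifier could then match on the returned value; for example, an
IPv4 header whose second byte is 0xb8 (Expedited Forwarding) yields 46.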

* Re: [PATCH net 1/1] net/smc: coordinate wait queues for nonblocking connect
From: kbuild test robot @ 2018-06-21 16:26 UTC (permalink / raw)
  To: Ursula Braun
  Cc: kbuild-all, davem, netdev, linux-s390, schwidefsky,
	heiko.carstens, raspl, ubraun, xiyou.wangcong, hch
In-Reply-To: <20180620080737.50323-1-ubraun@linux.ibm.com>

Hi Ursula,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Ursula-Braun/net-smc-coordinate-wait-queues-for-nonblocking-connect/20180620-180901
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/smc/af_smc.c:1301:49: sparse: incorrect type in assignment (different address spaces) @@    expected struct socket_wq [noderef] <asn:4>*sk_wq @@    got [noderef] <asn:4>*sk_wq @@
   net/smc/af_smc.c:1301:49:    expected struct socket_wq [noderef] <asn:4>*sk_wq
   net/smc/af_smc.c:1301:49:    got struct socket_wq *smcwq
   net/smc/smc_cdc.h:143:24: sparse: expression using sizeof(void)
   net/smc/smc_cdc.h:146:16: sparse: expression using sizeof(void)
   net/smc/smc_cdc.h:143:24: sparse: expression using sizeof(void)
   net/smc/smc_cdc.h:146:16: sparse: expression using sizeof(void)
>> net/smc/af_smc.c:1667:20: sparse: incorrect type in assignment (different address spaces) @@    expected struct socket_wq *smcwq @@    got struct socket_wq struct socket_wq *smcwq @@
   net/smc/af_smc.c:1667:20:    expected struct socket_wq *smcwq
   net/smc/af_smc.c:1667:20:    got struct socket_wq [noderef] <asn:4>*sk_wq
   net/smc/af_smc.c:1668:29: sparse: expression using sizeof(void)
   net/smc/af_smc.c:1669:29: sparse: expression using sizeof(void)

vim +1301 net/smc/af_smc.c

  1277	
  1278	static __poll_t smc_poll_mask(struct socket *sock, __poll_t events)
  1279	{
  1280		struct sock *sk = sock->sk;
  1281		__poll_t mask = 0;
  1282		struct smc_sock *smc;
  1283		int rc;
  1284	
  1285		if (!sk)
  1286			return EPOLLNVAL;
  1287	
  1288		smc = smc_sk(sock->sk);
  1289		sock_hold(sk);
  1290		if ((sk->sk_state == SMC_INIT) || smc->use_fallback) {
  1291			/* delegate to CLC child sock */
  1292			mask = smc->clcsock->ops->poll_mask(smc->clcsock, events);
  1293			sk->sk_err = smc->clcsock->sk->sk_err;
  1294			if (sk->sk_err) {
  1295				mask |= EPOLLERR;
  1296			} else {
  1297				/* if non-blocking connect finished ... */
  1298				if (sk->sk_state == SMC_INIT &&
  1299				    mask & EPOLLOUT &&
  1300				    smc->clcsock->sk->sk_state != TCP_CLOSE) {
> 1301					sock->sk->sk_wq = smc->smcwq;
  1302					lock_sock(sk);
  1303					rc = __smc_connect(smc);
  1304					release_sock(sk);
  1305					if (rc < 0)
  1306						mask |= EPOLLERR;
  1307					/* success cases including fallback */
  1308					mask |= EPOLLOUT | EPOLLWRNORM;
  1309				}
  1310			}
  1311		} else {
  1312			if (sk->sk_err)
  1313				mask |= EPOLLERR;
  1314			if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
  1315			    (sk->sk_state == SMC_CLOSED))
  1316				mask |= EPOLLHUP;
  1317			if (sk->sk_state == SMC_LISTEN) {
  1318				/* woken up by sk_data_ready in smc_listen_work() */
  1319				mask = smc_accept_poll(sk);
  1320			} else {
  1321				if (atomic_read(&smc->conn.sndbuf_space) ||
  1322				    sk->sk_shutdown & SEND_SHUTDOWN) {
  1323					mask |= EPOLLOUT | EPOLLWRNORM;
  1324				} else {
  1325					sk_set_bit(SOCKWQ_ASYNC_NOSPACE, sk);
  1326					set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
  1327				}
  1328				if (atomic_read(&smc->conn.bytes_to_rcv))
  1329					mask |= EPOLLIN | EPOLLRDNORM;
  1330				if (sk->sk_shutdown & RCV_SHUTDOWN)
  1331					mask |= EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
  1332				if (sk->sk_state == SMC_APPCLOSEWAIT1)
  1333					mask |= EPOLLIN;
  1334			}
  1335			if (smc->conn.urg_state == SMC_URG_VALID)
  1336				mask |= EPOLLPRI;
  1337	
  1338		}
  1339		sock_put(sk);
  1340	
  1341		return mask;
  1342	}
  1343	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

* Re: [GIT] Networking
From: Ingo Molnar @ 2018-06-21 16:33 UTC (permalink / raw)
  To: Matteo Croce
  Cc: David S . Miller, alexei.starovoitov, sfr, torvalds, akpm, netdev,
	linux-kernel, tglx
In-Reply-To: <CAGnkfhxGAYZNhJp7eyg+_j3LY31w7muFqerhQp7jGqQ02iFxkg@mail.gmail.com>


* Matteo Croce <mcroce@redhat.com> wrote:

> Hi Ingo,
> 
> are you compiling a 32 bit kernel on an x86_64 host?

Yes.

> then I tried to compile an i386 kernel on an x86_64 host and I get the
> same error:
> 
> $ make -j8 ARCH=i386
> ...
>   LD      vmlinux.o
> ld: i386:x86-64 architecture of input file
> `net/bpfilter/bpfilter_umh.o' is incompatible with i386 output

Correct.

> Any idea how to fix it without building it twice, for host and target?

No idea, sorry ...

Thanks,

	Ingo

* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Anders Roxell @ 2018-06-21 16:56 UTC (permalink / raw)
  To: shannon.nelson; +Cc: Networking, David Miller
In-Reply-To: <6134e116-13c5-ee9d-e539-35679efcd665@oracle.com>

On Thu, 21 Jun 2018 at 02:32, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>
> On 6/20/2018 4:18 PM, Anders Roxell wrote:
> > On Thu, 21 Jun 2018 at 00:26, Shannon Nelson <shannon.nelson@oracle.com> wrote:
> >>
> >> On 6/20/2018 12:09 PM, Anders Roxell wrote:
> >>> On Wed, 20 Jun 2018 at 07:42, Shannon Nelson <shannon.nelson@oracle.com> wrote:
> >>>>
> >>>> A couple of bad behaviors in the ipsec selftest were pointed out
> >>>> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
> >>>>
> >>>> Shannon Nelson (2):
> >>>>     selftests: rtnetlink: hide complaint from terminated monitor
> >>>>     selftests: rtnetlink: use a local IP address for IPsec tests
> >>>>
> >>>>    tools/testing/selftests/net/rtnetlink.sh | 11 +++++++----
> >>>>    1 file changed, 7 insertions(+), 4 deletions(-)
> >>>>
> >>>> --
> >>>> 2.7.4
> >>>>
> >>>
> >>> Hi Shannon,
> >>>
> >>> With this patches applied and my config patch.
> >>>
> >>> I still get this error when I run the ipsec test:
> >>>
> >>> FAIL: can't add fou port 7777, skipping test
> >>> RTNETLINK answers: Operation not supported
> >>> FAIL: can't add macsec interface, skipping test
> >>> RTNETLINK answers: Protocol not supported
> >>> RTNETLINK answers: No such process
> >>> RTNETLINK answers: No such process
> >>> FAIL: ipsec
> >>
> >> One of the odd things I noticed about this script is that there really
> >> aren't any diagnosis messages, just PASS or FAIL.  I followed this
> >> custom when I added the ipsec tests, but I think this is something that
> >> should change so we can get some idea of what breaks.
> >>
> >> I'm curious about the "RTNETLINK answers" messages and where they might
> >> be coming from, especially "RTNETLINK answers: Protocol not supported".
> >
> > I added: "set -x" in the beginning of the rtnetlink.sh script.
> > + ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode
> > transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))'
> > 0x3132333435
> > 363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
> > RTNETLINK answers: Protocol not supported
>
> Okay, so ip didn't like this command...
>
> >> What are the XFRM and AES settings in your kernel config - what is the
> >> output from
> >>          egrep -i "xfrm|_aes" .config
> >
> > CONFIG_XFRM=y
> > CONFIG_XFRM_ALGO=y
> > CONFIG_XFRM_USER=y
> > CONFIG_INET_XFRM_MODE_TUNNEL=y
> > CONFIG_INET6_XFRM_MODE_TRANSPORT=y
> > CONFIG_INET6_XFRM_MODE_TUNNEL=y
> > CONFIG_INET6_XFRM_MODE_BEET=y
> > CONFIG_CRYPTO_AES=y
>
> And this is probably why - there seem to be a few config variables
> missing, including CONFIG_INET_XFRM_MODE_TRANSPORT, which might be why
> the ip command fails above.
>
> Here's what I have in my config:
> CONFIG_XFRM=y
> CONFIG_XFRM_OFFLOAD=y
> CONFIG_XFRM_ALGO=m
> CONFIG_XFRM_USER=m
> # CONFIG_XFRM_SUB_POLICY is not set
> # CONFIG_XFRM_MIGRATE is not set
> CONFIG_XFRM_STATISTICS=y
> CONFIG_XFRM_IPCOMP=m
> CONFIG_INET_XFRM_TUNNEL=m
> CONFIG_INET_XFRM_MODE_TRANSPORT=m
> CONFIG_INET_XFRM_MODE_TUNNEL=m
> CONFIG_INET_XFRM_MODE_BEET=m
> CONFIG_INET6_XFRM_TUNNEL=m
> CONFIG_INET6_XFRM_MODE_TRANSPORT=m
> CONFIG_INET6_XFRM_MODE_TUNNEL=m
> CONFIG_INET6_XFRM_MODE_BEET=m
> CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
> CONFIG_SECURITY_NETWORK_XFRM=y
> CONFIG_CRYPTO_AES=y
> # CONFIG_CRYPTO_AES_TI is not set
> CONFIG_CRYPTO_AES_X86_64=m
> CONFIG_CRYPTO_AES_NI_INTEL=m
> CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
> CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
> CONFIG_CRYPTO_DEV_PADLOCK_AES=m
>
> Can I talk you into adding CONFIG_INET_XFRM_MODE_TRANSPORT to your
> config

Yes you can.

> and trying again?

same issue with CONFIG_INET_XFRM_MODE_TRANSPORT=y

Cheers,
Anders

* [PATCH] selftests: bpf: notification about privilege required to run test_kmod.sh testing script
From: Jeffrin Jose T @ 2018-06-21 17:00 UTC (permalink / raw)
  To: ast, daniel, shuah; +Cc: netdev, linux-kernel, linux-kselftest, Jeffrin Jose T

The test_kmod.sh script requires root privileges for the successful
execution of the test.

This patch is to notify the user about the privilege the script
demands for the successful execution of the test.

Signed-off-by: Jeffrin Jose T (Rajagiri SET) <ahiliation@gmail.com>
---
 tools/testing/selftests/bpf/test_kmod.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_kmod.sh b/tools/testing/selftests/bpf/test_kmod.sh
index 35669ccd4d23..378ccc512ad3 100755
--- a/tools/testing/selftests/bpf/test_kmod.sh
+++ b/tools/testing/selftests/bpf/test_kmod.sh
@@ -1,6 +1,15 @@
 #!/bin/sh
 # SPDX-License-Identifier: GPL-2.0
 
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+msg="skip all tests:"
+if [ "$(id -u)" != "0" ]; then
+    echo $msg please run this as root >&2
+    exit $ksft_skip
+fi
+
 SRC_TREE=../../../../
 
 test_run()
-- 
2.17.0
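
The same convention can be followed by selftests written in C. Below is
a minimal sketch, not part of the patch: the helper and macro names are
hypothetical, and only the SKIP value of 4 is taken from the diff above.
The effective UID is passed in as a parameter so that the decision can
be exercised without actually being root.

```c
#include <stdio.h>
#include <sys/types.h>

/* SKIP is 4 as stated in the patch; 0 is the usual success code. */
#define EXAMPLE_KSFT_PASS 0
#define EXAMPLE_KSFT_SKIP 4

/*
 * Hypothetical helper: map an effective UID to the exit code the test
 * should use before doing any real work.
 */
static int exit_code_for_euid(uid_t euid)
{
	if (euid != 0) {
		fprintf(stderr, "skip all tests: please run this as root\n");
		return EXAMPLE_KSFT_SKIP;
	}
	return EXAMPLE_KSFT_PASS;
}
```

A test's main() would typically call exit_code_for_euid(geteuid()), with
geteuid() coming from <unistd.h>, and return early when the result is
nonzero.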

* [PATCH bpf] tools/bpf: fix test_sockmap failure
From: Yonghong Song @ 2018-06-21 17:02 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

On one of our production test machines, when running the
bpf selftest test_sockmap, I got the following error:
  # sudo ./test_sockmap
  libbpf: failed to create map (name: 'sock_map'): Operation not permitted
  libbpf: failed to load object 'test_sockmap_kern.o'
  libbpf: Can't get the 0th fd from program sk_skb1: only -1 instances
  ......
  load_bpf_file: (-1) Operation not permitted
  ERROR: (-1) load bpf failed

The error is due to an rlimit that is not big enough:
  struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};

The test already includes "bpf_rlimit.h", which sets current
and max rlimit to RLIM_INFINITY. Let us just use it.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/test_sockmap.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 05c8cb7..9e78df2 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1413,18 +1413,12 @@ static int test_suite(void)

 int main(int argc, char **argv)
 {
-	struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
 	int iov_count = 1, length = 1024, rate = 1;
 	struct sockmap_options options = {0};
 	int opt, longindex, err, cg_fd = 0;
 	char *bpf_file = BPF_SOCKMAP_FILENAME;
 	int test = PING_PONG;

-	if (setrlimit(RLIMIT_MEMLOCK, &r)) {
-		perror("setrlimit(RLIMIT_MEMLOCK)");
-		return 1;
-	}
-
 	if (argc < 2)
 		return test_suite();

-- 
2.9.5
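
For context, the removed initializer set a soft RLIMIT_MEMLOCK of
10 MiB with an unlimited hard limit. A minimal sketch of lifting both
limits instead is shown below. This is an illustration of the approach
the commit message describes, not the contents of bpf_rlimit.h, and the
helper name is hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: lift both the soft and the hard RLIMIT_MEMLOCK
 * to RLIM_INFINITY. Returns 0 on success or a negative errno value.
 * Raising the hard limit normally requires CAP_SYS_RESOURCE (root).
 */
static int lift_memlock_rlimit(void)
{
	const struct rlimit unlimited = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	if (setrlimit(RLIMIT_MEMLOCK, &unlimited) != 0)
		return -errno;

	return 0;
}
```

When run without sufficient privilege, the call is expected to fail with
-EPERM, in which case a test may prefer to report a skip rather than a
failure.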

* Re: [PATCH v2 bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: Martin KaFai Lau @ 2018-06-21 17:09 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, borkmann, ast, davem, David Ahern
In-Reply-To: <20180621030011.7441-1-dsahern@kernel.org>

On Wed, Jun 20, 2018 at 08:00:11PM -0700, dsahern@kernel.org wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> For ACLs implemented using either FIB rules or FIB entries, the BPF
> program needs the FIB lookup status to be able to drop the packet.
> Since the bpf_fib_lookup API has not reached a released kernel yet,
> change the return code to contain an encoding of the FIB lookup
> result and return the nexthop device index in the params struct.
> 
> In addition, inform the BPF program of any post FIB lookup reason as
> to why the packet needs to go up the stack.
> 
> The fib result for unicast routes must have an egress device, so remove
> the check that it is non-NULL.
Acked-by: Martin KaFai Lau <kafai@fb.com>

> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
> v2
> - drop BPF_FIB_LKUP_RET_NO_NHDEV; check in dev in fib result not needed
> - enhance documentation of BPF_FIB_LKUP_RET_ codes
> 
>  include/uapi/linux/bpf.h   | 28 ++++++++++++++----
>  net/core/filter.c          | 72 ++++++++++++++++++++++++++++++----------------
>  samples/bpf/xdp_fwd_kern.c |  8 +++---
>  3 files changed, 74 insertions(+), 34 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 59b19b6a40d7..b7db3261c62d 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1857,7 +1857,8 @@ union bpf_attr {
>   *		is resolved), the nexthop address is returned in ipv4_dst
>   *		or ipv6_dst based on family, smac is set to mac address of
>   *		egress device, dmac is set to nexthop mac address, rt_metric
> - *		is set to metric from route (IPv4/IPv6 only).
> + *		is set to metric from route (IPv4/IPv6 only), and ifindex
> + *		is set to the device index of the nexthop from the FIB lookup.
>   *
>   *             *plen* argument is the size of the passed in struct.
>   *             *flags* argument can be a combination of one or more of the
> @@ -1873,9 +1874,10 @@ union bpf_attr {
>   *             *ctx* is either **struct xdp_md** for XDP programs or
>   *             **struct sk_buff** tc cls_act programs.
>   *     Return
> - *             Egress device index on success, 0 if packet needs to continue
> - *             up the stack for further processing or a negative error in case
> - *             of failure.
> + *		* < 0 if any input argument is invalid
> + *		*   0 on success (packet is forwarded, nexthop neighbor exists)
> + *		* > 0 one of **BPF_FIB_LKUP_RET_** codes explaining why the
> + *		*     packet is not forwarded or needs assist from full stack
>   *
>   * int bpf_sock_hash_update(struct bpf_sock_ops_kern *skops, struct bpf_map *map, void *key, u64 flags)
>   *	Description
> @@ -2612,6 +2614,18 @@ struct bpf_raw_tracepoint_args {
>  #define BPF_FIB_LOOKUP_DIRECT  BIT(0)
>  #define BPF_FIB_LOOKUP_OUTPUT  BIT(1)
>  
> +enum {
> +	BPF_FIB_LKUP_RET_SUCCESS,      /* lookup successful */
> +	BPF_FIB_LKUP_RET_BLACKHOLE,    /* dest is blackholed; can be dropped */
> +	BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable; can be dropped */
> +	BPF_FIB_LKUP_RET_PROHIBIT,     /* dest not allowed; can be dropped */
> +	BPF_FIB_LKUP_RET_NOT_FWDED,    /* packet is not forwarded */
> +	BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
> +	BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires encapsulation */
> +	BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neighbor entry for nh */
> +	BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
> +};
> +
>  struct bpf_fib_lookup {
>  	/* input:  network family for lookup (AF_INET, AF_INET6)
>  	 * output: network family of egress nexthop
> @@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
>  
>  	/* total length of packet from network header - used for MTU check */
>  	__u16	tot_len;
> -	__u32	ifindex;  /* L3 device index for lookup */
> +
> +	/* input: L3 device index for lookup
> +	 * output: device index from FIB lookup
> +	 */
> +	__u32	ifindex;
>  
>  	union {
>  		/* inputs to lookup */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index e7f12e9f598c..f8dd8aa89de4 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4073,8 +4073,9 @@ static int bpf_fib_set_fwd_params(struct bpf_fib_lookup *params,
>  	memcpy(params->smac, dev->dev_addr, ETH_ALEN);
>  	params->h_vlan_TCI = 0;
>  	params->h_vlan_proto = 0;
> +	params->ifindex = dev->ifindex;
>  
> -	return dev->ifindex;
> +	return 0;
>  }
>  #endif
>  
> @@ -4098,7 +4099,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	/* verify forwarding is enabled on this interface */
>  	in_dev = __in_dev_get_rcu(dev);
>  	if (unlikely(!in_dev || !IN_DEV_FORWARD(in_dev)))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_FWD_DISABLED;
>  
>  	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
>  		fl4.flowi4_iif = 1;
> @@ -4123,7 +4124,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  
>  		tb = fib_get_table(net, tbid);
>  		if (unlikely(!tb))
> -			return 0;
> +			return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  		err = fib_table_lookup(tb, &fl4, &res, FIB_LOOKUP_NOREF);
>  	} else {
> @@ -4135,8 +4136,20 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  		err = fib_lookup(net, &fl4, &res, FIB_LOOKUP_NOREF);
>  	}
>  
> -	if (err || res.type != RTN_UNICAST)
> -		return 0;
> +	if (err) {
> +		/* map fib lookup errors to RTN_ type */
> +		if (err == -EINVAL)
> +			return BPF_FIB_LKUP_RET_BLACKHOLE;
> +		if (err == -EHOSTUNREACH)
> +			return BPF_FIB_LKUP_RET_UNREACHABLE;
> +		if (err == -EACCES)
> +			return BPF_FIB_LKUP_RET_PROHIBIT;
> +
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
> +	}
> +
> +	if (res.type != RTN_UNICAST)
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  	if (res.fi->fib_nhs > 1)
>  		fib_select_path(net, &res, &fl4, NULL);
> @@ -4144,19 +4157,16 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	if (check_mtu) {
>  		mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
>  		if (params->tot_len > mtu)
> -			return 0;
> +			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
>  	}
>  
>  	nh = &res.fi->fib_nh[res.nh_sel];
>  
>  	/* do not handle lwt encaps right now */
>  	if (nh->nh_lwtstate)
> -		return 0;
> +		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
>  
>  	dev = nh->nh_dev;
> -	if (unlikely(!dev))
> -		return 0;
> -
>  	if (nh->nh_gw)
>  		params->ipv4_dst = nh->nh_gw;
>  
> @@ -4166,10 +4176,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	 * rcu_read_lock_bh is not needed here
>  	 */
>  	neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)params->ipv4_dst);
> -	if (neigh)
> -		return bpf_fib_set_fwd_params(params, neigh, dev);
> +	if (!neigh)
> +		return BPF_FIB_LKUP_RET_NO_NEIGH;
>  
> -	return 0;
> +	return bpf_fib_set_fwd_params(params, neigh, dev);
>  }
>  #endif
>  
> @@ -4190,7 +4200,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  
>  	/* link local addresses are never forwarded */
>  	if (rt6_need_strict(dst) || rt6_need_strict(src))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  	dev = dev_get_by_index_rcu(net, params->ifindex);
>  	if (unlikely(!dev))
> @@ -4198,7 +4208,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  
>  	idev = __in6_dev_get_safely(dev);
>  	if (unlikely(!idev || !net->ipv6.devconf_all->forwarding))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_FWD_DISABLED;
>  
>  	if (flags & BPF_FIB_LOOKUP_OUTPUT) {
>  		fl6.flowi6_iif = 1;
> @@ -4225,7 +4235,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  
>  		tb = ipv6_stub->fib6_get_table(net, tbid);
>  		if (unlikely(!tb))
> -			return 0;
> +			return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  		f6i = ipv6_stub->fib6_table_lookup(net, tb, oif, &fl6, strict);
>  	} else {
> @@ -4238,11 +4248,23 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	}
>  
>  	if (unlikely(IS_ERR_OR_NULL(f6i) || f6i == net->ipv6.fib6_null_entry))
> -		return 0;
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
> +
> +	if (unlikely(f6i->fib6_flags & RTF_REJECT)) {
> +		switch (f6i->fib6_type) {
> +		case RTN_BLACKHOLE:
> +			return BPF_FIB_LKUP_RET_BLACKHOLE;
> +		case RTN_UNREACHABLE:
> +			return BPF_FIB_LKUP_RET_UNREACHABLE;
> +		case RTN_PROHIBIT:
> +			return BPF_FIB_LKUP_RET_PROHIBIT;
> +		default:
> +			return BPF_FIB_LKUP_RET_NOT_FWDED;
> +		}
> +	}
>  
> -	if (unlikely(f6i->fib6_flags & RTF_REJECT ||
> -	    f6i->fib6_type != RTN_UNICAST))
> -		return 0;
> +	if (f6i->fib6_type != RTN_UNICAST)
> +		return BPF_FIB_LKUP_RET_NOT_FWDED;
>  
>  	if (f6i->fib6_nsiblings && fl6.flowi6_oif == 0)
>  		f6i = ipv6_stub->fib6_multipath_select(net, f6i, &fl6,
> @@ -4252,11 +4274,11 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	if (check_mtu) {
>  		mtu = ipv6_stub->ip6_mtu_from_fib6(f6i, dst, src);
>  		if (params->tot_len > mtu)
> -			return 0;
> +			return BPF_FIB_LKUP_RET_FRAG_NEEDED;
>  	}
>  
>  	if (f6i->fib6_nh.nh_lwtstate)
> -		return 0;
> +		return BPF_FIB_LKUP_RET_UNSUPP_LWT;
>  
>  	if (f6i->fib6_flags & RTF_GATEWAY)
>  		*dst = f6i->fib6_nh.nh_gw;
> @@ -4270,10 +4292,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>  	 */
>  	neigh = ___neigh_lookup_noref(ipv6_stub->nd_tbl, neigh_key_eq128,
>  				      ndisc_hashfn, dst, dev);
> -	if (neigh)
> -		return bpf_fib_set_fwd_params(params, neigh, dev);
> +	if (!neigh)
> +		return BPF_FIB_LKUP_RET_NO_NEIGH;
>  
> -	return 0;
> +	return bpf_fib_set_fwd_params(params, neigh, dev);
>  }
>  #endif
>  
> diff --git a/samples/bpf/xdp_fwd_kern.c b/samples/bpf/xdp_fwd_kern.c
> index 6673cdb9f55c..a7e94e7ff87d 100644
> --- a/samples/bpf/xdp_fwd_kern.c
> +++ b/samples/bpf/xdp_fwd_kern.c
> @@ -48,9 +48,9 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>  	struct ethhdr *eth = data;
>  	struct ipv6hdr *ip6h;
>  	struct iphdr *iph;
> -	int out_index;
>  	u16 h_proto;
>  	u64 nh_off;
> +	int rc;
>  
>  	nh_off = sizeof(*eth);
>  	if (data + nh_off > data_end)
> @@ -101,7 +101,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>  
>  	fib_params.ifindex = ctx->ingress_ifindex;
>  
> -	out_index = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
> +	rc = bpf_fib_lookup(ctx, &fib_params, sizeof(fib_params), flags);
>  
>  	/* verify egress index has xdp support
>  	 * TO-DO bpf_map_lookup_elem(&tx_port, &key) fails with
> @@ -109,7 +109,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>  	 * NOTE: without verification that egress index supports XDP
>  	 *       forwarding packets are dropped.
>  	 */
> -	if (out_index > 0) {
> +	if (rc == 0) {
>  		if (h_proto == htons(ETH_P_IP))
>  			ip_decrease_ttl(iph);
>  		else if (h_proto == htons(ETH_P_IPV6))
> @@ -117,7 +117,7 @@ static __always_inline int xdp_fwd_flags(struct xdp_md *ctx, u32 flags)
>  
>  		memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
>  		memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
> -		return bpf_redirect_map(&tx_port, out_index, 0);
> +		return bpf_redirect_map(&tx_port, fib_params.ifindex, 0);
>  	}
>  
>  	return XDP_PASS;
> -- 
> 2.11.0
> 
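
The three-way return convention in the quoted patch (< 0 for invalid
input, 0 for success, > 0 for a BPF_FIB_LKUP_RET_ code) can be
illustrated with a small user-space sketch. The enum below is a local
copy of the one added to include/uapi/linux/bpf.h in the diff; the
verdict type, the function name, and the policy of dropping only the
three "can be dropped" codes are assumptions made for illustration.

```c
/* Local copy of the return codes added by the quoted patch. */
enum {
	BPF_FIB_LKUP_RET_SUCCESS,      /* lookup successful */
	BPF_FIB_LKUP_RET_BLACKHOLE,    /* dest is blackholed; can be dropped */
	BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable; can be dropped */
	BPF_FIB_LKUP_RET_PROHIBIT,     /* dest not allowed; can be dropped */
	BPF_FIB_LKUP_RET_NOT_FWDED,    /* packet is not forwarded */
	BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
	BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires encapsulation */
	BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neighbor entry for nh */
	BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
};

/* Hypothetical verdicts, standing in for XDP or tc action codes. */
enum fib_verdict {
	FIB_VERDICT_FORWARD,
	FIB_VERDICT_DROP,
	FIB_VERDICT_PASS_TO_STACK,
};

/* Hypothetical policy: translate a bpf_fib_lookup() result into a verdict. */
static enum fib_verdict fib_verdict_from_rc(int rc)
{
	/* Invalid input: let the regular stack deal with the packet. */
	if (rc < 0)
		return FIB_VERDICT_PASS_TO_STACK;

	switch (rc) {
	case BPF_FIB_LKUP_RET_SUCCESS:
		/* Nexthop resolved; params->ifindex holds the egress device. */
		return FIB_VERDICT_FORWARD;
	case BPF_FIB_LKUP_RET_BLACKHOLE:
	case BPF_FIB_LKUP_RET_UNREACHABLE:
	case BPF_FIB_LKUP_RET_PROHIBIT:
		/* ACL-style results that the patch marks as droppable. */
		return FIB_VERDICT_DROP;
	default:
		/* Everything else needs assistance from the full stack. */
		return FIB_VERDICT_PASS_TO_STACK;
	}
}
```

The updated xdp_fwd_kern.c sample in the diff is more conservative than
this sketch: it redirects on rc == 0 and returns XDP_PASS otherwise.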

* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Andrew Lunn @ 2018-06-21 17:11 UTC (permalink / raw)
  To: Vadim Pasternak; +Cc: davem, netdev, jiri
In-Reply-To: <1529594883-20619-4-git-send-email-vadimp@mellanox.com>

> New internal API reads the temperature from all the modules, which are
> equipped with the thermal sensor and exposes temperature according to
> the worst measure. All individual temperature values are normalized to
> pre-defined range.

Hi Vadim

Could you explain this normalization process? Why not just expose
each sensor's temperature in millidegrees C, which is the norm
for HWMON?

    Andrew

* [PATCH] net: ethernet: ti: davinci_cpdma: make function cpdma_desc_pool_create static
From: Colin King @ 2018-06-21 17:16 UTC (permalink / raw)
  To: David S . Miller, Florian Fainelli, Ivan Khoronzhuk, linux-omap,
	netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The function cpdma_desc_pool_create is local to the source and does not
need to be in global scope, so make it static.

Cleans up sparse warning:
warning: symbol 'cpdma_desc_pool_create' was not declared. Should it
be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index cdbddf16dd29..4f1267477aa4 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -205,7 +205,7 @@ static void cpdma_desc_pool_destroy(struct cpdma_ctlr *ctlr)
  * devices (e.g. cpsw switches) use plain old memory.  Descriptor pools
  * abstract out these details
  */
-int cpdma_desc_pool_create(struct cpdma_ctlr *ctlr)
+static int cpdma_desc_pool_create(struct cpdma_ctlr *ctlr)
 {
 	struct cpdma_params *cpdma_params = &ctlr->params;
 	struct cpdma_desc_pool *pool;
-- 
2.17.0

* Re: [PATCH] net: ethernet: ti: davinci_cpdma: make function cpdma_desc_pool_create static
From: Grygorii Strashko @ 2018-06-21 17:22 UTC (permalink / raw)
  To: Colin King, David S . Miller, Florian Fainelli, Ivan Khoronzhuk,
	linux-omap, netdev, netdev
  Cc: kernel-janitors, linux-kernel
In-Reply-To: <20180621171645.29734-1-colin.king@canonical.com>

Please add netdev@vger.kernel.org in the future.

On 06/21/2018 12:16 PM, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> The function cpdma_desc_pool_create is local to the source and does not
> need to be in global scope, so make it static.
> 
> Cleans up sparse warning:
> warning: symbol 'cpdma_desc_pool_create' was not declared. Should it
> be static?
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>   drivers/net/ethernet/ti/davinci_cpdma.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
> index cdbddf16dd29..4f1267477aa4 100644
> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> @@ -205,7 +205,7 @@ static void cpdma_desc_pool_destroy(struct cpdma_ctlr *ctlr)
>    * devices (e.g. cpsw switches) use plain old memory.  Descriptor pools
>    * abstract out these details
>    */
> -int cpdma_desc_pool_create(struct cpdma_ctlr *ctlr)
> +static int cpdma_desc_pool_create(struct cpdma_ctlr *ctlr)
>   {
>   	struct cpdma_params *cpdma_params = &ctlr->params;
>   	struct cpdma_desc_pool *pool;
> 

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>

-- 
regards,
-grygorii

* Re: [PATCH net-next 0/2] fixes for ipsec selftests
From: Shannon Nelson @ 2018-06-21 17:25 UTC (permalink / raw)
  To: Anders Roxell; +Cc: Networking, David Miller
In-Reply-To: <CADYN=9+DSfu+UN3d8Te71F91ZWxCFUS0RBJLdzO4M0hZUonPiA@mail.gmail.com>

On 6/21/2018 9:56 AM, Anders Roxell wrote:
> On Thu, 21 Jun 2018 at 02:32, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>
>> On 6/20/2018 4:18 PM, Anders Roxell wrote:
>>> On Thu, 21 Jun 2018 at 00:26, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>>>
>>>> On 6/20/2018 12:09 PM, Anders Roxell wrote:
>>>>> On Wed, 20 Jun 2018 at 07:42, Shannon Nelson <shannon.nelson@oracle.com> wrote:
>>>>>>
>>>>>> A couple of bad behaviors in the ipsec selftest were pointed out
>>>>>> by Anders Roxell <anders.roxell@linaro.org> and are addressed here.
>>>>>>
>>>>>> Shannon Nelson (2):
>>>>>>      selftests: rtnetlink: hide complaint from terminated monitor
>>>>>>      selftests: rtnetlink: use a local IP address for IPsec tests
>>>>>>
>>>>>>     tools/testing/selftests/net/rtnetlink.sh | 11 +++++++----
>>>>>>     1 file changed, 7 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> --
>>>>>> 2.7.4
>>>>>>
>>>>>
>>>>> Hi Shannon,
>>>>>
>>>>> With this patches applied and my config patch.
>>>>>
>>>>> I still get this error when I run the ipsec test:
>>>>>
>>>>> FAIL: can't add fou port 7777, skipping test
>>>>> RTNETLINK answers: Operation not supported
>>>>> FAIL: can't add macsec interface, skipping test
>>>>> RTNETLINK answers: Protocol not supported
>>>>> RTNETLINK answers: No such process
>>>>> RTNETLINK answers: No such process
>>>>> FAIL: ipsec
>>>>
>>>> One of the odd things I noticed about this script is that there really
>>>> aren't any diagnosis messages, just PASS or FAIL.  I followed this
>>>> custom when I added the ipsec tests, but I think this is something that
>>>> should change so we can get some idea of what breaks.
>>>>
>>>> I'm curious about the "RTNETLINK answers" messages and where they might
>>>> be coming from, especially "RTNETLINK answers: Protocol not supported".
>>>
>>> I added: "set -x" in the beginning of the rtnetlink.sh script.
>>> + ip x s add proto esp src 10.66.17.140 dst 10.66.17.141 spi 0x07 mode
>>> transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))'
>>> 0x3132333435
>>> 363738393031323334353664636261 128 sel src 10.66.17.140/24 dst 10.66.17.141/24
>>> RTNETLINK answers: Protocol not supported
>>
>> Okay, so ip didn't like this command...
>>
>>>> What are the XFRM and AES settings in your kernel config - what is the
>>>> output from
>>>>           egrep -i "xfrm|_aes" .config
>>>
>>> CONFIG_XFRM=y
>>> CONFIG_XFRM_ALGO=y
>>> CONFIG_XFRM_USER=y
>>> CONFIG_INET_XFRM_MODE_TUNNEL=y
>>> CONFIG_INET6_XFRM_MODE_TRANSPORT=y
>>> CONFIG_INET6_XFRM_MODE_TUNNEL=y
>>> CONFIG_INET6_XFRM_MODE_BEET=y
>>> CONFIG_CRYPTO_AES=y
>>
>> And this is probably why - there seem to be a few config variables
>> missing, including CONFIG_INET_XFRM_MODE_TRANSPORT, which might be why
>> the ip command fails above.
>>
>> Here's what I have in my config:
>> CONFIG_XFRM=y
>> CONFIG_XFRM_OFFLOAD=y
>> CONFIG_XFRM_ALGO=m
>> CONFIG_XFRM_USER=m
>> # CONFIG_XFRM_SUB_POLICY is not set
>> # CONFIG_XFRM_MIGRATE is not set
>> CONFIG_XFRM_STATISTICS=y
>> CONFIG_XFRM_IPCOMP=m
>> CONFIG_INET_XFRM_TUNNEL=m
>> CONFIG_INET_XFRM_MODE_TRANSPORT=m
>> CONFIG_INET_XFRM_MODE_TUNNEL=m
>> CONFIG_INET_XFRM_MODE_BEET=m
>> CONFIG_INET6_XFRM_TUNNEL=m
>> CONFIG_INET6_XFRM_MODE_TRANSPORT=m
>> CONFIG_INET6_XFRM_MODE_TUNNEL=m
>> CONFIG_INET6_XFRM_MODE_BEET=m
>> CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION=m
>> CONFIG_SECURITY_NETWORK_XFRM=y
>> CONFIG_CRYPTO_AES=y
>> # CONFIG_CRYPTO_AES_TI is not set
>> CONFIG_CRYPTO_AES_X86_64=m
>> CONFIG_CRYPTO_AES_NI_INTEL=m
>> CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64=m
>> CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
>> CONFIG_CRYPTO_DEV_PADLOCK_AES=m
>>
>> Can I talk you into adding CONFIG_INET_XFRM_MODE_TRANSPORT to your
>> config
> 
> Yes you can.
> 
>> and trying again?
> 
> same issue with CONFIG_INET_XFRM_MODE_TRANSPORT=y

Interesting.  I took only CONFIG_INET_XFRM_MODE_TRANSPORT out of my 
config and was able to see the "Protocol not supported" message.  I'm 
not familiar enough with the crypto algorithm setup, but I suspect 
that a combination of the other missing CONFIGs is needed along 
with CONFIG_INET_XFRM_MODE_TRANSPORT.

My knee-jerk reaction voice wants to say this is the test working as 
expected, pointing out to us that the kernel config is not up to what it 
should be.  However, perhaps a better answer is that the test should be 
reworked to just skip the rest if it can't set up the expected test 
environment, as is done in the macsec case.

So the remaining question is: should the test be marked as failed, 
as in the macsec test when it can't set up its interface, or just skipped?

sln

> 
> Cheers,
> Anders
> 

^ permalink raw reply

* Re: [PATCH rdma-next 0/2] RoCE ICRC counter
From: Jason Gunthorpe @ 2018-06-21 17:43 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Mark Bloch,
	Talat Batheesh, Saeed Mahameed, linux-netdev
In-Reply-To: <20180621123756.32645-1-leon@kernel.org>

On Thu, Jun 21, 2018 at 03:37:54PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Hi,
> 
> This series exposes RoCE ICRC counter through existing RDMA hw_counters
> sysfs interface.
> 
> First patch has all HW definitions in mlx5_ifc.h file and second patch is
> actual counter implementation.

The RDMA parts are OK, can you please send me the commit for the mlx5
patch when applied?

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH mlx5-next 1/2] net/mlx5: Add RoCE RX ICRC encapsulated counter
From: Leon Romanovsky @ 2018-06-21 17:53 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: RDMA mailing list, Mark Bloch, Talat Batheesh, Saeed Mahameed,
	linux-netdev
In-Reply-To: <20180621123756.32645-2-leon@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]

On Thu, Jun 21, 2018 at 03:37:55PM +0300, Leon Romanovsky wrote:
> From: Talat Batheesh <talatb@mellanox.com>
>
> Add capability bit in PCAM register and RoCE ICRC error counter
> to PPCNT register.
>
> Signed-off-by: Talat Batheesh <talatb@mellanox.com>
> Reviewed-by: Mark Bloch <markb@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  include/linux/mlx5/mlx5_ifc.h | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
> index b4302ccb63a6..9e8682489951 100644
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@ -1687,7 +1687,11 @@ struct mlx5_ifc_eth_extended_cntrs_grp_data_layout_bits {
>
>  	u8         rx_buffer_full_low[0x20];
>
> -	u8         reserved_at_1c0[0x600];
> +	u8         rx_icrc_encapsulated_high[0x20];
> +
> +	u8         rx_icrc_encapsulated_low[0x20];
> +
> +	u8         reserved_at_3c0[0x5c0];

reserved_at_3c0 should be reserved_at_200 (0x1c0 + 2 * 0x20 = 0x200); fixed
and applied to mlx5-next.

Commit 0af5107cd0640ee3424e337b492e4b11b450ce28 in mlx5-next.

Thanks
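
The offset correction above is plain bit arithmetic, and a minimal sketch
makes it easy to check. The helper name reserved_field() is hypothetical and
only illustrates the naming convention visible in the diff, where a reserved
field is named after its starting bit offset; it is not part of mlx5_ifc.h.

```python
# Illustrative only: reserved_field() is a hypothetical helper, not kernel code.
# It models carving new fields out of the start of a reserved area whose name
# encodes its starting bit offset (reserved_at_<offset>[<size>]).

def reserved_field(old_offset, old_size, added_sizes):
    """Return (offset, size) in bits of the reserved area left after the new fields."""
    used = sum(added_sizes)
    if used > old_size:
        raise ValueError("new fields do not fit in the reserved area")
    return old_offset + used, old_size - used

# Values taken from the diff: reserved_at_1c0[0x600] loses two 0x20-bit fields.
offset, size = reserved_field(0x1c0, 0x600, [0x20, 0x20])
name = f"reserved_at_{offset:x}[{size:#x}]"
print(name)
```

The size in the patch (0x5c0) already matched; only the name was off, since
0x3c0 is not where the remaining reserved area starts.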

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* [PATCH net v2] cls_flower: fix use after free in flower S/W path
From: Paolo Abeni @ 2018-06-21 18:02 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Marcelo Ricardo Leitner,
	Paul Blakey

If a flower filter is created without the skip_sw flag, fl_mask_put()
can race with fl_classify() and we can destroy the mask rhashtable
while a lookup operation is accessing it.

 BUG: unable to handle kernel paging request at 00000000000911d1
 PGD 0 P4D 0
 SMP PTI
 CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
 RIP: 0010:rht_bucket_nested+0x20/0x60
 Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
 RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
 RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
 RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
 RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
 R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
 R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
 FS:  0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
 Call Trace:
  fl_lookup+0x134/0x140 [cls_flower]
  fl_classify+0xf3/0x180 [cls_flower]
  tcf_classify+0x78/0x150
  __netif_receive_skb_core+0x69e/0xa50
  netif_receive_skb_internal+0x42/0xf0
  tun_get_user+0xdd5/0xfd0 [tun]
  tun_sendmsg+0x52/0x70 [tun]
  handle_tx+0x2b3/0x5f0 [vhost_net]
  vhost_worker+0xab/0x100 [vhost]
  kthread+0xf8/0x130
  ret_from_fork+0x35/0x40
 Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
 CR2: 00000000000911d1

Fix the above by waiting for an RCU grace period before destroying the
rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
must run in process context, as pointed out by Cong Wang.

v1 -> v2: use tcf_queue_work to run rhashtable_destroy().

Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/sched/cls_flower.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 2b5be42a9f1c..9e8b26a80fb3 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -66,7 +66,7 @@ struct fl_flow_mask {
 	struct rhashtable_params filter_ht_params;
 	struct flow_dissector dissector;
 	struct list_head filters;
-	struct rcu_head rcu;
+	struct rcu_work rwork;
 	struct list_head list;
 };
 
@@ -203,6 +203,20 @@ static int fl_init(struct tcf_proto *tp)
 	return rhashtable_init(&head->ht, &mask_ht_params);
 }
 
+static void fl_mask_free(struct fl_flow_mask *mask)
+{
+	rhashtable_destroy(&mask->ht);
+	kfree(mask);
+}
+
+static void fl_mask_free_work(struct work_struct *work)
+{
+	struct fl_flow_mask *mask = container_of(to_rcu_work(work),
+						 struct fl_flow_mask, rwork);
+
+	fl_mask_free(mask);
+}
+
 static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
 			bool async)
 {
@@ -210,12 +224,11 @@ static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
 		return false;
 
 	rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params);
-	rhashtable_destroy(&mask->ht);
 	list_del_rcu(&mask->list);
 	if (async)
-		kfree_rcu(mask, rcu);
+		tcf_queue_work(&mask->rwork, fl_mask_free_work);
 	else
-		kfree(mask);
+		fl_mask_free(mask);
 
 	return true;
 }
-- 
2.17.1

^ permalink raw reply related

* RE: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Vadim Pasternak @ 2018-06-21 18:14 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: davem@davemloft.net, netdev@vger.kernel.org, jiri@resnulli.us
In-Reply-To: <20180621171120.GA6830@lunn.ch>

> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Thursday, June 21, 2018 8:11 PM
> To: Vadim Pasternak <vadimp@mellanox.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> port temperature reading
> 
> > New internal API reads the temperature from all the modules, which are
> > equipped with the thermal sensor and exposes temperature according to
> > the worst measure. All individual temperature values are normalized to
> > pre-defined range.
> 
> Hi Vadim
> 
> Could you explain this normalization process. Why are you not just expose each
> sensors temperature in millidegrees C, which is the normal for HWMON.

Hi Andrew,

The temperature of each individual module can be obtained
through ethtool.
The worst temperature is necessary for the system cooling
control decision.

Up to 64 SFP/QSFP modules could be connected to the system.
Some of them could be copper modules, which don't provide
temperature measurement.
Some of them could be optical modules providing untrusted
temperature measurement, which could impact thermal
control of the system.
Also, optical modules could be from different vendors, and
it is a real situation that, for example, one module has warning and
critical thresholds of 75C and 85C while another has 70C and 80C.
In such a case the first module's temperature of 72C is better than the
second module's temperature of 71C.

And the deltas between warning and critical thresholds could be
different as well. They could be 5C, 10C, etc.

So the nominal temperature is not what matters here; we need to know the
"worst" value for the thermal control decision.

Thanks,
Vadim.
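
The threshold example above can be sketched as follows. The normalization
formula used by the patch is not shown in this thread, so the linear mapping
of the warning..critical interval onto 0.0..1.0 below is an assumption made
only for illustration, and ModuleReading, normalized() and worst_module() are
hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModuleReading:
    name: str
    temp_c: Optional[float]  # None: no sensor (e.g. copper) or invalid reading
    warn_c: float            # module-specific warning threshold
    crit_c: float            # module-specific critical threshold

def normalized(reading):
    """Assumed scale: 0.0 at the warning threshold, 1.0 at the critical one."""
    return (reading.temp_c - reading.warn_c) / (reading.crit_c - reading.warn_c)

def worst_module(readings):
    """Return the reading closest to (or furthest past) its own critical threshold."""
    usable = [r for r in readings
              if r.temp_c is not None and r.crit_c > r.warn_c]
    return max(usable, key=normalized, default=None)

# Numbers from the mail: 72C with 75C/85C thresholds vs 71C with 70C/80C.
readings = [
    ModuleReading("module1", 72.0, 75.0, 85.0),
    ModuleReading("module2", 71.0, 70.0, 80.0),
    ModuleReading("copper", None, 0.0, 0.0),
]
worst = worst_module(readings)
print(worst.name)
```

Under this assumed scale the cooler module2 is the "worst" one, because it is
already past its own warning threshold while module1 is still below its own.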

> 
>     Andrew

^ permalink raw reply

* Re: [PATCH net v2] cls_flower: fix use after free in flower S/W path
From: Jiri Pirko @ 2018-06-21 18:16 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Marcelo Ricardo Leitner,
	Paul Blakey
In-Reply-To: <fd96de4e9dc358e3982922ae681fdb1b9d8ae72a.1529603970.git.pabeni@redhat.com>

Thu, Jun 21, 2018 at 08:02:16PM CEST, pabeni@redhat.com wrote:
>If flower filter is created without the skip_sw flag, fl_mask_put()
>can race with fl_classify() and we can destroy the mask rhashtable
>while a lookup operation is accessing it.
>
> BUG: unable to handle kernel paging request at 00000000000911d1
> PGD 0 P4D 0
> SMP PTI
> CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
> Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
> RIP: 0010:rht_bucket_nested+0x20/0x60
> Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
> RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
> RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
> RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
> RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
> R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
> R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
> FS:  0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
> Call Trace:
>  fl_lookup+0x134/0x140 [cls_flower]
>  fl_classify+0xf3/0x180 [cls_flower]
>  tcf_classify+0x78/0x150
>  __netif_receive_skb_core+0x69e/0xa50
>  netif_receive_skb_internal+0x42/0xf0
>  tun_get_user+0xdd5/0xfd0 [tun]
>  tun_sendmsg+0x52/0x70 [tun]
>  handle_tx+0x2b3/0x5f0 [vhost_net]
>  vhost_worker+0xab/0x100 [vhost]
>  kthread+0xf8/0x130
>  ret_from_fork+0x35/0x40
> Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
> CR2: 00000000000911d1
>
>Fix the above waiting for a RCU grace period before destroying the
>rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
>must run in process context, as pointed out by Cong Wang.
>
>v1 -> v2: use tcf_queue_work to run rhashtable_destroy().
>
>Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority")
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-21 18:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180621165913.7e3f4faa.cohuck@redhat.com>

On Thu, Jun 21, 2018 at 04:59:13PM +0200, Cornelia Huck wrote:
> On Wed, 20 Jun 2018 22:48:58 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> > > In any case, I'm not sure anymore why we'd want the extra uuid.  
> > 
> > It's mostly so we can have e.g. multiple devices with same MAC
> > (which some people seem to want in order to then use
> > then with different containers).
> > 
> > But it is also handy for when you assign a PF, since then you
> > can't set the MAC.
> > 
> 
> OK, so what about the following:
> 
> - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
>   that we have a new uuid field in the virtio-net config space
> - in QEMU, add a property for virtio-net that allows to specify a uuid,
>   offer VIRTIO_NET_F_STANDBY_UUID if set
> - when configuring, set the property to the group UUID of the vfio-pci
>   device
> - in the guest, use the uuid from the virtio-net device's config space
>   if applicable; else, fall back to matching by MAC as done today
> 
> That should work for all virtio transports.

True. I'm a bit unhappy that it's virtio net specific though
since down the road I expect we'll have a very similar feature
for scsi (and maybe others).

But we do not have a way to have fields that are portable
both across devices and transports, and I think it would
be a useful addition. How would this work though? Any idea?

-- 
MST

^ permalink raw reply

* Re: [PATCH] net: nixge: Add __packed attribute to DMA descriptor struct
From: Moritz Fischer @ 2018-06-21 18:30 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, mdf, keescook, netdev, linux-kernel
In-Reply-To: <20180620.073750.642289685695664600.davem@davemloft.net>

Hi David,

On Wed, Jun 20, 2018 at 07:37:50AM +0900, David Miller wrote:
> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Tue, 19 Jun 2018 10:13:55 -0700
> 
> > How could padding be inserted given than all of the structure members
> > are naturally aligned (all u32 type). Compiler bug?
> 
> Agreed, this looks completely unnecessary.
> 
> __packed should only be used when absolutely necessary because using
> it generates less efficient code on some architectures.

Thanks for your input, will fix with the whole series when I submit it.

- Moritz

^ permalink raw reply

* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Andrew Lunn @ 2018-06-21 18:34 UTC (permalink / raw)
  To: Vadim Pasternak, Guenter Roeck
  Cc: davem@davemloft.net, netdev@vger.kernel.org, jiri@resnulli.us
In-Reply-To: <HE1PR0502MB37531F5503D85EB153A6C672A2760@HE1PR0502MB3753.eurprd05.prod.outlook.com>

> Hi Andrew,

Adding Guenter Roeck, the HWMON maintainer.

> The temperature of each individual module can be obtained
> through ethtool.

You mean via --module-info?

FYI: I plan to add hwmon support to the kernel SFP code. So if you
ever decide to swap to the kernel SFP code, not your own, the raw
temperatures will be exported.

> The worst temperature is necessary for the system cooling
> control decision.

I would expect the system cooling would understand that.

> Up to 64 SFP/QSFP modules could be connected to the system.
> Some of them could cooper modules, which doesn't provide
> temperature measurement.

SFP modules are hot-pluggable. So I would also expect the hwmon devices
to hotplug. If there is no sensor, then there is no hwmon device... If
there is no hwmon device, it plays no part in the thermal control
loop.

> Some of them could be optical modules, providing untrusted
> temperature measurement, which could impact thermal
> control of the system.

Why would you not trust it? Are you saying some modules simply have
broken temperature sensors? Do you have a whitelist/blacklist of
modules?

> Also optical modules could be from the different vendors,  and
> this is real situation, when, f.e. one module has the warning and
> critical thresholds 75C and 85C, while another 70C and 80C.

But hwmon exports both the actual temperature and the alarm
temperatures. I would expect the thermal control code to use all this
information when making its decisions, not just the current
temperature.

> So, nominal temperature is not the case here, we should know the
> "worst" value for the thermal control decision.

What it sounds like to me is you are working around problems in the
thermal control by fudging the raw temperatures. That is the wrong
thing to do. hwmon should export the raw data, and you should fix the
thermal control code to use it correctly.

	Andrew

^ permalink raw reply

* [PATCH] net: phy: Allow compile test of GPIO consumers if !GPIOLIB
From: Geert Uytterhoeven @ 2018-06-21 18:58 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David S . Miller
  Cc: netdev, linux-kernel, Geert Uytterhoeven

The GPIO subsystem provides dummy GPIO consumer functions if GPIOLIB is
not enabled. Hence drivers that depend on GPIOLIB, but use GPIO consumer
functionality only, can still be compiled if GPIOLIB is not enabled.

Relax the dependency on GPIOLIB if COMPILE_TEST is enabled, where
appropriate.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
---
v3:
  - Rebased,

v2:
  - Add Acked-by.
---
 drivers/net/phy/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 343989f9f9d981e2..ceede09a28459f45 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -92,7 +92,8 @@ config MDIO_CAVIUM

 config MDIO_GPIO
 	tristate "GPIO lib-based bitbanged MDIO buses"
-	depends on MDIO_BITBANG && GPIOLIB
+	depends on MDIO_BITBANG
+	depends on GPIOLIB || COMPILE_TEST
 	---help---
 	  Supports GPIO lib-based MDIO busses.

-- 
2.17.1

^ permalink raw reply related

* RE: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Vadim Pasternak @ 2018-06-21 19:17 UTC (permalink / raw)
  To: Andrew Lunn, Guenter Roeck
  Cc: davem@davemloft.net, netdev@vger.kernel.org, jiri@resnulli.us
In-Reply-To: <20180621183440.GA10038@lunn.ch>



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Thursday, June 21, 2018 9:35 PM
> To: Vadim Pasternak <vadimp@mellanox.com>; Guenter Roeck <linux@roeck-
> us.net>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> port temperature reading
> 
> > Hi Andrew,
> 
> Adding Guenter Roeck, the HWMON maintainer.
> 
> > The temperature of each individual module can be obtained through
> > ethtool.
> 
> You mean via --module-info?

Yes.

> 
> FYI: I plan to add hwmon support to the kernel SFP code. So if you ever decide to
> swap to the kernel SFP code, not your own, the raw temperatures will be
> exported.
> 

Not sure it'll work for us, since we read SFP/QSFP ports through our SW/FW
interface.
But it would be nice if you could provide a reference to this code.

> > The worst temperature is necessary for the system cooling control
> > decision.
> 
> I would expect the system cooling would understand that.
> 

In the thermal zone infrastructure there is one temperature input.
How can you consider 64+ different inputs?

> > Up to 64 SFP/QSFP modules could be connected to the system.
> > Some of them could cooper modules, which doesn't provide temperature
> > measurement.
> 
> SFP modules are hot-plugable. So i would also expect the hwmon devices to
> hotplug. If there is no sensor, then there is no hwmon device... If there is no
> hwmon device, it plays no part in the thermal control loop.
> 
> > Some of them could be optical modules, providing untrusted temperature
> > measurement, which could impact thermal control of the system.
> 
> Why would you not trust it? Are you saying some modules simply have broken
> temperature sensors? Do you have a whitelist/blacklist of modules?
> 

We are reading temperature info through the firmware.
In the case of a "broken" module (one that is supposed to be capable of
reading temperature, but returns an invalid code), we'll get
an error code.

> > Also optical modules could be from the different vendors,  and this is
> > real situation, when, f.e. one module has the warning and critical
> > thresholds 75C and 85C, while another 70C and 80C.
> 
> But hwmon exports both the actual temperature and the alarm temperatures. I
> would expect the thermal control code to use all this information when making
> its decisions, not just the current temperature.
> 

All information is used, but the decision to increase fan speed is taken
based on the worst measurement, which is logical.

> > So, nominal temperature is not the case here, we should know the
> > "worst" value for the thermal control decision.
> 
> What it sounds like to me is you are working around problems in the thermal
> control by fudging the raw temperatures. That is the wrong thing to do. hwmon
> should export the raw data, and you should fix the thermal control code to use it
> correctly.

By default we are using the kernel step-wise thermal algorithm, considering
the temperatures of all the module and ASIC ambient sensors. This is not a
workaround. In the thermal zone we have one PWM control and a cumulative
temperature from the modules and ASIC. And it gives stable and correct results.

> 
> 	Andrew

^ permalink raw reply

* Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Andrew Lunn @ 2018-06-21 19:49 UTC (permalink / raw)
  To: Vadim Pasternak
  Cc: Guenter Roeck, davem@davemloft.net, netdev@vger.kernel.org,
	jiri@resnulli.us
In-Reply-To: <HE1PR0502MB37537B5DCD0D607DFB7C7099A2760@HE1PR0502MB3753.eurprd05.prod.outlook.com>

On Thu, Jun 21, 2018 at 07:17:03PM +0000, Vadim Pasternak wrote:
> 
> 
> > -----Original Message-----
> > From: Andrew Lunn [mailto:andrew@lunn.ch]
> > Sent: Thursday, June 21, 2018 9:35 PM
> > To: Vadim Pasternak <vadimp@mellanox.com>; Guenter Roeck <linux@roeck-
> > us.net>
> > Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> > Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> > port temperature reading
> > 
> > > Hi Andrew,
> > 
> > Adding Guenter Roeck, the HWMON maintainer.
> > 
> > > The temperature of each individual module can be obtained through
> > > ethtool.
> > 
> > You mean via --module-info?
> 
> Yes.
> 
> > 
> > FYI: I plan to add hwmon support to the kernel SFP code. So if you ever decide to
> > swap to the kernel SFP code, not your own, the raw temperatures will be
> > exported.
> > 
> 
> Not sure it'll work for us, since we read SFP/QSFP ports through our SW/FW
> interface.

Can you make fake i2c busses? Pass the i2c transactions to the
firmware?

> But would be nice if you can provide some reference to this code.

drivers/net/phy/sfp.c

> 
> > > The worst temperature is necessary for the system cooling control
> > > decision.
> > 
> > I would expect the system cooling would understand that.
> > 
> 
> In thermal zone infrastructure there is one temperature input.
> How you can consider 64+ different inputs?

I've never used the thermal zone code. But I've used boards with 4
sensors spread around them. If the thermal zone code could not support
that, I would be surprised.

[Goes away and reads https://www.kernel.org/doc/Documentation/thermal/sysfs-api.txt]

So it sounds like one zone is one sensor. So you actually have
hot-pluggable zones, up to 64 of them. It also looks like you can bind
a zone to a cooling device. There does not seem to be a 1:1
mapping. So you should be able to bind 64 zones to one fan. Or if you
have multiple fans, bind a zone to the nearest fan.

But as I said, I'm no expert on this. You really should be posting
these patches on the hwmon list and the linux-pm list. The netdev list
does not have the needed specialists. Once Rui Zhang, Eduardo Valentin,
and Guenter Roeck have given them Acked-by, David Miller can then
merge them.

      Andrew

^ permalink raw reply

* Re: Route fallback issue
From: Julian Anastasov @ 2018-06-21 19:57 UTC (permalink / raw)
  To: Grant Taylor; +Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald
In-Reply-To: <0a920d2d-4e97-284b-9aad-54cf75bcb755@spamtrap.tnetconsulting.net>


	Hello,

On Wed, 20 Jun 2018, Grant Taylor wrote:

> On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> > You can also try alternative routes.
> 
> "Alternative routes"?  I can't say as I've heard that description as a
> specific technique / feature / capability before.
> 
> Is that it's official name?

	I think so

> Where can I find out more about it?

	You can search the net. I have some old docs on
these issues; they should still be current:

http://ja.ssi.bg/dgd-usage.txt

> > But as the kernel supports only default alternative routes, you can put them
> > in their own table:
> 
> I don't know that that is the case any more.
> 
> I was able to issue the following commands without a problem:
> 
> # ip route append 192.0.2.128/26 via 192.0.2.62
> # ip route append 192.0.2.128/26 via 192.0.2.126
> 
> I crated two network namespaces and had a pair of vEths between them
> (192.0.2.0/26 and 192.0.2.64/26).  I added a dummy network to each NetNS
> (192.0.2.128/26 and 192.0.2.192/26).
> 
> I ran the following commands while a persistent ping was running from one
> NetNS to the IP on the other's dummy0 interface:
> 
> # ip link set ns2b up && ip route append 192.0.2.192/26 via 192.0.2.126 && ip
> link set ns2a down
> (pause and watch things)
> # ip link set ns2a up && ip route append 192.0.2.192/26 via 192.0.2.62 && ip
> link set ns2b down
> (pause and watch things)
> 
> I could iterate between the two above commands and pings continued to work.
> 
> So, I think that it's now possible to use "alternate routes" (new to me) on
> specific prefixes in addition to the default.  Thus there is no longer any
> need for a separate table and the associated IP rule.

	Not true. net/ipv4/fib_semantics.c:fib_select_path()
calls fib_select_default() only when prefixlen = 0 (default route).
Otherwise, only the first route will be considered.

	fib_select_default() is the function that decides which
nexthop is reachable and whether to contact it. It uses the ARP
state via fib_detect_death(). That is all the code behind this
feature called "alternative routes": the kernel selects one
based on the nexthop's ARP state. Routes with a different metric are
considered only when the routes with a lower metric are removed.

> I'm running kernel version 4.9.76.
> 
> I did go ahead and set net.ipv4.conf.ns2b.ignore_routes_with_linkdown to 1.
> 
> for i in /proc/sys/net/ipv4/conf/*/ignore_routes_with_linkdown; do echo 1 >
> $i; done

	IIRC, this flag invalidates nexthops depending on
the link state. If your link is always UP it does not help
much. If you rely on a user space tool, you can check the state
of the desired hops: the device link state, your gateway to the
ISP, and one or more gateways in the ISP network which you
consider a permanent part of the path via this ISP.

> Doing that dropped the number of dropped pings from 60 ~ 90 (1 / second) to 0
> ~ 5 (1 / second).  (Rarely, maybe 1 out of 20 flips, would it take upwards of
> 10 pings / seconds.)
> 
> > # Alternative routes use same metric!!!
> > ip route append default via 192.168.1.254 dev eno1 table 100
> > ip route append default via 192.168.2.254 dev eno2 table 100
> > ip rule add prio 100 to 172.16.0.0/12 table 100
> 
> I did have to "append" the route.  I couldn't just "add" the route. When I
> tried to "add" the second route, I got an error about the route already
> existing.  Using "append" instead of "add" with everything else the same
> worked just fine.
> 
> Note:  I did go ahead and remove the single route that was added via "add" and
> used "append" for both.

	The first route can be created with "add", but all subsequent
alternative routes can be added only with "append". If you
successfully add them with "add" it means they are not
alternatives to the first one; they are not considered at all.

Regards
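
The selection rules described above can be sketched as a toy model. This is
not the kernel code: Route and select_route() are hypothetical names, the
"reachable" set stands in for the nexthop ARP state, and falling back to the
first route when no nexthop looks alive is a simplifying assumption.

```python
from dataclasses import dataclass

@dataclass
class Route:
    prefixlen: int   # 0 means a default route
    gateway: str
    metric: int = 0

def select_route(routes, reachable):
    """Pick a route among entries for one prefix, listed in insertion order.

    reachable: set of gateway addresses assumed alive (stand-in for ARP state).
    """
    if not routes:
        return None
    # Only routes sharing the lowest metric are candidates at all.
    best_metric = min(r.metric for r in routes)
    candidates = [r for r in routes if r.metric == best_metric]
    first = candidates[0]
    if first.prefixlen != 0:
        # Non-default prefix: only the first route is considered.
        return first
    # Default route: prefer the first alternative whose nexthop looks alive.
    for route in candidates:
        if route.gateway in reachable:
            return route
    return first  # assumption: nothing looks alive, keep the first route

default = [Route(0, "192.168.1.254"), Route(0, "192.168.2.254")]
specific = [Route(26, "192.0.2.62"), Route(26, "192.0.2.126")]
print(select_route(default, {"192.168.2.254"}).gateway)
print(select_route(specific, {"192.0.2.126"}).gateway)
```

In this model the default routes fail over to the second gateway, while the
/26 routes keep using the first gateway even though only the second one is
marked alive, which is the difference pointed out above.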

^ permalink raw reply

* RE: [PATCH v0 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Vadim Pasternak @ 2018-06-21 20:02 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Guenter Roeck, davem@davemloft.net, netdev@vger.kernel.org,
	jiri@resnulli.us
In-Reply-To: <20180621194917.GC10038@lunn.ch>



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Thursday, June 21, 2018 10:49 PM
> To: Vadim Pasternak <vadimp@mellanox.com>
> Cc: Guenter Roeck <linux@roeck-us.net>; davem@davemloft.net;
> netdev@vger.kernel.org; jiri@resnulli.us
> Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment module for
> port temperature reading
> 
> On Thu, Jun 21, 2018 at 07:17:03PM +0000, Vadim Pasternak wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Lunn [mailto:andrew@lunn.ch]
> > > Sent: Thursday, June 21, 2018 9:35 PM
> > > To: Vadim Pasternak <vadimp@mellanox.com>; Guenter Roeck
> > > <linux@roeck- us.net>
> > > Cc: davem@davemloft.net; netdev@vger.kernel.org; jiri@resnulli.us
> > > Subject: Re: [PATCH v0 03/12] mlxsw: core: Add core environment
> > > module for port temperature reading
> > >
> > > > Hi Andrew,
> > >
> > > Adding Guenter Roeck, the HWMON maintainer.
> > >
> > > > The temperature of each individual module can be obtained through
> > > > ethtool.
> > >
> > > You mean via --module-info?
> >
> > Yes.
> >
> > >
> > > FYI: I plan to add hwmon support to the kernel SFP code. So if you
> > > ever decide to swap to the kernel SFP code, not your own, the raw
> > > temperatures will be exported.
> > >
> >
> > Not sure it'll work for us, since we read SFP/QSFP ports through our
> > SW/FW interface.
> 
> Can you make fake i2c busses? Pass the i2c transactions to the firmware?

Theoretically yes.
But we have a well-defined SW/FW interface working over PCI, and the FW at
the ASIC end implements the I2C master.
> 
> > But would be nice if you can provide some reference to this code.
> 
> drivers/net/phy/sfp.c
> 

Thank you.

> >
> > > > The worst temperature is necessary for the system cooling control
> > > > decision.
> > >
> > > I would expect the system cooling would understand that.
> > >
> >
> > In thermal zone infrastructure there is one temperature input.
> > How you can consider 64+ different inputs?
> 
> I've never used the thermal zone code. But i've used boards with 4 sensors
> spread around it. If the thermal zone code could not support that, i would be
> surprised.
> 
> [Goes away and reads
> https://www.kernel.org/doc/Documentation/thermal/sysfs-api.txt]
> 
> So it sounds like, one zone is one sensor. So you actually have hot-plugable
> zones, up to 64 of them. It also looks like you can bind a zone to a cooling
> device. There does not seem to be a 1:1 mapping. So you should be able to bind
> 64 zones to one fan. Or if you have multiple fans, bind a zone to the nearest fan.
> 

It means I will have 64 thermal zones, one for each module (actually,
some upcoming systems will support 128 modules), plus a thermal zone
for the ASIC ambient temperature.
And each zone will try to control the same PWM.
As a result the PWM will be extremely jumpy and ineffective.

> But as i said, i'm no expert on this. You really should be posting these patches on
> the hwmon list and the linux-pm list. The netdev list does not have the needed
> specialist. Once Rui Zhang, Eduardo Valentin, and Guenter Roack have given
> them Acked-by, David Miller can then merge them.

Thanks,
Vadim.

> 
>       Andrew

^ permalink raw reply
