Netdev List


* Re: [PATCHv2 net-next 0/5] sctp: clean up __sctp_connect function
From: Marcelo Ricardo Leitner @ 2019-07-30 19:42 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, Neil Horman, davem
In-Reply-To: <cover.1564490276.git.lucien.xin@gmail.com>

On Tue, Jul 30, 2019 at 08:38:18PM +0800, Xin Long wrote:
> This patchset is to factor out some common code for
> sctp_sendmsg_new_asoc() and __sctp_connect() into 2
> new functions.
> 
> v1->v2:
>   - add the patch 1/5 to avoid a slab-out-of-bounds warning.
>   - add some code comment for the check change in patch 2/5.
>   - remove unused 'addrcnt' as Marcelo noticed in patch 3/5.
> 
> Xin Long (5):
>   sctp: only copy the available addr data in sctp_transport_init
>   sctp: check addr_size with sa_family_t size in
>     __sctp_setsockopt_connectx
>   sctp: clean up __sctp_connect
>   sctp: factor out sctp_connect_new_asoc
>   sctp: factor out sctp_connect_add_peer

Series,
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

^ permalink raw reply

* Re: [PATCH net v2] ipvs: Improve robustness to the ipvs sysctl
From: Julian Anastasov @ 2019-07-30 19:29 UTC (permalink / raw)
  To: hujunwei
  Cc: wensong, horms, pablo, kadlec, Florian Westphal, davem,
	Florian Westphal, netdev@vger.kernel.org, lvs-devel,
	netfilter-devel, coreteam, Mingfangsen, wangxiaogang3, xuhanbing
In-Reply-To: <4a0476d3-57a4-50e0-cae8-9dffc4f4d556@huawei.com>


	Hello,

On Tue, 30 Jul 2019, hujunwei wrote:

> From: Junwei Hu <hujunwei4@huawei.com>
> 
> The ipvs module parses the user buffer and saves it to sysctl,
> then checks whether the value is valid. An invalid value is therefore
> visible for a period of time.
> Here, I add a variable, struct ctl_table tmp, used to read
> the value from the user buffer, and save only when it is valid.
> I delete proc_do_sync_mode and use extra1/2 in table for the
> proc_dointvec_minmax call.
> 
> Fixes: f73181c8288f ("ipvs: add support for sync threads")
> Signed-off-by: Junwei Hu <hujunwei4@huawei.com>

	Looks good to me, thanks!

Acked-by: Julian Anastasov <ja@ssi.bg>

	BTW, why ip_vs_zero_all everywhere? Due to old git version?

> ---
> V1->V2:
> - delete proc_do_sync_mode and use proc_dointvec_minmax call.
> ---
>  net/netfilter/ipvs/ip_vs_ctl.c | 69 +++++++++++++++++++++---------------------
>  1 file changed, 35 insertions(+), 34 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 060565e..7aed7b0 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -1737,12 +1737,18 @@ static int ip_vs_zero_all(struct netns_ipvs *ipvs)
>  	int val = *valp;
>  	int rc;
> 
> -	rc = proc_dointvec(table, write, buffer, lenp, ppos);
> +	struct ctl_table tmp = {
> +		.data = &val,
> +		.maxlen = sizeof(int),
> +		.mode = table->mode,
> +	};
> +
> +	rc = proc_dointvec(&tmp, write, buffer, lenp, ppos);
>  	if (write && (*valp != val)) {
> -		if ((*valp < 0) || (*valp > 3)) {
> -			/* Restore the correct value */
> +		if (val < 0 || val > 3) {
> +			rc = -EINVAL;
> +		} else {
>  			*valp = val;
> -		} else {
>  			update_defense_level(ipvs);
>  		}
>  	}
> @@ -1756,33 +1762,20 @@ static int ip_vs_zero_all(struct netns_ipvs *ipvs)
>  	int *valp = table->data;
>  	int val[2];
>  	int rc;
> +	struct ctl_table tmp = {
> +		.data = &val,
> +		.maxlen = table->maxlen,
> +		.mode = table->mode,
> +	};
> 
> -	/* backup the value first */
>  	memcpy(val, valp, sizeof(val));
> -
> -	rc = proc_dointvec(table, write, buffer, lenp, ppos);
> -	if (write && (valp[0] < 0 || valp[1] < 0 ||
> -	    (valp[0] >= valp[1] && valp[1]))) {
> -		/* Restore the correct value */
> -		memcpy(valp, val, sizeof(val));
> -	}
> -	return rc;
> -}
> -
> -static int
> -proc_do_sync_mode(struct ctl_table *table, int write,
> -		     void __user *buffer, size_t *lenp, loff_t *ppos)
> -{
> -	int *valp = table->data;
> -	int val = *valp;
> -	int rc;
> -
> -	rc = proc_dointvec(table, write, buffer, lenp, ppos);
> -	if (write && (*valp != val)) {
> -		if ((*valp < 0) || (*valp > 1)) {
> -			/* Restore the correct value */
> -			*valp = val;
> -		}
> +	rc = proc_dointvec(&tmp, write, buffer, lenp, ppos);
> +	if (write) {
> +		if (val[0] < 0 || val[1] < 0 ||
> +		    (val[0] >= val[1] && val[1]))
> +			rc = -EINVAL;
> +		else
> +			memcpy(valp, val, sizeof(val));
>  	}
>  	return rc;
>  }
> @@ -1795,12 +1788,18 @@ static int ip_vs_zero_all(struct netns_ipvs *ipvs)
>  	int val = *valp;
>  	int rc;
> 
> -	rc = proc_dointvec(table, write, buffer, lenp, ppos);
> +	struct ctl_table tmp = {
> +		.data = &val,
> +		.maxlen = sizeof(int),
> +		.mode = table->mode,
> +	};
> +
> +	rc = proc_dointvec(&tmp, write, buffer, lenp, ppos);
>  	if (write && (*valp != val)) {
> -		if (*valp < 1 || !is_power_of_2(*valp)) {
> -			/* Restore the correct value */
> +		if (val < 1 || !is_power_of_2(val))
> +			rc = -EINVAL;
> +		else
>  			*valp = val;
> -		}
>  	}
>  	return rc;
>  }
> @@ -1860,7 +1859,9 @@ static int ip_vs_zero_all(struct netns_ipvs *ipvs)
>  		.procname	= "sync_version",
>  		.maxlen		= sizeof(int),
>  		.mode		= 0644,
> -		.proc_handler	= proc_do_sync_mode,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ZERO,
> +		.extra2		= SYSCTL_ONE,
>  	},
>  	{
>  		.procname	= "sync_ports",
> -- 
> 1.7.12.4

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply
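
The pattern the ipvs patch above adopts (parse into a temporary, validate, and only then commit) can be sketched outside the kernel. This is a minimal user-space sketch under stated assumptions, not the ip_vs_ctl.c code: set_bounded_int() and its parameters are hypothetical names, and strtol() merely stands in for proc_dointvec().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse `text` into a temporary, validate it, and
 * store it in *valp only if it lies in [min, max]. On failure *valp is
 * left untouched, so a reader never observes an invalid value.
 */
static int set_bounded_int(int *valp, const char *text, int min, int max)
{
	char *end;
	long tmp;

	errno = 0;
	tmp = strtol(text, &end, 10);
	if (errno || end == text || *end != '\0')
		return -EINVAL;
	if (tmp < min || tmp > max)
		return -EINVAL;

	*valp = (int)tmp;	/* commit only after validation */
	return 0;
}
```

The design point is the same as in the patch: the live variable is written exactly once, after the range check, instead of being written first and restored on failure.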

* [PATCH v5 09/29] compat_ioctl: pppoe: fix PPPOEIOCSFWD handling
From: Arnd Bergmann @ 2019-07-30 19:25 UTC (permalink / raw)
  To: Alexander Viro
  Cc: linux-fsdevel, linux-kernel, Arnd Bergmann, Guillaume Nault,
	Michal Ostrowski, David S. Miller, Dmitry Kozlov, James Chapman,
	netdev
In-Reply-To: <20190730192552.4014288-1-arnd@arndb.de>

Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in
linux-2.5.69 along with hundreds of other commands, but was always broken
since only the structure is compatible, but the command number is not,
due to the size being sizeof(size_t), or at first sizeof(sizeof(struct
sockaddr_pppox)), which is different on 64-bit architectures.

Guillaume Nault adds:

  And the implementation was broken until 2016 (see 29e73269aa4d ("pppoe:
  fix reference counting in PPPoE proxy")), and nobody ever noticed. I
  should probably have removed this ioctl entirely instead of fixing it.
  Clearly, it has never been used.

Fix it by adding a compat_ioctl handler for all pppoe variants that
translates the command number and then calls the regular ioctl function.

All other ioctl commands handled by pppoe are compatible between 32-bit
and 64-bit, and require compat_ptr() conversion.

This should apply to all stable kernels.

Acked-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ppp/pppoe.c  |  3 +++
 drivers/net/ppp/pppox.c  | 13 +++++++++++++
 drivers/net/ppp/pptp.c   |  3 +++
 fs/compat_ioctl.c        |  3 ---
 include/linux/if_pppox.h |  3 +++
 net/l2tp/l2tp_ppp.c      |  3 +++
 6 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
index 1d902ecb4aa8..a44dd3c8af63 100644
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -1115,6 +1115,9 @@ static const struct proto_ops pppoe_ops = {
 	.recvmsg	= pppoe_recvmsg,
 	.mmap		= sock_no_mmap,
 	.ioctl		= pppox_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl	= pppox_compat_ioctl,
+#endif
 };
 
 static const struct pppox_proto pppoe_proto = {
diff --git a/drivers/net/ppp/pppox.c b/drivers/net/ppp/pppox.c
index 5ef422a43d70..08364f10a43f 100644
--- a/drivers/net/ppp/pppox.c
+++ b/drivers/net/ppp/pppox.c
@@ -17,6 +17,7 @@
 #include <linux/string.h>
 #include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/compat.h>
 #include <linux/errno.h>
 #include <linux/netdevice.h>
 #include <linux/net.h>
@@ -98,6 +99,18 @@ int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 
 EXPORT_SYMBOL(pppox_ioctl);
 
+#ifdef CONFIG_COMPAT
+int pppox_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
+{
+	if (cmd == PPPOEIOCSFWD32)
+		cmd = PPPOEIOCSFWD;
+
+	return pppox_ioctl(sock, cmd, (unsigned long)compat_ptr(arg));
+}
+
+EXPORT_SYMBOL(pppox_compat_ioctl);
+#endif
+
 static int pppox_create(struct net *net, struct socket *sock, int protocol,
 			int kern)
 {
diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index a8e52c8e4128..734de7de03f7 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -623,6 +623,9 @@ static const struct proto_ops pptp_ops = {
 	.recvmsg    = sock_no_recvmsg,
 	.mmap       = sock_no_mmap,
 	.ioctl      = pppox_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = pppox_compat_ioctl,
+#endif
 };
 
 static const struct pppox_proto pppox_pptp_proto = {
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 9ea1c4981332..cec3ec0a1727 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -589,9 +589,6 @@ COMPATIBLE_IOCTL(PPPIOCDISCONN)
 COMPATIBLE_IOCTL(PPPIOCATTCHAN)
 COMPATIBLE_IOCTL(PPPIOCGCHAN)
 COMPATIBLE_IOCTL(PPPIOCGL2TPSTATS)
-/* PPPOX */
-COMPATIBLE_IOCTL(PPPOEIOCSFWD)
-COMPATIBLE_IOCTL(PPPOEIOCDFWD)
 /* Big A */
 /* sparc only */
 /* Big Q for sound/OSS */
diff --git a/include/linux/if_pppox.h b/include/linux/if_pppox.h
index 8b728750a625..69e813bcb947 100644
--- a/include/linux/if_pppox.h
+++ b/include/linux/if_pppox.h
@@ -80,6 +80,9 @@ extern int register_pppox_proto(int proto_num, const struct pppox_proto *pp);
 extern void unregister_pppox_proto(int proto_num);
 extern void pppox_unbind_sock(struct sock *sk);/* delete ppp-channel binding */
 extern int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
+extern int pppox_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg);
+
+#define PPPOEIOCSFWD32    _IOW(0xB1 ,0, compat_size_t)
 
 /* PPPoX socket states */
 enum {
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 1d0e5904dedf..c54cb59593ef 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1681,6 +1681,9 @@ static const struct proto_ops pppol2tp_ops = {
 	.recvmsg	= pppol2tp_recvmsg,
 	.mmap		= sock_no_mmap,
 	.ioctl		= pppox_ioctl,
+#ifdef CONFIG_COMPAT
+	.compat_ioctl = pppox_compat_ioctl,
+#endif
 };
 
 static const struct pppox_proto pppol2tp_proto = {
-- 
2.20.0


^ permalink raw reply related
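
To illustrate why the PPPOEIOCSFWD command number in the patch above differs between 32-bit and 64-bit callers, here is a small sketch of the ioctl number encoding. It assumes the asm-generic bit layout (as used on x86 and arm64; some architectures use different direction and size fields), and SK_IOW() and pppoeiocsfwd_for() are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Simplified copy of the asm-generic ioctl encoding:
 * bits 31-30 direction, 29-16 size, 15-8 type, 7-0 number.
 */
#define SK_IOC_WRITE 1U
#define SK_IOW(type, nr, size) \
	((SK_IOC_WRITE << 30) | ((uint32_t)(size) << 16) | \
	 ((uint32_t)(type) << 8) | (uint32_t)(nr))

/* Command number as seen by a task whose size_t is `size_bytes` wide. */
static uint32_t pppoeiocsfwd_for(unsigned int size_bytes)
{
	return SK_IOW(0xB1, 0, size_bytes);
}
```

Because the argument size is part of the command number, a 32-bit task (4-byte size_t) and a 64-bit kernel (8-byte size_t) compute different values for the same macro, which is why the compat handler has to translate the number before calling the regular ioctl function.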

* Re: [PATCHv2 net-next 1/5] sctp: only copy the available addr data in sctp_transport_init
From: Marcelo Ricardo Leitner @ 2019-07-30 19:24 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, Neil Horman, davem
In-Reply-To: <bb6e9856c2db0f24b91fb326fbe3c9c013f2459b.1564490276.git.lucien.xin@gmail.com>

On Tue, Jul 30, 2019 at 08:38:19PM +0800, Xin Long wrote:
> 'addr' passed to sctp_transport_init is not always a whole size
> of union sctp_addr, like the path:
> 
>   sctp_sendmsg() ->
>   sctp_sendmsg_new_asoc() ->
>   sctp_assoc_add_peer() ->
>   sctp_transport_new() -> sctp_transport_init()
> 
> In the next patches, we will also pass the address length of data
> only to sctp_assoc_add_peer().
> 
> So sctp_transport_init() should copy only the available data from
> addr to peer->ipaddr, instead of 'peer->ipaddr = *addr' which may
> cause slab-out-of-bounds.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  net/sctp/transport.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index e2f8e36..7235a60 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -43,8 +43,8 @@ static struct sctp_transport *sctp_transport_init(struct net *net,
>  						  gfp_t gfp)
>  {
>  	/* Copy in the address.  */
> -	peer->ipaddr = *addr;
>  	peer->af_specific = sctp_get_af_specific(addr->sa.sa_family);
> +	memcpy(&peer->ipaddr, addr, peer->af_specific->sockaddr_len);

Just for the record, transports are allocated with kzalloc() and this
shouldn't result in any remaining bytes of this buffer to be
uninitialized.

That said, unrelated to the patch, memset below and other =0's are not
necessary.

>  	memset(&peer->saddr, 0, sizeof(union sctp_addr));
>  
>  	peer->sack_generation = 0;
> -- 
> 2.1.0
> 

^ permalink raw reply
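
The copy-by-family-length idea from the sctp_transport_init() patch above can be sketched in user space. This is only an illustration under stated assumptions: union addr_sketch stands in for the kernel's union sctp_addr, and addr_len_for() is a hypothetical equivalent of af_specific->sockaddr_len.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Stand-in for the kernel's union sctp_addr. */
union addr_sketch {
	struct sockaddr sa;
	struct sockaddr_in v4;
	struct sockaddr_in6 v6;
};

/* Hypothetical equivalent of af_specific->sockaddr_len. */
static size_t addr_len_for(sa_family_t family)
{
	switch (family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Copy only the bytes the source buffer is guaranteed to hold. `src`
 * may point at a bare sockaddr_in, so a whole-union assignment through
 * a union pointer would read past the end of it.
 */
static size_t copy_addr(union addr_sketch *dst, const struct sockaddr *src)
{
	size_t len = addr_len_for(src->sa_family);

	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, len);
	return len;
}
```

The memset() here plays the role that kzalloc() plays for transports in the kernel: the bytes beyond the copied length are zero rather than uninitialized.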

* Re: [PATCH] net: bridge: Allow bridge to joing multicast groups
From: Allan W. Nielsen @ 2019-07-30 19:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, Nikolay Aleksandrov, Horatiu Vultur, roopa, davem,
	bridge, netdev, linux-kernel
In-Reply-To: <20190730143400.GO28552@lunn.ch>

The 07/30/2019 16:34, Andrew Lunn wrote:
> The whole offloading story has been you use the hardware to accelerate
> what the Linux stack can already do.
It is true, I have been quite keen on finding a way to control the forwarding of
L2-multicast which will work in the same way with and without HW acceleration
(and which we can HW offload with the HW I'm working on).

> In this case, you want to accelerate Device Level Ring, DLR.
It is actually not only for DLR; there are other ring protocols which have the
same needs. MRP (Media Redundancy Protocol) is another example.

I just used DLR as an example because this is the one we expect to implement the
protocol for first. There are other just as important use-cases.

> But i've not yet seen a software implementation of DLR. Should we really be
> considering first adding DLR to the SW bridge?
We have actually (slowly) started to work on a DLR SW implementation. We want to
do this as a Linux driver instead of a user-space implementation, because there
are other HW facilities we would like to offload (the HW has an automatic frame
generator, which can generate the beacon frames, and a unit which can terminate
the beacon frames, and generate an interrupt if the beacon frames are not
received).

Our plan was to implement this in pure SW, and then look at how to HW offload
it.

But this will take some time before we have anything meaningful to show.

> Make it an alternative to the STP code?
I'm still working on learning the details of DLR, but I actually believe that it
in some situations may co-exist with STP ;-)

DLR only works on ring topologies, but it is possible to connect a ring to a
classic STP network. If doing so, then you are supposed to run DLR on the two
ring ports, and (M)STP on the ports connecting to the remaining part of the
network.

As far as I recall, this is called a gateway node. But supporting this is
optional, and will probably not be supported in the first implementation.

> Once we have a generic implementation we can then look at how it can
> be accelerated using switchdev.
I agree with you that we need a SW implementation of DLR before we can offload
the DLR protocol to HW.

But what we are looking at here, is to offload a
non-aware-(DLR|MRP)-switch which happens to be placed in a network with
these protocols running.

It is not really DLR specific, which is why it seems reasonable to implement
this without a DLR SW implementation up front.

-- 
/Allan

^ permalink raw reply

* Re: [PATCH nf] netfilter: nf_tables: map basechain priority to hardware priority
From: Marcelo Ricardo Leitner @ 2019-07-30 19:08 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, wenxu, jiri, saeedm, gerlitz.or, paulb, netdev
In-Reply-To: <20190730105417.14538-1-pablo@netfilter.org>

On Tue, Jul 30, 2019 at 12:54:17PM +0200, Pablo Neira Ayuso wrote:
> This patch maps basechain netfilter priorities from -8192 to 8191 to
> hardware priority 0xC000 + 1. tcf_auto_prio() uses 0xC000 if the user
> specifies no priority, then it subtracts 1 for each new tcf_proto object.
> This patch uses the hardware priority range from 0xC000 to 0xFFFF for
> netfilter.

This makes more sense, thanks Pablo.
Nit below.

> +u16 nft_chain_offload_priority(struct nft_base_chain *basechain)
> +{
> +	u16 prio;
> +
> +	if (basechain->ops.priority < NFT_BASECHAIN_OFFLOAD_PRIO_MIN ||
> +	    basechain->ops.priority > NFT_BASECHAIN_OFFLOAD_PRIO_MAX)
> +		return 0;
> +
> +	/* map netfilter chain priority to hardware priority. */
> +	prio = basechain->ops.priority +
> +		NFT_BASECHAIN_OFFLOAD_PRIO_MAX +
> +			NFT_BASECHAIN_OFFLOAD_HW_PRIO_BASE;

Weird indent here.

> +
> +	return prio;
> +}

^ permalink raw reply
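
As a worked example of the priority mapping discussed above, the sketch below reproduces the arithmetic of the quoted function in plain C. The three constants are assumptions inferred from the patch description (netfilter priorities -8192 to 8191, hardware base 0xC000 + 1); they are not copied from the patch, and offload_priority() is a hypothetical stand-in for nft_chain_offload_priority().

```c
#include <stdint.h>

/* Values assumed from the patch description, not copied from the patch. */
#define PRIO_MIN	(-8192)
#define PRIO_MAX	8191
#define HW_PRIO_BASE	(0xC000 + 1)

/* Returns 0 for out-of-range priorities, as the quoted function does. */
static uint16_t offload_priority(int priority)
{
	if (priority < PRIO_MIN || priority > PRIO_MAX)
		return 0;

	/* map netfilter chain priority to hardware priority */
	return (uint16_t)(priority + PRIO_MAX + HW_PRIO_BASE);
}
```

Under these assumptions the lowest netfilter priority maps to 0xC000 and the highest to 0xFFFF, which matches the hardware range the description reserves for netfilter.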

* Re: [RFC] net: phy: read link status twice when phy_check_link_status()
From: Heiner Kallweit @ 2019-07-30 19:04 UTC (permalink / raw)
  To: liuyonglong, andrew, davem
  Cc: netdev, linux-kernel, linuxarm, salil.mehta, yisen.zhuang,
	shiju.jose
In-Reply-To: <03708d00-a8d9-4a9d-4188-9fe0e38de2b8@huawei.com>

On 30.07.2019 08:35, liuyonglong wrote:
> :/sys/kernel/debug/tracing$ cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 45/45   #P:128
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>     kworker/64:2-1028  [064] ....   172.295687: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.295726: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.296902: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.296938: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.321213: mdio_access: mii-0000:bd:00.0 read  phy:0x01 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.343209: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.343245: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.343882: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.343918: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.362658: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.385961: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.385996: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.386646: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.386681: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.411286: mdio_access: mii-0000:bd:00.2 read  phy:0x05 reg:0x00 val:0x1040
>     kworker/64:2-1028  [064] ....   172.433225: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x02 val:0x001c
>     kworker/64:2-1028  [064] ....   172.433260: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x03 val:0xc916
>     kworker/64:2-1028  [064] ....   172.433887: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x01 val:0x79ad
>     kworker/64:2-1028  [064] ....   172.433922: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x0f val:0x2000
>     kworker/64:2-1028  [064] ....   172.452862: mdio_access: mii-0000:bd:00.3 read  phy:0x07 reg:0x00 val:0x1040
>         ifconfig-1324  [011] ....   177.325585: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>   kworker/u257:0-8     [012] ....   177.325642: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x04 val:0x01e1
>   kworker/u257:0-8     [012] ....   177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
>   kworker/u257:0-8     [012] ....   177.325708: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [012] ....   177.325744: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [012] ....   177.325779: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>   kworker/u257:0-8     [012] ....   177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
>   kworker/u257:0-8     [012] ....   177.325843: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x798d

What I think happens here:
Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that then the PHY seems to have cleared
the "aneg complete" bit already, but not yet the "link up" bit. This results in the false "link up" notification.
The following patch is based on the fact that in case of enabled aneg we can't have a valid link if aneg isn't
finished. Could you please test whether this works for you?

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 6b5cb87f3..7ddd91df9 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
 	phydev->link = status & BMSR_LSTATUS ? 1 : 0;
 	phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
 
+	/* Consider the case that autoneg was started and "aneg complete"
+	 * bit has been reset, but "link up" bit not yet.
+	 */
+	if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
+		phydev->link = 0;
+
 	return 0;
 }
 EXPORT_SYMBOL(genphy_update_link);
-- 
2.22.0





>   kworker/u257:0-8     [003] ....   178.360488: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   179.384479: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   180.408477: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x7989
>   kworker/u257:0-8     [000] ....   181.432474: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79a9
>   kworker/u257:0-8     [000] ....   181.432510: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0a val:0x7800
>   kworker/u257:0-8     [000] ....   181.432546: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [000] ....   181.432582: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x05 val:0xc1e1
>   kworker/u257:0-8     [000] ....   182.456510: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   182.456546: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x0a val:0x4800
>   kworker/u257:0-8     [000] ....   182.456582: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x09 val:0x0200
>   kworker/u257:0-8     [000] ....   182.456618: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x05 val:0xc1e1
>   kworker/u257:0-8     [001] ....   183.480476: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   184.504478: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   185.528486: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>   kworker/u257:0-8     [000] ....   186.552475: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x01 val:0x79ad
>         ifconfig-1327  [011] ....   187.196036: mdio_access: mii-0000:bd:00.1 read  phy:0x03 reg:0x00 val:0x1040
>         ifconfig-1327  [011] ....   187.196046: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1840
> 
> 
[...]
>>
>>>
>>>
>>>
>>
>> Heiner
>>
>> .
>>
> 
> .
> 


^ permalink raw reply related
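
The BMSR values in the trace above can be decoded with a few lines of C, which makes the suspected race visible. This is a sketch, not the phylib code: the two bit masks follow the standard MII status register layout, and link_is_up() is a hypothetical function that mirrors the check the proposed patch adds to genphy_update_link().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values from the standard MII status register (BMSR) layout. */
#define BMSR_LSTATUS		0x0004	/* link status */
#define BMSR_ANEGCOMPLETE	0x0020	/* auto-negotiation complete */

/* Mirrors the proposed check: with autoneg enabled, a link is only
 * trusted once auto-negotiation has completed.
 */
static bool link_is_up(uint16_t bmsr, bool autoneg_enabled)
{
	bool link = bmsr & BMSR_LSTATUS;
	bool aneg_done = bmsr & BMSR_ANEGCOMPLETE;

	if (autoneg_enabled && !aneg_done)
		return false;
	return link;
}
```

Applied to the trace: 0x79ad has both bits set (link up), while 0x798d, read right after 0x1240 was written to BMCR, still has the link bit set but "aneg complete" cleared. Without the extra check that value is reported as link up; with it, the link is treated as down until 0x79ad shows up again.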

* Re: [net  1/1] tipc: fix unitilized skb list crash
From: Xin Long @ 2019-07-30 18:57 UTC (permalink / raw)
  To: Jon Maloy
  Cc: davem, netdev, tung q nguyen, hoang h le, shuali, ying xue,
	tipc-discussion
In-Reply-To: <1564510750-19531-1-git-send-email-jon.maloy@ericsson.com>



----- Original Message -----
> Our test suite sometimes provokes the following crash:
> 
> Description of problem:
> [ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at
> 00000000000000e8
> [ 1092.605072] PGD 0 P4D 0
> [ 1092.607620] Oops: 0000 [#1] SMP PTI
> [ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted
> 4.18.0-122.el8.x86_64 #1
> [ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7
> 02/08/2018
> [ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc]
> [ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7
> 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b
> 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7
> [ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282
> [ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX:
> 00000000416c0045
> [ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI:
> 0000000000000000
> [ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09:
> ffff929e39b4407a
> [ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12:
> 0000000000000000
> [ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15:
> ffff929e39b4407a
> [ 1092.692614] FS:  0000000000000000(0000) GS:ffff929e5fa80000(0000)
> knlGS:0000000000000000
> [ 1092.700702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4:
> 00000000007606e0
> [ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [ 1092.727843] PKRU: 55555554
> [ 1092.730556] Call Trace:
> [ 1092.733010]  <IRQ>
> [ 1092.735034]  tipc_sk_filter_rcv+0x7ca/0xb80 [tipc]
> [ 1092.739828]  ? __kmalloc_node_track_caller+0x1cb/0x290
> [ 1092.744974]  ? dev_hard_start_xmit+0xa5/0x210
> [ 1092.749332]  tipc_sk_rcv+0x389/0x640 [tipc]
> [ 1092.753519]  tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc]
> [ 1092.758224]  tipc_rcv+0x57a/0xf20 [tipc]
> [ 1092.762154]  ? ktime_get_real_ts64+0x40/0xe0
> [ 1092.766432]  ? tpacket_rcv+0x50/0x9f0
> [ 1092.770098]  tipc_l2_rcv_msg+0x4a/0x70 [tipc]
> [ 1092.774452]  __netif_receive_skb_core+0xb62/0xbd0
> [ 1092.779164]  ? enqueue_entity+0xf6/0x630
> [ 1092.783084]  ? kmem_cache_alloc+0x158/0x1c0
> [ 1092.787272]  ? __build_skb+0x25/0xd0
> [ 1092.790849]  netif_receive_skb_internal+0x42/0xf0
> [ 1092.795557]  napi_gro_receive+0xba/0xe0
> [ 1092.799417]  mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core]
> [ 1092.804564]  mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core]
> [ 1092.809536]  mlx5e_napi_poll+0xb2/0xce0 [mlx5_core]
> [ 1092.814415]  ? __wake_up_common_lock+0x89/0xc0
> [ 1092.818861]  net_rx_action+0x149/0x3b0
> [ 1092.822616]  __do_softirq+0xe3/0x30a
> [ 1092.826193]  irq_exit+0x100/0x110
> [ 1092.829512]  do_IRQ+0x85/0xd0
> [ 1092.832483]  common_interrupt+0xf/0xf
> [ 1092.836147]  </IRQ>
> [ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0
> [ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00
> f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8
> ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f
> [ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffffdd
> [ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX:
> 000000000000001f
> [ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI:
> 0000000000000000
> [ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09:
> 0000000000022940
> [ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12:
> ffff929e5faade78
> [ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15:
> 0000000000000000
> [ 1092.905196]  ? cpuidle_enter_state+0x92/0x2a0
> [ 1092.909555]  do_idle+0x236/0x280
> [ 1092.912785]  cpu_startup_entry+0x6f/0x80
> [ 1092.916715]  start_secondary+0x1a7/0x200
> [ 1092.920642]  secondary_startup_64+0xb7/0xc0
> [...]
> 
> The reason is that the skb list tipc_socket::mc_method.deferredq only
> is initialized for connectionless sockets, while nothing stops arriving
> multicast messages from being filtered by connection oriented sockets,
> with subsequent access to the said list.
> 
> We fix this by initializing the list unconditionally at socket creation.
> This eliminates the crash, while the message still is dropped further
> down in tipc_sk_filter_rcv() as it should be.
> 
> Reported-by: Li Shuang <shuali@redhat.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> ---
>  net/tipc/socket.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index dd8537f..83ae41d 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -485,9 +485,8 @@ static int tipc_sk_create(struct net *net, struct socket
> *sock,
>  		tsk_set_unreturnable(tsk, true);
>  		if (sock->type == SOCK_DGRAM)
>  			tsk_set_unreliable(tsk, true);
> -		__skb_queue_head_init(&tsk->mc_method.deferredq);
>  	}
> -
> +	__skb_queue_head_init(&tsk->mc_method.deferredq);
>  	trace_tipc_sk_create(sk, NULL, TIPC_DUMP_NONE, " ");
>  	return 0;
>  }
> --
> 2.1.4
> 
> 
Reviewed-by: Xin Long <lucien.xin@gmail.com>

^ permalink raw reply
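
Why an uninitialized skb list leads to a NULL dereference, as in the tipc oops above, can be shown with a stripped-down list head. This is a sketch under assumptions: struct qhead stands in for struct sk_buff_head, and qhead_init()/qhead_empty() are hypothetical equivalents of __skb_queue_head_init() and skb_queue_empty().

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct sk_buff_head: a circular list whose empty state
 * is "head points at itself".
 */
struct qhead {
	struct qhead *next;
	struct qhead *prev;
};

static void qhead_init(struct qhead *q)
{
	q->next = q;
	q->prev = q;
}

static bool qhead_empty(const struct qhead *q)
{
	return q->next == q;
}
```

A zero-filled head has next == NULL, which is not equal to the head itself, so the emptiness test reports "not empty" and the caller goes on to treat NULL as the first element. That is consistent with the oops, where R12 is 0 and the faulting address is the small offset 0xe8. Initializing the head unconditionally removes that state.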

* Re: [PATCH v4 net-next 13/19] ionic: Add initial ethtool support
From: Shannon Nelson @ 2019-07-30 18:57 UTC (permalink / raw)
  To: Saeed Mahameed, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <2f966a9ced52c01e8017f7ded772fce1ca4fc966.camel@mellanox.com>

On 7/24/19 5:17 PM, Saeed Mahameed wrote:
> On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
>> Add in the basic ethtool callbacks for device information
>> and control.
>>
>> Signed-off-by: Shannon Nelson <snelson@pensando.io>
>> ---

>> +
>> +	while (test_and_set_bit(LIF_QUEUE_RESET, lif->state))
>> +		usleep_range(200, 400);
>> +
> I see this is recurring a lot in the driver, i suggest to have a helper
> function (wait_pending_reset_timeout) and make it return with timeout
> errno after a reasonable amount of time, especially on user context
> flows.
>
>

Sure.
sln

^ permalink raw reply
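
The helper suggested in the ionic review above could look roughly like the following user-space sketch. Everything here is an assumption for illustration: the name wait_pending_reset_timeout() comes from the review comment, but the signature, the C11 atomic_flag, and nanosleep() are stand-ins for the driver's test_and_set_bit() and usleep_range().

```c
#include <errno.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical helper: try to take `flag`, pausing ~200us between
 * attempts, and give up with -ETIMEDOUT after `max_retries` retries
 * instead of looping forever.
 */
static int wait_pending_reset_timeout(atomic_flag *flag,
				      unsigned int max_retries)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 200 * 1000 };

	while (atomic_flag_test_and_set(flag)) {
		if (max_retries-- == 0)
			return -ETIMEDOUT;
		nanosleep(&pause, NULL);
	}
	return 0;	/* caller now owns the flag and must clear it */
}
```

A caller on a user-context path would then propagate the error instead of blocking indefinitely, and clear the flag when its own reset work is done.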

* [PATCH 2/2] tools: bpftool: add net unload command to unload XDP on interface
From: Daniel T. Lee @ 2019-07-30 18:48 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20190730184821.10833-1-danieltimlee@gmail.com>

With this commit, `bpftool net unload` unloads the XDP prog loaded on an
interface. Unloading the BPF prog will be done through libbpf
'bpf_set_link_xdp_fd' with the progfd set to -1.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
 tools/bpf/bpftool/net.c | 55 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index d3a4f18b5b95..9d353b6e7d6d 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -281,6 +281,31 @@ static int parse_load_args(int argc, char **argv, int *progfd,
 	return 0;
 }
 
+static int parse_unload_args(int argc, char **argv,
+			     enum net_load_type *load_type, int *ifindex)
+{
+	if (!REQ_ARGS(2))
+		return -EINVAL;
+
+	*load_type = parse_load_type(*argv);
+	if (*load_type == __MAX_NET_LOAD_TYPE) {
+		p_err("invalid net load/unload type");
+		return -EINVAL;
+	}
+
+	NEXT_ARG();
+	if (!REQ_ARGS(1))
+		return -EINVAL;
+
+	*ifindex = if_nametoindex(*argv);
+	if (!*ifindex) {
+		p_err("Invalid ifname");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static int do_load_unload_xdp(int *progfd, enum net_load_type *load_type,
 			      int *ifindex)
 {
@@ -323,6 +348,31 @@ static int do_load(int argc, char **argv)
 	return 0;
 }
 
+static int do_unload(int argc, char **argv)
+{
+	enum net_load_type load_type;
+	int err, progfd, ifindex;
+
+	err = parse_unload_args(argc, argv, &load_type, &ifindex);
+	if (err)
+		return err;
+
+	/* to unload xdp prog */
+	progfd = -1;
+	if (is_prefix("xdp", load_type_strings[load_type]))
+		err = do_load_unload_xdp(&progfd, &load_type, &ifindex);
+
+	if (err < 0) {
+		p_err("link set %s failed", load_type_strings[load_type]);
+		return -1;
+	}
+
+	if (json_output)
+		jsonw_null(json_wtr);
+
+	return 0;
+}
+
 static int do_show(int argc, char **argv)
 {
 	struct bpf_attach_info attach_info = {};
@@ -406,6 +456,7 @@ static int do_help(int argc, char **argv)
 	fprintf(stderr,
 		"Usage: %s %s { show | list } [dev <devname>]\n"
 		"       %s %s load PROG LOAD_TYPE <devname>\n"
+		"       %s %s unload LOAD_TYPE <devname>\n"
 		"       %s %s help\n"
 		"\n"
 		"       " HELP_SPEC_PROGRAM "\n"
@@ -415,7 +466,8 @@ static int do_help(int argc, char **argv)
 		"      to dump program attachments. For program types\n"
 		"      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
 		"      consult iproute2.\n",
-		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2]);
 
 	return 0;
 }
@@ -424,6 +476,7 @@ static const struct cmd cmds[] = {
 	{ "show",	do_show },
 	{ "list",	do_show },
 	{ "load",	do_load },
+	{ "unload",	do_unload },
 	{ "help",	do_help },
 	{ 0 }
 };
-- 
2.20.1



* [PATCH 1/2] tools: bpftool: add net load command to load XDP on interface
From: Daniel T. Lee @ 2019-07-30 18:48 UTC
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20190730184821.10833-1-danieltimlee@gmail.com>

With this commit, `bpftool net load` lets the user load an XDP prog on
an interface. A new enum, 'net_load_type', is introduced. As stated in
the cover letter, 'load' here means that the prog is loaded on an
interface.

The BPF prog is loaded through libbpf's 'bpf_set_link_xdp_fd'.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
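
The LOAD_TYPE argument is matched by prefix, so abbreviations are
accepted. The standalone sketch below shows that lookup in isolation. The
`toy_` names are hypothetical, and `toy_is_prefix()` is only assumed to
behave like bpftool's is_prefix() helper (true when the first argument is
a leading substring of the second).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum toy_load_type {
	TOY_XDP,
	TOY_XDP_GENERIC,
	TOY_XDP_DRV,
	TOY_XDP_OFFLOAD,
	TOY_MAX_LOAD_TYPE
};

static const char * const toy_load_type_strings[] = {
	[TOY_XDP] = "xdp",
	[TOY_XDP_GENERIC] = "xdpgeneric",
	[TOY_XDP_DRV] = "xdpdrv",
	[TOY_XDP_OFFLOAD] = "xdpoffload",
};

/* Assumed semantics: true when pfx is a leading substring of str. */
static bool toy_is_prefix(const char *pfx, const char *str)
{
	assert(str);
	if (!pfx)
		return false;
	return strncmp(str, pfx, strlen(pfx)) == 0;
}

/* Returns the first table entry that arg abbreviates, else TOY_MAX_LOAD_TYPE. */
static enum toy_load_type toy_parse_load_type(const char *arg)
{
	int type;

	for (type = 0; type < TOY_MAX_LOAD_TYPE; type++) {
		if (toy_is_prefix(arg, toy_load_type_strings[type]))
			return (enum toy_load_type)type;
	}

	return TOY_MAX_LOAD_TYPE;
}
```

Because the table is scanned in order, an ambiguous abbreviation such as
"x" or "xd" resolves to the first entry, plain "xdp", in this sketch.
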
 tools/bpf/bpftool/net.c | 107 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 106 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index 67e99c56bc88..d3a4f18b5b95 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -55,6 +55,35 @@ struct bpf_attach_info {
 	__u32 flow_dissector_id;
 };
 
+enum net_load_type {
+	NET_LOAD_TYPE_XDP,
+	NET_LOAD_TYPE_XDP_GENERIC,
+	NET_LOAD_TYPE_XDP_DRIVE,
+	NET_LOAD_TYPE_XDP_OFFLOAD,
+	__MAX_NET_LOAD_TYPE
+};
+
+static const char * const load_type_strings[] = {
+	[NET_LOAD_TYPE_XDP] = "xdp",
+	[NET_LOAD_TYPE_XDP_GENERIC] = "xdpgeneric",
+	[NET_LOAD_TYPE_XDP_DRIVE] = "xdpdrv",
+	[NET_LOAD_TYPE_XDP_OFFLOAD] = "xdpoffload",
+	[__MAX_NET_LOAD_TYPE] = NULL,
+};
+
+static enum net_load_type parse_load_type(const char *str)
+{
+	enum net_load_type type;
+
+	for (type = 0; type < __MAX_NET_LOAD_TYPE; type++) {
+		if (load_type_strings[type] &&
+		   is_prefix(str, load_type_strings[type]))
+			return type;
+	}
+
+	return __MAX_NET_LOAD_TYPE;
+}
+
 static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
 {
 	struct bpf_netdev_t *netinfo = cookie;
@@ -223,6 +252,77 @@ static int query_flow_dissector(struct bpf_attach_info *attach_info)
 	return 0;
 }
 
+static int parse_load_args(int argc, char **argv, int *progfd,
+			   enum net_load_type *load_type, int *ifindex)
+{
+	if (!REQ_ARGS(3))
+		return -EINVAL;
+
+	*progfd = prog_parse_fd(&argc, &argv);
+	if (*progfd < 0)
+		return *progfd;
+
+	*load_type = parse_load_type(*argv);
+	if (*load_type == __MAX_NET_LOAD_TYPE) {
+		p_err("invalid net load/unload type");
+		return -EINVAL;
+	}
+
+	NEXT_ARG();
+	if (!REQ_ARGS(1))
+		return -EINVAL;
+
+	*ifindex = if_nametoindex(*argv);
+	if (!*ifindex) {
+		p_err("Invalid ifname");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int do_load_unload_xdp(int *progfd, enum net_load_type *load_type,
+			      int *ifindex)
+{
+	__u32 flags;
+	int err;
+
+	flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+	if (*load_type == NET_LOAD_TYPE_XDP_GENERIC)
+		flags |= XDP_FLAGS_SKB_MODE;
+	if (*load_type == NET_LOAD_TYPE_XDP_DRIVE)
+		flags |= XDP_FLAGS_DRV_MODE;
+	if (*load_type == NET_LOAD_TYPE_XDP_OFFLOAD)
+		flags |= XDP_FLAGS_HW_MODE;
+
+	err = bpf_set_link_xdp_fd(*ifindex, *progfd, flags);
+
+	return err;
+}
+
+static int do_load(int argc, char **argv)
+{
+	enum net_load_type load_type;
+	int err, progfd, ifindex;
+
+	err = parse_load_args(argc, argv, &progfd, &load_type, &ifindex);
+	if (err)
+		return err;
+
+	if (is_prefix("xdp", load_type_strings[load_type]))
+		err = do_load_unload_xdp(&progfd, &load_type, &ifindex);
+
+	if (err < 0) {
+		p_err("link set %s failed", load_type_strings[load_type]);
+		return -1;
+	}
+
+	if (json_output)
+		jsonw_null(json_wtr);
+
+	return 0;
+}
+
 static int do_show(int argc, char **argv)
 {
 	struct bpf_attach_info attach_info = {};
@@ -305,13 +405,17 @@ static int do_help(int argc, char **argv)
 
 	fprintf(stderr,
 		"Usage: %s %s { show | list } [dev <devname>]\n"
+		"       %s %s load PROG LOAD_TYPE <devname>\n"
 		"       %s %s help\n"
+		"\n"
+		"       " HELP_SPEC_PROGRAM "\n"
+		"       LOAD_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
 		"Note: Only xdp and tc attachments are supported now.\n"
 		"      For progs attached to cgroups, use \"bpftool cgroup\"\n"
 		"      to dump program attachments. For program types\n"
 		"      sk_{filter,skb,msg,reuseport} and lwt/seg6, please\n"
 		"      consult iproute2.\n",
-		bin_name, argv[-2], bin_name, argv[-2]);
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
 
 	return 0;
 }
@@ -319,6 +423,7 @@ static int do_help(int argc, char **argv)
 static const struct cmd cmds[] = {
 	{ "show",	do_show },
 	{ "list",	do_show },
+	{ "load",	do_load },
 	{ "help",	do_help },
 	{ 0 }
 };
-- 
2.20.1



* [PATCH 0/2] tools: bpftool: add net (un)load command to load XDP
From: Daniel T. Lee @ 2019-07-30 18:48 UTC
  To: Daniel Borkmann, Alexei Starovoitov; +Cc: netdev

Currently, bpftool net only supports dumping progs loaded on an
interface. To load an XDP prog on an interface, the user must use another
tool (e.g. iproute2). With this series, `bpftool net (un)load` lets the
user (un)load an XDP prog on an interface.

    $ ./bpftool prog
    ...
    208: xdp  name xdp_prog1  tag ad822e38b629553f  gpl
      loaded_at 2019-07-28T18:03:11+0900  uid 0
    ...
    $ ./bpftool net load id 208 xdpdrv enp6s0np1
    $ ./bpftool net
    xdp:
    enp6s0np1(5) driver id 208
    ...
    $ ./bpftool net unload xdpdrv enp6s0np1
    $ ./bpftool net
    xdp:
    ...

The word 'load' is used instead of 'attach' because XDP is not a
'bpf_attach_type' and an XDP program can't be attached with
'BPF_PROG_ATTACH'. In this context, 'load' means that the prog is
loaded on an interface.

While this series only adds support for XDP, `net (un)load` gives
bpftool a place to support other prog attach types later.

XDP (un)load tested on Netronome Agilio.
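
The LOAD_TYPE used in the session above (xdpdrv) selects an XDP mode
flag. A minimal sketch of that mapping follows. The `toy_` names are
hypothetical, and the flag values are copied from linux/if_link.h only so
that the example builds without kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror linux/if_link.h; redefined so the sketch builds standalone. */
#define TOY_XDP_FLAGS_UPDATE_IF_NOEXIST (1U << 0)
#define TOY_XDP_FLAGS_SKB_MODE          (1U << 1)
#define TOY_XDP_FLAGS_DRV_MODE          (1U << 2)
#define TOY_XDP_FLAGS_HW_MODE           (1U << 3)

enum toy_mode {
	TOY_MODE_XDP,		/* "xdp": let the kernel pick the mode */
	TOY_MODE_GENERIC,	/* "xdpgeneric" */
	TOY_MODE_DRV,		/* "xdpdrv" */
	TOY_MODE_OFFLOAD,	/* "xdpoffload" */
	TOY_MODE_MAX
};

/* Build the flags word that would be handed to the XDP attach call. */
static uint32_t toy_xdp_flags(enum toy_mode mode)
{
	uint32_t flags = TOY_XDP_FLAGS_UPDATE_IF_NOEXIST;

	assert(mode < TOY_MODE_MAX);

	switch (mode) {
	case TOY_MODE_GENERIC:
		flags |= TOY_XDP_FLAGS_SKB_MODE;
		break;
	case TOY_MODE_DRV:
		flags |= TOY_XDP_FLAGS_DRV_MODE;
		break;
	case TOY_MODE_OFFLOAD:
		flags |= TOY_XDP_FLAGS_HW_MODE;
		break;
	default:
		/* plain "xdp": no explicit mode bit */
		break;
	}

	return flags;
}
```
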

Daniel T. Lee (2):
  tools: bpftool: add net load command to load XDP on interface
  tools: bpftool: add net unload command to unload XDP on interface

 tools/bpf/bpftool/net.c | 160 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 1 deletion(-)

-- 
2.20.1


* Re: INFO: task hung in perf_event_free_task
From: syzbot @ 2019-07-30 18:44 UTC
  To: acme, acme, alexander.shishkin, ast, bpf, daniel, eranian, jolsa,
	kafai, linux-kernel, mark.rutland, mingo, mingo, namhyung, netdev,
	peterz, songliubraving, syzkaller-bugs, tglx, torvalds,
	vincent.weaver, yhs
In-Reply-To: <00000000000057102e058e722bba@google.com>

syzbot has bisected this bug to:

commit 1cf8dfe8a661f0462925df943140e9f6d1ea5233
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sat Jul 13 09:21:25 2019 +0000

     perf/core: Fix race between close() and fork()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1523f40c600000
start commit:   c6dd78fc Merge branch 'x86-urgent-for-linus' of git://git...
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=1723f40c600000
console output: https://syzkaller.appspot.com/x/log.txt?x=1323f40c600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7937b718ddac333b
dashboard link: https://syzkaller.appspot.com/bug?extid=7692cea7450c97fa2a0a
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17e888cc600000

Reported-by: syzbot+7692cea7450c97fa2a0a@syzkaller.appspotmail.com
Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


* Re: [PATCH v4 net-next 12/19] ionic: Add async link status check and basic stats
From: Shannon Nelson @ 2019-07-30 18:35 UTC
  To: Saeed Mahameed, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <1621fa0c5649e1afb07889db0972cf87e1580332.camel@mellanox.com>

On 7/24/19 5:04 PM, Saeed Mahameed wrote:
> On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
>> Add code to handle the link status event, and wire up the
>> basic netdev hardware stats.
>>
>> Signed-off-by: Shannon Nelson<snelson@pensando.io>
>> ---
>>   .../net/ethernet/pensando/ionic/ionic_lif.c   | 116
>> ++++++++++++++++++
>>   .../net/ethernet/pensando/ionic/ionic_lif.h   |   1 +
>>   2 files changed, 117 insertions(+)
[...]
>> +	/* After outstanding events are processed we can check on
>> +	 * the link status and any outstanding interrupt credits.
>> +	 *
>> +	 * We wait until here to check on the link status in case
>> +	 * there was a long list of link events from a flap episode.
>> +	 */
>> +	if (test_bit(LIF_LINK_CHECK_NEEDED, lif->state)) {
>> +		struct ionic_deferred_work *work;
>> +
>> +		work = kzalloc(sizeof(*work), GFP_ATOMIC);
>> +		if (!work) {
>> +			netdev_err(lif->netdev, "%s OOM\n", __func__);
> why not having a pre allocated dedicated lif->link_check_work, instead
> of allocating in atomic context on every link check event ?

I don't want to worry about the possibility of additional requests 
driven from other threads using the same struct.

>> +		} else {
>> +			work->type = DW_TYPE_LINK_STATUS;
>> +			ionic_lif_deferred_enqueue(&lif->deferred,
>> work);
>> +		}
>> +	}
>> +
>>   return_to_napi:
>>   	return work_done;
>>   }
>>   
>> +static void ionic_get_stats64(struct net_device *netdev,
>> +			      struct rtnl_link_stats64 *ns)
>> +{
>> +	struct lif *lif = netdev_priv(netdev);
>> +	struct lif_stats *ls;
>> +
>> +	memset(ns, 0, sizeof(*ns));
>> +	ls = &lif->info->stats;
>> +
>> +	ns->rx_packets = le64_to_cpu(ls->rx_ucast_packets) +
>> +			 le64_to_cpu(ls->rx_mcast_packets) +
>> +			 le64_to_cpu(ls->rx_bcast_packets);
>> +
>> +	ns->tx_packets = le64_to_cpu(ls->tx_ucast_packets) +
>> +			 le64_to_cpu(ls->tx_mcast_packets) +
>> +			 le64_to_cpu(ls->tx_bcast_packets);
>> +
>> +	ns->rx_bytes = le64_to_cpu(ls->rx_ucast_bytes) +
>> +		       le64_to_cpu(ls->rx_mcast_bytes) +
>> +		       le64_to_cpu(ls->rx_bcast_bytes);
>> +
>> +	ns->tx_bytes = le64_to_cpu(ls->tx_ucast_bytes) +
>> +		       le64_to_cpu(ls->tx_mcast_bytes) +
>> +		       le64_to_cpu(ls->tx_bcast_bytes);
>> +
>> +	ns->rx_dropped = le64_to_cpu(ls->rx_ucast_drop_packets) +
>> +			 le64_to_cpu(ls->rx_mcast_drop_packets) +
>> +			 le64_to_cpu(ls->rx_bcast_drop_packets);
>> +
>> +	ns->tx_dropped = le64_to_cpu(ls->tx_ucast_drop_packets) +
>> +			 le64_to_cpu(ls->tx_mcast_drop_packets) +
>> +			 le64_to_cpu(ls->tx_bcast_drop_packets);
>> +
>> +	ns->multicast = le64_to_cpu(ls->rx_mcast_packets);
>> +
>> +	ns->rx_over_errors = le64_to_cpu(ls->rx_queue_empty);
>> +
>> +	ns->rx_missed_errors = le64_to_cpu(ls->rx_dma_error) +
>> +			       le64_to_cpu(ls->rx_queue_disabled) +
>> +			       le64_to_cpu(ls->rx_desc_fetch_error) +
>> +			       le64_to_cpu(ls->rx_desc_data_error);
>> +
>> +	ns->tx_aborted_errors = le64_to_cpu(ls->tx_dma_error) +
>> +				le64_to_cpu(ls->tx_queue_disabled) +
>> +				le64_to_cpu(ls->tx_desc_fetch_error) +
>> +				le64_to_cpu(ls->tx_desc_data_error);
>> +
>> +	ns->rx_errors = ns->rx_over_errors +
>> +			ns->rx_missed_errors;
>> +
>> +	ns->tx_errors = ns->tx_aborted_errors;
>> +}
>> +
>>   static int ionic_lif_addr_add(struct lif *lif, const u8 *addr)
>>   {
>>   	struct ionic_admin_ctx ctx = {
>> @@ -581,6 +693,7 @@ static int ionic_vlan_rx_kill_vid(struct
>> net_device *netdev, __be16 proto,
>>   static const struct net_device_ops ionic_netdev_ops = {
>>   	.ndo_open               = ionic_open,
>>   	.ndo_stop               = ionic_stop,
>> +	.ndo_get_stats64	= ionic_get_stats64,
>>   	.ndo_set_rx_mode	= ionic_set_rx_mode,
>>   	.ndo_set_features	= ionic_set_features,
>>   	.ndo_set_mac_address	= ionic_set_mac_address,
>> @@ -1418,6 +1531,8 @@ static int ionic_lif_init(struct lif *lif)
>>   
>>   	set_bit(LIF_INITED, lif->state);
>>   
>> +	ionic_link_status_check(lif);
>> +
>>   	return 0;
>>   
>>   err_out_notifyq_deinit:
>> @@ -1461,6 +1576,7 @@ int ionic_lifs_register(struct ionic *ionic)
>>   		return err;
>>   	}
>>   
> are events (NotifyQ) enabled at this stage ? if so then you might endup
> racing ionic_link_status_check with itself.

I'll look at that again to see what such a race might do.  I probably 
should add a test here and in a couple other spots to see if the link 
status check has already been requested.
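
One possible shape for such an "already requested" test is an atomic
test-and-set on a pending flag. The sketch below is userspace C11 and
not driver code: `struct toy_lif` and the `toy_` functions are
hypothetical, and a kernel version would more likely use a bit in
lif->state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-interface state. */
struct toy_lif {
	atomic_flag link_check_requested;
};

static void toy_lif_init(struct toy_lif *lif)
{
	assert(lif);
	atomic_flag_clear(&lif->link_check_requested);
}

/* Returns true if this caller should queue the check, false if one is pending. */
static bool toy_request_link_check(struct toy_lif *lif)
{
	assert(lif);
	/* test_and_set returns the previous value: true means already pending. */
	return !atomic_flag_test_and_set(&lif->link_check_requested);
}

/* Called by the worker once the link status check has run. */
static void toy_link_check_done(struct toy_lif *lif)
{
	assert(lif);
	atomic_flag_clear(&lif->link_check_requested);
}
```

With this pattern only the first requester queues work. Later requests
are dropped until the worker clears the flag.
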

sln



* Re: [PATCH v4 net-next 09/19] ionic: Add the basic NDO callbacks for netdev support
From: Shannon Nelson @ 2019-07-30 18:35 UTC
  To: Saeed Mahameed, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <c5d6315ee4b72d9b2a977866b9849ffe9183f4b6.camel@mellanox.com>

On 7/24/19 4:45 PM, Saeed Mahameed wrote:
> On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
>> Set up the initial NDO structure and callbacks for netdev
>> to use, and register the netdev.  This will allow us to do
>> a few basic operations on the device, but no traffic yet.
>>
>> Signed-off-by: Shannon Nelson<snelson@pensando.io>
>> ---
>>   drivers/net/ethernet/pensando/ionic/ionic.h   |   1 +
>>   .../ethernet/pensando/ionic/ionic_bus_pci.c   |   9 +
>>   .../net/ethernet/pensando/ionic/ionic_dev.h   |   2 +
>>   .../net/ethernet/pensando/ionic/ionic_lif.c   | 348
>> ++++++++++++++++++
>>   .../net/ethernet/pensando/ionic/ionic_lif.h   |   5 +
>>   5 files changed, 365 insertions(+)
>>
[...]
>>   
>> +static int ionic_set_nic_features(struct lif *lif, netdev_features_t
>> features);
>>   static int ionic_notifyq_clean(struct lif *lif, int budget);
>>   
>> +int ionic_open(struct net_device *netdev)
>> +{
>> +	struct lif *lif = netdev_priv(netdev);
>> +
>> +	netif_carrier_off(netdev);
>> +
>> +	set_bit(LIF_UP, lif->state);
>> +
>> +	if (netif_carrier_ok(netdev))
> always false ? you just invoked netif_carrier_off two lines ago..

Hmmm... an artifact of splitting up an existing driver.  This makes more 
sense when there's a link status check in between these, which comes in 
about 3 patches later.  Unless this really causes someone significant 
heartburn, I'm going to leave this as is for now.


>> +		netif_tx_wake_all_queues(netdev);
>> +
>> +	return 0;
>> +}
>> +
>> +static int ionic_lif_stop(struct lif *lif)
>> +{
>> +	struct net_device *ndev = lif->netdev;
>> +	int err = 0;
>> +
>> +	if (!test_bit(LIF_UP, lif->state)) {
>> +		dev_dbg(lif->ionic->dev, "%s: %s state=DOWN\n",
>> +			__func__, lif->name);
>> +		return 0;
>> +	}
>> +	dev_dbg(lif->ionic->dev, "%s: %s state=UP\n", __func__, lif-
>>> name);
>> +	clear_bit(LIF_UP, lif->state);
>> +
>> +	/* carrier off before disabling queues to avoid watchdog
>> timeout */
>> +	netif_carrier_off(ndev);
>> +	netif_tx_stop_all_queues(ndev);
>> +	netif_tx_disable(ndev);
>> +	synchronize_rcu();
> why synchronize_rcu ?

Looks like a little leakage from a feature in the internal driver, I'll 
remove it.

>> +
>> +	return err;
>> +}
>> +
>> +int ionic_stop(struct net_device *netdev)
>> +{
>> +	struct lif *lif = netdev_priv(netdev);
>> +
>> +	return ionic_lif_stop(lif);
>> +}
>> +
>> +int ionic_reset_queues(struct lif *lif)
>> +{
>> +	bool running;
>> +	int err = 0;
>> +
>> +	/* Put off the next watchdog timeout */
>> +	netif_trans_update(lif->netdev);
> this doesn't seem right to me also this won't help you if the next
> while loop takes too long.. also netif_trans_update is marked to be
> only used for legacy drivers.

If the loop takes too long, then I don't mind if the watchdog goes off.

Yes, its primary use is now handled by netdev_start_xmit(), but it is 
still a handy gizmo to give a little more cushion until the carrier is 
set off in ionic_lif_stop().
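
To make the "cushion" concrete, here is a deliberately simplified model
of a transmit watchdog check. It is an assumption-laden sketch rather
than the kernel's logic: the real check is per queue and has more
conditions, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-queue view of the state a tx watchdog looks at. */
struct toy_netdev {
	unsigned long trans_start;	/* tick of the last recorded transmit */
	unsigned long watchdog_timeo;	/* allowed ticks without progress */
	bool carrier_ok;
	bool queue_stopped;
};

/* Would the watchdog report a timeout at tick 'now'? */
static bool toy_watchdog_would_fire(const struct toy_netdev *dev,
				    unsigned long now)
{
	assert(dev);

	/* Assumed: no carrier or a running queue means nothing to report. */
	if (!dev->carrier_ok || !dev->queue_stopped)
		return false;

	/* Wrap-safe "now is after trans_start + timeo" comparison. */
	return (long)(now - (dev->trans_start + dev->watchdog_timeo)) > 0;
}

/* Model of refreshing the timestamp to put off the next timeout. */
static void toy_trans_update(struct toy_netdev *dev, unsigned long now)
{
	assert(dev);
	dev->trans_start = now;
}
```

In this model refreshing the timestamp buys one more timeout period,
while dropping the carrier suppresses the report altogether.
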

sln



* Re: [PATCH v4 net-next 08/19] ionic: Add notifyq support
From: Shannon Nelson @ 2019-07-30 18:35 UTC
  To: Saeed Mahameed, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <879ae2c3f79d5212253811518769cdaa4bf8b9c7.camel@mellanox.com>

On 7/24/19 4:21 PM, Saeed Mahameed wrote:
> On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
>> The AdminQ is fine for sending messages and requests to the NIC,
>> but we also need to have events published from the NIC to the
>> driver.  The NotifyQ handles this for us, using the same interrupt
>> as AdminQ.
>>
>> Signed-off-by: Shannon Nelson<snelson@pensando.io>
>> ---
>>   .../ethernet/pensando/ionic/ionic_debugfs.c   |  16 ++
>>   .../net/ethernet/pensando/ionic/ionic_lif.c   | 181
>> +++++++++++++++++-
>>   .../net/ethernet/pensando/ionic/ionic_lif.h   |   4 +
>>   3 files changed, 200 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
>> b/drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
>> index 9af15c69b2a6..1d05b23de303 100644
>> --- a/drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
>> +++ b/drivers/net/ethernet/pensando/ionic/ionic_debugfs.c
>> @@ -126,6 +126,7 @@ int ionic_debugfs_add_qcq(struct lif *lif, struct
>> qcq *qcq)
>>   	struct debugfs_blob_wrapper *desc_blob;
>>   	struct device *dev = lif->ionic->dev;
>>   	struct intr *intr = &qcq->intr;
>> +	struct dentry *stats_dentry;
>>   	struct queue *q = &qcq->q;
>>   	struct cq *cq = &qcq->cq;
>>   
>> @@ -219,6 +220,21 @@ int ionic_debugfs_add_qcq(struct lif *lif,
>> struct qcq *qcq)
>>   					intr_ctrl_regset);
>>   	}
>>   
>> +	if (qcq->flags & QCQ_F_NOTIFYQ) {
>> +		stats_dentry = debugfs_create_dir("notifyblock",
>> qcq_dentry);
>> +		if (IS_ERR_OR_NULL(stats_dentry))
>> +			return PTR_ERR(stats_dentry);
>> +
>> +		debugfs_create_u64("eid", 0400, stats_dentry,
>> +				   (u64 *)&lif->info->status.eid);
>> +		debugfs_create_u16("link_status", 0400, stats_dentry,
>> +				   (u16 *)&lif->info-
>>> status.link_status);
>> +		debugfs_create_u32("link_speed", 0400, stats_dentry,
>> +				   (u32 *)&lif->info-
>>> status.link_speed);
>> +		debugfs_create_u16("link_down_count", 0400,
>> stats_dentry,
>> +				   (u16 *)&lif->info-
>>> status.link_down_count);
>> +	}
>> +
> you never write to these lif->info->status.xyz ..

This is data coming out of DMA memory from the nic, the driver only 
reads it.

> and link state and speed are/should be available  in "ethtool <ifname>"
> so this looks redundant to me. you can also use ethtool -S to report
> linkdown count.

The notifyblock is a chunk of data that usually stays together, but 
isn't really something for ethtool -S, I'd prefer to leave this here.

[...]
>> +	case EVENT_OPCODE_LOG:
>> +		netdev_info(netdev, "Notifyq EVENT_OPCODE_LOG
>> eid=%lld\n", eid);
>> +		print_hex_dump(KERN_INFO, "notifyq ",
>> DUMP_PREFIX_OFFSET, 16, 1,
>> +			       comp->log.data, sizeof(comp->log.data),
>> true);
> So your device can generate log buffer dump into the kernel log ..
> I am not sure how acceptable this is, maybe trace buffer is more
> appropriate for this.

It turns out that this early design feature has gone unused, so I'll 
drop it out of here for now.

Thanks,
sln


* Re: [PATCH] net: usb: pegasus: fix improper read if get_registers() fail
From: Denis Kirjanov @ 2019-07-30 18:23 UTC
  To: David Miller; +Cc: petkan, netdev
In-Reply-To: <20190730.102434.1438984182304969810.davem@davemloft.net>

On 7/30/19, David Miller <davem@davemloft.net> wrote:
> From: Denis Kirjanov <kda@linux-powerpc.org>
> Date: Tue, 30 Jul 2019 15:13:57 +0200
>
>> get_registers() may fail with -ENOMEM and in this
>> case we can read a garbage from the status variable tmp.
>>
>> Reported-by: syzbot+3499a83b2d062ae409d4@syzkaller.appspotmail.com
>> Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
>
> Why did you post this patch twice?  What is different between the two
> versions?
>
Looks like it was an issue with git send-email. Sorry about that.
Do you want me to figure out the reason and resend?
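
For readers without the patch at hand, the class of bug quoted above
(using an output variable after a failed read) can be sketched as
follows. This is not the pegasus patch: `toy_get_register()`, the
register value and the bit mask are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical register read that can fail without writing *out. */
static int toy_get_register(int simulate_failure, uint8_t *out)
{
	assert(out);

	if (simulate_failure)
		return -ENOMEM;

	*out = 0x04;	/* made-up status value */
	return 0;
}

/* Returns 1 if the made-up link bit is set, 0 if not, negative errno on failure. */
static int toy_link_is_up(int simulate_failure)
{
	uint8_t tmp = 0;	/* defined value even if the read fails */
	int ret;

	ret = toy_get_register(simulate_failure, &tmp);
	if (ret < 0)
		return ret;	/* do not interpret tmp after a failed read */

	return (tmp & 0x04) ? 1 : 0;
}
```
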


* [net  1/1] tipc: fix uninitialized skb list crash
From: Jon Maloy @ 2019-07-30 18:19 UTC
  To: davem, netdev
  Cc: tung.q.nguyen, hoang.h.le, jon.maloy, lxin, shuali, ying.xue,
	tipc-discussion

Our test suite sometimes provokes the following crash:

Description of problem:
[ 1092.597234] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
[ 1092.605072] PGD 0 P4D 0
[ 1092.607620] Oops: 0000 [#1] SMP PTI
[ 1092.611118] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Not tainted 4.18.0-122.el8.x86_64 #1
[ 1092.619724] Hardware name: Dell Inc. PowerEdge R740/08D89F, BIOS 1.3.7 02/08/2018
[ 1092.627215] RIP: 0010:tipc_mcast_filter_msg+0x93/0x2d0 [tipc]
[ 1092.632955] Code: 0f 84 aa 01 00 00 89 cf 4d 01 ca 4c 8b 26 c1 ef 19 83 e7 0f 83 ff 0c 4d 0f 45 d1 41 8b 6a 10 0f cd 4c 39 e6 0f 84 81 01 00 00 <4d> 8b 9c 24 e8 00 00 00 45 8b 13 41 0f ca 44 89 d7 c1 ef 13 83 e7
[ 1092.651703] RSP: 0018:ffff929e5fa83a18 EFLAGS: 00010282
[ 1092.656927] RAX: ffff929e3fb38100 RBX: 00000000069f29ee RCX: 00000000416c0045
[ 1092.664058] RDX: ffff929e5fa83a88 RSI: ffff929e31a28420 RDI: 0000000000000000
[ 1092.671209] RBP: 0000000029b11821 R08: 0000000000000000 R09: ffff929e39b4407a
[ 1092.678343] R10: ffff929e39b4407a R11: 0000000000000007 R12: 0000000000000000
[ 1092.685475] R13: 0000000000000001 R14: ffff929e3fb38100 R15: ffff929e39b4407a
[ 1092.692614] FS:  0000000000000000(0000) GS:ffff929e5fa80000(0000) knlGS:0000000000000000
[ 1092.700702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1092.706447] CR2: 00000000000000e8 CR3: 000000031300a004 CR4: 00000000007606e0
[ 1092.713579] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1092.720712] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1092.727843] PKRU: 55555554
[ 1092.730556] Call Trace:
[ 1092.733010]  <IRQ>
[ 1092.735034]  tipc_sk_filter_rcv+0x7ca/0xb80 [tipc]
[ 1092.739828]  ? __kmalloc_node_track_caller+0x1cb/0x290
[ 1092.744974]  ? dev_hard_start_xmit+0xa5/0x210
[ 1092.749332]  tipc_sk_rcv+0x389/0x640 [tipc]
[ 1092.753519]  tipc_sk_mcast_rcv+0x23c/0x3a0 [tipc]
[ 1092.758224]  tipc_rcv+0x57a/0xf20 [tipc]
[ 1092.762154]  ? ktime_get_real_ts64+0x40/0xe0
[ 1092.766432]  ? tpacket_rcv+0x50/0x9f0
[ 1092.770098]  tipc_l2_rcv_msg+0x4a/0x70 [tipc]
[ 1092.774452]  __netif_receive_skb_core+0xb62/0xbd0
[ 1092.779164]  ? enqueue_entity+0xf6/0x630
[ 1092.783084]  ? kmem_cache_alloc+0x158/0x1c0
[ 1092.787272]  ? __build_skb+0x25/0xd0
[ 1092.790849]  netif_receive_skb_internal+0x42/0xf0
[ 1092.795557]  napi_gro_receive+0xba/0xe0
[ 1092.799417]  mlx5e_handle_rx_cqe+0x83/0xd0 [mlx5_core]
[ 1092.804564]  mlx5e_poll_rx_cq+0xd5/0x920 [mlx5_core]
[ 1092.809536]  mlx5e_napi_poll+0xb2/0xce0 [mlx5_core]
[ 1092.814415]  ? __wake_up_common_lock+0x89/0xc0
[ 1092.818861]  net_rx_action+0x149/0x3b0
[ 1092.822616]  __do_softirq+0xe3/0x30a
[ 1092.826193]  irq_exit+0x100/0x110
[ 1092.829512]  do_IRQ+0x85/0xd0
[ 1092.832483]  common_interrupt+0xf/0xf
[ 1092.836147]  </IRQ>
[ 1092.838255] RIP: 0010:cpuidle_enter_state+0xb7/0x2a0
[ 1092.843221] Code: e8 3e 79 a5 ff 80 7c 24 03 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 d7 01 00 00 31 ff e8 a0 6b ab ff fb 66 0f 1f 44 00 00 <48> b8 ff ff ff ff f3 01 00 00 4c 29 f3 ba ff ff ff 7f 48 39 c3 7f
[ 1092.861967] RSP: 0018:ffffaa5ec6533e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 1092.869530] RAX: ffff929e5faa3100 RBX: 000000fe63dd2092 RCX: 000000000000001f
[ 1092.876665] RDX: 000000fe63dd2092 RSI: 000000003a518aaa RDI: 0000000000000000
[ 1092.883795] RBP: 0000000000000003 R08: 0000000000000004 R09: 0000000000022940
[ 1092.890929] R10: 0000040cb0666b56 R11: ffff929e5faa20a8 R12: ffff929e5faade78
[ 1092.898060] R13: ffffffffb59258f8 R14: 000000fe60f3228d R15: 0000000000000000
[ 1092.905196]  ? cpuidle_enter_state+0x92/0x2a0
[ 1092.909555]  do_idle+0x236/0x280
[ 1092.912785]  cpu_startup_entry+0x6f/0x80
[ 1092.916715]  start_secondary+0x1a7/0x200
[ 1092.920642]  secondary_startup_64+0xb7/0xc0
[...]

The reason is that the skb list tipc_socket::mc_method.deferredq is only
initialized for connectionless sockets, while nothing stops arriving
multicast messages from being filtered by connection-oriented sockets,
which then access that list.

We fix this by initializing the list unconditionally at socket creation.
This eliminates the crash, while the message is still dropped further
down in tipc_sk_filter_rcv() as it should be.

Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
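
Why a never-initialized list head can end in a NULL dereference is
sketched below with a toy circular list. The `toy_` names are
hypothetical, and whether tipc_mcast_filter_msg() hits exactly this
sequence is an assumption that merely fits the oops above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy circular doubly linked list head, in the style of an skb queue head. */
struct toy_node {
	struct toy_node *next;
	struct toy_node *prev;
};

/* An initialized, empty head points at itself in both directions. */
static void toy_queue_head_init(struct toy_node *head)
{
	assert(head);
	head->next = head;
	head->prev = head;
}

static bool toy_queue_empty(const struct toy_node *head)
{
	return head->next == head;
}

/* First element, or NULL when the queue is empty. */
static struct toy_node *toy_queue_peek(struct toy_node *head)
{
	return toy_queue_empty(head) ? NULL : head->next;
}
```

A zero-filled head has next == NULL, so it does not look empty, yet the
"first element" is NULL. Code that trusts the emptiness test and then
dereferences the element faults. An initialized head reports empty.
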
 net/tipc/socket.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index dd8537f..83ae41d 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -485,9 +485,8 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
 		tsk_set_unreturnable(tsk, true);
 		if (sock->type == SOCK_DGRAM)
 			tsk_set_unreliable(tsk, true);
-		__skb_queue_head_init(&tsk->mc_method.deferredq);
 	}
-
+	__skb_queue_head_init(&tsk->mc_method.deferredq);
 	trace_tipc_sk_create(sk, NULL, TIPC_DUMP_NONE, " ");
 	return 0;
 }
-- 
2.1.4



* [PATCH v6 51/57] net: Remove dev_err() usage after platform_get_irq()
From: Stephen Boyd @ 2019-07-30 18:15 UTC
  To: linux-kernel
  Cc: David S. Miller, Kalle Valo, Saeed Mahameed, Jeff Kirsher,
	Felix Fietkau, Lorenzo Bianconi, netdev, Greg Kroah-Hartman
In-Reply-To: <20190730181557.90391-1-swboyd@chromium.org>

We don't need dev_err() messages when platform_get_irq() fails now that
platform_get_irq() prints an error message itself when something goes
wrong. Let's remove these prints with a simple semantic patch.

// <smpl>
@@
expression ret;
struct platform_device *E;
@@

ret =
(
platform_get_irq(E, ...)
|
platform_get_irq_byname(E, ...)
);

if ( \( ret < 0 \| ret <= 0 \) )
{
(
-if (ret != -EPROBE_DEFER)
-{ ...
-dev_err(...);
-... }
|
...
-dev_err(...);
)
...
}
// </smpl>

While we're here, remove braces on if statements that only have one
statement (manually).

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Lorenzo Bianconi <lorenzo@kernel.org>
Cc: netdev@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---

Please apply directly to subsystem trees
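
The resulting caller pattern can be sketched in standalone C as follows.
Nothing here is kernel API: `toy_get_irq()` is a hypothetical helper that
reports its own failure, standing in for the behaviour described in the
commit message, and the counter exists only so the effect can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

static int toy_log_lines;	/* counts error messages, for illustration */

/* Hypothetical lookup that prints its own error, so callers need not. */
static int toy_get_irq(int have_irq)
{
	if (!have_irq) {
		fprintf(stderr, "toy: IRQ not found\n");
		toy_log_lines++;
		return -ENXIO;
	}

	return 42;	/* made-up IRQ number */
}

/* Caller after the cleanup: propagate the error, no second message. */
static int toy_probe(int have_irq)
{
	int irq;

	irq = toy_get_irq(have_irq);
	if (irq < 0)
		return irq;

	assert(irq > 0);	/* the toy helper returns a positive number on success */
	return 0;
}
```

A failing probe in this sketch produces exactly one log line, emitted by
the helper itself.
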

 drivers/net/can/janz-ican3.c                       |  1 -
 drivers/net/can/rcar/rcar_can.c                    |  1 -
 drivers/net/can/rcar/rcar_canfd.c                  |  2 --
 drivers/net/can/sun4i_can.c                        |  1 -
 drivers/net/ethernet/amd/au1000_eth.c              |  1 -
 drivers/net/ethernet/amd/xgbe/xgbe-platform.c      | 14 +++-----------
 drivers/net/ethernet/apm/xgene-v2/main.c           |  4 +---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c   |  4 +---
 drivers/net/ethernet/aurora/nb8800.c               |  4 +---
 drivers/net/ethernet/broadcom/bgmac-platform.c     |  4 +---
 drivers/net/ethernet/cortina/gemini.c              |  4 +---
 drivers/net/ethernet/davicom/dm9000.c              |  2 --
 drivers/net/ethernet/hisilicon/hisi_femac.c        |  1 -
 drivers/net/ethernet/lantiq_xrx200.c               | 10 ++--------
 drivers/net/ethernet/nuvoton/w90p910_ether.c       |  2 --
 drivers/net/ethernet/qualcomm/emac/emac.c          |  5 +----
 drivers/net/ethernet/socionext/sni_ave.c           |  4 +---
 .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c    |  7 +------
 .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  7 +------
 drivers/net/wireless/mediatek/mt76/mt7603/soc.c    |  4 +---
 20 files changed, 15 insertions(+), 67 deletions(-)

diff --git a/drivers/net/can/janz-ican3.c b/drivers/net/can/janz-ican3.c
index 19d4f52a8f90..a761092e6ac9 100644
--- a/drivers/net/can/janz-ican3.c
+++ b/drivers/net/can/janz-ican3.c
@@ -1936,7 +1936,6 @@ static int ican3_probe(struct platform_device *pdev)
 	/* find our IRQ number */
 	mod->irq = platform_get_irq(pdev, 0);
 	if (mod->irq < 0) {
-		dev_err(dev, "IRQ line not found\n");
 		ret = -ENODEV;
 		goto out_free_ndev;
 	}
diff --git a/drivers/net/can/rcar/rcar_can.c b/drivers/net/can/rcar/rcar_can.c
index 13e66297b65f..cf218949a8fb 100644
--- a/drivers/net/can/rcar/rcar_can.c
+++ b/drivers/net/can/rcar/rcar_can.c
@@ -759,7 +759,6 @@ static int rcar_can_probe(struct platform_device *pdev)
 
 	irq = platform_get_irq(pdev, 0);
 	if (irq < 0) {
-		dev_err(&pdev->dev, "No IRQ resource\n");
 		err = irq;
 		goto fail;
 	}
diff --git a/drivers/net/can/rcar/rcar_canfd.c b/drivers/net/can/rcar/rcar_canfd.c
index 05410008aa6b..51eecc7cdcdd 100644
--- a/drivers/net/can/rcar/rcar_canfd.c
+++ b/drivers/net/can/rcar/rcar_canfd.c
@@ -1651,14 +1651,12 @@ static int rcar_canfd_probe(struct platform_device *pdev)
 
 	ch_irq = platform_get_irq(pdev, 0);
 	if (ch_irq < 0) {
-		dev_err(&pdev->dev, "no Channel IRQ resource\n");
 		err = ch_irq;
 		goto fail_dev;
 	}
 
 	g_irq = platform_get_irq(pdev, 1);
 	if (g_irq < 0) {
-		dev_err(&pdev->dev, "no Global IRQ resource\n");
 		err = g_irq;
 		goto fail_dev;
 	}
diff --git a/drivers/net/can/sun4i_can.c b/drivers/net/can/sun4i_can.c
index 093fc9a529f0..f4cd88196404 100644
--- a/drivers/net/can/sun4i_can.c
+++ b/drivers/net/can/sun4i_can.c
@@ -787,7 +787,6 @@ static int sun4ican_probe(struct platform_device *pdev)
 
 	irq = platform_get_irq(pdev, 0);
 	if (irq < 0) {
-		dev_err(&pdev->dev, "could not get a valid irq\n");
 		err = -ENODEV;
 		goto exit;
 	}
diff --git a/drivers/net/ethernet/amd/au1000_eth.c b/drivers/net/ethernet/amd/au1000_eth.c
index 650d1bae5f56..1793950f0582 100644
--- a/drivers/net/ethernet/amd/au1000_eth.c
+++ b/drivers/net/ethernet/amd/au1000_eth.c
@@ -1100,7 +1100,6 @@ static int au1000_probe(struct platform_device *pdev)
 
 	irq = platform_get_irq(pdev, 0);
 	if (irq < 0) {
-		dev_err(&pdev->dev, "failed to retrieve IRQ\n");
 		err = -ENODEV;
 		goto out;
 	}
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-platform.c b/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
index d0f3dfb88202..dce9e59e8881 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-platform.c
@@ -467,10 +467,8 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 
 	/* Get the device interrupt */
 	ret = platform_get_irq(pdev, 0);
-	if (ret < 0) {
-		dev_err(dev, "platform_get_irq 0 failed\n");
+	if (ret < 0)
 		goto err_io;
-	}
 	pdata->dev_irq = ret;
 
 	/* Get the per channel DMA interrupts */
@@ -479,12 +477,8 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 
 		for (i = 0; (i < max) && (dma_irqnum < dma_irqend); i++) {
 			ret = platform_get_irq(pdata->platdev, dma_irqnum++);
-			if (ret < 0) {
-				netdev_err(pdata->netdev,
-					   "platform_get_irq %u failed\n",
-					   dma_irqnum - 1);
+			if (ret < 0)
 				goto err_io;
-			}
 
 			pdata->channel_irq[i] = ret;
 		}
@@ -496,10 +490,8 @@ static int xgbe_platform_probe(struct platform_device *pdev)
 
 	/* Get the auto-negotiation interrupt */
 	ret = platform_get_irq(phy_pdev, phy_irqnum++);
-	if (ret < 0) {
-		dev_err(dev, "platform_get_irq phy 0 failed\n");
+	if (ret < 0)
 		goto err_io;
-	}
 	pdata->an_irq = ret;
 
 	/* Configure the netdev resource */
diff --git a/drivers/net/ethernet/apm/xgene-v2/main.c b/drivers/net/ethernet/apm/xgene-v2/main.c
index 79048cc46703..02b4f3af02b5 100644
--- a/drivers/net/ethernet/apm/xgene-v2/main.c
+++ b/drivers/net/ethernet/apm/xgene-v2/main.c
@@ -54,10 +54,8 @@ static int xge_get_resources(struct xge_pdata *pdata)
 	}
 
 	ret = platform_get_irq(pdev, 0);
-	if (ret < 0) {
-		dev_err(dev, "Unable to get irq\n");
+	if (ret < 0)
 		return ret;
-	}
 	pdata->resources.irq = ret;
 
 	return 0;
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 10b1c053e70a..a63baca97f53 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1636,9 +1636,7 @@ static int xgene_enet_get_irqs(struct xgene_enet_pdata *pdata)
 				pdata->cq_cnt = max_irqs / 2;
 				break;
 			}
-			dev_err(dev, "Unable to get ENET IRQ\n");
-			ret = ret ? : -ENXIO;
-			return ret;
+			return ret ? : -ENXIO;
 		}
 		pdata->irqs[i] = ret;
 	}
diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
index 3b3370a94a9c..37752d9514e7 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1351,10 +1351,8 @@ static int nb8800_probe(struct platform_device *pdev)
 		ops = match->data;
 
 	irq = platform_get_irq(pdev, 0);
-	if (irq <= 0) {
-		dev_err(&pdev->dev, "No IRQ\n");
+	if (irq <= 0)
 		return -EINVAL;
-	}
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	base = devm_ioremap_resource(&pdev->dev, res);
diff --git a/drivers/net/ethernet/broadcom/bgmac-platform.c b/drivers/net/ethernet/broadcom/bgmac-platform.c
index 6dc0dd91ad11..c46c1b1416f7 100644
--- a/drivers/net/ethernet/broadcom/bgmac-platform.c
+++ b/drivers/net/ethernet/broadcom/bgmac-platform.c
@@ -199,10 +199,8 @@ static int bgmac_probe(struct platform_device *pdev)
 		dev_warn(&pdev->dev, "MAC address not present in device tree\n");
 
 	bgmac->irq = platform_get_irq(pdev, 0);
-	if (bgmac->irq < 0) {
-		dev_err(&pdev->dev, "Unable to obtain IRQ\n");
+	if (bgmac->irq < 0)
 		return bgmac->irq;
-	}
 
 	regs = platform_get_resource_byname(pdev, IORESOURCE_MEM, "amac_base");
 	if (!regs) {
diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 9003eb6716cd..5a8d7b44faf9 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -2423,10 +2423,8 @@ static int gemini_ethernet_port_probe(struct platform_device *pdev)
 
 	/* Interrupt */
 	irq = platform_get_irq(pdev, 0);
-	if (irq <= 0) {
-		dev_err(dev, "no IRQ\n");
+	if (irq <= 0)
 		return irq ? irq : -ENODEV;
-	}
 	port->irq = irq;
 
 	/* Clock the port */
diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
index 386bdc1378d1..cce90b5925d9 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -1500,8 +1500,6 @@ dm9000_probe(struct platform_device *pdev)
 
 	ndev->irq = platform_get_irq(pdev, 0);
 	if (ndev->irq < 0) {
-		dev_err(db->dev, "interrupt resource unavailable: %d\n",
-			ndev->irq);
 		ret = ndev->irq;
 		goto out;
 	}
diff --git a/drivers/net/ethernet/hisilicon/hisi_femac.c b/drivers/net/ethernet/hisilicon/hisi_femac.c
index 689f18e3100f..90ab7ade44c4 100644
--- a/drivers/net/ethernet/hisilicon/hisi_femac.c
+++ b/drivers/net/ethernet/hisilicon/hisi_femac.c
@@ -877,7 +877,6 @@ static int hisi_femac_drv_probe(struct platform_device *pdev)
 
 	ndev->irq = platform_get_irq(pdev, 0);
 	if (ndev->irq <= 0) {
-		dev_err(dev, "No irq resource\n");
 		ret = -ENODEV;
 		goto out_disconnect_phy;
 	}
diff --git a/drivers/net/ethernet/lantiq_xrx200.c b/drivers/net/ethernet/lantiq_xrx200.c
index cda641ef89af..900affbdcc0e 100644
--- a/drivers/net/ethernet/lantiq_xrx200.c
+++ b/drivers/net/ethernet/lantiq_xrx200.c
@@ -458,17 +458,11 @@ static int xrx200_probe(struct platform_device *pdev)
 	}
 
 	priv->chan_rx.dma.irq = platform_get_irq_byname(pdev, "rx");
-	if (priv->chan_rx.dma.irq < 0) {
-		dev_err(dev, "failed to get RX IRQ, %i\n",
-			priv->chan_rx.dma.irq);
+	if (priv->chan_rx.dma.irq < 0)
 		return -ENOENT;
-	}
 	priv->chan_tx.dma.irq = platform_get_irq_byname(pdev, "tx");
-	if (priv->chan_tx.dma.irq < 0) {
-		dev_err(dev, "failed to get TX IRQ, %i\n",
-			priv->chan_tx.dma.irq);
+	if (priv->chan_tx.dma.irq < 0)
 		return -ENOENT;
-	}
 
 	/* get the clock */
 	priv->clk = devm_clk_get(dev, NULL);
diff --git a/drivers/net/ethernet/nuvoton/w90p910_ether.c b/drivers/net/ethernet/nuvoton/w90p910_ether.c
index 3d73970b3a2e..219b0b863c89 100644
--- a/drivers/net/ethernet/nuvoton/w90p910_ether.c
+++ b/drivers/net/ethernet/nuvoton/w90p910_ether.c
@@ -993,14 +993,12 @@ static int w90p910_ether_probe(struct platform_device *pdev)
 
 	ether->txirq = platform_get_irq(pdev, 0);
 	if (ether->txirq < 0) {
-		dev_err(&pdev->dev, "failed to get ether tx irq\n");
 		error = -ENXIO;
 		goto failed_free_io;
 	}
 
 	ether->rxirq = platform_get_irq(pdev, 1);
 	if (ether->rxirq < 0) {
-		dev_err(&pdev->dev, "failed to get ether rx irq\n");
 		error = -ENXIO;
 		goto failed_free_io;
 	}
diff --git a/drivers/net/ethernet/qualcomm/emac/emac.c b/drivers/net/ethernet/qualcomm/emac/emac.c
index 59c2349b59df..bfe10464c81f 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac.c
@@ -556,11 +556,8 @@ static int emac_probe_resources(struct platform_device *pdev,
 
 	/* Core 0 interrupt */
 	ret = platform_get_irq(pdev, 0);
-	if (ret < 0) {
-		dev_err(&pdev->dev,
-			"error: missing core0 irq resource (error=%i)\n", ret);
+	if (ret < 0)
 		return ret;
-	}
 	adpt->irq.irq = ret;
 
 	/* base register address */
diff --git a/drivers/net/ethernet/socionext/sni_ave.c b/drivers/net/ethernet/socionext/sni_ave.c
index 51a7b48db4bc..87ab0b5da91e 100644
--- a/drivers/net/ethernet/socionext/sni_ave.c
+++ b/drivers/net/ethernet/socionext/sni_ave.c
@@ -1573,10 +1573,8 @@ static int ave_probe(struct platform_device *pdev)
 	}
 
 	irq = platform_get_irq(pdev, 0);
-	if (irq < 0) {
-		dev_err(dev, "IRQ not found\n");
+	if (irq < 0)
 		return irq;
-	}
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	base = devm_ioremap_resource(dev, res);
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
index 3a14cdd01f5f..cac32f7eb6f3 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
@@ -428,13 +428,8 @@ static int dwc_eth_dwmac_probe(struct platform_device *pdev)
 	 * resource initialization is done in the glue logic.
 	 */
 	stmmac_res.irq = platform_get_irq(pdev, 0);
-	if (stmmac_res.irq < 0) {
-		if (stmmac_res.irq != -EPROBE_DEFER)
-			dev_err(&pdev->dev,
-				"IRQ configuration information not found\n");
-
+	if (stmmac_res.irq < 0)
 		return stmmac_res.irq;
-	}
 	stmmac_res.wol_irq = stmmac_res.irq;
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
index 73fc2524372e..1ca3d8009b55 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
@@ -602,13 +602,8 @@ int stmmac_get_platform_resources(struct platform_device *pdev,
 	 * probe if needed before we went too far with resource allocation.
 	 */
 	stmmac_res->irq = platform_get_irq_byname(pdev, "macirq");
-	if (stmmac_res->irq < 0) {
-		if (stmmac_res->irq != -EPROBE_DEFER) {
-			dev_err(&pdev->dev,
-				"MAC IRQ configuration information not found\n");
-		}
+	if (stmmac_res->irq < 0)
 		return stmmac_res->irq;
-	}
 
 	/* On some platforms e.g. SPEAr the wake up irq differs from the mac irq
 	 * The external wake up irq can be passed through the platform code
diff --git a/drivers/net/wireless/mediatek/mt76/mt7603/soc.c b/drivers/net/wireless/mediatek/mt76/mt7603/soc.c
index b920be1f5718..c6c1ce69bcbc 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7603/soc.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7603/soc.c
@@ -17,10 +17,8 @@ mt76_wmac_probe(struct platform_device *pdev)
 	int ret;
 
 	irq = platform_get_irq(pdev, 0);
-	if (irq < 0) {
-		dev_err(&pdev->dev, "Failed to get device IRQ\n");
+	if (irq < 0)
 		return irq;
-	}
 
 	mem_base = devm_ioremap_resource(&pdev->dev, res);
 	if (IS_ERR(mem_base)) {
-- 
Sent by a computer through tubes


^ permalink raw reply related

* [PATCH v2 bpf-next] selftests/bpf: fix clearing buffered output between tests/subtests
From: Andrii Nakryiko @ 2019-07-30 18:05 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Clear buffered output once a test or subtest finishes, even if the test
was successful. Not doing this leads to accumulation of output from
previous tests, and on the first failed test lots of irrelevant output
will be dumped, greatly confusing things.

v1->v2: fix Fixes tag, add more context to patch

Fixes: 3a516a0a3a7b ("selftests/bpf: add sub-tests support for test_progs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/test_progs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 546d99b3ec34..db00196c8315 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -39,22 +39,22 @@ static bool should_run(struct test_selector *sel, int num, const char *name)
 }
 
 static void dump_test_log(const struct prog_test_def *test, bool failed)
 {
 	if (env.verbose || test->force_log || failed) {
 		if (env.log_cnt) {
 			fprintf(stdout, "%s", env.log_buf);
 			if (env.log_buf[env.log_cnt - 1] != '\n')
 				fprintf(stdout, "\n");
 		}
-		env.log_cnt = 0;
 	}
+	env.log_cnt = 0;
 }
 
 void test__end_subtest()
 {
 	struct prog_test_def *test = env.test;
 	int sub_error_cnt = error_cnt - test->old_error_cnt;
 
 	if (sub_error_cnt)
 		env.fail_cnt++;
 	else
-- 
2.17.1
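
For readers following along, the effect of moving the reset out of the
conditional can be illustrated with a minimal standalone sketch. It is not
the test_progs code: `sketch_env`, `sketch_log()` and `sketch_end_test()`
are hypothetical names, and the dump condition is reduced to a single
`failed` flag (the real function also checks verbose and force_log).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the test runner's log state. */
struct sketch_env {
	char log_buf[256];
	size_t log_cnt;
};

/* Append msg to the buffered log, truncating if the buffer is full. */
static void sketch_log(struct sketch_env *env, const char *msg)
{
	size_t room = sizeof(env->log_buf) - env->log_cnt;
	int n = snprintf(env->log_buf + env->log_cnt, room, "%s", msg);

	if (n < 0)
		return;
	env->log_cnt += (size_t)n < room ? (size_t)n : room - 1;
}

/*
 * Finish one test: dump the buffered log only on failure, but reset the
 * buffer in every case so output never leaks into the next test.
 * Returns the number of bytes dumped.
 */
static size_t sketch_end_test(struct sketch_env *env, bool failed, FILE *out)
{
	size_t dumped = 0;

	if (failed && env->log_cnt)
		dumped = fwrite(env->log_buf, 1, env->log_cnt, out);

	env->log_cnt = 0;	/* unconditional reset, as in the patch */
	return dumped;
}
```

With the reset inside the `if`, a passing test would leave its output in
the buffer and the next failing test would dump both.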


^ permalink raw reply related

* Re: [PATCH bpf-next] tools: bpftool: add support for reporting the effective cgroup progs
From: Takshak Chahande @ 2019-07-30 18:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: alexei.starovoitov@gmail.com, daniel@iogearbox.net,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	oss-drivers@netronome.com, Kernel Team, Quentin Monnet
In-Reply-To: <20190729213538.8960-1-jakub.kicinski@netronome.com>

Jakub Kicinski <jakub.kicinski@netronome.com> wrote on Mon [2019-Jul-29 14:35:38 -0700]:
> Takshak said in the original submission:
> 
> With different bpf attach_flags available to attach bpf programs specially
> with BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI, the list of effective
> bpf-programs available to any sub-cgroups really needs to be available for
> easy debugging.
> 
> Using BPF_F_QUERY_EFFECTIVE flag, one can get the list of not only attached
> bpf-programs to a cgroup but also the inherited ones from parent cgroup.
> 
> So a new option is introduced to use BPF_F_QUERY_EFFECTIVE query flag here
> to list all the effective bpf-programs available for execution at a specified
> cgroup.
> 
> Reused modified test program test_cgroup_attach from tools/testing/selftests/bpf:
>   # ./test_cgroup_attach
> 
> With old bpftool:
> 
>  # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/
>   ID       AttachType      AttachFlags     Name
>   271      egress          multi           pkt_cntr_1
>   272      egress          multi           pkt_cntr_2
> 
> Attached new program pkt_cntr_4 in cg2 gives following:
> 
>  # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2
>   ID       AttachType      AttachFlags     Name
>   273      egress          override        pkt_cntr_4
> 
> And with new "effective" option it shows all effective programs for cg2:
> 
>  # bpftool cgroup show /sys/fs/cgroup/cgroup-test-work-dir/cg1/cg2 effective
>   ID       AttachType      AttachFlags     Name
>   273      egress          override        pkt_cntr_4
>   271      egress          override        pkt_cntr_1
>   272      egress          override        pkt_cntr_2
> 
> Compared to original submission use a local flag instead of global
> option.
> 
> We need to clear query_flags on every command, in case batch mode
> wants to use varying settings.
> 
> Signed-off-by: Takshak Chahande <ctakshak@fb.com>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
>  .../bpftool/Documentation/bpftool-cgroup.rst  | 16 +++--
>  tools/bpf/bpftool/bash-completion/bpftool     | 15 +++--
>  tools/bpf/bpftool/cgroup.c                    | 65 ++++++++++++-------
>  3 files changed, 63 insertions(+), 33 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
> index 585f270c2d25..06a28b07787d 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-cgroup.rst
> @@ -20,8 +20,8 @@ SYNOPSIS
>  CGROUP COMMANDS
>  ===============
>  
> -|	**bpftool** **cgroup { show | list }** *CGROUP*
> -|	**bpftool** **cgroup tree** [*CGROUP_ROOT*]
> +|	**bpftool** **cgroup { show | list }** *CGROUP* [**effective**]
> +|	**bpftool** **cgroup tree** [*CGROUP_ROOT*] [**effective**]
>  |	**bpftool** **cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
>  |	**bpftool** **cgroup detach** *CGROUP* *ATTACH_TYPE* *PROG*
>  |	**bpftool** **cgroup help**
> @@ -35,13 +35,17 @@ CGROUP COMMANDS
>  
>  DESCRIPTION
>  ===========
> -	**bpftool cgroup { show | list }** *CGROUP*
> +	**bpftool cgroup { show | list }** *CGROUP* [**effective**]
>  		  List all programs attached to the cgroup *CGROUP*.
>  
>  		  Output will start with program ID followed by attach type,
>  		  attach flags and program name.
>  
> -	**bpftool cgroup tree** [*CGROUP_ROOT*]
> +		  If **effective** is specified retrieve effective programs that
> +		  will execute for events within a cgroup. This includes
> +		  inherited along with attached ones.
> +
> +	**bpftool cgroup tree** [*CGROUP_ROOT*] [**effective**]
>  		  Iterate over all cgroups in *CGROUP_ROOT* and list all
>  		  attached programs. If *CGROUP_ROOT* is not specified,
>  		  bpftool uses cgroup v2 mountpoint.
> @@ -50,6 +54,10 @@ DESCRIPTION
>  		  commands: it starts with absolute cgroup path, followed by
>  		  program ID, attach type, attach flags and program name.
>  
> +		  If **effective** is specified retrieve effective programs that
> +		  will execute for events within a cgroup. This includes
> +		  inherited along with attached ones.
> +
>  	**bpftool cgroup attach** *CGROUP* *ATTACH_TYPE* *PROG* [*ATTACH_FLAGS*]
>  		  Attach program *PROG* to the cgroup *CGROUP* with attach type
>  		  *ATTACH_TYPE* and optional *ATTACH_FLAGS*.
> diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> index 6b961a5ed100..df16c5415444 100644
> --- a/tools/bpf/bpftool/bash-completion/bpftool
> +++ b/tools/bpf/bpftool/bash-completion/bpftool
> @@ -710,12 +710,15 @@ _bpftool()
>              ;;
>          cgroup)
>              case $command in
> -                show|list)
> -                    _filedir
> -                    return 0
> -                    ;;
> -                tree)
> -                    _filedir
> +                show|list|tree)
> +                    case $cword in
> +                        3)
> +                            _filedir
> +                            ;;
> +                        4)
> +                            COMPREPLY=( $( compgen -W 'effective' -- "$cur" ) )
> +                            ;;
> +                    esac
>                      return 0
>                      ;;
>                  attach|detach)
> diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
> index f3c05b08c68c..339c2c78b8e4 100644
> --- a/tools/bpf/bpftool/cgroup.c
> +++ b/tools/bpf/bpftool/cgroup.c
> @@ -29,6 +29,8 @@
>  	"                        recvmsg4 | recvmsg6 | sysctl |\n"	       \
>  	"                        getsockopt | setsockopt }"
>  
> +static unsigned int query_flags;
> +
>  static const char * const attach_type_strings[] = {
>  	[BPF_CGROUP_INET_INGRESS] = "ingress",
>  	[BPF_CGROUP_INET_EGRESS] = "egress",
> @@ -107,7 +109,8 @@ static int count_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type)
>  	__u32 prog_cnt = 0;
>  	int ret;
>  
> -	ret = bpf_prog_query(cgroup_fd, type, 0, NULL, NULL, &prog_cnt);
> +	ret = bpf_prog_query(cgroup_fd, type, query_flags, NULL,
> +			     NULL, &prog_cnt);
>  	if (ret)
>  		return -1;
>  
> @@ -125,8 +128,8 @@ static int show_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type,
>  	int ret;
>  
>  	prog_cnt = ARRAY_SIZE(prog_ids);
> -	ret = bpf_prog_query(cgroup_fd, type, 0, &attach_flags, prog_ids,
> -			     &prog_cnt);
> +	ret = bpf_prog_query(cgroup_fd, type, query_flags, &attach_flags,
> +			     prog_ids, &prog_cnt);
>  	if (ret)
>  		return ret;
>  
> @@ -158,20 +161,30 @@ static int show_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type,
>  static int do_show(int argc, char **argv)
>  {
>  	enum bpf_attach_type type;
> +	const char *path;
>  	int cgroup_fd;
>  	int ret = -1;
>  
> -	if (argc < 1) {
> -		p_err("too few parameters for cgroup show");
> -		goto exit;
> -	} else if (argc > 1) {
> -		p_err("too many parameters for cgroup show");
> -		goto exit;
> +	query_flags = 0;
> +
> +	if (!REQ_ARGS(1))
> +		return -1;
> +	path = GET_ARG();
> +
> +	while (argc) {
> +		if (is_prefix(*argv, "effective")) {
> +			query_flags |= BPF_F_QUERY_EFFECTIVE;
> +			NEXT_ARG();
> +		} else {
> +			p_err("expected no more arguments, 'effective', got: '%s'?",
> +			      *argv);
> +			return -1;
> +		}
>  	}
This while loop unnecessarily allows multiple 'effective' keywords in the
arguments. IMO, we should strictly allow only a single occurrence of the
'effective' keyword.
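
One possible way to enforce that, as an untested standalone sketch rather
than a drop-in change for cgroup.c: `sketch_parse_show_args()`,
`sketch_is_prefix()` and `SKETCH_QUERY_EFFECTIVE` are hypothetical
stand-ins for the bpftool helpers and the BPF_F_QUERY_EFFECTIVE flag, so
that the snippet builds outside the tree.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the UAPI query flag (assumption, not the real value). */
#define SKETCH_QUERY_EFFECTIVE (1U << 0)

/* Simplified stand-in for is_prefix(): pfx is a non-empty prefix of str. */
static bool sketch_is_prefix(const char *pfx, const char *str)
{
	size_t len = strlen(pfx);

	return len > 0 && strncmp(pfx, str, len) == 0;
}

/*
 * Parse the arguments that follow the cgroup path. Accepts either nothing
 * or exactly one 'effective' keyword; a repeated keyword or any other word
 * is rejected. Returns 0 on success, -1 on error.
 */
static int sketch_parse_show_args(int argc, char **argv, unsigned int *flags)
{
	*flags = 0;

	while (argc) {
		if (!sketch_is_prefix(*argv, "effective")) {
			fprintf(stderr, "expected 'effective', got: '%s'\n",
				*argv);
			return -1;
		}
		if (*flags & SKETCH_QUERY_EFFECTIVE) {
			fprintf(stderr, "'effective' specified twice\n");
			return -1;
		}
		*flags |= SKETCH_QUERY_EFFECTIVE;
		argc--;
		argv++;
	}

	return 0;
}
```

Since there is only one optional keyword today, a plain "at most one extra
argument" check would also work; the loop form just leaves room for more
keywords later.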

>  
> -	cgroup_fd = open(argv[0], O_RDONLY);
> +	cgroup_fd = open(path, O_RDONLY);
>  	if (cgroup_fd < 0) {
> -		p_err("can't open cgroup %s", argv[0]);
> +		p_err("can't open cgroup %s", path);
>  		goto exit;
>  	}
>  
> @@ -297,23 +310,29 @@ static int do_show_tree(int argc, char **argv)
>  	char *cgroup_root;
>  	int ret;
>  
> -	switch (argc) {
> -	case 0:
> +	query_flags = 0;
> +
> +	if (!argc) {
>  		cgroup_root = find_cgroup_root();
>  		if (!cgroup_root) {
>  			p_err("cgroup v2 isn't mounted");
>  			return -1;
>  		}
> -		break;
> -	case 1:
> -		cgroup_root = argv[0];
> -		break;
> -	default:
> -		p_err("too many parameters for cgroup tree");
> -		return -1;
> +	} else {
> +		cgroup_root = GET_ARG();
> +
> +		while (argc) {
> +			if (is_prefix(*argv, "effective")) {
> +				query_flags |= BPF_F_QUERY_EFFECTIVE;
> +				NEXT_ARG();

NEXT_ARG() updates argc; that means that after this outer if/else we need
to know how argc became 0 (through which path) before freeing the memory
allocated for `cgroup_root` at the end of this function.
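
A minimal sketch of one way to keep track of that, assuming the cleanup at
the end of the function should free `cgroup_root` only when it came from
find_cgroup_root(). Everything here is hypothetical and standalone:
`sketch_find_root()` merely imitates a helper that returns heap memory, and
`struct sketch_root` is not an existing bpftool type.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for find_cgroup_root(): caller must free(). */
static char *sketch_find_root(void)
{
	char *root = malloc(sizeof("/sys/fs/cgroup"));

	if (root)
		strcpy(root, "/sys/fs/cgroup");
	return root;
}

/* Path plus an explicit record of who owns the memory behind it. */
struct sketch_root {
	char *path;
	bool owned;	/* true only if path came from sketch_find_root() */
};

/*
 * Choose the cgroup root. Ownership is recorded here, before any argument
 * is consumed, so later changes to argc cannot confuse the cleanup path.
 */
static int sketch_pick_root(int argc, char **argv, struct sketch_root *r)
{
	r->owned = (argc == 0);
	r->path = r->owned ? sketch_find_root() : argv[0];

	return r->path ? 0 : -1;
}

/* Free the path only when we allocated it; argv memory is left alone. */
static void sketch_release_root(struct sketch_root *r)
{
	if (r->owned)
		free(r->path);
	r->path = NULL;
	r->owned = false;
}
```

The same effect could be had with a single local bool set in the `!argc`
branch; the point is only that the free decision should not depend on the
final value of argc.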

> +			} else {
> +				p_err("expected no more arguments, 'effective', got: '%s'?",
> +				      *argv);
> +				return -1;
> +			}
> +		}
>  	}

>  
> -
>  	if (json_output)
>  		jsonw_start_array(json_wtr);
>  	else
> @@ -459,8 +478,8 @@ static int do_help(int argc, char **argv)
>  	}
>  
>  	fprintf(stderr,
> -		"Usage: %s %s { show | list } CGROUP\n"
> -		"       %s %s tree [CGROUP_ROOT]\n"
> +		"Usage: %s %s { show | list } CGROUP [**effective**]\n"
> +		"       %s %s tree [CGROUP_ROOT] [**effective**]\n"
>  		"       %s %s attach CGROUP ATTACH_TYPE PROG [ATTACH_FLAGS]\n"
>  		"       %s %s detach CGROUP ATTACH_TYPE PROG\n"
>  		"       %s %s help\n"
> -- 
> 2.21.0
> 

Thanks for the patch. Apart from the above two issues, the patch looks good.

^ permalink raw reply

* [PATCH bpf-next] selftests/bpf: fix clearing buffered output between tests/subtests
From: Andrii Nakryiko @ 2019-07-30 17:56 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Clear buffered output once a test or subtest finishes, even if the test
was successful. Not doing this leads to accumulation of output from
previous tests, and on the first failed test lots of irrelevant output
will be dumped, greatly confusing things.

Fixed: 3a516a0a3a7b ("selftests/bpf: add sub-tests support for test_progs")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/test_progs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index 546d99b3ec34..db00196c8315 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -46,8 +46,8 @@ static void dump_test_log(const struct prog_test_def *test, bool failed)
 			if (env.log_buf[env.log_cnt - 1] != '\n')
 				fprintf(stdout, "\n");
 		}
-		env.log_cnt = 0;
 	}
+	env.log_cnt = 0;
 }
 
 void test__end_subtest()
-- 
2.17.1


^ permalink raw reply related

* [FINAL REMINDER!!] LPC 2019 Networking Track CFP
From: David Miller @ 2019-07-30 17:41 UTC (permalink / raw)
  To: netdev, daniel; +Cc: linux-wireless, netfilter-devel, bpf, linux-kernel, lwn

The deadline is this Friday, please get your proposals in as soon as
possible and do not procrastinate.  The deadline absolutely cannot be
extended.

This is a call for proposals for the 3 day networking track at the
Linux Plumbers Conference in Lisbon, which will be happening on
September 9th-11th, 2019.

We are seeking talks of 40 minutes in length (including Q & A),
optionally accompanied by papers of 2 to 10 pages in length.  The
papers, while not required, are very strongly encouraged by the
committee.  The submitter's intention to provide a paper will be taken
into consideration as a criterion when deciding which proposals to
accept.

Any kind of advanced networking-related topic will be considered.

Please submit your proposals on the LPC website at:

	https://www.linuxplumbersconf.org/event/4/abstracts/#submit-abstract

And be sure to select "Networking Summit Track" in the Track pulldown
menu.

Proposals must be submitted by August 2nd, and submitters will be
notified of acceptance by August 9th.

Final slides and papers (as PDF) are due on September 2nd.

Looking forward to seeing you all in Lisbon in September!

^ permalink raw reply

* Re: [PATCH net-next] rxrpc: Fix -Wframe-larger-than= warnings from on-stack crypto
From: David Miller @ 2019-07-30 17:32 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel, clang-built-linux, arnd,
	keescook
In-Reply-To: <156449861697.10315.4666924841804740487.stgit@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Tue, 30 Jul 2019 15:56:57 +0100

> rxkad sometimes triggers a warning about oversized stack frames when
> building with clang for a 32-bit architecture:
> 
> net/rxrpc/rxkad.c:243:12: error: stack frame size of 1088 bytes in function 'rxkad_secure_packet' [-Werror,-Wframe-larger-than=]
> net/rxrpc/rxkad.c:501:12: error: stack frame size of 1088 bytes in function 'rxkad_verify_packet' [-Werror,-Wframe-larger-than=]
> 
> The problem is the combination of SYNC_SKCIPHER_REQUEST_ON_STACK() in
> rxkad_verify_packet()/rxkad_secure_packet() with the relatively large
> scatterlist in rxkad_verify_packet_1()/rxkad_secure_packet_encrypt().
> 
> The warning does not show up when using gcc, which does not inline the
> functions as aggressively, but the problem is still the same.
> 
> Allocate the cipher buffers from the slab instead, caching the allocated
> packet crypto request memory used for DATA packet crypto in the rxrpc_call
> struct.
> 
> Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
> Reported-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: David Howells <dhowells@redhat.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> cc: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

^ permalink raw reply

* Re: [PATCH net 0/2] rxrpc: Fixes
From: David Miller @ 2019-07-30 17:31 UTC (permalink / raw)
  To: dhowells; +Cc: netdev, linux-afs, linux-kernel
In-Reply-To: <156449821120.9558.2821927090314866621.stgit@warthog.procyon.org.uk>

From: David Howells <dhowells@redhat.com>
Date: Tue, 30 Jul 2019 15:50:11 +0100

> 
> Here are a couple of fixes for rxrpc:
> 
>  (1) Fix a potential deadlock in the peer keepalive dispatcher.
> 
>  (2) Fix a missing notification when a UDP sendmsg error occurs in rxrpc.
> 
> 
> The patches are tagged here:
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
> 	rxrpc-fixes-20190730

Pulled, thanks David.

^ permalink raw reply
