Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Kirill Tkhai @ 2018-09-03 13:41 UTC (permalink / raw)
  To: Christian Brauner
  Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
	kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
	nicolas.dichtel
In-Reply-To: <20180901013427.tj3t2mlik4t7hlt5@gmail.com>

On 01.09.2018 04:34, Christian Brauner wrote:
> On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
>> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
>>> On 29.08.2018 21:13, Christian Brauner wrote:
>>>> Hi Kirill,
>>>>
>>>> Thanks for the question!
>>>>
>>>> On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
>>>>> Hi, Christian,
>>>>>
>>>>> On 29.08.2018 02:18, Christian Brauner wrote:
>>>>>> From: Christian Brauner <christian@brauner.io>
>>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> A while back we introduced and enabled IFLA_IF_NETNSID in
>>>>>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
>>>>>> to signficant performance increases since it allows userspace to avoid
>>>>>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
>>>>>> interfaces from the netns associated with the netns_fd. Especially when a
>>>>>> lot of network namespaces are in use, using setns() becomes increasingly
>>>>>> problematic when performance matters.
>>>>>
>>>>> could you please give a real example, when setns()+socket(AF_NETLINK) cause
>>>>> problems with the performance? You should do this only once on application
>>>>> startup, and then you have created netlink sockets in any net namespaces you
>>>>> need. What is the problem here?
>>>>
>>>> So we have a daemon (LXD) that is often running thousands of containers.
>>>> When users issue a lxc list request against the daemon it returns a list
>>>> of all containers including all of the interfaces and addresses for each
>>>> container. To retrieve those addresses we currently rely on setns() +
>>>> getifaddrs() for each of those containers. That has horrible
>>>> performance.
>>>
>>> Could you please provide some numbers showing that setns()
>>> introduces signify performance decrease in the application?
>>
>> Sure, might take a few days++ though since I'm traveling.
> 
> Hey Kirill,
> 
> As promised here's some code [1] that compares performance. I basically
> did a setns() to the network namespace and called getifaddrs() and
> compared this to the scenario where I use the newly introduced property.
> I did this 1 million times and calculated the mean getifaddrs()
> retrieval time based on that.
> My patch cuts the time in half.
> 
> brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
> Mean time in microseconds (netnsid): 81
> Mean time in microseconds (setns): 162
> 
> Christian
> 
> I'm only appending the main file since the netsns_getifaddrs() code I
> used is pretty long:
> 
> [1]:
> 
> #define _GNU_SOURCE
> #define __STDC_FORMAT_MACROS
> #include <fcntl.h>
> #include <inttypes.h>
> #include <linux/types.h>
> #include <sched.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/time.h>
> #include <sys/types.h>
> #include <unistd.h>
> 
> #include "netns_getifaddrs.h"
> #include "print_getifaddrs.h"
> 
> #define ITERATIONS 1000000
> #define SEC_TO_MICROSEC(x) (1000000 * (x))
> 
> int main(int argc, char *argv[])
> {
> 	int i, ret;
> 	__s32 netns_id;
> 	pid_t netns_pid;
> 	char path[1024];
> 	intmax_t times[ITERATIONS];
> 	struct timeval t1, t2;
> 	intmax_t time_in_mcs;
> 	int fret = EXIT_FAILURE;
> 	intmax_t sum = 0;
> 	int host_netns_fd = -1, netns_fd = -1;
> 
> 	struct ifaddrs *ifaddrs = NULL;
> 
> 	if (argc != 3)
> 		goto on_error;
> 
> 	netns_id = atoi(argv[1]);
> 	netns_pid = atoi(argv[2]);
> 	printf("%d\n", netns_id);
> 	printf("%d\n", netns_pid);
> 
> 	for (i = 0; i < ITERATIONS; i++) {
> 		ret = gettimeofday(&t1, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = netns_getifaddrs(&ifaddrs, netns_id);
> 		freeifaddrs(ifaddrs);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = gettimeofday(&t2, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> 		times[i] = time_in_mcs;
> 	}
> 
> 	for (i = 0; i < ITERATIONS; i++)
> 		sum += times[i];
> 
> 	printf("Mean time in microseconds (netnsid): %ju\n",
> 	       sum / ITERATIONS);
> 
> 	ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
> 	if (ret < 0 || (size_t)ret >= sizeof(path))
> 		goto on_error;
> 
> 	netns_fd = open(path, O_RDONLY | O_CLOEXEC);
> 	if (netns_fd < 0)
> 		goto on_error;
> 
> 	host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
> 	if (host_netns_fd < 0)
> 		goto on_error;
> 
> 	memset(times, 0, sizeof(times));
> 	for (i = 0; i < ITERATIONS; i++) {
> 		ret = gettimeofday(&t1, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = setns(netns_fd, CLONE_NEWNET);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = getifaddrs(&ifaddrs);
> 		freeifaddrs(ifaddrs);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = gettimeofday(&t2, NULL);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		ret = setns(host_netns_fd, CLONE_NEWNET);
> 		if (ret < 0)
> 			goto on_error;
> 
> 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> 		times[i] = time_in_mcs;
> 	}
> 
> 	for (i = 0; i < ITERATIONS; i++)
> 		sum += times[i];
> 
> 	printf("Mean time in microseconds (setns): %ju\n",
> 	       sum / ITERATIONS);
> 
> 	fret = EXIT_SUCCESS;
> 
> on_error:
> 	if (netns_fd >= 0)
> 		close(netns_fd);
> 
> 	if (host_netns_fd >= 0)
> 		close(host_netns_fd);
> 
> 	exit(fret);
> }

But this is a synthetic test, while I asked about real workflow.
Is this real problem for lxd, and there is observed performance
decrease?

I see, there are already nsid use in existing code, but I have to say,
that adding new types of variables as a system call arguments make it
less modular. When you request RTM_GETADDR for a specific nsid, this
obligates the kernel to make everything unchangable during the call,
doesn't it?

We may look at existing code as example, what problems this may cause.
Look at do_setlink(). There are many different types of variables,
and all of them should be dereferenced atomically. So, all the function
is executed under global rtnl. And this causes delays in another config
places, which are sensitive to rtnl. So, adding more dimensions to RTM_GETADDR
may turn it in the same overloaded function as do_setlink() is. And one
day, when we reach the state, when we must rework all of this, we won't
be able to do this. I'm not sure, now is not too late.

I just say about this, because it's possible we should consider another
approach in rtnl communication in general, and stop to overload it.

Kirill

^ permalink raw reply

* RE: [PATCH 2/3] xen-netback: validate queue numbers in xenvif_set_hash_mapping()
From: Paul Durrant @ 2018-09-03  9:23 UTC (permalink / raw)
  To: 'Jan Beulich', Wei Liu
  Cc: davem@davemloft.net, xen-devel, netdev@vger.kernel.org
In-Reply-To: <5B85636102000078001E2A4D@prv1-mh.provo.novell.com>

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 28 August 2018 16:00
> To: Paul Durrant <Paul.Durrant@citrix.com>; Wei Liu <wei.liu2@citrix.com>
> Cc: davem@davemloft.net; xen-devel <xen-devel@lists.xenproject.org>;
> netdev@vger.kernel.org
> Subject: [PATCH 2/3] xen-netback: validate queue numbers in
> xenvif_set_hash_mapping()
> 
> Checking them before the grant copy means nothing as to the validity of
> the incoming request. As we shouldn't make the new data live before
> having validated it, introduce a second instance of the mapping array.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> 
> ---
>  drivers/net/xen-netback/common.h    |    3 ++-
>  drivers/net/xen-netback/hash.c      |   20 ++++++++++++++------
>  drivers/net/xen-netback/interface.c |    3 ++-
>  3 files changed, 18 insertions(+), 8 deletions(-)
> 
> --- 4.19-rc1-xen-netback-set-hash-mapping.orig/drivers/net/xen-
> netback/common.h
> +++ 4.19-rc1-xen-netback-set-hash-mapping/drivers/net/xen-
> netback/common.h
> @@ -241,8 +241,9 @@ struct xenvif_hash_cache {
>  struct xenvif_hash {
>  	unsigned int alg;
>  	u32 flags;
> +	bool mapping_sel;
>  	u8 key[XEN_NETBK_MAX_HASH_KEY_SIZE];
> -	u32 mapping[XEN_NETBK_MAX_HASH_MAPPING_SIZE];
> +	u32 mapping[2][XEN_NETBK_MAX_HASH_MAPPING_SIZE];
>  	unsigned int size;
>  	struct xenvif_hash_cache cache;
>  };
> --- 4.19-rc1-xen-netback-set-hash-mapping.orig/drivers/net/xen-
> netback/hash.c
> +++ 4.19-rc1-xen-netback-set-hash-mapping/drivers/net/xen-
> netback/hash.c
> @@ -324,7 +324,8 @@ u32 xenvif_set_hash_mapping_size(struct
>  		return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> 
>  	vif->hash.size = size;
> -	memset(vif->hash.mapping, 0, sizeof(u32) * size);
> +	memset(vif->hash.mapping[vif->hash.mapping_sel], 0,
> +	       sizeof(u32) * size);
> 
>  	return XEN_NETIF_CTRL_STATUS_SUCCESS;
>  }
> @@ -332,7 +333,7 @@ u32 xenvif_set_hash_mapping_size(struct
>  u32 xenvif_set_hash_mapping(struct xenvif *vif, u32 gref, u32 len,
>  			    u32 off)
>  {
> -	u32 *mapping = vif->hash.mapping;
> +	u32 *mapping = vif->hash.mapping[!vif->hash.mapping_sel];
>  	struct gnttab_copy copy_op = {
>  		.source.u.ref = gref,
>  		.source.domid = vif->domid,
> @@ -348,9 +349,8 @@ u32 xenvif_set_hash_mapping(struct xenvi
>  	copy_op.dest.u.gmfn = virt_to_gfn(mapping + off);
>  	copy_op.dest.offset = xen_offset_in_page(mapping + off);
> 
> -	while (len-- != 0)
> -		if (mapping[off++] >= vif->num_queues)
> -			return
> XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +	memcpy(mapping, vif->hash.mapping[vif->hash.mapping_sel],
> +	       vif->hash.size * sizeof(*mapping));
> 
>  	if (copy_op.len != 0) {
>  		gnttab_batch_copy(&copy_op, 1);
> @@ -359,6 +359,12 @@ u32 xenvif_set_hash_mapping(struct xenvi
>  			return
> XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
>  	}
> 
> +	while (len-- != 0)
> +		if (mapping[off++] >= vif->num_queues)
> +			return
> XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> +
> +	vif->hash.mapping_sel = !vif->hash.mapping_sel;
> +
>  	return XEN_NETIF_CTRL_STATUS_SUCCESS;
>  }
> 
> @@ -410,6 +416,8 @@ void xenvif_dump_hash_info(struct xenvif
>  	}
> 
>  	if (vif->hash.size != 0) {
> +		const u32 *mapping = vif->hash.mapping[vif-
> >hash.mapping_sel];
> +
>  		seq_puts(m, "\nHash Mapping:\n");
> 
>  		for (i = 0; i < vif->hash.size; ) {
> @@ -422,7 +430,7 @@ void xenvif_dump_hash_info(struct xenvif
>  			seq_printf(m, "[%4u - %4u]: ", i, i + n - 1);
> 
>  			for (j = 0; j < n; j++, i++)
> -				seq_printf(m, "%4u ", vif->hash.mapping[i]);
> +				seq_printf(m, "%4u ", mapping[i]);
> 
>  			seq_puts(m, "\n");
>  		}
> --- 4.19-rc1-xen-netback-set-hash-mapping.orig/drivers/net/xen-
> netback/interface.c
> +++ 4.19-rc1-xen-netback-set-hash-mapping/drivers/net/xen-
> netback/interface.c
> @@ -162,7 +162,8 @@ static u16 xenvif_select_queue(struct ne
>  	if (size == 0)
>  		return skb_get_hash_raw(skb) % dev-
> >real_num_tx_queues;
> 
> -	return vif->hash.mapping[skb_get_hash_raw(skb) % size];
> +	return vif->hash.mapping[vif->hash.mapping_sel]
> +				[skb_get_hash_raw(skb) % size];
>  }
> 
>  static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 
> 
> 

^ permalink raw reply

* Re: [PATCH v2 00/11] mscc: ocelot: add support for SerDes muxing configuration
From: Alexandre Belloni @ 2018-09-03 13:45 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Quentin Schulz, ralf, paul.burton, jhogan, robh+dt, mark.rutland,
	davem, kishon, f.fainelli, allan.nielsen, linux-mips, devicetree,
	linux-kernel, netdev, thomas.petazzoni
In-Reply-To: <20180903133415.GF4445@lunn.ch>

On 03/09/2018 15:34:15+0200, Andrew Lunn wrote:
> > I suggest patches 1 and 8 go through MIPS tree, 2 to 5 and 11 go through
> > net while the others (6, 7, 9 and 10) go through the generic PHY subsystem.
> 
> Hi Quentin
> 
> Are you expecting merge conflicts? If not, it might be simpler to gets
> ACKs from each maintainer, and then merge it though one tree.
> 

There are some other DT changes for this cycle so those should probably
go through MIPS.

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply

* RE: [PATCH 3/3] xen-netback: handle page straddling in xenvif_set_hash_mapping()
From: Paul Durrant @ 2018-09-03  9:28 UTC (permalink / raw)
  To: 'Jan Beulich', Wei Liu
  Cc: davem@davemloft.net, xen-devel, netdev@vger.kernel.org
In-Reply-To: <5B85637E02000078001E2A50@prv1-mh.provo.novell.com>

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 28 August 2018 16:00
> To: Paul Durrant <Paul.Durrant@citrix.com>; Wei Liu <wei.liu2@citrix.com>
> Cc: davem@davemloft.net; xen-devel <xen-devel@lists.xenproject.org>;
> netdev@vger.kernel.org
> Subject: [PATCH 3/3] xen-netback: handle page straddling in
> xenvif_set_hash_mapping()
> 
> There's no guarantee that the mapping array doesn't cross a page
> boundary. Use a second grant copy operation if necessary.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Personally I think it would be cleaner to out-of-line the allocation of the mapping table and ensure it is page aligned but this works so...

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> 
> ---
>  drivers/net/xen-netback/hash.c |   25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> --- 4.19-rc1-xen-netback-set-hash-mapping.orig/drivers/net/xen-
> netback/hash.c
> +++ 4.19-rc1-xen-netback-set-hash-mapping/drivers/net/xen-
> netback/hash.c
> @@ -334,28 +334,39 @@ u32 xenvif_set_hash_mapping(struct xenvi
>  			    u32 off)
>  {
>  	u32 *mapping = vif->hash.mapping[!vif->hash.mapping_sel];
> -	struct gnttab_copy copy_op = {
> +	unsigned int nr = 1;
> +	struct gnttab_copy copy_op[2] = {{
>  		.source.u.ref = gref,
>  		.source.domid = vif->domid,
>  		.dest.domid = DOMID_SELF,
>  		.len = len * sizeof(*mapping),
>  		.flags = GNTCOPY_source_gref
> -	};
> +	}};
> 
>  	if ((off + len < off) || (off + len > vif->hash.size) ||
>  	    len > XEN_PAGE_SIZE / sizeof(*mapping))
>  		return XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
> 
> -	copy_op.dest.u.gmfn = virt_to_gfn(mapping + off);
> -	copy_op.dest.offset = xen_offset_in_page(mapping + off);
> +	copy_op[0].dest.u.gmfn = virt_to_gfn(mapping + off);
> +	copy_op[0].dest.offset = xen_offset_in_page(mapping + off);
> +	if (copy_op[0].dest.offset + copy_op[0].len > XEN_PAGE_SIZE) {
> +		copy_op[1] = copy_op[0];
> +		copy_op[1].source.offset = XEN_PAGE_SIZE -
> copy_op[0].dest.offset;
> +		copy_op[1].dest.u.gmfn = virt_to_gfn(mapping + off + len);
> +		copy_op[1].dest.offset = 0;
> +		copy_op[1].len = copy_op[0].len - copy_op[1].source.offset;
> +		copy_op[0].len = copy_op[1].source.offset;
> +		nr = 2;
> +	}
> 
>  	memcpy(mapping, vif->hash.mapping[vif->hash.mapping_sel],
>  	       vif->hash.size * sizeof(*mapping));
> 
> -	if (copy_op.len != 0) {
> -		gnttab_batch_copy(&copy_op, 1);
> +	if (copy_op[0].len != 0) {
> +		gnttab_batch_copy(copy_op, nr);
> 
> -		if (copy_op.status != GNTST_okay)
> +		if (copy_op[0].status != GNTST_okay ||
> +		    copy_op[nr - 1].status != GNTST_okay)
>  			return
> XEN_NETIF_CTRL_STATUS_INVALID_PARAMETER;
>  	}
> 
> 
> 
> 

^ permalink raw reply

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Jiri Benc @ 2018-09-03 13:50 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: Christian Brauner, netdev, linux-kernel, davem, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, lucien.xin,
	jakub.kicinski, nicolas.dichtel
In-Reply-To: <2319a029-7aca-b7aa-2e8f-4dfdeedcb6df@virtuozzo.com>

On Mon, 3 Sep 2018 16:41:45 +0300, Kirill Tkhai wrote:
> But this is a synthetic test, while I asked about real workflow.
> Is this real problem for lxd, and there is observed performance
> decrease?

It's actually not as much a performance problem but rather the only way
to get the data in some situations. Namely, when you have only netnsid.
This happens e.g. when you want to query a veth peer in another netns.

setns() requires a file descriptor which you don't have. Nor there is
a way to convert netnsid to a fd.

While developing the IFLA_IF_NETNSID patch, I was first thinking about
implementing an API doing the conversion. The problem is there's no
good place to put this into. It can't be done over netlink: netlink is
unreliable and you can't have the kernel open a fd for you and lose it.
There's no ioctl to use. So we'd be left with a procfs/sysfs or a
syscall.

Using netnsid to refer to the target netns seems to be a nice solution -
after all, netnsid is the identifier to use in netlink.

 Jiri

^ permalink raw reply

* [PATCH net-next v2 05/11] net: mscc: ocelot: simplify register access for PLL5 configuration
From: Quentin Schulz @ 2018-09-03  9:33 UTC (permalink / raw)
  To: alexandre.belloni, ralf, paul.burton, jhogan, robh+dt,
	mark.rutland, davem, kishon, andrew, f.fainelli
  Cc: allan.nielsen, linux-mips, devicetree, linux-kernel, netdev,
	thomas.petazzoni, Quentin Schulz
In-Reply-To: <20180903093308.24366-1-quentin.schulz@bootlin.com>

Since HSIO address space can be accessed by different drivers, let's
simplify the register address definitions so that it can be easily used
by all drivers and put the register address definition in the
include/soc/mscc/ocelot_hsio.h header file.

Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
---
 drivers/net/ethernet/mscc/ocelot.h      | 73 --------------------
 drivers/net/ethernet/mscc/ocelot_regs.c | 92 +++----------------------
 include/soc/mscc/ocelot_hsio.h          | 74 ++++++++++++++++++++
 3 files changed, 83 insertions(+), 156 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
index 2da20a352120..3720e519cdd7 100644
--- a/drivers/net/ethernet/mscc/ocelot.h
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -332,79 +332,6 @@ enum ocelot_reg {
 	SYS_CM_DATA_RD,
 	SYS_CM_OP,
 	SYS_CM_DATA,
-	HSIO_PLL5G_CFG0 = HSIO << TARGET_OFFSET,
-	HSIO_PLL5G_CFG1,
-	HSIO_PLL5G_CFG2,
-	HSIO_PLL5G_CFG3,
-	HSIO_PLL5G_CFG4,
-	HSIO_PLL5G_CFG5,
-	HSIO_PLL5G_CFG6,
-	HSIO_PLL5G_STATUS0,
-	HSIO_PLL5G_STATUS1,
-	HSIO_PLL5G_BIST_CFG0,
-	HSIO_PLL5G_BIST_CFG1,
-	HSIO_PLL5G_BIST_CFG2,
-	HSIO_PLL5G_BIST_STAT0,
-	HSIO_PLL5G_BIST_STAT1,
-	HSIO_RCOMP_CFG0,
-	HSIO_RCOMP_STATUS,
-	HSIO_SYNC_ETH_CFG,
-	HSIO_SYNC_ETH_PLL_CFG,
-	HSIO_S1G_DES_CFG,
-	HSIO_S1G_IB_CFG,
-	HSIO_S1G_OB_CFG,
-	HSIO_S1G_SER_CFG,
-	HSIO_S1G_COMMON_CFG,
-	HSIO_S1G_PLL_CFG,
-	HSIO_S1G_PLL_STATUS,
-	HSIO_S1G_DFT_CFG0,
-	HSIO_S1G_DFT_CFG1,
-	HSIO_S1G_DFT_CFG2,
-	HSIO_S1G_TP_CFG,
-	HSIO_S1G_RC_PLL_BIST_CFG,
-	HSIO_S1G_MISC_CFG,
-	HSIO_S1G_DFT_STATUS,
-	HSIO_S1G_MISC_STATUS,
-	HSIO_MCB_S1G_ADDR_CFG,
-	HSIO_S6G_DIG_CFG,
-	HSIO_S6G_DFT_CFG0,
-	HSIO_S6G_DFT_CFG1,
-	HSIO_S6G_DFT_CFG2,
-	HSIO_S6G_TP_CFG0,
-	HSIO_S6G_TP_CFG1,
-	HSIO_S6G_RC_PLL_BIST_CFG,
-	HSIO_S6G_MISC_CFG,
-	HSIO_S6G_OB_ANEG_CFG,
-	HSIO_S6G_DFT_STATUS,
-	HSIO_S6G_ERR_CNT,
-	HSIO_S6G_MISC_STATUS,
-	HSIO_S6G_DES_CFG,
-	HSIO_S6G_IB_CFG,
-	HSIO_S6G_IB_CFG1,
-	HSIO_S6G_IB_CFG2,
-	HSIO_S6G_IB_CFG3,
-	HSIO_S6G_IB_CFG4,
-	HSIO_S6G_IB_CFG5,
-	HSIO_S6G_OB_CFG,
-	HSIO_S6G_OB_CFG1,
-	HSIO_S6G_SER_CFG,
-	HSIO_S6G_COMMON_CFG,
-	HSIO_S6G_PLL_CFG,
-	HSIO_S6G_ACJTAG_CFG,
-	HSIO_S6G_GP_CFG,
-	HSIO_S6G_IB_STATUS0,
-	HSIO_S6G_IB_STATUS1,
-	HSIO_S6G_ACJTAG_STATUS,
-	HSIO_S6G_PLL_STATUS,
-	HSIO_S6G_REVID,
-	HSIO_MCB_S6G_ADDR_CFG,
-	HSIO_HW_CFG,
-	HSIO_HW_QSGMII_CFG,
-	HSIO_HW_QSGMII_STAT,
-	HSIO_CLK_CFG,
-	HSIO_TEMP_SENSOR_CTRL,
-	HSIO_TEMP_SENSOR_CFG,
-	HSIO_TEMP_SENSOR_STAT,
 };
 
 enum ocelot_regfield {
diff --git a/drivers/net/ethernet/mscc/ocelot_regs.c b/drivers/net/ethernet/mscc/ocelot_regs.c
index bf0c609caea5..9271af18b93b 100644
--- a/drivers/net/ethernet/mscc/ocelot_regs.c
+++ b/drivers/net/ethernet/mscc/ocelot_regs.c
@@ -103,82 +103,6 @@ static const u32 ocelot_qs_regmap[] = {
 	REG(QS_INH_DBG,                    0x000048),
 };
 
-static const u32 ocelot_hsio_regmap[] = {
-	REG(HSIO_PLL5G_CFG0,               0x000000),
-	REG(HSIO_PLL5G_CFG1,               0x000004),
-	REG(HSIO_PLL5G_CFG2,               0x000008),
-	REG(HSIO_PLL5G_CFG3,               0x00000c),
-	REG(HSIO_PLL5G_CFG4,               0x000010),
-	REG(HSIO_PLL5G_CFG5,               0x000014),
-	REG(HSIO_PLL5G_CFG6,               0x000018),
-	REG(HSIO_PLL5G_STATUS0,            0x00001c),
-	REG(HSIO_PLL5G_STATUS1,            0x000020),
-	REG(HSIO_PLL5G_BIST_CFG0,          0x000024),
-	REG(HSIO_PLL5G_BIST_CFG1,          0x000028),
-	REG(HSIO_PLL5G_BIST_CFG2,          0x00002c),
-	REG(HSIO_PLL5G_BIST_STAT0,         0x000030),
-	REG(HSIO_PLL5G_BIST_STAT1,         0x000034),
-	REG(HSIO_RCOMP_CFG0,               0x000038),
-	REG(HSIO_RCOMP_STATUS,             0x00003c),
-	REG(HSIO_SYNC_ETH_CFG,             0x000040),
-	REG(HSIO_SYNC_ETH_PLL_CFG,         0x000048),
-	REG(HSIO_S1G_DES_CFG,              0x00004c),
-	REG(HSIO_S1G_IB_CFG,               0x000050),
-	REG(HSIO_S1G_OB_CFG,               0x000054),
-	REG(HSIO_S1G_SER_CFG,              0x000058),
-	REG(HSIO_S1G_COMMON_CFG,           0x00005c),
-	REG(HSIO_S1G_PLL_CFG,              0x000060),
-	REG(HSIO_S1G_PLL_STATUS,           0x000064),
-	REG(HSIO_S1G_DFT_CFG0,             0x000068),
-	REG(HSIO_S1G_DFT_CFG1,             0x00006c),
-	REG(HSIO_S1G_DFT_CFG2,             0x000070),
-	REG(HSIO_S1G_TP_CFG,               0x000074),
-	REG(HSIO_S1G_RC_PLL_BIST_CFG,      0x000078),
-	REG(HSIO_S1G_MISC_CFG,             0x00007c),
-	REG(HSIO_S1G_DFT_STATUS,           0x000080),
-	REG(HSIO_S1G_MISC_STATUS,          0x000084),
-	REG(HSIO_MCB_S1G_ADDR_CFG,         0x000088),
-	REG(HSIO_S6G_DIG_CFG,              0x00008c),
-	REG(HSIO_S6G_DFT_CFG0,             0x000090),
-	REG(HSIO_S6G_DFT_CFG1,             0x000094),
-	REG(HSIO_S6G_DFT_CFG2,             0x000098),
-	REG(HSIO_S6G_TP_CFG0,              0x00009c),
-	REG(HSIO_S6G_TP_CFG1,              0x0000a0),
-	REG(HSIO_S6G_RC_PLL_BIST_CFG,      0x0000a4),
-	REG(HSIO_S6G_MISC_CFG,             0x0000a8),
-	REG(HSIO_S6G_OB_ANEG_CFG,          0x0000ac),
-	REG(HSIO_S6G_DFT_STATUS,           0x0000b0),
-	REG(HSIO_S6G_ERR_CNT,              0x0000b4),
-	REG(HSIO_S6G_MISC_STATUS,          0x0000b8),
-	REG(HSIO_S6G_DES_CFG,              0x0000bc),
-	REG(HSIO_S6G_IB_CFG,               0x0000c0),
-	REG(HSIO_S6G_IB_CFG1,              0x0000c4),
-	REG(HSIO_S6G_IB_CFG2,              0x0000c8),
-	REG(HSIO_S6G_IB_CFG3,              0x0000cc),
-	REG(HSIO_S6G_IB_CFG4,              0x0000d0),
-	REG(HSIO_S6G_IB_CFG5,              0x0000d4),
-	REG(HSIO_S6G_OB_CFG,               0x0000d8),
-	REG(HSIO_S6G_OB_CFG1,              0x0000dc),
-	REG(HSIO_S6G_SER_CFG,              0x0000e0),
-	REG(HSIO_S6G_COMMON_CFG,           0x0000e4),
-	REG(HSIO_S6G_PLL_CFG,              0x0000e8),
-	REG(HSIO_S6G_ACJTAG_CFG,           0x0000ec),
-	REG(HSIO_S6G_GP_CFG,               0x0000f0),
-	REG(HSIO_S6G_IB_STATUS0,           0x0000f4),
-	REG(HSIO_S6G_IB_STATUS1,           0x0000f8),
-	REG(HSIO_S6G_ACJTAG_STATUS,        0x0000fc),
-	REG(HSIO_S6G_PLL_STATUS,           0x000100),
-	REG(HSIO_S6G_REVID,                0x000104),
-	REG(HSIO_MCB_S6G_ADDR_CFG,         0x000108),
-	REG(HSIO_HW_CFG,                   0x00010c),
-	REG(HSIO_HW_QSGMII_CFG,            0x000110),
-	REG(HSIO_HW_QSGMII_STAT,           0x000114),
-	REG(HSIO_CLK_CFG,                  0x000118),
-	REG(HSIO_TEMP_SENSOR_CTRL,         0x00011c),
-	REG(HSIO_TEMP_SENSOR_CFG,          0x000120),
-	REG(HSIO_TEMP_SENSOR_STAT,         0x000124),
-};
-
 static const u32 ocelot_qsys_regmap[] = {
 	REG(QSYS_PORT_MODE,                0x011200),
 	REG(QSYS_SWITCH_PORT_MODE,         0x011234),
@@ -303,7 +227,6 @@ static const u32 ocelot_sys_regmap[] = {
 static const u32 *ocelot_regmap[] = {
 	[ANA] = ocelot_ana_regmap,
 	[QS] = ocelot_qs_regmap,
-	[HSIO] = ocelot_hsio_regmap,
 	[QSYS] = ocelot_qsys_regmap,
 	[REW] = ocelot_rew_regmap,
 	[SYS] = ocelot_sys_regmap,
@@ -454,9 +377,11 @@ static void ocelot_pll5_init(struct ocelot *ocelot)
 	/* Configure PLL5. This will need a proper CCF driver
 	 * The values are coming from the VTSS API for Ocelot
 	 */
-	ocelot_write(ocelot, HSIO_PLL5G_CFG4_IB_CTRL(0x7600) |
-		     HSIO_PLL5G_CFG4_IB_BIAS_CTRL(0x8), HSIO_PLL5G_CFG4);
-	ocelot_write(ocelot, HSIO_PLL5G_CFG0_CORE_CLK_DIV(0x11) |
+	regmap_write(ocelot->targets[HSIO], HSIO_PLL5G_CFG4,
+		     HSIO_PLL5G_CFG4_IB_CTRL(0x7600) |
+		     HSIO_PLL5G_CFG4_IB_BIAS_CTRL(0x8));
+	regmap_write(ocelot->targets[HSIO], HSIO_PLL5G_CFG0,
+		     HSIO_PLL5G_CFG0_CORE_CLK_DIV(0x11) |
 		     HSIO_PLL5G_CFG0_CPU_CLK_DIV(2) |
 		     HSIO_PLL5G_CFG0_ENA_BIAS |
 		     HSIO_PLL5G_CFG0_ENA_VCO_BUF |
@@ -466,13 +391,14 @@ static void ocelot_pll5_init(struct ocelot *ocelot)
 		     HSIO_PLL5G_CFG0_SELBGV820(4) |
 		     HSIO_PLL5G_CFG0_DIV4 |
 		     HSIO_PLL5G_CFG0_ENA_CLKTREE |
-		     HSIO_PLL5G_CFG0_ENA_LANE, HSIO_PLL5G_CFG0);
-	ocelot_write(ocelot, HSIO_PLL5G_CFG2_EN_RESET_FRQ_DET |
+		     HSIO_PLL5G_CFG0_ENA_LANE);
+	regmap_write(ocelot->targets[HSIO], HSIO_PLL5G_CFG2,
+		     HSIO_PLL5G_CFG2_EN_RESET_FRQ_DET |
 		     HSIO_PLL5G_CFG2_EN_RESET_OVERRUN |
 		     HSIO_PLL5G_CFG2_GAIN_TEST(0x8) |
 		     HSIO_PLL5G_CFG2_ENA_AMPCTRL |
 		     HSIO_PLL5G_CFG2_PWD_AMPCTRL_N |
-		     HSIO_PLL5G_CFG2_AMPC_SEL(0x10), HSIO_PLL5G_CFG2);
+		     HSIO_PLL5G_CFG2_AMPC_SEL(0x10));
 }
 
 int ocelot_chip_init(struct ocelot *ocelot)
diff --git a/include/soc/mscc/ocelot_hsio.h b/include/soc/mscc/ocelot_hsio.h
index d93ddec3931b..43112dd7313a 100644
--- a/include/soc/mscc/ocelot_hsio.h
+++ b/include/soc/mscc/ocelot_hsio.h
@@ -8,6 +8,80 @@
 #ifndef _MSCC_OCELOT_HSIO_H_
 #define _MSCC_OCELOT_HSIO_H_
 
+#define HSIO_PLL5G_CFG0			0x0000
+#define HSIO_PLL5G_CFG1			0x0004
+#define HSIO_PLL5G_CFG2			0x0008
+#define HSIO_PLL5G_CFG3			0x000c
+#define HSIO_PLL5G_CFG4			0x0010
+#define HSIO_PLL5G_CFG5			0x0014
+#define HSIO_PLL5G_CFG6			0x0018
+#define HSIO_PLL5G_STATUS0		0x001c
+#define HSIO_PLL5G_STATUS1		0x0020
+#define HSIO_PLL5G_BIST_CFG0		0x0024
+#define HSIO_PLL5G_BIST_CFG1		0x0028
+#define HSIO_PLL5G_BIST_CFG2		0x002c
+#define HSIO_PLL5G_BIST_STAT0		0x0030
+#define HSIO_PLL5G_BIST_STAT1		0x0034
+#define HSIO_RCOMP_CFG0			0x0038
+#define HSIO_RCOMP_STATUS		0x003c
+#define HSIO_SYNC_ETH_CFG		0x0040
+#define HSIO_SYNC_ETH_PLL_CFG		0x0048
+#define HSIO_S1G_DES_CFG		0x004c
+#define HSIO_S1G_IB_CFG			0x0050
+#define HSIO_S1G_OB_CFG			0x0054
+#define HSIO_S1G_SER_CFG		0x0058
+#define HSIO_S1G_COMMON_CFG		0x005c
+#define HSIO_S1G_PLL_CFG		0x0060
+#define HSIO_S1G_PLL_STATUS		0x0064
+#define HSIO_S1G_DFT_CFG0		0x0068
+#define HSIO_S1G_DFT_CFG1		0x006c
+#define HSIO_S1G_DFT_CFG2		0x0070
+#define HSIO_S1G_TP_CFG			0x0074
+#define HSIO_S1G_RC_PLL_BIST_CFG	0x0078
+#define HSIO_S1G_MISC_CFG		0x007c
+#define HSIO_S1G_DFT_STATUS		0x0080
+#define HSIO_S1G_MISC_STATUS		0x0084
+#define HSIO_MCB_S1G_ADDR_CFG		0x0088
+#define HSIO_S6G_DIG_CFG		0x008c
+#define HSIO_S6G_DFT_CFG0		0x0090
+#define HSIO_S6G_DFT_CFG1		0x0094
+#define HSIO_S6G_DFT_CFG2		0x0098
+#define HSIO_S6G_TP_CFG0		0x009c
+#define HSIO_S6G_TP_CFG1		0x00a0
+#define HSIO_S6G_RC_PLL_BIST_CFG	0x00a4
+#define HSIO_S6G_MISC_CFG		0x00a8
+#define HSIO_S6G_OB_ANEG_CFG		0x00ac
+#define HSIO_S6G_DFT_STATUS		0x00b0
+#define HSIO_S6G_ERR_CNT		0x00b4
+#define HSIO_S6G_MISC_STATUS		0x00b8
+#define HSIO_S6G_DES_CFG		0x00bc
+#define HSIO_S6G_IB_CFG			0x00c0
+#define HSIO_S6G_IB_CFG1		0x00c4
+#define HSIO_S6G_IB_CFG2		0x00c8
+#define HSIO_S6G_IB_CFG3		0x00cc
+#define HSIO_S6G_IB_CFG4		0x00d0
+#define HSIO_S6G_IB_CFG5		0x00d4
+#define HSIO_S6G_OB_CFG			0x00d8
+#define HSIO_S6G_OB_CFG1		0x00dc
+#define HSIO_S6G_SER_CFG		0x00e0
+#define HSIO_S6G_COMMON_CFG		0x00e4
+#define HSIO_S6G_PLL_CFG		0x00e8
+#define HSIO_S6G_ACJTAG_CFG		0x00ec
+#define HSIO_S6G_GP_CFG			0x00f0
+#define HSIO_S6G_IB_STATUS0		0x00f4
+#define HSIO_S6G_IB_STATUS1		0x00f8
+#define HSIO_S6G_ACJTAG_STATUS		0x00fc
+#define HSIO_S6G_PLL_STATUS		0x0100
+#define HSIO_S6G_REVID			0x0104
+#define HSIO_MCB_S6G_ADDR_CFG		0x0108
+#define HSIO_HW_CFG			0x010c
+#define HSIO_HW_QSGMII_CFG		0x0110
+#define HSIO_HW_QSGMII_STAT		0x0114
+#define HSIO_CLK_CFG			0x0118
+#define HSIO_TEMP_SENSOR_CTRL		0x011c
+#define HSIO_TEMP_SENSOR_CFG		0x0120
+#define HSIO_TEMP_SENSOR_STAT		0x0124
+
 #define HSIO_PLL5G_CFG0_ENA_ROT                           BIT(31)
 #define HSIO_PLL5G_CFG0_ENA_LANE                          BIT(30)
 #define HSIO_PLL5G_CFG0_ENA_CLKTREE                       BIT(29)
-- 
2.17.1

^ permalink raw reply related

* Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jose Abreu @ 2018-09-03  9:36 UTC (permalink / raw)
  To: Jerome Brunet, Jose Abreu, netdev
  Cc: Martin Blumenstingl, David S. Miller, Joao Pinto,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <1710c5d8f447be2cf38c5c5e468186357b1388f2.camel@baylibre.com>

Hi Jerome,

On 03-09-2018 09:56, Jerome Brunet wrote:
> On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
>> [ As for now this is only for testing! ]
>>
>> This follows David Miller advice and tries to fix coalesce timer in
>> multi-queue scenarios.
>>
>> We are now using per-queue coalesce values and per-queue TX timer. This
>> assumes that tx_queues == rx_queues, which can not be necessarly true.
>> Official patch will need to have this fixed.
>>
>> Coalesce timer default values was changed to 1ms and the coalesce frames
>> to 25.
>>
>> Tested in B2B setup between XGMAC2 and GMAC5.
> Tested on Amlogic meson-axg-s400. No regression seen so far.
> (arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)
>
> As far as I understand from the device tree parsing, this platform (and all
> other amlogic platforms) use single queue.

Thanks for testing! I will send a formal patch once I get around
the problem of rx queues != tx queues.

>
> ---
>
> Jose,
>
> On another topic doing iperf3 test on amlogic's devices we seen a strange
> behavior.
>
> Doing Tx or Rx test usually works fine (700MBps to 900MBps depending on the
> platform). However, when doing both Rx and Tx at the same time, We see the Tx
> throughput dropping significantly (~30MBps) and lot of TCP retries.
>
> Would you any idea what might be our problem ? or how to start investigating
> this ?
>

I'm not able to reproduce this here but I'm using multiple queue.
I will try with single queue. In the meantime please try this
patch (it shall be applied directly on top of this RFT):


--->8
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ae26a6e8608e..1407975320aa 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2210,8 +2210,7 @@ static int stmmac_init_dma_engine(struct
stmmac_priv *priv)
                stmmac_init_tx_chan(priv, priv->ioaddr,
priv->plat->dma_cfg,
                                    tx_q->dma_tx_phy, chan);
 
-               tx_q->tx_tail_addr = tx_q->dma_tx_phy +
-                           (DMA_TX_SIZE * sizeof(struct dma_desc));
+               tx_q->tx_tail_addr = tx_q->dma_tx_phy;
                stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
                                       tx_q->tx_tail_addr, chan);
        }
@@ -3004,6 +3003,7 @@ static netdev_tx_t stmmac_tso_xmit(struct
sk_buff *skb, struct net_device *dev)
 
        netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
skb->len);
 
+       tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
sizeof(*desc));
        stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
tx_q->tx_tail_addr, queue);
 
        if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
@@ -3223,6 +3223,8 @@ static netdev_tx_t stmmac_xmit(struct
sk_buff *skb, struct net_device *dev)
        netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
skb->len);
 
        stmmac_enable_dma_transmission(priv, priv->ioaddr);
+
+       tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
sizeof(*desc));
        stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
tx_q->tx_tail_addr, queue);
 
        if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
--->8

Thanks and Best Regards,
Jose Miguel Abreu

^ permalink raw reply related

* Re: phys_port_id in switchdev mode?
From: Or Gerlitz @ 2018-09-03  9:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Florian Fainelli, Simon Horman, Andy Gospodarek,
	mchan@broadcom.com, Jiri Pirko, Alexander Duyck, Frederick Botha,
	nick viljoen, Linux Netdev List
In-Reply-To: <20180828200539.1c0fe607@cakuba.netronome.com>

On Tue, Aug 28, 2018 at 9:05 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> Hi!


Hi Jakub and sorry for the late reply, this crazigly hot summer refuses to die,

Note I replied couple of minutes ago but it didn't get to the list, so
lets take it from this one:

> I wonder if we can use phys_port_id in switchdev to group together
> interfaces of a single PCI PF?  Here is the problem:
>
> With a mix of PF and VF interfaces it gets increasingly difficult to
> figure out which one corresponds to which PF.  We can identify which
> *representor* is which, by means of phys_port_name and devlink
> flavours.  But if the actual VF/PF interfaces are also present on the
> same host, it gets confusing when one tries to identify the PF they
> came from.  Generally one has to resort of matching between PCI DBDF of
> the PF and VFs or read relevant info out of ethtool -i.
>
> In multi host scenario this is particularly painful, as there seems to
> be no immediately obvious way to match PCI interface ID of a card (0,
> 1, 2, 3, 4...) to the DBDF we have connected.
>
> Another angle to this is legacy SR-IOV NDOs.  User space picks a netdev
> from /sys/bus/pci/$VF_DBDF/physfn/net/ to run the NDOs on in somehow
> random manner, which means we have to provide those for all devices with
> link to the PF (all reprs).  And we have to link them (a) because it's
> right (tm) and (b) to get correct naming.


wait, as you commented in later, not only the mellanox vf reprs but rather also
the nfp vf reprs are not linked to the PF, because ip link output
grows quadratically.

> The only reliable way to make
> user space (libvirt) choose the repr it should run the NDOs on (which is
> IMHO the corresponding PF repr) is to set phys_port_id on actual VFs,
> VF reprs, PFs and PF reprs to a value corresponding to the *PCI PF*,
> not the external/Ethernet port when in switchdev mode.  User space
> should understand phys_port_id in this context, given it was originally
> introduced for matching VFs to ports.


Using phy_port_id to match/group VFs to PFs makes sense to me.

So what would be the libvirt use case you envision that needs
the VF and PF reprs to support that as well? or maybe you were
not referring to libvirt but to some other provisioning element? I need
to refresh my memory on that area.

^ permalink raw reply

* Re: phys_port_id in switchdev mode?
From: Or Gerlitz @ 2018-09-03  9:43 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Jakub Kicinski, Florian Fainelli, Simon Horman, Andy Gospodarek,
	mchan@broadcom.com, Jiri Pirko, Alexander Duyck, Frederick Botha,
	nick viljoen, netdev@vger.kernel.org
In-Reply-To: <20180831201321.GA4590@localhost.localdomain>

On Fri, Aug 31, 2018 at 11:13 PM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On Tue, Aug 28, 2018 at 08:43:51PM +0200, Jakub Kicinski wrote:
>> Ugh, CC: netdev..
>>
>> On Tue, 28 Aug 2018 20:05:39 +0200, Jakub Kicinski wrote:
>> > Hi!
>> >
>> > I wonder if we can use phys_port_id in switchdev to group together
>> > interfaces of a single PCI PF?  Here is the problem:
>
> On Mellanox cards, this is already possible via phys_switch_id, as
> each PF has its own phys_switch_id. So all VFs with a given
> phys_switch_id belong to the PF with that same phys_switch_id.

This is due to the fact that currently when getting into switchdev mode
the PF netdev becomes the uplink representor. This is problematic and we
are working to have an uplink repr as nfp and others have. Bottom line,
this is not the correct way to group PF with it's VFs, switch id is something
that relates to switch port reprs not the entities behind them.

^ permalink raw reply

* Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jerome Brunet @ 2018-09-03 10:16 UTC (permalink / raw)
  To: Jose Abreu, netdev
  Cc: Martin Blumenstingl, David S. Miller, Joao Pinto,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <a4442f61-dbe5-fae1-a128-81007750d0bc@synopsys.com>

On Mon, 2018-09-03 at 10:36 +0100, Jose Abreu wrote:
> Hi Jerome,
> 
> On 03-09-2018 09:56, Jerome Brunet wrote:
> > On Thu, 2018-08-30 at 11:37 +0100, Jose Abreu wrote:
> > > [ As for now this is only for testing! ]
> > > 
> > > This follows David Miller advice and tries to fix coalesce timer in
> > > multi-queue scenarios.
> > > 
> > > We are now using per-queue coalesce values and per-queue TX timer. This
> > > assumes that tx_queues == rx_queues, which can not be necessarly true.
> > > Official patch will need to have this fixed.
> > > 
> > > Coalesce timer default values was changed to 1ms and the coalesce frames
> > > to 25.
> > > 
> > > Tested in B2B setup between XGMAC2 and GMAC5.
> > 
> > Tested on Amlogic meson-axg-s400. No regression seen so far.
> > (arch/arm64/boot/dts/amlogic/meson-axg-s400.dts)
> > 
> > As far as I understand from the device tree parsing, this platform (and all
> > other amlogic platforms) use single queue.
> 
> Thanks for testing! I will send a formal patch once I get around
> the problem of rx queues != tx queues.
> 
> > 
> > ---
> > 
> > Jose,
> > 
> > On another topic doing iperf3 test on amlogic's devices we seen a strange
> > behavior.
> > 
> > Doing Tx or Rx test usually works fine (700MBps to 900MBps depending on the
> > platform). However, when doing both Rx and Tx at the same time, We see the Tx
> > throughput dropping significantly (~30MBps) and lot of TCP retries.
> > 
> > Would you any idea what might be our problem ? or how to start investigating
> > this ?
> > 
> 
> I'm not able to reproduce this here but I'm using multiple queue.
> I will try with single queue. In the meantime please try this
> patch (it shall be applied directly on top of this RFT):

No notable change. Rx is fine but Tx:
[  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes

I suppose the problem as something to do with the retries. When doing Tx test
alone, we don't have such a things a throughput where we expect it to be.

By the way, your mailer (and its auto 80 column rule I suppose) made the patch
below a bit harder to apply

> 
> 
> --->8
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index ae26a6e8608e..1407975320aa 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -2210,8 +2210,7 @@ static int stmmac_init_dma_engine(struct
> stmmac_priv *priv)
>                 stmmac_init_tx_chan(priv, priv->ioaddr,
> priv->plat->dma_cfg,
>                                     tx_q->dma_tx_phy, chan);
>  
> -               tx_q->tx_tail_addr = tx_q->dma_tx_phy +
> -                           (DMA_TX_SIZE * sizeof(struct dma_desc));
> +               tx_q->tx_tail_addr = tx_q->dma_tx_phy;
>                 stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
>                                        tx_q->tx_tail_addr, chan);
>         }
> @@ -3004,6 +3003,7 @@ static netdev_tx_t stmmac_tso_xmit(struct
> sk_buff *skb, struct net_device *dev)
>  
>         netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
> skb->len);
>  
> +       tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
> sizeof(*desc));
>         stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
> tx_q->tx_tail_addr, queue);
>  
>         if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
> @@ -3223,6 +3223,8 @@ static netdev_tx_t stmmac_xmit(struct
> sk_buff *skb, struct net_device *dev)
>         netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue),
> skb->len);
>  
>         stmmac_enable_dma_transmission(priv, priv->ioaddr);
> +
> +       tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx *
> sizeof(*desc));
>         stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
> tx_q->tx_tail_addr, queue);
>  
>         if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
> --->8
> 
> Thanks and Best Regards,
> Jose Miguel Abreu

^ permalink raw reply

* [PATCH net-next 04/11] net: hns3: Implement shutdown ops in hns3 pci driver
From: Salil Mehta @ 2018-09-03 10:21 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm, Yunsheng Lin
In-Reply-To: <20180903102156.18676-1-salil.mehta@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>

This patch implements shutdown ops in hns3 pci driver, which
unloads the hns3 driver and set the power state to D3hot.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 955c4ab..75e8ee9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -1662,11 +1662,24 @@ static int hns3_pci_sriov_configure(struct pci_dev *pdev, int num_vfs)
 	return 0;
 }
 
+static void hns3_shutdown(struct pci_dev *pdev)
+{
+	struct hnae3_ae_dev *ae_dev = pci_get_drvdata(pdev);
+
+	hnae3_unregister_ae_dev(ae_dev);
+	devm_kfree(&pdev->dev, ae_dev);
+	pci_set_drvdata(pdev, NULL);
+
+	if (system_state == SYSTEM_POWER_OFF)
+		pci_set_power_state(pdev, PCI_D3hot);
+}
+
 static struct pci_driver hns3_driver = {
 	.name     = hns3_driver_name,
 	.id_table = hns3_pci_tbl,
 	.probe    = hns3_probe,
 	.remove   = hns3_remove,
+	.shutdown = hns3_shutdown,
 	.sriov_configure = hns3_pci_sriov_configure,
 };
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 08/11] net: hns3: Only update mac configuation when necessary
From: Salil Mehta @ 2018-09-03 10:21 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm, Yunsheng Lin
In-Reply-To: <20180903102156.18676-1-salil.mehta@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>

Currently only fiber port checks if it is necessay to set the
mac through firmware when link is changed, this patch unify the
checking to allow the copper port do the checking too.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c    | 48 ++++++++++++++--------
 1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 004bfc1..25d4fd9 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2066,19 +2066,17 @@ static int hclge_init_msi(struct hclge_dev *hdev)
 	return 0;
 }
 
-static void hclge_check_speed_dup(struct hclge_dev *hdev, int duplex, int speed)
+static u8 hclge_check_speed_dup(u8 duplex, int speed)
 {
-	struct hclge_mac *mac = &hdev->hw.mac;
 
-	if ((speed == HCLGE_MAC_SPEED_10M) || (speed == HCLGE_MAC_SPEED_100M))
-		mac->duplex = (u8)duplex;
-	else
-		mac->duplex = HCLGE_MAC_FULL;
+	if (!(speed == HCLGE_MAC_SPEED_10M || speed == HCLGE_MAC_SPEED_100M))
+		duplex = HCLGE_MAC_FULL;
 
-	mac->speed = speed;
+	return duplex;
 }
 
-int hclge_cfg_mac_speed_dup(struct hclge_dev *hdev, int speed, u8 duplex)
+static int hclge_cfg_mac_speed_dup_hw(struct hclge_dev *hdev, int speed,
+				      u8 duplex)
 {
 	struct hclge_config_mac_speed_dup_cmd *req;
 	struct hclge_desc desc;
@@ -2138,7 +2136,23 @@ int hclge_cfg_mac_speed_dup(struct hclge_dev *hdev, int speed, u8 duplex)
 		return ret;
 	}
 
-	hclge_check_speed_dup(hdev, duplex, speed);
+	return 0;
+}
+
+int hclge_cfg_mac_speed_dup(struct hclge_dev *hdev, int speed, u8 duplex)
+{
+	int ret;
+
+	duplex = hclge_check_speed_dup(duplex, speed);
+	if (hdev->hw.mac.speed == speed && hdev->hw.mac.duplex == duplex)
+		return 0;
+
+	ret = hclge_cfg_mac_speed_dup_hw(hdev, speed, duplex);
+	if (ret)
+		return ret;
+
+	hdev->hw.mac.speed = speed;
+	hdev->hw.mac.duplex = duplex;
 
 	return 0;
 }
@@ -2259,7 +2273,9 @@ static int hclge_mac_init(struct hclge_dev *hdev)
 	int ret;
 	int i;
 
-	ret = hclge_cfg_mac_speed_dup(hdev, hdev->hw.mac.speed, HCLGE_MAC_FULL);
+	hdev->hw.mac.duplex = HCLGE_MAC_FULL;
+	ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.speed,
+					 hdev->hw.mac.duplex);
 	if (ret) {
 		dev_err(&hdev->pdev->dev,
 			"Config mac speed dup fail ret=%d\n", ret);
@@ -2415,13 +2431,11 @@ static int hclge_update_speed_duplex(struct hclge_dev *hdev)
 		return ret;
 	}
 
-	if ((mac.speed != speed) || (mac.duplex != duplex)) {
-		ret = hclge_cfg_mac_speed_dup(hdev, speed, duplex);
-		if (ret) {
-			dev_err(&hdev->pdev->dev,
-				"mac speed/duplex config failed %d\n", ret);
-			return ret;
-		}
+	ret = hclge_cfg_mac_speed_dup(hdev, speed, duplex);
+	if (ret) {
+		dev_err(&hdev->pdev->dev,
+			"mac speed/duplex config failed %d\n", ret);
+		return ret;
 	}
 
 	return 0;
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next 09/11] net: hns3: Change the dst mac addr of loopback packet
From: Salil Mehta @ 2018-09-03 10:21 UTC (permalink / raw)
  To: davem
  Cc: salil.mehta, yisen.zhuang, lipeng321, mehta.salil, netdev,
	linux-kernel, linuxarm, Yunsheng Lin
In-Reply-To: <20180903102156.18676-1-salil.mehta@huawei.com>

From: Yunsheng Lin <linyunsheng@huawei.com>

Currently, the dst mac addr of loopback packet is the same as
the host' mac addr, the SSU component may loop back the packet
to host before the packet reaches mac or serdes, which will defect
the purpose of mac or serdes selftest.

This patch changes it by adding 0x1f to the last byte of dst mac
addr.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index 7143e39..dfce76e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -137,6 +137,7 @@ static void hns3_lp_setup_skb(struct sk_buff *skb)
 	packet = skb_put(skb, HNS3_NIC_LB_TEST_PACKET_SIZE);
 
 	memcpy(ethh->h_dest, ndev->dev_addr, ETH_ALEN);
+	ethh->h_dest[5] += 0x1f;
 	eth_zero_addr(ethh->h_source);
 	ethh->h_proto = htons(ETH_P_ARP);
 	skb_reset_mac_header(skb);
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Kirill Tkhai @ 2018-09-03 14:53 UTC (permalink / raw)
  To: Jiri Benc
  Cc: Christian Brauner, netdev, linux-kernel, davem, kuznet, yoshfuji,
	pombredanne, kstewart, gregkh, dsahern, fw, lucien.xin,
	jakub.kicinski, nicolas.dichtel
In-Reply-To: <20180903155020.6112f773@redhat.com>

On 03.09.2018 16:50, Jiri Benc wrote:
> On Mon, 3 Sep 2018 16:41:45 +0300, Kirill Tkhai wrote:
>> But this is a synthetic test, while I asked about real workflow.
>> Is this real problem for lxd, and there is observed performance
>> decrease?
> 
> It's actually not as much a performance problem but rather the only way
> to get the data in some situations. Namely, when you have only netnsid.
> This happens e.g. when you want to query a veth peer in another netns.
> 
> setns() requires a file descriptor which you don't have. Nor there is
> a way to convert netnsid to a fd.
> 
> While developing the IFLA_IF_NETNSID patch, I was first thinking about
> implementing an API doing the conversion. The problem is there's no
> good place to put this into. It can't be done over netlink: netlink is
> unreliable and you can't have the kernel open a fd for you and lose it.
> There's no ioctl to use. So we'd be left with a procfs/sysfs or a
> syscall.
> 
> Using netnsid to refer to the target netns seems to be a nice solution -
> after all, netnsid is the identifier to use in netlink.

Ok, I have no objections, just mentioning that generic problem.

Thanks,
Kirill

^ permalink raw reply

* Re: [PATCH net-next v1 4/5] ipv6: enable IFA_IF_NETNSID for RTM_GETADDR
From: Kirill Tkhai @ 2018-09-03 14:56 UTC (permalink / raw)
  To: Christian Brauner, netdev, linux-kernel
  Cc: davem, kuznet, yoshfuji, pombredanne, kstewart, gregkh, dsahern,
	fw, lucien.xin, jakub.kicinski, jbenc, nicolas.dichtel
In-Reply-To: <20180903043717.20136-5-christian@brauner.io>

On 03.09.2018 07:37, Christian Brauner wrote:
> - Backwards Compatibility:
>   If userspace wants to determine whether ipv6 RTM_GETADDR requests support
>   the new IFA_IF_NETNSID property they should verify that the reply after
>   sending a request includes the IFA_IF_NETNSID property. If it does not
>   userspace should assume that IFA_IF_NETNSID is not supported for ipv6
>   RTM_GETADDR requests on this kernel.
> - From what I gather from current userspace tools that make use of
>   RTM_GETADDR requests some of them pass down struct ifinfomsg when they
>   should actually pass down struct ifaddrmsg. To not break existing
>   tools that pass down the wrong struct we will do the same as for
>   RTM_GETLINK | NLM_F_DUMP requests and not error out when the
>   nlmsg_parse() fails.
> 
> - Security:
>   Callers must have CAP_NET_ADMIN in the owning user namespace of the
>   target network namespace.
> 
> Signed-off-by: Christian Brauner <christian@brauner.io>
> ---
> v0->v1:
> - unchanged
> ---
>  net/ipv6/addrconf.c | 70 +++++++++++++++++++++++++++++++++++----------
>  1 file changed, 55 insertions(+), 15 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index d51a8c0b3372..00f1af374150 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -4491,6 +4491,7 @@ static const struct nla_policy ifa_ipv6_policy[IFA_MAX+1] = {
>  	[IFA_CACHEINFO]		= { .len = sizeof(struct ifa_cacheinfo) },
>  	[IFA_FLAGS]		= { .len = sizeof(u32) },
>  	[IFA_RT_PRIORITY]	= { .len = sizeof(u32) },
> +	[IFA_IF_NETNSID]	= { .type = NLA_S32 },
>  };
>  
>  static int
> @@ -4794,7 +4795,8 @@ static inline int inet6_ifaddr_msgsize(void)
>  }
>  
>  static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
> -			     u32 portid, u32 seq, int event, unsigned int flags)
> +			     u32 portid, u32 seq, int event, unsigned int flags,
> +			     int netnsid)
>  {
>  	struct nlmsghdr  *nlh;
>  	u32 preferred, valid;
> @@ -4806,6 +4808,9 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
>  	put_ifaddrmsg(nlh, ifa->prefix_len, ifa->flags, rt_scope(ifa->scope),
>  		      ifa->idev->dev->ifindex);
>  
> +	if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
> +		goto error;
> +
>  	if (!((ifa->flags&IFA_F_PERMANENT) &&
>  	      (ifa->prefered_lft == INFINITY_LIFE_TIME))) {
>  		preferred = ifa->prefered_lft;
> @@ -4855,7 +4860,8 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa,
>  }
>  
>  static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
> -				u32 portid, u32 seq, int event, u16 flags)
> +			       u32 portid, u32 seq, int event, u16 flags,
> +			       int netnsid)
>  {
>  	struct nlmsghdr  *nlh;
>  	u8 scope = RT_SCOPE_UNIVERSE;
> @@ -4868,6 +4874,9 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
>  	if (!nlh)
>  		return -EMSGSIZE;
>  
> +	if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
> +		return -EMSGSIZE;
> +
>  	put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
>  	if (nla_put_in6_addr(skb, IFA_MULTICAST, &ifmca->mca_addr) < 0 ||
>  	    put_cacheinfo(skb, ifmca->mca_cstamp, ifmca->mca_tstamp,
> @@ -4881,7 +4890,8 @@ static int inet6_fill_ifmcaddr(struct sk_buff *skb, struct ifmcaddr6 *ifmca,
>  }
>  
>  static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
> -				u32 portid, u32 seq, int event, unsigned int flags)
> +			       u32 portid, u32 seq, int event,
> +			       unsigned int flags, int netnsid)

Here we will have 7 arguments, while we have only 6 x86 registers. Can we do something
with this?

>  {
>  	struct net_device *dev = fib6_info_nh_dev(ifaca->aca_rt);
>  	int ifindex = dev ? dev->ifindex : 1;
> @@ -4895,6 +4905,9 @@ static int inet6_fill_ifacaddr(struct sk_buff *skb, struct ifacaddr6 *ifaca,
>  	if (!nlh)
>  		return -EMSGSIZE;
>  
> +	if (netnsid >= 0 && nla_put_s32(skb, IFA_IF_NETNSID, netnsid))
> +		return -EMSGSIZE;
> +
>  	put_ifaddrmsg(nlh, 128, IFA_F_PERMANENT, scope, ifindex);
>  	if (nla_put_in6_addr(skb, IFA_ANYCAST, &ifaca->aca_addr) < 0 ||
>  	    put_cacheinfo(skb, ifaca->aca_cstamp, ifaca->aca_tstamp,
> @@ -4916,7 +4929,7 @@ enum addr_type_t {
>  /* called with rcu_read_lock() */
>  static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
>  			  struct netlink_callback *cb, enum addr_type_t type,
> -			  int s_ip_idx, int *p_ip_idx)
> +			  int s_ip_idx, int *p_ip_idx, int netnsid)
>  {
>  	struct ifmcaddr6 *ifmca;
>  	struct ifacaddr6 *ifaca;
> @@ -4936,7 +4949,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
>  						NETLINK_CB(cb->skb).portid,
>  						cb->nlh->nlmsg_seq,
>  						RTM_NEWADDR,
> -						NLM_F_MULTI);
> +						NLM_F_MULTI, netnsid);
>  			if (err < 0)
>  				break;
>  			nl_dump_check_consistent(cb, nlmsg_hdr(skb));
> @@ -4953,7 +4966,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
>  						  NETLINK_CB(cb->skb).portid,
>  						  cb->nlh->nlmsg_seq,
>  						  RTM_GETMULTICAST,
> -						  NLM_F_MULTI);
> +						  NLM_F_MULTI, netnsid);
>  			if (err < 0)
>  				break;
>  		}
> @@ -4968,7 +4981,7 @@ static int in6_dump_addrs(struct inet6_dev *idev, struct sk_buff *skb,
>  						  NETLINK_CB(cb->skb).portid,
>  						  cb->nlh->nlmsg_seq,
>  						  RTM_GETANYCAST,
> -						  NLM_F_MULTI);
> +						  NLM_F_MULTI, netnsid);
>  			if (err < 0)
>  				break;
>  		}
> @@ -4985,6 +4998,9 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
>  			   enum addr_type_t type)
>  {
>  	struct net *net = sock_net(skb->sk);
> +	struct nlattr *tb[IFA_MAX+1];
> +	struct net *tgt_net = net;
> +	int netnsid = -1;

This function already has 10 local variables, while this patch adds 3 more.
Documentation/process/coding-style.rst says:

   Another measure of the function is the number of local variables.  They
   shouldn't exceed 5-10, or you're doing something wrong.  Re-think the
   function, and split it into smaller pieces.  A human brain can
   generally easily keep track of about 7 different things, anything more
   and it gets confused.

Can we do something with this?

>  	int h, s_h;
>  	int idx, ip_idx;
>  	int s_idx, s_ip_idx;
> @@ -4996,11 +5012,22 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
>  	s_idx = idx = cb->args[1];
>  	s_ip_idx = ip_idx = cb->args[2];
>  
> +	if (nlmsg_parse(cb->nlh, sizeof(struct ifaddrmsg), tb, IFA_MAX,
> +			ifa_ipv6_policy, NULL) >= 0) {
> +		if (tb[IFA_IF_NETNSID]) {
> +			netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
> +
> +			tgt_net = rtnl_get_net_ns_capable(skb->sk, netnsid);
> +			if (IS_ERR(tgt_net))
> +				return PTR_ERR(tgt_net);
> +		}
> +	}
> +
>  	rcu_read_lock();
> -	cb->seq = atomic_read(&net->ipv6.dev_addr_genid) ^ net->dev_base_seq;
> +	cb->seq = atomic_read(&tgt_net->ipv6.dev_addr_genid) ^ tgt_net->dev_base_seq;
>  	for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
>  		idx = 0;
> -		head = &net->dev_index_head[h];
> +		head = &tgt_net->dev_index_head[h];
>  		hlist_for_each_entry_rcu(dev, head, index_hlist) {
>  			if (idx < s_idx)
>  				goto cont;
> @@ -5012,7 +5039,7 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
>  				goto cont;
>  
>  			if (in6_dump_addrs(idev, skb, cb, type,
> -					   s_ip_idx, &ip_idx) < 0)
> +					   s_ip_idx, &ip_idx, netnsid) < 0)
>  				goto done;
>  cont:
>  			idx++;
> @@ -5023,6 +5050,8 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
>  	cb->args[0] = h;
>  	cb->args[1] = idx;
>  	cb->args[2] = ip_idx;
> +	if (netnsid >= 0)
> +		put_net(tgt_net);
>  
>  	return skb->len;
>  }
> @@ -5053,12 +5082,14 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  			     struct netlink_ext_ack *extack)
>  {
>  	struct net *net = sock_net(in_skb->sk);
> +	struct net *tgt_net = net;
>  	struct ifaddrmsg *ifm;
>  	struct nlattr *tb[IFA_MAX+1];
>  	struct in6_addr *addr = NULL, *peer;
>  	struct net_device *dev = NULL;
>  	struct inet6_ifaddr *ifa;
>  	struct sk_buff *skb;
> +	int netnsid = -1;
>  	int err;
>  
>  	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy,
> @@ -5066,15 +5097,24 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  	if (err < 0)
>  		return err;
>  
> +	if (tb[IFA_IF_NETNSID]) {
> +		netnsid = nla_get_s32(tb[IFA_IF_NETNSID]);
> +
> +		tgt_net = rtnl_get_net_ns_capable(NETLINK_CB(in_skb).sk,
> +						  netnsid);
> +		if (IS_ERR(tgt_net))
> +			return PTR_ERR(tgt_net);
> +	}
> +
>  	addr = extract_addr(tb[IFA_ADDRESS], tb[IFA_LOCAL], &peer);
>  	if (!addr)
>  		return -EINVAL;
>  
>  	ifm = nlmsg_data(nlh);
>  	if (ifm->ifa_index)
> -		dev = dev_get_by_index(net, ifm->ifa_index);
> +		dev = dev_get_by_index(tgt_net, ifm->ifa_index);
>  
> -	ifa = ipv6_get_ifaddr(net, addr, dev, 1);
> +	ifa = ipv6_get_ifaddr(tgt_net, addr, dev, 1);
>  	if (!ifa) {
>  		err = -EADDRNOTAVAIL;
>  		goto errout;
> @@ -5087,14 +5127,14 @@ static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>  	}
>  
>  	err = inet6_fill_ifaddr(skb, ifa, NETLINK_CB(in_skb).portid,
> -				nlh->nlmsg_seq, RTM_NEWADDR, 0);
> +				nlh->nlmsg_seq, RTM_NEWADDR, 0, netnsid);
>  	if (err < 0) {
>  		/* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */
>  		WARN_ON(err == -EMSGSIZE);
>  		kfree_skb(skb);
>  		goto errout_ifa;
>  	}
> -	err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
> +	err = rtnl_unicast(skb, tgt_net, NETLINK_CB(in_skb).portid);
>  errout_ifa:
>  	in6_ifa_put(ifa);
>  errout:
> @@ -5113,7 +5153,7 @@ static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa)
>  	if (!skb)
>  		goto errout;
>  
> -	err = inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0);
> +	err = inet6_fill_ifaddr(skb, ifa, 0, 0, event, 0, -1);
>  	if (err < 0) {
>  		/* -EMSGSIZE implies BUG in inet6_ifaddr_msgsize() */
>  		WARN_ON(err == -EMSGSIZE);

Thanks,
Kirill

^ permalink raw reply

* [PATCH net-next] cxgb4: when max_tx_rate is 0 disable tx rate limiting
From: Ganesh Goudar @ 2018-09-03 10:51 UTC (permalink / raw)
  To: netdev, davem; +Cc: nirranjan, indranil, dt, Ganesh Goudar, Casey Leedom

in ndo_set_vf_rate() when max_tx_rate is 0 disable tx
rate limiting for that vf.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 961e3087..2e1e286 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2749,6 +2749,27 @@ static int cxgb4_mgmt_set_vf_rate(struct net_device *dev, int vf,
 		return -EINVAL;
 	}
 
+	if (max_tx_rate == 0) {
+		/* unbind VF to to any Traffic Class */
+		fw_pfvf =
+		    (FW_PARAMS_MNEM_V(FW_PARAMS_MNEM_PFVF) |
+		     FW_PARAMS_PARAM_X_V(FW_PARAMS_PARAM_PFVF_SCHEDCLASS_ETH));
+		fw_class = 0xffffffff;
+		ret = t4_set_params(adap, adap->mbox, adap->pf, vf + 1, 1,
+				    &fw_pfvf, &fw_class);
+		if (ret) {
+			dev_err(adap->pdev_dev,
+				"Err %d in unbinding PF %d VF %d from TX Rate Limiting\n",
+				ret, adap->pf, vf);
+			return -EINVAL;
+		}
+		dev_info(adap->pdev_dev,
+			 "PF %d VF %d is unbound from TX Rate Limiting\n",
+			 adap->pf, vf);
+		adap->vfinfo[vf].tx_rate = 0;
+		return 0;
+	}
+
 	ret = t4_get_link_params(pi, &link_ok, &speed, &mtu);
 	if (ret != FW_SUCCESS) {
 		dev_err(adap->pdev_dev,
@@ -2798,8 +2819,8 @@ static int cxgb4_mgmt_set_vf_rate(struct net_device *dev, int vf,
 			    &fw_class);
 	if (ret) {
 		dev_err(adap->pdev_dev,
-			"Err %d in binding VF %d to Traffic Class %d\n",
-			ret, vf, class_id);
+			"Err %d in binding PF %d VF %d to Traffic Class %d\n",
+			ret, adap->pf, vf, class_id);
 		return -EINVAL;
 	}
 	dev_info(adap->pdev_dev, "PF %d VF %d is bound to Class %d\n",
-- 
2.1.0

^ permalink raw reply related

* [PATCH V2 ipsec-next 1/2] xfrm: reset transport header back to network header after all input transforms ahave been applied
From: Sowmini Varadhan @ 2018-09-03 11:36 UTC (permalink / raw)
  To: netdev, steffen.klassert; +Cc: davem, sowmini.varadhan
In-Reply-To: <cover.1535973419.git.sowmini.varadhan@oracle.com>

A policy may have been set up with multiple transforms (e.g., ESP
and ipcomp). In this situation, the ingress IPsec processing
iterates in xfrm_input() and applies each transform in turn,
processing the nexthdr to find any additional xfrm that may apply.

This patch resets the transport header back to network header
only after the last transformation so that subsequent xfrms
can find the correct transport header.

Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Suggested-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2: added "Fixes" tag

 net/ipv4/xfrm4_input.c          |    1 +
 net/ipv4/xfrm4_mode_transport.c |    4 +---
 net/ipv6/xfrm6_input.c          |    1 +
 net/ipv6/xfrm6_mode_transport.c |    4 +---
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index bcfc00e..f8de248 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -67,6 +67,7 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
 
 	if (xo && (xo->flags & XFRM_GRO)) {
 		skb_mac_header_rebuild(skb);
+		skb_reset_transport_header(skb);
 		return 0;
 	}
 
diff --git a/net/ipv4/xfrm4_mode_transport.c b/net/ipv4/xfrm4_mode_transport.c
index 3d36644..1ad2c2c 100644
--- a/net/ipv4/xfrm4_mode_transport.c
+++ b/net/ipv4/xfrm4_mode_transport.c
@@ -46,7 +46,6 @@ static int xfrm4_transport_output(struct xfrm_state *x, struct sk_buff *skb)
 static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 {
 	int ihl = skb->data - skb_transport_header(skb);
-	struct xfrm_offload *xo = xfrm_offload(skb);
 
 	if (skb->transport_header != skb->network_header) {
 		memmove(skb_transport_header(skb),
@@ -54,8 +53,7 @@ static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 		skb->network_header = skb->transport_header;
 	}
 	ip_hdr(skb)->tot_len = htons(skb->len + ihl);
-	if (!xo || !(xo->flags & XFRM_GRO))
-		skb_reset_transport_header(skb);
+	skb_reset_transport_header(skb);
 	return 0;
 }
 
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 841f4a0..9ef490d 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -59,6 +59,7 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 
 	if (xo && (xo->flags & XFRM_GRO)) {
 		skb_mac_header_rebuild(skb);
+		skb_reset_transport_header(skb);
 		return -1;
 	}
 
diff --git a/net/ipv6/xfrm6_mode_transport.c b/net/ipv6/xfrm6_mode_transport.c
index 9ad07a9..3c29da5 100644
--- a/net/ipv6/xfrm6_mode_transport.c
+++ b/net/ipv6/xfrm6_mode_transport.c
@@ -51,7 +51,6 @@ static int xfrm6_transport_output(struct xfrm_state *x, struct sk_buff *skb)
 static int xfrm6_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 {
 	int ihl = skb->data - skb_transport_header(skb);
-	struct xfrm_offload *xo = xfrm_offload(skb);
 
 	if (skb->transport_header != skb->network_header) {
 		memmove(skb_transport_header(skb),
@@ -60,8 +59,7 @@ static int xfrm6_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 	}
 	ipv6_hdr(skb)->payload_len = htons(skb->len + ihl -
 					   sizeof(struct ipv6hdr));
-	if (!xo || !(xo->flags & XFRM_GRO))
-		skb_reset_transport_header(skb);
+	skb_reset_transport_header(skb);
 	return 0;
 }
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH V2 ipsec-next 0/2] xfrm: bug fixes when processing multiple transforms
From: Sowmini Varadhan @ 2018-09-03 11:36 UTC (permalink / raw)
  To: netdev, steffen.klassert; +Cc: davem, sowmini.varadhan

This series contains bug fixes that were encountered when I set
up a libreswan tunnel using the config below, which will set up
an IPsec policy involving 2 tmpls.

    type=transport
    compress=yes
    esp=aes_gcm_c-128-null # offloaded to Niantic
    auto=start

The non-offload test case uses  esp=aes_gcm_c-256-null.

Each patch has a technical description of the contents of the fix.

V2: added Fixes tag so that it can be backported to the stable trees.

Sowmini Varadhan (2):
  xfrm: reset transport header back to network header after all input
    transforms ahave been applied
  xfrm: reset crypto_done when iterating over multiple input xfrms

 net/ipv4/xfrm4_input.c          |    1 +
 net/ipv4/xfrm4_mode_transport.c |    4 +---
 net/ipv6/xfrm6_input.c          |    1 +
 net/ipv6/xfrm6_mode_transport.c |    4 +---
 net/xfrm/xfrm_input.c           |    1 +
 5 files changed, 5 insertions(+), 6 deletions(-)

^ permalink raw reply

* [PATCH V2 ipsec-next 2/2] xfrm: reset crypto_done when iterating over multiple input xfrms
From: Sowmini Varadhan @ 2018-09-03 11:36 UTC (permalink / raw)
  To: netdev, steffen.klassert; +Cc: davem, sowmini.varadhan
In-Reply-To: <cover.1535973419.git.sowmini.varadhan@oracle.com>

We only support one offloaded xfrm (we do not have devices that
can handle more than one offload), so reset crypto_done in
xfrm_input() when iterating over multiple transforms in xfrm_input,
so that we can invoke the appropriate x->type->input for the
non-offloaded transforms

Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
v2: added "Fixes" tag

 net/xfrm/xfrm_input.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index b89c9c7..be3520e 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -458,6 +458,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
 			goto drop;
 		}
+		crypto_done = false;
 	} while (!err);
 
 	err = xfrm_rcv_cb(skb, family, x->type->proto, 0);
-- 
1.7.1

^ permalink raw reply related

* [PATCH] 9p: Rename req to rreq
From: Tomas Bortoli @ 2018-09-03 16:03 UTC (permalink / raw)
  To: asmadeus
  Cc: ericvh, rminnich, lucho, davem, v9fs-developer, netdev,
	linux-kernel, syzkaller, Tomas Bortoli

In struct p9_conn, rename req to rreq as it is used by the read routine.

Signed-off-by: Tomas Bortoli <tomasbortoli@gmail.com>
Suggested-by: Jun Piao <piaojun@huawei.com>
---
 net/9p/trans_fd.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 686e24e355d0..743c8bf571f6 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -131,7 +131,7 @@ struct p9_conn {
 	int err;
 	struct list_head req_list;
 	struct list_head unsent_req_list;
-	struct p9_req_t *req;
+	struct p9_req_t *rreq;
 	struct p9_req_t *wreq;
 	char tmp_buf[7];
 	struct p9_fcall rc;
@@ -323,7 +323,7 @@ static void p9_read_work(struct work_struct *work)
 	m->rc.offset += err;
 
 	/* header read in */
-	if ((!m->req) && (m->rc.offset == m->rc.capacity)) {
+	if ((!m->rreq) && (m->rc.offset == m->rc.capacity)) {
 		p9_debug(P9_DEBUG_TRANS, "got new header\n");
 
 		/* Header size */
@@ -347,23 +347,23 @@ static void p9_read_work(struct work_struct *work)
 			 "mux %p pkt: size: %d bytes tag: %d\n",
 			 m, m->rc.size, m->rc.tag);
 
-		m->req = p9_tag_lookup(m->client, m->rc.tag);
-		if (!m->req || (m->req->status != REQ_STATUS_SENT)) {
+		m->rreq = p9_tag_lookup(m->client, m->rc.tag);
+		if (!m->rreq || (m->rreq->status != REQ_STATUS_SENT)) {
 			p9_debug(P9_DEBUG_ERROR, "Unexpected packet tag %d\n",
 				 m->rc.tag);
 			err = -EIO;
 			goto error;
 		}
 
-		if (m->req->rc.sdata == NULL) {
+		if (m->rreq->rc.sdata == NULL) {
 			p9_debug(P9_DEBUG_ERROR,
 				 "No recv fcall for tag %d (req %p), disconnecting!\n",
-				 m->rc.tag, m->req);
-			m->req = NULL;
+				 m->rc.tag, m->rreq);
+			m->rreq = NULL;
 			err = -EIO;
 			goto error;
 		}
-		m->rc.sdata = m->req->rc.sdata;
+		m->rc.sdata = m->rreq->rc.sdata;
 		memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
 		m->rc.capacity = m->rc.size;
 	}
@@ -371,21 +371,21 @@ static void p9_read_work(struct work_struct *work)
 	/* packet is read in
 	 * not an else because some packets (like clunk) have no payload
 	 */
-	if ((m->req) && (m->rc.offset == m->rc.capacity)) {
+	if ((m->rreq) && (m->rc.offset == m->rc.capacity)) {
 		p9_debug(P9_DEBUG_TRANS, "got new packet\n");
-		m->req->rc.size = m->rc.offset;
+		m->rreq->rc.size = m->rc.offset;
 		spin_lock(&m->client->lock);
-		if (m->req->status != REQ_STATUS_ERROR)
+		if (m->rreq->status != REQ_STATUS_ERROR)
 			status = REQ_STATUS_RCVD;
-		list_del(&m->req->req_list);
+		list_del(&m->rreq->req_list);
 		/* update req->status while holding client->lock  */
-		p9_client_cb(m->client, m->req, status);
+		p9_client_cb(m->client, m->rreq, status);
 		spin_unlock(&m->client->lock);
 		m->rc.sdata = NULL;
 		m->rc.offset = 0;
 		m->rc.capacity = 0;
-		p9_req_put(m->req);
-		m->req = NULL;
+		p9_req_put(m->rreq);
+		m->rreq = NULL;
 	}
 
 end_clear:
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH RFC net-next 00/11] udp gso
From: Sowmini Varadhan @ 2018-09-03 11:45 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Willem de Bruijn, Paolo Abeni, Network Development,
	Willem de Bruijn, alexander.h.duyck
In-Reply-To: <20180903080248.GM23674@gauss3.secunet.de>

On (09/03/18 10:02), Steffen Klassert wrote:
> I'm working on patches that builds such skb lists. The list is chained
> at the frag_list pointer of the first skb, all subsequent skbs are linked
> to the next pointer of the skb. It looks like this:

there are some risks to using the frag_list pointer, Alex Duyck
had pointed this out to me in 
  https://www.mail-archive.com/netdev@vger.kernel.org/msg131081.html
(see last paragraph, or search for the string "gotcha" there)

I dont know the details of your playground patch, but might want to
watch out for those to make sure it is immune to these issues..

--Sowmini

^ permalink raw reply

* Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jose Abreu @ 2018-09-03 11:47 UTC (permalink / raw)
  To: Jerome Brunet, Jose Abreu, netdev
  Cc: Martin Blumenstingl, David S. Miller, Joao Pinto,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <634384f2f0e6cf9fa4f9120ceb3abb0c69f998dc.camel@baylibre.com>

On 03-09-2018 11:16, Jerome Brunet wrote:
> No notable change. Rx is fine but Tx:
> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>
> I suppose the problem as something to do with the retries. When doing Tx test
> alone, we don't have such a things a throughput where we expect it to be.

Yeah, I just remembered you are not using GMAC4 so it wouldn't
make a difference. Is your version 3.710? If so please try adding
the following compatible to your DT bindings "snps,dwmac-3.710".

>
> By the way, your mailer (and its auto 80 column rule I suppose) made the patch
> below a bit harder to apply

Sorry. Next time I will send as attachment.

Thanks and Best Regards,
Jose Miguel Abreu

^ permalink raw reply

* [PATCH net-next v2] cxgb4: collect hardware queue descriptors
From: Rahul Lakkireddy @ 2018-09-03 12:11 UTC (permalink / raw)
  To: netdev; +Cc: davem, ganeshgr, nirranjan, indranil

Collect descriptors of all ULD and LLD hardware queues managed
by LLD.

Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
v2:
- Move inline functions to header file.
- Add missing undefine for QDESC_GET_* macros.

 drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h |  42 ++++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h     |   3 +-
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c    | 238 ++++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h    | 106 ++++++++++
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h        |   7 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c  |   4 +
 6 files changed, 399 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
index 36d25883d123..b2d617abcf49 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_entity.h
@@ -315,6 +315,48 @@ struct cudbg_pbt_tables {
 	u32 pbt_data[CUDBG_PBT_DATA_ENTRIES];
 };
 
+enum cudbg_qdesc_qtype {
+	CUDBG_QTYPE_UNKNOWN = 0,
+	CUDBG_QTYPE_NIC_TXQ,
+	CUDBG_QTYPE_NIC_RXQ,
+	CUDBG_QTYPE_NIC_FLQ,
+	CUDBG_QTYPE_CTRLQ,
+	CUDBG_QTYPE_FWEVTQ,
+	CUDBG_QTYPE_INTRQ,
+	CUDBG_QTYPE_PTP_TXQ,
+	CUDBG_QTYPE_OFLD_TXQ,
+	CUDBG_QTYPE_RDMA_RXQ,
+	CUDBG_QTYPE_RDMA_FLQ,
+	CUDBG_QTYPE_RDMA_CIQ,
+	CUDBG_QTYPE_ISCSI_RXQ,
+	CUDBG_QTYPE_ISCSI_FLQ,
+	CUDBG_QTYPE_ISCSIT_RXQ,
+	CUDBG_QTYPE_ISCSIT_FLQ,
+	CUDBG_QTYPE_CRYPTO_TXQ,
+	CUDBG_QTYPE_CRYPTO_RXQ,
+	CUDBG_QTYPE_CRYPTO_FLQ,
+	CUDBG_QTYPE_TLS_RXQ,
+	CUDBG_QTYPE_TLS_FLQ,
+	CUDBG_QTYPE_MAX,
+};
+
+#define CUDBG_QDESC_REV 1
+
+struct cudbg_qdesc_entry {
+	u32 data_size;
+	u32 qtype;
+	u32 qid;
+	u32 desc_size;
+	u32 num_desc;
+	u8 data[0]; /* Must be last */
+};
+
+struct cudbg_qdesc_info {
+	u32 qdesc_entry_size;
+	u32 num_queues;
+	u8 data[0]; /* Must be last */
+};
+
 #define IREG_NUM_ELEM 4
 
 static const u32 t6_tp_pio_array[][IREG_NUM_ELEM] = {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
index 215fe6260fd7..dec63c15c0ba 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_if.h
@@ -81,7 +81,8 @@ enum cudbg_dbg_entity_type {
 	CUDBG_MBOX_LOG = 66,
 	CUDBG_HMA_INDIRECT = 67,
 	CUDBG_HMA = 68,
-	CUDBG_MAX_ENTITY = 70,
+	CUDBG_QDESC = 70,
+	CUDBG_MAX_ENTITY = 71,
 };
 
 struct cudbg_init {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
index d97e0d7e541a..7c49681407ad 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -19,6 +19,7 @@
 
 #include "t4_regs.h"
 #include "cxgb4.h"
+#include "cxgb4_cudbg.h"
 #include "cudbg_if.h"
 #include "cudbg_lib_common.h"
 #include "cudbg_entity.h"
@@ -2890,3 +2891,240 @@ int cudbg_collect_hma_indirect(struct cudbg_init *pdbg_init,
 	}
 	return cudbg_write_and_release_buff(pdbg_init, &temp_buff, dbg_buff);
 }
+
+void cudbg_fill_qdesc_num_and_size(const struct adapter *padap,
+				   u32 *num, u32 *size)
+{
+	u32 tot_entries = 0, tot_size = 0;
+
+	/* NIC TXQ, RXQ, FLQ, and CTRLQ */
+	tot_entries += MAX_ETH_QSETS * 3;
+	tot_entries += MAX_CTRL_QUEUES;
+
+	tot_size += MAX_ETH_QSETS * MAX_TXQ_ENTRIES * MAX_TXQ_DESC_SIZE;
+	tot_size += MAX_ETH_QSETS * MAX_RSPQ_ENTRIES * MAX_RXQ_DESC_SIZE;
+	tot_size += MAX_ETH_QSETS * MAX_RX_BUFFERS * MAX_FL_DESC_SIZE;
+	tot_size += MAX_CTRL_QUEUES * MAX_CTRL_TXQ_ENTRIES *
+		    MAX_CTRL_TXQ_DESC_SIZE;
+
+	/* FW_EVTQ and INTRQ */
+	tot_entries += INGQ_EXTRAS;
+	tot_size += INGQ_EXTRAS * MAX_RSPQ_ENTRIES * MAX_RXQ_DESC_SIZE;
+
+	/* PTP_TXQ */
+	tot_entries += 1;
+	tot_size += MAX_TXQ_ENTRIES * MAX_TXQ_DESC_SIZE;
+
+	/* ULD TXQ, RXQ, and FLQ */
+	tot_entries += CXGB4_TX_MAX * MAX_OFLD_QSETS;
+	tot_entries += CXGB4_ULD_MAX * MAX_ULD_QSETS * 2;
+
+	tot_size += CXGB4_TX_MAX * MAX_OFLD_QSETS * MAX_TXQ_ENTRIES *
+		    MAX_TXQ_DESC_SIZE;
+	tot_size += CXGB4_ULD_MAX * MAX_ULD_QSETS * MAX_RSPQ_ENTRIES *
+		    MAX_RXQ_DESC_SIZE;
+	tot_size += CXGB4_ULD_MAX * MAX_ULD_QSETS * MAX_RX_BUFFERS *
+		    MAX_FL_DESC_SIZE;
+
+	/* ULD CIQ */
+	tot_entries += CXGB4_ULD_MAX * MAX_ULD_QSETS;
+	tot_size += CXGB4_ULD_MAX * MAX_ULD_QSETS * SGE_MAX_IQ_SIZE *
+		    MAX_RXQ_DESC_SIZE;
+
+	tot_size += sizeof(struct cudbg_ver_hdr) +
+		    sizeof(struct cudbg_qdesc_info) +
+		    sizeof(struct cudbg_qdesc_entry) * tot_entries;
+
+	if (num)
+		*num = tot_entries;
+
+	if (size)
+		*size = tot_size;
+}
+
+int cudbg_collect_qdesc(struct cudbg_init *pdbg_init,
+			struct cudbg_buffer *dbg_buff,
+			struct cudbg_error *cudbg_err)
+{
+	u32 num_queues = 0, tot_entries = 0, size = 0;
+	struct adapter *padap = pdbg_init->adap;
+	struct cudbg_buffer temp_buff = { 0 };
+	struct cudbg_qdesc_entry *qdesc_entry;
+	struct cudbg_qdesc_info *qdesc_info;
+	struct cudbg_ver_hdr *ver_hdr;
+	struct sge *s = &padap->sge;
+	u32 i, j, cur_off, tot_len;
+	u8 *data;
+	int rc;
+
+	cudbg_fill_qdesc_num_and_size(padap, &tot_entries, &size);
+	size = min_t(u32, size, CUDBG_DUMP_BUFF_SIZE);
+	tot_len = size;
+	data = kvzalloc(size, GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	ver_hdr = (struct cudbg_ver_hdr *)data;
+	ver_hdr->signature = CUDBG_ENTITY_SIGNATURE;
+	ver_hdr->revision = CUDBG_QDESC_REV;
+	ver_hdr->size = sizeof(struct cudbg_qdesc_info);
+	size -= sizeof(*ver_hdr);
+
+	qdesc_info = (struct cudbg_qdesc_info *)(data +
+						 sizeof(*ver_hdr));
+	size -= sizeof(*qdesc_info);
+	qdesc_entry = (struct cudbg_qdesc_entry *)qdesc_info->data;
+
+#define QDESC_GET(q, desc, type, label) do { \
+	if (size <= 0) { \
+		goto label; \
+	} \
+	if (desc) { \
+		cudbg_fill_qdesc_##q(q, type, qdesc_entry); \
+		size -= sizeof(*qdesc_entry) + qdesc_entry->data_size; \
+		num_queues++; \
+		qdesc_entry = cudbg_next_qdesc(qdesc_entry); \
+	} \
+} while (0)
+
+#define QDESC_GET_TXQ(q, type, label) do { \
+	struct sge_txq *txq = (struct sge_txq *)q; \
+	QDESC_GET(txq, txq->desc, type, label); \
+} while (0)
+
+#define QDESC_GET_RXQ(q, type, label) do { \
+	struct sge_rspq *rxq = (struct sge_rspq *)q; \
+	QDESC_GET(rxq, rxq->desc, type, label); \
+} while (0)
+
+#define QDESC_GET_FLQ(q, type, label) do { \
+	struct sge_fl *flq = (struct sge_fl *)q; \
+	QDESC_GET(flq, flq->desc, type, label); \
+} while (0)
+
+	/* NIC TXQ */
+	for (i = 0; i < s->ethqsets; i++)
+		QDESC_GET_TXQ(&s->ethtxq[i].q, CUDBG_QTYPE_NIC_TXQ, out);
+
+	/* NIC RXQ */
+	for (i = 0; i < s->ethqsets; i++)
+		QDESC_GET_RXQ(&s->ethrxq[i].rspq, CUDBG_QTYPE_NIC_RXQ, out);
+
+	/* NIC FLQ */
+	for (i = 0; i < s->ethqsets; i++)
+		QDESC_GET_FLQ(&s->ethrxq[i].fl, CUDBG_QTYPE_NIC_FLQ, out);
+
+	/* NIC CTRLQ */
+	for (i = 0; i < padap->params.nports; i++)
+		QDESC_GET_TXQ(&s->ctrlq[i].q, CUDBG_QTYPE_CTRLQ, out);
+
+	/* FW_EVTQ */
+	QDESC_GET_RXQ(&s->fw_evtq, CUDBG_QTYPE_FWEVTQ, out);
+
+	/* INTRQ */
+	QDESC_GET_RXQ(&s->intrq, CUDBG_QTYPE_INTRQ, out);
+
+	/* PTP_TXQ */
+	QDESC_GET_TXQ(&s->ptptxq.q, CUDBG_QTYPE_PTP_TXQ, out);
+
+	/* ULD Queues */
+	mutex_lock(&uld_mutex);
+
+	if (s->uld_txq_info) {
+		struct sge_uld_txq_info *utxq;
+
+		/* ULD TXQ */
+		for (j = 0; j < CXGB4_TX_MAX; j++) {
+			if (!s->uld_txq_info[j])
+				continue;
+
+			utxq = s->uld_txq_info[j];
+			for (i = 0; i < utxq->ntxq; i++)
+				QDESC_GET_TXQ(&utxq->uldtxq[i].q,
+					      cudbg_uld_txq_to_qtype(j),
+					      out_unlock);
+		}
+	}
+
+	if (s->uld_rxq_info) {
+		struct sge_uld_rxq_info *urxq;
+		u32 base;
+
+		/* ULD RXQ */
+		for (j = 0; j < CXGB4_ULD_MAX; j++) {
+			if (!s->uld_rxq_info[j])
+				continue;
+
+			urxq = s->uld_rxq_info[j];
+			for (i = 0; i < urxq->nrxq; i++)
+				QDESC_GET_RXQ(&urxq->uldrxq[i].rspq,
+					      cudbg_uld_rxq_to_qtype(j),
+					      out_unlock);
+		}
+
+		/* ULD FLQ */
+		for (j = 0; j < CXGB4_ULD_MAX; j++) {
+			if (!s->uld_rxq_info[j])
+				continue;
+
+			urxq = s->uld_rxq_info[j];
+			for (i = 0; i < urxq->nrxq; i++)
+				QDESC_GET_FLQ(&urxq->uldrxq[i].fl,
+					      cudbg_uld_flq_to_qtype(j),
+					      out_unlock);
+		}
+
+		/* ULD CIQ */
+		for (j = 0; j < CXGB4_ULD_MAX; j++) {
+			if (!s->uld_rxq_info[j])
+				continue;
+
+			urxq = s->uld_rxq_info[j];
+			base = urxq->nrxq;
+			for (i = 0; i < urxq->nciq; i++)
+				QDESC_GET_RXQ(&urxq->uldrxq[base + i].rspq,
+					      cudbg_uld_ciq_to_qtype(j),
+					      out_unlock);
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&uld_mutex);
+
+out:
+	qdesc_info->qdesc_entry_size = sizeof(*qdesc_entry);
+	qdesc_info->num_queues = num_queues;
+	cur_off = 0;
+	while (tot_len) {
+		u32 chunk_size = min_t(u32, tot_len, CUDBG_CHUNK_SIZE);
+
+		rc = cudbg_get_buff(pdbg_init, dbg_buff, chunk_size,
+				    &temp_buff);
+		if (rc) {
+			cudbg_err->sys_warn = CUDBG_STATUS_PARTIAL_DATA;
+			goto out_free;
+		}
+
+		memcpy(temp_buff.data, data + cur_off, chunk_size);
+		tot_len -= chunk_size;
+		cur_off += chunk_size;
+		rc = cudbg_write_and_release_buff(pdbg_init, &temp_buff,
+						  dbg_buff);
+		if (rc) {
+			cudbg_put_buff(pdbg_init, &temp_buff);
+			cudbg_err->sys_warn = CUDBG_STATUS_PARTIAL_DATA;
+			goto out_free;
+		}
+	}
+
+out_free:
+	if (data)
+		kvfree(data);
+
+#undef QDESC_GET_FLQ
+#undef QDESC_GET_RXQ
+#undef QDESC_GET_TXQ
+#undef QDESC_GET
+
+	return rc;
+}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h
index eebefe7cd18e..f047a01a3e5b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.h
@@ -171,6 +171,9 @@ int cudbg_collect_hma_indirect(struct cudbg_init *pdbg_init,
 int cudbg_collect_hma_meminfo(struct cudbg_init *pdbg_init,
 			      struct cudbg_buffer *dbg_buff,
 			      struct cudbg_error *cudbg_err);
+int cudbg_collect_qdesc(struct cudbg_init *pdbg_init,
+			struct cudbg_buffer *dbg_buff,
+			struct cudbg_error *cudbg_err);
 
 struct cudbg_entity_hdr *cudbg_get_entity_hdr(void *outbuf, int i);
 void cudbg_align_debug_buffer(struct cudbg_buffer *dbg_buff,
@@ -182,4 +185,107 @@ int cudbg_fill_meminfo(struct adapter *padap,
 		       struct cudbg_meminfo *meminfo_buff);
 void cudbg_fill_le_tcam_info(struct adapter *padap,
 			     struct cudbg_tcam *tcam_region);
+void cudbg_fill_qdesc_num_and_size(const struct adapter *padap,
+				   u32 *num, u32 *size);
+
+static inline u32 cudbg_uld_txq_to_qtype(u32 uld)
+{
+	switch (uld) {
+	case CXGB4_TX_OFLD:
+		return CUDBG_QTYPE_OFLD_TXQ;
+	case CXGB4_TX_CRYPTO:
+		return CUDBG_QTYPE_CRYPTO_TXQ;
+	}
+
+	return CUDBG_QTYPE_UNKNOWN;
+}
+
+static inline u32 cudbg_uld_rxq_to_qtype(u32 uld)
+{
+	switch (uld) {
+	case CXGB4_ULD_RDMA:
+		return CUDBG_QTYPE_RDMA_RXQ;
+	case CXGB4_ULD_ISCSI:
+		return CUDBG_QTYPE_ISCSI_RXQ;
+	case CXGB4_ULD_ISCSIT:
+		return CUDBG_QTYPE_ISCSIT_RXQ;
+	case CXGB4_ULD_CRYPTO:
+		return CUDBG_QTYPE_CRYPTO_RXQ;
+	case CXGB4_ULD_TLS:
+		return CUDBG_QTYPE_TLS_RXQ;
+	}
+
+	return CUDBG_QTYPE_UNKNOWN;
+}
+
+static inline u32 cudbg_uld_flq_to_qtype(u32 uld)
+{
+	switch (uld) {
+	case CXGB4_ULD_RDMA:
+		return CUDBG_QTYPE_RDMA_FLQ;
+	case CXGB4_ULD_ISCSI:
+		return CUDBG_QTYPE_ISCSI_FLQ;
+	case CXGB4_ULD_ISCSIT:
+		return CUDBG_QTYPE_ISCSIT_FLQ;
+	case CXGB4_ULD_CRYPTO:
+		return CUDBG_QTYPE_CRYPTO_FLQ;
+	case CXGB4_ULD_TLS:
+		return CUDBG_QTYPE_TLS_FLQ;
+	}
+
+	return CUDBG_QTYPE_UNKNOWN;
+}
+
+static inline u32 cudbg_uld_ciq_to_qtype(u32 uld)
+{
+	switch (uld) {
+	case CXGB4_ULD_RDMA:
+		return CUDBG_QTYPE_RDMA_CIQ;
+	}
+
+	return CUDBG_QTYPE_UNKNOWN;
+}
+
+static inline void cudbg_fill_qdesc_txq(const struct sge_txq *txq,
+					enum cudbg_qdesc_qtype type,
+					struct cudbg_qdesc_entry *entry)
+{
+	entry->qtype = type;
+	entry->qid = txq->cntxt_id;
+	entry->desc_size = sizeof(struct tx_desc);
+	entry->num_desc = txq->size;
+	entry->data_size = txq->size * sizeof(struct tx_desc);
+	memcpy(entry->data, txq->desc, entry->data_size);
+}
+
+static inline void cudbg_fill_qdesc_rxq(const struct sge_rspq *rxq,
+					enum cudbg_qdesc_qtype type,
+					struct cudbg_qdesc_entry *entry)
+{
+	entry->qtype = type;
+	entry->qid = rxq->cntxt_id;
+	entry->desc_size = rxq->iqe_len;
+	entry->num_desc = rxq->size;
+	entry->data_size = rxq->size * rxq->iqe_len;
+	memcpy(entry->data, rxq->desc, entry->data_size);
+}
+
+static inline void cudbg_fill_qdesc_flq(const struct sge_fl *flq,
+					enum cudbg_qdesc_qtype type,
+					struct cudbg_qdesc_entry *entry)
+{
+	entry->qtype = type;
+	entry->qid = flq->cntxt_id;
+	entry->desc_size = sizeof(__be64);
+	entry->num_desc = flq->size;
+	entry->data_size = flq->size * sizeof(__be64);
+	memcpy(entry->data, flq->desc, entry->data_size);
+}
+
+static inline
+struct cudbg_qdesc_entry *cudbg_next_qdesc(struct cudbg_qdesc_entry *e)
+{
+	return (struct cudbg_qdesc_entry *)
+	       ((u8 *)e + sizeof(*e) + e->data_size);
+}
 #endif /* __CUDBG_LIB_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 76d16747f513..298701edb638 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -532,6 +532,13 @@ enum {
 	MIN_FL_ENTRIES       = 16
 };
 
+enum {
+	MAX_TXQ_DESC_SIZE      = 64,
+	MAX_RXQ_DESC_SIZE      = 128,
+	MAX_FL_DESC_SIZE       = 8,
+	MAX_CTRL_TXQ_DESC_SIZE = 64,
+};
+
 enum {
 	INGQ_EXTRAS = 2,        /* firmware event queue and */
 				/*   forwarded interrupts */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
index 5f01c0a7fd98..972f0a124714 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -30,6 +30,7 @@ static const struct cxgb4_collect_entity cxgb4_collect_mem_dump[] = {
 
 static const struct cxgb4_collect_entity cxgb4_collect_hw_dump[] = {
 	{ CUDBG_MBOX_LOG, cudbg_collect_mbox_log },
+	{ CUDBG_QDESC, cudbg_collect_qdesc },
 	{ CUDBG_DEV_LOG, cudbg_collect_fw_devlog },
 	{ CUDBG_REG_DUMP, cudbg_collect_reg_dump },
 	{ CUDBG_CIM_LA, cudbg_collect_cim_la },
@@ -311,6 +312,9 @@ static u32 cxgb4_get_entity_length(struct adapter *adap, u32 entity)
 		}
 		len = cudbg_mbytes_to_bytes(len);
 		break;
+	case CUDBG_QDESC:
+		cudbg_fill_qdesc_num_and_size(adap, NULL, &len);
+		break;
 	default:
 		break;
 	}
-- 
2.14.1

^ permalink raw reply related

* Re: [PATCH] net: wireless: ath: Convert to using %pOFn instead of device_node.name
From: Kalle Valo @ 2018-09-03 16:58 UTC (permalink / raw)
  To: Rob Herring; +Cc: linux-kernel, David S. Miller, linux-wireless, netdev
In-Reply-To: <20180828015252.28511-36-robh@kernel.org>

Rob Herring <robh@kernel.org> wrote:

> In preparation to remove the node name pointer from struct device_node,
> convert printf users to use the %pOFn format specifier.
> 
> Cc: Kalle Valo <kvalo@codeaurora.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: linux-wireless@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Rob Herring <robh@kernel.org>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

e12e643c1dfb ath6kl: convert to using %pOFn instead of device_node.name

-- 
https://patchwork.kernel.org/patch/10577811/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH 2/2] ath10k: allow ATH10K_SNOC with COMPILE_TEST
From: Kalle Valo @ 2018-09-03 16:59 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: netdev, linux-wireless, linux-kernel, ath10k, Niklas Cassel,
	David S. Miller
In-Reply-To: <20180612113907.15043-2-niklas.cassel@linaro.org>

Niklas Cassel <niklas.cassel@linaro.org> wrote:

> ATH10K_SNOC builds just fine with COMPILE_TEST, so make that possible.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@linaro.org>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

f1908735f141 ath10k: allow ATH10K_SNOC with COMPILE_TEST

-- 
https://patchwork.kernel.org/patch/10460103/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox