Netdev List

Netdev List
 help / color / mirror / Atom feed

* [patch net-next 2/4] mlxsw: spectrum_buffers: Limit size of pools
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

The shared buffer pools are containers whose size is used to calculate
the maximum usage for packets from / to a specific port / {port, PG/TC},
when dynamic threshold is employed.

While it's perfectly fine for the sum of the pools to exceed the maximum
size of the shared buffer, a single pool cannot.

Add a check when the pool size is set and forbid sizes larger than the
maximum size of the shared buffer.

Without the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
// No error is returned

With the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
devlink answers: Invalid argument

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
index bcaed8a..a746826 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
@@ -611,6 +611,9 @@ int mlxsw_sp_sb_pool_set(struct mlxsw_core *mlxsw_core,
 	u32 pool_size = MLXSW_SP_BYTES_TO_CELLS(size);
 	enum mlxsw_reg_sbpr_mode mode;
 
+	if (size > MLXSW_CORE_RES_GET(mlxsw_sp->core, MAX_BUFFER_SIZE))
+		return -EINVAL;
+
 	mode = (enum mlxsw_reg_sbpr_mode) threshold_type;
 	return mlxsw_sp_sb_pr_write(mlxsw_sp, pool, dir, mode, pool_size);
 }
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 1/4] mlxsw: resources: Add maximum buffer size
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/resources.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/resources.h b/drivers/net/ethernet/mellanox/mlxsw/resources.h
index 1c2119b..3c2171d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/resources.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/resources.h
@@ -47,6 +47,7 @@ enum mlxsw_res_id {
 	MLXSW_RES_ID_MAX_SYSTEM_PORT,
 	MLXSW_RES_ID_MAX_LAG,
 	MLXSW_RES_ID_MAX_LAG_MEMBERS,
+	MLXSW_RES_ID_MAX_BUFFER_SIZE,
 	MLXSW_RES_ID_MAX_CPU_POLICERS,
 	MLXSW_RES_ID_MAX_VRS,
 	MLXSW_RES_ID_MAX_RIFS,
@@ -70,6 +71,7 @@ static u16 mlxsw_res_ids[] = {
 	[MLXSW_RES_ID_MAX_SYSTEM_PORT] = 0x2502,
 	[MLXSW_RES_ID_MAX_LAG] = 0x2520,
 	[MLXSW_RES_ID_MAX_LAG_MEMBERS] = 0x2521,
+	[MLXSW_RES_ID_MAX_BUFFER_SIZE] = 0x2802,	/* Bytes */
 	[MLXSW_RES_ID_MAX_CPU_POLICERS] = 0x2A13,
 	[MLXSW_RES_ID_MAX_VRS] = 0x2C01,
 	[MLXSW_RES_ID_MAX_RIFS] = 0x2C02,
-- 
2.7.4

^ permalink raw reply related

* Re: stmmac ethernet in kernel 4.4: coalescing related pauses?
From: Lino Sanfilippo @ 2016-11-28 17:01 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, pavel, peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <20161128.113031.964579744326063048.davem@davemloft.net>

On 28.11.2016 17:30, David Miller wrote:
> From: Lino Sanfilippo <lsanfil@marvell.com>
> Date: Mon, 28 Nov 2016 16:57:35 +0100
>
>> I wonder if the best fix would be indeed to deactivate irq coalescing
>> completely.
>> Does it make any sense at all to use it if a driver uses NAPI already?
>
> It absolutely does make sense, when it is implemented and functions
> properly.
>

Interesting. I always thought both (NAPI and irq coalescing) are essentially doing the same thing only
one time in software and one time with hw support. Did I misunderstand NAPI?

Regards,
Lino

^ permalink raw reply

* Re: [PATCH net-next 1/5] net: mvneta: Use cacheable memory to store the rx buffer virtual address
From: Gregory CLEMENT @ 2016-11-28 17:00 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: David S. Miller, linux-kernel, netdev, Arnd Bergmann,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Thomas Petazzoni, linux-arm-kernel, Nadav Haklai, Marcin Wojtas,
	Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <20161128163548.70181560@xhacker>

Hi Jisheng,
 
 On lun., nov. 28 2016, Jisheng Zhang <jszhang@marvell.com> wrote:

> Hi Gregory,
>
> On Fri, 25 Nov 2016 16:30:14 +0100 Gregory CLEMENT wrote:
>
>> Until now the virtual address of the received buffer were stored in the
>> cookie field of the rx descriptor. However, this field is 32-bits only
>> which prevents to use the driver on a 64-bits architecture.
>> 
>> With this patch the virtual address is stored in an array not shared with
>> the hardware (no more need to use the DMA API). Thanks to this, it is
>> possible to use cache contrary to the access of the rx descriptor member.
>> 
>> The change is done in the swbm path only because the hwbm uses the cookie
>> field, this also means that currently the hwbm is not usable in 64-bits.
>> 
>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>> ---
>>  drivers/net/ethernet/marvell/mvneta.c | 96 ++++++++++++++++++++++++----
>>  1 file changed, 84 insertions(+), 12 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
>> index 87274d4ab102..b6849f88cab7 100644
>> --- a/drivers/net/ethernet/marvell/mvneta.c
>> +++ b/drivers/net/ethernet/marvell/mvneta.c
>> @@ -561,6 +561,9 @@ struct mvneta_rx_queue {
>>  	u32 pkts_coal;
>>  	u32 time_coal;
>>  
>> +	/* Virtual address of the RX buffer */
>> +	void  **buf_virt_addr;
>
> can we store buf_phys_addr in cacheable memory as well?

Even if we store in in cacheable memory we will still need to store it
in the buffer descriptor as it is used by the hardware.

>
>> +
>>  	/* Virtual address of the RX DMA descriptors array */
>>  	struct mvneta_rx_desc *descs;
>>  
>> @@ -1573,10 +1576,14 @@ static void mvneta_tx_done_pkts_coal_set(struct mvneta_port *pp,
>>  
>>  /* Handle rx descriptor fill by setting buf_cookie and buf_phys_addr */
>>  static void mvneta_rx_desc_fill(struct mvneta_rx_desc *rx_desc,
>> -				u32 phys_addr, u32 cookie)
>> +				u32 phys_addr, void *virt_addr,
>> +				struct mvneta_rx_queue *rxq)
>>  {
>> -	rx_desc->buf_cookie = cookie;
>> +	int i;
>> +
>>  	rx_desc->buf_phys_addr = phys_addr;
>> +	i = rx_desc - rxq->descs;
>> +	rxq->buf_virt_addr[i] = virt_addr;
>>  }
>>  
>>  /* Decrement sent descriptors counter */
>> @@ -1781,7 +1788,8 @@ EXPORT_SYMBOL_GPL(mvneta_frag_free);
>>  
>>  /* Refill processing for SW buffer management */
>>  static int mvneta_rx_refill(struct mvneta_port *pp,
>> -			    struct mvneta_rx_desc *rx_desc)
>> +			    struct mvneta_rx_desc *rx_desc,
>> +			    struct mvneta_rx_queue *rxq)
>>  
>>  {
>>  	dma_addr_t phys_addr;
>> @@ -1799,7 +1807,7 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
>>  		return -ENOMEM;
>>  	}
>>  
>> -	mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
>> +	mvneta_rx_desc_fill(rx_desc, phys_addr, data, rxq);
>>  	return 0;
>>  }
>>  
>> @@ -1861,7 +1869,12 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
>>  
>>  	for (i = 0; i < rxq->size; i++) {
>>  		struct mvneta_rx_desc *rx_desc = rxq->descs + i;
>> -		void *data = (void *)rx_desc->buf_cookie;
>> +		void *data;
>> +
>> +		if (!pp->bm_priv)
>> +			data = rxq->buf_virt_addr[i];
>> +		else
>> +			data = (void *)(uintptr_t)rx_desc->buf_cookie;
>>  
>>  		dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
>>  				 MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
>> @@ -1894,12 +1907,13 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
>>  		unsigned char *data;
>>  		dma_addr_t phys_addr;
>>  		u32 rx_status, frag_size;
>> -		int rx_bytes, err;
>> +		int rx_bytes, err, index;
>>  
>>  		rx_done++;
>>  		rx_status = rx_desc->status;
>>  		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
>> -		data = (unsigned char *)rx_desc->buf_cookie;
>> +		index = rx_desc - rxq->descs;
>> +		data = (unsigned char *)rxq->buf_virt_addr[index];
>>  		phys_addr = rx_desc->buf_phys_addr;
>>  
>>  		if (!mvneta_rxq_desc_is_first_last(rx_status) ||
>> @@ -1938,7 +1952,7 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
>>  		}
>>  
>>  		/* Refill processing */
>> -		err = mvneta_rx_refill(pp, rx_desc);
>> +		err = mvneta_rx_refill(pp, rx_desc, rxq);
>>  		if (err) {
>>  			netdev_err(dev, "Linux processing - Can't refill\n");
>>  			rxq->missed++;
>> @@ -2020,7 +2034,7 @@ static int mvneta_rx_hwbm(struct mvneta_port *pp, int rx_todo,
>>  		rx_done++;
>>  		rx_status = rx_desc->status;
>>  		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
>> -		data = (unsigned char *)rx_desc->buf_cookie;
>> +		data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
>>  		phys_addr = rx_desc->buf_phys_addr;
>>  		pool_id = MVNETA_RX_GET_BM_POOL_ID(rx_desc);
>>  		bm_pool = &pp->bm_priv->bm_pools[pool_id];
>> @@ -2708,6 +2722,57 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
>>  	return rx_done;
>>  }
>>  
>> +/* Refill processing for HW buffer management */
>> +static int mvneta_rx_hwbm_refill(struct mvneta_port *pp,
>> +				 struct mvneta_rx_desc *rx_desc)
>> +
>> +{
>> +	dma_addr_t phys_addr;
>> +	void *data;
>> +
>> +	data = mvneta_frag_alloc(pp->frag_size);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	phys_addr = dma_map_single(pp->dev->dev.parent, data,
>> +				   MVNETA_RX_BUF_SIZE(pp->pkt_size),
>> +				   DMA_FROM_DEVICE);
>> +	if (unlikely(dma_mapping_error(pp->dev->dev.parent, phys_addr))) {
>> +		mvneta_frag_free(pp->frag_size, data);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	phys_addr += pp->rx_offset_correction;
>> +	rx_desc->buf_phys_addr = phys_addr;
>> +	rx_desc->buf_cookie = (uintptr_t)data;
>> +
>> +	return 0;
>> +}
>> +
>> +/* Handle rxq fill: allocates rxq skbs; called when initializing a port */
>> +static int mvneta_rxq_bm_fill(struct mvneta_port *pp,
>> +			      struct mvneta_rx_queue *rxq,
>> +			      int num)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < num; i++) {
>> +		memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
>> +		if (mvneta_rx_hwbm_refill(pp, rxq->descs + i) != 0) {
>> +			netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs  filled\n",
>> +				   __func__, rxq->id, i, num);
>> +			break;
>> +		}
>> +	}
>> +
>> +	/* Add this number of RX descriptors as non occupied (ready to
>> +	 * get packets)
>> +	 */
>> +	mvneta_rxq_non_occup_desc_add(pp, rxq, i);
>> +
>> +	return i;
>> +}
>> +
>>  /* Handle rxq fill: allocates rxq skbs; called when initializing a port */
>>  static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>>  			   int num)
>> @@ -2716,7 +2781,7 @@ static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>>  
>>  	for (i = 0; i < num; i++) {
>>  		memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
>> -		if (mvneta_rx_refill(pp, rxq->descs + i) != 0) {
>> +		if (mvneta_rx_refill(pp, rxq->descs + i, rxq) != 0) {
>>  			netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs  filled\n",
>>  				__func__, rxq->id, i, num);
>>  			break;
>> @@ -2784,14 +2849,21 @@ static int mvneta_rxq_init(struct mvneta_port *pp,
>>  		mvneta_rxq_buf_size_set(pp, rxq,
>>  					MVNETA_RX_BUF_SIZE(pp->pkt_size));
>>  		mvneta_rxq_bm_disable(pp, rxq);
>> +
>> +		rxq->buf_virt_addr = devm_kmalloc(pp->dev->dev.parent,
>> +						  rxq->size * sizeof(void *),
>> +						  GFP_KERNEL);
>
> I would suggest allocate this buffer during probe. Otherwise, there's
> memory leak if we either change the mtu or close then open the eth in
> a loop, e.g
>
> while true
> do
> 	ifconfig eth0 up
> 	ifconfig eth0 down
> done

Indeed, I will move it.

Thanks,

Gregory

>
> Thanks,
> Jisheng
>
>> +		if (!rxq->buf_virt_addr)
>> +			return -ENOMEM;
>> +
>> +		mvneta_rxq_fill(pp, rxq, rxq->size);
>>  	} else {
>>  		mvneta_rxq_bm_enable(pp, rxq);
>>  		mvneta_rxq_long_pool_set(pp, rxq);
>>  		mvneta_rxq_short_pool_set(pp, rxq);
>> +		mvneta_rxq_bm_fill(pp, rxq, rxq->size);
>>  	}
>>  
>> -	mvneta_rxq_fill(pp, rxq, rxq->size);
>> -
>>  	return 0;
>>  }
>>  
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply

* Re: [PATCH net-next 0/2] Fix support for the MV88E6097
From: David Miller @ 2016-11-28 16:59 UTC (permalink / raw)
  To: eichest; +Cc: andrew, vivien.didelot, netdev, stefan.eichenberger
In-Reply-To: <20161125084130.3210-1-stefan.eichenberger@netmodule.com>

From: Stefan Eichenberger <eichest@gmail.com>
Date: Fri, 25 Nov 2016 09:41:28 +0100

> This patchset fixes the following two issues for the MV88E6097:
> - Add missing definition of g1_irqs
> - Add missing comment

Series applied, thanks Stefan.

^ permalink raw reply

* Re: AF_VSOCK network namespace support
From: Jorgen S. Hansen @ 2016-11-28 15:24 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: netdev@vger.kernel.org, imbrenda@linux.vnet.ibm.com
In-Reply-To: <20161123145535.GA16465@stefanha-x1.localdomain>

Hi Stefan,

> On Nov 23, 2016, at 3:55 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> Hi Jorgen,
> There are two use cases where network namespace support in AF_VSOCK
> could be useful:
> 
> 1. Claudio Imbrenda pointed out that a machine cannot act as both host
>   and guest at the same time.  This is necessary for nested
>   virtualization.  Currently only one transport (the host side or the
>   guest side) can be registered at a time.

VMCI based AF_VSOCK relies on the VMCI driver for nested virtualization support. The VMCI driver is a combined host/guest driver with a routing component, that will either direct traffic to VMs managed by the host “personality” of the driver, or to the outer host. So any VMCI driver driver is able to function simultaneously as both a guest and a host driver - exactly to be able to support nested virtualization.

Since, for VMCI based vSocket, the host has a fixed CID (2), any traffic generated by an application inside a VM destined for CID 2 will be routed out of the VM (to the host - either a virtual or physical one). Any traffic for a CID > 2 will be directed towards VMs managed by the host personality of the VMCI driver.

Since VMCI predates nested virtualization, the solution above was partly a result of having to support existing configurations in a transparent way.

> 2. Users may wish to isolate the AF_VSOCK address namespace so that two
>   VMs have completely independent CID and ports (they could even use
>   the same CID and ports because they're in separate namespaces).  This
>   ensures that a host service visible to VM1 is not automatically
>   visible to VM2.

If the goal is to provide fine grained service access control, won’t this end up requiring a namespace per VM? For ESX, we have a mechanism to tag VMs that allows them to be granted access to a service offered through AF_VSOCK, but this is not part of the Linux hypervisor.

If the intent is to be able to support multi tenancy, then this sounds like a better fit. Also, in the multi tenancy case, isolating the other AFs is probably what you want as well.

> Network namespaces could solve both problems.
> 
> A drawback of namespaces is that existing configurations using network
> namespaces for IPv4/6 or other purposes break if AF_VSOCK gains network
> namespace support.  This is not a big problem for virtio-vsock if we
> implement namespace support soon since there are no existing users.
> 
> I wonder how other address families have solved this transition to
> network namespaces.  It's almost like we need fine-grained namespaces
> instead of a blanket network namespace that applies across all address
> families...
> 
> I'm playing around with the code now but wanted to get your thoughts in
> case you've already considered these problems.
> 
> Stefan

Thanks,
Jørgen

^ permalink raw reply

* Re: [PATCH net] sit: Set skb->protocol properly in ipip6_tunnel_xmit()
From: Eric Dumazet @ 2016-11-28 16:53 UTC (permalink / raw)
  To: David Miller, Alexander Duyck; +Cc: sfr, elicooper, netdev
In-Reply-To: <20161128.114703.154301432767947066.davem@davemloft.net>

On Mon, 2016-11-28 at 11:47 -0500, David Miller wrote:
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Sun, 27 Nov 2016 13:04:00 +1100
> 
> > [Just for Dave's information]
> > 
> > On Fri, 25 Nov 2016 13:50:17 +0800 Eli Cooper <elicooper@gmx.com> wrote:
> >>
> >> Similar to commit ae148b085876
> >> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"),
> >> sit tunnels also need to update skb->protocol; otherwise, TSO/GSO packets
> >> might not be properly segmented, which causes the packets being dropped.
> >> 
> >> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> >> Tested-by: Eli Cooper <elicooper@gmx.com>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Eli Cooper <elicooper@gmx.com>
> > 
> > I tested this patch and it does *not* solve my problem.
> 
> I'm torn on this patch, because it looked exactly like it would solve the
> kind of problem Stephen is running into.
> 
> Even though it doesn't fix his case, it seems correct to me.
> 
> I was wondering if it was also important to set the skb->protocol
> before the call to ip_tunnel_encap() but I couldn't find a dependency.
> 
> In any event I'd like to see some other people review this change
> before I apply it.
> 
> My only other guess for Stephen's problem is somehow the SKB headers
> aren't set up properly for what the GSO engine expects.

Well, mlx4 just works, and uses GSO engine just fine.

So my guess is this is a bug in Intel IGB driver.


Alexander, can you take a look ?

Features for eth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: enable multiqueue by default
From: Michael S. Tsirkin @ 2016-11-28 16:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: Neil Horman, Jeremy Eder, Hannes Frederic Sowa, netdev,
	linux-kernel, virtualization, Marko Myllynen, Maxime Coquelin
In-Reply-To: <1480048646-17536-1-git-send-email-jasowang@redhat.com>

On Fri, Nov 25, 2016 at 12:37:26PM +0800, Jason Wang wrote:
> We use single queue even if multiqueue is enabled and let admin to
> enable it through ethtool later. This is used to avoid possible
> regression (small packet TCP stream transmission). But looks like an
> overkill since:
> 
> - single queue user can disable multiqueue when launching qemu
> - brings extra troubles for the management since it needs extra admin
>   tool in guest to enable multiqueue
> - multiqueue performs much better than single queue in most of the
>   cases
> 
> So this patch enables multiqueue by default: if #queues is less than or
> equal to #vcpu, enable as much as queue pairs; if #queues is greater
> than #vcpu, enable #vcpu queue pairs.
> 
> Cc: Hannes Frederic Sowa <hannes@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Neil Horman <nhorman@redhat.com>
> Cc: Jeremy Eder <jeder@redhat.com>
> Cc: Marko Myllynen <myllynen@redhat.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

OK I stil htink we should handle cpu hotplug better
but this can be done separately.

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d4ac7a6..a21d93a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1886,8 +1886,11 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (vi->any_header_sg)
>  		dev->needed_headroom = vi->hdr_len;
>  
> -	/* Use single tx/rx queue pair as default */
> -	vi->curr_queue_pairs = 1;
> +	/* Enable multiqueue by default */
> +	if (num_online_cpus() >= max_queue_pairs)
> +		vi->curr_queue_pairs = max_queue_pairs;
> +	else
> +		vi->curr_queue_pairs = num_online_cpus();
>  	vi->max_queue_pairs = max_queue_pairs;
>  
>  	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
> @@ -1918,6 +1921,8 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		goto free_unregister_netdev;
>  	}
>  
> +	virtnet_set_affinity(vi);
> +
>  	/* Assume link up if device can't report link status,
>  	   otherwise get link status from config. */
>  	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
> -- 
> 2.7.4

^ permalink raw reply

* Re: [PATCH net] sit: Set skb->protocol properly in ipip6_tunnel_xmit()
From: David Miller @ 2016-11-28 16:47 UTC (permalink / raw)
  To: sfr; +Cc: elicooper, netdev
In-Reply-To: <20161127130400.4a69ff1b@canb.auug.org.au>

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Sun, 27 Nov 2016 13:04:00 +1100

> [Just for Dave's information]
> 
> On Fri, 25 Nov 2016 13:50:17 +0800 Eli Cooper <elicooper@gmx.com> wrote:
>>
>> Similar to commit ae148b085876
>> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"),
>> sit tunnels also need to update skb->protocol; otherwise, TSO/GSO packets
>> might not be properly segmented, which causes the packets being dropped.
>> 
>> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
>> Tested-by: Eli Cooper <elicooper@gmx.com>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Eli Cooper <elicooper@gmx.com>
> 
> I tested this patch and it does *not* solve my problem.

I'm torn on this patch, because it looked exactly like it would solve the
kind of problem Stephen is running into.

Even though it doesn't fix his case, it seems correct to me.

I was wondering if it was also important to set the skb->protocol
before the call to ip_tunnel_encap() but I couldn't find a dependency.

In any event I'd like to see some other people review this change
before I apply it.

My only other guess for Stephen's problem is somehow the SKB headers
aren't set up properly for what the GSO engine expects.

^ permalink raw reply

* Re: [PATCH] net: stmmac: enable tx queue 0 for gmac4 IPs synthesized with multiple TX queues
From: David Miller @ 2016-11-28 16:29 UTC (permalink / raw)
  To: niklas.cassel
  Cc: peppe.cavallaro, alexandre.torgue, niklass, netdev, linux-kernel
In-Reply-To: <1479998194-7113-1-git-send-email-niklass@axis.com>

From: Niklas Cassel <niklas.cassel@axis.com>
Date: Thu, 24 Nov 2016 15:36:33 +0100

> From: Niklas Cassel <niklas.cassel@axis.com>
> 
> The dwmac4 IP can synthesized with 1-8 number of tx queues.
> On an IP synthesized with DWC_EQOS_NUM_TXQ > 1, all txqueues are disabled
> by default. For these IPs, the bitfield TXQEN is R/W.
> 
> Always enable tx queue 0. The write will have no effect on IPs synthesized
> with DWC_EQOS_NUM_TXQ == 1.
> 
> The driver does still not utilize more than one tx queue in the IP.
> 
> Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>

Alexandre, we are still waiting for your implicit/explicit ACK on this
change.

Thank you.

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: enable multiqueue by default
From: John Fastabend @ 2016-11-28 16:38 UTC (permalink / raw)
  To: David Miller, mst
  Cc: nhorman, jeder, hannes, netdev, linux-kernel, virtualization,
	myllynen, maxime.coquelin
In-Reply-To: <20161128.112823.2263657283777032168.davem@davemloft.net>

On 16-11-28 08:28 AM, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Fri, 25 Nov 2016 06:43:08 +0200
> 
>> On Fri, Nov 25, 2016 at 12:37:26PM +0800, Jason Wang wrote:
>>> We use single queue even if multiqueue is enabled and let admin to
>>> enable it through ethtool later. This is used to avoid possible
>>> regression (small packet TCP stream transmission). But looks like an
>>> overkill since:
>>>
>>> - single queue user can disable multiqueue when launching qemu
>>> - brings extra troubles for the management since it needs extra admin
>>>   tool in guest to enable multiqueue
>>> - multiqueue performs much better than single queue in most of the
>>>   cases
>>>
>>> So this patch enables multiqueue by default: if #queues is less than or
>>> equal to #vcpu, enable as much as queue pairs; if #queues is greater
>>> than #vcpu, enable #vcpu queue pairs.
>>>
>>> Cc: Hannes Frederic Sowa <hannes@redhat.com>
>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>> Cc: Neil Horman <nhorman@redhat.com>
>>> Cc: Jeremy Eder <jeder@redhat.com>
>>> Cc: Marko Myllynen <myllynen@redhat.com>
>>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>
>> OK at some level but all uses of num_online_cpus()
>> like this are racy versus hotplug.
>> I know we already have this bug but shouldn't we fix it
>> before we add more?
> 
> This is more being used like a heuristic in this scenerio, and in
> fact I would say one would keep the code this way even once proper
> hotplug handlers are installed to adjust the queued dynamically if
> there is a desired (which is also not necessarily the case).
> 
> I really don't think this change should be held on up on this issue.
> So can we please make some forward progress here?
> 
> Thanks.
> 

Also it might be worth noting all the other multiqueue capable
ethernet devices I checked use some variation of this heuristic.
Typically related to what other features are enables RSS, DCB, etc.
So it at least should be familiar to folks who do hotplug cpus on
bare metal boxes.

.John

^ permalink raw reply

* Re: [PATCH 1/1] GSO: Reload iph after pskb_may_pull
From: Alexander Duyck @ 2016-11-28 16:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: David Miller, Eric Dumazet, Alexander Duyck, Andrey Konovalov,
	Linux Networking Development Mailing List
In-Reply-To: <20161128153658.GB4778@kernel.org>

On Mon, Nov 28, 2016 at 7:36 AM, Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
> As it may get stale and lead to use after free.
>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Alexander Duyck <aduyck@mirantis.com>
> Cc: Andrey Konovalov <andreyknvl@google.com>
> Fixes: cbc53e08a793 ("GSO: Add GSO type for fixed IPv4 ID")
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  net/ipv4/af_inet.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 5ddf5cda07f4..215143246e4b 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1233,7 +1233,7 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>                 fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID);
>
>                 /* fixed ID is invalid if DF bit is not set */
> -               if (fixedid && !(iph->frag_off & htons(IP_DF)))
> +               if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
>                         goto out;
>         }
>
> --
> 2.9.3
>

Looks good to me.

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

^ permalink raw reply

* Re: [PATCH] net/iucv: use explicit clean up labels in iucv_init()
From: Thomas Gleixner @ 2016-11-28 16:31 UTC (permalink / raw)
  To: David Miller; +Cc: bigeasy, ubraun, linux-kernel, rt, linux-s390, netdev
In-Reply-To: <20161128.112433.295947046706527102.davem@davemloft.net>

On Mon, 28 Nov 2016, David Miller wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Date: Thu, 24 Nov 2016 17:10:13 +0100
> 
> > Ursula suggested to use explicit labels for clean up in the error path
> > instead of one `out_free' label which handles multiple exits.
> > Since the previous patch got already applied, here is a follow up patch.
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> "Previous patch" doesn't tell readers anything specific enough to identify
> the change you are referring to.  This will be even more true years down
> the line if someone tries to read this commit and figure out what you
> are referring to.
> 
> We have a standard mechanism to refer to commits, via SHA1_ID and commit
> header line text, please use it.

I amended the commit message.

Thanks,

	tglx

^ permalink raw reply

* Re: [RFC PATCH 1/2] net: macb: Add MDIO driver for accessing multiple PHY devices
From: Andrew Lunn @ 2016-11-28 16:33 UTC (permalink / raw)
  To: Harini Katakam
  Cc: nicolas.ferre-AIFe0yeh4nAAvxtiuMwx3w,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	pawel.moll-5wv7dgnIgG8, mark.rutland-5wv7dgnIgG8,
	ijc+devicetree-KcIKpvwj1kUDXYZnReoRVg,
	galak-sgV2jX0FEOL9JmXXK+q4OQ,
	boris.brezillon-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	alexandre.belloni-wi1+55ScJUtKEb57/3fJTNBPR1lH4CV8,
	harinikatakamlinux-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, harinik-gjFFaj9aHVfQT0dZR+AlfA,
	michals-gjFFaj9aHVfQT0dZR+AlfA, Punnaiah Choudary Kalluri
In-Reply-To: <1480326554-6041-1-git-send-email-harinik-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>

On Mon, Nov 28, 2016 at 03:19:14PM +0530, Harini Katakam wrote:
> This patch is to add support for the hardware with multiple ethernet
> MAC controllers and a single MDIO bus connected to multiple PHY devices.
> MDIO lines are connected to any one of the ethernet MAC controllers and
> all the PHY devices will be accessed using the PHY maintenance interface
> in that MAC controller. This handling along with PHY functionality is
> moved to macb_mdio.c
> 
> Signed-off-by: Punnaiah Choudary Kalluri <punnaia-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Harini Katakam <harinik-gjFFaj9aHVfQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/net/ethernet/cadence/Makefile    |   2 +-
>  drivers/net/ethernet/cadence/macb.c      | 169 +++-----------------
>  drivers/net/ethernet/cadence/macb.h      |   2 +
>  drivers/net/ethernet/cadence/macb_mdio.c | 266 +++++++++++++++++++++++++++++++
>  4 files changed, 294 insertions(+), 145 deletions(-)
>  create mode 100644 drivers/net/ethernet/cadence/macb_mdio.c
> 
> diff --git a/drivers/net/ethernet/cadence/Makefile b/drivers/net/ethernet/cadence/Makefile
> index 91f79b1..75c3d84 100644
> --- a/drivers/net/ethernet/cadence/Makefile
> +++ b/drivers/net/ethernet/cadence/Makefile
> @@ -2,4 +2,4 @@
>  # Makefile for the Atmel network device drivers.
>  #
>  
> -obj-$(CONFIG_MACB) += macb.o
> +obj-$(CONFIG_MACB) += macb.o macb_mdio.o
> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> index 80ccfc4..ae2a797 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -232,45 +232,6 @@ static void macb_get_hwaddr(struct macb *bp)
>  	eth_hw_addr_random(bp->dev);
>  }
>  
> -static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
> -{
> -	struct macb *bp = bus->priv;
> -	int value;
> -
> -	macb_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF)
> -			      | MACB_BF(RW, MACB_MAN_READ)
> -			      | MACB_BF(PHYA, mii_id)
> -			      | MACB_BF(REGA, regnum)
> -			      | MACB_BF(CODE, MACB_MAN_CODE)));
> -
> -	/* wait for end of transfer */
> -	while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> -		cpu_relax();
> -
> -	value = MACB_BFEXT(DATA, macb_readl(bp, MAN));
> -
> -	return value;
> -}
> -
> -static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> -			   u16 value)
> -{
> -	struct macb *bp = bus->priv;
> -
> -	macb_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF)
> -			      | MACB_BF(RW, MACB_MAN_WRITE)
> -			      | MACB_BF(PHYA, mii_id)
> -			      | MACB_BF(REGA, regnum)
> -			      | MACB_BF(CODE, MACB_MAN_CODE)
> -			      | MACB_BF(DATA, value)));
> -
> -	/* wait for end of transfer */
> -	while (!MACB_BFEXT(IDLE, macb_readl(bp, NSR)))
> -		cpu_relax();
> -
> -	return 0;
> -}
> -
>  /**
>   * macb_set_tx_clk() - Set a clock to a new frequency
>   * @clk		Pointer to the clock to change
> @@ -385,27 +346,19 @@ static void macb_handle_link_change(struct net_device *dev)
>  static int macb_mii_probe(struct net_device *dev)
>  {
>  	struct macb *bp = netdev_priv(dev);
> -	struct macb_platform_data *pdata;
>  	struct phy_device *phydev;
> -	int phy_irq;
>  	int ret;
>  
> -	phydev = phy_find_first(bp->mii_bus);
> +	if (dev->phydev)
> +		return 0;
> +
> +	phydev = of_phy_find_device(bp->phy_node);
>  	if (!phydev) {
>  		netdev_err(dev, "no PHY found\n");
>  		return -ENXIO;
>  	}
> -
> -	pdata = dev_get_platdata(&bp->pdev->dev);
> -	if (pdata && gpio_is_valid(pdata->phy_irq_pin)) {
> -		ret = devm_gpio_request(&bp->pdev->dev, pdata->phy_irq_pin,
> -					"phy int");
> -		if (!ret) {
> -			phy_irq = gpio_to_irq(pdata->phy_irq_pin);
> -			phydev->irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
> -		}
> -	}
> -
> +	if (bp->phy_irq)
> +		phydev->irq = bp->phy_irq;
>  	/* attach the mac to the phy */
>  	ret = phy_connect_direct(dev, phydev, &macb_handle_link_change,
>  				 bp->phy_interface);
> @@ -429,80 +382,9 @@ static int macb_mii_probe(struct net_device *dev)
>  	bp->speed = 0;
>  	bp->duplex = -1;
>  
> -	return 0;
> -}
> -
> -static int macb_mii_init(struct macb *bp)
> -{
> -	struct macb_platform_data *pdata;
> -	struct device_node *np;
> -	int err = -ENXIO, i;
> -
> -	/* Enable management port */
> -	macb_writel(bp, NCR, MACB_BIT(MPE));
> -
> -	bp->mii_bus = mdiobus_alloc();
> -	if (!bp->mii_bus) {
> -		err = -ENOMEM;
> -		goto err_out;
> -	}
> -
> -	bp->mii_bus->name = "MACB_mii_bus";
> -	bp->mii_bus->read = &macb_mdio_read;
> -	bp->mii_bus->write = &macb_mdio_write;
> -	snprintf(bp->mii_bus->id, MII_BUS_ID_SIZE, "%s-%x",
> -		 bp->pdev->name, bp->pdev->id);
> -	bp->mii_bus->priv = bp;
> -	bp->mii_bus->parent = &bp->pdev->dev;
> -	pdata = dev_get_platdata(&bp->pdev->dev);
> -
> -	dev_set_drvdata(&bp->dev->dev, bp->mii_bus);
> -
> -	np = bp->pdev->dev.of_node;
> -	if (np) {
> -		/* try dt phy registration */
> -		err = of_mdiobus_register(bp->mii_bus, np);
> -
> -		/* fallback to standard phy registration if no phy were
> -		 * found during dt phy registration
> -		 */
> -		if (!err && !phy_find_first(bp->mii_bus)) {
> -			for (i = 0; i < PHY_MAX_ADDR; i++) {
> -				struct phy_device *phydev;
> -
> -				phydev = mdiobus_scan(bp->mii_bus, i);
> -				if (IS_ERR(phydev) &&
> -				    PTR_ERR(phydev) != -ENODEV) {
> -					err = PTR_ERR(phydev);
> -					break;
> -				}
> -			}
> -
> -			if (err)
> -				goto err_out_unregister_bus;
> -		}
> -	} else {
> -		if (pdata)
> -			bp->mii_bus->phy_mask = pdata->phy_mask;
> -
> -		err = mdiobus_register(bp->mii_bus);
> -	}
> -
> -	if (err)
> -		goto err_out_free_mdiobus;
> -
> -	err = macb_mii_probe(bp->dev);
> -	if (err)
> -		goto err_out_unregister_bus;
> +	phy_attached_info(phydev);
>  
>  	return 0;
> -
> -err_out_unregister_bus:
> -	mdiobus_unregister(bp->mii_bus);
> -err_out_free_mdiobus:
> -	mdiobus_free(bp->mii_bus);
> -err_out:
> -	return err;
>  }
>  
>  static void macb_update_stats(struct macb *bp)
> @@ -2060,7 +1942,8 @@ static int macb_open(struct net_device *dev)
>  	netif_carrier_off(dev);
>  
>  	/* if the phy is not yet register, retry later*/
> -	if (!dev->phydev)
> +	err = macb_mii_probe(dev);
> +	if (err)
>  		return -EAGAIN;
>  
>  	/* RX buffers initialization */
> @@ -3122,16 +3005,16 @@ static int macb_probe(struct platform_device *pdev)
>  	unsigned int queue_mask, num_queues;
>  	struct macb_platform_data *pdata;
>  	bool native_io;
> -	struct phy_device *phydev;
>  	struct net_device *dev;
>  	struct resource *regs;
>  	void __iomem *mem;
>  	const char *mac;
>  	struct macb *bp;
> +	int phy_irq;
>  	int err;
>  
>  	regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> -	mem = devm_ioremap_resource(&pdev->dev, regs);
> +	mem = devm_ioremap(&pdev->dev, regs->start, resource_size(regs));
>  	if (IS_ERR(mem))
>  		return PTR_ERR(mem);
>  
> @@ -3250,21 +3133,26 @@ static int macb_probe(struct platform_device *pdev)
>  	if (err)
>  		goto err_out_free_netdev;
>  
> -	err = macb_mii_init(bp);
> -	if (err)
> -		goto err_out_free_netdev;
> -
> -	phydev = dev->phydev;
> -
> -	netif_carrier_off(dev);
> -
>  	err = register_netdev(dev);
>  	if (err) {
>  		dev_err(&pdev->dev, "Cannot register net device, aborting.\n");
>  		goto err_out_unregister_mdio;
>  	}
>  
> -	phy_attached_info(phydev);
> +	bp->phy_node = of_parse_phandle(bp->pdev->dev.of_node,
> +					"phy-handle", 0);
> +
> +	pdata = dev_get_platdata(&bp->pdev->dev);
> +	if (pdata && gpio_is_valid(pdata->phy_irq_pin)) {
> +		err = devm_gpio_request(&bp->pdev->dev, pdata->phy_irq_pin,
> +					"phy int");
> +		if (!err) {
> +			phy_irq = gpio_to_irq(pdata->phy_irq_pin);
> +			bp->phy_irq = (phy_irq < 0) ? PHY_POLL : phy_irq;
> +		}
> +	}
> +
> +	netif_carrier_off(dev);
>  
>  	netdev_info(dev, "Cadence %s rev 0x%08x at 0x%08lx irq %d (%pM)\n",
>  		    macb_is_gem(bp) ? "GEM" : "MACB", macb_readl(bp, MID),
> @@ -3273,10 +3161,6 @@ static int macb_probe(struct platform_device *pdev)
>  	return 0;
>  
>  err_out_unregister_mdio:
> -	phy_disconnect(dev->phydev);
> -	mdiobus_unregister(bp->mii_bus);
> -	mdiobus_free(bp->mii_bus);
> -
>  	/* Shutdown the PHY if there is a GPIO reset */
>  	if (bp->reset_gpio)
>  		gpiod_set_value(bp->reset_gpio, 0);
> @@ -3304,9 +3188,6 @@ static int macb_remove(struct platform_device *pdev)
>  		bp = netdev_priv(dev);
>  		if (dev->phydev)
>  			phy_disconnect(dev->phydev);
> -		mdiobus_unregister(bp->mii_bus);
> -		dev->phydev = NULL;
> -		mdiobus_free(bp->mii_bus);
>  
>  		/* Shutdown the PHY if there is a GPIO reset */
>  		if (bp->reset_gpio)
> diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h
> index d67adad..15e5c0f 100644
> --- a/drivers/net/ethernet/cadence/macb.h
> +++ b/drivers/net/ethernet/cadence/macb.h
> @@ -874,6 +874,8 @@ struct macb {
>  	unsigned int		jumbo_max_len;
>  
>  	u32			wol;
> +	struct device_node *phy_node;
> +	int phy_irq;
>  };
>  
>  static inline bool macb_is_gem(struct macb *bp)
> diff --git a/drivers/net/ethernet/cadence/macb_mdio.c b/drivers/net/ethernet/cadence/macb_mdio.c
> new file mode 100644
> index 0000000..1277ca3
> --- /dev/null
> +++ b/drivers/net/ethernet/cadence/macb_mdio.c
> @@ -0,0 +1,266 @@
> +/*
> + * Cadence Macb mdio controller driver
> + *
> + * Copyright (C) 2014 - 2016 Xilinx, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it under
> + * the terms of the GNU General Public License version 2 as published by the
> + * Free Software Foundation; either version 2 of the License, or (at your
> + * option) any later version.
> + */
> +#include <linux/clk.h>
> +#include <linux/delay.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/netdevice.h>
> +#include <linux/of_address.h>
> +#include <linux/of_mdio.h>
> +#include <linux/io.h>
> +#include <linux/phy.h>
> +#include <linux/platform_device.h>
> +#include <linux/ptp_clock_kernel.h>
> +#include "macb.h"
> +
> +struct macb_mdio_data {
> +	void __iomem *regs;
> +
> +	struct clk *pclk;
> +	struct clk *hclk;
> +};
> +
> +#define macb_mdio_reg_writel(bp, offset, value)	\
> +	writel_relaxed(value, bp->regs + offset)
> +#define macb_mdio_writel(bp, reg, value)	\
> +	macb_mdio_reg_writel(bp, MACB_##reg, value)
> +
> +#define macb_mdio_reg_readl(bp, offset)	readl_relaxed(bp->regs + offset)
> +#define macb_mdio_readl(bp, reg)	macb_mdio_reg_readl(bp, MACB_##reg)
> +
> +#define MACB_MDIO_TIMEOUT	1000
> +
> +static int macb_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
> +{
> +	struct macb_mdio_data *bp = bus->priv;
> +	unsigned int timeout = MACB_MDIO_TIMEOUT;
> +	int value;
> +
> +	macb_mdio_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF) |
> +				   MACB_BF(RW, MACB_MAN_READ) |
> +				   MACB_BF(PHYA, mii_id) |
> +				   MACB_BF(REGA, regnum) |
> +				   MACB_BF(CODE, MACB_MAN_CODE)));
> +
> +	/* wait for end of transfer */
> +	while (!MACB_BFEXT(IDLE, macb_mdio_readl(bp, NSR)) && timeout) {
> +		cpu_relax();
> +		timeout--;
> +	}
> +
> +	if (!timeout)
> +		return -ETIMEDOUT;
> +
> +	value = MACB_BFEXT(DATA, macb_mdio_readl(bp, MAN));
> +
> +	return value;
> +}
> +
> +static int macb_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
> +			   u16 value)
> +{
> +	struct macb_mdio_data *bp = bus->priv;
> +	unsigned int timeout = MACB_MDIO_TIMEOUT;
> +
> +	macb_mdio_writel(bp, MAN, (MACB_BF(SOF, MACB_MAN_SOF) |
> +				   MACB_BF(RW, MACB_MAN_WRITE) |
> +				   MACB_BF(PHYA, mii_id) |
> +				   MACB_BF(REGA, regnum) |
> +				   MACB_BF(CODE, MACB_MAN_CODE) |
> +				   MACB_BF(DATA, value)));
> +
> +	/* wait for end of transfer */
> +	while (!MACB_BFEXT(IDLE, macb_mdio_readl(bp, NSR)) && timeout) {
> +		cpu_relax();
> +		timeout--;
> +	}
> +
> +	if (!timeout)
> +		return -ETIMEDOUT;
> +
> +	return 0;
> +}
> +
> +static u32 gem_mdc_clk_div(struct macb_mdio_data *bp)
> +{
> +	u32 config;
> +	unsigned long pclk_hz = clk_get_rate(bp->pclk);
> +
> +	if (pclk_hz <= 20000000)
> +		config = GEM_BF(CLK, GEM_CLK_DIV8);
> +	else if (pclk_hz <= 40000000)
> +		config = GEM_BF(CLK, GEM_CLK_DIV16);
> +	else if (pclk_hz <= 80000000)
> +		config = GEM_BF(CLK, GEM_CLK_DIV32);
> +	else if (pclk_hz <= 120000000)
> +		config = GEM_BF(CLK, GEM_CLK_DIV48);
> +	else if (pclk_hz <= 160000000)
> +		config = GEM_BF(CLK, GEM_CLK_DIV64);
> +	else
> +		config = GEM_BF(CLK, GEM_CLK_DIV96);
> +
> +	return config;
> +}
> +
> +static int macb_mdio_probe(struct platform_device *pdev)
> +{
> +	struct device_node *np = pdev->dev.of_node;
> +	struct mii_bus *bus;
> +	struct macb_mdio_data *bp;
> +	struct resource *res;
> +	int ret;
> +	u32 config, i;
> +
> +	bus = mdiobus_alloc_size(sizeof(*bp));
> +	if (!bus)
> +		return -ENOMEM;
> +
> +	bus->name = "macb_mii_bus";
> +	bus->read = &macb_mdio_read;
> +	bus->write = &macb_mdio_write;
> +	snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii", dev_name(&pdev->dev));
> +	bus->parent = &pdev->dev;
> +	bus->irq = devm_kzalloc(&pdev->dev, sizeof(int) * PHY_MAX_ADDR,
> +				GFP_KERNEL);

This looks wrong, or at least old. It used to be a pointer to an array,
but it is now an actual array.

> +static const struct of_device_id macb_mdio_dt_ids[] = {
> +	{ .compatible = "cdns,macb-mdio" },
> +
> +};


I've not looked hard enough to know, but can you keep backwards
compatibility? Won't old device tree's assume the mdio bus is always
present? Now you need an explicit node otherwise there will not be an
mdio bus?

     Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.4: coalescing related pauses?
From: David Miller @ 2016-11-28 16:30 UTC (permalink / raw)
  To: lsanfil; +Cc: eric.dumazet, pavel, peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <920b1148-a77d-0b69-01b0-b14bcbdf8648@marvell.com>

From: Lino Sanfilippo <lsanfil@marvell.com>
Date: Mon, 28 Nov 2016 16:57:35 +0100

> I wonder if the best fix would be indeed to deactivate irq coalescing
> completely.
> Does it make any sense at all to use it if a driver uses NAPI already?

It absolutely does make sense, when it is implemented and functions
properly.

^ permalink raw reply

* Re: [PATCH net-next] virtio-net: enable multiqueue by default
From: David Miller @ 2016-11-28 16:28 UTC (permalink / raw)
  To: mst
  Cc: nhorman, jeder, hannes, netdev, linux-kernel, virtualization,
	myllynen, maxime.coquelin
In-Reply-To: <20161125064201-mutt-send-email-mst@kernel.org>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 25 Nov 2016 06:43:08 +0200

> On Fri, Nov 25, 2016 at 12:37:26PM +0800, Jason Wang wrote:
>> We use single queue even if multiqueue is enabled and let admin to
>> enable it through ethtool later. This is used to avoid possible
>> regression (small packet TCP stream transmission). But looks like an
>> overkill since:
>> 
>> - single queue user can disable multiqueue when launching qemu
>> - brings extra troubles for the management since it needs extra admin
>>   tool in guest to enable multiqueue
>> - multiqueue performs much better than single queue in most of the
>>   cases
>> 
>> So this patch enables multiqueue by default: if #queues is less than or
>> equal to #vcpu, enable as much as queue pairs; if #queues is greater
>> than #vcpu, enable #vcpu queue pairs.
>> 
>> Cc: Hannes Frederic Sowa <hannes@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Cc: Neil Horman <nhorman@redhat.com>
>> Cc: Jeremy Eder <jeder@redhat.com>
>> Cc: Marko Myllynen <myllynen@redhat.com>
>> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> OK at some level but all uses of num_online_cpus()
> like this are racy versus hotplug.
> I know we already have this bug but shouldn't we fix it
> before we add more?

This is more being used like a heuristic in this scenerio, and in
fact I would say one would keep the code this way even once proper
hotplug handlers are installed to adjust the queued dynamically if
there is a desired (which is also not necessarily the case).

I really don't think this change should be held on up on this issue.
So can we please make some forward progress here?

Thanks.

^ permalink raw reply

* Re: [PATCH] rtlwifi: Add updates for RTL8723BE and RTL8821AE
From: Kyle McMartin @ 2016-11-28 16:24 UTC (permalink / raw)
  To: Larry Finger
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-firmware-DgEjT+Ai2ygdnm+yROfE0A
In-Reply-To: <20161127192834.19907-1-Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>

On Sun, Nov 27, 2016 at 01:28:34PM -0600, Larry Finger wrote:
> The new versions will only work with new versions of the drivers. For
> that reason, they are given new names and the old versions are retained.
> 
> Signed-off-by: Larry Finger <Larry.Finger-tQ5ms3gMjBLk1uMJSBkQmQ@public.gmane.org>
> ---
>  WHENCE                     |   4 ++++

applied, thanks Larry.

regards, --kyle

^ permalink raw reply

* Re: [PATCH] net/iucv: use explicit clean up labels in iucv_init()
From: David Miller @ 2016-11-28 16:24 UTC (permalink / raw)
  To: bigeasy; +Cc: ubraun, tglx, linux-kernel, rt, linux-s390, netdev
In-Reply-To: <20161124161013.dukr42y2nwscosk6@linutronix.de>

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Thu, 24 Nov 2016 17:10:13 +0100

> Ursula suggested to use explicit labels for clean up in the error path
> instead of one `out_free' label which handles multiple exits.
> Since the previous patch got already applied, here is a follow up patch.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

"Previous patch" doesn't tell readers anything specific enough to identify
the change you are referring to.  This will be even more true years down
the line if someone tries to read this commit and figure out what you
are referring to.

We have a standard mechanism to refer to commits, via SHA1_ID and commit
header line text, please use it.

Thank you.

^ permalink raw reply

* Re: [RFC PATCH net-next] ipv6: implement consistent hashing for equal-cost multipath routing
From: David Miller @ 2016-11-28 16:22 UTC (permalink / raw)
  To: david.lebrun; +Cc: netdev
In-Reply-To: <1480017556-25988-1-git-send-email-david.lebrun@uclouvain.be>

From: David Lebrun <david.lebrun@uclouvain.be>
Date: Thu, 24 Nov 2016 20:59:16 +0100

> When multiple nexthops are available for a given route, the routing engine
> chooses a nexthop by computing the flow hash through get_hash_from_flowi6
> and by taking that value modulo the number of nexthops. The resulting value
> indexes the nexthop to select. This method causes issues when a new nexthop
> is added or one is removed (e.g. link failure). In that case, the number
> of nexthops changes and potentially all the flows get re-routed to another
> nexthop.
> 
> This patch implements a consistent hash method to select the nexthop in
> case of ECMP. The idea is to generate N random numbers (__u32) for each
> nexthop, where N is configurable. In order to select a nexthop, we find the
> number that is directly higher than the flow hash (which is also __u32).
> The nexthop associated to the number is then selected. The lookup method
> performs a binary search over the sorted array of numbers, which yields a
> time complexity of O(log n), where n is the number of nexthops times the
> number of random values generated for each number.
> 
> This feature can be enabled through the CONFIG_IPV6_MULTIPATH_CONSISTENT
> option and the number of random values generated for each nexthop is
> defined through CONFIG_IPV6_MPCONSIST_BUCKETSIZE.
> 
> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>

Thanks for trying to solve this problem.

But we really don't want this to be Kconfig gated.  If we decide to
support this it should be a run-time selectable option.  Every
distribution on the planet is going to turn your Kconfig option on, so
whatever you think we might be saving by putting this behind Kconfig
checks won't be realized for large swaths of the userbase.

I also question how necessary %100 consistent hashing of ECMP really
is.

If we can do something at extremely low cost and arrive at a very low
disruption rate, honestly that's probably good enough.

I assume you've looked over RFC2992.  A low disrutpion algorithm is
described there and if we could just get rid of the divide it might
be extremely desirable.

^ permalink raw reply

* Re: [PATCH 2/2] net: dsa: mv88e6xxx: Add 88E6176 device tree support
From: Andrew Lunn @ 2016-11-28 16:14 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Rob Herring, Frank Rowand, Andreas Färber,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Michal Hrusecki, Tomas Hlavacek, Bed??icha Ko??atu,
	Vivien Didelot, Florian Fainelli,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <2c59cc79-b6dc-9920-1725-a7785ff3b6bf-rXY34ruvC2xidJT2blvkqNi2O/JbrIOy@public.gmane.org>

On Mon, Nov 28, 2016 at 04:44:47PM +0100, Uwe Kleine-König wrote:
> On 11/28/2016 02:17 PM, Andrew Lunn wrote:
> >> I still wonder (and didn't get an answer back when I asked about this)
> >> why a comment is preferred here. For other devices I know it's usual and
> >> requested by the maintainers to use:
> >>
> >> 	compatible = "exact name", "earlyer device to match driver";
> >>
> >> . This is more robust, documents the situation more formally and makes
> >> it better greppable. The price to pay is only a few bytes in the dtb
> >> which IMO is ok.
> > 
> > We did discuss this a while back. The information is useless and
> > should to be ignored if present.
> 
> Who is "we"?

Anybody on netdev a while back, but mostly Vivien and myself.

> I'd say fail to probe is the right thing to do. Of course that doesn't
> work for already supported models because it will break compatibility.

And that is the first rule of device tree, never break backwards
compatibility.

> Also it seems wrong to write "marvell,mv88e6085" (only) if I know the
> hardware is really a "marvell,mv88e6176".

Why? Remember, the property name is called "compatible", not "is".

> > Linus has said he does not like ARM devices because of all the busses
> > which are not enumerable. Here we have a device which with a little
> > bit of help we can enumerate. So we should. 
> 
> If you write
> 
> 	compatible = "marvell,mv88e6176", "marvell,mv88e6085";
> 
> you can still enumerate in the same way as before.

Sure it would. But if the driver decides to ignore it, it is likely to
be wrong. Developers are used to comments being wrong. We are
suspicious of comments, we don't trust them 100%. But if somebody sees
a property in a device tree, they put a higher 'trustability' value on
it. But it actually has less trustable than a comment.
 
> There are several more instances where the device tree specifies
> something that could be probed instead. Some examples:
> 
> 	compatible = "ethernet-phy-id0141.0DD1", "ethernet-phy-ieee802.3-c22";

There are examples of phys which don't implemented register 2 and
3. In that case, you do need to specify the ID, if you want the
correct driver to load.

> 	compatible = "spansion,s25fl164k", "jedec,spi-nor";

And there was a push recently to add "jedec,spi-nor" everywhere and
deprecate a specific compatible because it is also not needed.

> 	compatible = "fsl,imx25-flexcan", "fsl,p1010-flexcan";
> 	compatible = "arm,pl011", "arm,primecell";

I don't know these hardwares, so cannot comment.

  Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] stmmac ethernet: remove cut & paste code
From: Joe Perches @ 2016-11-28 16:03 UTC (permalink / raw)
  To: Pavel Machek
  Cc: peppe.cavallaro, netdev, kernel list, ezequiel, sonic.zhang,
	fabrice.gasnier
In-Reply-To: <20161128143539.GB31224@amd>

On Mon, 2016-11-28 at 15:35 +0100, Pavel Machek wrote:
> On Mon 2016-11-28 06:24:28, Joe Perches wrote:
> > On Mon, 2016-11-28 at 12:50 +0100, Pavel Machek wrote:
> > > On Thu 2016-11-24 14:27:13, Joe Perches wrote:
> > > > On Thu, 2016-11-24 at 22:44 +0100, Pavel Machek wrote:
> > > > > On Thu 2016-11-24 12:05:25, Joe Perches wrote:
> > > > > > On Thu, 2016-11-24 at 12:05 +0100, Pavel Machek wrote:
> > > > > > > Remove duplicate code from _tx routines.
> > > > > > 
> > > > > > trivia:
> > > > > > 
> > > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > > > > > 
> > > > > > []
> > > > > > > @@ -1960,6 +1960,38 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, unsigned int des,
> > > > > > >  	}
> > > > > > >  }
> > > > > > >  
> > > > > > > +static void stmmac_xmit_common(struct sk_buff *skb, struct net_device *dev, int nfrags, struct dma_desc *desc)
> > > > > > > +{
> > > > > > > +	struct stmmac_priv *priv = netdev_priv(dev);
> > > > > > > +
> > > > > > > +	if (unlikely(stmmac_tx_avail(priv) <= (MAX_SKB_FRAGS + 1))) {
> > > > > > > +		if (netif_msg_hw(priv))
> > > > > > > +			pr_debug("%s: stop transmitted packets\n", __func__);
> > > > > > 
> > > > > > 		netif_dbg(priv, hw, dev, "%s: stop transmitted packets\n",
> > > > > > 			  __func__);
> > > > > 
> > > > > Not now. Modifying the code while de-duplicating would be bad idea.
> > > > 
> > > > Too many people think overly granular patches are the
> > > > best and only way to make changes.
> > > > Deduplication and consolidation can happen simultaneously.
> > > 
> > > Can, but should not at this point. Please take a look at the driver in
> > > question before commenting on trivial printk style.
> > 
> > I had.
> > 
> > It's perfectly acceptable and already uses netif_<level> properly.
> > 
> > This consolidation now introduces the _only_ instance where it is
> > now improperly using a netif_msg_<type> then single pr_<level>
> > function sequence that should be consolidated into netif_dbg.
> > Every other use of netif_msg_<level> then either emits multiple
> > lines or is used in an if/else.
> 
> Are you looking at right driver?

Yes and I think you should make changes against -next
and not Linus' where this is:

b3e51069627e2 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c (LABBE Corentin            2016-11-16 20:09:41 +0100  755)                              netif_warn(priv, link, priv->dev,
b3e51069627e2 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c (LABBE Corentin            2016-11-16 20:09:41 +0100  756)                                         "Speed (%d) not 10/100\n",
b3e51069627e2 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c (LABBE Corentin            2016-11-16 20:09:41 +0100  757)                                         phydev->speed);

>  I don't see single use of
> netif_msg_<level>, but see this at stmmac_main.c:756. Code is actually
> pretty consistent using pr_*.
> 
>                                 if (netif_msg_link(priv))
>                                         pr_warn("%s: Speed (%d) not 10/100\n",
>                                                 dev->name, phydev->speed);
> 
> Anyway, I'm moving code around, if you want to do trivial cleanups, do
> them yourself.

cheers, Joe

^ permalink raw reply

* Re: [[PATCH net-next RFC] 4/4] net: dsa: mv88e6xxx: Refactor CPU and DSA port setup
From: Vivien Didelot @ 2016-11-28 16:00 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Andrew Lunn
In-Reply-To: <1479944598-20372-5-git-send-email-andrew@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

>  static const struct mv88e6xxx_ops mv88e6391_ops = {
> diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> index a0d0f79a7de8..b846a33c024c 100644
> --- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> @@ -123,6 +123,10 @@
>  #define PORT_CONTROL_USE_TAG		BIT(4)
>  #define PORT_CONTROL_FORWARD_UNKNOWN_MC	BIT(3)
>  #define PORT_CONTROL_FORWARD_UNKNOWN	BIT(2)
> +#define PORT_CONTROL_NOT_EGREES_UNKNOWN_DA		(0x0 << 2)
> +#define PORT_CONTROL_NOT_EGREES_UNKNOWN_MULTICAST_DA	(0x1 << 2)
> +#define PORT_CONTROL_NOT_EGREES_UNKNOWN_UNITCAST_DA	(0x2 << 2)

s/EGREES/EGRESS/.

> +#define PORT_CONTROL_EGREES_ALL_UNKNOWN_DA		(0x3 << 2)
>  #define PORT_CONTROL_STATE_MASK		0x03
>  #define PORT_CONTROL_STATE_DISABLED	0x00
>  #define PORT_CONTROL_STATE_BLOCKING	0x01
> @@ -821,6 +825,8 @@ struct mv88e6xxx_ops {
>  				uint64_t *data);
>  	int (*tag_remap)(struct mv88e6xxx_chip *chip, int port);
>  	int (*monitor_ctrl)(struct mv88e6xxx_chip *chip, int upstream_port);
> +	int (*cpu_port_config)(struct mv88e6xxx_chip *chip, int port);
> +	int (*dsa_port_config)(struct mv88e6xxx_chip *chip, int port);
>  };

Hum, I dislike having config wrappers in the library. The functions
stored in the mv88e6xxx_ops structure should reflect the features
exposed by the SMI device (Port, Global X, PHY, etc.) registers.

They should not handle configuration logic, we should see them as an
MV88E6XXX API used to implement a chip (usually wrapped in chip.c).

>  
>  #define STATS_TYPE_PORT		BIT(0)
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index b7fab70f6cd7..a37d7d72df47 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -553,3 +553,78 @@ int mv88e6390_tag_remap(struct mv88e6xxx_chip *chip, int port)
>  
>  	return 0;
>  }
> +
> +int mv88e6095_cpu_port_config(struct mv88e6xxx_chip *chip, int port)
> +{
> +	u16 reg;
> +	int err;
> +
> +	err = mv88e6xxx_port_read(chip, port, PORT_CONTROL, &reg);
> +	if (err)
> +		return err;
> +
> +	reg |= PORT_CONTROL_DSA_TAG |
> +		PORT_CONTROL_EGRESS_ADD_TAG |
> +		PORT_CONTROL_FORWARD_UNKNOWN;
> +
> +	return mv88e6xxx_port_write(chip, port, PORT_CONTROL, reg);
> +}
> +
> +int mv88e6351_cpu_port_config(struct mv88e6xxx_chip *chip, int port)
> +{
> +	u16 reg;
> +	int err;
> +
> +	err = mv88e6xxx_port_read(chip, port, PORT_CONTROL, &reg);
> +	if (err)
> +		return err;
> +
> +	if (chip->info->tag_protocol == DSA_TAG_PROTO_EDSA) {
> +		reg |= PORT_CONTROL_FRAME_ETHER_TYPE_DSA |
> +			PORT_CONTROL_EGRESS_ADD_TAG;
> +
> +		/* Port Ethertype: use the Ethertype DSA Ethertype
> +		 * value.
> +		 */
> +		err = mv88e6xxx_port_write(chip, port, PORT_ETH_TYPE,
> +					   ETH_P_EDSA);
> +		if (err)
> +			return err;
> +	} else {
> +		reg |= PORT_CONTROL_FRAME_MODE_DSA;
> +	}
> +
> +	reg |= PORT_CONTROL_EGREES_ALL_UNKNOWN_DA;
> +
> +	return mv88e6xxx_port_write(chip, port, PORT_CONTROL, reg);
> +}

The device (Port, Global X, etc.) functions should reflect and implement
the feature described in the register(s) they read from/write to.

Please provide generic functions to configure a port's Frame mode and
Egress mode under the comment /* Offset 0x04: Port Control Register */

mv88e6xxx_port_set_{frame,egress}_mode() would work. The Frame mode bits
changed between chips, but the Egress mode bits are the same since 6065.

A frame mode can be "Normal Network mode", "DSA mode", "Provider mode",
or "Ether Type DSA mode". Maybe use an enum, or switching on the u8
frame_mode value might be enough, as you wish.

Please also provide a generic function to set the Port EType under an
ordered comment:

/* Offset 0x0F: Port E Type */

int mv88e6xxx_port_set_etype(struct mv88e6xxx_chip *chip, int port,
                             u16 etype)
{
    return mv88e6xxx_port_write(chip, port, PORT_ETH_TYPE, etype);
}

Then, some nice wrappers in chip.c can use them depending on the chip's
tag_protocol to configure what net/dsa calls cpu, dsa, or user port.

Thanks,

        Vivien

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.4: coalescing related pauses?
From: Lino Sanfilippo @ 2016-11-28 15:57 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: pavel, peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <1480347103.18162.58.camel@edumazet-glaptop3.roam.corp.google.com>

On 28.11.2016 16:31, Eric Dumazet wrote:
> On Mon, 2016-11-28 at 09:54 -0500, David Miller wrote:
>> From: Lino Sanfilippo <lsanfil@marvell.com>
>> Date: Mon, 28 Nov 2016 14:07:51 +0100
>>
>>> Calling skb_orphan() in the xmit handler made this issue disappear.
>>
>> This is not the way to handle this problem.
>>
>> The solution is to free the SKBs in a timely manner after the
>> chip has transmitted the frame.
>
> Note that the 'pauses' described by Pavel are also caused by a too small
> SO_SNDBUF value on the UDP socket.
>
> An immediate fix, with no kernel change is to increase it.
>
> echo 1000000 >/proc/sys/net/core/wmem_default
>
> or
>
> val = 1000000;
> setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val));
>

I wonder if the best fix would be indeed to deactivate irq coalescing completely.
Does it make any sense at all to use it if a driver uses NAPI already?

Regards,
Lino

^ permalink raw reply

* [PATCH net-next v4 2/4] dt-bindings: net: add EEE capability constants
From: Jerome Brunet @ 2016-11-28 15:50 UTC (permalink / raw)
  To: netdev, devicetree, Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Andrew Lunn,
	Neil Armstrong, linux-amlogic, linux-arm-kernel, linux-kernel,
	Julia Lawall, Yegor Yefremov, Andreas Färber
In-Reply-To: <1480348229-25672-1-git-send-email-jbrunet@baylibre.com>

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
Tested-by: Andreas Färber <afaerber@suse.de>
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
---
 include/dt-bindings/net/mdio.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
 create mode 100644 include/dt-bindings/net/mdio.h

diff --git a/include/dt-bindings/net/mdio.h b/include/dt-bindings/net/mdio.h
new file mode 100644
index 000000000000..99c6d903d439
--- /dev/null
+++ b/include/dt-bindings/net/mdio.h
@@ -0,0 +1,19 @@
+/*
+ * This header provides generic constants for ethernet MDIO bindings
+ */
+
+#ifndef _DT_BINDINGS_NET_MDIO_H
+#define _DT_BINDINGS_NET_MDIO_H
+
+/*
+ * EEE capability Advertisement
+ */
+
+#define MDIO_EEE_100TX		0x0002	/* 100TX EEE cap */
+#define MDIO_EEE_1000T		0x0004	/* 1000T EEE cap */
+#define MDIO_EEE_10GT		0x0008	/* 10GT EEE cap */
+#define MDIO_EEE_1000KX		0x0010	/* 1000KX EEE cap */
+#define MDIO_EEE_10GKX4		0x0020	/* 10G KX4 EEE cap */
+#define MDIO_EEE_10GKR		0x0040	/* 10G KR EEE cap */
+
+#endif
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next v4 0/4] Fix OdroidC2 Gigabit Tx link issue
From: Jerome Brunet @ 2016-11-28 15:50 UTC (permalink / raw)
  To: netdev, devicetree, Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Andrew Lunn,
	Neil Armstrong, linux-amlogic, linux-arm-kernel, linux-kernel,
	Julia Lawall, Yegor Yefremov, Andreas Färber

This patchset fixes an issue with the OdroidC2 board (DWMAC + RTL8211F).
The platform seems to enter LPI on the Rx path too often while performing
relatively high TX transfer. This eventually break the link (both Tx and
Rx), and require to bring the interface down and up again to get the Rx
path working again.

The root cause of this issue is not fully understood yet but disabling EEE
advertisement on the PHY prevent this feature to be negotiated.
With this change, the link is stable and reliable, with the expected
throughput performance.

The patchset adds options in the generic phy driver to disable EEE
advertisement, through device tree. The way it is done is very similar
to the handling of the max-speed property.

Patch 4 is provided here for testing purpose only. Please don't merge
patch 4, this change will go through the amlogic's tree.

Chnages since V3: [3]
 - Fix signess error reported by kbuild test robot (Thx Julia)

Changes since V2: [2]
 - Rename "eee-advert-disable" to "eee-broken-modes" to make the intended
   purpose of this option clear (flag broken configuration, not a
   configuration option)
 - Add DT bindings constants so the DT configuration is more user friendly
 - Submit to net-next instead of net.

Changes since V1: [1]
 - Disable the advertisement of EEE in the generic code instead of the
   realtek driver.

[1] : http://lkml.kernel.org/r/1479220154-25851-1-git-send-email-jbrunet@baylibre.com
[2] : http://lkml.kernel.org/r/1479742524-30222-1-git-send-email-jbrunet@baylibre.com
[3] : http://lkml.kernel.org/r/1480326409-25419-1-git-send-email-jbrunet@baylibre.com

Jerome Brunet (4):
  net: phy: add an option to disable EEE advertisement
  dt-bindings: net: add EEE capability constants
  dt: bindings: add ethernet phy eee-broken-modes option documentation
  ARM64: dts: meson: odroidc2: disable advertisement EEE for GbE.

 Documentation/devicetree/bindings/net/phy.txt      |  2 +
 .../arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 14 ++++
 drivers/net/phy/phy.c                              |  3 +
 drivers/net/phy/phy_device.c                       | 80 +++++++++++++++++++---
 include/dt-bindings/net/mdio.h                     | 19 +++++
 include/linux/phy.h                                |  3 +
 6 files changed, 112 insertions(+), 9 deletions(-)
 create mode 100644 include/dt-bindings/net/mdio.h

-- 
2.7.4

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox