Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 0/2] Fix support for the MV88E6097
From: David Miller @ 2016-11-28 16:59 UTC (permalink / raw)
  To: eichest; +Cc: andrew, vivien.didelot, netdev, stefan.eichenberger
In-Reply-To: <20161125084130.3210-1-stefan.eichenberger@netmodule.com>

From: Stefan Eichenberger <eichest@gmail.com>
Date: Fri, 25 Nov 2016 09:41:28 +0100

> This patchset fixes the following two issues for the MV88E6097:
> - Add missing definition of g1_irqs
> - Add missing comment

Series applied, thanks Stefan.

^ permalink raw reply

* Re: [PATCH net-next 1/5] net: mvneta: Use cacheable memory to store the rx buffer virtual address
From: Gregory CLEMENT @ 2016-11-28 17:00 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: David S. Miller, linux-kernel, netdev, Arnd Bergmann,
	Jason Cooper, Andrew Lunn, Sebastian Hesselbarth,
	Thomas Petazzoni, linux-arm-kernel, Nadav Haklai, Marcin Wojtas,
	Dmitri Epshtein, Yelena Krivosheev
In-Reply-To: <20161128163548.70181560@xhacker>

Hi Jisheng,
 
 On lun., nov. 28 2016, Jisheng Zhang <jszhang@marvell.com> wrote:

> Hi Gregory,
>
> On Fri, 25 Nov 2016 16:30:14 +0100 Gregory CLEMENT wrote:
>
>> Until now the virtual address of the received buffer were stored in the
>> cookie field of the rx descriptor. However, this field is 32-bits only
>> which prevents to use the driver on a 64-bits architecture.
>> 
>> With this patch the virtual address is stored in an array not shared with
>> the hardware (no more need to use the DMA API). Thanks to this, it is
>> possible to use cache contrary to the access of the rx descriptor member.
>> 
>> The change is done in the swbm path only because the hwbm uses the cookie
>> field, this also means that currently the hwbm is not usable in 64-bits.
>> 
>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>> ---
>>  drivers/net/ethernet/marvell/mvneta.c | 96 ++++++++++++++++++++++++----
>>  1 file changed, 84 insertions(+), 12 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
>> index 87274d4ab102..b6849f88cab7 100644
>> --- a/drivers/net/ethernet/marvell/mvneta.c
>> +++ b/drivers/net/ethernet/marvell/mvneta.c
>> @@ -561,6 +561,9 @@ struct mvneta_rx_queue {
>>  	u32 pkts_coal;
>>  	u32 time_coal;
>>  
>> +	/* Virtual address of the RX buffer */
>> +	void  **buf_virt_addr;
>
> can we store buf_phys_addr in cacheable memory as well?

Even if we store in in cacheable memory we will still need to store it
in the buffer descriptor as it is used by the hardware.

>
>> +
>>  	/* Virtual address of the RX DMA descriptors array */
>>  	struct mvneta_rx_desc *descs;
>>  
>> @@ -1573,10 +1576,14 @@ static void mvneta_tx_done_pkts_coal_set(struct mvneta_port *pp,
>>  
>>  /* Handle rx descriptor fill by setting buf_cookie and buf_phys_addr */
>>  static void mvneta_rx_desc_fill(struct mvneta_rx_desc *rx_desc,
>> -				u32 phys_addr, u32 cookie)
>> +				u32 phys_addr, void *virt_addr,
>> +				struct mvneta_rx_queue *rxq)
>>  {
>> -	rx_desc->buf_cookie = cookie;
>> +	int i;
>> +
>>  	rx_desc->buf_phys_addr = phys_addr;
>> +	i = rx_desc - rxq->descs;
>> +	rxq->buf_virt_addr[i] = virt_addr;
>>  }
>>  
>>  /* Decrement sent descriptors counter */
>> @@ -1781,7 +1788,8 @@ EXPORT_SYMBOL_GPL(mvneta_frag_free);
>>  
>>  /* Refill processing for SW buffer management */
>>  static int mvneta_rx_refill(struct mvneta_port *pp,
>> -			    struct mvneta_rx_desc *rx_desc)
>> +			    struct mvneta_rx_desc *rx_desc,
>> +			    struct mvneta_rx_queue *rxq)
>>  
>>  {
>>  	dma_addr_t phys_addr;
>> @@ -1799,7 +1807,7 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
>>  		return -ENOMEM;
>>  	}
>>  
>> -	mvneta_rx_desc_fill(rx_desc, phys_addr, (u32)data);
>> +	mvneta_rx_desc_fill(rx_desc, phys_addr, data, rxq);
>>  	return 0;
>>  }
>>  
>> @@ -1861,7 +1869,12 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
>>  
>>  	for (i = 0; i < rxq->size; i++) {
>>  		struct mvneta_rx_desc *rx_desc = rxq->descs + i;
>> -		void *data = (void *)rx_desc->buf_cookie;
>> +		void *data;
>> +
>> +		if (!pp->bm_priv)
>> +			data = rxq->buf_virt_addr[i];
>> +		else
>> +			data = (void *)(uintptr_t)rx_desc->buf_cookie;
>>  
>>  		dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
>>  				 MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
>> @@ -1894,12 +1907,13 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
>>  		unsigned char *data;
>>  		dma_addr_t phys_addr;
>>  		u32 rx_status, frag_size;
>> -		int rx_bytes, err;
>> +		int rx_bytes, err, index;
>>  
>>  		rx_done++;
>>  		rx_status = rx_desc->status;
>>  		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
>> -		data = (unsigned char *)rx_desc->buf_cookie;
>> +		index = rx_desc - rxq->descs;
>> +		data = (unsigned char *)rxq->buf_virt_addr[index];
>>  		phys_addr = rx_desc->buf_phys_addr;
>>  
>>  		if (!mvneta_rxq_desc_is_first_last(rx_status) ||
>> @@ -1938,7 +1952,7 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo,
>>  		}
>>  
>>  		/* Refill processing */
>> -		err = mvneta_rx_refill(pp, rx_desc);
>> +		err = mvneta_rx_refill(pp, rx_desc, rxq);
>>  		if (err) {
>>  			netdev_err(dev, "Linux processing - Can't refill\n");
>>  			rxq->missed++;
>> @@ -2020,7 +2034,7 @@ static int mvneta_rx_hwbm(struct mvneta_port *pp, int rx_todo,
>>  		rx_done++;
>>  		rx_status = rx_desc->status;
>>  		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
>> -		data = (unsigned char *)rx_desc->buf_cookie;
>> +		data = (u8 *)(uintptr_t)rx_desc->buf_cookie;
>>  		phys_addr = rx_desc->buf_phys_addr;
>>  		pool_id = MVNETA_RX_GET_BM_POOL_ID(rx_desc);
>>  		bm_pool = &pp->bm_priv->bm_pools[pool_id];
>> @@ -2708,6 +2722,57 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
>>  	return rx_done;
>>  }
>>  
>> +/* Refill processing for HW buffer management */
>> +static int mvneta_rx_hwbm_refill(struct mvneta_port *pp,
>> +				 struct mvneta_rx_desc *rx_desc)
>> +
>> +{
>> +	dma_addr_t phys_addr;
>> +	void *data;
>> +
>> +	data = mvneta_frag_alloc(pp->frag_size);
>> +	if (!data)
>> +		return -ENOMEM;
>> +
>> +	phys_addr = dma_map_single(pp->dev->dev.parent, data,
>> +				   MVNETA_RX_BUF_SIZE(pp->pkt_size),
>> +				   DMA_FROM_DEVICE);
>> +	if (unlikely(dma_mapping_error(pp->dev->dev.parent, phys_addr))) {
>> +		mvneta_frag_free(pp->frag_size, data);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	phys_addr += pp->rx_offset_correction;
>> +	rx_desc->buf_phys_addr = phys_addr;
>> +	rx_desc->buf_cookie = (uintptr_t)data;
>> +
>> +	return 0;
>> +}
>> +
>> +/* Handle rxq fill: allocates rxq skbs; called when initializing a port */
>> +static int mvneta_rxq_bm_fill(struct mvneta_port *pp,
>> +			      struct mvneta_rx_queue *rxq,
>> +			      int num)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < num; i++) {
>> +		memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
>> +		if (mvneta_rx_hwbm_refill(pp, rxq->descs + i) != 0) {
>> +			netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs  filled\n",
>> +				   __func__, rxq->id, i, num);
>> +			break;
>> +		}
>> +	}
>> +
>> +	/* Add this number of RX descriptors as non occupied (ready to
>> +	 * get packets)
>> +	 */
>> +	mvneta_rxq_non_occup_desc_add(pp, rxq, i);
>> +
>> +	return i;
>> +}
>> +
>>  /* Handle rxq fill: allocates rxq skbs; called when initializing a port */
>>  static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>>  			   int num)
>> @@ -2716,7 +2781,7 @@ static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>>  
>>  	for (i = 0; i < num; i++) {
>>  		memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
>> -		if (mvneta_rx_refill(pp, rxq->descs + i) != 0) {
>> +		if (mvneta_rx_refill(pp, rxq->descs + i, rxq) != 0) {
>>  			netdev_err(pp->dev, "%s:rxq %d, %d of %d buffs  filled\n",
>>  				__func__, rxq->id, i, num);
>>  			break;
>> @@ -2784,14 +2849,21 @@ static int mvneta_rxq_init(struct mvneta_port *pp,
>>  		mvneta_rxq_buf_size_set(pp, rxq,
>>  					MVNETA_RX_BUF_SIZE(pp->pkt_size));
>>  		mvneta_rxq_bm_disable(pp, rxq);
>> +
>> +		rxq->buf_virt_addr = devm_kmalloc(pp->dev->dev.parent,
>> +						  rxq->size * sizeof(void *),
>> +						  GFP_KERNEL);
>
> I would suggest allocate this buffer during probe. Otherwise, there's
> memory leak if we either change the mtu or close then open the eth in
> a loop, e.g
>
> while true
> do
> 	ifconfig eth0 up
> 	ifconfig eth0 down
> done

Indeed, I will move it.

Thanks,

Gregory

>
> Thanks,
> Jisheng
>
>> +		if (!rxq->buf_virt_addr)
>> +			return -ENOMEM;
>> +
>> +		mvneta_rxq_fill(pp, rxq, rxq->size);
>>  	} else {
>>  		mvneta_rxq_bm_enable(pp, rxq);
>>  		mvneta_rxq_long_pool_set(pp, rxq);
>>  		mvneta_rxq_short_pool_set(pp, rxq);
>> +		mvneta_rxq_bm_fill(pp, rxq, rxq->size);
>>  	}
>>  
>> -	mvneta_rxq_fill(pp, rxq, rxq->size);
>> -
>>  	return 0;
>>  }
>>  
>

-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

^ permalink raw reply

* Re: stmmac ethernet in kernel 4.4: coalescing related pauses?
From: Lino Sanfilippo @ 2016-11-28 17:01 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, pavel, peppe.cavallaro, netdev, linux-kernel
In-Reply-To: <20161128.113031.964579744326063048.davem@davemloft.net>



On 28.11.2016 17:30, David Miller wrote:
> From: Lino Sanfilippo <lsanfil@marvell.com>
> Date: Mon, 28 Nov 2016 16:57:35 +0100
>
>> I wonder if the best fix would be indeed to deactivate irq coalescing
>> completely.
>> Does it make any sense at all to use it if a driver uses NAPI already?
>
> It absolutely does make sense, when it is implemented and functions
> properly.
>

Interesting. I always thought both (NAPI and irq coalescing) are essentially doing the same thing only
one time in software and one time with hw support. Did I misunderstand NAPI?

Regards,
Lino

^ permalink raw reply

* [patch net-next 1/4] mlxsw: resources: Add maximum buffer size
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

We need to be able to limit the size of shared buffer pools, so query
the maximum size from the device during init.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/resources.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/resources.h b/drivers/net/ethernet/mellanox/mlxsw/resources.h
index 1c2119b..3c2171d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/resources.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/resources.h
@@ -47,6 +47,7 @@ enum mlxsw_res_id {
 	MLXSW_RES_ID_MAX_SYSTEM_PORT,
 	MLXSW_RES_ID_MAX_LAG,
 	MLXSW_RES_ID_MAX_LAG_MEMBERS,
+	MLXSW_RES_ID_MAX_BUFFER_SIZE,
 	MLXSW_RES_ID_MAX_CPU_POLICERS,
 	MLXSW_RES_ID_MAX_VRS,
 	MLXSW_RES_ID_MAX_RIFS,
@@ -70,6 +71,7 @@ static u16 mlxsw_res_ids[] = {
 	[MLXSW_RES_ID_MAX_SYSTEM_PORT] = 0x2502,
 	[MLXSW_RES_ID_MAX_LAG] = 0x2520,
 	[MLXSW_RES_ID_MAX_LAG_MEMBERS] = 0x2521,
+	[MLXSW_RES_ID_MAX_BUFFER_SIZE] = 0x2802,	/* Bytes */
 	[MLXSW_RES_ID_MAX_CPU_POLICERS] = 0x2A13,
 	[MLXSW_RES_ID_MAX_VRS] = 0x2C01,
 	[MLXSW_RES_ID_MAX_RIFS] = 0x2C02,
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 2/4] mlxsw: spectrum_buffers: Limit size of pools
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

The shared buffer pools are containers whose size is used to calculate
the maximum usage for packets from / to a specific port / {port, PG/TC},
when dynamic threshold is employed.

While it's perfectly fine for the sum of the pools to exceed the maximum
size of the shared buffer, a single pool cannot.

Add a check when the pool size is set and forbid sizes larger than the
maximum size of the shared buffer.

Without the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
// No error is returned

With the patch:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype
dynamic
devlink answers: Invalid argument

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
index bcaed8a..a746826 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c
@@ -611,6 +611,9 @@ int mlxsw_sp_sb_pool_set(struct mlxsw_core *mlxsw_core,
 	u32 pool_size = MLXSW_SP_BYTES_TO_CELLS(size);
 	enum mlxsw_reg_sbpr_mode mode;
 
+	if (size > MLXSW_CORE_RES_GET(mlxsw_sp->core, MAX_BUFFER_SIZE))
+		return -EINVAL;
+
 	mode = (enum mlxsw_reg_sbpr_mode) threshold_type;
 	return mlxsw_sp_sb_pr_write(mlxsw_sp, pool, dir, mode, pool_size);
 }
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 3/4] mlxsw: core: Add missing rollback in error path
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

Without this rollback, the thermal zone is still registered during the
error path, whereas its private data is freed upon the destruction of
the underlying bus device due to the use of devm_kzalloc(). This results
in use after free.

Fix this by calling mlxsw_thermal_fini() from the appropriate place in
the error path.

Fixes: a50c1e35650b ("mlxsw: core: Implement thermal zone")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index b21f88c..7a0ad39 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1157,6 +1157,7 @@ int mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
 	if (mlxsw_core->driver->fini)
 		mlxsw_core->driver->fini(mlxsw_core);
 err_driver_init:
+	mlxsw_thermal_fini(mlxsw_core->thermal);
 err_thermal_init:
 err_hwmon_init:
 	devlink_unregister(devlink);
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 4/4] mlxsw: core: Change order of operations in removal path
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1480352486-18518-1-git-send-email-jiri@resnulli.us>

From: Ido Schimmel <idosch@mellanox.com>

We call bus->init() before allocating 'lag.mapping'. Change the order of
operations in removal path to reflect that.

This makes the error path of mlxsw_core_bus_device_register() symmetric
with mlxsw_core_bus_device_unregister().

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 7a0ad39..4dc028b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1188,8 +1188,8 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core)
 	mlxsw_thermal_fini(mlxsw_core->thermal);
 	devlink_unregister(devlink);
 	mlxsw_emad_fini(mlxsw_core);
-	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
 	kfree(mlxsw_core->lag.mapping);
+	mlxsw_core->bus->fini(mlxsw_core->bus_priv);
 	free_percpu(mlxsw_core->pcpu_stats);
 	devlink_free(devlink);
 	mlxsw_core_driver_put(device_kind);
-- 
2.7.4

^ permalink raw reply related

* [patch net-next 0/4] mlxsw: couple of enhancements and fixes
From: Jiri Pirko @ 2016-11-28 17:01 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz

From: Jiri Pirko <jiri@mellanox.com>

Couple of enhancements and fixes from Ido.

Ido Schimmel (4):
  mlxsw: resources: Add maximum buffer size
  mlxsw: spectrum_buffers: Limit size of pools
  mlxsw: core: Add missing rollback in error path
  mlxsw: core: Change order of operations in removal path

 drivers/net/ethernet/mellanox/mlxsw/core.c             | 3 ++-
 drivers/net/ethernet/mellanox/mlxsw/resources.h        | 2 ++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_buffers.c | 3 +++
 3 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH 0/2] net: phy: realtek: fix RTL8211F TX-delay handling
From: David Miller @ 2016-11-28 17:07 UTC (permalink / raw)
  To: martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg
  Cc: f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	mark.rutland-5wv7dgnIgG8, sean.wang-NuS5LvNUpcJWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	jbrunet-rdvid1DuHRBWk0Htik3J/w
In-Reply-To: <20161125131201.19994-1-martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>

From: Martin Blumenstingl <martin.blumenstingl-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org>
Date: Fri, 25 Nov 2016 14:11:59 +0100

> The RTL8211F PHY driver currently enables the TX-delay only when the
> phy-mode is PHY_INTERFACE_MODE_RGMII. This is incorrect, because there
> are three RGMII variations of the phy-mode which explicitly request the
> PHY to enable the RX and/or TX delay, while PHY_INTERFACE_MODE_RGMII
> specifies that the PHY should disable the RX and/or TX delays.
> 
> Additionally to the RTL8211F PHY driver change this contains a small
> update to the phy-mode documentation to clarify the purpose of the
> RGMII phy-modes.
> While this may not be perfect yet it's at least a start. Please feel
> free to drop this patch from this series and send an improved version
> yourself.
> 
> These patches are the results of recent discussions, see [0]
> 
> [0] http://lists.infradead.org/pipermail/linux-amlogic/2016-November/001688.html

Series applied, thanks Martin.
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] net: hns: Fix to conditionally convey RX checksum flag to stack
From: David Miller @ 2016-11-28 17:12 UTC (permalink / raw)
  To: salil.mehta; +Cc: yisen.zhuang, mehta.salil.lnk, netdev, linux-kernel, linuxarm
In-Reply-To: <20161125133240.1264224-1-salil.mehta@huawei.com>

From: Salil Mehta <salil.mehta@huawei.com>
Date: Fri, 25 Nov 2016 13:32:40 +0000

> @@ -778,6 +778,35 @@ int hns_ae_get_regs_len(struct hnae_handle *handle)
>  	return total_num;
>  }
>  
> +static bool hns_ae_is_l3l4_csum_err(struct hnae_handle *handle)
> +{
> +	struct hns_ppe_cb *ppe_cb = hns_get_ppe_cb(handle);
> +	u32 regval;
> +	bool retval = false;
> +
> +	/* read PPE_HIS_PRO_ERR register and check for the checksum errors */
> +	regval = dsaf_read_dev(ppe_cb, PPE_HIS_PRO_ERR_REG);
> +

I don't see how a single register can properly provide error status for a ring
of pending received packets.

No matter how this register is implemented, it is either going to result in
packets erroneously being marked as having errors, or error status being
lost when multiple packets in a row have such errors.

For example, if you receive several packets in a row that have errors,
you'll read this register for the first one.  If this read clears the error
status, which I am guessing it does, then you won't see the error status
for the next packet that had one of these errors as well.

If you don't have something which is provided on a per-packet basis
then you can't determine the error properly.  Therefore you will just
have to always ignore the checksum if there is any error indicated in
the ring descriptor.

^ permalink raw reply

* Re: [PATCH] Set DEFAULT_TCP_CONG to bbr if DEFAULT_BBR is set
From: David Miller @ 2016-11-28 17:15 UTC (permalink / raw)
  To: jwollrath; +Cc: netdev
In-Reply-To: <20161125140526.2486-1-jwollrath@web.de>

From: Julian Wollrath <jwollrath@web.de>
Date: Fri, 25 Nov 2016 15:05:26 +0100

> Signed-off-by: Julian Wollrath <jwollrath@web.de>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] amd-xgbe: Fix unused suspend handlers build warning
From: David Miller @ 2016-11-28 17:19 UTC (permalink / raw)
  To: bp; +Cc: linux-kernel, thomas.lendacky, netdev
In-Reply-To: <20161126205352.19577-1-bp@alien8.de>

From: Borislav Petkov <bp@alien8.de>
Date: Sat, 26 Nov 2016 21:53:52 +0100

> From: Borislav Petkov <bp@suse.de>
> 
> Fix:
> 
>   drivers/net/ethernet/amd/xgbe/xgbe-main.c:835:12: warning: ‘xgbe_suspend’ defined
>     but not used [-Wunused-function]
>   drivers/net/ethernet/amd/xgbe/xgbe-main.c:855:12: warning: ‘xgbe_resume’ defined
>     but not used [-Wunused-function]
> 
> I see it during randconfig builds here.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next v2 1/6] tcp: instrument tcp sender limits chronographs
From: Neal Cardwell @ 2016-11-28 17:23 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
	Eric Dumazet
In-Reply-To: <1480316838-154141-2-git-send-email-ycheng@google.com>

On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch implements the skeleton of the TCP chronograph
> instrumentation on sender side limits:
>
>         1) idle (unspec)
>         2) busy sending data other than 3-4 below
>         3) rwnd-limited
>         4) sndbuf-limited
>
> The limits are enumerated 'tcp_chrono'. Since a connection in
> theory can idle forever, we do not track the actual length of this
> uninteresting idle period. For the rest we track how long the sender
> spends in each limit. At any point during the life time of a
> connection, the sender must be in one of the four states.
>
> If there are multiple conditions worthy of tracking in a chronograph
> then the highest priority enum takes precedence over
> the other conditions. So that if something "more interesting"
> starts happening, stop the previous chrono and start a new one.
>
> The time unit is jiffy(u32) in order to save space in tcp_sock.
> This implies application must sample the stats no longer than every
> 49 days of 1ms jiffy.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply

* Re: [PATCH net-next v2 2/6] tcp: instrument how long TCP is busy sending
From: Neal Cardwell @ 2016-11-28 17:24 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
	Eric Dumazet
In-Reply-To: <1480316838-154141-3-git-send-email-ycheng@google.com>

On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch measures TCP busy time, which is defined as the period
> of time when sender has data (or FIN) to send. The time starts when
> data is buffered and stops when the write queue is flushed by ACKs
> or error events.
>
> Note the busy time does not include SYN time, unless data is
> included in SYN (i.e. Fast Open). It does include FIN time even
> if the FIN carries no payload. Excluding pure FIN is possible but
> would incur one additional test in the fast path, which may not
> be worth it.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply

* Re: [PATCH net-next v2 3/6] tcp: instrument how long TCP is limited by receive window
From: Neal Cardwell @ 2016-11-28 17:25 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
	Eric Dumazet
In-Reply-To: <1480316838-154141-4-git-send-email-ycheng@google.com>

On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch measures the total time when the TCP stops sending because
> the receiver's advertised window is not large enough. Note that
> once the limit is lifted we are likely in the busy status if we
> have data pending.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply

* [PATCH net-next] hv_netvsc: remove excessive logging on MTU change
From: Vitaly Kuznetsov @ 2016-11-28 17:25 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, K. Y. Srinivasan, Haiyang Zhang

When we change MTU or the number of channels on a netvsc device we get the
following logged:

 hv_netvsc bf5edba8...: net device safe to remove
 hv_netvsc: hv_netvsc channel opened successfully
 hv_netvsc bf5edba8...: Send section size: 6144, Section count:2560
 hv_netvsc bf5edba8...: Device MAC 00:15:5d:1e:91:12 link state up

This information is useful as debug at most.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/net/hyperv/netvsc.c       | 8 ++++----
 drivers/net/hyperv/rndis_filter.c | 6 +++---
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 720b5fa..d85da0d 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -410,8 +410,8 @@ static int netvsc_init_buf(struct hv_device *device)
 	net_device->send_section_cnt =
 		net_device->send_buf_size / net_device->send_section_size;
 
-	dev_info(&device->device, "Send section size: %d, Section count:%d\n",
-		 net_device->send_section_size, net_device->send_section_cnt);
+	netdev_dbg(ndev, "Send section size: %d, Section count:%d\n",
+		   net_device->send_section_size, net_device->send_section_cnt);
 
 	/* Setup state for managing the send buffer. */
 	net_device->map_words = DIV_ROUND_UP(net_device->send_section_cnt,
@@ -578,7 +578,7 @@ void netvsc_device_remove(struct hv_device *device)
 	 * At this point, no one should be accessing net_device
 	 * except in here
 	 */
-	dev_notice(&device->device, "net device safe to remove\n");
+	netdev_dbg(ndev, "net device safe to remove\n");
 
 	/* Now, we can close the channel safely */
 	vmbus_close(device->channel);
@@ -1380,7 +1380,7 @@ int netvsc_device_add(struct hv_device *device, void *additional_info)
 	}
 
 	/* Channel is opened */
-	pr_info("hv_netvsc channel opened successfully\n");
+	netdev_dbg(ndev, "hv_netvsc channel opened successfully\n");
 
 	/* If we're reopening the device we may have multiple queues, fill the
 	 * chn_table with the default channel to use it before subchannels are
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 9195d5d..8d90904 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1059,9 +1059,9 @@ int rndis_filter_device_add(struct hv_device *dev,
 
 	device_info->link_state = rndis_device->link_state;
 
-	dev_info(&dev->device, "Device MAC %pM link state %s\n",
-		 rndis_device->hw_mac_adr,
-		 device_info->link_state ? "down" : "up");
+	netdev_dbg(net, "Device MAC %pM link state %s\n",
+		   rndis_device->hw_mac_adr,
+		   device_info->link_state ? "down" : "up");
 
 	if (net_device->nvsp_version < NVSP_PROTOCOL_VERSION_5)
 		return 0;
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH net-next v2 4/6] tcp: instrument how long TCP is limited by insufficient send buffer
From: Neal Cardwell @ 2016-11-28 17:25 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
	Eric Dumazet
In-Reply-To: <1480316838-154141-5-git-send-email-ycheng@google.com>

On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch measures the amount of time when TCP runs out of new data
> to send to the network due to insufficient send buffer, while TCP
> is still busy delivering (i.e. write queue is not empty). The goal
> is to indicate either the send buffer autotuning or user SO_SNDBUF
> setting has resulted network under-utilization.
>
> The measurement starts conservatively by checking various conditions
> to minimize false claims (i.e. under-estimation is more likely).
> The measurement stops when the SOCK_NOSPACE flag is cleared. But it
> does not account the time elapsed till the next application write.
> Also the measurement only starts if the sender is still busy sending
> data, s.t. the limit accounted is part of the total busy time.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply

* Re: [PATCH net-next v2 5/6] tcp: export sender limits chronographs to TCP_INFO
From: Neal Cardwell @ 2016-11-28 17:26 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
	Eric Dumazet
In-Reply-To: <1480316838-154141-6-git-send-email-ycheng@google.com>

On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch exports all the sender chronograph measurements collected
> in the previous patches to TCP_INFO interface. Note that busy time
> exported includes all the other sending limits (rwnd-limited,
> sndbuf-limited). Internally the time unit is jiffy but externally
> the measurements are in microseconds for future extensions.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply

* Re: [PATCH net-next v2 6/6] tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING
From: Neal Cardwell @ 2016-11-28 17:26 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: David Miller, Soheil Hassas Yeganeh, francisyyan, Netdev,
	Eric Dumazet
In-Reply-To: <1480316838-154141-7-git-send-email-ycheng@google.com>

On Mon, Nov 28, 2016 at 2:07 AM, Yuchung Cheng <ycheng@google.com> wrote:
> From: Francis Yan <francisyyan@gmail.com>
>
> This patch exports the sender chronograph stats via the socket
> SO_TIMESTAMPING channel. Currently we can instrument how long a
> particular application unit of data was queued in TCP by tracking
> SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
> these sender chronograph stats exported simultaneously along with
> these timestamps allow further breaking down the various sender
> limitation.  For example, a video server can tell if a particular
> chunk of video on a connection takes a long time to deliver because
> TCP was experiencing small receive window. It is not possible to
> tell before this patch without packet traces.
>
> To prepare these stats, the user needs to set
> SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
> while requesting other SOF_TIMESTAMPING TX timestamps. When the
> timestamps are available in the error queue, the stats are returned
> in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
> in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
> TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
>
> Signed-off-by: Francis Yan <francisyyan@gmail.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

^ permalink raw reply

* RE: [PATCH net-next] hv_netvsc: remove excessive logging on MTU change
From: Haiyang Zhang @ 2016-11-28 17:33 UTC (permalink / raw)
  To: Vitaly Kuznetsov, netdev@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, KY Srinivasan
In-Reply-To: <20161128172544.2491-1-vkuznets@redhat.com>



> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Monday, November 28, 2016 12:26 PM
> To: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>
> Subject: [PATCH net-next] hv_netvsc: remove excessive logging on MTU
> change
> 
> When we change MTU or the number of channels on a netvsc device we get
> the
> following logged:
> 
>  hv_netvsc bf5edba8...: net device safe to remove
>  hv_netvsc: hv_netvsc channel opened successfully
>  hv_netvsc bf5edba8...: Send section size: 6144, Section count:2560
>  hv_netvsc bf5edba8...: Device MAC 00:15:5d:1e:91:12 link state up
> 
> This information is useful as debug at most.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

^ permalink raw reply

* Re: [PATCH net-next v2 2/4] Documentation: net: phy: Add a paragraph about pause frames/flow control
From: Florian Fainelli @ 2016-11-28 17:33 UTC (permalink / raw)
  To: Sebastian Frias, netdev
  Cc: davem, andrew, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet
In-Reply-To: <1b7425f7-9183-69d4-76e8-42eefffeb1c6@laposte.net>

On 11/28/2016 02:38 AM, Sebastian Frias wrote:
> On 27/11/16 19:44, Florian Fainelli wrote:
>> Describe that the Ethernet MAC controller is ultimately responsible for
>> dealing with proper pause frames/flow control advertisement and
>> enabling, and that it is therefore allowed to have it change
>> phydev->supported/advertising with SUPPORTED_Pause and
>> SUPPORTED_AsymPause.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  Documentation/networking/phy.txt | 18 ++++++++++++++++--
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
>> index 4b25c0f24201..9a42a9414cea 100644
>> --- a/Documentation/networking/phy.txt
>> +++ b/Documentation/networking/phy.txt
>> @@ -127,8 +127,9 @@ Letting the PHY Abstraction Layer do Everything
>>   values pruned from them which don't make sense for your controller (a 10/100
>>   controller may be connected to a gigabit capable PHY, so you would need to
>>   mask off SUPPORTED_1000baseT*).  See include/linux/ethtool.h for definitions
>> - for these bitfields. Note that you should not SET any bits, or the PHY may
>> - get put into an unsupported state.
>> + for these bitfields. Note that you should not SET any bits, except the
>> + SUPPORTED_Pause and SUPPORTED_AsymPause bits (see below), or the PHY may get
>> + put into an unsupported state.
>>  
>>   Lastly, once the controller is ready to handle network traffic, you call
>>   phy_start(phydev).  This tells the PAL that you are ready, and configures the
>> @@ -139,6 +140,19 @@ Letting the PHY Abstraction Layer do Everything
>>   When you want to disconnect from the network (even if just briefly), you call
>>   phy_stop(phydev).
>>  
>> +Pause frames / flow control
>> +
>> + The PHY does not participate directly in flow control/pause frames except by
>> + making sure that the SUPPORTED_Pause and SUPPORTED_AsymPause bits are set in
>> + MII_ADVERTISE to indicate towards the link partner that the Ethernet MAC
>> + controller supports such a thing. Since flow control/pause frames generation
>> + involves the Ethernet MAC driver, it is recommended that this driver takes care
>> + of properly indicating advertisement and support for such features by setting
>> + the SUPPORTED_Pause and SUPPORTED_AsymPause bits accordingly. This can be done
>> + either before or after phy_connect() 
> 
> If the bits are set after phy_connect(), how does the PHY framework knows there's
> an update to the bits? Should some call be made?

You would most likely either call phy_start() to start the PHY state
machine (again) or have to re-negotiate the link with e.g:
genphy_restart_aneg().
-- 
Florian

^ permalink raw reply

* Re: [PATCH net] sit: Set skb->protocol properly in ipip6_tunnel_xmit()
From: Eric Dumazet @ 2016-11-28 17:34 UTC (permalink / raw)
  To: David Miller; +Cc: Alexander Duyck, sfr, elicooper, netdev
In-Reply-To: <1480352023.18162.63.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 2016-11-28 at 08:53 -0800, Eric Dumazet wrote:
> On Mon, 2016-11-28 at 11:47 -0500, David Miller wrote:
> > From: Stephen Rothwell <sfr@canb.auug.org.au>
> > Date: Sun, 27 Nov 2016 13:04:00 +1100
> > 
> > > [Just for Dave's information]
> > > 
> > > On Fri, 25 Nov 2016 13:50:17 +0800 Eli Cooper <elicooper@gmx.com> wrote:
> > >>
> > >> Similar to commit ae148b085876
> > >> ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"),
> > >> sit tunnels also need to update skb->protocol; otherwise, TSO/GSO packets
> > >> might not be properly segmented, which causes the packets being dropped.
> > >> 
> > >> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
> > >> Tested-by: Eli Cooper <elicooper@gmx.com>
> > >> Cc: stable@vger.kernel.org
> > >> Signed-off-by: Eli Cooper <elicooper@gmx.com>
> > > 
> > > I tested this patch and it does *not* solve my problem.
> > 
> > I'm torn on this patch, because it looked exactly like it would solve the
> > kind of problem Stephen is running into.
> > 
> > Even though it doesn't fix his case, it seems correct to me.
> > 
> > I was wondering if it was also important to set the skb->protocol
> > before the call to ip_tunnel_encap() but I couldn't find a dependency.
> > 
> > In any event I'd like to see some other people review this change
> > before I apply it.
> > 
> > My only other guess for Stephen's problem is somehow the SKB headers
> > aren't set up properly for what the GSO engine expects.
> 
> Well, mlx4 just works, and uses GSO engine just fine.
> 
> So my guess is this is a bug in Intel IGB driver.

About Eli patch : I do not believe it is needed.

Here is the path followed by SIT packet being GSO at the device layer :

We can see that ip_output() was called, and ip_output() already does :

skb->protocol = htons(ETH_P_IP);

[   71.209437] sit: skb->protocol = dd86
[   71.215387] skb_mac_gso_segment type 8
[   71.219176] ------------[ cut here ]------------
[   71.219180] WARNING: CPU: 25 PID: 12087 at net/core/dev.c:2642 skb_mac_gso_segment+0x166/0x170
[   71.219200] CPU: 25 PID: 12087 Comm: netperf Not tainted 4.9.0-smp-DEV #362
[   71.219203]  ffffc90019d4b4d8 ffffffff813ca25b 0000000000000009 0000000000000000
[   71.219204]  ffffc90019d4b518 ffffffff810d0c13 00000a5219d4b4f0 0000000000000008
[   71.219205]  ffff881ff2524ae8 0000000000000001 0000000000000001 ffff881fe1107a9c
[   71.219205] Call Trace:
[   71.219209]  [<ffffffff813ca25b>] dump_stack+0x4d/0x72
[   71.219213]  [<ffffffff810d0c13>] __warn+0xd3/0xf0
[   71.219214]  [<ffffffff810d0cfe>] warn_slowpath_null+0x1e/0x20
[   71.219215]  [<ffffffff8167b656>] skb_mac_gso_segment+0x166/0x170
[   71.219216]  [<ffffffff8167b719>] __skb_gso_segment+0xb9/0x140
[   71.219218]  [<ffffffff8167bb26>] validate_xmit_skb+0x136/0x2d0
[   71.219219]  [<ffffffff8167bd03>] validate_xmit_skb_list+0x43/0x70
[   71.219221]  [<ffffffff816a4817>] sch_direct_xmit+0x147/0x1a0
[   71.219223]  [<ffffffff8167c431>] __dev_queue_xmit+0x421/0x620
[   71.219224]  [<ffffffff8167c640>] dev_queue_xmit+0x10/0x20
[   71.219226]  [<ffffffff816db2d4>] ip_finish_output2+0x254/0x330
[   71.219227]  [<ffffffff816dcaa6>] ip_finish_output+0x136/0x1d0
[   71.219228]  [<ffffffff816dd4db>] ip_output+0xab/0xc0
[   71.219230]  [<ffffffff816dc970>] ? ip_fragment.constprop.58+0x80/0x80
[   71.219231]  [<ffffffff816dccb5>] ip_local_out+0x35/0x40
[   71.219233]  [<ffffffff81725aec>] iptunnel_xmit+0x14c/0x1c0
[   71.219235]  [<ffffffff8178f77a>] sit_tunnel_xmit+0x50a/0x860
[   71.219236]  [<ffffffff8167bde0>] dev_hard_start_xmit+0xb0/0x200
[   71.219238]  [<ffffffff8167c592>] __dev_queue_xmit+0x582/0x620
[   71.219239]  [<ffffffff8167c640>] dev_queue_xmit+0x10/0x20
[   71.219241]  [<ffffffff81684fb1>] neigh_direct_output+0x11/0x20
[   71.219243]  [<ffffffff81750e91>] ip6_finish_output2+0x181/0x460
[   71.219245]  [<ffffffff8178d00b>] ? ip6table_mangle_hook+0x3b/0x130
[   71.219246]  [<ffffffff81750ea4>] ? ip6_finish_output2+0x194/0x460
[   71.219247]  [<ffffffff8178d00b>] ? ip6table_mangle_hook+0x3b/0x130
[   71.219249]  [<ffffffff81752436>] ip6_finish_output+0xa6/0x110
[   71.219250]  [<ffffffff81752535>] ip6_output+0x95/0xe0
[   71.219251]  [<ffffffff81752390>] ? ip6_fragment+0x9e0/0x9e0
[   71.219252]  [<ffffffff8174edaf>] dst_output+0xf/0x20
[   71.219254]  [<ffffffff8174f347>] NF_HOOK.constprop.45+0x77/0x90
[   71.219255]  [<ffffffff8174eda0>] ? ac6_proc_exit+0x20/0x20
[   71.219256]  [<ffffffff817505b8>] ip6_xmit+0x268/0x4d0
[   71.219257]  [<ffffffff8174eda0>] ? ac6_proc_exit+0x20/0x20
[   71.219258]  [<ffffffff817505b8>] ? ip6_xmit+0x268/0x4d0
[   71.219260]  [<ffffffff817834b0>] ? inet6_csk_route_socket+0x120/0x1d0
[   71.219262]  [<ffffffff8122f771>] ? __kmalloc_node_track_caller+0x31/0x40
[   71.219264]  [<ffffffff8178363e>] inet6_csk_xmit+0x7e/0xc0
[   71.219265]  [<ffffffff8122f771>] ? __kmalloc_node_track_caller+0x31/0x40
[   71.219267]  [<ffffffff816f62e7>] tcp_transmit_skb+0x547/0x910
[   71.219268]  [<ffffffff816f6a6d>] tcp_write_xmit+0x3bd/0xf70
[   71.219270]  [<ffffffff813d0580>] ? __radix_tree_create+0x2d0/0x360
[   71.219271]  [<ffffffff816f7734>] tcp_push_one+0x34/0x40
[   71.219272]  [<ffffffff816e9141>] tcp_sendmsg+0x511/0xbc0
[   71.219275]  [<ffffffff81716375>] inet_sendmsg+0x65/0xa0
[   71.219277]  [<ffffffff81659188>] sock_sendmsg+0x38/0x50
[   71.219278]  [<ffffffff8165a8d6>] SYSC_sendto+0x106/0x170
[   71.219281]  [<ffffffff81139c06>] ? do_setitimer+0x1f6/0x250
[   71.219282]  [<ffffffff81139ca1>] ? alarm_setitimer+0x41/0x70
[   71.219283]  [<ffffffff8165b55e>] SyS_sendto+0xe/0x10
[   71.219286]  [<ffffffff817b5720>] entry_SYSCALL_64_fastpath+0x13/0x94

^ permalink raw reply

* Re: [PATCH net-next v2 0/6] tcp: sender chronographs instrumentation
From: Eric Dumazet @ 2016-11-28 17:38 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: davem, soheil, francisyyan, netdev, ncardwell, edumazet
In-Reply-To: <1480316838-154141-1-git-send-email-ycheng@google.com>

On Sun, 2016-11-27 at 23:07 -0800, Yuchung Cheng wrote:
> This patch set provides instrumentation on TCP sender limitations.
> While developing the BBR congestion control, we noticed that TCP
> sending process is often limited by factors unrelated to congestion
> control: insufficient sender buffer and/or insufficient receive
> window/buffer to saturate the network bandwidth. Unfortunately these
> limits are not visible to the users and often the poor performance
> is attributed to the congestion control of choice.

For the whole series :

Acked-by: Eric Dumazet <edumazet@google.com>

Thanks !

^ permalink raw reply

* Re: [[PATCH net-next RFC] 2/4] net: dsa: mv88e6xxx: Monitor and Management tables
From: Vivien Didelot @ 2016-11-28 17:42 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Andrew Lunn
In-Reply-To: <1479944598-20372-3-git-send-email-andrew@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

> +/* Offset 0x1a: Monitor Control */

Thanks. We could also optionally add another line Offset 0x1A: Monitor &
MGMT Control, to mention the change on newer chips.

> +
> +int mv88e6095_monitor_ctrl(struct mv88e6xxx_chip *chip, int upstream_port)
> +{
> +	u16 reg;
> +
> +	reg = upstream_port << GLOBAL_MONITOR_CONTROL_INGRESS_SHIFT |
> +		upstream_port << GLOBAL_MONITOR_CONTROL_EGRESS_SHIFT |
> +		upstream_port << GLOBAL_MONITOR_CONTROL_ARP_SHIFT;
> +
> +	return mv88e6xxx_g1_write(chip, GLOBAL_MONITOR_CONTROL, reg);
> +}

Monitor Control is just one of these registers which control several
stuffs.

On 6185, it controls only the monitor destination port for ingress,
egress and ARP. On 6352, it controls the monitor destination port for
ingress and egress, sets the CPU destination port and the mirror port.

Hence using GLOBAL_MONITOR_CONTROL_ARP_SHIFT is bad since bits 7:4
differ in those two (CPU Dest vs. ARP Dest). Unless you tell me that
these are the same.

Please add a .set_monitor_port ops and provide a more explicit function
like mv88e6xxx_g1_set_monitor_port() which configures the ingress/egress
monitor destination port and eventually the CPU/ARP dest if the effect
is similar (otherwise a .set_cpu_port ops is needed).

> +
> +int mv88e6390_monitor_ctrl(struct mv88e6xxx_chip *chip, int upstream_port)
> +{
> +	u16 reg;
> +	int err;
> +
> +	/* Trap destination */
> +	reg = GLOBAL_MONITOR_CONTROL_UPDATE |
> +		GLOBAL_MONITOR_CONTROL_CPU_DEST |
> +		upstream_port;
> +	err = mv88e6xxx_g1_write(chip, GLOBAL_MONITOR_CONTROL, reg);
> +	if (err)
> +		return err;
> +
> +	/* 01:c2:80:00:00:00:00-01:c2:80:00:00:00:07 are Management */
> +	reg = GLOBAL_MONITOR_CONTROL_UPDATE |
> +		GLOBAL_MONITOR_CONTROL_0180C280000000XLO | 0xff;
> +	err = mv88e6xxx_g1_write(chip, GLOBAL_MONITOR_CONTROL, reg);
> +	if (err)
> +		return err;
> +
> +	/* 01:c2:80:00:00:00:08-01:c2:80:00:00:00:0f are Management */
> +	reg = GLOBAL_MONITOR_CONTROL_UPDATE |
> +		GLOBAL_MONITOR_CONTROL_0180C280000000XHI | 0xff;
> +	err = mv88e6xxx_g1_write(chip, GLOBAL_MONITOR_CONTROL, reg);
> +	if (err)
> +		return err;
> +
> +	/* 01:c2:80:00:00:00:20-01:c2:80:00:00:00:27 are Management */
> +	reg = GLOBAL_MONITOR_CONTROL_UPDATE |
> +		GLOBAL_MONITOR_CONTROL_0180C280000002XLO | 0xff;
> +	err = mv88e6xxx_g1_write(chip, GLOBAL_MONITOR_CONTROL, reg);
> +	if (err)
> +		return err;
> +
> +	/* 01:c2:80:00:00:00:28-01:c2:80:00:00:00:2f are Management */
> +	reg = GLOBAL_MONITOR_CONTROL_UPDATE |
> +		GLOBAL_MONITOR_CONTROL_0180C280000002XHI | 0xff;
> +
> +	return mv88e6xxx_g1_write(chip, GLOBAL_MONITOR_CONTROL, reg);
> +}
> +

This function does more that the 6095 implementation.

First please provide a static helper to write the Monitor & MGMT Control
table. Something like:

    mv88e6xxx_g1_monitor_mgmt_write(struct mv88e6xxx_chip *chip,
                                    u8 pointer, u8 data)

Then use it to implement a mv88e6xxx_g1_set_trap_port() function.

Rsvd2CPU is a different thing. 6352 has 2 dedicated registers in Global
2 to configure the management addresses 01:c2:80:00:00:00:2x (offset
0x2) and 01:c2:80:00:00:00:0x (offset 0x3). 6185 only has register 0x3.
Then 6390 stores the Rsvd2CPU addresses in this MGMT table.

I'd expect something like new .set_rsvd2cpu{0,2}(chip, u16 x) ops and
the following implementation:

    int mv88e6xxx_g2_set_rsvd2cpu2(struct mv88e6xxx_chip *chip, u16 x);
    int mv88e6xxx_g2_set_rsvd2cpu0(struct mv88e6xxx_chip *chip, u16 x);

int global2.c, and:

    int mv88e6xxx_g1_set_rsvd2cpu2(struct mv88e6xxx_chip *chip, u16 x);
    int mv88e6xxx_g1_set_rsvd2cpu0(struct mv88e6xxx_chip *chip, u16 x);

in global1.c (which use mv88e6xxx_g1_monitor_mgmt_write()).

Later (not required now), a nice wrapper in chip.c could set a rsvd2cpu
bit for a given address, or fallback to loading it in the ATU as MGMT.

Thanks,

        Vivien

^ permalink raw reply

* Re: [PATCH net-next v2 3/4] Documentation: net: phy: Add blurb about RGMII
From: Florian Fainelli @ 2016-11-28 17:43 UTC (permalink / raw)
  To: Sebastian Frias, netdev
  Cc: davem, andrew, martin.blumenstingl, mans, alexandre.torgue,
	peppe.cavallaro, timur, jbrunet, mason
In-Reply-To: <4c71a69c-2942-f177-ef25-04686f5e4149@laposte.net>

On 11/28/2016 02:34 AM, Sebastian Frias wrote:
> On 27/11/16 19:44, Florian Fainelli wrote:
>> RGMII is a recurring source of pain for people with Gigabit Ethernet
>> hardware since it may require PHY driver and MAC driver level
>> configuration hints. Document what are the expectations from PHYLIB and
>> what options exist.
>>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>>  Documentation/networking/phy.txt | 76 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 76 insertions(+)
>>
>> diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
>> index 9a42a9414cea..7a0cb1212b9e 100644
>> --- a/Documentation/networking/phy.txt
>> +++ b/Documentation/networking/phy.txt
>> @@ -65,6 +65,82 @@ The MDIO bus
>>   drivers/net/ethernet/freescale/fsl_pq_mdio.c and an associated DTS file
>>   for one of the users. (e.g. "git grep fsl,.*-mdio arch/powerpc/boot/dts/")
>>  
>> +(RG)MII/electrical interface considerations
>> +
>> + The Reduced Gigabit Medium Independent Interface (RGMII) is a 12 pins
>> + electrical signal interface using a synchronous 125Mhz clock signal and several
>> + data lines. Due to this design decision, a 1.5ns to 2ns delay must be added
>> + between the clock line (RXC or TXC) and the data lines to let the PHY (clock
>> + sink) have enough setup and hold times to sample the data lines correctly. The
>> + PHY library offers different types of PHY_INTERFACE_MODE_RGMII* values to let
>> + the PHY driver and optionaly the MAC driver implement the required delay. The
>> + values of phy_interface_t must be understood from the perspective of the PHY
>> + device itself, leading to the following:
>> +
>> + * PHY_INTERFACE_MODE_RGMII: the PHY is not responsible for inserting any
>> +   internal delay by itself, it assumes that either the Ethernet MAC (if capable
>> +   or the PCB traces) insert the correct 1.5-2ns delay
> 
> For what is worth, the Atheros at803x driver comes with RX delay enabled as per HW
> reset.

Always, or is this a behavior possibly defined via a stra-pin that can
be changed?

> I understand that enforcing this documentation as is would result in changing the
> driver to disable RX delay, but it could break existing DTs, so I don't know if that
> path will be pursued.

Which is why the documentation update proposed indicates preferred vs.
mandatory.

> 
> Would explicit "versioning" the DT nodes be something worth exploring? So far it
> seems the versioning is implicit: driver probably check the presence of some property
> and conclude that it has to behave in a way or another.

I would not go that route, can of worms.

> 
>> +
>> + * PHY_INTERFACE_MODE_RGMII_TXID: the PHY should be inserting an internal delay
>> +   for the transmit data lines (TXD[3:0]) processed by the PHY device
>> +
>> + * PHY_INTERFACE_MODE_RGMII_RXID: the PHY should be inserting an internal delay
>> +   for the receive data lines (RXD[3:0]) processed by the PHY device
>> +
>> + * PHY_INTERFACE_MODE_RGMII_ID: the PHY should be inserting internal delays for
>> +   both transmit AND receive data lines from/to the PHY device
>> +
>> + Whenever it is possible, it is preferrable to utilize the PHY side RGMII delay
>> + for several reasons:
>> +
>> + * PHY devices may offer sub-nanosecond granularity in how they allow a
>> +   receiver/transmitter side delay (e.g: 0.5, 1.0, 1.5ns) to be specified. Such
>> +   precision may be required to account for differences in PCB trace lengths
>> +
>> + * PHY devices are typically qualified for a large range of applications
>> +   (industrial, medical, automotive...), and they provide a constant and
>> +   reliable delay across temperature/pressure/voltage ranges
>> +
>> + * PHY device drivers in PHYLIB being reusable by nature, being able to
>> +   configure correctly a specified delay enables more designs with similar delay
>> +   requirements to be operate correctly
>> +
>> + For cases where the PHY is not capable of providing this delay, but the
>> + Ethernet MAC driver is capable of doing it, the correct phy_interface_t value
>> + should be PHY_INTERFACE_MODE_RGMII, and the Ethernet MAC driver should be
>> + configured correctly in order to provide the required transmit and/or receive
>> + side delay from the perspective of the PHY device. 
> 
> While this clarifies the current situation very well, wouldn't the proposed approach
> require that a property such as "txid-delay-ns" on the PHY's DT node to be duplicated
> for the MAC?

The property name can be identical and represent essentially the same
things, but as as described, if the delay is implemented by the PHY,
then the MAC should disable it, conversely if the MAC needs to implement
it, the PHY should not contain any delay property. If both are found,
because the DTS is miswritten, then, the drivers should ignore/error out.

Or are you thinking about the case you described earlier with delays
that are neither 0 or 2, but e.g: 1 and 3 and you still want to have the
end-result be somewhere around 2ns?
-- 
Florian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox