Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 1/2] mlxsw: spectrum: Register CPU port with devlink
From: Jiri Pirko @ 2019-09-13  8:02 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: netdev, davem, jiri, shalomt, mlxsw, Ido Schimmel
In-Reply-To: <20190912130740.22061-2-idosch@idosch.org>

Thu, Sep 12, 2019 at 03:07:39PM CEST, idosch@idosch.org wrote:
>From: Shalom Toledo <shalomt@mellanox.com>
>
>Register CPU port with devlink.
>
>Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
>Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>---
> drivers/net/ethernet/mellanox/mlxsw/core.c    | 33 +++++++++++++
> drivers/net/ethernet/mellanox/mlxsw/core.h    |  5 ++
> .../net/ethernet/mellanox/mlxsw/spectrum.c    | 47 +++++++++++++++++++
> 3 files changed, 85 insertions(+)
>
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
>index 963a2b4b61b1..94f83d2be17e 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
>@@ -1892,6 +1892,39 @@ void mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port)
> }
> EXPORT_SYMBOL(mlxsw_core_port_fini);
> 
>+int mlxsw_core_cpu_port_init(struct mlxsw_core *mlxsw_core,
>+			     void *port_driver_priv,
>+			     const unsigned char *switch_id,
>+			     unsigned char switch_id_len)
>+{
>+	struct devlink *devlink = priv_to_devlink(mlxsw_core);
>+	struct mlxsw_core_port *mlxsw_core_port =
>+					&mlxsw_core->ports[MLXSW_PORT_CPU_PORT];
>+	struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
>+	int err;
>+
>+	mlxsw_core_port->local_port = MLXSW_PORT_CPU_PORT;
>+	mlxsw_core_port->port_driver_priv = port_driver_priv;
>+	devlink_port_attrs_set(devlink_port, DEVLINK_PORT_FLAVOUR_CPU,
>+			       0, false, 0, switch_id, switch_id_len);
>+	err = devlink_port_register(devlink, devlink_port, MLXSW_PORT_CPU_PORT);
>+	if (err)
>+		memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
>+	return err;

This duplicates almost 100% of code of mlxsw_core_port_init. Please do a
common function what can be used by both.


>+}
>+EXPORT_SYMBOL(mlxsw_core_cpu_port_init);
>+
>+void mlxsw_core_cpu_port_fini(struct mlxsw_core *mlxsw_core)
>+{
>+	struct mlxsw_core_port *mlxsw_core_port =
>+					&mlxsw_core->ports[MLXSW_PORT_CPU_PORT];
>+	struct devlink_port *devlink_port = &mlxsw_core_port->devlink_port;
>+
>+	devlink_port_unregister(devlink_port);
>+	memset(mlxsw_core_port, 0, sizeof(*mlxsw_core_port));
>+}
>+EXPORT_SYMBOL(mlxsw_core_cpu_port_fini);
>+
> void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
> 			     void *port_driver_priv, struct net_device *dev)
> {
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
>index b65a17d49e43..5d7d2ab6d155 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
>+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
>@@ -177,6 +177,11 @@ int mlxsw_core_port_init(struct mlxsw_core *mlxsw_core, u8 local_port,
> 			 const unsigned char *switch_id,
> 			 unsigned char switch_id_len);
> void mlxsw_core_port_fini(struct mlxsw_core *mlxsw_core, u8 local_port);
>+int mlxsw_core_cpu_port_init(struct mlxsw_core *mlxsw_core,
>+			     void *port_driver_priv,
>+			     const unsigned char *switch_id,
>+			     unsigned char switch_id_len);
>+void mlxsw_core_cpu_port_fini(struct mlxsw_core *mlxsw_core);
> void mlxsw_core_port_eth_set(struct mlxsw_core *mlxsw_core, u8 local_port,
> 			     void *port_driver_priv, struct net_device *dev);
> void mlxsw_core_port_ib_set(struct mlxsw_core *mlxsw_core, u8 local_port,
>diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>index 91e4792bb7e7..1fc73a9ad84d 100644
>--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
>@@ -3872,6 +3872,46 @@ static void mlxsw_sp_port_remove(struct mlxsw_sp *mlxsw_sp, u8 local_port)
> 	mlxsw_core_port_fini(mlxsw_sp->core, local_port);
> }
> 
>+static int mlxsw_sp_cpu_port_create(struct mlxsw_sp *mlxsw_sp)
>+{
>+	struct mlxsw_sp_port *mlxsw_sp_port;
>+	int err;
>+
>+	mlxsw_sp_port = kzalloc(sizeof(*mlxsw_sp_port), GFP_KERNEL);
>+	if (!mlxsw_sp_port)
>+		return -ENOMEM;
>+
>+	mlxsw_sp_port->mlxsw_sp = mlxsw_sp;
>+	mlxsw_sp_port->local_port = MLXSW_PORT_CPU_PORT;
>+
>+	mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = mlxsw_sp_port;

Assign this at the end of the function and avoid NULL assignment on
error path.


>+
>+	err = mlxsw_core_cpu_port_init(mlxsw_sp->core,
>+				       mlxsw_sp_port,
>+				       mlxsw_sp->base_mac,
>+				       sizeof(mlxsw_sp->base_mac));
>+	if (err) {
>+		dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize core CPU port\n");
>+		goto err_core_cpu_port_init;
>+	}
>+
>+	return err;
>+
>+err_core_cpu_port_init:
>+	mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = NULL;
>+	kfree(mlxsw_sp_port);
>+	return err;
>+}
>+
>+static void mlxsw_sp_cpu_port_remove(struct mlxsw_sp *mlxsw_sp)
>+{
>+	struct mlxsw_sp_port *mlxsw_sp_port = mlxsw_sp->ports[0];

s/0/MLXSW_PORT_CPU_PORT/


>+
>+	mlxsw_core_cpu_port_fini(mlxsw_sp->core);
>+	mlxsw_sp->ports[MLXSW_PORT_CPU_PORT] = NULL;
>+	kfree(mlxsw_sp_port);
>+}
>+
> static bool mlxsw_sp_port_created(struct mlxsw_sp *mlxsw_sp, u8 local_port)
> {
> 	return mlxsw_sp->ports[local_port] != NULL;
>@@ -3884,6 +3924,7 @@ static void mlxsw_sp_ports_remove(struct mlxsw_sp *mlxsw_sp)
> 	for (i = 1; i < mlxsw_core_max_ports(mlxsw_sp->core); i++)
> 		if (mlxsw_sp_port_created(mlxsw_sp, i))
> 			mlxsw_sp_port_remove(mlxsw_sp, i);
>+	mlxsw_sp_cpu_port_remove(mlxsw_sp);
> 	kfree(mlxsw_sp->port_to_module);
> 	kfree(mlxsw_sp->ports);
> }
>@@ -3908,6 +3949,10 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
> 		goto err_port_to_module_alloc;
> 	}
> 
>+	err = mlxsw_sp_cpu_port_create(mlxsw_sp);
>+	if (err)
>+		goto err_cpu_port_create;
>+
> 	for (i = 1; i < max_ports; i++) {
> 		/* Mark as invalid */
> 		mlxsw_sp->port_to_module[i] = -1;
>@@ -3931,6 +3976,8 @@ static int mlxsw_sp_ports_create(struct mlxsw_sp *mlxsw_sp)
> 	for (i--; i >= 1; i--)
> 		if (mlxsw_sp_port_created(mlxsw_sp, i))
> 			mlxsw_sp_port_remove(mlxsw_sp, i);
>+	mlxsw_sp_cpu_port_remove(mlxsw_sp);
>+err_cpu_port_create:
> 	kfree(mlxsw_sp->port_to_module);
> err_port_to_module_alloc:
> 	kfree(mlxsw_sp->ports);
>-- 
>2.21.0
>

^ permalink raw reply

* [PATCH] ath10k: fix spelling mistake "eanble" -> "enable"
From: Colin King @ 2019-09-13  7:43 UTC (permalink / raw)
  To: Kalle Valo, David S . Miller, ath10k, linux-wireless, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

There is a spelling mistake in a ath10k_warn warning message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/ath/ath10k/mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 12dad659bf68..a3d612f5f652 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -7418,7 +7418,7 @@ static bool ath10k_mac_set_vht_bitrate_mask_fixup(struct ath10k *ar,
 	err = ath10k_wmi_peer_set_param(ar, arvif->vdev_id, sta->addr,
 					WMI_PEER_PARAM_FIXED_RATE, rate);
 	if (err)
-		ath10k_warn(ar, "failed to eanble STA %pM peer fixed rate: %d\n",
+		ath10k_warn(ar, "failed to enable STA %pM peer fixed rate: %d\n",
 			    sta->addr, err);
 
 	return true;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH bpf-next] libbpf: add xsk_umem__adjust_offset
From: Björn Töpel @ 2019-09-13  4:59 UTC (permalink / raw)
  To: Kevin Laatz
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jonathan Lemon,
	Bruce Richardson, ciara.loftus, bpf
In-Reply-To: <20190912072840.20947-1-kevin.laatz@intel.com>

On Thu, 12 Sep 2019 at 17:47, Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> Currently, xsk_umem_adjust_offset exists as a kernel internal function.
> This patch adds xsk_umem__adjust_offset to libbpf so that it can be used
> from userspace. This will take the responsibility of properly storing the
> offset away from the application, making it less error prone.
>
> Since xsk_umem__adjust_offset is called on a per-packet basis, we need to
> inline the function to avoid any performance regressions.  In order to
> inline xsk_umem__adjust_offset, we need to add it to xsk.h. Unfortunately
> this means that we can't dereference the xsk_umem_config struct directly
> since it is defined only in xsk.c. We therefore add an extra API to return
> the flags field to the user from the structure, and have the inline
> function use this flags field directly.
>

Can you expand this to a series, with an additional patch where these
functions are used in XDP socket sample application, so it's more
clear for users?


Thanks,
Björn

> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> ---
>  tools/lib/bpf/libbpf.map |  1 +
>  tools/lib/bpf/xsk.c      |  5 +++++
>  tools/lib/bpf/xsk.h      | 14 ++++++++++++++
>  3 files changed, 20 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index d04c7cb623ed..760350c9b81c 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -189,4 +189,5 @@ LIBBPF_0.0.4 {
>  LIBBPF_0.0.5 {
>         global:
>                 bpf_btf_get_next_id;
> +               xsk_umem__get_flags;
>  } LIBBPF_0.0.4;
> diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
> index 842c4fd55859..a4250a721ea6 100644
> --- a/tools/lib/bpf/xsk.c
> +++ b/tools/lib/bpf/xsk.c
> @@ -84,6 +84,11 @@ int xsk_socket__fd(const struct xsk_socket *xsk)
>         return xsk ? xsk->fd : -EINVAL;
>  }
>
> +__u32 xsk_umem__get_flags(struct xsk_umem *umem)
> +{
> +       return umem->config.flags;
> +}
> +
>  static bool xsk_page_aligned(void *buffer)
>  {
>         unsigned long addr = (unsigned long)buffer;
> diff --git a/tools/lib/bpf/xsk.h b/tools/lib/bpf/xsk.h
> index 584f6820a639..bf782facb274 100644
> --- a/tools/lib/bpf/xsk.h
> +++ b/tools/lib/bpf/xsk.h
> @@ -183,8 +183,22 @@ static inline __u64 xsk_umem__add_offset_to_addr(__u64 addr)
>         return xsk_umem__extract_addr(addr) + xsk_umem__extract_offset(addr);
>  }
>
> +/* Handle the offset appropriately depending on aligned or unaligned mode.
> + * For unaligned mode, we store the offset in the upper 16-bits of the address.
> + * For aligned mode, we simply add the offset to the address.
> + */
> +static inline __u64 xsk_umem__adjust_offset(__u32 umem_flags, __u64 addr,
> +                                           __u64 offset)
> +{
> +       if (umem_flags & XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> +               return addr + (offset << XSK_UNALIGNED_BUF_OFFSET_SHIFT);
> +       else
> +               return addr + offset;
> +}
> +
>  LIBBPF_API int xsk_umem__fd(const struct xsk_umem *umem);
>  LIBBPF_API int xsk_socket__fd(const struct xsk_socket *xsk);
> +LIBBPF_API __u32 xsk_umem__get_flags(struct xsk_umem *umem);
>
>  #define XSK_RING_CONS__DEFAULT_NUM_DESCS      2048
>  #define XSK_RING_PROD__DEFAULT_NUM_DESCS      2048
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] ixgbe: fix xdp handle calculations
From: Björn Töpel @ 2019-09-13  4:33 UTC (permalink / raw)
  To: Ciara Loftus
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jonathan Lemon,
	Bruce Richardson, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190911172435.21042-2-ciara.loftus@intel.com>

On Wed, 11 Sep 2019 at 19:27, Ciara Loftus <ciara.loftus@intel.com> wrote:
>
> Commit 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations") reintroduced
> the addition of the umem headroom to the xdp handle in the ixgbe_zca_free,
> ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However,
> the headroom is already added to the handle in the function
> ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the
> case where the headroom is non-zero.
>
> Fixes: 7cbbf9f1fa23 ("ixgbe: fix xdp handle calculations")
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index ad802a8909e0..5ed8b5a257cf 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -145,7 +145,7 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
>  {
>         struct xdp_umem *umem = rx_ring->xsk_umem;
>         int err, result = IXGBE_XDP_PASS;
> -       u64 offset = umem->headroom;
> +       u64 offset;

...and same comment as from the i40e patch: Reverse xmas tree, please.


Cheers,
Björn

>         struct bpf_prog *xdp_prog;
>         struct xdp_frame *xdpf;
>         u32 act;
> @@ -153,7 +153,7 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
>         rcu_read_lock();
>         xdp_prog = READ_ONCE(rx_ring->xdp_prog);
>         act = bpf_prog_run_xdp(xdp_prog, xdp);
> -       offset += xdp->data - xdp->data_hard_start;
> +       offset = xdp->data - xdp->data_hard_start;
>
>         xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
>
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH bpf-next 1/3] i40e: fix xdp handle calculations
From: Björn Töpel @ 2019-09-13  4:32 UTC (permalink / raw)
  To: Ciara Loftus
  Cc: Netdev, Alexei Starovoitov, Daniel Borkmann,
	Björn Töpel, Karlsson, Magnus, Jonathan Lemon,
	Bruce Richardson, bpf, intel-wired-lan, Kevin Laatz
In-Reply-To: <20190911172435.21042-1-ciara.loftus@intel.com>

On Wed, 11 Sep 2019 at 19:27, Ciara Loftus <ciara.loftus@intel.com> wrote:
>
> Commit 4c5d9a7fa149 ("i40e: fix xdp handle calculations") reintroduced
> the addition of the umem headroom to the xdp handle in the i40e_zca_free,
> i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However,
> the headroom is already added to the handle in the function i40_run_xdp_zc.
> This commit removes the latter addition and fixes the case where the
> headroom is non-zero.
>
> Fixes: 4c5d9a7fa149 ("i40e: fix xdp handle calculations")
> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 0373bc6c7e61..5f285ba1f1f9 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -192,7 +192,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>  {
>         struct xdp_umem *umem = rx_ring->xsk_umem;
>         int err, result = I40E_XDP_PASS;
> -       u64 offset = umem->headroom;
> +       u64 offset;

Hi Ciara! Thanks for the patch; Small nit: Please sort local variable
declarations from longest to shortest line.


Cheers,
Björn


>         struct i40e_ring *xdp_ring;
>         struct bpf_prog *xdp_prog;
>         u32 act;
> @@ -203,7 +203,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>          */
>         xdp_prog = READ_ONCE(rx_ring->xdp_prog);
>         act = bpf_prog_run_xdp(xdp_prog, xdp);
> -       offset += xdp->data - xdp->data_hard_start;
> +       offset = xdp->data - xdp->data_hard_start;
>
>         xdp->handle = xsk_umem_adjust_offset(umem, xdp->handle, offset);
>
> --
> 2.17.1
>

^ permalink raw reply

* [PATCH] iwlwifi: dbg_ini: fix memory leak in alloc_sgtable
From: Navid Emamdoost @ 2019-09-13  4:23 UTC (permalink / raw)
  Cc: emamd001, smccaman, kjlu, Navid Emamdoost, Johannes Berg,
	Emmanuel Grumbach, Luca Coelho, Intel Linux Wireless, Kalle Valo,
	David S. Miller, Shahar S Matityahu, Sara Sharon, linux-wireless,
	netdev, linux-kernel

In alloc_sgtable if alloc_page fails, the alocated table should be
released.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
---
 drivers/net/wireless/intel/iwlwifi/fw/dbg.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
index 4d81776f576d..db41abb3361d 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/dbg.c
@@ -643,6 +643,7 @@ static struct scatterlist *alloc_sgtable(int size)
 				if (new_page)
 					__free_page(new_page);
 			}
+			kfree(table);
 			return NULL;
 		}
 		alloc_size = min_t(int, size, PAGE_SIZE);
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next] net: dsa: b53: Add support for port_egress_floods callback
From: Florian Fainelli @ 2019-09-13  3:28 UTC (permalink / raw)
  To: netdev
  Cc: b.spranger, Florian Fainelli, Andrew Lunn, Vivien Didelot,
	David S. Miller, open list

Add support for configuring the per-port egress flooding control for
both Unicast and Multicast traffic.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Beneditk,

Do you mind re-testing, or confirming that this patch that I sent much
earlier does work correctly for you? Thanks!

 drivers/net/dsa/b53/b53_common.c | 33 ++++++++++++++++++++++++++++++++
 drivers/net/dsa/b53/b53_priv.h   |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 7d328a5f0161..ac2ec08a652b 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -342,6 +342,13 @@ static void b53_set_forwarding(struct b53_device *dev, int enable)
 	b53_read8(dev, B53_CTRL_PAGE, B53_SWITCH_CTRL, &mgmt);
 	mgmt |= B53_MII_DUMB_FWDG_EN;
 	b53_write8(dev, B53_CTRL_PAGE, B53_SWITCH_CTRL, mgmt);
+
+	/* Look at B53_UC_FWD_EN and B53_MC_FWD_EN to decide whether
+	 * frames should be flooed or not.
+	 */
+	b53_read8(dev, B53_CTRL_PAGE, B53_IP_MULTICAST_CTRL, &mgmt);
+	mgmt |= B53_UC_FWD_EN | B53_MC_FWD_EN;
+	b53_write8(dev, B53_CTRL_PAGE, B53_IP_MULTICAST_CTRL, mgmt);
 }
 
 static void b53_enable_vlan(struct b53_device *dev, bool enable,
@@ -1753,6 +1760,31 @@ void b53_br_fast_age(struct dsa_switch *ds, int port)
 }
 EXPORT_SYMBOL(b53_br_fast_age);
 
+int b53_br_egress_floods(struct dsa_switch *ds, int port,
+			 bool unicast, bool multicast)
+{
+	struct b53_device *dev = ds->priv;
+	u16 uc, mc;
+
+	b53_read16(dev, B53_CTRL_PAGE, B53_UC_FWD_EN, &uc);
+	if (unicast)
+		uc |= BIT(port);
+	else
+		uc &= ~BIT(port);
+	b53_write16(dev, B53_CTRL_PAGE, B53_UC_FWD_EN, uc);
+
+	b53_read16(dev, B53_CTRL_PAGE, B53_MC_FWD_EN, &mc);
+	if (multicast)
+		mc |= BIT(port);
+	else
+		mc &= ~BIT(port);
+	b53_write16(dev, B53_CTRL_PAGE, B53_MC_FWD_EN, mc);
+
+	return 0;
+
+}
+EXPORT_SYMBOL(b53_br_egress_floods);
+
 static bool b53_possible_cpu_port(struct dsa_switch *ds, int port)
 {
 	/* Broadcom switches will accept enabling Broadcom tags on the
@@ -1953,6 +1985,7 @@ static const struct dsa_switch_ops b53_switch_ops = {
 	.port_bridge_leave	= b53_br_leave,
 	.port_stp_state_set	= b53_br_set_stp_state,
 	.port_fast_age		= b53_br_fast_age,
+	.port_egress_floods	= b53_br_egress_floods,
 	.port_vlan_filtering	= b53_vlan_filtering,
 	.port_vlan_prepare	= b53_vlan_prepare,
 	.port_vlan_add		= b53_vlan_add,
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index f25bc80c4ffc..a7dd8acc281b 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -319,6 +319,8 @@ int b53_br_join(struct dsa_switch *ds, int port, struct net_device *bridge);
 void b53_br_leave(struct dsa_switch *ds, int port, struct net_device *bridge);
 void b53_br_set_stp_state(struct dsa_switch *ds, int port, u8 state);
 void b53_br_fast_age(struct dsa_switch *ds, int port);
+int b53_br_egress_floods(struct dsa_switch *ds, int port,
+			 bool unicast, bool multicast);
 void b53_port_event(struct dsa_switch *ds, int port);
 void b53_phylink_validate(struct dsa_switch *ds, int port,
 			  unsigned long *supported,
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
From: Yonghong Song @ 2019-09-13  2:56 UTC (permalink / raw)
  To: carlos antonio neira bustos
  Cc: Eric Biederman, Al Viro, netdev@vger.kernel.org,
	brouer@redhat.com, bpf@vger.kernel.org
In-Reply-To: <CACiB22j9M2gmccnh7XqqFp8g7qKFuiOrSAVJiA2tQHLB0pmoSQ@mail.gmail.com>



On 9/12/19 3:03 PM, carlos antonio neira bustos wrote:
> Yonghong,
> 
> I think bpf_get_ns_current_pid_tgid interface is a lot better than the 
> one proposed in my patch, how are we going to move forward? Should I 
> take these changes and refactor the selftests to use this new interface 
> and submit version 12 or as the interface changed completely is a new 
> set of patches?.

This is to solve the same problem. You can just submit as version 12.
This way, we preserves discussion history clearly.

> 
> Eric,
> Thank you very much for explaining the problem and your help to move 
> forward with this new helper.
> 
> 
> Bests

^ permalink raw reply

* [PATCH net] udp: correct reuseport selection with connected sockets
From: Willem de Bruijn @ 2019-09-13  1:16 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kraig, zabele, pabeni, mark.keaton,
	Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

UDP reuseport groups can hold a mix unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.

Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.

Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.

New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>

---

I was unable to compile some older kernels, so the Fixes tag is based
on basic analysis, not bisected to by the regression test.
---
 include/net/sock_reuseport.h | 20 +++++++++++++++++++-
 net/core/sock_reuseport.c    | 15 +++++++++++++--
 net/ipv4/datagram.c          |  2 ++
 net/ipv4/udp.c               |  5 +++--
 net/ipv6/datagram.c          |  2 ++
 net/ipv6/udp.c               |  5 +++--
 6 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
index d9112de85261..43f4a818d88f 100644
--- a/include/net/sock_reuseport.h
+++ b/include/net/sock_reuseport.h
@@ -21,7 +21,8 @@ struct sock_reuseport {
 	unsigned int		synq_overflow_ts;
 	/* ID stays the same even after the size of socks[] grows. */
 	unsigned int		reuseport_id;
-	bool			bind_inany;
+	unsigned int		bind_inany:1;
+	unsigned int		has_conns:1;
 	struct bpf_prog __rcu	*prog;		/* optional BPF sock selector */
 	struct sock		*socks[0];	/* array of sock pointers */
 };
@@ -37,6 +38,23 @@ extern struct sock *reuseport_select_sock(struct sock *sk,
 extern int reuseport_attach_prog(struct sock *sk, struct bpf_prog *prog);
 extern int reuseport_detach_prog(struct sock *sk);
 
+static inline bool reuseport_has_conns(struct sock *sk, bool set)
+{
+	struct sock_reuseport *reuse;
+	bool ret = false;
+
+	rcu_read_lock();
+	reuse = rcu_dereference(sk->sk_reuseport_cb);
+	if (reuse) {
+		if (set)
+			reuse->has_conns = 1;
+		ret = reuse->has_conns;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
 int reuseport_get_id(struct sock_reuseport *reuse);
 
 #endif  /* _SOCK_REUSEPORT_H */
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index 9408f9264d05..f3ceec93f392 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -295,8 +295,19 @@ struct sock *reuseport_select_sock(struct sock *sk,
 
 select_by_hash:
 		/* no bpf or invalid bpf result: fall back to hash usage */
-		if (!sk2)
-			sk2 = reuse->socks[reciprocal_scale(hash, socks)];
+		if (!sk2) {
+			int i, j;
+
+			i = j = reciprocal_scale(hash, socks);
+			while (reuse->socks[i]->sk_state == TCP_ESTABLISHED) {
+				i++;
+				if (i >= reuse->num_socks)
+					i = 0;
+				if (i == j)
+					goto out;
+			}
+			sk2 = reuse->socks[i];
+		}
 	}
 
 out:
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 7bd29e694603..9a0fe0c2fa02 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -15,6 +15,7 @@
 #include <net/sock.h>
 #include <net/route.h>
 #include <net/tcp_states.h>
+#include <net/sock_reuseport.h>
 
 int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 {
@@ -69,6 +70,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	}
 	inet->inet_daddr = fl4->daddr;
 	inet->inet_dport = usin->sin_port;
+	reuseport_has_conns(sk, true);
 	sk->sk_state = TCP_ESTABLISHED;
 	sk_set_txhash(sk);
 	inet->inet_id = jiffies;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d88821c794fb..16486c8b708b 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -423,12 +423,13 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport) {
+			if (sk->sk_reuseport &&
+			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp_ehashfn(net, daddr, hnum,
 						   saddr, sport);
 				result = reuseport_select_sock(sk, hash, skb,
 							sizeof(struct udphdr));
-				if (result)
+				if (result && !reuseport_has_conns(sk, false))
 					return result;
 			}
 			badness = score;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 9ab897ded4df..96f939248d2f 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -27,6 +27,7 @@
 #include <net/ip6_route.h>
 #include <net/tcp_states.h>
 #include <net/dsfield.h>
+#include <net/sock_reuseport.h>
 
 #include <linux/errqueue.h>
 #include <linux/uaccess.h>
@@ -254,6 +255,7 @@ int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr,
 		goto out;
 	}
 
+	reuseport_has_conns(sk, true);
 	sk->sk_state = TCP_ESTABLISHED;
 	sk_set_txhash(sk);
 out:
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 827fe7385078..5995fdc99d3f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -158,13 +158,14 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport) {
+			if (sk->sk_reuseport &&
+			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp6_ehashfn(net, daddr, hnum,
 						    saddr, sport);
 
 				result = reuseport_select_sock(sk, hash, skb,
 							sizeof(struct udphdr));
-				if (result)
+				if (result && !reuseport_has_conns(sk, false))
 					return result;
 			}
 			result = sk;
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply related

* Re: [PATCH net] net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'
From: santosh.shilimkar @ 2019-09-13  1:10 UTC (permalink / raw)
  To: Gerd Rausch, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <914b48be-2373-5b38-83f5-e0d917dd139d@oracle.com>

On 9/12/19 1:49 PM, Gerd Rausch wrote:
> All entries in 'rds_ib_stat_names' are stringified versions
> of the corresponding "struct rds_ib_statistics" element
> without the "s_"-prefix.
> 
> Fix entry 'ib_evt_handler_call' to do the same.
> 
> Fixes: f4f943c958a2 ("RDS: IB: ack more receive completions to improve performance")
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
Thanks Gerd !!

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* [PATCH net] net/rds: Fix 'ib_evt_handler_call' element in 'rds_ib_stat_names'
From: Gerd Rausch @ 2019-09-12 20:49 UTC (permalink / raw)
  To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller

All entries in 'rds_ib_stat_names' are stringified versions
of the corresponding "struct rds_ib_statistics" element
without the "s_"-prefix.

Fix entry 'ib_evt_handler_call' to do the same.

Fixes: f4f943c958a2 ("RDS: IB: ack more receive completions to improve performance")
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
---
 net/rds/ib_stats.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rds/ib_stats.c b/net/rds/ib_stats.c
index 9252ad126335..ac46d8961b61 100644
--- a/net/rds/ib_stats.c
+++ b/net/rds/ib_stats.c
@@ -42,7 +42,7 @@ DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);
 static const char *const rds_ib_stat_names[] = {
 	"ib_connect_raced",
 	"ib_listen_closed_stale",
-	"s_ib_evt_handler_call",
+	"ib_evt_handler_call",
 	"ib_tasklet_call",
 	"ib_tx_cq_event",
 	"ib_tx_ring_full",
-- 
2.22.1


^ permalink raw reply related

* Re: [PATCH net-next 5/5] sctp: add spt_pathcpthld in struct sctp_paddrthlds
From: Marcelo Ricardo Leitner @ 2019-09-12 22:51 UTC (permalink / raw)
  To: Xin Long
  Cc: David Laight, network dev, linux-sctp@vger.kernel.org,
	Neil Horman, davem@davemloft.net
In-Reply-To: <CADvbK_e=4Fo7dmM=4QTZHtNDtsrDVe_VtyG2NVqt_3r9z7R=PA@mail.gmail.com>

On Thu, Sep 12, 2019 at 01:47:08AM +0800, Xin Long wrote:
> On Wed, Sep 11, 2019 at 8:56 PM Marcelo Ricardo Leitner
> <marcelo.leitner@gmail.com> wrote:
> >
> > On Wed, Sep 11, 2019 at 05:38:33PM +0800, Xin Long wrote:
> > > On Wed, Sep 11, 2019 at 5:21 PM Xin Long <lucien.xin@gmail.com> wrote:
> > > >
> > > > On Wed, Sep 11, 2019 at 5:03 PM David Laight <David.Laight@aculab.com> wrote:
> > > > >
> > > > > From: Xin Long [mailto:lucien.xin@gmail.com]
> > > > > > Sent: 11 September 2019 09:52
> > > > > > On Tue, Sep 10, 2019 at 9:19 PM David Laight <David.Laight@aculab.com> wrote:
> > > > > > >
> > > > > > > From: Xin Long
> > > > > > > > Sent: 09 September 2019 08:57
> > > > > > > > Section 7.2 of rfc7829: "Peer Address Thresholds (SCTP_PEER_ADDR_THLDS)
> > > > > > > > Socket Option" extends 'struct sctp_paddrthlds' with 'spt_pathcpthld'
> > > > > > > > added to allow a user to change ps_retrans per sock/asoc/transport, as
> > > > > > > > other 2 paddrthlds: pf_retrans, pathmaxrxt.
> > > > > > > >
> > > > > > > > Note that ps_retrans is not allowed to be greater than pf_retrans.
> > > > > > > >
> > > > > > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > > > > > > > ---
> > > > > > > >  include/uapi/linux/sctp.h |  1 +
> > > > > > > >  net/sctp/socket.c         | 10 ++++++++++
> > > > > > > >  2 files changed, 11 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
> > > > > > > > index a15cc28..dfd81e1 100644
> > > > > > > > --- a/include/uapi/linux/sctp.h
> > > > > > > > +++ b/include/uapi/linux/sctp.h
> > > > > > > > @@ -1069,6 +1069,7 @@ struct sctp_paddrthlds {
> > > > > > > >       struct sockaddr_storage spt_address;
> > > > > > > >       __u16 spt_pathmaxrxt;
> > > > > > > >       __u16 spt_pathpfthld;
> > > > > > > > +     __u16 spt_pathcpthld;
> > > > > > > >  };
> > > > > > > >
> > > > > > > >  /*
> > > > > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > > > > > index 5e2098b..5b9774d 100644
> > > > > > > > --- a/net/sctp/socket.c
> > > > > > > > +++ b/net/sctp/socket.c
> > > > > > > > @@ -3954,6 +3954,9 @@ static int sctp_setsockopt_paddr_thresholds(struct sock *sk,
> > > > > > >
> > > > > > > This code does:
> > > > > > >         if (optlen < sizeof(struct sctp_paddrthlds))
> > > > > > >                 return -EINVAL;
> > > > > > here will become:
> > > > > >
> > > > > >         if (optlen >= sizeof(struct sctp_paddrthlds)) {
> > > > > >                 optlen = sizeof(struct sctp_paddrthlds);
> > > > > >         } else if (optlen >= ALIGN(offsetof(struct sctp_paddrthlds,
> > > > > >                                             spt_pathcpthld), 4))
> > > > > >                 optlen = ALIGN(offsetof(struct sctp_paddrthlds,
> > > > > >                                         spt_pathcpthld), 4);
> > > > > >                 val.spt_pathcpthld = 0xffff;
> > > > > >         else {
> > > > > >                 return -EINVAL;
> > > > > >         }
> > > > >
> > > > > Hmmm...
> > > > > If the kernel has to default 'val.spt_pathcpthld = 0xffff'
> > > > > then recompiling an existing application with the new uapi
> > > > > header is going to lead to very unexpected behaviour.
> > > > >
> > > > > The best you can hope for is that the application memset the
> > > > > structure to zero.
> > > > > But more likely it is 'random' on-stack data.
> > > > 0xffff is a value to disable the feature 'Primary Path Switchover'.
> > > > you're right that user might set it to zero unexpectly with their
> > > > old application rebuilt.
> > > >
> > > > A safer way is to introduce "sysctl net.sctp.ps_retrans", it won't
> > > > matter if users set spt_pathcpthld properly when they're not aware
> > > > of this feature, like "sysctl net.sctp.pF_retrans". Looks better?
> > > Sorry for confusing,  "sysctl net.sctp.ps_retrans" is already there
> > > (its value is 0xffff by default),
> > > we just need to do this in sctp_setsockopt_paddr_thresholds():
> > >
> > >         if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
> > >                            optlen))
> > >                 return -EFAULT;
> > >
> > >         if (sock_net(sk)->sctp.ps_retrans == 0xffff)
> > >                 val.spt_pathcpthld = 0xffff;
> >
> > I'm confused with the snippets, but if I got them right, this is after
> > dealing with proper len and could leave val.spt_pathcpthld
> > uninitialized if the application used the old format and sysctl is !=
> > 0xffff.
> right, how about this in sctp_setsockopt_paddr_thresholds():
> 
>         offset = ALIGN(offsetof(struct sctp_paddrthlds, spt_pathcpthld), 4);
>         if (optlen < offset)
>                 return -EINVAL;
>         if (optlen < sizeof(val) || sock_net(sk)->sctp.ps_retrans == 0xffff) {
>                 optlen = offset;
>                 val.spt_pathcpthld = 0xffff;
>         } else {
>                 optlen = sizeof(val);
>         }
> 
>         if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval,
>                            optlen))
>                 return -EFAULT;
> 
>         if (val.spt_pathpfthld > val.spt_pathcpthld)
>                 return -EINVAL;
> 
> Which means we will 'skip' spt_pathcpthld if (it's using old format) or
> (ps_retrans is disabled and it's using new format).

This will be inconsistent if we end up having to add another field
after spt_pathcpthld, because it will get ignored if ps_retrans ==
0xffff.  Lets not optimize out the fields please. This is already very
tricky to get right.

> Note that  ps_retrans < pf_retrans is not allowed in rfc7829.
> 
> and in sctp_getsockopt_paddr_thresholds():
> 
>         offset = ALIGN(offsetof(struct sctp_paddrthlds, spt_pathcpthld), 4);
>         if (len < offset)
>                 return -EINVAL;
>         if (len < sizeof(val) || sock_net(sk)->sctp.ps_retrans == 0xffff)
>                 len = offset;
>         else
>                 len = sizeof(val);

Here it is more visible. If net->...ps_retrans is disabled, remaining
fields (currently just this one, but as we are extending it now, we
have to think about the possibility of more as well) will be ignored,
we and we may not want that.

> 
>         if (copy_from_user(&val, (struct sctp_paddrthlds __user *)optval, len))
>                 return -EFAULT;
> 
> 
> >
> > >
> > >         if (val.spt_pathpfthld > val.spt_pathcpthld)
> > >                 return -EINVAL;
> > >
> > > >
> > > > >
> > > > >         David
> > > > >
> > > > > -
> > > > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > > > Registration No: 1397386 (Wales)
> > >
> 

^ permalink raw reply

* Re: [net] ixgbevf: Fix secpath usage for IPsec Tx offload
From: Shannon Nelson @ 2019-09-12 22:26 UTC (permalink / raw)
  To: Jeff Kirsher, netdev; +Cc: Jonathan Tooker
In-Reply-To: <20190912190734.10560-1-jeffrey.t.kirsher@intel.com>

On 9/12/19 8:07 PM, Jeff Kirsher wrote:
> Port the same fix for ixgbe to ixgbevf.
>
> The ixgbevf driver currently does IPsec Tx offloading
> based on an existing secpath. However, the secpath
> can also come from the Rx side, in this case it is
> misinterpreted for Tx offload and the packets are
> dropped with a "bad sa_idx" error. Fix this by using
> the xfrm_offload() function to test for Tx offload.
>
> CC: Shannon Nelson <snelson@pensando.io>
> Fixes: 7f68d4306701 ("ixgbevf: enable VF IPsec offload operations")
> Reported-by: Jonathan Tooker <jonathan@reliablehosting.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>   drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index d2b41f9f87f8..72872d6ca80c 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -30,6 +30,7 @@
>   #include <linux/bpf.h>
>   #include <linux/bpf_trace.h>
>   #include <linux/atomic.h>
> +#include <net/xfrm.h>
>   
>   #include "ixgbevf.h"
>   
> @@ -4161,7 +4162,7 @@ static int ixgbevf_xmit_frame_ring(struct sk_buff *skb,
>   	first->protocol = vlan_get_protocol(skb);
>   
>   #ifdef CONFIG_IXGBEVF_IPSEC
> -	if (secpath_exists(skb) && !ixgbevf_ipsec_tx(tx_ring, first, &ipsec_tx))
> +	if (xfrm_offload(skb) && !ixgbevf_ipsec_tx(tx_ring, first, &ipsec_tx))
>   		goto out_drop;
>   #endif
>   	tso = ixgbevf_tso(tx_ring, first, &hdr_len, &ipsec_tx);

Acked-by: Shannon Nelson <snelson@pensando.io>


^ permalink raw reply

* [PATCH V1 net-next 11/11] net: ena: fix incorrect update of intr_delay_resolution
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

ena_dev->intr_moder_rx/tx_interval save the intervals received from the
user after dividing them by ena_dev->intr_delay_resolution. Therefore
when intr_delay_resolution changes, the code needs to first mutiply
intr_moder_rx/tx_interval by the previous intr_delay_resolution to get
the value originally given by the user, and only then divide it by the
new intr_delay_resolution.

Current code does not first multiply intr_moder_rx/tx_interval by the old
intr_delay_resolution. This commit fixes it.

Also initialize ena_dev->intr_delay_resolution to be 1.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_com.c    | 22 +++++++++++++++-----
 drivers/net/ethernet/amazon/ena/ena_com.h    |  1 +
 drivers/net/ethernet/amazon/ena/ena_netdev.c |  1 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index c3f4846aa845..ea62604fdf8c 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -1281,18 +1281,30 @@ static int ena_com_ind_tbl_convert_from_device(struct ena_com_dev *ena_dev)
 static void ena_com_update_intr_delay_resolution(struct ena_com_dev *ena_dev,
 						 u16 intr_delay_resolution)
 {
+	/* Initial value of intr_delay_resolution might be 0 */
+	u16 prev_intr_delay_resolution =
+		ena_dev->intr_delay_resolution ?
+		ena_dev->intr_delay_resolution :
+		ENA_DEFAULT_INTR_DELAY_RESOLUTION;
+
 	if (!intr_delay_resolution) {
 		pr_err("Illegal intr_delay_resolution provided. Going to use default 1 usec resolution\n");
-		intr_delay_resolution = 1;
+		intr_delay_resolution = ENA_DEFAULT_INTR_DELAY_RESOLUTION;
 	}
-	ena_dev->intr_delay_resolution = intr_delay_resolution;
 
 	/* update Rx */
-	ena_dev->intr_moder_rx_interval /= intr_delay_resolution;
-
+	ena_dev->intr_moder_rx_interval =
+		ena_dev->intr_moder_rx_interval *
+		prev_intr_delay_resolution /
+		intr_delay_resolution;
 
 	/* update Tx */
-	ena_dev->intr_moder_tx_interval /= intr_delay_resolution;
+	ena_dev->intr_moder_tx_interval =
+		ena_dev->intr_moder_tx_interval *
+		prev_intr_delay_resolution /
+		intr_delay_resolution;
+
+	ena_dev->intr_delay_resolution = intr_delay_resolution;
 }
 
 /*****************************************************************************/
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h b/drivers/net/ethernet/amazon/ena/ena_com.h
index ddc2a8c50333..7c941eba0bc9 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_com.h
@@ -74,6 +74,7 @@
 
 #define ENA_INTR_INITIAL_TX_INTERVAL_USECS		196
 #define ENA_INTR_INITIAL_RX_INTERVAL_USECS		0
+#define ENA_DEFAULT_INTR_DELAY_RESOLUTION		1
 
 #define ENA_HW_HINTS_NO_TIMEOUT				0xFFFF
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 617f51890449..25f70a59d42f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3501,6 +3501,7 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	*/
 	ena_dev->intr_moder_tx_interval = ENA_INTR_INITIAL_TX_INTERVAL_USECS;
 	ena_dev->intr_moder_rx_interval = ENA_INTR_INITIAL_RX_INTERVAL_USECS;
+	ena_dev->intr_delay_resolution = ENA_DEFAULT_INTR_DELAY_RESOLUTION;
 	io_queue_num = ena_calc_io_queue_num(pdev, ena_dev, &get_feat_ctx);
 	rc = ena_calc_queue_size(&calc_queue_ctx);
 	if (rc || io_queue_num <= 0) {
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 10/11] net: ena: fix retrieval of nonadaptive interrupt moderation intervals
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Nonadaptive interrupt moderation intervals are assigned the value set
by the user in ethtool -C divided by ena_dev->intr_delay_resolution.

Therefore when the user tries to get the nonadaptive interrupt moderation
intervals with ethtool -c the code needs to multiply the saved value
by ena_dev->intr_delay_resolution.

The current code erroneously divides instead of multiplying in ethtool -c.
This patch fixes this.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index 0f90e2296630..16553d92fad2 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -310,14 +310,15 @@ static int ena_get_coalesce(struct net_device *net_dev,
 		/* the devie doesn't support interrupt moderation */
 		return -EOPNOTSUPP;
 	}
+
 	coalesce->tx_coalesce_usecs =
-		ena_com_get_nonadaptive_moderation_interval_tx(ena_dev) /
+		ena_com_get_nonadaptive_moderation_interval_tx(ena_dev) *
 			ena_dev->intr_delay_resolution;
 
 	if (!ena_com_get_adaptive_moderation_enabled(ena_dev))
 		coalesce->rx_coalesce_usecs =
 			ena_com_get_nonadaptive_moderation_interval_rx(ena_dev)
-			/ ena_dev->intr_delay_resolution;
+			* ena_dev->intr_delay_resolution;
 
 	coalesce->use_adaptive_rx_coalesce =
 		ena_com_get_adaptive_moderation_enabled(ena_dev);
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 09/11] net: ena: fix update of interrupt moderation register
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Current implementation always updates the interrupt register with
the smoothed_interval of the rx_ring. However this should be
done only in case of adaptive interrupt moderation. If non-adaptive
interrupt moderation is used, the non-adaptive interrupt moderation
interval should be used. This commit fixes that.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 8b9f8b90e525..617f51890449 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1187,12 +1187,16 @@ static void ena_unmask_interrupt(struct ena_ring *tx_ring,
 					struct ena_ring *rx_ring)
 {
 	struct ena_eth_io_intr_reg intr_reg;
+	u32 rx_interval = ena_com_get_adaptive_moderation_enabled(rx_ring->ena_dev) ?
+		rx_ring->smoothed_interval :
+		ena_com_get_nonadaptive_moderation_interval_rx(rx_ring->ena_dev);
+
 
 	/* Update intr register: rx intr delay,
 	 * tx intr delay and interrupt unmask
 	 */
 	ena_com_update_intr_reg(&intr_reg,
-				rx_ring->smoothed_interval,
+				rx_interval,
 				tx_ring->smoothed_interval,
 				true);
 
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 08/11] net: ena: remove all old adaptive rx interrupt moderation code from ena_com
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Remove previous implementation of adaptive rx interrupt moderation
from ena_com files.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 110 -----------------
 drivers/net/ethernet/amazon/ena/ena_com.h | 142 ----------------------
 2 files changed, 252 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index ea859ecf664d..c3f4846aa845 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -1278,22 +1278,6 @@ static int ena_com_ind_tbl_convert_from_device(struct ena_com_dev *ena_dev)
 	return 0;
 }
 
-static int ena_com_init_interrupt_moderation_table(struct ena_com_dev *ena_dev)
-{
-	size_t size;
-
-	size = sizeof(struct ena_intr_moder_entry) * ENA_INTR_MAX_NUM_OF_LEVELS;
-
-	ena_dev->intr_moder_tbl =
-		devm_kzalloc(ena_dev->dmadev, size, GFP_KERNEL);
-	if (!ena_dev->intr_moder_tbl)
-		return -ENOMEM;
-
-	ena_com_config_default_interrupt_moderation_table(ena_dev);
-
-	return 0;
-}
-
 static void ena_com_update_intr_delay_resolution(struct ena_com_dev *ena_dev,
 						 u16 intr_delay_resolution)
 {
@@ -2803,13 +2787,6 @@ int ena_com_update_nonadaptive_moderation_interval_rx(struct ena_com_dev *ena_de
 							      &ena_dev->intr_moder_rx_interval);
 }
 
-void ena_com_destroy_interrupt_moderation(struct ena_com_dev *ena_dev)
-{
-	if (ena_dev->intr_moder_tbl)
-		devm_kfree(ena_dev->dmadev, ena_dev->intr_moder_tbl);
-	ena_dev->intr_moder_tbl = NULL;
-}
-
 int ena_com_init_interrupt_moderation(struct ena_com_dev *ena_dev)
 {
 	struct ena_admin_get_feat_resp get_resp;
@@ -2834,10 +2811,6 @@ int ena_com_init_interrupt_moderation(struct ena_com_dev *ena_dev)
 		return rc;
 	}
 
-	rc = ena_com_init_interrupt_moderation_table(ena_dev);
-	if (rc)
-		goto err;
-
 	/* if moderation is supported by device we set adaptive moderation */
 	delay_resolution = get_resp.u.intr_moderation.intr_delay_resolution;
 	ena_com_update_intr_delay_resolution(ena_dev, delay_resolution);
@@ -2846,52 +2819,6 @@ int ena_com_init_interrupt_moderation(struct ena_com_dev *ena_dev)
 	ena_com_disable_adaptive_moderation(ena_dev);
 
 	return 0;
-err:
-	ena_com_destroy_interrupt_moderation(ena_dev);
-	return rc;
-}
-
-void ena_com_config_default_interrupt_moderation_table(struct ena_com_dev *ena_dev)
-{
-	struct ena_intr_moder_entry *intr_moder_tbl = ena_dev->intr_moder_tbl;
-
-	if (!intr_moder_tbl)
-		return;
-
-	intr_moder_tbl[ENA_INTR_MODER_LOWEST].intr_moder_interval =
-		ENA_INTR_LOWEST_USECS;
-	intr_moder_tbl[ENA_INTR_MODER_LOWEST].pkts_per_interval =
-		ENA_INTR_LOWEST_PKTS;
-	intr_moder_tbl[ENA_INTR_MODER_LOWEST].bytes_per_interval =
-		ENA_INTR_LOWEST_BYTES;
-
-	intr_moder_tbl[ENA_INTR_MODER_LOW].intr_moder_interval =
-		ENA_INTR_LOW_USECS;
-	intr_moder_tbl[ENA_INTR_MODER_LOW].pkts_per_interval =
-		ENA_INTR_LOW_PKTS;
-	intr_moder_tbl[ENA_INTR_MODER_LOW].bytes_per_interval =
-		ENA_INTR_LOW_BYTES;
-
-	intr_moder_tbl[ENA_INTR_MODER_MID].intr_moder_interval =
-		ENA_INTR_MID_USECS;
-	intr_moder_tbl[ENA_INTR_MODER_MID].pkts_per_interval =
-		ENA_INTR_MID_PKTS;
-	intr_moder_tbl[ENA_INTR_MODER_MID].bytes_per_interval =
-		ENA_INTR_MID_BYTES;
-
-	intr_moder_tbl[ENA_INTR_MODER_HIGH].intr_moder_interval =
-		ENA_INTR_HIGH_USECS;
-	intr_moder_tbl[ENA_INTR_MODER_HIGH].pkts_per_interval =
-		ENA_INTR_HIGH_PKTS;
-	intr_moder_tbl[ENA_INTR_MODER_HIGH].bytes_per_interval =
-		ENA_INTR_HIGH_BYTES;
-
-	intr_moder_tbl[ENA_INTR_MODER_HIGHEST].intr_moder_interval =
-		ENA_INTR_HIGHEST_USECS;
-	intr_moder_tbl[ENA_INTR_MODER_HIGHEST].pkts_per_interval =
-		ENA_INTR_HIGHEST_PKTS;
-	intr_moder_tbl[ENA_INTR_MODER_HIGHEST].bytes_per_interval =
-		ENA_INTR_HIGHEST_BYTES;
 }
 
 unsigned int ena_com_get_nonadaptive_moderation_interval_tx(struct ena_com_dev *ena_dev)
@@ -2904,43 +2831,6 @@ unsigned int ena_com_get_nonadaptive_moderation_interval_rx(struct ena_com_dev *
 	return ena_dev->intr_moder_rx_interval;
 }
 
-void ena_com_init_intr_moderation_entry(struct ena_com_dev *ena_dev,
-					enum ena_intr_moder_level level,
-					struct ena_intr_moder_entry *entry)
-{
-	struct ena_intr_moder_entry *intr_moder_tbl = ena_dev->intr_moder_tbl;
-
-	if (level >= ENA_INTR_MAX_NUM_OF_LEVELS)
-		return;
-
-	intr_moder_tbl[level].intr_moder_interval = entry->intr_moder_interval;
-	if (ena_dev->intr_delay_resolution)
-		intr_moder_tbl[level].intr_moder_interval /=
-			ena_dev->intr_delay_resolution;
-	intr_moder_tbl[level].pkts_per_interval = entry->pkts_per_interval;
-
-	/* use hardcoded value until ethtool supports bytecount parameter */
-	if (entry->bytes_per_interval != ENA_INTR_BYTE_COUNT_NOT_SUPPORTED)
-		intr_moder_tbl[level].bytes_per_interval = entry->bytes_per_interval;
-}
-
-void ena_com_get_intr_moderation_entry(struct ena_com_dev *ena_dev,
-				       enum ena_intr_moder_level level,
-				       struct ena_intr_moder_entry *entry)
-{
-	struct ena_intr_moder_entry *intr_moder_tbl = ena_dev->intr_moder_tbl;
-
-	if (level >= ENA_INTR_MAX_NUM_OF_LEVELS)
-		return;
-
-	entry->intr_moder_interval = intr_moder_tbl[level].intr_moder_interval;
-	if (ena_dev->intr_delay_resolution)
-		entry->intr_moder_interval *= ena_dev->intr_delay_resolution;
-	entry->pkts_per_interval =
-	intr_moder_tbl[level].pkts_per_interval;
-	entry->bytes_per_interval = intr_moder_tbl[level].bytes_per_interval;
-}
-
 int ena_com_config_dev_mode(struct ena_com_dev *ena_dev,
 			    struct ena_admin_feature_llq_desc *llq_features,
 			    struct ena_llq_configurations *llq_default_cfg)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h b/drivers/net/ethernet/amazon/ena/ena_com.h
index baeefc6af4f3..ddc2a8c50333 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_com.h
@@ -72,46 +72,13 @@
 /*****************************************************************************/
 /* ENA adaptive interrupt moderation settings */
 
-#define ENA_INTR_LOWEST_USECS           (0)
-#define ENA_INTR_LOWEST_PKTS            (3)
-#define ENA_INTR_LOWEST_BYTES           (2 * 1524)
-
-#define ENA_INTR_LOW_USECS              (32)
-#define ENA_INTR_LOW_PKTS               (12)
-#define ENA_INTR_LOW_BYTES              (16 * 1024)
-
-#define ENA_INTR_MID_USECS              (80)
-#define ENA_INTR_MID_PKTS               (48)
-#define ENA_INTR_MID_BYTES              (64 * 1024)
-
-#define ENA_INTR_HIGH_USECS             (128)
-#define ENA_INTR_HIGH_PKTS              (96)
-#define ENA_INTR_HIGH_BYTES             (128 * 1024)
-
-#define ENA_INTR_HIGHEST_USECS          (192)
-#define ENA_INTR_HIGHEST_PKTS           (128)
-#define ENA_INTR_HIGHEST_BYTES          (192 * 1024)
-
 #define ENA_INTR_INITIAL_TX_INTERVAL_USECS		196
 #define ENA_INTR_INITIAL_RX_INTERVAL_USECS		0
-#define ENA_INTR_DELAY_OLD_VALUE_WEIGHT			6
-#define ENA_INTR_DELAY_NEW_VALUE_WEIGHT			4
-#define ENA_INTR_MODER_LEVEL_STRIDE			2
-#define ENA_INTR_BYTE_COUNT_NOT_SUPPORTED		0xFFFFFF
 
 #define ENA_HW_HINTS_NO_TIMEOUT				0xFFFF
 
 #define ENA_FEATURE_MAX_QUEUE_EXT_VER	1
 
-enum ena_intr_moder_level {
-	ENA_INTR_MODER_LOWEST = 0,
-	ENA_INTR_MODER_LOW,
-	ENA_INTR_MODER_MID,
-	ENA_INTR_MODER_HIGH,
-	ENA_INTR_MODER_HIGHEST,
-	ENA_INTR_MAX_NUM_OF_LEVELS,
-};
-
 struct ena_llq_configurations {
 	enum ena_admin_llq_header_location llq_header_location;
 	enum ena_admin_llq_ring_entry_size llq_ring_entry_size;
@@ -120,12 +87,6 @@ struct ena_llq_configurations {
 	u16 llq_ring_entry_size_value;
 };
 
-struct ena_intr_moder_entry {
-	unsigned int intr_moder_interval;
-	unsigned int pkts_per_interval;
-	unsigned int bytes_per_interval;
-};
-
 enum queue_direction {
 	ENA_COM_IO_QUEUE_DIRECTION_TX,
 	ENA_COM_IO_QUEUE_DIRECTION_RX
@@ -920,11 +881,6 @@ int ena_com_execute_admin_command(struct ena_com_admin_queue *admin_queue,
  */
 int ena_com_init_interrupt_moderation(struct ena_com_dev *ena_dev);
 
-/* ena_com_destroy_interrupt_moderation - Destroy interrupt moderation resources
- * @ena_dev: ENA communication layer struct
- */
-void ena_com_destroy_interrupt_moderation(struct ena_com_dev *ena_dev);
-
 /* ena_com_interrupt_moderation_supported - Return if interrupt moderation
  * capability is supported by the device.
  *
@@ -932,12 +888,6 @@ void ena_com_destroy_interrupt_moderation(struct ena_com_dev *ena_dev);
  */
 bool ena_com_interrupt_moderation_supported(struct ena_com_dev *ena_dev);
 
-/* ena_com_config_default_interrupt_moderation_table - Restore the interrupt
- * moderation table back to the default parameters.
- * @ena_dev: ENA communication layer struct
- */
-void ena_com_config_default_interrupt_moderation_table(struct ena_com_dev *ena_dev);
-
 /* ena_com_update_nonadaptive_moderation_interval_tx - Update the
  * non-adaptive interval in Tx direction.
  * @ena_dev: ENA communication layer struct
@@ -974,29 +924,6 @@ unsigned int ena_com_get_nonadaptive_moderation_interval_tx(struct ena_com_dev *
  */
 unsigned int ena_com_get_nonadaptive_moderation_interval_rx(struct ena_com_dev *ena_dev);
 
-/* ena_com_init_intr_moderation_entry - Update a single entry in the interrupt
- * moderation table.
- * @ena_dev: ENA communication layer struct
- * @level: Interrupt moderation table level
- * @entry: Entry value
- *
- * Update a single entry in the interrupt moderation table.
- */
-void ena_com_init_intr_moderation_entry(struct ena_com_dev *ena_dev,
-					enum ena_intr_moder_level level,
-					struct ena_intr_moder_entry *entry);
-
-/* ena_com_get_intr_moderation_entry - Init ena_intr_moder_entry.
- * @ena_dev: ENA communication layer struct
- * @level: Interrupt moderation table level
- * @entry: Entry to fill.
- *
- * Initialize the entry according to the adaptive interrupt moderation table.
- */
-void ena_com_get_intr_moderation_entry(struct ena_com_dev *ena_dev,
-				       enum ena_intr_moder_level level,
-				       struct ena_intr_moder_entry *entry);
-
 /* ena_com_config_dev_mode - Configure the placement policy of the device.
  * @ena_dev: ENA communication layer struct
  * @llq_features: LLQ feature descriptor, retrieve via
@@ -1022,75 +949,6 @@ static inline void ena_com_disable_adaptive_moderation(struct ena_com_dev *ena_d
 	ena_dev->adaptive_coalescing = false;
 }
 
-/* ena_com_calculate_interrupt_delay - Calculate new interrupt delay
- * @ena_dev: ENA communication layer struct
- * @pkts: Number of packets since the last update
- * @bytes: Number of bytes received since the last update.
- * @smoothed_interval: Returned interval
- * @moder_tbl_idx: Current table level as input update new level as return
- * value.
- */
-static inline void ena_com_calculate_interrupt_delay(struct ena_com_dev *ena_dev,
-						     unsigned int pkts,
-						     unsigned int bytes,
-						     unsigned int *smoothed_interval,
-						     unsigned int *moder_tbl_idx)
-{
-	enum ena_intr_moder_level curr_moder_idx, new_moder_idx;
-	struct ena_intr_moder_entry *curr_moder_entry;
-	struct ena_intr_moder_entry *pred_moder_entry;
-	struct ena_intr_moder_entry *new_moder_entry;
-	struct ena_intr_moder_entry *intr_moder_tbl = ena_dev->intr_moder_tbl;
-	unsigned int interval;
-
-	/* We apply adaptive moderation on Rx path only.
-	 * Tx uses static interrupt moderation.
-	 */
-	if (!pkts || !bytes)
-		/* Tx interrupt, or spurious interrupt,
-		 * in both cases we just use same delay values
-		 */
-		return;
-
-	curr_moder_idx = (enum ena_intr_moder_level)(*moder_tbl_idx);
-	if (unlikely(curr_moder_idx >= ENA_INTR_MAX_NUM_OF_LEVELS)) {
-		pr_err("Wrong moderation index %u\n", curr_moder_idx);
-		return;
-	}
-
-	curr_moder_entry = &intr_moder_tbl[curr_moder_idx];
-	new_moder_idx = curr_moder_idx;
-
-	if (curr_moder_idx == ENA_INTR_MODER_LOWEST) {
-		if ((pkts > curr_moder_entry->pkts_per_interval) ||
-		    (bytes > curr_moder_entry->bytes_per_interval))
-			new_moder_idx =
-				(enum ena_intr_moder_level)(curr_moder_idx + ENA_INTR_MODER_LEVEL_STRIDE);
-	} else {
-		pred_moder_entry = &intr_moder_tbl[curr_moder_idx - ENA_INTR_MODER_LEVEL_STRIDE];
-
-		if ((pkts <= pred_moder_entry->pkts_per_interval) ||
-		    (bytes <= pred_moder_entry->bytes_per_interval))
-			new_moder_idx =
-				(enum ena_intr_moder_level)(curr_moder_idx - ENA_INTR_MODER_LEVEL_STRIDE);
-		else if ((pkts > curr_moder_entry->pkts_per_interval) ||
-			 (bytes > curr_moder_entry->bytes_per_interval)) {
-			if (curr_moder_idx != ENA_INTR_MODER_HIGHEST)
-				new_moder_idx =
-					(enum ena_intr_moder_level)(curr_moder_idx + ENA_INTR_MODER_LEVEL_STRIDE);
-		}
-	}
-	new_moder_entry = &intr_moder_tbl[new_moder_idx];
-
-	interval = new_moder_entry->intr_moder_interval;
-	*smoothed_interval = (
-		(interval * ENA_INTR_DELAY_NEW_VALUE_WEIGHT +
-		ENA_INTR_DELAY_OLD_VALUE_WEIGHT * (*smoothed_interval)) + 5) /
-		10;
-
-	*moder_tbl_idx = new_moder_idx;
-}
-
 /* ena_com_update_intr_reg - Prepare interrupt register
  * @intr_reg: interrupt register to update.
  * @rx_delay_interval: Rx interval in usecs
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 07/11] net: ena: remove ena_restore_ethtool_params() and relevant fields
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Deleted unused 4 fields from struct ena_adapter and their only user
ena_restore_ethtool_params().

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 10 ----------
 drivers/net/ethernet/amazon/ena/ena_netdev.h |  3 ---
 2 files changed, 13 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 1d8855ab8624..8b9f8b90e525 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1563,14 +1563,6 @@ static void ena_napi_enable_all(struct ena_adapter *adapter)
 		napi_enable(&adapter->ena_napi[i].napi);
 }
 
-static void ena_restore_ethtool_params(struct ena_adapter *adapter)
-{
-	adapter->tx_usecs = 0;
-	adapter->rx_usecs = 0;
-	adapter->tx_frames = 1;
-	adapter->rx_frames = 1;
-}
-
 /* Configure the Rx forwarding */
 static int ena_rss_configure(struct ena_adapter *adapter)
 {
@@ -1620,8 +1612,6 @@ static int ena_up_complete(struct ena_adapter *adapter)
 	/* enable transmits */
 	netif_tx_start_all_queues(adapter->netdev);
 
-	ena_restore_ethtool_params(adapter);
-
 	ena_napi_enable_all(adapter);
 
 	return 0;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index e9450991ab74..72ee51a82ec7 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -330,9 +330,6 @@ struct ena_adapter {
 
 	u32 missing_tx_completion_threshold;
 
-	u32 tx_usecs, rx_usecs; /* interrupt moderation */
-	u32 tx_frames, rx_frames; /* interrupt moderation */
-
 	u32 requested_tx_ring_size;
 	u32 requested_rx_ring_size;
 
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 06/11] net: ena: remove old adaptive interrupt moderation code from ena_netdev
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

1. Out of the fields {per_napi_bytes, per_napi_packets} in struct ena_ring,
   only rx_ring->per_napi_packets are used to determine if napi did work
   for dim.
   This commit removes all other uses of these fields.
2. Remove ena_ring->moder_tbl_idx, which is not used by dim.
3. Remove all calls to ena_com_destroy_interrupt_moderation(), since all it
   did was to destroy the interrupt moderation table, which is removed as
   part of removing old interrupt moderation code.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 8 --------
 drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 --
 2 files changed, 10 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index f19736493c01..1d8855ab8624 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -158,7 +158,6 @@ static void ena_init_io_rings_common(struct ena_adapter *adapter,
 	ring->adapter = adapter;
 	ring->ena_dev = adapter->ena_dev;
 	ring->per_napi_packets = 0;
-	ring->per_napi_bytes = 0;
 	ring->cpu = 0;
 	ring->first_interrupt = false;
 	ring->no_interrupt_event_cnt = 0;
@@ -834,9 +833,6 @@ static int ena_clean_tx_irq(struct ena_ring *tx_ring, u32 budget)
 		__netif_tx_unlock(txq);
 	}
 
-	tx_ring->per_napi_bytes += tx_bytes;
-	tx_ring->per_napi_packets += tx_pkts;
-
 	return tx_pkts;
 }
 
@@ -1120,7 +1116,6 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
 	} while (likely(res_budget));
 
 	work_done = budget - res_budget;
-	rx_ring->per_napi_bytes += total_len;
 	rx_ring->per_napi_packets += work_done;
 	u64_stats_update_begin(&rx_ring->syncp);
 	rx_ring->rx_stats.bytes += total_len;
@@ -3641,7 +3636,6 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	ena_free_mgmnt_irq(adapter);
 	ena_disable_msix(adapter);
 err_worker_destroy:
-	ena_com_destroy_interrupt_moderation(ena_dev);
 	del_timer(&adapter->timer_service);
 err_netdev_destroy:
 	free_netdev(netdev);
@@ -3702,8 +3696,6 @@ static void ena_remove(struct pci_dev *pdev)
 
 	pci_disable_device(pdev);
 
-	ena_com_destroy_interrupt_moderation(ena_dev);
-
 	vfree(ena_dev);
 }
 
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index f67ecab3389a..e9450991ab74 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -280,8 +280,6 @@ struct ena_ring {
 	struct ena_com_rx_buf_info ena_bufs[ENA_PKT_MAX_BUFS];
 	u32  smoothed_interval;
 	u32  per_napi_packets;
-	u32  per_napi_bytes;
-	enum ena_intr_moder_level moder_tbl_idx;
 	u16 non_empty_napi_events;
 	struct u64_stats_sync syncp;
 	union {
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 04/11] net: ena: enable the interrupt_moderation in driver_supported_features
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Add driver_supported_features to host_host info which is a new API used to
communicate to the device which features are supported by the driver.

Add the interrupt_moderation bit to host_info->driver_supported_features
and enable it to signal the device that this driver supports interrupt
moderation properly.

Reserved bits are for features implemented in the future

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_admin_defs.h | 8 ++++++++
 drivers/net/ethernet/amazon/ena/ena_netdev.c     | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
index d19f2ecf8e84..8baf847e8622 100644
--- a/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
+++ b/drivers/net/ethernet/amazon/ena/ena_admin_defs.h
@@ -808,6 +808,12 @@ struct ena_admin_host_info {
 	u16 num_cpus;
 
 	u16 reserved;
+
+	/* 1 :0 : reserved
+	 * 2 : interrupt_moderation
+	 * 31:3 : reserved
+	 */
+	u32 driver_supported_features;
 };
 
 struct ena_admin_rss_ind_table_entry {
@@ -1110,6 +1116,8 @@ struct ena_admin_ena_mmio_req_read_less_resp {
 #define ENA_ADMIN_HOST_INFO_DEVICE_MASK                     GENMASK(7, 3)
 #define ENA_ADMIN_HOST_INFO_BUS_SHIFT                       8
 #define ENA_ADMIN_HOST_INFO_BUS_MASK                        GENMASK(15, 8)
+#define ENA_ADMIN_HOST_INFO_INTERRUPT_MODERATION_SHIFT      2
+#define ENA_ADMIN_HOST_INFO_INTERRUPT_MODERATION_MASK       BIT(2)
 
 /* aenq_common_desc */
 #define ENA_ADMIN_AENQ_COMMON_DESC_PHASE_MASK               BIT(0)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index cdcc169b87fa..f19736493c01 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -2438,6 +2438,9 @@ static void ena_config_host_info(struct ena_com_dev *ena_dev,
 		("K"[0] << ENA_ADMIN_HOST_INFO_MODULE_TYPE_SHIFT);
 	host_info->num_cpus = num_online_cpus();
 
+	host_info->driver_supported_features =
+		ENA_ADMIN_HOST_INFO_INTERRUPT_MODERATION_MASK;
+
 	rc = ena_com_set_host_attributes(ena_dev);
 	if (rc) {
 		if (rc == -EOPNOTSUPP)
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 05/11] net: ena: remove code duplication in ena_com_update_nonadaptive_moderation_interval _*()
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Remove code duplication in:
ena_com_update_nonadaptive_moderation_interval_tx()
ena_com_update_nonadaptive_moderation_interval_rx()
functions.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 30 ++++++++++++-----------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index c9440abf6b95..ea859ecf664d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -2773,32 +2773,34 @@ bool ena_com_interrupt_moderation_supported(struct ena_com_dev *ena_dev)
 						  ENA_ADMIN_INTERRUPT_MODERATION);
 }
 
-int ena_com_update_nonadaptive_moderation_interval_tx(struct ena_com_dev *ena_dev,
-						      u32 tx_coalesce_usecs)
+static int ena_com_update_nonadaptive_moderation_interval(u32 coalesce_usecs,
+							  u32 intr_delay_resolution,
+							  u32 *intr_moder_interval)
 {
-	if (!ena_dev->intr_delay_resolution) {
+	if (!intr_delay_resolution) {
 		pr_err("Illegal interrupt delay granularity value\n");
 		return -EFAULT;
 	}
 
-	ena_dev->intr_moder_tx_interval = tx_coalesce_usecs /
-		ena_dev->intr_delay_resolution;
+	*intr_moder_interval = coalesce_usecs / intr_delay_resolution;
 
 	return 0;
 }
 
+int ena_com_update_nonadaptive_moderation_interval_tx(struct ena_com_dev *ena_dev,
+						      u32 tx_coalesce_usecs)
+{
+	return ena_com_update_nonadaptive_moderation_interval(tx_coalesce_usecs,
+							      ena_dev->intr_delay_resolution,
+							      &ena_dev->intr_moder_tx_interval);
+}
+
 int ena_com_update_nonadaptive_moderation_interval_rx(struct ena_com_dev *ena_dev,
 						      u32 rx_coalesce_usecs)
 {
-	if (!ena_dev->intr_delay_resolution) {
-		pr_err("Illegal interrupt delay granularity value\n");
-		return -EFAULT;
-	}
-
-	ena_dev->intr_moder_rx_interval = rx_coalesce_usecs /
-		ena_dev->intr_delay_resolution;
-
-	return 0;
+	return ena_com_update_nonadaptive_moderation_interval(rx_coalesce_usecs,
+							      ena_dev->intr_delay_resolution,
+							      &ena_dev->intr_moder_rx_interval);
 }
 
 void ena_com_destroy_interrupt_moderation(struct ena_com_dev *ena_dev)
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 03/11] net: ena: reimplement set/get_coalesce()
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

1. Remove old adaptive interrupt moderation code from set/get_coalesce()
2. Add ena_update_rx_rings_intr_moderation() function for updating
   nonadaptive interrupt moderation intervals similarly to
   ena_update_tx_rings_intr_moderation().
3. Remove checks of multiple unsupported received interrupt coalescing
   parameters. This makes code cleaner and cancels the need to update
   it every time a new coalescing parameter is invented.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_ethtool.c | 84 ++++++-------------
 1 file changed, 26 insertions(+), 58 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
index b997c3ce9e2b..0f90e2296630 100644
--- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c
+++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c
@@ -305,7 +305,6 @@ static int ena_get_coalesce(struct net_device *net_dev,
 {
 	struct ena_adapter *adapter = netdev_priv(net_dev);
 	struct ena_com_dev *ena_dev = adapter->ena_dev;
-	struct ena_intr_moder_entry intr_moder_entry;
 
 	if (!ena_com_interrupt_moderation_supported(ena_dev)) {
 		/* the devie doesn't support interrupt moderation */
@@ -314,23 +313,12 @@ static int ena_get_coalesce(struct net_device *net_dev,
 	coalesce->tx_coalesce_usecs =
 		ena_com_get_nonadaptive_moderation_interval_tx(ena_dev) /
 			ena_dev->intr_delay_resolution;
-	if (!ena_com_get_adaptive_moderation_enabled(ena_dev)) {
+
+	if (!ena_com_get_adaptive_moderation_enabled(ena_dev))
 		coalesce->rx_coalesce_usecs =
 			ena_com_get_nonadaptive_moderation_interval_rx(ena_dev)
 			/ ena_dev->intr_delay_resolution;
-	} else {
-		ena_com_get_intr_moderation_entry(adapter->ena_dev, ENA_INTR_MODER_LOWEST, &intr_moder_entry);
-		coalesce->rx_coalesce_usecs_low = intr_moder_entry.intr_moder_interval;
-		coalesce->rx_max_coalesced_frames_low = intr_moder_entry.pkts_per_interval;
-
-		ena_com_get_intr_moderation_entry(adapter->ena_dev, ENA_INTR_MODER_MID, &intr_moder_entry);
-		coalesce->rx_coalesce_usecs = intr_moder_entry.intr_moder_interval;
-		coalesce->rx_max_coalesced_frames = intr_moder_entry.pkts_per_interval;
-
-		ena_com_get_intr_moderation_entry(adapter->ena_dev, ENA_INTR_MODER_HIGHEST, &intr_moder_entry);
-		coalesce->rx_coalesce_usecs_high = intr_moder_entry.intr_moder_interval;
-		coalesce->rx_max_coalesced_frames_high = intr_moder_entry.pkts_per_interval;
-	}
+
 	coalesce->use_adaptive_rx_coalesce =
 		ena_com_get_adaptive_moderation_enabled(ena_dev);
 
@@ -348,12 +336,22 @@ static void ena_update_tx_rings_intr_moderation(struct ena_adapter *adapter)
 		adapter->tx_ring[i].smoothed_interval = val;
 }
 
+static void ena_update_rx_rings_intr_moderation(struct ena_adapter *adapter)
+{
+	unsigned int val;
+	int i;
+
+	val = ena_com_get_nonadaptive_moderation_interval_rx(adapter->ena_dev);
+
+	for (i = 0; i < adapter->num_queues; i++)
+		adapter->rx_ring[i].smoothed_interval = val;
+}
+
 static int ena_set_coalesce(struct net_device *net_dev,
 			    struct ethtool_coalesce *coalesce)
 {
 	struct ena_adapter *adapter = netdev_priv(net_dev);
 	struct ena_com_dev *ena_dev = adapter->ena_dev;
-	struct ena_intr_moder_entry intr_moder_entry;
 	int rc;
 
 	if (!ena_com_interrupt_moderation_supported(ena_dev)) {
@@ -361,22 +359,6 @@ static int ena_set_coalesce(struct net_device *net_dev,
 		return -EOPNOTSUPP;
 	}
 
-	if (coalesce->rx_coalesce_usecs_irq ||
-	    coalesce->rx_max_coalesced_frames_irq ||
-	    coalesce->tx_coalesce_usecs_irq ||
-	    coalesce->tx_max_coalesced_frames ||
-	    coalesce->tx_max_coalesced_frames_irq ||
-	    coalesce->stats_block_coalesce_usecs ||
-	    coalesce->use_adaptive_tx_coalesce ||
-	    coalesce->pkt_rate_low ||
-	    coalesce->tx_coalesce_usecs_low ||
-	    coalesce->tx_max_coalesced_frames_low ||
-	    coalesce->pkt_rate_high ||
-	    coalesce->tx_coalesce_usecs_high ||
-	    coalesce->tx_max_coalesced_frames_high ||
-	    coalesce->rate_sample_interval)
-		return -EINVAL;
-
 	rc = ena_com_update_nonadaptive_moderation_interval_tx(ena_dev,
 							       coalesce->tx_coalesce_usecs);
 	if (rc)
@@ -384,37 +366,23 @@ static int ena_set_coalesce(struct net_device *net_dev,
 
 	ena_update_tx_rings_intr_moderation(adapter);
 
-	if (ena_com_get_adaptive_moderation_enabled(ena_dev)) {
-		if (!coalesce->use_adaptive_rx_coalesce) {
-			ena_com_disable_adaptive_moderation(ena_dev);
-			rc = ena_com_update_nonadaptive_moderation_interval_rx(ena_dev,
-									       coalesce->rx_coalesce_usecs);
-			return rc;
-		}
-	} else { /* was in non-adaptive mode */
-		if (coalesce->use_adaptive_rx_coalesce) {
+	if (coalesce->use_adaptive_rx_coalesce) {
+		if (!ena_com_get_adaptive_moderation_enabled(ena_dev))
 			ena_com_enable_adaptive_moderation(ena_dev);
-		} else {
-			rc = ena_com_update_nonadaptive_moderation_interval_rx(ena_dev,
-									       coalesce->rx_coalesce_usecs);
-			return rc;
-		}
+		return 0;
 	}
 
-	intr_moder_entry.intr_moder_interval = coalesce->rx_coalesce_usecs_low;
-	intr_moder_entry.pkts_per_interval = coalesce->rx_max_coalesced_frames_low;
-	intr_moder_entry.bytes_per_interval = ENA_INTR_BYTE_COUNT_NOT_SUPPORTED;
-	ena_com_init_intr_moderation_entry(adapter->ena_dev, ENA_INTR_MODER_LOWEST, &intr_moder_entry);
+	rc = ena_com_update_nonadaptive_moderation_interval_rx(ena_dev,
+							       coalesce->rx_coalesce_usecs);
+	if (rc)
+		return rc;
 
-	intr_moder_entry.intr_moder_interval = coalesce->rx_coalesce_usecs;
-	intr_moder_entry.pkts_per_interval = coalesce->rx_max_coalesced_frames;
-	intr_moder_entry.bytes_per_interval = ENA_INTR_BYTE_COUNT_NOT_SUPPORTED;
-	ena_com_init_intr_moderation_entry(adapter->ena_dev, ENA_INTR_MODER_MID, &intr_moder_entry);
+	ena_update_rx_rings_intr_moderation(adapter);
 
-	intr_moder_entry.intr_moder_interval = coalesce->rx_coalesce_usecs_high;
-	intr_moder_entry.pkts_per_interval = coalesce->rx_max_coalesced_frames_high;
-	intr_moder_entry.bytes_per_interval = ENA_INTR_BYTE_COUNT_NOT_SUPPORTED;
-	ena_com_init_intr_moderation_entry(adapter->ena_dev, ENA_INTR_MODER_HIGHEST, &intr_moder_entry);
+	if (!coalesce->use_adaptive_rx_coalesce) {
+		if (ena_com_get_adaptive_moderation_enabled(ena_dev))
+			ena_com_disable_adaptive_moderation(ena_dev);
+	}
 
 	return 0;
 }
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 02/11] net: ena: switch to dim algorithm for rx adaptive interrupt moderation
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Use the dim library for the rx adaptive interrupt moderation implementation

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_com.c    |  4 +-
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 55 +++++++++++++-------
 drivers/net/ethernet/amazon/ena/ena_netdev.h |  3 ++
 3 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index af85fd831ea4..c9440abf6b95 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -2840,9 +2840,7 @@ int ena_com_init_interrupt_moderation(struct ena_com_dev *ena_dev)
 	delay_resolution = get_resp.u.intr_moderation.intr_delay_resolution;
 	ena_com_update_intr_delay_resolution(ena_dev, delay_resolution);
 
-	/* Disable adaptive moderation by default - can be enabled from
-	 * ethtool
-	 */
+	/* Disable adaptive moderation by default - can be enabled later */
 	ena_com_disable_adaptive_moderation(ena_dev);
 
 	return 0;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 233b252ceb9e..cdcc169b87fa 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -196,6 +196,7 @@ static void ena_init_io_rings(struct ena_adapter *adapter)
 		rxr->smoothed_interval =
 			ena_com_get_nonadaptive_moderation_interval_rx(ena_dev);
 		rxr->empty_rx_queue = 0;
+		adapter->ena_napi[i].dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
 	}
 }
 
@@ -712,6 +713,7 @@ static void ena_destroy_all_rx_queues(struct ena_adapter *adapter)
 
 	for (i = 0; i < adapter->num_queues; i++) {
 		ena_qid = ENA_IO_RXQ_IDX(i);
+		cancel_work_sync(&adapter->ena_napi[i].dim.work);
 		ena_com_destroy_io_queue(adapter->ena_dev, ena_qid);
 	}
 }
@@ -1155,23 +1157,35 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
 	return 0;
 }
 
-void ena_adjust_intr_moderation(struct ena_ring *rx_ring,
-				       struct ena_ring *tx_ring)
+static void ena_dim_work(struct work_struct *w)
 {
-	/* We apply adaptive moderation on Rx path only.
-	 * Tx uses static interrupt moderation.
-	 */
-	ena_com_calculate_interrupt_delay(rx_ring->ena_dev,
-					  rx_ring->per_napi_packets,
-					  rx_ring->per_napi_bytes,
-					  &rx_ring->smoothed_interval,
-					  &rx_ring->moder_tbl_idx);
-
-	/* Reset per napi packets/bytes */
-	tx_ring->per_napi_packets = 0;
-	tx_ring->per_napi_bytes = 0;
+	struct dim *dim = container_of(w, struct dim, work);
+	struct dim_cq_moder cur_moder =
+		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
+	struct ena_napi *ena_napi = container_of(dim, struct ena_napi, dim);
+
+	ena_napi->rx_ring->smoothed_interval = cur_moder.usec;
+	dim->state = DIM_START_MEASURE;
+}
+
+static void ena_adjust_adaptive_rx_intr_moderation(struct ena_napi *ena_napi)
+{
+	struct dim_sample dim_sample;
+	struct ena_ring *rx_ring = ena_napi->rx_ring;
+
+	if (!rx_ring->per_napi_packets)
+		return;
+
+	rx_ring->non_empty_napi_events++;
+
+	dim_update_sample(rx_ring->non_empty_napi_events,
+			  rx_ring->rx_stats.cnt,
+			  rx_ring->rx_stats.bytes,
+			  &dim_sample);
+
+	net_dim(&ena_napi->dim, dim_sample);
+
 	rx_ring->per_napi_packets = 0;
-	rx_ring->per_napi_bytes = 0;
 }
 
 static void ena_unmask_interrupt(struct ena_ring *tx_ring,
@@ -1260,9 +1274,11 @@ static int ena_io_poll(struct napi_struct *napi, int budget)
 		 * from the interrupt context (vs from sk_busy_loop)
 		 */
 		if (napi_complete_done(napi, rx_work_done)) {
-			/* Tx and Rx share the same interrupt vector */
+			/* We apply adaptive moderation on Rx path only.
+			 * Tx uses static interrupt moderation.
+			 */
 			if (ena_com_get_adaptive_moderation_enabled(rx_ring->ena_dev))
-				ena_adjust_intr_moderation(rx_ring, tx_ring);
+				ena_adjust_adaptive_rx_intr_moderation(ena_napi);
 
 			ena_unmask_interrupt(tx_ring, rx_ring);
 		}
@@ -1740,13 +1756,16 @@ static int ena_create_all_io_rx_queues(struct ena_adapter *adapter)
 		rc = ena_create_io_rx_queue(adapter, i);
 		if (rc)
 			goto create_err;
+		INIT_WORK(&adapter->ena_napi[i].dim.work, ena_dim_work);
 	}
 
 	return 0;
 
 create_err:
-	while (i--)
+	while (i--) {
+		cancel_work_sync(&adapter->ena_napi[i].dim.work);
 		ena_com_destroy_io_queue(ena_dev, ENA_IO_RXQ_IDX(i));
+	}
 
 	return rc;
 }
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index efbcffd22215..f67ecab3389a 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -34,6 +34,7 @@
 #define ENA_H
 
 #include <linux/bitops.h>
+#include <linux/dim.h>
 #include <linux/etherdevice.h>
 #include <linux/inetdevice.h>
 #include <linux/interrupt.h>
@@ -153,6 +154,7 @@ struct ena_napi {
 	struct ena_ring *tx_ring;
 	struct ena_ring *rx_ring;
 	u32 qid;
+	struct dim dim;
 };
 
 struct ena_calc_queue_size_ctx {
@@ -280,6 +282,7 @@ struct ena_ring {
 	u32  per_napi_packets;
 	u32  per_napi_bytes;
 	enum ena_intr_moder_level moder_tbl_idx;
+	u16 non_empty_napi_events;
 	struct u64_stats_sync syncp;
 	union {
 		struct ena_stats_tx tx_stats;
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 01/11] net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan
In-Reply-To: <1568326128-4057-1-git-send-email-akiyano@amazon.com>

From: Arthur Kiyanovski <akiyano@amazon.com>

Add intr_moder_rx_interval to struct ena_com_dev and use it as the
location where the interrupt moderation rx interval is saved, instead
of the interrupt moderation table.

This is done as a first step before removing the old interrupt moderation
code.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
---
 drivers/net/ethernet/amazon/ena/ena_com.c    | 21 +++++---------------
 drivers/net/ethernet/amazon/ena/ena_com.h    |  8 +++++++-
 drivers/net/ethernet/amazon/ena/ena_netdev.c |  3 ++-
 3 files changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
index 911a2e7a375a..af85fd831ea4 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -1297,9 +1297,6 @@ static int ena_com_init_interrupt_moderation_table(struct ena_com_dev *ena_dev)
 static void ena_com_update_intr_delay_resolution(struct ena_com_dev *ena_dev,
 						 u16 intr_delay_resolution)
 {
-	struct ena_intr_moder_entry *intr_moder_tbl = ena_dev->intr_moder_tbl;
-	unsigned int i;
-
 	if (!intr_delay_resolution) {
 		pr_err("Illegal intr_delay_resolution provided. Going to use default 1 usec resolution\n");
 		intr_delay_resolution = 1;
@@ -1307,8 +1304,8 @@ static void ena_com_update_intr_delay_resolution(struct ena_com_dev *ena_dev,
 	ena_dev->intr_delay_resolution = intr_delay_resolution;
 
 	/* update Rx */
-	for (i = 0; i < ENA_INTR_MAX_NUM_OF_LEVELS; i++)
-		intr_moder_tbl[i].intr_moder_interval /= intr_delay_resolution;
+	ena_dev->intr_moder_rx_interval /= intr_delay_resolution;
+
 
 	/* update Tx */
 	ena_dev->intr_moder_tx_interval /= intr_delay_resolution;
@@ -2798,11 +2795,8 @@ int ena_com_update_nonadaptive_moderation_interval_rx(struct ena_com_dev *ena_de
 		return -EFAULT;
 	}
 
-	/* We use LOWEST entry of moderation table for storing
-	 * nonadaptive interrupt coalescing values
-	 */
-	ena_dev->intr_moder_tbl[ENA_INTR_MODER_LOWEST].intr_moder_interval =
-		rx_coalesce_usecs / ena_dev->intr_delay_resolution;
+	ena_dev->intr_moder_rx_interval = rx_coalesce_usecs /
+		ena_dev->intr_delay_resolution;
 
 	return 0;
 }
@@ -2907,12 +2901,7 @@ unsigned int ena_com_get_nonadaptive_moderation_interval_tx(struct ena_com_dev *
 
 unsigned int ena_com_get_nonadaptive_moderation_interval_rx(struct ena_com_dev *ena_dev)
 {
-	struct ena_intr_moder_entry *intr_moder_tbl = ena_dev->intr_moder_tbl;
-
-	if (intr_moder_tbl)
-		return intr_moder_tbl[ENA_INTR_MODER_LOWEST].intr_moder_interval;
-
-	return 0;
+	return ena_dev->intr_moder_rx_interval;
 }
 
 void ena_com_init_intr_moderation_entry(struct ena_com_dev *ena_dev,
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.h b/drivers/net/ethernet/amazon/ena/ena_com.h
index 0d3664fe260d..baeefc6af4f3 100644
--- a/drivers/net/ethernet/amazon/ena/ena_com.h
+++ b/drivers/net/ethernet/amazon/ena/ena_com.h
@@ -93,7 +93,7 @@
 #define ENA_INTR_HIGHEST_BYTES          (192 * 1024)
 
 #define ENA_INTR_INITIAL_TX_INTERVAL_USECS		196
-#define ENA_INTR_INITIAL_RX_INTERVAL_USECS		4
+#define ENA_INTR_INITIAL_RX_INTERVAL_USECS		0
 #define ENA_INTR_DELAY_OLD_VALUE_WEIGHT			6
 #define ENA_INTR_DELAY_NEW_VALUE_WEIGHT			4
 #define ENA_INTR_MODER_LEVEL_STRIDE			2
@@ -376,7 +376,13 @@ struct ena_com_dev {
 	struct ena_host_attribute host_attr;
 	bool adaptive_coalescing;
 	u16 intr_delay_resolution;
+
+	/* interrupt moderation intervals are in usec divided by
+	 * intr_delay_resolution, which is supplied by the device.
+	 */
 	u32 intr_moder_tx_interval;
+	u32 intr_moder_rx_interval;
+
 	struct ena_intr_moder_entry *intr_moder_tbl;
 
 	struct ena_com_llq_info llq_info;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 664e3ed97ea9..233b252ceb9e 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3485,10 +3485,11 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	calc_queue_ctx.get_feat_ctx = &get_feat_ctx;
 	calc_queue_ctx.pdev = pdev;
 
-	/* initial Tx interrupt delay, Assumes 1 usec granularity.
+	/* Initial Tx and RX interrupt delay. Assumes 1 usec granularity.
 	* Updated during device initialization with the real granularity
 	*/
 	ena_dev->intr_moder_tx_interval = ENA_INTR_INITIAL_TX_INTERVAL_USECS;
+	ena_dev->intr_moder_rx_interval = ENA_INTR_INITIAL_RX_INTERVAL_USECS;
 	io_queue_num = ena_calc_io_queue_num(pdev, ena_dev, &get_feat_ctx);
 	rc = ena_calc_queue_size(&calc_queue_ctx);
 	if (rc || io_queue_num <= 0) {
-- 
2.17.2


^ permalink raw reply related

* [PATCH V1 net-next 00/11] net: ena: implement adaptive interrupt moderation using dim
From: akiyano @ 2019-09-12 22:08 UTC (permalink / raw)
  To: davem, netdev
  Cc: Arthur Kiyanovski, dwmw, zorik, matua, saeedb, msw, aliguori,
	nafea, gtzalik, netanel, alisaidi, benh, sameehj, ndagan

From: Arthur Kiyanovski <akiyano@amazon.com>

In this patchset we replace our adaptive interrupt moderation
implementation with the dim library implementation.
The dim library showed great improvement in throughput, latency
and CPU usage in different scenarios on ARM CPUs.
This patchset also includes a few bug fixes to the parts of the
old implementation of adaptive interrupt moderation that were left.

Arthur Kiyanovski (11):
  net: ena: add intr_moder_rx_interval to struct ena_com_dev and use it
  net: ena: switch to dim algorithm for rx adaptive interrupt moderation
  net: ena: reimplement set/get_coalesce()
  net: ena: enable the interrupt_moderation in driver_supported_features
  net: ena: remove code duplication in
    ena_com_update_nonadaptive_moderation_interval _*()
  net: ena: remove old adaptive interrupt moderation code from
    ena_netdev
  net: ena: remove ena_restore_ethtool_params() and relevant fields
  net: ena: remove all old adaptive rx interrupt moderation code from
    ena_com
  net: ena: fix update of interrupt moderation register
  net: ena: fix retrieval of nonadaptive interrupt moderation intervals
  net: ena: fix incorrect update of intr_delay_resolution

 .../net/ethernet/amazon/ena/ena_admin_defs.h  |   8 +
 drivers/net/ethernet/amazon/ena/ena_com.c     | 175 ++++--------------
 drivers/net/ethernet/amazon/ena/ena_com.h     | 151 +--------------
 drivers/net/ethernet/amazon/ena/ena_ethtool.c |  89 +++------
 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  86 +++++----
 drivers/net/ethernet/amazon/ena/ena_netdev.h  |   8 +-
 6 files changed, 129 insertions(+), 388 deletions(-)

-- 
2.17.2


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox