netdev.vger.kernel.org archive mirror
From: Tariq Toukan <ttoukan.linux@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>,
	Tariq Toukan <tariqt@mellanox.com>
Subject: Re: [PATCH v2 net-next] mlx4: avoid unnecessary dirtying of critical fields
Date: Mon, 21 Nov 2016 11:09:50 +0200
Message-ID: <32f9297d-96d2-c755-c150-a61dbb721f28@gmail.com>
In-Reply-To: <1479662676.8455.364.camel@edumazet-glaptop3.roam.corp.google.com>


On 20/11/2016 7:24 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While stressing a 40Gbit mlx4 NIC with busy polling, I found false
> sharing in mlx4 driver that can be easily avoided.
>
> This patch brings an additional 7 % performance improvement in UDP_RR
> workload.
>
> 1) If we received no frame during one mlx4_en_process_rx_cq()
>     invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons
>
> 2) Do not refill rx buffers if we have plenty of them.
>     This avoids false sharing and allows some bulk/batch optimizations.
>     Page allocator and its locks will thank us.
>
> Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined
> cpu handling NIC IRQ should be changed. We should return budget-1
> instead, to not fool net_rx_action() and its netdev_budget.
>
>
> v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c |   47 ++++++++++++-------
>   1 file changed, 30 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 22f08f9ef4645869359783823127c0432fc7a591..6562f78b07f4370b5c1ea2c5e3a4221d7ebaeba8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -688,18 +688,23 @@ static void validate_loopback(struct mlx4_en_priv *priv, struct sk_buff *skb)
>   	dev_kfree_skb_any(skb);
>   }
>   
> -static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> -				     struct mlx4_en_rx_ring *ring)
> +static bool mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> +				      struct mlx4_en_rx_ring *ring)
>   {
> -	int index = ring->prod & ring->size_mask;
> +	u32 missing = ring->actual_size - (ring->prod - ring->cons);
>   
> -	while ((u32) (ring->prod - ring->cons) < ring->actual_size) {
> -		if (mlx4_en_prepare_rx_desc(priv, ring, index,
> +	/* Try to batch allocations, but not too much. */
> +	if (missing < 8)
> +		return false;
> +	do {
> +		if (mlx4_en_prepare_rx_desc(priv, ring,
> +					    ring->prod & ring->size_mask,
>   					    GFP_ATOMIC | __GFP_COLD))
>   			break;
>   		ring->prod++;
> -		index = ring->prod & ring->size_mask;
> -	}
> +	} while (--missing);
> +
> +	return true;
>   }
>   
>   /* When hardware doesn't strip the vlan, we need to calculate the checksum
> @@ -1081,15 +1086,20 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>   
>   out:
>   	rcu_read_unlock();
> -	if (doorbell_pending)
> -		mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
>   
> +	if (polled) {
> +		if (doorbell_pending)
> +			mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
> +
> +		mlx4_cq_set_ci(&cq->mcq);
> +		wmb(); /* ensure HW sees CQ consumer before we post new buffers */
> +		ring->cons = cq->mcq.cons_index;
> +	}
>   	AVG_PERF_COUNTER(priv->pstats.rx_coal_avg, polled);
> -	mlx4_cq_set_ci(&cq->mcq);
> -	wmb(); /* ensure HW sees CQ consumer before we post new buffers */
> -	ring->cons = cq->mcq.cons_index;
> -	mlx4_en_refill_rx_buffers(priv, ring);
> -	mlx4_en_update_rx_prod_db(ring);
> +
> +	if (mlx4_en_refill_rx_buffers(priv, ring))
> +		mlx4_en_update_rx_prod_db(ring);
> +
>   	return polled;
>   }
>   
> @@ -1131,10 +1141,13 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
>   			return budget;
>   
>   		/* Current cpu is not according to smp_irq_affinity -
> -		 * probably affinity changed. need to stop this NAPI
> -		 * poll, and restart it on the right CPU
> +		 * probably affinity changed. Need to stop this NAPI
> +		 * poll, and restart it on the right CPU.
> +		 * Try to avoid returning a too small value (like 0),
> +		 * to not fool net_rx_action() and its netdev_budget
>   		 */
> -		done = 0;
> +		if (done)
> +			done--;
>   	}
>   	/* Done for now */
>   	if (napi_complete_done(napi, done))
>
>
Thanks Eric.
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
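
For readers following the thread, the batching rule in the new
mlx4_en_refill_rx_buffers() can be sketched in userspace roughly as below.
This is an illustration under stated assumptions, not driver code:
struct demo_ring, demo_prepare_rx_desc() and the alloc_budget field are
hypothetical stand-ins for struct mlx4_en_rx_ring, mlx4_en_prepare_rx_desc()
and the page allocator. Only the threshold of 8 and the prod/cons arithmetic
are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define REFILL_BATCH_MIN 8u /* threshold used in the patch */

/* Hypothetical stand-in for the ring fields the patch touches. */
struct demo_ring {
	uint32_t prod;         /* free-running producer counter */
	uint32_t cons;         /* free-running consumer counter */
	uint32_t actual_size;  /* number of usable descriptors */
	uint32_t alloc_budget; /* simulated allocator: allocations left */
};

/* Hypothetical stand-in for mlx4_en_prepare_rx_desc():
 * returns 0 on success, nonzero when the allocation fails.
 */
static int demo_prepare_rx_desc(struct demo_ring *ring)
{
	if (ring->alloc_budget == 0)
		return -1;
	ring->alloc_budget--;
	return 0;
}

/* Returns true when a refill was attempted, so the caller knows
 * whether the producer doorbell needs to be updated.
 */
static bool demo_refill(struct demo_ring *ring)
{
	/* Unsigned subtraction stays correct when the counters wrap. */
	uint32_t in_flight = ring->prod - ring->cons;
	uint32_t missing;

	assert(in_flight <= ring->actual_size);
	missing = ring->actual_size - in_flight;

	/* Try to batch allocations, but not too much. */
	if (missing < REFILL_BATCH_MIN)
		return false;

	do {
		if (demo_prepare_rx_desc(ring))
			break;
		ring->prod++;
	} while (--missing);

	/* As in the patch, true is returned even if an allocation failed. */
	return true;
}
```

With fewer than 8 free slots the function leaves ring->prod untouched, so
neither the producer index nor the doorbell record is dirtied in that case.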

