From: Tariq Toukan <ttoukan.linux@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
netdev <netdev@vger.kernel.org>,
Tariq Toukan <tariqt@mellanox.com>
Subject: Re: [PATCH v2 net-next] mlx4: avoid unnecessary dirtying of critical fields
Date: Mon, 21 Nov 2016 11:09:50 +0200
Message-ID: <32f9297d-96d2-c755-c150-a61dbb721f28@gmail.com>
In-Reply-To: <1479662676.8455.364.camel@edumazet-glaptop3.roam.corp.google.com>
On 20/11/2016 7:24 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While stressing a 40Gbit mlx4 NIC with busy polling, I found false
> sharing in mlx4 driver that can be easily avoided.
>
> This patch brings an additional 7 % performance improvement in UDP_RR
> workload.
>
> 1) If we received no frame during one mlx4_en_process_rx_cq()
> invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons
>
> 2) Do not refill rx buffers if we have plenty of them.
> This avoids false sharing and allows some bulk/batch optimizations.
> Page allocator and its locks will thank us.
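
For readers of the archive, point 2 can be illustrated with a minimal
userspace sketch. It is not driver code: struct toy_ring, toy_refill() and
TOY_REFILL_BATCH are hypothetical names, and the buffer allocation is
replaced by a bare index increment. Only the "missing" arithmetic and the
threshold of 8 are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the RX ring; only the fields used here. */
struct toy_ring {
	uint32_t prod;        /* free-running producer index */
	uint32_t cons;        /* free-running consumer index */
	uint32_t actual_size; /* number of descriptors the ring can hold */
};

/* Hypothetical name for the "missing < 8" threshold used in the patch. */
#define TOY_REFILL_BATCH 8

/* Returns true only if a refill was attempted, so the caller can skip the
 * producer doorbell update when nothing was posted.
 */
static bool toy_refill(struct toy_ring *ring)
{
	uint32_t missing;

	/* Unsigned subtraction stays correct when the indices wrap. */
	assert(ring->prod - ring->cons <= ring->actual_size);
	missing = ring->actual_size - (ring->prod - ring->cons);

	/* Too few free slots: return without writing to the ring at all. */
	if (missing < TOY_REFILL_BATCH)
		return false;

	do {
		/* The real driver allocates a buffer here and breaks on failure. */
		ring->prod++;
	} while (--missing);

	return true;
}
```

With a ring of 100 descriptors, consuming 5 frames leaves the ring state
untouched, while consuming 8 triggers one refill of 8 slots.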
>
> Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined
> cpu handling NIC IRQ should be changed. We should return budget-1
> instead, to not fool net_rx_action() and its netdev_budget.
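
The return-value rule in the paragraph above can be sketched the same way.
This is an illustration under assumptions, not the driver function:
toy_poll_result() and its on_affine_cpu parameter are hypothetical, and the
napi_complete_done() call and CQ re-arming are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: computes the value a NAPI poll routine would
 * return, given the work done and whether we run on the affine CPU.
 */
static int toy_poll_result(int done, int budget, bool on_affine_cpu)
{
	assert(done >= 0 && done <= budget);

	if (done == budget) {
		/* Budget exhausted on the right CPU: ask to be polled again. */
		if (on_affine_cpu)
			return budget;

		/* Wrong CPU: the poll must complete, so the result has to be
		 * below budget, but it should still reflect the work done.
		 */
		if (done)
			done--;
	}
	return done;
}
```

Reporting budget-1 instead of 0 keeps the accounting in net_rx_action()
close to the real amount of work performed in this poll.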
>
>
> v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 47 ++++++++++++-------
> 1 file changed, 30 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 22f08f9ef4645869359783823127c0432fc7a591..6562f78b07f4370b5c1ea2c5e3a4221d7ebaeba8 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -688,18 +688,23 @@ static void validate_loopback(struct mlx4_en_priv *priv, struct sk_buff *skb)
>  	dev_kfree_skb_any(skb);
>  }
>
> -static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> -				     struct mlx4_en_rx_ring *ring)
> +static bool mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> +				      struct mlx4_en_rx_ring *ring)
>  {
> -	int index = ring->prod & ring->size_mask;
> +	u32 missing = ring->actual_size - (ring->prod - ring->cons);
>
> -	while ((u32) (ring->prod - ring->cons) < ring->actual_size) {
> -		if (mlx4_en_prepare_rx_desc(priv, ring, index,
> +	/* Try to batch allocations, but not too much. */
> +	if (missing < 8)
> +		return false;
> +	do {
> +		if (mlx4_en_prepare_rx_desc(priv, ring,
> +					    ring->prod & ring->size_mask,
>  					    GFP_ATOMIC | __GFP_COLD))
>  			break;
>  		ring->prod++;
> -		index = ring->prod & ring->size_mask;
> -	}
> +	} while (--missing);
> +
> +	return true;
>  }
>
>  /* When hardware doesn't strip the vlan, we need to calculate the checksum
> @@ -1081,15 +1086,20 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>
>  out:
>  	rcu_read_unlock();
> -	if (doorbell_pending)
> -		mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
>
> +	if (polled) {
> +		if (doorbell_pending)
> +			mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
> +
> +		mlx4_cq_set_ci(&cq->mcq);
> +		wmb(); /* ensure HW sees CQ consumer before we post new buffers */
> +		ring->cons = cq->mcq.cons_index;
> +	}
>  	AVG_PERF_COUNTER(priv->pstats.rx_coal_avg, polled);
> -	mlx4_cq_set_ci(&cq->mcq);
> -	wmb(); /* ensure HW sees CQ consumer before we post new buffers */
> -	ring->cons = cq->mcq.cons_index;
> -	mlx4_en_refill_rx_buffers(priv, ring);
> -	mlx4_en_update_rx_prod_db(ring);
> +
> +	if (mlx4_en_refill_rx_buffers(priv, ring))
> +		mlx4_en_update_rx_prod_db(ring);
> +
>  	return polled;
>  }
>
> @@ -1131,10 +1141,13 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
>  			return budget;
>
>  		/* Current cpu is not according to smp_irq_affinity -
> -		 * probably affinity changed. need to stop this NAPI
> -		 * poll, and restart it on the right CPU
> +		 * probably affinity changed. Need to stop this NAPI
> +		 * poll, and restart it on the right CPU.
> +		 * Try to avoid returning a too small value (like 0),
> +		 * to not fool net_rx_action() and its netdev_budget
>  		 */
> -		done = 0;
> +		if (done)
> +			done--;
>  	}
>  	/* Done for now */
>  	if (napi_complete_done(napi, done))
>
>
Thanks Eric.
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Thread overview: 7+ messages
2016-11-18 20:15 [PATCH net-next] mlx4: avoid unnecessary dirtying of critical fields Eric Dumazet
2016-11-20 15:14 ` Tariq Toukan
2016-11-20 17:15 ` Eric Dumazet
2016-11-20 17:24 ` [PATCH v2 " Eric Dumazet
2016-11-20 17:26 ` Eric Dumazet
2016-11-21 9:09 ` Tariq Toukan [this message]
2016-11-21 16:33 ` David Miller