netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next] net/mlx5e: Transmit small messages in linear skb
@ 2024-12-04 14:02 Alexandra Winter
  2024-12-04 14:16 ` Eric Dumazet
  2024-12-04 14:32 ` Alexander Lobakin
  0 siblings, 2 replies; 21+ messages in thread
From: Alexandra Winter @ 2024-12-04 14:02 UTC (permalink / raw)
  To: Rahul Rameshbabu, Saeed Mahameed, Tariq Toukan, Leon Romanovsky,
	David Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn
  Cc: Nils Hoppmann, netdev, linux-s390, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, Christian Borntraeger, Sven Schnelle,
	Thorsten Winkler, Simon Horman

Linearize the skb if the device uses IOMMU and the data buffer can fit
into one page. So messages can be transferred in one transfer to the card
instead of two.

Performance issue:
------------------
Since commit 472c2e07eef0 ("tcp: add one skb cache for tx")
tcp skbs are always non-linear. Especially on platforms with IOMMU,
mapping and unmapping two pages instead of one per transfer can make a
noticeable difference. On s390 we saw a 13% degradation in throughput,
when running uperf with a request-response pattern with 1k payload and
250 connections parallel. See [0] for a discussion.

This patch mitigates these effects using a work-around in the mlx5 driver.

Notes on implementation:
------------------------
TCP skbs never contain any tailroom, so skb_linearize() will allocate a
new data buffer.
No need to handle rc of skb_linearize(). If it fails, we continue with the
unchanged skb.

As mentioned in the discussion, an alternative, but more invasive approach
would be: premapping a coherent piece of memory in which you can copy
small skbs.

Measurement results:
--------------------
We see an improvement in throughput of up to 16% compared to kernel v6.12.
We measured throughput and CPU consumption of uperf benchmarks with
ConnectX-6 cards on s390 architecture and compared results of kernel v6.12
with and without this patch.

+------------------------------------------+
| Transactions per Second - Deviation in % |
+-------------------+----------------------+
| Workload          |                      |
|  rr1c-1x1--50     |          4.75        |
|  rr1c-1x1-250     |         14.53        |
| rr1c-200x1000--50 |          2.22        |
| rr1c-200x1000-250 |         12.24        |
+-------------------+----------------------+
| Server CPU Consumption - Deviation in %  |
+-------------------+----------------------+
| Workload          |                      |
|  rr1c-1x1--50     |         -1.66        |
|  rr1c-1x1-250     |        -10.00        |
| rr1c-200x1000--50 |         -0.83        |
| rr1c-200x1000-250 |         -8.71        |
+-------------------+----------------------+

Note:
- CPU consumption: less is better
- Client CPU consumption is similar
- Workload:
  rr1c-<bytes send>x<bytes received>-<parallel connections>

  Highly transactional small data sizes (rr1c-1x1)
    This is a Request & Response (RR) test that sends a 1-byte request
    from the client and receives a 1-byte response from the server. This
    is the smallest possible transactional workload test and is smaller
    than most customer workloads. This test represents the RR overhead
    costs.
  Highly transactional medium data sizes (rr1c-200x1000)
    Request & Response (RR) test that sends a 200-byte request from the
    client and receives a 1000-byte response from the server. This test
    should be representative of a typical user's interaction with a remote
    web site.

Link: https://lore.kernel.org/netdev/20220907122505.26953-1-wintera@linux.ibm.com/#t [0]
Suggested-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Co-developed-by: Nils Hoppmann <niho@linux.ibm.com>
Signed-off-by: Nils Hoppmann <niho@linux.ibm.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index f8c7912abe0e..421ba6798ca7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -32,6 +32,7 @@
 
 #include <linux/tcp.h>
 #include <linux/if_vlan.h>
+#include <linux/iommu-dma.h>
 #include <net/geneve.h>
 #include <net/dsfield.h>
 #include "en.h"
@@ -269,6 +270,10 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 {
 	struct mlx5e_sq_stats *stats = sq->stats;
 
+	/* Don't require 2 IOMMU TLB entries, if one is sufficient */
+	if (use_dma_iommu(sq->pdev) && skb->truesize <= PAGE_SIZE)
+		skb_linearize(skb);
+
 	if (skb_is_gso(skb)) {
 		int hopbyhop;
 		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-12-13 20:41 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-04 14:02 [PATCH net-next] net/mlx5e: Transmit small messages in linear skb Alexandra Winter
2024-12-04 14:16 ` Eric Dumazet
2024-12-04 14:35   ` Alexandra Winter
2024-12-04 14:36   ` Eric Dumazet
2024-12-06 14:47     ` David Laight
2024-12-06 16:35       ` Eric Dumazet
2024-12-06 15:25     ` Alexandra Winter
2024-12-10 11:49       ` Dragos Tatulea
2024-12-11 16:19         ` Alexandra Winter
2024-12-11 17:36           ` Dragos Tatulea
2024-12-04 14:32 ` Alexander Lobakin
2024-12-06 15:20   ` Alexandra Winter
2024-12-09 11:36     ` Tariq Toukan
2024-12-10 11:44     ` Dragos Tatulea
2024-12-10 13:54       ` Alexander Lobakin
2024-12-10 17:10         ` Joe Damato
2024-12-11 13:35         ` Alexandra Winter
2024-12-11 17:28           ` Dragos Tatulea
2024-12-11 17:50             ` Niklas Schnelle
2024-12-13 20:41               ` Dragos Tatulea
2024-12-12 10:36             ` Christian Borntraeger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).