From: Alexandra Winter <wintera@linux.ibm.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Rahul Rameshbabu <rrameshbabu@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Nils Hoppmann <niho@linux.ibm.com>,
netdev@vger.kernel.org, linux-s390@vger.kernel.org,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Thorsten Winkler <twinkler@linux.ibm.com>,
Simon Horman <horms@kernel.org>
Subject: Re: [PATCH net-next] net/mlx5e: Transmit small messages in linear skb
Date: Wed, 4 Dec 2024 15:35:55 +0100
Message-ID: <eb29649e-836e-44b8-b364-2ed736bad3ee@linux.ibm.com>
In-Reply-To: <CANn89i+DX-b4PM4R2uqtcPmztCxe_Onp7Vk+uHU4E6eW1H+=zA@mail.gmail.com>
On 04.12.24 15:16, Eric Dumazet wrote:
> On Wed, Dec 4, 2024 at 3:02 PM Alexandra Winter <wintera@linux.ibm.com> wrote:
>>
>> Linearize the skb if the device uses IOMMU and the data buffer can fit
>> into one page. So messages can be transferred in one transfer to the card
>> instead of two.
>>
>> Performance issue:
>> ------------------
>> Since commit 472c2e07eef0 ("tcp: add one skb cache for tx")
>> tcp skbs are always non-linear. Especially on platforms with IOMMU,
>> mapping and unmapping two pages instead of one per transfer can make a
>> noticeable difference. On s390 we saw a 13% degradation in throughput
>> when running uperf with a request-response pattern with 1k payload and
>> 250 parallel connections. See [0] for a discussion.
>>
>> This patch mitigates these effects using a work-around in the mlx5 driver.
>>
>> Notes on implementation:
>> ------------------------
>> TCP skbs never contain any tailroom, so skb_linearize() will allocate a
>> new data buffer.
>> No need to handle rc of skb_linearize(). If it fails, we continue with the
>> unchanged skb.
>>
>> As mentioned in the discussion, an alternative, but more invasive approach
>> would be: premapping a coherent piece of memory in which you can copy
>> small skbs.
>>
>> Measurement results:
>> --------------------
>> We see an improvement in throughput of up to 16% compared to kernel v6.12.
>> We measured throughput and CPU consumption of uperf benchmarks with
>> ConnectX-6 cards on s390 architecture and compared results of kernel v6.12
>> with and without this patch.
>>
>> +------------------------------------------+
>> | Transactions per Second - Deviation in % |
>> +-------------------+----------------------+
>> | Workload          | Deviation in %       |
>> | rr1c-1x1--50      |                 4.75 |
>> | rr1c-1x1-250      |                14.53 |
>> | rr1c-200x1000--50 |                 2.22 |
>> | rr1c-200x1000-250 |                12.24 |
>> +-------------------+----------------------+
>> | Server CPU Consumption - Deviation in %  |
>> +-------------------+----------------------+
>> | Workload          | Deviation in %       |
>> | rr1c-1x1--50      |                -1.66 |
>> | rr1c-1x1-250      |               -10.00 |
>> | rr1c-200x1000--50 |                -0.83 |
>> | rr1c-200x1000-250 |                -8.71 |
>> +-------------------+----------------------+
>>
>> Note:
>> - CPU consumption: less is better
>> - Client CPU consumption is similar
>> - Workload:
>>   rr1c-<bytes sent>x<bytes received>-<parallel connections>
>>
>>   Highly transactional small data sizes (rr1c-1x1)
>>     This is a Request & Response (RR) test that sends a 1-byte request
>>     from the client and receives a 1-byte response from the server. This
>>     is the smallest possible transactional workload test and is smaller
>>     than most customer workloads. This test represents the RR overhead
>>     costs.
>>   Highly transactional medium data sizes (rr1c-200x1000)
>>     Request & Response (RR) test that sends a 200-byte request from the
>>     client and receives a 1000-byte response from the server. This test
>>     should be representative of a typical user's interaction with a remote
>>     web site.
>>
>> Link: https://lore.kernel.org/netdev/20220907122505.26953-1-wintera@linux.ibm.com/#t [0]
>> Suggested-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
>> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
>> Co-developed-by: Nils Hoppmann <niho@linux.ibm.com>
>> Signed-off-by: Nils Hoppmann <niho@linux.ibm.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> index f8c7912abe0e..421ba6798ca7 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>> @@ -32,6 +32,7 @@
>>
>> #include <linux/tcp.h>
>> #include <linux/if_vlan.h>
>> +#include <linux/iommu-dma.h>
>> #include <net/geneve.h>
>> #include <net/dsfield.h>
>> #include "en.h"
>> @@ -269,6 +270,10 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
>>  {
>>  	struct mlx5e_sq_stats *stats = sq->stats;
>>
>> +	/* Don't require 2 IOMMU TLB entries, if one is sufficient */
>> +	if (use_dma_iommu(sq->pdev) && skb->truesize <= PAGE_SIZE)
>> +		skb_linearize(skb);
>> +
>>  	if (skb_is_gso(skb)) {
>>  		int hopbyhop;
>>  		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop);
>> --
>> 2.45.2
>
>
> Was this tested on x86_64 or any other arch than s390, especially ones
> with PAGE_SIZE = 65536 ?
>
No, I don't have an mlx5 card in an x86_64 machine.
Rahul, could you test this patch?
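
For illustration only, here is a minimal userspace sketch of why the page size
matters for the condition in the patch. It is not kernel code: the function
name would_linearize() and its parameters are hypothetical stand-ins for
use_dma_iommu(sq->pdev) and skb->truesize, and the page size is passed in
explicitly so that 4 KiB and 64 KiB configurations can be compared.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the check added by the patch:
 *     use_dma_iommu(sq->pdev) && skb->truesize <= PAGE_SIZE
 * page_size is a parameter here (an assumption for illustration); in the
 * kernel, PAGE_SIZE is a compile-time constant of the architecture.
 */
bool would_linearize(bool uses_iommu, size_t truesize, size_t page_size)
{
	/* Linearize only behind an IOMMU and only if the skb fits in one page. */
	return uses_iommu && truesize <= page_size;
}
```

Under this model, an skb with a truesize of 16384 would be left alone with a
4096-byte page but would be copied with a 65536-byte page, so the amount of
data copied per skb can be much larger on 64 KiB page configurations.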