From: Leon Romanovsky <leon@kernel.org>
To: Aurelien Aptel <aaptel@nvidia.com>
Cc: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
edumazet@google.com, pabeni@redhat.com, saeedm@nvidia.com,
tariqt@nvidia.com, linux-nvme@lists.infradead.org,
sagi@grimberg.me, hch@lst.de, kbusch@kernel.org, axboe@fb.com,
chaitanyak@nvidia.com, smalin@nvidia.com, ogerlitz@nvidia.com,
yorayz@nvidia.com, borisp@nvidia.com, aurelien.aptel@gmail.com,
malin1024@gmail.com
Subject: Re: [PATCH v6 01/23] net: Introduce direct data placement tcp offload
Date: Mon, 24 Oct 2022 14:22:07 +0300 [thread overview]
Message-ID: <Y1Z1XzLusUVjfnLp@unreal> (raw)
In-Reply-To: <20221020101838.2712846-2-aaptel@nvidia.com>
On Thu, Oct 20, 2022 at 01:18:16PM +0300, Aurelien Aptel wrote:
> From: Boris Pismenny <borisp@nvidia.com>
>
> This commit introduces direct data placement (DDP) offload for TCP.
>
> The motivation is saving compute resources/cycles that are spent
> to copy data from SKBs to the block layer buffers and CRC
> calculation/verification for received PDUs (Protocol Data Units).
>
> The DDP capability is accompanied by new net_device operations that
> configure hardware contexts.
>
> There is a context per socket, and a context per DDP operation.
> Additionally, a resynchronization routine is used to assist
> hardware handle TCP OOO, and continue the offload. Furthermore,
> we let the offloading driver advertise what is the max hw
> sectors/segments.
>
> The interface includes five net-device ddp operations:
>
> 1. sk_add - add offload for the queue represented by socket+config pair
> 2. sk_del - remove the offload for the socket/queue
> 3. ddp_setup - request copy offload for buffers associated with an IO
> 4. ddp_teardown - release offload resources for that IO
> 5. limits - query NIC driver for quirks and limitations (e.g.
> max number of scatter gather entries per IO)
>
> Using this interface, the NIC hardware will scatter TCP payload
> directly to the BIO pages according to the command_id.
>
> To maintain the correctness of the network stack, the driver is
> expected to construct SKBs that point to the BIO pages.
>
> The SKB passed to the network stack from the driver represents
> data as it is on the wire, while it is pointing directly to data
> in destination buffers.
>
> As a result, data from page frags should not be copied out to
> the linear part. To avoid needless copies, such as when using
> skb_condense, we mark the skb->ddp bit.
> In addition, the skb->crc will be used by the upper layers to
> determine if CRC re-calculation is required. The two separated skb
> indications are needed to avoid false positives GRO flushing events.
>
> Follow-up patches will use this interface for DDP in NVMe-TCP.
>
> Signed-off-by: Boris Pismenny <borisp@nvidia.com>
> Signed-off-by: Ben Ben-Ishay <benishay@nvidia.com>
> Signed-off-by: Or Gerlitz <ogerlitz@nvidia.com>
> Signed-off-by: Yoray Zack <yorayz@nvidia.com>
> Signed-off-by: Shai Malin <smalin@nvidia.com>
> Signed-off-by: Aurelien Aptel <aaptel@nvidia.com>
> ---
> include/linux/netdev_features.h | 3 +-
> include/linux/netdevice.h | 5 +
> include/linux/skbuff.h | 11 ++
> include/net/inet_connection_sock.h | 4 +
> include/net/ulp_ddp.h | 171 +++++++++++++++++++++++++++++
> net/Kconfig | 10 ++
> net/core/skbuff.c | 3 +-
> net/ethtool/common.c | 1 +
> net/ipv4/tcp_input.c | 8 ++
> net/ipv4/tcp_ipv4.c | 3 +
> net/ipv4/tcp_offload.c | 3 +
> 11 files changed, 220 insertions(+), 2 deletions(-)
> create mode 100644 include/net/ulp_ddp.h
>
> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> index 7c2d77d75a88..bf7391aa04c7 100644
> --- a/include/linux/netdev_features.h
> +++ b/include/linux/netdev_features.h
> @@ -14,7 +14,7 @@ typedef u64 netdev_features_t;
> enum {
> NETIF_F_SG_BIT, /* Scatter/gather IO. */
> NETIF_F_IP_CSUM_BIT, /* Can checksum TCP/UDP over IPv4. */
> - __UNUSED_NETIF_F_1,
> + NETIF_F_HW_ULP_DDP_BIT, /* ULP direct data placement offload */
> NETIF_F_HW_CSUM_BIT, /* Can checksum all the packets. */
> NETIF_F_IPV6_CSUM_BIT, /* Can checksum TCP/UDP over IPV6 */
> NETIF_F_HIGHDMA_BIT, /* Can DMA to high memory. */
> @@ -168,6 +168,7 @@ enum {
> #define NETIF_F_HW_HSR_TAG_RM __NETIF_F(HW_HSR_TAG_RM)
> #define NETIF_F_HW_HSR_FWD __NETIF_F(HW_HSR_FWD)
> #define NETIF_F_HW_HSR_DUP __NETIF_F(HW_HSR_DUP)
> +#define NETIF_F_HW_ULP_DDP __NETIF_F(HW_ULP_DDP)
>
> /* Finds the next feature with the highest number of the range of start-1 till 0.
> */
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a36edb0ec199..ff6f4978723a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1043,6 +1043,7 @@ struct dev_ifalias {
<...>
> @@ -982,6 +984,15 @@ struct sk_buff {
> #endif
> __u8 slow_gro:1;
> __u8 csum_not_inet:1;
> +#ifdef CONFIG_ULP_DDP
> + __u8 ulp_ddp:1;
> + __u8 ulp_crc:1;
> +#define IS_ULP_DDP(skb) ((skb)->ulp_ddp)
> +#define IS_ULP_CRC(skb) ((skb)->ulp_crc)
> +#else
> +#define IS_ULP_DDP(skb) (0)
> +#define IS_ULP_CRC(skb) (0)
All users of this define are protected by ifdef CONFIG_ULP_DDP.
It is easier to wrap user of IS_ULP_DDP() too and remove #else
lag from here.
> +#endif
>
> #ifdef CONFIG_NET_SCHED
> __u16 tc_index; /* traffic control index */
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index c2b15f7e5516..2ba73167b3bb 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -68,6 +68,8 @@ struct inet_connection_sock_af_ops {
> * @icsk_ulp_ops Pluggable ULP control hook
> * @icsk_ulp_data ULP private data
> * @icsk_clean_acked Clean acked data hook
> + * @icsk_ulp_ddp_ops Pluggable ULP direct data placement control hook
> + * @icsk_ulp_ddp_data ULP direct data placement private data
> * @icsk_ca_state: Congestion control state
> * @icsk_retransmits: Number of unrecovered [RTO] timeouts
> * @icsk_pending: Scheduled timer event
> @@ -98,6 +100,8 @@ struct inet_connection_sock {
> const struct tcp_ulp_ops *icsk_ulp_ops;
> void __rcu *icsk_ulp_data;
> void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
> + const struct ulp_ddp_ulp_ops *icsk_ulp_ddp_ops;
> + void __rcu *icsk_ulp_ddp_data;
> unsigned int (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
> __u8 icsk_ca_state:5,
> icsk_ca_initialized:1,
> diff --git a/include/net/ulp_ddp.h b/include/net/ulp_ddp.h
> new file mode 100644
> index 000000000000..57cc4ab22e18
> --- /dev/null
> +++ b/include/net/ulp_ddp.h
> @@ -0,0 +1,171 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * ulp_ddp.h
> + * Author: Boris Pismenny <borisp@nvidia.com>
> + * Copyright (C) 2022 NVIDIA CORPORATION & AFFILIATES.
The official format is:
Copyright (C) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
^^^^ ^^^^^^^^^^^^^^^^^^
<...>
> +config ULP_DDP
> + bool "ULP direct data placement offload"
> + default n
No need to set "n" explicitly, it is already default.
Thanks
next prev parent reply other threads:[~2022-10-24 11:22 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-20 10:18 [PATCH v6 00/23] nvme-tcp receive offloads Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 01/23] net: Introduce direct data placement tcp offload Aurelien Aptel
2022-10-21 5:05 ` Jakub Kicinski
2022-10-21 11:19 ` Aurelien Aptel
2022-10-24 11:22 ` Leon Romanovsky [this message]
2022-10-24 13:09 ` Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 02/23] iov_iter: DDP copy to iter/pages Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 03/23] net/tls: export get_netdev_for_sock Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 04/23] Revert "nvme-tcp: remove the unused queue_size member in nvme_tcp_queue" Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 05/23] nvme-tcp: Add DDP offload control path Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 06/23] nvme-tcp: Add DDP data-path Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 07/23] nvme-tcp: RX DDGST offload Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 08/23] nvme-tcp: Deal with netdevice DOWN events Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 09/23] nvme-tcp: Add ulp_offload modparam to control enablement of ULP offload Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 10/23] Documentation: add ULP DDP offload documentation Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 11/23] net/mlx5e: Rename from tls to transport static params Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 12/23] net/mlx5e: Refactor ico sq polling to get budget Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 13/23] net/mlx5e: Have mdev pointer directly on the icosq structure Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 14/23] net/mlx5e: Refactor doorbell function to allow avoiding a completion Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 15/23] net/mlx5: Add NVMEoTCP caps, HW bits, 128B CQE and enumerations Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 16/23] net/mlx5e: NVMEoTCP, offload initialization Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 17/23] net/mlx5e: TCP flow steering for nvme-tcp acceleration Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 18/23] net/mlx5e: NVMEoTCP, use KLM UMRs for buffer registration Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 19/23] net/mlx5e: NVMEoTCP, queue init/teardown Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 20/23] net/mlx5e: NVMEoTCP, ddp setup and resync Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 21/23] net/mlx5e: NVMEoTCP, async ddp invalidation Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 22/23] net/mlx5e: NVMEoTCP, data-path for DDP+DDGST offload Aurelien Aptel
2022-10-20 10:18 ` [PATCH v6 23/23] net/mlx5e: NVMEoTCP, statistics Aurelien Aptel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y1Z1XzLusUVjfnLp@unreal \
--to=leon@kernel.org \
--cc=aaptel@nvidia.com \
--cc=aurelien.aptel@gmail.com \
--cc=axboe@fb.com \
--cc=borisp@nvidia.com \
--cc=chaitanyak@nvidia.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=hch@lst.de \
--cc=kbusch@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=malin1024@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=ogerlitz@nvidia.com \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=sagi@grimberg.me \
--cc=smalin@nvidia.com \
--cc=tariqt@nvidia.com \
--cc=yorayz@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).