From: Jens Axboe <axboe@kernel.dk>
To: Aurelien Aptel <aaptel@nvidia.com>,
linux-nvme@lists.infradead.org, netdev@vger.kernel.org,
sagi@grimberg.me, hch@lst.de, kbusch@kernel.org, axboe@fb.com,
chaitanyak@nvidia.com, davem@davemloft.net, kuba@kernel.org
Cc: aurelien.aptel@gmail.com, smalin@nvidia.com, malin1024@gmail.com,
ogerlitz@nvidia.com, yorayz@nvidia.com, borisp@nvidia.com,
galshalom@nvidia.com, mgurtovoy@nvidia.com, tariqt@nvidia.com,
gus@collabora.com, viro@zeniv.linux.org.uk,
akpm@linux-foundation.org
Subject: Re: [PATCH v30 03/20] iov_iter: skip copy if src == dst for direct data placement
Date: Thu, 31 Jul 2025 07:29:25 -0600
Message-ID: <d7ddaf65-00a0-4b90-b596-5db1a5169950@kernel.dk>
In-Reply-To: <253zfcs2qaw.fsf@nvidia.com>
On 7/25/25 10:22 AM, Aurelien Aptel wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>> This seems like entirely the wrong place to apply this logic...
>
> I understand it might look strange at first sight to have the check
> implemented in this low-level iterator copy function.
>
> But it is actually where it makes the most sense.
Please stop rationalizing what is a blatant layering violation.
> Let's assume we move it to the caller, nvme-tcp. We now need to read
> our packets from the socket, but somehow skip ahead when we reach the
> offloaded payload. There is no function for that in the socket API. We
> can walk the SKB fragments, but a single SKB can contain a mix of
> offloaded and non-offloaded chunks. You would then need to re-implement
> skb_copy_datagram() at the socket layer to know what to copy and where.
> Even if you reuse the callback API, you have to take care of the
> destination bvec iterator. You end up duplicating that iterator
> map/step/advance logic.
>
> Our design is transparent to applications. The application does not need
> to handle the skipping logic, and the skipping is done in a generic way
> in the underlying copy function. It will also work for other protocols
> with no extra code. All the work will happen in the driver, which needs
> to construct SKBs using the application buffers. Making drivers
> communicate information to the upper layer is already a common
> practice. The SKBs are already assembled by drivers today according to
> how the data is received from the network.
>
> We have covered some of the design decisions in previous discussions,
> most recently on v25. I suggest you take a look [1].
>
> Regarding performance, we were not able to see any measurable
> difference between fio runs with the CONFIG disabled and enabled.
I'm not at all worried about performance; I'm worried about littering
completely generic helper code with random driver checks. And you
clearly WERE worried about performance, since otherwise why would you
even add an IS_ENABLED(CONFIG_ULP_DDP) check in the first place?
--
Jens Axboe
Thread overview: 28+ messages
2025-07-15 13:27 [PATCH v30 00/20] nvme-tcp receive offloads Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 01/20] net: Introduce direct data placement tcp offload Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 02/20] netlink: add new family to manage ULP_DDP enablement and stats Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 03/20] iov_iter: skip copy if src == dst for direct data placement Aurelien Aptel
2025-07-15 15:06 ` Jens Axboe
2025-07-15 17:08 ` Chaitanya Kulkarni
2025-07-15 17:14 ` Jens Axboe
2025-07-25 16:22 ` Aurelien Aptel
2025-07-31 13:29 ` Jens Axboe [this message]
2025-07-15 13:27 ` [PATCH v30 04/20] net/tls,core: export get_netdev_for_sock Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 05/20] nvme-tcp: Add DDP offload control path Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 06/20] nvme-tcp: Add DDP data-path Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 07/20] nvme-tcp: RX DDGST offload Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 08/20] nvme-tcp: Deal with netdevice DOWN events Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 09/20] Documentation: add ULP DDP offload documentation Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 10/20] net/mlx5e: Rename from tls to transport static params Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 11/20] net/mlx5e: Refactor ico sq polling to get budget Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 12/20] net/mlx5: Add NVMEoTCP caps, HW bits, 128B CQE and enumerations Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 13/20] net/mlx5e: NVMEoTCP, offload initialization Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 14/20] net/mlx5e: TCP flow steering for nvme-tcp acceleration Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 15/20] net/mlx5e: NVMEoTCP, use KLM UMRs for buffer registration Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 16/20] net/mlx5e: NVMEoTCP, queue init/teardown Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 17/20] net/mlx5e: NVMEoTCP, ddp setup and resync Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 18/20] net/mlx5e: NVMEoTCP, async ddp invalidation Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 19/20] net/mlx5e: NVMEoTCP, data-path for DDP+DDGST offload Aurelien Aptel
2025-07-15 13:27 ` [PATCH v30 20/20] net/mlx5e: NVMEoTCP, statistics Aurelien Aptel
2025-07-31 12:40 ` [PATCH v30 00/20] nvme-tcp receive offloads Aurelien Aptel
2025-07-31 14:20 ` Jakub Kicinski