* transferring bvecs over the network in drbd
@ 2025-05-08  6:45 Christoph Hellwig
  2025-05-08  8:39 ` Lars Ellenberg
  0 siblings, 1 reply; 3+ messages in thread
From: Christoph Hellwig @ 2025-05-08  6:45 UTC (permalink / raw)
  To: Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
  Cc: drbd-dev, linux-block

Hi all,

I recently went over code that directly accesses the bio_vec bv_page/
bv_offset members, and the code in _drbd_send_bio/_drbd_send_zc_bio
came to my attention.

It iterates the bio to kmap all segments, and then either does a
sock_sendmsg on a newly created kvec iter, or one on a new bvec iter
for each segment.  The former can't work on highmem systems, and both
versions are rather inefficient.

What is preventing drbd from doing a single sock_sendmsg with the
bvec payload?  nvme-tcp (nvme_tcp_init_iter) is a good example of
doing that, as is the sunrpc svcsock code using its local bvec list
(svc_tcp_sendmsg).
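
Roughly something like the following, as an illustrative sketch only
(not a patch): it assumes the bio has not been partially advanced
(bi_idx and bi_bvec_done are both zero) and ignores partial sends and
error handling.

#include <linux/bio.h>
#include <linux/net.h>
#include <linux/uio.h>

/*
 * Hypothetical helper (not drbd code): describe all of a bio's data with
 * a single ITER_BVEC iov_iter and hand it to the socket in one
 * sock_sendmsg() call, in the spirit of nvme_tcp_init_iter() and
 * svc_tcp_sendmsg().
 */
static int example_send_bio(struct socket *sock, struct bio *bio)
{
	struct msghdr msg = { .msg_flags = MSG_NOSIGNAL };

	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bio->bi_io_vec,
		      bio->bi_vcnt, bio->bi_iter.bi_size);
	return sock_sendmsg(sock, &msg);
}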


* Re: transferring bvecs over the network in drbd
  2025-05-08  6:45 transferring bvecs over the network in drbd Christoph Hellwig
@ 2025-05-08  8:39 ` Lars Ellenberg
  2025-05-08 10:06   ` Christoph Hellwig
  0 siblings, 1 reply; 3+ messages in thread
From: Lars Ellenberg @ 2025-05-08  8:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Philipp Reisner, Christoph Böhmwalder, drbd-dev, linux-block

On Wed, May 07, 2025 at 11:45:50PM -0700, Christoph Hellwig wrote:
> Hi all,
> 
> I recently went over code that directly accesses the bio_vec bv_page/
> bv_offset members, and the code in _drbd_send_bio/_drbd_send_zc_bio
> came to my attention.
> 
> It iterates the bio to kmap all segments, and then either does a
> sock_sendmsg on a newly created kvec iter, or one on a new bvec iter
> for each segment.  The former can't work on highmem systems, and both
> versions are rather inefficient.
> 
> What is preventing drbd from doing a single sock_sendmsg with the
> bvec payload?  nvme-tcp (nvme_tcp_init_iter) is a good example of
> doing that, as is the sunrpc svcsock code using its local bvec list
> (svc_tcp_sendmsg).

For async replication we want to actually copy the data into a send
buffer: we cannot have the network stack hold a reference to a page for
which we have already signalled I/O completion.

For sync replication we want to avoid an additional data copy if
possible, so we try to use "zero copy sendpage".

That's why we have two variants of what looks to be the same thing.
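
Schematically (a simplified sketch, not the actual drbd code, ignoring
partial sends), the copy variant boils down to sending each segment
through a kvec, which copies the data into socket buffers so we can
complete the bio as soon as the call returns:

#include <linux/bio.h>
#include <linux/highmem.h>
#include <linux/net.h>
#include <linux/uio.h>

/* Illustrative per-segment copying send; the name is made up. */
static int example_send_bio_copy(struct socket *sock, struct bio *bio)
{
	struct bio_vec bvec;
	struct bvec_iter iter;

	bio_for_each_segment(bvec, bio, iter) {
		struct msghdr msg = { .msg_flags = MSG_NOSIGNAL };
		void *addr = kmap_local_page(bvec.bv_page);
		struct kvec kv = {
			.iov_base = addr + bvec.bv_offset,
			.iov_len  = bvec.bv_len,
		};
		int ret = kernel_sendmsg(sock, &msg, &kv, 1, kv.iov_len);

		kunmap_local(addr);
		if (ret < 0)
			return ret;
	}
	return 0;
}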

Why we do it that way: probably when we wrote that part,
a better infrastructure was not available, or we were not aware of it.
Thanks for the pointers, we'll look into it.
Using more efficient ways to do stuff sounds good.

    Lars



* Re: transferring bvecs over the network in drbd
  2025-05-08  8:39 ` Lars Ellenberg
@ 2025-05-08 10:06   ` Christoph Hellwig
  0 siblings, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2025-05-08 10:06 UTC (permalink / raw)
  To: Lars Ellenberg
  Cc: Christoph Hellwig, Philipp Reisner, Christoph Böhmwalder,
	drbd-dev, linux-block

On Thu, May 08, 2025 at 10:39:56AM +0200, Lars Ellenberg wrote:
> For async replication we want to actually copy the data into a send
> buffer: we cannot have the network stack hold a reference to a page for
> which we have already signalled I/O completion.
> 
> For sync replication we want to avoid an additional data copy if
> possible, so we try to use "zero copy sendpage".

I didn't even complain about having both variants :)

> 
> That's why we have two variants of what looks to be the same thing.
> 
> Why we do it that way: probably when we wrote that part,
> a better infrastructure was not available, or we were not aware of it.

Yes.  While iov_iter and the bvec version of it have been around
for a long time, drbd probably still predates them.

> Thanks for the pointers, we'll look into it.
> Using more efficient ways to do stuff sounds good.

Thanks.  Note that now that ->sendpage has been replaced with the
MSG_SPLICE_PAGES flag, you can actually share most of the code for both
variants as well.
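
For example, a rough sketch of what a shared helper could look like
(the function name and the zero_copy parameter are made up; partial
sends and error handling are ignored):

#include <linux/bvec.h>
#include <linux/net.h>
#include <linux/socket.h>
#include <linux/types.h>
#include <linux/uio.h>

/*
 * Hypothetical shared send path: one bvec-based sock_sendmsg(), where
 * MSG_SPLICE_PAGES selects zero copy (the socket keeps references to the
 * pages) and leaving it off means the data is copied into socket buffers.
 */
static int example_send_bvecs(struct socket *sock, struct bio_vec *bvecs,
			      unsigned int nr_segs, size_t len,
			      bool zero_copy)
{
	struct msghdr msg = { .msg_flags = MSG_NOSIGNAL };

	if (zero_copy)
		msg.msg_flags |= MSG_SPLICE_PAGES;

	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvecs, nr_segs, len);
	return sock_sendmsg(sock, &msg);
}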

