From: Vladislav Bolkhovitin <vst@vlnb.net>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: "David M. Lloyd" <dmlloyd@flurg.com>,
linux-mm@kvack.org, Christoph Hellwig <hch@infradead.org>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
scst-devel@lists.sourceforge.net,
Bart Van Assche <bart.vanassche@gmail.com>,
netdev@vger.kernel.org
Subject: Re: [RFC]: Support for zero-copy TCP transmit of user space data
Date: Tue, 23 Dec 2008 22:11:24 +0300 [thread overview]
Message-ID: <495137DC.8050204@vlnb.net> (raw)
In-Reply-To: <20081219192736.GQ32491@kernel.dk>
Jens Axboe, on 12/19/2008 10:27 PM wrote:
>>>>>> An iSCSI target driver iSCSI-SCST was a part of the patchset
>>>>>> (http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to
>>>>>> have TCP zero-copy transmit of user space data was implemented. Patch,
>>>>>> implementing this optimization was also sent in the patchset, see
>>>>>> http://lkml.org/lkml/2008/12/10/296.
>>>>> I'm probably ignorant of about 90% of the context here, but isn't this
>>>>> the sort of problem that was supposed to have been solved by vmsplice(2)?
>>>> No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But,
>>>> even if it was a user space driver, vmsplice wouldn't change anything
>>>> much. It doesn't have a possibility for a user to know, when
>>>> transmission of the data finished. So, it is intended to be used as:
>>>> vmsplice() buffer -> munmap() the buffer -> mmap() new buffer ->
>>>> vmsplice() it. But on the mmap() stage kernel has to zero all the newly
>>>> mapped pages and zeroing memory isn't much faster, than copying it.
>>>> Hence, there would be no considerable performance increase.
>>> vmsplice() isn't the right choice, but splice() very well could be. You
>>> could easily use splice internally as well. The vmsplice() part sort-of
>>> applies in the sense that you want to fill pages into a pipe, which is
>>> essentially what vmsplice() does. You'd need some helper to do that.
>> Sorry, Jens, but splice() works only if there is a file handle on the
>> another side, so user space doesn't see data buffers. But SCST needs to
>> serve a wider usage cases, like reading data with decompression from a
>> virtual tape, where decompression is done in user space. For those only
>> complete zero-copy network send, which I implemented, can give the best
>> performance.
>
> __splice_from_pipe() takes a pipe, a descriptor and an actor. There's
> absolutely ZERO reason you could not reuse most of that for this
> implementation. The big bonus here is that getting the put correct from
> networking would even make splice() better for everyone. Win for Linux,
> win for you since it'll make it MUCH easier for you to get this stuff
> in. Looking at your original patch and I almost think it's a flame bait
> to induce discussion (nothing wrong with that, that approach works quite
> well and has been used before). There's no way in HELL that it'd ever be
> a merge candidate. And I suspect you know that, at least I hope you do
> or you are farther away from going forward with this than you think.
>
> So don't look at splice() the system call, look at the infrastructure
> and check if that could be useful for your case. To me it looks
> absolutely like it could, if you goal is just zero-copy transmit.
I looked at the splice code again to make sure I don't miss anything.
__splice_from_pipe() leads to pipe_to_sendpage(), which leads to
sock_sendpage, then to sock->sendpage(). Sorry, but I don't see any
point why to go over all the complicated splice infrastructure instead
of directly call sock->sendpage(), as I do.
> The
> only missing piece is dropping the reference and signalling page
> consumption at the right point, which is when the data is safe to be
> reused. That very bit is missing, but that should be all as far as I can
> tell.
This is exactly what I implemented in the patch we are discussing.
>>> And
>>> the ack-on-xmit-done bits is something that splice-to-socket needs
>>> anyway, so I think it'd be quite a suitable choice for this.
>> So, are you writing that splice() could also benefit from the zero-copy
>> transmit feature, like I implemented?
>
> I like how you want to reinvent everything, perhaps you should spend a
> little more time looking into various other approaches? splice() already
> does zero-copy network transmit, there are no copies going on. Ideally,
> you'd have zero copies moving data into your pipe, but migrade/move
> isn't quite there yet. But that doesn't apply to your case at all.
>
> What is missing, as I wrote, is the 'release on ack' and not on pipe
> buffer release. This is similar to the get_page/put_page stuff you did
> in your patch, but don't go claiming that zero-copy transmit is a
> Vladislav original - the ->sendpage() does no copies.
Jens, I have never claimed I reinvented ->sendpage(). Quite opposite, I
use it. I only extended it by a missing feature. Although, seems, since
you were misleaded, I should apologize for not too good description of
the patch.
Thanks,
Vlad
next prev parent reply other threads:[~2008-12-23 19:11 UTC|newest]
Thread overview: 106+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-10 18:26 [PATCH][RFC 0/23] New SCSI target framework (SCST) and 4 target drivers Vladislav Bolkhovitin
2008-12-10 18:28 ` [PATCH][RFC 1/23]: SCST public headers Vladislav Bolkhovitin
2008-12-10 18:30 ` [PATCH][RFC 2/23]: SCST core Vladislav Bolkhovitin
2008-12-10 19:12 ` Sam Ravnborg
2008-12-11 17:28 ` Vladislav Bolkhovitin
2008-12-11 21:09 ` Sam Ravnborg
2008-12-12 19:24 ` Vladislav Bolkhovitin
2008-12-12 21:50 ` Steven Rostedt
[not found] ` <20081212230523.GB4775@ghostprotocols.net>
2008-12-13 1:25 ` Frédéric Weisbecker
2008-12-13 1:27 ` Frédéric Weisbecker
2008-12-13 14:46 ` Vladislav Bolkhovitin
2008-12-14 0:35 ` Frédéric Weisbecker
2008-12-16 21:49 ` Ingo Molnar
2008-12-16 22:13 ` Frédéric Weisbecker
2008-12-16 22:22 ` Ingo Molnar
2008-12-16 23:46 ` Frédéric Weisbecker
2008-12-18 11:45 ` Vladislav Bolkhovitin
2008-12-20 13:06 ` Frédéric Weisbecker
2008-12-23 19:11 ` Vladislav Bolkhovitin
2008-12-27 11:20 ` Ingo Molnar
2008-12-30 17:13 ` Vladislav Bolkhovitin
2008-12-30 21:03 ` Frederic Weisbecker
2008-12-30 21:35 ` Steven Rostedt
2008-12-10 18:34 ` [PATCH][RFC 3/23]: SCST core docs Vladislav Bolkhovitin
2008-12-10 18:36 ` [PATCH][RFC 4/23]: SCST debug support Vladislav Bolkhovitin
2008-12-10 18:37 ` [PATCH][RFC 5/23]: SCST /proc interface Vladislav Bolkhovitin
2008-12-11 20:23 ` Nicholas A. Bellinger
2008-12-12 19:23 ` Vladislav Bolkhovitin
2008-12-10 18:39 ` [PATCH][RFC 6/23]: SCST SGV cache Vladislav Bolkhovitin
2008-12-10 18:40 ` [PATCH][RFC 7/23]: SCST integration into the kernel Vladislav Bolkhovitin
2008-12-10 18:42 ` [PATCH][RFC 8/23]: SCST pass-through backend handlers Vladislav Bolkhovitin
2008-12-10 18:43 ` [PATCH][RFC 9/23]: SCST virtual disk backend handler Vladislav Bolkhovitin
2008-12-10 18:44 ` [PATCH][RFC 10/23]: SCST user space " Vladislav Bolkhovitin
2008-12-10 18:46 ` [PATCH][RFC 11/23]: Makefile for SCST backend handlers Vladislav Bolkhovitin
2008-12-10 18:47 ` [PATCH][RFC 12/23]: Patch to add necessary support for SCST pass-through Vladislav Bolkhovitin
2008-12-10 18:49 ` [PATCH][RFC 13/23]: Export of alloc_io_context() function Vladislav Bolkhovitin
2008-12-11 13:34 ` Jens Axboe
2008-12-11 18:17 ` Vladislav Bolkhovitin
2008-12-11 18:41 ` Jens Axboe
2008-12-11 19:00 ` Vladislav Bolkhovitin
2008-12-11 19:06 ` Jens Axboe
2008-12-12 19:16 ` Vladislav Bolkhovitin
2008-12-10 18:50 ` [PATCH][RFC 14/23]: Necessary functionality in qla2xxx driver to support target mode Vladislav Bolkhovitin
2008-12-10 18:51 ` [PATCH][RFC 15/23]: QLogic target driver Vladislav Bolkhovitin
2008-12-10 18:54 ` [PATCH][RFC 16/23]: Documentation for " Vladislav Bolkhovitin
2008-12-10 18:55 ` [PATCH][RFC 17/23]: InfiniBand SRP " Vladislav Bolkhovitin
2008-12-10 18:57 ` [PATCH][RFC 18/23]: Documentation for " Vladislav Bolkhovitin
2008-12-10 18:58 ` [PATCH][RFC 19/23]: scst_local " Vladislav Bolkhovitin
2008-12-10 19:00 ` [PATCH][RFC 20/23]: Documentation for scst_local driver Vladislav Bolkhovitin
2008-12-10 19:01 ` [PATCH][RFC 21/23]: iSCSI target driver Vladislav Bolkhovitin
2008-12-11 22:55 ` Nicholas A. Bellinger
2008-12-11 22:59 ` Nicholas A. Bellinger
2008-12-12 19:26 ` Vladislav Bolkhovitin
2008-12-13 10:03 ` Nicholas A. Bellinger
2008-12-13 10:11 ` Bart Van Assche
2008-12-13 10:16 ` Nicholas A. Bellinger
2008-12-13 10:27 ` Bart Van Assche
2008-12-13 15:01 ` Vladislav Bolkhovitin
2008-12-13 14:57 ` Vladislav Bolkhovitin
2008-12-10 19:02 ` [PATCH][RFC 22/23]: Documentation for iSCSI-SCST Vladislav Bolkhovitin
2008-12-10 19:04 ` [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
2008-12-10 21:45 ` Evgeniy Polyakov
2008-12-11 18:16 ` Vladislav Bolkhovitin
2008-12-11 19:12 ` James Bottomley
2008-12-12 19:25 ` Vladislav Bolkhovitin
2008-12-12 19:37 ` James Bottomley
2008-12-15 17:58 ` Vladislav Bolkhovitin
2008-12-15 23:18 ` Christoph Hellwig
2008-12-16 18:57 ` Vladislav Bolkhovitin
2008-12-18 18:35 ` [RFC]: " Vladislav Bolkhovitin
2008-12-18 18:43 ` David M. Lloyd
2008-12-19 17:37 ` Vladislav Bolkhovitin
2008-12-19 19:07 ` Jens Axboe
2008-12-19 19:17 ` Vladislav Bolkhovitin
2008-12-19 19:27 ` Jens Axboe
2008-12-19 21:58 ` Evgeniy Polyakov
2008-12-23 19:11 ` Vladislav Bolkhovitin [this message]
2008-12-19 11:27 ` Andi Kleen
2008-12-19 17:38 ` Vladislav Bolkhovitin
2008-12-19 18:00 ` Andi Kleen
2008-12-19 17:57 ` Vladislav Bolkhovitin
2008-12-16 16:00 ` [PATCH][RFC 23/23]: " Bart Van Assche
2008-12-16 17:41 ` Evgeniy Polyakov
2008-12-19 20:21 ` Jeremy Fitzhardinge
2008-12-19 22:04 ` Evgeniy Polyakov
2008-12-19 22:21 ` Jeremy Fitzhardinge
2008-12-19 22:33 ` Evgeniy Polyakov
2008-12-20 1:56 ` Jeremy Fitzhardinge
2008-12-20 2:02 ` Herbert Xu
2008-12-20 6:14 ` Jeremy Fitzhardinge
2008-12-20 6:51 ` Herbert Xu
2008-12-20 7:43 ` Jeremy Fitzhardinge
2008-12-20 8:10 ` Herbert Xu
2008-12-20 10:32 ` Evgeniy Polyakov
2008-12-20 19:39 ` Jeremy Fitzhardinge
2008-12-22 0:43 ` Rusty Russell
2008-12-23 19:14 ` Vladislav Bolkhovitin
2008-12-23 19:16 ` Vladislav Bolkhovitin
2008-12-23 21:38 ` Evgeniy Polyakov
2008-12-24 14:37 ` Vladislav Bolkhovitin
2008-12-24 14:44 ` Evgeniy Polyakov
2008-12-24 17:46 ` Vladislav Bolkhovitin
2008-12-24 18:08 ` Evgeniy Polyakov
2008-12-30 17:37 ` Vladislav Bolkhovitin
2008-12-30 21:35 ` Evgeniy Polyakov
2008-12-23 19:13 ` Vladislav Bolkhovitin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=495137DC.8050204@vlnb.net \
--to=vst@vlnb.net \
--cc=James.Bottomley@HansenPartnership.com \
--cc=bart.vanassche@gmail.com \
--cc=dmlloyd@flurg.com \
--cc=hch@infradead.org \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-scsi@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=scst-devel@lists.sourceforge.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox