netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <jens.axboe@oracle.com>
To: Vladislav Bolkhovitin <vst@vlnb.net>
Cc: "David M. Lloyd" <dmlloyd@flurg.com>,
	linux-mm@kvack.org, Christoph Hellwig <hch@infradead.org>,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	scst-devel@lists.sourceforge.net,
	Bart Van Assche <bart.vanassche@gmail.com>,
	netdev@vger.kernel.org
Subject: Re: [RFC]: Support for zero-copy TCP transmit of user space data
Date: Fri, 19 Dec 2008 20:27:36 +0100	[thread overview]
Message-ID: <20081219192736.GQ32491@kernel.dk> (raw)
In-Reply-To: <494BF361.1090003@vlnb.net>

On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
> Jens Axboe, on 12/19/2008 10:07 PM wrote:
> >On Fri, Dec 19 2008, Vladislav Bolkhovitin wrote:
> >>David M. Lloyd, on 12/18/2008 09:43 PM wrote:
> >>>On 12/18/2008 12:35 PM, Vladislav Bolkhovitin wrote:
> >>>>An iSCSI target driver iSCSI-SCST was a part of the patchset 
> >>>>(http://lkml.org/lkml/2008/12/10/293). For it a nice optimization to 
> >>>>have TCP zero-copy transmit of user space data was implemented. Patch, 
> >>>>implementing this optimization was also sent in the patchset, see 
> >>>>http://lkml.org/lkml/2008/12/10/296.
> >>>I'm probably ignorant of about 90% of the context here, but isn't this 
> >>>the sort of problem that was supposed to have been solved by vmsplice(2)?
> >>No, vmsplice can't help here. ISCSI-SCST is a kernel space driver. But, 
> >>even if it was a user space driver, vmsplice wouldn't change anything 
> >>much. It doesn't have a possibility for a user to know, when 
> >>transmission of the data finished. So, it is intended to be used as: 
> >>vmsplice() buffer -> munmap() the buffer -> mmap() new buffer -> 
> >>vmsplice() it. But on the mmap() stage kernel has to zero all the newly 
> >>mapped pages and zeroing memory isn't much faster, than copying it. 
> >>Hence, there would be no considerable performance increase.
> >
> >vmsplice() isn't the right choice, but splice() very well could be. You
> >could easily use splice internally as well. The vmsplice() part sort-of
> >applies in the sense that you want to fill pages into a pipe, which is
> >essentially what vmsplice() does. You'd need some helper to do that.
> 
> Sorry, Jens, but splice() works only if there is a file handle on the 
> another side, so user space doesn't see data buffers. But SCST needs to 
> serve a wider usage cases, like reading data with decompression from a 
> virtual tape, where decompression is done in user space. For those only 
> complete zero-copy network send, which I implemented, can give the best 
> performance.

__splice_from_pipe() takes a pipe, a descriptor and an actor. There's
absolutely ZERO reason you could not reuse most of that for this
implementation. The big bonus here is that getting the put correct from
networking would even make splice() better for everyone. Win for Linux,
win for you since it'll make it MUCH easier for you to get this stuff
in. Looking at your original patch and I almost think it's a flame bait
to induce discussion (nothing wrong with that, that approach works quite
well and has been used before). There's no way in HELL that it'd ever be
a merge candidate. And I suspect you know that, at least I hope you do
or you are farther away from going forward with this than you think.

So don't look at splice() the system call, look at the infrastructure
and check if that could be useful for your case. To me it looks
absolutely like it could, if you goal is just zero-copy transmit. The
only missing piece is dropping the reference and signalling page
consumption at the right point, which is when the data is safe to be
reused. That very bit is missing, but that should be all as far as I can
tell.

> >And
> >the ack-on-xmit-done bits is something that splice-to-socket needs
> >anyway, so I think it'd be quite a suitable choice for this.
> 
> So, are you writing that splice() could also benefit from the zero-copy 
> transmit feature, like I implemented?

I like how you want to reinvent everything, perhaps you should spend a
little more time looking into various other approaches? splice() already
does zero-copy network transmit, there are no copies going on. Ideally,
you'd have zero copies moving data into your pipe, but migrade/move
isn't quite there yet. But that doesn't apply to your case at all.

What is missing, as I wrote, is the 'release on ack' and not on pipe
buffer release. This is similar to the get_page/put_page stuff you did
in your patch, but don't go claiming that zero-copy transmit is a
Vladislav original - the ->sendpage() does no copies.

-- 
Jens Axboe

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-12-19 19:27 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-10 18:26 [PATCH][RFC 0/23] New SCSI target framework (SCST) and 4 target drivers Vladislav Bolkhovitin
2008-12-10 19:01 ` [PATCH][RFC 21/23]: iSCSI target driver Vladislav Bolkhovitin
2008-12-11 22:55   ` Nicholas A. Bellinger
2008-12-11 22:59     ` Nicholas A. Bellinger
2008-12-12 19:26     ` Vladislav Bolkhovitin
2008-12-10 19:02 ` [PATCH][RFC 22/23]: Documentation for iSCSI-SCST Vladislav Bolkhovitin
2008-12-10 19:04 ` [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
2008-12-10 21:45   ` Evgeniy Polyakov
2008-12-11 18:16     ` Vladislav Bolkhovitin
2008-12-11 19:12       ` James Bottomley
2008-12-12 19:25         ` Vladislav Bolkhovitin
2008-12-12 19:37           ` James Bottomley
2008-12-15 17:58             ` Vladislav Bolkhovitin
2008-12-15 23:18               ` Christoph Hellwig
2008-12-16 18:57                 ` Vladislav Bolkhovitin
2008-12-18 18:35                   ` [RFC]: " Vladislav Bolkhovitin
2008-12-18 18:43                     ` David M. Lloyd
2008-12-19 17:37                       ` Vladislav Bolkhovitin
2008-12-19 19:07                         ` Jens Axboe
2008-12-19 19:17                           ` Vladislav Bolkhovitin
2008-12-19 19:27                             ` Jens Axboe [this message]
2008-12-19 21:58                               ` Evgeniy Polyakov
2008-12-23 19:11                               ` Vladislav Bolkhovitin
2008-12-19 11:27                     ` Andi Kleen
2008-12-19 17:38                       ` Vladislav Bolkhovitin
2008-12-19 18:00                         ` Andi Kleen
2008-12-19 17:57                           ` Vladislav Bolkhovitin
2008-12-16 16:00     ` [PATCH][RFC 23/23]: " Bart Van Assche
2008-12-16 17:41       ` Evgeniy Polyakov
2008-12-19 20:21   ` Jeremy Fitzhardinge
2008-12-19 22:04     ` Evgeniy Polyakov
2008-12-19 22:21       ` Jeremy Fitzhardinge
2008-12-19 22:33         ` Evgeniy Polyakov
2008-12-20  1:56           ` Jeremy Fitzhardinge
2008-12-20  2:02             ` Herbert Xu
2008-12-20  6:14               ` Jeremy Fitzhardinge
2008-12-20  6:51                 ` Herbert Xu
2008-12-20  7:43                   ` Jeremy Fitzhardinge
2008-12-20  8:10                     ` Herbert Xu
2008-12-20 10:32                       ` Evgeniy Polyakov
2008-12-20 19:39                         ` Jeremy Fitzhardinge
2008-12-22  0:43                           ` Rusty Russell
2008-12-23 19:14                             ` Vladislav Bolkhovitin
2008-12-23 19:16                         ` Vladislav Bolkhovitin
2008-12-23 21:38                           ` Evgeniy Polyakov
2008-12-24 14:37                             ` Vladislav Bolkhovitin
2008-12-24 14:44                               ` Evgeniy Polyakov
2008-12-24 17:46                                 ` Vladislav Bolkhovitin
2008-12-24 18:08                                   ` Evgeniy Polyakov
2008-12-30 17:37                                     ` Vladislav Bolkhovitin
2008-12-30 21:35                                       ` Evgeniy Polyakov
2008-12-23 19:13     ` Vladislav Bolkhovitin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081219192736.GQ32491@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=bart.vanassche@gmail.com \
    --cc=dmlloyd@flurg.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=scst-devel@lists.sourceforge.net \
    --cc=vst@vlnb.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).