public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vladislav Bolkhovitin <vst@vlnb.net>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>,
	linux-scsi@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	Mike Christie <michaelc@cs.wisc.edu>,
	Jeff Garzik <jeff@garzik.org>,
	Boaz Harrosh <bharrosh@panasas.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, scst-devel@lists.sourceforge.net,
	Bart Van Assche <bart.vanassche@gmail.com>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
	netdev@vger.kernel.org
Subject: Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data
Date: Fri, 12 Dec 2008 22:25:44 +0300	[thread overview]
Message-ID: <4942BAB8.4050007@vlnb.net> (raw)
In-Reply-To: <1229022734.3266.67.camel@localhost.localdomain>

James Bottomley wrote:
> On Thu, 2008-12-11 at 21:16 +0300, Vladislav Bolkhovitin wrote:
>> Hi Evgeniy,
>>
>> Evgeniy Polyakov wrote:
>>> Hi Vladislav.
>>>
>>> On Wed, Dec 10, 2008 at 10:04:36PM +0300, Vladislav Bolkhovitin (vst@vlnb.net) wrote:
>>>> In the chosen approach new optional field void *net_priv was added to 
>>>> struct page. It is enclosed by
>>> There is a huge no-no in networking land on increasing skb.
>>> Reason is simple every skb will carry potentially unneded data as long
>>> as given option is enabled, and most of the time it will.
>>> To break this barrier one has to have (I wanted to write ego, but then
>>> decided to replace it with mojo) so huge reason to do this, that it is
>>> almost impossible to have.
>>>
>>> Something tells me that increasing page structure with 8 bytes because
>>> of zero-copy iscsi transfer is not that great idea, since basically every
>>> user out there will have it enabled in the distro config and will waste
>>> noticeble amount of ram.
>> The waste will be only 0.2% of RAM or 2MB per 1GB. Not much. Perhaps, 
>> not noticeable for an average user of distro kernels at all. Embedded 
>> people, who count each byte, almost always don't need iSCSI, so won't 
>> have any problems to disable 
>> TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION option.
> 
> Actually, there are several other considerations:
> 
>      1. struct page is a lowmem structure, so increasing its size
>         becomes problematic on x86 PAE systems.
>      2. The current 64 bit struct page seems to be exactly pushing a
>         cacheline boundary.  Increasing it so it spills over will have a
>         performance impact

This is why I suggest to have 
CONFIG_TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION disabled in 
general kernels. ISCSI-SCST will still work with almost no performance 
loss for in-kernel backend and people would better recompile kernel, 
then patch it, then recompile.

> It's the performance problems that will be most critical, I suspect, so
> you'll need mm people buy in for doing this.

I'll ask in linux-mm, thanks for the suggestion.

> One thing that leaps immediately to mind is that you could isolate this
> to the net layer by putting it in skb_frag_struct.  However, such a move
> would require a proper API for this in net ...

To have net_priv analog in skb was the first idea I was tried. But I 
quickly gave up, because it would required that all the pages in each 
skb_frag_struct be from the same originator, i.e. with the same 
net_priv. It is unpractical to change all the operations with skb's to 
forbid merging them, if they have different net_priv. There are too many 
such places in very not obvious code pieces.

> right now it looks like
> you're using the struct page addition to carry this information from
> SCSI to net, which is a bit of a layering violation.

I don't think there is any layering violation here. Just lower layer 
notifies upper layer that transmission of a page has finished. It's done 
a bit not straightforward, but still basically the same as, for 
instance, on_free_cmd() callbacks which SCST core uses to notify target 
drivers and dev handlers that the corresponding command is about to be 
freed, so they can free associated with it data as well.

Thanks,
Vlad

  reply	other threads:[~2008-12-12 19:27 UTC|newest]

Thread overview: 106+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-10 18:26 [PATCH][RFC 0/23] New SCSI target framework (SCST) and 4 target drivers Vladislav Bolkhovitin
2008-12-10 18:28 ` [PATCH][RFC 1/23]: SCST public headers Vladislav Bolkhovitin
2008-12-10 18:30 ` [PATCH][RFC 2/23]: SCST core Vladislav Bolkhovitin
2008-12-10 19:12   ` Sam Ravnborg
2008-12-11 17:28     ` Vladislav Bolkhovitin
2008-12-11 21:09       ` Sam Ravnborg
2008-12-12 19:24         ` Vladislav Bolkhovitin
2008-12-12 21:50           ` Steven Rostedt
     [not found]             ` <20081212230523.GB4775@ghostprotocols.net>
2008-12-13  1:25               ` Frédéric Weisbecker
2008-12-13  1:27                 ` Frédéric Weisbecker
2008-12-13 14:46             ` Vladislav Bolkhovitin
2008-12-14  0:35               ` Frédéric Weisbecker
2008-12-16 21:49                 ` Ingo Molnar
2008-12-16 22:13                   ` Frédéric Weisbecker
2008-12-16 22:22                     ` Ingo Molnar
2008-12-16 23:46                       ` Frédéric Weisbecker
2008-12-18 11:45                 ` Vladislav Bolkhovitin
2008-12-20 13:06                   ` Frédéric Weisbecker
2008-12-23 19:11                     ` Vladislav Bolkhovitin
2008-12-27 11:20                       ` Ingo Molnar
2008-12-30 17:13                         ` Vladislav Bolkhovitin
2008-12-30 21:03                           ` Frederic Weisbecker
2008-12-30 21:35                             ` Steven Rostedt
2008-12-10 18:34 ` [PATCH][RFC 3/23]: SCST core docs Vladislav Bolkhovitin
2008-12-10 18:36 ` [PATCH][RFC 4/23]: SCST debug support Vladislav Bolkhovitin
2008-12-10 18:37 ` [PATCH][RFC 5/23]: SCST /proc interface Vladislav Bolkhovitin
2008-12-11 20:23   ` Nicholas A. Bellinger
2008-12-12 19:23     ` Vladislav Bolkhovitin
2008-12-10 18:39 ` [PATCH][RFC 6/23]: SCST SGV cache Vladislav Bolkhovitin
2008-12-10 18:40 ` [PATCH][RFC 7/23]: SCST integration into the kernel Vladislav Bolkhovitin
2008-12-10 18:42 ` [PATCH][RFC 8/23]: SCST pass-through backend handlers Vladislav Bolkhovitin
2008-12-10 18:43 ` [PATCH][RFC 9/23]: SCST virtual disk backend handler Vladislav Bolkhovitin
2008-12-10 18:44 ` [PATCH][RFC 10/23]: SCST user space " Vladislav Bolkhovitin
2008-12-10 18:46 ` [PATCH][RFC 11/23]: Makefile for SCST backend handlers Vladislav Bolkhovitin
2008-12-10 18:47 ` [PATCH][RFC 12/23]: Patch to add necessary support for SCST pass-through Vladislav Bolkhovitin
2008-12-10 18:49 ` [PATCH][RFC 13/23]: Export of alloc_io_context() function Vladislav Bolkhovitin
2008-12-11 13:34   ` Jens Axboe
2008-12-11 18:17     ` Vladislav Bolkhovitin
2008-12-11 18:41       ` Jens Axboe
2008-12-11 19:00         ` Vladislav Bolkhovitin
2008-12-11 19:06           ` Jens Axboe
2008-12-12 19:16             ` Vladislav Bolkhovitin
2008-12-10 18:50 ` [PATCH][RFC 14/23]: Necessary functionality in qla2xxx driver to support target mode Vladislav Bolkhovitin
2008-12-10 18:51 ` [PATCH][RFC 15/23]: QLogic target driver Vladislav Bolkhovitin
2008-12-10 18:54 ` [PATCH][RFC 16/23]: Documentation for " Vladislav Bolkhovitin
2008-12-10 18:55 ` [PATCH][RFC 17/23]: InfiniBand SRP " Vladislav Bolkhovitin
2008-12-10 18:57 ` [PATCH][RFC 18/23]: Documentation for " Vladislav Bolkhovitin
2008-12-10 18:58 ` [PATCH][RFC 19/23]: scst_local " Vladislav Bolkhovitin
2008-12-10 19:00 ` [PATCH][RFC 20/23]: Documentation for scst_local driver Vladislav Bolkhovitin
2008-12-10 19:01 ` [PATCH][RFC 21/23]: iSCSI target driver Vladislav Bolkhovitin
2008-12-11 22:55   ` Nicholas A. Bellinger
2008-12-11 22:59     ` Nicholas A. Bellinger
2008-12-12 19:26     ` Vladislav Bolkhovitin
2008-12-13 10:03       ` Nicholas A. Bellinger
2008-12-13 10:11         ` Bart Van Assche
2008-12-13 10:16           ` Nicholas A. Bellinger
2008-12-13 10:27             ` Bart Van Assche
2008-12-13 15:01             ` Vladislav Bolkhovitin
2008-12-13 14:57         ` Vladislav Bolkhovitin
2008-12-10 19:02 ` [PATCH][RFC 22/23]: Documentation for iSCSI-SCST Vladislav Bolkhovitin
2008-12-10 19:04 ` [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
2008-12-10 21:45   ` Evgeniy Polyakov
2008-12-11 18:16     ` Vladislav Bolkhovitin
2008-12-11 19:12       ` James Bottomley
2008-12-12 19:25         ` Vladislav Bolkhovitin [this message]
2008-12-12 19:37           ` James Bottomley
2008-12-15 17:58             ` Vladislav Bolkhovitin
2008-12-15 23:18               ` Christoph Hellwig
2008-12-16 18:57                 ` Vladislav Bolkhovitin
2008-12-18 18:35                   ` [RFC]: " Vladislav Bolkhovitin
2008-12-18 18:43                     ` David M. Lloyd
2008-12-19 17:37                       ` Vladislav Bolkhovitin
2008-12-19 19:07                         ` Jens Axboe
2008-12-19 19:17                           ` Vladislav Bolkhovitin
2008-12-19 19:27                             ` Jens Axboe
2008-12-19 21:58                               ` Evgeniy Polyakov
2008-12-23 19:11                               ` Vladislav Bolkhovitin
2008-12-19 11:27                     ` Andi Kleen
2008-12-19 17:38                       ` Vladislav Bolkhovitin
2008-12-19 18:00                         ` Andi Kleen
2008-12-19 17:57                           ` Vladislav Bolkhovitin
2008-12-16 16:00     ` [PATCH][RFC 23/23]: " Bart Van Assche
2008-12-16 17:41       ` Evgeniy Polyakov
2008-12-19 20:21   ` Jeremy Fitzhardinge
2008-12-19 22:04     ` Evgeniy Polyakov
2008-12-19 22:21       ` Jeremy Fitzhardinge
2008-12-19 22:33         ` Evgeniy Polyakov
2008-12-20  1:56           ` Jeremy Fitzhardinge
2008-12-20  2:02             ` Herbert Xu
2008-12-20  6:14               ` Jeremy Fitzhardinge
2008-12-20  6:51                 ` Herbert Xu
2008-12-20  7:43                   ` Jeremy Fitzhardinge
2008-12-20  8:10                     ` Herbert Xu
2008-12-20 10:32                       ` Evgeniy Polyakov
2008-12-20 19:39                         ` Jeremy Fitzhardinge
2008-12-22  0:43                           ` Rusty Russell
2008-12-23 19:14                             ` Vladislav Bolkhovitin
2008-12-23 19:16                         ` Vladislav Bolkhovitin
2008-12-23 21:38                           ` Evgeniy Polyakov
2008-12-24 14:37                             ` Vladislav Bolkhovitin
2008-12-24 14:44                               ` Evgeniy Polyakov
2008-12-24 17:46                                 ` Vladislav Bolkhovitin
2008-12-24 18:08                                   ` Evgeniy Polyakov
2008-12-30 17:37                                     ` Vladislav Bolkhovitin
2008-12-30 21:35                                       ` Evgeniy Polyakov
2008-12-23 19:13     ` Vladislav Bolkhovitin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4942BAB8.4050007@vlnb.net \
    --to=vst@vlnb.net \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=bart.vanassche@gmail.com \
    --cc=bharrosh@panasas.com \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=jeff@garzik.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michaelc@cs.wisc.edu \
    --cc=nab@linux-iscsi.org \
    --cc=netdev@vger.kernel.org \
    --cc=scst-devel@lists.sourceforge.net \
    --cc=torvalds@linux-foundation.org \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox