From: Evgeniy Polyakov <zbr@ioremap.net>
To: Vladislav Bolkhovitin <vst@vlnb.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
Jeremy Fitzhardinge <jeremy@goop.org>,
linux-scsi@vger.kernel.org,
James Bottomley <James.Bottomley@HansenPartnership.com>,
Andrew Morton <akpm@linux-foundation.org>,
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
Mike Christie <michaelc@cs.wisc.edu>,
Jeff Garzik <jeff@garzik.org>,
Boaz Harrosh <bharrosh@panasas.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, scst-devel@lists.sourceforge.net,
Bart Van Assche <bart.vanassche@gmail.com>,
"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
netdev@vger.kernel.org, Rusty Russell <rusty@rustcorp.com.au>,
David Miller <davem@davemloft.net>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Subject: Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data
Date: Wed, 31 Dec 2008 00:35:59 +0300
Message-ID: <20081230213559.GD20238@ioremap.net>
In-Reply-To: <495A5C3C.8090006@vlnb.net>
Hi Vlad.
On Tue, Dec 30, 2008 at 08:37:00PM +0300, Vladislav Bolkhovitin (vst@vlnb.net) wrote:
> Although I agree that any additional allocation is something that
> should be avoided, *if possible*, you shouldn't overestimate the
> overhead of the sk_transaction_token allocation in cases when it would
> be needed. First, sk_transaction_token is quite small, so a single
> page in the kmem cache would keep about 100 of them, hence the slow
> allocation path would be called only once per 100 objects. Second, in
> many cases ->sendpages() needs to allocate a new skb, so there is
> already at least one such allocation on the fast path.
Once per 100 objects? With millions of packets per second in extreme
cases this does not scale. Even with the more common thousands of ordinary
packets per second with a 1.5k MTU it will show up (especially the freeing,
actually).
Any additional overhead has to be avoided if possible, even if it looks
innocent.
BSD guys already learned this lesson with packet processing tags at
every layer.
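To put a rough number on the allocation-rate argument above, here is a
minimal sketch in C. It assumes the figure from the quoted text (about 100
tokens per page), one token per packet, and no reuse of freed objects; the
function name is hypothetical and nothing here is kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: estimate how many times per
 * second the slow allocation path (a fresh page for the cache) is taken
 * if every packet needs one token and one page holds objs_per_page tokens.
 * Assumes freed tokens are not reused, which is a simplification.
 */
static uint64_t slow_path_allocs_per_sec(uint64_t packets_per_sec,
                                         uint64_t objs_per_page)
{
    assert(objs_per_page > 0);
    /* Round up: a partially filled page still had to be allocated. */
    return (packets_per_sec + objs_per_page - 1) / objs_per_page;
}
```

Under these assumptions, one million packets per second with 100 tokens
per page means on the order of ten thousand slow-path allocations per
second, on top of one fast-path allocation and one free per packet.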
> Actually, it looks like the skb shared info destructor alone
> can't solve the task we are solving, because we need to know not when an
> skb transmission finished, but when transmission of our *set of pages*
> finished. Hence, with the skb shared info destructor we would also need to
> invent some way to track the set of pages <-> set of skbs translation (you
> refer to it as combining a tag and a separate destructor), which would bring
> this solution to an entirely new level of complexity for no gain over the
> sk_transaction_token solution.
You really do not need to know when transmission is over, but when the remote
side acks it (or the connection is reset by the timeout). There is no way to
know when transmission is over without creating your own skbs and submitting
them, bypassing the usual tcp/ip stack machinery.
You do not need to know which skbs contain which pages; the system should only
track page pointers freed at skb destruction (shared info destruction
actually) time, no matter who owns those pages (since new pages can be
added into the page and some of the old ones can be freed early).
This will effectively be the same token, but it does not mean that
everyone who needs notification will have to perform an additional
allocation. Put in two pointers, a destructor and a token, and do whatever
you like if one of them is non-empty, but try to avoid unneeded overhead
where possible.
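The two-pointer idea above could be sketched roughly as follows. This is a
user-space mock under stated assumptions, not the real skb_shared_info
layout: struct shinfo_sketch, struct tx_token_sketch and both functions are
hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant part of the skb shared info. */
struct shinfo_sketch {
    void (*destructor)(struct shinfo_sketch *shi); /* optional callback */
    void *token;                                   /* optional opaque tag */
};

/* Hypothetical caller-owned token: counts pages still held by the stack. */
struct tx_token_sketch {
    int pages_in_flight;
    int completed;
};

/* Example destructor: drop one reference, flag completion on the last. */
static void token_page_released(struct shinfo_sketch *shi)
{
    struct tx_token_sketch *tok = shi->token;

    if (tok && --tok->pages_in_flight == 0)
        tok->completed = 1;
}

/* Would be called at the point where the shared info is destroyed. */
static void shinfo_release(struct shinfo_sketch *shi)
{
    assert(shi != NULL);
    /* Common case: neither pointer is set, so no extra work is done. */
    if (!shi->destructor && !shi->token)
        return;
    if (shi->destructor)
        shi->destructor(shi);
    shi->destructor = NULL;
    shi->token = NULL;
}
```

Users that do not need notification leave both pointers NULL and pay only
for the check; a user that needs per-set tracking supplies its own token and
decides for itself whether that token requires a separate allocation.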
--
Evgeniy Polyakov