From: Evgeniy Polyakov <zbr@ioremap.net>
To: Vladislav Bolkhovitin <vst@vlnb.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
Jeremy Fitzhardinge <jeremy@goop.org>,
linux-scsi@vger.kernel.org,
James Bottomley <James.Bottomley@HansenPartnership.com>,
Andrew Morton <akpm@linux-foundation.org>,
FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
Mike Christie <michaelc@cs.wisc.edu>,
Jeff Garzik <jeff@garzik.org>,
Boaz Harrosh <bharrosh@panasas.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, scst-devel@lists.sourceforge.net,
Bart Van Assche <bart.vanassche@gmail.com>,
"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
netdev@vger.kernel.org, Rusty Russell <rusty@rustcorp.com.au>,
David Miller <davem@davemloft.net>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Subject: Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data
Date: Wed, 31 Dec 2008 00:35:59 +0300 [thread overview]
Message-ID: <20081230213559.GD20238@ioremap.net> (raw)
In-Reply-To: <495A5C3C.8090006@vlnb.net>
Hi Vlad.
On Tue, Dec 30, 2008 at 08:37:00PM +0300, Vladislav Bolkhovitin (vst@vlnb.net) wrote:
> Although I agree that any additional allocation is something, which
> should be avoided, *if possible*. But you shouldn't overestimate the
> overhead of the sk_transaction_token allocation in cases, when it would
> be needed. At first, sk_transaction_token is quite small, so a single
> page in the kmem cache would keep about 100 of them, hence the slow
> allocation path would be called only once per 100 objects. Second, in
> many cases ->sendpages() needs to allocate a new skb, so already there
> is at least one such allocations on the fast path.
Once per 100 objects? With millions of packets per second at extreme
cases this does not scale. Even more common thousand of usual packets
per second with 1.5k mtu will show up (especially freeing actually).
Any additional overhead has to be avoided if possible, even if it looks
innocent.
BSD guys already learned this lesson with packet processing tags at
every layer.
> Actually, it doesn't look like the skb shared info destructor alone
> can't solve the task we are solving, because we need to know not when an
> skb transmittion finished, but when transmittion of our *set of pages*
> finished. Hence, with skb shared info destructor we would need also to
> invent some way to track set of pages <-> set of skbs translation (you
> refer it as combining tag and separate destructor), which would bring
> this solution on the entire new complexity level for no gain over the
> sk_transaction_token solution.
You really do not need to know when transmission is over, but when remote
side acks it (or connection is reset by the timeout). There is no way to
know when transmission is over without creating own skbs and submitting
them avoiding usual tcp/ip stack machinery.
You do not need to know which skbs contain which pages, system only should
track page pointers freed at skb destruction (shared info destruction
actually) time, no matter who owns those pages (since new pages can be
added into the page and some of the old ones can be freed early).
This will be effectively the same token, but it does not mean that
everyone who needs notification will have to perform additional
allocation. Put two pointers: destructor and token and do whatever you
like if one of them is non-empty, but try to avoid unneded overhead when
it is possible.
--
Evgeniy Polyakov
next prev parent reply other threads:[~2008-12-30 21:35 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-12-10 18:26 [PATCH][RFC 0/23] New SCSI target framework (SCST) and 4 target drivers Vladislav Bolkhovitin
2008-12-10 19:01 ` [PATCH][RFC 21/23]: iSCSI target driver Vladislav Bolkhovitin
2008-12-11 22:55 ` Nicholas A. Bellinger
2008-12-11 22:59 ` Nicholas A. Bellinger
2008-12-12 19:26 ` Vladislav Bolkhovitin
2008-12-10 19:02 ` [PATCH][RFC 22/23]: Documentation for iSCSI-SCST Vladislav Bolkhovitin
2008-12-10 19:04 ` [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
2008-12-10 21:45 ` Evgeniy Polyakov
2008-12-11 18:16 ` Vladislav Bolkhovitin
2008-12-11 19:12 ` James Bottomley
2008-12-12 19:25 ` Vladislav Bolkhovitin
2008-12-12 19:37 ` James Bottomley
2008-12-15 17:58 ` Vladislav Bolkhovitin
2008-12-15 23:18 ` Christoph Hellwig
2008-12-16 18:57 ` Vladislav Bolkhovitin
2008-12-18 18:35 ` [RFC]: " Vladislav Bolkhovitin
2008-12-18 18:43 ` David M. Lloyd
2008-12-19 17:37 ` Vladislav Bolkhovitin
2008-12-19 19:07 ` Jens Axboe
2008-12-19 19:17 ` Vladislav Bolkhovitin
2008-12-19 19:27 ` Jens Axboe
2008-12-19 21:58 ` Evgeniy Polyakov
2008-12-23 19:11 ` Vladislav Bolkhovitin
2008-12-19 11:27 ` Andi Kleen
2008-12-19 17:38 ` Vladislav Bolkhovitin
2008-12-19 18:00 ` Andi Kleen
2008-12-19 17:57 ` Vladislav Bolkhovitin
2008-12-16 16:00 ` [PATCH][RFC 23/23]: " Bart Van Assche
2008-12-16 17:41 ` Evgeniy Polyakov
2008-12-19 20:21 ` Jeremy Fitzhardinge
2008-12-19 22:04 ` Evgeniy Polyakov
2008-12-19 22:21 ` Jeremy Fitzhardinge
2008-12-19 22:33 ` Evgeniy Polyakov
2008-12-20 1:56 ` Jeremy Fitzhardinge
2008-12-20 2:02 ` Herbert Xu
2008-12-20 6:14 ` Jeremy Fitzhardinge
2008-12-20 6:51 ` Herbert Xu
2008-12-20 7:43 ` Jeremy Fitzhardinge
2008-12-20 8:10 ` Herbert Xu
2008-12-20 10:32 ` Evgeniy Polyakov
2008-12-20 19:39 ` Jeremy Fitzhardinge
2008-12-22 0:43 ` Rusty Russell
2008-12-23 19:14 ` Vladislav Bolkhovitin
2008-12-23 19:16 ` Vladislav Bolkhovitin
2008-12-23 21:38 ` Evgeniy Polyakov
2008-12-24 14:37 ` Vladislav Bolkhovitin
2008-12-24 14:44 ` Evgeniy Polyakov
2008-12-24 17:46 ` Vladislav Bolkhovitin
2008-12-24 18:08 ` Evgeniy Polyakov
2008-12-30 17:37 ` Vladislav Bolkhovitin
2008-12-30 21:35 ` Evgeniy Polyakov [this message]
2008-12-23 19:13 ` Vladislav Bolkhovitin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20081230213559.GD20238@ioremap.net \
--to=zbr@ioremap.net \
--cc=James.Bottomley@HansenPartnership.com \
--cc=akpm@linux-foundation.org \
--cc=bart.vanassche@gmail.com \
--cc=bharrosh@panasas.com \
--cc=davem@davemloft.net \
--cc=fujita.tomonori@lab.ntt.co.jp \
--cc=herbert@gondor.apana.org.au \
--cc=jeff@garzik.org \
--cc=jeremy@goop.org \
--cc=kuznet@ms2.inr.ac.ru \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=michaelc@cs.wisc.edu \
--cc=nab@linux-iscsi.org \
--cc=netdev@vger.kernel.org \
--cc=rusty@rustcorp.com.au \
--cc=scst-devel@lists.sourceforge.net \
--cc=torvalds@linux-foundation.org \
--cc=vst@vlnb.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).