netdev.vger.kernel.org archive mirror
From: Gregory Haskins <gregory.haskins@gmail.com>
To: Stephen Hemminger <shemminger@vyatta.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	Gregory Haskins <ghaskins@novell.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	alacrityvm-devel@lists.sourceforge.net,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [RFC PATCH] net: add dataref destructor to sk_buff
Date: Mon, 16 Nov 2009 15:18:17 -0500
Message-ID: <4B01B389.9090507@gmail.com>
In-Reply-To: <20091116115931.6266f9c9@nehalam>

Stephen Hemminger wrote:
> On Sat, 14 Nov 2009 00:27:46 -0500
> Gregory Haskins <gregory.haskins@gmail.com> wrote:
> 
>> Stephen Hemminger wrote:
>>
>>> People have tried doing copy-less send by page flipping, but the overhead of the IPI to
>>> invalidate the TLB exceeded the overhead of the copy. There was an Intel paper on this
>>> at the Linux Symposium (Ottawa) several years ago.
>> I think you are confusing copy-less tx with copy-less rx.  You can try
>> to do copy-less rx with page flipping, which has the IPI/TLB thrashing
>> properties you mention, and I agree is problematic.  We are talking
>> about copy-less tx here, however, and therefore no page-flipping is
>> involved.  Rather, we are just posting SG lists of pages directly to the
>> NIC (assuming the NIC supports HIGH_DMA, etc.).  You do not need to flip
>> the page, or invalidate the TLB (and thus IPI the other cores) to do
>> this to my knowledge.
>>
> 
> If you want to do copy-less tx for all applications, you have to
> do COW to handle the trivial case of:
> 
> while ((cc = read(infd, buffer, sizeof buffer)) > 0) {
>    send(outsock, buffer, cc, 0);
> }
> 
> 

You certainly _could_ implement this as a COW I suppose, but that would
be insane.  If someone did do this, you are right: you need TLB
invalidation.

However, if I were actually going to propose changing the system calls
over to zero-copy (note that I am not), it would be based on the concept
in this patch.  That is: send() would block until the NIC completes the
DMA and the shinfo block is freed.  An alternative implementation would
be AIO-based, where the shinfo destructor triggers the completion event.
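
To make the blocking variant concrete, here is a minimal userspace sketch
of the idea, not the patch itself.  Everything in it is a hypothetical
stand-in: struct shinfo only mimics the role of the skb shared-info block,
struct fake_nic stands in for a driver holding a reference during DMA, and
the busy loop stands in for sleeping until the tx-completion interrupt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared-info block: a reference count on
 * the payload plus a destructor that runs when the last reference drops. */
struct shinfo {
    int dataref;                          /* current holders of the data */
    void (*destructor)(struct shinfo *);  /* called when dataref hits 0 */
    void *priv;                           /* owner's completion context */
};

/* Hypothetical completion flag the sender waits on. */
struct completion { int done; };

static void shinfo_get(struct shinfo *s) { s->dataref++; }

static void shinfo_put(struct shinfo *s)
{
    assert(s->dataref > 0);
    if (--s->dataref == 0 && s->destructor)
        s->destructor(s);
}

/* Destructor used by the blocking path: mark the transfer finished. */
static void signal_completion(struct shinfo *s)
{
    ((struct completion *)s->priv)->done = 1;
}

/* Fake NIC: holds one reference to the posted buffer until "DMA" ends. */
struct fake_nic { struct shinfo *inflight; };

static void nic_post(struct fake_nic *nic, struct shinfo *s)
{
    shinfo_get(s);
    nic->inflight = s;
}

static void nic_dma_complete(struct fake_nic *nic)
{
    struct shinfo *s = nic->inflight;

    nic->inflight = NULL;
    if (s)
        shinfo_put(s);  /* last reference: destructor fires here */
}

/* Blocking-style send: returns only after the destructor has fired, so
 * the caller may safely reuse buf afterwards without COW. */
static int zc_send_blocking(struct fake_nic *nic, const void *buf, size_t len)
{
    struct completion c = { 0 };
    struct shinfo s = { 1, signal_completion, &c };

    (void)buf;             /* the payload itself is never copied */
    nic_post(nic, &s);     /* NIC takes its reference */
    shinfo_put(&s);        /* stack drops its own reference */
    if (c.done)
        return -1;         /* must not complete before the DMA does */
    while (!c.done)
        nic_dma_complete(nic);  /* stands in for waiting on the tx IRQ */
    return (int)len;
}
```

An AIO-style variant would keep the same reference counting and swap
signal_completion for a destructor that queues a completion event, so the
caller would not have to wait inside the send path.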

FWIW: The latter is conceptually similar to how this is being used in
AlacrityVM.

HTH

Kind Regards,
-Greg
