Re: net-AIO and real-time TCP (blue sky research)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Werner Almesberger <werner@almesberger.net>
Cc: netdev@oss.sgi.com
Subject: Re: net-AIO and real-time TCP (blue sky research)
Date: Thu, 12 Aug 2004 17:37:10 +0530	[thread overview]
Message-ID: <20040812120710.GA4435@in.ibm.com> (raw)
In-Reply-To: <20040811201829.T28020@almesberger.net>

On Wed, Aug 11, 2004 at 08:18:29PM -0300, Werner Almesberger wrote:
> Suparna Bhattacharya wrote:
> > I was hoping all this while that someone with deeper knowledge
> > in this area than me would respond, but well, maybe they were
> > all quiet chuckles :) ?
> 
> Or they haven't stopped laughing yet ;-)
> 
> > Does your proposal require additional semantics on aio TCP socket
> > reads and writes that differ from the synchronous TCP case, besides
> > not blocking and indicating completion through aio_complete ?
> 
> Unfortunately, yes. First of all, we'd need a definition of where
> in the stream the AIO operation is applied. Two possibilities:
> 
>  1) explicit: apply the concept of a "file position" to the stream,
>     and make it visible to applications (through aio_offset)
> 
>  2) implicit: follow the existing principle that any read consumes
>     just the next chunk of data, and internally assign positions
>     based on the sequence number. As a consequence, AIOs would be
>     ordered over time (in the case of individual aio_reads) and
>     space (in the case of lio_listio).
> 
> In any case, it's a departure from existing API properties, i.e.
> 1) would introduce an application-visible "stream position" for
> TCP (which doesn't agree with TCP being able to send arbitrarily
> long streams, but then, a nice 64 bit position is probably close
> enough to near-infinity), and 2) adds ordering to AIO, which may
> be undesirable in terms of consistency, and also in terms of
> lock avoidance.
> 
> There's also the issue of whether an AIO read should complete
> after retrieving less than aio_nbytes. Three possibilities:
> 
>  1) never (probably not a great idea)
>  2) may always (like "read" does)
>  3) only on the last AIO read returning data
> 
> 2) would be the most flexible approach, but requires either
> application-settable positions (to fetch the missing part) or
> automatic re-arranging of subsequent AIO reads.
> 
> 3) avoids the problems of 2), but doesn't work well if the
> reader didn't correctly predict segment boundaries, and may
> cause trouble (like in 2) if there are pending requests after
> the one that was "short", and new data arrives.
> 
> Last but not least, aio_forget would have to tell TCP that we're
> not only no longer interested in retrieving a certain piece of
> data, but that we'll never be.
> 
> If positions are implicit, aio_cancel would actually have this
> effect (since there would be no way to request the same range of
> data again), so we wouldn't even need aio_forget.
> 
> > The notion of which segment to aio_forget on the Rx path 
> > is a little hazy to me (were you were indeed referring
> > to the receive side here ? I can see this more clearly for
> > the send side when coupled with zero copy).
> 
> Yes, this is mainly about receiving. Similar things could be
> done for sending, but that's largely a separate issue.
> 
> Let's say I'm issuing three AIOs:
> 
>  1: offset = 0, nbytes = 100
>  2: offset = 100, nbytes = 100
>  3: offset = 200, nbytes = 100
> 
> Now a segment arrives for 0-99, and another for 200-299.
> Normal TCP will retry (by ACKing sequence 100) until also the
> segment 100-199 has made it.
> 
> With AIO-TCP, if our application is happy with getting two
> out of the three requests, it can now aio_forget the 2nd
> request. TCP would notice that can now ACK up to sequence 200,
> for the forgotten read, and even up to sequence 300, because
> the 200-299 has been received. So it'll ACK sequence 300 now,
> and happily move on, without caring whether segment 100-199
> ever gets through.

OK, in the light of the change in semantics you described
earlier, introducing the notion of an offset, this makes sense. 
Thanks for clarifying.

Regards
Suparna

> 
> - Werner
> 
> -- 
>   _________________________________________________________________________
>  / Werner Almesberger, Buenos Aires, Argentina     werner@almesberger.net /
> /_http://www.almesberger.net/____________________________________________/

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India

     prev parent reply	other threads:[~2004-08-12 12:07 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-08-02  2:51 net-AIO and real-time TCP (blue sky research) Werner Almesberger
2004-08-10 15:51 ` Suparna Bhattacharya
2004-08-11 23:18   ` Werner Almesberger
2004-08-11 23:44     ` Sridhar Samudrala
2004-08-12  0:40       ` Werner Almesberger
2004-08-12  6:06         ` Sridhar Samudrala
2004-08-12 18:11         ` John Heffner
2004-08-12 12:07     ` Suparna Bhattacharya [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20040812120710.GA4435@in.ibm.com \
    --to=suparna@in.ibm.com \
    --cc=netdev@oss.sgi.com \
    --cc=werner@almesberger.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.