From: Suparna Bhattacharya <suparna@in.ibm.com>
To: Werner Almesberger <werner@almesberger.net>
Cc: netdev@oss.sgi.com
Subject: Re: net-AIO and real-time TCP (blue sky research)
Date: Thu, 12 Aug 2004 17:37:10 +0530
Message-ID: <20040812120710.GA4435@in.ibm.com>
In-Reply-To: <20040811201829.T28020@almesberger.net>

On Wed, Aug 11, 2004 at 08:18:29PM -0300, Werner Almesberger wrote:
> Suparna Bhattacharya wrote:
> > I was hoping all this while that someone with deeper knowledge
> > in this area than me would respond, but well, maybe they were
> > all quiet chuckles :) ?
>
> Or they haven't stopped laughing yet ;-)
>
> > Does your proposal require additional semantics on aio TCP socket
> > reads and writes that differ from the synchronous TCP case, besides
> > not blocking and indicating completion through aio_complete ?
>
> Unfortunately, yes. First of all, we'd need a definition of where
> in the stream the AIO operation is applied. Two possibilities:
>
> 1) explicit: apply the concept of a "file position" to the stream,
> and make it visible to applications (through aio_offset)
>
> 2) implicit: follow the existing principle that any read consumes
> just the next chunk of data, and internally assign positions
> based on the sequence number. As a consequence, AIOs would be
> ordered over time (in the case of individual aio_reads) and
> space (in the case of lio_listio).
>
> In any case, it's a departure from existing API properties, i.e.
> 1) would introduce an application-visible "stream position" for
> TCP (which doesn't agree with TCP being able to send arbitrarily
> long streams, but then, a nice 64 bit position is probably close
> enough to near-infinity), and 2) adds ordering to AIO, which may
> be undesirable in terms of consistency, and also in terms of
> lock avoidance.
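
A minimal sketch of the two positioning schemes quoted above, written as a
toy Python model rather than kernel code. The names ImplicitPositions and
explicit_submit are hypothetical and exist only for illustration; ranges are
half-open (first, end) pairs, so (0, 100) covers bytes 0-99.

```python
class ImplicitPositions:
    """Option 2: each queued read is assigned the next unclaimed byte range."""

    def __init__(self, start=0):
        self.next_pos = start

    def submit(self, nbytes):
        first = self.next_pos
        self.next_pos += nbytes
        return (first, first + nbytes)


def explicit_submit(offset, nbytes):
    """Option 1: the caller names the range itself, as aio_offset would."""
    return (offset, offset + nbytes)


stream = ImplicitPositions()
# Implicit positions: ranges follow submission order, so the reads are ordered.
implicit = [stream.submit(100) for _ in range(3)]
# Explicit positions: the application may submit ranges in any order.
explicit = [explicit_submit(off, 100) for off in (200, 0, 100)]
print(implicit)  # [(0, 100), (100, 200), (200, 300)]
print(explicit)  # [(200, 300), (0, 100), (100, 200)]
```

The implicit variant needs shared state (next_pos) that every submission
updates, which is one way to picture the ordering and locking concern raised
above.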
>
> There's also the issue of whether an AIO read should complete
> after retrieving less than aio_nbytes. Three possibilities:
>
> 1) never (probably not a great idea)
> 2) may always (like "read" does)
> 3) only on the last AIO read returning data
>
> 2) would be the most flexible approach, but requires either
> application-settable positions (to fetch the missing part) or
> automatic re-arranging of subsequent AIO reads.
>
> 3) avoids the problems of 2), but doesn't work well if the
> reader didn't correctly predict segment boundaries, and may
> cause trouble (like in 2) if there are pending requests after
> the one that was "short", and new data arrives.
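
One possible reading of "automatic re-arranging of subsequent AIO reads"
under short-read option 2, sketched as a toy Python model. The function name
and the shift-everything-back policy are assumptions made for illustration;
the quoted text does not specify how the re-arranging would work.

```python
def rearrange_after_short_read(pending, completed_end):
    """Shift queued (first, end) ranges so they begin where the short read ended.

    Each pending read keeps its length; only its position in the stream moves.
    """
    rearranged = []
    pos = completed_end
    for first, end in pending:
        length = end - first
        rearranged.append((pos, pos + length))
        pos += length
    return rearranged


# Three 100-byte reads were queued; the first completed after only 60 bytes.
pending = [(100, 200), (200, 300)]
print(rearrange_after_short_read(pending, 60))  # [(60, 160), (160, 260)]
```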
>
> Last but not least, aio_forget would have to tell TCP that we're
> not only no longer interested in retrieving a certain piece of
> data, but that we'll never be.
>
> If positions are implicit, aio_cancel would actually have this
> effect (since there would be no way to request the same range of
> data again), so we wouldn't even need aio_forget.
>
> > The notion of which segment to aio_forget on the Rx path
> > is a little hazy to me (were you indeed referring
> > to the receive side here ? I can see this more clearly for
> > the send side when coupled with zero copy).
>
> Yes, this is mainly about receiving. Similar things could be
> done for sending, but that's largely a separate issue.
>
> Let's say I'm issuing three AIOs:
>
> 1: offset = 0, nbytes = 100
> 2: offset = 100, nbytes = 100
> 3: offset = 200, nbytes = 100
>
> Now a segment arrives for 0-99, and another for 200-299.
> Normal TCP will retry (by ACKing sequence 100) until also the
> segment 100-199 has made it.
>
> With AIO-TCP, if our application is happy with getting two
> out of the three requests, it can now aio_forget the 2nd
> request. TCP would notice that it can now ACK up to sequence 200,
> for the forgotten read, and even up to sequence 300, because
> segment 200-299 has been received. So it'll ACK sequence 300 now,
> and happily move on, without caring whether segment 100-199
> ever gets through.
OK, in the light of the change in semantics you described
earlier, introducing the notion of an offset, this makes sense.
Thanks for clarifying.
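
The ACK behaviour in the quoted three-request example can be sketched with a
toy Python model. This is only an illustration of the arithmetic, not how any
TCP stack is implemented; ack_point is a hypothetical helper, and ranges are
half-open (first, end) pairs, so (100, 200) covers bytes 100-199.

```python
def ack_point(received, forgotten, start=0):
    """Return the next sequence number the receiver could ACK.

    A byte counts as covered if it was either received or forgotten.
    The ACK point advances until it reaches the first uncovered byte.
    """
    ack = start
    for first, end in sorted(received + forgotten):
        if first > ack:
            break  # hole in the stream: cannot ACK past it
        ack = max(ack, end)
    return ack


received = [(0, 100), (200, 300)]  # segments 0-99 and 200-299 have arrived

# Normal TCP: the hole at 100-199 pins the ACK at sequence 100.
print(ack_point(received, []))            # 100

# After aio_forget on the second request, the hole no longer matters.
print(ack_point(received, [(100, 200)]))  # 300
```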
Regards
Suparna
>
> - Werner
>
> --
> _________________________________________________________________________
> / Werner Almesberger, Buenos Aires, Argentina werner@almesberger.net /
> /_http://www.almesberger.net/____________________________________________/
--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India