From mboxrd@z Thu Jan 1 00:00:00 1970
From: Werner Almesberger
Subject: Re: net-AIO and real-time TCP (blue sky research)
Date: Wed, 11 Aug 2004 20:18:29 -0300
Sender: netdev-bounce@oss.sgi.com
Message-ID: <20040811201829.T28020@almesberger.net>
References: <20040801235102.K1276@almesberger.net> <20040810155148.GA4630@in.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@oss.sgi.com
Return-path:
To: Suparna Bhattacharya
Content-Disposition: inline
In-Reply-To: <20040810155148.GA4630@in.ibm.com>; from suparna@in.ibm.com on Tue, Aug 10, 2004 at 09:21:48PM +0530
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

Suparna Bhattacharya wrote:
> I was hoping all this while that someone with deeper knowledge
> in this area than me would respond, but well, maybe they were
> all quiet chuckles :) ?

Or they haven't stopped laughing yet ;-)

> Does your proposal require additional semantics on aio TCP socket
> reads and writes that differ from the synchronous TCP case, besides
> not blocking and indicating completion through aio_complete ?

Unfortunately, yes. First of all, we'd need a definition of where
in the stream the AIO operation is applied. Two possibilities:

1) explicit: apply the concept of a "file position" to the stream,
   and make it visible to applications (through aio_offset)
2) implicit: follow the existing principle that any read consumes
   just the next chunk of data, and internally assign positions
   based on the sequence number. As a consequence, AIOs would be
   ordered over time (in the case of individual aio_reads) and
   space (in the case of lio_listio).

In any case, it's a departure from existing API properties, i.e.
1) would introduce an application-visible "stream position" for
TCP (which doesn't agree with TCP being able to send arbitrarily
long streams, but then, a nice 64 bit position is probably close
enough to near-infinity), and 2) adds ordering to AIO, which may
be undesirable in terms of consistency, and also in terms of lock
avoidance.

There's also the issue of whether an AIO read should complete
after retrieving less than aio_nbytes. Three possibilities:

1) never (probably not a great idea)
2) may always (like "read" does)
3) only on the last AIO read returning data

2) would be the most flexible approach, but requires either
application-settable positions (to fetch the missing part) or
automatic re-arranging of subsequent AIO reads.

3) avoids the problems of 2), but doesn't work well if the reader
didn't correctly predict segment boundaries, and may cause
trouble (like in 2) if there are pending requests after the one
that was "short", and new data arrives.

Last but not least, aio_forget would have to tell TCP that we're
not only no longer interested in retrieving a certain piece of
data, but that we'll never be. If positions are implicit,
aio_cancel would actually have this effect (since there would be
no way to request the same range of data again), so we wouldn't
even need aio_forget.

> The notion of which segment to aio_forget on the Rx path
> is a little hazy to me (were you were indeed referring
> to the receive side here ? I can see this more clearly for
> the send side when coupled with zero copy).

Yes, this is mainly about receiving. Similar things could be done
for sending, but that's largely a separate issue.

Let's say I'm issuing three AIOs:

1: offset = 0, nbytes = 100
2: offset = 100, nbytes = 100
3: offset = 200, nbytes = 100

Now a segment arrives for 0-99, and another for 200-299. Normal
TCP will retry (by ACKing sequence 100) until also the segment
100-199 has made it.
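
The receive-side bookkeeping in this example can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions, not kernel code and not an existing API: ack_point and
its received/forgotten arguments are hypothetical names, sequence
numbers start at 0 as in the example, and ranges are inclusive
(first, last) pairs. The forgotten argument models ranges the
application has given up on for good via the proposed aio_forget.

```python
def ack_point(received, forgotten=(), start=0):
    """Return the next sequence number the receiver could ACK.

    received / forgotten: iterables of inclusive (first, last) byte
    ranges, in the same 0-99 / 200-299 notation as the example.
    Hypothetical helper, for illustration only.
    """
    # Bytes that arrived and bytes nobody will ever ask for are
    # treated alike: neither leaves a hole the receiver must wait on.
    covered = sorted(list(received) + list(forgotten))
    ack = start
    for first, last in covered:
        if first > ack:
            break  # hole: nothing covers byte number `ack`
        ack = max(ack, last + 1)
    return ack


received = [(0, 99), (200, 299)]

# Plain TCP: the hole at 100-199 pins the cumulative ACK at 100.
print(ack_point(received))

# Same arrivals, but the application has forgotten request 2.
print(ack_point(received, forgotten=[(100, 199)]))
```

The first call corresponds to normal TCP as described above; the
second shows the same arrivals once the 100-199 range no longer
has to be waited for.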

With AIO-TCP, if our application is happy with getting two out of
the three requests, it can now aio_forget the 2nd request. TCP
would notice that it can now ACK up to sequence 200, for the
forgotten read, and even up to sequence 300, because the 200-299
segment has been received. So it'll ACK sequence 300 now, and
happily move on, without caring whether segment 100-199 ever gets
through.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Buenos Aires, Argentina     werner@almesberger.net /
/_http://www.almesberger.net/____________________________________________/