From mboxrd@z Thu Jan 1 00:00:00 1970
From: Suparna Bhattacharya
Subject: Re: net-AIO and real-time TCP (blue sky research)
Date: Tue, 10 Aug 2004 21:21:48 +0530
Sender: netdev-bounce@oss.sgi.com
Message-ID: <20040810155148.GA4630@in.ibm.com>
References: <20040801235102.K1276@almesberger.net>
Reply-To: suparna@in.ibm.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@oss.sgi.com
Return-path:
To: Werner Almesberger
Content-Disposition: inline
In-Reply-To: <20040801235102.K1276@almesberger.net>
Errors-to: netdev-bounce@oss.sgi.com
List-Id: netdev.vger.kernel.org

Hello Werner,

I was hoping all this while that someone with deeper knowledge in
this area than me would respond, but well, maybe they were all
quiet chuckles :) ?

Does your proposal require additional semantics on aio TCP socket
reads and writes that differ from the synchronous TCP case, besides
not blocking and indicating completion through aio_complete ?

On Sun, Aug 01, 2004 at 11:51:02PM -0300, Werner Almesberger wrote:
> Hi Suparna,
>
> I'm copying this to netdev, because people there may get a good
> chuckle out of this outlandish idea as well :-)
>
> At OLS we were chatting about using AIO also for networking.
> While this concept didn't seem to rank particularly high on the
> lunacy scale, it didn't appear overly useful either. About the
> only possibly interesting new functionality, besides the
> possibility to connect this with some eccentric TCP offloading
> and zero-copy scheme, would be - when applied to TCP - to make
> unACKed data in the out-of-order buffer available to user
> space.
>
> Now, it occurred to me that this may lead to something a lot
> more exciting: a step towards making TCP real-time capable.
> I'm using the term "real-time" loosely here, as in "there's a
> deadline, but we're flexible".
>
> I haven't followed what's going on at IETF in that area for a
> while, and I'm sure plenty of other people must have thought
> of similar schemes before, but since this seems nicer and
> maybe even simpler than some, let me describe it anyway.
>
> First of all, one of the main complaints of the real-time
> networking people is that TCP stubbornly insists on
> retransmitting every single segment until it is absolutely
> certain that the segment has been received, even if the
> real-time application has long since moved on.
>
> Now, with net-AIO, the application could already get all the
> data that has arrived after a lost segment. That's a good
> start, but TCP will still try to retransmit. So the next step
> would be to have a means to indicate that we've lost interest
> in the outcome of a pending AIO operation, and - as a side
> effect - communicate this also to TCP, so that TCP can stop
> trying, and do something more useful instead.
>
> Let's call this operation aio_forget(). For disk IO, this
> may work just like aio_cancel().

The notion of which segment to aio_forget on the Rx path is a
little hazy to me (were you indeed referring to the receive side
here ? I can see this more clearly for the send side when coupled
with zero copy).

>
> Now, aio_forget() would be a great tool for making TCP
> blissfully ignorant of any losses, actually making it very
> TCP-unfriendly. So the next step would be to record the fact
> that we've just forgotten some segments, but still need to
> make the peer aware of the fact that there (may) have been
> losses, and to slow down accordingly. Obviously, if we have
> reason to believe that the peer already knows of a loss in
> the general vicinity, no action is needed.
>
> Reliably communicating a loss isn't trivial, but there should
> be good background material in the context of ECN. Of course,
> if ECN is available, we may just use that.
> Otherwise, we may have to force a retransmission, to be sure
> that the peer has noticed. (And, if the forgotten segment(s)
> should arrive while TCP is trying to indicate a loss, it
> should stop doing so.)
>
> Now, assuming we have a solution for indicating losses that is
> satisfying both in terms of congestion control and in terms of
> efficiency, there are still a few things that would be nice to
> have, that this approach doesn't solve:
>
> - message boundaries and segment-message alignment. Not being
>   able to use messages just because a few of their bytes
>   ended up in a lost (and then aio_forgotten) segment would
>   be just too bad. In some cases, it may be possible to just
>   set the MSS to a suitable value. Also, recovering message
>   boundaries after a loss may be tricky.
>
> - there's no direct provision for allowing adaptive coding.
>   Of course, this is a fairly orthogonal problem.
>
> - as time passes, the sender may want to remove or substitute
>   data it had already enqueued, e.g. because there is less
>   bandwidth than originally anticipated. So there may be a
>   place for aio_forget() at the sender side too.
>
> Now, why could this scheme be "nicer" than just inventing some
> new protocol that is designed to do all these things ? The
> main thing that "looks good" is that this mechanism could use
> all of TCP, and may not even need major maintenance if some
> minor aspect of TCP congestion control gets changed.
>
> Anyway, this may be peculiar enough for someone to spin the
> idea a little further.
> In the worst case, I might just have provided additional
> evidence that, if you just search long enough, there's a
> perfectly plausible problem for every solution :-)

Thanks for bringing in some fresh perspective :)

Regards
Suparna

>
> - Werner
>
> --
>   _________________________________________________________________________
>  / Werner Almesberger, Buenos Aires, Argentina     werner@almesberger.net /
> /_http://www.almesberger.net/____________________________________________/

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
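
For readers trying to picture the receive-side behaviour discussed in
the quoted proposal, here is a toy user-space model in Python. It is only
a sketch under stated assumptions: ReassemblyQueue, forget() and
pending_loss_signal are made-up names for illustration, not an existing
kernel, socket or POSIX AIO interface, and segments are assumed never to
overlap. It shows plain in-order delivery stalling at a hole, a
hypothetical forget() skipping the hole so later data becomes readable,
and a flag recording that the peer still has to be told about the loss.

```python
class ReassemblyQueue:
    """Toy model of a TCP receive queue with a hypothetical forget() operation."""

    def __init__(self, start_seq=0):
        self.next_seq = start_seq         # next in-order byte expected
        self.segments = {}                # out-of-order data: start seq -> bytes
        self.forgotten = []               # (start, end) ranges the app gave up on
        self.pending_loss_signal = False  # peer must still learn about a loss

    def receive(self, seq, data):
        """Store an arriving segment (assumes segments never overlap)."""
        if seq + len(data) <= self.next_seq:
            # Data before next_seq is dropped. If it falls in a forgotten
            # range, stop signalling the loss, as the proposal suggests.
            if any(start <= seq < end for start, end in self.forgotten):
                self.pending_loss_signal = False
            return
        self.segments[seq] = data

    def read(self):
        """Return all data contiguous from next_seq (ordinary TCP semantics)."""
        out = bytearray()
        while self.next_seq in self.segments:
            data = self.segments.pop(self.next_seq)
            out += data
            self.next_seq += len(data)
        return bytes(out)

    def forget(self):
        """Give up on the hole at next_seq and skip to the earliest buffered data."""
        later = [s for s in self.segments if s > self.next_seq]
        if not later:
            return None                   # nothing buffered beyond the hole
        hole = (self.next_seq, min(later))
        self.forgotten.append(hole)
        self.next_seq = hole[1]
        self.pending_loss_signal = True   # congestion control still needs to know
        return hole


q = ReassemblyQueue()
q.receive(0, b"aaaa")
q.receive(8, b"cccc")   # bytes 4..7 never arrived
first = q.read()        # in-order data only
stalled = q.read()      # nothing more: delivery is stuck behind the hole
hole = q.forget()       # application loses interest in the missing bytes
rest = q.read()         # data after the hole is now deliverable
```

The model leaves open exactly the questions raised in the thread: how
the loss is actually signalled to the peer (ECN or a forced
retransmission), and how message boundaries are recovered after a hole.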