From: Thomas Graf
To: jamal
Cc: netdev@oss.sgi.com, Nguyen Dinh Nam, Remus, Andre Tomt,
 syrius.ml@no-log.org, Andy Furniss, Damion de Soto
Subject: Re: dummy as IMQ replacement
Date: Mon, 31 Jan 2005 16:15:32 +0100
Message-ID: <20050131151532.GE31837@postel.suug.ch>
In-Reply-To: <1107181169.7840.184.camel@jzny.localdomain>

> Or dropping packets. TCP will adjust itself either way; at least
> that's true according to this formula [rfc3448] (originally derived
> from Reno, but people are finding it works fine with all other
> variants of TCP CC):
>
> -----
> The throughput equation is:
>
>                                   s
>    X = --------------------------------------------------------------
>        R*sqrt(2*b*p/3) + (t_RTO * (3*sqrt(3*b*p/8) * p * (1+32*p^2)))
>
> Where:
>
>    X is the transmit rate in bytes/second.
>    s is the packet size in bytes.
>    R is the round trip time in seconds.
>    p is the loss event rate, between 0 and 1.0, of the number of loss
>      events as a fraction of the number of packets transmitted.
>    t_RTO is the TCP retransmission timeout value in seconds.
>    b is the number of packets acknowledged by a single TCP
>      acknowledgement.
> ----

Agreed, this was my first attempt and my current code is still based
on it. I'm trying to avoid a retransmit battle, so I delay packets
where possible in the hope that the burst is either just a peak or
that the sender slows down quickly enough. I use a simplified RED
together with tcp_xmit_retransmit_queue() input to avoid flick-flack
(oscillation) effects, which works pretty well for bulky streams. A
burst buffer takes care of interactive traffic with peaks, but that
part doesn't work perfectly yet. (A rough sketch of the RED part is at
the end of this mail.)

Overall, my attempt works pretty well if the other side uses reno/bic
and quite well for westwood and vegas. The problem is not that it
doesn't work at all, but achieving a certain _stable_ rate is very
difficult; the delta between the requested and the real rate is up to
25%, depending on how constant the rtt is and whether the peers follow
one of the proposed tcp cc algorithms. The cc guessing code helps a
bit but isn't very accurate. (A quick numeric check of the equation
above is also at the end of this mail.)

> Something along the lines of what OBSD firewall does but selectively
> (If I understood those OBSD fanatics at SUCON;-> correctly)..they
> track at ingress before ip stack. The difference is we can allow
> selective tracking; something along the lines of:

This means we'd have to do the most important sanity checks ourselves,
like checksum and ip header consistency (roughly the checks sketched
at the end of this mail), which basically means duplicating ip_rcv()
and ipv6_rcv().

> tc filter add dev $DEV parent ffff: protocol ip prio 10 \
>    u32 match u32 0x10000 0xff0000 at 8 \
>    action track \
>    action metamark here depending on whether we found conntrack etc
>
> I have the layout scribbled on paper somewhere .. I will look it up
> and provide more details
>
> Track should just use iptables conntracking code instead of
> reinventing it.

This is exactly my thinking as well, but I'd do it as an ematch. Given
we pass the netfilter conntrack code, we'd then have access to its
meta data such as direction, state and other attributes:
tc filter add dev $DEV parent ffff: protocol ip prio 10 \
    u32 match u32 0x10000 0xff0000 at 8 \
    and conntrack \
    and meta nf_state eq ESTABLISHED \
    and meta nf_status eq SEEN_REPLY \
    action metamark here depending on whether we found conntrack etc
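
Here is the rough sketch of the RED part mentioned above. It is not
the actual code, just the textbook RED marking probability (EWMA of
the backlog, linear ramp between a min and a max threshold) with
made-up thresholds and weights, reduced to a plain delay/no-delay
decision:

/* red_sketch.c - simplified RED delay decision, for illustration only.
 * avg is an EWMA of the backlog; between min_th and max_th the delay
 * probability ramps up linearly to max_p, above max_th we always delay. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct red_state {
	double avg;	/* EWMA of the backlog in packets */
	double w;	/* EWMA weight */
	double min_th;	/* start delaying above this average backlog */
	double max_th;	/* always delay above this average backlog */
	double max_p;	/* delay probability at max_th */
};

static bool red_should_delay(struct red_state *s, unsigned int backlog)
{
	double prob;

	s->avg = (1.0 - s->w) * s->avg + s->w * (double)backlog;

	if (s->avg < s->min_th)
		return false;
	if (s->avg >= s->max_th)
		return true;

	prob = s->max_p * (s->avg - s->min_th) / (s->max_th - s->min_th);
	return (double)rand() / RAND_MAX < prob;
}

int main(void)
{
	/* weight chosen artificially large so the example converges fast */
	struct red_state s = { .avg = 0.0, .w = 0.2,
			       .min_th = 5.0, .max_th = 15.0, .max_p = 0.1 };
	unsigned int backlog;

	for (backlog = 0; backlog < 30; backlog++) {
		bool delay = red_should_delay(&s, backlog);
		printf("backlog=%2u avg=%5.2f delay=%d\n",
		       backlog, s.avg, delay);
	}
	return 0;
}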
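
And the quick numeric check of the rfc3448 equation quoted at the top.
The numbers (1500 byte packets, 50 ms rtt, b=1, t_RTO = 4*R as the rfc
suggests) are made up; it's only meant to show how strongly the
achievable rate depends on the loss event rate p and the rtt R:

/* tfrc_check.c - evaluate the rfc3448 throughput equation for a few
 * loss event rates (compile with -lm). All input values are made up. */
#include <math.h>
#include <stdio.h>

/* X = s / (R*sqrt(2*b*p/3) + t_RTO*(3*sqrt(3*b*p/8)*p*(1+32*p^2))) */
static double tfrc_rate(double s, double R, double p, double t_rto, double b)
{
	double denom = R * sqrt(2.0 * b * p / 3.0) +
		       t_rto * 3.0 * sqrt(3.0 * b * p / 8.0) * p *
		       (1.0 + 32.0 * p * p);
	return s / denom;		/* bytes per second */
}

int main(void)
{
	static const double loss[] = { 0.0001, 0.001, 0.01, 0.1 };
	double s = 1500.0;		/* packet size in bytes */
	double R = 0.05;		/* 50 ms round trip time */
	double b = 1.0;			/* packets acked per ACK */
	unsigned int i;

	for (i = 0; i < sizeof(loss) / sizeof(loss[0]); i++)
		printf("p=%-7g X=%10.0f bytes/s\n",
		       loss[i], tfrc_rate(s, R, loss[i], 4.0 * R, b));
	return 0;
}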
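
Finally, the sanity checks I mean when I say we'd have to duplicate
parts of ip_rcv(): roughly length, version, header length and the
header checksum. This is only a user-space sketch of the idea; the
real ip_rcv() does a bit more, and the ipv6 case is different again:

/* ip_sanity.c - roughly the header checks ip_rcv() performs before the
 * stack touches a packet; an ingress "track" action running before the
 * ip stack would have to redo them. User-space sketch for illustration. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Standard Internet checksum (rfc1071) over the header. */
static uint16_t ip_csum(const void *data, size_t len)
{
	const uint16_t *p = data;
	uint32_t sum = 0;

	for (; len > 1; len -= 2)
		sum += *p++;
	if (len)
		sum += *(const uint8_t *)p;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

static bool ip_header_ok(const struct iphdr *iph, size_t caplen)
{
	if (caplen < sizeof(*iph))
		return false;			/* truncated */
	if (iph->version != 4)
		return false;			/* not IPv4 */
	if (iph->ihl < 5 || (size_t)iph->ihl * 4 > caplen)
		return false;			/* bogus header length */
	if (ntohs(iph->tot_len) < iph->ihl * 4)
		return false;			/* total length too small */
	if (ip_csum(iph, iph->ihl * 4) != 0)
		return false;			/* bad header checksum */
	return true;
}

int main(void)
{
	struct iphdr iph = {
		.version = 4, .ihl = 5, .ttl = 64, .protocol = IPPROTO_TCP,
		.tot_len = htons(sizeof(iph)),
		.saddr = htonl(0x0a000001), .daddr = htonl(0x0a000002),
	};

	iph.check = ip_csum(&iph, sizeof(iph));
	printf("valid header:     %d\n", ip_header_ok(&iph, sizeof(iph)));

	iph.ttl--;	/* modify a field without fixing the checksum */
	printf("corrupted header: %d\n", ip_header_ok(&iph, sizeof(iph)));
	return 0;
}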