From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jarek Poplawski
Subject: Re: [PATCH iproute2] Re: HTB accuracy for high speed
Date: Wed, 3 Jun 2009 07:40:49 +0000
Message-ID: <20090603074049.GA5254@ff.dom.local>
References: <20090530200756.GF3166@ami.dom.local>
 <298f5c050906020312r514c4638sfa2b504f55d71bc1@mail.gmail.com>
 <298f5c050906020445n3941b4ceic1167a4a028005bf@mail.gmail.com>
 <20090602123635.GC4239@ff.dom.local>
 <4A251EEE.4060903@trash.net>
 <20090602130857.GA7690@ff.dom.local>
 <4A252714.2020008@trash.net>
 <20090602213723.GB2850@ami.dom.local>
 <4A259EB2.5010500@gmail.com>
 <4A2620FD.8030708@trash.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Antonio Almeida, Stephen Hemminger, netdev@vger.kernel.org,
 davem@davemloft.net, devik@cdi.cz, Eric Dumazet, Vladimir Ivashchenko
To: Patrick McHardy
Return-path:
Received: from mail-bw0-f222.google.com ([209.85.218.222]:45693
 "EHLO mail-bw0-f222.google.com" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1755214AbZFCHky (ORCPT );
 Wed, 3 Jun 2009 03:40:54 -0400
Received: by bwz22 with SMTP id 22so8485257bwz.37 for ;
 Wed, 03 Jun 2009 00:40:55 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <4A2620FD.8030708@trash.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, Jun 03, 2009 at 09:06:37AM +0200, Patrick McHardy wrote:
> Jarek Poplawski wrote:
>> Jarek Poplawski wrote, On 06/02/2009 11:37 PM:
>> ...
>>
>>> I described the reasoning here:
>>> http://permalink.gmane.org/gmane.linux.network/128189
>>
>> The link is stuck now, so here is a quote:
>
> Thanks.
>
>> Jarek Poplawski wrote, On 05/17/2009 10:15 PM:
>>
>>> Here is some additional explanation. It looks like these rates above
>>> 500Mbit hit the design limits of packet scheduling. Currently used
>>> internal resolution PSCHED_TICKS_PER_SEC is 1,000,000. 550Mbit rate
>>> with 800byte packets means 550M/8/800 = 85938 packets/s, so on average
>>> 1000000/85938 = 11.6 ticks per packet.
>>> Accounting only 11 ticks means
>>> we leave 0.6*85938 = 51563 ticks per second, letting for additional
>>> sending of 51563/11 = 4687 packets/s or 4687*800*8 = 30Mbit. Of course
>>> it could be worse (0.9 tick/packet lost) depending on packet sizes vs.
>>> rates, and the effect rises for higher rates.
>
> I see. Unfortunately changing the scaling factors is pushing the lower
> end towards overflowing. For example Denys Fedoryshchenko reported some
> breakage a few years ago when I changed the iproute-internal factors
> triggered by this command:
>
> .. tbf buffer 1024kb latency 500ms rate 128kbit peakrate 256kbit
> minburst 16384
>
> The burst size calculated by TBF with the current parameters is
> 64000000. Increasing it by a factor of 16 as in your patch results
> in 1024000000. Which means we're getting dangerously close to
> overflowing, a buffer size increase or a rate decrease of slightly
> bigger than factor 4 will already overflow.
>
> Mid-term we really need to move to 64 bit values and ns resolution,
> otherwise this problem is just going to reappear as soon as someone
> tries 10gbit. Not sure what the best short term fix is, I feel a bit
> uneasy about changing the current factors given how close this brings
> us towards overflowing.

I completely agree it's on the verge of overflow, and it actually would
overflow for some insanely low (by today's standards) rates. So I
treat it as a temporary solution, until people start asking about
more than 1 or 2Gbit. And of course we will have to move to 64 bit
anyway. Or we can do it now...

Btw., I have some doubts about HFSC; it's really different from the
others wrt. rate tables/time accounting, and these PSCHED_TICKS look
like nothing more than an unnecessary compatibility layer; it works OK
with usecs and doesn't need this change now, unless I'm missing
something. So maybe we could simply stop using the common
psched_get_time() for it, and only do a conversion for
qdisc_watchdog_schedule() etc.?

Thanks,
Jarek P.
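
For reference, the two calculations quoted in this message can be re-run with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the per-packet cost is assumed to be truncated to whole ticks, and the burst value is assumed to be held in an unsigned 32-bit field. None of this is taken from the kernel or iproute2 sources.

```python
# Sketch only: names and the truncation/32-bit assumptions are illustrative,
# not taken from the kernel or iproute2 sources.

TICKS_PER_SEC = 1_000_000  # PSCHED_TICKS_PER_SEC as quoted in the thread


def rounding_excess_bps(rate_bps, pkt_bytes, ticks_per_sec=TICKS_PER_SEC):
    """Extra bits/s sent when the per-packet cost is truncated to whole ticks."""
    pkts_per_sec = rate_bps / 8 / pkt_bytes
    exact_ticks = ticks_per_sec / pkts_per_sec   # e.g. about 11.6
    charged_ticks = int(exact_ticks)             # assumed truncation, e.g. 11
    if charged_ticks == 0:
        raise ValueError("packet costs less than one tick at this resolution")
    # Rate actually achieved if every packet is charged only charged_ticks.
    achieved_bps = pkt_bytes * 8 * ticks_per_sec / charged_ticks
    return exact_ticks, charged_ticks, achieved_bps - rate_bps


def overflow_headroom(value_ticks, scale, bits=32):
    """Scaled value and the further growth factor that reaches 2**bits."""
    scaled = value_ticks * scale
    return scaled, 2**bits / scaled


# 550Mbit with 800 byte packets, as in the quoted explanation.
exact, charged, excess = rounding_excess_bps(550_000_000, 800)
print(f"{exact:.1f} ticks/packet, {charged} charged, ~{excess / 1e6:.0f}Mbit excess")

# TBF burst of 64000000 scaled by 16, as in the quoted example.
scaled, headroom = overflow_headroom(64_000_000, 16)
print(f"scaled burst {scaled}, overflows after a further x{headroom:.2f}")
```

The excess comes out slightly above the 30Mbit quoted above because the quote rounds the lost fraction to 0.6 tick; the headroom matches the "slightly bigger than factor 4" estimate.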