From: Eric Dumazet
Subject: Re: [RFC] GRO scalability
Date: Fri, 05 Oct 2012 21:00:34 +0200
Message-ID: <1349463634.21172.152.camel@edumazet-glaptop>
References: <1348750130.5093.1227.camel@edumazet-glaptop> <1348769294.5093.1566.camel@edumazet-glaptop> <1348769990.5093.1584.camel@edumazet-glaptop> <1348841041.5093.2477.camel@edumazet-glaptop> <1349448747.21172.113.camel@edumazet-glaptop> <506F23F6.1060704@hp.com>
In-Reply-To: <506F23F6.1060704@hp.com>
To: Rick Jones
Cc: Herbert Xu, David Miller, netdev, Jesse Gross

On Fri, 2012-10-05 at 11:16 -0700, Rick Jones wrote:

> Flushing things if N packets have come through sounds like goodness, and
> it reminds me a bit about what happens with IP fragment reassembly -
> another area where the stack is trying to guess just how long to
> hang-onto a packet before doing something else with it.  But the value
> of N to get a "decent" per-flow GRO aggregation rate will depend on the
> number of concurrent flows right?  If I want to have a good shot at
> getting 2 segments combined for 1000 active, concurrent flows entering
> my system via that interface, won't N have to approach 2000?

It all depends on the max latency you can afford.

> GRO (and HW LRO) has a fundamental limitation/disadvantage here.
> GRO does provide a very nice "boost" on various situations (especially
> numbers of concurrent netperfs that don't blow-out the tracking limits)
> but since it won't really know anything about the flow(s) involved (*)
> or even their number (?), it will always be guessing.  That is why it is
> really only "poor man's JumboFrames" (or larger MTU - Sadly, the IEEE
> keeps us all beggars here).
>
> A goodly portion of the benefit of GRO comes from the "incidental" ACK
> avoidance it causes yes?  That being the case, might that be a
> worthwhile avenue to explore?  It would then naturally scale as TCP et
> al do today.
>
> When we go to 40 GbE will we have 4x as many flows, or the same number
> of 4x faster flows?
>
> rick jones
>
> * for example - does this TCP segment contain the last byte(s) of a
> pipelined http request/response and the first byte(s) of the next one
> and so should "flush" now?

Some remarks:

1) I use some 40GbE links, that's probably why I try to improve things ;)

2) The benefit of GRO can be huge, and not only for the ACK avoidance
   (other tricks could be done for ACK avoidance in the stack)

3) High speeds probably need a multiqueue device, and each queue has its
   own GRO unit.

   For example, on a 40GbE link with 8 queues -> 5Gbps per queue
   (about 400k packets/sec)

   Let's say we allow no more than 1ms of delay in GRO; this means we
   could have about 400 packets in the GRO queue (assuming 1500-byte
   packets)

Another idea to play with would be to extend GRO to allow packet
reordering.
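
The arithmetic in remark 3 can be checked with a rough sketch like the one below. The function names are hypothetical and exist only for this illustration; it assumes traffic is spread evenly across the queues and ignores Ethernet framing overhead (preamble, inter-frame gap), so the numbers are ballpark figures rather than anything measured.

```python
# Back-of-the-envelope sizing of a per-queue GRO backlog.
# All names here are illustrative; none of this is kernel code.

def per_queue_rate_bps(link_bps: float, num_queues: int) -> float:
    """Bit rate seen by one RX queue, assuming an even spread across queues."""
    return link_bps / num_queues


def packets_per_second(rate_bps: float, packet_bytes: int) -> float:
    """Packet rate for a given bit rate, ignoring framing overhead."""
    return rate_bps / (packet_bytes * 8)


def gro_queue_depth(link_bps: float, num_queues: int,
                    packet_bytes: int, max_delay_s: float) -> int:
    """Packets that can arrive on one queue within the allowed GRO delay."""
    rate = per_queue_rate_bps(link_bps, num_queues)
    return int(packets_per_second(rate, packet_bytes) * max_delay_s)


if __name__ == "__main__":
    rate = per_queue_rate_bps(40e9, 8)
    print(f"per-queue rate: {rate / 1e9:.1f} Gbps")
    print(f"packet rate:    {packets_per_second(rate, 1500):.0f} packets/sec")
    print(f"queue depth:    {gro_queue_depth(40e9, 8, 1500, 1e-3)} packets in 1 ms")
```

With 40 Gbps, 8 queues, 1500-byte packets and a 1 ms budget, this gives roughly 417k packets/sec and a little over 400 packets per queue, in line with the estimate above.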