From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rick Jones
Subject: Re: small RPS cache for fragments?
Date: Tue, 24 May 2011 14:38:48 -0700
Message-ID: <1306273128.8149.1444.camel@tardy>
References: <20110517.143342.1566027350038182221.davem@davemloft.net>
	<20110524.160123.2051949867829317339.davem@davemloft.net>
Reply-To: rick.jones2@hp.com
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: netdev@vger.kernel.org
To: David Miller
Return-path: 
Received: from g1t0029.austin.hp.com ([15.216.28.36]:40487 "EHLO
	g1t0029.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1757132Ab1EXViu (ORCPT );
	Tue, 24 May 2011 17:38:50 -0400
In-Reply-To: <20110524.160123.2051949867829317339.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID: 

On Tue, 2011-05-24 at 16:01 -0400, David Miller wrote:
> So I looked into implementing this now that it has been established
> that we changed even Linux to emit fragments in-order.
> 
> The first problem we run into is that there is no "context" we can
> use in all the places where skb_get_rxhash() gets called.
> 
> Part of the problem is that we call it from strange places, such as
> egress packet schedulers.  That's completely bogus.
> 
> Examples, FLOW classifier, META e-match, CHOKE, and SFB.
> 
> In fact, for the classifiers this means they aren't making use of the
> precomputed TX hash values in the sockets like __skb_tx_hash() will
> make use of.  So this makes these packet schedulers operate
> potentially more expensively than they need to.
> 
> If we could get rid of those silly cases, the stuff that remains
> (macvtap and net/core/dev.c) could work with a NAPI context during
> rxhash computation and use that to store the IP fragmentation
> on-behind cached information.

Isn't there still an issue (perhaps small) of traffic being sent
through a mode-rr bond, either at the origin or somewhere along the
way?
Whether it happens at the origin will depend on the presence of UFO
and whether it is propagated up through the bond interface, but as a
quick test I disabled TSO, GSO and UFO on four e1000e-driven
interfaces, bonded them mode-rr, and ran a netperf UDP_RR test with a
1473-byte request size.  This is what the traffic looked like at my
un-bonded receiver at the other end:

14:31:01.011370 IP (tos 0x0, ttl 64, id 24960, offset 1480, flags [none], proto UDP (17), length 21) tardy.local > raj-8510w.local: udp
14:31:01.011420 IP (tos 0x0, ttl 64, id 24960, offset 0, flags [+], proto UDP (17), length 1500) tardy.local.36073 > raj-8510w.local.59951: UDP, length 1473
14:31:01.011514 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 29) raj-8510w.local.59951 > tardy.local.36073: UDP, length 1

Note the offset-1480 fragment of id 24960 arriving before the offset-0
fragment that carries the UDP ports.

rick jones