From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rick Jones
Subject: Re: small RPS cache for fragments?
Date: Tue, 17 May 2011 13:41:15 -0700
Message-ID: <1305664875.8149.945.camel@tardy>
References: <20110517.143342.1566027350038182221.davem@davemloft.net>
 <1305663434.8149.936.camel@tardy>
Reply-To: rick.jones2@hp.com
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: David Miller , netdev@vger.kernel.org
To: Tom Herbert
Return-path:
Received: from g6t0184.atlanta.hp.com ([15.193.32.61]:16800 "EHLO
 g6t0184.atlanta.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S932307Ab1EQUsP (ORCPT );
 Tue, 17 May 2011 16:48:15 -0400
In-Reply-To: <1305663434.8149.936.camel@tardy>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 2011-05-17 at 13:17 -0700, Rick Jones wrote:
> On Tue, 2011-05-17 at 13:02 -0700, Tom Herbert wrote:
> > I like it! And this sounds like the sort of algorithm that NICs might
> > be able to implement to solve the UDP/RSS unpleasantness, so even
> > better.
>
> Do (m)any devices take "shortcuts" with UDP datagrams these days? By
> that I mean that back in the day, the HP-PB and "Slider" FDDI
> cards/drivers did checksum offload for fragmented UDP datagrams by
> sending the first fragment, the one with the UDP header and thus
> checksum, last. It did that to save space on the card and make use of
> the checksum accumulator.

Even if no devices (mis)behave like that today, ordering of fragments
sent via a mode-rr bond is far from a sure thing.

rick

>
> rick jones
>
> >
> > Tom
> >
> > On Tue, May 17, 2011 at 11:33 AM, David Miller wrote:
> > >
> > > It seems to me that we can solve the UDP fragmentation problem for
> > > flow steering very simply by creating a (saddr/daddr/IPID) entry in a
> > > table that maps to the corresponding RPS flow entry.
> > >
> > > When we see the initial frag with the UDP header, we create the
> > > saddr/daddr/IPID mapping, and we tear it down when we hit the
> > > saddr/daddr/IPID mapping and the packet has the IP_MF bit clear.
> > >
> > > We only inspect the saddr/daddr/IPID cache when iph->frag_off is
> > > non-zero.
> > >
> > > It's best effort and should work quite well.
> > >
> > > Even a one-behind cache, per-NAPI instance, would do a lot better than
> > > what happens at the moment. Especially since the IP fragments mostly
> > > arrive as one packet train.
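
For illustration only, the one-behind cache described in the quoted
proposal might look roughly like the userspace sketch below. It is not
kernel code and not anyone's actual patch. The names frag_cache,
frag_cache_steer, FRAG_MF and FRAG_OFFMASK are hypothetical, frag_off is
assumed to be already converted to host byte order, and the DF bit is
masked off so that an unfragmented packet with DF set is not treated as
a fragment (an assumption that goes slightly beyond "frag_off is
non-zero").

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local names for the IPv4 "more fragments" flag and the
 * 13-bit fragment offset mask (values per RFC 791). */
#define FRAG_MF      0x2000u
#define FRAG_OFFMASK 0x1fffu

/* One-behind cache: remembers a single saddr/daddr/IPID mapping.
 * The proposal suggests one such instance per NAPI context. */
struct frag_cache {
    uint32_t saddr, daddr;
    uint16_t id;
    uint32_t flow_hash;   /* stands in for the RPS flow entry */
    bool valid;
};

/* Returns true and sets *flow_hash when a steering decision is known.
 * Returns false on a cache miss; the caller then falls back to
 * whatever it does today. l4_hash is only meaningful for packets that
 * carry the UDP header (unfragmented packets and initial fragments). */
static bool frag_cache_steer(struct frag_cache *c,
                             uint32_t saddr, uint32_t daddr, uint16_t id,
                             uint16_t frag_off, uint32_t l4_hash,
                             uint32_t *flow_hash)
{
    uint16_t offset = frag_off & FRAG_OFFMASK;
    bool more = (frag_off & FRAG_MF) != 0;

    /* Not a fragment: the cache is not consulted at all. */
    if (!more && offset == 0) {
        *flow_hash = l4_hash;
        return true;
    }

    /* Initial fragment (has the UDP header): create the mapping. */
    if (offset == 0) {
        c->saddr = saddr;
        c->daddr = daddr;
        c->id = id;
        c->flow_hash = l4_hash;
        c->valid = true;
        *flow_hash = l4_hash;
        return true;
    }

    /* Later fragment: look up the mapping, best effort. */
    if (!c->valid || c->saddr != saddr || c->daddr != daddr || c->id != id)
        return false;

    *flow_hash = c->flow_hash;
    if (!more)            /* MF clear: last fragment, tear down */
        c->valid = false;
    return true;
}
```

In this sketch, the reordering cases raised above (first fragment sent
last, or fragments spread across a mode-rr bond) simply show up as
misses, since a later fragment that arrives before the initial one
finds no mapping.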