From: Jan-Bernd Themann
To: Jörn Engel
Cc: Thomas Klein, Jeff Garzik, Jan-Bernd Themann, netdev, linux-kernel,
    linux-ppc, Christoph Raisch, Marcus Eder, Andrew Gallatin,
    Stefan Roscher, David Miller
Subject: Re: [PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic
Date: Mon, 6 Aug 2007 09:51:11 +0200

Hi Jörn,

On Friday 03 August 2007 15:41, Jörn Engel wrote:
> On Fri, 3 August 2007 14:41:19 +0200, Jan-Bernd Themann wrote:
> >
> > This patch provides generic Large Receive Offload (LRO) functionality
> > for IPv4/TCP traffic.
> >
> > LRO combines received TCP packets into a single larger TCP packet and
> > then passes it to the network stack in order to increase performance
> > (throughput). The interface supports two modes: drivers can either pass
> > SKBs or fragment lists to the LRO engine.
>
> Maybe this is a stupid question, but why is LRO done at the device
> driver level?
>
> If it is a universal performance benefit, I would have expected it to be
> done generically, i.e. have all packets moved into the network layer pass
> through LRO instead.

The driver seems to be the right place:

- There is the "page mode" interface that accepts fragment lists instead
  of SKBs and only generates SKBs at the very end (see Andrew Gallatin's
  mails, where he described the advantages of this approach).

- Some drivers (in particular for 10G NICs, which actually could benefit
  from LRO) have multiple HW receive queues that do some sort of sorting,
  so using one lro_mgr per queue increases the likelihood of being able
  to do efficient LRO.

> > +void lro_flush_pkt(struct net_lro_mgr *lro_mgr,
> > +                   struct iphdr *iph, struct tcphdr *tcph);
>
> In particular this bit looks like it should be driven by a timeout,
> which would be settable via /proc/sys/net/core/lro_timeout or similar.

No, this function is needed for "page mode": some HW provides extra
handling for small packets, which are not stored in the preallocated
pages but in extra queues. The driver therefore needs a way to flush old
LRO sessions for such a connection and handle these packets in a
different way (for example create an SKB and copy the data there).
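To make that more concrete, here is a rough sketch of what this
small-packet path could look like on the driver side. It only relies on
the lro_flush_pkt() prototype quoted above; the helper name, the frame
layout and the other receive calls are illustrative assumptions and not
something dictated by the patch.

    /*
     * Rough sketch only: the HW delivered this packet on its extra
     * small-packet queue, so it bypassed the preallocated pages.  Any
     * LRO session still open for this connection has to be flushed
     * before the packet is handed to the stack, to keep the segments
     * in order.  All names except lro_flush_pkt() are made up here.
     */
    #include <linux/etherdevice.h>
    #include <linux/if_ether.h>
    #include <linux/inet_lro.h>
    #include <linux/ip.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>
    #include <linux/string.h>
    #include <linux/tcp.h>

    static void my_handle_small_packet(struct net_lro_mgr *lro_mgr,
                                       struct net_device *dev,
                                       void *frame, unsigned int len)
    {
            /* Assumes 'frame' points at the Ethernet header. */
            struct iphdr *iph = frame + ETH_HLEN;
            struct tcphdr *tcph = (void *)iph + iph->ihl * 4;
            struct sk_buff *skb;

            /* Push the still-open LRO session for this connection. */
            lro_flush_pkt(lro_mgr, iph, tcph);

            /* Build a plain SKB for the small packet and copy the data. */
            skb = netdev_alloc_skb(dev, len);
            if (!skb)
                    return;
            memcpy(skb_put(skb, len), frame, len);
            skb->protocol = eth_type_trans(skb, dev);
            netif_receive_skb(skb);
    }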
Timeouts are not used at all. Experiments showed that flushing at the
end of a NAPI poll round seems to be sufficient (see Andrew's test
results) and does not affect the latency too badly.

Regards,
Jan-Bernd
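P.S.: For reference, a minimal sketch of what "flushing at the end of a
NAPI poll round" means from the driver's point of view. It assumes a
flush-all helper (here called lro_flush_all()) and the napi_struct
interface; the queue structure and my_process_rx_queue() are made-up
names, not part of the patch.

    #include <linux/inet_lro.h>
    #include <linux/kernel.h>
    #include <linux/netdevice.h>

    struct my_rx_queue {
            struct napi_struct napi;
            struct net_lro_mgr lro_mgr;     /* one LRO manager per HW queue */
    };

    /* Illustrative helper: receives up to 'budget' packets and feeds each
     * of them to the LRO engine (SKB or fragment-list mode). */
    static int my_process_rx_queue(struct my_rx_queue *rxq, int budget);

    static int my_napi_poll(struct napi_struct *napi, int budget)
    {
            struct my_rx_queue *rxq =
                    container_of(napi, struct my_rx_queue, napi);
            int work_done = my_process_rx_queue(rxq, budget);

            if (work_done < budget) {
                    /*
                     * The poll round is over: push all still-open LRO
                     * sessions to the stack now.  No timer is involved.
                     */
                    lro_flush_all(&rxq->lro_mgr);
                    napi_complete(napi);
            }

            return work_done;
    }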