From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from qw-out-2122.google.com (qw-out-2122.google.com [74.125.92.24])
	by ozlabs.org (Postfix) with ESMTP id EC026DE172;
	Fri, 10 Oct 2008 02:37:54 +1100 (EST)
Received: by qw-out-2122.google.com with SMTP id 9so25295qwb.15;
	Thu, 09 Oct 2008 08:37:53 -0700 (PDT)
Message-ID: <48EE2553.30903@genesi-usa.com>
Date: Thu, 09 Oct 2008 10:37:55 -0500
From: Matt Sealey
MIME-Version: 1.0
To: Paul Mackerras
Subject: Re: performance: memcpy vs. __copy_tofrom_user
References: <48ECC611.3030309@mikroswiat.pl> <20081008154212.GA21723@secretlab.ca>
	<18669.28058.495259.72182@cargo.ozlabs.ibm.com> <48EDD905.6070609@mikroswiat.pl>
	<18669.58803.48011.686743@cargo.ozlabs.ibm.com>
In-Reply-To: <18669.58803.48011.686743@cargo.ozlabs.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Sender: Matt Sealey
Cc: linuxppc-dev@ozlabs.org, Dominik Bozek, linuxppc-embedded@ozlabs.org
List-Id: Linux on PowerPC Developers Mail List

Paul Mackerras wrote:
> Dominik Bozek writes:
>
>> Actually I made couple of other tests on that mpc8313. Most of them are
>> to ugly to publish them, but... My problem is that I have to boost the
>> gigabit interface on the mpc8313. I made simple substitution and
>> __copy_tofrom_user was used instead of memcpy. I know, it's wrong, but I
>> speedup that way the network interface for about 10%.
>
> Very interesting. Can you work out where memcpy is being called on
> the network data? I wouldn't have expected that.

It probably is somewhere, through some weird and wonderful code path
that needs some serious digging to find. At least in 2.4, memcpy was
used, and optimizing it (see Freescale's libmotovec benchmarks) did
produce a sizable performance improvement. That, and offloading TCP
checksumming to AltiVec, helped a lot. No help at all on an 8313, but
relevant anyway.
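For anyone wanting to put rough numbers on a swap like the one described
above before digging through the network stack, a userspace timing
harness might look like the sketch below. It is only an illustration
under stated assumptions: copy_bytes() is a hypothetical stand-in for
whatever alternative routine is being tried (it is not
__copy_tofrom_user, which only exists in the kernel), time_copy() and
the copy_fn typedef are names invented here, and clock() reports CPU
time at coarse resolution, so the figures are indicative at best.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* memcpy: the baseline routine to compare against */
#include <time.h>

/* Same shape as memcpy, so either routine can be passed to time_copy(). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

/* Hypothetical alternative copy routine (assumption: buffers do not
 * overlap). A plain byte loop, used only as something to compare. */
void *copy_bytes(void *dst, const void *src, size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));

    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
    return dst;
}

/* Run fn(dst, src, len) iters times and return the elapsed CPU time in
 * seconds. The volatile read after each call is there to discourage
 * the compiler from discarding the repeated copies. */
double time_copy(copy_fn fn, void *dst, const void *src,
                 size_t len, long iters)
{
    volatile unsigned char *probe = dst;
    unsigned char sink = 0;

    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        fn(dst, src, len);
        if (len > 0)
            sink ^= probe[0];
    }
    clock_t end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(memcpy, dst, src, 1500, 1000000) and then
time_copy(copy_bytes, dst, src, 1500, 1000000) on the same buffers gives
a crude side-by-side figure. The 1500-byte length is picked here only
because it matches a standard Ethernet MTU; it says nothing about where
the kernel actually calls memcpy.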
Since then, with zero-copy networking and other fancy things like the
DMA engine API (for Intel I/OAT at least, but there is also fsl dma
support), there's less to actually optimize now, so you're less likely
to see the same benefits. All these got into mainline because it's
essential to have this kind of architecture to get reasonable speeds
out of >gigabit network links.

> There is actually no strong reason not to use __copy_tofrom_user as
> memcpy, in fact, as long as we are sure that source and destination
> are both cacheable.

I do think there is probably a good benefit in doing things like
zeroing pages with AltiVec and copying entire pages with AltiVec (for
instance, when copy-on-write happens in an application). NetBSD and QNX
implement at least this, because it's faster than using the cache
management instructions and works fine on uncacheable pages too. Also,
since you're always aligned to a page (zeroing 4 KB aligned to a 4 KB
boundary, or whatever your page size happens to be), the number of
errors that can occur is absolutely tiny and performance can go through
the roof.

Ahem, but nobody here wants AltiVec in the kernel, do they?

-- 
Matt Sealey
Genesi, Manager, Developer Relations