Subject: Re: performance: memcpy vs. __copy_tofrom_user
From: Benjamin Herrenschmidt
To: Matt Sealey
Cc: linuxppc-dev@ozlabs.org, Dominik Bozek, Paul Mackerras,
 linuxppc-embedded@ozlabs.org
Reply-To: benh@kernel.crashing.org
Date: Sun, 12 Oct 2008 15:05:37 +1100
Message-Id: <1223784337.8157.193.camel@pasglop>
In-Reply-To: <48F15B7D.3060608@genesi-usa.com>
References: <48ECC611.3030309@mikroswiat.pl>
 <20081008154212.GA21723@secretlab.ca>
 <18669.28058.495259.72182@cargo.ozlabs.ibm.com>
 <48EDD905.6070609@mikroswiat.pl>
 <18669.58803.48011.686743@cargo.ozlabs.ibm.com>
 <48EE2553.30903@genesi-usa.com>
 <1223764226.8157.182.camel@pasglop>
 <48F15B7D.3060608@genesi-usa.com>
Content-Type: text/plain
Mime-Version: 1.0
List-Id: Linux on PowerPC Developers Mail List

> Would the examples (page copy, page clear) be an okay place to do it?
> These sections can't be preempted anyway (right?), and it's noted that
> doing it with AltiVec is a tad faster than using MMU tricks or standard
> copies?

I think typically page copying and clearing -are- preemptible. I'm not
sure what you mean by MMU tricks, but it's not clear whether using
AltiVec will result in any significant performance gain here,
considering the cost of enabling/disabling AltiVec (added to handling
the preemption issue).

However, nothing prevents you from trying to do it and we'll see what
the results are with hard numbers.

> In Scott's case, while "optimizing memcpy for 48byte blocks" was a joke,
> this is 3 load/stores in AltiVec, which as long as every SKB is 16
> byte aligned (is there any reason why it would not be? :)

In this case, the cost of enabling/saving/restoring AltiVec will far
outweigh any benefit. In addition, skbs are often not well aligned due
to the alignment tricks done with packet headers.

> skb_clone might not be something you want to dump AltiVec into and would
> make a mess if an skb got extended somehow, but the principle is outlined
> in a very good document from a very long time ago;
>
> http://www.motorola.com.cn/semiconductors/sndf/conference/PDF/AH1109.pdf
>
> I think a lot of it still holds true as long as you really don't care
> about preemption under these circumstances (where network throughput
> is more important, and where AltiVec actually *reduces* CPU time, the
> overhead of disabling preemption is lower anyway). You could say the
> same about the RAID functions - I bet LatencyTOP has a field day when
> you're using RAID5 AltiVec.

RAID6 actually :-)

In any case, as I said, people are welcome to implement something that
can be put to the test and measured. If it proves beneficial enough,
then I see no reason not to merge it.

Basically, enough talk, just do something and we'll see whether it
proves useful or not.

Cheers,
Ben.