From: Matt Sealey
Date: Sat, 11 Oct 2008 21:05:49 -0500
To: benh@kernel.crashing.org
Cc: linuxppc-dev@ozlabs.org, Dominik Bozek, Paul Mackerras, linuxppc-embedded@ozlabs.org
Subject: Re: performance: memcpy vs. __copy_tofrom_user
Message-ID: <48F15B7D.3060608@genesi-usa.com>
In-Reply-To: <1223764226.8157.182.camel@pasglop>
References: <48ECC611.3030309@mikroswiat.pl> <20081008154212.GA21723@secretlab.ca> <18669.28058.495259.72182@cargo.ozlabs.ibm.com> <48EDD905.6070609@mikroswiat.pl> <18669.58803.48011.686743@cargo.ozlabs.ibm.com> <48EE2553.30903@genesi-usa.com> <1223764226.8157.182.camel@pasglop>
List-Id: Linux on PowerPC Developers Mail List

Benjamin Herrenschmidt wrote:
> On Thu, 2008-10-09 at 10:37 -0500, Matt Sealey wrote:
>> Ahem, but nobody here wants AltiVec in the kernel do they?
>
> It depends. We do use altivec in the kernel for example for
> RAID accelerations.
>
> The reason where we require a -real-good- reason to do it is
> simply because of the drawbacks. The cost of enabling altivec
> in the kernel can be high (especially if the user is using it)
> and it's not context switched for kernel code (just like the
> FPU) for obvious performance reasons. Thus any use of altivec in the
> kernel must be done within non-preemptible sections, which can
> cause higher latencies in preemptible kernels.

Would the examples (page copy, page clear) be an okay place to do it?
These sections can't be preempted anyway (right?), and it's noted that
doing it with AltiVec is a tad faster than using MMU tricks or standard
copies?

In Scott's case, while "optimizing memcpy for 48byte blocks" was a joke,
this is 3 load/stores in AltiVec, as long as every SKB is 16-byte
aligned (is there any reason why it would not be? :).

skb_clone might not be something you want to dump AltiVec into, and it
would make a mess if an skb got extended somehow, but the principle is
outlined in a very good document from a very long time ago:

http://www.motorola.com.cn/semiconductors/sndf/conference/PDF/AH1109.pdf

I think a lot of it still holds true as long as you really don't care
about preemption under these circumstances (where network throughput is
more important, and where AltiVec actually *reduces* CPU time, the
overhead of disabling preemption is lower anyway).

You could say the same about the RAID functions - I bet LatencyTOP has a
field day when you're using RAID5 AltiVec. But if you're more concerned
about fast disk access, would you really care? Especially since the
algorithm is automatically selected on boot, you've not much chance of
having any choice in the matter anyway.

Granted it also doesn't help Scott one bit. Sorry :D

--
Matt Sealey
Genesi, Manager, Developer Relations
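
For illustration, the "3 load/stores" idea for a 48-byte block could be
sketched roughly as below. This is only a user-space sketch under stated
assumptions, not kernel code and not anything taken from the thread:
copy48_aligned is a hypothetical helper name, the AltiVec path is only
compiled when the compiler defines __ALTIVEC__, and a plain memcpy loop
stands in for it elsewhere. The alignment check matters because vec_ld
and vec_st ignore the low four address bits, so a misaligned pointer
would quietly copy the wrong bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/*
 * Hypothetical helper (not a kernel API): copy exactly 48 bytes as three
 * 16-byte chunks. Returns 0 on success, or -1 if either pointer is not
 * 16-byte aligned.
 */
static int copy48_aligned(void *dst, const void *src)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    /* Reject any pointer whose low four address bits are set. */
    if ((((uintptr_t)dst) | ((uintptr_t)src)) & 15u)
        return -1;

#if defined(__ALTIVEC__)
    /* Three vector loads, then three vector stores. */
    __vector unsigned char v0 = vec_ld(0, s);
    __vector unsigned char v1 = vec_ld(16, s);
    __vector unsigned char v2 = vec_ld(32, s);
    vec_st(v0, 0, d);
    vec_st(v1, 16, d);
    vec_st(v2, 32, d);
#else
    /* Portable stand-in: same three 16-byte chunks, no vector unit. */
    for (size_t off = 0; off < 48; off += 16)
        memcpy(d + off, s + off, 16);
#endif
    return 0;
}
```

Inside the kernel, the vector path would also have to run with
preemption disabled and AltiVec enabled for kernel use, which is exactly
the cost Ben describes above; the sketch leaves that part out.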