From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (Client did not present a certificate)
    by ozlabs.org (Postfix) with ESMTPS id 49305DDE11
    for ; Tue, 14 Oct 2008 13:49:47 +1100 (EST)
Subject: Re: performance: memcpy vs. __copy_tofrom_user
From: Benjamin Herrenschmidt
To: Matt Sealey
In-Reply-To: <48F40077.5060003@genesi-usa.com>
References: <48ECC611.3030309@mikroswiat.pl>
    <20081008154212.GA21723@secretlab.ca>
    <18669.28058.495259.72182@cargo.ozlabs.ibm.com>
    <48EDD905.6070609@mikroswiat.pl>
    <18669.58803.48011.686743@cargo.ozlabs.ibm.com>
    <48EE2553.30903@genesi-usa.com>
    <1223764226.8157.182.camel@pasglop>
    <48F15B7D.3060608@genesi-usa.com>
    <20081013152028.GA18639@ld0162-tx32.am.freescale.net>
    <1223931027.8157.272.camel@pasglop>
    <48F3B7A2.3010004@freescale.com>
    <48F40077.5060003@genesi-usa.com>
Content-Type: text/plain
Date: Tue, 14 Oct 2008 13:39:19 +1100
Message-Id: <1223951959.8157.318.camel@pasglop>
Mime-Version: 1.0
Cc: Scott Wood, linuxppc-dev@ozlabs.org, Dominik Bozek, Paul Mackerras, linuxppc-embedded@ozlabs.org
Reply-To: benh@kernel.crashing.org
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

> There should definitely be a nice API for an in-kernel AltiVec context
> save/restore. When preemption happens doesn't it do some equivalent of
> the userspace context switch? Why can't the preemption system take care
> of it?
>
> At worst case you make the worst case latency bigger, but at best case
> you gain performance across the board.

Do you ? Can you prove this assertion with numbers ?

> One thing which is worrying me is that now that Ben has thrown down the
> gauntlet (note, I'm not going to be coding a line, but I know a man who
> can :) how on earth do we benchmark the differences here?

Precisely :-)

So again, let's start by having somebody pick up something that you
believe is worth altivec-ifying, and eat the preempt_disable/enable for
now. If we see that it is indeed worth the pain, then we can look into
adding a way to context switch altivec in a kernel thread upon explicit
request, or something like that.

As to how to benchmark the difference ? Well, I would suggest starting
with a couple of very simple things that give a good indication. From
there, if it looks promising, we can torture it more and see whether we
can find regressions etc.

For example, I personally use kernel compile times (with make -jN on
SMP), which I find a good overall exercise. If you feel a network
benchmark would be better at advertising your improvements, then go for
that too, though expect us to also run some other tests to verify that
nothing regressed.

Cheers,
Ben.
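
For reference, the "eat the preempt_disable/enable for now" approach
could look roughly like the sketch below. This is only an illustration
under stated assumptions, not code from this thread: the three stub
functions are userspace stand-ins named after the kernel calls so the
file builds outside the kernel, copy_with_altivec_sketch() is a
hypothetical name, and the plain memcpy() marks where a vectorized loop
would go.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-ins (assumptions for illustration only). In a real
 * powerpc kernel build these names refer to the kernel's own
 * preempt_disable(), preempt_enable() and enable_kernel_altivec().
 */
static int preempt_depth;   /* nesting counter, only used by the stubs */
static int altivec_enabled; /* set once the stub "enables" the vector unit */

static void preempt_disable(void)
{
    preempt_depth++;
}

static void preempt_enable(void)
{
    assert(preempt_depth > 0);  /* enable must pair with a disable */
    preempt_depth--;
}

static void enable_kernel_altivec(void)
{
    /* Only valid while preemption is off, since nothing would save
     * the kernel's vector state across a context switch. */
    assert(preempt_depth > 0);
    altivec_enabled = 1;
}

/* Hypothetical helper: the region that would hold the AltiVec loop. */
static void copy_with_altivec_sketch(void *dst, const void *src, size_t len)
{
    preempt_disable();        /* pin the task for the whole copy */
    enable_kernel_altivec();  /* make the vector unit usable here */
    memcpy(dst, src, len);    /* stand-in for the vectorized copy loop */
    preempt_enable();         /* preemption was off for the entire copy */
}
```

The cost in question is visible in the shape of the helper: preemption
stays off for the whole copy, so worst-case latency grows with len, and
that is what the numbers would have to justify.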