From mboxrd@z Thu Jan 1 00:00:00 1970
From: Benjamin Herrenschmidt
Subject: Re: x86: faster strncpy_from_user()
Date: Wed, 11 Apr 2012 11:25:01 +1000
Message-ID: <1334107501.2984.19.camel@pasglop>
References: <1334097321.3040.62.camel@pasglop> <20120410.192550.1904257005034151185.davem@davemloft.net> <20120410.204322.1625218582933057448.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from gate.crashing.org ([63.228.1.57]:35938 "EHLO gate.crashing.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754747Ab2DKBZE (ORCPT ); Tue, 10 Apr 2012 21:25:04 -0400
In-Reply-To:
Sender: linux-arch-owner@vger.kernel.org
List-ID:
To: Linus Torvalds
Cc: David Miller, mingo@kernel.org, hpa@zytor.com, x86@kernel.org, linux-arch@vger.kernel.org

On Tue, 2012-04-10 at 17:57 -0700, Linus Torvalds wrote:
> On Tue, Apr 10, 2012 at 5:50 PM, Linus Torvalds wrote:
> >
> > .. and you don't have a double shift, right? So you'd need to do two
> > shifts and the or for each word.
>
> Actually, if sparc has a "rotate" instruction, you can do with a
> single shift (rotate) per word loop.
>
> You need to set up a mask register based on the alignment, and
> pre-load the first word, but if you do have a rotate you can rotate
> and then use "and mask" first to generate the "high bits" of the
> current word, and then use the "andn mask" to generate the low bits of
> the next word. So then you just need a single rotate per loop, and
> some (very minor) loop prep.
>
> Of course, RISC people tended to throw out rotate too, so maybe you
> don't have even that.

Well, we do have a very nice & flexible rotate & mask on ppc at least :-)

Cheers,
Ben.
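
For readers following the quoted text, here is a minimal C sketch of the rotate-and-mask idea, set against the two-shifts-and-an-or version it replaces. It is an illustration only, not code from the thread or from any kernel tree. The function names (ror64, funnel_two_shifts, funnel_rotate_mask) are made up for this example. It assumes 64-bit words, a misalignment of s bits with 0 < s < 64, and little-endian-style bit numbering (the bytes wanted from the current word sit in its high bits); a big-endian machine such as sparc would mirror the shift directions and swap the roles of the mask and its complement. The caller must supply one more input word than output words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate right by s bits; only valid for 0 < s < 64. */
static inline uint64_t ror64(uint64_t w, unsigned s)
{
    assert(s > 0 && s < 64);
    return (w >> s) | (w << (64 - s));
}

/* Baseline: two shifts and an OR for every output word. */
static void funnel_two_shifts(const uint64_t *in, size_t n_out,
                              unsigned s, uint64_t *out)
{
    assert(s > 0 && s < 64);
    for (size_t i = 0; i < n_out; i++)
        out[i] = (in[i] >> s) | (in[i + 1] << (64 - s));
}

/* Rotate-and-mask variant: one rotate, one AND, one AND-NOT and an OR
 * per output word, plus a little setup before the loop. */
static void funnel_rotate_mask(const uint64_t *in, size_t n_out,
                               unsigned s, uint64_t *out)
{
    assert(s > 0 && s < 64);

    /* Mask chosen from the alignment: selects the low (64 - s) bits. */
    const uint64_t mask = ~UINT64_C(0) >> s;

    /* Pre-load and rotate the first word before entering the loop. */
    uint64_t cur = ror64(in[0], s);

    for (size_t i = 0; i < n_out; i++) {
        uint64_t next = ror64(in[i + 1], s);   /* the single rotate */

        /* "and mask" keeps the part contributed by the current word,
         * "andn mask" keeps the part contributed by the next word. */
        out[i] = (cur & mask) | (next & ~mask);

        cur = next;   /* the rotated word is reused next iteration */
    }
}
```

Both functions should produce identical output for any valid s; the saving is that each loaded word is rotated once and then reused for two consecutive output words, instead of being shifted twice.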