From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753333AbcBJK6u (ORCPT );
	Wed, 10 Feb 2016 05:58:50 -0500
Received: from mail.skyhub.de ([78.46.96.112]:49771 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751713AbcBJK6s (ORCPT );
	Wed, 10 Feb 2016 05:58:48 -0500
Date: Wed, 10 Feb 2016 11:58:43 +0100
From: Borislav Petkov
To: "Luck, Tony"
Cc: Ingo Molnar, Andrew Morton, Andy Lutomirski, Dan Williams,
	elliott@hpe.com, Brian Gerst, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-nvdimm@ml01.01.org, x86@kernel.org
Subject: Re: [PATCH v10 3/4] x86, mce: Add __mcsafe_copy()
Message-ID: <20160210105843.GD23914@pd.tnic>
References: <6b63a88e925bbc821dc87f209909c3c1166b3261.1454618190.git.tony.luck@intel.com>
	<20160207164933.GE5862@pd.tnic>
	<20160209231557.GA23207@agluck-desk.sc.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20160209231557.GA23207@agluck-desk.sc.intel.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 09, 2016 at 03:15:57PM -0800, Luck, Tony wrote:
> > You can save yourself this MOV here in what is, I'm assuming, the
> > general likely case where @src is aligned and do:
> >
> > 	/* check for bad alignment of source */
> > 	testl $7, %esi
> > 	/* already aligned? */
> > 	jz 102f
> >
> > 	movl %esi,%ecx
> > 	subl $8,%ecx
> > 	negl %ecx
> > 	subl %ecx,%edx
> > 0:	movb (%rsi),%al
> > 	movb %al,(%rdi)
> > 	incq %rsi
> > 	incq %rdi
> > 	decl %ecx
> > 	jnz 0b
>
> The "testl $7, %esi" just checks the low three bits ... it doesn't
> change %esi. But the code from the "subl $8" on down assumes that
> %ecx is a number in [1..7] as the count of bytes to copy until we
> achieve alignment.
Grr, sorry about that, I actually failed to copy-paste the AND:

	/* check for bad alignment of source */
	testl $7, %esi
	jz 102f			/* already aligned */

	movl %esi,%ecx
	andl $7,%ecx
	subl $8,%ecx
	negl %ecx
	subl %ecx,%edx
0:	movb (%rsi),%al
	movb %al,(%rdi)
	incq %rsi
	incq %rdi
	decl %ecx
	jnz 0b

I'm basically proposing to move the unlikely case out of line and
optimize the likely one.

> So your "movl %esi,%ecx" needs to be something that just copies the
> low three bits and zeroes the high part of %ecx. Is there a cute
> way to do that in x86 assembler?

We could do some funky games with byte-sized moves but those are
generally slower anyway so doing the default operand size thing should
be ok.

> I copied that loop from arch/x86/lib/copy_user_64.S:__copy_user_nocache()

> I guess the answer depends on whether you generally copy enough
> cache lines to save enough time to cover the cost of saving and
> restoring those registers.

Well, that function will run on modern hw with a stack engine so I'd
assume those 4 pushes and pops would be paid for by the increased
register count for the data shuffling.

But one could take that function out and do some microbenchmarking with
different sizes, once with the current version and once with the pushes
and pops of r1[2-5], to see where the breakeven is.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.