From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 9 Feb 2016 15:15:57 -0800
From: "Luck, Tony"
To: Borislav Petkov
Cc: Ingo Molnar, Andrew Morton, Andy Lutomirski, Dan Williams,
	elliott@hpe.com, Brian Gerst, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-nvdimm@ml01.01.org, x86@kernel.org
Subject: Re: [PATCH v10 3/4] x86, mce: Add __mcsafe_copy()
Message-ID: <20160209231557.GA23207@agluck-desk.sc.intel.com>
In-Reply-To: <20160207164933.GE5862@pd.tnic>

> You can save yourself this MOV here in what is, I'm assuming, the
> general likely case where @src is aligned and do:
>
>	/* check for bad alignment of source */
>	testl $7, %esi
>	/* already aligned? */
>	jz 102f
>
>	movl %esi,%ecx
>	subl $8,%ecx
>	negl %ecx
>	subl %ecx,%edx
> 0:	movb (%rsi),%al
>	movb %al,(%rdi)
>	incq %rsi
>	incq %rdi
>	decl %ecx
>	jnz 0b

The "testl $7, %esi" just checks the low three bits ... it doesn't
change %esi.  But the code from the "subl $8" on down assumes that
%ecx is a number in [1..7] as the count of bytes to copy until we
achieve alignment.

So your "movl %esi,%ecx" needs to be something that copies just the
low three bits and zeroes the high part of %ecx.  Is there a cute way
to do that in x86 assembler?
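As an aside, the head count the quoted sequence is trying to compute is
(8 - (src & 7)) & 7; the obvious fix for the concern above would be to
mask after the copy, e.g. "movl %esi,%ecx; andl $7,%ecx" before the
subl/negl pair.  A minimal C sketch of that arithmetic (hypothetical
helper name, not from the patch):

```c
#include <stdint.h>

/*
 * Bytes to copy before src reaches 8-byte alignment.
 * Mirrors the asm: keep only the low three bits of the address
 * (andl $7), then 8 - low3 (subl $8 / negl).  The "& 7" at the end
 * maps an already-aligned address to 0, matching the "jz 102f" skip.
 */
static unsigned int head_bytes(uintptr_t src)
{
	return (8 - (src & 7)) & 7;
}
```

For an unaligned pointer this always lands in [1..7], which is exactly
the precondition the byte-copy loop needs in %ecx.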
> Why aren't we pushing %r12-%r15 on the stack after the "jz 17f" above
> and using them too and thus copying a whole cacheline in one go?
>
> We would need to restore them when we're done with the cacheline-wise
> shuffle, of course.

I copied that loop from arch/x86/lib/copy_user_64.S:__copy_user_nocache()

I guess the answer depends on whether you generally copy enough cache
lines to save enough time to cover the cost of saving and restoring
those registers.

-Tony
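For reference, the shape of the tradeoff being discussed can be sketched
in C (a hypothetical illustration, not the actual kernel code): reading a
full 64-byte cacheline into eight temporaries per iteration is what using
%r8-%r15 would buy in the asm, at the cost of push/pop of the
callee-saved %r12-%r15 on entry and exit.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy 'lines' cachelines of 64 bytes each, 8-byte aligned.
 * The eight locals stand in for eight GP registers; in asm, four of
 * them (%r12-%r15) are callee-saved and must be pushed/popped, a fixed
 * cost that only pays off when enough cachelines are copied.
 */
static void copy_cachelines(uint64_t *dst, const uint64_t *src, size_t lines)
{
	while (lines--) {
		uint64_t a = src[0], b = src[1], c = src[2], d = src[3];
		uint64_t e = src[4], f = src[5], g = src[6], h = src[7];

		dst[0] = a; dst[1] = b; dst[2] = c; dst[3] = d;
		dst[4] = e; dst[5] = f; dst[6] = g; dst[7] = h;
		src += 8;
		dst += 8;
	}
}
```

Whether the per-iteration win covers the prologue/epilogue cost is
exactly the question raised above, and it depends on the typical copy
size.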