From mboxrd@z Thu Jan 1 00:00:00 1970
From: will.deacon@arm.com (Will Deacon)
Date: Tue, 24 Oct 2017 12:09:05 +0100
Subject: [PATCH v2] arm64: optimize __memcpy_fromio and __memcpy_toio
In-Reply-To: <20171023162611.37098-1-salyzyn@android.com>
References: <20171023162611.37098-1-salyzyn@android.com>
Message-ID: <20171024110905.GA31064@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Mon, Oct 23, 2017 at 09:25:35AM -0700, Mark Salyzyn wrote:
> __memcpy_fromio and __memcpy_toio functions do not deal well with
> harmonically unaligned addresses unless they can ultimately be
> copied as quads (u64) to and from the destination. Without a
> harmonically aligned relationship, they perform byte operations
> over the entire buffer.
>
> Dropped the fragment that tried to align on the normal memory,
> placing a priority on using quad alignment on the io-side.
>
> Removed the volatile on the source for __memcpy_toio as it is
> unnecessary.
>
> This change was motivated by performance issues in the pstore driver.
> On a test platform, measuring probe time for pstore, console buffer
> size of 1/4MB and pmsg of 1/2MB, was in the 90-107ms region. Change
> managed to reduce it to 10-25ms, an improvement in boot time.
>
> Signed-off-by: Mark Salyzyn
> Cc: Kees Cook
> Cc: Anton Vorontsov
> Cc: Tony Luck
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Anton Vorontsov
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
>
> v2:
> - simplify, do not try so hard, or through steps, to align on the
> normal memory side, as it was a diminishing return. Dealing with
> any pathological short cases was unnecessary since there does not
> appear to be any.
> - drop similar __memset_io changes completely.

I'm fine with the idea here, but can you leave the '8's alone and not replace
them with sizeof(u64) please? I don't think it helps anybody, and we still
use ++/-- for the u8 case.

Will
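
For readers without the patch at hand, the copy strategy described in the quoted commit message (byte copies until the io-side pointer is 8-byte aligned, quad copies while at least 8 bytes remain, byte copies for the tail, with no attempt to align the normal-memory side) can be sketched in userspace C. This is only an illustration under stated assumptions, not the code from the patch: sketch_memcpy_fromio is a hypothetical name, plain volatile loads stand in for the kernel's raw I/O accessors, and memcpy() is used for the possibly unaligned store to normal memory. It keeps the literal 8s, in line with the review comment above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for an io-to-memory copy.
 * "from" plays the role of the io-side buffer; real kernel code
 * would go through raw I/O accessors rather than plain loads.
 */
static void sketch_memcpy_fromio(void *to, const volatile void *from,
				 size_t count)
{
	unsigned char *dst = to;
	const volatile unsigned char *src = from;

	/* Byte copies until the io-side pointer is 8-byte aligned. */
	while (count && ((uintptr_t)src & 7)) {
		*dst++ = *src++;
		count--;
	}

	/*
	 * Quad reads from the now-aligned io side. The normal-memory
	 * side may be unaligned, so store through memcpy().
	 */
	while (count >= 8) {
		uint64_t v = *(const volatile uint64_t *)src;

		memcpy(dst, &v, 8);
		src += 8;
		dst += 8;
		count -= 8;
	}

	/* Remaining tail bytes. */
	while (count) {
		*dst++ = *src++;
		count--;
	}
}
```

With this shape, only the leading bytes (at most 7) and the trailing bytes (at most 7) are copied one at a time, regardless of how the normal-memory pointer is aligned relative to the io-side pointer.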