From: David Laight
To: "'Ingo Molnar'", Thomas Gleixner
CC: "'Rahul Lakkireddy'", "x86@kernel.org", "linux-kernel@vger.kernel.org", "netdev@vger.kernel.org", "mingo@redhat.com", "hpa@zytor.com", "davem@davemloft.net", "akpm@linux-foundation.org", "torvalds@linux-foundation.org", "ganeshgr@chelsio.com", "nirranjan@chelsio.com", "indranil@chelsio.com", "Andy Lutomirski", Peter Zijlstra, Fenghua Yu, Eric Biggers
Subject: RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
Date: Tue, 20 Mar 2018 13:30:59 +0000
In-Reply-To: <20180320105427.bm4od7cpessbraag@gmail.com>

From: Ingo Molnar
> Sent: 20 March 2018 10:54
...
> Note that a generic version might still be worth trying out, if and only if it's
> safe to access those vector registers directly: modern x86 CPUs will do their
> non-constant memcpy()s via the common memcpy_erms() function - which could in
> theory be an easy common point to be (cpufeatures-) patched to an AVX2 variant, if
> size (and alignment, perhaps) is a multiple of 32 bytes or so.
>
> Assuming it's correct with arbitrary user-space FPU state and if it results in any
> measurable speedups, which might not be the case: ERMS is supposed to be very
> fast.
>
> So even if it's possible (which it might not be), it could end up being slower
> than the ERMS version.

Last I checked memcpy() was implemented as 'rep movsb' on the latest Intel cpus.
Since memcpy_to/fromio() get aliased to memcpy() this generates byte copies.
The previous 'fastest' version of memcpy() was ok for uncached locations.

For PCIe I suspect that the actual instructions don't make a massive difference.
I'm not even sure interleaving two transfers makes any difference.
What makes a huge difference for memcpy_fromio() is the size of the register.
The time taken for a read will be largely independent of the width of the register used.

	David
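
The register-width point can be put as a back-of-envelope model. The following is only a sketch, not a measurement: the per-read cost `READ_LATENCY_NS` is a made-up figure, the helper names `reads_needed()` and `est_copy_ns()` are hypothetical, and it assumes each read costs the same regardless of width and that reads are not overlapped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical round-trip cost of one PCIe read, in nanoseconds.
 * Assumed for illustration only; the real figure must be measured. */
#define READ_LATENCY_NS 1000UL

/* Number of reads needed to copy len bytes when each read fetches
 * width bytes. Rounds up to cover a partial final read. */
size_t reads_needed(size_t len, size_t width)
{
	assert(width > 0);
	return (len + width - 1) / width;
}

/* Estimated copy time in ns under the assumption that every read
 * costs READ_LATENCY_NS whatever its width. */
unsigned long est_copy_ns(size_t len, size_t width)
{
	return (unsigned long)reads_needed(len, width) * READ_LATENCY_NS;
}
```

Under these assumptions a 4096-byte copy takes 4096 reads with byte-wide accesses but 128 reads with 32-byte accesses, so the estimated time drops by a factor of 32.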