From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Laight Subject: RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access Date: Tue, 20 Mar 2018 13:30:59 +0000 Message-ID: References: <7f0ddb3678814c7bab180714437795e0@AcuMS.aculab.com> <7f8d811e79284a78a763f4852984eb3f@AcuMS.aculab.com> <20180320082651.jmxvvii2xvmpyr2s@gmail.com> <20180320090802.qw4tqjmhy6yfd6sf@gmail.com> <20180320105427.bm4od7cpessbraag@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT Cc: 'Rahul Lakkireddy' , "x86@kernel.org" , "linux-kernel@vger.kernel.org" , "netdev@vger.kernel.org" , "mingo@redhat.com" , "hpa@zytor.com" , "davem@davemloft.net" , "akpm@linux-foundation.org" , "torvalds@linux-foundation.org" , "ganeshgr@chelsio.com" , "nirranjan@chelsio.com" , "indranil@chelsio.com" , "Andy Lutomirski" , Peter Zijlstra , Fenghua Yu , Eric Biggers To: 'Ingo Molnar' , Thomas Gleixner Return-path: In-Reply-To: <20180320105427.bm4od7cpessbraag@gmail.com> Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org List-Id: netdev.vger.kernel.org From: Ingo Molnar > Sent: 20 March 2018 10:54 ... > Note that a generic version might still be worth trying out, if and only if it's > safe to access those vector registers directly: modern x86 CPUs will do their > non-constant memcpy()s via the common memcpy_erms() function - which could in > theory be an easy common point to be (cpufeatures-) patched to an AVX2 variant, if > size (and alignment, perhaps) is a multiple of 32 bytes or so. > > Assuming it's correct with arbitrary user-space FPU state and if it results in any > measurable speedups, which might not be the case: ERMS is supposed to be very > fast. > > So even if it's possible (which it might not be), it could end up being slower > than the ERMS version. Last I checked memcpy() was implemented as 'rep movsb' on the latest Intel cpus. Since memcpy_to/fromio() get aliased to memcpy() this generates byte copies. The previous 'fastest' version of memcpy() was ok for uncached locations. For PCIe I suspect that the actual instructions don't make a massive difference. I'm not even sure interleaving two transfers makes any difference. What makes a huge difference for memcpy_fromio() is the size of the register. The time taken for a read will be largely independent of the width of the register used. David