From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Laight
Subject: RE: [RFC PATCH 0/3] kernel: add support for 256-bit IO access
Date: Tue, 20 Mar 2018 09:59:40 +0000
Message-ID: <43d86d051123403496311bb70babadd5@AcuMS.aculab.com>
References: <7f0ddb3678814c7bab180714437795e0@AcuMS.aculab.com>
 <7f8d811e79284a78a763f4852984eb3f@AcuMS.aculab.com>
 <20180320082651.jmxvvii2xvmpyr2s@gmail.com>
 <20180320090802.qw4tqjmhy6yfd6sf@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
Cc: 'Rahul Lakkireddy' , "x86@kernel.org" ,
 "linux-kernel@vger.kernel.org" , "netdev@vger.kernel.org" ,
 "mingo@redhat.com" , "hpa@zytor.com" , "davem@davemloft.net" ,
 "akpm@linux-foundation.org" , "torvalds@linux-foundation.org" ,
 "ganeshgr@chelsio.com" , "nirranjan@chelsio.com" ,
 "indranil@chelsio.com" , "Andy Lutomirski" , Peter Zijlstra ,
 Fenghua Yu , Eric Biggers
To: 'Thomas Gleixner' , Ingo Molnar
Return-path:
In-Reply-To:
Content-Language: en-US
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

From: Thomas Gleixner
> Sent: 20 March 2018 09:41
> On Tue, 20 Mar 2018, Ingo Molnar wrote:
> > * Thomas Gleixner wrote:
...
> > > And if we go down that road then we want a AVX based memcpy()
> > > implementation which is runtime conditional on the feature bit(s) and
> > > length dependent. Just slapping a readqq() at it and use it in a loop does
> > > not make any sense.
> >
> > Yeah, so generic memcpy() replacement is only feasible I think if the most
> > optimistic implementation is actually correct:
> >
> >  - if no preempt disable()/enable() is required
> >
> >  - if direct access to the AVX[2] registers does not disturb legacy FPU state in
> >    any fashion
> >
> >  - if direct access to the AVX[2] registers cannot raise weird exceptions or have
> >    weird behavior if the FPU control word is modified to non-standard values by
> >    untrusted user-space
> >
> > If we have to touch the FPU tag or control words then it's probably only good for
> > a specialized API.
>
> I did not mean to have a general memcpy replacement. Rather something like
> magic_memcpy() which falls back to memcpy when AVX is not usable or the
> length does not justify the AVX stuff at all.

There is probably no point for memcpy().

Where it would make a big difference is memcpy_fromio() for PCIe devices
(where longer TLPs make a big difference).
But any such code belongs in the memcpy_fromio() implementation, not in
every driver.
The implementation of memcpy_toio() is nothing like as critical.

It might be that the code would need to fall back to 64bit accesses
if the AVX(2) registers can't currently be accessed - maybe some
obscure state....

However memcpy_to/fromio() are both horrid at the moment because
they result in byte copies!

	David
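
For illustration only, the dispatch discussed above (wide accesses when the
registers are usable and the length justifies it, otherwise 64bit accesses,
never a pure byte loop) might look roughly like the userspace sketch below.
Everything named sketch_* and the 64-byte threshold are assumptions made up
for this example, not kernel APIs. The 32-byte memcpy() only stands in for a
single 256-bit load/store, and a real memcpy_fromio() would need proper MMIO
accessors and FPU state handling, none of which is shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical threshold below which the wide path is not worth it. */
#define SKETCH_WIDE_MIN_LEN 64

/* Hypothetical stand-in for "can the AVX registers be used right now?". */
static bool sketch_wide_usable = true;

/* Counters so a caller can see which path handled the data. */
static size_t sketch_wide_chunks, sketch_qword_chunks, sketch_byte_chunks;

void sketch_copy_fromio(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Wide path: only if usable and the length justifies it. */
	if (sketch_wide_usable && len >= SKETCH_WIDE_MIN_LEN) {
		while (len >= 32) {
			memcpy(d, s, 32);	/* stand-in for one 256-bit access */
			d += 32;
			s += 32;
			len -= 32;
			sketch_wide_chunks++;
		}
	}

	/* Fallback (and remainder of the wide path): 64bit accesses. */
	while (len >= 8) {
		uint64_t v;

		memcpy(&v, s, sizeof(v));
		memcpy(d, &v, sizeof(v));
		d += 8;
		s += 8;
		len -= 8;
		sketch_qword_chunks++;
	}

	/* Only the last few bytes are copied one at a time. */
	while (len > 0) {
		*d++ = *s++;
		len--;
		sketch_byte_chunks++;
	}
}
```

With a 100 byte buffer this takes three wide chunks plus four tail bytes
when sketch_wide_usable is true, and twelve 64bit accesses plus four tail
bytes when it is false. The point of keeping the choice inside one function
is the same as above: drivers call a single copy routine and never need to
know which path was taken.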