From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750881AbWEZP3Y (ORCPT );
	Fri, 26 May 2006 11:29:24 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1750913AbWEZP3Y (ORCPT );
	Fri, 26 May 2006 11:29:24 -0400
Received: from rrcs-67-52-50-206.west.biz.rr.com ([67.52.50.206]:63870
	"HELO out-smtp.licor.com") by vger.kernel.org with SMTP
	id S1750881AbWEZP3Y (ORCPT );
	Fri, 26 May 2006 11:29:24 -0400
Subject: memcpy_toio on i386 using byte writes even when n%2==0
From: Chris Lesiak
To: linux-kernel@vger.kernel.org
Cc: Chris Lesiak
Content-Type: text/plain
Date: Fri, 26 May 2006 10:29:22 -0500
Message-Id: <1148657362.4376.77.camel@emerson.licor.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.2.3 (2.2.3-4.fc4)
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

I'm working on a driver for a custom PCI card on the i386 architecture.
The card uses a PLX9030 PCI bridge to link an FPGA to the PCI bus using
a 16 bit local bus.

I found that something broke when moving from 2.6.10 to 2.6.17-rc4.
In the driver, I use memcpy_toio to write 14 bytes to a memory region
in the FPGA.

To copy the 14 bytes, 2.6.10 does three 32 bit writes followed by one
16 bit write. 2.6.17-rc4 does three 32 bit writes followed by two 8 bit
writes.

The PLX9030 breaks the 32 bit writes into 16 bit writes for its local
bus just fine. The problem is that my board doesn't handle byte
enables. It was assumed that if all memory transfers were a multiple of
2 bytes, then byte accesses wouldn't be used. This is no longer true in
2.6.17-rc4.

I've solved the problem by padding the transfer to 16 bytes, but should
this be considered a bug in the kernel?

Both kernels use __memcpy to implement memcpy_toio.
Here is the relevant code.

The 2.6.10 version:

static inline void * __memcpy(void * to, const void * from, size_t n)
{
int d0, d1, d2;
__asm__ __volatile__(
	"rep ; movsl\n\t"
	"testb $2,%b4\n\t"
	"je 1f\n\t"
	"movsw\n"
	"1:\ttestb $1,%b4\n\t"
	"je 2f\n\t"
	"movsb\n"
	"2:"
	: "=&c" (d0), "=&D" (d1), "=&S" (d2)
	:"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
	: "memory");
return (to);
}

The 2.6.17-rc4 version:

static __always_inline void * __memcpy(void * to, const void * from, size_t n)
{
int d0, d1, d2;
__asm__ __volatile__(
	"rep ; movsl\n\t"
	"movl %4,%%ecx\n\t"
	"andl $3,%%ecx\n\t"
#if 1	/* want to pay 2 byte penalty for a chance to skip microcoded rep? */
	"jz 1f\n\t"
#endif
	"rep ; movsb\n\t"
	"1:"
	: "=&c" (d0), "=&D" (d1), "=&S" (d2)
	: "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
	: "memory");
return (to);
}

-- 
Chris Lesiak
chris.lesiak@licor.com
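
To make the difference in tail handling concrete, here is a minimal user-space sketch that models which store widths each sequence issues for a given n. It is only an illustration of the two assembly sequences quoted above, not kernel code: the function names store_widths_2_6_10 and store_widths_2_6_17 and the widths/max parameters are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the 2.6.10 sequence:
 *   rep movsl          -> n/4 stores of 4 bytes
 *   movsw if (n & 2)   -> at most one store of 2 bytes
 *   movsb if (n & 1)   -> at most one store of 1 byte
 * Writes the width of each store into widths[] (up to max entries)
 * and returns the number of stores recorded.
 */
static size_t store_widths_2_6_10(size_t n, unsigned char *widths, size_t max)
{
	size_t count = 0, i;

	assert(widths != NULL);
	for (i = 0; i < n / 4 && count < max; i++)
		widths[count++] = 4;
	if ((n & 2) && count < max)
		widths[count++] = 2;
	if ((n & 1) && count < max)
		widths[count++] = 1;
	return count;
}

/*
 * Hypothetical model of the 2.6.17-rc4 sequence:
 *   rep movsl               -> n/4 stores of 4 bytes
 *   rep movsb with ecx=n&3  -> 0 to 3 stores of 1 byte
 */
static size_t store_widths_2_6_17(size_t n, unsigned char *widths, size_t max)
{
	size_t count = 0, i;

	assert(widths != NULL);
	for (i = 0; i < n / 4 && count < max; i++)
		widths[count++] = 4;
	for (i = 0; i < (n & 3) && count < max; i++)
		widths[count++] = 1;
	return count;
}
```

For n = 14 the first model records 4, 4, 4, 2 and the second records 4, 4, 4, 1, 1, which matches the behaviour described above. For n = 16 both record four 4 byte stores, which is why padding to 16 bytes avoids the byte writes.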