Subject: RE: [PATCH 1/3] mpt2sas: remove the use of writeq, since writeq is not atomic
From: Benjamin Herrenschmidt
To: "Moore, Eric"
Cc: "Prakash, Sathya", "Desai, Kashyap", "linux-scsi@vger.kernel.org", Matthew Wilcox, Milton Miller, James Bottomley, "paulus@samba.org", "linuxppc-dev@lists.ozlabs.org"
Date: Thu, 19 May 2011 07:30:18 +1000
Message-ID: <1305754218.7481.0.camel@pasglop>
In-Reply-To: <4565AEA676113A449269C2F3A549520F80B66280@cosmail03.lsi.com>

On Wed, 2011-05-18 at 09:35 -0600, Moore, Eric wrote:
> I worked the original defect a couple months ago, and Kashyap is now
> getting around to posting my patches.
>
> This original defect has nothing to do with PPC64. The original
> problem was only on x86. It only became a problem on PPC64 when I
> tried to fix the original x86 issue by copying the writeq code from
> the linux headers, then it broke PPC64. I doubt that broken patch
> was ever posted. Anyways, back to the original defect. The reason it
> became a problem for x86 is because the kernel headers had an
> implementation of writeq in the arch/x86 headers, which means our
> internal implementation of writeq is not being used.
> The writeq implementation in the kernel is totally wrong for arch/x86
> because it does not have spin locks, and if two processors
> simultaneously do two separate 32bit pci writes, then what is
> received by controller firmware is out of order. This change occurred
> between Red Hat RHEL5 and RHEL6. In RHEL5, this writeq was not
> implemented in arch/x86 headers, and our driver internal
> implementation of writeq was used.

You may also want to look at Milton's comments: it looks like the way
you do init_completion followed immediately by wait_completion is racy.
You should init the completion before you do the IO that will eventually
trigger complete() to be called.

Cheers,
Ben.
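
To illustrate the ordering point about the completion, here is a minimal userspace sketch in C. It is not the mpt2sas code and not the kernel API: struct toy_completion and every toy_* function are hypothetical stand-ins that only loosely mirror init_completion(), complete() and a non-blocking wait, and the "IO" is assumed to finish immediately, as a fast interrupt might.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a completion: only a "done" counter. */
struct toy_completion {
    unsigned int done;
};

/* Reset the completion, discarding any event already recorded. */
void toy_init_completion(struct toy_completion *c)
{
    c->done = 0;
}

/* Record that the awaited event has happened. */
void toy_complete(struct toy_completion *c)
{
    c->done++;
}

/* Non-blocking check: consume one event if present, else report false. */
bool toy_try_wait(struct toy_completion *c)
{
    if (c->done == 0)
        return false;
    c->done--;
    return true;
}

/* Assumed "hardware": the IO finishes before the submitter continues. */
void toy_start_io(struct toy_completion *c)
{
    toy_complete(c);
}

/* Racy order: the IO is started first and the completion is initialised
 * only just before waiting, so an early completion event is wiped out. */
bool racy_order(void)
{
    struct toy_completion c = { 0 };

    toy_start_io(&c);
    toy_init_completion(&c);   /* too late: resets done back to 0 */
    return toy_try_wait(&c);   /* a blocking wait would hang or time out */
}

/* Safe order: initialise first, then start the IO, then wait. */
bool safe_order(void)
{
    struct toy_completion c;

    toy_init_completion(&c);
    toy_start_io(&c);
    return toy_try_wait(&c);
}
```

In this toy model racy_order() returns false because the event is lost, while safe_order() returns true. In a real driver the window is timing dependent, so the racy ordering may appear to work most of the time.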