From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Higdon
Subject: Re: SCSI QLA not working on latest *-mm SN2
Date: Mon, 20 Sep 2004 23:45:06 -0700
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <20040921064506.GA143950@sgi.com>
References: <20040917183029.GW642@parcelfarce.linux.theplanet.co.uk>
	<200409201540.02297.jbarnes@engr.sgi.com>
	<20040920232716.GD19511@colo.lackof.org>
	<200409201709.45008.jbarnes@engr.sgi.com>
	<20040921054626.GF19511@colo.lackof.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from omx2-ext.sgi.com ([192.48.171.19]:51105 "EHLO omx2.sgi.com")
	by vger.kernel.org with ESMTP id S267464AbUIUGon (ORCPT );
	Tue, 21 Sep 2004 02:44:43 -0400
Content-Disposition: inline
In-Reply-To: <20040921054626.GF19511@colo.lackof.org>
List-Id: linux-scsi@vger.kernel.org
To: Grant Grundler
Cc: Jesse Barnes, Andrew Vasquez, pj@sgi.com, linux-scsi@vger.kernel.org,
	mdr@cthulhu.engr.sgi.com, jeremy@cthulhu.engr.sgi.com,
	djh@cthulhu.engr.sgi.com, Andrew Morton

Lots of issues covered. I'd like to cover one of them first, since
it is an underlying principle in the discussion.

On Mon, Sep 20, 2004 at 11:46:26PM -0600, Grant Grundler wrote:
> On Mon, Sep 20, 2004 at 05:09:44PM -0700, Jesse Barnes wrote:
> > > Secondly, I don't recall hearing about problems like this
> > > on Intel or HP ia64 machines. I've only run into PCI posted write
> > > and DMA syncronization problems where the drivers aren't following
> > > all the rules quite right (missing mb() and readl()'s mostly).
> >
> > Problems like what?
>
> I've never heard of multiple writes from different CPUs going out of order
> to the PCI device.

It was my understanding that this could be a problem on any MP machine
where the CPUs use a write buffer (which is just about everything today).
The question seems to be whether release semantics (or the equivalent on
other chips) on IA64 apply to MMIO writes. I believe that they do not.
It seems that you think that they do.

On Altix, we ran into a problem with the qla1280 driver (see version 1.56
in the scsi-misc-2.6 bk tree) because the spin unlock (apparently) did
not imply retirement of a previous mmio write. In that rev, I added an
mmio read so that the mmio write would be completed before releasing the
spinlock (I believe this is the host lock, held during the call to
queuecommand).

Before making that change, the problem was that two different CPUs would
mmio write to the Request In register, and the ordering would flip,
causing the qla1280 to think that it suddenly had an entire request
queue.

We didn't see this problem on puny 64p machines (at least not under
ordinary stress testing); we needed a 512p machine to see it, though
odds are that it would have occurred very occasionally on smaller
machines.

jeremy