From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Vasquez
Subject: Re: QLA2200 causes kernel bug
Date: Fri, 7 Aug 2009 14:11:50 -0700
Message-ID: <20090807211150.GL18590@plap4-2.local>
References: <6e4c20e70908060828xd4a6a8fh801e1d456c39a5f@mail.gmail.com>
 <20090806164925.GO2453@plap4-2.local>
 <6e4c20e70908061012y3fa907aduca4f706cf5ccaa5a@mail.gmail.com>
 <6e4c20e70908062040x39d8d0b3p90e674ec5925c5ac@mail.gmail.com>
 <20090807070147.GA13292@plap4-2.local>
 <6e4c20e70908071219v56f52c2te3b331a229fe9706@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Received: from avexch1.qlogic.com ([198.70.193.115]:4077 "EHLO
 avexch1.qlogic.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1752656AbZHGVLu (ORCPT );
 Fri, 7 Aug 2009 17:11:50 -0400
Content-Disposition: inline
In-Reply-To: <6e4c20e70908071219v56f52c2te3b331a229fe9706@mail.gmail.com>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: Thomas Georgiou
Cc: "linux-scsi@vger.kernel.org"

On Fri, 07 Aug 2009, Thomas Georgiou wrote:

> I am not sure what is happening at 1840.
>
> The current topology is royal (the machine in this backtrace)
> connected via 2 fibre channel connections directly to a Powervault
> 224F jbod. This is then connected via 2 connections again to another
> 224F, which is then connected to another machine, fiord (which also
> has had problems).
>
> I had royal connected to one 224f with 2 connections and did not
> connect that jbod to anything else, and it worked with no problems for
> the time it was connected like that (2 days).
>

Ok, so it looks like there are two problems. First, I'd suggest you
talk with your JBOD vendor to see whether this daisy-chained
configuration is supported. Is the JBOD acting as a mini-hub in this
configuration?
Either way, as can be seen from the logs, your storage device is
continually LIP/LIP-resetting, causing intermittent loss of visibility
to your storage, often for long enough to have the midlayer begin its
reaping of scsi-devices. Given the low seed value for dev-loss-tmo
(set via your qlport_down_retry usage), after numerous LIPs you run
into the second issue: the BUG_ON() triggering within the FC
transport, during the deferred execution of rport reaping in
fc_timeout_deleted_rport().

> I have also tried connecting fiord and royal to two powervault 51f
> switches in a redundant configuration and then the switches to the
> 224Fs. This also generated problems and was where most of the
> backtraces in the bug reports came from.

Just for completeness, could you gather a similar set of driver logs
with error-logging enabled within this configuration?

> I have set qlport_down_retry=1 for faster failover.

Increasing it may help to avoid problem (2).

> Should I unset
> it?

A constant stream of RESETs is not expected.

Regards,
Andrew Vasquez
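
To check which dev-loss-tmo each remote port actually ended up with
after changing qlport_down_retry, something like the following might
help. It is only a rough sketch, not part of the driver or of any
tool: it assumes the FC transport exposes a per-rport dev_loss_tmo
attribute under /sys/class/fc_remote_ports, and the helper name
read_dev_loss_tmo is made up for illustration.

```python
from pathlib import Path


def read_dev_loss_tmo(sysfs_root="/sys/class/fc_remote_ports"):
    """Return {rport_name: dev_loss_tmo_seconds} for each FC remote port.

    Hypothetical helper: the sysfs location is an assumption about the
    running kernel's FC transport class, so it is passed as a parameter.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        # No FC transport class present (or a different sysfs layout).
        return result
    for rport in sorted(root.glob("rport-*")):
        attr = rport / "dev_loss_tmo"
        try:
            result[rport.name] = int(attr.read_text().strip())
        except (OSError, ValueError):
            # Attribute missing, unreadable, or not an integer: skip it.
            continue
    return result


if __name__ == "__main__":
    for name, tmo in read_dev_loss_tmo().items():
        print(f"{name}: dev_loss_tmo={tmo}s")
```

A very small value here would be consistent with rports being reaped
after only a brief loss of visibility during the LIPs.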