From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vladislav Bolkhovitin
Subject: Re: Who do we point to?
Date: Thu, 21 Aug 2008 16:17:56 +0400
Message-ID: <48AD5CF4.9060407@vlnb.net>
References: <200808201911.m7KJBTik015082@wind.enjellic.com>
	<200808210306.39959.stf_xl@wp.pl>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <200808210306.39959.stf_xl@wp.pl>
Sender: linux-scsi-owner@vger.kernel.org
To: Stanislaw Gruszka
Cc: scst-devel@lists.sourceforge.net, greg@enjellic.com,
	linux-driver@qlogic.com, neilb@suse.de, linux-raid@vger.kernel.org,
	linuxraid@amcc.com, linux-scsi@vger.kernel.org
List-Id: linux-raid.ids

Stanislaw Gruszka wrote:
>> Apologies for the large broadcast domain on this. I wanted to make
>> sure everyone who may have an interest in this is involved.
>>
>> Some feedback on another issue we encountered with Linux in a
>> production initiator/target environment with SCST. I'm including logs
>> below from three separate systems involved in the incident. I've gone
>> through them with my team and we are currently unsure what triggered
>> all this, hence the mail to everyone who may be involved.
>>
>> The system involved is SCST 1.0.0.0 running on a Linux 2.6.24.7 target
>> platform using the qla_isp driver module. The target machine has two
>> 9650 eight-port 3ware controller cards driving a total of sixteen
>> 750 GB Seagate NearLine drives. Firmware on the 3ware and Qlogic cards
>> should all be current. There are two identical servers in two
>> geographically separated data centers.
>>
>> The drives on each platform are broken into four 3+1 RAID5 devices
>> with software RAID. Each RAID5 volume is a physical volume for an LVM
>> volume group. There is currently one logical volume exported from each
>> of the four RAID5 volumes as a target device. A total of four
>> initiators are thus accessing the target server, each accessing a
>> different RAID5 volume.
>>
>> The initiators are running a stock 2.6.26.2 kernel with a RHEL5
>> userspace. Access to the SAN is via a 2462 dual-port Qlogic card.
>> The initiators see a block device from each of the two target servers
>> through separate ports/paths. The block devices form a software RAID1
>> device (with bitmaps) which is the physical volume for an LVM volume
>> group. The production filesystem is supported by a single logical
>> volume allocated from that volume group.
>>
>> A drive failure occurred last Sunday afternoon on one of the RAID5
>> volumes. The target kernel recognized the failure, failed the device
>> and kept going.
>>
>> Unfortunately, three of the four initiators picked up a device failure
>> which caused the SCST-exported volume to be faulted out of the RAID1
>> device. One of the initiators noted an incident was occurring, issued
>> a target reset and continued forward with no issues.
>>
>> The initiator which got things 'right' was not accessing the RAID5
>> volume on the target which experienced the error. Two of the three
>> initiators which faulted out their volumes were also not accessing the
>> compromised RAID5 volume. The initiator accessing that volume faulted
>> out its device.
> For some reason the SCST core needs to wait for the logical unit driver
> (aka dev handler) to abort a command. It is not possible to abort a
> command instantly, i.e. mark the command as aborted, return task
> management success to the initiator and, once the logical unit driver
> finishes, just free the resources of the aborted command (I don't know
> why, maybe Vlad could tell more about this).

That's a SAM requirement. Otherwise, if TM commands were completed
"instantly", without waiting for all affected commands to finish, it
would be possible for an aborted command to be executed in one more
retry *after* the next command that the initiator issued once the reset
had completed. The initiator would think that the aborted commands are
already dead, and such behavior could kill journaled filesystems.
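Just to illustrate the ordering constraint (the names below are invented
for the example and this is not the actual SCST code path; locking and
the completion path that wakes the wait queue are left out):

	#include <linux/list.h>
	#include <linux/types.h>
	#include <linux/wait.h>

	/* Invented types, for illustration only -- not the SCST interface. */
	struct example_cmd {
		struct list_head lun_list;
		bool aborted;
		bool owned_by_backstore;	/* still inside the dev handler / 3ware */
	};

	struct example_lun {
		struct list_head active_cmds;
		wait_queue_head_t drain_wq;	/* woken by the command completion path */
	};

	static bool aborted_cmds_in_flight(struct example_lun *lun)
	{
		struct example_cmd *cmd;

		list_for_each_entry(cmd, &lun->active_cmds, lun_list)
			if (cmd->aborted && cmd->owned_by_backstore)
				return true;
		return false;
	}

	static void example_abort_all(struct example_lun *lun)
	{
		struct example_cmd *cmd;

		/* 1. Mark the affected commands; from now on no data or
		 *    status may be sent back to the initiator for them. */
		list_for_each_entry(cmd, &lun->active_cmds, lun_list)
			cmd->aborted = true;

		/* 2. Wait until the dev handler (the 3ware stack here) has
		 *    really finished or returned every one of them. */
		wait_event(lun->drain_wq, !aborted_cmds_in_flight(lun));

		/* 3. Only now may the TM function be reported complete.  If
		 *    it were reported before step 2, a retry of an "aborted"
		 *    write could still reach the medium after commands the
		 *    initiator issues once it sees the reset done -- exactly
		 *    the reordering SAM forbids. */
	}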
> The Qlogic initiator device just waits for the 3ware card to abort the
> commands. As both systems have the same SCSI stack, the commands time
> out in the same way, so the 3ware driver will return an error to RAID5
> at roughly the same time as the Qlogic initiator times out. So
> sometimes Qlogic sends only a device reset and sometimes a target
> reset too.
>
> I believe increasing the timeouts in the sd driver on the initiator
> side (and maybe decreasing them on the target system) will help. These
> things are not run-time configurable, only compile-time. On the
> initiator systems I suggest increasing SD_TIMEOUT, and maybe on the
> target side decreasing SD_MAX_RETRIES; both values are in
> drivers/scsi/sd.h. In such a configuration, when a physical disk fails,
> 3ware will return an error while the initiator is still waiting for the
> command to complete, RAID5 on the target will do the right job, and
> from the initiator's point of view the command will finish
> successfully.
>
> Cheers
> Stanislaw Gruszka
>
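For reference, the knobs Stanislaw means look roughly like this (stock
values quoted from memory from a 2.6.26 tree, and the new numbers are
only an example -- please check and tune against your own sources):

	/* drivers/scsi/sd.h -- illustrative values only */

	/*
	 * Initiator side: give the target's internal error handling
	 * (3ware command timeout plus RAID5 recovery) enough room to
	 * finish before sd on the initiator gives up.  The stock value
	 * is 30*HZ, if I remember correctly.
	 */
	#define SD_TIMEOUT		(120 * HZ)

	/*
	 * Target side: fail a misbehaving local disk to MD sooner, so
	 * the error is handled by RAID5 on the target instead of
	 * propagating to the exported volume.  The stock value is 5.
	 */
	#define SD_MAX_RETRIES		1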