From: Vladislav Bolkhovitin <vst@vlnb.net>
To: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: scst-devel@lists.sourceforge.net, greg@enjellic.com,
linux-driver@qlogic.com, neilb@suse.de,
linux-raid@vger.kernel.org, linuxraid@amcc.com,
linux-scsi@vger.kernel.org
Subject: Re: Who do we point to?
Date: Thu, 21 Aug 2008 16:17:56 +0400
Message-ID: <48AD5CF4.9060407@vlnb.net>
In-Reply-To: <200808210306.39959.stf_xl@wp.pl>

Stanislaw Gruszka wrote:
>> Apologies for the large broadcast domain on this. I wanted to make
>> sure everyone who may have an interest in this is involved.
>>
>> Some feedback on another issue we encountered with Linux in a
>> production initiator/target environment with SCST. I'm including logs
>> below from three separate systems involved in the incident. I've gone
>> through them with my team and we are currently unsure what
>> triggered all this, hence the mail to everyone who may be involved.
>>
>> The system involved is SCST 1.0.0.0 running on a Linux 2.6.24.7 target
>> platform using the qla_isp driver module. The target machine has two
>> 9650 eight-port 3Ware controller cards driving a total of sixteen
>> 750-gigabyte Seagate NearLine drives. Firmware on the 3ware and Qlogic
>> cards should all be current. There are two identical servers in two
>> geographically separated data-centers.
>>
>> The drives on each platform are broken into four 3+1 RAID5 devices
>> with software RAID. Each RAID5 volume is a physical volume for an LVM
>> volume group. There is currently one logical volume exported from each
>> of the four RAID5 volumes as a target device. A total of four initiators
>> thus access the target server, each accessing a different RAID5
>> volume.
>>
>> The initiators are running a stock 2.6.26.2 kernel with a RHEL5
>> userspace. Access to the SAN is via a 2462 dual-port Qlogic card.
>> The initiators see a block device from each of the two target servers
>> through separate ports/paths. The block devices form a software RAID1
>> device (with bitmaps) which is the physical volume for an LVM volume
>> group. The production filesystem is supported by a single logical
>> volume allocated from that volume group.
>>
>> A drive failure occurred last Sunday afternoon on one of the RAID5
>> volumes. The target kernel recognized the failure, failed the device
>> and kept going.
>>
>> Unfortunately three of the four initiators picked up a device failure
>> which caused the SCST exported volume to be faulted out of the RAID1
>> device. One of the initiators noted an incident was occurring, issued
>> a target reset and continued forward with no issues.
>>
>> The initiator which got things 'right' was not accessing the RAID5
>> volume on the target which experienced the error. Two of the three
>> initiators which faulted out their volumes were also not accessing the
>> compromised RAID5 volume; the one initiator that was accessing it
>> faulted out its device as well.
> For some reason the SCST core needs to wait for the logical unit driver (aka dev
> handler) when aborting a command. It is not possible to abort a command instantly, i.e.
> mark the command as aborted, return task management success to the initiator and,
> once the logical unit driver finishes, just free the resources for the aborted command (I
> don't know why; maybe Vlad could tell more about this).
That's a SAM requirement. If TM commands were completed "instantly",
without waiting for all affected commands to finish, it would be possible
for an aborted command to be executed in one more retry *after* the next
command the initiator issued once the reset completed. The initiator
would believe the aborted commands were already dead, and such behavior
could kill journaled filesystems.
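
To make that ordering concrete, here is a small userspace model of the rule.
This is not SCST code; the names and the pthread plumbing are invented for
illustration. The only point is that the TM handler must block until the dev
handler has given back every affected command before answering the initiator:

/*
 * Illustrative model only (not SCST code): a TM function may be reported
 * complete to the initiator only after every affected command has really
 * finished on the target.
 */
#include <pthread.h>
#include <stdio.h>

struct lun_state {
	pthread_mutex_t lock;
	pthread_cond_t  all_done;
	int             outstanding;	/* commands still owned by the dev handler */
};

/* Called whenever the backend/dev handler finishes a command, for any reason. */
static void cmd_finished(struct lun_state *lun)
{
	pthread_mutex_lock(&lun->lock);
	if (--lun->outstanding == 0)
		pthread_cond_broadcast(&lun->all_done);
	pthread_mutex_unlock(&lun->lock);
}

/*
 * Target-side TM handler: block until the backend has completed every
 * outstanding command, and only then send the TM response.  The initiator
 * may issue new I/O (e.g. a journal commit) immediately after the response,
 * and no stale, aborted command may overtake it.
 */
static void handle_task_mgmt(struct lun_state *lun)
{
	pthread_mutex_lock(&lun->lock);
	while (lun->outstanding > 0)
		pthread_cond_wait(&lun->all_done, &lun->lock);
	pthread_mutex_unlock(&lun->lock);
	printf("TM response sent: all affected commands are really dead\n");
}

static void *backend(void *arg)
{
	struct lun_state *lun = arg;

	/* Pretend the dev handler eventually finishes its two commands. */
	cmd_finished(lun);
	cmd_finished(lun);
	return NULL;
}

int main(void)
{
	struct lun_state lun = {
		.lock        = PTHREAD_MUTEX_INITIALIZER,
		.all_done    = PTHREAD_COND_INITIALIZER,
		.outstanding = 2,
	};
	pthread_t t;

	pthread_create(&t, NULL, backend, &lun);
	handle_task_mgmt(&lun);	/* blocks until both commands have finished */
	pthread_join(t, NULL);
	return 0;
}

If handle_task_mgmt() answered before outstanding reached zero, a stale
retry of an aborted write could still hit the medium after the initiator's
post-reset journal commit, which is exactly what SAM forbids.
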
> The Qlogic initiator
> device just waits for the 3ware card to abort the commands. As both systems run the
> same SCSI stack, they use the same command timeouts. The 3ware driver will return an error
> to RAID5 at roughly the same time the Qlogic initiator times out, so
> sometimes Qlogic sends only a device reset and sometimes a target reset too.
>
> I believe increasing the timeouts in the sd driver on the initiator side (and maybe
> decreasing them on the target system) will help. These things are not run-time
> configurable, only compile-time. On the initiator systems I suggest increasing
> SD_TIMEOUT, and maybe on the target side decreasing SD_MAX_RETRIES; both values are
> in drivers/scsi/sd.h. In such a configuration, when a physical disk fails, 3ware
> will return an error while the initiator is still waiting for the command to complete, RAID5 on
> the target will do the right job, and from the initiator's point of view the command will
> finish successfully.
>
> Cheers
> Stanislaw Gruszka
>
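
For orientation, these are the compile-time knobs Stanislaw refers to, as
they appear in 2.6.2x-era drivers/scsi/sd.h (values quoted from memory;
verify against the actual tree before touching anything). The adjusted
numbers in the comments are only a sketch of the idea, not a tested
recommendation:

/* drivers/scsi/sd.h -- stock values in 2.6.2x kernels (check your tree) */
#define SD_TIMEOUT		(30 * HZ)	/* per-command timeout, in jiffies */
#define SD_MAX_RETRIES		5		/* retries before sd gives up on a command */

/*
 * Initiator side: raising SD_TIMEOUT (e.g. to 120 * HZ) gives the target's
 * own error handling -- 3ware retries plus MD RAID5 failing the bad disk --
 * time to finish before the Qlogic initiator escalates to aborts and resets.
 *
 * Target side: lowering SD_MAX_RETRIES (e.g. to 2) makes a dying 3ware-backed
 * disk fail faster, so MD can kick the drive and still answer the initiator
 * within the initiator's unmodified timeout.
 */

Both are plain #defines, so the kernels on either side have to be rebuilt
for the change to take effect.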