From: Vladislav Bolkhovitin <vst@vlnb.net>
To: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: scst-devel@lists.sourceforge.net, greg@enjellic.com,
	linux-driver@qlogic.com, neilb@suse.de,
	linux-raid@vger.kernel.org, linuxraid@amcc.com,
	linux-scsi@vger.kernel.org
Subject: Re: Who do we point to?
Date: Thu, 21 Aug 2008 16:17:56 +0400
Message-ID: <48AD5CF4.9060407@vlnb.net>
In-Reply-To: <200808210306.39959.stf_xl@wp.pl>

Stanislaw Gruszka wrote:
>> Apologies for the large broadcast domain on this.  I wanted to make
>> sure everyone who may have an interest in this is involved.
>>
>> Some feedback on another issue we encountered with Linux in a
>> production initiator/target environment with SCST.  I'm including logs
>> below from three separate systems involved in the incident.  I've gone
>> through them with my team and we are currently unsure on what
>> triggered all this, hence mail to everyone who may be involved.
>>
>> The system involved is SCST 1.0.0.0 running on a Linux 2.6.24.7 target
>> platform using the qla_isp driver module.  The target machine has two
>> 9650 eight port 3Ware controller cards driving a total of 16 750
>> gigabyte Seagate NearLine drives.  Firmware on the 3ware and Qlogic
>> cards should all be current.  There are two identical servers in two
>> geographically separated data-centers.
>>
>> The drives on each platform are broken into four 3+1 RAID5 devices
>> with software RAID.  Each RAID5 volume is a physical volume for an LVM
>> volume group. There is currently one logical volume exported from each
>> of four RAID5 volumes as a target device.  A total of four initiators
>> are thus accessing the target server, each accessing different RAID5
>> volumes.
>>
>> The initiators are running a stock 2.6.26.2 kernel with a RHEL5
>> userspace.  Access to the SAN is via a 2462 dual-port Qlogic card.
>> The initiators see a block device from each of the two target servers
>> through separate ports/paths.  The block devices form a software RAID1
>> device (with bitmaps) which is the physical volume for an LVM volume
>> group.  The production filesystem is supported by a single logical
>> volume allocated from that volume group.
>>
>> A drive failure occurred last Sunday afternoon on one of the RAID5
>> volumes.  The target kernel recognized the failure, failed the device
>> and kept going.
>>
>> Unfortunately three of the four initiators picked up a device failure
>> which caused the SCST exported volume to be faulted out of the RAID1
>> device.  One of the initiators noted an incident was occurring, issued
>> a target reset and continued forward with no issues.
>>
>> The initiator which got things 'right' was not accessing the RAID5
>> volume on the target which experienced the error.  Two of the three
>> initiators which faulted out their volumes were likewise not accessing
>> the compromised RAID5 volume.  The remaining initiator, which was
>> accessing the compromised volume, also faulted out its device.
> For some reason the SCST core needs to wait for the logical unit driver
> (aka dev handler) when aborting a command. It is not possible to abort
> a command instantly, i.e. mark the command as aborted, return task
> management success to the initiator, and only free the aborted
> command's resources once the logical unit driver finishes (I don't know
> why; maybe Vlad could tell more about this).

That's a SAM requirement. Otherwise, if we completed TM functions
"instantly", without waiting for all affected commands to finish, an
aborted command could be executed in one more retry *after* the next
command the initiator issued following the reset had already completed.
The initiator would believe the aborted commands were already dead, and
such reordering could kill journaled filesystems.
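
To make that ordering requirement concrete, here is a minimal userspace
sketch of the rule; the names (struct cmd, tm_abort_all,
dev_handler_finish) are illustrative stand-ins, not SCST's actual API:

#include <stdio.h>

enum cmd_state { CMD_RUNNING, CMD_ABORTED, CMD_DONE };

struct cmd {
    int tag;
    enum cmd_state state;
};

/* Stand-in for the dev handler finishing off one aborted command. */
static void dev_handler_finish(struct cmd *c)
{
    c->state = CMD_DONE;
}

static void tm_abort_all(struct cmd *cmds, int n)
{
    int i;

    /* Step 1: mark every outstanding command as aborted... */
    for (i = 0; i < n; i++)
        if (cmds[i].state == CMD_RUNNING)
            cmds[i].state = CMD_ABORTED;

    /* Step 2: ...but do NOT answer the TM function yet.  Block until
     * the dev handler has truly finished each aborted command;
     * answering early is what would let a "dead" command touch the
     * medium after the initiator has already re-issued I/O. */
    for (i = 0; i < n; i++)
        while (cmds[i].state != CMD_DONE)
            dev_handler_finish(&cmds[i]);

    printf("TM complete: now safe to answer the initiator\n");
}

int main(void)
{
    struct cmd cmds[2] = { { 1, CMD_RUNNING }, { 2, CMD_RUNNING } };

    tm_abort_all(cmds, 2);
    return 0;
}

In SCST the second step is a real wait on completions from the dev
handler rather than a busy loop; the point here is only the ordering.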

> The Qlogic initiator device just waits for the 3ware card to abort the
> commands. Since both systems run the same SCSI stack, they use the same
> command timeouts: the 3ware driver returns an error to RAID5 at roughly
> the same time the Qlogic initiator times out. That is why Qlogic
> sometimes sends only a device reset and sometimes a target reset as
> well.
> 
> I believe increasing the timeouts in the sd driver on the initiator side
> (and maybe decreasing them on the target system) will help. These values
> are not runtime configurable, only compile time. On the initiator
> systems I suggest increasing SD_TIMEOUT, and on the target side maybe
> decreasing SD_MAX_RETRIES; both values are in drivers/scsi/sd.h (see
> the sketch below this quote). In such a configuration, when a physical
> disk fails, 3ware will return its error while the initiator is still
> waiting for the command to complete, RAID5 on the target will do the
> right job, and from the initiator's point of view the command will
> finish successfully.
> 
> Cheers
> Stanislaw Gruszka
> 
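
To make the suggested tuning concrete: in 2.6.2x-era kernels the
relevant constants in drivers/scsi/sd.h looked roughly as follows
(verify the stock values against your own tree; the adjusted numbers
in the comment are illustrative, not tested recommendations):

/* drivers/scsi/sd.h -- approximate stock values of the era */
#define SD_TIMEOUT      (30 * HZ)   /* per-command timeout               */
#define SD_MAX_RETRIES  5           /* retries before the command fails  */

/* Direction of the change Stanislaw suggests (numbers illustrative):
 *
 * initiator side -- raise the timeout so the target's 3ware error
 * handling wins the race:
 *      #define SD_TIMEOUT      (120 * HZ)
 *
 * target side -- cut retries so the 3ware failure surfaces to MD
 * RAID5 sooner:
 *      #define SD_MAX_RETRIES  2
 */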


Thread overview: 6+ messages
2008-08-20 19:11 Who do we point to? greg
2008-08-21  1:06 ` [Scst-devel] " Stanislaw Gruszka
2008-08-21 12:17   ` Vladislav Bolkhovitin [this message]
2008-08-21 12:14 ` Vladislav Bolkhovitin
2008-08-21 14:32   ` James Bottomley
2008-08-27 18:17     ` Vladislav Bolkhovitin
