public inbox for linux-scsi@vger.kernel.org
From: Vladislav Bolkhovitin <vst@vlnb.net>
To: greg@enjellic.com
Cc: linux-driver@qlogic.com, scst-devel@lists.sourceforge.net,
	linux-scsi@vger.kernel.org
Subject: Re: [Scst-devel] Poisoning of Linux initiators on SCST reboot.
Date: Fri, 25 Jul 2008 14:43:56 +0400
Message-ID: <4889AE6C.40900@vlnb.net>
In-Reply-To: <4889AD80.30000@vlnb.net>

Vladislav Bolkhovitin wrote:
> greg@enjellic.com wrote:
>> Good morning to everyone, hope your respective days are going well.
>> Sorry for the wide cast on this but I wanted to get what would seem to
>> be the concerned parties on this issue in the loop.
>>
>> We have been putting SCST through an extensive round of pre-production
>> testing.  I wanted to start following up on some of the issues we have
>> noted.
>>
>> We will be putting SCST into service to support mirrored storage from
>> client initiators to two separate data-centers.  The filesystems on
>> the client initiators access storage at the two data-centers via a
>> Linux MD RAID1 device.  The SAN architecture is based on Cisco
>> MDS-9509 switches.
>>
>> Just as an aside for people considering use of SCST.  The core engine
>> has been rock solid.  Our testing rounds consist of driving around
>> 1/8 of a petabyte of widely disparate I/O types from multiple
>> initiators to a pair of targets in the two data-centers.  SCST hasn't
>> missed a beat so far, so kudos to Vlad and everyone involved in its
>> development.
> 
> Thanks! Your support is very much appreciated and exactly on time, 
> because I'm going to submit SCST patches for review and inclusion into 
> the kernel next week.
> 
>> As we began forced failure testing one issue has come up that I wanted
>> to advise people of.  A hard reboot of an SCST target server results
>> in the 'poisoning' of Linux based initiators.  We verified the issue
>> as being present on client initiators running the stock RHEL5 kernel
>> up through 2.6.26.
>>
>> The targets are using Qlogic 2462 cards using the isp_mod driver.  The
>> client initiators are using Qlogic 2342 cards with the qla2xxx driver.
>>
>> Failure mode is as follows:
>>
>>         1.) Configure SCST based storage for an initiator (vdisk
>>             based).
>>
>>         2.) Activate initiator.  Initiator logs into fabric and
>>             discovers SCST based storage.
>>
>>         3.) Force SCST target failure by rebooting or pulling power.
>>
>>         4.) SCST target returns to service and logs into zone.
>>
>>         5.) Initiator picks up RSCN but re-activates the rport for
>>             SCST server as an INITIATOR rather than TARGET role.
>>
>> After this point in time the initiator is effectively 'poisoned'.
>>
>> Nothing short of unloading and reloading the Qlogic 2xxx driver on the
>> client initiator will allow the initiator to recognize the SCST server
>> as a target device.  A driver unload/reload of course is not an option
>> to restore connectivity since it would take the remaining live side of
>> the mirror off-line as well.
>>
>> We finally figured out what seems to be happening by watching the logs
>> on the client and comparing what was going on there to the FLOGI login
>> status on the fabric.
>>
>> When the SCST target server reboots the initiator times out the remote
>> port and places it into 'unknown' state.  The qla2xxx driver,
>> according to the source code, maintains the previous rport state in
>> driver internal data.
>>
>> The 2462 card in the target on boot logs into the fabric with an
>> initiator role, I'm assuming in support of BIOS based SAN booting. The
>> client initiator picks up on this and re-activates the rport as being
>> in an INITIATOR role.
> 
> You should be able to switch off this behavior by disabling SAN booting 
> in the card's BIOS.
> 
>> Loading the isp_mod driver causes the 2462 card in the target to be
>> shutdown.  The client initiator picks up on this and times out the
>> rport retaining the last rport state as INITIATOR.
>>
>> Enabling target mode on the 2462 causes it to log back into the
>> fabric.  The client initiator picks up on the RSCN but refuses to
>> transition the rport from INITIATOR to TARGET state.  Without going
>> into TARGET state the remote port won't have SCSI device discovery
>> initiated against it and hence the SCST based storage is inaccessible.
>>
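The sequence described above can be replayed in a small user-space model. This is only a sketch under stated assumptions: the struct, the function names and the "sticky" rule are hypothetical and written for illustration, they are not taken from qla2xxx or scsi_transport_fc. The role bit values are modeled on the FC_PORT_ROLE_* constants and are assumed here. The model starts at the point where the initiator has already timed out the rebooted target.

```c
#include <stdint.h>

/* Role bits modeled on FC_PORT_ROLE_*; the values are assumptions. */
#define ROLE_UNKNOWN       0x00u
#define ROLE_FCP_TARGET    0x01u
#define ROLE_FCP_INITIATOR 0x02u

/* Hypothetical cache entry the initiator keeps for one remote port. */
struct cached_rport {
    int online;
    uint32_t roles;   /* last roles recorded for this remote port */
};

/* Remote port drops off the fabric: mark offline, keep the last roles. */
static void port_lost(struct cached_rport *rp)
{
    rp->online = 0;
}

/* Simplified rule matching the reported behavior: a role that is
 * already cached is not replaced by a later login. */
static void login_sticky(struct cached_rport *rp, uint32_t reported)
{
    if (rp->roles == ROLE_UNKNOWN)
        rp->roles = reported;
    rp->online = 1;
}

/* Alternative rule: always take the roles reported at login. */
static void login_refresh(struct cached_rport *rp, uint32_t reported)
{
    rp->roles = reported;
    rp->online = 1;
}

/* Replay the target reboot and return the roles the initiator ends
 * up with for the SCST server's port. */
static uint32_t replay_target_reboot(
        void (*login)(struct cached_rport *, uint32_t))
{
    struct cached_rport rp = { 0, ROLE_UNKNOWN }; /* timed out after reboot */

    login(&rp, ROLE_FCP_INITIATOR);  /* card logs in at boot (BIOS) */
    port_lost(&rp);                  /* isp_mod load shuts the card down */
    login(&rp, ROLE_FCP_TARGET);     /* target mode enabled, new login */
    return rp.roles;
}
```

With the sticky rule, replay_target_reboot(login_sticky) leaves the port in the INITIATOR role, which is the poisoned state reported above. With login_refresh the port ends up in the TARGET role.
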
>> Activating a LIP on the client initiates a new fabric login attempt
>> which completes with the following message:
>>
>> Jul 24 02:53:59 init-test kernel: rport-2:0-0: blocked FC remote port
>> time out: no longer a FCP target, removing starget
>>
>> Which from a review of the source code seems consistent with our
>> analysis of the problem.
>>
>> The culprit is the following code from drivers/scsi/scsi_transport_fc.c:
>>
>>         if ((rport->port_state == FC_PORTSTATE_ONLINE) &&
>>             (rport->scsi_target_id != -1) &&
>>             !(rport->roles & FC_PORT_ROLE_FCP_TARGET)) {
>>                 dev_printk(KERN_ERR, &rport->dev,
>>                         "blocked FC remote port time out: no longer"
>>                         " a FCP target, removing starget\n");
>>                 spin_unlock_irqrestore(shost->host_lock, flags);
>>                 scsi_target_unblock(&rport->dev);
>>                 fc_queue_work(shost, &rport->stgt_delete_work);
>>                 return;
>>         }
>>
>> The above gets executed in response to the LIP on the initiator.  The
>> value in rport->roles is being populated with the remote port's
>> previous INITIATOR role rather than its current TARGET role.
>>
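The effect of the quoted condition can be checked in isolation with a user-space stand-in. This is a minimal sketch: struct rport_model and would_remove_starget() are hypothetical names that only mirror the three fields the condition reads, and the role bit values are assumptions modeled on FC_PORT_ROLE_*.

```c
#include <stdbool.h>
#include <stdint.h>

/* Role bits modeled on FC_PORT_ROLE_*; the values are assumptions. */
#define ROLE_FCP_TARGET    0x01u
#define ROLE_FCP_INITIATOR 0x02u

enum port_state { PORT_ONLINE, PORT_BLOCKED };

/* Hypothetical stand-in for the fields of struct fc_rport that the
 * quoted condition reads. */
struct rport_model {
    enum port_state port_state;
    int scsi_target_id;   /* -1 means no SCSI target was ever bound */
    uint32_t roles;       /* roles last recorded for the remote port */
};

/* Same three-part test as the quoted kernel condition: the port is
 * online, a SCSI target id was bound, and the TARGET role bit is
 * missing. Returns true when the starget would be removed. */
static bool would_remove_starget(const struct rport_model *rp)
{
    return rp->port_state == PORT_ONLINE &&
           rp->scsi_target_id != -1 &&
           !(rp->roles & ROLE_FCP_TARGET);
}
```

An online rport with a bound target id and roles set to ROLE_FCP_INITIATOR makes the function return true, so a stale INITIATOR role is enough to take the removal path. The same rport with ROLE_FCP_TARGET set does not.
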
>> Windows client initiators running against the SCST targets get the
>> transition and login sequence correct.  When the SCST target is
>> re-activated after the cold boot those clients immediately re-discover
>> their storage while the Linux clients issue error messages about loss
>> of the remote target.
>>
>> While all this doesn't seem to be technically a bug with SCST it
>> certainly is a problematic usage scenario.  It may also explain why
>> some individuals may have had problems getting SCST clients to access
>> their storage.
>>
>> If a test SCST server was plugged into an active zone and turned on it
>> would immediately poison any Linux clients.  No amount of proper
>> configuration on the target would allow the client to access storage
>> until the client was rebooted or its drivers reloaded.
>>
>> Any suggestions on how to move forward would be appreciated.  We've
>> got a pretty extensive test environment and would be happy to test run
>> any suggested changes or patches.
> 
> I've also seen the Linux Qlogic qla2xxx driver "lose" remote ports many 
> times. But that was from the target side, and I wasn't able to pin down 
> an exact test case for it. Plus, we found a workaround suitable for the 
> target: using the INITIATOR PORT NAME field in the ATIO IOCB for the 
> lost ports.
> 
> So the qla2xxx driver definitely has problem(s) in this area. The fact 
> that Windows works well in this scenario is further evidence of that. 
> But I'm afraid the only way for you to deal with it is to fix the 
> qla2xxx driver itself.

Sorry, I meant "yourself".

> My experience from contacts with Andrew Vasquez, the driver's 
> maintainer, is that you need something more compelling than problems 
> with some home-brewed target to get him interested. Otherwise your 
> questions will simply be ignored.
> 
>> Once again a thank you to everyone who has contributed to SCST
>> development.  Other than this and a few additional glitches, which I
>> will follow up on in additional e-mails, it is presenting itself as a
>> very solid platform for storage delivery.
>>
>> Best wishes for a pleasant weekend to everyone.
>>
>> As always,
>> Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
>> 4206 N. 19th Ave.           Specializing in information infra-structure
>> Fargo, ND  58102            development.
>> PH: 701-281-1686
>> FAX: 701-281-3949           EMAIL: greg@enjellic.com
>> ------------------------------------------------------------------------------
>> "Much work remains to be done before we can announce our total failure
>>  to make any progress."
>>                                 -- Mike Kelly
>>
>> -------------------------------------------------------------------------
>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
>> Build the coolest Linux based applications with Moblin SDK & win great prizes
>> Grand prize is a trip for two to an Open Source event anywhere in the world
>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> _______________________________________________
>> Scst-devel mailing list
>> Scst-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scst-devel
>>
> 
> 
> 



Thread overview: 5+ messages
2008-07-24 17:50 Poisoning of Linux initiators on SCST reboot greg
2008-07-25 10:40 ` [Scst-devel] " Vladislav Bolkhovitin
2008-07-25 10:43   ` Vladislav Bolkhovitin [this message]
2008-07-25 13:11 ` Stanislaw Gruszka
2008-07-25 13:45 ` Andrew Vasquez
