From: Vladislav Bolkhovitin <vst@vlnb.net>
To: linux-scsi@vger.kernel.org
Cc: linux-driver@qlogic.com
Subject: Problem handling task management functions in qla2xxx
Date: Tue, 22 Aug 2006 18:25:28 +0400
Message-ID: <44EB13D8.6010607@vlnb.net>

Hello,

If a task management function is issued (e.g. with the sg_reset utility,
which is the easiest way) during active I/O to a qla2xxx device
(ISP2422), it often fails with messages like:

------------------------------------------------------------------

qla2xxx 0000:04:02.0: scsi(13:0:1): DEVICE RESET ISSUED.
qla2xxx 0000:04:02.0: qla2xxx_eh_device_reset: failed while waiting for commands

------------------------------------------------------------------

This can break the SCSI mid-level's error recovery and cause the
device(s) to be taken offline erroneously, even though they are
actually healthy.

I investigated and found that the driver waits for some time for the
firmware to finish aborting the outstanding commands with CS_ABORTED
status. If at least one command has not finished by the timeout,
FAILED is returned.

The problem is how the wait is implemented. Here is the code:

------------------------------------------------------------------

static int
qla2x00_eh_wait_on_command(scsi_qla_host_t *ha, struct scsi_cmnd *cmd)
{
#define ABORT_POLLING_PERIOD    1000
#define ABORT_WAIT_ITER         ((10 * 1000) / (ABORT_POLLING_PERIOD))
         unsigned long wait_iter = ABORT_WAIT_ITER;
         int ret = QLA_SUCCESS;

         while (CMD_SP(cmd)) {
                 msleep(ABORT_POLLING_PERIOD);

                 if (--wait_iter)
                         break;
         }
         if (CMD_SP(cmd))
                 ret = QLA_FUNCTION_FAILED;

         return ret;
}

------------------------------------------------------------------

Here CMD_SP() is defined as:
#define CMD_SP(Cmnd)            ((Cmnd)->SCp.ptr)

SCp.ptr is set to NULL just before cmd->scsi_done() is called.
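
A side note on the loop as quoted above, going only by the code shown
here: wait_iter starts at 10 (10 * 1000 / 1000), so "--wait_iter" is
non-zero on the first pass and the loop appears to break after a single
msleep() rather than after ten. The user-space sketch below just counts
the sleeps for a command that never completes. The function names and
the "pending" flag are made up for the illustration, and the negated
test is only one possible way to get the full wait, not a proposed
patch.

```c
#include <assert.h>

#define ABORT_POLLING_PERIOD    1000
#define ABORT_WAIT_ITER         ((10 * 1000) / (ABORT_POLLING_PERIOD))

/* Counts the sleeps taken with the exit test exactly as quoted. */
static unsigned long sleeps_as_quoted(void)
{
	unsigned long wait_iter = ABORT_WAIT_ITER;
	unsigned long sleeps = 0;
	int pending = 1;	/* stands in for CMD_SP(cmd) staying non-NULL */

	assert(wait_iter > 0);
	while (pending) {
		sleeps++;	/* stands in for msleep(ABORT_POLLING_PERIOD) */

		if (--wait_iter)	/* non-zero after the first decrement */
			break;
	}
	return sleeps;
}

/* Same loop with the test negated (hypothetical variant). */
static unsigned long sleeps_with_negated_test(void)
{
	unsigned long wait_iter = ABORT_WAIT_ITER;
	unsigned long sleeps = 0;
	int pending = 1;

	assert(wait_iter > 0);
	while (pending) {
		sleeps++;

		if (!--wait_iter)	/* leave only when the budget is used up */
			break;
	}
	return sleeps;
}
```

In this sketch the first function sleeps once and the second sleeps
ABORT_WAIT_ITER times, so the effective timeout of the quoted loop
would be about one polling period.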

This way of waiting races with the SCSI mid-level. Once the command
completes, the mid-level can free and reuse it while
qla2x00_eh_wait_on_command() is sleeping in msleep(), so SCp.ptr can
become non-NULL again. The waiter then sees a non-NULL pointer that
belongs to a new request, which can produce the false errors shown above.
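
To make the reuse scenario concrete, here is a minimal single-threaded
user-space sketch. Nothing in it is driver code: struct fake_cmnd,
fake_msleep(), the serial field and both wait functions are
hypothetical names invented for the illustration. fake_msleep() plays
the part of the mid-level completing and reusing the command while the
waiter sleeps. wait_by_pointer() uses the same "is the pointer
non-NULL?" test as the quoted loop, while wait_by_identity() shows one
possible alternative, which is to remember which use of the command is
being waited for. A real fix would also need proper locking, which this
sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

enum { WAIT_OK = 0, WAIT_FAILED = 1 };

/* Hypothetical stand-in for struct scsi_cmnd. */
struct fake_cmnd {
	void *sp;		/* plays the role of SCp.ptr */
	unsigned long serial;	/* assumed per-use identity, bumped on reuse */
	int reuse_pending;	/* simulation knob: reuse during next sleep */
};

/* Stands in for msleep(): the command may complete and be reused here. */
static void fake_msleep(struct fake_cmnd *cmd)
{
	static int new_request;

	if (!cmd->reuse_pending)
		return;
	cmd->reuse_pending = 0;
	cmd->sp = NULL;		/* cleared just before scsi_done() */
	cmd->serial++;		/* same memory, new request */
	cmd->sp = &new_request;	/* non-NULL again for the new request */
}

/* Same kind of test as the quoted loop: only checks for non-NULL. */
static int wait_by_pointer(struct fake_cmnd *cmd, int iters)
{
	assert(cmd != NULL);
	while (cmd->sp && iters-- > 0)
		fake_msleep(cmd);
	return cmd->sp ? WAIT_FAILED : WAIT_OK;
}

/* Hypothetical alternative: also check that it is still the same use. */
static int wait_by_identity(struct fake_cmnd *cmd, int iters)
{
	unsigned long serial;

	assert(cmd != NULL);
	serial = cmd->serial;
	while (cmd->sp && cmd->serial == serial && iters-- > 0)
		fake_msleep(cmd);
	return (cmd->sp && cmd->serial == serial) ? WAIT_FAILED : WAIT_OK;
}
```

With reuse_pending set to 1, wait_by_pointer() returns WAIT_FAILED
although the original request did complete, and wait_by_identity()
returns WAIT_OK. With reuse_pending set to 0 and sp left non-NULL, both
return WAIT_FAILED, which is the genuine timeout case.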

Regards,
Vlad

