* Problem handling task management functions in qla2xxx
@ 2006-08-22 14:25 Vladislav Bolkhovitin
From: Vladislav Bolkhovitin @ 2006-08-22 14:25 UTC
  To: linux-scsi; +Cc: linux-driver

Hello,

If a task management function is issued, e.g. with the sg_reset utility
(the easiest way), during active I/O to a qla2xxx device (ISP2422), it
often fails with messages like:

------------------------------------------------------------------

qla2xxx 0000:04:02.0: scsi(13:0:1): DEVICE RESET ISSUED.
qla2xxx 0000:04:02.0: qla2xxx_eh_device_reset: failed while waiting for commands

------------------------------------------------------------------

This can break the SCSI mid-level's error recovery and cause devices to
be taken offline erroneously when they are actually healthy.

I investigated and found that the driver waits a limited time for the
firmware to finish aborting the outstanding commands with CS_ABORTED
status; if at least one command has not finished when the timeout
expires, FAILED is returned.

The problem is how the wait is implemented. Here is the code:

------------------------------------------------------------------

static int
qla2x00_eh_wait_on_command(scsi_qla_host_t *ha, struct scsi_cmnd *cmd)
{
#define ABORT_POLLING_PERIOD    1000
#define ABORT_WAIT_ITER         ((10 * 1000) / (ABORT_POLLING_PERIOD))
         unsigned long wait_iter = ABORT_WAIT_ITER;
         int ret = QLA_SUCCESS;

         while (CMD_SP(cmd)) {
                 msleep(ABORT_POLLING_PERIOD);

                 if (--wait_iter)
                         break;
         }
         if (CMD_SP(cmd))
                 ret = QLA_FUNCTION_FAILED;

         return ret;
}

------------------------------------------------------------------
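As a side observation on the loop exactly as quoted above: since
wait_iter starts at 10, "--wait_iter" is non-zero on the first pass, so
the loop appears to leave after a single msleep(). Below is a minimal
userspace sketch that only counts the sleeps for a command that never
completes. It is not driver code; the function names and the "pending"
flag are made up for illustration, and the second variant is just one
possible reading of the intended exit condition.

```c
#define ABORT_POLLING_PERIOD    1000
#define ABORT_WAIT_ITER         ((10 * 1000) / (ABORT_POLLING_PERIOD))

/* Loop shaped like the quoted one; "pending" stands in for CMD_SP(cmd). */
unsigned long sleeps_as_quoted(void)
{
        unsigned long wait_iter = ABORT_WAIT_ITER;
        unsigned long sleeps = 0;
        int pending = 1;        /* assumption: the command never completes */

        while (pending) {
                sleeps++;       /* where msleep(ABORT_POLLING_PERIOD) would be */

                if (--wait_iter)        /* 9 is non-zero: leaves on the first pass */
                        break;
        }
        return sleeps;
}

/* Hypothetical variant: leave only once the iteration budget is used up. */
unsigned long sleeps_with_inverted_test(void)
{
        unsigned long wait_iter = ABORT_WAIT_ITER;
        unsigned long sleeps = 0;
        int pending = 1;

        while (pending) {
                sleeps++;

                if (--wait_iter == 0)
                        break;
        }
        return sleeps;
}
```

With these definitions the first function sleeps once (about 1 second)
and the second sleeps ten times (about 10 seconds), which is what
ABORT_WAIT_ITER suggests was intended.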

Where CMD_SP() is defined as
#define CMD_SP(Cmnd)            ((Cmnd)->SCp.ptr)

SCp.ptr is set to NULL just before cmd->scsi_done() is called.

This way of waiting races with the SCSI mid-level: after scsi_done() the
mid-level can free and reuse the command while
qla2x00_eh_wait_on_command() is sleeping in msleep(). SCp.ptr can then
become non-NULL again for the new request, so the check after the sleep
sees a pending command and reports the false errors shown above.
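
To make the race concrete, here is a small single-threaded userspace
sketch that replays the problematic ordering by hand. It is not driver
code: struct fake_cmd, the "generation" counter and all function names
are hypothetical stand-ins, and the generation check is only one
possible way to tell a reused command from the original one.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct scsi_cmnd; only what the sketch needs. */
struct fake_cmd {
        void *sp;                       /* plays the role of SCp.ptr / CMD_SP() */
        unsigned long generation;       /* hypothetical: bumped on every reuse */
};

/* Completion path: pointer is cleared just before scsi_done() would run. */
static void fake_complete(struct fake_cmd *cmd)
{
        cmd->sp = NULL;
}

/* Mid-level reuses the same memory for a new request. */
static void fake_reuse(struct fake_cmd *cmd, void *new_sp)
{
        cmd->generation++;
        cmd->sp = new_sp;
}

/* Check as in the quoted function: any non-NULL pointer means "pending". */
static int naive_still_pending(const struct fake_cmd *cmd)
{
        return cmd->sp != NULL;
}

/* Alternative: pending only if this is still the incarnation we waited on. */
static int checked_still_pending(const struct fake_cmd *cmd, unsigned long gen)
{
        return cmd->generation == gen && cmd->sp != NULL;
}

/* Replays: snapshot, sleep, completion, reuse, wake-up, check. */
void simulate_race(int *naive, int *checked)
{
        int old_sp, new_sp;             /* only their addresses are used */
        struct fake_cmd cmd = { &old_sp, 1 };
        unsigned long gen = cmd.generation;     /* taken before sleeping */

        /* ... the waiter would be inside msleep() here ... */
        fake_complete(&cmd);            /* original command finishes */
        fake_reuse(&cmd, &new_sp);      /* same memory, new request */
        /* ... the waiter wakes up ... */

        *naive = naive_still_pending(&cmd);
        *checked = checked_still_pending(&cmd, gen);
}
```

In this ordering the pointer-only check reports the command as still
pending (the false failure), while the check that also compares the
snapshot taken before sleeping reports it as done.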

Regards,
Vlad

