[PATCH] mvsas: fix default can

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mvsas: fix default can_queue
@ 2008-02-29 18:01 James Bottomley
  2008-03-03  0:42 ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-02-29 18:01 UTC (permalink / raw)
  To: Ke Wei; +Cc: linux-scsi

I noticed that the current marvell sas driver wasn't performing very
well.  It turns out that it's setting can_queue not in the SCSI host,
but in its own internal data structure, meaning it's always operating
with a global queue depth of one.  This patch raises it to what the code
seemed to be intending ... although I think can_queue should be
MVS_CHIP_SLOT_SZ - 1 (without the divide by two)?

The good news is that with this change, I'm getting a respectable
throughput on the fio hammer test; plus zapping random phy resets across
the disk triggers error handler recovery correctly (so far).

I'm having less happy results with a SATAPI DVD ... it looks like the
initial IDENTIFY goes across just fine, but that we stall on the other
SCSI commands ... I'm still investigating this one.

James

---

diff --git a/drivers/scsi/mvsas.c b/drivers/scsi/mvsas.c
index 5ec0665..a128f67 100755
--- a/drivers/scsi/mvsas.c
+++ b/drivers/scsi/mvsas.c
@@ -618,7 +618,6 @@ struct mvs_info {
 	struct mvs_phy		phy[MVS_MAX_PHYS];
 	struct mvs_port		port[MVS_MAX_PHYS];

-	u32			can_queue;	/* per adapter */
 	u32			tag_out;	/*Get*/
 	u32			tag_in;		/*Give*/
 };
@@ -2150,7 +2149,7 @@ static struct mvs_info *__devinit mvs_alloc(struct pci_dev *pdev,
 	mvi->sas.num_phys = chip->n_phy;
 	mvi->sas.lldd_max_execute_num = MVS_CHIP_SLOT_SZ - 1;
 	mvi->sas.lldd_queue_size = MVS_QUEUE_SIZE;
-	mvi->can_queue = (MVS_CHIP_SLOT_SZ >> 1) - 1;
+	mvi->shost->can_queue = (MVS_CHIP_SLOT_SZ >> 1) - 1;
 	mvi->sas.lldd_ha = mvi;
 	mvi->sas.core.shost = mvi->shost;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-02-29 18:01 [PATCH] mvsas: fix default can_queue James Bottomley
@ 2008-03-03  0:42 ` James Bottomley
  2008-03-03  8:17   ` Ke Wei
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-03-03  0:42 UTC (permalink / raw)
  To: Ke Wei; +Cc: linux-scsi


On Fri, 2008-02-29 at 12:01 -0600, James Bottomley wrote:
> I noticed that the current marvell sas driver wasn't performing very
> well.  It turns out that it's setting can_queue not in the SCSI host,
> but in its own internal data structure, meaning it's always operating
> with a global queue depth of one.  This patch raises it to what the code
> seemed to be intending ... although I think can_queue should be
> MVS_CHIP_SLOT_SZ - 1 (without the divide by two)?
> 
> The good news is that with this change, I'm getting a respectable
> throughput on the fio hammer test; plus zapping random phy resets across
> the disk triggers error handler recovery correctly (so far).
> 
> I'm having less happy results with a SATAPI DVD ... it looks like the
> initial IDENTIFY goes across just fine, but that we stall on the other
> SCSI commands ... I'm still investigating this one.

Actually, I've run into another problem with this patch applied.  It
looks like NCQ fails with ATA disks.  What I see is that I/O goes fine
until I get more than one command outstanding to the device, then the
device stops responding.  I can keep the I/O flowing if I clamp the
device queue depth at 1.  SAS disks seem to be fine ... I can get
multiple outstanding commands to them correctly serviced.

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-03  0:42 ` James Bottomley
@ 2008-03-03  8:17   ` Ke Wei
  2008-03-03 14:59     ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Ke Wei @ 2008-03-03  8:17 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi

On Mon, Mar 3, 2008 at 8:42 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Fri, 2008-02-29 at 12:01 -0600, James Bottomley wrote:
> > I noticed that the current marvell sas driver wasn't performing very
> > well.  It turns out that it's setting can_queue not in the SCSI host,
> > but in its own internal data structure, meaning it's always operating
> > with a global queue depth of one.  This patch raises it to what the code
> > seemed to be intending ... although I think can_queue should be
> > MVS_CHIP_SLOT_SZ - 1 (without the divide by two)?
> >
> > The good news is that with this change, I'm getting a respectable
> > throughput on the fio hammer test; plus zapping random phy resets across
> > the disk triggers error handler recovery correctly (so far).
> >
> > I'm having less happy results with a SATAPI DVD ... it looks like the
> > initial IDENTIFY goes across just fine, but that we stall on the other
> > SCSI commands ... I'm still investigating this one.
>
> Actually, I've run into another problem with this patch applied.  It
> looks like NCQ fails with ATA disks.  What I see is that I/O goes fine
> until I get more than one command outstanding to the device, then the
> device stops responding.  I can keep the I/O flowing if I clamp the
> device queue depth at 1.  SAS disks seem to be fine ... I can get
> multiple outstanding commands to them correctly serviced.

Yes, I have to say that testing failed when I plugged SATA and SAS
disk. Sometimes "insmod mvsas" will cause the system to hang.
Only look good if can_queue is set to 1.  I will investigate this case.

> I'm having less happy results with a SATAPI DVD ... it looks like the
> initial IDENTIFY goes across just fine, but that we stall on the other
> SCSI commands ... I'm still investigating this one.

I think we need set BLIST_NOREPORTLUN or some other flags (see
scsi_devinfo.h) about  new some ATAPI device.When calling
scsi_report_lun_scan , it will bypass REPORT_LUNS command.

-- 
Best Regards,
Ke Wei

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-03  8:17   ` Ke Wei
@ 2008-03-03 14:59     ` James Bottomley
  2008-03-03 16:40       ` James Bottomley
  2008-03-05  2:07       ` James Bottomley
  0 siblings, 2 replies; 14+ messages in thread
From: James Bottomley @ 2008-03-03 14:59 UTC (permalink / raw)
  To: Ke Wei; +Cc: linux-scsi

On Mon, 2008-03-03 at 16:17 +0800, Ke Wei wrote:
> On Mon, Mar 3, 2008 at 8:42 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> >
> > On Fri, 2008-02-29 at 12:01 -0600, James Bottomley wrote:
> > > I noticed that the current marvell sas driver wasn't performing very
> > > well.  It turns out that it's setting can_queue not in the SCSI host,
> > > but in its own internal data structure, meaning it's always operating
> > > with a global queue depth of one.  This patch raises it to what the code
> > > seemed to be intending ... although I think can_queue should be
> > > MVS_CHIP_SLOT_SZ - 1 (without the divide by two)?
> > >
> > > The good news is that with this change, I'm getting a respectable
> > > throughput on the fio hammer test; plus zapping random phy resets across
> > > the disk triggers error handler recovery correctly (so far).
> > >
> > > I'm having less happy results with a SATAPI DVD ... it looks like the
> > > initial IDENTIFY goes across just fine, but that we stall on the other
> > > SCSI commands ... I'm still investigating this one.
> >
> > Actually, I've run into another problem with this patch applied.  It
> > looks like NCQ fails with ATA disks.  What I see is that I/O goes fine
> > until I get more than one command outstanding to the device, then the
> > device stops responding.  I can keep the I/O flowing if I clamp the
> > device queue depth at 1.  SAS disks seem to be fine ... I can get
> > multiple outstanding commands to them correctly serviced.
> 
> Yes, I have to say that testing failed when I plugged SATA and SAS
> disk. Sometimes "insmod mvsas" will cause the system to hang.
> Only look good if can_queue is set to 1.  I will investigate this case.

Thanks.  For the NCQ case, it does look like turning NCQ off makes the
disk work fine, so I'd suspect some issue with NCQ handling.

> > I'm having less happy results with a SATAPI DVD ... it looks like the
> > initial IDENTIFY goes across just fine, but that we stall on the other
> > SCSI commands ... I'm still investigating this one.
> 
> I think we need set BLIST_NOREPORTLUN or some other flags (see
> scsi_devinfo.h) about  new some ATAPI device.When calling
> scsi_report_lun_scan , it will bypass REPORT_LUNS command.

It doesn't seem to be anything the DVD does ... it works fine with the
aic94xx controller doing SATAPI (it sends the correct reply to REPORT
LUNS).  It looks like the first hang comes at around the second or third
Test Unit Ready.

Traces seem to show IDENTIFY_PACKET, INQUIRY, INQUIRY, TUR, TUR (hang)
and then every following command hangs, but I'll try to instrument more
accurate tracing.

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-03 14:59     ` James Bottomley
@ 2008-03-03 16:40       ` James Bottomley
  2008-03-05  2:07       ` James Bottomley
  1 sibling, 0 replies; 14+ messages in thread
From: James Bottomley @ 2008-03-03 16:40 UTC (permalink / raw)
  To: Ke Wei; +Cc: linux-scsi

On Mon, 2008-03-03 at 08:59 -0600, James Bottomley wrote:
> On Mon, 2008-03-03 at 16:17 +0800, Ke Wei wrote:
> > On Mon, Mar 3, 2008 at 8:42 AM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> > >
> > > On Fri, 2008-02-29 at 12:01 -0600, James Bottomley wrote:
> > > > I noticed that the current marvell sas driver wasn't performing very
> > > > well.  It turns out that it's setting can_queue not in the SCSI host,
> > > > but in its own internal data structure, meaning it's always operating
> > > > with a global queue depth of one.  This patch raises it to what the code
> > > > seemed to be intending ... although I think can_queue should be
> > > > MVS_CHIP_SLOT_SZ - 1 (without the divide by two)?
> > > >
> > > > The good news is that with this change, I'm getting a respectable
> > > > throughput on the fio hammer test; plus zapping random phy resets across
> > > > the disk triggers error handler recovery correctly (so far).
> > > >
> > > > I'm having less happy results with a SATAPI DVD ... it looks like the
> > > > initial IDENTIFY goes across just fine, but that we stall on the other
> > > > SCSI commands ... I'm still investigating this one.
> > >
> > > Actually, I've run into another problem with this patch applied.  It
> > > looks like NCQ fails with ATA disks.  What I see is that I/O goes fine
> > > until I get more than one command outstanding to the device, then the
> > > device stops responding.  I can keep the I/O flowing if I clamp the
> > > device queue depth at 1.  SAS disks seem to be fine ... I can get
> > > multiple outstanding commands to them correctly serviced.
> > 
> > Yes, I have to say that testing failed when I plugged SATA and SAS
> > disk. Sometimes "insmod mvsas" will cause the system to hang.
> > Only look good if can_queue is set to 1.  I will investigate this case.
> 
> Thanks.  For the NCQ case, it does look like turning NCQ off makes the
> disk work fine, so I'd suspect some issue with NCQ handling.

As a work around until the NCQ issue is fixed, this will allow can_queue
to be raised while still forcing all ATA devices to the non-NCQ case.

Also, can can_queue be set to MVS_CHIP_SLOT_SZ - 1?  It seems like a
reasonable default.

James

---

diff --git a/drivers/scsi/mvsas.c b/drivers/scsi/mvsas.c
index 5ec0665..425231f 100755
--- a/drivers/scsi/mvsas.c
+++ b/drivers/scsi/mvsas.c
@@ -37,6 +37,8 @@
 #include <linux/dma-mapping.h>
 #include <linux/ctype.h>
 #include <scsi/libsas.h>
+#include <scsi/scsi_tcq.h>
+#include <scsi/sas_ata.h>
 #include <asm/io.h>
 
 #define DRV_NAME	"mvsas"
@@ -654,12 +655,31 @@ static const struct mvs_chip_info mvs_chips[] = {
 	[chip_6480] =		{ 8, 32, 10 },
 };
 
+static int mvsas_slave_configure(struct scsi_device *sdev)
+{
+	struct domain_device *dev = sdev_to_domain_dev(sdev);
+	int ret = sas_slave_configure(sdev);
+
+	if (ret)
+		return ret;
+
+	if (dev_is_sata(dev)) {
+		struct ata_port *ap = dev->sata_dev.ap;
+		struct ata_device *adev = ap->link.device;
+
+		/* clamp at no NCQ for the time being */
+		adev->flags |= ATA_DFLAG_NCQ_OFF;
+		scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, 1);
+	}
+	return 0;
+}
+
 static struct scsi_host_template mvs_sht = {
 	.module			= THIS_MODULE,
 	.name			= DRV_NAME,
 	.queuecommand		= sas_queuecommand,
 	.target_alloc		= sas_target_alloc,
-	.slave_configure	= sas_slave_configure,
+	.slave_configure	= mvsas_slave_configure,
 	.slave_destroy		= sas_slave_destroy,
 	.scan_finished		= mvs_scan_finished,
 	.scan_start		= mvs_scan_start,



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-03 14:59     ` James Bottomley
  2008-03-03 16:40       ` James Bottomley
@ 2008-03-05  2:07       ` James Bottomley
  2008-03-05 21:02         ` James Bottomley
  1 sibling, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-03-05  2:07 UTC (permalink / raw)
  To: Ke Wei; +Cc: linux-scsi

On Mon, 2008-03-03 at 08:59 -0600, James Bottomley wrote:
> On Mon, 2008-03-03 at 16:17 +0800, Ke Wei wrote:
> > On Mon, Mar 3, 2008 at 8:42 AM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> > >
> > > On Fri, 2008-02-29 at 12:01 -0600, James Bottomley wrote:
> > > > I noticed that the current marvell sas driver wasn't performing very
> > > > well.  It turns out that it's setting can_queue not in the SCSI host,
> > > > but in its own internal data structure, meaning it's always operating
> > > > with a global queue depth of one.  This patch raises it to what the code
> > > > seemed to be intending ... although I think can_queue should be
> > > > MVS_CHIP_SLOT_SZ - 1 (without the divide by two)?
> > > >
> > > > The good news is that with this change, I'm getting a respectable
> > > > throughput on the fio hammer test; plus zapping random phy resets across
> > > > the disk triggers error handler recovery correctly (so far).
> > > >
> > > > I'm having less happy results with a SATAPI DVD ... it looks like the
> > > > initial IDENTIFY goes across just fine, but that we stall on the other
> > > > SCSI commands ... I'm still investigating this one.
> > >
> > > Actually, I've run into another problem with this patch applied.  It
> > > looks like NCQ fails with ATA disks.  What I see is that I/O goes fine
> > > until I get more than one command outstanding to the device, then the
> > > device stops responding.  I can keep the I/O flowing if I clamp the
> > > device queue depth at 1.  SAS disks seem to be fine ... I can get
> > > multiple outstanding commands to them correctly serviced.
> > 
> > Yes, I have to say that testing failed when I plugged SATA and SAS
> > disk. Sometimes "insmod mvsas" will cause the system to hang.
> > Only look good if can_queue is set to 1.  I will investigate this case.
> 
> Thanks.  For the NCQ case, it does look like turning NCQ off makes the
> disk work fine, so I'd suspect some issue with NCQ handling.
> 
> > > I'm having less happy results with a SATAPI DVD ... it looks like the
> > > initial IDENTIFY goes across just fine, but that we stall on the other
> > > SCSI commands ... I'm still investigating this one.
> > 
> > I think we need set BLIST_NOREPORTLUN or some other flags (see
> > scsi_devinfo.h) about  new some ATAPI device.When calling
> > scsi_report_lun_scan , it will bypass REPORT_LUNS command.
> 
> It doesn't seem to be anything the DVD does ... it works fine with the
> aic94xx controller doing SATAPI (it sends the correct reply to REPORT
> LUNS).  It looks like the first hang comes at around the second or third
> Test Unit Ready.
> 
> Traces seem to show IDENTIFY_PACKET, INQUIRY, INQUIRY, TUR, TUR (hang)
> and then every following command hangs, but I'll try to instrument more
> accurate tracing.

OK, I instrumented more ... you're right, the first failing command is
REPORT_LUNS.  The failure isn't because the DVD doesn't accept the
command, but because it gets errored and we fail to report back the
error data.

What I see is the mvsas driver returning RXQ_ERR, so the device is
trying to terminate the transaction with an error code.  Unfortunately,
when it sees this code, mvsas does nothing at all, leaving the request
to time out and be aborted (even through it already finished).

I can plumb it in ... it looks like we should also be doing is calling
mvs_slot_complete(), but this still isn't quite correct ... it just sets
SAM_STAT_CHECK_COND ... it needs to collect the ATA error code somehow.

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-05  2:07       ` James Bottomley
@ 2008-03-05 21:02         ` James Bottomley
  2008-03-06 14:46           ` Ke Wei
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-03-05 21:02 UTC (permalink / raw)
  To: Ke Wei; +Cc: linux-scsi, jgarzik

On Tue, 2008-03-04 at 20:07 -0600, James Bottomley wrote:
> On Mon, 2008-03-03 at 08:59 -0600, James Bottomley wrote:
> > On Mon, 2008-03-03 at 16:17 +0800, Ke Wei wrote:
> > > On Mon, Mar 3, 2008 at 8:42 AM, James Bottomley
> > > <James.Bottomley@hansenpartnership.com> wrote:
> > > >
> > > > On Fri, 2008-02-29 at 12:01 -0600, James Bottomley wrote:
> > > > > I noticed that the current marvell sas driver wasn't performing very
> > > > > well.  It turns out that it's setting can_queue not in the SCSI host,
> > > > > but in its own internal data structure, meaning it's always operating
> > > > > with a global queue depth of one.  This patch raises it to what the code
> > > > > seemed to be intending ... although I think can_queue should be
> > > > > MVS_CHIP_SLOT_SZ - 1 (without the divide by two)?
> > > > >
> > > > > The good news is that with this change, I'm getting a respectable
> > > > > throughput on the fio hammer test; plus zapping random phy resets across
> > > > > the disk triggers error handler recovery correctly (so far).
> > > > >
> > > > > I'm having less happy results with a SATAPI DVD ... it looks like the
> > > > > initial IDENTIFY goes across just fine, but that we stall on the other
> > > > > SCSI commands ... I'm still investigating this one.
> > > >
> > > > Actually, I've run into another problem with this patch applied.  It
> > > > looks like NCQ fails with ATA disks.  What I see is that I/O goes fine
> > > > until I get more than one command outstanding to the device, then the
> > > > device stops responding.  I can keep the I/O flowing if I clamp the
> > > > device queue depth at 1.  SAS disks seem to be fine ... I can get
> > > > multiple outstanding commands to them correctly serviced.
> > > 
> > > Yes, I have to say that testing failed when I plugged SATA and SAS
> > > disk. Sometimes "insmod mvsas" will cause the system to hang.
> > > Only look good if can_queue is set to 1.  I will investigate this case.
> > 
> > Thanks.  For the NCQ case, it does look like turning NCQ off makes the
> > disk work fine, so I'd suspect some issue with NCQ handling.
> > 
> > > > I'm having less happy results with a SATAPI DVD ... it looks like the
> > > > initial IDENTIFY goes across just fine, but that we stall on the other
> > > > SCSI commands ... I'm still investigating this one.
> > > 
> > > I think we need set BLIST_NOREPORTLUN or some other flags (see
> > > scsi_devinfo.h) about  new some ATAPI device.When calling
> > > scsi_report_lun_scan , it will bypass REPORT_LUNS command.
> > 
> > It doesn't seem to be anything the DVD does ... it works fine with the
> > aic94xx controller doing SATAPI (it sends the correct reply to REPORT
> > LUNS).  It looks like the first hang comes at around the second or third
> > Test Unit Ready.
> > 
> > Traces seem to show IDENTIFY_PACKET, INQUIRY, INQUIRY, TUR, TUR (hang)
> > and then every following command hangs, but I'll try to instrument more
> > accurate tracing.
> 
> OK, I instrumented more ... you're right, the first failing command is
> REPORT_LUNS.  The failure isn't because the DVD doesn't accept the
> command, but because it gets errored and we fail to report back the
> error data.
> 
> What I see is the mvsas driver returning RXQ_ERR, so the device is
> trying to terminate the transaction with an error code.  Unfortunately,
> when it sees this code, mvsas does nothing at all, leaving the request
> to time out and be aborted (even through it already finished).
> 
> I can plumb it in ... it looks like we should also be doing is calling
> mvs_slot_complete(), but this still isn't quite correct ... it just sets
> SAM_STAT_CHECK_COND ... it needs to collect the ATA error code somehow.

Just by way of update, the slot is completing with RXQ_ERR set, but
RXQ_DONE clear.  The mvs_err_info field has TFILE_ERR set (the only set
bit) and MVS_INT_STAT_SRS is zero.

I assume the slot processing has halted, and that we need to collect the
task file error registers and resume it somehow, but how?

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-05 21:02         ` James Bottomley
@ 2008-03-06 14:46           ` Ke Wei
  2008-03-06 15:52             ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Ke Wei @ 2008-03-06 14:46 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-scsi, jgarzik

On Thu, Mar 6, 2008 at 5:02 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:

1.
> >
> > OK, I instrumented more ... you're right, the first failing command is
> > REPORT_LUNS.  The failure isn't because the DVD doesn't accept the
> > command, but because it gets errored and we fail to report back the
> > error data.

First, I noticed that scsi_probe_and_add_lun function has changed
since v2.6.17-git4.
It will cause bflagsp to miss BLIST_NOREPORTLUN flag after calling scsi_add_lun.
On the other hand,  Even though mvsas driver can succeed in reporting
back the error data, scsi subsystem also will force device to reset
because of wrong REPORTLUN status. But Error again, then reset.
Maybe I may write simulation codes for this command.


 2.6.17-git4:
 /**
  * scsi_probe_and_add_lun - probe a LUN, if a LUN is found add it
@@ -880,10 +906,12 @@ static int scsi_probe_and_add_lun(struct
    if (scsi_probe_lun(sdev, result, result_len, &bflags))
       goto out_free_result;

+   if (bflagsp)
+      *bflagsp = bflags;
    /*
     * result contains valid SCSI INQUIRY data.
     */
-   if ((result[0] >> 5) == 3) {
+   if (((result[0] >> 5) == 3) && !(bflags & BLIST_ATTACH_PQ3)) {
       /*
        * For a Peripheral qualifier 3 (011b), the SCSI
        * spec says: The device server is not capable of
@@ -894,9 +922,22 @@ static int scsi_probe_and_add_lun(struct
        * logical disk configured at sdev->lun, but there
        * is a target id responding.
        */
-      SCSI_LOG_SCAN_BUS(3, printk(KERN_INFO
-               "scsi scan: peripheral qualifier of 3,"
-               " no device added\n"));
+      SCSI_LOG_SCAN_BUS(2, sdev_printk(KERN_INFO, sdev, "scsi scan:"
+               " peripheral qualifier of 3, device not"
+               " added\n"))
+      if (lun == 0) {
+         SCSI_LOG_SCAN_BUS(1, {
+            unsigned char vend[9];
+            unsigned char mod[17];
+
+            sdev_printk(KERN_INFO, sdev,
+               "scsi scan: consider passing scsi_mod."
+               "dev_flags=%s:%s:0x240 or 0x800240\n",
+               scsi_inq_str(vend, result, 8, 16),
+               scsi_inq_str(mod, result, 16, 32));
+         });
+      }
+
       res = SCSI_SCAN_TARGET_PRESENT;
       goto out_free_result;
    }
@@ -920,8 +961,6 @@ static int scsi_probe_and_add_lun(struct
          sdev->lockable = 0;
          scsi_unlock_floptical(sdev, result);
       }
-      if (bflagsp)
-         *bflagsp = bflags;
    }


2.
> As a work around until the NCQ issue is fixed, this will allow can_queue
> to be raised while still forcing all ATA devices to the non-NCQ case.
>
> Also, can can_queue be set to MVS_CHIP_SLOT_SZ - 1?  It seems like a
> reasonable default.

I also test failed. By far, can_queue only be set to 1. I will keep on
investigating.

3.
> The good news is that with this change, I'm getting a respectable
> throughput on the fio hammer test; plus zapping random phy resets across
> the disk triggers error handler recovery correctly (so far).

I fixed it because of a register mistake. And I'm testing for the follow patch.

-- 
Best Regards,
Ke Wei

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-06 14:46           ` Ke Wei
@ 2008-03-06 15:52             ` James Bottomley
  2008-03-06 17:44               ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-03-06 15:52 UTC (permalink / raw)
  To: Ke Wei; +Cc: linux-scsi, jgarzik

On Thu, 2008-03-06 at 22:46 +0800, Ke Wei wrote:
> On Thu, Mar 6, 2008 at 5:02 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> 
> 1.
> > >
> > > OK, I instrumented more ... you're right, the first failing command is
> > > REPORT_LUNS.  The failure isn't because the DVD doesn't accept the
> > > command, but because it gets errored and we fail to report back the
> > > error data.
> 
> First, I noticed that scsi_probe_and_add_lun function has changed
> since v2.6.17-git4.
> It will cause bflagsp to miss BLIST_NOREPORTLUN flag after calling scsi_add_lun.
> On the other hand,  Even though mvsas driver can succeed in reporting
> back the error data, scsi subsystem also will force device to reset
> because of wrong REPORTLUN status. But Error again, then reset.
> Maybe I may write simulation codes for this command.

But this isn't the problem.

I'll look into the mid layer changes and see if we have a problem.
However, this device does respond correctly (with an error) to
REPORT_LUNS (however, at the moment, for me, the current setup is a
great test of ATA error handling).

When I have it on the aic94xx it terminates the command and sends back a
Register D2H FIS with

error = 0x54 (sense key ILLEGAL REQUEST, ABRT)
status = 0x51 (DRDY, SERV, CHK)

So the device does a correct termination for an unsupported command.
The problem seems to be that mvsas isn't terminating and picking up the
register D2H fis properly.  Which, in turn seems to be connected with
the way it's not really processing a RXQ_ERR return.

This is a perfectly proper termination, and the scan routine will
proceed normally without invoking any error handling (it's aware of the
conflict between SPC and MMC regarding REPORT_LUNS).

James

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-06 15:52             ` James Bottomley
@ 2008-03-06 17:44               ` James Bottomley
  2008-03-06 17:59                 ` Jeff Garzik
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-03-06 17:44 UTC (permalink / raw)
  To: Ke Wei; +Cc: linux-scsi, jgarzik

On Thu, 2008-03-06 at 09:52 -0600, James Bottomley wrote:
> On Thu, 2008-03-06 at 22:46 +0800, Ke Wei wrote:
> > On Thu, Mar 6, 2008 at 5:02 AM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> > 
> > 1.
> > > >
> > > > OK, I instrumented more ... you're right, the first failing command is
> > > > REPORT_LUNS.  The failure isn't because the DVD doesn't accept the
> > > > command, but because it gets errored and we fail to report back the
> > > > error data.
> > 
> > First, I noticed that scsi_probe_and_add_lun function has changed
> > since v2.6.17-git4.
> > It will cause bflagsp to miss BLIST_NOREPORTLUN flag after calling scsi_add_lun.
> > On the other hand,  Even though mvsas driver can succeed in reporting
> > back the error data, scsi subsystem also will force device to reset
> > because of wrong REPORTLUN status. But Error again, then reset.
> > Maybe I may write simulation codes for this command.
> 
> But this isn't the problem.
> 
> I'll look into the mid layer changes and see if we have a problem.
> However, this device does respond correctly (with an error) to
> REPORT_LUNS (however, at the moment, for me, the current setup is a
> great test of ATA error handling).
> 
> When I have it on the aic94xx it terminates the command and sends back a
> Register D2H FIS with
> 
> error = 0x54 (sense key ILLEGAL REQUEST, ABRT)
> status = 0x51 (DRDY, SERV, CHK)
> 
> So the device does a correct termination for an unsupported command.
> The problem seems to be that mvsas isn't terminating and picking up the
> register D2H fis properly.  Which, in turn seems to be connected with
> the way it's not really processing a RXQ_ERR return.
> 
> This is a perfectly proper termination, and the scan routine will
> proceed normally without invoking any error handling (it's aware of the
> conflict between SPC and MMC regarding REPORT_LUNS).

Actually, I looked in your mvi->rx_fis area where the D2H FIS should be
and there seems to be a correct one collected.  The problem is that if
the slot returns RXQ_ERR without any other flags set (other than the
slot number), the driver does nothing.  Shouldn't the slot be completed
and the D2H FIS returned so we can then do a request sense?

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-06 17:44               ` James Bottomley
@ 2008-03-06 17:59                 ` Jeff Garzik
  2008-03-07 10:50                   ` Ke Wei
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Garzik @ 2008-03-06 17:59 UTC (permalink / raw)
  To: James Bottomley; +Cc: Ke Wei, linux-scsi

James Bottomley wrote:
> On Thu, 2008-03-06 at 09:52 -0600, James Bottomley wrote:
>> On Thu, 2008-03-06 at 22:46 +0800, Ke Wei wrote:
>>> On Thu, Mar 6, 2008 at 5:02 AM, James Bottomley
>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>
>>> 1.
>>>>> OK, I instrumented more ... you're right, the first failing command is
>>>>> REPORT_LUNS.  The failure isn't because the DVD doesn't accept the
>>>>> command, but because it gets errored and we fail to report back the
>>>>> error data.
>>> First, I noticed that scsi_probe_and_add_lun function has changed
>>> since v2.6.17-git4.
>>> It will cause bflagsp to miss BLIST_NOREPORTLUN flag after calling scsi_add_lun.
>>> On the other hand,  Even though mvsas driver can succeed in reporting
>>> back the error data, scsi subsystem also will force device to reset
>>> because of wrong REPORTLUN status. But Error again, then reset.
>>> Maybe I may write simulation codes for this command.
>> But this isn't the problem.
>>
>> I'll look into the mid layer changes and see if we have a problem.
>> However, this device does respond correctly (with an error) to
>> REPORT_LUNS (however, at the moment, for me, the current setup is a
>> great test of ATA error handling).
>>
>> When I have it on the aic94xx it terminates the command and sends back a
>> Register D2H FIS with
>>
>> error = 0x54 (sense key ILLEGAL REQUEST, ABRT)
>> status = 0x51 (DRDY, SERV, CHK)
>>
>> So the device does a correct termination for an unsupported command.
>> The problem seems to be that mvsas isn't terminating and picking up the
>> register D2H fis properly.  Which, in turn seems to be connected with
>> the way it's not really processing a RXQ_ERR return.
>>
>> This is a perfectly proper termination, and the scan routine will
>> proceed normally without invoking any error handling (it's aware of the
>> conflict between SPC and MMC regarding REPORT_LUNS).
> 
> Actually, I looked in your mvi->rx_fis area where the D2H FIS should be
> and there seems to be a correct one collected.  The problem is that if
> the slot returns RXQ_ERR without any other flags set (other than the
> slot number), the driver does nothing.  Shouldn't the slot be completed
> and the D2H FIS returned so we can then do a request sense?

That's a good question...  Ke?

I wrote that slot completion code according to the docs, which seemed to 
indicate that nothing associated with a slot could be touched until 
RXQ_DONE was set.

It seemed analagous to the 'OWN' bit of a DMA ring found in many other 
hardwares, where software should not touch anything related to the 
descriptor if that bit is not set.

	Jeff




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-06 17:59                 ` Jeff Garzik
@ 2008-03-07 10:50                   ` Ke Wei
  2008-03-07 15:03                     ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Ke Wei @ 2008-03-07 10:50 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: James Bottomley, linux-scsi

On a system with many SAS targets,  It appears possible that a
scsi_cmnd can time out without ever making it to the SAS LLDD or at
the same time that a completion is occurring.
in file sas_scsi_host.c :
               /* Queue up, Direct Mode or Task Collector Mode. */
               if (sas_ha->lldd_max_execute_num < 2)
                               res = i->dft->lldd_execute_task(task,
1, GFP_ATOMIC);
               else
                               res = sas_queue_up(task);
If I set lldd_max_execute_num above 1, I find that libsas couldn't
queue a task to the SAS LLDD sometimes.
System will always report: not at initiator: EH_RESET_TIMER.
Is queue_thread pending? I will keep investigating.



-- 
Best Regards,
Ke Wei

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-07 10:50                   ` Ke Wei
@ 2008-03-07 15:03                     ` James Bottomley
  2008-03-07 15:31                       ` Ke Wei
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-03-07 15:03 UTC (permalink / raw)
  To: Ke Wei; +Cc: Jeff Garzik, linux-scsi

On Fri, 2008-03-07 at 18:50 +0800, Ke Wei wrote:
> On a system with many SAS targets,  It appears possible that a
> scsi_cmnd can time out without ever making it to the SAS LLDD or at
> the same time that a completion is occurring.
> in file sas_scsi_host.c :
>                /* Queue up, Direct Mode or Task Collector Mode. */
>                if (sas_ha->lldd_max_execute_num < 2)
>                                res = i->dft->lldd_execute_task(task,
> 1, GFP_ATOMIC);
>                else
>                                res = sas_queue_up(task);
> If I set lldd_max_execute_num above 1, I find that libsas couldn't
> queue a task to the SAS LLDD sometimes.
> System will always report: not at initiator: EH_RESET_TIMER.
> Is queue_thread pending? I will keep investigating.

Oh, actually, that's the old task collector mode.  I keep meaning to
strip it out of libsas ... this provides a good excuse.  Basically the
only valid case is lldd_max_execute_num == 1 since, as you found,
anything above that doesn't work.

Are you still seeing timeouts in the lldd_max_execute_num == 1 case?
And if so, is that still with can_queue == 1?

James



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mvsas: fix default can_queue
  2008-03-07 15:03                     ` James Bottomley
@ 2008-03-07 15:31                       ` Ke Wei
  0 siblings, 0 replies; 14+ messages in thread
From: Ke Wei @ 2008-03-07 15:31 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jeff Garzik, linux-scsi

On Fri, Mar 7, 2008 at 11:03 PM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
>
> Are you still seeing timeouts in the lldd_max_execute_num == 1 case?
> And if so, is that still with can_queue == 1?
>
Yes , I set lldd_max_execute_num = 1 & can_queue = 1. By far, no timeout event.

> James
>
>
>


-- 
Best Regards,
Ke Wei

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2008-03-07 15:31 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-29 18:01 [PATCH] mvsas: fix default can_queue James Bottomley
2008-03-03  0:42 ` James Bottomley
2008-03-03  8:17   ` Ke Wei
2008-03-03 14:59     ` James Bottomley
2008-03-03 16:40       ` James Bottomley
2008-03-05  2:07       ` James Bottomley
2008-03-05 21:02         ` James Bottomley
2008-03-06 14:46           ` Ke Wei
2008-03-06 15:52             ` James Bottomley
2008-03-06 17:44               ` James Bottomley
2008-03-06 17:59                 ` Jeff Garzik
2008-03-07 10:50                   ` Ke Wei
2008-03-07 15:03                     ` James Bottomley
2008-03-07 15:31                       ` Ke Wei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).