public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
* Commit a692b0e broke my mvsas card
@ 2012-04-16 22:42 Tom Rini
  2012-04-16 23:12 ` Dan Williams
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Rini @ 2012-04-16 22:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Dan Williams, James Bottomley, Xiangliang Yu

Hey all,

I have an OCZ RevoDrive3 X2 (which the mvsas driver works for).  I've
been running mainline since support for my card was added and I just
went to test 3.3-rc4.  Unpatched I get some sort of recursive fault that
locks up the system.  When I revert a692b0e (which I found with a git
bisect) things are working as expected again.  What info do you need
from me to work out the right solution here?  Thanks!

-- 
Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-16 22:42 Commit a692b0e broke my mvsas card Tom Rini
@ 2012-04-16 23:12 ` Dan Williams
  2012-04-16 23:34   ` Tom Rini
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2012-04-16 23:12 UTC (permalink / raw)
  To: Tom Rini; +Cc: linux-scsi, James Bottomley, Xiangliang Yu

On Mon, Apr 16, 2012 at 3:42 PM, Tom Rini <trini@ti.com> wrote:
> Hey all,
>
> I have an OCZ RevoDrive3 X2 (which the mvsas driver works for).  I've
> been running mainline since support for my card was added and I just
> went to test 3.3-rc4.  Unpatched I get some sort of recursive fault that

I assume this was supposed to be 3.4-rc3?

> locks up the system.  When I revert a692b0e (which I found with a git
> bisect) things are working as expected again.  What info do you need
> from me to work out the right solution here?  Thanks!

Can you post the kernel log of the fault (starting at the first sign
of trouble)?

I suspect it's one of the known regression fixes [1], but would like
to double check.

--
Dan

[1]: http://marc.info/?l=linux-scsi&m=133435937115299&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-16 23:12 ` Dan Williams
@ 2012-04-16 23:34   ` Tom Rini
  2012-04-16 23:59     ` Dan Williams
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Rini @ 2012-04-16 23:34 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, James Bottomley, Xiangliang Yu

On 04/16/2012 04:12 PM, Dan Williams wrote:
> On Mon, Apr 16, 2012 at 3:42 PM, Tom Rini<trini@ti.com>  wrote:
>> Hey all,
>>
>> I have an OCZ RevoDrive3 X2 (which the mvsas driver works for).  I've
>> been running mainline since support for my card was added and I just
>> went to test 3.3-rc4.  Unpatched I get some sort of recursive fault that
>
> I assume this was supposed to be 3.4-rc3?

Yes, sorry.

>> locks up the system.  When I revert a692b0e (which I found with a git
>> bisect) things are working as expected again.  What info do you need
>> from me to work out the right solution here?  Thanks!
>
> Can you post the kernel log of the fault (starting at the first sign
> of trouble)?
>
> I suspect it's one of the known regression fixes [1], but would like
> to double check.

I merged those changes in and the problem is now non-fatal but now I'm 
missing 2 of the 4 drives on the card.  Portion of dmesg is at 
http://pastebin.com/36K7xwHK and I can email it if needed.

-- 
Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-16 23:34   ` Tom Rini
@ 2012-04-16 23:59     ` Dan Williams
  2012-04-17  1:53       ` Tom Rini
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2012-04-16 23:59 UTC (permalink / raw)
  To: Tom Rini; +Cc: linux-scsi, James Bottomley, Xiangliang Yu

On Mon, 2012-04-16 at 16:34 -0700, Tom Rini wrote:
> On 04/16/2012 04:12 PM, Dan Williams wrote:
> > On Mon, Apr 16, 2012 at 3:42 PM, Tom Rini<trini@ti.com>  wrote:
> >> Hey all,
> >>
> >> I have an OCZ RevoDrive3 X2 (which the mvsas driver works for).  I've
> >> been running mainline since support for my card was added and I just
> >> went to test 3.3-rc4.  Unpatched I get some sort of recursive fault that
> >
> > I assume this was supposed to be 3.4-rc3?
> 
> Yes, sorry.
> 
> >> locks up the system.  When I revert a692b0e (which I found with a git
> >> bisect) things are working as expected again.  What info do you need
> >> from me to work out the right solution here?  Thanks!
> >
> > Can you post the kernel log of the fault (starting at the first sign
> > of trouble)?
> >
> > I suspect it's one of the known regression fixes [1], but would like
> > to double check.
> 
> I merged those changes in and the problem is now non-fatal but now I'm 
> missing 2 of the 4 drives on the card.  Portion of dmesg is at 
> http://pastebin.com/36K7xwHK and I can email it if needed.
> 

Thanks, yes it looks like a new regression.  Can you tell me if the
following fixes it?

---

mvsas: fix local port name regression

Commit a692b0e "[SCSI] libsas: fix sas port naming" added a dependency
on lldds filling out the id field of registered phys.  Add this for
mvsas.

Note we do this in the lldd rather than the core to allow it the control
of mapping physical phy id to the libsas phy index.

Reported-by: Tom Rini <trini@ti.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/mvsas/mv_init.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index cc59dff..556fa6c 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -447,7 +447,7 @@ static int pci_go_64(struct pci_dev *pdev)
 static int __devinit mvs_prep_sas_ha_init(struct Scsi_Host *shost,
 				const struct mvs_chip_info *chip_info)
 {
-	int phy_nr, port_nr; unsigned short core_nr;
+	int i, phy_nr, port_nr; unsigned short core_nr;
 	struct asd_sas_phy **arr_phy;
 	struct asd_sas_port **arr_port;
 	struct sas_ha_struct *sha = SHOST_TO_SAS_HA(shost);
@@ -462,6 +462,9 @@ static int __devinit mvs_prep_sas_ha_init(struct Scsi_Host *shost,
 	if (!arr_phy || !arr_port)
 		goto exit_free;
 
+	for (i = 0; i < phy_nr; i++)
+		arr_phy[i]->id = i;
+
 	sha->sas_phy = arr_phy;
 	sha->sas_port = arr_port;
 	sha->core.shost = shost;



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-16 23:59     ` Dan Williams
@ 2012-04-17  1:53       ` Tom Rini
  2012-04-17  3:25         ` Dan Williams
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Rini @ 2012-04-17  1:53 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, James Bottomley, Xiangliang Yu

On 04/16/2012 04:59 PM, Dan Williams wrote:
> On Mon, 2012-04-16 at 16:34 -0700, Tom Rini wrote:
>> On 04/16/2012 04:12 PM, Dan Williams wrote:
>>> On Mon, Apr 16, 2012 at 3:42 PM, Tom Rini<trini@ti.com>   wrote:
>>>> Hey all,
>>>>
>>>> I have an OCZ RevoDrive3 X2 (which the mvsas driver works for).  I've
>>>> been running mainline since support for my card was added and I just
>>>> went to test 3.3-rc4.  Unpatched I get some sort of recursive fault that
>>>
>>> I assume this was supposed to be 3.4-rc3?
>>
>> Yes, sorry.
>>
>>>> locks up the system.  When I revert a692b0e (which I found with a git
>>>> bisect) things are working as expected again.  What info do you need
>>>> from me to work out the right solution here?  Thanks!
>>>
>>> Can you post the kernel log of the fault (starting at the first sign
>>> of trouble)?
>>>
>>> I suspect it's one of the known regression fixes [1], but would like
>>> to double check.
>>
>> I merged those changes in and the problem is now non-fatal but now I'm
>> missing 2 of the 4 drives on the card.  Portion of dmesg is at
>> http://pastebin.com/36K7xwHK and I can email it if needed.
>>
>
> Thanks, yes it looks like a new regression.  Can you tell me if the
> following fixes it?

No joy, oopses.  http://alnk.org/43youngkirby for the dmesg and oops.

-- 
Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-17  1:53       ` Tom Rini
@ 2012-04-17  3:25         ` Dan Williams
  2012-04-17 14:44           ` Tom Rini
  2012-04-17 15:47           ` James Bottomley
  0 siblings, 2 replies; 15+ messages in thread
From: Dan Williams @ 2012-04-17  3:25 UTC (permalink / raw)
  To: Tom Rini; +Cc: linux-scsi, James Bottomley, Xiangliang Yu

On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote:
> > Thanks, yes it looks like a new regression.  Can you tell me if the
> > following fixes it?
> 
> No joy, oopses.  http://alnk.org/43youngkirby for the dmesg and oops.
> 

Agh, sorry, I rushed that one.  The phy array is initialized later, here
is another run at it:

----
mvsas: fix local port name regression

From: Dan Williams <dan.j.williams@intel.com>

Commit a692b0e "[SCSI] libsas: fix sas port naming" added a dependency
on lldds filling out the id field of registered phys.  Add this for
mvsas.

Note we do this in the lldd rather than the core to allow it the control
of mappping physical phy id to the libsas phy index.

Reported-by: Tom Rini <trini@ti.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/scsi/mvsas/mv_init.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
index cc59dff..0c9af72 100644
--- a/drivers/scsi/mvsas/mv_init.c
+++ b/drivers/scsi/mvsas/mv_init.c
@@ -497,10 +497,12 @@ static void  __devinit mvs_post_sas_ha_init(struct Scsi_Host *shost,
 	for (j = 0; j < nr_core; j++) {
 		mvi = ((struct mvs_prv_info *)sha->lldd_ha)->mvi[j];
 		for (i = 0; i < chip_info->n_phy; i++) {
-			sha->sas_phy[j * chip_info->n_phy  + i] =
-				&mvi->phy[i].sas_phy;
-			sha->sas_port[j * chip_info->n_phy + i] =
-				&mvi->port[i].sas_port;
+			struct asd_sas_phy *phy = &mvi->phy[i].sas_phy;
+			int id = j * chip_info->n_phy  + i;
+
+			phy->id = id;
+			sha->sas_phy[id] = phy;
+			sha->sas_port[id] = &mvi->port[i].sas_port;
 		}
 	}
 





^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-17  3:25         ` Dan Williams
@ 2012-04-17 14:44           ` Tom Rini
  2012-04-17 16:50             ` Dan Williams
  2012-04-17 15:47           ` James Bottomley
  1 sibling, 1 reply; 15+ messages in thread
From: Tom Rini @ 2012-04-17 14:44 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, James Bottomley, Xiangliang Yu

On 04/16/2012 08:25 PM, Dan Williams wrote:
> On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote:
>>> Thanks, yes it looks like a new regression.  Can you tell me if the
>>> following fixes it?
>>
>> No joy, oopses.  http://alnk.org/43youngkirby for the dmesg and oops.
>>
>
> Agh, sorry, I rushed that one.  The phy array is initialized later, here
> is another run at it:

No crash, but 2 out of 4 SSDs still: http://pastebin.com/PCUGTBqG

-- 
Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-17  3:25         ` Dan Williams
  2012-04-17 14:44           ` Tom Rini
@ 2012-04-17 15:47           ` James Bottomley
  2012-04-17 17:17             ` Dan Williams
  1 sibling, 1 reply; 15+ messages in thread
From: James Bottomley @ 2012-04-17 15:47 UTC (permalink / raw)
  To: Dan Williams; +Cc: Tom Rini, linux-scsi, Xiangliang Yu

On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote:
> On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote:
> > > Thanks, yes it looks like a new regression.  Can you tell me if the
> > > following fixes it?
> > 
> > No joy, oopses.  http://alnk.org/43youngkirby for the dmesg and oops.
> > 
> 
> Agh, sorry, I rushed that one.  The phy array is initialized later, here
> is another run at it:

Are we sure it's initialised correctly in all the other SAS drivers that
use (well, one other: aic94xx)?

Given the oops issue, perhaps revert this for now and get a working
patch in for the next merge window?

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-17 14:44           ` Tom Rini
@ 2012-04-17 16:50             ` Dan Williams
  2012-04-17 17:21               ` Tom Rini
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2012-04-17 16:50 UTC (permalink / raw)
  To: Tom Rini; +Cc: linux-scsi, James Bottomley, Xiangliang Yu

On Tue, Apr 17, 2012 at 7:44 AM, Tom Rini <trini@ti.com> wrote:
> On 04/16/2012 08:25 PM, Dan Williams wrote:
>>
>> On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote:
>>>>
>>>> Thanks, yes it looks like a new regression.  Can you tell me if the
>>>> following fixes it?
>>>
>>>
>>> No joy, oopses.  http://alnk.org/43youngkirby for the dmesg and oops.
>>>
>>
>> Agh, sorry, I rushed that one.  The phy array is initialized later, here
>> is another run at it:
>
>
> No crash, but 2 out of 4 SSDs still: http://pastebin.com/PCUGTBqG

Ok, this looks like a different issue around the changes to ata reset.
 The intent was to leave mvsas with its old behavior (of not waiting
for resets to complete) until it could be updated to the new reset
scheme.  Can you send a log from a good run on 3.3?

I want to see if the timings are different, because:

[   10.039490] sas: ata10: end_device-6:4: dev error handler
[..]
[   10.198842] ata10.00: failed to IDENTIFY (INIT_DEV_PARAMS failed,
err_mask=0x80)

[   25.684822] sas: ata11: end_device-6:5: dev error handler
[..]
[   25.844299] ata11.00: failed to IDENTIFY (INIT_DEV_PARAMS failed,
err_mask=0x80)

Is shorter than I would expect.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-17 15:47           ` James Bottomley
@ 2012-04-17 17:17             ` Dan Williams
  2012-04-18  5:11               ` Dan Williams
  0 siblings, 1 reply; 15+ messages in thread
From: Dan Williams @ 2012-04-17 17:17 UTC (permalink / raw)
  To: James Bottomley; +Cc: Tom Rini, linux-scsi, Xiangliang Yu

On Tue, Apr 17, 2012 at 8:47 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote:
>> Agh, sorry, I rushed that one.  The phy array is initialized later, here
>> is another run at it:
>
> Are we sure it's initialised correctly in all the other SAS drivers that
> use (well, one other: aic94xx)?

Looks like we need:

diff --git a/drivers/scsi/aic94xx/aic94xx_init.c
b/drivers/scsi/aic94xx/aic94xx_init.c
index ff80552..830f438 100644
--- a/drivers/scsi/aic94xx/aic94xx_init.c
+++ b/drivers/scsi/aic94xx/aic94xx_init.c
@@ -250,6 +250,7 @@ static int __devinit asd_common_setup(struct
asd_ha_struct *asd_ha)
                        SAS_LINK_RATE_1_5_GBPS;
                asd_ha->hw_prof.phy_desc[i].min_sata_lrate =
                        SAS_LINK_RATE_1_5_GBPS;
+               asd_ha->phys[i].sas_phy.id = i;
        }

        return 0;

> Given the oops issue, perhaps revert this for now and get a working
> patch in for the next merge window?

I have no strong feelings either way, but aic94xx and mvsas
maintainers have been hard to reach and I'm not encouraged more time
will yield a different result versus just moving ahead with these
fixes.

That said we still have Tom's discovery regression which is a separate issue.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-17 16:50             ` Dan Williams
@ 2012-04-17 17:21               ` Tom Rini
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Rini @ 2012-04-17 17:21 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-scsi, James Bottomley, Xiangliang Yu

On 04/17/2012 09:50 AM, Dan Williams wrote:
> On Tue, Apr 17, 2012 at 7:44 AM, Tom Rini<trini@ti.com>  wrote:
>> On 04/16/2012 08:25 PM, Dan Williams wrote:
>>>
>>> On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote:
>>>>>
>>>>> Thanks, yes it looks like a new regression.  Can you tell me if the
>>>>> following fixes it?
>>>>
>>>>
>>>> No joy, oopses.  http://alnk.org/43youngkirby for the dmesg and oops.
>>>>
>>>
>>> Agh, sorry, I rushed that one.  The phy array is initialized later, here
>>> is another run at it:
>>
>>
>> No crash, but 2 out of 4 SSDs still: http://pastebin.com/PCUGTBqG
>
> Ok, this looks like a different issue around the changes to ata reset.
>   The intent was to leave mvsas with its old behavior (of not waiting
> for resets to complete) until it could be updated to the new reset
> scheme.  Can you send a log from a good run on 3.3?
>
> I want to see if the timings are different, because:
>
> [   10.039490] sas: ata10: end_device-6:4: dev error handler
> [..]
> [   10.198842] ata10.00: failed to IDENTIFY (INIT_DEV_PARAMS failed,
> err_mask=0x80)
>
> [   25.684822] sas: ata11: end_device-6:5: dev error handler
> [..]
> [   25.844299] ata11.00: failed to IDENTIFY (INIT_DEV_PARAMS failed,
> err_mask=0x80)
>
> Is shorter than I would expect.

The 3.3.0 dmesg is up at http://pastebin.com/cVTmpH1M

-- 
Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-17 17:17             ` Dan Williams
@ 2012-04-18  5:11               ` Dan Williams
  2012-04-18  5:37                 ` James Bottomley
  2012-04-18 15:23                 ` Tom Rini
  0 siblings, 2 replies; 15+ messages in thread
From: Dan Williams @ 2012-04-18  5:11 UTC (permalink / raw)
  To: James Bottomley; +Cc: Tom Rini, linux-scsi, Xiangliang Yu

On Tue, Apr 17, 2012 at 10:17 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Apr 17, 2012 at 8:47 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>> On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote:
>>> Agh, sorry, I rushed that one.  The phy array is initialized later, here
>>> is another run at it:
>>
>> Are we sure it's initialised correctly in all the other SAS drivers that
>> use (well, one other: aic94xx)?
>
> Looks like we need:
>
> diff --git a/drivers/scsi/aic94xx/aic94xx_init.c
> b/drivers/scsi/aic94xx/aic94xx_init.c
> index ff80552..830f438 100644
> --- a/drivers/scsi/aic94xx/aic94xx_init.c
> +++ b/drivers/scsi/aic94xx/aic94xx_init.c
> @@ -250,6 +250,7 @@ static int __devinit asd_common_setup(struct
> asd_ha_struct *asd_ha)
>                        SAS_LINK_RATE_1_5_GBPS;
>                asd_ha->hw_prof.phy_desc[i].min_sata_lrate =
>                        SAS_LINK_RATE_1_5_GBPS;
> +               asd_ha->phys[i].sas_phy.id = i;
>        }
>
>        return 0;
>
>> Given the oops issue, perhaps revert this for now and get a working
>> patch in for the next merge window?
>
> I have no strong feelings either way, but aic94xx and mvsas
> maintainers have been hard to reach and I'm not encouraged more time
> will yield a different result versus just moving ahead with these
> fixes.
>
> That said we still have Tom's discovery regression which is a separate issue.

I take it back.

I overlooked what Tom said at the very beginning.  Everything is fixed
by the revert, and I now see why my later "fix" made it worse.  The
fix overlooked that mvsas is indeed initializing the phy ids, but in
the "multi-chip" case it does a rather annoying duplication of phy ids
in the array passed to libsas.  So, for example, chip0 has phy0-3 at
ha phy index 0-3 and chip1 has its phy0-3 at ha phy index 4-7.  So the
"fix" was breaking mvsas's ability to lookup phys by id and the
original commit is tripped up by mvsas's scheme of putting non-unique
ids in the sas_ha_struct.

Question for a later day, but why isn't mvsas creating a scsi_host per
chip??  That can only help performance and is more in line with
reality.

To properly support commit a692b0e mvsas would either need to convert
to a shost-per-chip or use a helper like:

static inline struct mvs_phy *to_mvs_phy(struct mvs_info *mvi, int phy_id)
{
        return mvi->phy[phy_id - mvi->chip->n_phy * mvi->id];
}

...everywhere it converts from a libsas phy id back to a local phy structure.

Both of these options are way too big for 3.4-rc4, so I'll queue up
the revert with a Reported-by and Tested-by from Tom.

Thanks again for the report and help Tom.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-18  5:11               ` Dan Williams
@ 2012-04-18  5:37                 ` James Bottomley
  2012-04-18 15:23                 ` Tom Rini
  1 sibling, 0 replies; 15+ messages in thread
From: James Bottomley @ 2012-04-18  5:37 UTC (permalink / raw)
  To: Dan Williams; +Cc: Tom Rini, linux-scsi, Xiangliang Yu

On Tue, 2012-04-17 at 22:11 -0700, Dan Williams wrote:
> On Tue, Apr 17, 2012 at 10:17 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> > On Tue, Apr 17, 2012 at 8:47 AM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> >> On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote:
> >>> Agh, sorry, I rushed that one.  The phy array is initialized later, here
> >>> is another run at it:
> >>
> >> Are we sure it's initialised correctly in all the other SAS drivers that
> >> use (well, one other: aic94xx)?
> >
> > Looks like we need:
> >
> > diff --git a/drivers/scsi/aic94xx/aic94xx_init.c
> > b/drivers/scsi/aic94xx/aic94xx_init.c
> > index ff80552..830f438 100644
> > --- a/drivers/scsi/aic94xx/aic94xx_init.c
> > +++ b/drivers/scsi/aic94xx/aic94xx_init.c
> > @@ -250,6 +250,7 @@ static int __devinit asd_common_setup(struct
> > asd_ha_struct *asd_ha)
> >                        SAS_LINK_RATE_1_5_GBPS;
> >                asd_ha->hw_prof.phy_desc[i].min_sata_lrate =
> >                        SAS_LINK_RATE_1_5_GBPS;
> > +               asd_ha->phys[i].sas_phy.id = i;
> >        }
> >
> >        return 0;
> >
> >> Given the oops issue, perhaps revert this for now and get a working
> >> patch in for the next merge window?
> >
> > I have no strong feelings either way, but aic94xx and mvsas
> > maintainers have been hard to reach and I'm not encouraged more time
> > will yield a different result versus just moving ahead with these
> > fixes.
> >
> > That said we still have Tom's discovery regression which is a separate issue.
> 
> I take it back.
> 
> I overlooked what Tom said at the very beginning.  Everything is fixed
> by the revert, and I now see why my later "fix" made it worse.  The
> fix overlooked that mvsas is indeed initializing the phy ids, but in
> the "multi-chip" case it does a rather annoying duplication of phy ids
> in the array passed to libsas.  So, for example, chip0 has phy0-3 at
> ha phy index 0-3 and chip1 has its phy0-3 at ha phy index 4-7.  So the
> "fix" was breaking mvsas's ability to lookup phys by id and the
> original commit is tripped up by mvsas's scheme of putting non-unique
> ids in the sas_ha_struct.
> 
> Question for a later day, but why isn't mvsas creating a scsi_host per
> chip??  That can only help performance and is more in line with
> reality.
> 
> To properly support commit a692b0e mvsas would either need to convert
> to a shost-per-chip or use a helper like:
> 
> static inline struct mvs_phy *to_mvs_phy(struct mvs_info *mvi, int phy_id)
> {
>         return mvi->phy[phy_id - mvi->chip->n_phy * mvi->id];
> }
> 
> ...everywhere it converts from a libsas phy id back to a local phy structure.
> 
> Both of these options are way too big for 3.4-rc4, so I'll queue up
> the revert with a Reported-by and Tested-by from Tom.
> 
> Thanks again for the report and help Tom.

Actually, no need ... I can add a revert to the fixes branch (when I
manage to get it built).

James



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-18  5:11               ` Dan Williams
  2012-04-18  5:37                 ` James Bottomley
@ 2012-04-18 15:23                 ` Tom Rini
  2012-04-18 15:46                   ` Dan Williams
  1 sibling, 1 reply; 15+ messages in thread
From: Tom Rini @ 2012-04-18 15:23 UTC (permalink / raw)
  To: Dan Williams; +Cc: James Bottomley, linux-scsi, Xiangliang Yu

On 04/17/2012 10:11 PM, Dan Williams wrote:
> On Tue, Apr 17, 2012 at 10:17 AM, Dan Williams<dan.j.williams@intel.com>  wrote:
>> On Tue, Apr 17, 2012 at 8:47 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com>  wrote:
>>> On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote:
>>>> Agh, sorry, I rushed that one.  The phy array is initialized later, here
>>>> is another run at it:
>>>
>>> Are we sure it's initialised correctly in all the other SAS drivers that
>>> use (well, one other: aic94xx)?
>>
>> Looks like we need:
>>
>> diff --git a/drivers/scsi/aic94xx/aic94xx_init.c
>> b/drivers/scsi/aic94xx/aic94xx_init.c
>> index ff80552..830f438 100644
>> --- a/drivers/scsi/aic94xx/aic94xx_init.c
>> +++ b/drivers/scsi/aic94xx/aic94xx_init.c
>> @@ -250,6 +250,7 @@ static int __devinit asd_common_setup(struct
>> asd_ha_struct *asd_ha)
>>                         SAS_LINK_RATE_1_5_GBPS;
>>                 asd_ha->hw_prof.phy_desc[i].min_sata_lrate =
>>                         SAS_LINK_RATE_1_5_GBPS;
>> +               asd_ha->phys[i].sas_phy.id = i;
>>         }
>>
>>         return 0;
>>
>>> Given the oops issue, perhaps revert this for now and get a working
>>> patch in for the next merge window?
>>
>> I have no strong feelings either way, but aic94xx and mvsas
>> maintainers have been hard to reach and I'm not encouraged more time
>> will yield a different result versus just moving ahead with these
>> fixes.
>>
>> That said we still have Tom's discovery regression which is a separate issue.
>
> I take it back.
>
> I overlooked what Tom said at the very beginning.  Everything is fixed
> by the revert, and I now see why my later "fix" made it worse.  The
> fix overlooked that mvsas is indeed initializing the phy ids, but in
> the "multi-chip" case it does a rather annoying duplication of phy ids
> in the array passed to libsas.  So, for example, chip0 has phy0-3 at
> ha phy index 0-3 and chip1 has its phy0-3 at ha phy index 4-7.  So the
> "fix" was breaking mvsas's ability to lookup phys by id and the
> original commit is tripped up by mvsas's scheme of putting non-unique
> ids in the sas_ha_struct.
>
> Question for a later day, but why isn't mvsas creating a scsi_host per
> chip??  That can only help performance and is more in line with
> reality.

Performance isn't in-line with what I would expect (in playing with 
fio), so if you want to whip something up, or point me some changesets / 
other drivers I can take a stab here.  Thanks!

-- 
Tom

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Commit a692b0e broke my mvsas card
  2012-04-18 15:23                 ` Tom Rini
@ 2012-04-18 15:46                   ` Dan Williams
  0 siblings, 0 replies; 15+ messages in thread
From: Dan Williams @ 2012-04-18 15:46 UTC (permalink / raw)
  To: Tom Rini; +Cc: James Bottomley, linux-scsi, Xiangliang Yu

On Wed, Apr 18, 2012 at 8:23 AM, Tom Rini <trini@ti.com> wrote:
> Performance isn't in-line with what I would expect (in playing with fio), so
> if you want to whip something up, or point me some changesets / other
> drivers I can take a stab here.  Thanks!
>

The thrust is to change this loop in mvs_pci_init:

        do {
                mvi = mvs_pci_alloc(pdev, ent, shost, nhost);
                if (!mvi) {
                        rc = -ENOMEM;
                        goto err_out_regions;
                }

                memset(&mvi->hba_info_param, 0xFF,
                        sizeof(struct hba_info_page));

                mvs_init_sas_add(mvi);

                mvi->instance = nhost;
                rc = MVS_CHIP_DISP->chip_init(mvi);
                if (rc) {
                        mvs_free(mvi);
                        goto err_out_regions;
                }
                nhost++;
        } while (nhost < chip->n_host);

...to assume that mvs_pci_alloc will do the allocation of a scsi_host
per chip, rather than passing in a global host.

Then would need to clean up warts like the following in mvs_phy_control:

        while (sha->sas_phy[i]) {
                if (sha->sas_phy[i] == sas_phy)
                        break;
                i++;
        }

...since this model means that the sas_phy->id can be used directly to
look up the local mvsas phy.

I might circle back to this once 3.4 finalizes (since the random ids
libsas uses for identifying local ports is onerous), but if someone
else beats me to it I wouldn't mind ;-).

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-04-18 15:46 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-16 22:42 Commit a692b0e broke my mvsas card Tom Rini
2012-04-16 23:12 ` Dan Williams
2012-04-16 23:34   ` Tom Rini
2012-04-16 23:59     ` Dan Williams
2012-04-17  1:53       ` Tom Rini
2012-04-17  3:25         ` Dan Williams
2012-04-17 14:44           ` Tom Rini
2012-04-17 16:50             ` Dan Williams
2012-04-17 17:21               ` Tom Rini
2012-04-17 15:47           ` James Bottomley
2012-04-17 17:17             ` Dan Williams
2012-04-18  5:11               ` Dan Williams
2012-04-18  5:37                 ` James Bottomley
2012-04-18 15:23                 ` Tom Rini
2012-04-18 15:46                   ` Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox