* Commit a692b0e broke my mvsas card @ 2012-04-16 22:42 Tom Rini 2012-04-16 23:12 ` Dan Williams 0 siblings, 1 reply; 15+ messages in thread From: Tom Rini @ 2012-04-16 22:42 UTC (permalink / raw) To: linux-scsi; +Cc: Dan Williams, James Bottomley, Xiangliang Yu Hey all, I have an OCZ RevoDrive3 X2 (which the mvsas driver works for). I've been running mainline since support for my card was added and I just went to test 3.3-rc4. Unpatched I get some sort of recursive fault that locks up the system. When I revert a692b0e (which I found with a git bisect) things are working as expected again. What info do you need from me to work out the right solution here? Thanks! -- Tom ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-16 22:42 Commit a692b0e broke my mvsas card Tom Rini @ 2012-04-16 23:12 ` Dan Williams 2012-04-16 23:34 ` Tom Rini 0 siblings, 1 reply; 15+ messages in thread From: Dan Williams @ 2012-04-16 23:12 UTC (permalink / raw) To: Tom Rini; +Cc: linux-scsi, James Bottomley, Xiangliang Yu On Mon, Apr 16, 2012 at 3:42 PM, Tom Rini <trini@ti.com> wrote: > Hey all, > > I have an OCZ RevoDrive3 X2 (which the mvsas driver works for). I've > been running mainline since support for my card was added and I just > went to test 3.3-rc4. Unpatched I get some sort of recursive fault that I assume this was supposed to be 3.4-rc3? > locks up the system. When I revert a692b0e (which I found with a git > bisect) things are working as expected again. What info do you need > from me to work out the right solution here? Thanks! Can you post the kernel log of the fault (starting at the first sign of trouble)? I suspect it's one of the known regression fixes [1], but would like to double check. -- Dan [1]: http://marc.info/?l=linux-scsi&m=133435937115299&w=2 -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-16 23:12 ` Dan Williams @ 2012-04-16 23:34 ` Tom Rini 2012-04-16 23:59 ` Dan Williams 0 siblings, 1 reply; 15+ messages in thread From: Tom Rini @ 2012-04-16 23:34 UTC (permalink / raw) To: Dan Williams; +Cc: linux-scsi, James Bottomley, Xiangliang Yu On 04/16/2012 04:12 PM, Dan Williams wrote: > On Mon, Apr 16, 2012 at 3:42 PM, Tom Rini<trini@ti.com> wrote: >> Hey all, >> >> I have an OCZ RevoDrive3 X2 (which the mvsas driver works for). I've >> been running mainline since support for my card was added and I just >> went to test 3.3-rc4. Unpatched I get some sort of recursive fault that > > I assume this was supposed to be 3.4-rc3? Yes, sorry. >> locks up the system. When I revert a692b0e (which I found with a git >> bisect) things are working as expected again. What info do you need >> from me to work out the right solution here? Thanks! > > Can you post the kernel log of the fault (starting at the first sign > of trouble)? > > I suspect it's one of the known regression fixes [1], but would like > to double check. I merged those changes in and the problem is now non-fatal but now I'm missing 2 of the 4 drives on the card. Portion of dmesg is at http://pastebin.com/36K7xwHK and I can email it if needed. -- Tom ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-16 23:34 ` Tom Rini @ 2012-04-16 23:59 ` Dan Williams 2012-04-17 1:53 ` Tom Rini 0 siblings, 1 reply; 15+ messages in thread From: Dan Williams @ 2012-04-16 23:59 UTC (permalink / raw) To: Tom Rini; +Cc: linux-scsi, James Bottomley, Xiangliang Yu On Mon, 2012-04-16 at 16:34 -0700, Tom Rini wrote: > On 04/16/2012 04:12 PM, Dan Williams wrote: > > On Mon, Apr 16, 2012 at 3:42 PM, Tom Rini<trini@ti.com> wrote: > >> Hey all, > >> > >> I have an OCZ RevoDrive3 X2 (which the mvsas driver works for). I've > >> been running mainline since support for my card was added and I just > >> went to test 3.3-rc4. Unpatched I get some sort of recursive fault that > > > > I assume this was supposed to be 3.4-rc3? > > Yes, sorry. > > >> locks up the system. When I revert a692b0e (which I found with a git > >> bisect) things are working as expected again. What info do you need > >> from me to work out the right solution here? Thanks! > > > > Can you post the kernel log of the fault (starting at the first sign > > of trouble)? > > > > I suspect it's one of the known regression fixes [1], but would like > > to double check. > > I merged those changes in and the problem is now non-fatal but now I'm > missing 2 of the 4 drives on the card. Portion of dmesg is at > http://pastebin.com/36K7xwHK and I can email it if needed. > Thanks, yes it looks like a new regression. Can you tell me if the following fixes it? --- mvsas: fix local port name regression Commit a692b0e "[SCSI] libsas: fix sas port naming" added a dependency on lldds filling out the id field of registered phys. Add this for mvsas. Note we do this in the lldd rather than the core to allow it the control of mapping physical phy id to the libsas phy index. Reported-by: Tom Rini <trini@ti.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/scsi/mvsas/mv_init.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c index cc59dff..556fa6c 100644 --- a/drivers/scsi/mvsas/mv_init.c +++ b/drivers/scsi/mvsas/mv_init.c @@ -447,7 +447,7 @@ static int pci_go_64(struct pci_dev *pdev) static int __devinit mvs_prep_sas_ha_init(struct Scsi_Host *shost, const struct mvs_chip_info *chip_info) { - int phy_nr, port_nr; unsigned short core_nr; + int i, phy_nr, port_nr; unsigned short core_nr; struct asd_sas_phy **arr_phy; struct asd_sas_port **arr_port; struct sas_ha_struct *sha = SHOST_TO_SAS_HA(shost); @@ -462,6 +462,9 @@ static int __devinit mvs_prep_sas_ha_init(struct Scsi_Host *shost, if (!arr_phy || !arr_port) goto exit_free; + for (i = 0; i < phy_nr; i++) + arr_phy[i]->id = i; + sha->sas_phy = arr_phy; sha->sas_port = arr_port; sha->core.shost = shost; ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-16 23:59 ` Dan Williams @ 2012-04-17 1:53 ` Tom Rini 2012-04-17 3:25 ` Dan Williams 0 siblings, 1 reply; 15+ messages in thread From: Tom Rini @ 2012-04-17 1:53 UTC (permalink / raw) To: Dan Williams; +Cc: linux-scsi, James Bottomley, Xiangliang Yu On 04/16/2012 04:59 PM, Dan Williams wrote: > On Mon, 2012-04-16 at 16:34 -0700, Tom Rini wrote: >> On 04/16/2012 04:12 PM, Dan Williams wrote: >>> On Mon, Apr 16, 2012 at 3:42 PM, Tom Rini<trini@ti.com> wrote: >>>> Hey all, >>>> >>>> I have an OCZ RevoDrive3 X2 (which the mvsas driver works for). I've >>>> been running mainline since support for my card was added and I just >>>> went to test 3.3-rc4. Unpatched I get some sort of recursive fault that >>> >>> I assume this was supposed to be 3.4-rc3? >> >> Yes, sorry. >> >>>> locks up the system. When I revert a692b0e (which I found with a git >>>> bisect) things are working as expected again. What info do you need >>>> from me to work out the right solution here? Thanks! >>> >>> Can you post the kernel log of the fault (starting at the first sign >>> of trouble)? >>> >>> I suspect it's one of the known regression fixes [1], but would like >>> to double check. >> >> I merged those changes in and the problem is now non-fatal but now I'm >> missing 2 of the 4 drives on the card. Portion of dmesg is at >> http://pastebin.com/36K7xwHK and I can email it if needed. >> > > Thanks, yes it looks like a new regression. Can you tell me if the > following fixes it? No joy, oopses. http://alnk.org/43youngkirby for the dmesg and oops. -- Tom ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-17 1:53 ` Tom Rini @ 2012-04-17 3:25 ` Dan Williams 2012-04-17 14:44 ` Tom Rini 2012-04-17 15:47 ` James Bottomley 0 siblings, 2 replies; 15+ messages in thread From: Dan Williams @ 2012-04-17 3:25 UTC (permalink / raw) To: Tom Rini; +Cc: linux-scsi, James Bottomley, Xiangliang Yu On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote: > > Thanks, yes it looks like a new regression. Can you tell me if the > > following fixes it? > > No joy, oopses. http://alnk.org/43youngkirby for the dmesg and oops. > Agh, sorry, I rushed that one. The phy array is initialized later, here is another run at it: ---- mvsas: fix local port name regression From: Dan Williams <dan.j.williams@intel.com> Commit a692b0e "[SCSI] libsas: fix sas port naming" added a dependency on lldds filling out the id field of registered phys. Add this for mvsas. Note we do this in the lldd rather than the core to allow it the control of mappping physical phy id to the libsas phy index. Reported-by: Tom Rini <trini@ti.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> --- drivers/scsi/mvsas/mv_init.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c index cc59dff..0c9af72 100644 --- a/drivers/scsi/mvsas/mv_init.c +++ b/drivers/scsi/mvsas/mv_init.c @@ -497,10 +497,12 @@ static void __devinit mvs_post_sas_ha_init(struct Scsi_Host *shost, for (j = 0; j < nr_core; j++) { mvi = ((struct mvs_prv_info *)sha->lldd_ha)->mvi[j]; for (i = 0; i < chip_info->n_phy; i++) { - sha->sas_phy[j * chip_info->n_phy + i] = - &mvi->phy[i].sas_phy; - sha->sas_port[j * chip_info->n_phy + i] = - &mvi->port[i].sas_port; + struct asd_sas_phy *phy = &mvi->phy[i].sas_phy; + int id = j * chip_info->n_phy + i; + + phy->id = id; + sha->sas_phy[id] = phy; + sha->sas_port[id] = &mvi->port[i].sas_port; } } ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-17 3:25 ` Dan Williams @ 2012-04-17 14:44 ` Tom Rini 2012-04-17 16:50 ` Dan Williams 2012-04-17 15:47 ` James Bottomley 1 sibling, 1 reply; 15+ messages in thread From: Tom Rini @ 2012-04-17 14:44 UTC (permalink / raw) To: Dan Williams; +Cc: linux-scsi, James Bottomley, Xiangliang Yu On 04/16/2012 08:25 PM, Dan Williams wrote: > On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote: >>> Thanks, yes it looks like a new regression. Can you tell me if the >>> following fixes it? >> >> No joy, oopses. http://alnk.org/43youngkirby for the dmesg and oops. >> > > Agh, sorry, I rushed that one. The phy array is initialized later, here > is another run at it: No crash, but 2 out of 4 SSDs still: http://pastebin.com/PCUGTBqG -- Tom ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-17 14:44 ` Tom Rini @ 2012-04-17 16:50 ` Dan Williams 2012-04-17 17:21 ` Tom Rini 0 siblings, 1 reply; 15+ messages in thread From: Dan Williams @ 2012-04-17 16:50 UTC (permalink / raw) To: Tom Rini; +Cc: linux-scsi, James Bottomley, Xiangliang Yu On Tue, Apr 17, 2012 at 7:44 AM, Tom Rini <trini@ti.com> wrote: > On 04/16/2012 08:25 PM, Dan Williams wrote: >> >> On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote: >>>> >>>> Thanks, yes it looks like a new regression. Can you tell me if the >>>> following fixes it? >>> >>> >>> No joy, oopses. http://alnk.org/43youngkirby for the dmesg and oops. >>> >> >> Agh, sorry, I rushed that one. The phy array is initialized later, here >> is another run at it: > > > No crash, but 2 out of 4 SSDs still: http://pastebin.com/PCUGTBqG Ok, this looks like a different issue around the changes to ata reset. The intent was to leave mvsas with its old behavior (of not waiting for resets to complete) until it could be updated to the new reset scheme. Can you send a log from a good run on 3.3? I want to see if the timings are different, because: [ 10.039490] sas: ata10: end_device-6:4: dev error handler [..] [ 10.198842] ata10.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80) [ 25.684822] sas: ata11: end_device-6:5: dev error handler [..] [ 25.844299] ata11.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, err_mask=0x80) Is shorter than I would expect. -- Dan -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-17 16:50 ` Dan Williams @ 2012-04-17 17:21 ` Tom Rini 0 siblings, 0 replies; 15+ messages in thread From: Tom Rini @ 2012-04-17 17:21 UTC (permalink / raw) To: Dan Williams; +Cc: linux-scsi, James Bottomley, Xiangliang Yu On 04/17/2012 09:50 AM, Dan Williams wrote: > On Tue, Apr 17, 2012 at 7:44 AM, Tom Rini<trini@ti.com> wrote: >> On 04/16/2012 08:25 PM, Dan Williams wrote: >>> >>> On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote: >>>>> >>>>> Thanks, yes it looks like a new regression. Can you tell me if the >>>>> following fixes it? >>>> >>>> >>>> No joy, oopses. http://alnk.org/43youngkirby for the dmesg and oops. >>>> >>> >>> Agh, sorry, I rushed that one. The phy array is initialized later, here >>> is another run at it: >> >> >> No crash, but 2 out of 4 SSDs still: http://pastebin.com/PCUGTBqG > > Ok, this looks like a different issue around the changes to ata reset. > The intent was to leave mvsas with its old behavior (of not waiting > for resets to complete) until it could be updated to the new reset > scheme. Can you send a log from a good run on 3.3? > > I want to see if the timings are different, because: > > [ 10.039490] sas: ata10: end_device-6:4: dev error handler > [..] > [ 10.198842] ata10.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, > err_mask=0x80) > > [ 25.684822] sas: ata11: end_device-6:5: dev error handler > [..] > [ 25.844299] ata11.00: failed to IDENTIFY (INIT_DEV_PARAMS failed, > err_mask=0x80) > > Is shorter than I would expect. The 3.3.0 dmesg is up at http://pastebin.com/cVTmpH1M -- Tom ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-17 3:25 ` Dan Williams 2012-04-17 14:44 ` Tom Rini @ 2012-04-17 15:47 ` James Bottomley 2012-04-17 17:17 ` Dan Williams 1 sibling, 1 reply; 15+ messages in thread From: James Bottomley @ 2012-04-17 15:47 UTC (permalink / raw) To: Dan Williams; +Cc: Tom Rini, linux-scsi, Xiangliang Yu On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote: > On Mon, 2012-04-16 at 18:53 -0700, Tom Rini wrote: > > > Thanks, yes it looks like a new regression. Can you tell me if the > > > following fixes it? > > > > No joy, oopses. http://alnk.org/43youngkirby for the dmesg and oops. > > > > Agh, sorry, I rushed that one. The phy array is initialized later, here > is another run at it: Are we sure it's initialised correctly in all the other SAS drivers that use (well, one other: aic94xx)? Given the oops issue, perhaps revert this for now and get a working patch in for the next merge window? James ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-17 15:47 ` James Bottomley @ 2012-04-17 17:17 ` Dan Williams 2012-04-18 5:11 ` Dan Williams 0 siblings, 1 reply; 15+ messages in thread From: Dan Williams @ 2012-04-17 17:17 UTC (permalink / raw) To: James Bottomley; +Cc: Tom Rini, linux-scsi, Xiangliang Yu On Tue, Apr 17, 2012 at 8:47 AM, James Bottomley <James.Bottomley@hansenpartnership.com> wrote: > On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote: >> Agh, sorry, I rushed that one. The phy array is initialized later, here >> is another run at it: > > Are we sure it's initialised correctly in all the other SAS drivers that > use (well, one other: aic94xx)? Looks like we need: diff --git a/drivers/scsi/aic94xx/aic94xx_init.c b/drivers/scsi/aic94xx/aic94xx_init.c index ff80552..830f438 100644 --- a/drivers/scsi/aic94xx/aic94xx_init.c +++ b/drivers/scsi/aic94xx/aic94xx_init.c @@ -250,6 +250,7 @@ static int __devinit asd_common_setup(struct asd_ha_struct *asd_ha) SAS_LINK_RATE_1_5_GBPS; asd_ha->hw_prof.phy_desc[i].min_sata_lrate = SAS_LINK_RATE_1_5_GBPS; + asd_ha->phys[i].sas_phy.id = i; } return 0; > Given the oops issue, perhaps revert this for now and get a working > patch in for the next merge window? I have no strong feelings either way, but aic94xx and mvsas maintainers have been hard to reach and I'm not encouraged more time will yield a different result versus just moving ahead with these fixes. That said we still have Tom's discovery regression which is a separate issue. -- Dan -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-17 17:17 ` Dan Williams @ 2012-04-18 5:11 ` Dan Williams 2012-04-18 5:37 ` James Bottomley 2012-04-18 15:23 ` Tom Rini 0 siblings, 2 replies; 15+ messages in thread From: Dan Williams @ 2012-04-18 5:11 UTC (permalink / raw) To: James Bottomley; +Cc: Tom Rini, linux-scsi, Xiangliang Yu On Tue, Apr 17, 2012 at 10:17 AM, Dan Williams <dan.j.williams@intel.com> wrote: > On Tue, Apr 17, 2012 at 8:47 AM, James Bottomley > <James.Bottomley@hansenpartnership.com> wrote: >> On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote: >>> Agh, sorry, I rushed that one. The phy array is initialized later, here >>> is another run at it: >> >> Are we sure it's initialised correctly in all the other SAS drivers that >> use (well, one other: aic94xx)? > > Looks like we need: > > diff --git a/drivers/scsi/aic94xx/aic94xx_init.c > b/drivers/scsi/aic94xx/aic94xx_init.c > index ff80552..830f438 100644 > --- a/drivers/scsi/aic94xx/aic94xx_init.c > +++ b/drivers/scsi/aic94xx/aic94xx_init.c > @@ -250,6 +250,7 @@ static int __devinit asd_common_setup(struct > asd_ha_struct *asd_ha) > SAS_LINK_RATE_1_5_GBPS; > asd_ha->hw_prof.phy_desc[i].min_sata_lrate = > SAS_LINK_RATE_1_5_GBPS; > + asd_ha->phys[i].sas_phy.id = i; > } > > return 0; > >> Given the oops issue, perhaps revert this for now and get a working >> patch in for the next merge window? > > I have no strong feelings either way, but aic94xx and mvsas > maintainers have been hard to reach and I'm not encouraged more time > will yield a different result versus just moving ahead with these > fixes. > > That said we still have Tom's discovery regression which is a separate issue. I take it back. I overlooked what Tom said at the very beginning. Everything is fixed by the revert, and I now see why my later "fix" made it worse. The fix overlooked that mvsas is indeed initializing the phy ids, but in the "multi-chip" case it does a rather annoying duplication of phy ids in the array passed to libsas. So, for example, chip0 has phy0-3 at ha phy index 0-3 and chip1 has its phy0-3 at ha phy index 4-7. So the "fix" was breaking mvsas's ability to lookup phys by id and the original commit is tripped up by mvsas's scheme of putting non-unique ids in the sas_ha_struct. Question for a later day, but why isn't mvsas creating a scsi_host per chip?? That can only help performance and is more in line with reality. To properly support commit a692b0e mvsas would either need to convert to a shost-per-chip or use a helper like: static inline struct mvs_phy *to_mvs_phy(struct mvs_info *mvi, int phy_id) { return mvi->phy[phy_id - mvi->chip->n_phy * mvi->id]; } ...everywhere it converts from a libsas phy id back to a local phy structure. Both of these options are way too big for 3.4-rc4, so I'll queue up the revert with a Reported-by and Tested-by from Tom. Thanks again for the report and help Tom. -- Dan -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-18 5:11 ` Dan Williams @ 2012-04-18 5:37 ` James Bottomley 2012-04-18 15:23 ` Tom Rini 1 sibling, 0 replies; 15+ messages in thread From: James Bottomley @ 2012-04-18 5:37 UTC (permalink / raw) To: Dan Williams; +Cc: Tom Rini, linux-scsi, Xiangliang Yu On Tue, 2012-04-17 at 22:11 -0700, Dan Williams wrote: > On Tue, Apr 17, 2012 at 10:17 AM, Dan Williams <dan.j.williams@intel.com> wrote: > > On Tue, Apr 17, 2012 at 8:47 AM, James Bottomley > > <James.Bottomley@hansenpartnership.com> wrote: > >> On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote: > >>> Agh, sorry, I rushed that one. The phy array is initialized later, here > >>> is another run at it: > >> > >> Are we sure it's initialised correctly in all the other SAS drivers that > >> use (well, one other: aic94xx)? > > > > Looks like we need: > > > > diff --git a/drivers/scsi/aic94xx/aic94xx_init.c > > b/drivers/scsi/aic94xx/aic94xx_init.c > > index ff80552..830f438 100644 > > --- a/drivers/scsi/aic94xx/aic94xx_init.c > > +++ b/drivers/scsi/aic94xx/aic94xx_init.c > > @@ -250,6 +250,7 @@ static int __devinit asd_common_setup(struct > > asd_ha_struct *asd_ha) > > SAS_LINK_RATE_1_5_GBPS; > > asd_ha->hw_prof.phy_desc[i].min_sata_lrate = > > SAS_LINK_RATE_1_5_GBPS; > > + asd_ha->phys[i].sas_phy.id = i; > > } > > > > return 0; > > > >> Given the oops issue, perhaps revert this for now and get a working > >> patch in for the next merge window? > > > > I have no strong feelings either way, but aic94xx and mvsas > > maintainers have been hard to reach and I'm not encouraged more time > > will yield a different result versus just moving ahead with these > > fixes. > > > > That said we still have Tom's discovery regression which is a separate issue. > > I take it back. > > I overlooked what Tom said at the very beginning. Everything is fixed > by the revert, and I now see why my later "fix" made it worse. The > fix overlooked that mvsas is indeed initializing the phy ids, but in > the "multi-chip" case it does a rather annoying duplication of phy ids > in the array passed to libsas. So, for example, chip0 has phy0-3 at > ha phy index 0-3 and chip1 has its phy0-3 at ha phy index 4-7. So the > "fix" was breaking mvsas's ability to lookup phys by id and the > original commit is tripped up by mvsas's scheme of putting non-unique > ids in the sas_ha_struct. > > Question for a later day, but why isn't mvsas creating a scsi_host per > chip?? That can only help performance and is more in line with > reality. > > To properly support commit a692b0e mvsas would either need to convert > to a shost-per-chip or use a helper like: > > static inline struct mvs_phy *to_mvs_phy(struct mvs_info *mvi, int phy_id) > { > return mvi->phy[phy_id - mvi->chip->n_phy * mvi->id]; > } > > ...everywhere it converts from a libsas phy id back to a local phy structure. > > Both of these options are way too big for 3.4-rc4, so I'll queue up > the revert with a Reported-by and Tested-by from Tom. > > Thanks again for the report and help Tom. Actually, no need ... I can add a revert to the fixes branch (when I manage to get it built). James ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-18 5:11 ` Dan Williams 2012-04-18 5:37 ` James Bottomley @ 2012-04-18 15:23 ` Tom Rini 2012-04-18 15:46 ` Dan Williams 1 sibling, 1 reply; 15+ messages in thread From: Tom Rini @ 2012-04-18 15:23 UTC (permalink / raw) To: Dan Williams; +Cc: James Bottomley, linux-scsi, Xiangliang Yu On 04/17/2012 10:11 PM, Dan Williams wrote: > On Tue, Apr 17, 2012 at 10:17 AM, Dan Williams<dan.j.williams@intel.com> wrote: >> On Tue, Apr 17, 2012 at 8:47 AM, James Bottomley >> <James.Bottomley@hansenpartnership.com> wrote: >>> On Mon, 2012-04-16 at 20:25 -0700, Dan Williams wrote: >>>> Agh, sorry, I rushed that one. The phy array is initialized later, here >>>> is another run at it: >>> >>> Are we sure it's initialised correctly in all the other SAS drivers that >>> use (well, one other: aic94xx)? >> >> Looks like we need: >> >> diff --git a/drivers/scsi/aic94xx/aic94xx_init.c >> b/drivers/scsi/aic94xx/aic94xx_init.c >> index ff80552..830f438 100644 >> --- a/drivers/scsi/aic94xx/aic94xx_init.c >> +++ b/drivers/scsi/aic94xx/aic94xx_init.c >> @@ -250,6 +250,7 @@ static int __devinit asd_common_setup(struct >> asd_ha_struct *asd_ha) >> SAS_LINK_RATE_1_5_GBPS; >> asd_ha->hw_prof.phy_desc[i].min_sata_lrate = >> SAS_LINK_RATE_1_5_GBPS; >> + asd_ha->phys[i].sas_phy.id = i; >> } >> >> return 0; >> >>> Given the oops issue, perhaps revert this for now and get a working >>> patch in for the next merge window? >> >> I have no strong feelings either way, but aic94xx and mvsas >> maintainers have been hard to reach and I'm not encouraged more time >> will yield a different result versus just moving ahead with these >> fixes. >> >> That said we still have Tom's discovery regression which is a separate issue. > > I take it back. > > I overlooked what Tom said at the very beginning. Everything is fixed > by the revert, and I now see why my later "fix" made it worse. The > fix overlooked that mvsas is indeed initializing the phy ids, but in > the "multi-chip" case it does a rather annoying duplication of phy ids > in the array passed to libsas. So, for example, chip0 has phy0-3 at > ha phy index 0-3 and chip1 has its phy0-3 at ha phy index 4-7. So the > "fix" was breaking mvsas's ability to lookup phys by id and the > original commit is tripped up by mvsas's scheme of putting non-unique > ids in the sas_ha_struct. > > Question for a later day, but why isn't mvsas creating a scsi_host per > chip?? That can only help performance and is more in line with > reality. Performance isn't in-line with what I would expect (in playing with fio), so if you want to whip something up, or point me some changesets / other drivers I can take a stab here. Thanks! -- Tom ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Commit a692b0e broke my mvsas card 2012-04-18 15:23 ` Tom Rini @ 2012-04-18 15:46 ` Dan Williams 0 siblings, 0 replies; 15+ messages in thread From: Dan Williams @ 2012-04-18 15:46 UTC (permalink / raw) To: Tom Rini; +Cc: James Bottomley, linux-scsi, Xiangliang Yu On Wed, Apr 18, 2012 at 8:23 AM, Tom Rini <trini@ti.com> wrote: > Performance isn't in-line with what I would expect (in playing with fio), so > if you want to whip something up, or point me some changesets / other > drivers I can take a stab here. Thanks! > The thrust is to change this loop in mvs_pci_init: do { mvi = mvs_pci_alloc(pdev, ent, shost, nhost); if (!mvi) { rc = -ENOMEM; goto err_out_regions; } memset(&mvi->hba_info_param, 0xFF, sizeof(struct hba_info_page)); mvs_init_sas_add(mvi); mvi->instance = nhost; rc = MVS_CHIP_DISP->chip_init(mvi); if (rc) { mvs_free(mvi); goto err_out_regions; } nhost++; } while (nhost < chip->n_host); ...to assume that mvs_pci_alloc will do the allocation of a scsi_host per chip, rather than passing in a global host. Then would need to clean up warts like the following in mvs_phy_control: while (sha->sas_phy[i]) { if (sha->sas_phy[i] == sas_phy) break; i++; } ...since this model means that the sas_phy->id can be used directly to look up the local mvsas phy. I might circle back to this once 3.4 finalizes (since the random ids libsas uses for identifying local ports is onerous), but if someone else beats me to it I wouldn't mind ;-). -- Dan -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2012-04-18 15:46 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-04-16 22:42 Commit a692b0e broke my mvsas card Tom Rini 2012-04-16 23:12 ` Dan Williams 2012-04-16 23:34 ` Tom Rini 2012-04-16 23:59 ` Dan Williams 2012-04-17 1:53 ` Tom Rini 2012-04-17 3:25 ` Dan Williams 2012-04-17 14:44 ` Tom Rini 2012-04-17 16:50 ` Dan Williams 2012-04-17 17:21 ` Tom Rini 2012-04-17 15:47 ` James Bottomley 2012-04-17 17:17 ` Dan Williams 2012-04-18 5:11 ` Dan Williams 2012-04-18 5:37 ` James Bottomley 2012-04-18 15:23 ` Tom Rini 2012-04-18 15:46 ` Dan Williams
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox