* mpt fusion driver performance issue in 2.6.14-rc2
@ 2005-09-29 18:59 Chen, Kenneth W
0 siblings, 0 replies; 7+ messages in thread
From: Chen, Kenneth W @ 2005-09-29 18:59 UTC (permalink / raw)
To: linux-scsi; +Cc: 'Moore, Eric Dean'
Something happened in between kernel 2.6.12 and 2.6.14-rc2, where
disk performance went 20X slower on the latest release kernel. I
suspect it has something to do with the fusion driver. This showed
up in the boot log: "mptscsih: ioc0: DV: Release failed." Is it
significant?
2.6.12
[root]# hdparm -t /dev/sdc
Timing buffered disk reads: 174 MB in 3.03 seconds = 57.44 MB/sec
2.6.14-rc2
[root]# hdparm -t /dev/sdc
Timing buffered disk reads: 8 MB in 3.16 seconds = 2.53 MB/sec
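For reference, the two hdparm figures above work out to slightly more than the quoted "20X". A minimal sketch of the arithmetic follows; `slowdown_factor` is a hypothetical helper introduced only for illustration and is not part of hdparm or the driver.

```c
#include <assert.h>

/*
 * Ratio of two throughput figures, e.g. the MB/sec values printed by
 * "hdparm -t". Hypothetical helper, for illustration only.
 */
double slowdown_factor(double baseline_mb_s, double regressed_mb_s)
{
    assert(regressed_mb_s > 0.0);   /* avoid dividing by zero */
    return baseline_mb_s / regressed_mb_s;
}
```

Called as `slowdown_factor(57.44, 2.53)`, this gives a factor between 22 and 23.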
Partial boot log with 2.6.14-rc2:
Fusion MPT base driver 3.03.02
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.02
GSI 28 (level, low) -> CPU 0 (0xc618) vector 49
ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 28 (level, low) -> IRQ 49
mptbase: Initiating ioc0 bringup
ioc0: 53C1030: Capabilities={Initiator}
scsi0 : ioc0: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=255, IRQ=49
Vendor: MAXTOR Model: ATLAS10K3_18_SCA Rev: B000
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sda: 35916548 512-byte hdwr sectors (18389 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 35916548 512-byte hdwr sectors (18389 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: QUANTUM Model: ATLAS IV 9 SCA Rev: 0B0B
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdb: 17942584 512-byte hdwr sectors (9187 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 17942584 512-byte hdwr sectors (9187 MB)
SCSI device sdb: drive cache: write back
sdb: sdb1
Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
Vendor: SEAGATE Model: ST336752LC Rev: 0004
Type: Direct-Access ANSI SCSI revision: 03
SCSI device sdc: 71687369 512-byte hdwr sectors (36704 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 71687369 512-byte hdwr sectors (36704 MB)
SCSI device sdc: drive cache: write back
sdc: sdc1 sdc2
Attached scsi disk sdc at scsi0, channel 0, id 2, lun 0
Vendor: ESG-SHV Model: SCA HSBP M17 Rev: 1.0D
Type: Processor ANSI SCSI revision: 02
GSI 29 (level, low) -> CPU 3 (0xc018) vector 50
ACPI: PCI Interrupt 0000:06:02.1[B] -> GSI 29 (level, low) -> IRQ 50
mptbase: Initiating ioc1 bringup
mptscsih: ioc0: DV: Release failed. id 0<6>ioc1: 53C1030: Capabilities={Initiator}
scsi1 : ioc1: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=255, IRQ=50
* RE: mpt fusion driver performance issue in 2.6.14-rc2
[not found] <B05667366EE6204181EABE9C1B1C0EB50484F7D4@scsmsx401.amr.corp.intel.com>
@ 2005-10-01 0:27 ` Chen, Kenneth W
2005-10-01 6:19 ` Moore, Eric Moore
0 siblings, 1 reply; 7+ messages in thread
From: Chen, Kenneth W @ 2005-10-01 0:27 UTC (permalink / raw)
To: linux-scsi; +Cc: 'Moore, Eric Dean', 'James Bottomley'
Chen, Kenneth W wrote on Thursday, September 29, 2005 11:59 AM
> Something happened in between kernel 2.6.12 and 2.6.14-rc2, where
> disk performance went 20X slower on the latest release kernel. I
> suspect it has something to do with the fusion driver. This showed
> up in the boot log: "mptscsih: ioc0: DV: Release failed." Is it
> significant?
I think the bug is real, and it is in the mpt fusion driver. I'm
not an expert on the LSI53C1030 host controller, and I won't pretend to be
one, but I have data that shows what is going on:
There are two threads during driver initialization. One does domain
validation (mptscsih_domainValidation) and one does host controller
initialization (mptspi_probe). During bringup of the 2nd host controller,
i.e., ioc1, the probe thread temporarily disables the first channel (ioc0).
However, DV may still be in progress on ioc0 in the other thread (possibly
running on another CPU). Disabling ioc0 while domain validation is in
progress causes all subsequent DV commands to fail, which leaves almost
all disks still pending DV at the lowest possible performance setting.
Here is the fix that I propose: for the period that ioc0 needs to be
disabled while bringing up ioc1, ioc->active is marked with a special
value (-1), and the DV thread polls that value, sleeping one tick between
checks. This keeps the mptspi_probe thread from clashing with the DV
thread and breaking DV.
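The proposed three-state flag can be modelled in user space. This is a sketch under stated assumptions, not driver code: the `model_*` and `IOC_*` names are hypothetical, `model_sleep_tick()` stands in for `schedule_timeout_uninterruptible(1)`, and in this single-threaded model the same hook also plays the part of the probe thread re-enabling the IOC after a few ticks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three values the patch gives ioc->active. */
enum { IOC_PAUSED = -1, IOC_INACTIVE = 0, IOC_ACTIVE = 1 };

struct model_ioc {
    int active;           /* models ioc->active */
    int frame;            /* stands in for one request frame on the FreeQ */
    int ticks_to_resume;  /* model only: ticks until the IOC is re-enabled */
    int ticks_slept;      /* model only: how long the caller waited */
};

/*
 * Stand-in for schedule_timeout_uninterruptible(1). In this model it also
 * acts as the probe thread: after ticks_to_resume ticks the IOC is active.
 */
static void model_sleep_tick(struct model_ioc *ioc)
{
    ioc->ticks_slept++;
    if (ioc->ticks_slept >= ioc->ticks_to_resume)
        ioc->active = IOC_ACTIVE;
}

/* Pre-patch check: a temporarily disabled IOC (active == 0) yields no frame. */
int *get_frame_old(struct model_ioc *ioc)
{
    assert(ioc != NULL);
    if (!ioc->active)
        return NULL;
    return &ioc->frame;
}

/* Patched check: wait out IOC_PAUSED, but still fail on IOC_INACTIVE. */
int *get_frame_new(struct model_ioc *ioc)
{
    assert(ioc != NULL);
    if (ioc->active <= 0) {
        while (ioc->active == IOC_PAUSED)
            model_sleep_tick(ioc);
        if (!ioc->active)
            return NULL;
    }
    return &ioc->frame;
}
```

With `active` set to `IOC_INACTIVE`, both versions return NULL, which is the failure DV sees today. With `active` set to `IOC_PAUSED`, `get_frame_new()` sleeps until the flag changes and then hands out the frame. Note that the loop has no upper bound on how long it waits.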
With the patch, all disks perform as expected, and the
"mptscsih: ioc0: DV: Release failed" error message is gone.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
--- ./drivers/message/fusion/mptbase.c.orig 2005-09-30 17:16:16.051906000 -0700
+++ ./drivers/message/fusion/mptbase.c 2005-09-30 17:18:50.811670000 -0700
@@ -740,8 +740,12 @@ mpt_get_msg_frame(int handle, MPT_ADAPTE
 #endif

 	/* If interrupts are not attached, do not return a request frame */
-	if (!ioc->active)
-		return NULL;
+	if (ioc->active <= 0) {
+		while (ioc->active == -1)
+			schedule_timeout_uninterruptible(1);
+		if (!ioc->active)
+			return NULL;
+	}

 	spin_lock_irqsave(&ioc->FreeQlock, flags);
 	if (!list_empty(&ioc->FreeQ)) {
@@ -1495,7 +1499,7 @@ mpt_do_ioc_recovery(MPT_ADAPTER *ioc, u3

 		/* Disable alt-IOC's reply interrupts (and FreeQ) for a bit ... */
 		CHIPREG_WRITE32(&ioc->alt_ioc->chip->IntMask, 0xFFFFFFFF);
-		ioc->alt_ioc->active = 0;
+		ioc->alt_ioc->active = -1;
 	}

 	hard = 1;
* Re: mpt fusion driver performance issue in 2.6.14-rc2
2005-10-01 0:27 ` mpt fusion driver performance issue in 2.6.14-rc2 Chen, Kenneth W
@ 2005-10-01 6:19 ` Moore, Eric Moore
2005-10-12 1:10 ` Chen, Kenneth W
0 siblings, 1 reply; 7+ messages in thread
From: Moore, Eric Moore @ 2005-10-01 6:19 UTC (permalink / raw)
To: Chen, Kenneth W, linux-scsi; +Cc: 'James Bottomley'
On Friday, September 30, 2005 6:27 PM, Chen, Kenneth W wrote:
> Chen, Kenneth W wrote on Thursday, September 29, 2005 11:59 AM
>> Something happened in between kernel 2.6.12 and 2.6.14-rc2, where
>> disk performance went 20X slower on the latest release kernel. I
>> suspect it has something to do with the fusion driver. This showed
>> up in the boot log: "mptscsih: ioc0: DV: Release failed." Is it
>> significant?
>
> I think the bug is real, and it is in the mpt fusion driver. I'm
> not an expert on the LSI53C1030 host controller, and I won't pretend to be
> one, but I have data that shows what is going on:
>
> There are two threads during driver initialization. One does domain
> validation (mptscsih_domainValidation) and one does host controller
> initialization (mptspi_probe). During bringup of the 2nd host controller,
> i.e., ioc1, the probe thread temporarily disables the first channel (ioc0).
> However, DV may still be in progress on ioc0 in the other thread (possibly
> running on another CPU). Disabling ioc0 while domain validation is in
> progress causes all subsequent DV commands to fail, which leaves almost
> all disks still pending DV at the lowest possible performance setting.
>
> Here is the fix that I propose: for the period that ioc0 needs to be
> disabled while bringing up ioc1, ioc->active is marked with a special
> value (-1), and the DV thread polls that value, sleeping one tick between
> checks. This keeps the mptspi_probe thread from clashing with the DV
> thread and breaking DV.
>
> With the patch, all disks perform as expected, and the
> "mptscsih: ioc0: DV: Release failed" error message is gone.
>
Thanks for your findings on this.
I'm moving across town, and will not be in the office, nor have
internet access for the next couple days. I will look into this when I
return.
Best regards,
Eric Moore
* RE: mpt fusion driver performance issue in 2.6.14-rc2
2005-10-01 6:19 ` Moore, Eric Moore
@ 2005-10-12 1:10 ` Chen, Kenneth W
0 siblings, 0 replies; 7+ messages in thread
From: Chen, Kenneth W @ 2005-10-12 1:10 UTC (permalink / raw)
To: 'Moore, Eric Moore', linux-scsi; +Cc: 'James Bottomley'
On Friday, September 30, 2005 6:27 PM, Chen, Kenneth W wrote:
> Chen, Kenneth W wrote on Thursday, September 29, 2005 11:59 AM
>> Something happened in between kernel 2.6.12 and 2.6.14-rc2, where
>> disk performance went 20X slower on the latest release kernel. I
>> suspect it has something to do with the fusion driver. This showed
>> up in the boot log: "mptscsih: ioc0: DV: Release failed." Is it
>> significant?
>
> I think the bug is real, and it is in the mpt fusion driver. I'm
> not an expert on the LSI53C1030 host controller, and I won't pretend to be
> one, but I have data that shows what is going on:
>
> There are two threads during driver initialization. One does domain
> validation (mptscsih_domainValidation) and one does host controller
> initialization (mptspi_probe). During bringup of the 2nd host controller,
> i.e., ioc1, the probe thread temporarily disables the first channel (ioc0).
> However, DV may still be in progress on ioc0 in the other thread (possibly
> running on another CPU). Disabling ioc0 while domain validation is in
> progress causes all subsequent DV commands to fail, which leaves almost
> all disks still pending DV at the lowest possible performance setting.
>
> Here is the fix that I propose: for the period that ioc0 needs to be
> disabled while bringing up ioc1, ioc->active is marked with a special
> value (-1), and the DV thread polls that value, sleeping one tick between
> checks. This keeps the mptspi_probe thread from clashing with the DV
> thread and breaking DV.
>
> With the patch, all disks perform as expected, and the
> "mptscsih: ioc0: DV: Release failed" error message is gone.
>
Moore, Eric Moore wrote on Friday, September 30, 2005 11:20 PM
> Thanks for your findings on this.
>
> I'm moving across town, and will not be in the office, nor have
> internet access for the next couple days. I will look into this when
> I return.
Is there a time frame for looking into this and getting the bug
fixed? If there are no objections, please apply.
Simple "hdparm -t" shows:
vanilla 2.6.14-rc3:
Timing buffered disk reads: 8 MB in 3.16 seconds = 2.53 MB/sec
2.6.14-rc3 + patch:
Timing buffered disk reads: 174 MB in 3.02 seconds = 57.55 MB/sec
---
Fix race between domain validation and host controller initialization
for mpt fusion driver.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
--- ./drivers/message/fusion/mptbase.c.orig 2005-09-30 17:16:16.051906000 -0700
+++ ./drivers/message/fusion/mptbase.c 2005-09-30 17:18:50.811670000 -0700
@@ -740,8 +740,12 @@ mpt_get_msg_frame(int handle, MPT_ADAPTE
 #endif

 	/* If interrupts are not attached, do not return a request frame */
-	if (!ioc->active)
-		return NULL;
+	if (ioc->active <= 0) {
+		while (ioc->active == -1)
+			schedule_timeout_uninterruptible(1);
+		if (!ioc->active)
+			return NULL;
+	}

 	spin_lock_irqsave(&ioc->FreeQlock, flags);
 	if (!list_empty(&ioc->FreeQ)) {
@@ -1495,7 +1499,7 @@ mpt_do_ioc_recovery(MPT_ADAPTER *ioc, u3

 		/* Disable alt-IOC's reply interrupts (and FreeQ) for a bit ... */
 		CHIPREG_WRITE32(&ioc->alt_ioc->chip->IntMask, 0xFFFFFFFF);
-		ioc->alt_ioc->active = 0;
+		ioc->alt_ioc->active = -1;
 	}

 	hard = 1;
* RE: mpt fusion driver performance issue in 2.6.14-rc2
@ 2005-10-12 20:57 Moore, Eric Dean
2005-10-12 21:05 ` Chen, Kenneth W
0 siblings, 1 reply; 7+ messages in thread
From: Moore, Eric Dean @ 2005-10-12 20:57 UTC (permalink / raw)
To: Chen, Kenneth W, linux-scsi; +Cc: Moore, Eric Dean
On Thursday, September 29, 2005 12:59 PM, Chen, Kenneth W wrote:
>
> Something happened in between kernel 2.6.12 and 2.6.14-rc2, where
> disk performance went 20X slower on the latest release kernel. I
> suspect it has something to do with the fusion driver. This showed
> up in the boot log: "mptscsih: ioc0: DV: Release failed." Is it
> significant?
>
>
There have been several posts circulating about the "release failed"
issue. I thought I saw an email recently from someone at Intel
who had root-caused the problem. I've been trying all day
to find that email. I believe it was on this list or on linux-kernel.
Have you seen it? We need to root-cause that; however,
I'm very much buried in SAS.
Anyway, the plan is to eventually move to the SPI transport. Have you seen
this post by James?
http://marc.theaimsgroup.com/?l=linux-scsi&m=112879629201766&w=2
I have it on my todo list, and plan to review these SPI updates
sometime this week. Perhaps you can try this patch out and let us
(and James) know whether it solves the problem.
Eric Moore
* RE: mpt fusion driver performance issue in 2.6.14-rc2
2005-10-12 20:57 Moore, Eric Dean
@ 2005-10-12 21:05 ` Chen, Kenneth W
0 siblings, 0 replies; 7+ messages in thread
From: Chen, Kenneth W @ 2005-10-12 21:05 UTC (permalink / raw)
To: 'Moore, Eric Dean', linux-scsi
Moore, Eric Dean wrote on Wednesday, October 12, 2005 1:58 PM
> There have been several posts circulating about the "release failed"
> issue. I thought I saw an email recently from someone at Intel
> who had root-caused the problem. I've been trying all day
> to find that email. I believe it was on this list or on linux-kernel.
> Have you seen it?
Are you referring to this thread:
http://marc.theaimsgroup.com/?t=112802043200007&r=1&w=2
If it is, then I was the one who originally posted the patch.
> Anyway, the plan is to eventually move to the SPI transport. Have you seen
> this post by James?
> http://marc.theaimsgroup.com/?l=linux-scsi&m=112879629201766&w=2
>
> I have it on my todo list, and plan to review these SPI updates
> sometime this week. Perhaps you can try this patch out and let us
> (and James) know whether it solves the problem.
I suppose I can just run that patch through our benchmark setup.
- Ken
* RE: mpt fusion driver performance issue in 2.6.14-rc2
@ 2005-10-14 0:07 Moore, Eric Dean
0 siblings, 0 replies; 7+ messages in thread
From: Moore, Eric Dean @ 2005-10-14 0:07 UTC (permalink / raw)
To: Chen, Kenneth W, linux-scsi
Cc: Brauer, Jonathan, Roy Wade, Pope, Steve, Maloy, Joe,
Unrein, Jason
On Wednesday, October 12, 2005 3:05 PM, Chen, Kenneth W wrote:
>
> Moore, Eric Dean wrote on Wednesday, October 12, 2005 1:58 PM
> > There have been several posts circulating about the "release failed"
> > issue. I thought I saw an email recently from someone at Intel
> > who had root-caused the problem. I've been trying all day
> > to find that email. I believe it was on this list or on linux-kernel.
> > Have you seen it?
>
> Are you referring to this thread:
> http://marc.theaimsgroup.com/?t=112802043200007&r=1&w=2
>
> If it is, then I was the one who originally posted the patch.
>
Sorry, I remember that email, but forgot that it was from you.
It's been a very busy time at the office.
Thanks for your detailed description; it explains why
DV would fail for ioc0 when ioc1 is brought up. I'm not sure
whether it's a good idea to go to sleep from mpt_get_msg_frame().
I will have to check the code to ensure interrupts are not
disabled when it is called. I would almost prefer going to sleep
from the mptscsih_do_cmd() function, which is called during DV when
interrupts are enabled, and where we probably have the information
to know whether the IOCs are in the reset stage.
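The alternative described above can be sketched in user space: the frame allocator stays non-blocking, and only the DV command path, which is allowed to sleep, waits while the IOC is in reset. This is a model under stated assumptions, not driver code. All `model_*` names and fields are hypothetical, `model_sleep_tick()` stands in for a one-tick sleep and also simulates the reset finishing, and the `max_ticks` bound is an addition that nobody in this thread proposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ioc {
    bool in_reset;    /* model of "this IOC is temporarily disabled" */
    bool active;      /* model of ioc->active as a plain on/off flag */
    int  frame;       /* stands in for one request frame */
    int  ticks_left;  /* model only: ticks until the reset completes */
};

/* Non-blocking allocator: never sleeps, so it is safe with interrupts off. */
static int *model_get_frame(struct model_ioc *ioc)
{
    return ioc->active ? &ioc->frame : NULL;
}

/* Stand-in for a one-tick sleep; also simulates the reset finishing. */
static void model_sleep_tick(struct model_ioc *ioc)
{
    if (ioc->ticks_left > 0 && --ioc->ticks_left == 0) {
        ioc->in_reset = false;
        ioc->active = true;
    }
}

/*
 * Model of a DV-side caller that is allowed to sleep: wait (bounded by
 * max_ticks) for the reset to finish, then ask for a frame.
 * Returns 0 if a frame was obtained, -1 otherwise.
 */
int model_do_cmd(struct model_ioc *ioc, int max_ticks)
{
    assert(ioc != NULL);
    for (int t = 0; ioc->in_reset && t < max_ticks; t++)
        model_sleep_tick(ioc);
    return model_get_frame(ioc) ? 0 : -1;
}
```

Keeping the wait in the caller means the allocator's behaviour is unchanged for every other call site, and the bound turns a reset that never completes into a failed command instead of a hung thread.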
Part of my problem is that I'm not able to reproduce this issue here in
my office. We can take this offline with our FAE, John Brauer,
whom I copied on this email. He is in the Portland area, and if
you're there, perhaps he can drop by your office.
Eric Moore
LSI Logic