* [PATCH] 2.6.34-rc3 Disable R_OK (Early ACK) on SII 3726/4726 PMP
@ 2010-04-14 0:46 Grant Grundler
2010-04-15 0:31 ` Tejun Heo
0 siblings, 1 reply; 3+ messages in thread
From: Grant Grundler @ 2010-04-14 0:46 UTC (permalink / raw)
To: Linux IDE mailing list; +Cc: Jeff Garzik, Tejun Heo, LKML
[-- Attachment #1: Type: text/plain, Size: 3909 bytes --]
In 2009, While running "cache read" performance test of drives behind
SII PMP we encountered a "all 5 drives" timeout on more than 30% of the
machines under test. This patch reduces the rate by a factor of about 70.
Low enough that we didn't care to further investigate the issue.
Performance impact with any sort of "normal" use was ~2%+ CPU and less
than 1% throughput degradation. Worst case impact (cached read) was
6% IOPS reduction. This is with NCQ off (q=1) but I believe FIS based
switching enabled in the SATA driver.
The patch disables "Early ACK" in the 3726/4726 port multiplier.
"Early ACK" is issued when device sends a FIS to the host (via PMP)
and the PMP sends an ACK immediately back to the device - well before
the host gets the response. Under worst case IOPs load (cached read
test) and more than 2 PMPs connected to a 4-port SATA controller,
I suspect the time to service all of the PMPs is exceeding the PMPs
ability to keep track of outstanding FIS it owes the Host. Reducing
the number of PMPs to 2 (or 1) reduces the frequency by several orders
of magnitude. Kudos to Gwendal for initial debugging of this issue.
[Any errors in the description are mine, not his.]
Patch is currently in production on Google servers.
Signed-off-by: Grant Grundler <grundler@google.com>
Signed-off-by: Gwendal Grignou <gwendal@google.com>
---
And my apologies in advance for using gmail. Patch appended below
is also attached since I expect gmail will mangle the inlined content.
diff --git a/drivers/ata/libata-pmp.c b/drivers/ata/libata-pmp.c
index 00305f4..69bb203 100644
--- a/drivers/ata/libata-pmp.c
+++ b/drivers/ata/libata-pmp.c
@@ -235,6 +235,8 @@ static int sata_pmp_configure(struct ata_device
*dev, int print_info)
{
struct ata_port *ap = dev->link->ap;
u32 *gscr = dev->gscr;
+ u16 vendor = sata_pmp_gscr_vendor(gscr);
+ u16 devid = sata_pmp_gscr_devid(gscr);
unsigned int err_mask = 0;
const char *reason;
int nr_ports, rc;
@@ -260,12 +262,28 @@ static int sata_pmp_configure(struct ata_device
*dev, int print_info)
goto fail;
}
+ /* For Sil3726, disable sending Early R_OK */
+ if (vendor == 0x1095 && devid == 0x3726) {
+ u32 reg;
+ err_mask = sata_pmp_read(&ap->link,
SATA_PMP_GSCR_SII_POL, ®);
+ if (err_mask) {
+ rc = -EIO;
+ reason = "failed to read Sil3726 Private Register";
+ goto fail;
+ }
+ reg &= ~0x1;
+ err_mask = sata_pmp_write(&ap->link,
SATA_PMP_GSCR_SII_POL, reg);
+ if (err_mask) {
+ rc = -EIO;
+ reason = "failed to write Sil3726 Private Register";
+ goto fail;
+ }
+ }
+
if (print_info) {
ata_dev_printk(dev, KERN_INFO, "Port Multiplier %s, "
"0x%04x:0x%04x r%d, %d ports, feat 0x%x/0x%x\n",
- sata_pmp_spec_rev_str(gscr),
- sata_pmp_gscr_vendor(gscr),
- sata_pmp_gscr_devid(gscr),
+ sata_pmp_spec_rev_str(gscr), vendor, devid,
sata_pmp_gscr_rev(gscr),
nr_ports, gscr[SATA_PMP_GSCR_FEAT_EN],
gscr[SATA_PMP_GSCR_FEAT]);
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 700c5b9..18fc208 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -386,6 +386,7 @@ enum {
SATA_PMP_GSCR_ERROR_EN = 33,
SATA_PMP_GSCR_FEAT = 64,
SATA_PMP_GSCR_FEAT_EN = 96,
+ SATA_PMP_GSCR_SII_POL = 129, /* VENDOR : Silicon Image 3726/4726 */
SATA_PMP_PSCR_STATUS = 0,
SATA_PMP_PSCR_ERROR = 1,
[-- Attachment #2: 2.6.18-mv_sata-measure_port_handler-03 --]
[-- Type: application/octet-stream, Size: 4937 bytes --]
==== //depot/google_vendor_src_branch/prodkernel/2.6.18/drivers/scsi/mv_sata/mvIALCommonUtils.c#6 - /usr/local/google/ggg-2.6.18/2.6.18/drivers/scsi/mv_sata/mvIALCommonUtils.c ====
--- /tmp/g4-24719/cache/depot/google_vendor_src_branch/prodkernel/2.6.18/drivers/scsi/mv_sata/mvIALCommonUtils.c#6 2009-07-10 12:08:45.000000000 -0700
+++ /usr/local/google/ggg-2.6.18/2.6.18/drivers/scsi/mv_sata/mvIALCommonUtils.c 2009-07-10 12:08:34.000000000 -0700
@@ -966,6 +966,7 @@ MV_BOOLEAN mvGetPMDeviceInfo(MV_SATA_AD
if (pPMDeviceInfo->vendorId == 0x1095 &&
(pPMDeviceInfo->deviceId == 0x3726 ||
pPMDeviceInfo->deviceId == 0x4726)) {
+#if 0
/*
* Bug 1411341: "CommandAbort in mv_sata during PHY stress test"
* Clear "Immediate ACK enable" bit so frame ACKs are returned
@@ -977,6 +978,7 @@ MV_BOOLEAN mvGetPMDeviceInfo(MV_SATA_AD
regVal &= ~MV_BIT0;
mvPMDevWriteReg(pSataAdapter, channelIndex, MV_SATA_PM_CONTROL_PORT,
129, regVal, NULL);
+#endif
regVal = 5;
pm_gpio_reg = PM_GSCR_SIL_GPIO;
==== //depot/google_vendor_src_branch/prodkernel/2.6.18/drivers/scsi/mv_sata/mvSata.c#19 - /usr/local/google/ggg-2.6.18/2.6.18/drivers/scsi/mv_sata/mvSata.c ====
--- /tmp/g4-24719/cache/depot/google_vendor_src_branch/prodkernel/2.6.18/drivers/scsi/mv_sata/mvSata.c#19 2009-07-10 10:25:34.000000000 -0700
+++ /usr/local/google/ggg-2.6.18/2.6.18/drivers/scsi/mv_sata/mvSata.c 2009-07-16 22:12:57.000000000 -0700
@@ -7779,6 +7779,11 @@ MV_BOOLEAN mvSataCheckPendingInterrupt(M
/*bogus interrupt*/
return MV_FALSE;
}
+
+#if 1
+#define CPU_MHZ 2200 /* Argo-Barcelona */
+#endif
+
/*******************************************************************************
* mvSataInterruptServiceRoutine - Interrupt service routine
*
@@ -7800,7 +7805,11 @@ MV_BOOLEAN mvSataInterruptServiceRoutine
{
MV_U8 sataUnit;
MV_BUS_ADDR_T ioBaseAddr = pAdapter->adapterIoBaseAddress;
-
+#if 1
+ cycles_t mv_gap, mv_stamp[5];
+ static cycles_t mv_max[5];
+ unsigned char mv_func[5];
+#endif
mvOsSemTake(&pAdapter->semaphore);
/* Prevent any new interrupts until we are done processing current one.
@@ -7817,7 +7826,17 @@ MV_BOOLEAN mvSataInterruptServiceRoutine
MV_SATA_II_ALL_PORTS_INT_CAUSE_REG_OFFSET, 0);
}
+#if 1
+mv_stamp[0] = get_cycles_sync();
+mv_func[0] = 0;
+mv_func[1] = 0;
+mv_func[2] = 0;
+mv_func[3] = 0;
+mv_func[4] = 0;
+sataUnit = 0;
+#else
for (sataUnit = 0; sataUnit < pAdapter->numberOfUnits; sataUnit++)
+#endif
{
MV_U32 unitCause;
MV_U32 unitMask = pAdapter->mainMask;
@@ -7838,7 +7857,7 @@ MV_BOOLEAN mvSataInterruptServiceRoutine
* (This also flushes the previous MMIO write.) */
unitCause = MV_REG_READ_DWORD(ioBaseAddr, unitCauseAddr);
- /* derive SATAHC0 Intr Cause mask from mainMask.
+ /* Derive SATAHC0 Intr Cause mask from mainMask.
* typical value of mainMask is 0x0006ab55 .
* FIXME - move this out of line to same places mainMask is set.
*/
@@ -7885,6 +7904,9 @@ MV_BOOLEAN mvSataInterruptServiceRoutine
MV_U32 ATA_intr = (unitCause >> 8) & portMask;
if (responseDone) {
+#if 1
+ mv_func[port] |= 0x10;
+#endif
handleEdmaInterrupt(pAdapter, sataUnit, port,
(MV_U8)(unitRspInPtr),
responseDone);
@@ -7901,14 +7923,52 @@ MV_BOOLEAN mvSataInterruptServiceRoutine
if (mainCause & (1<<(port * 2))) {
handleEdmaError(pAdapter, MV_CHANNEL_INDEX(sataUnit, port));
+#if 1
+ mv_func[port] |= 0x2;
+#endif
} else {
handleDeviceInterrupt(pAdapter, sataUnit, port);
+#if 1
+ mv_func[port] |= 0x4;
+#endif
}
}
unitRspInPtr >>= 8;
+
+#if 1
+/* port+1 == end time. port == start time. */
+mv_stamp[port+1] = get_cycles_sync();
+#endif
+ }
+
+#if 1
+/* Defer printing stuff until we've handled everything. */
+for (port = 0; port < pAdapter->portsPerUnit; port++)
+ {
+ cycles_t mv_gap = mv_stamp[port+1]-mv_stamp[port];
+ if ( mv_gap > mv_max[port]) {
+ mv_max[port] = mv_gap;
+ printk(KERN_ERR "mv_sata: SATA%d.%d %llu microsec (0x%llx cycles) 0x%x\n",
+ get_sata_devicenum(pAdapter), port, mv_gap/CPU_MHZ,
+ mv_gap, mv_func[port]);
}
}
+
+/* Aggregate time : end time of last port - start time of first port */
+mv_gap = mv_stamp[port] - mv_stamp[0];
+if ( mv_gap > mv_max[port]) {
+ mv_max[port] = mv_gap;
+ printk(KERN_ERR "mv_sata: SATA%d %llu microsec (0x%llx cycles)"
+ " 0x%x 0x%x 0x%x 0x%x\n",
+ get_sata_devicenum(pAdapter),
+ mv_gap/CPU_MHZ, mv_gap,
+ mv_func[0], mv_func[1], mv_func[2], mv_func[3]
+ );
+}
+#endif
+
+ }
else
{
MV_U32 edmaError;
@@ -7976,6 +8036,7 @@ MV_BOOLEAN mvSataInterruptServiceRoutine
}
#endif
}
+
}
/* restore MAIN Interrupt Cause Mask register */
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH] 2.6.34-rc3 Disable R_OK (Early ACK) on SII 3726/4726 PMP
2010-04-14 0:46 [PATCH] 2.6.34-rc3 Disable R_OK (Early ACK) on SII 3726/4726 PMP Grant Grundler
@ 2010-04-15 0:31 ` Tejun Heo
2010-04-15 0:57 ` Grant Grundler
0 siblings, 1 reply; 3+ messages in thread
From: Tejun Heo @ 2010-04-15 0:31 UTC (permalink / raw)
To: Grant Grundler; +Cc: Linux IDE mailing list, Jeff Garzik, LKML
Hello,
On 04/14/2010 09:46 AM, Grant Grundler wrote:
> In 2009, While running "cache read" performance test of drives behind
> SII PMP we encountered a "all 5 drives" timeout on more than 30% of the
> machines under test. This patch reduces the rate by a factor of about 70.
> Low enough that we didn't care to further investigate the issue.
Interesting.
> Performance impact with any sort of "normal" use was ~2%+ CPU and less
> than 1% throughput degradation. Worst case impact (cached read) was
> 6% IOPS reduction. This is with NCQ off (q=1) but I believe FIS based
> switching enabled in the SATA driver.
Given the general flakiness of the device, I'm more than willing to
pay some overhead for stability and the performance hit seems pretty
small.
> @@ -260,12 +262,28 @@ static int sata_pmp_configure(struct ata_device
> *dev, int print_info)
> goto fail;
> }
>
> + /* For Sil3726, disable sending Early R_OK */
> + if (vendor == 0x1095 && devid == 0x3726) {
Isn't this also applicable to 4723 and 4726?
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index 700c5b9..18fc208 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -386,6 +386,7 @@ enum {
> SATA_PMP_GSCR_ERROR_EN = 33,
> SATA_PMP_GSCR_FEAT = 64,
> SATA_PMP_GSCR_FEAT_EN = 96,
> + SATA_PMP_GSCR_SII_POL = 129, /* VENDOR : Silicon Image 3726/4726 */
Can you plesae define it inside libata-pmp.c? ata.h only contains
stuff defined by the ATA standard.
Other than the above nitpicks,
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH] 2.6.34-rc3 Disable R_OK (Early ACK) on SII 3726/4726 PMP
2010-04-15 0:31 ` Tejun Heo
@ 2010-04-15 0:57 ` Grant Grundler
0 siblings, 0 replies; 3+ messages in thread
From: Grant Grundler @ 2010-04-15 0:57 UTC (permalink / raw)
To: Tejun Heo; +Cc: Linux IDE mailing list, Jeff Garzik, LKML
On Wed, Apr 14, 2010 at 5:31 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On 04/14/2010 09:46 AM, Grant Grundler wrote:
>> In 2009, While running "cache read" performance test of drives behind
>> SII PMP we encountered a "all 5 drives" timeout on more than 30% of the
>> machines under test. This patch reduces the rate by a factor of about 70.
>> Low enough that we didn't care to further investigate the issue.
>
> Interesting.
>
>> Performance impact with any sort of "normal" use was ~2%+ CPU and less
>> than 1% throughput degradation. Worst case impact (cached read) was
>> 6% IOPS reduction. This is with NCQ off (q=1) but I believe FIS based
>> switching enabled in the SATA driver.
>
> Given the general flakiness of the device, I'm more than willing to
> pay some overhead for stability and the performance hit seems pretty
> small.
>
>> @@ -260,12 +262,28 @@ static int sata_pmp_configure(struct ata_device
>> *dev, int print_info)
>> goto fail;
>> }
>>
>> + /* For Sil3726, disable sending Early R_OK */
>> + if (vendor == 0x1095 && devid == 0x3726) {
>
> Isn't this also applicable to 4723 and 4726?
Sorry, I don't know why I included 4726 in the Subject line since I
only tested with 3726.
I expect it would apply to 4726 but someone who knows the internals of
both would have to confirm that.
I don't know anything about 4723.
>
>> diff --git a/include/linux/ata.h b/include/linux/ata.h
>> index 700c5b9..18fc208 100644
>> --- a/include/linux/ata.h
>> +++ b/include/linux/ata.h
>> @@ -386,6 +386,7 @@ enum {
>> SATA_PMP_GSCR_ERROR_EN = 33,
>> SATA_PMP_GSCR_FEAT = 64,
>> SATA_PMP_GSCR_FEAT_EN = 96,
>> + SATA_PMP_GSCR_SII_POL = 129, /* VENDOR : Silicon Image 3726/4726 */
>
> Can you plesae define it inside libata-pmp.c?
Yes. I'll resubmit with "SII_POL" defined "locally".
> ata.h only contains stuff defined by the ATA standard.
Ah ok. I didn't realize that.
>
> Other than the above nitpicks,
>
> Acked-by: Tejun Heo <tj@kernel.org>
Thanks for the Ack/Review!
grant
>
> Thanks.
>
> --
> tejun
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2010-04-15 0:57 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-14 0:46 [PATCH] 2.6.34-rc3 Disable R_OK (Early ACK) on SII 3726/4726 PMP Grant Grundler
2010-04-15 0:31 ` Tejun Heo
2010-04-15 0:57 ` Grant Grundler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox