siimage.c -serious- error fixed. Performance/stability issues resolved.

linux-ide.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* siimage.c -serious- error fixed.  Performance/stability issues resolved.
@ 2003-12-02 10:36 Ryan Earl
  0 siblings, 0 replies; 7+ messages in thread
From: Ryan Earl @ 2003-12-02 10:36 UTC (permalink / raw)
  To: linux-ide

Hello,

Let me preface this by saying I'm new to the list and that I hope this is
the right place for this message.  I run a production server that utilizes
the Silicon Image 3112A SerialATA interface.  After doing research and
reading back through the kernel logs I saw there was a stability issue
being worked around by turning the request buffer down to 15KBytes from
128KByte.  While this fixed some symptoms to the problem, it did not fix
the problem and resulted in horrible perforance especially in terms of CPU
utilization.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* siimage.c -serious- error fixed.  Performance/stability issues resolved.
@ 2003-12-02 10:57 Ryan Earl
  2003-12-02 19:16 ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 7+ messages in thread
From: Ryan Earl @ 2003-12-02 10:57 UTC (permalink / raw)
  To: linux-ide

Hello,

Let me preface this by saying I'm new to the list and that I hope this is
the right place for this message.  I run a production server that utilizes
the Silicon Image 3112A SerialATA interface and I had ordered 2 WD360GD
"Raptor" drives to have a cheap RAID1 hot-swap configuration.  After doing
research and reading back through the kernel lists I saw there was a
stability issue being worked around by turning the request buffer down to
15KBytes from 128KByte.  While this fixed some symptoms to the problem, it
did not fix the problem and resulted in horrible performance especially in
terms of CPU utilization.

After reading through the IDE/PCI subsystems on linux, I looked that
linux/ide/pci/siimage.[c|h] to find and fix what I believe is the reason
for the stability issues on SI's SATA hardware.  Here's a verbose diff
from the unpatched=>patch siimage.c driver from 2.4.23_pre8:

*** /usr/src/linux/drivers/ide/pci/siimage.c.orig       Sun Nov 30
07:05:59 2003
--- /usr/src/linux/drivers/ide/pci/siimage.c    Sun Nov 30 07:05:59 2003
***************
*** 265,271 ****
  static void siimage_tuneproc (ide_drive_t *drive, byte mode_wanted)
  {
        ide_hwif_t *hwif        = HWIF(drive);
!       u32 speedt              = 0;
        u16 speedp              = 0;
        unsigned long addr      = siimage_seldev(drive, 0x04);
        unsigned long tfaddr    = siimage_selreg(hwif, 0x02);
--- 265,271 ----
  static void siimage_tuneproc (ide_drive_t *drive, byte mode_wanted)
  {
        ide_hwif_t *hwif        = HWIF(drive);
!       u16 speedt              = 0;    /* was a u32 that clobered over
the port/addr section of the stack in OUTW */
        u16 speedp              = 0;
        unsigned long addr      = siimage_seldev(drive, 0x04);
        unsigned long tfaddr    = siimage_selreg(hwif, 0x02);
***************
*** 1065,1072 ****
--- 1065,1075 ----
        hwif->hwif_data = 0;

        hwif->rqsize = 128;
+
+ #if defined( SATA_BUGGY )
        if (is_sata(hwif))
                hwif->rqsize = 15;
+ #endif

        if (pci_get_drvdata(dev) == NULL)
                return;

The crux of the problem was that the on-disk controller was getting
programmed with erroneous settings.  A 32-bit argument was being passed
instead of a 16-bit argument to the function referenced by hwif->OUTW,
thus clobbering the stack.

I've been stress testing this patch for the last 40 hours.  Benchmark
results have been excellent.  Without the patch, sequential reads and
writes were using 90% of the CPU, with the patch it's more around 20% on
average.  Read CPU usage is significantly lower than write usage.  Disk
throughput also increased 5-7MB/s in sequential read/write access.  I've
copied around over 1TB of data on this drive with my patch and the request
buffer set to 128KByte/sec with zero errors.  Without the patch and with
128KByte request buffer the harddrive typically becomes unstable within 5
minutes of usage.

Here are some preliminary benchmarks with bonnie++:

name,file_size,putc,putc_cpu,put_block,put_block_cpu,rewrite,rewrite_cpu,getc,getc_cpu,get_block,get_block_cpu,seeks,seeks_cpu,num_files,seq_create,seq_create_cpu,seq_stat,seq_stat_cpu,seq_del,seq_del_cpu,ran_create,ran_create_cpu,ran_stat,ran_stat_cpu,ran_del,ran_del_cpu
aeryn,2G,13394,98,44280,29,20448,8,18897,88,42106,11,206.9,0,16,19359,94,+++++,+++,18830,98,19631,99,+++++,+++,17084,100
aeryn,2G,13426,98,41892,28,19254,7,19021,88,39806,10,203.8,0,16,20929,99,+++++,+++,16702,91,20550,100,+++++,+++,16454,100
aeryn,2G,13491,99,41469,28,17846,7,18908,88,37610,9,204.0,0,16,21461,99,+++++,+++,18478,99,20609,100,+++++,+++,15858,95
aeryn,2G,13285,97,35673,23,16948,7,18934,88,34985,8,197.6,0,16,20864,96,+++++,+++,18135,100,20712,101,+++++,+++,16243,100
aeryn,2G,12428,91,52800,35,21584,9,18974,89,45868,12,191.3,0,16,21298,100,+++++,+++,15746,86,20526,100,+++++,+++,16347,100
aeryn,2G,13464,99,43526,29,20138,8,18978,88,42060,12,202.2,0,16,21233,99,+++++,+++,18045,100,19158,93,+++++,+++,16182,100
aeryn,2G,13170,97,42420,28,19145,7,19044,89,39721,10,209.8,0,16,20469,96,+++++,+++,18263,99,20636,100,+++++,+++,16327,99
aeryn,2G,13263,97,39844,26,17376,7,18844,88,36047,10,201.5,0,16,21212,99,+++++,+++,18221,100,20513,100,+++++,+++,14119,87
aeryn,2G,13087,96,38054,26,16139,6,18797,88,33860,8,197.0,0,16,20503,97,+++++,+++,18021,100,20281,99,+++++,+++,16120,99
aeryn,2G,12632,92,51055,34,20976,8,18841,88,45228,12,172.1,0,16,21476,100,+++++,+++,18446,100,20225,96,+++++,+++,15075,92

I'm getting disk throughput within 5% of the manufactures claim, which I
feel is reasonable considering filesystem overhead and what not.  This is
on ReiserFS 3.6.  There is room I'm sure to reduce CPU overhead more, but
I think that's a good thing to do with 2.6. Hopefully libata will take
care of that.

I checked the latest 2.6.0-test11 kernel, and it also has the same error
as in the 2.4 kernel, however I do not have a 2.6 testbed so I did not
attempt to patch that kernel for testing.  I was mainly interested in
cleaning up things for the kernel my production server will upgrade to
when I install the SATA RAID1 setup.

Cheers,
J. Ryan Earl

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: siimage.c -serious- error fixed.  Performance/stability issues resolved.
  2003-12-02 10:57 siimage.c -serious- error fixed. Performance/stability issues resolved Ryan Earl
@ 2003-12-02 19:16 ` Bartlomiej Zolnierkiewicz
  2003-12-02 21:33   ` Ryan Earl
  0 siblings, 1 reply; 7+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-02 19:16 UTC (permalink / raw)
  To: Ryan Earl; +Cc: linux-ide


On Tuesday 02 of December 2003 11:57, Ryan Earl wrote:
> Hello,

Hi,

> Let me preface this by saying I'm new to the list and that I hope this is
> the right place for this message.  I run a production server that utilizes
> the Silicon Image 3112A SerialATA interface and I had ordered 2 WD360GD
> "Raptor" drives to have a cheap RAID1 hot-swap configuration.  After doing
> research and reading back through the kernel lists I saw there was a
> stability issue being worked around by turning the request buffer down to
> 15KBytes from 128KByte.  While this fixed some symptoms to the problem, it
> did not fix the problem and resulted in horrible performance especially in
> terms of CPU utilization.
>
> After reading through the IDE/PCI subsystems on linux, I looked that
> linux/ide/pci/siimage.[c|h] to find and fix what I believe is the reason
> for the stability issues on SI's SATA hardware.  Here's a verbose diff
> from the unpatched=>patch siimage.c driver from 2.4.23_pre8:
>
> *** /usr/src/linux/drivers/ide/pci/siimage.c.orig       Sun Nov 30
> 07:05:59 2003
> --- /usr/src/linux/drivers/ide/pci/siimage.c    Sun Nov 30 07:05:59 2003
> ***************
> *** 265,271 ****
>   static void siimage_tuneproc (ide_drive_t *drive, byte mode_wanted)
>   {
>         ide_hwif_t *hwif        = HWIF(drive);
> !       u32 speedt              = 0;
>         u16 speedp              = 0;
>         unsigned long addr      = siimage_seldev(drive, 0x04);
>         unsigned long tfaddr    = siimage_selreg(hwif, 0x02);
> --- 265,271 ----
>   static void siimage_tuneproc (ide_drive_t *drive, byte mode_wanted)
>   {
>         ide_hwif_t *hwif        = HWIF(drive);
> !       u16 speedt              = 0;    /* was a u32 that clobered over
> the port/addr section of the stack in OUTW */
>         u16 speedp              = 0;
>         unsigned long addr      = siimage_seldev(drive, 0x04);
>         unsigned long tfaddr    = siimage_selreg(hwif, 0x02);
> ***************
> *** 1065,1072 ****
> --- 1065,1075 ----
>         hwif->hwif_data = 0;
>
>         hwif->rqsize = 128;
> +
> + #if defined( SATA_BUGGY )
>         if (is_sata(hwif))
>                 hwif->rqsize = 15;
> + #endif
>
>         if (pci_get_drvdata(dev) == NULL)
>                 return;
>
>
> The crux of the problem was that the on-disk controller was getting
> programmed with erroneous settings.  A 32-bit argument was being passed
> instead of a 16-bit argument to the function referenced by hwif->OUTW,
> thus clobbering the stack.

Good catch but can you verify this?

I.e. if you are using MMIO (MMIO-DMA banner printed by a driver),
copy ide_mm_outw() function from ide-iops.c to siimage.c, rename it,
add printk("%u\n", val) to it and call this function with speedt argument
instead of calling hwif->OUTW().

It is better to change speedt to u16 anyway.

> I've been stress testing this patch for the last 40 hours.  Benchmark
> results have been excellent.  Without the patch, sequential reads and
> writes were using 90% of the CPU, with the patch it's more around 20% on
> average.  Read CPU usage is significantly lower than write usage.  Disk
> throughput also increased 5-7MB/s in sequential read/write access.  I've
> copied around over 1TB of data on this drive with my patch and the request
> buffer set to 128KByte/sec with zero errors.  Without the patch and with
> 128KByte request buffer the harddrive typically becomes unstable within 5
> minutes of usage.

So without "speedt" fix even and with 128kB requests it is unstable
and with fix and 128kB requests it works okay?

> I checked the latest 2.6.0-test11 kernel, and it also has the same error
> as in the 2.4 kernel, however I do not have a 2.6 testbed so I did not
> attempt to patch that kernel for testing.  I was mainly interested in
> cleaning up things for the kernel my production server will upgrade to
> when I install the SATA RAID1 setup.

You can find patch limiting requests to 15kB only for Seagate drives here:
	http://lkml.org/lkml/2003/11/30/54

It is for 2.6.0-test11 but should also apply to 2.4 kernels.

thanks,
--bart


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: siimage.c -serious- error fixed.  Performance/stability issues resolved.
  2003-12-02 19:16 ` Bartlomiej Zolnierkiewicz
@ 2003-12-02 21:33   ` Ryan Earl
  2003-12-02 23:32     ` Bartlomiej Zolnierkiewicz
  2003-12-03  0:06     ` Jeff Garzik
  0 siblings, 2 replies; 7+ messages in thread
From: Ryan Earl @ 2003-12-02 21:33 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: Ryan Earl, linux-ide

> Good catch but can you verify this?
>
> I.e. if you are using MMIO (MMIO-DMA banner printed by a driver),
> copy ide_mm_outw() function from ide-iops.c to siimage.c, rename it,
> add printk("%u\n", val) to it and call this function with speedt argument
> instead of calling hwif->OUTW().
>
> It is better to change speedt to u16 anyway.

The 3112 is using memory mapped I/O yes.  I verified by simply putting a
printk() in the 'if (hwif->mmio) { }' section.  I took it out for the
diff.  Even if we're not using MMIO, it was still using a 32-bit speedt as
an argument to pci_write_config_WORD() which is still erroneous.

> So without "speedt" fix even and with 128kB requests it is unstable
> and with fix and 128kB requests it works okay?

Without the "speedt" fix, a 128KB request buffer causes the hardrive to
become completely unstable.  In benchmarking it would timeout and give DMA
errors within 5 minutes of usage.  For my development setup, I was running
a second non-SATA harddrive with RAID1 across the partitions on the
Raptor; benchmarks were obviously run on a non-RAID partition.  I had to
reboot to resync the drives many times after DMA errors.  I have had zero
issues with the patch installed.

> You can find patch limiting requests to 15kB only for Seagate drives here:
> 	http://lkml.org/lkml/2003/11/30/54
>
> It is for 2.6.0-test11 but should also apply to 2.4 kernels.

The 15KB request buffer "fix"--*ahem* workaround--never addressed the real
issue.  You want to get rid of all the 15KB hoocus-poocus.  All that does
is kill your disk performance, it never solved the problem, merely made
your system more stable with the problem still around.  Essentially, you
should never have that buffer set to 15KB and with this fix all that noise
should just disappear from the driver/kernel.

I mean, think about what's happening to OUTW with a 32-bit speedt.  The 16
most significant bits of speedt are being used as the whole first
argument--this will always be 0--and the lower order 16-bits of speedt is
then being used as the upper order 16-bits of the address/port, while the
upper 16-bits of the real address/port are now the lower 16-bits of the
argument sent.  The lower order 16-bits of the address/port then get left
on the stack to cause whatever damage it can down the road.  The driver
was sending complete nonsense to the on-disk controller.  Though the error
was only 2 characters long, this belies the severity of its nature.

I'm currently waiting for a benchmarking script to finish, it's been
running about 10 hours now.  I setup a script to run tests across varrying
datasets--from 2G to 8G, 15K to 64K files--with the patched driver with
128 and 15 K buffers, and with/without write caching enabled.  I noticed
wcache was not enabled by default, so I'm trying to gauge its effect on
performance and stability.  I will post the results when the runs have
finished.  The performance differential here is incredible.

-ryan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: siimage.c -serious- error fixed.  Performance/stability issues resolved.
  2003-12-02 21:33   ` Ryan Earl
@ 2003-12-02 23:32     ` Bartlomiej Zolnierkiewicz
  2003-12-03  0:30       ` Ryan Earl
  2003-12-03  0:06     ` Jeff Garzik
  1 sibling, 1 reply; 7+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-12-02 23:32 UTC (permalink / raw)
  To: Ryan Earl; +Cc: linux-ide

On Tuesday 02 of December 2003 22:33, Ryan Earl wrote:
> > Good catch but can you verify this?
> >
> > I.e. if you are using MMIO (MMIO-DMA banner printed by a driver),
> > copy ide_mm_outw() function from ide-iops.c to siimage.c, rename it,
> > add printk("%u\n", val) to it and call this function with speedt argument
> > instead of calling hwif->OUTW().
> >
> > It is better to change speedt to u16 anyway.
>
> The 3112 is using memory mapped I/O yes.  I verified by simply putting a
> printk() in the 'if (hwif->mmio) { }' section.  I took it out for the
> diff.  Even if we're not using MMIO, it was still using a 32-bit speedt as
> an argument to pci_write_config_WORD() which is still erroneous.

Okay.

> > So without "speedt" fix even and with 128kB requests it is unstable
> > and with fix and 128kB requests it works okay?
>
> Without the "speedt" fix, a 128KB request buffer causes the hardrive to
> become completely unstable.  In benchmarking it would timeout and give DMA
> errors within 5 minutes of usage.  For my development setup, I was running
> a second non-SATA harddrive with RAID1 across the partitions on the
> Raptor; benchmarks were obviously run on a non-RAID partition.  I had to
> reboot to resync the drives many times after DMA errors.  I have had zero
> issues with the patch installed.

Okay, thanks!

> > You can find patch limiting requests to 15kB only for Seagate drives
> > here: http://lkml.org/lkml/2003/11/30/54
> >
> > It is for 2.6.0-test11 but should also apply to 2.4 kernels.
>
> The 15KB request buffer "fix"--*ahem* workaround--never addressed the real
> issue.  You want to get rid of all the 15KB hoocus-poocus.  All that does
> is kill your disk performance, it never solved the problem, merely made
> your system more stable with the problem still around.  Essentially, you
> should never have that buffer set to 15KB and with this fix all that noise
> should just disappear from the driver/kernel.

No, there is a REAL issue with Seagate SATA drives
(more details in recent discussion on lkml and in lkml archives).

Proper workaround (instead of limiting requests to 15kB) is underway.

> I mean, think about what's happening to OUTW with a 32-bit speedt.  The 16
> most significant bits of speedt are being used as the whole first
> argument--this will always be 0--and the lower order 16-bits of speedt is
> then being used as the upper order 16-bits of the address/port, while the
> upper 16-bits of the real address/port are now the lower 16-bits of the
> argument sent.  The lower order 16-bits of the address/port then get left
> on the stack to cause whatever damage it can down the road.  The driver
> was sending complete nonsense to the on-disk controller.  Though the error
> was only 2 characters long, this belies the severity of its nature.

Yep, this explanaiton makes sense.
It made me wonder why gcc doesn't emit warning for this case...

> I'm currently waiting for a benchmarking script to finish, it's been
> running about 10 hours now.  I setup a script to run tests across varrying
> datasets--from 2G to 8G, 15K to 64K files--with the patched driver with
> 128 and 15 K buffers, and with/without write caching enabled.  I noticed
> wcache was not enabled by default, so I'm trying to gauge its effect on
> performance and stability.  I will post the results when the runs have
> finished.  The performance differential here is incredible.

Okay.

thanks,
--bart


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: siimage.c -serious- error fixed.  Performance/stability issues resolved.
  2003-12-02 21:33   ` Ryan Earl
  2003-12-02 23:32     ` Bartlomiej Zolnierkiewicz
@ 2003-12-03  0:06     ` Jeff Garzik
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff Garzik @ 2003-12-03  0:06 UTC (permalink / raw)
  To: Ryan Earl; +Cc: Bartlomiej Zolnierkiewicz, linux-ide

Ryan Earl wrote:
> The 15KB request buffer "fix"--*ahem* workaround--never addressed the real
> issue.  You want to get rid of all the 15KB hoocus-poocus.  All that does
> is kill your disk performance, it never solved the problem, merely made
> your system more stable with the problem still around.  Essentially, you
> should never have that buffer set to 15KB and with this fix all that noise
> should just disappear from the driver/kernel.

Not correct.

As has been discussed many times, with certain Seagate drives,

	sector_count % 15 == 1 && sector_count != 1

causes failure.

This is what the 15KB limit addresses.  Read libata SII driver for more 
info.

	Jeff



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: siimage.c -serious- error fixed.  Performance/stability issues resolved.
  2003-12-02 23:32     ` Bartlomiej Zolnierkiewicz
@ 2003-12-03  0:30       ` Ryan Earl
  0 siblings, 0 replies; 7+ messages in thread
From: Ryan Earl @ 2003-12-03  0:30 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: Ryan Earl, linux-ide

> No, there is a REAL issue with Seagate SATA drives
> (more details in recent discussion on lkml and in lkml archives).
>
> Proper workaround (instead of limiting requests to 15kB) is underway.

Oh, OK.  I thought you were still talking about the siimage cludge, I was
confused about why you mentioned Seagate.  That sucks, doesn't Seagate
have the only native SATA interface (save for the new WD740GD maybe)? 
Aren't they going to be the first to support NCQ on SATA 2.0?  Guess I
should go check that thread.

> Yep, this explanaiton makes sense.
> It made me wonder why gcc doesn't emit warning for this case...

That is a good question.  I would guess that there is a limit to how deep
gcc persues type checking when making calls to function pointers.

-ryan

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2003-12-03  0:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-12-02 10:57 siimage.c -serious- error fixed. Performance/stability issues resolved Ryan Earl
2003-12-02 19:16 ` Bartlomiej Zolnierkiewicz
2003-12-02 21:33   ` Ryan Earl
2003-12-02 23:32     ` Bartlomiej Zolnierkiewicz
2003-12-03  0:30       ` Ryan Earl
2003-12-03  0:06     ` Jeff Garzik
  -- strict thread matches above, loose matches on Subject: below --
2003-12-02 10:36 Ryan Earl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).