linux-raid.vger.kernel.org archive mirror
* RAID1 & 2.6.9 performance problem
@ 2005-01-17 15:22 Janusz Zamecki
  2005-01-17 15:39 ` Gordon Henderson
  2005-01-18 17:32 ` RAID1 & 2.6.9 performance problem J. Ryan Earl
  0 siblings, 2 replies; 21+ messages in thread
From: Janusz Zamecki @ 2005-01-17 15:22 UTC (permalink / raw)
  To: linux-raid

Hello!

After days of googling I've given up and decided to ask for help.

The story is very simple: I have a RAID1 array, /dev/md6, made of the hdg
and hde disks. The resulting array is only as fast as a single disk.

Please check this out:

hdparm -t /dev/hdg /dev/hde /dev/md6

/dev/hdg:
  Timing buffered disk reads:  184 MB in  3.03 seconds =  60.76 MB/sec

/dev/hde:
  Timing buffered disk reads:  184 MB in  3.01 seconds =  61.08 MB/sec

/dev/md6:
  Timing buffered disk reads:  184 MB in  3.03 seconds =  60.74 MB/sec

I expected much better /dev/md6 performance (at least 100 MB/s).

It seems that md6 uses one drive only. This is the dstat output:

dstat -d -Dhdg,hde

--disk/hdg----disk/hde-
_read write _read write
    0     0 :   0     0
    0     0 :   0     0
    0     0 :52.5M    0
    0     0 :61.4M    0
    0     0 :62.5M    0
    0     0 :8064k    0
    0     0 :   0     0
    0     0 :   0     0
    0     0 :   0     0
23.9M    0 :   0     0
   62M    0 :   0     0
62.5M    0 :   0     0
33.9M    0 :   0     0
    0     0 :   0     0

In a second terminal I ran hdparm -t /dev/md6 twice (one after the other).
As you can see, the first hdparm test reads from hde, while the second
reads from hdg. The next test reads from hde again, and so on.

I tried a small script that runs two hdparm tests simultaneously:

hdparm -t /dev/md6 &
hdparm -t /dev/md6

This is the result:

--disk/hdg----disk/hde-
_read write _read write
    0     0 :   0     0
    0     0 :   0     0
  124k    0 :26.0M    0
  368k    0 :45.5M    0
    0     0 :   0     0
    0     0 : 896k    0
  124k    0 :1568k    0
    0     0 :   0     0

Strange; it seems that hde is preferred.
If I run the same test again:

    0     0 :   0     0
30.6M    0 : 112k    0
41.1M    0 : 116k    0
    0     0 :   0     0
  360k    0 :   0     0
  124k    0 : 416k    0
    0     0 :   0     0

This time hdg is the preferred disk.

What is wrong? Is it possible to balance reads across both disks?

If you need more details I will be more than happy to
send them to the list.

Best regards, Janusz

P.S.
More info:

cat /proc/mdstat
Personalities : [raid1]
md6 : active raid1 hdg[1] hde[0]
       195360896 blocks [2/2] [UU]



hdparm -i /dev/hdg /dev/hde

/dev/hdg:

  Model=ST3200822A, FwRev=3.01, SerialNo=***
  Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
  BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
  CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=268435455
  IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
  PIO modes:  pio0 pio1 pio2 pio3 pio4
  DMA modes:  mdma0 mdma1 mdma2
  UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
  AdvancedPM=no WriteCache=enabled
  Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:

/dev/hde:

  Model=ST3200822A, FwRev=3.01, SerialNo=***
  Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
  RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
  BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
  CurCHS=65535/1/63, CurSects=4128705, LBA=yes, LBAsects=268435455
  IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
  PIO modes:  pio0 pio1 pio2 pio3 pio4
  DMA modes:  mdma0 mdma1 mdma2
  UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
  AdvancedPM=no WriteCache=enabled
  Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:



hdparm  /dev/hdg /dev/hde

/dev/hdg:
  multcount    = 16 (on)
  IO_support   =  1 (32-bit)
  unmaskirq    =  1 (on)
  using_dma    =  1 (on)
  keepsettings =  0 (off)
  readonly     =  0 (off)
  readahead    = 512 (on)
  geometry     = 24321/255/63, sectors = 390721968, start = 0

/dev/hde:
  multcount    = 16 (on)
  IO_support   =  1 (32-bit)
  unmaskirq    =  1 (on)
  using_dma    =  1 (on)
  keepsettings =  0 (off)
  readonly     =  0 (off)
  readahead    = 512 (on)
  geometry     = 24321/255/63, sectors = 390721968, start = 0

from dmesg:
SiI680: IDE controller at PCI slot 0000:00:0b.0
SiI680: chipset revision 2
SiI680: BASE CLOCK == 133
SiI680: 100% native mode on irq 5
     ide2: MMIO-DMA , BIOS settings: hde:pio, hdf:pio
     ide3: MMIO-DMA , BIOS settings: hdg:pio, hdh:pio



* Re: RAID1 & 2.6.9 performance problem
  2005-01-17 15:22 RAID1 & 2.6.9 performance problem Janusz Zamecki
@ 2005-01-17 15:39 ` Gordon Henderson
  2005-01-17 15:51   ` Hans Kristian Rosbach
  2005-01-17 16:24   ` Andrew Walrond
  2005-01-18 17:32 ` RAID1 & 2.6.9 performance problem J. Ryan Earl
  1 sibling, 2 replies; 21+ messages in thread
From: Gordon Henderson @ 2005-01-17 15:39 UTC (permalink / raw)
  To: Janusz Zamecki; +Cc: linux-raid

On Mon, 17 Jan 2005, Janusz Zamecki wrote:

> Hello!
>
> After days of googling I've given up and decided to ask for help.
>
> The story is very simple: I have a RAID1 array, /dev/md6, made of the hdg
> and hde disks. The resulting array is only as fast as a single disk.

Why would you expect it to be any faster?

> Please check this out:
>
> hdparm -t /dev/hdg /dev/hde /dev/md6
>
> /dev/hdg:
>   Timing buffered disk reads:  184 MB in  3.03 seconds =  60.76 MB/sec
>
> /dev/hde:
>   Timing buffered disk reads:  184 MB in  3.01 seconds =  61.08 MB/sec
>
> /dev/md6:
>   Timing buffered disk reads:  184 MB in  3.03 seconds =  60.74 MB/sec

These are all good - nice fast disks too by the looks of it - best I've
seen for a while is about 55MB/sec head bandwidth.

> I expected much better /dev/md6 performance (at least 100 MB/s).

I wouldn't - use RAID-0 if you want more performance.

> It seems that md6 uses one drive only. This is the dstat output:

As I understand it, it reads "chunksize" blocks from one drive, then
switches to the other drive, then back again.

Try a bigger read - eg:

  time dd if=/dev/md6 of=/dev/null bs=128K count=8192

but I don't think there are any real gains to be made with RAID-1 - your
results more or less track everything I've seen and used with RAID-1 - ie.
disk read speed is the same as reading from a single device, and never
significantly faster.

Good luck...

Gordon




* Re: RAID1 & 2.6.9 performance problem
  2005-01-17 15:39 ` Gordon Henderson
@ 2005-01-17 15:51   ` Hans Kristian Rosbach
  2005-01-17 16:46     ` Peter T. Breuer
  2005-01-17 20:49     ` Janusz Zamecki
  2005-01-17 16:24   ` Andrew Walrond
  1 sibling, 2 replies; 21+ messages in thread
From: Hans Kristian Rosbach @ 2005-01-17 15:51 UTC (permalink / raw)
  To: Gordon Henderson; +Cc: linux-raid

> As I understand it, it reads "chunksize" blocks from one drive, then
> switches to the other drive, then back again.
>
> Try a bigger read - eg:
> 
>   time dd if=/dev/md6 of=/dev/null bs=128K count=8192
> 
> but I don't think there are any real gains to be made with RAID-1 - your
> results more or less track everything I've seen and used with RAID-1 - ie.
> disk read speed is the same as reading from a single device, and never
> significantly faster.

Actually I have managed to get about 30-40% higher throughput with just
a little hacking on the code that selects what disk to use.

The problem is:
- It selects the disk that is closest to the wanted sector by remembering
  what sector was last requested and what disk was used for it.
- For sequential reads (such as hdparm) it will override that and use the
  same disk anyway (sector = lastsector + 1).

I gained a lot of throughput by alternating disks, but seek time was
roughly doubled. I also tried to get smart and played around with the
code in order to avoid seeking both disks back and forth wildly when
there were two sequential reads. I didn't find a good way to do it,
unfortunately.
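
In rough C, the heuristic I'm describing looks something like this (a
simplified sketch with made-up names, not the actual md raid1 code, and
with the bad-disk checking left out):

/* Pick which mirror to read from: stay on the disk that served the
 * previous request if this read continues it (sequential), otherwise
 * pick the disk whose last known head position is closest. The caller
 * updates m[chosen].last_sector once the request is dispatched. */
struct mirror {
    long long last_sector;          /* last sector sent to this disk */
};

static int pick_read_disk(const struct mirror *m, int ndisks,
                          long long sector, int last_disk)
{
    int i, best = 0;
    long long best_dist = -1;

    /* Sequential read: keep using the same disk so its readahead helps. */
    if (sector == m[last_disk].last_sector + 1)
        return last_disk;

    /* Otherwise choose the disk with the shortest head travel. */
    for (i = 0; i < ndisks; i++) {
        long long dist = sector - m[i].last_sector;
        if (dist < 0)
            dist = -dist;
        if (best_dist < 0 || dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}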

I'm not going to make any patch available, because I removed bad-disk
checking in order to simplify it.

-HK



* Re: RAID1 & 2.6.9 performance problem
  2005-01-17 15:39 ` Gordon Henderson
  2005-01-17 15:51   ` Hans Kristian Rosbach
@ 2005-01-17 16:24   ` Andrew Walrond
  2005-01-17 16:51     ` Is this hdparm -t output correct? (was Re: RAID1 & 2.6.9 performance problem) Andy Smith
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Walrond @ 2005-01-17 16:24 UTC (permalink / raw)
  To: linux-raid

On Monday 17 January 2005 15:39, Gordon Henderson wrote:
> On Mon, 17 Jan 2005, Janusz Zamecki wrote:
>
> > I expected much better /dev/md6 performance (at least 100 MB/s).
>
> I wouldn't - use RAID-0 if you want more performance.
>

FWIW I get these results with RAID-0

andrew@orac ~ $ sudo hdparm -t /dev/sda /dev/sdb /dev/md0

/dev/sda:
 Timing buffered disk reads:  170 MB in  3.00 seconds =  56.64 MB/sec

/dev/sdb:
 Timing buffered disk reads:  170 MB in  3.02 seconds =  56.37 MB/sec

/dev/md0:
 Timing buffered disk reads:  236 MB in  3.02 seconds =  78.08 MB/sec



* Re: RAID1 & 2.6.9 performance problem
  2005-01-17 15:51   ` Hans Kristian Rosbach
@ 2005-01-17 16:46     ` Peter T. Breuer
  2005-01-18 13:18       ` Hans Kristian Rosbach
  2005-01-17 20:49     ` Janusz Zamecki
  1 sibling, 1 reply; 21+ messages in thread
From: Peter T. Breuer @ 2005-01-17 16:46 UTC (permalink / raw)
  To: linux-raid

Hans Kristian Rosbach <hk@isphuset.no> wrote:
> -It selects the disk that is closest to the wanted sector by remembering
>  what sector was last requested and what disk was used for it.
> -For sequential reads (such as hdparm) it will override and use the
>  same disk anyway (sector = lastsector+1).
> 
> I gained a lot of throughput by alternating disk, but seek time was
> roughly doubled. I also tried to get smart and played some with the
> code in order to avoid seeking both disks back and forth wildly when
> there were two sequential reads. I didn't find a good way to do it
> unfortunately.

Interesting. How did you measure latency? Do you have a script you
could post?

> I'm not going to make any patch available, because I removed bad-disk
> checking in order to simplify it.

The FR1 patch measures disk latency and weights the disk head distances
by the measured latency, which may help.  It probably also gets rid of
that sequential read thing (I haven't done anything but port the patch
to 2.6, not actually run it in anger!).

  ftp://oboe.it.uc3m.es/pub/Programs/fr1-2.15b.tgz

(I am doing a 2.16 with the robust-read patch I suggested added in).
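
In outline, the weighting amounts to something like this (only an
illustrative sketch with invented names -- the real code is in the
tarball above):

/* Choose a mirror by comparing head travel scaled by each disk's
 * recently measured latency, so a slower disk has to be much
 * "closer" before it wins the request. */
struct mirror_stat {
    long long head_pos;             /* sector of the last completed request */
    long long avg_latency_us;       /* smoothed per-request latency */
};

static int pick_weighted_disk(const struct mirror_stat *m, int ndisks,
                              long long sector)
{
    int i, best = 0;
    long long best_cost = -1;

    for (i = 0; i < ndisks; i++) {
        long long dist = sector - m[i].head_pos;
        long long cost;

        if (dist < 0)
            dist = -dist;
        /* +1 so an idle disk at zero distance still gets a finite cost */
        cost = (dist + 1) * m[i].avg_latency_us;
        if (best_cost < 0 || cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}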

I really don't think this measuring of disk head position can help unless
raid controls ALL of the disks in question, or the disks are otherwise
inactive.  Is that your case?


Peter



* Is this hdparm -t output correct? (was Re: RAID1 & 2.6.9 performance problem)
  2005-01-17 16:24   ` Andrew Walrond
@ 2005-01-17 16:51     ` Andy Smith
  2005-01-17 17:04       ` Andrew Walrond
  0 siblings, 1 reply; 21+ messages in thread
From: Andy Smith @ 2005-01-17 16:51 UTC (permalink / raw)
  To: linux-raid


On Mon, Jan 17, 2005 at 04:24:47PM +0000, Andrew Walrond wrote:
> FWIW I get these results with RAID-0
> 
> andrew@orac ~ $ sudo hdparm -t /dev/sda /dev/sdb /dev/md0
> 
> /dev/sda:
>  Timing buffered disk reads:  170 MB in  3.00 seconds =  56.64 MB/sec
> 
> /dev/sdb:
>  Timing buffered disk reads:  170 MB in  3.02 seconds =  56.37 MB/sec
> 
> /dev/md0:
>  Timing buffered disk reads:  236 MB in  3.02 seconds =  78.08 MB/sec

As an aside, when I try this, how come I get this:

$ sudo hdparm -t /dev/sda /dev/sdb /dev/md0

/dev/sda:
 Timing buffered disk reads:  152 MB in  3.03 seconds =  50.19 MB/sec
HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device

/dev/sdb:
 Timing buffered disk reads:  152 MB in  3.03 seconds =  50.24 MB/sec
HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl for device

/dev/md0:
 Timing buffered disk reads:  read(2097152) returned 524288 bytes

(note warnings about ioctls and no speed output for /dev/md0)

These are SATA drives in a RAID 1.



* Re: Is this hdparm -t output correct? (was Re: RAID1 & 2.6.9 performance problem)
  2005-01-17 16:51     ` Is this hdparm -t output correct? (was Re: RAID1 & 2.6.9 performance problem) Andy Smith
@ 2005-01-17 17:04       ` Andrew Walrond
  2005-01-17 18:26         ` RAID1 Corruption Markus Gehring
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Walrond @ 2005-01-17 17:04 UTC (permalink / raw)
  To: linux-raid

On Monday 17 January 2005 16:51, Andy Smith wrote:
>
> As an aside, when I try this, how come I get this:
>
> $ sudo hdparm -t /dev/sda /dev/sdb /dev/md0
>
> /dev/sda:
>  Timing buffered disk reads:  152 MB in  3.03 seconds =  50.19 MB/sec
> HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl
> for device
>
> /dev/sdb:
>  Timing buffered disk reads:  152 MB in  3.03 seconds =  50.24 MB/sec
> HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate ioctl
> for device
>
> /dev/md0:
>  Timing buffered disk reads:  read(2097152) returned 524288 bytes
>
> (note warnings about ioctls and no speed output for /dev/md0)
>
> These are SATA drives in a RAID 1.

I edited out the "inappropriate ioctl" warnings in my output. Since the drives 
are not partitions, a flush would indeed be inappropriate. hdparm -t only 
does read timings anyway. As to why your md0 output is strange - no idea,
I'm afraid. :(

Andrew Walrond


* RAID1 Corruption
  2005-01-17 17:04       ` Andrew Walrond
@ 2005-01-17 18:26         ` Markus Gehring
  2005-01-17 19:14           ` Paul Clements
  2005-01-17 19:21           ` Sven Anders
  0 siblings, 2 replies; 21+ messages in thread
From: Markus Gehring @ 2005-01-17 18:26 UTC (permalink / raw)
  To: linux-raid; +Cc: anders, nicoya

Hi Folks!

I have a reproducible problem with corrupted data read from a RAID1 array.

Setup:
  HW:
   2 S-ATA-Disks (160GB each) -> /dev/md4 RAID1
   Promise S150 TX4 - Controller
   AMD Sempron 2200+

  SW:
   Fedora Core 3
   Kernel 2.6.10 unpatched
   Samba (for read/write-accesses)
   SW-Raid

Everything works fine with only one drive in the array. Once the second is
synced up, read accesses return corrupted data.

Interesting: if you remove the second disk again, the same files will be
read correctly again (no matter whether they were written while only one
disk was in the array or while both were synced)!

Tests with different disks (2x80GB Seagate and 2x160GB Samsung),
different partition sizes (20, 40, 80, 160GB) and different filesystems
(ext3, ext2, reiser) showed the same results.

If I use the drive without raid, everything works fine.

Many Thanks in advance!

Best regards,
  Markus



* Re: RAID1 Corruption
  2005-01-17 18:26         ` RAID1 Corruption Markus Gehring
@ 2005-01-17 19:14           ` Paul Clements
  2005-01-17 19:35             ` Tony Mantler
  2005-01-17 19:42             ` Markus Gehring
  2005-01-17 19:21           ` Sven Anders
  1 sibling, 2 replies; 21+ messages in thread
From: Paul Clements @ 2005-01-17 19:14 UTC (permalink / raw)
  To: Markus Gehring; +Cc: linux-raid, anders, nicoya

Hi,

Markus Gehring wrote:

> I have a reproducible problem with corrupted data read from a RAID1 array.
> 
> Setup:
>  HW:
>   2 S-ATA-Disks (160GB each) -> /dev/md4 RAID1
>   Promise S150 TX4 - Controller
>   AMD Sempron 2200+
> 
>  SW:
>   Fedora Core 3
>   Kernel 2.6.10 unpatched
>   Samba (for read/write-accesses)
>   SW-Raid
> 
> Everything works fine with only one drive in the array. If the second is
> synced up read accesses return corrupted data.
> 
> Interesting: If you remove again the second disk. The same files will be
>  read correctly again (no matter if written while only one disk is in
> the array or two are synced!)!

This makes it sound like bad data is getting written to the second disk 
during resync. Could you give more details about your test procedure (a 
script or list of steps that reproduces the problem would be great)?

I don't think samba is the culprit, but just to be sure, is there any 
chance you could reproduce the problem without samba in the equation? 
(From what you say above, I assume all reads and writes are coming from 
a samba client of some sort?)

Thanks,
Paul


* Re: RAID1 Corruption
  2005-01-17 18:26         ` RAID1 Corruption Markus Gehring
  2005-01-17 19:14           ` Paul Clements
@ 2005-01-17 19:21           ` Sven Anders
  1 sibling, 0 replies; 21+ messages in thread
From: Sven Anders @ 2005-01-17 19:21 UTC (permalink / raw)
  To: linux-raid; +Cc: Markus Gehring



Markus Gehring wrote:
| Hi Folks!
|
| I have a reproducible problem with corrupted data read from a RAID1 array.
|
| Everything works fine with only one drive in the array. If the second is
| synced up read accesses return corrupted data.
|
| Interesting: If you remove again the second disk. The same files will be
|  read correctly again (no matter if written while only one disk is in
| the array or two are synced!)!
|
| Tests with different disks (2x80GB Seagate and 2x160GB Samsung),
| different partition sizes (20, 40, 80, 160GB) and different filesystems
| (ext3, ext2, reiser) showed the same results.
|
| If I use the drive without raid, everything works fine.

It seems to be the same problem I have (see my posting from 5.1.2005
on this mailing list).

The problem was reproducible. I changed to ext2, but the problem persists.
I tried several kernel configurations (including a special extra-small
configuration with only the absolutely necessary options turned on), but
it had no effect (it only took longer to reproduce the problem).
I tried kernel 2.6.9 with and without the -ac patches, with no
success at all.

Because I'm not at home until 29.01. I cannot run any tests at the
moment, but the problem seems to be the same. There seems to be no
problem with the 2.4 kernel series.

Any ideas?

Regards,
Sven

--
Sven Anders <anders@anduras.de>                 () Ascii Ribbon Campaign
                                                /\ Support plain text e-mail
ANDURAS service solutions AG
Innstraße 71 - 94036 Passau - Germany
Web: www.anduras.de - Tel: +49 (0)851-4 90 50-0 - Fax: +49 (0)851-4 90 50-55




* Re: RAID1 Corruption
  2005-01-17 19:14           ` Paul Clements
@ 2005-01-17 19:35             ` Tony Mantler
  2005-01-17 19:42             ` Markus Gehring
  1 sibling, 0 replies; 21+ messages in thread
From: Tony Mantler @ 2005-01-17 19:35 UTC (permalink / raw)
  To: Paul Clements; +Cc: linux-raid, anders, Markus Gehring


On 17-Jan-05, at 1:14 PM, Paul Clements wrote:

> Hi,
>
> Markus Gehring wrote:
>
>> I have a reproducible problem with corrupted data read from a
>> RAID1 array.
>> Setup:
>>  HW:
>>   2 S-ATA-Disks (160GB each) -> /dev/md4 RAID1
>>   Promise S150 TX4 - Controller
>>   AMD Sempron 2200+
>>  SW:
>>   Fedora Core 3
>>   Kernel 2.6.10 unpatched
>>   Samba (for read/write-accesses)
>>   SW-Raid
>> Everything works fine with only one drive in the array. If the second
>> is synced up read accesses return corrupted data.
>> Interesting: If you remove again the second disk. The same files will
>> be read correctly again (no matter if written while only one disk is in
>> the array or two are synced!)!
>
> This makes it sound like bad data is getting written to the second 
> disk during resync. Could you give more details about your test 
> procedure (a script or list of steps that reproduces the problem would 
> be great)?
>
> I don't think samba is the culprit, but just to be sure, is there any 
> chance you could reproduce the problem without samba in the equation? 
> (From what you say above, I assume all reads and writes are coming 
> from a samba client of some sort?)

I've run into this before. I don't think Samba was the issue, as it was 
puking on unrelated files, usually showing up as exec format errors 
when trying to run various commands.

The controller in question was a Promise Ultra66 or Ultra100 (I can't 
remember which), plugged into a Powermac 9600. I wonder if the Promise 
driver is interacting poorly with MD?


Cheers - Tony 'Nicoya' Mantler :)

--
Tony 'Nicoya' Mantler -- Master of Code-fu -- nicoya@ubb.ca
--  http://nicoya.feline.pp.se/  --  http://www.ubb.ca/  --



* Re: RAID1 Corruption
  2005-01-17 19:14           ` Paul Clements
  2005-01-17 19:35             ` Tony Mantler
@ 2005-01-17 19:42             ` Markus Gehring
  1 sibling, 0 replies; 21+ messages in thread
From: Markus Gehring @ 2005-01-17 19:42 UTC (permalink / raw)
  To: linux-raid; +Cc: Paul Clements

Paul Clements wrote:
> Hi,
> 
> Markus Gehring wrote:
> 
>> I have a reproducible problem with corrupted data read from a
>> RAID1 array.
>>
>> Setup:
>>  HW:
>>   2 S-ATA-Disks (160GB each) -> /dev/md4 RAID1
>>   Promise S150 TX4 - Controller
>>   AMD Sempron 2200+
>>
>>  SW:
>>   Fedora Core 3
>>   Kernel 2.6.10 unpatched
>>   Samba (for read/write-accesses)
>>   SW-Raid
>>
>> Everything works fine with only one drive in the array. If the second is
>> synced up read accesses return corrupted data.
>>
>> Interesting: If you remove again the second disk. The same files will be
>>  read correctly again (no matter if written while only one disk is in
>> the array or two are synced!)!
> 
> 
> This makes it sound like bad data is getting written to the second disk 
> during resync. Could you give more details about your test procedure (a 
> script or list of steps that reproduces the problem would be great)?
1. Set up the array (mdadm -C /dev/md4 -l 1 -n 2 /dev/sdc1 /dev/sdd1)
2. ... resync running (as I can see with cat /proc/mdstat)
3. mke2fs /dev/md4
4. mount /dev/md4 /home2
5. Copy ~100M of JPGs (~800k each) via samba to the array (/home2/test1/)
6. See that the JPGs are all okay
7. After the resync has finished: copy the same ~100M of JPGs to the array
   (/home2/test2)
8. See the JPGs damaged (at least in /home2/test2... I didn't check them
   in test1)
9. Remove one disk again (mdadm /dev/md4 -f /dev/sdd1;
   mdadm /dev/md4 -r /dev/sdd1 ... or /dev/sdc1!!!)
10. See (from the Win client) that the JPGs in /home2/test2 are okay again!


> I don't think samba is the culprit, but just to be sure, is there any 
> chance you could reproduce the problem without samba in the equation? 
> (From what you say above, I assume all reads and writes are coming from 
> a samba client of some sort?)
I did a quick test:
I copied my test JPG dir from /home/test (where I can see the pics okay)
to /home2/test9 and saw the pics damaged. After I copied them back to
/home/test9 they stay damaged.

Remarks:
I also saw here that the pics on the still-syncing /dev/md4 = /home2 are
damaged (on read?) while the drive is syncing (new compared to point 6
above), but this happens definitely less often than after the resync has
finished (I saw this for the first time while dealing with the problem
for over 2 weeks now).
I have all mounts on SW-RAID1 arrays, but I have never seen problems
with md0 (/boot), md1 (/), md2 (swap) or md3 (/var).
I have also seen ext3-fs errors (see also Sven Anders's postings from
today and 5.1.2005).

Many Thanks,
  Markus



* Re: RAID1 & 2.6.9 performance problem
  2005-01-17 15:51   ` Hans Kristian Rosbach
  2005-01-17 16:46     ` Peter T. Breuer
@ 2005-01-17 20:49     ` Janusz Zamecki
  1 sibling, 0 replies; 21+ messages in thread
From: Janusz Zamecki @ 2005-01-17 20:49 UTC (permalink / raw)
  To: linux-raid

In your message of Mon, 17-01-2005, at 16:51, Hans Kristian Rosbach writes:

[...]

> Actually I have managed to get about 30-40% higher throughput with just
> a little hacking on the code that selects what disk to use.
> 
> Problem is
> -It selects the disk that is closest to the wanted sector by remembering
>  what sector was last requested and what disk was used for it.
> -For sequential reads (such as hdparm) it will override and use the
>  same disk anyway (sector = lastsector+1).
> 
> I gained a lot of throughput by alternating disk, but seek time was
> roughly doubled. I also tried to get smart and played some with the
> code in order to avoid seeking both disks back and forth wildly when
> there were two sequential reads. I didn't find a good way to do it
> unfortunately.

What about the special case where whole disks are mirrored?
Then there is no need for any disk selection other than round-robin,
because the heads should be in similar positions (except during a
rebuild).

I will be more than happy to give up partitions to get better
performance instead. I have two disks, so raid 1+0 is not an option.

Could you please reconsider releasing your patch?

Best regards, Janusz


* Re: RAID1 & 2.6.9 performance problem
  2005-01-17 16:46     ` Peter T. Breuer
@ 2005-01-18 13:18       ` Hans Kristian Rosbach
  2005-01-18 13:43         ` Peter T. Breuer
  0 siblings, 1 reply; 21+ messages in thread
From: Hans Kristian Rosbach @ 2005-01-18 13:18 UTC (permalink / raw)
  Cc: linux-raid

On Mon, 2005-01-17 at 17:46, Peter T. Breuer wrote:
> Hans Kristian Rosbach <hk@isphuset.no> wrote:
> > -It selects the disk that is closest to the wanted sector by remembering
> >  what sector was last requested and what disk was used for it.
> > -For sequential reads (such as hdparm) it will override and use the
> >  same disk anyway (sector = lastsector+1).
> > 
> > I gained a lot of throughput by alternating disk, but seek time was
> > roughly doubled. I also tried to get smart and played some with the
> > code in order to avoid seeking both disks back and forth wildly when
> > there were two sequential reads. I didn't find a good way to do it
> > unfortunately.
> 
> Interesting. How did you measure latency? Do you have a script you
> could post?

It's part of another application we use internally at work. I'll check
to see whether part of it could be GPL'ed or similar.

But it is also logical: for two requests in a row, to sector and
sector+1, it will first seek disk1 and then disk2 when the second
request arrives. At least it was that way with my hack.

I was pondering maybe doing something like a virtual stripe array, such
that data reads are logically alternated between the functioning
disks. Since it's virtual, the block size and the number of disks could
be changed at run-time for speed tweaking or failed disks.

The tweaking of the block size could be managed automagically by a
userspace daemon that monitors load patterns and such. A step further
would be to monitor disk speed, so that if a disk is slow it gets
fewer/smaller stripe segments than the other disks do. This would be
ideal for a software mirror running atop two RAID-5 volumes, for
example, so that if one of the RAID-5 volumes is degraded the speed
won't collapse completely.
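
To make that concrete, the mapping I have in mind is roughly this (a
purely hypothetical sketch, not working code; the chunk size and the
per-disk weights would be retuned by the userspace daemon):

/* Map a read offset onto one of the working mirrors using a virtual
 * chunk size and per-disk weights, so a slow or degraded member gets
 * a smaller share of the read traffic. */
struct vmirror {
    int nr_disks;
    long long chunk_sectors;        /* virtual chunk size, tunable */
    int weight[8];                  /* relative share per disk, tunable */
};

static int vmirror_pick_disk(const struct vmirror *v, long long sector)
{
    long long chunk = sector / v->chunk_sectors;
    int total = 0, slot, i;

    for (i = 0; i < v->nr_disks; i++)
        total += v->weight[i];

    /* Chunks are dealt out round-robin over the weight table, so a
     * disk with weight 2 gets two consecutive slots, and so on. */
    slot = (int)(chunk % total);
    for (i = 0; i < v->nr_disks; i++) {
        slot -= v->weight[i];
        if (slot < 0)
            return i;
    }
    return 0;   /* not reached when the weights sum to "total" */
}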

> > I'm not going to make any patch available, because I removed bad-disk
> > checking in order to simplify it.
> 
> The FR1 patch measures disk latency and weights the disk head distances
> by the measured latency, which may help.  It probably also gets rid of
> that sequential read thing (I haven't done anything but port the patch
> to 2.6, not actually run it in anger!).

Latency measuring is an excellent approach imho; it would probably also
reduce the speed variation when using disks of different types.

I'll take a look at the code, and do some benchmarks when I get time.
It'll probably be this weekend.

>   ftp://oboe.it.uc3m.es/pub/Programs/fr1-2.15b.tgz
> 
> (I am doing a 2.16 with the robust-read patch I suggested added in).

Keep me posted =)

> I really don't think this measuring disk head position can help unless
> raid controls ALL of the disks in question, or the disks are otherwise
> inactive.  Is that your case?

Yep, head position is imho not a factor to be considered at all.

Currently I'm working on a database project that needs all the read
speed I can get. So what I'd like to do is add, for example, 8 disks in
a mirror, and hopefully get 4-6x the overall read speed.

-HK




* Re: RAID1 & 2.6.9 performance problem
  2005-01-18 13:18       ` Hans Kristian Rosbach
@ 2005-01-18 13:43         ` Peter T. Breuer
  0 siblings, 0 replies; 21+ messages in thread
From: Peter T. Breuer @ 2005-01-18 13:43 UTC (permalink / raw)
  To: linux-raid

Hans Kristian Rosbach <hk@isphuset.no> wrote:
> On Mon, 2005-01-17 at 17:46, Peter T. Breuer wrote:
> > Interesting. How did you measure latency? Do you have a script you
> > could post?
> 
> It's part of another application we use internally at work. I'll check
> to see wether part of it could be GPL'ed or similar.
> 
> But it is also logical since for two requests in a row to sector and 
> sector +1, it will first seek disk1 and then disk2 when the second 
> request arrives. At least it was that way with my hack.

It's not that logical. Readahead on the underlying devices should mean
that sector+1 is read on BOTH drives. I'll tell you below why it
"works".

You want to avoid overlap of the readahead areas. That essentially
means striping underneath in lumps equal to the readahead distance.

> I was pondering maybe doing something like a virtual stripe array, such
> that the data reads are logically alternated between the functioning

It's not worth it - striping is only a means to an end, and the end is
the distribution of requests among the participating disks. The stripe
idea works by using alternation in space as a distribution mechanism,
but needs you to deliver effectively random seeks to make it work!
You can achieve the same distribution in other ways. The most obvious
is simply rotating the requests round robin, as you are doing. This
doesn't work because different sectors come from different drives -
it means that all the drives will all pull the same areas into their
caches, ditto the drivers for them into the block buffer cache, and
thus you have a bigger lookahead cache plus some parellism and
pipelining ...

Try switching drives every 40KB instead (assuming RA is set to 40 KB).
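
As a sketch only (invented names, and assuming the readahead window is
40 KB), the switching rule amounts to something like:

/* Alternate mirrors in readahead-sized lumps: every read inside the
 * same 40 KB window goes to the same disk, and adjacent windows go to
 * different disks, so the readahead areas of the drives never overlap. */
#define READAHEAD_BYTES  (40 * 1024)
#define SECTOR_SIZE      512
#define RA_SECTORS       (READAHEAD_BYTES / SECTOR_SIZE)

static int pick_disk_by_window(long long sector, int nr_disks)
{
    long long window = sector / RA_SECTORS;

    return (int)(window % nr_disks);
}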


Peter



* RE: RAID1 & 2.6.9 performance problem
  2005-01-17 15:22 RAID1 & 2.6.9 performance problem Janusz Zamecki
  2005-01-17 15:39 ` Gordon Henderson
@ 2005-01-18 17:32 ` J. Ryan Earl
  2005-01-18 17:34   ` J. Ryan Earl
  2005-01-18 19:12   ` Janusz Zamecki
  1 sibling, 2 replies; 21+ messages in thread
From: J. Ryan Earl @ 2005-01-18 17:32 UTC (permalink / raw)
  To: Janusz Zamecki, linux-raid

"Please check this out:

hdparm -t /dev/hdg /dev/hde /dev/md6

/dev/hdg:
  Timing buffered disk reads:  184 MB in  3.03 seconds =  60.76 MB/sec

/dev/hde:
  Timing buffered disk reads:  184 MB in  3.01 seconds =  61.08 MB/sec

/dev/md6:
  Timing buffered disk reads:  184 MB in  3.03 seconds =  60.74 MB/sec

I expected much better /dev/md6 performance (at least 100 MB/s)."

This is perfectly normal; I'm not sure why you'd expect better
performance.  You will get 2 parallel sequential reads at around 120MB/sec
assuming you're not bus limited.  A single sequential read can be no
faster than the performance of a single RAID1 disk, though latency should
drop significantly.  I found that the average number of read seeks/sec
increases by around 80% in going from a single HD to a RAID1 setup.

Think about it and it should make sense.  You have two discs with identical
layouts.  How could you possibly increase the speed of a single sequential
read?  You can't just read half from one drive, half from the other, you'd
always have heads seeking and it would no longer be a sequential read.

-ryan



* RE: RAID1 & 2.6.9 performance problem
  2005-01-18 17:32 ` RAID1 & 2.6.9 performance problem J. Ryan Earl
@ 2005-01-18 17:34   ` J. Ryan Earl
  2005-01-18 18:41     ` Janusz Zamecki
  2005-01-18 19:12   ` Janusz Zamecki
  1 sibling, 1 reply; 21+ messages in thread
From: J. Ryan Earl @ 2005-01-18 17:34 UTC (permalink / raw)
  To: J. Ryan Earl, Janusz Zamecki, linux-raid

"You will get 2 parallel sequential reads at around 120MB/sec
assuming you're not bus limited."

To clarify because this looks ambiguous to me now, you should be able to
perform 2 parallel sequential reads both at 60MB/sec = 120MB/sec total.

-ryan



* RE: RAID1 & 2.6.9 performance problem
  2005-01-18 17:34   ` J. Ryan Earl
@ 2005-01-18 18:41     ` Janusz Zamecki
  2005-01-18 19:18       ` J. Ryan Earl
  0 siblings, 1 reply; 21+ messages in thread
From: Janusz Zamecki @ 2005-01-18 18:41 UTC (permalink / raw)
  To: J. Ryan Earl; +Cc: linux-raid

In your message of Tue, 18-01-2005, at 18:34, J. Ryan Earl writes:
> "You will get 2 parallel sequential reads at around 120MB/sec
> assuming you're not bus limited."
> 
> To clarify because this looks ambiguous to me now, you should be able to
> perform 2 parallel sequential reads both at 60MB/sec = 120MB/sec total.
> 
> -ryan
> 

Hi,

Unfortunately that is not the case. Please go back to my original e-mail.
I ran two simultaneous tests and got:


hdparm -t /dev/md6 &
hdparm -t /dev/md6

This is the result:

--disk/hdg----disk/hde-
_read write _read write
    0     0 :   0     0
    0     0 :   0     0
  124k    0 :26.0M    0
  368k    0 :45.5M    0
    0     0 :   0     0
    0     0 : 896k    0
  124k    0 :1568k    0
    0     0 :   0     0

One disk works at 75% of its full speed, while the second one works at
0.6% of its full speed or even less.
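
If it would help, I can also take hdparm out of the picture with a quick
test program like the one below (an untested sketch; it needs root and
simply starts two sequential readers far apart on the array while I
watch dstat):

/* Two concurrent sequential readers on /dev/md6, started 100 GB apart,
 * to see whether md spreads them over both mirrors.
 * Build with: gcc -O2 -o md6test md6test.c */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

#define BUFSZ   (1 << 20)               /* 1 MB per read() */
#define TOTAL   (256LL << 20)           /* 256 MB per reader */

static void reader(const char *dev, long long start)
{
    char *buf = malloc(BUFSZ);
    int fd = open(dev, O_RDONLY);
    long long done = 0;

    if (fd < 0 || buf == NULL) {
        perror(dev);
        _exit(1);
    }
    lseek(fd, (off_t)start, SEEK_SET);
    while (done < TOTAL) {
        ssize_t n = read(fd, buf, BUFSZ);
        if (n <= 0)
            break;
        done += n;
    }
    close(fd);
    free(buf);
    _exit(0);
}

int main(void)
{
    const char *dev = "/dev/md6";

    if (fork() == 0)
        reader(dev, 0);                 /* first reader at the start */
    if (fork() == 0)
        reader(dev, 100LL << 30);       /* second reader 100 GB in */
    wait(NULL);
    wait(NULL);
    return 0;
}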

Best regards, Janusz


* RE: RAID1 & 2.6.9 performance problem
  2005-01-18 17:32 ` RAID1 & 2.6.9 performance problem J. Ryan Earl
  2005-01-18 17:34   ` J. Ryan Earl
@ 2005-01-18 19:12   ` Janusz Zamecki
  1 sibling, 0 replies; 21+ messages in thread
From: Janusz Zamecki @ 2005-01-18 19:12 UTC (permalink / raw)
  To: linux-raid


> Think about it and it should make sense.  You have two discs with identical
> layouts.  How could you possibly increase the speed of a single sequential
> read?  You can't just read half from one drive, half from the other, you'd
> always have heads seeking and it would no longer be a sequential read.
> 

I'm thinking about something like this (of course this is a
simplification):

1. The application reads a big chunk of data.
2. The file system layer splits this request into fs-block-sized requests
(let's say 4k each).
3. Each block read request is divided into sector-sized requests. As far
as I understand, modern drives can read ahead up to 16 (32??) sectors.

Now, the first read request, for sector X, should go to the first disk,
while the next request, for sector (X+disk_read_ahead), should go to
the second disk, the third request, for sector (X+2*disk_read_ahead), to
the first disk, and so on.

If we use whole disks for RAID1, the heads should always be in a similar
position, so latency should be similar to a single disk, because the
drives should seek in a track-to-track manner (in an ideal world).

I agree that it is not a pure sequential operation, but I think it is
close enough to get some performance boost at virtually no cost.

Do you agree?

Janusz


* RE: RAID1 & 2.6.9 performance problem
  2005-01-18 18:41     ` Janusz Zamecki
@ 2005-01-18 19:18       ` J. Ryan Earl
  2005-01-18 19:34         ` Janusz Zamecki
  0 siblings, 1 reply; 21+ messages in thread
From: J. Ryan Earl @ 2005-01-18 19:18 UTC (permalink / raw)
  To: Janusz Zamecki; +Cc: linux-raid

I missed that; it sounds like you have an IDE concurrency problem.  Perhaps
you can't read from both ports for some reason.  I don't think it's a raid
problem though.

-ryan

-----Original Message-----
From: Janusz Zamecki [mailto:janusz@pipi.ma.cx]
Sent: Tuesday, January 18, 2005 12:41 PM
To: J. Ryan Earl
Cc: linux-raid@vger.kernel.org
Subject: RE: RAID1 & 2.6.9 performance problem


In your message of Tue, 18-01-2005, at 18:34, J. Ryan Earl writes:
> "You will get 2 parallel sequential reads at around 120MB/sec
> assuming you're not bus limited."
>
> To clarify because this looks ambiguous to me now, you should be able to
> perform 2 parallel sequential reads both at 60MB/sec = 120MB/sec total.
>
> -ryan
>

Hi,

Unfortunately that is not the case. Please go back to my original e-mail.
I ran two simultaneous tests and got:


hdparm -t /dev/md6 &
hdparm -t /dev/md6

This is the result:

--disk/hdg----disk/hde-
_read write _read write
    0     0 :   0     0
    0     0 :   0     0
  124k    0 :26.0M    0
  368k    0 :45.5M    0
    0     0 :   0     0
    0     0 : 896k    0
  124k    0 :1568k    0
    0     0 :   0     0

One disk works at 75% of its full speed, while the second one works at
0.6% of its full speed or even less.

Best regards, Janusz



* RE: RAID1 & 2.6.9 performance problem
  2005-01-18 19:18       ` J. Ryan Earl
@ 2005-01-18 19:34         ` Janusz Zamecki
  0 siblings, 0 replies; 21+ messages in thread
From: Janusz Zamecki @ 2005-01-18 19:34 UTC (permalink / raw)
  To: J. Ryan Earl; +Cc: linux-raid

In your message of Tue, 18-01-2005, at 20:18, J. Ryan Earl writes:
> I missed that, sounds like you got an IDE concurrency problem.  Perhaps you
> can't read from both ports for some reason.  Don't think it's a raid problem
> though.


Well, it is a raid problem. Check it out: I ran the following script:

hdparm -t /dev/hdg &
hdparm -t /dev/hde

(hde and hdg make up my md6 array from the previous message)

And here are the results:

--disk/hdg----disk/hde-
_read write _read write
   0     0 :   0     0
40.2M    0 :39.9M    0
53.5M    0 :53.5M    0
51.5M    0 :51.5M    0
13.2M    0 :13.5M    0
   0     0 :   0     0

Even if I increase the number of reading threads

hdparm -t /dev/hdg &
hdparm -t /dev/hde &
hdparm -t /dev/hdg &
hdparm -t /dev/hde

the workload is more or less evenly distributed.

--disk/hdg----disk/hde-
_read write _read write
   0     0 :   0     0
40.2M    0 :38.9M    0
7732k    0 :8440k    0
3780k    0 : 928k    0
1280k    0 :2400k    0
2076k    0 :   0     0
   0     0 :   0     0

Janusz


Thread overview: 21+ messages
2005-01-17 15:22 RAID1 & 2.6.9 performance problem Janusz Zamecki
2005-01-17 15:39 ` Gordon Henderson
2005-01-17 15:51   ` Hans Kristian Rosbach
2005-01-17 16:46     ` Peter T. Breuer
2005-01-18 13:18       ` Hans Kristian Rosbach
2005-01-18 13:43         ` Peter T. Breuer
2005-01-17 20:49     ` Janusz Zamecki
2005-01-17 16:24   ` Andrew Walrond
2005-01-17 16:51     ` Is this hdparm -t output correct? (was Re: RAID1 & 2.6.9 performance problem) Andy Smith
2005-01-17 17:04       ` Andrew Walrond
2005-01-17 18:26         ` RAID1 Corruption Markus Gehring
2005-01-17 19:14           ` Paul Clements
2005-01-17 19:35             ` Tony Mantler
2005-01-17 19:42             ` Markus Gehring
2005-01-17 19:21           ` Sven Anders
2005-01-18 17:32 ` RAID1 & 2.6.9 performance problem J. Ryan Earl
2005-01-18 17:34   ` J. Ryan Earl
2005-01-18 18:41     ` Janusz Zamecki
2005-01-18 19:18       ` J. Ryan Earl
2005-01-18 19:34         ` Janusz Zamecki
2005-01-18 19:12   ` Janusz Zamecki
