* raid6 + caviar black + mpt2sas horrific performance
From: Louis-David Mitterrand @ 2011-03-30 8:08 UTC
To: linux-raid
Hi,
I am seeing horrific performance on a Dell T610 with an LSISAS2008 (Dell
H200) card and 8 WD1002FAEX Caviar Black 1TB drives configured in mdadm raid6.
The LSI card is upgraded to the latest 9.00 firmware:
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html
and the 2.6.38.2 kernel uses the newer mpt2sas driver.
On the T610 this command takes 20 minutes:
tar -I pbzip2 -xvf linux-2.6.37.tar.bz2 22.64s user 3.34s system 2% cpu 20:00.69 total
where on a lower-spec'ed PowerEdge 2900 III server (LSI Logic MegaRAID
SAS 1078 + 8 x Hitachi Ultrastar 7K1000 in mdadm raid6) it takes 22
_seconds_:
tar -I pbzip2 -xvf linux-2.6.37.tar.bz2 16.40s user 3.22s system 86% cpu 22.773 total
Besides hardware, the other difference between the servers is that the
PE2900's MegaRAID has no JBOD mode, so each disk must be configured as a
"raid0" vdisk unit. On the T610 no configuration was necessary for the
disks to appear in the OS. Would configuring them as raid0 vdisks
change anything?
Thanks in advance for any suggestion,

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Stan Hoeppner @ 2011-03-30 13:20 UTC
To: linux-raid

Louis-David Mitterrand put forth on 3/30/2011 3:08 AM:
> Hi,
>
> I am seeing horrific performance on a Dell T610 with a LSISAS2008 (Dell
> H200) card and 8 WD1002FAEX Caviar Black 1TB configured in mdadm raid6.
>
> The LSI card is upgraded to the latest 9.00 firmware:
> http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html
> and the 2.6.38.2 kernel uses the newer mpt2sas driver.
>
> On the T610 this command takes 20 minutes:
>
> tar -I pbzip2 -xvf linux-2.6.37.tar.bz2  22.64s user 3.34s system 2% cpu 20:00.69 total
>
> where on a lower spec'ed Poweredge 2900 III server (LSI Logic MegaRAID
> SAS 1078 + 8 x Hitachi Ultrastar 7K1000 in mdadm raid6) it takes 22
> _seconds_:
>
> tar -I pbzip2 -xvf linux-2.6.37.tar.bz2  16.40s user 3.22s system 86% cpu 22.773 total
>
> Besides hardware, the other difference between servers is that the
> PE2900's MegaRAID has no JBOD mode so each disk must be configured as a
> "raid0" vdisk unit. On the T610 no configuration was necessary for the
> disks to "appear" in the OS. Would configuring them as raid0 vdisks
> change anything?

Changing the virtual disk configuration is a question for Dell support.

You haven't provided sufficient information that would allow us to
troubleshoot your problem. Please include relevant log and dmesg output.

--
Stan

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Robin Hill @ 2011-03-30 13:42 UTC
To: linux-raid

On Wed Mar 30, 2011 at 10:08:23AM +0200, Louis-David Mitterrand wrote:
> Hi,
>
> I am seeing horrific performance on a Dell T610 with a LSISAS2008 (Dell
> H200) card and 8 WD1002FAEX Caviar Black 1TB configured in mdadm raid6.
>
> Besides hardware, the other difference between servers is that the
> PE2900's MegaRAID has no JBOD mode so each disk must be configured as a
> "raid0" vdisk unit. On the T610 no configuration was necessary for the
> disks to "appear" in the OS. Would configuring them as raid0 vdisks
> change anything?
>
Several years ago I ran into a similar issue with a SCSI RAID controller
(may have been LSI MegaRAID, I can't recall now) and found that RAID0
vdisks were substantially faster than running it in JBOD mode. This
wasn't just down to the controller disabling caching/optimisations
either - in JBOD mode the performance was well below that on a non-RAID
controller.

Cheers,
    Robin

--
Robin Hill <robin@robinhill.me.uk>
Little Jim says .... "He fallen in de water !!"

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Joe Landman @ 2011-03-30 13:46 UTC
To: linux-raid

On 03/30/2011 04:08 AM, Louis-David Mitterrand wrote:
> Hi,
>
> I am seeing horrific performance on a Dell T610 with a LSISAS2008 (Dell
> H200) card and 8 WD1002FAEX Caviar Black 1TB configured in mdadm raid6.
>
> The LSI card is upgraded to the latest 9.00 firmware:
> http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html
> and the 2.6.38.2 kernel uses the newer mpt2sas driver.
>
> On the T610 this command takes 20 minutes:
>
> tar -I pbzip2 -xvf linux-2.6.37.tar.bz2  22.64s user 3.34s system 2% cpu 20:00.69 total

Get rid of the "v" option. And do a

sync
echo 3 > /proc/sys/vm/drop_caches

before the test. Make sure your file system is local, and not NFS
mounted (this could easily explain the timing BTW). While we are at it,
don't use pbzip2, use single threaded bzip2, as there may be other
platform differences that impact the parallel extraction.

Here is an extraction on a local md based Delta-V unit (we use
internally for backups):

[root@vault t]# /usr/bin/time tar -xf ~/linux-2.6.38.tar.bz2
25.18user 4.08system 1:06.96elapsed 43%CPU (0avgtext+0avgdata 16256maxresident)k
6568inputs+969880outputs (4major+1437minor)pagefaults 0swaps

This also uses an LSI card. On one of our internal file servers using a
hardware RAID:

root@crunch:/data/kernel/2.6.38# /usr/bin/time tar -xf linux-2.6.38.tar.bz2
22.51user 3.73system 0:22.59elapsed 116%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+969872outputs (0major+3565minor)pagefaults 0swaps

Try a similar test on your two units, without the "v" option. Then try
to get useful information about the MD raid, and the file system atop
this.

For our MD raid Delta-V system:

[root@vault t]# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Mon Nov  1 10:38:35 2010
     Raid Level : raid6
     Array Size : 10666968576 (10172.81 GiB 10922.98 GB)
  Used Dev Size : 969724416 (924.80 GiB 993.00 GB)
   Raid Devices : 13
  Total Devices : 14
    Persistence : Superblock is persistent
    Update Time : Wed Mar 30 04:46:35 2011
          State : clean
 Active Devices : 13
Working Devices : 14
 Failed Devices : 0
  Spare Devices : 1
         Layout : left-symmetric
     Chunk Size : 512K
           Name : 2
           UUID : 45ddd631:efd08494:8cd4ff1a:0695567b
         Events : 18280

    Number   Major   Minor   RaidDevice State
       0       8       35        0      active sync   /dev/sdc3
      13       8      227        1      active sync   /dev/sdo3
       2       8       51        2      active sync   /dev/sdd3
       3       8       67        3      active sync   /dev/sde3
       4       8       83        4      active sync   /dev/sdf3
       5       8       99        5      active sync   /dev/sdg3
       6       8      115        6      active sync   /dev/sdh3
       7       8      131        7      active sync   /dev/sdi3
       8       8      147        8      active sync   /dev/sdj3
       9       8      163        9      active sync   /dev/sdk3
      10       8      179       10      active sync   /dev/sdl3
      11       8      195       11      active sync   /dev/sdm3
      12       8      211       12      active sync   /dev/sdn3
      14       8      243        -      spare         /dev/sdp3

[root@vault t]# mount | grep md2
/dev/md2 on /backup type xfs (rw)

[root@vault t]# grep md2 /etc/fstab
/dev/md2    /backup    xfs    defaults    1 2

And a basic speed check on the md device:

[root@vault t]# dd if=/dev/md2 of=/dev/null bs=32k count=32000
32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 3.08236 seconds, 340 MB/s

[root@vault t]# dd if=/dev/zero of=/backup/t/big.file bs=32k count=32000
32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 2.87177 seconds, 365 MB/s

Some 'lspci -vvv' output, and contents of /proc/interrupts,
/proc/cpuinfo, ... would be helpful.

> where on a lower spec'ed Poweredge 2900 III server (LSI Logic MegaRAID
> SAS 1078 + 8 x Hitachi Ultrastar 7K1000 in mdadm raid6) it takes 22
> _seconds_:
>
> tar -I pbzip2 -xvf linux-2.6.37.tar.bz2  16.40s user 3.22s system 86% cpu 22.773 total
>
> Besides hardware, the other difference between servers is that the
> PE2900's MegaRAID has no JBOD mode so each disk must be configured as a
> "raid0" vdisk unit. On the T610 no configuration was necessary for the
> disks to "appear" in the OS. Would configuring them as raid0 vdisks
> change anything?
>
> Thanks in advance for any suggestion,

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
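
A repeatable way to run the cold-cache comparison described above is to
wrap the sync/drop_caches step and the timed extraction in a small
script. This is only a minimal sketch of that procedure: the tarball
path and scratch directory are placeholders, and it must run as root so
the drop_caches write succeeds.

  #!/bin/sh
  # cold_untar.sh - time a bzip2 extraction with the page cache dropped first
  # Usage: ./cold_untar.sh /usr/src/linux-2.6.37.tar.bz2 /mnt/scratch
  set -e
  TARBALL=$1      # kernel tarball to extract (placeholder path)
  SCRATCH=$2      # directory on the filesystem under test (placeholder path)
  cd "$SCRATCH"
  rm -rf linux-2.6.37
  sync
  echo 3 > /proc/sys/vm/drop_caches    # drop page cache, dentries and inodes
  /usr/bin/time tar -xjf "$TARBALL"    # single-threaded bzip2, no -v

Running it two or three times on each server keeps the comparison
cold-cache and apples-to-apples.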

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Louis-David Mitterrand @ 2011-03-30 15:20 UTC
To: linux-raid

On Wed, Mar 30, 2011 at 09:46:29AM -0400, Joe Landman wrote:
> On 03/30/2011 04:08 AM, Louis-David Mitterrand wrote:
> >Hi,
> >
> >I am seeing horrific performance on a Dell T610 with a LSISAS2008 (Dell
> >H200) card and 8 WD1002FAEX Caviar Black 1TB configured in mdadm raid6.
> >
> >The LSI card is upgraded to the latest 9.00 firmware:
> >http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html
> >and the 2.6.38.2 kernel uses the newer mpt2sas driver.
> >
> >On the T610 this command takes 20 minutes:
> >
> > tar -I pbzip2 -xvf linux-2.6.37.tar.bz2  22.64s user 3.34s system 2% cpu 20:00.69 total
>
> Get rid of the "v" option. And do an
>
> sync
> echo 3 > /proc/sys/vm/drop_caches
>
> before the test. Make sure your file system is local, and not NFS
> mounted (this could easily explain the timing BTW).

fs are local on both machines.

> Try a similar test on your two units, without the "v" option. Then

- T610:

tar -xjf linux-2.6.37.tar.bz2  24.09s user 4.36s system 2% cpu 20:30.95 total

- PE2900:

tar -xjf linux-2.6.37.tar.bz2  17.81s user 3.37s system 64% cpu 33.062 total

Still a huge difference.

> try to get useful information about the MD raid, and file system
> atop this.
>
> For our MD raid Delta-V system
>
> [root@vault t]# mdadm --detail /dev/md2

- T610:

/dev/md1:
        Version : 1.2
  Creation Time : Wed Oct 20 21:40:40 2010
     Raid Level : raid6
     Array Size : 841863168 (802.86 GiB 862.07 GB)
  Used Dev Size : 140310528 (133.81 GiB 143.68 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Wed Mar 30 17:11:22 2011
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 512K
           Name : grml:1
           UUID : 1434a46a:f2b751cd:8604803c:b545de8c
         Events : 2532

    Number   Major   Minor   RaidDevice State
       0       8       82        0      active sync   /dev/sdf2
       1       8       50        1      active sync   /dev/sdd2
       2       8        2        2      active sync   /dev/sda2
       3       8       18        3      active sync   /dev/sdb2
       4       8       34        4      active sync   /dev/sdc2
       5       8       66        5      active sync   /dev/sde2
       6       8      114        6      active sync   /dev/sdh2
       7       8       98        7      active sync   /dev/sdg2

- PE2900:

/dev/md1:
        Version : 1.2
  Creation Time : Mon Oct 25 10:17:30 2010
     Raid Level : raid6
     Array Size : 841863168 (802.86 GiB 862.07 GB)
  Used Dev Size : 140310528 (133.81 GiB 143.68 GB)
   Raid Devices : 8
  Total Devices : 8
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Wed Mar 30 17:12:17 2011
          State : active
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 512K
           Name : grml:1
           UUID : 224f5112:b8a3c0d2:49361f8f:abed9c4f
         Events : 1507

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2
       4       8       66        4      active sync   /dev/sde2
       5       8       82        5      active sync   /dev/sdf2
       6       8       98        6      active sync   /dev/sdg2
       7       8      114        7      active sync   /dev/sdh2

> [root@vault t]# mount | grep md2

- T610:

/dev/mapper/cmd1 on / type xfs (rw,inode64,delaylog,logbsize=262144)

- PE2900:

/dev/mapper/cmd1 on / type xfs (rw,inode64,delaylog,logbsize=262144)

> [root@vault t]# grep md2 /etc/fstab

- T610:

/dev/mapper/cmd1  /  xfs  defaults,inode64,delaylog,logbsize=262144  0  0

- PE2900:

/dev/mapper/cmd1  /  xfs  defaults,inode64,delaylog,logbsize=262144  0  0

> [root@vault t]# dd if=/dev/md2 of=/dev/null bs=32k count=32000

- T610:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 1.70421 s, 615 MB/s

- PE2900:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 2.02322 s, 518 MB/s

> [root@vault t]# dd if=/dev/zero of=/backup/t/big.file bs=32k count=32000

- T610:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 0.870001 s, 1.2 GB/s

- PE2900:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 9.11934 s, 115 MB/s

> Some 'lspci -vvv' output,

- T610:

02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
    Subsystem: Dell PERC H200 Integrated
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 41
    Region 0: I/O ports at fc00 [size=256]
    Region 1: Memory at df2b0000 (64-bit, non-prefetchable) [size=64K]
    Region 3: Memory at df2c0000 (64-bit, non-prefetchable) [size=256K]
    Expansion ROM at df100000 [disabled] [size=1M]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Latency L0 <64ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range BC, TimeoutDis+
        DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB
    Capabilities: [d0] Vital Product Data
        Unknown small resource type 00, will not decode more.
    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [c0] MSI-X: Enable- Count=15 Masked-
        Vector table: BAR=1 offset=0000e000
        PBA: BAR=1 offset=0000f800
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [138 v1] Power Budgeting <?>
    Kernel driver in use: mpt2sas

- PE2900:

01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
    Subsystem: Dell PERC 6/i Integrated RAID Controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 16
    Region 0: Memory at fc480000 (64-bit, non-prefetchable) [size=256K]
    Region 2: I/O ports at ec00 [size=256]
    Region 3: Memory at fc440000 (64-bit, non-prefetchable) [size=256K]
    Expansion ROM at fc300000 [disabled] [size=32K]
pcilib: sysfs_read_vpd: read failed: Connection timed out
    Capabilities: [b0] Express (v1) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 256 bytes, MaxReadReq 2048 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 <2us, L1 unlimited
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
    Capabilities: [c4] MSI: Enable- Count=1/4 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [d4] MSI-X: Enable- Count=4 Masked-
        Vector table: BAR=0 offset=0003e000
        PBA: BAR=0 offset=00fff000
    Capabilities: [e0] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [ec] Vital Product Data
        Not readable
    Capabilities: [100 v1] Power Budgeting <?>
    Kernel driver in use: megaraid_sas

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Joe Landman @ 2011-03-30 16:12 UTC
To: linux-raid

On 03/30/2011 11:20 AM, Louis-David Mitterrand wrote:
> On Wed, Mar 30, 2011 at 09:46:29AM -0400, Joe Landman wrote:

[...]

>> Try a similar test on your two units, without the "v" option. Then
>
> - T610:
>
> tar -xjf linux-2.6.37.tar.bz2  24.09s user 4.36s system 2% cpu 20:30.95 total
>
> - PE2900:
>
> tar -xjf linux-2.6.37.tar.bz2  17.81s user 3.37s system 64% cpu 33.062 total
>
> Still a huge difference.

The wallclock gives you a huge difference. The user and system times
are quite similar.

[...]

> - T610:
>
> /dev/mapper/cmd1 on / type xfs (rw,inode64,delaylog,logbsize=262144)
>
> - PE2900:
>
> /dev/mapper/cmd1 on / type xfs (rw,inode64,delaylog,logbsize=262144)

Hmmm. You are layering an LVM atop the raid? Your raids are /dev/md1.
How is /dev/mapper/cmd1 related to /dev/md1?

[...]

>> [root@vault t]# dd if=/dev/md2 of=/dev/null bs=32k count=32000
>
> - T610:
>
> 32000+0 records in
> 32000+0 records out
> 1048576000 bytes (1.0 GB) copied, 1.70421 s, 615 MB/s
>
> - PE2900:
>
> 32000+0 records in
> 32000+0 records out
> 1048576000 bytes (1.0 GB) copied, 2.02322 s, 518 MB/s

Raw reads from the MD device. For completeness, you should also do

dd if=/dev/mapper/cmd1 of=/dev/null bs=32k count=32000

and

dd if=/backup/t/big.file of=/dev/null bs=32k count=32000

to see if there is a sudden loss of performance at some level.

>> [root@vault t]# dd if=/dev/zero of=/backup/t/big.file bs=32k count=32000
>
> - T610:
>
> 32000+0 records in
> 32000+0 records out
> 1048576000 bytes (1.0 GB) copied, 0.870001 s, 1.2 GB/s
>
> - PE2900:
>
> 32000+0 records in
> 32000+0 records out
> 1048576000 bytes (1.0 GB) copied, 9.11934 s, 115 MB/s

Ahhh ... look at that. Cached write is very different between the two.
An order of magnitude. You could also try a direct (noncached) write,
using oflag=direct at the end of the line. This could be useful, though
direct IO isn't terribly fast on MD raids.

If we can get the other dd's indicated, we might have a better sense of
which layer is causing the issue. It might not be MD.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
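
One caveat on the read numbers that follow in the next message: without
dropping the page cache between runs, reads from /dev/mapper/cmd1 or
from a recently written file can be served largely from RAM, so
multi-GB/s figures on spinning disks usually indicate a cached read
rather than disk throughput. A sketch of the layer-by-layer read test
with a cold cache before each run (the device and file names are the
ones used in this thread; substitute whatever matches the machine under
test):

  #!/bin/sh
  # Read 1 GB from each layer of the stack, cold, to see where throughput drops.
  for SRC in /dev/md1 /dev/mapper/cmd1 /backup/t/big.file; do
      sync
      echo 3 > /proc/sys/vm/drop_caches
      echo "=== $SRC ==="
      dd if="$SRC" of=/dev/null bs=32k count=32000
  done

Adding iflag=direct to the dd line is another way to bypass the cache
for the block-device reads.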

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Louis-David Mitterrand @ 2011-03-31 9:32 UTC
To: linux-raid

On Wed, Mar 30, 2011 at 12:12:14PM -0400, Joe Landman wrote:
>
> >- T610:
> >
> >/dev/mapper/cmd1 on / type xfs (rw,inode64,delaylog,logbsize=262144)
> >
> >- PE2900:
> >
> >/dev/mapper/cmd1 on / type xfs (rw,inode64,delaylog,logbsize=262144)
>
> Hmmm. You are layering an LVM atop the raid? Your raids are
> /dev/md1. How is /dev/mapper/cmd1 related to /dev/md1?

It's a dm-crypt (luks) layer. No lvm here.

> [...]
>
> >>[root@vault t]# dd if=/dev/md2 of=/dev/null bs=32k count=32000
> >
> >- T610:
> >
> >32000+0 records in
> >32000+0 records out
> >1048576000 bytes (1.0 GB) copied, 1.70421 s, 615 MB/s
> >
> >- PE2900:
> >
> >32000+0 records in
> >32000+0 records out
> >1048576000 bytes (1.0 GB) copied, 2.02322 s, 518 MB/s
>
> Raw reads from the MD device. For completeness, you should also do
>
> dd if=/dev/mapper/cmd1 of=/dev/null bs=32k count=32000

- T610:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 0.21472 s, 4.9 GB/s

- PE2900:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 0.396733 s, 2.6 GB/s

> and
>
> dd if=/backup/t/big.file of=/dev/null bs=32k count=32000
>
> to see if there is a sudden loss of performance at some level.

- T610:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 0.251609 s, 4.2 GB/s

- PE2900:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 1.70794 s, 614 MB/s

> >>[root@vault t]# dd if=/dev/zero of=/backup/t/big.file bs=32k count=32000
> >
> >- T610:
> >
> >32000+0 records in
> >32000+0 records out
> >1048576000 bytes (1.0 GB) copied, 0.870001 s, 1.2 GB/s
> >
> >- PE2900:
> >
> >32000+0 records in
> >32000+0 records out
> >1048576000 bytes (1.0 GB) copied, 9.11934 s, 115 MB/s
>
> Ahhh ... look at that. Cached write is very different between the
> two. An order of magnitude. You could also try a direct
> (noncached) write, using oflag=direct at the end of the line. This
> could be useful, though direct IO isn't terribly fast on MD raids.

- T610:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 316.461 s, 3.3 MB/s

- PE2900:

32000+0 records in
32000+0 records out
1048576000 bytes (1.0 GB) copied, 262.569 s, 4.0 MB/s

> If we can get the other dd's indicated, we might have a better sense
> of which layer is causing the issue. It might not be MD.

Thanks for your help,
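
For context on those oflag=direct numbers: with a 512K chunk and an
8-drive raid6 (6 data disks), the full stripe is 3 MiB, so 32k direct
writes are far below stripe size and most likely force a parity
read-modify-write for every block on both machines, on top of the
dm-crypt overhead. A follow-up test worth trying (a sketch, reusing the
big.file path from the earlier commands) is a direct write with the
block size raised to the full stripe; if throughput then climbs well
above a few MB/s, the slowness in this particular test is sub-stripe
write overhead rather than the controller or disks.

  # 3 MiB = 512K chunk x 6 data disks; full-stripe direct writes avoid
  # most read-modify-write cycles in md raid6
  dd if=/dev/zero of=/backup/t/big.file bs=3M count=1000 oflag=direct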

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Louis-David Mitterrand @ 2011-04-19 11:04 UTC
To: linux-raid

On Wed, Mar 30, 2011 at 12:12:14PM -0400, Joe Landman wrote:
> >- T610:
> >
> >32000+0 records in
> >32000+0 records out
> >1048576000 bytes (1.0 GB) copied, 0.870001 s, 1.2 GB/s
> >
> >- PE2900:
> >
> >32000+0 records in
> >32000+0 records out
> >1048576000 bytes (1.0 GB) copied, 9.11934 s, 115 MB/s
>
> Ahhh ... look at that. Cached write is very different between the
> two. An order of magnitude. You could also try a direct
> (noncached) write, using oflag=direct at the end of the line. This
> could be useful, though direct IO isn't terribly fast on MD raids.
>
> If we can get the other dd's indicated, we might have a better sense
> of which layer is causing the issue. It might not be MD.

Hi,

FWIW I removed a disk from the raid6 array, formatted a partition as
xfs and mounted it to test write speed on a single device:

- T610 (LSI Logic SAS2008 + 8 x WDC Caviar Black WD1002FAEX):

ZENON:/mnt/test# time tar -xjf /usr/src/linux-2.6.37.tar.bz2
tar -xjf /usr/src/linux-2.6.37.tar.bz2  21.89s user 3.93s system 44% cpu 58.115 total

ZENON:/mnt/test# time rm linux-2.6.37 -rf
rm -i linux-2.6.37 -rf  0.07s user 2.29s system 1% cpu 2:28.32 total

- PE2900 (MegaRAID SAS 1078 + 8 x Hitachi Ultrastar 7K1000):

PYRRHUS:/mnt/test# time tar -xjf /usr/src/linux-2.6.37.tar.bz2
tar -xjf /usr/src/linux-2.6.37.tar.bz2  18.00s user 3.20s system 114% cpu 18.537 total

PYRRHUS:/mnt/test# time rm linux-2.6.37 -rf
rm -i linux-2.6.37 -rf  0.03s user 1.68s system 63% cpu 2.665 total

This would mean that the problem really lies with the controller or the
disks, not the raid6 array.
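
Given that a single WD drive behind the H200 is still several times
slower on the untar and dramatically slower on the rm, two quick checks
that take the filesystem out of the picture are a raw direct-I/O
sequential read from one member disk on each box, and a look at whether
the drives' volatile write cache is enabled, since a disabled write
cache is a common cause of very slow metadata-heavy workloads like an
untar. This is only a suggested sketch: /dev/sdX is a placeholder for
one member disk, the read is non-destructive, and hdparm/sdparm may
need the HBA to pass ATA commands through to the SATA drives.

  # raw, uncached sequential read from one member disk (non-destructive)
  dd if=/dev/sdX of=/dev/null bs=1M count=1000 iflag=direct

  # report (not change) the drive's write-cache setting, ATA and SCSI views
  hdparm -W /dev/sdX
  sdparm --get=WCE /dev/sdX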

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Iordan Iordanov @ 2011-03-30 19:26 UTC
To: linux-raid

In case this helps focus minds, we had horrendous performance on a
RAID10 of 30 x 1TB disks with LVM on top of it (1MB chunks), so we
ended up doing RAID1 mirrors with LVM striping on top in order to
alleviate it. We were not in a position to try to debug why RAID10
would not play nice with LVM, since deadlines were looming...

Cheers,
Iordan

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Michael Tokarev @ 2011-03-31 7:11 UTC
To: linux-raid

30.03.2011 19:20, Louis-David Mitterrand wrote:
[]
> /dev/mapper/cmd1  /  xfs  defaults,inode64,delaylog,logbsize=262144  0  0
> /dev/mapper/cmd1  /  xfs  defaults,inode64,delaylog,logbsize=262144  0  0
>
>> [root@vault t]# dd if=/dev/zero of=/backup/t/big.file bs=32k count=32000
>
> 1048576000 bytes (1.0 GB) copied, 0.870001 s, 1.2 GB/s
> 1048576000 bytes (1.0 GB) copied, 9.11934 s, 115 MB/s

Are you sure your LVM volumes are aligned to the raid stripe size
correctly? Proper alignment for raid levels with parity is _critical_,
and with an 8-drive raid6 you'll have 6 data disks in each stripe, but
since LVM can only align volumes to a power of two, you'll have 2
unaligned volumes after each aligned one... Verify that the volume
starts at a raid stripe boundary, maybe create a new volume and recheck
-- there should be quite a dramatic speed difference like the above
when you change from aligned to misaligned.

/mjt

* Re: raid6 + caviar black + mpt2sas horrific performance
From: Louis-David Mitterrand @ 2011-03-31 9:35 UTC
To: linux-raid

On Thu, Mar 31, 2011 at 11:11:48AM +0400, Michael Tokarev wrote:
> 30.03.2011 19:20, Louis-David Mitterrand wrote:
> []
> > /dev/mapper/cmd1  /  xfs  defaults,inode64,delaylog,logbsize=262144  0  0
> > /dev/mapper/cmd1  /  xfs  defaults,inode64,delaylog,logbsize=262144  0  0
> >
> >> [root@vault t]# dd if=/dev/zero of=/backup/t/big.file bs=32k count=32000
> >
> > 1048576000 bytes (1.0 GB) copied, 0.870001 s, 1.2 GB/s
> > 1048576000 bytes (1.0 GB) copied, 9.11934 s, 115 MB/s
>
> Are you sure your LVM volumes are aligned to the raid stripe size
> correctly? Proper alignment for raid levels with parity is _critical_,
> and with an 8-drive raid6 you'll have 6 data disks in each stripe, but
> since LVM can only align volumes to a power of two, you'll have 2
> unaligned volumes after each aligned one... Verify that the volume
> starts at a raid stripe boundary, maybe create a new volume and recheck
> -- there should be quite a dramatic speed difference like the above
> when you change from aligned to misaligned.

Hi,

I am not using LVM, just a dm-crypt layer. Does this change anything
with regard to alignment issues?

Thanks,
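
On the alignment question: with dm-crypt/LUKS there is no LVM extent
layout to worry about, but the encrypted data area starts at the
payload offset recorded in the LUKS header, so the same check applies,
i.e. that offset (times 512 bytes) should land on a chunk boundary and
ideally on a full-stripe boundary (512 KiB chunk x 6 data disks = 3 MiB
here). A sketch of how to read those numbers off the systems in this
thread, assuming the LUKS container sits directly on /dev/md1:

  # chunk size of the array (reported as 512K earlier in the thread)
  mdadm --detail /dev/md1 | grep 'Chunk Size'

  # start of the LUKS data area, in 512-byte sectors
  cryptsetup luksDump /dev/md1 | grep -i 'payload offset'

  # the data area begins at <payload offset> * 512 bytes into md1; for
  # raid6-friendly writes that value should be a multiple of the 512 KiB
  # chunk, and ideally of the 3 MiB stripe.  The member partitions (sdX2)
  # and the md data offset (reported by 'mdadm -E /dev/sdX2') should
  # likewise start on sensible boundaries; compare their start sectors
  # from 'fdisk -lu /dev/sdX' against the chunk size.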