linux-raid.vger.kernel.org archive mirror
* help with bad performing raid6
@ 2009-07-27 19:19 Jon Nelson
  2009-07-27 20:01 ` Robin Hill
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Jon Nelson @ 2009-07-27 19:19 UTC (permalink / raw)
  To: LinuxRaid

I have a raid6 which is exposed via LVM (and parts of which are, in
turn, exposed via NFS) and I'm having some really bad performance
issues, primarily with large files. I'm not sure where the blame lies.
When performance is bad "load" on the server is insanely high even
though it's not doing anything except for the raid6 (it's otherwise
quiescent) and NFS (to typically just one client).

This is a home machine, but it has an AMD Athlon X2 3600+ and 4 fast SATA disks.

When I say "bad performance" I mean writes that vary down to 100KB/s
or less, as reported by rsync. The "average" end-to-end speed for
writing large (500MB to 5GB) files hovers around 3-4MB/s. This is over
100 MBit.

Oftentimes while stracing rsync I will see it go more than a minute,
sometimes well in excess of that, without making a single system call.
If I look at the load on the server the top process is
md0_raid5 (the raid6 process for md0, despite the raid5 in the name).
The load hovers around 8 or 9 at this time.

Even during this period of high load, actual disk I/O is fairly low.
I can get 70-80MB/s out of the actual underlying disks the entire time.
Uncached.

vmstat reports up to 20MB/s writes (this is expected given 100Mbit and
raid6) but most of the time it hovers between 2 and 6 MB/s.
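(The arithmetic behind that expectation: 100 Mbit/s is roughly 12MB/s of
payload, and on a 4-disk raid6 a full stripe is written as two data chunks
plus two parity chunks, so ~12MB/s of incoming data becomes roughly 24MB/s
of raw disk writes.)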

What can I do to improve performance?
Why is the load so high on the server? The client is absolutely bored
(load less than 0.5 most of the time).
Is this some weird interaction between NFS(v3) and software raid6?

My raid6 config:

/dev/md0:
        Version : 1.01
  Creation Time : Mon Feb  9 20:56:40 2009
     Raid Level : raid6
     Array Size : 613409536 (584.99 GiB 628.13 GB)
  Used Dev Size : 306704768 (292.50 GiB 314.07 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Mon Jul 27 14:18:55 2009
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K


-- 
Jon

* Re: help with bad performing raid6
  2009-07-27 19:19 help with bad performing raid6 Jon Nelson
@ 2009-07-27 20:01 ` Robin Hill
  2009-07-27 20:03   ` Jon Nelson
  2009-07-27 20:44 ` John Robinson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Robin Hill @ 2009-07-27 20:01 UTC (permalink / raw)
  To: LinuxRaid

On Mon Jul 27, 2009 at 02:19:42PM -0500, Jon Nelson wrote:

> I have a raid6 which is exposed via LVM (and parts of which are, in
> turn, exposed via NFS) and I'm having some really bad performance
> issues, primarily with large files. I'm not sure where the blame lies.
> When performance is bad "load" on the server is insanely high even
> though it's not doing anything except for the raid6 (it's otherwise
> quiescent) and NFS (to typically just one client).
> 
Have you checked dmesg for disk errors?  I've had similar slowdowns when
there's a bad block on the drive - it sits waiting for the retries to time
out (or the bus to reset) and the read to fail.
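
Something along these lines should show any retries, resets or remapped
sectors (smartctl is from smartmontools, and the device name is only an
example):

  dmesg | egrep -i 'error|reset|timeout'
  smartctl -A /dev/sda | egrep -i 'Realloc|Pending|Uncorrect'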

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |


* Re: help with bad performing raid6
  2009-07-27 20:01 ` Robin Hill
@ 2009-07-27 20:03   ` Jon Nelson
  0 siblings, 0 replies; 14+ messages in thread
From: Jon Nelson @ 2009-07-27 20:03 UTC (permalink / raw)
  To: LinuxRaid

On Mon, Jul 27, 2009 at 3:01 PM, Robin Hill<robin@robinhill.me.uk> wrote:
> On Mon Jul 27, 2009 at 02:19:42PM -0500, Jon Nelson wrote:
>
>> I have a raid6 which is exposed via LVM (and parts of which are, in
>> turn, exposed via NFS) and I'm having some really bad performance
>> issues, primarily with large files. I'm not sure where the blame lies.
>> When performance is bad "load" on the server is insanely high even
>> though it's not doing anything except for the raid6 (it's otherwise
>> quiescent) and NFS (to typically just one client).
>>
> Have you checked dmesg for disk errors?  I've had similar slowdowns when
> there's a bad block on the drive - it's sitting waiting for the retries
> to timeout (or the bus to reset) and the read to fail.

Yes. No bad blocks are being reported.
I also run a "long" SMART test weekly, and a short one daily.
The SMART values look great.

-- 
Jon

* Re: help with bad performing raid6
  2009-07-27 19:19 help with bad performing raid6 Jon Nelson
  2009-07-27 20:01 ` Robin Hill
@ 2009-07-27 20:44 ` John Robinson
  2009-07-29 15:08 ` Bill Davidsen
  2009-07-30 21:09 ` David Rees
  3 siblings, 0 replies; 14+ messages in thread
From: John Robinson @ 2009-07-27 20:44 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

On 27/07/2009 20:19, Jon Nelson wrote:
> I have a raid6 which is exposed via LVM (and parts of which are, in
> turn, exposed via NFS) and I'm having some really bad performance
> issues
[...]

Probably the first thing to do is to isolate the problem. First up, I'd
say run bonnie++ against the filesystem locally; that way you can eliminate
the possibility of NFS or networking causing problems. Next, have you
any spare space on the LVM VG with which you could try another filesystem?
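
For instance (flags from memory, and the path is only an example - use a
test size of at least twice your RAM so the page cache doesn't hide the
result):

  bonnie++ -d /mnt/raid/test -s 8192 -u nobody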

> If I look at the load on the server the top process is
> md0_raid5 (the raid6 process for md0, despite the raid5 in the name).
> The load hovers around 8 or 9 at this time.

You say it's the top process, but is it using any CPU, or just blocked 
waiting for the discs?
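
(A rough way to tell: a task blocked on I/O sits in state "D", one burning
CPU in state "R" - e.g.

  ps -eo state,pid,comm | egrep 'md0|nfsd'
  vmstat 5        # high "wa" = waiting on discs, high "sy"/"us" = CPU-bound

- just the usual rule of thumb.)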

Cheers,

John.


* Re: help with bad performing raid6
  2009-07-27 19:19 help with bad performing raid6 Jon Nelson
  2009-07-27 20:01 ` Robin Hill
  2009-07-27 20:44 ` John Robinson
@ 2009-07-29 15:08 ` Bill Davidsen
  2009-07-29 15:57   ` Jon Nelson
  2009-07-29 16:06   ` Steven Haigh
  2009-07-30 21:09 ` David Rees
  3 siblings, 2 replies; 14+ messages in thread
From: Bill Davidsen @ 2009-07-29 15:08 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

Jon Nelson wrote:
> I have a raid6 which is exposed via LVM (and parts of which are, in
> turn, exposed via NFS) and I'm having some really bad performance
> issues, primarily with large files. I'm not sure where the blame lies.
> When performance is bad "load" on the server is insanely high even
> though it's not doing anything except for the raid6 (it's otherwise
> quiescent) and NFS (to typically just one client).
>
> This is a home machine, but it has an AMD Athlon X2 3600+ and 4 fast SATA disks.
>
> When I say "bad performance" I mean writes that vary down to 100KB/s
> or less, as reported by rsync. The "average" end-to-end speed for
> writing large (500MB to 5GB) files hovers around 3-4MB/s. This is over
> 100 MBit.
>
> Often times while stracing rsync I will see rsync not make a single
> system call for sometimes more than a minute. Sometimes well in excess
> of that. If I look at the load on the server the top process is
> md0_raid5 (the raid6 process for md0, despite the raid5 in the name).
> The load hovers around 8 or 9 at this time.
>
>   
I really suspect disk errors; I assume there's nothing in /var/log/messages?

> Even during this period of high load, actual disk I/O is fairly low.
> I can get 70-80MB/s out of the actual underlying disks the entire time.
> Uncached.
>
> vmstat reports up to 20MB/s writes (this is expected given 100Mbit and
> raid6) but most of the time it hovers between 2 and 6 MB/s.
>   

Perhaps iostat looking at the underlying drives would tell you 
something. You might also run iostat with a test write load to see if 
something is unusual:
  dd if=/dev/zero bs=1024k count=1024k of=BigJunk.File conv=fdatasync
and just see if iostat or vmstat or /var/log/messages tells you
something. Of course if it runs like a bat out of hell, it tells you the
problem is elsewhere.
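
Something like this in a second terminal while the dd runs would show whether
one drive is lagging behind the others (device names are only examples):

  iostat -x -k /dev/sd[a-d] 5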

Other possible causes are a poor chunk size, bad alignment of the whole
filesystem, and many other things too ugly to name. The fact that you
use LVM makes alignment issues more likely (in the sense of "one more
level which could mess up"). Have you checked the error count on the array?

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.



* Re: help with bad performing raid6
  2009-07-29 15:08 ` Bill Davidsen
@ 2009-07-29 15:57   ` Jon Nelson
  2009-07-29 16:06   ` Steven Haigh
  1 sibling, 0 replies; 14+ messages in thread
From: Jon Nelson @ 2009-07-29 15:57 UTC (permalink / raw)
  Cc: LinuxRaid

On Wed, Jul 29, 2009 at 10:08 AM, Bill Davidsen<davidsen@tmr.com> wrote:
> Jon Nelson wrote:
..

>> When I say "bad performance" I mean writes that vary down to 100KB/s
>> or less, as reported by rsync. The "average" end-to-end speed for
>> writing large (500MB to 5GB) files hovers around 3-4MB/s. This is over
>> 100 MBit.
>>
>> Often times while stracing rsync I will see rsync not make a single
>> system call for sometimes more than a minute. Sometimes well in excess
>> of that. If I look at the load on the server the top process is
>> md0_raid5 (the raid6 process for md0, despite the raid5 in the name).
>> The load hovers around 8 or 9 at this time.
>>
>>
>
> I really suspect disk errors, I assume nothing in /var/log/messages?

Nope. Nothing in /v/l/m. I'm rather strongly beginning to suspect some
sort of weird NFS issue.

> Perhaps iostat looking at the underlying drives would tell you something.

> You might also run iostat with a test write load to see if something is
> unusual:
>  dd if=/dev/zero bs=1024k count=1024k of=BigJunk.File conv=fdatasync

During this test, vmstat reports blocks out ranging from (infrequent) lows
of 25000 up to about 70000. The values seem to hover in the mid-60K range
(65MB/s, give or take). That seems very reasonable.

> Of course if it runs like a bat out hell, it tells you the problem is
> elsewhere.

> Other possible causes are a poor chunk size, bad alignment of the whole
> filesystem, and many other things too ugly to name. The fact that you use
> LVM make alignment issue more likely (in the sense of "one more level which
> could mess up"). Checked the error count on the array?

Well, since I can write some 25-30MB/s (actual underlying I/O much
higher, obviously) to the same filesystem, and load hovers around 2.5,
I suspect some weird NFS issue.

The md0_raid5 process is in state R or S most of the time, using about 30%
of the CPU.

Summary: writing large files over NFS causes huge load and really
awful performance. Writing similarly large files directly (same
underlying filesystem, ext3) performs as expected without huge load.
Therefore, I am going to assume this is an NFS issue. I've had more than
my fair share of NFS issues lately. :-(

PS. I'm running the stock openSUSE 2.6.27.25 kernel. I just checked, and it
does not appear to have the "NFS packet storm" patches; that issue seems to
make 2.6.27.x NFS performance really suck.

Sorry for wasting everybody's time.

-- 
Jon

* RE: help with bad performing raid6
  2009-07-29 15:08 ` Bill Davidsen
  2009-07-29 15:57   ` Jon Nelson
@ 2009-07-29 16:06   ` Steven Haigh
  2009-07-30 13:15     ` Bill Davidsen
  1 sibling, 1 reply; 14+ messages in thread
From: Steven Haigh @ 2009-07-29 16:06 UTC (permalink / raw)
  To: 'Jon Nelson'; +Cc: 'LinuxRaid'

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Bill Davidsen
> Sent: Thursday, 30 July 2009 1:09 AM
> To: Jon Nelson
> Cc: LinuxRaid
> Subject: Re: help with bad performing raid6
> 
> Jon Nelson wrote:
> > I have a raid6 which is exposed via LVM (and parts of which are, in
> > turn, exposed via NFS) and I'm having some really bad performance
> > issues, primarily with large files. I'm not sure where the blame lies.
> > When performance is bad "load" on the server is insanely high even
> > though it's not doing anything except for the raid6 (it's otherwise
> > quiescent) and NFS (to typically just one client).
> >
> > This is a home machine, but it has an AMD Athlon X2 3600+ and 4 fast
> > SATA disks.
> >
> > When I say "bad performance" I mean writes that vary down to 100KB/s
> > or less, as reported by rsync. The "average" end-to-end speed for
> > writing large (500MB to 5GB) files hovers around 3-4MB/s. This is
> > over 100 MBit.
> >
> > Often times while stracing rsync I will see rsync not make a single
> > system call for sometimes more than a minute. Sometimes well in
> > excess of that. If I look at the load on the server the top process is
> > md0_raid5 (the raid6 process for md0, despite the raid5 in the name).
> > The load hovers around 8 or 9 at this time.
> >
> I really suspect disk errors, I assume nothing in /var/log/messages?
>
> > Even during this period of high load, actual disk I/O is fairly low.
> > I can get 70-80MB/s out of the actual underlying disks the entire
> > time. Uncached.
> >
> > vmstat reports up to 20MB/s writes (this is expected given 100Mbit
> > and raid6) but most of the time it hovers between 2 and 6 MB/s.
> >
> 
> Perhaps iostat looking at the underlying drives would tell you
> something. You might also run iostat with a test write load to see if
> something is unusual:
>   dd if=/dev/zero bs=1024k count=1024k of=BigJunk.File conv=fdatasync
> and just see if iostat or vmstat or /var/log/messages tells you
> something. Of course if it runs like a bat out hell, it tells you the
> problem is elsewhere.
> 
> Other possible causes are a poor chunk size, bad alignment of the whole
> filesystem, and many other things too ugly to name. The fact that you
> use LVM make alignment issue more likely (in the sense of "one more
> level which could mess up"). Checked the error count on the array?

Keep in mind that CPU/memory throughput may also be the bottleneck...

I have been debugging an issue with my 5 SATA disk RAID5 system running on a
P4 3GHz CPU. It's an older style machine with DDR400 RAM and a socket 472(?)
era CPU. Many, many tests were done on this setup.

For example, read speeds of a single drive, I get:
# dd if=/dev/sdc of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 15.3425 seconds, 68.3 MB/s

Then when reading from the RAID5, I get:
# dd if=/dev/md0 of=/dev/null bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 14.2457 seconds, 73.6 MB/s

Not a huge increase, but this is where things become interesting. Write
speeds are another story entirely: raw writes to an individual drive can top
50MB/sec, but when the drives are put together in a RAID5 I was maxing out at
30MB/sec. As soon as the host's RAM buffers filled up, things got ugly.
Upgrading to a 3.2GHz CPU gave me a slight performance increase, to between
35 and 40MB/sec writes.

I tried many, many combinations of drives to controllers, kernel versions,
chunk sizes, filesystems and more - yet I couldn't get things any faster.

As an example, here is an output of iostat during the command suggested
above:

$ iostat -m /dev/sd[c-g] /dev/md1 10
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.30    0.00   14.99   46.68    0.00   38.03

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdc              53.40         0.93         8.31          9         83
sdd              86.90         1.14         8.54         11         85
sde              86.80         1.20         8.50         11         85
sdf              98.80         0.98         8.31          9         83
sdg              95.00         1.04         8.23         10         82
md1             311.00         0.09        33.25          0        332

As you can see, this is much less than what a single drive can sustain - but
in my case, it seemed to be a CPU/RAM bottleneck. This may be the exact same
cause as yours.
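
One md-specific knob I haven't seen mentioned in this thread, and which is
cheap to rule out for raid5/raid6 write speed, is the stripe cache (the
default of 256 is quite small). Purely as an illustration, using my md1 as
the example device:

  cat /sys/block/md1/md/stripe_cache_size
  echo 4096 > /sys/block/md1/md/stripe_cache_size   # uses pages * 4K * nr_disks of RAM

It may or may not help in your case, but it's easy to test and to revert.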

Oh, and for the record, here's the mdadm output:
# mdadm --detail /dev/md1
/dev/md1:
        Version : 01.02.03
  Creation Time : Sat Jun 20 17:42:09 2009
     Raid Level : raid5
     Array Size : 1172132864 (1117.83 GiB 1200.26 GB)
  Used Dev Size : 586066432 (279.46 GiB 300.07 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Jul 30 02:03:50 2009
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 256K

           Name : 1
           UUID : 170a984d:2fc1bc57:77b053cf:7b42d9e8
         Events : 3086

    Number   Major   Minor   RaidDevice State
       0       8       33        0      active sync   /dev/sdc1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       8       81        3      active sync   /dev/sdf1
       5       8       97        4      active sync   /dev/sdg1

--
Steven Haigh

Email: netwiz@crc.id.au
Web: http://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897


* Re: help with bad performing raid6
  2009-07-29 16:06   ` Steven Haigh
@ 2009-07-30 13:15     ` Bill Davidsen
  2009-07-30 20:30       ` John Stoffel
  0 siblings, 1 reply; 14+ messages in thread
From: Bill Davidsen @ 2009-07-30 13:15 UTC (permalink / raw)
  To: Steven Haigh; +Cc: 'Jon Nelson', 'LinuxRaid'

Steven Haigh wrote:
>> -----Original Message-----
>> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
>> owner@vger.kernel.org] On Behalf Of Bill Davidsen
>> Sent: Thursday, 30 July 2009 1:09 AM
>> To: Jon Nelson
>> Cc: LinuxRaid
>> Subject: Re: help with bad performing raid6
>>
>> Jon Nelson wrote:
>>> I have a raid6 which is exposed via LVM (and parts of which are, in
>>> turn, exposed via NFS) and I'm having some really bad performance
>>> issues, primarily with large files. I'm not sure where the blame lies.
>>> When performance is bad "load" on the server is insanely high even
>>> though it's not doing anything except for the raid6 (it's otherwise
>>> quiescent) and NFS (to typically just one client).
>>>
>>> This is a home machine, but it has an AMD Athlon X2 3600+ and 4 fast
>>> SATA disks.
>>>
>>> When I say "bad performance" I mean writes that vary down to 100KB/s
>>> or less, as reported by rsync. The "average" end-to-end speed for
>>> writing large (500MB to 5GB) files hovers around 3-4MB/s. This is
>>> over 100 MBit.
>>>
>>> Often times while stracing rsync I will see rsync not make a single
>>> system call for sometimes more than a minute. Sometimes well in
>>> excess of that. If I look at the load on the server the top process is
>>> md0_raid5 (the raid6 process for md0, despite the raid5 in the name).
>>> The load hovers around 8 or 9 at this time.
>>>
>> I really suspect disk errors, I assume nothing in /var/log/messages?
>>
>>> Even during this period of high load, actual disk I/O is fairly low.
>>> I can get 70-80MB/s out of the actual underlying disks the entire
>>> time. Uncached.
>>>
>>> vmstat reports up to 20MB/s writes (this is expected given 100Mbit
>>> and raid6) but most of the time it hovers between 2 and 6 MB/s.
>>>
>> Perhaps iostat looking at the underlying drives would tell you
>> something. You might also run iostat with a test write load to see if
>> something is unusual:
>>   dd if=/dev/zero bs=1024k count=1024k of=BigJunk.File conv=fdatasync
>> and just see if iostat or vmstat or /var/log/messages tells you
>> something. Of course if it runs like a bat out hell, it tells you the
>> problem is elsewhere.
>>
>> Other possible causes are a poor chunk size, bad alignment of the whole
>> filesystem, and many other things too ugly to name. The fact that you
>> use LVM make alignment issue more likely (in the sense of "one more
>> level which could mess up"). Checked the error count on the array?
>>
>
> Keep in mind it may also be CPU/memory throughput as a bottleneck...
>
> I have been debugging an issue with my 5 SATA disk RAID5 system running on a
> P4 3Ghz CPU. It's an older style machine with DDR400 RAM and a socket 472(?)
> age CPU. Many, many tests were done on this setup
>
> For example, read speeds of a single drive, I get:
> # dd if=/dev/sdc of=/dev/null bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 15.3425 seconds, 68.3 MB/s
>
> Then when reading from the RAID5, I get:
> # dd if=/dev/md0 of=/dev/null bs=1M count=1000
> 1000+0 records in
> 1000+0 records out
> 1048576000 bytes (1.0 GB) copied, 14.2457 seconds, 73.6 MB/s
>
> Not a huge increase, but this is where things become interesting. Write
> speeds are a complete new thing - as raw writes to the individual drive can
> top 50MB/sec. When put together in a RAID5, I was maxing out at 30MB/sec. As
> soon as the hosts RAM buffers filled up, things got ugly. Upgrading the CPU
> to a 3.2Ghz CPU gave me a slight performance increase to between 35-40MB/sec
> writes.
>
> I tried many, many combinations of drives to controllers, kernel versions,
> chunk sizes, filesystems and more - yet I couldn't get things any faster.
>   

I have done some ad-hoc tests related to using the stride-size and 
stripe-width features of ext2. I'm not ready to give guidance on this 
yet, but I have seen some significant improvement (and degradation) 
using these. If you use ext3 you will probably get a boost in 
performance from putting the journal on an external fast device (SSD 
comes to mind).

If you feel like characterizing this, it's a place to start. I was not at
all strict in my testing; I just wanted to see if those features made a
difference, and they seem to, with a 2-5x change in performance. What I
didn't do is investigate which values work best, record numbers, etc. I
didn't test ext4 at all.
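
As a sketch of the arithmetic only (not a recommendation): for the original
poster's 4-drive raid6 with 64K chunks and 4K filesystem blocks, that works
out to stride = 64K / 4K = 16 and stripe-width = 16 * 2 data drives = 32,
so something like

  mkfs.ext3 -E stride=16,stripe-width=32 /dev/vg0/somelv

(the LV name is made up, and the LVM layout has to keep the filesystem
aligned to chunk boundaries for those numbers to mean anything).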

Good luck.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc

"You are disgraced professional losers. And by the way, give us our money back."
    - Representative Earl Pomeroy,  Democrat of North Dakota
on the A.I.G. executives who were paid bonuses  after a federal bailout.



* Re: help with bad performing raid6
  2009-07-30 13:15     ` Bill Davidsen
@ 2009-07-30 20:30       ` John Stoffel
  0 siblings, 0 replies; 14+ messages in thread
From: John Stoffel @ 2009-07-30 20:30 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Steven Haigh, 'Jon Nelson', 'LinuxRaid'


You could also be running into PCI bus bandwidth issues.  And since
you say you're using a 100Mb/s network link, you're never going to get
more than 10 MByte/sec out of NFS, and probably much less.  Check
your 'netstat -ni' output as well and see if you have collisions or
other network issues.

Can you shuffle around PCI boards (if you have multiple busses on the
server) or maybe switch to gigabit cards as well?
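
For instance (the interface name is only an example):

  netstat -ni      # look for non-zero RX-ERR / TX-ERR / collision counts
  ethtool eth0     # confirm negotiated speed and duplex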

John

* Re: help with bad performing raid6
  2009-07-27 19:19 help with bad performing raid6 Jon Nelson
                   ` (2 preceding siblings ...)
  2009-07-29 15:08 ` Bill Davidsen
@ 2009-07-30 21:09 ` David Rees
  2009-07-31 18:21   ` Keld Jørn Simonsen
  3 siblings, 1 reply; 14+ messages in thread
From: David Rees @ 2009-07-30 21:09 UTC (permalink / raw)
  To: Jon Nelson; +Cc: LinuxRaid

On Mon, Jul 27, 2009 at 12:19 PM, Jon Nelson<jnelson-suse@jamponi.net> wrote:
> When I say "bad performance" I mean writes that vary down to 100KB/s
> or less, as reported by rsync. The "average" end-to-end speed for
> writing large (500MB to 5GB) files hovers around 3-4MB/s. This is over
> 100 MBit.

That doesn't sound too unexpected.  rsync does a lot of reading and
writing, so you're going to see less than network speeds.

> Often times while stracing rsync I will see rsync not make a single
> system call for sometimes more than a minute. Sometimes well in excess
> of that. If I look at the load on the server the top process is
> md0_raid5 (the raid6 process for md0, despite the raid5 in the name).
> The load hovers around 8 or 9 at this time.

High load seems a bit abnormal, but not too bad.  You probably have 8
nfsd daemons running on the server?

> Even during this period of high load, actual disk I/O is fairly low.
> I can get 70-80MB/s out of the actual underlying disks the entire time.
> Uncached.

Seems about right.

> vmstat reports up to 20MB/s writes (this is expected given 100Mbit and
> raid6) but most of the time it hovers between 2 and 6 MB/s.

Also seems about right, but you shouldn't be seeing more than 10MB/s
writes - the limit of a 100Mbps network.

> What can I do to improve performance?

You might try the --inplace option of rsync.  By default, rsync will
create a copy of the file while it rebuilds it from the source, then
move the copy over the original when it's done.

Since you said you're dealing with large files, you could be
performing a lot of extra I/O that doesn't necessarily need to be done
if you're only migrating changes to those large files.

You could also try rsync over SSH instead of directly over NFS. That way,
only the changes get transferred, instead of the entire file being read
back over the network.
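
Something along these lines, purely as an illustration (the paths and host
name are made up):

  rsync -av --inplace bigfile.img /mnt/nfs/backup/       # in-place write to the NFS mount
  rsync -av --inplace bigfile.img server:/data/backup/   # or push over SSH instead

With the SSH form the delta algorithm runs on the far end, so only changed
data crosses the wire.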

> Why is the load so high on the server? The client is absolutely bored
> (load less than 0.5 most of the time).

Load probably climbs to about the number of NFS daemons you have running.
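
(You can see how many threads are configured, and roughly how busy they are,
with something like:

  grep th /proc/net/rpc/nfsd
  ps ax | grep '\[nfsd\]'

- the first number on the "th" line is the thread count.)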

> Is this some weird interaction between NFS(v3) and software raid6?

Probably not.

-Dave

* Re: help with bad performing raid6
  2009-07-30 21:09 ` David Rees
@ 2009-07-31 18:21   ` Keld Jørn Simonsen
  2009-07-31 18:23     ` Jon Nelson
  0 siblings, 1 reply; 14+ messages in thread
From: Keld Jørn Simonsen @ 2009-07-31 18:21 UTC (permalink / raw)
  To: David Rees; +Cc: Jon Nelson, LinuxRaid

On Thu, Jul 30, 2009 at 02:09:19PM -0700, David Rees wrote:
> On Mon, Jul 27, 2009 at 12:19 PM, Jon Nelson<jnelson-suse@jamponi.net> wrote:
> > When I say "bad performance" I mean writes that vary down to 100KB/s
> > or less, as reported by rsync. The "average" end-to-end speed for
> > writing large (500MB to 5GB) files hovers around 3-4MB/s. This is over
> > 100 MBit.
> 
> That doesn't sound too unexpected.  rsync does a lot of reading and
> writing, so you're going to see less than network speeds.

rsync also does a lot of computing, so that is another source of delay.
With faster CPUs the delay is not so big. I get near network speed (80
Mbit/s on a 100 Mbit/s LAN) between two of my faster machines.


best regards
keld

* Re: help with bad performing raid6
  2009-07-31 18:21   ` Keld Jørn Simonsen
@ 2009-07-31 18:23     ` Jon Nelson
  2009-07-31 19:19       ` David Rees
  0 siblings, 1 reply; 14+ messages in thread
From: Jon Nelson @ 2009-07-31 18:23 UTC (permalink / raw)
  To: Keld Jørn Simonsen; +Cc: David Rees, LinuxRaid

2009/7/31 Keld Jørn Simonsen <keld@dkuug.dk>:
> On Thu, Jul 30, 2009 at 02:09:19PM -0700, David Rees wrote:
>> On Mon, Jul 27, 2009 at 12:19 PM, Jon Nelson<jnelson-suse@jamponi.net> wrote:
>> > When I say "bad performance" I mean writes that vary down to 100KB/s
>> > or less, as reported by rsync. The "average" end-to-end speed for
>> > writing large (500MB to 5GB) files hovers around 3-4MB/s. This is over
>> > 100 MBit.
>>
>> That doesn't sound too unexpected.  rsync does a lot of reading and
>> writing, so you're going to see less than network speeds.
>
> rsync also does a lot of computing, so that is also a source of delay.
> For faster cpu's the delay is not so big. I get near network speed ( 80
> Mbit/s on a 100 Mbit/s LAN) between two of my faster machines.

In this case, you may all be assured it was not doing much reading
over NFS or computing, as the file did not exist on the NFS share; in
that case NFS goes into (more or less) pure write mode, with no updating
per se.



-- 
Jon

* Re: help with bad performing raid6
  2009-07-31 18:23     ` Jon Nelson
@ 2009-07-31 19:19       ` David Rees
  2009-07-31 19:31         ` Jon Nelson
  0 siblings, 1 reply; 14+ messages in thread
From: David Rees @ 2009-07-31 19:19 UTC (permalink / raw)
  To: Jon Nelson; +Cc: Keld Jørn Simonsen, LinuxRaid

2009/7/31 Jon Nelson <jnelson-suse@jamponi.net>:
> 2009/7/31 Keld Jørn Simonsen <keld@dkuug.dk>:
>> On Thu, Jul 30, 2009 at 02:09:19PM -0700, David Rees wrote:
>>> On Mon, Jul 27, 2009 at 12:19 PM, Jon Nelson<jnelson-suse@jamponi.net> wrote:
>>> > When I say "bad performance" I mean writes that vary down to 100KB/s
>>> > or less, as reported by rsync. The "average" end-to-end speed for
>>> > writing large (500MB to 5GB) files hovers around 3-4MB/s. This is over
>>> > 100 MBit.
>>>
>>> That doesn't sound too unexpected.  rsync does a lot of reading and
>>> writing, so you're going to see less than network speeds.
>>
>> rsync also does a lot of computing, so that is also a source of delay.
>> For faster cpu's the delay is not so big. I get near network speed ( 80
>> Mbit/s on a 100 Mbit/s LAN) between two of my faster machines.
>
> In this case, you may all be assured it was not doing much reading
> over NFS or computing, as the file did not exist on the NFS share - in
> this case NFS goes into (more or less) pure write mode - no updating
> per-se.

OK, so how fast does a simple dd write to the NFS share?
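
For example, something like (the path is only an example):

  dd if=/dev/zero of=/path/to/nfs/mount/testfile bs=1M count=2048 conv=fdatasync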

-Dave

* Re: help with bad performing raid6
  2009-07-31 19:19       ` David Rees
@ 2009-07-31 19:31         ` Jon Nelson
  0 siblings, 0 replies; 14+ messages in thread
From: Jon Nelson @ 2009-07-31 19:31 UTC (permalink / raw)
  Cc: LinuxRaid

> OK, so how fast does a simple dd write to the NFS share?

I changed some of the test parameters here: I am now on a gig-e link
and switch instead of 100Mbit.

Performance: 16-18MB/s very steady. Load on server: around 9. I have 8
nfsd configured.

I can live with 17MB/s. I wonder if I had a hub/switch/whatever go bad
and start to flake out when under load. You may safely ignore this
thread.


-- 
Jon
