* XFS and 2.6.18 -> 2.6.20-rc3
@ 2007-01-08 17:13 Mr. Berkley Shands
2007-01-08 17:17 ` Eric Sandeen
2007-01-09 1:22 ` David Chinner
0 siblings, 2 replies; 6+ messages in thread
From: Mr. Berkley Shands @ 2007-01-08 17:13 UTC
To: Eric Sandeen; +Cc: Dave Lloyd, linux-xfs
My testbench is a 4-core Opteron (dual 275s) connected to
two LSI 8408E SAS controllers driving 16 Seagate 7200.10 320GB SATA drives.
Red Hat ES 4.4 (CentOS 4.4). A slightly newer parted is needed
than the contemporary of Moses that ships with the O/S.
I have a standard burn-in script that takes the four 4-drive RAID0s,
puts a GPT label on them, and aligns the partitions to stripe
boundaries. It then proceeds to write 8GB files concurrently
onto all 4 RAID drives.
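For reference, a minimal sketch of what such a burn-in script does
(device names, mount points, and exact parted syntax are illustrative
assumptions, not the production values):

  # label one array with GPT and start the partition on a stripe boundary
  parted -s /dev/sdb mklabel gpt
  parted -s /dev/sdb mkpart primary 1 100%   # start offset; units vary by parted version
  mkfs.xfs -f /dev/sdb1
  mount /dev/sdb1 /s0
  # then write an 8GB file to each of the four arrays concurrently
  # (128KiB x 65536 = 8GiB, matching the 128 KB buffer in the logs below)
  for fs in /s0 /s1 /s2 /s3; do
      dd if=/dev/zero of=$fs/GigaData.0 bs=128k count=65536 &
  done
  wait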
Under 2.6.18.1 the write speeds start at 265MB/s and decrease
mostly monotonically down to ~160MB/s, indicating that
the files start on the outside (fastest tracks) and work inward.
All 4 RAIDs are within 7-8MB/s of each other (usually they
are identical in speed).
As of 2.6.20-rc3, the same testbench shows
a 10% across-the-board decrease in write throughput.
Reads are unaffected.
But now the allocation order for virgin file systems is random,
usually starting at a slow 140MB/s, then bouncing up to 220MB/s,
then around and around. No two RAIDs get the same write speed at the
same time.
Dave Lloyd (our in-house Idea Guy) looked at the allocation groups...
Non-sequential, random...
What data would you like to see?
The run logs from 2.6.18.1 and 2.6.20-rc3?
Want the scripts?
The xfs_db dumps of a few files?
Berkley
--
//E. F. Berkley Shands, MSc//
**Exegy Inc.**
3668 S. Geyer Road, Suite 300
St. Louis, MO 63127
Direct: (314) 450-5348
Cell: (314) 303-2546
Office: (314) 450-5353
Fax: (314) 450-5354
* Re: XFS and 2.6.18 -> 2.6.20-rc3
2007-01-08 17:13 XFS and 2.6.18 -> 2.6.20-rc3 Mr. Berkley Shands
@ 2007-01-08 17:17 ` Eric Sandeen
2007-01-09 1:22 ` David Chinner
1 sibling, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2007-01-08 17:17 UTC
To: Mr. Berkley Shands; +Cc: Dave Lloyd, linux-xfs
Mr. Berkley Shands wrote:
> Dave Lloyd (our in-house Idea Guy) looked at the allocation groups...
> Non-sequential, random...
>
> What data would you like to see?
Running xfs_bmap -v on files where you consider the allocation to have
changed between kernels would show exactly how it has changed.
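For example (file path hypothetical):

  # xfs_bmap -v /s0/GigaData.0

The AG column in the verbose output shows which allocation group each
extent landed in, so a change in allocator behavior between kernels
shows up directly.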
-Eric
* Re: XFS and 2.6.18 -> 2.6.20-rc3
2007-01-08 17:13 XFS and 2.6.18 -> 2.6.20-rc3 Mr. Berkley Shands
2007-01-08 17:17 ` Eric Sandeen
@ 2007-01-09 1:22 ` David Chinner
2007-01-09 7:25 ` David Chinner
1 sibling, 1 reply; 6+ messages in thread
From: David Chinner @ 2007-01-09 1:22 UTC
To: Mr. Berkley Shands; +Cc: Eric Sandeen, Dave Lloyd, linux-xfs
On Mon, Jan 08, 2007 at 11:13:43AM -0600, Mr. Berkley Shands wrote:
> My testbench is a 4-core Opteron (dual 275s) connected to
> two LSI 8408E SAS controllers driving 16 Seagate 7200.10 320GB SATA drives.
> Red Hat ES 4.4 (CentOS 4.4). A slightly newer parted is needed
> than the contemporary of Moses that ships with the O/S.
>
> I have a standard burn-in script that takes the four 4-drive RAID0s,
> puts a GPT label on them, and aligns the partitions to stripe
> boundaries. It then proceeds to write 8GB files concurrently
> onto all 4 RAID drives.
How many files are being written at the same time to each filesystem?
Buffered or direct I/O? What I/O size? How much memory is in the machine?
What size I/Os are actually hitting the disks?
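(For reference, one way to answer that last question is to watch the
block devices with iostat from the sysstat package while the test runs:

  # iostat -x 5

The avgrq-sz column reports the average request size, in 512-byte
sectors, actually being issued to each device.)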
> Under 2.6.18.1 the write speeds start at 265MB/s and decrease
> mostly monotonically down to ~160MB/s, indicating that
> the files start on the outside (fastest tracks) and work inward.
So you are filling the entire disk with this test?
> All 4 RAIDs are within 7-8MB/s of each other (usually they
> are identical in speed).
>
> As of 2.6.20-rc3, the same testbench shows
> a 10% across-the-board decrease in write throughput.
> Reads are unaffected.
Reads being unaffected indicates the files are not being fragmented
badly.
> But now the allocation order for virgin file systems is random,
How did you determine this?
> usually starting at a slow 140MB/s, then bouncing up to 220MB/s,
> then around and around. No two RAIDs get the same write speed at the
> same time.
>
> Dave Lloyd (our in-house Idea Guy) looked at the allocation groups...
> Non-sequential, random...
>
> What data would you like to see?
First thing to do is run a set of write tests to the _raw_ devices,
not to the filesystem, so we can rule out a driver/hardware problem.
Can you do something as simple as concurrent writes to each RAID LUN
to see if .18 and .20 perform the same?
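Something along these lines would do it (device names are hypothetical,
and this destroys any data on the devices):

  # one concurrent streaming write per raid lun
  for dev in /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
      dd if=/dev/zero of=$dev bs=1M count=8192 oflag=direct &
  done
  wait

oflag=direct (needs a reasonably recent GNU dd) bypasses the page
cache, so the comparison measures the driver/hardware path rather than
VM writeback behavior.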
> The run logs from 2.6.18.1 and 2.6.20-rc3?
> Want the scripts?
Yes please.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: XFS and 2.6.18 -> 2.6.20-rc3
2007-01-09 1:22 ` David Chinner
@ 2007-01-09 7:25 ` David Chinner
2007-01-10 13:29 ` Mr. Berkley Shands
0 siblings, 1 reply; 6+ messages in thread
From: David Chinner @ 2007-01-09 7:25 UTC
To: David Chinner; +Cc: Mr. Berkley Shands, Eric Sandeen, Dave Lloyd, linux-xfs
On Tue, Jan 09, 2007 at 12:22:12PM +1100, David Chinner wrote:
> On Mon, Jan 08, 2007 at 11:13:43AM -0600, Mr. Berkley Shands wrote:
> > My testbench is a 4-core Opteron (dual 275s) connected to
> > two LSI 8408E SAS controllers driving 16 Seagate 7200.10 320GB SATA drives.
> > Red Hat ES 4.4 (CentOS 4.4). A slightly newer parted is needed
> > than the contemporary of Moses that ships with the O/S.
> >
> > I have a standard burn-in script that takes the four 4-drive RAID0s,
> > puts a GPT label on them, and aligns the partitions to stripe
> > boundaries. It then proceeds to write 8GB files concurrently
> > onto all 4 RAID drives.
I just ran up a similar test - a single large file per device on a 4-core
Xeon (Woodcrest) with 16GB RAM, a single PCI-X SAS HBA and
12 x 10krpm 300GB SAS disks split into three 4-disk dm RAID0 stripes,
on 2.6.18 and 2.6.20-rc3.
I see the same thing - 2.6.20-rc3 is more erratic and quite a
bit slower than 2.6.18 when going through XFS.
I suggest trying this on 2.6.20-rc3:
# echo 10 > /proc/sys/vm/dirty_ratio
That restored most of the lost performance and consistency
in my testing....
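(If that helps, the same setting can be applied via sysctl and made
persistent across reboots:

  # sysctl -w vm.dirty_ratio=10

or by adding "vm.dirty_ratio = 10" to /etc/sysctl.conf.)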
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: XFS and 2.6.18 -> 2.6.20-rc3
2007-01-09 7:25 ` David Chinner
@ 2007-01-10 13:29 ` Mr. Berkley Shands
2007-01-10 22:08 ` David Chinner
0 siblings, 1 reply; 6+ messages in thread
From: Mr. Berkley Shands @ 2007-01-10 13:29 UTC
To: David Chinner; +Cc: Eric Sandeen, Dave Lloyd, linux-xfs
With a fresh install of the O/S on a non-broken motherboard, the change to
/proc/sys/vm/dirty_ratio
restores most of the performance lost since 2.6.18,
as of 2.6.20-rc4. The difference is 10% to 15% without the dirty_ratio
change (40 is the default; 10 gives the old performance).
(Rate below is in MB/s; e.g. 8192 MB / 33.571 s = 244.02 MB/s.)
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33571 MS, Rate: 244.020, to /s0/GigaData.0
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33480 MS, Rate: 244.683, to /s2/GigaData.0
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33570 MS, Rate: 244.027, to /s1/GigaData.0
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33579 MS, Rate: 243.962, to /s3/GigaData.0
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 31486 MS, Rate: 260.179, to /s0/GigaData.1
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 31537 MS, Rate: 259.758, to /s2/GigaData.1
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 31538 MS, Rate: 259.750, to /s3/GigaData.1
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 31549 MS, Rate: 259.660, to /s1/GigaData.1
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 32205 MS, Rate: 254.370, to /s2/GigaData.2
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 32205 MS, Rate: 254.370, to /s1/GigaData.2
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 32218 MS, Rate: 254.268, to /s3/GigaData.2
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 32220 MS, Rate: 254.252, to /s0/GigaData.2
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33536 MS, Rate: 244.275, to /s0/GigaData.3
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33583 MS, Rate: 243.933, to /s3/GigaData.3
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34016 MS, Rate: 240.828, to /s1/GigaData.3
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34018 MS, Rate: 240.814, to /s2/GigaData.3
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34121 MS, Rate: 240.087, to /s1/GigaData.4
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34123 MS, Rate: 240.073, to /s0/GigaData.4
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34114 MS, Rate: 240.136, to /s2/GigaData.4
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34122 MS, Rate: 240.080, to /s3/GigaData.4
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30386 MS, Rate: 269.598, to /s0/GigaData.9
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30384 MS, Rate: 269.616, to /s1/GigaData.9
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30717 MS, Rate: 266.693, to /s3/GigaData.9
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30726 MS, Rate: 266.615, to /s2/GigaData.9
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30088 MS, Rate: 272.268, to /s0/GigaData.10
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30079 MS, Rate: 272.349, to /s2/GigaData.10
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30086 MS, Rate: 272.286, to /s1/GigaData.10
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 29989 MS, Rate: 273.167, to /s3/GigaData.10
So whatever needs to be tweaked in the VM system seems to be the key.
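(A hedged note for anyone following along, not something tested in this
thread: vm.dirty_ratio caps how much of memory may be dirty before
writers are throttled into doing writeback themselves, while the
companion knob vm.dirty_background_ratio sets the lower threshold at
which background writeback starts. To inspect the current values:

  # cat /proc/sys/vm/dirty_background_ratio
  # cat /proc/sys/vm/dirty_ratio

Lowering dirty_ratio makes writeback start earlier and in smaller
bursts, which is consistent with the smoother throughput seen above.)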
Thanks to all for getting this regression repaired.
Berkley
David Chinner wrote:
> On Tue, Jan 09, 2007 at 12:22:12PM +1100, David Chinner wrote:
>
>> On Mon, Jan 08, 2007 at 11:13:43AM -0600, Mr. Berkley Shands wrote:
>>
>>> My testbench is a 4-core Opteron (dual 275s) connected to
>>> two LSI 8408E SAS controllers driving 16 Seagate 7200.10 320GB SATA drives.
>>> Red Hat ES 4.4 (CentOS 4.4). A slightly newer parted is needed
>>> than the contemporary of Moses that ships with the O/S.
>>>
>>> I have a standard burn-in script that takes the four 4-drive RAID0s,
>>> puts a GPT label on them, and aligns the partitions to stripe
>>> boundaries. It then proceeds to write 8GB files concurrently
>>> onto all 4 RAID drives.
>>>
>
> I just ran up a similar test - a single large file per device on a 4-core
> Xeon (Woodcrest) with 16GB RAM, a single PCI-X SAS HBA and
> 12 x 10krpm 300GB SAS disks split into three 4-disk dm RAID0 stripes,
> on 2.6.18 and 2.6.20-rc3.
>
> I see the same thing - 2.6.20-rc3 is more erratic and quite a
> bit slower than 2.6.18 when going through XFS.
>
> I suggest trying this on 2.6.20-rc3:
>
> # echo 10 > /proc/sys/vm/dirty_ratio
>
> That restored most of the lost performance and consistency
> in my testing....
>
> Cheers,
>
> Dave.
>
--
//E. F. Berkley Shands, MSc//
**Exegy Inc.**
3668 S. Geyer Road, Suite 300
St. Louis, MO 63127
Direct: (314) 450-5348
Cell: (314) 303-2546
Office: (314) 450-5353
Fax: (314) 450-5354
* Re: XFS and 2.6.18 -> 2.6.20-rc3
2007-01-10 13:29 ` Mr. Berkley Shands
@ 2007-01-10 22:08 ` David Chinner
0 siblings, 0 replies; 6+ messages in thread
From: David Chinner @ 2007-01-10 22:08 UTC
To: Mr. Berkley Shands; +Cc: David Chinner, Eric Sandeen, Dave Lloyd, linux-xfs
On Wed, Jan 10, 2007 at 07:29:15AM -0600, Mr. Berkley Shands wrote:
> With a fresh install of the O/S on a non-broken motherboard, the change to
>
> /proc/sys/vm/dirty_ratio
>
> restores most of the performance lost since 2.6.18,
> as of 2.6.20-rc4. The difference is 10% to 15% without the dirty_ratio
> change (40 is the default; 10 gives the old performance).
....
> So whatever needs to be tweaked in the VM system seems to be the key.
>
> Thanks to all for getting this regression repaired.
Well, it's not repaired as such - you've got a WAR (workaround) for the
problem. I'll report it to lkml so that the VM gurus can try to
really fix the problem....
Thanks for confirming that the dirty_ratio tweak also worked
for you.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group