* XFS and 2.6.18 -> 2.6.20-rc3

From: Mr. Berkley Shands
Date: 2007-01-08 17:13 UTC
To: Eric Sandeen
Cc: Dave Lloyd, linux-xfs

My testbench is a 4-core Opteron (dual 275s) into two LSI8408E SAS
controllers, into 16 Seagate 7200.10 320GB SATA drives, running RedHat
ES4.4 (CentOS 4.4). A slightly newer parted is needed than the
contemporary of Moses that ships with the O/S.

I have a standard burn-in script that takes the 4 four-drive raid0s,
puts a GPT label on them, and aligns the partitions to stripe
boundaries. It then proceeds to write 8GB files concurrently onto all
4 raid drives.

Under 2.6.18.1 the write speeds start at 265MB/sec and decrease mostly
monotonically down to ~160MB/sec, indicating that the files start on
the outside (fastest) tracks and work inward. All 4 raids are within
7-8MB/sec of each other (usually they are identical in speed).

By 2.6.20-rc3, the same testbench shows a 10% across-the-board
decrease in write throughput; reads are unaffected. But now the
allocation order on virgin file systems is random, usually starting at
a slow 140MB/sec, then bouncing up to 220MB/sec, then around and
around. No two raids get the same write speeds at the same time.

Dave Lloyd (our in-house Idea Guy) looked at the allocation groups...
non-sequential, random...

What data would you like to see? The run logs from 2.6.18.1 and
2.6.20-rc3? Want the scripts? The xfs-debug dumps of a few files?

Berkley

--
E. F. Berkley Shands, MSc
Exegy Inc.
3668 S. Geyer Road, Suite 300
St. Louis, MO 63127
Direct: (314) 450-5348  Cell: (314) 303-2546
Office: (314) 450-5353  Fax: (314) 450-5354

This e-mail and any documents accompanying it may contain legally
privileged and/or confidential information belonging to Exegy, Inc.
Such information may be protected from disclosure by law. The
information is intended for use by only the addressee. If you are not
the intended recipient, you are hereby notified that any disclosure or
use of the information is strictly prohibited. If you have received
this e-mail in error, please immediately contact the sender by e-mail
or phone regarding instructions for return or destruction and do not
use or disclose the content to others.
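[Editor's note] The stripe alignment the burn-in script performs boils down to rounding the partition start up to a multiple of the raid0 full-stripe width. A sketch of that arithmetic; the chunk size, disk count, and device name are assumed values for illustration, not the poster's actual geometry:

```shell
# Round a GPT partition start up to the raid0 full-stripe boundary.
# Assumed geometry: 4 data disks x 64 KiB chunk = 256 KiB full stripe.
CHUNK_KB=64
NDISKS=4
SECTOR=512
stripe_sectors=$(( CHUNK_KB * 1024 * NDISKS / SECTOR ))
first_usable=34   # first usable LBA behind the GPT header and table
start=$(( (first_usable + stripe_sectors - 1) / stripe_sectors * stripe_sectors ))
echo "full stripe = ${stripe_sectors} sectors, aligned start = ${start}"
# With a parted new enough to accept sector units (hence the remark
# about the stock RHEL4 parted), the partition would then be created
# roughly as:
#   parted -s /dev/sdX mklabel gpt mkpart primary ${start}s 100%
```

With the assumed 64 KiB x 4 geometry this yields a 512-sector full stripe, so the first partition starts at sector 512 instead of the GPT minimum of 34.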
* Re: XFS and 2.6.18 -> 2.6.20-rc3

From: Eric Sandeen
Date: 2007-01-08 17:17 UTC
To: Mr. Berkley Shands
Cc: Dave Lloyd, linux-xfs

Mr. Berkley Shands wrote:
> Dave Lloyd (our in-house Idea Guy) looked at the allocation groups...
> Non-sequential, random...
>
> What data would you like to see?

xfs_bmap -v, on files where you consider the allocation to have
changed between kernels, would show exactly how it has changed.

-Eric
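[Editor's note] For context, `xfs_bmap -v` prints each extent of a file along with the allocation group (AG) it landed in, so AG ordering can be compared across kernels. A minimal sketch of pulling out the AG column; the embedded sample output is illustrative, not captured from the system under discussion:

```shell
# Illustrative xfs_bmap -v output (column 4 is the allocation group).
sample='
 EXT: FILE-OFFSET          BLOCK-RANGE          AG AG-OFFSET        TOTAL
   0: [0..2097151]:        96..2097247           0 (96..2097247)  2097152
   1: [2097152..4194303]:  8388704..10485855     2 (96..2097247)  2097152
   2: [4194304..6291455]:  4194400..6291551      1 (96..2097247)  2097152
'
# On a live system this would be:
#   xfs_bmap -v /s0/GigaData.0 | awk 'NR>1 {print $4}'
ags=$(echo "$sample" | awk 'NR>2 && NF {print $4}' | xargs)
echo "AG allocation order: $ags"
```

A sequential allocator would visit the AGs in order; a scrambled column (here 0, 2, 1) is the kind of non-sequential placement being reported.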
* Re: XFS and 2.6.18 -> 2.6.20-rc3

From: David Chinner
Date: 2007-01-09 1:22 UTC
To: Mr. Berkley Shands
Cc: Eric Sandeen, Dave Lloyd, linux-xfs

On Mon, Jan 08, 2007 at 11:13:43AM -0600, Mr. Berkley Shands wrote:
> My testbench is a 4 core Opteron (dual 275's) into
> two LSI8408E SAS controllers, into 16 Seagate 7200.10 320GB satas.
> Redhat ES4.4 (Centos 4.4). A slightly newer parted is needed
> than the contemporary of Moses that is shipped with the O/S.
>
> I have a standard burn in script that takes the 4 4-drive raid0's
> and puts a GPT label on them, aligns the partitions to stripe
> boundaries. It then proceeds to write 8GB files concurrently
> onto all 4 raid drives.

How many files are being written at the same time to each filesystem?
Buffered or direct I/O? What I/O size? How much memory is in the
machine? What size I/Os are actually hitting the disks?

> Under 2.6.18.1 the write speeds start at 265MB/Sec and decrease
> mostly monotonically down to ~160MB/Sec, indicating that
> the files start on the outside (fastest tracks) and work in.

So you are filling the entire disk with this test?

> All 4 raids are within 7-8MB/Sec of each other (usually they
> are identical in speed).
>
> By the time of 2.6.20-rc3, the same testbench shows
> a 10% across the board decrease in throughput for writes.
> Reads are unaffected.

Reads being unaffected indicates the files are not being badly
fragmented.

> But now the allocation order for virgin file systems are random,

How did you determine this?

> usually starting at the slow 140MB/Sec, then bouncing up to 220MB/Sec,
> then around and around. No two raids get the same write speeds at the
> same time.
>
> Dave Lloyd (our in-house Idea Guy) looked at the allocation groups...
> Non-sequential, random...
>
> What data would you like to see?

The first thing to do is run a set of write tests against the _raw_
devices, not the filesystem, so we can rule out a driver/hardware
problem. Can you do something as simple as concurrent writes to each
raid lun to see if .18 and .20 perform the same?

> The run logs from 2.6.18.1 and 2.6.20-rc3?
> Want the scripts?

Yes please.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
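[Editor's note] The raw-device test Dave asks for can be sketched as a handful of concurrent `dd` writers. The targets below are temporary files so this sketch is safe to run as-is; on a scratch machine they would be replaced with the actual raid LUNs (the device names in the comment are assumptions):

```shell
# Concurrent sequential writes, bypassing the filesystem, to compare
# the raw throughput of each raid lun under .18 and .20.
# CAUTION: pointing dd at a real block device destroys its contents.
TARGETS="$(mktemp) $(mktemp)"    # stand-ins; e.g. "/dev/sdb /dev/sdd"
for dev in $TARGETS; do
    # 128 KB writes to match the buffer size used in the filesystem
    # test; against real devices you would add oflag=direct and time
    # each writer individually.
    dd if=/dev/zero of="$dev" bs=128k count=64 2>/dev/null &
done
wait
sizes=$(for dev in $TARGETS; do wc -c < "$dev"; done | xargs)
echo "bytes written per target: $sizes"
rm -f $TARGETS
```

If the raw-device numbers match between kernels while the filesystem numbers diverge, the regression is above the block layer, which is the point of the exercise.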
* Re: XFS and 2.6.18 -> 2.6.20-rc3

From: David Chinner
Date: 2007-01-09 7:25 UTC
To: David Chinner
Cc: Mr. Berkley Shands, Eric Sandeen, Dave Lloyd, linux-xfs

On Tue, Jan 09, 2007 at 12:22:12PM +1100, David Chinner wrote:
> On Mon, Jan 08, 2007 at 11:13:43AM -0600, Mr. Berkley Shands wrote:
> > My testbench is a 4 core Opteron (dual 275's) into
> > two LSI8408E SAS controllers, into 16 Seagate 7200.10 320GB satas.
> > Redhat ES4.4 (Centos 4.4). A slightly newer parted is needed
> > than the contemporary of Moses that is shipped with the O/S.
> >
> > I have a standard burn in script that takes the 4 4-drive raid0's
> > and puts a GPT label on them, aligns the partitions to stripe
> > boundaries. It then proceeds to write 8GB files concurrently
> > onto all 4 raid drives.

I just ran up a similar test - single large file per device on a
4-core Xeon (Woodcrest) with 16GB RAM, a single PCI-X SAS HBA, and 12x
10krpm 300GB SAS disks split into 3x 4-disk dm raid zero stripes - on
2.6.18 and 2.6.20-rc3.

I see the same thing - 2.6.20-rc3 is more erratic and quite a bit
slower than 2.6.18 when going through XFS.

I suggest trying this on 2.6.20-rc3:

# echo 10 > /proc/sys/vm/dirty_ratio

That restored most of the lost performance and consistency in my
testing....

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
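[Editor's note] For anyone reproducing this: `vm.dirty_ratio` is the percentage of memory that may fill with dirty page-cache pages before writing processes are throttled into doing writeback themselves; lowering it makes flushing start earlier and in smaller bursts. A sketch of inspecting and applying Dave's suggestion (the write requires root, so it is shown commented out):

```shell
# Read the current writeback threshold (a percentage of total memory).
current=$(cat /proc/sys/vm/dirty_ratio)
echo "vm.dirty_ratio is currently ${current}%"
# As root, apply the value suggested above for 2.6.20-rc3:
#   echo 10 > /proc/sys/vm/dirty_ratio
# or, equivalently:
#   sysctl -w vm.dirty_ratio=10
```

Note that a value set this way does not survive a reboot; it would normally go in /etc/sysctl.conf for a persistent workaround.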
* Re: XFS and 2.6.18 -> 2.6.20-rc3

From: Mr. Berkley Shands
Date: 2007-01-10 13:29 UTC
To: David Chinner
Cc: Eric Sandeen, Dave Lloyd, linux-xfs

With a fresh install of the O/S on a non-broken motherboard, the
change to /proc/sys/vm/dirty_ratio restores most of the performance
lost since 2.6.18, as of 2.6.20-rc4. The difference is 10% to 15%
without the dirty_ratio change (40 is the default; 10 gives the old
performance).

Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33571 MS, Rate: 244.020, to /s0/GigaData.0
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33480 MS, Rate: 244.683, to /s2/GigaData.0
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33570 MS, Rate: 244.027, to /s1/GigaData.0
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33579 MS, Rate: 243.962, to /s3/GigaData.0
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 31486 MS, Rate: 260.179, to /s0/GigaData.1
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 31537 MS, Rate: 259.758, to /s2/GigaData.1
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 31538 MS, Rate: 259.750, to /s3/GigaData.1
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 31549 MS, Rate: 259.660, to /s1/GigaData.1
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 32205 MS, Rate: 254.370, to /s2/GigaData.2
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 32205 MS, Rate: 254.370, to /s1/GigaData.2
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 32218 MS, Rate: 254.268, to /s3/GigaData.2
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 32220 MS, Rate: 254.252, to /s0/GigaData.2
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33536 MS, Rate: 244.275, to /s0/GigaData.3
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 33583 MS, Rate: 243.933, to /s3/GigaData.3
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34016 MS, Rate: 240.828, to /s1/GigaData.3
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34018 MS, Rate: 240.814, to /s2/GigaData.3
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34121 MS, Rate: 240.087, to /s1/GigaData.4
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34123 MS, Rate: 240.073, to /s0/GigaData.4
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34114 MS, Rate: 240.136, to /s2/GigaData.4
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 34122 MS, Rate: 240.080, to /s3/GigaData.4
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30386 MS, Rate: 269.598, to /s0/GigaData.9
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30384 MS, Rate: 269.616, to /s1/GigaData.9
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30717 MS, Rate: 266.693, to /s3/GigaData.9
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30726 MS, Rate: 266.615, to /s2/GigaData.9
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30088 MS, Rate: 272.268, to /s0/GigaData.10
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30079 MS, Rate: 272.349, to /s2/GigaData.10
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 30086 MS, Rate: 272.286, to /s1/GigaData.10
Data: Writing, 8192 MB, Buffer: 128 KB, Time: 29989 MS, Rate: 273.167, to /s3/GigaData.10

So whatever needs to be tweaked in the VM system seems to be the key.

Thanks to all for getting this regression repaired.

Berkley

David Chinner wrote:
> On Tue, Jan 09, 2007 at 12:22:12PM +1100, David Chinner wrote:
>> On Mon, Jan 08, 2007 at 11:13:43AM -0600, Mr. Berkley Shands wrote:
>>> My testbench is a 4 core Opteron (dual 275's) into
>>> two LSI8408E SAS controllers, into 16 Seagate 7200.10 320GB satas.
>>> Redhat ES4.4 (Centos 4.4). A slightly newer parted is needed
>>> than the contemporary of Moses that is shipped with the O/S.
>>>
>>> I have a standard burn in script that takes the 4 4-drive raid0's
>>> and puts a GPT label on them, aligns the partitions to stripe
>>> boundaries. It then proceeds to write 8GB files concurrently
>>> onto all 4 raid drives.
>
> I just ran up a similar test - single large file per device on a 4
> core Xeon (Woodcrest) with 16GB RAM, a single PCI-X SAS HBA and
> 12x10krpm 300GB SAS disks split into 3x4 disk dm raid zero stripes
> on 2.6.18 and 2.6.20-rc3.
>
> I see the same thing - 2.6.20-rc3 is more erratic and quite a
> bit slower than 2.6.18 when going through XFS.
>
> I suggest trying this on 2.6.20-rc3:
>
> # echo 10 > /proc/sys/vm/dirty_ratio
>
> That restored most of the lost performance and consistency
> in my testing....
>
> Cheers,
>
> Dave.

--
E. F. Berkley Shands, MSc
Exegy Inc.
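[Editor's note] As a sanity check on the log format above, the Rate column is simply megabytes divided by elapsed seconds; verifying the first logged line:

```shell
# Rate = MB / (ms / 1000); check against the logged 244.020 MB/s
# for the 8192 MB write to /s0/GigaData.0 that took 33571 ms.
rate=$(awk 'BEGIN { printf "%.3f", 8192 / (33571 / 1000) }')
echo "computed rate: $rate MB/s"
```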
* Re: XFS and 2.6.18 -> 2.6.20-rc3

From: David Chinner
Date: 2007-01-10 22:08 UTC
To: Mr. Berkley Shands
Cc: David Chinner, Eric Sandeen, Dave Lloyd, linux-xfs

On Wed, Jan 10, 2007 at 07:29:15AM -0600, Mr. Berkley Shands wrote:
> With a fresh install of the O/S on a non-broken motherboard, the
> change to
>
> /proc/sys/vm/dirty_ratio
>
> restores most of the performance lost since 2.6.18,
> as of 2.6.20-rc4. The difference is 10% to 15% without the
> dirty_ratio change (40 is the default; 10 gives the old performance).
....
> So whatever needs to be tweaked in the VM system seems to be the key.
>
> Thanks to all for getting this regression repaired.

Well, it's not repaired as such - you've got a workaround (WAR) for
the problem. I'll report the problem to lkml so that the VM gurus can
try to really fix it....

Thanks for confirming that the dirty_ratio tweak also worked for you.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
end of thread [~2007-01-10 22:09 UTC | newest]

Thread overview: 6+ messages:
  2007-01-08 17:13 XFS and 2.6.18 -> 2.6.20-rc3  Mr. Berkley Shands
  2007-01-08 17:17 ` Eric Sandeen
  2007-01-09  1:22 ` David Chinner
  2007-01-09  7:25   ` David Chinner
  2007-01-10 13:29     ` Mr. Berkley Shands
  2007-01-10 22:08       ` David Chinner