* [MMTests] IO metadata on ext3
From: Mel Gorman @ 2012-06-29 11:23 UTC
To: linux-mm; +Cc: linux-kernel, linux-fsdevel

Configuration: global-dhp__io-metadata-ext3
Benchmarks:    dbench3, fsmark-single, fsmark-threaded

Summary
=======

While the fsmark figures look ok, fsmark in single-threaded mode has a small
number of large outliers towards the min end of the scale. The resulting
standard deviation fuzzes the results, but filtering is not necessarily the
best answer. A similar effect is visible when running in threaded mode except
that there is clustering around some values that could be presented better.
Arithmetic mean is unsuitable for this sort of data.

Benchmark notes
===============

mkfs was run on system startup. No attempt was made to age it. No special
mkfs or mount options were used.

dbench3 was chosen as it's metadata intensive.
o Duration was 180 seconds
o OSYNC, OSYNC_DIRECTORY and FSYNC were all off

As noted in MMTests, dbench3 can be a random number generator, particularly
when run in asynchronous mode. Even with these limitations it can be useful
as an early warning system, and as it's still used by QA teams it's still
worth keeping an eye on.

FSMark
o Parallel directories were used
o 1 Thread per CPU
o 0 Filesize
o 225 directories
o 22500 files per directory
o 50000 files per iteration
o 15 iterations
Single: ./fs_mark -d /tmp/fsmark-9227/1 -D 225 -N 22500 -n 50000 -L 15 -S0 -s 0
Thread: ./fs_mark -d /tmp/fsmark-9407/1 -d /tmp/fsmark-9407/2 -D 225 -N 22500 -n 25000 -L 15 -S0 -s 0

FSMark is a more realistic indicator of metadata-intensive workloads.

===========================================================
Machine:  arnold
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-ext3/arnold/comparison.html
Arch:     x86
CPUs:     1 socket, 2 threads
Model:    Pentium 4
Disk:     Single Rotary Disk
Status:   Fine
===========================================================

dbench
------
For single clients, we're doing reasonably well. There was a big spike for
large numbers of clients in 2.6.34 and, to a lesser extent, in 2.6.39.4, but
much of this is due to the operations taking place in memory without reaching
disk. There were also fairness issues, and the indicated throughput figures
are far higher than the disk's capabilities, so I do not consider this to be
a regression. There was a mild dip for 3.2.x and 3.3.x that has been
recovered somewhat in 3.4. As this was when IO-Less Dirty Throttling got
merged it is hardly a surprise, and a dip in dbench is not worth backing
that out for.

Recent kernels appear to deal worse with large numbers of clients. However, I
very strongly suspect this is due to improved fairness in IO. The high
throughput figures are due to one client making an unfair amount of progress
while other clients stall.

fsmark-single
-------------
Again, this is looking good. Files/sec has improved slightly with the
exception of a dip in 3.2 and 3.3 which again may be due to IO-Less dirty
throttling. I have a slight concern with the overhead measurements. Somewhere
between 3.0.23 and 3.1.10 the overhead started deviating a lot more. Ideally
this should be bisected because it is difficult to blame IO-Less throttling
with any certainty.
fsmark-threaded
---------------
Looks better, but due to high deviations it's hard to be 100% sure. If this
is of interest then the thing to do is a proper measurement of whether the
results are significant or not, although with 15 samples it will still be
fuzzy. Right now, there is little to be concerned about.

==========================================================
Machine:  hydra
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-ext3/hydra/comparison.html
Arch:     x86-64
CPUs:     1 socket, 4 threads
Model:    AMD Phenom II X4 940
Disk:     Single Rotary Disk
Status:   Fine
==========================================================

dbench
------
Very similar story to the arnold machine. Big spike in 2.6.34 and then it
drops off.

fsmark-single
-------------
Nothing really notable here. Deviations are too high to draw reasonable
conclusions from. Looking at the raw results, this is due to a small number
of low outliers. These could be filtered, but it would mask the fact that
throughput is not consistent, so strong justification would be required.

fsmark-threaded
---------------
Similar to fsmark-single. Figures look ok but large deviations are a problem.
Unlike the single-threaded case, the raw data shows that we cluster around
two points that are very far apart from each other. It is worth investigating
whether this can be presented in some sensible manner, such as k-means
clustering, because arithmetic mean with this sort of data is crap.

==========================================================
Machine:  sandy
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-ext3/sandy/comparison.html
Arch:     x86-64
CPUs:     1 socket, 8 threads
Model:    Intel Core i7-2600
Disk:     Single Rotary Disk
Status:   Fine
==========================================================

dbench
------
Same as the other two, complete with spikes.

fsmark-single
-------------
Other than overhead going crazy in 3.2 there is nothing notable here either.
As with hydra, there are a small number of outliers that result in large
deviations.

fsmark-threaded
---------------
Similar to hydra. Figures look basically ok but deviations are high with some
clustering going on.

-- 
Mel Gorman
SUSE Labs
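[Editorial aside: as a rough illustration of the clustering problem described
in the fsmark-threaded notes above, here is a minimal sketch of a two-cluster
split versus the arithmetic mean. It is plain Python with no external
dependencies, and the sample values are invented for illustration; they are
not taken from the reports.]

# Minimal 1-D k-means with k=2 versus the arithmetic mean for bimodal
# throughput samples. The numbers below are invented for illustration only.
samples = [24800, 25100, 24950, 25300, 16500, 16800, 16650, 25050, 16700, 24900]

mean = sum(samples) / float(len(samples))

# Start the two centroids at the min and max and iterate until they stop moving.
c_low, c_high = float(min(samples)), float(max(samples))
for _ in range(100):
    low  = [s for s in samples if abs(s - c_low) <= abs(s - c_high)]
    high = [s for s in samples if abs(s - c_low) >  abs(s - c_high)]
    new_low, new_high = sum(low) / float(len(low)), sum(high) / float(len(high))
    if (new_low, new_high) == (c_low, c_high):
        break
    c_low, c_high = new_low, new_high

print("arithmetic mean: %8.0f files/sec" % mean)
print("cluster means:   %8.0f and %.0f files/sec" % (c_low, c_high))

[The arithmetic mean lands in a gap where no sample actually falls, which is
exactly why it misrepresents this kind of data; reporting the per-cluster
means, or a histogram, would be more faithful.]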
* [MMTests] IO metadata on ext4
From: Mel Gorman @ 2012-06-29 11:24 UTC
To: linux-mm; +Cc: linux-kernel, linux-fsdevel

Configuration: global-dhp__io-metadata-ext4
Benchmarks:    dbench3, fsmark-single, fsmark-threaded

Summary
=======

For the most part the figures currently look ok. However, a number of tests
show that we have declined since 3.0 in a number of areas. Some machines show
that there were performance drops in the 3.2 and 3.3 kernels that have not
been fully recovered.

Benchmark notes
===============

mkfs was run on system startup. No attempt was made to age it. No special
mkfs or mount options were used.

dbench3 was chosen as it's metadata intensive.
o Duration was 180 seconds
o OSYNC, OSYNC_DIRECTORY and FSYNC were all off

As noted in MMTests, dbench3 can be a random number generator, particularly
when run in asynchronous mode. Even with these limitations it can be useful
as an early warning system, and as it's still used by QA teams it's still
worth keeping an eye on.

FSMark
o Parallel directories were used
o 1 Thread per CPU
o 0 Filesize
o 225 directories
o 22500 files per directory
o 50000 files per iteration
o 15 iterations
Single: ./fs_mark -d /tmp/fsmark-9227/1 -D 225 -N 22500 -n 50000 -L 15 -S0 -s 0
Thread: ./fs_mark -d /tmp/fsmark-9407/1 -d /tmp/fsmark-9407/2 -D 225 -N 22500 -n 25000 -L 15 -S0 -s 0

FSMark is a more realistic indicator of metadata-intensive workloads.

===========================================================
Machine:  arnold
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-ext4/arnold/comparison.html
Arch:     x86
CPUs:     1 socket, 2 threads
Model:    Pentium 4
Disk:     Single Rotary Disk
Status:   Fine but fsmark has declined since 3.0
===========================================================

dbench
------
For single clients, we're doing reasonably well and this has been consistent
with each release.

fsmark-single
-------------
This is not as happy a story. Variations are quite high, but 3.0 was a
reasonably good kernel and we've been declining ever since, with 3.4 being
marginally worse than 2.6.32.

fsmark-threaded
---------------
The trends are very similar to fsmark-single. 3.0 was reasonably good but we
have degraded since and are at approximately 2.6.32 levels.

==========================================================
Machine:  hydra
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-ext4/hydra/comparison.html
Arch:     x86-64
CPUs:     1 socket, 4 threads
Model:    AMD Phenom II X4 940
Disk:     Single Rotary Disk
Status:   Fine but fsmark has declined since 3.0
==========================================================

dbench
------
Unlike arnold, this is looking good with solid gains in most kernels for the
single-threaded case. The exception was 3.2.9, which saw a big dip that was
recovered in 3.3. For higher numbers of clients the figures still look good.
It's not clear why there is such a difference between arnold and hydra for
the single-threaded case.

fsmark-single
-------------
This is very similar to arnold in that 3.0 performed best and we have
declined since, back to more or less the same level as 2.6.32.

fsmark-threaded
---------------
Performance here is flat in terms of throughput.
3.4 recorded much higher overhead, but it is not clear if this is a cause for
concern.

==========================================================
Machine:  sandy
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-ext4/sandy/comparison.html
Arch:     x86-64
CPUs:     1 socket, 8 threads
Model:    Intel Core i7-2600
Disk:     Single Rotary Disk
Status:   Fine but there have been recent declines
==========================================================

dbench
------
Like hydra, this is looking good with solid gains in most kernels for the
single-threaded case. The same dip in 3.2.9 is visible, but unlike hydra it
was not recovered until 3.4. Higher numbers of clients generally look good as
well, although it is interesting to see that the dip in 3.2.9 is not
consistently visible.

fsmark-single
-------------
Overhead went crazy in 3.3 and there is a large drop in files/sec in 3.3 as
well.

fsmark-threaded
---------------
The trends are similar to the single-threaded case. Looking reasonably good,
but there is a dip in 3.3 that has not been recovered and overhead is higher.

-- 
Mel Gorman
SUSE Labs
* [MMTests] IO metadata on XFS
From: Mel Gorman @ 2012-06-29 11:25 UTC
To: linux-mm; +Cc: linux-kernel, linux-fsdevel, xfs

Configuration: global-dhp__io-metadata-xfs
Benchmarks:    dbench3, fsmark-single, fsmark-threaded

Summary
=======

Most of the figures look good and in general there has been consistent good
performance from XFS. However, fsmark-single is showing a severe performance
dip in a few cases somewhere between 3.1 and 3.4. fs-mark running a single
thread took a particularly bad dive in 3.4 on two machines, which is worth
examining closer. Unfortunately it is harder to draw easy conclusions as the
gains/losses are not consistent between machines, which may be related to the
available number of CPU threads.

Benchmark notes
===============

mkfs was run on system startup.
mkfs parameters:  -f -d agcount=8
mount options:    inode64,delaylog,logbsize=262144,nobarrier for the most
                  part. On kernels too old to support delaylog, it was
                  removed. On kernels where it was the default, it was
                  specified and the warning ignored.

dbench3 was chosen as it's metadata intensive.
o Duration was 180 seconds
o OSYNC, OSYNC_DIRECTORY and FSYNC were all off

As noted in MMTests, dbench3 can be a random number generator, particularly
when run in asynchronous mode. Even with these limitations it can be useful
as an early warning system, and as it's still used by QA teams it's still
worth keeping an eye on.

FSMark
o Parallel directories were used
o 1 Thread per CPU
o 0 Filesize
o 225 directories
o 22500 files per directory
o 50000 files per iteration
o 15 iterations
Single: ./fs_mark -d /tmp/fsmark-9227/1 -D 225 -N 22500 -n 50000 -L 15 -S0 -s 0
Thread: ./fs_mark -d /tmp/fsmark-9407/1 -d /tmp/fsmark-9407/2 -D 225 -N 22500 -n 25000 -L 15 -S0 -s 0

FSMark is a more realistic indicator of metadata-intensive workloads.

===========================================================
Machine:  arnold
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-xfs/arnold/comparison.html
Arch:     x86
CPUs:     1 socket, 2 threads
Model:    Pentium 4
Disk:     Single Rotary Disk
Status:   Great
===========================================================

dbench
------
XFS is showing steady improvements with a large gain for a single client in
2.6.39, more or less retained since then. This is also true for higher
numbers of clients, although 64 clients was suspiciously poor even though 128
clients looked better. I didn't re-examine the raw data to see why. In
general, dbench is looking very good.

fsmark-single
-------------
Again, this is looking good. Files/sec has improved slightly with the
exception of a small dip in 3.2 and 3.3 which may be due to IO-Less dirty
throttling. Overhead measurements are a bit all over the place. Not clear if
this is cause for concern or not.

fsmark-threaded
---------------
Improved since 2.6.32 and has been steadily good for some time. Overhead
measurements are all over the place. Again, not clear if this is a cause for
concern.
==========================================================
Machine:  hydra
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-ext3/hydra/comparison.html
Arch:     x86-64
CPUs:     1 socket, 4 threads
Model:    AMD Phenom II X4 940
Disk:     Single Rotary Disk
Status:   Ok
==========================================================

dbench
------
The results here look very different to the arnold machine. This is curious
because the disks have similar size and performance characteristics. It is
doubtful that the difference is between 32-bit and 64-bit architectures. The
discrepancy may be more due to the different number of CPUs and how XFS does
locking. One possibility is that fewer CPUs has the side-effect of better
batching of some operations, but this is a guess.

The figures show that throughput is worse and highly variable in 3.4 for
single clients. For higher numbers of clients the figures look better
overall. There was a dip in 3.1-based kernels though, for an unknown reason.
This does not exactly correlate with the ext3 figures, although they showed a
dip in performance at 3.2.

fsmark-single
-------------
While performance is better than 2.6.32, there was a dip in performance in
3.3 and a very large dip in 3.4.

fsmark-threaded
---------------
The same dip in 3.4 is visible when multiple threads are used, but it is not
as severe.

==========================================================
Machine:  sandy
Result:   http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-ext3/sandy/comparison.html
Arch:     x86-64
CPUs:     1 socket, 8 threads
Model:    Intel Core i7-2600
Disk:     Single Rotary Disk
Status:   Fine
==========================================================

dbench
------
As seen on other filesystems, this data shows that there was a large dip in
performance around 3.2 for single threads. Unlike the hydra machine, this was
recovered in 3.4. As higher numbers of threads are used, the gains and losses
are inconsistent, making it hard to draw a solid conclusion.

fsmark-single
-------------
This was doing great until 3.4, where there is a large drop.

fsmark-threaded
---------------
Unlike the single-threaded case, things are looking great here.

-- 
Mel Gorman
SUSE Labs
* Re: [MMTests] IO metadata on XFS
From: Dave Chinner @ 2012-07-01 23:54 UTC
To: Mel Gorman; +Cc: linux-mm, linux-kernel, linux-fsdevel, xfs

On Fri, Jun 29, 2012 at 12:25:06PM +0100, Mel Gorman wrote:
> Configuration: global-dhp__io-metadata-xfs
> Benchmarks:    dbench3, fsmark-single, fsmark-threaded
>
> Summary
> =======
> Most of the figures look good and in general there has been consistent
> good performance from XFS. However, fsmark-single is showing a severe
> performance dip in a few cases somewhere between 3.1 and 3.4. fs-mark
> running a single thread took a particularly bad dive in 3.4 on two
> machines, which is worth examining closer.

That will be caused by the fact we changed all the metadata updates to
be logged, which means a transaction every time .dirty_inode is called.

This should mostly go away when XFS is converted to use .update_time
rather than .dirty_inode to only issue transactions when the VFS
updates the atime rather than every .dirty_inode call...

> Unfortunately it is harder to draw easy conclusions as the gains/losses
> are not consistent between machines, which may be related to the
> available number of CPU threads.

It increases the CPU overhead (dirty_inode can be called up to 4 times
per write(2) call, IIRC), so with limited numbers of threads/limited
CPU power it will result in lower performance. Where you have lots of
CPU power, there will be little difference in performance...

Cheers,
Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [MMTests] IO metadata on XFS
From: Christoph Hellwig @ 2012-07-02 6:32 UTC
To: Dave Chinner; +Cc: Mel Gorman, linux-mm, linux-kernel, linux-fsdevel, xfs

On Mon, Jul 02, 2012 at 09:54:58AM +1000, Dave Chinner wrote:
> That will be caused by the fact we changed all the metadata updates
> to be logged, which means a transaction every time .dirty_inode is
> called.
>
> This should mostly go away when XFS is converted to use .update_time
> rather than .dirty_inode to only issue transactions when the VFS
> updates the atime rather than every .dirty_inode call...

I think the patch to do that conversion still needs review..

> It increases the CPU overhead (dirty_inode can be called up to 4
> times per write(2) call, IIRC), so with limited numbers of
> threads/limited CPU power it will result in lower performance. Where
> you have lots of CPU power, there will be little difference in
> performance...

When I checked, it could only be called twice, and we'd already
optimize away the second call. I'd definitely like to track down where
the performance changes happened, at least to a major version but even
better to a -rc or git commit.
* Re: [MMTests] IO metadata on XFS
From: Mel Gorman @ 2012-07-02 14:32 UTC
To: Christoph Hellwig
Cc: Dave Chinner, linux-mm, linux-kernel, linux-fsdevel, xfs

On Mon, Jul 02, 2012 at 02:32:26AM -0400, Christoph Hellwig wrote:
> > It increases the CPU overhead (dirty_inode can be called up to 4
> > times per write(2) call, IIRC), so with limited numbers of
> > threads/limited CPU power it will result in lower performance. Where
> > you have lots of CPU power, there will be little difference in
> > performance...
>
> When I checked, it could only be called twice, and we'd already
> optimize away the second call. I'd definitely like to track down where
> the performance changes happened, at least to a major version but even
> better to a -rc or git commit.
>

By all means feel free to run the test yourself and run the bisection :)

It's rare, but on this occasion the test machine is idle so I started an
automated git bisection. As you know, the mileage with an automated bisect
varies so it may or may not find the right commit. The test machine is sandy,
so
http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-xfs/sandy/comparison.html
is the report of interest. The script is doing a full search between v3.3 and
v3.4 for a point where average files/sec for fsmark-single drops below 25000.
I did not limit the search to fs/xfs on the off-chance that it is an
apparently unrelated patch that caused the problem.

-- 
Mel Gorman
SUSE Labs
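[Editorial aside: the good/bad decision described above boils down to
comparing the average files/sec against a threshold. A minimal sketch of that
check follows; it is not the actual mmtests driver, and the file name and the
assumption that the per-iteration files/sec values have already been
extracted one per line are hypothetical.]

#!/usr/bin/env python
# Hypothetical good/bad check for "git bisect run": exit 0 (good) if the
# average fsmark files/sec is at or above the threshold, 1 (bad) otherwise.
# Assumes the per-iteration files/sec values were already extracted into
# files-sec.txt, one value per line. This is a sketch, not the mmtests driver.
import sys

THRESHOLD = 25000.0

def average(path):
    values = [float(line) for line in open(path) if line.strip()]
    return sum(values) / len(values)

if __name__ == "__main__":
    avg = average(sys.argv[1] if len(sys.argv) > 1 else "files-sec.txt")
    print("average files/sec: %.1f" % avg)
    sys.exit(0 if avg >= THRESHOLD else 1)

[The wrapper that git bisect run invokes still has to build, install and boot
each candidate kernel and run the benchmark before a check like this can be
applied, which is where most of the complexity of an automated kernel
bisection lives.]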
* Re: [MMTests] IO metadata on XFS
From: Mel Gorman @ 2012-07-02 19:35 UTC
To: Christoph Hellwig
Cc: Dave Chinner, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel,
    Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson

Adding dri-devel and a few others because an i915 patch contributed to
the regression.

On Mon, Jul 02, 2012 at 03:32:15PM +0100, Mel Gorman wrote:
> On Mon, Jul 02, 2012 at 02:32:26AM -0400, Christoph Hellwig wrote:
> > > It increases the CPU overhead (dirty_inode can be called up to 4
> > > times per write(2) call, IIRC), so with limited numbers of
> > > threads/limited CPU power it will result in lower performance. Where
> > > you have lots of CPU power, there will be little difference in
> > > performance...
> >
> > When I checked, it could only be called twice, and we'd already
> > optimize away the second call. I'd definitely like to track down where
> > the performance changes happened, at least to a major version but even
> > better to a -rc or git commit.
> >
>
> By all means feel free to run the test yourself and run the bisection :)
>
> It's rare, but on this occasion the test machine is idle so I started an
> automated git bisection. As you know, the mileage with an automated bisect
> varies so it may or may not find the right commit. The test machine is sandy,
> so
> http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-xfs/sandy/comparison.html
> is the report of interest. The script is doing a full search between v3.3 and
> v3.4 for a point where average files/sec for fsmark-single drops below 25000.
> I did not limit the search to fs/xfs on the off-chance that it is an
> apparently unrelated patch that caused the problem.
>

It was obvious very quickly that there were two distinct regressions, so I
ran two bisections. One led to an XFS patch and the other led to an i915
patch that enables RC6 to reduce power usage.

[c999a223: xfs: introduce an allocation workqueue]
[aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default]

gdm was running on the machine so i915 would have been in use. In case it is
of interest, this is the log of the bisection. Lines beginning with # are
notes I made and all other lines are from the bisection script. The
second-last column is the files/sec recorded by fsmark.
# MARK v3.3..v3.4 Search for BAD files/sec -lt 28000
# BAD 16536
# GOOD 34757
Mon Jul 2 15:46:13 IST 2012 sandy xfsbisect 141124c02059eee9dbc5c86ea797b1ca888e77f7 37454 good
Mon Jul 2 15:56:06 IST 2012 sandy xfsbisect 55a320308902f7a0746569ee57eeb3f254e6ed16 25192 bad
Mon Jul 2 16:08:34 IST 2012 sandy xfsbisect 281b05392fc2cb26209b4d85abaf4889ab1991f3 38807 good
Mon Jul 2 16:18:02 IST 2012 sandy xfsbisect a8364d5555b2030d093cde0f07951628e55454e1 37553 good
Mon Jul 2 16:27:22 IST 2012 sandy xfsbisect d2a2fc18d98d8ee2dec1542efc7f47beec256144 36676 good
Mon Jul 2 16:36:48 IST 2012 sandy xfsbisect 2e7580b0e75d771d93e24e681031a165b1d31071 37756 good
Mon Jul 2 16:46:36 IST 2012 sandy xfsbisect 532bfc851a7475fb6a36c1e953aa395798a7cca7 25416 bad
Mon Jul 2 16:56:10 IST 2012 sandy xfsbisect 0c9aac08261512d70d7d4817bd222abca8b6bdd6 38486 good
Mon Jul 2 17:05:40 IST 2012 sandy xfsbisect 0fc9d1040313047edf6a39fd4d7c7defdca97c62 37970 good
Mon Jul 2 17:16:01 IST 2012 sandy xfsbisect 5a5881cdeec2c019b5c9a307800218ee029f7f61 24493 bad
Mon Jul 2 17:21:15 IST 2012 sandy xfsbisect f616137519feb17b849894fcbe634a021d3fa7db 24405 bad
Mon Jul 2 17:26:16 IST 2012 sandy xfsbisect 5575acc7807595687288b3bbac15103f2a5462e1 37336 good
Mon Jul 2 17:31:25 IST 2012 sandy xfsbisect c999a223c2f0d31c64ef7379814cea1378b2b800 24552 bad
Mon Jul 2 17:36:34 IST 2012 sandy xfsbisect 1a1d772433d42aaff7315b3468fef5951604f5c6 36872 good
# c999a223c2f0d31c64ef7379814cea1378b2b800 is the first bad commit
# [c999a223: xfs: introduce an allocation workqueue]
#
# MARK c999a223c2f0d31c64ef7379814cea1378b2b800..v3.4 Search for BAD files/sec -lt 20000
# BAD 16536
# GOOD 24552
Mon Jul 2 17:48:39 IST 2012 sandy xfsbisect b2094ef840697bc8ca5d17a83b7e30fad5f1e9fa 37435 good
Mon Jul 2 17:58:12 IST 2012 sandy xfsbisect d2a2fc18d98d8ee2dec1542efc7f47beec256144 38303 good
Mon Jul 2 18:08:18 IST 2012 sandy xfsbisect 5d32c88f0b94061b3af2e3ade92422407282eb12 16718 bad
Mon Jul 2 18:18:02 IST 2012 sandy xfsbisect 2f7fa1be66dce77608330c5eb918d6360b5525f2 24964 good
Mon Jul 2 18:24:14 IST 2012 sandy xfsbisect 923f79743c76583ed4684e2c80c8da51a7268af3 24963 good
Mon Jul 2 18:33:49 IST 2012 sandy xfsbisect b61c37f57988567c84359645f8202a7c84bc798a 24824 good
Mon Jul 2 18:40:20 IST 2012 sandy xfsbisect 20a2a811602b16c42ce88bada3d52712cdfb988b 17155 bad
Mon Jul 2 18:50:12 IST 2012 sandy xfsbisect 78fb72f7936c01d5b426c03a691eca082b03f2b9 38494 good
Mon Jul 2 19:00:24 IST 2012 sandy xfsbisect e1a7eb08ee097e97e928062a242b0de5b2599a11 25033 good
Mon Jul 2 19:10:24 IST 2012 sandy xfsbisect 97effadb65ed08809e1720c8d3ee80b73a93665c 16520 bad
Mon Jul 2 19:16:16 IST 2012 sandy xfsbisect 25e341cfc33d94435472983825163e97fe370a6c 16748 bad
Mon Jul 2 19:21:52 IST 2012 sandy xfsbisect 7dd4906586274f3945f2aeaaa5a33b451c3b4bba 24957 good
Mon Jul 2 19:27:35 IST 2012 sandy xfsbisect aa46419186992e6b8b8010319f0ca7f40a0d13f5 17088 bad
Mon Jul 2 19:32:54 IST 2012 sandy xfsbisect 83b7f9ac9126f0532ca34c14e4f0582c565c6b0d 25667 good
# aa46419186992e6b8b8010319f0ca7f40a0d13f5 is the first bad commit
# [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default]

I tested plain reverts of the patches individually and together and got the
following results

FS-Mark Single Threaded
                               3.4.0                3.4.0                3.4.0                3.4.0
                             vanilla      revert-aa464191      revert-c999a223          revert-both
Files/s  min              14176.40 ( 0.00%)   17830.60 ( 25.78%)   24186.70 ( 70.61%)   25108.00 ( 77.11%)
Files/s  mean             16783.35 ( 0.00%)   25029.69 ( 49.13%)   37513.72 (123.52%)   38169.97 (127.43%)
Files/s  stddev            1007.26 ( 0.00%)    2644.87 (162.58%)    5344.99 (430.65%)    5599.65 (455.93%)
Files/s  max              18475.40 ( 0.00%)   27966.10 ( 51.37%)   45564.60 (146.62%)   47918.10 (159.36%)
Overhead min             593978.00 ( 0.00%)  386173.00 ( 34.99%)  253812.00 ( 57.27%)  247396.00 ( 58.35%)
Overhead mean            637782.80 ( 0.00%)  429229.33 ( 32.70%)  322868.20 ( 49.38%)  287141.73 ( 54.98%)
Overhead stddev           72440.72 ( 0.00%)  100056.96 (-38.12%)  175001.08 (-141.58%) 102018.14 (-40.83%)
Overhead max             855637.00 ( 0.00%)  753541.00 ( 11.93%)  880531.00 ( -2.91%)  637932.00 ( 25.44%)

MMTests Statistics: duration
Sys  Time Running Test (seconds)          44.06    32.25    24.19    23.99
User+Sys Time Running Test (seconds)      50.19    36.35    27.24    26.7
Total Elapsed Time (seconds)              59.21    44.76    34.95    34.14

Individually reverting either patch makes a difference to both files/sec and
overhead. Reverting both is not as dramatic as reverting each individual
patch would indicate, but it's still a major improvement.

-- 
Mel Gorman
SUSE Labs
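[Editorial aside: the bracketed figures in the comparison above are relative
changes against the vanilla column, with the sign flipped for metrics where
lower is better. A minimal sketch of that calculation follows; treating
Overhead as lower-is-better is inferred from the figures shown, not from
mmtests documentation.]

# Relative improvement against the vanilla kernel, as shown in brackets in
# the comparison table above. Positive means better, negative means worse.
def improvement(base, new, higher_is_better=True):
    if higher_is_better:
        return (new - base) / base * 100.0
    return (base - new) / base * 100.0

# Sanity checks against two rows of the table (revert-both column).
print("%.2f%%" % improvement(16783.35, 38169.97))          # Files/s mean  -> 127.43%
print("%.2f%%" % improvement(637782.80, 287141.73, False))  # Overhead mean -> 54.98%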
* Re: [MMTests] IO metadata on XFS
From: Dave Chinner @ 2012-07-03 0:19 UTC
To: Mel Gorman
Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel,
    Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson

On Mon, Jul 02, 2012 at 08:35:16PM +0100, Mel Gorman wrote:
> Adding dri-devel and a few others because an i915 patch contributed to
> the regression.
>
> On Mon, Jul 02, 2012 at 03:32:15PM +0100, Mel Gorman wrote:
> > On Mon, Jul 02, 2012 at 02:32:26AM -0400, Christoph Hellwig wrote:
> > > > It increases the CPU overhead (dirty_inode can be called up to 4
> > > > times per write(2) call, IIRC), so with limited numbers of
> > > > threads/limited CPU power it will result in lower performance. Where
> > > > you have lots of CPU power, there will be little difference in
> > > > performance...
> > >
> > > When I checked, it could only be called twice, and we'd already
> > > optimize away the second call. I'd definitely like to track down where
> > > the performance changes happened, at least to a major version but even
> > > better to a -rc or git commit.
> > >
> >
> > By all means feel free to run the test yourself and run the bisection :)
> >
> > It's rare, but on this occasion the test machine is idle so I started an
> > automated git bisection. As you know, the mileage with an automated bisect
> > varies so it may or may not find the right commit. The test machine is sandy,
> > so
> > http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-xfs/sandy/comparison.html
> > is the report of interest. The script is doing a full search between v3.3 and
> > v3.4 for a point where average files/sec for fsmark-single drops below 25000.
> > I did not limit the search to fs/xfs on the off-chance that it is an
> > apparently unrelated patch that caused the problem.
> >
>
> It was obvious very quickly that there were two distinct regressions, so I
> ran two bisections. One led to an XFS patch and the other led to an i915
> patch that enables RC6 to reduce power usage.
>
> [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default]

Doesn't seem to be the major cause of the regression. By itself, it
has impact, but the majority comes from the XFS change...

> [c999a223: xfs: introduce an allocation workqueue]

Which indicates that there are workqueue scheduling issues, I think.
The same amount of work is being done, but half of it is being
pushed off into a workqueue to avoid stack overflow issues (*). I
tested the above patch in anger on an 8p machine, similar to the
machine you saw no regressions on, but the workload didn't drive it
to being completely CPU bound (only about 90%) so the allocation
work was probably always scheduled quickly.

How many worker threads have been spawned on these machines that are
showing the regression? What is the context switch rate on the
machines when the test is running? Can you run latencytop to see if
there is excessive starvation/wait times for allocation completion?
A perf top profile comparison might be informative, too...

(*) The stack usage below submit_bio() can be more than 5k (DM, MD,
SCSI, driver, memory allocation), so it's really not safe to do
allocation anywhere below about 3k of kernel stack being used. e.g.
on a relatively trivial storage setup without the above commit:

[142296.384921] flush-253:4 used greatest stack depth: 360 bytes left

Fundamentally, 8k stacks on x86-64 are too small for our increasingly
complex storage layers and the 100+ function deep call chains that
occur.

Cheers,
Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: [MMTests] IO metadata on XFS
From: Mel Gorman @ 2012-07-03 10:59 UTC
To: Dave Chinner
Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel,
    Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson

On Tue, Jul 03, 2012 at 10:19:28AM +1000, Dave Chinner wrote:
> On Mon, Jul 02, 2012 at 08:35:16PM +0100, Mel Gorman wrote:
> > Adding dri-devel and a few others because an i915 patch contributed to
> > the regression.
> >
> > On Mon, Jul 02, 2012 at 03:32:15PM +0100, Mel Gorman wrote:
> > > On Mon, Jul 02, 2012 at 02:32:26AM -0400, Christoph Hellwig wrote:
> > > > > It increases the CPU overhead (dirty_inode can be called up to 4
> > > > > times per write(2) call, IIRC), so with limited numbers of
> > > > > threads/limited CPU power it will result in lower performance. Where
> > > > > you have lots of CPU power, there will be little difference in
> > > > > performance...
> > > >
> > > > When I checked, it could only be called twice, and we'd already
> > > > optimize away the second call. I'd definitely like to track down where
> > > > the performance changes happened, at least to a major version but even
> > > > better to a -rc or git commit.
> > > >
> > >
> > > By all means feel free to run the test yourself and run the bisection :)
> > >
> > > It's rare, but on this occasion the test machine is idle so I started an
> > > automated git bisection. As you know, the mileage with an automated bisect
> > > varies so it may or may not find the right commit. The test machine is sandy,
> > > so
> > > http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-xfs/sandy/comparison.html
> > > is the report of interest. The script is doing a full search between v3.3 and
> > > v3.4 for a point where average files/sec for fsmark-single drops below 25000.
> > > I did not limit the search to fs/xfs on the off-chance that it is an
> > > apparently unrelated patch that caused the problem.
> > >
> >
> > It was obvious very quickly that there were two distinct regressions, so I
> > ran two bisections. One led to an XFS patch and the other led to an i915
> > patch that enables RC6 to reduce power usage.
> >
> > [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default]
>
> Doesn't seem to be the major cause of the regression. By itself, it
> has impact, but the majority comes from the XFS change...
>

The fact it has an impact at all is weird, but let's see what the DRI
folks think about it.

> > [c999a223: xfs: introduce an allocation workqueue]
>
> Which indicates that there are workqueue scheduling issues, I think.
> The same amount of work is being done, but half of it is being
> pushed off into a workqueue to avoid stack overflow issues (*). I
> tested the above patch in anger on an 8p machine, similar to the
> machine you saw no regressions on, but the workload didn't drive it
> to being completely CPU bound (only about 90%) so the allocation
> work was probably always scheduled quickly.
>

What test were you using?

> How many worker threads have been spawned on these machines
> that are showing the regression?

20 or 21 generally.
An example list as spotted by top looks like kworker/0:0 kworker/0:1 kworker/0:2 kworker/1:0 kworker/1:1 kworker/1:2 kworker/2:0 kworker/2:1 kworker/2:2 kworker/3:0 kworker/3:1 kworker/3:2 kworker/4:0 kworker/4:1 kworker/5:0 kworker/5:1 kworker/6:0 kworker/6:1 kworker/6:2 kworker/7:0 kworker/7:1 There were 8 unbound workers. > What is the context switch rate on the machines whenteh test is running? This is vmstat from a vanilla kernel. The actual vmstat is after the --. The information before that is recorded by mmtests to try and detect if there was jitter in the vmstat output. It's showing that there is little or no jitter in this test. VANILLA 1341306582.6713 1.8109 1.8109 -- 0 0 0 16050784 11448 104056 0 0 376 0 209 526 0 0 99 1 0 1341306584.6715 3.8112 2.0003 -- 1 0 0 16050628 11448 104064 0 0 0 0 121 608 0 0 100 0 0 1341306586.6718 5.8114 2.0003 -- 0 0 0 16047432 11460 104288 0 0 102 45 227 999 0 0 99 1 0 1341306588.6721 7.8117 2.0003 -- 1 0 0 16046944 11460 104292 0 0 0 0 120 663 0 0 100 0 0 1341306590.6723 9.8119 2.0002 -- 0 2 0 16045788 11476 104296 0 0 12 40 190 754 0 0 99 0 0 1341306592.6725 11.8121 2.0002 -- 0 1 0 15990236 12600 141724 0 0 19054 30 1400 2937 2 1 88 9 0 1341306594.6727 13.8124 2.0002 -- 1 0 0 15907628 12600 186360 0 0 1653 0 3117 6406 2 9 88 1 0 1341306596.6730 15.8127 2.0003 -- 0 0 0 15825964 12608 226636 0 0 15 11024 3073 6350 2 9 89 0 0 1341306598.6733 17.8130 2.0003 -- 1 0 0 15730420 12608 271632 0 0 0 3072 3461 7179 2 10 88 0 0 1341306600.6736 19.8132 2.0003 -- 1 0 0 15686200 12608 310816 0 0 0 12416 3093 6198 2 9 89 0 0 1341306602.6738 21.8135 2.0003 -- 2 0 0 15593588 12616 354928 0 0 0 32 3482 7146 2 11 87 0 0 1341306604.6741 23.8138 2.0003 -- 2 0 0 15562032 12616 393772 0 0 0 12288 3129 6330 2 10 89 0 0 1341306606.6744 25.8140 2.0002 -- 1 0 0 15458316 12624 438004 0 0 0 26 3471 7107 2 11 87 0 0 1341306608.6746 27.8142 2.0002 -- 1 0 0 15432024 12624 474244 0 0 0 12416 3011 6017 1 10 89 0 0 1341306610.6749 29.8145 2.0003 -- 2 0 0 15343280 12624 517696 0 0 0 24 3393 6826 2 11 87 0 0 1341306612.6751 31.8148 2.0002 -- 1 0 0 15311136 12632 551816 0 0 0 16502 2818 5653 2 9 88 1 0 1341306614.6754 33.8151 2.0003 -- 1 0 0 15220648 12632 594936 0 0 0 3584 3451 6779 2 11 87 0 0 1341306616.6755 35.8152 2.0001 -- 4 0 0 15221252 12632 649296 0 0 0 38559 4846 8709 2 15 78 6 0 1341306618.6758 37.8155 2.0003 -- 1 0 0 15177724 12640 668476 0 0 20 40679 2204 4067 1 5 89 5 0 1341306620.6761 39.8158 2.0003 -- 1 0 0 15090204 12640 711752 0 0 0 0 3316 6788 2 11 88 0 0 1341306622.6764 41.8160 2.0003 -- 1 0 0 15005356 12640 748532 0 0 0 12288 3073 6132 2 10 89 0 0 1341306624.6766 43.8163 2.0002 -- 2 0 0 14913088 12648 791952 0 0 0 28 3408 6806 2 11 87 0 0 1341306626.6769 45.8166 2.0003 -- 1 0 0 14891512 12648 826328 0 0 0 12420 2906 5710 1 9 90 0 0 1341306628.6772 47.8168 2.0003 -- 1 0 0 14794316 12656 868936 0 0 0 26 3367 6798 2 11 87 0 0 1341306630.6774 49.8171 2.0003 -- 1 0 0 14769188 12656 905016 0 0 30 12324 3029 5876 2 10 89 0 0 1341306632.6777 51.8173 2.0002 -- 1 0 0 14679544 12656 947712 0 0 0 0 3399 6868 2 11 87 0 0 1341306634.6780 53.8176 2.0003 -- 1 0 0 14646156 12664 982032 0 0 0 14658 2987 5761 1 10 89 0 0 1341306636.6782 55.8179 2.0003 -- 1 0 0 14560504 12664 1023816 0 0 0 4404 3454 6876 2 11 87 0 0 1341306638.6783 57.8180 2.0001 -- 2 0 0 14533384 12664 1056812 0 0 0 15810 3002 5581 1 10 89 0 0 1341306640.6785 59.8182 2.0002 -- 1 0 0 14593332 12672 1027392 0 0 0 31790 3504 1811 1 13 78 8 0 1341306642.6787 61.8183 2.0001 -- 1 0 0 14686968 12672 1007604 0 0 0 14621 
2434 1248 1 10 89 0 0 1341306644.6789 63.8185 2.0002 -- 1 1 0 15042476 12680 788104 0 0 0 36564 2809 1484 1 12 86 1 0 1341306646.6790 65.8187 2.0002 -- 1 0 0 15128292 12680 757948 0 0 0 26395 3050 1313 1 13 86 1 0 1341306648.6792 67.8189 2.0002 -- 1 0 0 15160036 12680 727964 0 0 0 5463 2752 910 1 12 87 0 0 1341306650.6795 69.8192 2.0003 -- 0 0 0 15633256 12688 332572 0 0 1156 12308 2117 2346 1 7 91 1 0 1341306652.6797 71.8194 2.0002 -- 0 0 0 15633892 12688 332652 0 0 0 0 224 758 0 0 100 0 0 1341306654.6800 73.8197 2.0003 -- 0 0 0 15633900 12688 332524 0 0 0 0 231 1009 0 0 100 0 0 1341306656.6803 75.8199 2.0003 -- 0 0 0 15637436 12696 332504 0 0 0 38 266 713 0 0 99 0 0 1341306658.6805 77.8202 2.0003 -- 0 0 0 15654180 12696 332352 0 0 0 0 270 821 0 0 100 0 0 REVERT-XFS 1341307733.8702 1.7941 1.7941 -- 0 0 0 16050640 12036 103996 0 0 372 0 216 752 0 0 99 1 0 1341307735.8704 3.7944 2.0002 -- 0 0 0 16050864 12036 104028 0 0 0 0 132 857 0 0 100 0 0 1341307737.8707 5.7946 2.0002 -- 0 0 0 16047492 12048 104252 0 0 102 37 255 938 0 0 99 1 0 1341307739.8709 7.7949 2.0003 -- 0 0 0 16047600 12072 104324 0 0 32 2 129 658 0 0 100 0 0 1341307741.8712 9.7951 2.0002 -- 1 1 0 16046676 12080 104328 0 0 0 32 165 729 0 0 100 0 0 1341307743.8714 11.7954 2.0003 -- 0 1 0 15990840 13216 142612 0 0 19422 30 1467 3015 2 1 89 8 0 1341307745.8717 13.7956 2.0002 -- 0 0 0 15825496 13216 226396 0 0 1310 11214 2217 1348 2 8 89 1 0 1341307747.8717 15.7957 2.0001 -- 1 0 0 15677816 13224 314672 0 0 4 15294 2307 1173 2 9 89 0 0 1341307749.8719 17.7959 2.0002 -- 1 0 0 15524372 13224 409728 0 0 0 12288 2466 888 1 10 89 0 0 1341307751.8721 19.7960 2.0002 -- 1 0 0 15368424 13224 502552 0 0 0 12416 2312 878 1 10 89 0 0 1341307753.8722 21.7962 2.0002 -- 1 0 0 15225216 13232 593092 0 0 0 12448 2539 1380 1 10 88 0 0 1341307755.8724 23.7963 2.0002 -- 2 0 0 15163712 13232 664768 0 0 0 32160 2184 1177 1 8 90 0 0 1341307757.8727 25.7967 2.0003 -- 1 0 0 14973888 13240 755080 0 0 0 12316 2482 1219 1 10 89 0 0 1341307759.8728 27.7968 2.0001 -- 1 0 0 14883580 13240 840036 0 0 0 44471 2711 1234 2 10 88 0 0 1341307761.8730 29.7970 2.0002 -- 1 0 0 14800304 13240 920504 0 0 0 42554 2571 1050 1 10 89 0 0 1341307763.8734 31.7973 2.0003 -- 0 0 0 14642504 13248 995004 0 0 0 3232 2276 1081 1 8 90 0 0 1341307765.8737 33.7976 2.0003 -- 1 0 0 14545072 13248 1052536 0 0 0 18688 2628 1114 1 9 89 0 0 1341307767.8739 35.7979 2.0003 -- 1 0 0 14783848 13248 926824 0 0 0 59559 2409 1308 0 10 89 1 0 1341307769.8740 37.7980 2.0001 -- 2 0 0 14854800 13256 896832 0 0 0 9172 2419 1004 1 10 89 1 0 1341307771.8742 39.7981 2.0002 -- 2 0 0 14835084 13256 875612 0 0 0 12288 2524 812 0 11 89 0 0 1341307773.8743 41.7983 2.0002 -- 2 0 0 15126252 13256 745844 0 0 0 10297 2714 1163 1 12 88 0 0 1341307775.8745 43.7985 2.0002 -- 1 0 0 15108800 13264 724544 0 0 0 12316 2499 931 1 11 88 0 0 1341307777.8746 45.7986 2.0001 -- 2 0 0 15226236 13264 694580 0 0 0 12416 2700 1194 1 12 88 0 0 1341307779.8750 47.7989 2.0003 -- 1 0 0 15697632 13264 300716 0 0 1156 0 934 1701 0 2 96 1 0 1341307781.8752 49.7992 2.0003 -- 0 0 0 15697508 13272 300720 0 0 0 66 166 641 0 0 100 0 0 1341307783.8755 51.7995 2.0003 -- 0 0 0 15699008 13272 300524 0 0 0 0 248 865 0 0 100 0 0 1341307785.8758 53.7997 2.0003 -- 0 0 0 15702452 13272 300520 0 0 0 0 285 960 0 0 99 0 0 1341307787.8760 55.7999 2.0002 -- 0 0 0 15719404 13280 300436 0 0 0 26 136 590 0 0 99 0 0 Vanilla average context switch rate 4278.53 Revert average context switch rate 1095 > Can you run latencytop to see > if there is excessive 
starvation/wait times for allocation > completion? I'm not sure what format you are looking for. latencytop is shit for capturing information throughout a test and it does not easily allow you to record a snapshot of a test. You can record all the console output of course but that's a complete mess. I tried capturing /proc/latency_stats over time instead because that can be trivially sorted on a system-wide basis but as I write this I find that latency_stats was bust. It was just spitting out Latency Top version : v0.1 and nothing else. Either latency_stats is broken or my config is. Not sure which it is right now and won't get enough time on this today to pinpoint it. > A pert top profile comparison might be informative, > too... > I'm not sure if this is what you really wanted. I thought an oprofile or perf report would have made more sense but I recorded perf top over time anyway and it's at the end of the mail. The timestamp information is poor because the perf top information was buffered so it would receive a bunch of updates at once. Each sample should be roughly 2 seconds apart. This buffering can be dealt with, I just failed to do it in advance and I do not think it's necessary to rerun the tests for it. > (*) The stack usage below submit_bio() can be more than 5k (DM, MD, > SCSI, driver, memory allocation), so it's really not safe to do > allocation anywhere below about 3k of kernel stack being used. e.g. > on a relatively trivial storage setup without the above commit: > > [142296.384921] flush-253:4 used greatest stack depth: 360 bytes left > > Fundamentally, 8k stacks on x86-64 are too small for our > increasingly complex storage layers and the 100+ function deep call > chains that occur. > I understand the patches motivation. For these tests I'm being deliberately a bit of a dummy and just capturing information. This might allow me to actually get through all the results and identify some of the problems and spread them around a bit. Either that or I need to clone myself a few times to tackle each of the problems in a reasonable timeframe :) For just these XFS tests I've uploaded a tarball of the logs to http://www.csn.ul.ie/~mel/postings/xfsbisect-20120703/xfsbisect-logs.tar.gz For results with no monitor you can find them somewhere like this default/no-monitor/sandy/fsmark-single-3.4.0-vanilla/noprofile/fsmark.log Results with monitors attached are in run-monitor. You can read the iostat logs for example from default/run-monitor/sandy/iostat-3.4.0-vanilla-fsmark-single Some of the monitor logs are gzipped. This is perf top over time for the vanilla kernel time: 1341306570 time: 1341306579 PerfTop: 1 irqs/sec kernel: 0.0% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 61.85% [kernel] [k] __rmqueue 38.15% libc-2.11.3.so [.] _IO_vfscanf time: 1341306579 PerfTop: 3 irqs/sec kernel:66.7% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 19.88% [kernel] [k] _raw_spin_lock_irqsave 17.14% [kernel] [k] __rmqueue 16.96% [kernel] [k] format_decode 15.37% libc-2.11.3.so [.] __tzfile_compute 13.55% [kernel] [k] copy_user_generic_string 10.57% libc-2.11.3.so [.] 
_IO_vfscanf 6.53% [kernel] [k] find_first_bit time: 1341306579 PerfTop: 0 irqs/sec kernel:-nan% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 17.51% [kernel] [k] _raw_spin_lock_irqsave 15.10% [kernel] [k] __rmqueue 14.94% [kernel] [k] format_decode 13.54% libc-2.11.3.so [.] __tzfile_compute 11.94% [kernel] [k] copy_user_generic_string 11.90% [kernel] [k] _raw_spin_lock 9.31% libc-2.11.3.so [.] _IO_vfscanf 5.75% [kernel] [k] find_first_bit time: 1341306579 PerfTop: 41 irqs/sec kernel:58.5% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 13.62% [kernel] [k] _raw_spin_lock_irqsave 11.02% [kernel] [k] __rmqueue 10.91% [kernel] [k] format_decode 9.89% libc-2.11.3.so [.] __tzfile_compute 8.72% [kernel] [k] copy_user_generic_string 8.69% [kernel] [k] _raw_spin_lock 7.15% libc-2.11.3.so [.] _IO_vfscanf 4.20% [kernel] [k] find_first_bit 1.47% libc-2.11.3.so [.] __strcmp_sse42 1.37% libc-2.11.3.so [.] __strchr_sse42 1.19% sed [.] 0x0000000000009f7d 0.90% libc-2.11.3.so [.] vfprintf 0.84% [kernel] [k] hrtimer_interrupt 0.84% libc-2.11.3.so [.] re_string_realloc_buffers 0.76% [kernel] [k] enqueue_entity 0.66% [kernel] [k] __switch_to 0.65% libc-2.11.3.so [.] _IO_default_xsputn 0.62% [kernel] [k] do_vfs_ioctl 0.59% [kernel] [k] perf_event_mmap_event 0.56% gzip [.] 0x0000000000007b96 0.55% libc-2.11.3.so [.] bsearch time: 1341306579 PerfTop: 35 irqs/sec kernel:62.9% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.50% [kernel] [k] _raw_spin_lock_irqsave 9.22% [kernel] [k] __rmqueue 9.13% [kernel] [k] format_decode 8.27% libc-2.11.3.so [.] __tzfile_compute 7.92% [kernel] [k] copy_user_generic_string 7.74% [kernel] [k] _raw_spin_lock 6.21% libc-2.11.3.so [.] _IO_vfscanf 3.51% [kernel] [k] find_first_bit 1.44% gzip [.] 0x0000000000007b96 1.23% libc-2.11.3.so [.] __strcmp_sse42 1.15% libc-2.11.3.so [.] __strchr_sse42 1.06% libc-2.11.3.so [.] vfprintf 0.99% sed [.] 0x0000000000009f7d 0.92% [unknown] [.] 0x00007f84a7766b99 0.70% [kernel] [k] hrtimer_interrupt 0.70% libc-2.11.3.so [.] re_string_realloc_buffers 0.64% [kernel] [k] enqueue_entity 0.58% libtcl8.5.so [.] 0x000000000006fe86 0.55% [kernel] [k] __switch_to 0.54% libc-2.11.3.so [.] _IO_default_xsputn 0.53% [kernel] [k] __d_lookup_rcu time: 1341306585 PerfTop: 100 irqs/sec kernel:59.0% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 8.61% [kernel] [k] _raw_spin_lock_irqsave 5.92% [kernel] [k] __rmqueue 5.86% [kernel] [k] format_decode 5.31% libc-2.11.3.so [.] __tzfile_compute 5.30% [kernel] [k] copy_user_generic_string 5.27% [kernel] [k] _raw_spin_lock 3.99% libc-2.11.3.so [.] _IO_vfscanf 2.45% [unknown] [.] 0x00007f84a7766b99 2.26% [kernel] [k] find_first_bit 1.68% [kernel] [k] page_fault 1.45% libc-2.11.3.so [.] _int_malloc 1.28% gzip [.] 0x0000000000007b96 1.13% libc-2.11.3.so [.] vfprintf 1.06% libc-2.11.3.so [.] __strchr_sse42 1.02% perl [.] 0x0000000000044505 0.79% libc-2.11.3.so [.] __strcmp_sse42 0.79% [kernel] [k] do_task_stat 0.77% [kernel] [k] zap_pte_range 0.72% libc-2.11.3.so [.] __gconv_transform_utf8_internal 0.70% libc-2.11.3.so [.] malloc 0.70% libc-2.11.3.so [.] 
__mbrtowc time: 1341306585 PerfTop: 19 irqs/sec kernel:78.9% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 7.97% [kernel] [k] _raw_spin_lock_irqsave 5.48% [kernel] [k] __rmqueue 5.43% [kernel] [k] format_decode 5.24% [kernel] [k] copy_user_generic_string 5.18% [kernel] [k] _raw_spin_lock 4.92% libc-2.11.3.so [.] __tzfile_compute 4.25% libc-2.11.3.so [.] _IO_vfscanf 2.33% [unknown] [.] 0x00007f84a7766b99 2.12% [kernel] [k] page_fault 2.09% [kernel] [k] find_first_bit 1.34% libc-2.11.3.so [.] _int_malloc 1.19% gzip [.] 0x0000000000007b96 1.05% libc-2.11.3.so [.] vfprintf 0.98% libc-2.11.3.so [.] __strchr_sse42 0.94% perl [.] 0x0000000000044505 0.94% libc-2.11.3.so [.] _dl_addr 0.91% [kernel] [k] zap_pte_range 0.74% [kernel] [k] s_show 0.73% libc-2.11.3.so [.] __strcmp_sse42 0.73% [kernel] [k] do_task_stat 0.67% libc-2.11.3.so [.] __gconv_transform_utf8_internal time: 1341306585 PerfTop: 38 irqs/sec kernel:68.4% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 7.64% [kernel] [k] _raw_spin_lock_irqsave 4.89% [kernel] [k] _raw_spin_lock 4.77% [kernel] [k] __rmqueue 4.72% [kernel] [k] format_decode 4.56% [kernel] [k] copy_user_generic_string 4.53% libc-2.11.3.so [.] _IO_vfscanf 4.28% libc-2.11.3.so [.] __tzfile_compute 2.52% [unknown] [.] 0x00007f84a7766b99 no symbols found in /bin/sort, maybe install a debug package? 2.10% [kernel] [k] page_fault 1.82% [kernel] [k] find_first_bit 1.31% libc-2.11.3.so [.] _int_malloc 1.14% libc-2.11.3.so [.] vfprintf 1.08% libc-2.11.3.so [.] _dl_addr 1.07% [kernel] [k] s_show 1.05% libc-2.11.3.so [.] __strchr_sse42 1.03% gzip [.] 0x0000000000007b96 0.82% [kernel] [k] do_task_stat 0.82% perl [.] 0x0000000000044505 0.79% [kernel] [k] zap_pte_range 0.70% [kernel] [k] seq_put_decimal_ull 0.69% [kernel] [k] find_busiest_group time: 1341306591 PerfTop: 66 irqs/sec kernel:59.1% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 6.52% [kernel] [k] _raw_spin_lock_irqsave 4.11% libc-2.11.3.so [.] _IO_vfscanf 3.91% [kernel] [k] _raw_spin_lock 3.50% [kernel] [k] copy_user_generic_string 3.41% [kernel] [k] __rmqueue 3.38% [kernel] [k] format_decode 3.06% libc-2.11.3.so [.] __tzfile_compute 2.90% [unknown] [.] 0x00007f84a7766b99 2.30% [kernel] [k] page_fault 2.20% perl [.] 0x0000000000044505 1.83% libc-2.11.3.so [.] vfprintf 1.61% libc-2.11.3.so [.] _int_malloc 1.30% [kernel] [k] find_first_bit 1.22% libc-2.11.3.so [.] _dl_addr 1.19% libc-2.11.3.so [.] __gconv_transform_utf8_internal 1.10% libc-2.11.3.so [.] __strchr_sse42 1.01% [kernel] [k] zap_pte_range 0.99% [kernel] [k] s_show 0.98% [kernel] [k] __percpu_counter_add 0.86% [kernel] [k] __strnlen_user 0.75% ld-2.11.3.so [.] do_lookup_x time: 1341306591 PerfTop: 39 irqs/sec kernel:69.2% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 6.26% [kernel] [k] _raw_spin_lock_irqsave 4.05% [kernel] [k] _raw_spin_lock 3.86% libc-2.11.3.so [.] _IO_vfscanf 3.21% [kernel] [k] copy_user_generic_string 3.03% [kernel] [k] __rmqueue 3.00% [kernel] [k] format_decode 2.93% [unknown] [.] 0x00007f84a7766b99 2.72% libc-2.11.3.so [.] __tzfile_compute 2.20% [kernel] [k] page_fault 1.96% perl [.] 0x0000000000044505 1.77% libc-2.11.3.so [.] vfprintf 1.43% libc-2.11.3.so [.] _int_malloc 1.16% [kernel] [k] find_first_bit 1.09% libc-2.11.3.so [.] 
_dl_addr 1.06% libc-2.11.3.so [.] __gconv_transform_utf8_internal 1.02% [kernel] [k] s_show 0.98% libc-2.11.3.so [.] __strchr_sse42 0.93% gzip [.] 0x0000000000007b96 0.90% [kernel] [k] zap_pte_range 0.87% [kernel] [k] __percpu_counter_add 0.76% [kernel] [k] __strnlen_user time: 1341306591 PerfTop: 185 irqs/sec kernel:70.8% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 4.81% [kernel] [k] _raw_spin_lock_irqsave 3.60% [unknown] [.] 0x00007f84a7766b99 3.10% [kernel] [k] _raw_spin_lock 3.04% [kernel] [k] page_fault 2.66% libc-2.11.3.so [.] _IO_vfscanf 2.14% [kernel] [k] copy_user_generic_string 2.11% [kernel] [k] format_decode 1.96% [kernel] [k] __rmqueue 1.86% libc-2.11.3.so [.] _dl_addr 1.76% libc-2.11.3.so [.] __tzfile_compute 1.26% perl [.] 0x0000000000044505 1.19% libc-2.11.3.so [.] __mbrtowc 1.14% libc-2.11.3.so [.] vfprintf 1.12% libc-2.11.3.so [.] _int_malloc 1.09% gzip [.] 0x0000000000007b96 0.95% libc-2.11.3.so [.] __gconv_transform_utf8_internal 0.88% [kernel] [k] _raw_spin_unlock_irqrestore 0.87% [kernel] [k] __strnlen_user 0.82% [kernel] [k] clear_page_c 0.77% [kernel] [k] __schedule 0.76% [kernel] [k] find_get_page time: 1341306595 PerfTop: 385 irqs/sec kernel:48.8% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 27.20% cc1 [.] 0x0000000000210978 3.01% [unknown] [.] 0x00007f84a7766b99 2.18% [kernel] [k] page_fault 1.96% libbfd-2.21.so [.] 0x00000000000b9cdd 1.95% [kernel] [k] _raw_spin_lock_irqsave 1.91% ld.bfd [.] 0x000000000000e3b9 1.85% [kernel] [k] _raw_spin_lock 1.31% [kernel] [k] copy_user_generic_string 1.20% libbfd-2.21.so [.] bfd_hash_lookup 1.10% libc-2.11.3.so [.] __strcmp_sse42 0.93% libc-2.11.3.so [.] _IO_vfscanf 0.85% [kernel] [k] _raw_spin_unlock_irqrestore 0.82% libc-2.11.3.so [.] _int_malloc 0.80% [kernel] [k] __rmqueue 0.79% [kernel] [k] kmem_cache_alloc 0.74% [kernel] [k] format_decode 0.71% libc-2.11.3.so [.] _dl_addr 0.62% libbfd-2.21.so [.] _bfd_final_link_relocate 0.61% libc-2.11.3.so [.] __tzfile_compute 0.61% libc-2.11.3.so [.] vfprintf 0.59% [kernel] [k] find_busiest_group time: 1341306595 PerfTop: 1451 irqs/sec kernel:87.3% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 9.75% cc1 [.] 0x0000000000210978 8.81% [unknown] [.] 0x00007f84a7766b99 4.62% [kernel] [k] page_fault 3.61% [kernel] [k] _raw_spin_lock 2.67% [kernel] [k] memcpy 2.03% [kernel] [k] _raw_spin_lock_irqsave 2.00% [kernel] [k] kmem_cache_alloc 1.64% [xfs] [k] _xfs_buf_find 1.31% [kernel] [k] _raw_spin_unlock_irqrestore 1.16% [kernel] [k] kmem_cache_free 1.15% [xfs] [k] xfs_next_bit 0.98% [kernel] [k] __d_lookup 0.89% [xfs] [k] xfs_da_do_buf 0.83% [kernel] [k] memset 0.80% [xfs] [k] xfs_dir2_node_addname_int 0.79% [kernel] [k] link_path_walk 0.76% [xfs] [k] xfs_buf_item_size no symbols found in /usr/bin/tee, maybe install a debug package? no symbols found in /bin/date, maybe install a debug package? 0.73% [xfs] [k] xfs_buf_offset 0.71% [kernel] [k] __kmalloc 0.70% [kernel] [k] kfree 0.70% libbfd-2.21.so [.] 0x00000000000b9cdd time: 1341306601 PerfTop: 1267 irqs/sec kernel:85.2% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 10.81% [unknown] [.] 0x00007f84a7766b99 5.98% cc1 [.] 
0x0000000000210978 5.20% [kernel] [k] page_fault 3.54% [kernel] [k] _raw_spin_lock 3.37% [kernel] [k] memcpy 2.03% [kernel] [k] kmem_cache_alloc 1.91% [kernel] [k] _raw_spin_lock_irqsave 1.75% [xfs] [k] _xfs_buf_find 1.35% [kernel] [k] _raw_spin_unlock_irqrestore 1.28% [xfs] [k] xfs_next_bit 1.14% [kernel] [k] kmem_cache_free 1.13% [kernel] [k] __kmalloc 1.12% [kernel] [k] __d_lookup 0.97% [xfs] [k] xfs_dir2_node_addname_int 0.96% [xfs] [k] xfs_buf_offset 0.95% [kernel] [k] memset 0.91% [kernel] [k] link_path_walk 0.88% [xfs] [k] xfs_da_do_buf 0.85% [kernel] [k] kfree 0.84% [xfs] [k] xfs_buf_item_size 0.74% [xfs] [k] xfs_btree_lookup time: 1341306601 PerfTop: 1487 irqs/sec kernel:85.3% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.84% [unknown] [.] 0x00007f84a7766b99 5.15% [kernel] [k] page_fault 3.93% cc1 [.] 0x0000000000210978 3.76% [kernel] [k] _raw_spin_lock 3.50% [kernel] [k] memcpy 2.13% [kernel] [k] kmem_cache_alloc 1.91% [xfs] [k] _xfs_buf_find 1.79% [kernel] [k] _raw_spin_lock_irqsave 1.52% [kernel] [k] __kmalloc 1.33% [kernel] [k] kmem_cache_free 1.32% [kernel] [k] _raw_spin_unlock_irqrestore 1.29% [kernel] [k] __d_lookup 1.27% [xfs] [k] xfs_next_bit 1.11% [kernel] [k] link_path_walk 1.01% [xfs] [k] xfs_buf_offset 1.00% [xfs] [k] xfs_dir2_node_addname_int 0.98% [xfs] [k] xfs_da_do_buf 0.97% [kernel] [k] kfree 0.96% [kernel] [k] memset 0.84% [xfs] [k] xfs_btree_lookup 0.82% [xfs] [k] xfs_buf_item_format time: 1341306601 PerfTop: 1291 irqs/sec kernel:85.9% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 12.21% [unknown] [.] 0x00007f84a7766b99 5.18% [kernel] [k] page_fault 3.83% [kernel] [k] _raw_spin_lock 3.67% [kernel] [k] memcpy 2.92% cc1 [.] 0x0000000000210978 2.28% [kernel] [k] kmem_cache_alloc 2.18% [xfs] [k] _xfs_buf_find 1.66% [kernel] [k] _raw_spin_lock_irqsave 1.56% [kernel] [k] __kmalloc 1.43% [kernel] [k] __d_lookup 1.43% [kernel] [k] _raw_spin_unlock_irqrestore 1.40% [kernel] [k] kmem_cache_free 1.29% [xfs] [k] xfs_next_bit 1.13% [xfs] [k] xfs_buf_offset 1.07% [kernel] [k] link_path_walk 1.04% [xfs] [k] xfs_da_do_buf 1.01% [kernel] [k] memset 1.01% [kernel] [k] kfree 1.00% [xfs] [k] xfs_dir2_node_addname_int 0.87% [xfs] [k] xfs_buf_item_size 0.84% [xfs] [k] xfs_btree_lookup time: 1341306607 PerfTop: 1435 irqs/sec kernel:87.9% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 12.06% [unknown] [.] 0x00007f84a7766b99 5.40% [kernel] [k] page_fault 3.88% [kernel] [k] _raw_spin_lock 3.83% [kernel] [k] memcpy 2.41% [xfs] [k] _xfs_buf_find 2.35% [kernel] [k] kmem_cache_alloc 2.19% cc1 [.] 0x0000000000210978 1.68% [kernel] [k] _raw_spin_lock_irqsave 1.55% [kernel] [k] __kmalloc 1.48% [kernel] [k] __d_lookup 1.43% [kernel] [k] kmem_cache_free 1.42% [kernel] [k] _raw_spin_unlock_irqrestore 1.37% [xfs] [k] xfs_next_bit 1.27% [xfs] [k] xfs_buf_offset 1.12% [kernel] [k] link_path_walk 1.09% [kernel] [k] kfree 1.08% [kernel] [k] memset 1.04% [xfs] [k] xfs_da_do_buf 0.99% [xfs] [k] xfs_dir2_node_addname_int 0.92% [xfs] [k] xfs_buf_item_size 0.89% [xfs] [k] xfs_btree_lookup time: 1341306607 PerfTop: 1281 irqs/sec kernel:87.0% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 12.00% [unknown] [.] 
0x00007f84a7766b99 5.44% [kernel] [k] page_fault 4.04% [kernel] [k] _raw_spin_lock 3.94% [kernel] [k] memcpy 2.51% [xfs] [k] _xfs_buf_find 2.32% [kernel] [k] kmem_cache_alloc 1.75% cc1 [.] 0x0000000000210978 1.66% [kernel] [k] _raw_spin_lock_irqsave 1.58% [kernel] [k] __d_lookup 1.56% [kernel] [k] __kmalloc 1.46% [xfs] [k] xfs_next_bit 1.44% [kernel] [k] kmem_cache_free 1.41% [kernel] [k] _raw_spin_unlock_irqrestore 1.34% [xfs] [k] xfs_buf_offset 1.20% [kernel] [k] link_path_walk 1.16% [kernel] [k] kfree 1.11% [kernel] [k] memset 1.04% [xfs] [k] xfs_dir2_node_addname_int 0.94% [xfs] [k] xfs_da_do_buf 0.92% [xfs] [k] xfs_btree_lookup 0.89% [xfs] [k] xfs_buf_item_size time: 1341306607 PerfTop: 1455 irqs/sec kernel:86.8% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 12.14% [unknown] [.] 0x00007f84a7766b99 5.36% [kernel] [k] page_fault 4.12% [kernel] [k] _raw_spin_lock 4.02% [kernel] [k] memcpy 2.54% [xfs] [k] _xfs_buf_find 2.41% [kernel] [k] kmem_cache_alloc 1.69% [kernel] [k] _raw_spin_lock_irqsave 1.56% [kernel] [k] __kmalloc 1.49% [xfs] [k] xfs_next_bit 1.47% [kernel] [k] __d_lookup 1.42% [kernel] [k] _raw_spin_unlock_irqrestore 1.39% [xfs] [k] xfs_buf_offset 1.39% cc1 [.] 0x0000000000210978 1.37% [kernel] [k] kmem_cache_free 1.24% [kernel] [k] link_path_walk 1.17% [kernel] [k] memset 1.16% [kernel] [k] kfree 1.07% [xfs] [k] xfs_dir2_node_addname_int 0.99% [xfs] [k] xfs_buf_item_size 0.92% [xfs] [k] xfs_da_do_buf 0.91% [xfs] [k] xfs_btree_lookup time: 1341306613 PerfTop: 1245 irqs/sec kernel:87.3% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 12.05% [unknown] [.] 0x00007f84a7766b99 5.40% [kernel] [k] page_fault 4.10% [kernel] [k] _raw_spin_lock 4.06% [kernel] [k] memcpy 2.74% [xfs] [k] _xfs_buf_find 2.40% [kernel] [k] kmem_cache_alloc 1.64% [kernel] [k] _raw_spin_lock_irqsave 1.58% [xfs] [k] xfs_next_bit 1.54% [kernel] [k] __kmalloc 1.49% [kernel] [k] __d_lookup 1.45% [kernel] [k] _raw_spin_unlock_irqrestore 1.41% [kernel] [k] kmem_cache_free 1.35% [xfs] [k] xfs_buf_offset 1.25% [kernel] [k] link_path_walk 1.22% [kernel] [k] kfree 1.16% [kernel] [k] memset 1.15% cc1 [.] 0x0000000000210978 1.02% [xfs] [k] xfs_buf_item_size 1.00% [xfs] [k] xfs_dir2_node_addname_int 0.92% [xfs] [k] xfs_btree_lookup 0.91% [xfs] [k] xfs_da_do_buf time: 1341306613 PerfTop: 1433 irqs/sec kernel:87.2% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 12.04% [unknown] [.] 0x00007f84a7766b99 5.30% [kernel] [k] page_fault 4.08% [kernel] [k] memcpy 4.07% [kernel] [k] _raw_spin_lock 2.88% [xfs] [k] _xfs_buf_find 2.50% [kernel] [k] kmem_cache_alloc 1.72% [kernel] [k] _raw_spin_lock_irqsave 1.68% [xfs] [k] xfs_next_bit 1.56% [kernel] [k] __d_lookup 1.54% [kernel] [k] __kmalloc 1.48% [kernel] [k] _raw_spin_unlock_irqrestore 1.46% [kernel] [k] kmem_cache_free 1.40% [xfs] [k] xfs_buf_offset 1.25% [kernel] [k] link_path_walk 1.21% [kernel] [k] memset 1.18% [kernel] [k] kfree 1.04% [xfs] [k] xfs_buf_item_size 1.02% [xfs] [k] xfs_dir2_node_addname_int 0.95% [xfs] [k] xfs_btree_lookup 0.94% cc1 [.] 0x0000000000210978 0.90% [xfs] [k] xfs_da_do_buf time: 1341306613 PerfTop: 1118 irqs/sec kernel:87.2% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- no symbols found in /usr/bin/vmstat, maybe install a debug package? 
12.03% [unknown] [.] 0x00007f84a7766b99 5.48% [kernel] [k] page_fault 4.21% [kernel] [k] memcpy 4.11% [kernel] [k] _raw_spin_lock 2.98% [xfs] [k] _xfs_buf_find 2.47% [kernel] [k] kmem_cache_alloc 1.81% [kernel] [k] _raw_spin_lock_irqsave 1.72% [xfs] [k] xfs_next_bit 1.51% [kernel] [k] __kmalloc 1.48% [kernel] [k] kmem_cache_free 1.48% [kernel] [k] __d_lookup 1.47% [kernel] [k] _raw_spin_unlock_irqrestore 1.39% [xfs] [k] xfs_buf_offset 1.23% [kernel] [k] link_path_walk 1.19% [kernel] [k] memset 1.19% [kernel] [k] kfree 1.07% [xfs] [k] xfs_dir2_node_addname_int 1.01% [xfs] [k] xfs_buf_item_size 0.98% [xfs] [k] xfs_btree_lookup 0.93% [xfs] [k] xfs_buf_item_format 0.91% [xfs] [k] xfs_da_do_buf time: 1341306617 PerfTop: 1454 irqs/sec kernel:87.6% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.93% [unknown] [.] 0x00007f84a7766b99 5.42% [kernel] [k] page_fault 4.28% [kernel] [k] memcpy 4.20% [kernel] [k] _raw_spin_lock 3.15% [xfs] [k] _xfs_buf_find 2.52% [kernel] [k] kmem_cache_alloc 1.76% [kernel] [k] _raw_spin_lock_irqsave 1.72% [xfs] [k] xfs_next_bit 1.59% [kernel] [k] __d_lookup 1.51% [kernel] [k] __kmalloc 1.49% [kernel] [k] _raw_spin_unlock_irqrestore 1.48% [kernel] [k] kmem_cache_free 1.40% [xfs] [k] xfs_buf_offset 1.29% [kernel] [k] memset 1.20% [kernel] [k] link_path_walk 1.17% [kernel] [k] kfree 1.09% [xfs] [k] xfs_dir2_node_addname_int 1.01% [xfs] [k] xfs_buf_item_size 0.95% [xfs] [k] xfs_btree_lookup 0.94% [xfs] [k] xfs_da_do_buf 0.91% [xfs] [k] xfs_buf_item_format time: 1341306617 PerfTop: 1758 irqs/sec kernel:90.2% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 10.99% [unknown] [.] 0x00007f84a7766b99 5.40% [kernel] [k] _raw_spin_lock 4.82% [kernel] [k] page_fault 4.04% [kernel] [k] memcpy 3.86% [xfs] [k] _xfs_buf_find 2.31% [kernel] [k] kmem_cache_alloc 2.03% [kernel] [k] _raw_spin_lock_irqsave 1.67% [kernel] [k] _raw_spin_unlock_irqrestore 1.60% [kernel] [k] __d_lookup 1.60% [xfs] [k] xfs_next_bit 1.44% [kernel] [k] __kmalloc 1.36% [xfs] [k] xfs_buf_offset 1.35% [kernel] [k] kmem_cache_free 1.17% [kernel] [k] kfree 1.16% [kernel] [k] memset 1.08% [kernel] [k] link_path_walk 0.98% [xfs] [k] xfs_dir2_node_addname_int 0.97% [xfs] [k] xfs_btree_lookup 0.92% [xfs] [k] xfs_perag_put 0.90% [xfs] [k] xfs_buf_item_size 0.84% [xfs] [k] xfs_da_do_buf time: 1341306623 PerfTop: 1022 irqs/sec kernel:88.6% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 10.93% [unknown] [.] 0x00007f84a7766b99 5.34% [kernel] [k] _raw_spin_lock 4.82% [kernel] [k] page_fault 4.01% [kernel] [k] memcpy 4.01% [xfs] [k] _xfs_buf_find 2.28% [kernel] [k] kmem_cache_alloc 2.00% [kernel] [k] _raw_spin_lock_irqsave 1.68% [kernel] [k] _raw_spin_unlock_irqrestore 1.60% [xfs] [k] xfs_next_bit 1.59% [kernel] [k] __d_lookup 1.41% [kernel] [k] __kmalloc 1.39% [kernel] [k] kmem_cache_free 1.35% [xfs] [k] xfs_buf_offset 1.15% [kernel] [k] kfree 1.15% [kernel] [k] memset 1.09% [kernel] [k] link_path_walk 0.98% [xfs] [k] xfs_dir2_node_addname_int 0.98% [xfs] [k] xfs_btree_lookup 0.91% [xfs] [k] xfs_perag_put 0.88% [xfs] [k] xfs_buf_item_size 0.86% [xfs] [k] xfs_da_do_buf time: 1341306623 PerfTop: 1430 irqs/sec kernel:87.6% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.05% [unknown] [.] 
0x00007f84a7766b99 5.24% [kernel] [k] _raw_spin_lock 4.89% [kernel] [k] page_fault 4.13% [kernel] [k] memcpy 3.96% [xfs] [k] _xfs_buf_find 2.35% [kernel] [k] kmem_cache_alloc 1.95% [kernel] [k] _raw_spin_lock_irqsave 1.77% [kernel] [k] _raw_spin_unlock_irqrestore 1.63% [kernel] [k] __d_lookup 1.54% [xfs] [k] xfs_next_bit 1.42% [kernel] [k] __kmalloc 1.41% [kernel] [k] kmem_cache_free 1.32% [xfs] [k] xfs_buf_offset 1.16% [kernel] [k] memset 1.11% [kernel] [k] kfree 1.10% [kernel] [k] link_path_walk 1.05% [xfs] [k] xfs_dir2_node_addname_int 0.99% [xfs] [k] xfs_btree_lookup 0.91% [xfs] [k] xfs_buf_item_size 0.87% [xfs] [k] xfs_da_do_buf 0.87% [xfs] [k] xfs_perag_put time: 1341306623 PerfTop: 1267 irqs/sec kernel:87.1% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.20% [unknown] [.] 0x00007f84a7766b99 5.08% [kernel] [k] _raw_spin_lock 4.85% [kernel] [k] page_fault 4.12% [kernel] [k] memcpy 3.96% [xfs] [k] _xfs_buf_find 2.41% [kernel] [k] kmem_cache_alloc 1.94% [kernel] [k] _raw_spin_lock_irqsave 1.77% [kernel] [k] _raw_spin_unlock_irqrestore 1.61% [kernel] [k] __d_lookup 1.50% [xfs] [k] xfs_next_bit 1.44% [kernel] [k] __kmalloc 1.40% [kernel] [k] kmem_cache_free 1.31% [xfs] [k] xfs_buf_offset no symbols found in /usr/bin/iostat, maybe install a debug package? 1.16% [kernel] [k] memset 1.11% [kernel] [k] kfree 1.06% [kernel] [k] link_path_walk 1.04% [xfs] [k] xfs_dir2_node_addname_int 1.01% [xfs] [k] xfs_btree_lookup 0.95% [xfs] [k] xfs_buf_item_size 0.90% [xfs] [k] xfs_da_do_buf 0.84% [xfs] [k] xfs_perag_put time: 1341306629 PerfTop: 1399 irqs/sec kernel:88.3% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.12% [unknown] [.] 0x00007f84a7766b99 5.13% [kernel] [k] _raw_spin_lock 4.85% [kernel] [k] page_fault 4.23% [kernel] [k] memcpy 4.03% [xfs] [k] _xfs_buf_find 2.37% [kernel] [k] kmem_cache_alloc 1.96% [kernel] [k] _raw_spin_lock_irqsave 1.69% [kernel] [k] _raw_spin_unlock_irqrestore 1.63% [kernel] [k] __d_lookup 1.50% [xfs] [k] xfs_next_bit 1.45% [kernel] [k] __kmalloc 1.35% [kernel] [k] kmem_cache_free 1.32% [xfs] [k] xfs_buf_offset 1.17% [kernel] [k] memset 1.09% [kernel] [k] kfree 1.07% [xfs] [k] xfs_dir2_node_addname_int 1.07% [kernel] [k] link_path_walk 1.04% [xfs] [k] xfs_btree_lookup 1.02% [xfs] [k] xfs_buf_item_size 0.93% [xfs] [k] xfs_da_do_buf 0.84% [xfs] [k] xfs_buf_item_format time: 1341306629 PerfTop: 1225 irqs/sec kernel:87.5% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.15% [unknown] [.] 0x00007f84a7766b99 5.02% [kernel] [k] _raw_spin_lock 4.85% [kernel] [k] page_fault 4.22% [kernel] [k] memcpy 4.19% [xfs] [k] _xfs_buf_find 2.32% [kernel] [k] kmem_cache_alloc 1.94% [kernel] [k] _raw_spin_lock_irqsave 1.71% [kernel] [k] __d_lookup 1.68% [kernel] [k] _raw_spin_unlock_irqrestore 1.54% [xfs] [k] xfs_next_bit 1.51% [kernel] [k] __kmalloc 1.36% [kernel] [k] kmem_cache_free 1.28% [xfs] [k] xfs_buf_offset 1.14% [kernel] [k] memset 1.09% [kernel] [k] kfree 1.06% [kernel] [k] link_path_walk 1.02% [xfs] [k] xfs_dir2_node_addname_int 0.99% [xfs] [k] xfs_buf_item_size 0.99% [xfs] [k] xfs_btree_lookup 0.92% [xfs] [k] xfs_da_do_buf 0.86% [kernel] [k] s_show time: 1341306629 PerfTop: 1400 irqs/sec kernel:87.4% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.23% [unknown] [.] 
0x00007f84a7766b99 5.07% [kernel] [k] _raw_spin_lock 4.87% [kernel] [k] page_fault 4.27% [xfs] [k] _xfs_buf_find 4.18% [kernel] [k] memcpy 2.31% [kernel] [k] kmem_cache_alloc 1.94% [kernel] [k] _raw_spin_lock_irqsave 1.73% [kernel] [k] __d_lookup 1.66% [kernel] [k] _raw_spin_unlock_irqrestore 1.49% [xfs] [k] xfs_next_bit 1.49% [kernel] [k] __kmalloc 1.40% [kernel] [k] kmem_cache_free 1.29% [xfs] [k] xfs_buf_offset 1.11% [kernel] [k] kfree 1.07% [kernel] [k] memset 1.07% [kernel] [k] link_path_walk 1.05% [xfs] [k] xfs_dir2_node_addname_int 0.99% [xfs] [k] xfs_buf_item_size 0.97% [xfs] [k] xfs_btree_lookup 0.93% [xfs] [k] xfs_da_do_buf 0.89% [kernel] [k] s_show time: 1341306635 PerfTop: 1251 irqs/sec kernel:87.9% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.20% [unknown] [.] 0x00007f84a7766b99 5.10% [kernel] [k] _raw_spin_lock 4.82% [kernel] [k] page_fault 4.29% [xfs] [k] _xfs_buf_find 4.19% [kernel] [k] memcpy 2.26% [kernel] [k] kmem_cache_alloc 1.87% [kernel] [k] _raw_spin_lock_irqsave 1.83% [kernel] [k] __d_lookup 1.64% [kernel] [k] _raw_spin_unlock_irqrestore 1.53% [kernel] [k] __kmalloc 1.49% [xfs] [k] xfs_next_bit 1.41% [kernel] [k] kmem_cache_free 1.32% [xfs] [k] xfs_buf_offset 1.10% [kernel] [k] link_path_walk 1.09% [kernel] [k] memset 1.08% [kernel] [k] kfree 1.03% [xfs] [k] xfs_dir2_node_addname_int 0.99% [xfs] [k] xfs_buf_item_size 0.98% [xfs] [k] xfs_btree_lookup 0.96% [kernel] [k] s_show 0.93% [xfs] [k] xfs_da_do_buf time: 1341306635 PerfTop: 1429 irqs/sec kernel:88.3% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.18% [unknown] [.] 0x00007f84a7766b99 5.13% [kernel] [k] _raw_spin_lock 4.82% [kernel] [k] page_fault 4.28% [xfs] [k] _xfs_buf_find 4.21% [kernel] [k] memcpy 2.23% [kernel] [k] kmem_cache_alloc 1.90% [kernel] [k] _raw_spin_lock_irqsave 1.83% [kernel] [k] __d_lookup 1.67% [kernel] [k] _raw_spin_unlock_irqrestore 1.52% [kernel] [k] __kmalloc 1.52% [xfs] [k] xfs_next_bit 1.36% [kernel] [k] kmem_cache_free 1.34% [xfs] [k] xfs_buf_offset 1.11% [kernel] [k] link_path_walk 1.11% [kernel] [k] memset 1.08% [kernel] [k] kfree 1.03% [xfs] [k] xfs_buf_item_size 1.03% [xfs] [k] xfs_dir2_node_addname_int 1.01% [kernel] [k] s_show 0.98% [xfs] [k] xfs_btree_lookup 0.94% [xfs] [k] xfs_da_do_buf time: 1341306635 PerfTop: 1232 irqs/sec kernel:88.9% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.11% [unknown] [.] 0x00007f84a7766b99 5.13% [kernel] [k] _raw_spin_lock 4.87% [kernel] [k] page_fault 4.33% [xfs] [k] _xfs_buf_find 4.16% [kernel] [k] memcpy 2.24% [kernel] [k] kmem_cache_alloc 1.84% [kernel] [k] _raw_spin_lock_irqsave 1.82% [kernel] [k] __d_lookup 1.65% [kernel] [k] _raw_spin_unlock_irqrestore 1.50% [xfs] [k] xfs_next_bit 1.49% [kernel] [k] __kmalloc 1.34% [kernel] [k] kmem_cache_free 1.32% [xfs] [k] xfs_buf_offset 1.13% [kernel] [k] link_path_walk 1.13% [kernel] [k] kfree 1.11% [kernel] [k] memset 1.06% [kernel] [k] s_show 1.03% [xfs] [k] xfs_buf_item_size 1.02% [xfs] [k] xfs_dir2_node_addname_int 0.98% [xfs] [k] xfs_btree_lookup 0.94% [xfs] [k] xfs_da_do_buf time: 1341306639 PerfTop: 1444 irqs/sec kernel:87.3% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 11.19% [unknown] [.] 
0x00007f84a7766b99 5.10% [kernel] [k] _raw_spin_lock 4.95% [kernel] [k] page_fault 4.40% [xfs] [k] _xfs_buf_find 4.10% [kernel] [k] memcpy 2.20% [kernel] [k] kmem_cache_alloc 1.93% [kernel] [k] _raw_spin_lock_irqsave 1.81% [kernel] [k] __d_lookup 1.59% [kernel] [k] _raw_spin_unlock_irqrestore 1.50% [xfs] [k] xfs_next_bit 1.48% [kernel] [k] __kmalloc 1.37% [kernel] [k] kmem_cache_free 1.36% [xfs] [k] xfs_buf_offset 1.15% [kernel] [k] memset 1.12% [kernel] [k] s_show 1.12% [kernel] [k] link_path_walk 1.10% [kernel] [k] kfree 1.02% [xfs] [k] xfs_buf_item_size 0.99% [xfs] [k] xfs_btree_lookup 0.97% [xfs] [k] xfs_dir2_node_addname_int 0.94% [xfs] [k] xfs_da_do_buf time: 1341306639 PerfTop: 1195 irqs/sec kernel:90.9% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 10.00% [unknown] [.] 0x00007f84a7766b99 5.17% [kernel] [k] _raw_spin_lock 4.44% [xfs] [k] _xfs_buf_find 4.37% [kernel] [k] page_fault 4.37% [kernel] [k] memcpy 2.30% [kernel] [k] kmem_cache_alloc 1.90% [kernel] [k] _raw_spin_lock_irqsave 1.63% [kernel] [k] __d_lookup 1.62% [kernel] [k] _raw_spin_unlock_irqrestore 1.59% [xfs] [k] xfs_buf_offset 1.50% [kernel] [k] kmem_cache_free 1.50% [kernel] [k] __kmalloc 1.49% [xfs] [k] xfs_next_bit 1.33% [kernel] [k] memset 1.28% [kernel] [k] kfree 1.11% [xfs] [k] xfs_buf_item_size 1.07% [kernel] [k] s_show 1.07% [kernel] [k] link_path_walk 0.93% [xfs] [k] xfs_btree_lookup 0.90% [xfs] [k] xfs_da_do_buf 0.84% [xfs] [k] xfs_dir2_node_addname_int time: 1341306645 PerfTop: 1097 irqs/sec kernel:95.8% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 7.51% [unknown] [.] 0x00007f84a7766b99 5.02% [kernel] [k] _raw_spin_lock 4.63% [kernel] [k] memcpy 4.51% [xfs] [k] _xfs_buf_find 3.32% [kernel] [k] page_fault 2.37% [kernel] [k] kmem_cache_alloc 1.87% [kernel] [k] _raw_spin_lock_irqsave 1.77% [xfs] [k] xfs_buf_offset 1.75% [kernel] [k] __kmalloc 1.73% [xfs] [k] xfs_next_bit 1.65% [kernel] [k] memset 1.60% [kernel] [k] _raw_spin_unlock_irqrestore 1.53% [kernel] [k] kfree 1.52% [kernel] [k] kmem_cache_free 1.44% [xfs] [k] xfs_trans_ail_cursor_first 1.26% [kernel] [k] __d_lookup 1.26% [xfs] [k] xfs_buf_item_size 1.05% [kernel] [k] s_show 1.03% [xfs] [k] xfs_buf_item_format 0.92% [kernel] [k] __d_lookup_rcu 0.87% [kernel] [k] link_path_walk time: 1341306645 PerfTop: 1038 irqs/sec kernel:95.2% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 5.89% [unknown] [.] 0x00007f84a7766b99 5.18% [kernel] [k] memcpy 4.60% [xfs] [k] _xfs_buf_find 4.52% [kernel] [k] _raw_spin_lock 2.60% [kernel] [k] page_fault 2.42% [kernel] [k] kmem_cache_alloc 1.99% [kernel] [k] __kmalloc 1.96% [xfs] [k] xfs_next_bit 1.93% [xfs] [k] xfs_buf_offset 1.84% [kernel] [k] _raw_spin_lock_irqsave 1.83% [kernel] [k] memset 1.80% [kernel] [k] kmem_cache_free 1.68% [kernel] [k] kfree 1.47% [xfs] [k] xfs_buf_item_size 1.45% [kernel] [k] _raw_spin_unlock_irqrestore 1.18% [kernel] [k] __d_lookup_rcu 1.13% [xfs] [k] xfs_trans_ail_cursor_first 1.12% [xfs] [k] xfs_buf_item_format 1.04% [kernel] [k] s_show 1.01% [kernel] [k] __d_lookup 0.93% [xfs] [k] xfs_da_do_buf time: 1341306645 PerfTop: 1087 irqs/sec kernel:96.0% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 5.27% [kernel] [k] memcpy 4.77% [unknown] [.] 
0x00007f84a7766b99 4.69% [xfs] [k] _xfs_buf_find 4.56% [kernel] [k] _raw_spin_lock 2.47% [kernel] [k] kmem_cache_alloc 2.18% [xfs] [k] xfs_next_bit 2.11% [kernel] [k] page_fault 2.00% [xfs] [k] xfs_buf_offset 1.99% [kernel] [k] __kmalloc 1.96% [kernel] [k] kmem_cache_free 1.85% [kernel] [k] kfree 1.82% [kernel] [k] memset 1.75% [kernel] [k] _raw_spin_lock_irqsave 1.58% [xfs] [k] xfs_buf_item_size 1.41% [kernel] [k] _raw_spin_unlock_irqrestore 1.23% [kernel] [k] __d_lookup_rcu 1.21% [xfs] [k] xfs_buf_item_format 0.99% [kernel] [k] s_show 0.97% [xfs] [k] xfs_perag_put 0.92% [xfs] [k] xfs_da_do_buf 0.92% [xfs] [k] xfs_trans_ail_cursor_first time: 1341306651 PerfTop: 1157 irqs/sec kernel:96.5% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 5.53% [kernel] [k] memcpy 4.61% [xfs] [k] _xfs_buf_find 4.40% [kernel] [k] _raw_spin_lock 3.83% [unknown] [.] 0x00007f84a7766b99 2.67% [kernel] [k] kmem_cache_alloc 2.32% [xfs] [k] xfs_next_bit 2.21% [kernel] [k] __kmalloc 2.21% [xfs] [k] xfs_buf_offset 2.19% [kernel] [k] kmem_cache_free 1.92% [kernel] [k] memset 1.89% [kernel] [k] kfree 1.80% [xfs] [k] xfs_buf_item_size 1.70% [kernel] [k] page_fault 1.62% [kernel] [k] _raw_spin_lock_irqsave 1.50% [kernel] [k] _raw_spin_unlock_irqrestore 1.30% [xfs] [k] xfs_buf_item_format 1.27% [kernel] [k] __d_lookup_rcu 0.97% [xfs] [k] xfs_da_do_buf 0.96% [xfs] [k] xlog_cil_prepare_log_vecs 0.93% [kernel] [k] s_show 0.93% [xfs] [k] xfs_perag_put time: 1341306651 PerfTop: 1073 irqs/sec kernel:95.5% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 5.76% [kernel] [k] memcpy 4.54% [xfs] [k] _xfs_buf_find 4.32% [kernel] [k] _raw_spin_lock 3.15% [unknown] [.] 0x00007f84a7766b99 2.77% [kernel] [k] kmem_cache_alloc 2.49% [xfs] [k] xfs_next_bit 2.36% [kernel] [k] __kmalloc 2.27% [kernel] [k] kmem_cache_free 2.20% [xfs] [k] xfs_buf_offset 1.88% [kernel] [k] memset 1.88% [kernel] [k] kfree 1.77% [xfs] [k] xfs_buf_item_size 1.62% [kernel] [k] _raw_spin_lock_irqsave 1.48% [kernel] [k] _raw_spin_unlock_irqrestore 1.42% [xfs] [k] xfs_buf_item_format 1.40% [kernel] [k] page_fault 1.39% [kernel] [k] __d_lookup_rcu 0.99% [xfs] [k] xfs_da_do_buf 0.96% [xfs] [k] xlog_cil_prepare_log_vecs 0.88% [kernel] [k] s_show 0.87% [xfs] [k] xfs_perag_put time: 1341306651 PerfTop: 492 irqs/sec kernel:85.6% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 5.74% [kernel] [k] memcpy 4.48% [xfs] [k] _xfs_buf_find 4.27% [kernel] [k] _raw_spin_lock 3.00% [unknown] [.] 0x00007f84a7766b99 2.76% [kernel] [k] kmem_cache_alloc 2.54% [xfs] [k] xfs_next_bit 2.39% [kernel] [k] __kmalloc 2.30% [kernel] [k] kmem_cache_free 2.20% [xfs] [k] xfs_buf_offset no symbols found in /bin/ps, maybe install a debug package? 
1.96% [kernel] [k] kfree 1.92% [kernel] [k] memset 1.75% [xfs] [k] xfs_buf_item_size 1.56% [kernel] [k] _raw_spin_lock_irqsave 1.48% [kernel] [k] _raw_spin_unlock_irqrestore 1.44% [xfs] [k] xfs_buf_item_format 1.39% [kernel] [k] __d_lookup_rcu 1.36% [kernel] [k] page_fault 0.99% [xfs] [k] xlog_cil_prepare_log_vecs 0.96% [xfs] [k] xfs_da_do_buf 0.86% [kernel] [k] s_show 0.85% [xfs] [k] xfs_perag_put time: 1341306657 PerfTop: 70 irqs/sec kernel:72.9% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 5.73% [kernel] [k] memcpy 4.47% [xfs] [k] _xfs_buf_find 4.27% [kernel] [k] _raw_spin_lock 2.99% [unknown] [.] 0x00007f84a7766b99 2.75% [kernel] [k] kmem_cache_alloc 2.53% [xfs] [k] xfs_next_bit 2.39% [kernel] [k] __kmalloc 2.30% [kernel] [k] kmem_cache_free 2.20% [xfs] [k] xfs_buf_offset 1.96% [kernel] [k] kfree 1.92% [kernel] [k] memset 1.75% [xfs] [k] xfs_buf_item_size 1.56% [kernel] [k] _raw_spin_lock_irqsave 1.49% [kernel] [k] _raw_spin_unlock_irqrestore 1.43% [xfs] [k] xfs_buf_item_format 1.38% [kernel] [k] __d_lookup_rcu 1.37% [kernel] [k] page_fault 0.98% [xfs] [k] xlog_cil_prepare_log_vecs 0.96% [xfs] [k] xfs_da_do_buf 0.89% [kernel] [k] s_show 0.85% [xfs] [k] xfs_perag_put time: 1341306657 PerfTop: 87 irqs/sec kernel:71.3% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 5.72% [kernel] [k] memcpy 4.45% [xfs] [k] _xfs_buf_find 4.25% [kernel] [k] _raw_spin_lock 2.99% [unknown] [.] 0x00007f84a7766b99 2.74% [kernel] [k] kmem_cache_alloc 2.52% [xfs] [k] xfs_next_bit 2.38% [kernel] [k] __kmalloc 2.29% [kernel] [k] kmem_cache_free 2.19% [xfs] [k] xfs_buf_offset 1.95% [kernel] [k] kfree 1.91% [kernel] [k] memset 1.74% [xfs] [k] xfs_buf_item_size 1.56% [kernel] [k] _raw_spin_lock_irqsave 1.48% [kernel] [k] _raw_spin_unlock_irqrestore 1.43% [xfs] [k] xfs_buf_item_format 1.38% [kernel] [k] page_fault 1.38% [kernel] [k] __d_lookup_rcu 0.98% [xfs] [k] xlog_cil_prepare_log_vecs 0.96% [xfs] [k] xfs_da_do_buf 0.93% [kernel] [k] s_show 0.84% [xfs] [k] xfs_perag_put time: 1341306657 PerfTop: 88 irqs/sec kernel:68.2% exact: 0.0% [1000Hz cycles], (all, 8 CPUs) ------------------------------------------------------------------------------- 5.69% [kernel] [k] memcpy 4.42% [xfs] [k] _xfs_buf_find 4.25% [kernel] [k] _raw_spin_lock 2.98% [unknown] [.] 0x00007f84a7766b99 -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [MMTests] IO metadata on XFS 2012-07-03 10:59 ` Mel Gorman @ 2012-07-03 11:44 ` Mel Gorman 2012-07-03 12:31 ` Daniel Vetter 2012-07-04 0:47 ` Dave Chinner 2 siblings, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-03 11:44 UTC (permalink / raw) To: Dave Chinner Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel, Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson On Tue, Jul 03, 2012 at 11:59:51AM +0100, Mel Gorman wrote: > > Can you run latencytop to see > > if there is excessive starvation/wait times for allocation > > completion? > > I'm not sure what format you are looking for. latencytop is shit for > capturing information throughout a test and it does not easily allow you to > record a snapshot of a test. You can record all the console output of course > but that's a complete mess. I tried capturing /proc/latency_stats over time > instead because that can be trivially sorted on a system-wide basis but > as I write this I find that latency_stats was bust. It was just spitting out > > Latency Top version : v0.1 > > and nothing else. Either latency_stats is broken or my config is. Not sure > which it is right now and won't get enough time on this today to pinpoint it. > PEBKAC. Script that monitored /proc/latency_stats was not enabling latency top via /proc/sys/kernel/latencytop -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
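For anyone trying to reproduce the monitoring step being described, a minimal sketch of what the fixed script has to do is below. The interval and log file name are arbitrary choices and this is not the actual mmtests monitor.

  # /proc/latency_stats is only populated once latencytop accounting is enabled
  echo 1 > /proc/sys/kernel/latencytop
  # then snapshot it periodically so it can be sorted on a system-wide basis later
  while true; do
    date +%s >> latency_stats.log
    cat /proc/latency_stats >> latency_stats.log
    sleep 10
  done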
* Re: [MMTests] IO metadata on XFS 2012-07-03 10:59 ` Mel Gorman 2012-07-03 11:44 ` Mel Gorman @ 2012-07-03 12:31 ` Daniel Vetter 2012-07-03 13:08 ` Mel Gorman 2012-07-03 13:28 ` Eugeni Dodonov 2012-07-04 0:47 ` Dave Chinner 2 siblings, 2 replies; 29+ messages in thread From: Daniel Vetter @ 2012-07-03 12:31 UTC (permalink / raw) To: Mel Gorman Cc: Dave Chinner, Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel, Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson On Tue, Jul 03, 2012 at 11:59:51AM +0100, Mel Gorman wrote: > On Tue, Jul 03, 2012 at 10:19:28AM +1000, Dave Chinner wrote: > > On Mon, Jul 02, 2012 at 08:35:16PM +0100, Mel Gorman wrote: > > > Adding dri-devel and a few others because an i915 patch contributed to > > > the regression. > > > > > > On Mon, Jul 02, 2012 at 03:32:15PM +0100, Mel Gorman wrote: > > > > On Mon, Jul 02, 2012 at 02:32:26AM -0400, Christoph Hellwig wrote: > > > > > > It increases the CPU overhead (dirty_inode can be called up to 4 > > > > > > times per write(2) call, IIRC), so with limited numbers of > > > > > > threads/limited CPU power it will result in lower performance. Where > > > > > > you have lots of CPU power, there will be little difference in > > > > > > performance... > > > > > > > > > > When I checked it it could only be called twice, and we'd already > > > > > optimize away the second call. I'd defintively like to track down where > > > > > the performance changes happend, at least to a major version but even > > > > > better to a -rc or git commit. > > > > > > > > > > > > > By all means feel free to run the test yourself and run the bisection :) > > > > > > > > It's rare but on this occasion the test machine is idle so I started an > > > > automated git bisection. As you know the milage with an automated bisect > > > > varies so it may or may not find the right commit. Test machine is sandy so > > > > http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-xfs/sandy/comparison.html > > > > is the report of interest. The script is doing a full search between v3.3 and > > > > v3.4 for a point where average files/sec for fsmark-single drops below 25000. > > > > I did not limit the search to fs/xfs on the off-chance that it is an > > > > apparently unrelated patch that caused the problem. > > > > > > > > > > It was obvious very quickly that there were two distinct regression so I > > > ran two bisections. One led to a XFS and the other led to an i915 patch > > > that enables RC6 to reduce power usage. > > > > > > [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default] > > > > Doesn't seem to be the major cause of the regression. By itself, it > > has impact, but the majority comes from the XFS change... > > > > The fact it has an impact at all is weird but lets see what the DRI > folks think about it. Well, presuming I understand things correctly the cpu die only goes into the lowest sleep state (which iirc switches off l3 caches and interconnects) when both the cpu and gpu are in the lowest sleep state. rc6 is that deep-sleep state for the gpu, so without that enabled your system won't go into these deep-sleep states. I guess the slight changes in wakeup latency, power consumption (cuts about 10W on an idle desktop snb with resulting big effect on what turbo boost can sustain for short amounts of time) and all the follow-on effects are good enough to massively change timing-critical things. So this having an effect isn't too weird. 
Obviously, if you also have X running while doing these tests, there's a chance that the gpu dies because of an issue when waking up from rc6 (we've known of a few of these), but if no drm client is up, that shouldn't be possible. So please retest without X running if that hasn't been done already. Yours, Daniel -- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
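For reference, one quick way to confirm whether RC6 is actually enabled before doing an A/B comparison is sketched below; the parameter name is from the i915 driver of roughly this vintage and may differ on other kernel versions.

  # -1 means use the per-chip default, 0 means RC6 is forced off
  cat /sys/module/i915/parameters/i915_enable_rc6
  # to force it off for a comparison run, boot with: i915.i915_enable_rc6=0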
* Re: [MMTests] IO metadata on XFS 2012-07-03 12:31 ` Daniel Vetter @ 2012-07-03 13:08 ` Mel Gorman 2012-07-03 13:28 ` Eugeni Dodonov 1 sibling, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-03 13:08 UTC (permalink / raw) To: Dave Chinner, Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel, Keith Packard, Eugeni Dodonov, Chris Wilson On Tue, Jul 03, 2012 at 02:31:19PM +0200, Daniel Vetter wrote: > On Tue, Jul 03, 2012 at 11:59:51AM +0100, Mel Gorman wrote: > > On Tue, Jul 03, 2012 at 10:19:28AM +1000, Dave Chinner wrote: > > > On Mon, Jul 02, 2012 at 08:35:16PM +0100, Mel Gorman wrote: > > > > Adding dri-devel and a few others because an i915 patch contributed to > > > > the regression. > > > > > > > > On Mon, Jul 02, 2012 at 03:32:15PM +0100, Mel Gorman wrote: > > > > > On Mon, Jul 02, 2012 at 02:32:26AM -0400, Christoph Hellwig wrote: > > > > > > > It increases the CPU overhead (dirty_inode can be called up to 4 > > > > > > > times per write(2) call, IIRC), so with limited numbers of > > > > > > > threads/limited CPU power it will result in lower performance. Where > > > > > > > you have lots of CPU power, there will be little difference in > > > > > > > performance... > > > > > > > > > > > > When I checked it it could only be called twice, and we'd already > > > > > > optimize away the second call. I'd defintively like to track down where > > > > > > the performance changes happend, at least to a major version but even > > > > > > better to a -rc or git commit. > > > > > > > > > > > > > > > > By all means feel free to run the test yourself and run the bisection :) > > > > > > > > > > It's rare but on this occasion the test machine is idle so I started an > > > > > automated git bisection. As you know the milage with an automated bisect > > > > > varies so it may or may not find the right commit. Test machine is sandy so > > > > > http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-xfs/sandy/comparison.html > > > > > is the report of interest. The script is doing a full search between v3.3 and > > > > > v3.4 for a point where average files/sec for fsmark-single drops below 25000. > > > > > I did not limit the search to fs/xfs on the off-chance that it is an > > > > > apparently unrelated patch that caused the problem. > > > > > > > > > > > > > It was obvious very quickly that there were two distinct regression so I > > > > ran two bisections. One led to a XFS and the other led to an i915 patch > > > > that enables RC6 to reduce power usage. > > > > > > > > [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default] > > > > > > Doesn't seem to be the major cause of the regression. By itself, it > > > has impact, but the majority comes from the XFS change... > > > > > > > The fact it has an impact at all is weird but lets see what the DRI > > folks think about it. > > Well, presuming I understand things correctly the cpu die only goes into > the lowest sleep state (which iirc switches off l3 caches and > interconnects) when both the cpu and gpu are in the lowest sleep state. I made a mistake in my previous mail. gdm and X were were *not* running. Once the screen blanked I would guess the GPU is in a low sleep state the majority of the time. > rc6 is that deep-sleep state for the gpu, so without that enabled your > system won't go into these deep-sleep states. 
> > I guess the slight changes in wakeup latency, power consumption (cuts > about 10W on an idle desktop snb with resulting big effect on what turbo > boost can sustain for short amounts of time) and all the follow-on effects > are good enough to massively change timing-critical things. > Maybe. How aggressively is the lowest sleep state entered and how long does it take to exit? > So this having an effect isn't too weird. > > Obviously, if you also have X running while doing these tests there's the > chance that the gpu dies because of an issue when waking up from rc6 > (we've known a few of these), but if no drm client is up, that shouldn't > be possible. So please retest without X running if that hasn't been done > already. > Again, sorry for the confusion but the posted results are without X running. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [MMTests] IO metadata on XFS 2012-07-03 12:31 ` Daniel Vetter 2012-07-03 13:08 ` Mel Gorman @ 2012-07-03 13:28 ` Eugeni Dodonov 1 sibling, 0 replies; 29+ messages in thread From: Eugeni Dodonov @ 2012-07-03 13:28 UTC (permalink / raw) To: Mel Gorman, Dave Chinner, Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel, Keith Packard, Eugeni Dodonov, Chris Wilson Cc: Daniel Vetter [-- Attachment #1: Type: text/plain, Size: 1366 bytes --] On Tue, Jul 3, 2012 at 9:31 AM, Daniel Vetter <daniel@ffwll.ch> wrote: > Well, presuming I understand things correctly the cpu die only goes into > the lowest sleep state (which iirc switches off l3 caches and > interconnects) when both the cpu and gpu are in the lowest sleep state. > rc6 is that deep-sleep state for the gpu, so without that enabled your > system won't go into these deep-sleep states. > > I guess the slight changes in wakeup latency, power consumption (cuts > about 10W on an idle desktop snb with resulting big effect on what turbo > boost can sustain for short amounts of time) and all the follow-on effects > are good enough to massively change timing-critical things. > The sad side effect is that the software has very little control over RC6 entry and exit: the hardware enters and leaves the RC6 state on its own when it detects that the GPU is idle beyond a threshold. Chances are that if you are not running any GPU workload, the GPU simply enters the RC6 state and stays there. It is possible to observe the current state and also the time spent in rc6 by looking at the /sys/kernel/debug/dri/0/i915_drpc_info file. One other effect of RC6 is that it also allows the CPU to go into higher turbo modes as it has more watts to spend while the GPU is idle; perhaps this is what causes the issue here? -- Eugeni Dodonov <http://eugeni.dodonov.net/> [-- Attachment #2: Type: text/html, Size: 1809 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
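A note for anyone following along: reading the file Eugeni mentions requires debugfs to be mounted, and /sys/kernel/debug is only the usual mount point, not a guarantee.

  # mount debugfs if it is not already mounted, then dump RC6 state and residency
  mount -t debugfs none /sys/kernel/debug 2>/dev/null
  cat /sys/kernel/debug/dri/0/i915_drpc_info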
* Re: [MMTests] IO metadata on XFS 2012-07-03 10:59 ` Mel Gorman 2012-07-03 11:44 ` Mel Gorman 2012-07-03 12:31 ` Daniel Vetter @ 2012-07-04 0:47 ` Dave Chinner 2012-07-04 9:51 ` Mel Gorman 2 siblings, 1 reply; 29+ messages in thread From: Dave Chinner @ 2012-07-04 0:47 UTC (permalink / raw) To: Mel Gorman Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel, Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson On Tue, Jul 03, 2012 at 11:59:51AM +0100, Mel Gorman wrote: > On Tue, Jul 03, 2012 at 10:19:28AM +1000, Dave Chinner wrote: > > On Mon, Jul 02, 2012 at 08:35:16PM +0100, Mel Gorman wrote: > > > Adding dri-devel and a few others because an i915 patch contributed to > > > the regression. > > > > > > On Mon, Jul 02, 2012 at 03:32:15PM +0100, Mel Gorman wrote: > > > > On Mon, Jul 02, 2012 at 02:32:26AM -0400, Christoph Hellwig wrote: > > > > > > It increases the CPU overhead (dirty_inode can be called up to 4 > > > > > > times per write(2) call, IIRC), so with limited numbers of > > > > > > threads/limited CPU power it will result in lower performance. Where > > > > > > you have lots of CPU power, there will be little difference in > > > > > > performance... > > > > > > > > > > When I checked it it could only be called twice, and we'd already > > > > > optimize away the second call. I'd defintively like to track down where > > > > > the performance changes happend, at least to a major version but even > > > > > better to a -rc or git commit. > > > > > > > > > > > > > By all means feel free to run the test yourself and run the bisection :) > > > > > > > > It's rare but on this occasion the test machine is idle so I started an > > > > automated git bisection. As you know the milage with an automated bisect > > > > varies so it may or may not find the right commit. Test machine is sandy so > > > > http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-metadata-xfs/sandy/comparison.html > > > > is the report of interest. The script is doing a full search between v3.3 and > > > > v3.4 for a point where average files/sec for fsmark-single drops below 25000. > > > > I did not limit the search to fs/xfs on the off-chance that it is an > > > > apparently unrelated patch that caused the problem. > > > > > > > > > > It was obvious very quickly that there were two distinct regression so I > > > ran two bisections. One led to a XFS and the other led to an i915 patch > > > that enables RC6 to reduce power usage. > > > > > > [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default] > > > > Doesn't seem to be the major cause of the regression. By itself, it > > has impact, but the majority comes from the XFS change... > > > > The fact it has an impact at all is weird but lets see what the DRI > folks think about it. > > > > [c999a223: xfs: introduce an allocation workqueue] > > > > Which indicates that there is workqueue scheduling issues, I think. > > The same amount of work is being done, but half of it is being > > pushed off into a workqueue to avoid stack overflow issues (*). I > > tested the above patch in anger on an 8p machine, similar to the > > machine you saw no regressions on, but the workload didn't drive it > > to being completely CPU bound (only about 90%) so the allocation > > work was probably always scheduled quickly. > > What test were you using? fsmark, dbench, compilebench, and a few fio workloads. Also, xfstests times each test and I keep track of overall runtime, and none of those showed any performance differential, either... 
Indeed, running on a current 3.5-rc5 tree, my usual fsmark benchmarks are running at the same numbers I've been seeing since about 3.0 - somewhere around 18k files/s for a single thread, and 110-115k files/s for 8 threads. I just ran your variant, and I'm getting about 20k files/s for a single thread, which is about right because you're using smaller directories than I am (22500 files per dir vs 100k in my tests). > > How many worker threads have been spawned on these machines > > that are showing the regression? > > 20 or 21 generally. An example list as spotted by top looks like Pretty normal. > > What is the context switch rate on the machines when the test is running? ..... > Vanilla average context switch rate 4278.53 > Revert average context switch rate 1095 That seems about right, too. > > Can you run latencytop to see > > if there is excessive starvation/wait times for allocation > > completion? > > I'm not sure what format you are looking for. Where the context switches are coming from, and how long they are being stalled for. Just to get the context switch locations, you can use perf on the sched:sched_switch event, but that doesn't give you stall times. Local testing tells me that about 40% of the switches are from xfs_alloc_vextent, 55% are from the work threads, and the rest are CPU idling events, which is exactly as I'd expect. > > A perf top profile comparison might be informative, > > too... > > > I'm not sure if this is what you really wanted. I thought an oprofile or perf report would have made more sense but I recorded perf top over time anyway and it's at the end of the mail. perf report and oprofile give you CPU usage across the run, it's not instantaneous and that's where all the interesting information is. e.g. a 5% sample in a 20s profile might be 5% per second for 20s, or it might be 100% for 1s - that's the behaviour run profiles cannot give you insight into.... As it is, the output you posted is nothing unusual. > For just these XFS tests I've uploaded a tarball of the logs to > http://www.csn.ul.ie/~mel/postings/xfsbisect-20120703/xfsbisect-logs.tar.gz Ok, so the main thing I missed when first looking at this is that you are concerned about single thread regressions. Well, I can't reproduce your results here. Single threaded with or without the workqueue based allocation gives me roughly 20k +/-0.5k files/s on a single disk, a 12 disk RAID0 array and a RAM disk on an 8p/4GB RAM machine. That's the same results I've been seeing since I wrote this patch almost 12 months ago.... So, given that this is a metadata intensive workload, the only extent allocation is going to be through inode and directory block allocation. These paths do not consume a large amount of stack, so we can tell the allocator not to switch to the workqueue stack for these allocations easily. The patch below does this. It completely removes all the allocation based context switches from the no-data fsmark workloads being used for this testing. It makes no noticeable difference to performance here, so I'm interested if it solves the regression you are seeing on your machines. Cheers, Dave. -- Dave Chinner david@fromorbit.com xfs: don't defer metadata allocation to the workqueue From: Dave Chinner <dchinner@redhat.com> Almost all metadata allocations come from shallow stack usage situations. Avoid the overhead of switching the allocation to a workqueue as we are not in danger of running out of stack when making these allocations.
Metadata allocations are already marked through the args that are passed down, so this is trivial to do.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_alloc.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index f654f51..4f33c32 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -2434,13 +2434,22 @@ xfs_alloc_vextent_worker(
 	current_restore_flags_nested(&pflags, PF_FSTRANS);
 }
-
-int				/* error */
+/*
+ * Data allocation requests often come in with little stack to work on. Push
+ * them off to a worker thread so there is lots of stack to use. Metadata
+ * requests, OTOH, are generally from low stack usage paths, so avoid the
+ * context switch overhead here.
+ */
+int
 xfs_alloc_vextent(
-	xfs_alloc_arg_t	*args)	/* allocation argument structure */
+	struct xfs_alloc_arg	*args)
 {
 	DECLARE_COMPLETION_ONSTACK(done);
+	if (!args->userdata)
+		return __xfs_alloc_vextent(args);
+
 	args->done = &done;
 	INIT_WORK_ONSTACK(&args->work, xfs_alloc_vextent_worker);
 	queue_work(xfs_alloc_wq, &args->work);

-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 29+ messages in thread
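As an aside on the sched:sched_switch suggestion above, a minimal way to collect the switch locations is sketched below; the 10 second window is an arbitrary choice and, as Dave notes, stall times are not captured this way.

  # record context switch events system-wide for 10 seconds, with callchains
  perf record -e sched:sched_switch -a -g -- sleep 10
  # then inspect where the switches came from
  perf report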
* Re: [MMTests] IO metadata on XFS 2012-07-04 0:47 ` Dave Chinner @ 2012-07-04 9:51 ` Mel Gorman 0 siblings, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-04 9:51 UTC (permalink / raw) To: Dave Chinner Cc: Christoph Hellwig, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel, Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson On Wed, Jul 04, 2012 at 10:47:06AM +1000, Dave Chinner wrote: > > > > <SNIP> > > > > [c999a223: xfs: introduce an allocation workqueue] > > > > > > Which indicates that there is workqueue scheduling issues, I think. > > > The same amount of work is being done, but half of it is being > > > pushed off into a workqueue to avoid stack overflow issues (*). I > > > tested the above patch in anger on an 8p machine, similar to the > > > machine you saw no regressions on, but the workload didn't drive it > > > to being completely CPU bound (only about 90%) so the allocation > > > work was probably always scheduled quickly. > > > > What test were you using? > > fsmark, dbench, compilebench, and a few fio workloads. Also, > xfstests times each test and I keep track of overall runtime, and > none of those showed any performance differential, either... > Sound. I have some coverage on some of the same tests. When I get to them I'll keep an eye on the 3.4 figures. It might be due to the disk I'm using. It's a single disk and nothing to write home about in terms of performance. It's not exactly XFS's usual target audience. > Indeed, running on a current 3.5-rc5 tree, my usual fsmark > benchmarks are running at the same numbers I've been seeing since > about 3.0 - somewhere around 18k files/s for a single thread, and > 110-115k files/s for 8 threads. > > I just ran your variant, and I'm getting about 20kfile/s for a > single thread, which is about right because you're using smaller > directories than I am (22500 files per dir vs 100k in my tests). > I had data for an fsmark-single test running with 30M files and FWIW the 3.4 performance figures were in line with 3.0 and later kernels. > > > How many worker threads have been spawned on these machines > > > that are showing the regression? > > > > 20 or 21 generally. An example list as spotted by top looks like > > Pretty normal. > > > > What is the context switch rate on the machines whenteh test is running? > ..... > > Vanilla average context switch rate 4278.53 > > Revert average context switch rate 1095 > > That seems about right, too. > > > > Can you run latencytop to see > > > if there is excessive starvation/wait times for allocation > > > completion? > > > > I'm not sure what format you are looking for. > > Where the context switches are coming from, and how long they are > abeing stalled for. Noted. Capturing latency_stats over time is enough to do that. It won't give a per-process breakdown but in the majority of cases that is not a problem. Extracting the data is a bit annoying but not impossible and better than parsing latencytop. Ideally, latencytop would be able to log data in some sensible format. hmmm. > Just to get the context switch locations, you > can use perf on the sched:sched_switch event, but that doesn't give > you stall times. No, but both can be captured and roughly correlated with each other given sufficient motivation. > Local testing tells me that about 40% of the > switches are from xfs_alloc_vextent, 55% are from the work threads, > and the rest are CPU idling events, which is exactly as I'd expect. > > > > A pert top profile comparison might be informative, > > > too... 
> > > > > > > I'm not sure if this is what you really wanted. I thought an oprofile or > > perf report would have made more sense but I recorded perf top over time > > anyway and it's at the end of the mail. > > perf report and oprofile give you CPU usage across the run, it's not > instantaneous and that's where all the interesting information is. > e.g. a 5% sample in a 20s profile might be 5% per second for 20s, or > it might be 100% for 1s - that's the behaviour run profiles cannot > give you insight into.... > Fair point. I'll fix up the timestamping and keep the monitor for future reference. > As it is, the output you posted is nothing unusual. > Grand. I had taken a look but I saw nothing particularly unusual but I also was not 100% sure what I should be looking for., > > For just these XFS tests I've uploaded a tarball of the logs to > > http://www.csn.ul.ie/~mel/postings/xfsbisect-20120703/xfsbisect-logs.tar.gz > > Ok, so the main thing I missed when first looking at this is that > you are concerned about single thread regressions. In this specific test, yes. In the original data I posted I had threaded benchmarks but they did not show the regression. This was a rerun with just the single threaded case. Generally I run both because I see bug reports involving both types of test. > Well, I can't > reproduce your results here. Single threaded with or without the > workqueue based allocation gives me roughly 20k +/-0.5k files/s one > a single disk, a 12 disk RAID0 array and a RAM disk on a 8p/4GB RAM > machine. That's the same results I've been seeing since I wrote > this patch almost 12 months ago.... > > So, given that this is a metadata intensive workload, the only > extent allocation is going to be through inode and directory block > allocation. These paths do not consume a large amount of stack, so > we can tell the allocator not to switch to workqueue stack for these > allocations easily. > > The patch below does this. It completely removes all the allocation > based context switches from the no-data fsmark workloads being used > for this testing. It makes no noticable difference to performance > here, so I'm interested if it solves the regression you are seeing > on your machines. > It does. nodefer-metadata is just your patch applied on top of 3.4 and is the right-most column. It's within the noise for the reverted patches and approximately the same performance as 3.3. 
If you look at the timing stats at the bottom you'll see that the patch brings the System time way down so consider this a

Tested-by: Mel Gorman <mgorman@suse.de>

FS-Mark Single Threaded
                  fsmark-single        single-3.4.0        single-3.4.0        single-3.4.0        single-3.4.0
                  3.4.0-vanilla     revert-aa464191     revert-c999a223         revert-both    nodefer-metadata
Files/s  min       14176.40 ( 0.00%)   17830.60 (25.78%)   24186.70 (70.61%)   25108.00 (77.11%)   25448.40 (79.51%)
Files/s  mean      16783.35 ( 0.00%)   25029.69 (49.13%)   37513.72 (123.52%)  38169.97 (127.43%)  36393.09 (116.84%)
Files/s  stddev     1007.26 ( 0.00%)    2644.87 (162.58%)   5344.99 (430.65%)   5599.65 (455.93%)   5961.48 (491.85%)
Files/s  max       18475.40 ( 0.00%)   27966.10 (51.37%)   45564.60 (146.62%)  47918.10 (159.36%)  47146.20 (155.18%)
Overhead min      593978.00 ( 0.00%)  386173.00 (34.99%)  253812.00 (57.27%)  247396.00 (58.35%)  248906.00 (58.10%)
Overhead mean     637782.80 ( 0.00%)  429229.33 (32.70%)  322868.20 (49.38%)  287141.73 (54.98%)  284274.93 (55.43%)
Overhead stddev    72440.72 ( 0.00%)  100056.96 (-38.12%) 175001.08 (-141.58%) 102018.14 (-40.83%) 114055.47 (-57.45%)
Overhead max      855637.00 ( 0.00%)  753541.00 (11.93%)  880531.00 (-2.91%)  637932.00 (25.44%)  710720.00 (16.94%)

MMTests Statistics: duration
Sys Time Running Test (seconds)         44.06    32.25    24.19    23.99    24.38
User+Sys Time Running Test (seconds)    50.19    36.35    27.24    26.7     27.12
Total Elapsed Time (seconds)            59.21    44.76    34.95    34.14    36.11

Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [MMTests] IO metadata on XFS 2012-07-02 19:35 ` Mel Gorman 2012-07-03 0:19 ` Dave Chinner @ 2012-07-03 13:04 ` Mel Gorman 2012-07-03 14:04 ` Daniel Vetter 1 sibling, 1 reply; 29+ messages in thread From: Mel Gorman @ 2012-07-03 13:04 UTC (permalink / raw) To: Christoph Hellwig Cc: Dave Chinner, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel, Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson On Mon, Jul 02, 2012 at 08:35:16PM +0100, Mel Gorman wrote: > > <SNIP> > > > It was obvious very quickly that there were two distinct regression so I > ran two bisections. One led to a XFS and the other led to an i915 patch > that enables RC6 to reduce power usage. > > [c999a223: xfs: introduce an allocation workqueue] > [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default] > > gdm was running on the machine so i915 would have been in use. Bah, more PEBKAC. gdm was *not* running on this machine. i915 is loaded but X is not. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [MMTests] IO metadata on XFS 2012-07-03 13:04 ` Mel Gorman @ 2012-07-03 14:04 ` Daniel Vetter 0 siblings, 0 replies; 29+ messages in thread From: Daniel Vetter @ 2012-07-03 14:04 UTC (permalink / raw) To: Mel Gorman Cc: Christoph Hellwig, Dave Chinner, linux-mm, linux-kernel, linux-fsdevel, xfs, dri-devel, Keith Packard, Eugeni Dodonov, Daniel Vetter, Chris Wilson On Tue, Jul 03, 2012 at 02:04:14PM +0100, Mel Gorman wrote: > On Mon, Jul 02, 2012 at 08:35:16PM +0100, Mel Gorman wrote: > > > <SNIP> > > > > > It was obvious very quickly that there were two distinct regression so I > > ran two bisections. One led to a XFS and the other led to an i915 patch > > that enables RC6 to reduce power usage. > > > > [c999a223: xfs: introduce an allocation workqueue] > > [aa464191: drm/i915: enable plain RC6 on Sandy Bridge by default] > > > > gdm was running on the machine so i915 would have been in use. > > Bah, more PEBKAC. gdm was *not* running on this machine. i915 is loaded > but X is not. See my little explanation of rc6, just loading the driver will have effects. But I'm happy to know that the issue also happens without using it, makes it really unlikely it's an issue with the gpu or i915.ko ;-) -Daniel -- Daniel Vetter Mail: daniel@ffwll.ch Mobile: +41 (0)79 365 57 48 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [MMTests] IO metadata on XFS 2012-07-01 23:54 ` Dave Chinner 2012-07-02 6:32 ` Christoph Hellwig @ 2012-07-02 13:30 ` Mel Gorman 1 sibling, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-02 13:30 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-mm, linux-kernel, linux-fsdevel, xfs On Mon, Jul 02, 2012 at 09:54:58AM +1000, Dave Chinner wrote: > On Fri, Jun 29, 2012 at 12:25:06PM +0100, Mel Gorman wrote: > > Configuration: global-dhp__io-metadata-xfs > > Benchmarks: dbench3, fsmark-single, fsmark-threaded > > > > Summary > > ======= > > Most of the figures look good and in general there has been consistent good > > performance from XFS. However, fsmark-single is showing a severe performance > > dip in a few cases somewhere between 3.1 and 3.4. fs-mark running a single > > thread took a particularly bad dive in 3.4 for two machines that is worth > > examining closer. > > That will be caused by the fact we changed all the metadata updates > to be logged, which means a transaction every time .dirty_inode is > called. > Ok. > This should mostly go away when XFS is converted to use .update_time > rather than .dirty_inode to only issue transactions when the VFS > updates the atime rather than every .dirty_inode call... > Sound. I'll keep an eye out for it in the future. If you want to use the same test configuration then be sure you set the partition configuration correctly. For example, these are the values I used for config-global-dhp__io-metadata configuration file. export TESTDISK_PARTITION=/dev/sda6 export TESTDISK_FILESYSTEM=xfs export TESTDISK_MKFS_PARAM="-f -d agcount=8" export TESTDISK_MOUNT_ARGS=inode64,delaylog,logbsize=262144,nobarrier > > Unfortunately it is harder to easy conclusions as the > > gains/losses are not consistent between machines which may be related to > > the available number of CPU threads. > > It increases the CPU overhead (dirty_inode can be called up to 4 > times per write(2) call, IIRC), so with limited numbers of > threads/limited CPU power it will result in lower performance. Where > you have lots of CPU power, there will be little difference in > performance... > Thanks for that clarification. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
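For anyone wanting to reproduce the XFS metadata runs, a rough sketch of how those TESTDISK_* values slot into an MMTests configuration and how the suite might then be driven is below. The run-mmtests.sh invocation and the result name are assumptions about the suite's usual entry point rather than something taken from the report; the TESTDISK_* values are exactly the ones quoted above and need to match the local partition layout.

# Sketch only: append the partition settings quoted above to the metadata
# config and run it, tagging the results with the kernel version under test.
# run-mmtests.sh and the config filename are assumed, not taken from the report.
cat >> config-global-dhp__io-metadata <<'EOF'
export TESTDISK_PARTITION=/dev/sda6
export TESTDISK_FILESYSTEM=xfs
export TESTDISK_MKFS_PARAM="-f -d agcount=8"
export TESTDISK_MOUNT_ARGS=inode64,delaylog,logbsize=262144,nobarrier
EOF
./run-mmtests.sh --config config-global-dhp__io-metadata xfs-metadata-$(uname -r)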
* [MMTests] Interactivity during IO on ext3 [not found] ` <20120629111932.GA14154@suse.de> ` (2 preceding siblings ...) 2012-06-29 11:25 ` [MMTests] IO metadata on XFS Mel Gorman @ 2012-07-05 14:56 ` Mel Gorman 2012-07-10 9:49 ` Jan Kara 2012-07-05 14:57 ` [MMTests] Interactivity during IO on ext4 Mel Gorman ` (4 subsequent siblings) 8 siblings, 1 reply; 29+ messages in thread From: Mel Gorman @ 2012-07-05 14:56 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, linux-fsdevel Configuration: global-dhp__io-interactive-performance-ext3 Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3 Benchmarks: postmark largedd fsmark-single fsmark-threaded micro Summary ======= There are some terrible results in here that might explain some of the interactivity mess if the distribution defaulted to ext3 or it was chosen by the user for any reason. In some cases average read latency has doubled, tripled and in one case almost quadrupled since 2.6.32. Worse, we are not consistently good or bad. I see patterns like great release, bad release, good release, bad again etc. Benchmark notes =============== NOTE: This configuration is new and very experimental. This is my first time looking at the results of this type of test so flaws are inevitable. There is ample scope for improvement but I had to start somewhere. This configuration is very different in that it is trying to analyse the impact of IO on interactive performance. Some interactivity problems are due to an application trying to read() cache-cold data such as configuration files or cached images. If there is a lot of IO going on, the application may stall while this happens. This is a limited scenario for measuring interactivity but a common one. These tests are fairly standard except that there is a background application running in parallel. It begins by creating a 100M file and using fadvise(POSIX_FADV_DONTNEED) to evict it from cache. Once that is complete it will try to read 1M from the file every few seconds and record the latency. When it reaches the end of the file, it dumps it from cache and starts again. This latency is a *proxy* measure of interactivity, not a true measure. A variation would be to measure the time for small writes for applications that are logging data or applications like gnome-terminal that do small writes to /tmp as part of its buffer management. The main strength is that if we get this basic case wrong, then the complex cases are almost certainly screwed as well. There are two areas to pay attention to. One is completion time and how it is affected by the small reads taking place in parallel. A comprehensive analysis would show exactly how much the workload is affected by a parallel read but right now I'm just looking at wall time. The second area to pay attention to is the read latencies, paying particular attention to the average latency and the max latencies. The variations are harder to draw decent conclusions from. A sensible option would be to plot a CDF to get a better idea of what the probability of a given read latency is but for now that's a TODO item. As it is, the graphs are barely usable and I'll be giving that more thought. 
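To make the description above concrete, a minimal shell approximation of that background monitor is sketched below. It is not the mmtests implementation: the real monitor uses fadvise(POSIX_FADV_DONTNEED) to evict only the test file, while this sketch uses O_DIRECT reads (iflag=direct) to keep every read cache-cold, and the file path and sleep interval are illustrative values only.

# Sketch of the background read-latency monitor described above. It creates
# a 100M file, then reads it 1M at a time, timing each read and wrapping
# around at the end of the file. iflag=direct keeps the reads cache-cold,
# standing in for the fadvise(POSIX_FADV_DONTNEED) eviction the real monitor
# performs. Path and interval are placeholders.
FILE=/mnt/testdisk/interactivity-probe
dd if=/dev/zero of="$FILE" bs=1M count=100 conv=fsync 2>/dev/null

offset=0
while true; do
	start=$(date +%s.%N)
	dd if="$FILE" of=/dev/null bs=1M count=1 skip=$offset iflag=direct 2>/dev/null
	end=$(date +%s.%N)
	echo "read latency: $(echo "$end - $start" | bc) seconds"
	offset=$(( (offset + 1) % 100 ))
	sleep 3
done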
=========================================================== Machine: arnold Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3/arnold/comparison.html Arch: x86 CPUs: 1 socket, 2 threads Model: Pentium 4 Disk: Single Rotary Disk =========================================================== fsmark-single ------------- Completion times since 3.2 have been badly affected which coincides with the introduction of IO-less dirty page throttling. 3.3 was particularly bad. 2.6.32 was TERRIBLE in terms of read-latencies with the average latency and max latencies looking awful. The 90th percentile was close to 4 seconds and as a result the graphs are even more of a complete mess than they might have been otherwise. Otherwise it's worth looking closely at 3.0 and 3.2. In 3.0, 95% of the reads were below 206ms but in 3.2 this had grown to 273ms. The latency of the other 5% results increased from 481ms to 774ms. 3.4 is looking better at least. fsmark-threaded --------------- With multiple writers, completion times have been affected and again 3.2 showed a big increase. Again, 2.6.32 is a complete disaster and mucks up all the graphs. Otherwise, our average read latencies do not look too bad. However, our worst-case latencies look pretty bad. Kernel 3.2 is showing that at worst a read() can take 4.3 seconds when there are multiple parallel writers. This must be fairly rare as 99% of the latencies were below 1 second but a 4 second stall in an application sometimes would feel pretty bad. Maximum latencies have improved a bit in 3.4 but are still around a half second higher than 3.0 and 3.1 kernels. postmark -------- This is interesting in that 3.2 kernel results show an improvement in maximum read latencies and 3.4 is looking worse. The completion times for postmark were very badly affected in 3.4. Almost the opposite of what the fsmark workloads showed. It's hard to draw any sensible conclusions from this that match up with fsmark. largedd ------- Completion times are more or less unaffected. Maximum read latencies are affected though. In 2.6.39 our maximum latency was 781ms; it was 13163ms in 3.0 and 1122ms in 3.2, which might explain some of the interactivity complaints around those kernels when a large cp was going on. Right now, things are looking very good. micro ----- Completion times look ok. 2.6.32 is again hilariously bad. 3.1 also showed very poor maximum latencies but 3.2 and later kernels look good. ========================================================== Machine: hydra Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3/hydra/comparison.html Arch: x86-64 CPUs: 1 socket, 4 threads Model: AMD Phenom II X4 940 Disk: Single Rotary Disk ========================================================== fsmark-single ------------- Completion times are all over the place with a big increase in 3.2 that has improved a bit since but is still not as good as the 3.1 kernels were. Unlike arnold, 2.6.32 is not a complete mess and makes a comparison more meaningful. Our maximum latencies have jumped around a lot with 3.2 being particularly bad and 3.4 not being much better. 3.1 and 3.3 were both good in terms of maximum latency. Average latency is shot to hell. In 2.6.32 it was 349ms and it's now 781ms. 3.2 was really bad but it's not like 3.0 or 3.1 were fantastic either. fsmark-threaded --------------- Completion times are more or less ok. 
Maximum read latency is worse with increases of around 500ms in worst latency and even the 90th percentile is not looking great. Average latency is completely shot. postmark -------- Again impossible to draw sensible conclusions from this. The throughput graph makes a nice sawtooth pattern suitable for poking you in the eye until it bleeds. It's all over the place in terms of completion times. Average latency figures are relatively ok but still regressed. Maximum latencies have increased. largedd ------- Completion times are more or less steady although 3.2 showed a large jump in the length of time it took to copy the files. 3.2 took almost 10 minutes more to copy the files than 3.1 or 3.3. Maximum latencies in 3.2 were very high and the 90th percentile also looked pretty bad. 3.4 is better but still way worse than 2.6.32. Average latency would be laughable if it was not so tragic. micro ----- This was looking better until 3.4 when max latencies jumped but by and large this looks good. ========================================================== Machine: sandy Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3/sandy/comparison.html Arch: x86-64 CPUs: 1 socket, 8 threads Model: Intel Core i7-2600 Disk: Single Rotary Disk ========================================================== fsmark-single ------------- Completion times are more or less ok. They've been worse since 3.2 but still better than 3.2 by a big margin. Read latencies are another story. Maximum latency has increased a LOT from 1.3 seconds to 3.1 seconds in kernel 3.4. Kernel 3.0 had a maximum latency of 6.5 seconds! The 90th percentile figures are not much better with latencies of more than 1 second being recorded from all the way back to 2.6.39. Average latencies have more than tripled from 230ms to 812ms. fsmark-threaded --------------- Completion times are generally good. Read latencies are completely screwed. Kernel 3.2 had a maximum latency of 15 seconds! 3.4 has improved but it's still way too high. Even the 90th percentile figures look completely crap and average latency is of course bad with such high latencies being recorded. postmark -------- Once again the throughput figures make a nice stab stab shape for the eyes. The latency figures are sufficiently crap that it depresses me to talk about them. largedd ------- Completion times look decent, which does nothing to soften the shock of the latency figures, which are again shocking. 6 second maximum latencies in 3.3 and 3.4 kernels although 3.2 was actually quite good. Even 90th percentile maximum latencies have almost doubled since 3.2 and of course the average latencies have almost tripled in line with other results. micro ----- Completion times look decent. The read latencies are ok. The average latency is higher because the latencies to the 90th percentile are higher but the maximum latencies have improved so overall I guess this is a win. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [MMTests] Interactivity during IO on ext3 2012-07-05 14:56 ` [MMTests] Interactivity during IO on ext3 Mel Gorman @ 2012-07-10 9:49 ` Jan Kara 2012-07-10 11:30 ` Mel Gorman 0 siblings, 1 reply; 29+ messages in thread From: Jan Kara @ 2012-07-10 9:49 UTC (permalink / raw) To: Mel Gorman; +Cc: linux-mm, linux-kernel, linux-fsdevel On Thu 05-07-12 15:56:52, Mel Gorman wrote: > Configuration: global-dhp__io-interactive-performance-ext3 > Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3 > Benchmarks: postmark largedd fsmark-single fsmark-threaded micro > > Summary > ======= > > There are some terrible results in here that might explain some of the > interactivity mess if the distribution defaulted to ext3 or was was chosen > by the user for any reason. In some cases average read latency has doubled, > tripled and in one case almost quadrupled since 2.6.32. Worse, we are not > consistently good or bad. I see patterns like great release, bad release, > good release, bad again etc. > > Benchmark notes > =============== > > NOTE: This configuration is new and very experimental. This is my first > time looking at the results of this type of test so flaws are > inevitable. There is ample scope for improvement but I had to > start somewhere. > > This configuration is very different in that it is trying to analyse the > impact of IO on interactive performance. Some interactivity problems are > due to an application trying to read() cache-cold data such as configuration > files or cached images. If there is a lot of IO going on, the application > may stall while this happens. This is a limited scenario for measuring > interactivity but a common one. > > These tests are fairly standard except that there is a background > application running in parallel. It begins by creating a 100M file and > using fadvise(POSIX_FADV_DONTNEED) to evict it from cache. Once that is > complete it will try to read 1M from the file every few seconds and record > the latency. When it reaches the end of the file, it dumps it from cache > and starts again. > > This latency is a *proxy* measure of interactivity, not a true measure. A > variation would be to measure the time for small writes for applications > that are logging data or applications like gnome-terminal that do small > writes to /tmp as part of its buffer management. The main strength is > that if we get this basic case wrong, then the complex cases are almost > certainly screwed as well. > > There are two areas to pay attention to. One is completion time and how > it is affected by the small reads taking place in parallel. A comprehensive > analysis would show exactly how much the workload is affected by a parallel > read but right now I'm just looking at wall time. > > The second area to pay attention to is the read latencies paying particular > attention to the average latency and the max latencies. The variations are > harder to draw decent conclusions from. A sensible option would be to plot > a CDF to get a better idea what the probability of a given read latency is > but for now that's a TODO item. As it is, the graphs are barely usable and > I'll be giving that more thought. 
> > =========================================================== > Machine: arnold > Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3/arnold/comparison.html > Arch: x86 > CPUs: 1 socket, 2 threads > Model: Pentium 4 > Disk: Single Rotary Disk > =========================================================== > > fsmark-single > ------------- > Completion times since 3.2 have been badly affected which coincides with > the introduction of IO-less dirty page throttling. 3.3 was particularly > bad. > > 2.6.32 was TERRIBLE in terms of read-latencies with the average latency > and max latencies looking awful. The 90th percentile was close to 4 > seconds and as a result the graphs are even more of a complete mess than > they might have been otherwise. > > Otherwise it's worth looking closely at 3.0 and 3.2. In 3.0, 95% of the > reads were below 206ms but in 3.2 this had grown to 273ms. The latency > of the other 5% results increased from 481ms to 774ms. > > 3.4 is looking better at least. Yeah, 3.4 looks OK and I'd be interested in 3.5 results since I've merged one more fix which should help the read latency. But all in all it's hard to tackle the latency problems with ext3 - we have a journal which synchronizes all the writes so we write to it with a high priority (we use WRITE_SYNC when there's some contention on the journal). But that naturally competes with reads and creates higher read latency. > fsmark-threaded > --------------- > With multiple writers, completion times have been affected and again 3.2 > showed a big increase. > > Again, 2.6.32 is a complete disaster and mucks up all the graphs. > > Otherwise, our average read latencies do not look too bad. However, our > worst-case latencies look pretty bad. Kernel 3.2 is showing that at worst > a read() can take 4.3 seconds when there are multiple parallel writers. > This must be fairly rare as 99% of the latencies were below 1 second but > a 4 second stall in an application sometimes would feel pretty bad. > > Maximum latencies have improved a bit in 3.4 but are still around a half > second higher than 3.0 and 3.1 kernels. > > postmark > -------- > This is interesting in that 3.2 kernels results show an improvement in > maximum read latencies and 3.4 is looking worse. The completion times > for postmark were very badly affected in 3.4. Almost the opposite of what > the fsmark workloads showed. It's hard to draw any sensible conclusions > from this that match up with fsmark. > > largedd > ------- > Completion times are more or less unaffected. > > Maximum read latencies are affected though. In 2.6.39, our maximum latency > was 781ms and was 13163ms in 3.0 and 1122ms in 3.2 which might explain > some of the interactivity complains around those kernels when a large > cp was going on. Right now, things are looking very good. > > micro > ----- > Completion times look ok. > > 2.6.32 is again hilariously bad. > > 3.1 also showed very poor maximum latencies but 3.2 and later kernels > look good. 
> > > ========================================================== > Machine: hydra > Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3/hydra/comparison.html > Arch: x86-64 > CPUs: 1 socket, 4 threads > Model: AMD Phenom II X4 940 > Disk: Single Rotary Disk > ========================================================== > > fsmark-single > ------------- > Completion times are all over the place with a big increase in 3.2 that > improved a bit since but not as good as 3.1 kernels were. > > Unlike arnold, 2.6.32 is not a complete mess and makes a comparison more > meaningful. Our maximum latencies have jumped around a lot with 3.2 > being particularly bad and 3.4 not being much better. 3.1 and 3.3 were > both good in terms of maximum latency. > > Average latency is shot to hell. In 2.6.32 it was 349ms and it's now 781ms. > 3.2 was really bad but it's not like 3.0 or 3.1 were fantastic either. So I wonder what makes a difference between this machine and the previous one. The results seem completely different. Is it the amount of memory? Is it the difference in the disk? Or even the difference in the CPU? Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [MMTests] Interactivity during IO on ext3 2012-07-10 9:49 ` Jan Kara @ 2012-07-10 11:30 ` Mel Gorman 0 siblings, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-10 11:30 UTC (permalink / raw) To: Jan Kara; +Cc: linux-mm, linux-kernel, linux-fsdevel On Tue, Jul 10, 2012 at 11:49:40AM +0200, Jan Kara wrote: > > =========================================================== > > Machine: arnold > > Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3/arnold/comparison.html > > Arch: x86 > > CPUs: 1 socket, 2 threads > > Model: Pentium 4 > > Disk: Single Rotary Disk > > =========================================================== > > > > fsmark-single > > ------------- > > Completion times since 3.2 have been badly affected which coincides with > > the introduction of IO-less dirty page throttling. 3.3 was particularly > > bad. > > > > 2.6.32 was TERRIBLE in terms of read-latencies with the average latency > > and max latencies looking awful. The 90th percentile was close to 4 > > seconds and as a result the graphs are even more of a complete mess than > > they might have been otherwise. > > > > Otherwise it's worth looking closely at 3.0 and 3.2. In 3.0, 95% of the > > reads were below 206ms but in 3.2 this had grown to 273ms. The latency > > of the other 5% results increased from 481ms to 774ms. > > > > 3.4 is looking better at least. > > Yeah, 3.4 looks OK and I'd be interested in 3.5 results since I've merged > one more fix which should help the read latency. When 3.5 comes out, I'll be queue up the same tests. Ideally I would be running against each rc but the machines are used for other tests as well and these ones take too long for continual testing to be practical. > But all in all it's hard > to tackle the latency problems with ext3 - we have a journal which > synchronizes all the writes so we write to it with a high priority > (we use WRITE_SYNC when there's some contention on the journal). But that > naturally competes with reads and creates higher read latency. > Thanks for the good explanation. I'll just know to look out for this in interactivity-related or IO-latency bugs. > > <SNIP> > > ========================================================== > > Machine: hydra > > Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext3/hydra/comparison.html > > Arch: x86-64 > > CPUs: 1 socket, 4 threads > > Model: AMD Phenom II X4 940 > > Disk: Single Rotary Disk > > ========================================================== > > > > fsmark-single > > ------------- > > Completion times are all over the place with a big increase in 3.2 that > > improved a bit since but not as good as 3.1 kernels were. > > > > Unlike arnold, 2.6.32 is not a complete mess and makes a comparison more > > meaningful. Our maximum latencies have jumped around a lot with 3.2 > > being particularly bad and 3.4 not being much better. 3.1 and 3.3 were > > both good in terms of maximum latency. > > > > Average latency is shot to hell. In 2.6.32 it was 349ms and it's now 781ms. > > 3.2 was really bad but it's not like 3.0 or 3.1 were fantastic either. > > So I wonder what makes a difference between this machine and the previous > one. The results seem completely different. Is it the amount of memory? Is > it the difference in the disk? Or even the difference in the CPU? > Two big differences are 32-bit versus 64-bit and the 32-bit machine having 4G of RAM and the 64-bit machine having 8G. 
On the 32-bit machine, bounce buffering may have been an issue but as -S0 was specified (no sync) there would also be differences on when dirty page balancing took place. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* [MMTests] Interactivity during IO on ext4 [not found] ` <20120629111932.GA14154@suse.de> ` (3 preceding siblings ...) 2012-07-05 14:56 ` [MMTests] Interactivity during IO on ext3 Mel Gorman @ 2012-07-05 14:57 ` Mel Gorman 2012-07-23 21:21 ` [MMTests] dbench4 async on ext3 Mel Gorman ` (3 subsequent siblings) 8 siblings, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-05 14:57 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, linux-fsdevel Configuration: global-dhp__io-interactive-performance-ext4 Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext4 Benchmarks: postmark largedd fsmark-single fsmark-threaded micro Summary ======= Unlike ext3, these figures look generally good. There are a few wrinkles in there but indications are that interactivity jitter experienced by users may have a filesystem-specific component. One possibility is that there are differences in how metadata reads are sent to the IO scheduler but I did not confirm this. Benchmark notes =============== NOTE: This configuration is new and very experimental. This is my first time looking at the results of this type of test so flaws are inevitable. There is ample scope for improvement but I had to start somewhere. This configuration is very different in that it is trying to analyse the impact of IO on interactive performance. Some interactivity problems are due to an application trying to read() cache-cold data such as configuration files or cached images. If there is a lot of IO going on, the application may stall while this happens. This is a limited scenario for measuring interactivity but a common one. These tests are fairly standard except that there is a background application running in parallel. It begins by creating a 100M file and using fadvise(POSIX_FADV_DONTNEED) to evict it from cache. Once that is complete it will try to read 1M from the file every few seconds and record the latency. When it reaches the end of the file, it dumps it from cache and starts again. This latency is a *proxy* measure of interactivity, not a true measure. A variation would be to measure the time for small writes for applications that are logging data or applications like gnome-terminal that do small writes to /tmp as part of its buffer management. The main strength is that if we get this basic case wrong, then the complex cases are almost certainly screwed as well. There are two areas to pay attention to. One is completion time and how it is affected by the small reads taking place in parallel. A comprehensive analysis would show exactly how much the workload is affected by a parallel read but right now I'm just looking at wall time. The second area to pay attention to is the read latencies paying particular attention to the average latency and the max latencies. The variations are harder to draw decent conclusions from. A sensible option would be to plot a CDF to get a better idea what the probability of a given read latency is but for now that's a TODO item. As it is, the graphs are barely usable and I'll be giving that more thought. =========================================================== Machine: arnold Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext4/arnold/comparison.html Arch: x86 CPUs: 1 socket, 2 threads Model: Pentium 4 Disk: Single Rotary Disk Status: =========================================================== fsmark-single ------------- Completion times are more or less ok. 
3.2 showed a big improvement which is not in line with what was experienced in ext3. As with ext3, kernel 2.6.32 was a disaster but otherwise our maximum read latencies were looking up until 3.3 when there was a big jump that was not fixed in 3.4. By and large though the average latencies are looking good and while the max latency is bad, the 99th percentile was looking good implying that the worst latencies are rarely experienced. fsmark-threaded --------------- Completion times look generally good with 3.1 being an exception. Latencies are also looking good. postmark -------- Similar story. Completion times and latencies generally look good. largedd ------- Completion times were higher from 2.6.39 up until 3.3 taking nearly two minutes to complete the copy in some cases. This is reflected in some of the maximum latencies in that window but by and large the read latencies are much improved. micro ----- Looking good all round. ========================================================== Machine: hydra Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext4/hydra/comparison.html Arch: x86-64 CPUs: 1 socket, 4 threads Model: AMD Phenom II X4 940 Disk: Single Rotary Disk Status: Ok ========================================================== fsmark-single ------------- Completion times have degraded slightly but are acceptable. All the latency figures look good with some big improvements. fsmark-threaded --------------- Same story, generally looking good with big improvements. postmark -------- Completion times are a bit varied but latencies look good. largedd ------- Completion times look good. Latency has improved since 2.6.32 but there is a big wrinkle in there. Maximum latency was 337ms in kernel 3.2 but in 3.3 it was 707ms and in 3.4 it was 990ms. The 99th percentile figures look good but something happened to allow bigger outliers. micro ----- For the most part, looks good but there was a big jump in the maximum latency in kernel 3.4. Like largedd, the 99th percentile did not look as bad so it might be an outlier. ========================================================== Machine: sandy Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-interactive-performance-ext4/sandy/comparison.html Arch: x86-64 CPUs: 1 socket, 8 threads Model: Intel Core i7-2600 Disk: Single Rotary Disk Status: ========================================================== fsmark-single ------------- Completion times have degraded slightly but are acceptable. All the latency figures look good with some big improvements. fsmark-threaded --------------- Completion times are improved although curiously it is not reflected in the performance figures for fsmark itself. Maximum latency figures generally look good other than a mild jump in 3.2 that has almost been recovered. postmark -------- Completion times have varied a lot and 3.4 is particularly high. The latency figures in general regressed in 3.4 in comparison to 3.3 but by and large the figures look good. largedd ------- Completion times generally look good but were noticeably worse for a number of releases between 2.6.39 and 3.2. This same window showed much higher latency figures with kernel 3.1 showing a maximum latency of 1.3 seconds for example. These were mostly outliers though as the 99th percentile generally looked ok. micro ----- Generally much improved. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 29+ messages in thread
* [MMTests] dbench4 async on ext3 [not found] ` <20120629111932.GA14154@suse.de> ` (4 preceding siblings ...) 2012-07-05 14:57 ` [MMTests] Interactivity during IO on ext4 Mel Gorman @ 2012-07-23 21:21 ` Mel Gorman 2012-08-16 14:52 ` Jan Kara 2012-08-21 22:00 ` Jan Kara 2012-07-23 21:23 ` [MMTests] dbench4 async on ext4 Mel Gorman ` (2 subsequent siblings) 8 siblings, 2 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-23 21:21 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, linux-fsdevel Configuration: global-dhp__io-dbench4-async-ext3 Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext3 Benchmarks: dbench4 Summary ======= In general there was a massive drop in throughput after 3.0. Very broadly speaking it looks like the Read operation got faster but at the cost of a big regression in the Flush operation. Benchmark notes =============== mkfs was run on system startup. No attempt was made to age it. No special mkfs or mount options were used. dbench 4 was used. Tests ran for 180 seconds once warmed up. A varying number of clients were used up to 64*NR_CPU. osync, sync-directory and fsync were all off. =========================================================== Machine: arnold Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext3/arnold/comparison.html Arch: x86 CPUs: 1 socket, 2 threads Model: Pentium 4 Disk: Single Rotary Disk =========================================================== dbench4 ------- Generally worse with a big drop in throughput after 3.0 for small number of clients. In some cases there is an improvement in latency for 3.0 and later kernels but not always. ========================================================== Machine: hydra Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext3/hydra/comparison.html Arch: x86-64 CPUs: 1 socket, 4 threads Model: AMD Phenom II X4 940 Disk: Single Rotary Disk Status: Ok ========================================================== dbench4 ------- Similar to arnold, big drop in throughput after 3.0 for small numbers of clients. Unlike arnold, this is matched by an improvement in latency so it may be the case that IO is more fair even if dbench complains about the latency. Very very broadly speaking, it looks like the read operation got a lot faster but flush got a lot slower. ========================================================== Machine: sandy Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext3/sandy/comparison.html Arch: x86-64 CPUs: 1 socket, 8 threads Model: Intel Core i7-2600 Disk: Single Rotary Disk Status: ========================================================== dbench4 ------- Same story, big drop in throughput after 3.0 with flush again looking very expensive for 3.1 and later kernels. Latency figures are a mixed bag. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
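As a reference point for the configuration described above, a sketch of the sort of dbench 4 invocation implied by those notes is included here. It is a guess at the command line rather than a copy of what mmtests runs: -t and -D are standard dbench options, the async behaviour comes from simply not passing the sync/fsync switches, and the target directory and client counts are placeholders.

# Sketch of a dbench 4 run matching the notes above: 180 second measured run,
# async mode (no -s/-S/-F, so osync, sync-directory and fsync stay off), and
# client counts scaling up towards 64*NR_CPU. Directory and counts are
# placeholders, not the exact mmtests values.
NR_CPU=$(nproc)
for clients in 1 2 4 8 16 $((64 * NR_CPU)); do
	dbench -D /mnt/testdisk -t 180 "$clients" > dbench4-${clients}-clients.log
done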
* Re: [MMTests] dbench4 async on ext3 2012-07-23 21:21 ` [MMTests] dbench4 async on ext3 Mel Gorman @ 2012-08-16 14:52 ` Jan Kara 2012-08-21 22:00 ` Jan Kara 1 sibling, 0 replies; 29+ messages in thread From: Jan Kara @ 2012-08-16 14:52 UTC (permalink / raw) To: Mel Gorman; +Cc: linux-mm, linux-kernel, linux-fsdevel On Mon 23-07-12 22:21:46, Mel Gorman wrote: > Configuration: global-dhp__io-dbench4-async-ext3 > Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext3 > Benchmarks: dbench4 > > Summary > ======= > > In general there was a massive drop in throughput after 3.0. Very broadly > speaking it looks like the Read operation got faster but at the cost of > a big regression in the Flush operation. This looks bad. Also quickly looking into changelogs I don't see any change which could cause this. I'll try to reproduce this and track it down. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [MMTests] dbench4 async on ext3 2012-07-23 21:21 ` [MMTests] dbench4 async on ext3 Mel Gorman 2012-08-16 14:52 ` Jan Kara @ 2012-08-21 22:00 ` Jan Kara 2012-08-22 10:48 ` Mel Gorman 1 sibling, 1 reply; 29+ messages in thread From: Jan Kara @ 2012-08-21 22:00 UTC (permalink / raw) To: Mel Gorman; +Cc: linux-mm, linux-kernel, linux-fsdevel On Mon 23-07-12 22:21:46, Mel Gorman wrote: > Configuration: global-dhp__io-dbench4-async-ext3 > Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext3 > Benchmarks: dbench4 > > Summary > ======= > > In general there was a massive drop in throughput after 3.0. Very broadly > speaking it looks like the Read operation got faster but at the cost of > a big regression in the Flush operation. Mel, I had a look into this and it's actually very likely only a configuration issue. In 3.1 ext3 started to default to enabled barriers (barrier=1 in mount options) which is a safer but slower choice. When I set barriers explicitly, I see no performance difference for dbench4 between 3.0 and 3.1. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
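For anyone re-running the comparison, the barrier behaviour Jan describes can be pinned explicitly in the mount options so the result no longer depends on the kernel's default; barrier=0/1 are the standard ext3 options, while the device and mount point below are placeholders.

# Pin ext3 barrier behaviour explicitly rather than relying on the default,
# which changed to barriers-on in 3.1. Device and mount point are placeholders.
mount -t ext3 -o barrier=1 /dev/sda6 /mnt/testdisk    # barriers on (safe, slower)
# or, to match the pre-3.1 default that the earlier results were implicitly using:
mount -t ext3 -o barrier=0 /dev/sda6 /mnt/testdisk    # barriers off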
* Re: [MMTests] dbench4 async on ext3 2012-08-21 22:00 ` Jan Kara @ 2012-08-22 10:48 ` Mel Gorman 0 siblings, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-08-22 10:48 UTC (permalink / raw) To: Jan Kara; +Cc: linux-mm, linux-kernel, linux-fsdevel On Wed, Aug 22, 2012 at 12:00:38AM +0200, Jan Kara wrote: > On Mon 23-07-12 22:21:46, Mel Gorman wrote: > > Configuration: global-dhp__io-dbench4-async-ext3 > > Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext3 > > Benchmarks: dbench4 > > > > Summary > > ======= > > > > In general there was a massive drop in throughput after 3.0. Very broadly > > speaking it looks like the Read operation got faster but at the cost of > > a big regression in the Flush operation. > > Mel, I had a look into this and it's actually very likely only a > configuration issue. In 3.1 ext3 started to default to enabled barriers > (barrier=1 in mount options) which is a safer but slower choice. When I set > barriers explicitely, I see no performance difference for dbench4 between > 3.0 and 3.1. > I've confirmed that disabling barriers fixed it, for one test machine and one test at least. I'll reschedule the tests to run with barriers disabled at some point in the future. Thanks for tracking it down, I was at least two weeks away before I got the chance to even look. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* [MMTests] dbench4 async on ext4 [not found] ` <20120629111932.GA14154@suse.de> ` (5 preceding siblings ...) 2012-07-23 21:21 ` [MMTests] dbench4 async on ext3 Mel Gorman @ 2012-07-23 21:23 ` Mel Gorman 2012-07-23 21:24 ` [MMTests] Threaded IO Performance on ext3 Mel Gorman 2012-07-23 21:25 ` [MMTests] Threaded IO Performance on xfs Mel Gorman 8 siblings, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-23 21:23 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, linux-fsdevel Configuration: global-dhp__io-dbench4-async-ext4 Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext4 Benchmarks: dbench4 Summary ======= Nothing majorly exciting although throughput has been declining slightly in a number of cases. However, this is not consistent between machines and latency has also been variable. Broadly speaking, there is no need to take any action here. Benchmark notes =============== mkfs was run on system startup. No attempt was made to age it. No special mkfs or mount options were used. dbench 4 was used. Tests ran for 180 seconds once warmed up. A varying number of clients were used up to 64*NR_CPU. osync, sync-directory and fsync were all off. =========================================================== Machine: arnold Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext4/arnold/comparison.html Arch: x86 CPUs: 1 socket, 2 threads Model: Pentium 4 Disk: Single Rotary Disk =========================================================== dbench4 ------- In very vague terms, throughput has been getting worse over time but it's very gradual. Latency has also been getting worse. ========================================================== Machine: hydra Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext4/hydra/comparison.html Arch: x86-64 CPUs: 1 socket, 4 threads Model: AMD Phenom II X4 940 Disk: Single Rotary Disk Status: Ok ========================================================== dbench4 ------- This is a mixed bag; there are gains and losses and it's hard to draw any meaningful conclusion. ========================================================== Machine: sandy Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-dbench4-async-ext4/sandy/comparison.html Arch: x86-64 CPUs: 1 socket, 8 threads Model: Intel Core i7-2600 Disk: Single Rotary Disk Status: ========================================================== dbench4 ------- For the most part, there are few changes of note. Latency has been getting better particularly in 3.2 and later kernels. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
* [MMTests] Threaded IO Performance on ext3 [not found] ` <20120629111932.GA14154@suse.de> ` (6 preceding siblings ...) 2012-07-23 21:23 ` [MMTests] dbench4 async on ext4 Mel Gorman @ 2012-07-23 21:24 ` Mel Gorman 2012-07-23 21:25 ` [MMTests] Threaded IO Performance on xfs Mel Gorman 8 siblings, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-23 21:24 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, linux-fsdevel Configuration: global-dhp__io-threaded-ext3 Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-threaded-ext3 Benchmarks: tiobench Summary ======= Some good results but some 3.x kernels were bad and this varied between machines. In some, 3.1 and 3.2 were particularly bad. 3.4 regressed on one machine with a large amount of memory. Benchmark notes =============== mkfs was run on system startup. No attempt was made to age it. No special mkfs or mount options were used. The size parameter for tiobench was 2*RAM. This is barely sufficient for this particular test where the size parameter should be multiple times the size of memory. The running time of the benchmark is already excessive and this is not likely to be changed. =========================================================== Machine: arnold Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-threaded-ext3/arnold/comparison.html Arch: x86 CPUs: 1 socket, 2 threads Model: Pentium 4 Disk: Single Rotary Disk ========================================================== tiobench -------- This has regressed in almost all cases although for this machine the main damage was between 2.6.32 and 2.6.34. 3.2.9 performed particularly badly. It's interesting to note that 3.1 and 3.2 kernels both swapped and unexpected swapping has been seen in other tests. ========================================================== Machine: hydra Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-threaded-ext3/hydra/comparison.html Arch: x86-64 CPUs: 1 socket, 4 threads Model: AMD Phenom II X4 940 Disk: Single Rotary Disk ========================================================== tiobench -------- This is a mixed bag. For low numbers of clients, throughput on sequential reads has improved with the exception of 3.2.9 which was a disaster. For larger numbers of clients, it is a mix of gains and losses. This could be due to weakness in the methodology due to both a small filesize and a small number of iterations. Random read has improved. With the exception of 3.2.9, sequential writes have generally improved. Random write has a number of regressions and 3.2.9 is a disaster. Kernels 3.1 and 3.2 had unexpected swapping. ========================================================== Machine: sandy Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-threaded-ext3/sandy/comparison.html Arch: x86-64 CPUs: 1 socket, 8 threads Model: Intel Core i7-2600 Disk: Single Rotary Disk ========================================================== tiobench -------- Like hydra, sequential reads were generally better for low numbers of clients. 3.4 is notable in that it regressed. Unlike hydra, where the first bad kernel for sequential reads was 3.2, here it was 3.1. There are differences in the memory sizes and therefore the filesize and it implies that there is not a single cause of the regression. Random read has improved. Sequential writes have generally improved although it is interesting to note that 3.1 was a kernel that regressed. 
3.4 is better than 2.6.32 but it is interesting to note that it has regressed in comparison to 3.3. Random write has generally improved but again 3.4 is worse than 3.3. Like the other machines, 3.1 and 3.2 saw unexpected swapping. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
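The benchmark notes for these threaded IO runs size tiobench at 2*RAM; a small sketch of deriving that from /proc/meminfo is below. The tiobench option names are assumptions about the wrapper script rather than the exact flags mmtests passes, and the directory and thread count are placeholders.

# Sketch of sizing tiobench at 2*RAM as described in the benchmark notes.
# The --dir/--size/--threads option names are assumptions about the tiobench
# wrapper; the directory and thread count are placeholders.
ram_mb=$(awk '/MemTotal/ {printf "%d", $2 / 1024}' /proc/meminfo)
./tiobench.pl --dir /mnt/testdisk --size $((ram_mb * 2)) --threads 8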
* [MMTests] Threaded IO Performance on xfs [not found] ` <20120629111932.GA14154@suse.de> ` (7 preceding siblings ...) 2012-07-23 21:24 ` [MMTests] Threaded IO Performance on ext3 Mel Gorman @ 2012-07-23 21:25 ` Mel Gorman 8 siblings, 0 replies; 29+ messages in thread From: Mel Gorman @ 2012-07-23 21:25 UTC (permalink / raw) To: linux-mm; +Cc: linux-kernel, linux-fsdevel, xfs Configuration: global-dhp__io-threaded-xfs Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-threaded-xfs Benchmarks: tiobench Summary ======= There have been many improvements in the sequential read/write case but 3.4 is noticeably worse than 3.3 in a number of cases. Benchmark notes =============== mkfs was run on system startup. mkfs parameters were -f -d agcount=8. Mount options were inode64,delaylog,logbsize=262144,nobarrier for the most part. On kernels too old to support delaylog, it was removed. On kernels where it was the default, it was specified and the warning ignored. The size parameter for tiobench was 2*RAM. This is barely sufficient for this particular test where the size parameter should be multiple times the size of memory. The running time of the benchmark is already excessive and this is not likely to be changed. =========================================================== Machine: arnold Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-threaded-xfs/arnold/comparison.html Arch: x86 CPUs: 1 socket, 2 threads Model: Pentium 4 Disk: Single Rotary Disk ========================================================== tiobench -------- This is a mixed bag. For low numbers of clients, throughput on sequential reads has improved. For larger numbers of clients, there are many regressions but this is not consistent. This could be due to weakness in the methodology due to both a small filesize and a small number of iterations. Random read is generally bad. For many kernels sequential write is good with the notable exception of 2.6.39 and 3.0 kernels. There was unexpected swapping on 3.1 and 3.2 kernels. ========================================================== Machine: hydra Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-threaded-xfs/hydra/comparison.html Arch: x86-64 CPUs: 1 socket, 4 threads Model: AMD Phenom II X4 940 Disk: Single Rotary Disk ========================================================== tiobench -------- Like arnold, performance for sequential read is good for low numbers of clients. Random read looks good. With the exception of 3.0 in general and single threaded writes for all kernels, sequential writes have generally improved. Random write has a number of regressions. Kernels 3.1 and 3.2 had unexpected swapping. ========================================================== Machine: sandy Result: http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__io-threaded-xfs/sandy/comparison.html Arch: x86-64 CPUs: 1 socket, 8 threads Model: Intel Core i7-2600 Disk: Single Rotary Disk ========================================================== tiobench -------- Like hydra, sequential reads were generally better for low numbers of clients. 3.4 is notable in that it regressed and 3.1 was also bad which is roughly similar to what was seen on ext3. There are differences in the memory sizes and therefore the filesize and it implies that there is not a single cause of the regression. Random read has generally improved, with the obvious exception of the single-threaded case. 
Sequential writes have generally improved but it is interesting to note that 3.4 is worse than 3.3 and this was also seen for ext3. Random write is a mixed bag but again 3.4 is worse than 3.3. Like the other machines, 3.1 and 3.2 saw unexpected swapping. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 29+ messages in thread
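For completeness, the filesystem preparation described in the XFS benchmark notes above can be expressed as the commands below. The mkfs parameters and mount options are the ones stated in the report; the device and mount point are placeholders, and delaylog should be dropped or kept per kernel as described.

# Filesystem setup matching the benchmark notes above. The mkfs parameters
# and mount options are taken from the report; the device and mount point
# are placeholders. Drop delaylog on kernels too old to support it.
mkfs.xfs -f -d agcount=8 /dev/sda6
mount -t xfs -o inode64,delaylog,logbsize=262144,nobarrier /dev/sda6 /mnt/testdisk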
end of thread, other threads:[~2012-08-22 10:48 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20120620113252.GE4011@suse.de> [not found] ` <20120629111932.GA14154@suse.de> 2012-06-29 11:23 ` [MMTests] IO metadata on ext3 Mel Gorman 2012-06-29 11:24 ` [MMTests] IO metadata on ext4 Mel Gorman 2012-06-29 11:25 ` [MMTests] IO metadata on XFS Mel Gorman 2012-07-01 23:54 ` Dave Chinner 2012-07-02 6:32 ` Christoph Hellwig 2012-07-02 14:32 ` Mel Gorman 2012-07-02 19:35 ` Mel Gorman 2012-07-03 0:19 ` Dave Chinner 2012-07-03 10:59 ` Mel Gorman 2012-07-03 11:44 ` Mel Gorman 2012-07-03 12:31 ` Daniel Vetter 2012-07-03 13:08 ` Mel Gorman 2012-07-03 13:28 ` Eugeni Dodonov 2012-07-04 0:47 ` Dave Chinner 2012-07-04 9:51 ` Mel Gorman 2012-07-03 13:04 ` Mel Gorman 2012-07-03 14:04 ` Daniel Vetter 2012-07-02 13:30 ` Mel Gorman 2012-07-05 14:56 ` [MMTests] Interactivity during IO on ext3 Mel Gorman 2012-07-10 9:49 ` Jan Kara 2012-07-10 11:30 ` Mel Gorman 2012-07-05 14:57 ` [MMTests] Interactivity during IO on ext4 Mel Gorman 2012-07-23 21:21 ` [MMTests] dbench4 async on ext3 Mel Gorman 2012-08-16 14:52 ` Jan Kara 2012-08-21 22:00 ` Jan Kara 2012-08-22 10:48 ` Mel Gorman 2012-07-23 21:23 ` [MMTests] dbench4 async on ext4 Mel Gorman 2012-07-23 21:24 ` [MMTests] Threaded IO Performance on ext3 Mel Gorman 2012-07-23 21:25 ` [MMTests] Threaded IO Performance on xfs Mel Gorman