From: Ye Xiaolong <xiaolong.ye@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Bob Peterson <rpeterso@redhat.com>,
Wu Fengguang <fengguang.wu@intel.com>, LKP <lkp@01.org>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression
Date: Fri, 12 Aug 2016 14:29:34 +0800
Message-ID: <20160812062934.GA17589@yexl-desktop>
In-Reply-To: <20160812060433.GS19025@dastard>
On 08/12, Dave Chinner wrote:
>On Thu, Aug 11, 2016 at 10:02:39PM -0700, Linus Torvalds wrote:
>> On Thu, Aug 11, 2016 at 9:16 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >
>> > That's why running aim7 as your "does the filesystem scale"
>> > benchmark is somewhat irrelevant to scaling applications on high
>> > performance systems these days
>>
>> Yes, don't get me wrong - I'm not at all trying to say that AIM7 is a
>> good benchmark. It's just that I think what it happens to test is
>> still meaningful, even if it's not necessarily in any way some kind of
>> "high performance IO" thing.
>>
>> There are probably lots of other more important loads, I just reacted
>> to Christoph seeming to argue that the AIM7 behavior was _so_ broken
>> that we shouldn't even care. It's not _that_ broken, it's just not
>> about high-performance IO streaming, it happens to test something else
>> entirely.
>
>Right - I admit that my first reaction, once I worked out what the
>problem was, was exactly what Christoph said. But after looking at it
>further, regardless of how crappy the benchmark is, it is a
>regression....
>
>> We've actually had AIM7 occasionally find other issues just because
>> some of the things it does are so odd.
>
>*nod*
>
>> And let's face it, user programs doing odd and not very efficient
>> things should be considered par for the course. We're never going to
>> get rid of insane user programs, so we might as well fix the
>> performance problems even when we say "that's just stupid".
>
>Yup, that's what I'm doing :/
>
>It looks like the underlying cause is that the old block mapping
>code only fed filesystem block size lengths into
>xfs_iomap_write_delay(), whereas the iomap code is feeding the
>(capped) write() length into it. Hence xfs_iomap_write_delay() is
>not detecting the need for speculative preallocation correctly on
>these sub-block writes. The profile looks better for the 1 byte
>write - I've combined the old and new for comparison below:
>
> 4.22% __block_commit_write.isra.30
> 3.80% up_write
> 3.74% xfs_bmapi_read
> 3.65% ___might_sleep
> 3.55% down_write
> 3.20% entry_SYSCALL_64_fastpath
> 3.02% mark_buffer_dirty
> 2.78% __mark_inode_dirty
> 2.78% unlock_page
> 2.59% xfs_break_layouts
> 2.47% xfs_iext_bno_to_ext
> 2.38% __block_write_begin_int
> 2.22% find_get_entry
> 2.17% xfs_file_write_iter
> 2.16% __radix_tree_lookup
> 2.13% iomap_write_actor
> 2.04% xfs_bmap_search_extents
> 1.98% __might_sleep
> 1.84% xfs_file_buffered_aio_write
> 1.76% iomap_apply
> 1.71% generic_write_end
> 1.68% vfs_write
> 1.66% iov_iter_copy_from_user_atomic
> 1.56% xfs_bmap_search_multi_extents
> 1.55% __vfs_write
> 1.52% pagecache_get_page
> 1.46% xfs_bmapi_update_map
> 1.33% xfs_iunlock
> 1.32% xfs_iomap_write_delay
> 1.29% xfs_file_iomap_begin
> 1.29% do_raw_spin_lock
> 1.29% __xfs_bmbt_get_all
> 1.21% iov_iter_advance
> 1.20% xfs_file_aio_write_checks
> 1.14% xfs_ilock
> 1.11% balance_dirty_pages_ratelimited
> 1.10% xfs_bmapi_trim_map
> 1.06% xfs_iomap_eof_want_preallocate
> 1.00% xfs_bmapi_delay
>
>Comparison of common functions:
>
>Old New function
>4.50% 3.74% xfs_bmapi_read
>3.64% 4.22% __block_commit_write.isra.30
>3.55% 2.16% __radix_tree_lookup
>3.46% 3.80% up_write
>3.43% 3.65% ___might_sleep
>3.09% 3.20% entry_SYSCALL_64_fastpath
>3.01% 2.47% xfs_iext_bno_to_ext
>3.01% 2.22% find_get_entry
>2.98% 3.55% down_write
>2.71% 3.02% mark_buffer_dirty
>2.52% 2.78% __mark_inode_dirty
>2.38% 2.78% unlock_page
>2.14% 2.59% xfs_break_layouts
>2.07% 1.46% xfs_bmapi_update_map
>2.06% 2.04% xfs_bmap_search_extents
>2.04% 1.32% xfs_iomap_write_delay
>2.00% 0.38% generic_write_checks
>1.96% 1.56% xfs_bmap_search_multi_extents
>1.90% 1.29% __xfs_bmbt_get_all
>1.89% 1.11% balance_dirty_pages_ratelimited
>1.82% 0.28% wait_for_stable_page
>1.76% 2.17% xfs_file_write_iter
>1.68% 1.06% xfs_iomap_eof_want_preallocate
>1.68% 1.00% xfs_bmapi_delay
>1.67% 2.13% iomap_write_actor
>1.60% 1.84% xfs_file_buffered_aio_write
>1.56% 1.98% __might_sleep
>1.48% 1.29% do_raw_spin_lock
>1.44% 1.71% generic_write_end
>1.41% 1.52% pagecache_get_page
>1.38% 1.10% xfs_bmapi_trim_map
>1.21% 2.38% __block_write_begin_int
>1.17% 1.68% vfs_write
>1.17% 1.29% xfs_file_iomap_begin
>
>This shows more time spent in functions above xfs_file_iomap_begin
>(which does the block mapping and allocation) and less time spent
>below it. i.e. the generic functions are showing higher CPU usage
>and the xfs* functions are showing significantly reduced CPU usage.
>This implies that we're doing a lot less block mapping work....
>
>lkp-folk: the patch I've just tested is attached below - can you
>feed it through your test and see if it fixes the regression?
>
Hi, Dave,

I am verifying your fix patch in the LKP environment now and will send
the result once I have it.

Thanks,
Xiaolong
>Cheers,
>
>Dave.
>--
>Dave Chinner
>david@fromorbit.com
>
>xfs: correct speculative prealloc on extending subpage writes
>
>From: Dave Chinner <dchinner@redhat.com>
>
>When a write occurs that extends the file, we check to see if we
>need to preallocate more delalloc space. When we do sub-page
>writes, the new iomap write path passes a sub-block write length to
>the block mapping code. xfs_iomap_write_delay() does not expect to
>be passed byte counts smaller than one filesystem block, so it ends
>up checking the BMBT for blocks beyond EOF on every write,
>regardless of whether we need to or not. This causes a regression in
>the aim7 benchmark, as it is full of sub-page writes.
>
>To fix this, clamp the minimum length of a mapping request coming
>through xfs_file_iomap_begin() to one filesystem block. This ensures
>we are passing the same length to xfs_iomap_write_delay() as we did
>when calling through the get_blocks path. This substantially reduces
>the amount of lookup load being placed on the BMBT during sub-block
>write loads.
>
>Signed-off-by: Dave Chinner <dchinner@redhat.com>
>---
> fs/xfs/xfs_iomap.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
>diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
>index cf697eb..486b75b 100644
>--- a/fs/xfs/xfs_iomap.c
>+++ b/fs/xfs/xfs_iomap.c
>@@ -1036,10 +1036,15 @@ xfs_file_iomap_begin(
> * number pulled out of thin air as a best guess for initial
> * testing.
> *
>+ * xfs_iomap_write_delay() only works if the length passed in is
>+ * >= one filesystem block. Hence we need to clamp the minimum
>+ * length we map, too.
>+ *
> * Note that the values needs to be less than 32-bits wide until
> * the lower level functions are updated.
> */
> length = min_t(loff_t, length, 1024 * PAGE_SIZE);
>+ length = max_t(loff_t, length, (1 << inode->i_blkbits));
> if (xfs_get_extsz_hint(ip)) {
> /*
> * xfs_iomap_write_direct() expects the shared lock. It
>_______________________________________________
>LKP mailing list
>LKP@lists.01.org
>https://lists.01.org/mailman/listinfo/lkp