From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>,
"Huang, Ying" <ying.huang@intel.com>,
LKML <linux-kernel@vger.kernel.org>,
Bob Peterson <rpeterso@redhat.com>,
Wu Fengguang <fengguang.wu@intel.com>, LKP <lkp@01.org>
Subject: Re: [LKP] [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression
Date: Fri, 12 Aug 2016 12:23:29 +1000 [thread overview]
Message-ID: <20160812022329.GP19025@dastard> (raw)
In-Reply-To: <20160812005442.GN19025@dastard>
On Fri, Aug 12, 2016 at 10:54:42AM +1000, Dave Chinner wrote:
> I'm now going to test Christoph's theory that this is an "overwrite
> doing lots of block mapping" issue. More on that to follow.
Ok, so going back to the profiles, I can say it's not an overwrite
issue, because there is delayed allocation showing up in the
profile. Lots of it. Which lead me to think "maybe the benchmark is
just completely *dumb*".
And, as usual, that's the answer. Here's the reproducer:
# sudo mkfs.xfs -f -m crc=0 /dev/pmem1
# sudo mount -o noatime /dev/pmem1 /mnt/scratch
# sudo xfs_io -f -c "pwrite 0 512m -b 1" /mnt/scratch/fooey
And here's the profile:
4.50% [kernel] [k] xfs_bmapi_read
3.64% [kernel] [k] __block_commit_write.isra.30
3.55% [kernel] [k] __radix_tree_lookup
3.46% [kernel] [k] up_write
3.43% [kernel] [k] ___might_sleep
3.09% [kernel] [k] entry_SYSCALL_64_fastpath
3.01% [kernel] [k] xfs_iext_bno_to_ext
3.01% [kernel] [k] find_get_entry
2.98% [kernel] [k] down_write
2.71% [kernel] [k] mark_buffer_dirty
2.52% [kernel] [k] __mark_inode_dirty
2.38% [kernel] [k] unlock_page
2.14% [kernel] [k] xfs_break_layouts
2.07% [kernel] [k] xfs_bmapi_update_map
2.06% [kernel] [k] xfs_bmap_search_extents
2.04% [kernel] [k] xfs_iomap_write_delay
2.00% [kernel] [k] generic_write_checks
1.96% [kernel] [k] xfs_bmap_search_multi_extents
1.90% [kernel] [k] __xfs_bmbt_get_all
1.89% [kernel] [k] balance_dirty_pages_ratelimited
1.82% [kernel] [k] wait_for_stable_page
1.76% [kernel] [k] xfs_file_write_iter
1.68% [kernel] [k] xfs_iomap_eof_want_preallocate
1.68% [kernel] [k] xfs_bmapi_delay
1.67% [kernel] [k] iomap_write_actor
1.60% [kernel] [k] xfs_file_buffered_aio_write
1.56% [kernel] [k] __might_sleep
1.48% [kernel] [k] do_raw_spin_lock
1.44% [kernel] [k] generic_write_end
1.41% [kernel] [k] pagecache_get_page
1.38% [kernel] [k] xfs_bmapi_trim_map
1.21% [kernel] [k] __block_write_begin_int
1.17% [kernel] [k] vfs_write
1.17% [kernel] [k] xfs_file_iomap_begin
1.17% [kernel] [k] xfs_bmbt_get_startoff
1.14% [kernel] [k] iomap_apply
1.08% [kernel] [k] xfs_iunlock
1.08% [kernel] [k] iov_iter_copy_from_user_atomic
0.97% [kernel] [k] xfs_file_aio_write_checks
0.96% [kernel] [k] xfs_ilock
.....
Yeah, I'm doing a sequential write in *1 byte pwrite() calls*.
Ok, so the benchmark isn't /quite/ that abysmally stupid. It's
still, ah, extremely challenged:
if (NBUFSIZE != 1024) { /* enforce known block size */
fprintf(stderr, "NBUFSIZE changed to %d\n", NBUFSIZE);
exit(1);
}
i.e. it's hard coded to do all it's "disk" IO in 1k block sizes.
Every read, every write, every file copy, etc are all done with a
1024 byte buffer. There are lots of loops that look like:
while (--n) {
write(fd, nbuf, sizeof nbuf)
}
where n is the file size specified in the job file. Those loops are
what is generating the profile we see: repeated partial page writes
that extend the file.
IOWs, the benchmark is doing exactly what we document in the fstat()
man page *not to do* as it is will cause inefficient IO patterns:
The st_blksize field gives the "preferred" blocksize for
efficient filesystem I/O. (Writing to a file in smaller
chunks may cause an inefficient read-modify-rewrite.)
The smallest we ever set st_blksize to is PAGE_SIZE, so the
benchmark is running well known and documented (at least 10 years
ago) slow paths through the IO stack. I'm very tempted now simply
to say that the aim7 disk benchmark is showing it's age and as such
the results are not actually reflective of what typical applications
will see.
Christoph, maybe there's something we can do to only trigger
speculative prealloc growth checks if the new file size crosses the end of
the currently allocated block at the EOF. That would chop out a fair
chunk of the xfs_bmapi_read calls being done in this workload. I'm
not sure how much effort we should spend optimising this slow path,
though....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2016-08-12 2:23 UTC|newest]
Thread overview: 109+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-09 14:33 [lkp] [xfs] 68a9f5e700: aim7.jobs-per-min -13.6% regression kernel test robot
2016-08-10 18:24 ` Linus Torvalds
2016-08-10 23:08 ` Dave Chinner
2016-08-10 23:51 ` Linus Torvalds
2016-08-10 23:58 ` [LKP] " Huang, Ying
2016-08-11 0:11 ` Huang, Ying
2016-08-11 0:23 ` Linus Torvalds
2016-08-11 0:33 ` Huang, Ying
2016-08-11 1:00 ` Linus Torvalds
2016-08-11 4:46 ` Dave Chinner
2016-08-15 17:22 ` Huang, Ying
2016-08-16 0:08 ` Dave Chinner
2016-08-11 15:57 ` Christoph Hellwig
2016-08-11 16:55 ` Linus Torvalds
2016-08-11 17:51 ` Huang, Ying
2016-08-11 19:51 ` Linus Torvalds
2016-08-11 20:00 ` Christoph Hellwig
2016-08-11 20:35 ` Linus Torvalds
2016-08-11 22:16 ` Al Viro
2016-08-11 22:30 ` Linus Torvalds
2016-08-11 21:16 ` Huang, Ying
2016-08-11 21:40 ` Linus Torvalds
2016-08-11 22:08 ` Christoph Hellwig
2016-08-12 0:54 ` Dave Chinner
2016-08-12 2:23 ` Dave Chinner [this message]
2016-08-12 2:32 ` Linus Torvalds
2016-08-12 2:52 ` Christoph Hellwig
2016-08-12 3:20 ` Linus Torvalds
2016-08-12 4:16 ` Dave Chinner
2016-08-12 5:02 ` Linus Torvalds
2016-08-12 6:04 ` Dave Chinner
2016-08-12 6:29 ` Ye Xiaolong
2016-08-12 8:51 ` Ye Xiaolong
2016-08-12 10:02 ` Dave Chinner
2016-08-12 10:43 ` Fengguang Wu
2016-08-13 0:30 ` [LKP] [lkp] " Christoph Hellwig
2016-08-13 21:48 ` Christoph Hellwig
2016-08-13 22:07 ` Fengguang Wu
2016-08-13 22:15 ` Christoph Hellwig
2016-08-13 22:51 ` Fengguang Wu
2016-08-14 14:50 ` Fengguang Wu
2016-08-14 16:17 ` Christoph Hellwig
2016-08-14 23:46 ` Dave Chinner
2016-08-14 23:57 ` Fengguang Wu
2016-08-15 14:14 ` Fengguang Wu
2016-08-15 21:22 ` Dave Chinner
2016-08-16 12:20 ` Fengguang Wu
2016-08-15 20:30 ` Huang, Ying
2016-08-22 22:09 ` Huang, Ying
2016-09-26 6:25 ` Huang, Ying
2016-09-26 14:55 ` Christoph Hellwig
2016-09-27 0:52 ` Huang, Ying
2016-08-16 13:25 ` Fengguang Wu
2016-08-13 23:32 ` Dave Chinner
2016-08-12 2:27 ` Linus Torvalds
2016-08-12 3:56 ` Dave Chinner
2016-08-12 18:03 ` Linus Torvalds
2016-08-13 23:58 ` Fengguang Wu
2016-08-15 0:48 ` Dave Chinner
2016-08-15 1:37 ` Linus Torvalds
2016-08-15 2:28 ` Dave Chinner
2016-08-15 2:53 ` Linus Torvalds
2016-08-15 5:00 ` Dave Chinner
[not found] ` <CA+55aFwva2Xffai+Eqv1Jn_NGryk3YJ2i5JoHOQnbQv6qVPAsw@mail.gmail.com>
[not found] ` <CA+55aFy14nUnJQ_GdF=j8Fa9xiH70c6fY2G3q5HQ01+8z1z3qQ@mail.gmail.com>
[not found] ` <CA+55aFxp+rLehC8c157uRbH459wUC1rRPfCVgvmcq5BrG9gkyg@mail.gmail.com>
2016-08-15 22:22 ` Dave Chinner
2016-08-15 22:42 ` Dave Chinner
2016-08-15 23:20 ` Linus Torvalds
2016-08-15 23:48 ` Linus Torvalds
2016-08-16 0:44 ` Dave Chinner
2016-08-16 15:05 ` Mel Gorman
2016-08-16 17:47 ` Linus Torvalds
2016-08-17 15:48 ` Michal Hocko
2016-08-17 16:42 ` Michal Hocko
2016-08-17 15:49 ` Mel Gorman
2016-08-18 0:45 ` Mel Gorman
2016-08-18 7:11 ` Dave Chinner
2016-08-18 13:24 ` Mel Gorman
2016-08-18 17:55 ` Linus Torvalds
2016-08-18 21:19 ` Dave Chinner
2016-08-18 22:25 ` Linus Torvalds
2016-08-19 9:00 ` Michal Hocko
2016-08-19 10:49 ` Mel Gorman
2016-08-19 23:48 ` Dave Chinner
2016-08-20 1:08 ` Linus Torvalds
2016-08-20 12:16 ` Mel Gorman
2016-08-19 15:08 ` Mel Gorman
2016-09-01 23:32 ` Dave Chinner
2016-09-06 15:37 ` Mel Gorman
2016-09-06 15:52 ` Huang, Ying
2016-08-24 15:40 ` Huang, Ying
2016-08-25 9:37 ` Mel Gorman
2016-08-18 2:44 ` Dave Chinner
2016-08-16 0:15 ` Linus Torvalds
2016-08-16 0:38 ` Dave Chinner
2016-08-16 0:50 ` Linus Torvalds
2016-08-16 0:19 ` Dave Chinner
2016-08-16 1:51 ` Linus Torvalds
2016-08-16 22:02 ` Dave Chinner
2016-08-16 23:23 ` Linus Torvalds
2016-08-15 23:01 ` Linus Torvalds
2016-08-16 0:17 ` Dave Chinner
2016-08-16 0:45 ` Linus Torvalds
2016-08-15 5:03 ` Ingo Molnar
2016-08-17 16:24 ` Peter Zijlstra
2016-08-15 12:58 ` Fengguang Wu
2016-08-11 1:16 ` Dave Chinner
2016-08-11 1:32 ` Dave Chinner
2016-08-11 2:36 ` Ye Xiaolong
2016-08-11 3:05 ` Dave Chinner
2016-08-12 1:26 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160812022329.GP19025@dastard \
--to=david@fromorbit.com \
--cc=fengguang.wu@intel.com \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@01.org \
--cc=rpeterso@redhat.com \
--cc=torvalds@linux-foundation.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox