* [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
@ 2014-07-28 21:51 Tim Chen
2014-07-29 14:19 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Tim Chen @ 2014-07-28 21:51 UTC (permalink / raw)
To: Al Viro
Cc: suruchi, ak, Dave Hansen, Matthew Wilcox, hubert.nueckel,
beth.marsh-prime, doug.nelson, linux-kernel, linux-fsdevel
Al,
We saw a large regression (-55%) in a standard OLTP benchmark
between 3.15 kernel and 3.16-rc4 kernel.
One issue was caused by the changes in async direct IO in the
direct IO patch series between commit f6c0a192 and 62a8067a
for the 3.16-rc kernel. The kernel before this series at
commit 38583f09 has no such regression.
After some investigations and instrumentation, we saw that the
problem was caused by a lot of io_wait issued from
do_blockdev_direct_IO for async direct IO writes, which
did not happen for 3.15 kernel. The benchmark
used to run at 97% cpu utilization now has only
35% cpu utilization with plenty of idle time.
At the end of do_blockdev_direct_IO,
if (dio->is_async && retval == 0 && dio->result &&
((rw == READ) || (dio->result == sdio.size)))
retval = -EIOCBQUEUED;
if (retval != -EIOCBQUEUED)
dio_await_completion(dio);
we saw for async writes, the check
for (dio->result == sdio.size) failed. The retval was not
set to -EIOCBQUEUED, causing dio_await_completion to be issued
and hence the io_waits.
It is possible that the problem was introduced when dio->result
computation was moved into do_direct_IO in commit 3320c60b.
However, we cannot compile and test this commit separately as
it is dependent on some later patches in the direct IO patch series
so we have to test the patch series as a whole.
This is quite a large regression for us and we hope that
it can be fixed before 3.16 kernel is released.
Thanks.
Tim
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-28 21:51 [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark Tim Chen
@ 2014-07-29 14:19 ` Christoph Hellwig
2014-07-29 16:35 ` Tim Chen
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2014-07-29 14:19 UTC (permalink / raw)
To: Tim Chen
Cc: Al Viro, suruchi, ak, Dave Hansen, Matthew Wilcox, hubert.nueckel,
beth.marsh-prime, doug.nelson, linux-kernel, linux-fsdevel
Hi Tim,
I think the problem is that sdio.size doesn't get set set with
the new dio code. Please try the patch below to fix this by not
relying on that field:
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 194d0d1..adbb847 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -71,7 +71,6 @@ struct dio_submit {
been performed at the start of a
write */
int pages_in_io; /* approximate total IO pages */
- size_t size; /* total request size (doesn't change)*/
sector_t block_in_file; /* Current offset into the underlying
file in dio_block units. */
unsigned blocks_available; /* At block_in_file. changes */
@@ -1104,7 +1103,8 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
unsigned blkbits = i_blkbits;
unsigned blocksize_mask = (1 << blkbits) - 1;
ssize_t retval = -EINVAL;
- loff_t end = offset + iov_iter_count(iter);
+ size_t count = iov_iter_count(iter);
+ loff_t end = offset + count;
struct dio *dio;
struct dio_submit sdio = { 0, };
struct buffer_head map_bh = { 0, };
@@ -1286,16 +1286,14 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
* This had *better* be the only place that raises -EIOCBQUEUED.
*/
BUG_ON(retval == -EIOCBQUEUED);
- if (dio->is_async && retval == 0 && dio->result &&
- ((rw == READ) || (dio->result == sdio.size)))
+ if (dio->is_async && rw == READ && retval == 0 && dio->result == count)
retval = -EIOCBQUEUED;
-
- if (retval != -EIOCBQUEUED)
+ else
dio_await_completion(dio);
- if (drop_refcount(dio) == 0) {
+ if (drop_refcount(dio) == 0)
retval = dio_complete(dio, offset, retval, false);
- } else
+ else
BUG_ON(retval != -EIOCBQUEUED);
out:
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-29 14:19 ` Christoph Hellwig
@ 2014-07-29 16:35 ` Tim Chen
2014-07-29 16:49 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Tim Chen @ 2014-07-29 16:35 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Al Viro, suruchi, ak, Dave Hansen, Matthew Wilcox, hubert.nueckel,
beth.marsh-prime, doug.nelson, linux-kernel, linux-fsdevel
On Tue, 2014-07-29 at 07:19 -0700, Christoph Hellwig wrote:
> Hi Tim,
>
> I think the problem is that sdio.size doesn't get set set with
> the new dio code. Please try the patch below to fix this by not
> relying on that field:
Thanks for your quick response. We'll test this patch and let you
know the results.
Tim
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-29 16:35 ` Tim Chen
@ 2014-07-29 16:49 ` Christoph Hellwig
2014-07-29 17:41 ` Tim Chen
2014-07-29 23:57 ` Tim Chen
0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2014-07-29 16:49 UTC (permalink / raw)
To: Tim Chen
Cc: Al Viro, suruchi, ak, Dave Hansen, Matthew Wilcox, hubert.nueckel,
beth.marsh-prime, doug.nelson, linux-kernel, linux-fsdevel
On Tue, Jul 29, 2014 at 09:35:02AM -0700, Tim Chen wrote:
> On Tue, 2014-07-29 at 07:19 -0700, Christoph Hellwig wrote:
> > Hi Tim,
> >
> > I think the problem is that sdio.size doesn't get set set with
> > the new dio code. Please try the patch below to fix this by not
> > relying on that field:
>
> Thanks for your quick response. We'll test this patch and let you
> know the results.
There actually was a little bug in it, please try the version below
instead:
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 194d0d1..17e39b0 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -71,7 +71,6 @@ struct dio_submit {
been performed at the start of a
write */
int pages_in_io; /* approximate total IO pages */
- size_t size; /* total request size (doesn't change)*/
sector_t block_in_file; /* Current offset into the underlying
file in dio_block units. */
unsigned blocks_available; /* At block_in_file. changes */
@@ -1104,7 +1103,8 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
unsigned blkbits = i_blkbits;
unsigned blocksize_mask = (1 << blkbits) - 1;
ssize_t retval = -EINVAL;
- loff_t end = offset + iov_iter_count(iter);
+ size_t count = iov_iter_count(iter);
+ loff_t end = offset + count;
struct dio *dio;
struct dio_submit sdio = { 0, };
struct buffer_head map_bh = { 0, };
@@ -1287,10 +1287,9 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
*/
BUG_ON(retval == -EIOCBQUEUED);
if (dio->is_async && retval == 0 && dio->result &&
- ((rw == READ) || (dio->result == sdio.size)))
+ (rw == READ || dio->result == count))
retval = -EIOCBQUEUED;
-
- if (retval != -EIOCBQUEUED)
+ else
dio_await_completion(dio);
if (drop_refcount(dio) == 0) {
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-29 16:49 ` Christoph Hellwig
@ 2014-07-29 17:41 ` Tim Chen
2014-07-29 23:57 ` Tim Chen
1 sibling, 0 replies; 12+ messages in thread
From: Tim Chen @ 2014-07-29 17:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Al Viro, suruchi, ak, Dave Hansen, Matthew Wilcox, hubert.nueckel,
beth.marsh-prime, doug.nelson, linux-kernel, linux-fsdevel
On Tue, 2014-07-29 at 09:49 -0700, Christoph Hellwig wrote:
> On Tue, Jul 29, 2014 at 09:35:02AM -0700, Tim Chen wrote:
> > On Tue, 2014-07-29 at 07:19 -0700, Christoph Hellwig wrote:
> > > Hi Tim,
> > >
> > > I think the problem is that sdio.size doesn't get set set with
> > > the new dio code. Please try the patch below to fix this by not
> > > relying on that field:
> >
> > Thanks for your quick response. We'll test this patch and let you
> > know the results.
>
> There actually was a little bug in it, please try the version below
> instead:
>
Thanks. Will do.
Tim
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-29 16:49 ` Christoph Hellwig
2014-07-29 17:41 ` Tim Chen
@ 2014-07-29 23:57 ` Tim Chen
2014-07-30 0:12 ` Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Tim Chen @ 2014-07-29 23:57 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Al Viro, suruchi, ak, Dave Hansen, Matthew Wilcox, hubert.nueckel,
beth.marsh-prime, doug.nelson, linux-kernel, linux-fsdevel,
Linus Torvalds
On Tue, 2014-07-29 at 09:49 -0700, Christoph Hellwig wrote:
> On Tue, Jul 29, 2014 at 09:35:02AM -0700, Tim Chen wrote:
> > On Tue, 2014-07-29 at 07:19 -0700, Christoph Hellwig wrote:
> > > Hi Tim,
> > >
> > > I think the problem is that sdio.size doesn't get set set with
> > > the new dio code. Please try the patch below to fix this by not
> > > relying on that field:
> >
> > Thanks for your quick response. We'll test this patch and let you
> > know the results.
>
> There actually was a little bug in it, please try the version below
> instead:
>
Christoph, we're glad to report that this patch does fix the problem.
Wonder if Al or someone can pick it up. It is a large regression
that should not be left unfixed for 3.16.
Tim
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 194d0d1..17e39b0 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -71,7 +71,6 @@ struct dio_submit {
> been performed at the start of a
> write */
> int pages_in_io; /* approximate total IO pages */
> - size_t size; /* total request size (doesn't change)*/
> sector_t block_in_file; /* Current offset into the underlying
> file in dio_block units. */
> unsigned blocks_available; /* At block_in_file. changes */
> @@ -1104,7 +1103,8 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
> unsigned blkbits = i_blkbits;
> unsigned blocksize_mask = (1 << blkbits) - 1;
> ssize_t retval = -EINVAL;
> - loff_t end = offset + iov_iter_count(iter);
> + size_t count = iov_iter_count(iter);
> + loff_t end = offset + count;
> struct dio *dio;
> struct dio_submit sdio = { 0, };
> struct buffer_head map_bh = { 0, };
> @@ -1287,10 +1287,9 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct inode *inode,
> */
> BUG_ON(retval == -EIOCBQUEUED);
> if (dio->is_async && retval == 0 && dio->result &&
> - ((rw == READ) || (dio->result == sdio.size)))
> + (rw == READ || dio->result == count))
> retval = -EIOCBQUEUED;
> -
> - if (retval != -EIOCBQUEUED)
> + else
> dio_await_completion(dio);
>
> if (drop_refcount(dio) == 0) {
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-29 23:57 ` Tim Chen
@ 2014-07-30 0:12 ` Christoph Hellwig
2014-07-31 16:13 ` Tim Chen
2014-08-01 6:45 ` Al Viro
0 siblings, 2 replies; 12+ messages in thread
From: Christoph Hellwig @ 2014-07-30 0:12 UTC (permalink / raw)
To: Tim Chen
Cc: Christoph Hellwig, Al Viro, suruchi, ak, Dave Hansen,
Matthew Wilcox, hubert.nueckel, beth.marsh-prime, doug.nelson,
linux-kernel, linux-fsdevel, Linus Torvalds
On Tue, Jul 29, 2014 at 04:57:12PM -0700, Tim Chen wrote:
> Christoph, we're glad to report that this patch does fix the problem.
> Wonder if Al or someone can pick it up. It is a large regression
> that should not be left unfixed for 3.16.
Al has been away for a bit, and I prepared the last VFS pull. I'll pick
it up for now and either Al or I will send it on before next weekend.
Can I get a tested-by tag from you for the patch? Can anyone else
please cross check it? The new total value might be different from
the old size value when we do a short write, but I can't see how
an async write would be a good thing if we had to finish the remainder
using buffered I/O if the old code really allowed that.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-30 0:12 ` Christoph Hellwig
@ 2014-07-31 16:13 ` Tim Chen
2014-07-31 17:07 ` Linus Torvalds
2014-08-01 6:45 ` Al Viro
1 sibling, 1 reply; 12+ messages in thread
From: Tim Chen @ 2014-07-31 16:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Al Viro, suruchi, ak, Dave Hansen, Matthew Wilcox, hubert.nueckel,
beth.marsh-prime, doug.nelson, linux-kernel, linux-fsdevel,
Linus Torvalds
On Tue, 2014-07-29 at 17:12 -0700, Christoph Hellwig wrote:
> On Tue, Jul 29, 2014 at 04:57:12PM -0700, Tim Chen wrote:
> > Christoph, we're glad to report that this patch does fix the problem.
> > Wonder if Al or someone can pick it up. It is a large regression
> > that should not be left unfixed for 3.16.
>
> Al has been away for a bit, and I prepared the last VFS pull. I'll pick
> it up for now and either Al or I will send it on before next weekend.
>
> Can I get a tested-by tag from you for the patch? Can anyone else
> please cross check it? The new total value might be different from
> the old size value when we do a short write, but I can't see how
> an async write would be a good thing if we had to finish the remainder
> using buffered I/O if the old code really allowed that.
>
Yes, please put in
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Suruchi Kadu <suruchi.a.kadu@intel.com>
Tim
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-31 16:13 ` Tim Chen
@ 2014-07-31 17:07 ` Linus Torvalds
2014-07-31 17:10 ` Christoph Hellwig
0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2014-07-31 17:07 UTC (permalink / raw)
To: Tim Chen
Cc: Christoph Hellwig, Al Viro, suruchi, ak, Dave Hansen,
Matthew Wilcox, hubert.nueckel, beth.marsh-prime, doug.nelson,
linux-kernel, linux-fsdevel
On Thu, Jul 31, 2014 at 9:13 AM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> On Tue, 2014-07-29 at 17:12 -0700, Christoph Hellwig wrote:
>>
>> Can I get a tested-by tag from you for the patch? Can anyone else
>> please cross check it? The new total value might be different from
>> the old size value when we do a short write, but I can't see how
>> an async write would be a good thing if we had to finish the remainder
>> using buffered I/O if the old code really allowed that.
>>
>
> Yes, please put in
>
> Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
> Tested-by: Suruchi Kadu <suruchi.a.kadu@intel.com>
Ok, good, patch looks fine. Christoph, do you have a commit message for it?
Linus
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-31 17:07 ` Linus Torvalds
@ 2014-07-31 17:10 ` Christoph Hellwig
2014-07-31 17:22 ` Linus Torvalds
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2014-07-31 17:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Tim Chen, Christoph Hellwig, Al Viro, suruchi, ak, Dave Hansen,
Matthew Wilcox, hubert.nueckel, beth.marsh-prime, doug.nelson,
linux-kernel, linux-fsdevel
On Thu, Jul 31, 2014 at 10:07:55AM -0700, Linus Torvalds wrote:
> > Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Tested-by: Suruchi Kadu <suruchi.a.kadu@intel.com>
>
> Ok, good, patch looks fine. Christoph, do you have a commit message for it?
I have it queued in the vfs tree in for-next, although only with Tim's
tested-by tag, not Suruchis:
http://git.infradead.org/users/hch/vfs.git/commitdiff/d0c9ddc0a2c3036f1ab06ddf19151d5d57cad279
I was going to prepare a pull request with just that patch for you later
today or early tomorrow.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-31 17:10 ` Christoph Hellwig
@ 2014-07-31 17:22 ` Linus Torvalds
0 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2014-07-31 17:22 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Tim Chen, Al Viro, suruchi, ak, Dave Hansen, Matthew Wilcox,
hubert.nueckel, beth.marsh-prime, doug.nelson, linux-kernel,
linux-fsdevel
On Thu, Jul 31, 2014 at 10:10 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> I was going to prepare a pull request with just that patch for you later
> today or early tomorrow.
Ok, that's fine. I don't have anything particularly scary lined up -
and quite frankly, despite the big numbers, this patch doesn't really
count as scary either - so right now I'm still planning to do final
3.16 this Sunday.
A pull request today or tomorrow sounds perfectly fine - a pull
request much closer to release less so.
Thanks,
Linus
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark
2014-07-30 0:12 ` Christoph Hellwig
2014-07-31 16:13 ` Tim Chen
@ 2014-08-01 6:45 ` Al Viro
1 sibling, 0 replies; 12+ messages in thread
From: Al Viro @ 2014-08-01 6:45 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Tim Chen, suruchi, ak, Dave Hansen, Matthew Wilcox,
hubert.nueckel, beth.marsh-prime, doug.nelson, linux-kernel,
linux-fsdevel, Linus Torvalds
On Tue, Jul 29, 2014 at 05:12:48PM -0700, Christoph Hellwig wrote:
> On Tue, Jul 29, 2014 at 04:57:12PM -0700, Tim Chen wrote:
> > Christoph, we're glad to report that this patch does fix the problem.
> > Wonder if Al or someone can pick it up. It is a large regression
> > that should not be left unfixed for 3.16.
>
> Al has been away for a bit, and I prepared the last VFS pull. I'll pick
> it up for now and either Al or I will send it on before next weekend.
Back now (and almost dug myself from under the huge mail pile - only ~1500
left to deal with, out of more than 30000 ;-/)
In addition to this fix there is "vfs: fix check for fallocate on active
swapfile". There might be more in the pending pile, but so far that's
the only obvious missing bugfix.
Really big thanks for taking care of that mess while I'd been MIA (spent a lot
of time sick; I _hate_ the summer in NC, and this one had been particulary
lousy). I'm trying to put together a queue for 3.17 right now; your
vfs-for-3.17 branch looks like a sane starting point. Sigh...
Do you have anything else for 3.16 at the moment? I've pushed these two
into vfs.git#for-linus; right now the stuff in there is
Shortlog:
Christoph Hellwig (1):
direct-io: fix AIO regression
Eric Biggers (1):
vfs: fix check for fallocate on active swapfile
Diffstat:
fs/direct-io.c | 9 ++++-----
fs/open.c | 5 ++---
2 files changed, 6 insertions(+), 8 deletions(-)
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-08-01 6:45 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-28 21:51 [Regression 3.16-rc kernel] -55% reduction in throughput for OLTP benchmark Tim Chen
2014-07-29 14:19 ` Christoph Hellwig
2014-07-29 16:35 ` Tim Chen
2014-07-29 16:49 ` Christoph Hellwig
2014-07-29 17:41 ` Tim Chen
2014-07-29 23:57 ` Tim Chen
2014-07-30 0:12 ` Christoph Hellwig
2014-07-31 16:13 ` Tim Chen
2014-07-31 17:07 ` Linus Torvalds
2014-07-31 17:10 ` Christoph Hellwig
2014-07-31 17:22 ` Linus Torvalds
2014-08-01 6:45 ` Al Viro
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).