* [PATCH] fs: sync: fixed performance regression
@ 2013-07-10 23:12 Paul Taysom
2013-07-10 23:45 ` Paul Taysom
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Paul Taysom @ 2013-07-10 23:12 UTC (permalink / raw)
To: Alexander Viro; +Cc: linux-fsdevel, linux-kernel, jack, taysom, sonnyrao
The following commit introduced a 10x regression for
syncing inodes in ext4 with relatime enabled where just
the atime had been modified.
commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
Author: Jan Kara <jack@suse.cz>
Date: Tue Jul 3 16:45:34 2012 +0200
vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
See also: http://www.kernelhub.org/?msg=93100&p=2
Fixed by putting back in the call to writeback_inodes_sb.
I'll attach the test in a reply to this e-mail.
The test starts by creating 512 files, syncing, reading one byte
from each of those files, syncing, and then deleting each file
and syncing. The time to do each sync is printed. The process
is then repeated for 1024 files and then the next power of
two up to 262144 files.
Note, when running the test, the slow down doesn't always happen
but most of the tests will show a slow down.
In response to crbug.com/240422
Signed-off-by: Paul Taysom <taysom@chromium.org>
---
fs/sync.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/sync.c b/fs/sync.c
index 905f3f6..55c3316 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
sync_inodes_sb(sb);
}
+static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
+{
+ if (!(sb->s_flags & MS_RDONLY))
+ writeback_inodes_sb(sb, WB_REASON_SYNC);
+}
+
static void sync_fs_one_sb(struct super_block *sb, void *arg)
{
if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
@@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
int nowait = 0, wait = 1;
wakeup_flusher_threads(0, WB_REASON_SYNC);
+ iterate_supers(writeback_inodes_one_sb, NULL);
iterate_supers(sync_inodes_one_sb, NULL);
iterate_supers(sync_fs_one_sb, &nowait);
iterate_supers(sync_fs_one_sb, &wait);
--
1.8.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-10 23:12 [PATCH] fs: sync: fixed performance regression Paul Taysom
@ 2013-07-10 23:45 ` Paul Taysom
2013-07-10 23:56 ` Dave Jones
` (2 subsequent siblings)
3 siblings, 0 replies; 12+ messages in thread
From: Paul Taysom @ 2013-07-10 23:45 UTC (permalink / raw)
To: Paul Taysom; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, jack, sonnyrao
[-- Attachment #1: Type: text/plain, Size: 2223 bytes --]
On Wed, Jul 10, 2013 at 4:12 PM, Paul Taysom <taysom@chromium.org> wrote:
> The following commit introduced a 10x regression for
> syncing inodes in ext4 with relatime enabled where just
> the atime had been modified.
>
> commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> Author: Jan Kara <jack@suse.cz>
> Date: Tue Jul 3 16:45:34 2012 +0200
> vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>
> See also: http://www.kernelhub.org/?msg=93100&p=2
>
> Fixed by putting back in the call to writeback_inodes_sb.
>
> I'll attach the test in a reply to this e-mail.
>
> The test starts by creating 512 files, syncing, reading one byte
> from each of those files, syncing, and then deleting each file
> and syncing. The time to do each sync is printed. The process
> is then repeated for 1024 files and then the next power of
> two up to 262144 files.
>
> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.
>
> In response to crbug.com/240422
>
> Signed-off-by: Paul Taysom <taysom@chromium.org>
> ---
> fs/sync.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/sync.c b/fs/sync.c
> index 905f3f6..55c3316 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
> sync_inodes_sb(sb);
> }
>
> +static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
> +{
> + if (!(sb->s_flags & MS_RDONLY))
> + writeback_inodes_sb(sb, WB_REASON_SYNC);
> +}
> +
> static void sync_fs_one_sb(struct super_block *sb, void *arg)
> {
> if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
> @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
> int nowait = 0, wait = 1;
>
> wakeup_flusher_threads(0, WB_REASON_SYNC);
> + iterate_supers(writeback_inodes_one_sb, NULL);
> iterate_supers(sync_inodes_one_sb, NULL);
> iterate_supers(sync_fs_one_sb, &nowait);
> iterate_supers(sync_fs_one_sb, &wait);
> --
> 1.8.3
>
Try this again but in plaintext mode. Attaching test results and test
program. Tests were run on a Pixel x86 with SSD storage.
[-- Attachment #2: syncperf.c --]
[-- Type: text/x-csrc, Size: 3102 bytes --]
/*
* Copyright (c) 2013 The Chromium OS Authors. All rights reserved.
* Use of this source code is governed by a BSD-style license that can be
* found in the LICENSE file.
*/
/*
* Does time test of sync for creating, reading and deleting files.
*
* To compile:
* cc syncperf.c -rlt -o syncperf
*
* To run, cd to a directory on volume where test should run.
* syncperf
*
* The syncperf will create the directory "synctestdir" and do
* several test runs creating twice as many files each time.
*
* The test prints the time to sync after creating the files,
* after reading the files and after deleting the files.
*
* There will be some runs, where the read sync time is
* fast event on systems that exhibit the problem.
*
* When the tests finishes, it cleans up "synctestdir".
*/
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
enum { MAX_NAME = 12,
FILE_SIZE = 1 << 14,
BYTES_TO_READ = 1,
NUM_START_FILES = 1 << 9,
NUM_FILES = 1 << 18 };
struct file {
char name[MAX_NAME];
};
struct file *File;
static inline uint64_t nsecs(void)
{
struct timespec t;
clock_gettime(CLOCK_REALTIME, &t);
return (uint64_t)t.tv_sec * 1000000000ULL + t.tv_nsec;
}
static void fill(uint8_t *buf, int n)
{
int i;
for (i = 0; i < n; i++)
buf[i] = rand();
}
static void createfiles(int num_files)
{
uint8_t buf[FILE_SIZE];
int fd;
int i;
fill(buf, sizeof(buf));
for (i = 0; i < num_files; i++) {
snprintf(File[i].name, MAX_NAME, "f%d", i);
fd = creat(File[i].name, 0600);
if (write(fd, buf, sizeof(buf)) == -1)
perror("write");
close(fd);
}
}
static void unlinkfiles(int num_files)
{
int i;
for (i = 0; i < num_files; i++)
unlink(File[i].name);
}
static void readfiles(int num_files)
{
uint8_t buf[BYTES_TO_READ];
int fd;
int i;
for (i = 0; i < num_files; i++) {
fd = open(File[i].name, O_RDONLY);
if (read(fd, buf, BYTES_TO_READ) == -1)
perror("read");
close(fd);
}
}
static void time_sync(const char *label, int n)
{
uint64_t start;
uint64_t finish;
start = nsecs();
sync();
finish = nsecs();
printf("%10s %8d. %10.2f ms\n",
label, n, (double)(finish - start)/1000000.0);
}
void crsyncdel(int n)
{
createfiles(n);
time_sync("create", n);
readfiles(n);
time_sync("read", n);
unlinkfiles(n);
time_sync("unlink", n);
}
static void cleanup(const char *dir)
{
char cmd[1024];
if (chdir("..") == -1)
perror(dir);
snprintf(cmd, sizeof(cmd), "rm -fr %s", dir);
if (system(cmd) == -1)
perror(cmd);
}
static void setup(const char *dir)
{
mkdir(dir, 0700);
if (chdir(dir) == -1)
perror(dir);
sync();
File = malloc(sizeof(*File) * NUM_FILES);
}
int main(int argc, char *argv[])
{
char *dir = "synctestdir";
int i;
setup(dir);
/*
* Number of files grows by powers of two until the
* specified number of files is reached.
* Start with a large enough number to skip noise.
*/
for (i = NUM_START_FILES; i <= NUM_FILES; i <<= 1)
crsyncdel(i);
cleanup(dir);
return 0;
}
[-- Attachment #3: syncperf.results --]
[-- Type: application/octet-stream, Size: 6043 bytes --]
Tests were run on a ext4 file system with relatime.
/dev/sda1 on /home type ext4 (rw,nodev,relatime,commit=600,data=ordered)
3.8 Without Fix:
Linux localhost 3.8.11 #6 SMP Tue Jul 9 08:42:43 PDT 2013 x86_64 Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz GenuineIntel GNU/Linux
# syncperf
create 512. 219.22 ms
read 512. 106.21 ms
unlink 512. 5.04 ms
create 1024. 755.02 ms
read 1024. 5.91 ms
unlink 1024. 9.72 ms
create 2048. 630.38 ms
read 2048. 23.53 ms
unlink 2048. 19.78 ms
create 4096. 1182.52 ms
read 4096. 22.11 ms
unlink 4096. 46.82 ms
create 8192. 2471.17 ms
read 8192. 16401.02 ms
unlink 8192. 87.54 ms
create 16384. 3961.91 ms
read 16384. 254.73 ms
unlink 16384. 364.20 ms
create 32768. 8282.45 ms
read 32768. 604.17 ms
unlink 32768. 533.13 ms
create 65536. 14259.07 ms
read 65536. 132600.38 ms
unlink 65536. 1210.62 ms
create 131072. 26981.96 ms
read 131072. 267947.54 ms
unlink 131072. 918.86 ms
create 262144. 29110.07 ms
read 262144. 2029.31 ms
unlink 262144. 2017.16 ms
3.8 With Fix:
Linux localhost 3.8.11 #7 SMP Tue Jul 9 13:39:28 PDT 2013 x86_64 Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz GenuineIntel GNU/Linux
# syncperf
create 512. 171.74 ms
read 512. 25.04 ms
unlink 512. 4.74 ms
create 1024. 1129.57 ms
read 1024. 26.94 ms
unlink 1024. 14.87 ms
create 2048. 1477.42 ms
read 2048. 17.44 ms
unlink 2048. 23.20 ms
create 4096. 1199.35 ms
read 4096. 31.53 ms
unlink 4096. 32.06 ms
create 8192. 1687.70 ms
read 8192. 334.43 ms
unlink 8192. 123.46 ms
create 16384. 2930.78 ms
read 16384. 196.62 ms
unlink 16384. 284.63 ms
create 32768. 6601.72 ms
read 32768. 753.83 ms
unlink 32768. 993.44 ms
create 65536. 13026.11 ms
read 65536. 1822.98 ms
unlink 65536. 1498.88 ms
create 131072. 20096.10 ms
read 131072. 2231.34 ms
unlink 131072. 1323.37 ms
create 262144. 19501.73 ms
read 262144. 1729.51 ms
unlink 262144. 3104.59 ms
3.20 2 Without Fix: Illustrates variations in runs
Linux (none) 3.10.0 #1 SMP Wed Jul 10 11:20:13 PDT 2013 x86_64 Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz GenuineIntel GNU/Linux
# syncperf
create 512. 666.80 ms
read 512. 13.79 ms
unlink 512. 6.16 ms
create 1024. 524.69 ms
read 1024. 140.60 ms
unlink 1024. 35.08 ms
create 2048. 952.53 ms
read 2048. 4461.41 ms
unlink 2048. 21.58 ms
create 4096. 1327.72 ms
read 4096. 36.48 ms
unlink 4096. 29.55 ms
create 8192. 2728.06 ms
read 8192. 17422.08 ms
unlink 8192. 161.72 ms
create 16384. 3199.72 ms
read 16384. 33070.45 ms
unlink 16384. 203.76 ms
create 32768. 7995.47 ms
read 32768. 916.14 ms
unlink 32768. 783.23 ms
create 65536. 16543.12 ms
read 65536. 134519.28 ms
unlink 65536. 893.32 ms
create 131072. 22530.54 ms
read 131072. 2064.91 ms
unlink 131072. 1166.01 ms
create 262144. 25125.93 ms
read 262144. 3056.00 ms
unlink 262144. 2146.45 ms
# syncperf
create 512. 140.03 ms
read 512. 5.15 ms
unlink 512. 4.94 ms
create 1024. 558.78 ms
read 1024. 9.88 ms
unlink 1024. 14.15 ms
create 2048. 947.10 ms
read 2048. 4125.67 ms
unlink 2048. 21.66 ms
create 4096. 1050.69 ms
read 4096. 8254.83 ms
unlink 4096. 39.03 ms
create 8192. 1398.92 ms
read 8192. 382.07 ms
unlink 8192. 85.24 ms
create 16384. 4029.65 ms
read 16384. 32688.80 ms
unlink 16384. 168.32 ms
create 32768. 8975.05 ms
read 32768. 70167.40 ms
unlink 32768. 187.05 ms
create 65536. 10008.97 ms
read 65536. 135732.57 ms
unlink 65536. 905.02 ms
create 131072. 18777.25 ms
read 131072. 268809.67 ms
unlink 131072. 854.17 ms
create 262144. 16970.38 ms
read 262144. 1918.93 ms
unlink 262144. 1947.60 ms
3.10 With Fix
Linux (none) 3.10.0+ #1 SMP Wed Jul 10 12:07:52 PDT 2013 x86_64 Intel(R) Core(TM) i5-3427U CPU @ 1.80GHz GenuineIntel GNU/Linux
# syncperf
create 512. 197.64 ms
read 512. 7.91 ms
unlink 512. 15.54 ms
create 1024. 663.44 ms
read 1024. 23.25 ms
unlink 1024. 17.23 ms
create 2048. 919.58 ms
read 2048. 34.34 ms
unlink 2048. 16.76 ms
create 4096. 961.25 ms
read 4096. 86.09 ms
unlink 4096. 327.38 ms
create 8192. 1920.35 ms
read 8192. 278.77 ms
unlink 8192. 53.51 ms
create 16384. 4506.14 ms
read 16384. 99.09 ms
unlink 16384. 518.45 ms
create 32768. 8663.97 ms
read 32768. 642.77 ms
unlink 32768. 620.53 ms
create 65536. 13804.20 ms
read 65536. 1479.14 ms
unlink 65536. 1535.65 ms
create 131072. 22981.74 ms
read 131072. 1702.12 ms
unlink 131072. 895.26 ms
create 262144. 21771.39 ms
read 262144. 3048.98 ms
unlink 262144. 2157.94 ms
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-10 23:12 [PATCH] fs: sync: fixed performance regression Paul Taysom
2013-07-10 23:45 ` Paul Taysom
@ 2013-07-10 23:56 ` Dave Jones
2013-07-11 0:42 ` Paul Taysom
2013-07-11 2:00 ` Dave Chinner
2013-07-11 10:53 ` Jan Kara
3 siblings, 1 reply; 12+ messages in thread
From: Dave Jones @ 2013-07-10 23:56 UTC (permalink / raw)
To: Paul Taysom; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, jack, sonnyrao
On Wed, Jul 10, 2013 at 04:12:36PM -0700, Paul Taysom wrote:
> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.
>
> In response to crbug.com/240422
Just in case this ends up as the actual commit msg, that url
seems to be the wrong bug.
Dave.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-10 23:56 ` Dave Jones
@ 2013-07-11 0:42 ` Paul Taysom
2013-07-11 0:45 ` Paul Taysom
0 siblings, 1 reply; 12+ messages in thread
From: Paul Taysom @ 2013-07-11 0:42 UTC (permalink / raw)
To: Dave Jones, Paul Taysom, Alexander Viro, linux-fsdevel,
linux-kernel, jack, sonnyrao
On Wed, Jul 10, 2013 at 4:56 PM, Dave Jones <davej@redhat.com> wrote:
> On Wed, Jul 10, 2013 at 04:12:36PM -0700, Paul Taysom wrote:
>
> > Note, when running the test, the slow down doesn't always happen
> > but most of the tests will show a slow down.
> >
> > In response to crbug.com/240422
>
> Just in case this ends up as the actual commit msg, that url
> seems to be the wrong bug.
>
> Dave.
Copied the wrong number crbug.con/218910
Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-11 0:42 ` Paul Taysom
@ 2013-07-11 0:45 ` Paul Taysom
0 siblings, 0 replies; 12+ messages in thread
From: Paul Taysom @ 2013-07-11 0:45 UTC (permalink / raw)
To: Dave Jones, Paul Taysom, Alexander Viro, linux-fsdevel,
linux-kernel, jack, sonnyrao
Third time is the charm crbug.com/240411
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-10 23:12 [PATCH] fs: sync: fixed performance regression Paul Taysom
2013-07-10 23:45 ` Paul Taysom
2013-07-10 23:56 ` Dave Jones
@ 2013-07-11 2:00 ` Dave Chinner
2013-07-11 10:53 ` Jan Kara
3 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2013-07-11 2:00 UTC (permalink / raw)
To: Paul Taysom; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, jack, sonnyrao
On Wed, Jul 10, 2013 at 04:12:36PM -0700, Paul Taysom wrote:
> The following commit introduced a 10x regression for
> syncing inodes in ext4 with relatime enabled where just
> the atime had been modified.
>
> commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> Author: Jan Kara <jack@suse.cz>
> Date: Tue Jul 3 16:45:34 2012 +0200
> vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>
> See also: http://www.kernelhub.org/?msg=93100&p=2
>
> Fixed by putting back in the call to writeback_inodes_sb.
>
> I'll attach the test in a reply to this e-mail.
>
> The test starts by creating 512 files, syncing, reading one byte
> from each of those files, syncing, and then deleting each file
> and syncing. The time to do each sync is printed. The process
> is then repeated for 1024 files and then the next power of
> two up to 262144 files.
>
> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.
Can you please check if the patch attached to this mail:
http://marc.info/?l=linux-kernel&m=137276874025813&w=2
Fixes your problem?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-10 23:12 [PATCH] fs: sync: fixed performance regression Paul Taysom
` (2 preceding siblings ...)
2013-07-11 2:00 ` Dave Chinner
@ 2013-07-11 10:53 ` Jan Kara
2013-07-11 11:58 ` Jan Kara
3 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2013-07-11 10:53 UTC (permalink / raw)
To: Paul Taysom; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, jack, sonnyrao
On Wed 10-07-13 16:12:36, Paul Taysom wrote:
> The following commit introduced a 10x regression for
> syncing inodes in ext4 with relatime enabled where just
> the atime had been modified.
>
> commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> Author: Jan Kara <jack@suse.cz>
> Date: Tue Jul 3 16:45:34 2012 +0200
> vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>
> See also: http://www.kernelhub.org/?msg=93100&p=2
>
> Fixed by putting back in the call to writeback_inodes_sb.
>
> I'll attach the test in a reply to this e-mail.
>
> The test starts by creating 512 files, syncing, reading one byte
> from each of those files, syncing, and then deleting each file
> and syncing. The time to do each sync is printed. The process
> is then repeated for 1024 files and then the next power of
> two up to 262144 files.
>
> Note, when running the test, the slow down doesn't always happen
> but most of the tests will show a slow down.
>
> In response to crbug.com/240422
>
> Signed-off-by: Paul Taysom <taysom@chromium.org>
Thanks for report. Rather than blindly reverting the change, I'd like to
understand why you see so huge regression. As the changelog in the patch
says, flusher thread should be doing async writeback equivalent to the
removed one because it gets woken via wakeup_flusher_threads(). But my
guess is that for some reason we end up doing all the writeback from
sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
Honza
> ---
> fs/sync.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/sync.c b/fs/sync.c
> index 905f3f6..55c3316 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
> sync_inodes_sb(sb);
> }
>
> +static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
> +{
> + if (!(sb->s_flags & MS_RDONLY))
> + writeback_inodes_sb(sb, WB_REASON_SYNC);
> +}
> +
> static void sync_fs_one_sb(struct super_block *sb, void *arg)
> {
> if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
> @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
> int nowait = 0, wait = 1;
>
> wakeup_flusher_threads(0, WB_REASON_SYNC);
> + iterate_supers(writeback_inodes_one_sb, NULL);
> iterate_supers(sync_inodes_one_sb, NULL);
> iterate_supers(sync_fs_one_sb, &nowait);
> iterate_supers(sync_fs_one_sb, &wait);
> --
> 1.8.3
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-11 10:53 ` Jan Kara
@ 2013-07-11 11:58 ` Jan Kara
2013-07-11 21:42 ` Paul Taysom
2013-07-12 15:43 ` Jan Kara
0 siblings, 2 replies; 12+ messages in thread
From: Jan Kara @ 2013-07-11 11:58 UTC (permalink / raw)
To: Paul Taysom; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, jack, sonnyrao
On Thu 11-07-13 12:53:46, Jan Kara wrote:
> On Wed 10-07-13 16:12:36, Paul Taysom wrote:
> > The following commit introduced a 10x regression for
> > syncing inodes in ext4 with relatime enabled where just
> > the atime had been modified.
> >
> > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> > Author: Jan Kara <jack@suse.cz>
> > Date: Tue Jul 3 16:45:34 2012 +0200
> > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
> >
> > See also: http://www.kernelhub.org/?msg=93100&p=2
> >
> > Fixed by putting back in the call to writeback_inodes_sb.
> >
> > I'll attach the test in a reply to this e-mail.
> >
> > The test starts by creating 512 files, syncing, reading one byte
> > from each of those files, syncing, and then deleting each file
> > and syncing. The time to do each sync is printed. The process
> > is then repeated for 1024 files and then the next power of
> > two up to 262144 files.
> >
> > Note, when running the test, the slow down doesn't always happen
> > but most of the tests will show a slow down.
> >
> > In response to crbug.com/240422
> >
> > Signed-off-by: Paul Taysom <taysom@chromium.org>
> Thanks for report. Rather than blindly reverting the change, I'd like to
> understand why you see so huge regression. As the changelog in the patch
> says, flusher thread should be doing async writeback equivalent to the
> removed one because it gets woken via wakeup_flusher_threads(). But my
> guess is that for some reason we end up doing all the writeback from
> sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
Hum, so it must be something timing sensitive. I wasn't able to reproduce
the issue on my test machine in 4 runs of your test program. I was able to
reproduce it on my laptop on every second run of the test program but once
I've enabled some tracepoints, the issue disappeared and I didn't see it in
about 10 runs.
That being said I think that reverting my patch is just papering over the
problem. We will do the async pass over inodes twice instead of once
and thus the timing changes enough that you aren't able to observe the
problem.
I'm looking into this more...
Honza
> > ---
> > fs/sync.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/fs/sync.c b/fs/sync.c
> > index 905f3f6..55c3316 100644
> > --- a/fs/sync.c
> > +++ b/fs/sync.c
> > @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
> > sync_inodes_sb(sb);
> > }
> >
> > +static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
> > +{
> > + if (!(sb->s_flags & MS_RDONLY))
> > + writeback_inodes_sb(sb, WB_REASON_SYNC);
> > +}
> > +
> > static void sync_fs_one_sb(struct super_block *sb, void *arg)
> > {
> > if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
> > @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
> > int nowait = 0, wait = 1;
> >
> > wakeup_flusher_threads(0, WB_REASON_SYNC);
> > + iterate_supers(writeback_inodes_one_sb, NULL);
> > iterate_supers(sync_inodes_one_sb, NULL);
> > iterate_supers(sync_fs_one_sb, &nowait);
> > iterate_supers(sync_fs_one_sb, &wait);
> > --
> > 1.8.3
> >
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-11 11:58 ` Jan Kara
@ 2013-07-11 21:42 ` Paul Taysom
2013-07-12 15:43 ` Jan Kara
1 sibling, 0 replies; 12+ messages in thread
From: Paul Taysom @ 2013-07-11 21:42 UTC (permalink / raw)
To: Jan Kara; +Cc: Paul Taysom, Alexander Viro, linux-fsdevel, linux-kernel,
sonnyrao
On Thu, Jul 11, 2013 at 4:58 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 11-07-13 12:53:46, Jan Kara wrote:
>> On Wed 10-07-13 16:12:36, Paul Taysom wrote:
>> > The following commit introduced a 10x regression for
>> > syncing inodes in ext4 with relatime enabled where just
>> > the atime had been modified.
>> >
>> > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
>> > Author: Jan Kara <jack@suse.cz>
>> > Date: Tue Jul 3 16:45:34 2012 +0200
>> > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>> >
>> > See also: http://www.kernelhub.org/?msg=93100&p=2
>> >
>> > Fixed by putting back in the call to writeback_inodes_sb.
>> >
>> > I'll attach the test in a reply to this e-mail.
>> >
>> > The test starts by creating 512 files, syncing, reading one byte
>> > from each of those files, syncing, and then deleting each file
>> > and syncing. The time to do each sync is printed. The process
>> > is then repeated for 1024 files and then the next power of
>> > two up to 262144 files.
>> >
>> > Note, when running the test, the slow down doesn't always happen
>> > but most of the tests will show a slow down.
>> >
>> > In response to crbug.com/240422
>> >
>> > Signed-off-by: Paul Taysom <taysom@chromium.org>
>> Thanks for report. Rather than blindly reverting the change, I'd like to
>> understand why you see so huge regression. As the changelog in the patch
>> says, flusher thread should be doing async writeback equivalent to the
>> removed one because it gets woken via wakeup_flusher_threads(). But my
>> guess is that for some reason we end up doing all the writeback from
>> sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
> Hum, so it must be something timing sensitive. I wasn't able to reproduce
> the issue on my test machine in 4 runs of your test program. I was able to
> reproduce it on my laptop on every second run of the test program but once
> I've enabled some tracepoints, the issue disappeared and I didn't see it in
> about 10 runs.
>
> That being said I think that reverting my patch is just papering over the
> problem. We will do the async pass over inodes twice instead of once
> and thus the timing changes enough that you aren't able to observe the
> problem.
>
> I'm looking into this more...
>
> Honza
>> > ---
>> > fs/sync.c | 7 +++++++
>> > 1 file changed, 7 insertions(+)
>> >
>> > diff --git a/fs/sync.c b/fs/sync.c
>> > index 905f3f6..55c3316 100644
>> > --- a/fs/sync.c
>> > +++ b/fs/sync.c
>> > @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg)
>> > sync_inodes_sb(sb);
>> > }
>> >
>> > +static void writeback_inodes_one_sb(struct super_block *sb, void *arg)
>> > +{
>> > + if (!(sb->s_flags & MS_RDONLY))
>> > + writeback_inodes_sb(sb, WB_REASON_SYNC);
>> > +}
>> > +
>> > static void sync_fs_one_sb(struct super_block *sb, void *arg)
>> > {
>> > if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs)
>> > @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync)
>> > int nowait = 0, wait = 1;
>> >
>> > wakeup_flusher_threads(0, WB_REASON_SYNC);
>> > + iterate_supers(writeback_inodes_one_sb, NULL);
>> > iterate_supers(sync_inodes_one_sb, NULL);
>> > iterate_supers(sync_fs_one_sb, &nowait);
>> > iterate_supers(sync_fs_one_sb, &wait);
>> > --
>> > 1.8.3
>> >
>> --
>> Jan Kara <jack@suse.cz>
>> SUSE Labs, CR
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
I've tried Dave Chinner's patch but it doesn't seem to help.
Looking at the references to WB_SYNC_NONE flag I found the interesting
comment in fs/ext4/inodes.c write_cache_pages_da:
...
lock_page(page);
/*
* If the page is no longer dirty, or its
* mapping no longer corresponds to inode we
* are writing (which means it has been
* truncated or invalidated), or the page is
* already under writeback and we are not
* doing a data integrity writeback, skip the page
*/
if (!PageDirty(page) ||
(PageWriteback(page) &&
(wbc->sync_mode == WB_SYNC_NONE)) ||
unlikely(page->mapping != mapping)) {
unlock_page(page);
continue;
}
wait_on_page_writeback(page);
BUG_ON(PageWriteback(page));
...
I'm wondering if one inode is getting written out then the next inode
in the same page waits for the writeback to finish.
writeback_inodes_sb_nt sets the sync_mode to WB_SYNC_NONE
sync_inodes_sb set the sync_mode to WB_SYNC_ALL.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-11 11:58 ` Jan Kara
2013-07-11 21:42 ` Paul Taysom
@ 2013-07-12 15:43 ` Jan Kara
2013-07-12 16:59 ` Paul Taysom
1 sibling, 1 reply; 12+ messages in thread
From: Jan Kara @ 2013-07-12 15:43 UTC (permalink / raw)
To: Paul Taysom; +Cc: Alexander Viro, linux-fsdevel, linux-kernel, jack, sonnyrao
[-- Attachment #1: Type: text/plain, Size: 2730 bytes --]
On Thu 11-07-13 13:58:32, Jan Kara wrote:
> On Thu 11-07-13 12:53:46, Jan Kara wrote:
> > On Wed 10-07-13 16:12:36, Paul Taysom wrote:
> > > The following commit introduced a 10x regression for
> > > syncing inodes in ext4 with relatime enabled where just
> > > the atime had been modified.
> > >
> > > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> > > Author: Jan Kara <jack@suse.cz>
> > > Date: Tue Jul 3 16:45:34 2012 +0200
> > > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
> > >
> > > See also: http://www.kernelhub.org/?msg=93100&p=2
> > >
> > > Fixed by putting back in the call to writeback_inodes_sb.
> > >
> > > I'll attach the test in a reply to this e-mail.
> > >
> > > The test starts by creating 512 files, syncing, reading one byte
> > > from each of those files, syncing, and then deleting each file
> > > and syncing. The time to do each sync is printed. The process
> > > is then repeated for 1024 files and then the next power of
> > > two up to 262144 files.
> > >
> > > Note, when running the test, the slow down doesn't always happen
> > > but most of the tests will show a slow down.
> > >
> > > In response to crbug.com/240422
> > >
> > > Signed-off-by: Paul Taysom <taysom@chromium.org>
> > Thanks for report. Rather than blindly reverting the change, I'd like to
> > understand why you see so huge regression. As the changelog in the patch
> > says, flusher thread should be doing async writeback equivalent to the
> > removed one because it gets woken via wakeup_flusher_threads(). But my
> > guess is that for some reason we end up doing all the writeback from
> > sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
> Hum, so it must be something timing sensitive. I wasn't able to reproduce
> the issue on my test machine in 4 runs of your test program. I was able to
> reproduce it on my laptop on every second run of the test program but once
> I've enabled some tracepoints, the issue disappeared and I didn't see it in
> about 10 runs.
>
> That being said I think that reverting my patch is just papering over the
> problem. We will do the async pass over inodes twice instead of once
> and thus the timing changes enough that you aren't able to observe the
> problem.
>
> I'm looking into this more...
So I finally understood what's going on. If the system has no dirty pages
at all wakeup_flusher_threads() will submit work with nr_pages == 0. So
wb_writeback() will bail out immediately without doing anything and all the
writeback is left for WB_SYNC_ALL pass of sync(1) which is slow. Attached
patch fixes the problem for me.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
[-- Attachment #2: 0001-writeback-Fix-occasional-slow-sync-1.patch --]
[-- Type: text/x-patch, Size: 1367 bytes --]
>From 2e3d6f21ffa990780e9b25e11be31a6e0da13c79 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Fri, 12 Jul 2013 17:30:07 +0200
Subject: [PATCH] writeback: Fix occasional slow sync(1)
In case when system contains no dirty pages, wakeup_flusher_threads()
will submit WB_SYNC_NONE writeback for 0 pages so wb_writeback() exits
immediately without doing anything. Thus sync(1) will write all the
dirty inodes from a WB_SYNC_ALL writeback pass which is slow.
Fix the problem by using get_nr_dirty_pages() in
wakeup_flusher_threads() instead of calculating number of dirty pages
manually. That function also takes number of dirty inodes into account.
CC: stable@vger.kernel.org
Reported-by: Paul Taysom <taysom@chromium.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/fs-writeback.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index a85ac4e..d0d70a8 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1055,10 +1055,8 @@ void wakeup_flusher_threads(long nr_pages, enum wb_reason reason)
{
struct backing_dev_info *bdi;
- if (!nr_pages) {
- nr_pages = global_page_state(NR_FILE_DIRTY) +
- global_page_state(NR_UNSTABLE_NFS);
- }
+ if (!nr_pages)
+ nr_pages = get_nr_dirty_pages();
rcu_read_lock();
list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
--
1.8.1.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-12 15:43 ` Jan Kara
@ 2013-07-12 16:59 ` Paul Taysom
2013-07-15 9:46 ` Jan Kara
0 siblings, 1 reply; 12+ messages in thread
From: Paul Taysom @ 2013-07-12 16:59 UTC (permalink / raw)
To: Jan Kara; +Cc: Paul Taysom, Alexander Viro, linux-fsdevel, linux-kernel,
sonnyrao
`On Fri, Jul 12, 2013 at 8:43 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 11-07-13 13:58:32, Jan Kara wrote:
>> On Thu 11-07-13 12:53:46, Jan Kara wrote:
>> > On Wed 10-07-13 16:12:36, Paul Taysom wrote:
>> > > The following commit introduced a 10x regression for
>> > > syncing inodes in ext4 with relatime enabled where just
>> > > the atime had been modified.
>> > >
>> > > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
>> > > Author: Jan Kara <jack@suse.cz>
>> > > Date: Tue Jul 3 16:45:34 2012 +0200
>> > > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
>> > >
>> > > See also: http://www.kernelhub.org/?msg=93100&p=2
>> > >
>> > > Fixed by putting back in the call to writeback_inodes_sb.
>> > >
>> > > I'll attach the test in a reply to this e-mail.
>> > >
>> > > The test starts by creating 512 files, syncing, reading one byte
>> > > from each of those files, syncing, and then deleting each file
>> > > and syncing. The time to do each sync is printed. The process
>> > > is then repeated for 1024 files and then the next power of
>> > > two up to 262144 files.
>> > >
>> > > Note, when running the test, the slow down doesn't always happen
>> > > but most of the tests will show a slow down.
>> > >
>> > > In response to crbug.com/240422
>> > >
>> > > Signed-off-by: Paul Taysom <taysom@chromium.org>
>> > Thanks for report. Rather than blindly reverting the change, I'd like to
>> > understand why you see so huge regression. As the changelog in the patch
>> > says, flusher thread should be doing async writeback equivalent to the
>> > removed one because it gets woken via wakeup_flusher_threads(). But my
>> > guess is that for some reason we end up doing all the writeback from
>> > sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
>> Hum, so it must be something timing sensitive. I wasn't able to reproduce
>> the issue on my test machine in 4 runs of your test program. I was able to
>> reproduce it on my laptop on every second run of the test program but once
>> I've enabled some tracepoints, the issue disappeared and I didn't see it in
>> about 10 runs.
>>
>> That being said I think that reverting my patch is just papering over the
>> problem. We will do the async pass over inodes twice instead of once
>> and thus the timing changes enough that you aren't able to observe the
>> problem.
>>
>> I'm looking into this more...
> So I finally understood what's going on. If the system has no dirty pages
> at all wakeup_flusher_threads() will submit work with nr_pages == 0. So
> wb_writeback() will bail out immediately without doing anything and all the
> writeback is left for WB_SYNC_ALL pass of sync(1) which is slow. Attached
> patch fixes the problem for me.
>
> Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
Jan,
Your fix is a clear win! Not only did it fix the sync after read
problem but it made the sync after create faster too.
Thanks,
Paul
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] fs: sync: fixed performance regression
2013-07-12 16:59 ` Paul Taysom
@ 2013-07-15 9:46 ` Jan Kara
0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2013-07-15 9:46 UTC (permalink / raw)
To: Paul Taysom
Cc: Jan Kara, Paul Taysom, Alexander Viro, linux-fsdevel,
linux-kernel, sonnyrao
On Fri 12-07-13 09:59:00, Paul Taysom wrote:
> `On Fri, Jul 12, 2013 at 8:43 AM, Jan Kara <jack@suse.cz> wrote:
> > On Thu 11-07-13 13:58:32, Jan Kara wrote:
> >> On Thu 11-07-13 12:53:46, Jan Kara wrote:
> >> > On Wed 10-07-13 16:12:36, Paul Taysom wrote:
> >> > > The following commit introduced a 10x regression for
> >> > > syncing inodes in ext4 with relatime enabled where just
> >> > > the atime had been modified.
> >> > >
> >> > > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed
> >> > > Author: Jan Kara <jack@suse.cz>
> >> > > Date: Tue Jul 3 16:45:34 2012 +0200
> >> > > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
> >> > >
> >> > > See also: http://www.kernelhub.org/?msg=93100&p=2
> >> > >
> >> > > Fixed by putting back in the call to writeback_inodes_sb.
> >> > >
> >> > > I'll attach the test in a reply to this e-mail.
> >> > >
> >> > > The test starts by creating 512 files, syncing, reading one byte
> >> > > from each of those files, syncing, and then deleting each file
> >> > > and syncing. The time to do each sync is printed. The process
> >> > > is then repeated for 1024 files and then the next power of
> >> > > two up to 262144 files.
> >> > >
> >> > > Note, when running the test, the slow down doesn't always happen
> >> > > but most of the tests will show a slow down.
> >> > >
> >> > > In response to crbug.com/240422
> >> > >
> >> > > Signed-off-by: Paul Taysom <taysom@chromium.org>
> >> > Thanks for report. Rather than blindly reverting the change, I'd like to
> >> > understand why you see so huge regression. As the changelog in the patch
> >> > says, flusher thread should be doing async writeback equivalent to the
> >> > removed one because it gets woken via wakeup_flusher_threads(). But my
> >> > guess is that for some reason we end up doing all the writeback from
> >> > sync_inodes_one_sb(). I'll try to reproduce your results and investigate...
> >> Hum, so it must be something timing sensitive. I wasn't able to reproduce
> >> the issue on my test machine in 4 runs of your test program. I was able to
> >> reproduce it on my laptop on every second run of the test program but once
> >> I've enabled some tracepoints, the issue disappeared and I didn't see it in
> >> about 10 runs.
> >>
> >> That being said I think that reverting my patch is just papering over the
> >> problem. We will do the async pass over inodes twice instead of once
> >> and thus the timing changes enough that you aren't able to observe the
> >> problem.
> >>
> >> I'm looking into this more...
> > So I finally understood what's going on. If the system has no dirty pages
> > at all wakeup_flusher_threads() will submit work with nr_pages == 0. So
> > wb_writeback() will bail out immediately without doing anything and all the
> > writeback is left for WB_SYNC_ALL pass of sync(1) which is slow. Attached
> > patch fixes the problem for me.
> >
> > Honza
> > --
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
>
> Jan,
> Your fix is a clear win! Not only did it fix the sync after read
> problem but it made the sync after create faster too.
Thanks for testing! I've sent the patch to Al for inclusion.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2013-07-15 9:46 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-10 23:12 [PATCH] fs: sync: fixed performance regression Paul Taysom
2013-07-10 23:45 ` Paul Taysom
2013-07-10 23:56 ` Dave Jones
2013-07-11 0:42 ` Paul Taysom
2013-07-11 0:45 ` Paul Taysom
2013-07-11 2:00 ` Dave Chinner
2013-07-11 10:53 ` Jan Kara
2013-07-11 11:58 ` Jan Kara
2013-07-11 21:42 ` Paul Taysom
2013-07-12 15:43 ` Jan Kara
2013-07-12 16:59 ` Paul Taysom
2013-07-15 9:46 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).