* Re: [PATCH] ext4: Fix performance regression in writeback of random writes [not found] <1378842006-15237-1-git-send-email-jack@suse.cz> @ 2013-09-11 9:45 ` Bernd Schubert 2013-09-11 10:11 ` Jan Kara 0 siblings, 1 reply; 3+ messages in thread From: Bernd Schubert @ 2013-09-11 9:45 UTC (permalink / raw) To: Jan Kara; +Cc: Ted Tso, linux-ext4, Yan Zheng, linux-fsdevel@vger.kernel.org On 09/10/2013 09:40 PM, Jan Kara wrote: > Linux Kernel Performance project guys have reported that commit 4e7ea81db5 > introduces a performance regression for the following fio workload: > [global] > direct=0 > ioengine=mmap > size=1500M > bs=4k > pre_read=1 > numjobs=1 > overwrite=1 > loops=5 > runtime=300 > group_reporting > invalidate=0 > directory=/mnt/ > file_service_type=random:36 > file_service_type=random:36 > > [job0] > startdelay=0 > rw=randrw > filename=data0/f1:data0/f2 > > [job1] > startdelay=0 > rw=randrw > filename=data0/f2:data0/f1 > ... > > [job7] > startdelay=0 > rw=randrw > filename=data0/f2:data0/f1 > > The culprit of the problem is that after the commit ext4_writepages() > are more aggressive in writing back pages. Thus we have less consecutive > dirty pages resulting in more seeking. > > This increased aggressivity is caused by a bug in the condition > terminating ext4_writepages(). We start writing from the beginning of > the file even if we should have terminated ext4_writepages() because > wbc->nr_to_write <= 0. > > After fixing the condition the throughput of the fio workload is about 20% > better than before writeback reorganization. > > Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com> > Signed-off-by: Jan Kara <jack@suse.cz> > --- > fs/ext4/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index c79fd7d..7914c05 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -2563,7 +2563,7 @@ retry: > break; > } > blk_finish_plug(&plug); > - if (!ret && !cycled) { > + if (!ret && !cycled && wbc->nr_to_write > 0) { > cycled = 1; > mpd.last_page = writeback_index - 1; > mpd.first_page = 0; > Interesting, doesn't that mean generic_writepages (sub-sequent write_cache_pages() ) and all other file systems implementing their own ->writepages() should be updated? Thanks, Bernd ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] ext4: Fix performance regression in writeback of random writes 2013-09-11 9:45 ` [PATCH] ext4: Fix performance regression in writeback of random writes Bernd Schubert @ 2013-09-11 10:11 ` Jan Kara 2013-09-11 11:13 ` Bernd Schubert 0 siblings, 1 reply; 3+ messages in thread From: Jan Kara @ 2013-09-11 10:11 UTC (permalink / raw) To: Bernd Schubert Cc: Jan Kara, Ted Tso, linux-ext4, Yan Zheng, linux-fsdevel@vger.kernel.org On Wed 11-09-13 11:45:03, Bernd Schubert wrote: > On 09/10/2013 09:40 PM, Jan Kara wrote: > >Linux Kernel Performance project guys have reported that commit 4e7ea81db5 > >introduces a performance regression for the following fio workload: > >[global] > >direct=0 > >ioengine=mmap > >size=1500M > >bs=4k > >pre_read=1 > >numjobs=1 > >overwrite=1 > >loops=5 > >runtime=300 > >group_reporting > >invalidate=0 > >directory=/mnt/ > >file_service_type=random:36 > >file_service_type=random:36 > > > >[job0] > >startdelay=0 > >rw=randrw > >filename=data0/f1:data0/f2 > > > >[job1] > >startdelay=0 > >rw=randrw > >filename=data0/f2:data0/f1 > >... > > > >[job7] > >startdelay=0 > >rw=randrw > >filename=data0/f2:data0/f1 > > > >The culprit of the problem is that after the commit ext4_writepages() > >are more aggressive in writing back pages. Thus we have less consecutive > >dirty pages resulting in more seeking. > > > >This increased aggressivity is caused by a bug in the condition > >terminating ext4_writepages(). We start writing from the beginning of > >the file even if we should have terminated ext4_writepages() because > >wbc->nr_to_write <= 0. > > > >After fixing the condition the throughput of the fio workload is about 20% > >better than before writeback reorganization. > > > >Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com> > >Signed-off-by: Jan Kara <jack@suse.cz> > >--- > > fs/ext4/inode.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > >diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > >index c79fd7d..7914c05 100644 > >--- a/fs/ext4/inode.c > >+++ b/fs/ext4/inode.c > >@@ -2563,7 +2563,7 @@ retry: > > break; > > } > > blk_finish_plug(&plug); > >- if (!ret && !cycled) { > >+ if (!ret && !cycled && wbc->nr_to_write > 0) { > > cycled = 1; > > mpd.last_page = writeback_index - 1; > > mpd.first_page = 0; > > > > Interesting, doesn't that mean generic_writepages (sub-sequent > write_cache_pages() ) and all other file systems implementing their > own ->writepages() should be updated? No. write_cache_pages() has the condition like: if (!cycled && !done) { and 'done' is set when wbc->nr_to_write drops to zero. So that function is OK. We cannot use 'done' in ext4_writepages() because the functions are structured a bit differently and 'done' gets set also when reach end of file. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] ext4: Fix performance regression in writeback of random writes 2013-09-11 10:11 ` Jan Kara @ 2013-09-11 11:13 ` Bernd Schubert 0 siblings, 0 replies; 3+ messages in thread From: Bernd Schubert @ 2013-09-11 11:13 UTC (permalink / raw) To: Jan Kara; +Cc: Ted Tso, linux-ext4, Yan Zheng, linux-fsdevel@vger.kernel.org On 09/11/2013 12:11 PM, Jan Kara wrote: > On Wed 11-09-13 11:45:03, Bernd Schubert wrote: >> On 09/10/2013 09:40 PM, Jan Kara wrote: >>> Linux Kernel Performance project guys have reported that commit 4e7ea81db5 >>> introduces a performance regression for the following fio workload: >>> [global] >>> direct=0 >>> ioengine=mmap >>> size=1500M >>> bs=4k >>> pre_read=1 >>> numjobs=1 >>> overwrite=1 >>> loops=5 >>> runtime=300 >>> group_reporting >>> invalidate=0 >>> directory=/mnt/ >>> file_service_type=random:36 >>> file_service_type=random:36 >>> >>> [job0] >>> startdelay=0 >>> rw=randrw >>> filename=data0/f1:data0/f2 >>> >>> [job1] >>> startdelay=0 >>> rw=randrw >>> filename=data0/f2:data0/f1 >>> ... >>> >>> [job7] >>> startdelay=0 >>> rw=randrw >>> filename=data0/f2:data0/f1 >>> >>> The culprit of the problem is that after the commit ext4_writepages() >>> are more aggressive in writing back pages. Thus we have less consecutive >>> dirty pages resulting in more seeking. >>> >>> This increased aggressivity is caused by a bug in the condition >>> terminating ext4_writepages(). We start writing from the beginning of >>> the file even if we should have terminated ext4_writepages() because >>> wbc->nr_to_write <= 0. >>> >>> After fixing the condition the throughput of the fio workload is about 20% >>> better than before writeback reorganization. >>> >>> Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com> >>> Signed-off-by: Jan Kara <jack@suse.cz> >>> --- >>> fs/ext4/inode.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>> index c79fd7d..7914c05 100644 >>> --- a/fs/ext4/inode.c >>> +++ b/fs/ext4/inode.c >>> @@ -2563,7 +2563,7 @@ retry: >>> break; >>> } >>> blk_finish_plug(&plug); >>> - if (!ret && !cycled) { >>> + if (!ret && !cycled && wbc->nr_to_write > 0) { >>> cycled = 1; >>> mpd.last_page = writeback_index - 1; >>> mpd.first_page = 0; >>> >> >> Interesting, doesn't that mean generic_writepages (sub-sequent >> write_cache_pages() ) and all other file systems implementing their >> own ->writepages() should be updated? > No. write_cache_pages() has the condition like: > if (!cycled && !done) { > > and 'done' is set when wbc->nr_to_write drops to zero. So that function > is OK. We cannot use 'done' in ext4_writepages() because the functions are > structured a bit differently and 'done' gets set also when reach end of > file. Ah right, I missed that. If pagevec_lookup_tag() returns 0 there is still a way to avoid setting done=1, but I guess wbc->nr_to_write also wouldn't be zero then. Btrfs' extent_write_cache_pages is another candidate and in combination with the additional blk plug ext4 and generic_writepages are doing, it might explain why I noticed extensive btrfs-raid6-rmw writes some time ago. I'm going to check that and further discuss on that list. Thanks, Bernd ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-09-11 11:13 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <1378842006-15237-1-git-send-email-jack@suse.cz> 2013-09-11 9:45 ` [PATCH] ext4: Fix performance regression in writeback of random writes Bernd Schubert 2013-09-11 10:11 ` Jan Kara 2013-09-11 11:13 ` Bernd Schubert
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).