[PATCH] ext4: fix checking on nr_to

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] ext4: fix checking on nr_to_write
@ 2013-10-13 16:39 Ming Lei
  2013-10-14  0:42 ` Ming Lei
  2013-10-14 12:58 ` Jan Kara
  0 siblings, 2 replies; 10+ messages in thread
From: Ming Lei @ 2013-10-13 16:39 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ming Lei, Ted Tso, Jan Kara, linux-ext4,
	linux-fsdevel@vger.kernel.org

Commit 4e7ea81db5(ext4: restructure writeback path) introduces
another performance regression on random write:

- one more page may be mapped to ext4 extent in mpage_prepare_extent_to_map,
and will be submitted for I/O so nr_to_write will become -1 before 'done'
is set

- the worse thing is that dirty pages may still be retrieved from page
  cache after nr_to_write becomes negative, so lots of small chunks can be
  submitted to block device when page writeback is catching up with write path,
  and performance is hurted.

On one arm A15 board(arndale) with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM: 2GB),
this patch can improve below test result from 157MB/sec to 174MB/sec(>10%):

	dd if=/dev/zero of=./z.img bs=8K count=512K

The above test is actually prototype of block write in bonnie++ utility.

This patch fixes check on nr_to_write in mpage_prepare_extent_to_map()
to make sure nr_to_write won't become negative.

Cc: Ted Tso <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 fs/ext4/inode.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 32c04ab..6a62803 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2356,15 +2356,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (mpd->map.m_len == 0)
 				mpd->first_page = page->index;
 			mpd->next_page = page->index + 1;
-			/* Add all dirty buffers to mpd */
-			lblk = ((ext4_lblk_t)page->index) <<
-				(PAGE_CACHE_SHIFT - blkbits);
-			head = page_buffers(page);
-			err = mpage_process_page_bufs(mpd, head, head, lblk);
-			if (err <= 0)
-				goto out;
-			err = 0;
-
 			/*
 			 * Accumulated enough dirty pages? This doesn't apply
 			 * to WB_SYNC_ALL mode. For integrity sync we have to
@@ -2374,9 +2365,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			 * of the old dirty pages.
 			 */
 			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
-			    mpd->next_page - mpd->first_page >=
+			    mpd->next_page - mpd->first_page >
 							mpd->wbc->nr_to_write)
 				goto out;
+
+			/* Add all dirty buffers to mpd */
+			lblk = ((ext4_lblk_t)page->index) <<
+				(PAGE_CACHE_SHIFT - blkbits);
+			head = page_buffers(page);
+			err = mpage_process_page_bufs(mpd, head, head, lblk);
+			if (err <= 0)
+				goto out;
+			err = 0;
 		}
 		pagevec_release(&pvec);
 		cond_resched();
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-13 16:39 [PATCH] ext4: fix checking on nr_to_write Ming Lei
@ 2013-10-14  0:42 ` Ming Lei
  2013-10-14 12:58 ` Jan Kara
  1 sibling, 0 replies; 10+ messages in thread
From: Ming Lei @ 2013-10-14  0:42 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Ming Lei, Ted Tso, Jan Kara, linux-ext4,
	linux-fsdevel@vger.kernel.org

On Mon, Oct 14, 2013 at 12:39 AM, Ming Lei <ming.lei@canonical.com> wrote:
>
> On one arm A15 board(arndale) with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM: 2GB),

Sorry for forgetting to mention: the sata controller is working at 3.0Gbps.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-13 16:39 [PATCH] ext4: fix checking on nr_to_write Ming Lei
  2013-10-14  0:42 ` Ming Lei
@ 2013-10-14 12:58 ` Jan Kara
  2013-10-14 13:50   ` Ming Lei
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Kara @ 2013-10-14 12:58 UTC (permalink / raw)
  To: Ming Lei
  Cc: linux-kernel, Ted Tso, Jan Kara, linux-ext4,
	linux-fsdevel@vger.kernel.org

  Hello,

On Mon 14-10-13 00:39:52, Ming Lei wrote:
> Commit 4e7ea81db5(ext4: restructure writeback path) introduces
> another performance regression on random write:
> 
> - one more page may be mapped to ext4 extent in mpage_prepare_extent_to_map,
> and will be submitted for I/O so nr_to_write will become -1 before 'done'
> is set
> 
> - the worse thing is that dirty pages may still be retrieved from page
>   cache after nr_to_write becomes negative, so lots of small chunks can be
>   submitted to block device when page writeback is catching up with write path,
>   and performance is hurted.
  Umm, I guess I see what are you pointing at. Thanks for catching that.
mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write
is already <= 0. But I would somewhat prefer not to call
mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch
like:
                ret = mpage_prepare_extent_to_map(&mpd);
                if (!ret) {
-			if (mpd.map.m_len)
+			if (mpd.map.m_len) {
                                ret = mpage_map_and_submit_extent(handle, &mpd,
                                        &give_up_on_write);
-                       else {
+				done = (wbc->nr_to_write <= 0);
+                       } else {

Should also fix your problem, am I right?

								Honza

> On one arm A15 board(arndale) with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM: 2GB),
> this patch can improve below test result from 157MB/sec to 174MB/sec(>10%):
> 
> 	dd if=/dev/zero of=./z.img bs=8K count=512K
> 
> The above test is actually prototype of block write in bonnie++ utility.
> 
> This patch fixes check on nr_to_write in mpage_prepare_extent_to_map()
> to make sure nr_to_write won't become negative.
> 
> Cc: Ted Tso <tytso@mit.edu>
> Cc: Jan Kara <jack@suse.cz>
> Cc: linux-ext4@vger.kernel.org
> Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  fs/ext4/inode.c |   20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 32c04ab..6a62803 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2356,15 +2356,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  			if (mpd->map.m_len == 0)
>  				mpd->first_page = page->index;
>  			mpd->next_page = page->index + 1;
> -			/* Add all dirty buffers to mpd */
> -			lblk = ((ext4_lblk_t)page->index) <<
> -				(PAGE_CACHE_SHIFT - blkbits);
> -			head = page_buffers(page);
> -			err = mpage_process_page_bufs(mpd, head, head, lblk);
> -			if (err <= 0)
> -				goto out;
> -			err = 0;
> -
>  			/*
>  			 * Accumulated enough dirty pages? This doesn't apply
>  			 * to WB_SYNC_ALL mode. For integrity sync we have to
> @@ -2374,9 +2365,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  			 * of the old dirty pages.
>  			 */
>  			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
> -			    mpd->next_page - mpd->first_page >=
> +			    mpd->next_page - mpd->first_page >
>  							mpd->wbc->nr_to_write)
>  				goto out;
> +
> +			/* Add all dirty buffers to mpd */
> +			lblk = ((ext4_lblk_t)page->index) <<
> +				(PAGE_CACHE_SHIFT - blkbits);
> +			head = page_buffers(page);
> +			err = mpage_process_page_bufs(mpd, head, head, lblk);
> +			if (err <= 0)
> +				goto out;
> +			err = 0;
>  		}
>  		pagevec_release(&pvec);
>  		cond_resched();
> -- 
> 1.7.9.5
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-14 12:58 ` Jan Kara
@ 2013-10-14 13:50   ` Ming Lei
  2013-10-14 17:34     ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2013-10-14 13:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Kernel Mailing List, Ted Tso, linux-ext4,
	linux-fsdevel@vger.kernel.org

On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@suse.cz> wrote:
>   Umm, I guess I see what are you pointing at. Thanks for catching that.
> mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write
> is already <= 0. But I would somewhat prefer not to call
> mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch
> like:
>                 ret = mpage_prepare_extent_to_map(&mpd);
>                 if (!ret) {
> -                       if (mpd.map.m_len)
> +                       if (mpd.map.m_len) {
>                                 ret = mpage_map_and_submit_extent(handle, &mpd,
>                                         &give_up_on_write);
> -                       else {
> +                               done = (wbc->nr_to_write <= 0);
> +                       } else {
>
> Should also fix your problem, am I right?

I am afraid it can't, because we need to stop scanning page cache
if end of file is reached.

nr_to_write will become negative inside mpage_map_and_submit_extent(),
that is why I fix it inside mpage_prepare_extent_to_map().


Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-14 13:50   ` Ming Lei
@ 2013-10-14 17:34     ` Jan Kara
  2013-10-15  2:25       ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2013-10-14 17:34 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jan Kara, Linux Kernel Mailing List, Ted Tso, linux-ext4,
	linux-fsdevel@vger.kernel.org

On Mon 14-10-13 21:50:54, Ming Lei wrote:
> On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@suse.cz> wrote:
> >   Umm, I guess I see what are you pointing at. Thanks for catching that.
> > mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write
> > is already <= 0. But I would somewhat prefer not to call
> > mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch
> > like:
> >                 ret = mpage_prepare_extent_to_map(&mpd);
> >                 if (!ret) {
> > -                       if (mpd.map.m_len)
> > +                       if (mpd.map.m_len) {
> >                                 ret = mpage_map_and_submit_extent(handle, &mpd,
> >                                         &give_up_on_write);
> > -                       else {
> > +                               done = (wbc->nr_to_write <= 0);
> > +                       } else {
> >
> > Should also fix your problem, am I right?
> 
> I am afraid it can't, because we need to stop scanning page cache
> if end of file is reached.
  That should be OK. mpage_process_page_bufs() won't add a buffer beyond
EOF so we end the extent at EOF and next time we don't add anything to the
extent. My change wouldn't change anything in this.

> nr_to_write will become negative inside mpage_map_and_submit_extent(),
> that is why I fix it inside mpage_prepare_extent_to_map().
  Yes, mpage_map_and_submit_extent() creates negative nr_to_write but only
because mpage_prepare_extent_to_map() asked for mapping too big extent. But
if I'm reading the code correctly we first ask for writing the extent of
just the right size (nr_to_write becomes 0) but then ext4_writepages() asks
mpage_prepare_extent_to_map() for another extent and it will create a single
page extent for mapping before it finds out nr_to_write <= 0 and
terminates. Am I understanding the problem correctly?

After thinking about it again, moving the condition in
mpage_prepare_extent_to_map() as you did is probably better that what I
suggested. Just move it more up in the loop - like after page->index > end
condition. So that we don't unnecessarily lock the page etc.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-14 17:34     ` Jan Kara
@ 2013-10-15  2:25       ` Ming Lei
  2013-10-15 10:39         ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2013-10-15  2:25 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Kernel Mailing List, Ted Tso, linux-ext4,
	linux-fsdevel@vger.kernel.org, Ming Lei

On Mon, 14 Oct 2013 19:34:59 +0200
Jan Kara <jack@suse.cz> wrote:

> On Mon 14-10-13 21:50:54, Ming Lei wrote:
> > On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@suse.cz> wrote:
> > >   Umm, I guess I see what are you pointing at. Thanks for catching that.
> > > mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write
> > > is already <= 0. But I would somewhat prefer not to call
> > > mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch
> > > like:
> > >                 ret = mpage_prepare_extent_to_map(&mpd);
> > >                 if (!ret) {
> > > -                       if (mpd.map.m_len)
> > > +                       if (mpd.map.m_len) {
> > >                                 ret = mpage_map_and_submit_extent(handle, &mpd,
> > >                                         &give_up_on_write);
> > > -                       else {
> > > +                               done = (wbc->nr_to_write <= 0);
> > > +                       } else {
> > >
> > > Should also fix your problem, am I right?
> > 
> > I am afraid it can't, because we need to stop scanning page cache
> > if end of file is reached.
>   That should be OK. mpage_process_page_bufs() won't add a buffer beyond
> EOF so we end the extent at EOF and next time we don't add anything to the
> extent. My change wouldn't change anything in this.

I mean wbc->nr_to_write may still be positive when mpage_prepare_extent_to_map()
returns zero, so your change will cause ext4_writepages not to break from the
loop, then write hangs, too crazy?

> 
> > nr_to_write will become negative inside mpage_map_and_submit_extent(),
> > that is why I fix it inside mpage_prepare_extent_to_map().
>   Yes, mpage_map_and_submit_extent() creates negative nr_to_write but only
> because mpage_prepare_extent_to_map() asked for mapping too big extent. But
> if I'm reading the code correctly we first ask for writing the extent of
> just the right size (nr_to_write becomes 0) but then ext4_writepages() asks
> mpage_prepare_extent_to_map() for another extent and it will create a single
> page extent for mapping before it finds out nr_to_write <= 0 and
> terminates. Am I understanding the problem correctly?

Your understanding is right, and I think it is a correct fix, because
we shouldn't add more pages to extent than nr_to_write.

> 
> After thinking about it again, moving the condition in
> mpage_prepare_extent_to_map() as you did is probably better that what I
> suggested. Just move it more up in the loop - like after page->index > end
> condition. So that we don't unnecessarily lock the page etc.

Looks it makes sense, so how about below change?

--
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 32c04ab..c32b599 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2294,7 +2294,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 {
 	struct address_space *mapping = mpd->inode->i_mapping;
 	struct pagevec pvec;
-	unsigned int nr_pages;
+	unsigned int nr_pages, nr_added = 0;
 	pgoff_t index = mpd->first_page;
 	pgoff_t end = mpd->last_page;
 	int tag;
@@ -2330,6 +2330,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (page->index > end)
 				goto out;
 
+			/*
+			 * Accumulated enough dirty pages? This doesn't apply
+			 * to WB_SYNC_ALL mode. For integrity sync we have to
+			 * keep going because someone may be concurrently
+			 * dirtying pages, and we might have synced a lot of
+			 * newly appeared dirty pages, but have not synced all
+			 * of the old dirty pages.
+			 */
+			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
+					nr_added >= mpd->wbc->nr_to_write)
+				goto out;
+
 			/* If we can't merge this page, we are done. */
 			if (mpd->map.m_len > 0 && mpd->next_page != page->index)
 				goto out;
@@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (err <= 0)
 				goto out;
 			err = 0;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-15  2:25       ` Ming Lei
@ 2013-10-15 10:39         ` Jan Kara
  2013-10-15 11:15           ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2013-10-15 10:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jan Kara, Linux Kernel Mailing List, Ted Tso, linux-ext4,
	linux-fsdevel@vger.kernel.org, Ming Lei

[-- Attachment #1: Type: text/plain, Size: 2453 bytes --]

On Tue 15-10-13 10:25:53, Ming Lei wrote:
> Looks it makes sense, so how about below change?
> 
> --
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 32c04ab..c32b599 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2294,7 +2294,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  {
>  	struct address_space *mapping = mpd->inode->i_mapping;
>  	struct pagevec pvec;
> -	unsigned int nr_pages;
> +	unsigned int nr_pages, nr_added = 0;
>  	pgoff_t index = mpd->first_page;
>  	pgoff_t end = mpd->last_page;
>  	int tag;
> @@ -2330,6 +2330,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  			if (page->index > end)
>  				goto out;
>  
> +			/*
> +			 * Accumulated enough dirty pages? This doesn't apply
> +			 * to WB_SYNC_ALL mode. For integrity sync we have to
> +			 * keep going because someone may be concurrently
> +			 * dirtying pages, and we might have synced a lot of
> +			 * newly appeared dirty pages, but have not synced all
> +			 * of the old dirty pages.
> +			 */
> +			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
> +					nr_added >= mpd->wbc->nr_to_write)
> +				goto out;
> +
  This won't quite work because if the page is fully mapped
mpage_process_page_bufs() will immediately submit the page and decrease
nr_to_write. So now you would end up writing less than you were asked for
in some cases. Attached patch should do what's needed. Can you try whether
it fixes the problem for you (it seems to work OK in my testing).

								Honza

>  			/* If we can't merge this page, we are done. */
>  			if (mpd->map.m_len > 0 && mpd->next_page != page->index)
>  				goto out;
> @@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  			if (err <= 0)
>  				goto out;
>  			err = 0;
> -
> -			/*
> -			 * Accumulated enough dirty pages? This doesn't apply
> -			 * to WB_SYNC_ALL mode. For integrity sync we have to
> -			 * keep going because someone may be concurrently
> -			 * dirtying pages, and we might have synced a lot of
> -			 * newly appeared dirty pages, but have not synced all
> -			 * of the old dirty pages.
> -			 */
> -			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
> -			    mpd->next_page - mpd->first_page >=
> -							mpd->wbc->nr_to_write)
> -				goto out;
> +			nr_added++;
>  		}
>  		pagevec_release(&pvec);
>  		cond_resched();
> 
> 
> 
> Thanks,
> -- 
> Ming Lei
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-ext4-Fix-performance-regression-in-ext4_writepages.patch --]
[-- Type: text/x-patch, Size: 3104 bytes --]

>From 2ea950d5d601fd1236065f3fe87a9383d78d3785 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Tue, 15 Oct 2013 12:30:50 +0200
Subject: [PATCH] ext4: Fix performance regression in ext4_writepages()

Commit 4e7ea81db5(ext4: restructure writeback path) introduces another
performance regression on random write: The logic in
mpage_prepare_extent_to_map() always prepares at least one page for
mapping and the loop in ext4_writepages() doesn't terminate when
nr_to_write drops to zero if there is something to map. So we will still
try to write more that we should and we will do it in 1 page chunks.

Fix the problem by moving nr_to_write check in
mpage_prepare_extent_to_map() before preparing the first page. That way
mpage_prepare_extent_to_map() will return without preparing anything when
nr_to_write is exhausted and the loop in ext4_writepages() will be
terminated properly.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e274e9c1171f..7d12a38fbde4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2334,6 +2334,22 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (mpd->map.m_len > 0 && mpd->next_page != page->index)
 				goto out;
 
+			if (mpd->map.m_len == 0)
+				mpd->first_page = page->index;
+			mpd->next_page = page->index + 1;
+			/*
+			 * Accumulated enough dirty pages? This doesn't apply
+			 * to WB_SYNC_ALL mode. For integrity sync we have to
+			 * keep going because someone may be concurrently
+			 * dirtying pages, and we might have synced a lot of
+			 * newly appeared dirty pages, but have not synced all
+			 * of the old dirty pages.
+			 */
+			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
+			    mpd->next_page - mpd->first_page >
+							mpd->wbc->nr_to_write)
+				goto out;
+
 			lock_page(page);
 			/*
 			 * If the page is no longer dirty, or its mapping no
@@ -2353,9 +2369,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			wait_on_page_writeback(page);
 			BUG_ON(PageWriteback(page));
 
-			if (mpd->map.m_len == 0)
-				mpd->first_page = page->index;
-			mpd->next_page = page->index + 1;
 			/* Add all dirty buffers to mpd */
 			lblk = ((ext4_lblk_t)page->index) <<
 				(PAGE_CACHE_SHIFT - blkbits);
@@ -2364,19 +2377,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (err <= 0)
 				goto out;
 			err = 0;
-
-			/*
-			 * Accumulated enough dirty pages? This doesn't apply
-			 * to WB_SYNC_ALL mode. For integrity sync we have to
-			 * keep going because someone may be concurrently
-			 * dirtying pages, and we might have synced a lot of
-			 * newly appeared dirty pages, but have not synced all
-			 * of the old dirty pages.
-			 */
-			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
-			    mpd->next_page - mpd->first_page >=
-							mpd->wbc->nr_to_write)
-				goto out;
 		}
 		pagevec_release(&pvec);
 		cond_resched();
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-15 10:39         ` Jan Kara
@ 2013-10-15 11:15           ` Ming Lei
  2013-10-15 12:34             ` Jan Kara
  0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2013-10-15 11:15 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Kernel Mailing List, Ted Tso, linux-ext4,
	linux-fsdevel@vger.kernel.org, Ming Lei

On Tue, 15 Oct 2013 12:39:00 +0200
Jan Kara <jack@suse.cz> wrote:

> On Tue 15-10-13 10:25:53, Ming Lei wrote:
> > Looks it makes sense, so how about below change?
> > 
> > --
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 32c04ab..c32b599 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2294,7 +2294,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
> >  {
> >  	struct address_space *mapping = mpd->inode->i_mapping;
> >  	struct pagevec pvec;
> > -	unsigned int nr_pages;
> > +	unsigned int nr_pages, nr_added = 0;
> >  	pgoff_t index = mpd->first_page;
> >  	pgoff_t end = mpd->last_page;
> >  	int tag;
> > @@ -2330,6 +2330,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
> >  			if (page->index > end)
> >  				goto out;
> >  
> > +			/*
> > +			 * Accumulated enough dirty pages? This doesn't apply
> > +			 * to WB_SYNC_ALL mode. For integrity sync we have to
> > +			 * keep going because someone may be concurrently
> > +			 * dirtying pages, and we might have synced a lot of
> > +			 * newly appeared dirty pages, but have not synced all
> > +			 * of the old dirty pages.
> > +			 */
> > +			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
> > +					nr_added >= mpd->wbc->nr_to_write)
> > +				goto out;
> > +
>   This won't quite work because if the page is fully mapped
> mpage_process_page_bufs() will immediately submit the page and decrease
> nr_to_write. So now you would end up writing less than you were asked for
> in some cases. 

Yes, your are right, so how about below?

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 32c04ab..3cf7abb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2295,6 +2295,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	struct address_space *mapping = mpd->inode->i_mapping;
 	struct pagevec pvec;
 	unsigned int nr_pages;
+	int left = mpd->wbc->nr_to_write;
 	pgoff_t index = mpd->first_page;
 	pgoff_t end = mpd->last_page;
 	int tag;
@@ -2330,6 +2331,17 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (page->index > end)
 				goto out;
 
+			/*
+			 * Accumulated enough dirty pages? This doesn't apply
+			 * to WB_SYNC_ALL mode. For integrity sync we have to
+			 * keep going because someone may be concurrently
+			 * dirtying pages, and we might have synced a lot of
+			 * newly appeared dirty pages, but have not synced all
+			 * of the old dirty pages.
+			 */
+			if (mpd->wbc->sync_mode == WB_SYNC_NONE && left <= 0)
+				goto out;
+
 			/* If we can't merge this page, we are done. */
 			if (mpd->map.m_len > 0 && mpd->next_page != page->index)
 				goto out;
@@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (err <= 0)
 				goto out;
 			err = 0;
-
-			/*
-			 * Accumulated enough dirty pages? This doesn't apply
-			 * to WB_SYNC_ALL mode. For integrity sync we have to
-			 * keep going because someone may be concurrently
-			 * dirtying pages, and we might have synced a lot of
-			 * newly appeared dirty pages, but have not synced all
-			 * of the old dirty pages.
-			 */
-			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
-			    mpd->next_page - mpd->first_page >=
-							mpd->wbc->nr_to_write)
-				goto out;
+			left--;
 		}
 		pagevec_release(&pvec);
 		cond_resched();


> Attached patch should do what's needed. Can you try whether
> it fixes the problem for you (it seems to work OK in my testing).

In fact, I had wrote and tested your attached patch before my last post,
and it may trigger BUG() in mpage_release_unused_pages(), that is because
we touch mpd->next_page without locking current page, so it is better to
not increase mpd->next_page if the current page won't be processed.


Thanks,
-- 
Ming Lei

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-15 11:15           ` Ming Lei
@ 2013-10-15 12:34             ` Jan Kara
  2013-10-15 14:53               ` Ming Lei
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2013-10-15 12:34 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jan Kara, Linux Kernel Mailing List, Ted Tso, linux-ext4,
	linux-fsdevel@vger.kernel.org, Ming Lei

On Tue 15-10-13 19:15:56, Ming Lei wrote:
> >   This won't quite work because if the page is fully mapped
> > mpage_process_page_bufs() will immediately submit the page and decrease
> > nr_to_write. So now you would end up writing less than you were asked for
> > in some cases. 
> 
> Yes, your are right, so how about below?
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 32c04ab..3cf7abb 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2295,6 +2295,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  	struct address_space *mapping = mpd->inode->i_mapping;
>  	struct pagevec pvec;
>  	unsigned int nr_pages;
> +	int left = mpd->wbc->nr_to_write;
  'long' please. Otherwise the patch looks fine. Thanks!

								Honza

>  	pgoff_t index = mpd->first_page;
>  	pgoff_t end = mpd->last_page;
>  	int tag;
> @@ -2330,6 +2331,17 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  			if (page->index > end)
>  				goto out;
>  
> +			/*
> +			 * Accumulated enough dirty pages? This doesn't apply
> +			 * to WB_SYNC_ALL mode. For integrity sync we have to
> +			 * keep going because someone may be concurrently
> +			 * dirtying pages, and we might have synced a lot of
> +			 * newly appeared dirty pages, but have not synced all
> +			 * of the old dirty pages.
> +			 */
> +			if (mpd->wbc->sync_mode == WB_SYNC_NONE && left <= 0)
> +				goto out;
> +
>  			/* If we can't merge this page, we are done. */
>  			if (mpd->map.m_len > 0 && mpd->next_page != page->index)
>  				goto out;
> @@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  			if (err <= 0)
>  				goto out;
>  			err = 0;
> -
> -			/*
> -			 * Accumulated enough dirty pages? This doesn't apply
> -			 * to WB_SYNC_ALL mode. For integrity sync we have to
> -			 * keep going because someone may be concurrently
> -			 * dirtying pages, and we might have synced a lot of
> -			 * newly appeared dirty pages, but have not synced all
> -			 * of the old dirty pages.
> -			 */
> -			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
> -			    mpd->next_page - mpd->first_page >=
> -							mpd->wbc->nr_to_write)
> -				goto out;
> +			left--;
>  		}
>  		pagevec_release(&pvec);
>  		cond_resched();
> 
> 
> > Attached patch should do what's needed. Can you try whether
> > it fixes the problem for you (it seems to work OK in my testing).
> 
> In fact, I had wrote and tested your attached patch before my last post,
> and it may trigger BUG() in mpage_release_unused_pages(), that is because
> we touch mpd->next_page without locking current page, so it is better to
> not increase mpd->next_page if the current page won't be processed.
> 
> 
> Thanks,
> -- 
> Ming Lei
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ext4: fix checking on nr_to_write
  2013-10-15 12:34             ` Jan Kara
@ 2013-10-15 14:53               ` Ming Lei
  0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2013-10-15 14:53 UTC (permalink / raw)
  To: Jan Kara
  Cc: Linux Kernel Mailing List, Ted Tso, linux-ext4,
	linux-fsdevel@vger.kernel.org

On Tue, Oct 15, 2013 at 8:34 PM, Jan Kara <jack@suse.cz> wrote:
> On Tue 15-10-13 19:15:56, Ming Lei wrote:
>> >   This won't quite work because if the page is fully mapped
>> > mpage_process_page_bufs() will immediately submit the page and decrease
>> > nr_to_write. So now you would end up writing less than you were asked for
>> > in some cases.
>>
>> Yes, your are right, so how about below?
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 32c04ab..3cf7abb 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -2295,6 +2295,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>>       struct address_space *mapping = mpd->inode->i_mapping;
>>       struct pagevec pvec;
>>       unsigned int nr_pages;
>> +     int left = mpd->wbc->nr_to_write;
>   'long' please. Otherwise the patch looks fine. Thanks!

OK, and I will submit the formal one with you ack.

Thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-10-15 14:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-13 16:39 [PATCH] ext4: fix checking on nr_to_write Ming Lei
2013-10-14  0:42 ` Ming Lei
2013-10-14 12:58 ` Jan Kara
2013-10-14 13:50   ` Ming Lei
2013-10-14 17:34     ` Jan Kara
2013-10-15  2:25       ` Ming Lei
2013-10-15 10:39         ` Jan Kara
2013-10-15 11:15           ` Ming Lei
2013-10-15 12:34             ` Jan Kara
2013-10-15 14:53               ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).