* [PATCH] ext4: fix checking on nr_to_write @ 2013-10-13 16:39 Ming Lei 2013-10-14 0:42 ` Ming Lei 2013-10-14 12:58 ` Jan Kara 0 siblings, 2 replies; 10+ messages in thread From: Ming Lei @ 2013-10-13 16:39 UTC (permalink / raw) To: linux-kernel Cc: Ming Lei, Ted Tso, Jan Kara, linux-ext4, linux-fsdevel@vger.kernel.org Commit 4e7ea81db5(ext4: restructure writeback path) introduces another performance regression on random write: - one more page may be mapped to ext4 extent in mpage_prepare_extent_to_map, and will be submitted for I/O so nr_to_write will become -1 before 'done' is set - the worse thing is that dirty pages may still be retrieved from page cache after nr_to_write becomes negative, so lots of small chunks can be submitted to block device when page writeback is catching up with write path, and performance is hurted. On one arm A15 board(arndale) with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM: 2GB), this patch can improve below test result from 157MB/sec to 174MB/sec(>10%): dd if=/dev/zero of=./z.img bs=8K count=512K The above test is actually prototype of block write in bonnie++ utility. This patch fixes check on nr_to_write in mpage_prepare_extent_to_map() to make sure nr_to_write won't become negative. Cc: Ted Tso <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Cc: linux-ext4@vger.kernel.org Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org> Signed-off-by: Ming Lei <ming.lei@canonical.com> --- fs/ext4/inode.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 32c04ab..6a62803 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2356,15 +2356,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (mpd->map.m_len == 0) mpd->first_page = page->index; mpd->next_page = page->index + 1; - /* Add all dirty buffers to mpd */ - lblk = ((ext4_lblk_t)page->index) << - (PAGE_CACHE_SHIFT - blkbits); - head = page_buffers(page); - err = mpage_process_page_bufs(mpd, head, head, lblk); - if (err <= 0) - goto out; - err = 0; - /* * Accumulated enough dirty pages? This doesn't apply * to WB_SYNC_ALL mode. For integrity sync we have to @@ -2374,9 +2365,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) * of the old dirty pages. */ if (mpd->wbc->sync_mode == WB_SYNC_NONE && - mpd->next_page - mpd->first_page >= + mpd->next_page - mpd->first_page > mpd->wbc->nr_to_write) goto out; + + /* Add all dirty buffers to mpd */ + lblk = ((ext4_lblk_t)page->index) << + (PAGE_CACHE_SHIFT - blkbits); + head = page_buffers(page); + err = mpage_process_page_bufs(mpd, head, head, lblk); + if (err <= 0) + goto out; + err = 0; } pagevec_release(&pvec); cond_resched(); -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-13 16:39 [PATCH] ext4: fix checking on nr_to_write Ming Lei @ 2013-10-14 0:42 ` Ming Lei 2013-10-14 12:58 ` Jan Kara 1 sibling, 0 replies; 10+ messages in thread From: Ming Lei @ 2013-10-14 0:42 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: Ming Lei, Ted Tso, Jan Kara, linux-ext4, linux-fsdevel@vger.kernel.org On Mon, Oct 14, 2013 at 12:39 AM, Ming Lei <ming.lei@canonical.com> wrote: > > On one arm A15 board(arndale) with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM: 2GB), Sorry for forgetting to mention: the sata controller is working at 3.0Gbps. Thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-13 16:39 [PATCH] ext4: fix checking on nr_to_write Ming Lei 2013-10-14 0:42 ` Ming Lei @ 2013-10-14 12:58 ` Jan Kara 2013-10-14 13:50 ` Ming Lei 1 sibling, 1 reply; 10+ messages in thread From: Jan Kara @ 2013-10-14 12:58 UTC (permalink / raw) To: Ming Lei Cc: linux-kernel, Ted Tso, Jan Kara, linux-ext4, linux-fsdevel@vger.kernel.org Hello, On Mon 14-10-13 00:39:52, Ming Lei wrote: > Commit 4e7ea81db5(ext4: restructure writeback path) introduces > another performance regression on random write: > > - one more page may be mapped to ext4 extent in mpage_prepare_extent_to_map, > and will be submitted for I/O so nr_to_write will become -1 before 'done' > is set > > - the worse thing is that dirty pages may still be retrieved from page > cache after nr_to_write becomes negative, so lots of small chunks can be > submitted to block device when page writeback is catching up with write path, > and performance is hurted. Umm, I guess I see what are you pointing at. Thanks for catching that. mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write is already <= 0. But I would somewhat prefer not to call mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch like: ret = mpage_prepare_extent_to_map(&mpd); if (!ret) { - if (mpd.map.m_len) + if (mpd.map.m_len) { ret = mpage_map_and_submit_extent(handle, &mpd, &give_up_on_write); - else { + done = (wbc->nr_to_write <= 0); + } else { Should also fix your problem, am I right? Honza > On one arm A15 board(arndale) with sata 3.0 SSD(CPU: 1.5GHz dura core, RAM: 2GB), > this patch can improve below test result from 157MB/sec to 174MB/sec(>10%): > > dd if=/dev/zero of=./z.img bs=8K count=512K > > The above test is actually prototype of block write in bonnie++ utility. > > This patch fixes check on nr_to_write in mpage_prepare_extent_to_map() > to make sure nr_to_write won't become negative. > > Cc: Ted Tso <tytso@mit.edu> > Cc: Jan Kara <jack@suse.cz> > Cc: linux-ext4@vger.kernel.org > Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org> > Signed-off-by: Ming Lei <ming.lei@canonical.com> > --- > fs/ext4/inode.c | 20 ++++++++++---------- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 32c04ab..6a62803 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -2356,15 +2356,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > if (mpd->map.m_len == 0) > mpd->first_page = page->index; > mpd->next_page = page->index + 1; > - /* Add all dirty buffers to mpd */ > - lblk = ((ext4_lblk_t)page->index) << > - (PAGE_CACHE_SHIFT - blkbits); > - head = page_buffers(page); > - err = mpage_process_page_bufs(mpd, head, head, lblk); > - if (err <= 0) > - goto out; > - err = 0; > - > /* > * Accumulated enough dirty pages? This doesn't apply > * to WB_SYNC_ALL mode. For integrity sync we have to > @@ -2374,9 +2365,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > * of the old dirty pages. > */ > if (mpd->wbc->sync_mode == WB_SYNC_NONE && > - mpd->next_page - mpd->first_page >= > + mpd->next_page - mpd->first_page > > mpd->wbc->nr_to_write) > goto out; > + > + /* Add all dirty buffers to mpd */ > + lblk = ((ext4_lblk_t)page->index) << > + (PAGE_CACHE_SHIFT - blkbits); > + head = page_buffers(page); > + err = mpage_process_page_bufs(mpd, head, head, lblk); > + if (err <= 0) > + goto out; > + err = 0; > } > pagevec_release(&pvec); > cond_resched(); > -- > 1.7.9.5 > -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-14 12:58 ` Jan Kara @ 2013-10-14 13:50 ` Ming Lei 2013-10-14 17:34 ` Jan Kara 0 siblings, 1 reply; 10+ messages in thread From: Ming Lei @ 2013-10-14 13:50 UTC (permalink / raw) To: Jan Kara Cc: Linux Kernel Mailing List, Ted Tso, linux-ext4, linux-fsdevel@vger.kernel.org On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@suse.cz> wrote: > Umm, I guess I see what are you pointing at. Thanks for catching that. > mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write > is already <= 0. But I would somewhat prefer not to call > mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch > like: > ret = mpage_prepare_extent_to_map(&mpd); > if (!ret) { > - if (mpd.map.m_len) > + if (mpd.map.m_len) { > ret = mpage_map_and_submit_extent(handle, &mpd, > &give_up_on_write); > - else { > + done = (wbc->nr_to_write <= 0); > + } else { > > Should also fix your problem, am I right? I am afraid it can't, because we need to stop scanning page cache if end of file is reached. nr_to_write will become negative inside mpage_map_and_submit_extent(), that is why I fix it inside mpage_prepare_extent_to_map(). Thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-14 13:50 ` Ming Lei @ 2013-10-14 17:34 ` Jan Kara 2013-10-15 2:25 ` Ming Lei 0 siblings, 1 reply; 10+ messages in thread From: Jan Kara @ 2013-10-14 17:34 UTC (permalink / raw) To: Ming Lei Cc: Jan Kara, Linux Kernel Mailing List, Ted Tso, linux-ext4, linux-fsdevel@vger.kernel.org On Mon 14-10-13 21:50:54, Ming Lei wrote: > On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@suse.cz> wrote: > > Umm, I guess I see what are you pointing at. Thanks for catching that. > > mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write > > is already <= 0. But I would somewhat prefer not to call > > mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch > > like: > > ret = mpage_prepare_extent_to_map(&mpd); > > if (!ret) { > > - if (mpd.map.m_len) > > + if (mpd.map.m_len) { > > ret = mpage_map_and_submit_extent(handle, &mpd, > > &give_up_on_write); > > - else { > > + done = (wbc->nr_to_write <= 0); > > + } else { > > > > Should also fix your problem, am I right? > > I am afraid it can't, because we need to stop scanning page cache > if end of file is reached. That should be OK. mpage_process_page_bufs() won't add a buffer beyond EOF so we end the extent at EOF and next time we don't add anything to the extent. My change wouldn't change anything in this. > nr_to_write will become negative inside mpage_map_and_submit_extent(), > that is why I fix it inside mpage_prepare_extent_to_map(). Yes, mpage_map_and_submit_extent() creates negative nr_to_write but only because mpage_prepare_extent_to_map() asked for mapping too big extent. But if I'm reading the code correctly we first ask for writing the extent of just the right size (nr_to_write becomes 0) but then ext4_writepages() asks mpage_prepare_extent_to_map() for another extent and it will create a single page extent for mapping before it finds out nr_to_write <= 0 and terminates. Am I understanding the problem correctly? After thinking about it again, moving the condition in mpage_prepare_extent_to_map() as you did is probably better that what I suggested. Just move it more up in the loop - like after page->index > end condition. So that we don't unnecessarily lock the page etc. Honza -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-14 17:34 ` Jan Kara @ 2013-10-15 2:25 ` Ming Lei 2013-10-15 10:39 ` Jan Kara 0 siblings, 1 reply; 10+ messages in thread From: Ming Lei @ 2013-10-15 2:25 UTC (permalink / raw) To: Jan Kara Cc: Linux Kernel Mailing List, Ted Tso, linux-ext4, linux-fsdevel@vger.kernel.org, Ming Lei On Mon, 14 Oct 2013 19:34:59 +0200 Jan Kara <jack@suse.cz> wrote: > On Mon 14-10-13 21:50:54, Ming Lei wrote: > > On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@suse.cz> wrote: > > > Umm, I guess I see what are you pointing at. Thanks for catching that. > > > mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write > > > is already <= 0. But I would somewhat prefer not to call > > > mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch > > > like: > > > ret = mpage_prepare_extent_to_map(&mpd); > > > if (!ret) { > > > - if (mpd.map.m_len) > > > + if (mpd.map.m_len) { > > > ret = mpage_map_and_submit_extent(handle, &mpd, > > > &give_up_on_write); > > > - else { > > > + done = (wbc->nr_to_write <= 0); > > > + } else { > > > > > > Should also fix your problem, am I right? > > > > I am afraid it can't, because we need to stop scanning page cache > > if end of file is reached. > That should be OK. mpage_process_page_bufs() won't add a buffer beyond > EOF so we end the extent at EOF and next time we don't add anything to the > extent. My change wouldn't change anything in this. I mean wbc->nr_to_write may still be positive when mpage_prepare_extent_to_map() returns zero, so your change will cause ext4_writepages not to break from the loop, then write hangs, too crazy? > > > nr_to_write will become negative inside mpage_map_and_submit_extent(), > > that is why I fix it inside mpage_prepare_extent_to_map(). > Yes, mpage_map_and_submit_extent() creates negative nr_to_write but only > because mpage_prepare_extent_to_map() asked for mapping too big extent. But > if I'm reading the code correctly we first ask for writing the extent of > just the right size (nr_to_write becomes 0) but then ext4_writepages() asks > mpage_prepare_extent_to_map() for another extent and it will create a single > page extent for mapping before it finds out nr_to_write <= 0 and > terminates. Am I understanding the problem correctly? Your understanding is right, and I think it is a correct fix, because we shouldn't add more pages to extent than nr_to_write. > > After thinking about it again, moving the condition in > mpage_prepare_extent_to_map() as you did is probably better that what I > suggested. Just move it more up in the loop - like after page->index > end > condition. So that we don't unnecessarily lock the page etc. Looks it makes sense, so how about below change? -- diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 32c04ab..c32b599 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2294,7 +2294,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) { struct address_space *mapping = mpd->inode->i_mapping; struct pagevec pvec; - unsigned int nr_pages; + unsigned int nr_pages, nr_added = 0; pgoff_t index = mpd->first_page; pgoff_t end = mpd->last_page; int tag; @@ -2330,6 +2330,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (page->index > end) goto out; + /* + * Accumulated enough dirty pages? This doesn't apply + * to WB_SYNC_ALL mode. For integrity sync we have to + * keep going because someone may be concurrently + * dirtying pages, and we might have synced a lot of + * newly appeared dirty pages, but have not synced all + * of the old dirty pages. + */ + if (mpd->wbc->sync_mode == WB_SYNC_NONE && + nr_added >= mpd->wbc->nr_to_write) + goto out; + /* If we can't merge this page, we are done. */ if (mpd->map.m_len > 0 && mpd->next_page != page->index) goto out; @@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (err <= 0) goto out; err = 0; ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-15 2:25 ` Ming Lei @ 2013-10-15 10:39 ` Jan Kara 2013-10-15 11:15 ` Ming Lei 0 siblings, 1 reply; 10+ messages in thread From: Jan Kara @ 2013-10-15 10:39 UTC (permalink / raw) To: Ming Lei Cc: Jan Kara, Linux Kernel Mailing List, Ted Tso, linux-ext4, linux-fsdevel@vger.kernel.org, Ming Lei [-- Attachment #1: Type: text/plain, Size: 2453 bytes --] On Tue 15-10-13 10:25:53, Ming Lei wrote: > Looks it makes sense, so how about below change? > > -- > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 32c04ab..c32b599 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -2294,7 +2294,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > { > struct address_space *mapping = mpd->inode->i_mapping; > struct pagevec pvec; > - unsigned int nr_pages; > + unsigned int nr_pages, nr_added = 0; > pgoff_t index = mpd->first_page; > pgoff_t end = mpd->last_page; > int tag; > @@ -2330,6 +2330,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > if (page->index > end) > goto out; > > + /* > + * Accumulated enough dirty pages? This doesn't apply > + * to WB_SYNC_ALL mode. For integrity sync we have to > + * keep going because someone may be concurrently > + * dirtying pages, and we might have synced a lot of > + * newly appeared dirty pages, but have not synced all > + * of the old dirty pages. > + */ > + if (mpd->wbc->sync_mode == WB_SYNC_NONE && > + nr_added >= mpd->wbc->nr_to_write) > + goto out; > + This won't quite work because if the page is fully mapped mpage_process_page_bufs() will immediately submit the page and decrease nr_to_write. So now you would end up writing less than you were asked for in some cases. Attached patch should do what's needed. Can you try whether it fixes the problem for you (it seems to work OK in my testing). Honza > /* If we can't merge this page, we are done. */ > if (mpd->map.m_len > 0 && mpd->next_page != page->index) > goto out; > @@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > if (err <= 0) > goto out; > err = 0; > - > - /* > - * Accumulated enough dirty pages? This doesn't apply > - * to WB_SYNC_ALL mode. For integrity sync we have to > - * keep going because someone may be concurrently > - * dirtying pages, and we might have synced a lot of > - * newly appeared dirty pages, but have not synced all > - * of the old dirty pages. > - */ > - if (mpd->wbc->sync_mode == WB_SYNC_NONE && > - mpd->next_page - mpd->first_page >= > - mpd->wbc->nr_to_write) > - goto out; > + nr_added++; > } > pagevec_release(&pvec); > cond_resched(); > > > > Thanks, > -- > Ming Lei -- Jan Kara <jack@suse.cz> SUSE Labs, CR [-- Attachment #2: 0001-ext4-Fix-performance-regression-in-ext4_writepages.patch --] [-- Type: text/x-patch, Size: 3104 bytes --] >From 2ea950d5d601fd1236065f3fe87a9383d78d3785 Mon Sep 17 00:00:00 2001 From: Ming Lei <ming.lei@canonical.com> Date: Tue, 15 Oct 2013 12:30:50 +0200 Subject: [PATCH] ext4: Fix performance regression in ext4_writepages() Commit 4e7ea81db5(ext4: restructure writeback path) introduces another performance regression on random write: The logic in mpage_prepare_extent_to_map() always prepares at least one page for mapping and the loop in ext4_writepages() doesn't terminate when nr_to_write drops to zero if there is something to map. So we will still try to write more that we should and we will do it in 1 page chunks. Fix the problem by moving nr_to_write check in mpage_prepare_extent_to_map() before preparing the first page. That way mpage_prepare_extent_to_map() will return without preparing anything when nr_to_write is exhausted and the loop in ext4_writepages() will be terminated properly. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/inode.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e274e9c1171f..7d12a38fbde4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2334,6 +2334,22 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (mpd->map.m_len > 0 && mpd->next_page != page->index) goto out; + if (mpd->map.m_len == 0) + mpd->first_page = page->index; + mpd->next_page = page->index + 1; + /* + * Accumulated enough dirty pages? This doesn't apply + * to WB_SYNC_ALL mode. For integrity sync we have to + * keep going because someone may be concurrently + * dirtying pages, and we might have synced a lot of + * newly appeared dirty pages, but have not synced all + * of the old dirty pages. + */ + if (mpd->wbc->sync_mode == WB_SYNC_NONE && + mpd->next_page - mpd->first_page > + mpd->wbc->nr_to_write) + goto out; + lock_page(page); /* * If the page is no longer dirty, or its mapping no @@ -2353,9 +2369,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) wait_on_page_writeback(page); BUG_ON(PageWriteback(page)); - if (mpd->map.m_len == 0) - mpd->first_page = page->index; - mpd->next_page = page->index + 1; /* Add all dirty buffers to mpd */ lblk = ((ext4_lblk_t)page->index) << (PAGE_CACHE_SHIFT - blkbits); @@ -2364,19 +2377,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (err <= 0) goto out; err = 0; - - /* - * Accumulated enough dirty pages? This doesn't apply - * to WB_SYNC_ALL mode. For integrity sync we have to - * keep going because someone may be concurrently - * dirtying pages, and we might have synced a lot of - * newly appeared dirty pages, but have not synced all - * of the old dirty pages. - */ - if (mpd->wbc->sync_mode == WB_SYNC_NONE && - mpd->next_page - mpd->first_page >= - mpd->wbc->nr_to_write) - goto out; } pagevec_release(&pvec); cond_resched(); -- 1.8.1.4 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-15 10:39 ` Jan Kara @ 2013-10-15 11:15 ` Ming Lei 2013-10-15 12:34 ` Jan Kara 0 siblings, 1 reply; 10+ messages in thread From: Ming Lei @ 2013-10-15 11:15 UTC (permalink / raw) To: Jan Kara Cc: Linux Kernel Mailing List, Ted Tso, linux-ext4, linux-fsdevel@vger.kernel.org, Ming Lei On Tue, 15 Oct 2013 12:39:00 +0200 Jan Kara <jack@suse.cz> wrote: > On Tue 15-10-13 10:25:53, Ming Lei wrote: > > Looks it makes sense, so how about below change? > > > > -- > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 32c04ab..c32b599 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -2294,7 +2294,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > > { > > struct address_space *mapping = mpd->inode->i_mapping; > > struct pagevec pvec; > > - unsigned int nr_pages; > > + unsigned int nr_pages, nr_added = 0; > > pgoff_t index = mpd->first_page; > > pgoff_t end = mpd->last_page; > > int tag; > > @@ -2330,6 +2330,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > > if (page->index > end) > > goto out; > > > > + /* > > + * Accumulated enough dirty pages? This doesn't apply > > + * to WB_SYNC_ALL mode. For integrity sync we have to > > + * keep going because someone may be concurrently > > + * dirtying pages, and we might have synced a lot of > > + * newly appeared dirty pages, but have not synced all > > + * of the old dirty pages. > > + */ > > + if (mpd->wbc->sync_mode == WB_SYNC_NONE && > > + nr_added >= mpd->wbc->nr_to_write) > > + goto out; > > + > This won't quite work because if the page is fully mapped > mpage_process_page_bufs() will immediately submit the page and decrease > nr_to_write. So now you would end up writing less than you were asked for > in some cases. Yes, your are right, so how about below? diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 32c04ab..3cf7abb 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2295,6 +2295,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) struct address_space *mapping = mpd->inode->i_mapping; struct pagevec pvec; unsigned int nr_pages; + int left = mpd->wbc->nr_to_write; pgoff_t index = mpd->first_page; pgoff_t end = mpd->last_page; int tag; @@ -2330,6 +2331,17 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (page->index > end) goto out; + /* + * Accumulated enough dirty pages? This doesn't apply + * to WB_SYNC_ALL mode. For integrity sync we have to + * keep going because someone may be concurrently + * dirtying pages, and we might have synced a lot of + * newly appeared dirty pages, but have not synced all + * of the old dirty pages. + */ + if (mpd->wbc->sync_mode == WB_SYNC_NONE && left <= 0) + goto out; + /* If we can't merge this page, we are done. */ if (mpd->map.m_len > 0 && mpd->next_page != page->index) goto out; @@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) if (err <= 0) goto out; err = 0; - - /* - * Accumulated enough dirty pages? This doesn't apply - * to WB_SYNC_ALL mode. For integrity sync we have to - * keep going because someone may be concurrently - * dirtying pages, and we might have synced a lot of - * newly appeared dirty pages, but have not synced all - * of the old dirty pages. - */ - if (mpd->wbc->sync_mode == WB_SYNC_NONE && - mpd->next_page - mpd->first_page >= - mpd->wbc->nr_to_write) - goto out; + left--; } pagevec_release(&pvec); cond_resched(); > Attached patch should do what's needed. Can you try whether > it fixes the problem for you (it seems to work OK in my testing). In fact, I had wrote and tested your attached patch before my last post, and it may trigger BUG() in mpage_release_unused_pages(), that is because we touch mpd->next_page without locking current page, so it is better to not increase mpd->next_page if the current page won't be processed. Thanks, -- Ming Lei ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-15 11:15 ` Ming Lei @ 2013-10-15 12:34 ` Jan Kara 2013-10-15 14:53 ` Ming Lei 0 siblings, 1 reply; 10+ messages in thread From: Jan Kara @ 2013-10-15 12:34 UTC (permalink / raw) To: Ming Lei Cc: Jan Kara, Linux Kernel Mailing List, Ted Tso, linux-ext4, linux-fsdevel@vger.kernel.org, Ming Lei On Tue 15-10-13 19:15:56, Ming Lei wrote: > > This won't quite work because if the page is fully mapped > > mpage_process_page_bufs() will immediately submit the page and decrease > > nr_to_write. So now you would end up writing less than you were asked for > > in some cases. > > Yes, your are right, so how about below? > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 32c04ab..3cf7abb 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -2295,6 +2295,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > struct address_space *mapping = mpd->inode->i_mapping; > struct pagevec pvec; > unsigned int nr_pages; > + int left = mpd->wbc->nr_to_write; 'long' please. Otherwise the patch looks fine. Thanks! Honza > pgoff_t index = mpd->first_page; > pgoff_t end = mpd->last_page; > int tag; > @@ -2330,6 +2331,17 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > if (page->index > end) > goto out; > > + /* > + * Accumulated enough dirty pages? This doesn't apply > + * to WB_SYNC_ALL mode. For integrity sync we have to > + * keep going because someone may be concurrently > + * dirtying pages, and we might have synced a lot of > + * newly appeared dirty pages, but have not synced all > + * of the old dirty pages. > + */ > + if (mpd->wbc->sync_mode == WB_SYNC_NONE && left <= 0) > + goto out; > + > /* If we can't merge this page, we are done. */ > if (mpd->map.m_len > 0 && mpd->next_page != page->index) > goto out; > @@ -2364,19 +2376,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) > if (err <= 0) > goto out; > err = 0; > - > - /* > - * Accumulated enough dirty pages? This doesn't apply > - * to WB_SYNC_ALL mode. For integrity sync we have to > - * keep going because someone may be concurrently > - * dirtying pages, and we might have synced a lot of > - * newly appeared dirty pages, but have not synced all > - * of the old dirty pages. > - */ > - if (mpd->wbc->sync_mode == WB_SYNC_NONE && > - mpd->next_page - mpd->first_page >= > - mpd->wbc->nr_to_write) > - goto out; > + left--; > } > pagevec_release(&pvec); > cond_resched(); > > > > Attached patch should do what's needed. Can you try whether > > it fixes the problem for you (it seems to work OK in my testing). > > In fact, I had wrote and tested your attached patch before my last post, > and it may trigger BUG() in mpage_release_unused_pages(), that is because > we touch mpd->next_page without locking current page, so it is better to > not increase mpd->next_page if the current page won't be processed. > > > Thanks, > -- > Ming Lei -- Jan Kara <jack@suse.cz> SUSE Labs, CR ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] ext4: fix checking on nr_to_write 2013-10-15 12:34 ` Jan Kara @ 2013-10-15 14:53 ` Ming Lei 0 siblings, 0 replies; 10+ messages in thread From: Ming Lei @ 2013-10-15 14:53 UTC (permalink / raw) To: Jan Kara Cc: Linux Kernel Mailing List, Ted Tso, linux-ext4, linux-fsdevel@vger.kernel.org On Tue, Oct 15, 2013 at 8:34 PM, Jan Kara <jack@suse.cz> wrote: > On Tue 15-10-13 19:15:56, Ming Lei wrote: >> > This won't quite work because if the page is fully mapped >> > mpage_process_page_bufs() will immediately submit the page and decrease >> > nr_to_write. So now you would end up writing less than you were asked for >> > in some cases. >> >> Yes, your are right, so how about below? >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 32c04ab..3cf7abb 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -2295,6 +2295,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd) >> struct address_space *mapping = mpd->inode->i_mapping; >> struct pagevec pvec; >> unsigned int nr_pages; >> + int left = mpd->wbc->nr_to_write; > 'long' please. Otherwise the patch looks fine. Thanks! OK, and I will submit the formal one with you ack. Thanks, -- Ming Lei ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2013-10-15 14:53 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-10-13 16:39 [PATCH] ext4: fix checking on nr_to_write Ming Lei 2013-10-14 0:42 ` Ming Lei 2013-10-14 12:58 ` Jan Kara 2013-10-14 13:50 ` Ming Lei 2013-10-14 17:34 ` Jan Kara 2013-10-15 2:25 ` Ming Lei 2013-10-15 10:39 ` Jan Kara 2013-10-15 11:15 ` Ming Lei 2013-10-15 12:34 ` Jan Kara 2013-10-15 14:53 ` Ming Lei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).