From: Andrew Morton <akpm@linux-foundation.org>
To: "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: linux-fsdevel@vger.kernel.org, Theodore Ts'o <tytso@mit.edu>,
linux-kernel@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH 3/4] writeback: pay attention to wbc->nr_to_write in write_cache_pages
Date: Fri, 30 Apr 2010 12:43:29 -0700
Message-ID: <20100430124329.10a4c02b.akpm@linux-foundation.org>
In-Reply-To: <87sk6dwka6.fsf@linux.vnet.ibm.com>
On Fri, 30 Apr 2010 11:31:53 +0530
"Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
> On Thu, 29 Apr 2010 14:39:31 -0700, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Tue, 20 Apr 2010 12:41:53 +1000
> > Dave Chinner <david@fromorbit.com> wrote:
> >
> > > If a filesystem writes more than one page in ->writepage, write_cache_pages
> > > fails to notice this and continues to attempt writeback when wbc->nr_to_write
> > > has gone negative - this trace was captured from XFS:
> > >
> > >
> > > wbc_writeback_start: towrt=1024
> > > wbc_writepage: towrt=1024
> > > wbc_writepage: towrt=0
> > > wbc_writepage: towrt=-1
> > > wbc_writepage: towrt=-5
> > > wbc_writepage: towrt=-21
> > > wbc_writepage: towrt=-85
> > >
> >
> > Bug.
> >
> > AFAICT it's a regression introduced by
> >
> > : commit 17bc6c30cf6bfffd816bdc53682dd46fc34a2cf4
> > : Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> > : AuthorDate: Thu Oct 16 10:09:17 2008 -0400
> > : Commit: Theodore Ts'o <tytso@mit.edu>
> > : CommitDate: Thu Oct 16 10:09:17 2008 -0400
> > :
> > : vfs: Add no_nrwrite_index_update writeback control flag
> >
> > I suggest that what you do here is remove the local `nr_to_write' from
> > write_cache_pages() and go back to directly using wbc->nr_to_write
> > within the loop.
> >
> > And thus we restore the convention that if the fs writes back more than
> > a single page, it subtracts (nr_written - 1) from wbc->nr_to_write.
> >
>
> My mistake, I never expected writepage to write more than one page.

The writeback code is tricky and easy to break in subtle ways.

> The interface said 'writepage' so it was natural to expect that it writes only
> one page. BTW the reason for the change is to give file systems which
> accumulate dirty pages using write_cache_pages and attempt to write
> them out later a chance to properly manage nr_to_write. Something like
>
> ext4_da_writepages
> -- write_cache_pages
> ---- collect dirty pages
> ---- return
> -- return
> -- now try to write out all the collected dirty pages (say 100)
> ---- only able to allocate blocks for 50 pages,
>      so update nr_to_write -= 50 and mark the other 50 pages dirty again
>
> So we want wbc->nr_to_write updated only by ext4_da_writepages.

So you want a ->writepage() implementation which doesn't actually write
a page at all - it just remembers that page for later.

Maybe that fs shouldn't be calling write_cache_pages() at all. After
all, write_cache_pages() is a wrapper which emits a sequence of calls
to ->writepage(), and ->writepage() writes a page.
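
To make the wrapper semantics and the regression concrete, here is a deliberately simplified user-space sketch. It is not the kernel's write_cache_pages(): struct wbc_model, model_writepage() and both loop functions are hypothetical stand-ins, and the cluster parameter models a filesystem such as XFS writing several pages per ->writepage() call. Under the convention quoted above, the fs subtracts (nr_written - 1) and the loop charges the remaining page to wbc->nr_to_write. A loop that counts in a private copy never sees the fs's subtraction, so it keeps going while wbc->nr_to_write goes negative, much as in the XFS trace.

```c
#include <assert.h>

/* Hypothetical stand-in for struct writeback_control: only the field
 * discussed in this thread, not the kernel definition. */
struct wbc_model {
	long nr_to_write;
};

/* Hypothetical ->writepage that writes `cluster` pages per call and
 * follows the convention of subtracting (nr_written - 1) from
 * nr_to_write; the caller charges the one page it asked for. */
static long model_writepage(struct wbc_model *wbc, long cluster)
{
	assert(cluster >= 1);
	wbc->nr_to_write -= cluster - 1;
	return cluster;
}

/* Loop that counts in a private copy: the fs's subtraction is never
 * seen, so it keeps calling ->writepage. Returns pages written. */
static long wcp_private_counter(struct wbc_model *wbc, long nr_dirty, long cluster)
{
	long nr_to_write = wbc->nr_to_write;
	long written = 0;

	while (written < nr_dirty) {
		written += model_writepage(wbc, cluster);
		if (--nr_to_write <= 0)
			break;
	}
	return written;
}

/* Loop that uses wbc->nr_to_write directly, so pages the fs wrote on
 * its own initiative count against the budget. */
static long wcp_shared_counter(struct wbc_model *wbc, long nr_dirty, long cluster)
{
	long written = 0;

	while (written < nr_dirty) {
		written += model_writepage(wbc, cluster);
		if (--wbc->nr_to_write <= 0)
			break;
	}
	return written;
}
```

With nr_to_write = 16 and four pages per call, the shared-counter loop stops after 16 pages, while the private-counter loop writes 64 pages and leaves wbc->nr_to_write at -32.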

Rather than hacking around, subverting things and breaking core kernel
code, let's step back and think more clearly about what to do.

One option would be to implement a new address_space_operation which
provides the new semantics in a well-understood fashion. Let's call it
writepage_prepare(?). Then reimplement write_cache_pages() so that if
->writepage_prepare() is available, it handles it in a sensible fashion
and doesn't break traditional filesystems.
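
The ->writepage_prepare() option might look roughly like the following sketch. Everything in it is hypothetical: writepage_prepare is only the name floated in this mail, and struct aops_model, wcp_model(), queue_page() and write_one() are illustrative stand-ins, not kernel APIs. The traditional ->writepage() path keeps charging nr_to_write per page, while the prepare path only queues pages, stops once it has queued nr_to_write of them, and leaves the accounting to the filesystem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct writeback_control. */
struct wbc_model {
	long nr_to_write;
};

/* Hypothetical subset of address_space_operations; writepage_prepare
 * is the name proposed in this mail, not an existing kernel hook. */
struct aops_model {
	int (*writepage)(long index, void *data);
	void (*writepage_prepare)(long index, void *data);
};

/* Model of a write_cache_pages() that dispatches on the new hook but
 * keeps the traditional behaviour for plain ->writepage. */
static long wcp_model(const struct aops_model *ops, struct wbc_model *wbc,
		      long nr_dirty, void *data)
{
	long visited = 0;

	assert(ops->writepage || ops->writepage_prepare);
	for (long index = 0; index < nr_dirty; index++) {
		visited++;
		if (ops->writepage_prepare) {
			/* Only queue the page; the fs charges nr_to_write
			 * later, once it knows how much it really wrote. */
			ops->writepage_prepare(index, data);
			if (visited >= wbc->nr_to_write)
				break;
			continue;
		}
		if (ops->writepage(index, data))
			break;
		if (--wbc->nr_to_write <= 0)
			break;
	}
	return visited;
}

/* Example fs-side state: a list of queued or written page indexes. */
struct batch {
	long pages[64];
	long n;
};

/* Example ->writepage_prepare: remember the page for later. */
static void queue_page(long index, void *data)
{
	struct batch *b = data;

	assert(b->n < 64);
	b->pages[b->n++] = index;
}

/* Example ->writepage: pretend to write one page by counting it. */
static int write_one(long index, void *data)
{
	struct batch *b = data;

	(void)index;
	b->n++;
	return 0;
}
```

With nr_to_write = 8, both paths visit 8 pages, but only the ->writepage() path drops wbc->nr_to_write to 0; the prepare path leaves it at 8 for the filesystem to charge once the I/O is actually issued.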

Or simply implement a new, different version of write_cache_pages() for
filesystems which wish to buffer in this fashion. The new
write_cache_pages_prepare()(?) would call ->writepage_prepare().
Internally it might share implementation with write_cache_pages().
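
A separate write_cache_pages_prepare() could be sketched like this, applied to the ext4_da_writepages flow quoted earlier (100 pages collected, blocks allocated for only 50). The names wcp_prepare_model() and da_writepages_model(), the toy struct mapping_model dirty-flag array and the can_alloc parameter are assumptions for illustration; real ext4 code also deals with page locks, writeback flags and block allocation, all omitted here. The collector never modifies nr_to_write, so the only update is the caller's subtraction of the pages it actually wrote.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 128

/* Hypothetical stand-in for struct writeback_control. */
struct wbc_model {
	long nr_to_write;
};

/* Toy page cache: one dirty flag per page index. */
struct mapping_model {
	bool dirty[NPAGES];
};

/* Hypothetical write_cache_pages_prepare(): gathers up to
 * wbc->nr_to_write dirty pages into `batch`, clearing their dirty
 * flags, but never touching nr_to_write. Returns the batch size. */
static long wcp_prepare_model(struct mapping_model *m,
			      const struct wbc_model *wbc, long *batch)
{
	long n = 0;

	for (long i = 0; i < NPAGES && n < wbc->nr_to_write; i++) {
		if (!m->dirty[i])
			continue;
		m->dirty[i] = false;
		batch[n++] = i;
	}
	return n;
}

/* Hypothetical ext4_da_writepages()-like caller: only `can_alloc`
 * pages get blocks. Those are charged to nr_to_write; the rest are
 * redirtied for a later pass. */
static long da_writepages_model(struct mapping_model *m,
				struct wbc_model *wbc, long can_alloc)
{
	long batch[NPAGES];
	long n, done;

	assert(can_alloc >= 0);
	n = wcp_prepare_model(m, wbc, batch);
	done = n < can_alloc ? n : can_alloc;

	for (long i = done; i < n; i++)
		m->dirty[batch[i]] = true;	/* no blocks: redirty */
	wbc->nr_to_write -= done;		/* the only update */
	return done;
}
```

Starting from 100 dirty pages and nr_to_write = 1024, the sketch writes 50 pages, leaves nr_to_write at 974 and redirties pages 50 to 99 for the next pass.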

There are lots of options. But the way in which write_cache_pages()
was extended to handle this ext4 requirement was rather unclean,
non-obvious and, umm, broken!
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs