* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
[not found] ` <20090909150601.159061863@intel.com>
@ 2009-09-09 15:44 ` Jan Kara
2009-09-10 1:42 ` Wu Fengguang
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2009-09-09 15:44 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason,
Peter Zijlstra, Christoph Hellwig, jack, Artem Bityutskiy, LKML,
linux-fsdevel
On Wed 09-09-09 22:51:48, Wu Fengguang wrote:
> Some filesystem may choose to write much more than ratelimit_pages
> before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> determine number to write based on real number of dirtied pages.
>
> The increased write_chunk may make the dirtier more bumpy. This is
> filesystem writers' duty not to dirty too much at a time without
> checking the ratelimit.
I don't get this. balance_dirty_pages_ratelimited_nr() is called when we
dirty the page, not when we write it out. So a problem would only happen if
filesystem dirties pages by set_page_dirty() and won't call
balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write()
and do_wp_page() takes care of that. So where's the problem?
Honza
> ---
> mm/page-writeback.c | 13 +++++++------
> 1 file changed, 7 insertions(+), 6 deletions(-)
>
> --- linux.orig/mm/page-writeback.c 2009-09-09 21:19:21.000000000 +0800
> +++ linux/mm/page-writeback.c 2009-09-09 21:25:45.000000000 +0800
> @@ -44,12 +44,12 @@ static long ratelimit_pages = 32;
> /*
> * When balance_dirty_pages decides that the caller needs to perform some
> * non-background writeback, this is how many pages it will attempt to write.
> - * It should be somewhat larger than RATELIMIT_PAGES to ensure that reasonably
> + * It should be somewhat larger than dirtied pages to ensure that reasonably
> * large amounts of I/O are submitted.
> */
> -static inline long sync_writeback_pages(void)
> +static inline long sync_writeback_pages(unsigned long dirtied)
> {
> - return ratelimit_pages + ratelimit_pages / 2;
> + return dirtied + dirtied / 2;
> }
>
> /* The following parameters are exported via /proc/sys/vm */
> @@ -476,7 +476,8 @@ get_dirty_limits(unsigned long *pbackgro
> * If we're over `background_thresh' then pdflush is woken to perform some
> * writeout.
> */
> -static void balance_dirty_pages(struct address_space *mapping)
> +static void balance_dirty_pages(struct address_space *mapping,
> + unsigned long write_chunk)
> {
> long nr_reclaimable, bdi_nr_reclaimable;
> long nr_writeback, bdi_nr_writeback;
> @@ -484,7 +485,6 @@ static void balance_dirty_pages(struct a
> unsigned long dirty_thresh;
> unsigned long bdi_thresh;
> unsigned long pages_written = 0;
> - unsigned long write_chunk = sync_writeback_pages();
>
> struct backing_dev_info *bdi = mapping->backing_dev_info;
>
> @@ -638,9 +638,10 @@ void balance_dirty_pages_ratelimited_nr(
> p = &__get_cpu_var(bdp_ratelimits);
> *p += nr_pages_dirtied;
> if (unlikely(*p >= ratelimit)) {
> + ratelimit = sync_writeback_pages(*p);
> *p = 0;
> preempt_enable();
> - balance_dirty_pages(mapping);
> + balance_dirty_pages(mapping, ratelimit);
> return;
> }
> preempt_enable();
>
> --
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 1/7] writeback: cleanup writeback_single_inode()
[not found] ` <20090909150600.330539880@intel.com>
@ 2009-09-09 15:45 ` Jan Kara
0 siblings, 0 replies; 19+ messages in thread
From: Jan Kara @ 2009-09-09 15:45 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason,
Peter Zijlstra, Christoph Hellwig, jack, Artem Bityutskiy, LKML,
linux-fsdevel
On Wed 09-09-09 22:51:42, Wu Fengguang wrote:
> Make the if-else straight in writeback_single_inode().
> No behavior change.
>
> Cc: Michael Rubin <mrubin@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
The patch looks good.
Acked-by: Jan Kara <jack@suse.cz>
> ---
> fs/fs-writeback.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
> --- linux.orig/fs/fs-writeback.c 2009-09-09 21:40:41.000000000 +0800
> +++ linux/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800
> @@ -417,8 +417,13 @@ writeback_single_inode(struct inode *ino
> spin_lock(&inode_lock);
> inode->i_state &= ~I_SYNC;
> if (!(inode->i_state & (I_FREEING | I_CLEAR))) {
> - if (!(inode->i_state & I_DIRTY) &&
> - mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> + if (inode->i_state & I_DIRTY) {
> + /*
> + * Someone redirtied the inode while were writing back
> + * the pages.
> + */
> + redirty_tail(inode);
> + } else if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> /*
> * We didn't write back all the pages. nfs_writepages()
> * sometimes bales out without doing anything. Redirty
> @@ -462,12 +467,6 @@ writeback_single_inode(struct inode *ino
> inode->i_state |= I_DIRTY_PAGES;
> redirty_tail(inode);
> }
> - } else if (inode->i_state & I_DIRTY) {
> - /*
> - * Someone redirtied the inode while were writing back
> - * the pages.
> - */
> - redirty_tail(inode);
> } else if (atomic_read(&inode->i_count)) {
> /*
> * The inode is clean, inuse
>
> --
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 2/7] writeback: fix queue_io() ordering
[not found] ` <20090909150600.451373732@intel.com>
@ 2009-09-09 15:53 ` Jan Kara
2009-09-10 1:26 ` Wu Fengguang
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2009-09-09 15:53 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason,
Peter Zijlstra, Christoph Hellwig, jack, Artem Bityutskiy, LKML,
linux-fsdevel
On Wed 09-09-09 22:51:43, Wu Fengguang wrote:
> This was not a bug, since b_io is empty for kupdate writeback.
> The next patch will do requeue_io() for non-kupdate writeback,
> so let's fix it.
But doesn't this patch also need your "anti-starvation" patch?
Looking into the code, we put inode to b_more_io when nr_to_write
drops to zero and this way we'd just start writing it again
in the next round...
Honza
>
> CC: Dave Chinner <david@fromorbit.com>
> Cc: Martin Bligh <mbligh@google.com>
> Cc: Michael Rubin <mrubin@google.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> ---
> fs/fs-writeback.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> --- linux.orig/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800
> +++ linux/fs/fs-writeback.c 2009-09-09 21:45:15.000000000 +0800
> @@ -313,11 +313,14 @@ static void move_expired_inodes(struct l
> }
>
> /*
> - * Queue all expired dirty inodes for io, eldest first.
> + * Queue all expired dirty inodes for io, eldest first:
> + * (newly dirtied) => b_dirty inodes
> + * => b_more_io inodes
> + * => remaining inodes in b_io => (dequeue for sync)
> */
> static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this)
> {
> - list_splice_init(&wb->b_more_io, wb->b_io.prev);
> + list_splice_init(&wb->b_more_io, &wb->b_io);
> move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this);
> }
>
>
> --
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES
[not found] ` <20090909150600.874037375@intel.com>
@ 2009-09-09 23:29 ` Theodore Tso
2009-09-10 0:13 ` Wu Fengguang
2009-09-10 4:53 ` Peter Zijlstra
0 siblings, 2 replies; 19+ messages in thread
From: Theodore Tso @ 2009-09-09 23:29 UTC (permalink / raw)
To: Wu Fengguang
Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason,
Peter Zijlstra, Christoph Hellwig, jack, Artem Bityutskiy, LKML,
linux-fsdevel
On Wed, Sep 09, 2009 at 10:51:46PM +0800, Wu Fengguang wrote:
> + * The maximum number of pages to writeout in a single periodic/background
> + * writeback operation. 64MB means I_SYNC may be hold for up to 1 second.
> + * This is not a big problem since we normally do kind of trylock on I_SYNC
> + * for non-data-integrity writes. Userspace tasks doing throttled writeback
> + * do not use this value.
What's your justification for using 64MB? Where are you getting 1
second from? On a fast RAID array 64MB can be written in much less
than 1 second.
More generally, I assume your patches conflict with Jens' per-bdi patches?
- Ted
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES
2009-09-09 23:29 ` [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES Theodore Tso
@ 2009-09-10 0:13 ` Wu Fengguang
2009-09-10 4:53 ` Peter Zijlstra
1 sibling, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 0:13 UTC (permalink / raw)
To: Theodore Tso, Andrew Morton, Jens Axboe, Dave Chinner,
Chris Mason
On Thu, Sep 10, 2009 at 07:29:38AM +0800, Theodore Tso wrote:
> On Wed, Sep 09, 2009 at 10:51:46PM +0800, Wu Fengguang wrote:
> > + * The maximum number of pages to writeout in a single periodic/background
> > + * writeback operation. 64MB means I_SYNC may be hold for up to 1 second.
> > + * This is not a big problem since we normally do kind of trylock on I_SYNC
> > + * for non-data-integrity writes. Userspace tasks doing throttled writeback
> > + * do not use this value.
>
> What's your justification for using 64MB? Where are you getting 1
> second from? On a fast RAID array 64MB can be written in much less
> than 1 second.
Because this value will be used for desktop and server alike, we have
to consider the worst case - for a laptop, it takes about 1 second to
write 64MB. It's not accurate to say I_SYNC may be hold for up to 1
second, but the same I_SYNC will be taken, dropped because of
congestion, retaken, and this loop could go on for 1 second until a
large file have been written 64MB. In the mean while, in the normal
case of a single kupdate thread per bdi, that file blocks writeout of
other files for 1 second.
> More generally, I assume your patches conflict with Jens' per-bdi patches?
It is based on latest linux-next tree with the vm.max_writeback_mb
patch reverted. The changelog only states the need to increase the
size, but not mentioned why it should be made tunable. So I tend to
just bump up MAX_WRITEBACK_PAGES. It seems that a large value won't
hurt anyone. Or are there demand to further increase it a lot for some
server configurations?
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 2/7] writeback: fix queue_io() ordering
2009-09-09 15:53 ` [RFC][PATCH 2/7] writeback: fix queue_io() ordering Jan Kara
@ 2009-09-10 1:26 ` Wu Fengguang
2009-09-10 14:14 ` Jan Kara
0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 1:26 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason,
Peter Zijlstra, Christoph Hellwig, jack@suse.cz, Artem Bityutskiy,
LKML, linux-fsdevel@vger.kernel.org
On Wed, Sep 09, 2009 at 11:53:30PM +0800, Jan Kara wrote:
> On Wed 09-09-09 22:51:43, Wu Fengguang wrote:
> > This was not a bug, since b_io is empty for kupdate writeback.
> > The next patch will do requeue_io() for non-kupdate writeback,
> > so let's fix it.
> But doesn't this patch also need your "anti-starvation" patch?
Honza, can you show me that patch?
> Looking into the code, we put inode to b_more_io when nr_to_write
> drops to zero and this way we'd just start writing it again
> in the next round...
I'm confused. It's OK to start it in next round. Starvation can
occur if we start it immediately in the next writeback_inodes()
invocation. How can that happen with this patch?
Thanks,
Fengguang
> Honza
> >
> > CC: Dave Chinner <david@fromorbit.com>
> > Cc: Martin Bligh <mbligh@google.com>
> > Cc: Michael Rubin <mrubin@google.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > ---
> > fs/fs-writeback.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > --- linux.orig/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800
> > +++ linux/fs/fs-writeback.c 2009-09-09 21:45:15.000000000 +0800
> > @@ -313,11 +313,14 @@ static void move_expired_inodes(struct l
> > }
> >
> > /*
> > - * Queue all expired dirty inodes for io, eldest first.
> > + * Queue all expired dirty inodes for io, eldest first:
> > + * (newly dirtied) => b_dirty inodes
> > + * => b_more_io inodes
> > + * => remaining inodes in b_io => (dequeue for sync)
> > */
> > static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this)
> > {
> > - list_splice_init(&wb->b_more_io, wb->b_io.prev);
> > + list_splice_init(&wb->b_more_io, &wb->b_io);
> > move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this);
> > }
> >
> >
> > --
> >
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-09 15:44 ` [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages Jan Kara
@ 2009-09-10 1:42 ` Wu Fengguang
2009-09-10 12:57 ` Chris Mason
0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 1:42 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason,
Peter Zijlstra, Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote:
> On Wed 09-09-09 22:51:48, Wu Fengguang wrote:
> > Some filesystem may choose to write much more than ratelimit_pages
> > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> > determine number to write based on real number of dirtied pages.
> >
> > The increased write_chunk may make the dirtier more bumpy. This is
> > filesystem writers' duty not to dirty too much at a time without
> > checking the ratelimit.
> I don't get this. balance_dirty_pages_ratelimited_nr() is called when we
> dirty the page, not when we write it out. So a problem would only happen if
> filesystem dirties pages by set_page_dirty() and won't call
> balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write()
> and do_wp_page() takes care of that. So where's the problem?
It seems that btrfs_file_write() is writing in chunks of up to 1024-pages
(1024 is the computed nrptrs value in a 32bit kernel). And it calls
balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk.
Thanks,
Fengguang
> > ---
> > mm/page-writeback.c | 13 +++++++------
> > 1 file changed, 7 insertions(+), 6 deletions(-)
> >
> > --- linux.orig/mm/page-writeback.c 2009-09-09 21:19:21.000000000 +0800
> > +++ linux/mm/page-writeback.c 2009-09-09 21:25:45.000000000 +0800
> > @@ -44,12 +44,12 @@ static long ratelimit_pages = 32;
> > /*
> > * When balance_dirty_pages decides that the caller needs to perform some
> > * non-background writeback, this is how many pages it will attempt to write.
> > - * It should be somewhat larger than RATELIMIT_PAGES to ensure that reasonably
> > + * It should be somewhat larger than dirtied pages to ensure that reasonably
> > * large amounts of I/O are submitted.
> > */
> > -static inline long sync_writeback_pages(void)
> > +static inline long sync_writeback_pages(unsigned long dirtied)
> > {
> > - return ratelimit_pages + ratelimit_pages / 2;
> > + return dirtied + dirtied / 2;
> > }
> >
> > /* The following parameters are exported via /proc/sys/vm */
> > @@ -476,7 +476,8 @@ get_dirty_limits(unsigned long *pbackgro
> > * If we're over `background_thresh' then pdflush is woken to perform some
> > * writeout.
> > */
> > -static void balance_dirty_pages(struct address_space *mapping)
> > +static void balance_dirty_pages(struct address_space *mapping,
> > + unsigned long write_chunk)
> > {
> > long nr_reclaimable, bdi_nr_reclaimable;
> > long nr_writeback, bdi_nr_writeback;
> > @@ -484,7 +485,6 @@ static void balance_dirty_pages(struct a
> > unsigned long dirty_thresh;
> > unsigned long bdi_thresh;
> > unsigned long pages_written = 0;
> > - unsigned long write_chunk = sync_writeback_pages();
> >
> > struct backing_dev_info *bdi = mapping->backing_dev_info;
> >
> > @@ -638,9 +638,10 @@ void balance_dirty_pages_ratelimited_nr(
> > p = &__get_cpu_var(bdp_ratelimits);
> > *p += nr_pages_dirtied;
> > if (unlikely(*p >= ratelimit)) {
> > + ratelimit = sync_writeback_pages(*p);
> > *p = 0;
> > preempt_enable();
> > - balance_dirty_pages(mapping);
> > + balance_dirty_pages(mapping, ratelimit);
> > return;
> > }
> > preempt_enable();
> >
> > --
> >
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES
2009-09-09 23:29 ` [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES Theodore Tso
2009-09-10 0:13 ` Wu Fengguang
@ 2009-09-10 4:53 ` Peter Zijlstra
2009-09-10 7:35 ` Wu Fengguang
1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2009-09-10 4:53 UTC (permalink / raw)
To: Theodore Tso
Cc: Wu Fengguang, Andrew Morton, Jens Axboe, Dave Chinner,
Chris Mason, Christoph Hellwig, jack, Artem Bityutskiy, LKML,
linux-fsdevel
On Wed, 2009-09-09 at 19:29 -0400, Theodore Tso wrote:
> On Wed, Sep 09, 2009 at 10:51:46PM +0800, Wu Fengguang wrote:
> > + * The maximum number of pages to writeout in a single periodic/background
> > + * writeback operation. 64MB means I_SYNC may be hold for up to 1 second.
> > + * This is not a big problem since we normally do kind of trylock on I_SYNC
> > + * for non-data-integrity writes. Userspace tasks doing throttled writeback
> > + * do not use this value.
>
> What's your justification for using 64MB? Where are you getting 1
> second from? On a fast RAID array 64MB can be written in much less
> than 1 second.
Worse, on my 5mb/s usb stick writing out 64m will take forever.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES
2009-09-10 4:53 ` Peter Zijlstra
@ 2009-09-10 7:35 ` Wu Fengguang
0 siblings, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 7:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Theodore Tso, Andrew Morton, Jens Axboe, Dave Chinner,
Chris Mason, Christoph Hellwig, jack@suse.cz, Artem Bityutskiy,
LKML, linux-fsdevel@vger.kernel.org
On Thu, Sep 10, 2009 at 12:53:18PM +0800, Peter Zijlstra wrote:
> On Wed, 2009-09-09 at 19:29 -0400, Theodore Tso wrote:
> > On Wed, Sep 09, 2009 at 10:51:46PM +0800, Wu Fengguang wrote:
> > > + * The maximum number of pages to writeout in a single periodic/background
> > > + * writeback operation. 64MB means I_SYNC may be hold for up to 1 second.
> > > + * This is not a big problem since we normally do kind of trylock on I_SYNC
> > > + * for non-data-integrity writes. Userspace tasks doing throttled writeback
> > > + * do not use this value.
> >
> > What's your justification for using 64MB? Where are you getting 1
> > second from? On a fast RAID array 64MB can be written in much less
> > than 1 second.
>
> Worse, on my 5mb/s usb stick writing out 64m will take forever.
cp bigfile1 bigfile2 /mnt/usb/
sync
In that case the user would notice that kernel keeps writing to one
file for up to 13 seconds before switching to another file.
A simple fix would look like this. It stops io continuation on one
file after 1 second. It will work because when io is congested, it
relies on the io continuation logic (based on last_file*) to retry
the same file until MAX_WRITEBACK_PAGES is reached. The queue-able
requests between congested <=> uncongested states are not very large.
For slow devices, the queue-able pages between empty <=> congested
states are also not very large. For example, my USB stick has
nr_requests=128 and max_sectors_kb=120. It would take less than 12MB
to congest this queue.
With this patch and my usb stick, the kernel may first sync 12MB for
bigfile1 (which takes 1-3 seconds), then sync bigfile2 for 1 second,
and then bigfile1 for 1 second, and so on.
It seems that we could now safely bump MAX_WRITEBACK_PAGES to even
larger values beyond 128MB :)
Thanks,
Fengguang
---
--- linux.orig/fs/fs-writeback.c 2009-09-10 15:02:48.000000000 +0800
+++ linux/fs/fs-writeback.c 2009-09-10 15:07:23.000000000 +0800
@@ -277,7 +277,8 @@ static void requeue_io(struct inode *ino
*/
static void requeue_partial_io(struct writeback_control *wbc, struct inode *inode)
{
- if (wbc->last_file_written == 0 ||
+ if (time_before(wbc->last_file_time + HZ, jiffies) ||
+ wbc->last_file_written == 0 ||
wbc->last_file_written >= MAX_WRITEBACK_PAGES)
return requeue_io(inode);
@@ -428,6 +429,7 @@ writeback_single_inode(struct inode *ino
if (wbc->last_file != inode->i_ino) {
wbc->last_file = inode->i_ino;
+ wbc->last_file_time = jiffies;
wbc->last_file_written = nr_to_write - wbc->nr_to_write;
} else
wbc->last_file_written += nr_to_write - wbc->nr_to_write;
--- linux.orig/include/linux/writeback.h 2009-09-10 15:07:28.000000000 +0800
+++ linux/include/linux/writeback.h 2009-09-10 15:08:46.000000000 +0800
@@ -46,6 +46,7 @@ struct writeback_control {
long nr_to_write; /* Write this many pages, and decrement
this for each page written */
unsigned long last_file; /* Inode number of last written file */
+ unsigned long last_file_time; /* First sync time for last file */
long last_file_written; /* Total pages written for last file */
long pages_skipped; /* Pages which were not written */
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-10 1:42 ` Wu Fengguang
@ 2009-09-10 12:57 ` Chris Mason
2009-09-10 13:21 ` Wu Fengguang
0 siblings, 1 reply; 19+ messages in thread
From: Chris Mason @ 2009-09-10 12:57 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Peter Zijlstra,
Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote:
> On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote:
> > On Wed 09-09-09 22:51:48, Wu Fengguang wrote:
> > > Some filesystem may choose to write much more than ratelimit_pages
> > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> > > determine number to write based on real number of dirtied pages.
> > >
> > > The increased write_chunk may make the dirtier more bumpy. This is
> > > filesystem writers' duty not to dirty too much at a time without
> > > checking the ratelimit.
> > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we
> > dirty the page, not when we write it out. So a problem would only happen if
> > filesystem dirties pages by set_page_dirty() and won't call
> > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write()
> > and do_wp_page() takes care of that. So where's the problem?
>
> It seems that btrfs_file_write() is writing in chunks of up to 1024-pages
> (1024 is the computed nrptrs value in a 32bit kernel). And it calls
> balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk.
I can easily change this to call more often, but we do always call
balance_dirty_pages to reflect how much ram we've really sent down.
-chris
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-10 12:57 ` Chris Mason
@ 2009-09-10 13:21 ` Wu Fengguang
2009-09-10 14:56 ` Peter Zijlstra
0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 13:21 UTC (permalink / raw)
To: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner,
Pete
On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote:
> On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote:
> > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote:
> > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote:
> > > > Some filesystem may choose to write much more than ratelimit_pages
> > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> > > > determine number to write based on real number of dirtied pages.
> > > >
> > > > The increased write_chunk may make the dirtier more bumpy. This is
> > > > filesystem writers' duty not to dirty too much at a time without
> > > > checking the ratelimit.
> > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we
> > > dirty the page, not when we write it out. So a problem would only happen if
> > > filesystem dirties pages by set_page_dirty() and won't call
> > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write()
> > > and do_wp_page() takes care of that. So where's the problem?
> >
> > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages
> > (1024 is the computed nrptrs value in a 32bit kernel). And it calls
> > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk.
>
> I can easily change this to call more often, but we do always call
> balance_dirty_pages to reflect how much ram we've really sent down.
Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The
need-change part is balance_dirty_pages_ratelimited_nr(), hence this
patch :)
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 2/7] writeback: fix queue_io() ordering
2009-09-10 1:26 ` Wu Fengguang
@ 2009-09-10 14:14 ` Jan Kara
2009-09-10 14:17 ` Wu Fengguang
0 siblings, 1 reply; 19+ messages in thread
From: Jan Kara @ 2009-09-10 14:14 UTC (permalink / raw)
To: Wu Fengguang
Cc: Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason,
Peter Zijlstra, Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu 10-09-09 09:26:24, Wu Fengguang wrote:
> On Wed, Sep 09, 2009 at 11:53:30PM +0800, Jan Kara wrote:
> > On Wed 09-09-09 22:51:43, Wu Fengguang wrote:
> > > This was not a bug, since b_io is empty for kupdate writeback.
> > > The next patch will do requeue_io() for non-kupdate writeback,
> > > so let's fix it.
> > But doesn't this patch also need your "anti-starvation" patch?
>
> Honza, can you show me that patch?
>
> > Looking into the code, we put inode to b_more_io when nr_to_write
> > drops to zero and this way we'd just start writing it again
> > in the next round...
>
> I'm confused. It's OK to start it in next round. Starvation can
> occur if we start it immediately in the next writeback_inodes()
> invocation. How can that happen with this patch?
Sorry, my fault. For kupdate, we splice the list only once s_io is empty
so that's not an issue. So the patch looks good.
Acked-by: Jan Kara <jack@suse.cz>
> > > CC: Dave Chinner <david@fromorbit.com>
> > > Cc: Martin Bligh <mbligh@google.com>
> > > Cc: Michael Rubin <mrubin@google.com>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > > ---
> > > fs/fs-writeback.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > --- linux.orig/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800
> > > +++ linux/fs/fs-writeback.c 2009-09-09 21:45:15.000000000 +0800
> > > @@ -313,11 +313,14 @@ static void move_expired_inodes(struct l
> > > }
> > >
> > > /*
> > > - * Queue all expired dirty inodes for io, eldest first.
> > > + * Queue all expired dirty inodes for io, eldest first:
> > > + * (newly dirtied) => b_dirty inodes
> > > + * => b_more_io inodes
> > > + * => remaining inodes in b_io => (dequeue for sync)
> > > */
> > > static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this)
> > > {
> > > - list_splice_init(&wb->b_more_io, wb->b_io.prev);
> > > + list_splice_init(&wb->b_more_io, &wb->b_io);
> > > move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this);
> > > }
> > >
> > >
> > > --
> > >
> > --
> > Jan Kara <jack@suse.cz>
> > SUSE Labs, CR
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 2/7] writeback: fix queue_io() ordering
2009-09-10 14:14 ` Jan Kara
@ 2009-09-10 14:17 ` Wu Fengguang
0 siblings, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 14:17 UTC (permalink / raw)
To: Jan Kara
Cc: Andrew Morton, Jens Axboe, Dave Chinner, Chris Mason,
Peter Zijlstra, Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu, Sep 10, 2009 at 10:14:15PM +0800, Jan Kara wrote:
> On Thu 10-09-09 09:26:24, Wu Fengguang wrote:
> > On Wed, Sep 09, 2009 at 11:53:30PM +0800, Jan Kara wrote:
> > > On Wed 09-09-09 22:51:43, Wu Fengguang wrote:
> > > > This was not a bug, since b_io is empty for kupdate writeback.
> > > > The next patch will do requeue_io() for non-kupdate writeback,
> > > > so let's fix it.
> > > But doesn't this patch also need your "anti-starvation" patch?
> >
> > Honza, can you show me that patch?
> >
> > > Looking into the code, we put inode to b_more_io when nr_to_write
> > > drops to zero and this way we'd just start writing it again
> > > in the next round...
> >
> > I'm confused. It's OK to start it in next round. Starvation can
> > occur if we start it immediately in the next writeback_inodes()
> > invocation. How can that happen with this patch?
> Sorry, my fault. For kupdate, we splice the list only once s_io is empty
> so that's not an issue. So the patch looks good.
> Acked-by: Jan Kara <jack@suse.cz>
Thank you :)
Regards,
Fengguang
> > > > CC: Dave Chinner <david@fromorbit.com>
> > > > Cc: Martin Bligh <mbligh@google.com>
> > > > Cc: Michael Rubin <mrubin@google.com>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
> > > > ---
> > > > fs/fs-writeback.c | 7 +++++--
> > > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > > >
> > > > --- linux.orig/fs/fs-writeback.c 2009-09-09 21:41:14.000000000 +0800
> > > > +++ linux/fs/fs-writeback.c 2009-09-09 21:45:15.000000000 +0800
> > > > @@ -313,11 +313,14 @@ static void move_expired_inodes(struct l
> > > > }
> > > >
> > > > /*
> > > > - * Queue all expired dirty inodes for io, eldest first.
> > > > + * Queue all expired dirty inodes for io, eldest first:
> > > > + * (newly dirtied) => b_dirty inodes
> > > > + * => b_more_io inodes
> > > > + * => remaining inodes in b_io => (dequeue for sync)
> > > > */
> > > > static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this)
> > > > {
> > > > - list_splice_init(&wb->b_more_io, wb->b_io.prev);
> > > > + list_splice_init(&wb->b_more_io, &wb->b_io);
> > > > move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this);
> > > > }
> > > >
> > > >
> > > > --
> > > >
> > > --
> > > Jan Kara <jack@suse.cz>
> > > SUSE Labs, CR
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-10 13:21 ` Wu Fengguang
@ 2009-09-10 14:56 ` Peter Zijlstra
2009-09-10 15:14 ` Wu Fengguang
0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2009-09-10 14:56 UTC (permalink / raw)
To: Wu Fengguang
Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner,
Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu, 2009-09-10 at 21:21 +0800, Wu Fengguang wrote:
> On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote:
> > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote:
> > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote:
> > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote:
> > > > > Some filesystem may choose to write much more than ratelimit_pages
> > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> > > > > determine number to write based on real number of dirtied pages.
> > > > >
> > > > > The increased write_chunk may make the dirtier more bumpy. This is
> > > > > filesystem writers' duty not to dirty too much at a time without
> > > > > checking the ratelimit.
> > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we
> > > > dirty the page, not when we write it out. So a problem would only happen if
> > > > filesystem dirties pages by set_page_dirty() and won't call
> > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write()
> > > > and do_wp_page() takes care of that. So where's the problem?
> > >
> > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages
> > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls
> > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk.
> >
> > I can easily change this to call more often, but we do always call
> > balance_dirty_pages to reflect how much ram we've really sent down.
>
> Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The
> need-change part is balance_dirty_pages_ratelimited_nr(), hence this
> patch :)
I'm not getting it, it calls set_page_dirty() for each page, right? and
then it calls into balance_dirty_pages_ratelimited_nr(), that sounds
right. What is the problem with that?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-10 14:56 ` Peter Zijlstra
@ 2009-09-10 15:14 ` Wu Fengguang
2009-09-10 15:31 ` Peter Zijlstra
0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 15:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner,
Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu, Sep 10, 2009 at 10:56:04PM +0800, Peter Zijlstra wrote:
> On Thu, 2009-09-10 at 21:21 +0800, Wu Fengguang wrote:
> > On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote:
> > > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote:
> > > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote:
> > > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote:
> > > > > > Some filesystem may choose to write much more than ratelimit_pages
> > > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> > > > > > determine number to write based on real number of dirtied pages.
> > > > > >
> > > > > > The increased write_chunk may make the dirtier more bumpy. This is
> > > > > > filesystem writers' duty not to dirty too much at a time without
> > > > > > checking the ratelimit.
> > > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we
> > > > > dirty the page, not when we write it out. So a problem would only happen if
> > > > > filesystem dirties pages by set_page_dirty() and won't call
> > > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write()
> > > > > and do_wp_page() takes care of that. So where's the problem?
> > > >
> > > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages
> > > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls
> > > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk.
> > >
> > > I can easily change this to call more often, but we do always call
> > > balance_dirty_pages to reflect how much ram we've really sent down.
> >
> > Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The
> > need-change part is balance_dirty_pages_ratelimited_nr(), hence this
> > patch :)
>
> I'm not getting it, it calls set_page_dirty() for each page, right? and
> then it calls into balance_dirty_pages_ratelimited_nr(), that sounds
> right. What is the problem with that?
It looks like btrfs_file_write() eventually calls
__set_page_dirty_buffers() which in turn won't call
balance_dirty_pages*(). This is why do_wp_page() calls
set_page_dirty_balance() to do balance_dirty_pages*().
So btrfs_file_write() explicitly calls
balance_dirty_pages_ratelimited_nr() to get throttled.
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-10 15:14 ` Wu Fengguang
@ 2009-09-10 15:31 ` Peter Zijlstra
2009-09-10 15:41 ` Wu Fengguang
0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2009-09-10 15:31 UTC (permalink / raw)
To: Wu Fengguang
Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner,
Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu, 2009-09-10 at 23:14 +0800, Wu Fengguang wrote:
> On Thu, Sep 10, 2009 at 10:56:04PM +0800, Peter Zijlstra wrote:
> > On Thu, 2009-09-10 at 21:21 +0800, Wu Fengguang wrote:
> > > On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote:
> > > > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote:
> > > > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote:
> > > > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote:
> > > > > > > Some filesystem may choose to write much more than ratelimit_pages
> > > > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> > > > > > > determine number to write based on real number of dirtied pages.
> > > > > > >
> > > > > > > The increased write_chunk may make the dirtier more bumpy. This is
> > > > > > > filesystem writers' duty not to dirty too much at a time without
> > > > > > > checking the ratelimit.
> > > > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we
> > > > > > dirty the page, not when we write it out. So a problem would only happen if
> > > > > > filesystem dirties pages by set_page_dirty() and won't call
> > > > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write()
> > > > > > and do_wp_page() takes care of that. So where's the problem?
> > > > >
> > > > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages
> > > > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls
> > > > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk.
> > > >
> > > > I can easily change this to call more often, but we do always call
> > > > balance_dirty_pages to reflect how much ram we've really sent down.
> > >
> > > Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The
> > > need-change part is balance_dirty_pages_ratelimited_nr(), hence this
> > > patch :)
> >
> > I'm not getting it, it calls set_page_dirty() for each page, right? and
> > then it calls into balance_dirty_pages_ratelimited_nr(), that sounds
> > right. What is the problem with that?
>
> It looks like btrfs_file_write() eventually calls
> __set_page_dirty_buffers() which in turn won't call
> balance_dirty_pages*(). This is why do_wp_page() calls
> set_page_dirty_balance() to do balance_dirty_pages*().
>
> So btrfs_file_write() explicitly calls
> balance_dirty_pages_ratelimited_nr() to get throttled.
Right, so what is wrong with than, and how does this patch fix that?
[ the only thing you have to be careful with is that you don't
excessively grow the error bound on the dirty limit ]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-10 15:31 ` Peter Zijlstra
@ 2009-09-10 15:41 ` Wu Fengguang
2009-09-10 15:54 ` Peter Zijlstra
0 siblings, 1 reply; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 15:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner,
Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu, Sep 10, 2009 at 11:31:38PM +0800, Peter Zijlstra wrote:
> On Thu, 2009-09-10 at 23:14 +0800, Wu Fengguang wrote:
> > On Thu, Sep 10, 2009 at 10:56:04PM +0800, Peter Zijlstra wrote:
> > > On Thu, 2009-09-10 at 21:21 +0800, Wu Fengguang wrote:
> > > > On Thu, Sep 10, 2009 at 08:57:42PM +0800, Chris Mason wrote:
> > > > > On Thu, Sep 10, 2009 at 09:42:01AM +0800, Wu Fengguang wrote:
> > > > > > On Wed, Sep 09, 2009 at 11:44:13PM +0800, Jan Kara wrote:
> > > > > > > On Wed 09-09-09 22:51:48, Wu Fengguang wrote:
> > > > > > > > Some filesystem may choose to write much more than ratelimit_pages
> > > > > > > > before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
> > > > > > > > determine number to write based on real number of dirtied pages.
> > > > > > > >
> > > > > > > > The increased write_chunk may make the dirtier more bumpy. This is
> > > > > > > > filesystem writers' duty not to dirty too much at a time without
> > > > > > > > checking the ratelimit.
> > > > > > > I don't get this. balance_dirty_pages_ratelimited_nr() is called when we
> > > > > > > dirty the page, not when we write it out. So a problem would only happen if
> > > > > > > filesystem dirties pages by set_page_dirty() and won't call
> > > > > > > balance_dirty_pages_ratelimited_nr(). But e.g. generic_perform_write()
> > > > > > > and do_wp_page() takes care of that. So where's the problem?
> > > > > >
> > > > > > It seems that btrfs_file_write() is writing in chunks of up to 1024-pages
> > > > > > (1024 is the computed nrptrs value in a 32bit kernel). And it calls
> > > > > > balance_dirty_pages_ratelimited_nr() each time it dirtied such a chunk.
> > > > >
> > > > > I can easily change this to call more often, but we do always call
> > > > > balance_dirty_pages to reflect how much ram we've really sent down.
> > > >
> > > > Btrfs is doing OK. 2MB/4MB looks like reasonable chunk sizes. The
> > > > need-change part is balance_dirty_pages_ratelimited_nr(), hence this
> > > > patch :)
> > >
> > > I'm not getting it, it calls set_page_dirty() for each page, right? and
> > > then it calls into balance_dirty_pages_ratelimited_nr(), that sounds
> > > right. What is the problem with that?
> >
> > It looks like btrfs_file_write() eventually calls
> > __set_page_dirty_buffers() which in turn won't call
> > balance_dirty_pages*(). This is why do_wp_page() calls
> > set_page_dirty_balance() to do balance_dirty_pages*().
> >
> > So btrfs_file_write() explicitly calls
> > balance_dirty_pages_ratelimited_nr() to get throttled.
>
> Right, so what is wrong with than, and how does this patch fix that?
>
> [ the only thing you have to be careful with is that you don't
> excessively grow the error bound on the dirty limit ]
Then we could form a loop:
btrfs_file_write(): dirty 1024 pages
balance_dirty_pages(): write up to 12 pages (= ratelimit_pages * 1.5)
in which the writeback rate cannot keep up with dirty rate,
and the dirty pages go all the way beyond dirty_thresh.
Sorry for writing such a vague changelog!
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-10 15:41 ` Wu Fengguang
@ 2009-09-10 15:54 ` Peter Zijlstra
2009-09-10 16:08 ` Wu Fengguang
0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2009-09-10 15:54 UTC (permalink / raw)
To: Wu Fengguang
Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner,
Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu, 2009-09-10 at 23:41 +0800, Wu Fengguang wrote:
> > > So btrfs_file_write() explicitly calls
> > > balance_dirty_pages_ratelimited_nr() to get throttled.
> >
> > Right, so what is wrong with than, and how does this patch fix that?
> >
> > [ the only thing you have to be careful with is that you don't
> > excessively grow the error bound on the dirty limit ]
>
> Then we could form a loop:
>
> btrfs_file_write(): dirty 1024 pages
> balance_dirty_pages(): write up to 12 pages (= ratelimit_pages * 1.5)
>
> in which the writeback rate cannot keep up with dirty rate,
> and the dirty pages go all the way beyond dirty_thresh.
Ah, ok so this is to keep the error bound on the dirty limit bounded,
because we can break out of balance_dirty_pages() early, the /* We've
done our duty */ break.
Which unbalances the duty vs the dirty ratio.
I figure that with the task dirty limit stuff we could maybe try to get
rid of this break.. worth a try.
> Sorry for writing such a vague changelog!
np, as long as we get there :-)
Change makes sense now, thanks!
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages
2009-09-10 15:54 ` Peter Zijlstra
@ 2009-09-10 16:08 ` Wu Fengguang
0 siblings, 0 replies; 19+ messages in thread
From: Wu Fengguang @ 2009-09-10 16:08 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Chris Mason, Jan Kara, Andrew Morton, Jens Axboe, Dave Chinner,
Christoph Hellwig, Artem Bityutskiy, LKML,
linux-fsdevel@vger.kernel.org
On Thu, Sep 10, 2009 at 11:54:29PM +0800, Peter Zijlstra wrote:
> On Thu, 2009-09-10 at 23:41 +0800, Wu Fengguang wrote:
> > > > So btrfs_file_write() explicitly calls
> > > > balance_dirty_pages_ratelimited_nr() to get throttled.
> > >
> > > Right, so what is wrong with than, and how does this patch fix that?
> > >
> > > [ the only thing you have to be careful with is that you don't
> > > excessively grow the error bound on the dirty limit ]
> >
> > Then we could form a loop:
> >
> > btrfs_file_write(): dirty 1024 pages
> > balance_dirty_pages(): write up to 12 pages (= ratelimit_pages * 1.5)
> >
> > in which the writeback rate cannot keep up with dirty rate,
> > and the dirty pages go all the way beyond dirty_thresh.
>
> Ah, ok so this is to keep the error bound on the dirty limit bounded,
> because we can break out of balance_dirty_pages() early, the /* We've
> done our duty */ break.
>
> Which unbalances the duty vs the dirty ratio.
Right!
> I figure that with the task dirty limit stuff we could maybe try to get
> rid of this break.. worth a try.
Be careful. Without that break, the time a task get throttled in a
single trip may go out of control. For example, task B get blocked
for 1000 seconds because there is a task A keep dirtying pages, in
the mean time task A's dirty thresh going down slowly, but still
larger than B's.
> > Sorry for writing such a vague changelog!
>
> np, as long as we get there :-)
>
> Change makes sense now, thanks!
May I add you ack?
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2009-09-10 16:08 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20090909145141.293229693@intel.com>
[not found] ` <20090909150601.159061863@intel.com>
2009-09-09 15:44 ` [RFC][PATCH 7/7] writeback: balance_dirty_pages() shall write more than dirtied pages Jan Kara
2009-09-10 1:42 ` Wu Fengguang
2009-09-10 12:57 ` Chris Mason
2009-09-10 13:21 ` Wu Fengguang
2009-09-10 14:56 ` Peter Zijlstra
2009-09-10 15:14 ` Wu Fengguang
2009-09-10 15:31 ` Peter Zijlstra
2009-09-10 15:41 ` Wu Fengguang
2009-09-10 15:54 ` Peter Zijlstra
2009-09-10 16:08 ` Wu Fengguang
[not found] ` <20090909150600.330539880@intel.com>
2009-09-09 15:45 ` [RFC][PATCH 1/7] writeback: cleanup writeback_single_inode() Jan Kara
[not found] ` <20090909150600.451373732@intel.com>
2009-09-09 15:53 ` [RFC][PATCH 2/7] writeback: fix queue_io() ordering Jan Kara
2009-09-10 1:26 ` Wu Fengguang
2009-09-10 14:14 ` Jan Kara
2009-09-10 14:17 ` Wu Fengguang
[not found] ` <20090909150600.874037375@intel.com>
2009-09-09 23:29 ` [RFC][PATCH 5/7] writeback: use 64MB MAX_WRITEBACK_PAGES Theodore Tso
2009-09-10 0:13 ` Wu Fengguang
2009-09-10 4:53 ` Peter Zijlstra
2009-09-10 7:35 ` Wu Fengguang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).