linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>, Peter Zijlstra <peterz@infradead.org>,
	Chris Mason <chris.mason@oracle.com>,
	Artem Bityutskiy <dedekind1@gmail.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Theodore Ts'o <tytso@mit.edu>
Subject: Re: [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb
Date: Tue, 29 Sep 2009 19:35:06 +0200
Message-ID: <20090929173506.GE11573@duck.suse.cz>
In-Reply-To: <20090924083342.GA15918@localhost>

On Thu 24-09-09 16:33:42, Wu Fengguang wrote:
> On Mon, Sep 14, 2009 at 07:17:21PM +0800, Jan Kara wrote:
> > On Thu 10-09-09 17:49:10, Peter Zijlstra wrote:
> > > On Wed, 2009-09-09 at 16:23 +0200, Jan Kara wrote:
> > > >   Well, what I imagined we could do is:
> > > > Have a per-bdi variable 'pages_written' - that would reflect the amount of
> > > > pages written to the bdi since boot (OK, we'd have to handle overflows but
> > > > that's doable).
> > > > 
> > > > There will be a per-bdi variable 'pages_waited'. When a thread should sleep
> > > > in balance_dirty_pages() because we are over limits, it kicks the writeback thread
> > > > and does:
> > > >   to_wait = max(pages_waited, pages_written) + sync_dirty_pages() (or
> > > > whatever number we decide)
> > > >   pages_waited = to_wait
> > > >   sleep until pages_written reaches to_wait or we drop below dirty limits.
> > > > 
> > > > That will make sure each thread will sleep until writeback threads have done
> > > > their duty for the writing thread.
> > > > 
> > > > If we make sure sleeping threads are properly ordered on the wait queue,
> > > > we could always wake up just the first one and thus avoid the herding
> > > > effect. When we drop below dirty limits, we would just wake up the whole
> > > > waitqueue.
> > > > 
> > > > Does this sound reasonable?
> > > 
> > > That seems to go wrong when there are multiple tasks waiting on the same
> > > bdi; you'd count each page at 1/n of its weight.
> > > 
> > > Suppose pages_written = 1024, and 4 tasks block and compute their
> > > to_wait as pages_written + 256 = 1280; then we'd release all 4 of them
> > > after 256 pages are written, instead of 4*256, which would be
> > > pages_written = 2048.
> >   Well, there's some locking needed of course. The intent is to stack
> > demands as they come. So in case pages_written = 1024, pages_waited = 1024
> > we would do:
> > THREAD 1:
> > 
> > spin_lock
> > to_wait = 1024 + 256
> > pages_waited = 1280
> > spin_unlock
> > 
> > THREAD 2:
> > 
> > spin_lock
> > to_wait = 1280 + 256
> > pages_waited = 1536
> > spin_unlock
> > 
> >   So the weight of each page will be kept. The fact that the second thread
> > effectively waits until the first thread has its demand satisfied looks
> > strange at first sight, but we don't do better currently and I think
> > it's fine - if there were two writer threads, then soon the thread released
> > first will queue behind the thread still waiting, so long term the behavior
> > should be fair.
> 
> Yeah, FIFO queuing should be good enough.
> 
> I'd like to propose one more data structure for evaluation :)
> 
> - bdi->throttle_lock
> - bdi->throttle_list    pages to sync for each waiting task, taken from sync_writeback_pages()
> - bdi->throttle_pages   (counted down) pages to sync for the head task, shall be atomic_t
> 
> In balance_dirty_pages(), it would do
> 
>         nr_to_sync = sync_writeback_pages()
>         if (list_empty(bdi->throttle_list))  # I'm the only task
>                 bdi->throttle_pages = nr_to_sync
>         append nr_to_sync to bdi->throttle_list
>         kick off background writeback
>         wait
>         remove itself from bdi->throttle_list and wait list
>         set bdi->throttle_pages for new head task (or LONG_MAX)
> 
> In __bdi_writeout_inc(), it would do
> 
>         if (--bdi->throttle_pages <= 0)
>                 check and wake up head task
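A minimal user-space sketch of the throttle_list bookkeeping quoted above,
assuming a fixed-size ring buffer instead of a list_head and leaving out the
real throttle_lock and sleeping. struct throttle_bdi, throttle_enqueue() and
throttle_writeout_inc() are hypothetical names, not kernel API. For brevity
the waker dequeues the head and re-arms throttle_pages itself, whereas the
proposal has the woken task do that.

```c
#include <assert.h>
#include <limits.h>

#define MAX_WAITERS 16

/* Hypothetical model of the per-bdi throttle FIFO (bookkeeping only). */
struct throttle_bdi {
	long demand[MAX_WAITERS]; /* nr_to_sync of each waiting task, FIFO order */
	int task[MAX_WAITERS];    /* id of each waiting task */
	int head, count;
	long throttle_pages;      /* countdown for the head task */
};

static void throttle_init(struct throttle_bdi *b)
{
	b->head = b->count = 0;
	b->throttle_pages = LONG_MAX; /* nobody waiting: countdown disarmed */
}

/* balance_dirty_pages() side: queue a task that wants nr_to_sync pages. */
static void throttle_enqueue(struct throttle_bdi *b, int task, long nr_to_sync)
{
	int slot;

	assert(b->count < MAX_WAITERS);
	slot = (b->head + b->count) % MAX_WAITERS;
	if (b->count == 0) /* I'm the only task: start my countdown now */
		b->throttle_pages = nr_to_sync;
	b->demand[slot] = nr_to_sync;
	b->task[slot] = task;
	b->count++;
}

/* __bdi_writeout_inc() side: one page was written. Returns the id of the
 * task released by this page, or -1 if nobody was released. */
static int throttle_writeout_inc(struct throttle_bdi *b)
{
	int woken;

	if (b->count == 0 || --b->throttle_pages > 0)
		return -1;
	woken = b->task[b->head];
	b->head = (b->head + 1) % MAX_WAITERS;
	b->count--;
	/* hand the countdown to the new head task, or disarm it */
	b->throttle_pages = b->count ? b->demand[b->head] : LONG_MAX;
	return woken;
}
```

With two waiters asking for 2 and 3 pages, the first is released after the
2nd written page and the second only after the 5th, so every page is
counted once rather than 1/n times.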
  Yeah, this would work as well. I don't see a big difference between my
approach and this so if you get to implementing this, I'm happy :).
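For comparison, the stacked pages_written/pages_waited scheme from earlier in
the thread, again as a hypothetical user-space sketch: struct bdi_counts,
stack_demand() and demand_satisfied() are made-up names, the bdi spinlock and
the wait queue are left out, and the signed-difference comparison is only an
assumption of how the overflow handling could be done.

```c
#include <assert.h>

/* Hypothetical per-bdi counters; a real implementation would update
 * them under the bdi spinlock. */
struct bdi_counts {
	unsigned long pages_written; /* pages written to the bdi since boot */
	unsigned long pages_waited;  /* highest target handed out so far */
};

/* Throttled task: stack its demand on top of everyone queued before it
 * and return the pages_written value it has to wait for. */
static unsigned long stack_demand(struct bdi_counts *c, unsigned long nr_pages)
{
	unsigned long base = c->pages_waited;

	assert(nr_pages > 0);
	/* max(pages_waited, pages_written), compared wraparound-safely */
	if ((long)(c->pages_written - base) > 0)
		base = c->pages_written;
	c->pages_waited = base + nr_pages;
	return c->pages_waited;
}

/* The task's share is done once pages_written reaches to_wait; the
 * signed difference keeps the test valid when the counter wraps. */
static int demand_satisfied(const struct bdi_counts *c, unsigned long to_wait)
{
	return (long)(c->pages_written - to_wait) >= 0;
}
```

Replaying the example above (pages_written = pages_waited = 1024, two
requests of 256 pages) gives targets 1280 and 1536; with Peter's four tasks
the last one waits for pages_written = 2048, as intended.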

> In wb_writeback(), it would do
> 
>         if (args->for_background && exiting)
>                 wake up all throttled tasks
> To avoid waking up too many tasks at the same time, it can relax the
> background threshold a bit, so that __bdi_writeout_inc() becomes the
> only wake-up point in normal cases.
> 
>         if (args->for_background && !list_empty(bdi->throttle_list) &&
>                 over background_thresh - background_thresh / 32)
>                 keep writing pages;
  We want to wake up tasks when we get below dirty_limit (either global
or per-bdi), not when we get below the background threshold...
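A hypothetical sketch of the release test meant here. struct dirty_state and
may_release_throttled() are made-up names, and treating a task as still
throttled while it is over either the global or the per-bdi limit (so it is
released only once both are respected) is an assumption about how the two
limits combine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the numbers the throttling code compares. */
struct dirty_state {
	unsigned long nr_dirty;          /* global dirty + writeback pages */
	unsigned long dirty_thresh;      /* global dirty limit */
	unsigned long bdi_nr_dirty;      /* this bdi's dirty + writeback pages */
	unsigned long bdi_thresh;        /* this bdi's share of the dirty limit */
	unsigned long background_thresh; /* where background writeback starts */
};

static bool may_release_throttled(const struct dirty_state *s)
{
	bool exceeded;

	assert(s->background_thresh <= s->dirty_thresh);
	/* ASSUMPTION: still throttled while over either dirty limit */
	exceeded = s->bdi_nr_dirty > s->bdi_thresh ||
		   s->nr_dirty > s->dirty_thresh;
	/* background_thresh plays no part in the release decision */
	return !exceeded;
}
```

For example, 600 globally dirty pages against a background threshold of 500
still release the waiters as long as both dirty limits are respected, which
is exactly where this differs from the background-threshold check quoted
above.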

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


Thread overview: 67+ messages
2009-09-08  9:23 [PATCH 0/8] Per-bdi writeback flusher threads v19 Jens Axboe
2009-09-08  9:23 ` [PATCH 1/8] writeback: get rid of generic_sync_sb_inodes() export Jens Axboe
2009-09-08 10:27   ` Artem Bityutskiy
2009-09-08 10:41     ` Jens Axboe
2009-09-08 10:52       ` Artem Bityutskiy
2009-09-08 10:57         ` Jens Axboe
2009-09-08 11:01           ` Artem Bityutskiy
2009-09-08 11:05             ` Jens Axboe
2009-09-08 11:31               ` Artem Bityutskiy
2009-09-08  9:23 ` [PATCH 2/8] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-09-08  9:23 ` [PATCH 3/8] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-09-08 13:46   ` Daniel Walker
2009-09-08 14:21     ` Jens Axboe
2009-09-08  9:23 ` [PATCH 4/8] writeback: get rid of pdflush completely Jens Axboe
2009-09-08  9:23 ` [PATCH 5/8] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-09-08  9:23 ` [PATCH 6/8] writeback: add name to backing_dev_info Jens Axboe
2009-09-08  9:23 ` [PATCH 7/8] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-09-08  9:23 ` [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb Jens Axboe
2009-09-08 10:37   ` Artem Bityutskiy
2009-09-08 16:06     ` Peter Zijlstra
2009-09-08 16:29       ` Chris Mason
2009-09-08 16:56         ` Peter Zijlstra
2009-09-08 17:28           ` Chris Mason
2009-09-08 17:46             ` Peter Zijlstra
2009-09-08 17:55               ` Peter Zijlstra
2009-09-08 18:32                 ` Peter Zijlstra
2009-09-09 14:23                   ` Jan Kara
2009-09-09 14:37                     ` Wu Fengguang
2009-09-10 15:49                     ` Peter Zijlstra
2009-09-14 11:17                       ` Jan Kara
2009-09-24  8:33                         ` Wu Fengguang
2009-09-24 15:38                           ` Peter Zijlstra
2009-09-25  1:33                             ` Wu Fengguang
2009-09-29 17:35                           ` Jan Kara [this message]
2009-09-30  1:24                             ` Wu Fengguang
2009-09-30 11:55                               ` Jan Kara
2009-09-30 12:10                                 ` Jens Axboe
2009-10-01 15:17                                   ` Wu Fengguang
2009-10-01 13:36                                 ` Wu Fengguang
2009-10-01 14:22                                   ` Jan Kara
2009-10-01 14:54                                     ` Wu Fengguang
2009-10-01 21:35                                       ` Jan Kara
2009-10-02  2:25                                         ` Wu Fengguang
2009-10-02  9:54                                           ` Jan Kara
2009-10-02 10:34                                             ` Wu Fengguang
2009-09-08 18:35                 ` Chris Mason
2009-09-08 17:57               ` Chris Mason
2009-09-08 18:28                 ` Peter Zijlstra
2009-09-09  1:53           ` Dave Chinner
2009-09-09  3:52             ` Wu Fengguang
2009-09-08 18:06         ` Theodore Tso
     [not found]           ` <20090908181937.GA11545@infradead.org>
2009-09-08 19:34             ` Theodore Tso
2009-09-09  9:29         ` Wu Fengguang
2009-09-09 12:28           ` Christoph Hellwig
2009-09-09 12:32             ` Wu Fengguang
2009-09-09 12:36               ` Artem Bityutskiy
2009-09-09 12:37               ` Jens Axboe
2009-09-09 12:43                 ` Christoph Hellwig
2009-09-09 12:44                   ` Jens Axboe
2009-09-09 12:51                     ` Christoph Hellwig
2009-09-09 12:57                 ` Wu Fengguang
  -- strict thread matches above, loose matches on Subject: below --
2009-09-04  7:46 [PATCH 0/8] Per-bdi writeback flusher threads v18 Jens Axboe
2009-09-04  7:46 ` [PATCH 8/8] vm: Add an tuning knob for vm.max_writeback_mb Jens Axboe
2009-09-04 15:28   ` Richard Kennedy
2009-09-05 13:26     ` Jamie Lokier
2009-09-05 16:18       ` Richard Kennedy
2009-09-05 16:46     ` Theodore Tso
2009-09-07 19:09   ` Jan Kara
