From: Jan Kara <jack@suse.cz>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
chris.mason@oracle.com, david@fromorbit.com, hch@infradead.org,
akpm@linux-foundation.org, yanmin_zhang@linux.intel.com
Subject: Re: [PATCH 04/12] writeback: switch to per-bdi threads for flushing data
Date: Mon, 25 May 2009 14:34:35 +0200 [thread overview]
Message-ID: <20090525123434.GA5862@duck.suse.cz> (raw)
In-Reply-To: <20090525121648.GW11363@kernel.dk>
On Mon 25-05-09 14:16:48, Jens Axboe wrote:
> On Mon, May 25 2009, Jan Kara wrote:
> > On Mon 25-05-09 12:34:10, Jens Axboe wrote:
> > > On Mon, May 25 2009, Jan Kara wrote:
> > > > Hi Jens,
> > > >
> > > > I've noticed a few more things:
> > > >
> > > > On Mon 25-05-09 09:34:39, Jens Axboe wrote:
> > > > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > > > index 1137408..7cb4d02 100644
> > > > > --- a/fs/fs-writeback.c
> > > > > +++ b/fs/fs-writeback.c
> > > > > @@ -19,6 +19,8 @@
> > > > > #include <linux/sched.h>
> > > > > #include <linux/fs.h>
> > > > > #include <linux/mm.h>
> > > > > +#include <linux/kthread.h>
> > > > > +#include <linux/freezer.h>
> > > > > #include <linux/writeback.h>
> > > > > #include <linux/blkdev.h>
> > > > > #include <linux/backing-dev.h>
> > > > > @@ -61,10 +63,193 @@ int writeback_in_progress(struct backing_dev_info *bdi)
> > > > > */
> > > > > static void writeback_release(struct backing_dev_info *bdi)
> > > > > {
> > > > > - BUG_ON(!writeback_in_progress(bdi));
> > > > > + WARN_ON_ONCE(!writeback_in_progress(bdi));
> > > > > + bdi->wb_arg.nr_pages = 0;
> > > > > + bdi->wb_arg.sb = NULL;
> > > > > clear_bit(BDI_pdflush, &bdi->state);
> > > > > }
> > > > >
> > > > > +int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
> > > > > + long nr_pages, enum writeback_sync_modes sync_mode)
> > > > > +{
> > > > > + /*
> > > > > + * This only happens the first time someone kicks this bdi, so put
> > > > > + * it out-of-line.
> > > > > + */
> > > > > + if (unlikely(!bdi->task)) {
> > > > > + bdi_add_default_flusher_task(bdi);
> > > > > + return 1;
> > > > > + }
> > > > > +
> > > > > + if (writeback_acquire(bdi)) {
> > > > > + bdi->wb_arg.nr_pages = nr_pages;
> > > > > + bdi->wb_arg.sb = sb;
> > > > > + bdi->wb_arg.sync_mode = sync_mode;
> > > > > + /*
> > > > > + * make above store seen before the task is woken
> > > > > + */
> > > > > + smp_mb();
> > > > > + wake_up(&bdi->wait);
> > > > > + }
> > > > Hmm, wouldn't the interface be more useful if we could just pass down the
> > > > whole writeback_control to the flusher threads?
> > >
> > > Yeah, that's what the later patch does too, when support for > 1 thread
> > > has been added.
> > Really? I cannot see it in the series. I see that later you introduce
> > struct bdi_work but it still contains just nr_pages, sb_data and sync_mode.
> > What I meant was we could include 'struct writeback_control' into the bdi
> > (or bdi_work later) and just pass it down so that caller can have more
> > finegrained control of what writeback the thread is going to perform
> > (like nonblocking, older_than_this, maybe something else).
>
> Oh, I don't pass wbc down, but I do pass it async. The problem with
> passing down the wbc is that it is allocated on the stack. So if we do
> that, then we pretty much have to serialize on that. And I want to avoid
> that.
I see. OK, let's leave it as it is for now. If we find it's really needed,
we should be able to handle this later (by copying the structure from the
stack into dynamically allocated memory or so).
> > > > > void generic_sync_sb_inodes(struct super_block *sb,
> > > > > struct writeback_control *wbc)
> > > > > {
> > > > > - const int is_blkdev_sb = sb_is_blkdev_sb(sb);
> > > > > - struct backing_dev_info *bdi;
> > > > > -
> > > > > - mutex_lock(&bdi_lock);
> > > > > - list_for_each_entry(bdi, &bdi_list, bdi_list)
> > > > > - generic_sync_bdi_inodes(bdi, wbc, sb, is_blkdev_sb);
> > > > > - mutex_unlock(&bdi_lock);
> > > > > + if (wbc->bdi)
> > > > > + generic_sync_bdi_inodes(sb, wbc);
> > > > > + else
> > > > > + bdi_writeback_all(sb, wbc->nr_to_write, wbc->sync_mode);
> > > > Hmm, you write in the changelog that bdi_writeback_all() writes inline
> > > > now but AFAICS it still happens through the writeback threads. Which, on a
> > > > second though probably *is* what we want because we want writeback to go in
> > > > parallel on all devices we have.
> > >
> > > Note that this is patch 4, later it is sync. But yes, I think it should
> > Ah, right. I missed that.
> >
> > > be async as well, but we do need a way to wait on the submitted work to
> > > be processed in case of WB_SYNC_ALL.
> > Yes, this is a bit complicated. You'd need some notification bdi_work is
> > processed.
>
> It was actually not that hard, I already did it. So right now
> wb_writeback() does this:
>
> while (work) {
> get args;
> clear pending;
> do work;
> };
>
> If we just modify that to:
>
> while (work) {
> get args;
> if (sync_mode == WB_SYNC_NONE)
> clear pending;
> do work;
> if (sync_mode == WB_SYNC_ALL)
> clear_pending;
> };
>
> We get notification when the work has actually been issued. Then we can
> easily control whether or not to make a piece of work sync or not.
OK. That should work.
> > > > > * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back
> > > > > * the whole world. Returns 0 if a pdflush thread was dispatched. Returns
> > > > > * -1 if all pdflush threads were busy.
> > > > > */
> > > > > -int wakeup_pdflush(long nr_pages)
> > > > > +void wakeup_flusher_threads(long nr_pages)
> > > > > {
> > > > > if (nr_pages == 0)
> > > > > nr_pages = global_page_state(NR_FILE_DIRTY) +
> > > > > global_page_state(NR_UNSTABLE_NFS);
> > > > > - return pdflush_operation(background_writeout, nr_pages);
> > > > > + bdi_writeback_all(NULL, nr_pages, WB_SYNC_NONE);
> > > > > + return;
> > > > > }
> > > > ...
> > > > > -
> > > > > -static void laptop_flush(unsigned long unused)
> > > > > -{
> > > > > - sys_sync();
> > > > > -}
> > > > > -
> > > > > static void laptop_timer_fn(unsigned long unused)
> > > > > {
> > > > > - pdflush_operation(laptop_flush, 0);
> > > > > + wakeup_flusher_threads(0);
> > > > > }
> > > > >
> > > > > /*
> > > > Here you significantly change the behavior of laptop mode - previously it
> > > > reliably synced all the data to disk once the "laptop writeout timeout"
> > > > elapsed. The idea is that we want to write all the dirty data we have so
> > > > that the drive can go to sleep for a longer period.
> > > Yeah I know, I wrote the original laptop mode implementation :-)
> > Sorry, I wasn't aware of that.
> >
> > > > Now you changed that to asynchronous WB_SYNC_NONE writeback of some
> > > > number of pages. In particular if the disk gets congested, we'll just stop
> > > > doing writeback which probably isn't what we want for laptop mode...
> > >
> > > Congestion doesn't matter, we don't do nonblocking writeout from the
> > > threads at all. And it is sync later in the series.
> > >
> > > I'll switch it back to async in general throughout the series, and add
> > > some way of making sure we actually have things submitted before doing
> > > the filemap_fdatawait() in generic_sync_sb_inodes(). I think that should
> > > fix it.
> > OK, looking forward to your patches :)
>
> Fiddling with it now, probably a v7 posting coming up tomorrow. And
> thanks again for your great and detailed reviews!
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2009-05-25 12:34 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-05-25 7:34 [PATCH 0/12] Per-bdi writeback flusher threads #5 Jens Axboe
2009-05-25 7:34 ` [PATCH 01/12] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
2009-05-25 7:34 ` [PATCH 02/12] btrfs: properly register fs backing device Jens Axboe
2009-05-25 7:34 ` [PATCH 03/12] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-05-25 8:42 ` Jan Kara
2009-05-25 8:51 ` Jens Axboe
2009-05-25 7:34 ` [PATCH 04/12] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-05-25 9:58 ` Jan Kara
2009-05-25 10:34 ` Jens Axboe
2009-05-25 12:10 ` Jan Kara
2009-05-25 12:16 ` Jens Axboe
2009-05-25 12:34 ` Jan Kara [this message]
2009-05-26 8:56 ` Jens Axboe
2009-05-26 8:59 ` Christoph Hellwig
2009-05-26 9:07 ` Jens Axboe
2009-05-25 7:34 ` [PATCH 05/12] writeback: get rid of pdflush completely Jens Axboe
2009-05-25 7:34 ` [PATCH 06/12] writeback: separate the flushing state/task from the bdi Jens Axboe
2009-05-25 10:13 ` Jan Kara
2009-05-25 10:36 ` Jens Axboe
2009-05-25 7:34 ` [PATCH 07/12] writeback: support > 1 flusher thread per bdi Jens Axboe
2009-05-25 7:34 ` [PATCH 08/12] writeback: include default_backing_dev_info in writeback Jens Axboe
2009-05-25 7:34 ` [PATCH 09/12] writeback: allow sleepy exit of default writeback task Jens Axboe
2009-05-25 7:34 ` [PATCH 10/12] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-05-25 7:34 ` [PATCH 11/12] writeback: add name to backing_dev_info Jens Axboe
2009-05-25 7:34 ` [PATCH 12/12] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
-- strict thread matches above, loose matches on Subject: below --
2009-05-26 9:33 [PATCH 0/12] Per-bdi writeback flusher threads v7 Jens Axboe
2009-05-26 9:33 ` [PATCH 04/12] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-05-25 7:30 [PATCH 0/12] Per-bdi writeback flusher threads #5 Jens Axboe
2009-05-25 7:30 ` [PATCH 04/12] writeback: switch to per-bdi threads for flushing data Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090525123434.GA5862@duck.suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=chris.mason@oracle.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=jens.axboe@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=yanmin_zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).