Re: [PATCH 07/11] writeback: support > 1 flusher thread per bdi

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jens Axboe <jens.axboe@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	chris.mason@oracle.com, david@fromorbit.com, hch@infradead.org,
	akpm@linux-foundation.org, yanmin_zhang@linux.intel.com,
	richard@rsk.demon.co.uk, damien.wyart@free.fr
Subject: Re: [PATCH 07/11] writeback: support > 1 flusher thread per bdi
Date: Thu, 28 May 2009 12:40:23 +0200	[thread overview]
Message-ID: <20090528104022.GZ11363@kernel.dk> (raw)
In-Reply-To: <20090528092743.GD29199@duck.suse.cz>

On Thu, May 28 2009, Jan Kara wrote:
> On Wed 27-05-09 11:41:48, Jens Axboe wrote:
> > Build on the bdi_writeback support by allowing registration of
> > more than 1 flusher thread. File systems can call bdi_add_flusher_task(bdi)
> > to add more flusher threads to the device. If they do so, they must also
> > provide a super_operations function to return the suitable bdi_writeback
> > struct from any given inode.
>   Isn't this patch in fact doing two different things? Looking at the
> changes it implements more flusher threads per BDI as you write above but
> it also does all that sync-work handling which isn't quite trivial and
> seems to be a separate thing...

Yes, but the separated work is a pre-requisite for multi thread support.
But I guess it would clean it up as a patchset if I added that with the
previous patch, that would save me from having to add the
writeback_acquire()/release changes. But I still have to do those, so
perhaps not... Unless you feel strongly about it, I'd prefer to leave it
as-is.

> > +static struct bdi_work *bdi_alloc_work(struct super_block *sb, long nr_pages,
> > +				       enum writeback_sync_modes sync_mode)
> > +{
> > +	struct bdi_work *work;
> > +
> > +	work = kmalloc(sizeof(*work), GFP_ATOMIC);
>   Why is this GFP_ATOMIC? Wouldn't GFP_NOFS be enough?
> But for now, GFP_ATOMIC may be fine since it excercises more the
> "allocation failed" path :).

GFP_NOFS/NOIO should be ok. I prefer GFP_ATOMIC since we never enter any
reclaim, so better safe than sorry. We still have to handle failure
either way, the rate of failure may be different though.

> > +	if (work)
> > +		bdi_work_init(work, sb, nr_pages, sync_mode);
> > +
> > +	return work;
> > +}
> > +
> > +void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
> > +			 long nr_pages, enum writeback_sync_modes sync_mode)
> > +{
> > +	const bool must_wait = sync_mode == WB_SYNC_ALL;
> > +	struct bdi_work work_stack, *work = NULL;
> > +
> > +	if (!must_wait)
> > +		work = bdi_alloc_work(sb, nr_pages, sync_mode);
> > +
> > +	if (!work) {
> > +		work = &work_stack;
> > +		bdi_work_init_on_stack(work, sb, nr_pages, sync_mode);
> >  	}
> >  
> > -	wb_start_writeback(&bdi->wb, sb, nr_pages, sync_mode);
> > -	return 0;
> > +	bdi_queue_work(bdi, work);
> > +	bdi_start_work(bdi, work);
> > +
> > +	/*
> > +	 * If the sync mode is WB_SYNC_ALL, block waiting for the work to
> > +	 * complete. If not, we only need to wait for the work to be started,
> > +	 * if we allocated it on-stack. We use the same mechanism, if the
> > +	 * wait bit is set in the bdi_work struct, then threads will not
> > +	 * clear pending until after they are done.
> > +	 *
> > +	 * Note that work == &work_stack if must_wait is true, so we don't
> > +	 * need to do call_rcu() here ever, since the completion path will
> > +	 * have done that for us.
> > +	 */
> > +	if (must_wait || work == &work_stack) {
> > +		bdi_wait_on_work_clear(work);
> > +		if (work != &work_stack)
> > +			call_rcu(&work->rcu_head, bdi_work_free);
> > +	}
> >  }
>   Looking into this, it seems a bit complex with all that on_stack, sync /
> nosync thing. Wouldn't a simple refcounting scheme be more clear? Alloc /
> init_work_on_stack gets a reference from bdi_work, queue_work gets another
> reference passed later to flusher thread. We drop the reference when we
> leave bdi_start_writeback() and when flusher thread is done with the work.
> When refcount hits zero, work struct is freed (when work is on stack, we
> just never drop the last reference)...

It wouldn't change the complexity of the stack vs non-stack at all,
since you have to do the same checks for when it's safe to proceed. And
having the single bit there with the hash bit wait queues makes that bit
easier.

-- 
Jens Axboe

next prev parent reply	other threads:[~2009-05-28 10:40 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-27  9:41 [PATCH 0/11] Per-bdi writeback flusher threads v8 Jens Axboe
2009-05-27  9:41 ` [PATCH 01/11] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
2009-05-27  9:41 ` [PATCH 02/11] btrfs: properly register fs backing device Jens Axboe
2009-05-27  9:41 ` [PATCH 03/11] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-05-27  9:41 ` [PATCH 04/11] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-05-27 11:11   ` Peter Zijlstra
2009-05-27 11:24     ` Jens Axboe
2009-05-27 15:14   ` Jan Kara
2009-05-27 17:50     ` Jens Axboe
2009-05-28 14:45       ` Jan Kara
2009-05-27  9:41 ` [PATCH 05/11] writeback: get rid of pdflush completely Jens Axboe
2009-05-27  9:41 ` [PATCH 06/11] writeback: separate the flushing state/task from the bdi Jens Axboe
2009-05-27  9:41 ` [PATCH 07/11] writeback: support > 1 flusher thread per bdi Jens Axboe
2009-05-28  9:27   ` Jan Kara
2009-05-28 10:40     ` Jens Axboe [this message]
2009-05-28 12:43       ` Jan Kara
2009-05-28 12:53         ` Jens Axboe
2009-05-28 13:58           ` Jan Kara
2009-05-27  9:41 ` [PATCH 08/11] writeback: allow sleepy exit of default writeback task Jens Axboe
2009-05-27  9:41 ` [PATCH 09/11] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-05-27  9:41 ` [PATCH 10/11] writeback: add name to backing_dev_info Jens Axboe
2009-05-27  9:41 ` [PATCH 11/11] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-05-27 12:41 ` [PATCH 0/11] Per-bdi writeback flusher threads v8 Richard Kennedy
2009-05-27 12:47   ` Jens Axboe
2009-05-27 14:47 ` Theodore Tso
2009-05-27 15:05   ` Jens Axboe
2009-05-27 17:53   ` Theodore Tso
2009-05-27 17:57     ` Jens Axboe
2009-05-27 17:58     ` Theodore Tso
2009-05-27 18:14       ` Jens Axboe
2009-05-27 19:15         ` Jens Axboe
2009-05-27 19:45           ` Jens Axboe
2009-05-28  0:49             ` Theodore Tso
2009-05-28  9:28               ` Jan Kara
2009-05-28  9:36                 ` Jens Axboe
2009-05-28 15:23                 ` Eric W. Biederman
2009-05-28 19:32                   ` Theodore Tso
2009-05-28 19:38                     ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
2009-05-28 11:46 ` [PATCH 07/11] writeback: support > 1 flusher thread per bdi Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090528104022.GZ11363@kernel.dk \
    --to=jens.axboe@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris.mason@oracle.com \
    --cc=damien.wyart@free.fr \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=richard@rsk.demon.co.uk \
    --cc=yanmin_zhang@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).