linux-kernel.vger.kernel.org archive mirror
From: Jens Axboe <jens.axboe@oracle.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	chris.mason@oracle.com, david@fromorbit.com, hch@infradead.org,
	akpm@linux-foundation.org, yanmin_zhang@linux.intel.com
Subject: Re: [PATCH 02/11] writeback: switch to per-bdi threads for flushing data
Date: Wed, 20 May 2009 13:32:34 +0200
Message-ID: <20090520113234.GT11363@kernel.dk>
In-Reply-To: <20090520111850.GB3760@duck.suse.cz>

On Wed, May 20 2009, Jan Kara wrote:
>   Hi Jens,
> 
>   a few comments here. Mainly, I still don't think the sys_sync() is
> working right - see comments below.

Thanks! I took the liberty of killing some of the code in between, to
make it easier to see.

> > +void bdi_writeback_all(struct super_block *sb, long nr_pages)
> > +{
> > +	struct backing_dev_info *bdi;
> > +
> > +	rcu_read_lock();
> > +
> > +restart:
> > +	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list) {
>   Isn't the RCU list here a bit of overengineering? AFAICS we use the list
> only here and if I'm grepping right, generic_sync_sb_inodes() is currently
> only used for data integrity sync (after your patches) from fs-writeback.c
> and by UBIFS to do equivalent of writeback_inodes(). So simple spinlock
> guarding the list should be just fine. Or am I missing something?

Sure, we could. But it's really not that much of a difference,
implementation-wise.

> > @@ -591,13 +711,10 @@ static void generic_sync_bdi_inodes(struct backing_dev_info *bdi,
> >  void generic_sync_sb_inodes(struct super_block *sb,
> >  				struct writeback_control *wbc)
> >  {
> > -	const int is_blkdev_sb = sb_is_blkdev_sb(sb);
> > -	struct backing_dev_info *bdi;
> > -
> > -	rcu_read_lock();
> > -	list_for_each_entry_rcu(bdi, &bdi_list, bdi_list)
> > -		generic_sync_bdi_inodes(bdi, wbc, sb, is_blkdev_sb);
> > -	rcu_read_unlock();
> > +	if (wbc->bdi)
> > +		bdi_start_writeback(wbc->bdi, sb, 0);
> > +	else
> > +		bdi_writeback_all(sb, 0);
>   It does not work like this. The way you call writeback here, you never
> end up calling __writeback_single_inode() with WB_SYNC_ALL set in wbc (your
> writeback routines always call inode writeback with WB_SYNC_NONE). And
> that is required for proper data integrity sync... So you have to somehow
> propagate this down to the writeback thread.

Good point, we need to pass down sync mode too. Not a big problem, we
can just add that to bdi_work as well.
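
As a rough userspace sketch of that idea (not the actual patch): the
work item carries the requested sync mode, and the flusher builds its
writeback_control from the work item instead of hardcoding
WB_SYNC_NONE. The layout of struct bdi_work and the helper names
bdi_work_init() and wbc_from_work() are assumptions made for
illustration, and both structs are reduced stand-ins for the kernel
ones.

```c
#include <assert.h>
#include <stddef.h>

/* Modeled on the kernel's writeback sync modes. */
enum writeback_sync_modes {
	WB_SYNC_NONE,	/* don't wait on anything */
	WB_SYNC_ALL,	/* wait on every mapping (data integrity) */
};

/* Hypothetical work item; the field layout is an assumption. */
struct bdi_work {
	long nr_pages;
	enum writeback_sync_modes sync_mode;
};

/* Reduced stand-in for struct writeback_control. */
struct writeback_control {
	long nr_to_write;
	enum writeback_sync_modes sync_mode;
};

/* Caller side: record the requested sync mode in the work item. */
static void bdi_work_init(struct bdi_work *work, long nr_pages,
			  enum writeback_sync_modes sync_mode)
{
	assert(work != NULL);
	work->nr_pages = nr_pages;
	work->sync_mode = sync_mode;
}

/* Flusher side: take the sync mode from the work item. */
static struct writeback_control wbc_from_work(const struct bdi_work *work)
{
	struct writeback_control wbc = {
		.nr_to_write	= work->nr_pages,
		.sync_mode	= work->sync_mode,
	};

	return wbc;
}
```

With something like this, a data integrity caller would queue the work
with WB_SYNC_ALL and the thread would see that mode in its wbc.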

>   Alternatively, what probably makes a lot of sense, is to separate data
> integrity sync path from just data writeback. In the first case we care
> more about correctness, in the second case we care more about performance
> and overall throughput.

Yep, agreed, that would clean it up as well. I'll include that in the next
revision, I think I'll post it on Friday.

>   BTW your patch also significantly changes one thing: With your patch data
> integrity sync is done by flusher threads while previously it was done from
> the context of the thread calling sync(). I'm undecided whether it is a
> good or bad thing but it definitely deserves a comment in the changelog.

I'll look at the implications of this again; perhaps it'll be better to
just switch it back for now.

> > +static int bdi_forker_task(void *ptr)
> > +{
> > +	struct backing_dev_info *bdi, *me = ptr;
> > +
> > +	for (;;) {
> > +		DEFINE_WAIT(wait);
> > +
> > +		/*
> > +		 * Should never trigger on the default bdi
> > +		 */
> > +		WARN_ON(bdi_has_dirty_io(me));
> > +
> > +		prepare_to_wait(&me->wait, &wait, TASK_INTERRUPTIBLE);
> > +		smp_mb();
>   Wouldn't the code look simpler like:
> 	spin_lock_bh(&bdi_lock);
> 	if (list_empty(&bdi_pending_list)) {
> 		spin_unlock_bh(&bdi_lock);
> 		schedule();
> 	} else {
> 		bdi = list_entry(bdi_pending_list.next,
> 				 struct backing_dev_info, bdi_list);
> 		list_del_init(&bdi->bdi_list);
> 		spin_unlock_bh(&bdi_lock);
> 		if (bdi->task)
> 			continue;
> 		... do work ...
> 	}

Not a bad suggestion, I'll fiddle with it a bit.
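
For reference, a userspace model of the suggested shape, under some
loud assumptions: a C11 atomic_flag spinlock stands in for
spin_lock_bh(), a hand-rolled singly linked FIFO stands in for
bdi_pending_list, and has_task stands in for bdi->task. The names
pending_add_tail() and forker_next() are made up for this sketch. The
point is only that the emptiness check and the removal happen under
one lock hold, so the unlocked list_empty() test goes away.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Reduced stand-in for struct backing_dev_info. */
struct bdi {
	struct bdi *next;	/* link on the pending list */
	int has_task;		/* models bdi->task != NULL */
};

static atomic_flag bdi_lock = ATOMIC_FLAG_INIT;
static struct bdi *pending_head, *pending_tail;

static void lock_pending(void)
{
	while (atomic_flag_test_and_set_explicit(&bdi_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void unlock_pending(void)
{
	atomic_flag_clear_explicit(&bdi_lock, memory_order_release);
}

/* Queue a bdi at the back of the pending list. */
static void pending_add_tail(struct bdi *bdi)
{
	assert(bdi != NULL);
	lock_pending();
	bdi->next = NULL;
	if (pending_tail)
		pending_tail->next = bdi;
	else
		pending_head = bdi;
	pending_tail = bdi;
	unlock_pending();
}

/*
 * Pop entries until one without a task is found. Returns NULL when
 * the list is empty, which is where the real loop would schedule().
 */
static struct bdi *forker_next(void)
{
	for (;;) {
		struct bdi *bdi;

		lock_pending();
		bdi = pending_head;
		if (bdi) {
			pending_head = bdi->next;
			if (!pending_head)
				pending_tail = NULL;
			bdi->next = NULL;
		}
		unlock_pending();

		/* Empty list, or an entry that still needs a thread. */
		if (!bdi || !bdi->has_task)
			return bdi;
		/* Entry already got set up: skip it and look again. */
	}
}
```

The sleep/wakeup side (prepare_to_wait() and friends) is deliberately
left out of the model.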

> 
> > +		if (list_empty(&bdi_pending_list))
> > +			schedule();
> > +		else {
> > +repeat:
> > +			bdi = NULL;
> > +
> > +			spin_lock_bh(&bdi_lock);
> > +			if (!list_empty(&bdi_pending_list)) {
> > +				bdi = list_entry(bdi_pending_list.next,
> > +						 struct backing_dev_info,
> > +						 bdi_list);
> > +				list_del_init(&bdi->bdi_list);
> > +			}
> > +			spin_unlock_bh(&bdi_lock);
> > +
> > +			/*
> > +			 * If no bdi or bdi already got setup, continue
> > +			 */
> > +			if (!bdi || bdi->task)
> > +				continue;
> > +
> > +			bdi->task = kthread_run(bdi_start_fn, bdi, "bdi-%s",
> > +						dev_name(bdi->dev));
> > +			/*
> > +			 * If task creation fails, then readd the bdi to
> > +			 * the pending list and force writeout of the bdi
> > +			 * from this forker thread. That will free some memory
> > +			 * and we can try again.
> > +			 */
> > +			if (!bdi->task) {
> > +				struct writeback_control wbc = {
> > +					.bdi			= bdi,
> > +					.sync_mode		= WB_SYNC_NONE,
> > +					.older_than_this	= NULL,
> > +					.range_cyclic		= 1,
> > +				};
> > +
> > +				/*
> > +				 * Add this 'bdi' to the back, so we get
> > +				 * a chance to flush other bdi's to free
> > +				 * memory.
> > +				 */
> > +				spin_lock_bh(&bdi_lock);
> > +				list_add_tail(&bdi->bdi_list,
> > +						&bdi_pending_list);
> > +				spin_unlock_bh(&bdi_lock);
> > +
> > +				wbc.nr_to_write = 1024;
> > +				generic_sync_bdi_inodes(NULL, &wbc);
> > +				goto repeat;
> > +			}
> > +		}
> > +
> > +		finish_wait(&me->wait, &wait);
> > +	}
> > +
> > +	return 0;

Thanks for your review, Jan, always helpful!

-- 
Jens Axboe

