From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: [PATCH 14/14] writeback: Per-sb dirty tracking
Date: Wed, 6 Aug 2014 09:44:16 +1000 [thread overview]
Message-ID: <20140805234416.GH20518@dastard> (raw)
In-Reply-To: <1406844053-25982-15-git-send-email-jack@suse.cz>
On Fri, Aug 01, 2014 at 12:00:53AM +0200, Jan Kara wrote:
> Switch inode dirty tracking lists to be per superblock instead of per
> bdi. This is a major step towards filesystems being able to do their
> own dirty tracking and selection of inodes for writeback if they desire
> so (e.g. because they journal or COW data and need to writeback inodes
> & pages in a specific order unknown to generic writeback code).
>
> Per superblock dirty lists also make selecting inodes for writeback
> somewhat simpler because we don't have to search for inodes from a
> particular superblock for some kinds of writeback (OTOH we pay for this
> by having to iterate through superblocks for all-bdi type of writeback)
> and this simplification will allow for an easier switch to a better
> scaling data structure for dirty inodes.
I think the WB_STATE_STALLED code is buggy w.r.t. unmount.
> @@ -672,6 +670,15 @@ static long writeback_inodes(struct bdi_writeback *wb,
> break;
> }
> }
> +
> + /*
> + * In case we made no progress in current IO batch and there are no
> + * inodes postponed for further writeback, set WB_STATE_STALLED
> + * so that flusher doesn't busyloop in case no dirty inodes can be
> + * written.
> + */
> + if (!wrote && list_empty(&wb->b_more_io))
> + wb->state |= WB_STATE_STALLED;
> spin_unlock(&wb->list_lock);
Last background writeback ends with WB_STATE_STALLED.
> @@ -771,26 +778,47 @@ static long bdi_writeback(struct backing_dev_info *bdi,
> } else if (work->for_background)
> oldest_jif = jiffies;
>
> + /*
> + * If we made some progress, clear stalled state to retry other
> + * writeback queues as well.
> + */
> + if (progress) {
> + spin_lock_bh(&bdi->wb_lock);
> + list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
> + wb->state &= ~WB_STATE_STALLED;
> + }
> + spin_unlock_bh(&bdi->wb_lock);
> + }
First time through we clear the stalled state by walking
&bdi->wq_list, but....
> +
> + if (work->sb) {
> + wb = &work->sb->s_dirty_inodes;
> + if (wb->state & WB_STATE_STALLED)
> + wb = NULL;
if the sb state is stalled we don't do writeback, and ....
> @@ -1015,6 +1017,13 @@ void kill_block_super(struct super_block *sb)
> struct block_device *bdev = sb->s_bdev;
> fmode_t mode = sb->s_mode;
>
> + /*
> + * Unregister superblock from periodic writeback. There may be
> + * writeback still running for it but we call sync_filesystem() later
> + * and that will execute only after any background writeback is stopped.
> + * This guarantees flusher won't touch sb that's going away.
> + */
> + bdi_writeback_queue_unregister(&sb->s_dirty_inodes);
> bdev->bd_super = NULL;
> generic_shutdown_super(sb);
We unregister the writeback queue from the BDI before unmount runs
sync_filesystem() from geneic_shutdown_super(sb), and ....
> +/*
> + * Unregister writeback queue from BDI. No further background writeback will be
> + * started against this superblock. However note that there may be writeback
> + * still running for the sb.
> + */
> +void bdi_writeback_queue_unregister(struct bdi_writeback *wb_queue)
> +{
> + struct backing_dev_info *bdi = wb_bdi(wb_queue);
> +
> + /* Make sure flusher cannot find the superblock any longer */
> + spin_lock_bh(&bdi->wb_lock);
> + list_del_init(&wb_queue->bdi_list);
> + spin_unlock_bh(&bdi->wb_lock);
> }
Unregistering the BDI removes it from the BDI list and hence
bdi_writeback will never clear the WB_STATE_STALLED bit on
superblocks trying to do writeback in unmount.
I'm not sure I really like this code very much - it seems to be
muchmore complex than it needs to be because writeback is still
managed on a per-bdi basis and the sb iteration is pretty clunky.
If we are moving to per-sb inode tracking, we should also move all
the writeback management to per-sb as well.
IMO, there's no good reason for keeping flusher threads per-bdi and
then having to iterate per-sb just to do background/periodic
writeback, and then have special cases for sb specific writeback
that avoids the bdi per-sb looping. i.e. per-sb flush work executed
by a bdi flusher thread makes a lot more sense than per-bdi
flush work that iterates superblocks.
So for the moment, I think this patch makes things worse rather than
better. I'd much prefer to see a single series that moves from per-bdi
tracking/writeback to per-sb tracking/writeback than to split the
tracking/writeback changes and then have to support an weird,
temporary, intermediate code base like this...
Ignoring that, the hack below makes this patch work for me.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
writeback: clear WB_STATE_STALLED for sb specific writeback
From: Dave Chinner <dchinner@redhat.com>
During unmount, the superblock has been removed from the bdi
writeback list, and so never has the WB_STATE_STALLED flag cleared
before writeback is attempted. hence it never does writeback because
it sees this flag. Fix this by unconditionally clearing the flag if
work->sb is set rather than iterating the bdi....
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/fs-writeback.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index e80d1b9..6d9cd0c 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -780,12 +780,20 @@ static long bdi_writeback(struct backing_dev_info *bdi,
/*
* If we made some progress, clear stalled state to retry other
- * writeback queues as well.
+ * writeback queues as well. Note that unmount can remove the
+ * wbqueue from the bdi before we get here, in which case we'll
+ * be flushing a specific superblock and hence we have to
+ * specifically clear the superblock stalled state.
*/
if (progress) {
spin_lock_bh(&bdi->wb_lock);
- list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
+ if (work->sb) {
+ wb = &work->sb->s_dirty_inodes;
wb->state &= ~WB_STATE_STALLED;
+ } else {
+ list_for_each_entry(wb, &bdi->wq_list, bdi_list) {
+ wb->state &= ~WB_STATE_STALLED;
+ }
}
spin_unlock_bh(&bdi->wb_lock);
}
next prev parent reply other threads:[~2014-08-05 23:44 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-31 22:00 [RFC PATCH 00/14] Per-sb tracking of dirty inodes Jan Kara
2014-07-31 22:00 ` [PATCH 01/14] writeback: Get rid of superblock pinning Jan Kara
2014-07-31 22:00 ` [PATCH 02/14] writeback: Remove writeback_inodes_wb() Jan Kara
2014-07-31 22:00 ` [PATCH 03/14] writeback: Remove useless argument of writeback_single_inode() Jan Kara
2014-07-31 22:00 ` [PATCH 04/14] writeback: Don't put inodes which cannot be written to b_more_io Jan Kara
2014-07-31 22:00 ` [PATCH 05/14] writeback: Move dwork and last_old_flush into backing_dev_info Jan Kara
2014-07-31 22:00 ` [PATCH 06/14] writeback: Switch locking of bandwidth fields to wb_lock Jan Kara
2014-07-31 22:00 ` [PATCH 07/14] writeback: Provide a function to get bdi from bdi_writeback Jan Kara
2014-07-31 22:00 ` [PATCH 08/14] writeback: Schedule future writeback if bdi (not wb) has dirty inodes Jan Kara
2014-07-31 22:00 ` [PATCH 09/14] writeback: Switch some function arguments from bdi_writeback to bdi Jan Kara
2014-07-31 22:00 ` [PATCH 10/14] writeback: Move rechecking of work list into bdi_process_work_items() Jan Kara
2014-07-31 22:00 ` [PATCH 11/14] writeback: Shorten list_lock hold times in bdi_writeback() Jan Kara
2014-07-31 22:00 ` [PATCH 12/14] writeback: Move refill of b_io list into writeback_inodes() Jan Kara
2014-07-31 22:00 ` [PATCH 13/14] writeback: Comment update Jan Kara
2014-07-31 22:00 ` [PATCH 14/14] writeback: Per-sb dirty tracking Jan Kara
2014-08-01 5:14 ` Daniel Phillips
2014-08-05 23:44 ` Dave Chinner [this message]
2014-08-06 8:46 ` Jan Kara
2014-08-06 21:13 ` Dave Chinner
2014-08-08 10:46 ` Jan Kara
2014-08-10 23:16 ` Dave Chinner
2014-08-01 5:32 ` [RFC PATCH 00/14] Per-sb tracking of dirty inodes Daniel Phillips
2014-08-05 5:22 ` Dave Chinner
2014-08-05 10:31 ` Jan Kara
2014-08-05 8:20 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140805234416.GH20518@dastard \
--to=david@fromorbit.com \
--cc=fengguang.wu@intel.com \
--cc=hirofumi@mail.parknet.co.jp \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.