From: Jan Kara <jack@suse.cz>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>,
Linux Filesystem Development List <linux-fsdevel@vger.kernel.org>,
Ext4 Developers List <linux-ext4@vger.kernel.org>,
Linux btrfs Developers List <linux-btrfs@vger.kernel.org>,
XFS Developers <xfs@oss.sgi.com>
Subject: Re: [PATCH-v4 2/7] vfs: add support for a lazytime mount option
Date: Fri, 28 Nov 2014 17:24:45 +0100
Message-ID: <20141128162445.GB27902@quack.suse.cz>
In-Reply-To: <20141127230016.GH14091@thunk.org>
On Thu 27-11-14 18:00:16, Ted Tso wrote:
> On Thu, Nov 27, 2014 at 02:14:21PM +0100, Jan Kara wrote:
> > * change queue_io() to also call
> > moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io, time + 24hours)
> > For this you need to tweak move_expired_inodes() to take a pointer to a
> > timestamp instead of a pointer to work, but that's trivial. Also you
> > probably want to leave the ->older_than_this value as is (i.e. without
> > +24 hours) if we are doing WB_SYNC_ALL writeback. With this you can remove
> > flush_sb_dirty_time() completely.
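A minimal userspace sketch of the suggested change, assuming a fixed HZ and an oldest-first array standing in for the kernel's list_head queues. The names `model_inode`, `move_expired()` and `queue_dirty_time()` are hypothetical. An inode is moved only if it was dirtied no later than the cutoff, so "at least a day stale" corresponds to a cutoff one day *earlier* than the current time. WB_SYNC_ALL-style writeback keeps the caller's cutoff unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model: HZ and the array-based "list" are assumptions. */
#define HZ 100UL
#define DIRTYTIME_EXPIRE (86400UL * HZ) /* one day, in ticks */

/* Wraparound-safe "a is later than b", same idea as the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

struct model_inode {
	unsigned long dirtied_when;
	bool queued; /* true once moved to the (modelled) b_io list */
};

/*
 * Models move_expired_inodes() taking a timestamp pointer instead of the
 * work item: walk inodes oldest first and queue them until one was dirtied
 * after the cutoff. A NULL cutoff queues everything.
 */
static int move_expired(struct model_inode *v, size_t n,
			const unsigned long *older_than_this)
{
	int moved = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (v[i].queued)
			continue;
		if (older_than_this &&
		    tick_after(v[i].dirtied_when, *older_than_this))
			break;
		v[i].queued = true;
		moved++;
	}
	return moved;
}

/*
 * Hypothetical helper for the dirty-time list: background writeback only
 * picks up inodes whose timestamps are at least a day stale, while
 * sync-all writeback keeps the caller's cutoff unchanged.
 */
static int queue_dirty_time(struct model_inode *v, size_t n, unsigned long now,
			    const unsigned long *older_than_this, bool sync_all)
{
	unsigned long day_old = now - DIRTYTIME_EXPIRE;

	return move_expired(v, n, sync_all ? older_than_this : &day_old);
}
```

With `now` as the current tick count, `queue_dirty_time(v, n, now, NULL, false)` queues only the inodes dirtied a day or more before `now`. Because the comparison looks at the sign of the difference, it stays correct when the tick counter wraps.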
>
> Well.... it's not quite enough. The problem is that for ext3 and
> ext4, the actual work of writing the inode happens in dirty_inode(),
> not in write_inode(). Which means we need to do something like this.
Right, I didn't realize this problem.
> I'm not entirely sure whether or not this is too ugly to live;
> personally, I think my hack of handling this in update_time() might be
> preferable....
Actually, handling the copying of timestamps in __writeback_single_inode()
would look fine to me. You mentioned in your next email that calling
mark_inode_dirty_sync() from the flusher may be problematic. Why? How is
this any different from calling mark_inode_dirty_sync() from
flush_sb_dirty_time()?
I will note that for a while I thought copying the full inode to the
on-disk buffer may be problematic because the inode may be in an
intermediate state of some transactional change. But that isn't an issue:
if there's any transactional change in progress, it holds a handle open
until the change is done, so the transaction cannot commit and the buffer
with the partial change cannot go to the journal until
mark_inode_dirty_sync() copies the final state of the inode.
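The ordering argument can be illustrated with a toy journal in which a transaction refuses to commit while any handle is open. All names (`model_txn`, `model_inode`, `handle_start()`, `handle_stop()`, `copy_inode_to_buffer()`, `try_commit()`) are hypothetical and do not correspond to the jbd2 API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy journal model; not the jbd2 API. */
struct model_txn {
	int open_handles;    /* handles started but not yet stopped */
	long buffer_size;    /* inode field as copied into the journalled buffer */
	long committed_size; /* what reached the journal on commit */
};

struct model_inode {
	long size; /* in-memory inode state */
};

static void handle_start(struct model_txn *t)
{
	t->open_handles++;
}

static void handle_stop(struct model_txn *t)
{
	assert(t->open_handles > 0);
	t->open_handles--;
}

/* Models mark_inode_dirty_sync() -> ->dirty_inode(): copy the whole inode. */
static void copy_inode_to_buffer(struct model_txn *t,
				 const struct model_inode *inode)
{
	t->buffer_size = inode->size;
}

/* A transaction commits only once no handle is still open. */
static bool try_commit(struct model_txn *t)
{
	if (t->open_handles > 0)
		return false;
	t->committed_size = t->buffer_size;
	return true;
}
```

A copy taken by the flusher in the middle of a change only updates the buffer. The change itself calls mark_inode_dirty_sync() before stopping its handle, so the commit that eventually succeeds records the completed state.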
Another solution may be to tell the ->write_inode method that copying of
timestamps is necessary. We could do that via a flag bit in
writeback_control; each filesystem could then copy timestamps when this
bit is set. But calling mark_inode_dirty_sync() from
__writeback_single_inode() looks simpler to me.
Honza
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index b93c529..95a42b3 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -253,7 +253,7 @@ static bool inode_dirtied_after(struct inode *inode, unsigned long t)
> */
> static int move_expired_inodes(struct list_head *delaying_queue,
> struct list_head *dispatch_queue,
> - struct wb_writeback_work *work)
> + unsigned long *older_than_this)
> {
> LIST_HEAD(tmp);
> struct list_head *pos, *node;
> @@ -264,8 +264,8 @@ static int move_expired_inodes(struct list_head *delaying_queue,
>
> while (!list_empty(delaying_queue)) {
> inode = wb_inode(delaying_queue->prev);
> - if (work->older_than_this &&
> - inode_dirtied_after(inode, *work->older_than_this))
> + if (older_than_this &&
> + inode_dirtied_after(inode, *older_than_this))
> break;
> list_move(&inode->i_wb_list, &tmp);
> moved++;
> @@ -309,9 +309,14 @@ out:
> static void queue_io(struct bdi_writeback *wb, struct wb_writeback_work *work)
> {
> int moved;
> + unsigned long one_day_later = jiffies + (HZ * 86400);
> +
> assert_spin_locked(&wb->list_lock);
> list_splice_init(&wb->b_more_io, &wb->b_io);
> - moved = move_expired_inodes(&wb->b_dirty, &wb->b_io, work);
> + moved = move_expired_inodes(&wb->b_dirty, &wb->b_io,
> + work->older_than_this);
> + moved += move_expired_inodes(&wb->b_dirty_time, &wb->b_io,
> + &one_day_later);
> trace_writeback_queue_io(wb, work, moved);
> }
>
> @@ -637,6 +642,17 @@ static long writeback_sb_inodes(struct super_block *sb,
> }
>
> /*
> + * If the inode is marked dirty time but is not dirty,
> * then at least for ext3 and ext4 we need to call
> + * mark_inode_dirty_sync in order to get the inode
> + * timestamp transferred to the on disk inode, since
> + * write_inode is a no-op for those file systems. HACK HACK HACK
> + */
> + if ((inode->i_state & I_DIRTY_TIME) &&
> + ((inode->i_state & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) == 0))
> + mark_inode_dirty_sync(inode);
> +
> + /*
> * Don't bother with new inodes or inodes being freed, first
> * kind does not need periodic writeout yet, and for the latter
> * kind writeout is handled by the freer.
> @@ -1233,9 +1249,10 @@ void inode_requeue_dirtytime(struct inode *inode)
> spin_lock(&bdi->wb.list_lock);
> spin_lock(&inode->i_lock);
> if ((inode->i_state & I_DIRTY_WB) == 0) {
> - if (inode->i_state & I_DIRTY_TIME)
> + if (inode->i_state & I_DIRTY_TIME) {
> + inode->dirtied_when = jiffies;
> list_move(&inode->i_wb_list, &bdi->wb.b_dirty_time);
> - else
> + } else
> list_del_init(&inode->i_wb_list);
> }
> spin_unlock(&inode->i_lock);
>
> Comments?
>
> - Ted
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
Thread overview: 36+ messages
2014-11-26 10:23 [PATCH-v4 0/7] add support for a lazytime mount option Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 1/7] vfs: split update_time() into update_time() and write_time() Theodore Ts'o
2014-11-26 19:23 ` Christoph Hellwig
2014-11-27 12:34 ` Jan Kara
2014-11-27 15:25 ` Christoph Hellwig
2014-11-27 14:41 ` Theodore Ts'o
2014-11-27 15:28 ` Christoph Hellwig
2014-11-27 15:33 ` Theodore Ts'o
2014-11-27 16:49 ` Christoph Hellwig
2014-11-27 20:27 ` Theodore Ts'o
2014-12-01 9:28 ` Christoph Hellwig
2014-12-01 15:04 ` Theodore Ts'o
2014-12-01 17:18 ` David Sterba
2014-12-02 9:20 ` Christoph Hellwig
2014-12-02 15:09 ` Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 2/7] vfs: add support for a lazytime mount option Theodore Ts'o
2014-11-27 13:14 ` Jan Kara
2014-11-27 20:19 ` Theodore Ts'o
2014-11-28 12:41 ` Jan Kara
2014-11-27 23:00 ` Theodore Ts'o
2014-11-28 5:36 ` Theodore Ts'o
2014-11-28 16:24 ` Jan Kara [this message]
2014-11-26 10:23 ` [PATCH-v4 3/7] vfs: don't let the dirty time inodes get more than a day stale Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 4/7] vfs: add lazytime tracepoints for better debugging Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 5/7] vfs: add find_active_inode_nowait() function Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 6/7] ext4: add support for a lazytime mount option Theodore Ts'o
2014-11-26 19:24 ` Christoph Hellwig
2014-11-26 22:48 ` Dave Chinner
2014-11-26 23:10 ` Andreas Dilger
2014-11-26 23:35 ` Dave Chinner
2014-11-27 13:27 ` Jan Kara
2014-11-27 13:32 ` Jan Kara
2014-11-27 15:25 ` Theodore Ts'o
2014-11-27 15:41 ` Jan Kara
2014-11-27 20:13 ` Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 7/7] btrfs: add an is_readonly() so btrfs can use common code for update_time() Theodore Ts'o