linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Christoph Hellwig <hch@infradead.org>,
	Linux Filesystem Development List <linux-fsdevel@vger.kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Linux btrfs Developers List <linux-btrfs@vger.kernel.org>,
	XFS Developers <xfs@oss.sgi.com>, Jan Kara <jack@suse.cz>
Subject: Re: [PATCH-v4 1/7] vfs: split update_time() into update_time() and write_time()
Date: Tue, 2 Dec 2014 01:20:33 -0800	[thread overview]
Message-ID: <20141202092033.GA29712@infradead.org> (raw)
In-Reply-To: <20141201150450.GA3337@thunk.org>

On Mon, Dec 01, 2014 at 10:04:50AM -0500, Theodore Ts'o wrote:
> >  - convert ext3/4 to use ->update_time instead of the ->dirty_time
> >    callout so it gets and exact notifications (preferably the few
> >    remaining filesystems as well, although that shouldn't really be a
> >    blocker)
> 
> We could do that, although ext3/4's ->update_time() would be exactly
> the same as the generic update_time() function, so there would be code
> duplication.  If the goal is to get rid of the magic in
> -->dirty_inode() being used to work around how the VFS makes changes
> to fields that end up in the on-disk inode, we would need to audit a
> lot of extra code paths; at the very least, in how the generic quota
> code handles updates to i_size and i_blocks (for example).
> 
> And BTW, we don't actually have a dirty_time() function any more in
> the current patch series.  update_time() is currently looking like
> this:

Sorry, I actually meant ->dirty_inode, which is where ext4 currently
hooks in for time updates.  ->update_time was introduced to

 a) more specificly catch the kind of update
 b) allow the filesystem to take locks or a start a transaction
    before the inode fields are updated to provide proper atomicy.

It seems like the quota code has the same problem, but given that
neither XFS nor btrfs use it it seems like no one cared enough to sort
it out properly..

> static int update_time(struct inode *inode, struct timespec *time, int flags)
> {
> 	if (inode->i_op->update_time)
> 		return inode->i_op->update_time(inode, time, flags);
> 
> 	if (flags & S_ATIME)
> 		inode->i_atime = *time;
> 	if (flags & S_VERSION)
> 		inode_inc_iversion(inode);
> 	if (flags & S_CTIME)
> 		inode->i_ctime = *time;
> 	if (flags & S_MTIME)
> 		inode->i_mtime = *time;
> 
> 	if ((inode->i_sb->s_flags & MS_LAZYTIME) && !(flags & S_VERSION) &&
> 	    !(inode->i_state & I_DIRTY))
> 		__mark_inode_dirty(inode, I_DIRTY_TIME);
> 	else
> 		__mark_inode_dirty(inode, I_DIRTY_SYNC);
> 	return 0;

Why do you need the additional I_DIRTY flag?  A "lesser"
__mark_inode_dirty should never override a stronger one.

Otherwise this looks fine to me, except that I would split the default
implementation into a new generic_update_time helper.

> XFS doesn't have a ->dirty_time yet, but that way XFS would be able to
> use the I_DIRTY_TIME flag to log the journal timestamps if it so
> desires, and perhaps drop the need for it to use update_time().

We will probably always need a ->update_time to proide proper locking
around the timestamp updates.

> (And
> with XFS doing logical journalling, it may be that you might want to
> include the timestamp update in the journal if you have a journal
> transaction open already, so the disk is spun up or likely to be spin
> up anyway, right?)

XFS transactions are explicitly opened and closed, so during the atime
updates we'll never have one open.

What we could try is to have CIL items that are on "indefinit" hold
before they are batched into a checkpoint.  We'd still commit them to
an in-memory transaction in ->upate_time for that.  All this requires
a lot of through and will take some time, though.

In the current from the generic lazytime might even be a loss for XFS as
we're already really good at batching updates from multiple inodes in
the same cluster for the in-place writeback, so I really don't want
to just enable it without those optimizations without a lot of testing.

  parent reply	other threads:[~2014-12-02  9:20 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-26 10:23 [PATCH-v4 0/7] add support for a lazytime mount option Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 1/7] vfs: split update_time() into update_time() and write_time() Theodore Ts'o
2014-11-26 19:23   ` Christoph Hellwig
2014-11-27 12:34     ` Jan Kara
2014-11-27 15:25       ` Christoph Hellwig
2014-11-27 14:41     ` Theodore Ts'o
2014-11-27 15:28       ` Christoph Hellwig
2014-11-27 15:33       ` Theodore Ts'o
2014-11-27 16:49         ` Christoph Hellwig
2014-11-27 20:27           ` Theodore Ts'o
2014-12-01  9:28             ` Christoph Hellwig
2014-12-01 15:04               ` Theodore Ts'o
2014-12-01 17:18                 ` David Sterba
2014-12-02  9:20                 ` Christoph Hellwig [this message]
2014-12-02 15:09                   ` Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 2/7] vfs: add support for a lazytime mount option Theodore Ts'o
2014-11-27 13:14   ` Jan Kara
2014-11-27 20:19     ` Theodore Ts'o
2014-11-28 12:41       ` Jan Kara
2014-11-27 23:00     ` Theodore Ts'o
2014-11-28  5:36       ` Theodore Ts'o
2014-11-28 16:24       ` Jan Kara
2014-11-26 10:23 ` [PATCH-v4 3/7] vfs: don't let the dirty time inodes get more than a day stale Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 4/7] vfs: add lazytime tracepoints for better debugging Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 5/7] vfs: add find_active_inode_nowait() function Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 6/7] ext4: add support for a lazytime mount option Theodore Ts'o
2014-11-26 19:24   ` Christoph Hellwig
2014-11-26 22:48   ` Dave Chinner
2014-11-26 23:10     ` Andreas Dilger
2014-11-26 23:35       ` Dave Chinner
2014-11-27 13:27         ` Jan Kara
2014-11-27 13:32           ` Jan Kara
2014-11-27 15:25             ` Theodore Ts'o
2014-11-27 15:41               ` Jan Kara
2014-11-27 20:13                 ` Theodore Ts'o
2014-11-26 10:23 ` [PATCH-v4 7/7] btrfs: add an is_readonly() so btrfs can use common code for update_time() Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141202092033.GA29712@infradead.org \
    --to=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).