Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option

From: Theodore Ts'o <tytso@mit.edu>
To: Boaz Harrosh <boaz@plexistor.com>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Jan Kara <jack@suse.cz>,
	linux-btrfs@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option
Date: Tue, 2 Dec 2014 14:23:37 -0500
Message-ID: <20141202192337.GA13618@thunk.org>
In-Reply-To: <547DFD24.9070805@plexistor.com>

On Tue, Dec 02, 2014 at 07:55:48PM +0200, Boaz Harrosh wrote:
> 
> This I do not understand. I thought that I_DIRTY_TIME, and the whole
> lazytime mount option, is only for atime. So if there are dirty
> pages then there are also m/ctime that changed and surely we want to
> write these times to disk ASAP.

What are the situations where you are most concerned about mtime or
ctime being accurate after a crash?

I've been running with it on my laptop for a while now, and it's
certainly not a problem for build trees; remember, whenever you need
to update the inode to update i_blocks or i_size, the inode (with its
updated timestamps) will be flushed to disk anyway.

In actual practice, what happens in a build tree is that when make
decides that it needs to update a generated file, when the file is
created as a zero-length inode, m/ctime will be set to the time that
file is created, which is newer than its source files.  As the file is
written, the mtime is updated each time that we actually need to do an
allocating write.  In the case of the linker, it will seek to the
beginning of the file to update the ELF header at the very end of its
operation, and *that* time will be left stale, such that the in-memory
mtime is perhaps a millisecond ahead of the on-disk mtime.  But in the
case of a crash, either time is such that make won't be confused.
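The build-tree argument above can be sketched as a small model.  This
is a simplified version of the timestamp comparison make performs, not
its actual implementation; needs_rebuild() and all of the timestamp
values are hypothetical and chosen only for illustration.

```python
def needs_rebuild(target_mtime_ns, source_mtime_ns):
    """Simplified make-style check (an assumption, not make's real code):
    rebuild when the target is missing or older than its source."""
    return target_mtime_ns is None or target_mtime_ns < source_mtime_ns


# Hypothetical timestamps in nanoseconds, for illustration only.
source_mtime = 1_000_000_000_000                 # source file last edited
on_disk_mtime = source_mtime + 5_000_000_000     # flushed with the last allocating write
in_memory_mtime = on_disk_mtime + 1_000_000      # header rewrite about 1 ms later, not yet flushed

# Whichever mtime survives a crash, the target still looks up to date.
survives_crash = needs_rebuild(on_disk_mtime, source_mtime)
no_crash = needs_rebuild(in_memory_mtime, source_mtime)
```

Both results are False here: the stale on-disk mtime and the newer
in-memory mtime are each later than the source's mtime, so the rebuild
decision is the same either way.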

I'm not aware of an application which is doing a large number of
non-allocating random writes (for example, such as a database), where
said database actually cares about mtime being correct.  In fact, most
databases use fdatasync() to prevent the mtimes from being sync'ed out
to disk on each transaction, so they don't have guaranteed timestamp
accuracy after a crash anyway.  The problem is that even if the database
is using fdatasync(), every five seconds we end up updating the mtime
anyway, and in the case of ext4, we end up needing to take various
journal locks, which on a sufficiently parallel workload and a
sufficiently fast disk can actually cause measurable contention.
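As a rough illustration of the database-style pattern described above,
here is a minimal sketch of a non-allocating overwrite followed by
fdatasync().  The helper names (overwrite_record, demo), the 4096-byte
file size, and the record contents are all hypothetical.  fdatasync()
flushes the data and only the metadata needed to read it back, so it
is not required to write out a changed mtime.

```python
import os
import tempfile

# Fall back to fsync on platforms without fdatasync (for example macOS).
data_sync = getattr(os, "fdatasync", os.fsync)


def overwrite_record(fd, offset, payload):
    """Overwrite bytes in place, then flush the data without forcing
    a timestamp-only inode update to disk."""
    os.pwrite(fd, payload, offset)
    data_sync(fd)


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 4096)   # allocating write: the file grows to 4096 bytes
        os.fsync(fd)                 # flush data and inode once up front
        size_before = os.fstat(fd).st_size
        overwrite_record(fd, 512, b"txn-0001")   # non-allocating: size is unchanged
        size_after = os.fstat(fd).st_size
        return size_before, size_after, os.pread(fd, 8, 512)
    finally:
        os.close(fd)
        os.unlink(path)
```

Because the overwrite does not change i_size, the only inode change it
causes is the timestamp update, which is exactly the update lazytime
would defer.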

Did you have such a use case or application in mind?

> if we are lazytime also with m/ctime then I think I would like an
> option for only atime lazy. because m/ctime is cardinal to some
> operations even though I might want atime lazy.

If there's a sufficiently compelling use case where we do actually
care about mtime/ctime being accurate, and the current semantics don't
provide enough of a guarantee, it's certainly something we could do.
I'd rather keep things simple unless the need is really there.  (After
all, we did create the strictatime mount option, but I'm not sure anyone
ever ends up using it.  It would be a shame if we created a
strictcmtime, which had the same usage rate.)

I'll also note that if it's only about atime updates, with the default
relatime mount option, I'm not sure there's enough of a win to justify
having a lazyatime-only mode.  If you really need strict c/mtime after
a crash, maybe the best thing to do is simply not to use the lazytime
mount option and be done with it.

Cheers,

					- Ted
