linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH-v3 0/6] add support for a lazytime mount option
@ 2014-11-25  5:34 Theodore Ts'o
  2014-11-25  5:34 ` [PATCH-v3 1/6] fs: split update_time() into update_time() and write_time() Theodore Ts'o
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Theodore Ts'o @ 2014-11-25  5:34 UTC (permalink / raw)
  To: Linux Filesystem Development List
  Cc: Ext4 Developers List, Theodore Ts'o,
	Linux btrfs Developers List, XFS Developers

This is an updated version of what had originally been an
ext4-specific patch which significantly improves performance by lazily
writing timestamp updates (and in particular, mtime updates) to disk.
The in-memory timestamps are always correct, but they are only written
to disk when required for correctness.

This provides a huge performance boost for ext4 due to how it handles
journalling, but it's valuable for all file systems running on flash
storage or drive-managed SMR disks by reducing the metadata write
load.  So upon request, I've moved the functionality to the VFS layer.
Once the /sbin/mount program adds support for MS_LAZYTIME, all file
systems should be able to benefit from this optimization.

There is still an ext4-specific optimization, which may be applicable
for other file systems which store more than one inode in a block, but
it will require file system specific code.  It is purely optional,
however.

Please note the changes to update_time() and the new write_time() inode
operations functions, which impact btrfs and xfs.  The changes are
fairly simple, but I would appreciate confirmation from the btrfs and
xfs teams that I got things right.   Thanks!!

Any objections to my carrying these patches in the ext4 git tree?

Changes since -v2:
   - If update_time() updates i_version, it will not use lazytime (i..e,
       the inode will be marked dirty so the change will be persisted on to
       disk sooner rather than later).  Yes, this eliminates the
       benefits of lazytime if the user is experting the file system via
       NFSv4.  Sad, but NFS's requirements seem to mandate this.
   - Fix time wrapping bug 49 days after the system boots (on a system
        with a 32-bit jiffies).   Use get_monotonic_boottime() instead.
   - Clean up type warning in include/tracing/ext4.h
   - Added explicit parenthesis for stylistic reasons    
   - Added an is_readonly() inode operations method so btrfs doesn't
       have to duplicate code in update_time().

Changes since -v1:
   - Added explanatory comments in update_time() regarding i_ts_dirty_days
   - Fix type used for days_since_boot
   - Improve SMP scalability in update_time and ext4_update_other_inodes_time
   - Added tracepoints to help test and characterize how often and under
         what circumstances inodes have their timestamps lazily updated

Theodore Ts'o (6):
  fs: split update_time() into update_time() and write_time()
  vfs: add support for a lazytime mount option
  vfs: don't let the dirty time inodes get more than a day stale
  vfs: add lazytime tracepoints for better debugging
  ext4: add support for a lazytime mount option
  btrfs: add an is_readonly() so btrfs can use common code for
    update_time()

 Documentation/filesystems/Locking |  2 +
 fs/btrfs/inode.c                  | 34 +++++++--------
 fs/ext4/inode.c                   | 48 +++++++++++++++++++--
 fs/ext4/super.c                   |  9 ++++
 fs/fs-writeback.c                 | 42 +++++++++++++++++-
 fs/inode.c                        | 91 ++++++++++++++++++++++++++++++++++++++-
 fs/proc_namespace.c               |  1 +
 fs/sync.c                         |  7 +++
 fs/xfs/xfs_iops.c                 | 39 +++++++----------
 include/linux/fs.h                |  7 ++-
 include/trace/events/ext4.h       | 30 +++++++++++++
 include/uapi/linux/fs.h           |  1 +
 12 files changed, 262 insertions(+), 49 deletions(-)

-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH-v3 1/6] fs: split update_time() into update_time() and write_time()
  2014-11-25  5:34 [PATCH-v3 0/6] add support for a lazytime mount option Theodore Ts'o
@ 2014-11-25  5:34 ` Theodore Ts'o
  2014-11-25 16:00   ` David Sterba
  2014-11-25  5:34 ` [PATCH-v3 2/6] vfs: add support for a lazytime mount option Theodore Ts'o
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2014-11-25  5:34 UTC (permalink / raw)
  To: Linux Filesystem Development List
  Cc: Ext4 Developers List, Linux btrfs Developers List, XFS Developers,
	Theodore Ts'o

In preparation for adding support for the lazytime mount option, we
need to be able to separate out the update_time() and write_time()
inode operations.  Currently, only btrfs and xfs uses update_time().

We needed to preserve update_time() because btrfs wants to have a
special btrfs_root_readonly() check; otherwise we could drop the
update_time() inode operation entirely.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: xfs@oss.sgi.com
Cc: linux-btrfs@vger.kernel.org
---
 Documentation/filesystems/Locking |  2 ++
 fs/btrfs/inode.c                  | 10 ++++++++++
 fs/inode.c                        | 29 ++++++++++++++++++-----------
 fs/xfs/xfs_iops.c                 | 39 ++++++++++++++++-----------------------
 include/linux/fs.h                |  1 +
 5 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index b30753c..e49861d 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -63,6 +63,7 @@ prototypes:
 	int (*removexattr) (struct dentry *, const char *);
 	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len);
 	void (*update_time)(struct inode *, struct timespec *, int);
+	void (*write_time)(struct inode *);
 	int (*atomic_open)(struct inode *, struct dentry *,
 				struct file *, unsigned open_flag,
 				umode_t create_mode, int *opened);
@@ -95,6 +96,7 @@ listxattr:	no
 removexattr:	yes
 fiemap:		no
 update_time:	no
+write_time:	no
 atomic_open:	yes
 tmpfile:	no
 dentry_open:	no
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d23362f..a5e0d0d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5574,6 +5574,11 @@ static int btrfs_update_time(struct inode *inode, struct timespec *now,
 		inode->i_mtime = *now;
 	if (flags & S_ATIME)
 		inode->i_atime = *now;
+	return 0;
+}
+
+static int btrfs_write_time(struct inode *inode)
+{
 	return btrfs_dirty_inode(inode);
 }
 
@@ -9462,6 +9467,7 @@ static const struct inode_operations btrfs_dir_inode_operations = {
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 	.tmpfile        = btrfs_tmpfile,
 };
 static const struct inode_operations btrfs_dir_ro_inode_operations = {
@@ -9470,6 +9476,7 @@ static const struct inode_operations btrfs_dir_ro_inode_operations = {
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 };
 
 static const struct file_operations btrfs_dir_file_operations = {
@@ -9540,6 +9547,7 @@ static const struct inode_operations btrfs_file_inode_operations = {
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 };
 static const struct inode_operations btrfs_special_inode_operations = {
 	.getattr	= btrfs_getattr,
@@ -9552,6 +9560,7 @@ static const struct inode_operations btrfs_special_inode_operations = {
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 };
 static const struct inode_operations btrfs_symlink_inode_operations = {
 	.readlink	= generic_readlink,
@@ -9565,6 +9574,7 @@ static const struct inode_operations btrfs_symlink_inode_operations = {
 	.listxattr	= btrfs_listxattr,
 	.removexattr	= btrfs_removexattr,
 	.update_time	= btrfs_update_time,
+	.write_time	= btrfs_write_time,
 };
 
 const struct dentry_operations btrfs_dentry_operations = {
diff --git a/fs/inode.c b/fs/inode.c
index 26753ba..8f5c4b5 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1499,17 +1499,24 @@ static int relatime_need_update(struct vfsmount *mnt, struct inode *inode,
  */
 static int update_time(struct inode *inode, struct timespec *time, int flags)
 {
-	if (inode->i_op->update_time)
-		return inode->i_op->update_time(inode, time, flags);
-
-	if (flags & S_ATIME)
-		inode->i_atime = *time;
-	if (flags & S_VERSION)
-		inode_inc_iversion(inode);
-	if (flags & S_CTIME)
-		inode->i_ctime = *time;
-	if (flags & S_MTIME)
-		inode->i_mtime = *time;
+	int ret;
+
+	if (inode->i_op->update_time) {
+		ret = inode->i_op->update_time(inode, time, flags);
+		if (ret)
+			return ret;
+	} else {
+		if (flags & S_ATIME)
+			inode->i_atime = *time;
+		if (flags & S_VERSION)
+			inode_inc_iversion(inode);
+		if (flags & S_CTIME)
+			inode->i_ctime = *time;
+		if (flags & S_MTIME)
+			inode->i_mtime = *time;
+	}
+	if (inode->i_op->write_time)
+		return inode->i_op->write_time(inode);
 	mark_inode_dirty_sync(inode);
 	return 0;
 }
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ec6dcdc..0e9653c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -984,10 +984,8 @@ xfs_vn_setattr(
 }
 
 STATIC int
-xfs_vn_update_time(
-	struct inode		*inode,
-	struct timespec		*now,
-	int			flags)
+xfs_vn_write_time(
+	struct inode		*inode)
 {
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
@@ -1004,21 +1002,16 @@ xfs_vn_update_time(
 	}
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	if (flags & S_CTIME) {
-		inode->i_ctime = *now;
-		ip->i_d.di_ctime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_ctime.t_nsec = (__int32_t)now->tv_nsec;
-	}
-	if (flags & S_MTIME) {
-		inode->i_mtime = *now;
-		ip->i_d.di_mtime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_mtime.t_nsec = (__int32_t)now->tv_nsec;
-	}
-	if (flags & S_ATIME) {
-		inode->i_atime = *now;
-		ip->i_d.di_atime.t_sec = (__int32_t)now->tv_sec;
-		ip->i_d.di_atime.t_nsec = (__int32_t)now->tv_nsec;
-	}
+
+	ip->i_d.di_ctime.t_sec = (__int32_t) inode->i_ctime.tv_sec;
+	ip->i_d.di_ctime.t_nsec = (__int32_t) inode->i_ctime.tv_nsec;
+
+	ip->i_d.di_mtime.t_sec = (__int32_t)inode->i_mtime.tv_sec;
+	ip->i_d.di_mtime.t_nsec = (__int32_t) inode->i_mtime.tv_nsec;
+
+	ip->i_d.di_atime.t_sec = (__int32_t) inode->i_atime.tv_sec;
+	ip->i_d.di_atime.t_nsec = (__int32_t) inode->i_atime.tv_nsec;
+
 	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_TIMESTAMP);
 	return xfs_trans_commit(tp, 0);
@@ -1129,7 +1122,7 @@ static const struct inode_operations xfs_inode_operations = {
 	.removexattr		= generic_removexattr,
 	.listxattr		= xfs_vn_listxattr,
 	.fiemap			= xfs_vn_fiemap,
-	.update_time		= xfs_vn_update_time,
+	.write_time		= xfs_vn_write_time,
 };
 
 static const struct inode_operations xfs_dir_inode_operations = {
@@ -1156,7 +1149,7 @@ static const struct inode_operations xfs_dir_inode_operations = {
 	.getxattr		= generic_getxattr,
 	.removexattr		= generic_removexattr,
 	.listxattr		= xfs_vn_listxattr,
-	.update_time		= xfs_vn_update_time,
+	.write_time		= xfs_vn_write_time,
 	.tmpfile		= xfs_vn_tmpfile,
 };
 
@@ -1184,7 +1177,7 @@ static const struct inode_operations xfs_dir_ci_inode_operations = {
 	.getxattr		= generic_getxattr,
 	.removexattr		= generic_removexattr,
 	.listxattr		= xfs_vn_listxattr,
-	.update_time		= xfs_vn_update_time,
+	.write_time		= xfs_vn_write_time,
 	.tmpfile		= xfs_vn_tmpfile,
 };
 
@@ -1198,7 +1191,7 @@ static const struct inode_operations xfs_symlink_inode_operations = {
 	.getxattr		= generic_getxattr,
 	.removexattr		= generic_removexattr,
 	.listxattr		= xfs_vn_listxattr,
-	.update_time		= xfs_vn_update_time,
+	.write_time		= xfs_vn_write_time,
 };
 
 STATIC void
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab779e..3633239 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1545,6 +1545,7 @@ struct inode_operations {
 	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
 		      u64 len);
 	int (*update_time)(struct inode *, struct timespec *, int);
+	int (*write_time)(struct inode *);
 	int (*atomic_open)(struct inode *, struct dentry *,
 			   struct file *, unsigned open_flag,
 			   umode_t create_mode, int *opened);
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH-v3 2/6] vfs: add support for a lazytime mount option
  2014-11-25  5:34 [PATCH-v3 0/6] add support for a lazytime mount option Theodore Ts'o
  2014-11-25  5:34 ` [PATCH-v3 1/6] fs: split update_time() into update_time() and write_time() Theodore Ts'o
@ 2014-11-25  5:34 ` Theodore Ts'o
  2014-11-25  5:34 ` [PATCH-v3 3/6] vfs: don't let the dirty time inodes get more than a day stale Theodore Ts'o
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2014-11-25  5:34 UTC (permalink / raw)
  To: Linux Filesystem Development List
  Cc: Ext4 Developers List, Theodore Ts'o,
	Linux btrfs Developers List, XFS Developers

Add a new mount option which enables a new "lazytime" mode.  This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode.  The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.

This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.

For workloads which feature a large number of random write to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table.  The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives.  Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact 99.9 percentile latencies ---
which is a very big deal for web serving tiers (for example).

Google-Bug-Id: 18297052

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/fs-writeback.c       | 38 +++++++++++++++++++++++++++++++++++++-
 fs/inode.c              | 21 +++++++++++++++++++++
 fs/proc_namespace.c     |  1 +
 fs/sync.c               |  7 +++++++
 include/linux/fs.h      |  1 +
 include/uapi/linux/fs.h |  1 +
 6 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index ef9bef1..ce7de22 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -483,7 +483,7 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
 	if (!mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
 		inode->i_state &= ~I_DIRTY_PAGES;
 	dirty = inode->i_state & I_DIRTY;
-	inode->i_state &= ~(I_DIRTY_SYNC | I_DIRTY_DATASYNC);
+	inode->i_state &= ~(I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_TIME);
 	spin_unlock(&inode->i_lock);
 	/* Don't write the inode if only I_DIRTY_PAGES was set */
 	if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
@@ -1277,6 +1277,41 @@ static void wait_sb_inodes(struct super_block *sb)
 	iput(old_inode);
 }
 
+/*
+ * This works like wait_sb_inodes(), but it is called *before* we kick
+ * the bdi so the inodes can get written out.
+ */
+static void flush_sb_dirty_time(struct super_block *sb)
+{
+	struct inode *inode, *old_inode = NULL;
+
+	WARN_ON(!rwsem_is_locked(&sb->s_umount));
+	spin_lock(&inode_sb_list_lock);
+	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		int dirty_time;
+
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		dirty_time = inode->i_state & I_DIRTY_TIME;
+		__iget(inode);
+		spin_unlock(&inode->i_lock);
+		spin_unlock(&inode_sb_list_lock);
+
+		iput(old_inode);
+		old_inode = inode;
+
+		if (dirty_time)
+			mark_inode_dirty(inode);
+		cond_resched();
+		spin_lock(&inode_sb_list_lock);
+	}
+	spin_unlock(&inode_sb_list_lock);
+	iput(old_inode);
+}
+
 /**
  * writeback_inodes_sb_nr -	writeback dirty inodes from given super_block
  * @sb: the superblock
@@ -1388,6 +1423,7 @@ void sync_inodes_sb(struct super_block *sb)
 		return;
 	WARN_ON(!rwsem_is_locked(&sb->s_umount));
 
+	flush_sb_dirty_time(sb);
 	bdi_queue_work(sb->s_bdi, &work);
 	wait_for_completion(&done);
 
diff --git a/fs/inode.c b/fs/inode.c
index 8f5c4b5..2093a84 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -534,6 +534,18 @@ static void evict(struct inode *inode)
 	BUG_ON(!(inode->i_state & I_FREEING));
 	BUG_ON(!list_empty(&inode->i_lru));
 
+	if (inode->i_nlink && inode->i_state & I_DIRTY_TIME) {
+		if (inode->i_op->write_time)
+			inode->i_op->write_time(inode);
+		else if (inode->i_sb->s_op->write_inode) {
+			struct writeback_control wbc = {
+				.sync_mode = WB_SYNC_NONE,
+			};
+			mark_inode_dirty(inode);
+			inode->i_sb->s_op->write_inode(inode, &wbc);
+		}
+	}
+
 	if (!list_empty(&inode->i_wb_list))
 		inode_wb_list_del(inode);
 
@@ -1515,6 +1527,15 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
 		if (flags & S_MTIME)
 			inode->i_mtime = *time;
 	}
+	if ((inode->i_sb->s_flags & MS_LAZYTIME) &&
+	    !(flags & S_VERSION)) {
+		if (inode->i_state & I_DIRTY_TIME)
+			return 0;
+		spin_lock(&inode->i_lock);
+		inode->i_state |= I_DIRTY_TIME;
+		spin_unlock(&inode->i_lock);
+		return 0;
+	}
 	if (inode->i_op->write_time)
 		return inode->i_op->write_time(inode);
 	mark_inode_dirty_sync(inode);
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 73ca174..f98234a 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -44,6 +44,7 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
 		{ MS_SYNCHRONOUS, ",sync" },
 		{ MS_DIRSYNC, ",dirsync" },
 		{ MS_MANDLOCK, ",mand" },
+		{ MS_LAZYTIME, ",lazytime" },
 		{ 0, NULL }
 	};
 	const struct proc_fs_info *fs_infop;
diff --git a/fs/sync.c b/fs/sync.c
index bdc729d..3a35506a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -177,8 +177,15 @@ SYSCALL_DEFINE1(syncfs, int, fd)
  */
 int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)
 {
+	struct inode *inode = file->f_mapping->host;
+
 	if (!file->f_op->fsync)
 		return -EINVAL;
+	if (!datasync && (inode->i_state & I_DIRTY_TIME)) {
+		spin_lock(&inode->i_lock);
+		inode->i_state |= I_DIRTY_SYNC;
+		spin_unlock(&inode->i_lock);
+	}
 	return file->f_op->fsync(file, start, end, datasync);
 }
 EXPORT_SYMBOL(vfs_fsync_range);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3633239..489b2f2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1721,6 +1721,7 @@ struct super_operations {
 #define __I_DIO_WAKEUP		9
 #define I_DIO_WAKEUP		(1 << I_DIO_WAKEUP)
 #define I_LINKABLE		(1 << 10)
+#define I_DIRTY_TIME		(1 << 11)
 
 #define I_DIRTY (I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES)
 
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3735fa0..cc9713a 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -90,6 +90,7 @@ struct inodes_stat_t {
 #define MS_KERNMOUNT	(1<<22) /* this is a kern_mount call */
 #define MS_I_VERSION	(1<<23) /* Update inode I_version field */
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
+#define MS_LAZYTIME	(1<<25) /* Update the on-disk [acm]times lazily */
 
 /* These sb flags are internal to the kernel */
 #define MS_NOSEC	(1<<28)
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH-v3 3/6] vfs: don't let the dirty time inodes get more than a day stale
  2014-11-25  5:34 [PATCH-v3 0/6] add support for a lazytime mount option Theodore Ts'o
  2014-11-25  5:34 ` [PATCH-v3 1/6] fs: split update_time() into update_time() and write_time() Theodore Ts'o
  2014-11-25  5:34 ` [PATCH-v3 2/6] vfs: add support for a lazytime mount option Theodore Ts'o
@ 2014-11-25  5:34 ` Theodore Ts'o
  2014-11-25 14:58   ` Rasmus Villemoes
  2014-11-25  5:34 ` [PATCH-v3 4/6] vfs: add lazytime tracepoints for better debugging Theodore Ts'o
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2014-11-25  5:34 UTC (permalink / raw)
  To: Linux Filesystem Development List
  Cc: Ext4 Developers List, Linux btrfs Developers List, XFS Developers,
	Theodore Ts'o

Guarantee that the on-disk timestamps will be no more than 24 hours
stale.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/fs-writeback.c  |  1 +
 fs/inode.c         | 28 +++++++++++++++++++++++-----
 include/linux/fs.h |  1 +
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index ce7de22..eb04277 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1141,6 +1141,7 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	if (flags & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
 		trace_writeback_dirty_inode_start(inode, flags);
 
+		inode->i_ts_dirty_day = 0;
 		if (sb->s_op->dirty_inode)
 			sb->s_op->dirty_inode(inode, flags);
 
diff --git a/fs/inode.c b/fs/inode.c
index 2093a84..34a443f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1511,6 +1511,8 @@ static int relatime_need_update(struct vfsmount *mnt, struct inode *inode,
  */
 static int update_time(struct inode *inode, struct timespec *time, int flags)
 {
+	struct timespec uptime;
+	unsigned short daycode;
 	int ret;
 
 	if (inode->i_op->update_time) {
@@ -1525,17 +1527,33 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
 		if (flags & S_CTIME)
 			inode->i_ctime = *time;
 		if (flags & S_MTIME)
-			inode->i_mtime = *time;
+		inode->i_mtime = *time;
 	}
+	/*
+	 * If i_ts_dirty_day is zero, then either we have not deferred
+	 * timestamp updates, or the system has been up for less than
+	 * a day (so days_since_boot is zero), so we defer timestamp
+	 * updates in that case and set the I_DIRTY_TIME flag.  If a
+	 * day or more has passed, then i_ts_dirty_day will be
+	 * different from days_since_boot, and then we should update
+	 * the on-disk inode and then we can clear i_ts_dirty_day.
+	 */
 	if ((inode->i_sb->s_flags & MS_LAZYTIME) &&
 	    !(flags & S_VERSION)) {
 		if (inode->i_state & I_DIRTY_TIME)
 			return 0;
-		spin_lock(&inode->i_lock);
-		inode->i_state |= I_DIRTY_TIME;
-		spin_unlock(&inode->i_lock);
-		return 0;
+		get_monotonic_boottime(&uptime);
+		daycode = div_u64(uptime.tv_sec, (HZ * 86400));
+		if (!inode->i_ts_dirty_day ||
+		    inode->i_ts_dirty_day == daycode) {
+			spin_lock(&inode->i_lock);
+			inode->i_state |= I_DIRTY_TIME;
+			spin_unlock(&inode->i_lock);
+			inode->i_ts_dirty_day = daycode;
+			return 0;
+		}
 	}
+	inode->i_ts_dirty_day = 0;
 	if (inode->i_op->write_time)
 		return inode->i_op->write_time(inode);
 	mark_inode_dirty_sync(inode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 489b2f2..e3574cd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -575,6 +575,7 @@ struct inode {
 	struct timespec		i_ctime;
 	spinlock_t		i_lock;	/* i_blocks, i_bytes, maybe i_size */
 	unsigned short          i_bytes;
+	unsigned short		i_ts_dirty_day;
 	unsigned int		i_blkbits;
 	blkcnt_t		i_blocks;
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH-v3 4/6] vfs: add lazytime tracepoints for better debugging
  2014-11-25  5:34 [PATCH-v3 0/6] add support for a lazytime mount option Theodore Ts'o
                   ` (2 preceding siblings ...)
  2014-11-25  5:34 ` [PATCH-v3 3/6] vfs: don't let the dirty time inodes get more than a day stale Theodore Ts'o
@ 2014-11-25  5:34 ` Theodore Ts'o
  2014-11-25  5:34 ` [PATCH-v3 5/6] ext4: add support for a lazytime mount option Theodore Ts'o
  2014-11-25  5:34 ` [PATCH-v3 6/6] btrfs: add an is_readonly() so btrfs can use common code for update_time() Theodore Ts'o
  5 siblings, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2014-11-25  5:34 UTC (permalink / raw)
  To: Linux Filesystem Development List
  Cc: Ext4 Developers List, Linux btrfs Developers List, XFS Developers,
	Theodore Ts'o

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/fs-writeback.c | 5 ++++-
 fs/inode.c        | 5 +++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index eb04277..cab2d6d 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -27,6 +27,7 @@
 #include <linux/backing-dev.h>
 #include <linux/tracepoint.h>
 #include <linux/device.h>
+#include <trace/events/fs.h>
 #include "internal.h"
 
 /*
@@ -1304,8 +1305,10 @@ static void flush_sb_dirty_time(struct super_block *sb)
 		iput(old_inode);
 		old_inode = inode;
 
-		if (dirty_time)
+		if (dirty_time) {
+			trace_fs_lazytime_flush(inode);
 			mark_inode_dirty(inode);
+		}
 		cond_resched();
 		spin_lock(&inode_sb_list_lock);
 	}
diff --git a/fs/inode.c b/fs/inode.c
index 34a443f..6319ead 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -20,6 +20,9 @@
 #include <linux/list_lru.h>
 #include "internal.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/fs.h>
+
 /*
  * Inode locking rules:
  *
@@ -544,6 +547,7 @@ static void evict(struct inode *inode)
 			mark_inode_dirty(inode);
 			inode->i_sb->s_op->write_inode(inode, &wbc);
 		}
+		trace_fs_lazytime_evict(inode);
 	}
 
 	if (!list_empty(&inode->i_wb_list))
@@ -1550,6 +1554,7 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
 			inode->i_state |= I_DIRTY_TIME;
 			spin_unlock(&inode->i_lock);
 			inode->i_ts_dirty_day = daycode;
+			trace_fs_lazytime_defer(inode);
 			return 0;
 		}
 	}
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH-v3 5/6] ext4: add support for a lazytime mount option
  2014-11-25  5:34 [PATCH-v3 0/6] add support for a lazytime mount option Theodore Ts'o
                   ` (3 preceding siblings ...)
  2014-11-25  5:34 ` [PATCH-v3 4/6] vfs: add lazytime tracepoints for better debugging Theodore Ts'o
@ 2014-11-25  5:34 ` Theodore Ts'o
  2014-11-25  5:34 ` [PATCH-v3 6/6] btrfs: add an is_readonly() so btrfs can use common code for update_time() Theodore Ts'o
  5 siblings, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2014-11-25  5:34 UTC (permalink / raw)
  To: Linux Filesystem Development List
  Cc: Ext4 Developers List, Linux btrfs Developers List, XFS Developers,
	Theodore Ts'o

Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode table block when we need to update some inode in
that inode table block anyway.

Also add some temporary code so that we can set the lazytime mount
option without needing a modified /sbin/mount program which can set
MS_LAZYTIME.  We can eventually make this go away once util-linux has
added support.

Google-Bug-Id: 18297052

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 fs/ext4/inode.c             | 48 ++++++++++++++++++++++++++++++++++++++++++---
 fs/ext4/super.c             |  9 +++++++++
 fs/inode.c                  | 36 ++++++++++++++++++++++++++++++++++
 include/linux/fs.h          |  2 ++
 include/trace/events/ext4.h | 30 ++++++++++++++++++++++++++++
 5 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3356ab5..03149b4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4163,6 +4163,50 @@ static int ext4_inode_blocks_set(handle_t *handle,
 }
 
 /*
+ * Opportunistically update the other time fields for other inodes in
+ * the same inode table block.
+ */
+static void ext4_update_other_inodes_time(struct super_block *sb,
+					  unsigned long orig_ino, char *buf)
+{
+	struct ext4_inode_info	*ei;
+	struct ext4_inode	*raw_inode;
+	unsigned long		ino;
+	struct inode		*inode;
+	int		i, inodes_per_block = EXT4_SB(sb)->s_inodes_per_block;
+	int		inode_size = EXT4_INODE_SIZE(sb);
+
+	ino = orig_ino & ~(inodes_per_block - 1);
+	for (i = 0; i < inodes_per_block; i++, ino++, buf += inode_size) {
+		if (ino == orig_ino)
+			continue;
+		inode = find_active_inode_nowait(sb, ino);
+		if (!inode ||
+		    (inode->i_state & I_DIRTY_TIME) == 0 ||
+		    !spin_trylock(&inode->i_lock)) {
+			iput(inode);
+			continue;
+		}
+		inode->i_state &= ~I_DIRTY_TIME;
+		inode->i_ts_dirty_day = 0;
+		spin_unlock(&inode->i_lock);
+
+		ei = EXT4_I(inode);
+		raw_inode = (struct ext4_inode *) buf;
+
+		spin_lock(&ei->i_raw_lock);
+		EXT4_INODE_SET_XTIME(i_ctime, inode, raw_inode);
+		EXT4_INODE_SET_XTIME(i_mtime, inode, raw_inode);
+		EXT4_INODE_SET_XTIME(i_atime, inode, raw_inode);
+		ext4_inode_csum_set(inode, raw_inode, ei);
+		spin_unlock(&ei->i_raw_lock);
+		trace_ext4_other_inode_update_time(inode, orig_ino);
+		iput(inode);
+	}
+}
+
+
+/*
  * Post the struct inode info into an on-disk inode location in the
  * buffer-cache.  This gobbles the caller's reference to the
  * buffer_head in the inode location struct.
@@ -4260,7 +4304,6 @@ static int ext4_do_update_inode(handle_t *handle,
 		for (block = 0; block < EXT4_N_BLOCKS; block++)
 			raw_inode->i_block[block] = ei->i_data[block];
 	}
-
 	if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
 		raw_inode->i_disk_version = cpu_to_le32(inode->i_version);
 		if (ei->i_extra_isize) {
@@ -4271,10 +4314,9 @@ static int ext4_do_update_inode(handle_t *handle,
 				cpu_to_le16(ei->i_extra_isize);
 		}
 	}
-
 	ext4_inode_csum_set(inode, raw_inode, ei);
-
 	spin_unlock(&ei->i_raw_lock);
+	ext4_update_other_inodes_time(inode->i_sb, inode->i_ino, bh->b_data);
 
 	BUFFER_TRACE(bh, "call ext4_handle_dirty_metadata");
 	rc = ext4_handle_dirty_metadata(handle, NULL, bh);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4b79f39..1ac1914 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1133,6 +1133,7 @@ enum {
 	Opt_noquota, Opt_barrier, Opt_nobarrier, Opt_err,
 	Opt_usrquota, Opt_grpquota, Opt_i_version,
 	Opt_stripe, Opt_delalloc, Opt_nodelalloc, Opt_mblk_io_submit,
+	Opt_lazytime, Opt_nolazytime,
 	Opt_nomblk_io_submit, Opt_block_validity, Opt_noblock_validity,
 	Opt_inode_readahead_blks, Opt_journal_ioprio,
 	Opt_dioread_nolock, Opt_dioread_lock,
@@ -1195,6 +1196,8 @@ static const match_table_t tokens = {
 	{Opt_i_version, "i_version"},
 	{Opt_stripe, "stripe=%u"},
 	{Opt_delalloc, "delalloc"},
+	{Opt_lazytime, "lazytime"},
+	{Opt_nolazytime, "nolazytime"},
 	{Opt_nodelalloc, "nodelalloc"},
 	{Opt_removed, "mblk_io_submit"},
 	{Opt_removed, "nomblk_io_submit"},
@@ -1450,6 +1453,12 @@ static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 	case Opt_i_version:
 		sb->s_flags |= MS_I_VERSION;
 		return 1;
+	case Opt_lazytime:
+		sb->s_flags |= MS_LAZYTIME;
+		return 1;
+	case Opt_nolazytime:
+		sb->s_flags &= ~MS_LAZYTIME;
+		return 1;
 	}
 
 	for (m = ext4_mount_opts; m->token != Opt_err; m++)
diff --git a/fs/inode.c b/fs/inode.c
index 6319ead..1f90591 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1296,6 +1296,42 @@ struct inode *ilookup(struct super_block *sb, unsigned long ino)
 }
 EXPORT_SYMBOL(ilookup);
 
+/**
+ * find_active_inode_nowait - find an active inode in the inode cache
+ * @sb:		super block of file system to search
+ * @ino:	inode number to search for
+ *
+ * Search for an active inode @ino in the inode cache, and if the
+ * inode is in the cache, the inode is returned with an incremented
+ * reference count.  If the inode is being freed or is newly
+ * initialized, return nothing instead of trying to wait for the inode
+ * initialization or destruction to be complete.
+ */
+struct inode *find_active_inode_nowait(struct super_block *sb,
+				       unsigned long ino)
+{
+	struct hlist_head *head = inode_hashtable + hash(sb, ino);
+	struct inode *inode, *ret_inode = NULL;
+
+	spin_lock(&inode_hash_lock);
+	hlist_for_each_entry(inode, head, i_hash) {
+		if ((inode->i_ino != ino) ||
+		    (inode->i_sb != sb))
+			continue;
+		spin_lock(&inode->i_lock);
+		if ((inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) == 0) {
+			__iget(inode);
+			ret_inode = inode;
+		}
+		spin_unlock(&inode->i_lock);
+		goto out;
+	}
+out:
+	spin_unlock(&inode_hash_lock);
+	return ret_inode;
+}
+EXPORT_SYMBOL(find_active_inode_nowait);
+
 int insert_inode_locked(struct inode *inode)
 {
 	struct super_block *sb = inode->i_sb;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e3574cd..dbbd642 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2413,6 +2413,8 @@ extern struct inode *ilookup(struct super_block *sb, unsigned long ino);
 
 extern struct inode * iget5_locked(struct super_block *, unsigned long, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *);
 extern struct inode * iget_locked(struct super_block *, unsigned long);
+extern struct inode *find_active_inode_nowait(struct super_block *,
+					      unsigned long);
 extern int insert_inode_locked4(struct inode *, unsigned long, int (*test)(struct inode *, void *), void *);
 extern int insert_inode_locked(struct inode *);
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
index ff4bd1b..ba649cb 100644
--- a/include/trace/events/ext4.h
+++ b/include/trace/events/ext4.h
@@ -75,6 +75,36 @@ struct extent_status;
 	{ FALLOC_FL_ZERO_RANGE,		"ZERO_RANGE"})
 
 
+TRACE_EVENT(ext4_other_inode_update_time,
+	TP_PROTO(struct inode *inode, ino_t orig_ino),
+
+	TP_ARGS(inode, orig_ino),
+
+	TP_STRUCT__entry(
+		__field(	dev_t,	dev			)
+		__field(	ino_t,	ino			)
+		__field(	ino_t,	orig_ino		)
+		__field(	uid_t,	uid			)
+		__field(	gid_t,	gid			)
+		__field(	__u16, mode			)
+	),
+
+	TP_fast_assign(
+		__entry->orig_ino = orig_ino;
+		__entry->dev	= inode->i_sb->s_dev;
+		__entry->ino	= inode->i_ino;
+		__entry->uid	= i_uid_read(inode);
+		__entry->gid	= i_gid_read(inode);
+		__entry->mode	= inode->i_mode;
+	),
+
+	TP_printk("dev %d,%d orig_ino %lu ino %lu mode 0%o uid %u gid %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  (unsigned long) __entry->orig_ino,
+		  (unsigned long) __entry->ino, __entry->mode,
+		  __entry->uid, __entry->gid)
+);
+
 TRACE_EVENT(ext4_free_inode,
 	TP_PROTO(struct inode *inode),
 
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH-v3 6/6] btrfs: add an is_readonly() so btrfs can use common code for update_time()
  2014-11-25  5:34 [PATCH-v3 0/6] add support for a lazytime mount option Theodore Ts'o
                   ` (4 preceding siblings ...)
  2014-11-25  5:34 ` [PATCH-v3 5/6] ext4: add support for a lazytime mount option Theodore Ts'o
@ 2014-11-25  5:34 ` Theodore Ts'o
  2014-11-25 16:02   ` David Sterba
  5 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2014-11-25  5:34 UTC (permalink / raw)
  To: Linux Filesystem Development List
  Cc: Ext4 Developers List, Theodore Ts'o,
	Linux btrfs Developers List, XFS Developers

The only reason btrfs cloned code from the VFS layer was so it could
add a check to see if a subvolume is read-ony.  Instead of doing that,
let's add a new inode operation which allows a file system to return
an error if the inode is read-only, and use that in update_time().
There may be other places where the VFS layer may want to know that
btrfs would want to treat an inode is read-only.

With this commit, there are no remaining users of update_time() in the
inode operations structure, so we can remove it and simply things
further.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: linux-btrfs@vger.kernel.org
---
 fs/btrfs/inode.c   | 26 ++++++--------------------
 fs/inode.c         | 22 +++++++++++-----------
 include/linux/fs.h |  2 +-
 3 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a5e0d0d..0bfe3a4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5554,26 +5554,12 @@ static int btrfs_dirty_inode(struct inode *inode)
 	return ret;
 }
 
-/*
- * This is a copy of file_update_time.  We need this so we can return error on
- * ENOSPC for updating the inode in the case of file write and mmap writes.
- */
-static int btrfs_update_time(struct inode *inode, struct timespec *now,
-			     int flags)
+static int btrfs_is_readonly(struct inode *inode)
 {
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 
 	if (btrfs_root_readonly(root))
 		return -EROFS;
-
-	if (flags & S_VERSION)
-		inode_inc_iversion(inode);
-	if (flags & S_CTIME)
-		inode->i_ctime = *now;
-	if (flags & S_MTIME)
-		inode->i_mtime = *now;
-	if (flags & S_ATIME)
-		inode->i_atime = *now;
 	return 0;
 }
 
@@ -9466,8 +9452,8 @@ static const struct inode_operations btrfs_dir_inode_operations = {
 	.permission	= btrfs_permission,
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
-	.update_time	= btrfs_update_time,
 	.write_time	= btrfs_write_time,
+	.is_readonly	= btrfs_is_readonly,
 	.tmpfile        = btrfs_tmpfile,
 };
 static const struct inode_operations btrfs_dir_ro_inode_operations = {
@@ -9475,8 +9461,8 @@ static const struct inode_operations btrfs_dir_ro_inode_operations = {
 	.permission	= btrfs_permission,
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
-	.update_time	= btrfs_update_time,
 	.write_time	= btrfs_write_time,
+	.is_readonly	= btrfs_is_readonly,
 };
 
 static const struct file_operations btrfs_dir_file_operations = {
@@ -9546,8 +9532,8 @@ static const struct inode_operations btrfs_file_inode_operations = {
 	.fiemap		= btrfs_fiemap,
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
-	.update_time	= btrfs_update_time,
 	.write_time	= btrfs_write_time,
+	.is_readonly	= btrfs_is_readonly,
 };
 static const struct inode_operations btrfs_special_inode_operations = {
 	.getattr	= btrfs_getattr,
@@ -9559,8 +9545,8 @@ static const struct inode_operations btrfs_special_inode_operations = {
 	.removexattr	= btrfs_removexattr,
 	.get_acl	= btrfs_get_acl,
 	.set_acl	= btrfs_set_acl,
-	.update_time	= btrfs_update_time,
 	.write_time	= btrfs_write_time,
+	.is_readonly	= btrfs_is_readonly,
 };
 static const struct inode_operations btrfs_symlink_inode_operations = {
 	.readlink	= generic_readlink,
@@ -9573,8 +9559,8 @@ static const struct inode_operations btrfs_symlink_inode_operations = {
 	.getxattr	= btrfs_getxattr,
 	.listxattr	= btrfs_listxattr,
 	.removexattr	= btrfs_removexattr,
-	.update_time	= btrfs_update_time,
 	.write_time	= btrfs_write_time,
+	.is_readonly	= btrfs_is_readonly,
 };
 
 const struct dentry_operations btrfs_dentry_operations = {
diff --git a/fs/inode.c b/fs/inode.c
index 1f90591..e83c6d2 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1555,20 +1555,20 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
 	unsigned short daycode;
 	int ret;
 
-	if (inode->i_op->update_time) {
-		ret = inode->i_op->update_time(inode, time, flags);
+	if (inode->i_op->is_readonly) {
+		ret = inode->i_op->is_readonly(inode);
 		if (ret)
 			return ret;
-	} else {
-		if (flags & S_ATIME)
-			inode->i_atime = *time;
-		if (flags & S_VERSION)
-			inode_inc_iversion(inode);
-		if (flags & S_CTIME)
-			inode->i_ctime = *time;
-		if (flags & S_MTIME)
-		inode->i_mtime = *time;
 	}
+	if (flags & S_ATIME)
+		inode->i_atime = *time;
+	if (flags & S_VERSION)
+		inode_inc_iversion(inode);
+	if (flags & S_CTIME)
+		inode->i_ctime = *time;
+	if (flags & S_MTIME)
+		inode->i_mtime = *time;
+
 	/*
 	 * If i_ts_dirty_day is zero, then either we have not deferred
 	 * timestamp updates, or the system has been up for less than
diff --git a/include/linux/fs.h b/include/linux/fs.h
index dbbd642..70711c8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1545,8 +1545,8 @@ struct inode_operations {
 	int (*removexattr) (struct dentry *, const char *);
 	int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
 		      u64 len);
-	int (*update_time)(struct inode *, struct timespec *, int);
 	int (*write_time)(struct inode *);
+	int (*is_readonly)(struct inode *);
 	int (*atomic_open)(struct inode *, struct dentry *,
 			   struct file *, unsigned open_flag,
 			   umode_t create_mode, int *opened);
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH-v3 3/6] vfs: don't let the dirty time inodes get more than a day stale
  2014-11-25  5:34 ` [PATCH-v3 3/6] vfs: don't let the dirty time inodes get more than a day stale Theodore Ts'o
@ 2014-11-25 14:58   ` Rasmus Villemoes
  2014-11-25 15:55     ` Theodore Ts'o
  0 siblings, 1 reply; 11+ messages in thread
From: Rasmus Villemoes @ 2014-11-25 14:58 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linux Filesystem Development List, Ext4 Developers List,
	Linux btrfs Developers List, XFS Developers

On Tue, Nov 25 2014, Theodore Ts'o <tytso@mit.edu> wrote:

>  static int update_time(struct inode *inode, struct timespec *time, int flags)
>  {
> +	struct timespec uptime;
> +	unsigned short daycode;
>  	int ret;
>  
>  	if (inode->i_op->update_time) {
> @@ -1525,17 +1527,33 @@ static int update_time(struct inode *inode, struct timespec *time, int flags)
>  		if (flags & S_CTIME)
>  			inode->i_ctime = *time;
>  		if (flags & S_MTIME)
> -			inode->i_mtime = *time;
> +		inode->i_mtime = *time;
>  	}
> +	/*
> +	 * If i_ts_dirty_day is zero, then either we have not deferred
> +	 * timestamp updates, or the system has been up for less than
> +	 * a day (so days_since_boot is zero), so we defer timestamp
> +	 * updates in that case and set the I_DIRTY_TIME flag.  If a
> +	 * day or more has passed, then i_ts_dirty_day will be
> +	 * different from days_since_boot, and then we should update
> +	 * the on-disk inode and then we can clear i_ts_dirty_day.
> +	 */

I think days_since_boot was a lot clearer than daycode. In any case,
please make the comment and the code consistent.

>  	if ((inode->i_sb->s_flags & MS_LAZYTIME) &&
>  	    !(flags & S_VERSION)) {
>  		if (inode->i_state & I_DIRTY_TIME)
>  			return 0;
> -		spin_lock(&inode->i_lock);
> -		inode->i_state |= I_DIRTY_TIME;
> -		spin_unlock(&inode->i_lock);
> -		return 0;
> +		get_monotonic_boottime(&uptime);
> +		daycode = div_u64(uptime.tv_sec, (HZ * 86400));

You should probably divide by the number of seconds in a day, not the
number of jiffies in a day.

Isn't div_u64 mostly for when the divisor is not known at compile time?
Technically, "(u64)uptime.tv_sec / 86400" is of course a u64/u64 division,
but the compiler should see that the divisor is only 32 bits and hence
should be able to generate code which is at least as efficient as
whatever inline asm the arch provides for u64/u32 divisions.


Rasmus

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH-v3 3/6] vfs: don't let the dirty time inodes get more than a day stale
  2014-11-25 14:58   ` Rasmus Villemoes
@ 2014-11-25 15:55     ` Theodore Ts'o
  0 siblings, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2014-11-25 15:55 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Linux Filesystem Development List, Ext4 Developers List,
	Linux btrfs Developers List, XFS Developers

On Tue, Nov 25, 2014 at 03:58:01PM +0100, Rasmus Villemoes wrote:
> 
> I think days_since_boot was a lot clearer than daycode. In any case,
> please make the comment and the code consistent.

Yeah, I was going back and forth between days since the epoch and days
since boot, but found it was more efficient to calculate the days
since boot.  Sure, I'll go back to days_since_boot.

> You should probably divide by the number of seconds in a day, not the
> number of jiffies in a day.

Right, brain hiccup on my part, will fix.

> Isn't div_u64 mostly for when the divisor is not known at compile time?
> Technically, "(u64)uptime.tv_sec / 86400" is of course a u64/u64 division,
> but the compiler should see that the divisor is only 32 bits and hence
> should be able to generate code which is at least as efficient as
> whatever inline asm the arch provides for u64/u32 divisions.

The problem with doing u64/u64 divisions is that the compiler will
call out to a (non-existent) library function on some architectures.
For example, try compiling the following using: with "gcc -S -m32
foo.c"

int main(int argc, char **argv)
{
	unsigned long long t = time(0);

	printf("%llu\n", t / 86400);
}

You will find in the .S file the following:

	...
	pushl	$0
	pushl	$86400
	pushl	%edx
	pushl	%eax
	call	__udivdi3
	...

It will work finn compiling for x86_64, but if you do this and push
out a git branch, you will soon get a very nice e-mail from the ktest
bot explaining how you've broken the build on the i386 architecture
since the kernel doesn't provide __udivdi3.
	
Hence if you are doing any kind of divisions involving u64, you have
to use the functions in include/linux/math64.h, and div_u64 is, per
math64.h:

/**
 * div_u64 - unsigned 64bit divide with 32bit divisor
 *
 * This is the most common 64bit divide and should be used if possible,
 * as many 32bit archs can optimize this variant better than a full 64bit
 * divide.
 */

Cheers,

					- Ted

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH-v3 1/6] fs: split update_time() into update_time() and write_time()
  2014-11-25  5:34 ` [PATCH-v3 1/6] fs: split update_time() into update_time() and write_time() Theodore Ts'o
@ 2014-11-25 16:00   ` David Sterba
  0 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2014-11-25 16:00 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linux Filesystem Development List, Ext4 Developers List,
	Linux btrfs Developers List, XFS Developers

On Tue, Nov 25, 2014 at 12:34:29AM -0500, Theodore Ts'o wrote:
> In preparation for adding support for the lazytime mount option, we
> need to be able to separate out the update_time() and write_time()
> inode operations.  Currently, only btrfs and xfs uses update_time().
> 
> We needed to preserve update_time() because btrfs wants to have a
> special btrfs_root_readonly() check; otherwise we could drop the
> update_time() inode operation entirely.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: xfs@oss.sgi.com
> Cc: linux-btrfs@vger.kernel.org

For the btrfs changes

Acked-by: David Sterba <dsterba@suse.cz>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH-v3 6/6] btrfs: add an is_readonly() so btrfs can use common code for update_time()
  2014-11-25  5:34 ` [PATCH-v3 6/6] btrfs: add an is_readonly() so btrfs can use common code for update_time() Theodore Ts'o
@ 2014-11-25 16:02   ` David Sterba
  0 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2014-11-25 16:02 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Linux Filesystem Development List, Ext4 Developers List,
	Linux btrfs Developers List, XFS Developers

On Tue, Nov 25, 2014 at 12:34:34AM -0500, Theodore Ts'o wrote:
> The only reason btrfs cloned code from the VFS layer was so it could
> add a check to see if a subvolume is read-ony.  Instead of doing that,
> let's add a new inode operation which allows a file system to return
> an error if the inode is read-only, and use that in update_time().
> There may be other places where the VFS layer may want to know that
> btrfs would want to treat an inode is read-only.
> 
> With this commit, there are no remaining users of update_time() in the
> inode operations structure, so we can remove it and simply things
> further.
> 
> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> Cc: linux-btrfs@vger.kernel.org

Reviewed-by: David Sterba <dsterba@suse.cz>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-11-25 16:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-25  5:34 [PATCH-v3 0/6] add support for a lazytime mount option Theodore Ts'o
2014-11-25  5:34 ` [PATCH-v3 1/6] fs: split update_time() into update_time() and write_time() Theodore Ts'o
2014-11-25 16:00   ` David Sterba
2014-11-25  5:34 ` [PATCH-v3 2/6] vfs: add support for a lazytime mount option Theodore Ts'o
2014-11-25  5:34 ` [PATCH-v3 3/6] vfs: don't let the dirty time inodes get more than a day stale Theodore Ts'o
2014-11-25 14:58   ` Rasmus Villemoes
2014-11-25 15:55     ` Theodore Ts'o
2014-11-25  5:34 ` [PATCH-v3 4/6] vfs: add lazytime tracepoints for better debugging Theodore Ts'o
2014-11-25  5:34 ` [PATCH-v3 5/6] ext4: add support for a lazytime mount option Theodore Ts'o
2014-11-25  5:34 ` [PATCH-v3 6/6] btrfs: add an is_readonly() so btrfs can use common code for update_time() Theodore Ts'o
2014-11-25 16:02   ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).