From: Dmitry Monakhov
Subject: Re: [PATCH-v9 0/3] add support for lazytime mount option
Date: Tue, 03 Feb 2015 10:56:47 +0300
Message-ID: <877fvz1l8g.fsf@openvz.org>
References: <1422855422-7444-1-git-send-email-tytso@mit.edu>
In-Reply-To: <1422855422-7444-1-git-send-email-tytso@mit.edu>
To: Theodore Ts'o , Linux Filesystem Development List
Cc: viro@ZenIV.linux.org.uk, Theodore Ts'o

Theodore Ts'o writes:

> This is an updated version of what had originally been an
> ext4-specific patch which significantly improves performance by lazily
> writing timestamp updates (and in particular, mtime updates) to disk.
> The in-memory timestamps are always correct, but they are only written
> to disk when required for correctness.
>
> This provides a huge performance boost for ext4 due to how it handles
> journalling, but it's valuable for all file systems running on flash
> storage or drive-managed SMR disks by reducing the metadata write
> load.  So upon request, I've moved the functionality to the VFS layer.
> Once the /sbin/mount program adds support for MS_LAZYTIME, all file
> systems should be able to benefit from this optimization.
>
> There is still an ext4-specific optimization, which may be applicable
> for other file systems which store more than one inode in a block, but
> it will require file system specific code.  It is purely optional,
> however.
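The idea described above (in-memory timestamps always current, on-disk timestamps written only when needed) can be pictured with a simplified user-space model in C. This is a sketch under stated assumptions, not the kernel implementation: all `model_*` names are hypothetical, `MODEL_I_DIRTY_TIME` = 2048 mirrors the `state & 2048` tracepoint filter quoted further down, `MODEL_I_DIRTY_SYNC` = 1 is an assumption, and the expiry interval is a free parameter (the tunable knob asked for in the test plan below).

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values for this model only. MODEL_I_DIRTY_TIME = 2048 mirrors the
 * "state & 2048" tracepoint filter; MODEL_I_DIRTY_SYNC = 1 is assumed. */
#define MODEL_I_DIRTY_SYNC 0x0001u
#define MODEL_I_DIRTY_TIME 0x0800u

struct model_inode {
	int64_t mem_mtime, disk_mtime;	/* in-memory vs. persisted mtime */
	int64_t mem_size, disk_size;	/* a non-timestamp field */
	int64_t dirtied_time_when;	/* when MODEL_I_DIRTY_TIME was set */
	unsigned state;			/* MODEL_I_DIRTY_* flags */
	unsigned disk_writes;		/* number of inode writes issued */
};

/* Persist the whole in-memory inode and clear all dirty flags. */
static void model_write_inode(struct model_inode *ino)
{
	ino->disk_mtime = ino->mem_mtime;
	ino->disk_size = ino->mem_size;
	ino->state &= ~(MODEL_I_DIRTY_SYNC | MODEL_I_DIRTY_TIME);
	ino->disk_writes++;
}

/* Timestamp-only change. Without lazytime the inode is simply dirtied;
 * with lazytime only the in-memory copy changes and the time flag is set. */
static void model_update_time(struct model_inode *ino, int64_t now,
			      bool lazytime)
{
	ino->mem_mtime = now;
	if (!lazytime) {
		ino->state |= MODEL_I_DIRTY_SYNC;
		return;
	}
	if (ino->state & MODEL_I_DIRTY_SYNC)
		return;		/* a full inode write is already pending */
	if (!(ino->state & MODEL_I_DIRTY_TIME))
		ino->dirtied_time_when = now;
	ino->state |= MODEL_I_DIRTY_TIME;
}

/* Any other inode change: the pending timestamp rides along with the
 * full inode write, so the time-only flag is dropped. */
static void model_set_size(struct model_inode *ino, int64_t size)
{
	ino->mem_size = size;
	ino->state = (ino->state | MODEL_I_DIRTY_SYNC) & ~MODEL_I_DIRTY_TIME;
}

/* One writeback pass: dirty inodes are always written; time-only inodes
 * are written once their timestamps expire, or unconditionally on sync. */
static void model_writeback(struct model_inode *ino, int64_t now,
			    int64_t expire_interval, bool sync_all)
{
	if (ino->state & MODEL_I_DIRTY_SYNC)
		model_write_inode(ino);
	else if ((ino->state & MODEL_I_DIRTY_TIME) &&
		 (sync_all || now - ino->dirtied_time_when >= expire_interval))
		model_write_inode(ino);
}
```

In this model, repeated timestamp-only updates under lazytime cause no inode writes until the interval expires, a sync is requested, or some other field dirties the inode; test 2(B) in the plan below corresponds to calling `model_writeback()` after the interval has elapsed.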
FYI: I'm writing xfstests for this feature. Here is the list of planned tests:

0) Consistency tests: check that a pending mtime update is written out
   together with updates to other inode fields.
   A) for each ino_field in "i_size xattr owner perm i_generation":
      update $ino_field; umount/mount; check mtime

1) Integrity tests: check that umount/sync/fsync force the mtime update
   out to disk. The umount case is quite obvious.
   A) mtime_update; umount/mount; check mtime
   B) mtime_update; sync; hwfailure-simulation; umount/mount; check mtime
   C) mtime_update; fsync; hwfailure-simulation; umount/mount; check mtime

2) Check that the mtime delay actually works. This is a statistical
   method that operates on many files.
   A) mtime_update; hwfailure-simulation; umount/mount; check mtime
   B) mtime_update; wait-lazytime-period; hwfailure-sim; umount/mount;
      check mtime

For (B) we need a sysfs knob to make the lazytime expiration time
tunable, so it can be shortened to 1-2 minutes.

>
> For people interested in seeing how timestamp updates are held back, the
> following example commands to enable the tracepoints debugging may be
> helpful:
>
>      mount -o remount,lazytime /
>      cd /sys/kernel/debug/tracing
>      echo 1 > events/writeback/writeback_lazytime/enable
>      echo 1 > events/writeback/writeback_lazytime_iput/enable
>      echo "state & 2048" > events/writeback/writeback_dirty_inode_enqueue/filter
>      echo 1 > events/writeback/writeback_dirty_inode_enqueue/enable
>      echo 1 > events/ext4/ext4_other_inode_update_time/enable
>      cat trace_pipe
>
> You can also see how many lazytime inodes are in memory by looking in
> /sys/kernel/debug/bdi//stats
>
> Changes since -v8:
> - in ext4_update_other_inodes_time() clear I_DIRTY_TIME_EXPIRED as
>   well as I_DIRTY_TIME
> - Fixed a bug which broke writeback in some cases (introduced in -v7)
>
> Changes since -v7:
> - Fix comment typos
> - Clear the I_DIRTY_TIME flag if I_DIRTY_INODE gets added in
>   __mark_inode_dirty()
> - Fix a bug accidentally introduced in -v7 which broke lazytime altogether
>
> Changes since -v6:
> - Add a new tracepoint writeback_dirty_inode_enqueue
> - Move generic handling of update_time() to generic_update_time(),
>   so filesystems can more easily hook or modify update_time()
> - The file system's dirty_inode() will now always get called with
>   I_DIRTY_TIME when the inode time is updated.  (I_DIRTY_SYNC will
>   also be set if the inode should be updated right away.)  This allows
>   file systems such as XFS to update their on-disk copy of the inode if
>   I_DIRTY_TIME is set.
>
> Changes since -v5:
> - Tweak move_expired_inodes to handle sync() and syncfs(), and drop
>   flush_sb_dirty_time().
> - Move logic for handling the b_dirty_time list into
>   __mark_inode_dirty().
> - Move I_DIRTY back to its original definition, and use I_DIRTY_ALL
>   for I_DIRTY plus I_DIRTY_TIME.
> - Fold some patches together to make the first patch easier to
>   review (and modify/update).
> - Use the pre-existing writeback tracepoints instead of creating new
>   fs tracepoints.
>
> Changes since -v4:
> - Fix ext4 optimization so it does not need to increment (and more
>   problematically, decrement) the inode reference count
> - Per Christoph's suggestion, drop support for btrfs and xfs for now,
>   due to issues with how btrfs and xfs handle dirty inode tracking.  We
>   can add btrfs and xfs support back later or at the end of this series
>   if we want to revisit this decision.
> - Miscellaneous cleanups
>
> Changes since -v3:
> - inodes with I_DIRTY_TIME set are placed on a new bdi list,
>   b_dirty_time.  This allows filesystem-level syncs to more
>   easily iterate over those inodes that need to have their
>   timestamps written to disk.
> - dirty timestamps will be written out asynchronously on the final
>   iput, instead of when the inode gets evicted.
> - move the definition of the new function
>   find_active_inode_nowait() into a separate patch
> - create separate flag masks: I_DIRTY_WB and I_DIRTY_INODE, which
>   indicate whether the inode needs to be on the writeback lists,
>   or whether the inode itself is dirty, while I_DIRTY means any one
>   of the inode dirty flags is set.  This simplifies the fs
>   writeback logic which needs to test for different combinations of
>   the inode dirty flags in different places.
>
> Changes since -v2:
> - If update_time() updates i_version, it will not use lazytime (i.e.,
>   the inode will be marked dirty so the change will be persisted to
>   disk sooner rather than later).  Yes, this eliminates the
>   benefits of lazytime if the user is exporting the file system via
>   NFSv4.  Sad, but NFS's requirements seem to mandate this.
> - Fix time wrapping bug 49 days after the system boots (on a system
>   with 32-bit jiffies).  Use get_monotonic_boottime() instead.
> - Clean up type warning in include/tracing/ext4.h
> - Added explicit parentheses for stylistic reasons
> - Added an is_readonly() inode operations method so btrfs doesn't
>   have to duplicate code in update_time().
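The 49-day figure in the -v2 changelog is the wrap period of a 32-bit tick counter at HZ=1000: 2^32 / 1000 s is about 4294967 s, or roughly 49.7 days. HZ is a kernel build option, so HZ=1000 is an assumption here; at HZ=250 the same counter lasts about 199 days. The C sketch below shows that arithmetic and how a day count derived naively from such a counter drops back to zero at the wrap. The helper names are hypothetical, not kernel functions.

```c
#include <stdint.h>

/* Seconds until a 32-bit tick counter wraps, given the tick rate HZ. */
static double jiffies32_wrap_seconds(unsigned hz)
{
	return 4294967296.0 / hz;	/* 2^32 ticks */
}

/* The same period expressed in days. */
static double jiffies32_wrap_days(unsigned hz)
{
	return jiffies32_wrap_seconds(hz) / 86400.0;
}

/* Hypothetical naive "days since boot" derived from a 32-bit counter:
 * it silently drops back to 0 when the counter wraps. */
static uint32_t naive_days_since_boot(uint32_t jiffies32, unsigned hz)
{
	return jiffies32 / hz / 86400u;
}
```

At `UINT32_MAX` the naive helper reports day 49, and one tick later it reports day 0, which is the kind of discontinuity the switch to get_monotonic_boottime() (which reports boot time as a struct timespec rather than as a tick count) avoids.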
>
> Changes since -v1:
> - Added explanatory comments in update_time() regarding i_ts_dirty_days
> - Fix type used for days_since_boot
> - Improve SMP scalability in update_time and ext4_update_other_inodes_time
> - Added tracepoints to help test and characterize how often and under
>   what circumstances inodes have their timestamps lazily updated
>
> Theodore Ts'o (3):
>   vfs: add support for a lazytime mount option
>   vfs: add find_inode_nowait() function
>   ext4: add optimization for the lazytime mount option
>
>  fs/ext4/inode.c                  |  70 +++++++++++++++++++++++++-
>  fs/ext4/super.c                  |  10 ++++
>  fs/fs-writeback.c                |  62 +++++++++++++++++++----
>  fs/gfs2/file.c                   |   4 +-
>  fs/inode.c                       | 106 +++++++++++++++++++++++++++++++++------
>  fs/jfs/file.c                    |   2 +-
>  fs/libfs.c                       |   2 +-
>  fs/proc_namespace.c              |   1 +
>  fs/sync.c                        |   8 +++
>  include/linux/backing-dev.h      |   1 +
>  include/linux/fs.h               |  10 ++++
>  include/trace/events/ext4.h      |  30 +++++++++++
>  include/trace/events/writeback.h |  60 +++++++++++++++++++++-
>  include/uapi/linux/fs.h          |   4 +-
>  mm/backing-dev.c                 |  10 +++-
>  15 files changed, 343 insertions(+), 37 deletions(-)
>
> -- 
> 2.1.0
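For the "check mtime" step in the xfstests plan above, one option is to snapshot st_mtim before the umount/mount cycle and compare it afterwards. Below is a minimal POSIX C sketch, assuming nanosecond timestamps via st_mtim. The helper names are hypothetical, and the umount/mount and hardware-failure simulation steps are left to the test harness (xfstests tests themselves are shell scripts). The update overwrites the first byte in place rather than appending, on the assumption that an i_size change would itself dirty the inode and so hide the lazy behaviour being tested.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* "mtime_update": overwrite byte 0 in place so mtime changes but i_size
 * does not (except on the very first call, which creates the file). */
static int mtime_update(const char *path)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0644);
	if (fd < 0)
		return -1;
	int rc = (pwrite(fd, "x", 1, 0) == 1) ? 0 : -1;
	if (close(fd) != 0)
		rc = -1;
	return rc;
}

/* Record the current mtime (seconds + nanoseconds) of path. */
static int mtime_snapshot(const char *path, struct timespec *out)
{
	struct stat st;
	if (stat(path, &st) != 0)
		return -1;
	*out = st.st_mtim;
	return 0;
}

/* Exact comparison of two recorded mtimes. */
static bool mtime_equal(const struct timespec *a, const struct timespec *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}
```

For tests 1(B) and 1(C), the snapshot taken before the simulated failure would be expected to match the one taken after remount; for 2(A), a mismatch on most files would be the sign that the update was held back. The saved snapshot has to be stored outside the filesystem under test.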