From: Michael Kerrisk <mtk.manpages@gmail.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Linux Filesystem Development List <linux-fsdevel@vger.kernel.org>,
Al Viro <viro@zeniv.linux.org.uk>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH-v9 0/3] add support for lazytime mount option
Date: Mon, 2 Feb 2015 07:03:11 +0100
Message-ID: <CAHO5Pa0ySnLb_UGUw3deVyZEr8gdzzdeyMP5rXcT1MLOeccLGg@mail.gmail.com>
In-Reply-To: <1422855422-7444-1-git-send-email-tytso@mit.edu>
Hi Ted,
Since this is an API change, linux-api@ should be CCed. Added.
Thanks,
Michael
On Mon, Feb 2, 2015 at 6:36 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> This is an updated version of what had originally been an
> ext4-specific patch which significantly improves performance by lazily
> writing timestamp updates (and in particular, mtime updates) to disk.
> The in-memory timestamps are always correct, but they are only written
> to disk when required for correctness.
>
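A toy model may help make the "held back" behaviour concrete. This is only a sketch: `toy_inode`, `toy_update_time()`, `toy_flush()` and `toy_count_writes()` are hypothetical names invented for illustration, not kernel structures or functions, and the model ignores journalling, expiry and writeback entirely. It just counts how many simulated inode writes result from a run of timestamp updates followed by one sync.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these names are illustrative, not kernel structures. */
struct toy_inode {
    int64_t  mem_mtime;   /* timestamp visible to stat(); always current */
    int64_t  disk_mtime;  /* last timestamp persisted to "disk" */
    bool     dirty_time;  /* in-memory timestamp is newer than the disk copy */
    unsigned disk_writes; /* number of simulated inode writes */
};

/* Persist the in-memory timestamp if it has not been written yet. */
static void toy_flush(struct toy_inode *ino)
{
    if (!ino->dirty_time)
        return;
    ino->disk_mtime = ino->mem_mtime;
    ino->dirty_time = false;
    ino->disk_writes++;
}

/* Record a new mtime; when lazy is false, write it out immediately. */
static void toy_update_time(struct toy_inode *ino, int64_t now, bool lazy)
{
    ino->mem_mtime = now;
    ino->dirty_time = true;
    if (!lazy)
        toy_flush(ino);
}

/* Apply n timestamp updates followed by one sync; return the write count. */
unsigned toy_count_writes(unsigned n, bool lazy)
{
    struct toy_inode ino = {0};

    for (unsigned i = 1; i <= n; i++)
        toy_update_time(&ino, (int64_t)i, lazy);
    toy_flush(&ino); /* stands in for sync() or the final iput */
    return ino.disk_writes;
}
```

In this model, 1000 eager updates cost 1000 writes, while 1000 lazy updates cost a single write at sync time; the in-memory timestamp is identical in both cases.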
> This provides a huge performance boost for ext4 due to how it handles
> journalling, but it's valuable for all file systems running on flash
> storage or drive-managed SMR disks by reducing the metadata write
> load. So upon request, I've moved the functionality to the VFS layer.
> Once the /sbin/mount program adds support for MS_LAZYTIME, all file
> systems should be able to benefit from this optimization.
>
> There is still an ext4-specific optimization, which may be applicable
> for other file systems which store more than one inode in a block, but
> it will require file system specific code. It is purely optional,
> however.
>
> For people interested in seeing how timestamp updates are held back, the
> following example commands to enable the tracepoint debugging may be
> helpful:
>
> mount -o remount,lazytime /
> cd /sys/kernel/debug/tracing
> echo 1 > events/writeback/writeback_lazytime/enable
> echo 1 > events/writeback/writeback_lazytime_iput/enable
> echo "state & 2048" > events/writeback/writeback_dirty_inode_enqueue/filter
> echo 1 > events/writeback/writeback_dirty_inode_enqueue/enable
> echo 1 > events/ext4/ext4_other_inode_update_time/enable
> cat trace_pipe
>
> You can also see how many lazytime inodes are in memory by looking in
> /sys/kernel/debug/bdi/<bdi>/stats
>
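On the "state & 2048" filter above: 2048 is 1 << 11, so the filter passes only events whose inode state has bit 11 set. I am assuming that bit 11 is the I_DIRTY_TIME flag in this series; the cover letter does not say so explicitly. `TOY_I_DIRTY_TIME` and `toy_matches_filter()` below are hypothetical names used only to spell out the arithmetic.

```c
#include <stdbool.h>

/* Assumption: bit 11 is the I_DIRTY_TIME flag in this patch series. */
#define TOY_I_DIRTY_TIME (1UL << 11)

/* Mirror of the tracepoint filter expression "state & 2048". */
bool toy_matches_filter(unsigned long state)
{
    return (state & 2048) != 0;
}
```

A state with only the low dirty bits set (for example 7) does not match, while any state that includes bit 11 does.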
> Changes since -v8:
> - in ext4_update_other_inodes_time() clear I_DIRTY_TIME_EXPIRED as
> well as I_DIRTY_TIME
> - Fixed a bug which broke writeback in some cases (introduced in -v7)
>
> Changes since -v7:
> - Fix comment typos
> - Clear the I_DIRTY_TIME flag if I_DIRTY_INODE gets added in
> __mark_inode_dirty()
> - Fix a bug accidentally introduced in -v7 which broke lazytime altogether
>
> Changes since -v6:
> - Add a new tracepoint writeback_dirty_inode_enqueue
> - Move generic handling of update_time() to generic_update_time(),
> so filesystems can more easily hook or modify update_time()
> - The file system's dirty_inode() will now always get called with
> I_DIRTY_TIME when the inode time is updated. (I_DIRTY_SYNC will
> also be set if the inode should be updated right away.) This allows
> file systems such as XFS to update their on-disk copy of the inode if
> I_DIRTY_TIME is set.
>
> Changes since -v5:
> - Tweak move_expired_inodes to handle sync() and syncfs(), and drop
> flush_sb_dirty_time().
> - Move logic for handling the b_dirty_time list into
> __mark_inode_dirty().
> - Move I_DIRTY back to its original definition, and use I_DIRTY_ALL
> for I_DIRTY plus I_DIRTY_TIME.
> - Fold some patches together to make the first patch easier to
> review (and modify/update).
> - Use the pre-existing writeback tracepoints instead of creating new
> fs tracepoints.
>
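The relationship between the flag masks mentioned in the changelogs (I_DIRTY, I_DIRTY_ALL, and clearing I_DIRTY_TIME once I_DIRTY_INODE is added) can be sketched as follows. The bit values reflect my reading of include/linux/fs.h and should be treated as assumptions; the `TOY_` prefix and `toy_mark_dirty()` are hypothetical, and the function is a simplification, not the logic of __mark_inode_dirty().

```c
/* Assumed bit values; check include/linux/fs.h in the patched tree. */
#define TOY_I_DIRTY_SYNC     (1u << 0)
#define TOY_I_DIRTY_DATASYNC (1u << 1)
#define TOY_I_DIRTY_PAGES    (1u << 2)
#define TOY_I_DIRTY_TIME     (1u << 11)

/* The inode itself (not just its pages or timestamps) is dirty. */
#define TOY_I_DIRTY_INODE (TOY_I_DIRTY_SYNC | TOY_I_DIRTY_DATASYNC)
/* Original definition of "dirty", without the lazy timestamp bit. */
#define TOY_I_DIRTY       (TOY_I_DIRTY_INODE | TOY_I_DIRTY_PAGES)
/* I_DIRTY plus I_DIRTY_TIME. */
#define TOY_I_DIRTY_ALL   (TOY_I_DIRTY | TOY_I_DIRTY_TIME)

/*
 * Return the new state after adding flags. Once the inode itself is
 * dirty it will be written anyway, so the timestamp-only bit is dropped.
 */
unsigned toy_mark_dirty(unsigned state, unsigned flags)
{
    state |= flags;
    if (state & TOY_I_DIRTY_INODE)
        state &= ~TOY_I_DIRTY_TIME;
    return state;
}
```

With these definitions, marking a timestamp-dirty inode with I_DIRTY_SYNC leaves only I_DIRTY_SYNC set, whereas dirtying its pages keeps the lazy timestamp bit.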
> Changes since -v4:
> - Fix ext4 optimization so it does not need to increment (and more
> problematically, decrement) the inode reference count
> - Per Christoph's suggestion, drop support for btrfs and xfs for now,
> due to issues with how btrfs and xfs handle dirty inode tracking. We can add
> btrfs and xfs support back later or at the end of this series if we
> want to revisit this decision.
> - Miscellaneous cleanups
>
> Changes since -v3:
> - inodes with I_DIRTY_TIME set are placed on a new bdi list,
> b_dirty_time. This allows filesystem-level syncs to more
> easily iterate over those inodes that need to have their
> timestamps written to disk.
> - dirty timestamps will be written out asynchronously on the final
> iput, instead of when the inode gets evicted.
> - separate the definition of the new function
> find_active_inode_nowait() to a separate patch
> - create separate flag masks: I_DIRTY_WB and I_DIRTY_INODE, which
> indicate whether the inode needs to be on the write back lists,
> or whether the inode itself is dirty, while I_DIRTY means any one
> of the inode dirty flags are set. This simplifies the fs
> writeback logic which needs to test for different combinations of
> the inode dirty flags in different places.
>
> Changes since -v2:
> - If update_time() updates i_version, it will not use lazytime (i.e.,
> the inode will be marked dirty so the change will be persisted to
> disk sooner rather than later). Yes, this eliminates the
> benefits of lazytime if the user is exporting the file system via
> NFSv4. Sad, but NFS's requirements seem to mandate this.
> - Fix time wrapping bug 49 days after the system boots (on a system
> with a 32-bit jiffies). Use get_monotonic_boottime() instead.
> - Clean up type warning in include/tracing/ext4.h
> - Added explicit parentheses for stylistic reasons
> - Added an is_readonly() inode operations method so btrfs doesn't
> have to duplicate code in update_time().
>
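The "49 days" figure in the -v2 notes is consistent with an unsigned 32-bit tick counter running at 1000 ticks per second. The tick rate is my assumption (the cover letter does not state it), and `toy_wrap_days()` is a hypothetical helper that only does the arithmetic.

```c
/* Days until an unsigned 32-bit tick counter wraps at hz ticks per second. */
double toy_wrap_days(unsigned hz)
{
    const double ticks = 4294967296.0;      /* 2^32 */
    const double seconds_per_day = 86400.0;

    return ticks / (double)hz / seconds_per_day;
}
```

At 1000 ticks per second this gives roughly 49.7 days; lower tick rates push the wrap point out proportionally.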
> Changes since -v1:
> - Added explanatory comments in update_time() regarding i_ts_dirty_days
> - Fix type used for days_since_boot
> - Improve SMP scalability in update_time and ext4_update_other_inodes_time
> - Added tracepoints to help test and characterize how often and under
> what circumstances inodes have their timestamps lazily updated
>
> Theodore Ts'o (3):
> vfs: add support for a lazytime mount option
> vfs: add find_inode_nowait() function
> ext4: add optimization for the lazytime mount option
>
> fs/ext4/inode.c | 70 +++++++++++++++++++++++++-
> fs/ext4/super.c | 10 ++++
> fs/fs-writeback.c | 62 +++++++++++++++++++----
> fs/gfs2/file.c | 4 +-
> fs/inode.c | 106 +++++++++++++++++++++++++++++++++------
> fs/jfs/file.c | 2 +-
> fs/libfs.c | 2 +-
> fs/proc_namespace.c | 1 +
> fs/sync.c | 8 +++
> include/linux/backing-dev.h | 1 +
> include/linux/fs.h | 10 ++++
> include/trace/events/ext4.h | 30 +++++++++++
> include/trace/events/writeback.h | 60 +++++++++++++++++++++-
> include/uapi/linux/fs.h | 4 +-
> mm/backing-dev.c | 10 +++-
> 15 files changed, 343 insertions(+), 37 deletions(-)
>
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/