From: Jan Kara <jack@suse.cz>
To: Ted Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: [PATCH-v5 1/5] vfs: add support for a lazytime mount option
Date: Tue, 2 Dec 2014 13:58:20 +0100
Message-ID: <20141202125820.GE9092@quack.suse.cz>
In-Reply-To: <20141128181421.GA19461@google.com>
On Fri 28-11-14 13:14:21, Ted Tso wrote:
> On Fri, Nov 28, 2014 at 06:23:23PM +0100, Jan Kara wrote:
> > Hum, when someone calls fsync() for an inode, you likely want to sync
> > timestamps to disk even if everything else is clean. I think that doing
> > what you did in last version:
> > 	dirty = inode->i_state & I_DIRTY_INODE;
> > 	inode->i_state &= ~I_DIRTY_INODE;
> > 	spin_unlock(&inode->i_lock);
> > 	if (dirty & I_DIRTY_TIME)
> > 		mark_inode_dirty_sync(inode);
> > looks better to me. IMO when someone calls __writeback_single_inode() we
> > should write whatever we have...
>
> Yes, but we also have to distinguish between what happens on an
> fsync() versus what happens on a periodic writeback if I_DIRTY_PAGES
> (but not I_DIRTY_SYNC or I_DIRTY_DATASYNC) is set. So there is a
> check in the fsync() code path to handle the concern you raised above.
Ah, this is probably the thing you have been talking about, and which I
kept missing in my thinking. You don't want to write timestamps when the
inode has only dirty pages and dirty timestamps; I was always thinking
about the situation where the inode has only dirty timestamps and no
dirty pages. This
situation also complicates the writeback logic: when the inode has dirty
pages, you need to track it as a normal dirty inode for page writeback
(with dirtied_when corresponding to the time when the pages were dirtied),
but in parallel you now need to track that the inode has timestamps that
haven't been written for X long. And even if we stored how old the
timestamps are, it isn't easy to keep the list of inodes with just dirty
timestamps sorted by dirty time. So now I finally understand why you did
things the way you did... Sorry for misleading you.
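The distinction above (timestamps get written on fsync()/sync(), but not
when periodic writeback runs for an inode with only dirty pages and dirty
timestamps) can be modeled with a minimal user-space sketch. The flag
values, I_DIRTY_ALL_MODEL and must_write_inode() are hypothetical and are
not the kernel's definitions; the data_integrity argument stands in for
"this is fsync()/sync() rather than background writeback". It models only
the decision, not the dirty-list tracking problem.

```c
/*
 * Hypothetical user-space model, not kernel code. The flag values are
 * illustrative and do not match include/linux/fs.h.
 */
#include <assert.h>
#include <stdbool.h>

#define I_DIRTY_SYNC     (1u << 0) /* inode metadata dirty */
#define I_DIRTY_DATASYNC (1u << 1) /* metadata needed for fdatasync dirty */
#define I_DIRTY_PAGES    (1u << 2) /* data pages dirty */
#define I_DIRTY_TIME     (1u << 3) /* only timestamps dirty (lazytime) */
#define I_DIRTY_ALL_MODEL \
	(I_DIRTY_SYNC | I_DIRTY_DATASYNC | I_DIRTY_PAGES | I_DIRTY_TIME)

/* Does this writeback pass need to write the on-disk inode itself? */
static bool must_write_inode(unsigned state, bool data_integrity)
{
	assert((state & ~I_DIRTY_ALL_MODEL) == 0u); /* only known bits */

	/* Real metadata changes are always written. */
	if (state & (I_DIRTY_SYNC | I_DIRTY_DATASYNC))
		return true;
	/* fsync()/sync(): flush lazily kept timestamps as well. */
	if (data_integrity && (state & I_DIRTY_TIME))
		return true;
	/*
	 * Periodic writeback: dirty pages (plus dirty timestamps) alone do
	 * not force the inode out; the pages are written separately.
	 */
	return false;
}
```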
So let's restart the design so that things are clear:
1) We have a new inode bit I_DIRTY_TIME. This means that only the
timestamps in the inode have changed. The desired behavior is that an
inode with I_DIRTY_TIME set and without I_DIRTY_SYNC | I_DIRTY_DATASYNC is
written by background writeback only once per 24 hours. Such inodes do get
written by sync(2) and fsync(2) calls.
2) Inodes with only I_DIRTY_TIME are tracked in a new dirty list
b_dirty_time. We use the i_wb_list list head for this. Unlike the b_dirty
list, this list isn't kept sorted by dirtied_when. If queue_io() sees the
for_sync bit set in the work item, it will call mark_inode_dirty_sync() for
all inodes in b_dirty_time before queuing I/O from the b_dirty list. Once
per hour (or something like that) the flusher thread scans the whole
b_dirty_time list and calls mark_inode_dirty_sync() for all inodes whose
dirty timestamps are too old (to detect this we need a new timestamp in
the inode).
3) When fsync() sees an inode with I_DIRTY_TIME set, it calls
mark_inode_dirty_sync().
4) When we are dropping the last inode reference and the inode has
I_DIRTY_TIME set, we call mark_inode_dirty_sync().
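The four rules above can be sketched as a small user-space model. Everything
in it is hypothetical: struct inode, struct writeback_lists,
dirtied_time_when, on_b_dirty, DIRTYTIME_EXPIRE and the function names are
stand-ins chosen for illustration, dirty pages and locking are ignored, and
a singly linked list replaces the i_wb_list list head.

```c
/* Hypothetical user-space model of the lazytime design, not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_TIME     (1u << 3)
#define DIRTYTIME_EXPIRE (24L * 60 * 60) /* rule 1: once per 24 hours */

struct inode {
	unsigned state;
	long dirtied_time_when; /* rule 2: the new timestamp in the inode */
	struct inode *next;     /* b_dirty_time link (stands in for i_wb_list) */
	bool on_b_dirty;        /* stands in for being queued on b_dirty */
};

struct writeback_lists {
	struct inode *b_dirty_time; /* unsorted: timestamp-only dirty inodes */
};

static void unlink_dirty_time(struct writeback_lists *wb, struct inode *inode)
{
	struct inode **pp = &wb->b_dirty_time;

	while (*pp && *pp != inode)
		pp = &(*pp)->next;
	assert(*pp != NULL); /* I_DIRTY_TIME inodes are always on the list */
	*pp = inode->next;
	inode->next = NULL;
}

/* Timestamp update under lazytime: just remember the times are dirty. */
static void mark_inode_dirty_time(struct writeback_lists *wb,
				  struct inode *inode, long now)
{
	if (inode->state & (I_DIRTY_TIME | I_DIRTY_SYNC))
		return; /* already tracked, or written with the inode anyway */
	inode->state |= I_DIRTY_TIME;
	inode->dirtied_time_when = now;
	inode->next = wb->b_dirty_time; /* O(1) push, no sorting (rule 2) */
	wb->b_dirty_time = inode;
}

/* Promote timestamp-only dirtiness to a normal dirty inode. */
static void mark_inode_dirty_sync(struct writeback_lists *wb,
				  struct inode *inode)
{
	if (inode->state & I_DIRTY_TIME) {
		unlink_dirty_time(wb, inode);
		inode->state &= ~I_DIRTY_TIME;
	}
	inode->state |= I_DIRTY_SYNC;
	inode->on_b_dirty = true;
}

/* Rule 2, for_sync: flush every lazily dirty inode before queuing I/O. */
static void queue_io_for_sync(struct writeback_lists *wb)
{
	while (wb->b_dirty_time)
		mark_inode_dirty_sync(wb, wb->b_dirty_time);
}

/* Rule 2, periodic scan: the list is unsorted, so walk all of it. */
static void scan_expired_dirty_time(struct writeback_lists *wb, long now)
{
	struct inode *inode = wb->b_dirty_time;

	while (inode) {
		struct inode *next = inode->next; /* save before unlinking */

		if (now - inode->dirtied_time_when >= DIRTYTIME_EXPIRE)
			mark_inode_dirty_sync(wb, inode);
		inode = next;
	}
}

/* Rules 3 and 4: fsync() and dropping the last reference. */
static void fsync_or_iput_final(struct writeback_lists *wb,
				struct inode *inode)
{
	if (inode->state & I_DIRTY_TIME)
		mark_inode_dirty_sync(wb, inode);
}
```

Because the periodic scan visits every entry anyway, the list never needs to
be ordered by dirtied_time_when, so adding an inode to it stays O(1).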
And that should be it, right?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR